samoonpride / line-bot-webhook-frontend

01418499 - CS Project | Senior Project

Research the threshold percentages for classifying an issue as a duplicate #10

thewro11 closed 8 months ago

thewro11 commented 9 months ago

Following the meeting in #9, we need to clarify this problem at the next meeting.

Acceptance Criteria:

Natdadai commented 9 months ago

Text (Thai): word2vec using a pretrained model
Voice to text (Eng & Thai): model
Location: Euclidean distance

thewro11 commented 8 months ago

The Turnitin website gives the following threshold percentages for classifying matching text in a document:

Blue: No matching text
Green: One word to 24% matching text
Yellow: 25-49% matching text
Orange: 50-74% matching text
Red: 75-100% matching text

So we may be able to treat 50-100% matching text as the criterion for classifying an issue as a duplicate.
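A minimal sketch of how this rule could look in code, assuming we already have a text similarity score expressed as a percentage from 0 to 100. The function names and the `DUPLICATE_THRESHOLD_PERCENT` constant are hypothetical, not something that exists in the project yet, and the band edges are treated as half-open ranges so fractional scores such as 24.5% still fall in a band.

```python
# Hypothetical starting threshold taken from the Orange/Red bands (50-100%).
DUPLICATE_THRESHOLD_PERCENT = 50.0


def similarity_band(percent: float) -> str:
    """Map a similarity percentage to the colour bands quoted above."""
    if not 0.0 <= percent <= 100.0:
        raise ValueError("percent must be between 0 and 100")
    if percent == 0.0:
        return "blue"    # no matching text
    if percent < 25.0:
        return "green"   # one word to 24%
    if percent < 50.0:
        return "yellow"  # 25-49%
    if percent < 75.0:
        return "orange"  # 50-74%
    return "red"         # 75-100%


def is_duplicate_by_text(
    percent: float, threshold: float = DUPLICATE_THRESHOLD_PERCENT
) -> bool:
    """Return True when the similarity score reaches the duplicate threshold."""
    if not 0.0 <= percent <= 100.0:
        raise ValueError("percent must be between 0 and 100")
    return percent >= threshold
```

Keeping the threshold as a parameter means it can be tuned later without touching the band logic.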

thewro11 commented 8 months ago

Location distance is difficult to define. We decided to start with a fixed threshold of 25 meters and then increase or decrease this value based on practical usage.
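A rough sketch of the distance check, assuming locations arrive as latitude/longitude pairs in degrees. It projects the two points onto a local flat plane (an equirectangular approximation) and then takes the Euclidean distance, which should be accurate enough at the scale of tens of meters. The function names, the `EARTH_RADIUS_M` value, and the `DEFAULT_RADIUS_M` constant are assumptions for illustration only.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, an approximation
DEFAULT_RADIUS_M = 25.0       # starting value from this thread; expected to be tuned


def distance_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Approximate distance in meters between two lat/lon points (degrees).

    Uses a local planar projection followed by Euclidean distance,
    so it is only suitable for points that are close together.
    """
    mean_lat = math.radians((lat1 + lat2) / 2.0)
    dx = math.radians(lon2 - lon1) * math.cos(mean_lat) * EARTH_RADIUS_M
    dy = math.radians(lat2 - lat1) * EARTH_RADIUS_M
    return math.hypot(dx, dy)


def is_same_location(
    lat1: float,
    lon1: float,
    lat2: float,
    lon2: float,
    radius_m: float = DEFAULT_RADIUS_M,
) -> bool:
    """Return True when the two points are within radius_m of each other."""
    return distance_m(lat1, lon1, lat2, lon2) <= radius_m
```

Because `radius_m` is a parameter, the 25 meter starting value can be raised or lowered per call or through configuration once we have data from practical usage.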