phylotastic / Pt_Mobile_Application

Mobile Application for Phylotastic Project

fuzzy matching eg. of problem #68

Closed: hdliv closed this issue 7 years ago

hdliv commented 8 years ago

Scraping this image: Phylotastic_Mobile_Application/test/images/hard/Phormidaceae.jpg

8 names are scraped, including the words Periphytic and Thallus (not part of the nomenclature).

'Get tree' returns a tree with 3 tips, one of which is 'Thallis'.

Thallis was not in the initial scraped list, and it is an insect, whereas the photo is of cyanobacteria.

Can we prevent this? In nomenclature, names that differ by only one or two letters often refer to totally different entities in diverse clades.

arlin commented 8 years ago

The short answer is "yes, we can prevent this". Right now we are using fuzzy matching in GNRD and fuzzy matching again in OTT. At some point, we need to implement strict matching and give the user options.

Meanwhile, another solution to the problem you are facing is simply to allow the user to edit names or enter them manually.

dimus commented 8 years ago

GNRD returns the edit distance when a match is fuzzy. We found that if the edit distance is more than 1, the result is most likely wrong.
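To make the edit-distance cutoff concrete, here is a minimal sketch that recomputes the distance locally and rejects matches above a threshold. It assumes a plain Levenshtein distance; the function names `edit_distance` and `accept_match` and the `max_distance` parameter are hypothetical and are not part of the GNRD or OTT APIs.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def accept_match(scraped: str, matched: str, max_distance: int = 1) -> bool:
    """Hypothetical filter: keep a match only if it is within max_distance
    edits of the scraped name (case-insensitive). max_distance=0 is strict."""
    return edit_distance(scraped.lower(), matched.lower()) <= max_distance


print(edit_distance("Thallus", "Thallis"))
print(accept_match("Thallus", "Thallis"))
print(accept_match("Thallus", "Thallis", max_distance=0))
```

Note that 'Thallus' and 'Thallis' differ by a single substitution, so a cutoff of 1 would still accept the match reported in this issue; only strict matching (`max_distance=0`) rejects it.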

hdliv commented 8 years ago

@arlin Thallus is scraped from the photo and not Thallis, so even with editing the name in the list, it won't give the correct tree -- at least in this case. Also, not all students will understand or notice that the scraped name is not the name in the tree. As for editing, is this issue already on waffle?

hdliv commented 8 years ago

@arlin We are now using exact matching, correct? I will move this to In review. Feel free to move it back if you feel it is appropriate @thanhnh-infinity