shiltemann closed this 2 months ago
@Freymaurer can you move this to Swate?
Hey! Could you pls open two issues for this? The two problems you describe are not related to each other. Feel free to keep this one for template search and open another one for term search.
done :+1:
The reason behind this behavior is our search algorithm. We use Sørensen-Dice on string bigrams. A lot of fancy words for "we look for similarity, and the more similar the two strings we compare, the higher the score". To filter out unfit results we apply a threshold. In your example, "ENA - " actually has more similarity to "SRA - Sequencing" than to the longer ENA names. For example, in "ENA - Gene promoter annotated sequence" we have ~30 mismatched characters. In "SRA - Sequencing" we have only 11 mismatched characters. This very flexible calculation allows for semi-similar result search. To avoid the issues you describe, we now adjust the score as follows:
> [!NOTE]
> Threshold is 0.3
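
To make the scoring described above concrete, here is a minimal sketch of Sørensen-Dice on character bigrams with a 0.3 cutoff. It is not Swate's actual code: the function names (`bigrams`, `dice`, `search`), the case-folding, the use of bigram sets rather than counts, and the `>=` comparison against the threshold are all assumptions made for illustration, and the score adjustment mentioned above is not included.

```python
THRESHOLD = 0.3  # value taken from the note above


def bigrams(text: str) -> set[str]:
    """Return the set of adjacent character pairs in `text`.

    Case-folding is an assumption of this sketch.
    """
    s = text.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice(a: str, b: str) -> float:
    """Sørensen-Dice coefficient: 2 * |X & Y| / (|X| + |Y|) over bigram sets."""
    x, y = bigrams(a), bigrams(b)
    total = len(x) + len(y)
    if total == 0:
        return 0.0  # neither string has a bigram
    return 2 * len(x & y) / total


def search(query: str, names: list[str], threshold: float = THRESHOLD):
    """Return (score, name) pairs at or above `threshold`, best score first."""
    scored = [(dice(query, name), name) for name in names]
    return sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)


if __name__ == "__main__":
    templates = [
        "ENA - Gene promoter annotated sequence",
        "SRA - Sequencing",
    ]
    for query in ("ENA", "ENA - "):
        print(repr(query), search(query, templates))
```

In this sketch, "ENA" scores roughly 0.11 to 0.12 against both names and so returns nothing, while "ENA - " scores 0.4 against "SRA - Sequencing" but only about 0.26 against the long ENA name. The short query shares all of its bigrams with the long name, but the long name's many extra bigrams inflate the denominator and pull the score under the threshold.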
When searching templates in ARCitect, the search results are sometimes suboptimal.
OS and framework information:
Describe the bug

Example:
- Templates named `ENA - XXXX`
- `ENA` in search bar turns up 0 results
- `ENA -` in search bar now turns up just one of the results

Screenshots

Example 1, template search
For templates named `ENA - ...`, we see there are several in the list of templates:

However, when we enter `ENA` in the search box we get no results:

And when we type `ENA -` in the search bar we get one of them as a result: