ajo2995 closed this 8 years ago
It seems to me the correct approach to this is:
We should NOT attempt to extract IDs from freetext fields. That way lies madness.
That sounds tractable. I'll test drive another text field type with the comment
Queries can have dashes inserted in the wrong place and still match.
I could also split descriptions on whitespace and add words that have at least one digit and one letter to the _terms field so they get into suggestions.
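A rough sketch of that whitespace-splitting idea, in case it helps the discussion. The function name `id_like_terms` is hypothetical, and stripping surrounding punctuation (so that "(OS01G0848400)" yields "OS01G0848400") is my assumption, not something decided above. Writing the result into the _terms field is left out.

```python
import string


def id_like_terms(description):
    """Return the words in a description that look like gene labels.

    A word qualifies if it contains at least one digit and at least one
    letter. Leading and trailing punctuation is stripped first (an
    assumption of this sketch); internal dashes are kept.
    """
    terms = []
    for word in description.split():
        token = word.strip(string.punctuation)
        has_digit = any(ch.isdigit() for ch in token)
        has_letter = any(ch.isalpha() for ch in token)
        # Keep first-seen order and skip repeats.
        if has_digit and has_letter and token not in terms:
            terms.append(token)
    return terms


if __name__ == "__main__":
    print(id_like_terms("QSH-1 in Oryza sativa japonica (OS01G0848400)."))
```

Purely numeric words and plain words are dropped, so this would likely miss names that contain no digit at all. That limitation is part of what any analysis of the approach would need to measure.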
Some genes only have their names in the gene description, for example QSH-1 in Oryza sativa japonica (OS01G0848400).
This will require some analysis to see how well we can identify gene labels from free text. Genes with a non-ID name or a set of single-word synonyms can be used as a test set.