pombase / canto

The PomBase community curation tool
https://curation.pombase.org

lucene search weighting for main DNA binding TF term #1083

Closed ValWood closed 8 years ago

ValWood commented 8 years ago

Is there any way we can 'weight' this term, so that if somebody types "transcription factor" it comes near the top of the list? It's the most likely term, but I never see it; I always need to look it up on a TF gene page.

GO:0000982 transcription factor activity, RNA polymerase II core promoter proximal region sequence-specific binding

kimrutherford commented 8 years ago

I think that's possible, but it's going to be a bit of a hack. I'll try it in the test Canto.

ValWood commented 8 years ago

I'll also look into whether this can be improved with synonyms. "Transcription factor" as a search returns about 500 GO terms, so we need to be able to locate the main ones easily.

kimrutherford commented 8 years ago

I've added a bit of a hack that reads term weights from the configuration file. GO:0000982 now has twice the default weight, so it should appear further up the search list. Each change to the weights needs a reload of the ontologies to update the index. It could take a bit of trial and error to get the right weight.
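
As a rough illustration of what a per-term weight does to ranking, here is a toy scorer in Python. It is only a sketch under stated assumptions: it is not Canto's code and does not use the Lucene API, the `TERM_WEIGHTS` mapping stands in for whatever the configuration file actually contains, and the `EX:` terms are invented for the example. Real Lucene scoring is more involved than counting matched words.

```python
import re

DEFAULT_WEIGHT = 1.0

# Hypothetical configuration: term ID -> weight (name and layout are assumptions).
TERM_WEIGHTS = {"GO:0000982": 2.0}

# GO:0000982 comes from this issue; the EX: terms are invented for illustration.
TERMS = {
    "GO:0000982": "transcription factor activity, RNA polymerase II core "
                  "promoter proximal region sequence-specific binding",
    "EX:0000001": "transcription factor binding",
    "EX:0000002": "RNA binding",
}


def tokens(text):
    """Lower-case the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(query, weights):
    """Rank terms by (number of matched query words) x (term weight)."""
    query_tokens = tokens(query)
    hits = []
    for term_id, name in TERMS.items():
        matched = len(query_tokens & tokens(name))
        if matched:
            score = matched * weights.get(term_id, DEFAULT_WEIGHT)
            hits.append((term_id, score))
    # Highest score first; ties fall back to term ID order.
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


print(search("transcription factor", {}))            # no weights configured
print(search("transcription factor", TERM_WEIGHTS))  # GO:0000982 weighted x2
```

Note that in this sketch the weight multiplies the score of every match on the term, whichever word of the term name was matched.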

kimrutherford commented 8 years ago

There's good news and bad news.

Good news: "transcription factor activity, RNA polymerase II core promoter proximal region sequence-specific binding" is now at the top of the list when you type "transcription factor" in the test Canto.

Bad news: it's also top of the list if you type "RNA", "core" or "proximal" :-(

Maybe a better strategy would be to allow extra artificial synonyms to be configured. If we could arrange for that term to have "transcription factor" as a synonym, then it would pop to the top without extra work. We'd probably want to hide these artificial synonyms from the user. I'm not sure how to implement that, so I'll have a think.
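
A minimal sketch of the artificial synonym idea, in Python for illustration only. Nothing here is Canto's code or the Lucene API: `HIDDEN_SYNONYMS` is a hypothetical configuration mapping, the `EX:` terms are invented, and the scoring (the fraction of a label's words matched by the query) is a stand-in for real index scoring. The point is that the extra synonym is searched but never returned for display.

```python
import re

# Hypothetical configuration: term ID -> synonyms that are indexed but never shown.
HIDDEN_SYNONYMS = {"GO:0000982": ["transcription factor"]}

# GO:0000982 comes from this issue; the EX: terms are invented for illustration.
TERMS = {
    "GO:0000982": "transcription factor activity, RNA polymerase II core "
                  "promoter proximal region sequence-specific binding",
    "EX:0000001": "transcription factor binding",
    "EX:0000002": "RNA binding",
}


def tokens(text):
    """Lower-case the text and return its alphanumeric words in order."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score_label(query_tokens, label):
    """Fraction of the label's words matched by the query (1.0 = fully covered)."""
    label_tokens = tokens(label)
    matched = sum(1 for token in label_tokens if token in query_tokens)
    return matched / len(label_tokens)


def search(query):
    """Rank terms by their best-scoring label, searching hidden synonyms too."""
    query_tokens = set(tokens(query))
    hits = []
    for term_id, name in TERMS.items():
        labels = [name] + HIDDEN_SYNONYMS.get(term_id, [])
        best = max(score_label(query_tokens, label) for label in labels)
        if best > 0:
            # Only the real term name is returned, so the hidden synonym is never displayed.
            hits.append((term_id, name, best))
    return sorted(hits, key=lambda hit: (-hit[2], hit[0]))


for term_id, name, score in search("transcription factor"):
    print(term_id, round(score, 3), name)
```

In this sketch, a search for "RNA" still ranks the short "RNA binding" term first, because the hidden synonym only helps queries that match its own words.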

ValWood commented 8 years ago

I think that might be OK (certainly for "core" and "proximal", as they aren't used much, if at all, in any other contexts).

If it's annoying we can remove it.

ValWood commented 8 years ago

Actually, this should have the broad synonym "transcription factor". I'll ask GO to do that.

ValWood commented 8 years ago

Closing, GO added synonym.