pombase / pombase-chado

PomBase code for accessing Chado
MIT License

block false positive publications in gene page publication list #1162

Open ValWood opened 5 months ago

ValWood commented 5 months ago

We can think about a heuristic to do this...

kimrutherford commented 5 months ago

What is a false positive publication? Can you give an example?

ValWood commented 5 months ago

In the case of gpd1, all of the publications mentioning gpd1 are about https://www.pombase.org/gene/SPBC215.05 (gpd1), except https://www.pombase.org/reference/PMID:15620689, which refers to tdh1 as gpd1.

See the response to the recent pombelist post. We have some conflated names, but they don't usually cause problems: when we curate we usually know which gene is being referred to, and we would spot a mix-up. I occasionally correct BioGRID, because they curate only interactions and don't need to look so closely at the context, but I think we rarely mix them up.

The issue is that these papers will appear in the "publication" list on both gene pages.

One thing we could do is keep a "block list": once a "small-scale" publication referring to an ambiguously named gene has been curated in Canto, we know which entity it refers to, and we could block the publication from the other gene pages. We can discuss, but it isn't urgent. You can decide how easy it would be.
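
A rough sketch of how such a block list might be derived, assuming we have a mapping from each PMID to the genes matched by name and another from each curated PMID to the genes actually annotated. All function and variable names here are hypothetical, and the data is illustrative only (taken from the gpd1/tdh1 example above), not the real PomBase code or schema:

```python
def build_block_list(curated_genes, name_matches, small_scale_pmids):
    """Map each curated small-scale PMID to the genes it should be hidden from."""
    blocked = {}
    for pmid, genes in curated_genes.items():
        if pmid not in small_scale_pmids:
            continue  # only small-scale papers are considered
        # Genes matched by name but not among the curated genes
        false_positives = name_matches.get(pmid, set()) - genes
        if false_positives:
            blocked[pmid] = false_positives
    return blocked


def publications_for_gene(gene, name_matches, blocked):
    """List PMIDs to show on a gene page, skipping blocked (PMID, gene) pairs."""
    return sorted(
        pmid
        for pmid, genes in name_matches.items()
        if gene in genes and gene not in blocked.get(pmid, set())
    )


# Illustrative data only: the paper uses "gpd1" to mean tdh1,
# so a name match wrongly links it to SPBC215.05 as well.
name_matches = {"PMID:15620689": {"SPBC215.05", "tdh1"}}
curated_genes = {"PMID:15620689": {"tdh1"}}
small_scale_pmids = {"PMID:15620689"}

blocked = build_block_list(curated_genes, name_matches, small_scale_pmids)
```

With this data the paper would stay on the tdh1 page and be dropped from the SPBC215.05 page. Uncurated papers are untouched, since nothing is known about which gene they refer to.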

ValWood commented 4 months ago

Also see the discussion about ppa3 here: https://github.com/pombase/curation/issues/3439

CC @PCarme

We need to check why PMID:22267499 shows on the ppa3 protein page.

One option (I think we might have discussed) is to only use the genes listed in the session (or annotated) if a paper is curated, but use the PubMed list if it isn't.
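
That option could look something like the sketch below. It assumes two lookups are available: the genes listed in (or annotated from) the curation session for each curated PMID, and the genes linked to each PMID via the PubMed list. The names and the placeholder IDs are hypothetical, not the actual PomBase loaders:

```python
def genes_for_publication(pmid, session_genes, pubmed_genes):
    """Pick which genes a publication should be listed under.

    If the paper is curated, trust only the session/annotated genes;
    otherwise fall back to the PubMed-derived list.
    """
    curated = session_genes.get(pmid)
    if curated:
        return set(curated)
    return set(pubmed_genes.get(pmid, ()))


# Placeholder data for illustration only
session_genes = {"PMID:0000001": {"geneA"}}
pubmed_genes = {
    "PMID:0000001": {"geneA", "geneB"},  # geneB is a name clash
    "PMID:0000002": {"geneB"},           # not curated yet
}

curated_result = genes_for_publication("PMID:0000001", session_genes, pubmed_genes)
uncurated_result = genes_for_publication("PMID:0000002", session_genes, pubmed_genes)
```

Here the curated paper is listed only under geneA, while the uncurated one keeps its PubMed-derived gene. A curated paper with an empty gene list falls back to the PubMed list in this sketch, which may or may not be what we want.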