
PomBase curation

check GO_REFs #449

Closed (pombase-admin closed this issue 7 years ago)

pombase-admin commented 10 years ago

If you're adding GO annotations with ISS or ISO, the reference to use is GO_REF:0000024 (see http://www.geneontology.org/cgi-bin/references.cgi). IC annotations should cite the same reference as the one you used for the annotation to the "from" GO ID.

(Maybe we can get a chado check for this though?)
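
A minimal sketch of what such a check could test, written in Python rather than as a chado query. It assumes annotations are available as simple records with hypothetical fields (`gene`, `go_id`, `evidence`, `reference`, `with_from`); the `Annotation` class and `find_ref_problems` helper are illustrative, not an existing PomBase or chado API, and the IDs in the example are placeholders.

```python
from dataclasses import dataclass

# Hypothetical record shape for illustration; not a chado or PomBase schema.
@dataclass(frozen=True)
class Annotation:
    gene: str
    go_id: str
    evidence: str
    reference: str
    with_from: str = ""  # for IC: the GO ID the annotation was inferred from

ISS_ISO_REF = "GO_REF:0000024"


def find_ref_problems(annotations):
    """Return (annotation, reason) pairs that break the citation rules."""
    # Index every reference supporting each (gene, GO ID) pair.
    refs_by_term = {}
    for a in annotations:
        refs_by_term.setdefault((a.gene, a.go_id), set()).add(a.reference)

    problems = []
    for a in annotations:
        if a.evidence in ("ISS", "ISO") and a.reference != ISS_ISO_REF:
            problems.append((a, f"{a.evidence} should cite {ISS_ISO_REF}"))
        elif a.evidence == "IC":
            # An IC must cite a reference that supports its "from" annotation.
            source_refs = refs_by_term.get((a.gene, a.with_from), set())
            if a.reference not in source_refs:
                problems.append((a, f"IC should cite the reference for {a.with_from}"))
    return problems


# Example with placeholder gene, term, and reference IDs.
example = [
    Annotation("geneA", "GO:AAA", "IDA", "PMID:111"),
    Annotation("geneA", "GO:BBB", "IC", "GO_REF:0000001", with_from="GO:AAA"),
    Annotation("geneA", "GO:CCC", "ISO", "GO_REF:0000024"),
]
for ann, reason in find_ref_problems(example):
    print(ann.go_id, reason)
# prints: GO:BBB IC should cite the reference for GO:AAA
```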

Original comment by: ValWood

pombase-admin commented 9 years ago

I hereby claim this as my Friday afternoon task -- I've done the ISS/ISO/etc. corrections for chr1 and chr2 (chr3 still to do). The ICs will take longer, because they'll have to be done individually, but there will be more Friday afternoons.

Original comment by: mah11

pombase-admin commented 9 years ago

I wouldn't worry about the ICs... I think most of them are probably filtered, and they are in decline...

In fact, here is the current evidence code status:

   98 EXP
  938 IC
11831 IDA
 5343 IEA
   94 IEP
 1000 IGI
    1 IKR
 4276 IMP
 1388 IPI
  614 ISM
 6857 ISO
 2103 ISS
 1378 NAS
 2430 ND
  482 RCA
  552 TAS

and the refs for IC:

836 GO_REF:0000001 IC
  4 GO_REF:0000024 IC
  7 GO_REF:0000036 IC

So there are not many to fix (provided GO_REF:0000001 is the correct one?). If you do the grep with the systematic ID column too, you will get the ones to fix... v
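
The counts above look like the output of a grep piped through `uniq -c`. A rough Python equivalent is sketched below, assuming the annotations are in a GAF 2.x file (tab-separated; systematic ID in column 2, GO ID in column 5, reference in column 6, evidence code in column 7, with/from in column 8). It tallies evidence codes and IC references and lists the IC lines that still cite GO_REF:0000001 along with their systematic IDs. The `scan_gaf` name and the file name in the usage comment are placeholders.

```python
from collections import Counter


def scan_gaf(lines):
    """Tally evidence codes and IC references from GAF 2.x lines."""
    evidence_counts = Counter()
    ic_ref_counts = Counter()
    to_fix = []  # (systematic ID, GO ID, with/from) for ICs citing GO_REF:0000001
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # skip GAF header comments and blank lines
        cols = line.rstrip("\n").split("\t")
        sys_id, go_id, refs = cols[1], cols[4], cols[5]
        evidence, with_from = cols[6], cols[7]
        evidence_counts[evidence] += 1
        if evidence == "IC":
            ic_ref_counts[refs] += 1
            # The reference column may hold several pipe-separated refs.
            if "GO_REF:0000001" in refs.split("|"):
                to_fix.append((sys_id, go_id, with_from))
    return evidence_counts, ic_ref_counts, to_fix


# Usage (file name is a placeholder):
# with open("annotations.gaf") as fh:
#     evidence_counts, ic_refs, to_fix = scan_gaf(fh)
```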

Original comment by: ValWood

pombase-admin commented 9 years ago

A great deal of the point of this ticket is that GO_REF:0000001 is wrong for IC -- it's "unpublished". For IC, it can't be globally replaced with any one ref; it has to be the same ref as supports the annotation used in the "from" field.

There are therefore 836 ICs to fix (or delete, if they've really been superseded).

Original comment by: mah11

pombase-admin commented 9 years ago

The way the pipeline works is that ICs are suppressed if there is another annotation, so quite often they will be suppressed in one release and then appear in the next, because it is arbitrary whether the IEA, TAS, IC or NAS annotation is kept. It might be better to make ICs the lowest priority, which would suppress most of them (no need to delete them unless they are incorrect; it's good to keep them in case the IEAs are removed because they turn out to be non-specific for the family).

Basically, the ones which are showing this month might be different from the ones which are showing next month.....
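
To illustrate why a fixed priority would stop ICs flickering between releases, here is a minimal sketch of redundancy filtering. It assumes, hypothetically, that the pipeline keeps one annotation per gene and GO term and picks it by an evidence-code rank; the ranking below is invented for illustration and is not the actual PomBase/chado filtering order. Because ties are broken by a fixed key rather than by input order, the same annotation wins every release.

```python
# Illustrative ranking only (lower = preferred); NOT the real PomBase filtering order.
EVIDENCE_RANK = {
    "EXP": 0, "IDA": 0, "IMP": 0, "IGI": 0, "IPI": 0, "IEP": 0,
    "ISS": 1, "ISO": 1, "ISM": 1,
    "TAS": 2, "NAS": 3, "IEA": 4,
    "IC": 5,  # lowest priority, as suggested above
}


def filter_redundant(annotations):
    """Keep one (gene, go_id, evidence) triple per (gene, go_id) pair.

    The winner is chosen by (rank, evidence code), so the outcome is the same
    whatever order the annotations arrive in.
    """
    best = {}
    for gene, go_id, evidence in annotations:
        candidate = (EVIDENCE_RANK.get(evidence, 99), evidence)
        key = (gene, go_id)
        if key not in best or candidate < best[key]:
            best[key] = candidate
    return sorted((gene, go_id, ev) for (gene, go_id), (_, ev) in best.items())


anns = [("geneA", "GO:AAA", "IC"), ("geneA", "GO:AAA", "IEA"), ("geneB", "GO:BBB", "IC")]
print(filter_redundant(anns))
# [('geneA', 'GO:AAA', 'IEA'), ('geneB', 'GO:BBB', 'IC')]
```

The IC for geneB survives because nothing else covers that term, which matches the point about keeping ICs in case the IEAs are later removed.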

I don't think it's a big deal that the ref is wrong, as we don't really make new IC annotations and they will all disappear eventually (it is always arbitrary whether I made an IC or an ISS/ISO).

Original comment by: ValWood

pombase-admin commented 9 years ago

It actually seems odd to me to put the ID of the publication in the ref field, because the annotation didn't come from the publication. It was inferred from the other GO term, and would (or should) always be true whatever the source of the annotation; in this respect it is independent of the publication. It's the curator who is making the link, not the publication author. In most cases it seems odd to have these annotations attributed to the publication, because you would never find the assertion in the paper; it's based on biological knowledge.

Original comment by: ValWood

pombase-admin commented 9 years ago

I would rather we spent the time fixing the annotations and working out why the ICs are necessary in the first place (there should always be a better source) than fixing the ref... I have been doing this slowly.

See for example https://sourceforge.net/p/geneontology/ontology-requests/11275/, which will suppress 2 redundant IC annotations.

I will ask Kim to put ICs lower in the filtering hierarchy if the refs bother you; then there will be fewer to fix.

Original comment by: ValWood

pombase-admin commented 9 years ago

So, most of the examples are like this: arc1 is shown by experiment to be part of GO:0005885 Arp2/3 protein complex, and biochemically to be involved in branched actin nucleation.

I know that the Arp2/3 protein complex is the nucleator of the actin patches (because these are the branched actin structures in yeast), and so I can add this by IC based on complex membership.

I know that it is involved in endocytosis, so I can add this by IC.

I think that attributing these annotations to a paper which does not mention these processes is misleading (in the same way that using the same evidence for an F-P link is misleading). Also, it won't necessarily be the same ref for each IC from Arp2/3 protein complex, because the link between Arp2/3 protein complex and the biological process terms is independent of the publication. It comes from curator knowledge.

There are multiple other ways I could have made this annotation, and none of them would be directly related to the paper from which the GO annotation I IC'd from was derived. Usually I can ISO to SGD, but here they have not made this particular annotation yet (although I am sure that they could; sometimes I mail them and ask them to, but sometimes it's just quicker to use IC).

Maybe a better solution would be to ask GO to allow GO_REF:0000001, because these inferences are essentially based on knowledge that is not explicitly provided in the papers from which the IC'd-from GO annotation is derived.

Original comment by: ValWood

pombase-admin commented 9 years ago

Citing the same paper as for the "from=" entry has been the GO recommendation for IC for as long as the IC code has been in use (there's a more recent enhancement involving GO_REF:0000036 for ICs from 2 or more GO terms that themselves come from different refs):

http://geneontology.org/page/ic-inferred-curator

It makes sense because the curator needs a reason to make the first annotation, and that's what the reference indicates. The IC code itself represents the curator adding biological knowledge to get another annotation. It's a bit hacky, but unless GO ever really does get round to implementing "chains of evidence" to make the steps more explicit it's not bad. I don't think it's improved by ignoring GO's reference-citing recommendations.
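
A minimal sketch of the citation rule as described in this comment, assuming the caller already knows the reference behind each annotation named in the "from" field. `ic_reference` is a hypothetical helper; the case where several "from" annotations share one paper (cite that paper) is an assumption read off the "come from different refs" wording, and the GO documentation linked above is the authority on edge cases.

```python
# Reference used for ICs drawn from 2+ GO terms supported by different refs.
MULTI_SOURCE_IC_REF = "GO_REF:0000036"


def ic_reference(from_refs):
    """Pick the reference to cite on a new IC annotation.

    from_refs: the reference behind each annotation named in the "from" field.
    """
    distinct = set(from_refs)
    if not distinct:
        raise ValueError("an IC annotation needs at least one 'from' annotation")
    if len(distinct) == 1:
        # Single source, or several sources sharing one paper: cite that paper.
        return next(iter(distinct))
    # Several sources supported by different papers.
    return MULTI_SOURCE_IC_REF


print(ic_reference(["PMID:111"]))              # PMID:111
print(ic_reference(["PMID:111", "PMID:222"]))  # GO_REF:0000036
```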

I would not support a bid to allow "unpublished" for IC. Correcting the existing ones is a low priority (as indicated above) but if we do need to make any new ICs we should do them consistently with established GO practice.

Incidentally, using the same evidence for an F-P link is also not misleading; it follows logically, the same as any other inference over part_of. If you find a case where the same evidence won't do, the F-P link itself would have to be reconsidered. (Any P has_part F links aren't the issue, because GO annotations don't propagate over has_part anyway.)
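
A toy sketch of the inference described here: an annotation to a function term that is part_of a process implies an annotation to that process, carried over with the same evidence and reference, while has_part edges are never followed. The terms (`F1`, `P1`, ...) and edges are placeholders, not real GO content, and `propagate` is a hypothetical helper.

```python
from collections import deque

# Placeholder ontology edges as (subject, relation, object), e.g. "F1 part_of P1".
EDGES = [
    ("F1", "part_of", "P1"),
    ("P1", "is_a", "P2"),
    ("P3", "has_part", "F1"),
]

# Annotations propagate from subject to object over these relations only.
PROPAGATING = {"is_a", "part_of"}


def propagate(term, evidence, reference):
    """Map each term implied by an annotation to `term` onto the same (evidence, reference)."""
    implied = {}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        if current in implied:
            continue  # already visited
        implied[current] = (evidence, reference)
        for subject, relation, obj in EDGES:
            if subject == current and relation in PROPAGATING:
                queue.append(obj)
    return implied


print(sorted(propagate("F1", "IDA", "PMID:111")))  # ['F1', 'P1', 'P2']
print(sorted(propagate("P3", "IDA", "PMID:111")))  # ['P3'] (has_part is not followed)
```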

Original comment by: mah11

pombase-admin commented 9 years ago

"I will ask Kim to put IC's lower in the filtering hierarchy if the refs bother you, then there will be fewer to fix."

Shall I make a chado ticket for that?

Original comment by: kimrutherford

pombase-admin commented 9 years ago

To be honest, I wouldn't bother. Nothing about the ICs is that big a deal.

Original comment by: mah11

pombase-admin commented 9 years ago

Agreed, everything you say is correct, and the ref is wrong. So for new IC annotations, use the correct ref.

They are disappearing quite quickly. I was looking at this on the train yesterday:

Code   May 2013   Oct 2014   Change
EXP          57         98   +
IC         1512        940   -
IDA       10604      11874   +
IEA        5408       5344   -
IEP         353         94   -
IGI        1008        971   -
IKR           1          1
IMP        4262       4272   +
IPI        1193       1393   +
ISM        1575        613   -
ISO        6484       6882   +
ISS        2143       2103   -
NAS        1591       1374   -
ND         2635       2432   -
RCA         510        482   -
TAS         604        550   -
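
To put numbers on the "+" and "-" column, a small sketch that computes the absolute and percentage change per code from the two snapshots. The counts are copied from the table; the `changes` helper is illustrative. By this arithmetic ICs fell by 572 (about 38%), the second-largest absolute drop after ISM.

```python
# Counts copied from the table: code -> (May 2013, Oct 2014)
COUNTS = {
    "EXP": (57, 98), "IC": (1512, 940), "IDA": (10604, 11874),
    "IEA": (5408, 5344), "IEP": (353, 94), "IGI": (1008, 971),
    "IKR": (1, 1), "IMP": (4262, 4272), "IPI": (1193, 1393),
    "ISM": (1575, 613), "ISO": (6484, 6882), "ISS": (2143, 2103),
    "NAS": (1591, 1374), "ND": (2635, 2432), "RCA": (510, 482),
    "TAS": (604, 550),
}


def changes(counts):
    """Return (code, absolute change, percent change), largest drop first."""
    rows = []
    for code, (before, after) in counts.items():
        delta = after - before
        rows.append((code, delta, 100.0 * delta / before))
    return sorted(rows, key=lambda r: r[1])


for code, delta, pct in changes(COUNTS):
    print(f"{code:4} {delta:+6d} {pct:+6.1f}%")
```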

So, a good reduction in ICs this year. I looked at a few today and most are just sheer laziness on my part; I should be asking SGD to make the annotation and ISO'ing... A number are missing ontology parentage, now that I look at them more closely.

So as a general rule for curators: if you have curated a gene and there are still ICs present (or IEA/TAS/NAS), it's worth considering why (usually a missing annotation that could be made with an experimental code, one that would be better as ISO, or an ontology issue). Keep chipping away.

Original comment by: ValWood

pombase-admin commented 9 years ago

Original comment by: ValWood

mah11 commented 8 years ago

use GO_REF:0000111 for IC from ISS

mah11 commented 7 years ago

finally done!

ValWood commented 7 years ago

Woo Hoo