pantherdb / fullgo_paint_update

Update of Panther and PAINT DBs with monthly GO release data

Errors with identifiers #26

Open pgaudet opened 5 years ago

pgaudet commented 5 years ago

Hello @dustine32

According to the error reports (http://snapshot.geneontology.org/reports/gorule-report.html), there are many invalid IDs in the PAINT files flagged under rule 27. For example:

http://snapshot.geneontology.org/reports/paint_mgi-report.html#gorule-0000027
http://snapshot.geneontology.org/reports/paint_other-report.html#gorule-0000027

Can you please have a look?

Thanks, Pascale

dustine32 commented 5 years ago

@pgaudet This sounds related to https://github.com/geneontology/go-site/issues/765, which involved adjusting some metadata files to let the TAIR:locus:##### identifiers pass @tonysawfordebi's GOA validation. Could it be that gorule 27 just isn't evaluating identifiers the same way?

Other than the TAIR IDs, I'm not sure which IDs (if they're in the with/from field) could be failing these tests. @dougli1sqrd Would you be able to pinpoint some of the failing IDs here? Could it just be the double-"MGI:" prefix issue? Thanks!

dustine32 commented 5 years ago

@dougli1sqrd pointed out to me yesterday that the "||" double pipes in the with/from field are what trigger the error: the empty entry between the two pipes is reported as an invalid identifier. I will need to debug the GAF generation script to pin down the source of this oddity. Example line:

dictyBase DDB_G0279641 sae1 GO:0016925 PMID:21873635 IBA PANTHER:PTN000102043||TAIR:locus:2159727|WB:WBGene00000142|SGD:S000006384|PomBase:SPAC4C5.04|FB:FBgn0029512|UniProtKB:Q9UBE0 P SUMO-activating enzyme subunit 1 UniProtKB:Q54WI4|PTN000860276 protein taxon:44689 20170228 GO_Central

http://release.geneontology.org/2018-10-08/reports/paint_dictybase-report.html#gorule-0000027
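To find lines like the one above before a release, a small scanner over the GAF With/From column may help. This is a minimal sketch, not the GO pipeline's actual GORULE:0000027 check. It assumes the tab-separated GAF 2.x layout where With/From is column 8, splits that field on "|", and flags empty entries (which is what "||", or a leading/trailing pipe, produces) plus anything that does not look like PREFIX:local_id. The names with_from_problems, scan_gaf and CURIE_RE are hypothetical, and the identifier regex is a deliberately loose assumption.

```python
import re
import sys
from typing import List

# Assumption: GAF 2.x, tab-separated, With/From is column 8 (0-based index 7).
WITH_FROM_INDEX = 7

# Hypothetical, deliberately loose "PREFIX:local_id" check.
# This is NOT the real GORULE:0000027 implementation.
CURIE_RE = re.compile(r"^[A-Za-z][A-Za-z0-9_.]*:\S+$")


def with_from_problems(line: str) -> List[str]:
    """Return the problems found in the With/From field of one GAF line."""
    # Skip header/comment lines ("!") and blank lines.
    if line.startswith("!") or not line.strip():
        return []
    cols = line.rstrip("\r\n").split("\t")
    if len(cols) <= WITH_FROM_INDEX:
        return ["line has too few columns"]
    field = cols[WITH_FROM_INDEX]
    if field == "":
        return []  # An empty With/From column is not checked by this sketch.
    problems = []
    for pos, entry in enumerate(field.split("|")):
        if entry == "":
            # "||", or a leading/trailing "|", produces an empty entry.
            problems.append(f"empty identifier at position {pos}")
        elif not CURIE_RE.match(entry):
            problems.append(f"malformed identifier {entry!r} at position {pos}")
    return problems


def scan_gaf(path: str) -> int:
    """Print each line whose With/From field has problems; return the count."""
    bad = 0
    with open(path, encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            problems = with_from_problems(line)
            if problems:
                bad += 1
                print(f"{path}:{lineno}: {'; '.join(problems)}")
    return bad


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_with_from.py gene_association.paint_dictyBase.gaf
    total = sum(scan_gaf(p) for p in sys.argv[1:])
    sys.exit(1 if total else 0)
```

Run against the dictyBase line above, the With/From value "PANTHER:PTN000102043||TAIR:locus:2159727|..." splits into an empty string at position 1, which is the entry the report complains about. A non-zero exit status makes the sketch easy to drop into a pre-release check.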

pgaudet commented 5 years ago

Can this be made high priority? It affects many annotations.

Thanks, Pascale

dustine32 commented 5 years ago

@pgaudet Yeah, this shouldn't be too difficult to figure out before the next monthly update in a few days.

dustine32 commented 5 years ago

With the fix applied in a test run, the previously missing TAIR:locus:2832477 now appears in the with/from field, and there are no double pipes:

$ grep DDB_G0279641 2018-10-22_fullgo_test/IBA_GAFs/gene_association.paint_dictyBase.gaf | grep GO:0016925
dictyBase   DDB_G0279641    sae1        GO:0016925  PMID:21873635   IBA PANTHER:PTN000102043|TAIR:locus:2159727|WB:WBGene00000142|SGD:S000006384|PomBase:SPAC4C5.04|TAIR:locus:2832477|FB:FBgn0029512|UniProtKB:Q9UBE0  P   SUMO-activating enzyme subunit 1    UniProtKB:Q54WI4|PTN000860276   protein taxon:44689 20170228    GO_Central

So this should be good for this month's update.
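The double pipe turned out to be a placeholder for an identifier that went missing during generation (TAIR:locus:2832477 above). One way to keep that class of bug from reaching a GAF silently is to refuse to join an empty or missing ID when the With/From string is built. The sketch below is a hypothetical helper, join_with_from, assuming the generator collects the IDs into a list before joining them with "|"; it is not how the actual fullgo_paint_update script is written, and it only surfaces the problem rather than restoring the missing ID, which the real fix did.

```python
from typing import Optional, Sequence


def join_with_from(ids: Sequence[Optional[str]]) -> str:
    """Join With/From identifiers with '|', refusing empty or missing IDs.

    Hypothetical helper: raising here exposes a dropped identifier at
    generation time instead of emitting '||' into the GAF.
    """
    missing = [i for i, x in enumerate(ids) if x is None or not x.strip()]
    if missing:
        raise ValueError(f"empty With/From identifier at position(s) {missing}")
    return "|".join(x.strip() for x in ids)
```

Failing loudly is a design choice: silently filtering empty entries would also remove the "||", but it would hide the fact that an identifier such as TAIR:locus:2832477 was lost.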

pgaudet commented 5 years ago

Thanks!