pantherdb / fullgo_paint_update

Update of Panther and PAINT DBs with monthly GO release data

IRD without NOT qualifier #51

Closed by huaiyumi 3 years ago

huaiyumi commented 3 years ago

I noticed a few annotations in the IBD file that have the IRD evidence code but no NOT qualifier. After going through some of the trees, I think the problem is caused by nodes that carry multiple qualifiers: the IBD GAF keeps only one qualifier per node, apparently chosen at random. Here are two examples:

PANTHER PTN000806051 PTN000806051 contributes_to GO:0008121 PMID:21873635 IRD PANTHER:PTN002228081 F protein taxon:10228 20191016 GO_Central
PANTHER PTN000806047 PTN000806047 NOT GO:0008121 PMID:21873635 IRD PANTHER:PTN002228081 F protein taxon:684364 20191016 GO_Central PTHR10134:AN294

Both nodes have both NOT and contributes_to, but only one of the two was used in the IBD file for each node, apparently at random. Here are a few families that are missing the NOT qualifier: PTHR10134, PTHR10221, PTHR11361, PTHR12604.

There could be other families that are missing the contributes_to qualifier.
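A check like the one described above could be sketched roughly as follows. This is only an illustration, not code from this repo: it assumes a tab-separated GAF with the qualifier in column 4 and the evidence code in column 7, and the function name `ird_rows_missing_not` is made up for the example.

```python
def ird_rows_missing_not(gaf_lines):
    """Yield the columns of GAF rows that have IRD evidence but no NOT qualifier."""
    for line in gaf_lines:
        # Skip GAF header/comment lines and blank lines.
        if line.startswith("!") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 7:
            continue  # too short to be a data row
        # Column 4 (index 3) may hold several qualifiers separated by "|".
        qualifiers = cols[3].split("|") if cols[3] else []
        # Column 7 (index 6) is the evidence code.
        if cols[6] == "IRD" and "NOT" not in qualifiers:
            yield cols


# Usage with the two example rows from this comment.
rows = [
    "\t".join(["PANTHER", "PTN000806051", "PTN000806051", "contributes_to",
               "GO:0008121", "PMID:21873635", "IRD", "PANTHER:PTN002228081",
               "F", "", "", "protein", "taxon:10228", "20191016", "GO_Central"]),
    "\t".join(["PANTHER", "PTN000806047", "PTN000806047", "NOT",
               "GO:0008121", "PMID:21873635", "IRD", "PANTHER:PTN002228081",
               "F", "", "", "protein", "taxon:684364", "20191016", "GO_Central"]),
]
flagged = [cols[1] for cols in ird_rows_missing_not(rows)]
print(flagged)  # ['PTN000806051']
```

The same loop could be turned around to look for rows that lost contributes_to, but that would need the tree data to know which nodes should carry it.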

dustine32 commented 3 years ago

Multiple qualifiers are supported in both GAF 2.1 and 2.2; they are separated by | characters. We just need to ensure that the createGAF.pl script emits NOT|contributes_to for these example IRDs.
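To illustrate the pipe-separated form, here is a minimal Python sketch of combining qualifiers into one column. It is not the actual createGAF.pl logic: it assumes, hypothetically, that the input has one row per qualifier for the same annotation, and `merge_qualifiers` is a name invented for this example.

```python
def merge_qualifiers(rows):
    """Collapse rows that differ only in the qualifier column (index 3)
    into one row whose qualifier column is pipe-separated, NOT first."""
    merged = {}  # annotation key -> list of qualifiers, in first-seen order
    for cols in rows:
        # Everything except the qualifier column identifies the annotation.
        key = tuple(cols[:3] + cols[4:])
        quals = merged.setdefault(key, [])
        for q in cols[3].split("|"):
            if q and q not in quals:
                quals.append(q)

    out = []
    for key, quals in merged.items():
        # Put NOT first, then the remaining qualifiers alphabetically.
        quals.sort(key=lambda q: (q != "NOT", q))
        out.append(list(key[:3]) + ["|".join(quals)] + list(key[3:]))
    return out


# Usage: the same node and term, once per qualifier (hypothetical input shape).
common_tail = ["GO:0008121", "PMID:21873635", "IRD", "PANTHER:PTN002228081",
               "F", "", "", "protein", "taxon:10228", "20191016", "GO_Central"]
head = ["PANTHER", "PTN000806051", "PTN000806051"]
rows = [head + ["contributes_to"] + common_tail,
        head + ["NOT"] + common_tail]

merged = merge_qualifiers(rows)
print(merged[0][3])  # NOT|contributes_to
```

Sorting NOT to the front is a choice made for this sketch so that the output matches the NOT|contributes_to form mentioned above.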

dustine32 commented 3 years ago

@huaiyumi With this fix, the resulting IBD file and your examples in particular look much better:

PANTHER PTN000806051    PTN000806051    NOT|contributes_to  GO:0008121  PMID:21873635   IRD PANTHER:PTN002228081    F           protein taxon:10228 20191016    GO_Central
PANTHER PTN000806047    PTN000806047    NOT|contributes_to  GO:0008121  PMID:21873635   IRD PANTHER:PTN002228081    F           protein taxon:684364    20191016    GO_Central

I sent you the full IBD file for review.

huaiyumi commented 3 years ago

Looks good.