pantherdb / fullgo_paint_update

Update of Panther and PAINT DBs with monthly GO release data

Handle qualifiers for new GAF 2.2 spec #45

Open dustine32 opened 4 years ago

dustine32 commented 4 years ago

From @thomaspd's email:

Hi all,

A change in the GAF format is coming, probably in July. The only change will be the qualifier column. For BP annotations, there will be additional qualifiers that describe the relationship between the gene product and the BP term. We will need to change our parsers to parse this out, and make it part of the GO load each month.

So all the proposed changes in this email are not urgent, but should be done by July.

We will also want to make use of this information in the enrichment tool. We will want to distinguish between two different types of BP: ones where the gene product has the “part_of” qualifier, versus ones that do not. Some organisms will distinguish between these, while others will not. So we’ll have to keep track of which organisms have BP annotations of these two different types. If they don’t distinguish them, there will be no change for the enrichment tool. If they do distinguish them, there will be an additional choice in the dropdown menu in addition to GO BP complete, something like “GO BP directly involved”, which will be the set that has only BP annotations that have the part_of qualifier. GO BP complete will continue to have all BP annotations.

Thanks,

Paul.
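To make the parsing and per-organism tracking described in the email concrete, here is a minimal sketch. It assumes standard tab-separated GAF rows (qualifier in column 4, aspect in column 9, taxon in column 13, qualifiers pipe-separated). The function names are hypothetical, and the `direct_qualifier` default of `part_of` is simply the value named in the email, not a confirmed spec value.

```python
from collections import defaultdict

QUALIFIER_COL = 3   # 4th GAF column, pipe-separated qualifiers
ASPECT_COL = 8      # 9th GAF column: P, F, or C
TAXON_COL = 12      # 13th GAF column, e.g. "taxon:9606"


def parse_qualifiers(gaf_line):
    """Return the list of qualifiers found on one GAF row."""
    cols = gaf_line.rstrip("\n").split("\t")
    return [q for q in cols[QUALIFIER_COL].split("|") if q]


def organisms_distinguishing_bp(gaf_lines, direct_qualifier="part_of"):
    """Return taxa that have BP annotations both with and without
    the 'direct' qualifier (hypothetical helper; the qualifier name
    is an assumption taken from the email)."""
    seen = defaultdict(set)  # taxon -> set of booleans observed
    for line in gaf_lines:
        if line.startswith("!"):  # skip GAF header/comment lines
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) <= TAXON_COL or cols[ASPECT_COL] != "P":
            continue
        taxon = cols[TAXON_COL].split("|")[0]
        has_direct = direct_qualifier in cols[QUALIFIER_COL].split("|")
        seen[taxon].add(has_direct)
    return {taxon for taxon, flags in seen.items() if flags == {True, False}}
```

Only taxa returned by `organisms_distinguishing_bp` would need the extra dropdown entry; for all others the enrichment tool would stay unchanged.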

More info here: https://github.com/geneontology/go-annotation/issues/2917. This has ramifications for both PANTHER and PAINT loads.

PANTHER: It looks like the genelist_agg table may need to be adjusted to retain qualifiers with the GO terms associated with a gene. Qualifiers are already partly factored into loading the genelist_agg table, since NOT annotations are excluded.
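A rough sketch of what retaining qualifiers alongside GO terms could look like while still excluding NOT annotations. The actual genelist_agg schema and loader are not shown in this thread, so the input shape (tuples of gene ID, GO term, qualifier string) and the function name are assumptions for illustration only.

```python
from collections import defaultdict


def aggregate_gene_terms(annotations):
    """Group (go_term, qualifiers) pairs by gene, skipping NOT annotations.

    annotations: iterable of (gene_id, go_term, qualifier_string), where
    qualifier_string is pipe-separated as in the GAF qualifier column.
    This input shape is hypothetical, not the real genelist_agg loader.
    """
    agg = defaultdict(set)
    for gene_id, go_term, qualifier_string in annotations:
        qualifiers = tuple(sorted(q for q in qualifier_string.split("|") if q))
        if "NOT" in qualifiers:
            continue  # NOT annotations are excluded, as in the current load
        agg[gene_id].add((go_term, qualifiers))
    return agg
```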

PAINT: The qualifier column is already parsed and loaded into the Curation DB for GO annotations. But with more GO annotations having qualifiers that likely won't (initially) match the PAINT annotation qualifiers, mismatches will cause many PAINT annotations to be obsoleted during a full GO update. Maybe we should have some sort of rule-based update followed by manual review for the initial load of the newly formatted GAFs?

It would be nice to get some sample, preview data. For example, an experimental GO annotation currently used as evidence in PAINT that also has qualifiers in whatever curation tool it's sourced from (e.g. Protein2GO).

The GAF creation script likely won't need much modification but I believe there are some regexes used that specifically look for CONTRIBUTES_TO and COLOCALIZES_WITH.
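To illustrate the concern, here is a sketch contrasting a pattern hard-coded to the two legacy qualifiers with a generic tokenizer. Neither pattern is copied from the actual GAF creation script; both regexes and function names are hypothetical.

```python
import re

# Hypothetical stand-in for a pattern that only knows the two legacy qualifiers.
LEGACY_QUALIFIER_RE = re.compile(r"\b(CONTRIBUTES_TO|COLOCALIZES_WITH)\b", re.IGNORECASE)

# Generic alternative: any run of letters/underscores between pipes.
QUALIFIER_TOKEN_RE = re.compile(r"[A-Za-z_]+")


def legacy_qualifiers(qualifier_field):
    """Return only contributes_to / colocalizes_with, lowercased."""
    return [m.lower() for m in LEGACY_QUALIFIER_RE.findall(qualifier_field)]


def all_qualifiers(qualifier_field):
    """Return every qualifier token, including new relations."""
    return QUALIFIER_TOKEN_RE.findall(qualifier_field)
```

With a field such as `NOT|involved_in`, the legacy pattern finds nothing, while the generic tokenizer returns both tokens, which is the gap the regexes would need to cover.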

Also tagging @huaiyumi and @mugitty

dustine32 commented 3 years ago

@pgaudet @thomaspd @huaiyumi For exporting the IBAs to GAF 2.2, what should the default qualifiers be by aspect?

Also, if an IBA has a NOT qualifier, it will be included as usual with the "relation" qualifier determined above (e.g. enables, part_of). If the IBA also has a contributes_to or colocalizes_with qualifier, should these be appended to the IBA's qualifier list or replace the "relation" qualifier?

thomaspd commented 3 years ago

I just looked at: http://wiki.geneontology.org/index.php/Involved_in, and the gp -> bp for PAINT should be involved_in

dustine32 commented 3 years ago

@thomaspd Sweet! Thanks for straightening this out.

And for the existing qualifiers in PAINT (contributes_to, colocalizes_with), I'm now assuming these should replace the default qualifier if one exists, since, for example, an annotation with both contributes_to and involved_in seems redundant and/or confusing, right?

pgaudet commented 3 years ago

If there already is a qualifier, you keep that qualifier.

dustine32 commented 3 years ago

@thomaspd @pgaudet Doh! I just noticed we may be using the wrong default qualifier for CC after rereading this - https://github.com/geneontology/go-annotation/issues/2917#issue-594500313. Should the CC default be located_in instead of part_of?

pgaudet commented 3 years ago

No, I don't think PAINT should be using default qualifiers. Default qualifiers are used when we are not sure the protein is active in the specified location (or plays a part in a process). In PAINT, our annotation guidelines are to only propagate CC that are consistent with the role of the protein.

Thanks, Pascale

pgaudet commented 3 years ago

Actually, it looks like 'is_active_in' is allowed; this is the best one for PAINT.

dustine32 commented 3 years ago

@pgaudet Thanks! I'll use is_active_in for CC then. I should note these default qualifiers only come into play when exporting the IBAs to GAF 2.2, since the qualifier column now requires a value. These default qualifiers won't be stored in PAINT and you won't see them in the tool.

dustine32 commented 3 years ago

@pgaudet @thomaspd Looks like we also have IBDs to protein-containing complex descendants. For complex terms, the default qualifier should be part_of, right?

pgaudet commented 3 years ago

Yes !
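The export rules agreed on so far in this thread could be sketched as follows: involved_in for BP, is_active_in for CC, part_of for protein-containing complex terms, an existing contributes_to or colocalizes_with kept in place of the default, and NOT prepended when present. The MF default of `enables` is an assumption, since it appears above only as an example. The function name is hypothetical, and deciding whether a term is a complex descendant is left to the caller.

```python
DEFAULT_QUALIFIER_BY_ASPECT = {
    "F": "enables",        # assumption: only mentioned as an example in the thread
    "P": "involved_in",
    "C": "is_active_in",
}
COMPLEX_DEFAULT = "part_of"  # for protein-containing complex descendants
EXISTING_RELATIONS = {"contributes_to", "colocalizes_with"}


def export_qualifier(aspect, paint_qualifiers, is_complex_term=False):
    """Build the GAF 2.2 qualifier string for one IBA at export time.

    paint_qualifiers: qualifiers stored in PAINT (any case).
    is_complex_term: caller-supplied flag; the ontology lookup is not shown.
    """
    normalized = {q.lower() for q in paint_qualifiers}
    existing = sorted(normalized & EXISTING_RELATIONS)
    if existing:
        # Keep the qualifier that is already there (first alphabetically
        # if, unexpectedly, both are present).
        relation = existing[0]
    elif is_complex_term:
        relation = COMPLEX_DEFAULT
    else:
        relation = DEFAULT_QUALIFIER_BY_ASPECT[aspect]
    return f"NOT|{relation}" if "not" in normalized else relation
```

Because the defaults are applied only at export, nothing here would need to be written back to PAINT.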

dustine32 commented 3 years ago

Commit https://github.com/pantherdb/fullgo_paint_update/commit/732ffe8bf52433e4cd9390bae4c4d06c797543b2 prevents new gp2term relations from GAF 2.2 from getting into PAINT and PANTHER.

Noting that this is a temporary, short-term (and easily revertible) solution to get the GAF 2.2-sourced annotations into PAINT/PANTHER without mucking up the existing load process. We'll discuss/document the actual policy to implement on a PAINT call.
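For readers who have not opened the commit, one way such a short-term filter could work is to drop any qualifier outside the set PAINT already understood before GAF 2.2. This is a sketch of the general idea only, not a description of what the commit actually does; the function and constant names are hypothetical.

```python
# Qualifiers that existed before GAF 2.2 introduced gp2term relations.
LEGACY_QUALIFIERS = {"NOT", "contributes_to", "colocalizes_with"}


def strip_new_relations(qualifier_field):
    """Return the pipe-separated qualifier field with new relations removed."""
    kept = [q for q in qualifier_field.split("|") if q in LEGACY_QUALIFIERS]
    return "|".join(kept)
```

A filter of this shape is easy to revert, since removing the call restores the full qualifier field.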

pgaudet commented 3 years ago

Thanks @dustine32 Added to the next call's agenda: http://wiki.geneontology.org/index.php/1_Jun_2021_PAINT_Conference_Call

@huaiyumi