Open dustine32 opened 4 years ago
@pgaudet I've implemented the IBA block for PAINT vs. exp NOT qualifier conflicts but have not yet pushed any new IBA files. I did a test run and generated a before/after report tracking IBA count differences.
Would you be able to spot-check this report for any unintended effects? What works for me is plugging the PTHR family and GO term into AmiGO and then looking for the NOT. In the meantime, I'm working on getting the actual list of to-be-dropped IBA lines (there are 269).
(Taking notes for myself)
For testing, I generated two sets of IBA GAFs (before and after code change) and ran these commands to get all dropped IBAs:
$ cat 2019-11-20_fullgo_test/IBA_GAFs/* > 2019-11-20_fullgo_test/all_IBAs
$ cat 2019-11-20_fullgo_test/preupdate_data/IBA_GAFs/* > 2019-11-20_fullgo_test/preupdate_data/all_IBAs
$ diff -u 2019-11-20_fullgo_test/preupdate_data/all_IBAs 2019-11-20_fullgo_test/all_IBAs | grep -E "^\-" > 2019-11-20_fullgo_test/dropped_IBAs_raw
$ grep -v "Created on" 2019-11-20_fullgo_test/dropped_IBAs_raw | grep -v "2019-11-20_fullgo_test" | sed 's/^-//' > 2019-11-20_fullgo_test/dropped_IBAs
$ wc -l 2019-11-20_fullgo_test/dropped_IBAs
324
That would mean 324 IBAs were dropped due to this code change. However, this number doesn't line up with the report, which says 269 lines were dropped. Spot-checking some of the lines whose IBD PTNs are not in the report (e.g. PTN001998491), I notice that these lines appear in both the before and after IBA files, with no difference as far as I can tell (I tried several diff options and looked for hidden characters). I'm guessing diff is playing tricks on me or something.
I can xref the report's IBD nodes to filter out lines that shouldn't be there.
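One possible cause of the mismatch, not confirmed here, is that `diff -u` is sensitive to line order: if an identical IBA line sits at a different position in the two concatenated files, it can be reported once with `-` and once with `+`, and the `grep -E "^\-"` step keeps the `-` copy. A minimal sketch of an order-insensitive comparison follows, assuming a POSIX shell with `sort`, `comm`, and `mktemp`. The function name `dropped_lines` is hypothetical; the `Created on` filter is carried over from the commands above.

```shell
# Hypothetical helper: print lines present in the "before" file ($1) but
# absent from the "after" file ($2), regardless of where they appear.
dropped_lines() {
    before_sorted=$(mktemp)
    after_sorted=$(mktemp)
    # Drop the "Created on" header lines, then sort and de-duplicate.
    # LC_ALL=C keeps the sort order consistent with what comm expects.
    grep -v "Created on" "$1" | LC_ALL=C sort -u > "$before_sorted"
    grep -v "Created on" "$2" | LC_ALL=C sort -u > "$after_sorted"
    # comm -23 keeps only the lines unique to the first (before) file.
    LC_ALL=C comm -23 "$before_sorted" "$after_sorted"
    rm -f "$before_sorted" "$after_sorted"
}

# Usage with the files built above:
#   dropped_lines 2019-11-20_fullgo_test/preupdate_data/all_IBAs \
#                 2019-11-20_fullgo_test/all_IBAs | wc -l
```

Because `sort -u` collapses exact duplicates, a line that occurs several times in the before file is counted once.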
Hi @dustine32
Do you mean that this script gets rid of the inferred NOT IBA here (from PTHR13271)?
I also checked PTHR10024 - it also seems OK.
Probably the way to be sure is to export the GAF for each of the impacted families. Is that 'easy'? Thanks, Pascale
@pgaudet Yep, that inferred NOT IBA should be removed by the code change due to its conflict with that positive IDA.
That's a great idea about just getting the GAFs for the impacted families. That might also clear up the weirdness I'm seeing trying to get an accurate diff of dropped lines.
@pgaudet Finally, I've got an accurate list of dropped IBAs for you to look at. I ended up combining your idea of outputting only the impacted families with my earlier diff and grep approach.
Basically, outputting all IBA GAFs for the IBD PTNs in the before/after report and then applying the diff/grep commands above gets me to the expected 269 count. This GAF file is uploaded to the Google Drive for your downloading convenience.
For your PTHR13271 peptidyl-lysine trimethylation (GO:0018023) example, only one IBA was shown as dropped:
UniProtKB Q86TU7 SETD3 GO:0018023 PMID:21873635 IBA PANTHER:PTN000998435|ZFIN:ZDB-GENE-030131-9137 P Histone-lysine N-methyltransferase setd3 UniProtKB:Q86TU7|PTN002491248 protein taxon:9606 20170228 GO_Central
But this one is positive (no NOT qualifier). I actually answered your question earlier without knowing the gene that the IBA in question was for, so... is this (UniProtKB:Q86TU7) your card (gene)?
As explained in https://github.com/pantherdb/fullgo_paint_update/issues/30#issuecomment-549941317, I'll need to implement a check in the IBA generation script that blocks an IBA annotation to a leaf if that specific leaf has an experimental annotation with a conflicting qualifier. Right now, I'll only check "NOT" vs "no qualifier" conflicts. I believe matching other qualifiers like "contributes_to" is still in discussion.
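The check described above could be sketched as a post-filter over GAF files. This is not the actual IBA generation script. It assumes tab-separated GAF columns (column 1 = DB, column 2 = object ID, column 4 = qualifier, column 5 = GO ID), an input file that already contains only experimental annotations, and exact GO term matching with no ontology reasoning. The function name `block_conflicting_ibas` is hypothetical.

```shell
# Hypothetical filter: print the candidate IBA lines ($2) that do not conflict
# with an experimental annotation ($1) on the same leaf and the same GO term.
# A conflict means one side carries NOT and the other does not.
block_conflicting_ibas() {
    awk -F '\t' '
        # 1 when the qualifier column holds NOT as one of its |-separated values.
        function negated(q) { return ("|" q "|") ~ /\|NOT\|/ }
        /^!/ { next }    # skip GAF comment lines in both files
        # First file: remember (leaf, GO term, NOT flag) for each annotation.
        pass == 1 { seen[$1 ":" $2, $5, negated($4)] = 1; next }
        # Second file: keep an IBA only if no experimental annotation exists
        # for the same leaf and term with the opposite NOT flag.
        !(($1 ":" $2, $5, !negated($4)) in seen)
    ' pass=1 "$1" pass=2 "$2"
}

# Usage (file names are placeholders):
#   block_conflicting_ibas experimental.gaf candidate_IBAs.gaf > filtered_IBAs.gaf
```

Only the conflicting leaf is filtered; IBA lines for other leaves are printed unchanged. Other qualifiers such as "contributes_to" are ignored by this sketch.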
An example case is shown here: The IBD on PTN000185192 is still valid and can be used to propagate to its other descendant leaf sequences, but the experimental NOT IGI annotation on PomBase:SPAC1B3.15c should block IBA propagation to this leaf.
Related tickets: https://github.com/geneontology/paint/issues/54 https://github.com/geneontology/go-annotation/issues/2378