dustine32 opened this issue 5 years ago
TABLE 1: Number of nodes in families over the different PANTHER versions. HEADER: PTHR ID | PANTHER version | Number of nodes | Number of leaves (one row per PTHR ID per version)
TABLE 2: IBD count per family
TABLE 3: IBA count per node
LIST 1: List of PTHR families that had annotations in the previous version and that have 0 annotations in current release
LIST 2: Comments and status
LIST 3: Remarks
Pascale
I have a rough draft of the "table 3" report, in two versions, uploaded to the Drive folder:
- 2019-06-17-iba_count - lists ancestor nodes and the count diffs of IBA GAF lines derived from them
- 2019-06-17-iba_count_mods_only - same as above, except that it only counts IBA GAF lines for the 12 MOD organisms:
"taxon:3702", # arabidopsis
"taxon:6239", # nematode_worm
"taxon:7955", # zebrafish
"taxon:44689", # dictyostelium
"taxon:7227", # fruit_fly
"taxon:227321", # aspergillus
"taxon:83333", # e_coli
"taxon:9031", # chicken
"taxon:10090", # mouse
"taxon:10116", # rat
"taxon:559292", # budding_yeast
"taxon:284812" # fission_yeast
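For reference, the MOD-only filtering presumably keys off the GAF taxon column. Here is a minimal sketch (the function name `is_mod_iba` is ours, not from the pipeline), assuming the standard GAF 2.x layout with the evidence code in column 7 and the taxon in column 13:

```python
# Hypothetical sketch: filter GAF lines down to IBA annotations for the 12 MOD taxa.
MOD_TAXA = {
    "taxon:3702",    # arabidopsis
    "taxon:6239",    # nematode_worm
    "taxon:7955",    # zebrafish
    "taxon:44689",   # dictyostelium
    "taxon:7227",    # fruit_fly
    "taxon:227321",  # aspergillus
    "taxon:83333",   # e_coli
    "taxon:9031",    # chicken
    "taxon:10090",   # mouse
    "taxon:10116",   # rat
    "taxon:559292",  # budding_yeast
    "taxon:284812",  # fission_yeast
}

def is_mod_iba(gaf_line: str) -> bool:
    """True if this GAF line is an IBA annotation to one of the MOD taxa."""
    if gaf_line.startswith("!"):  # skip header/comment lines
        return False
    cols = gaf_line.rstrip("\n").split("\t")
    # Evidence code is column 7 (index 6); taxon is column 13 (index 12).
    # Multi-organism lines carry "taxon:X|taxon:Y"; the subject taxon comes first.
    return cols[6] == "IBA" and cols[12].split("|")[0] in MOD_TAXA
```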
A few details of note about these lines:
- Panther ##.# family count col for the version where it's absent.

@pgaudet @huaiyumi We found a bug in how the GAFs used for this report were generated. Basically, one of the input file paths was hard-coded to a 13.1-specific node file, which prevented a lot of 14.1 IBAs from being written. I fixed the file paths and regenerated the 14.1 GAFs and the corresponding reports:
- 2019-06-25-iba_count
- 2019-06-25-iba_count_mods_only
We should probably delete the 2019-06-17 reports, or at least mark them as inaccurate.
@pgaudet For lists 2 and 3 above (curation status and comments), should we limit these lists to only records created or updated during the update? The previous curation-status list I sent you covered the whole table.
We now have a query/script for generating PAINT annotation diffs (at least by count) between PANTHER versions (e.g. 13.1 vs 14.1). I'll need to parameterize it (along with some other changes) so it can plug into the update pipeline.
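To illustrate the shape of such a count diff, here is a hedged sketch, not the actual query. It assumes the IBA lines carry the ancestral PANTHER node (PTN id) in the With/From column (column 8), which is what the report keys on; the function names are ours:

```python
from collections import Counter

def iba_counts_by_node(gaf_path):
    """Count IBA GAF lines per ancestral PANTHER node (PTN id in With/From)."""
    counts = Counter()
    with open(gaf_path) as f:
        for line in f:
            if line.startswith("!"):  # skip header/comment lines
                continue
            cols = line.rstrip("\n").split("\t")
            if cols[6] != "IBA":  # evidence code, column 7
                continue
            # With/From (column 8) may hold several pipe-separated references.
            for ref in cols[7].split("|"):
                if ref.startswith("PANTHER:PTN"):
                    counts[ref] += 1
    return counts

def diff_counts(old_counts, new_counts):
    """Per-node delta (new - old); a node absent in one version counts as 0."""
    nodes = set(old_counts) | set(new_counts)
    return {n: new_counts.get(n, 0) - old_counts.get(n, 0) for n in nodes}
```

A node present only in the old version then shows a negative delta, which is exactly the signal needed for "list 1" (families that dropped to 0 annotations).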
We should also automatically publish these logs to the Google Drive folder, possibly through an API hook.