pantherdb / fullgo_paint_update

Update of Panther and PAINT DBs with monthly GO release data

Log paint annotation change counts for each update #31

Open dustine32 opened 5 years ago

dustine32 commented 5 years ago

We now have a query/script for generating paint annotation diffs (at least by count) between Panther versions (e.g. 13.1 vs 14.1). I'll need to parameterize this (along with some other changes) for it to plug into the update pipeline.
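
As a rough illustration of what the parameterized count diff could look like once the per-version counts are in hand, here is a minimal Python sketch. It is not the actual query/script: the function name `count_diffs`, the family IDs, and the counts are all hypothetical, and it assumes the counts for each version have already been loaded into plain dicts keyed by family or node ID.

```python
def count_diffs(old_counts, new_counts):
    """Return {key: (old, new, new - old)} for every key whose count changed.

    Keys missing from one version are treated as having a count of 0.
    """
    diffs = {}
    for key in sorted(set(old_counts) | set(new_counts)):
        old = old_counts.get(key, 0)
        new = new_counts.get(key, 0)
        if old != new:
            diffs[key] = (old, new, new - old)
    return diffs


if __name__ == "__main__":
    # Hypothetical per-family annotation counts for two PANTHER versions.
    counts_old = {"PTHR10000": 12, "PTHR10001": 5, "PTHR10002": 3}
    counts_new = {"PTHR10000": 15, "PTHR10001": 5, "PTHR10003": 7}

    print("id\told\tnew\tdelta")
    for key, (old, new, delta) in count_diffs(counts_old, counts_new).items():
        print(f"{key}\t{old}\t{new}\t{delta:+d}")
```

Keeping the version pair out of the function and passing in two count mappings means the same code can be called for any pair of releases from the update pipeline.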

We should also automatically publish these logs to a Google Drive folder, possibly through an API hook.

pgaudet commented 5 years ago

TABLE 1: Number of nodes in families over the different PANTHER versions
HEADER: PTHR ID | PANTHER version * PTHR ID | Number of nodes | Number of leaves

TABLE 2: IBD count per family

TABLE 3: IBA count per node

LIST 1: List of PTHR families that had annotations in the previous version and that have 0 annotations in current release

LIST 2: Comments and status

LIST 3: Remarks
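
For LIST 1, one possible way to derive the list from per-family annotation counts is sketched below in Python. The function name `families_dropped_to_zero` and the example family IDs are hypothetical, and the sketch assumes the counts for the previous and current releases are available as dicts keyed by PTHR ID, with families absent from the current release counted as 0.

```python
def families_dropped_to_zero(prev_counts, curr_counts):
    """Return the sorted PTHR IDs that had annotations before and have none now."""
    return sorted(
        family
        for family, count in prev_counts.items()
        if count > 0 and curr_counts.get(family, 0) == 0
    )


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    previous = {"PTHR10000": 4, "PTHR10001": 0, "PTHR10002": 9}
    current = {"PTHR10000": 6, "PTHR10001": 0}
    for family in families_dropped_to_zero(previous, current):
        print(family)
```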


Pascale

pgaudet commented 5 years ago

Please save to: https://drive.google.com/drive/folders/1MrtIQVmtdfd6gJhVcEfofXrU0IIPnOW7

dustine32 commented 5 years ago

I have a rough draft of the "table 3" report, in two versions, uploaded to the Drive folder:

- 2019-06-17-iba_count - Lists ancestor nodes and the count diffs of IBA GAF lines derived from them.
- 2019-06-17-iba_count_mods_only - Same as above, except that it only counts IBA GAF lines for the 12 MOD organisms:

    "taxon:3702",       # arabidopsis
    "taxon:6239",       # nematode_worm
    "taxon:7955",       # zebrafish
    "taxon:44689",      # dictyostelium
    "taxon:7227",       # fruit_fly
    "taxon:227321",     # aspergillus
    "taxon:83333",      # e_coli
    "taxon:9031",       # chicken
    "taxon:10090",      # mouse
    "taxon:10116",      # rat
    "taxon:559292",     # budding_yeast
    "taxon:284812"      # fission_yeast
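
To make the "mods only" counting concrete, here is a minimal Python sketch of how IBA GAF lines could be tallied per ancestor node, optionally restricted to the taxa above. It is not the script used for the reports. It relies on the standard GAF column layout (evidence code in column 7, With/From in column 8, taxon in column 13) and assumes, as an assumption of this sketch, that each IBA line names its ancestor node in the With/From column as `PANTHER:PTN...`. The names `MOD_TAXA`, `PTN_RE`, and `count_iba_by_node` are hypothetical.

```python
import re
from collections import Counter

# The 12 MOD taxa listed above.
MOD_TAXA = {
    "taxon:3702", "taxon:6239", "taxon:7955", "taxon:44689",
    "taxon:7227", "taxon:227321", "taxon:83333", "taxon:9031",
    "taxon:10090", "taxon:10116", "taxon:559292", "taxon:284812",
}

# Assumption: the ancestor node appears in With/From as "PANTHER:PTN<digits>".
PTN_RE = re.compile(r"PANTHER:(PTN\d+)")


def count_iba_by_node(gaf_lines, taxa=None):
    """Count IBA GAF lines per ancestor node, optionally limited to `taxa`."""
    counts = Counter()
    for line in gaf_lines:
        if line.startswith("!"):  # GAF header/comment line
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 13 or cols[6] != "IBA":
            continue
        # Column 13 can hold "taxon:A|taxon:B"; the first entry is the annotated organism.
        if taxa is not None and cols[12].split("|")[0] not in taxa:
            continue
        match = PTN_RE.search(cols[7])
        if match:
            counts[match.group(1)] += 1
    return counts
```

Calling `count_iba_by_node(lines)` would correspond to the full report and `count_iba_by_node(lines, MOD_TAXA)` to the MOD-only variant; the per-node counts for two releases can then be diffed.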

A few details of note about these lines:

dustine32 commented 5 years ago

@pgaudet @huaiyumi We found a bug in how the GAFs used for this report were generated. Basically, one of the input file paths was hard-coded to a 13.1-specific node file, which prevented A LOT of 14.1 IBAs from being written. I fixed the file paths and regenerated the 14.1 GAFs and corresponding reports:

- 2019-06-25-iba_count
- 2019-06-25-iba_count_mods_only

We should probably delete the 2019-06-17 reports or at least mark them inaccurate.

dustine32 commented 5 years ago

@pgaudet For lists 2 and 3 above (curation status and comments), should we limit these lists to only the records created or updated during the update? The previous curation status list I sent you was the whole table.