monarch-initiative / dipper

Data Ingestion Pipeline for Monarch
https://dipper.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Investigate loss of variant phenotype data from flybase #561

Closed by kshefchek 5 years ago

kshefchek commented 6 years ago

There's a drop in variant data in the latest ingest of flybase: https://archive.monarchinitiative.org/201801/ttl/flybase.nt

Edges (only indexed edges, all sources): is_expression_variant_of: 168690 (-26389)

Variant Phenotype Drosophila melanogaster: 0 (-50360)

Gene Phenotype Drosophila melanogaster: 189374 (-26184)
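To put the deltas above in proportion, here is a minimal Python sketch that parses QC lines of the form `label: count (delta)` and works out the previous count and the percentage change. The function names `parse_qc_line` and `flag_drops` and the threshold are hypothetical, introduced only for illustration; this is not part of dipper or the Monarch QC tooling.

```python
import re

# Matches "<label>: <count> (<signed delta>)"; the label may itself contain colons.
LINE_RE = re.compile(r"^(?P<label>.+):\s*(?P<count>\d+)\s*\((?P<delta>[+-]\d+)\)\s*$")


def parse_qc_line(line):
    """Parse one QC summary line (hypothetical helper, not a dipper API)."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError("unrecognised QC line: %r" % line)
    count = int(match.group("count"))
    delta = int(match.group("delta"))
    previous = count - delta  # delta is current minus previous
    pct_change = 100.0 * delta / previous if previous else None
    return {
        "label": match.group("label"),
        "count": count,
        "delta": delta,
        "previous": previous,
        "pct_change": pct_change,
    }


def flag_drops(lines, threshold_pct=10.0):
    """Return labels whose count fell by at least threshold_pct percent."""
    flagged = []
    for line in lines:
        row = parse_qc_line(line)
        if row["pct_change"] is not None and row["pct_change"] <= -threshold_pct:
            flagged.append(row["label"])
    return flagged


report = [
    "Edges (only indexed edges, all sources): is_expression_variant_of: 168690 (-26389)",
    "Variant Phenotype Drosophila melanogaster: 0 (-50360)",
    "Gene Phenotype Drosophila melanogaster: 189374 (-26184)",
]
for line in report:
    row = parse_qc_line(line)
    print(row["label"], row["previous"], "->", row["count"])
```

On these numbers, the variant phenotype count goes from 50360 to 0, a complete loss, while the other two counts each fall by roughly an eighth.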

In the meantime, I have reverted flybase to our December ingest.

matentzn commented 6 years ago

There was a new major FB release on Dec 31 (both data and website). @dosumis believes the schema should not have changed, but it could have. We will poke them and get back to you.

kshefchek commented 6 years ago

There have been no changes to the script, but in the latest load:

Edges (only indexed edges, all sources): is_expression_variant_of: 187220 (-7955)

Variant Phenotype Drosophila melanogaster: 47021 (-3339)

Gene Phenotype Drosophila melanogaster: 217731 (+2186)

Will still revert to the old file until this is fixed, but it is interesting that a chunk of it came back. Diff files:
https://data.monarchinitiative.org/qc/data-diff-201804090219/relations/is_expression_variant_of.tsv
https://data.monarchinitiative.org/qc/data-diff-201804090219/variant_phenotype_associations/drosophila_melanogaster.tsv
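For checking edge counts directly against two versions of an N-Triples dump, independently of the QC report, a rough Python sketch could count triples per predicate and report the differences. The helper names and the `example.org` IRIs are hypothetical placeholders, not the identifiers dipper actually emits, and the sketch assumes one well-formed triple per line.

```python
from collections import Counter


def count_predicates(lines):
    """Count triples per predicate in an iterable of N-Triples lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Subject and predicate contain no whitespace in N-Triples,
        # so the second whitespace-separated token is the predicate.
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # not a complete triple
        counts[parts[1]] += 1
    return counts


def count_changes(old, new):
    """Return {predicate: new_count - old_count} for predicates that changed."""
    changes = {}
    for predicate in set(old) | set(new):
        diff = new.get(predicate, 0) - old.get(predicate, 0)
        if diff != 0:
            changes[predicate] = diff
    return changes


# Hypothetical placeholder triples standing in for two loads of the same source.
old_load = [
    "<http://example.org/a1> <http://example.org/variant_of> <http://example.org/g1> .",
    "<http://example.org/a2> <http://example.org/variant_of> <http://example.org/g2> .",
    "<http://example.org/a1> <http://example.org/label> \"allele 1\" .",
]
new_load = [
    "<http://example.org/a1> <http://example.org/variant_of> <http://example.org/g1> .",
    "<http://example.org/a1> <http://example.org/label> \"allele 1\" .",
]

print(count_changes(count_predicates(old_load), count_predicates(new_load)))
```

With real files, the same functions can be fed open file handles, for example `count_predicates(open(path, encoding="utf-8"))`, since they only iterate over lines.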

kshefchek commented 5 years ago

Fixed with https://github.com/monarch-initiative/dipper/pull/760