monarch-initiative / monarch-app

Monarch Initiative website and API
https://monarchinitiative.org/
BSD 3-Clause "New" or "Revised" License

Add Knowledge Level and Agent Type fields #675

Closed kevinschaper closed 7 months ago

kevinschaper commented 7 months ago

To meet a Translator milestone we need to populate Knowledge Level and Agent Type on all of our edges in the 2024-04 Release.

Here are some guidelines: https://github.com/NCATSTranslator/ReasonerAPI/blob/master/ImplementationGuidance/Specifications/knowledge_level_agent_type_specification.md

Given the short turnaround, the goal is to populate the easy-to-find values first and set the rest to not_provided, so that we can at least validate. We should also prioritize edges in our Gene/Disease/Phenotype triangle.

kevinschaper commented 7 months ago

This is merged and deployed to beta.monarchinitiative.org, with deployment to production to follow in the next day or two.

colleenXu commented 5 months ago

I'm wondering: how complex is your implementation?

For example: do all the edges with the same subject-category/subject-namespace/predicate/object-category/object-namespace have the knowledge level / agent_type info?

kevinschaper commented 5 months ago

I think right now they're just fixed for each of our ingests, so even less granular than that.

colleenXu commented 5 months ago

Thanks for the quick reply!

I'm wondering if I can set a knowledge_level / agent_type for each of those combos for BTE/Service Provider's use in its annotation of this API, in the short-term. It sounds like I can but...

I'm not sure if each combo only has one ingest.

kevinschaper commented 5 months ago

edge_stats.csv

Here's a distinct + count on subject_category, subject_namespace, predicate, object_category, object_namespace, knowledge_level, agent_type, provided_by (provided_by being where I'm sticking the ingest name) that might be useful. It's a tsv, but GitHub doesn't accept the tsv extension, so I renamed it to csv.
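For readers who want to reproduce a "distinct + count" like this on their own edge file, here is a minimal sketch in Python. It is not the query that produced edge_stats.csv; it only assumes a tab-separated input whose header uses the column names listed above. The function name `distinct_counts` and the toy rows are made up for illustration.

```python
import csv
import io
from collections import Counter

# Column names taken from the comment above.
GROUP_COLUMNS = [
    "subject_category", "subject_namespace", "predicate",
    "object_category", "object_namespace",
    "knowledge_level", "agent_type", "provided_by",
]


def distinct_counts(rows, columns=GROUP_COLUMNS):
    """Count how many rows share each distinct combination of `columns`."""
    return Counter(tuple(row[c] for c in columns) for row in rows)


# Toy edges for illustration only (not real Monarch data).
SAMPLE_TSV = "\n".join([
    "\t".join(GROUP_COLUMNS),
    "\t".join(["biolink:Gene", "HGNC", "biolink:causes", "biolink:Disease",
               "MONDO", "knowledge_assertion", "manual_agent", "ingest_a"]),
    "\t".join(["biolink:Gene", "HGNC", "biolink:causes", "biolink:Disease",
               "MONDO", "knowledge_assertion", "manual_agent", "ingest_a"]),
    "\t".join(["biolink:Gene", "HGNC", "biolink:causes", "biolink:Disease",
               "MONDO", "not_provided", "not_provided", "ingest_b"]),
])

# csv.DictReader handles a tsv when given delimiter="\t".
reader = csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t")
counts = distinct_counts(reader)

for combo, n in counts.most_common():
    print("\t".join(combo), n, sep="\t")
```

To run this against a real file, replace `io.StringIO(SAMPLE_TSV)` with an open file handle; the file name is up to you.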

kevinschaper commented 5 months ago

definitely still fixed per ingest though:

knowledge_level      agent_type       provided_by                             count_star()
logical_entailment   automated_agent  hpoa_gene_to_phenotype_edges            308455
knowledge_assertion  manual_agent     zfin_gene_to_phenotype_edges            149558
knowledge_assertion  manual_agent     ctd_chemical_to_disease_edges           5644
knowledge_assertion  not_provided     panther_genome_orthologs_edges          551212
knowledge_assertion  not_provided     reactome_chemical_to_pathway_edges      68572
knowledge_assertion  manual_agent     go_annotation_edges                     2591280
knowledge_assertion  manual_agent     alliance_gene_to_phenotype_edges        304589
not_provided         not_provided     phenio_edges                            775320
knowledge_assertion  manual_agent     alliance_gene_to_expression_edges       1879092
knowledge_assertion  not_provided     bgee_gene_to_expression_edges           436178
knowledge_assertion  not_provided     reactome_gene_to_pathway_edges          203937
knowledge_assertion  manual_agent     hpoa_gene_to_disease_edges              15327
knowledge_assertion  manual_agent     dictybase_gene_to_phenotype_edges       1216
knowledge_assertion  manual_agent     pombase_gene_to_phenotype_edges         169665
knowledge_assertion  not_provided     biogrid_edges                           1418164
knowledge_assertion  manual_agent     hpoa_disease_mode_of_inheritance_edges  8535
knowledge_assertion  manual_agent     hpoa_disease_to_phenotype_edges         245981
knowledge_assertion  not_provided     string_protein_links_edges              1475505
knowledge_assertion  manual_agent     xenbase_gene_to_phenotype_edges         2232

colleenXu commented 5 months ago

Noticed for this line: the source says hpoa, but the actual edges' sources say omim/medgen

biolink:Gene HGNC biolink:causes biolink:Disease MONDO knowledge_assertion manual_agent hpoa_gene_to_disease_edges 6707
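
The open question in this thread, whether each subject-category/subject-namespace/predicate/object-category/object-namespace combo maps to a single ingest and a single knowledge_level / agent_type pair, can be checked mechanically against rows shaped like edge_stats.csv. The sketch below is one possible way to do that in Python, not something taken from the Monarch codebase. The function name `ambiguous_combos`, the helper `make_row`, and the toy rows (including the ingest names `ingest_a` and `ingest_b`) are hypothetical.

```python
from collections import defaultdict

# The five columns that define a "combo" in the discussion above.
COMBO_COLUMNS = [
    "subject_category", "subject_namespace", "predicate",
    "object_category", "object_namespace",
]
ALL_COLUMNS = COMBO_COLUMNS + ["knowledge_level", "agent_type", "provided_by"]


def ambiguous_combos(rows, value_columns):
    """Return the combos that carry more than one distinct value
    for `value_columns`, mapped to the set of values seen."""
    seen = defaultdict(set)
    for row in rows:
        combo = tuple(row[c] for c in COMBO_COLUMNS)
        seen[combo].add(tuple(row[c] for c in value_columns))
    return {combo: values for combo, values in seen.items() if len(values) > 1}


def make_row(*values):
    """Build a row dict from positional values (helper for the toy data)."""
    return dict(zip(ALL_COLUMNS, values))


# Toy rows for illustration only (not real Monarch data).
rows = [
    make_row("biolink:Gene", "HGNC", "biolink:has_phenotype",
             "biolink:PhenotypicFeature", "HP",
             "knowledge_assertion", "manual_agent", "ingest_a"),
    make_row("biolink:Gene", "HGNC", "biolink:has_phenotype",
             "biolink:PhenotypicFeature", "HP",
             "knowledge_assertion", "manual_agent", "ingest_b"),
    make_row("biolink:Gene", "HGNC", "biolink:causes",
             "biolink:Disease", "MONDO",
             "knowledge_assertion", "manual_agent", "ingest_a"),
]

# Combos fed by more than one ingest.
multi_ingest = ambiguous_combos(rows, ["provided_by"])
# Combos with conflicting knowledge_level / agent_type pairs.
conflicting = ambiguous_combos(rows, ["knowledge_level", "agent_type"])

print(len(multi_ingest), "combo(s) with more than one ingest")
print(len(conflicting), "combo(s) with conflicting knowledge_level/agent_type")
```

A combo that shows up in `multi_ingest` but not in `conflicting` can still be given one knowledge_level / agent_type pair, because all of its ingests agree. Only combos in `conflicting` would need finer-grained handling.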