putmantime closed this 2 years ago
As a first pass, let's make a file with these columns:
- edge_file_name
- count of total edges
- count of edges with a missing subject or object
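A minimal sketch of that first-pass report, assuming the edge files are KGX-style TSVs named `*_edges.tsv` with `subject` and `object` header columns. The function names and the `total_edges` / `missing_subject_or_object` column names are hypothetical, chosen for illustration only:

```python
import csv
from pathlib import Path

# Hypothetical report columns; the file naming pattern below is also an assumption.
REPORT_COLUMNS = ["edge_file_name", "total_edges", "missing_subject_or_object"]


def edge_file_stats(path: Path) -> dict:
    """Count all edges in one TSV and those with an empty subject or object."""
    total = 0
    missing = 0
    with path.open(newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            total += 1
            # An absent column (None) and an empty/whitespace value both count as missing.
            subject = (row.get("subject") or "").strip()
            obj = (row.get("object") or "").strip()
            if not subject or not obj:
                missing += 1
    return {
        "edge_file_name": path.name,
        "total_edges": total,
        "missing_subject_or_object": missing,
    }


def write_edge_report(edge_dir: Path, report_path: Path) -> list:
    """Write one tab-separated report row per *_edges.tsv file in edge_dir."""
    rows = [edge_file_stats(p) for p in sorted(edge_dir.glob("*_edges.tsv"))]
    with report_path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=REPORT_COLUMNS, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Streaming rows through `csv.DictReader` keeps memory flat for large edge files; swapping in pandas would also work if the files fit in memory.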
I added myself to this; I'm working in the PR to connect the report to the rest of the code.
@kevinschaper I have filled out the functions we wrote last week, but I have not updated the PR yet. Would you like me to update it before you connect the rest of the code?
We would like to report the following metrics:
- [ ] CURIE prefixes that we have no nodes for
- [ ] Unique namespaces (set of CURIE prefixes in our graph)
- [ ] Counts by association type, triple type, and node category
- [ ] Per-ingest stats: counts by category, list of prefixes, Biolink schema
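One way the first three checklist items could be computed, sketched under these assumptions: node rows have `id` and a single-valued `category`, edge rows have `subject`, `predicate`, `object` and `category` (read as the association type), and a "triple type" is (subject category, predicate, object category). "Prefixes we have no nodes for" is interpreted here as prefixes that appear in edges but never in the nodes file. `graph_metrics` is a hypothetical helper, not existing code:

```python
from collections import Counter


def prefix(curie: str) -> str:
    """Return the prefix of a CURIE, e.g. 'HGNC:1100' -> 'HGNC'."""
    return curie.split(":", 1)[0]


def graph_metrics(nodes: list, edges: list) -> dict:
    """Compute checklist metrics from KGX-style node and edge rows (dicts)."""
    category_by_id = {n["id"]: n["category"] for n in nodes}
    node_prefixes = {prefix(n["id"]) for n in nodes}
    edge_prefixes = {prefix(e[k]) for e in edges for k in ("subject", "object")}

    # Endpoints without a node row get the placeholder category "unknown".
    triple_types = Counter(
        (
            category_by_id.get(e["subject"], "unknown"),
            e["predicate"],
            category_by_id.get(e["object"], "unknown"),
        )
        for e in edges
    )
    return {
        # Prefixes referenced by edges for which no node exists at all.
        "prefixes_without_nodes": sorted(edge_prefixes - node_prefixes),
        # Every namespace seen anywhere in the graph.
        "namespaces": sorted(node_prefixes | edge_prefixes),
        "association_types": Counter(e["category"] for e in edges),
        "triple_types": triple_types,
        "node_categories": Counter(n["category"] for n in nodes),
    }
```

For the per-ingest stats, the same function could be run separately on each ingest's own nodes and edges files, which yields per-ingest category counts and prefix lists; the Biolink schema check is not covered by this sketch.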
Two scenarios:
- before we merge in ontologies
- after we merge in ontologies
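To see what the ontology merge changes, any count computed in both scenarios (for example node categories) can be diffed. A minimal sketch; `compare_counts` is a hypothetical helper that reports only the keys whose count changed:

```python
from collections import Counter


def compare_counts(before: Counter, after: Counter) -> dict:
    """Return {key: (before, after, delta)} for every key whose count changed."""
    changes = {}
    for key in sorted(set(before) | set(after)):
        b, a = before.get(key, 0), after.get(key, 0)
        if a != b:
            changes[key] = (b, a, a - b)
    return changes
```

Keys that appear only after the merge (such as categories contributed solely by ontologies) show up with a `before` value of 0.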
A Jupyter notebook that pulls in the nodes and edges files and reports the metrics above. It needs to run on the most recent dated directory. This data lives in the monarch-ingest Google Cloud Storage bucket.
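Picking "the most recent dated directory" could look like the sketch below. It assumes release directories end in a `YYYY-MM-DD` component, which is not confirmed here; both function names are hypothetical. The bucket listing uses the `google-cloud-storage` client's `list_blobs` with `delimiter="/"`, whose iterator exposes directory-like `prefixes` once the results have been consumed:

```python
import re
from datetime import date
from typing import Iterable, Optional

# Assumption: the last path component of each release directory is YYYY-MM-DD.
DATED_DIR = re.compile(r"^(\d{4})-(\d{2})-(\d{2})$")


def latest_dated_directory(names: Iterable[str]) -> Optional[str]:
    """Return the name whose final component is the latest valid date, else None."""
    best, best_date = None, None
    for name in names:
        last = name.rstrip("/").rsplit("/", 1)[-1]
        match = DATED_DIR.match(last)
        if not match:
            continue  # skip non-dated entries such as "latest/"
        try:
            d = date(*map(int, match.groups()))
        except ValueError:
            continue  # e.g. "2022-13-40" is not a real date
        if best_date is None or d > best_date:
            best, best_date = name, d
    return best


def list_bucket_directories(bucket_name: str, prefix: str = "") -> list:
    """List directory-like prefixes directly under `prefix` in a GCS bucket.

    Requires the google-cloud-storage package and valid credentials.
    """
    from google.cloud import storage  # lazy import so the helper above works offline

    client = storage.Client()
    blobs = client.list_blobs(bucket_name, prefix=prefix, delimiter="/")
    for _ in blobs:  # `prefixes` is filled in as result pages are consumed
        pass
    return sorted(blobs.prefixes)
```

In the notebook this might be called as `latest_dated_directory(list_bucket_directories("monarch-ingest"))`, assuming the bucket is literally named `monarch-ingest` and the dated directories sit at its top level.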
Let's look at a Google Colab notebook for developing this.