caufieldjh closed 1 year ago
Where does this come from?
I extracted these from BioPortal database dumps. Because many of these prefixes only show up in the originally uploaded OWL, they aren't strictly "canonical" in the sense of being primary IRI prefixes, but all are used in one or more BioPortal ontologies.
Can you give more insight in how you extracted them?
Sure! The BioPortal backend stores all ontologies in an aging but still functional 4store RDF DB (though this will change quite soon). I have the dump of this DB as of July 20, 2022. Each entry contains the full set of triples for the most recent submission of each ontology, as RDF N-Triples.
From there, I use Bioportal-to-KGX to transform all ontologies to KGX TSV nodes/edges, converting node IDs to CURIEs wherever possible. I then check for any remaining IRIs with this script.
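The check for leftover IRIs could look roughly like the sketch below. This is not the script referenced above, just a minimal illustration under stated assumptions: the nodes file is a tab-separated table with an `id` column, and an unconverted node is any ID that still starts with `http://` or `https://`. The function names and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def iri_namespace(iri: str) -> str:
    """Return the IRI up to and including its last '#' or '/'."""
    cut = max(iri.rfind("#"), iri.rfind("/"))
    return iri[: cut + 1] if cut >= 0 else iri


def remaining_iri_prefixes(nodes_tsv: str, id_column: str = "id") -> Counter:
    """Count namespaces of node IDs that are still full IRIs, not CURIEs.

    Assumes a TSV with a header row containing `id_column`.
    """
    reader = csv.DictReader(io.StringIO(nodes_tsv), delimiter="\t")
    counts = Counter()
    for row in reader:
        node_id = (row.get(id_column) or "").strip()
        # Anything still starting with a URL scheme was not converted to a CURIE.
        if node_id.startswith(("http://", "https://")):
            counts[iri_namespace(node_id)] += 1
    return counts


# Hypothetical sample data, not taken from a real BioPortal dump.
sample = (
    "id\tcategory\tname\n"
    "NCBITAXON:9606\tbiolink:OrganismTaxon\tHomo sapiens\n"
    "http://example.org/onto#Thing1\tbiolink:NamedThing\tthing one\n"
    "http://example.org/onto#Thing2\tbiolink:NamedThing\tthing two\n"
    "http://example.org/other/Item3\tbiolink:NamedThing\titem three\n"
)

result = remaining_iri_prefixes(sample)
for namespace, count in result.most_common():
    print(count, namespace)
```

Splitting at the last `#` or `/` is only a heuristic for guessing a namespace; ontologies that embed the local ID differently would need their own rule.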
So there are three caveats re: getting a full set of prefixes from BioPortal:
For purposes of automating ETL, many (perhaps all?) IRIs may be retrieved through the BioPortal API, though that's not what I'm currently doing.
See also #4
These prefixes are curated from BioPortal ontologies. CURIE names correspond to BioPortal entries (e.g., NCBITAXON).