Open joeflack4 opened 10 months ago
Fantastic idea! THANK YOU!
This topic came up when working on https://github.com/monarch-initiative/mondo-ingest/pull/394.
Nico's suggestion on how to do this:
Start with an EPM. Load it into a `Converter`. Then do `.bimap()`, and it can generate the plain flat bimap, which I can save as CSV.
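As a rough illustration of the flattening step, here is a minimal sketch that uses only the standard library rather than `curies` itself. It assumes the EPM is a JSON list of records with `prefix` and `uri_prefix` keys plus optional synonym lists; the two records and the `flatten_epm` helper are made up for illustration. With `curies` installed, loading the EPM into a `Converter` and reading its bimap should yield the equivalent mapping.

```python
import json

# Illustrative EPM content (made-up sample records, not the mondo-ingest config).
EPM_JSON = """
[
  {
    "prefix": "CHEBI",
    "uri_prefix": "http://purl.obolibrary.org/obo/CHEBI_",
    "prefix_synonyms": ["chebi"],
    "uri_prefix_synonyms": ["https://identifiers.org/chebi:"]
  },
  {
    "prefix": "MONDO",
    "uri_prefix": "http://purl.obolibrary.org/obo/MONDO_"
  }
]
"""


def flatten_epm(records):
    """Keep only the canonical prefix -> URI prefix pair of each record.

    Hypothetical helper: synonyms are dropped, so the result is a flat,
    one-to-one prefix map.
    """
    return {record["prefix"]: record["uri_prefix"] for record in records}


bimap = flatten_epm(json.loads(EPM_JSON))
```

The resulting `bimap` is a plain dictionary, so it can be handed to any CSV or JSON writer.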
@joeflack4 can you point me towards the code that saves these CSVs? Maybe it's worth having an I/O function upstream in `curies` (or updating SemanticSQL to use standard file formats like EPMs ;))
@cthoyt I appreciate you chiming in! I wrote in the OP that "unless Charlie thinks it's a good idea for this functionality to go in the curies package", but I think actually we would prefer that.
I don't foresee SemanticSQL having the bandwidth to add EPMs now but I could be wrong.
Here's an example of a `prefixes.csv` compatible with SemanticSQL: https://github.com/monarch-initiative/mondo-ingest/blob/main/src/ontology/config/prefixes.csv
The two columns are `prefix` and `base`. In `curies`, `base` is called `uri_prefix`, so I guess there are basically two options:
a. `curies` could have something like a `.to_csv()` that calls the 2nd column `uri_prefix`. For `mondo-ingest` purposes, or for other SemanticSQL users, we'd use this method but then have to change the 2nd column header on our end.
b. The `curies` `.to_csv()` could have something like a `format` param with values like `standard` and `semanticsql`, or just a boolean `semanticsql` param.
Actually though, I just realized I am not sure if the headers (`prefix` and `base`) even matter to SemanticSQL. Perhaps it only cares about the column order? I don't see anything about the CSV in the docs.
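To make option (b) concrete, here is a minimal sketch of what such a writer might look like. `prefix_map_to_csv` and its boolean `semanticsql` parameter are hypothetical names, not an existing `curies` API, and the input is assumed to be a flat prefix map like the one a `Converter` bimap provides. Whether SemanticSQL needs a header row at all is still the open question above; the sketch always writes one.

```python
import csv
import io


def prefix_map_to_csv(prefix_map, semanticsql=False):
    """Serialize a flat prefix map to CSV text.

    Hypothetical helper: with semanticsql=True the second column is
    headed "base" (as in the mondo-ingest prefixes.csv); otherwise it
    is headed "uri_prefix" (the curies term).
    """
    header = ("prefix", "base") if semanticsql else ("prefix", "uri_prefix")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    # Sort by prefix so the output is stable across runs.
    for prefix, uri_prefix in sorted(prefix_map.items()):
        writer.writerow((prefix, uri_prefix))
    return buffer.getvalue()


# Illustrative input only.
example = {
    "MONDO": "http://purl.obolibrary.org/obo/MONDO_",
    "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
}
csv_text = prefix_map_to_csv(example, semanticsql=True)
```

With a single flag like this, option (a) is just the default call, and `mondo-ingest` would not need to rename the header afterwards.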
Overview
`mondo-ingest` utilizes SemanticSQL, which requires prefix maps in CSV form. We should create our own function in `mondolib` (unless Charlie thinks it's a good idea for this functionality to go in the `curies` package) that can generate this CSV so that we have 1 less place to maintain mappings.
Sub-tasks
`prefixes.csv` (https://github.com/biopragmatics/curies/issues/106)
`robot --add-prefixes` format: `context.json` (https://github.com/biopragmatics/curies/issues/106) (docs)
`docs/developer/add-new-source.md`: Remove sections about the above 2 files needing to be statically updated.
Additional information
Context
Comments in the synchronization: subclass axioms PR: 1, 2
Possible design
I think we should maintain these static files:
`metadata/mondo.sssom.config.yml`
`metadata/SOURCE.yml` for each source
Then, we should have some means of reading these in and instantiating a `curies.Converter`, and from that export to the needed formats.
Related