monarch-initiative / mondolib

Python library for mondo QC
https://monarch-initiative.github.io/mondolib/
MIT License

`Converter` exporters #8

Open joeflack4 opened 10 months ago

joeflack4 commented 10 months ago

Overview

mondo-ingest uses SemanticSQL, which requires prefix maps in CSV form. We should create our own function in mondolib (unless Charlie thinks it's a good idea for this functionality to go in the curies package) that can generate this CSV, so that we have one fewer place to maintain mappings.

Sub-tasks

Additional information

Context

Comments in the synchronization: subclass axioms PR: 1, 2

Possible design

I think we should maintain these static files:

Then, we should have some means of reading these in and instantiating a curies.Converter and from that export to the needed formats.

Related

matentzn commented 10 months ago

Fantastic idea! THANK YOU!

joeflack4 commented 8 months ago

This topic came up when working on https://github.com/monarch-initiative/mondo-ingest/pull/394.

Nico's suggestion on how to do this:

Start with EPM. Load into Converter. Then do .bimap(), and it can generate the plain flat bimap which I can save as CSV.
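
A minimal sketch of that flow, using only the standard library so it runs without curies installed. The records below are hypothetical stand-ins for an EPM (extended prefix map), and `epm_to_bimap` and `bimap_to_csv` are illustrative names, not existing mondolib or curies functions. With curies itself, the flat map would presumably come from the `Converter` rather than from a hand-written helper.

```python
import csv
import io

# Hypothetical EPM records; illustrative only, not copied from mondo-ingest.
EPM = [
    {
        "prefix": "MONDO",
        "uri_prefix": "http://purl.obolibrary.org/obo/MONDO_",
        "prefix_synonyms": ["mondo"],
    },
    {
        "prefix": "DOID",
        "uri_prefix": "http://purl.obolibrary.org/obo/DOID_",
    },
]


def epm_to_bimap(records):
    """Keep each record's preferred prefix and URI prefix, dropping synonyms."""
    return {record["prefix"]: record["uri_prefix"] for record in records}


def bimap_to_csv(bimap):
    """Serialize a flat prefix map as CSV text with a `prefix,base` header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(("prefix", "base"))
    writer.writerows(bimap.items())
    return buffer.getvalue()


csv_text = bimap_to_csv(epm_to_bimap(EPM))
print(csv_text)
```

Synonyms are dropped on purpose here, since the flat two-column CSV has room for only one URI prefix per prefix.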

cthoyt commented 8 months ago

@joeflack4 can you point me towards the code that saves these CSVs? maybe worth having an I/O function upstream in curies (or to update SemanticSQL to use standard file formats like EPMs ;))

joeflack4 commented 8 months ago

@cthoyt I appreciate you chiming in! In the OP I wrote "unless Charlie thinks it's a good idea for this functionality to go in the curies package", but I think we would actually prefer that it go there.

I don't foresee SemanticSQL having the bandwidth to add EPMs now but I could be wrong.

Here's an example of a prefixes.csv compatible with SemanticSQL: https://github.com/monarch-initiative/mondo-ingest/blob/main/src/ontology/config/prefixes.csv

The two columns are prefix and base. In curies, base is called the URI prefix, so I guess there are basically two options:

a. curies could have something like a .to_csv() that names the 2nd column uri_prefix. mondo-ingest and other SemanticSQL users would call this method, but would then have to change the 2nd column header on our end.
b. curies .to_csv() could take something like a format param with values like standard and semanticsql, or just a boolean semanticsql param.

Actually though I just realized I am not sure if the headers (prefix and base) even matter to SemanticSQL. Perhaps it only cares about the column order? I don't see anything about the CSV in the docs.
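
To make option b concrete, here is a rough sketch of what a format switch could look like. Everything in it is an assumption for discussion: `prefix_map_to_csv`, the `fmt` parameter, and the `HEADERS` table are hypothetical and are not part of curies. It is written against a plain dict so it runs with the standard library alone.

```python
import csv
import io

# Hypothetical header choices, keyed by the format names floated above.
HEADERS = {
    "standard": ("prefix", "uri_prefix"),
    "semanticsql": ("prefix", "base"),
}


def prefix_map_to_csv(prefix_map, fmt="standard"):
    """Return CSV text for a prefix map, with the header chosen by `fmt`."""
    if fmt not in HEADERS:
        raise ValueError(f"unknown format: {fmt!r}")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(HEADERS[fmt])
    writer.writerows(prefix_map.items())
    return buffer.getvalue()


prefix_map = {"MONDO": "http://purl.obolibrary.org/obo/MONDO_"}
print(prefix_map_to_csv(prefix_map, fmt="semanticsql"))
```

If it turns out SemanticSQL only cares about column order, the switch would be unnecessary and a single header would do.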