mapping-commons / sssom-py

Python toolkit for SSSOM mapping format
https://mapping-commons.github.io/sssom-py/index.html#
MIT License

Add JSKOS writer #334

Open nichtich opened 1 year ago

nichtich commented 1 year ago

As discussed here, JSKOS is another format used to encode mappings. The format is based on JSON. In practice, files with multiple JSKOS mappings mostly use newline-delimited JSON (extension ndjson), which makes them easy to process with command-line tools. I suggest:

Example data (DDC-BK mappings):

By the way, there is also a TSV/CSV format for JSKOS, but it may be better to align the tabular formats of SSSOM and JSKOS to a common format.
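
To make the target shape concrete, here is a minimal sketch of writing mappings as newline-delimited JSKOS. It assumes the JSKOS mapping layout with `from` and `to` concept bundles holding a `memberSet`, plus a `type` list of SKOS mapping relation URIs. The function names and the example.org URIs are hypothetical and are not part of sssom-py.

```python
import json

SKOS = "http://www.w3.org/2004/02/skos/core#"


def to_jskos_mapping(subject_uri, predicate_uri, object_uri):
    """Build one JSKOS mapping dict from three full URIs."""
    return {
        "from": {"memberSet": [{"uri": subject_uri}]},
        "to": {"memberSet": [{"uri": object_uri}]},
        "type": [predicate_uri],
    }


def to_ndjson(mappings):
    """Serialize mappings as newline-delimited JSON, one mapping per line."""
    return "".join(json.dumps(m, sort_keys=True) + "\n" for m in mappings)


example = to_jskos_mapping(
    "http://example.org/a/1",  # hypothetical subject URI
    SKOS + "exactMatch",
    "http://example.org/b/2",  # hypothetical object URI
)
ndjson_text = to_ndjson([example])
```

Keeping one mapping per line means a file can be filtered or counted line by line without loading the whole file as a single JSON document.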

matentzn commented 1 year ago

@nichtich this is great, thank you. Would you be able to provide the converter yourself? We will review and integrate it of course, and help you if you have issues figuring out what goes where.

nichtich commented 1 year ago

I started implementing the writer at https://github.com/gbv/sssom-py/commit/e3a217d2241674b2b368108929c36b446c1712ce but

BTW: Where can I find real-world SSSOM data beyond test files as examples?

matentzn commented 1 year ago

I started implementing the writer at https://github.com/gbv/sssom-py/commit/e3a217d2241674b2b368108929c36b446c1712ce but

How do I convert subject_id and object_id to full URIs? The same applies to creator_id (but I would recommend using full URIs for creator IDs anyway, because this is also easier when the data is created).

Look at the other converters: for external -> sssom, we have a utility; see for example the Alignment API converter: https://github.com/mapping-commons/sssom-py/blob/d8992f42aabe78a8df5465b193f9d0fc63680eba/sssom/parsers.py#L801

For the other direction, we mostly use LinkML, which in turn uses the curies package (in case you want to do it manually).
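
As a rough illustration of what the manual route involves, the sketch below expands a CURIE to a full URI with a plain dictionary. This is a hand-rolled stand-in written for this thread, not the API of the curies package or of sssom-py; the function name `expand_curie` and the example prefix are hypothetical, and the assumption is that the prefix map would come from the `curie_map` in the SSSOM metadata.

```python
def expand_curie(curie, prefix_map):
    """Expand a CURIE such as 'EX:123' to a full URI using prefix_map."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in prefix_map:
        # No colon at all, or a prefix the map does not know about.
        raise ValueError(f"cannot expand {curie!r}: unknown or missing prefix")
    return prefix_map[prefix] + local_id


# Hypothetical prefix map; in SSSOM this would come from the curie_map metadata.
prefix_map = {"EX": "http://example.org/ex_"}
full_uri = expand_curie("EX:123", prefix_map)
```

A dedicated library is still the safer choice, since it also has to deal with cases this sketch ignores, such as overlapping URI prefixes and compressing URIs back to CURIEs.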

How can I compare the output of a test conversion against the expected output? The unit test seems to just convert without checking the result.

@hrshdhgd will tell you details, but this is done through round tripping and "expected" serialisations in the test data folder.
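
Until the details arrive, a check against an expected serialisation could be sketched roughly as follows for ndjson output. This is an assumption about how such a comparison might look, not the actual sssom-py test harness; the helper names `parse_ndjson` and `assert_matches_expected` are hypothetical.

```python
import json


def parse_ndjson(text):
    """Parse newline-delimited JSON into a list of objects, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def assert_matches_expected(actual_text, expected_text):
    """Compare parsed records so that key order and whitespace do not matter."""
    actual = parse_ndjson(actual_text)
    expected = parse_ndjson(expected_text)
    assert actual == expected, (
        f"converter output ({len(actual)} records) differs from "
        f"expected output ({len(expected)} records)"
    )
```

In a real test, `actual_text` would be the converter output and `expected_text` the content of an expected file kept in the test data folder. Comparing parsed records rather than raw strings avoids failures caused only by formatting differences.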

BTW: Where can I find real-world SSSOM data beyond test files as examples?

If you need more let me know:

nichtich commented 1 year ago

I've finished implementing the JSKOS output format, tested via

sssom convert $tsvfile -O jskos

with all of https://github.com/mapping-commons/mh_mapping_initiative/tree/master/mappings (works fine) and all of tests/data/*.tsv:

The file at https://github.com/monarch-initiative/mondo/tree/master/src/mappings does not work because it lacks mapping_justification (which I would make optional).

Before finishing, I need some help with running the tests (they fail both locally and in the current CI on GitHub) and with checking which TSV files are meant to be valid SSSOM TSV and which are outdated or require special handling.

matentzn commented 1 year ago

This is awesome, thank you! @hrshdhgd will help you when he gets a chance!

Will you also provide a JSKOS reader? This would be the direction that is most valuable for us, at least :) 🙏

nichtich commented 1 year ago

As discussed at https://github.com/mapping-commons/sssom/discussions/250#discussioncomment-4918548, the JSKOS output could be extended with mapping identifiers.