monarch-initiative / omim

Data ingest pipeline for OMIM.

Weird `\n'` in `omim.sssom.tsv` #112

Open joeflack4 opened 1 month ago

joeflack4 commented 1 month ago

Overview

In `omim.sssom.tsv`, there is a weird extra line break before the closing single quotation mark of `mapping_set_description`, so the quote ends up on its own line as `#   '`. IDK what's causing it.

omim.sssom.tsv subsection:

# license: https://creativecommons.org/licenses/by/4.0/
# mapping_set_description: 'The file `omim.sssom.tsv` is generated using the Python
#   package `sssom`, by running the command `make sssom` from a cloned copy of the OMIM
#   ingest (https://github.com/monarch-initiative/omim). For more information on data
#   sources, assumptions, and computations, refer to `README.md` or the comments at
#   the top of `omim2obo/main.py` in the OMIM ingest.
#   '
# mapping_set_id: http://purl.obolibrary.org/obo/mondo/mondo-ingest/mapping/omim.sssom.tsv
subject_id  subject_label   predicate_id    object_id   mapping_justification

For reference, here's the corresponding subsection of metadata.sssom.yml: https://github.com/monarch-initiative/omim/blob/main/data/metadata.sssom.yml#L1-L7

joeflack4 commented 1 month ago

@matentzn FYI if you have any ideas. Lemme know if you want me to move this issue to sssom-py.

matentzn commented 1 month ago

No idea, but this is indeed weird. If you can create a small reproducible example, this merits a sssom-py issue! I guess this is the most likely line for the origin of this: https://github.com/mapping-commons/sssom-py/blob/6e3961f6588117c93842a8b82d94cff772fcbe72/src/sssom/writers.py#L64C4-L64C45

Good catch

joeflack4 commented 1 month ago

I don't mind providing a reproducible example. Since this is a low-priority issue, rather than building a small example I'll go w/ what's easiest for me, which is the exact command and inputs that reproduce this instance:

Command: sssom parse omim.json -I obographs-json -m metadata.sssom.yml -o omim.sssom.tsv

Inputs: Downloadable here: https://drive.google.com/drive/folders/15ZGMXoroNVYojOjxY_vIXWLURtlcmW71
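
A smaller reproduction might look like the sketch below. It rests on two assumptions that the thread does not confirm: that the description string reaches the writer with a trailing newline (as a YAML block scalar such as `>` or `|` would produce), and that the header is built by dumping the metadata with PyYAML's `yaml.safe_dump`, dropping empty lines, and prefixing each remaining line with `# `. The `dump_header` helper is hypothetical and only mimics that behavior; it is not sssom-py code.

```python
import yaml  # PyYAML


def dump_header(meta):
    """Hypothetical stand-in for a TSV header writer (an assumption, not sssom-py code).

    Dumps the metadata as YAML, drops empty lines, and prefixes the rest with '# '.
    """
    lines = yaml.safe_dump(meta).split("\n")
    return [f"# {line}" for line in lines if line != ""]


# A folded block scalar ('>') keeps one trailing newline by default.
loaded = yaml.safe_load("mapping_set_description: >\n  Short description.\n")

# Case 1: the value still ends in '\n', so PyYAML falls back to a single-quoted
# scalar and puts the closing quote on its own indented line.
with_newline = dump_header(loaded)

# Case 2: the same value with the trailing newline stripped is emitted as a
# plain one-line scalar.
stripped = {key: value.rstrip("\n") for key, value in loaded.items()}
without_newline = dump_header(stripped)

print("\n".join(with_newline))
print("\n".join(without_newline))
```

If this matches what happens in the pipeline, stripping the trailing newline before writing, or using the `>-` chomping indicator in `metadata.sssom.yml`, would likely avoid the dangling quote.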