monarch-initiative / omim

Data ingest pipeline for OMIM.
7 stars 3 forks source link

Make codebase compatible w/ latest ODK #114

Open joeflack4 opened 5 months ago

joeflack4 commented 5 months ago

Overview

As of now, I'm getting errors if I run on obolibrary/odkfull: latest or dev. Right now we are stuck to v1.4.3.

Sub-tasks

Error details

Err: KeyError: "['subject_label'] not in index" Happens when: creating mondo_exactmatch_omimps.sssom.owl

Details

``` sh run.sh make mondo_genes.robot.tsv; beep sssom convert mondo_exactmatch_omimps.sssom.tsv -O owl -o mondo_exactmatch_omimps.sssom.owl /usr/local/lib/python3.10/dist-packages/sssom/util.py:168: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)` df.replace("", np.nan, inplace=True) Traceback (most recent call last): File "/usr/local/bin/sssom", line 8, in sys.exit(main()) File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__ return self.main(*args, **kwargs) File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main rv = self.invoke(ctx) File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1688, in invoke return _process_result(sub_ctx.command.invoke(sub_ctx)) File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke return ctx.invoke(self.callback, **ctx.params) File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke return __callback(*args, **kwargs) File "/usr/local/lib/python3.10/dist-packages/sssom/cli.py", line 161, in convert convert_file(input_path=input, output=output, output_format=output_format) File "/usr/local/lib/python3.10/dist-packages/sssom/io.py", line 50, in convert_file write_func(doc, output, serialisation=fileformat) # type:ignore File "/usr/local/lib/python3.10/dist-packages/sssom/writers.py", line 162, in write_owl graph = to_owl_graph(msdf) File "/usr/local/lib/python3.10/dist-packages/sssom/writers.py", line 173, in to_owl_graph msdf.df = invert_mappings( File "/usr/local/lib/python3.10/dist-packages/sssom/util.py", line 1462, in invert_mappings inverted_df = inverted_df[df.columns] File "/usr/local/lib/python3.10/dist-packages/pandas/core/frame.py", line 4108, in __getitem__ indexer = self.columns._get_indexer_strict(key, "columns")[1] File "/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py", line 6200, in _get_indexer_strict self._raise_if_missing(keyarr, indexer, axis_name) File "/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py", line 6252, in _raise_if_missing raise KeyError(f"{not_found} not in index") KeyError: "['subject_label'] not in index" make: *** [makefile:23: mondo_exactmatch_omimps.sssom.owl] Error 1 ```

joeflack4 commented 5 months ago

@matentzn @twhetzel FYI

Set this to low priority, though perhaps it should be medium.

matentzn commented 4 months ago

If this happens in the latest ODK this is a high priority issue.

joeflack4 commented 4 months ago

It does. High it is~

twhetzel commented 1 week ago

@joeflack4 - in the initial post it says "Right now we are stuck to v1.4.3". What do you mean "stuck" to v1.4.3?

gouttegd commented 1 week ago

This does not look like a problem with any code in this repository nor like a problem with the ODK.

It looks more like a bug in sssom-py. When converting to OWL, it has to invert the mappings for some reason, and the inversion fails because sssom-py seems to expect that the inverted dataset should contain a subject_label column -- which is not the case here since the original dataset contains only a subject_label column but not a object_label column, so the inverted dataset conversely contains only a object_label column but not a subject_column.

I can replicate the issue with the latest sssom-py code. Any attempt to convert to OWL a mapping set that does not have a object_label column yields the same error as reported in this ticket.

Definitely a sssom-py bug. @matentzn

gouttegd commented 1 week ago

A possible workaround until the bug is fixed upstream (and a new version of sssom-py lands in the ODK) is to use the SSSOM plugin for ROBOT to do the OWL conversion:

robot sssom:inject --create --sssom mondo_exactmatch_omimps.sssom.tsv --direct --output mondo_exactmatch_omimps.sssom.owl

Be advised that it produces a slightly different OWL product than sssom-py convert (because OWL serialisation of SSSOM is poorly specified): notably, it does not convert the mapping set metadata (things like mapping_set_id).

joeflack4 commented 1 week ago

Thanks for your help again @gouttegd! Yes, I agree with your assessment.

@twhetzel By stuck, I just mean that as a consequence of this not working with the latest version of ODK, we have to use an older version, v1.4.3 being that version.

twhetzel commented 1 week ago

Thanks all! Is there an issue for this bug in sssom-py?

gouttegd commented 1 week ago

There is now. :) https://github.com/mapping-commons/sssom-py/issues/554

joeflack4 commented 1 week ago

@gouttegd Thanks so much:

@matentzn @twhetzel I suppose now I'll upgrade the GH action and fix it to the current latest version and make sure it works.

gouttegd commented 1 week ago

@joeflack4 The bug is fixed in sssom-py, but the ODK has not been updated (yet) to include the latest version of sssom-py!

We will update the master branch of the ODK early next week (this will in fact be done automatically). Then, we will probably make a ODK 1.5.3 release. That’s when you’ll be able to fix your issue by upgrading to the latest ODK. :)

twhetzel commented 1 week ago

@joeflack4 I don't see a reason to update the GH Action at this time vs. updating to a new version of ODK when that is released.

gouttegd commented 6 days ago

FYI, version 1.5.3 of the ODK, which includes the fix to the sssom Python package, has been released today.

joeflack4 commented 6 days ago

@gouttegd Appreciate that! I'll give it a whirl momentarily.

joeflack4 commented 1 day ago

Note to self to work on this later. The best fix may indeed just be adding a flag or manipulating the command.

Not sure why this error didn't appear in older versions of ODK, but currently getting:

Run make install
  make install
  shell: sh -e {0}
pip install -r requirements-unlocked.txt
WARNING: The directory '/github/home/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag.
error: externally-managed-environment

× This environment is externally managed
╰─> To install Python packages system-wide, try apt install
    python3-xyz, where xyz is the package you are trying to
    install.

    If you wish to install a non-Debian-packaged Python package,
    create a virtual environment using python3 -m venv path/to/venv.
    Then use path/to/venv/bin/python and path/to/venv/bin/pip. Make
    sure you have python3-full installed.

    If you wish to install a non-Debian packaged Python application,
    it may be easiest to use pipx install xyz, which will manage a
    virtual environment for you. Make sure you have pipx installed.

    See /usr/share/doc/python3.1[2](https://github.com/monarch-initiative/omim/actions/runs/11646677953/job/32431010170#step:4:2)/README.venv for more information.

note: If you believe this is a mistake, please contact your Python installation or OS distribution provider. You can override this, at the risk of breaking your Python installation or OS, by passing --break-system-packages.
hint: See PEP 668 for the detailed specification.
make: *** [makefile:68: install] Error 1
matentzn commented 1 day ago
pip install xyz --break-system-packages

on newer ODKs as with later pythons you cant just pip install into the global environment anymore

gouttegd commented 1 day ago

In the specific case of this repository (omim) the fix is easy: looking at your requirements-unlocked.txt file, the only Python package you are trying to install is python-dotenv. That package is provided in ODK 1.5, so that install step of yours is now completely unnecessary.

gouttegd commented 1 day ago

on newer ODKs as with later pythons you cant just pip install into the global environment anymore

To further explain:

Compared to ODK 1.4.x, two things have changed with the later ODK versions:

So if you do need Python packages that are not already provided in the ODK nowadays, what can you do?

(A) The fastest fix is to call pip with --user --break-system-packages; --user to install the packages “user-wide” rather than “system-wide”, and --break-system-packages to force pip to install packages in the global environment (yes, this is needed even with --user; the “global environment” comprises both the ”system” packages installed in /usr/lib/python/... and the “user” packages installed in ~/.local/lib/python/..., so by default pip will even refuse to install anything in your own user’s directory with the --break-system-packages option).

(B) Create a virtual environment and install packages inside that environment. That’s the Python recommended way, though it is completely ridiculous and overkill in the context of the ODK (which is already a virtual environment).

(C) Bribe ODK developers to convince them to add the packages you need in the next version of the ODK. :)

(D) See the light and realize your life would be much easier if you didn’t have to deal with anything related to Python, and switch to another language. :D

joeflack4 commented 1 day ago

That package is provided in ODK 1.5, so that install step of yours is now completely unnecessary.

I thought of that, but for purposes of simplicity / future-proofing, as codebase changes and more packages may be added, ideally I just leave that step there and never think about touching it.

joeflack4 commented 1 day ago

@matentzn I remember you having to use --break-system-packages before, and I did notice it in the error messaging. Good to know that this should work.

@gouttegd Thanks for the background explanation. I figured something like that must have happened. I'll be sure to use --user --break-system-packages!

twhetzel commented 1 day ago

Overall, if a new package is needed critically I'd go with (A) and also (C) to make requests to have these included in ODK.

Anything needed that is not part of ODK should also be added into the README (just saying for completeness).