monarch-initiative / mondo-ingest

Coordinating the mondo-ingest with external sources
https://monarch-initiative.github.io/mondo-ingest/
6 stars 3 forks source link

Include `robot` & related setup to repository docs #29

Open joeflack4 opened 2 years ago

joeflack4 commented 2 years ago

Description

This repository (as well as some other ones, maybe?) have commands and scripts that utilize robot. The way we typically like to do this would be to wrap the call in sh run.sh robot ..., which has the advantage of calling a dockerized environment which is shared between devs. This solves the problem of standardizing the version / environment.

However, there are drawbacks to this and some issues I've noticed that we might eventually want to address:

Possible solutions

For (2), we could simply include the latest version of robot as a .jar, and then call that. This would be similar to other standard package management approaches, the only difference being that we (a) include the executable directly, as opposed to (b) declaring its version and having a package manager install it. We could also go with (b) by creating a PyPi package which includes the .jar and runs it through some Python interface.

joeflack4 commented 2 years ago

@cmungall @matentzn I met w/ Tim Putnam about another project recently, and in his project he included the robot jarfile. I thought this made a lot of sense, so I decided to make this issue about it. This something that I think is super important to decided on in the short term, but it should at least be addressed.

matentzn commented 2 years ago

I also talked to Tim about it and it is not a good practice to check in Jars. In any case it's not necessary- all projects I work on work on ODK, which has ROBOT installed, so I don't really see why you would need to install it again separately?

matentzn commented 2 years ago

Sorry, Didnt see you updated your issue with more text. All python scripts must also be run inside run.sh, then you can just refer to robot! Nothing at all in any ontology repo will be run outside the ODK! This ensures that all workflows run the same way on all systems!

joeflack4 commented 2 years ago

Hmm, no problem no problem. I can see the pros/cons as well, and I defer to how you want to continue managing this.

Regarding sub-tasks (1) and (2) in the OP, I think it still stands that it would be an improvement to do (1). We do have documentation outside of GitHub regarding robot and how to run it, but I really like when GitHub documentation is fairly standalone. We could also just link to the documentation on setting up robot. My thought is mainly for helping make sure new people are familiar w/ the practice.

Regarding (2), I do need to (i) make sure that my Python scripts call robot using the sh run.sh ..., so that's on my TODOs. As for (ii) the makefile itself, I have a question for you (sorry if it's similar to what I've asked before). I see robot called without the sh run.sh .... Is this because the robot commands within the makefile are not meant to be run manually, but that these commands will actually be called from within the environment by other commands?

matentzn commented 2 years ago

I do need to (i) make sure that my Python scripts call robot using the sh run.sh ...

No! Since the python script is running inside of docker, you just run robot and its fine! Will comment more later this week!

joeflack4 commented 2 years ago

Pffft you are right again, now that I think about it. I just been a bad boy and calling those scripts directly xD

If that's the case then, then I think (2) can be checked off, leaving only (1) for consideration.

matentzn commented 2 years ago

I will write documentation for everything (set up and all) once I am happy with the process - Just make sure that whatever you do in this repo:

  1. There is a make goal in src/ontology/mondo-ingest.Makefile to run it
  2. It can be run inside the run.sh wrapper like: sh run.sh make my_crazy_thing. I will take care of the rest.
joeflack4 commented 2 years ago

Sounds good.

For (1) yep, that's been done previously and is good.

For (2) I just checked, and it looks like it also works if I'm using sh run.sh as well.


I do have a question, though. Here's one of my make recipes:

report-mapping-annotations:
#   python3 -m pip install -r $(RELEASEDIR)/requirements.txt
    python3 $(SCRIPTSDIR)/ordo_mapping_annotations/report_mapping_annotations.py

I was previously doing a pip install here. Normally, if it does this and the packages are already installed, it basically skips this step. However, I ran this command twice using sh run.sh ..., and it did a full install both times (which took awhile, especially on bad wifi). I'm not sure why that's the case... wondering if maybe you know? For now, I commented that line out. It runs the command just fine. However I imagine if someone else wanted to run this, they'd also need to do the pip install first... so I don't like this solution.

The crux: Wondering what is the best practice for ensuring Python package requirements are satisfied for any make recipes that are running inside of the docker environment via sh run.sh?

matentzn commented 2 years ago

Its a valid concern - we deal with that usually by having a process in ODK itself to add libraries.

You could make a PR on https://github.com/INCATools/ontology-development-kit/blob/master/requirements.txt.full

and maybe your dependency gets admitted. Good point though - its is annoying to have to wait.

joeflack4 commented 2 years ago

Cool cool. Ok good to know it's not just me experiencing this.

It looks like all of the requirements I have for my project are already included there, so we don't need to add them. The only packages that my that my mondo-ingest Python scripts depend on are:

jinja2
pandas
sssom

So in this case, I suppose it'll be a good idea to skip the pip install step in my recipes, as these dependencies should already be installed in the environment, it seems.

matentzn commented 2 years ago

Yes, exactly!