halirutan closed this 4 years ago
Ping @halirutan @mai00fti : Could you create a release of the repos? Maybe v0.1? Maybe set the pre-release checkbox?
I would like to create a conda package.
Potentially stupid question: Why is the tensorflow-gpu conda package installed? Is it really needed in the tensorflow container? Intuitively, I would guess that all functionality should already be in the container?
One more question: in tests/ there are a bunch of Python scripts. How are they called and in which order are they executed?
Potentially stupid question: Why is the tensorflow-gpu conda package installed? Is it really needed in the tensorflow container?
Simple answers: 1. Because we need it in the environment.yml if someone wants to create the conda env. 2. I was too lazy to test whether it works in the singularity container without it. After all, we're setting up a new conda env and I wasn't totally sure if it actually reuses the packages that are already available. Since the singularity build takes ages, I simply haven't investigated yet.
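For reference, recreating the env outside the container is just the following (a sketch; the env name comes from environment.yml, currently conda_rdkit2019 as discussed below):
# create the conda env from the repo's environment.yml and activate it
conda env create -f environment.yml
conda activate conda_rdkit2019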
One more question: in tests/ there are a bunch of Python scripts. How are they called and in which order are they executed?
The run... files are not for unit-testing and they cannot be run automatically because they rely on the /data directory, which is not available in the repo. Therefore, only the test_... files are executed with pytest.
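For the record, pytest's default collection only picks up files matching test_*.py, so the run... scripts are skipped automatically; a sketch of the call from the repo root:
# collects only tests/test_*.py; the run... scripts are not picked up
pytest tests/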
Could you create a release of the repos?
Not sure if I'm allowed to do this. I need to check what privileges I have.
Ping @halirutan @mai00fti : Could you create a release of the repos? Maybe v0.1? Maybe set the pre-release checkbox?
Done: https://github.com/yigbt/deepFPlearn/releases/tag/v0.1
@bernt-matthias Damn, I guess you needed the release from this branch, right?
I guess you needed the release from this branch, right?
Should be fine.
One step further: I successfully built an image using pip only: https://github.com/bernt-matthias/deepFPlearn/runs/1166177083?check_suite_focus=true (in my original branch https://github.com/bernt-matthias/deepFPlearn/tree/topic/singularity-CD)
In contrast to the conda-based container, the image is 'only' 1.5GB (the other one has nearly 4GB). One problem might be that I need to install rdkit via apt-get, which seems to install it for Python 2.7. Not sure if this leads to problems. Tests would be nice (and adding CI for the project in general).
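A quick sanity check (hedged, just for debugging the pip image) to see which interpreter the apt-installed rdkit ends up in:
# if the import only works for python2, the apt package indeed went to Python 2.7
python3 -c "from rdkit import Chem; print(Chem.MolFromSmiles('CCO'))"
python2 -c "from rdkit import Chem; print(Chem.MolFromSmiles('CCO'))"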
Not sure yet why the conda image gets that large. I will play around with multi-stage builds in the style of https://pythonspeed.com/articles/conda-docker-image-size/
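The gist of that article, as a hedged sketch (the paths and the conda-pack usage are assumptions on my side): build the env in a builder stage, shrink it with conda-pack, and copy only the packed env into the final stage.
# builder stage: create the env at a fixed prefix and pack it into a relocatable tarball
conda env create -f environment.yml -p /opt/env
conda install -n base -c conda-forge conda-pack
conda pack -p /opt/env -o /tmp/env.tar.gz
# final stage: unpack the tarball and fix up the hard-coded prefixes
mkdir -p /opt/env && tar -xzf /tmp/env.tar.gz -C /opt/env
/opt/env/bin/conda-unpack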
If I'm not wrong, the single-stage build now works because I use the slim version of quay.io/singularity/singularity.
Nevertheless, I also got multi-stage builds to work:
The sizes are from building on my machine; here I have seen different sizes.
I have the three container definitions in my branch https://github.com/bernt-matthias/deepFPlearn/tree/topic/singularity-CD
Not sure if the commented packages in the environment are really not needed, but at least I could not find them used with grep.
If the pip version runs, I guess this may be the preferred way.
@bernt-matthias I need to look at this more closely.
Tests would be nice
rdkit can be tested with tests/test_fingerprint.py since it uses rdkit to build fingerprints. A call to pytest from the deepFPlearn directory (with activated env) is sufficient.
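I.e. something like this from the repo root (with the env active); running only the fingerprint test keeps it quick:
# exercises rdkit via the fingerprint generation
pytest tests/test_fingerprint.py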
Not sure yet why the conda image gets that large.
My current understanding is that the conda dependency resolution is superior to that of pip. I mean, it's not only important what packages we use but also which packages the dependencies depend on. I don't know what magic conda uses but it takes a very long time to resolve all dependencies.
A definitive test would be re-running the training. A small subset of our cases should be sufficient. But I need to test this.
Not sure if the commented packages in the environment are really not needed, but at least I could not find them used with grep.
The whole thing grew historically out of Jana's first attempts, so there is a good chance that not all packages are needed.
Can you push the commits from your branch to this one?
On a side note: We really need to fix the naming of the environment :) This is also a historical thing and conda_rdkit2019 does not really fit anymore :joy:
rdkit can be tested with tests/test_fingerprint.py since it uses rdkit to build fingerprints. A call to pytest from the deepFPlearn directory (with activated env) is sufficient.
Then we should add a %test section to the final stage
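A minimal sketch of such a section, assuming the sources end up under /deepFPlearn inside the image (that path is an assumption):
%test
    # run the unit tests against the env baked into the image
    cd /deepFPlearn
    python -m pytest tests/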
My current understanding is that the conda dependency resolution is superior to that of pip.
True
I mean, it's not only important what packages we use but also which packages the dependencies depend on. I don't know what magic conda uses but it takes a very long time to resolve all dependencies.
Conda computes the complete dependency graph of the packages and their dependencies and gets an optimal solution by solving a SAT problem... that's why it sometimes takes a long time. The problem is known; here are some recommendations: https://github.com/bioconda/bioconda-recipes/issues/13774 (e.g. fixing Python / R versions to a reasonable range, for instance py>=3.5).
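For illustration (hypothetical command; the channel choice is an assumption), pinning the interpreter and the heavy packages directly keeps the solver's search space small:
# pinned specs resolve much faster than open-ended ones
conda create -n dfpl-test "python=3.8" "rdkit=2020.03.4" -c conda-forge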
A definitive test would be re-running the training. A small subset of our cases should be sufficient. But I need to test this.
Would be a cool addition. Wondering if the GPU container will run on the GitHub Actions virtual machines, which probably do not have GPUs.
Can you push the commits from your branch to this one?
Done
On a side note: We really need to fix the naming of the environment :) This is also a historical thing and conda_rdkit2019 does not really fit anymore :joy:
Jep. That's an easy one. We just need to decide which to take.
@bernt-matthias
Not sure if the commented packages in the environment are really not needed, but at least I could not find them used with grep.
I was having a look at this while working on another issue with ~~Java~~ Jana (have to remember her name properly). We definitely use sklearn as a direct import, but I was able to scrap a lot of the packages. I also fixed the versions to the values I'm seeing here because, in the bioconda issue you linked, unpinned versions were mentioned as one major slow-down in the resolution of dependencies. I'm currently running the whole analysis on my machine with the new environment and will do the same on the HPC. Right now, the list of packages looks like this:
- python=3.8.2
- pip=20.1.1
- numpy=1.19.1
- conda=4.8.4
- conda-build=3.20.3
- tensorflow-gpu=2.2.0
- scikit-learn=0.23.1
- pandas=1.0.5
- keras=2.4.3
- rdkit=2020.03.4
- matplotlib=3.2.2
- pytest=5.4.3
- jsonpickle=1.4.1
Then we should add a %test section to the final stage
Agreed! It would be really nice if we had some sort of test data that we could use for training. Just to be sure. Will talk to Jana about it.
Would be a cool addition. Wondering if the GPU container will run on the GitHub Actions virtual machines, which probably do not have GPUs.
Should work because we basically have the same situation on your UFZ HPC where we can't access the GPU atm. It just falls back to CPU.
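If we want the CI logs to be explicit about it, a simple check inside the container could be (a sketch, not wired up anywhere yet):
# prints an empty list when no GPU is visible (as on the GitHub runners), a device entry otherwise
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"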
I tested the adjusted environment locally and ran some tests. You can see that it still builds here. I'd say once the analysis I'm starting now on the HPC runs through with the changed environment, we should discuss what we need to change in the GitHub action to push the container automatically on each new tagged release on master.
It's really awesome that we made such progress and I highly appreciate your help in all this! :grinning:
@bernt-matthias Surprisingly, we now run into the same error we had before about "no space left on device" :(
Hrm. I've restarted the job...
Yesterday I briefly went over this with @mai00fti. We think it would be good if the def file(s) could be run independently of the CI. For that, one would basically only need to check out the sources via git.
Attention: I did a force push to this branch :(
@bernt-matthias I took some time tonight to work on this, and I got everything running:
I have this on a private repo for now, but I'll work on it with Jana tomorrow. We need to set up a sylabs account for the group and add access tokens to this repo (which I'm not allowed to do). Then we need to fix the description, collection, etc. of the container so that it has the right information on it.
I'd also force push to this branch when everything is running.
@bernt-matthias @mai00fti Yeah, works. Right now, we decided to push the containers to Jana's sylabs account until a better solution with more quota is available. I'll push some more documentation to this branch and then it should be ready for merging.
We'll build the container on each release (and not on each push) and after merging this branch, I'll add the latest changes we made to the dfpl package to master and create a first real release.
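The release step would then boil down to something like this (hedged sketch; the library path is a placeholder for the sylabs collection):
# authenticate against the sylabs cloud library and push the unsigned image
singularity remote login
singularity push -U dfpl.sif library://<account>/<collection>/dfpl:v0.1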
@bernt-matthias On my local machine, both the dfpl.def and multi.def run through smoothly. However, there is a huge difference in size. Have you tried building the multi.def yourself, and do you get different results?
Since Jana mentioned that this wasn't clear: the singularity build works out of the box on every machine and you don't have to set up anything special. Just clone the whole repo, go into the singularity_container folder and call
SING=$(command -v singularity)
sudo $SING build dfpl-multi.sif multi.def
For me the local build of the multi_conda def file is 4.1G.
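To see where the space goes, one could peek into the built image, e.g. (assuming the conda installation lives under /opt/conda inside the container):
# check how much the conda install inside the image contributes to the 4.1G
singularity exec dfpl-multi.sif du -sh /opt/conda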
Questions:
- Is the .git dir copied into the container as well? Then this adds 1.2G on a normal checkout. For the CI it should not be a problem because the checkout action should only check out the state of the latest commit and not the complete git history (depth=1, I think). Maybe try building locally on a fresh clone using git clone -b feat_singularity_CD --single-branch --depth 1
@bernt-matthias You are right. I wasn't paying attention to the fact that the whole file tree is copied into the container, and it happily included the sifs I had already built. With that in mind, I suggest taking your multi.def to build the container. I will make all necessary changes. Thanks for looking into it.
TODO: