Closed pdan93 closed 1 year ago
Also tests should pass
When we get tests to pass and implement a recipe to update both requirements, let's mark the tests according to which dependencies they need. The tests that need the dev dependencies should run in the dev virtualenv, and the others should run with only the default dependencies installed, to ensure that the repo works with the light deps.
Build is now passing.
Tests are working, including locally via running pytest.
@nsorros Please review the changes I made when you have time
Are those dependencies needed in unpinned_requirements for predict and download to work? If not, move them to dev.
matplotlib
gensim==4.0.0
scispacy
scikit-multilearn
streamlit
seaborn
Also create two sets of tests that you run independently using pytest marks. For the light installation, install only the light version and run only the predict / download tests; for the full installation, run everything.
Also I do not think download or predict works. Can you provide a sample that works?
In the download case there is a wrong version in the package, but even when using version = "0.2.4" this only downloads xlinear. If anything we should download bertmesh, although we can discuss whether that is needed at this point.
In the predict case it requires a path to a label binarizer, which is not needed for bert mesh. I ran grants_tagger predict malaria Wellcome/WellcomeBertMesh models/xlinear/label_binarizer-2022.12.0.pkl, so I used the xlinear label binarizer, but this did not work either.
I suggest
We should also use this opportunity to simplify the default requirements. Here is the list I used locally that mostly worked
pandas
xlrd
scikit-learn
numpy
transformers
scipy
wasabi
typer
tqdm
requests
openpyxl
torch
This is for the unpinned requirements.
After all that is done, also check which packages take up the most space in the virtualenv, say the top 5. For me at this point these are:
562912 venv/lib/python3.8/site-packages//torch
208728 venv/lib/python3.8/site-packages//scipy
117504 venv/lib/python3.8/site-packages//pandas
117160 venv/lib/python3.8/site-packages//transformers
111576 venv/lib/python3.8/site-packages//numpy
The heaviest is torch, which for me is 500MB but for some Linux variants gets into the GBs. It would be good to force a CPU installation of torch for inference in order to make this really light 🪶
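The size check described above can also be scripted. The following is a minimal sketch, not part of the repo: the function names are hypothetical, and it sums apparent file sizes in bytes, so the numbers will differ somewhat from the block counts that du reports.

```python
import os
from pathlib import Path


def dir_size_bytes(path):
    """Sum the sizes of all regular files under path, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def largest_packages(site_packages, top=5):
    """Return (name, size_in_bytes) for the `top` largest subdirectories."""
    sizes = [
        (entry.name, dir_size_bytes(entry))
        for entry in Path(site_packages).iterdir()
        if entry.is_dir()
    ]
    # Largest first, then keep only the requested number of entries.
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]
```

Calling largest_packages("venv/lib/python3.8/site-packages") would list the five largest package directories for a virtualenv laid out like the one in this thread.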
Also create two sets of tests that you run independently using pytest marks. For the light installation, install only the light version and run only the predict / download tests; for the full installation, run everything.
I decided to skip the predict test for now. It was a bit too much hassle to modify it for the new model.
You can run tests reserved for inference time via:
pytest -m inference_time
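For readers unfamiliar with pytest marks, here is a minimal sketch of how a marker named inference_time could be registered and applied. The marker name comes from the command above; the test names, their placeholder bodies, and the marker description are hypothetical and not taken from the repo. In a real project the pytest_configure hook would live in conftest.py.

```python
import pytest


def pytest_configure(config):
    # Register the marker so pytest does not warn about an unknown mark.
    config.addinivalue_line(
        "markers",
        "inference_time: tests that run with the default (light) requirements",
    )


@pytest.mark.inference_time
def test_runs_with_light_dependencies():
    # Placeholder body: a real test would exercise predict or download.
    assert True


def test_needs_dev_dependencies():
    # Unmarked, so `pytest -m inference_time` deselects it.
    assert True
```

With this layout, pytest -m inference_time selects only the marked test, while a plain pytest run collects both.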
Also the cuda libraries are not removed. Add grep -v in the make recipe.
ERROR: Could not find a version that satisfies the requirement nvidia-cublas-cu11==11.10.3.66 (from versions: 0.0.1.dev5, 0.0.1)
ERROR: No matching distribution found for nvidia-cublas-cu11==11.10.3.66
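As an illustration of the kind of filtering that grep -v would do, here is a small Python sketch that drops pinned CUDA packages from a list of requirement lines. The function name is hypothetical, and the "nvidia-" prefix is an assumption based on the package named in the error above; the actual make recipe may need to exclude other packages as well.

```python
def drop_cuda_pins(lines, prefixes=("nvidia-",)):
    """Return the requirement lines whose package name starts with none of the prefixes."""
    kept = []
    for line in lines:
        # Compare case-insensitively and ignore surrounding whitespace.
        name = line.strip().lower()
        if name.startswith(tuple(prefixes)):
            continue
        kept.append(line)
    return kept
```

Applied to the lines of a pinned requirements file, this keeps every line except those beginning with "nvidia-", such as the nvidia-cublas-cu11 pin that pip failed to resolve.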
At the moment the light virtualenv takes 4.8GB on Ubuntu. And this is why:
2647480 venv/lib/python3.8/site-packages/nvidia
1359556 venv/lib/python3.8/site-packages/torch
188196 venv/lib/python3.8/site-packages/triton
92672 venv/lib/python3.8/site-packages/pydantic
87564 venv/lib/python3.8/site-packages/scipy
If we can force a CPU installation of torch, the size will reduce dramatically.
We are now at 1.4GB size for the default virtualenv which is quite light 🪶 (compared to the almost 6GB)
1424296 venv/lib/python3.8/site-packages/
730120 venv/lib/python3.8/site-packages/torch
92672 venv/lib/python3.8/site-packages/pydantic
87564 venv/lib/python3.8/site-packages/scipy
67852 venv/lib/python3.8/site-packages/sympy
63360 venv/lib/python3.8/site-packages/pandas
Description
This PR makes the default grants tagger environment light and responsible only for inference. To do that
dev and default
dev requirements which is make update-requirements-dev
We also
You can install the default environment and test by running
Fixes #201
Checklist
Release checklist