Closed apetkau closed 2 years ago
The Conda recipe also pulls PyPI archive so it is also not affected by url change. How long would this new hosting will last? Will test tomorrow and see how it works and then merge
Thanks for that. Yes, conda installation is unaffected as well.
The new hosting should be permanent.
Thanks for taking the time to test this out.
@apetkau @Takadonet I had tested the package inside docker container and it works with a caveat. Install from source is a bit problematic because it assumes that database files are initialized. If it is not the case, the package compiles, but you get file paths errors that do not allow the setup_sistr_dbs()
to be executed to mitigate this issue due to imports of those resources at the very beginning of the sistr_cmd
module.
The way on mitigate those issues is to download database via the python3 -c "from sistr.sistr_cmd import *; setup_sistr_dbs()"
command followed by python3 setup.py install
. One can mitigate this issue by updating setup.py
that would automatically download databases before package compile time as databases are imported as package resources and not as some external dependencies.
Otherwise the new database link https://sairidapublic.blob.core.windows.net/downloads/sistr/database/SISTR_V_1.1_db.tar.gz works OK.
root@ae47ca3822c5:/sistr_cmd# sistr --help
Traceback (most recent call last):
File "/usr/local/bin/sistr", line 33, in <module>
sys.exit(load_entry_point('sistr-cmd==1.1.1', 'console_scripts', 'sistr')())
File "/usr/local/bin/sistr", line 25, in importlib_load_entry_point
return next(matches).load()
File "/usr/lib/python3.9/importlib/metadata.py", line 77, in load
module = import_module(match.group('module'))
File "/usr/lib/python3.9/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 664, in _load_unlocked
File "<frozen importlib._bootstrap>", line 627, in _load_backward_compatible
File "<frozen zipimport>", line 259, in load_module
File "/usr/local/lib/python3.9/dist-packages/sistr_cmd-1.1.1-py3.9.egg/sistr/sistr_cmd.py", line 15, in <module>
File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 664, in _load_unlocked
File "<frozen importlib._bootstrap>", line 627, in _load_backward_compatible
File "<frozen zipimport>", line 259, in load_module
File "/usr/local/lib/python3.9/dist-packages/sistr_cmd-1.1.1-py3.9.egg/sistr/src/cgmlst/__init__.py", line 11, in <module>
File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 664, in _load_unlocked
File "<frozen importlib._bootstrap>", line 627, in _load_backward_compatible
File "<frozen zipimport>", line 259, in load_module
File "/usr/local/lib/python3.9/dist-packages/sistr_cmd-1.1.1-py3.9.egg/sistr/src/serovar_prediction/__init__.py", line 5, in <module>
File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 664, in _load_unlocked
File "<frozen importlib._bootstrap>", line 627, in _load_backward_compatible
File "<frozen zipimport>", line 259, in load_module
File "/usr/local/lib/python3.9/dist-packages/sistr_cmd-1.1.1-py3.9.egg/sistr/src/serovar_prediction/constants.py", line 12, in <module>
File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1130, in resource_filename
return get_provider(package_or_requirement).get_resource_filename(
File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1739, in get_resource_filename
return self._extract_resource(manager, zip_path)
File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1761, in _extract_resource
timestamp, size = self._get_date_and_size(self.zipinfo[zip_path])
KeyError: 'sistr/data'
I had added to setup.py
a call to setup_sistr_dbs()
to init databases before package build and install that solves the issue of direct install from source code. I had tested and it works
Thanks @kbessonov1984 . I will test out your changes.
I had tested out installing and running myself in a conda environment (prior to your changes) and it all seemed to work good so I am not sure if your specific problem is related to Docker or a specific version of Python. I had started to try to test it out again but hadn't had a chance to finish that.
Thanks again for testing this out @kbessonov1984 and your suggested changes.
So one issue that exists with the code you have is that if you import sistr.sistr_cmd
prior to installing SISTR, you are also importing all the dependencies prior to Python installing the dependencies. This leads to the following error when trying to install with pip install .
on the code (assuming .
here is the current directory containing the SISTR code):
Processing /home/CSCScience.ca/apetkau/workspace/sistr_cmd_2
Preparing metadata (setup.py) ... error
ERROR: Command errored out with exit status 1:
command: /home/CSCScience.ca/apetkau/miniconda3/envs/sistr/bin/python3.9 -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/home/CSCScience.ca/apetkau/workspace/sistr_cmd_2/setup.py'"'"'; __file__='"'"'/home/CSCScience.ca/apetkau/workspace/sistr_cmd_2/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-yc6spelk
cwd: /home/CSCScience.ca/apetkau/workspace/sistr_cmd_2/
Complete output (9 lines):
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/CSCScience.ca/apetkau/workspace/sistr_cmd_2/setup.py", line 4, in <module>
from sistr.sistr_cmd import setup_sistr_dbs
File "/home/CSCScience.ca/apetkau/workspace/sistr_cmd_2/sistr/sistr_cmd.py", line 14, in <module>
from sistr.src.blast_wrapper import BlastRunner
File "/home/CSCScience.ca/apetkau/workspace/sistr_cmd_2/sistr/src/blast_wrapper/__init__.py", line 7, in <module>
import pandas as pd
ModuleNotFoundError: No module named 'pandas'
If you wanted to have the databases setup prior to packaging you'd have to find some way to trigger the setup_sistr_dbs()
function after dependencies have been resolved and SISTR has been installed.
Also, the original issue you described (KeyError: 'sistr/data'
) is not something I am able to reproduce when I install SISTR in a conda environment. The database downloads just fine as soon as you first run SISTR.
@apetkau Had added database pre-install with this new commit and tested install via python setup.py install
and pip install ./sistr_cmd
routes successfully. The database is downloaded into the source code at sistr/data
before package build and install.
I had tried to both do installs both via python setup.py install
and pip install .
routes successfully on a fresh Ubuntu system. I also tried to install them inside a conda environment and both install routes worked
conda create --name sistr python=3.8 pycurl mash blast
conda activate sistr
#install via setup.py
python setup.py install
#install via pip
pip install .
During custom install after the main package is installed, the databases are initialized preventing a race condition such as when a user will launch sistr
on various threads at the same time. It is possible to do database init manually by running sistr_init
command. This approach solves a local install from the source via pip
. Currently the PyPI package comes with databases so no database init is necessary.
Awesome, thanks so much @kbessonov1984 . This looks great. I tested it out via conda and it is installing properly now. Thanks again.
The previous URL for the SISTR database (https://irida.corefacility.ca/downloads/sistr/database/SISTR_V_1.1_db.tar.gz) will soon no longer be available. We have moved the database to a new URL https://sairidapublic.blob.core.windows.net/downloads/sistr/database/SISTR_V_1.1_db.tar.gz.
This change won't need a new release of SISTR since a copy of the database files is stored in the PyPI package and installed when using
pip install sistr_cmd
(or conda install). However, this URL will need to be used for anyone who installs via the GitHub repo to do development on SISTR.