ku-cbd / PhageBoost

Rapid discovery of novel prophages using biological feature engineering and machine learning
GNU General Public License v3.0

Possible Issue with XGBoost? #24

Open btemperton opened 2 years ago

btemperton commented 2 years ago

I am currently trying to use the standalone version to identify phages in Coxiella.

First, I created an input file for PhageBoost with:

esearch -db genome -query "txid776 [Organism]"|elink -target nuccore|efilter -query "RefSeq"|efetch -format fasta > coxiella.ncbi.fa

Running coxiella.ncbi.fa through the webserver version with a minimum number of 10 genes works fine. Running the same file through the standalone version with the command:

PhageBoost -f coxiella.ncbi.fa -o phage_boost --threads 64

Throws the following error:

processing: coxiella
time after genecalls: 441.18290638923645
time after feature calculations: 806.0731310844421
[16:36:12] WARNING: ../src/tree/./updater_quantile_hist.h:162: Attempted to load internal configuration for a model file that was generated by a previous version of XGBoost. A likely cause for this warning is that the model was saved with saveRDS() in R or pickle.dump() in Python. We strongly ADVISE AGAINST using saveRDS() or pickle.dump() so that the model remains accessible in current and upcoming XGBoost releases. Please use xgb.save() instead to preserve models for the long term. For more details and explanation, see https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html
Traceback (most recent call last):
  File "/home/artic/miniconda3/envs/PhageBoost-env/bin/PhageBoost", line 8, in <module>
    sys.exit(main())
  File "/home/artic/.local/lib/python3.7/site-packages/PhageBoost/main.py", line 223, in main
    model, feats, feats_, limit = read_model_from_file(model_file)
  File "/home/artic/.local/lib/python3.7/site-packages/PhageBoost/main.py", line 65, in read_model_from_file
    feats_ = [i.replace('-delta', '') for i in feats]
TypeError: 'NoneType' object is not iterable

Output from conda env export:

name: PhageBoost-env
channels:
  - ursky
  - bioconda
  - conda-forge
  - defaults
dependencies:
  - _libgcc_mutex=0.1=conda_forge
  - _openmp_mutex=4.5=1_gnu
  - backcall=0.2.0=pyh9f0ad1d_0
  - backports=1.0=py_2
  - backports.functools_lru_cache=1.6.4=pyhd8ed1ab_0
  - ca-certificates=2021.10.8=ha878542_0
  - decorator=5.1.1=pyhd8ed1ab_0
  - ipython=7.32.0=py37h89c1867_0
  - jedi=0.18.1=py37h89c1867_0
  - ld_impl_linux-64=2.36.1=hea4e1c9_2
  - libffi=3.4.2=h7f98852_5
  - libgcc-ng=11.2.0=h1d223b6_13
  - libgomp=11.2.0=h1d223b6_13
  - libnsl=2.0.0=h7f98852_0
  - libstdcxx-ng=11.2.0=he4da1e4_13
  - libzlib=1.2.11=h36c2ea0_1013
  - matplotlib-inline=0.1.3=pyhd8ed1ab_0
  - ncurses=6.3=h9c3ff4c_0
  - openssl=3.0.0=h7f98852_2
  - parso=0.8.3=pyhd8ed1ab_0
  - pexpect=4.8.0=pyh9f0ad1d_2
  - pickleshare=0.7.5=py_1003
  - pip=22.0.4=pyhd8ed1ab_0
  - prompt-toolkit=3.0.27=pyha770c72_0
  - ptyprocess=0.7.0=pyhd3deb0d_0
  - pygments=2.11.2=pyhd8ed1ab_0
  - python=3.7.12=hf930737_100_cpython
  - python_abi=3.7=2_cp37m
  - readline=8.1=h46c0cb4_0
  - setuptools=60.9.3=py37h89c1867_0
  - sqlite=3.37.0=h9cd32fc_0
  - tk=8.6.12=h27826a3_0
  - traitlets=5.1.1=pyhd8ed1ab_0
  - wcwidth=0.2.5=pyh9f0ad1d_2
  - wheel=0.37.1=pyhd8ed1ab_0
  - xz=5.2.5=h516909a_1
  - zlib=1.2.11=h36c2ea0_1013
  - pip:
    - biopython==1.79
    - cachier==1.5.4
    - joblib==1.1.0
    - more-itertools==8.12.0
    - numexpr==2.8.1
    - numpy==1.21.5
    - packaging==21.3
    - pandas==1.3.5
    - pathtools==0.1.2
    - phageboost==0.1.7
    - portalocker==2.4.0
    - pyparsing==3.0.7
    - pyrodigal==0.6.4
    - python-dateutil==2.8.2
    - pytz==2021.3
    - scipy==1.7.3
    - six==1.16.0
    - tables==3.7.0
    - tabulate==0.8.9
    - watchdog==2.1.6
    - xgboost==1.5.2

tsp-kucbd commented 2 years ago

For now, PhageBoost needs XGBoost 1.0.2. You can install that specific version with conda, from within the environment where PhageBoost is installed:

conda install xgboost=1.0.2
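
To confirm which XGBoost version the environment actually picks up before rerunning PhageBoost, something like the following could help. This is only a minimal sketch: the helper names (`parse_version`, `xgboost_matches`, `check_installed_xgboost`) are made up for illustration and are not part of PhageBoost. It assumes only that the `xgboost` package exposes `__version__`.

```python
import re

# Version named in the reply above; adjust if the requirement changes.
REQUIRED_XGBOOST = "1.0.2"


def parse_version(version):
    """Turn a string such as '1.5.2' into (1, 5, 2), keeping the first three numbers."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


def xgboost_matches(installed, required=REQUIRED_XGBOOST):
    """Return True when the installed version equals the required one."""
    return parse_version(installed) == parse_version(required)


def check_installed_xgboost():
    """Return the installed xgboost version string, or None if it cannot be imported."""
    try:
        import xgboost
    except ImportError:
        return None
    return xgboost.__version__


if __name__ == "__main__":
    installed = check_installed_xgboost()
    if installed is None:
        print("xgboost is not installed in this environment")
    elif xgboost_matches(installed):
        print(f"xgboost {installed} matches the required {REQUIRED_XGBOOST}")
    else:
        print(f"xgboost {installed} found; try: conda install xgboost={REQUIRED_XGBOOST}")
```

Run it with the Python interpreter from the PhageBoost environment. The traceback above shows PhageBoost being loaded from ~/.local/lib/python3.7/site-packages, so it may be worth checking that the interpreter and the package come from the same place.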