dzra closed this issue 3 years ago
Hi Ryan,
Ugh yeah, that sounds unfortunate, and should not happen! Can you send me the output of pip freeze? The threshold estimation runs through the MOODS library, so I am wondering whether a specific version might be causing it to fail.
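For anyone comparing versions across machines, here is a minimal sketch of pulling the relevant pins out of a pip freeze dump. The helper name parse_pip_freeze and the choice of packages to print are illustrative assumptions; only plain name==version entries are handled.

```python
def parse_pip_freeze(text):
    """Map lowercased package name -> version from `pip freeze` output.

    Hypothetical helper: splits on any whitespace, so it works whether the
    output is one package per line or pasted as a single long line.
    """
    versions = {}
    for token in text.split():
        # Only plain `name==version` pins are handled; other forms are skipped.
        if "==" in token:
            name, version = token.split("==", 1)
            versions[name.lower()] = version
    return versions


# A few pins copied from the output posted in this thread.
freeze_output = "MOODS-python==1.9.4.1 numpy==1.16.4 gimmemotifs==0.15.0 tobias==0.12.4"
versions = parse_pip_freeze(freeze_output)

# Which packages matter is an assumption; adjust the tuple as needed.
for package in ("moods-python", "tobias", "numpy"):
    print(package, versions.get(package, "not installed"))
```

Running the same helper on the output from two machines and comparing the resulting dictionaries gives a quick view of which versions differ.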
Ok thanks!
[s161282@Nucleus005 ~]$ module load python/3.7.x-anaconda
[s161282@Nucleus005 ~]$ pip freeze
adjustText==0.7.3
alabaster==0.7.12
anaconda-client==1.7.2
anaconda-navigator==1.9.7
anaconda-project==0.8.3
appdirs==1.4.4
asn1crypto==0.24.0
astroid==2.2.5
astropy==3.2.1
atomicwrites==1.3.0
attrs==19.1.0
Babel==2.7.0
backcall==0.1.0
backports.functools-lru-cache==1.5
backports.os==0.1.1
backports.shutil-get-terminal-size==1.0.0
backports.tempfile==1.0
backports.weakref==1.0.post1
beautifulsoup4==4.7.1
biopython==1.78
bitarray==0.9.3
bkcharts==0.2
bleach==3.1.0
bokeh==1.2.0
boltons==20.2.1
boto==2.49.0
boto3==1.16.23
botocore==1.19.23
Bottleneck==1.2.1
BucketCache==0.12.1
certifi==2019.6.16
cffi==1.12.3
chardet==3.0.4
Click==7.0
cloudpickle==1.2.1
clyent==1.2.2
colorama==0.4.1
conda==4.7.10
conda-build==3.18.8
conda-package-handling==1.3.11
conda-verify==3.4.2
configparser==5.0.1
configs==3.0.3
contextlib2==0.5.5
cryptography==2.7
cycler==0.10.0
Cython==0.29.12
cytoolz==0.10.0
dask==2.1.0
decorator==4.4.0
deepTools==3.5.0
deeptoolsintervals==0.1.9
defusedxml==0.6.0
diskcache==5.1.0
distributed==2.1.0
docutils==0.14
entrypoints==0.3
et-xmlfile==1.0.1
fastcache==1.1.0
filelock==3.0.12
Flask==1.1.1
future==0.17.1
genomepy==0.9.1
gevent==1.4.0
gimmemotifs==0.15.0
glob2==0.7
gmpy2==2.0.8
greenlet==0.4.15
h5py==2.9.0
heapdict==1.0.0
html5lib==1.0.1
idna==2.8
imageio==2.5.0
imagesize==1.1.0
ipykernel==5.1.1
ipython==7.6.1
ipython-genutils==0.2.0
ipywidgets==7.5.0
isort==4.3.21
itsdangerous==1.1.0
jdcal==1.4.1
jedi==0.13.3
jeepney==0.4
Jinja2==2.10.1
jmespath==0.10.0
joblib==0.13.2
json5==0.8.4
jsonschema==3.0.1
jupyter==1.0.0
jupyter-client==5.3.1
jupyter-console==6.0.0
jupyter-core==4.5.0
jupyterlab==1.0.2
jupyterlab-server==1.0.0
keyring==18.0.0
kiwisolver==1.1.0
kneed==0.7.0
lazy-object-proxy==1.4.1
libarchive-c==2.8
lief==0.9.0
llvmlite==0.29.0
locket==0.2.0
Logbook==1.5.3
logomaker==0.8
lxml==4.3.4
MarkupSafe==1.1.1
matplotlib==3.1.0
mccabe==0.6.1
mistune==0.8.4
mkl-fft==1.0.12
mkl-random==1.0.2
mkl-service==2.0.2
mock==3.0.5
MOODS-python==1.9.4.1
more-itertools==7.0.0
mpmath==1.1.0
msgpack==0.6.1
multipledispatch==0.6.0
navigator-updater==0.2.1
nbconvert==5.5.0
nbformat==4.4.0
networkx==2.3
nltk==3.4.4
norns==0.1.5
nose==1.3.7
notebook==6.0.0
numba==0.44.1
numexpr==2.6.9
numpy==1.16.4
numpydoc==0.9.1
olefile==0.46
openpyxl==2.6.2
packaging==19.0
pandas==1.1.4
pandocfilters==1.4.2
parso==0.5.0
partd==1.0.0
path.py==12.0.1
pathlib2==2.3.4
patsy==0.5.1
pep8==1.7.1
pexpect==4.7.0
pickleshare==0.7.5
Pillow==6.1.0
pkginfo==1.5.0.1
plotly==4.12.0
pluggy==0.12.0
ply==3.11
prometheus-client==0.7.1
prompt-toolkit==2.0.9
psutil==5.6.3
ptyprocess==0.6.0
py==1.8.0
py2bit==0.3.0
pybedtools==0.8.1
pyBigWig==0.3.17
pycodestyle==2.5.0
pycosat==0.6.3
pycparser==2.19
pycrypto==2.6.1
pycurl==7.43.0.3
pyfaidx==0.5.9.1
pyflakes==2.1.1
Pygments==2.4.2
pylint==2.3.1
pyodbc==4.0.26
pyOpenSSL==19.0.0
pyparsing==2.4.0
PyPDF2==1.26.0
pyrsistent==0.14.11
pysam==0.16.0.1
PySocks==1.7.0
pytest==5.0.1
pytest-arraydiff==0.3
pytest-astropy==0.5.0
pytest-doctestplus==0.3.0
pytest-openfiles==0.3.2
pytest-remotedata==0.3.1
python-dateutil==2.8.0
pytz==2019.1
PyWavelets==1.0.3
PyYAML==5.1.1
pyzmq==18.0.0
qnorm==0.6.2
QtAwesome==0.5.7
qtconsole==4.5.1
QtPy==1.8.0
reportlab==3.5.55
Represent==1.6.0
requests==2.22.0
retrying==1.3.3
rope==0.14.0
ruamel-yaml==0.15.46
s3transfer==0.3.3
scikit-image==0.15.0
scikit-learn==0.21.2
scipy==1.3.0
seaborn==0.11.0
SecretStorage==3.1.1
Send2Trash==1.5.0
simplegeneric==0.8.1
singledispatch==3.4.0.3
six==1.12.0
snowballstemmer==1.9.0
sortedcollections==1.1.2
sortedcontainers==2.1.0
soupsieve==1.8
Sphinx==2.1.2
sphinxcontrib-applehelp==1.0.1
sphinxcontrib-devhelp==1.0.1
sphinxcontrib-htmlhelp==1.0.2
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.2
sphinxcontrib-serializinghtml==1.1.3
sphinxcontrib-websupport==1.1.2
spyder==3.3.6
spyder-kernels==0.5.1
SQLAlchemy==1.3.5
statistics==1.0.3.5
statsmodels==0.10.0
svist4get==1.2.24
sympy==1.4
tables==3.5.2
tblib==1.4.0
terminado==0.8.2
testpath==0.4.2
tobias==0.12.4
toolz==0.10.0
tornado==6.0.3
tqdm==4.32.1
traitlets==4.3.2
unicodecsv==0.14.1
urllib3==1.25.11
Wand==0.6.4
wcwidth==0.1.7
webencodings==0.5.1
Werkzeug==0.15.4
widgetsnbextension==3.5.0
wrapt==1.11.2
wurlitzer==1.0.2
xdg==5.0.1
xgboost==1.2.1
xlrd==1.2.0
XlsxWriter==1.1.8
xlwt==1.3.0
xxhash==2.0.0
zict==1.0.0
zipp==0.5.1
Thanks for sending the versions over - I cannot reproduce the error for that version, so I am not exactly sure what is going on...
Do you get the same behavior when using the test data? You can obtain it using TOBIAS DownloadData (more info here). You can then use any of the test runs from here, such as:
TOBIAS BINDetect --motifs test_data/motifs.jaspar --signals test_data/Bcell_footprints.bw test_data/Tcell_footprints.bw --genome test_data/genome.fa.gz --peaks test_data/merged_peaks_annotated.bed --peak_header test_data/merged_peaks_annotated_header.txt --outdir BINDetect_output --cond_names Bcell Tcell --cores 8
If yes, we can rule out your data as the cause.
So, after playing around with this a bit, I realized that I can actually run BINDetect just fine from my command line; the issue only occurs when I run it in a batch script. Most likely there is some environment problem I need to sort out, but for now running from the command line is fine (though a bit slow).
Hi, thanks for the update! Hmm, that seems odd, but maybe you are right that the batch environment has a different setup than your command line. Since it seems to be an environment/system-specific issue, I will close this, but let me know if you have any other issues.
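One way to narrow down a batch-versus-interactive difference is to record the same environment snapshot in both contexts and compare them. The sketch below is only that: the function names and the list of variables are assumptions chosen for illustration, not anything TOBIAS provides.

```python
import json
import os
import sys

# Variables that often differ between a login shell and a batch job.
# This list is an assumption and is not exhaustive.
KEYS = ("PATH", "LD_LIBRARY_PATH", "PYTHONPATH", "CONDA_PREFIX")


def snapshot_environment(environ=None):
    """Collect a few environment details as a plain dict (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    snapshot = {key: environ.get(key, "") for key in KEYS}
    snapshot["python_executable"] = sys.executable
    return snapshot


def diff_snapshots(first, second):
    """Return {key: (first_value, second_value)} for every key that differs."""
    keys = sorted(set(first) | set(second))
    return {k: (first.get(k), second.get(k)) for k in keys if first.get(k) != second.get(k)}


# Print the snapshot as JSON so it can be saved once from the command line
# and once from inside the batch script, then compared.
print(json.dumps(snapshot_environment(), indent=2, sort_keys=True))
```

Saving the printed JSON from each context and feeding both dictionaries to diff_snapshots shows, for example, whether the batch job picks up a different Python interpreter or library path.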
I encountered the same issue. I ended up getting it to work on another cluster. Both clusters also displayed this warning:
fc-list: error while loading shared libraries: libexpat.so.0: cannot open shared object file: No such file or directory
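The warning above means the dynamic loader could not find libexpat.so.0 when fc-list (the font-listing tool from fontconfig) started. Whether that is related to the hang is not established in this thread. As a rough check, the sketch below asks the loader directly whether a shared library can be opened from the current environment; the helper name can_load is hypothetical, and libexpat.so.1 is included only as a comparison on the assumption that a newer soname may be installed.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        # Raised when the library is missing or cannot be loaded.
        return False
    return True


# libexpat.so.0 is the name from the warning; libexpat.so.1 is an assumed
# alternative that may or may not exist on a given system.
for name in ("libexpat.so.0", "libexpat.so.1"):
    print(name, "loadable" if can_load(name) else "NOT loadable")
```

Running this both interactively and inside the batch job would show whether the two contexts resolve shared libraries differently.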
I am running the following command in a bash script:
module load python/3.7.x-anaconda
TOBIAS BINDetect --motifs /endosome/work/GCRB/s161282/Databases/jaspar_motif_files/jaspar_motifs.meme \
    --signals $dir/whole_genome/test/H3.3KO_scored.bw $dir/whole_genome/test/WT_scored.bw \
    --genome /endosome/work/GCRB/s161282/Databases/mm10/concat_genome/mm10_genome.fa \
    --peaks $dir/whole_genome/MACS2_out/WT_peaks.bed \
    --outdir $dir/whole_genome/test --cond_names H3.3KO WT --cores 24 --skip-excel --verbosity 4
I get the following in stdout:
2020-11-24 22:24:29 (10377) [DEBUG] Worker cores: 22
2020-11-24 22:24:29 (10377) [DEBUG] Writer cores: 2
2020-11-24 22:24:29 (10377) [INFO] ----- Processing input data -----
2020-11-24 22:24:29 (10377) [INFO] Checking reading/writing of files
2020-11-24 22:24:30 (10377) [INFO] Reading peaks
2020-11-24 22:24:30 (10377) [INFO] - Found 77887 regions in input peaks
2020-11-24 22:24:30 (10377) [INFO] - Merged to 77887 regions
2020-11-24 22:24:30 (10377) [DEBUG] Peak header list: ['peak_chr', 'peak_start', 'peak_end', 'additional_1']
2020-11-24 22:24:30 (10377) [INFO] Checking for match between --peaks and --fasta/--signals boundaries
2020-11-24 22:24:30 (10377) [INFO] - Comparing peaks to /endosome/work/GCRB/s161282/Databases/mm10/concat_genome/mm10_genome.fa
2020-11-24 22:24:30 (10377) [DEBUG] Fasta boundaries: {'chr10': 130694993, 'chr11': 122082543, 'chr12': 120129022, 'chr13': 120421639, 'chr14': 124902244, 'chr15': 104043685, 'chr16': 98207768, 'chr17': 94987271, 'chr18': 90702639, 'chr19': 61431566, 'chr1': 195471971, 'chr1_GL456210_random': 169725, 'chr1_GL456211_random': 241735, 'chr1_GL456212_random': 153618, 'chr1_GL456213_random': 39340, 'chr1_GL456221_random': 206961, 'chr2': 182113224, 'chr3': 160039680, 'chr4': 156508116, 'chr4_GL456216_random': 66673, 'chr4_GL456350_random': 227966, 'chr4_JH584292_random': 14945, 'chr4_JH584293_random': 207968, 'chr4_JH584294_random': 191905, 'chr4_JH584295_random': 1976, 'chr5': 151834684, 'chr5_GL456354_random': 195993, 'chr5_JH584296_random': 199368, 'chr5_JH584297_random': 205776, 'chr5_JH584298_random': 184189, 'chr5_JH584299_random': 953012, 'chr6': 149736546, 'chr7': 145441459, 'chr7_GL456219_random': 175968, 'chr8': 129401213, 'chr9': 124595110, 'chrM': 16299, 'chrUn_GL456239': 40056, 'chrUn_GL456359': 22974, 'chrUn_GL456360': 31704, 'chrUn_GL456366': 47073, 'chrUn_GL456367': 42057, 'chrUn_GL456368': 20208, 'chrUn_GL456370': 26764, 'chrUn_GL456372': 28664, 'chrUn_GL456378': 31602, 'chrUn_GL456379': 72385, 'chrUn_GL456381': 25871, 'chrUn_GL456382': 23158, 'chrUn_GL456383': 38659, 'chrUn_GL456385': 35240, 'chrUn_GL456387': 24685, 'chrUn_GL456389': 28772, 'chrUn_GL456390': 24668, 'chrUn_GL456392': 23629, 'chrUn_GL456393': 55711, 'chrUn_GL456394': 24323, 'chrUn_GL456396': 21240, 'chrUn_JH584304': 114452, 'chrX': 171031299, 'chrX_GL456233_random': 336933, 'chrY': 91744698, 'chrY_JH584300_random': 182347, 'chrY_JH584301_random': 259875, 'chrY_JH584302_random': 155838, 'chrY_JH584303_random': 158099}
2020-11-24 22:24:30 (10377) [INFO] - Comparing peaks to /endosome/work/GCRB/s161282/h33_atac_pipe_test/whole_genome/test/H3.3KO_scored.bw
2020-11-24 22:24:30 (10377) [DEBUG] Signal boundaries: {'chr1': 195471971, 'chr10': 130694993, 'chr11': 122082543, 'chr12': 120129022, 'chr13': 120421639, 'chr14': 124902244, 'chr15': 104043685, 'chr16': 98207768, 'chr17': 94987271, 'chr18': 90702639, 'chr19': 61431566, 'chr1_GL456210_random': 169725, 'chr1_GL456211_random': 241735, 'chr1_GL456212_random': 153618, 'chr1_GL456213_random': 39340, 'chr1_GL456221_random': 206961, 'chr2': 182113224, 'chr3': 160039680, 'chr4': 156508116, 'chr4_GL456216_random': 66673, 'chr4_GL456350_random': 227966, 'chr4_JH584292_random': 14945, 'chr4_JH584293_random': 207968, 'chr4_JH584294_random': 191905, 'chr4_JH584295_random': 1976, 'chr5': 151834684, 'chr5_GL456354_random': 195993, 'chr5_JH584296_random': 199368, 'chr5_JH584297_random': 205776, 'chr5_JH584298_random': 184189, 'chr5_JH584299_random': 953012, 'chr6': 149736546, 'chr7': 145441459, 'chr7_GL456219_random': 175968, 'chr8': 129401213, 'chr9': 124595110, 'chrM': 16299, 'chrUn_GL456239': 40056, 'chrUn_GL456359': 22974, 'chrUn_GL456360': 31704, 'chrUn_GL456366': 47073, 'chrUn_GL456367': 42057, 'chrUn_GL456368': 20208, 'chrUn_GL456370': 26764, 'chrUn_GL456372': 28664, 'chrUn_GL456378': 31602, 'chrUn_GL456379': 72385, 'chrUn_GL456381': 25871, 'chrUn_GL456382': 23158, 'chrUn_GL456383': 38659, 'chrUn_GL456385': 35240, 'chrUn_GL456387': 24685, 'chrUn_GL456389': 28772, 'chrUn_GL456390': 24668, 'chrUn_GL456392': 23629, 'chrUn_GL456393': 55711, 'chrUn_GL456394': 24323, 'chrUn_GL456396': 21240, 'chrUn_JH584304': 114452, 'chrX': 171031299, 'chrX_GL456233_random': 336933, 'chrY': 91744698, 'chrY_JH584300_random': 182347, 'chrY_JH584301_random': 259875, 'chrY_JH584302_random': 155838, 'chrY_JH584303_random': 158099}
2020-11-24 22:24:31 (10377) [INFO] - Comparing peaks to /endosome/work/GCRB/s161282/h33_atac_pipe_test/whole_genome/test/WT_scored.bw
2020-11-24 22:24:31 (10377) [DEBUG] Signal boundaries: {'chr1': 195471971, 'chr10': 130694993, 'chr11': 122082543, 'chr12': 120129022, 'chr13': 120421639, 'chr14': 124902244, 'chr15': 104043685, 'chr16': 98207768, 'chr17': 94987271, 'chr18': 90702639, 'chr19': 61431566, 'chr1_GL456210_random': 169725, 'chr1_GL456211_random': 241735, 'chr1_GL456212_random': 153618, 'chr1_GL456213_random': 39340, 'chr1_GL456221_random': 206961, 'chr2': 182113224, 'chr3': 160039680, 'chr4': 156508116, 'chr4_GL456216_random': 66673, 'chr4_GL456350_random': 227966, 'chr4_JH584292_random': 14945, 'chr4_JH584293_random': 207968, 'chr4_JH584294_random': 191905, 'chr4_JH584295_random': 1976, 'chr5': 151834684, 'chr5_GL456354_random': 195993, 'chr5_JH584296_random': 199368, 'chr5_JH584297_random': 205776, 'chr5_JH584298_random': 184189, 'chr5_JH584299_random': 953012, 'chr6': 149736546, 'chr7': 145441459, 'chr7_GL456219_random': 175968, 'chr8': 129401213, 'chr9': 124595110, 'chrM': 16299, 'chrUn_GL456239': 40056, 'chrUn_GL456359': 22974, 'chrUn_GL456360': 31704, 'chrUn_GL456366': 47073, 'chrUn_GL456367': 42057, 'chrUn_GL456368': 20208, 'chrUn_GL456370': 26764, 'chrUn_GL456372': 28664, 'chrUn_GL456378': 31602, 'chrUn_GL456379': 72385, 'chrUn_GL456381': 25871, 'chrUn_GL456382': 23158, 'chrUn_GL456383': 38659, 'chrUn_GL456385': 35240, 'chrUn_GL456387': 24685, 'chrUn_GL456389': 28772, 'chrUn_GL456390': 24668, 'chrUn_GL456392': 23629, 'chrUn_GL456393': 55711, 'chrUn_GL456394': 24323, 'chrUn_GL456396': 21240, 'chrUn_JH584304': 114452, 'chrX': 171031299, 'chrX_GL456233_random': 336933, 'chrY': 91744698, 'chrY_JH584300_random': 182347, 'chrY_JH584301_random': 259875, 'chrY_JH584302_random': 155838, 'chrY_JH584303_random': 158099}
2020-11-24 22:24:31 (10377) [INFO] Estimating GC content from peak sequences
2020-11-24 22:24:31 (10377) [INFO] - GC content estimated at 56.35%
2020-11-24 22:24:31 (10377) [INFO] Reading motifs from file
2020-11-24 22:24:31 (10377) [INFO] - Read 746 motifs
2020-11-24 22:24:31 (10377) [DEBUG] Getting motifs ready
2020-11-24 22:24:31 (10377) [DEBUG] Getting match threshold per motif
Everything seems fine until it sits here at "Getting match threshold per motif" for 9+ hours. I have tried using only one motif, and I have tried using a couple of motifs. I have tried removing TOBIAS completely and reinstalling it, and I have tried creating and activating TOBIAS_ENV. Once, BINDetect completed successfully when scanning for a single motif; however, I was not able to repeat this, even when running the same single motif or the same command. Please let me know what may be causing the issue or what additional information you might need to help.
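To pin down where a run stalled, it can help to split the captured stdout into individual entries and look at the last one. The sketch below assumes the entry layout seen in the output above (timestamp, process id in parentheses, level in brackets, message); the function name split_entries is hypothetical.

```python
import re

# Start of an entry as it appears in the output above:
# "YYYY-MM-DD HH:MM:SS (pid) [LEVEL] ". This layout is inferred from the
# pasted log, not taken from TOBIAS documentation.
ENTRY_START = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \((\d+)\) \[([A-Z]+)\] ")


def split_entries(log_text):
    """Split log text into (timestamp, pid, level, message) tuples.

    Works on text where entries are separated by newlines or run together
    on one line, because entry boundaries are found by their timestamps.
    """
    starts = list(ENTRY_START.finditer(log_text))
    entries = []
    for index, match in enumerate(starts):
        end = starts[index + 1].start() if index + 1 < len(starts) else len(log_text)
        message = log_text[match.end():end].strip()
        entries.append((match.group(1), int(match.group(2)), match.group(3), message))
    return entries


sample = (
    "2020-11-24 22:24:31 (10377) [INFO] - Read 746 motifs "
    "2020-11-24 22:24:31 (10377) [DEBUG] Getting motifs ready "
    "2020-11-24 22:24:31 (10377) [DEBUG] Getting match threshold per motif"
)
entries = split_entries(sample)
print("last entry:", entries[-1])
```

The last tuple identifies the stage that was reached before output stopped, which is the piece of information most useful to include in a report.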
Thank you, Ryan
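Since the job sat for 9+ hours without progressing, a batch script could wrap the call in a time limit so that a stall fails fast instead of occupying the allocation. This is a minimal sketch using Python's subprocess module; run_with_timeout is a hypothetical helper, the time limit is arbitrary, and the slow command is a stand-in to be replaced with the real TOBIAS BINDetect argument list.

```python
import subprocess
import sys


def run_with_timeout(command, timeout_seconds):
    """Run a command; return (finished, returncode).

    finished is False and returncode is None if the time limit was hit.
    """
    try:
        completed = subprocess.run(command, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child it started before raising this.
        return False, None
    return True, completed.returncode


# Stand-in for a stalled job (an assumption for demonstration only):
# a Python process that sleeps far longer than the time limit.
slow_command = [sys.executable, "-c", "import time; time.sleep(10)"]
finished, returncode = run_with_timeout(slow_command, timeout_seconds=0.5)
print("finished" if finished else "timed out", returncode)
```

One caveat: on timeout, subprocess.run terminates only the process it launched directly, so any worker processes that the command itself spawned may need separate cleanup by the scheduler.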