loosolab / TOBIAS

Transcription factor Occupancy prediction By Investigation of ATAC-seq Signal
MIT License

BINDetect getting "stuck" #43

Closed dzra closed 3 years ago

dzra commented 3 years ago

I am running the following command in a bash script:

module load python/3.7.x-anaconda
TOBIAS BINDetect --motifs /endosome/work/GCRB/s161282/Databases/jaspar_motif_files/jaspar_motifs.meme \
  --signals $dir/whole_genome/test/H3.3KO_scored.bw $dir/whole_genome/test/WT_scored.bw \
  --genome /endosome/work/GCRB/s161282/Databases/mm10/concat_genome/mm10_genome.fa \
  --peaks $dir/whole_genome/MACS2_out/WT_peaks.bed \
  --outdir $dir/whole_genome/test --cond_names H3.3KO WT --cores 24 --skip-excel --verbosity 4

I get the following in stdout:

2020-11-24 22:24:29 (10377) [DEBUG] Worker cores: 22
2020-11-24 22:24:29 (10377) [DEBUG] Writer cores: 2
2020-11-24 22:24:29 (10377) [INFO] ----- Processing input data -----
2020-11-24 22:24:29 (10377) [INFO] Checking reading/writing of files
2020-11-24 22:24:30 (10377) [INFO] Reading peaks
2020-11-24 22:24:30 (10377) [INFO] - Found 77887 regions in input peaks
2020-11-24 22:24:30 (10377) [INFO] - Merged to 77887 regions
2020-11-24 22:24:30 (10377) [DEBUG] Peak header list: ['peak_chr', 'peak_start', 'peak_end', 'additional_1']
2020-11-24 22:24:30 (10377) [INFO] Checking for match between --peaks and --fasta/--signals boundaries
2020-11-24 22:24:30 (10377) [INFO] - Comparing peaks to /endosome/work/GCRB/s161282/Databases/mm10/concat_genome/mm10_genome.fa
2020-11-24 22:24:30 (10377) [DEBUG] Fasta boundaries: {'chr10': 130694993, 'chr11': 122082543, 'chr12': 120129022, 'chr13': 120421639, 'chr14': 124902244, 'chr15': 104043685, 'chr16': 98207768, 'chr17': 94987271, 'chr18': 90702639, 'chr19': 61431566, 'chr1': 195471971, 'chr1_GL456210_random': 169725, 'chr1_GL456211_random': 241735, 'chr1_GL456212_random': 153618, 'chr1_GL456213_random': 39340, 'chr1_GL456221_random': 206961, 'chr2': 182113224, 'chr3': 160039680, 'chr4': 156508116, 'chr4_GL456216_random': 66673, 'chr4_GL456350_random': 227966, 'chr4_JH584292_random': 14945, 'chr4_JH584293_random': 207968, 'chr4_JH584294_random': 191905, 'chr4_JH584295_random': 1976, 'chr5': 151834684, 'chr5_GL456354_random': 195993, 'chr5_JH584296_random': 199368, 'chr5_JH584297_random': 205776, 'chr5_JH584298_random': 184189, 'chr5_JH584299_random': 953012, 'chr6': 149736546, 'chr7': 145441459, 'chr7_GL456219_random': 175968, 'chr8': 129401213, 'chr9': 124595110, 'chrM': 16299, 'chrUn_GL456239': 40056, 'chrUn_GL456359': 22974, 'chrUn_GL456360': 31704, 'chrUn_GL456366': 47073, 'chrUn_GL456367': 42057, 'chrUn_GL456368': 20208, 'chrUn_GL456370': 26764, 'chrUn_GL456372': 28664, 'chrUn_GL456378': 31602, 'chrUn_GL456379': 72385, 'chrUn_GL456381': 25871, 'chrUn_GL456382': 23158, 'chrUn_GL456383': 38659, 'chrUn_GL456385': 35240, 'chrUn_GL456387': 24685, 'chrUn_GL456389': 28772, 'chrUn_GL456390': 24668, 'chrUn_GL456392': 23629, 'chrUn_GL456393': 55711, 'chrUn_GL456394': 24323, 'chrUn_GL456396': 21240, 'chrUn_JH584304': 114452, 'chrX': 171031299, 'chrX_GL456233_random': 336933, 'chrY': 91744698, 'chrY_JH584300_random': 182347, 'chrY_JH584301_random': 259875, 'chrY_JH584302_random': 155838, 'chrY_JH584303_random': 158099}
2020-11-24 22:24:30 (10377) [INFO] - Comparing peaks to /endosome/work/GCRB/s161282/h33_atac_pipe_test/whole_genome/test/H3.3KO_scored.bw
2020-11-24 22:24:30 (10377) [DEBUG] Signal boundaries: {'chr1': 195471971, 'chr10': 130694993, 'chr11': 122082543, 'chr12': 120129022, 'chr13': 120421639, 'chr14': 124902244, 'chr15': 104043685, 'chr16': 98207768, 'chr17': 94987271, 'chr18': 90702639, 'chr19': 61431566, 'chr1_GL456210_random': 169725, 'chr1_GL456211_random': 241735, 'chr1_GL456212_random': 153618, 'chr1_GL456213_random': 39340, 'chr1_GL456221_random': 206961, 'chr2': 182113224, 'chr3': 160039680, 'chr4': 156508116, 'chr4_GL456216_random': 66673, 'chr4_GL456350_random': 227966, 'chr4_JH584292_random': 14945, 'chr4_JH584293_random': 207968, 'chr4_JH584294_random': 191905, 'chr4_JH584295_random': 1976, 'chr5': 151834684, 'chr5_GL456354_random': 195993, 'chr5_JH584296_random': 199368, 'chr5_JH584297_random': 205776, 'chr5_JH584298_random': 184189, 'chr5_JH584299_random': 953012, 'chr6': 149736546, 'chr7': 145441459, 'chr7_GL456219_random': 175968, 'chr8': 129401213, 'chr9': 124595110, 'chrM': 16299, 'chrUn_GL456239': 40056, 'chrUn_GL456359': 22974, 'chrUn_GL456360': 31704, 'chrUn_GL456366': 47073, 'chrUn_GL456367': 42057, 'chrUn_GL456368': 20208, 'chrUn_GL456370': 26764, 'chrUn_GL456372': 28664, 'chrUn_GL456378': 31602, 'chrUn_GL456379': 72385, 'chrUn_GL456381': 25871, 'chrUn_GL456382': 23158, 'chrUn_GL456383': 38659, 'chrUn_GL456385': 35240, 'chrUn_GL456387': 24685, 'chrUn_GL456389': 28772, 'chrUn_GL456390': 24668, 'chrUn_GL456392': 23629, 'chrUn_GL456393': 55711, 'chrUn_GL456394': 24323, 'chrUn_GL456396': 21240, 'chrUn_JH584304': 114452, 'chrX': 171031299, 'chrX_GL456233_random': 336933, 'chrY': 91744698, 'chrY_JH584300_random': 182347, 'chrY_JH584301_random': 259875, 'chrY_JH584302_random': 155838, 'chrY_JH584303_random': 158099}
2020-11-24 22:24:31 (10377) [INFO] - Comparing peaks to /endosome/work/GCRB/s161282/h33_atac_pipe_test/whole_genome/test/WT_scored.bw
2020-11-24 22:24:31 (10377) [DEBUG] Signal boundaries: {'chr1': 195471971, 'chr10': 130694993, 'chr11': 122082543, 'chr12': 120129022, 'chr13': 120421639, 'chr14': 124902244, 'chr15': 104043685, 'chr16': 98207768, 'chr17': 94987271, 'chr18': 90702639, 'chr19': 61431566, 'chr1_GL456210_random': 169725, 'chr1_GL456211_random': 241735, 'chr1_GL456212_random': 153618, 'chr1_GL456213_random': 39340, 'chr1_GL456221_random': 206961, 'chr2': 182113224, 'chr3': 160039680, 'chr4': 156508116, 'chr4_GL456216_random': 66673, 'chr4_GL456350_random': 227966, 'chr4_JH584292_random': 14945, 'chr4_JH584293_random': 207968, 'chr4_JH584294_random': 191905, 'chr4_JH584295_random': 1976, 'chr5': 151834684, 'chr5_GL456354_random': 195993, 'chr5_JH584296_random': 199368, 'chr5_JH584297_random': 205776, 'chr5_JH584298_random': 184189, 'chr5_JH584299_random': 953012, 'chr6': 149736546, 'chr7': 145441459, 'chr7_GL456219_random': 175968, 'chr8': 129401213, 'chr9': 124595110, 'chrM': 16299, 'chrUn_GL456239': 40056, 'chrUn_GL456359': 22974, 'chrUn_GL456360': 31704, 'chrUn_GL456366': 47073, 'chrUn_GL456367': 42057, 'chrUn_GL456368': 20208, 'chrUn_GL456370': 26764, 'chrUn_GL456372': 28664, 'chrUn_GL456378': 31602, 'chrUn_GL456379': 72385, 'chrUn_GL456381': 25871, 'chrUn_GL456382': 23158, 'chrUn_GL456383': 38659, 'chrUn_GL456385': 35240, 'chrUn_GL456387': 24685, 'chrUn_GL456389': 28772, 'chrUn_GL456390': 24668, 'chrUn_GL456392': 23629, 'chrUn_GL456393': 55711, 'chrUn_GL456394': 24323, 'chrUn_GL456396': 21240, 'chrUn_JH584304': 114452, 'chrX': 171031299, 'chrX_GL456233_random': 336933, 'chrY': 91744698, 'chrY_JH584300_random': 182347, 'chrY_JH584301_random': 259875, 'chrY_JH584302_random': 155838, 'chrY_JH584303_random': 158099}
2020-11-24 22:24:31 (10377) [INFO] Estimating GC content from peak sequences
2020-11-24 22:24:31 (10377) [INFO] - GC content estimated at 56.35%
2020-11-24 22:24:31 (10377) [INFO] Reading motifs from file
2020-11-24 22:24:31 (10377) [INFO] - Read 746 motifs
2020-11-24 22:24:31 (10377) [DEBUG] Getting motifs ready
2020-11-24 22:24:31 (10377) [DEBUG] Getting match threshold per motif

Everything seems fine until it reaches "Getting match threshold per motif", where it sits for 9+ hours. I have tried using only one motif, and I have tried using a couple of motifs. I have also tried removing TOBIAS completely and re-installing it, and creating/activating TOBIAS_ENV. BINDetect did complete successfully once when scanning for a single motif; however, I could not repeat this, even with the same single motif and the same command. Please let me know what may be causing the issue or what additional information you might need to help.

Thank you, Ryan
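
For a batch job that can sit at one step for hours, a time limit around the command at least makes the failure visible quickly. The following is a minimal sketch, not part of TOBIAS: `run_with_timeout` is a hypothetical helper, the stand-in command just sleeps, and it assumes a POSIX system. Whether a real BINDetect run shuts down cleanly on SIGTERM has not been checked here.

```python
import os
import signal
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd; return its exit code, or None if it was terminated after timeout_s seconds.

    Hypothetical helper for illustration only (POSIX).
    """
    # start_new_session puts the child in its own process group, so any
    # worker processes it spawns can be signalled together with it.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # The child is the leader of its new group, so its pid is the group id.
        os.killpg(proc.pid, signal.SIGTERM)
        proc.wait()
        return None


if __name__ == "__main__":
    # Stand-in for the real command line: a process that would run for 30 s.
    stuck = [sys.executable, "-c", "import time; time.sleep(30)"]
    code = run_with_timeout(stuck, timeout_s=1)
    print("timed out" if code is None else f"exit code {code}")
```

In a real batch script the stand-in list would be replaced by the actual command and its arguments, with a limit comfortably above the runtime seen on the interactive command line.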

msbentsen commented 3 years ago

Hi Ryan, Ugh yea, that sounds unfortunate, and should not happen! Can you send me the output of pip freeze? The threshold estimation runs through the MOODS library, so I am wondering if a specific version might be causing it to fail.

dzra commented 3 years ago

Ok thanks!

[s161282@Nucleus005 ~]$ module load python/3.7.x-anaconda
[s161282@Nucleus005 ~]$ pip freeze
adjustText==0.7.3
alabaster==0.7.12
anaconda-client==1.7.2
anaconda-navigator==1.9.7
anaconda-project==0.8.3
appdirs==1.4.4
asn1crypto==0.24.0
astroid==2.2.5
astropy==3.2.1
atomicwrites==1.3.0
attrs==19.1.0
Babel==2.7.0
backcall==0.1.0
backports.functools-lru-cache==1.5
backports.os==0.1.1
backports.shutil-get-terminal-size==1.0.0
backports.tempfile==1.0
backports.weakref==1.0.post1
beautifulsoup4==4.7.1
biopython==1.78
bitarray==0.9.3
bkcharts==0.2
bleach==3.1.0
bokeh==1.2.0
boltons==20.2.1
boto==2.49.0
boto3==1.16.23
botocore==1.19.23
Bottleneck==1.2.1
BucketCache==0.12.1
certifi==2019.6.16
cffi==1.12.3
chardet==3.0.4
Click==7.0
cloudpickle==1.2.1
clyent==1.2.2
colorama==0.4.1
conda==4.7.10
conda-build==3.18.8
conda-package-handling==1.3.11
conda-verify==3.4.2
configparser==5.0.1
configs==3.0.3
contextlib2==0.5.5
cryptography==2.7
cycler==0.10.0
Cython==0.29.12
cytoolz==0.10.0
dask==2.1.0
decorator==4.4.0
deepTools==3.5.0
deeptoolsintervals==0.1.9
defusedxml==0.6.0
diskcache==5.1.0
distributed==2.1.0
docutils==0.14
entrypoints==0.3
et-xmlfile==1.0.1
fastcache==1.1.0
filelock==3.0.12
Flask==1.1.1
future==0.17.1
genomepy==0.9.1
gevent==1.4.0
gimmemotifs==0.15.0
glob2==0.7
gmpy2==2.0.8
greenlet==0.4.15
h5py==2.9.0
heapdict==1.0.0
html5lib==1.0.1
idna==2.8
imageio==2.5.0
imagesize==1.1.0
ipykernel==5.1.1
ipython==7.6.1
ipython-genutils==0.2.0
ipywidgets==7.5.0
isort==4.3.21
itsdangerous==1.1.0
jdcal==1.4.1
jedi==0.13.3
jeepney==0.4
Jinja2==2.10.1
jmespath==0.10.0
joblib==0.13.2
json5==0.8.4
jsonschema==3.0.1
jupyter==1.0.0
jupyter-client==5.3.1
jupyter-console==6.0.0
jupyter-core==4.5.0
jupyterlab==1.0.2
jupyterlab-server==1.0.0
keyring==18.0.0
kiwisolver==1.1.0
kneed==0.7.0
lazy-object-proxy==1.4.1
libarchive-c==2.8
lief==0.9.0
llvmlite==0.29.0
locket==0.2.0
Logbook==1.5.3
logomaker==0.8
lxml==4.3.4
MarkupSafe==1.1.1
matplotlib==3.1.0
mccabe==0.6.1
mistune==0.8.4
mkl-fft==1.0.12
mkl-random==1.0.2
mkl-service==2.0.2
mock==3.0.5
MOODS-python==1.9.4.1
more-itertools==7.0.0
mpmath==1.1.0
msgpack==0.6.1
multipledispatch==0.6.0
navigator-updater==0.2.1
nbconvert==5.5.0
nbformat==4.4.0
networkx==2.3
nltk==3.4.4
norns==0.1.5
nose==1.3.7
notebook==6.0.0
numba==0.44.1
numexpr==2.6.9
numpy==1.16.4
numpydoc==0.9.1
olefile==0.46
openpyxl==2.6.2
packaging==19.0
pandas==1.1.4
pandocfilters==1.4.2
parso==0.5.0
partd==1.0.0
path.py==12.0.1
pathlib2==2.3.4
patsy==0.5.1
pep8==1.7.1
pexpect==4.7.0
pickleshare==0.7.5
Pillow==6.1.0
pkginfo==1.5.0.1
plotly==4.12.0
pluggy==0.12.0
ply==3.11
prometheus-client==0.7.1
prompt-toolkit==2.0.9
psutil==5.6.3
ptyprocess==0.6.0
py==1.8.0
py2bit==0.3.0
pybedtools==0.8.1
pyBigWig==0.3.17
pycodestyle==2.5.0
pycosat==0.6.3
pycparser==2.19
pycrypto==2.6.1
pycurl==7.43.0.3
pyfaidx==0.5.9.1
pyflakes==2.1.1
Pygments==2.4.2
pylint==2.3.1
pyodbc==4.0.26
pyOpenSSL==19.0.0
pyparsing==2.4.0
PyPDF2==1.26.0
pyrsistent==0.14.11
pysam==0.16.0.1
PySocks==1.7.0
pytest==5.0.1
pytest-arraydiff==0.3
pytest-astropy==0.5.0
pytest-doctestplus==0.3.0
pytest-openfiles==0.3.2
pytest-remotedata==0.3.1
python-dateutil==2.8.0
pytz==2019.1
PyWavelets==1.0.3
PyYAML==5.1.1
pyzmq==18.0.0
qnorm==0.6.2
QtAwesome==0.5.7
qtconsole==4.5.1
QtPy==1.8.0
reportlab==3.5.55
Represent==1.6.0
requests==2.22.0
retrying==1.3.3
rope==0.14.0
ruamel-yaml==0.15.46
s3transfer==0.3.3
scikit-image==0.15.0
scikit-learn==0.21.2
scipy==1.3.0
seaborn==0.11.0
SecretStorage==3.1.1
Send2Trash==1.5.0
simplegeneric==0.8.1
singledispatch==3.4.0.3
six==1.12.0
snowballstemmer==1.9.0
sortedcollections==1.1.2
sortedcontainers==2.1.0
soupsieve==1.8
Sphinx==2.1.2
sphinxcontrib-applehelp==1.0.1
sphinxcontrib-devhelp==1.0.1
sphinxcontrib-htmlhelp==1.0.2
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.2
sphinxcontrib-serializinghtml==1.1.3
sphinxcontrib-websupport==1.1.2
spyder==3.3.6
spyder-kernels==0.5.1
SQLAlchemy==1.3.5
statistics==1.0.3.5
statsmodels==0.10.0
svist4get==1.2.24
sympy==1.4
tables==3.5.2
tblib==1.4.0
terminado==0.8.2
testpath==0.4.2
tobias==0.12.4
toolz==0.10.0
tornado==6.0.3
tqdm==4.32.1
traitlets==4.3.2
unicodecsv==0.14.1
urllib3==1.25.11
Wand==0.6.4
wcwidth==0.1.7
webencodings==0.5.1
Werkzeug==0.15.4
widgetsnbextension==3.5.0
wrapt==1.11.2
wurlitzer==1.0.2
xdg==5.0.1
xgboost==1.2.1
xlrd==1.2.0
XlsxWriter==1.1.8
xlwt==1.3.0
xxhash==2.0.0
zict==1.0.0
zipp==0.5.1
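
In a listing this long, the few pins that matter for BINDetect are easy to miss. As a rough sketch, the hypothetical helper below pulls selected packages out of saved `pip freeze` text; the function name and the choice of packages to look up are assumptions for illustration, and the sample string reuses four pins from the listing above.

```python
def pinned_versions(freeze_text, names):
    """Return {name: version} for the requested packages found in `pip freeze` text.

    Hypothetical helper; matching on the package name is case-insensitive.
    """
    wanted = {n.lower() for n in names}
    found = {}
    # split() handles both one-package-per-line and space-separated listings.
    for token in freeze_text.split():
        name, sep, version = token.partition("==")
        if sep and name.lower() in wanted:
            found[name] = version
    return found


# Four pins copied from the listing above.
sample = "matplotlib==3.1.0 MOODS-python==1.9.4.1 numpy==1.16.4 tobias==0.12.4"
print(pinned_versions(sample, ["MOODS-python", "tobias"]))
# {'MOODS-python': '1.9.4.1', 'tobias': '0.12.4'}
```

Running the same lookup on a freeze taken inside the batch job would show whether both contexts resolve the same MOODS and TOBIAS versions.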

msbentsen commented 3 years ago

Thanks for sending the versions over - I cannot reproduce the error for that version, so I am not exactly sure what is going on...

Do you get the same behavior when using the test data? You can obtain it using TOBIAS DownloadData (more info here). You can then use any of the test-runs from here such as:

TOBIAS BINDetect --motifs test_data/motifs.jaspar \
  --signals test_data/Bcell_footprints.bw test_data/Tcell_footprints.bw \
  --genome test_data/genome.fa.gz \
  --peaks test_data/merged_peaks_annotated.bed \
  --peak_header test_data/merged_peaks_annotated_header.txt \
  --outdir BINDetect_output --cond_names Bcell Tcell --cores 8

If it does, we can rule out a problem with your data.

dzra commented 3 years ago

So, after playing around with this some, I realized that I can actually run BINDetect just fine from my command line; the issue only occurs when I run it in a batch script. Most likely there is some environment problem I need to sort out, but for now running from the command line is fine (though a bit slow).
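
One way to narrow down a difference between the interactive shell and the batch job is to record the environment in both and compare. This is a minimal sketch under that assumption; `snapshot` and `diff_env` are hypothetical helper names, and nothing here is specific to TOBIAS or to any particular scheduler.

```python
import json
import os
import sys


def snapshot(path):
    """Write the interpreter path and all environment variables to a JSON file."""
    data = {"executable": sys.executable, "env": dict(os.environ)}
    with open(path, "w") as fh:
        json.dump(data, fh, indent=2, sort_keys=True)


def diff_env(a, b):
    """Return {var: (value_in_a, value_in_b)} for variables that differ.

    None marks a variable that is missing on that side.
    """
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Toy values for illustration; real input would come from two snapshot files.
interactive = {"PATH": "/opt/anaconda/bin:/usr/bin", "HOME": "/home/user"}
batch = {"PATH": "/usr/bin", "HOME": "/home/user", "TMPDIR": "/scratch"}
for var, (left, right) in diff_env(interactive, batch).items():
    print(f"{var}: interactive={left!r} batch={right!r}")
```

Calling `snapshot` once from the login shell and once inside the batch script, each after the `module load` step, and then passing the two `"env"` dictionaries to `diff_env` would list variables such as `PATH` or `LD_LIBRARY_PATH` that differ between the two contexts.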

msbentsen commented 3 years ago

Hi, thanks for the update! Hmm, that seems odd, but maybe you are right that the batch environment has a different setup than your command line. Since it seems to be an environment/system-specific issue, I will close this, but let me know if you have any other issues.

iamjli commented 3 years ago

I encountered the same issue. I ended up getting it to work on another cluster. Both clusters also displayed this warning:

fc-list: error while loading shared libraries: libexpat.so.0: cannot open shared object file: No such file or directory
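
The thread does not establish whether this warning is related to the hang, but it does point at a shared library that the dynamic loader cannot find. As a small sketch, the hypothetical helper below checks from Python whether a given library name can be loaded in the current environment; it relies only on the standard `ctypes` module.

```python
import ctypes


def can_load(libname):
    """Return True if the dynamic loader can open libname in this environment."""
    try:
        ctypes.CDLL(libname)
        return True
    except OSError:
        # Raised when the library cannot be found or opened.
        return False


# Library name taken from the warning above; the result depends on the machine.
print("libexpat.so.0 loadable:", can_load("libexpat.so.0"))
```

Comparing the result on the cluster that works with the one that does not, and between the batch and interactive environments, would show whether the library search path differs.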