vanheeringen-lab / gimmemotifs

Suite of motif tools, including a motif prediction pipeline for ChIP-seq experiments. See full GimmeMotifs documentation for detailed installation instructions and usage examples.
https://gimmemotifs.readthedocs.io/en/master
MIT License

Gimme Maelstrom - strange Value Error #211

Closed Francis3209 closed 1 year ago

Francis3209 commented 2 years ago

Hi Simon! I'm Francesco. I'm really enthusiastic about using your GimmeMotifs suite to perform differential motif enrichment analysis. However, whenever I run Maelstrom I get the following error message, which I have not been able to resolve.

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/multiprocessing/spawn.py", line 125, in _main
    prepare(preparation_data)
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/multiprocessing/spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/runpy.py", line 268, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/bin/gimme", line 8, in <module>
    from gimmemotifs.cli import cli
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/gimmemotifs/__init__.py", line 61, in <module>
    from . import denovo

... and at the bottom of the message is this:

ValueError: invalid literal for int() with base 10: ''

As you suggested, I already deleted the GimmeMotifs configuration file (~/.config/gimmemotifs/gimmemotifs.cfg), ran gimme without any parameters, and then reran my analysis, but the problem remains.

I am running the program on macOS Big Sur 11.5.2. These are my Python and package versions:

_py-xgboost-mutex 2.0 cpu_0 conda-forge abseil-cpp 20210324.1 he49afe7_0 conda-forge appdirs 1.4.4 pyh9f0ad1d_0 conda-forge arrow-cpp 4.0.1 py39hc705ef8_0_cpu conda-forge aws-c-cal 0.5.9 h0df4f8a_0 conda-forge aws-c-common 0.5.11 h0d85af4_0 conda-forge aws-c-event-stream 0.2.7 h5ecfe7a_7 conda-forge aws-c-io 0.9.14 h9daffe9_1 conda-forge aws-checksums 0.1.11 h9daffe9_6 conda-forge aws-sdk-cpp 1.8.186 h8d473ab_2 conda-forge bedtools 2.30.0 haa7f73a_1 bioconda biofluff 3.0.4 py_0 bioconda biopython 1.78 py39h89e85a6_2 conda-forge boltons 21.0.0 pyhd8ed1ab_0 conda-forge brotli 1.0.9 h046ec9c_4 conda-forge brotlipy 0.7.0 py39hcbf5805_1001 conda-forge bucketcache 0.12.1 py39hde42818_2 conda-forge bzip2 1.0.8 h0d85af4_4 conda-forge c-ares 1.17.1 h0d85af4_1 conda-forge ca-certificates 2021.5.30 h033912b_0 conda-forge certifi 2021.5.30 py39h6e9494a_0 conda-forge cffi 1.14.5 py39h319c39b_0 conda-forge chardet 4.0.0 py39h6e9494a_1 conda-forge click 8.0.1 py39h6e9494a_0 conda-forge colorama 0.4.4 pyh9f0ad1d_0 conda-forge configparser 5.0.2 pyhd8ed1ab_0 conda-forge cryptography 3.4.7 py39ha2c9959_0 conda-forge cycler 0.10.0 py_2 conda-forge decorator 5.0.9 pyhd8ed1ab_0 conda-forge diskcache 5.2.1 pyh44b312d_0 conda-forge expat 2.2.10 h1c7c35f_0 conda-forge feather-format 0.4.1 pyh9f0ad1d_0 conda-forge freetype 2.10.4 h4cff582_1 conda-forge future 0.18.2 py39h6e9494a_3 conda-forge gadem 1.3.1 hb4d813b_3 bioconda genomepy 0.9.3 py_0 bioconda gflags 2.2.2 hb1e8313_1004 conda-forge ghostscript 9.54.0 he49afe7_1 conda-forge gimmemotifs 0.16.0 py39he2a1a62_2 bioconda glog 0.5.0 h25b26a9_0 conda-forge grpc-cpp 1.38.0 h25f885f_0 conda-forge homer 4.10 pl526h770b8ee_0 bioconda htseq 0.13.5 py39hdd6a155_1 bioconda htslib 1.12 hc38c3fb_1 bioconda icu 68.1 h74dc148_0 conda-forge idna 2.10 pyh9f0ad1d_0 conda-forge jbig 2.1 h0d85af4_2003 conda-forge jinja2 3.0.1 pyhd8ed1ab_0 conda-forge joblib 1.0.1 pyhd8ed1ab_0 conda-forge jpeg 9d hbcb3906_0 conda-forge kiwisolver 1.3.1 py39hedf5dff_1 
conda-forge krb5 1.19.1 hcfbf3a7_0 conda-forge lcms2 2.12 h577c468_0 conda-forge lerc 2.2.1 h046ec9c_0 conda-forge libblas 3.9.0 9_openblas conda-forge libcblas 3.9.0 9_openblas conda-forge libcurl 7.77.0 hf45b732_0 conda-forge libcxx 11.1.0 habf9029_0 conda-forge libdeflate 1.7 h35c211d_5 conda-forge libedit 3.1.20191231 h0678c8f_2 conda-forge libev 4.33 haf1e3a3_1 conda-forge libevent 2.1.10 hddc9c9b_3 conda-forge libffi 3.3 h046ec9c_2 conda-forge libgcc 4.8.5 1 conda-forge libgfortran 5.0.0 9_3_0_h6c81a4c_22 conda-forge libgfortran5 9.3.0 h6c81a4c_22 conda-forge libiconv 1.16 haf1e3a3_0 conda-forge liblapack 3.9.0 9_openblas conda-forge libllvm10 10.0.1 h009f743_3 conda-forge libnghttp2 1.43.0 h07e645a_0 conda-forge libopenblas 0.3.15 openmp_h5e1b9a4_1 conda-forge libpng 1.6.37 h7cec526_2 conda-forge libprotobuf 3.16.0 hcf210ce_0 conda-forge libssh2 1.9.0 h52ee1ee_6 conda-forge libthrift 0.14.1 hab56fdc_1 conda-forge libtiff 4.3.0 h1167814_1 conda-forge libutf8proc 2.6.1 h35c211d_0 conda-forge libuuid 2.32.1 h35c211d_1000 conda-forge libwebp-base 1.2.0 h0d85af4_2 conda-forge libxgboost 1.4.0 he49afe7_0 conda-forge libxml2 2.9.12 h93ec3fd_0 conda-forge libxslt 1.1.33 h5739fc3_2 conda-forge llvm-openmp 11.1.0 hda6cdc1_1 conda-forge llvmlite 0.36.0 py39h798a4f4_0 conda-forge logbook 1.5.3 py39hcbf5805_4 conda-forge logomaker 0.8 pyh864c0ab_1 bioconda loguru 0.5.3 py39h6e9494a_2 conda-forge lz4-c 1.9.3 h046ec9c_0 conda-forge markupsafe 2.0.1 py39h89e85a6_0 conda-forge matplotlib 3.4.2 py39h6e9494a_0 conda-forge matplotlib-base 3.4.2 py39hb07454d_0 conda-forge meme 5.3.0 py39pl5262hd110924_2 bioconda mpi 1.0 openmpi conda-forge mysql-connector-c 6.1.11 h0f02589_1007 conda-forge ncurses 6.2 h2e338ed_4 conda-forge norns 0.1.5 pyh864c0ab_1 bioconda nose 1.3.7 py_1006 conda-forge numba 0.53.1 py39he2616bd_0 conda-forge numpy 1.20.3 py39h7eed0ac_1 conda-forge olefile 0.46 pyh9f0ad1d_1 conda-forge openjdk 11.0.9.1 hcf210ce_1 conda-forge openjpeg 2.4.0 h6e7aa92_1 
conda-forge openmpi 4.1.1 hd3cd54c_0 conda-forge openssl 1.1.1k h0d85af4_0 conda-forge orc 1.6.8 hfe4c36d_0 conda-forge palettable 3.3.0 py_0 conda-forge pandas 1.2.4 py39h4d6be9b_0 conda-forge parquet-cpp 1.5.1 2 conda-forge pathlib 1.0.1 py39h6e9494a_4 conda-forge patsy 0.5.1 py_0 conda-forge perl 5.26.2 hbcb3906_1008 conda-forge perl-algorithm-cluster 1.58 pl526h1de35cc_0 bioconda perl-carp 1.38 pl526_3 bioconda perl-cgi 4.40 pl526h470a237_0 bioconda perl-common-sense 3.74 pl526_2 bioconda perl-constant 1.33 pl526_1 bioconda perl-dbi 1.642 pl526_0 bioconda perl-exporter 5.72 pl526_1 bioconda perl-extutils-makemaker 7.36 pl526_1 bioconda perl-file-path 2.16 pl526_0 bioconda perl-file-which 1.23 pl526_0 bioconda perl-html-parser 3.72 pl526h04f5b5a_5 bioconda perl-html-tagset 3.20 pl526_3 bioconda perl-html-template 2.97 pl526_1 bioconda perl-html-tree 5.07 pl526_1 bioconda perl-json 4.02 pl526_0 bioconda perl-json-xs 2.34 pl526h04f5b5a_3 bioconda perl-log-log4perl 1.49 pl526_0 bioconda perl-math-cdf 0.1 pl526h1de35cc_5 bioconda perl-scalar-list-utils 1.52 pl526h01d97ff_0 bioconda perl-types-serialiser 1.0 pl526_2 bioconda perl-xml-namespacesupport 1.12 pl526_0 bioconda perl-xml-parser 2.44_01 pl526hb1d6bea_1002 conda-forge perl-xml-sax 0.99 pl526_1 bioconda perl-xml-sax-base 1.09 pl526_0 bioconda perl-xml-sax-expat 0.51 pl526_3 bioconda perl-xml-simple 2.25 pl526_1 bioconda perl-xsloader 0.24 pl526_0 bioconda perl-yaml 1.29 pl526_0 bioconda pillow 8.2.0 py39h5fdd921_1 conda-forge pip 21.1.2 pyhd8ed1ab_0 conda-forge prosampler 1.0 h770b8ee_0 bioconda py-xgboost 1.4.0 py39h6e9494a_0 conda-forge pyarrow 4.0.1 py39hc3b5b9c_0_cpu conda-forge pybedtools 0.8.2 py39h33336d3_1 bioconda pybigwig 0.3.18 py39h8b2de0f_1 bioconda pycparser 2.20 pyh9f0ad1d_2 conda-forge pyfaidx 0.5.9.5 pyh3252c3a_0 bioconda pyopenssl 20.0.1 pyhd8ed1ab_0 conda-forge pyparsing 2.4.7 pyh9f0ad1d_0 conda-forge pysam 0.16.0.1 py39h56703ae_3 bioconda pysocks 1.7.1 py39h6e9494a_3 conda-forge python 
3.9.4 h9133fd0_0_cpython conda-forge python-dateutil 2.8.1 py_0 conda-forge python-xxhash 2.0.2 py39h89e85a6_0 conda-forge python_abi 3.9 1_cp39 conda-forge pytz 2021.1 pyhd8ed1ab_0 conda-forge pyyaml 5.4.1 py39hcbf5805_0 conda-forge qnorm 0.7.0 pyh44b312d_0 conda-forge re2 2021.04.01 he49afe7_0 conda-forge readline 8.1 h05e3726_0 conda-forge represent 1.6.0 py39hb5aae12_2 conda-forge requests 2.25.1 pyhd3deb0d_0 conda-forge scikit-learn 0.24.2 py39h4b1dcc9_0 conda-forge scipy 1.6.3 py39h056f1c0_0 conda-forge seaborn 0.11.1 hd8ed1ab_1 conda-forge seaborn-base 0.11.1 pyhd8ed1ab_1 conda-forge setuptools 49.6.0 py39h6e9494a_3 conda-forge six 1.16.0 pyh6c4a22f_0 conda-forge snappy 1.1.8 hb1e8313_3 conda-forge sqlite 3.35.5 h44b9ce1_0 conda-forge statsmodels 0.12.2 py39h329c335_0 conda-forge threadpoolctl 2.1.0 pyh5ca1d4c_0 conda-forge tk 8.6.10 h0419947_1 conda-forge tornado 6.1 py39hcbf5805_1 conda-forge tqdm 4.61.0 pyhd8ed1ab_0 conda-forge trawler 2.0 hdfd78af_4 bioconda tzdata 2021a he74cb21_0 conda-forge ucsc-bedtogenepred 377 h516baf0_1 bioconda ucsc-bigbedtobed 377 h516baf0_1 bioconda ucsc-genepredtobed 377 h516baf0_3 bioconda ucsc-genepredtogtf 377 h516baf0_3 bioconda ucsc-gff3togenepred 377 h516baf0_1 bioconda ucsc-gtftogenepred 377 h516baf0_3 bioconda urllib3 1.26.5 pyhd8ed1ab_0 conda-forge weeder 2.0 hb280591_5 bioconda wheel 0.36.2 pyhd3deb0d_0 conda-forge xdg 5.0.2 pyhd8ed1ab_0 conda-forge xgboost 1.4.0 py39h6e9494a_0 conda-forge xmltodict 0.12.0 py_0 conda-forge xxhash 0.8.0 h35c211d_3 conda-forge xxmotif 1.6 0 bioconda xz 5.2.5 haf1e3a3_1 conda-forge yaml 0.2.5 haf1e3a3_0 conda-forge zlib 1.2.11 h7795811_1010 conda-forge zstd 1.5.0 h582d3a0_0 conda-forge

Finally, my input is a tab-delimited text file like this:

loc	Trametinib_12h	Trametinib_24h	Trametinib_3h	Untreated
chr14:51533916-51534416	-0.73038978554192	-0.75756174613418	0.0292045064051196	1.45874702527098
chr17:40519037-40519537	-0.414261532842008	-0.791426003462628	-0.137144588165848	1.34283212447048
chr7:142808500-142809000	-0.856711011099945	-2.48554062398944	1.15561859563714	2.18663303945224
chr12:25458510-25459010	-1.11450907682427	-2.09368630465281	0.716401716886725	2.49179366459037
chr3:186098124-186098624	-0.813729589200967	-1.83892747430712	-1.08594276476826	3.73859982827634
chr20:62739893-62740393	-0.430940180607117	-0.893090704706988	-0.406246362173098	1.7302772474872
chr4:87896621-87897121	-0.577905813389179	0.05970737077861	-2.31475686197784	2.83295530458841
chr22:24207010-24207510	-0.603639756229142	-0.279537387442472	-0.948471767354042	1.83164891102566
chr11:122643118-122643618	-0.524114013826575	-0.749542504928305	-0.160310105579936	1.43396662433481
chr10:99965421-99965921	-1.06461687455462	-0.423861486523813	-0.450305165527713	1.93878352660615
chr11:70016462-70016962	-0.659396598158438	-1.30147785188861	-0.126278345089177	2.08715279513622
chr22:43264253-43264753	-0.858073997387409	-1.07580442926487	0.53618053094244	1.39769789570984
chr14:74616749-74617249	-0.540435607173403	-0.439786003323952	-0.210392283547632	1.19061389404499
chr9:19998659-19999159	-0.992356201003537	-1.68042041234842	0.598847799536313	2.07392881381564

The first column holds the peak coordinates and the other columns hold the normalised, mean-centred values for each group. I would really appreciate any help, as I have already tried several approaches but couldn't fix the problem. Many thanks! Francesco

simonvh commented 2 years ago

Hi Francesco, can you provide the exact command line you use to run gimme maelstrom, the full error message and, if possible, a sample of your input file that gives this error?

Francis3209 commented 2 years ago

Hi, yes, of course. This is my command line:

gimme maelstrom ${output_folder}/${exp}.diffPeaksForMaelstrom.txt ${genomefile} ${output_folder}/${exp}.gimmeMaelstrom.results.txt

where $genomefile is /Users/ieoxxx/Work/genome_annotations/hg38/hg38.fa

Here's the full error message:

Fontconfig warning: ignoring UTF-8: not a valid region tag
2021-10-14 14:12:14,241 - INFO - Starting maelstrom
2021-10-14 14:12:14,624 - INFO - motif scanning (counts)
2021-10-14 14:12:14,624 - INFO - reading table
2021-10-14 14:12:17,564 - INFO - setting threshold
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/multiprocessing/spawn.py", line 125, in _main
    prepare(preparation_data)
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/multiprocessing/spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/runpy.py", line 268, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/bin/gimme", line 8, in <module>
    from gimmemotifs.cli import cli
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/gimmemotifs/__init__.py", line 61, in <module>
    from . import denovo  # noqa: F401
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/gimmemotifs/denovo.py", line 51, in <module>
    from gimmemotifs.stats import calc_stats, rank_motifs, write_stats
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/gimmemotifs/stats.py", line 10, in <module>
    from gimmemotifs.scanner import scan_to_best_match, Scanner
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/gimmemotifs/scanner.py", line 58, in <module>
    config = MotifConfig()
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/gimmemotifs/config.py", line 95, in __init__
    self._upgrade_config()
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/gimmemotifs/config.py", line 98, in _upgrade_config
    if "width" in self.config["params"]:
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/configparser.py", line 960, in __getitem__
    raise KeyError(key)
KeyError: 'params'
Fontconfig warning: ignoring UTF-8: not a valid region tag
2021-10-14 14:12:28,308 - INFO - creating count table
Traceback (most recent call last):
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/bin/gimme", line 11, in <module>
    cli(sys.argv[1:])
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/gimmemotifs/cli.py", line 730, in cli
    args.func(args)
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/gimmemotifs/commands/maelstrom.py", line 33, in maelstrom
    run_maelstrom(
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/gimmemotifs/maelstrom.py", line 343, in run_maelstrom
    counts = scan_regionfile_to_table(
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/gimmemotifs/scanner.py", line 181, in scan_regionfile_to_table
    for row in s.count(regions):
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/gimmemotifs/scanner.py", line 1027, in count
    for matches in self.scan(seqs, nreport, scan_rc):
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/gimmemotifs/scanner.py", line 1110, in scan
    seqs = as_fasta(seqs, genome=self.genome)
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/gimmemotifs/utils.py", line 715, in as_fasta
    return Fasta(fdict=as_seqdict(to_convert, genome, minsize))
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/functools.py", line 877, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/gimmemotifs/utils.py", line 637, in _as_seqdict_list
    return _genomepy_convert(to_convert, genome, minsize)
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/gimmemotifs/utils.py", line 548, in _genomepy_convert
    g.track2fasta(to_convert, tmpfile.name)
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/genomepy/genome.py", line 373, in track2fasta
    for seq in seqqer:
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/genomepy/genome.py", line 313, in _regions_to_seqs
    seq = self._region_to_seq(name, extend_up, extend_down)
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/genomepy/genome.py", line 302, in _region_to_seq
    start, end = [int(c) for c in coords.split("-")]
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/genomepy/genome.py", line 302, in <listcomp>
    start, end = [int(c) for c in coords.split("-")]
ValueError: invalid literal for int() with base 10: ''
(gimmeMotif) macprospare03:Maelstrom ieo5634$
(gimmeMotif) macprospare03:Maelstrom ieo5634$ /Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 12 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

And the input txt file for the analysis.

Cfpac1.diffPeaksForMaelstrom.txt

Thank you!!

simonvh commented 2 years ago

I suspect it's this line in your input file:

chr22_KI270733v1_random:-142-358

Can you try removing that and rerunning gimme maelstrom?
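For anyone hitting the same error, here is a minimal sketch of how such lines could be found before running maelstrom. The traceback shows the coordinate part being split on "-", so a negative start such as "-142-358" yields an empty first element, and int('') raises the ValueError above. The helper names (is_valid_region, split_valid_rows) are hypothetical and not part of GimmeMotifs or genomepy, and the numeric values in the example rows are made up for illustration.

```python
import re

# Accept only "chrom:start-end" with non-negative integer coordinates.
REGION_RE = re.compile(r"^[^:\s]+:(\d+)-(\d+)$")


def is_valid_region(region: str) -> bool:
    """Return True if region looks like chrom:start-end with start < end."""
    match = REGION_RE.match(region)
    if match is None:
        return False
    start, end = int(match.group(1)), int(match.group(2))
    return start < end


def split_valid_rows(lines):
    """Split table lines into (kept, rejected); the header line is always kept."""
    kept, rejected = [], []
    for i, line in enumerate(lines):
        first_column = line.split("\t", 1)[0].strip()
        if i == 0 or is_valid_region(first_column):
            kept.append(line)
        else:
            rejected.append(line)
    return kept, rejected


# Illustrative rows only; the values are invented, the region names come from this thread.
example = [
    "loc\tTrametinib_12h\tUntreated",
    "chr14:51533916-51534416\t-0.73\t1.46",
    "chr22_KI270733v1_random:-142-358\t0.10\t0.20",
]
kept, rejected = split_valid_rows(example)
print(len(kept), "rows kept;", len(rejected), "rejected")
```

In practice the lines would be read from the input table, and the kept lines written to a new file that is then passed to gimme maelstrom.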

Francis3209 commented 2 years ago

Yes, I removed the line and now it seems to work! Thanks a lot!! Francesco

Francis3209 commented 2 years ago

Hi Simon, just one last minor issue related to the analysis. I eventually got all the output files, but the report.html file was missing. These are the messages I got during the analysis:

Fontconfig warning: ignoring UTF-8: not a valid region tag
2021-10-14 17:13:26,724 - INFO - Starting maelstrom
2021-10-14 17:13:27,099 - INFO - motif scanning (counts)
2021-10-14 17:13:27,099 - INFO - reading table
2021-10-14 17:13:30,061 - INFO - setting threshold
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
2021-10-14 17:13:39,998 - INFO - creating count table
2021-10-14 17:25:45,860 - INFO - done
2021-10-14 17:25:47,887 - INFO - creating dataframe
2021-10-14 17:26:36,577 - INFO - motif scanning (scores)
2021-10-14 17:26:36,579 - INFO - reading table
2021-10-14 17:26:44,066 - INFO - creating score table (z-score, GC%)
Fontconfig warning: ignoring UTF-8: not a valid region tag
2021-10-14 17:57:45,926 - INFO - done
2021-10-14 17:57:46,644 - INFO - creating dataframe
2021-10-14 18:05:38,493 - INFO - Selecting non-redundant motifs
2021-10-14 18:09:53,567 - INFO - Selected 722 motifs
2021-10-14 18:09:53,567 - INFO - Motifs: /Users/ieo5634/Work/IEO/Projects/Pancreas/Alice/Cfpac1_h3k27ac_chip/motif_analysis/Maelstrom/Cfpac1.gimmeMaelstrom.results.txt/nonredundant.motifs.pfm
2021-10-14 18:09:53,567 - INFO - Factor mappings: /Users/ieo5634/Work/IEO/Projects/Pancreas/Alice/Cfpac1_h3k27ac_chip/motif_analysis/Maelstrom/Cfpac1.gimmeMaelstrom.results.txt/nonredundant.motifs.motif2factors.txt
2021-10-14 18:09:58,900 - INFO - Fitting BayesianRidge
100%|██████████| 4/4 [00:44<00:00, 11.14s/it]
2021-10-14 18:10:44,677 - INFO - Done
2021-10-14 18:10:50,173 - INFO - Fitting XGBoostRegression
100%|██████████| 4/4 [08:24<00:00, 126.20s/it]
2021-10-14 18:19:16,125 - INFO - Done
2021-10-14 18:19:21,626 - INFO - Fitting MultiTaskLasso
2021-10-14 21:14:51,011 - INFO - Done
2021-10-14 21:14:56,491 - INFO - Fitting SVR
2021-10-14 21:20:49,203 - INFO - Done
2021-10-14 21:20:49,243 - INFO - Rank aggregation
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
2021-10-14 21:20:58,616 - INFO - Correlation
2021-10-14 21:21:34,768 - INFO - html report
Traceback (most recent call last):
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/bin/gimme", line 11, in <module>
    cli(sys.argv[1:])
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/gimmemotifs/cli.py", line 730, in cli
    args.func(args)
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/gimmemotifs/commands/maelstrom.py", line 33, in maelstrom
    run_maelstrom(
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/gimmemotifs/maelstrom.py", line 546, in run_maelstrom
    maelstrom_html_report(outdir, os.path.join(outdir, "final.out.txt"), pfmfile)
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/gimmemotifs/report.py", line 868, in maelstrom_html_report
    motif_to_img_series(df.index, pfmfile=pfmfile, outdir=outdir, subdir="logos"),
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/gimmemotifs/report.py", line 837, in motif_to_img_series
    raise ValueError(f"Motif {motif} does not occur in motif database")
ValueError: Motif GM.5.0.GATA.0013 does not occur in motif database

Could this be related to the report.py script? Thank you so much, Francesco

simonvh commented 2 years ago

Hi Francesco, this is a strange bug that we still need to track down (it only occurs sporadically). If you rerun, it will likely be gone.

Francis3209 commented 2 years ago

Hi Simon, ok thanks a lot for the help!

Francis3209 commented 2 years ago

Hi Simon, sorry for yet another message, but I now need to run maelstrom with my own custom .pfm motif file. The file I prepared seems to be formatted properly, but when I run the program the following error message (at the bottom) appears:

Fontconfig warning: ignoring UTF-8: not a valid region tag
2021-10-25 16:42:36,567 - INFO - Starting maelstrom
2021-10-25 16:42:36,943 - INFO - motif scanning (counts)
2021-10-25 16:42:36,943 - INFO - reading table
2021-10-25 16:42:40,582 - INFO - setting threshold
2021-10-25 16:42:43,934 - INFO - determining FPR-based threshold
2021-10-25 16:48:11,565 - INFO - creating count table
Traceback (most recent call last):
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/bin/gimme", line 11, in <module>
    cli(sys.argv[1:])
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/gimmemotifs/cli.py", line 730, in cli
    args.func(args)
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/gimmemotifs/commands/maelstrom.py", line 33, in maelstrom
    run_maelstrom(
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/gimmemotifs/maelstrom.py", line 343, in run_maelstrom
    counts = scan_regionfile_to_table(
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/gimmemotifs/scanner.py", line 181, in scan_regionfile_to_table
    for row in s.count(regions):
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/gimmemotifs/scanner.py", line 1027, in count
    for matches in self.scan(seqs, nreport, scan_rc):
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/gimmemotifs/scanner.py", line 1128, in scan
    for result in it:
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/gimmemotifs/scanner.py", line 1182, in _scan_sequences
    thresholds = self.get_gc_thresholds(seqs, zscore=zscore)
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/gimmemotifs/scanner.py", line 1153, in get_gc_thresholds
    maxt = pd.Series([m.pwm_max_score() for m in motifs], index=_threshold.columns)
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/pandas/core/series.py", line 350, in __init__
    raise ValueError(
ValueError: Length of passed values is 3140, index implies 3123.
(gimmeMotif) macprospare03:PWMx ieo5634$ /Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 12 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

Here's my .pfm file used for the analysis.

Thanks a lot for any suggestions. Best, Francesco

20170320_pwms_selected.pfm.txt

simonvh commented 2 years ago

Can you try replacing the whitespace (tabs) in the ID lines? I suspect this may be causing the issue.
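A quick way to check this, sketched under the assumption that the .pfm file uses ">" header lines for motif IDs followed by rows of four frequencies. The function name check_motif_ids is hypothetical and not part of the GimmeMotifs API. The duplicate-ID check is an extra guess, since a mismatch between the number of motifs and the number of threshold columns could in principle come from repeated IDs, which nothing in this thread confirms.

```python
from collections import Counter


def check_motif_ids(pfm_lines):
    """Return (ids_with_whitespace, duplicated_ids) from the '>' header lines."""
    # Strip only the leading '>' and the line ending, so inner tabs/spaces stay visible.
    ids = [line[1:].rstrip("\r\n") for line in pfm_lines if line.startswith(">")]
    with_whitespace = [i for i in ids if any(ch.isspace() for ch in i)]
    duplicated = [i for i, n in Counter(ids).items() if n > 1]
    return with_whitespace, duplicated


# Made-up motifs for illustration; real input would come from open(path).readlines().
example = [
    ">motif_A\n", "0.25\t0.25\t0.25\t0.25\n",
    ">motif B\n", "0.70\t0.10\t0.10\t0.10\n",
    ">motif_A\n", "0.10\t0.10\t0.10\t0.70\n",
]
print(check_motif_ids(example))
```

If either list is non-empty, renaming the affected IDs (for example, replacing whitespace with underscores) before rerunning would be the obvious next step.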

Francis3209 commented 2 years ago

Hi Simon, thank you for the message. I checked the whitespace and tabs in the headers of the pfm file and everything seems OK. This time, however, I generated a new .pfm file from my custom motif file in JASPAR format, using the GimmeMotifs API for the format conversion. The file looks fine and I also prepared the corresponding .motif2factors.txt file. However, another strange error occurs when I launch maelstrom:

Fontconfig warning: ignoring UTF-8: not a valid region tag
2021-11-04 11:11:58,287 - INFO - Starting maelstrom
2021-11-04 11:11:58,656 - INFO - motif scanning (counts)
2021-11-04 11:11:58,656 - INFO - reading table
2021-11-04 11:12:02,025 - INFO - setting threshold
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
Fontconfig warning: ignoring UTF-8: not a valid region tag
Traceback (most recent call last):
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/bin/gimme", line 11, in <module>
    cli(sys.argv[1:])
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/gimmemotifs/cli.py", line 730, in cli
    args.func(args)
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/gimmemotifs/commands/maelstrom.py", line 33, in maelstrom
    run_maelstrom(
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/gimmemotifs/maelstrom.py", line 343, in run_maelstrom
    counts = scan_regionfile_to_table(
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/gimmemotifs/scanner.py", line 179, in scan_regionfile_to_table
    s.set_threshold(fpr=FPR)
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/gimmemotifs/scanner.py", line 981, in set_threshold
    self._threshold[motif.id] = vals
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/pandas/core/frame.py", line 3163, in __setitem__
    self._set_item(key, value)
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/pandas/core/frame.py", line 3242, in _set_item
    value = self._sanitize_column(key, value)
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/pandas/core/frame.py", line 3876, in _sanitize_column
    value = reindexer(value)
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/pandas/core/frame.py", line 3867, in reindexer
    raise err
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/pandas/core/frame.py", line 3862, in reindexer
    value = value.reindex(self.index)._values
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/pandas/core/series.py", line 4345, in reindex
    return super().reindex(index=index, **kwargs)
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/pandas/core/generic.py", line 4811, in reindex
    return self._reindex_axes(
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/pandas/core/generic.py", line 4832, in _reindex_axes
    obj = obj._reindex_with_indexers(
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/pandas/core/generic.py", line 4877, in _reindex_with_indexers
    new_data = new_data.reindex_indexer(
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 1301, in reindex_indexer
    self.axes[axis]._can_reindex(indexer)
  File "/Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3476, in _can_reindex
    raise ValueError("cannot reindex from a duplicate axis")
ValueError: cannot reindex from a duplicate axis
(gimmeMotif) macprospare03:PWMx ieo5634$ /Users/ieo5634/opt/anaconda3/envs/gimmeMotif/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 12 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d

20170320_pwms_gmformat.motif2factors.txt
20170320_pwms_gmformat.pfm.txt

Do you know what might cause this kind of error message? Thanks a lot for all the help!

simonvh commented 2 years ago

Sorry, I seem to have missed this. Can you try removing the GimmeMotifs cache directory? It should be somewhere like ~/.cache/gimmemotifs
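As a small sketch of that step in Python: the helper clear_cache below is hypothetical (it is not a GimmeMotifs function), and the default path is only the likely location mentioned above, so check where the cache lives on your system before deleting anything.

```python
import shutil
from pathlib import Path


def clear_cache(cache_dir: Path) -> bool:
    """Delete cache_dir if it exists; return True when something was removed."""
    if cache_dir.is_dir():
        shutil.rmtree(cache_dir)
        return True
    return False


# Assumed location, taken from the suggestion above; adjust if yours differs.
default_cache = Path.home() / ".cache" / "gimmemotifs"
# clear_cache(default_cache)  # uncomment to actually delete the cache
```

The call is left commented out on purpose, so that pasting the snippet does not delete anything without a deliberate edit.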

Francis3209 commented 2 years ago

Hi Simon, OK, I'll try it and let you know. Thank you again!! Francesco

Francis3209 commented 2 years ago

Hi, yes, I can confirm that it works now! A reminder to myself: remove the cache folder after each run. Thank you so much!