vanheeringen-lab / gimmemotifs

Suite of motif tools, including a motif prediction pipeline for ChIP-seq experiments. See full GimmeMotifs documentation for detailed installation instructions and usage examples.
https://gimmemotifs.readthedocs.io/en/master
MIT License

ValueError: Motif GM.5.0.bHLH.0126 does not occur in motif database when running maelstrom #192

Closed shangguandong1996 closed 3 years ago

shangguandong1996 commented 3 years ago

Dear developer

I am running maelstrom on the test data hg19.blood.most_variable.1k.txt. My hg19 genome was installed as follows:

sgd@localhost ~/reference/genome/hg19
$ genomepy install hg19 --annotation -g .
Downloading genome from UCSC.
Target URL: http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz...
Download: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 905M/905M [01:54<00:00, 8.30MB/s]
Genome download successful, starting post processing...

name: hg19
local name: hg19
fasta: /data/sgd_data/reference/genome/hg19/hg19/hg19.fa
Downloading annotation from UCSC.
Target URL: http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/genes/hg19.knownGene.gtf.gz...
Download: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 17.1M/17.1M [00:03<00:00, 4.53MB/s]
Annotation download successful
sgd@localhost ~/test
$ gimme maelstrom hg19.blood.most_variable.1k.txt ~/reference/genome/hg19/hg19/hg19.fa maelstrom.blood.1k.out -N 60
2021-06-11 15:23:06,610 - INFO - Starting maelstrom
2021-06-11 15:23:06,617 - INFO - Input is not mean-centered, setting the mean of all rows to 0.
2021-06-11 15:23:06,617 - INFO - Use --nocenter if you know what you're doing and want to change this behavior.
2021-06-11 15:23:06,617 - INFO - Note that if you use count data (ChIP-seq, ATAC-seq) we recommend to first transform your data, for instance using log2(), and to normalize between samples. To create a table suitable for maelstrom you can use the coverage_table script included with GimmeMotifs.
2021-06-11 15:23:06,632 - INFO - Counts, using: maelstrom.blood.1k.out/motif.count.txt.gz
2021-06-11 15:23:06,632 - INFO - motif scanning (scores)
2021-06-11 15:23:06,632 - INFO - reading table
2021-06-11 15:23:12,367 - INFO - creating score table (z-score, GC%)
2021-06-11 15:45:27,389 - INFO - done
2021-06-11 15:45:28,815 - INFO - creating dataframe
2021-06-11 15:45:38,405 - INFO - Selecting non-redundant motifs
2021-06-11 15:45:43,852 - INFO - Selected 657 motifs
2021-06-11 15:45:43,852 - INFO - Motifs: maelstrom.blood.1k.out/nonredundant.motifs.pfm
2021-06-11 15:45:43,852 - INFO - Factor mappings: maelstrom.blood.1k.out/nonredundant.motifs.motif2factors.txt
2021-06-11 15:45:44,024 - INFO - Fitting BayesianRidge
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:02<00:00,  2.46it/s]
2021-06-11 15:45:46,488 - INFO - Done
2021-06-11 15:45:46,757 - INFO - Fitting XGBoostRegression
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:24<00:00,  4.15s/it]
2021-06-11 15:46:11,661 - INFO - Done
2021-06-11 15:46:11,844 - INFO - Fitting MultiTaskLasso
2021-06-11 15:46:14,602 - INFO - Done
2021-06-11 15:46:14,878 - INFO - Fitting SVR
2021-06-11 15:46:22,090 - INFO - Done
2021-06-11 15:46:22,111 - INFO - Rank aggregation
2021-06-11 15:46:23,020 - INFO - Correlation
2021-06-11 15:46:24,321 - INFO - html report
Traceback (most recent call last):
  File "/opt/sysoft/Python-3.7.9/bin/gimme", line 11, in <module>
    cli(sys.argv[1:])
  File "/opt/sysoft/Python-3.7.9/lib/python3.7/site-packages/gimmemotifs/cli.py", line 730, in cli
    args.func(args)
  File "/opt/sysoft/Python-3.7.9/lib/python3.7/site-packages/gimmemotifs/commands/maelstrom.py", line 45, in maelstrom
    aggregation=aggregation,
  File "/opt/sysoft/Python-3.7.9/lib/python3.7/site-packages/gimmemotifs/maelstrom.py", line 546, in run_maelstrom
    maelstrom_html_report(outdir, os.path.join(outdir, "final.out.txt"), pfmfile)
  File "/opt/sysoft/Python-3.7.9/lib/python3.7/site-packages/gimmemotifs/report.py", line 868, in maelstrom_html_report
    motif_to_img_series(df.index, pfmfile=pfmfile, outdir=outdir, subdir="logos"),
  File "/opt/sysoft/Python-3.7.9/lib/python3.7/site-packages/gimmemotifs/report.py", line 837, in motif_to_img_series
    raise ValueError(f"Motif {motif} does not occur in motif database")
ValueError: Motif GM.5.0.bHLH.0126 does not occur in motif database

Best wishes

Guandong Shang

simonvh commented 3 years ago

Thanks for this bug report! Just to be sure, this is version 0.16? We'll check to see if we can reproduce.
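
While reproducing, it may help to see which motif IDs in the results table are absent from the PFM file that the report step reads. The following is a minimal sketch, not part of GimmeMotifs: the helper names are hypothetical, and it assumes that each motif in the PFM file starts with a `>` header line and that the results table is tab-separated with motif IDs in the first column, `#` comment lines, and one header row. The example data are made up for illustration.

```python
import io


def read_pfm_ids(handle):
    """Collect motif IDs from PFM text (assumed: one '>' header line per motif)."""
    ids = set()
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                # Keep only the first token after '>' as the motif ID.
                ids.add(fields[0])
    return ids


def read_table_ids(handle):
    """Return first-column labels, skipping '#' comments, blank lines, and the header row."""
    rows = [
        line.rstrip("\n")
        for line in handle
        if line.strip() and not line.startswith("#")
    ]
    return [row.split("\t")[0] for row in rows[1:]]


def missing_motifs(table_handle, pfm_handle):
    """List table motif IDs (in table order) that have no entry in the PFM file."""
    known = read_pfm_ids(pfm_handle)
    return [motif for motif in read_table_ids(table_handle) if motif not in known]


# Illustrative in-memory data; the numbers are invented.
EXAMPLE_PFM = (
    ">GM.5.0.C2H2_ZF.0171\n"
    "0.25\t0.25\t0.25\t0.25\n"
    "0.10\t0.70\t0.10\t0.10\n"
)
EXAMPLE_TABLE = (
    "# comment line\n"
    "\tsample_a\tsample_b\n"
    "GM.5.0.C2H2_ZF.0171\t1.2\t-0.4\n"
    "GM.5.0.bHLH.0126\t0.3\t2.1\n"
)

print(missing_motifs(io.StringIO(EXAMPLE_TABLE), io.StringIO(EXAMPLE_PFM)))
```

For real output, the two handles would be opened on the files in the maelstrom output directory, for example `final.out.txt` and the PFM file passed to the report. Whether those files follow the layout assumed here should be checked first.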

shangguandong1996 commented 3 years ago

Sorry, I did not post the version :)

sgd@localhost ~
$ gimme
usage: gimme [-h] <subcommand> [options]

    GimmeMotifs v0.16.0

positional arguments:
  {motifs,scan,maelstrom,match,logo,cluster,background,threshold,location,diff,prediction,motif2factors}
    motif2factors       Generate a motif2factors file based on orthology for
                        your species of interest.

optional arguments:
  -h, --help            show this help message and exit

    commands:
        motifs          identify enriched motifs (known and/or de novo)
        scan            scan for known motifs
        maelstrom       find differential motifs
        match           find motif matches in database
        logo            create sequence logo(s)
        cluster         cluster similar motifs
        background      create a background file
        threshold       calculate motif scan threshold
        location        motif location histograms
        diff            compare motif frequency and enrichment
                        between fasta files
        motif2factors   generate a motif database based on orthology for any
                        species

    type `gimme <command> -h` for more details

fmarletaz commented 3 years ago

Hi, I am also encountering the same error with a custom dataset (it occurred with both the peak category and the peak coverage input types):

2021-06-19 16:19:01,513 - INFO - Rank aggregation
2021-06-19 16:19:01,885 - INFO - Correlation
2021-06-19 16:19:02,304 - INFO - html report
Traceback (most recent call last):
  File "/home/ferdi/miniconda3/envs/gimme/bin/gimme", line 11, in <module>
    cli(sys.argv[1:])
  File "/home/ferdi/miniconda3/envs/gimme/lib/python3.9/site-packages/gimmemotifs/cli.py", line 730, in cli
    args.func(args)
  File "/home/ferdi/miniconda3/envs/gimme/lib/python3.9/site-packages/gimmemotifs/commands/maelstrom.py", line 33, in maelstrom
    run_maelstrom(
  File "/home/ferdi/miniconda3/envs/gimme/lib/python3.9/site-packages/gimmemotifs/maelstrom.py", line 546, in run_maelstrom
    maelstrom_html_report(outdir, os.path.join(outdir, "final.out.txt"), pfmfile)
  File "/home/ferdi/miniconda3/envs/gimme/lib/python3.9/site-packages/gimmemotifs/report.py", line 868, in maelstrom_html_report
    motif_to_img_series(df.index, pfmfile=pfmfile, outdir=outdir, subdir="logos"),
  File "/home/ferdi/miniconda3/envs/gimme/lib/python3.9/site-packages/gimmemotifs/report.py", line 837, in motif_to_img_series
    raise ValueError(f"Motif {motif} does not occur in motif database")
ValueError: Motif GM.5.0.C2H2_ZF.0171 does not occur in motif database

Thanks a lot!!

simonvh commented 3 years ago

Thanks for reporting this @shangguandong1996 and @fmarletaz. This should now be fixed in the develop branch. Until 0.16.1 is released, you can run the following command in your conda environment to install the fix:

pip install git+https://github.com/vanheeringen-lab/gimmemotifs.git@develop
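
After installing, it can be useful to confirm which version the environment actually picks up. The sketch below uses only the standard library (`importlib.metadata`, available from Python 3.8). The helper names are hypothetical, and the `(0, 16, 1)` threshold is an assumption taken from the release mentioned above. A development install may still report a version string based on 0.16.0, in which case the comparison says little and the printed string is the thing to look at.

```python
import re
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def release_tuple(version):
    """Parse the leading 'X.Y' or 'X.Y.Z' part of a version string into a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def at_least(version, minimum=(0, 16, 1)):
    """True if the release part of the version is at or above the minimum."""
    return release_tuple(version) >= minimum


version = installed_version("gimmemotifs")
if version is None:
    print("gimmemotifs is not installed in this environment")
else:
    print(f"gimmemotifs {version}; release >= 0.16.1: {at_least(version)}")
```

Running this inside the same conda environment that provides the `gimme` command shows whether the pip install replaced the earlier 0.16.0 package.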

fmarletaz commented 3 years ago

Thanks a lot, I could run the new version successfully!

Ferdi

tzhu-bio commented 2 years ago

> Thanks for reporting this @shangguandong1996 and @fmarletaz. This should now be fixed in the develop branch. Until 0.16.1 is released, you can run the following command in your conda environment to install the fix:
>
> pip install git+https://github.com/vanheeringen-lab/gimmemotifs.git@develop

I am encountering the same error. How can I install version 0.16.1? I tried with conda, but it did not work for me.

simonvh commented 2 years ago

Sorry for getting back to you so late, @tzhu-bio. Can you open a new issue with the details? We would need to see the exact error message you get and any other information that may help us debug your issue.