vanheeringen-lab / gimmemotifs

Suite of motif tools, including a motif prediction pipeline for ChIP-seq experiments. See full GimmeMotifs documentation for detailed installation instructions and usage examples.
https://gimmemotifs.readthedocs.io/en/master
MIT License

IndexError: list index out of range #74

Closed itsvenu closed 3 years ago

itsvenu commented 5 years ago

Hi,

I'm using the maelstrom function to identify different motifs. I used the following command:

~/miniconda2/bin/gimme maelstrom peaks_gimme.input ~/gencode-transcripts/hg19.fa peaks_combo_res

but it returns the following error:

Traceback (most recent call last):
  File "/home/thatikon/miniconda2/bin/gimme", line 518, in <module>
    args.func(args)
  File "/home/thatikon/miniconda2/lib/python2.7/site-packages/gimmemotifs/commands/maelstrom.py", line 27, in maelstrom
    run_maelstrom(infile, genome, outdir, pwmfile, methods=methods, ncpus=ncpus)
  File "/home/thatikon/miniconda2/lib/python2.7/site-packages/gimmemotifs/maelstrom.py", line 254, in run_maelstrom
    pwmfile=pwmfile, ncpus=ncpus)
  File "/home/thatikon/miniconda2/lib/python2.7/site-packages/gimmemotifs/maelstrom.py", line 71, in scan_to_table
    s.set_threshold(fpr=FPR, genome=genome)
  File "/home/thatikon/miniconda2/lib/python2.7/site-packages/gimmemotifs/scanner.py", line 412, in set_threshold
    fa = RandomGenomicFasta(genome, length, 10000)
  File "/home/thatikon/miniconda2/lib/python2.7/site-packages/gimmemotifs/background.py", line 352, in __init__
    create_random_genomic_bedfile(tmpbed, genome, length, n)
  File "/home/thatikon/miniconda2/lib/python2.7/site-packages/gimmemotifs/background.py", line 35, in create_random_genomic_bedfile
    features = Genome(genome).get_random_sequences(n, length)
  File "/home/thatikon/miniconda2/lib/python2.7/site-packages/genomepy/functions.py", line 485, in get_random_sequences
    chroms = _weighted_selection(l, n)
  File "/home/thatikon/miniconda2/lib/python2.7/site-packages/genomepy/functions.py", line 234, in _weighted_selection
    return [items[bisect.bisect(cuml, random.random()*total_weight)] for _ in range(n)]
IndexError: list index out of range

I'm using v0.12.0. Any help would be greatly appreciated.

Thank you.

simonvh commented 5 years ago

I would suggest updating to the latest version (0.13.1), but I suspect that it would not solve this specific issue. Would you be able to share the input file with me? It's not a familiar error and I would need to debug it.

itsvenu commented 5 years ago

I sent the BED file to your institutional email. Thank you.

simonvh commented 5 years ago

Thanks, I had a look at the BED file, and it seems to be fine. I do suggest using ~200 bp sequences as input, if possible. For instance, you can take the summit of the peak as the center.
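As a rough illustration of the 200 bp suggestion, the sketch below resizes each BED interval to a fixed width. It assumes tab-separated BED input and uses the interval midpoint as a stand-in for the summit, since a plain BED file does not carry summit positions; if your peak caller reports a summit, that position would be the better center. The function name `resize_bed_line` is made up for this example and is not part of GimmeMotifs.

```python
def resize_bed_line(line, size=200):
    """Return a BED line resized to `size` bp around the interval midpoint.

    Assumption: the midpoint approximates the peak summit.
    """
    fields = line.rstrip("\r\n").split("\t")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    center = (start + end) // 2
    # Clamp at 0 so intervals near the chromosome start stay valid.
    new_start = max(0, center - size // 2)
    new_end = new_start + size
    # Keep any extra columns (name, score, ...) unchanged.
    return "\t".join([chrom, str(new_start), str(new_end)] + fields[3:])


if __name__ == "__main__":
    import sys

    # Usage: python resize_bed.py < peaks.bed > peaks_200bp.bed
    for bed_line in sys.stdin:
        if bed_line.strip():
            print(resize_bed_line(bed_line))
```

The sketch does not check the new end against the chromosome length, so intervals at the very end of a chromosome may need extra handling.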

For the error, I had a closer look at your traceback, and it seems to be related to your genome file ~/gencode-transcripts/hg19.fa. Is this just the normal hg19 genome? Where did you get it and what does it contain? Does it have duplicate chromosome names by any chance?
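One quick way to check for duplicate chromosome names is to count the FASTA header names. This is a minimal sketch that treats the first whitespace-delimited word after `>` as the sequence name; the helper names are made up for this example.

```python
from collections import Counter


def fasta_names(lines):
    """Yield the sequence name (first word after '>') of each FASTA header."""
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            yield parts[0] if parts else ""


def duplicate_names(lines):
    """Return a sorted list of sequence names that occur more than once."""
    counts = Counter(fasta_names(lines))
    return sorted(name for name, count in counts.items() if count > 1)


if __name__ == "__main__":
    import sys

    # Usage: python check_fasta_names.py hg19.fa
    with open(sys.argv[1]) as handle:
        print(duplicate_names(handle))
```

An empty list means every header name is unique.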

itsvenu commented 5 years ago

I also tested with 200 bp windows, but it still gives me the error. hg19.fa is completely normal; I don't see anything unusual in it. (It worked perfectly with alignment tools, Picard, etc.)

simonvh commented 5 years ago

Yeah, the 200 bp suggestion was more for later (when it works), as this usually results in a more informative motif analysis. However, we need to get this bug sorted first. I get that your genome is normal. However, genomepy, a package that GimmeMotifs uses under the hood, apparently has problems getting random genomic sequences from this file. This is clearly a bug, but to solve it I would need to know what is different about this file compared to the genome FASTA files that I normally use.
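For context, the last frame of the traceback picks chromosomes by bisecting a list of cumulative weights. The sketch below is not the genomepy source: only the final list comprehension is copied from the traceback, and the rest (including the `weights` argument) is an assumption made for illustration. It shows one way that expression can raise this IndexError, namely when `items` has fewer entries than there are weights. Whether that is what happens with this genome file is not established here.

```python
import bisect
import itertools
import random


def weighted_selection(items, weights, n):
    """Pick n items with probability proportional to their weights.

    Illustrative reconstruction; only the return expression comes from
    the traceback.
    """
    cuml = list(itertools.accumulate(weights))
    total_weight = cuml[-1]
    return [items[bisect.bisect(cuml, random.random() * total_weight)] for _ in range(n)]


# Matching lengths: every index returned by bisect is valid.
print(weighted_selection(["chr1", "chr2"], [10, 5], 3))

# Two names but three weights (hypothetical mismatch): with these weights
# bisect always returns index 2, which is past the end of `items`.
try:
    weighted_selection(["chr1", "chr2"], [0, 0, 1], 1)
except IndexError as err:
    print("IndexError:", err)
```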

itsvenu commented 5 years ago

I got it working. I re-sorted the chromosome order in the FASTA file (to chr1, chr10, chr11, ...); my initial FASTA file had chr1, chr2, chr3, etc. I don't know if you have observed this behavior before, but it seems the chromosome order in the FASTA file is important for the tool to work.
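In case it helps someone else, a re-sort like this can be sketched in a few lines of Python. This is only an illustration of the idea, not necessarily the exact procedure used above: it orders records by the plain string order of the sequence name, which gives chr1, chr10, chr11, ..., chr2. The helper names are made up, and the whole file is held in memory, so a full genome needs several GB of RAM.

```python
def read_fasta(lines):
    """Yield (header, sequence_lines) for each record in a FASTA file."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, seq
            header, seq = line, []
        elif header is not None:
            seq.append(line)
    if header is not None:
        yield header, seq


def record_name(header):
    """Return the first word after '>' in a FASTA header."""
    parts = header[1:].split()
    return parts[0] if parts else ""


def sort_fasta(lines):
    """Return FASTA lines with records ordered by name as plain strings."""
    records = sorted(read_fasta(lines), key=lambda rec: record_name(rec[0]))
    out = []
    for header, seq in records:
        out.append(header)
        out.extend(seq)
    return out


if __name__ == "__main__":
    import sys

    # Usage: python sort_fasta.py hg19.fa > hg19.sorted.fa
    with open(sys.argv[1]) as handle:
        print("\n".join(sort_fasta(handle)))
```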

Thanks for your time.

simonvh commented 5 years ago

I have not! Thanks for reporting this, this is something that I should fix.

lynnyummy commented 2 years ago

Hi, I am using ATAC peaks for motif enrichment and got the same issue. I tried re-sorting the genome and reinstalling GimmeMotifs, but it keeps reporting the same error. The following is the error:

Traceback (most recent call last):
  File "/data/home/.conda/envs/gimme/bin/gimme", line 11, in <module>
    cli(sys.argv[1:])
  File "/.conda/envs/gimme/lib/python3.9/site-packages/gimmemotifs/cli.py", line 730, in cli
    args.func(args)
  File "/.conda/envs/gimme/lib/python3.9/site-packages/gimmemotifs/commands/motifs.py", line 58, in motifs
    write_equalsize_bedfile(args.sample, args.size, outfile)
  File "/.conda/envs/gimme/lib/python3.9/site-packages/gimmemotifs/utils.py", line 237, in write_equalsize_bedfile
    start, end = int(vals[1]), int(vals[2])
IndexError: list index out of range

Could you please help take a look? Thank you very much!
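Note that this traceback ends in write_equalsize_bedfile, which reads the input BED file, not in the genomepy code from the original report. The failing expression indexes `vals[1]` and `vals[2]`, so one plausible cause (an assumption, not confirmed in this thread) is an input line with fewer than three tab-separated fields, for example a space-delimited line, a header line, or a blank line. A minimal sketch to locate such lines, with a made-up helper name:

```python
def short_bed_lines(lines, sep="\t"):
    """Return (line_number, text) for lines with fewer than 3 fields.

    Blank lines are reported too, since they also have fewer than 3 fields.
    """
    bad = []
    for lineno, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if len(text.split(sep)) < 3:
            bad.append((lineno, text))
    return bad


if __name__ == "__main__":
    import sys

    # Usage: python check_bed.py peaks.bed
    with open(sys.argv[1]) as handle:
        for lineno, text in short_bed_lines(handle):
            print(f"line {lineno}: {text!r}")
```

If this prints nothing, every line has at least three tab-separated columns and the cause lies elsewhere.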