vanheeringen-lab / gimmemotifs

Suite of motif tools, including a motif prediction pipeline for ChIP-seq experiments. See full GimmeMotifs documentation for detailed installation instructions and usage examples.
https://gimmemotifs.readthedocs.io/en/master
MIT License

JASPAR2020_vertebrates error #268

Open gfudenberg opened 2 years ago

gfudenberg commented 2 years ago

Hello, I am trying to provide the JASPAR 2020 database to maelstrom, as described here: https://gimmemotifs.readthedocs.io/en/stable/overview.html#motif-databases.

gimme maelstrom seems to run fine when I use the default database or provide the JASPAR 2018 motifs with the added parameter -p JASPAR2018_vertebrates. However, I get KeyErrors when I provide JASPAR 2020 or 2022. It appears that motif IDs stop matching at some point during scanning.

scanning:   0%|                                      | 0/1000 [00:00<?, ? sequences/s]
Traceback (most recent call last):
  File "/home1/fudenber/.conda/envs/gimme/bin/gimme", line 11, in <module>
    cli(sys.argv[1:])
  File "/home1/fudenber/.conda/envs/gimme/lib/python3.9/site-packages/gimmemotifs/cli.py", line 746, in cli
    args.func(args)
  File "/home1/fudenber/.conda/envs/gimme/lib/python3.9/site-packages/gimmemotifs/commands/maelstrom.py", line 33, in maelstrom
    run_maelstrom(
  File "/home1/fudenber/.conda/envs/gimme/lib/python3.9/site-packages/gimmemotifs/maelstrom.py", line 343, in run_maelstrom
    counts = scan_regionfile_to_table(
  File "/home1/fudenber/.conda/envs/gimme/lib/python3.9/site-packages/gimmemotifs/scanner.py", line 182, in scan_regionfile_to_table
    for row in s.count(regions):
  File "/home1/fudenber/.conda/envs/gimme/lib/python3.9/site-packages/gimmemotifs/scanner.py", line 1031, in count
    for matches in self.scan(seqs, nreport, scan_rc):
  File "/home1/fudenber/.conda/envs/gimme/lib/python3.9/site-packages/gimmemotifs/scanner.py", line 1140, in scan
    for result in it:
  File "/home1/fudenber/.conda/envs/gimme/lib/python3.9/site-packages/gimmemotifs/scanner.py", line 1204, in _scan_sequences
    motifs = [(m, thresholds[m.id]) for m in read_motifs(self.motifs)]
  File "/home1/fudenber/.conda/envs/gimme/lib/python3.9/site-packages/gimmemotifs/scanner.py", line 1204, in <listcomp>
    motifs = [(m, thresholds[m.id]) for m in read_motifs(self.motifs)]
KeyError: 'MA0637.1_CENPB'
scanning:   0%|                                      | 0/1000 [00:02<?, ? sequences/s]
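
The failing line in the traceback pairs each motif read from the motif file with thresholds[m.id]. The KeyError therefore means the threshold lookup had no entry under that exact ID. The sketch below is plain Python that reproduces that lookup pattern. It is not the gimmemotifs API, and Motif, missing_threshold_ids and pair_motifs_with_thresholds are hypothetical names used only for illustration. It shows how checking all IDs up front would report every mismatched motif instead of stopping at the first one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Motif:
    """Hypothetical stand-in for a motif object; only the `id` attribute is used."""
    id: str


def missing_threshold_ids(motifs, thresholds):
    """Return the IDs of motifs that have no key in `thresholds`, in input order."""
    return [m.id for m in motifs if m.id not in thresholds]


def pair_motifs_with_thresholds(motifs, thresholds):
    """Pair each motif with its threshold, like the list comprehension in the
    traceback, but raise one KeyError that names every missing ID."""
    missing = missing_threshold_ids(motifs, thresholds)
    if missing:
        raise KeyError(f"no threshold for {len(missing)} motif(s): {missing}")
    return [(m, thresholds[m.id]) for m in motifs]


if __name__ == "__main__":
    # Illustrative data only: the second motif ID and the threshold value are made up.
    motifs = [Motif("MA0637.1_CENPB"), Motif("hypothetical_motif")]
    thresholds = {"hypothetical_motif": 0.5}
    print(missing_threshold_ids(motifs, thresholds))  # ['MA0637.1_CENPB']
```

Listing all missing IDs at once makes it easier to see whether only a few motifs are affected or whether the two ID sets differ systematically (for example, in naming).
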
siebrenf commented 2 years ago

That's a weird one! No clue why it's working with one, but not the other...

We're attempting to upgrade gimme on bioconda now, and I secretly hope that might fix it!