vanheeringen-lab / gimmemotifs

Suite of motif tools, including a motif prediction pipeline for ChIP-seq experiments. See the full GimmeMotifs documentation for detailed installation instructions and usage examples.
https://gimmemotifs.readthedocs.io/en/master
MIT License

sqlite3.DatabaseError: file is not a database #169

Closed: jos4uke closed this issue 3 years ago

jos4uke commented 3 years ago

Hi @simonvh

I ran gimme motifs with the JASPAR2020_plants motif database, but I may have set up the command line incorrectly, because I got the error sqlite3.DatabaseError: file is not a database. Here is the relevant part of the command:

# see Steps to reproduce for the full command line
gimme motifs -p JASPAR2020_plants ...

As you can see, I didn't append the .pfm extension to the motif database name, unlike the examples in your documentation for JASPAR2020_vertebrates or HOMER.

I checked where the JASPAR2020_plants motif database is located in my installation. It was installed, along with all the other motif databases, in /opt/share/FLOCAD/userspace/jtran1/miniconda3/envs/gimmemotifs/lib/python3.8/site-packages/data/motif_databases/ (see Additional information for the list of included motif databases installed via conda).

I also checked the default motif database location parameter in the configuration file (see Additional information for its content): motif_databases = motif_databases. This matches the folder name in my installation.

Still, it seems that gimme motifs doesn't know where to look for the motif database in my installation.

Is this the right way to specify a motif database that is included with gimme motifs? Or do I need to set up the configuration file differently so that gimme motifs knows where to find it? Could you please help?
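For readers with the same question, here is a minimal sketch of the kind of lookup being asked about. It assumes, without confirmation in this thread, that the -p value is first tried as a file path and otherwise looked up as <name> or <name>.pfm inside the installed motif_databases directory. resolve_motif_db is a hypothetical helper, not part of GimmeMotifs.

```bash
# Hypothetical helper: illustrates an ASSUMED lookup order for `gimme motifs -p`.
# This is not GimmeMotifs' own code.
resolve_motif_db() {
    local name="$1"    # value passed to -p, e.g. JASPAR2020_plants
    local db_dir="$2"  # the installed motif_databases directory
    local candidate

    # 1. An existing file path is used as-is
    if [ -f "$name" ]; then
        printf '%s\n' "$name"
        return 0
    fi

    # 2. Otherwise try the name inside the database directory, then with .pfm appended
    for candidate in "$db_dir/$name" "$db_dir/$name.pfm"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done

    echo "motif database '$name' not found in $db_dir" >&2
    return 1
}
```

For example, `resolve_motif_db JASPAR2020_plants /opt/share/FLOCAD/userspace/jtran1/miniconda3/envs/gimmemotifs/lib/python3.8/site-packages/data/motif_databases` would print the path of JASPAR2020_plants.pfm, which is present in the listing further down. Note also that the traceback below fails while opening the cache (diskcache's Cache(CACHE_DIR) in scanner.py), not while reading a motif file.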

Thanks in advance

@jos4uke

Steps to reproduce:

#!/bin/bash
# Run as a SLURM array job: SLURM_ARRAY_TASK_ID selects one zone per task
SEQDIR=seq
ZONES=(zone1 zone2 zone3 zone4)
ZONE_SEQS=(cds.ranges.zone1.flank.seqs.fa cds.ranges.zone2.flank.seqs.fa cds.ranges.zone3.flank.seqs.fa cds.ranges.zone4.flank.seqs.fa)
OUTDIR=gimmemotifs
THREADS=12

#### gimme motifs
MOTIFS_DB=JASPAR2020_plants   # database name given without the .pfm extension
REF_GENOME=my_genome.fa

# Input FASTA, zone name and output directory for this array task
ZONE_SEQ="${SEQDIR}/${ZONE_SEQS[$SLURM_ARRAY_TASK_ID]}"
ZONE="${ZONES[$SLURM_ARRAY_TASK_ID]}"
OUTDIR_MOTIFS="${OUTDIR}/${ZONE}.motifs"
echo "### zone seq file: ${ZONE_SEQ}"
echo "### zone: ${ZONE}"
echo "### output directory: ${OUTDIR_MOTIFS}"

gimme motifs "${ZONE_SEQ}" "${OUTDIR_MOTIFS}" \
    -g "${REF_GENOME}" \
    --background random \
    -p "${MOTIFS_DB}" \
    --analysis xl \
    --keepintermediate \
    --size 1000 \
    -N "${THREADS}"


Error logs:

2021-01-13 22:04:09,094 - WARNING - size parameter will be ignored for FASTA input
2021-01-13 22:04:09,095 - INFO - creating background (random)
2021-01-13 22:04:27,450 - INFO - starting full motif analysis
2021-01-13 22:04:27,475 - INFO - using size of 1000, set size to 0 to use original region size
2021-01-13 22:04:27,476 - INFO - preparing input from FASTA
2021-01-13 22:04:27,476 - INFO - preparing input (FASTA)
2021-01-13 22:05:09,014 - INFO - starting motif prediction (xl)
2021-01-13 22:05:09,014 - INFO - tools: MEME, BioProspector, Homer
2021-01-13 22:05:10,005 - INFO - all jobs submitted
2021-01-13 22:05:25,253 - INFO - Homer_width_6 finished, found 5 motifs
2021-01-13 22:05:33,216 - INFO - Homer_width_8 finished, found 5 motifs
2021-01-13 22:06:13,432 - INFO - Homer_width_10 finished, found 5 motifs
2021-01-13 22:07:56,607 - INFO - Homer_width_12 finished, found 5 motifs
2021-01-13 22:10:04,261 - INFO - BioProspector_width_6 finished, found 5 motifs
2021-01-13 22:10:04,711 - INFO - MEME_width_6 finished, found 0 motifs
2021-01-13 22:10:04,951 - INFO - MEME_width_8 finished, found 0 motifs
2021-01-13 22:10:05,211 - INFO - MEME_width_10 finished, found 0 motifs
2021-01-13 22:10:05,445 - INFO - MEME_width_12 finished, found 0 motifs
2021-01-13 22:10:05,687 - INFO - MEME_width_14 finished, found 0 motifs
2021-01-13 22:10:05,913 - INFO - MEME_width_16 finished, found 0 motifs
2021-01-13 22:10:06,140 - INFO - MEME_width_18 finished, found 0 motifs
2021-01-13 22:10:06,411 - INFO - MEME_width_20 finished, found 0 motifs
/opt/share/FLOCAD/userspace/jtran1/miniconda3/envs/gimmemotifs/lib/python3.8/subprocess.py:844: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
  self.stdout = io.open(c2pread, 'rb', bufsize)
/opt/share/FLOCAD/userspace/jtran1/miniconda3/envs/gimmemotifs/lib/python3.8/subprocess.py:849: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
  self.stderr = io.open(errread, 'rb', bufsize)
2021-01-13 22:10:07,033 - INFO - using 14000 sequences
2021-01-13 22:11:04,225 - INFO - BioProspector_width_8 finished, found 5 motifs
2021-01-13 22:11:32,823 - INFO - BioProspector_width_10 finished, found 5 motifs
2021-01-13 22:11:59,414 - INFO - BioProspector_width_12 finished, found 5 motifs
2021-01-13 22:12:47,144 - INFO - BioProspector_width_16 finished, found 5 motifs
2021-01-13 22:13:06,400 - INFO - Homer_width_14 finished, found 5 motifs
2021-01-13 22:13:23,304 - INFO - BioProspector_width_18 finished, found 5 motifs
2021-01-13 22:13:41,501 - INFO - BioProspector_width_20 finished, found 5 motifs
2021-01-13 22:14:16,643 - INFO - BioProspector_width_14 finished, found 5 motifs
2021-01-13 22:17:48,051 - INFO - Homer_width_16 finished, found 5 motifs
2021-01-13 22:25:03,907 - INFO - Homer_width_18 finished, found 5 motifs
2021-01-13 22:33:15,235 - INFO - Homer_width_20 finished, found 5 motifs
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/opt/share/FLOCAD/userspace/jtran1/miniconda3/envs/gimmemotifs/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/opt/share/FLOCAD/userspace/jtran1/miniconda3/envs/gimmemotifs/lib/python3.8/site-packages/gimmemotifs/prediction.py", line 47, in mp_calc_stats
    stats = calc_stats(
  File "/opt/share/FLOCAD/userspace/jtran1/miniconda3/envs/gimmemotifs/lib/python3.8/site-packages/gimmemotifs/stats.py", line 216, in calc_stats
    for batch_result in calc_stats_iterator(
  File "/opt/share/FLOCAD/userspace/jtran1/miniconda3/envs/gimmemotifs/lib/python3.8/site-packages/gimmemotifs/stats.py", line 121, in calc_stats_iterator
    s.set_meanstd(gc=gc)
  File "/opt/share/FLOCAD/userspace/jtran1/miniconda3/envs/gimmemotifs/lib/python3.8/site-packages/gimmemotifs/scanner.py", line 715, in set_meanstd
    self.set_background(gc=gc)
  File "/opt/share/FLOCAD/userspace/jtran1/miniconda3/envs/gimmemotifs/lib/python3.8/site-packages/gimmemotifs/scanner.py", line 843, in set_background
    with Cache(CACHE_DIR) as cache:
  File "/opt/share/FLOCAD/userspace/jtran1/miniconda3/envs/gimmemotifs/lib/python3.8/site-packages/diskcache/core.py", line 457, in __init__
    sql = self._sql_retry
  File "/opt/share/FLOCAD/userspace/jtran1/miniconda3/envs/gimmemotifs/lib/python3.8/site-packages/diskcache/core.py", line 649, in _sql_retry
    sql = self._sql
  File "/opt/share/FLOCAD/userspace/jtran1/miniconda3/envs/gimmemotifs/lib/python3.8/site-packages/diskcache/core.py", line 644, in _sql
    return self._con.execute
  File "/opt/share/FLOCAD/userspace/jtran1/miniconda3/envs/gimmemotifs/lib/python3.8/site-packages/diskcache/core.py", line 631, in _con
    settings = con.execute(select).fetchall()
sqlite3.DatabaseError: file is not a database
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/share/FLOCAD/userspace/jtran1/miniconda3/envs/gimmemotifs/bin/gimme", line 11, in <module>
    cli(sys.argv[1:])
  File "/opt/share/FLOCAD/userspace/jtran1/miniconda3/envs/gimmemotifs/lib/python3.8/site-packages/gimmemotifs/cli.py", line 661, in cli
    args.func(args)
  File "/opt/share/FLOCAD/userspace/jtran1/miniconda3/envs/gimmemotifs/lib/python3.8/site-packages/gimmemotifs/commands/motifs.py", line 104, in motifs
    gimme_motifs(
  File "/opt/share/FLOCAD/userspace/jtran1/miniconda3/envs/gimmemotifs/lib/python3.8/site-packages/gimmemotifs/denovo.py", line 613, in gimme_motifs
    result = predict_motifs(
  File "/opt/share/FLOCAD/userspace/jtran1/miniconda3/envs/gimmemotifs/lib/python3.8/site-packages/gimmemotifs/prediction.py", line 360, in predict_motifs
    result = pp_predict_motifs(
  File "/opt/share/FLOCAD/userspace/jtran1/miniconda3/envs/gimmemotifs/lib/python3.8/site-packages/gimmemotifs/prediction.py", line 320, in pp_predict_motifs
    result.wait_for_stats()
  File "/opt/share/FLOCAD/userspace/jtran1/miniconda3/envs/gimmemotifs/lib/python3.8/site-packages/gimmemotifs/prediction.py", line 178, in wait_for_stats
    job.get()
  File "/opt/share/FLOCAD/userspace/jtran1/miniconda3/envs/gimmemotifs/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
sqlite3.DatabaseError: file is not a database
ERROR: file is not a database
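The traceback ends in diskcache opening its SQLite store under CACHE_DIR, and sqlite3 reports "file is not a database" when a file does not look like a SQLite database. A quick, heuristic way to spot such files is to check for the SQLite header string at the start of each file. The sketch assumes the cache lives in ~/.cache/gimmemotifs and that its SQLite files end in .db (diskcache names its file cache.db); is_sqlite_file and find_corrupt_dbs are hypothetical helpers. A file can pass this header check and still be damaged further in, so treat it as a first look only.

```bash
# Heuristic: a non-empty SQLite database starts with the 16-byte header
# "SQLite format 3\0"; we compare the first 15 (printable) bytes.
is_sqlite_file() {
    local f="$1"
    # SQLite treats a zero-length file as an empty, valid database
    [ -s "$f" ] || return 0
    [ "$(head -c 15 "$f")" = "SQLite format 3" ]
}

# Print every *.db file under the cache directory that fails the header check.
# Assumption: the GimmeMotifs cache defaults to ~/.cache/gimmemotifs.
find_corrupt_dbs() {
    local cache_dir="${1:-$HOME/.cache/gimmemotifs}"
    find "$cache_dir" -type f -name '*.db' 2>/dev/null |
        while IFS= read -r f; do
            is_sqlite_file "$f" || printf '%s\n' "$f"
        done
}
```

Running `find_corrupt_dbs` with no argument checks the default location; any path it prints is a candidate for the error above.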


Additional information:

I installed gimme motifs using conda. Searching for the included motif databases, I found them installed under: /opt/share/FLOCAD/userspace/jtran1/miniconda3/envs/gimmemotifs/lib/python3.8/site-packages/data/motif_databases/

# included motifs databases
CIS-BP.motif2factors.txt                 HOMER.motif2factors.txt                    JASPAR2020_insects.pfm
CIS-BP.pfm                               HOMER.pfm                                  JASPAR2020.motif2factors.txt
ENCODE.motif2factors.txt                 IMAGE.motif2factors.txt                    JASPAR2020_nematodes.motif2factors.txt
ENCODE.pfm                               IMAGE.pfm                                  JASPAR2020_nematodes.pfm
factorbook.motif2factors.txt             JASPAR2018_fungi.motif2factors.txt         JASPAR2020.pfm
factorbook.pfm                           JASPAR2018_fungi.pfm                       JASPAR2020_plants.motif2factors.txt
gimme.invertebrate.v1.0.pwm              JASPAR2018_insects.motif2factors.txt       JASPAR2020_plants.pfm
gimme.plant.v1.0.pwm                     JASPAR2018_insects.pfm                     JASPAR2020_urochordates.motif2factors.txt
gimme.vertebrate.v3.1.factor2motifs.txt  JASPAR2018.motif2factors.txt               JASPAR2020_urochordates.pfm
gimme.vertebrate.v3.1.motif2factors.txt  JASPAR2018_nematodes.motif2factors.txt     JASPAR2020_vertebrates.motif2factors.txt
gimme.vertebrate.v3.1.pwm                JASPAR2018_nematodes.pfm                   JASPAR2020_vertebrates.pfm
gimme.vertebrate.v5.0.motif2factors.txt  JASPAR2018.pfm                             RSAT_insects.motif2factors.txt
gimme.vertebrate.v5.0.pfm                JASPAR2018_plants.motif2factors.txt        RSAT_insects.pfm
HOCOMOCOv10_HUMAN.motif2factors.txt      JASPAR2018_plants.pfm                      RSAT_plants.motif2factors.txt
HOCOMOCOv10_HUMAN.pfm                    JASPAR2018_urochordates.motif2factors.txt  RSAT_plants.pfm
HOCOMOCOv10_MOUSE.motif2factors.txt      JASPAR2018_urochordates.pfm                RSAT_vertebrates.motif2factors.txt
HOCOMOCOv10_MOUSE.pfm                    JASPAR2018_vertebrates.motif2factors.txt   RSAT_vertebrates.pfm
HOCOMOCOv11_HUMAN.motif2factors.txt      JASPAR2018_vertebrates.pfm                 SwissRegulon.motif2factors.txt
HOCOMOCOv11_HUMAN.pfm                    JASPAR2020_fungi.motif2factors.txt         SwissRegulon.pfm
HOCOMOCOv11_MOUSE.motif2factors.txt      JASPAR2020_fungi.pfm
HOCOMOCOv11_MOUSE.pfm 

Here is the content of my config file:

# ~/.config/gimmemotifs/gimmemotifs.cfg
[main]
bg = bg
template_dir = templates
score_dir = score_dists
gene_dir = genes
motif_databases = motif_databases
tools = included_tools/

[params]
fraction = 0.2
use_strand = False
abs_max = 5000
analysis = xl
enrichment = 1.5
size = 200
lsize = 500
background = gc,random
cluster_threshold = 0.95
scan_cutoff = 0.9
available_tools = MDmodule,MEME,MEMEW,Weeder,GADEM,MotifSampler,Trawler,Improbizer,BioProspector,Posmo,ChIPMunk,AMD,HMS,Homer,XXmotif,ProSampler,DiNAMO
tools = MEME,BioProspector,Homer
pvalue = 0.001
max_time = -1
ncpus = 12
motif_db = JASPAR2020_plants.pfm
use_cache = False

[YAMDA]
bin = run_em.py
dir = included_tools/

[DiNAMO]
bin = /opt/share/FLOCAD/userspace/jtran1/miniconda3/envs/gimmemotifs/bin/dinamo
dir = /opt/share/FLOCAD/userspace/jtran1/miniconda3/envs/gimmemotifs/bin

[RPMCMC]
bin = multi_motif_finder
dir = included_tools/

[ProSampler]
bin = /opt/share/FLOCAD/userspace/jtran1/miniconda3/envs/gimmemotifs/bin/ProSampler
dir = /opt/share/FLOCAD/userspace/jtran1/miniconda3/envs/gimmemotifs/bin

[AMD]
bin = AMD.bin
dir = included_tools/

[MEME]
bin = /opt/share/FLOCAD/userspace/jtran1/miniconda3/envs/gimmemotifs/bin/meme
dir = /opt/share/FLOCAD/userspace/jtran1/miniconda3/envs/gimmemotifs/bin

[DREME]
bin = dreme-py3
dir = included_tools/

[MEMEW]
bin = /opt/share/FLOCAD/userspace/jtran1/miniconda3/envs/gimmemotifs/bin/meme
dir = /opt/share/FLOCAD/userspace/jtran1/miniconda3/envs/gimmemotifs/bin

[MDmodule]
bin = MDmodule
dir = included_tools/

[Improbizer]
bin = ameme
dir = included_tools/

[MotifSampler]
bin = MotifSampler
dir = included_tools/

[GADEM]
bin = /opt/share/FLOCAD/userspace/jtran1/miniconda3/envs/gimmemotifs/bin/gadem
dir = /opt/share/FLOCAD/userspace/jtran1/miniconda3/envs/gimmemotifs/bin

[Trawler]
bin = /opt/share/FLOCAD/userspace/jtran1/miniconda3/envs/gimmemotifs/bin/trawler
dir = /opt/share/FLOCAD/userspace/jtran1/miniconda3/envs/gimmemotifs/bin

[Weeder]
bin = /opt/share/FLOCAD/userspace/jtran1/miniconda3/envs/gimmemotifs/bin/weeder2
dir = /opt/share/FLOCAD/userspace/jtran1/miniconda3/envs/gimmemotifs/bin

[BioProspector]
bin = BioProspector
dir = included_tools/

[ChIPMunk]
bin = ChIPMunk.sh
dir = included_tools/ChIPMunk

[Homer]
bin = /opt/share/FLOCAD/userspace/jtran1/miniconda3/envs/gimmemotifs/bin/homer2
dir = /opt/share/FLOCAD/userspace/jtran1/miniconda3/envs/gimmemotifs/bin

[XXmotif]
bin = /opt/share/FLOCAD/userspace/jtran1/miniconda3/envs/gimmemotifs/bin/XXmotif
dir = /opt/share/FLOCAD/userspace/jtran1/miniconda3/envs/gimmemotifs/bin

[Posmo]
bin = posmo
dir = included_tools/

[HMS]
bin = hms
dir = included_tools/HMS
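To confirm which defaults this config sets, for example motif_db under [params] or motif_databases under [main], a small awk-based reader for one key in one section of the INI-style file can help. get_cfg_value is a hypothetical helper; it handles plain key = value lines only (no continuation lines or inline comments).

```bash
# Hypothetical helper: print the value of KEY in [SECTION] of an INI-style file.
get_cfg_value() {
    local cfg="$1" section="$2" key="$3"
    awk -v section="[$section]" -v key="$key" '
        /^\[/ {
            # entering a new section: remember whether it is the one we want
            header = $0
            sub(/[ \t\r]+$/, "", header)
            in_section = (header == section)
            next
        }
        in_section {
            # split "key = value" on the first "=" and trim whitespace
            eq = index($0, "=")
            if (eq == 0) next
            k = substr($0, 1, eq - 1)
            v = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t\r]+$/, "", k)
            gsub(/^[ \t]+|[ \t\r]+$/, "", v)
            if (k == key) { print v; exit }
        }
    ' "$cfg"
}
```

With the config above, `get_cfg_value ~/.config/gimmemotifs/gimmemotifs.cfg params motif_db` prints JASPAR2020_plants.pfm.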

simonvh commented 3 years ago

Hi @jos4uke, am I correct in assuming that you're running this on SLURM? GimmeMotifs is currently not process-safe, so runs that execute at the same time can corrupt its cache. You have to delete the cache directory, which has become corrupted: ~/.cache/gimmemotifs.
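Deleting the cache can be done by hand with `rm -rf ~/.cache/gimmemotifs`. Below is a slightly more careful sketch, assuming the cache sits under $XDG_CACHE_HOME when that variable is set and under ~/.cache otherwise; clear_gimme_cache is a hypothetical helper. Run it only when no gimme jobs are active, since another job could be writing to the cache at the same moment.

```bash
# Hypothetical helper: remove the (corrupted) GimmeMotifs cache directory.
# Assumption: the cache lives in $XDG_CACHE_HOME/gimmemotifs, defaulting to ~/.cache.
clear_gimme_cache() {
    local cache_dir="${XDG_CACHE_HOME:-$HOME/.cache}/gimmemotifs"
    if [ -d "$cache_dir" ]; then
        rm -rf -- "$cache_dir"
        echo "removed $cache_dir"
    else
        echo "no cache found at $cache_dir"
    fi
}
```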

Afterwards, you can try the workaround described in the first FAQ entry: https://gimmemotifs.readthedocs.io/en/master/faq.html#faq
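The FAQ's exact recipe is not reproduced here. As a sketch, assuming the workaround amounts to giving every concurrent job its own cache directory through XDG_CACHE_HOME, a SLURM array script could call a helper like setup_task_cache (hypothetical) before running gimme motifs.

```bash
# Sketch (assumption): give each SLURM array task its own cache directory so
# concurrent gimme runs do not share, and corrupt, one SQLite cache.
setup_task_cache() {
    local base="${1:-${TMPDIR:-/tmp}}"
    # SLURM_JOB_ID / SLURM_ARRAY_TASK_ID are set by SLURM; fall back to the shell PID
    local tag="${SLURM_JOB_ID:-$$}_${SLURM_ARRAY_TASK_ID:-0}"
    export XDG_CACHE_HOME="$base/gimme_cache_$tag"
    mkdir -p "$XDG_CACHE_HOME"
}
```

In the reporter's script, calling setup_task_cache once before the gimme motifs command would give each array task a separate cache. The trade-off is that every task starts from an empty cache, so cached background data is recomputed per task rather than shared; removing $XDG_CACHE_HOME at the end of the job keeps scratch space tidy.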

jos4uke commented 3 years ago

Hi @simonvh

Thanks for your reply. I will try your suggestions and let you know whether they work for me.

Best

jos4uke commented 3 years ago

Hi @simonvh

Your suggestions fixed my issue on SLURM. However, I ran into a new problem at the de novo reporting step, for which I will open a new issue. Thanks again!

Best

simonvh commented 3 years ago

Added a more informative warning, see commit 50103d4.