streetslab / dimelo

python package for analysis of dimelo-seq & nanopore modified base data
MIT License

OperationalError: database is locked when using `plot_enrichment` functions #35

Open palakela opened 1 year ago

palakela commented 1 year ago

Hi,

I am trying to analyse a Megalodon output BAM file using your Python package. The qc_report function generates the QC report correctly, but when I call the plot_enrichment or plot_enrichment_profile functions, I run into this error:

---------------------------------------------------------------------------
_RemoteTraceback                          Traceback (most recent call last)
_RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/miniconda3/envs/dimelo/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 436, in _process_worker
    r = call_item()
  File "/miniconda3/envs/dimelo/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 288, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "/miniconda3/envs/dimelo/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 595, in __call__
    return self.func(*args, **kwargs)
  File "/miniconda3/envs/dimelo/lib/python3.7/site-packages/joblib/parallel.py", line 264, in __call__
    for func, args, kwargs in self.items]
  File "/miniconda3/envs/dimelo/lib/python3.7/site-packages/joblib/parallel.py", line 264, in <listcomp>
    for func, args, kwargs in self.items]
  File "/miniconda3/envs/dimelo/lib/python3.7/site-packages/dimelo/parse_bam.py", line 460, in parse_reads_window
    extractAllBases,
  File "/miniconda3/envs/dimelo/lib/python3.7/site-packages/dimelo/parse_bam.py", line 554, in get_modified_reference_positions
    extractAllBases,
  File "/miniconda3/envs/dimelo/lib/python3.7/site-packages/dimelo/parse_bam.py", line 725, in get_mod_reference_positions_by_mod
    outDir,
  File "/miniconda3/envs/dimelo/lib/python3.7/site-packages/dimelo/parse_bam.py", line 801, in update_methylation_aggregate_db
    execute_sql_command(command, DATABASE_NAME, data_fill, connection)
  File "/miniconda3/envs/dimelo/lib/python3.7/site-packages/dimelo/utils.py", line 61, in execute_sql_command
    c.executemany(command, values)
...
--> 384             raise self._exception
    385         else:
    386             return self._result

OperationalError: database is locked

Could someone please help me figure out what is happening?

thekugelmeister commented 1 year ago

Hi! Sorry you're running into this problem. We are aware of the existence of this sort of database locking problem, and have been looking into it. Long story short, we are evaluating a couple of different ways to address the issue but have not made any final decisions yet.

Interestingly, we have not seen this problem specifically in plot_enrichment. We aren't surprised it can happen there, we just haven't seen it ourselves yet.

Some thoughts:

Some strategies we have had some success with for getting around this issue:

palakela commented 1 year ago

I am working locally on Linux CentOS 8 with 128 cores, trying to analyze modbam files generated by Megalodon v2.3.4 with Guppy v5.0.16. The files are around 8 GB each.

I am using the default parameters of plot_enrichment and your BED file for the CTCF motif+peak.

I have tried using a subset of the files (1.24 GB), but I run into the same issue. plot_enrichment only works when I reduce the analysis to a single chromosome (445 MB), and plot_enrichment_profile still doesn't work even then.

thekugelmeister commented 1 year ago

The last thing I'd try in the short term is to reduce the number of cores (possibly even to a single core, if necessary). In theory, you shouldn't be able to hit the database locking error with only one core, since no other worker process would be writing to the database at the same time. I know that's not a great long-term solution, but right now that's the best idea I have.

Sorry I don't have anything better! We're working on this error.
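
For readers wondering why a single writer sidesteps the error, here is a minimal sketch using only Python's standard sqlite3 module. It is not dimelo's code: the table name, the function name, and the use of two connections in one process (standing in for two parallel workers) are assumptions made for illustration. It shows a second writer failing with "database is locked" while the first holds the write lock, and succeeding once writes happen one at a time.

```python
import os
import sqlite3
import tempfile


def demo_lock_and_recovery():
    """Return (error message seen by the blocked writer, final row count)."""
    # Hypothetical throwaway database; dimelo's real database layout is not shown here.
    db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

    # timeout=0 makes a blocked writer fail immediately instead of waiting.
    # isolation_level=None lets us issue BEGIN/COMMIT explicitly.
    writer_a = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    writer_b = sqlite3.connect(db_path, timeout=0, isolation_level=None)

    writer_a.execute("CREATE TABLE demo_counts (pos INTEGER, count INTEGER)")

    # Writer A opens a write transaction and holds the write lock.
    writer_a.execute("BEGIN IMMEDIATE")
    writer_a.execute("INSERT INTO demo_counts VALUES (1, 10)")

    # Writer B tries to write at the same time and is refused.
    error = None
    try:
        writer_b.execute("BEGIN IMMEDIATE")
    except sqlite3.OperationalError as exc:
        error = str(exc)

    # Once A commits, B can write: serial writers never collide.
    writer_a.execute("COMMIT")
    writer_b.execute("BEGIN IMMEDIATE")
    writer_b.execute("INSERT INTO demo_counts VALUES (2, 20)")
    writer_b.execute("COMMIT")

    rows = writer_b.execute("SELECT COUNT(*) FROM demo_counts").fetchone()[0]
    writer_a.close()
    writer_b.close()
    return error, rows


if __name__ == "__main__":
    print(demo_lock_and_recovery())
```

With many workers and a short lock timeout, the chance that two of them attempt a write at the same moment grows, which is consistent with the error appearing on a 128-core machine and with the single-core suggestion above.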