phac-nml / mob-suite

MOB-suite: Software tools for clustering, reconstruction and typing of plasmids from draft assemblies
Apache License 2.0

Multiple mob_recon instances cannot run concurrently. #46

Closed Takadonet closed 4 years ago

Takadonet commented 4 years ago

When running hundreds of mob_recon concurrently in our Galaxy environment, we are getting the following error:

NCBI database format is outdated. Upgrading
Downloading taxdump.tar.gz from NCBI FTP site (via HTTP)...
Done. Parsing...
Traceback (most recent call last):
  File "/Galaxy/deps/_conda/envs/__mob_suite@2.0.5/lib/python3.6/site-packages/mob_suite/mob_typer.py", line 520, in <module>
    main()
  File "/Galaxy/deps/_conda/envs/__mob_suite@2.0.5/lib/python3.6/site-packages/mob_suite/mob_typer.py", line 366, in main
    matchtype="loose_match",hr_obs_data = loadHostRangeDB())
  File "/Galaxy/deps/_conda/envs/__mob_suite@2.0.5/lib/python3.6/site-packages/mob_suite/mob_host_range.py", line 527, in getRefSeqHostRange
    convergance_rank, converged_taxonomy_name = getHostRangeRankCovergence(ref_taxids)
  File "/Galaxy/deps/_conda/envs/__mob_suite@2.0.5/lib/python3.6/site-packages/mob_suite/mob_host_range.py", line 369, in getHostRangeRankCovergence
    ncbi = NCBITaxa(dbfile=ETE3DBTAXAFILE)
  File "/Galaxy/deps/_conda/envs/__mob_suite@2.0.5/lib/python3.6/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 120, in __init__
    self.update_taxonomy_database(taxdump_file)
  File "/Galaxy/deps/_conda/envs/__mob_suite@2.0.5/lib/python3.6/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 129, in update_taxonomy_database
    update_db(self.dbfile)
  File "/Galaxy/deps/_conda/envs/__mob_suite@2.0.5/lib/python3.6/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 760, in update_db
    upload_data(dbfile)
  File "/Galaxy/deps/_conda/envs/__mob_suite@2.0.5/lib/python3.6/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 791, in upload_data
    db.execute(cmd)
sqlite3.OperationalError: database is locked

Our current workaround is to limit the number of concurrent mob_recon jobs to one across all users.

kbessonov1984 commented 4 years ago

Hello Philip, it looks like the ete3 library, which performs the taxonomy ID conversions and plasmid host-range predictions, decided to update its taxonomy definitions, and the simultaneous writes to the SQLite database caused a race condition. This error should not happen again until the next taxonomy database update. I think it is now safe to remove the limit of one mob_recon job at a time, given that SQLite supports concurrent read access. Would you be available to discuss this issue this week? For a future release, James has implemented a lock-file mechanism that forces other threads to wait until the lock file is removed by the master thread. This mechanism is already used during the initial runs of mob_suite that initialize other key databases for the first time.
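For anyone who wants to reproduce the failure mode outside Galaxy, here is a minimal sketch using only Python's standard `sqlite3` module. It is not the ete3 or MOB-suite code: the table name, the rows, and the function name `demo_locked_write` are made up for illustration. One connection holds the write lock while a second one tries to write with `timeout=0`, which should raise the same `OperationalError` seen in the traceback.

```python
import sqlite3


def demo_locked_write(db_path):
    """Return the error message raised when a second writer meets a held write lock.

    Hypothetical helper for illustration only; it does not use ete3.
    """
    # isolation_level=None gives autocommit mode, so transactions are explicit.
    # timeout=0 makes the second writer fail immediately instead of waiting.
    writer = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    other = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    try:
        writer.execute("CREATE TABLE IF NOT EXISTS species (taxid INTEGER, name TEXT)")
        writer.execute("BEGIN IMMEDIATE")  # first process takes the write lock
        writer.execute("INSERT INTO species VALUES (562, 'Escherichia coli')")
        try:
            # A second process attempts a write while the lock is still held.
            other.execute("INSERT INTO species VALUES (573, 'Klebsiella pneumoniae')")
        except sqlite3.OperationalError as err:
            return str(err)
        finally:
            writer.execute("COMMIT")
        return None
    finally:
        writer.close()
        other.close()


def demo_concurrent_reads(db_path):
    """Open two read transactions at the same time and return both row counts."""
    first = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    second = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    try:
        first.execute("BEGIN")
        count_first = first.execute("SELECT COUNT(*) FROM species").fetchone()[0]
        # The second reader is not blocked by the first reader's open transaction.
        count_second = second.execute("SELECT COUNT(*) FROM species").fetchone()[0]
        first.execute("COMMIT")
        return count_first, count_second
    finally:
        first.close()
        second.close()
```

Calling `demo_locked_write` on a scratch database file and then `demo_concurrent_reads` on the same file shows the asymmetry: the competing write fails, while the two simultaneous reads both succeed.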

Takadonet commented 4 years ago

Give a shout on internal chat.

jrober84 commented 4 years ago

MOB-suite 3.0.0 now incorporates a check at the beginning of MOB-typer and MOB-recon on the ETE3 database to see if it is current. A lock file is created, and other instances will wait until the lock file is deleted, up to a maximum of 10 minutes. There is some randomness in the checking to reduce multiple concurrent creations and deletions. This is a band-aid solution that should solve the issue for now, but I have raised an issue on the ETE3 GitHub about making the update optional, or locking the update, so that there aren't multiple processes trying to update the db at once.
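As a rough illustration of the lock-file idea described above, here is a minimal sketch in plain Python. It is not the MOB-suite 3.0.0 implementation: the name `lock_file` and the parameters `max_wait`, `base_delay` and `jitter` are assumptions for this example, with the 600-second default chosen only to mirror the 10-minute limit mentioned in the comment. Atomic creation via `os.O_CREAT | os.O_EXCL` is one common way to make sure only one process wins the lock.

```python
import os
import random
import time
from contextlib import contextmanager


@contextmanager
def lock_file(path, max_wait=600.0, base_delay=1.0, jitter=1.0):
    """Hold an exclusive lock file at `path`, waiting up to `max_wait` seconds.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    deadline = time.monotonic() + max_wait
    while True:
        try:
            # O_EXCL makes creation fail if the file already exists,
            # so exactly one process can create the lock.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError("gave up waiting for lock file " + path)
            # Randomized sleep so waiting processes do not all retry together.
            time.sleep(base_delay + random.uniform(0, jitter))
    # Record the owner's PID to help with debugging stale locks.
    with os.fdopen(fd, "w") as handle:
        handle.write(str(os.getpid()))
    try:
        yield
    finally:
        # Release the lock so waiting processes can proceed.
        os.remove(path)
```

A caller would wrap the database check or update in `with lock_file("/some/dir/taxa.lock"):` (path chosen by the caller). One known weakness of this kind of scheme is that a process killed while holding the lock leaves a stale file behind, which is part of why a fix inside ETE3 itself would be preferable.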