jacksund / simmate

The Simulated Materials Ecosystem (Simmate) is a toolbox and framework for computational materials research.
https://simmate.org
BSD 3-Clause "New" or "Revised" License

"database is locked" with just one worker #272

Closed scott-materials closed 2 years ago

scott-materials commented 2 years ago

Describe the bug

I was running a binary search with just one worker and still got a "database is locked" error. I'm not sure how or why that occurred.

To Reproduce

No response

Error

  File "/opt/ohpc/pub/apps/anaconda/anaconda3/envs/camryns_env/lib/python3.10/site-packages/django/db/utils.py", line 98, in inner
    return func(*args, **kwargs)
sqlite3.OperationalError: database is locked

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/ohpc/pub/apps/anaconda/anaconda3/envs/camryns_env/lib/python3.10/site-packages/simmate/workflow_engine/execution/worker.py", line 176, in start
    result = fxn(*args, **kwargs)
  File "/opt/ohpc/pub/apps/anaconda/anaconda3/envs/camryns_env/lib/python3.10/site-packages/simmate/workflow_engine/workflow.py", line 153, in _run_full
    result = cls.run_config(**kwargs_cleaned)
  File "/opt/ohpc/pub/apps/anaconda/anaconda3/envs/camryns_env/lib/python3.10/site-packages/simmate/calculators/vasp/workflows/relaxation/staged.py", line 86, in run_config
    state = current_task.run(
  File "/opt/ohpc/pub/apps/anaconda/anaconda3/envs/camryns_env/lib/python3.10/site-packages/simmate/workflow_engine/workflow.py", line 94, in run
    result = cls._run_full(**kwargs)
  File "/opt/ohpc/pub/apps/anaconda/anaconda3/envs/camryns_env/lib/python3.10/site-packages/simmate/workflow_engine/workflow.py", line 155, in _run_full
    result["calculation_id"] = cls._save_to_database(
  File "/opt/ohpc/pub/apps/anaconda/anaconda3/envs/camryns_env/lib/python3.10/site-packages/simmate/workflow_engine/workflow.py", line 340, in _save_to_database
    calculation.update_from_vasp_run(vasprun, corrections, directory)
  File "/opt/ohpc/pub/apps/anaconda/anaconda3/envs/camryns_env/lib/python3.10/site-packages/simmate/database/base_data_types/relaxation.py", line 250, in update_from_vasp_run
    structure.save()
  File "/opt/ohpc/pub/apps/anaconda/anaconda3/envs/camryns_env/lib/python3.10/site-packages/django/db/models/base.py", line 831, in save
    self.save_base(
  File "/opt/ohpc/pub/apps/anaconda/anaconda3/envs/camryns_env/lib/python3.10/site-packages/django/db/models/base.py", line 882, in save_base
    updated = self._save_table(
  File "/opt/ohpc/pub/apps/anaconda/anaconda3/envs/camryns_env/lib/python3.10/site-packages/django/db/models/base.py", line 1025, in _save_table
    results = self._do_insert(
  File "/opt/ohpc/pub/apps/anaconda/anaconda3/envs/camryns_env/lib/python3.10/site-packages/django/db/models/base.py", line 1066, in _do_insert
    return manager._insert(
  File "/opt/ohpc/pub/apps/anaconda/anaconda3/envs/camryns_env/lib/python3.10/site-packages/django/db/models/manager.py", line 85, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/opt/ohpc/pub/apps/anaconda/anaconda3/envs/camryns_env/lib/python3.10/site-packages/django/db/models/query.py", line 1790, in _insert
    return query.get_compiler(using=using).execute_sql(returning_fields)
  File "/opt/ohpc/pub/apps/anaconda/anaconda3/envs/camryns_env/lib/python3.10/site-packages/django/db/models/sql/compiler.py", line 1668, in execute_sql
    self.connection.ops.fetch_returned_insert_columns(
  File "/opt/ohpc/pub/apps/anaconda/anaconda3/envs/camryns_env/lib/python3.10/site-packages/django/db/backends/base/operations.py", line 212, in fetch_returned_insert_columns
    return cursor.fetchone()
  File "/opt/ohpc/pub/apps/anaconda/anaconda3/envs/camryns_env/lib/python3.10/site-packages/django/db/utils.py", line 97, in inner
    with self:
  File "/opt/ohpc/pub/apps/anaconda/anaconda3/envs/camryns_env/lib/python3.10/site-packages/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/opt/ohpc/pub/apps/anaconda/anaconda3/envs/camryns_env/lib/python3.10/site-packages/django/db/utils.py", line 98, in inner
    return func(*args, **kwargs)
django.db.utils.OperationalError: database is locked
2022-09-01 18:33:25 WARNING  Task failed with the error shown above.

                             If you are unfamilar with error tracebacks and find
                             this error difficult to read, you can learn more
                             about these errors here:
                              https://realpython.com/python-traceback/

                             Please open a new issue on our github page if you
                             believe this is a bug:
                              https://github.com/jacksund/simmate/issues/

╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /opt/ohpc/pub/apps/anaconda/anaconda3/envs/camryns_env/lib/python3.10/site-p │
│ ackages/django/db/backends/base/base.py:306 in _commit                       │
│                                                                              │
│   303 │   def _commit(self):                                                 │
│   304 │   │   if self.connection is not None:                                │
│   305 │   │   │   with self.wrap_database_errors:                            │
│ ❱ 306 │   │   │   │   return self.connection.commit()                        │
│   307 │                                                                      │
│   308 │   def _rollback(self):                                               │
│   309 │   │   if self.connection is not None:                                │
│                                                                              │
│ ╭───────────────────────── locals ─────────────────────────╮                 │
│ │ self = <DatabaseWrapper vendor='sqlite' alias='default'> │                 │
│ ╰──────────────────────────────────────────────────────────╯                 │
╰──────────────────────────────────────────────────────────────────────────────╯
OperationalError: database is locked

The above exception was the direct cause of the following exception:

╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /opt/ohpc/pub/apps/anaconda/anaconda3/envs/camryns_env/lib/python3.10/site-p │
│ ackages/simmate/command_line/workflow_engine.py:58 in start_worker           │
│                                                                              │
│    55 │   │   waittime_on_empty_queue,                                       │
│    56 │   │   tag,  # this is actually "tags" --> a list of strings          │
│    57 │   )                                                                  │
│ ❱  58 │   worker.start()                                                     │
│    59                                                                        │
│    60                                                                        │
│    61 @workflow_engine_app.command()                                         │
│                                                                              │
│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
│ │    close_on_empty_queue = False                                          │ │
│ │              nitems_max = None                                           │ │
│ │                     tag = ['simmate']                                    │ │
│ │                 timeout = None                                           │ │
│ │ waittime_on_empty_queue = 1.0                                            │ │
│ │                  Worker = <class                                         │ │
│ │                           'simmate.workflow_engine.execution.worker.Sim… │ │
│ │                  worker = <simmate.workflow_engine.execution.worker.Sim… │ │
│ │                           object at 0x7f39dbe54a60>                      │ │
│ ╰──────────────────────────────────────────────────────────────────────────╯ │
│                                                                              │
│ /opt/ohpc/pub/apps/anaconda/anaconda3/envs/camryns_env/lib/python3.10/site-p │
│ ackages/simmate/workflow_engine/execution/worker.py:249 in start             │
│                                                                              │
│   246 │   │   │   workitem = WorkItem.objects.select_for_update().get(pk=wor │
│   247 │   │   │                                                              │
│   248 │   │   │   # our lock exists only within this transation              │
│ ❱ 249 │   │   │   with transaction.atomic():                                 │
│   250 │   │   │   │   # pickle the result and update the workitem's result a │
│   251 │   │   │   │   # !!! should I have the pickle inside of a Try?        │
│   252 │   │   │   │   workitem.result_binary = result_pickled                │
│                                                                              │
│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
│ │                 args = ()                                                │ │
│ │ CommandNotFoundError = <class                                            │ │
│ │                        'simmate.workflow_engine.s3_workflow.CommandNotF… │ │
│ │                  fxn = <bound method Workflow._run_full of <class        │ │
│ │                        'simmate.calculators.vasp.workflows.relaxation.s… │ │
│ │               kwargs = {                                                 │ │
│ │                        │   'structure': {                                │ │
│ │                        │   │   '@module':                                │ │
│ │                        'simmate.toolkit.base_data_types.structure',      │ │
│ │                        │   │   '@class': 'Structure',                    │ │
│ │                        │   │   'charge': 0,                              │ │
│ │                        │   │   'lattice': {                              │ │
│ │                        │   │   │   'matrix': [                           │ │
│ │                        │   │   │   │   [-0.810135, 0.810135, 3.121612],  │ │
│ │                        │   │   │   │   [0.810135, -0.810135, 3.121612],  │ │
│ │                        │   │   │   │   [0.810135, 0.810135, -3.121612]   │ │
│ │                        │   │   │   ],                                    │ │
│ │                        │   │   │   'pbc': (True, True, True),            │ │
│ │                        │   │   │   'a': 3.325221633965772,               │ │
│ │                        │   │   │   'b': 3.325221633965772,               │ │
│ │                        │   │   │   'c': 3.325221633965772,               │ │
│ │                        │   │   │   'alpha': 151.79782926814372,          │ │
│ │                        │   │   │   'beta': 151.79782926814372,           │ │
│ │                        │   │   │   'gamma': 40.3086126383752,            │ │
│ │                        │   │   │   'volume': 8.195089546543116           │ │
│ │                        │   │   },                                        │ │
│ │                        │   │   'sites': [                                │ │
│ │                        │   │   │   {                                     │ │
│ │                        │   │   │   │   'species': [                      │ │
│ │                        │   │   │   │   │   {'element': 'C', 'occu': 1}   │ │
│ │                        │   │   │   │   ],                                │ │
│ │                        │   │   │   │   'abc': [0.0, 0.0, 0.0],           │ │
│ │                        │   │   │   │   'xyz': [0.0, 0.0, 0.0],           │ │
│ │                        │   │   │   │   'label': 'C',                     │ │
│ │                        │   │   │   │   'properties': {}                  │ │
│ │                        │   │   │   }                                     │ │
│ │                        │   │   ]                                         │ │
│ │                        │   },                                            │ │
│ │                        │   'command': 'mpirun -n 8 vasp_std > vasp.out', │ │
│ │                        │   'compress_output': True,                      │ │
│ │                        │   'run_id':                                     │ │
│ │                        '8bb6c76e-5a34-4ec6-a884-11d522102fb2'            │ │
│ │                        }                                                 │ │
│ │      ntasks_finished = 22                                                │ │
│ │               result = OperationalError('database is locked')            │ │
│ │       result_pickled = b'\x80\x05\x95\x10\x10\x00\x00\x00\x00\x00\x00\x… │ │
│ │                 self = <simmate.workflow_engine.execution.worker.Simmat… │ │
│ │                        object at 0x7f39dbe54a60>                         │ │
│ │           time_start = 1662055561.5877624                                │ │
│ │             workitem = <WorkItem: WorkItem object (23)>                  │ │
│ ╰──────────────────────────────────────────────────────────────────────────╯ │
│                                                                              │
│ /opt/ohpc/pub/apps/anaconda/anaconda3/envs/camryns_env/lib/python3.10/site-p │
│ ackages/django/db/transaction.py:262 in __exit__                             │
│                                                                              │
│   259 │   │   │   │   else:                                                  │
│   260 │   │   │   │   │   # Commit transaction                               │
│   261 │   │   │   │   │   try:                                               │
│ ❱ 262 │   │   │   │   │   │   connection.commit()                            │
│   263 │   │   │   │   │   except DatabaseError:                              │
│   264 │   │   │   │   │   │   try:                                           │
│   265 │   │   │   │   │   │   │   connection.rollback()                      │
│                                                                              │
│ ╭─────────────────────────────── locals ───────────────────────────────╮     │
│ │ connection = <DatabaseWrapper vendor='sqlite' alias='default'>       │     │
│ │   exc_type = None                                                    │     │
│ │  exc_value = None                                                    │     │
│ │       self = <django.db.transaction.Atomic object at 0x7f398dd68670> │     │
│ │  traceback = None                                                    │     │
│ ╰──────────────────────────────────────────────────────────────────────╯     │
│                                                                              │
│ /opt/ohpc/pub/apps/anaconda/anaconda3/envs/camryns_env/lib/python3.10/site-p │
│ ackages/django/utils/asyncio.py:26 in inner                                  │
│                                                                              │
│   23 │   │   │   │   if not os.environ.get("DJANGO_ALLOW_ASYNC_UNSAFE"):     │
│   24 │   │   │   │   │   raise SynchronousOnlyOperation(message)             │
│   25 │   │   │   # Pass onward.                                              │
│ ❱ 26 │   │   │   return func(*args, **kwargs)                                │
│   27 │   │                                                                   │
│   28 │   │   return inner                                                    │
│   29                                                                         │
│                                                                              │
│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
│ │    args = (<DatabaseWrapper vendor='sqlite' alias='default'>,)           │ │
│ │    func = <function BaseDatabaseWrapper.commit at 0x7f3999ba0ee0>        │ │
│ │  kwargs = {}                                                             │ │
│ │ message = 'You cannot call this from an async context - use a thread or  │ │
│ │           sync_to_async.'                                                │ │
│ ╰──────────────────────────────────────────────────────────────────────────╯ │
│                                                                              │
│ /opt/ohpc/pub/apps/anaconda/anaconda3/envs/camryns_env/lib/python3.10/site-p │
│ ackages/django/db/backends/base/base.py:330 in commit                        │
│                                                                              │
│   327 │   │   """Commit a transaction and reset the dirty flag."""           │
│   328 │   │   self.validate_thread_sharing()                                 │
│   329 │   │   self.validate_no_atomic_block()                                │
│ ❱ 330 │   │   self._commit()                                                 │
│   331 │   │   # A successful commit means that the database connection works │
│   332 │   │   self.errors_occurred = False                                   │
│   333 │   │   self.run_commit_hooks_on_set_autocommit_on = True              │
│                                                                              │
│ ╭───────────────────────── locals ─────────────────────────╮                 │
│ │ self = <DatabaseWrapper vendor='sqlite' alias='default'> │                 │
│ ╰──────────────────────────────────────────────────────────╯                 │
│                                                                              │
│ /opt/ohpc/pub/apps/anaconda/anaconda3/envs/camryns_env/lib/python3.10/site-p │
│ ackages/django/db/backends/base/base.py:305 in _commit                       │
│                                                                              │
│   302 │                                                                      │
│   303 │   def _commit(self):                                                 │
│   304 │   │   if self.connection is not None:                                │
│ ❱ 305 │   │   │   with self.wrap_database_errors:                            │
│   306 │   │   │   │   return self.connection.commit()                        │
│   307 │                                                                      │
│   308 │   def _rollback(self):                                               │
│                                                                              │
│ ╭───────────────────────── locals ─────────────────────────╮                 │
│ │ self = <DatabaseWrapper vendor='sqlite' alias='default'> │                 │
│ ╰──────────────────────────────────────────────────────────╯                 │
│                                                                              │
│ /opt/ohpc/pub/apps/anaconda/anaconda3/envs/camryns_env/lib/python3.10/site-p │
│ ackages/django/db/utils.py:91 in __exit__                                    │
│                                                                              │
│    88 │   │   │   │   # the connection unusable.                             │
│    89 │   │   │   │   if dj_exc_type not in (DataError, IntegrityError):     │
│    90 │   │   │   │   │   self.wrapper.errors_occurred = True                │
│ ❱  91 │   │   │   │   raise dj_exc_value.with_traceback(traceback) from exc_ │
│    92 │                                                                      │
│    93 │   def __call__(self, func):                                          │
│    94 │   │   # Note that we are intentionally not using @wraps here for per │
│                                                                              │
│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
│ │  db_exc_type = <class 'sqlite3.OperationalError'>                        │ │
│ │  dj_exc_type = <class 'django.db.utils.OperationalError'>                │ │
│ │ dj_exc_value = OperationalError('database is locked')                    │ │
│ │     exc_type = <class 'sqlite3.OperationalError'>                        │ │
│ │    exc_value = OperationalError('database is locked')                    │ │
│ │         self = <django.db.utils.DatabaseErrorWrapper object at           │ │
│ │                0x7f398db7c910>                                           │ │
│ │    traceback = <traceback object at 0x7f398de69680>                      │ │
│ ╰──────────────────────────────────────────────────────────────────────────╯ │
│                                                                              │
│ /opt/ohpc/pub/apps/anaconda/anaconda3/envs/camryns_env/lib/python3.10/site-p │
│ ackages/django/db/backends/base/base.py:306 in _commit                       │
│                                                                              │
│   303 │   def _commit(self):                                                 │
│   304 │   │   if self.connection is not None:                                │
│   305 │   │   │   with self.wrap_database_errors:                            │
│ ❱ 306 │   │   │   │   return self.connection.commit()                        │
│   307 │                                                                      │
│   308 │   def _rollback(self):                                               │
│   309 │   │   if self.connection is not None:                                │
│                                                                              │
│ ╭───────────────────────── locals ─────────────────────────╮                 │
│ │ self = <DatabaseWrapper vendor='sqlite' alias='default'> │                 │
│ ╰──────────────────────────────────────────────────────────╯                 │
╰──────────────────────────────────────────────────────────────────────────────╯
OperationalError: database is locked

Versions

Additional details

No response

jacksund commented 2 years ago

Just chatted with @scott-materials about this.

For others who land here with a "database is locked" error: you've exceeded the capabilities of SQLite, which is the default database backend. SQLite isn't built for parallel connections or for connections from multiple hosts (as on an HPC cluster).
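To see where the message comes from, here is a minimal sketch using only Python's built-in `sqlite3` module. It does not reproduce the Simmate run above; the table name, file name, and the `demo_locked_error` helper are made up for illustration. One connection holds a write lock while a second connection tries to write and gives up after its timeout:

```python
import os
import sqlite3
import tempfile


def demo_locked_error() -> str:
    """Return the error message a second writer sees while a first writer holds the lock."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "demo.sqlite3")  # hypothetical file name

        # isolation_level=None puts the connection in autocommit mode,
        # so the transaction below is controlled explicitly.
        holder = sqlite3.connect(path, isolation_level=None)
        holder.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")

        # BEGIN IMMEDIATE takes the write lock right away and keeps it
        # until the transaction is committed or rolled back.
        holder.execute("BEGIN IMMEDIATE")
        holder.execute("INSERT INTO items (name) VALUES ('first')")

        # The second connection waits at most 0.1 s for the lock to clear.
        waiter = sqlite3.connect(path, timeout=0.1, isolation_level=None)
        try:
            waiter.execute("INSERT INTO items (name) VALUES ('second')")
            message = "no error"
        except sqlite3.OperationalError as exc:
            message = str(exc)
        finally:
            waiter.close()
            holder.execute("ROLLBACK")
            holder.close()
    return message


if __name__ == "__main__":
    print(demo_locked_error())
```

SQLite allows only one writer at a time for the whole database file, so any second connection that writes during a long transaction can hit this, regardless of how many workers are running.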

Switch to a cloud database (e.g. Postgres) and this issue will go away.
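As a rough sketch of what that switch looks like at the Django level (the traceback above shows Simmate running on Django's database layer), the connection is described by a `DATABASES` settings dictionary. Simmate may expose its own way of supplying these settings, so check its documentation. Every value below, including the database name, user, host, and the `SIMMATE_DB_PASSWORD` environment variable, is a placeholder I've assumed for illustration:

```python
import os

# Generic Django-style settings for a Postgres backend.
# All names and hosts here are hypothetical placeholders.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "simmate_db",
        "USER": "simmate_user",
        # Read the password from the environment instead of hard-coding it.
        "PASSWORD": os.environ.get("SIMMATE_DB_PASSWORD", ""),
        "HOST": "db.example.com",
        "PORT": "5432",
    }
}

# Stopgap if you have to stay on SQLite for now: Django passes OPTIONS
# through to sqlite3.connect, so a longer timeout makes a blocked writer
# wait (in seconds) before raising "database is locked".
SQLITE_STOPGAP = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": "database.sqlite3",
        "OPTIONS": {"timeout": 30},
    }
}
```

The longer timeout only reduces how often the error appears; it does not make SQLite suitable for many concurrent writers or for access from multiple hosts.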