sot / kadi

Chandra commands and events
https://sot.github.io/kadi
BSD 3-Clause "New" or "Revised" License
5 stars 3 forks source link

Work around issues with sqlite3 database being locked during updates #108

Closed taldcroft closed 6 years ago

taldcroft commented 6 years ago

This is a workaround for problems like:

2017-07-19 10:01:09,238 Updating Scs107 events from 2017:178:14:01:05 to 2017:200:14:01:04
2017-07-19 10:01:19,455 Deleting 34 LoadSegment events after 2017:180:14:01:04.585
Traceback (most recent call last):
  File "/proj/sot/ska/share/kadi/update_events", line 12, in <module>
    update_events.main()
  File "/proj/sot/ska/arch/x86_64-linux_CentOS-6/lib/python2.7/site-packages/kadi/update_events.py", line 221, in main
    update(EventModel, date_stop)
  File "/proj/sot/ska/arch/x86_64-linux_CentOS-6/lib/python2.7/site-packages/kadi/update_events.py", line 100, in update
    delete_from_date(EventModel, delete_date, set_update_date=False)
  File "/proj/sot/ska/arch/x86_64-linux_CentOS-6/lib/python2.7/site-packages/kadi/update_events.py", line 72, in delete_from_date
    events.delete()
  File "/proj/sot/ska/arch/x86_64-linux_CentOS-6/lib/python2.7/site-packages/django/db/models/query.py", line 465, in delete
    collector.delete()
  File "/proj/sot/ska/arch/x86_64-linux_CentOS-6/lib/python2.7/site-packages/django/db/models/deletion.py", line 282, in delete
    sender=model, instance=obj, using=self.using
  File "/proj/sot/ska/arch/x86_64-linux_CentOS-6/lib/python2.7/site-packages/django/db/transaction.py", line 305, in __exit__
    connection.commit()
  File "/proj/sot/ska/arch/x86_64-linux_CentOS-6/lib/python2.7/site-packages/django/db/backends/__init__.py", line 168, in commit
    self._commit()
  File "/proj/sot/ska/arch/x86_64-linux_CentOS-6/lib/python2.7/site-packages/django/db/backends/__init__.py", line 136, in _commit
    return self.connection.commit()
  File "/proj/sot/ska/arch/x86_64-linux_CentOS-6/lib/python2.7/site-packages/django/db/utils.py", line 99, in __exit__
    six.reraise(dj_exc_type, dj_exc_value, traceback)
  File "/proj/sot/ska/arch/x86_64-linux_CentOS-6/lib/python2.7/site-packages/django/db/backends/__init__.py", line 136, in _commit
    return self.connection.commit()
django.db.utils.OperationalError: database is locked

This PR has been tested by running the ska_testr/packages/kadi/test_regression_long.sh test, once via ska_testr in Ska flight and once by hand in the repo for this PR. Outputs were diffed and no diffs were seen.

I have been unable to reproduce the databased locked problem. I tried locally starting the process to update the database over a 30 day period, then in a separate window repeatedly query the events database. No problem, no joy. So most of this code is not tested and just needs to be looked at carefully.

Plan B was to make a local copy of the events database, update that, and then copy back. But even then it would be better to not have the whole update process stop cold. My feeling is to try this and see if it works. We'll still see warnings pop up if the database was locked.

jeanconn commented 6 years ago

Ugh. I thought I could reliably lock the database with my guide stats processing; I'll try that again today and see if I can lock a copy.

This PR seems a fine way to at least try again if for some reason the database was locked by reasonable small query or netapp weirdness.

taldcroft commented 6 years ago

Maybe try the instructions here: https://stackoverflow.com/questions/14272715/how-can-i-lock-a-sqlite-database

jeanconn commented 6 years ago

I think I looked at that when we started using sqlite but ran into issues with pragma support by sqlite version. Will look again.

jeanconn commented 6 years ago

On a locked local copy of events.db via

PRAGMA locking_mode = EXCLUSIVE;
BEGIN EXCLUSIVE;

it looks like I'm getting "django.db.utils.OperationalError: database is locked" without seeing any of the

  logger.info('Warning: locked database, waiting {} seconds'.format(delay))

I'd expect from this PR.

taldcroft commented 6 years ago

Grrrr. Will investigate.

jeanconn commented 6 years ago

Not clear to me if the exclusive lock via pragma really replicates what we've hit upon by accident in processing though (if the exclusive lock is also a read lock and maybe the lock we're getting is only a write lock?).

taldcroft commented 6 years ago

From what you said it is raising what appears to be the same OperationalError without it being caught and handled, or else the database is locked is not matching. If you are working on this, can you try putting in some print statements to see what is happening?

jeanconn commented 6 years ago

From the traceback, it looks like it is throwing the error at

https://github.com/sot/kadi/blob/3de2e19b0b0ac48c28f88a61a5c354804adc1ea9/kadi/update_events.py#L119

when trying to do that get, not when trying to save.

jeanconn commented 6 years ago

From looking through our kadi OperationalError cron task emails, it doesn't look to me like that has happened for us in practice (the errors we see are on save or delete), so I think the exclusive pragma lock might just be a bad test for this.

taldcroft commented 6 years ago

Ah, so now I understand your comment about read vs write. Agreed that the exclusive lock is no good, though you might wrap that particular get in a try4times and see what it does. (Noting that try4imes does not return the func result, which should actually be changed).