Open martijnvermaat opened 8 years ago
I just saw a few more occurrences of this. The batch entry that is being processed when the error occurs has already been removed from the queue, so the batch processor is able to resume with the next entry. But if the next few entries use the same reference (as happened in the situation above), the batch processor service is stopped by systemd because it crashes too many times in a short period.
Here's an example:
Traceback (most recent call last):
  File "/opt/mutalyzer/versions/35c35b8/virtualenv/local/lib/python2.7/site-packages/mutalyzer/Scheduler.py", line 440, in _processNameBatch
    variantchecker.check_variant(cmd, O)
  File "/opt/mutalyzer/versions/35c35b8/virtualenv/local/lib/python2.7/site-packages/mutalyzer/variantchecker.py", line 1743, in check_variant
    retrieved_record = retriever.loadrecord(record_id)
  File "/opt/mutalyzer/versions/35c35b8/virtualenv/local/lib/python2.7/site-packages/mutalyzer/Retriever.py", line 771, in loadrecord
    filename = self.fetch(identifier)
  File "/opt/mutalyzer/versions/35c35b8/virtualenv/local/lib/python2.7/site-packages/mutalyzer/Retriever.py", line 413, in fetch
    return self._update_db_md5(raw_data, name, gi)
  File "/opt/mutalyzer/versions/35c35b8/virtualenv/local/lib/python2.7/site-packages/mutalyzer/Retriever.py", line 156, in _update_db_md5
    {'checksum': md5sum})
  File "/opt/mutalyzer/versions/35c35b8/virtualenv/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 3005, in update
    update_op.exec_()
  File "/opt/mutalyzer/versions/35c35b8/virtualenv/local/lib/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 1112, in exec_
    self._do_exec()
  File "/opt/mutalyzer/versions/35c35b8/virtualenv/local/lib/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 1261, in _do_exec
    mapper=self.mapper)
  File "/opt/mutalyzer/versions/35c35b8/virtualenv/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 1034, in execute
    bind, close_with_result=True).execute(clause, params or {})
  File "/opt/mutalyzer/versions/35c35b8/virtualenv/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 914, in execute
    return meth(self, multiparams, params)
  File "/opt/mutalyzer/versions/35c35b8/virtualenv/local/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 323, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/opt/mutalyzer/versions/35c35b8/virtualenv/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1010, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/opt/mutalyzer/versions/35c35b8/virtualenv/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1146, in _execute_context
    context)
  File "/opt/mutalyzer/versions/35c35b8/virtualenv/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1341, in _handle_dbapi_exception
    exc_info
  File "/opt/mutalyzer/versions/35c35b8/virtualenv/local/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 199, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb)
  File "/opt/mutalyzer/versions/35c35b8/virtualenv/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
    context)
  File "/opt/mutalyzer/versions/35c35b8/virtualenv/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 450, in do_execute
    cursor.execute(statement, parameters)
IntegrityError: (psycopg2.IntegrityError) duplicate key value violates unique constraint "ix_references_checksum"
DETAIL: Key (checksum)=(fb263a5e992d38a549882889d14f5912) already exists.
[SQL: 'UPDATE "references" SET checksum=%(checksum)s WHERE "references".accession = %(accession_1)s'] [parameters: {'checksum': u'fb263a5e992d38a549882889d14f5912', 'accession_1': u'NM_000022.2'}]
Some large batch name checker jobs seem to trigger this error quite often over the last few days, with the same traceback as in the example above.
I guess these are NM references for which an old version was in the cache. The new version has since been uploaded manually (now a UD entry), so when Mutalyzer tries to update the NM to the new version, that checksum already exists.
Indeed, for this example the NM was originally added in 2011, while the UD was added in 2015. Same for the other example. The UD entries don't have a download url or slice info, so they were uploaded.
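To make the failure mode concrete, here is a minimal reproduction sketch. It is not Mutalyzer code: it uses an in-memory SQLite database instead of PostgreSQL and SQLAlchemy, the accession `UD_example` and the value `old-checksum` are made-up placeholders, and only the table name, index name, NM accession and new checksum are taken from the traceback.

```python
import sqlite3

NEW_CHECKSUM = "fb263a5e992d38a549882889d14f5912"  # checksum from the traceback

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "references" (accession TEXT PRIMARY KEY, checksum TEXT)')
# Same constraint name as in the error message.
conn.execute('CREATE UNIQUE INDEX ix_references_checksum ON "references" (checksum)')

# The old NM version is cached; the new version was uploaded by hand as a UD.
# "UD_example" and "old-checksum" are hypothetical placeholder values.
conn.execute('INSERT INTO "references" VALUES (?, ?)', ("NM_000022.2", "old-checksum"))
conn.execute('INSERT INTO "references" VALUES (?, ?)', ("UD_example", NEW_CHECKSUM))


def update_checksum(conn, accession, checksum):
    """Stand-in for the UPDATE statement issued from _update_db_md5."""
    conn.execute(
        'UPDATE "references" SET checksum = ? WHERE accession = ?',
        (checksum, accession),
    )


try:
    update_checksum(conn, "NM_000022.2", NEW_CHECKSUM)
    failed = False
except sqlite3.IntegrityError as exc:
    # The UD row already holds this checksum, so the unique index rejects it.
    failed = True
    print("update rejected:", exc)
```

The update is rejected and the NM row keeps its old checksum, which matches the behaviour described above.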
We should update the cache with the new reference that is being downloaded (NM in these examples), but we cannot throw away the other record with the same checksum (UD in these examples).
I think the easiest way to solve this is to drop the unique constraint on the checksum. There are two downsides to this:
@jfjlaros What do you think?
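As a rough illustration of what dropping the unique constraint would mean, the sketch below repeats the same scenario with a plain (non-unique) index. Again this is an assumption-laden stand-in using SQLite; `UD_example` and `old-checksum` are hypothetical values, and the real change would have to be made in Mutalyzer's own schema and migrations.

```python
import sqlite3

NEW_CHECKSUM = "fb263a5e992d38a549882889d14f5912"  # checksum from the traceback

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "references" (accession TEXT PRIMARY KEY, checksum TEXT)')
# Plain index: lookups by checksum are still indexed, but duplicates are allowed.
conn.execute('CREATE INDEX ix_references_checksum ON "references" (checksum)')

conn.executemany(
    'INSERT INTO "references" VALUES (?, ?)',
    [("NM_000022.2", "old-checksum"), ("UD_example", NEW_CHECKSUM)],
)

# The update that failed before now succeeds.
conn.execute(
    'UPDATE "references" SET checksum = ? WHERE accession = ?',
    (NEW_CHECKSUM, "NM_000022.2"),
)

# Both records are kept and now share the same checksum.
rows = conn.execute(
    'SELECT accession FROM "references" WHERE checksum = ? ORDER BY accession',
    (NEW_CHECKSUM,),
).fetchall()
print(rows)
```

One consequence visible in this sketch is that a lookup by checksum can return more than one row, so any code that relies on a checksum identifying a single record would need to be reviewed.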
GitHub doesn't properly understand English; this should not be closed yet.
Same issue for M61857.1, as reported in #378. There is already a UD in the database with the same checksum. Mutalyzer doesn't see this (as it first queries by accession) and tries to add this reference, causing an integrity error on checksum uniqueness.
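The lookup order described here can be sketched as follows. This is only one possible approach, not what Mutalyzer does: the function name `find_or_add`, the accession `UD_example` and the checksum strings are hypothetical, and SQLite stands in for the real database. Whether reusing the existing record is the right behaviour is a design question the thread leaves open.

```python
import sqlite3


def find_or_add(conn, accession, checksum):
    """Return (accession_of_stored_record, created).

    Looks up by accession first, then by checksum, and only inserts a new
    row when neither lookup finds a record.
    """
    row = conn.execute(
        'SELECT accession FROM "references" WHERE accession = ?', (accession,)
    ).fetchone()
    if row is not None:
        return row[0], False

    # Second lookup: the same file may already be stored under another name.
    row = conn.execute(
        'SELECT accession FROM "references" WHERE checksum = ?', (checksum,)
    ).fetchone()
    if row is not None:
        return row[0], False

    conn.execute('INSERT INTO "references" VALUES (?, ?)', (accession, checksum))
    return accession, True


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "references" (accession TEXT PRIMARY KEY, checksum TEXT)')
conn.execute('CREATE UNIQUE INDEX ix_references_checksum ON "references" (checksum)')
conn.execute('INSERT INTO "references" VALUES (?, ?)', ("UD_example", "abc"))

# Same content as the existing UD: no insert, no integrity error.
reused = find_or_add(conn, "M61857.1", "abc")
# Different content: a new row is created.
added = find_or_add(conn, "M61857.1", "def")
```

With the checksum lookup in place, the first call returns the existing UD record instead of attempting an insert that the unique index would reject.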
Just now, Mutalyzer on our main server tried to update the MD5 checksum for an NM reference. This failed because there was already a reference (UD) with the new checksum in the database (indeed, it was the same file, presumably uploaded earlier by hand).
The database error occurred in the _update_db_md5 method of the retriever module. Not sure what the best course of action would be in this case though.
In this case the result was a bit dramatic, since updating the MD5 checksum was triggered by a batch job and therefore the batch processor got stuck on this entry.