kbaseattic / assembly

An extensible framework for genome assembly.
MIT License

mongodb sort error #299

Closed: levinas closed this issue 9 years ago

levinas commented 9 years ago

The Jenkins test user has submitted enough jobs (~2000) to trigger this limit in MongoDB's sort:

        <h2>500 Internal Server Error</h2>
        <p>The server encountered an unexpected condition which prevented it from fulfilling the request.</p>
        <pre id="traceback">Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/cherrypy/_cprequest.py", line 670, in respond
    response.body = self.handler()
  File "/usr/local/lib/python2.7/dist-packages/cherrypy/lib/encoding.py", line 217, in __call__
    self.body = self.oldhandler(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/cherrypy/_cpdispatch.py", line 61, in __call__
    return self.callable(*self.args, **self.kwargs)
  File "/home/ubuntu/assembly/lib/assembly/router.py", line 504, in status
    docs = [sanitize_doc(d) for d in metadata.list_jobs(userid)]
  File "/home/ubuntu/assembly/lib/assembly/metadata.py", line 107, in list_jobs
    for j in jobs.find({'ARASTUSER':user}).sort('job_id', 1):
  File "/usr/local/lib/python2.7/dist-packages/pymongo/cursor.py", line 1058, in next
    if len(self.__data) or self._refresh():
  File "/usr/local/lib/python2.7/dist-packages/pymongo/cursor.py", line 1002, in _refresh
    self.__uuid_subtype))
  File "/usr/local/lib/python2.7/dist-packages/pymongo/cursor.py", line 940, in __send_message
    self.__compile_re)
  File "/usr/local/lib/python2.7/dist-packages/pymongo/helpers.py", line 109, in _unpack_response
    error_object)
OperationFailure: database error: too much data for sort() with no index.  add an index or specify a smaller limit

cbun commented 9 years ago

OperationFailure: database error: too much data for sort() with no index. add an index or specify a smaller limit

I think the current implementation does something dumb and queries all user jobs first and limits the records in Python space. We need to limit the query in the mongo call instead.
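
A minimal sketch of pushing the limit into the query, assuming `jobs` is a pymongo collection like the one in the traceback. The function name `list_recent_jobs` and the default `limit` are hypothetical and not part of the existing code. Sorting in descending order and then calling `limit()` asks the server to return only the newest records, which is the "specify a smaller limit" route the error message mentions.

```python
def list_recent_jobs(jobs, user, limit=50):
    """Return the user's newest jobs in ascending job_id order.

    `jobs` is assumed to be a pymongo collection; the function name
    and the default limit are hypothetical.
    """
    cursor = (
        jobs.find({'ARASTUSER': user})
        .sort('job_id', -1)   # newest first, so limit() keeps the latest jobs
        .limit(limit)         # applied by the server, not in Python
    )
    docs = list(cursor)
    docs.reverse()            # restore ascending job_id order for display
    return docs
```

Whether this alone avoids the error depends on the limit chosen, so it may still be worth combining with an index.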

cbun commented 9 years ago

Oh wait.. this is a sort problem. I guess mongo needs to sort before returning the latest jobs. We probably need to index.

sebhtml commented 9 years ago

ensureIndex !

The query is:

jobs.find({'ARASTUSER':user}).sort('job_id', 1)

For the find command:

db.jobs.ensureIndex({'ARASTUSER': 1});

For the sort command:

db.jobs.ensureIndex({'job_id': 1});

levinas commented 9 years ago

That does it? Could you create a patch?

In reply to Sébastien Boisvert's comment of Feb 23, 2015, 5:52 PM (https://github.com/kbase/assembly/issues/299#issuecomment-75655622).

sebhtml commented 9 years ago

These can be run from the mongo shell. But we should also add some code that creates these indexes in the controller daemon on startup, I suppose.
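
A sketch of what that startup step might look like, assuming the daemon already holds a pymongo collection for jobs. The function name `ensure_job_indexes` and its `compound` flag are hypothetical. `create_index` leaves an identical existing index in place, so calling this on every startup should be safe. The compound index is an addition that was not discussed in this thread: it is the usual way to serve an equality filter on one field plus a sort on another from a single index, and it can be switched off if the two single-field indexes turn out to be enough.

```python
ASCENDING = 1  # same value as pymongo.ASCENDING


def ensure_job_indexes(jobs, compound=True):
    """Create the indexes used by the job listing query.

    `jobs` is assumed to be a pymongo collection. Returns the index
    names reported by create_index().
    """
    created = [
        jobs.create_index([('ARASTUSER', ASCENDING)]),  # supports the find()
        jobs.create_index([('job_id', ASCENDING)]),     # supports the sort()
    ]
    if compound:
        # Assumption, not from the thread: one index covering both the
        # filter on ARASTUSER and the sort on job_id.
        created.append(
            jobs.create_index([('ARASTUSER', ASCENDING),
                               ('job_id', ASCENDING)])
        )
    return created
```

The daemon could call `ensure_job_indexes(jobs)` once after connecting to the database, before it starts serving requests.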