superphy / spfy

Spfy: an integrated graph database for real-time prediction of Escherichia coli phenotypes and downstream comparative analyses
https://lfz.corefacility.ca/superphy/grouch/
Apache License 2.0

Blazegraph upload tasks should be added to their own queue #247

Closed kevinkle closed 6 years ago

kevinkle commented 6 years ago

Right now, every call to datastruct_savvy() calls upload_graph() separately; with a large number of workers, this may be causing Blazegraph to hang when running in corefacility.

One way to solve this would be to merge a few of the current queues. At the moment:

  1. priority is currently used to run blazegraph queries for the frontend
  2. blazegraph is currently used to reserve spfyids for uploaded files
  3. multiples (for RGI) and singles (for ECTyper) can each invoke upload_graph(), which causes result graphs to be uploaded simultaneously.

There are a number of ways to combine these, but for now I'm going to try grouping just the uploads from 3. into their own queue. 2. is critical since all tasks depend on it, so we want to keep it separate. Ideally, by merging the uploads from 3. into one queue with only one worker on it, we can avoid overloading Blazegraph.
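
To illustrate why a single worker on a dedicated queue should prevent simultaneous uploads, here is a minimal sketch using only the standard library rather than rq. Everything in it (the in-process `upload_queue`, the placeholder `upload_graph()`, `analysis_worker()`, `run()`) is a hypothetical stand-in, not the backend's code: many producers hand off graphs, but only one consumer ever uploads.

```python
import queue
import threading
import time

upload_queue = queue.Queue()   # stand-in for the dedicated upload queue
_lock = threading.Lock()
_active = 0                    # uploads in progress right now
peak_uploads = 0               # highest number of overlapping uploads seen


def upload_graph(graph):
    """Placeholder upload that only tracks how many uploads overlap."""
    global _active, peak_uploads
    with _lock:
        _active += 1
        peak_uploads = max(peak_uploads, _active)
    time.sleep(0.01)  # pretend to wait on the database
    with _lock:
        _active -= 1


def upload_worker():
    """The single consumer: takes graphs off the queue one at a time."""
    while True:
        graph = upload_queue.get()
        if graph is None:  # sentinel: no more work
            break
        upload_graph(graph)


def analysis_worker(worker_id):
    """Stands in for an RGI/ECTyper task handing off its result graph."""
    upload_queue.put('graph-%d' % worker_id)


def run(n_analysis_workers=8):
    """Run many producers against one upload worker; return peak overlap."""
    consumer = threading.Thread(target=upload_worker)
    consumer.start()
    producers = [threading.Thread(target=analysis_worker, args=(i,))
                 for i in range(n_analysis_workers)]
    for p in producers:
        p.start()
    for p in producers:
        p.join()
    upload_queue.put(None)  # tell the upload worker to stop
    consumer.join()
    return peak_uploads
```

However many analysis workers run, `run()` should report at most one upload in flight, because only one consumer drains the queue. With rq, the analogue would presumably be a single worker process listening on the upload queue.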

A few approaches to do this:

  1. Create a new task for uploading. This would require modifying the routes to return the upload task, instead of the datastruct_savvy() task, as the end task. (Again, still waiting on multi-job dependencies: https://github.com/nvie/rq/pull/856)
  2. Create a decorator for uploading. This sidesteps route modification, but means that users can't see when their files are actually loaded into the database, though they will still get results.

I'm going to go with 2., as it will be fast to develop and lets us test this theory quickly; we can also use the decorators to eventually build full job classes.
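
A rough sketch of what the decorator approach could look like. This is not the code from the backend: `StandInQueue`, `submits_upload`, and the bodies of `upload_graph()` and `datastruct_savvy()` are hypothetical placeholders, and the stand-in queue only mimics the `enqueue(func, *args, **kwargs)` calling convention of an rq queue so the example runs without Redis.

```python
import functools


class StandInQueue(object):
    """Hypothetical stand-in for an rq queue: records jobs instead of running them."""

    def __init__(self, name):
        self.name = name
        self.jobs = []

    def enqueue(self, func, *args, **kwargs):
        # Takes the callable itself plus its arguments, not the result of calling it.
        self.jobs.append((func, args, kwargs))


upload_queue = StandInQueue('blazegraph_uploads')


def upload_graph(graph):
    """Placeholder for the real upload; returns a fake server response."""
    return '<data modified="%d"/>' % len(graph)


def submits_upload(q):
    """Decorator: after the task finishes, queue its graph for upload on q."""
    def decorator(task):
        @functools.wraps(task)
        def wrapper(*args, **kwargs):
            graph = task(*args, **kwargs)
            q.enqueue(upload_graph, graph)  # pass the function, then its argument
            return graph                    # the caller still gets the task's result
        return wrapper
    return decorator


@submits_upload(upload_queue)
def datastruct_savvy(results):
    """Placeholder: pretend the sorted results are the result graph."""
    return sorted(results)
```

Calling `datastruct_savvy(['b', 'a'])` returns the task's own result immediately, while the upload sits on the separate queue. That matches the trade-off described above: results come back, but the caller has no handle on when the upload completes.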

kevinkle commented 6 years ago

With https://github.com/superphy/backend/commit/8adb0648ca715844fc881c4074912fd3ad52fa47, it looks like the wrapper causes the enqueue call in spfy.py to try to enqueue the return value of the database upload. Making changes.


Failed jobs from blazegraph_uploads:

Job | Job ID | Failed
-- | -- | --
`<?xml version="1.0"?><data modified="11355" milliseconds="2930"/>()` | 24972297-151e-45d7-bccf-f6e33b744125 | 4 hours ago
`<?xml version="1.0"?><data modified="1493" milliseconds="1224"/>()` | 81ba31d9-fb7c-417d-8436-885e1fcd716d | 4 hours ago
`<?xml version="1.0"?><data modified="13915" milliseconds="3280"/>()` | e6219706-5d43-4bb4-917c-18fdc7ebe579 | 18 minutes ago
`<?xml version="1.0"?><data modified="1363" milliseconds="788"/>()` | 4232449e-ddc9-4596-8019-d2d9dd61109f | 15 minutes ago

Each failed with the same traceback:

    Traceback (most recent call last):
      File "/opt/conda/envs/backend/lib/python2.7/site-packages/rq/worker.py", line 700, in perform_job
        rv = job.perform()
      File "/opt/conda/envs/backend/lib/python2.7/site-packages/rq/job.py", line 500, in perform
        self._result = self.func(*self.args, **self.kwargs)
      File "/opt/conda/envs/backend/lib/python2.7/site-packages/rq/job.py", line 206, in func
        return import_attribute(self.func_name)
      File "/opt/conda/envs/backend/lib/python2.7/site-packages/rq/utils.py", line 150, in import_attribute
        module = importlib.import_module(module_name)
      File "/opt/conda/envs/backend/lib/python2.7/importlib/__init__.py", line 37, in import_module
        __import__(name)
    ImportError: No module named <?xml version="1

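
The ImportError above is consistent with the job's "function" being the upload's return value (a string) rather than a callable. A minimal sketch of that failure mode, assuming the job name is resolved by splitting it at the last dot and importing the left-hand part: `import_attribute` below is a simplified stand-in written for illustration, not rq's actual code, and `try_resolve` is a hypothetical helper.

```python
import importlib


def import_attribute(name):
    # Simplified stand-in: everything before the last dot is treated as the
    # module path, everything after it as the attribute to fetch.
    module_name, attribute = name.rsplit('.', 1)
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


# The XML string taken from the first failed job's name; presumably it is
# what the upload call returned.
upload_response = '<?xml version="1.0"?><data modified="11355" milliseconds="2930"/>'


def try_resolve(name):
    """Return the ImportError message raised while resolving name, or None."""
    try:
        import_attribute(name)
    except ImportError as err:
        return str(err)
    return None
```

The only dot in the XML string is the one in `1.0`, so the "module name" becomes `<?xml version="1`, which matches the truncated name in the traceback. The fix on the enqueue side is to pass the upload function and its arguments, not the result of calling it.
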
kevinkle commented 6 years ago

Working as of https://github.com/superphy/backend/commit/1a3a117dc8461a4654acdfa1e99e066299f02802

kevinkle commented 6 years ago

Merged in https://github.com/superphy/backend/pull/252