superphy / spfy

Spfy: an integrated graph database for real-time prediction of Escherichia coli phenotypes and downstream comparative analyses
https://lfz.corefacility.ca/superphy/grouch/
Apache License 2.0

error when headers are `>NODE_2_length_339420_cov_157.209` #225

Closed kevinkle closed 6 years ago

kevinkle commented 6 years ago

https://github.com/superphy/backend/issues/187#issuecomment-330306689 is incorrect; this is an error with spfy in some way.

At https://github.com/superphy/backend/commit/e61a0d498e738a6454637d457c3569c64712145c, running `python -m modules/qc/qc -i tests/headers/ESC_AA7855AA_AS-error.fasta` from `/opt/backend/app` (backend env, ubuntu@host-10-1-5-81) returns True, so graphing and SeqIO should be working OK for this header name.

Adding a sample prefix, so that the header becomes `>sample1|NODE_2_length_339420_cov_157.209`, should fix this.
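
For reference, a minimal sketch of that workaround in Python 3. The function name `prefix_fasta_headers` and the `sample1` ID are only illustrative; this is not code from the spfy repo, and it assumes the input is plain FASTA text read line by line.

```python
def prefix_fasta_headers(lines, sample_id):
    """Yield FASTA lines with `<sample_id>|` prepended to every header line.

    Hypothetical helper for illustration; not part of spfy.
    """
    prefix = sample_id + "|"
    for line in lines:
        # Header lines start with ">"; skip headers that already carry the prefix
        # so that running the helper twice does not double-prefix them.
        if line.startswith(">") and not line[1:].startswith(prefix):
            yield ">" + prefix + line[1:]
        else:
            # Sequence lines (and already-prefixed headers) pass through unchanged.
            yield line
```

It could be applied to a file with something like `out.writelines(prefix_fasta_headers(open("in.fasta"), "sample1"))`, where both file names are placeholders.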

chadlaing [1:32 PM] 
hey @kevin, I received two test files from our collaborator and tried one of them on `spfy`. Initially it said QC passed, but when I checked later the job had failed and it said QC failed. Id is `Job failed. Key: 81de98df-938b-4628-b0c3-5fb6559f4f15 /`

kevin [1:58 PM] 
https://lfz.corefacility.ca/superphy/spfy/results/81de98df-938b-4628-b0c3-5fb6559f4f15

[1:59] 
Looks like this is linked to our blazegraph issue

chadlaing [2:00 PM] 
ok, thanks

[2:01] 
is that waiting for e.g. the AMR to be computed, or simply a query of the database?

kevin [2:03 PM] 
It's trying to look up the spfyid for an uploaded file and the db isn't responding

chadlaing [2:07 PM] 
ok

[2:08] 
three minutes with no response? very strange

chadlaing [2:16 PM] 
do you have any idea why it would consistently fail with one file but not another?

chadlaing [2:24 PM] 
anecdotally, if I take the failed file that has headers like so: `>NODE_2_length_339420_cov_157.209` and edit them so they all share a common name like `>sample1|NODE_2_length_339420_cov_157.209` then everything works as expected, and quickly to boot

kevin [2:25 PM] 
Hmm

[2:27] 
That would suggest another cause of this error

[2:29] 
I'll try adding some parsing checks to the qc module tonight and run the same setup

chadlaing [2:29 PM] 
ok -- in the meantime I can suggest that she add an ID to her inputs

kevinkle commented 6 years ago

I wonder if this is related to the command results = sparql.query().convert()
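
If the lookup is what hangs, one option would be to give the query its own short timeout so it fails fast instead of sitting until the worker's job timeout. Below is a rough Python 3 sketch using only the standard library rather than SPARQLWrapper; `run_sparql_select`, the 10 second default, and the injectable `opener` argument are all assumptions for illustration, not what `reserve_id.py` does.

```python
import json
import urllib.parse
import urllib.request


def run_sparql_select(endpoint, query, timeout=10, opener=urllib.request.urlopen):
    """POST a SPARQL query and return the parsed JSON results.

    Hypothetical helper: `timeout` (seconds) applies to blocking socket
    operations, so an unresponsive endpoint raises an error quickly
    instead of blocking the caller indefinitely.
    """
    # SPARQL 1.1 Protocol: a query can be sent as a form-encoded `query` parameter.
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    request = urllib.request.Request(
        endpoint,
        data=data,
        headers={"Accept": "application/sparql-results+json"},
    )
    # `opener` defaults to urlopen; it is a parameter only so it can be stubbed.
    with opener(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

The caller would still need to catch the resulting timeout or `URLError` and decide whether to retry or mark the job as failed.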

kevinkle commented 6 years ago

Unable to replicate this issue locally using files in https://github.com/superphy/backend/tree/fix-header-parsing/app/tests/headers

kevinkle commented 6 years ago

Read the comment wrong...

kevinkle commented 6 years ago

on corefacility:

ESC_AA7855AA_AS-works.fasta with pi: 90 for Serotype VF
Submitted: 12:04:36 PM, Status: FAILED
ERROR WITH JOB: blob345709520386712084
Traceback (most recent call last):
  File "/opt/conda/envs/backend/lib/python2.7/site-packages/rq/worker.py", line 700, in perform_job
    rv = job.perform()
  File "/opt/conda/envs/backend/lib/python2.7/site-packages/rq/job.py", line 500, in perform
    self._result = self.func(*self.args, **self.kwargs)
  File "./modules/blazeUploader/reserve_id.py", line 138, in write_reserve_id
    spfyid = reserve_id(query_file)
  File "./modules/blazeUploader/reserve_id.py", line 121, in reserve_id
    largest = check_largest_spfyid()
  File "./modules/blazeUploader/reserve_id.py", line 62, in check_largest_spfyid
    results = sparql.query().convert()
  File "/opt/conda/envs/backend/lib/python2.7/site-packages/SPARQLWrapper/Wrapper.py", line 567, in query
    return QueryResult(self._query())
  File "/opt/conda/envs/backend/lib/python2.7/site-packages/SPARQLWrapper/Wrapper.py", line 537, in _query
    response = urlopener(request)
  File "/opt/conda/envs/backend/lib/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/opt/conda/envs/backend/lib/python2.7/urllib2.py", line 429, in open
    response = self._open(req, data)
  File "/opt/conda/envs/backend/lib/python2.7/urllib2.py", line 447, in _open
    '_open', req)
  File "/opt/conda/envs/backend/lib/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/opt/conda/envs/backend/lib/python2.7/urllib2.py", line 1228, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/opt/conda/envs/backend/lib/python2.7/urllib2.py", line 1201, in do_open
    r = h.getresponse(buffering=True)
  File "/opt/conda/envs/backend/lib/python2.7/site-packages/raven/breadcrumbs.py", line 346, in getresponse
    rv = real_getresponse(self, *args, **kwargs)
  File "/opt/conda/envs/backend/lib/python2.7/httplib.py", line 1121, in getresponse
    response.begin()
  File "/opt/conda/envs/backend/lib/python2.7/httplib.py", line 438, in begin
    version, status, reason = self._read_status()
  File "/opt/conda/envs/backend/lib/python2.7/httplib.py", line 394, in _read_status
    line = self.fp.readline(_MAXLINE + 1)
  File "/opt/conda/envs/backend/lib/python2.7/socket.py", line 480, in readline
    data = self._sock.recv(self._rbufsize)
  File "/opt/conda/envs/backend/lib/python2.7/site-packages/rq/timeouts.py", line 51, in handle_death_penalty
    'value ({0} seconds)'.format(self._timeout))
JobTimeoutException: Job exceeded maximum timeout value (180 seconds)

ESC_AA7855AA_AS-error.fasta with pi: 90 for Serotype VF
Submitted: 12:04:50 PM, Status: COMPLETE
JobId: blob2384235990117585805

locally:

ESC_AA7855AA_AS-works.fasta with pi: 90 for Serotype VF
Submitted: 12:05:09 PM, Status: COMPLETE
JobId: blob4062748484315444847

ESC_AA7855AA_AS-error.fasta with pi: 90 for Serotype VF
Submitted: 12:05:16 PM, Status: COMPLETE
JobId: blob6882864231611697523

kevinkle commented 6 years ago

Able to replicate on corefacility:
https://github.com/superphy/backend/blob/fix-header-parsing/app/tests/headers/sample1_edit.fasta (works)
https://github.com/superphy/backend/blob/fix-header-parsing/app/tests/headers/sample4.fasta (doesn't work)

However, both files work locally.

kevinkle commented 6 years ago

corefacility is running on f4befb9

kevinkle commented 6 years ago

https://github.com/superphy/backend/compare/f4befb9...master I can't see anything there that would differ from master in a way that matters here.

kevinkle commented 6 years ago

On Cybera, both files work as well when running master. Need to test whether the issue is related to https://github.com/superphy/backend/commit/f4befb94d1b09b0f20bd56883c29951fa3894b4d or to corefacility itself.

kevinkle commented 6 years ago

Unable to replicate error with https://github.com/superphy/backend/commit/f4befb94d1b09b0f20bd56883c29951fa3894b4d on Cybera.

kevinkle commented 6 years ago

Looks like this wasn't related to Spfy. We have temporarily moved corefacility's backend to Cybera, which bypasses the blazegraph issue. Will follow https://github.com/superphy/backend/issues/247 to see if this fixes that issue. Closing issue.