superphy / spfy

Spfy: an integrated graph database for real-time prediction of Escherichia coli phenotypes and downstream comparative analyses
https://lfz.corefacility.ca/superphy/grouch/
Apache License 2.0

script to load files into subtyping jobs w/o reactapp #197

Closed kevinkle closed 7 years ago

kevinkle commented 7 years ago

This would require mounting the folder containing all the genome files to /datastore in the docker containers.

kevinkle commented 7 years ago

There are 2 ways to connect to the Redis DB, which is required to enqueue jobs:

  1. Run the script within a docker container
  2. Run the script on the host, but map the redis port to the host and account for this in the script

Note: both still require the folder of genome files to be mounted to the docker containers.

I'm going to go with option 1, as users won't have to install the conda env in order to run the script.
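As a rough sketch of what such a script might do (not the actual implementation), the code below walks a mounted folder and builds one argument dict per genome file. The dict shape mirrors the job arguments that appear in a traceback later in this thread; the function names `build_job_args` and `enqueue_all`, the extension list, and the idea of passing the enqueue function in as a parameter are all assumptions made for illustration.

```python
import os

# Assumption: which file extensions count as genome files.
GENOME_EXTENSIONS = ('.fna', '.fasta')


def build_job_args(folder, pi=90):
    """Return one argument dict per genome file found directly in `folder`."""
    jobs = []
    for name in sorted(os.listdir(folder)):
        if not name.endswith(GENOME_EXTENSIONS):
            continue  # skip anything that does not look like a genome file
        jobs.append({
            'i': os.path.join(folder, name),
            'pi': pi,
            # Option values copied from the job arguments seen in this thread.
            'options': {'bulk': True, 'pi': pi, 'amr': True,
                        'serotype': True, 'vf': True},
        })
    return jobs


def enqueue_all(folder, enqueue, task):
    """Call `enqueue(task, args)` for every genome; return how many were enqueued."""
    jobs = build_job_args(folder)
    for args in jobs:
        enqueue(task, args)
    return len(jobs)
```

Inside the container, `enqueue` could be the `enqueue` method of an RQ `Queue` connected to the Redis service, with `task` being the ECTyper call. Keeping the enqueue function as a parameter means the file discovery can be tried out without a running Redis.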

kevinkle commented 7 years ago

As of https://github.com/superphy/backend/commit/5005be089aaee59293871dc29f1c4c113f97960a the script enqueues jobs, but I still need to write a function for handling filenames: the pipeline expects timestamps on the files, and these are not added because the script bypasses Flask.
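A minimal sketch of such a filename helper, assuming the timestamp is simply prefixed to the file's basename. The helper name `timestamped_name` and the timestamp layout are hypothetical; the real format would have to match whatever the Flask upload path produces.

```python
import os
from datetime import datetime


def timestamped_name(path, now=None):
    """Return `path` with a timestamp prefixed to its basename.

    The '%Y-%m-%d-%H-%M-%S-%f' layout is an assumption for illustration only.
    """
    now = now or datetime.now()
    folder, name = os.path.split(path)
    stamp = now.strftime('%Y-%m-%d-%H-%M-%S-%f')
    return os.path.join(folder, stamp + '-' + name)
```

Passing `now` explicitly keeps the helper deterministic, which makes it easier to check against names produced by the web upload route.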

Note: https://github.com/superphy/backend/tree/rc-5.0.0 uses uniquely named folders, which makes this error obsolete.

kevinkle commented 7 years ago

ECTyper is failing for 3 of the 15 test files.


Failed jobs from the singles queue (each failed 4 minutes ago):

Job 67d784ad-2fd1-445a-bb05-fe4675d1410e
modules.ectyper.call_ectyper.call_ectyper({'i': '/datastore/GCA_900015735.1_ED178_contigs_genomic.fna', 'pi': 90, 'options': {'bulk': True, 'pi': 90, 'amr': True, 'serotype': True, 'vf': True}})

Job 7da9a045-1085-4981-9f23-35d05f45f7fd
modules.ectyper.call_ectyper.call_ectyper({'i': '/datastore/GCA_900016125.1_EF467_contigs_genomic.fna', 'pi': 90, 'options': {'bulk': True, 'pi': 90, 'amr': True, 'serotype': True, 'vf': True}})

Job e7bdb83c-edb6-4546-b6b9-f5e258fe5747
modules.ectyper.call_ectyper.call_ectyper({'i': '/datastore/GCA_001911825.1_ASM191182v1_genomic.fna', 'pi': 90, 'options': {'bulk': True, 'pi': 90, 'amr': True, 'serotype': True, 'vf': True}})

All three jobs failed with the same traceback:

Traceback (most recent call last):
  File "/opt/conda/envs/backend/lib/python2.7/site-packages/rq/worker.py", line 700, in perform_job
    rv = job.perform()
  File "/opt/conda/envs/backend/lib/python2.7/site-packages/rq/job.py", line 500, in perform
    self._result = self.func(*self.args, **self.kwargs)
  File "./modules/ectyper/call_ectyper.py", line 39, in call_ectyper
    '-pi', str(args_dict['pi'])
  File "/opt/conda/envs/backend/lib/python2.7/subprocess.py", line 219, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
CalledProcessError: Command '['/app/modules/ectyper/ecoli_serotyping/src/Tools_Controller/tools_controller.py', '-in', '/app/modules/ectyper/temp.fna', '-s', '1', '-vf', '1', '-pi', '90']' returned non-zero exit status 1

kevinkle commented 7 years ago

Looks like https://github.com/superphy/backend/issues/197#issuecomment-321669500 is a reproducible error specific to those 3 files.

kevinkle commented 7 years ago

The errors appear to be linked to the latest shortnames update.

Works for https://github.com/phac-nml/ecoli_serotyping/commit/bd29b9c8703f13d96be9000b489651807e740622

But not for https://github.com/phac-nml/ecoli_serotyping/commit/409dd8917eaa177a480162adbeb2812bb5948068

https://github.com/phac-nml/ecoli_serotyping/compare/bd29b9c8703f13d96be9000b489651807e740622...409dd8917eaa177a480162adbeb2812bb5948068 shows that the only thing changed was the shortnames file.

kevinkle commented 7 years ago

Testing ECTyper at https://github.com/phac-nml/ecoli_serotyping/commit/409dd8917eaa177a480162adbeb2812bb5948068 directly:

(backend) kevin@scatter:~/Desktop/ecoli_serotyping$ ./src/Tools_Controller/tools_controller.py -in ~/Desktop/15-genomes/GCA_900015735.1_ED178_contigs_genomic.fna -s 1 -vf 1 -pi 90
Traceback (most recent call last):
  File "/home/kevin/Desktop/ecoli_serotyping/src/Tools_Controller/../Virulence_Factors/virulencefactors.py", line 219, in <module>
    parseFile(results_file, args.percentLength, args.percentIdentity)
  File "/home/kevin/Desktop/ecoli_serotyping/src/Tools_Controller/../Virulence_Factors/virulencefactors.py", line 154, in parseFile
    match = match.group()
AttributeError: 'NoneType' object has no attribute 'group'
Traceback (most recent call last):
  File "./src/Tools_Controller/tools_controller.py", line 156, in <module>
    "-pi", str(args.percentIdentity), '-csv', str(args.csv) , '-min', str(args.mingenomes)])
  File "/home/kevin/miniconda2/envs/backend/lib/python2.7/subprocess.py", line 219, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '['/home/kevin/Desktop/ecoli_serotyping/src/Tools_Controller/../Virulence_Factors/virulencefactors.py', '--input', '/home/kevin/Desktop/15-genomes/GCA_900015735.1_ED178_contigs_genomic.fna', '-pl', '90', '-pi', '90', '-csv', '1', '-min', '1']' returned non-zero exit status 1
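
The AttributeError in the traceback above is what Python raises when a regex search finds nothing and the code calls .group() on the resulting None. A minimal sketch of guarding against that case follows; the pattern and the function name `extract_gene_name` are hypothetical, since the actual expression used in virulencefactors.py is not shown in this thread.

```python
import re

# Hypothetical pattern: the real expression in virulencefactors.py is not shown here.
GENE_PATTERN = re.compile(r'\(([^)]+)\)')


def extract_gene_name(header):
    """Return the first parenthesised token in `header`, or None if there is none."""
    match = GENE_PATTERN.search(header)
    if match is None:
        # re.search returns None when nothing matches; calling .group() on it
        # would raise the AttributeError seen in the traceback.
        return None
    return match.group(1)
```

With a guard like this, a header that does not fit the expected layout can be skipped or reported by the caller instead of aborting the whole run.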

kevinkle commented 7 years ago

At the old commit:

(backend) kevin@scatter:~/Desktop/ecoli_serotyping$ git checkout bd29b9c8703f13d96be9000b489651807e740622
Note: checking out 'bd29b9c8703f13d96be9000b489651807e740622'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

HEAD is now at bd29b9c... made change to all makeblastdb calls
(backend) kevin@scatter:~/Desktop/ecoli_serotyping$ ./src/Tools_Controller/tools_controller.py -in ~/Desktop/15-genomes/GCA_900015735.1_ED178_contigs_genomic.fna -s 1 -vf 1 -pi 90
Traceback (most recent call last):
  File "/home/kevin/Desktop/ecoli_serotyping/src/Tools_Controller/../Virulence_Factors/virulencefactors.py", line 219, in <module>
    parseFile(results_file, args.percentLength, args.percentIdentity)
  File "/home/kevin/Desktop/ecoli_serotyping/src/Tools_Controller/../Virulence_Factors/virulencefactors.py", line 154, in parseFile
    match = match.group()
AttributeError: 'NoneType' object has no attribute 'group'
Traceback (most recent call last):
  File "./src/Tools_Controller/tools_controller.py", line 156, in <module>
    "-pi", str(args.percentIdentity), '-csv', str(args.csv) , '-min', str(args.mingenomes)])
  File "/home/kevin/miniconda2/envs/backend/lib/python2.7/subprocess.py", line 219, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '['/home/kevin/Desktop/ecoli_serotyping/src/Tools_Controller/../Virulence_Factors/virulencefactors.py', '--input', '/home/kevin/Desktop/15-genomes/GCA_900015735.1_ED178_contigs_genomic.fna', '-pl', '90', '-pi', '90', '-csv', '1', '-min', '1']' returned non-zero exit status 1
(backend) kevin@scatter:~/Desktop/ecoli_serotyping$ git status
HEAD detached at bd29b9c
Untracked files:
  (use "git add <file>..." to include in what will be committed)

    temp/

nothing added to commit but untracked files present (use "git add" to track)
(backend) kevin@scatter:~/Desktop/ecoli_serotyping$ rm -rf temp/
(backend) kevin@scatter:~/Desktop/ecoli_serotyping$ ./src/Tools_Controller/tools_controller.py -in ~/Desktop/15-genomes/GCA_900015735.1_ED178_contigs_genomic.fna -s 1 -vf 1 -pi 90
{'GCA_900015735.1_ED178_contigs_genomic': {'Virulence Factors': {'FAWB01000015.1': [{'START': 1, 'STOP': 1581, 'ORIENTATION': '-', 'GENE_NAME': 'CAC39286'}, {'START': 1, 'STOP': 1581, 'ORIENTATION': '-', 'GENE_NAME': 'espI'}, {'START': 3210, 'STOP': 3475, 'ORIENTATION': '-', 'GENE_NAME': 'epeA'}], 'FAWB01000059.1': [{'START': 1281, 'STOP': 1401, 'ORIENTATION': '-', 'GENE_NAME': 'entD'}], 'FAWB01000060.1': [{'START': 56, 'STOP': 853, 'ORIENTATION': '-', 'GENE_NAME': 'ehaB'}, {'START': 764, 'STOP': 853, 'ORIENTATION': '-', 'GENE_NAME': 'ehaB'}, {'START': 755, 'STOP': 853, 'ORIENTATION': '-', 'GENE_NAME': 'upaC'}, {'START': 755, 'STOP': 853, 'ORIENTATION': '-', 'GENE_NAME': 'ehaB'}, {'START': 773, 'STOP': 853, 'ORIENTATION': '-', 'GENE_NAME': 'ehaB'}, {'START': 1, 'STOP': 55, 'ORIENTATION': '-', 'GENE_NAME': 'fimD'}, {'START': 12243, 'STOP': 12277, 'ORIENTATION': '+', 'GENE_NAME': 'entD'}], 'FAWB01000046.1': [{'START': 2956, 'STOP': 4517, 'ORIENTATION': '-', 'GENE_NAME': 'malx'}, {'START': 21029, 'STOP': 21592, 'ORIENTATION': '-', 'GENE_NAME': 'ppdb'}, {'START': 21583, 'STOP': 22053, 'ORIENTATION': '-', 'GENE_NAME': 'ppda'}, {'START': 20625, 'STOP': 21032, 'ORIENTATION': '-', 'GENE_NAME': 'ygdb'}, {'START': 20317, 'STOP': 20640, 'ORIENTATION': '-', 'GENE_NAME': 'ppdc'}], 'FAWB01000098.1': [{'START': 1, 'STOP': 45, 'ORIENTATION': '-', 'GENE_NAME': 'fimD'}], 'FAWB01000013.1': [{'START': 1, 'STOP': 449, 'ORIENTATION': '-', 'GENE_NAME': 'tccP2'}, {'START': 1106, 'STOP': 1278, 'ORIENTATION': '-', 'GENE_NAME': 'espV'}, {'START': 1, 'STOP': 194, 'ORIENTATION': '-', 'GENE_NAME': 'espF'}, {'START': 1, 'STOP': 194, 'ORIENTATION': '-', 'GENE_NAME': 'tccP2'}], 'FAWB01000135.1': [{'START': 3010, 'STOP': 4632, 'ORIENTATION': '+', 'GENE_NAME': 'nadb'}], 'FAWB01000102.1': [{'START': 2
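
In the session above, the same command went from failing to succeeding once the untracked temp/ directory was removed. A minimal sketch of clearing such a directory before each run, assuming it only holds disposable intermediate files; the helper name `clear_stale_temp` is hypothetical, and this is not necessarily what the eventual fix does.

```python
import os
import shutil


def clear_stale_temp(repo_dir, temp_name='temp'):
    """Delete `repo_dir/temp_name` if it exists; return True if it was removed."""
    temp_path = os.path.join(repo_dir, temp_name)
    if os.path.isdir(temp_path):
        shutil.rmtree(temp_path)  # same effect as `rm -rf temp/` in the session
        return True
    return False
```

Calling this before invoking tools_controller.py would give every run a clean starting state, at the cost of discarding any intermediate results a previous run left behind.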

kevinkle commented 7 years ago

Fixed in https://github.com/superphy/backend/commit/30855f9bd365949621b0eeacdb6218c0e40968a9

kevinkle commented 7 years ago

Testing began on 50,000 genomes at 8868.pts-0.panther (08/11/2017 10:03:56 AM) (Detached)

kevinkle commented 7 years ago

Merged in https://github.com/superphy/backend/pull/198 . Closing issue.