Closed bibionid closed 1 year ago
Hi,
Were you able to run with the test-only db? It's a small db for testing the workflow.
Also, you will need fcsgx.sif downloaded in your working dir, and the following parameters added to the above command:
--container-engine=singularity --image=fcsgx.sif
You may not need to give this parameter:
--container-db "${SHM_LOC}/gxdb/"
Could you also omit --verify-checksums and give it a try?
Hi @pstrope,
Thank you for your swift response to this question.
In answer to your question: yes, I was able to run with the test-only library. Having said this, I have not tested it with the 0.3.0-beta version.
I will make the changes you suggest and report back.
Hi @pstrope
I tested whether the test-only example would work with the current install.
I ran (this time interactively):
run_fcsgx.py --fasta ./testing/fcsgx_test.fa.gz --out-dir ./gx_out/ --container-db ./testing/ --gx-db test-only --split-fasta --tax-id 6973 --debug
and got the same error
python3 /app/bin/retrieve_db --rm --gx-db testing/test-only
usage: retrieve_db.py [-h] --gx-db GX_DB [--gx-db-disk DISK_INDEX_PATH] [--index-src GX_INDEX_SRC] [--ftp-basename GX_FTP_BASENAME] [--debug] [--print-only]
retrieve_db.py: error: unrecognized arguments: --rm
Traceback (most recent call last):
  File "/usr/local/bin/run_fcsgx.py", line 296, in <module>
    sys.exit(main())
  File "/usr/local/bin/run_fcsgx.py", line 286, in main
    gx.run()
  File "/usr/local/bin/run_fcsgx.py", line 188, in run
    self.run_retrieve_db()
  File "/usr/local/bin/run_fcsgx.py", line 115, in run_retrieve_db
    self.safe_exec(retrieve_db_args)
  File "/usr/local/bin/run_fcsgx.py", line 45, in safe_exec
    subprocess.run(args, shell=False, check=True, text=True, stdout=sys.stdout, stderr=sys.stderr)
  File "/usr/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['python3', '/app/bin/retrieve_db', '--rm', '--gx-db', 'testing/test-only']' returned non-zero exit status 2.
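Exit status 2 in the traceback above is what Python's argparse returns when it is handed an argument it does not define. The sketch below reproduces that failure mode with a stand-in parser. The parser is a hypothetical reduction built only from the usage line printed above, not the real retrieve_db.py, and `exit_status_for` is an illustrative helper.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in: a subset of the options shown in the usage message.
    parser = argparse.ArgumentParser(prog="retrieve_db.py")
    parser.add_argument("--gx-db", required=True)
    parser.add_argument("--debug", action="store_true")
    parser.add_argument("--print-only", action="store_true")
    return parser


def exit_status_for(argv: List[str]) -> int:
    """Return the exit status argparse would produce for argv (0 on success)."""
    try:
        build_parser().parse_args(argv)
    except SystemExit as exc:
        # argparse reports usage errors by raising SystemExit with code 2.
        return int(exc.code)
    return 0


if __name__ == "__main__":
    # Same arguments as the failing call: --rm is not defined by the parser.
    print(exit_status_for(["--rm", "--gx-db", "testing/test-only"]))
    # Without --rm the arguments parse cleanly.
    print(exit_status_for(["--gx-db", "testing/test-only"]))
```

If this reading is right, the runner script and the retrieve_db inside the container disagree about which options exist, which would be consistent with a version mismatch between the two.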
If I am interpreting things correctly, I need --container-db ./testing/, otherwise the path given to retrieve_db is /app/db/gxdb/all. That would prevent me from using the SHM, and it is also limiting because our HPC is isolated from the internet. As it stands, I have local installs of the all (and test-only) databases and would like to run from those if possible.
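Because the run depends on a local copy of the database staged in shared memory, one possible pre-flight check is whether that copy fits in the target location. This is a minimal sketch, assuming the database is a plain directory of files; `directory_size` and `fits_in` are hypothetical helpers and not part of FCS-GX.

```python
import os
import shutil


def directory_size(path: str) -> int:
    """Sum the sizes in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def fits_in(source_dir: str, target_dir: str) -> bool:
    """Return True if the contents of source_dir fit in the free space of target_dir."""
    return directory_size(source_dir) <= shutil.disk_usage(target_dir).free


if __name__ == "__main__":
    # Example paths only; substitute the local database directory and the
    # shared-memory location used on your system.
    source = os.path.expandvars("${HOME}")
    print(directory_size(source), fits_in(source, "/tmp"))
```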
Regarding the addition of --container-engine=singularity --image=fcsgx.sif: I added this, although it is not possible to run with fcsgx.sif in the same dir on our HPC. Instead, I passed the path to the software partition where the image is stored. I also ran on our only internet-connected node (which is not for general compute) to see if this might help. Unfortunately, I got a different error.
Run
run_fcsgx.py --fasta ./testing/fcsgx_test.fa.gz --out-dir ./gx_out/ --container-db ./ --gx-db all --split-fasta --tax-id 6973 --debug --image=${FCSGX_PATH} --container-engine=singularity
Output
python3 /app/bin/retrieve_db --gx-db all
fetching manifest https://ftp.ncbi.nlm.nih.gov/pub/murphyte/FCS/FCS-genome/database/release-database-v1/r2022-01-24/all.manifest
Error
Traceback (most recent call last):
  File "/tmp/Bazel.runfiles_7caq6h9k/runfiles/cgr_fcs/apps/private/retrieve_db/retrieve_db.py", line 308, in <module>
    sys.exit(main())
  File "/tmp/Bazel.runfiles_7caq6h9k/runfiles/cgr_fcs/apps/private/retrieve_db/retrieve_db.py", line 293, in main
    gx.run(
  File "/tmp/Bazel.runfiles_7caq6h9k/runfiles/cgr_fcs/apps/private/retrieve_db/retrieve_db.py", line 243, in run
    self.check_gx_db(gx_db, disk_index_path, gx_index_src, gx_ftp_basename)
  File "/tmp/Bazel.runfiles_7caq6h9k/runfiles/cgr_fcs/apps/private/retrieve_db/retrieve_db.py", line 107, in check_gx_db
    file_sizes = self.check_fs_space(gx_ftp_basename)
  File "/tmp/Bazel.runfiles_7caq6h9k/runfiles/cgr_fcs/apps/private/retrieve_db/retrieve_db.py", line 55, in check_fs_space
    needed_size, file_sizes = self.fetch_manifest(gx_ftp_basename + self.gx_db_name + ".manifest")
  File "/tmp/Bazel.runfiles_7caq6h9k/runfiles/cgr_fcs/apps/private/retrieve_db/retrieve_db.py", line 39, in fetch_manifest
    with urllib.request.urlopen(ftp_loc, context=ctx) as mr:
  File "/usr/lib/python3.9/urllib/request.py", line 214, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.9/urllib/request.py", line 523, in open
    response = meth(req, response)
  File "/usr/lib/python3.9/urllib/request.py", line 632, in http_response
    response = self.parent.error(
  File "/usr/lib/python3.9/urllib/request.py", line 561, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.9/urllib/request.py", line 494, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.9/urllib/request.py", line 641, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
Traceback (most recent call last):
  File "/usr/local/bin/run_fcsgx.py", line 296, in <module>
    sys.exit(main())
  File "/usr/local/bin/run_fcsgx.py", line 286, in main
    gx.run()
  File "/usr/local/bin/run_fcsgx.py", line 188, in run
    self.run_retrieve_db()
  File "/usr/local/bin/run_fcsgx.py", line 115, in run_retrieve_db
    self.safe_exec(retrieve_db_args)
  File "/usr/local/bin/run_fcsgx.py", line 45, in safe_exec
    subprocess.run(args, shell=False, check=True, text=True, stdout=sys.stdout, stderr=sys.stderr)
  File "/usr/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['python3', '/app/bin/retrieve_db', '--gx-db', 'all']' returned non-zero exit status 1.
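The first traceback above ends in an unhandled urllib.error.HTTPError, which separates "the server answered 404" from "the server could not be reached at all". A minimal sketch of a wrapper that reports the two cases differently is shown here; `fetch_text` is a hypothetical helper for illustration and is not how retrieve_db.py is written.

```python
import urllib.error
import urllib.request


def fetch_text(url: str, timeout: float = 30.0) -> str:
    """Fetch url and return its body as text, with a readable error on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # The server was reached but rejected the request, e.g. 404 for a
        # path that does not exist. HTTPError is a subclass of URLError,
        # so it has to be caught first.
        raise RuntimeError(f"{url} returned HTTP {err.code}: {err.reason}") from err
    except urllib.error.URLError as err:
        # No response at all, e.g. a node without internet access.
        raise RuntimeError(f"could not reach {url}: {err.reason}") from err
```

A 404 on the manifest URL means the node did reach the server, so the missing path itself is the problem rather than the network isolation.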
--gx-db takes the path to where you saved the gxdb (your local install).
--image takes the path to the sif file.
You don't need to give --container-db; the runner script will figure it out from what you give to --gx-db.
Please try this and let me know what you get:
python3 run_fcsgx.py --fasta ./testing/fcsgx_test.fa.gz --out-dir ./gx_out/ --gx-db "${SHM_LOC}/gxdb/test-only" --split-fasta --tax-id 6973 --image "${FCSGX_PATH}/fcsgx.sif" --container-engine singularity
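For scripted submissions, the same invocation can be assembled as an argument list. This is a sketch that uses only the flags from the command above; `build_fcsgx_command` is a hypothetical helper, and the environment variables are expanded with os.path.expandvars on the assumption that SHM_LOC and FCSGX_PATH are set in the job environment.

```python
import os
import shlex
from typing import List


def build_fcsgx_command(fasta: str, out_dir: str, gx_db: str,
                        tax_id: int, image: str) -> List[str]:
    """Build the run_fcsgx.py argument list without --container-db."""
    return [
        "python3", "run_fcsgx.py",
        "--fasta", fasta,
        "--out-dir", out_dir,
        # Path to the local database install, e.g. .../gxdb/test-only
        "--gx-db", os.path.expandvars(gx_db),
        "--split-fasta",
        "--tax-id", str(tax_id),
        # Path to the Singularity image file
        "--image", os.path.expandvars(image),
        "--container-engine", "singularity",
    ]


if __name__ == "__main__":
    cmd = build_fcsgx_command(
        fasta="./testing/fcsgx_test.fa.gz",
        out_dir="./gx_out/",
        gx_db="${SHM_LOC}/gxdb/test-only",
        tax_id=6973,
        image="${FCSGX_PATH}/fcsgx.sif",
    )
    # Print a shell-quoted form for inspection; pass cmd to subprocess.run to execute.
    print(shlex.join(cmd))
```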
Hi @pstrope
I'm sorry, I have been operating under a misunderstanding. The sys admins of our HPC created a modified version of the run_fcsgx.py script. Apparently, they modified it so that it would function on our HPC, which is separate from the internet. I'm not sure exactly what has been modified, but this is what I have been running.
This is the modified file: run_fcsgx.py.txt
Upon realising this, I have tried two things:
1) With the modified version of run_fcsgx.py
I downloaded the fcsgx.sif file into my working dir using
curl https://ftp.ncbi.nlm.nih.gov/genomes/TOOLS/FCS/releases/0.3.0/fcs-gx.0.3.0.sif -Lo fcsgx.sif
I then ran
srun run_fcsgx.py \
--fasta ./testing/fcsgx_test.fa.gz \
--out-dir ./gx_out/ \
--container-db "${SHM_LOC}/gxdb/" \
--gx-db "${SHM_LOC}/gxdb/all" \
--split-fasta \
--tax-id 6973 \
--debug \
--image=fcsgx.sif \
--container-engine=singularity
When I ran this version with the above submission, it returned the St13runtime_error that I reported in #24, as documented in the logs below:
xy_fcs_52548698_n83032.out.txt
xy_fcs_52548698_n83032.err.txt
2) With the 0.3.0-beta release version of run_fcsgx.py
I git cloned the most recent version of run_fcsgx.py to my working dir and ran the following, using the same .sif as above:
srun run_fcsgx.py \
--fasta ./testing/fcsgx_test.fa.gz \
--out-dir ./gx_out/ \
--gx-db "${SHM_LOC}/gxdb/all" \
--split-fasta \
--tax-id 6973 \
--debug \
--image=fcsgx.sif \
--container-engine=singularity
Unfortunately this also failed, but with a different error, as described in these logs: xy_fcs_52551970_n83032.out.txt xy_fcs_52551970_n83032.err.txt
I'm sorry for the confusion, and I hope this helps diagnose one of the issues.
Hi, it looks like the unmodified runner script was working OK for a bit. I noticed you are using an old db. Please download the latest gx db and try with the run_fcsgx.py that we provide. I think that should work.
Hi @pstrope,
I did as you suggested, and not only did the all test work, but so did the subsequent analysis on my data.
Thank you for your patience with this request, and I'm sorry for any of your time I may have wasted working under false pretences. This tool seems awesome and I look forward to using it in future!
Thanks again
Hello,
Thanks for your work on FCS!
I'm trying to pick up from #24, where you suggested I install the newest version.
I did this, and am now encountering a different error.
I am running
I am running on a CentOS 7 HPC using Slurm. I ran the job with 4 CPUs and 640 GB of RAM, using the 0.3.0-beta version installed from binaries. The Python version is 3.8.5.
Log files are attached here: xy_fcs_52424700_n83032.err.txt xy_fcs_52424700_n83032.out.txt
I would greatly appreciate any insight you can offer into this issue I am encountering here.
Many thanks in advance