ncbi / fcs

Foreign Contamination Screening caller scripts and documentation

[BUG]: retrieve_db.py: error: unrecognized arguments: --rm #26

bibionid closed this issue 1 year ago

bibionid commented 1 year ago

Hello,

Thanks for your work on fcs!

I'm trying to pick up from #24, where you suggested I install the newest version.

I did this, and am now encountering a different error.

I am running

workDir=${USER_WORK_DIR}
SHM_LOC=/dev/shm/${USER}
cd ${workDir}

# This was done previously and the all.* db remains in the SHM
# cp -r ./all.* ${SHM_LOC}/gxdb/

srun run_fcsgx.py \
  --fasta ./testing/fcsgx_test.fa.gz \
  --out-dir ./gx_out/ \
  --container-db "${SHM_LOC}/gxdb/" \
  --gx-db "${SHM_LOC}/gxdb/all" \
  --split-fasta \
  --tax-id 6973 \
  --debug \
  --verify-checksums

I am running on a CentOS 7 HPC using Slurm. I ran the job with 4 CPUs and 640 GB of RAM. I am running the 0.3.0-beta version installed from binaries. Python version is 3.8.5.

Log files are attached here: xy_fcs_52424700_n83032.err.txt xy_fcs_52424700_n83032.out.txt

I would greatly appreciate any insight you can offer into this issue I am encountering here.

Many thanks in advance

pstrope commented 1 year ago

Hi, were you able to run with the test-only db? It's a small db for testing the workflow.

pstrope commented 1 year ago

Also, you will need fcsgx.sif downloaded to your working dir, and the following parameters added to the above command: --container-engine=singularity --image=fcsgx.sif

You may not need to give this parameter: --container-db "${SHM_LOC}/gxdb/"

Could you also omit --verify-checksums and give it a try?

bibionid commented 1 year ago

Hi @pstrope,

Thank you for your swift response to this question.

In answer to your question: yes, I was able to run with the test-only library. Having said this, I have not tested it with the 0.3.0-beta version.

I will make the changes you suggest and report back.

bibionid commented 1 year ago

Hi @pstrope

I tested to see if the test-only example would work with this current install.

I ran (this time interactively)

run_fcsgx.py   --fasta ./testing/fcsgx_test.fa.gz  --out-dir ./gx_out/   --container-db ./testing/   --gx-db test-only   --split-fasta   --tax-id 6973   --debug

and got the same error

python3 /app/bin/retrieve_db --rm --gx-db testing/test-only
usage: retrieve_db.py [-h] --gx-db GX_DB [--gx-db-disk DISK_INDEX_PATH] [--index-src GX_INDEX_SRC] [--ftp-basename GX_FTP_BASENAME] [--debug] [--print-only]
retrieve_db.py: error: unrecognized arguments: --rm
Traceback (most recent call last):
  File "/usr/local/bin/run_fcsgx.py", line 296, in <module>
    sys.exit(main())
  File "/usr/local/bin/run_fcsgx.py", line 286, in main
    gx.run()
  File "/usr/local/bin/run_fcsgx.py", line 188, in run
    self.run_retrieve_db()
  File "/usr/local/bin/run_fcsgx.py", line 115, in run_retrieve_db
    self.safe_exec(retrieve_db_args)
  File "/usr/local/bin/run_fcsgx.py", line 45, in safe_exec
    subprocess.run(args, shell=False, check=True, text=True, stdout=sys.stdout, stderr=sys.stderr)
  File "/usr/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['python3', '/app/bin/retrieve_db', '--rm', '--gx-db', 'testing/test-only']' returned non-zero exit status 2.
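
For anyone reading the trace above: exit status 2 is what Python's argparse produces for any flag the parser does not define. One possible reading is that the runner script passes --rm to a retrieve_db that does not know that option. A minimal sketch, using a hypothetical parser that copies only a few options from the usage line above (it is not the real retrieve_db.py source):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the parser behind the usage line above;
    # only a few of the listed options are reproduced here.
    parser = argparse.ArgumentParser(prog="retrieve_db.py")
    parser.add_argument("--gx-db", required=True)
    parser.add_argument("--debug", action="store_true")
    parser.add_argument("--print-only", action="store_true")
    return parser


def exit_status(argv: list) -> int:
    """Return the exit status that parse_args would produce for argv."""
    try:
        build_parser().parse_args(argv)
    except SystemExit as exc:
        # argparse prints the usage message and calls sys.exit(2)
        # when it meets an argument it does not recognise.
        return exc.code
    return 0


if __name__ == "__main__":
    print(exit_status(["--rm", "--gx-db", "testing/test-only"]))
    print(exit_status(["--gx-db", "testing/test-only"]))
```

Since the failure occurs during argument parsing, it does not depend on the database or on network access.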

If I am interpreting things correctly, I need --container-db ./testing/; otherwise the path given to retrieve_db is /app/db/gxdb/all. This would prevent me from using the SHM and would also limit things, as our HPC is isolated from the internet. As it stands, I have local installs of the all (and test-only) databases and would like to run from those if possible.

Regarding the addition of --container-engine=singularity --image=fcsgx.sif: I added this, although it is not possible to run with fcsgx.sif in the same dir on our HPC. I instead passed the path to the software partition where the image is stored. I also ran on our only internet-connected node (which is not for general compute) to see if this might help. Unfortunately, I got a different error.

Run

run_fcsgx.py   --fasta ./testing/fcsgx_test.fa.gz  --out-dir ./gx_out/   --container-db ./   --gx-db all   --split-fasta   --tax-id 6973   --debug   --image=${FCSGX_PATH}   --container-engine=singularity

Output

python3 /app/bin/retrieve_db --gx-db all
fetching manifest https://ftp.ncbi.nlm.nih.gov/pub/murphyte/FCS/FCS-genome/database/release-database-v1/r2022-01-24/all.manifest

Error

Traceback (most recent call last):
  File "/tmp/Bazel.runfiles_7caq6h9k/runfiles/cgr_fcs/apps/private/retrieve_db/retrieve_db.py", line 308, in <module>
    sys.exit(main())
  File "/tmp/Bazel.runfiles_7caq6h9k/runfiles/cgr_fcs/apps/private/retrieve_db/retrieve_db.py", line 293, in main
    gx.run(
  File "/tmp/Bazel.runfiles_7caq6h9k/runfiles/cgr_fcs/apps/private/retrieve_db/retrieve_db.py", line 243, in run
    self.check_gx_db(gx_db, disk_index_path, gx_index_src, gx_ftp_basename)
  File "/tmp/Bazel.runfiles_7caq6h9k/runfiles/cgr_fcs/apps/private/retrieve_db/retrieve_db.py", line 107, in check_gx_db
    file_sizes = self.check_fs_space(gx_ftp_basename)
  File "/tmp/Bazel.runfiles_7caq6h9k/runfiles/cgr_fcs/apps/private/retrieve_db/retrieve_db.py", line 55, in check_fs_space
    needed_size, file_sizes = self.fetch_manifest(gx_ftp_basename + self.gx_db_name + ".manifest")
  File "/tmp/Bazel.runfiles_7caq6h9k/runfiles/cgr_fcs/apps/private/retrieve_db/retrieve_db.py", line 39, in fetch_manifest
    with urllib.request.urlopen(ftp_loc, context=ctx) as mr:
  File "/usr/lib/python3.9/urllib/request.py", line 214, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.9/urllib/request.py", line 523, in open
    response = meth(req, response)
  File "/usr/lib/python3.9/urllib/request.py", line 632, in http_response
    response = self.parent.error(
  File "/usr/lib/python3.9/urllib/request.py", line 561, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.9/urllib/request.py", line 494, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.9/urllib/request.py", line 641, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
Traceback (most recent call last):
  File "/usr/local/bin/run_fcsgx.py", line 296, in <module>
    sys.exit(main())
  File "/usr/local/bin/run_fcsgx.py", line 286, in main
    gx.run()
  File "/usr/local/bin/run_fcsgx.py", line 188, in run
    self.run_retrieve_db()
  File "/usr/local/bin/run_fcsgx.py", line 115, in run_retrieve_db
    self.safe_exec(retrieve_db_args)
  File "/usr/local/bin/run_fcsgx.py", line 45, in safe_exec
    subprocess.run(args, shell=False, check=True, text=True, stdout=sys.stdout, stderr=sys.stderr)
  File "/usr/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['python3', '/app/bin/retrieve_db', '--gx-db', 'all']' returned non-zero exit status 1. 
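
The two stacked tracebacks above follow a common pattern: the child process dies from an uncaught exception (exit status 1), and the wrapper's subprocess.run(..., check=True) then raises CalledProcessError. A minimal sketch of that pattern, assuming a hypothetical helper run_child that mirrors the subprocess.run call shown in the traceback, minus the stream redirection:

```python
import subprocess
import sys


def run_child(code: str) -> int:
    """Run a Python one-liner as a child process and return its exit status.

    Hypothetical helper for illustration; it is not part of run_fcsgx.py.
    """
    try:
        subprocess.run(
            [sys.executable, "-c", code],
            shell=False,
            check=True,  # a non-zero exit raises CalledProcessError
            text=True,
        )
    except subprocess.CalledProcessError as exc:
        return exc.returncode
    return 0


if __name__ == "__main__":
    # An uncaught exception in the child yields exit status 1.
    print(run_child("raise RuntimeError('manifest not found')"))
    # An explicit sys.exit(2), as argparse does on bad arguments, yields 2.
    print(run_child("import sys; sys.exit(2)"))
```

Read this way, the first traceback (the HTTP 404 raised while fetching the manifest) is the actual failure, and the second only reports that the child exited with a non-zero status.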

pstrope commented 1 year ago

--gx-db takes the path to where you saved the gxdb (your local install). --image takes the path to the sif file. You don't need to give --container-db; the runner script will figure it out from what you give to --gx-db.

Please try this and let me know what you get.

python3 run_fcsgx.py --fasta ./testing/fcsgx_test.fa.gz --out-dir ./gx_out/ --gx-db "${SHM_LOC}/gxdb/test-only" --split-fasta --tax-id 6973 --image "${FCSGX_PATH}/fcsgx.sif" --container-engine singularity
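
To illustrate how a single --gx-db value can carry both the database directory and the database name, here is a minimal sketch. It assumes the database files are named `<prefix>.*` (as the `all.*` copy earlier in this thread suggests); split_gx_db and list_db_files are hypothetical helpers, not the runner script's actual code:

```python
from pathlib import Path


def split_gx_db(gx_db: str):
    """Split a --gx-db value into (directory, database prefix)."""
    path = Path(gx_db)
    return path.parent, path.name


def list_db_files(gx_db: str):
    """List the file names in the directory that start with '<prefix>.'."""
    directory, prefix = split_gx_db(gx_db)
    return sorted(p.name for p in directory.glob(prefix + ".*"))


if __name__ == "__main__":
    directory, prefix = split_gx_db("/dev/shm/someuser/gxdb/test-only")
    print(directory)  # directory holding the local install
    print(prefix)     # database name
```

Running list_db_files on the path you plan to pass is a quick way to confirm that the local install is where you expect before submitting a job.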

bibionid commented 1 year ago

Hi @pstrope

I'm sorry, I have been operating under a misunderstanding. The sys admins of our HPC created a modified version of the run_fcsgx.py script. Apparently, they modified it so that it would function on our HPC, which is separate from the internet. I'm not sure exactly what has been modified, but this is what I have been running.

This is the modified file: run_fcsgx.py.txt

Upon realising this, I have tried two things:

1) With the modified version of run_fcsgx.py

I downloaded the fcsgx.sif file into my working dir using

curl https://ftp.ncbi.nlm.nih.gov/genomes/TOOLS/FCS/releases/0.3.0/fcs-gx.0.3.0.sif -Lo fcsgx.sif

I then ran

srun run_fcsgx.py \
  --fasta ./testing/fcsgx_test.fa.gz \
  --out-dir ./gx_out/ \
  --container-db "${SHM_LOC}/gxdb/" \
  --gx-db "${SHM_LOC}/gxdb/all" \
  --split-fasta \
  --tax-id 6973 \
  --debug \
  --image=fcsgx.sif \
  --container-engine=singularity

When I ran this version with the above submission, it returned the St13runtime_error that I reported in #24 as documented in the logs below: xy_fcs_52548698_n83032.out.txt xy_fcs_52548698_n83032.err.txt

2) With the 0.3.0-beta release version of run_fcsgx.py

I git cloned the most recent version of run_fcsgx.py to my working dir and ran the following, using the same .sif as above

srun run_fcsgx.py \
  --fasta ./testing/fcsgx_test.fa.gz \
  --out-dir ./gx_out/ \
  --gx-db "${SHM_LOC}/gxdb/all" \
  --split-fasta \
  --tax-id 6973 \
  --debug \
  --image=fcsgx.sif \
  --container-engine=singularity

Unfortunately this also failed, but with a different error, as described in these logs: xy_fcs_52551970_n83032.out.txt xy_fcs_52551970_n83032.err.txt

I'm sorry for the confusion, and hope this might help diagnose one of the issues.

pstrope commented 1 year ago

Hi, it looks like the unmodified runner script was working OK for a bit. I noticed you are using an old db. Please download the latest gx db and try with the run_fcsgx.py that we provide. I think that should work.

bibionid commented 1 year ago

Hi @pstrope,

I did as you suggested, and not only did the test with the all db work, but so did the subsequent analysis on my data.

Thank you for your patience with this request, and I'm sorry for any of your time I may have wasted working under false pretences. This tool seems awesome and I look forward to using it in future!

Thanks again