ncbi / fcs

Foreign Contamination Screening caller scripts and documentation

Running FCS-GX without internet access #3

Closed: olekto closed this issue 2 years ago

olekto commented 2 years ago

Hi, thank you for these tools. The community lacked something as user-friendly as these.

I am able to run the test case on our cluster's login nodes, which downloads the database, so we have a local copy. However, our work nodes don't have internet access, and I cannot run larger genomes on the login nodes or we would be kicked off the cluster. Once we have a local copy of the database, there is no real need to connect to the internet for each run, is there? I cannot see an option to run FCS-GX without internet access. Could you implement this?

Thank you!

Sincerely, Ole

pstrope commented 2 years ago

Hi Ole,

Thanks for the supportive feedback! We are working on providing an option to run FCS-GX without the need of internet. We will release it shortly.

Sincerely, The FCS Team

pstrope commented 2 years ago

Hi Ole,

We just released v0.2.2. You can now run FCS-GX without internet access by using --gx-db-disk <path to downloaded db>.

We'd love to hear from you, so that we can make this tool better.

FCS team

olekto commented 2 years ago

Hi FCS team.

The job is running now, so hopefully it should finish successfully. Thank you for addressing my issue so quickly.

One question: should the option --gx-db-disk point to the index (all.gxi)? Pointing it to the directory where the files are stored did not work, but pointing it to the index seems to work. This is the current help text:

  --gx-db-disk DISK_INDEX_PATH
                        if storing the database in shared memory, keep a copy
                        of the files in this path (default: None)

Thank you.

Ole

pstrope commented 2 years ago

Hi Ole,

Could you please post the command you used? Thanks!

olekto commented 2 years ago

This command works:

export SHM_LOC=/cluster/projects/nn8013k/opt/fcs

# $1: input FASTA file, $2: NCBI taxonomy ID of the sample
python3 /cluster/projects/nn8013k/opt/fcs/dist/run_fcsgx.py --fasta "$1" --out-dir ./gx_out/ \
--gx-db "${SHM_LOC}/gxdb/all" --split-fasta --tax-id "$2" \
--gx-db-disk "${SHM_LOC}/gxdb/all.gxi" \
--container-engine=singularity --image=/cluster/projects/nn8013k/opt/fcs/fcsgx.sif

But based on the description of the --gx-db-disk option, I would have thought that --gx-db-disk "${SHM_LOC}/gxdb/all" should work, but it doesn't. The DISK_INDEX_PATH placeholder is understandable, though.
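The two paths in the command above differ only by the .gxi extension. Assuming, as the command suggests, that "${SHM_LOC}/gxdb/all" is a file-name prefix shared by the database files (all.gxi among them) rather than a file of its own, a small helper can show what sits behind that prefix. list_gx_db_files is a hypothetical name for illustration, not part of FCS.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of FCS): list the files sharing a GX database
# prefix such as "${SHM_LOC}/gxdb/all". Assumption: the prefix is a common
# file-name stem (e.g. all.gxi), not a file by itself.
list_gx_db_files() {
  local prefix="$1"
  local f
  local found=0
  for f in "${prefix}".*; do
    # Without nullglob an unmatched pattern stays literal, so skip non-files.
    [[ -e "$f" ]] || continue
    printf '%s\n' "$f"
    found=1
  done
  # Return non-zero when no file shares the prefix.
  [[ $found -eq 1 ]]
}
```

Running it on the prefix passed to --gx-db should list all.gxi alongside the other database files; a non-zero exit status means nothing was found under that prefix.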

Thank you.

Ole

pstrope commented 2 years ago

If the database already exists at --gx-db "${SHM_LOC}/gxdb/all", then you don't need to provide --gx-db-disk "${SHM_LOC}/gxdb/all.gxi".
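Following that answer, a job script on a node without internet access only needs the database prefix for --gx-db, provided the database already exists there. A minimal sketch of a pre-flight check, assuming the index lives at <prefix>.gxi as in the earlier command: the hypothetical function gx_db_prefix_or_fail (not part of FCS) accepts either the prefix or the index path, confirms the index is readable, and prints the prefix.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of FCS) for an offline node.
# Assumption: the database is addressed by a prefix such as
# "${SHM_LOC}/gxdb/all" and its index file is "<prefix>.gxi".
gx_db_prefix_or_fail() {
  # Accept either the prefix or the .gxi path; normalize to the prefix.
  local prefix="${1%.gxi}"
  if [[ ! -r "${prefix}.gxi" ]]; then
    echo "error: no readable GX index at ${prefix}.gxi" >&2
    return 1
  fi
  printf '%s\n' "$prefix"
}
```

In a job script its output could be captured with db=$(gx_db_prefix_or_fail "${SHM_LOC}/gxdb/all") || exit 1 and passed as --gx-db "$db", leaving --gx-db-disk out. Checking early like this avoids queuing a long job that cannot find its database.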

We'll try to make the help message and documentation clearer. Let us know whether it worked.

olekto commented 2 years ago

OK, thank you. It's just that earlier in this thread you mentioned that I needed to use --gx-db-disk <path to downloaded db> to run FCS-GX on a node without internet access. Since <path to downloaded db> did not work but <path to downloaded db index> did, I was wondering whether the help text was correct.

Ole