ncbi / fcs

Foreign Contamination Screening caller scripts and documentation

Can FCS-GX be used to batch process a large number of genomes with different taxid? #93

Closed ShenMJ99 closed 1 month ago

ShenMJ99 commented 1 month ago

I am using run_gx compiled from the GitHub source code and gx from Bioconda. Does the current version of FCS-GX have any parameters or features for batch processing genomes with different tax-ids?

etvedte commented 1 month ago

When running from source, you should be able to set an environment variable that controls how many cores GX runs on (default=48): export GX_NUM_CORES=N

Then, you can set up a batch job using xargs or your preferred method. The example below is for running three cases.

cat gx_args
run_gx.py --tax-id=562 --fasta=Ecoli.fasta --out-dir=/path/to/outdir --gx-db=/path/to/gxdb/all 
run_gx.py --tax-id=9606 --fasta=Hsapiens.fasta --out-dir=/path/to/outdir --gx-db=/path/to/gxdb/all 
run_gx.py --tax-id=7227 --fasta=Dmel.fasta --out-dir=/path/to/outdir --gx-db=/path/to/gxdb/all 

xargs -a gx_args -I v1 -P 3 sh -c "v1"
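
For more than a handful of genomes, the gx_args file could be generated rather than typed by hand. The following is only a sketch: make_gx_args and the tab-separated manifest format (tax-id, then FASTA path) are hypothetical and not part of FCS-GX. It uses only the run_gx.py flags shown above, and it assumes paths contain no whitespace or shell metacharacters, since each line is later passed to sh -c.

```shell
# Hypothetical helper (not part of FCS-GX): turn a manifest of
# "<tax-id><TAB><fasta>" lines into one run_gx.py command per line.
make_gx_args() {
    manifest=$1   # tab-separated file: tax-id, FASTA path
    out_dir=$2    # output directory, e.g. /path/to/outdir
    gx_db=$3      # GX database prefix, e.g. /path/to/gxdb/all
    # The "|| [ -n ... ]" keeps a final line that lacks a trailing newline.
    while IFS="$(printf '\t')" read -r taxid fasta || [ -n "$taxid" ]; do
        # Skip blank lines and comment lines in the manifest.
        case $taxid in
            ''|'#'*) continue ;;
        esac
        printf 'run_gx.py --tax-id=%s --fasta=%s --out-dir=%s --gx-db=%s\n' \
            "$taxid" "$fasta" "$out_dir" "$gx_db"
    done < "$manifest"
}
```

Assuming a manifest named genomes.tsv, running make_gx_args genomes.tsv /path/to/outdir /path/to/gxdb/all > gx_args would produce a file in the same shape as the three-line example.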

FYI we haven't done performance testing for a while, so you'll have to see what works best for your situation. When doing batch jobs, I typically reserve at least 24 cores for larger eukaryote genomes, but you may be able to go lower.
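
One rough way to pick the -P value is to divide the cores available by the cores reserved per job. The sketch below is illustrative only: gx_parallel_jobs is a hypothetical helper, and it assumes each run_gx.py process uses the number of cores given by GX_NUM_CORES. It does not account for memory or I/O limits.

```shell
# Hypothetical helper (not part of FCS-GX): number of GX jobs that fit
# when every job is given the same fixed number of cores.
gx_parallel_jobs() {
    total_cores=$1     # cores available on the machine or in the allocation
    cores_per_job=$2   # value you plan to export as GX_NUM_CORES
    jobs=$((total_cores / cores_per_job))   # integer division, rounds down
    # Always allow at least one job, even if the machine is small.
    if [ "$jobs" -lt 1 ]; then
        jobs=1
    fi
    echo "$jobs"
}
```

For example, with 96 cores and GX_NUM_CORES=24, gx_parallel_jobs 96 24 prints 4, which could be passed to xargs as -P 4.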

ShenMJ99 commented 1 month ago

OK, thanks!