ncbi / sra-human-scrubber

An SRA tool that takes as input a local fastq file from a clinical infection sample, identifies and removes any significant human reads, and outputs the edited (cleaned) fastq file, which can safely be used for SRA submission.

Allow DB location to be specified #10

Closed pvanheus closed 2 years ago

pvanheus commented 2 years ago

Software is often installed in a read-only location, especially on shared storage or when using container technology like Singularity. The DB location of this tool is, however, hardcoded (https://github.com/ncbi/sra-human-scrubber/blob/master/scripts/scrub.sh#L31) relative to the tool install location.

Please provide a mechanism for users to specify a DB location that is independent of the tool location.

multikengineer commented 2 years ago

@pvanheus Is there a reason, which I may be missing, that you can't copy the DB to wherever you want it and then symlink it back into place with ln?

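# Copy the DB to a temporary directory (standing in for any writable location)
TmpDb=$(mktemp -d)
cp data/human_filter.db "$TmpDb"/
# Replace the original DB with a symlink to the copy
rm data/human_filter.db
cd data
ln -s "$TmpDb"/human_filter.db
cd ..
# Run the built-in test to confirm the symlinked DB is picked up
./scripts/scrub.sh test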
2021-07-19 08:57:35 aligns_to version 0.60
2021-07-19 08:57:35 hardware threads: 32, omp threads: 32
2021-07-19 08:57:35 loading time (sec) 0
2021-07-19 08:57:35 /tmp/tmp.SgF1x7qvsP/scrubber_test.fastq.fasta
2021-07-19 08:57:35 100% processed
2021-07-19 08:57:35 total spot count: 2
2021-07-19 08:57:35 total read count: 2
2021-07-19 08:57:35 total time (sec) 0
1  spot(s) removed.

test succeeded

pvanheus commented 2 years ago

If scrub.sh and its data directory are installed in a Singularity container, the steps that modify the data directory are not possible, because the container filesystem is read-only.

multikengineer commented 2 years ago

@pvanheus Apologies for the delay. A -d switch has now been added.
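
For readers wondering how a switch like this can coexist with a read-only install, here is a minimal sketch of the general pattern: default to a DB path relative to the install location, and let a `-d` option override it. This is not the actual scrub.sh code. The helper name `resolve_db`, the `SCRUBBER_ROOT` variable, and the `/opt/sra-human-scrubber` prefix are all hypothetical and exist only for illustration; the thread confirms only that a `-d` switch exists, not how it is implemented.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: resolve_db and SCRUBBER_ROOT are hypothetical
# names, not part of sra-human-scrubber.

resolve_db() {
    # Default: DB lives under the install location (assumed prefix).
    local root="${SCRUBBER_ROOT:-/opt/sra-human-scrubber}"
    local db="$root/data/human_filter.db"
    local opt OPTIND=1

    # Leading ':' puts getopts in silent mode so errors are handled here.
    while getopts ":d:" opt; do
        case "$opt" in
            d)  db="$OPTARG" ;;   # user-supplied DB path overrides the default
            :)  echo "option -$OPTARG requires a path" >&2; return 2 ;;
            \?) echo "unknown option -$OPTARG" >&2; return 2 ;;
        esac
    done

    printf '%s\n' "$db"
}

# Example: DB kept on writable or bind-mounted storage outside the install tree.
resolve_db -d /scratch/human_filter.db
```

With this pattern nothing under the install directory needs to be modified, so it also works when the tool is shipped inside a read-only container image and the DB sits on storage outside it.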