ncbi / sra-human-scrubber

An SRA tool that takes as input a local fastq file from a clinical infection sample, identifies and removes any significant human reads, and outputs the edited (cleaned) fastq file, which can safely be used for SRA submission.

Option to specify the number of cores used? #9

Closed rpetit3 closed 2 years ago

rpetit3 commented 3 years ago

Hello!

Currently it seems like sra-human-scrubber uses all available cores on the machine. Would it be possible to add an option (e.g. -c 4) to allow the user to set this value?

Asking because other users aren't too happy with me taking over all 72 cores on our shared machine.

Thank you! Robert

multikengineer commented 3 years ago

Robert, yes: as per https://hub.docker.com/r/ncbi/sra-human-scrubber it will use all cores available to it. Are you able to submit the job so that it only uses x cores?

rpetit3 commented 3 years ago

I am not; it's a shared system with no queue system.

multikengineer commented 3 years ago

Can you limit it using numactl?

rpetit3 commented 3 years ago

I cannot use numactl; the sysadmin said no.

Is the aligns_to binary included in sra-human-scrubber modified in any way? Unless I'm completely missing something, the source of aligns_to.cpp (https://github.com/ncbi/ngs-tools/blob/tax/tools/tax/src/aligns_to.cpp#L59-L61) seems to suggest it can accept a -num_threads parameter:

    Config config(argc, argv);
    if (config.num_threads > 0)
        omp_set_num_threads(config.num_threads);

But I'm getting "2021-07-16 15:29:43 unexpected argument: -num_threads" when I try it on the included aligns_to:

bin/aligns_to -db data/human_filter.db -num_threads 1 test/scrubber_test.fastq
2021-07-16 15:29:43     aligns_to version 0.60
2021-07-16 15:29:43     hardware threads: 16, omp threads: 16
need <database> [-spot_filter <spot or read file>] [-out <filename>] [-hide_counts] [-compact] [-unaligned_only] <contig fasta, accession or .list file of fasta/accessions>
where <database> is one of:
-db <database>
-dbs <database +tax>
-dbsm <database +taxes>
-dbss <sorted database +tax> -tax_list <tax_list file>
2021-07-16 15:29:43     unexpected argument: -num_threads

Haha, again though, I might totally be missing something.

Thank you! Robert

multikengineer commented 3 years ago

@rpetit3 That was added a month ago, so no, it is not in this version. I can update the binary and surface the option.

multikengineer commented 3 years ago

By the way @rpetit3, until I do: if you set the environment variable OMP_NUM_THREADS, e.g. export OMP_NUM_THREADS=2, that will limit the number of threads used.
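If exporting the variable for the whole shell session is not wanted, the same limit can be applied to a single command. The following is a minimal sketch, not part of sra-human-scrubber: run_capped is a hypothetical helper name, and it assumes the wrapped program honours OMP_NUM_THREADS, as aligns_to is shown to do in this thread.

```shell
#!/usr/bin/env bash
# Run one command with an OpenMP thread cap that applies to that command only.
# run_capped is a hypothetical helper name, not something shipped with the tool.
run_capped() {
    local threads="$1"
    shift
    # Reject anything that is not a positive integer.
    case "$threads" in
        ''|*[!0-9]*|0)
            echo "thread count must be a positive integer" >&2
            return 2
            ;;
    esac
    # A VAR=value prefix sets the variable in the environment of this one
    # command; the calling shell's environment is left unchanged.
    OMP_NUM_THREADS="$threads" "$@"
}

# Example, using the test invocation from this thread:
#   run_capped 2 ./scripts/scrub.sh test
```

Because nothing is exported, later commands in the same session are unaffected by the cap.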

export OMP_NUM_THREADS=2
./scripts/scrub.sh test
2021-07-16 14:46:01 aligns_to version 0.60
2021-07-16 14:46:01 hardware threads: 32, omp threads: 2
2021-07-16 14:46:03 loading time (sec) 1
2021-07-16 14:46:03 /tmp/tmp.sWWQjQk7Pf/scrubber_test.fastq.fasta
2021-07-16 14:46:03 100% processed
2021-07-16 14:46:03 total spot count: 2
2021-07-16 14:46:03 total read count: 2
2021-07-16 14:46:03 total time (sec) 1
1  spot(s) removed.

test succeeded

export OMP_NUM_THREADS=4
./scripts/scrub.sh test
2021-07-16 14:46:12 aligns_to version 0.60
2021-07-16 14:46:12 hardware threads: 32, omp threads: 4
2021-07-16 14:46:13 loading time (sec) 0
2021-07-16 14:46:13 /tmp/tmp.9py5CcdN3X/scrubber_test.fastq.fasta
2021-07-16 14:46:13 100% processed
2021-07-16 14:46:13 total spot count: 2
2021-07-16 14:46:13 total read count: 2
2021-07-16 14:46:13 total time (sec) 0
1  spot(s) removed.

test succeeded

rpetit3 commented 3 years ago

Oh, that works great! Thank you very much.

multikengineer commented 2 years ago

With version 2.0.0 you now have the ability to set threads (see -p option).
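On a shared machine it may help to cap the value passed to -p at the number of cores actually present. The following is a minimal sketch under assumptions: pick_threads is a hypothetical helper name, and only the -p flag itself comes from this thread. The remaining scrub.sh arguments are not shown here and should be taken from the tool's own usage message.

```shell
#!/usr/bin/env bash
# Print the requested thread count, capped at the number of online cores.
# pick_threads is a hypothetical helper name, not part of sra-human-scrubber.
pick_threads() {
    local requested="$1" available
    # getconf _NPROCESSORS_ONLN reports the online core count on Linux and
    # macOS; fall back to 1 if it is unavailable.
    available="$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)"
    if [ "$requested" -gt "$available" ]; then
        echo "$available"
    else
        echo "$requested"
    fi
}

# Example, assuming version 2.0.0 or later:
#   scrub.sh -p "$(pick_threads 4)" <your usual input/output arguments>
```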

rpetit3 commented 2 years ago

Congrats on the v2 release!