Closed hunglin59638 closed 6 months ago
Thanks!! Good catch on the noclue... I made a minor change without actually looking closely. I just pushed a fix; would it be possible to pull main?
I like this approach a lot, and the code is quite clean. My current hesitations are that the commands themselves are replicated in both the sbatch scripts and the Python code, and that the execution of the Python code is bound to a single job, which complicates the resource request, particularly for large jobs. Of those two, I'm more concerned about the duplication, as it would be very easy to, for example, change parameters in one place but not the other. That said, I'm not opposed to something generally like what's done here.
Do you have any thoughts (or desire) on how to at least address the duplication?
The duplication issue can be resolved by splitting the python code into subcommands, which can then be invoked by the sbatch scripts.
Usage: exhaustive.py [OPTIONS] COMMAND [ARGS]...
Options:
--help Show this message and exit.
Commands:
export-db Export staged database fasta
process-kmers Mapping kmer substrings to database
process-regions Process regions
filter-db Mask database fasta with contaminated regions and...
submit Run exhaustive cleaning (export-db -> process-kmers ->...
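As a rough illustration of the subcommand layout described above, here is a minimal sketch using the standard library's argparse. The subcommand names are taken from the help output; the `--base-label` option, the `run_step` helper, and its return value are hypothetical placeholders, and the actual script may well be built with a different CLI library.

```python
import argparse

# Step names taken from the help output above; everything else is illustrative.
STEPS = ["export-db", "process-kmers", "process-regions", "filter-db"]


def run_step(name, base_label):
    # Hypothetical placeholder: a real step would build and run its command
    # here, so the command is defined in exactly one place.
    return f"{name}:{base_label}"


def build_parser():
    parser = argparse.ArgumentParser(prog="exhaustive.py")
    sub = parser.add_subparsers(dest="command", required=True)
    for name in STEPS + ["submit"]:
        step = sub.add_parser(name)
        # Hypothetical option, shared by every subcommand in this sketch.
        step.add_argument("--base-label", required=True)
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    if args.command == "submit":
        # "submit" chains the individual steps in order.
        return [run_step(name, args.base_label) for name in STEPS]
    return [run_step(args.command, args.base_label)]
```

With a layout like this, an sbatch script only needs to invoke one subcommand, so parameters would not have to be kept in sync between the shell and Python sides.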
Additionally, I want to be able to run exhaustive from a Python script because some users' PCs (like mine) may not have Slurm installed and so cannot use the sbatch command.
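One way to support machines both with and without Slurm is to check for sbatch at runtime and fall back to a plain local invocation. The sketch below is an assumption about how that could look, not what the PR does; `build_step_command` and `slurm_available` are hypothetical helpers.

```python
import shlex
import shutil
import sys


def slurm_available():
    # True when an sbatch executable is found on PATH.
    return shutil.which("sbatch") is not None


def build_step_command(step, use_slurm):
    # Hypothetical helper: the same subcommand is used either way, so only
    # the launcher differs between a cluster and a local PC.
    command = [sys.executable, "exhaustive.py", step]
    if use_slurm:
        # sbatch --wrap submits a command string as a batch job.
        return ["sbatch", f"--wrap={shlex.join(command)}"]
    return command
```

The returned list could be passed to `subprocess.run`; resource requests for the sbatch case are left out here because they depend on the cluster.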
Hi @hunglin59638, this is really cool, and I agree quite helpful -- thank you so much for your interest here. I'd like to test it out locally before merging; it may be a few days before I can do so.
Hi @hunglin59638 @wasade, I am trying to implement this method. I locally merged this pull request and followed the demo mentioned above. But I don't see all the options mentioned in the first section, and when I run exhaustive.py, I get the following error.
python3 exhaustive.py --base_label demo --db_fna mito.fna --db_bt2 mito-bt2 --files genomes.txt
Usage: exhaustive.py [OPTIONS] COMMAND [ARGS]...
Try 'exhaustive.py --help' for help.
Error: No such option: --base_label
When I execute python3 exhaustive.py --help, this is what I get.
Usage: exhaustive.py [OPTIONS] COMMAND [ARGS]...
Options:
--help Show this message and exit.
Commands:
export-db Export staged database fasta
process-kmers Mapping kmer substrings to database
process-regions Process regions
filter-db Mask database fasta with contaminated regions and...
submit Run exhaustive cleaning (export-db -> process-kmers ->...
@rohitfarmer
To address the duplication issue, the Python code has been divided into subcommands.
Please type python3 exhaustive.py submit --help to view the available options.
Hi! I have a basic question about using this pipeline. Do I need to create the contaminant database bt2 files for each sample (read1 and read2) I am trying to run through the pipeline? Thanks in advance.
Sorry for such a delay, thanks @hunglin59638!
I have reviewed the sbatch scripts and merged them into a Python script (exhaustive.py) for ease of use. Additionally, I have addressed some bugs in filter_fna.py and trim-to-max-length.py.
Bug fix
filter_fna.py
Fixed a bug where providing a fasta file containing header descriptions would raise a KeyError from coordinates[id_]. This occurred, for example, when encountering headers like:
>NC_060925.1 Homo sapiens isolate CHM13 chromosome 1, alternate assembly T2T-CHM13v2.0
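To illustrate the kind of mismatch involved, the sketch below reduces a header to its bare sequence ID before the dictionary lookup. This is an assumption about the failure mode, not necessarily how filter_fna.py fixes it; `sequence_id` and the sample `coordinates` contents are hypothetical.

```python
def sequence_id(header_line):
    # Drop the leading ">" and keep only the text before the first whitespace,
    # so a trailing description does not become part of the lookup key.
    return header_line.lstrip(">").split(None, 1)[0]


header = ">NC_060925.1 Homo sapiens isolate CHM13 chromosome 1, alternate assembly T2T-CHM13v2.0"

# Hypothetical coordinates keyed by bare sequence ID.
coordinates = {"NC_060925.1": [(0, 100)]}

# Looking up the full header text would raise KeyError; the bare ID works.
regions = coordinates[sequence_id(header)]
```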
trim-to-max-length.py
Resolved a NameError where the variable noclue was referenced but not defined. As it appeared unnecessary, it has been removed from the script.
Python script
Help message
Demo
Output files