Based on the BlobToolkit Snakemake pipeline, convert each of the following sub-pipelines into a Nextflow subworkflow:
“minimap.smk - align reads to the genome assembly using minimap2.”
“windowmasker.smk - identify and mask repetitive regions using WindowMasker. Masked sequences are used in all BLAST searches.”
“chunk_stats.smk - calculate sequence statistics in 1 kb windows for each contig.”
“busco.smk - run BUSCO using specific and basal lineages. Count BUSCOs in 1 kb windows for each contig.”
“cov_stats.smk - calculate coverage in 1 kb windows using mosdepth.”
“window_stats.smk - aggregate 1 kb values into windows of fixed proportion (10% and 1% of contig length) and fixed length (100 kb and 1 Mb).”
“diamond_blastp.smk - DIAMOND blastp search of BUSCO gene models for the basal lineages (archaea_odb10, bacteria_odb10 and eukaryota_odb10) against the UniProt reference proteomes.”
“diamond.smk - DIAMOND blastx search of assembly contigs against the UniProt reference proteomes. Contigs are split into chunks to allow distribution-based taxrules. Contigs over 1 Mb are subsampled by retaining only the most BUSCO-dense 100 kb region from each chunk.”
“blastn.smk - NCBI blastn search, against the NCBI nt database, of assembly contigs that have no DIAMOND blastx match.”
“blobtools.smk - import analysis results into a BlobDir dataset.”
“view.smk - validate the BlobDir and generate static images.”
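For the minimap.smk conversion, the main decision a Nextflow process has to encode is which minimap2 preset matches the read type. The Python sketch below only assembles that command line. The read-type labels and the function name `minimap2_command` are hypothetical. The `-a`, `-x` and `-t` flags and the `map-ont`, `map-pb`, `map-hifi` and `sr` presets are standard minimap2 options, though `map-hifi` needs a recent minimap2 release.

```python
from typing import List, Sequence

# Hypothetical read-type labels mapped to real minimap2 presets.
PRESETS = {
    "ont": "map-ont",     # Oxford Nanopore reads
    "clr": "map-pb",      # PacBio CLR reads
    "hifi": "map-hifi",   # PacBio HiFi reads (recent minimap2 only)
    "illumina": "sr",     # short reads
}


def minimap2_command(assembly: str, reads: Sequence[str], read_type: str,
                     threads: int = 4) -> List[str]:
    """Build a minimap2 argument list that writes SAM (-a) to stdout."""
    if read_type not in PRESETS:
        raise ValueError(f"unsupported read type: {read_type!r}")
    return ["minimap2", "-ax", PRESETS[read_type], "-t", str(threads),
            assembly, *reads]


print(" ".join(minimap2_command("assembly.fa", ["reads.fastq.gz"], "hifi")))
# prints: minimap2 -ax map-hifi -t 4 assembly.fa reads.fastq.gz
```

In a Nextflow process, the SAM stream would typically be piped into `samtools sort` to produce the sorted, indexed BAM that mosdepth needs downstream.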
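The chunk_stats.smk description says "sequence statistics in 1 kb windows" but does not say which statistics are computed. This minimal Python sketch of the windowing logic assumes GC proportion and N count as representative examples. The function name `chunk_stats` and the output keys are hypothetical.

```python
from typing import List


def chunk_stats(seq: str, window: int = 1000) -> List[dict]:
    """Return per-window coordinates, N count and GC proportion for one contig.

    The last window is shorter than `window` when the contig length is not an
    exact multiple of the window size.
    """
    stats = []
    seq = seq.upper()  # treat soft-masked (lowercase) bases like any other
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        acgt = gc + chunk.count("A") + chunk.count("T")
        stats.append({
            "start": start,
            "end": start + len(chunk),
            "n_count": chunk.count("N"),
            # GC over unambiguous bases only; None when the window has none
            "gc": gc / acgt if acgt else None,
        })
    return stats
```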
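For the BUSCO counting step in busco.smk, one simple rule is to assign each BUSCO to the 1 kb window that contains its start coordinate. That rule is an assumption made for illustration, and the pipeline may place genes differently (for example by midpoint or overlap). The sketch also assumes 0-based start coordinates. `busco_window_counts` is a hypothetical name.

```python
from typing import Iterable, List


def busco_window_counts(contig_length: int, busco_starts: Iterable[int],
                        window: int = 1000) -> List[int]:
    """Count BUSCOs per window, assigning each gene to the window holding its start."""
    n_windows = -(-contig_length // window)  # ceiling division
    counts = [0] * n_windows
    for start in busco_starts:
        if 0 <= start < contig_length:  # ignore coordinates outside the contig
            counts[start // window] += 1
    return counts
```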
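cov_stats.smk relies on mosdepth. When mosdepth is run with `--by 1000`, it writes the mean depth of each 1 kb window to `<prefix>.regions.bed.gz` as tab-separated chrom, start, end and mean columns. The following sketch reads that file into per-contig lists. It assumes that each contig's rows appear in coordinate order. `read_mosdepth_regions` is a hypothetical helper.

```python
import gzip
from collections import defaultdict
from typing import Dict, List


def read_mosdepth_regions(path: str) -> Dict[str, List[float]]:
    """Parse a mosdepth *.regions.bed.gz file into per-contig window depths.

    Assumes the chrom, start, end, mean-depth layout written with --by.
    """
    depths: Dict[str, List[float]] = defaultdict(list)
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if not line.strip():
                continue
            chrom, _start, _end, mean = line.rstrip("\n").split("\t")[:4]
            depths[chrom].append(float(mean))
    return dict(depths)
```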
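window_stats.smk merges the 1 kb values into larger windows. The sketch below uses a length-weighted mean as the aggregate. It converts a proportion such as 10% or 1% of contig length into a window size that is rounded to whole 1 kb chunks, with a minimum of one chunk. The rounding rule, the choice of aggregate and the function names are all assumptions, not the pipeline's documented behavior.

```python
from typing import Dict, List, Tuple

Chunk = Tuple[int, int, float]  # (start, end, value) for one 1 kb window


def proportional_window(contig_length: int, fraction: float, chunk: int = 1000) -> int:
    """Window size for a fixed proportion, rounded to whole chunks (minimum one chunk)."""
    return max(chunk, round(contig_length * fraction / chunk) * chunk)


def aggregate_windows(chunks: List[Chunk], window: int) -> List[Chunk]:
    """Merge 1 kb chunks into `window`-bp windows using a length-weighted mean."""
    bins: Dict[int, Tuple[int, int, float, int]] = {}
    for start, end, value in chunks:
        idx = start // window  # output window this chunk falls into
        b_start, b_end, weighted, length = bins.get(idx, (start, end, 0.0, 0))
        span = end - start
        bins[idx] = (min(b_start, start), max(b_end, end),
                     weighted + value * span, length + span)
    return [(s, e, w / n) for s, e, w, n in (bins[i] for i in sorted(bins))]
```

For the fixed-length windows, `window` would be 100_000 or 1_000_000. For the proportional windows, it would be `proportional_window(length, 0.1)` or `proportional_window(length, 0.01)`. Count-type values such as BUSCO totals would be summed rather than averaged.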
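The subsampling rule in diamond.smk can be expressed as a sliding-window maximum over per-1 kb BUSCO counts. The 1 Mb threshold and the 100 kb region size come from the description. The chunk size is not stated, so `chunk_windows=1000` (1,000 windows, i.e. 1 Mb) is a hypothetical default. Ties go to the earliest region, which is also an assumption. Both function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def densest_region(counts: Sequence[int], region: int = 100) -> Tuple[int, int]:
    """Return (start, end) window indices of the `region`-window stretch with the
    most BUSCOs; the earliest stretch wins ties."""
    if len(counts) <= region:
        return 0, len(counts)
    best = current = sum(counts[:region])
    best_start = 0
    for start in range(1, len(counts) - region + 1):
        # slide by one window: add the entering count, drop the leaving one
        current += counts[start + region - 1] - counts[start - 1]
        if current > best:
            best, best_start = current, start
    return best_start, best_start + region


def subsample_contig(counts: Sequence[int], contig_length: int,
                     chunk_windows: int = 1000, region: int = 100,
                     min_length: int = 1_000_000,
                     window: int = 1000) -> List[Tuple[int, int]]:
    """Return bp intervals to search: the whole contig if it is not longer than
    `min_length`, otherwise the densest region from each chunk."""
    if contig_length <= min_length:
        return [(0, contig_length)]
    intervals = []
    for offset in range(0, len(counts), chunk_windows):
        s, e = densest_region(counts[offset:offset + chunk_windows], region)
        intervals.append(((offset + s) * window,
                          min((offset + e) * window, contig_length)))
    return intervals
```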
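blastn.smk only searches the contigs that DIAMOND blastx left without a match. If the blastx output is tabular and its first column (qseqid) is the contig ID, selecting those contigs is a set difference. The sketch assumes that layout. If the chunked queries carry a chunk suffix in their IDs, that suffix would have to be stripped first. Both helper names are hypothetical.

```python
from typing import Iterable, List


def fasta_ids(lines: Iterable[str]) -> List[str]:
    """Return sequence IDs (first word after '>') from FASTA lines."""
    return [line[1:].split()[0] for line in lines if line.startswith(">")]


def contigs_without_hits(contig_ids: Iterable[str], hit_lines: Iterable[str]) -> List[str]:
    """Keep contigs, in input order, whose ID never appears as a qseqid."""
    hits = {line.split("\t", 1)[0] for line in hit_lines if line.strip()}
    return [cid for cid in contig_ids if cid not in hits]
```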