sanger-tol / blobtoolkit

Nextflow pipeline for BlobToolKit for Sanger ToL production suite
https://pipelines.tol.sanger.ac.uk/blobtoolkit
MIT License

nf-core pipeline: blobtoolkit #9

Closed alxndrdiaz closed 2 years ago

alxndrdiaz commented 2 years ago

Based on the BlobToolKit Snakemake pipeline, convert each sub-pipeline into a Nextflow subworkflow:

  1. “minimap.smk - align reads to the genome assembly using minimap2”.

  2. “windowmasker.smk - identify and mask repetitive regions using Windowmasker. Masked sequences are used in all blast searches”.

  3. “chunk_stats.smk - calculate sequence statistics in 1kb windows for each contig”.

  4. “busco.smk - run BUSCO using specific and basal lineages. Count BUSCOs in 1kb windows for each contig”.

  5. “cov_stats.smk - calculate coverage in 1kb windows using mosdepth”.

  6. “window_stats.smk - aggregate 1kb values into windows of fixed proportion (10%, 1% of contig length) and fixed length (100kb, 1Mb)”.

  7. “diamond_blastp.smk - Diamond blastp search of busco gene models for basal lineages (archaea_odb10, bacteria_odb10 and eukaryota_odb10) against the UniProt reference proteomes”.

  8. “diamond.smk - Diamond blastx search of assembly contigs against the UniProt reference proteomes. Contigs are split into chunks to allow distribution-based taxrules. Contigs over 1Mb are subsampled by retaining only the most BUSCO-dense 100 kb region from each chunk”.

  9. “blastn.smk - NCBI blastn search of assembly contigs with no Diamond blastx match against the NCBI nt database”.

  10. “blobtools.smk - import analysis results into a BlobDir dataset”.

  11. “view.smk - BlobDir validation and static image generation”.
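
To make the windowing in items 3 and 6 concrete, here is a minimal Python sketch of the idea: compute a per-contig statistic in fixed 1kb windows, then aggregate those values into larger fixed-length windows. This is only an illustration, not the code used by the Snakemake or Nextflow pipeline. The function names `gc_in_windows` and `aggregate_windows` are hypothetical, GC fraction is used purely as an example statistic, and the aggregation is a plain mean of window values that does not weight a shorter final window.

```python
def gc_in_windows(seq, window=1000):
    """Return the GC fraction of each consecutive `window`-sized slice of `seq`.

    The final slice may be shorter than `window` if the contig length
    is not an exact multiple of the window size.
    """
    stats = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window].upper()
        gc = chunk.count("G") + chunk.count("C")
        stats.append(gc / len(chunk))
    return stats


def aggregate_windows(values, factor):
    """Average consecutive groups of `factor` values.

    For example, factor=100 turns 1kb values into 100kb values.
    This is an unweighted mean (an assumption made for simplicity).
    """
    out = []
    for i in range(0, len(values), factor):
        group = values[i:i + factor]
        out.append(sum(group) / len(group))
    return out


# Toy contig: 1kb of pure GC followed by 1kb of pure AT.
contig = "GC" * 500 + "AT" * 500
per_kb = gc_in_windows(contig, window=1000)
per_2kb = aggregate_windows(per_kb, factor=2)
```

Fixed-proportion windows (10% or 1% of contig length) could follow the same pattern by deriving `factor` from the number of 1kb values for each contig, though how the real pipeline handles rounding is not covered here.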