Open kedhammar opened 3 months ago
For WGS data intended for assembly, GenomeScope is an option (https://github.com/nf-core/modules/blob/master/modules/nf-core/genomescope2/main.nf). Its k-mer database is built using Meryl (also on nf-core).
There is also a container-only version that's a little bit faster and has extra tools that might be useful (https://github.com/nf-core/modules/blob/master/modules/nf-core/genescopefk/main.nf). The databases for Merquryfk/KATGC, Merquryfk/KATCOMP, Merqury/Ploidyplot, and GeneScopefk are built using FastK.
Preseq complexity (which subtool?).
I've used preseq lc_extrap before and there's a module for it in nf-core (https://nf-co.re/modules/preseq_lcextrap). However, it is very prone to failing, or rather to refusing to give a complexity estimate.
Another option would be Picard (https://gatk.broadinstitute.org/hc/en-us/articles/360037591931-EstimateLibraryComplexity-Picard). I've never used it: for the application where I worry about library complexity (Hi-C), the tool I use (pairtools) implements its own complexity estimate, so I have no need. There's no nf-core module for it as far as I can see.
@remiolsen any idea why preseq lc_extrap tends to refuse?
I'm fairly certain this is the error I used to see most commonly. Quoting from the preseq manual:
Q: When running lc_extrap, I receive the error
ERROR: too many iterations, poor sample
A: Most commonly this is due to the presence of defects in the approximation which cause the estimates to be unstable. Setting the step size larger (with the flag -s) will help to avoid the defects. The default step size is 1M reads or 0.05% of the input sample size rounded up to the nearest million, whichever is larger. A consequence of this action will be a reduction in the observed smoothness of the curve.
And setting the -s step flag was a bit hit or miss as to whether it actually worked.
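To make the step-size rule from the manual quote concrete, here is a minimal Python sketch. It computes the default step as described in the quote (the larger of 1M reads and 0.05% of the sample size rounded up to the nearest million) and builds a series of preseq lc_extrap commands with progressively larger -s values to retry after a "too many iterations" failure. The doubling schedule, the function names, and the file names are assumptions for illustration only. Only -s comes from the quote above; the -o output flag is written from memory and should be checked against the preseq help text.

```python
def default_step_size(n_reads: int) -> int:
    """Default step per the manual quote: max(1M, 0.05% of n_reads rounded up to 1M)."""
    million = 1_000_000
    fraction = -(-n_reads // 2000)               # ceil(0.05% of n_reads)
    rounded = -(-fraction // million) * million  # round up to the nearest million
    return max(million, rounded)


def step_schedule(n_reads: int, attempts: int = 4) -> list[int]:
    """Hypothetical retry schedule: start at the default step, double each attempt."""
    base = default_step_size(n_reads)
    return [base * 2**i for i in range(attempts)]


def lc_extrap_command(input_path: str, output_path: str, step: int) -> list[str]:
    """Build an argument list for preseq lc_extrap (flags assumed, verify locally)."""
    return ["preseq", "lc_extrap", "-s", str(step), "-o", output_path, input_path]


if __name__ == "__main__":
    # Placeholder read count and file names, not taken from any real run.
    for step in step_schedule(5_000_000_000, attempts=3):
        print(" ".join(lc_extrap_command("sample.sorted.bed", "sample.lc_extrap.txt", step)))
```

Each command could be passed to subprocess.run in turn, stopping at the first one that exits successfully. As noted above, a larger step is not guaranteed to help, and it makes the resulting curve coarser.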
Closed https://github.com/nf-core/seqinspector/pull/6 due to being too broad and unspecific. Feel free to start new PRs addressing more specific implementations.
Functionalities and modules
Mentioned in the pipeline proposal
primaryQC_pipeline_proposal.pdf
Pipeline proposal Slack thread
Standard QC
seqkit watch
seqfu check
seqfu metadata
seqfu merge
Duplication + Complexity
Adapter and Artifact detection
[ ] Fastp
[ ] BBtools
Contamination detection
[ ] FastQ screen
[ ] Sylph
[ ] Kraken2
[ ] Mapping to reference
Mentioned in the pipeline Slack channel