tkchafin opened 2 months ago
Shane first runs lima against a database of ULI adapters, and then runs pbmarkdup.
For ULI data, we need to run an extra lima step to trim the ULI adapter sequence:
https://github.com/sanger-tol/tol-workflows/blob/main/wr/wr-import-pacbio-ccs#L323-L338
Do we need lima here too?
For Sanger data, this will already have been done (in fact, duplicate marking/removal is done as well), so technically I think we can treat ULI reads the same as LI and other prep types for production purposes.
For full ULI support with external data, special handling of adapter trimming makes sense, although the pipeline as-is generally assumes that most read filtering/QC has been done before it is run. Maybe we could think about adding an optional subworkflow to take in raw data?
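To make the proposed ordering concrete, here is a minimal sketch of how step selection might look if an optional raw-data path were added. It mirrors the behaviour discussed in this thread (pbmarkdup for every ULI library, plus a preceding lima adapter-trimming step only when the input is raw). The function name `preprocessing_steps` and the `raw_input` flag are hypothetical and are not existing pipeline parameters; the lowercase comparison of the library value is also an assumption.

```python
from typing import List


def preprocessing_steps(library: str, raw_input: bool) -> List[str]:
    """Return the ordered tool names to run before alignment.

    Hypothetical sketch: `raw_input` stands in for the proposed optional
    raw-data subworkflow and is not a real pipeline option.
    """
    # Assumption: library values are compared case-insensitively.
    is_uli = library.strip().lower() == "uli"
    steps: List[str] = []
    if raw_input and is_uli:
        # Raw ULI reads still carry the ULI adapter, so trim it with lima first.
        steps.append("lima")
    if is_uli:
        # Duplicate marking with pbmarkdup runs after any adapter trimming.
        steps.append("pbmarkdup")
    return steps
```

For example, `preprocessing_steps("uli", raw_input=True)` yields `["lima", "pbmarkdup"]`, while already-processed ULI data yields only `["pbmarkdup"]` and non-ULI libraries yield no extra steps. Keeping lima behind a separate flag reflects the point above that the pipeline otherwise assumes filtering/QC was done upstream.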
@reichan1998, can you review? I am tracking the lima/adapter-removal suggestion in a separate ticket on pre-alignment QC, but for now we can merge the pbmarkdup integration if it is all working.
Ultra-low-input (ULI) libraries (tracked in the "library" samplesheet column) will now be run through pbmarkdup. Note that nothing is removed in the test file, but I have marked the PacBio CRAM as "uli" to trigger the test.
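The routing described above can be sketched as a split of samplesheet rows on the "library" column. This is an illustration only, not the pipeline's Nextflow code: the function name `split_by_library` is hypothetical, the column names other than "library" in the usage example are assumed for illustration, and treating the value case-insensitively (and treating empty values as non-ULI) is an assumption.

```python
import csv
import io
from typing import Dict, List, Tuple

Row = Dict[str, str]


def split_by_library(samplesheet_csv: str) -> Tuple[List[Row], List[Row]]:
    """Split samplesheet rows into (uli_rows, other_rows) by the "library" column.

    Hypothetical sketch: ULI rows would be sent through pbmarkdup,
    all other rows would skip it.
    """
    uli_rows: List[Row] = []
    other_rows: List[Row] = []
    for row in csv.DictReader(io.StringIO(samplesheet_csv)):
        # Missing or empty library values are treated as non-ULI (assumption).
        library = (row.get("library") or "").strip().lower()
        (uli_rows if library == "uli" else other_rows).append(row)
    return uli_rows, other_rows


# Usage with an assumed samplesheet layout (columns other than "library" are illustrative).
sheet = (
    "sample,datatype,datafile,library\n"
    "A,pacbio,a.pacbio.cram,uli\n"
    "B,hic,b.hic.cram,\n"
)
uli, other = split_by_library(sheet)
```

Here sample `A` lands in the pbmarkdup branch and sample `B` does not, which is the same mechanism the test relies on when the PacBio CRAM is marked as "uli".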
Closes https://github.com/sanger-tol/readmapping/issues/72
PR checklist
- nf-core lint
- nextflow run . -profile test,docker --outdir <OUTDIR>
- docs/usage.md is updated.
- docs/output.md is updated.
- CHANGELOG.md is updated.
- README.md is updated (including new tool citations and authors/contributors).