Added branched handling of ULI inputs in filter_pacbio

tkchafin commented 2 months ago

Ultra low-input (ULI) libraries (tracked in the "library" samplesheet column) will now be run through pbmarkdup. Note that nothing is removed from the test file, but I have marked the PB CRAM as "uli" to trigger the test.
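To illustrate the branching idea, here is a minimal sketch in Python rather than the pipeline's Nextflow code. It is not the implementation in this PR. It assumes ULI rows carry the value "uli" in the "library" column, as in the test file; the other column names ("sample", "datafile") and the function name are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def branch_by_library(rows: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Split samplesheet rows into (uli, other) using the "library" column."""
    uli: List[Row] = []
    other: List[Row] = []
    for row in rows:
        # Assumption: ULI libraries are flagged as "uli" (compared case-insensitively).
        if row.get("library", "").strip().lower() == "uli":
            uli.append(row)    # would go on to duplicate marking (pbmarkdup)
        else:
            other.append(row)  # would follow the existing filtering path
    return uli, other


# Hypothetical samplesheet rows, for illustration only.
rows = [
    {"sample": "s1", "datafile": "a.cram", "library": "uli"},
    {"sample": "s2", "datafile": "b.cram", "library": ""},
]
uli_rows, other_rows = branch_by_library(rows)
```

Rows with a missing or empty "library" value fall through to the default branch, so existing samplesheets would behave as before under this assumption.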

Closes https://github.com/sanger-tol/readmapping/issues/72

PR checklist

[ ] This comment contains a description of changes (with reason).
[ ] If you've fixed a bug or added code that should be tested, add tests!
[ ] If you've added a new tool, have you followed the pipeline conventions in the contribution docs?
[ ] Make sure your code lints (nf-core lint).
[ ] Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
[ ] Usage Documentation in docs/usage.md is updated.
[ ] Output Documentation in docs/output.md is updated.
[ ] CHANGELOG.md is updated.
[ ] README.md is updated (including new tool citations and authors/contributors).

muffato commented 2 months ago

Shane first runs lima on a database of ULI adapters, and then pbmarkdup

For ULI data, we need to run an extra lima to trim the ULI adapter sequence.

https://github.com/sanger-tol/tol-workflows/blob/main/wr/wr-import-pacbio-ccs#L323-L338

Do we need lima here too?

tkchafin commented 2 months ago

For Sanger data, this will already have been done (in fact, duplicate marking/removal is done as well), so technically I think we can treat ULI reads the same as LI/other prep types for production purposes.

For full ULI support for external data, special handling of adapter trimming makes sense, although the pipeline as-is generally assumes that most read filtering/QC has been done before running. Maybe we could think about adding an optional subworkflow to take in raw data?

tkchafin commented 2 months ago

@reichan1998 Can you review? I am tracking the lima/adapter removal suggestion in a separate ticket on pre-alignment QC, but for now we can merge the pbmarkdup integration if it is all working.

sanger-tol / readmapping

Added branched handling of ULI inputs in filter_pacbio #115
