nf-core / scrnaseq

A single-cell RNAseq pipeline for 10X genomics data
https://nf-co.re/scrnaseq
MIT License

Flexible input read format with STARsolo with --soloBarcodeReadLength #307

Closed by wblashka 4 months ago

wblashka commented 4 months ago

Description of feature

The pipeline already handles a good range of library chemistries, but I believe the STARsolo aligner is being unnecessarily restricted. It is not uncommon for sequencing done with 10X library chemistries to have the forward read sequenced much longer than the intended 26 or 28 nucleotides (for example, 150 nt). As I understand it, CellRanger trims this automatically. Telling STARsolo which version of 10X chemistry is in use works well when the reads are of typical length, but in these situations STAR crashes because of the unexpected read length. The STAR documentation explains that this can be handled by specifying --soloBarcodeReadLength, along with the UMI and barcode locations if they differ, but as far as I can tell the current pipeline does not support passing these arguments.

It looks like this feature was considered alongside a few unrelated features in PR #141, but after reading through the discussion I can't determine why it was never added.

I believe this feature, or some way of passing these parameters, should be included.

wblashka commented 4 months ago

I have since discovered that additional arguments can be passed to specific modules through ext.args in the config files, which makes this request unnecessary. Thanks to the community for the help!
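
For anyone landing here with the same problem, a minimal sketch of such a config follows. It is not taken from the pipeline itself. The file name `custom.config` and the process selector `STAR_ALIGN` are assumptions, so check the process name against the pipeline's `conf/modules.config`. The STAR flags are from the STAR manual, where `--soloBarcodeReadLength 0` means the barcode read length is not checked.

```groovy
// custom.config (hypothetical file name)
// Sketch only: the selector 'STAR_ALIGN' is an assumed process name;
// confirm it in the pipeline's conf/modules.config before use.
process {
    withName: 'STAR_ALIGN' {
        // Assumption: assigning ext.args here replaces any default ext.args
        // the pipeline sets for this process, so copy those defaults into
        // this list if you still need them.
        ext.args = [
            // 0 = barcode read length not defined, do not check it.
            // This lets STARsolo accept a barcode read longer than CB + UMI.
            '--soloBarcodeReadLength 0'
            // If the barcode/UMI layout differs from what the pipeline sets
            // for your chemistry, STAR also accepts --soloCBstart,
            // --soloCBlen, --soloUMIstart and --soloUMIlen.
        ].join(' ')
    }
}
```

The file can then be supplied with Nextflow's `-c` option, for example `nextflow run nf-core/scrnaseq -c custom.config` followed by your usual parameters.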