Open ChristopherMancuso opened 6 months ago
I would have totally expected both alevin and star to only use the number of nucleotides from R1 that is specified in the chemistry definition. In this case you run with `--protocol 10XV3`, which should lead to the following arguments passed to the tools:
@rob-p, is there additional trimming needed before running simpleaf?
@grst Thanks for taking the time to comment! I'm a nextflow novice, but I tried to follow how adding `--protocol 10XV3` to the main run command changed the aligner options in the assets folder too, and it looked like it should work, but it didn't for the reasons above. I did enough googling and I think if I downloaded alevin or star as standalone software I could get it to work, but to be honest I like nextflow so much I'll just take the speed hit and use nextflow with cell ranger if this problem persists.
One extra thing: I did try this pipeline on the test data for every aligner and it works just fine. I manually checked the fastq files used in the test run, and R1 there was already "pre-trimmed" to just the UMIs and barcodes.
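For anyone who wants to repeat that manual check, here is a minimal sketch in Python that tallies the read lengths in the first records of a FASTQ file. The function name `read_length_counts` and the `max_reads` parameter are made up for illustration, and the sketch assumes plain four-line FASTQ records (optionally gzipped); it is not part of the pipeline.

```python
import gzip
from collections import Counter
from itertools import islice


def read_length_counts(fastq_path, max_reads=10000):
    """Count sequence lengths among the first `max_reads` FASTQ records.

    Assumes four lines per record (header, sequence, '+', quality).
    """
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    counts = Counter()
    with opener(fastq_path, "rt") as handle:
        # The sequence is the second line of each record: line indices 1, 5, 9, ...
        for seq in islice(handle, 1, max_reads * 4, 4):
            counts[len(seq.rstrip("\r\n"))] += 1
    return counts
```

Calling `read_length_counts("sample_R1.fastq.gz")` (a hypothetical file name) on an untrimmed R1 like the one described in this issue should report a single length of 151, while a pre-trimmed 10x v3 R1 should report 28.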
This should totally be supported. For me the question is if we can make alevin/starsolo directly work with the untrimmed reads (and what would be the appropriate command line options for that) or if we'd require an additional step that hard-trims the reads to the required length.
I will try to look into this and post what I find out here.
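To make the "additional step that hard-trims the reads" option concrete, here is a rough sketch of what such a step could look like in plain Python. It is only an illustration under stated assumptions, not what the pipeline does: the function name `hard_trim_fastq` is hypothetical, the default of 28 bases assumes the 10x v3 layout (16 bp barcode plus 12 bp UMI, matching the "expected 28" in the STAR error), and the input is assumed to be four-line FASTQ, optionally gzipped.

```python
import gzip


def _open(path, mode):
    """Open a plain or gzipped text file depending on the file extension."""
    return gzip.open(path, mode) if str(path).endswith(".gz") else open(path, mode)


def hard_trim_fastq(in_path, out_path, keep=28):
    """Copy a FASTQ file, keeping only the first `keep` bases of every read.

    `keep=28` is an assumption for 10x v3 chemistry (16 bp barcode + 12 bp UMI).
    Returns the number of records written.
    """
    n_records = 0
    with _open(in_path, "rt") as src, _open(out_path, "wt") as dst:
        while True:
            header = src.readline()
            if not header:
                break  # end of file
            seq = src.readline().rstrip("\r\n")
            plus = src.readline().rstrip("\r\n")
            qual = src.readline().rstrip("\r\n")
            if len(seq) != len(qual):
                raise ValueError(f"sequence/quality length mismatch in {header.strip()}")
            # Trim sequence and quality string to the same length.
            dst.write(f"{header.rstrip()}\n{seq[:keep]}\n{plus}\n{qual[:keep]}\n")
            n_records += 1
    return n_records
```

Only R1 would be trimmed this way; R2 carries the cDNA and should be left untouched so the read order and count stay paired. For example, `hard_trim_fastq("sample_R1.fastq.gz", "sample_R1.trimmed.fastq.gz")` (hypothetical file names) would write a 28 bp R1 alongside the original.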
Description of feature
I'm following up on a slack post that I put out 2 months ago at https://nfcore.slack.com/archives/CHN5BV5DW/p1712178056321859
I had a question about the fastq format needed for the different aligners. For everything I'm using nextflow `v23.10.1` and scrnaseq `v2.5.1`. From the core at my workplace, both the `R1` and `R2` fastq files have a read length of 151, instead of `R1` being "trimmed" to just the barcode and UMI (so 28-ish bp depending on the protocol). When using `--aligner cellranger` this seems to be handled fine. However, when only switching `--aligner` to either `alevin` or `star`, it doesn't seem to handle that `R1` read format well. For `alevin` the pipeline completes, but the number of barcodes in `barcodes.tsv` is ~200k, which is roughly the number of reads, whereas the expected number of cells is ~5k. For `star` the pipeline fails at `NFCORE_SCRNASEQ:SCRNASEQ:STARSOLO:STAR_ALIGN` with the error `EXITING because of FATAL ERROR in input read file: the total length of barcode sequence is 151 not equal to expected 28`. My questions are: is this known behavior of the pipeline? I would like to use `alevin` or `star` in the future; do I need to preprocess `R1`, and if so, could you help with how to do that? Thanks!
The run command I use looks like this, only changing the aligner argument: