Describe the bug
Getting the following error -

```
Correcting UB tags...
[1] "5.4e+08 Reads per chunk"
[1] "2024-01-28 15:27:24 CET"
[1] "Here are the detected subsampling options:"
[1] "Automatic downsampling"
[1] "Working on barcode chunk 1 out of 2"
[1] "Processing 403 barcodes in this chunk..."
[1] "Working on barcode chunk 2 out of 2"
[1] "Processing 265 barcodes in this chunk..."
Error in alldt[[i]][[1]] <- rbind(alldt[[i]][[1]], newdt[[i]][[1]]) :
more elements supplied than there are to replace
Calls: bindList
In addition: Warning messages:
1: In parallel::mclapply(mapList, function(tt) { :
all scheduled cores encountered errors in user code
2: In parallel::mclapply(mapList, function(tt) { :
all scheduled cores encountered errors in user code
Execution halted
```
Running this on Rackham, with single-end reads generated from SmartSeq3.
Some context - I used merge_demultiplexed_fastq.R to combine our ~600 samples, resulting in a 30 GB R1.fastq.gz file and a 5 GB index. I modified the STAR alignment code to run as a single instance with 20 threads.
The generated filtered.Aligned.GeneTagged.sorted.bam had a few reads with negative positions, so I removed those reads from the BAM file and re-indexed it. The run then proceeded until it hit the error above. A test with a small number of samples completed and produced the full output.
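In case it helps reproduce the issue, this is roughly the kind of filtering step I mean; a minimal sketch assuming Rsamtools is available (the keep_positive rule name and the output file name are placeholders, not something the pipeline itself uses):

```r
# Minimal sketch (assumes Rsamtools): keep only reads whose mapping position
# is present and positive, write a filtered BAM, and index it.
# "keep_positive" and the output file name are placeholders.
library(Rsamtools)

keep_positive <- FilterRules(list(
  pos_ok = function(x) !is.na(x$pos) & x$pos > 0
))

filterBam(
  file             = "filtered.Aligned.GeneTagged.sorted.bam",
  destination      = "filtered.Aligned.GeneTagged.sorted.fixed.bam",
  filter           = keep_positive,
  param            = ScanBamParam(what = "pos"),
  indexDestination = TRUE  # also writes the .bai for the filtered BAM
)
```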
For now, I am planning to split the input files into chunks, process them in batches of ~300 samples each, and then merge the generated count tables. Is that a viable option, or is it better to process the entire dataset together?
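If the chunked route is reasonable, the per-batch tables would be combined afterwards; below is a minimal sketch of that merge step, assuming each batch yields a gene-by-barcode sparse count matrix (merge_counts() and the batch object names are placeholders, not part of the pipeline):

```r
# Minimal sketch of merging per-batch count tables, assuming each batch
# produces a gene x barcode sparse matrix. merge_counts() and the object
# names below are placeholders.
library(Matrix)

merge_counts <- function(a, b) {
  genes <- union(rownames(a), rownames(b))
  # Pad each matrix with zero rows for genes it lacks, then align row order.
  pad <- function(m) {
    missing <- setdiff(genes, rownames(m))
    if (length(missing) > 0) {
      zeros <- Matrix(0, nrow = length(missing), ncol = ncol(m), sparse = TRUE,
                      dimnames = list(missing, colnames(m)))
      m <- rbind(m, zeros)
    }
    m[genes, , drop = FALSE]
  }
  # Barcodes differ between batches, so columns are simply concatenated.
  cbind(pad(a), pad(b))
}

# e.g. combined <- Reduce(merge_counts, list(batch1_counts, batch2_counts))
```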
cohort.yaml.txt