wenweixiong / MARVEL


BAM file generated by Cell Ranger can’t be read by STARsolo #37

Open Manoswini-02 opened 4 months ago

Manoswini-02 commented 4 months ago

Hi, I am trying to use MARVEL for splice junction analysis of 10x single-cell RNA-seq data. As described in the tutorial (https://wenweixiong.github.io/MARVEL_Droplet.html), I used Cell Ranger to generate the BAM file. Below is the command:

cellranger count --id=SRR21407780 \
     --transcriptome=/mnt/d/SingleCell/cellranger-7.2.0/refdata-gex-GRCh38-2020-A/ \
     --fastqs=/mnt/d/SplicingScRNA-Seq/QData/SRR21407780 \
     --sample=SRR21407780

There were some other errors, which were resolved when I indexed the reference genome with STAR (as suggested in another blog post) and generated the BAM file. When I pass the BAM file to STARsolo, it throws the error below:

ReadAlignChunk_processChunks.cpp:55:processChunks EXITING because of FATAL ERROR in input BAM file: the consecutive lines in paired-end BAM have different read IDs:
SRR21407784.62466614 vs SRR21407784.43082185
SOLUTION: fix BAM file formatting. Paired-end reads should be always consecutive lines, with exactly 2 lines per paired-end read
Mar 07 11:28:36 ...... FATAL ERROR, exiting
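The error says STAR expected paired-end records, so a quick sanity check is to see whether the BAM contains any records with the SAM "paired" flag bit (0x1) set. The following is a minimal sketch under that assumption; `count_paired` is a hypothetical helper written for illustration and is not part of STAR or samtools.

```shell
# Hypothetical helper: read SAM text on stdin and count alignment records
# whose FLAG (column 2) has the "paired" bit (0x1) set.
# Header lines (starting with "@") are skipped.
count_paired() {
  awk -F'\t' '!/^@/ { if (int($2) % 2 == 1) n++ } END { print n + 0 }'
}

# Possible usage (assumes samtools is installed; path taken from the command in this post):
#   samtools view -F 0x100 /mnt/d/SingleCell/cellranger-7.2.0/SRR21407784/outs/possorted_genome_bam.bam \
#     | head -n 100000 | count_paired
```

If this prints 0 for a large sample of records, the BAM holds single-end alignments, which would be consistent with STAR failing to find mate pairs on consecutive lines.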

Below is the command I used:

STAR --runThreadN 2 \
     --genomeDir /mnt/d/SingleCell/cellranger-7.2.0/refdata-gex-GRCh38-2020-A/star \
     --soloType CB_UMI_Simple \
     --readFilesIn /mnt/d/SingleCell/cellranger-7.2.0/SRR21407784/outs/possorted_genome_bam.bam \
     --readFilesCommand samtools view -F 0x100 \
     --readFilesType SAM PE \
     --soloInputSAMattrBarcodeSeq CR UR \
     --soloInputSAMattrBarcodeQual CY UY \
     --soloCBwhitelist /mnt/d/SingleCell/cellranger-7.2.0/lib/python/cellranger/barcodes/737K-august-2016.txt \
     --soloFeatures Gene SJ

To check whether the problem lies in the BAM file generated by Cell Ranger, I used STAR to generate a BAM file and passed it through the same command as above, but ended up with a similar error.

Can anyone please help me find out what exactly I am doing wrong here?

Note: The data were generated using Single Cell 3′ Reagent Kits v2 (10x Genomics), with a sequencing configuration of 26 base pairs (bp) on read 1 and 98 bp on read 2.
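For context on that read configuration: with the 3′ v2 chemistry, the 26 bp read 1 is generally described as a 16 bp cell barcode followed by a 10 bp UMI, while the 98 bp read 2 carries the cDNA. The sketch below only illustrates that assumed layout; the sequence is made up and the variable names are hypothetical.

```shell
# Made-up 26 bp read 1 sequence, for illustration only.
read1="AAACCTGAGAAACCATTTGCATGTCA"

# Assumed v2 layout: bases 1-16 = cell barcode, bases 17-26 = UMI.
cell_barcode=$(printf '%s' "$read1" | cut -c1-16)
umi=$(printf '%s' "$read1" | cut -c17-26)

echo "barcode=${cell_barcode} (${#cell_barcode} bp)"
echo "umi=${umi} (${#umi} bp)"
```

Under this layout, read 1 contains no transcript sequence, which may be relevant when deciding whether the aligned reads should be treated as single-end or paired-end.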