liulab-dfci / MAESTRO

Single-cell Transcriptome and Regulome Analysis Pipeline
GNU General Public License v3.0

Fix/long rna barcode reads #127

Closed ssmadha closed 3 years ago

ssmadha commented 3 years ago

This is a fix for the case where reads in the scRNA barcode fastq file are longer than barcode + UMI, which raises an error in STARsolo. The fix trims these reads down to the length of barcode + UMI.

crazyhottommy commented 3 years ago

Hi Shariq, this is nice. Would it be better to make it optional? Also, this makes me wonder why R1 is longer than barcode + UMI. Is it 5' scRNA-seq, which contains the template-switching oligos? We can discuss it in the meeting today.

ssmadha commented 3 years ago

I'll modify it to make it optional. I believe Novogene, where we usually sequence, just sequences 150 bp for both R1 and R2 in case we want more of R1, even with the 3' sequencing we use.

crazyhottommy commented 3 years ago

@baigal628 can you help to review and test this PR?

mudappathir commented 3 years ago

Hi,

I am facing this issue with long RNA barcode reads. Could you please suggest a fix?

ssmadha commented 3 years ago

Hello,

For now, you should be able to use the following code to trim down your fastq file so it will work in the program:

gunzip -c <original_barcode_file> | awk -v barlen=$((<barcode_length>+<umi_length>)) 'NR % 2 == 0 {print substr($0, 1, barlen); next} {print}' | gzip -c > <trimmed_barcode_file>

If you are using 10x v3 chemistry, your barcode_length and umi_length will likely be 16 and 12, respectively.
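
In case it helps, the trimming step above can also be wrapped in two small shell functions, with a quick check of the read lengths before and after. This is only a sketch: the function names are made up for illustration, and it assumes a gzipped fastq with the standard four lines per record. It trims by line position (every second line is a sequence or quality line) rather than by line content, since a quality line can itself start with @.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of MAESTRO. Assumes 4-line FASTQ records.

# Print the distinct read lengths seen in the first N records (default 1000).
read_lengths() {
    local fastq_gz=$1 n_records=${2:-1000}
    gunzip -c "$fastq_gz" | head -n $((n_records * 4)) \
        | awk 'NR % 4 == 2 {print length($0)}' | sort -n | uniq
}

# Trim sequence and quality lines (lines 2 and 4 of each record) to keep_len
# characters; header and "+" lines are passed through unchanged.
trim_barcode_fastq() {
    local in_gz=$1 out_gz=$2 keep_len=$3
    gunzip -c "$in_gz" \
        | awk -v keep="$keep_len" 'NR % 2 == 0 {print substr($0, 1, keep); next} {print}' \
        | gzip -c > "$out_gz"
}

# Example usage for 10x v3 (16 bp barcode + 12 bp UMI = 28), placeholder paths:
#   read_lengths <original_barcode_file>
#   trim_barcode_fastq <original_barcode_file> <trimmed_barcode_file> 28
#   read_lengths <trimmed_barcode_file>
```

If read_lengths already reports only barcode + UMI (28 for 10x v3), no trimming is needed.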

mudappathir commented 3 years ago

Hi,

I trimmed the barcode fastq file as suggested and tried running again, but I am getting a different error now: terminate called after throwing an instance of 'std::out_of_range' what(): basic_string::erase

mudappathir commented 3 years ago

I was able to run STARsolo without any error once I set the option --soloBarcodeReadLength 0. This was mentioned in: https://github.com/alexdobin/STAR/releases

crazyhottommy commented 3 years ago

For now, you can add --soloBarcodeReadLength 0 to https://github.com/liulab-dfci/MAESTRO/blob/master/MAESTRO/Snakemake/scRNA/Snakefile#L48 after you initiate the Snakefile. We will fix it in the next release.
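
As a rough illustration of where the flag sits among the other STARsolo barcode options, here is a small sketch. It only prints the arguments and does not run STAR. The function name is hypothetical, the default barcode and UMI lengths assume 10x v3, and this is not the exact rule from the MAESTRO Snakefile. Per the STAR documentation, --soloBarcodeReadLength defaults to 1 (barcode read length must equal CB length + UMI length), and 0 turns that check off.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints STARsolo barcode arguments, one per line.
# Defaults assume 10x v3 chemistry (16 bp cell barcode, 12 bp UMI).
starsolo_args() {
    local cb_len=${1:-16} umi_len=${2:-12}
    printf '%s\n' \
        "--soloType CB_UMI_Simple" \
        "--soloCBstart 1" \
        "--soloCBlen ${cb_len}" \
        "--soloUMIstart $((cb_len + 1))" \
        "--soloUMIlen ${umi_len}" \
        "--soloBarcodeReadLength 0"   # 0 = do not check the barcode read length
}

# Example usage with placeholder paths (cDNA read first, barcode read second):
#   STAR --genomeDir <genome_dir> \
#        --readFilesIn <cdna_R2.fastq.gz> <barcode_R1.fastq.gz> \
#        --readFilesCommand zcat \
#        --soloCBwhitelist <whitelist_file> \
#        $(starsolo_args 16 12)
```

With the check disabled, untrimmed 150 bp R1 reads should be accepted as they are, so the trimming step becomes unnecessary.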