Bam file does not contain cell and umi barcodes appropriatelly formatted (Run on modified Smart-seq2 protocol)

velocyto-team / velocyto.py

RNA velocity estimation in Python

http://velocyto.org/velocyto.py/

BSD 2-Clause "Simplified" License


Issue #107 (open), opened by Suger0917 6 years ago

Suger0917 commented 6 years ago

Dear Velocyto Team,

Our lab used a modified Smart-seq2 protocol to allow multiplexed single-cell RNA-seq. The RT primers were designed with cell-specific barcodes and unique molecular identifiers (UMIs). After alignment to the hg19 reference genome with TopHat, BAM files were generated and organized by cell in a folder structure like the following:

    plateX/cell01/cell01.bam
    plateX/cell02/cell02.bam
    plateX/cell03/cell03.bam
    ...

The BAM file of each cell carries the UMI in the read name: the first 8 bp of column 1 (QNAME) is the UMI of each read. I don't know where and when to add a tag named UB (UMI barcode) or XM. If I just run velocyto like this:

    velocyto run -o $output_path -m $repeat_msk_gtf $sorted_genome_bam $gtf

it returns this error:

    OSError: The bam file does not contain cell and umi barcodes appropriatelly formatted. If you are runnin UMI-less data you should use the -U flag.

Could you please give me some advice on modifying our pipeline to produce correctly formatted BAM files, or give me some examples, including BAM files, to run velocyto on?

Thank you!
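For the per-cell layout described in the question, a minimal sketch of the missing step, assuming (as stated above) that the UMI is the first 8 characters of QNAME. It works on SAM text such as the output of `samtools view -h`, and appends `CB:Z:` and `UB:Z:` tags, the same tag names the simplesam fix in this thread uses. The function name `add_cb_ub` is hypothetical, and using the cell folder name (e.g. `cell01`) as the cell barcode is a stand-in assumption; the real cell-specific barcode sequence would be the better value if it is known.

```python
from typing import Iterable, Iterator

# Assumption from the question: the UMI is the first 8 bp of the read name.
UMI_LENGTH = 8


def add_cb_ub(sam_lines: Iterable[str], cell_barcode: str,
              umi_length: int = UMI_LENGTH) -> Iterator[str]:
    """Yield SAM lines (without trailing newline) with CB/UB tags appended.

    cell_barcode is a stand-in: e.g. the cell folder name ("cell01") or,
    better, the cell-specific barcode sequence if it is known.
    """
    for line in sam_lines:
        line = line.rstrip("\n")
        if line.startswith("@"):
            yield line  # header lines are copied unchanged
            continue
        fields = line.split("\t")
        umi = fields[0][:umi_length]  # column 1 (QNAME) starts with the UMI
        fields.append(f"CB:Z:{cell_barcode}")
        fields.append(f"UB:Z:{umi}")
        yield "\t".join(fields)
```

One possible way to use it: pipe `samtools view -h plateX/cell01/cell01.bam` into a small script that prints each line of `add_cb_ub(sys.stdin, "cell01")`, and pipe that into `samtools view -b -o cell01.tagged.bam -` to get a BAM file back. The sketch does not check whether a read already has CB/UB tags.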

snower2010 commented 6 years ago

Hi, I ran into the same problem, and it was solved after I modified my BAM file with the package "simplesam". A template script is as follows:

    import simplesam

    barcode_tag = 'CB'
    umi_tag = 'UB'

    with simplesam.Reader(open("in.bam")) as in_bam:
        with simplesam.Writer(open("out.sam", 'w'), in_bam.header) as out_sam:
            for read in in_bam:
                # assumes read names look like <read id>_<cell barcode>_<UMI>
                read[umi_tag] = read.qname.split("_")[2]  # add the UMI tag
                read[barcode_tag] = read.qname.split("_")[1]  # add the cell barcode tag
                out_sam.write(read)

Then convert this SAM file into a BAM file with samtools and use that BAM file as input for velocyto. I hope it works for you too.
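The template above takes field 1 of the split read name as the cell barcode and field 2 as the UMI, so any read name with fewer than three fields stops the script with a bare IndexError. A small hypothetical helper, `barcodes_from_qname`, can check the field count first and report the offending read name. It assumes the same `<read id>_<cell barcode>_<UMI>` layout with `_` as separator, which is an assumption about how the reads were named, not something velocyto requires.

```python
from typing import Tuple


def barcodes_from_qname(qname: str, sep: str = "_") -> Tuple[str, str]:
    """Return (cell_barcode, umi) from a read name like 'READID_CELLBC_UMI'.

    Mirrors the template: field 1 is the cell barcode, field 2 the UMI.
    The '_' separator is an assumption; check your own read names first.
    """
    parts = qname.split(sep)
    if len(parts) < 3:
        # Fail with a readable message instead of a bare IndexError
        raise ValueError(
            f"read name {qname!r} has {len(parts)} field(s) when split on "
            f"{sep!r}; expected at least 3 (read id, cell barcode, UMI)"
        )
    return parts[1], parts[2]
```

Inside the simplesam loop, `cb, umi = barcodes_from_qname(read.qname)` would replace the two `split` calls, followed by `read[barcode_tag] = cb` and `read[umi_tag] = umi`.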

Suger0917 commented 6 years ago

Thank you! I tried "simplesam" and it works well, but it uses a lot of memory.
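If memory is a concern, one alternative (a sketch, not benchmarked against simplesam) is to filter the SAM text line by line, for example between `samtools view -h in.bam` and `samtools view -b -o out.bam -`, so that only one record is held in memory at a time. `stream_tag_sam` and `tag_sam_text` are hypothetical names; the read-name parser is passed in as a function because read-name layouts differ between protocols.

```python
import io
from typing import Callable, Iterable, TextIO, Tuple

# Hypothetical parser type: read name -> (cell barcode, UMI)
QnameParser = Callable[[str], Tuple[str, str]]


def stream_tag_sam(src: Iterable[str], dst: TextIO, parse: QnameParser,
                   cb_tag: str = "CB", ub_tag: str = "UB") -> int:
    """Copy SAM text from src to dst, appending cell/UMI tags to each record.

    Processes one line at a time. Returns the number of alignment records.
    """
    n = 0
    for line in src:
        if line.startswith("@"):
            dst.write(line)  # header line, copied verbatim
            continue
        record = line.rstrip("\n")
        qname = record.split("\t", 1)[0]  # column 1 is QNAME
        cb, ub = parse(qname)
        dst.write(f"{record}\t{cb_tag}:Z:{cb}\t{ub_tag}:Z:{ub}\n")
        n += 1
    return n


def tag_sam_text(sam_text: str, parse: QnameParser) -> str:
    """Convenience wrapper for in-memory SAM text (handy for quick checks)."""
    out = io.StringIO()
    stream_tag_sam(io.StringIO(sam_text), out, parse)
    return out.getvalue()
```

For real data, `stream_tag_sam(sys.stdin, sys.stdout, parse)` placed between the two samtools commands keeps the whole conversion streaming; the design choice is simply to never build a list of reads.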

eynullazada commented 4 years ago

I had the same issue and tried simplesam, but it gives me an IndexError. Do you have any idea why this might happen?

The way I run it:

    import simplesam
    barcode_tag = 'CB'
    umi_tag = 'UB'
    with simplesam.Reader(open("A2S_Day6_sorted.bam")) as in_bam:
        with simplesam.Writer(open("out.sam", 'w'), in_bam.header) as out_sam:
            for read in in_bam:
                read[umi_tag] = read.qname.split()[2]
                read[barcode_tag] = read.qname.split()[1]
                out_sam.write(read)

error:

    File "", line 4, in
    IndexError: list index out of range

I appreciate your help
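The IndexError above has a simple cause: `str.split()` with no argument splits on whitespace, and SAM read names cannot contain whitespace, so `read.qname.split()` returns a one-element list and `[1]` or `[2]` is out of range. A hypothetical helper, `split_candidates`, shows how a read name splits under a few candidate separators; the separator list is an assumption, so extend it as needed.

```python
from typing import Dict, List, Sequence


def split_candidates(qname: str,
                     seps: Sequence[str] = ("_", ":", "|", "#")) -> Dict[str, List[str]]:
    """Show how one read name splits under several candidate separators."""
    # No-argument split() is what the failing script used: whitespace only
    result = {"<whitespace>": qname.split()}
    for sep in seps:
        result[sep] = qname.split(sep)
    return result
```

Printing `split_candidates(read.qname)` for the first few reads shows which separator, if any, isolates the barcode and UMI. If none does, the barcodes are probably not in the read name at all (they may already be stored as tags), and the tag-renaming approach later in this thread may be more relevant.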

MelissaSaichi commented 3 years ago

Hello, I have encountered the same issue using a BAM file output from BD-Rhapsody. I would appreciate your help

yaxing0zhao commented 3 years ago


Did you fix the issue? I get the same error. Many thanks!

denvercal1234GitHub commented 2 years ago

@Suger0917, did you have to modify the script provided by @snower2010 for your own data, or did you use it as is? If you had to modify it, how did you go about it? Thank you so much!

Akriebs commented 2 years ago

I am also getting the same error as @MelissaSaichi and @yaxing0zhao with a BD-Rhapsody file. Was this issue ever resolved, and if so, how?

eynullazada commented 2 years ago

Hi all

I was having the same issue and running the following command helped:

    samtools view my_data_sorted.bam -h | awk '{gsub(/XU:/,"XM:"); print $0}' | awk '{gsub(/XB:/,"XC:"); print $0}' > my_data_sorted_replacetagcode.sam

Hope it helps

Khagani
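The awk `gsub` calls above rewrite every occurrence of `XU:` and `XB:` anywhere in a line, which in principle includes the read name or the quality string, where those characters can occur by chance. A more targeted sketch, assuming the same renames (XU to XM, XB to XC) that the command uses, only touches the optional TAG:TYPE:VALUE fields from column 12 onward. `rename_tags` is a hypothetical name.

```python
from typing import Mapping

# Tag renames taken from the awk command above: XU -> XM, XB -> XC.
DEFAULT_RENAMES: Mapping[str, str] = {"XU": "XM", "XB": "XC"}


def rename_tags(sam_line: str,
                renames: Mapping[str, str] = DEFAULT_RENAMES) -> str:
    """Rename optional-field tags in one SAM line; columns 1-11 are untouched."""
    line = sam_line.rstrip("\n")
    if line.startswith("@"):
        return line  # header lines are returned unchanged
    fields = line.split("\t")
    # Optional TAG:TYPE:VALUE fields start at column 12 (index 11).
    for i in range(11, len(fields)):
        tag, rest = fields[i][:2], fields[i][2:]
        if tag in renames and rest.startswith(":"):
            fields[i] = renames[tag] + rest
    return "\t".join(fields)
```

Used as a filter, each line of `samtools view -h` output would go through `rename_tags` and be printed, producing the same SAM file as the awk pipeline except where the awk version would also have altered columns 1 to 11.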


MoonlightFansty commented 11 months ago


Excuse me! Could you share the .py file for the simplesam template script above? I can't work out the code indentation.