t-neumann / slamdunk

Streamlining SLAM-seq analysis with ultra-high sensitivity
GNU Affero General Public License v3.0

So many T2C conversions in no-4sU treated control samples #144

Closed: realzhang closed this issue 5 months ago

realzhang commented 7 months ago

Dear author, I want to get reads with various numbers of T2C conversions, such as 0, 1, 2 or 3 T2Cs, so I used the content of the sdunk file, i.e. the output of alleyoop dump. I wrote a simple script to count reads with different numbers of T2Cs and found that, although the 4sU-treated samples have more T2Cs, there is still a large number of T2Cs in the no-4sU samples. I understand that SNPs exist, but to my knowledge alleyoop dump takes a VCF file as input, so SNPs should not be included in the dump. I'm confused; please kindly help.
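For illustration, binning reads by their number of T2C conversions (0, 1, 2, 3 or more) could look roughly like the sketch below. It assumes the per-read T2C counts have already been extracted from the dump; the input list is toy data and `t2c_histogram` is a hypothetical helper, not part of slamdunk.

```python
from collections import Counter
from typing import Iterable


def t2c_histogram(per_read_counts: Iterable[int], max_bin: int = 3) -> Counter:
    """Tally reads by number of T>C conversions.

    Reads with max_bin or more conversions all fall into the top bin ("3+").
    This is a hypothetical helper, not a slamdunk function.
    """
    return Counter(min(n, max_bin) for n in per_read_counts)


# Toy per-read T>C counts (not real data).
counts = [0, 0, 1, 0, 2, 5, 1, 3, 0]
hist = t2c_histogram(counts)
for k in range(4):
    label = f"{k}+" if k == 3 else str(k)
    print(label, hist[k])
```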

t-neumann commented 7 months ago

Hi @realzhang,

I'm pretty sure we still report T2C conversions at masked SNP positions, but we indicate this in the MP:Z strings (True = SNP, False = no SNP). That aside, you also have to consider T2C sequencing errors in the dataset, which is why we typically use a cutoff of 2 T2Cs per read to consider a read "truly" labelled (as also described in Muhar et al., this greatly reduces noise). You can choose this cutoff via slamdunk count -c <min number of T2Cs>.
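The two filters mentioned here are ignoring conversions flagged as SNPs and requiring a minimum number of T2Cs per read. A minimal sketch of how they combine follows. The `Conversion` record is a hypothetical stand-in for the per-position True/False SNP flag; it does not parse the actual MP:Z format. `min_t2c` mirrors the role of the `slamdunk count -c` cutoff, but this is not slamdunk's implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Conversion:
    """One T>C mismatch in a read (hypothetical record, not the MP:Z layout)."""
    position: int  # 0-based reference position of the mismatch
    is_snp: bool   # True if the position is masked as a SNP


def count_t2c(conversions: List[Conversion]) -> int:
    """Count T>C conversions, ignoring those at SNP-masked positions."""
    return sum(1 for c in conversions if not c.is_snp)


def is_labelled(conversions: List[Conversion], min_t2c: int = 2) -> bool:
    """Call a read labelled if it has at least min_t2c non-SNP T>C conversions."""
    return count_t2c(conversions) >= min_t2c


reads = {
    "read_a": [Conversion(10, False)],                         # 1 conversion: likely noise
    "read_b": [Conversion(10, False), Conversion(42, False)],  # 2 conversions: labelled
    "read_c": [Conversion(10, True), Conversion(42, False)],   # 1 after SNP masking
}
labelled = [name for name, convs in reads.items() if is_labelled(convs)]
print(labelled)  # ['read_b']
```

Raising the cutoff from 1 to 2 discards single-conversion reads, which are the ones most likely to come from a single sequencing error.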

realzhang commented 7 months ago

Dear Tobias Neumann, many thanks for the detailed reply. I counted the T2C or A2G conversions in the sdunk file according to the alignment direction (the 2nd column), i.e. T2C for forward and A2G for reverse reads. In the first 10,000 lines (reads) of the Nature Methods data, I found ~1156 conversions in non-treated vs ~14000 conversions in treated samples, which seems reasonable. However, for my own data I found ~5000 conversions in non-treated vs ~9700 conversions in treated samples, which seems too high for a non-treated sample. Could you kindly suggest a likely cause of the high background conversions: sequencing errors, or the fidelity of the reverse transcriptase? Then I can consider switching to another sequencing platform or RT-PCR kit accordingly. PS: My current sequencer is a BGISEQ T7, with Q30 of 88% for read 1 and 84% for read 2.
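The strand-aware counting described here (T2C for forward reads, A2G for reverse reads) can be sketched as below. It assumes each read's mismatches are available as (reference base, read base) pairs on the genomic + strand, with the alignment direction encoded as "+"/"-". Both of these are hypothetical encodings and not the sdunk column format. The ratio helper simply expresses the untreated counts quoted above as a fraction of the treated counts.

```python
from typing import Iterable, Tuple

# On the genomic + strand, a T>C change in a reverse-strand transcript
# shows up as the complementary A>G change.
TARGET = {"+": ("T", "C"), "-": ("A", "G")}


def count_strand_conversions(strand: str, mismatches: Iterable[Tuple[str, str]]) -> int:
    """Count labelling-type mismatches in one read.

    strand: "+" for forward, "-" for reverse alignment (hypothetical encoding).
    mismatches: (reference_base, read_base) pairs on the genomic + strand.
    """
    ref, alt = TARGET[strand]
    return sum(1 for r, a in mismatches if r == ref and a == alt)


def background_ratio(untreated: int, treated: int) -> float:
    """Untreated conversion count as a fraction of the treated count."""
    return untreated / treated


print(count_strand_conversions("+", [("T", "C"), ("A", "G"), ("G", "A")]))  # 1
print(count_strand_conversions("-", [("T", "C"), ("A", "G"), ("A", "G")]))  # 2
print(round(background_ratio(1156, 14000), 3))  # 0.083
print(round(background_ratio(5000, 9700), 3))   # 0.515
```

With the numbers above, the background is roughly 8% of the treated signal in the Nature Methods data but roughly 52% in this dataset.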

t-neumann commented 7 months ago

Hmm, that is indeed weird and points to a wet-lab issue, if you conducted both analyses the same way and the Nature Methods data show what you would expect. Which cell type are you using? Are you sure you are using an appropriate 4sU concentration? I would suggest reaching out to someone from the Ameres lab who could advise you further, maybe even Stefan Ameres.