sandberg-lab / txburst

Transcriptional burst kinetics inference and analysis described in the article "Genomic encoding of transcriptional burst kinetics" (Larsson et. al 2019).
37 stars 13 forks source link

Extension from BP3 to BP4 #1

Open mfilip8 opened 5 years ago

mfilip8 commented 5 years ago

Hi All,

first of all great work, great paper where I really appreciated all the effort you put in confidence intervals and sensitivity testing using parameter variation with simulated data. I wanted to ask you if it would be easy to extend in the txburstML.py the Beta-Poisson from being a 3 param function to a 4 parameter function as in https://academic.oup.com/bioinformatics/article/32/14/2128/2288270#95426676 and in case if you can help me in doing so. My question arises from the fact that while trying out your code on our single cell RNA seq data I faced the following issue: since our cells don't get a very similar number of reads per cell I wanted to normalise our UMI counts by total number of reads (or UMI) per cell. What I exactly did was to normalise its UMI count gene_x cell_i / (sum UMIs cell_i/ median UMIs all cells) So that my actual normalisation factor ranges from roughly from 0.6 to 1.4. In doing so I nevertheless noticed that the fit becomes very dependent from the particular normalisation factor (for example adding a 10, 100), giving me results that are more similar to what you have in our paper using an additional *10. BP4 would fit this normalisation factor and allow for a better fit in general. Sorry for all the writing but I wanted to know your take on the this and in particular to the UMI count normalisation.

Thanks a lot in advance Marco

AntonJMLarsson commented 5 years ago

I don't think it would be hard to include a scaling factor for the ML estimation, but it's not clear to me whether that you help you. Is it correct that your normalization factor is different for each cell? The extension in that paper is a normalization per gene.

Which protocol did you use to obtain your data? How many reads do you have per cell on average?

Best,

Anton

mfilip8 commented 5 years ago

Is it correct that your normalization factor is different for each cell?

Yes, the logic is that if cell_i and cell_j have respectively 5 and 10 counts for gene_x, but also a total number of reads of 50000 and 100000 you cannot directly compare the counts.

The extension in that paper is a normalization per gene.

Correct, this factor sort of account for the range of the expression of that gene.

Which protocol did you use to obtain your data? How many reads do you have per cell on average?

We have 10x and the drop out is not negligible...the number of UMIs per cell ranges from 8000 to 16000

If I apply your scripts in the same way you did...among 14000 expressed genes I can get the kinetic parameters predicted for less than 700 genes.