willtownes / quminorm-paper

supporting code for the quasi-UMIs single-cell RNA-seq paper
GNU Lesser General Public License v3.0

How to estimate shape parameter for my own dataset #1

Open ahy1221 opened 4 years ago

ahy1221 commented 4 years ago

Thanks for the excellent idea and algorithm. I was wondering how to estimate the shape parameter of the quminorm function for my own dataset. In the paper you describe using shape=2 for the Patel dataset without giving a reason.
Should I choose a different shape for each tumor/batch?

Yao

willtownes commented 4 years ago

Hi Yao, thanks for your interest in our method. We are currently revising and improving our manuscript to better address this issue. I will post a link when it is available. Roughly speaking, we suggest two possible approaches to setting the shape parameter:

  1. Find a UMI counts dataset from the same tissue type as your dataset and use it as training data. A good place to find data is https://www.nxn.se/single-cell-studies . For each cell in the training data, you can compute maximum likelihood estimates (MLEs) of the Poisson-lognormal parameters. Then, make a histogram of the "sig" parameter (the log-scale standard deviation) across cells. Choose the median or mean of this distribution as your shape parameter for QUMI normalization of your other dataset.
  2. Simply use the default shape parameter. In the original version of the preprint we fit Poisson-lognormal distributions to thousands of cells in three different training datasets, and found that the sig parameter ranged from about 1.0-3.0 with most values concentrating around 2-2.5 (Figure 5). That's why we set 2.0 as the default in our further analyses. More recently, we have analyzed 4 additional training datasets and we are seeing values more in the 2.3-2.5 range, and it appears the values of 1.0 in the Klein data are something of an anomaly. So based on that, I would now recommend 2.4 as a better default than 2.0.
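
To make approach (1) more concrete, here is a minimal sketch of fitting a Poisson-lognormal distribution to one cell's counts. It is plain Python written for illustration, not code from the quminorm package. It assumes the model is a Poisson count whose log-rate is normal with mean mu and standard deviation sig. The function names (`poilog_pmf`, `cell_loglik`, `fit_cell`), the trapezoid integration, the grid search, and the toy counts are all assumptions made for this example; a real analysis would use a proper optimizer or the package's own fitting routine.

```python
import math
from collections import Counter


def poilog_pmf(x, mu, sig, n_grid=401, z_max=8.0):
    """P(X = x) for X ~ Poisson(lam) with log(lam) ~ Normal(mu, sig^2).

    Integrates over the standardized latent z = (log(lam) - mu) / sig
    with a plain trapezoid rule. Accuracy is adequate for illustration only.
    """
    step = 2.0 * z_max / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        z = -z_max + i * step
        log_lam = mu + sig * z
        # log of [Poisson(x; lam) * standard normal density at z]
        log_term = (x * log_lam - math.exp(log_lam) - math.lgamma(x + 1)
                    - 0.5 * z * z - 0.5 * math.log(2.0 * math.pi))
        weight = 0.5 if i in (0, n_grid - 1) else 1.0
        total += weight * math.exp(log_term)
    return total * step


def cell_loglik(counts, mu, sig):
    """Log-likelihood of one cell's gene counts (zeros included)."""
    tally = Counter(counts)
    # Guard against log(0) when the pmf underflows for extreme parameters.
    return sum(n * math.log(max(poilog_pmf(x, mu, sig), 1e-300))
               for x, n in tally.items())


def fit_cell(counts, mu_grid, sig_grid):
    """Crude grid-search MLE; returns the (mu, sig) pair with the highest likelihood."""
    best = max((cell_loglik(counts, mu, sig), mu, sig)
               for mu in mu_grid for sig in sig_grid)
    return best[1], best[2]


# Made-up counts for a single cell, for illustration only.
toy_counts = [0] * 80 + [1] * 10 + [2] * 4 + [5] * 3 + [20] * 2 + [100]
mu_hat, sig_hat = fit_cell(toy_counts, [-2.0, 3.0], [0.1, 2.0])
```

Running something like `fit_cell` on every cell of the training data (with a much finer grid) would give one sig value per cell, which is the quantity the histogram in approach (1) is built from.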

Once we have finished revising the manuscript, we plan to provide more convenient implementations of both the MLE fitting and the QUMI normalization in a standalone R package. Note that the package repository is still a work in progress and not ready to be installed yet.

Keeping this issue open to remind myself to link to the updated manuscript and R package when they become available.

willtownes commented 4 years ago

Hi, we have recently updated the quminorm R package. It should now be possible to install without errors. So, if you want to follow suggestion (1) above (the custom shape parameter approach), you can get the estimated shape parameters using the function poilog_mle_matrix from the R package.
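
Once per-cell shape estimates are in hand, the remaining step of suggestion (1) is to summarize them into a single value. The sketch below is a hypothetical Python helper, not part of the quminorm package. It assumes the estimates have been exported as a plain list of numbers and that failed fits show up as missing or non-finite values; the name `choose_shape` and the example values are invented for illustration.

```python
import math
import statistics


def choose_shape(sig_estimates, use_median=True):
    """Summarize per-cell 'sig' estimates into one shape value, skipping failed fits."""
    usable = [s for s in sig_estimates
              if s is not None and math.isfinite(s) and s > 0]
    if not usable:
        raise ValueError("no usable sig estimates")
    return statistics.median(usable) if use_median else statistics.mean(usable)


# Made-up values for illustration only, not taken from any real dataset.
example_sig = [2.1, 2.4, float("nan"), 2.6, 2.3, 2.5]
shape = choose_shape(example_sig)  # median of the five finite values: 2.4
```

The median is less sensitive than the mean to a handful of cells with unusual fits, which may matter if some training cells behave like the anomalous low values mentioned above.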

Keeping the issue open to remind myself to link to the updated manuscript when it is available.