tanaylab / metacells

Metacells - Single-cell RNA Sequencing Analysis
MIT License

Metacell count matrix #36

Closed: Sayyam-Shah closed this issue 1 year ago.

Sayyam-Shah commented 1 year ago

Hello!

Thank you for the amazing tool.

I computed metacells for a large dataset and noticed that the summarized counts are not whole numbers. How does metacells compute the metacell count matrix after clustering the cells? Are you summing the counts, and is some normalization applied?

orenbenkiki commented 1 year ago

When we combine cells into a metacell, we cap the total UMIs of each cell at double the median total UMIs of the cells in the metacell, to avoid one cell dominating the result. Right now this is done as a straight normalization, which results in fractional values. We may change this to downsampling in the future, as this seems to be causing some confusion.
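A minimal NumPy sketch of the capping step as described above, assuming a dense cells × genes UMI matrix holding only the cells of a single metacell. The function name `cap_and_sum` is hypothetical and not part of the metacells API; the real implementation may differ in details.

```python
import numpy as np

def cap_and_sum(umis: np.ndarray) -> np.ndarray:
    """Sum a (cells x genes) UMI matrix into one metacell profile,
    scaling down any cell whose total exceeds twice the median total."""
    totals = umis.sum(axis=1).astype(float)   # total UMIs per cell
    cap = 2.0 * np.median(totals)             # double the median total
    # Per-cell factor: cap/total for cells above the cap, 1 for everyone else.
    scale = np.divide(cap, totals, out=np.ones_like(totals), where=totals > cap)
    # Straight normalization: capped cells contribute fractional counts.
    return (umis * scale[:, None]).sum(axis=0)

umis = np.array([
    [10, 5, 5],     # total 20
    [8, 8, 4],      # total 20
    [60, 30, 10],   # total 100, would dominate without the cap
])
print(cap_and_sum(umis))   # [42. 25. 13.]
```

Here the median total is 20, so the cap is 40 and the third cell is scaled by 40/100 = 0.4 before summing. With the numbers from the follow-up question below, a cell with 2400 UMIs and a cap of 2000 would be scaled by 2000/2400.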

Sayyam-Shah commented 1 year ago

Hello @orenbenkiki!

Thank you for getting back to me. I want to confirm my understanding of your method. Let's say that for a particular cell, double the median is 2000 and the cell's total UMIs are 2400. Are you dividing all of that cell's gene counts by 2400 (its total UMIs) and multiplying them by 2000 (double the median)? I hope to emulate your method for the ATAC modality, using the metacell2 assignments computed on the RNA modality, since I want to run SCENIC+. Could you please tell me how I could extend the assignments to ATAC?

orenbenkiki commented 1 year ago

Yes, this is what we do now. In the next release I'm going to change this to downsampling the 2400 UMIs to 2000 UMIs, which will keep the counts integer and should be less confusing.
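Downsampling replaces the fractional scaling with random sampling of a cell's UMIs without replacement, so the counts stay integer. Below is a sketch of that technique using NumPy's `Generator.multivariate_hypergeometric` (available since NumPy 1.18). It illustrates the idea only and is not the code of any metacells release; `downsample_cell` is a hypothetical name.

```python
import numpy as np

def downsample_cell(counts: np.ndarray, target: int, rng: np.random.Generator) -> np.ndarray:
    """Randomly keep `target` of the UMIs in one cell, without replacement."""
    total = int(counts.sum())
    if total <= target:
        return counts.copy()   # cells at or below the cap are left unchanged
    # Draw `target` UMIs from the cell's pool; each gene's new count is the number
    # of its UMIs that were drawn, so the result is integer and sums to `target`.
    return rng.multivariate_hypergeometric(counts.astype(np.int64), target)

rng = np.random.default_rng(0)
cell = np.array([1200, 800, 400])   # 2400 UMIs in total
down = downsample_cell(cell, 2000, rng)
print(down, down.sum())             # the sum is exactly 2000
```

Unlike the straight normalization, each run gives slightly different counts, so a fixed seed is useful for reproducibility.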

If you use ATAC data instead of UMIs, you probably need to set cells_similarity_log_data to false. We compute the similarity of the log of the UMIs since they are bursty, but this is probably the wrong thing for ATAC. You'll also want to normalize the ATAC data to be in a UMI-like range. Specifically, deviant (outlier) cell detection uses a hard-wired normalization factor of +1, which makes sense for UMIs; the default target metacell size is 160000, which makes sense for UMIs; etc. You can adjust each of these parameters separately, but it is easier to just scale the ATAC data to be in a UMI-like range.
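One simple way to bring ATAC counts into a UMI-like range in a multiome dataset is to multiply the whole ATAC matrix by a single constant, so that its median per-cell total matches the median per-cell RNA UMI total. This global-factor approach is an assumption chosen for illustration, not something the metacells documentation prescribes, and the helper `scale_to_umi_range` is hypothetical.

```python
import numpy as np

def scale_to_umi_range(atac: np.ndarray, rna: np.ndarray) -> np.ndarray:
    """Scale a (cells x peaks) ATAC matrix by one global factor so that its
    median per-cell total equals the median per-cell RNA UMI total."""
    atac_median = np.median(atac.sum(axis=1))
    rna_median = np.median(rna.sum(axis=1))
    factor = rna_median / atac_median
    # One factor for all cells keeps the relative differences between cells.
    return atac * factor

rna = np.array([[300, 700], [500, 1500], [1000, 1000]])        # totals 1000, 2000, 2000
atac = np.array([[4000, 4000], [8000, 8000], [10000, 6000]])   # totals 8000, 16000, 16000
scaled = scale_to_umi_range(atac, rna)
print(np.median(scaled.sum(axis=1)))   # 2000.0
```

Because the factor is usually below 1 for ATAC, the scaled values are fractional; rounding them would zero out many single-fragment peaks, so the sketch leaves them as floats.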

Sayyam-Shah commented 1 year ago

Hello @orenbenkiki,

Thank you! We have multiome data, so we have both RNA and ATAC for the same cells. I ran metacell2 on the RNA, so I'm thinking of summarizing the ATAC by summing the peak counts of each metacell according to the RNA assignments and applying your "normalization" method.

Is there a function in the metacell2 package that summarizes the cells and allows me to input pre-existing assignments?
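The plan above (sum ATAC peaks per RNA-derived metacell, then apply the same cap) can be sketched in NumPy as follows. It assumes a dense cells × peaks ATAC matrix whose rows are in the same order as the RNA assignments, and that outlier cells carry a negative metacell index (for example -1). It also assumes the cap is computed from each cell's ATAC totals rather than its RNA totals. The function `aggregate_by_assignment` is hypothetical, not a metacells function.

```python
import numpy as np

def aggregate_by_assignment(counts: np.ndarray, metacell: np.ndarray) -> np.ndarray:
    """Sum a (cells x features) matrix into (metacells x features) using
    per-cell metacell indices; cells with a negative index are skipped.
    Within each metacell, a cell whose total exceeds twice the metacell's
    median total is scaled down to that cap before summing."""
    n_metacells = int(metacell.max()) + 1
    result = np.zeros((n_metacells, counts.shape[1]))
    for mc in range(n_metacells):
        members = counts[metacell == mc].astype(float)
        if members.shape[0] == 0:
            continue                      # no cells assigned to this index
        totals = members.sum(axis=1)
        cap = 2.0 * np.median(totals)
        scale = np.divide(cap, totals, out=np.ones_like(totals), where=totals > cap)
        result[mc] = (members * scale[:, None]).sum(axis=0)
    return result

atac = np.array([[1, 1], [1, 1], [10, 10], [3, 0], [0, 3], [9, 9]])
metacell = np.array([0, 0, 0, 1, 1, -1])   # the last cell is an outlier
print(aggregate_by_assignment(atac, metacell))
# [[4. 4.]
#  [3. 3.]]
```

In metacell 0 the median total is 2, so the cap is 4 and the third cell is scaled from 20 down to 4 fragments; the outlier row is ignored.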

orenbenkiki commented 1 year ago

If you try to mix RNA and ATAC, you should definitely scale the ATAC data to be "similar" to UMIs. From that point on, things "should" just work; I would be interested to hear about your experience with this.

The collect_metacells function does this collection. Right now it is hard-wired to look at a per-cell property called metacell. I'm finalizing 0.9 right now, which will change some APIs; amongst these changes, it provides an additional parameter that can override this. In the meantime, you can just force-set the metacell per-observation property to whatever grouping you want, and collect_metacells will obey it.
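In a multiome setting, the RNA and ATAC cells may not be stored in the same order. It is therefore safer to copy the RNA assignments onto the ATAC cells by matching barcodes than by matching positions. The pandas sketch below uses made-up barcodes. It treats ATAC cells missing from the RNA run as outliers (-1), which is an assumption. Since an AnnData object's `obs` is a pandas DataFrame, the same `reindex` applies there.

```python
import pandas as pd

# Made-up metacell assignments from the RNA run, indexed by cell barcode.
rna_metacell = pd.Series([0, 0, 1, -1], index=["AAAC-1", "AAAG-1", "AACT-1", "AAGT-1"])

# ATAC cells from the same multiome run, here in a different order.
atac_barcodes = pd.Index(["AACT-1", "AAAC-1", "AAGT-1", "AAAG-1"])

# Align by barcode, not by position; barcodes absent from the RNA run become -1.
atac_metacell = rna_metacell.reindex(atac_barcodes).fillna(-1).astype(int)
print(atac_metacell.tolist())   # [1, 0, -1, 0]

# With AnnData objects, the analogous step would be, e.g.:
#   atac.obs["metacell"] = rna.obs["metacell"].reindex(atac.obs_names).fillna(-1).astype(int)
# after which collect_metacells on the ATAC object should follow this grouping
# (check the signature of collect_metacells in your installed metacells version).
```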