huangyh09 / brie

BRIE: Bayesian Regression for Isoform Estimate in Single Cells
https://brie.readthedocs.io
Apache License 2.0

Mode 1 and gene features #39

Open Lavieenrose123 opened 2 years ago

Lavieenrose123 commented 2 years ago

Thank you for developing BRIE. You gave multiple examples on https://brie.readthedocs.io/. All of these examples are based on mode 2 (cell features or aggregation). However, mode 1 requires a file of gene features (-g gene-feature-file), and I can't find any related examples or descriptions in the current documentation. Could you please tell me how to generate the gene-feature-file for mode 1? What is the format of that file?

huangyh09 commented 2 years ago

Hi, this is inherited from brie1, for which we produced some preprocessed gene features:

On the other hand, we found that the mean of a cell population usually gives a good alternative prior compared to the prior generated from sequence features.

Yuanhua
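To make the "population mean as prior" idea concrete, here is a minimal sketch that shrinks each cell's PSI toward the pooled population mean. It assumes per-cell inclusion and exclusion read counts for a single splicing event, and it uses a simple Beta-binomial model as a stand-in; this is not BRIE's own model, and the function name and `prior_strength` parameter are hypothetical.

```python
import numpy as np

def shrink_psi_to_population_mean(inc, exc, prior_strength=10.0):
    """Per-cell PSI shrunk toward the population mean PSI.

    Hypothetical Beta-binomial sketch (not BRIE's model): every cell
    shares a Beta(a, b) prior centred on the pooled mean PSI, worth
    a + b = prior_strength pseudo-reads.
    """
    inc = np.asarray(inc, dtype=float)
    exc = np.asarray(exc, dtype=float)
    # Pooled population mean PSI across all cells
    pop_mean = inc.sum() / (inc.sum() + exc.sum())
    a = prior_strength * pop_mean
    b = prior_strength * (1.0 - pop_mean)
    # Posterior mean of Beta(a + inc, b + exc) for each cell
    return (inc + a) / (inc + exc + a + b)
```

For example, with inclusion counts `[0, 5, 1]` and exclusion counts `[0, 5, 9]`, the pooled mean is 0.3, so the cell with no reads falls back to 0.3 while covered cells move toward their own ratio. This is why a population-mean prior works well when cells are similar, and why it can mislead when the population is heterogeneous.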

Lavieenrose123 commented 2 years ago

Thank you for your reply. For mode 2, I think the fundamental hypothesis is that the cell population is homogeneous. However, for certain diseases (especially cancer), this hypothesis may not be reasonable. I want to quantify PSI on such highly heterogeneous data. Do you think mode 1 is a better choice?

huangyh09 commented 2 years ago

Good point. Mode 1 may have better flexibility in this scenario. Alternatively, you can consider using cell types as covariates in mode 2, which is equivalent to having a cell type-specific prior for each splicing event.
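The equivalence between cell-type covariates and a cell type-specific prior can be seen with a one-hot design matrix. The sketch below assumes a logit-linear link from cell covariates to prior PSI for one splicing event; the function names and the weight vector `w` are hypothetical illustrations, not BRIE's API.

```python
import numpy as np

def cell_type_design(cell_types):
    """One-hot design matrix (cells x cell types) and the ordered type labels."""
    labels = sorted(set(cell_types))
    index = {t: i for i, t in enumerate(labels)}
    X = np.zeros((len(cell_types), len(labels)))
    for row, t in enumerate(cell_types):
        X[row, index[t]] = 1.0
    return X, labels

def prior_mean_psi(X, w):
    """Prior mean PSI per cell: logistic function of the linear predictor X @ w."""
    return 1.0 / (1.0 + np.exp(-(X @ w)))

X, labels = cell_type_design(["T", "B", "T", "B"])
# Hypothetical weights for one event: one logit-scale value per cell type
w = np.array([0.0, np.log(3.0)])  # B -> PSI 0.5, T -> PSI 0.75
print(labels, prior_mean_psi(X, w))
```

Because each row of `X` has a single 1, every cell of the same type receives the same prior mean, i.e. the covariates act as a separate prior per cell type.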

Lavieenrose123 commented 2 years ago

Thank you for your comments. For mode 2 with cell-type annotations as covariates, I would first have to cluster cells into cell types using gene expression profiles. However, PSI itself can also be used to define cell types. For heterogeneous data, I should further explore whether a count-defined cell population has homogeneous splicing patterns.

Suppose I use mode 1 to get the PSI matrix. How can I then detect differential alternative splicing between PSI-defined (or count-defined) cell types? According to the manual, the DAS function seems to be embedded in mode 2.

huangyh09 commented 2 years ago

Thanks for the further info. Yes, DAS is embedded in mode 2 to avoid pair-wise cell comparisons. For DAS, we would suggest using mode 2 directly, as it detects whether the mean PSI differs between two cell groups.

If you want to use the PSI for downstream analysis, e.g., cell clustering, using gene sequence features may help, as it avoids the information leakage that comes from using the cell clusters themselves as covariates.
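As a rough illustration of what "detecting whether the mean PSI differs between two cell groups" means, here is a plain binomial likelihood-ratio test on reads pooled per group. This is a swapped-in textbook test, not BRIE's own DAS procedure; the function names are hypothetical, and pooling reads ignores cell-to-cell overdispersion, which a proper model should account for.

```python
import math

def _binom_loglik(k, n, p):
    """Binomial log-likelihood without the coefficient (it cancels in the ratio)."""
    ll = 0.0
    if k > 0:
        ll += k * math.log(p)
    if n - k > 0:
        ll += (n - k) * math.log(1.0 - p)
    return ll

def das_lrt(inc1, tot1, inc2, tot2):
    """Test H0: both groups share one PSI vs H1: each group has its own PSI.

    inc*/tot* are inclusion and total reads pooled over the cells of a group.
    Returns (LR statistic, p-value) from a chi-square with 1 degree of freedom.
    """
    p1, p2 = inc1 / tot1, inc2 / tot2
    p0 = (inc1 + inc2) / (tot1 + tot2)  # shared PSI under the null
    ll_alt = _binom_loglik(inc1, tot1, p1) + _binom_loglik(inc2, tot2, p2)
    ll_null = _binom_loglik(inc1, tot1, p0) + _binom_loglik(inc2, tot2, p0)
    stat = max(2.0 * (ll_alt - ll_null), 0.0)
    # Chi-square(1) survival function: P(X > x) = erfc(sqrt(x / 2))
    pval = math.erfc(math.sqrt(stat / 2.0))
    return stat, pval
```

Comparing one shared mean against two group means is a single group-level test per event, which is the point made above about avoiding pair-wise cell comparisons.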

Lavieenrose123 commented 2 years ago

Thank you for the information.