hms-dbmi / scde

R package for analyzing single-cell RNA-seq data
http://pklab.med.harvard.edu/scde

Up- and down-regulated genes from which of the two groups? #66

Open mvalenzuelav opened 6 years ago

mvalenzuelav commented 6 years ago

Hi Jean,

I have one (I guess) simple question that I cannot figure out. In the results file from scde.expression.difference, I cannot see whether the up- and down-regulated genes come from one group of cells or the other. How can I know that? Is there any way to print, in another column, which group they refer to?

I am working around this by using scde.test.gene.expression.difference: I select some of the most up- and down-regulated genes and examine the plots for each group (the group with more expression is where the gene is up-regulated, I guess). But this is very tedious for almost 17000 genes. It also appears to me that the first group I selected to create scde.error.models is always the one that shows the down- or up-regulation. Does this mean that the comparison always goes in the same direction? I mean, that the function only returns changes in the first group compared to the second, but not changes in the second group compared to the first. Is this possible, or is there any way to differentiate these changes (up/down-regulation)?

Thanks very much in advance.

Best, Marina

JEFworks commented 6 years ago

Hi Marina,

Hmm, yes, it is currently a little confusing. scde.expression.difference compares the cells you specify in the groups factor, and the up and down directions are set by the order of the factor levels. Ideally, we could take levels(group) and append it to the column names or something similar.

But the direction is always the same! It is always the first level in the group factor compared to the second. So if you want genes upregulated in the second group, look for genes downregulated in the first group (since it is a pair-wise comparison). Or you can reset the levels of your group factor so they are ordered the way you want.
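As a minimal sketch of resetting the level order, using base R only: the cell labels "ES" and "MEF" and the variable names below are hypothetical placeholders, not taken from this thread.

```r
# Hypothetical cell labels; in a real analysis these come from your own metadata.
groups <- factor(c("ES", "ES", "MEF", "MEF", "ES"))

# By default the levels are sorted, so "ES" is the first level here.
print(levels(groups))

# Rebuild the factor with "MEF" first, so "MEF" becomes the first level.
groups_flipped <- factor(groups, levels = c("MEF", "ES"))
print(levels(groups_flipped))

# Only the level order changes; each cell keeps its original label.
stopifnot(identical(as.character(groups), as.character(groups_flipped)))
```

Passing the reordered factor as the groups argument of scde.expression.difference should then, per the explanation above, report the comparison in the opposite direction.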

You can differentiate between up vs. down-regulation by the MLE column (the maximum likelihood estimate of the expression-fold change).
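One way to make the direction explicit is to add a label column derived from the sign of mle. This is only a sketch: the data frame below is a made-up stand-in for a results table, the column names higher_in and comparison are hypothetical, and it assumes (as described above) that a positive mle means higher expression in the first level of the group factor.

```r
# Made-up stand-in for a results table; the values are for illustration only.
ediff <- data.frame(
  mle = c(-0.30, 0.73, -1.59),
  Z   = c(-7.1, 7.2, -7.1),
  row.names = c("geneA", "geneB", "geneC")
)

# Hypothetical two-level group factor.
groups <- factor(c("g1", "g2", "g1", "g2"))
first  <- levels(groups)[1]
second <- levels(groups)[2]

# Assumption: positive mle = higher in the first level,
# negative mle = higher in the second level, zero = no label.
ediff$higher_in <- ifelse(ediff$mle > 0, first,
                          ifelse(ediff$mle < 0, second, NA_character_))

# Record which comparison the signs refer to.
ediff$comparison <- paste(first, "vs", second)

print(ediff)
```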

Best, Jean

Jean Fan, PhD Bioinformatics and Integrative Genomics NCI F99/K00 Post-Doctoral Fellow Zhuang Lab | Harvard University 12 Oxford St, Naito 031, Cambridge, MA 02138 web: jef.works (http://jefworks.com/)

mvalenzuelav commented 6 years ago

Hi Jean,

Thanks very much for your quick reply. Now I understand that I do not need to run the function twice to get both directions, because I can interpret the genes down-regulated in the first-level group as the genes up-regulated in the second-level group. I thought these genes were unrelated but, obviously, they are related (as you said, it is a pair-wise comparison).

Regarding mle, I understand that the "+" or "-" sign indicates whether a gene is up- or down-regulated, and the number gives us the size of the change, am I right? I can also tell up- from down-regulated genes in the Z column, where I also see "+" or "-" signs, right?

Could you also explain what the other variables (columns) show? I know the definitions, but not their practical meaning or use. Do I need to take ub, lb, ce, and cZ into account for something (and mle and Z for something more)?

And one last question: in the table with all results, the genes are ordered with the most significantly different genes at the top (both up- and down-regulated), as I can see in my results too. But if I want to see which genes show the largest differences between one group and the other, I must look at mle, right? I mean, the top ones are only the most significantly different genes (because they are ordered by Z score), not the ones with the largest differences, if I have understood correctly. I ask because in my data (below) I see a big difference in mle (-159...) between the two groups for Xist, for example (in bold), compared to the first five genes. The largest differences may not correspond to the most significantly different genes, right?

Gene    lb                    mle                   ub                    ce                    Z                    cZ
Ppia    -0.430844595596799    -0.301591216917757    -0.258506757358078    -0.258506757358078    -71.608.466.438.477  -627.368.712.881.271
Cort    -120.636.486.767.103  -0.904773650753272    -0.646266893395194    -0.646266893395194    -71.608.466.438.477  -627.368.712.881.271
Gapdh   -0.947858110312951    -0.818604731633912    -0.689351352954873    -0.689351352954873    -71.608.466.438.477  -627.368.712.881.271
Aldoa   -0.517013514716156    -0.387760136037117    -0.344675676477435    -0.344675676477435    -71.608.466.438.477  -627.368.712.881.271
Tecr    -0.603182433835516    -0.473929055156477    -0.387760136037117    -0.387760136037117    -71.608.466.438.477  -627.368.712.881.271
Xist    -193.880.068.018.558  -159.412.500.370.815  -133.561.824.635.007  -133.561.824.635.007  -71.608.466.438.477  -627.368.712.881.271
Gabrb2  0.517013514716156     0.732435812514551     0.861689191193594     0.517013514716156     716.081.298.188.234  627.368.712.881.271
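The difference between ranking by significance and ranking by size of change can be sketched in base R. The values below are made up for illustration (they are not the numbers from the table above), and the sketch assumes only that the results are a data frame with mle and Z columns.

```r
# Made-up stand-in for a results table; the values are for illustration only.
ediff <- data.frame(
  mle = c(-0.30, -1.59, 0.73, -0.90),
  Z   = c(-7.2, -3.1, 6.5, -7.0),
  row.names = c("geneA", "geneB", "geneC", "geneD")
)

# Rank by significance: largest absolute Z score first.
by_significance <- ediff[order(abs(ediff$Z), decreasing = TRUE), ]

# Rank by size of change: largest absolute mle first.
by_effect_size <- ediff[order(abs(ediff$mle), decreasing = TRUE), ]

print(rownames(by_significance))
print(rownames(by_effect_size))
```

In this toy example, geneB has the largest change but the weakest Z score, so it comes first in one ranking and last in the other.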

Thanks again in advance! Marina