scverse / scvi-tools

Deep probabilistic analysis of single-cell and spatial omics data
http://scvi-tools.org/
BSD 3-Clause "New" or "Revised" License

Hierarchical conditioning #586

Closed: vals closed this issue 4 years ago

vals commented 4 years ago

I have been thinking a bit about hierarchically structured data and how to combine the ideas in scVI with nested random effects models.

A very typical nested experimental design in the field has level 1: healthy vs. unhealthy condition; level 2: different individuals/batches within the conditions; level 3: different cells from the individuals.

I don't have my book on me, but in essence the linear random effect model for e.g. a gene would be something like:

y ~ N(mu + b_{individual}, sigma_{cells})
b_{individual} ~ N(mu_{individuals} + b_{condition}, sigma_{individuals})
b_{condition} ~ N(mu_{conditions}, sigma_{conditions})

(Just assuming a normal likelihood for simpler notation.) What this achieves is that it accounts for the within-individual and within-condition correlation of observed y-values.
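As a concrete illustration (not from the thread), here is a minimal simulation of that nested generative process with a normal likelihood, taking mu_{individuals} = mu_{conditions} = 0 for simplicity; all names and sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

n_conditions, n_individuals_per_condition, n_cells_per_individual = 2, 5, 100
mu, sigma_conditions, sigma_individuals, sigma_cells = 1.0, 0.5, 0.3, 0.2

# Level 1: condition effects
b_condition = rng.normal(0.0, sigma_conditions, size=n_conditions)

y = []
for c in range(n_conditions):
    # Level 2: individual effects, centered on their condition's effect
    b_individual = rng.normal(b_condition[c], sigma_individuals,
                              size=n_individuals_per_condition)
    for b_i in b_individual:
        # Level 3: cells, centered on their individual's effect
        y.append(rng.normal(mu + b_i, sigma_cells, size=n_cells_per_individual))

# Cells from the same individual (and condition) end up correlated.
y = np.concatenate(y)
```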

I was thinking of how the conditioning on batch works in scVI. With the latent variables we have basically

x_1 = f_1(z, batch), f_1 : R^10 -> R^128
x_2 = f_2(x_1), f_2 : R^128 -> R^nGenes
y ~ N(x_2, sigma)  (assuming normal instead of ZINB/NB)

My impression is that within-batch correlation gets implicitly handled by putting latent observations in different parts of the R^128 space?
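For reference, a minimal PyTorch sketch (not scvi-tools code) of the conditioning described above, assuming the batch enters as a one-hot vector concatenated to z; the layer sizes follow the R^10 -> R^128 -> R^nGenes maps, everything else is illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n_latent, n_hidden, n_genes, n_batches = 10, 128, 2000, 4

f_1 = nn.Sequential(nn.Linear(n_latent + n_batches, n_hidden), nn.ReLU())
f_2 = nn.Linear(n_hidden, n_genes)

z = torch.randn(32, n_latent)  # one z per cell in the minibatch
batch = F.one_hot(torch.randint(0, n_batches, (32,)), n_batches).float()

x_1 = f_1(torch.cat([z, batch], dim=-1))  # f_1(z, batch)
x_2 = f_2(x_1)                            # per-gene decoder output
```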

Would it make sense to do something like this?

x_1 = f_1(z, condition), f_1 : R^10 -> R^128
x_2 = f_1(z, individual), f_2 : R^128 -> R^128
x_3 = f_2(x_2), f_2 : R^128 -> R^nGenes
y ~ N(x_3, sigma)

Best, /Valentine

romain-lopez commented 4 years ago

Hi Valentine,

I'm not sure the exact model you wrote would be useful. What I'm thinking of right now, though, is the following generative model:

b_{condition} ~ N(mu_{conditions}, sigma_{conditions}) \in R^k
b_{individual} ~ N(mu_{individuals} + b_{condition}, sigma_{individuals})
x_1 = f_1(z, b_{individual}), f_1 : R^10 x R^k -> R^128
x_2 = f_2(x_1), f_2 : R^128 -> R^nGenes
y ~ NB(x_2, theta)

Then you need to perform variational inference over all of the global variables (the b_individual as well as the b_condition) AND over the local variables (the z). This is akin to what we did for AutoZI, where we had one global latent variable for each gene! Here, you could use these variables to cluster patients, for example.
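A rough sketch (not actual scvi-tools or AutoZI code) of what classical VI over the globals could look like: the variational posteriors over b_{condition} and b_{individual} are mean-field Gaussians whose parameters are plain tensors optimized by SGD, with no encoder network involved; all names and sizes are illustrative:

```python
import torch
import torch.nn as nn

n_conditions, n_individuals, k = 2, 10, 5

# Global variational parameters: one posterior for the whole dataset.
q_cond_mu = nn.Parameter(torch.zeros(n_conditions, k))
q_cond_logvar = nn.Parameter(torch.zeros(n_conditions, k))
q_ind_mu = nn.Parameter(torch.zeros(n_individuals, k))
q_ind_logvar = nn.Parameter(torch.zeros(n_individuals, k))

def sample_globals():
    # Reparameterized draws of b_condition and b_individual,
    # taken once per minibatch.
    b_cond = q_cond_mu + torch.randn_like(q_cond_mu) * (0.5 * q_cond_logvar).exp()
    b_ind = q_ind_mu + torch.randn_like(q_ind_mu) * (0.5 * q_ind_logvar).exp()
    return b_cond, b_ind
```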

romain-lopez commented 4 years ago

I have been thinking about it for a bit, actually (and also discussed it with other people), but I have had neither the time nor the data to investigate it properly.

vals commented 4 years ago

Hi Romain,

Thanks for the feedback!

Oops, I realize now that I wrote "f_1(z, c[...]) ... f_1(z, i[...]) ... f_2(x_2)" when I meant "f_1(z, c[...]) ... f_2(x_1, i[...]) ... f_3(x_2)". But you probably inferred what I meant.
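For concreteness, a minimal sketch of that corrected chain under the same dimension assumptions, with hypothetical one-hot encodings for condition and individual:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n_latent, n_hidden, n_genes = 10, 128, 2000
n_conditions, n_individuals = 2, 10

f_1 = nn.Sequential(nn.Linear(n_latent + n_conditions, n_hidden), nn.ReLU())
f_2 = nn.Sequential(nn.Linear(n_hidden + n_individuals, n_hidden), nn.ReLU())
f_3 = nn.Linear(n_hidden, n_genes)

z = torch.randn(32, n_latent)
cond = F.one_hot(torch.randint(0, n_conditions, (32,)), n_conditions).float()
ind = F.one_hot(torch.randint(0, n_individuals, (32,)), n_individuals).float()

x_1 = f_1(torch.cat([z, cond], dim=-1))   # f_1(z, condition)
x_2 = f_2(torch.cat([x_1, ind], dim=-1))  # f_2(x_1, individual)
x_3 = f_3(x_2)                            # per-gene output
```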

Regarding the model you propose: it's an interesting mix of an LMM and the VAE. I was looking into a similar thing some time last year, wondering whether anyone had tried using AEVB to fit linear mixed models. I couldn't find anything, and when trying to write out how it would be done, I got stuck on how the per-observation latent variables for the b's would work. Would you do AEVB for the b's or traditional VI?

I'll have a look at AutoZI and see how it works there.

adamgayoso commented 4 years ago

I am slightly confused...

b_{condition} ~ N(mu_{conditions}, sigma_{conditions}) \in R^k
b_{individual} ~ N(mu_{individuals} + b_{condition}, sigma_{individuals}),
x_1 = f_1(z, b_{individual}), f_1 : R^10 x R^k -> R^128,
x_2 = f_2(x_1), f_2 : R^128 -> R^nGenes
y ~ NB(x_2, theta)

Shouldn't there be some one-hot vectors to isolate the condition/individual effects? In any case, if I'm understanding this model correctly, it should be fine to do AEVB for everything. There is also some description at the very end of the AEVB paper appendix. I guess the way to think about it is: you globally draw from q(b_{c}, b_{i}) once for a minibatch, then for each cell in the minibatch you draw from q(z), making sure the decoder gets the condition/individual effect that corresponds to that cell.
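A small sketch of that minibatch logic, with integer indices standing in for the one-hot vectors (all names illustrative):

```python
import torch

k, n_individuals, n_latent = 5, 10, 10

b_ind = torch.randn(n_individuals, k)        # stand-in for one draw from q(b_individual)
individual_idx = torch.tensor([0, 0, 3, 7])  # which individual each cell comes from
z = torch.randn(4, n_latent)                 # amortized q(z) draws, one per cell

# Each cell's decoder input picks up the b_individual row for its own
# individual, so the decoder gets the effect corresponding to that cell.
decoder_input = torch.cat([z, b_ind[individual_idx]], dim=-1)
```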

vals commented 4 years ago

Hm, but how do you encode e.g. the mu_{c} from the data? That is, if you do mu_{c} = g({data}), what would you pass to g? Gene expression? Or just a one-hot vector representing the condition membership?

It was a point I was confused about in the AEVB paper too. They write that you can do it on categorical data but I couldn't clearly see how it would be done.

romain-lopez commented 4 years ago

AEVB proposes to perform inference over local latent variables via a neural network parameterization of the variational approximation. For global latent variables, you can just use classical VI, which is what is described in the appendix of the AEVB paper and what we do for AutoZI. For these, there are no neural nets, and the parameters of the posterior are just global parameters to optimize over with stochastic gradient descent.

A local variable means one posterior per datapoint; a global variable means one posterior for the whole dataset.

Does this make sense ?

romain-lopez commented 4 years ago

In AutoZI, delta and m are global while z and l are local. If you look at the variational distribution in the AutoZI paper, you see that delta does not depend on the datapoint (no index n), while l and z do.

What I am saying is that we can extend these concepts to share latent variables between batches, using the conditions for example.

vals commented 4 years ago

I see. I think in my mind a 'local latent variable' is what I'd call a 'latent variable', while a 'global latent variable' is what I'd call a 'parameter'. But I guess this clashes with 'parameter' being used for both variational distributions and neural nets.

romain-lopez commented 4 years ago

There is a subtle difference: "parameters" are not random variables but deterministic vectors or scalars. You can parameterize both a variational distribution and the generative model. In AutoZI, theta are parameters of the generative model, while alpha_g and beta_g are parameters of the variational distribution.

adamgayoso commented 4 years ago

Closing this due to inactivity. Feel free to reopen for future discussion.