greenelab / tybalt

Training and evaluating a variational autoencoder for pan-cancer gene expression data
BSD 3-Clause "New" or "Revised" License

Sampling space for specific genes #145

Closed spadavec closed 5 years ago

spadavec commented 5 years ago

Hey, great work on this project! I'm trying to repurpose this for something a little different. After training the VAE, I would like to focus on particular genes, sample them within a specific range, and see how that affects the distributions of the other genes. Is that at all possible?

gwaybio commented 5 years ago

Hi @spadavec, thanks for your interest in the project!

Yes, I imagine that this is possible. One could imagine taking a couple real samples, fixing all gene measurements except for one (or a few) and replacing them with some sampling distribution. Then, take the now "simulated" samples and compress and decompress them with the trained network.

The compression will show how latent space features react to the gene expression changes (and possibly identify a single module, or a small group of modules, that captures that specific signal). The decompression will show how specific genes change, by comparing the real output to the decompressed output of the simulated samples. This would also depend on how well those genes are reconstructed in the first place!
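A minimal sketch of the experiment described above, assuming NumPy and a trained model exposed as two callables. `toy_encode` and `toy_decode` are hypothetical stand-ins (a random linear map and a sigmoid), not Tybalt's API; in practice they would be replaced by the trained encoder's and decoder's predict functions. The sketch also assumes inputs scaled to [0, 1] and an encoder that returns the latent mean.

```python
import numpy as np


def perturb_gene(samples, gene_idx, values):
    """Copy `samples` once per value, overwriting one gene with that value.

    Returns an array of shape (n_values, n_samples, n_genes).
    """
    samples = np.asarray(samples, dtype=float)
    values = np.asarray(values, dtype=float)
    simulated = np.repeat(samples[None, :, :], len(values), axis=0)
    simulated[:, :, gene_idx] = values[:, None]
    return simulated


def perturbation_effect(samples, gene_idx, values, encode, decode):
    """Compress and decompress simulated samples and compare to the real ones.

    Returns (latent_shift, gene_shift), both measured relative to the
    encoding and reconstruction of the unmodified samples.
    """
    samples = np.asarray(samples, dtype=float)
    baseline_latent = encode(samples)
    baseline_recon = decode(baseline_latent)

    simulated = perturb_gene(samples, gene_idx, values)
    n_values, n_samples, n_genes = simulated.shape
    latent = encode(simulated.reshape(-1, n_genes))
    recon = decode(latent)

    latent_shift = latent.reshape(n_values, n_samples, -1) - baseline_latent[None]
    gene_shift = recon.reshape(n_values, n_samples, n_genes) - baseline_recon[None]
    return latent_shift, gene_shift


# Hypothetical stand-in model: NOT the trained VAE, only here so the sketch runs.
rng = np.random.default_rng(0)
n_genes, n_latent = 20, 4
W = rng.normal(scale=0.3, size=(n_genes, n_latent))


def toy_encode(x):
    return x @ W


def toy_decode(z):
    return 1.0 / (1.0 + np.exp(-(z @ W.T)))


real = rng.uniform(size=(5, n_genes))      # placeholder for a few real samples
sweep = np.linspace(0.0, 1.0, 11)          # values to substitute for gene 3
latent_shift, gene_shift = perturbation_effect(real, 3, sweep, toy_encode, toy_decode)

# Latent features that move most, averaged over samples and sweep values.
most_responsive = np.argsort(-np.abs(latent_shift).mean(axis=(0, 1)))
```

Here `latent_shift` suggests which latent features respond to the swept gene, and `gene_shift` shows how the reconstruction of every other gene moves with it. Whether either is meaningful depends on how well the gene is reconstructed to begin with.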

We also haven't done this evaluation before, so this might not even work at all :man_shrugging:. I think @cgreene and maybe @ajlee21 have thought about this question a bit more though!

spadavec commented 5 years ago

After thinking about this some more, I believe the best approach is to just sample the decoder en masse, and find "samples" that have values for the particular gene within the desired range, and see how the other genes are expressed.
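One way this could look, as a rough sketch rather than anything from the Tybalt codebase: draw latent vectors from a standard normal prior, decode them, keep the generated "samples" whose target gene falls in the desired range, and summarize the remaining genes. `toy_decode`, the gene index, and the range are hypothetical placeholders; the trained decoder's predict function would take the place of `toy_decode`.

```python
import numpy as np


def sample_decoder(decode, n_latent, n_draws, rng):
    """Decode `n_draws` latent vectors drawn from a standard normal prior."""
    z = rng.standard_normal((n_draws, n_latent))
    return decode(z)


def conditional_summary(generated, gene_idx, low, high):
    """Summarize all genes over generated samples whose target gene is in [low, high]."""
    target = generated[:, gene_idx]
    mask = (target >= low) & (target <= high)
    if not mask.any():
        raise ValueError("no generated samples fall in the requested range")
    selected = generated[mask]
    return {
        "n_selected": int(mask.sum()),
        "mean": selected.mean(axis=0),
        "std": selected.std(axis=0),
        # Difference from the average over all generated samples.
        "mean_shift": selected.mean(axis=0) - generated.mean(axis=0),
    }


# Hypothetical stand-in decoder so the sketch runs without a trained model.
rng = np.random.default_rng(0)
n_genes, n_latent = 20, 4
W = rng.normal(size=(n_genes, n_latent))


def toy_decode(z):
    return 1.0 / (1.0 + np.exp(-(z @ W.T)))


generated = sample_decoder(toy_decode, n_latent, n_draws=10_000, rng=rng)
summary = conditional_summary(generated, gene_idx=3, low=0.6, high=1.0)
```

`summary["mean_shift"]` then gives a first look at which other genes tend to be higher or lower when the target gene is in range. Note that this only filters what the decoder happens to generate, so a narrow range may need many draws.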

Closing.

cgreene commented 5 years ago

You might think about swapping out cross entropy. I think that's the main thing that keeps zero-one required.
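To illustrate the point about cross entropy, here is a small NumPy sketch (not Tybalt's loss code; the function names are made up for the example). Binary cross entropy is only a sensible reconstruction loss when targets lie in [0, 1]; squared error has its minimum at the target for any value.

```python
import numpy as np


def binary_cross_entropy(target, pred, eps=1e-7):
    """Element-wise binary cross entropy; `pred` is clipped away from 0 and 1."""
    pred = np.clip(pred, eps, 1 - eps)
    return -(target * np.log(pred) + (1 - target) * np.log(1 - pred))


def squared_error(target, pred):
    """Element-wise squared error; minimized at pred == target for any target."""
    return (target - pred) ** 2


preds = np.linspace(0.01, 0.99, 99)

# Target inside [0, 1]: cross entropy is lowest when the prediction matches it.
best_in_range = preds[np.argmin(binary_cross_entropy(0.3, preds))]

# Target outside [0, 1]: the loss goes negative and keeps falling as pred -> 1,
# so it no longer rewards matching the target.
loss_out_of_range = binary_cross_entropy(2.0, 0.999)
```

If the loss were swapped for something like squared error, the output activation would presumably also need to change (for example to a linear one), since a sigmoid output cannot produce values outside (0, 1). That part is an assumption about the architecture, not something stated in this thread.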
