greenelab / deep-review

A collaboratively written review paper on deep learning, genomics, and precision medicine
https://greenelab.github.io/deep-review/

druGAN: an advanced generative adversarial autoencoder model for de novo generation of new molecules with desired molecular properties in silico #583

Open alxndrkalinin opened 7 years ago

alxndrkalinin commented 7 years ago

https://doi.org/10.1021/acs.molpharmaceut.7b00346

Deep generative models are emerging technologies in drug discovery and biomarker development. In our recent work, we demonstrated a proof-of-concept of implementing deep generative adversarial autoencoder (AAE) to identify new molecular fingerprints with pre-defined anti-cancer properties. Another popular generative model is the variational autoencoder (VAE), which is based on deep neural architectures. In this work, we developed an advanced AAE model for molecular feature extraction problems and proved its superiority to VAE in terms of a) adjustability in generating molecular fingerprints; b) capacity of processing huge molecular datasets; and c) efficiency in unsupervised pretraining for the regression model. Our results suggest that the proposed AAE model significantly enhances the capacity and efficiency of development of the new molecules with specific anti-cancer properties using the deep generative models.

I only glanced through, but looks like an extension of #213 comparing AAE and VAE for drug generation

agitter commented 7 years ago

cc @gwaygenomics who summarized #213

gwaybio commented 7 years ago

thanks for tagging me @agitter - I just read the paper and will summarize below

gwaybio commented 7 years ago

Computational Aspects

but looks like an extension of #213

It is an extension of #213: more data (72 million molecules from PubChem) and a larger architecture (more layers, larger latent space).

comparing AAE and VAE for drug generation

This is the main focus of the paper, with the conclusion that AAE is the preferred framework because of increased "capacity and efficiency". This references Table 1, which I found difficult to understand.

Biological Aspects

Besides a description of input data, this paper seems to be more focused on architecture selection than on discovering novel biology. The paper does discuss predicting solubility of molecules using pretrained weights from each model.

Notes

gwaybio commented 7 years ago

Perhaps @spoilt333 could provide a better summary and clear up some confusion!

spoilt333 commented 7 years ago

Hi! It is an extension of our previous paper, but focused on architectures. As far as I can see, there are two questions:

I wonder if the "deterministic warm-up" discussed in the LVAE paper is similar or would also help here?

Actually, we didn't tune the VAE network as much as the AAE, so it isn't a very fair comparison. I mean that one could introduce upgrades to the VAE and outperform our AAE. However, the changes we introduced for the AAE were mostly about the training scheme, except for removing BN (batch normalization), which we applied to both networks. So the difference in tuning wasn't very large, and we compared almost basic architectures.
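For readers unfamiliar with the "deterministic warm-up" mentioned in the question above: the idea is to scale the KL term of the VAE objective by a weight that rises linearly from 0 to 1 over the first training epochs, so the model starts out close to a plain autoencoder. The snippet below is only a minimal sketch of that schedule. The function names `kl_weight` and `vae_loss` and the default of 200 warm-up epochs are illustrative assumptions, not code or settings from the druGAN paper.

```python
def kl_weight(epoch: int, warmup_epochs: int) -> float:
    """Linear warm-up: weight goes from 0 to 1 over `warmup_epochs` epochs."""
    if warmup_epochs <= 0:
        # No warm-up requested: always use the full KL term.
        return 1.0
    return min(1.0, epoch / warmup_epochs)


def vae_loss(recon_loss: float, kl_div: float, epoch: int,
             warmup_epochs: int = 200) -> float:
    """Reconstruction loss plus the KL term scaled by the warm-up weight.

    `warmup_epochs=200` is an arbitrary illustrative default.
    """
    return recon_loss + kl_weight(epoch, warmup_epochs) * kl_div


if __name__ == "__main__":
    # Print the schedule at a few epochs for a 10-epoch warm-up.
    for epoch in (0, 5, 10, 20):
        print(epoch, kl_weight(epoch, warmup_epochs=10))
```

Whether this would help the VAE baseline here is an open question; as noted above, the VAE was not tuned as much as the AAE.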

Based on figure 2, it doesn't look like the models are reducing reconstruction loss over training epochs much at all. There are also not many training epochs in general. Were more performed?

There are so many molecules in PubChem that it isn't actually necessary to train the autoencoder for several epochs. However, because of the balance between generation and reconstruction, we can't stop updating the AE part while training the generation part.
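The point about not being able to stop the AE updates can be illustrated with the usual adversarial autoencoder schedule, in which every minibatch goes through a reconstruction phase and a regularization phase. The skeleton below is a hedged sketch of that generic schedule only. The function `train_aae_epoch` and the three step callables are hypothetical placeholders, and this is not the druGAN training code.

```python
from typing import Any, Callable, Iterable, List

Step = Callable[[Any], None]


def train_aae_epoch(batches: Iterable[Any],
                    reconstruction_step: Step,
                    discriminator_step: Step,
                    generator_step: Step) -> List[str]:
    """Run one pass over `batches`, alternating the AAE phases per minibatch.

    Returns the order in which the phases were executed.
    """
    log: List[str] = []
    for batch in batches:
        # Reconstruction phase: update encoder and decoder on the
        # reconstruction loss.
        reconstruction_step(batch)
        log.append("reconstruction")
        # Regularization phase, part 1: update the discriminator to separate
        # samples from the prior from latent codes produced by the encoder.
        discriminator_step(batch)
        log.append("discriminator")
        # Regularization phase, part 2: update the encoder (acting as the
        # generator) to fool the discriminator.
        generator_step(batch)
        log.append("generator")
    return log


if __name__ == "__main__":
    # No-op steps stand in for real optimizer updates.
    noop: Step = lambda batch: None
    print(train_aae_epoch([0, 1], noop, noop, noop))
```

Because the encoder is updated in both phases, freezing the reconstruction step while continuing the adversarial updates would let the latent codes drift away from what the decoder was trained on.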

cmorris2945 commented 6 years ago

May I see the research paper please?