greenelab / deep-review

A collaboratively written review paper on deep learning, genomics, and precision medicine
https://greenelab.github.io/deep-review/

Seq2seq Fingerprint: An Unsupervised Deep Molecular Embedding for Drug Discovery #652

Open alxndrkalinin opened 7 years ago

alxndrkalinin commented 7 years ago

https://doi.org/10.1145/3107411.3107424

Many of today's drug discoveries require expert knowledge and extremely expensive biological experiments to identify chemical molecular properties. However, despite growing interest in using supervised machine learning algorithms to identify those properties automatically, performance and accuracy have advanced little because of the limited amount of training data. In this paper, we propose a novel unsupervised molecular embedding method that provides a continuous feature vector for each molecule for use in further tasks, e.g., solubility classification. In the proposed method, a multi-layered Gated Recurrent Unit (GRU) network maps the input molecule into a continuous feature vector of fixed dimensionality, and another deep GRU network then decodes the continuous vector back to the original molecule. As a result, the continuous encoding vector is expected to contain enough information to recover the original molecule and predict its chemical properties. The proposed embedding method can use almost unlimited molecule data for the training phase. With sufficient information encoded in the vector, the proposed method is also robust and task-insensitive. The performance and robustness are confirmed and interpreted in our extensive experiments.
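To make the encoder half of the abstract concrete, here is a minimal sketch of a GRU folding a variable-length molecule string into a fixed-size vector. It is not the authors' implementation. It assumes the molecule is given as a SMILES string tokenized one character at a time, uses a single layer instead of the multi-layered network described, leaves out bias terms, and uses random, untrained weights. The decoder and the reconstruction training that would make the vector meaningful are omitted. The class name `TinyGRUEncoder` and all parameters are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class TinyGRUEncoder:
    """Single-layer, untrained GRU (no biases) that folds a string into a fixed-size vector."""

    def __init__(self, vocab, hidden_size=8, seed=0):
        rng = random.Random(seed)
        self.index = {ch: i for i, ch in enumerate(vocab)}
        self.hidden_size = hidden_size
        n_in = len(vocab)

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        # Input-to-hidden (W) and hidden-to-hidden (U) weights for the
        # update gate z, the reset gate r, and the candidate state n.
        self.W = {g: mat(hidden_size, n_in) for g in "zrn"}
        self.U = {g: mat(hidden_size, hidden_size) for g in "zrn"}

    def step(self, ch, h):
        # One-hot encode the current character.
        x = [0.0] * len(self.index)
        x[self.index[ch]] = 1.0
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        # The reset gate scales how much of the old state feeds the candidate.
        rh = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b) for a, b in zip(matvec(self.W["n"], x), matvec(self.U["n"], rh))]
        # The update gate interpolates between the old state and the candidate.
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

    def encode(self, smiles):
        h = [0.0] * self.hidden_size
        for ch in smiles:
            h = self.step(ch, h)
        return h  # final hidden state = fixed-length embedding


# Illustrative vocabulary built from three example SMILES strings.
vocab = sorted(set("CCO" + "c1ccccc1" + "CC(=O)O"))
encoder = TinyGRUEncoder(vocab, hidden_size=8)
ethanol = encoder.encode("CCO")
benzene = encoder.encode("c1ccccc1")
print(len(ethanol), len(benzene))  # both vectors have length hidden_size
```

Both strings come out as vectors of the same length even though their inputs differ in length, which is the property that lets a downstream classifier consume the embedding. In the method described above, the weights would be learned by training a second GRU network to reconstruct the input from this vector.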

XericZephyr commented 6 years ago

Hi, @alxndrkalinin,

This is Zheng, the lead author of this paper. I happened to come across this thread, and it seems that you are interested in our paper. Let me know if you have any questions. And thanks for your interest!

BTW, we have recently open-sourced our official implementation of this paper. Link: https://github.com/XericZephyr/seq2seq-fingerprint Also, the DeepChem folks recently wrote a tutorial on this paper as well. Link: https://deepchem.io/docs/notebooks/seqtoseq_fingerprint.html

alxndrkalinin commented 6 years ago

Hi @XericZephyr, thanks for your suggestion! I was merely cataloguing an interesting paper for the review we wrote using this repo, but it's great to have the info you provided here – it might be useful for future iterations of this project.

agitter commented 6 years ago

Thanks for the notes @XericZephyr. I'm interested in this area and could add this paper to the next deep review release. It fits well in our Chemical featurization and representation learning section.

XericZephyr commented 6 years ago

Hi, @agitter, @alxndrkalinin.

Thanks again for writing such a great review paper on AI in drug discovery. Feel free to contact me if you have any questions about the details of our paper.