agitter opened this issue 7 years ago
Right. I was just about to post this one earlier. It also seems relevant to the "wide data, few samples" issue, since it reduces the sample size requirement.
This paper can be featured in the Treat section as well as the data limitations, code sharing, and transfer learning sub-sections of the Discussion.
Many of the virtual screening methods (see #45) require large training datasets with thousands or millions of instances, where an instance is a chemical and its activity in an assay of interest. In practice, a typical chemical screen may have far less training data to work with. The authors propose one-shot learning to overcome the sparse training data, taking advantage of side information in the form of prior screening data for other assays.
The main idea is that the network will use the related screens to learn a mapping from chemical compounds (often featurized with discrete features or a molecular graph) into a continuous space and a similarity measure between chemicals in the continuous space. Nearest neighbor-like approaches in that continuous space can then be applied to make predictions for a new assay (aka task) with very limited task-specific training examples. Though the methods differ substantially, at a high level the mapping to a continuous space reminds me of the unsupervised #104.
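To make the embedding-plus-similarity idea concrete, here is a minimal sketch (not the paper's actual architecture): a toy linear map stands in for the learned graph-convolutional encoder, and a query compound is labeled by similarity-weighted voting over the small labeled support set in the continuous space. All function and variable names are hypothetical.

```python
import numpy as np

def embed(x, W):
    """Toy embedding: a linear map plus nonlinearity, standing in for
    the learned graph-convolutional encoder (hypothetical)."""
    return np.tanh(W @ x)

def one_shot_predict(query, support_x, support_y, W):
    """Score a query compound by cosine similarity to the few labeled
    support compounds, then softmax-weight their labels (a simplified
    attention-style nearest-neighbor readout)."""
    q = embed(query, W)
    sims = np.array([
        q @ embed(x, W)
        / (np.linalg.norm(q) * np.linalg.norm(embed(x, W)) + 1e-9)
        for x in support_x
    ])
    weights = np.exp(sims) / np.exp(sims).sum()
    return float(weights @ support_y)  # probability-like "active" score
```

A query near the active cluster in embedding space gets a score above 0.5; one near the inactive cluster scores below it. The real model learns `W` (and much richer encoders) jointly with the similarity measure from the related screens.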
Getting more technical, they compare a Siamese network and two LSTM-based approaches for executing this strategy. All of them build upon graph convolutions related to their earlier #53. The Dual Residual LSTM has the best performance of the three.
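The defining property shared by these architectures is that one encoder with shared weights embeds both compounds being compared. A hedged sketch of that idea, using a generic contrastive loss rather than the paper's actual training objective (names and the loss choice are illustrative):

```python
import numpy as np

def shared_encoder(x, W):
    # The same weights W are applied to both inputs --
    # the defining property of a Siamese network.
    return np.tanh(W @ x)

def contrastive_loss(x1, x2, same_label, W, margin=1.0):
    """Pull pairs with the same assay outcome together in embedding
    space; push differing pairs at least `margin` apart."""
    d = np.linalg.norm(shared_encoder(x1, W) - shared_encoder(x2, W))
    if same_label:
        return d ** 2
    return max(0.0, margin - d) ** 2
```

The LSTM-based variants in the paper go further by iteratively refining the embeddings conditioned on the support set, which is what the residual connections help with.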
They evaluate the model in a very challenging setting where at most 10 positive and 10 negative instances are provided for the assay of interest, along with the side information for related assays. The datasets include Tox21 and MUV, which have been used previously, along with the SIDER dataset on drug side effects, which is especially relevant for the treat discussion. With so few training samples, a random forest baseline has very little predictive power except on MUV (which has special structure). The residual LSTM is able to train reasonably well on limited data.
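The evaluation protocol amounts to repeatedly sampling a tiny support set per assay and holding out the rest as queries. A minimal sketch of that episode construction (the function and parameter names are my own, not from their code):

```python
import random

def sample_episode(instances, n_pos=10, n_neg=10, seed=0):
    """Split one assay's (features, label) pairs into a small support
    set (at most n_pos actives and n_neg inactives) and a held-out
    query set, mimicking the paper's low-data evaluation setting."""
    rng = random.Random(seed)
    pos = [i for i in instances if i[1] == 1]
    neg = [i for i in instances if i[1] == 0]
    rng.shuffle(pos)
    rng.shuffle(neg)
    support = pos[:n_pos] + neg[:n_neg]
    query = pos[n_pos:] + neg[n_neg:]
    return support, query
```

With only 20 labeled examples in the support set, it is unsurprising that a standard supervised baseline like random forest struggles; the one-shot models succeed by borrowing statistical strength from the related assays.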
One-shot learning will be a great contrast to multitask methods in this domain. They show that training on Tox21 tasks and evaluating on SIDER does not work well, so there is a limit to the transferability.
The code (https://github.com/deepchem/deepchem) adds a lot of value and provides public data with which to test their models and reproduce their results.
It will be very interesting to see what happens in the "intermediate" data range, where one has more than 20 assay-specific examples but not tens of thousands. They are likely working on this (#148), but it is not featured here.
Updated with the DOI of the published version. Also adding this link to the accompanying press release http://news.stanford.edu/press-releases/2017/04/03/deep-learning-aldrug-development/
A nice perspective article on this work http://doi.org/10.1021/acscentsci.7b00153
https://doi.org/10.1021/acscentsci.6b00367 (preprint https://arxiv.org/abs/1611.03199)
This looks very exciting, in part because of their open source software https://github.com/deepchem/deepchem