greenelab / deep-review

A collaboratively written review paper on deep learning, genomics, and precision medicine
https://greenelab.github.io/deep-review/

Accurate and efficient target prediction using a potency-sensitive influence-relevance voter #229

Open swamidass opened 7 years ago

swamidass commented 7 years ago

https://jcheminf.springeropen.com/articles/10.1186/s13321-015-0110-6

Edit: https://doi.org/10.1186/s13321-015-0110-6

Background: A number of algorithms have been proposed to predict the biological targets of diverse molecules. Some are structure-based, but the most common are ligand-based and use chemical fingerprints and the notion of chemical similarity. These methods tend to be computationally faster than others, making them particularly attractive tools as the amount of available data grows.

Results: Using a ChEMBL-derived database covering 490,760 molecule-protein interactions and 3236 protein targets, we conduct a large-scale assessment of the performance of several target-prediction algorithms at predicting drug-target activity. We assess algorithm performance using three validation procedures: standard tenfold cross-validation, tenfold cross-validation in a simulated screen that includes random inactive molecules, and validation on an external test set composed of molecules not present in our database.

Conclusions: We present two improvements over current practice. First, using a modified version of the influence-relevance voter (IRV), we show that using molecule potency data can improve target prediction. Second, we demonstrate that random inactive molecules added during training can boost the accuracy of several algorithms in realistic target-prediction experiments. Our potency-sensitive version of the IRV (PS-IRV) obtains the best results on large test sets in most of the experiments. Models and software are publicly accessible through the chemoinformatics portal at http://chemdb.ics.uci.edu/
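For readers outside cheminformatics, the "random inactive molecules" idea from the abstract can be sketched roughly as follows. This is purely illustrative; the function name and data are hypothetical stand-ins, not the paper's code.

```python
# Illustrative sketch of padding training data with randomly drawn
# presumed-inactive molecules, as described in the abstract.
# All names and data here are hypothetical.
import numpy as np

def add_random_inactives(X, y, decoy_pool, n_decoys, rng):
    """Append n_decoys presumed-inactive molecules drawn from a decoy pool."""
    picks = rng.choice(len(decoy_pool), size=n_decoys, replace=False)
    X_aug = np.vstack([X, decoy_pool[picks]])
    y_aug = np.concatenate([y, np.zeros(n_decoys, dtype=y.dtype)])
    return X_aug, y_aug

rng = np.random.default_rng(0)
X = (rng.random((50, 1024)) > 0.9).astype(float)       # stand-in fingerprints of known actives
y = np.ones(50, dtype=int)
decoys = (rng.random((10000, 1024)) > 0.9).astype(float)
X_aug, y_aug = add_random_inactives(X, y, decoys, 500, rng)
```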

agitter commented 7 years ago

@swamidass I am reading through the papers you posted and have a couple quick questions. I also edited the original post.

Is the IrvPred web server and source code at http://chemdb.ics.uci.edu/cgibin/tools/IrvPredWeb.py the software referenced in this paper?

In this comparison with SVM and Random Forest, the features come from fingerprint similarity. This makes sense because it is most similar to the IRV approach. Have you directly compared standard classifiers (e.g. Random Forest) trained with fingerprint similarity features versus using the fingerprint bit vector directly as the features? I haven't yet surveyed everything you posted and am trying to prioritize my reading.
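For concreteness, the two feature representations in question can be sketched as follows. This is a minimal illustration with random stand-in data, not code from the paper.

```python
# Minimal sketch: Tanimoto similarity between binary fingerprints, and the
# two feature representations discussed above. Data are random stand-ins.
import numpy as np

def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two binary fingerprints."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    union = np.sum(a | b)
    return np.sum(a & b) / union if union else 0.0

train_fps = np.random.rand(100, 1024) > 0.9   # stand-in for circular fingerprints
test_fp = np.random.rand(1024) > 0.9

# Option 1: the raw bit vector as features (FP -> classifier).
features_raw = test_fp.astype(float)

# Option 2: a similarity profile against the training molecules, which is
# what similarity-based methods like the IRV effectively operate on.
features_sim = np.array([tanimoto(test_fp, fp) for fp in train_fps])
```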

swamidass commented 7 years ago

I believe that is the software, though it is hosted on Baldi's site, so I cannot be 100% sure.

In this study, we did not compare to RF or fingerprint bit vectors as a direct input to a neural network (I'll abbreviate this as FP -> NN).

  1. RF was excluded because the most commonly used RF software is not free to academics. Moreover, in this domain, RF usually does not outperform SVMs (they are about equivalent). So, if we can consistently outperform SVMs (which we can), we can fairly infer that we would also outperform RF.

  2. FP -> NN was not included because it usually performs poorly compared to SVMs with Tanimoto similarity on circular fingerprints. This is considered "common knowledge" in the field, so reviewers just do not ask for it. This also makes all the deep learning papers that show improvement over FP -> NN without even trying Tanimoto similarity (I'm not going to name names) unconvincing. It is really just a strawman method. At minimum, the real comparison should be to SVMs using Tanimoto similarity (or MinMax similarity) as the kernel function; see the sketch after this list.

  3. There is some unsubstantiated belief in the field that Naive Bayes classifiers can work, so we always include them. However, they usually perform poorly.
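For reference, the kind of Tanimoto-kernel SVM baseline argued for in point 2 can be sketched like this. It assumes binary fingerprints stored as 0/1 arrays; the data are random stand-ins and this is not our published code.

```python
# Hedged sketch: an SVM with a precomputed Tanimoto kernel.
import numpy as np
from sklearn.svm import SVC

def tanimoto_kernel(A, B):
    """Pairwise Tanimoto similarity between two sets of 0/1 fingerprints."""
    A, B = np.asarray(A, dtype=float), np.asarray(B, dtype=float)
    inter = A @ B.T                                   # shared on-bits
    union = A.sum(1)[:, None] + B.sum(1)[None, :] - inter
    return np.where(union > 0, inter / union, 0.0)

X_train = (np.random.rand(200, 1024) > 0.9).astype(float)  # stand-in fingerprints
y_train = np.random.randint(0, 2, 200)                     # stand-in activity labels
X_test = (np.random.rand(20, 1024) > 0.9).astype(float)

clf = SVC(kernel="precomputed")
clf.fit(tanimoto_kernel(X_train, X_train), y_train)
scores = clf.decision_function(tanimoto_kernel(X_test, X_train))
```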

In this study, I think there are a couple of key things to focus on:

  1. The IRV is a very low-parameter approach (around 10 weights, thanks to extensive weight sharing) that works extremely well. At this point, I do not think anything else has consistently outperformed it. See the sketch after this list.

  2. It also enables injection of additional data into the model. The inclusion of potency data significantly improves results, and other methods have no natural way of incorporating this information. This is one of the big advantages of Deep Learning approaches (even though this particular method is not super deep).

  3. The IRV is also interpretable in some key ways. This is particularly important to emphasize: with the right structure, Deep Learning can be interpretable.
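To make the weight sharing in point 1 concrete, here is a simplified, hypothetical sketch of an IRV-style forward pass. The parameter names and the exact form of the relevance and vote terms are simplifications, not the published implementation.

```python
# Simplified IRV-style scoring: each of the k nearest training neighbors
# contributes influence = relevance * vote, with the same small set of
# weights shared across all neighbors.
import numpy as np

def irv_forward(sims, labels, params, k=10):
    """Score one query molecule from its k nearest training neighbors.

    sims   : Tanimoto similarities of the query to all training molecules
    labels : 0/1 activity labels of the training molecules
    params : small shared weight set (hypothetical names)
    """
    order = np.argsort(sims)[::-1][:k]          # k most similar neighbors
    s, y = sims[order], labels[order]
    rank = np.arange(1, k + 1)

    # Relevance: a tiny shared function of similarity and rank.
    relevance = np.tanh(params["ws"] * s + params["wr"] * rank + params["br"])
    # Vote: a shared weight per class (extended with potency in PS-IRV).
    vote = np.where(y == 1, params["w_active"], params["w_inactive"])

    z = params["bias"] + np.sum(relevance * vote)
    return 1.0 / (1.0 + np.exp(-z))             # probability the query is active

# Hypothetical weights; in practice these are learned by gradient descent.
params = dict(ws=2.0, wr=-0.1, br=0.0, w_active=1.5, w_inactive=-1.5, bias=0.0)
rng = np.random.default_rng(0)
print(irv_forward(rng.random(500), rng.integers(0, 2, 500), params))
```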

agitter commented 7 years ago

Thanks. I did appreciate the advantages from reading this paper and #228. I'm still trying to figure out where it fits in #174 because I think our working definition of "deep learning" may be "high-parameter neural networks". I'll have to see how we've discussed low-parameter NNs in other sections; the options are to consider them alongside high-parameter NNs or to treat them as a competing method.

Even if it is common knowledge in the field, I would still like to find a reference that directly compares (Tanimoto similarity on fingerprints -> SVM) or (Tanimoto similarity on fingerprints -> random forest) with (fingerprints -> random forest) or (fingerprints -> neural networks) on the same data. That will help readers from outside the cheminformatics field, who I expect will be most of our readers.
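For concreteness, the comparison I have in mind would look something like the harness below: the same folds and the same data, with only the representation changing. Everything here is a random stand-in, so the printed numbers mean nothing; it only illustrates the experimental setup.

```python
# Hypothetical harness: Tanimoto-kernel SVM vs. random forest on raw
# fingerprint bits, evaluated on identical cross-validation folds.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def tanimoto_kernel(A, B):
    """Pairwise Tanimoto similarity between two sets of 0/1 fingerprints."""
    inter = A @ B.T
    union = A.sum(1)[:, None] + B.sum(1)[None, :] - inter
    return np.where(union > 0, inter / union, 0.0)

rng = np.random.default_rng(0)
X = (rng.random((300, 1024)) > 0.9).astype(float)   # stand-in fingerprints
y = rng.integers(0, 2, 300)                          # stand-in activity labels

svm_auc = cross_val_score(SVC(kernel=tanimoto_kernel), X, y,
                          cv=10, scoring="roc_auc")
rf_auc = cross_val_score(RandomForestClassifier(n_estimators=500), X, y,
                         cv=10, scoring="roc_auc")
print(f"Tanimoto-SVM AUC {svm_auc.mean():.3f}  RF AUC {rf_auc.mean():.3f}")
```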

swamidass commented 7 years ago

I think that is a legitimate concern. IRVs are low-parameter, so they are not quite in the standard group of Deep Learning methods. On that basis, it is possible we may want to exclude them from the review, or at least point out that they do not exactly fit the current pattern.

Deep Learning, however, is more than just "high-parameter". I think a better way to define Deep Learning is: "a collection of new techniques for building neural networks, including higher-parameter models, recursive and convolutional networks, improved architectures, and improved training strategies."

The IRV, by this definition, is a class of Deep Learning. Although it is not high-parameter, it (1) uses more hidden layers than normal (there are three hidden layers, plus a kernel layer, between the input and output), and (2) uses extensive weight replication to reduce the number of weights substantially. Of course, it will have limitations compared to the newer methods. Honestly, I expect that they will eventually be outclassed by better versions of (for example) #53. We are just not there yet.

swamidass commented 7 years ago

About references that show NNs on fingerprints directly don't work so well: that is a tall order.

There was just so much unpublished experience of people trying this approach (albeit with older regularization techniques) that it is mentioned offhand in cheminformatics all the time. Given the bias against publishing negative results, that will be a hard reference to find.

Now, it is entirely possible that with more advanced regularization it can work on par with SVMs, RFs, and Tanimoto similarity. That has to be established, however, before FP -> NNs are a convincing baseline method against which to benchmark improvement over the state of the art. I think this is really the key point. While it is hard to produce a reference that shows FP -> NNs are poor, there is really no body of literature demonstrating that they reliably produce results comparable to RFs, SVMs, and Tanimoto similarity. This alone is enough to discourage use of FP -> NNs as a baseline method of comparison.

swamidass commented 7 years ago

@cgreene, as our circus master, can you please comment on the definition that you want to use for deep learning?

@agitter offers: "high-parameter neural networks"

I think this is more accurate: "a collection of new techniques for building neural networks, including higher-parameter models, recursive and convolutional networks, improved architectures, and improved training strategies."

I think this is important to clarify because high-parameter networks have been around for a long time. They just never worked well, so people avoided them. It is only with new DL techniques (e.g. dropout, ResNets, ReLUs, batch normalization) that they started to work. This is a pretty fundamental cross-cutting issue to resolve. Can you please weigh in, @cgreene?

swamidass commented 7 years ago

@agitter asks for some benchmark papers.

These are some important papers...

https://www.ncbi.nlm.nih.gov/pubmed/15961479
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3225883/

This competition on HIV data is pretty important and shows SVMs (from my team) outperforming everything else:

http://www.agnostic.inf.ethz.ch/index.php

Though we did follow-up work demonstrating that the IRV really does do better:

https://www.ncbi.nlm.nih.gov/pubmed/20378557

I think the competitors in that competition are helpful. You'll see just about every algorithm there is. And MinMax-kernel SVMs (or MinMax-similarity IRVs) outperform everything.

agitter commented 7 years ago

@swamidass I'm interested in @cgreene's feedback on this as well, but I should say that my definition probably is in line with what you proposed above. More thoughts soon.

agitter commented 7 years ago

I still don't have time to articulate my complete thoughts, but my questions about how to classify the IRV may not matter much in the end. I plan to include it in this section and am trying to think about where it fits in the new narrative. I have an outline in mind and will write it up as soon as I can for feedback.

cgreene commented 7 years ago

My thoughts are much in line with @swamidass's. I just filed #243 to touch on improvements to the introduction to more clearly define what we mean by transform. Right near some of the sections that were touched are the current definitions that we've been using.

If you expand the text from that section of the PR, you can see on lines 48-58 of the revised version the definitions that we have been using. We have been relatively permissive, saying that multi-layer NNs used to construct features, at some stage, count. We also note, for what it's worth, that by this definition such models have existed in the literature for more than 50 years.

@swamidass : I'd be thrilled if you want to refine this via a PR on the intro to highlight a more restrictive perspective on deep learning. It will require us to start making harder calls as to what qualifies.

cgreene commented 7 years ago

Side note: I just got back from a trip to UCI where I chatted with Pierre. I should have asked in person, but I'm just catching up on this after my return! (With regards to "I believe that is the software, though it is hosted on Baldi's site, so I cannot be 100% sure.")

swamidass commented 7 years ago

Hope you got to talk to him about this =). He is one of the early leaders in the field that not so many people know about. Anyhow, I can take a crack at the intro.

cgreene commented 7 years ago

Yea - we chatted a lot about the science but not about this review (missed opportunity - doh!). Do you think you could get him onto github for this? It would be great to get his perspective + feedback!

swamidass commented 7 years ago

I don't think he is an internet "chatter". He generally avoids reviews. But you can certainly try.
