agitter opened this issue 7 years ago
Interesting study that uses binarized chemical compound vectors of length 166 (that look like this: http://www.nature.com/nprot/journal/v9/n9/fig_tab/nprot.2014.151_F2.html) combined with dosage concentration data to generate new compounds that may help prioritize candidate small molecules that treat cancer patients.
~~I am not entirely sure if we should consider this paper for our review.~~ *Edit:* I think we can include it now, in the treat section or as a method for prioritizing drug candidates/repurposing.
This is not my field of expertise, but I am interested in adversarial methods so I gave this paper a thorough read. However, the methods, results, and evaluation remain a bit unclear to me. Another really nice thing about this paper is the availability of source code (https://github.com/spoilt333/onco-aae). Perhaps @spoilt333 can help to clarify some of my confusion. I outlined my understanding above, but a couple of points remain:
Overall, I thought the paper elegantly laid out the problem of the very high drug development failure rate and the evolution of computational methods for compound prioritization. They also apply a promising approach that appears to work at first glance. It would be great to see this approach succeed, as it looks very promising for drug development and drug repurposing. However, given my concerns, perhaps it is not suitable for this review. Maybe we could talk about the idea of the approach in the discussion - I am not sure.
Hello there. I'll try to answer the points
2017-02-02 18:58 GMT+03:00 Greg Way notifications@github.com:
Interesting study that uses binarized chemical compound vectors of length 166 (that look like this: http://www.nature.com/nprot/journal/v9/n9/fig_tab/nprot.2014.151_F2.html) combined with dosage concentration data to generate new compounds that may help prioritize candidate small molecules that treat cancer patients.

Biological Aspects
- Chemical compounds with dosage information as input
- Also included is the chemical's corresponding growth inhibition in a breast cancer cell line (MCF-7)
Computational Aspects
- adversarial autoencoder https://arxiv.org/abs/1511.05644 that encodes input binarized chemical compound vectors into a length 5 latent layer
- 2 layer encoder to learn how the molecular fingerprint impacts growth inhibition
- The latent layer can thereby represent a vector of how well the corresponding fingerprint impacts MCF-7 growth
- 2 layer decoder for reconstruction
- The adversarial training comes in as the authors constrain the latent space to match a chosen prior distribution
- A length 5 vector sampled from the prior is then run through a discriminator, which learns to distinguish "real" latent vectors (drawn from the prior) from "fake" ones (produced by the encoder)
- Growth inhibition is sampled from a normal distribution with mean=5 and variance=1 independently from the prior
- Once the model is trained, the sampled latent vector is decoded to output an artificial molecular fingerprint with a corresponding drug concentration
- This artificial fingerprint is compared against a reference of 72 million compounds from pubchem https://pubchem.ncbi.nlm.nih.gov/
- The authors then selected the top 10 most similar compounds to their predicted compounds if the decoded log concentration was less than -5.0 (log molar)
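The sampling-and-decoding pipeline summarized above can be sketched as follows. This is only a toy sketch: the dimensions come from the summary, but the decoder here is a random single-layer stand-in, not the paper's trained 2-layer decoder.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM = 5          # latent layer size from the paper
FINGERPRINT_BITS = 166  # binary compound fingerprint length

# Stand-in decoder: one random linear layer + sigmoid.
# The real model uses a trained 2-layer decoder.
W = rng.normal(size=(LATENT_DIM + 1, FINGERPRINT_BITS))


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def generate(n_samples=640):
    """Sample latent noise and a GI condition, decode to per-bit probabilities."""
    z = rng.normal(0.0, 1.0, size=(n_samples, LATENT_DIM))  # latent prior N(0, 1)
    gi = rng.normal(5.0, 1.0, size=(n_samples, 1))          # GI ~ N(5, 1), per the summary
    h = np.concatenate([z, gi], axis=1)
    return sigmoid(h @ W)  # probability of each fingerprint bit being present


probs = generate()
print(probs.shape)  # (640, 166)
```

The 640 rows mirror the number of vectors the authors sampled; each row would then be compared against reference fingerprints from PubChem.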
Why we should include it in our review
I am not entirely sure if we should consider this paper for our review.
This is not my field of expertise, but I am interested in adversarial methods so I gave this paper a thorough read. However, the methods, results, and evaluation remain a bit unclear to me. Another really nice thing about this paper is the availability of source code ( https://github.com/spoilt333/onco-aae). Perhaps @spoilt333 https://github.com/spoilt333 can help to clarify some of my confusion. I outlined my understanding above, but a couple of points remain:
Why was the growth inhibition (GI) sampled independently?
- it seems to me that this is a critical component of the model and if the GI is high, then the drug is considered effective. Isn't this artificial sampling decoupled from the learning process?
Why did the authors choose to sample 640 vectors and how did they exactly determine similar compounds from pubchem?
What is the discriminator? Is it using some sort of density metric or KL divergence as compared to the latent distribution?
There is no discussion of how the model trains or whether it is actually learning something meaningful. The authors do really nicely discuss several specific examples of "nearest" compounds, so it seems to be working, but it would really be great to see some sort of model evaluation.
- For example, what is the reconstruction cost associated with the autoencoder portion of the model, and what was the stopping criterion? How does the cost change across epochs?
- What are the hyperparameters of the model and how were they chosen?
Overall, I thought the paper elegantly laid out the problem of the very high drug development failure rate and the evolution of computational methods for compound prioritization. They also apply a promising approach that appears to be working at first glance. I think it would be great to see this approach work really well as it appears to be a very promising approach for drug development and drug repurposing. However, I think that given my concerns perhaps it is not suitable for this review. Maybe we could talk about the idea of the approach in the discussion - I am not sure.
Hi @spoilt333 - this is great! Thanks for your prompt response - I think this clears up a lot. I'll respond to your points below:
- Actually, the GI neuron was trained jointly with the rest of the latent neurons as a predictor of the "efficiency" of the drug. But after training, it was used as a tuner for generating new drugs. The latent layer is a kind of noise and GI is a condition for the decoder net; both are used to produce the output.
Ah, I see, this makes sense now - I think this is a nice innovation! I can see then that the rejection criterion was whether or not the concentration of the corresponding reconstructed molecular fingerprint was reasonable.
- There was no reason to pick exactly 640 samples, but we had to choose some :) As the output layer has a sigmoid activation, we treat it as the probability of presence of the corresponding bit in the compound code. So "similarity" was just the likelihood of a compound being sampled from the generated vector.
Great, ok, I see now. I must have missed that the output layer was sigmoid.
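That likelihood-based similarity can be sketched like this. The toy data and the `log_likelihood` helper are illustrative assumptions; the idea is that the decoder's sigmoid outputs are treated as independent per-bit Bernoulli probabilities, and reference fingerprints are ranked by their log-likelihood under those probabilities.

```python
import numpy as np


def log_likelihood(fingerprint, probs, eps=1e-12):
    """Log-probability that `fingerprint` (a 0/1 vector) is drawn
    bitwise-independently from the generated probability vector `probs`."""
    p = np.clip(probs, eps, 1.0 - eps)  # avoid log(0)
    return float(np.sum(fingerprint * np.log(p) + (1 - fingerprint) * np.log(1 - p)))


# Toy example: one generated probability vector, two candidate fingerprints.
probs = np.array([0.9, 0.1, 0.8, 0.2])
close = np.array([1, 0, 1, 0])  # agrees with the high/low probabilities
far = np.array([0, 1, 0, 1])    # disagrees everywhere

# The closer candidate gets the higher (less negative) log-likelihood,
# so ranking PubChem compounds by this score picks the "most similar" ones.
assert log_likelihood(close, probs) > log_likelihood(far, probs)
```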
- The discriminator is a standard GAN component. In fact, it is a binary classifier which tries to determine whether a sample came from some "true" distribution or was generated by the NN. In our case, the true distribution was Gaussian, and the false samples came from the encoder.
Yep! I was wondering what the architecture of the discriminator was. Sounds like it could be a logistic regression classifier? Or was it that you sampled several times from the generator and if it fell beyond the distribution of the real latent space then it was rejected?
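A logistic-regression discriminator of the kind described above can be sketched like this. Toy data throughout: the "fake" samples stand in for (untrained) encoder outputs and are deliberately shifted so the two classes are separable.

```python
import numpy as np

rng = np.random.default_rng(1)
LATENT_DIM = 5

# "Real" samples from the Gaussian prior; "fake" samples standing in for
# encoder outputs, shifted so the toy classes are separable.
real = rng.normal(0.0, 1.0, size=(500, LATENT_DIM))
fake = rng.normal(2.0, 1.0, size=(500, LATENT_DIM))

X = np.vstack([real, fake])
y = np.concatenate([np.ones(500), np.zeros(500)])  # 1 = came from the prior

# Plain gradient descent on the logistic loss.
w = np.zeros(LATENT_DIM)
b = 0.0
lr = 0.1
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    grad = p - y
    w -= lr * (X.T @ grad) / len(y)
    b -= lr * grad.mean()

p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
accuracy = ((p > 0.5) == y).mean()
print(accuracy)  # well above chance on this separable toy data
```

In the adversarial setup, this classifier's loss is what pushes the encoder to make its latent distribution indistinguishable from the prior.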
- It is really a big point and we are going to make it clearer in the next paper. There are a few ideas. The most important hyperparameter is the latent layer size, IMO.
I have found this to be the case as well. Looking forward to the next paper.
Thanks again for responding so quickly, I will update my summary posted above accordingly.
I think it might not be clear enough from the code because of some optimization tricks. You're right, the discriminator is a logistic regression classifier with a reformulated cost. About the output layer: it has no activation in the code, but inside tf.nn.sigmoid_cross_entropy_with_logits a sigmoid is applied to evaluate the cost. And, of course, after generating new vectors we applied it too.
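The trick described here is that tf.nn.sigmoid_cross_entropy_with_logits folds the sigmoid into a numerically stable loss, so the output layer needs no explicit activation during training; the sigmoid is applied separately at generation time. A small check of the equivalence between the naive and the stable formulations:

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def naive_ce(logits, labels):
    """Cross-entropy via explicit sigmoid (unstable for large-magnitude logits)."""
    p = sigmoid(logits)
    return -(labels * np.log(p) + (1 - labels) * np.log(1 - p))


def stable_ce(logits, labels):
    """The formulation used by tf.nn.sigmoid_cross_entropy_with_logits:
    max(x, 0) - x*z + log(1 + exp(-|x|)), stable for any logit magnitude."""
    x, z = logits, labels
    return np.maximum(x, 0) - x * z + np.log1p(np.exp(-np.abs(x)))


logits = np.array([-2.0, 0.0, 3.0])
labels = np.array([0.0, 1.0, 1.0])
assert np.allclose(naive_ce(logits, labels), stable_ce(logits, labels))
```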
http://doi.org/10.18632/oncotarget.14073