Closed: alxndrkalinin closed this issue 6 years ago
I'm still working through the main results, but this immediately caught my eye:
> To encourage adoption of multitask deep-learning methods, we open source all modeling code and datasets for the Kaggle, Factors, Kinase, and UV dataset collections as part of the DeepChem example suite. We hope that this example code and data will facilitate broader adoption of multitask deep-learning techniques for commercial drug discovery.
That's in reference to four datasets from the Merck authors. In our data sharing discussion, we wrote:
> Private companies may establish a competitive advantage by releasing data sufficient for improved methods to be developed.
This would be a perfect example of that! (cc @cgreene)
Nice! I think the term in the industry is "pre-competitive". It'll be very nice to have something to note there, as opposed to me saying something random into the breeze. :smile:
I don't think the performance here requires changing the tone of our "Ligand-based prediction of bioactivity" section. They introduce two multitask architectures, progressive and bypass, that are new to drug discovery. These are benchmarked against singletask fully connected networks, standard multitask networks, and random forests on four Merck datasets. Performance on the continuous labels is assessed with R^2.
Neural networks generally outperform random forests, though not on every task, and multitask networks generally outperform singletask networks, though not always. The authors also examine task-task similarity within the datasets, which influences when multitask models succeed.
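To make the singletask-versus-multitask comparison concrete, here is a minimal sketch (not the paper's or DeepChem's actual code) of hard parameter sharing: one multi-output network with a shared hidden layer versus one network per task, both scored with R^2. The synthetic data, layer sizes, and train/test split are all illustrative assumptions.

```python
# Illustrative sketch only: compares a multitask (multi-output) MLP against
# singletask MLPs on two correlated synthetic regression tasks, scored with R^2.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 32))  # stand-in for molecular descriptors
w = rng.normal(size=32)
# Two correlated tasks: multitask learning tends to help when tasks share signal.
Y = np.column_stack([X @ w, X @ w + 0.1 * rng.normal(size=500)])

Xtr, Xte, Ytr, Yte = X[:400], X[400:], Y[:400], Y[400:]

# Multitask: one network, shared hidden layer, one output per task.
multi = MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000, random_state=0)
multi.fit(Xtr, Ytr)
pred = multi.predict(Xte)
print("multitask R^2 per task:",
      [round(r2_score(Yte[:, t], pred[:, t]), 3) for t in range(2)])

# Singletask baseline: an independent network per task.
for t in range(2):
    single = MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000, random_state=0)
    single.fit(Xtr, Ytr[:, t])
    print(f"singletask R^2, task {t}:",
          round(r2_score(Yte[:, t], single.predict(Xte)), 3))
```

With strongly correlated tasks like these, the shared trunk sees more effective training signal per parameter; with dissimilar tasks the shared representation can hurt, which matches the paper's observation that task-task similarity drives when multitask models win.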
As I noted above, I think the collaboration with Merck is one of the more exciting components. Three of the four Merck datasets (all but the Kaggle dataset) are going to be released publicly for the first time via their DeepChem repository. That will open them to others for further benchmarking.
@agitter I agree!
These new architectures were also briefly mentioned in the MoleculeNet paper (https://arxiv.org/pdf/1703.00564.pdf), which we already cite; MoleculeNet refers to this paper as "Manuscript in preparation". We should definitely cite the current paper too.
We could discuss this more extensively in the drug discovery section, but we do now cite it in the Discussion, so I'm closing the issue.
https://doi.org/10.1021/acs.jcim.7b00146
@rbharath's take on testing multitask deep learning performance in DeepChem.