greenelab / deep-review

A collaboratively written review paper on deep learning, genomics, and precision medicine
https://greenelab.github.io/deep-review/

Unsupervised feature construction and knowledge extraction from genome-wide assays of breast cancer with denoising autoencoders. #6

Closed cgreene closed 7 years ago

cgreene commented 8 years ago

Paper needs to be read carefully for relevance. http://dx.doi.org/10.1142/9789814644730_0014

cgreene commented 8 years ago

As an author of this paper, I am not the best person to review it. Would love to have at least one non-author pitch in. Maybe @hussius since I know you wrote a blog post about it.

Biology: What gene expression patterns do denoising autoencoders identify in breast cancer gene expression data? Primarily using these data as a proof of concept for the method's robustness.

Computational Methods:

Results

Example of unsupervised method, analysis of transcriptional regulation, potentially some discussion around pathway activities that may be relevant to the review.
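For readers unfamiliar with the setup, the method under discussion can be sketched roughly as follows. This is a toy numpy illustration of a one-hidden-layer denoising autoencoder, not the paper's actual code; the data, shapes, and hyperparameters here are all invented:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_dae(X, n_hidden=5, corruption=0.2, lr=0.5, epochs=200):
    """One-hidden-layer denoising autoencoder with tied weights,
    trained by full-batch gradient descent on a cross-entropy loss."""
    n, d = X.shape
    W = rng.normal(scale=0.1, size=(d, n_hidden))
    b_h, b_v = np.zeros(n_hidden), np.zeros(d)
    losses = []
    for _ in range(epochs):
        # Denoising step: randomly zero out a fraction of input values.
        X_noisy = X * (rng.random(X.shape) > corruption)
        # Encode to hidden "features"; decode with the tied weights W.T.
        H = sigmoid(X_noisy @ W + b_h)
        X_hat = sigmoid(H @ W.T + b_v)
        # Cross-entropy loss with a sigmoid output gives this simple error.
        err = X_hat - X
        dZ_h = (err @ W) * H * (1.0 - H)
        W -= lr / n * (X_noisy.T @ dZ_h + err.T @ H)
        b_v -= lr / n * err.sum(axis=0)
        b_h -= lr / n * dZ_h.sum(axis=0)
        losses.append(float(np.mean(err ** 2)))
    return W, b_h, b_v, losses

# Toy stand-in for a samples-by-genes matrix scaled to [0, 1]:
# low-rank structure plus gene-specific offsets.
X = sigmoid(rng.normal(size=(100, 3)) @ rng.normal(size=(3, 20))
            + rng.normal(size=20))
W, b_h, b_v, losses = train_dae(X)
```

After training, each column of `W` is one hidden node's gene loadings — the "features" a biologist would inspect.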

hussius commented 8 years ago

I can give it a shot.

michaelmhoffman commented 8 years ago

Does a single-layer model fit into a review on "deep" learning?

cgreene commented 8 years ago

@michaelmhoffman : I am not necessarily a proponent of building models that are sufficiently complex to trigger some arbitrary "deep" nomenclature. I'm most interested in methods that include some strong data-driven feature construction. For the scope of this review, we'd probably add the "with neural networks" constraint.

The thing that I like about true deep architectures is that feature construction gets baked into the learning algorithm. The thing that I like about this "shallow learning" architecture is that a biologist can take a look at it and interpret the features.

I guess I'd say - personally - if it passes the threshold of data-driven feature construction with neural networks, then it's the type of research that I think will be primed for data-intensive discoveries.

akundaje commented 8 years ago

@cgreene Fully agree with you. One caution that I think is again not stressed enough in current reviews: interpretation of even a single-layer model should be done very cautiously. Neural nets learn distributed representations, and even though individual neurons/filters may appear interpretable, they should not be over-interpreted as "this filter is a CTCF motif" like some papers do. There are often many filters that collectively capture a single predictive pattern like a motif. There are ways to re-derive these. Looking at filters for an intuitive feel of what the network learns is great. Using individual filters outside of the network is dangerous and wrong IMHO.

On a side note, sorry if I'm being negatively critical of too many things :). I just feel like the use of deep nets in compbio is still in its infancy, and if we can avoid propagating suboptimal practices, we should do that through this review and our papers.
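The point about distributed representations can be made concrete with a toy example: two hypothetical filters each capture only part of a motif-like pattern, so neither one alone "is" the motif, while their sum matches it closely. The motif, filters, and similarity measure below are all invented for illustration:

```python
import numpy as np

# Hypothetical length-6 motif as a one-hot matrix over A, C, G, T.
motif = np.array([[1, 0, 0, 0],   # A
                  [0, 1, 0, 0],   # C
                  [0, 0, 1, 0],   # G
                  [0, 0, 1, 0],   # G
                  [0, 1, 0, 0],   # C
                  [1, 0, 0, 0]])  # A

rng = np.random.default_rng(1)
# Two hypothetical "learned" filters, each capturing only half of the
# motif (plus noise) -- neither one alone represents the full pattern.
filter_a = motif * (np.arange(6) < 3)[:, None] + rng.normal(0, 0.1, motif.shape)
filter_b = motif * (np.arange(6) >= 3)[:, None] + rng.normal(0, 0.1, motif.shape)

def similarity(f, m):
    """Cosine similarity between a filter and the motif."""
    return float(np.sum(f * m) / (np.linalg.norm(f) * np.linalg.norm(m)))

print(similarity(filter_a, motif))             # partial match only
print(similarity(filter_b, motif))             # partial match only
print(similarity(filter_a + filter_b, motif))  # the *combination* matches well
```

The network's downstream layers can weight and combine such filters, which is why inspecting (or reusing) any one filter in isolation can mislead.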

cgreene commented 8 years ago

@akundaje : Definitely important not to over-interpret outside of the context of the network. No problem & totally agree on infancy. I think we need people to take the optimistic and pessimistic sides on many topics if we want to put together a solid perspective.

hussius commented 8 years ago

@akundaje Personally I think it's great that you are critical - I am learning a lot here. Perhaps one of the "features" of this review, as @cgreene implies, could be to have a more balanced/objective perspective. The existing reviews, while good, seem to downplay the problems.

cgreene commented 8 years ago

Yes! If the answer to our question is that the conditions that would need to hold for this to be a disruptive approach are implausible, then I think that would be a particularly unique contribution!


cgreene commented 7 years ago

I've labeled this paper for the 'study' component. It's not receiving more discussion at this point so I've closed it. We're now using 'open' papers only for items undergoing active discussion.

@akundaje - maybe you could contribute a paragraph to the study section on the hazards of over-interpretation? I agree with @hussius that this is an important topic in the field.

akundaje commented 7 years ago

Sure. I can help with that next week.


cgreene commented 7 years ago

Awesome! @agitter : when you stub in the "study" section can you make sure there's a spot for interpretation of these models? We may instead end up putting it in our concluding/general thoughts, but that seems like a good home for now.

agitter commented 7 years ago

@cgreene Sure, I can include an interpretation subsection in 'study' for now. Soon we should have a better idea of whether all of the meta-commentary (interpretation, evaluation, pitfalls, etc.) fits in the study/treat/categorize sections or warrants a separate discussion section.

rezahay commented 6 years ago

Dear Casey, I just read your excellent paper. Would you please elaborate a bit more on linking features to sample characteristics? ("We evaluated the balanced accuracy for each node at each threshold to predict the desired sample characteristic.") I would like to know how you calculated the balanced accuracy from the activity values at the thresholds. Thanks in advance, Reza

cgreene commented 6 years ago

Hi @rezahay: If I recall correctly, Jie defined the specified thresholds. Then she determined what the balanced accuracy would have been if that threshold were the cut-point for a classifier. The same node + threshold then gets tested on the independent test dataset.
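Under that reading, the per-node calculation might look roughly like this. This is a hypothetical sketch, not the paper's actual code; the activities, labels, and threshold below are made up:

```python
import numpy as np

def balanced_accuracy_at_threshold(activities, labels, threshold):
    """Balanced accuracy when (node activity >= threshold) predicts the
    positive class: the mean of sensitivity and specificity."""
    pred = activities >= threshold
    pos = labels == 1
    neg = labels == 0
    sensitivity = np.mean(pred[pos])    # true-positive rate
    specificity = np.mean(~pred[neg])   # true-negative rate
    return 0.5 * (sensitivity + specificity)

# Hypothetical node activities and a binary sample characteristic.
activities = np.array([0.9, 0.8, 0.7, 0.3, 0.2, 0.1])
labels     = np.array([1,   1,   0,   1,   0,   0])
print(balanced_accuracy_at_threshold(activities, labels, 0.5))
```

The same node + threshold pair would then be evaluated on the held-out dataset's activities and labels with the identical call, so the cut-point is never re-tuned on the test set.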

rezahay commented 6 years ago

Dear Casey, Thanks a lot for your response.