greenelab / deep-review

A collaboratively written review paper on deep learning, genomics, and precision medicine
https://greenelab.github.io/deep-review/

Deep Multi-instance Networks with Sparse Label Assignment for Whole Mammogram Classification #163

Closed: agitter closed 7 years ago

agitter commented 7 years ago

https://doi.org/10.1101/095794

Mammogram classification is directly related to computer-aided diagnosis of breast cancer. Traditional methods require great effort to annotate the training data by costly manual labeling and specialized computational models to detect these annotations during test. Inspired by the success of using deep convolutional features for natural image analysis and multi-instance learning for labeling a set of instances/patches, we propose end-to-end trained deep multi-instance networks for mass classification based on whole mammograms without the aforementioned costly need to annotate the training data. We explore three different schemes to construct deep multi-instance networks for whole mammogram classification. Experimental results on the INbreast dataset demonstrate the robustness of the proposed deep networks compared to previous work using segmentation and detection annotations in the training.
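For readers unfamiliar with multi-instance learning, here is a minimal sketch of how patch-level scores can be pooled into a single whole-image prediction. It assumes max pooling over patches, which is one common aggregation choice; the abstract does not say which three schemes the paper uses, so this is not necessarily the authors' method. The function names `bag_probability_max` and `bag_loss` are hypothetical.

```python
import numpy as np

def bag_probability_max(patch_logits):
    """Score a whole image (the "bag") by its most suspicious patch.

    Assumption: max pooling over patch probabilities. Only the image-level
    label is needed for training; no patch-level annotation is used.
    """
    logits = np.asarray(patch_logits, dtype=float)
    patch_probs = 1.0 / (1.0 + np.exp(-logits))  # sigmoid per patch
    return float(patch_probs.max())

def bag_loss(patch_logits, bag_label):
    """Binary cross-entropy between the pooled probability and the image label."""
    p = bag_probability_max(patch_logits)
    eps = 1e-12  # guards against log(0)
    return float(-(bag_label * np.log(p + eps)
                   + (1 - bag_label) * np.log(1 - p + eps)))
```

With max pooling, a single high-scoring patch is enough to call the whole mammogram positive, which is why patch-level labels are not required.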

cgreene commented 7 years ago

The aim of this work is to take a step back from the individual tasks in mammogram analysis (such as segmentation and detection) without compromising performance. It essentially uses the feature-construction ability of deep NNs to sidestep the costly annotation that traditional methods require. To expand the number of training examples, the researchers use random perturbations (this augmentation strategy is used in other work as well).
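As a rough illustration of augmentation by random perturbation, the sketch below generates several perturbed copies of one image. The specific perturbations (random flips and small wrap-around shifts), the `max_shift` parameter, and the function names are assumptions chosen for illustration; the comment above does not say which perturbations the paper applies.

```python
import numpy as np

def augment(image, rng, max_shift=2):
    """Return one randomly perturbed copy of a 2-D image.

    Assumed perturbations: horizontal flip, vertical flip, and a small shift.
    np.roll wraps pixels around the border, which keeps the sketch simple;
    a real pipeline would more likely pad or crop.
    """
    out = image
    if rng.random() < 0.5:
        out = np.fliplr(out)
    if rng.random() < 0.5:
        out = np.flipud(out)
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    return np.roll(out, shift=(int(dy), int(dx)), axis=(0, 1))

def augmented_batch(image, n_copies, seed=0):
    """Stack n_copies perturbed versions of the same image."""
    rng = np.random.default_rng(seed)
    return np.stack([augment(image, rng) for _ in range(n_copies)])

# Usage: expand one 8x8 toy image into four training examples.
toy_image = np.arange(64, dtype=float).reshape(8, 8)
batch = augmented_batch(toy_image, n_copies=4)
```

Every copy keeps the label of the original image, so the labeled training set grows without any new annotation.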

Evaluation is performed on both models trained for this work and models pretrained on ImageNet. Pretraining provides a substantial boost in AUC, and additional strategies further aid performance. With all the improvements, performance is similar to #169 (worth noting: both are deep-NN based).

cgreene commented 7 years ago

discussed in latest commits

agitter commented 7 years ago

@cgreene Should we start a running list, in the Discussion section, of the papers that use augmented training data so we don't forget about them? #99 is another example.