greenelab / deep-review

A collaboratively written review paper on deep learning, genomics, and precision medicine
https://greenelab.github.io/deep-review/

Multi-task Deep Neural Networks for Automated Extraction of Primary Site and Laterality Information from Cancer Pathology Reports #139

Closed agitter closed 7 years ago

agitter commented 7 years ago

http://doi.org/10.1007/978-3-319-47898-2_21

Automated annotation of free-text cancer pathology reports is a critical challenge for cancer registries and the national cancer surveillance program. In this paper, we investigated deep neural networks (DNNs) for automated extraction of the primary cancer site and its laterality, two fundamental targets of cancer reporting. Our experiments showed that single-task DNNs are capable of extracting information with higher precision and recall than traditional classification methods for the more challenging target. Furthermore, a multi-task learning DNN resulted in further performance improvement. This preliminary study indicates the strong potential for multi-task deep neural networks to extract cancer-relevant information from free-text pathology reports.

I didn't read this so I'm not sure if it is Categorize or Treat. I selected both for now.

cgreene commented 7 years ago

I can't get access to it. This might fit well into a Categorize section that I am writing. I filed an interlibrary loan request and also contacted the corresponding author to ask for a PDF (I expect the interlibrary loan won't come through until after the holidays).

cgreene commented 7 years ago

My interlibrary loan request was denied (https://twitter.com/GreeneScientist/status/812007840437207041)! Never had that happen before. It's now down to whether or not the corresponding author can supply a PDF.

jimprince commented 7 years ago

My apologies, I'm not familiar with the typical kosher methods of posting something like this.

cgreene commented 7 years ago

@jimprince I will review the paper! Thanks!

jimprince commented 7 years ago

Removed. Thanks!

cgreene commented 7 years ago

With some help I was able to secure a PDF of the paper.

The authors analyzed de-identified pathology reports from five SEER registries (~2000 reports, ~1000 of which had both primary site and laterality). They generated a balanced set by oversampling sites that were less frequent and undersampling sites that were more frequent. After some standard preprocessing, they used straightforward deep neural networks.
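To make the balancing step concrete, here is a minimal sketch of over- and undersampling to a fixed count per class. It is not the authors' procedure: the function name `balance_by_resampling`, the `target_per_class` parameter, and the site labels are all made up for illustration.

```python
import random
from collections import defaultdict

def balance_by_resampling(labels, target_per_class, seed=0):
    """Return indices into `labels` with exactly `target_per_class` per class.

    Hypothetical helper: frequent classes are undersampled without
    replacement, rare classes are oversampled with replacement.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    balanced = []
    for label, indices in by_class.items():
        if len(indices) >= target_per_class:
            # Undersample: draw a subset without replacement.
            chosen = rng.sample(indices, target_per_class)
        else:
            # Oversample: keep every example, then repeat some at random.
            extra = rng.choices(indices, k=target_per_class - len(indices))
            chosen = indices + extra
        balanced.extend(chosen)

    rng.shuffle(balanced)
    return balanced

# Made-up site labels, purely for illustration.
labels = ["breast"] * 6 + ["lung"] * 3 + ["colon"] * 1
balanced_idx = balance_by_resampling(labels, target_per_class=3)
```

One caveat with this kind of resampling: oversampled reports appear more than once, so it should be applied after the train/test split to avoid copies of a report landing on both sides.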

They evaluated with and without multi-task learning (MTL). This connects to the transferability question that @hussius has raised and that we've discussed with regard to a number of papers. For MTL, the primary task was cancer site, while the secondary task was laterality. They compared this with two separately trained DNNs.
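As a rough illustration of the multi-task setup, the sketch below shares one hidden layer between a site head and a laterality head and sums the two cross-entropy losses. This is an assumption-laden toy in NumPy, not the architecture from the paper: the layer sizes, the `lat_weight` parameter, the omission of bias terms, and the random inputs are all invented here.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, targets):
    # Mean negative log-likelihood of the true class for each row.
    picked = probs[np.arange(len(targets)), targets]
    return -np.mean(np.log(picked + 1e-12))

rng = np.random.default_rng(0)
# Hypothetical sizes; the paper's actual dimensions are not given in this thread.
n_features, n_hidden, n_sites, n_lateralities = 20, 8, 5, 3

params = {
    "W_shared": rng.normal(0, 0.1, (n_features, n_hidden)),   # used by both tasks
    "W_site": rng.normal(0, 0.1, (n_hidden, n_sites)),        # primary-task head
    "W_lat": rng.normal(0, 0.1, (n_hidden, n_lateralities)),  # secondary-task head
}

def multitask_loss(params, x, y_site, y_lat, lat_weight=1.0):
    """Joint loss: both heads read the same shared hidden representation."""
    hidden = np.maximum(0.0, x @ params["W_shared"])  # shared ReLU layer
    p_site = softmax(hidden @ params["W_site"])
    p_lat = softmax(hidden @ params["W_lat"])
    return cross_entropy(p_site, y_site) + lat_weight * cross_entropy(p_lat, y_lat)

# Random stand-in data: 4 "reports" with made-up labels.
x = rng.normal(size=(4, n_features))
y_site = np.array([0, 1, 2, 3])
y_lat = np.array([0, 1, 2, 0])
loss = multitask_loss(params, x, y_site, y_lat)
```

The single-task baseline corresponds to training two such networks that each have their own `W_shared` and only one head, so gradients from one task never shape the features used by the other.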

Performance on the primary site is uniformly good across methods (all F-scores were ~0.99 or above). The MTL-DNN performed best by macro and micro F1 scores, but there's too little range to see a pronounced difference. Importantly, on the laterality question the MTL-DNN performed substantially better than the other methods. This makes a nice argument for us explicitly addressing transferability. It suggests that, particularly with limited training examples and multiple tasks, training first for an easier task and then using those same features to attack a harder task can get you further.
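For readers less familiar with the two averages, here is a small sketch of how macro and micro F1 differ for single-label classification. The function name `f1_scores` and the toy laterality labels are made up and are not data from the paper.

```python
def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label multiclass predictions."""
    classes = sorted(set(y_true) | set(y_pred))
    tp_total = fp_total = fn_total = 0
    per_class_f1 = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        per_class_f1.append(2 * tp / denom if denom else 0.0)
        tp_total += tp
        fp_total += fp
        fn_total += fn

    # Macro: unweighted mean of per-class F1, so rare classes count equally.
    macro = sum(per_class_f1) / len(per_class_f1)
    # Micro: pool the counts over all classes before computing F1.
    micro_denom = 2 * tp_total + fp_total + fn_total
    micro = 2 * tp_total / micro_denom if micro_denom else 0.0
    return macro, micro

# Toy example: a classifier that always predicts the majority class.
y_true = ["left"] * 8 + ["right"] * 2
y_pred = ["left"] * 10
macro, micro = f1_scores(y_true, y_pred)
```

In this toy case the micro score (0.8) looks respectable while the macro score (about 0.44) exposes the missed minority class, which is why reporting both is informative when classes are imbalanced.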

I'll write this into the Categorize portion where I discuss text mining. I think that we should also raise it in the discussion points. @hussius, are you taking that, or do you want me to take a stab at it? I'll note this issue there for now so that we don't forget it.

cgreene commented 7 years ago

This is now discussed thanks to #167

agitter commented 7 years ago

@cgreene I think you should plan to take text mining. In #116 @hussius said he wasn't going to have time to write.

cgreene commented 7 years ago

Sounds good!

