magicpieh28 / Paper-Summary


An Overview of Multi-Task Learning in Deep Neural Networks (2017) #29

Open magicpieh28 opened 4 years ago

magicpieh28 commented 4 years ago

About this paper

Author: Sebastian Ruder
Link: https://arxiv.org/pdf/1706.05098.pdf

This article aims to give a general overview of MTL, particularly in deep neural networks. It introduces the two most common methods for MTL in Deep Learning, gives an overview of the literature, and discusses recent advances.

3.1 Hard parameter sharing

Hard parameter sharing greatly reduces the risk of overfitting. In fact, [Baxter, 1997] showed that the risk of overfitting the shared parameters is an order N – where N is the number of tasks – smaller than overfitting the task-specific parameters, i.e. the output layers. This makes sense intuitively: the more tasks we are learning simultaneously, the more our model has to find a representation that captures all of the tasks, and the lower our chance of overfitting on our original task.
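
Conceptually, hard parameter sharing keeps the hidden layers shared across all tasks and adds one task-specific output layer per task. A minimal PyTorch sketch, assuming a toy setup with two classification tasks over the same input (all layer sizes and class counts are made up for illustration):

```python
import torch
import torch.nn as nn

class HardSharingModel(nn.Module):
    def __init__(self, in_dim=128, hidden_dim=64, n_classes_a=3, n_classes_b=5):
        super().__init__()
        # Shared hidden layers: every task reuses these parameters.
        self.shared = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
        )
        # Task-specific output layers ("towers"), one per task.
        self.head_a = nn.Linear(hidden_dim, n_classes_a)
        self.head_b = nn.Linear(hidden_dim, n_classes_b)

    def forward(self, x):
        h = self.shared(x)                # common representation for both tasks
        return self.head_a(h), self.head_b(h)

model = HardSharingModel()
logits_a, logits_b = model(torch.randn(8, 128))  # one batch, one output per task
```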

4.1 Implicit data augmentation

Learning just task A bears the risk of overfitting to task A, while learning A and B jointly enables the model to obtain a better representation F through averaging the noise patterns.
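
A minimal sketch of the joint objective in practice, assuming a hard-sharing setup like the one above with dummy data: summing the two task losses lets both tasks pull the shared representation F toward patterns that generalize beyond either task's individual noise.

```python
import torch
import torch.nn as nn

shared = nn.Linear(128, 64)          # shared representation F
head_a = nn.Linear(64, 3)            # task A output layer
head_b = nn.Linear(64, 5)            # task B output layer
params = list(shared.parameters()) + list(head_a.parameters()) + list(head_b.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)
criterion = nn.CrossEntropyLoss()

x = torch.randn(8, 128)              # one shared batch of inputs (dummy)
y_a = torch.randint(0, 3, (8,))      # labels for task A
y_b = torch.randint(0, 5, (8,))      # labels for task B

h = torch.relu(shared(x))
loss = criterion(head_a(h), y_a) + criterion(head_b(h), y_b)  # joint loss
optimizer.zero_grad()
loss.backward()                      # gradients from both tasks update the shared layer
optimizer.step()
```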

4.2 Attention focusing

If a task is very noisy or data is limited and high-dimensional, it can be difficult for a model to differentiate between relevant and irrelevant features. MTL can help the model focus its attention on those features that actually matter as other tasks will provide additional evidence for the relevance or irrelevance of those features.

4.4 Representation bias

MTL biases the model to prefer representations that other tasks also prefer.

6.9 What should I share in my model?

While useful in many scenarios, hard parameter sharing quickly breaks down if tasks are not closely related or require reasoning on different levels. ... These days, the mainstream approach is to learn what should be shared. ... As mentioned initially, we are doing MTL as soon as we are optimizing more than one loss function. Rather than constraining our model to compress the knowledge of all tasks into the same parameter space, it is thus helpful to draw on the advances in MTL that we have discussed and enable our model to learn how the tasks should interact with each other.
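
One approach along these lines that the paper discusses is cross-stitch networks [Misra et al., 2016], which learn a linear combination of the activations of two task-specific networks. A minimal sketch of a cross-stitch-style unit; the shapes and initial mixing values are assumptions for illustration:

```python
import torch
import torch.nn as nn

class CrossStitchUnit(nn.Module):
    def __init__(self):
        super().__init__()
        # 2x2 mixing matrix: rows = output tasks, columns = input tasks.
        # Initialized near the identity so each task mostly keeps its own features.
        self.alpha = nn.Parameter(torch.tensor([[0.9, 0.1],
                                                [0.1, 0.9]]))

    def forward(self, h_a, h_b):
        # Learned linear combination of the two tasks' activations.
        out_a = self.alpha[0, 0] * h_a + self.alpha[0, 1] * h_b
        out_b = self.alpha[1, 0] * h_a + self.alpha[1, 1] * h_b
        return out_a, out_b

stitch = CrossStitchUnit()
h_a, h_b = torch.randn(8, 64), torch.randn(8, 64)   # activations from two task networks
mixed_a, mixed_b = stitch(h_a, h_b)                  # how much to share is learned end to end
```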

7 Auxiliary tasks

MTL accuracy increases continuously with the number of tasks [Ramsundar et al., 2015].

7.3 Hints

Recent examples of this strategy in the context of natural language processing are [Yu and Jiang, 2016], who predict whether an input sentence contains a positive or negative sentiment word as auxiliary tasks for sentiment analysis ...
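
A minimal sketch of this kind of auxiliary "hint" setup for sentiment analysis, assuming an illustrative encoder, heads, and auxiliary loss weight (not Yu and Jiang's actual model):

```python
import torch
import torch.nn as nn

encoder = nn.Linear(300, 128)        # stand-in for a sentence encoder
main_head = nn.Linear(128, 2)        # main task: sentence sentiment (negative / positive)
aux_head = nn.Linear(128, 2)         # hint task: contains a sentiment word or not
criterion = nn.CrossEntropyLoss()
aux_weight = 0.3                     # assumed weighting of the auxiliary loss

x = torch.randn(8, 300)              # sentence representations (dummy)
y_main = torch.randint(0, 2, (8,))   # sentiment labels
y_aux = torch.randint(0, 2, (8,))    # "contains sentiment word" labels

h = torch.relu(encoder(x))
loss = criterion(main_head(h), y_main) + aux_weight * criterion(aux_head(h), y_aux)
loss.backward()                      # the auxiliary signal shapes the shared encoder
```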