[26] Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts

Idea: for multi-task learning, build a multi-gate MoE (MMoE) that learns to model the relationships between tasks on its own, without those relationships being specified explicitly.

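In typical multi-task learning, there is a shared network (the shared bottom), and a separate FCN is stacked on top of it for each task. This paper combines that setup with the MoE idea and uses a set of experts as the shared bottom. In addition, whereas the original MoE has a single gating network, MMoE creates a separate gating network for each task k.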
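The structure described above can be written compactly. This is a sketch that assumes the following notation: $f_i$ is the $i$-th of $n$ shared experts, $g^k$ is the gating network for task $k$ (its output is a weight vector over the experts), and $h^k$ is the task-specific tower.

$$
f^k(x) = \sum_{i=1}^{n} g^k(x)_i \, f_i(x), \qquad y_k = h^k\big(f^k(x)\big)
$$

With a single gate shared by all tasks ($g^k = g$ for every $k$) this reduces to an ordinary MoE layer used as the shared bottom; giving each task its own $g^k$ is what lets different tasks mix the same experts differently.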

Each gating network is a simple classifier whose input_dim is the feature dimension and whose output_dim is num_experts.
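To make the per-task gating concrete, here is a minimal forward-pass sketch in plain Python. Everything in it is an assumption for illustration: the class name `MMoE`, the helper names, the single linear layer with ReLU per expert, the single linear layer per tower, and the random untrained weights. It only shows how each task's gate turns the input features into a softmax distribution over experts and mixes the shared expert outputs.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def linear(weights, x):
    """Matrix-vector product; weights is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def relu(v):
    return [max(0.0, a) for a in v]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

class MMoE:
    """Toy MMoE forward pass (hypothetical class, untrained random weights)."""

    def __init__(self, input_dim, expert_dim, num_experts, num_tasks, seed=0):
        rng = random.Random(seed)
        # Shared experts: each maps input_dim -> expert_dim.
        self.experts = [rand_matrix(expert_dim, input_dim, rng)
                        for _ in range(num_experts)]
        # One gate per task: input_dim -> num_experts.
        self.gates = [rand_matrix(num_experts, input_dim, rng)
                      for _ in range(num_tasks)]
        # One tower per task: expert_dim -> a single scalar output.
        self.towers = [rand_matrix(1, expert_dim, rng)
                       for _ in range(num_tasks)]

    def gate_weights(self, x, task):
        """Softmax distribution over experts for the given task."""
        return softmax(linear(self.gates[task], x))

    def forward(self, x):
        # Expert outputs are computed once and shared by every task.
        expert_outs = [relu(linear(w, x)) for w in self.experts]
        outputs = []
        for k in range(len(self.gates)):
            g = self.gate_weights(x, k)
            # Task-specific mixture: gate-weighted sum of expert outputs.
            mixed = [sum(g[i] * out[d] for i, out in enumerate(expert_outs))
                     for d in range(len(expert_outs[0]))]
            outputs.append(linear(self.towers[k], mixed)[0])
        return outputs

model = MMoE(input_dim=4, expert_dim=3, num_experts=3, num_tasks=2)
x = [0.2, -0.1, 0.4, 0.7]
for k in range(2):
    print(f"task {k} gate weights:", [round(w, 3) for w in model.gate_weights(x, k)])
print("task outputs:", model.forward(x))
```

Because the gates are separate parameter matrices, the two tasks generally assign different weights to the same experts for the same input, which is the mechanism the note describes for modeling task relationships without specifying them.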

The evaluation on synthetic data is shown below. The higher the correlation between tasks, …

The evaluation on real data is shown below.

One-line review: Hmm.. is there a way to give each classifier some initial value for the correlation?
