There are a couple of instances where the code doesn't agree with the paper (https://arxiv.org/pdf/1705.07115.pdf):
1) For each individual task's loss, the paper adds the log of the task-dependent standard deviation to the multi-task loss. The code, however, adds the log of the task-dependent variance instead. Why the discrepancy? Is there a typo in the paper?
2) The weights given to each loss in the code don't correspond to those in the equations presented in the paper. Please see my follow-up here: https://github.com/yaringal/multi-task-learning-example/issues/1
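
To make the comparison concrete, here is a minimal sketch of the two per-task terms as I read them. It assumes the paper's regression term is L / (2σ²) + log σ and that the code learns s = log σ² and computes exp(-s) · L + s. The function names `paper_term` and `code_term` and the sample values are mine, purely for illustration; they are not taken from the repository.

```python
import math

def paper_term(sq_err, sigma):
    # Per-task term as I read it in the paper (assumption):
    # L / (2 * sigma^2) + log(sigma)
    return sq_err / (2.0 * sigma ** 2) + math.log(sigma)

def code_term(sq_err, log_var):
    # Per-task term as I read it in the code (assumption):
    # exp(-log_var) * L + log_var, where log_var = log(sigma^2)
    return math.exp(-log_var) * sq_err + log_var

# Arbitrary illustrative values, not results from any experiment.
sigma = 1.7
sq_err = 0.9
log_var = math.log(sigma ** 2)

# Ratio of the two forms evaluated at the same sigma and the same error.
ratio = code_term(sq_err, log_var) / paper_term(sq_err, sigma)
print(ratio)
```

If both readings are right, the two terms would differ only by a constant factor of 2, since log σ² = 2 log σ. I may be misreading either the paper or the code, so please correct me if these forms are not what is intended.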