Closed: Manchery closed this issue 4 years ago
Hi,
Thanks for your experiments and detailed analysis.
Yes, NYUv2 is quite a complicated dataset with a small number of samples, and that is one of the major reasons for such oscillation in the final result. I assume you would observe much smaller oscillation in the final performance on the Cityscapes dataset, since it is easier and contains more samples.
What I can suggest is:
Try the benchmark technique I suggested in the README: relative improvement over single-task learning, i.e. (best per-task performance in multi-task learning across all validation epochs) / (single-task validation performance). It should reduce such uncertainty.
Simply run 3 or more times and report the averaged performance.
Hope that helps. Sk.
Hi,
Thanks for your suggestion. I agree that this is caused by the small number of samples, and I will simply run the experiment several times and report the average performance.
But for the recommended benchmark, relative improved performance, do you mean $\max_{\text{epoch}}\ \text{avg}_{\text{task}}(\text{relative improvement})$, or $\text{avg}_{\text{task}}\ \max_{\text{epoch}}(\text{relative improvement})$? I think it should be the former. The latter is less vulnerable to uncertainty, but I think it violates the spirit of MTL, because one purpose of MTL is to reduce inference time, so we should use one model to predict all tasks.
Yes, you should apply the first version; otherwise, it would break the advantage of doing multi-task learning. For a more detailed explanation of this method, I would suggest taking a look at the paper linked in the README.
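For concreteness, here is a minimal sketch of the agreed-upon metric (max over validation epochs of the task-averaged relative improvement over single-task baselines). The function name, data layout, and all numbers below are assumptions for illustration only, not code from this repository:

```python
def relative_improvement(mtl_epochs, stl, lower_is_better):
    """Max over validation epochs of the task-averaged relative
    improvement (%) of multi-task metrics over single-task baselines.

    mtl_epochs: list of {task: metric} dicts, one per validation epoch.
    stl:        {task: metric} dict from single-task training.
    lower_is_better: {task: bool}, True for error-style metrics.
    """
    best = float("-inf")
    for metrics in mtl_epochs:
        deltas = []
        for task, m in metrics.items():
            # Flip the sign for error metrics so improvement is always positive.
            sign = -1.0 if lower_is_better[task] else 1.0
            deltas.append(sign * (m - stl[task]) / stl[task])
        # Average over tasks first, then keep the best epoch.
        best = max(best, sum(deltas) / len(deltas))
    return best * 100.0

# Illustrative numbers only: mIoU is higher-is-better, depth error is lower-is-better.
stl = {"seg_miou": 0.40, "depth_err": 0.60}
epochs = [{"seg_miou": 0.42, "depth_err": 0.60},
          {"seg_miou": 0.40, "depth_err": 0.54}]
print(relative_improvement(epochs, stl, {"seg_miou": False, "depth_err": True}))
```

Averaging over tasks before taking the max picks a single epoch (one model checkpoint) for all tasks, which is why it preserves the inference-time advantage of MTL.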
I think I made a mathematical mistake: the first one is less vulnerable to uncertainty. Forget what I said.
Thank you for your patient replies.
Best wishes.
Hello, sorry to bother you. I ran the same code several times but got unstable evaluation results. Here are the numbers:
Details: I adjusted the location of `scheduler.step()` according to the README update. Here are the results of the code without any modification (except the location of `scheduler.step()`), to make my claim more convincing: