lorenmt / mtan

The implementation of "End-to-End Multi-Task Learning with Attention" [CVPR 2019].
https://shikun.io/projects/multi-task-attention-network
MIT License

Different Single-task results from the paper? #21

Closed by xmasotto 4 years ago

xmasotto commented 4 years ago

Hello - thank you so much for uploading high-quality code along with your paper.

I'm interested in reproducing Table 3, starting with "One Task". I ran 'model_segnet_single.py' with task=semantic and got mIoU scores noticeably higher than the paper's (17.82 vs. 15.10). Is this expected? Did something change in the codebase since the paper was submitted?
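For reference, mIoU is the intersection-over-union averaged over semantic classes. Here is a minimal sketch of the metric (a generic implementation for illustration, not necessarily identical to this repository's own metric code, which may treat absent classes or ignored labels differently):

```python
import numpy as np

def mean_iou(pred, target, num_classes, ignore_index=-1):
    """Mean intersection-over-union over classes.

    pred, target: integer class maps of the same shape.
    Pixels labelled `ignore_index` in target are excluded.
    """
    valid = target != ignore_index
    pred, target = pred[valid], target[valid]
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:  # skip classes absent from both prediction and target
            ious.append(inter / union)
    return float(np.mean(ious)) if ious else 0.0
```

Small differences in this computation (e.g. averaging per image vs. over the whole test set, or how empty classes are handled) can shift the reported number.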

Thanks!

Here are the logs for the first 31 epochs (the columns appear to be loss, mIoU, and pixel accuracy for TRAIN and TEST):

Epoch: 0000 | TRAIN: 1.9690 0.0748 0.3519 TEST: 1.7898 0.0909 0.3818
Epoch: 0001 | TRAIN: 1.7170 0.1067 0.4119 TEST: 1.7185 0.1073 0.4083
Epoch: 0002 | TRAIN: 1.6698 0.1123 0.4205 TEST: 1.6828 0.1334 0.4168
Epoch: 0003 | TRAIN: 1.6397 0.1254 0.4343 TEST: 1.6710 0.1163 0.4221
Epoch: 0004 | TRAIN: 1.6036 0.1329 0.4447 TEST: 1.6538 0.1332 0.4249
Epoch: 0005 | TRAIN: 1.5880 0.1350 0.4478 TEST: 1.6328 0.1306 0.4315
Epoch: 0006 | TRAIN: 1.5570 0.1429 0.4593 TEST: 1.6215 0.1559 0.4438
Epoch: 0007 | TRAIN: 1.5254 0.1507 0.4703 TEST: 1.6558 0.1374 0.4345
Epoch: 0008 | TRAIN: 1.5026 0.1545 0.4773 TEST: 1.5411 0.1455 0.4665
Epoch: 0009 | TRAIN: 1.4802 0.1562 0.4838 TEST: 1.5541 0.1525 0.4637
Epoch: 0010 | TRAIN: 1.4455 0.1616 0.4960 TEST: 1.5362 0.1441 0.4690
Epoch: 0011 | TRAIN: 1.4185 0.1616 0.5024 TEST: 1.5183 0.1539 0.4725
Epoch: 0012 | TRAIN: 1.3854 0.1654 0.5138 TEST: 1.5059 0.1523 0.4774
Epoch: 0013 | TRAIN: 1.3608 0.1668 0.5205 TEST: 1.4475 0.1524 0.4972
Epoch: 0014 | TRAIN: 1.3269 0.1698 0.5317 TEST: 1.4648 0.1649 0.5012
Epoch: 0015 | TRAIN: 1.2967 0.1731 0.5422 TEST: 1.4877 0.1551 0.4949
Epoch: 0016 | TRAIN: 1.2693 0.1778 0.5522 TEST: 1.4616 0.1596 0.4787
Epoch: 0017 | TRAIN: 1.2284 0.1843 0.5667 TEST: 1.4990 0.1772 0.5040
Epoch: 0018 | TRAIN: 1.1794 0.1926 0.5853 TEST: 1.4896 0.1568 0.4949
Epoch: 0019 | TRAIN: 1.1606 0.1974 0.5917 TEST: 1.4416 0.1698 0.5186
Epoch: 0020 | TRAIN: 1.0995 0.2084 0.6145 TEST: 1.4340 0.1686 0.5167
Epoch: 0021 | TRAIN: 1.0474 0.2168 0.6341 TEST: 1.4271 0.1649 0.5252
Epoch: 0022 | TRAIN: 1.0047 0.2268 0.6466 TEST: 1.4728 0.1714 0.5108
Epoch: 0023 | TRAIN: 0.9398 0.2376 0.6708 TEST: 1.4875 0.1721 0.5255
Epoch: 0024 | TRAIN: 0.8777 0.2489 0.6930 TEST: 1.5140 0.1708 0.5050
Epoch: 0025 | TRAIN: 0.8138 0.2587 0.7159 TEST: 1.5678 0.1716 0.5222
Epoch: 0026 | TRAIN: 0.7687 0.2687 0.7326 TEST: 1.5570 0.1745 0.5309
Epoch: 0027 | TRAIN: 0.6881 0.2884 0.7633 TEST: 1.5163 0.1762 0.5360
Epoch: 0028 | TRAIN: 0.6309 0.2993 0.7832 TEST: 1.7214 0.1782 0.5246
Epoch: 0029 | TRAIN: 0.5857 0.3097 0.8008 TEST: 1.7705 0.1710 0.5105
Epoch: 0030 | TRAIN: 0.5333 0.3222 0.8184 TEST: 1.7390 0.1679 0.5236
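As a sanity check when comparing runs, here is a small hypothetical script (not part of this repo) that parses log lines in the format above and reports the best test mIoU, assuming each TEST triple is loss, mIoU, pixel accuracy:

```python
import re

# Matches lines like:
# Epoch: 0000 | TRAIN: 1.9690 0.0748 0.3519 TEST: 1.7898 0.0909 0.3818
PATTERN = re.compile(
    r"Epoch:\s*(\d+)\s*\|\s*TRAIN:\s*([\d.]+)\s+([\d.]+)\s+([\d.]+)"
    r"\s*TEST:\s*([\d.]+)\s+([\d.]+)\s+([\d.]+)"
)

def best_test_miou(log_path):
    """Return (epoch, mIoU) for the epoch with the highest test mIoU."""
    best_epoch, best_miou = None, -1.0
    with open(log_path) as f:
        for line in f:
            m = PATTERN.search(line)
            if m is None:
                continue
            epoch, test_miou = int(m.group(1)), float(m.group(6))
            if test_miou > best_miou:
                best_epoch, best_miou = epoch, test_miou
    return best_epoch, best_miou

if __name__ == "__main__":
    print(best_test_miou("train.log"))  # hypothetical log file path
```

On the logs above this would pick epoch 28 (test mIoU 0.1782), which matches the 17.82 figure quoted earlier.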

lorenmt commented 4 years ago

Hi,

Yes. After I heavily refactored the code for readability, I also observed some differences between the new results and the original ones reported in the paper. However, the overall performance ranking of the methods stays the same (I have explained this in the README). So I suggest rerunning the methods yourself if you want to compare across different architectures and training techniques.
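If you do rerun everything, fixing the random seeds makes the side-by-side runs more comparable. A generic PyTorch sketch (standard practice, not code from this repo):

```python
import random

import numpy as np
import torch

def set_seed(seed: int = 0) -> None:
    """Pin all RNG sources so repeated runs are comparable."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Deterministic cuDNN kernels trade some speed for repeatability.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```

Even with fixed seeds some GPU ops remain non-deterministic, so expect small run-to-run variance.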

Hope that helps!

xmasotto commented 4 years ago

Thanks for the quick response! I'd still suggest making this clearer in the README, since the discrepancy is larger than what I'd describe as 'slightly different'.