wouterkool / attention-learn-to-route

Attention-based model for learning to solve different routing problems
MIT License

issue with warmup baseline #51

Closed. LTluttmann closed this issue 5 months ago.

LTluttmann commented 1 year ago

Hi Wouter,

First of all, big thanks for making the code available! I have a comment regarding the WarmupBaseline. As I understand it, it should simply return the evaluation results of the "normal" baseline (self.baseline) once the number of warmup epochs is exceeded. However, the alpha attribute takes on values larger than 1 for all subsequent epochs: https://github.com/wouterkool/attention-learn-to-route/blob/6dbad47a415a87b5048df8802a74081a193fe240/reinforce_baselines.py#L64-L69

which causes the corresponding if-else statement to never match the delegation case: https://github.com/wouterkool/attention-learn-to-route/blob/6dbad47a415a87b5048df8802a74081a193fe240/reinforce_baselines.py#L53-L62
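
For reference, here is a condensed sketch of the two linked snippets (paraphrased, not verbatim: the constructor is simplified, the alpha update formula is my reading of the linked lines, and the convex combination is written in its intended form since the exact bracket placement is discussed in the next comment):

```python
# Condensed, paraphrased sketch of WarmupBaseline from reinforce_baselines.py.
class WarmupBaseline:
    def __init__(self, baseline, n_epochs, warmup_baseline):
        self.baseline = baseline                # the "normal" baseline, e.g. greedy rollout
        self.warmup_baseline = warmup_baseline  # baseline used during warmup
        self.n_epochs = n_epochs                # number of warmup epochs
        self.alpha = 0

    def eval(self, x, c):
        if self.alpha == 1:                     # meant to delegate after warmup,
            return self.baseline.eval(x, c)     # but never matches once alpha > 1
        if self.alpha == 0:
            return self.warmup_baseline.eval(x, c)
        v, l = self.baseline.eval(x, c)
        vw, lw = self.warmup_baseline.eval(x, c)
        # Convex combination of both baseline values and losses
        return (self.alpha * v + (1 - self.alpha) * vw,
                self.alpha * l + (1 - self.alpha) * lw)

    def epoch_callback(self, model, epoch):
        self.baseline.epoch_callback(model, epoch)
        # Grows without bound: alpha > 1 for every epoch >= n_epochs
        self.alpha = (epoch + 1) / float(self.n_epochs)
```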

So, I would propose either clipping the alpha attribute to a maximum value of 1 in the epoch_callback function, or changing the respective if-statement to if self.alpha >= 1.
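
A minimal sketch of the two proposed options, either of which should suffice (using the assumed update formula from the sketch above):

```python
# Option 1: clip alpha in epoch_callback so it saturates at 1
def epoch_callback(self, model, epoch):
    self.baseline.epoch_callback(model, epoch)
    self.alpha = min((epoch + 1) / float(self.n_epochs), 1.0)

# Option 2: leave the callback as is, but make the delegation
# branch in eval robust to alpha growing past 1
def eval(self, x, c):
    if self.alpha >= 1:  # was: if self.alpha == 1
        return self.baseline.eval(x, c)
    ...  # remaining branches unchanged
```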

wouterkool commented 1 year ago

Hmm, great catch! Did you debug whether that branch is actually used? That would suggest weird values for the baseline, whereas the code gives good results. Did you validate the results after clipping or after changing the if-statement? Additionally, it seems line 62 should be:

```python
return self.alpha * v + (1 - self.alpha) * vw, self.alpha * l + (1 - self.alpha) * lw
```

instead of the current line with incorrect brackets:

```python
return self.alpha * v + (1 - self.alpha) * vw, self.alpha * l + (1 - self.alpha * lw)
```
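
To see the difference with hypothetical numbers: for alpha = 0.5, l = 2, lw = 4, the intended expression yields 0.5 * 2 + 0.5 * 4 = 3, while the mis-bracketed one yields 0.5 * 2 + (1 - 0.5 * 4) = 0.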

Please let me know if you have results after fixing this; if they don't make the code perform worse, I'm happy to merge a PR that fixes this.

LTluttmann commented 1 year ago

Hi Wouter, thanks for your reply! That is indeed weird. I just debugged the code again, and when using the greedy rollout baseline (I also verified that bl_warmup_epochs=1), self.alpha indeed takes on values larger than 1 once the number of warmup epochs is exceeded:

[Screenshot: logged values of self.alpha exceeding 1 after warmup]

So, I started two training runs: one with alpha bounded to 1 and another where everything was kept as is. They showed that not clipping alpha leads to strongly decreasing baseline values and, as a result, to a growing loss being backpropagated. Here are the TensorBoard plots showing the baseline value, the REINFORCE loss, and the gradient norm, respectively (yellow lines show the run with clipped alpha, pink lines the status quo).

[TensorBoard plots: baseline value, REINFORCE loss, and gradient norm]

Clipping alpha fixes these issues, as can be seen from the plots, and leads to much better validation performance:

[Plot: validation performance]

Here are the full log files: results.zip

LTluttmann commented 6 months ago

Hi @wouterkool, have you had a chance to look at this yet?

wouterkool commented 5 months ago

Thanks a lot. Unfortunately, I have no time to test it myself; however, the fix you propose seems correct, and since you tested it, I have merged it. I still don't understand why it did not cause any problems previously, though...