It has been long enough since I implemented the window adaptation that I can take a fresh look at it. The main points to investigate are:
Window adaptation is slow compared to sampling. While we expect some difference, it shouldn't be this large.
It feels like adaptation with < 100 steps (i.e. no window adaptation) gives better results than adaptation with more steps. If so, there must be an issue with either the window adaptation or the scheduler.