Open guptakartik opened 7 years ago
Hi,
I was running the optnet code for MNIST classification with the default configuration for only 10 epochs. In the first couple of epochs I get the warning "qpth warning: Returning an inaccurate and potentially incorrect solution", and in subsequent iterations the loss becomes nan. Is there something obviously wrong with my configuration?
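For anyone debugging the same failure, here is a minimal sketch of how to stop at the first bad batch rather than letting the nan propagate into the weights. It assumes a standard PyTorch training loop; the `model`, `optimizer`, and `loader` below are stand-ins, not the repo's actual objects:

```python
import torch
import torch.nn.functional as F

# Stand-ins: the real experiment uses the OptNet model and the MNIST loader.
model = torch.nn.Linear(784, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loader = [(torch.randn(32, 784), torch.randint(0, 10, (32,)))]

# Guard: stop at the first batch whose loss is non-finite, so the offending
# batch can be inspected before nans propagate into the parameters.
for x, y in loader:
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y)
    if not torch.isfinite(loss):  # catches nan and +/-inf
        raise RuntimeError(f"non-finite loss: {loss.item()}")
    loss.backward()
    optimizer.step()
```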
Hi, I just tried running the MNIST experiment and am hitting nans there too. It's been a while since I've run that example, and I've changed the qpth library since the MNIST experiment was last working. It looks like the solver is hitting some nans internally, causing the precision issue and bad gradients. For now you can try reverting to an older commit of qpth, one from around the time I last updated the MNIST example. I'll try to look into the internal solver issues soon.
-Brandon.
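For concreteness, one hypothetical way to pin qpth to an older commit and check what is actually installed; `<commit>` is a placeholder, since no specific hash is named above:

```python
# Outside Python, reinstall qpth at a specific older commit, e.g.:
#   pip install --force-reinstall "git+https://github.com/locuslab/qpth.git@<commit>"
# Then confirm which qpth the interpreter actually picks up:
import qpth
print(qpth.__file__)  # filesystem path of the imported package
```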
Thanks for the quick reply! I will try working with the older commit of qpth.
Hi, I tried most of the early versions of qpth, but none of them work. They fail in various ways, mostly inside qpth. Could you check which version works?
Hi Brandon, it would be really helpful if you could point us to the right version of qpth, since we have been unable to get it to work.
Hi, the nans were coming up in the backward pass in qpth, and I've pushed a fix to it here: https://github.com/locuslab/qpth/commit/e2cac495909159aae12461262d0ee540ddf9abd6
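For localizing backward-pass nans like these, PyTorch's built-in anomaly detection is handy. A minimal sketch follows; the tiny sqrt reproducer is illustrative only, not the qpth bug itself:

```python
import torch

# Debug aid: with anomaly detection enabled, backward() raises at the first
# operation whose gradient contains nan, with a traceback pointing at the
# forward op responsible. It slows training, so turn it off afterwards.
torch.autograd.set_detect_anomaly(True)

# Tiny illustrative reproducer: sqrt'(0) is inf, and inf * 0 in the chain
# rule yields nan, so this backward() raises an error naming SqrtBackward
# instead of silently producing nan gradients.
x = torch.zeros(3, requires_grad=True)
(torch.sqrt(x) * 0).sum().backward()
```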
Here's the convergence of one of my new runs (I modified `z0` and `s0` to be fixed; pull this from the latest version of this repo). Since the loss is so jumpy, the LR should probably be bumped down:
[convergence plot]
Can you try running the training again with the latest versions of this repo and qpth?
-Brandon.
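As a sketch of the lower-LR suggestion above: the optimizer class and the 1e-4 value below are assumptions, not this repo's actual defaults.

```python
import torch

model = torch.nn.Linear(784, 10)  # stand-in for the actual OptNet model

# Illustrative: drop the learning rate (here to 1e-4) to damp the jumpy
# loss, and optionally decay it further on a fixed schedule.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

for epoch in range(30):
    # ... one epoch of training here ...
    scheduler.step()  # multiplies the LR by 0.1 every 10 epochs
```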