openopt / copt

A Python library for mathematical optimization
http://openo.pt/copt

Trouble reproducing Stochastic FW benchmarks #100

Open busycalibrating opened 1 year ago

busycalibrating commented 1 year ago

Hello!

I'm trying to play around with the stochastic Frank-Wolfe benchmarks you have in this example, and I'm having trouble reproducing the figure by running the provided script. The expected results are as follows:

[image: expected convergence figure from the documentation example]

However, when I run the script out of the box, the first thing to note is that it takes a very long time; for 10 iters, these are the timing results:

CPU times: 
     user 16min 12s
     sys: 7min 29s
     total: 23min 42s
Wall time: 12min 25s

Perhaps this is fine, seeing as ~10 steps should be sufficient (this would correspond to ~10^8 gradient evaluations, which would match the figure in the example), but the real problem is with the convergence results: if I set max_iter=500 (vs. 1e4) and let it run for the full 2h x 3 runs, I get worse results:

[image: convergence plot from my run with max_iter=500]

In particular, after 10^8 gradient evaluations, the relative suboptimality is noticeably worse than in the expected figure above.
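For reference, here is roughly what a mini-batch Frank-Wolfe step over an l1 ball looks like, and why the gradient-evaluation count grows as iterations x batch size. This is only a sketch with placeholder names and an assumed batch size, not copt's implementation:

    # Minimal sketch (not copt's code) of mini-batch Frank-Wolfe over an l1 ball.
    import numpy as np

    def sfw_l1_ball(batch_grad, x0, n_samples, alpha, batch_size=100,
                    max_iter=1000, seed=0):
        # batch_grad(x, idx) should return the average gradient of the loss
        # over the mini-batch indexed by idx.
        rng = np.random.default_rng(seed)
        x = x0.copy()
        n_grad_evals = 0
        for t in range(max_iter):
            idx = rng.integers(0, n_samples, size=batch_size)
            g = batch_grad(x, idx)              # stochastic gradient estimate
            n_grad_evals += batch_size          # this is what the x-axis counts
            # Linear minimization oracle over the l1 ball of radius alpha:
            # the best vertex is -alpha * sign(g_j) * e_j at the largest |g_j|.
            j = np.argmax(np.abs(g))
            s = np.zeros_like(x)
            s[j] = -alpha * np.sign(g[j])
            gamma = 2.0 / (t + 2)               # textbook FW step size; stochastic
            x = (1 - gamma) * x + gamma * s     # variants use their own schedules
        return x, n_grad_evals

With that accounting, a handful of passes over a large dataset already corresponds to ~10^8 gradient evaluations on the x-axis.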

Some additional info that may be useful: if helpful, we can also play around with the example in a Colab notebook.

Thanks for any help!

fabianp commented 1 year ago

Thanks for the report. For the timing, having numba installed can help; in your notebook you can install it with "!pip install numpy scipy numba". However, I can see from the example page (https://openopt.github.io/copt/auto_examples/frank_wolfe/plot_sfw_real_data.html#sphx-glr-download-auto-examples-frank-wolfe-plot-sfw-real-data-py) that it took 47 min there anyway, so I don't expect numba to change the runtime too much.

For the different suboptimality I don't have an answer. If I understand correctly, the issue is that with more iterations you get a worse FW gap and suboptimality? It could also just be that the example on our website is out of date; I'll try to re-generate it just in case.

Any ideas @GeoffNN ?

busycalibrating commented 1 year ago

Thanks for the response! I'll note that I did try using numba on my local machine and it was taking even longer, even after a full run through one epoch. I haven't looked into this much, so it may be a versioning issue though.
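In case it helps narrow things down, here's a quick way to record the versions in play (needs Python 3.8+ for importlib.metadata), so we can compare an environment where numba helps against one where it slows things down:

    # Print installed versions of the relevant packages.
    from importlib.metadata import version

    for pkg in ("numpy", "scipy", "numba", "copt"):
        print(pkg, version(pkg))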

fabianp commented 1 year ago

@GeoffNN : any idea why it's so slow? Could it be related to the use of sparse matrices?

busycalibrating commented 1 year ago

Were you able to reproduce the figures in the doc/paper by the way? Or were you getting similar results to mine?

fabianp commented 1 year ago

I'm building the doc locally to see. I'm traveling at the moment so it might take a while. Hopefully by tomorrow we'll know


busycalibrating commented 1 year ago

Sounds great, thank you for your help!

GeoffNN commented 1 year ago

Hi!

I'm not sure what's going on... Can you please try the version on the geoff/plotsSFW branch and let me know if it changes anything?
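Installing directly from that branch should work along these lines (assuming you install from the main openopt/copt repo rather than a fork):

    pip install git+https://github.com/openopt/copt.git@geoff/plotsSFW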

fabianp commented 1 year ago

Out of curiosity @busycalibrating, have you tried running the example locally? It could also be that Colab's free tier caps CPU speed or other resources.

busycalibrating commented 1 year ago

Yes, I tried it both locally (on Linux and on an M1 Mac) and on a compute cluster. I think the actual execution time is okay; it looks like the script takes ~10 iters (epochs?), which translates to ~10^8 gradient evaluations and roughly ~45 min of runtime, in line with what the doc says. It's the convergence rate that's confusing me more.

@GeoffNN I'll give it a try!

GeoffNN commented 1 year ago

Someone reached out to me on a different channel with a similar issue. I couldn't identify exactly what was going on, but it seems like the code on main runs fewer iterations than the code on geoff/plotsSFW. I think it may be an x-axis issue with the plotting, rather than a problem with the algorithm itself. They managed to reproduce the figures with the branch code. Sorry, I never got around to pinning down exactly what was wrong, since diffing the files didn't expose the problem.
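To illustrate what I mean by an x-axis issue (purely a sketch, with an assumed batch size): the same run looks shifted by a factor of the batch size depending on whether the x-axis counts iterations or gradient evaluations.

    import numpy as np
    import matplotlib.pyplot as plt

    batch_size = 100                      # assumed value, for illustration only
    iters = np.arange(1, 10_001)
    subopt = 1.0 / iters                  # stand-in for an O(1/t) FW-like curve

    plt.loglog(iters, subopt, label="x = iteration index")
    plt.loglog(iters * batch_size, subopt, label="x = gradient evaluations")
    plt.xlabel("count used on the x-axis")
    plt.ylabel("relative suboptimality")
    plt.legend()
    plt.show()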

The speed part is more surprising, and may be a Colab issue, as @fabianp mentioned.

busycalibrating commented 1 year ago

Update: it seems like the branch you pointed me to is working; speed-wise it's much faster, and with numba it's hitting ~2 iter/s, so I think the older version of the package works as intended.

I also ran the code up to 500 epochs (~5 min with numba), and it now translates to about 10^7 steps. Furthermore, the convergence rate looks consistent with the old figures in the documentation and paper, so I think there must be a bug in the latest version somewhere.

[image: https://user-images.githubusercontent.com/19653980/214443610-76c82057-61e3-4dad-a1ae-a40c21a8111a.png]

GeoffNN commented 1 year ago

Awesome, I hope this fixes your issue for now. Leaving the issue open to remind myself to double check and merge again.
