Sandy4321 opened this issue 2 years ago
Hi, and thanks!
Hi, and thanks!
This code should work in general for any loss function that can be written as a sum of squares, so it should probably be fine with one-hot data: your loss function is then $\min_w \sum_i (\mathrm{model}(w, x_i) - y_i)^2$ (or some other sensible measure of discrepancy), where the targets $y_i$ are one-hot encoded. If you can write your problem in this format, then DFBGN should be suitable.
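For example, here is a minimal sketch (with made-up toy data and a simple linear model, purely to illustrate the format; none of these names come from the thread) of writing such a problem as a vector of residuals whose sum of squares is the loss, which is the form a least-squares solver like DFO-LS or DFBGN expects:

```python
import numpy as np

# Toy data: 100 rows, 5 features, targets one-hot encoded over 3 classes
rng = np.random.default_rng(0)
X = rng.random((100, 5))
Y = np.eye(3)[rng.integers(0, 3, size=100)]   # one-hot targets y_i

def residuals(w):
    """Residual vector with entries model(w, x_i) - y_i, flattened to 1D."""
    W = w.reshape(5, 3)                        # unknowns of a simple linear model
    return (X @ W - Y).ravel()

# The loss min_w sum_i (model(w, x_i) - y_i)^2 is then just the squared norm
w = np.zeros(5 * 3)
loss = np.sum(residuals(w) ** 2)
```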
If your problem is not large scale (e.g. <= 100 unknowns you want to optimize), then I would recommend DFO-LS.
Unfortunately there are not a lot of accessible resources on the topic, but depending on your background I would recommend:
Unfortunately I don't have a print-friendly version of the presentation you mention. That talk was more about the DFO-LS software, so you could look at the papers mentioned in the readme (and the online documentation) for more details; those would be print-friendly.
Great, thanks for the quick answer. The matter is:
If your problem is not large scale (e.g. <= 100 unknowns you want to optimize), then I would recommend DFO-LS.
Usually one-hot tabular data is very large scale and very sparse (around 90% of the values are zeros and 10% are ones), e.g. 20000 features (unknowns) and 100000 rows. Would your code work in such a case?
No, I don't think DFO-LS would be the right choice for problems that large (it isn't able to make use of sparsity). However, you should be able to use this code (DFBGN) ok; it would just be a matter of picking the fixed_block input small enough.
Note that there is usually a tradeoff: larger fixed_block values will optimize quicker (i.e. fewer iterations/evaluations of the objective function), but each iteration will take longer to run. You should pick a value that seems to provide a good balance for your problem (I can't give good advice on that, but I have tried values of fixed_block as small as n/100, where n is the number of unknowns).
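As a rough sketch of what this might look like (scaled-down synthetic sparse data and illustrative sizes of my choosing; the call follows the dfbgn.solve usage shown in the README, but check the documentation for the exact options):

```python
import numpy as np
import scipy.sparse
import dfbgn

# Scaled-down stand-in for a large sparse one-hot problem (illustrative sizes only)
m, n = 2000, 500                                  # rows, unknowns
rng = np.random.default_rng(0)
X = scipy.sparse.random(m, n, density=0.1, format="csr", random_state=0)
X.data[:] = 1.0                                   # sparse 0/1 data, roughly 10% ones
y = rng.random(m)                                 # targets

def residuals(w):
    # Residual vector model(w, x_i) - y_i for a linear model; only a sparse
    # matrix-vector product is needed, so the sparsity of X is exploited here.
    return X @ w - y

w0 = np.zeros(n)
# A small fixed_block keeps each iteration cheap on large problems; values as
# small as n/100 have been tried (see above).
soln = dfbgn.solve(residuals, w0, fixed_block=max(5, n // 100))
print(soln)
```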
Hello Dr. Roberts, great code and talk: https://www.youtube.com/watch?v=RvEZURqfaC4
Thank you very much!
But will it work for big, very sparse one-hot data (only 0s and 1s in the data)?
https://machinelearningmastery.com/why-one-hot-encode-data-in-machine-learning/ https://en.wikipedia.org/wiki/One-hot https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html
By the way, do you have a print-friendly version of your presentation "Derivative-free optimisation for least-squares problems"
https://lindonroberts.github.io/talk/unsw_202004/roberts_unsw.pdf
for example in Word format? Or simpler slides that cover just the idea, or another introductory video?
Thanks in advance ...