numericalalgorithmsgroup / dfbgn

Python solver for large-scale nonlinear least-squares minimization without derivatives
https://numericalalgorithmsgroup.github.io/dfbgn/
GNU General Public License v3.0

Will it work for sparse one-hot data (only 0s and 1s in the data)? #1

Open Sandy4321 opened 2 years ago

Sandy4321 commented 2 years ago

Hello Dr. Roberts, great code and talk: https://www.youtube.com/watch?v=RvEZURqfaC4

Thank you very much.

But will it work for big, very sparse one-hot data (only 0s and 1s in the data)?

https://machinelearningmastery.com/why-one-hot-encode-data-in-machine-learning/
https://en.wikipedia.org/wiki/One-hot
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html

By the way, do you have a print-friendly version of your presentation "Derivative-free optimisation for least-squares problems"?

https://lindonroberts.github.io/talk/unsw_202004/roberts_unsw.pdf

For example, in Word format? Or simpler slides that convey just the main idea, or another introductory video...

Thanks in advance...

lindonroberts commented 2 years ago

Hi, and thanks!

This code should work in general for any loss function that can be written as a sum of squares, so it probably should be fine with one-hot data: your loss function is then $\min_{w} \sum_{i} (\mathrm{model}(w, x_i) - y_i)^2$, where the $y_i$ targets are one-hot encoded (or some other sensible measure of discrepancy is used). If you can write your problem in this format, then DFBGN should be suitable.
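For concreteness, a minimal sketch of that formulation (the linear model, random data, and `fixed_block` value here are made up for illustration; DFBGN just needs a callable that returns the vector of residuals):

```python
import numpy as np
import dfbgn

# Made-up one-hot style data: rows of 0s and 1s, roughly 10% ones
rng = np.random.default_rng(0)
X = (rng.random((200, 20)) < 0.1).astype(float)
y = rng.random(200)

def residuals(w):
    # r_i(w) = model(w, x_i) - y_i for a simple linear model;
    # DFBGN minimizes sum_i r_i(w)^2 without needing derivatives
    return X @ w - y

soln = dfbgn.solve(residuals, np.zeros(X.shape[1]), fixed_block=5)
print(soln)
```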

If your problem is not large scale (e.g. <= 100 unknowns you want to optimize), then I would recommend DFO-LS.
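(For reference, a DFO-LS call looks essentially the same; a minimal sketch using the classic Rosenbrock test problem:)

```python
import numpy as np
import dfols

# Rosenbrock residuals: a standard small (n = 2) least-squares test problem
def rosenbrock(x):
    return np.array([10.0 * (x[1] - x[0] ** 2), 1.0 - x[0]])

soln = dfols.solve(rosenbrock, np.array([-1.2, 1.0]))
print(soln)
```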

Unfortunately there are not a lot of accessible resources on the topic; what I would recommend depends on your background.

Unfortunately I don't have a print-friendly version of the presentation you mention. That talk was more about the DFO-LS software, so you could look at the papers mentioned in the readme (and the online documentation) for more details. These would be print-friendly.

Sandy4321 commented 2 years ago

Great, thanks for the quick answer. The matter is:

> If your problem is not large scale (e.g. <= 100 unknowns you want to optimize), then I would recommend DFO-LS.

Usually one-hot tabular data has huge scale and huge sparsity (90% of the data are zeros and 10% are ones), e.g. 20,000 features (unknowns) and 100,000 rows.

Would your code work in such a case?

lindonroberts commented 2 years ago

No, I don't think DFO-LS would be the right choice for problems that large (it isn't able to make use of sparsity). However, you should be able to use this code (DFBGN) fine; it would just be a matter of picking the `fixed_block` input small enough.

Note that there is usually a tradeoff: larger `fixed_block` values will optimize quicker (i.e. fewer iterations/evaluations of the objective function), but each iteration will take longer to run. You should pick a value that seems to provide a good balance for your problem (I can't give good advice on that, but I have tried `fixed_block` values as small as n/100, where n is the number of unknowns).
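A rough sketch of what that could look like (the data here are random stand-ins, scaled down from 100,000 x 20,000 so the example runs quickly, and n/100 is just the heuristic mentioned above, not a tuned recommendation):

```python
import numpy as np
import scipy.sparse as sp
import dfbgn

# Scaled-down stand-in for large sparse one-hot data (~10% ones);
# the real problem above would be m, n = 100_000, 20_000
rng = np.random.default_rng(0)
m, n = 2_000, 400
X = sp.random(m, n, density=0.1, format="csr", random_state=0)
X.data[:] = 1.0  # entries are 0 or 1, one-hot style
y = rng.random(m)

def residuals(w):
    # Sparse matrix-vector product keeps each objective evaluation cheap
    return X @ w - y

# Heuristic: fixed_block around n/100 (for n = 20,000 that would be 200)
block = max(n // 100, 1)
soln = dfbgn.solve(residuals, np.zeros(n), fixed_block=block)
print(soln)
```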