Sandy4321 opened this issue 2 years ago
Hi, and thanks!
Hi, and thanks!
This code should work in general for any loss function that can be written as a sum of squares, so it should probably be fine with one-hot data: your loss function is then $\min_w \sum_i (\mathrm{model}(w, x_i) - y_i)^2$ (or some other sensible measure of discrepancy), where the targets $y_i$ are one-hot encoded. If you can write your problem in this format, then DFBGN should be suitable.
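For example, here is a minimal sketch (with made-up toy data and a simple linear model, purely to illustrate the format; none of these names come from the thread) of writing such a problem as a vector of residuals whose sum of squares is the loss, which is the form a least-squares solver like DFO-LS or DFBGN expects:

```python
import numpy as np

# Toy data: 100 rows, 5 features, targets one-hot encoded over 3 classes
rng = np.random.default_rng(0)
X = rng.random((100, 5))
Y = np.eye(3)[rng.integers(0, 3, size=100)]   # one-hot targets y_i

def residuals(w):
    """Residual vector with entries model(w, x_i) - y_i, flattened to 1D."""
    W = w.reshape(5, 3)                        # unknowns of a simple linear model
    return (X @ W - Y).ravel()

# The loss min_w sum_i (model(w, x_i) - y_i)^2 is then just the squared norm
w = np.zeros(5 * 3)
loss = np.sum(residuals(w) ** 2)
```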
If your problem is not large scale (e.g. <= 100 unknowns you want to optimize), then I would recommend DFO-LS.
Unfortunately there are not a lot of accessible resources on the topic, but depending on your background I would recommend:
Unfortunately I don't have a print-friendly version of the presentation you mention. That talk was more about the DFO-LS software, so you could look at the papers mentioned in the readme (and the online documentation) for more details; those would be print-friendly.
Great, thanks for the quick answer. The matter is:
If your problem is not large scale (e.g. <= 100 unknowns you want to optimize), then I would recommend DFO-LS.
Usually one-hot tabular data is very large scale and very sparse (around 90% of the values are zeros and 10% are ones), e.g. 20000 features (unknowns) and 100000 rows. Would your code work in such a case?
No, I don't think DFO-LS would be the right choice for problems that large (it isn't able to make use of sparsity). However, you should be able to use this code (DFBGN) ok; it would just be a matter of picking the fixed_block input small enough.
Note that there is usually a tradeoff: larger fixed_block values will optimize quicker (i.e. fewer iterations/evaluations of the objective function), but each iteration will take longer to run. You should pick a value that seems to provide a good balance for your problem (I can't give good advice on that, but I have tried values of fixed_block as small as n/100, where n is the number of unknowns).
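As a rough sketch of what this might look like (scaled-down synthetic sparse data and illustrative sizes of my choosing; the call follows the dfbgn.solve usage shown in the README, but check the documentation for the exact options):

```python
import numpy as np
import scipy.sparse
import dfbgn

# Scaled-down stand-in for a large sparse one-hot problem (illustrative sizes only)
m, n = 2000, 500                                  # rows, unknowns
rng = np.random.default_rng(0)
X = scipy.sparse.random(m, n, density=0.1, format="csr", random_state=0)
X.data[:] = 1.0                                   # sparse 0/1 data, roughly 10% ones
y = rng.random(m)                                 # targets

def residuals(w):
    # Residual vector model(w, x_i) - y_i for a linear model; only a sparse
    # matrix-vector product is needed, so the sparsity of X is exploited here.
    return X @ w - y

w0 = np.zeros(n)
# A small fixed_block keeps each iteration cheap on large problems; values as
# small as n/100 have been tried (see above).
soln = dfbgn.solve(residuals, w0, fixed_block=max(5, n // 100))
print(soln)
```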
Hello Dr. Roberts, great code and talk: https://www.youtube.com/watch?v=RvEZURqfaC4
Thank you very much!
But will it work for big, very sparse one-hot data (only 0s and 1s in the data)?
https://machinelearningmastery.com/why-one-hot-encode-data-in-machine-learning/ https://en.wikipedia.org/wiki/One-hot https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html
By the way, do you have a print-friendly version of your presentation "Derivative-free optimisation for least-squares problems"
https://lindonroberts.github.io/talk/unsw_202004/roberts_unsw.pdf
for example in Word format? Or simpler slides that cover just the idea, or another introductory video?
Thanks in advance ...