titu1994 / pyshac

A Python library for the Sequential Halving and Classification algorithm
http://titu1994.github.io/pyshac/
MIT License

Split dealing with CSV from training and dataset cleaning in fit_dataset #9

Open KOLANICH opened 5 years ago

KOLANICH commented 5 years ago

Hi again.

I'm implementing resumption and metaoptimization in UniOpt, so for the pyshac backend I need an API to inject points into the optimizer from memory: a) bulk injection (needed for resumption; may be less efficient, since it is done rarely); b) individual point injection (needed for metaoptimization and should be efficient, i.e. incremental learning).

I guess fit_dataset does this, but:

  1. it deals with CSV files, so I cannot use it as it is;
  2. it does too much work, so I don't want to recreate it.

It would be better if it worked not only with arrays, but also with iterators.
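To make the request concrete, a point-injection helper that accepts any iterable of (hyperparams, loss) pairs might look like the sketch below. All names here (`fit_points`) are illustrative, not part of the current pyshac API:

```python
from typing import Any, Dict, Iterable, Tuple


def fit_points(points: Iterable[Tuple[Dict[str, Any], float]]) -> int:
    """Hypothetical bulk-injection helper: consume (hyperparams, loss)
    pairs from any iterable (list, generator, CSV reader, ...).

    A real engine would update its classifier stack here; this sketch
    only validates each point's structure and counts them.
    """
    count = 0
    for params, loss in points:
        assert isinstance(params, dict)
        float(loss)  # must be coercible to a scalar loss
        count += 1
    return count


# Works with an in-memory list...
n = fit_points([({"x": 1.0}, 0.5), ({"x": 2.0}, 0.3)])
# ...or with a generator that never materializes the full dataset.
m = fit_points(({"x": float(i)}, i * 0.1) for i in range(100))
```

Accepting a generic iterable means the same entry point covers both the rare bulk case (resumption) and the streaming case (metaoptimization) without forcing a round-trip through CSV files.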

titu1994 commented 5 years ago

I'm a bit confused as to what "API to inject points into the optimizer from memory" means.

Do you wish to add samples to the engine without going through the fit dataset procedure?

KOLANICH commented 5 years ago

I want to add points (hyperparams, loss) into the engine while it is operating.

Most optimizers work in a "predict n points with the model -> evaluate the function at those points -> add the points and function values to the model, getting the model ready for the next iteration" loop.

I want to add points to the model directly, without the first two steps. This is meant to enable resumption and metaoptimization.
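The loop above, and the shortcut being requested, can be sketched with an ask/tell-style interface (a common pattern in optimization libraries; `ToyOptimizer`, `ask`, and `tell` are hypothetical names, not pyshac's API):

```python
class ToyOptimizer:
    """Minimal ask/tell sketch of the predict -> evaluate -> update loop."""

    def __init__(self):
        self.history = []  # list of (params, loss) pairs seen so far

    def ask(self):
        # Step 1: predict the next candidate point (trivially fixed here).
        return {"x": 0.0}

    def tell(self, params, loss):
        # Step 3: add a point to the model, regardless of where it came
        # from. This is the direct-injection entry point being requested.
        self.history.append((params, loss))


opt = ToyOptimizer()

# Normal loop: ask -> evaluate -> tell.
p = opt.ask()
opt.tell(p, p["x"] ** 2)

# Direct injection, skipping the first two steps, e.g. a point restored
# from a checkpoint or produced by an outer metaoptimizer.
opt.tell({"x": 3.0}, 9.0)
```

With `tell` exposed on its own, both resumption (replaying stored points) and metaoptimization (feeding points produced elsewhere) reduce to calling it without `ask`.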

titu1994 commented 5 years ago

So your request is to decouple the dataset loading, reading, and shuffling inside fit_dataset into two distinct steps: dataset/generator manipulation and engine training, so that you could in theory feed in either numpy arrays or a dataset path and both would be handled in the same manner.
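A minimal sketch of that decoupling, assuming a hypothetical split into `iter_points` (normalize any source into a stream of points) and `train_engine` (consume the stream) — neither function exists in pyshac; CSV columns and the `loss` column name are illustrative:

```python
import csv
import io
from typing import Iterable, Iterator, Tuple, Union

Point = Tuple[dict, float]


def iter_points(
    source: Union[io.TextIOBase, Iterable[Point]]
) -> Iterator[Point]:
    """Step 1: normalize a source into a stream of (params, loss) pairs.

    Accepts either an open CSV file-like object (with a 'loss' column)
    or an in-memory iterable of points.
    """
    if isinstance(source, io.TextIOBase):
        for row in csv.DictReader(source):
            loss = float(row.pop("loss"))
            yield ({k: float(v) for k, v in row.items()}, loss)
    else:
        yield from source


def train_engine(points: Iterable[Point]) -> int:
    """Step 2: consume the stream; stands in for the training pass."""
    return sum(1 for _ in points)


# Both source types go through the same two-step pipeline.
csv_data = io.StringIO("x,loss\n1.0,0.5\n2.0,0.3\n")
from_csv = train_engine(iter_points(csv_data))
from_memory = train_engine(iter_points([({"x": 3.0}, 0.1)]))
```

Because step 2 only ever sees an iterator, in-memory arrays, generators, and file-backed datasets all train through the same code path.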

Sounds feasible, but I won't be able to work on it for a few weeks.