I was looking into tfdatasets, and this might help. tfdatasets allows reading from disk, SQL databases, TFRecords, etc. I have not found enough time to grasp how to use it yet; perhaps it could serve as a custom DataBackend? Thoughts?
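For reference, a minimal sketch of how tfdatasets streams records from disk rather than loading everything into memory; the file paths and arguments here are made up for illustration:

```r
library(tfdatasets)

# Hypothetical path; make_csv_dataset() streams the file in batches
# instead of reading it into memory all at once.
ds <- make_csv_dataset(
  "data/train.csv",   # assumed path, not from this issue
  batch_size = 32,
  shuffle = TRUE
)

# tfrecord_dataset() does the same for TFRecord files:
# ds <- tfrecord_dataset("data/train.tfrecords")
```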
I looked into this a little more today.
I have the following problem:
If we want to use mlr3's resampling, we basically need a list of train/test indices.
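For concreteness, this is what mlr3 hands us (standard mlr3 resampling API, shown on a built-in task):

```r
library(mlr3)

task <- tsk("iris")
resampling <- rsmp("cv", folds = 10)
resampling$instantiate(task)

# mlr3 exposes plain integer row ids per fold; these are what any
# keras generator would have to respect.
train_ids <- resampling$train_set(1)
test_ids  <- resampling$test_set(1)
```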
In the current keras setup, we would need a train and test generator.
For example, if we now wanted to use 10-fold CV on mnist, I don't really know how to combine the train/test/valid generators, split them according to the CV folds, and recombine them as needed.
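One way around this might be to build a fresh generator per fold from the row ids, rather than splitting existing generators. A rough sketch; the helper name and batching logic are my own, not an existing API:

```r
# Hypothetical helper: wrap a subset of rows (given by mlr3 row ids)
# as a keras-style generator that yields list(x, y) batches,
# wrapping around at the end of the fold.
make_fold_generator <- function(x, y, row_ids, batch_size = 32) {
  i <- 0L
  function() {
    idx <- row_ids[(i + seq_len(batch_size) - 1L) %% length(row_ids) + 1L]
    i <<- (i + batch_size) %% length(row_ids)
    list(x[idx, , drop = FALSE], y[idx])
  }
}

# e.g. train_gen <- make_fold_generator(x, y, resampling$train_set(1))
```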
I think a first approach would be to define a backend that just obtains a train generator and a test generator, and thus basically only allows a "custom holdout".
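In mlr3 terms, such a "custom holdout" would map onto the existing custom resampling, roughly like this (the index ranges are placeholders):

```r
library(mlr3)

task <- tsk("iris")

# A single user-defined train/test split; the indices stand in for
# whatever rows the two generators would cover.
custom <- rsmp("custom")
custom$instantiate(task,
  train_sets = list(1:100),
  test_sets  = list(101:150)
)
```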
I'd assume this is solved; it seems to work.
For modelling out-of-core datasets that exceed memory, mlr3db seems to provide a suitable backend, and keras should in theory be a good fit for online learning.
Reading the source code, however, it looks like the dataset is read into memory, so I would run out of memory.
It should be possible to read it chunk by chunk and train via keras::fit_generator; see the sketch below. I can help if needed.
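A rough sketch of what I mean, assuming the task sits on a lazy mlr3 DataBackend (e.g. one created with mlr3db) and a compiled keras model; the chunk size and the feature handling are placeholders of my own:

```r
library(keras)

# Hypothetical helper: stream rows from the backend chunk by chunk
# instead of materialising the whole dataset in memory.
chunked_generator <- function(backend, row_ids, features, target,
                              chunk_size = 512L) {
  i <- 0L
  function() {
    idx <- row_ids[(i + seq_len(chunk_size) - 1L) %% length(row_ids) + 1L]
    i <<- (i + chunk_size) %% length(row_ids)
    chunk <- backend$data(rows = idx, cols = c(features, target))
    list(as.matrix(chunk[, features, with = FALSE]), chunk[[target]])
  }
}

# Usage (model assumed to be compiled already):
# model %>% fit_generator(
#   chunked_generator(task$backend, task$row_ids,
#                     task$feature_names, task$target_names),
#   steps_per_epoch = 100, epochs = 5
# )
```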