mlr-org / mlr3keras

Deep learning for mlr3
GNU Lesser General Public License v3.0

Feature request: Fit via generator #6

Closed: JackyP closed this issue 4 years ago.

JackyP commented 5 years ago

For modelling out-of-core datasets that exceed memory, mlr3db seems to provide a suitable backend, and keras should in theory be well suited to online learning.

From reading the source code, it looks like mlr3keras loads the whole dataset into memory, so I would run out of memory.

It should be possible to read the data chunk by chunk and train the model with keras::fit_generator. I can help if needed.
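The chunk-by-chunk idea can be illustrated with a small generator. This is a minimal Python sketch, not mlr3keras code: `chunked_batches` and the reader `read_chunk(offset, size)` are hypothetical names (the reader stands in for something like a LIMIT/OFFSET database query), and the in-memory `table` only simulates an out-of-core source. In the R keras interface, the equivalent would be a closure that returns `list(x, y)` on each call, passed to `keras::fit_generator` together with `steps_per_epoch`.

```python
import math
from typing import Callable, Iterator, List, Tuple

# One row = (feature vector, target); a stand-in for whatever the backend returns.
Row = Tuple[List[float], float]
Batch = Tuple[List[List[float]], List[float]]


def chunked_batches(
    read_chunk: Callable[[int, int], List[Row]],
    n_rows: int,
    batch_size: int,
) -> Iterator[Batch]:
    """Yield (x, y) mini-batches forever, holding only one chunk in memory.

    `read_chunk(offset, size)` is a hypothetical reader (e.g. a LIMIT/OFFSET
    query); it is not an mlr3db or keras function.
    """
    while True:  # Keras-style training generators are expected to loop indefinitely
        for offset in range(0, n_rows, batch_size):
            rows = read_chunk(offset, min(batch_size, n_rows - offset))
            x = [features for features, _ in rows]
            y = [target for _, target in rows]
            yield x, y


# Usage with an in-memory table standing in for an out-of-core source.
table: List[Row] = [([float(i)], float(i % 2)) for i in range(10)]
gen = chunked_batches(lambda off, size: table[off:off + size], n_rows=len(table), batch_size=4)
steps_per_epoch = math.ceil(len(table) / 4)  # 3 batches cover one pass over the data
first_epoch = [next(gen) for _ in range(steps_per_epoch)]
```

Because the generator wraps around after the last (possibly smaller) batch, the training loop decides when an epoch ends via `steps_per_epoch`, while memory use stays bounded by one batch.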

pfistfl commented 5 years ago

I was looking into tfdatasets, which might help here: it can read data from disk, SQL databases, TFRecord files, etc. I have not yet had time to work out how to use it; perhaps as a custom DataBackend? Thoughts?

pfistfl commented 4 years ago

I looked into this a little more today and ran into the following problem: to use mlr3's resampling, we essentially need lists of train and test indices.

In the current keras setup, we would instead need a train generator and a test generator.

For example, if we wanted to run 10-fold cross-validation (CV10) on MNIST, I don't know how we would combine the train, test and validation generators, split them according to the 10 folds, and recombine them as needed.
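One way around splitting pre-built generators is to go the other direction: let the resampling produce row ids (mlr3 resamplings already expose them via `train_set(i)` and `test_set(i)`) and build fresh generators per fold from those ids. The Python sketch below assumes a backend that can fetch arbitrary rows by id; `kfold_indices`, `id_batches` and `fetch_rows` are hypothetical helpers for illustration, not mlr3 or keras APIs.

```python
import random
from typing import Callable, Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def kfold_indices(n_rows: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_ids, test_ids) pairs over row ids 0..n_rows-1."""
    ids = list(range(n_rows))
    random.Random(seed).shuffle(ids)
    folds = [ids[i::k] for i in range(k)]  # round-robin assignment of shuffled ids
    splits = []
    for i in range(k):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        splits.append((train, folds[i]))
    return splits


def id_batches(
    fetch_rows: Callable[[Sequence[int]], List[T]],
    ids: Sequence[int],
    batch_size: int,
    loop: bool = True,
) -> Iterator[List[T]]:
    """Yield batches of rows for the given ids, fetching one batch at a time.

    `fetch_rows(ids)` is a hypothetical backend call returning those rows.
    loop=True suits training (endless); loop=False gives one evaluation pass.
    """
    while True:
        for start in range(0, len(ids), batch_size):
            yield fetch_rows(ids[start:start + batch_size])
        if not loop:
            return


# Usage: 5-fold CV over a toy table; each fold gets its own generators.
table = [f"row{i}" for i in range(10)]


def fetch(ids: Sequence[int]) -> List[str]:
    return [table[i] for i in ids]


splits = kfold_indices(len(table), k=5)
for train_ids, test_ids in splits:
    train_gen = id_batches(fetch, train_ids, batch_size=4)  # endless: pass to training
    test_rows = [r for batch in id_batches(fetch, test_ids, 4, loop=False) for r in batch]
    # ... fit on train_gen, then score the model on test_rows ...
```

With this design the resampling remains the single source of truth for the splits, and the generators are thin views on index lists. The trade-off is that every batch needs random access by row id, which may be slow for storage formats that are read sequentially.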

I think a first approach could be to define a backend that only receives a train and a test generator, and therefore only supports a "custom holdout" split.

pfistfl commented 4 years ago

I assume this is solved; it seems to work.