Currently, the entire dataset needs to fit in memory.
You can (and it is a good idea in this case) feed the dataset as a stream using a tf.data.Dataset. See the dataset section of the migration guide for more details. However, the memory consumption will still be ~4 bytes per value + index.
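For illustration, here is a minimal sketch of this streaming approach. The file path `train.csv` and the label column name `"label"` are hypothetical placeholders, not from this thread:

```python
import tensorflow as tf
import tensorflow_decision_forests as tfdf

# Stream batches from the CSV on disk instead of loading it all at once.
# make_csv_dataset yields (feature_dict, label) batches.
train_ds = tf.data.experimental.make_csv_dataset(
    "train.csv",          # hypothetical path to the training CSV
    batch_size=256,
    label_name="label",   # hypothetical label column name
    num_epochs=1,         # TF-DF reads the dataset exactly once
    shuffle=False,
)

model = tfdf.keras.RandomForestModel()
model.fit(train_ds)  # accepts a tf.data.Dataset, like other Keras models
```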
See my comments on this issue for some details on how to optimize the RAM consumption.
My training data is in a multi-GB CSV file. I have built a data pipeline using tf.data to stream this data and do some pre-processing. Can I use these dataset objects in tfdf's model.fit (similar to how it is done in Keras), or does tfdf need all the data to be stored in memory?