lululxvi / deepxde

A library for scientific machine learning and physics-informed learning
https://deepxde.readthedocs.io
GNU Lesser General Public License v2.1

batch_size not fully implemented in DataSet class #1706

Open Lucasas126 opened 5 months ago

Lucasas126 commented 5 months ago

Hello! I am working on a model to predict the effects of neutron irradiation on materials. The model is driven almost entirely by large amounts of data, since writing governing equations is not straightforward for this kind of prediction. The issue I encountered is that any training based on the dde.data.DataSet class was very slow with my data, even when I set batch_size to smaller values, whereas this was not the case with small datasets such as the one in the examples (dataset.train and dataset.test). By further examining how the Model class fetches training batches, I found that:

Original code found in DataSet:

def train_next_batch(self, batch_size=None):
    return self.train_x, self.train_y

Fix I used:

def train_next_batch(self, batch_size=None):
    if batch_size is None:
        # Full-batch training: same behavior as the original implementation.
        return self.train_x, self.train_y
    # self.train_sampler must be created in __init__ (e.g., a BatchSampler
    # over the training rows); it returns the indices of the next mini-batch.
    indices = self.train_sampler.get_next(batch_size)
    return (
        self.train_x[indices],
        self.train_y[indices],
    )
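For completeness, here is a minimal sketch of how the sampler referenced above could be wired up without patching the library, assuming deepxde.data.sampler.BatchSampler (the mini-batch sampler DeepXDE's PDE data class uses); the subclass name is just illustrative:

import deepxde as dde
from deepxde.data.sampler import BatchSampler


class BatchedDataSet(dde.data.DataSet):
    """dde.data.DataSet plus a sampler so batch_size is honored (illustrative name)."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Shuffling sampler over the training rows, reused by train_next_batch.
        self.train_sampler = BatchSampler(len(self.train_y), shuffle=True)

    def train_next_batch(self, batch_size=None):
        if batch_size is None:
            return self.train_x, self.train_y
        indices = self.train_sampler.get_next(batch_size)
        return self.train_x[indices], self.train_y[indices]

With something like this in place, passing batch_size to model.train(...) should actually subsample the training data each iteration instead of silently using the full set.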

Finally, if I may make a suggestion, it would be helpful to have a mode for this DataSet class that loads the data in batches instead of reading the whole dataset into memory, since the latter can exhaust RAM and crash the kernel (a rough sketch of the idea follows). I hope this helps, and thank you for the great work on this library; I am enjoying learning from it a lot!
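One way to approximate such a mode today, assuming the training data is stored as .npy files (the file names below are placeholders), is to memory-map the arrays so only the rows of each sampled batch are ever read into RAM:

import numpy as np

# Memory-map the arrays: the OS pages in only the rows that are indexed,
# so the full dataset never has to fit in memory at once.
X = np.load("train_x.npy", mmap_mode="r")  # placeholder file name
y = np.load("train_y.npy", mmap_mode="r")  # placeholder file name

rng = np.random.default_rng(0)


def next_batch(batch_size):
    # Draw a random mini-batch; fancy indexing copies just these rows into RAM.
    idx = rng.choice(len(y), size=batch_size, replace=False)
    return np.asarray(X[idx]), np.asarray(y[idx])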

praksharma commented 5 months ago

If you are not using equations, you should try techniques suited to purely data-driven surrogates. DeepXDE provides DeepONets for parametric learning; you could also try Fourier neural operators, which are not available in DeepXDE.
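For reference, a purely data-driven DeepONet setup in DeepXDE looks roughly like the sketch below. The shapes and sizes are illustrative placeholders: m is the number of sensor values per input function (branch input), and the trunk input holds the query coordinates.

import numpy as np
import deepxde as dde

m, n_pts = 100, 50            # illustrative: sensors per function, query points
n_train, n_test = 1000, 200   # illustrative dataset sizes

# Placeholder random data; replace with your own function samples, coordinates, and outputs.
branch_train = np.random.rand(n_train, m).astype(np.float32)
trunk = np.random.rand(n_pts, 1).astype(np.float32)
y_train = np.random.rand(n_train, n_pts).astype(np.float32)
branch_test = np.random.rand(n_test, m).astype(np.float32)
y_test = np.random.rand(n_test, n_pts).astype(np.float32)

# Triple of (branch inputs, trunk inputs) -> outputs, in Cartesian-product form.
data = dde.data.TripleCartesianProd(
    X_train=(branch_train, trunk), y_train=y_train,
    X_test=(branch_test, trunk), y_test=y_test,
)
net = dde.nn.DeepONetCartesianProd([m, 40, 40], [1, 40, 40], "relu", "Glorot normal")

model = dde.Model(data, net)
model.compile("adam", lr=1e-3, metrics=["mean l2 relative error"])
model.train(iterations=10000, batch_size=32)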