lululxvi / deepxde

A library for scientific machine learning and physics-informed learning
https://deepxde.readthedocs.io
GNU Lesser General Public License v2.1

batch_size not fully implemented in DataSet class #1706

Open Lucasas126 opened 5 months ago

Lucasas126 commented 5 months ago

Hello! I am working on a model to predict the effects of neutron irradiation on materials. The model is driven almost entirely by large amounts of data, since writing governing equations is not straightforward for this kind of prediction. The issue I encountered is that any training based on the dde.data.DataSet class was very slow with my data, even when I set batch_size to smaller values, whereas this was not the case with small datasets such as the one in the examples (dataset.train and dataset.test). By further examining how the Model class fetches training batches, I found that:

Original code found in DataSet:

def train_next_batch(self, batch_size=None):
    return self.train_x, self.train_y

Fix I used:

def train_next_batch(self, batch_size=None):
    if batch_size is None:
        # Full-batch training: same behavior as the original implementation.
        return self.train_x, self.train_y
    # self.train_sampler must be created in __init__ (e.g., a BatchSampler
    # over the training rows); it returns the indices of the next mini-batch.
    indices = self.train_sampler.get_next(batch_size)
    return (
        self.train_x[indices],
        self.train_y[indices],
    )
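For completeness, here is a minimal sketch of how the sampler referenced above could be wired up without patching the library, assuming deepxde.data.sampler.BatchSampler (the mini-batch sampler DeepXDE's PDE data class uses); the subclass name is just illustrative:

import deepxde as dde
from deepxde.data.sampler import BatchSampler


class BatchedDataSet(dde.data.DataSet):
    """dde.data.DataSet plus a sampler so batch_size is honored (illustrative name)."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Shuffling sampler over the training rows, reused by train_next_batch.
        self.train_sampler = BatchSampler(len(self.train_y), shuffle=True)

    def train_next_batch(self, batch_size=None):
        if batch_size is None:
            return self.train_x, self.train_y
        indices = self.train_sampler.get_next(batch_size)
        return self.train_x[indices], self.train_y[indices]

With something like this in place, passing batch_size to model.train(...) should actually subsample the training data each iteration instead of silently using the full set.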

Finally, if I may make a suggestion, it would be helpful to have a mode for this DataSet class that loads the data in batches instead of reading the whole dataset into memory, since the latter can exhaust RAM and crash the kernel (a rough sketch of the idea follows). I hope this helps, and thank you for the great work on this library; I am enjoying learning from it a lot!
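One way to approximate such a mode today, assuming the training data is stored as .npy files (the file names below are placeholders), is to memory-map the arrays so only the rows of each sampled batch are ever read into RAM:

import numpy as np

# Memory-map the arrays: the OS pages in only the rows that are indexed,
# so the full dataset never has to fit in memory at once.
X = np.load("train_x.npy", mmap_mode="r")  # placeholder file name
y = np.load("train_y.npy", mmap_mode="r")  # placeholder file name

rng = np.random.default_rng(0)


def next_batch(batch_size):
    # Draw a random mini-batch; fancy indexing copies just these rows into RAM.
    idx = rng.choice(len(y), size=batch_size, replace=False)
    return np.asarray(X[idx]), np.asarray(y[idx])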

praksharma commented 5 months ago

If you are not using equations, you should try techniques suited to purely data-driven surrogates. DeepXDE provides DeepONets for parametric learning; you could also try Fourier neural operators, which are not available in DeepXDE.
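For reference, a purely data-driven DeepONet setup in DeepXDE looks roughly like the sketch below. The shapes and sizes are illustrative placeholders: m is the number of sensor values per input function (branch input), and the trunk input holds the query coordinates.

import numpy as np
import deepxde as dde

m, n_pts = 100, 50            # illustrative: sensors per function, query points
n_train, n_test = 1000, 200   # illustrative dataset sizes

# Placeholder random data; replace with your own function samples, coordinates, and outputs.
branch_train = np.random.rand(n_train, m).astype(np.float32)
trunk = np.random.rand(n_pts, 1).astype(np.float32)
y_train = np.random.rand(n_train, n_pts).astype(np.float32)
branch_test = np.random.rand(n_test, m).astype(np.float32)
y_test = np.random.rand(n_test, n_pts).astype(np.float32)

# Triple of (branch inputs, trunk inputs) -> outputs, in Cartesian-product form.
data = dde.data.TripleCartesianProd(
    X_train=(branch_train, trunk), y_train=y_train,
    X_test=(branch_test, trunk), y_test=y_test,
)
net = dde.nn.DeepONetCartesianProd([m, 40, 40], [1, 40, 40], "relu", "Glorot normal")

model = dde.Model(data, net)
model.compile("adam", lr=1e-3, metrics=["mean l2 relative error"])
model.train(iterations=10000, batch_size=32)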