Closed douglasdennis closed 1 year ago
I'm a goober. Streaming data in won't help, since all of the datasets are merged into a single large one. Investigating #83 may help with OOMs (it would effectively cut the memory used by data in half). Another, less fun, option is to run through the pipeline and look for leaks.
Actually, I did find a way to stream in data. However, it requires the ensemble model to be aware of it. This is further evidence to do a custom ensembler.
This was found to no longer be an issue on HPC. Additionally, upstream dependencies appear to have addressed their own memory issues. Closing.
When there are too many coordinates for a catchment to train on, an OOM will happen. This is coming from `TrainingDataset` loading all training data in at the beginning. We will need to stream training data in, one or two coordinates at a time, instead of loading all of it at once in the `TrainingDataset` class.
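The streaming approach described above could look roughly like the sketch below. All names here (`StreamingTrainingDataset`, `load_coordinate`, `coords_per_chunk`) are assumptions for illustration; the real `TrainingDataset` API is not shown in this issue, and the loader is stubbed with toy data.

```python
from typing import Iterator, List, Tuple

def load_coordinate(coord: int) -> List[Tuple[int, int]]:
    # Hypothetical loader: in the real pipeline this would read one
    # coordinate's training samples from storage. Stubbed with toy data.
    return [(coord, i) for i in range(3)]

class StreamingTrainingDataset:
    """Sketch of a dataset that streams samples in, a few coordinates
    at a time, instead of materializing every coordinate up front."""

    def __init__(self, coordinates: List[int], coords_per_chunk: int = 2):
        self.coordinates = coordinates
        self.coords_per_chunk = coords_per_chunk

    def __iter__(self) -> Iterator[Tuple[int, int]]:
        # Only coords_per_chunk coordinates are resident at once; each
        # chunk goes out of scope before the next one is loaded, so peak
        # memory is bounded by the chunk size rather than the dataset size.
        n = self.coords_per_chunk
        for start in range(0, len(self.coordinates), n):
            chunk = [load_coordinate(c) for c in self.coordinates[start:start + n]]
            for samples in chunk:
                yield from samples

ds = StreamingTrainingDataset([10, 20, 30], coords_per_chunk=2)
samples = list(ds)
```

The catch noted in the comments applies here too: whatever consumes the dataset (e.g. the ensemble model's training loop) has to iterate rather than index, which is why a custom ensembler would make this easier.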