mllam / neural-lam

Neural Weather Prediction for Limited Area Modeling

Modifying `WeatherDataset.__getitem__` to return timestamps #64

Open leifdenby opened 5 days ago

leifdenby commented 5 days ago

I'm making good progress on #54 and in going through it I noticed that @sadamov you modified the return signature of `WeatherDataset.__getitem__` to also return `batch_times` (which it looks like are `np.datetime64` values converted to strings). I can see the use of this, e.g. for being able to plot the input and predictions from the model with timestamps. I think if we want to be able to make these plots with timestamps we can't avoid returning the time here too. I'm not sure about using strings though...
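
For reference, a minimal sketch of what such a return signature could look like (the field names, shapes and string conversion here are assumptions for illustration, not the actual neural-lam code):

```python
import numpy as np
import torch


class WeatherDataset(torch.utils.data.Dataset):
    """Hypothetical sketch only, not the actual neural-lam implementation."""

    def __init__(self, init_states, target_states, forcing, times):
        self.init_states = init_states      # (N, 2, num_grid_nodes, d_state)
        self.target_states = target_states  # (N, pred_steps, num_grid_nodes, d_state)
        self.forcing = forcing              # (N, pred_steps, num_grid_nodes, d_forcing)
        self.times = times                  # np.datetime64, shape (N, pred_steps)

    def __len__(self):
        return len(self.init_states)

    def __getitem__(self, idx):
        # np.datetime64 -> ISO strings, which the default collate passes through
        # as nested Python lists instead of trying to build a tensor from them
        batch_times = self.times[idx].astype(str).tolist()
        return (
            self.init_states[idx],
            self.target_states[idx],
            self.forcing[idx],
            batch_times,
        )
```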

What are your thoughts on this @sadamov and @joeloskarsson?

joeloskarsson commented 5 days ago

That is indeed the idea of including this in the batch.

I don't have any strong opinions on what format these should have, as long as they can be easily converted to `np.datetime64`. Can they just be kept as `np.datetime64`? We don't really need to turn these into PyTorch objects, and there is no need to send them to the GPU.

Optimally we would not have to write a custom collate function (https://pytorch.org/docs/stable/data.html#working-with-collate-fn) for this, but could just use the default one (https://github.com/pytorch/pytorch/blob/35c8f93fd238d42aaea8fd6e730c3da9e18257cc/torch/utils/data/dataloader.py#L196). I think it would be sufficient to batch these up in a Python list rather than something more fancy.
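
If the default collate turns out not to accept `np.datetime64` arrays directly (it tries to build a tensor from numpy inputs), a thin wrapper along these lines would keep the tensors on the default path and just gather the times into a plain Python list; a sketch assuming the `(init_states, target_states, forcing, times)` ordering from the example above:

```python
from torch.utils.data import DataLoader, default_collate


def collate_with_times(samples):
    # samples: list of (init_states, target_states, forcing, times) tuples
    *tensor_parts, times = zip(*samples)
    batch = [default_collate(list(part)) for part in tensor_parts]
    # keep the np.datetime64 values as-is in a plain list; they never go to the GPU
    batch.append(list(times))
    return batch


# loader = DataLoader(dataset, batch_size=4, collate_fn=collate_with_times)
```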

sadamov commented 4 days ago

Having them in a simple format available for plotting sounds good to me; no strong opinion about which format exactly (currently it is indeed a list of strings). @leifdenby you once mentioned that you would rather keep track of the datetime in another fashion and remove `batch_times` from `__getitem__`. Did you have another solution in mind already?
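
If the list-of-strings format stays, converting back to `np.datetime64` for plotting would be a one-liner, e.g. (hypothetical values, assuming the strings round-trip the ISO format):

```python
import numpy as np

batch_times = ["2022-04-01T00:00:00", "2022-04-01T03:00:00"]  # hypothetical values
times = np.array(batch_times, dtype="datetime64[s]")
```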