noaa-oar-arl / monetio

The Model and ObservatioN Evaluation Tool I/O package
https://monetio.readthedocs.io
MIT License
17 stars 30 forks source link

Moving from OrderedDict to dict for sat L2 readers #204

Open blychs opened 1 month ago

blychs commented 1 month ago

Since Python 3.6, dicts are order-preserving. Dicts have better performance than OrderedDicts, and therefore keeping the satellite readers as OrderedDicts seems unnecessary. Moving into dicts would also avoid having to import collections. The performance difference in our case is negligible at best, but I think that it's still worth (slowly) pursuing and, at the very least, using dicts for future readers. One concern is that this change has to be done simultaneously for MONETIO and MELODIES-MONET

zmoon commented 1 month ago

Since Python 3.6, dicts are order-preserving.

It's only official since 3.7. So maybe after #170 would be best.

One concern is that this change has to be done simultaneously for MONETIO and MELODIES-MONET

Why is that? What in MM satellite wouldn't work if the reader returned a normal dict currently?

Another option could be to just return a list of datasets instead, with the date key as a dataset attr. A dict could still be created from this on the fly in MM if necessary.

blychs commented 1 month ago

Since Python 3.6, dicts are order-preserving.

It's only official since 3.7. So maybe after #170 would be best.

Sounds good

One concern is that this change has to be done simultaneously for MONETIO and MELODIES-MONET

Why is that? What in MM satellite wouldn't work if the reader returned a normal dict currently?

The tempo tool has some checks that use isinstance(..., OrderedDict) to act differently if it's just a Dataset or a dictionary. (it only loops if it's an OrderedDict). This will have to get updated to isinstance(..., dict).

Another option could be to just return a list of datasets instead, with the date key as a dataset attr. A dict could still be created from this on the fly in MM if necessary.

I like the idea of having a dict with the keys as timestamps, it makes it hashable and easy to access when, in the future, larger time periods are used. Although the fact that a list can be sliced is also nothing to scoff at...

blychs commented 1 month ago

Feel free to assign me this, and I can slowly start moving towards it, without merging it until #170 is done.

zmoon commented 1 month ago

The tempo tool has some checks that use isinstance(..., OrderedDict) to act differently if it's just a Dataset or a dictionary. (it only loops if it's an OrderedDict). This will have to get updated to isinstance(..., dict).

Note you could update that in MM anytime, since

>>> isinstance(OrderedDict(), dict)
True
blychs commented 1 month ago

Thanks! I was not aware of that.