openclimatefix / PVNet

PVnet main repo
MIT License
21 stars 5 forks source link

Torch 2.5.0 crashes ocf-datapipes and fails testing #263

Closed AUdaltsova closed 1 month ago

AUdaltsova commented 1 month ago

Describe the bug

Getting this error from CI tests on a pull request:

    from torch.utils.data.datapipes.iter.combining import T_co, _ChildDataPipe, _ForkerIterDataPipe
E   ImportError: cannot import name 'T_co' from 'torch.utils.data.datapipes.iter.combining' (/opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/site-packages/torch/utils/data/datapipes/iter/combining.py)

CI is running with torch 2.5.0:

Requirement already satisfied: torch>=2.0.0 in /opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/site-packages (from PVNet==3.0.59) (2.5.0)

As far as I can tell due to this commint in torch 2.5.0 that refactored const T_co to _T_co, causing import to fail. We can refactor ocf-datapipes in the future or just put the requirement torch < 2.5.0 in PVNet for now

The last CI checks that ran before mine ran into this were here and this run loaded torch 2.4.1:

Requirement already satisfied: torch>=2.0.0 in /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages (from PVNet==3.0.58) (2.4.1)
AUdaltsova commented 1 month ago

I've added the requirement to this PR and it started passing the CI tests (uses torch 2.4.1 now). I'm hesitant to refactor ocf-datapipes as I'm not sure what function these parts serve and the refactor should probably involve a bit more looking at the code, so what do we think about the requirement change for now? @peterdudfield @Sukh-P

Has occurred to me that even after refactor we'll need to change dependencies as depending on const names either 2.4.1- or 2.5.0+ are supported, they seem to be mutually exclusive on this

peterdudfield commented 1 month ago

Yea I think its ok to pin for the moment. You can put an future issue in to upgrade pvnet to 2.5.0, and see if someone else wants to do it

Sukh-P commented 1 month ago

Thanks for investigating this!

So my two cents is that this seems very much linked to the torch datapipes/ocf_datapipes and is only causing an issue in PVNet because of the current ocf_datapipes dependency which we plan to remove when ocf-data-sampler has feature parity with ocf_datapipes. Given this, would it be cleaner to pin torch in ocf_datapipes instead? In the short term it makes no difference as torch 2.4.1 would be chosen by the dependency resolver in PVNet but when ocf_datapipes is removed it means we don't have to change the torch pinning in PVNet again, happy with either way though

AUdaltsova commented 1 month ago

Thanks for investigating this!

So my two cents is that these seems very much linked to the torch datapipes/ocf_datapipes and is only causing an issue in PVNet because of the current ocf_datapipes dependency which we plan to remove when ocf-data-sampler has feature parity with ocf_datapipes. Given this, would it be cleaner to pin torch in ocf_datapipes instead? In the short term it makes no difference as torch 2.4.1 would be chosen by the dependency resolver in PVNet but when ocf_datapipes is removed it means we don't have to change the torch pinning in PVNet again, happy with either way though

Yes very good point, just saw that this was a copy-over to support datapipes tooling specifically, so yes, probably best to pinon ocf_datapipes as shoud not affect dsampler