Closed by ebsmothers 1 week ago
AFK today, but the most likely culprit is a problem in core. What I chose to do in ao for now is pin to a specific PyTorch version until we figure this out. The ao nightlies are working with a pinned version of torch. The main fishy error we saw in our CI had to do with fpx, so @jerryzh168 can confirm when he comes into work: https://github.com/pytorch/ao/issues/792
Ideally we should fix this before making a release. cc @andrewor14
https://github.com/pytorch/pytorch/issues/135126 The offending PR has been reverted on main.
Just coming back to this now. After the revert, I think this should be good to close.
Installing recent nightlies of PyTorch and ao is resulting in some CUDA device errors.
With the nightlies from 8/30 there are no problems:
But with 8/31 nightlies, I see the following:
Note that if I remove the NF4Tensor import from the 8/31 case, everything still works. Is this related to #790? If so, what's the recommendation? Should we just force installation of the 8/29 PyTorch nightly? (This is relevant for our nightly builds as well.)
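For anyone who wants a guard in CI while the pin is in place, here is a minimal sketch of checking whether an installed nightly is at or before a last-known-good build date. It assumes nightly version strings embed the build date as `.devYYYYMMDD` (for example `2.5.0.dev20240830+cu121`); the helper names `nightly_date` and `is_at_or_before`, the example version strings, and the year used in the usage comment are all hypothetical and not taken from this thread.

```python
import re
from datetime import date
from typing import Optional

# Assumption: nightly builds carry a ".devYYYYMMDD" segment in the version string.
_NIGHTLY_RE = re.compile(r"\.dev(\d{4})(\d{2})(\d{2})")


def nightly_date(version: str) -> Optional[date]:
    """Return the build date embedded in a nightly version string, or None."""
    match = _NIGHTLY_RE.search(version)
    if match is None:
        # Stable releases (e.g. "2.4.0") have no dev-date segment.
        return None
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


def is_at_or_before(version: str, last_good: date) -> bool:
    """True only if the version is a nightly built on or before last_good."""
    built = nightly_date(version)
    return built is not None and built <= last_good


# Hypothetical usage in a CI check (the year is an assumption):
#   import torch
#   if not is_at_or_before(torch.__version__, date(2024, 8, 30)):
#       raise RuntimeError(f"Unpinned torch nightly: {torch.__version__}")
```

This only reports which build is installed; the pin itself would still be done at install time with an exact `==` version specifier.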