Open VatsaDev opened 1 year ago
It seems like you are trying to load a checkpoint from somewhere and it failed the check in
This is new to me as well, but does your checkpoint have anything to do with HPU?
It's a PyTorch ckpt.pt file, and the code is trying to load the checkpoint onto a TPU. The checkpoint is from Andrej Karpathy's llama2.c repo. I trained a toy model with 5 million params; it works with his code on CPU and GPU. With the sample-XLA.py file I'm using, CPU works but XLA TPU fails.
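For anyone hitting something similar, here is a minimal sketch of one common workaround. It assumes (this is not confirmed anywhere in the thread) that the failure comes from deserializing the checkpoint directly onto the XLA device. The idea is to load the checkpoint onto the CPU first and move the model to the TPU afterwards. The helper names `load_checkpoint_on_cpu` and `move_to_xla` are hypothetical, and the `"model"` key is an assumption about how the checkpoint dict is laid out.

```python
def load_checkpoint_on_cpu(path):
    """Deserialize a checkpoint onto the CPU, regardless of where it was saved."""
    # Imported lazily so the helper can be defined on machines without torch.
    import torch

    # map_location="cpu" remaps every stored tensor to CPU memory.
    return torch.load(path, map_location="cpu")


def move_to_xla(model):
    """Move an already-loaded model to the XLA device (requires torch_xla)."""
    import torch_xla.core.xla_model as xm

    device = xm.xla_device()
    return model.to(device), device


# Hypothetical usage; the file name and the "model" key are assumptions:
# checkpoint = load_checkpoint_on_cpu("ckpt.pt")
# model.load_state_dict(checkpoint["model"])
# model, device = move_to_xla(model)
```

If the CPU load succeeds and only the move to the device fails, that at least narrows the problem down to the XLA side rather than the checkpoint file.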
❓ Questions and Help
Hi, I was working on porting code to work with TPUs and the TRC, and was testing TPU VMs with Kaggle.
I'm working from https://github.com/pytorch/xla/blob/master/docs/pjrt.md. With Kaggle I consistently receive the error,
yet the xm xla device is
xla:0
and I do use `os.environ['PJRT_DEVICE'] = 'TPU'`
Kaggle NB (https://www.kaggle.com/code/vatsadev/notebook5e5db4afa5)
What's the issue?
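A small sketch related to the `PJRT_DEVICE` setting mentioned above. It assumes the variable should be in place before `torch_xla` initializes its runtime, so the safest spot is the top of the notebook, ahead of any `torch_xla` import. The helper name `configure_pjrt` is hypothetical. This only illustrates ordering and is not a confirmed fix for the error in this issue.

```python
import os


def configure_pjrt(device="TPU"):
    """Set PJRT_DEVICE if it is not already set and return the effective value."""
    # setdefault leaves an existing value (e.g. set by the environment) untouched.
    os.environ.setdefault("PJRT_DEVICE", device)
    return os.environ["PJRT_DEVICE"]


effective = configure_pjrt("TPU")
print("PJRT_DEVICE =", effective)

# Import torch_xla only after the variable is set:
# import torch_xla.core.xla_model as xm
# print(xm.xla_device())
```

Printing the effective value makes it easy to spot a case where the environment had already set `PJRT_DEVICE` to something else.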