Open Patataman opened 1 month ago
Try one of the examples we have in https://github.com/pytorch/xla/tree/master/examples, maybe. MNIST is kind of outdated. I would expect you to see a difference with a decoder-only model and ResNet.
Hello, I tried MNIST because it was one of the examples in https://github.com/pytorch/xla/blob/master/docs/amp.md. I'll take a look at them ASAP.
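For reference, the AMP recipe I was following from that doc looks roughly like this. It is only a minimal sketch of the `torch_xla.amp` autocast + GradScaler pattern for an XLA:GPU device; the toy model and random tensors are placeholders I added so it runs as-is, not my real code:

```python
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm
from torch_xla.amp import autocast, GradScaler

device = xm.xla_device()                      # XLA device backed by the GPU

# Toy model/data standing in for the real ones, just so the sketch is runnable.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = GradScaler()                         # loss scaling, as in CUDA AMP

for _ in range(10):
    x = torch.randn(64, 1, 28, 28, device=device)
    y = torch.randint(0, 10, (64,), device=device)

    optimizer.zero_grad()
    with autocast(device):                    # mixed-precision forward pass
        loss = nn.functional.cross_entropy(model(x), y)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    xm.mark_step()                            # flush and execute the traced graph
```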
I was finally able to try it with other models. Using `torch.cuda.reset_peak_memory_stats()` and `torch.cuda.max_memory_allocated()` I can see a reduction in memory usage, but I am not completely sure whether those counters are a valid way to track the memory usage of an XLA device backed by a GPU. If they are, then I think everything is fine and the problem was simply using MNIST.
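In case it helps, this is roughly how I am measuring it. It is a minimal sketch: I am not sure the `torch.cuda` counters see the memory the XLA/PJRT runtime allocates on the GPU (which is exactly my doubt), so I also print `xm.get_memory_info` as a cross-check; the keys of the dict it returns vary between torch_xla releases, so the sketch just prints whatever comes back:

```python
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()

torch.cuda.reset_peak_memory_stats()

# Stand-in workload; in practice this is a few training steps on `device`.
x = torch.randn(4096, 4096, device=device)
y = x @ x
xm.mark_step()                       # force the lazily-traced graph to execute

# View from PyTorch's CUDA caching allocator (only allocations PyTorch itself made).
print("torch.cuda peak:", torch.cuda.max_memory_allocated() / 2**20, "MiB")

# View from the XLA runtime itself.
print("xm.get_memory_info:", xm.get_memory_info(device))
```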
Hello, I am trying to evaluate the impact of XLA on our models, but before that I want to be sure I know how to adapt our code and run XLA models without problems.
GPU: NVIDIA RTX 4090 24 GB, CUDA 12.2
I have been trying a simple model with MNIST
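The training loop looks roughly like this. It is a minimal sketch: a small fully connected model and fake MNIST-shaped tensors stand in for my real model and dataloader so it runs as-is:

```python
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm
from torch.utils.data import DataLoader, TensorDataset

device = xm.xla_device()

# Fake MNIST-shaped data so the sketch is self-contained.
data = TensorDataset(torch.randn(2048, 1, 28, 28), torch.randint(0, 10, (2048,)))
# drop_last keeps the batch shape constant, which avoids extra recompilations.
loader = DataLoader(data, batch_size=128, drop_last=True)

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(),
                      nn.Linear(128, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(5):
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = nn.functional.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()
        xm.mark_step()               # cut the lazy graph and run it on the GPU
```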
And I haven't seen any performance improvement; at best the execution time is the same. I thought that maybe the model was being recompiled too many times or something similar, so I followed https://github.com/pytorch/xla/blob/master/TROUBLESHOOTING.md
The metrics are:
But as you can see, the model was compiled very few times and there are no context switches.
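For completeness, I generated that report with torch_xla's debug metrics module, roughly like this (the exact counters and metrics printed depend on the torch_xla version):

```python
import torch_xla.debug.metrics as met

met.clear_all()                 # reset counters/metrics before the run being measured

# ... run the training loop here ...

print(met.metrics_report())     # dumps metrics such as CompileTime and ExecuteTime,
                                # plus the per-op counters
```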
Is this behavior expected when working with a single GPU? Or should XLA give some improvement in this case too, and not only with multiple devices? Maybe the model is too simple? I couldn't find any information about performance with a single GPU.
Thanks