pytorch / xla

Enabling PyTorch on XLA Devices (e.g. Google TPU)
https://pytorch.org/xla

Model support for `maml` with Torch_XLA2 #8150

Open ManfeiBai opened 1 month ago

ManfeiBai commented 1 month ago

Fix the model test for `maml.py`:

  1. Set up the environment according to "Run a model under torch_xla2".
  2. Run the model test under `run_torchbench/` with `python models/your_target_model_name.py`.
  3. Fix the failure.

Please refer to this guide to fix:

Also refer to these PRs:

barney-s commented 3 weeks ago

Missing `aten::convolution_backward` lowering.

```
% JAX_ENABLE_X64=true JAX_PLATFORMS=cpu python models/maml.py
/usr/local/google/home/barni/miniconda3/envs/diffusion-models-2/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:337: UserWarning: Device capability of jax unspecified, assuming `cpu` and `cuda`. Please specify it via the `devices` argument of `register_backend`.
  warnings.warn(
Traceback (most recent call last):
  File "/usr/local/google/home/barni/workspace/pytorch-tpu/run_torchbench/models/maml.py", line 61, in <module>
    sys.exit(main())
  File "/usr/local/google/home/barni/workspace/pytorch-tpu/run_torchbench/models/maml.py", line 39, in main
    xla2_ans = model(*example)
  File "/usr/local/google/home/barni/miniconda3/envs/diffusion-models-2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/google/home/barni/miniconda3/envs/diffusion-models-2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/google/home/barni/workspace/pytorch-tpu/run_torchbench/benchmark/torchbenchmark/models/maml/meta.py", line 59, in forward
    return self.forward_train(x_spt, y_spt, x_qry, y_qry)
  File "/usr/local/google/home/barni/workspace/pytorch-tpu/run_torchbench/benchmark/torchbenchmark/models/maml/meta.py", line 85, in forward_train
    grad = torch.autograd.grad(loss, self.net.parameters())
  File "/usr/local/google/home/barni/miniconda3/envs/diffusion-models-2/lib/python3.10/site-packages/torch/autograd/__init__.py", line 445, in grad
    return handle_torch_function(
  File "/usr/local/google/home/barni/miniconda3/envs/diffusion-models-2/lib/python3.10/site-packages/torch/overrides.py", line 1719, in handle_torch_function
    result = mode.__torch_function__(public_api, types, args, kwargs)
  File "/usr/local/google/home/barni/workspace/pytorch/xla/experimental/torch_xla2/torch_xla2/tensor.py", line 215, in __torch_function__
    return func(*args, **(kwargs or {}))
  File "/usr/local/google/home/barni/miniconda3/envs/diffusion-models-2/lib/python3.10/site-packages/torch/autograd/__init__.py", line 496, in grad
    result = _engine_run_backward(
  File "/usr/local/google/home/barni/miniconda3/envs/diffusion-models-2/lib/python3.10/site-packages/torch/autograd/graph.py", line 825, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/usr/local/google/home/barni/workspace/pytorch/xla/experimental/torch_xla2/torch_xla2/tensor.py", line 230, in __torch_dispatch__
    return self.env.dispatch(func, types, args, kwargs)
  File "/usr/local/google/home/barni/workspace/pytorch/xla/experimental/torch_xla2/torch_xla2/tensor.py", line 413, in dispatch
    raise OperatorNotFound(
torch_xla2.tensor.OperatorNotFound: Operator with name aten::convolution_backward has no lowering
```
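Whatever registration mechanism torch_xla2 uses to wire the lowering in, the math the lowering must produce is fixed: `aten::convolution_backward` returns the gradients with respect to the input and the weight (and optionally the bias). In JAX these fall out of `jax.vjp` applied to the forward convolution. A minimal sketch, assuming stride 1, no padding, and NCHW/OIHW layouts (the shapes and the `conv` helper here are illustrative, not torch_xla2's actual code):

```python
import jax
import jax.numpy as jnp

def conv(x, w):
    # Forward convolution: NCHW input, OIHW weight, stride 1, no padding.
    # dimension_numbers=None defaults to the NCHW/OIHW/NCHW convention.
    return jax.lax.conv_general_dilated(
        x, w, window_strides=(1, 1), padding="VALID")

x = jnp.ones((2, 3, 8, 8))  # (batch, in_channels, H, W)
w = jnp.ones((4, 3, 3, 3))  # (out_channels, in_channels, kH, kW)

# jax.vjp returns the forward result plus a function mapping the upstream
# gradient (cotangent) to gradients w.r.t. x and w -- exactly the first two
# outputs of aten::convolution_backward.
out, vjp_fn = jax.vjp(conv, x, w)
grad_out = jnp.ones_like(out)      # stand-in for the upstream gradient
grad_x, grad_w = vjp_fn(grad_out)
```

The third output, grad_bias, is just `grad_out` summed over the batch and spatial dimensions; a real lowering also has to honor the op's stride/padding/dilation/groups arguments and its `output_mask`, which says which of the three gradients the caller actually needs.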