PyTorch v1.0 EagerMode

A bit of background from PyTorch's site on eager mode vs. graph/JIT/FX mode:
"PyTorch supports two execution modes [1]: eager mode and graph mode. In eager mode, operators in a model are immediately >executed as they are encountered. In contrast, in graph mode, operators are first synthesized into a graph, which will then be >compiled and executed as a whole. Eager mode is easier to use, more suitable for ML researchers, and hence is the default mode >of execution. On the other hand, graph mode typically delivers higher performance and hence is heavily used in production."
We'd like to introduce PyTorch v1.0's eager mode on Turbine. To do so, we need the features/tasks below:
[x] Plumb through the e2e compiler pipeline for computation via `__torch_dispatch__`/`__torch_function__` (torch -> torch.fx -> MLIR), with per-session kernel caching (see the `__torch_dispatch__` sketch after this list). (validating here) - merged
[x] Set up an eager-specific executable. (validating here) - merged
[x] Refactor to construct a new DeviceTensor from an existing device buffer, avoiding moving the buffer back to the host to create the new DeviceTensor. (validating here) - depends on merged.
[x] Refactor EagerExecutable and the compute_method pipeline (compiler pipeline + execution) to run with the async-exec execution model. (validating here) - merged.
[ ] Replace the current "create device based on flags" API with a "create device with these kwargs" API (e.g., for specifying task_topology_max_group_count).
[ ] (:muscle:) Plumb the cuda2 backend through Turbine-Dynamo.
[ ] (:muscle:) Develop a HIP backend in IREE (based on cuda2 but refactored for ROCm).
[ ] (:muscle:) Plumb the HIP backend through Turbine-Dynamo.
[ ] (:muscle:) Local kernel cache (see the on-disk cache sketch after this list).
[ ] (:muscle:) Plumb autograd through / get autograd working with EagerMode.
[ ] (:muscle:) Support torch.jit.script, or something jit.script-like, to reduce the number of dispatches. (jit script source code)
[ ] (:muscle:) Refactor the e2e compiler pipeline to use torch.compile() (see the backend sketch after this list).
[ ] (:muscle:) Add more substantial model examples and operator support.
[ ] (:muscle:) Add support for ops with multiple outputs when dims are specified (e.g., torch.max(t1, dim=1) or torch.topk(t1, k, dim=1)); regular torch.max and torch.topk should work out of the box. With dims we currently see this (error message). Illustrated after this list.
:muscle: = help wanted
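For the first checked item, the interception point is PyTorch's `__torch_dispatch__` protocol. The sketch below is a minimal, hypothetical wrapper subclass (built with PyTorch's standard-but-private `_make_wrapper_subclass` helper) showing the mechanics only, not Turbine's actual implementation: where it unwraps and falls back to eager CPU execution, Turbine's eager mode would instead hash the op and operand metadata, consult the per-session kernel cache, and compile torch -> torch.fx -> MLIR on a miss.

```python
import torch
from torch.utils._pytree import tree_map

class InterceptingTensor(torch.Tensor):
    """Wrapper tensor whose ATen ops route through __torch_dispatch__.

    A real DeviceTensor would hold an IREE device buffer; here we wrap a
    plain CPU tensor purely to demonstrate the dispatch mechanics.
    """

    @staticmethod
    def __new__(cls, elem):
        # Build a wrapper subclass carrying the wrapped tensor's metadata.
        return torch.Tensor._make_wrapper_subclass(
            cls, elem.shape, dtype=elem.dtype, device=elem.device
        )

    def __init__(self, elem):
        self.elem = elem

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        # Turbine would key a per-session kernel cache on `func` plus the
        # operand shapes/dtypes here, compiling to MLIR on a cache miss.
        # This sketch just unwraps, runs eagerly, and re-wraps results.
        unwrap = lambda t: t.elem if isinstance(t, InterceptingTensor) else t
        out = func(*tree_map(unwrap, args), **tree_map(unwrap, kwargs))
        wrap = lambda t: InterceptingTensor(t) if isinstance(t, torch.Tensor) else t
        return tree_map(wrap, out)

t = InterceptingTensor(torch.randn(2, 2))
print((t + t).elem)  # the add was intercepted by __torch_dispatch__
```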
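For the local kernel cache, one plausible shape (an assumption on my part; all names here are hypothetical, not existing Turbine APIs) is a content-addressed on-disk cache keyed by the op, operand metadata, and target flags, so compiled artifacts outlive a session:

```python
import hashlib
import os
from typing import Optional

class LocalKernelCache:
    """Hypothetical content-addressed on-disk cache for compiled kernels.

    Keys encode the op, operand shapes/dtypes, and compile flags; values
    are compiled artifacts (e.g., IREE .vmfb binaries) stored as files.
    """

    def __init__(self, root: str = os.path.expanduser("~/.cache/turbine-kernels")):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key: str) -> str:
        digest = hashlib.sha256(key.encode()).hexdigest()
        return os.path.join(self.root, digest + ".vmfb")

    def get(self, key: str) -> Optional[bytes]:
        path = self._path(key)
        if os.path.exists(path):
            with open(path, "rb") as f:
                return f.read()
        return None

    def put(self, key: str, artifact: bytes) -> None:
        with open(self._path(key), "wb") as f:
            f.write(artifact)

# A key might encode op + shapes/dtypes + target, e.g.:
cache = LocalKernelCache()
key = "aten.add.Tensor|f32[2,2],f32[2,2]|llvm-cpu"
if cache.get(key) is None:
    cache.put(key, b"<compiled vmfb bytes>")
```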
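For the torch.compile() refactor: TorchDynamo hands a custom backend the captured torch.fx.GraphModule plus example inputs and expects a callable back, which is where Turbine's fx -> MLIR -> IREE path would slot in. A minimal sketch, with `turbine_backend` as a hypothetical name (the backend below just inspects the graph and falls back to eager execution):

```python
import torch

def turbine_backend(gm: torch.fx.GraphModule, example_inputs):
    # Dynamo delivers the captured graph here. A real Turbine backend
    # would import `gm` into the torch MLIR dialect, compile with IREE,
    # and return a callable that invokes the resulting VM module.
    gm.graph.print_tabular()  # inspect the captured ops
    return gm.forward  # sketch only: run the graph eagerly

@torch.compile(backend=turbine_backend)
def f(x):
    return torch.sin(x) * 2.0

print(f(torch.randn(3)))
```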
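Finally, on the multi-output item: once dim is specified, torch.max and torch.topk return a named tuple of tensors instead of a single tensor, so the eager lowering has to materialize and wrap several results per dispatch. In plain PyTorch:

```python
import torch

t1 = torch.randn(2, 4)

# No dim: a single tensor (the global maximum) -- works out of the box.
print(torch.max(t1))

# With dim: a (values, indices) named tuple -- two result tensors that
# the dispatch/lowering path must handle per call.
values, indices = torch.max(t1, dim=1)
print(values.shape, indices.shape)  # torch.Size([2]) torch.Size([2])

topk = torch.topk(t1, k=2, dim=1)
print(topk.values.shape, topk.indices.shape)  # both torch.Size([2, 2])
```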