nod-ai / shark-ai

SHARK Inference Modeling and Serving
Apache License 2.0

(sharktank) Python3.12+ not supported for torch.compile #349

Open IanNod opened 2 weeks ago

IanNod commented 2 weeks ago

When getting set up with shortfin, I see several sharktank test failures caused by the Python 3.12 version that shortfin requires.

To reproduce, follow the setup instructions at https://github.com/nod-ai/SHARK-Platform/tree/main?tab=readme-ov-file#development-getting-started with Python 3.12, then run:

pytest sharktank/tests/ops/ops_test.py

This fails with:

>       ep = torch.export.export(my_module, (a, d, qs, m))

sharktank/tests/ops/ops_test.py:289:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
venv/lib/python3.12/site-packages/torch/export/__init__.py:174: in export
    return _export(
venv/lib/python3.12/site-packages/torch/export/_trace.py:635: in wrapper
    raise e
venv/lib/python3.12/site-packages/torch/export/_trace.py:618: in wrapper
    ep = fn(*args, **kwargs)
venv/lib/python3.12/site-packages/torch/export/exported_program.py:83: in wrapper
    return fn(*args, **kwargs)
venv/lib/python3.12/site-packages/torch/export/_trace.py:860: in _export
    gm_torch_level = _export_to_torch_ir(
venv/lib/python3.12/site-packages/torch/export/_trace.py:347: in _export_to_torch_ir
    gm_torch_level, _ = torch._dynamo.export(
venv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py:1202: in inner
    check_if_dynamo_supported()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

    def check_if_dynamo_supported():
        if sys.version_info >= (3, 12):
>           raise RuntimeError("Python 3.12+ not yet supported for torch.compile")
E           RuntimeError: Python 3.12+ not yet supported for torch.compile

venv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py:593: RuntimeError

stellaraccident commented 2 weeks ago

We should relax that requirement to Python 3.10 through 3.13 and at least add unit test coverage for the older versions.

We'll want to use the newer versions for real runs because they are faster.

marbre commented 2 weeks ago

torch.compile has been supported on Python 3.12 since torch 2.4.0, see https://dev-discuss.pytorch.org/t/torch-compile-support-for-python-3-12-completed/2054. We explicitly pin torch for the CPU (https://github.com/nod-ai/SHARK-Platform/blob/c6c7321e35aae8e3385d8232b51074e58becd053/pytorch-cpu-requirements.txt#L3), whereas we do not pin it for the GPU (https://github.com/nod-ai/SHARK-Platform/blob/c6c7321e35aae8e3385d8232b51074e58becd053/pytorch-rocm-requirements.txt#L3).
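To make the version interaction concrete, here is a minimal sketch of the check, not code from sharktank or torch. The helper names `parse_release` and `torch_compile_supported` are hypothetical. The 2.4.0 threshold comes from the announcement linked above. Treating Python older than 3.12 as always supported is a simplifying assumption, and Python newer than 3.12 is out of scope here.

```python
import re
import sys


def parse_release(version: str) -> tuple[int, ...]:
    """Return the leading numeric release of a version string.

    Local or pre-release suffixes such as "+cpu" are ignored,
    so "2.3.0+cpu" becomes (2, 3, 0).
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def torch_compile_supported(
    torch_version: str,
    python_version: tuple[int, int] = sys.version_info[:2],
) -> bool:
    """Hypothetical helper: can torch.compile be expected to work?

    Assumptions (labeled, not taken from the torch sources):
    - Python < 3.12 is treated as supported by any torch version.
    - Python 3.12 needs torch >= 2.4.0, per the linked announcement.
    - Python > 3.12 is not covered by this sketch and is treated
      conservatively as unsupported.
    """
    if python_version < (3, 12):
        return True
    if python_version > (3, 12):
        return False
    return parse_release(torch_version) >= (2, 4)
```

With the torch version currently pinned for CPU, a guard like `pytest.mark.skipif(not torch_compile_supported(torch.__version__), reason="torch.compile needs torch >= 2.4 on Python 3.12")` could skip the export tests instead of failing them, though bumping the pin is the actual fix.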

marbre commented 2 weeks ago

shortfin is now also usable and tested with Python 3.11, but the tests hang on Python 3.10 (so it is not enabled in the CI).

IanNod commented 2 weeks ago

I'm assuming we are pinning torch to 2.3 due to previous issues with iree-turbine. That should be resolved now, so I think we can bump torch, hopefully without too much trouble.

marbre commented 2 weeks ago

> I'm assuming we are pinning torch to 2.3 due to previous issues with iree-turbine. That should be resolved now, so I think we can bump torch, hopefully without too much trouble.

In the CI workflow, iree-turbine's unit tests recently passed with torch==2.5.1 (see the logs here). As a quick test, I tried bumping torch to 2.4.1 and 2.5.1 as part of https://github.com/nod-ai/SHARK-Platform/pull/372, but this failed (see the logs here).

marbre commented 2 days ago

I've filed #495 regarding the test failures that occur with newer torch versions.