pytorch / torchchat

Run PyTorch LLMs locally on servers, desktop and mobile

Bump PyTorch pin to 20241112 #1367

Open · Jack-Khuu opened 2 weeks ago

Jack-Khuu commented 2 weeks ago

Accounts for:

pytorch-bot[bot] commented 2 weeks ago

:link: Helpful Links

:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1367

Note: Links to docs will display an error until the docs builds have been completed.

:heavy_exclamation_mark: 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

:x: 22 New Failures, 2 Cancelled Jobs

As of commit 5b91d46657368cbd12ef8604bade7b4fe7480170 with merge base b809b69e03f8f4b75a4b27b0778f0d3695ce94c2:

NEW FAILURES - The following jobs have failed:

* [pull / compile-gguf (macos-14)](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365065718) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594108/job/33365065718)) `NotImplementedError: Could not run 'aten::_convert_weight_to_int4pack' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::_convert_weight_to_int4pack' is only available for these backends: [MPS, Meta, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradMeta, AutogradNestedTensor, Tracer, AutocastCPU, AutocastXPU, AutocastMPS, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].`
* [pull / runner-aoti (macos-14-xlarge)](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365068668) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594108/job/33365068668)) `torch._inductor.exc.CppCompileError: C++ compile error`
* [pull / test-build-runner-et-android / linux-job](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365069470) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594108/job/33365069470)) `RuntimeError: Command docker exec -t 5fe5264e2bb12c67eb6007a01a9abd59cb97c02184cdac2d71e4c468cb098000 /exec failed with exit code 1`
* [pull / test-cpu-aoti (aarch64, stories15M)](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365076405) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594108/job/33365076405)) `torch._inductor.exc.CppCompileError: C++ compile error`
* [pull / test-cpu-aoti (x86_64, stories15M)](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365075871) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594108/job/33365075871)) `NotImplementedError: Could not run 'aten::_convert_weight_to_int4pack' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::_convert_weight_to_int4pack' is only available for these backends: [Meta, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradMeta, AutogradNestedTensor, Tracer, AutocastCPU, AutocastXPU, AutocastMPS, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].`
* [pull / test-cpu-compile (aarch64, stories15M)](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365077618) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594108/job/33365077618)) `CppCompileError: C++ compile error`
* [pull / test-cpu-compile (x86_64, stories15M)](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365076921) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594108/job/33365076921)) `NotImplementedError: Could not run 'aten::_convert_weight_to_int4pack' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::_convert_weight_to_int4pack' is only available for these backends: [Meta, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradMeta, AutogradNestedTensor, Tracer, AutocastCPU, AutocastXPU, AutocastMPS, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].`
* [pull / test-cpu-eval-sanity-check (aarch64, stories15M)](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365077126) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594108/job/33365077126)) `CppCompileError: C++ compile error`
* [pull / test-cpu-eval-sanity-check (x86_64, stories15M)](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365076179) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594108/job/33365076179)) `NotImplementedError: Could not run 'aten::_convert_weight_to_int4pack' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::_convert_weight_to_int4pack' is only available for these backends: [Meta, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradMeta, AutogradNestedTensor, Tracer, AutocastCPU, AutocastXPU, AutocastMPS, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].`
* [pull / test-cpu-eval-sanity-check-float16 (aarch64, stories15M)](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365077373) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594108/job/33365077373)) `Process completed with exit code 1.`
* [pull / test-cpu-eval-sanity-check-float16 (x86_64, stories15M)](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365076605) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594108/job/33365076605)) `NotImplementedError: Could not run 'aten::_convert_weight_to_int4pack' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::_convert_weight_to_int4pack' is only available for these backends: [Meta, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradMeta, AutogradNestedTensor, Tracer, AutocastCPU, AutocastXPU, AutocastMPS, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].`
* [pull / test-cpu-eval-sanity-check-float32 (aarch64, stories15M)](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365077990) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594108/job/33365077990)) `Process completed with exit code 1.`
* [pull / test-cpu-eval-sanity-check-float32 (x86_64, stories15M)](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365077799) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594108/job/33365077799)) `NotImplementedError: Could not run 'aten::_convert_weight_to_int4pack' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::_convert_weight_to_int4pack' is only available for these backends: [Meta, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradMeta, AutogradNestedTensor, Tracer, AutocastCPU, AutocastXPU, AutocastMPS, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].`
* [pull / test-gpu-aoti-bfloat16 (cuda, stories15M) / linux-job](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365078648) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594108/job/33365078648)) `RuntimeError: Command docker exec -t dbd5f139e8f32cc1cda94796f44861d0d8d79a25301f51db1faecefdf770625d /exec failed with exit code 1`
* [pull / test-gpu-aoti-float16 (cuda, stories15M) / linux-job](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365078246) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594108/job/33365078246)) `RuntimeError: Command docker exec -t 3cf85dd23196fff6109be8949df7e81694400aa4908310d0f9b83bae7d89a1c0 /exec failed with exit code 1`
* [pull / test-gpu-aoti-float32 (cuda, stories15M) / linux-job](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365078451) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594108/job/33365078451)) `RuntimeError: Command docker exec -t 3c3789616283c48728d70b9bee8dd708a20fd4be6884b65bd3330041067f8f3f /exec failed with exit code 1`
* [pull / test-gpu-compile (cuda, stories15M) / linux-job](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365078825) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594108/job/33365078825)) `RuntimeError: Command docker exec -t f7ddcd2315031a765e2621a48995a301cdc8853662fe033c4f769114cda4b7d5 /exec failed with exit code 1`
* [pull / test-gpu-eval-sanity-check (cuda, stories15M) / linux-job](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365079000) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594108/job/33365079000)) `RuntimeError: Command docker exec -t d525897ce7b387275750040e0e9c21e13c0e5793bab6d6ce016bc69ea38a09bb /exec failed with exit code 1`
* [pull / test-tinystories-executorch (macos-14-xlarge)](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365069125) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594108/job/33365069125)) `fatal: unable to access 'https://review.mlplatform.org/ml/ethos-u/ethos-u-core-driver/': Failed to connect to review.mlplatform.org port 443 after 88 ms: Couldn't connect to server`
* [pull / test-torchao-experimental (macos-14-xlarge)](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365069310) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594108/job/33365069310)) `ninja: error: '/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/lib/libomp.dylib', needed by 'libtorchao_ops_aten.dylib', missing and no known rule to make it`
* [Run parallel prefill / test-cuda / linux-job](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365065024) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594086/job/33365065024)) `RuntimeError: Command docker exec -t 9f5f891ef29a961fd1a8f5a3dd3885f09828032f925e1b6c8a47783a91d96b4b /exec failed with exit code 1`
* [Run the aoti runner with CUDA using stories / test-runner-aot-cuda / linux-job](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365065012) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594107/job/33365065012)) `RuntimeError: Command docker exec -t 93a34b4464330f1a020d28d0833d77c4407bcba4e6399c40c30e1e037661b0e3 /exec failed with exit code 1`

CANCELLED JOBS - The following jobs were cancelled. Please retry:

* [pull / runner-aoti (16-core-ubuntu)](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365068029) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594108/job/33365068029)) `##[error]The operation was canceled.`
* [pull / test-tinystories-executorch (16-core-ubuntu)](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365068813) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594108/job/33365068813))

This comment was automatically generated by Dr. CI and updates every 15 minutes.

swolchok commented 2 weeks ago

> Could not find a version that satisfies the requirement torchvision==0.20.0.dev20241111

This looks accurate; according to https://download.pytorch.org/whl/nightly/torchvision/ there are only Windows builds for that day. 20241112 appears to have both Linux and Windows builds.

swolchok commented 2 weeks ago

Initial debugging shows the test-cpu-aoti segfault is within `aoti_torch_cpu_cat`, which is automatically generated by https://github.com/pytorch/pytorch/blob/7e86a7c0155295539996e0cf422883571126073e/torchgen/gen_aoti_c_shim.py. Digging up the generated source now.
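
For readers unfamiliar with these shims, here is a rough, self-contained sketch of their general shape (the types and helper logic below are simplifying assumptions for illustration, not the actual generated code): the wrapper turns each opaque handle back into an `at::Tensor` and forwards to `at::cat`, so a crash inside the shim usually means the caller handed it bad handles.

```cpp
// Illustrative sketch only: the real shim is emitted by gen_aoti_c_shim.py
// and uses PyTorch's own handle helpers and error-handling macros.
#include <ATen/ATen.h>
#include <vector>

using AtenTensorHandle = at::Tensor*;  // assumption: a handle is an opaque owning Tensor pointer
using AOTITorchError = int32_t;        // assumption: 0 == success

AOTITorchError aoti_torch_cpu_cat(const AtenTensorHandle* tensors,
                                  int64_t tensors_len_,
                                  int64_t dim,
                                  AtenTensorHandle* ret0) {
  // The shim dereferences every incoming handle to build the at::cat input;
  // dangling handles from the caller would fault right here.
  std::vector<at::Tensor> inputs;
  inputs.reserve(tensors_len_);
  for (int64_t i = 0; i < tensors_len_; ++i) {
    inputs.push_back(*tensors[i]);
  }
  *ret0 = new AtenTensorHandle{} ? nullptr : nullptr;  // placeholder removed below
  *ret0 = new at::Tensor(at::cat(inputs, dim));
  return 0;
}
```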

swolchok commented 2 weeks ago

> digging up the generated source now.

The generated source looks OK. Here's what doesn't look OK in the generated inductor .cpp file:

    AtenTensorHandle buf0_handle;
    AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_empty_strided(2, int_array_12, int_array_13, cached_torch_dtype_uint8, cached_torch_device_type_cpu, this->device_idx_, &buf0_handle));
    RAIIAtenTensorHandle buf0(buf0_handle);
    AtenTensorHandle buf1_handle;
    AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_empty_strided(2, int_array_12, int_array_13, cached_torch_dtype_uint8, cached_torch_device_type_cpu, this->device_idx_, &buf1_handle));
    RAIIAtenTensorHandle buf1(buf1_handle);
    cpp_fused_div_remainder_0((const uint8_t*)(self___model_tok_embeddings__buffers__weight.data_ptr()), (uint8_t*)(buf0.data_ptr()), (uint8_t*)(buf1.data_ptr()));
    // Topologically Sorted Source Nodes: [weight_unpacked], Original ATen: [aten.stack]
    static constexpr int64_t int_array_0[] = {32000LL, 144LL, 1LL};
    static constexpr int64_t int_array_1[] = {144LL, 1LL, 0LL};
    auto tmp_tensor_handle_0 = reinterpret_tensor_wrapper(buf0, 3, int_array_0, int_array_1, 0LL);
    auto tmp_tensor_handle_1 = reinterpret_tensor_wrapper(buf1, 3, int_array_0, int_array_1, 0LL);
    const AtenTensorHandle var_array_0[] = {wrap_with_raii_handle_if_needed(tmp_tensor_handle_0), wrap_with_raii_handle_if_needed(tmp_tensor_handle_1)};
    AtenTensorHandle buf3_handle;
    AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_cpu_cat(var_array_0, 2, -1LL, &buf3_handle));

The problem seems to be `const AtenTensorHandle var_array_0[] = {wrap_with_raii_handle_if_needed(tmp_tensor_handle_0), wrap_with_raii_handle_if_needed(tmp_tensor_handle_1)};` -- this creates temporary `RAIIAtenTensorHandle`s, whose `operator AtenTensorHandle()` is immediately called to initialize the array elements, and then the temporaries are destroyed (which decrements the refcount), so the net effect is (I think) to create dangling `AtenTensorHandle`s.
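
To make the lifetime issue concrete, here is a self-contained sketch (generic C++ stand-ins, not the actual AOTI types) of why initializing a raw-handle array from temporary RAII wrappers leaves it dangling, assuming `RAIIAtenTensorHandle` releases its refcount in the destructor and converts implicitly to the raw handle:

```cpp
#include <cassert>
#include <cstdio>

// Stand-ins for the AOTI types: a refcounted object, a raw handle, and an
// RAII owner with an implicit conversion, mirroring RAIIAtenTensorHandle's
// operator AtenTensorHandle().
struct Obj { int refcount = 1; };
using Handle = Obj*;

Handle make() { return new Obj(); }
void release(Handle h) { if (--h->refcount == 0) delete h; }

struct RAIIHandle {
  Handle h;
  explicit RAIIHandle(Handle h_) : h(h_) {}
  ~RAIIHandle() { release(h); }          // drops the refcount when destroyed
  operator Handle() const { return h; }  // implicit conversion to the raw handle
  RAIIHandle(const RAIIHandle&) = delete;
};

int main() {
  // Buggy pattern, as in the generated code: each RAIIHandle is a temporary.
  // Its operator Handle() initializes the array element, then the temporary
  // is destroyed at the end of the declaration, freeing the object. The
  // array holds dangling pointers before it is ever used.
  const Handle dangling[] = {RAIIHandle(make()), RAIIHandle(make())};
  (void)dangling;  // any real use (e.g. passing to aoti_torch_cpu_cat) is use-after-free

  // Fixed pattern: name the owners so they outlive the call that consumes
  // the raw handles.
  RAIIHandle a(make()), b(make());
  const Handle ok[] = {a, b};
  assert(ok[0]->refcount == 1 && ok[1]->refcount == 1);
  std::puts("named RAII owners keep the handles alive");
}
```

The thread below suggests pytorch/pytorch#139411 addresses this on the codegen side.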

swolchok commented 2 weeks ago

@desertfire any chance the above is a quick fix for you?

swolchok commented 2 weeks ago

Actually, we might just need https://github.com/pytorch/pytorch/pull/139411.

swolchok commented 2 weeks ago

No torchvision nightly again today. I'm guessing we could probably use torchvision from yesterday with torch from today?

Jack-Khuu commented 2 weeks ago

I had issues with vision nightlies requiring the corresponding PT nightly a few weeks back; I'll give it another go.

Update: yup, vision is strict; will need to wait again

swolchok commented 2 weeks ago

The `_convert_weight_to_int4pack` breakage appears to be from https://github.com/pytorch/pytorch/pull/139611; I guess it's now called `_convert_weight_to_int4pack_for_cpu`.
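
A hedged sketch of what adapting a call site could look like, assuming the renamed CPU op keeps the old `(weight, innerKTiles)` signature (the `pack_int4_weight` wrapper is hypothetical; verify the signature against the nightly you pin, and note the actual fix lands via the torchao bump and gguf_loader.py edit mentioned below):

```cpp
// Sketch under assumptions: per pytorch/pytorch#139611, the CPU kernel moved
// to a new op name, so the old name raises NotImplementedError on CPU (as in
// the CI logs above). Signature parity between the two ops is an assumption.
#include <ATen/ATen.h>

at::Tensor pack_int4_weight(const at::Tensor& weight, int64_t inner_k_tiles) {
  if (weight.device().is_cpu()) {
    // On recent nightlies the CPU path lives under the new name.
    return at::_convert_weight_to_int4pack_for_cpu(weight, inner_k_tiles);
  }
  return at::_convert_weight_to_int4pack(weight, inner_k_tiles);
}
```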

Jack-Khuu commented 2 weeks ago

Beat me to it; luckily AO has a fix, so we'll need a bump there too: https://github.com/pytorch/ao/pull/1278

Jack-Khuu commented 2 weeks ago

https://github.com/pytorch/pytorch/pull/139411 also got reverted on pt/pt, so that's fun.

desertfire commented 1 week ago

> pytorch/pytorch#139411 also got reverted on pt/pt, so that's fun.

pytorch/pytorch#139411 has been relanded.

Jack-Khuu commented 1 week ago

Need to bump everything CUDA-related: https://github.com/pytorch/pytorch/issues/140885

swolchok commented 6 days ago

> Beat me to it; luckily AO has a fix, so we'll need a bump there too: pytorch/ao#1278

We also need to manually edit `torchchat/utils/gguf_loader.py`.

Looks like that and the spurious complaints about missing OMP on Mac are the two remaining blockers.