Jack-Khuu opened 2 weeks ago
Note: Links to docs will display an error until the docs builds have been completed.
There is 1 currently active SEV. If your PR is affected, please view it below:
As of commit 5b91d46657368cbd12ef8604bade7b4fe7480170 with merge base b809b69e03f8f4b75a4b27b0778f0d3695ce94c2:

* [pull / compile-gguf (macos-14)](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365065718) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594108/job/33365065718)) `NotImplementedError: Could not run 'aten::_convert_weight_to_int4pack' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::_convert_weight_to_int4pack' is only available for these backends: [MPS, Meta, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradMeta, AutogradNestedTensor, Tracer, AutocastCPU, AutocastXPU, AutocastMPS, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].`
* [pull / runner-aoti (macos-14-xlarge)](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365068668) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594108/job/33365068668)) `torch._inductor.exc.CppCompileError: C++ compile error`
* [pull / test-build-runner-et-android / linux-job](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365069470) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594108/job/33365069470)) `RuntimeError: Command docker exec -t 5fe5264e2bb12c67eb6007a01a9abd59cb97c02184cdac2d71e4c468cb098000 /exec failed with exit code 1`
* [pull / test-cpu-aoti (aarch64, stories15M)](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365076405) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594108/job/33365076405)) `torch._inductor.exc.CppCompileError: C++ compile error`
* [pull / test-cpu-aoti (x86_64, stories15M)](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365075871) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594108/job/33365075871)) `NotImplementedError: Could not run 'aten::_convert_weight_to_int4pack' with arguments from the 'CPU' backend.`
* [pull / test-cpu-compile (aarch64, stories15M)](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365077618) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594108/job/33365077618)) `CppCompileError: C++ compile error`
* [pull / test-cpu-compile (x86_64, stories15M)](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365076921) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594108/job/33365076921)) `NotImplementedError: Could not run 'aten::_convert_weight_to_int4pack' with arguments from the 'CPU' backend.`
* [pull / test-cpu-eval-sanity-check (aarch64, stories15M)](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365077126) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594108/job/33365077126)) `CppCompileError: C++ compile error`
* [pull / test-cpu-eval-sanity-check (x86_64, stories15M)](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365076179) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594108/job/33365076179)) `NotImplementedError: Could not run 'aten::_convert_weight_to_int4pack' with arguments from the 'CPU' backend.`
* [pull / test-cpu-eval-sanity-check-float16 (aarch64, stories15M)](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365077373) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594108/job/33365077373)) `Process completed with exit code 1.`
* [pull / test-cpu-eval-sanity-check-float16 (x86_64, stories15M)](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365076605) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594108/job/33365076605)) `NotImplementedError: Could not run 'aten::_convert_weight_to_int4pack' with arguments from the 'CPU' backend.`
* [pull / test-cpu-eval-sanity-check-float32 (aarch64, stories15M)](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365077990) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594108/job/33365077990)) `Process completed with exit code 1.`
* [pull / test-cpu-eval-sanity-check-float32 (x86_64, stories15M)](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365077799) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594108/job/33365077799)) `NotImplementedError: Could not run 'aten::_convert_weight_to_int4pack' with arguments from the 'CPU' backend.`
* [pull / test-gpu-aoti-bfloat16 (cuda, stories15M) / linux-job](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365078648) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594108/job/33365078648)) `RuntimeError: Command docker exec -t dbd5f139e8f32cc1cda94796f44861d0d8d79a25301f51db1faecefdf770625d /exec failed with exit code 1`
* [pull / test-gpu-aoti-float16 (cuda, stories15M) / linux-job](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365078246) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594108/job/33365078246)) `RuntimeError: Command docker exec -t 3cf85dd23196fff6109be8949df7e81694400aa4908310d0f9b83bae7d89a1c0 /exec failed with exit code 1`
* [pull / test-gpu-aoti-float32 (cuda, stories15M) / linux-job](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365078451) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594108/job/33365078451)) `RuntimeError: Command docker exec -t 3c3789616283c48728d70b9bee8dd708a20fd4be6884b65bd3330041067f8f3f /exec failed with exit code 1`
* [pull / test-gpu-compile (cuda, stories15M) / linux-job](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365078825) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594108/job/33365078825)) `RuntimeError: Command docker exec -t f7ddcd2315031a765e2621a48995a301cdc8853662fe033c4f769114cda4b7d5 /exec failed with exit code 1`
* [pull / test-gpu-eval-sanity-check (cuda, stories15M) / linux-job](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365079000) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594108/job/33365079000)) `RuntimeError: Command docker exec -t d525897ce7b387275750040e0e9c21e13c0e5793bab6d6ce016bc69ea38a09bb /exec failed with exit code 1`
* [pull / test-tinystories-executorch (macos-14-xlarge)](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365069125) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594108/job/33365069125)) `fatal: unable to access 'https://review.mlplatform.org/ml/ethos-u/ethos-u-core-driver/': Failed to connect to review.mlplatform.org port 443 after 88 ms: Couldn't connect to server`
* [pull / test-torchao-experimental (macos-14-xlarge)](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365069310) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594108/job/33365069310)) `ninja: error: '/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/lib/libomp.dylib', needed by 'libtorchao_ops_aten.dylib', missing and no known rule to make it`
* [Run parallel prefill / test-cuda / linux-job](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365065024) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594086/job/33365065024)) `RuntimeError: Command docker exec -t 9f5f891ef29a961fd1a8f5a3dd3885f09828032f925e1b6c8a47783a91d96b4b /exec failed with exit code 1`
* [Run the aoti runner with CUDA using stories / test-runner-aot-cuda / linux-job](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365065012) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594107/job/33365065012)) `RuntimeError: Command docker exec -t 93a34b4464330f1a020d28d0833d77c4407bcba4e6399c40c30e1e037661b0e3 /exec failed with exit code 1`
* [pull / runner-aoti (16-core-ubuntu)](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365068029) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594108/job/33365068029)) `##[error]The operation was canceled.`
* [pull / test-tinystories-executorch (16-core-ubuntu)](https://hud.pytorch.org/pr/pytorch/torchchat/1367#33365068813) ([gh](https://github.com/pytorch/torchchat/actions/runs/11967594108/job/33365068813))
This comment was automatically generated by Dr. CI and updates every 15 minutes.
`Could not find a version that satisfies the requirement torchvision==0.20.0.dev20241111`
This looks accurate; according to https://download.pytorch.org/whl/nightly/torchvision/ there are only Windows builds for that day. 20241112 appears to have both Linux and Windows builds.
Initial debugging shows the test-cpu-aoti segfault is within `aoti_torch_cpu_cat`, which is automatically generated by https://github.com/pytorch/pytorch/blob/7e86a7c0155295539996e0cf422883571126073e/torchgen/gen_aoti_c_shim.py. Digging up the generated source now.
Generated source looks OK. Here's what doesn't look OK in the generated inductor .cpp file:
AtenTensorHandle buf0_handle;
AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_empty_strided(2, int_array_12, int_array_13, cached_torch_dtype_uint8, cached_torch_device_type_cpu, this->device_idx_, &buf0_handle));
RAIIAtenTensorHandle buf0(buf0_handle);
AtenTensorHandle buf1_handle;
AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_empty_strided(2, int_array_12, int_array_13, cached_torch_dtype_uint8, cached_torch_device_type_cpu, this->device_idx_, &buf1_handle));
RAIIAtenTensorHandle buf1(buf1_handle);
cpp_fused_div_remainder_0((const uint8_t*)(self___model_tok_embeddings__buffers__weight.data_ptr()), (uint8_t*)(buf0.data_ptr()), (uint8_t*)(buf1.data_ptr()));
// Topologically Sorted Source Nodes: [weight_unpacked], Original ATen: [aten.stack]
static constexpr int64_t int_array_0[] = {32000LL, 144LL, 1LL};
static constexpr int64_t int_array_1[] = {144LL, 1LL, 0LL};
auto tmp_tensor_handle_0 = reinterpret_tensor_wrapper(buf0, 3, int_array_0, int_array_1, 0LL);
auto tmp_tensor_handle_1 = reinterpret_tensor_wrapper(buf1, 3, int_array_0, int_array_1, 0LL);
const AtenTensorHandle var_array_0[] = {wrap_with_raii_handle_if_needed(tmp_tensor_handle_0), wrap_with_raii_handle_if_needed(tmp_tensor_handle_1)};
AtenTensorHandle buf3_handle;
AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_cpu_cat(var_array_0, 2, -1LL, &buf3_handle));
The problem seems to be `const AtenTensorHandle var_array_0[] = {wrap_with_raii_handle_if_needed(tmp_tensor_handle_0), wrap_with_raii_handle_if_needed(tmp_tensor_handle_1)};` -- this creates temporary `RAIIAtenTensorHandle`s, whose `operator AtenTensorHandle()` is immediately called, and then the temporaries are destroyed (which decrements the refcount), so the net effect is (I think) to fill the array with dangling `AtenTensorHandle`s.
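To make the lifetime issue concrete, here is a minimal Python analogy (not the AOTI code; `RAIIHandle`, `LIVE_RESOURCES`, and `alloc` are made-up names): a temporary RAII wrapper hands out its raw handle inside an initializer expression and is then destroyed, so the collected raw handles outlive the resources they point to.

```python
# Illustrative analogy only. CPython's refcounting destroys each temporary
# wrapper as soon as the enclosing expression is done with it, mirroring the
# C++ temporaries produced by wrap_with_raii_handle_if_needed above.
LIVE_RESOURCES: set[int] = set()

def alloc(resource_id: int) -> int:
    """Pretend to allocate a resource and track that it is alive."""
    LIVE_RESOURCES.add(resource_id)
    return resource_id

class RAIIHandle:
    """Owns a resource; frees it when the wrapper is destroyed."""

    def __init__(self, resource_id: int) -> None:
        self.resource_id = alloc(resource_id)

    def raw(self) -> int:
        # Analogous to operator AtenTensorHandle(): hands out the raw handle
        # without transferring ownership.
        return self.resource_id

    def __del__(self) -> None:
        # Analogous to the refcount decrement when the RAII wrapper dies.
        LIVE_RESOURCES.discard(self.resource_id)

# Build an array of raw handles from temporaries -- the same shape as
# var_array_0 above. Each RAIIHandle temporary dies right after raw() returns.
raw_handles = [RAIIHandle(i).raw() for i in range(2)]

print(raw_handles)      # [0, 1]
print(LIVE_RESOURCES)   # set() -- resources already freed; the handles dangle
```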
@desertfire any chance the above is a quick fix for you?
actually we might just need https://github.com/pytorch/pytorch/pull/139411
no torchvision nightly again today. I'm guessing we could probably use torchvision from yesterday with torch from today?
I had issues with Vision nightlies requiring the corresponding PT nightly a few weeks back; I'll give it another go.
Update: yup, vision is strict; will need to wait again
`_convert_weight_to_int4pack` breakage appears to be from https://github.com/pytorch/pytorch/pull/139611; I guess it's now called `_convert_weight_to_int4pack_for_cpu`.
Beat me to it; luckily AO has a fix, so we'll need a bump there too: https://github.com/pytorch/ao/pull/1278
https://github.com/pytorch/pytorch/pull/139411 also got reverted on pt/pt, so that's fun.
> pytorch/pytorch#139411 also got reverted on pt/pt, so that's fun.
pytorch/pytorch#139411 has been relanded.
Need to bump everything CUDA-related: https://github.com/pytorch/pytorch/issues/140885
> Beat me to it; luckily AO has a fix, so we'll need a bump there too: pytorch/ao#1278
Also need to manually edit `torchchat/utils/gguf_loader.py`.
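For reference, a hedged sketch of the kind of change that manual edit presumably amounts to, assuming the gguf loader's int4 path calls the ATen op directly and that the renamed CPU op keeps the `(weight, inner_k_tiles)` shape; the helper name and argument names below are hypothetical, and the exact dtype/layout expectations should be checked against pytorch/pytorch#139611 and pytorch/ao#1278.

```python
import torch

def convert_weight_to_int4pack(weight: torch.Tensor, inner_k_tiles: int) -> torch.Tensor:
    """Hypothetical helper: route to whichever int4 packing op this build exposes.

    Assumption: nightlies after pytorch/pytorch#139611 expose the CPU path as
    aten._convert_weight_to_int4pack_for_cpu, while CUDA/MPS keep the original
    aten._convert_weight_to_int4pack.
    """
    if weight.device.type == "cpu" and hasattr(
        torch.ops.aten, "_convert_weight_to_int4pack_for_cpu"
    ):
        return torch.ops.aten._convert_weight_to_int4pack_for_cpu(weight, inner_k_tiles)
    return torch.ops.aten._convert_weight_to_int4pack(weight, inner_k_tiles)
```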
Looks like the gguf_loader edit and the spurious complaints about missing OMP on Mac are the two blockers left.
Accounts for:

* `weight_only` default from `False` to `True` (https://github.com/pytorch/torchchat/issues/1356)
* `export` to `export_for_training` (https://github.com/pytorch/torchchat/pull/1319); see the sketch below
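For the second bullet, a minimal sketch of the `torch.export.export_for_training` entry point that the export path reportedly moves to; `TinyModel` and the example inputs are made up for illustration.

```python
import torch

class TinyModel(torch.nn.Module):
    """Placeholder module standing in for the real torchchat model."""

    def __init__(self) -> None:
        super().__init__()
        self.linear = torch.nn.Linear(8, 8)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x)

model = TinyModel().eval()
example_inputs = (torch.randn(1, 8),)

# Previously: exported_program = torch.export.export(model, example_inputs)
# After pytorch/torchchat#1319 the training-IR entry point is used instead:
exported_program = torch.export.export_for_training(model, example_inputs)
print(exported_program)
```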