pytorch / serve

Serve, optimize and scale PyTorch models in production
https://pytorch.org/serve/
Apache License 2.0

Cannot run the text_classification example #2947

Closed: fredrik-jansson-se closed this issue 6 months ago

fredrik-jansson-se commented 7 months ago

🐛 Describe the bug

Trying to run the example as described in the README:

python3 run_script.py

Error logs

Traceback (most recent call last):
  File "/Users/frja/dev/machine-learning/pytorch-org-tutorial/serve/examples/text_classification/train.py", line 143, in <module>
    train(train_dataloader, model, optimizer, criterion, epoch)
  File "/Users/frja/dev/machine-learning/pytorch-org-tutorial/serve/examples/text_classification/train.py", line 49, in train
    torch.nn.utils.clip_grad_norm_(model.parameters(), 0.1)
  File "/Users/frja/.local/share/virtualenvs/pytorch-org-tutorial-fA8BV59V/lib/python3.12/site-packages/torch/nn/utils/clip_grad.py", line 55, in clip_grad_norm_
    norms.extend(torch._foreach_norm(grads, norm_type))
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
NotImplementedError: Could not run 'aten::_foreach_norm.Scalar' with arguments from the 'SparseCPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::_foreach_norm.Scalar' is only available for these backends: [CPU, MPS, Meta, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradMeta, AutogradNestedTensor, Tracer, AutocastCPU, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].

CPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterCPU.cpp:31357 [kernel]
MPS: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:75 [backend fallback]
Meta: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/MetaFallbackKernel.cpp:23 [backend fallback]
BackendSelect: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Python: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:154 [backend fallback]
FuncTorchDynamicLayerBackMode: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/DynamicLayer.cpp:498 [backend fallback]
Functionalize: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/FunctionalizeFallbackKernel.cpp:324 [backend fallback]
Named: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
Conjugate: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/ConjugateFallback.cpp:17 [backend fallback]
Negative: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/NegateFallback.cpp:19 [backend fallback]
ZeroTensor: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/ZeroTensorFallback.cpp:86 [backend fallback]
ADInplaceOrView: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:86 [backend fallback]
AutogradOther: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradCUDA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradHIP: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradXLA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradMPS: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradIPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradXPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradHPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradVE: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradLazy: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradMTIA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradPrivateUse1: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradPrivateUse2: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradPrivateUse3: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradMeta: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradNestedTensor: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
Tracer: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/TraceType_2.cpp:17346 [kernel]
AutocastCPU: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/autocast_mode.cpp:378 [backend fallback]
AutocastCUDA: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/autocast_mode.cpp:244 [backend fallback]
FuncTorchBatched: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/LegacyBatchingRegistrations.cpp:720 [backend fallback]
BatchedNestedTensor: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/LegacyBatchingRegistrations.cpp:746 [backend fallback]
FuncTorchVmapMode: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/VmapModeRegistrations.cpp:28 [backend fallback]
Batched: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/LegacyBatchingRegistrations.cpp:1075 [backend fallback]
VmapMode: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
FuncTorchGradWrapper: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/TensorWrapper.cpp:203 [backend fallback]
PythonTLSSnapshot: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:162 [backend fallback]
FuncTorchDynamicLayerFrontMode: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/DynamicLayer.cpp:494 [backend fallback]
PreDispatch: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:166 [backend fallback]
PythonDispatcher: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:158 [backend fallback]

Traceback (most recent call last):
  File "/Users/frja/dev/machine-learning/pytorch-org-tutorial/serve/examples/text_classification/run_script.py", line 7, in <module>
    subprocess.run(cmd, shell=True, check=True)
  File "/opt/homebrew/Cellar/python@3.12/3.12.2/Frameworks/Python.framework/Versions/3.12/lib/python3.12/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'python train.py AG_NEWS --device cpu --save-model-path model.pt --dictionary source_vocab.pt' returned non-zero exit status 1.
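
The failure can be reduced to a few lines outside the example (a minimal sketch, assuming torch 2.2 on CPU; the layer sizes here are arbitrary, not taken from the example):

```python
import torch
import torch.nn as nn

# An EmbeddingBag created with sparse=True produces a sparse gradient
# for its weight matrix after backward().
emb = nn.EmbeddingBag(num_embeddings=10, embedding_dim=4, sparse=True)
out = emb(torch.tensor([1, 2, 3]), torch.tensor([0]))
out.sum().backward()
print(emb.weight.grad.layout)  # torch.sparse_coo

# clip_grad_norm_ routes the gradients through torch._foreach_norm,
# which has no SparseCPU kernel in torch 2.2, so this raises the
# NotImplementedError shown in the log above.
torch.nn.utils.clip_grad_norm_(emb.parameters(), 0.1)
```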

Installation instructions

pip3 show torchserve
Name: torchserve
Version: 0.9.0
Summary: TorchServe is a tool for serving neural net models for inference
Home-page: https://github.com/pytorch/serve.git
Author: PyTorch Serving team
Author-email: noreply@noreply.com
License: Apache License Version 2.0
Location: /Users/frja/.local/share/virtualenvs/pytorch-org-tutorial-fA8BV59V/lib/python3.12/site-packages
Requires: packaging, Pillow, psutil, wheel
Required-by:

pip3 show torch
Name: torch
Version: 2.2.0
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: /Users/frja/.local/share/virtualenvs/pytorch-org-tutorial-fA8BV59V/lib/python3.12/site-packages
Requires: filelock, fsspec, jinja2, networkx, sympy, typing-extensions
Required-by: torchdata, torchtext, torchvision

pip3 show torchtext
Name: torchtext
Version: 0.16.2
Summary: Text utilities, models, transforms, and datasets for PyTorch.
Home-page: https://github.com/pytorch/text
Author: PyTorch Text Team
Author-email: packages@pytorch.org
License: BSD
Location: /Users/frja/.local/share/virtualenvs/pytorch-org-tutorial-fA8BV59V/lib/python3.12/site-packages
Requires: numpy, requests, torch, torchdata, tqdm
Required-by:

Model Packaging

n/a

config.properties

No response

Versions

torchserve --version
Removing orphan pid file.
TorchServe Version is 0.9.0

Repro instructions

requirements.txt:

certifi==2024.2.2; python_version >= '3.6'
charset-normalizer==3.3.2; python_full_version >= '3.7.0'
enum-compat==0.0.3
filelock==3.13.1; python_version >= '3.8'
fsspec==2024.2.0; python_version >= '3.8'
idna==3.6; python_version >= '3.5'
jinja2==3.1.3; python_version >= '3.7'
markupsafe==2.1.5; python_version >= '3.7'
mpmath==1.3.0
networkx==3.2.1; python_version >= '3.9'
numpy==1.26.4; python_version >= '3.9'
packaging==23.2; python_version >= '3.7'
pillow==10.2.0; python_version >= '3.8'
portalocker==2.8.2; python_version >= '3.8'
psutil==5.9.8; python_version >= '2.7' and python_version not in '3.0, 3.1, 3.2, 3.3, 3.4, 3.5'
pyaml==23.12.0; python_version >= '3.8'
pyyaml==6.0.1; python_version >= '3.6'
requests==2.31.0; python_version >= '3.7'
sympy==1.12; python_version >= '3.8'
torch==2.2.0; python_full_version >= '3.8.0'
torch-model-archiver==0.9.0
torch-workflow-archiver==0.2.11
torchdata==0.7.1; python_version >= '3.8'
torchserve==0.9.0
torchtext==0.16.2; python_version >= '3.8'
torchvision==0.17.0; python_version >= '3.8'
tqdm==4.66.2; python_version >= '3.7'
typing-extensions==4.9.0; python_version >= '3.8'
urllib3==2.2.0; python_version >= '3.8'
wheel==0.42.0; python_version >= '3.7'

cd serve/examples/text_classification
python3 run_script.py
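
run_script.py only shells out to the training command shown in the error log above, so the failing step can also be invoked directly:

```
python train.py AG_NEWS --device cpu --save-model-path model.pt --dictionary source_vocab.pt
```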

Possible Solution

No response

mreso commented 7 months ago

Thanks for reporting this! I can reproduce it on my end and will look into it.

fredrik-jansson-se commented 7 months ago

Hi Matthias,

Thank you! Sorry I can't be of any help myself, I'm a total ML newbie.

mreso commented 7 months ago

Hi @fredrik-jansson-se, it seems that for some reason the operator is no longer implemented for sparse tensors (it was in the past). I will try to dig deeper into this later. To get you unblocked, you can just flip sparse to False here, as sketched below: https://github.com/pytorch/serve/blob/cd52683c2e9e334a6e9eaf6985f7c8cf545f5cbe/examples/text_classification/model.py#L22
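
That is, something like this (a sketch of the one-line change, assuming the model class follows the torchtext tutorial layout; the exact line is at the link above):

```python
import torch.nn as nn

class TextClassificationModel(nn.Module):  # sketch; see model.py at the link
    def __init__(self, vocab_size, embed_dim, num_class):
        super().__init__()
        # Workaround: dense gradients avoid the missing SparseCPU kernel
        # for aten::_foreach_norm that clip_grad_norm_ trips over.
        self.embedding = nn.EmbeddingBag(vocab_size, embed_dim, sparse=False)  # was sparse=True
        self.fc = nn.Linear(embed_dim, num_class)

    def forward(self, text, offsets):
        return self.fc(self.embedding(text, offsets))
```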

fredrik-jansson-se commented 7 months ago

Awesome, thank you!