Bug description
When running
python3 triton-inference.py --input "Paris is the [MASK] of France."
the following is returned:
Processing input...
Input processed.
Executing model...
Model executed.
Traceback (most recent call last):
  File "/home/anthony/max/examples/inference/bert-python-torchscript/triton-inference.py", line 121, in <module>
    main()
  File "/home/anthony/max/examples/inference/bert-python-torchscript/triton-inference.py", line 104, in main
    logits = torch.from_numpy(outputs[0, masked_index, :])
IndexError: too many indices for array: array is 2-dimensional, but 3 were indexed
Note that the run command was modified in order to get past the issue reported in https://github.com/modularml/max/issues/181; that should not affect this ticket.
Note also that the Python command was run directly instead of via the deploy script, for the same reason; again, this should not make a difference.
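For what it's worth, the error message suggests the server is returning logits already squeezed to 2-D, shape (sequence_length, vocab_size), while line 104 of the script indexes as if the output were the batched 3-D shape (batch, sequence_length, vocab_size). A minimal shape-tolerant workaround, assuming that interpretation is right (the array shapes below are illustrative, not taken from the actual model; the real script then wraps the slice with torch.from_numpy):

```python
import numpy as np

# Simulated server response: some deployments return logits without the
# batch dimension, i.e. (sequence_length, vocab_size) instead of
# (1, sequence_length, vocab_size).
outputs = np.random.rand(10, 30522).astype(np.float32)
masked_index = 4

# Index defensively based on the actual number of dimensions.
if outputs.ndim == 3:
    logits = outputs[0, masked_index, :]   # batched output
else:
    logits = outputs[masked_index, :]      # already squeezed to 2-D

print(logits.shape)  # (30522,)
```

With outputs as 2-D above, the 3-index expression from the traceback (`outputs[0, masked_index, :]`) raises exactly the reported IndexError, while the branch keyed on `ndim` returns the per-token vocabulary logits in either case.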
System information
- WSL 2
- Latest image `public.ecr.aws/modular/max-serving-de:latest`
- Instructions from https://docs.modular.com/max/serve/get-started
- Last commit SHA f2f49795d6f27039d24c14d92168c7892c3b2b49
- https://github.com/modularml/max/blob/main/examples/inference/bert-python-torchscript/deploy.sh
Steps to reproduce
Follow the instructions from https://docs.modular.com/max/serve/get-started with the system information listed above, then run:
python3 triton-inference.py --input "Paris is the [MASK] of France."