Open yufang67 opened 1 week ago
@yufang67, the limit can be increased as in https://github.com/microsoft/onnxruntime/pull/14371. Could you try increasing the threshold and test whether it works?
Hi @tianleiwu, thanks for the suggestion. I use onnxruntime-gpu 1.19.2 for both export and inference. This version should already include the above changes, right? (Sorry, I put the wrong library and version in the description.)
OK, I made the change in beam_search_parameters.cc and recompiled it.
I use beam_size=1 at export time, but during inference it still seems to go through beam_search_parameters.cc (where kMaxSequenceLength=4096). How should I configure the export so that it uses greedy search instead? Thanks
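For context on the greedy-vs-beam question: in ONNX Runtime's generation-export tooling, num_beams=1 corresponds to greedy decoding (the GreedySearch contrib op) while num_beams>1 requires the BeamSearch contrib op. The helper below is a minimal illustrative sketch of that selection rule; `choose_search_op` is a hypothetical name, not an actual ONNX Runtime API.

```python
# Hedged sketch of how the search operator is chosen based on num_beams.
# choose_search_op is hypothetical; it only illustrates the selection rule,
# it is not part of the onnxruntime package.
def choose_search_op(num_beams: int) -> str:
    if num_beams < 1:
        raise ValueError("num_beams must be >= 1")
    # num_beams == 1 is greedy decoding; anything larger needs beam search.
    return "GreedySearch" if num_beams == 1 else "BeamSearch"
```

If the exported model ends up containing a BeamSearch node, its parameter validation (including the kMaxSequenceLength check) applies at inference regardless of the beam size used.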
Describe the issue
We currently use a large max_length in beam search, but we get a max_length <= kMaxSequenceLength error. I see that kMaxSequenceLength is hardcoded to 4096 in onnxruntime/contrib_ops/cpu/transformers/beam_search_parameters.cc. Is there a specific reason for this limit?
Thanks
To reproduce
Export with max_length=10048; when running inference I get:
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running BeamSearch node. Name:'' Status Message: /onnxruntime_src/onnxruntime/contrib_ops/cpu/transformers/beam_search_parameters.cc:80 void onnxruntime::contrib::transformers::BeamSearchParameters::ParseFromInputs(onnxruntime::OpKernelContext*) max_length <= kMaxSequenceLength was false. max_length (10048) shall be no more than 4096
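For reference, the check that produces this error can be paraphrased as follows. This is a Python sketch of the C++ validation in beam_search_parameters.cc, not the actual ORT source; the constant value matches the error message above.

```python
# Paraphrased sketch of the max_length validation in
# beam_search_parameters.cc (the real code is C++ and uses ORT_ENFORCE).
K_MAX_SEQUENCE_LENGTH = 4096  # hardcoded limit referenced in the error

def check_max_length(max_length: int) -> None:
    # Mirrors: max_length <= kMaxSequenceLength was false
    if max_length > K_MAX_SEQUENCE_LENGTH:
        raise RuntimeError(
            f"max_length ({max_length}) shall be no more than {K_MAX_SEQUENCE_LENGTH}"
        )
```

With max_length=10048 this raises, matching the RUNTIME_EXCEPTION above; raising the constant and rebuilding (as PR #14371 did for an earlier limit) lifts the cap.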
Urgency
No response
Platform
Linux
OS Version
ubuntu20.04
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
onnxruntime_gpu-1.19.2 optimum-1.23.1
ONNX Runtime API
Python
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
cuda12.1