Why force max_length <= kMaxSequenceLength in beam_search_parameters.cc?

yufang67 commented 1 week ago

Describe the issue

We currently use a large max_length in beam search, but we get a "max_length <= kMaxSequenceLength" error. I see that kMaxSequenceLength is hardcoded as 4096 in onnxruntime/contrib_ops/cpu/transformers/beam_search_parameters.cc. Is there a specific reason for this limit?

Thanks

To reproduce

Export the model with max_length=10048. Running inference then fails with:

onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running BeamSearch node. Name:'' Status Message: /onnxruntime_src/onnxruntime/contrib_ops/cpu/transformers/beam_search_parameters.cc:80 void onnxruntime::contrib::transformers::BeamSearchParameters::ParseFromInputs(onnxruntime::OpKernelContext*) max_length <= kMaxSequenceLength was false. max_length (10048) shall be no more than 4096
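
For illustration only, the upper-bound check reported in the error can be mirrored on the Python side before calling the session, so an oversized max_length is caught early. This is a minimal sketch, not ONNX Runtime code: check_max_length and ASSUMED_MAX_SEQUENCE_LENGTH are hypothetical names, and 4096 is simply the kMaxSequenceLength value quoted in this issue (a custom build may use a different value).

```python
# Hypothetical client-side guard; not part of the onnxruntime API.
# 4096 is the kMaxSequenceLength value quoted in this issue and is an
# assumption about the installed build.
ASSUMED_MAX_SEQUENCE_LENGTH = 4096


def check_max_length(max_length: int, limit: int = ASSUMED_MAX_SEQUENCE_LENGTH) -> None:
    """Raise ValueError if max_length exceeds the assumed BeamSearch limit.

    Only the upper bound shown in the error message is mirrored here.
    """
    if max_length > limit:
        raise ValueError(
            f"max_length ({max_length}) shall be no more than {limit}"
        )


if __name__ == "__main__":
    check_max_length(4096)  # within the assumed limit, no exception
    try:
        check_max_length(10048)  # the value used in this report
    except ValueError as err:
        print(err)
```

If the threshold is raised in a custom build, pass the new value as limit so the guard stays in step with the compiled constant.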

Urgency

No response

Platform

Linux

OS Version

ubuntu20.04

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

onnxruntime_gpu-1.19.2 optimum-1.23.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

cuda12.1

tianleiwu commented 1 week ago

@yufang67, the limit can be increased, as in https://github.com/microsoft/onnxruntime/pull/14371. Could you try increasing the threshold and test whether it works?

yufang67 commented 6 days ago

Hi @tianleiwu, thanks for the suggestion. I use onnxruntime-gpu 1.19.2 for export and inference. This version should already include the above changes, right? (Sorry, I put the wrong library and version in the description.)

OK, I will make the change in beam_search_parameters.cc and recompile.

I set beam_size=1 for the export, but inference still seems to go through beam_search_parameters.cc (where kMaxSequenceLength=4096). How should I configure the export so that greedy search is used instead? Thanks.
