triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

tensorrtserver.api.InferenceServerException: [ 0] expecting 1 invocations of SetRaw for input 'INPUT__0', one per batch entry #1894

Closed fogidi closed 4 years ago

fogidi commented 4 years ago

Description I am running the R2Plus1D video action recognition model on Triton Inference Server using a modified version of the `image_client.py` example. When I run the model on a sample video file, the first request succeeds, but the client fails immediately afterwards with an error:

Request 0, batch size 1
Video'sample_video.mp4':
    1 (Arrest) = 0.2605108618736267
    13 (Vandalism) = 0.23302550613880157
    3 (Assault) = 0.10149756073951721
    9 (Robbery) = 0.09012681990861893
    2 (Arson) = 0.08965301513671875

Traceback (most recent call last):
  File "video_client.py", line 337, in <module>
    FLAGS.batch_size))
  File "/usr/local/lib/python3.6/dist-packages/tensorrtserver/api/__init__.py", line 1591, in run
    inputs, outputs, flags, batch_size, corr_id, priority, timeout_us, contiguous_input)
  File "/usr/local/lib/python3.6/dist-packages/tensorrtserver/api/__init__.py", line 1293, in _prepare_request
    c_uint64(input_value.size * input_value.itemsize))))
  File "/usr/local/lib/python3.6/dist-packages/tensorrtserver/api/__init__.py", line 261, in _raise_if_error
    raise ex
tensorrtserver.api.InferenceServerException: [ 0] expecting 1 invocations of SetRaw for input 'INPUT__0', one per batch entry

I'm new to Triton and would appreciate an explanation of the error, as well as how to go about fixing it.

Triton Information Triton v1.13.0

Are you using the Triton container or did you build it yourself? Triton container from NGC (20.03.1-py3)

To Reproduce Include the R2Plus1D video action recognition model in a Triton server model repository, then modify the image client with the following code snippets:

_In parse_model:_

# Model input must have 4 dims
if len(input.dims) != 4:
    raise Exception(
        "expecting input to have 4 dimensions, model '{}' input has {}".format(
            model_name, len(input.dims)))

if input.format == model_config.ModelInput.FORMAT_NHWC:
    h = input.dims[0]
    w = input.dims[1]
    c = input.dims[2]
elif input.format == model_config.ModelInput.FORMAT_NCHW:
    c = input.dims[0]
    h = input.dims[1]
    w = input.dims[2]
else:
    c = input.dims[0]
    h = input.dims[1]
    w = input.dims[2]

_In main:_

video_data = []
for filename in filenames:
    video_chunks = []
    image_data = []
    vid = imageio.get_reader(filename, 'ffmpeg')
    for num, image in enumerate(vid):
        img = Image.fromarray(image)
        image_data.append(preprocess(img, format, dtype, c, h, w, FLAGS.scaling))
        if len(image_data) == 32:
            video_chunks.append(image_data)
            image_data = []

    video_data.append(video_chunks)

results = []
result_filenames = []
request_ids = []
video_idx = 0
last_request = False
user_data = UserData()
sent_count = 0
while not last_request:
    input_filenames = []
    input_batch = []
    for video in range(len(video_data)):
        chunk_idx = 0
        while chunk_idx < len(video_data[video]) - 1:
            print(chunk_idx)
            for idx in range(FLAGS.batch_size):
                input_data = np.reshape(video_data[video][chunk_idx], (c, h, w, 32))
                input_filenames.append(filenames[video])
                input_batch.append(input_data)
                chunk_idx += 1

            video_idx = (video_idx + 1) % len(video_data[video])
            if video_idx == 0:
                last_request = True

            # Send request
            if not FLAGS.async_set:
                results.append(ctx.run(
                    { input_name : input_batch },
                    { output_name : (InferContext.ResultFormat.CLASS, FLAGS.classes) },
                    FLAGS.batch_size))
                result_filenames.append(input_filenames)
            else:
                ctx.async_run(partial(completion_callback, input_filenames, user_data),
                              { input_name : input_batch },
                              { output_name : (InferContext.ResultFormat.CLASS, FLAGS.classes) },
                              FLAGS.batch_size)
                sent_count += 1

            for idx in range(len(results)):
                print("Request {}, batch size {}".format(idx, FLAGS.batch_size))
                postprocess(results[idx], result_filenames[idx], FLAGS.batch_size)

Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble include the model configuration file for that as well).

name: "r2plus1d_32"
platform: "pytorch_libtorch"
max_batch_size: 1
default_model_filename: "model.pt"
input [
  {
    name: "INPUT__0"
    data_type: TYPE_FP32
    dims: [ 3, 112, 112, 32 ]
  }
]
output [
  {
    name: "OUTPUT__0"
    data_type: TYPE_FP32
    dims: [14]
    label_filename: "labels.txt"
  }
]

# Specify GPU instance.
instance_group {
  count: 1
  gpus: [0, 1]
  kind: KIND_GPU
}

Expected behavior Inference should be performed on each temporal batch of the entire video clip.

fogidi commented 4 years ago

I figured out the issue and it works now. I needed to clear the `input_batch` and `input_filenames` lists at the beginning of the inner while loop, so that each request carries exactly `batch_size` entries:

while chunk_idx < len(video_data[video]) - 1:
    input_batch = []
    input_filenames = []
    for idx in range(FLAGS.batch_size):
        input_data = np.reshape(video_data[video][chunk_idx], (c, h, w, t))
        input_filenames.append(filenames[video])
        input_batch.append(input_data)
        chunk_idx += 1
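
For anyone hitting the same error: the client expects the list passed for each input to contain exactly `batch_size` arrays, one per batch entry. Because `input_batch` was never reset, it kept growing across iterations, so the second request carried two entries while `batch_size` was still 1. A minimal standalone sketch of that invariant (plain Python with made-up chunk data, no Triton required):

```python
import numpy as np

batch_size = 1
# Four dummy video chunks shaped like the model input (3, 112, 112, 32).
chunks = [np.zeros((3, 112, 112, 32), dtype=np.float32) for _ in range(4)]

# Buggy pattern: the batch list is built once and grows across iterations,
# so the second "request" would carry 2 entries while batch_size is 1.
buggy_batch = []
buggy_lengths = []
for chunk in chunks:
    buggy_batch.append(chunk)
    buggy_lengths.append(len(buggy_batch))  # 1, 2, 3, 4

# Fixed pattern: rebuild the batch list for every request, so each one
# carries exactly batch_size entries.
fixed_lengths = []
for chunk in chunks:
    input_batch = [chunk]
    fixed_lengths.append(len(input_batch))  # 1, 1, 1, 1

print(buggy_lengths, fixed_lengths)
```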