huggingface / optimum-nvidia


Not able to run 'Generate' from QuickStart section #61

Open harikrishnaapc opened 8 months ago

harikrishnaapc commented 8 months ago

Hi,

I pulled the image with docker pull huggingface/optimum-nvidia and started a container with docker run -it --rm --gpus all -v /path/to/my_model:/my_model huggingface/optimum-nvidia. I ran llama.py from the examples folder once, which generated build.json, and copied that into my model folder as well.

I then created one more Python file inside the container with the 'Generate' section of the Quickstart Guide as its contents.
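For reference, the file looks roughly like this. It is only a sketch of the Quickstart 'Generate' snippet from memory; the model path and prompt are placeholders for my local setup, so the exact arguments may differ from the guide:

from transformers import AutoTokenizer
from optimum.nvidia import AutoModelForCausalLM

# Placeholder: this points at the model directory mounted into the container
MODEL_PATH = "/my_model"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH)

# Tokenize a single prompt and move the tensors to the GPU
model_inputs = tokenizer(["Describe the benefits of GPU inference."], return_tensors="pt").to("cuda")

generated_ids = model.generate(**model_inputs)

# This is the line that raises the TypeError below
tokenizer.batch_decode(generated_ids, skip_special_tokens=True)

Running that file gives: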

Traceback (most recent call last):
  File "/opt/optimum-nvidia/examples/text-generation/tst.py", line 26, in <module>
    tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 3710, in batch_decode
    return [
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 3711, in <listcomp>
    self.decode(
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 3750, in decode
    return self._decode(
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_fast.py", line 626, in _decode
    text = self._tokenizer.decode(token_ids, skip_special_tokens=skip_special_tokens)
TypeError: argument 'ids': 'list' object cannot be interpreted as an integer

Upon checking, token_ids in the line above is a list(list(list(int))), e.g. [[[1, 2]]], while self._tokenizer.decode(token_ids[0][0], skip_special_tokens=skip_special_tokens) does work. Can you please check this, or am I doing something wrong?
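For now, the only way I can get text back is to peel off the extra nesting manually before decoding. A rough workaround sketch, continuing from the script above; I am guessing the extra levels are batch/beam dimensions (and possibly a tuple wrapper), so this just unwraps leading dimensions until a single token-id sequence is left:

# generate() hands back ids nested one or two levels deeper than
# batch_decode expects (e.g. [[[1, 2]]]), so drill down to the innermost
# sequence for the first prompt and decode that single sequence instead.
ids = generated_ids
while isinstance(ids[0], (list, tuple)) or getattr(ids[0], "ndim", 0) > 0:
    ids = ids[0]  # peel off one leading (batch/beam) dimension
text = tokenizer.decode(ids, skip_special_tokens=True)
print(text)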

I also get errors when passing multiple prompts.