google / flatbuffers

FlatBuffers: Memory Efficient Serialization Library
https://flatbuffers.dev/
Apache License 2.0

Python "struct.error: unpack_from requires a buffer of at least 1718886439 bytes for unpacking 4 bytes at offset 1718886435 (actual buffer size is 361)" #8347

Closed by ethanc8 2 days ago

ethanc8 commented 2 days ago

When trying to load an LLM in TensorFlow MediaPipe format, using the following code:

# tflite flatbuffer schemas
import tflite.Model, tflite.SubGraph
import sys

input_file = open(sys.argv[0], "rb")
buf = bytearray(input_file.read())

model: tflite.Model.Model = tflite.Model.Model.GetRootAs(buf)

print(f"Version: {model.Version()}")

graph: tflite.SubGraph.SubGraph = model.Subgraphs(0)

print(f"Name: {graph.Name()}")

I get:

Traceback (most recent call last):
  File "/home/ethan/Projects/MachineLearning/Gemini-Nano/playground/analyzer.py", line 11, in <module>
    print(f"Version: {model.Version()}")
                      ^^^^^^^^^^^^^^^
  File "/home/ethan/Projects/MachineLearning/Gemini-Nano/playground/tflite/Model.py", line 40, in Version
    o = flatbuffers.number_types.UOffsetTFlags.py_type(self._tab.Offset(4))
                                                       ^^^^^^^^^^^^^^^^^^^
  File "/home/ethan/miniforge3/envs/Gemini-Nano/lib/python3.12/site-packages/flatbuffers/table.py", line 37, in Offset
    vtable = self.Pos - self.Get(N.SOffsetTFlags, self.Pos)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ethan/miniforge3/envs/Gemini-Nano/lib/python3.12/site-packages/flatbuffers/table.py", line 93, in Get
    return flags.py_type(encode.Get(flags.packer_type, self.Bytes, off))
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ethan/miniforge3/envs/Gemini-Nano/lib/python3.12/site-packages/flatbuffers/encode.py", line 26, in Get
    return packer_type.unpack_from(memoryview_type(buf), head)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
struct.error: unpack_from requires a buffer of at least 1718886439 bytes for unpacking 4 bytes at offset 1718886435 (actual buffer size is 361)

The tflite directory was created by flatc --python --python-typing schema_v3c.fbs, where schema_v3c.fbs is the tflite v3c schema. The model can be any MediaPipe LLM, such as the Gemini Nano included in Chrome Canary.

ethanc8 commented 2 days ago

I will try to reproduce with more MediaPipe LLMs.

ethanc8 commented 2 days ago

You can see the full code at https://github.com/ethanc8/Gemini-Nano

ethanc8 commented 2 days ago

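Never mind, I needed to use sys.argv[1] instead of sys.argv[0]. sys.argv[0] is the path of the script itself, so the script was parsing its own source (361 bytes) instead of the model file.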
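
For anyone who lands here with a similar error: the huge offset in the message can be traced back to the first bytes of the file that was actually read. FlatBuffers treats the first four bytes of a (non-size-prefixed) buffer as a little-endian uint32 offset to the root table, and the script above starts with the comment "# tflite flatbuffer schemas". The sketch below decodes those four bytes and adds a small bounds check; `root_offset_in_bounds` is a hypothetical helper for illustration, not part of the flatbuffers package.

```python
import struct

# First four bytes of the script from the report ("# tflite flatbuffer schemas").
script_prefix = b"# tf"

# Interpret them the way FlatBuffers does: a little-endian uint32 root offset.
bogus_root_offset = struct.unpack("<I", script_prefix)[0]
print(bogus_root_offset)  # 1718886435, the offset quoted in the struct.error


def root_offset_in_bounds(buf: bytes) -> bool:
    """Hypothetical pre-check: does the root offset point inside the buffer?

    Assumes a plain (not size-prefixed) FlatBuffer. Passing this check does
    not prove the buffer is valid; failing it means the buffer cannot be.
    """
    if len(buf) < 8:
        return False
    (root,) = struct.unpack_from("<I", buf, 0)
    # The root table starts with a 4-byte signed offset to its vtable,
    # so those 4 bytes must fit inside the buffer.
    return 4 <= root <= len(buf) - 4
```

Calling `root_offset_in_bounds(buf)` before `GetRootAs(buf)` would have rejected the 361-byte script immediately, which is a quicker hint that the wrong file was opened than the struct.error raised deep inside table.py.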