google / flatbuffers

FlatBuffers: Memory Efficient Serialization Library
https://flatbuffers.dev/
Apache License 2.0

Is the combination of `CreateNumpyVector` and `GetVectorAsNumpy` guaranteed to be portable? [Python with NumPy] #8381

Closed: kralka closed this issue 1 month ago

kralka commented 1 month ago

TL;DR: I believe there might be an alignment issue when a FlatBuffer is created on a machine with looser alignment requirements than the machine that reads it.

`builder.CreateNumpyVector` calls https://github.com/google/flatbuffers/blob/8db59321d9f02cdffa30126654059c7d02f70c32/python/flatbuffers/builder.py#L503, where `dtype.alignment` is the alignment "according to the compiler" (https://numpy.org/doc/stable/reference/generated/numpy.dtype.alignment.html). I read this as "dependent on the machine that creates the FlatBuffer".
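To make the two candidate values concrete, here is a minimal sketch that prints `dtype.alignment` next to `dtype.itemsize` for the scalar types a FlatBuffers vector can hold. The list `SCALAR_DTYPES` and the helper `alignment_report` are names made up for this illustration; the printed values depend on the platform and compiler NumPy was built with, so no output is shown.

```python
import numpy as np

# Scalar dtypes that correspond to FlatBuffers scalar types
# (an assumption made for this illustration).
SCALAR_DTYPES = [
    np.bool_, np.int8, np.uint8, np.int16, np.uint16,
    np.int32, np.uint32, np.int64, np.uint64,
    np.float32, np.float64,
]


def alignment_report():
    """Map dtype name -> (alignment, itemsize) on the current machine."""
    report = {}
    for scalar_type in SCALAR_DTYPES:
        dt = np.dtype(scalar_type)
        report[dt.name] = (dt.alignment, dt.itemsize)
    return report


report = alignment_report()
for name, (alignment, itemsize) in report.items():
    print(f"{name:8s} alignment={alignment} itemsize={itemsize}")
```

On a platform where the two numbers differ for some type (for example, a smaller alignment than size for 8-byte types), a buffer written there could place a vector at an offset that a stricter reader would consider misaligned.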

`table.GetVectorAsNumpy` then returns a view into the memory, as specified in its docstring (by calling https://github.com/google/flatbuffers/blob/8db59321d9f02cdffa30126654059c7d02f70c32/python/flatbuffers/encode.py#L35). But doesn't this function assume the data is correctly aligned? What happens when we load a FlatBuffer into memory (into a well-aligned buffer), but the alignment requirements of the machine that created the FlatBuffer differ from those of the machine reading it?
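The reading side can be imitated without FlatBuffers at all. The sketch below assumes the view is created with `np.frombuffer` at a byte offset, and deliberately puts the payload at an odd offset. The helper `read_float64_vector` and the one-byte prefix are hypothetical, chosen only to force a misaligned start.

```python
import numpy as np


def read_float64_vector(buf, offset, count):
    """Return a zero-copy float64 view of `count` elements at `offset`."""
    return np.frombuffer(buf, dtype=np.float64, count=count, offset=offset)


values = np.array([1.5, -2.0, 3.25], dtype=np.float64)

# Hypothetical layout: one stray byte, then the raw vector payload,
# so the payload starts at an odd offset inside the buffer.
raw = bytes(1) + values.tobytes()

view = read_float64_vector(raw, offset=1, count=values.size)
print("values:", view.tolist())
print("aligned flag of the view:", view.flags.aligned)

# np.require copies only when the requested flag is not already satisfied.
safe = np.require(view, requirements=["ALIGNED"])
print("aligned flag after np.require:", safe.flags.aligned)
```

As far as I can tell, NumPy tracks alignment in `flags.aligned` and still reads the values correctly from such a view, so the open question is mostly about code that takes the raw pointer and assumes alignment. Calling `np.require(..., requirements=["ALIGNED"])` is one way to get an aligned copy before handing the data to such code.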

If this is indeed a problem, would it be enough to call `self.StartVector(x.itemsize, x.size, x.dtype.itemsize)` instead of the current call at https://github.com/google/flatbuffers/blob/8db59321d9f02cdffa30126654059c7d02f70c32/python/flatbuffers/builder.py#L503? (See https://numpy.org/doc/stable/reference/generated/numpy.dtype.itemsize.html for `itemsize`; FlatBuffers does not allow flexible data types, so `itemsize` is fixed for every supported dtype.) Since the alignment should be at most the type size, and both are powers of two, aligning to `itemsize` would satisfy both the writer's and the reader's requirements for the supported types. The downside of this approach would be potentially more padding.
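The padding cost can be estimated with a bit of arithmetic. This is a simplified model, not the builder's actual code: the hypothetical helper `padding_needed` assumes the buffer is padded so that the bytes already written plus the bytes about to be written end on a multiple of the requested alignment.

```python
def padding_needed(bytes_written, additional_bytes, alignment):
    """Bytes of padding so that the total size is a multiple of `alignment`.

    Simplified model for illustration only; `alignment` must be a power of two.
    """
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return -(bytes_written + additional_bytes) & (alignment - 1)


# Example: 4 bytes already in the buffer, then a vector of three float64 values.
bytes_written = 4
payload = 3 * 8

# Aligning to 4 (a possible dtype.alignment on some platforms) needs no padding,
# while aligning to 8 (dtype.itemsize) needs 4 extra bytes.
pad_with_alignment_4 = padding_needed(bytes_written, payload, 4)
pad_with_itemsize_8 = padding_needed(bytes_written, payload, 8)
print(pad_with_alignment_4, pad_with_itemsize_8)  # prints: 0 4
```

Under this model the extra cost of aligning to `itemsize` is always less than `itemsize` bytes per vector, and it is zero on platforms where `dtype.alignment` already equals `dtype.itemsize`.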

kralka commented 1 month ago

The answer to the question in the title is "most likely". See the NumPy notes on memory alignment:

https://numpy.org/devdocs/dev/alignment.html

The C++ implementation also uses `alignof()`.