Open guidorice opened 11 months ago
As a struct, it should be named MemoryView. Please be consistent and avoid Python's mess in naming!
Good suggestion! The naming is a bit confusing: there is the type Py_buffer at the C level, MemoryView in Python land, and the memoryview() constructor, also in Python land. I definitely would not want to add new names or concepts if that can be avoided.
Also, I thought this Python example with comments might help to illustrate the idea a little more:
# made up example (chatbot)
import array
arr = array.array('i', [1, 2, 3, 4, 5])
mem_view = memoryview(arr)
# Access properties of the memoryview
print(mem_view.nbytes)
print(mem_view.itemsize)
# Indexing and slicing like NumPy array
print(mem_view[0])
print(mem_view[-1])
print(mem_view[1:3])
# Iterate through the memoryview
for num in mem_view:
    print(num)
# Get a NumPy array from the memoryview
import numpy as np
num_arr = np.frombuffer(mem_view, dtype=np.int32)
print(num_arr)
output
20
4
1
5
<memory at 0x1011590c0>
1
2
3
4
5
[1 2 3 4 5]
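To make the zero-copy aspect of the example above more explicit, here is a small stdlib-only sketch (not from the original comment). It shows that slicing a memoryview yields another view onto the same memory, and that the owning array refuses to resize while views are exported. The variable names are illustrative only.

```python
import array

arr = array.array('i', [1, 2, 3, 4, 5])
view = memoryview(arr)

# Slicing a memoryview yields another view onto the same memory, not a copy.
sub = view[1:3]
sub[0] = 20                      # writes through to the underlying array
print(arr.tolist())              # [1, 20, 3, 4, 5]
print(sub.tolist())              # [20, 3]

# While views are exported, the owner must not reallocate its storage.
try:
    arr.append(6)
    resized_while_exported = True
except BufferError:
    resized_while_exported = False
print(resized_while_exported)    # False

# Releasing every view hands control back to the owner.
sub.release()
view.release()
arr.append(6)
print(arr.tolist())              # [1, 20, 3, 4, 5, 6]
```

The export/release bookkeeping shown here is the behaviour that a buffer-exporting Mojo struct would presumably need to honour as well.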
I think this enhancement would open up numerous use cases like:
I am aware that I am being quite pedantic, but if Mojo implements this, it would IMHO be better to sacrifice one more character and name this constructor "memory_view". I don't like Python's style of blending words together without any separator. Keeping names strongly synchronized with Python is also not ideal, because it would require following Python's behaviour directly, which may be painful in some cases.
If Mojo is to be Python++ rather than a compiled copy of Python, it will gain its own identity, and small improvements like this will be very noticeable.
Linking to a neat related project here, an Arrow implementation in Mojo: https://github.com/kszucs/firebolt. It unlocks the case where Mojo is the consumer of Arrow data structures.
Review Mojo's priorities
What is your request?
This enhancement request is to add support for Python's memoryview builtin and for the Python buffer protocol. Here are some ideas about what kind of tasks and level of effort might be involved:
- (Bufferable?) which has dunder methods: __buffer__ and __release_buffer__.
- memoryview() on Mojo structs.
- __buffer__ returns memoryview, so this has to be builtin to Mojo (not a Python module import).
- Py_buffer https://docs.python.org/3/c-api/buffer.html to be called from Python. Or maybe they could be wrapped in a PythonObject and returned as a memoryview?
What is your motivation for this change?
Currently Mojo 0.6 has poor (nonexistent?) support for zero-copy shared memory buffers with Python.
For example, in Mojo's documentation, the Ray Tracing notebook has an example of raster imagery being copied into a NumPy array using MLIR ops. Not only is this an unnecessary memory copy, it's also too verbose, undocumented, and not pythonic. See def to_numpy_image(self) -> PythonObject: in the source notebook.
Mojo should enable and encourage interop with existing scientific computing packages in the most efficient manner. For example, the Apache Arrow format.
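As a rough illustration of what zero-copy access to raster data looks like on the Python side, the sketch below reinterprets a flat pixel buffer as a height × width × channels view using only the standard library. The image dimensions and variable names are made up for the example. It does not reproduce the notebook's code; it only shows the kind of sharing the buffer protocol makes possible.

```python
import array

HEIGHT, WIDTH, CHANNELS = 2, 2, 3  # hypothetical tiny RGB image

# Flat C-float pixel storage, standing in for a renderer's output buffer.
pixels = array.array('f', [float(i) for i in range(HEIGHT * WIDTH * CHANNELS)])

# cast() requires one side of the cast to be a byte format, so go through 'B'.
flat_bytes = memoryview(pixels).cast('B')
image = flat_bytes.cast('f', shape=[HEIGHT, WIDTH, CHANNELS])

print(image.shape)        # (2, 2, 3)
print(image[1, 0, 2])     # 8.0

# The 3-D view aliases the original storage: no pixel data was copied.
pixels[8] = 0.5
print(image[1, 0, 2])     # 0.5
```

A consumer such as NumPy can wrap a buffer like this without copying, which is the behaviour the request asks Mojo to support natively.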
This enhancement would also lay the groundwork for supporting the Python array API standard.
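For reference, the __buffer__ and __release_buffer__ methods mentioned in the request can already be implemented by pure-Python classes on Python 3.12 and later. The sketch below shows the shape of that protocol. The class name Samples and its held flag are hypothetical, and a Mojo trait would not necessarily look like this.

```python
import sys


class Samples:
    """Hypothetical container that exposes its storage via the buffer protocol."""

    def __init__(self, data: bytes):
        self._data = bytearray(data)
        self.held = False  # True while a consumer holds a view

    def __buffer__(self, flags: int) -> memoryview:
        # Called when a consumer (e.g. memoryview()) requests the buffer.
        if self.held:
            raise BufferError("buffer already exported")
        self.held = True
        return memoryview(self._data)

    def __release_buffer__(self, view: memoryview) -> None:
        # Called when the consumer is done with the buffer.
        self.held = False


def read_all(obj) -> bytes:
    """Read an object's bytes through the buffer protocol, then release."""
    with memoryview(obj) as mv:
        return mv.tobytes()


# Python-level __buffer__ is only honoured on Python 3.12 and later.
if sys.version_info >= (3, 12):
    samples = Samples(b"hello")
    print(read_all(samples))   # b'hello'
    print(samples.held)        # False
```

On older interpreters the class definition still loads, but memoryview(obj) raises TypeError because the protocol was only reachable from C extensions.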
Any other details?
Related Discussions/Issues:
Reference PEPs: