Open hoodmane opened 3 years ago
Hi @oeway, thanks again for your design input.
My thoughts on this are as follows: you can currently implement to_bytes
in Javascript as follows:
let bytes = pyodide.globals.get("bytes");
function to_bytes(buf){
let result = bytes(buf.byteLength);
let result_buf = result.getBuffer("u8");
if(ArrayBuffer.isview(buf)){
buf = new Uint8Array(buf.buffer, buf.byteOffset, buf.byteLength);
} else {
buf = new Uint8Array(buf);
}
result_buf.data.set(buf);
result_buf.release();
return result;
}
(Note that doing this is perfectly free of any memory leaks on the development branch because of #1573. This code is leaky in v0.18.1)
From here you can do:
self.jsbuf = new Uint16Array([10, 11, 7, 1]);
pyodide.runPython(`
from js import to_bytes, jsbuf
assert to_bytes(jsbuf) == bytes([10, 0, 11, 0, 7, 0, 1, 0])
`);
Because this is possible, I think the right solution is to have a buffer_converter
option which would just be a Python callable. If you need it to do special things for ArrayBuffer
or for TypedArray
or something, you can write your own similar function. I think it's fine to pass both ArrayBuffer
and TypedArray
to the same function, since we will generally need to branch on ArrayBuffer.isView
but I think it's mostly desirable to do similar stuff with both.
OTOH, I could see making both a typedarray_converter
and an arraybuffer_converter
if you are a strong advocate for that.
How does this sound?
Hi, thanks a lot for looking into this!
First of all, I think it's great to have at least buffer_converter
option for to_py()
. For your to_bytes
implemetation, my only problem with it is the performance wise (due to the copy operations). My current solution is to do tobytes()
on the memoryview object in Python directly which in principle a share-memory operation and thus faster, correct?
I am more interested in a solution that requires no memory copy and no manual recursion for nested object to dictionary conversion. So if you can implement the buffer_converter option, we can then pass set buffer_converter=bytes
if the default type is memoryview.
Here is what I just tried:
>>> memoryview(b'abc')
<memory at 0xfd1720>
>>> bytes(memoryview(b'abc'))
b'abc'
If you need it to do special things for ArrayBuffer or for TypedArray or something, you can write your own similar function. I think it's fine to pass both ArrayBuffer and TypedArray to the same function, since we will generally need to branch on ArrayBuffer.isView but I think it's mostly desirable to do similar stuff with both.
Agreed.
For your
to_bytes
implemnetation, my only problem with it is the performance wise (due to the copy operations).
This is inevitable if the starting object does not live on the wasm heap. We have to copy the data from the ArrayBuffer
into the wasm heap so that wasm can touch it without a Javascript "trampoline". If you are after performance, then the data should already be on the wasm heap. There is a different design issue of how to track memory ownership through this since a raw TypedArray
pointing to part of the wasm heap doesn't have anything to track whether the data has been freed or not.
we can then pass set
buffer_converter=bytes
if the default type ismemoryview
.
Doing bytes(some_memoryview)
is a copy so this design is not great if your goal is to avoid incurring extra copies.
Oh, I see now, make sense, and I didn't know that bytes(memoryview) will create a copy. Is it the same for memoryview.tobytes? In that case, can you provide some utility functions for buffer_converter which won't create additional copies?
I didn't know that bytes(memoryview) will create a copy.
Yeah the bytes object wants to be read only so it doesn't want there to be other references to the bytes. Even if the memoryview is read only, it still copies.
Is it the same for memoryview.tobytes?
Yes.
can you provide some utility functions for buffer_converter which won't create additional copies?
I think buffer_converter
can just take the buffer itself (as a JsProxy
) and you can in Javascript or with create_proxy
implement your own converter which makes only one copy. One copy is the minimum possible anyways. But I could be misunderstanding what you are asking for.
Hi @hoodmane I have another issue related to your PR, basically I am sending bytes nested in a dictionary (from native python on a server) to javascript, then I called
to_py()
in pyodide. Since all the bytes becomes memoryview, I have to convert them all to bytes in loops. Similar totoJs()
which allows configure thedict_converter
, it would be really great ifto_py()
be configured withtypedarray_converter
so the conversion can be done recursively.Besides that, right now, the to_py conversion treat ArrayBuffer and TypedArray the same way, I am wondering would it make sense to separate them, e.g. allowing configure
arraybuffer_converter
andtypedarray+converter
so one can convert ArrayBuffer and TypedArray differently. I mean all these can be done by additional postprocessing to fit different needs, but for my case I do this in constant interaction with servers, having these options will have huge implication in performance.Does this make sense to you?
Originally posted by @oeway in https://github.com/pyodide/pyodide/issues/1864#issuecomment-937927746