Closed jasperzhong closed 3 years ago
https://arrow.apache.org/blog/2017/08/08/plasma-in-memory-object-store/
Plasma: A High-Performance Shared-Memory Object Store
Plasma holds immutable objects in shared memory so that they can be accessed efficiently by many clients across process boundaries.
One of the goals of Apache Arrow is to serve as a common data layer enabling zero-copy data exchange between multiple frameworks. A key component of this vision is the use of off-heap memory management (via Plasma) for storing and sharing Arrow-serialized objects between applications.
完美. 有个词叫做off-heap
,是指不被GC控制的heap data. https://stackoverflow.com/a/45267441/9601110
使用了Google Flatbuffers作为serialization library. 这是一个非常浅的serialize. 速度和raw struct不相上下. (benchmark
https://arrow.apache.org/docs/python/plasma.html
看上去能handle numpy,用pa.Tensor.from_numpy
. 完美.
使用步骤:
np.random.bytes(20)
. client.create(object_id, size)
. 如果object_id已经存在,则会报错; 如果OOM, 报PlasmaStoreFull
错.
https://arrow.apache.org/docs/python/plasma.html