Closed denis631 closed 2 years ago
`parmap` contains a module (`Bytearray`) that allows marshaling directly into a bigarray (and back): https://github.com/rdicosmo/parmap/blob/db44dc9cf7a6af7b56d8ebda8c75be3375c89282/src/bytearray.mli#L42-L46
Ctypes provides helpers to get a pointer from the bigarray: https://github.com/ocamllabs/ocaml-ctypes/blob/acf2e352b8e36804b8b35d96e3962b894c5cd0e7/src/ctypes/ctypes.mli#L348-356
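A minimal sketch of the pointer side of this suggestion, assuming the `ctypes` library is linked (e.g. `(libraries ctypes)` in dune). The helper names `ptr_of_buf`, `buf_of_ptr` and `buf_of_voidp` are hypothetical and not taken from the thread; the marshaling step itself (parmap's `Bytearray`) is left out here.

```ocaml
(* Sketch only: helper names are made up for illustration.
   Requires the ctypes library. *)

(* Pointer to the first element of a char bigarray.
   The pointer refers to the bigarray's own storage; nothing is copied. *)
let ptr_of_buf buf : char Ctypes.ptr =
  Ctypes.bigarray_start Ctypes.array1 buf

(* View [len] chars starting at [p] as a bigarray, again without copying. *)
let buf_of_ptr (p : char Ctypes.ptr) (len : int) =
  Ctypes.bigarray_of_ptr Ctypes.array1 len Bigarray.char p

(* Same, starting from a [void *] such as the data field of a C struct. *)
let buf_of_voidp (vp : unit Ctypes.ptr) (len : int) =
  buf_of_ptr (Ctypes.from_voidp Ctypes.char vp) len

let () =
  let buf = Bigarray.Array1.create Bigarray.char Bigarray.c_layout 4 in
  Bigarray.Array1.fill buf 'a';
  let p = ptr_of_buf buf in
  (* A write through the pointer is visible in the bigarray ... *)
  Ctypes.((p +@ 1) <-@ 'b');
  assert (buf.{1} = 'b');
  (* ... and a write through the re-created view is visible in [buf]. *)
  let view = buf_of_voidp (Ctypes.to_voidp p) 4 in
  view.{2} <- 'c';
  assert (buf.{2} = 'c')
```

Both directions share memory with the original buffer, so the caller has to keep that buffer alive (and, for memory owned by C, valid) for as long as the pointer or view is in use.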
Thank you very much @fdopen! 🙏 By applying your suggestions I managed to halve the ingestion time of tpcc_customer string data (83MB) from 6.4 to 2.9 seconds. Over 50% faster!
I am working on a hobby database project in OCaml and am currently adding the wiredtiger C bindings to it. (wiredtiger is a storage engine)
I stumbled upon the issue of marshaling/unmarshaling of the OCaml objects. The idea would be to write plain OCaml objects to disk as an array of bytes and read the OCaml objects directly from disk.
However, I don't know how to do it efficiently without performing unnecessary copies.
E.g. currently, in order to write data to disk I first get the OCaml object representation, then marshal it into `bytes` (1st copy), then I need to map the OCaml `bytes` into a `char CArray` or something similar. However, the `coerce` method failed for me (to cast an OCaml uchar pointer to a C uchar pointer), which is why I am allocating a new `CArray` instance (2nd copy). I think no copy is needed at all, as I can write the current view of the object to disk (the object's current address in memory and its length), which means 0 copies instead of 2.

When reading the data from disk I need to map the `void *` data to an OCaml object. Unfortunately, I cannot create a `bytes` from a `void *`, which is why I make a `CArray` instead. Luckily this operation doesn't involve any copies. However, I need to create `bytes` out of it, so I call `Bytes.init size f` (1st copy). And then when unmarshaling the OCaml object from `bytes`, new memory is allocated (2nd copy). In this scenario, one copy should suffice, by unmarshaling the OCaml object directly from the `void *`.

How can I implement this without the unnecessary cloning of the data? Thank you very much in advance 🙏
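To make the copies described above concrete, here is a stdlib-only sketch of the two paths. It is not the LegoDB code: a `char` Bigarray stands in for the C-side buffer (`CArray` / `WT_ITEM` data), and the names `cbuf`, `to_cbuf` and `of_cbuf` are hypothetical.

```ocaml
(* Stand-in for a C-side buffer: a char Bigarray, allocated outside the
   OCaml heap. This type and the function names are assumptions made
   for illustration only. *)
type cbuf =
  (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

(* Write path: value -> bytes (1st copy) -> C-side buffer (2nd copy). *)
let to_cbuf (v : 'a) : cbuf =
  let b = Marshal.to_bytes v [] in
  let n = Bytes.length b in
  let buf = Bigarray.Array1.create Bigarray.char Bigarray.c_layout n in
  for i = 0 to n - 1 do
    buf.{i} <- Bytes.get b i
  done;
  buf

(* Read path: C-side buffer -> bytes (1st copy) -> value (2nd copy,
   allocated by the unmarshaler). *)
let of_cbuf (buf : cbuf) : 'a =
  let b = Bytes.init (Bigarray.Array1.dim buf) (fun i -> buf.{i}) in
  Marshal.from_bytes b 0

let () =
  let v = (1, "abc", [ 2.5 ]) in
  assert (of_cbuf (to_cbuf v) = v)
```

The intermediate `bytes` value in each function is the copy the question is trying to eliminate; marshaling straight into (or out of) the off-heap buffer would skip it.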
PS: When inserting tuples I am marshaling the OCaml object to `bytes` first here: https://github.com/denis631/LegoDB/blob/6fd39397e61d51ca1a2115ce8f7dd7b2b5cd0666/src/storage/table.ml#L45

When I retrieve the data from the disk I unmarshal the data: https://github.com/denis631/LegoDB/blob/6fd39397e61d51ca1a2115ce8f7dd7b2b5cd0666/src/storage/table.ml#L28

The conversions `bytes -> WT_ITEM` and `WT_ITEM -> bytes` can be found here: https://github.com/denis631/LegoDB/blob/6fd39397e61d51ca1a2115ce8f7dd7b2b5cd0666/src/storage/wired_tiger/wired_tiger.ml#L436-L449

`WT_ITEM` is a Wiredtiger abstraction that represents an object written on disk: https://source.wiredtiger.com/2.9.3/group__wt.html#struct_w_t___i_t_e_m