v6d-io / v6d

vineyard (v6d): an in-memory immutable data manager. (Project under CNCF, TAG-Storage)
https://v6d.io
Apache License 2.0
828 stars 122 forks source link

Client delete operation corrupted data #1916

Closed qiranq99 closed 3 months ago

qiranq99 commented 3 months ago

Hi folks,

I encountered a weird problem that the deletion operation from the IPC client side may corrupt data.

Here I present a minimum case for you to reproduce:

import vineyard
import numpy as np

client = vineyard.connect()

# A tuple object containing 2 numpy arrays. 
# It would cause data corruption
data : tuple = (np.ndarray((200, ), np.int32), np.ndarray((20, ), np.int32))

# This object would not cause data corruption
# data : tuple = (np.ndarray((200, ), np.int32), 1)

object_id = client.put(data)

re_data = client.get(object_id)

# Evict the object from object store, 
# but I would expect the ownership of the object transferred,
# while the actually memory views should not be corrupted.
client.delete(oid) 

assert np.array_equal(data[0], re_data[0]), "Original data is corrupted"

Other descriptions:

sighingnow commented 3 months ago

Thanks for raising the issue! Will invistagte ASAP.

dashanji commented 3 months ago

Hi @qiranq99. In your case, as you don't specify any allocator, then the default allocator dlmalloc will be used. Basically, it's caused by the free behavior of dlmalloc. When you delete the object, the blob will be free by the dlfree. As the got object is actually a memoryview object, it will share the same physical storage with the blob. As a result, the data may be corrupted by the behavior of dlfree.

As for the dlfree, the allocator will put the the memory chunk into the free list/tree rather than return it to the os. Specially, the allocator will choose to link(store the prev/next free chunk in the first 4 bytes of a chunk to free) or merge(may not change the first 4 bytes) two free chunks based on the neighboring chunk. Thus, you will find only the first element is different between the original data and the got one. For more details about the dlmalloc, you could refer to the doc or something else.

If you want to transfer the object's ownership, I suggest you use a deepcopy to create a new object, thus the payload will not be changed by the "dlfree" of the below delete operation.

sighingnow commented 3 months ago

Hi @qiranq99 Vineyard's delete() won't respect the liveness of return objects (as vineyard.get() is zero-copy) and the object been destroyed when been deleted.

You shouldn't do the eviction before the object is still in use and no extra memory is needed as it is shared via zero-copy.

qiranq99 commented 3 months ago

Thanks, it helped.