minxinhao / SepHash


Questions about the Differences between SepHash Implementation and Paper Description #2

Closed · ndsf closed this issue 2 weeks ago

ndsf commented 2 weeks ago

Hello! First of all, I want to say that SepHash is really awesome work. The idea of reducing the RTTs of concurrency control by appending writes on disaggregated memory is fascinating. However, I noticed some differences between the paper's description and the code implementation.

Difference 1: Update and Delete Operations

In Section 4.3 of the paper, it is mentioned that:

"Update and delete operations are converted to inserts, competing for CurSegment entries in the same order. It ensures that no writes will occur to CurSegment when the merge operation is in progress, avoiding data loss."

However, in the SepHash code, an update is performed with a CAS that modifies the entry found during the search. The CAS does not check whether a merge is in progress, so if an update lands on an entry that the merge has already read, the update may be lost after the merge completes.

task<> Client::update(Slice *key, Slice *value)
{
    uint64_t pattern = (uint64_t)hash(key->data, key->len);
    KVBlock *kv_block = InitKVBlock(key, value, &alloc);
    uint64_t kvblock_len = key->len + value->len + sizeof(uint64_t) * 2;
    uint64_t kvblock_ptr = ralloc.alloc(kvblock_len);
    wo_wait_conn->pure_write(kvblock_ptr, seg_rmr.rkey, kv_block, kvblock_len, lmr->lkey);

    char data[1024];
    Slice ret_value;
    ret_value.data = data;
    auto [slot_ptr, old_slot] = co_await search(key, &ret_value);

    Slot *tmp = (Slot *)alloc.alloc(sizeof(Slot));
    Slot old = *(Slot *)&old_slot;
    tmp->dep = old.dep;
    tmp->fp = fp(pattern);
    tmp->len = (kvblock_len + ALIGNED_SIZE - 1) / ALIGNED_SIZE;
    tmp->sign = old.sign;
    tmp->offset = ralloc.offset(kvblock_ptr);
    tmp->fp_2 = fp2(pattern);

    if (slot_ptr != 0ull)
    {
        // log_err("[%lu:%lu]slot_ptr:%lx slot:%lx for %lu to be updated with new slot: fp:%d len:%d offset:%lx",cli_id,coro_id,slot_ptr,old_slot,*(uint64_t*)key->data,tmp->fp,tmp->len,tmp->offset);
        // 3rd RTT: CAS the old slot to the new slot in place; note that
        // nothing here checks whether a merge is in progress
        if (!co_await conn->cas_n(slot_ptr, seg_rmr.rkey, old_slot, *(uint64_t*)tmp)){
            log_err("[%lu:%lu]Fail to update key : %lu",cli_id,coro_id,*(uint64_t*)key->data);
        }
    }else{
        log_err("[%lu:%lu]No match key for %lu to update",cli_id,coro_id,*(uint64_t*)key->data);
    }
    co_return;
}

task<> Client::remove(Slice *key)
{
    // Delete is unimplemented: the coroutine returns without doing anything
    co_return;
}
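
For reference, here is roughly what I understand the paper's Section 4.3 scheme to look like on top of this interface. This is only my own sketch, not the authors' code: it assumes Client::insert(Slice*, Slice*) appends a new slot to CurSegment (so the merge's ordering guarantees apply), and the tombstone encoding for delete is my invention, not something the paper specifies.

// Sketch only: update and delete converted to inserts, per §4.3.
task<> Client::update(Slice *key, Slice *value)
{
    // Appending competes for a CurSegment entry in the same order as
    // inserts; the entry closest to the bottom wins after the merge,
    // so no write can race with an in-progress merge.
    co_await insert(key, value);
    co_return;
}

task<> Client::remove(Slice *key)
{
    // Hypothetical tombstone encoding (my assumption): a zero-length
    // value marks the key as deleted; search and merge would have to
    // treat such entries as "not found".
    Slice tombstone;
    tombstone.data = nullptr;
    tombstone.len = 0;
    co_await insert(key, &tombstone);
    co_return;
}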

Difference 2: Duplicate Key Handling

In Section 4.2 of the paper, it is stated that:

"For entries with the same fp, if they also share the same fp2, we read the KV to verify if they are duplicate keys. According to the RTT-reduced concurrency control in §4.3, we only keep the entries closest to the bottom for duplicate keys."

However, in the SepHash code, the merge process neither verifies nor removes duplicate-key entries (see the sketch after the code below).

void Client::merge_insert(Slot *data, uint64_t len, Slot *old_seg, uint64_t old_seg_len, Slot *new_seg)
{
    std::sort(data, data + len);
    uint8_t sign = data[0].sign;
    int off_1 = 0, off_2 = 0;
    // Two-way merge by fp; duplicate keys are carried over, not collapsed
    for (uint64_t i = 0; i < len + old_seg_len; i++)
    {
        if (data[off_1].sign != sign)
        {
            // log_err("[%lu:%lu:%lu]wrong sign",cli_id,coro_id,this->key_num);
            // print_mainseg(data, len);
            // exit(-1);
        }
        if (data[off_1].fp <= old_seg[off_2].fp)
        {
            new_seg[i] = data[off_1];
            off_1++;
        }
        else
        {
            new_seg[i] = old_seg[off_2];
            off_2++;
        }
        if (off_1 >= len || off_2 >= old_seg_len)
            break;
    }
    if (off_1 < len)
    {
        memcpy(new_seg + old_seg_len + off_1, data + off_1, (len - off_1) * sizeof(Slot));
    }
    else if (off_2 < old_seg_len)
    {
        memcpy(new_seg + len + off_2, old_seg + off_2, (old_seg_len - off_2) * sizeof(Slot));
    }
}
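
For comparison, here is the kind of deduplication pass I would expect after the two-way merge above. Again, this is only my own sketch under my own assumptions: the merged segment is fp-sorted (as the sort above implies), read_kv_key_equal() is a hypothetical helper that reads both KV blocks over RDMA and compares the full keys, and the later index is taken to be the newer entry, which would need checking against the actual equal-fp merge order.

// Sketch only: remove duplicate-key entries after merging, keeping the
// entry closest to the bottom (i.e., the newest one), per §4.2.
uint64_t Client::dedup_seg(Slot *seg, uint64_t len)
{
    uint64_t out = 0;
    for (uint64_t i = 0; i < len; i++)
    {
        bool superseded = false;
        // Entries with equal fp are adjacent in an fp-sorted segment,
        // so only the following run of equal-fp slots needs checking.
        for (uint64_t j = i + 1; j < len && seg[j].fp == seg[i].fp; j++)
        {
            // Same fp and fp2: read the KVs to verify a true duplicate.
            if (seg[j].fp_2 == seg[i].fp_2 && read_kv_key_equal(&seg[i], &seg[j]))
            {
                superseded = true; // assumes seg[j] is the newer entry
                break;
            }
        }
        if (!superseded)
            seg[out++] = seg[i];
    }
    return out; // compacted length; the caller clears seg[out..len)
}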

Therefore, I would like to ask if there is an updated version of the code that implements the amazing ideas described in the paper. Thank you in advance!

minxinhao commented 2 weeks ago

Thank you for your question. The update and delete code is left over from performance comparison tests between two approaches: direct CAS versus implementing both operations via insert. You can easily implement these two operations by using insert directly. For the fp2 and duplicate-key checks, you can refer to the descriptions in the paper. The updated code cannot be copied out of the Huawei platform; I apologize for any inconvenience this may have caused.

ndsf commented 2 weeks ago

Thanks for your reply. I will try to modify the code based on the description in the paper.