Do the current `write-clock` and `global-clock` updates have race conditions?

MingyanGuo commented 11 months ago

In the paper's pseudo code function RLU_COMMIT_WRITE_LOG, the first two lines are

    ctx.write-clock ← global-clock + 1
    FETCH_AND_ADD(global-clock, 1)

If the current process is preempted by the scheduler after its reading the global-clock, the global-clock can be increased by other processes. Let the global-clock value read by the process be t0. When the current process gets its CPU time again, the ctx.write-clock will be updated to the value t0, but the global-clock can be some larger value, say, t0 + 100. A reader transaction R begins at t0 + 50 can suddenly read part of this commit, because if R first check the commit at t0 + 60 and then check the commit at t0 + 101, the first check will let R skip the write set, but the second check lets R read the write set. The C code rlu_sync_and_writeback seems to have the same issue. Please kindly correct me if I missed anything. Thanks.

armafire commented 11 months ago

Hi,You are thinking about concurrent writers, however, all concurrent writers won’t finish till the RCU epoch has passed, so it will avoid the conflict you are mentioning. It is not the same as usual TM transactions that do not wait at the end (no RCU).I hope it helps.Best,AlexOn Oct 17, 2023, at 1:34 PM, MingyanGuo @.***> wrote: In the paper's pseudo code function RLU_COMMIT_WRITE_LOG, the first two lines are ctx.write-clock ← global-clock + 1 FETCH_AND_ADD(global-clock, 1)

If the current process is preempted by the scheduler after its reading the global-clock, the global-clock can be increased by other processes. Let the global-clock value read by the process be t0. When the current process gets its CPU time again, the ctx.write-clock will be updated to the value t0, but the global-clock can be some larger value, say, t0 + 100. A reader transaction R begins at t0 + 50 can suddenly read part of this commit, because if R first check the commit at t0 + 60 and then check the commit at t0 + 101, the first check will let R skip the write set, but the second check lets R read the write set. The C code rlu_sync_and_writeback seems to have the same issue. Please kindly correct me if I missed anything. Thanks.

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you are subscribed to this thread.Message ID: @.***>

MingyanGuo commented 11 months ago

Hi,You are thinking about concurrent writers, however, all concurrent writers won’t finish till the RCU epoch has passed, so it will avoid the conflict you are mentioning. It is not the same as usual TM transactions that do not wait at the end (no RCU).I hope it helps.Best,Alex

Thanks a lot for the reply. Really appreciate it. If I understand the RLU paper correctly, the lock is on the object level, so if two writers don't conflict on the objects to update, they can run concurrently, right? RCU does not allow concurrent writers, but RLU is to overcome this limit, if I read the abstract correctly.

RLU overcomes the major limitations of RCU by allowing, for the first time, concurrency of reads with multiple writers, and providing automation...

I just reread the paper again, but I still can only find the object level locks. Could you please let me know where the concurrent writers are synchronized? Thanks.

armafire commented 11 months ago

I looked more into the paper as well to remember and actually the basic assumption is that writers execute serially. That's what you see in Algorithm 1. It is the responsibility of the programmer to ensure concurrent writers are correct, as it is shown in Listing 2 (there is explicit lock for "prev" and "cur" nodes). The programmer basically needs to avoid a case where writer 1 reads setA and writes setB and writer 2 reads setB and writes setA (a cycle dependency that is not linear).

On Wed, Oct 18, 2023 at 1:50 PM Mingyan Guo @.***> wrote:

Hi,You are thinking about concurrent writers, however, all concurrent writers won’t finish till the RCU epoch has passed, so it will avoid the conflict you are mentioning. It is not the same as usual TM transactions that do not wait at the end (no RCU).I hope it helps.Best,Alex

Thanks a lot for the reply. Really appreciate it. If I understand the RLU paper correctly, the lock is on the object level, so if two writers don't conflict on the objects to update, they can run concurrently, right? RCU does not allow concurrent writers, but RLU is to overcome this limit, if I read the abstract correctly.

RLU overcomes the major limitations of RCU by allowing, for the first time, concurrency of reads with multiple writers, and providing automation...

I just reread the paper again, but I still can only find the object level locks. Could you please let me know where the concurrent writers are synchronized? Thanks.

— Reply to this email directly, view it on GitHub https://github.com/rlu-sync/rlu/issues/3#issuecomment-1769047356, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACDGMKDNM6CZ5VUY54P2TR3YAAJGJAVCNFSM6AAAAAA6EG56VWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTONRZGA2DOMZVGY . You are receiving this because you commented.Message ID: @.***>

MingyanGuo commented 11 months ago

I looked more into the paper as well to remember and actually the basic assumption is that writers execute serially. That's what you see in Algorithm 1. It is the responsibility of the programmer to ensure concurrent writers are correct, as it is shown in Listing 2 (there is explicit lock for "prev" and "cur" nodes). The programmer basically needs to avoid a case where writer 1 reads setA and writes setB and writer 2 reads setB and writes setA (a cycle dependency that is not linear). …

Thanks. I understand that the programmer should avoid the cycle dependency mentioned here.

the basic assumption is that writers execute serially

Maybe serializable not serially? Otherwise a global write lock is sufficient, and the object level lock in RLU_TRY_LOCK is not necessary.

So concurrent writers are expected, but they should not interfere each other logically. The race condition in the issue can be caused by totally unrelated concurrent writers. This line

    ctx.write-clock ← global-clock + 1

can be interpreted as

  var temp ← global-clock
  // load the global variable into a register
  // then the current process is interrupted, and the global-clock is
  // updated by other writers to the value temp + 100. These writers
  // can be unrelated to this writer logically.
  // meanwhile, some readers started, but they can not read this
  // write set, because its write-clock is still ∞
  ctx.write-clock ← temp + 1
  // the current process gets CPU again, and set its write-clock to
  // temp + 1. The readers just started can suddenly read this write
  // set. So the readers can only see part of the write set in one
  // transaction.

In my opinion, the fix is to add a brief lock to make the two lines atomic.

    lock(global-writer-lock)
    ctx.write-clock ← global-clock + 1
    global-clock ← global-clock + 1
    unlock(global-writer-lock)

Hopefully my question is clear now. Thanks a lot for your time.

armafire commented 11 months ago

I see what you mean. This is a good catch. The ctx.write-clock and global-clock should update atomically to avoid other writers moving readers forward in time, while this one writer is stalled.

Alex

On Wed, Oct 18, 2023 at 4:16 PM Mingyan Guo @.***> wrote:

I looked more into the paper as well to remember and actually the basic assumption is that writers execute serially. That's what you see in Algorithm 1. It is the responsibility of the programmer to ensure concurrent writers are correct, as it is shown in Listing 2 (there is explicit lock for "prev" and "cur" nodes). The programmer basically needs to avoid a case where writer 1 reads setA and writes setB and writer 2 reads setB and writes setA (a cycle dependency that is not linear). … <#m_-1065742480950296257_m5955298840366869984>

Thanks. I understand that the programmer should avoid the cycle dependency mentioned here.

the basic assumption is that writers execute serially

Maybe serializable not serially? Otherwise a global write lock is sufficient, and the object level lock in RLU_TRY_LOCK is not necessary.

So concurrent writers are expected, but they should not interfere each other logically. The race condition in the issue can be caused by totally unrelated concurrent writers. This line
ctx.write-clock ← global-clock + 1
can be interpreted as

var temp ← global-clock // load the global variable into a register // then the current process is interrupted, and the global-clock is // updated by other writers to the value temp + 100. These writers // can be unrelated to this writer logically. // meanwhile, some readers started, but they can not read this // write set, because its write-clock is still ∞ ctx.write-clock ← temp + 1 // the current process gets CPU again, and set its write-clock to // temp + 1. The readers just started can suddenly read this write // set. So the readers can only see part of the write set in one // transaction.

In my opinion, the fix is to add a brief lock to make the two lines atomic.
lock(global-writer-lock)
ctx.write-clock ← global-clock + 1
global-clock ← global-clock + 1
unlock(global-writer-lock)
Hopefully my question is clear now. Thanks a lot for your time.

— Reply to this email directly, view it on GitHub https://github.com/rlu-sync/rlu/issues/3#issuecomment-1769253453, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACDGMKBUB6R5777K3EXTGY3YAA2JJAVCNFSM6AAAAAA6EG56VWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTONRZGI2TGNBVGM . You are receiving this because you commented.Message ID: @.***>

MingyanGuo commented 11 months ago

I see what you mean. This is a good catch. The ctx.write-clock and global-clock should update atomically to avoid other writers moving readers forward in time, while this one writer is stalled. Alex …

Thanks for the confirmation and appreciate all your replies.

rlu-sync / rlu

Do the current `write-clock` and `global-clock` updates have race conditions? #3