utsaslab / RECIPE

RECIPE : high-performance, concurrent indexes for persistent memory (SOSP 2019)
Apache License 2.0

Crash consistency issue after acquiring bucket locks #19

Closed iangneal closed 1 year ago

iangneal commented 3 years ago

Bug

Exposed by crashing after acquiring a lock from clht_put.

https://github.com/utsaslab/RECIPE/blob/fc508ddfae1ca0d77cf3d3f1b73849e65c223f26/P-CLHT/include/clht_lb_res.h#L306-L312

Steps to reproduce

gdb --args ./example 20 1
> break clht_lb_res.h:311
> run
> next
> p *lock
# should print "$1 = 1 '\001'"
> quit
# Then, re-run
./example 20 1

The second execution hangs indefinitely, waiting to acquire the lock, because the lock was left in the acquired state when the first run was interrupted.
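To illustrate why the restart hangs, here is a minimal sketch in plain C, not taken from P-CLHT. It assumes only that the lock byte is stored next to the bucket data and therefore survives a crash. The names pm_bucket_t, bounded_lock, and second_run_is_stuck are hypothetical, and the spin is bounded so that the demo terminates instead of waiting forever.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a bucket in persistent memory: the lock byte
   sits next to the data, so it survives a crash along with it. */
typedef struct {
    uint8_t  lock;   /* 0 = free, 1 = held */
    uint64_t key;
    uint64_t val;
} pm_bucket_t;

/* Spin on the lock, but give up after max_spins attempts so the demo
   terminates. Returns 1 if the lock was acquired, 0 otherwise.
   Uses the GCC/Clang __atomic_exchange_n builtin. */
static int bounded_lock(pm_bucket_t *b, unsigned max_spins) {
    for (unsigned i = 0; i < max_spins; i++) {
        if (__atomic_exchange_n(&b->lock, 1, __ATOMIC_ACQUIRE) == 0)
            return 1;
    }
    return 0;
}

/* First run: lock the bucket, then "crash" before unlocking.
   Second run: see the same bytes again and try to lock.
   Returns 1 if the second run could not acquire the lock. */
static int second_run_is_stuck(void) {
    pm_bucket_t pool = {0};          /* plays the role of the PM pool */

    /* First run: acquire, then crash (no unlock, no cleanup). */
    int got = bounded_lock(&pool, 1);
    assert(got == 1);
    (void)got;

    /* Second run: the pool contents are unchanged across the restart. */
    pm_bucket_t *reopened = &pool;
    return bounded_lock(reopened, 1000) == 0;
}
```

In this sketch nothing ever clears the lock byte between the two runs, so any acquire loop without a bound would spin forever, which matches the behavior seen in the reproduction steps.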

Comments

I see your comments here about locking assumptions:

https://github.com/utsaslab/RECIPE/blob/fc508ddfae1ca0d77cf3d3f1b73849e65c223f26/P-CLHT/include/clht_lb_res.h#L162-L164

Does this mean this is a known issue, or does clht_lock_initialization just need to be added to clht_create? I ask because it seems that clht_lock_initialization is called in other places, just not in the recovery procedure.

SeKwonLee commented 3 years ago

Hi @Dahca,

Thanks for the report. We provide the PMDK version as a reference implementation that uses the pmemobj allocator, but it is not fully tested and does not yet implement some of the mechanisms we assumed in our paper, such as lock initialization and garbage collection. As you saw in my comments, this is a known implementation issue caused by the absence of one of those assumed post-crash mechanisms (lock initialization). clht_lock_initialization is provided as a reference for initializing locks, for anyone who wants to implement the post-crash mechanisms. I agree that these implementations are necessary for it to work properly in actual use, but I have not yet found time to work on them.
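For readers unfamiliar with the idea, a post-crash lock initialization pass could look roughly like the sketch below. This is not the repository's clht_lock_initialization. The types bucket_t and table_t and the functions reset_all_locks and demo_recovery are hypothetical, and the sketch assumes that recovery runs single-threaded before any worker touches the table and that every bucket, including chained overflow buckets, carries its own lock byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOCK_FREE 0
#define LOCK_HELD 1

/* Hypothetical bucket layout: a lock byte plus an overflow chain. */
typedef struct bucket {
    uint8_t lock;
    struct bucket *next;
} bucket_t;

typedef struct {
    size_t    num_buckets;
    bucket_t *table;
} table_t;

/* Recovery-time pass: walk every bucket and its overflow chain and force
   the lock back to LOCK_FREE. Assumed to run before any other thread
   starts, so plain stores are enough here. A real persistent-memory
   version would also need to flush the updated lock bytes, which is
   omitted in this sketch. Returns how many stale locks were cleared. */
static size_t reset_all_locks(table_t *t) {
    size_t cleared = 0;
    for (size_t i = 0; i < t->num_buckets; i++) {
        for (bucket_t *b = &t->table[i]; b != NULL; b = b->next) {
            if (b->lock != LOCK_FREE) {
                b->lock = LOCK_FREE;
                cleared++;
            }
        }
    }
    return cleared;
}

/* Build a tiny table with two stale locks and run the recovery pass. */
static size_t demo_recovery(void) {
    bucket_t overflow = { LOCK_HELD, NULL };
    bucket_t buckets[3] = {
        { LOCK_FREE, NULL },
        { LOCK_HELD, &overflow },
        { LOCK_FREE, NULL },
    };
    table_t t = { 3, buckets };

    size_t cleared = reset_all_locks(&t);
    assert(buckets[1].lock == LOCK_FREE);
    assert(overflow.lock == LOCK_FREE);
    return cleared;
}
```

Running such a pass when the table is reopened would release locks left behind by an interrupted run, so a later put is not blocked by a holder that no longer exists.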

iangneal commented 3 years ago

Hey @SeKwonLee,

Thanks for the quick responses. I can also attempt to add a solution for this in the near future, but as I said in #18, I'll be slightly delayed by an upcoming deadline.

SeKwonLee commented 1 year ago

We are closing this issue since it is a known issue covered by one of the assumptions presented in our original paper.