nathanielherman / sto

Software Transactional Objects
20 stars 12 forks source link

RBTree in early stage #4

Closed ghost closed 8 years ago

ghost commented 8 years ago
  1. The concurrent writer check here (commit 116ea7bcc388f92e2f394262128d86e4e104c51d, RBTree.hh:144) doesn't seem sufficient. Checking Tids stored in versioned_value's themselves probably a better idea?
  2. We need to lock around here (commit 116ea7bcc388f92e2f394262128d86e4e104c51d, RBTree.hh:213); RBTreeInternal::find_any() is not thread-safe. If we are returning a pointer like this then our erase method should never actually "erase" any node -- not even after the transaction commits. Let's do RCU garbage collection instead.
tslilyai commented 8 years ago
  1. Why isn't it sufficient? (This is the check that is done in pqueue as well). If we were the one to insert the item, then we should see the insert flag set on the item (this item only exists in our transaction, no other thread can have set the flag).
  2. RCU garbage collection means that we copy the entire data structure and actually do the delete when the transaction commits on the copy. And then... we remove the old data structure? I'm not sure I completely understand it. We need to lock because someone could be altering the internals (i.e. doing a rotate or something) of the tree while we do our find, correct?
ghost commented 8 years ago

Oh you are right! Other threads will abort their inserts to the same key if they see an insert flag!

We can do RCU just like a scheduled cleanup. I think there are something like that in STO already.

And I just found a rather serious problem with our "STL-style" interface... operator[] returns a reference to the user, which is a disaster for our transaction system: how do we learn the written value...? We don't even know whether operator[] is used on the right hand side or on the left hand side...

tslilyai commented 8 years ago

Hmm let's talk RCU on Sunday? Do you know another data structure with rcu-locking (masstree?) that I can take a look at?

STL = standard template library? What does "STL-style" mean? And yeah, I do see what you mean -- if I'm understanding correctly, the value could potentially change at any point now that the user has a reference (and we'd have no way of tracking these changes....)

jinala commented 8 years ago

For the second problem, you can solve it by wrapping the reference in a proxy and overwrite = and cast operators to do transactional updates and reads. The struct T_wrapper in Vector.hh is one place where I did this. You can do something similar.

ghost commented 8 years ago

Yeah, we can definitely talk about RCU garbage collection on Sunday. It's just like an epoch-based garbage collection; Masstree uses this (although the paper does't talk much about it other than that sentence "we use an epoch-based garbage collection scheme"...). Basically what it does is that a global counter (epoch) will get increment by a clean-up thread that runs every few milliseconds to look through all threads to find blocks to free. There is an epoch_advancer function (which is basically that clean-up thread) in class Transaction in Transaction.hh and you can take a look at how it works. It's quite simple actually, it just looks through all threads.

Yeah STL is Standard Template Library -- the STL map's operator[] returns a reference...

@jinala Yeah I noticed that, but that's actually changing the spec right? It will break if the program doesn't use it directly in conjunction with operator=...

jinala commented 8 years ago

The program will segfault if the user doesn't directly use operator=. I think that is ok because it means that the user is not allowed to do anything else to that reference.

Also, there is a rcu_free() method in Transaction.hh that you can use directly to schedule rcu cleanup.

ghost commented 8 years ago

Yup, just as easy as calling free()!

tslilyai commented 8 years ago

@huangyihe, would the best place to define operator= (and operator++ and operator--, etc) be on rbpair? and then in operator[] we return an rbpair reference to the user?

There might be an issue with retrieving the value from the rbpair reference if operator[] is used on the RHS though...? How would we know the user wants the actual value of the key-value rbpair, when he/she is using the result of operator[] on the RHS?

(Also, thanks for the help @jinala!)

ghost commented 8 years ago

You are right. What we could do (which is probably what Jeevana did) is to define two different operator[]'s:

const T& RBTree::operator[](const K& key) const;
assignWrapper& RBTree::operator[](const K& key);

And then we do all the trick in this assignWrapper class. Actually yeah probably we can use rbpair for this purpose -- but maybe it can be a little confusing?

tslilyai commented 8 years ago

I think it'll work -- let me try using rbpair in the wrapper. So there will be two different types of operator[], one that returns a reference to a value that you can modify, and one that returns a reference to a value that you can use to do arbitrary computation (i.e. a reference to an int that can be added, subtracted, etc.)?

But how do we know which operator[] the user is using? Will they specifically have to assign operator[] to a const var?

ghost commented 8 years ago

Yes! And if we specify one of them (the look-up only one) as const, then the compiler will automatically switch the other one when a user tries to do something like map[5] = 7;

tslilyai commented 8 years ago

Great--I implemented an initial version of the two operators using rbpair as the "wrapper", and defining operator= on rbpair. There are a few errors about overloading operator[] though -- let me know if you figure out what's going on (I won't have time until later to debug)!

ghost commented 8 years ago

Ahh yes sure we'll need the conversion operator. I'll fix it!

ghost commented 8 years ago

All right, fixed in 0cad3a7c9bd9a53325722ed26a8ec558064c8bca!

I figured that it would be too clumsy to have two operator[]s -- the compiler doesn't seem to cooperate when I do something like map[5] == 3, it keeps telling me rbpair and int is not compatible, leaving me no chance to tell it there is another operator[] that returns an int...

I also moved this interface proxy (called RBProxy right now) into a separate class, just because I don't seem to enjoy the idea of putting a pointer to the tree inside our rbpair, which is part of our tree node. We need to garbage collect these RBProxy objects as well, but that should just be as easy as calling rcu_free.

tslilyai commented 8 years ago

Great -- let's talk about rce/garbage collection tomorrow (2 in MD?)

I pushed a fix to the lock type conversion error we were getting... it was a silly thing

On Sat, Sep 26, 2015 at 4:45 PM, Yihe (William) Huang < notifications@github.com> wrote:

All right, fixed in 0cad3a7 https://github.com/nathanielherman/sto/commit/0cad3a7c9bd9a53325722ed26a8ec558064c8bca ! I figured that it would be too clumsy to have two operator[]s -- the compiler doesn't seem to cooperate when I do something map[5] == 3, it keeps telling me rbpair and int is not compatible, leaving me no chance to tell it there is another operator[] that returns an int... I also moved this interface proxy (called RBProxy right now) into a separate class, just because I don't seem to enjoy the idea of putting a pointer to the tree inside our rbpair, which is part of our tree node. We need to garbage collect RBProxy as well, but that should just be as easy as calling rcu_free.

— Reply to this email directly or view it on GitHub https://github.com/nathanielherman/sto/issues/4#issuecomment-143493914.

Lillian Tsai Harvard University | Class of 2017 A.B. Degree Candidate in Computer Science and Mathematics (650) 644-9845 | lilliantsai@college.harvard.edu

ghost commented 8 years ago

Yep, see you tomorrow!

ghost commented 8 years ago

Figured most stuff out; closed for now.