ilovesoup / hyracks

Automatically exported from code.google.com/p/hyracks
Apache License 2.0

The LSM-BTree and the LSM-RTree don't flush the in-memory tree when the tree is closed #65

GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
The close method in the LSM-BTree and the LSM-RTree ignores the in-memory 
component. Thus, all records that exist only in memory are lost on close.
We should flush the in-memory component before closing the tree.

Original issue reported on code.google.com by salsuba...@gmail.com on 8 May 2012 at 12:11

GoogleCodeExporter commented 9 years ago
Zach, can you have a look at this one? Thanks!

Original comment by alexande...@gmail.com on 7 Jul 2012 at 5:52

GoogleCodeExporter commented 9 years ago
I'm not sure that we actually want this. In particular, I'm concerned about two 
things: 1) uncommitted data reaching disk at the Asterix level and 2) having 
implicit flushes and merges.

My suggestion would be that if you want to close an LSM index and want the data 
to be persisted, then call flush before closing.

Thoughts?

Original comment by zheilb...@gmail.com on 7 Jul 2012 at 9:45

GoogleCodeExporter commented 9 years ago
I think the following sequence of actions should work on any persistent index, 
regardless of whether we are using transactions (i.e. Asterix) or not:

index.open()
index.insert()
index.close()
(the index instance is garbage collected)
(create a new index instance)
index.open()

After the second open() you should be able to see your insert. Hope this makes sense :)

Original comment by alexande...@gmail.com on 7 Jul 2012 at 11:26
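
The open, insert, close, reopen sequence from the comment above can be sketched 
as follows. This is a minimal toy, not the Hyracks API: ToyLsmIndex, its 
one-key-per-line file format, and the choice to flush inside close() are all 
assumptions made for illustration. The thread does not show how r1742 actually 
fixed the issue.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.TreeSet;

// Hypothetical toy index (NOT Hyracks code): one in-memory component
// plus a single on-disk component stored as one key per line.
class ToyLsmIndex {
    private final Path diskComponent;
    private final TreeSet<String> memComponent = new TreeSet<>();
    private final TreeSet<String> diskKeys = new TreeSet<>();
    private boolean open = false;

    ToyLsmIndex(Path diskComponent) {
        this.diskComponent = diskComponent;
    }

    // Loads whatever an earlier instance persisted.
    void open() throws IOException {
        diskKeys.clear();
        if (Files.exists(diskComponent)) {
            diskKeys.addAll(Files.readAllLines(diskComponent, StandardCharsets.UTF_8));
        }
        open = true;
    }

    // Inserts go to the in-memory component only.
    void insert(String key) {
        requireOpen();
        memComponent.add(key);
    }

    boolean contains(String key) {
        requireOpen();
        return memComponent.contains(key) || diskKeys.contains(key);
    }

    // Appends the in-memory component to disk and empties it.
    void flush() throws IOException {
        requireOpen();
        if (memComponent.isEmpty()) {
            return;
        }
        Files.write(diskComponent, memComponent, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        diskKeys.addAll(memComponent);
        memComponent.clear();
    }

    // Assumed policy: close() flushes first, so nothing in memory is lost.
    void close() throws IOException {
        flush();
        open = false;
    }

    private void requireOpen() {
        if (!open) {
            throw new IllegalStateException("index is closed");
        }
    }

    // Runs the sequence from the thread using two separate instances.
    static boolean reopenSeesInsert() {
        try {
            Path file = Files.createTempFile("toy-lsm", ".idx");
            Files.delete(file); // start without any disk component

            ToyLsmIndex first = new ToyLsmIndex(file);
            first.open();
            first.insert("k1");
            first.close();

            ToyLsmIndex second = new ToyLsmIndex(file); // fresh instance
            second.open();
            boolean seen = second.contains("k1");
            second.close();
            Files.deleteIfExists(file);
            return seen;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("insert visible after reopen: " + reopenSeesInsert());
    }
}
```

In this sketch every operation except open() fails with IllegalStateException on 
a closed index, so calling flush() after close() is rejected. If close() did not 
flush, the second instance would not find the key, which is the data loss this 
issue reports.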

GoogleCodeExporter commented 9 years ago
This definitely makes sense. Note that it is potentially orthogonal to where 
the newest index component lives. I view this code sequence as something that 
a client of the index might do (open, insert, close, ..., reopen), so open 
does not necessarily require that a component of the index be 
non-memory-resident, nor does it necessarily mean that the component will 
become memory-resident. That should be a separate LSM lifecycle decision, I 
would think. But what Alex says has to work, which to me simply means that if 
close does NOT persist the newest component to disk, open needs to get it from 
memory. I think I agree with Zach that there should be a difference between:

  open, insert, close, ..., open

and:

  open, insert, close, flush, ..., open

The latter would not only ensure that open sees the changes from before (which 
must be true in both cases) but also that the component was written to disk (by 
flush).

Hope this makes sense. I would think that there should be lifecycle hooks on 
system startup and shutdown (esp. shutdown) where non-flushed LSM components 
would be flushed.

Original comment by dtab...@gmail.com on 8 Jul 2012 at 3:21
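
The shutdown hook idea from the comment above could look roughly like this. It 
is a sketch under assumptions: LsmShutdownFlusher and its methods are 
hypothetical names, indexes are modeled only as java.io.Flushable, and a JVM 
shutdown hook stands in for whatever lifecycle mechanism Hyracks really uses.

```java
import java.io.Flushable;
import java.io.IOException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper (NOT Hyracks code): tracks open indexes and
// flushes every one of them when the system shuts down.
class LsmShutdownFlusher {
    private final Set<Flushable> openIndexes = ConcurrentHashMap.newKeySet();

    // Call when an index is opened.
    void register(Flushable index) {
        openIndexes.add(index);
    }

    // Call when an index is closed by its owner.
    void unregister(Flushable index) {
        openIndexes.remove(index);
    }

    // Flushes every registered index; one failure does not stop the rest.
    // Returns the number of indexes that flushed successfully.
    int flushAll() {
        int flushed = 0;
        for (Flushable index : openIndexes) {
            try {
                index.flush();
                flushed++;
            } catch (IOException e) {
                System.err.println("flush failed: " + e.getMessage());
            }
        }
        return flushed;
    }

    // Registers flushAll() to run when the JVM begins an orderly shutdown.
    void installShutdownHook() {
        Runtime.getRuntime().addShutdownHook(
                new Thread(this::flushAll, "lsm-shutdown-flush"));
    }
}
```

A JVM shutdown hook does not run if the process is killed forcibly or crashes, 
so a hook like this complements an explicit flush before close rather than 
replacing it.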

GoogleCodeExporter commented 9 years ago
Somewhat agreed. In Zach's and my thinking, closing an index implies that you 
won't be using it anytime soon, and the index Java object should be garbage 
collected (including the memory for its in-memory component). So the case where 
you close and then reopen by getting the in-memory component from memory 
doesn't really exist, because once you close, that memory is free to be garbage 
collected.

Also, in my mind the 2nd sequence of 

open, insert, close, flush, ..., open

is invalid because you cannot flush a closed index. What does it mean to 
close() then?

Original comment by alexande...@gmail.com on 8 Jul 2012 at 9:22

GoogleCodeExporter commented 9 years ago
Fixed in hyracks_lsm_tree r1742

Original comment by zheilb...@gmail.com on 20 Jul 2012 at 1:45