Open GoogleCodeExporter opened 8 years ago
It looks very similar to issue 878. It seems LSM state machine still has some
problems.
Original comment by buyingyi@gmail.com
on 16 May 2015 at 6:57
Original comment by buyingyi@gmail.com
on 16 May 2015 at 6:58
I investigated this issue and found why this situation occurs.
There are two problems behind this situation.
1. When an in-memory component is about to be scheduled to be flushed,
the component's writer counter can be greater than 0 based on the current code
base/design.
So, this should be considered as a valid situation (rather than an exception)
and
dealt with appropriately, i.e., the component must not be scheduled to be
flushed.
2. The number of buffer pages are not enough if the frame size is set to 4096
and the
buffer cache page size is set to 327680 in default
asterix-build-configuration.xml.
If the configuration file is set in this way, there are at most 11 pages in
buffer cache,
so buffer cache can not find victim pages since all pages are used. Thus, the
test
is hung.
So, the buffer cache size (not the buffer cache page size) should be increased
appropriately to avoid this situation.
I have a fix for problem 1 and I will send a code review after I add comments
to the fixed
code such as why the situation can occur with a detail example scenario.
Lastly, this issue is different from the situation described in issue 878 even
though both issue came from LSMHarness layer.
Original comment by kiss...@gmail.com
on 18 May 2015 at 5:02
There are two partitions in a NC: one is called p0 and another is called p1.
The number of in-memory components in each partition is two (just like a double
buffering).
Another important thing is that when there are insert/delete/search operations,
for each
partition in the NC has a corresponding operation job pipe line and each of job
pipe line
is executed by a separated thread. With aforementioned setting, there will be 2
job pipe
lines when there is a delete AQL statement, for example, and 2 threads (one for
each
pipe line).
For each dataset in a NC, there is only one PrimaryIndexOperationTracker
instance
regardless of the number of partitions in the NC.
I believe this design is affected by the design decision for the memory
resource
management, where all indexes of a dataset in a NC has a common memory budget
and when an
index in the dataset recognizes that memory budget is exceeded, current mutable
in-memory
component of all indexes in the dataset should be flushed and the next
in-memory component
should be ready to get the incoming updates.
LSMHarness is a layer that orchestrates all incoming operations in order to
conform
the LSM index framework protocol which correctly changes the state of LSM index
components
where the state indicates that whether the component is readable or writable
and/or the
component is being flushed or merged, etc. LSMHarness is a gateway of each
component
through which every operations should enter and exit before and after each
operation,
respectively. Enter and exit each component methods are operations protected by
PrimaryIndexOperationTracker-synchronized block.
With this setup and the delete example, thread T1 and thread T2 of delete job
pipelines
can see the memory budget of the dataset is exceeded independently. Once this
happens, T1
and T2 may generate FLUSH log independently. When FLUSH log is generated, both
current mutable(or active) components' state is changed to READABLE_UNWRITABLE
and then
FLUSH log is written to the log tail.
When a FLUSH log is written to disk, scheduleFlush() is triggered.
The scheduleFlush() changes the component's state from READABLE_UNWRITABLE to
READABLE_UNWRITABLE_FLUSHING if the component is modified. In other words, if
the component wasn't modified, the component is not scheduled to be flushed.
In addition, the unmodified component is changed back from READABLE_UNWRITABLE
to
READABLE_WRITABLE.
Now, let's see the following scenario which can introduce a situation where an
in-memory
component can have the writer count value greater than 0 when scheduleFlush()
is triggered.
So, based on the current code base/design, writer count of an in-memory
component can be
greater than 0 during scheduleFlush(). Thus, this situation properly handled
instead of
throwing an exception. That is, the component whose writer counter value is
greater than
0 must not be scheduled to be flushed in scheduleFlush() method.
------------
A scenario
: time flows from top to bottom
------------
TS1: (timestamp 1)
p0 mc state - RW< | IA
(p0's in-memory components state:
first in-memory component mc0 readable_writable and second component mc1 is
inactive,
"<" represents a current mutable/active component)
p1 mc state - RW< | IA
(p1's in-memory components state:
first in-memory component mc0 readable_writable and second component mc1 is
inactive)
TS2:
thread T1 modifies p0-mc0 and see memory budget is exceeded and changes mc
state.
p0 mc state - RU<* | IA (RU: readable_unwritable and "*" represents that it's
modified)
TS3:
T1 calls flushIfRequested() -> which changes p1 mc state
p1 mc state - RU< | IA
TS4:
T1 forms FLUSH log and put it to log tail
TS5:
FLUSH log is flushed to disk and triggers scheduleFlush() by LogFlusher thread.
TS6:
scheduleFlush() changes p0 and p1 mc state
p0 mc state - RUF* | IA< (RUF: readable_unwritable_flushing)
p1 mc state - RW< | IA (scheduleFlush() put the state back to readable_writable
since it is not modified)
TS7:
thread T2 modifies p1-mc0 and see memory budget is exceeded and changes mc
state.
p1 mc state - RU<* | IA
TS8:
flushIfRequested() by T2
TS9:
T2 forms FLUSH log and put it to log tail (not flushed yet)
TS10:
T1 tries to modify p0-mc1 which changes p0 mc state
p0 mc state - RUF* | RW<
TS11
T1 modifies p0-mc1 and see memory budget is exceeded and changes mc state.
p0 mc state - RUF* | RU<* <- mc1's writer count can be 1
TS12:
the FLUSH log (created in TS9) is flushed to disk and triggers scheduleFlush()
by LogFlusher thread.
TS13:
When LogFlusher thread's scheduleFlush() calls
AbstractMemoryLSMComponent.threadEnter(),
p0 mc state is RU<* and it's writer count is greater than 0.
Therefore, the exception(java.lang.IllegalStateException: Trying to flush when
writerCount != 0)
is thrown!!!.
Original comment by kiss...@gmail.com
on 18 May 2015 at 7:05
I also attached a jpg file which shows the same scenario.
Original comment by kiss...@gmail.com
on 18 May 2015 at 7:07
Attachments:
Original issue reported on code.google.com by
jianfeng...@gmail.com
on 16 May 2015 at 6:50