Try increasing the heap size and see whether you still get the OOM error.
Original comment by khfaraaz82
on 20 Nov 2014 at 1:08
Did you try increasing the heap size and attempting to recover again? What is
the heap size when this happens?
Original comment by ima...@uci.edu
on 20 Nov 2014 at 1:08
Restarting the cluster doesn't help, because one of the NCs still tries to recover
and fails. The user gets the following message:
Asterix Cluster is in UNUSABLE state.
One or more Node Controllers have left or haven't joined yet.
[AsterixException]
Original comment by ker...@gmail.com
on 20 Nov 2014 at 1:09
That is somewhat expected, as there is not much one can do at this point if an
NC holding a partition is unavailable.
P.S. What is the full title of this issue? "Java OOM during recovery" or
similar?
Original comment by ima...@uci.edu
on 20 Nov 2014 at 1:11
I think this should be fixed. We shouldn't hit an OOM in any situation,
e.g., under any heap size setting. The memory component and buffer cache sizes
should be adjusted automatically to fit within the given heap limit.
Original comment by buyingyi@gmail.com
on 20 Nov 2014 at 1:14
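One way to read the suggestion above is to size the buffer cache and memory components as fractions of whatever heap the JVM actually has, rather than as absolute values that can exceed it. The sketch below is hypothetical: the class `HeapBudget`, the 25% fractions, and the `clamp` helper are illustrative assumptions, not AsterixDB settings or APIs. It only relies on `Runtime.getRuntime().maxMemory()`, which reports the JVM's maximum heap (normally derived from `-Xmx`).

```java
// Hypothetical sketch: derive buffer-cache and memory-component budgets from the
// JVM's max heap so that no configured size can exceed the heap limit.
public final class HeapBudget {
    // Assumed fractions for illustration only; not AsterixDB defaults.
    static final double BUFFER_CACHE_FRACTION = 0.25;
    static final double MEMORY_COMPONENT_FRACTION = 0.25;

    final long bufferCacheBytes;
    final long memoryComponentBytes;

    HeapBudget(long maxHeapBytes) {
        this.bufferCacheBytes = (long) (maxHeapBytes * BUFFER_CACHE_FRACTION);
        this.memoryComponentBytes = (long) (maxHeapBytes * MEMORY_COMPONENT_FRACTION);
    }

    /** Budget sized from the running JVM's maximum heap (normally the -Xmx value). */
    static HeapBudget fromRuntime() {
        return new HeapBudget(Runtime.getRuntime().maxMemory());
    }

    /** Clamp a user-requested size so it never exceeds its share of the heap. */
    static long clamp(long requestedBytes, long budgetBytes) {
        return Math.min(requestedBytes, budgetBytes);
    }

    public static void main(String[] args) {
        HeapBudget b = fromRuntime();
        System.out.println("buffer cache budget: " + b.bufferCacheBytes + " bytes");
        System.out.println("memory component budget: " + b.memoryComponentBytes + " bytes");
    }
}
```

With a 6144 MB heap (the NC setting in the next comment), each 25% share works out to 1536 MB, and a request for 4 GB would be clamped down to that share instead of risking an OOM.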
The heap is 6 GB for the NC and 2 GB for the CC; these values seem fine:
<property>
<name>nc.java.opts</name>
<value>-Xmx6144m</value>
</property>
<property>
<name>cc.java.opts</name>
<value>-Xmx2048m</value>
</property>
Original comment by ker...@gmail.com
on 20 Nov 2014 at 1:19
Original comment by ker...@gmail.com
on 20 Nov 2014 at 1:20
This should be fixed.
In the current implementation, this situation may occur when there are many
entity-level commits in a job.
A non-sharp (or soft) checkpoint may reduce the chance of an OOM during recovery
(periodic checkpointing is currently not enabled), but the recovery manager should
handle this situation by being able to spill to disk when necessary.
Original comment by kiss...@gmail.com
on 20 Nov 2014 at 1:28
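As a rough illustration of the "spill on disk" idea, the sketch below keeps committed-entity keys in memory up to a fixed limit and, once the limit is reached, writes them out as a sorted run of 8-byte longs; lookups check memory first and then binary-search each run. This is a hypothetical sketch, not AsterixDB's recovery manager: the class `SpillableCommitSet`, the `memoryLimit` parameter, and the assumption that an entity-level commit can be encoded as a single 64-bit key (e.g., a combination of job ID and entity hash) are all illustrative.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical: a set of 64-bit entity-commit keys that spills sorted runs to temp files. */
public final class SpillableCommitSet implements AutoCloseable {
    private final int memoryLimit;               // max keys held in memory before spilling
    private final Set<Long> inMemory = new HashSet<>();
    private final List<File> runs = new ArrayList<>();

    public SpillableCommitSet(int memoryLimit) {
        if (memoryLimit <= 0) throw new IllegalArgumentException("memoryLimit must be > 0");
        this.memoryLimit = memoryLimit;
    }

    public void add(long key) throws IOException {
        inMemory.add(key);
        if (inMemory.size() >= memoryLimit) {
            spill();
        }
    }

    /** Writes the in-memory keys to disk as one sorted run of big-endian longs. */
    private void spill() throws IOException {
        long[] sorted = new long[inMemory.size()];
        int i = 0;
        for (long k : inMemory) sorted[i++] = k;
        Arrays.sort(sorted);
        File run = File.createTempFile("commit-run-", ".bin");
        run.deleteOnExit();
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(run)))) {
            for (long k : sorted) out.writeLong(k);
        }
        runs.add(run);
        inMemory.clear();
    }

    public boolean contains(long key) throws IOException {
        if (inMemory.contains(key)) return true;
        for (File run : runs) {
            if (searchRun(run, key)) return true;
        }
        return false;
    }

    /** Binary search over a sorted run of fixed-width 8-byte records. */
    private static boolean searchRun(File run, long key) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(run, "r")) {
            long lo = 0, hi = raf.length() / Long.BYTES - 1;
            while (lo <= hi) {
                long mid = (lo + hi) >>> 1;
                raf.seek(mid * Long.BYTES);
                long v = raf.readLong();
                if (v == key) return true;
                if (v < key) lo = mid + 1; else hi = mid - 1;
            }
            return false;
        }
    }

    public int spilledRuns() { return runs.size(); }

    @Override
    public void close() {
        for (File run : runs) run.delete();
        runs.clear();
        inMemory.clear();
    }
}
```

Memory use is bounded by `memoryLimit` regardless of how many entity-level commits a job produced; the trade-off is extra disk I/O per lookup, which a real implementation might reduce by merging runs or processing the log in key order.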
Original issue reported on code.google.com by
ker...@gmail.com
on 20 Nov 2014 at 1:03