skale-me / skale

High performance distributed data processing engine
https://skale-me.github.io/skale
Apache License 2.0
399 stars 53 forks

Process Out Of Memory #39

philippe56 closed this issue 7 years ago

philippe56 commented 8 years ago

The attached file fails with 'process out of memory' after 1.5 GB has been allocated.

mvertes commented 8 years ago

It seems the attached file is missing.

tfauck commented 8 years ago

dd.zip

mvertes commented 8 years ago

In this program, the problem comes from the use of collect(), which is not scalable and is meant (like parallelize) for small amounts of data and for testing purposes. Here, dozens of megabytes of data go in and out at once through the RPC protocol instead of a data transfer protocol. An efficient data transfer protocol exists in skale, but only for input streams (see the linestream/objectstream sources) and shuffle operations. It is not yet available for output from skale to the external world. We're working on it.
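To illustrate the difference in memory behaviour, here is a minimal plain JavaScript sketch. It does not use skale's API: `generateRecords`, `collectAll` and `sumValues` are hypothetical names, and the generator only stands in for a dataset. A collect-style call materializes every record at once, whereas a fold keeps only a constant-size result.

```javascript
// Hypothetical stand-in for a dataset: yields records lazily, one at a time.
function* generateRecords(count) {
  for (let i = 0; i < count; i++) {
    yield { id: i, value: i % 10 };
  }
}

// Collect-style: every record is held in memory at the same time,
// so memory use grows linearly with the dataset size.
function collectAll(records) {
  return Array.from(records);
}

// Reduce-style: only the running total is kept, whatever the dataset size.
function sumValues(records) {
  let total = 0;
  for (const record of records) {
    total += record.value;
  }
  return total;
}

const collected = collectAll(generateRecords(1000));
const total = sumValues(generateRecords(1000));
console.log(`collected ${collected.length} records, sum of values = ${total}`);
```

When only a summary is needed, returning that summary rather than the full dataset avoids moving the whole result through a single process.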

mvertes commented 8 years ago

Now, by default the Node.js runtime limits its heap size to something around 1 GB, which explains why the program fails even when more RAM is available on the system. You can increase the memory limit using

$ node --max_old_space_size=8192 <program> <args> ...
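To check which limit is actually in effect, a small sketch like the following can help. It relies on the `v8` module that ships with Node.js; the helper name `heapLimitMB` is an assumption made for this example.

```javascript
const v8 = require('v8');

// heap_size_limit is reported in bytes; convert to megabytes for readability.
function heapLimitMB() {
  const stats = v8.getHeapStatistics();
  return Math.round(stats.heap_size_limit / (1024 * 1024));
}

console.log(`V8 heap size limit: ${heapLimitMB()} MB`);
```

Running this script with and without the --max_old_space_size flag should show whether the flag was taken into account. The reported value may differ somewhat from the number passed on the command line.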

philippe56 commented 8 years ago

Here is a screen copy:

$ skale run

<--- Last few GCs --->

83075 ms: Scavenge 1407.1 (1444.7) -> 1406.1 (1444.7) MB, 0.4 / 0 ms (+ 1.0 ms in 1 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
83084 ms: Mark-sweep 1406.1 (1444.7) -> 1402.7 (1441.9) MB, 8.1 / 0 ms (+ 1.0 ms in 1 steps since start of marking, biggest step 1.0 ms) [last resort gc].
83089 ms: Mark-sweep 1402.7 (1441.9) -> 1402.5 (1441.9) MB, 5.9 / 0 ms [last resort gc].

<--- JS stacktrace --->

==== JS stack trace =========================================

Security context: 0x1a8657eb4629
1: Join(aka Join) [native array.js:154] [pc=0x33266b116998](this=0x1a8657e041b9 ,o=0x189ffc719961 <JS Array[6]>,v=6,C=0x332cf9afd911 <String[1]: ,>,B=0x1a8657e97059 <JS Function ConvertToString (SharedFunctionInfo 0x1a8657e45419))
2: InnerArrayJoin(aka InnerArrayJoin) [native array.js:331] [pc=0x33266b11580a] (this=0x1a8657e041b9 ,C=0x332cf9afd911 <String[1]: ,>,...

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory

mvertes commented 7 years ago

Although this program works correctly with 1 worker, it is inherently not designed to scale as soon as work is dispatched to 2 or more workers, due to the iterative use of cartesian (24 stages of cartesian, each applied to the result of the previous one). This is too much of a corner case. It is an interesting pathological example, but I consider it outside the scope of skale-engine, so I am closing it.
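To give a rough sense of why iterated cartesian is pathological, here is a small sketch of the growth in result size. The sizes are purely hypothetical, since the actual datasets in the attached program are not shown here. It assumes each stage crosses the previous result with a dataset of fixed size `factor`; `sizeAfterStages` is a name invented for this example.

```javascript
// Number of elements after repeatedly applying a cartesian product to the
// previous result. Assumption: each stage crosses the previous result with
// a dataset of `factor` elements, so the size is multiplied by `factor`
// at every stage. BigInt avoids overflow for large stage counts.
function sizeAfterStages(initialSize, factor, stages) {
  let size = BigInt(initialSize);
  for (let i = 0; i < stages; i++) {
    size *= BigInt(factor);
  }
  return size;
}

// Even with only 2 elements on each side, 24 stages give 2 * 2^24 elements.
console.log(sizeAfterStages(2, 2, 24).toString());
```

Under these assumptions the result grows exponentially with the number of stages, so the data to be exchanged between workers quickly exceeds what any fixed heap limit can hold.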