snowplow-incubator / snowplow-lake-loader


Adjust default Spark memory configuration #51

Closed istreeter closed 6 months ago

istreeter commented 6 months ago

This change adjusts two standard Spark configuration properties:

spark.memory.fraction: Controls how much memory is allotted to Spark versus the non-Spark parts of the loader. The Spark default is 0.6. I previously decreased this to 0.2 in the Lake Loader to avoid OOMs, but with newer versions of the Lake Loader I find 0.3 also avoids OOMs, and it works slightly better than 0.2 for committing events as the table size increases.

spark.memory.storageFraction: The fraction of Spark's memory that is immune to eviction under memory pressure. The Spark default is 0.5, but we can set it to 0 because this loader can tolerate evicted blocks, and doing so frees a bit more memory for the shuffle that happens when committing events.
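For illustration, here is a minimal Scala sketch of how these two properties could be applied when constructing a SparkSession. This is not the loader's actual initialization code; the app name and master setting are hypothetical, but the two property keys and values are the ones discussed above.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch, assuming a local SparkSession (hypothetical app name and master).
val spark: SparkSession = SparkSession
  .builder()
  .appName("lake-loader-sketch")
  .master("local[*]")
  // Allot 30% of usable heap to Spark's unified memory region (execution + storage),
  // leaving the remainder for the non-Spark parts of the loader. Spark default: 0.6.
  .config("spark.memory.fraction", "0.3")
  // Make no storage memory immune to eviction, so the shuffle that runs when
  // committing events can borrow the full Spark memory region. Spark default: 0.5.
  .config("spark.memory.storageFraction", "0")
  .getOrCreate()
```

As a rough sense of scale: Spark reserves about 300 MB of heap, so on a 2 GB heap the unified Spark region under spark.memory.fraction=0.3 would be roughly 0.3 × (2048 − 300) MB ≈ 520 MB, with the rest left for the loader itself.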