scality / spark

Apache License 2.0

Spark crashing nodes #12

Closed scality-jeff closed 2 years ago

scality-jeff commented 2 years ago

Spark crashed (or rather, cascade-crashed) nodes on servers in the AllState ARE ring when trying to run the key gather. Had to gather listkeys manually.

ghost commented 2 years ago

@scality-jeff I do not recall if you checked the logs to confirm whether this was a memory contention issue?

ghost commented 2 years ago

New sizing inside ansible determines the lowest available memory of any spark worker, deducts 2GB for the (current default) vm.min_free_kbytes headroom, divides the remainder by the number of executor instances, and then halves the result.
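The sizing logic above can be sketched as a small calculation. This is an illustrative Python sketch, not the actual ansible code; the function and parameter names (`executor_memory_gb`, `lowest_worker_mem_gb`, `executor_instances`) are hypothetical, and the 2 GB headroom is the current default mentioned above:

```python
def executor_memory_gb(lowest_worker_mem_gb: float, executor_instances: int) -> float:
    """Sketch of the per-executor memory sizing described above.

    Takes the lowest available memory across all spark workers, deducts
    2 GB of headroom (for the current default vm.min_free_kbytes),
    divides by the number of executor instances, then halves the result.
    """
    reserved_gb = 2  # assumption: headroom for vm.min_free_kbytes
    usable_gb = lowest_worker_mem_gb - reserved_gb
    return (usable_gb / executor_instances) / 2

# Example: a 64 GB worker running 4 executors
# (64 - 2) / 4 / 2 = 7.75 GB per executor
print(executor_memory_gb(64, 4))
```

The final halving is what keeps the executors conservatively sized so that colocated storage node processes retain memory even under spark load.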

The config-SAMPLE.yml file should provide sane example values for spark cluster configuration and prevent impact on storage nodes when they are colocated with spark.

WARNING: An sproxyd configuration in which sproxyd workers are able to exhaust a majority (greater than 50-60%) of the host's memory will still be able to impact the storage node processes. No amount of memory tuning within the spark configuration can prevent this. The cluster is tuned conservatively for executor instances, cores, and memory. If storage nodes are crashing, investigate and fix the configuration of the process that causes the memory pressure.