hpcugent / hanythingondemand

hanythingondemand provides a set of scripts to easily set up an ad-hoc Hadoop cluster through PBS jobs
https://hod.readthedocs.org
GNU General Public License v2.0

PySpark problems w/ mllib running out of memory / disk space #137

Closed ehiggs closed 8 years ago

ehiggs commented 8 years ago

When using the PySpark notebook with MLlib's ALS train function, we end up running out of memory. This may be down to tuning, but from looking around the web it appears to be a known issue:

https://github.com/databricks/spark-perf/issues/92
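
For anyone who wants to try the tuning route first, here is a minimal sketch of the kind of settings one might adjust before calling ALS.train. The helper names (als_tuning_conf, train_als), the memory size, and the scratch path are all hypothetical placeholders, not values tested against this problem. spark.executor.memory and spark.local.dir are standard Spark properties, though some cluster managers override spark.local.dir with their own environment settings.

```python
def als_tuning_conf(executor_memory="4g", local_dir="/tmp/spark-scratch"):
    """Return Spark properties that are commonly adjusted when a job
    runs out of memory or scratch disk space. The defaults here are
    placeholders, not recommendations."""
    return {
        # Heap size for each executor JVM.
        "spark.executor.memory": executor_memory,
        # Directory used for shuffle files and spilled data.
        "spark.local.dir": local_dir,
    }


def train_als(rating_tuples, props, rank=10, iterations=10):
    """Train an MLlib ALS model on (user, product, rating) tuples.

    pyspark is imported lazily so that the configuration helper above
    can be used and inspected without a Spark installation."""
    from pyspark import SparkConf, SparkContext
    from pyspark.mllib.recommendation import ALS, Rating

    conf = SparkConf().setAppName("als-tuning-sketch")
    conf.setAll(list(props.items()))
    sc = SparkContext(conf=conf)
    try:
        ratings = sc.parallelize(rating_tuples).map(lambda t: Rating(*t))
        model = ALS.train(ratings, rank, iterations)
        # Predict the rating of the first (user, product) pair as a smoke test.
        user, product, _ = rating_tuples[0]
        return model.predict(user, product)
    finally:
        sc.stop()
```

In a notebook that already has a SparkContext, these properties would need to be set before that context is created; whether they are enough to get past the ALS problem is an open question.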

ehiggs commented 8 years ago

Spark 1.6 is now available, which may help here. #146 has made it available.

ehiggs commented 8 years ago

This is an upstream issue, not an issue with hod.