tribbloid / ISpark

An Apache Spark-shell backend for IPython
Apache License 2.0

Printing variable value in notebook when I set value to a variable #17

Open tdna opened 9 years ago

tdna commented 9 years ago

Printing the value of a variable in the notebook is very annoying when I assign large data to it. Freezes actually happen often.

For example: val test = [LARGE DATA]

tribbloid commented 9 years ago

Thanks a lot for the report; I've been aware of this for a long time. The major cause is that the Spark interpreter only exposes an API to retrieve the last variable being set, regardless of whether it's on the last line of the code or not. It's possible to use deeper non-API functions to retrieve it, but the gain doesn't justify the risk. Could you append ;"" to the statement to override it with an empty output?
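
A minimal sketch of that workaround as a notebook cell, assuming (as described above) that ISpark echoes only the last value the interpreter records. The array is a hypothetical stand-in for [LARGE DATA]; in a plain spark-shell both results would likely still be printed.

```scala
// Hypothetical stand-in for [LARGE DATA]: one million doubles.
// The trailing ; "" makes an empty string the last value set in the cell,
// so the notebook should display "" instead of the whole array.
val test = Array.fill(1000000)(0.0); ""
```

The variable test is still defined and usable in later cells; only the echoed output changes.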

tdna commented 9 years ago

Yes, I did that, but I don't know whether this workaround solved the issue or not, because when I trained an SVD and wanted to serialize the V matrix, ISpark threw an out-of-memory exception. The same job in a spark shell ran smoothly. Maybe you know why. Thanks for helping!

tribbloid commented 9 years ago

ISpark has much higher memory consumption than spark-shell for obvious reasons (message queue, visualization, etc.). Make sure you increase both --driver-memory and --executor-memory to accommodate your dataset.

I doubt that displaying too much output is the cause of the out-of-memory error: in that case the websocket would time out before that happens.
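
One way to check whether the driver memory setting actually reached the notebook kernel is to ask the JVM for its heap limits from a cell. This is only a sketch using standard java.lang.Runtime calls; the names rt, mb, maxHeapMb and usedHeapMb are illustrative, and the reported maximum is usually somewhat below the configured value.

```scala
// Inspect the heap of the JVM running this cell (the driver / notebook kernel).
val rt = Runtime.getRuntime
val mb = 1024L * 1024L

// Upper bound the JVM will try to use; should roughly track --driver-memory.
val maxHeapMb = rt.maxMemory / mb
// Heap currently in use: allocated heap minus the free part of it.
val usedHeapMb = (rt.totalMemory - rt.freeMemory) / mb

println(s"driver heap: used $usedHeapMb MB of max $maxHeapMb MB")

// Assumption: inside a Spark shell or notebook where `sc` is predefined,
// the configured executor memory could be read like this:
// println(sc.getConf.getOption("spark.executor.memory"))
```

This only reports on the driver process; executor memory is allocated on the workers and does not show up in these numbers.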

tdna commented 9 years ago

Hm... Strange, because I set driver-memory to 32G and the actual usage was about 23G; then I started to save an object file and the notebook died with OOM.

tribbloid commented 9 years ago

IMHO saving an object file is done on the workers/executors, so that's expected.