microsoft / charts

A Helm Chart Repository for Microsoft Projects
MIT License

Spark can't connect to Google Cloud Storage #17

Open WaterKnight1998 opened 4 years ago

WaterKnight1998 commented 4 years ago

I have created a notebook using Zeppelin. Inside it I am trying to access a file in Google Cloud Storage.

I am getting this error:

Py4JJavaError: An error occurred while calling o72.csv.
: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem not found

WaterKnight1998 commented 4 years ago

@dbanda I saw that this chart came from your personal repo. I would like to be able to use the Google Cloud Storage connector and install Prophet on the Spark executors. How can I do it? I tried using my own Spark image, which contains the connector jar.

However, after adding the spark.jars property in the Zeppelin interpreter settings, the jar doesn't get loaded!

dbanda commented 4 years ago

Which Spark interpreter are you using in Zeppelin? Judging by your error message, I'm assuming you are using %spark.pyspark. You have to add your dependencies as the first line in your notebook file. You will also have to make sure you update the path: there is a bug in Zeppelin where the paths to Python aren't properly updated. Could you share your notebook with me so that I can investigate?
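
For illustration, here is a rough sketch of what "dependencies as the first line" could look like. It assumes a Zeppelin version that supports the `%spark.conf` paragraph, which has to run before the Spark interpreter starts. The jar path is hypothetical and must match wherever the connector actually lives in your image; the helper names are made up for this example. The small Python script only builds the text of that first paragraph, so the properties can be checked before pasting them into the notebook.

```python
# Hypothetical location of the GCS connector jar inside the custom Spark image.
# Adjust this to the real path in your image.
GCS_CONNECTOR_JAR = "/opt/spark/jars/gcs-connector.jar"


def gcs_spark_properties(jar_path):
    """Return Spark properties that put the GCS connector on the classpath
    and map the gs:// scheme to the connector's file system classes."""
    return {
        # Ship the connector jar to the driver and the executors.
        "spark.jars": jar_path,
        # Class named in the ClassNotFoundException from the issue.
        "spark.hadoop.fs.gs.impl":
            "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem",
        "spark.hadoop.fs.AbstractFileSystem.gs.impl":
            "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS",
    }


def render_conf_paragraph(properties):
    """Render the properties as the text of a Zeppelin %spark.conf paragraph,
    one 'key value' pair per line."""
    lines = ["%spark.conf"]
    lines.extend(f"{key} {value}" for key, value in properties.items())
    return "\n".join(lines)


if __name__ == "__main__":
    # Paste the printed text into the first paragraph of the notebook.
    print(render_conf_paragraph(gcs_spark_properties(GCS_CONNECTOR_JAR)))
```

If the interpreter is already running, the paragraph is likely to have no effect until the interpreter is restarted. Authentication settings for the bucket are not covered here.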