Open oliviernguyenquoc opened 7 years ago
It is possible to use this library from PySpark, but it is not super easy right now. You have to go through the py4j gateway to set up the classes correctly on the JVM side and wrap the returned DataFrame manually.
Could you give an example of how you are trying to use it? A good place to start would be to check that you can call the Java classes from Python and make sure they are loaded correctly:
```python
from pyspark.sql import SparkSession

session = SparkSession.builder.getOrCreate()
# Instantiate the Scala-side class through the py4j gateway.
bq = session._sc._jvm.com.spotify.spark.bigquery.BigQuerySQLContext(session._wrapped._jsqlContext)
# Configure the connector (fill in your own values).
bq.setBigQueryDatasetLocation(...)
bq.setBigQueryProjectId(...)
bq.setBigQueryGcsBucket(...)
bq.setGcpJsonKeyFile(...)
```
If that works, you should be able to query by calling into the Scala function and wrapping the returned JVM DataFrame:
```python
from pyspark.sql import DataFrame

df = DataFrame(bq.bigQuerySelect(sql), session._wrapped)
```
Hi everyone,
If I understand correctly, I can use this package with PySpark. Is that right?
Moreover, I am not able to import the package in my Python notebook (Datalab): I get a "No module found" error.
I have tried to set up an initialization action in Dataproc (https://cloud.google.com/dataproc/docs/concepts/init-actions), without success.
Any help?
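For reference, an initialization action along these lines could make the connector jar available to Spark on every cluster node. This is only a sketch: the Maven coordinates, version number, and jar location are assumptions, so check the project's README for the actual artifact before using it.

```shell
#!/bin/bash
# Hypothetical Dataproc initialization action: download the spark-bigquery
# connector jar so Spark can load it. The coordinates and version below are
# assumptions -- verify them against the project's published releases.
set -euo pipefail

JAR_VERSION="0.2.2"  # assumed version
JAR_URL="https://repo1.maven.org/maven2/com/spotify/spark-bigquery_2.11/${JAR_VERSION}/spark-bigquery_2.11-${JAR_VERSION}.jar"

# /usr/lib/spark/jars is on Spark's default classpath on Dataproc images.
mkdir -p /usr/lib/spark/jars
curl -fsSL "${JAR_URL}" -o "/usr/lib/spark/jars/spark-bigquery_2.11-${JAR_VERSION}.jar"
```

You would then upload the script to a GCS bucket (e.g. a placeholder like `gs://my-bucket/install-spark-bigquery.sh`) and pass it via `--initialization-actions` when creating the cluster. Note this only helps the JVM side; the py4j wrapping shown above is still needed from Python.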