Closed: clembou closed this pull request 7 years ago
Hi @clembou
Great feature! Just some minor issues:

```
$ flake8 -q
./test/test_spark_session_fixture.py
./test/test_spark_context_fixture.py
./pytest_spark/__init__.py
```
By using `SparkSession` we are limiting the usage to Spark 2.x only. Maybe we can check the version of Spark and yield `HiveContext` in case of <2.x. What do you think?

@malexer sorry, I forgot to run the reformatter on the code; it is fixed now!

Good point on Spark <2.x. I added a version check that will raise an exception on Spark 1.
I think `SQLContext` and `HiveContext`, unlike `SparkSession`, are easy (and quick) to create from the `spark_context` object, so I am not sure it is worth providing a fixture for those. If we did, I think it would be best done explicitly, e.g. by adding `sql_context` and `hive_context` fixtures, rather than overloading the `spark_session` one.
I agree with you, let's leave `spark_session` as it is now.
Thanks for your efforts!
Awesome! Thanks @malexer!
Hi @malexer
This PR adds a new fixture called `spark_session` that provides a `pyspark.sql.SparkSession` with Hive support enabled. This appears to be the recommended entry point when using the DataFrame API these days.
I also added a few tests to ensure these work while I was at it.
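A minimal sketch of what such a fixture might look like (assuming pytest and pyspark are on the path; the actual implementation lives in `pytest_spark/__init__.py` and may differ in scope, master URL, and configuration):

```python
import pytest

@pytest.fixture(scope="session")
def spark_session():
    """Session-scoped SparkSession with Hive support (a sketch)."""
    # Import deferred so test collection works without pyspark installed.
    from pyspark.sql import SparkSession
    spark = (
        SparkSession.builder
        .master("local[2]")        # assumed local master for tests
        .appName("pytest-spark")   # hypothetical app name
        .enableHiveSupport()
        .getOrCreate()
    )
    yield spark
    spark.stop()
```

Yielding (rather than returning) lets the fixture stop the session once the test session finishes.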