malexer / pytest-spark

pytest plugin to run tests with pyspark support
MIT License
85 stars 30 forks

Avoid having enableHiveSupport() in the fixture by default, as it can create Java dependency problems #17

Closed mehd-io closed 4 years ago

mehd-io commented 4 years ago

Hi there, First of all, thank you for this plugin! Great initiative! Problem: I'm running into the same issue as here: https://github.com/malexer/pytest-spark/issues/14 and even with the correct jars, enableHiveSupport() creates other Java dependency problems with the base Docker image I use for Spark testing/development. As I'm heavily using Spark on AWS Glue, this option is not needed at all, and I think it makes sense to have a plain SparkSession in the fixture and let users add the Spark options they want in pytest.ini.

Suggested solution: Remove enableHiveSupport() from https://github.com/malexer/pytest-spark/blob/master/pytest_spark/fixtures.py#L29. For those who need it, enableHiveSupport() can be replicated by setting spark.sql.catalogImplementation=hive in pytest.ini, if I'm not mistaken.
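For illustration, the pytest.ini opt-in could look something like this (a sketch based on the plugin's spark_options mechanism; the exact key names here are assumptions, not confirmed plugin config):

```ini
[pytest]
spark_options =
    spark.sql.catalogImplementation: hive
```

Setting spark.sql.catalogImplementation to hive is the configuration-level equivalent of calling enableHiveSupport() on the SparkSession builder, so users who want the Hive catalog keep it while everyone else gets a plain SparkSession.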

I can do the PR if you want :)

malexer commented 4 years ago

Hi @mehd-io ,

Glad to hear that you like it. :) Initially this was my internal tool, which I decided to share at some point. That's why some of the implementation can be specific to my usage, sorry.

That's a great idea, making enableHiveSupport() configurable, but let's try to avoid changing the default behaviour - so that nobody gets broken tests after updating to the new version.

My proposal is:

  1. Move Hive support to config as you advised.
  2. Make the default configuration with Hive enabled.
  3. Mention it in the README.
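The steps above amount to merging a Hive-enabled default with whatever the user puts in pytest.ini. A minimal sketch of that merging logic (the function name and structure are hypothetical, not the plugin's actual code):

```python
def resolve_catalog_implementation(user_options):
    """Return the effective spark.sql.catalogImplementation value.

    user_options: dict of Spark options parsed from pytest.ini.
    The default is 'hive' to preserve the old enableHiveSupport()
    behaviour; any user-supplied value overrides it.
    """
    defaults = {"spark.sql.catalogImplementation": "hive"}
    # User options take precedence over the plugin defaults.
    merged = {**defaults, **user_options}
    return merged["spark.sql.catalogImplementation"]
```

The fixture would then pass each merged option to the builder via SparkSession.builder.config(key, value), so the hard-coded enableHiveSupport() call is no longer needed while the default stays backward compatible.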

Please let me know if that sounds good to you.

I can do the PR if you want :)

Great! Thanks in advance :)

malexer commented 4 years ago

Fixed in version 0.6.0, already on PyPI. Please verify.

mehd-io commented 4 years ago

Sorry, I just realized I never replied to this - yes, it works!

malexer commented 4 years ago

Great, thanks.