malexer / pytest-spark

pytest plugin to run the tests with support of pyspark
MIT License
84 stars 30 forks

Pipenv pytest not finding spark_home #13

Closed vfrank66 closed 5 years ago

vfrank66 commented 5 years ago

When using pipenv installed through pip, the hooks fire findspark.py before the configuration is read. When using pipenv installed through brew, the hooks fire in the right order. This might not be the right place for this issue, except that if I remove this package and set SPARK_HOME through pytest.ini under env, it does work.

pip install --user pipenv
pipenv install --dev --ignore-pipfile or pipenv install --dev  pyspark pytest pytest-spark
pipenv run py.test --rootdir . test --cov src/ --cov-fail-under 20 -vv test/ --junitxml=pytest-report.xml
or 
pipenv run py.test 

pytest.ini

[pytest]
spark_home=spark/
testpaths=test
log_format = %(asctime)s %(levelname)s %(message)s
log_date_format = %Y-%m-%d %H:%M:%S

Error

```
pipenv run py.test
INTERNALERROR> Traceback (most recent call last):
INTERNALERROR>   File "/Users/vfrank/.local/share/virtualenvs/lcs-glue-python-extractor-KOyFA_4z/lib/python2.7/site-packages/_pytest/main.py", line 201, in wrap_session
INTERNALERROR>     config._do_configure()
INTERNALERROR>   File "/Users/vfrank/.local/share/virtualenvs/lcs-glue-python-extractor-KOyFA_4z/lib/python2.7/site-packages/_pytest/config/__init__.py", line 668, in _do_configure
INTERNALERROR>     self.hook.pytest_configure.call_historic(kwargs=dict(config=self))
INTERNALERROR>   File "/Users/vfrank/.local/share/virtualenvs/lcs-glue-python-extractor-KOyFA_4z/lib/python2.7/site-packages/pluggy/hooks.py", line 311, in call_historic
INTERNALERROR>     res = self._hookexec(self, self.get_hookimpls(), kwargs)
INTERNALERROR>   File "/Users/vfrank/.local/share/virtualenvs/lcs-glue-python-extractor-KOyFA_4z/lib/python2.7/site-packages/pluggy/manager.py", line 87, in _hookexec
INTERNALERROR>     return self._inner_hookexec(hook, methods, kwargs)
INTERNALERROR>   File "/Users/vfrank/.local/share/virtualenvs/lcs-glue-python-extractor-KOyFA_4z/lib/python2.7/site-packages/pluggy/manager.py", line 81, in
INTERNALERROR>     firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
INTERNALERROR>   File "/Users/vfrank/.local/share/virtualenvs/lcs-glue-python-extractor-KOyFA_4z/lib/python2.7/site-packages/pluggy/callers.py", line 208, in _multicall
INTERNALERROR>     return outcome.get_result()
INTERNALERROR>   File "/Users/vfrank/.local/share/virtualenvs/lcs-glue-python-extractor-KOyFA_4z/lib/python2.7/site-packages/pluggy/callers.py", line 81, in get_result
INTERNALERROR>     _reraise(*ex)  # noqa
INTERNALERROR>   File "/Users/vfrank/.local/share/virtualenvs/lcs-glue-python-extractor-KOyFA_4z/lib/python2.7/site-packages/pluggy/callers.py", line 187, in _multicall
INTERNALERROR>     res = hook_impl.function(*args)
INTERNALERROR>   File "/Users/vfrank/.local/share/virtualenvs/lcs-glue-python-extractor-KOyFA_4z/lib/python2.7/site-packages/pytest_spark/__init__.py", line 30, in pytest_configure
INTERNALERROR>     findspark.init(spark_home)
INTERNALERROR>   File "/Users/vfrank/.local/share/virtualenvs/lcs-glue-python-extractor-KOyFA_4z/lib/python2.7/site-packages/findspark.py", line 135, in init
INTERNALERROR>     py4j = glob(os.path.join(spark_python, 'lib', 'py4j-*.zip'))[0]
INTERNALERROR> IndexError: list index out of range
```
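The IndexError at the bottom can be reproduced in isolation: findspark globs for the py4j zip under `<spark_home>/python/lib` and indexes the first match, so a `spark_home` that does not point at a full Spark distribution yields an empty list. A minimal sketch (the relative `spark/` path is this issue's example, not a real install):

```python
import os
from glob import glob

# Roughly what findspark.init() does with spark_home="spark/":
spark_home = "spark/"  # relative path taken from pytest.ini
spark_python = os.path.join(spark_home, "python")
matches = glob(os.path.join(spark_python, "lib", "py4j-*.zip"))

# With no real Spark distribution at spark/, matches is [],
# and findspark's matches[0] raises IndexError: list index out of range.
```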

Works:

  1. Remove pytest-spark: pipenv uninstall pytest-spark
  2. Update pytest.ini:
    [pytest]
    env =
        SPARK_HOME=spark/
    testpaths=test
    log_format = %(asctime)s %(levelname)s %(message)s
    log_date_format = %Y-%m-%d %H:%M:%S
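For reference, the `env =` section in that pytest.ini relies on the pytest-env plugin. A rough equivalent without that plugin (a sketch, not something from this thread) is exporting SPARK_HOME from `conftest.py` before pyspark is first imported:

```python
# conftest.py -- sketch; assumes a full Spark distribution unpacked at ./spark
import os

# setdefault keeps any SPARK_HOME already exported in the shell
os.environ.setdefault("SPARK_HOME", os.path.abspath("spark"))
```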

Also works, through brew and with pytest-spark:

brew install pipenv
pipenv install --dev --ignore-pipfile or pipenv install --dev  pyspark pytest pytest-spark
pipenv run py.test --rootdir . test --cov src/ --cov-fail-under 20 -vv test/ --junitxml=pytest-report.xml 
or 
pipenv run py.test 

Produces

============================================= test session starts =============================================
platform darwin -- Python 2.7.10, pytest-4.5.0, py-1.8.0, pluggy-0.12.0
rootdir: /Users/vfrank/dev-working/blah/lcs-glue-python-extractor, inifile: pytest.ini, testpaths: test
plugins: cov-2.7.1, mock-1.10.4
collected 3 items                                                                                             

test/test_dynamic_etl_script.py .                                                                       [ 33%]
test/test_hydrox.py ..   
malexer commented 5 years ago

@vfrank66 In your first case you are specifying spark_home=spark/, which is not reliable; it's better to provide the full path. Note: we also expect that directory to contain a full Spark installation, not the pyspark package installed via pip.

But the core issue is that you already have pyspark installed via pip/pipenv, so pyspark is importable on its own. pytest-spark only uses spark_home to locate your Spark directory and make pyspark importable in your tests.

So there is no need to define spark_home at all: not in pytest.ini, not as the --spark_home param to pytest, and not as the SPARK_HOME env variable. That should fix your issue.
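That rule of thumb can be sketched as a check (a hypothetical helper, not part of pytest-spark): spark_home only matters when pyspark is not already importable.

```python
import importlib.util

def needs_spark_home(module="pyspark"):
    """True if the module is not importable and must be located via SPARK_HOME."""
    return importlib.util.find_spec(module) is None

# With pyspark installed via pip/pipenv this returns False,
# so spark_home / SPARK_HOME can simply be left unset.
```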

Your second case worked because according to:

plugins: cov-2.7.1, mock-1.10.4

the plugin pytest-spark was not discovered for some reason. Are you sure it was indeed installed into the correct environment?

malexer commented 5 years ago

Anyway thanks for the interesting case, I will try to cover it in the readme.

malexer commented 5 years ago

Updated readme a5ce0a906c67bf535d8d6a5dfd295bcd814507a4