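What changes were proposed in this pull request?
Pass on the original ImportError when importing pandas/pyarrow fails, chaining it to the ImportError that PySpark raises. This helps the user identify whether pandas/pyarrow are really missing from the environment or whether they threw a different ImportError during import.
Does this PR introduce any user-facing change?
Yes, it will now show the root cause of the exception when importing pandas or PyArrow fails.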
How was this patch tested?
Manually tested.
from pyspark.sql.functions import pandas_udf
spark.range(1).select(pandas_udf(lambda x: x, "int")("id")).show()
Before:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/...//spark/python/pyspark/sql/pandas/functions.py", line 332, in pandas_udf
    require_minimum_pyarrow_version()
  File "/.../spark/python/pyspark/sql/pandas/utils.py", line 53, in require_minimum_pyarrow_version
    raise ImportError("PyArrow >= %s must be installed; however, "
ImportError: PyArrow >= 1.0.0 must be installed; however, it was not found.
After:
Traceback (most recent call last):
  File "/.../spark/python/pyspark/sql/pandas/utils.py", line 49, in require_minimum_pyarrow_version
    import pyarrow
ModuleNotFoundError: No module named 'pyarrow'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../spark/python/pyspark/sql/pandas/functions.py", line 332, in pandas_udf
    require_minimum_pyarrow_version()
  File "/.../spark/python/pyspark/sql/pandas/utils.py", line 55, in require_minimum_pyarrow_version
    raise ImportError("PyArrow >= %s must be installed; however, "
ImportError: PyArrow >= 1.0.0 must be installed; however, it was not found.
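
The "direct cause" line in the traceback above is what Python prints when an exception is raised with `raise ... from ...`. The following is a minimal sketch of that pattern, not the actual PySpark code: `require_module` and the missing module name are hypothetical and exist only for illustration.

```python
import importlib


def require_module(name: str, min_version: str) -> None:
    """Import `name`, chaining the original error if the import fails.

    Hypothetical helper for illustration; it is not PySpark's implementation.
    """
    try:
        importlib.import_module(name)
    except ImportError as error:
        # `from error` stores the original exception in __cause__, so the
        # traceback shows it first, followed by the "direct cause" line.
        raise ImportError(
            "%s >= %s must be installed; however, it was not found."
            % (name, min_version)
        ) from error


try:
    # Module name assumed not to exist, to trigger the failure path.
    require_module("module_that_does_not_exist_xyz", "1.0.0")
except ImportError as wrapped:
    root_cause = wrapped.__cause__
    # ModuleNotFoundError means the package is really absent; any other
    # ImportError means the package exists but failed during its own import.
    print(type(root_cause).__name__, "->", root_cause)
```

A caller that catches the wrapped error can inspect `__cause__` in the same way to tell a missing package from one that failed while importing.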
Upstream SPARK-XXXXX ticket and PR link (if not applicable, explain)
SPARK-34803 https://github.com/apache/spark/pull/31902
Why are the changes needed?
This can already happen with pandas, for example, which could throw an ImportError on its initialisation path if dateutil doesn't satisfy a certain version requirement: https://github.com/pandas-dev/pandas/blob/0.24.x/pandas/compat/__init__.py#L438