Current Behavior
In an Azure Databricks environment, we use this library to read Excel files stored in a Storage Account. The error occurs whether the files are accessed over the ABFSS or the DBFS protocol, which suggests a file issue rather than a protocol issue.
Attempting to read the file with newer versions of the spark-excel library results in the following error, which we attribute to macros in the workbook:
crealytics excel workbook java.io.IOException: The file appears to be potentially malicious. "This file embeds more internal file entries than expected."
We have reverted to a previous version that does not raise this error and are looking for a way to bypass the macro detection. Our workbook does contain macros, but they are required as part of the workbook.
Expected Behavior
Reading the file into a DataFrame should not raise this error, or there should be an option to override the macro detection so that the file can be force-read when it is flagged as "potentially" malicious.
Environment
- Spark version: 3.4.1 via Databricks Runtime 13.3
- Spark-Excel version: 3.5.0_0.20.3
- OS: Windows locally; the code runs remotely on Databricks clusters
- Cluster environment: multiple cluster configurations (dev/stg/prd), all using the same Databricks Runtime and Spark versions
Anything else?
We have reverted our install to the previous version, Maven coordinates com.crealytics:spark-excel_2.12:0.13.7, which does not produce this issue.
spark-excel doesn't do anything in that regard.
It must be an upstream library that performs this check. Can you try to find out if this comes from POI?
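One way to narrow this down: the error text refers to "internal file entries", and an .xlsx/.xlsm workbook is a zip container, so counting its entries shows whether the file is unusually large in that respect, independent of macros. The sketch below uses only Python's standard library. The helper names and the limit of 1000 are assumptions for illustration, not values confirmed in this thread. If the check does come from POI, its `ZipSecureFile` class would be the place to look for the actual threshold.

```python
import io
import zipfile


def count_zip_entries(source):
    """Return the number of internal entries in an OOXML (.xlsx/.xlsm) container.

    `source` may be a filesystem path or a seekable binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        return len(archive.infolist())


def exceeds_entry_limit(source, limit=1000):
    """Report whether the workbook has more internal entries than `limit`.

    The default of 1000 is an assumed threshold for illustration only;
    verify the real value against the library version in use.
    """
    return count_zip_entries(source) > limit
```

Running `count_zip_entries` on a local copy of the failing workbook, and on one that loads fine, should show whether the entry count is what separates them.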
Steps To Reproduce
The following Python code produces our error:
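The original snippet is not preserved here. As a stand-in, this is a minimal sketch of what a typical spark-excel read looks like, not the reporter's actual code. The helper name `read_excel`, the `header` setting, and any path passed to it are hypothetical. `spark` is assumed to be an existing SparkSession, as provided in a Databricks notebook.

```python
def read_excel(spark, path, header=True):
    """Load an Excel workbook into a Spark DataFrame via spark-excel.

    Hypothetical helper: `spark` is an existing SparkSession and `path`
    points at the workbook (for example an abfss:// or dbfs:/ location).
    """
    return (
        spark.read.format("com.crealytics.spark.excel")
        # spark-excel expects option values as strings, e.g. "true"/"false"
        .option("header", str(header).lower())
        .load(path)
    )
```

With a macro-enabled workbook like the one described above, a call such as `read_excel(spark, path)` is where the IOException would be expected to surface.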