Open darrylthom opened 1 month ago
My guess is that this has to do with the Boolean Hybrid-RLE encoding.
Yes, this is it exactly. When I drop my boolean columns from my parquet file in the latest Polars, PBI Service refreshes the file successfully.
It seems that the service only supports older parquet formats/encodings. For now you can work around the issue by writing via pyarrow, which allows you to select different encodings.
This is something we could also support to a limited extent.
Writing with pyarrow worked in the meantime. I tried raising an issue with the Power BI team, but it got stuck with their triage vendor, who claimed the problem was in Polars rather than Power BI, so they would not escalate it to the product team and recommended that I downgrade instead.
Checks
Reproducible example
Log output
No response
Issue description
Just to explain the setup a bit: the parquet file gets written to a network drive. A report published to PBI Service connects to this parquet file through an on-premises gateway.
Refreshing works on a local copy of the PBI file, but through PBI Service specifically it now gives an error:
Data source error: {"error":{"code":"DM_GWPipeline_Gateway_MashupDataAccessError","pbi.error":{"code":"DM_GWPipeline_Gateway_MashupDataAccessError","parameters":{},"details":[{"code":"DM_ErrorDetailNameCode_UnderlyingErrorCode","detail":{"type":1,"value":"-2147467259"}},{"code":"DM_ErrorDetailNameCode_UnderlyingErrorMessage","detail":{"type":1,"value":"Parquet: class parquet::ParquetException (message: 'Unknown encoding type.'"}},{"code":"DM_ErrorDetailNameCode_UnderlyingHResult","detail":{"type":1,"value":"-2147467259"}},{"code":"Microsoft.Data.Mashup.ValueError.Reason","detail":{"type":1,"value":"DataFormat.Error"}}],"exceptionCulprit":1}}}
This refreshes fine locally; the problem is PBI Service specifically. I generated my parquet files with each Polars version from 1.2 up to the current release, and the errors start with the write_parquet output of Polars 1.5.0.
I believe something changed in the write_parquet output that makes it incompatible with the PBI Service's parquet connector in newer versions. I have compared the schema and the metadata, and they are exactly the same in the old output and the new output.
Expected behavior
As nothing has changed in my schema or metadata, the files should refresh, but it seems that write_parquet's encoding is not recognized by PBI Service from 1.5.0 onwards.
Installed versions