kedro-org / kedro-plugins

First-party plugins maintained by the Kedro team.
Apache License 2.0

EagerPolarsDataset returning codec issue #551

Closed: Dekermanjian closed this issue 9 months ago

Dekermanjian commented 9 months ago

Description

EagerPolarsDataset fails with a codec error when loading a Parquet file. However, reading the same Parquet file directly with `pl.read_parquet()` works.

Context

I cannot load the data with Polars through the Kedro catalog.

Steps to Reproduce

My catalog.yml looks like this:

```yaml
Billed_denials:
  type: polars.EagerPolarsDataset
  filepath: data/02_intermediate/Billed_denials.parquet
  file_format: parquet
```

Expected Result

The Parquet file loads into a Polars DataFrame.

Actual Result


```
DatasetError: Failed while loading data from data set EagerPolarsDataset(file_format=parquet, filepath=data/02_intermediate/Billed_denials.parquet, load_args={}, protocol=file, save_args={}).
'utf-8' codec can't decode byte 0xc0 in position 1: invalid start byte
```
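The thread does not pin down the root cause, but this message is the one Python raises when binary data is decoded as UTF-8 text, for example when a binary file (Parquet is a binary format) is read through a text-mode handle. The minimal, stdlib-only sketch below reproduces the same message with a hypothetical byte string (`SAMPLE_BYTES`) and hypothetical helper names. It illustrates the class of error only; it is an assumption, not a description of what EagerPolarsDataset or Polars does internally.

```python
import tempfile
from pathlib import Path

# Hypothetical stand-in for binary file content: 0xC0 is never a valid byte in UTF-8.
SAMPLE_BYTES = b"\x15\xc0\x01\x00"


def try_text_read(path: Path) -> str:
    """Read the file through a UTF-8 text-mode handle; return the error text if decoding fails."""
    try:
        with open(path, "r", encoding="utf-8") as fh:
            fh.read()
        return "ok"
    except UnicodeDecodeError as exc:
        return str(exc)


def binary_read(path: Path) -> bytes:
    """Read the same file in binary mode: no decoding happens, so the raw bytes come back."""
    with open(path, "rb") as fh:
        return fh.read()


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(SAMPLE_BYTES)
    text_result = try_text_read(sample)
    binary_result = binary_read(sample)

print(text_result)    # 'utf-8' codec can't decode byte 0xc0 in position 1: invalid start byte
print(binary_result)  # b'\x15\xc0\x01\x00'
```

The text-mode read fails on the first byte that is not valid UTF-8, while the binary-mode read succeeds, which is why binary formats are normally opened with `"rb"`.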




Your Environment

* Kedro version used (`pip show kedro` or `kedro -V`): 19.2
* Kedro plugin and plugin version used (`pip show kedro-airflow`): 2.0.0
* Python version used (`python -V`): 3.10.13
* Operating system and version: Linux and Windows
astrojuanlu commented 9 months ago

Hi @Dekermanjian! Could you share a small sample of the file that would help us reproduce the issue?

astrojuanlu commented 9 months ago

Also, please check whether this solution works for you: https://github.com/kedro-org/kedro-plugins/issues/500#issuecomment-1867342555

Dekermanjian commented 9 months ago

@astrojuanlu, thank you so much for the quick response! Downgrading to v0.19 did resolve the issue. Unfortunately, I cannot share the file because it contains patient health information.