ml6team / fondant

Production-ready data processing made easy and shareable
https://fondant.ai/en/stable/
Apache License 2.0

`load_from_parquet` component does not allow to keep original index #890

Open RobbeSneyders opened 8 months ago


When using the `load_from_parquet` component, it is not possible to keep the original index.

If the `id_column` argument is not set, Fondant automatically generates a new unique index. However, `id_column` can only select regular columns, not the index.

E.g., for the following dataset:

[image: screenshot of the dataset]

Using `load_from_parquet` with `id_column="id"` leads to the following error:

ValueError: An error occurred while calling the read_parquet method registered to the pandas backend. Original Message: The following columns were not found in the dataset {'id'} The following columns were found Index(['embedding', 'url'], dtype='object')
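Until the component supports this, one possible workaround is to turn the index into a regular column before the data is written to Parquet. The snippet below is a minimal pandas sketch of that idea, not a fix in Fondant itself. The sample rows and the `index_to_column` helper are hypothetical, and it assumes the dataset has `id` as its index, as the error message suggests.

```python
import pandas as pd

# Hypothetical stand-in for the dataset in this issue: `id` is the index,
# `embedding` and `url` are regular columns.
df = pd.DataFrame(
    {
        "embedding": [[0.1, 0.2], [0.3, 0.4]],
        "url": ["https://example.com/a", "https://example.com/b"],
    },
    index=pd.Index(["a", "b"], name="id"),
)


def index_to_column(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a new frame where the named index becomes a regular column."""
    return frame.reset_index()


fixed = index_to_column(df)

print(list(df.columns))     # ['embedding', 'url']  (the index is not a column)
print(list(fixed.columns))  # ['id', 'embedding', 'url']
```

Writing `fixed` back with `fixed.to_parquet(path, index=False)` should then leave `id` as an ordinary column that `id_column="id"` can select. This has not been verified against Fondant, and it requires rewriting the dataset, so it does not replace proper support for keeping the original index.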