When using the `load_from_parquet` component, it is not possible to keep the original index.
If the `id_column` argument is not set, Fondant automatically generates a new unique index. However, `id_column` only accepts regular columns, not the index.
For example, for the following dataset:
Using `load_from_parquet` with `id_column="id"` leads to the following error:
```
ValueError: An error occurred while calling the read_parquet method registered to the pandas backend.
Original Message: The following columns were not found in the dataset {'id'}
The following columns were found Index(['embedding', 'url'], dtype='object')
```
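The error lists only `embedding` and `url` as columns, which suggests that `id` is stored in the file as the pandas index rather than as a regular column. A possible workaround, until `load_from_parquet` can select the index, is to rewrite the data with the index promoted to a regular column. The sketch below assumes the dataset was written from a pandas DataFrame whose index is named `id`; the helper `index_to_column` and the toy values are hypothetical and only illustrate the shape of the data.

```python
import pandas as pd


def index_to_column(df: pd.DataFrame, name: str = "id") -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with its index stored as a regular column `name`."""
    # rename_axis sets the index name; reset_index then moves the index into a column with that name
    return df.rename_axis(name).reset_index()


# Toy data shaped like the report: `id` lives in the index, not in the columns.
df = pd.DataFrame(
    {"embedding": [[0.1, 0.2], [0.3, 0.4]], "url": ["https://a.example", "https://b.example"]},
    index=pd.Index(["a1", "b2"], name="id"),
)
print(list(df.columns))  # ['embedding', 'url'] (no `id` column, matching the error)

fixed = index_to_column(df)
print(list(fixed.columns))  # ['id', 'embedding', 'url']

# Writing back to parquet needs pyarrow or fastparquet installed:
# fixed.to_parquet("data.parquet", index=False)
```

With `id` written as a regular column, `id_column="id"` should then be able to find it, although this sidesteps the issue rather than adding index support to the component.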