Open ijmiller2 opened 1 year ago
The AutoTrain format is not supported right now. I think it would require a dedicated dataset builder
Okay, good to know. Thanks for the reply. For now I will just have to manage the split manually before training, because I can't find any way of pulling out file indices or file names from the autogenerated split. The file names field of the image dataset (loaded directly from the arrow file) is missing, just FYI (for anyone else this might be relevant to).
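For anyone in the same situation, a manual split that keeps the file names could look roughly like the sketch below. It only uses the standard library; the function name `split_files`, the 20% validation fraction, the seed, and the file names are all assumptions for illustration, not anything AutoTrain or `datasets` does.

```python
import random


def split_files(paths, valid_fraction=0.2, seed=0):
    """Deterministically split file paths into train and validation lists."""
    # Sort first so the result does not depend on the input order.
    ordered = sorted(str(p) for p in paths)
    # A seeded generator makes the shuffle reproducible across runs.
    rng = random.Random(seed)
    rng.shuffle(ordered)
    n_valid = int(round(len(ordered) * valid_fraction))
    return {"train": ordered[n_valid:], "validation": ordered[:n_valid]}


# Hypothetical file names, for illustration only.
files = [f"images/img_{i:03d}.jpg" for i in range(10)]
splits = split_files(files)
```

Because the split is keyed on file names and a fixed seed, the validation files can be looked up again later when computing metrics.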
On Fri, Mar 10, 2023 at 7:02 PM Quentin Lhoest wrote:
> The AutoTrain format is not supported right now. I think it would require a dedicated dataset builder
Describe the bug
DatasetGenerationError: An error occurred while generating the dataset -> ValueError: Couldn't cast ... because column names don't match
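The "column names don't match" part of the message suggests that the columns found in the data files differ from the columns the loader expected. A minimal sketch of how one might compare the two lists is shown below; the helper `diff_columns` and both column lists are hypothetical and are not taken from the actual traceback.

```python
def diff_columns(expected, actual):
    """Report which column names are missing or unexpected."""
    expected_set, actual_set = set(expected), set(actual)
    return {
        "missing": sorted(expected_set - actual_set),
        "unexpected": sorted(actual_set - expected_set),
    }


# Hypothetical column names, for illustration only.
expected = ["image", "label"]
actual = ["image", "target"]
report = diff_columns(expected, actual)
print(report)
```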
Steps to reproduce the bug
pip install datasets==2.10.1
huggingface-cli login
Here's the full traceback:
Expected behavior
I'm ultimately trying to generate my own performance metrics on validation data (before putting an endpoint into production) and so was hoping to load all or at least the validation subset from the hub.
I'm expecting the load_dataset() function to work as shown in the documentation here:
Environment info
datasets version: 2.10.1