huggingface / autotrain-advanced

🤗 AutoTrain Advanced
https://huggingface.co/autotrain
Apache License 2.0

[BUG] KeyError: 'tags' in NER finetuning #594

Closed Jerado10 closed 2 months ago

Jerado10 commented 2 months ago

Prerequisites

Backend

Hugging Face Space/Endpoints

Interface Used

UI

CLI Command

No response

UI Screenshots & Parameters

Config in UI:

Screenshot 2024-04-23 at 20 37 35

Example row of data:

Screenshot 2024-04-23 at 20 45 47

Error Logs

===== Application Startup at 2024-04-23 10:36:04 =====

========== == CUDA ==

CUDA Version 12.1.1

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License. By pulling and using the container, you accept the terms and conditions of this license: https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

Found existing installation: autotrain-advanced 0.7.59.dev0 Uninstalling autotrain-advanced-0.7.59.dev0: Successfully uninstalled autotrain-advanced-0.7.59.dev0 Collecting autotrain-advanced Downloading autotrain_advanced-0.7.58-py3-none-any.whl.metadata (12 kB) Requirement already satisfied: albumentations==1.3.1 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (1.3.1) Requirement already satisfied: codecarbon==2.2.3 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (2.2.3) Requirement already satisfied: datasets~=2.14.0 in ./env/lib/python3.10/site-packages (from datasets[vision]~=2.14.0->autotrain-advanced) (2.14.7) Requirement already satisfied: evaluate==0.3.0 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.3.0) Requirement already satisfied: ipadic==1.0.0 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (1.0.0) Requirement already satisfied: jiwer==3.0.2 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (3.0.2) Requirement already satisfied: joblib==1.3.1 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (1.3.1) Requirement already satisfied: loguru==0.7.0 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.7.0) Requirement already satisfied: pandas>=1.4 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (2.2.2) Requirement already satisfied: nltk==3.8.1 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (3.8.1) Requirement already satisfied: optuna==3.3.0 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (3.3.0) Requirement already satisfied: Pillow==10.0.0 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (10.0.0) Requirement already satisfied: protobuf==4.23.4 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (4.23.4) Requirement already satisfied: sacremoses==0.0.53 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.0.53) Requirement already satisfied: 
scikit-learn==1.3.0 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (1.3.0) Requirement already satisfied: sentencepiece==0.1.99 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.1.99) Requirement already satisfied: tqdm==4.65.0 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (4.65.0) Requirement already satisfied: werkzeug==2.3.6 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (2.3.6) Requirement already satisfied: xgboost==1.7.6 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (1.7.6) Requirement already satisfied: huggingface-hub==0.22.2 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.22.2) Requirement already satisfied: requests==2.31.0 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (2.31.0) Requirement already satisfied: einops==0.6.1 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.6.1) Requirement already satisfied: invisible-watermark==0.2.0 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.2.0) Requirement already satisfied: packaging==23.1 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (23.1) Requirement already satisfied: cryptography==42.0.5 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (42.0.5) Requirement already satisfied: nvitop==1.3.2 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (1.3.2) Requirement already satisfied: tensorboard in ./env/lib/python3.10/site-packages (from autotrain-advanced) (2.16.2) Requirement already satisfied: peft==0.10.0 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.10.0) Requirement already satisfied: trl==0.8.5 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.8.5) Requirement already satisfied: tiktoken==0.6.0 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.6.0) Requirement already satisfied: transformers==4.40.0 in ./env/lib/python3.10/site-packages (from 
autotrain-advanced) (4.40.0) Requirement already satisfied: accelerate==0.29.3 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.29.3) Requirement already satisfied: diffusers==0.27.2 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.27.2) Requirement already satisfied: bitsandbytes==0.43.1 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.43.1) Requirement already satisfied: rouge-score==0.1.2 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.1.2) Requirement already satisfied: py7zr==0.20.6 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.20.6) Requirement already satisfied: fastapi==0.104.1 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.104.1) Requirement already satisfied: uvicorn==0.22.0 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.22.0) Requirement already satisfied: python-multipart==0.0.6 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.0.6) Requirement already satisfied: gradio==3.41.0 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (3.41.0) Requirement already satisfied: pydantic==2.4.2 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (2.4.2) Requirement already satisfied: hf-transfer in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.1.6) Requirement already satisfied: pyngrok==7.0.3 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (7.0.3) Requirement already satisfied: authlib==1.3.0 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (1.3.0) Requirement already satisfied: itsdangerous==2.1.2 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (2.1.2) Requirement already satisfied: seqeval==1.2.2 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (1.2.2) Requirement already satisfied: numpy>=1.17 in ./env/lib/python3.10/site-packages (from accelerate==0.29.3->autotrain-advanced) (1.26.4) Requirement already 
satisfied: psutil in ./env/lib/python3.10/site-packages (from accelerate==0.29.3->autotrain-advanced) (5.9.8) Requirement already satisfied: pyyaml in ./env/lib/python3.10/site-packages (from accelerate==0.29.3->autotrain-advanced) (6.0.1) Requirement already satisfied: torch>=1.10.0 in ./env/lib/python3.10/site-packages (from accelerate==0.29.3->autotrain-advanced) (2.2.2) Requirement already satisfied: safetensors>=0.3.1 in ./env/lib/python3.10/site-packages (from accelerate==0.29.3->autotrain-advanced) (0.4.3) Requirement already satisfied: scipy>=1.1.0 in ./env/lib/python3.10/site-packages (from albumentations==1.3.1->autotrain-advanced) (1.13.0) Requirement already satisfied: scikit-image>=0.16.1 in ./env/lib/python3.10/site-packages (from albumentations==1.3.1->autotrain-advanced) (0.23.2) Requirement already satisfied: qudida>=0.0.4 in ./env/lib/python3.10/site-packages (from albumentations==1.3.1->autotrain-advanced) (0.0.4) Requirement already satisfied: opencv-python-headless>=4.1.1 in ./env/lib/python3.10/site-packages (from albumentations==1.3.1->autotrain-advanced) (4.9.0.80) Requirement already satisfied: arrow in ./env/lib/python3.10/site-packages (from codecarbon==2.2.3->autotrain-advanced) (1.3.0) Requirement already satisfied: pynvml in ./env/lib/python3.10/site-packages (from codecarbon==2.2.3->autotrain-advanced) (11.5.0) Requirement already satisfied: py-cpuinfo in ./env/lib/python3.10/site-packages (from codecarbon==2.2.3->autotrain-advanced) (9.0.0) Requirement already satisfied: fuzzywuzzy in ./env/lib/python3.10/site-packages (from codecarbon==2.2.3->autotrain-advanced) (0.18.0) Requirement already satisfied: click in ./env/lib/python3.10/site-packages (from codecarbon==2.2.3->autotrain-advanced) (8.1.7) Requirement already satisfied: cffi>=1.12 in ./env/lib/python3.10/site-packages (from cryptography==42.0.5->autotrain-advanced) (1.16.0) Requirement already satisfied: importlib-metadata in ./env/lib/python3.10/site-packages (from 
diffusers==0.27.2->autotrain-advanced) (7.1.0) Requirement already satisfied: filelock in ./env/lib/python3.10/site-packages (from diffusers==0.27.2->autotrain-advanced) (3.13.1) Requirement already satisfied: regex!=2019.12.17 in ./env/lib/python3.10/site-packages (from diffusers==0.27.2->autotrain-advanced) (2024.4.16) Requirement already satisfied: dill in ./env/lib/python3.10/site-packages (from evaluate==0.3.0->autotrain-advanced) (0.3.7) Requirement already satisfied: xxhash in ./env/lib/python3.10/site-packages (from evaluate==0.3.0->autotrain-advanced) (3.4.1) Requirement already satisfied: multiprocess in ./env/lib/python3.10/site-packages (from evaluate==0.3.0->autotrain-advanced) (0.70.15) Requirement already satisfied: fsspec>=2021.05.0 in ./env/lib/python3.10/site-packages (from fsspec[http]>=2021.05.0->evaluate==0.3.0->autotrain-advanced) (2023.10.0) Requirement already satisfied: responses<0.19 in ./env/lib/python3.10/site-packages (from evaluate==0.3.0->autotrain-advanced) (0.18.0) Requirement already satisfied: anyio<4.0.0,>=3.7.1 in ./env/lib/python3.10/site-packages (from fastapi==0.104.1->autotrain-advanced) (3.7.1) Requirement already satisfied: starlette<0.28.0,>=0.27.0 in ./env/lib/python3.10/site-packages (from fastapi==0.104.1->autotrain-advanced) (0.27.0) Requirement already satisfied: typing-extensions>=4.8.0 in ./env/lib/python3.10/site-packages (from fastapi==0.104.1->autotrain-advanced) (4.9.0) Requirement already satisfied: aiofiles<24.0,>=22.0 in ./env/lib/python3.10/site-packages (from gradio==3.41.0->autotrain-advanced) (23.2.1) Requirement already satisfied: altair<6.0,>=4.2.0 in ./env/lib/python3.10/site-packages (from gradio==3.41.0->autotrain-advanced) (5.3.0) Requirement already satisfied: ffmpy in ./env/lib/python3.10/site-packages (from gradio==3.41.0->autotrain-advanced) (0.3.2) Requirement already satisfied: gradio-client==0.5.0 in ./env/lib/python3.10/site-packages (from gradio==3.41.0->autotrain-advanced) (0.5.0) 
Requirement already satisfied: httpx in ./env/lib/python3.10/site-packages (from gradio==3.41.0->autotrain-advanced) (0.27.0) Requirement already satisfied: importlib-resources<7.0,>=1.3 in ./env/lib/python3.10/site-packages (from gradio==3.41.0->autotrain-advanced) (6.4.0) Requirement already satisfied: jinja2<4.0 in ./env/lib/python3.10/site-packages (from gradio==3.41.0->autotrain-advanced) (3.1.3) Requirement already satisfied: markupsafe~=2.0 in ./env/lib/python3.10/site-packages (from gradio==3.41.0->autotrain-advanced) (2.1.3) Requirement already satisfied: matplotlib~=3.0 in ./env/lib/python3.10/site-packages (from gradio==3.41.0->autotrain-advanced) (3.8.4) Requirement already satisfied: orjson~=3.0 in ./env/lib/python3.10/site-packages (from gradio==3.41.0->autotrain-advanced) (3.10.1) Requirement already satisfied: pydub in ./env/lib/python3.10/site-packages (from gradio==3.41.0->autotrain-advanced) (0.25.1) Requirement already satisfied: semantic-version~=2.0 in ./env/lib/python3.10/site-packages (from gradio==3.41.0->autotrain-advanced) (2.10.0) Requirement already satisfied: websockets<12.0,>=10.0 in ./env/lib/python3.10/site-packages (from gradio==3.41.0->autotrain-advanced) (11.0.3) Requirement already satisfied: PyWavelets>=1.1.1 in ./env/lib/python3.10/site-packages (from invisible-watermark==0.2.0->autotrain-advanced) (1.6.0) Requirement already satisfied: opencv-python>=4.1.0.25 in ./env/lib/python3.10/site-packages (from invisible-watermark==0.2.0->autotrain-advanced) (4.9.0.80) Requirement already satisfied: rapidfuzz==2.13.7 in ./env/lib/python3.10/site-packages (from jiwer==3.0.2->autotrain-advanced) (2.13.7) Requirement already satisfied: nvidia-ml-py<12.536.0a0,>=11.450.51 in ./env/lib/python3.10/site-packages (from nvitop==1.3.2->autotrain-advanced) (12.535.133) Requirement already satisfied: cachetools>=1.0.1 in ./env/lib/python3.10/site-packages (from nvitop==1.3.2->autotrain-advanced) (5.3.3) Requirement already satisfied: 
termcolor>=1.0.0 in ./env/lib/python3.10/site-packages (from nvitop==1.3.2->autotrain-advanced) (2.4.0) Requirement already satisfied: alembic>=1.5.0 in ./env/lib/python3.10/site-packages (from optuna==3.3.0->autotrain-advanced) (1.13.1) Requirement already satisfied: cmaes>=0.10.0 in ./env/lib/python3.10/site-packages (from optuna==3.3.0->autotrain-advanced) (0.10.0) Requirement already satisfied: colorlog in ./env/lib/python3.10/site-packages (from optuna==3.3.0->autotrain-advanced) (6.8.2) Requirement already satisfied: sqlalchemy>=1.3.0 in ./env/lib/python3.10/site-packages (from optuna==3.3.0->autotrain-advanced) (2.0.29) Requirement already satisfied: texttable in ./env/lib/python3.10/site-packages (from py7zr==0.20.6->autotrain-advanced) (1.7.0) Requirement already satisfied: pycryptodomex>=3.6.6 in ./env/lib/python3.10/site-packages (from py7zr==0.20.6->autotrain-advanced) (3.20.0) Requirement already satisfied: pyzstd>=0.14.4 in ./env/lib/python3.10/site-packages (from py7zr==0.20.6->autotrain-advanced) (0.15.10) Requirement already satisfied: pyppmd<1.1.0,>=0.18.1 in ./env/lib/python3.10/site-packages (from py7zr==0.20.6->autotrain-advanced) (1.0.0) Requirement already satisfied: pybcj>=0.6.0 in ./env/lib/python3.10/site-packages (from py7zr==0.20.6->autotrain-advanced) (1.0.2) Requirement already satisfied: multivolumefile>=0.2.3 in ./env/lib/python3.10/site-packages (from py7zr==0.20.6->autotrain-advanced) (0.2.3) Requirement already satisfied: brotli>=1.0.9 in ./env/lib/python3.10/site-packages (from py7zr==0.20.6->autotrain-advanced) (1.0.9) Requirement already satisfied: inflate64>=0.3.1 in ./env/lib/python3.10/site-packages (from py7zr==0.20.6->autotrain-advanced) (1.0.0) Requirement already satisfied: annotated-types>=0.4.0 in ./env/lib/python3.10/site-packages (from pydantic==2.4.2->autotrain-advanced) (0.6.0) Requirement already satisfied: pydantic-core==2.10.1 in ./env/lib/python3.10/site-packages (from pydantic==2.4.2->autotrain-advanced) 
(2.10.1) Requirement already satisfied: charset-normalizer<4,>=2 in ./env/lib/python3.10/site-packages (from requests==2.31.0->autotrain-advanced) (2.0.4) Requirement already satisfied: idna<4,>=2.5 in ./env/lib/python3.10/site-packages (from requests==2.31.0->autotrain-advanced) (3.4) Requirement already satisfied: urllib3<3,>=1.21.1 in ./env/lib/python3.10/site-packages (from requests==2.31.0->autotrain-advanced) (2.1.0) Requirement already satisfied: certifi>=2017.4.17 in ./env/lib/python3.10/site-packages (from requests==2.31.0->autotrain-advanced) (2024.2.2) Requirement already satisfied: absl-py in ./env/lib/python3.10/site-packages (from rouge-score==0.1.2->autotrain-advanced) (2.1.0) Requirement already satisfied: six>=1.14.0 in ./env/lib/python3.10/site-packages (from rouge-score==0.1.2->autotrain-advanced) (1.16.0) Requirement already satisfied: threadpoolctl>=2.0.0 in ./env/lib/python3.10/site-packages (from scikit-learn==1.3.0->autotrain-advanced) (3.4.0) Requirement already satisfied: tokenizers<0.20,>=0.19 in ./env/lib/python3.10/site-packages (from transformers==4.40.0->autotrain-advanced) (0.19.1) Requirement already satisfied: tyro>=0.5.11 in ./env/lib/python3.10/site-packages (from trl==0.8.5->autotrain-advanced) (0.8.3) Requirement already satisfied: h11>=0.8 in ./env/lib/python3.10/site-packages (from uvicorn==0.22.0->autotrain-advanced) (0.14.0) Requirement already satisfied: pyarrow>=8.0.0 in ./env/lib/python3.10/site-packages (from datasets~=2.14.0->datasets[vision]~=2.14.0->autotrain-advanced) (16.0.0) Requirement already satisfied: pyarrow-hotfix in ./env/lib/python3.10/site-packages (from datasets~=2.14.0->datasets[vision]~=2.14.0->autotrain-advanced) (0.6) Requirement already satisfied: aiohttp in ./env/lib/python3.10/site-packages (from datasets~=2.14.0->datasets[vision]~=2.14.0->autotrain-advanced) (3.9.5) Requirement already satisfied: python-dateutil>=2.8.2 in ./env/lib/python3.10/site-packages (from pandas>=1.4->autotrain-advanced) 
(2.9.0.post0) Requirement already satisfied: pytz>=2020.1 in ./env/lib/python3.10/site-packages (from pandas>=1.4->autotrain-advanced) (2024.1) Requirement already satisfied: tzdata>=2022.7 in ./env/lib/python3.10/site-packages (from pandas>=1.4->autotrain-advanced) (2024.1) Requirement already satisfied: grpcio>=1.48.2 in ./env/lib/python3.10/site-packages (from tensorboard->autotrain-advanced) (1.62.2) Requirement already satisfied: markdown>=2.6.8 in ./env/lib/python3.10/site-packages (from tensorboard->autotrain-advanced) (3.6) Requirement already satisfied: setuptools>=41.0.0 in ./env/lib/python3.10/site-packages (from tensorboard->autotrain-advanced) (68.2.2) Requirement already satisfied: tensorboard-data-server<0.8.0,>=0.7.0 in ./env/lib/python3.10/site-packages (from tensorboard->autotrain-advanced) (0.7.2) Requirement already satisfied: Mako in ./env/lib/python3.10/site-packages (from alembic>=1.5.0->optuna==3.3.0->autotrain-advanced) (1.3.3) Requirement already satisfied: jsonschema>=3.0 in ./env/lib/python3.10/site-packages (from altair<6.0,>=4.2.0->gradio==3.41.0->autotrain-advanced) (4.21.1) Requirement already satisfied: toolz in ./env/lib/python3.10/site-packages (from altair<6.0,>=4.2.0->gradio==3.41.0->autotrain-advanced) (0.12.1) Requirement already satisfied: sniffio>=1.1 in ./env/lib/python3.10/site-packages (from anyio<4.0.0,>=3.7.1->fastapi==0.104.1->autotrain-advanced) (1.3.1) Requirement already satisfied: exceptiongroup in ./env/lib/python3.10/site-packages (from anyio<4.0.0,>=3.7.1->fastapi==0.104.1->autotrain-advanced) (1.2.1) Requirement already satisfied: pycparser in ./env/lib/python3.10/site-packages (from cffi>=1.12->cryptography==42.0.5->autotrain-advanced) (2.22) Requirement already satisfied: aiosignal>=1.1.2 in ./env/lib/python3.10/site-packages (from aiohttp->datasets~=2.14.0->datasets[vision]~=2.14.0->autotrain-advanced) (1.3.1) Requirement already satisfied: attrs>=17.3.0 in ./env/lib/python3.10/site-packages (from 
aiohttp->datasets~=2.14.0->datasets[vision]~=2.14.0->autotrain-advanced) (23.2.0) Requirement already satisfied: frozenlist>=1.1.1 in ./env/lib/python3.10/site-packages (from aiohttp->datasets~=2.14.0->datasets[vision]~=2.14.0->autotrain-advanced) (1.4.1) Requirement already satisfied: multidict<7.0,>=4.5 in ./env/lib/python3.10/site-packages (from aiohttp->datasets~=2.14.0->datasets[vision]~=2.14.0->autotrain-advanced) (6.0.5) Requirement already satisfied: yarl<2.0,>=1.0 in ./env/lib/python3.10/site-packages (from aiohttp->datasets~=2.14.0->datasets[vision]~=2.14.0->autotrain-advanced) (1.9.4) Requirement already satisfied: async-timeout<5.0,>=4.0 in ./env/lib/python3.10/site-packages (from aiohttp->datasets~=2.14.0->datasets[vision]~=2.14.0->autotrain-advanced) (4.0.3) Requirement already satisfied: contourpy>=1.0.1 in ./env/lib/python3.10/site-packages (from matplotlib~=3.0->gradio==3.41.0->autotrain-advanced) (1.2.1) Requirement already satisfied: cycler>=0.10 in ./env/lib/python3.10/site-packages (from matplotlib~=3.0->gradio==3.41.0->autotrain-advanced) (0.12.1) Requirement already satisfied: fonttools>=4.22.0 in ./env/lib/python3.10/site-packages (from matplotlib~=3.0->gradio==3.41.0->autotrain-advanced) (4.51.0) Requirement already satisfied: kiwisolver>=1.3.1 in ./env/lib/python3.10/site-packages (from matplotlib~=3.0->gradio==3.41.0->autotrain-advanced) (1.4.5) Requirement already satisfied: pyparsing>=2.3.1 in ./env/lib/python3.10/site-packages (from matplotlib~=3.0->gradio==3.41.0->autotrain-advanced) (3.1.2) Requirement already satisfied: networkx>=2.8 in ./env/lib/python3.10/site-packages (from scikit-image>=0.16.1->albumentations==1.3.1->autotrain-advanced) (3.1) Requirement already satisfied: imageio>=2.33 in ./env/lib/python3.10/site-packages (from scikit-image>=0.16.1->albumentations==1.3.1->autotrain-advanced) (2.34.1) Requirement already satisfied: tifffile>=2022.8.12 in ./env/lib/python3.10/site-packages (from 
scikit-image>=0.16.1->albumentations==1.3.1->autotrain-advanced) (2024.4.18) Requirement already satisfied: lazy-loader>=0.4 in ./env/lib/python3.10/site-packages (from scikit-image>=0.16.1->albumentations==1.3.1->autotrain-advanced) (0.4) Requirement already satisfied: greenlet!=0.4.17 in ./env/lib/python3.10/site-packages (from sqlalchemy>=1.3.0->optuna==3.3.0->autotrain-advanced) (3.0.3) Requirement already satisfied: sympy in ./env/lib/python3.10/site-packages (from torch>=1.10.0->accelerate==0.29.3->autotrain-advanced) (1.12) Requirement already satisfied: docstring-parser>=0.14.1 in ./env/lib/python3.10/site-packages (from tyro>=0.5.11->trl==0.8.5->autotrain-advanced) (0.16) Requirement already satisfied: rich>=11.1.0 in ./env/lib/python3.10/site-packages (from tyro>=0.5.11->trl==0.8.5->autotrain-advanced) (13.7.1) Requirement already satisfied: shtab>=1.5.6 in ./env/lib/python3.10/site-packages (from tyro>=0.5.11->trl==0.8.5->autotrain-advanced) (1.7.1) Requirement already satisfied: types-python-dateutil>=2.8.10 in ./env/lib/python3.10/site-packages (from arrow->codecarbon==2.2.3->autotrain-advanced) (2.9.0.20240316) Requirement already satisfied: httpcore==1.* in ./env/lib/python3.10/site-packages (from httpx->gradio==3.41.0->autotrain-advanced) (1.0.5) Requirement already satisfied: zipp>=0.5 in ./env/lib/python3.10/site-packages (from importlib-metadata->diffusers==0.27.2->autotrain-advanced) (3.18.1) Requirement already satisfied: jsonschema-specifications>=2023.03.6 in ./env/lib/python3.10/site-packages (from jsonschema>=3.0->altair<6.0,>=4.2.0->gradio==3.41.0->autotrain-advanced) (2023.12.1) Requirement already satisfied: referencing>=0.28.4 in ./env/lib/python3.10/site-packages (from jsonschema>=3.0->altair<6.0,>=4.2.0->gradio==3.41.0->autotrain-advanced) (0.34.0) Requirement already satisfied: rpds-py>=0.7.1 in ./env/lib/python3.10/site-packages (from jsonschema>=3.0->altair<6.0,>=4.2.0->gradio==3.41.0->autotrain-advanced) (0.18.0) Requirement 
already satisfied: markdown-it-py>=2.2.0 in ./env/lib/python3.10/site-packages (from rich>=11.1.0->tyro>=0.5.11->trl==0.8.5->autotrain-advanced) (3.0.0) Requirement already satisfied: pygments<3.0.0,>=2.13.0 in ./env/lib/python3.10/site-packages (from rich>=11.1.0->tyro>=0.5.11->trl==0.8.5->autotrain-advanced) (2.17.2) Requirement already satisfied: mpmath>=0.19 in ./env/lib/python3.10/site-packages (from sympy->torch>=1.10.0->accelerate==0.29.3->autotrain-advanced) (1.3.0) Requirement already satisfied: mdurl~=0.1 in ./env/lib/python3.10/site-packages (from markdown-it-py>=2.2.0->rich>=11.1.0->tyro>=0.5.11->trl==0.8.5->autotrain-advanced) (0.1.2) Downloading autotrain_advanced-0.7.58-py3-none-any.whl (252 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 253.0/253.0 kB 13.1 MB/s eta 0:00:00 Installing collected packages: autotrain-advanced Successfully installed autotrain-advanced-0.7.58 Your installed package nvidia-ml-py is corrupted. Skip patch functions nvmlDeviceGet{Compute,Graphics,MPSCompute}RunningProcesses. You may get incorrect or incomplete results. Please consider reinstall package nvidia-ml-py via pip3 install --force-reinstall nvidia-ml-py nvitop. Your installed package nvidia-ml-py is corrupted. Skip patch functions nvmlDeviceGetMemoryInfo. You may get incorrect or incomplete results. Please consider reinstall package nvidia-ml-py via pip3 install --force-reinstall nvidia-ml-py nvitop. INFO | 2024-04-23 10:36:20 | autotrain.app::31 - Starting AutoTrain... 
WARNING | 2024-04-23 10:36:20 | autotrain.trainers.common:init:170 - Parameters not supplied by user and set to default: trainer, scheduler, warmup_ratio, model_max_length, lora_r, push_to_hub, project_name, lora_dropout, username, save_total_limit, lr, model, prompt_text_column, text_column, auto_find_batch_size, repo_id, optimizer, max_prompt_length, merge_adapter, train_split, seed, lora_alpha, valid_split, use_flash_attention_2, weight_decay, disable_gradient_checkpointing, token, dpo_beta, batch_size, model_ref, rejected_text_column, add_eos_token, evaluation_strategy, gradient_accumulation, logging_steps, data_path, max_grad_norm
WARNING | 2024-04-23 10:36:20 | autotrain.trainers.common:init:170 - Parameters not supplied by user and set to default: seed, valid_split, save_strategy, weight_decay, token, scheduler, target_column, warmup_ratio, max_seq_length, push_to_hub, batch_size, epochs, project_name, username, save_total_limit, lr, model, evaluation_strategy, gradient_accumulation, text_column, auto_find_batch_size, repo_id, optimizer, logging_steps, data_path, max_grad_norm, train_split
WARNING | 2024-04-23 10:36:20 | autotrain.trainers.common:init:170 - Parameters not supplied by user and set to default: seed, valid_split, save_strategy, weight_decay, token, scheduler, target_column, warmup_ratio, batch_size, push_to_hub, epochs, username, project_name, save_total_limit, lr, model, image_column, evaluation_strategy, gradient_accumulation, logging_steps, auto_find_batch_size, repo_id, optimizer, data_path, max_grad_norm, train_split
WARNING | 2024-04-23 10:36:20 | autotrain.trainers.common:init:170 - Parameters not supplied by user and set to default: target_column, scheduler, warmup_ratio, lora_r, max_seq_length, push_to_hub, epochs, username, project_name, lora_dropout, save_total_limit, lr, model, text_column, auto_find_batch_size, repo_id, optimizer, peft, train_split, seed, lora_alpha, valid_split, weight_decay, token, batch_size, quantization, evaluation_strategy, gradient_accumulation, logging_steps, data_path, max_target_length, max_grad_norm
WARNING | 2024-04-23 10:36:20 | autotrain.trainers.common:init:170 - Parameters not supplied by user and set to default: seed, valid_split, time_limit, token, id_column, push_to_hub, username, project_name, categorical_columns, target_columns, model, data_path, repo_id, train_split, num_trials, task, numerical_columns
WARNING | 2024-04-23 10:36:20 | autotrain.trainers.common:init:170 - Parameters not supplied by user and set to default: validation_images, class_labels_conditioning, checkpointing_steps, adam_epsilon, resume_from_checkpoint, adam_weight_decay, text_encoder_use_attention_mask, scheduler, num_validation_images, lr_power, checkpoints_total_limit, logging, push_to_hub, image_path, epochs, class_prompt, project_name, username, model, num_cycles, max_grad_norm, validation_prompt, dataloader_num_workers, repo_id, rank, seed, adam_beta1, scale_lr, local_rank, center_crop, sample_batch_size, token, tokenizer, pre_compute_text_embeddings, num_class_images, xl, prior_loss_weight, validation_epochs, adam_beta2, warmup_steps, prior_generation_precision, class_image_path, tokenizer_max_length, allow_tf32, prior_preservation, revision
WARNING | 2024-04-23 10:36:20 | autotrain.trainers.common:init:170 - Parameters not supplied by user and set to default: seed, tags_column, valid_split, save_strategy, weight_decay, token, scheduler, warmup_ratio, max_seq_length, push_to_hub, batch_size, epochs, project_name, username, save_total_limit, lr, model, tokens_column, evaluation_strategy, gradient_accumulation, logging_steps, auto_find_batch_size, repo_id, optimizer, data_path, max_grad_norm, train_split
INFO | 2024-04-23 10:36:20 | autotrain.app::154 - AutoTrain started successfully
INFO | 2024-04-23 10:36:23 | autotrain.app:fetch_params:212 - Task: llm:sft
INFO | 2024-04-23 10:36:28 | autotrain.app:fetch_params:212 - Task: token-classification
INFO | 2024-04-23 10:38:13 | autotrain.app:handle_form:454 - hardware: Local
INFO | 2024-04-23 10:38:13 | autotrain.app:handle_form:543 - Task: text_token_classification
INFO | 2024-04-23 10:38:13 | autotrain.app:handle_form:544 - Column mapping: {'text': 'text', 'label': 'label'}

Saving the dataset (0/1 shards): 0%| | 0/16 [00:00<?, ? examples/s]
Saving the dataset (1/1 shards): 100%|██████████| 16/16 [00:00<00:00, 5756.46 examples/s]
Saving the dataset (1/1 shards): 100%|██████████| 16/16 [00:00<00:00, 5379.90 examples/s]

Saving the dataset (0/1 shards): 0%| | 0/4 [00:00<?, ? examples/s]
Saving the dataset (1/1 shards): 100%|██████████| 4/4 [00:00<00:00, 1849.75 examples/s]
Saving the dataset (1/1 shards): 100%|██████████| 4/4 [00:00<00:00, 1732.65 examples/s]
WARNING | 2024-04-23 10:38:13 | autotrain.trainers.common:init:170 - Parameters not supplied by user and set to default: seed, tags_column, save_total_limit, save_strategy, weight_decay, tokens_column, evaluation_strategy, logging_steps, auto_find_batch_size, warmup_ratio, max_grad_norm, train_split
WARNING | 2024-04-23 10:38:13 | autotrain.trainers.common:init:176 - Parameters supplied but not used: text_column, target_column
INFO | 2024-04-23 10:38:13 | autotrain.backend:create:300 - Starting local training...
INFO | 2024-04-23 10:38:13 | autotrain.commands:launch_command:338 - ['accelerate', 'launch', '--num_machines', '1', '--num_processes', '1', '--mixed_precision', 'fp16', '-m', 'autotrain.trainers.token_classification', '--training_config', 'autotrain-xxx5e-54siu/training_params.json']
INFO | 2024-04-23 10:38:13 | autotrain.commands:launch_command:339 - {'data_path': 'autotrain-xxx5e-54siu/autotrain-data', 'model': 'Babelscape/wikineural-multilingual-ner', 'lr': 5e-05, 'epochs': 3, 'max_seq_length': 128, 'batch_size': 8, 'warmup_ratio': 0.1, 'gradient_accumulation': 1, 'optimizer': 'adamw_torch', 'scheduler': 'linear', 'weight_decay': 0.0, 'max_grad_norm': 1.0, 'seed': 42, 'train_split': 'train', 'valid_split': 'validation', 'tokens_column': 'tokens', 'tags_column': 'tags', 'logging_steps': -1, 'project_name': 'autotrain-xxx5e-54siu', 'auto_find_batch_size': False, 'mixed_precision': 'fp16', 'save_total_limit': 1, 'save_strategy': 'epoch', 'token': '*****', 'push_to_hub': True, 'repo_id': 'Jerado/autotrain-xxx5e-54siu', 'evaluation_strategy': 'epoch', 'username': 'Jerado', 'log': 'tensorboard'}
INFO | 2024-04-23 10:38:13 | autotrain.backend:create:305 - Training PID: 57
The following values were not passed to accelerate launch and had defaults used instead:
--dynamo_backend was set to a value of 'no'
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
/app/env/lib/python3.10/site-packages/autotrain/trainers/token_classification/utils.py:5: FutureWarning: load_metric is deprecated and will be removed in the next major version of datasets. Use 'evaluate.load' instead, from the new library 🤗 Evaluate: https://huggingface.co/docs/evaluate
_METRICS = load_metric("seqeval")

Downloading builder script: 0%| | 0.00/2.47k [00:00<?, ?B/s]
Downloading builder script: 6.33kB [00:00, 14.0MB/s]
INFO | 2024-04-23 10:38:20 | main:train:53 - loading dataset from disk
INFO | 2024-04-23 10:38:20 | main:train:64 - loading dataset from disk
ERROR | 2024-04-23 10:38:20 | autotrain.trainers.common:wrapper:116 - train has failed due to an exception: Traceback (most recent call last):
  File "/app/env/lib/python3.10/site-packages/autotrain/trainers/common.py", line 113, in wrapper
    return func(*args, **kwargs)
  File "/app/env/lib/python3.10/site-packages/autotrain/trainers/token_classification/main.py", line 73, in train
    label_list = train_data.features[config.tags_column].feature.names
KeyError: 'tags'

ERROR | 2024-04-23 10:38:20 | autotrain.trainers.common:wrapper:117 - 'tags'
INFO | 2024-04-23 10:38:20 | autotrain.trainers.common:pause_space:74 - Pausing space...

Additional Information

No response

abhishekkrthakur commented 2 months ago

If you are using a CSV for token classification, you need to use stringified lists. An example is provided here: https://huggingface.co/docs/autotrain/token_classification

Your example doesn't look like the correct data format.
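
For readers unsure what "stringified list" means in practice, here is a minimal sketch using only the Python standard library. It assumes the columns are named `tokens` and `tags` (the defaults shown in the log above) and that each cell holds the Python-literal form of a list. The helper names `records_to_csv` and `read_stringified_csv` are made up for illustration, and how AutoTrain itself parses these cells is not shown in this thread.

```python
import ast
import csv
import io


def records_to_csv(records):
    """Write records with "tokens"/"tags" lists as CSV text with stringified lists."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["tokens", "tags"])
    for record in records:
        # str() on a list of strings yields its Python-literal form,
        # e.g. "['I', 'love', 'Paris']"; csv quotes the cell because it contains commas.
        writer.writerow([str(record["tokens"]), str(record["tags"])])
    return buffer.getvalue()


def read_stringified_csv(csv_text):
    """Parse the CSV text back into lists (illustrative; not AutoTrain's own loader)."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows.append(
            {
                "tokens": ast.literal_eval(row["tokens"]),
                "tags": ast.literal_eval(row["tags"]),
            }
        )
    return rows


if __name__ == "__main__":
    example = [{"tokens": ["I", "love", "Paris"], "tags": ["O", "O", "B-LOC"]}]
    print(records_to_csv(example))
```

A plain sentence in a `text` column with a single label in a `label` column would not round-trip through `read_stringified_csv`, which is one way to spot data that is not in the per-token format.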

Also, it's better to use the JSONL format for a token classification task, like this:

{"tokens": ["I", "love", "Paris"], "tags": ["O", "O", "B-LOC"]}
{"tokens": ["I", "live", "in", "New", "York"], "tags": ["O", "O", "O", "B-LOC", "I-LOC"]}
.
.
.
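
Before uploading, a JSONL file in the shape above can be sanity-checked locally. The sketch below is only an illustration, not part of AutoTrain: the function name `check_jsonl` is hypothetical, and the key names `tokens` and `tags` are assumed from the example records. It reports lines that are not valid JSON, lines missing either key, and lines where the two lists differ in length.

```python
import json


def check_jsonl(lines, tokens_key="tokens", tags_key="tags"):
    """Return (line_number, problem) tuples; an empty list means no problems were found."""
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "record is not a JSON object"))
            continue
        missing = [key for key in (tokens_key, tags_key) if key not in record]
        if missing:
            problems.append((number, f"missing keys: {missing}"))
            continue
        tokens, tags = record[tokens_key], record[tags_key]
        if not isinstance(tokens, list) or not isinstance(tags, list):
            problems.append((number, "tokens and tags must both be lists"))
        elif len(tokens) != len(tags):
            problems.append(
                (number, f"{len(tokens)} tokens but {len(tags)} tags")
            )
    return problems


if __name__ == "__main__":
    sample = [
        '{"tokens": ["I", "love", "Paris"], "tags": ["O", "O", "B-LOC"]}',
        '{"text": "I love Paris", "label": "LOC"}',
    ]
    for number, problem in check_jsonl(sample):
        print(f"line {number}: {problem}")
```

To check a real file, pass an open file handle, for example `check_jsonl(open("train.jsonl", encoding="utf-8"))`, where `train.jsonl` is a placeholder path.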
Jerado10 commented 2 months ago

I've stringified the data but get the same error. I also ran a test with your example data above, and I had the same error. Here is the csv data I used:

Screenshot 2024-04-24 at 09 33 24
abhishekkrthakur commented 2 months ago

Thanks for reporting this issue. It turned out to be bigger than I had expected. The issue is now resolved in version 0.7.62+.

Sample JSONL:

{"tokens": ["I", "love", "Paris"], "tags": ["O", "O", "B-LOC"]}
{"tokens": ["I", "live", "in", "New", "York"], "tags": ["O", "O", "O", "B-LOC", "I-LOC"]}
{"tokens": ["I", "love", "Paris"], "tags": ["O", "O", "B-LOC"]}
{"tokens": ["I", "live", "in", "New", "York"], "tags": ["O", "O", "O", "B-LOC", "I-LOC"]}
{"tokens": ["I", "love", "Paris"], "tags": ["O", "O", "B-LOC"]}
{"tokens": ["I", "live", "in", "New", "York"], "tags": ["O", "O", "O", "B-LOC", "I-LOC"]}
{"tokens": ["I", "love", "Paris"], "tags": ["O", "O", "B-LOC"]}
{"tokens": ["I", "live", "in", "New", "York"], "tags": ["O", "O", "O", "B-LOC", "I-LOC"]}
{"tokens": ["I", "love", "Paris"], "tags": ["O", "O", "B-LOC"]}
{"tokens": ["I", "live", "in", "New", "York"], "tags": ["O", "O", "O", "B-LOC", "I-LOC"]}
{"tokens": ["I", "love", "Paris"], "tags": ["O", "O", "B-LOC"]}
{"tokens": ["I", "live", "in", "New", "York"], "tags": ["O", "O", "O", "B-LOC", "I-LOC"]}
{"tokens": ["I", "love", "Paris"], "tags": ["O", "O", "B-LOC"]}
{"tokens": ["I", "live", "in", "New", "York"], "tags": ["O", "O", "O", "B-LOC", "I-LOC"]}
{"tokens": ["I", "love", "Paris"], "tags": ["O", "O", "B-LOC"]}
{"tokens": ["I", "live", "in", "New", "York"], "tags": ["O", "O", "O", "B-LOC", "I-LOC"]}
{"tokens": ["I", "love", "Paris"], "tags": ["O", "O", "B-LOC"]}
{"tokens": ["I", "live", "in", "New", "York"], "tags": ["O", "O", "O", "B-LOC", "I-LOC"]}

Sample CSV:

tokens,tags
"['I', 'love', 'Paris']","['O', 'O', 'B-LOC']"
"['I', 'live', 'in', 'New', 'York']","['O', 'O', 'O', 'B-LOC', 'I-LOC']"
"['I', 'love', 'Paris']","['O', 'O', 'B-LOC']"
"['I', 'live', 'in', 'New', 'York']","['O', 'O', 'O', 'B-LOC', 'I-LOC']"
"['I', 'love', 'Paris']","['O', 'O', 'B-LOC']"
"['I', 'live', 'in', 'New', 'York']","['O', 'O', 'O', 'B-LOC', 'I-LOC']"
"['I', 'love', 'Paris']","['O', 'O', 'B-LOC']"
"['I', 'live', 'in', 'New', 'York']","['O', 'O', 'O', 'B-LOC', 'I-LOC']"
"['I', 'love', 'Paris']","['O', 'O', 'B-LOC']"
"['I', 'live', 'in', 'New', 'York']","['O', 'O', 'O', 'B-LOC', 'I-LOC']"
"['I', 'love', 'Paris']","['O', 'O', 'B-LOC']"
"['I', 'live', 'in', 'New', 'York']","['O', 'O', 'O', 'B-LOC', 'I-LOC']"
"['I', 'love', 'Paris']","['O', 'O', 'B-LOC']"
"['I', 'live', 'in', 'New', 'York']","['O', 'O', 'O', 'B-LOC', 'I-LOC']"
"['I', 'love', 'Paris']","['O', 'O', 'B-LOC']"
abhishekkrthakur commented 2 months ago

Please factory rebuild your AutoTrain Space and make sure it's on version 0.7.62 or above.
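
If you have shell or notebook access to the environment, a rough way to confirm the installed version is sketched below. It assumes the distribution name `autotrain-advanced` (as it appears in the pip output above) and uses a simplified comparison that only looks at the leading numeric part of the version string, so a pre-release such as `0.7.62.dev0` would be treated like `0.7.62`. The helper names are made up for illustration.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_VERSION = (0, 7, 62)  # first version reported as fixed in this thread


def release_tuple(version_string):
    """Turn "0.7.59.dev0" into (0, 7, 59), ignoring any non-numeric suffix."""
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        raise ValueError(f"unrecognised version string: {version_string!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_fixed(version_string, minimum=MIN_VERSION):
    """True if the version is at or above the minimum (simplified comparison)."""
    return release_tuple(version_string) >= minimum


def installed_autotrain_version():
    """Return the installed version string, or None if the package is absent."""
    try:
        return version("autotrain-advanced")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_autotrain_version()
    if found is None:
        print("autotrain-advanced is not installed in this environment")
    else:
        print(found, "OK" if is_fixed(found) else "older than 0.7.62, rebuild needed")
```

For the log in the original report, `is_fixed("0.7.58")` is false, which matches the version that was installed when the error occurred.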

Jerado10 commented 2 months ago

It worked. Thanks for your fast support on this :)