Saving the dataset (0/1 shards): 0%| | 0/9 [00:00<?, ? examples/s]
Saving the dataset (1/1 shards): 100%|██████████| 9/9 [00:00<00:00, 4258.66 examples/s]
Saving the dataset (1/1 shards): 100%|██████████| 9/9 [00:00<00:00, 3981.51 examples/s]
WARNING | 2024-05-07 14:30:06 | autotrain.trainers.common:init:174 - Parameters not supplied by user and set to default: valid_split, seed, train_split, quantization, save_total_limit, warmup_ratio, weight_decay, max_prompt_length, max_completion_length, auto_find_batch_size, lora_r, add_eos_token, use_flash_attention_2, padding, disable_gradient_checkpointing, lora_dropout, lora_alpha, model_ref, merge_adapter, dpo_beta, evaluation_strategy, max_grad_norm, logging_steps
INFO | 2024-05-07 14:30:06 | autotrain.backend:create:300 - Starting local training...
INFO | 2024-05-07 14:30:06 | autotrain.commands:launch_command:327 - ['accelerate', 'launch', '--num_machines', '1', '--num_processes', '1', '--mixed_precision', 'fp16', '-m', 'autotrain.trainers.clm', '--training_config', 'autotrain-y6hc2-nrxlz/training_params.json']
INFO | 2024-05-07 14:30:06 | autotrain.commands:launch_command:328 - {'model': 'microsoft/Phi-3-mini-128k-instruct', 'project_name': 'autotrain-y6hc2-nrxlz', 'data_path': 'autotrain-y6hc2-nrxlz/autotrain-data', 'train_split': 'train', 'valid_split': None, 'add_eos_token': True, 'block_size': 1024, 'model_max_length': 4096, 'padding': 'right', 'trainer': 'sft', 'use_flash_attention_2': False, 'log': 'tensorboard', 'disable_gradient_checkpointing': False, 'logging_steps': -1, 'evaluation_strategy': 'epoch', 'save_total_limit': 1, 'auto_find_batch_size': False, 'mixed_precision': 'fp16', 'lr': 3e-05, 'epochs': 3, 'batch_size': 2, 'warmup_ratio': 0.1, 'gradient_accumulation': 4, 'optimizer': 'adamw_torch', 'scheduler': 'linear', 'weight_decay': 0.0, 'max_grad_norm': 1.0, 'seed': 42, 'chat_template': 'none', 'quantization': 'int4', 'target_modules': 'all-linear', 'merge_adapter': False, 'peft': True, 'lora_r': 16, 'lora_alpha': 32, 'lora_dropout': 0.05, 'model_ref': None, 'dpo_beta': 0.1, 'max_prompt_length': 128, 'max_completion_length': None, 'prompt_text_column': 'autotrain_prompt', 'text_column': 'autotrain_text', 'rejected_text_column': 'autotrain_rejected_text', 'push_to_hub': True, 'username': 'bertilmuth', 'token': '*****'}
INFO | 2024-05-07 14:30:06 | autotrain.backend:create:305 - Training PID: 65
The following values were not passed to accelerate launch and had defaults used instead:
--dynamo_backend was set to a value of 'no'
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
INFO | 2024-05-07 14:30:13 | autotrain.trainers.clm.train_clm_sft:train:14 - Starting SFT training...
INFO | 2024-05-07 14:30:13 | autotrain.trainers.clm.utils:process_input_data:311 - loading dataset from disk
INFO | 2024-05-07 14:30:13 | autotrain.trainers.clm.utils:process_input_data:352 - Train data: Dataset({
features: ['autotrain_text'],
num_rows: 9
})
INFO | 2024-05-07 14:30:13 | autotrain.trainers.clm.utils:process_input_data:353 - Valid data: None
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO | 2024-05-07 14:30:13 | autotrain.trainers.clm.utils:configure_logging_steps:423 - configuring logging steps
INFO | 2024-05-07 14:30:13 | autotrain.trainers.clm.utils:configure_logging_steps:436 - Logging steps: 1
INFO | 2024-05-07 14:30:13 | autotrain.trainers.clm.utils:configure_training_args:441 - configuring training args
INFO | 2024-05-07 14:30:13 | autotrain.trainers.clm.utils:configure_block_size:504 - Using block size 1024
INFO | 2024-05-07 14:30:13 | autotrain.trainers.clm.train_clm_sft:train:27 - loading model config...
A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3-mini-128k-instruct:
configuration_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
INFO | 2024-05-07 14:30:13 | autotrain.trainers.clm.train_clm_sft:train:35 - loading model...
A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3-mini-128k-instruct:
modeling_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
low_cpu_mem_usage was None, now set to True since model is quantized.
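The accelerate notice earlier in this log is informational: it only reports that --dynamo_backend fell back to its default. A minimal sketch of a launch command that passes the flag explicitly, assuming the command list logged by autotrain.commands is rebuilt by hand. AutoTrain assembles this list itself, so `build_launch_command` is a hypothetical helper, not part of AutoTrain:

```python
def build_launch_command(config_path, mixed_precision="fp16", dynamo_backend="no"):
    """Rebuild the logged `accelerate launch` command with --dynamo_backend set explicitly."""
    return [
        "accelerate", "launch",
        "--num_machines", "1",
        "--num_processes", "1",
        "--mixed_precision", mixed_precision,
        # Passing the flag explicitly means accelerate has no default to report.
        "--dynamo_backend", dynamo_backend,
        "-m", "autotrain.trainers.clm",
        "--training_config", config_path,
    ]


command = build_launch_command("autotrain-y6hc2-nrxlz/training_params.json")
print(" ".join(command))
```

Running `accelerate config` once, as the notice suggests, is the other way to set these defaults.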
I am unsure whether the AutoTrain Space is supposed to pause after fine-tuning the model, because that is what happens (see log). I can access the fine-tuned model, but when I use the GUI for inference, it times out (see screenshot). Sending a POST request doesn't work either: it returns an error saying the model is still loading.
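For reference, the POST attempt described above can be sketched as a small retry loop. This is a sketch under assumptions, not a confirmed reproduction: it assumes the request goes to the serverless Inference API URL pattern shown in `API_URL`, that a still-loading model answers with HTTP 503 and an optional `estimated_time` field, and that the repo id is passed in by the caller. `post_once` and `query_with_retry` are hypothetical helper names.

```python
import json
import time
import urllib.error
import urllib.request

# Assumption: serverless Inference API URL pattern; the repo id is supplied by the caller.
API_URL = "https://api-inference.huggingface.co/models/{repo_id}"


def post_once(repo_id, token, prompt):
    """Send one POST request and return (status_code, parsed_body)."""
    request = urllib.request.Request(
        API_URL.format(repo_id=repo_id),
        data=json.dumps({"inputs": prompt}).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=60) as response:
            return response.status, json.load(response)
    except urllib.error.HTTPError as err:
        raw = err.read().decode("utf-8", errors="replace")
        try:
            return err.code, json.loads(raw)
        except ValueError:
            # Error body was not JSON; keep the raw text for inspection.
            return err.code, {"error": raw}


def query_with_retry(send, max_attempts=5, default_wait=10.0, sleep=time.sleep):
    """Call send() until it stops answering 503 or the attempts run out."""
    status, body = send()
    for _ in range(max_attempts - 1):
        if status != 503:
            break
        # Wait for the server's estimate if it gave one, else a fixed default.
        wait = body.get("estimated_time", default_wait) if isinstance(body, dict) else default_wait
        sleep(float(wait))
        status, body = send()
    return status, body
```

A call would look like `query_with_retry(lambda: post_once("<username>/<project_name>", token, "Hello"))`. If every attempt still returns the loading error, retrying alone is unlikely to help and the log above is the better place to look.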
Prerequisites
Backend
Hugging Face Space/Endpoints
Interface Used
UI
CLI Command
No response
UI Screenshots & Parameters
Error Logs
===== Application Startup at 2024-05-07 14:29:17 =====
==========
== CUDA ==
==========
CUDA Version 12.1.1
Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License. By pulling and using the container, you accept the terms and conditions of this license: https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
Found existing installation: autotrain-advanced 0.7.80.dev0 Uninstalling autotrain-advanced-0.7.80.dev0: Successfully uninstalled autotrain-advanced-0.7.80.dev0 Collecting autotrain-advanced Downloading autotrain_advanced-0.7.79-py3-none-any.whl.metadata (13 kB) Requirement already satisfied: albumentations==1.4.4 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (1.4.4) Requirement already satisfied: codecarbon==2.3.5 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (2.3.5) Requirement already satisfied: datasets~=2.19.0 in ./env/lib/python3.10/site-packages (from datasets[vision]~=2.19.0->autotrain-advanced) (2.19.1) Requirement already satisfied: evaluate==0.4.1 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.4.1) Requirement already satisfied: ipadic==1.0.0 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (1.0.0) Requirement already satisfied: jiwer==3.0.3 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (3.0.3) Requirement already satisfied: joblib==1.4.0 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (1.4.0) Requirement already satisfied: loguru==0.7.2 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.7.2) Requirement already satisfied: pandas==2.2.2 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (2.2.2) Requirement already satisfied: nltk==3.8.1 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (3.8.1) Requirement already satisfied: optuna==3.6.1 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (3.6.1) Requirement already satisfied: Pillow==10.3.0 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (10.3.0) Requirement already satisfied: protobuf==4.23.4 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (4.23.4) Requirement already satisfied: sacremoses==0.1.1 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.1.1) Requirement already satisfied: 
scikit-learn==1.4.2 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (1.4.2) Requirement already satisfied: sentencepiece==0.2.0 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.2.0) Requirement already satisfied: tqdm==4.66.2 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (4.66.2) Requirement already satisfied: werkzeug==3.0.2 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (3.0.2) Requirement already satisfied: xgboost==2.0.3 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (2.0.3) Requirement already satisfied: huggingface-hub==0.22.2 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.22.2) Requirement already satisfied: requests==2.31.0 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (2.31.0) Requirement already satisfied: einops==0.7.0 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.7.0) Requirement already satisfied: invisible-watermark==0.2.0 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.2.0) Requirement already satisfied: packaging==24.0 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (24.0) Requirement already satisfied: cryptography==42.0.5 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (42.0.5) Requirement already satisfied: nvitop==1.3.2 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (1.3.2) Requirement already satisfied: tensorboard==2.16.2 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (2.16.2) Requirement already satisfied: peft==0.10.0 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.10.0) Requirement already satisfied: trl==0.8.6 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.8.6) Requirement already satisfied: tiktoken==0.6.0 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.6.0) Requirement already satisfied: transformers==4.40.1 in ./env/lib/python3.10/site-packages (from 
autotrain-advanced) (4.40.1) Requirement already satisfied: accelerate==0.29.3 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.29.3) Requirement already satisfied: diffusers==0.27.2 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.27.2) Requirement already satisfied: rouge-score==0.1.2 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.1.2) Requirement already satisfied: py7zr==0.21.0 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.21.0) Requirement already satisfied: fastapi==0.110.2 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.110.2) Requirement already satisfied: uvicorn==0.29.0 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.29.0) Requirement already satisfied: python-multipart==0.0.9 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.0.9) Requirement already satisfied: pydantic==2.7.1 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (2.7.1) Requirement already satisfied: hf-transfer in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.1.6) Requirement already satisfied: pyngrok==7.1.6 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (7.1.6) Requirement already satisfied: authlib==1.3.0 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (1.3.0) Requirement already satisfied: itsdangerous==2.2.0 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (2.2.0) Requirement already satisfied: seqeval==1.2.2 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (1.2.2) Requirement already satisfied: httpx==0.27.0 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.27.0) Requirement already satisfied: pyyaml==6.0.1 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (6.0.1) Requirement already satisfied: bitsandbytes==0.43.1 in ./env/lib/python3.10/site-packages (from autotrain-advanced) (0.43.1) Requirement already satisfied: numpy>=1.17 in 
./env/lib/python3.10/site-packages (from accelerate==0.29.3->autotrain-advanced) (1.26.4) Requirement already satisfied: psutil in ./env/lib/python3.10/site-packages (from accelerate==0.29.3->autotrain-advanced) (5.9.8) Requirement already satisfied: torch>=1.10.0 in ./env/lib/python3.10/site-packages (from accelerate==0.29.3->autotrain-advanced) (2.3.0) Requirement already satisfied: safetensors>=0.3.1 in ./env/lib/python3.10/site-packages (from accelerate==0.29.3->autotrain-advanced) (0.4.3) Requirement already satisfied: scipy>=1.10.0 in ./env/lib/python3.10/site-packages (from albumentations==1.4.4->autotrain-advanced) (1.13.0) Requirement already satisfied: scikit-image>=0.21.0 in ./env/lib/python3.10/site-packages (from albumentations==1.4.4->autotrain-advanced) (0.23.2) Requirement already satisfied: typing-extensions>=4.9.0 in ./env/lib/python3.10/site-packages (from albumentations==1.4.4->autotrain-advanced) (4.9.0) Requirement already satisfied: opencv-python-headless>=4.9.0 in ./env/lib/python3.10/site-packages (from albumentations==1.4.4->autotrain-advanced) (4.9.0.80) Requirement already satisfied: arrow in ./env/lib/python3.10/site-packages (from codecarbon==2.3.5->autotrain-advanced) (1.3.0) Requirement already satisfied: pynvml in ./env/lib/python3.10/site-packages (from codecarbon==2.3.5->autotrain-advanced) (11.5.0) Requirement already satisfied: py-cpuinfo in ./env/lib/python3.10/site-packages (from codecarbon==2.3.5->autotrain-advanced) (9.0.0) Requirement already satisfied: rapidfuzz in ./env/lib/python3.10/site-packages (from codecarbon==2.3.5->autotrain-advanced) (3.9.0) Requirement already satisfied: click in ./env/lib/python3.10/site-packages (from codecarbon==2.3.5->autotrain-advanced) (8.1.7) Requirement already satisfied: prometheus-client in ./env/lib/python3.10/site-packages (from codecarbon==2.3.5->autotrain-advanced) (0.20.0) Requirement already satisfied: cffi>=1.12 in ./env/lib/python3.10/site-packages (from 
cryptography==42.0.5->autotrain-advanced) (1.16.0) Requirement already satisfied: importlib-metadata in ./env/lib/python3.10/site-packages (from diffusers==0.27.2->autotrain-advanced) (7.1.0) Requirement already satisfied: filelock in ./env/lib/python3.10/site-packages (from diffusers==0.27.2->autotrain-advanced) (3.13.1) Requirement already satisfied: regex!=2019.12.17 in ./env/lib/python3.10/site-packages (from diffusers==0.27.2->autotrain-advanced) (2024.4.28) Requirement already satisfied: dill in ./env/lib/python3.10/site-packages (from evaluate==0.4.1->autotrain-advanced) (0.3.8) Requirement already satisfied: xxhash in ./env/lib/python3.10/site-packages (from evaluate==0.4.1->autotrain-advanced) (3.4.1) Requirement already satisfied: multiprocess in ./env/lib/python3.10/site-packages (from evaluate==0.4.1->autotrain-advanced) (0.70.16) Requirement already satisfied: fsspec>=2021.05.0 in ./env/lib/python3.10/site-packages (from fsspec[http]>=2021.05.0->evaluate==0.4.1->autotrain-advanced) (2024.3.1) Requirement already satisfied: responses<0.19 in ./env/lib/python3.10/site-packages (from evaluate==0.4.1->autotrain-advanced) (0.18.0) Requirement already satisfied: starlette<0.38.0,>=0.37.2 in ./env/lib/python3.10/site-packages (from fastapi==0.110.2->autotrain-advanced) (0.37.2) Requirement already satisfied: anyio in ./env/lib/python3.10/site-packages (from httpx==0.27.0->autotrain-advanced) (4.3.0) Requirement already satisfied: certifi in ./env/lib/python3.10/site-packages (from httpx==0.27.0->autotrain-advanced) (2024.2.2) Requirement already satisfied: httpcore==1.* in ./env/lib/python3.10/site-packages (from httpx==0.27.0->autotrain-advanced) (1.0.5) Requirement already satisfied: idna in ./env/lib/python3.10/site-packages (from httpx==0.27.0->autotrain-advanced) (3.7) Requirement already satisfied: sniffio in ./env/lib/python3.10/site-packages (from httpx==0.27.0->autotrain-advanced) (1.3.1) Requirement already satisfied: PyWavelets>=1.1.1 in 
./env/lib/python3.10/site-packages (from invisible-watermark==0.2.0->autotrain-advanced) (1.6.0) Requirement already satisfied: opencv-python>=4.1.0.25 in ./env/lib/python3.10/site-packages (from invisible-watermark==0.2.0->autotrain-advanced) (4.9.0.80) Requirement already satisfied: nvidia-ml-py<12.536.0a0,>=11.450.51 in ./env/lib/python3.10/site-packages (from nvitop==1.3.2->autotrain-advanced) (12.535.161) Requirement already satisfied: cachetools>=1.0.1 in ./env/lib/python3.10/site-packages (from nvitop==1.3.2->autotrain-advanced) (5.3.3) Requirement already satisfied: termcolor>=1.0.0 in ./env/lib/python3.10/site-packages (from nvitop==1.3.2->autotrain-advanced) (2.4.0) Requirement already satisfied: alembic>=1.5.0 in ./env/lib/python3.10/site-packages (from optuna==3.6.1->autotrain-advanced) (1.13.1) Requirement already satisfied: colorlog in ./env/lib/python3.10/site-packages (from optuna==3.6.1->autotrain-advanced) (6.8.2) Requirement already satisfied: sqlalchemy>=1.3.0 in ./env/lib/python3.10/site-packages (from optuna==3.6.1->autotrain-advanced) (2.0.30) Requirement already satisfied: python-dateutil>=2.8.2 in ./env/lib/python3.10/site-packages (from pandas==2.2.2->autotrain-advanced) (2.9.0.post0) Requirement already satisfied: pytz>=2020.1 in ./env/lib/python3.10/site-packages (from pandas==2.2.2->autotrain-advanced) (2024.1) Requirement already satisfied: tzdata>=2022.7 in ./env/lib/python3.10/site-packages (from pandas==2.2.2->autotrain-advanced) (2024.1) Requirement already satisfied: texttable in ./env/lib/python3.10/site-packages (from py7zr==0.21.0->autotrain-advanced) (1.7.0) Requirement already satisfied: pycryptodomex>=3.16.0 in ./env/lib/python3.10/site-packages (from py7zr==0.21.0->autotrain-advanced) (3.20.0) Requirement already satisfied: pyzstd>=0.15.9 in ./env/lib/python3.10/site-packages (from py7zr==0.21.0->autotrain-advanced) (0.15.10) Requirement already satisfied: pyppmd<1.2.0,>=1.1.0 in ./env/lib/python3.10/site-packages (from 
py7zr==0.21.0->autotrain-advanced) (1.1.0) Requirement already satisfied: pybcj<1.1.0,>=1.0.0 in ./env/lib/python3.10/site-packages (from py7zr==0.21.0->autotrain-advanced) (1.0.2) Requirement already satisfied: multivolumefile>=0.2.3 in ./env/lib/python3.10/site-packages (from py7zr==0.21.0->autotrain-advanced) (0.2.3) Requirement already satisfied: inflate64<1.1.0,>=1.0.0 in ./env/lib/python3.10/site-packages (from py7zr==0.21.0->autotrain-advanced) (1.0.0) Requirement already satisfied: brotli>=1.1.0 in ./env/lib/python3.10/site-packages (from py7zr==0.21.0->autotrain-advanced) (1.1.0) Requirement already satisfied: annotated-types>=0.4.0 in ./env/lib/python3.10/site-packages (from pydantic==2.7.1->autotrain-advanced) (0.6.0) Requirement already satisfied: pydantic-core==2.18.2 in ./env/lib/python3.10/site-packages (from pydantic==2.7.1->autotrain-advanced) (2.18.2) Requirement already satisfied: charset-normalizer<4,>=2 in ./env/lib/python3.10/site-packages (from requests==2.31.0->autotrain-advanced) (2.0.4) Requirement already satisfied: urllib3<3,>=1.21.1 in ./env/lib/python3.10/site-packages (from requests==2.31.0->autotrain-advanced) (2.1.0) Requirement already satisfied: absl-py in ./env/lib/python3.10/site-packages (from rouge-score==0.1.2->autotrain-advanced) (2.1.0) Requirement already satisfied: six>=1.14.0 in ./env/lib/python3.10/site-packages (from rouge-score==0.1.2->autotrain-advanced) (1.16.0) Requirement already satisfied: threadpoolctl>=2.0.0 in ./env/lib/python3.10/site-packages (from scikit-learn==1.4.2->autotrain-advanced) (3.5.0) Requirement already satisfied: grpcio>=1.48.2 in ./env/lib/python3.10/site-packages (from tensorboard==2.16.2->autotrain-advanced) (1.63.0) Requirement already satisfied: markdown>=2.6.8 in ./env/lib/python3.10/site-packages (from tensorboard==2.16.2->autotrain-advanced) (3.6) Requirement already satisfied: setuptools>=41.0.0 in ./env/lib/python3.10/site-packages (from tensorboard==2.16.2->autotrain-advanced) 
(69.5.1) Requirement already satisfied: tensorboard-data-server<0.8.0,>=0.7.0 in ./env/lib/python3.10/site-packages (from tensorboard==2.16.2->autotrain-advanced) (0.7.2) Requirement already satisfied: tokenizers<0.20,>=0.19 in ./env/lib/python3.10/site-packages (from transformers==4.40.1->autotrain-advanced) (0.19.1) Requirement already satisfied: tyro>=0.5.11 in ./env/lib/python3.10/site-packages (from trl==0.8.6->autotrain-advanced) (0.8.3) Requirement already satisfied: h11>=0.8 in ./env/lib/python3.10/site-packages (from uvicorn==0.29.0->autotrain-advanced) (0.14.0) Requirement already satisfied: MarkupSafe>=2.1.1 in ./env/lib/python3.10/site-packages (from werkzeug==3.0.2->autotrain-advanced) (2.1.3) Requirement already satisfied: pyarrow>=12.0.0 in ./env/lib/python3.10/site-packages (from datasets~=2.19.0->datasets[vision]~=2.19.0->autotrain-advanced) (16.0.0) Requirement already satisfied: pyarrow-hotfix in ./env/lib/python3.10/site-packages (from datasets~=2.19.0->datasets[vision]~=2.19.0->autotrain-advanced) (0.6) Requirement already satisfied: aiohttp in ./env/lib/python3.10/site-packages (from datasets~=2.19.0->datasets[vision]~=2.19.0->autotrain-advanced) (3.9.5) Requirement already satisfied: Mako in ./env/lib/python3.10/site-packages (from alembic>=1.5.0->optuna==3.6.1->autotrain-advanced) (1.3.3) Requirement already satisfied: pycparser in ./env/lib/python3.10/site-packages (from cffi>=1.12->cryptography==42.0.5->autotrain-advanced) (2.22) Requirement already satisfied: aiosignal>=1.1.2 in ./env/lib/python3.10/site-packages (from aiohttp->datasets~=2.19.0->datasets[vision]~=2.19.0->autotrain-advanced) (1.3.1) Requirement already satisfied: attrs>=17.3.0 in ./env/lib/python3.10/site-packages (from aiohttp->datasets~=2.19.0->datasets[vision]~=2.19.0->autotrain-advanced) (23.2.0) Requirement already satisfied: frozenlist>=1.1.1 in ./env/lib/python3.10/site-packages (from aiohttp->datasets~=2.19.0->datasets[vision]~=2.19.0->autotrain-advanced) (1.4.1) 
Requirement already satisfied: multidict<7.0,>=4.5 in ./env/lib/python3.10/site-packages (from aiohttp->datasets~=2.19.0->datasets[vision]~=2.19.0->autotrain-advanced) (6.0.5) Requirement already satisfied: yarl<2.0,>=1.0 in ./env/lib/python3.10/site-packages (from aiohttp->datasets~=2.19.0->datasets[vision]~=2.19.0->autotrain-advanced) (1.9.4) Requirement already satisfied: async-timeout<5.0,>=4.0 in ./env/lib/python3.10/site-packages (from aiohttp->datasets~=2.19.0->datasets[vision]~=2.19.0->autotrain-advanced) (4.0.3) Requirement already satisfied: networkx>=2.8 in ./env/lib/python3.10/site-packages (from scikit-image>=0.21.0->albumentations==1.4.4->autotrain-advanced) (3.1) Requirement already satisfied: imageio>=2.33 in ./env/lib/python3.10/site-packages (from scikit-image>=0.21.0->albumentations==1.4.4->autotrain-advanced) (2.34.1) Requirement already satisfied: tifffile>=2022.8.12 in ./env/lib/python3.10/site-packages (from scikit-image>=0.21.0->albumentations==1.4.4->autotrain-advanced) (2024.5.3) Requirement already satisfied: lazy-loader>=0.4 in ./env/lib/python3.10/site-packages (from scikit-image>=0.21.0->albumentations==1.4.4->autotrain-advanced) (0.4) Requirement already satisfied: greenlet!=0.4.17 in ./env/lib/python3.10/site-packages (from sqlalchemy>=1.3.0->optuna==3.6.1->autotrain-advanced) (3.0.3) Requirement already satisfied: exceptiongroup>=1.0.2 in ./env/lib/python3.10/site-packages (from anyio->httpx==0.27.0->autotrain-advanced) (1.2.1) Requirement already satisfied: sympy in ./env/lib/python3.10/site-packages (from torch>=1.10.0->accelerate==0.29.3->autotrain-advanced) (1.12) Requirement already satisfied: jinja2 in ./env/lib/python3.10/site-packages (from torch>=1.10.0->accelerate==0.29.3->autotrain-advanced) (3.1.3) Requirement already satisfied: docstring-parser>=0.14.1 in ./env/lib/python3.10/site-packages (from tyro>=0.5.11->trl==0.8.6->autotrain-advanced) (0.16) Requirement already satisfied: rich>=11.1.0 in 
./env/lib/python3.10/site-packages (from tyro>=0.5.11->trl==0.8.6->autotrain-advanced) (13.7.1) Requirement already satisfied: shtab>=1.5.6 in ./env/lib/python3.10/site-packages (from tyro>=0.5.11->trl==0.8.6->autotrain-advanced) (1.7.1) Requirement already satisfied: types-python-dateutil>=2.8.10 in ./env/lib/python3.10/site-packages (from arrow->codecarbon==2.3.5->autotrain-advanced) (2.9.0.20240316) Requirement already satisfied: zipp>=0.5 in ./env/lib/python3.10/site-packages (from importlib-metadata->diffusers==0.27.2->autotrain-advanced) (3.18.1) Requirement already satisfied: markdown-it-py>=2.2.0 in ./env/lib/python3.10/site-packages (from rich>=11.1.0->tyro>=0.5.11->trl==0.8.6->autotrain-advanced) (3.0.0) Requirement already satisfied: pygments<3.0.0,>=2.13.0 in ./env/lib/python3.10/site-packages (from rich>=11.1.0->tyro>=0.5.11->trl==0.8.6->autotrain-advanced) (2.18.0) Requirement already satisfied: mpmath>=0.19 in ./env/lib/python3.10/site-packages (from sympy->torch>=1.10.0->accelerate==0.29.3->autotrain-advanced) (1.3.0) Requirement already satisfied: mdurl~=0.1 in ./env/lib/python3.10/site-packages (from markdown-it-py>=2.2.0->rich>=11.1.0->tyro>=0.5.11->trl==0.8.6->autotrain-advanced) (0.1.2) Downloading autotrain_advanced-0.7.79-py3-none-any.whl (276 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 276.0/276.0 kB 18.2 MB/s eta 0:00:00 Installing collected packages: autotrain-advanced Successfully installed autotrain-advanced-0.7.79 Your installed package:32 - Starting AutoTrain...
WARNING | 2024-05-07 14:29:34 | autotrain.trainers.common:init:174 - Parameters not supplied by user and set to default: valid_split, scheduler, seed, train_split, model, token, gradient_accumulation, batch_size, save_total_limit, warmup_ratio, weight_decay, max_prompt_length, auto_find_batch_size, lora_r, add_eos_token, use_flash_attention_2, rejected_text_column, disable_gradient_checkpointing, project_name, trainer, lora_dropout, lr, data_path, lora_alpha, push_to_hub, model_ref, text_column, prompt_text_column, model_max_length, merge_adapter, dpo_beta, evaluation_strategy, optimizer, username, max_grad_norm, logging_steps
WARNING | 2024-05-07 14:29:34 | autotrain.trainers.common:init:174 - Parameters not supplied by user and set to default: target_column, valid_split, scheduler, project_name, seed, lr, train_split, data_path, epochs, model, push_to_hub, token, gradient_accumulation, text_column, batch_size, warmup_ratio, weight_decay, save_total_limit, evaluation_strategy, max_seq_length, auto_find_batch_size, optimizer, username, max_grad_norm, logging_steps
WARNING | 2024-05-07 14:29:34 | autotrain.trainers.common:init:174 - Parameters not supplied by user and set to default: target_column, valid_split, scheduler, project_name, seed, lr, train_split, data_path, epochs, model, push_to_hub, token, gradient_accumulation, batch_size, warmup_ratio, weight_decay, save_total_limit, evaluation_strategy, auto_find_batch_size, image_column, optimizer, username, max_grad_norm, logging_steps
WARNING | 2024-05-07 14:29:34 | autotrain.trainers.common:init:174 - Parameters not supplied by user and set to default: valid_split, scheduler, seed, train_split, model, token, gradient_accumulation, quantization, batch_size, warmup_ratio, weight_decay, save_total_limit, max_seq_length, auto_find_batch_size, lora_r, logging_steps, target_column, peft, project_name, lora_dropout, lr, data_path, epochs, lora_alpha, push_to_hub, text_column, evaluation_strategy, optimizer, username, max_grad_norm, max_target_length
WARNING | 2024-05-07 14:29:34 | autotrain.trainers.common:init:174 - Parameters not supplied by user and set to default: target_columns, valid_split, numerical_columns, project_name, id_column, seed, task, train_split, data_path, model, time_limit, push_to_hub, token, categorical_columns, num_trials, username
WARNING | 2024-05-07 14:29:34 | autotrain.trainers.common:init:174 - Parameters not supplied by user and set to default: adam_weight_decay, username, scheduler, class_prompt, seed, logging, adam_beta1, adam_epsilon, model, rank, token, image_path, validation_epochs, class_labels_conditioning, num_validation_images, sample_batch_size, scale_lr, lr_power, dataloader_num_workers, text_encoder_use_attention_mask, class_image_path, prior_loss_weight, tokenizer_max_length, local_rank, project_name, validation_prompt, warmup_steps, epochs, resume_from_checkpoint, xl, checkpoints_total_limit, num_cycles, allow_tf32, adam_beta2, center_crop, push_to_hub, validation_images, prior_generation_precision, revision, prior_preservation, num_class_images, checkpointing_steps, tokenizer, pre_compute_text_embeddings, max_grad_norm
WARNING | 2024-05-07 14:29:34 | autotrain.trainers.common:init:174 - Parameters not supplied by user and set to default: username, valid_split, scheduler, project_name, seed, lr, train_split, data_path, epochs, model, push_to_hub, token, gradient_accumulation, batch_size, tokens_column, warmup_ratio, weight_decay, save_total_limit, evaluation_strategy, max_seq_length, auto_find_batch_size, optimizer, tags_column, max_grad_norm, logging_steps
WARNING | 2024-05-07 14:29:34 | autotrain.trainers.common:init:174 - Parameters not supplied by user and set to default: target_column, valid_split, scheduler, project_name, seed, lr, train_split, data_path, epochs, model, push_to_hub, token, gradient_accumulation, text_column, batch_size, warmup_ratio, weight_decay, save_total_limit, evaluation_strategy, max_seq_length, auto_find_batch_size, optimizer, username, max_grad_norm, logging_steps
INFO | 2024-05-07 14:29:35 | autotrain.app:<module>:156 - AutoTrain started successfully
INFO | 2024-05-07 14:29:36 | autotrain.app:fetch_params:214 - Task: llm:sft
INFO | 2024-05-07 14:30:06 | autotrain.app:handle_form:463 - hardware: Local
INFO | 2024-05-07 14:30:06 | autotrain.app:handle_form:554 - Task: lm_training
INFO | 2024-05-07 14:30:06 | autotrain.app:handle_form:555 - Column mapping: {'text': 'text'}
Your installed package nvidia-ml-py is corrupted. Skip patch functions nvmlDeviceGet{Compute,Graphics,MPSCompute}RunningProcesses. You may get incorrect or incomplete results. Please consider reinstall package nvidia-ml-py via pip3 install --force-reinstall nvidia-ml-py nvitop.
Your installed package nvidia-ml-py is corrupted. Skip patch functions nvmlDeviceGetMemoryInfo. You may get incorrect or incomplete results. Please consider reinstall package nvidia-ml-py via pip3 install --force-reinstall nvidia-ml-py nvitop.
INFO | 2024-05-07 14:29:34 | autotrain.app:<module>:32 - Starting AutoTrain...
Saving the dataset (0/1 shards): 0%| | 0/9 [00:00<?, ? examples/s]
Saving the dataset (1/1 shards): 100%|██████████| 9/9 [00:00<00:00, 3624.46 examples/s]
Saving the dataset (1/1 shards): 100%|██████████| 9/9 [00:00<00:00, 3390.10 examples/s]
Saving the dataset (0/1 shards): 0%| | 0/9 [00:00<?, ? examples/s] Saving the dataset (1/1 shards): 100%|██████████| 9/9 [00:00<00:00, 4258.66 examples/s] Saving the dataset (1/1 shards): 100%|██████████| 9/9 [00:00<00:00, 3981.51 examples/s] WARNING | 2024-05-07 14:30:06 | autotrain.trainers.common:init:174 - Parameters not supplied by user and set to default: valid_split, seed, train_split, quantization, save_total_limit, warmup_ratio, weight_decay, max_prompt_length, max_completion_length, auto_find_batch_size, lora_r, add_eos_token, use_flash_attention_2, padding, disable_gradient_checkpointing, lora_dropout, lora_alpha, model_ref, merge_adapter, dpo_beta, evaluation_strategy, max_grad_norm, logging_steps INFO | 2024-05-07 14:30:06 | autotrain.backend:create:300 - Starting local training... INFO | 2024-05-07 14:30:06 | autotrain.commands:launch_command:327 - ['accelerate', 'launch', '--num_machines', '1', '--num_processes', '1', '--mixed_precision', 'fp16', '-m', 'autotrain.trainers.clm', '--training_config', 'autotrain-y6hc2-nrxlz/training_params.json'] INFO | 2024-05-07 14:30:06 | autotrain.commands:launch_command:328 - {'model': 'microsoft/Phi-3-mini-128k-instruct', 'project_name': 'autotrain-y6hc2-nrxlz', 'data_path': 'autotrain-y6hc2-nrxlz/autotrain-data', 'train_split': 'train', 'valid_split': None, 'add_eos_token': True, 'block_size': 1024, 'model_max_length': 4096, 'padding': 'right', 'trainer': 'sft', 'use_flash_attention_2': False, 'log': 'tensorboard', 'disable_gradient_checkpointing': False, 'logging_steps': -1, 'evaluation_strategy': 'epoch', 'save_total_limit': 1, 'auto_find_batch_size': False, 'mixed_precision': 'fp16', 'lr': 3e-05, 'epochs': 3, 'batch_size': 2, 'warmup_ratio': 0.1, 'gradient_accumulation': 4, 'optimizer': 'adamw_torch', 'scheduler': 'linear', 'weight_decay': 0.0, 'max_grad_norm': 1.0, 'seed': 42, 'chat_template': 'none', 'quantization': 'int4', 'target_modules': 'all-linear', 'merge_adapter': False, 'peft': True, 'lora_r': 16, 
'lora_alpha': 32, 'lora_dropout': 0.05, 'model_ref': None, 'dpo_beta': 0.1, 'max_prompt_length': 128, 'max_completion_length': None, 'prompt_text_column': 'autotrain_prompt', 'text_column': 'autotrain_text', 'rejected_text_column': 'autotrain_rejected_text', 'push_to_hub': True, 'username': 'bertilmuth', 'token': '*****'} INFO | 2024-05-07 14:30:06 | autotrain.backend:create:305 - Training PID: 65 The following values were not passed to
`accelerate launch` and had defaults used instead: `--dynamo_backend` was set to a value of `'no'`. To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`
. INFO | 2024-05-07 14:30:13 | autotrain.trainers.clm.train_clm_sft:train:14 - Starting SFT training... INFO | 2024-05-07 14:30:13 | autotrain.trainers.clm.utils:process_input_data:311 - loading dataset from disk INFO | 2024-05-07 14:30:13 | autotrain.trainers.clm.utils:process_input_data:352 - Train data: Dataset({ features: ['autotrain_text'], num_rows: 9 }) INFO | 2024-05-07 14:30:13 | autotrain.trainers.clm.utils:process_input_data:353 - Valid data: None Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. INFO | 2024-05-07 14:30:13 | autotrain.trainers.clm.utils:configure_logging_steps:423 - configuring logging steps INFO | 2024-05-07 14:30:13 | autotrain.trainers.clm.utils:configure_logging_steps:436 - Logging steps: 1 INFO | 2024-05-07 14:30:13 | autotrain.trainers.clm.utils:configure_training_args:441 - configuring training args INFO | 2024-05-07 14:30:13 | autotrain.trainers.clm.utils:configure_block_size:504 - Using block size 1024 INFO | 2024-05-07 14:30:13 | autotrain.trainers.clm.train_clm_sft:train:27 - loading model config... A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3-mini-128k-instruct:low_cpu_mem_usage
was None, now set to True since model is quantized.Downloading shards: 0%| | 0/2 [00:00<?, ?it/s] Downloading shards: 50%|█████ | 1/2 [00:05<00:05, 5.28s/it] Downloading shards: 100%|██████████| 2/2 [00:12<00:00, 6.71s/it] Downloading shards: 100%|██████████| 2/2 [00:12<00:00, 6.50s/it]
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s] Loading checkpoint shards: 50%|█████ | 1/2 [00:08<00:08, 8.51s/it] Loading checkpoint shards: 100%|██████████| 2/2 [00:12<00:00, 5.79s/it] Loading checkpoint shards: 100%|██████████| 2/2 [00:12<00:00, 6.19s/it] INFO | 2024-05-07 14:30:39 | autotrain.trainers.clm.train_clm_sft:train:66 - model dtype: torch.float16 INFO | 2024-05-07 14:30:39 | autotrain.trainers.clm.train_clm_sft:train:79 - creating trainer
Generating train split: 0 examples [00:00, ? examples/s] Generating train split: 2 examples [00:00, 159.62 examples/s] INFO | 2024-05-07 14:30:40 | autotrain.trainers.common:on_train_begin:231 - Starting to train...
0%| | 0/3 [00:00<?, ?it/s]You are not running the flash-attention implementation, expect numerical differences.
0%| | 0/1 [00:00<?, ?it/s]
events.out.tfevents.1715092240.r-bertilmuth-phi-3-vqu08txi-99074-5gv27.66.0: 0%| | 0.00/8.00k [00:00<?, ?B/s] events.out.tfevents.1715092240.r-bertilmuth-phi-3-vqu08txi-99074-5gv27.66.0: 100%|██████████| 8.00k/8.00k [00:00<00:00, 65.9kB/s]
100%|██████████| 1/1 [00:00<00:00, 4.85it/s] 100%|██████████| 1/1 [00:00<00:00, 4.85it/s]
33%|███▎ | 1/3 [00:06<00:12, 6.40s/it]INFO | 2024-05-07 14:30:47 | autotrain.trainers.common:on_log:226 - {'loss': 0.3232, 'grad_norm': 0.5817663073539734, 'learning_rate': 3e-05, 'epoch': 1.0}
{'loss': 0.3232, 'grad_norm': 0.5817663073539734, 'learning_rate': 3e-05, 'epoch': 1.0}
33%|███▎ | 1/3 [00:06<00:12, 6.40s/it] 67%|██████▋ | 2/3 [00:11<00:05, 5.58s/it]INFO | 2024-05-07 14:30:52 | autotrain.trainers.common:on_log:226 - {'loss': 0.3232, 'grad_norm': 0.5791377425193787, 'learning_rate': 1.5e-05, 'epoch': 2.0}
{'loss': 0.3232, 'grad_norm': 0.5791377425193787, 'learning_rate': 1.5e-05, 'epoch': 2.0}
67%|██████▋ | 2/3 [00:11<00:05, 5.58s/it] 100%|██████████| 3/3 [00:16<00:00, 5.32s/it]INFO | 2024-05-07 14:30:57 | autotrain.trainers.common:on_log:226 - {'loss': 0.3018, 'grad_norm': 0.49286404252052307, 'learning_rate': 0.0, 'epoch': 3.0}
{'loss': 0.3018, 'grad_norm': 0.49286404252052307, 'learning_rate': 0.0, 'epoch': 3.0}
100%|██████████| 3/3 [00:16<00:00, 5.32s/it]INFO | 2024-05-07 14:30:57 | autotrain.trainers.common:on_log:226 - {'train_runtime': 16.4376, 'train_samples_per_second': 0.365, 'train_steps_per_second': 0.183, 'train_loss': 0.3160308400789897, 'epoch': 3.0}
{'train_runtime': 16.4376, 'train_samples_per_second': 0.365, 'train_steps_per_second': 0.183, 'train_loss': 0.3160308400789897, 'epoch': 3.0}
100%|██████████| 3/3 [00:16<00:00, 5.32s/it] 100%|██████████| 3/3 [00:16<00:00, 5.48s/it] INFO | 2024-05-07 14:30:57 | autotrain.trainers.clm.utils:post_training_steps:263 - Finished training, saving model... INFO | 2024-05-07 14:30:59 | autotrain.trainers.clm.utils:post_training_steps:293 - Pushing model to hub...
0%| | 0/4 [00:00<?, ?it/s] 25%|██▌ | 1/4 [00:04<00:14, 4.78s/it] 50%|█████ | 2/4 [00:05<00:04, 2.18s/it] 75%|███████▌ | 3/4 [00:05<00:01, 1.29s/it] 100%|██████████| 4/4 [00:05<00:00, 1.16it/s] 100%|██████████| 4/4 [00:05<00:00, 1.40s/it] INFO | 2024-05-07 14:31:07 | autotrain.trainers.common:pause_space:77 - Pausing space...
Additional Information
I am unsure whether the AutoTrain Space is supposed to pause after fine-tuning the model, which is what happens (see the log). I can access the fine-tuned model, but when I use the GUI for inference, it times out (see screenshot). Sending a POST request doesn't work either: it returns an error saying the model is still loading.
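
For reference, here is a minimal sketch of the kind of POST request described above, assuming the hosted Hugging Face Inference API endpoint and its `wait_for_model` option, which asks the API to hold the request until the model has loaded rather than returning a "still loading" error right away. The model id and token in the usage comment are placeholders, and `build_request` and `query` are hypothetical helper names, not part of AutoTrain. Since the log shows `'peft': True` and `'merge_adapter': False`, the pushed repository likely holds only the LoRA adapter, and this sketch does not settle whether the hosted API can serve it.

```python
import json
import urllib.error
import urllib.request

# Assumed endpoint of the hosted Inference API; adjust if your setup differs.
API_BASE = "https://api-inference.huggingface.co/models"


def build_request(model_id, token, prompt, wait_for_model=True):
    """Build the URL, headers, and JSON payload for a text-generation call."""
    url = f"{API_BASE}/{model_id}"
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    payload = {
        "inputs": prompt,
        # Ask the API to wait for the model to load instead of failing fast.
        "options": {"wait_for_model": wait_for_model},
    }
    return url, headers, payload


def query(model_id, token, prompt, timeout=300):
    """Send the request; return (status_code, parsed JSON or raw error text)."""
    url, headers, payload = build_request(model_id, token, prompt)
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # Keep the error body: it usually explains why the model is unavailable.
        return err.code, err.read().decode("utf-8", errors="replace")


# Usage (placeholders, not real values):
# status, body = query("<username>/<project_name>", "<hf_token>", "Hello")
# print(status, body)
```

If the request still fails with `wait_for_model` enabled, the status code and error body returned by `query` would be useful to attach to this report.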