huggingface / autotrain-advanced

AutoTrain does not recognize NVIDIA RTX 4060 GPU #708

Open Herkaba opened 1 month ago

Herkaba commented 1 month ago

Prerequisites

Backend

Local

Interface Used

UI

CLI Command

No response

UI Screenshots & Parameters

(screenshot attached)

Error Logs

Device 0: NVIDIA GeForce RTX 4060 - 1387MiB/8188MiB

INFO | 2024-07-21 12:43:20 | autotrain.app.ui_routes::31 - Starting AutoTrain...
INFO | 2024-07-21 12:43:25 | autotrain.app.ui_routes::293 - AutoTrain started successfully
INFO | 2024-07-21 12:43:25 | autotrain.app.app::13 - Starting AutoTrain...
INFO | 2024-07-21 12:43:25 | autotrain.app.app::23 - AutoTrain version: 0.8.4
INFO | 2024-07-21 12:43:25 | autotrain.app.app::24 - AutoTrain started successfully
INFO: Started server process [3135]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:7860 (Press CTRL+C to quit)
INFO: 127.0.0.1:45810 - "GET / HTTP/1.1" 307 Temporary Redirect
INFO | 2024-07-21 12:43:28 | autotrain.app.ui_routes:fetch_params:376 - Task: llm:sft
INFO | 2024-07-21 12:43:45 | autotrain.app.ui_routes:fetch_params:376 - Task: llm:generic
INFO | 2024-07-21 12:43:47 | autotrain.app.ui_routes:fetch_params:376 - Task: llm:sft
INFO | 2024-07-21 12:43:51 | autotrain.app.ui_routes:fetch_params:376 - Task: llm:orpo
INFO | 2024-07-21 12:43:54 | autotrain.app.ui_routes:fetch_params:376 - Task: llm:generic
INFO | 2024-07-21 12:43:56 | autotrain.app.ui_routes:fetch_params:376 - Task: llm:dpo
INFO | 2024-07-21 12:44:01 | autotrain.app.ui_routes:fetch_params:376 - Task: llm:reward
INFO | 2024-07-21 12:44:04 | autotrain.app.ui_routes:fetch_params:376 - Task: llm:sft
INFO | 2024-07-21 12:44:09 | autotrain.app.ui_routes:fetch_params:376 - Task: llm:sft
INFO | 2024-07-21 12:47:32 | autotrain.app.ui_routes:handle_form:491 - hardware: local-ui
INFO | 2024-07-21 12:47:32 | autotrain.app.ui_routes:handle_form:607 - Task: lm_training
INFO | 2024-07-21 12:47:32 | autotrain.app.ui_routes:handle_form:608 - Column mapping: {'text': 'text'}
Saving the dataset (1/1 shards): 100%|██████████| 83/83 [00:00<00:00, 20723.09 examples/s]
Saving the dataset (1/1 shards): 100%|██████████| 83/83 [00:00<00:00, 20154.42 examples/s]
Saving the dataset (1/1 shards): 100%|██████████| 83/83 [00:00<00:00, 74290.92 examples/s]
Saving the dataset (1/1 shards): 100%|██████████| 83/83 [00:00<00:00, 68515.50 examples/s]
INFO | 2024-07-21 12:47:32 | autotrain.backends.local:create:8 - Starting local training...
INFO | 2024-07-21 12:47:32 | autotrain.commands:launch_command:400 - ['accelerate', 'launch', '--num_machines', '1', '--num_processes', '1', '--mixed_precision', 'fp16', '-m', 'autotrain.trainers.clm', '--training_config', 'trainTeszt/training_params.json']
INFO | 2024-07-21 12:47:32 | autotrain.commands:launch_command:401 - {'model': 'meta-llama/Meta-Llama-3-8B-Instruct', 'project_name': 'trainTeszt', 'data_path': 'trainTeszt/autotrain-data', 'train_split': 'train', 'valid_split': None, 'add_eos_token': True, 'block_size': 1024, 'model_max_length': 2048, 'padding': 'right', 'trainer': 'sft', 'use_flash_attention_2': False, 'log': 'tensorboard', 'disable_gradient_checkpointing': False, 'logging_steps': -1, 'eval_strategy': 'epoch', 'save_total_limit': 1, 'auto_find_batch_size': False, 'mixed_precision': 'fp16', 'lr': 3e-05, 'epochs': 3, 'batch_size': 6, 'warmup_ratio': 0.1, 'gradient_accumulation': 4, 'optimizer': 'adamw_torch', 'scheduler': 'linear', 'weight_decay': 0.0, 'max_grad_norm': 1.0, 'seed': 42, 'chat_template': 'none', 'quantization': 'int4', 'target_modules': 'all-linear', 'merge_adapter': False, 'peft': True, 'lora_r': 16, 'lora_alpha': 32, 'lora_dropout': 0.05, 'model_ref': None, 'dpo_beta': 0.1, 'max_prompt_length': 128, 'max_completion_length': None, 'prompt_text_column': 'autotrain_prompt', 'text_column': 'autotrain_text', 'rejected_text_column': 'autotrain_rejected_text', 'push_to_hub': True, 'username': 'herkaba', 'token': '*****', 'unsloth': False}
INFO | 2024-07-21 12:47:32 | autotrain.backends.local:create:13 - Training PID: 4438
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
INFO | 2024-07-21 12:47:37 | autotrain.trainers.clm.train_clm_sft:train:12 - Starting SFT training...
INFO | 2024-07-21 12:47:37 | autotrain.trainers.clm.utils:process_input_data:335 - loading dataset from disk
INFO | 2024-07-21 12:47:37 | autotrain.trainers.clm.utils:process_input_data:394 - Train data: Dataset({
    features: ['autotrain_text'],
    num_rows: 83
})
INFO | 2024-07-21 12:47:37 | autotrain.trainers.clm.utils:process_input_data:395 - Valid data: None
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO | 2024-07-21 12:47:37 | autotrain.trainers.clm.utils:configure_logging_steps:467 - configuring logging steps
INFO | 2024-07-21 12:47:37 | autotrain.trainers.clm.utils:configure_logging_steps:480 - Logging steps: 2
INFO | 2024-07-21 12:47:37 | autotrain.trainers.clm.utils:configure_training_args:485 - configuring training args
INFO | 2024-07-21 12:47:37 | autotrain.trainers.clm.utils:configure_block_size:548 - Using block size 1024
INFO | 2024-07-21 12:47:37 | autotrain.trainers.clm.utils:get_model:583 - Can use unsloth: False
WARNING | 2024-07-21 12:47:37 | autotrain.trainers.clm.utils:get_model:625 - Unsloth not available, continuing without it...
INFO | 2024-07-21 12:47:37 | autotrain.trainers.clm.utils:get_model:627 - loading model config...
INFO | 2024-07-21 12:47:37 | autotrain.trainers.clm.utils:get_model:635 - loading model...
ERROR | 2024-07-21 12:47:38 | autotrain.trainers.common:wrapper:120 - train has failed due to an exception: Traceback (most recent call last):
  File "/home/herkaba/.local/lib/python3.10/site-packages/autotrain/trainers/common.py", line 117, in wrapper
    return func(*args, **kwargs)
  File "/home/herkaba/.local/lib/python3.10/site-packages/autotrain/trainers/clm/main.py", line 28, in train
    train_sft(config)
  File "/home/herkaba/.local/lib/python3.10/site-packages/autotrain/trainers/clm/train_clm_sft.py", line 25, in train
    model = utils.get_model(config, tokenizer)
  File "/home/herkaba/.local/lib/python3.10/site-packages/autotrain/trainers/clm/utils.py", line 649, in get_model
    model = AutoModelForCausalLM.from_pretrained(
  File "/home/herkaba/.local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
    return model_class.from_pretrained(
  File "/home/herkaba/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3279, in from_pretrained
    hf_quantizer.validate_environment(
  File "/home/herkaba/.local/lib/python3.10/site-packages/transformers/quantizers/quantizer_bnb_4bit.py", line 62, in validate_environment
    raise RuntimeError("No GPU found. A GPU is needed for quantization.")
RuntimeError: No GPU found. A GPU is needed for quantization.
ERROR | 2024-07-21 12:47:38 | autotrain.trainers.common:wrapper:121 - No GPU found. A GPU is needed for quantization.
INFO | 2024-07-21 12:47:42 | autotrain.app.utils:get_running_jobs:26 - Killing PID: 4438
INFO | 2024-07-21 12:47:42 | autotrain.app.utils:kill_process_by_pid:52 - Sent SIGTERM to process with PID 4438

Additional Information

I have the CUDA toolkit installed, and I am using WSL Ubuntu on Windows 11.

abhishekkrthakur commented 1 month ago

hi. it seems like the cuda drivers are not properly installed. i was able to use it just fine on a 4090, 3090 and titan rtx. installing the cuda drivers properly will fix the issue.

you can also check if torch is detecting the gpu:

import torch

# this is essentially the check that fails inside
# transformers' quantizer_bnb_4bit.validate_environment
if torch.cuda.is_available():
    print("GPU is available")
    print("GPU Name:", torch.cuda.get_device_name(0))
else:
    print("GPU is not available")

Herkaba commented 1 month ago

I installed the CUDA toolkit and torch is detecting it, but autotrain still doesn't work. Do I need to install the CUDA toolkit on my Windows operating system, or just in the WSL Ubuntu?

abhishekkrthakur commented 1 month ago

only WSL
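
for reference, a quick way to confirm the WSL side can see the driver at all is the sketch below (in WSL2, nvidia-smi is supplied to the guest by the Windows NVIDIA driver, so it should work inside Ubuntu without installing a separate Linux driver):

import shutil
import subprocess

# In WSL2 the Windows NVIDIA driver exposes nvidia-smi to the guest
# (typically under /usr/lib/wsl/lib), so this should list the RTX 4060
# if the passthrough is working.
if shutil.which("nvidia-smi"):
    print(subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout)
else:
    print("nvidia-smi not found; the Windows driver is not exposing the GPU to WSL")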

abhishekkrthakur commented 1 month ago

also, make sure that the correct path is set in CUDA_HOME:

No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
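
that warning line means torch itself cannot see a usable CUDA runtime (torch.cuda.is_available() is False), regardless of where CUDA_HOME points; a common cause is a CPU-only torch wheel. a minimal sketch to see what the training process will actually pick up:

import os
import torch

# If "CUDA build" prints None, the installed torch wheel is CPU-only and
# torch.cuda.is_available() will stay False no matter how CUDA_HOME is set.
print("CUDA_HOME      :", os.environ.get("CUDA_HOME"))
print("torch version  :", torch.__version__)
print("CUDA build     :", torch.version.cuda)
print("CUDA available :", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device         :", torch.cuda.get_device_name(0))

if torch.version.cuda comes back None inside WSL, reinstalling a CUDA-enabled torch wheel (for example from the cu121 index) is the usual fix.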

Herkaba commented 1 month ago

hmm, i changed CUDA_HOME to /usr/local/cuda but i still get this error:

INFO | 2024-07-22 13:52:00 | autotrain.backends.local:create:13 - Training PID: 3239
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
INFO | 2024-07-22 13:52:04 | autotrain.trainers.clm.train_clm_sft:train:12 - Starting SFT training...
INFO | 2024-07-22 13:52:04 | autotrain.trainers.clm.utils:process_input_data:335 - loading dataset from disk
INFO | 2024-07-22 13:52:04 | autotrain.trainers.clm.utils:process_input_data:394 - Train data: Dataset({
    features: ['autotrain_text'],
    num_rows: 83
})
INFO | 2024-07-22 13:52:04 | autotrain.trainers.clm.utils:process_input_data:395 - Valid data: None
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO | 2024-07-22 13:52:05 | autotrain.trainers.clm.utils:configure_logging_steps:467 - configuring logging steps
INFO | 2024-07-22 13:52:05 | autotrain.trainers.clm.utils:configure_logging_steps:480 - Logging steps: 4
INFO | 2024-07-22 13:52:05 | autotrain.trainers.clm.utils:configure_training_args:485 - configuring training args
INFO | 2024-07-22 13:52:05 | autotrain.trainers.clm.utils:configure_block_size:548 - Using block size 1024
INFO | 2024-07-22 13:52:05 | autotrain.trainers.clm.utils:get_model:583 - Can use unsloth: False
WARNING | 2024-07-22 13:52:05 | autotrain.trainers.clm.utils:get_model:625 - Unsloth not available, continuing without it...
INFO | 2024-07-22 13:52:05 | autotrain.trainers.clm.utils:get_model:627 - loading model config...
INFO | 2024-07-22 13:52:05 | autotrain.trainers.clm.utils:get_model:635 - loading model...
ERROR | 2024-07-22 13:52:05 | autotrain.trainers.common:wrapper:120 - train has failed due to an exception: Traceback (most recent call last):
  File "/home/herkaba/.local/lib/python3.10/site-packages/autotrain/trainers/common.py", line 117, in wrapper
    return func(*args, **kwargs)
  File "/home/herkaba/.local/lib/python3.10/site-packages/autotrain/trainers/clm/main.py", line 28, in train
    train_sft(config)
  File "/home/herkaba/.local/lib/python3.10/site-packages/autotrain/trainers/clm/train_clm_sft.py", line 25, in train
    model = utils.get_model(config, tokenizer)
  File "/home/herkaba/.local/lib/python3.10/site-packages/autotrain/trainers/clm/utils.py", line 649, in get_model
    model = AutoModelForCausalLM.from_pretrained(
  File "/home/herkaba/.local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
    return model_class.from_pretrained(
  File "/home/herkaba/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3279, in from_pretrained
    hf_quantizer.validate_environment(
  File "/home/herkaba/.local/lib/python3.10/site-packages/transformers/quantizers/quantizer_bnb_4bit.py", line 62, in validate_environment
    raise RuntimeError("No GPU found. A GPU is needed for quantization.")
RuntimeError: No GPU found. A GPU is needed for quantization.
ERROR | 2024-07-22 13:52:05 | autotrain.trainers.common:wrapper:121 - No GPU found. A GPU is needed for quantization.
INFO | 2024-07-22 13:52:09 | autotrain.app.utils:get_running_jobs:26 - Killing PID: 3239
INFO | 2024-07-22 13:52:09 | autotrain.app.utils:kill_process_by_pid:52 - Sent SIGTERM to process with PID 3239

github-actions[bot] commented 2 weeks ago

This issue is stale because it has been open for 30 days with no activity.

Willian7004 commented 6 days ago

I ran the process on Hugging Face's Hub without using a GPU and hit the same problem. The same error occurs when I start training after the dataset is downloaded.
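
since the error is raised by the int4 quantization path (quantizer_bnb_4bit.validate_environment in the tracebacks above), bitsandbytes quantization simply cannot run without a CUDA GPU. as a sketch of a workaround for CPU-only runs, assuming the training_params.json path from the launch log above and assuming your AutoTrain version accepts a null quantization value, the config can be edited before relaunching:

import json

# Hypothetical edit: turn off int4 quantization so the bitsandbytes
# GPU check is never reached. Path is taken from the launch log above.
path = "trainTeszt/training_params.json"
with open(path) as f:
    params = json.load(f)

params["quantization"] = None  # assumption: a null value disables bnb quantization
with open(path, "w") as f:
    json.dump(params, f, indent=2)

note that this only gets past the quantization check; full fine-tuning of an 8B model on CPU will still be extremely slow.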