huggingface / autotrain-advanced

🤗 AutoTrain Advanced
https://huggingface.co/autotrain
Apache License 2.0
3.76k stars 461 forks

autotrain.trainers.common:wrapper:92 - "None of [Index(['text'], dtype='object')] are in the [columns]" #482

Closed charles-123456 closed 6 months ago

charles-123456 commented 7 months ago

/home/chnadmin/anaconda3/lib/python3.11/site-packages/torch/cuda/init.py:628: UserWarning: Can't initialize NVML
warnings.warn("Can't initialize NVML")

INFO Running LLM INFO Params: Namespace(version=False, text_column='text', rejected_text_column='rejected', prompt_text_column='prompt', model_ref=None, warmup_ratio=0.1, optimizer='adamw_torch', scheduler='linear', weight_decay=0.01, max_grad_norm=1.0, add_eos_token=False, block_size=1024, peft=True, lora_r=16, lora_alpha=32, lora_dropout=0.045, logging_steps=-1, evaluation_strategy='epoch', save_total_limit=1, save_strategy='epoch', auto_find_batch_size=False, mixed_precision='fp16', quantization='int4', model_max_length=1024, trainer='default', target_modules=None, merge_adapter=False, use_flash_attention_2=False, dpo_beta=0.1, apply_chat_template=False, padding=None, train=True, deploy=False, inference=False, username=None, backend='local-cli', token='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', repo_id='Charles333/catai_pythianlocal_finetuning', push_to_hub=True, model='EleutherAI/pythia-1b', project_name='my_autotrain_llm', seed=42, epochs=30, gradient_accumulation=4, disable_gradient_checkpointing=False, lr=0.0002, log='none', data_path='/home/chnadmin/Documents/corent/CATAI/Finetuning/my_autotrain_llm/', train_split='train', valid_split=None, batch_size=1, func=<function run_llm_command_factory at 0x7fa35c5ba520>) INFO Starting local training... 
INFO {"model":"EleutherAI/pythia-1b","project_name":"my_autotrain_llm","data_path":"/home/chnadmin/Documents/corent/CATAI/Finetuning/my_autotrain_llm/","train_split":"train","valid_split":null,"add_eos_token":false,"block_size":1024,"model_max_length":1024,"padding":null,"trainer":"default","use_flash_attention_2":false,"log":"none","disable_gradient_checkpointing":false,"logging_steps":-1,"evaluation_strategy":"epoch","save_total_limit":1,"save_strategy":"epoch","auto_find_batch_size":false,"mixed_precision":"fp16","lr":0.0002,"epochs":30,"batch_size":1,"warmup_ratio":0.1,"gradient_accumulation":4,"optimizer":"adamw_torch","scheduler":"linear","weight_decay":0.01,"max_grad_norm":1.0,"seed":42,"apply_chat_template":false,"quantization":"int4","target_modules":null,"merge_adapter":false,"peft":true,"lora_r":16,"lora_alpha":32,"lora_dropout":0.045,"model_ref":null,"dpo_beta":0.1,"prompt_text_column":"prompt","text_column":"text","rejected_text_column":"rejected","push_to_hub":true,"repo_id":"Charles333/catai_pythianlocal_finetuning","username":null,"token":"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"} WARNING No GPU found. Forcing training on CPU. This will be super slow! INFO ['accelerate', 'launch', '--cpu', '-m', 'autotrain.trainers.clm', '--training_config', 'my_autotrain_llm/training_params.json'] /home/chnadmin/anaconda3/lib/python3.11/site-packages/torch/cuda/init.py:628: UserWarning: Can't initialize NVML warnings.warn("Can't initialize NVML") The following values were not passed to accelerate launch and had defaults used instead: --num_processes was set to a value of 0 --num_machines was set to a value of 1 --mixed_precision was set to a value of 'no' --dynamo_backend was set to a value of 'no' To avoid this warning pass in values for each of the problematic parameters or run accelerate config. 
/home/chnadmin/anaconda3/lib/python3.11/site-packages/torch/cuda/init.py:628: UserWarning: Can't initialize NVML
warnings.warn("Can't initialize NVML")
/home/chnadmin/anaconda3/lib/python3.11/site-packages/trl/trainer/ppo_config.py:141: UserWarning: The optimize_cuda_cache arguement will be deprecated soon, please use optimize_device_cache instead.
warnings.warn(
Downloading data files: 100%|███████████████████| 1/1 [00:00<00:00, 5915.80it/s]
Extracting data files: 100%|████████████████████| 1/1 [00:00<00:00, 1014.83it/s]
Generating train split: 1 examples [00:00, 160.56 examples/s]
🚀 INFO | 2024-01-31 18:52:45 | main:process_input_data:82 - Train data: Dataset({ features: ['model', 'project_name', 'data_path', 'train_split', 'valid_split', 'add_eos_token', 'block_size', 'model_max_length', 'padding', 'trainer', 'use_flash_attention_2', 'log', 'disable_gradient_checkpointing', 'logging_steps', 'evaluation_strategy', 'save_total_limit', 'save_strategy', 'auto_find_batch_size', 'mixed_precision', 'lr', 'epochs', 'batch_size', 'warmup_ratio', 'gradient_accumulation', 'optimizer', 'scheduler', 'weight_decay', 'max_grad_norm', 'seed', 'apply_chat_template', 'quantization', 'target_modules', 'merge_adapter', 'peft', 'lora_r', 'lora_alpha', 'lora_dropout', 'model_ref', 'dpo_beta', 'prompt_text_column', 'text_column', 'rejected_text_column', 'push_to_hub', 'repo_id', 'username', 'token'],
...
raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Index(['text'], dtype='object')] are in the [columns]"

❌ ERROR | 2024-01-31 18:52:46 | autotrain.trainers.common:wrapper:92 - "None of [Index(['text'], dtype='object')] are in the [columns]"
Output is truncated. View as a scrollable element or

waghydjemy commented 7 months ago

Please provide more details about the issue, e.g. the command you are using, your environment, etc.

I stumbled upon a similar issue; the solution for me was to ensure my dataset has the text column. This dataset worked for me.
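A quick way to confirm this before launching training is to read the header of the training file and check for the column named by text_column. In the log above, the train split has a single example whose features are the training parameter names rather than a text field, which suggests the loader may have picked up a config file from data_path instead of the intended data. The sketch below is only an illustration: it assumes the training data is a CSV file (the name train.csv in the usage note is a guess based on train_split='train'), and `missing_columns` is a hypothetical helper, not part of AutoTrain.

```python
import csv
from pathlib import Path


def missing_columns(csv_path, required=("text",)):
    """Return the required column names that are absent from the CSV header.

    Hypothetical helper for illustration; not part of AutoTrain.
    """
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        # The first row of the file is treated as the header.
        header = next(csv.reader(handle), [])
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]
```

For example, `missing_columns("train.csv")` returns `["text"]` when the header has no text column and an empty list when it does. If the column exists under another name, either rename it or pass that name via text_column.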

abhishekkrthakur commented 7 months ago

@charles-123456 it seems like the text column is missing. It also seems like you have pasted your HF token; please invalidate it ASAP!
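To avoid leaking a token when sharing logs, the output can be scrubbed before pasting. This is a minimal sketch, assuming tokens look like the one in the log above (the prefix hf_ followed by letters and digits); `redact_hf_tokens` and the pattern are hypothetical and not an official Hugging Face utility. Scrubbing does not replace invalidating a token that has already been posted.

```python
import re

# Assumption: a token is "hf_" followed by at least 8 letters or digits.
HF_TOKEN_PATTERN = re.compile(r"\bhf_[A-Za-z0-9]{8,}\b")


def redact_hf_tokens(text, placeholder="hf_<redacted>"):
    """Replace anything that looks like a Hugging Face token with a placeholder."""
    return HF_TOKEN_PATTERN.sub(placeholder, text)
```

Running the log through `redact_hf_tokens` before posting leaves the rest of the text unchanged, so parameters such as text_column and data_path remain visible for debugging.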

github-actions[bot] commented 6 months ago

This issue is stale because it has been open for 15 days with no activity.

github-actions[bot] commented 6 months ago

This issue was closed because it has been inactive for 2 days since being marked as stale.