huggingface / autotrain-advanced

🤗 AutoTrain Advanced
https://huggingface.co/autotrain
Apache License 2.0

[BUG] UserWarning: Using the model-agnostic default `max_length` (=20) #544

Closed tombenj closed 6 months ago

tombenj commented 8 months ago

Prerequisites

Backend

Hugging Face Space/Endpoints

Interface Used

UI

CLI Command

No response

UI Screenshots & Parameters

{ "seed": 42, "lr": 0.00005, "epochs": 3, "max_seq_length": 512, "max_target_length": 256, "max_length": 1024, "max_new_tokens": 100, "batch_size": 8, "warmup_ratio": 0.1, "gradient_accumulation": 1, "optimizer": "adamw_torch", "scheduler": "linear", "weight_decay": 0, "max_grad_norm": 1, "logging_steps": -1, "evaluation_strategy": "epoch", "auto_find_batch_size": false, "mixed_precision": "fp16", "save_total_limit": 1, "save_strategy": "epoch", "peft": false, "quantization": null, "lora_r": 16, "lora_alpha": 32, "lora_dropout": 0.05, "target_modules": [ "all-linear" ] }

Screen Shot 2024-03-10 at 17 42 01

Error Logs

```
67%|██████▋ | 18000/27000 [1:28:30<36:56, 4.06it/s]
/app/env/lib/python3.10/site-packages/transformers/generation/utils.py:1178: UserWarning: Using the model-agnostic default `max_length` (=20) to control the generation length. We recommend setting `max_new_tokens` to control the maximum length of the generation.
```

Additional Information

Using seq2seq with google-t5/t5-base. Would love any suggestions on how to force a longer generation length.
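For reference, a minimal sketch of working around this warning outside AutoTrain by passing `max_new_tokens` explicitly at generation time (the repo id and prompt below are placeholders, not from this thread):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "google-t5/t5-base"  # placeholder; substitute the fine-tuned repo
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

inputs = tokenizer("summarize: <some long input>", return_tensors="pt")
# Passing max_new_tokens overrides the model-agnostic default max_length=20
# and silences the UserWarning above.
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```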

tombenj commented 8 months ago

So it seems Seq2Seq in Google models such as T5, mT5, etc. is limited to 20 output tokens because of this? i.e., the required params are not passed through?

abhishekkrthakur commented 8 months ago

no, that's just validation. inference doesn't matter

tombenj commented 8 months ago

> no, that's just validation. inference doesn't matter

So any idea why the inference output is always 20 tokens, while when I train with BART I get 256+?

tombenj commented 8 months ago

Seems as though the Seq2Seq args aren't passing through (especially for Google models).

abhishekkrthakur commented 8 months ago

taking a look!

tombenj commented 8 months ago

Full trace attached. The model is still generating a maximum of only 20 tokens:

```
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Package punkt is already up-to-date!

WARNING Parameters not supplied by user and set to default: push_to_hub, model_ref, auto_find_batch_size, add_eos_token, data_path, lr, project_name, disable_gradient_checkpointing, logging_steps, optimizer, token, seed, lora_dropout, lora_r, rejected_text_column, batch_size, prompt_text_column, model_max_length, weight_decay, max_grad_norm, merge_adapter, gradient_accumulation, use_flash_attention_2, scheduler, valid_split, trainer, text_column, username, repo_id, lora_alpha, model, save_strategy, warmup_ratio, evaluation_strategy, save_total_limit, train_split, dpo_beta
WARNING Parameters not supplied by user and set to default: batch_size, epochs, log, weight_decay, max_grad_norm, auto_find_batch_size, max_seq_length, gradient_accumulation, scheduler, data_path, lr, valid_split, text_column, username, project_name, target_column, repo_id, logging_steps, optimizer, token, model, save_strategy, seed, warmup_ratio, save_total_limit, evaluation_strategy, push_to_hub, train_split
WARNING Parameters not supplied by user and set to default: batch_size, epochs, log, weight_decay, max_grad_norm, auto_find_batch_size, gradient_accumulation, scheduler, data_path, lr, username, valid_split, image_column, project_name, repo_id, target_column, logging_steps, optimizer, token, model, save_strategy, seed, warmup_ratio, save_total_limit, evaluation_strategy, push_to_hub, train_split
WARNING Parameters supplied but not used: target_modules
WARNING Parameters not supplied by user and set to default: epochs, auto_find_batch_size, data_path, lr, project_name, logging_steps, token, optimizer, seed, lora_dropout, lora_r, target_modules, max_target_length, batch_size, weight_decay, max_grad_norm, max_seq_length, gradient_accumulation, scheduler, username, valid_split, text_column, target_column, repo_id, lora_alpha, model, save_strategy, warmup_ratio, evaluation_strategy, save_total_limit, train_split, peft, push_to_hub, quantization
WARNING Parameters not supplied by user and set to default: id_column, categorical_columns, num_trials, numerical_columns, data_path, username, valid_split, repo_id, project_name, task, token, target_columns, model, seed, train_split, push_to_hub, time_limit
WARNING Parameters not supplied by user and set to default: resume_from_checkpoint, epochs, lr_power, tokenizer_max_length, validation_images, adam_beta1, num_cycles, num_class_images, project_name, token, pre_compute_text_embeddings, sample_batch_size, allow_tf32, xl, num_validation_images, seed, scale_lr, validation_epochs, checkpoints_total_limit, class_prompt, revision, rank, adam_weight_decay, prior_preservation, class_image_path, prior_loss_weight, max_grad_norm, adam_epsilon, scheduler, username, tokenizer, text_encoder_use_attention_mask, image_path, dataloader_num_workers, repo_id, class_labels_conditioning, prior_generation_precision, model, adam_beta2, validation_prompt, local_rank, checkpointing_steps, center_crop, push_to_hub, logging, warmup_steps
WARNING Parameters not supplied by user and set to default: tags_column, batch_size, epochs, log, tokens_column, weight_decay, max_grad_norm, auto_find_batch_size, max_seq_length, gradient_accumulation, scheduler, data_path, lr, valid_split, username, repo_id, project_name, logging_steps, optimizer, token, model, save_strategy, seed, warmup_ratio, save_total_limit, evaluation_strategy, push_to_hub, train_split
INFO AutoTrain Public URL: NgrokTunnel: "https://b320-34-72-237-89.ngrok-free.app/" -> "http://localhost:7860/"
INFO Please wait for the app to load...
INFO: Started server process [7599]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:7860/ (Press CTRL+C to quit)
INFO Task: llm:sft
INFO Task: seq2seq
[... repeated "GET /is_model_training" / "GET /accelerators" request logs elided ...]
INFO hardware: Local
INFO Running jobs: []
INFO Column mapping: {'text': 'text', 'label': 'target'}
INFO Dataset: autotrain-gsxqu-k795g (seq2seq)
     Train data: [<tempfile.SpooledTemporaryFile object at 0x7e16bada5b40>]
     Valid data: []
     Column mapping: {'text': 'text', 'label': 'target'}

Saving the dataset (1/1 shards): 100% 800/800 [00:00<00:00, 238160.49 examples/s]
Saving the dataset (1/1 shards): 100% 200/200 [00:00<00:00, 86072.32 examples/s]

WARNING Parameters not supplied by user and set to default: train_split
WARNING Parameters supplied but not used: model_max_length, max_length, max_new_tokens
INFO Starting local training...
INFO {"data_path":"autotrain-gsxqu-k795g/autotrain-data","model":"google-t5/t5-base","username":"tombenj","seed":42,"train_split":"train","valid_split":"validation","project_name":"autotrain-gsxqu-k795g","token":"hf_********","push_to_hub":true,"text_column":"autotrain_text","target_column":"autotrain_label","repo_id":"tombenj/autotrain-gsxqu-k795g","lr":0.00005,"epochs":1,"max_seq_length":1024,"max_target_length":1024,"batch_size":8,"warmup_ratio":0.1,"gradient_accumulation":1,"optimizer":"adamw_torch","scheduler":"linear","weight_decay":0.0,"max_grad_norm":1.0,"logging_steps":-1,"evaluation_strategy":"epoch","auto_find_batch_size":false,"mixed_precision":"fp16","save_total_limit":1,"save_strategy":"epoch","peft":false,"quantization":null,"lora_r":16,"lora_alpha":32,"lora_dropout":0.05,"target_modules":["all-linear"]}
INFO ['accelerate', 'launch', '--num_machines', '1', '--num_processes', '1', '--mixed_precision', 'fp16', '-m', 'autotrain.trainers.seq2seq', '--training_config', 'autotrain-gsxqu-k795g/training_params.json']
INFO Training PID: 7899
INFO: "POST /create_project HTTP/1.1" 200 OK
The following values were not passed to `accelerate launch` and had defaults used instead:
    --dynamo_backend was set to a value of 'no'
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
🚀 INFO | 2024-03-13 09:15:12 | main:train:45 - Starting training...
🚀 INFO | 2024-03-13 09:15:12 | main:train:46 - Training config: {'data_path': 'autotrain-gsxqu-k795g/autotrain-data', 'model': 'google-t5/t5-base', 'username': 'tombenj', 'seed': 42, 'train_split': 'train', 'valid_split': 'validation', 'project_name': 'autotrain-gsxqu-k795g', 'token': '*****', 'push_to_hub': True, 'text_column': 'autotrain_text', 'target_column': 'autotrain_label', 'repo_id': 'tombenj/autotrain-gsxqu-k795g', 'lr': 5e-05, 'epochs': 1, 'max_seq_length': 1024, 'max_target_length': 1024, 'batch_size': 8, 'warmup_ratio': 0.1, 'gradient_accumulation': 1, 'optimizer': 'adamw_torch', 'scheduler': 'linear', 'weight_decay': 0.0, 'max_grad_norm': 1.0, 'logging_steps': -1, 'evaluation_strategy': 'epoch', 'auto_find_batch_size': False, 'mixed_precision': 'fp16', 'save_total_limit': 1, 'save_strategy': 'epoch', 'peft': False, 'quantization': None, 'lora_r': 16, 'lora_alpha': 32, 'lora_dropout': 0.05, 'target_modules': ['all-linear']}
🚀 INFO | 2024-03-13 09:15:12 | main:train:53 - loading dataset from disk
🚀 INFO | 2024-03-13 09:15:12 | main:train:64 - loading dataset from disk
/usr/local/lib/python3.10/dist-packages/transformers/models/t5/tokenization_t5_fast.py:171: FutureWarning: This tokenizer was incorrectly instantiated with a model max length of 512 which will be corrected in Transformers v5. For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with truncation is True.
- Be aware that you SHOULD NOT rely on google-t5/t5-base automatically truncating your input to 512 when padding/encoding.
- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.
- To avoid this warning, please instantiate this tokenizer with `model_max_length` set to your preferred value.
  warnings.warn(
[... repeated request-polling and tqdm progress lines elided ...]
{'loss': 10.2023, 'grad_norm': 29.899457931518555, 'learning_rate': 1.5e-05, 'epoch': 0.05}
{'loss': 6.8607, 'grad_norm': 57.67634963989258, 'learning_rate': 3.5e-05, 'epoch': 0.1}
{'loss': 2.1809, 'grad_norm': 4.584061622619629, 'learning_rate': 4.888888888888889e-05, 'epoch': 0.15}
{'loss': 1.2057, 'grad_norm': 3.255554437637329, 'learning_rate': 4.6111111111111115e-05, 'epoch': 0.2}
{'loss': 0.8056, 'grad_norm': 2.723172426223755, 'learning_rate': 4.3333333333333334e-05, 'epoch': 0.25}
{'loss': 0.5849, 'grad_norm': 1.7794570922851562, 'learning_rate': 4.055555555555556e-05, 'epoch': 0.3}
{'loss': 0.418, 'grad_norm': 1.8489391803741455, 'learning_rate': 3.777777777777778e-05, 'epoch': 0.35}
{'loss': 0.3179, 'grad_norm': 1.1671098470687866, 'learning_rate': 3.5e-05, 'epoch': 0.4}
{'loss': 0.2652, 'grad_norm': 1.281832218170166, 'learning_rate': 3.222222222222223e-05, 'epoch': 0.45}
{'loss': 0.2405, 'grad_norm': 0.9086970686912537, 'learning_rate': 2.9444444444444448e-05, 'epoch': 0.5}
{'loss': 0.2329, 'grad_norm': 1.1303473711013794, 'learning_rate': 2.6666666666666667e-05, 'epoch': 0.55}
{'loss': 0.2199, 'grad_norm': 1.0673601627349854, 'learning_rate': 2.3888888888888892e-05, 'epoch': 0.6}
{'loss': 0.2137, 'grad_norm': 0.8874663710594177, 'learning_rate': 2.111111111111111e-05, 'epoch': 0.65}
{'loss': 0.1876, 'grad_norm': 0.7264275550842285, 'learning_rate': 1.8333333333333333e-05, 'epoch': 0.7}
{'loss': 0.1834, 'grad_norm': 0.8036168217658997, 'learning_rate': 1.5555555555555555e-05, 'epoch': 0.75}
{'loss': 0.1952, 'grad_norm': 0.6779347062110901, 'learning_rate': 1.2777777777777777e-05, 'epoch': 0.8}
{'loss': 0.1827, 'grad_norm': 0.7915838360786438, 'learning_rate': 1e-05, 'epoch': 0.85}
{'loss': 0.187, 'grad_norm': 0.8215489387512207, 'learning_rate': 7.222222222222222e-06, 'epoch': 0.9}
{'loss': 0.1734, 'grad_norm': 0.7938928604125977, 'learning_rate': 4.444444444444445e-06, 'epoch': 0.95}
{'loss': 0.172, 'grad_norm': 0.6198846697807312, 'learning_rate': 1.6666666666666667e-06, 'epoch': 1.0}
100% 100/100 [00:43<00:00, 2.36it/s]
/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py:1178: UserWarning: Using the model-agnostic default `max_length` (=20) to control the generation length. We recommend setting `max_new_tokens` to control the maximum length of the generation.
  warnings.warn(


{'eval_loss': 0.1557321548461914, 'eval_rouge1': 14.9506, 'eval_rouge2': 12.1047, 'eval_rougeL': 14.938, 'eval_rougeLsum': 14.9251, 'eval_gen_len': 19.0, 'eval_runtime': 12.0496, 'eval_samples_per_second': 16.598, 'eval_steps_per_second': 1.079, 'epoch': 1.0}
100% 100/100 [00:55<00:00, 2.36it/s]
100% 13/13 [00:11<00:00, 1.25it/s]

[... repeated request-polling lines elided ...]
There were missing keys in the checkpoint model loaded: ['encoder.embed_tokens.weight', 'decoder.embed_tokens.weight', 'lm_head.weight'].
{'train_runtime': 78.2901, 'train_samples_per_second': 10.218, 'train_steps_per_second': 1.277, 'train_loss': 1.2514769697189332, 'epoch': 1.0}
100% 100/100 [01:18<00:00, 1.28it/s]
🚀 INFO | 2024-03-13 09:16:38 | main:train:204 - Finished training, saving model...
/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py:1178: UserWarning: Using the model-agnostic default `max_length` (=20) to control the generation length. We recommend setting `max_new_tokens` to control the maximum length of the generation.
  warnings.warn(
100% 13/13 [00:11<00:00, 1.18it/s]
🚀 INFO | 2024-03-13 09:16:54 | main:train:218 - Pushing model to hub...
model.safetensors: 0% 0.00/892M [00:00<?, ?B/s]
rng_state.pth: 0% 0.00/14.2k [00:00<?, ?B/s]

optimizer.pt: 0% 0.00/1.78G [00:00<?, ?B/s]
spiece.model: 0% 0.00/792k [00:00<?, ?B/s]
scheduler.pt: 0% 0.00/1.06k [00:00<?, ?B/s]
Upload 11 LFS files: 0% 0/11 [00:00<?, ?it/s]
[... interleaved LFS upload progress bars elided ...]
scheduler.pt: 100% 1.06k/1.06k [00:00<00:00, 2.36kB/s]
rng_state.pth: 100% 14.2k/14.2k [00:00<00:00, 28.2kB/s]
spiece.model: 100% 792k/792k [00:00<00:00, 1.16MB/s]
training_args.bin: 100% 5.05k/5.05k [00:00<00:00, 86.1kB/s]
events.out.tfevents.1710321319.a7547a852de5.7919.0: 100% 10.6k/10.6k [00:00<00:00, 91.3kB/s]
events.out.tfevents.1710321414.a7547a852de5.7919.1: 100% 603/603 [00:00<00:00, 5.20kB/s]
model.safetensors: 100% 892M/892M [00:25<00:00, 34.8MB/s]
optimizer.pt: 100% 1.78G/1.78G [00:44<00:00, 39.8MB/s]
Upload 11 LFS files: 100% 11/11 [00:45<00:00, 4.11s/it]
INFO Killing PID: 7899
[... repeated "GET /is_model_training" / "GET /accelerators" polling lines elided ...]
```

tombenj commented 8 months ago

Tried several things in a fork https://github.com/tombenj/autotrain-advanced/commits/length/ such as adding max_new_tokens to the model generation: https://github.com/tombenj/autotrain-advanced/commit/c9741b914c5625ff69b9890c2006964eea285754

As suggested here: https://www.markhneedham.com/blog/2023/06/19/huggingface-max-length-generation-length-deprecated/
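The fork's change is roughly in the territory of the following sketch: with `Seq2SeqTrainer`, the generation length used during evaluation can be set through `Seq2SeqTrainingArguments` instead of falling back to the model-agnostic default. The values below are illustrative, and this is not necessarily how autotrain-advanced wires it up:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="out",             # illustrative
    predict_with_generate=True,   # run generate() during evaluation
    generation_max_length=256,    # replaces the default max_length=20 at eval time
)
```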

But I'm still getting responses cut off at 20 tokens:

Screen Shot 2024-03-14 at 13 58 12

@abhishekkrthakur can you point to a direction how to resolve this?

abhishekkrthakur commented 8 months ago

ohh those are the default parameters. you can change the default params: https://huggingface.co/docs/hub/models-widgets#example-outputs
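If the 20-token outputs are coming from the hub inference widget, the linked docs describe setting the widget's generation parameters in the model card metadata. A hedged sketch using `huggingface_hub.metadata_update` (the repo id is from this thread; which parameter keys the widget honors depends on the task pipeline, so treat `max_new_tokens` as illustrative):

```python
from huggingface_hub import metadata_update

# Writes an `inference.parameters` block into the model card's YAML metadata,
# which the hub widget reads for generation settings.
metadata_update(
    "tombenj/tuple-1k-t5",
    {"inference": {"parameters": {"max_new_tokens": 256}}},
    overwrite=True,
)
```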

tombenj commented 8 months ago

@abhishekkrthakur it has nothing to do with the default params. Training Facebook's BART results in good output; training T5 gives a max of 20 output tokens.

abhishekkrthakur commented 8 months ago

can you share the trained model repo?

tombenj commented 8 months ago

@abhishekkrthakur yep here is an example: https://huggingface.co/tombenj/tuple-1k-t5

Getting only 20 tokens as output.

abhishekkrthakur commented 8 months ago

changing params here has no effect: https://huggingface.co/tombenj/tuple-1k-t5/blob/main/config.json#L29 ?

tombenj commented 8 months ago

@abhishekkrthakur changed it there and am still getting the same max 20-token output: https://huggingface.co/tombenj/tuple-1k-t5/commit/0868248619d5a457bc52a13af26af94d93a436b1 https://huggingface.co/tombenj/tuple-1k-t5/commit/6823fe355c7fd90a9fd0bfa6b72e8784bebb0b16
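Since editing `config.json` alone didn't help, one more avenue worth trying (a hedged sketch, not a confirmed fix): recent transformers versions read generation defaults from the model's `GenerationConfig`, saved alongside the model as `generation_config.json`, rather than from the legacy `config.json` values:

```python
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("tombenj/tuple-1k-t5")

# Persist a longer default generation length with the model; generate()
# consults generation_config.json before the legacy config.json fields.
model.generation_config.max_new_tokens = 256
model.generation_config.save_pretrained("tuple-1k-t5-local")  # local dir; could also push_to_hub
```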

tombenj commented 7 months ago

@abhishekkrthakur any updates on this?

github-actions[bot] commented 7 months ago

This issue is stale because it has been open for 15 days with no activity.

github-actions[bot] commented 6 months ago

This issue was closed because it has been inactive for 2 days since being marked as stale.