Closed tombenj closed 6 months ago
So it seems that Seq2Seq with Google models such as t5, mt5, etc. is limited to 20-token output because of this? i.e. the required params are not passed through?
no, that's just validation. inference doesn't matter
So any idea why the inference output is always 20 tokens, while when I train using BART I get 256+?
Seems as though the Seq2Seq args aren't passing through (especially for Google models).
taking a look!
Full trace attached. The model is still generating only max 20 tokens:

```
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Package punkt is already up-to-date!
WARNING Parameters not supplied by user and set to default: push_to_hub, model_ref, auto_find_batch_size, add_eos_token, data_path, lr, project_name, disable_gradient_checkpointing, logging_steps, optimizer, token, seed, lora_dropout, lora_r, rejected_text_column, batch_size, prompt_text_column, model_max_length, weight_decay, max_grad_norm, merge_adapter, gradient_accumulation, use_flash_attention_2, scheduler, valid_split, trainer, text_column, username, repo_id, lora_alpha, model, save_strategy, warmup_ratio, evaluation_strategy, save_total_limit, train_split, dpo_beta WARNING Parameters not supplied by user and set to default: batch_size, epochs, log, weight_decay, max_grad_norm, auto_find_batch_size, max_seq_length, gradient_accumulation, scheduler, data_path, lr, valid_split, text_column, username, project_name, target_column, repo_id, logging_steps, optimizer, token, model, save_strategy, seed, warmup_ratio, save_total_limit, evaluation_strategy, push_to_hub, train_split WARNING Parameters not supplied by user and set to default: batch_size, epochs, log, weight_decay, max_grad_norm, auto_find_batch_size, gradient_accumulation, scheduler, data_path, lr, username, valid_split, image_column, project_name, repo_id, target_column, logging_steps, optimizer, token, model, save_strategy, seed, warmup_ratio, save_total_limit, evaluation_strategy, push_to_hub, train_split WARNING Parameters supplied but not used: target_modules WARNING Parameters not supplied by user and set to default: epochs, auto_find_batch_size, data_path, lr, project_name, logging_steps, token, optimizer, seed, lora_dropout, lora_r, target_modules, max_target_length, batch_size, weight_decay, max_grad_norm, max_seq_length, gradient_accumulation, scheduler, username, valid_split, text_column, target_column, repo_id, lora_alpha, model, save_strategy, warmup_ratio, evaluation_strategy, save_total_limit, train_split, peft, push_to_hub, quantization WARNING Parameters not supplied by user and set to default: id_column, categorical_columns, num_trials, numerical_columns, data_path, username, valid_split, repo_id, project_name, task, token, target_columns, model, seed, train_split, push_to_hub, time_limit WARNING Parameters not supplied by user and set to default: resume_from_checkpoint, epochs, lr_power, tokenizer_max_length, validation_images, adam_beta1, num_cycles, num_class_images, project_name, token, pre_compute_text_embeddings, sample_batch_size, allow_tf32, xl, num_validation_images, seed, scale_lr, validation_epochs, checkpoints_total_limit, class_prompt, revision, rank, adam_weight_decay, prior_preservation, class_image_path, prior_loss_weight, max_grad_norm, adam_epsilon, scheduler, username, tokenizer, text_encoder_use_attention_mask, image_path, dataloader_num_workers, repo_id, class_labels_conditioning, prior_generation_precision, model, adam_beta2, validation_prompt, local_rank, checkpointing_steps, center_crop, push_to_hub, logging, warmup_steps WARNING Parameters not supplied by user and set to default: tags_column, batch_size, epochs, log, tokens_column, weight_decay, max_grad_norm, auto_find_batch_size, max_seq_length, gradient_accumulation, scheduler, data_path, lr, valid_split, username, repo_id, project_name, logging_steps, optimizer, token, model, save_strategy, seed, warmup_ratio, save_total_limit, evaluation_strategy, push_to_hub, train_split INFO AutoTrain Public URL: NgrokTunnel: "https://b320-34-72-237-89.ngrok-free.app/" -> "http://localhost:7860/" INFO Please wait for the app to load... 
INFO *** INFO: Started server process [7599] INFO: Waiting for application startup. INFO: Application startup complete. INFO: Uvicorn running on http://127.0.0.1:7860/ (Press CTRL+C to quit) INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET / HTTP/1.1" 200 OK INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /logo.png HTTP/1.1" 200 OK INFO Task: llm:sft INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /params/llm%3Asft HTTP/1.1" 200 OK INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /model_choices/llm%3Asft HTTP/1.1" 200 OK INFO Running jobs: [] INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /is_model_training HTTP/1.1" 200 OK INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /favicon.ico HTTP/1.1" 404 Not Found INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /accelerators HTTP/1.1" 200 OK INFO Running jobs: [] INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /is_model_training HTTP/1.1" 200 OK INFO Task: seq2seq INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /params/seq2seq HTTP/1.1" 200 OK INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /model_choices/seq2seq HTTP/1.1" 200 OK INFO Running jobs: [] INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /is_model_training HTTP/1.1" 200 OK INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /accelerators HTTP/1.1" 200 OK INFO Running jobs: [] INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /is_model_training HTTP/1.1" 200 OK INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /accelerators HTTP/1.1" 200 OK INFO Running jobs: [] INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /is_model_training HTTP/1.1" 200 OK INFO Running jobs: [] INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /is_model_training HTTP/1.1" 200 OK INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /accelerators HTTP/1.1" 200 OK INFO Running jobs: [] INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /is_model_training HTTP/1.1" 200 OK INFO hardware: Local INFO Running jobs: [] INFO Task: seq2seq INFO Column mapping: {'text': 'text', 'label': 'target'} INFO Dataset: autotrain-gsxqu-k795g (seq2seq) Train data: [<tempfile.SpooledTemporaryFile object at 0x7e16bada5b40>] Valid data: [] Column mapping: {'text': 'text', 'label': 'target'}
Saving the dataset (1/1 shards): 100% 800/800 [00:00<00:00, 238160.49 examples/s] Saving the dataset (1/1 shards): 100% 200/200 [00:00<00:00, 86072.32 examples/s]
WARNING Parameters not supplied by user and set to default: train_split WARNING Parameters supplied but not used: model_max_length, max_length, max_new_tokens INFO Starting local training... INFO {"data_path":"autotrain-gsxqu-k795g/autotrain-data","model":"google-t5/t5-base","username":"tombenj","seed":42,"train_split":"train","valid_split":"validation","project_name":"autotrain-gsxqu-k795g","token":"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","push_to_hub":true,"text_column":"autotrain_text","target_column":"autotrain_label","repo_id":"tombenj/autotrain-gsxqu-k795g","lr":0.00005,"epochs":1,"max_seq_length":1024,"max_target_length":1024,"batch_size":8,"warmup_ratio":0.1,"gradient_accumulation":1,"optimizer":"adamw_torch","scheduler":"linear","weight_decay":0.0,"max_grad_norm":1.0,"logging_steps":-1,"evaluation_strategy":"epoch","auto_find_batch_size":false,"mixed_precision":"fp16","save_total_limit":1,"save_strategy":"epoch","peft":false,"quantization":null,"lora_r":16,"lora_alpha":32,"lora_dropout":0.05,"target_modules":["all-linear"]} INFO ['accelerate', 'launch', '--num_machines', '1', '--num_processes', '1', '--mixed_precision', 'fp16', '-m', 'autotrain.trainers.seq2seq', '--training_config', 'autotrain-gsxqu-k795g/training_params.json'] INFO Training PID: 7899 INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "POST /create_project HTTP/1.1" 200 OK The following values were not passed to
`accelerate launch` and had defaults used instead:
	`--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`
. INFO Running jobs: [7899] INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /is_model_training HTTP/1.1" 200 OK INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /accelerators HTTP/1.1" 200 OK INFO Running jobs: [7899] INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /is_model_training HTTP/1.1" 200 OK 🚀 INFO | 2024-03-13 09:15:12 | main:train:45 - Starting training... 🚀 INFO | 2024-03-13 09:15:12 | main:train:46 - Training config: {'data_path': 'autotrain-gsxqu-k795g/autotrain-data', 'model': 'google-t5/t5-base', 'username': 'tombenj', 'seed': 42, 'train_split': 'train', 'valid_split': 'validation', 'project_name': 'autotrain-gsxqu-k795g', 'token': '*****', 'push_to_hub': True, 'text_column': 'autotrain_text', 'target_column': 'autotrain_label', 'repo_id': 'tombenj/autotrain-gsxqu-k795g', 'lr': 5e-05, 'epochs': 1, 'max_seq_length': 1024, 'max_target_length': 1024, 'batch_size': 8, 'warmup_ratio': 0.1, 'gradient_accumulation': 1, 'optimizer': 'adamw_torch', 'scheduler': 'linear', 'weight_decay': 0.0, 'max_grad_norm': 1.0, 'logging_steps': -1, 'evaluation_strategy': 'epoch', 'auto_find_batch_size': False, 'mixed_precision': 'fp16', 'save_total_limit': 1, 'save_strategy': 'epoch', 'peft': False, 'quantization': None, 'lora_r': 16, 'lora_alpha': 32, 'lora_dropout': 0.05, 'target_modules': ['all-linear']} 🚀 INFO | 2024-03-13 09:15:12 | main:train:53 - loading dataset from disk 🚀 INFO | 2024-03-13 09:15:12 | main:train:64 - loading dataset from disk INFO Running jobs: [7899] INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /is_model_training HTTP/1.1" 200 OK /usr/local/lib/python3.10/dist-packages/transformers/models/t5/tokenization_t5_fast.py:171: FutureWarning: This tokenizer was incorrectly instantiated with a model max length of 512 which will be corrected in Transformers v5. For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding withtruncation is True
.
- Be aware that you SHOULD NOT rely on google-t5/t5-base automatically truncating your input to 512 when padding/encoding.
- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.
- To avoid this warning, please instantiate this tokenizer with `model_max_length`
set to your preferred value. warnings.warn( INFO Running jobs: [7899] INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /is_model_training HTTP/1.1" 200 OK INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /accelerators HTTP/1.1" 200 OK {'loss': 10.2023, 'grad_norm': 29.899457931518555, 'learning_rate': 1.5e-05, 'epoch': 0.05} 8% 8/100 [00:04<00:39, 2.33it/s]> INFO Running jobs: [7899] INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /is_model_training HTTP/1.1" 200 OK {'loss': 6.8607, 'grad_norm': 57.67634963989258, 'learning_rate': 3.5e-05, 'epoch': 0.1} {'loss': 2.1809, 'grad_norm': 4.584061622619629, 'learning_rate': 4.888888888888889e-05, 'epoch': 0.15} {'loss': 1.2057, 'grad_norm': 3.255554437637329, 'learning_rate': 4.6111111111111115e-05, 'epoch': 0.2} 20% 20/100 [00:09<00:34, 2.35it/s]INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /accelerators HTTP/1.1" 200 OK INFO Running jobs: [7899] INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /is_model_training HTTP/1.1" 200 OK {'loss': 0.8056, 'grad_norm': 2.723172426223755, 'learning_rate': 4.3333333333333334e-05, 'epoch': 0.25} {'loss': 0.5849, 'grad_norm': 1.7794570922851562, 'learning_rate': 4.055555555555556e-05, 'epoch': 0.3} 32% 32/100 [00:14<00:28, 2.36it/s]> INFO Running jobs: [7899] INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /is_model_training HTTP/1.1" 200 OK {'loss': 0.418, 'grad_norm': 1.8489391803741455, 'learning_rate': 3.777777777777778e-05, 'epoch': 0.35} {'loss': 0.3179, 'grad_norm': 1.1671098470687866, 'learning_rate': 3.5e-05, 'epoch': 0.4} 44% 44/100 [00:19<00:23, 2.36it/s]INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /accelerators HTTP/1.1" 200 OK INFO Running jobs: [7899] INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /is_model_training HTTP/1.1" 200 OK {'loss': 0.2652, 'grad_norm': 1.281832218170166, 'learning_rate': 3.222222222222223e-05, 'epoch': 0.45} {'loss': 0.2405, 'grad_norm': 0.9086970686912537, 'learning_rate': 2.9444444444444448e-05, 'epoch': 0.5} {'loss': 0.2329, 'grad_norm': 1.1303473711013794, 'learning_rate': 2.6666666666666667e-05, 'epoch': 0.55} 55% 55/100 [00:24<00:19, 2.32it/s]> INFO Running jobs: [7899] INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /is_model_training HTTP/1.1" 200 OK {'loss': 0.2199, 'grad_norm': 1.0673601627349854, 'learning_rate': 2.3888888888888892e-05, 'epoch': 0.6} {'loss': 0.2137, 'grad_norm': 0.8874663710594177, 'learning_rate': 2.111111111111111e-05, 'epoch': 0.65} 67% 67/100 [00:29<00:14, 2.35it/s]INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /accelerators HTTP/1.1" 200 OK INFO Running jobs: [7899] INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /is_model_training HTTP/1.1" 200 OK {'loss': 0.1876, 'grad_norm': 0.7264275550842285, 'learning_rate': 1.8333333333333333e-05, 'epoch': 0.7} {'loss': 0.1834, 'grad_norm': 0.8036168217658997, 'learning_rate': 1.5555555555555555e-05, 'epoch': 0.75} 78% 78/100 [00:34<00:09, 2.30it/s]> INFO Running jobs: [7899] INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /is_model_training HTTP/1.1" 200 OK {'loss': 0.1952, 'grad_norm': 0.6779347062110901, 'learning_rate': 1.2777777777777777e-05, 'epoch': 0.8} {'loss': 0.1827, 'grad_norm': 0.7915838360786438, 'learning_rate': 1e-05, 'epoch': 0.85} {'loss': 0.187, 'grad_norm': 0.8215489387512207, 'learning_rate': 7.222222222222222e-06, 'epoch': 0.9} {'loss': 0.1734, 'grad_norm': 0.7938928604125977, 'learning_rate': 4.444444444444445e-06, 'epoch': 0.95} {'loss': 0.172, 'grad_norm': 0.6198846697807312, 'learning_rate': 
1.6666666666666667e-06, 'epoch': 1.0} 100% 100/100 [00:43<00:00, 2.36it/s]
/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py:1178: UserWarning: Using the model-agnostic default `max_length` (=20) to control the generation length. We recommend setting `max_new_tokens` to control the maximum length of the generation. warnings.warn(
0% 0/13 [00:00<?, ?it/s] 15% 2/13 [00:00<00:05, 2.07it/s] 23% 3/13 [00:02<00:07, 1.36it/s] 31% 4/13 [00:02<00:07, 1.25it/s] 38% 5/13 [00:03<00:06, 1.23it/s] 46% 6/13 [00:04<00:05, 1.20it/s] 54% 7/13 [00:05<00:05, 1.19it/s] 62% 8/13 [00:06<00:04, 1.18it/s] 69% 9/13 [00:07<00:03, 1.17it/s] 77% 10/13 [00:08<00:02, 1.16it/s] 85% 11/13 [00:09<00:01, 1.15it/s] 92% 12/13 [00:09<00:00, 1.15it/s]
{'eval_loss': 0.1557321548461914, 'eval_rouge1': 14.9506, 'eval_rouge2': 12.1047, 'eval_rougeL': 14.938, 'eval_rougeLsum': 14.9251, 'eval_gen_len': 19.0, 'eval_runtime': 12.0496, 'eval_samples_per_second': 16.598, 'eval_steps_per_second': 1.079, 'epoch': 1.0} 100% 100/100 [00:55<00:00, 2.36it/s] 100% 13/13 [00:11<00:00, 1.25it/s] INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /accelerators HTTP/1.1" 200 OK
INFO Running jobs: [7899] INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /is_model_training HTTP/1.1" 200 OK INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /accelerators HTTP/1.1" 200 OK INFO Running jobs: [7899] INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /is_model_training HTTP/1.1" 200 OK There were missing keys in the checkpoint model loaded: ['encoder.embed_tokens.weight', 'decoder.embed_tokens.weight', 'lm_head.weight']. {'train_runtime': 78.2901, 'train_samples_per_second': 10.218, 'train_steps_per_second': 1.277, 'train_loss': 1.2514769697189332, 'epoch': 1.0} 100% 100/100 [01:18<00:00, 1.28it/s] 🚀 INFO | 2024-03-13 09:16:38 | main:train:204 - Finished training, saving model... INFO Running jobs: [7899] INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /is_model_training HTTP/1.1" 200 OK INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /accelerators HTTP/1.1" 200 OK /usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py:1178: UserWarning: Using the model-agnostic default
`max_length` (=20) to control the generation length. We recommend setting `max_new_tokens`
to control the maximum length of the generation. warnings.warn( 15% 2/13 [00:01<00:05, 1.92it/s]> INFO Running jobs: [7899] INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /is_model_training HTTP/1.1" 200 OK 54% 7/13 [00:05<00:05, 1.19it/s]INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /accelerators HTTP/1.1" 200 OK INFO Running jobs: [7899] INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /is_model_training HTTP/1.1" 200 OK 100% 13/13 [00:11<00:00, 1.18it/s] 🚀 INFO | 2024-03-13 09:16:54 | main:train:218 - Pushing model to hub... INFO Running jobs: [7899] INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /is_model_training HTTP/1.1" 200 OK INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /accelerators HTTP/1.1" 200 OK INFO Running jobs: [7899] INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /is_model_training HTTP/1.1" 200 OK INFO Running jobs: [7899] INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /is_model_training HTTP/1.1" 200 OK INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /accelerators HTTP/1.1" 200 OK INFO Running jobs: [7899] INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /is_model_training HTTP/1.1" 200 OK INFO Running jobs: [7899] INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /is_model_training HTTP/1.1" 200 OK model.safetensors: 0% 0.00/892M [00:00<?, ?B/s] rng_state.pth: 0% 0.00/14.2k [00:00<?, ?B/s]
optimizer.pt: 0% 0.00/1.78G [00:00<?, ?B/s]
spiece.model: 0% 0.00/792k [00:00<?, ?B/s]
Upload 11 LFS files: 0% 0/11 [00:00<?, ?it/s]
scheduler.pt: 0% 0.00/1.06k [00:00<?, ?B/s]
optimizer.pt: 0% 16.4k/1.78G [00:00<10:59:15, 45.1kB/s]
spiece.model: 2% 16.4k/792k [00:00<00:17, 45.2kB/s]
model.safetensors: 0% 16.4k/892M [00:00<5:48:26, 42.6kB/s] scheduler.pt: 100% 1.06k/1.06k [00:00<00:00, 2.36kB/s]
rng_state.pth: 100% 14.2k/14.2k [00:00<00:00, 28.2kB/s] spiece.model: 100% 792k/792k [00:00<00:00, 1.16MB/s]
optimizer.pt: 1% 16.0M/1.78G [00:00<01:03, 28.0MB/s]
optimizer.pt: 1% 25.4M/1.78G [00:00<00:41, 42.6MB/s] model.safetensors: 2% 16.0M/892M [00:00<00:39, 22.0MB/s]
training_args.bin: 100% 5.05k/5.05k [00:00<00:00, 86.1kB/s]
model.safetensors: 3% 22.9M/892M [00:01<00:27, 31.1MB/s]
events.out.tfevents.1710321319.a7547a852de5.7919.0: 100% 10.6k/10.6k [00:00<00:00, 91.3kB/s]
events.out.tfevents.1710321414.a7547a852de5.7919.1: 0% 0.00/603 [00:00<?, ?B/s]
model.safetensors: 1% 8.21M/892M [00:00<00:25, 35.0MB/s]
events.out.tfevents.1710321414.a7547a852de5.7919.1: 100% 603/603 [00:00<00:00, 5.20kB/s]
model.safetensors: 2% 14.6M/892M [00:00<00:19, 45.0MB/s] spiece.model: 0% 0.00/792k [00:00<?, ?B/s]
model.safetensors: 4% 32.0M/892M [00:01<00:34, 25.0MB/s]
model.safetensors: 2% 19.2M/892M [00:00<00:29, 29.3MB/s]
spiece.model: 100% 792k/792k [00:00<00:00, 2.65MB/s] model.safetensors: 5% 40.8M/892M [00:01<00:25, 33.7MB/s]
training_args.bin: 100% 5.05k/5.05k [00:00<00:00, 65.7kB/s]
model.safetensors: 3% 24.2M/892M [00:00<00:27, 31.7MB/s]
model.safetensors: 5% 46.4M/892M [00:01<00:23, 35.8MB/s]
model.safetensors: 3% 28.2M/892M [00:00<00:25, 33.6MB/s]
optimizer.pt: 3% 61.0M/1.78G [00:01<00:41, 41.5MB/s]
model.safetensors: 6% 51.0M/892M [00:01<00:27, 30.7MB/s]
model.safetensors: 7% 62.1M/892M [00:02<00:18, 45.0MB/s]
optimizer.pt: 4% 73.6M/1.78G [00:02<00:41, 41.7MB/s]
model.safetensors: 8% 68.1M/892M [00:02<00:26, 31.5MB/s]
model.safetensors: 4% 39.4M/892M [00:01<00:44, 19.1MB/s]
model.safetensors: 9% 79.0M/892M [00:02<00:19, 42.3MB/s]
optimizer.pt: 5% 87.1M/1.78G [00:02<00:47, 35.9MB/s]
model.safetensors: 5% 44.3M/892M [00:01<00:36, 23.2MB/s]
model.safetensors: 10% 84.9M/892M [00:02<00:24, 32.9MB/s]
model.safetensors: 11% 94.4M/892M [00:02<00:19, 41.0MB/s]
optimizer.pt: 6% 101M/1.78G [00:02<00:46, 36.2MB/s]
model.safetensors: 6% 54.1M/892M [00:02<00:32, 25.7MB/s]
optimizer.pt: 6% 107M/1.78G [00:03<00:42, 39.8MB/s]
model.safetensors: 11% 100M/892M [00:03<00:24, 33.0MB/s]
model.safetensors: 13% 112M/892M [00:03<00:17, 45.7MB/s]
model.safetensors: 13% 118M/892M [00:03<00:21, 35.5MB/s]
model.safetensors: 14% 127M/892M [00:03<00:17, 43.4MB/s]
optimizer.pt: 7% 128M/1.78G [00:03<00:50, 32.6MB/s]
model.safetensors: 8% 72.9M/892M [00:02<00:33, 24.5MB/s]
optimizer.pt: 8% 135M/1.78G [00:03<00:42, 38.6MB/s]
model.safetensors: 15% 133M/892M [00:04<00:23, 32.3MB/s]
model.safetensors: 9% 80.0M/892M [00:03<00:34, 23.6MB/s]
optimizer.pt: 8% 147M/1.78G [00:04<00:45, 35.8MB/s]
model.safetensors: 10% 88.4M/892M [00:03<00:25, 31.3MB/s]
model.safetensors: 16% 144M/892M [00:04<00:20, 36.0MB/s]
model.safetensors: 11% 95.0M/892M [00:03<00:21, 36.8MB/s]
model.safetensors: 18% 160M/892M [00:04<00:15, 45.9MB/s]
model.safetensors: 11% 100M/892M [00:03<00:26, 30.3MB/s] INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /accelerators HTTP/1.1" 200 OK
optimizer.pt: 9% 162M/1.78G [00:04<00:52, 30.8MB/s]
model.safetensors: 12% 111M/892M [00:03<00:17, 44.5MB/s]> INFO Running jobs: [7899] INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /is_model_training HTTP/1.1" 200 OK
model.safetensors: 19% 165M/892M [00:04<00:20, 36.3MB/s]
model.safetensors: 19% 172M/892M [00:05<00:18, 39.3MB/s]
model.safetensors: 14% 124M/892M [00:04<00:19, 40.3MB/s]
model.safetensors: 20% 177M/892M [00:05<00:21, 33.7MB/s]
model.safetensors: 21% 187M/892M [00:05<00:15, 46.4MB/s]
model.safetensors: 14% 129M/892M [00:04<00:22, 34.2MB/s]
optimizer.pt: 11% 190M/1.78G [00:05<00:41, 38.1MB/s]
model.safetensors: 15% 135M/892M [00:04<00:20, 36.4MB/s]
model.safetensors: 22% 193M/892M [00:05<00:19, 35.6MB/s]
model.safetensors: 23% 204M/892M [00:05<00:14, 47.1MB/s]
optimizer.pt: 11% 201M/1.78G [00:05<00:42, 37.3MB/s]
model.safetensors: 17% 149M/892M [00:04<00:21, 35.1MB/s]
model.safetensors: 24% 210M/892M [00:05<00:16, 41.0MB/s]
model.safetensors: 17% 155M/892M [00:05<00:19, 38.1MB/s]
model.safetensors: 24% 215M/892M [00:06<00:15, 42.6MB/s]
model.safetensors: 25% 221M/892M [00:06<00:15, 43.3MB/s]
model.safetensors: 18% 160M/892M [00:05<00:22, 33.0MB/s]
model.safetensors: 18% 164M/892M [00:05<00:21, 34.2MB/s]
optimizer.pt: 12% 222M/1.78G [00:06<00:39, 40.0MB/s]
model.safetensors: 19% 172M/892M [00:05<00:16, 44.2MB/s]
model.safetensors: 25% 226M/892M [00:06<00:24, 27.1MB/s]
model.safetensors: 20% 177M/892M [00:05<00:20, 35.6MB/s]
model.safetensors: 26% 232M/892M [00:06<00:21, 30.7MB/s]
model.safetensors: 27% 237M/892M [00:06<00:19, 32.8MB/s]
model.safetensors: 21% 190M/892M [00:05<00:16, 42.8MB/s]
model.safetensors: 27% 241M/892M [00:07<00:22, 29.1MB/s]
model.safetensors: 22% 195M/892M [00:06<00:18, 37.1MB/s]
model.safetensors: 28% 247M/892M [00:07<00:19, 33.4MB/s]
model.safetensors: 28% 252M/892M [00:07<00:17, 36.9MB/s]
model.safetensors: 23% 207M/892M [00:06<00:16, 41.6MB/s]
optimizer.pt: 15% 260M/1.78G [00:07<00:42, 35.8MB/s]
model.safetensors: 29% 256M/892M [00:07<00:21, 29.4MB/s]
model.safetensors: 30% 264M/892M [00:07<00:16, 37.3MB/s]
model.safetensors: 30% 271M/892M [00:07<00:14, 43.1MB/s]
model.safetensors: 25% 223M/892M [00:06<00:15, 41.9MB/s]
optimizer.pt: 15% 276M/1.78G [00:07<00:42, 35.1MB/s]
optimizer.pt: 16% 287M/1.78G [00:07<00:32, 46.5MB/s]
model.safetensors: 31% 276M/892M [00:08<00:20, 30.1MB/s]
model.safetensors: 32% 282M/892M [00:08<00:17, 34.5MB/s]
optimizer.pt: 16% 294M/1.78G [00:08<00:39, 37.4MB/s]
optimizer.pt: 17% 301M/1.78G [00:08<00:34, 43.0MB/s]
model.safetensors: 27% 244M/892M [00:07<00:18, 34.7MB/s]
model.safetensors: 32% 288M/892M [00:08<00:22, 27.3MB/s]
model.safetensors: 34% 300M/892M [00:08<00:13, 42.9MB/s]
optimizer.pt: 17% 312M/1.78G [00:08<00:39, 36.8MB/s]
model.safetensors: 29% 262M/892M [00:07<00:16, 38.4MB/s]
model.safetensors: 34% 306M/892M [00:08<00:16, 35.2MB/s]
model.safetensors: 35% 313M/892M [00:08<00:14, 39.8MB/s]
optimizer.pt: 18% 325M/1.78G [00:09<00:42, 34.5MB/s]
model.safetensors: 31% 275M/892M [00:08<00:15, 39.0MB/s]
optimizer.pt: 19% 331M/1.78G [00:09<00:36, 39.6MB/s]
model.safetensors: 36% 320M/892M [00:09<00:17, 32.5MB/s]
model.safetensors: 37% 329M/892M [00:09<00:13, 41.9MB/s]
optimizer.pt: 19% 336M/1.78G [00:09<00:44, 32.5MB/s]
model.safetensors: 33% 290M/892M [00:08<00:16, 36.7MB/s]
optimizer.pt: 19% 344M/1.78G [00:09<00:35, 40.7MB/s]
model.safetensors: 33% 296M/892M [00:08<00:15, 37.5MB/s]
optimizer.pt: 20% 350M/1.78G [00:09<00:33, 43.0MB/s]> INFO Running jobs: [7899] INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /is_model_training HTTP/1.1" 200 OK model.safetensors: 38% 336M/892M [00:09<00:16, 34.6MB/s]
model.safetensors: 39% 344M/892M [00:09<00:13, 40.6MB/s]
model.safetensors: 39% 350M/892M [00:09<00:12, 44.7MB/s]
optimizer.pt: 20% 363M/1.78G [00:09<00:32, 43.2MB/s]
model.safetensors: 35% 308M/892M [00:09<00:18, 31.1MB/s]
model.safetensors: 40% 356M/892M [00:10<00:14, 36.4MB/s]
model.safetensors: 41% 363M/892M [00:10<00:12, 43.2MB/s]
model.safetensors: 41% 368M/892M [00:10<00:15, 33.7MB/s]
optimizer.pt: 22% 384M/1.78G [00:10<00:37, 36.9MB/s]
model.safetensors: 43% 382M/892M [00:10<00:09, 51.9MB/s]
model.safetensors: 37% 327M/892M [00:09<00:20, 27.3MB/s]
optimizer.pt: 22% 391M/1.78G [00:10<00:34, 40.6MB/s]
model.safetensors: 37% 332M/892M [00:09<00:18, 30.6MB/s]
model.safetensors: 44% 389M/892M [00:10<00:13, 37.6MB/s]
model.safetensors: 38% 336M/892M [00:10<00:20, 27.1MB/s]
optimizer.pt: 23% 403M/1.78G [00:10<00:37, 36.6MB/s]
model.safetensors: 38% 343M/892M [00:10<00:15, 34.8MB/s]
optimizer.pt: 23% 407M/1.78G [00:11<00:35, 38.5MB/s]
model.safetensors: 39% 350M/892M [00:10<00:13, 40.0MB/s]
model.safetensors: 46% 411M/892M [00:11<00:10, 45.1MB/s]
optimizer.pt: 23% 419M/1.78G [00:11<00:38, 35.8MB/s]
optimizer.pt: 24% 428M/1.78G [00:11<00:29, 46.6MB/s]
model.safetensors: 47% 417M/892M [00:11<00:11, 40.7MB/s]
model.safetensors: 47% 423M/892M [00:11<00:10, 43.9MB/s]
optimizer.pt: 24% 433M/1.78G [00:11<00:36, 37.5MB/s]
model.safetensors: 48% 431M/892M [00:11<00:09, 47.9MB/s]
optimizer.pt: 25% 439M/1.78G [00:11<00:33, 39.6MB/s]
model.safetensors: 49% 436M/892M [00:12<00:12, 37.6MB/s]
model.safetensors: 50% 445M/892M [00:12<00:10, 44.3MB/s]
optimizer.pt: 25% 448M/1.78G [00:12<00:38, 34.7MB/s]
model.safetensors: 43% 382M/892M [00:11<00:12, 39.7MB/s]
model.safetensors: 50% 450M/892M [00:12<00:11, 38.9MB/s]
model.safetensors: 51% 457M/892M [00:12<00:09, 44.5MB/s]
model.safetensors: 52% 464M/892M [00:12<00:08, 47.6MB/s]
optimizer.pt: 26% 464M/1.78G [00:12<00:36, 36.4MB/s]
model.safetensors: 45% 397M/892M [00:11<00:12, 38.9MB/s]
model.safetensors: 54% 478M/892M [00:12<00:08, 48.0MB/s]
optimizer.pt: 27% 480M/1.78G [00:12<00:33, 38.8MB/s]
model.safetensors: 45% 401M/892M [00:12<00:20, 23.4MB/s]
optimizer.pt: 28% 494M/1.78G [00:13<00:22, 56.3MB/s]
model.safetensors: 54% 484M/892M [00:13<00:11, 35.4MB/s]
model.safetensors: 55% 489M/892M [00:13<00:10, 36.8MB/s]
model.safetensors: 55% 494M/892M [00:13<00:09, 40.2MB/s]
model.safetensors: 47% 416M/892M [00:12<00:17, 27.4MB/s]
model.safetensors: 56% 499M/892M [00:13<00:11, 35.0MB/s]
model.safetensors: 56% 503M/892M [00:13<00:10, 35.7MB/s]
optimizer.pt: 29% 519M/1.78G [00:13<00:30, 41.4MB/s]
model.safetensors: 57% 510M/892M [00:13<00:09, 41.7MB/s]
optimizer.pt: 30% 526M/1.78G [00:13<00:29, 43.2MB/s]
model.safetensors: 58% 515M/892M [00:13<00:09, 38.8MB/s]
model.safetensors: 50% 446M/892M [00:13<00:10, 42.4MB/s]
model.safetensors: 58% 519M/892M [00:14<00:09, 38.2MB/s]
model.safetensors: 59% 524M/892M [00:14<00:09, 38.0MB/s]
model.safetensors: 51% 451M/892M [00:13<00:12, 36.7MB/s]
model.safetensors: 51% 456M/892M [00:13<00:11, 36.6MB/s]
model.safetensors: 59% 528M/892M [00:14<00:12, 28.4MB/s]
model.safetensors: 60% 536M/892M [00:14<00:09, 37.8MB/s]
model.safetensors: 61% 542M/892M [00:14<00:08, 42.2MB/s]> INFO Running jobs: [7899] INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /is_model_training HTTP/1.1" 200 OK
model.safetensors: 53% 472M/892M [00:13<00:10, 39.9MB/s]
optimizer.pt: 31% 561M/1.78G [00:14<00:33, 37.0MB/s]INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /accelerators HTTP/1.1" 200 OK
model.safetensors: 62% 553M/892M [00:15<00:09, 36.8MB/s]
model.safetensors: 54% 480M/892M [00:14<00:12, 33.8MB/s]
model.safetensors: 55% 489M/892M [00:14<00:09, 42.9MB/s]
optimizer.pt: 32% 576M/1.78G [00:15<00:32, 36.8MB/s]
model.safetensors: 63% 560M/892M [00:15<00:11, 27.9MB/s]
model.safetensors: 56% 496M/892M [00:14<00:10, 37.6MB/s]
model.safetensors: 57% 505M/892M [00:14<00:08, 45.8MB/s]
model.safetensors: 64% 571M/892M [00:15<00:08, 38.3MB/s]
model.safetensors: 57% 511M/892M [00:14<00:08, 45.6MB/s]
model.safetensors: 65% 576M/892M [00:15<00:10, 30.9MB/s]
model.safetensors: 66% 586M/892M [00:15<00:07, 41.9MB/s]
model.safetensors: 59% 523M/892M [00:15<00:08, 41.8MB/s]
model.safetensors: 66% 592M/892M [00:16<00:08, 35.7MB/s]
optimizer.pt: 35% 621M/1.78G [00:16<00:24, 46.8MB/s]
model.safetensors: 67% 599M/892M [00:16<00:07, 41.6MB/s]
model.safetensors: 68% 606M/892M [00:16<00:06, 45.2MB/s]
optimizer.pt: 35% 628M/1.78G [00:16<00:30, 37.7MB/s]
model.safetensors: 69% 612M/892M [00:16<00:06, 40.9MB/s]
model.safetensors: 69% 616M/892M [00:16<00:06, 42.2MB/s]
model.safetensors: 70% 624M/892M [00:16<00:06, 39.4MB/s]
model.safetensors: 72% 639M/892M [00:17<00:04, 52.9MB/s]
optimizer.pt: 37% 657M/1.78G [00:17<00:29, 38.7MB/s]
model.safetensors: 72% 645M/892M [00:17<00:05, 43.7MB/s]
optimizer.pt: 38% 669M/1.78G [00:17<00:22, 49.2MB/s]
model.safetensors: 73% 650M/892M [00:17<00:05, 40.9MB/s]
optimizer.pt: 38% 675M/1.78G [00:17<00:25, 42.8MB/s]
model.safetensors: 74% 656M/892M [00:17<00:06, 36.6MB/s]
model.safetensors: 74% 663M/892M [00:17<00:05, 44.0MB/s]
model.safetensors: 63% 560M/892M [00:16<00:15, 21.6MB/s]
model.safetensors: 75% 669M/892M [00:17<00:05, 44.5MB/s]
model.safetensors: 64% 567M/892M [00:16<00:12, 26.8MB/s]
model.safetensors: 65% 576M/892M [00:17<00:08, 37.4MB/s]
model.safetensors: 76% 681M/892M [00:18<00:04, 46.4MB/s]
model.safetensors: 77% 686M/892M [00:18<00:04, 47.3MB/s]
model.safetensors: 78% 691M/892M [00:18<00:04, 41.6MB/s]
model.safetensors: 65% 581M/892M [00:17<00:13, 22.3MB/s]
model.safetensors: 79% 700M/892M [00:18<00:04, 43.0MB/s]
optimizer.pt: 40% 717M/1.78G [00:18<00:26, 39.9MB/s]
model.safetensors: 66% 585M/892M [00:17<00:13, 23.5MB/s]
model.safetensors: 66% 591M/892M [00:17<00:10, 28.2MB/s]
model.safetensors: 79% 705M/892M [00:18<00:06, 29.3MB/s]
model.safetensors: 67% 595M/892M [00:18<00:11, 26.0MB/s]
model.safetensors: 80% 711M/892M [00:19<00:05, 35.2MB/s]
model.safetensors: 80% 715M/892M [00:19<00:04, 36.6MB/s]
optimizer.pt: 41% 734M/1.78G [00:19<00:26, 39.4MB/s]
model.safetensors: 68% 605M/892M [00:18<00:09, 31.6MB/s]
model.safetensors: 81% 720M/892M [00:19<00:06, 26.5MB/s]
model.safetensors: 68% 609M/892M [00:18<00:10, 26.6MB/s]
optimizer.pt: 42% 749M/1.78G [00:19<00:24, 41.8MB/s]
model.safetensors: 82% 727M/892M [00:19<00:05, 32.5MB/s]> INFO Running jobs: [7899] INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /is_model_training HTTP/1.1" 200 OK
model.safetensors: 82% 733M/892M [00:19<00:04, 35.3MB/s]
optimizer.pt: 42% 754M/1.78G [00:19<00:35, 29.3MB/s]
model.safetensors: 83% 737M/892M [00:19<00:05, 29.9MB/s]
model.safetensors: 83% 742M/892M [00:20<00:04, 34.3MB/s]
model.safetensors: 84% 748M/892M [00:20<00:03, 38.1MB/s]
model.safetensors: 71% 637M/892M [00:19<00:06, 38.2MB/s]
optimizer.pt: 43% 770M/1.78G [00:20<00:30, 33.0MB/s]
model.safetensors: 84% 752M/892M [00:20<00:04, 28.8MB/s]
model.safetensors: 85% 760M/892M [00:20<00:03, 39.1MB/s]
model.safetensors: 86% 766M/892M [00:20<00:03, 41.7MB/s]
model.safetensors: 73% 654M/892M [00:19<00:05, 40.2MB/s]
model.safetensors: 86% 771M/892M [00:20<00:03, 37.9MB/s]
model.safetensors: 87% 777M/892M [00:20<00:02, 42.4MB/s]
model.safetensors: 88% 782M/892M [00:20<00:02, 43.7MB/s]
model.safetensors: 74% 664M/892M [00:20<00:06, 34.3MB/s]
model.safetensors: 75% 672M/892M [00:20<00:05, 43.8MB/s]
model.safetensors: 88% 787M/892M [00:21<00:03, 34.4MB/s]
model.safetensors: 89% 793M/892M [00:21<00:02, 41.9MB/s]
model.safetensors: 90% 799M/892M [00:21<00:02, 44.6MB/s]
optimizer.pt: 46% 819M/1.78G [00:21<00:26, 36.9MB/s]
model.safetensors: 77% 685M/892M [00:20<00:05, 38.7MB/s]
optimizer.pt: 46% 825M/1.78G [00:21<00:24, 39.8MB/s]
model.safetensors: 77% 690M/892M [00:20<00:06, 33.2MB/s]
model.safetensors: 79% 702M/892M [00:20<00:03, 50.1MB/s]
optimizer.pt: 47% 832M/1.78G [00:21<00:28, 33.1MB/s]
optimizer.pt: 47% 846M/1.78G [00:21<00:18, 50.3MB/s]
model.safetensors: 79% 709M/892M [00:21<00:04, 42.4MB/s]
model.safetensors: 81% 720M/892M [00:21<00:03, 56.4MB/s]
optimizer.pt: 48% 853M/1.78G [00:22<00:21, 42.9MB/s]
optimizer.pt: 48% 864M/1.78G [00:22<00:23, 39.3MB/s]
model.safetensors: 82% 727M/892M [00:21<00:04, 34.0MB/s]
optimizer.pt: 49% 873M/1.78G [00:22<00:19, 45.6MB/s]
model.safetensors: 82% 733M/892M [00:21<00:04, 36.3MB/s]
model.safetensors: 91% 816M/892M [00:22<00:04, 18.3MB/s]
model.safetensors: 83% 738M/892M [00:22<00:05, 29.4MB/s]
optimizer.pt: 50% 885M/1.78G [00:22<00:22, 39.9MB/s]
model.safetensors: 83% 744M/892M [00:22<00:04, 34.3MB/s]
optimizer.pt: 50% 891M/1.78G [00:23<00:21, 41.9MB/s]
model.safetensors: 92% 822M/892M [00:23<00:03, 18.0MB/s]
optimizer.pt: 50% 896M/1.78G [00:23<00:26, 33.0MB/s]
model.safetensors: 93% 832M/892M [00:23<00:02, 26.5MB/s]
optimizer.pt: 51% 902M/1.78G [00:23<00:23, 37.4MB/s]
model.safetensors: 85% 759M/892M [00:22<00:03, 35.1MB/s]
optimizer.pt: 51% 909M/1.78G [00:23<00:20, 42.1MB/s]
model.safetensors: 94% 838M/892M [00:23<00:02, 24.7MB/s]
optimizer.pt: 51% 914M/1.78G [00:23<00:26, 32.4MB/s]
optimizer.pt: 52% 927M/1.78G [00:23<00:17, 49.7MB/s]
model.safetensors: 95% 848M/892M [00:24<00:01, 26.0MB/s]
model.safetensors: 96% 857M/892M [00:24<00:01, 32.1MB/s]
model.safetensors: 97% 862M/892M [00:24<00:00, 34.4MB/s]
optimizer.pt: 53% 938M/1.78G [00:24<00:21, 38.9MB/s]
model.safetensors: 88% 784M/892M [00:23<00:03, 29.1MB/s]
model.safetensors: 89% 793M/892M [00:23<00:02, 38.7MB/s]
optimizer.pt: 53% 944M/1.78G [00:24<00:24, 33.8MB/s]
optimizer.pt: 54% 957M/1.78G [00:24<00:16, 51.6MB/s]
model.safetensors: 97% 867M/892M [00:24<00:01, 22.0MB/s]
model.safetensors: 98% 873M/892M [00:24<00:00, 26.1MB/s]
model.safetensors: 91% 814M/892M [00:24<00:01, 44.3MB/s]
model.safetensors: 99% 879M/892M [00:25<00:00, 29.1MB/s]
optimizer.pt: 54% 970M/1.78G [00:25<00:19, 41.3MB/s]
model.safetensors: 100% 889M/892M [00:25<00:00, 32.8MB/s]
model.safetensors: 93% 827M/892M [00:24<00:01, 41.4MB/s]
optimizer.pt: 55% 976M/1.78G [00:25<00:22, 35.3MB/s]
optimizer.pt: 55% 983M/1.78G [00:25<00:20, 39.6MB/s]
model.safetensors: 100% 892M/892M [00:25<00:00, 34.8MB/s]
model.safetensors: 94% 842M/892M [00:24<00:01, 48.6MB/s]
optimizer.pt: 56% 992M/1.78G [00:25<00:24, 32.6MB/s]
model.safetensors: 95% 848M/892M [00:24<00:01, 40.8MB/s]
optimizer.pt: 56% 1.00G/1.78G [00:25<00:18, 42.0MB/s]
model.safetensors: 96% 856M/892M [00:25<00:00, 48.8MB/s]
Upload 11 LFS files: 9% 1/11 [00:25<04:19, 25.95s/it]
optimizer.pt: 56% 1.01G/1.78G [00:26<00:17, 44.3MB/s]
model.safetensors: 97% 862M/892M [00:25<00:00, 49.3MB/s]
optimizer.pt: 57% 1.01G/1.78G [00:26<00:19, 39.1MB/s]
model.safetensors: 97% 868M/892M [00:25<00:00, 41.6MB/s]
optimizer.pt: 57% 1.02G/1.78G [00:26<00:17, 42.6MB/s]
model.safetensors: 98% 874M/892M [00:25<00:00, 44.6MB/s]
model.safetensors: 99% 880M/892M [00:25<00:00, 37.8MB/s]
optimizer.pt: 57% 1.02G/1.78G [00:26<00:22, 33.6MB/s]
model.safetensors: 100% 889M/892M [00:25<00:00, 49.3MB/s]
optimizer.pt: 58% 1.03G/1.78G [00:26<00:18, 40.2MB/s]
model.safetensors: 100% 892M/892M [00:25<00:00, 34.3MB/s]
optimizer.pt: 59% 1.05G/1.78G [00:27<00:18, 39.9MB/s]
optimizer.pt: 59% 1.06G/1.78G [00:27<00:21, 33.9MB/s]
optimizer.pt: 60% 1.07G/1.78G [00:27<00:14, 48.8MB/s]
optimizer.pt: 60% 1.08G/1.78G [00:27<00:18, 38.0MB/s]
optimizer.pt: 61% 1.09G/1.78G [00:27<00:14, 49.0MB/s]
optimizer.pt: 61% 1.09G/1.78G [00:28<00:16, 42.4MB/s]
optimizer.pt: 62% 1.10G/1.78G [00:28<00:16, 41.0MB/s]
optimizer.pt: 63% 1.12G/1.78G [00:28<00:11, 56.8MB/s]
optimizer.pt: 63% 1.13G/1.78G [00:29<00:19, 33.8MB/s]
optimizer.pt: 64% 1.14G/1.78G [00:29<00:18, 35.4MB/s]
optimizer.pt: 65% 1.15G/1.78G [00:29<00:12, 49.1MB/s]
optimizer.pt: 65% 1.16G/1.78G [00:29<00:13, 46.1MB/s]
optimizer.pt: 65% 1.17G/1.78G [00:30<00:17, 35.1MB/s]
optimizer.pt: 66% 1.18G/1.78G [00:30<00:12, 48.2MB/s]
optimizer.pt: 67% 1.19G/1.78G [00:30<00:14, 41.5MB/s]
optimizer.pt: 67% 1.20G/1.78G [00:30<00:16, 36.1MB/s]
optimizer.pt: 68% 1.21G/1.78G [00:30<00:11, 49.6MB/s]
optimizer.pt: 69% 1.22G/1.78G [00:31<00:15, 35.9MB/s]
optimizer.pt: 69% 1.23G/1.78G [00:31<00:18, 29.2MB/s]
optimizer.pt: 70% 1.25G/1.78G [00:31<00:12, 41.6MB/s]
optimizer.pt: 70% 1.25G/1.78G [00:32<00:13, 38.2MB/s]
optimizer.pt: 71% 1.26G/1.78G [00:32<00:12, 40.6MB/s]
optimizer.pt: 72% 1.28G/1.78G [00:32<00:09, 53.3MB/s]
optimizer.pt: 72% 1.28G/1.78G [00:32<00:10, 47.6MB/s]
optimizer.pt: 73% 1.30G/1.78G [00:33<00:11, 42.7MB/s]
optimizer.pt: 73% 1.31G/1.78G [00:33<00:08, 56.5MB/s]
optimizer.pt: 74% 1.32G/1.78G [00:33<00:09, 46.9MB/s]
optimizer.pt: 74% 1.33G/1.78G [00:33<00:14, 31.7MB/s]
optimizer.pt: 75% 1.34G/1.78G [00:34<00:09, 44.1MB/s]
optimizer.pt: 76% 1.35G/1.78G [00:34<00:12, 34.9MB/s]
optimizer.pt: 76% 1.36G/1.78G [00:34<00:11, 35.3MB/s]
optimizer.pt: 77% 1.37G/1.78G [00:34<00:08, 47.7MB/s]
optimizer.pt: 77% 1.38G/1.78G [00:35<00:09, 41.2MB/s]
optimizer.pt: 78% 1.39G/1.78G [00:35<00:12, 30.8MB/s]
optimizer.pt: 79% 1.41G/1.78G [00:35<00:08, 43.1MB/s]
optimizer.pt: 79% 1.41G/1.78G [00:36<00:09, 40.6MB/s]
optimizer.pt: 80% 1.42G/1.78G [00:36<00:10, 35.8MB/s]
optimizer.pt: 81% 1.44G/1.78G [00:36<00:07, 49.1MB/s]
optimizer.pt: 81% 1.45G/1.78G [00:36<00:07, 42.8MB/s]
optimizer.pt: 82% 1.46G/1.78G [00:37<00:08, 39.9MB/s]
optimizer.pt: 82% 1.47G/1.78G [00:37<00:05, 53.8MB/s]
optimizer.pt: 83% 1.48G/1.78G [00:37<00:07, 43.4MB/s]
optimizer.pt: 83% 1.49G/1.78G [00:37<00:07, 38.8MB/s]
optimizer.pt: 84% 1.50G/1.78G [00:37<00:05, 52.5MB/s]
optimizer.pt: 85% 1.51G/1.78G [00:38<00:07, 38.2MB/s]
optimizer.pt: 85% 1.52G/1.78G [00:38<00:07, 35.6MB/s]
optimizer.pt: 86% 1.53G/1.78G [00:38<00:05, 48.5MB/s]
optimizer.pt: 86% 1.54G/1.78G [00:39<00:06, 39.2MB/s]
optimizer.pt: 87% 1.55G/1.78G [00:39<00:05, 38.9MB/s]
optimizer.pt: 88% 1.57G/1.78G [00:39<00:04, 51.7MB/s]
optimizer.pt: 88% 1.57G/1.78G [00:39<00:04, 48.2MB/s]
optimizer.pt: 89% 1.58G/1.78G [00:39<00:04, 40.3MB/s]
optimizer.pt: 90% 1.60G/1.78G [00:40<00:03, 54.1MB/s]
optimizer.pt: 90% 1.61G/1.78G [00:40<00:03, 47.2MB/s]
optimizer.pt: 91% 1.62G/1.78G [00:40<00:04, 36.7MB/s]
optimizer.pt: 91% 1.63G/1.78G [00:40<00:03, 49.6MB/s]
optimizer.pt: 92% 1.64G/1.78G [00:41<00:04, 31.0MB/s]
optimizer.pt: 92% 1.65G/1.78G [00:41<00:04, 30.7MB/s]
optimizer.pt: 93% 1.66G/1.78G [00:41<00:02, 42.7MB/s]
optimizer.pt: 94% 1.67G/1.78G [00:42<00:02, 39.4MB/s]
optimizer.pt: 94% 1.68G/1.78G [00:42<00:02, 47.8MB/s]
optimizer.pt: 95% 1.69G/1.78G [00:42<00:02, 47.7MB/s]
optimizer.pt: 95% 1.70G/1.78G [00:42<00:02, 41.4MB/s]
optimizer.pt: 96% 1.71G/1.78G [00:42<00:01, 56.8MB/s]
optimizer.pt: 96% 1.72G/1.78G [00:43<00:01, 47.8MB/s]
optimizer.pt: 97% 1.73G/1.78G [00:43<00:01, 40.4MB/s]
optimizer.pt: 98% 1.74G/1.78G [00:43<00:00, 54.0MB/s]
optimizer.pt: 98% 1.75G/1.78G [00:43<00:00, 46.2MB/s]
optimizer.pt: 99% 1.76G/1.78G [00:44<00:00, 43.0MB/s]
optimizer.pt: 99% 1.77G/1.78G [00:44<00:00, 56.2MB/s]
optimizer.pt: 100% 1.78G/1.78G [00:44<00:00, 39.8MB/s]
Upload 11 LFS files: 100% 11/11 [00:45<00:00, 4.11s/it] INFO: 2a02:14f:1f5:eba7:6821:2e7d:a0aa:87b4:0 - "GET /accelerators HTTP/1.1" 200 OK
INFO Running jobs: [7899] INFO Killing PID: 7899 INFO Running jobs: []
```
Tried several things in a fork https://github.com/tombenj/autotrain-advanced/commits/length/ such as adding `max_new_tokens` to the model generation:
https://github.com/tombenj/autotrain-advanced/commit/c9741b914c5625ff69b9890c2006964eea285754
As suggested here: https://www.markhneedham.com/blog/2023/06/19/huggingface-max-length-generation-length-deprecated/
But I'm still getting responses cut off at a 20-token length.
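For reference, this is roughly the inference-side workaround that blog post describes: passing `max_new_tokens` explicitly to `generate()` so the model-agnostic `max_length=20` default never applies. A minimal sketch, where the checkpoint id and input text are just placeholders from this run:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Placeholder checkpoint id from the training run above.
model_id = "tombenj/autotrain-gsxqu-k795g"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

inputs = tokenizer("some long input text", return_tensors="pt", truncation=True)
# Passing max_new_tokens explicitly overrides the 20-token default.
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```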
@abhishekkrthakur can you point me in a direction on how to resolve this?
ohh those are the default parameters. you can change the default params: https://huggingface.co/docs/hub/models-widgets#example-outputs
@abhishekkrthakur it has nothing to do with the default params. Training Facebook's BART results in good output, while training T5 gives at most 20-token outputs.
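One way to see where the BART/T5 difference could come from (a hedged sketch, assuming the stock base checkpoints) is to compare the generation defaults each base model's config ships with:

```python
from transformers import AutoConfig, GenerationConfig

# Assumed base checkpoints; only config.json is downloaded here.
for name in ["google-t5/t5-base", "facebook/bart-base"]:
    gen_cfg = GenerationConfig.from_model_config(AutoConfig.from_pretrained(name))
    print(name, "max_length =", gen_cfg.max_length, "max_new_tokens =", gen_cfg.max_new_tokens)
```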
can you share the trained model repo?
@abhishekkrthakur yep here is an example: https://huggingface.co/tombenj/tuple-1k-t5
Getting only 20 tokens as output.
changing the params here has no effect: https://huggingface.co/tombenj/tuple-1k-t5/blob/main/config.json#L29 ?
@abhishekkrthakur changed it here and I'm still getting the same max-20-token output: https://huggingface.co/tombenj/tuple-1k-t5/commit/0868248619d5a457bc52a13af26af94d93a436b1 https://huggingface.co/tombenj/tuple-1k-t5/commit/6823fe355c7fd90a9fd0bfa6b72e8784bebb0b16
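If editing `config.json` by hand doesn't stick, a hedged sketch of checking and persisting the generation default via the model's `GenerationConfig` instead (assuming write access to the repo; `generate()` reads its defaults from this config):

```python
from transformers import AutoModelForSeq2SeqLM

model_id = "tombenj/tuple-1k-t5"
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
print(model.generation_config)  # shows which max_length / max_new_tokens is actually in effect

# Persist a longer default alongside the weights (writes generation_config.json).
gen_cfg = model.generation_config
gen_cfg.max_new_tokens = 256
gen_cfg.push_to_hub(model_id)
```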
@abhishekkrthakur any updates on this?
This issue is stale because it has been open for 15 days with no activity.
This issue was closed because it has been inactive for 2 days since being marked as stale.
Prerequisites
Backend: Hugging Face Space/Endpoints
Interface Used: UI
CLI Command: No response
UI Screenshots & Parameters
{ "seed": 42, "lr": 0.00005, "epochs": 3, "max_seq_length": 512, "max_target_length": 256, "max_length": 1024, "max_new_tokens": 100, "batch_size": 8, "warmup_ratio": 0.1, "gradient_accumulation": 1, "optimizer": "adamw_torch", "scheduler": "linear", "weight_decay": 0, "max_grad_norm": 1, "logging_steps": -1, "evaluation_strategy": "epoch", "auto_find_batch_size": false, "mixed_precision": "fp16", "save_total_limit": 1, "save_strategy": "epoch", "peft": false, "quantization": null, "lora_r": 16, "lora_alpha": 32, "lora_dropout": 0.05, "target_modules": [ "all-linear" ] }
Error Logs
67%|██████▋ | 18000/27000 [1:28:30<36:56, 4.06it/s]/app/env/lib/python3.10/site-packages/transformers/generation/utils.py:1178: UserWarning: Using the model-agnostic default max_length (=20) to control the generation length. We recommend setting max_new_tokens to control the ma
Additional Information
Using seq2seq with google-t5/t5-base. Would love any suggestions on how to force a longer generation length.
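On the training side, a minimal sketch of the kind of change that would lift the 20-token cap during evaluation with a `Seq2SeqTrainer`. This is not AutoTrain's actual code, just an illustration of the relevant `Seq2SeqTrainingArguments` fields:

```python
from transformers import Seq2SeqTrainingArguments

# Illustrative only: generation during eval/predict is bounded by
# generation_max_length (or the model's GenerationConfig), not by the
# max_target_length used for tokenizing the labels.
training_args = Seq2SeqTrainingArguments(
    output_dir="output",
    predict_with_generate=True,
    generation_max_length=256,  # without this, generate() falls back to the model default (20)
)
```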