Hello @asifehmad, can you please show the output of the `accelerate env` command, i.e., what accelerate config are you using? Also, Google Colab provides only a single GPU, right? If yes, then ZeRO stages without CPU offloading will be the same as a plain PyTorch run, i.e., they won't result in any reduction of GPU memory usage.
Hi @pacman100, Sure! Here is the output of accelerate env
Copy-and-paste the text below in your GitHub issue
- `Accelerate` version: 0.15.0
- Platform: Linux-5.10.133+-x86_64-with-glibc2.27
- Python version: 3.8.16
- Numpy version: 1.21.6
- PyTorch version (GPU?): 1.13.0+cu116 (True)
- `Accelerate` default config:
- compute_environment: LOCAL_MACHINE
- distributed_type: NO
- mixed_precision: fp16
- use_cpu: False
- dynamo_backend: NO
- num_processes: 1
- machine_rank: 0
- num_machines: 1
- gpu_ids: None
- main_process_ip: None
- main_process_port: None
- rdzv_backend: static
- same_network: False
- main_training_function: main
- deepspeed_config: {}
- fsdp_config: {}
- megatron_lm_config: {}
- downcast_bf16: False
- tpu_name: None
- tpu_zone: None
- command_file: None
- commands: None
Also, Google Colab provides only single GPU, right? If yes, then ZeRO stages without CPU offloading will be same as plain PyTorch run, i.e., won't result in any reduction of GPU memory usage.
As mentioned here, if CPU offloading isn't being used, DeepSpeed Stages on a single GPU won't help. Just making sure that context of usage is correct before diving further.
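For reference, on a single GPU the memory savings from ZeRO come from offloading optimizer states and parameters to CPU. A minimal sketch of enabling that through Accelerate's DeepSpeedPlugin is shown below; the exact values are illustrative and not taken from this thread.

```python
from accelerate import Accelerator, DeepSpeedPlugin

# Sketch: ZeRO stage 3 with CPU offloading, which is what actually reduces
# GPU memory on a single-GPU setup such as Colab.
deepspeed_plugin = DeepSpeedPlugin(
    zero_stage=3,
    gradient_accumulation_steps=1,
    offload_optimizer_device="cpu",  # offload optimizer states to CPU
    offload_param_device="cpu",      # offload partitioned parameters to CPU
)
accelerator = Accelerator(mixed_precision="fp16", deepspeed_plugin=deepspeed_plugin)
```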
I have tested it with multiple GPUs from DataCrunch as well; the same error is there. What are your suggestions?
And one more thing: if it runs like a plain PyTorch run in Colab, it should at least run without any error, just as the script runs with zero_stage 2. Then why does this error come up when I shift to zero_stage 3? @pacman100
Hello, I meant that it would be similar to running plain PyTorch, as there wouldn't be any benefits from DeepSpeed ZeRO.
Yes, I got that! Could you please help with the error I am facing while using stage 3? @pacman100
Hello @asifehmad, after the eval loop you aren't calling model.train() before resuming training. Add model.train() on line 447 here https://github.com/asifehmad/clm_model_tuning/blob/main/tuned.py#L447 and things should work. Also, the way you are saving the model is wrong when using DeepSpeed stage 3. Please refer to https://github.com/huggingface/accelerate/blob/main/examples/by_feature/deepspeed_with_config_support.py#L708-L722 for the same.
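For context, the linked lines save a ZeRO stage 3 model roughly as sketched below: stage 3 shards the parameters across processes, so the full state dict has to be gathered before calling save_pretrained. This is a sketch based on that example; output_dir, model and accelerator are assumed from the surrounding script.

```python
# Stage-3-safe saving: gather the sharded weights into a full state dict first.
unwrapped_model = accelerator.unwrap_model(model)
unwrapped_model.save_pretrained(
    output_dir,
    is_main_process=accelerator.is_main_process,
    save_function=accelerator.save,
    state_dict=accelerator.get_state_dict(model),  # consolidates ZeRO-3 shards
)
```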
Hey, @pacman100! Thanks a lot! I will check it and will let you know.
Hello @pacman100, the model.train() is already there at line https://github.com/asifehmad/clm_model_tuning/blob/main/tuned.py#L375. Did you see that? And it is working very well with stage 2.
I did, but that gets overwritten by model.eval() when you evaluate after certain steps without switching back to model.train(). So, in other cases, even if things seem to work they may not be correct, as things like dropout won't be enabled at all. Have you tried the suggestion and checked if things work?
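A minimal sketch of the suggested pattern (variable names are assumed from the script above, not copied verbatim):

```python
for epoch in range(num_epochs):
    model.train()  # dropout etc. enabled for training
    for step, batch in enumerate(train_dataloader):
        loss = model(**batch).loss
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()

        if step % eval_every == 0:
            model.eval()  # disable dropout for evaluation
            with torch.no_grad():
                for eval_batch in eval_dataloader:
                    eval_loss = model(**eval_batch).loss
            model.train()  # switch back, otherwise training resumes in eval mode
```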
Hey @pacman100, I have been trying since then, and even tried the Dummy optimizer as well, which is required for stage 3. At the end the error still shows up.
This is the accelerate env:
Copy-and-paste the text below in your GitHub issue
- `Accelerate` version: 0.15.0
- `Accelerate` default config:
I am trying on 2xA100 GPUs rented from DataCrunch.io
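As an aside on the Dummy optimizer mentioned above: in Accelerate's DeepSpeed integration, DummyOptim/DummyScheduler are placeholders used when the optimizer and scheduler are defined in the DeepSpeed config file rather than in code. A hedged sketch of that pattern, with illustrative hyperparameters:

```python
from accelerate.utils import DummyOptim, DummyScheduler

# Placeholders that Accelerate swaps for the optimizer/scheduler declared
# in the DeepSpeed config file; only needed when the config declares them.
optimizer = DummyOptim(model.parameters(), lr=1e-5)
lr_scheduler = DummyScheduler(
    optimizer, total_num_steps=max_train_steps, warmup_num_steps=0
)
```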
Hello @asifehmad, I made the changes that I suggested above to get the following code, which works fine. In conf, I set `concatenate_raw: true`. Accelerate version 0.0.15.dev, DeepSpeed version 0.7.7, PyTorch version 1.14.0.dev20221117+cu117 and transformers version 4.23.0.dev0.
```python
#!/usr/bin/env python
# coding=utf-8
"""
Fine-tuning the library models for causal language modeling (GPT, GPT-2, CTRL, ...)
on a text file or a dataset without using HuggingFace Trainer.
Here is the full list of checkpoints on the hub that can be fine-tuned by this script:
https://huggingface.co/models?filter=text-generation
"""
import logging
import math
import os
import random
from itertools import chain

import datasets
import hydra
import torch
import transformers
from accelerate import Accelerator, DistributedType, DeepSpeedPlugin
from accelerate.logging import get_logger
from accelerate.utils import set_seed
from datasets import Dataset, DatasetDict, load_dataset
from omegaconf import OmegaConf
from omegaconf.dictconfig import DictConfig
from torch.utils.data import DataLoader
from tqdm.auto import tqdm
from transformers import (
    AutoConfig,
    AutoModelForCausalLM,
    AutoTokenizer,
    default_data_collator,
    get_scheduler,
)

import bittensor

deepspeed_plugin = DeepSpeedPlugin(zero_stage=3, gradient_accumulation_steps=4)


def check_cfg_and_load_defaults(cfg: DictConfig) -> DictConfig:
    subtensor = bittensor.subtensor(network=cfg.bittensor.network)
    if cfg.dataset.block_size is None:
        cfg.dataset.block_size = subtensor.validator_sequence_length
    if cfg.training.train_batch_size is None:
        cfg.training.train_batch_size = subtensor.validator_batch_size
    if cfg.training.eval_batch_size is None:
        cfg.training.eval_batch_size = subtensor.validator_batch_size
    return cfg


def create_accelerator(cfg: DictConfig) -> Accelerator:
    accelerator = (
        Accelerator(log_with=cfg.tracking.report_to, logging_dir=cfg.output_dir)
        if cfg.tracking.enabled
        else Accelerator(mixed_precision="fp16", deepspeed_plugin=deepspeed_plugin)
    )
    if accelerator.is_local_main_process:
        datasets.utils.logging.set_verbosity_warning()
        transformers.utils.logging.set_verbosity_info()
    else:
        datasets.utils.logging.set_verbosity_error()
        transformers.utils.logging.set_verbosity_error()
    return accelerator


def load_raw_datasets(cfg: DictConfig) -> DatasetDict:
    if cfg.dataset.name == "bittensor":
        dataset = bittensor.dataset(
            no_tokenizer=True,
            batch_size=cfg.training.train_batch_size,
            block_size=cfg.dataset.block_size,
        )
        dataloader = dataset.dataloader(cfg.dataset.num_batches)
        bittensor_dataset = {"text": []}
        for batch in tqdm(dataloader, desc="Loading data from bittensor IPFS"):
            bittensor_dataset["text"].extend(batch)
        raw_datasets = Dataset.from_dict(bittensor_dataset)
        dataset.close()  # Avoid leaving threadqueue running.
        return raw_datasets

    if os.path.exists(cfg.dataset.name):
        data_files = {"text": cfg.dataset.name}
        dataset_args = {}
        extension = os.path.splitext(cfg.dataset.name)[-1].lstrip(".")
        if extension == "txt":
            extension = "text"
            dataset_args["keep_linebreaks"] = cfg.dataset.keep_linebreaks
        raw_datasets = load_dataset(extension, data_files=data_files, **dataset_args)
        raw_datasets = raw_datasets["text"]
    else:
        raw_datasets = load_dataset(cfg.dataset.name, cfg.dataset.config_name)

    return raw_datasets


def load_model_and_tokenizer(cfg: DictConfig):
    if cfg.model.config_name is not None:
        config = AutoConfig.from_pretrained(cfg.model.config_name)
    else:
        config = AutoConfig.from_pretrained(cfg.model.name)

    if cfg.tokenizer.name is not None:
        tokenizer = AutoTokenizer.from_pretrained(
            cfg.tokenizer.name, use_fast=cfg.tokenizer.use_fast
        )
    else:
        tokenizer = AutoTokenizer.from_pretrained(
            cfg.model.name, use_fast=cfg.tokenizer.use_fast
        )
    # tokenizer.pad_token = cfg.tokenizer.pad_token
    if tokenizer.pad_token is None and tokenizer.eos_token is not None:
        tokenizer.pad_token = tokenizer.eos_token

    model = AutoModelForCausalLM.from_pretrained(
        cfg.model.name,
        from_tf=bool(".ckpt" in cfg.model.name),
        config=config,
    )
    model.resize_token_embeddings(len(tokenizer))

    return tokenizer, model


def create_optimizer(cfg, model):
    no_decay = ["bias", "LayerNorm.weight"]
    optimizer_grouped_parameters = [
        {
            "params": [
                p
                for n, p in model.named_parameters()
                if not any(nd in n for nd in no_decay)
            ],
            "weight_decay": cfg.training.weight_decay,
        },
        {
            "params": [
                p
                for n, p in model.named_parameters()
                if any(nd in n for nd in no_decay)
            ],
            "weight_decay": 0.0,
        },
    ]
    return torch.optim.AdamW(
        optimizer_grouped_parameters, lr=cfg.training.learning_rate
    )


def preprocess(cfg, accelerator, tokenizer, raw_datasets):
    # First we tokenize all the texts.
    column_names = raw_datasets.column_names
    text_column_name = "text" if "text" in column_names else column_names["train"][0]
    if cfg.dataset.concatenate_raw is True:
        pad = False
    else:
        pad = "max_length"

    def group_texts(examples):
        # print(examples)
        # Concatenate all texts.
        concatenated_examples = {k: list(chain(*examples[k])) for k in examples.keys()}
        # print(concatenated_examples)
        total_length = len(concatenated_examples[list(examples.keys())[0]])
        if total_length >= cfg.dataset.block_size:
            total_length = (
                total_length // cfg.dataset.block_size
            ) * cfg.dataset.block_size
        # Split by chunks of max_len.
        result = {
            k: [
                t[i : i + cfg.dataset.block_size]
                for i in range(0, total_length, cfg.dataset.block_size)
            ]
            for k, t in concatenated_examples.items()
        }
        result["labels"] = result["input_ids"].copy()
        return result

    def tokenize_fn(examples):
        # result = tokenizer(
        #     examples[text_column_name],
        #     padding=pad,
        #     truncation=True,
        #     max_length=cfg.dataset.block_size,
        # )
        # result["labels"] = result["input_ids"].copy()
        # return result
        return tokenizer(examples[text_column_name])

    with accelerator.main_process_first():
        tokenized_datasets = raw_datasets.map(
            tokenize_fn,
            batched=True,
            remove_columns=text_column_name,
            num_proc=cfg.tokenizer.preprocessing_num_workers,
            load_from_cache_file=not cfg.dataset.overwrite_cache,
            desc="Running tokenizer on dataset",
        )
        # print(tokenized_datasets["train"][0:10])
        if cfg.dataset.concatenate_raw is True:
            lm_datasets = tokenized_datasets.map(
                group_texts,
                batched=True,
                num_proc=cfg.tokenizer.preprocessing_num_workers,
                load_from_cache_file=not cfg.dataset.overwrite_cache,
                desc=f"Grouping texts in chunks of {cfg.dataset.block_size}",
            )

    return lm_datasets


@hydra.main(version_base=None, config_path="conf", config_name="config")
def main(cfg: DictConfig):
    cfg = check_cfg_and_load_defaults(cfg)
    os.makedirs(cfg.output_dir, exist_ok=True)

    logger = get_logger(__name__)
    logging.basicConfig(
        format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
        datefmt="%m/%d/%Y %H:%M:%S",
        level=logging.INFO,
    )

    accelerator = create_accelerator(cfg)
    accelerator.wait_for_everyone()

    if cfg.training.seed is not None:
        logger.info(f"Setting random seed to {cfg.training.seed}")
        set_seed(cfg.training.seed)

    logger.info(accelerator.state, main_process_only=False)
    logger.info(OmegaConf.to_yaml(cfg))

    tokenizer, model = load_model_and_tokenizer(cfg)
    optimizer = create_optimizer(cfg, model)

    lr_scheduler = get_scheduler(
        name=cfg.training.lr_scheduler,
        optimizer=optimizer,
        num_warmup_steps=cfg.training.lr_warmup_steps,
        num_training_steps=cfg.training.max_train_steps,
    )

    # On TPU, the tie weights in our model have been disconnected, so we need to restore the ties.
    if accelerator.distributed_type == DistributedType.TPU:
        model.tie_weights()

    # Load and preprocess data
    raw_datasets = load_raw_datasets(cfg)
    tokenized_datasets = preprocess(cfg, accelerator, tokenizer, raw_datasets)
    if "train" not in tokenized_datasets.column_names:
        tokenized_datasets = tokenized_datasets.train_test_split(
            test_size=cfg.training.val_split_percent / 100
        )
        tokenized_datasets_test_valid = tokenized_datasets["test"].train_test_split(
            test_size=0.5
        )
        tokenized_datasets["test"] = tokenized_datasets_test_valid["train"]
        tokenized_datasets["validation"] = tokenized_datasets_test_valid["test"]

    train_dataset = tokenized_datasets["train"]
    eval_dataset = tokenized_datasets["validation"]

    # Log a few random samples from the training set:
    for index in random.sample(range(len(train_dataset)), 3):
        ex = train_dataset[index]
        logger.info(f"Sample {index} of the training set: {ex}: \n")
        logger.info(tokenizer.decode(ex["input_ids"]))

    # DataLoaders creation:
    train_dataloader = DataLoader(
        train_dataset,
        shuffle=True,
        collate_fn=default_data_collator,
        batch_size=cfg.training.train_batch_size,
    )
    eval_dataloader = DataLoader(
        eval_dataset,
        collate_fn=default_data_collator,
        batch_size=cfg.training.eval_batch_size,
    )

    # Prepare everything using our accelerator
    (
        model,
        optimizer,
        train_dataloader,
        eval_dataloader,
        lr_scheduler,
    ) = accelerator.prepare(
        model, optimizer, train_dataloader, eval_dataloader, lr_scheduler
    )

    # Scheduler and math around the number of training steps.
    overrode_max_train_steps = False
    num_update_steps_per_epoch = math.ceil(
        len(train_dataloader) / cfg.training.gradient_accumulation_steps
    )
    if cfg.training.max_train_steps is None:
        cfg.training.max_train_steps = (
            cfg.training.num_epochs * num_update_steps_per_epoch
        )
        overrode_max_train_steps = True

    # We need to recalculate our total training steps as the size of the training dataloader
    # may have changed.
    num_update_steps_per_epoch = math.ceil(
        len(train_dataloader) / cfg.training.gradient_accumulation_steps
    )
    if overrode_max_train_steps:
        cfg.training.max_train_steps = (
            cfg.training.num_epochs * num_update_steps_per_epoch
        )
    # Afterwards we recalculate our number of training epochs
    cfg.training.num_epochs = math.ceil(
        cfg.training.max_train_steps / num_update_steps_per_epoch
    )

    # We need to initialize the trackers we use, and also store our configuration.
    # We initialize the trackers only on main process because `accelerator.log`
    # only logs on main process and we don't want empty logs/runs on other processes.
    if cfg.tracking.enabled is True and accelerator.is_main_process:
        experiment_config = vars(cfg)
        # TensorBoard cannot log Enums, need the raw value
        experiment_config["lr_scheduler_type"] = experiment_config[
            "lr_scheduler_type"
        ].value
        accelerator.init_trackers("finetune_using_clm", experiment_config)

    logger.info("***** Running training *****")
    logger.info(f" Num examples = {len(train_dataset)}")
    logger.info(f" Num Epochs = {cfg.training.num_epochs}")
    logger.info(
        f" Gradient Accumulation steps = {cfg.training.gradient_accumulation_steps}"
    )
    logger.info(f" Total optimization steps = {cfg.training.max_train_steps}")

    # Only show the progress bar once on each machine.
    progress_bar = tqdm(
        range(cfg.training.max_train_steps),
        disable=not accelerator.is_local_main_process,
    )
    completed_steps = 0
    starting_epoch = 0

    # Potentially load in the weights and states from a previous save
    if cfg.training.checkpoint.resume_from_checkpoint > 0:
        accelerator.print(
            f"Resumed from checkpoint: {cfg.training.checkpoint.resume_from_checkpoint}"
        )
        accelerator.load_state(cfg.training.checkpoint.resume_from_checkpoint)
        path = os.path.basename(cfg.training.checkpoint.resume_from_checkpoint)
        training_difference = os.path.splitext(path)[0]

        if "epoch" in training_difference:
            starting_epoch = int(training_difference.replace("epoch_", "")) + 1
            resume_step = None
        else:
            resume_step = int(training_difference.replace("step_", ""))
            starting_epoch = resume_step // len(train_dataloader)
            resume_step -= starting_epoch * len(train_dataloader)

    for epoch in range(starting_epoch, cfg.training.num_epochs):
        model.train()
        if cfg.tracking.enabled is True:
            total_loss = 0
        train_losses = []
        for step, batch in enumerate(train_dataloader):
            # We need to skip steps until we reach the resumed step
            if (
                cfg.training.checkpoint.resume_from_checkpoint
                and epoch == starting_epoch
            ):
                if resume_step is not None and step < resume_step:
                    completed_steps += 1
                    continue

            outputs = model(**batch)
            loss = outputs.loss
            train_losses.append(
                accelerator.gather(loss.repeat(cfg.training.train_batch_size))
            )
            # We keep track of the loss at each epoch
            if cfg.tracking.enabled is True:
                total_loss += loss.detach().float()
            loss = loss / cfg.training.gradient_accumulation_steps
            accelerator.backward(loss)

            if (
                step % cfg.training.gradient_accumulation_steps == 0
                or step == len(train_dataloader) - 1
            ):
                optimizer.step()
                lr_scheduler.step()
                optimizer.zero_grad()
                progress_bar.update(1)
                completed_steps += 1

            if step % cfg.training.eval_every == 0:
                train_losses_tensor = torch.cat(train_losses)
                train_loss = torch.mean(train_losses_tensor)
                model.eval()
                eval_losses = []
                for _eval_step, eval_batch in enumerate(eval_dataloader):
                    with torch.no_grad():
                        outputs = model(**eval_batch)
                    loss = outputs.loss
                    eval_losses.append(
                        accelerator.gather(loss.repeat(cfg.training.eval_batch_size))
                    )

                losses = torch.cat(eval_losses)
                losses = losses[: len(eval_dataset)]
                try:
                    eval_loss = torch.mean(losses)
                    perplexity = math.exp(eval_loss)
                except OverflowError:
                    perplexity = float("inf")

                logger.info(
                    f"epoch {epoch}: perplexity: {perplexity} train_loss: {train_loss} eval_loss: {eval_loss}"
                )
                epoch_dir = f"epoch_{epoch}_most_recent"
                if cfg.output_dir is not None:
                    output_dir = os.path.join(cfg.output_dir, epoch_dir)
                    unwrapped_model = accelerator.unwrap_model(model)
                    unwrapped_model.save_pretrained(
                        output_dir,
                        is_main_process=accelerator.is_main_process,
                        save_function=accelerator.save,
                    )
                    if accelerator.is_main_process:
                        tokenizer.save_pretrained(output_dir)
                model.train()

        if cfg.tracking.enabled is True:
            accelerator.log(
                {
                    "perplexity": perplexity,
                    "eval_loss": eval_loss,
                    "train_loss": total_loss.item() / len(train_dataloader),
                    "epoch": epoch,
                    "step": completed_steps,
                },
                step=completed_steps,
            )

        logger.info(f"done epoch {epoch}")

    if cfg.output_dir is not None:
        accelerator.wait_for_everyone()
        unwrapped_model = accelerator.unwrap_model(model)
        unwrapped_model.save_pretrained(
            cfg.output_dir,
            is_main_process=accelerator.is_main_process,
            save_function=accelerator.save,
        )
        if accelerator.is_main_process:
            tokenizer.save_pretrained(cfg.output_dir)
            print('Pushing Model weights and other related files to Hugging Face Hub')
            model.push_to_hub(cfg.output_dir)
            print('Pushing the Tokenizer and related files to Hugging Face Hub')
            tokenizer.push_to_hub(cfg.output_dir)


if __name__ == "__main__":
    main()
```
Command I ran on 2 A100 GPUs
accelerate launch --use_deepspeed --num_processes=2 tuned.py dataset.name=wikitext dataset.config_name=wikitext-2-raw-v1 training.num_epochs=3
Output logs:
[10:17:46] WARNING The following values were not passed to `accelerate launch` and had defaults used instead: launch.py:1056
`--num_machines` was set to a value of `1`
`--mixed_precision` was set to a value of `'no'`
`--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run
`accelerate config`.
[10:17:47] WARNING run.py:663
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your
system being overloaded, please further tune the variable for optimal performance in your
application as needed.
*****************************************
[2022-12-20 10:17:53,879] [INFO] [comm.py:654:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2022-12-20 10:17:54,070][torch.distributed.distributed_c10d][INFO] - Added key: store_based_barrier_key:1 to store for rank: 0
[2022-12-20 10:17:54,070][torch.distributed.distributed_c10d][INFO] - Added key: store_based_barrier_key:1 to store for rank: 1
[2022-12-20 10:17:54,070][torch.distributed.distributed_c10d][INFO] - Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
[2022-12-20 10:17:54,070][torch.distributed.distributed_c10d][INFO] - Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
[2022-12-20 10:17:56,229][__main__][INFO] - Distributed environment: DEEPSPEED Backend: nccl
Num processes: 2
Process index: 1
Local process index: 1
Device: cuda:1
ds_config: {'train_batch_size': 'auto', 'train_micro_batch_size_per_gpu': 'auto', 'gradient_accumulation_steps': 4, 'zero_optimization': {'stage': 3, 'offload_optimizer': {'device': 'none'}, 'offload_param': {'device': 'none'}, 'stage3_gather_16bit_weights_on_model_save': False}, 'steps_per_print': inf, 'fp16': {'enabled': True, 'auto_cast': True}}
[2022-12-20 10:17:56,229][__main__][INFO] - Distributed environment: DEEPSPEED Backend: nccl
Num processes: 2
Process index: 0
Local process index: 0
Device: cuda:0
ds_config: {'train_batch_size': 'auto', 'train_micro_batch_size_per_gpu': 'auto', 'gradient_accumulation_steps': 4, 'zero_optimization': {'stage': 3, 'offload_optimizer': {'device': 'none'}, 'offload_param': {'device': 'none'}, 'stage3_gather_16bit_weights_on_model_save': False}, 'steps_per_print': inf, 'fp16': {'enabled': True, 'auto_cast': True}}
[2022-12-20 10:17:56,232][__main__][INFO] - output_dir: tuned-model
bittensor:
network: nobunaga
dataset:
name: wikitext
config_name: wikitext-2-raw-v1
num_batches: 10
block_size: 256
overwrite_cache: false
keep_linebreaks: true
concatenate_raw: true
model:
name: gpt2
config_name: null
tokenizer:
name: null
use_fast: true
preprocessing_num_workers: null
pad_token: '[PAD]'
training:
seed: null
val_split_percent: 5
train_batch_size: 32
eval_batch_size: 32
learning_rate: 1.0e-05
weight_decay: 0.0
num_epochs: 3
max_train_steps: null
gradient_accumulation_steps: 1
lr_scheduler: constant
lr_warmup_steps: 0
eval_every: 50
checkpoint:
resume_from_checkpoint: 0
every_n_steps: null
hub:
push_to_hub: false
model_id: null
token: null
tracking:
enabled: false
report_to: all
loading configuration file config.json from cache at /home/sourab/.cache/huggingface/hub/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/config.json
Model config GPT2Config {
"_name_or_path": "gpt2",
"activation_function": "gelu_new",
"architectures": [
"GPT2LMHeadModel"
],
"attn_pdrop": 0.1,
"bos_token_id": 50256,
"embd_pdrop": 0.1,
"eos_token_id": 50256,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"model_type": "gpt2",
"n_ctx": 1024,
"n_embd": 768,
"n_head": 12,
"n_inner": null,
"n_layer": 12,
"n_positions": 1024,
"reorder_and_upcast_attn": false,
"resid_pdrop": 0.1,
"scale_attn_by_inverse_layer_idx": false,
"scale_attn_weights": true,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"task_specific_params": {
"text-generation": {
"do_sample": true,
"max_length": 50
}
},
"transformers_version": "4.25.0.dev0",
"use_cache": true,
"vocab_size": 50257
}
Could not locate the tokenizer configuration file, will try to use the model config instead.
loading configuration file config.json from cache at /home/sourab/.cache/huggingface/hub/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/config.json
Model config GPT2Config {
"_name_or_path": "gpt2",
"activation_function": "gelu_new",
"architectures": [
"GPT2LMHeadModel"
],
"attn_pdrop": 0.1,
"bos_token_id": 50256,
"embd_pdrop": 0.1,
"eos_token_id": 50256,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"model_type": "gpt2",
"n_ctx": 1024,
"n_embd": 768,
"n_head": 12,
"n_inner": null,
"n_layer": 12,
"n_positions": 1024,
"reorder_and_upcast_attn": false,
"resid_pdrop": 0.1,
"scale_attn_by_inverse_layer_idx": false,
"scale_attn_weights": true,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"task_specific_params": {
"text-generation": {
"do_sample": true,
"max_length": 50
}
},
"transformers_version": "4.25.0.dev0",
"use_cache": true,
"vocab_size": 50257
}
loading file vocab.json from cache at /home/sourab/.cache/huggingface/hub/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/vocab.json
loading file merges.txt from cache at /home/sourab/.cache/huggingface/hub/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/merges.txt
loading file tokenizer.json from cache at /home/sourab/.cache/huggingface/hub/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/tokenizer.json
loading file added_tokens.json from cache at None
loading file special_tokens_map.json from cache at None
loading file tokenizer_config.json from cache at None
loading configuration file config.json from cache at /home/sourab/.cache/huggingface/hub/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/config.json
Model config GPT2Config {
"_name_or_path": "gpt2",
"activation_function": "gelu_new",
"architectures": [
"GPT2LMHeadModel"
],
"attn_pdrop": 0.1,
"bos_token_id": 50256,
"embd_pdrop": 0.1,
"eos_token_id": 50256,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"model_type": "gpt2",
"n_ctx": 1024,
"n_embd": 768,
"n_head": 12,
"n_inner": null,
"n_layer": 12,
"n_positions": 1024,
"reorder_and_upcast_attn": false,
"resid_pdrop": 0.1,
"scale_attn_by_inverse_layer_idx": false,
"scale_attn_weights": true,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"task_specific_params": {
"text-generation": {
"do_sample": true,
"max_length": 50
}
},
"transformers_version": "4.25.0.dev0",
"use_cache": true,
"vocab_size": 50257
}
Using pad_token, but it is not set yet.
Using pad_token, but it is not set yet.
loading weights file pytorch_model.bin from cache at /home/sourab/.cache/huggingface/hub/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/pytorch_model.bin
All model checkpoint weights were used when initializing GPT2LMHeadModel.
All the weights of GPT2LMHeadModel were initialized from the model checkpoint at gpt2.
If your task is similar to the task the model of the checkpoint was trained on, you can already use GPT2LMHeadModel for predictions without further training.
[2022-12-20 10:18:01,700][datasets.builder][WARNING] - Found cached dataset wikitext (/home/sourab/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126)
100%|██████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 1520.59it/s]
[2022-12-20 10:18:01,726][datasets.arrow_dataset][WARNING] - Loading cached processed dataset at /home/sourab/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126/cache-aca899bee9cd44e6.arrow
[2022-12-20 10:18:01,750][datasets.arrow_dataset][WARNING] - Loading cached processed dataset at /home/sourab/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126/cache-a43844e5a0404806.arrow
[2022-12-20 10:18:01,774][datasets.arrow_dataset][WARNING] - Loading cached processed dataset at /home/sourab/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126/cache-f2a151a905b1640d.arrow
100%|██████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 1511.82it/s]
Grouping texts in chunks of 256: 100%|███████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 14.74ba/s]
Grouping texts in chunks of 256: 100%|█████████████████████████████████████████████████████████| 37/37 [00:02<00:00, 13.26ba/s]
Grouping texts in chunks of 256: 100%|███████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 13.92ba/s]
[2022-12-20 10:18:05,232][__main__][INFO] - Sample 2764 of the training set: {'input_ids': [32956, 286, 262, 4257, 764, 220, 198, 796, 796, 7443, 290, 1687, 30565, 796, 796, 220, 198, 383, 4693, 373, 717, 3417, 416, 920, 296, 7451, 20320, 1526, 3846, 385, 367, 298, 89, 287, 1248, 2231, 287, 262, 6182, 4913, 286, 12068, 7443, 764, 367, 298, 89, 3706, 262, 4693, 3460, 385, 1714, 79, 16260, 7240, 290, 3417, 340, 355, 5679, 1058, 220, 198, 366, 2619, 2162, 269, 538, 14201, 849, 273, 897, 351, 262, 734, 34319, 2951, 1474, 262, 2779, 837, 543, 318, 3094, 290, 6451, 19514, 379, 3016, 257, 826, 9848, 351, 262, 6727, 4417, 837, 1125, 75, 501, 411, 351, 257, 1913, 8434, 16162, 837, 290, 257, 890, 837, 26929, 277, 648, 2162, 32956, 351, 2237, 22969, 837, 290, 257, 1627, 287, 2166, 837, 2330, 2162, 3625, 837, 352, 764, 604, 764, 362, 764, 513, 764, 837, 717, 5166, 351, 37287, 30389, 290, 2407, 890, 764, 366, 220, 198, 367, 298, 89, 10090, 317, 13, 1714, 79, 16260, 7240, 287, 262, 850, 41357, 1448, 33260, 77, 1352, 33100, 837, 543, 19954, 286, 14284, 26120, 3025, 717, 5166, 286, 7405, 547, 262, 14069, 837, 3940, 416, 262, 5544, 5166, 764, 11450, 920, 296, 9251, 9958, 428, 17923, 837, 543, 367, 298, 89, 2241, 6848, 373, 366, 6454, 11666, 366, 764, 554, 49584, 837, 351, 262, 9465, 286, 1168, 35641, 672, 439, 385, 355, 281, 4795, 34306, 837, 1605, 610, 620, 77, 9251, 4502, 290, 10674, 48434, 2763, 25121, 262, 19230, 1168, 35641, 672, 439, 385, 1714, 79, 16260, 7240, 764, 18291, 12117, 286, 1168], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'labels': [32956, 286, 262, 4257, 764, 220, 198, 796, 796, 7443, 290, 1687, 30565, 796, 796, 220, 198, 383, 4693, 373, 717, 3417, 416, 920, 296, 7451, 20320, 1526, 3846, 385, 367, 298, 89, 287, 1248, 2231, 287, 262, 6182, 4913, 286, 12068, 7443, 764, 367, 298, 89, 3706, 262, 4693, 3460, 385, 1714, 79, 16260, 7240, 290, 3417, 340, 355, 5679, 1058, 220, 198, 366, 2619, 2162, 269, 538, 14201, 849, 273, 897, 351, 262, 734, 34319, 2951, 1474, 262, 2779, 837, 543, 318, 3094, 290, 6451, 19514, 379, 3016, 257, 826, 9848, 351, 262, 6727, 4417, 837, 1125, 75, 501, 411, 351, 257, 1913, 8434, 16162, 837, 290, 257, 890, 837, 26929, 277, 648, 2162, 32956, 351, 2237, 22969, 837, 290, 257, 1627, 287, 2166, 837, 2330, 2162, 3625, 837, 352, 764, 604, 764, 362, 764, 513, 764, 837, 717, 5166, 351, 37287, 30389, 290, 2407, 890, 764, 366, 220, 198, 367, 298, 89, 10090, 317, 13, 1714, 79, 16260, 7240, 287, 262, 850, 41357, 1448, 33260, 77, 1352, 33100, 837, 543, 19954, 286, 14284, 26120, 3025, 717, 5166, 286, 7405, 547, 262, 14069, 837, 3940, 416, 262, 5544, 5166, 764, 11450, 920, 296, 9251, 9958, 428, 17923, 837, 543, 367, 298, 89, 2241, 6848, 373, 366, 6454, 11666, 366, 764, 554, 49584, 837, 351, 262, 9465, 286, 1168, 35641, 672, 439, 385, 355, 281, 4795, 
34306, 837, 1605, 610, 620, 77, 9251, 4502, 290, 10674, 48434, 2763, 25121, 262, 19230, 1168, 35641, 672, 439, 385, 1714, 79, 16260, 7240, 764, 18291, 12117, 286, 1168]}:
[2022-12-20 10:18:05,233][__main__][INFO] - abdomen of the male.
= = History and taxonomy = =
The species was first described by entomologist Nicholas Marcellus Hentz in 1845 in the Boston Journal of Natural History. Hentz named the species Attus sexpunctatus and described it as follows :
" Black ; cephalothorax with the two posterior eyes near the base, which is wide and suddenly inclined at nearly a right angle with the upper surface, cheliceres with a strong inner tooth, and a long, curved fang ; abdomen with six dots, and a line in front, white ; feet, 1. 4. 2. 3., first pair with enlarged thighs and quite long. "
Hentz classified A. sexpunctatus in the subgeneric group Pugnatoriae, which consisted of jumping spiders whose first pair of legs were the longest, followed by the fourth pair. Later entomologists abandoned this classification, which Hentz himself admitted was " somewhat artificial ". In 1888, with the recognition of Zygoballus as an independent genus, American arachnologists George and Elizabeth Peckham renamed the spider Zygoballus sexpunctatus. Specimens of Z
[2022-12-20 10:18:05,233][__main__][INFO] - Sample 30 of the training set: {'input_ids': [326, 262, 3210, 14271, 925, 373, 764, 366, 10230, 1222, 2613, 366, 837, 12739, 326, 262, 764, 3388, 28139, 7209, 65, 2850, 290, 47392, 6150, 262, 45718, 28139, 4282, 287, 779, 837, 290, 286, 428, 837, 3016, 530, 11695, 393, 517, 286, 477, 1402, 5101, 14271, 373, 991, 329, 781, 600, 5354, 3777, 837, 12739, 326, 645, 1342, 621, 257, 11695, 286, 262, 21900, 6553, 287, 428, 25980, 547, 991, 6936, 351, 26533, 781, 600, 5354, 3777, 764, 220, 198, 383, 366, 5060, 76, 3166, 286, 5521, 1760, 379, 7703, 4631, 13837, 837, 327, 13, 50, 13, 32, 13, 366, 2555, 379, 546, 262, 976, 8761, 290, 5046, 422, 2932, 49658, 1566, 2932, 47072, 764, 2034, 1631, 284, 262, 366, 21293, 366, 329, 2932, 837, 47072, 318, 262, 34837, 33274, 837, 366, 5856, 262, 938, 1285, 287, 262, 1227, 837, 3016, 477, 7000, 379, 262, 13837, 423, 587, 11856, 290, 1908, 284, 9128, 8273, 837, 287, 28777, 284, 6266, 422, 5953, 286, 14230, 41601, 837, 5665, 286, 14538, 764, 366, 770, 788, 8849, 262, 3726, 286, 262, 24663, 286, 2760, 41601, 4568, 422, 7703, 4631, 837, 351, 262, 1748, 852, 29209, 284, 262, 19988, 5618, 6553, 286, 26113, 28549, 705, 82, 14538, 38076, 319, 2693, 1367, 837, 47072, 764, 220, 198, 554, 1248, 2414, 837, 706, 7703, 4631, 3214, 284, 262, 4479, 5407, 290, 262, 24375, 550, 587, 36791, 1522, 837, 3611, 8559, 5557, 28549, 23558, 807, 2488, 11, 31, 5323, 6553, 422, 262, 24375, 3726, 262, 43084, 38076, 764, 220, 198, 383, 24375, 373, 11589], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'labels': [326, 262, 3210, 14271, 925, 373, 764, 366, 10230, 1222, 2613, 366, 837, 12739, 326, 262, 764, 3388, 28139, 7209, 65, 2850, 290, 47392, 6150, 262, 45718, 28139, 4282, 287, 779, 837, 290, 286, 428, 837, 3016, 530, 11695, 393, 517, 286, 477, 1402, 5101, 14271, 373, 991, 329, 781, 600, 5354, 3777, 837, 12739, 326, 645, 1342, 621, 257, 11695, 286, 262, 21900, 6553, 287, 428, 25980, 547, 991, 6936, 351, 26533, 781, 600, 5354, 3777, 764, 220, 198, 383, 366, 5060, 76, 3166, 286, 5521, 1760, 379, 7703, 4631, 13837, 837, 327, 13, 50, 13, 32, 13, 366, 2555, 379, 546, 262, 976, 8761, 290, 5046, 422, 2932, 49658, 1566, 2932, 47072, 764, 2034, 1631, 284, 262, 366, 21293, 366, 329, 2932, 837, 47072, 318, 262, 34837, 33274, 837, 366, 5856, 262, 938, 1285, 287, 262, 1227, 837, 3016, 477, 7000, 379, 262, 13837, 423, 587, 11856, 290, 1908, 284, 9128, 8273, 837, 287, 28777, 284, 6266, 422, 5953, 286, 14230, 41601, 837, 5665, 286, 14538, 764, 366, 770, 788, 8849, 262, 3726, 286, 262, 24663, 286, 2760, 41601, 4568, 422, 7703, 4631, 837, 351, 262, 1748, 852, 29209, 284, 262, 19988, 5618, 6553, 286, 26113, 28549, 705, 82, 14538, 38076, 319, 2693, 1367, 837, 47072, 764, 220, 198, 554, 1248, 2414, 837, 706, 7703, 4631, 3214, 284, 262, 4479, 
5407, 290, 262, 24375, 550, 587, 36791, 1522, 837, 3611, 8559, 5557, 28549, 23558, 807, 2488, 11, 31, 5323, 6553, 422, 262, 24375, 3726, 262, 43084, 38076, 764, 220, 198, 383, 24375, 373, 11589]}:
[2022-12-20 10:18:05,234][__main__][INFO] - that the standard ammunition made was. " buck & ball ", indicating that the.69 caliber smoothbores and shotguns remained the predominant caliber weapon in use, and of this, nearly one sixth or more of all small arms ammunition was still for flintlock weapons, indicating that no less than a sixth of the Confederate troops in this vicinity were still armed with obsolete flintlock weapons.
The " Summaries of Work done at Little Rock Arsenal, C.S.A. " continue at about the same pace and scale from August 1862 until August 1863. Appended to the " Summary " for August, 1863 is the ominous notation, " During the last week in the month, nearly all stores at the Arsenal have been packed and sent to Arkadelphia, in obedience to orders from Chief of Ordnance, District of Arkansas. " This then marks the beginning of the evacuation of ordnance activities from Little Rock, with the city being surrendered to the advancing Federal troops of Frederick Steele's Arkansas Expedition on September 11, 1863.
In 1864, after Little Rock fell to the Union Army and the arsenal had been recaptured, General Fredrick Steele marched 8 @,@ 500 troops from the arsenal beginning the Camden Expedition.
The arsenal was briefly
[2022-12-20 10:18:05,235][__main__][INFO] - Sample 4458 of the training set: {'input_ids': [262, 968, 8936, 364, 290, 262, 362, 358, 4401, 18455, 26012, 837, 475, 262, 642, 400, 4401, 18455, 35588, 5017, 262, 7625, 837, 290, 262, 2679, 290, 34158, 5963, 373, 27771, 764, 220, 198, 49628, 626, 6149, 262, 513, 4372, 4401, 18455, 26012, 837, 543, 550, 587, 5906, 284, 1210, 262, 2679, 290, 34158, 30172, 837, 284, 1445, 3371, 262, 968, 8936, 364, 508, 16434, 511, 4040, 837, 475, 484, 691, 14131, 287, 21294, 511, 781, 2283, 837, 355, 262, 24933, 547, 5906, 284, 17216, 284, 511, 2651, 3356, 764, 2750, 838, 1058, 1542, 837, 477, 4371, 550, 5025, 764, 383, 968, 8936, 5628, 276, 371, 16063, 26012, 3767, 284, 1745, 319, 287, 262, 7372, 837, 981, 1111, 781, 2283, 547, 17157, 736, 416, 3833, 422, 262, 1913, 2679, 290, 34158, 2700, 764, 383, 1255, 373, 326, 262, 968, 8936, 364, 4444, 510, 4769, 257, 845, 7362, 49156, 1627, 319, 262, 2651, 35082, 286, 262, 18639, 34603, 262, 22816, 764, 20138, 2679, 393, 34158, 38578, 422, 2574, 943, 680, 837, 788, 5611, 257, 14800, 3753, 20358, 319, 257, 2166, 286, 546, 362, 2488, 13, 31, 642, 4608, 357, 604, 2488, 13, 31, 657, 10571, 1267, 837, 319, 262, 7372, 764, 770, 3214, 319, 262, 43581, 290, 30422, 3310, 6800, 290, 257, 40733, 286, 1810, 86, 3378, 10695, 11609, 5185, 563, 286, 262, 642, 400, 5628, 276, 26012, 739, 609, 323, 13165, 705, 82, 3141, 764, 383, 968, 8936, 364, 547, 4855, 416, 4572, 6541, 2162, 530, 2665, 837, 7223, 284, 262, 43581, 5628, 276, 371, 16063], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'labels': [262, 968, 8936, 364, 290, 262, 362, 358, 4401, 18455, 26012, 837, 475, 262, 642, 400, 4401, 18455, 35588, 5017, 262, 7625, 837, 290, 262, 2679, 290, 34158, 5963, 373, 27771, 764, 220, 198, 49628, 626, 6149, 262, 513, 4372, 4401, 18455, 26012, 837, 543, 550, 587, 5906, 284, 1210, 262, 2679, 290, 34158, 30172, 837, 284, 1445, 3371, 262, 968, 8936, 364, 508, 16434, 511, 4040, 837, 475, 484, 691, 14131, 287, 21294, 511, 781, 2283, 837, 355, 262, 24933, 547, 5906, 284, 17216, 284, 511, 2651, 3356, 764, 2750, 838, 1058, 1542, 837, 477, 4371, 550, 5025, 764, 383, 968, 8936, 5628, 276, 371, 16063, 26012, 3767, 284, 1745, 319, 287, 262, 7372, 837, 981, 1111, 781, 2283, 547, 17157, 736, 416, 3833, 422, 262, 1913, 2679, 290, 34158, 2700, 764, 383, 1255, 373, 326, 262, 968, 8936, 364, 4444, 510, 4769, 257, 845, 7362, 49156, 1627, 319, 262, 2651, 35082, 286, 262, 18639, 34603, 262, 22816, 764, 20138, 2679, 393, 34158, 38578, 422, 2574, 943, 680, 837, 788, 5611, 257, 14800, 3753, 20358, 319, 257, 2166, 286, 546, 362, 2488, 13, 31, 642, 4608, 357, 604, 2488, 13, 31, 657, 10571, 1267, 837, 319, 262, 7372, 764, 770, 3214, 319, 262, 43581, 290, 30422, 3310, 6800, 290, 257, 40733, 286, 1810, 86, 3378, 10695, 11609, 5185, 563, 286, 262, 
642, 400, 5628, 276, 26012, 739, 609, 323, 13165, 705, 82, 3141, 764, 383, 968, 8936, 364, 547, 4855, 416, 4572, 6541, 2162, 530, 2665, 837, 7223, 284, 262, 43581, 5628, 276, 371, 16063]}:
[2022-12-20 10:18:05,235][__main__][INFO] - the New Zealanders and the 2nd Light Horse Brigade, but the 5th Light Horse Regiment covered the gap, and the German and Ottoman advance was halted.
Chauvel ordered the 3rd Light Horse Brigade, which had been unable to turn the German and Ottoman flank, to move towards the New Zealanders who renewed their efforts, but they only succeeded in exposing their flanks, as the Australians were unable to conform to their forward movement. By 10 : 30, all progress had stopped. The New Zealand Mounted Rifles Brigade continued to hold on in the centre, while both flanks were bent back by pressure from the strong German and Ottoman force. The result was that the New Zealanders ended up holding a very exposed salient line on the forward slopes of the hills overlooking the Hod. Fresh German or Ottoman reinforcements from El Arish, then launched a fierce counterattack on a front of about 2 @.@ 5 miles ( 4 @.@ 0 km ), on the centre. This fell on the Canterbury and Auckland Regiments and a squadron of Warwickshire Yeomanry of the 5th Mounted Brigade under Chaytor's command. The New Zealanders were supported by machine guns ; one section, attached to the Canterbury Mounted Rifles
[2022-12-20 10:18:05,236][accelerate.accelerator][INFO] - Since you passed both train and evaluation dataloader, `is_train_batch_min` (here True will decide the `train_batch_size` (32).
[2022-12-20 10:18:05,236][accelerate.accelerator][INFO] - Updating DeepSpeed's gradient accumulation steps to 1 from 4.
[2022-12-20 10:18:05,236] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.7, git-hash=unknown, git-branch=unknown
[2022-12-20 10:18:05,316][torch.distributed.distributed_c10d][INFO] - Added key: store_based_barrier_key:2 to store for rank: 0
[2022-12-20 10:18:05,487][torch.distributed.distributed_c10d][INFO] - Added key: store_based_barrier_key:2 to store for rank: 1
[2022-12-20 10:18:05,488][torch.distributed.distributed_c10d][INFO] - Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes.
[2022-12-20 10:18:05,489][torch.distributed.distributed_c10d][INFO] - Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes.
[2022-12-20 10:18:05,714] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2022-12-20 10:18:05,714] [INFO] [logging.py:68:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer
[2022-12-20 10:18:05,714] [INFO] [logging.py:68:log_dist] [Rank 0] Using client Optimizer as basic optimizer
[2022-12-20 10:18:05,718] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = AdamW
[2022-12-20 10:18:05,718] [INFO] [utils.py:52:is_zero_supported_optimizer] Checking ZeRO support for optimizer=AdamW type=<class 'torch.optim.adamw.AdamW'>
[2022-12-20 10:18:05,718] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 3 optimizer
[2022-12-20 10:18:05,912] [INFO] [utils.py:827:see_memory_usage] Stage 3 initialize beginning
[2022-12-20 10:18:05,912] [INFO] [utils.py:828:see_memory_usage] MA 0.25 GB Max_MA 0.25 GB CA 0.26 GB Max_CA 0 GB
[2022-12-20 10:18:05,912] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 27.02 GB, percent = 5.4%
[2022-12-20 10:18:05,913] [INFO] [stage3.py:114:__init__] Reduce bucket size 500,000,000
[2022-12-20 10:18:05,913] [INFO] [stage3.py:115:__init__] Prefetch bucket size 50,000,000
Using /home/sourab/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Using /home/sourab/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Emitting ninja build file /home/sourab/.cache/torch_extensions/py310_cu117/utils/build.ninja...
Building extension module utils...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module utils...
Time to load utils op: 0.06488680839538574 seconds
Loading extension module utils...
Time to load utils op: 0.10178136825561523 seconds
[2022-12-20 10:18:06,356] [INFO] [utils.py:827:see_memory_usage] DeepSpeedZeRoOffload initialize [begin]
[2022-12-20 10:18:06,357] [INFO] [utils.py:828:see_memory_usage] MA 0.25 GB Max_MA 0.25 GB CA 0.26 GB Max_CA 0 GB
[2022-12-20 10:18:06,357] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 27.03 GB, percent = 5.4%
Parameter Offload: Total persistent parameters: 121344 in 98 params
[2022-12-20 10:18:06,541] [INFO] [utils.py:827:see_memory_usage] DeepSpeedZeRoOffload initialize [end]
[2022-12-20 10:18:06,542] [INFO] [utils.py:828:see_memory_usage] MA 0.13 GB Max_MA 0.29 GB CA 0.31 GB Max_CA 0 GB
[2022-12-20 10:18:06,542] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 27.03 GB, percent = 5.4%
[2022-12-20 10:18:06,778] [INFO] [stage3.py:369:_setup_for_real_optimizer] optimizer state initialized
Using /home/sourab/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0003514289855957031 seconds
[2022-12-20 10:18:06,979] [INFO] [utils.py:827:see_memory_usage] After initializing ZeRO optimizer
[2022-12-20 10:18:06,980] [INFO] [utils.py:828:see_memory_usage] MA 1.87 GB Max_MA 2.02 GB CA 2.57 GB Max_CA 3 GB
[2022-12-20 10:18:06,980] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 27.04 GB, percent = 5.4%
[2022-12-20 10:18:06,980] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = AdamW
[2022-12-20 10:18:06,980] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using client LR scheduler
[2022-12-20 10:18:06,980] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = None
[2022-12-20 10:18:06,980] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[1e-05, 1e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
[2022-12-20 10:18:06,981] [INFO] [config.py:1020:print] DeepSpeedEngine configuration:
[2022-12-20 10:18:06,981] [INFO] [config.py:1024:print] activation_checkpointing_config {
"partition_activations": false,
"contiguous_memory_optimization": false,
"cpu_checkpointing": false,
"number_checkpoints": null,
"synchronize_checkpoint_boundary": false,
"profile": false
}
[2022-12-20 10:18:06,981] [INFO] [config.py:1024:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2022-12-20 10:18:06,981] [INFO] [config.py:1024:print] amp_enabled .................. False
[2022-12-20 10:18:06,981] [INFO] [config.py:1024:print] amp_params ................... False
[2022-12-20 10:18:06,981] [INFO] [config.py:1024:print] autotuning_config ............ {
"enabled": false,
"start_step": null,
"end_step": null,
"metric_path": null,
"arg_mappings": null,
"metric": "throughput",
"model_info": null,
"results_dir": "autotuning_results",
"exps_dir": "autotuning_exps",
"overwrite": true,
"fast": true,
"start_profile_step": 3,
"end_profile_step": 5,
"tuner_type": "gridsearch",
"tuner_early_stopping": 5,
"tuner_num_trials": 50,
"model_info_path": null,
"mp_size": 1,
"max_train_batch_size": null,
"min_train_batch_size": 1,
"max_train_micro_batch_size_per_gpu": 1.024000e+03,
"min_train_micro_batch_size_per_gpu": 1,
"num_tuning_micro_batch_sizes": 3
}
[2022-12-20 10:18:06,981] [INFO] [config.py:1024:print] bfloat16_enabled ............. False
[2022-12-20 10:18:06,981] [INFO] [config.py:1024:print] checkpoint_parallel_write_pipeline False
[2022-12-20 10:18:06,981] [INFO] [config.py:1024:print] checkpoint_tag_validation_enabled True
[2022-12-20 10:18:06,981] [INFO] [config.py:1024:print] checkpoint_tag_validation_fail False
[2022-12-20 10:18:06,981] [INFO] [config.py:1024:print] comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7f73f808fdc0>
[2022-12-20 10:18:06,981] [INFO] [config.py:1024:print] communication_data_type ...... None
[2022-12-20 10:18:06,981] [INFO] [config.py:1024:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2022-12-20 10:18:06,981] [INFO] [config.py:1024:print] curriculum_enabled ........... False
[2022-12-20 10:18:06,981] [INFO] [config.py:1024:print] curriculum_params ............ False
[2022-12-20 10:18:06,981] [INFO] [config.py:1024:print] dataloader_drop_last ......... False
[2022-12-20 10:18:06,981] [INFO] [config.py:1024:print] disable_allgather ............ False
[2022-12-20 10:18:06,981] [INFO] [config.py:1024:print] dump_state ................... False
[2022-12-20 10:18:06,982] [INFO] [config.py:1024:print] dynamic_loss_scale_args ...... None
[2022-12-20 10:18:06,982] [INFO] [config.py:1024:print] eigenvalue_enabled ........... False
[2022-12-20 10:18:06,982] [INFO] [config.py:1024:print] eigenvalue_gas_boundary_resolution 1
[2022-12-20 10:18:06,982] [INFO] [config.py:1024:print] eigenvalue_layer_name ........ bert.encoder.layer
[2022-12-20 10:18:06,982] [INFO] [config.py:1024:print] eigenvalue_layer_num ......... 0
[2022-12-20 10:18:06,982] [INFO] [config.py:1024:print] eigenvalue_max_iter .......... 100
[2022-12-20 10:18:06,982] [INFO] [config.py:1024:print] eigenvalue_stability ......... 1e-06
[2022-12-20 10:18:06,982] [INFO] [config.py:1024:print] eigenvalue_tol ............... 0.01
[2022-12-20 10:18:06,982] [INFO] [config.py:1024:print] eigenvalue_verbose ........... False
[2022-12-20 10:18:06,982] [INFO] [config.py:1024:print] elasticity_enabled ........... False
[2022-12-20 10:18:06,982] [INFO] [config.py:1024:print] flops_profiler_config ........ {
"enabled": false,
"profile_step": 1,
"module_depth": -1,
"top_modules": 1,
"detailed": true,
"output_file": null
}
[2022-12-20 10:18:06,982] [INFO] [config.py:1024:print] fp16_auto_cast ............... True
[2022-12-20 10:18:06,982] [INFO] [config.py:1024:print] fp16_enabled ................. True
[2022-12-20 10:18:06,982] [INFO] [config.py:1024:print] fp16_master_weights_and_gradients False
[2022-12-20 10:18:06,982] [INFO] [config.py:1024:print] global_rank .................. 0
[2022-12-20 10:18:06,982] [INFO] [config.py:1024:print] grad_accum_dtype ............. None
[2022-12-20 10:18:06,982] [INFO] [config.py:1024:print] gradient_accumulation_steps .. 1
[2022-12-20 10:18:06,982] [INFO] [config.py:1024:print] gradient_clipping ............ 0.0
[2022-12-20 10:18:06,982] [INFO] [config.py:1024:print] gradient_predivide_factor .... 1.0
[2022-12-20 10:18:06,982] [INFO] [config.py:1024:print] initial_dynamic_scale ........ 4294967296
[2022-12-20 10:18:06,982] [INFO] [config.py:1024:print] load_universal_checkpoint .... False
[2022-12-20 10:18:06,982] [INFO] [config.py:1024:print] loss_scale ................... 0
[2022-12-20 10:18:06,982] [INFO] [config.py:1024:print] memory_breakdown ............. False
[2022-12-20 10:18:06,982] [INFO] [config.py:1024:print] monitor_config ............... <deepspeed.monitor.config.DeepSpeedMonitorConfig object at 0x7f73f808fd90>
[2022-12-20 10:18:06,982] [INFO] [config.py:1024:print] nebula_config ................ {
"enabled": false,
"persistent_storage_path": null,
"persistent_time_interval": 100,
"num_of_version_in_retention": 2,
"enable_nebula_load": true,
"load_path": null
}
[2022-12-20 10:18:06,982] [INFO] [config.py:1024:print] optimizer_legacy_fusion ...... False
[2022-12-20 10:18:06,982] [INFO] [config.py:1024:print] optimizer_name ............... None
[2022-12-20 10:18:06,982] [INFO] [config.py:1024:print] optimizer_params ............. None
[2022-12-20 10:18:06,982] [INFO] [config.py:1024:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2022-12-20 10:18:06,982] [INFO] [config.py:1024:print] pld_enabled .................. False
[2022-12-20 10:18:06,982] [INFO] [config.py:1024:print] pld_params ................... False
[2022-12-20 10:18:06,982] [INFO] [config.py:1024:print] prescale_gradients ........... False
[2022-12-20 10:18:06,982] [INFO] [config.py:1024:print] scheduler_name ............... None
[2022-12-20 10:18:06,982] [INFO] [config.py:1024:print] scheduler_params ............. None
[2022-12-20 10:18:06,982] [INFO] [config.py:1024:print] sparse_attention ............. None
[2022-12-20 10:18:06,982] [INFO] [config.py:1024:print] sparse_gradients_enabled ..... False
[2022-12-20 10:18:06,982] [INFO] [config.py:1024:print] steps_per_print .............. inf
[2022-12-20 10:18:06,982] [INFO] [config.py:1024:print] train_batch_size ............. 64
[2022-12-20 10:18:06,982] [INFO] [config.py:1024:print] train_micro_batch_size_per_gpu 32
[2022-12-20 10:18:06,982] [INFO] [config.py:1024:print] use_node_local_storage ....... False
[2022-12-20 10:18:06,982] [INFO] [config.py:1024:print] wall_clock_breakdown ......... False
[2022-12-20 10:18:06,982] [INFO] [config.py:1024:print] world_size ................... 2
[2022-12-20 10:18:06,982] [INFO] [config.py:1024:print] zero_allow_untested_optimizer True
[2022-12-20 10:18:06,983] [INFO] [config.py:1024:print] zero_config .................. stage=3 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='none', nvme_path=None, buffer_count=4, pin_memory=False, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False) sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False
[2022-12-20 10:18:06,983] [INFO] [config.py:1024:print] zero_enabled ................. True
[2022-12-20 10:18:06,983] [INFO] [config.py:1024:print] zero_optimization_stage ...... 3
[2022-12-20 10:18:06,983] [INFO] [config.py:1009:print_user_config] json = {
"train_batch_size": 64,
"train_micro_batch_size_per_gpu": 32,
"gradient_accumulation_steps": 1,
"zero_optimization": {
"stage": 3,
"offload_optimizer": {
"device": "none"
},
"offload_param": {
"device": "none"
},
"stage3_gather_16bit_weights_on_model_save": false
},
"steps_per_print": inf,
"fp16": {
"enabled": true,
"auto_cast": true
},
"zero_allow_untested_optimizer": true
}
Using /home/sourab/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0002932548522949219 seconds
[2022-12-20 10:18:06,984][__main__][INFO] - ***** Running training *****
[2022-12-20 10:18:06,984][__main__][INFO] - Num examples = 9327
[2022-12-20 10:18:06,984][__main__][INFO] - Num Epochs = 3
[2022-12-20 10:18:06,984][__main__][INFO] - Gradient Accumulation steps = 1
[2022-12-20 10:18:06,984][__main__][INFO] - Total optimization steps = 438
0%| | 0/438 [00:00<?, ?it/s]/home/sourab/miniconda3/envs/ml/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2455: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead.
warnings.warn(
/home/sourab/miniconda3/envs/ml/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2455: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead.
warnings.warn(
/home/sourab/miniconda3/envs/ml/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2923: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead.
warnings.warn(
/home/sourab/miniconda3/envs/ml/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2923: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead.
warnings.warn(
[2022-12-20 10:18:08,389] [INFO] [stage3.py:1816:_overflow_clean_up] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4294967296, reducing to 2147483648.0
0%|▏ | 1/438 [00:01<10:14, 1.41s/it][2022-12-20 10:18:09,957][__main__][INFO] - epoch 0: perplexity: 44.73612093895834 train_loss: 4.16015625 eval_loss: 3.80078125
Configuration saved in tuned-model/epoch_0_most_recent/config.json
Model weights saved in tuned-model/epoch_0_most_recent/pytorch_model.bin
tokenizer config file saved in tuned-model/epoch_0_most_recent/tokenizer_config.json
Special tokens file saved in tuned-model/epoch_0_most_recent/special_tokens_map.json
[2022-12-20 10:18:10,486] [INFO] [stage3.py:1816:_overflow_clean_up] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 2147483648.0, reducing to 1073741824.0
0%|▍ | 2/438 [00:03<13:10, 1.81s/it][2022-12-20 10:18:10,708] [INFO] [stage3.py:1816:_overflow_clean_up] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 1073741824.0, reducing to 536870912.0
[2022-12-20 10:18:10,709] [INFO] [timer.py:197:stop] 0/3, RunningAvgSamplesPerSec=306.49404790449245, CurrSamplesPerSec=306.49404790449245, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
1%|▌ | 3/438 [00:03<07:52, 1.09s/it][2022-12-20 10:18:10,932] [INFO] [stage3.py:1816:_overflow_clean_up] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 536870912.0, reducing to 268435456.0
[2022-12-20 10:18:10,932] [INFO] [timer.py:197:stop] 0/4, RunningAvgSamplesPerSec=306.14694252747717, CurrSamplesPerSec=305.80062245674475, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
1%|▊ | 4/438 [00:03<05:23, 1.34it/s][2022-12-20 10:18:11,155] [INFO] [stage3.py:1816:_overflow_clean_up] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 268435456.0, reducing to 134217728.0
[2022-12-20 10:18:11,156] [INFO] [timer.py:197:stop] 0/5, RunningAvgSamplesPerSec=305.4479018781503, CurrSamplesPerSec=304.05935397054276, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
1%|█ | 5/438 [00:04<04:01, 1.79it/s][2022-12-20 10:18:11,377] [INFO] [stage3.py:1816:_overflow_clean_up] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 134217728.0, reducing to 67108864.0
[2022-12-20 10:18:11,378] [INFO] [timer.py:197:stop] 0/6, RunningAvgSamplesPerSec=305.7425433132636, CurrSamplesPerSec=306.6298881245731, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
1%|█▏ | 6/438 [00:04<03:11, 2.26it/s][2022-12-20 10:18:11,603] [INFO] [stage3.py:1816:_overflow_clean_up] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 67108864.0, reducing to 33554432.0
[2022-12-20 10:18:11,604] [INFO] [timer.py:197:stop] 0/7, RunningAvgSamplesPerSec=304.7914702361163, CurrSamplesPerSec=301.0456207797218, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
2%|█▍ | 7/438 [00:04<02:40, 2.69it/s][2022-12-20 10:18:11,829] [INFO] [stage3.py:1816:_overflow_clean_up] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 33554432.0, reducing to 16777216.0
[2022-12-20 10:18:11,829] [INFO] [timer.py:197:stop] 0/8, RunningAvgSamplesPerSec=304.24050715165436, CurrSamplesPerSec=301.5153029132146, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
2%|█▋ | 8/438 [00:04<02:19, 3.07it/s][2022-12-20 10:18:12,054] [INFO] [stage3.py:1816:_overflow_clean_up] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 16777216.0, reducing to 8388608.0
[2022-12-20 10:18:12,055] [INFO] [timer.py:197:stop] 0/9, RunningAvgSamplesPerSec=303.8985991264635, CurrSamplesPerSec=301.86318092980474, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
2%|█▊ | 9/438 [00:05<02:06, 3.40it/s][2022-12-20 10:18:12,280] [INFO] [stage3.py:1816:_overflow_clean_up] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 8388608.0, reducing to 4194304.0
[2022-12-20 10:18:12,281] [INFO] [timer.py:197:stop] 0/10, RunningAvgSamplesPerSec=303.61687118841286, CurrSamplesPerSec=301.6593071068243, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
2%|██ | 10/438 [00:05<01:56, 3.66it/s][2022-12-20 10:18:12,507] [INFO] [stage3.py:1816:_overflow_clean_up] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4194304.0, reducing to 2097152.0
[2022-12-20 10:18:12,508] [INFO] [timer.py:197:stop] 0/11, RunningAvgSamplesPerSec=303.1608087286132, CurrSamplesPerSec=299.5610470306753, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
3%|██▏ | 11/438 [00:05<01:50, 3.86it/s][2022-12-20 10:18:12,734] [INFO] [stage3.py:1816:_overflow_clean_up] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 2097152.0, reducing to 1048576.0
[2022-12-20 10:18:12,735] [INFO] [timer.py:197:stop] 0/12, RunningAvgSamplesPerSec=302.81972761105214, CurrSamplesPerSec=299.7841883611096, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
3%|██▍ | 12/438 [00:05<01:46, 4.01it/s][2022-12-20 10:18:12,959] [INFO] [stage3.py:1816:_overflow_clean_up] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 1048576.0, reducing to 524288.0
[2022-12-20 10:18:12,960] [INFO] [timer.py:197:stop] 0/13, RunningAvgSamplesPerSec=302.78240373823917, CurrSamplesPerSec=302.40967042375695, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
3%|██▋ | 13/438 [00:05<01:42, 4.13it/s][2022-12-20 10:18:13,186] [INFO] [stage3.py:1816:_overflow_clean_up] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 524288.0, reducing to 262144.0
[2022-12-20 10:18:13,186] [INFO] [timer.py:197:stop] 0/14, RunningAvgSamplesPerSec=302.5651304992013, CurrSamplesPerSec=300.195544183529, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
3%|██▊ | 14/438 [00:06<01:40, 4.21it/s][2022-12-20 10:18:13,413] [INFO] [stage3.py:1816:_overflow_clean_up] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 262144.0, reducing to 131072.0
[2022-12-20 10:18:13,413] [INFO] [timer.py:197:stop] 0/15, RunningAvgSamplesPerSec=302.3655976065568, CurrSamplesPerSec=299.9915691599334, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
3%|███ | 15/438 [00:06<01:39, 4.27it/s][2022-12-20 10:18:13,640] [INFO] [stage3.py:1816:_overflow_clean_up] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072.0, reducing to 65536.0
[2022-12-20 10:18:13,640] [INFO] [timer.py:197:stop] 0/16, RunningAvgSamplesPerSec=302.1549563317689, CurrSamplesPerSec=299.443087113712, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
4%|███▎ | 16/438 [00:06<01:37, 4.31it/s][2022-12-20 10:18:13,864] [INFO] [stage3.py:1816:_overflow_clean_up] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536.0, reducing to 32768.0
[2022-12-20 10:18:13,865] [INFO] [timer.py:197:stop] 0/17, RunningAvgSamplesPerSec=302.2423510358202, CurrSamplesPerSec=303.47120682833076, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
4%|███▍ | 17/438 [00:06<01:36, 4.35it/s][2022-12-20 10:18:14,092] [INFO] [stage3.py:1816:_overflow_clean_up] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768.0, reducing to 16384.0
[2022-12-20 10:18:14,092] [INFO] [timer.py:197:stop] 0/18, RunningAvgSamplesPerSec=302.035509519852, CurrSamplesPerSec=298.9665143816866, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
4%|███▋ | 18/438 [00:07<01:36, 4.37it/s][2022-12-20 10:18:14,320] [INFO] [stage3.py:1816:_overflow_clean_up] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 16384.0, reducing to 8192.0
[2022-12-20 10:18:14,321] [INFO] [timer.py:197:stop] 0/19, RunningAvgSamplesPerSec=301.7809756816289, CurrSamplesPerSec=297.76600280865847, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
4%|███▊ | 19/438 [00:07<01:35, 4.37it/s][2022-12-20 10:18:14,564] [INFO] [timer.py:197:stop] 0/20, RunningAvgSamplesPerSec=300.33382849879337, CurrSamplesPerSec=277.69577707822765, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
5%|████ | 20/438 [00:07<01:37, 4.28it/s][2022-12-20 10:18:14,795] [INFO] [timer.py:197:stop] 0/21, RunningAvgSamplesPerSec=300.03713578034416, CurrSamplesPerSec=294.7951543132257, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
5%|████▎ | 21/438 [00:07<01:36, 4.30it/s][2022-12-20 10:18:15,031] [INFO] [timer.py:197:stop] 0/22, RunningAvgSamplesPerSec=299.38991858428244, CurrSamplesPerSec=287.6024325123533, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
5%|████▍ | 22/438 [00:08<01:37, 4.28it/s][2022-12-20 10:18:15,265] [INFO] [timer.py:197:stop] 0/23, RunningAvgSamplesPerSec=298.99301163528753, CurrSamplesPerSec=291.2701629660494, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
5%|████▋ | 23/438 [00:08<01:36, 4.28it/s][2022-12-20 10:18:15,495] [INFO] [timer.py:197:stop] 0/24, RunningAvgSamplesPerSec=298.76827142675234, CurrSamplesPerSec=294.1255588085763, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
5%|████▉ | 24/438 [00:08<01:36, 4.30it/s][2022-12-20 10:18:15,728] [INFO] [timer.py:197:stop] 0/25, RunningAvgSamplesPerSec=298.4932023589316, CurrSamplesPerSec=292.56728322200024, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
6%|█████ | 25/438 [00:08<01:36, 4.30it/s][2022-12-20 10:18:15,963] [INFO] [timer.py:197:stop] 0/26, RunningAvgSamplesPerSec=298.0579448487978, CurrSamplesPerSec=288.3859994413528, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
6%|█████▎ | 26/438 [00:08<01:36, 4.28it/s][2022-12-20 10:18:16,195] [INFO] [timer.py:197:stop] 0/27, RunningAvgSamplesPerSec=297.86552282475924, CurrSamplesPerSec=293.320791992657, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
6%|█████▍ | 27/438 [00:09<01:35, 4.29it/s][2022-12-20 10:18:16,427] [INFO] [timer.py:197:stop] 0/28, RunningAvgSamplesPerSec=297.67034778781294, CurrSamplesPerSec=292.8727590119578, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
6%|█████▋ | 28/438 [00:09<01:35, 4.30it/s][2022-12-20 10:18:16,658] [INFO] [timer.py:197:stop] 0/29, RunningAvgSamplesPerSec=297.5150142564051, CurrSamplesPerSec=293.5324833242209, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
7%|█████▉ | 29/438 [00:09<01:35, 4.30it/s][2022-12-20 10:18:16,887] [INFO] [timer.py:197:stop] 0/30, RunningAvgSamplesPerSec=297.4974153892701, CurrSamplesPerSec=297.023031735441, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
7%|██████ | 30/438 [00:09<01:34, 4.32it/s][2022-12-20 10:18:17,121] [INFO] [timer.py:197:stop] 0/31, RunningAvgSamplesPerSec=297.2236694501138, CurrSamplesPerSec=289.758181025289, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
7%|██████▎ | 31/438 [00:10<01:34, 4.31it/s][2022-12-20 10:18:17,354] [INFO] [timer.py:197:stop] 0/32, RunningAvgSamplesPerSec=297.02899146748274, CurrSamplesPerSec=291.4921973154552, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
7%|██████▌ | 32/438 [00:10<01:34, 4.30it/s][2022-12-20 10:18:17,583] [INFO] [timer.py:197:stop] 0/33, RunningAvgSamplesPerSec=297.0306970183062, CurrSamplesPerSec=297.0818726523782, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
8%|██████▋ | 33/438 [00:10<01:33, 4.32it/s][2022-12-20 10:18:17,821] [INFO] [timer.py:197:stop] 0/34, RunningAvgSamplesPerSec=296.64242394689927, CurrSamplesPerSec=285.08983391781067, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
8%|██████▉ | 34/438 [00:10<01:34, 4.29it/s][2022-12-20 10:18:18,072] [INFO] [timer.py:197:stop] 0/35, RunningAvgSamplesPerSec=295.7173631619058, CurrSamplesPerSec=268.88530110875496, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
8%|███████ | 35/438 [00:11<01:36, 4.19it/s][2022-12-20 10:18:18,302] [INFO] [timer.py:197:stop] 0/36, RunningAvgSamplesPerSec=295.7187170672995, CurrSamplesPerSec=295.7634029012717, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
8%|███████▎ | 36/438 [00:11<01:34, 4.24it/s][2022-12-20 10:18:18,534] [INFO] [timer.py:197:stop] 0/37, RunningAvgSamplesPerSec=295.63139929592757, CurrSamplesPerSec=292.692971389879, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
8%|███████▌ | 37/438 [00:11<01:34, 4.26it/s][2022-12-20 10:18:18,763] [INFO] [timer.py:197:stop] 0/38, RunningAvgSamplesPerSec=295.64357529610857, CurrSamplesPerSec=296.0703680868594, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
9%|███████▋ | 38/438 [00:11<01:33, 4.29it/s][2022-12-20 10:18:18,994] [INFO] [timer.py:197:stop] 0/39, RunningAvgSamplesPerSec=295.6241619847385, CurrSamplesPerSec=294.9269767605386, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
9%|███████▉ | 39/438 [00:12<01:32, 4.30it/s][2022-12-20 10:18:19,227] [INFO] [timer.py:197:stop] 0/40, RunningAvgSamplesPerSec=295.50794171192643, CurrSamplesPerSec=291.2711111111111, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
9%|████████▏ | 40/438 [00:12<01:32, 4.30it/s][2022-12-20 10:18:19,458] [INFO] [timer.py:197:stop] 0/41, RunningAvgSamplesPerSec=295.4916821170392, CurrSamplesPerSec=294.8751406074241, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
9%|████████▎ | 41/438 [00:12<01:32, 4.31it/s][2022-12-20 10:18:19,689] [INFO] [timer.py:197:stop] 0/42, RunningAvgSamplesPerSec=295.44123724660926, CurrSamplesPerSec=293.48723269566966, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
10%|████████▌ | 42/438 [00:12<01:31, 4.31it/s][2022-12-20 10:18:19,921] [INFO] [timer.py:197:stop] 0/43, RunningAvgSamplesPerSec=295.3542038229261, CurrSamplesPerSec=291.91442512742384, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
10%|████████▋ | 43/438 [00:12<01:31, 4.31it/s][2022-12-20 10:18:20,153] [INFO] [timer.py:197:stop] 0/44, RunningAvgSamplesPerSec=295.288051475711, CurrSamplesPerSec=292.60108718992905, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
10%|████████▉ | 44/438 [00:13<01:31, 4.31it/s][2022-12-20 10:18:20,385] [INFO] [timer.py:197:stop] 0/45, RunningAvgSamplesPerSec=295.2178983605719, CurrSamplesPerSec=292.3012701012248, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
10%|█████████▏ | 45/438 [00:13<01:31, 4.31it/s][2022-12-20 10:18:20,618] [INFO] [timer.py:197:stop] 0/46, RunningAvgSamplesPerSec=295.1547414538479, CurrSamplesPerSec=292.46432493680817, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
11%|█████████▎ | 46/438 [00:13<01:30, 4.31it/s][2022-12-20 10:18:20,848] [INFO] [timer.py:197:stop] 0/47, RunningAvgSamplesPerSec=295.1580779590875, CurrSamplesPerSec=295.3049589058878, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
11%|█████████▌ | 47/438 [00:13<01:30, 4.32it/s][2022-12-20 10:18:21,079] [INFO] [timer.py:197:stop] 0/48, RunningAvgSamplesPerSec=295.13403112231697, CurrSamplesPerSec=294.0559640343882, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
11%|█████████▊ | 48/438 [00:14<01:30, 4.32it/s][2022-12-20 10:18:21,311] [INFO] [timer.py:197:stop] 0/49, RunningAvgSamplesPerSec=295.05970381132835, CurrSamplesPerSec=291.68065404332907, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
11%|█████████▉ | 49/438 [00:14<01:30, 4.32it/s][2022-12-20 10:18:21,542] [INFO] [timer.py:197:stop] 0/50, RunningAvgSamplesPerSec=295.0228271328916, CurrSamplesPerSec=293.2999601190964, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
11%|██████████▏ | 50/438 [00:14<01:29, 4.32it/s][2022-12-20 10:18:21,772] [INFO] [timer.py:197:stop] 0/51, RunningAvgSamplesPerSec=295.0393134681017, CurrSamplesPerSec=295.8328302414951, MemAllocated=3.07GB, MaxMemAllocated=11.54GB
12%|██████████▎ | 51/438 [00:14<01:29, 4.33it/s][2022-12-20 10:18:23,195][__main__][INFO] - epoch 0: perplexity: 34.97688798216538 train_loss: 3.990234375 eval_loss: 3.5546875
Configuration saved in tuned-model/epoch_0_most_recent/config.json
Model weights saved in tuned-model/epoch_0_most_recent/pytorch_model.bin
tokenizer config file saved in tuned-model/epoch_0_most_recent/tokenizer_config.json
Special tokens file saved in tuned-model/epoch_0_most_recent/special_tokens_map.json
Therefore, I am unable to reproduce the error. Hope this helps.
Hi, thank you so much, @pacman100! It is okay now. Thanks again for taking the time to look into the issue. Means a lot!
Hi, I tried your script and I get the following error... can you help me? Thanks.
root@:/workspace/clm_modeltuning# accelerate launch --use_deepspeed --num_processes=2 tuned3.py
[15:52:02] WARNING  The following values were not passed to accelerate launch and had defaults used instead:    launch.py:1088
                    --num_machines was set to a value of 1
                    --mixed_precision was set to a value of 'no'
                    --dynamo_backend was set to a value of 'no'
                    To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
           WARNING  Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.    run.py:663
*****************************************
Error executing job with overrides: []
Error executing job with overrides: []
Traceback (most recent call last):
File "tuned3.py", line 303, in main
Accelerator(log_with=cfg.tracking.report_to, logging_dir=cfg.output_dir) if cfg.tracking.enabled else Accelerator()
File "/opt/conda/lib/python3.7/site-packages/accelerate/accelerator.py", line 235, in init
DeepSpeedPlugin() if os.environ.get("ACCELERATE_USE_DEEPSPEED", "false") == "true" else None
File "
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
Traceback (most recent call last):
File "tuned3.py", line 303, in main
Accelerator(log_with=cfg.tracking.report_to, logging_dir=cfg.output_dir) if cfg.tracking.enabled else Accelerator()
File "/opt/conda/lib/python3.7/site-packages/accelerate/accelerator.py", line 235, in init
DeepSpeedPlugin() if os.environ.get("ACCELERATE_USE_DEEPSPEED", "false") == "true" else None
File "
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace. [15:52:12] ERROR failed (exitcode: 1) local_rank: 0 (pid: 3094) of binary: /opt/conda/bin/python
Hello @grgpa, from the stack trace it seems you are using neither a DeepSpeedPlugin object nor an accelerate config file. For the time being, please pass --gradient_accumulation_steps to accelerate launch, or use the config file created by answering the questionnaire via the accelerate config command. A PR to fix the above issue is under review.
The easiest way is accelerate config, as mentioned in the launch warning above:
To avoid this warning pass in values for each of the problematic parameters or run accelerate config
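For reference, here is a minimal sketch of the explicit-plugin route (the stage, accumulation-steps and mixed-precision values below are placeholders, not values prescribed in this thread); the launch-flag alternative would be something like accelerate launch --use_deepspeed --gradient_accumulation_steps 1 --num_processes=2 tuned3.py:

# Sketch only: construct the DeepSpeed plugin explicitly and hand it to the
# Accelerator, instead of relying on the default plugin that
# `accelerate launch --use_deepspeed` builds from environment variables.
from accelerate import Accelerator, DeepSpeedPlugin

# zero_stage and gradient_accumulation_steps are placeholder values; set them
# to whatever your training configuration actually uses.
deepspeed_plugin = DeepSpeedPlugin(zero_stage=3, gradient_accumulation_steps=1)
accelerator = Accelerator(mixed_precision="fp16", deepspeed_plugin=deepspeed_plugin)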
I did it...thanks for the help.
Hi @pacman100 ...can you help me with this error as well..
[2022-12-26 16:08:28,965] [INFO] [config.py:1024:print] zero_enabled ................. True
[2022-12-26 16:08:28,965] [INFO] [config.py:1024:print] zero_optimization_stage ...... 3
[2022-12-26 16:08:28,965] [INFO] [config.py:1015:print_user_config] json = {
"train_batch_size": 64,
"train_micro_batch_size_per_gpu": 32,
"gradient_accumulation_steps": 1,
"zero_optimization": {
"stage": 3,
"offload_optimizer": {
"device": "none"
},
"offload_param": {
"device": "none"
},
"stage3_gather_16bit_weights_on_model_save": true
},
"steps_per_print": inf,
"zero_allow_untested_optimizer": true
}
Using /root/.cache/torch_extensions/py37_cu113 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.00037407875061035156 seconds
Error executing job with overrides: []
Traceback (most recent call last):
File "probe1.py", line 343, in main
"lr_scheduler_type"
KeyError: 'lr_scheduler_type'
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
[16:08:29] WARNING  Sending process 2248 closing signal SIGTERM    api.py:698
[16:08:30] ERROR    failed (exitcode: 1) local_rank: 0 (pid: 2247) of binary: /opt/conda/bin/python    api.py:672
Hi, can you share a link to the script or paste the full error?
Hi...@asifehmad
""" Fine-tuning the library models for causal language modeling (GPT, GPT-2, CTRL, ...) on a text file or a dataset without using HuggingFace Trainer.
Here is the full list of checkpoints on the hub that can be fine-tuned by this script: https://huggingface.co/models?filter=text-generation """
import logging
import math
import os
import random
from itertools import chain
import datasets
import hydra
import torch
import transformers
from accelerate import Accelerator, DistributedType, DeepSpeedPlugin
from accelerate.logging import get_logger
from accelerate.utils import set_seed
from datasets import Dataset, DatasetDict, load_dataset
from omegaconf import OmegaConf
from omegaconf.dictconfig import DictConfig
from torch.utils.data import DataLoader
from tqdm.auto import tqdm
from transformers import (
    AutoConfig,
    AutoModelForCausalLM,
    AutoTokenizer,
    default_data_collator,
    get_scheduler,
)
import bittensor

deepspeed_plugin = DeepSpeedPlugin(zero_stage=3, gradient_accumulation_steps=4)
def check_cfg_and_load_defaults(cfg: DictConfig) -> DictConfig:
subtensor = bittensor.subtensor(network=cfg.bittensor.network)
if cfg.dataset.block_size is None:
cfg.dataset.block_size = subtensor.validator_sequence_length
if cfg.training.train_batch_size is None:
cfg.training.train_batch_size = subtensor.validator_batch_size
if cfg.training.eval_batch_size is None:
cfg.training.eval_batch_size = subtensor.validator_batch_size
return cfg
def create_accelerator(cfg: DictConfig) -> Accelerator:
accelerator = (
Accelerator(log_with=cfg.tracking.report_to, logging_dir=cfg.output_dir)
if cfg.tracking.enabled
else Accelerator(mixed_precision="fp16", deepspeed_plugin=deepspeed_plugin)
)
if accelerator.is_local_main_process:
datasets.utils.logging.set_verbosity_warning()
transformers.utils.logging.set_verbosity_info()
else:
datasets.utils.logging.set_verbosity_error()
transformers.utils.logging.set_verbosity_error()
return accelerator
def load_raw_datasets(cfg: DictConfig) -> DatasetDict:
if cfg.dataset.name == "bittensor":
dataset = bittensor.dataset(
no_tokenizer=True,
batch_size=cfg.training.train_batch_size,
block_size=cfg.dataset.block_size,
)
dataloader = dataset.dataloader(cfg.dataset.num_batches)
bittensor_dataset = {"text": []}
for batch in tqdm(dataloader, desc="Loading data from bittensor IPFS"):
bittensor_dataset["text"].extend(batch)
raw_datasets = Dataset.from_dict(bittensor_dataset)
dataset.close() # Avoid leaving threadqueue running.
return raw_datasets
if os.path.exists(cfg.dataset.name):
data_files = {"text": cfg.dataset.name}
dataset_args = {}
extension = os.path.splitext(cfg.dataset.name)[-1].lstrip(".")
if extension == "txt":
extension = "text"
dataset_args["keep_linebreaks"] = cfg.dataset.keep_linebreaks
raw_datasets = load_dataset(extension, data_files=data_files, **dataset_args)
raw_datasets = raw_datasets["text"]
else:
raw_datasets = load_dataset(cfg.dataset.name, cfg.dataset.config_name)
return raw_datasets
def load_model_and_tokenizer(cfg: DictConfig):
if cfg.model.config_name is not None:
config = AutoConfig.from_pretrained(cfg.model.config_name)
else:
config = AutoConfig.from_pretrained(cfg.model.name)
if cfg.tokenizer.name is not None:
tokenizer = AutoTokenizer.from_pretrained(
cfg.tokenizer.name, use_fast=cfg.tokenizer.use_fast
)
else:
tokenizer = AutoTokenizer.from_pretrained(
cfg.model.name, use_fast=cfg.tokenizer.use_fast
)
#tokenizer.pad_token = cfg.tokenizer.pad_token
if tokenizer.pad_token is None and tokenizer.eos_token is not None:
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(
cfg.model.name,
from_tf=bool(".ckpt" in cfg.model.name),
config=config,
)
model.resize_token_embeddings(len(tokenizer))
return tokenizer, model
def create_optimizer(cfg, model):
no_decay = ["bias", "LayerNorm.weight"]
optimizer_grouped_parameters = [
{
"params": [
p
for n, p in model.named_parameters()
if not any(nd in n for nd in no_decay)
],
"weight_decay": cfg.training.weight_decay,
},
{
"params": [
p
for n, p in model.named_parameters()
if any(nd in n for nd in no_decay)
],
"weight_decay": 0.0,
},
]
return torch.optim.AdamW(
optimizer_grouped_parameters, lr=cfg.training.learning_rate
)
def preprocess(cfg, accelerator, tokenizer, raw_datasets):
# First we tokenize all the texts.
column_names = raw_datasets.column_names
text_column_name = "text" if "text" in column_names else column_names["train"][0]
if cfg.dataset.concatenate_raw is True:
pad = False
else:
pad = "max_length"
def group_texts(examples):
#print(examples)
# Concatenate all texts.
concatenated_examples = {k: list(chain(*examples[k])) for k in examples.keys()}
#print(concatenated_examples)
total_length = len(concatenated_examples[list(examples.keys())[0]])
if total_length >= cfg.dataset.block_size:
total_length = (
total_length // cfg.dataset.block_size
) * cfg.dataset.block_size
# Split by chunks of max_len.
result = {
k: [
t[i : i + cfg.dataset.block_size]
for i in range(0, total_length, cfg.dataset.block_size)
]
for k, t in concatenated_examples.items()
}
result["labels"] = result["input_ids"].copy()
return result
def tokenize_fn(examples):
return tokenizer(examples[text_column_name])
with accelerator.main_process_first():
tokenized_datasets = raw_datasets.map(
tokenize_fn,
batched=True,
remove_columns=text_column_name,
num_proc=cfg.tokenizer.preprocessing_num_workers,
load_from_cache_file=not cfg.dataset.overwrite_cache,
desc="Running tokenizer on dataset",
)
#print(tokenized_datasets["train"][0:10])
if cfg.dataset.concatenate_raw is True:
lm_datasets = tokenized_datasets.map(
group_texts,
batched=True,
num_proc=cfg.tokenizer.preprocessing_num_workers,
load_from_cache_file=not cfg.dataset.overwrite_cache,
desc=f"Grouping texts in chunks of {cfg.dataset.block_size}",
)
return lm_datasets
@hydra.main(version_base=None, config_path="conf", config_name="config")
def main(cfg: DictConfig):
cfg = check_cfg_and_load_defaults(cfg)
os.makedirs(cfg.output_dir, exist_ok=True)
logger = get_logger(__name__)
logging.basicConfig(
format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
datefmt="%m/%d/%Y %H:%M:%S",
level=logging.INFO,
)
accelerator = create_accelerator(cfg)
accelerator.wait_for_everyone()
if cfg.training.seed is not None:
logger.info(f"Setting random seed to {cfg.training.seed}")
set_seed(cfg.training.seed)
logger.info(accelerator.state, main_process_only=False)
logger.info(OmegaConf.to_yaml(cfg))
tokenizer, model = load_model_and_tokenizer(cfg)
optimizer = create_optimizer(cfg, model)
lr_scheduler = get_scheduler(
name=cfg.training.lr_scheduler,
optimizer=optimizer,
num_warmup_steps=cfg.training.lr_warmup_steps,
num_training_steps=cfg.training.max_train_steps,
)
# On TPU, the tie weights in our model have been disconnected, so we need to restore the ties.
if accelerator.distributed_type == DistributedType.TPU:
model.tie_weights()
# Load and preprocess data
raw_datasets = load_raw_datasets(cfg)
tokenized_datasets = preprocess(cfg, accelerator, tokenizer, raw_datasets)
if "train" not in tokenized_datasets.column_names:
tokenized_datasets = tokenized_datasets.train_test_split(
test_size=cfg.training.val_split_percent / 100
)
tokenized_datasets_test_valid = tokenized_datasets["test"].train_test_split(
test_size=0.5
)
tokenized_datasets["test"] = tokenized_datasets_test_valid["train"]
tokenized_datasets["validation"] = tokenized_datasets_test_valid["test"]
train_dataset = tokenized_datasets["train"]
eval_dataset = tokenized_datasets["validation"]
# Log a few random samples from the training set:
for index in random.sample(range(len(train_dataset)), 3):
ex = train_dataset[index]
logger.info(f"Sample {index} of the training set: {ex}: \n")
logger.info(tokenizer.decode(ex["input_ids"]))
# DataLoaders creation:
train_dataloader = DataLoader(
train_dataset,
shuffle=True,
collate_fn=default_data_collator,
batch_size=cfg.training.train_batch_size,
)
eval_dataloader = DataLoader(
eval_dataset,
collate_fn=default_data_collator,
batch_size=cfg.training.eval_batch_size,
)
# Prepare everything using our accelerator
(
model,
optimizer,
train_dataloader,
eval_dataloader,
lr_scheduler,
) = accelerator.prepare(
model, optimizer, train_dataloader, eval_dataloader, lr_scheduler
)
# Scheduler and math around the number of training steps.
overrode_max_train_steps = False
num_update_steps_per_epoch = math.ceil(
len(train_dataloader) / cfg.training.gradient_accumulation_steps
)
if cfg.training.max_train_steps is None:
cfg.training.max_train_steps = (
cfg.training.num_epochs * num_update_steps_per_epoch
)
overrode_max_train_steps = True
# We need to recalculate our total training steps as the size of the training dataloader
# may have changed.
num_update_steps_per_epoch = math.ceil(
len(train_dataloader) / cfg.training.gradient_accumulation_steps
)
if overrode_max_train_steps:
cfg.training.max_train_steps = (
cfg.training.num_epochs * num_update_steps_per_epoch
)
# Afterwards we recalculate our number of training epochs
cfg.training.num_epochs = math.ceil(
cfg.training.max_train_steps / num_update_steps_per_epoch
)
# We need to initialize the trackers we use, and also store our configuration.
# We initialize the trackers only on main process because `accelerator.log`
# only logs on main process and we don't want empty logs/runs on other processes.
if cfg.tracking.enabled is True and accelerator.is_main_process:
experiment_config = vars(cfg)
# TensorBoard cannot log Enums, need the raw value
experiment_config["lr_scheduler_type"] = experiment_config[
"lr_scheduler_type"
].value
accelerator.init_trackers("prob", experiment_config)
logger.info("***** Running training *****")
logger.info(f" Num examples = {len(train_dataset)}")
logger.info(f" Num Epochs = {cfg.training.num_epochs}")
logger.info(
f" Gradient Accumulation steps = {cfg.training.gradient_accumulation_steps}"
)
logger.info(f" Total optimization steps = {cfg.training.max_train_steps}")
# Only show the progress bar once on each machine.
progress_bar = tqdm(
range(cfg.training.max_train_steps),
disable=not accelerator.is_local_main_process,
)
completed_steps = 0
starting_epoch = 0
# Potentially load in the weights and states from a previous save
if cfg.training.checkpoint.resume_from_checkpoint > 0:
accelerator.print(
f"Resumed from checkpoint: {cfg.training.checkpoint.resume_from_checkpoint}"
)
accelerator.load_state(cfg.training.checkpoint.resume_from_checkpoint)
path = os.path.basename(cfg.training.checkpoint.resume_from_checkpoint)
training_difference = os.path.splitext(path)[0]
if "epoch" in training_difference:
starting_epoch = int(training_difference.replace("epoch_", "")) + 1
resume_step = None
else:
resume_step = int(training_difference.replace("step_", ""))
starting_epoch = resume_step // len(train_dataloader)
resume_step -= starting_epoch * len(train_dataloader)
for epoch in range(starting_epoch, cfg.training.num_epochs):
model.train()
if cfg.tracking.enabled is True:
total_loss = 0
train_losses = []
for step, batch in enumerate(train_dataloader):
# We need to skip steps until we reach the resumed step
if (
cfg.training.checkpoint.resume_from_checkpoint
and epoch == starting_epoch
):
if resume_step is not None and step < resume_step:
completed_steps += 1
continue
outputs = model(**batch)
loss = outputs.loss
train_losses.append(
accelerator.gather(loss.repeat(cfg.training.train_batch_size))
)
# We keep track of the loss at each epoch
if cfg.tracking.enabled is True:
total_loss += loss.detach().float()
loss = loss / cfg.training.gradient_accumulation_steps
accelerator.backward(loss)
if (
step % cfg.training.gradient_accumulation_steps == 0
or step == len(train_dataloader) - 1
):
optimizer.step()
lr_scheduler.step()
optimizer.zero_grad()
progress_bar.update(1)
completed_steps += 1
if step % cfg.training.eval_every == 0:
train_losses_tensor = torch.cat(train_losses)
train_loss = torch.mean(train_losses_tensor)
model.eval()
eval_losses = []
for _eval_step, eval_batch in enumerate(eval_dataloader):
with torch.no_grad():
outputs = model(**eval_batch)
loss = outputs.loss
eval_losses.append(
accelerator.gather(loss.repeat(cfg.training.eval_batch_size))
)
losses = torch.cat(eval_losses)
losses = losses[: len(eval_dataset)]
try:
eval_loss = torch.mean(losses)
perplexity = math.exp(eval_loss)
except OverflowError:
perplexity = float("inf")
logger.info(
f"epoch {epoch}: perplexity: {perplexity} train_loss: {train_loss} eval_loss: {eval_loss}"
)
epoch_dir = f"epoch_{epoch}_most_recent"
if cfg.output_dir is not None:
output_dir = os.path.join(cfg.output_dir, epoch_dir)
unwrapped_model = accelerator.unwrap_model(model)
unwrapped_model.save_pretrained(
output_dir,
is_main_process=accelerator.is_main_process,
save_function=accelerator.save,
)
if accelerator.is_main_process:
tokenizer.save_pretrained(output_dir)
model.train()
if cfg.tracking.enabled is True:
accelerator.log(
{
"perplexity": perplexity,
"eval_loss": eval_loss,
"train_loss": total_loss.item() / len(train_dataloader),
"epoch": epoch,
"step": completed_steps,
},
step=completed_steps,
)
logger.info(f"done epoch {epoch}")
if cfg.output_dir is not None:
accelerator.wait_for_everyone()
unwrapped_model = accelerator.unwrap_model(model)
unwrapped_model.save_pretrained(
cfg.output_dir,
is_main_process=accelerator.is_main_process,
save_function=accelerator.save,
)
if accelerator.is_main_process:
tokenizer.save_pretrained(cfg.output_dir)
print('Pushing Model weights and other related files to Hugging Face Hub')
model.push_to_hub(cfg.output_dir)
print('Pushing the Tokenizer and related files to Hugging Face Hub')
tokenizer.push_to_hub(cfg.output_dir)
if __name__ == "__main__":
    main()
@grgpa that's probably because you've activated the report_to variable in the configuration file. I had the same issue when I did that, but commenting out the following lines fixed the issue:
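For anyone hitting the same KeyError: in the script pasted above, the only place lr_scheduler_type is touched is the tracking block, reproduced here as a sketch (not a verified patch). Commenting out the lines marked below, or disabling tracking / report_to in the config as suggested, avoids the crash:

# Tracking block from the script above. It only runs when cfg.tracking.enabled
# is True, and it indexes a key ("lr_scheduler_type") that is never set anywhere
# in the config, which is what raises KeyError: 'lr_scheduler_type'.
if cfg.tracking.enabled is True and accelerator.is_main_process:
    experiment_config = vars(cfg)
    # Commenting out these lines (or turning tracking off) avoids the KeyError:
    # experiment_config["lr_scheduler_type"] = experiment_config[
    #     "lr_scheduler_type"
    # ].value
    accelerator.init_trackers("prob", experiment_config)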
I am trying to integrate DeepSpeed into this script and have run it successfully with ZeRO stage 2, but when I try ZeRO stage 3 this error appears just after the first epoch completes. I have made the changes in the finetune_using_clm.py file as suggested in this huggingface/accelerate repo, and have created a new file tuned.py.
The error for ZeRO stage 3 points to:
Traceback (most recent call last): File "tuned.py", line 398, in main accelerator.backward(loss)
The whole error is:
I don't know why it gives this error, as the script runs fine with ZeRO stage 2.
Any help in this regard would be highly appreciated.
I am using Google Colab for the task.
Package versions: mpi4py 3.1.4, deepspeed 0.7.6, accelerate 0.15.0, transformers 4.25.1