jzhang38 / TinyLlama

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.

How to evaluate checkpoints during pretraining? #140

Closed DarthMurse closed 3 months ago

DarthMurse commented 5 months ago

I'm learning how to train a language model from scratch, and I was training a 120M TinyLlama model on BookCorpus. I wonder how I can evaluate the checkpoints using GLUE. I have read EVAL.md, which says the GPT4All eval suite can be used to evaluate the checkpoints, but when I went to the GPT4All website, I didn't see anything related to an evaluation suite. So how can I easily evaluate the checkpoints with GLUE? (By the way, I'm from China, so it's difficult for me to push a custom model to the Hugging Face Hub.)

keeeeenw commented 3 months ago

Update: I figured out the exact details.

To convert the checkpoint to the Hugging Face format, you can run the following bash commands:

# Set up some directories. In my case, the checkpoint from the current training run for my custom tinyllama_500M model is in OUT_DIR.
OUT_DIR=out/tinyllama_500M
# Name of the checkpoint I would like to evaluate.
CHECKPOINT=iter-160000-ckpt.pth
# My custom model name. This should be the same as the one you defined in config.py. If you did not change the model config, you can use the default tiny_LLaMA_1b.
MODEL_NAME=tiny_LLaMA_500M
# Directory where you will put all the artifacts for inference (create it if it does not exist yet).
INFERENCE_DIR=out/pretrained
mkdir -p $INFERENCE_DIR

# Run the actual conversion command. In my case this produces iter-160000-ckpt.bin and a config.json in OUT_DIR.
python scripts/convert_lit_checkpoint.py --out_dir $OUT_DIR --checkpoint_name $CHECKPOINT --model_name $MODEL_NAME --model_only False

# Move both the bin file and config.json to the inference directory, renaming the bin file to pytorch_model.bin
# because Hugging Face expects this file name.
mv $OUT_DIR/iter-160000-ckpt.bin $INFERENCE_DIR/pytorch_model.bin
mv $OUT_DIR/config.json $INFERENCE_DIR/
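
As a quick sanity check (my own addition; the paths assume the variables above), you can confirm that the converted weights and config are readable before building any inference or eval code on top of them:

import json
import torch

inference_dir = "out/pretrained"  # same directory as INFERENCE_DIR above

# Load the converted weights on CPU and peek at a few tensor names.
state_dict = torch.load(f"{inference_dir}/pytorch_model.bin", map_location="cpu")
print(len(state_dict), "tensors, e.g.", list(state_dict)[:3])

# Print the Hugging Face config that the conversion script produced.
with open(f"{inference_dir}/config.json") as f:
    print(json.load(f))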

Here is some inference code:

import os

import torch
import transformers
from transformers import LlamaForCausalLM, LlamaTokenizer

out_dir = "<FULL_PATH>/out/pretrained"
model_path = os.path.join(out_dir, "pytorch_model.bin")
# You could also load the weights onto the GPU instead of the CPU.
state_dict = torch.load(model_path, map_location=torch.device("cpu"))
# Use the same tokenizer you used to prepare the training data.
tokenizer_path = "<PATH TO YOUR TOKENIZER>"
model = LlamaForCausalLM.from_pretrained(
    out_dir, local_files_only=True, state_dict=state_dict
)
tokenizer = LlamaTokenizer.from_pretrained(tokenizer_path)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
    tokenizer=tokenizer
)

prompt = "define a python function that generates a random number between 1 and 100."
formatted_prompt = (
    f"### Human: {prompt} ### Assistant:"
)

sequences = pipeline(
    formatted_prompt,
    do_sample=True,
    top_k=5,
    top_p=0.9,
    num_return_sequences=1,
    repetition_penalty=1.1,
    max_new_tokens=1024,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")

I found the original instructions for converting lit models somewhat helpful: https://github.com/Lightning-AI/litgpt/blob/main/tutorials/convert_lit_models.md

It also has some instructions for evaluation, but I have not tried them yet. You can basically adapt any eval code that supports the Hugging Face format, based on the inference code above.
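
If you prefer to drive the evaluation from Python instead of the CLI, something like the sketch below should work. This is my own sketch, assuming lm-evaluation-harness v0.4+ is installed as the lm_eval package; the HFLM wrapper and simple_evaluate call reflect my reading of its API, so double-check against the version you have. It reuses the model and tokenizer objects loaded in the inference code above.

import lm_eval
from lm_eval.models.huggingface import HFLM

# Wrap the already-loaded Hugging Face model and tokenizer so the harness can drive them directly.
lm = HFLM(pretrained=model, tokenizer=tokenizer, batch_size=8)

# Run a couple of tasks; task names follow the harness' task registry.
results = lm_eval.simple_evaluate(model=lm, tasks=["lambada_openai", "hellaswag"])
print(results["results"])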

Original message: This does not answer your question directly, but I am trying to evaluate pretrained checkpoints myself as well, so here is what I know so far.

At a high level, you want to convert the TinyLlama checkpoint to the Hugging Face format. Then you can load it into any eval system that supports Hugging Face models.

For example, https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/models/huggingface.py#L469 just loads the Hugging Face model using the generic transformers.AutoConfig.from_pretrained API.

Therefore, something like this should work:

lm_eval --model hf \
    --model_args pretrained=<YOUR_LOCAL_PATH_TO_HUGGING_FACE_FORMAT> \
    --tasks lambada_openai,hellaswag \
    --device cuda:0 \
    --batch_size auto:4

I am still figuring out the details, but you can use this as a reference when converting a PyTorch checkpoint to the Hugging Face format: https://github.com/jzhang38/TinyLlama/blob/main/scripts/convert_hf_checkpoint.py

For reference, the OpenLLaMA/EasyLM author has a tool for converting a PyTorch checkpoint to their format: https://github.com/young-geng/EasyLM/blob/main/EasyLM/models/llama/convert_torch_to_easylm.py
You can then convert that to the Hugging Face format (I know for sure this one works because I have tried it before): https://github.com/young-geng/EasyLM/blob/main/EasyLM/models/llama/convert_easylm_to_hf.py

DarthMurse commented 3 months ago

Thank you very much for your detailed instructions! I think this solved my problem.