Closed shub-kris closed 7 months ago
@philschmid as it's the first PR, I wanted to keep it short so you can see the format. The next PR will contain the other Dockerfiles with the other versions all together.
Left some comments. Did you try building and testing it?
Yes, built it and tested it. Need to upgrade peft, as it looks like peft still uses flash-attn-1.
Yes, built it and tested it.
If we ever want to make the build as a github action, keep in mind that the (free) disk space on GH-hosted VM is kinda small. (although we probably only need to trigger the build once - so I don't know if we need an action for it)
We probably need hosted runners for this, since building with flash attention is also intensive.
Works for me! My only remaining nit: I have found that having to type python3 within (such) images is inconvenient. For example, a user copies some commands like
python -m pip install xxx
and pastes them into the docker environment. This will fail with
root@1d2e88519ae6:/# python
bash: python: command not found
Or they launch a bash script whose code calls
python ...
If we can make an alias from python to python3, it will be more friendly, I guess.
Agreed, let's make sure python is python3. There should even be an apt package for this which is commonly used.
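A quick way to confirm the alias inside a built image could look like the sketch below. It is only an illustration, not part of this PR: the helper name missing_commands is hypothetical, and the apt package meant above is presumably python-is-python3 on Debian/Ubuntu, which is an assumption since the thread does not name it.

```python
import shutil


def missing_commands(names, which=shutil.which):
    """Return the command names that cannot be resolved on PATH.

    `which` is injectable so the helper can be exercised without
    depending on what is installed on the current machine.
    """
    return [name for name in names if which(name) is None]
```

Inside the container, missing_commands(["python", "python3"]) should return an empty list once the alias is in place; a non-empty result names the commands that pasted snippets would fail on.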
@philschmid for flash-attention, do we want v1 or v2? The pytorch image comes with flash attention v1 (1.0.5) and transformer-engine-0.8. If we want to change to v2, then we need to upgrade transformer-engine, which currently introduces breaking changes, but I am working on it.
v2 for sure. We can uninstall transformer-engine if needed for now. It's not a priority.
@philschmid do we want another dockerfile for transformers==4.37.0? Or can I move to torch 2.1?
We should use the latest available versions of everything in here.
@philschmid I updated the Dockerfile to have the latest versions.
As FlashAttnV2 (2.0.4) comes out of the box (https://pytorch.org/blog/pytorch2-2/) with PyTorch 2.2.x, we don't need to install it separately and also don't need to uninstall Transformer-engine 1.2.1, as the versions are compatible with each other.
I tried to upgrade the FlashAttnV2 version to 2.5.2, but it failed and complained about a missing symbol. So it looks like the PyTorch version that comes with our base images is still not compatible with FlashAttnV2 (2.5.2).
I tested the image by using this:
git clone https://github.com/huggingface/transformers
cd transformers
git checkout tags/v4.37.2
python3 examples/pytorch/text-classification/run_glue.py \
--model_name_or_path bert-base-cased \
--dataset_name emotion \
--do_train \
--do_eval \
--per_device_train_batch_size 32 \
--num_train_epochs 3 \
--output_dir /bert-test
- @ydshieh I added a specific check for the torch and flash-attn versions. For now the expected values are hardcoded in the check.
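For readers who have not seen the check, a hardcoded version check of this kind might look roughly like the sketch below. This is not the code from the PR: the function name check_versions, the prefix-matching rule, and the distribution names passed in are all assumptions made for illustration.

```python
from importlib import metadata


def check_versions(expected, get_version=metadata.version):
    """Compare installed package versions against expected version prefixes.

    `expected` maps a distribution name to a version prefix such as "2.2".
    Returns a list of human-readable problems; an empty list means all good.
    """
    problems = []
    for package, prefix in expected.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package}: not installed")
            continue
        # Accept an exact match or a longer version that continues the prefix,
        # e.g. prefix "2.2" accepts "2.2.0a0+81ea7a4" but rejects "2.20.0".
        if installed != prefix and not installed.startswith(prefix + "."):
            problems.append(f"{package}: expected {prefix}.x, found {installed}")
    return problems
```

In an image build step one could call check_versions({"torch": "2.2", "flash-attn": "2.0.4"}) and fail the build if the returned list is non-empty. The pinned values here are just the ones mentioned in this thread and would need updating whenever the image changes.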
@philschmid regarding GCP dependencies, what kind of things should they be able to do?
Some of the packages that come to mind are:
As FlashAttnV2 (2.0.4) comes out of the box (https://pytorch.org/blog/pytorch2-2/) with PyTorch 2.2.x, we don't need to install it separately and also don't need to uninstall Transformer-engine 1.2.1, as the versions are compatible with each other.
I am not sure if that's already integrated into transformers. I also saw some comments that it is still a bit worse, and the official flash-attn is what the majority of people use, and us too.
PyTorch 2.2 came out last week. I am not sure if every package is stable on it already, so 2.1.X is fine too.
I tried to upgrade the FlashAttnV2 version to 2.5.2 but it failed and complained some symbol missing. So looks like the PyTorch version that comes with our base images is till not compatible with FlashAttnV2(2.5.2).
If that's also true for older versions, we might not be able to use the nvidia images, but having the flash-attn package is essential.
I tested the image by using this:
That doesn't test the usage of flash attention or other libraries, e.g. trl right?
regarding GCP dependencies, what kind of things they should be able to do?
I don't know, the ones who make it easy and needed to work with other GCP services, e.g. GCS or Pub/sub...
After debugging, the issue turns out to be Transformer-engine, as it only supports flash-attn <= 2.4.2 (Link). So, we have two options:
I don't know, the ones who make it easy and needed to work with other GCP services, e.g. GCS or Pub/sub...
Added some basic ones that could be helpful for ML stuff. gcloud-sdk comes with the cli and gsutil, then some related to storage like GCS and BigQuery, and ai-platform for vertex-ai.
That doesn't test the usage of flash attention or other libraries, e.g. trl right?
I tested different examples using the (2) setup:
trl and peft by running this: https://github.com/huggingface/trl/blob/main/examples/scripts/sft.py with
python examples/scripts/sft.py \
--model_name="facebook/opt-125m" \
--learning_rate=1.41e-5 \
--batch_size=32 \
--gradient_accumulation_steps=16 \
--output_dir="sft_openassistant-guanaco" \
--logging_steps=1 \
--num_train_epochs=1 \
--max_steps=-1 \
--gradient_checkpointing \
--use_peft \
--peft_lora_r=64 \
--peft_lora_alpha=16
diffusers, accelerate and deepspeed by running this: https://github.com/huggingface/diffusers/tree/main/examples/dreambooth#training-on-a-8-gb-gpu
One more thing about flash-attn:
FlashAttention-2 currently supports:
Ampere, Ada, or Hopper GPUs (e.g., A100, RTX 3090, RTX 4090, H100). Support for Turing GPUs (T4, RTX 2080) is coming soon, please use FlashAttention 1.x for Turing GPUs for now.
Datatype fp16 and bf16 (bf16 requires Ampere, Ada, or Hopper GPUs).
All head dimensions up to 256. Head dim > 192 backward requires A100/A800 or H100/H800.
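Given these hardware limits, an image user may want to fall back to another attention backend on older GPUs. The sketch below is one way to express that. It assumes that "Ampere, Ada, or Hopper" corresponds to CUDA compute capability 8.0 or higher (Turing is 7.5); the helper names are hypothetical and nothing here comes from the PR itself.

```python
# Assumption: Ampere starts at compute capability 8.0 (A100 is 8.0,
# RTX 3090 is 8.6, RTX 4090 is 8.9, H100 is 9.0); Turing (T4) is 7.5.
MIN_FLASH_ATTN_2_CAPABILITY = (8, 0)


def supports_flash_attn_2(capability):
    """Return True if a (major, minor) compute capability is Ampere or newer."""
    return tuple(capability) >= MIN_FLASH_ATTN_2_CAPABILITY


def pick_attn_implementation(capability):
    """Choose a value for the `attn_implementation` argument of from_pretrained."""
    return "flash_attention_2" if supports_flash_attn_2(capability) else "eager"
```

On a real machine the capability tuple can be obtained with torch.cuda.get_device_capability(), and the result passed as attn_implementation when loading a model. Note that this only covers the GPU generation; the dtype and head-dimension limits listed above are not checked.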
I think we can uninstall transformer-engine for now; it's not yet supported by the Trainer.
I take back what I said about using the latest flash-attn and pytorch-2.2.x: there also seems to be an issue there. I found that even with transformer-engine uninstalled, flash-attn-2.5.2 doesn't work and complains about a missing symbol. So, for now let's keep flash-attn 2.4.2, as it was released at the end of December 2023. Flash-attn 2.5.2 was released four days ago and it looks like it still has compilation issues with torch 2.2.x. I also tried other 2.5.x versions with pytorch 2.2.x and they don't seem to work either.
I will test flash-attn separately to be sure it really works.
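To keep a later upgrade from silently crossing the known-good bound, the pin could be guarded with a small comparison like the one sketched here. The upper bound (2, 4, 2) is taken from this thread; the function names are hypothetical, and the parser deliberately ignores suffixes such as .post1, so it is a rough illustration rather than a full version parser.

```python
# Assumption from this thread: flash-attn releases up to 2.4.2 work with the
# base image, while 2.5.x fails with a missing symbol.
MAX_FLASH_ATTN = (2, 4, 2)


def parse_version(text):
    """Turn '2.4.2' or '2.5.2.post1' into a tuple of its leading integers."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break  # stop at the first non-numeric component, e.g. 'post1'
        parts.append(int(piece))
    return tuple(parts)


def within_supported_range(version, upper=MAX_FLASH_ATTN):
    """Return True if `version` is at or below the known-good upper bound."""
    return parse_version(version) <= upper
```

For stricter handling of pre-releases and post-releases, packaging.version.Version implements the full comparison rules and would be a better fit than this tuple comparison.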
Then let's go with a previous version of PyTorch and upgrade once supported. As mentioned earlier, PyTorch 2.2 officially came out last week, and the nvidia container is built on some commit.
@philschmid tested flash-attn v2 by running this on an A100 GPU:
import torch
from transformers import OPTForCausalLM, GPT2Tokenizer

device = "cuda"  # the device to load the model onto
# Load the model in fp16 and request the FlashAttention-2 attention backend
model = OPTForCausalLM.from_pretrained("facebook/opt-350m", torch_dtype=torch.float16, attn_implementation="flash_attention_2")
tokenizer = GPT2Tokenizer.from_pretrained("facebook/opt-350m")
prompt = "A chat between a curious human and the Statue of Liberty.\n\nHuman: What is your name?\nStatue: I am the "
model_inputs = tokenizer([prompt], return_tensors="pt").to(device)
model.to(device)
# Greedy decoding of up to 30 new tokens
generated_ids = model.generate(**model_inputs, max_new_tokens=30, do_sample=False)
tokenizer.batch_decode(generated_ids)[0]
I also tried training, by replacing this: https://github.com/huggingface/transformers/blob/main/examples/pytorch/text-classification/run_glue.py#L401C1-L410C6 with
model = AutoModelForSequenceClassification.from_pretrained(
    model_args.model_name_or_path,
    from_tf=bool(".ckpt" in model_args.model_name_or_path),
    config=config,
    cache_dir=model_args.cache_dir,
    revision=model_args.model_revision,
    token=model_args.token,
    trust_remote_code=model_args.trust_remote_code,
    ignore_mismatched_sizes=model_args.ignore_mismatched_sizes,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2"
)
and then running the script with
python examples/pytorch/text-classification/run_glue.py \
--model_name_or_path facebook/opt-125m \
--dataset_name emotion \
--do_train \
--do_eval \
--per_device_train_batch_size 32 \
--num_train_epochs 3 \
--output_dir /bert-test
@philschmid I think it should be good to be merged.
This PR aims to add a Dockerfile for the pytorch-training-gpu.2.2.transformers.4.37.2.py310 image. Addresses issue #3