Add Dockerfile, update README container Link

shub-kris commented 7 months ago

This PR adds a Dockerfile for the pytorch-training-gpu.2.2.transformers.4.37.2.py310 image. Addresses issue #3.

shub-kris commented 7 months ago

@philschmid as it's the first PR, I wanted to keep it short so you can see the format. The next PR will contain the other Dockerfiles, for the other versions, all together.

shub-kris commented 7 months ago

Left some comments. Did you try building and testing it?

Yes, built it and tested it. We need to upgrade peft, as it looks like peft still uses flash-attn-1.

ydshieh commented 7 months ago

Yes, built it and tested it.

If we ever want to make the build as a github action, keep in mind that the (free) disk space on GH-hosted VM is kinda small. (although we probably only need to trigger the build once - so I don't know if we need an action for it)

philschmid commented 7 months ago

Yes, built it and tested it.

If we ever want to make the build as a github action, keep in mind that the (free) disk space on GH-hosted VM is kinda small. (although we probably only need to trigger the build once - so I don't know if we need an action for it)

We probably need hosted runners for this, since building with flash attention is also resource-intensive.

philschmid commented 7 months ago

Works for me! My only nit left: I find that having to type python3 within (such) images is inconvenient.

For example, a user copies a command like python -m pip install xxx and pastes it into the Docker environment. It will fail there with

root@1d2e88519ae6:/# python
bash: python: command not found

Or they launch a bash script whose code calls python ...

If we can make python an alias of python3, it will be more friendly, I guess.

Agreed, let's make sure python is python3. There should even be an apt package that is commonly used for this.
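A quick way to confirm the alias from inside the image is to check which interpreter names resolve on PATH. This is only a sketch: the helper name `missing_interpreters` is made up for illustration, and the apt package meant above is presumably `python-is-python3` on Debian/Ubuntu bases (an assumption, the thread does not name it).

```python
import shutil


def missing_interpreters(names=("python", "python3")):
    """Return the interpreter names that do not resolve on PATH."""
    # shutil.which returns None when the command cannot be found
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_interpreters()
    if missing:
        print("not on PATH:", ", ".join(missing))
    else:
        print("python and python3 both resolve")
```

Running this inside the container should report `python` as missing before the alias is added and nothing missing afterwards.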

shub-kris commented 7 months ago

@philschmid for flash-attention, do we want v1 or v2? The PyTorch image comes with flash attention v1 (1.0.5) and transformer-engine-0.8. If we want to change to v2, we need to upgrade transformer-engine, which currently introduces breaking changes, but I am working on it.

philschmid commented 7 months ago

@philschmid for flash-attention, do we want v1 or v2? The PyTorch image comes with flash attention v1. If we want to change to v2, we need to upgrade transformer-engine, which currently introduces breaking changes, but I am working on it.

v2 for sure. We can uninstall transformer-engine for now if needed. It's not a priority.

shub-kris commented 7 months ago

@philschmid do we want another Dockerfile for transformers==4.37.0? Or can I move to torch 2.1?

philschmid commented 7 months ago

@philschmid do we want another Dockerfile for transformers==4.37.0? Or can I move to torch 2.1?

We should use the latest available versions of everything here.

shub-kris commented 7 months ago

@philschmid I updated the Dockerfile to have the latest versions.

FlashAttnV2 (2.0.4) comes out of the box with PyTorch 2.2.x (https://pytorch.org/blog/pytorch2-2/), so we don't need to install it separately. We also don't need to uninstall Transformer-engine 1.2.1, as the versions are compatible with each other.
I tried to upgrade FlashAttnV2 to 2.5.2, but it failed and complained about a missing symbol. So it looks like the PyTorch version that comes with our base images is still not compatible with FlashAttnV2 (2.5.2).
I tested the image by using this:

git clone https://github.com/huggingface/transformers
cd transformers
git checkout tags/v4.37.2

python3 examples/pytorch/text-classification/run_glue.py \
  --model_name_or_path bert-base-cased \
  --dataset_name emotion \
  --do_train \
  --do_eval \
  --per_device_train_batch_size 32 \
  --num_train_epochs 3 \
  --output_dir /bert-test


- @ydshieh I added a specific check for the torch and flash-attn versions. For now the expected values are hardcoded in the check.
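For reference, a check like this could look roughly as follows. This is a sketch and not the code in the PR: the pins in `EXPECTED` and the distribution names are assumptions chosen for illustration, and the lookup function is injectable so the logic can be tried without the packages installed.

```python
from importlib import metadata

# Hypothetical pins for illustration; the real values live in the Dockerfile.
EXPECTED = {"torch": "2.2", "flash-attn": "2.4.2"}


def matches(found, wanted):
    """True if `found` equals `wanted` or extends it, e.g. '2.2' vs '2.2.0+cu121'."""
    if found == wanted:
        return True
    return found.startswith(wanted + ".") or found.startswith(wanted + "+")


def check_versions(expected, get_version=metadata.version):
    """Return a list of mismatch messages; an empty list means all pins hold."""
    problems = []
    for name, wanted in expected.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (wanted {wanted})")
            continue
        if not matches(found, wanted):
            problems.append(f"{name}: found {found}, wanted {wanted}")
    return problems


if __name__ == "__main__":
    for line in check_versions(EXPECTED):
        print(line)
```

Keeping the pins in one dictionary means a later upgrade only touches a single place in the check.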

shub-kris commented 7 months ago

@philschmid regarding GCP dependencies, what kinds of things should they be able to do?

Some of the packages that come to mind are:

Google Cloud CLI (gcloud): Install the Google Cloud SDK to provide command-line tools and libraries for interacting with GCP services.
Google Cloud Storage (gsutil, CLI) and (google-cloud-storage, python library): To interact with Google Cloud Storage
Vertex AI (google-cloud-aiplatform, python library)
Here is the list of all GCP python libraries: https://cloud.google.com/python/docs/reference

philschmid commented 7 months ago

FlashAttnV2 (2.0.4) comes out of the box with PyTorch 2.2.x (https://pytorch.org/blog/pytorch2-2/), so we don't need to install it separately. We also don't need to uninstall Transformer-engine 1.2.1, as the versions are compatible with each other.

I am not sure that's already integrated into transformers. I also saw some comments that it is still a bit worse, and the official flash-attn is what the majority of people, including us, use. PyTorch 2.2 came out last week and I am not sure every package is stable on it yet, so 2.1.X is fine too.

I tried to upgrade FlashAttnV2 to 2.5.2, but it failed and complained about a missing symbol. So it looks like the PyTorch version that comes with our base images is still not compatible with FlashAttnV2 (2.5.2).

^If that's also true for older versions, we might not be able to use the NVIDIA images, but having the flash-attn package is essential.

I tested the image by using this:

That doesn't test the usage of flash attention or other libraries, e.g. trl, right?

regarding GCP dependencies, what kind of things they should be able to do?

I don't know; the ones that are needed and make it easy to work with other GCP services, e.g. GCS or Pub/Sub...

shub-kris commented 7 months ago

I am not sure that's already integrated into transformers. I also saw some comments that it is still a bit worse, and the official flash-attn is what the majority of people, including us, use. PyTorch 2.2 came out last week and I am not sure every package is stable on it yet, so 2.1.X is fine too.

^If that's also true for older versions, we might not be able to use the NVIDIA images, but having the flash-attn package is essential.

After debugging, the issue turns out to be Transformer-engine, as it only supports flash-attn <= 2.4.2 (Link). So we have two options:

1. uninstall transformer-engine and use the latest flash-attn, 2.5.2
2. use flash-attn 2.4.2 and keep everything as it is.
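The constraint behind these two options can be written down as a small comparison. The sketch below only encodes the `<= 2.4.2` bound reported above; the function names are hypothetical, and the parser handles plain numeric releases only (a pre-release such as `2.5.0rc1` would need a proper version parser).

```python
# Upper bound on flash-attn reported for the installed Transformer-engine.
TE_MAX_FLASH_ATTN = "2.4.2"


def parse(version):
    """Turn '2.4.2' into (2, 4, 2), dropping any local suffix such as '+cu121'."""
    release = version.split("+")[0]
    return tuple(int(part) for part in release.split("."))


def compatible_with_transformer_engine(flash_attn_version):
    """True if this flash-attn release is within the reported bound."""
    return parse(flash_attn_version) <= parse(TE_MAX_FLASH_ATTN)


if __name__ == "__main__":
    for candidate in ("2.4.2", "2.5.2"):
        verdict = "keep" if compatible_with_transformer_engine(candidate) else "uninstall"
        print(f"flash-attn {candidate}: {verdict} transformer-engine")
```

Comparing tuples of integers avoids the string-comparison trap where "2.10.0" would sort before "2.4.2".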

I don't know; the ones that are needed and make it easy to work with other GCP services, e.g. GCS or Pub/Sub...

I added some basic ones that could be helpful for ML work: gcloud-sdk, which comes with the CLI and gsutil; some storage-related ones like GCS and BigQuery; and ai-platform for Vertex AI.

That doesn't test the usage of flash attention or other libraries, e.g. trl, right?

I tested different examples using setup (2):

trl and peft by running this: https://github.com/huggingface/trl/blob/main/examples/scripts/sft.py with

python examples/scripts/sft.py \
  --model_name="facebook/opt-125m" \
  --learning_rate=1.41e-5 \
  --batch_size=32 \
  --gradient_accumulation_steps=16 \
  --output_dir="sft_openassistant-guanaco" \
  --logging_steps=1 \
  --num_train_epochs=1 \
  --max_steps=-1 \
  --gradient_checkpointing \
  --use_peft \
  --peft_lora_r=64 \
  --peft_lora_alpha=16

diffusers, accelerate and deepspeed by running this: https://github.com/huggingface/diffusers/tree/main/examples/dreambooth#training-on-a-8-gb-gpu

One more thing about flash-attn:

FlashAttention-2 currently supports:

Ampere, Ada, or Hopper GPUs (e.g., A100, RTX 3090, RTX 4090, H100). Support for Turing GPUs (T4, RTX 2080) is coming soon, please use FlashAttention 1.x for Turing GPUs for now.
Datatype fp16 and bf16 (bf16 requires Ampere, Ada, or Hopper GPUs).
All head dimensions up to 256. Head dim > 192 backward requires A100/A800 or H100/H800.
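The GPU requirement can be checked at runtime from the CUDA compute capability. This is a rough sketch, not part of the image: the function names are made up for illustration, and the mapping assumes the usual capability numbers (Turing 7.5, Ampere 8.0/8.6, Ada 8.9, Hopper 9.0). Architectures newer than Hopper are deliberately treated as unsupported here because the list above does not mention them.

```python
def supports_flash_attention_2(major, minor=0):
    """True for the architectures listed above, given a CUDA compute capability."""
    # 8.x covers Ampere (8.0, 8.6) and Ada (8.9); 9.x is Hopper.
    # Turing (7.5) and older are excluded. `minor` is accepted so the
    # tuple from PyTorch can be passed straight through; it is not needed.
    return major in (8, 9)


def current_gpu_supports_flash_attention_2():
    """Check GPU 0 via PyTorch; returns False when no CUDA device is visible."""
    import torch  # imported lazily so the pure check above works without torch

    if not torch.cuda.is_available():
        return False
    major, minor = torch.cuda.get_device_capability(0)
    return supports_flash_attention_2(major, minor)
```

On a T4 the second function would return False, which is the case where FlashAttention 1.x is recommended above.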

philschmid commented 7 months ago

I think we can uninstall transformer-engine for now; it's not yet supported by the Trainer.

shub-kris commented 7 months ago

I think we can uninstall transformer-engine for now; it's not yet supported by the Trainer.

I take back what I said about using the latest flash-attn with pytorch-2.2.x. There seems to be an issue there as well: even with transformer-engine uninstalled, flash-attn-2.5.2 doesn't work and complains about a missing symbol. So for now let's keep flash-attn 2.4.2, which was released at the end of December 2023. Flash-attn 2.5.2 was released four days ago and looks like it still has compilation issues with torch 2.2.x. I also tried 2.5.x versions with pytorch 2.2.x, and they don't seem to work either.

I will test flash-attn separately to be sure it really works.

philschmid commented 7 months ago

I think we can uninstall transformer-engine for now; it's not yet supported by the Trainer.

I take back what I said about using the latest flash-attn with pytorch-2.2.x. There seems to be an issue there as well: even with transformer-engine uninstalled, flash-attn-2.5.2 doesn't work and complains about a missing symbol. So for now let's keep flash-attn 2.4.2, which was released at the end of December 2023. Flash-attn 2.5.2 was released four days ago and looks like it still has compilation issues with torch 2.2.x. I also tried 2.5.x versions with pytorch 2.2.x, and they don't seem to work either.

I will test flash-attn separately to be sure it really works.

Then let's go with a previous version of PyTorch and upgrade once it is supported. As mentioned earlier, PyTorch 2.2 officially came out only last week, and the NVIDIA container is built on some commit.

shub-kris commented 7 months ago

@philschmid I tested flash-attn v2 by running this on an A100 GPU:

import torch
from transformers import OPTForCausalLM, GPT2Tokenizer

device = "cuda"  # the device to load the model onto

# Load OPT in fp16 with the FlashAttention-2 attention implementation
model = OPTForCausalLM.from_pretrained("facebook/opt-350m", torch_dtype=torch.float16, attn_implementation="flash_attention_2")
tokenizer = GPT2Tokenizer.from_pretrained("facebook/opt-350m")

prompt = "A chat between a curious human and the Statue of Liberty.\n\nHuman: What is your name?\nStatue: I am the "

model_inputs = tokenizer([prompt], return_tensors="pt").to(device)
model.to(device)

# Greedy decoding of up to 30 new tokens
generated_ids = model.generate(**model_inputs, max_new_tokens=30, do_sample=False)
print(tokenizer.batch_decode(generated_ids)[0])

I also tried training, by replacing this: https://github.com/huggingface/transformers/blob/main/examples/pytorch/text-classification/run_glue.py#L401C1-L410C6 with

    model = AutoModelForSequenceClassification.from_pretrained(
        model_args.model_name_or_path,
        from_tf=bool(".ckpt" in model_args.model_name_or_path),
        config=config,
        cache_dir=model_args.cache_dir,
        revision=model_args.model_revision,
        token=model_args.token,
        trust_remote_code=model_args.trust_remote_code,
        ignore_mismatched_sizes=model_args.ignore_mismatched_sizes,
        torch_dtype=torch.bfloat16,
        attn_implementation="flash_attention_2"
    )

and then running the script with

python examples/pytorch/text-classification/run_glue.py \
  --model_name_or_path facebook/opt-125m \
  --dataset_name emotion \
  --do_train \
  --do_eval \
  --per_device_train_batch_size 32 \
  --num_train_epochs 3 \
  --output_dir /bert-test

shub-kris commented 7 months ago

@philschmid I think it should be good to be merged.

huggingface / Google-Cloud-Containers

Add Dockerfile, update README container Link #5