**zzhhjjj** opened 5 months ago
We can automatically disable FlashAttention on hardware that doesn't support it (pre-Ampere GPUs):
```python
import torch


def supports_flash_attention(device_id: int) -> bool:
    """Check if a GPU supports FlashAttention."""
    major, minor = torch.cuda.get_device_capability(device_id)
    # FlashAttention requires Ampere/Ada (SM 8.x) or Hopper (SM 9.0).
    is_sm8x = major == 8
    is_sm90 = major == 9 and minor == 0
    return is_sm8x or is_sm90
```
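A hedged sketch of how this check could gate the attention backend at model-construction time; `attn_implementation` and the `"sdpa"` fallback are illustrative names borrowed from the transformers convention, not necessarily this repo's actual config:

```python
import torch

# Illustrative wiring (assumed names): pick FlashAttention when the
# current device supports it, otherwise fall back to PyTorch SDPA.
if torch.cuda.is_available() and supports_flash_attention(torch.cuda.current_device()):
    attn_implementation = "flash_attention_2"
else:
    attn_implementation = "sdpa"
print(f"Using attention implementation: {attn_implementation}")
```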
Also: add an end-to-end test for Llama.
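For the end-to-end test, a minimal pytest-style sketch; the `hf-internal-testing/tiny-random-LlamaForCausalLM` checkpoint and the transformers API are assumptions for illustration, and the real test should exercise this repo's own Llama entry points:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def test_llama_end_to_end():
    # Tiny random Llama checkpoint commonly used for smoke tests;
    # small enough to run on CPU in CI.
    model_name = "hf-internal-testing/tiny-random-LlamaForCausalLM"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    inputs = tokenizer("Hello, world", return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=8)
    # Generation should append new tokens after the prompt.
    assert output.shape[-1] > inputs["input_ids"].shape[-1]
```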