hpcaitech / ColossalAI

Making large AI models cheaper, faster and more accessible
https://www.colossalai.org
Apache License 2.0

When I use hybrid_parallel and set enable_fused_normalization = True, the code fails with: RuntimeError: Failed to replace input_layernorm of type LlamaRMSNorm with FusedRMSNorm with the exception: Please install apex from source (https://github.com/NVIDIA/apex) to use the fused RMS normalization kernel. Please check your model configuration or sharding policy, you can set up an issue for us to help you as well. I have installed apex, but the error still occurs. How can I solve it? #5056

Open chensimian opened 10 months ago

chensimian commented 10 months ago

🐛 Describe the bug

raise RuntimeError( RuntimeError: Failed to replace input_layernorm of type LlamaRMSNorm with FusedRMSNorm with the exception: Please install apex from source (https://github.com/NVIDIA/apex) to use the fused RMS normalization kernel. Please check your model configuration or sharding policy, you can set up an issue for us to help you as well.

Environment

    plugin = HybridParallelPlugin(
        tp_size=8, 
        pp_size=1,
        num_microbatches=None,
        microbatch_size=1,
        enable_fused_normalization=True, #
        enable_jit_fused=True,
        enable_flash_attention=True,
        check_reduction=True,
        gradient_as_bucket_view=True,
        find_unused_parameters=True,
        zero_stage=0,
        precision="bf16",  # fp32
        initial_scale=1,
    )

flybird11111 commented 10 months ago

Hi, please install apex from https://github.com/NVIDIA/apex, or set enable_fused_normalization to False.
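For anyone who wants the second option without hard-coding the flag, here is a minimal sketch that decides enable_fused_normalization at runtime. It assumes the compiled kernel is exposed as a top-level module named fused_layer_norm_cuda (the name that appears in the error reported later in this thread); fused_norm_available and plugin_kwargs are hypothetical helper names, not part of ColossalAI or apex.

```python
import importlib.util


def fused_norm_available() -> bool:
    """Return True if both apex and its compiled layer-norm extension can be located."""
    # find_spec returns None for a missing top-level module instead of raising.
    has_apex = importlib.util.find_spec("apex") is not None
    # Assumption: the extension module name is taken from the
    # "No module named 'fused_layer_norm_cuda'" error in this thread.
    has_kernel = importlib.util.find_spec("fused_layer_norm_cuda") is not None
    return has_apex and has_kernel


def plugin_kwargs() -> dict:
    """Hypothetical helper: settings from the report, with the fused-norm flag decided at runtime."""
    return dict(
        tp_size=8,
        pp_size=1,
        microbatch_size=1,
        enable_fused_normalization=fused_norm_available(),
        enable_jit_fused=True,
        enable_flash_attention=True,
        zero_stage=0,
        precision="bf16",
    )


# Usage (requires colossalai to be installed):
# from colossalai.booster.plugin import HybridParallelPlugin
# plugin = HybridParallelPlugin(**plugin_kwargs())
```

This only checks that the modules can be found, not that the kernel actually loads, so treat a True result as a hint rather than a guarantee.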

chensimian commented 10 months ago

I have installed it, but it still does not work.

flybird11111 commented 10 months ago

The installed version of apex may be incorrect. Can you try running "from apex.normalization import FusedRMSNorm"?

yeegnauh commented 10 months ago

Me too!

RuntimeError: Failed to replace input_layernorm of type LlamaRMSNorm with FusedRMSNorm with the exception: No module named 'fused_layer_norm_cuda'. Please check your model configuration or sharding policy, you can set up an issue for us to help you as well.

yeegnauh commented 10 months ago

I also saw this message in examples/language/llama2/scripts/benchmark_70B/3d.sh:

# TODO: fix this
echo "3D parallel for LLaMA-2 is not ready yet"

Does this mean that even if I install apex correctly, I still won't be able to use hybrid_parallel properly?

flybird11111 commented 10 months ago

Hybrid parallelism works normally now. Could you start Python and run "from apex.normalization import FusedRMSNorm" to see whether it succeeds?

yeegnauh commented 10 months ago

Yes, python -c "from apex.normalization import FusedRMSNorm" runs successfully.
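The import succeeding while the replacement still fails is consistent with an apex install that has the Python package but not the compiled CUDA extension (for example, a build without the CUDA extension options). That is an assumption based on the "No module named 'fused_layer_norm_cuda'" message above, not something confirmed in this thread. A small sketch that checks the two separately (try_import and diagnose are hypothetical helper names):

```python
import importlib


def try_import(module_name: str):
    """Try to import a module; return (ok, message) instead of raising."""
    try:
        importlib.import_module(module_name)
        return True, "ok"
    except Exception as exc:  # ImportError, or a loader error from a broken build
        return False, f"{type(exc).__name__}: {exc}"


def diagnose():
    """Check the Python package and the compiled extension separately."""
    # torch goes first: compiled extensions usually need its shared libraries loaded.
    # Assumption: "fused_layer_norm_cuda" is the extension name from the error above.
    names = ["torch", "apex", "apex.normalization", "fused_layer_norm_cuda"]
    return {name: try_import(name) for name in names}


if __name__ == "__main__":
    for name, (ok, message) in diagnose().items():
        print(f"{name:24s} {'OK' if ok else 'FAILED'}  {message}")
```

If the first three entries report OK and only the last one fails, the apex Python package is present but the kernel was not built or cannot be loaded in this environment.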

flybird11111 commented 10 months ago

Please try this: https://blog.csdn.net/iteapoy/article/details/117389407

flybird11111 commented 10 months ago

Can you share your pip list and your CUDA version?
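If the full pip list is long, a short report of the relevant pieces may be enough. The sketch below is one way to collect it. The package list is an assumption about what matters here, and package_versions and cuda_info are hypothetical helper names. Note that torch.version.cuda is the CUDA version PyTorch was built with, which can differ from the system toolkit reported by nvcc --version.

```python
import platform
from importlib import metadata


def package_versions(names=("torch", "apex", "colossalai", "transformers", "flash-attn")):
    """Map each distribution name to its installed version, or None if not installed."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def cuda_info():
    """Report the CUDA version PyTorch was built with, if torch is importable."""
    try:
        import torch
    except ImportError:
        return {"torch_cuda": None, "cuda_available": None}
    return {
        "torch_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print("python", platform.python_version())
    for name, version in package_versions().items():
        print(name, version or "not installed")
    for key, value in cuda_info().items():
        print(key, value)
```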