Open LzhinFdu opened 5 months ago
Well, after making the input sequence length divisible by world_size * block_size, it can run normally.
What is block_size?
the block_size for flash-attn
I'm sorry, I don't understand. I didn't find any block_size parameter in this repo. Could you please tell me where it is?
Seems to be here.
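For anyone hitting the same shape error, the workaround from the first comment can be sketched as follows. This is a minimal sketch assuming a right-padding scheme; `block_size=256` and `pad_token_id=0` are illustrative guesses, not values read from this repo.

```python
# Sketch of the workaround described above: pad input_ids so the sequence
# length is divisible by world_size * block_size.
# block_size=256 and pad_token_id=0 are illustrative assumptions.

def pad_to_multiple(input_ids, world_size, block_size=256, pad_token_id=0):
    multiple = world_size * block_size
    remainder = len(input_ids) % multiple
    if remainder == 0:
        return input_ids
    # Append pad tokens until the length hits the next multiple.
    return input_ids + [pad_token_id] * (multiple - remainder)

padded = pad_to_multiple(list(range(1000)), world_size=2, block_size=256)
print(len(padded))  # 1024
```

Remember to mask out the padded positions in the loss so they don't affect training.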
Can you provide a script to run dist_flash_attn? I tried setting parallel_mode to dist_flash_attn, but it didn't run successfully.
When trying to use 'dist_flash_attn' with 2×A100, process 0 gets stuck in the torch.cuda.synchronize() call in _lightseq_forward of one decoder layer, while process 1 has already reached the same step of the next decoder layer. Strangely, the model only gets stuck on the second sample. What might be causing this bug, and is there any way to solve it?
https://github.com/jzhang38/EasyContext/blob/41324ec4213ad1683de7d174ad502ac1b0e51a0a/easy_context/dist_flash_attn/lightseq_async_attn.py#L291
It seems that the communication of process 0 in maybe_send_recv_fwd_qkvo never completes.
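To confirm which call each rank is actually blocked in, one generic option is to have every process dump its Python stack periodically. This is a debugging sketch using the standard-library faulthandler module, not anything provided by dist_flash_attn; the 30-second interval is arbitrary.

```python
import faulthandler
import sys

# Dump this process's Python stack to stderr every 30 s. When a rank hangs,
# the repeated tracebacks show exactly which line (e.g. inside
# maybe_send_recv_fwd_qkvo or torch.cuda.synchronize()) it is blocked on.
faulthandler.dump_traceback_later(30, repeat=True, file=sys.stderr)

# ... run the training loop here ...

# Cancel the watchdog once training finishes cleanly.
faulthandler.cancel_dump_traceback_later()
```

Comparing the dumps from both ranks should reveal whether process 0 is waiting on a send/recv that process 1 never posts, which would point to mismatched communication schedules between the two ranks.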