feifeibear opened 2 months ago
Additionally, I strongly recommend using yunchang's USP, a hybrid parallelism approach that combines ring and Ulysses. It not only achieves higher training TFLOPS but also simplifies your code: a single interface, LongContextAttention, applies Ulysses, ring (we also used zilin's implementation), or a hybrid of both (a brief usage sketch follows the paper link below).
USP: A Unified Sequence Parallelism Approach for Long Context Generative AI https://arxiv.org/abs/2405.07719
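For reference, here is a minimal sketch of wiring LongContextAttention into a training step. The parallel degrees, tensor shapes, and the ring_impl_type choice are illustrative assumptions, so please check them against the yunchang README for the version you install:

```python
# Minimal sketch of dropping LongContextAttention into a training step.
# The degrees, shapes, and keyword choices below are illustrative; check the
# yunchang README for the exact API of the installed version.
import torch
import torch.distributed as dist
from yunchang import LongContextAttention, set_seq_parallel_pg

dist.init_process_group("nccl")
rank, world_size = dist.get_rank(), dist.get_world_size()

# Hybrid USP: split the sequence-parallel group into Ulysses x ring degrees.
sp_ulysses_degree = 2
sp_ring_degree = world_size // sp_ulysses_degree
set_seq_parallel_pg(sp_ulysses_degree, sp_ring_degree, rank, world_size)

# One attention object covers pure Ulysses (ring degree 1), pure ring
# (Ulysses degree 1), or a hybrid of both.
usp_attn = LongContextAttention(ring_impl_type="zigzag")

# Each rank holds its local shard of the sequence:
# (batch, seq_len // world_size, num_heads, head_dim)
batch, local_seq, heads, head_dim = 1, 4096, 32, 128
device = torch.device(f"cuda:{rank % torch.cuda.device_count()}")
q = torch.randn(batch, local_seq, heads, head_dim, dtype=torch.bfloat16, device=device)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Output has the same local shape as q, still sharded over the sequence dim.
out = usp_attn(q, k, v, causal=True)
```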
Hello, thank you for your appreciation of our work. We also noticed USP and tried to use it in our experiments, but we consistently ran into library compatibility issues and switched to plain Ulysses or ring. We would be happy to upgrade to the newer yunchang version and use USP for parallelism. If possible, would you mind submitting a Pull Request? We will validate it further and merge it into our current repo.
I have proposed a PR to EasyContext https://github.com/jzhang38/EasyContext/pull/50
Thank you for contributing to LongRecipe; I noticed the recent release of the technical report. We have been following the EasyContext project for a long time, and both LongRecipe and EasyContext use yunchang to implement Ulysses sequence parallelism.
yunchang recently had a version upgrade to 0.3.0, while I noticed you are still pinned to version 0.1. The new version supports flash_attn >= 2.6.0 and also works on NVIDIA Tesla and Volta GPUs.
I can help you upgrade the yunchang version; a rough compatibility check is sketched below.
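As a starting point, a minimal startup guard for the version bump discussed above might look like the following. The distribution names ("yunchang", "flash-attn"), the version floors, and the use of importlib.metadata and packaging are assumptions based on this thread, not code from either repo:

```python
# Hypothetical startup guard for the yunchang / flash_attn version bump
# discussed above. Distribution names and version floors are assumptions
# taken from this thread, not from LongRecipe or yunchang itself.
from importlib.metadata import version

from packaging.version import Version  # packaging is a common third-party dep


def check_sequence_parallel_deps() -> None:
    """Fail fast if the installed yunchang / flash-attn are too old for USP."""
    yunchang_ver = Version(version("yunchang"))
    flash_ver = Version(version("flash-attn"))
    if yunchang_ver < Version("0.3.0"):
        raise RuntimeError(
            f"yunchang {yunchang_ver} is installed; 0.3.0+ is expected for the USP interface."
        )
    if flash_ver < Version("2.6.0"):
        raise RuntimeError(
            f"flash_attn {flash_ver} is installed; yunchang 0.3.0 targets flash_attn >= 2.6.0."
        )


if __name__ == "__main__":
    check_sequence_parallel_deps()
    print("Sequence-parallel dependencies look compatible.")
```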