namespace-Pt / UltraGist

MIT License
15 stars 2 forks

Difference with the activation beacon method #4

Open ivanl-cerebras opened 2 months ago

ivanl-cerebras commented 2 months ago

Hi, thanks for your great work!

After looking at the code for your other work, Activation Beacon (which was also updated recently), I was wondering whether you could summarize how this method differs from Activation Beacon and how the two compare (if they can be compared at all)?

Thanks!

namespace-Pt commented 2 months ago

Hi, conceptually they are the same. I only changed the name for the submission to reduce the risk of breaking anonymity. My newer Activation Beacon code supports more features, such as FlashAttention-2. This repo is for reproducing the experiments in the UltraGist paper.