Open ivanl-cerebras opened 2 months ago
Hi, conceptually they are the same. I only changed the name for the submission to reduce the risk of breaking anonymity. My newer Activation Beacon code supports more features, such as FlashAttention-2. This repo is for reproducing the experiments in the UltraGist paper.
Hi, thanks for your great work!
After looking at the code from your other work, Activation Beacon (which was also recently updated), I was wondering if you could summarize the differences between this method and Activation Beacon, and how they compare (if they can be compared at all)?
Thanks!