mustafaaljadery / gemma-2B-10M

Gemma 2B with 10M context length using Infini-attention.

implementation for pytorch gemma InfiniTransformer is copied without attribution #11

Open wompwompsquared opened 3 months ago

wompwompsquared commented 3 months ago

The model code provided in this repository appears to be copied from the repository linked below:

https://github.com/Beomi/InfiniTransformer/blob/main/infini_gemma/modeling_infini_gemma.py

Specifically, the GemmaInfiniAttention and GemmaModel classes appear to be a direct copy, with the comments removed, gradient checkpointing removed, and certain sections of the code slightly altered (especially in the aforementioned classes).

Apart from minor rewrites (e.g., flipping conditional statements, shuffling where variables are defined, collapsing multi-line statements into single long lines), the only substantive differences seem to be that the rotary embeddings were dropped from the GemmaInfiniAttention class and that instances of the segment variables were replaced with hidden_states. All of the variable names are identical.
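For context on what was dropped, a rotary-embedding application inside a Gemma-style attention forward pass typically looks roughly like the sketch below. This is an illustrative, self-contained approximation of the standard RoPE formulation used in Hugging Face Gemma implementations, not code copied from either repository; the function names and shapes here are assumptions.

```python
import torch

def rotate_half(x):
    # Split the last dimension in half and swap the halves with a sign flip,
    # the standard trick used to apply rotary position embeddings.
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rotary_pos_emb(q, k, cos, sin):
    # Rotate the query and key projections by the precomputed cos/sin tables
    # before attention scores are computed. Omitting this step changes the
    # model's positional behavior, which is the difference noted above.
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed
```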

What do the authors of this repository have to say in response? There doesn't seem to be anything new, and there is no mention of the original authors of the unofficial implementation. Not a very good look considering the recent llama3v incident...