mustafaaljadery / gemma-2B-10M

Gemma 2B with 10M context length using Infini-attention.

`generate()` in main.py seems to process only the last 2048 tokens of the input prompt? #8

Open MrYxJ opened 3 months ago

MrYxJ commented 3 months ago


https://github.com/mustafaaljadery/gemma-2B-10M/blob/cb97c2f686a41d4d54c259437dcdcd4f7f8da5f0/src/main.py#L15C9-L15C54

If a prompt longer than 2048 tokens is entered, `generate()` seems to truncate the input so that only the last 2048 tokens are kept, which seems wrong. Did I misunderstand?
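For clarity, here is a minimal sketch of the suspected behavior. The helper name `truncate_to_last` and the token values are hypothetical, purely for illustration; the point is that slicing with `[-2048:]` silently discards everything before the last 2048 tokens:

```python
# Hypothetical illustration of the suspected truncation:
# slicing with [-max_len:] keeps only the tail of the token sequence,
# so any earlier context is silently dropped before generation.
def truncate_to_last(tokens, max_len=2048):
    return tokens[-max_len:]

tokens = list(range(5000))   # stand-in for a 5000-token prompt
kept = truncate_to_last(tokens)
print(len(kept))   # 2048
print(kept[0])     # 2952 -- the first 2952 tokens are gone
```

If this matches what `generate()` does, a model advertised with a 10M-token context would never see more than the final 2048 tokens of the prompt.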