jzhang38 / EasyContext

Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.
Apache License 2.0
529 stars, 33 forks

Requirements for input length #10

Open LzhinFdu opened 2 months ago

LzhinFdu commented 2 months ago

Thanks for your great work! I noticed that the length of input_ids needs to be divisible by the world size; otherwise the forward pass hangs. What's the reason for this?

jzhang38 commented 2 months ago

Some form of padding would help, like what I did in eval_needle.py: https://github.com/jzhang38/EasyContext/blob/8cde5f6af1c3e8204cc9f53f8309f31ee51eb438/eval_needle.py#L42
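
For readers following along, here is a minimal sketch of this kind of padding. It is not the code at the linked line of eval_needle.py; the helper name `pad_to_multiple` and the default `pad_token_id=0` are hypothetical, and the sketch assumes right-padding a plain list of token ids until its length is divisible by the world size.

```python
def pad_to_multiple(input_ids, multiple, pad_token_id=0):
    """Right-pad a list of token ids so its length is divisible by `multiple`.

    `multiple` stands in for the world size; `pad_token_id=0` is a placeholder
    (assumption) and should be replaced by the tokenizer's real pad id.
    Returns the padded list and the number of pad tokens that were added.
    """
    if multiple <= 0:
        raise ValueError("multiple must be positive")
    remainder = len(input_ids) % multiple
    # When the length is already divisible, this evaluates to 0 (no padding).
    pad_len = (multiple - remainder) % multiple
    padded = list(input_ids) + [pad_token_id] * pad_len
    return padded, pad_len


# Example: 10 tokens with a hypothetical world size of 4 are padded to 12.
ids = list(range(1, 11))
padded, pad_len = pad_to_multiple(ids, multiple=4)
print(len(padded), pad_len)  # prints: 12 2
```

Returning `pad_len` lets the caller drop or mask the padded positions afterwards; whether and how that is needed depends on the evaluation script.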

LzhinFdu commented 2 months ago

Thanks. I'm curious about the reason. This feels related to the implementation of ring attention.