Open LzhinFdu opened 2 months ago
Some form of padding would help, like what I did in eval_needle.py: https://github.com/jzhang38/EasyContext/blob/8cde5f6af1c3e8204cc9f53f8309f31ee51eb438/eval_needle.py#L42
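For anyone landing here, a minimal sketch of the padding idea. This is not the code from eval_needle.py; `pad_to_multiple` is a hypothetical helper, and `pad_token_id=0` is an assumption that should be replaced with your tokenizer's actual pad id. It uses plain Python lists to keep the arithmetic visible.

```python
def pad_to_multiple(input_ids, world_size, pad_token_id=0):
    """Right-pad a list of token ids so its length is divisible by world_size.

    Hypothetical helper for illustration; pad_token_id=0 is an assumption.
    Returns the padded list and the number of pad tokens that were added.
    """
    remainder = len(input_ids) % world_size
    if remainder == 0:
        # Already divisible, nothing to add.
        return list(input_ids), 0
    pad_len = world_size - remainder
    return list(input_ids) + [pad_token_id] * pad_len, pad_len


# Example: 10 tokens split across 4 ranks need 2 pad tokens to reach 12.
ids = list(range(10))
padded, pad_len = pad_to_multiple(ids, world_size=4)
print(len(padded), pad_len)
```

With tensors, the same arithmetic should work with `torch.nn.functional.pad`. The padded positions would presumably also need to be excluded from the loss or from any evaluation of the output.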
Thanks. I'm curious about the reason. This seems to be related to the implementation of ring attention.
Thanks for your great work! I noticed that the length of input_ids needs to be divisible by the world size; otherwise the forward pass gets stuck. What's the reason for this?