jzhang38 / EasyContext

Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.
Apache License 2.0

attention_mask #23

Open Nianqitongs opened 2 months ago

Nianqitongs commented 2 months ago

Hello, would it be possible to add attention_mask support to prepare_seq_parallel_inputs? I noticed there is an assertion in the monkey_patch.py file that restricts attention_mask to None. [screenshot of the assertion]
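For context, here is a rough sketch of how I understand the constraint: the inputs are sharded along the sequence dimension across ranks, so a per-token padding mask would have to be sharded consistently as well, and the simplest way to satisfy the assertion today seems to be packing batches with no padding at all. The import path, the `prepare_seq_parallel_inputs` signature, and the `shard_without_mask` helper below are assumptions based on my reading of the repo's train.py, not a confirmed API:

```python
import torch
from easy_context import prepare_seq_parallel_inputs  # assumed import path

def shard_without_mask(input_ids, position_ids, target_ids, rank, world_size, device):
    """Shard already-packed (unpadded) inputs across sequence-parallel ranks.

    Hypothetical helper: because every token is real (no padding), no
    attention_mask is needed, so the `attention_mask is None` assertion
    in the monkey patch is never violated.
    """
    # zigzag ring attention splits the sequence into 2 * world_size chunks,
    # so the packed length must be divisible by that.
    assert input_ids.shape[-1] % (2 * world_size) == 0
    return prepare_seq_parallel_inputs(
        "zigzag_ring_attn", input_ids, position_ids, target_ids,
        rank, world_size, device,
    )

# Illustrative single-process example (shapes only, not a training setup):
ids = torch.arange(8192).unsqueeze(0)      # one packed sequence, no padding
pos = torch.arange(8192).unsqueeze(0)      # position ids for the packed sequence
tgt = torch.roll(ids, shifts=-1, dims=-1)  # placeholder next-token targets
batch = shard_without_mask(ids, pos, tgt, rank=0, world_size=1, device="cpu")
```

If this reading is right, supporting a real attention_mask would mean sharding the mask alongside input_ids in prepare_seq_parallel_inputs and relaxing the assertion in monkey_patch.py, which is why I'm asking whether that is planned.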