Leo-T-Zang opened this issue 4 days ago
Hi,
Thanks for this amazing codebase!
I wonder whether, during pretraining, this codebase supports block-causal attention (i.e., a block-diagonal mask) so that attention does not cross the boundaries of packed samples, as LLaMA-3 does. If so, could you kindly point me to where it is implemented?
Thanks a lot!
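For context, this is a minimal sketch of the kind of mask I mean (my own illustration, not code from this repo): causal within each packed sample, with no attention across sample boundaries.

```python
import torch

def block_causal_mask(seq_lens, device="cpu"):
    """Boolean mask (True = may attend) that is causal (lower-triangular)
    within each packed sample and blocks attention across samples."""
    total = sum(seq_lens)
    mask = torch.zeros(total, total, dtype=torch.bool, device=device)
    start = 0
    for n in seq_lens:
        # lower-triangular block for this sample only; everything
        # outside the diagonal blocks stays False (masked out)
        mask[start:start + n, start:start + n] = torch.tril(
            torch.ones(n, n, dtype=torch.bool, device=device)
        )
        start += n
    return mask

# two packed samples of lengths 2 and 3 -> a 5x5 block-diagonal causal mask
m = block_causal_mask([2, 3])
```

Such a mask could then be passed (as an additive or boolean attention mask) into the attention computation, which is what I'm hoping the codebase already supports natively.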