A reviewer requested a comparison of the DiVA methodology with Qwen, so I added quick-and-dirty support for their HF checkpoints, reusing as much of the Llama code as possible!
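For context, here's a minimal sketch of the general "reuse Llama" approach: since Qwen2 and Llama share rotary embeddings, RMSNorm, and a SwiGLU MLP, most of a Qwen2 config can be mapped onto a `LlamaConfig` so the existing Llama code paths apply. This is an illustrative sketch, not the actual diff; the helper name is hypothetical, and the main architectural delta it captures is Qwen2's QKV biases.

```python
from transformers import AutoConfig, LlamaConfig


def qwen_to_llama_config(qwen_name: str) -> LlamaConfig:
    """Hypothetical helper: map a Qwen2 HF config onto a LlamaConfig so
    Llama modules can be reused. Does NOT cover sliding-window attention."""
    qcfg = AutoConfig.from_pretrained(qwen_name)
    return LlamaConfig(
        vocab_size=qcfg.vocab_size,
        hidden_size=qcfg.hidden_size,
        intermediate_size=qcfg.intermediate_size,
        num_hidden_layers=qcfg.num_hidden_layers,
        num_attention_heads=qcfg.num_attention_heads,
        num_key_value_heads=qcfg.num_key_value_heads,
        max_position_embeddings=qcfg.max_position_embeddings,
        rms_norm_eps=qcfg.rms_norm_eps,
        rope_theta=qcfg.rope_theta,
        attention_bias=True,  # Qwen2 uses biases on the QKV projections; Llama doesn't
    )
```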
I didn't add support for their sliding-window attention mask since I'm not experimenting with long-context inputs, but I'm happy to close this and add it in a follow-up if that's a blocker for merging.
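For reference, the omitted piece is just a restriction of the causal mask to a local window. A minimal sketch of what that mask would look like (illustrative only, not what the checkpoints ship with):

```python
import torch


def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask where True marks allowed attention: causal (key <= query)
    and within the last `window` positions (query - key < window)."""
    i = torch.arange(seq_len).unsqueeze(1)  # query positions
    j = torch.arange(seq_len).unsqueeze(0)  # key positions
    return (j <= i) & (i - j < window)


# e.g. sliding_window_causal_mask(6, 3) lets token 5 attend to tokens 3-5 only
```

At short sequence lengths (below the window size) this is identical to a plain causal mask, which is why skipping it is harmless for the non-long-context experiments here.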