redotvideo / mamba-chat

Mamba-Chat: A chat LLM based on the state-space model architecture 🐍
Apache License 2.0
911 stars, 69 forks

Memory requirements for training #14

Open pkpro opened 11 months ago

pkpro commented 11 months ago

I was able to run the 2.8B model for inference, and it uses about 6 GB of VRAM. Your README lists a 24 GB requirement for training. Does the model use much more memory during training (32-bit precision?), or is the extra memory needed for the input batches?
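For reference, here is a rough back-of-envelope sketch of where the gap could come from. It is not taken from the repository: the byte counts per parameter (fp16 weights for inference; fp32 weights, fp32 gradients, and two fp32 Adam-style optimizer states for training) are assumptions, the helper `estimate_memory_gib` is hypothetical, and activations, input batches, and framework overhead are ignored entirely.

```python
def estimate_memory_gib(n_params, weight_bytes, grad_bytes=0, optim_bytes=0):
    """Estimate memory for parameter-related tensors only, in GiB.

    Multiplies the parameter count by the assumed bytes stored per parameter
    (weights + gradients + optimizer state). Activations and input batches
    are NOT included, so real usage will be higher.
    """
    total_bytes = n_params * (weight_bytes + grad_bytes + optim_bytes)
    return total_bytes / 1024**3


N_PARAMS = 2.8e9  # nominal parameter count of the 2.8B model

# Assumption: inference holds only half-precision weights (2 bytes each).
inference_gib = estimate_memory_gib(N_PARAMS, weight_bytes=2)

# Assumption: full fp32 training with an Adam-style optimizer, i.e.
# 4 bytes weights + 4 bytes gradients + 8 bytes optimizer state (two moments).
training_gib = estimate_memory_gib(
    N_PARAMS, weight_bytes=4, grad_bytes=4, optim_bytes=8
)

print(f"inference (fp16 weights only): {inference_gib:.1f} GiB")
print(f"training (fp32 + Adam states): {training_gib:.1f} GiB")
```

Under these assumptions the weights-only inference figure lands close to the roughly 6 GB observed, while training stores several times more data per parameter before any batch is loaded. The actual training footprint depends on the precision and optimizer the training script uses, so treat these numbers as an illustration rather than a measurement.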