project-baize / baize-chatbot

Let ChatGPT teach your own chatbot in hours with a single GPU!
https://arxiv.org/abs/2304.01196
GNU General Public License v3.0

Weird Reported Memory Usage #26

Closed Lyken17 closed 1 year ago

Lyken17 commented 1 year ago

I noticed in the current report:

| Model | Training (with int8) |
| --- | --- |
| Baize-7B | 26GB |
| Baize-13B | 25GB |
| Baize-30B | 42GB |

The 13B model actually consumes less memory than the 7B. Is it a typo?

thekevshow commented 1 year ago

I also had a question: is this really 1 GB shy of being able to run on a 4090? Or is this just the memory that happened to be used during training, so it wouldn't actually prevent a device with 24 GB of VRAM from running it?

JetRunner commented 1 year ago

> Baize-13B 25GB
>
> The 13B model actually consumes less memory than the 7B. Is it a typo?

It's not! As we explain in the README, the reported GPU memory usage is based on the default settings, where 13B uses half of 7B's batch size.

JetRunner commented 1 year ago

> I also had a question: is this really 1 GB shy of being able to run on a 4090? Or is this just the memory that happened to be used during training, so it wouldn't actually prevent a device with 24 GB of VRAM from running it?

No, you can definitely get it running on a 4090. Just change `$BATCH_SIZE` in `python finetune.py 7b $BATCH_SIZE 0.0002 alpaca,stackoverflow,quora` to a smaller value and you're good to go!
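For example, a sketch of the adjusted invocation (the batch size of 16 here is an assumption, not the project's default; tune it for your GPU):

```shell
# Hypothetical example: shrink the batch size so 7B int8 training fits
# in a 24 GB RTX 4090. 16 is a guess; lower it further if you still hit
# out-of-memory errors.
python finetune.py 7b 16 0.0002 alpaca,stackoverflow,quora
```

A smaller batch size lowers peak activation memory at the cost of training throughput; the model weights themselves take the same space either way.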