redotvideo / mamba-chat

Mamba-Chat: A chat LLM based on the state-space model architecture 🐍
Apache License 2.0

Add ability to train on smaller cards like the 24GB 3090 or 4090. Fixed epoch argument. #1

Closed · rwl4 closed this pull request 7 months ago

rwl4 commented 7 months ago

I have added the ability to train on a smaller card like the 4090, with instructions in the README file.
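The diff itself isn't shown here, but fitting training onto a 24 GB card typically comes down to shrinking the per-device batch size and compensating with gradient accumulation. A minimal sketch using Hugging Face `TrainingArguments` (the specific values and flags are illustrative assumptions, not necessarily the exact changes in this PR):

```python
from transformers import TrainingArguments

# Sketch: trade micro-batch size for gradient accumulation so the
# effective batch size stays the same while peak GPU memory drops.
training_args = TrainingArguments(
    output_dir="out",
    num_train_epochs=1,             # int, not str (see the epoch fix below)
    per_device_train_batch_size=1,  # small micro-batch to fit in 24 GB
    gradient_accumulation_steps=8,  # effective batch size = 1 * 8
    learning_rate=5e-5,
    bf16=True,                      # half-precision to further cut memory
    logging_steps=10,
)
```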

Also, the trainer failed to execute because the epoch argument was passed as a string. I updated it to an int and everything works now. Here's the original error:

Traceback (most recent call last):
  File "/home/jupyter-rwl4/mamba-chat/train_mamba.py", line 56, in <module>
    run(args)
  File "/home/jupyter-rwl4/mamba-chat/train_mamba.py", line 43, in run
    trainer.train()
  File "/home/jupyter-rwl4/.local/lib/python3.10/site-packages/transformers/trainer.py", line 1555, in train
    return inner_training_loop(
  File "/home/jupyter-rwl4/.local/lib/python3.10/site-packages/transformers/trainer.py", line 1597, in _inner_training_loop
    max_steps = math.ceil(args.num_train_epochs * num_update_steps_per_epoch)
TypeError: must be real number, not str
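The `Trainer` multiplies `args.num_train_epochs` by an integer, so a string value blows up in `math.ceil`. The fix is a one-line change to the argument parser in `train_mamba.py`; the flag name below is an assumption based on the traceback, not a quote from the diff:

```python
import argparse

parser = argparse.ArgumentParser()
# Before: the epoch count arrived as a string, so
# args.num_train_epochs * num_update_steps_per_epoch raised
# "TypeError: must be real number, not str" inside the Trainer.
# parser.add_argument("--num_epochs", default="1")
# After: parse it as an int so the Trainer can do arithmetic with it.
parser.add_argument("--num_epochs", type=int, default=1)
args = parser.parse_args()
```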
justusmattern27 commented 7 months ago

Thanks, this is great!