I have added the ability to train on a smaller card like the 4090. I added instructions to the README file.
Also, the trainer failed to run because the epoch argument was being passed through as a string. I changed it to an int and everything works now. Here's the original error:
Traceback (most recent call last):
  File "/home/jupyter-rwl4/mamba-chat/train_mamba.py", line 56, in <module>
    run(args)
  File "/home/jupyter-rwl4/mamba-chat/train_mamba.py", line 43, in run
    trainer.train()
  File "/home/jupyter-rwl4/.local/lib/python3.10/site-packages/transformers/trainer.py", line 1555, in train
    return inner_training_loop(
  File "/home/jupyter-rwl4/.local/lib/python3.10/site-packages/transformers/trainer.py", line 1597, in _inner_training_loop
    max_steps = math.ceil(args.num_train_epochs * num_update_steps_per_epoch)
TypeError: must be real number, not str
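A minimal sketch of this kind of fix, assuming the epoch count is read with argparse. The flag name `--num_epochs` and the steps-per-epoch value are placeholders for illustration and may not match the script exactly. Without `type=int`, argparse hands back a string, and multiplying a string by an integer produces another string, which `math.ceil` then rejects.

```python
import argparse
import math


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Hypothetical flag name. Without type=int, argparse would return "3" (a str).
    parser.add_argument("--num_epochs", type=int, default=1)
    return parser


# Parse an explicit argument list so the example runs without a command line.
args = build_parser().parse_args(["--num_epochs", "3"])

# Placeholder value standing in for the trainer's num_update_steps_per_epoch.
steps_per_epoch = 10

# Same shape of computation as the failing line in the traceback.
max_steps = math.ceil(args.num_epochs * steps_per_epoch)
print(type(args.num_epochs).__name__, max_steps)
```

With the argument left as a string, `"3" * 10` would be a ten-character string rather than a number, which is why the error only shows up inside `math.ceil`.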