Open · congson1293 opened this issue 6 months ago
What device is model_inputs on?
I use T4 on Google Colab.
Double-check your code against the gold-standard example at: https://github.com/marella/ctransformers?tab=readme-ov-file#classmethod-automodelforcausallmfrom_pretrained
Do the same for the `generate` method of your LLM class; the same README documents that as well: https://github.com/marella/ctransformers?tab=readme-ov-file#classmethod-automodelforcausallmfrom_pretrained
It looks like you are passing a lot of unnecessary arguments copied from other libraries, especially in the `from_pretrained` call.
Hope that this helps.
I load the model onto the GPU like this:
and call `generate` like this:
When I ran this code, the model took 6.6 GB of GPU memory, but calling `generate` raised this exception:

> You are calling `.generate()` with the `input_ids` being on a device type different than your model's device. `input_ids` is on cuda, whereas the model is on cpu.

Does anyone know how to fix it?
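The exception means the input tensors live on a different device than the model's weights. A minimal sketch of the usual fix, using a tiny `nn.Linear` as a stand-in for the real LLM (an assumption; the same pattern applies to a transformers model), is to ask the model where its weights live and move the inputs there before calling it:

```python
import torch
from torch import nn

# Stand-in for the real LLM; with a transformers model you would
# use model.device instead of inspecting the parameters.
model = nn.Linear(4, 2)

# Put the model on the GPU when one is available, otherwise stay on CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Query the model itself for its device, then move the inputs to match.
model_device = next(model.parameters()).device
inputs = torch.randn(1, 4).to(model_device)

# No device-mismatch error: inputs and weights now share a device.
output = model(inputs)
print(output.device == model_device)  # True
```

With a transformers model the same idea reads `model_inputs = tokenizer(prompt, return_tensors="pt").to(model.device)`. Conversely, if you intended the model itself to be on the GPU, make sure the load call actually placed it there (e.g. `model.to("cuda")` after loading): a model left on CPU while `input_ids` are on CUDA produces exactly the exception above.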