vriez opened this issue 1 year ago
How about using int8 quantization or parallelformers library?
(Just a suggestion.. Note that I'm not a maintainer of this repo)
Have you tried making things bf16??
> How about using int8 quantization or parallelformers library?
> (Just a suggestion.. Note that I'm not a maintainer of this repo)
This approach yields:
Traceback (most recent call last):
File "demo.py", line 9, in <module>
model = GPT2LMHeadModel.from_pretrained("stanford-crfm/pubmedgpt").to(torch.int8).to(device)
File "/home/vitor/Projects/pubmedgpt/venv_1/lib/python3.8/site-packages/transformers/modeling_utils.py", line 1682, in to
return super().to(*args, **kwargs)
File "/home/vitor/Projects/pubmedgpt/venv_1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 912, in to
raise TypeError('nn.Module.to only accepts floating point or complex '
TypeError: nn.Module.to only accepts floating point or complex dtypes, but got desired dtype=torch.int8
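That TypeError is expected: `nn.Module.to` only casts parameters to floating point or complex dtypes, because int8 inference needs an actual quantization scheme (per-tensor scales and zero points), not a plain dtype cast. In recent versions of transformers, int8 loading goes through `from_pretrained(..., load_in_8bit=True)` (which requires the bitsandbytes package and a CUDA GPU) instead. A minimal sketch of the dtype rule itself, on a toy module rather than the real model:

```python
import torch
import torch.nn as nn

# Any small module reproduces the behavior from the traceback above.
model = nn.Linear(4, 4)

def try_cast(dtype):
    """Attempt a dtype cast and report whether nn.Module.to accepts it."""
    try:
        model.to(dtype)
        return "ok"
    except TypeError:
        return "TypeError"

fp16_result = try_cast(torch.float16)  # floating dtype: accepted
int8_result = try_cast(torch.int8)     # integer dtype: rejected, as in the traceback
```

So `.to(torch.float16)` (or `.half()`) is a valid cast, while `.to(torch.int8)` can never work regardless of the model.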
> Have you tried making things bf16??
While `bfloat16` yields:
Photosynthesis is \[*M*~0′~ = *I*~0′~ × *N*~0′~ × 0.5 × 255\] the light absorbed, \[*M
`float16` yields:
Photosynthesis is \~520,000-fold more efficient in C~4~ plants than in C~3~ plants because CO~2~ is first incorporated into a C~4~ acid (malate or aspartate) by phospho
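A plausible reason the two 16-bit formats generate differently: `bfloat16` trades mantissa bits (7) for float32's full exponent range, while `float16` keeps 10 mantissa bits with a narrower range, so `bfloat16` rounds values much more coarsely. A quick sketch of the precision gap, independent of the model:

```python
import torch

# 1 + 2**-10 is the smallest value above 1.0 that float16 can represent.
x = torch.tensor(1.0 + 2.0 ** -10, dtype=torch.float64)

fp16_val = x.to(torch.float16).item()   # 10 mantissa bits keep the offset
bf16_val = x.to(torch.bfloat16).item()  # 7 mantissa bits round it away to 1.0
```

Weights converted to `bfloat16` therefore lose more precision than the same weights in `float16`, which may explain why the `float16` run reads less like gibberish here.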
Apparently, `float16` outputs less gibberish. Is this behavior related to the warning message?
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:28895 for open-end generation.
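The warning means `generate` received no `attention_mask`, so padded positions (if any) are attended to like real tokens, which can degrade output. In practice the tokenizer builds the mask for you (`tokenizer(prompts, padding=True, return_tensors="pt")` returns both `input_ids` and `attention_mask`); a minimal sketch of what that mask is, using made-up token ids and the `eos_token_id` 28895 from the warning as the pad id:

```python
import torch

# Hypothetical token ids for two prompts of different lengths;
# real ids would come from the tokenizer.
pad_id = 28895  # eos_token_id reused as pad_token_id, as the warning suggests
seqs = [[11, 22, 33, 44], [55, 66]]

# Right-pad every sequence to the batch maximum length.
max_len = max(len(s) for s in seqs)
input_ids = torch.tensor([s + [pad_id] * (max_len - len(s)) for s in seqs])

# 1 for real tokens, 0 for padding. Passing this as
# model.generate(input_ids, attention_mask=attention_mask, pad_token_id=pad_id)
# tells the model to ignore the padded positions.
attention_mask = (input_ids != pad_id).long()
```

For a single unpadded prompt the mask is all ones and the warning is mostly harmless, so it is unlikely to be the cause of the gibberish; the dtype difference above is the more likely culprit.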
Because the model is too big for my machine, I get:

The first workaround that comes to mind is to use half precision.

It runs, but the output is:

which looks odd. What have I done wrong? How can I fix it?

My setting is: