Thanks for your issue!
Quick question:
# mode.to(device, dtype=torch.bfloat16)
mode.to(device)
Assuming "mode" to be "model" here. Can you try with mode.to(device, dtype=torch.bfloat16)
, i.e. put the model in bfloat16 precision on the GPU?
This is expected, since the 3070 only has 8 GB of memory, but for the 2.7B-parameter replit model the default float32 weights need approximately 10.8 GB (2.7B × 4 bytes). Using bf16 may work, since then the required memory is approximately 5.4 GB (2.7B × 2 bytes).
P.S. In a nutshell, to load a model on a GPU, each billion parameters costs about 4 GB in float32 precision, 2 GB in (b)float16, and 1 GB in int8. See also: https://huggingface.co/blog/trl-peft
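To make that concrete, here's a minimal sketch of both options (assuming the standard transformers arguments; load_in_8bit also requires the bitsandbytes and accelerate packages to be installed):

from transformers import AutoModelForCausalLM
import torch

model_id = "replit/replit-code-v1-3b"

# Option A: bfloat16, roughly 2 GB per billion parameters, so ~5.4 GB for 2.7B
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to("cuda")

# Option B (instead of A): int8, roughly 1 GB per billion parameters, so ~2.7 GB
# model = AutoModelForCausalLM.from_pretrained(
#     model_id,
#     load_in_8bit=True,
#     device_map="auto",  # bitsandbytes handles device placement, no .to("cuda")
#     trust_remote_code=True,
# )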
Thanks, I updated the script in my earlier comment to include int8, and I'm still getting the same error, which is weird.
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 100.00 MiB (GPU 0; 8.00 GiB total capacity; 7.30 GiB already allocated; 0 bytes free; 7.30 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
There's an earlier warning too:
UserWarning: Using attn_impl: torch. If your model does not use alibi or prefix_lm we recommend using attn_impl: flash, otherwise we recommend using attn_impl: triton.
  warnings.warn(
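Side note on that warning: for this MPT-style remote code the attention implementation is picked through the model config, so a sketch like the following should switch it (attn_config is the field name used by the replit remote code; the triton and flash backends each need their own extra dependencies installed):

from transformers import AutoConfig, AutoModelForCausalLM
import torch

config = AutoConfig.from_pretrained("replit/replit-code-v1-3b", trust_remote_code=True)
config.attn_config["attn_impl"] = "triton"  # or "flash"; "torch" is the default

model = AutoModelForCausalLM.from_pretrained(
    "replit/replit-code-v1-3b",
    config=config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)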
I discovered the issue. I was playing with this model https://huggingface.co/4bit/Replit-v1-CodeInstruct-3B and it worked. So I gave the original another try with the same configuration as the other one, and it turns out torch_dtype=torch.bfloat16 was missing from AutoModelForCausalLM.from_pretrained. Now it works on the same machine. :)
The complete script:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
data = "replit/replit-code-v1-3b"
device = "cuda"
tokenizer = AutoTokenizer.from_pretrained(data, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    data,
    torch_dtype=torch.bfloat16,  # load the weights in bf16 (~5.4 GB instead of ~10.8 GB)
    trust_remote_code=True,
    init_device=device,
)
model.to(device)

def codegenerator(s):
    # tokenize the prompt and move it to the GPU
    x = tokenizer.encode(s, return_tensors="pt")
    x = x.to(device)
    y = model.generate(
        x,
        do_sample=True,
        use_cache=True,
        max_new_tokens=768,
        temperature=0.2,
        top_p=0.9,
        num_return_sequences=1,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.eos_token_id,
    )
    # decode only the newly generated tokens;
    # clean_up_tokenization_spaces=False to ensure syntactical correctness
    return tokenizer.decode(
        y[0][x.shape[-1]:],
        skip_special_tokens=True,
        clean_up_tokenization_spaces=False,
    )
print(codegenerator("def fibonacci(n): "))
print(codegenerator(" function reverseString(s) "))
Glad you found out how to fix your issue. Closing for now!
Hey! So, to use CUDA,
I had to go here: https://developer.nvidia.com/cuda-downloads
then uninstall torch:
pip uninstall torch
then install torch with CUDA support from here: https://pytorch.org/get-started/locally/
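(A quick way to sanity-check that the CUDA-enabled build is the one that actually got installed; these are just standard torch calls, nothing model-specific:)

import torch
print(torch.__version__)              # should end in something like "+cu118", not "+cpu"
print(torch.cuda.is_available())      # should be True
print(torch.cuda.get_device_name(0))  # should show the RTX 3070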
but now I am getting:
I couldn't figure out how to fix that error. Any clues? I'm on a Windows 10 laptop with a 3070.
I'm also not sure if the configuration is still correct when I run it with CUDA, since I have to change the device. I'm using the following code as a test.
The config seems to be the default from config.json.
Thanks!