DaddyCodesAlot opened this issue 3 weeks ago
Same issue here. I also tried unsloth[cu118+torch230] @ git+https://github.com/unslothai/unsloth.git@1e7e0e23683c5ec1c1e3a5df0f586d4c433fee44 and got the same error.
The error is "normal", since Unsloth needs to modify trl or transformers code on the fly, which means inspect.getsource will error after modification.
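To illustrate, here is a minimal standalone sketch (not the actual Unsloth patching code) of why getsource breaks once a function has been recompiled at runtime:

import inspect

def training_step():
    return "original"

# Rough idea of on-the-fly patching: edit the source text and recompile it.
patched_src = inspect.getsource(training_step).replace('"original"', '"patched"')
namespace = {}
exec(compile(patched_src, "<patched>", "exec"), namespace)
training_step = namespace["training_step"]

print(training_step())               # "patched" - the new behaviour works
try:
    inspect.getsource(training_step) # but the recompiled code has no source file
except OSError as exc:
    print("inspect.getsource failed:", exc)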
As for your problem, I can't reproduce it (although I slightly modified the code, it shouldn't really make a difference):
from unsloth import FastLanguageModel
import gc
import torch

model_name = "unsloth/Llama-3.2-1B-bnb-4bit"
llm_model = None
tokenizer = None
max_seq_length = 2048
dtype = None
load_in_4bit = True

def loadModel(model_name):
    global EOS_TOKEN
    global llm_model, tokenizer
    print(f'Load model {model_name}')
    llm_model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = f"{model_name}",
        max_seq_length = max_seq_length,
        dtype = dtype,
        load_in_4bit = load_in_4bit,
        # token = "TOKEN HERE", # use one if using gated models like meta-llama/Llama-2-7b-hf
    )
    EOS_TOKEN = tokenizer.eos_token  # Must add EOS_TOKEN

def unloadModel():
    global llm_model, tokenizer
    # Delete the model and tokenizer
    try:
        llm_model.disable_adapter_layers()
    except:
        pass
    del llm_model
    del tokenizer
    # Run garbage collection
    for _ in range(5):
        gc.collect()
    # Optionally, clear the CUDA cache if using GPU
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
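For reference, the load / unload / reload cycle I'm assuming looks roughly like this (purely illustrative, reusing the same model_name):

loadModel(model_name)
# ... fine-tune or run inference with llm_model and tokenizer ...
unloadModel()
loadModel(model_name)   # load again (or a different model) in the same process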
Is there any other step that you did?
Oh wait, I can bypass the double patching by checking the function name - I can fix this!
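Roughly this idea - a sketch only, with a hypothetical unsloth_ name prefix as the marker, not the actual Unsloth patching code:

import inspect, textwrap

def patch_once(cls, name):
    fn = getattr(cls, name)
    # If this method was already replaced by a recompiled version, skip it -
    # calling inspect.getsource on it again would raise OSError.
    if fn.__name__.startswith("unsloth_"):          # hypothetical marker prefix
        return
    source = textwrap.dedent(inspect.getsource(fn))
    # ... a real patch would edit `source` here before recompiling ...
    namespace = {}
    exec(compile(source, "<patched>", "exec"), fn.__globals__, namespace)
    patched = namespace[name]
    patched.__name__ = f"unsloth_{name}"            # mark it as already patched
    setattr(cls, name, patched)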
Running the code above gives me an error on a RunPod instance, but not on a Google Colab instance.
Hi there, I wrote two methods that allow Unsloth models to be loaded into and unloaded from memory. To my knowledge, this is the only way to switch Unsloth models.
However, an update to Unsloth has caused errors when using this method.
Other than that, reverting to this commit solved the bug for me:
pip install "unsloth[cu121] @ git+https://github.com/unslothai/unsloth.git@1e7e0e23683c5ec1c1e3a5df0f586d4c433fee44"
I'm unclear on what is causing this bug, tbh, but this line of code seems a bit finicky:
function = inspect.getsource(Trainer.training_step)
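That line lives inside Unsloth's patching code, so this is only a sketch of what a guard around it might look like (a guess at a possible fix, not something that's actually in Unsloth):

import inspect
from transformers import Trainer

try:
    function = inspect.getsource(Trainer.training_step)
except OSError:
    # training_step was already swapped for recompiled code, so its source is
    # no longer retrievable; skip re-patching instead of crashing.
    function = None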