Closed: kiamesdavies closed this issue 11 months ago
Hey @kiamesdavies, I see two potential issues in your approach:

1. You're loading the model with AutoModel, which automatically discards the LM head. Given you're using the model for text generation, you really shouldn't discard the LM head. Please use AutoModelForCausalLM instead.
2. Is there a reason you're using torch.save(model.state_dict(), "/chks/model_weights.pth") instead of model.save_pretrained, which is the recommended way to save files? In version v4.35.0 this now saves in safetensors, but if you want a PyTorch file you can specify model.save_pretrained('directory', safe_serialization=False).
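A minimal sketch of the load/save path being suggested above (the /chks/ directory mirrors the original report; everything else is illustrative):

import torch
from transformers import AutoModelForCausalLM

# Keep the LM head by loading with the *ForCausalLM class, not AutoModel
model = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-7b-hf",
    low_cpu_mem_usage=True,
    torch_dtype=torch.bfloat16,
)

# Recommended way to save; safe_serialization=False writes PyTorch .bin
# shards instead of the safetensors default used since v4.35.0
model.save_pretrained("/chks/", safe_serialization=False)

# Reload later from the saved directory
model = AutoModelForCausalLM.from_pretrained("/chks/")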
@LysandreJik Thanks for the quick response. I tried using AutoModelForCausalLM but got gibberish output. I also tried model.save_pretrained('directory') with the same response.
I was using torch.save(model.state_dict(), "/chks/model_weights.pth") thinking I would get the weights in exactly the format the xformers example wanted, but no matter: still the same result.
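As a sanity check (a sketch under the thread's setup, not something run in the original report), a state_dict saved with torch.save can be loaded back with strict=False to surface key mismatches, such as a dropped LM head or a different key prefix between AutoModel and AutoModelForCausalLM:

import torch
from transformers import AutoModelForCausalLM

# Rebuild the full causal-LM architecture, then load the raw state_dict
model = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-7b-hf",
    low_cpu_mem_usage=True,
    torch_dtype=torch.bfloat16,
)
state_dict = torch.load("/chks/model_weights.pth", map_location="cpu")

# strict=False returns the mismatches instead of raising, so any missing
# or unexpected keys become visible here
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print("missing keys:", len(missing))
print("unexpected keys:", len(unexpected))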
I just tried locally to save/reload the weights using save_pretrained and it works out nicely:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline, __version__

print("Version:", __version__)

# Load the full causal-LM (with LM head) and its tokenizer
_model = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-7b-hf",
    low_cpu_mem_usage=True,
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(
    "codellama/CodeLlama-7b-hf"
)

# Save with save_pretrained, then reload from the saved directory
_model.save_pretrained('here')
model = AutoModelForCausalLM.from_pretrained('here')

pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.float16,
    device_map="auto",
)

sequences = pipeline(
    'import socket\n\ndef ping_exponential_backoff(host: str):',
    do_sample=True,
    top_k=10,
    temperature=0.1,
    top_p=0.95,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_length=200,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
which returns:
Version: 4.35.0
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 4.25it/s]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:02<00:00, 1.25it/s]
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Result: import socket

def ping_exponential_backoff(host: str):
    """
    Ping a host with exponential backoff.
    :param host: The host to ping.
    :return: True if the host is reachable, False otherwise.
    """
    for i in range(1, 10):
        try:
            socket.create_connection((host, 80), 1).close()
            return True
        except OSError:
            time.sleep(2 ** i)
    return False

def ping_exponential_backoff_with_timeout(host: str, timeout: int):
    """
    Ping a host with exponential backoff and a timeout.
    :param host: The host to ping.
    :param timeout: The timeout in seconds.
    :return: True if the host is reachable
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info
transformers version: 4.35.0

Who can help?
@ArthurZucker @coreyhu @zphang @StellaAthena
Information

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
pip install huggingface peft

import os
import torch
from transformers import AutoModel, AutoTokenizer

os.makedirs("/chks/", exist_ok=True)

model = AutoModel.from_pretrained(
    "codellama/CodeLlama-7b-hf",
    low_cpu_mem_usage=True,
    torch_dtype=torch.bfloat16,
)
torch.save(model.state_dict(), "/chks/model_weights.pth")

tokenizer = AutoTokenizer.from_pretrained(
    "codellama/CodeLlama-7b-hf"
)
tokenizer.save_pretrained("/chks/")
Load the saved weights with xformers/examples/llama_inference/ and generate a sample text

Expected behavior
Expected a valid response like the one produced by the original llama weights, but got gibberish instead.
I also tried float16 with the same gibberish, and also confirmed that the sha256 of the tokenizer on HF is the same as the original. Same experience with the 13B model.
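For reference, a minimal sketch of how such a sha256 comparison might be done (file paths are illustrative, not from the original report):

import hashlib

def sha256_of(path: str) -> str:
    # Hash the file in chunks so large tokenizer/model files fit in memory
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare the saved copy against the originally downloaded file
print(sha256_of("/chks/tokenizer.model"))
print(sha256_of("original/tokenizer.model"))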