Closed: kiamesdavies closed this issue 11 months ago
Hey @kiamesdavies, I see two potential issues in your approach:

1. You're loading the model with AutoModel, which automatically discards the LM head. Given you're using the model for text generation, you really shouldn't discard the LM head. Please use AutoModelForCausalLM instead.
2. Is there a reason you're using torch.save(model.state_dict(), "/chks/model_weights.pth") instead of model.save_pretrained, which is the recommended way to save files? In version v4.35.0 this now saves in safetensors, but if you want a PyTorch file you can specify model.save_pretrained('directory', safe_serialization=False).
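A minimal sketch of the load/save path being suggested above (the /chks/ directory mirrors the original report; everything else is illustrative):

import torch
from transformers import AutoModelForCausalLM

# Keep the LM head by loading with the *ForCausalLM class, not AutoModel
model = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-7b-hf",
    low_cpu_mem_usage=True,
    torch_dtype=torch.bfloat16,
)

# Recommended way to save; safe_serialization=False writes PyTorch .bin
# shards instead of the safetensors default used since v4.35.0
model.save_pretrained("/chks/", safe_serialization=False)

# Reload later from the saved directory
model = AutoModelForCausalLM.from_pretrained("/chks/")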
@LysandreJik Thanks for the quick response. I tried using AutoModelForCausalLM but got gibberish output. I also tried model.save_pretrained('directory') with the same response.
I was using torch.save(model.state_dict(), "/chks/model_weights.pth") thinking I would get the weights in exactly the format the xformers example wanted, but no matter: still the same result.
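As a sanity check (a sketch under the thread's setup, not something run in the original report), a state_dict saved with torch.save can be loaded back with strict=False to surface key mismatches, such as a dropped LM head or a different key prefix between AutoModel and AutoModelForCausalLM:

import torch
from transformers import AutoModelForCausalLM

# Rebuild the full causal-LM architecture, then load the raw state_dict
model = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-7b-hf",
    low_cpu_mem_usage=True,
    torch_dtype=torch.bfloat16,
)
state_dict = torch.load("/chks/model_weights.pth", map_location="cpu")

# strict=False returns the mismatches instead of raising, so any missing
# or unexpected keys become visible here
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print("missing keys:", len(missing))
print("unexpected keys:", len(unexpected))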
I just tried locally to save/reload the weights using save_pretrained and it works out nicely:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline, __version__

print("Version:", __version__)

# Load the full causal-LM (with LM head) and its tokenizer
_model = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-7b-hf",
    low_cpu_mem_usage=True,
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(
    "codellama/CodeLlama-7b-hf"
)

# Save with save_pretrained, then reload from the saved directory
_model.save_pretrained('here')
model = AutoModelForCausalLM.from_pretrained('here')

pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.float16,
    device_map="auto",
)

sequences = pipeline(
    'import socket\n\ndef ping_exponential_backoff(host: str):',
    do_sample=True,
    top_k=10,
    temperature=0.1,
    top_p=0.95,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_length=200,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
which returns:
Version: 4.35.0
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 4.25it/s]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:02<00:00, 1.25it/s]
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Result: import socket

def ping_exponential_backoff(host: str):
    """
    Ping a host with exponential backoff.
    :param host: The host to ping.
    :return: True if the host is reachable, False otherwise.
    """
    for i in range(1, 10):
        try:
            socket.create_connection((host, 80), 1).close()
            return True
        except OSError:
            time.sleep(2 ** i)
    return False

def ping_exponential_backoff_with_timeout(host: str, timeout: int):
    """
    Ping a host with exponential backoff and a timeout.
    :param host: The host to ping.
    :param timeout: The timeout in seconds.
    :return: True if the host is reachable
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info
transformers version: 4.35.0

Who can help?
@ArthurZucker @coreyhu @zphang @StellaAthena
Information

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
pip install huggingface peft

import os
import torch
from transformers import AutoModel, AutoTokenizer

os.makedirs("/chks/", exist_ok=True)

model = AutoModel.from_pretrained(
    "codellama/CodeLlama-7b-hf",
    low_cpu_mem_usage=True,
    torch_dtype=torch.bfloat16,
)
torch.save(model.state_dict(), "/chks/model_weights.pth")

tokenizer = AutoTokenizer.from_pretrained(
    "codellama/CodeLlama-7b-hf"
)
tokenizer.save_pretrained("/chks/")
Load the saved weights with xformers/examples/llama_inference/ and generate a sample text

Expected behavior
Expected a valid response like the one produced by the original llama weights, but got gibberish instead.
I also tried float16 with the same gibberish, and also confirmed that the sha256 of the tokenizer on HF is the same as the original. Same experience with the 13B model.
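For reference, a minimal sketch of how such a sha256 comparison might be done (file paths are illustrative, not from the original report):

import hashlib

def sha256_of(path: str) -> str:
    # Hash the file in chunks so large tokenizer/model files fit in memory
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare the saved copy against the originally downloaded file
print(sha256_of("/chks/tokenizer.model"))
print(sha256_of("original/tokenizer.model"))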