mshumer / gpt-llm-trainer

MIT License

Can we use GPT3.5? #9

Closed TanmayDoesAI closed 1 year ago

TanmayDoesAI commented 1 year ago

Can we modify the code to use GPT-3.5 instead of GPT-4? Most people don't have access to GPT-4, and to level the field we could use double the number of examples.

fredzannarbor commented 1 year ago

I tried 3.5 today on two small runs, one of 10 examples and one of 30. They yielded 2 and 25 valid examples, respectively. I had to switch to gpt-3.5-turbo-16k.

The reason I was doing this is that I was trying to isolate the out-of-memory problem that is preventing me from successfully using gpt-llm-trainer, and I was tired of spending money on gpt-4 runs. ;-)

Afo92 commented 1 year ago

I used model="gpt-3.5-turbo", and it ran correctly. I can't comment on the quality of the GPT output, as I don't actually need it; I just tried it as a test.
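For reference, the swap looks roughly like this with the OpenAI Python client (a sketch, not the trainer's exact code; `pick_gpt35_model` is a hypothetical helper capturing fredzannarbor's fallback to the 16k-context variant):

```python
def pick_gpt35_model(prompt_tokens: int) -> str:
    """Hypothetical helper: fall back to the 16k-context variant when the
    prompt would overflow plain gpt-3.5-turbo's ~4k-token window."""
    return "gpt-3.5-turbo" if prompt_tokens <= 3500 else "gpt-3.5-turbo-16k"


def generate_example(prompt: str, prompt_tokens: int) -> str:
    # Imported here so pick_gpt35_model stays usable without the package.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model=pick_gpt35_model(prompt_tokens),
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
    )
    return response.choices[0].message.content
```

The only change relative to a GPT-4 run is the model string; everything else in the request stays the same.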

TanmayDoesAI commented 1 year ago

@fredzannarbor @Afo92 Thank you for the confirmation.

TanmayDoesAI commented 1 year ago

I am facing this issue with the default model:

OSError: NousResearch/llama-2-7b-chat-hf is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models' If this is a private repository, make sure to pass a token having permission to this repo with use_auth_token or log in with huggingface-cli login and pass use_auth_token=True.

Any idea what needs to be done? I tried my own sharded model but got some other error.

Afo92 commented 1 year ago

I used model_name = "meta-llama/Llama-2-7b-chat-hf" and then

from transformers import AutoModelForCausalLM

# Hugging Face access token with permission for the gated repo
access_token = "hf_YOUR_TOKEN_HERE"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    use_auth_token=access_token,
    quantization_config=bnb_config,
    device_map=device_map,
)

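One gotcha worth adding (my assumption, not stated in the thread): a gated repo like meta-llama also needs the token when loading the tokenizer. A minimal sketch, with the download itself commented out and `auth_kwargs` a hypothetical helper:

```python
def auth_kwargs(token: str) -> dict:
    # Hypothetical helper: only forward use_auth_token when a token is set,
    # so ungated mirrors such as NousResearch/llama-2-7b-chat-hf still load
    # without one.
    return {"use_auth_token": token} if token else {}


# from transformers import AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained(
#     "meta-llama/Llama-2-7b-chat-hf",
#     **auth_kwargs(access_token),
# )
```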
TanmayDoesAI commented 1 year ago

@Afo92 Gotcha thank you!