Hi @d3ztr0yur3000, I'm not sure about the 4-bit model, as I've personally never run it. I was able to load an 8-bit LoRA Vicuna I fine-tuned myself using https://github.com/tloen/alpaca-lora
Can you share more details about the 4-bit version you're using, for instance where I can find it (or how to convert the original model to it)? I think it's a whole different format, isn't it?
Either way, you'd have to modify vicuna_server.py to load your model instead.
This is the model I am trying to use: TheBloke/vicuna-AlekseyKorshuk-7B-GPTQ-4bit-128g off of HF. I downloaded the actual file (vicuna-AlekseyKorshuk-7B-GPTQ-4bit-128g.no-act-order.pt) as the server can't seem to pull it from HF. I tried changing this code:
```python
model, tokenizer, seq = load_model(
    model_path="../learn-vicuna/vicuna-7b/",
    device=device,
    lora_weights="../vicuna-react-lora/vicuna-react",
)
```
I wanted to see if it could load from my local file system, but it seems it will only work if it can pull from HF. Also, your stock code never worked for me; it gives the same error.
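For reference, plain Hugging Face transformers does load from a local path, but only when the directory contains all the files the repo references (config, tokenizer, every weight shard). A minimal sketch, assuming a complete local copy of a regular (non-GPTQ) checkpoint:

```python
# Minimal sketch: from_pretrained accepts a local directory, but it must
# contain config.json, the tokenizer files, and every weight shard listed
# in pytorch_model.bin.index.json.
from transformers import AutoModelForCausalLM, AutoTokenizer

local_dir = "../learn-vicuna/vicuna-7b/"  # path from the snippet above
tokenizer = AutoTokenizer.from_pretrained(local_dir)
model = AutoModelForCausalLM.from_pretrained(local_dir)
```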
Yeah, my stock code loads a local model; I forgot to mention that it needs to be updated in the server code, sorry about that. I'll see if I can load this model.
Yeah, pulling directly from HF inside the server code gives me this error:
OSError: TheBloke/vicuna-AlekseyKorshuk-7B-GPTQ-4bit-128g does not appear to have a file named pytorch_model-00001-of-00014.bin. Checkout 'https://huggingface.co/TheBloke/vicuna-AlekseyKorshuk-7B-GPTQ-4bit-128g/main' for available files.
When inspecting this file from the repository (https://huggingface.co/TheBloke/vicuna-AlekseyKorshuk-7B-GPTQ-4bit-128g/blob/main/pytorch_model.bin.index.json), I noticed it has references to the model binary blobs:
```json
{
  "metadata": {
    "total_size": 13476851712
  },
  "weight_map": {
    "lm_head.weight": "pytorch_model-00014-of-00014.bin",
    "model.embed_tokens.weight": "pytorch_model-00001-of-00014.bin",
    "model.layers.0.input_layernorm.weight": "pytorch_model-00001-of-00014.bin",
    "model.layers.0.mlp.down_proj.weight": "pytorch_model-00001-of-00014.bin",
    "model.layers.0.mlp.gate_proj.weight": "pytorch_model-00001-of-00014.bin",
    "model.layers.0.mlp.up_proj.weight": "pytorch_model-00001-of-00014.bin",
    "model.layers.0.post_attention_layernorm.weight": "pytorch_model-00001-of-00014.bin",
    "model.layers.0.self_attn.k_proj.weight": "pytorch_model-00001-of-00014.bin",
```
However, these files are not present in https://huggingface.co/TheBloke/vicuna-AlekseyKorshuk-7B-GPTQ-4bit-128g/tree/main.
For instance, compare with this repository, which has the blobs: https://huggingface.co/AlekseyKorshuk/vicuna-7b/tree/main
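One quick way to confirm what the repo actually ships is the huggingface_hub API (an aside, not something run in this thread):

```python
# List the files in the repo to confirm the sharded .bin files referenced
# by pytorch_model.bin.index.json are indeed missing.
from huggingface_hub import list_repo_files

files = list_repo_files("TheBloke/vicuna-AlekseyKorshuk-7B-GPTQ-4bit-128g")
print(files)  # expect the .pt checkpoint, but no pytorch_model-*.bin shards
```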
I'm guessing the version you're trying to use requires different code to load it, e.g., loading the weights directly from the .pt file.
It looks like we might need to use PyTorch directly for this one:
https://pytorch.org/tutorials/beginner/saving_loading_models.html
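Just to inspect what's inside the checkpoint, something like this should work (a sketch; it assumes the .pt file is a plain pickled state dict, which GPTQ checkpoints typically are):

```python
# Load the raw checkpoint with PyTorch and peek at its tensor names.
# This only inspects the file; it does not build a usable model.
import torch

state_dict = torch.load(
    "vicuna-AlekseyKorshuk-7B-GPTQ-4bit-128g.no-act-order.pt",
    map_location="cpu",
)
print(list(state_dict.keys())[:5])
```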
Aha, take a look here: https://huggingface.co/TheBloke/vicuna-AlekseyKorshuk-7B-GPTQ-4bit-128g/discussions/5
The proper way to load this model is using https://github.com/qwopqwop200/GPTQ-for-LLaMa (and not the Hugging Face library).
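For the record, the loading there goes through that repo's llama_inference.py rather than from_pretrained; roughly like this (a sketch, since the exact load_quant signature varies between branches of GPTQ-for-LLaMa, so treat it as pseudocode):

```python
# Rough sketch of loading a 4-bit GPTQ checkpoint with GPTQ-for-LLaMa.
# load_quant comes from the repo's llama_inference.py; run this from inside
# a GPTQ-for-LLaMa checkout and check the signature of the version you cloned.
from transformers import AutoTokenizer
from llama_inference import load_quant

model_id = "TheBloke/vicuna-AlekseyKorshuk-7B-GPTQ-4bit-128g"
model = load_quant(
    model_id,                                                   # config source
    "vicuna-AlekseyKorshuk-7B-GPTQ-4bit-128g.no-act-order.pt",  # local weights
    4,    # wbits
    128,  # groupsize
).to("cuda")
tokenizer = AutoTokenizer.from_pretrained(model_id)
```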
OK, I got it running locally in this branch: https://github.com/paolorechia/learn-langchain/pull/5
Here's what I did: I copied the GPTQ-for-LLaMa code into a local package gptq_for_llama and modified it a bit, so I could import it as a module in my server.

However, the results were not great when running an agent; e.g., I got an output parser error:
```
(base) paolo@paolo-MS-7D08:~/learn-langchain$ python -m langchain_app.agents.hf_example_agent
> Entering new AgentExecutor chain...
Traceback (most recent call last):
File "/home/paolo/miniconda3/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/paolo/miniconda3/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/paolo/learn-langchain/langchain_app/agents/hf_example_agent.py", line 17, in <module>
agent.run("""
File "/home/paolo/miniconda3/lib/python3.10/site-packages/langchain/chains/base.py", line 213, in run
return self(args[0])[self.output_keys[0]]
File "/home/paolo/miniconda3/lib/python3.10/site-packages/langchain/chains/base.py", line 116, in __call__
raise e
File "/home/paolo/miniconda3/lib/python3.10/site-packages/langchain/chains/base.py", line 113, in __call__
outputs = self._call(inputs)
File "/home/paolo/miniconda3/lib/python3.10/site-packages/langchain/agents/agent.py", line 792, in _call
next_step_output = self._take_next_step(
File "/home/paolo/miniconda3/lib/python3.10/site-packages/langchain/agents/agent.py", line 672, in _take_next_step
output = self.agent.plan(intermediate_steps, **inputs)
File "/home/paolo/miniconda3/lib/python3.10/site-packages/langchain/agents/agent.py", line 385, in plan
return self.output_parser.parse(full_output)
File "/home/paolo/miniconda3/lib/python3.10/site-packages/langchain/agents/mrkl/output_parser.py", line 24, in parse
raise OutputParserException(f"Could not parse LLM output: `{text}`")
langchain.schema.OutputParserException: Could not parse LLM output: `I should fetch the website's HTML
Action:
...
```
You might have to tweak the prompts and see if you can get it to run something.
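One workaround you could try (my own assumption, not something from the repo): wrap the MRKL output parser so unparseable output becomes a final answer instead of an exception:

```python
# Hypothetical workaround: a lenient wrapper around langchain's MRKL output
# parser that returns the raw text as the final answer instead of raising
# OutputParserException. Pass it to initialize_agent via
# agent_kwargs={"output_parser": LenientOutputParser()}.
from langchain.agents.mrkl.output_parser import MRKLOutputParser
from langchain.schema import AgentFinish, OutputParserException

class LenientOutputParser(MRKLOutputParser):
    def parse(self, text: str):
        try:
            return super().parse(text)
        except OutputParserException:
            # Fall back to treating the raw text as the final answer.
            return AgentFinish({"output": text.strip()}, text)
```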
Another prompt produces what looks like gibberish; the agent keeps repeating the same step. Maybe I did a step wrong?
```
(base) paolo@paolo-MS-7D08:~/learn-langchain$ python -m langchain_app.agents.self_healing_code
> Entering new AgentExecutor chain...
I should use Python REPL to create a variable and print it
Action: Python REPL
Action Input:
cat = "meow"
print(cat)
Observation: meow
Thought:I should use Python REPL to create a variable and print it
Action: Python REPL
Action Input:
cat = "meow"
print(cat)
Observation: meow
Thought:I should use Python REPL to create a variable and print it
Action: Python REPL
Action Input:
cat = "meow"
print(cat)
```
Using the safetensors version yields better results: I don't get output parser errors. But the result is still not that great :(
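For reference, the safetensors checkpoint can be opened without pickle; a minimal sketch (the exact filename below is a guess, check the repo):

```python
# Inspect the safetensors checkpoint; load_file reads tensors without
# unpickling. The filename is an assumption -- verify the actual name
# in the HF repo before running.
from safetensors.torch import load_file

state_dict = load_file("vicuna-AlekseyKorshuk-7B-GPTQ-4bit-128g.safetensors")
print(list(state_dict.keys())[:5])
```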
Anyway, maybe you'll find a useful way to use the 4-bit model with langchain, have fun! Let me know if you still run into problems getting it running.
```
(base) paolo@paolo-MS-7D08:~/learn-langchain$ python -m langchain_app.agents.hf_example_agent
> Entering new AgentExecutor chain...
I should fetch the website's HTML
Action:
Action Input:
Observation: is not a valid tool, try another one.
Thought:I should use the 'requests' library
Action: Python REPL
Action Input:
import requests
response = requests.get('https://api.chucknorris.io/')
Thought: I should save the response in a variable called 'response'
Action:
Action Input:
Observation: invalid syntax (<string>, line 4)
Thought:I should use the 'response.text' attribute
Action:
Action Input:
Observation: is not a valid tool, try another one.
Thought:I should use the 'beautifulsoup' library
Action:
Action Input:
Observation: is not a valid tool, try another one.
Thought:I should use the 'lxml' library
Action:
Action Input:
Observation: is not a valid tool, try another one.
Thought:I should use the 'xml.etree.ElementTree' library
Action:
Action Input:
Observation: is not a valid tool, try another one.
Thought:I should use the 'lxml' library
Action:
Action Input:
Observation: is not a valid tool, try another one.
Thought:
```
@d3ztr0yur3000 if it's not yet clear how to run it, do the following:

1. Check out this branch: https://github.com/paolorechia/learn-langchain/tree/load-4-bit
2. Install the dependencies from gptq_for_llama/requirements.txt
3. Start vicuna_server.py

If you get problems with packages, try visiting the original library's installation guide: https://github.com/qwopqwop200/GPTQ-for-LLaMa
Thank you sir, you are amazing, I will hack around with it from here. I am investing time in this because it looks very promising for edge applications. If I figure it out, I will share back.
I get this error from the latest code: HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '../learn-vicuna/vicuna-7b/'. Use repo_type argument if needed.

I want to use a 4-bit model in any case so it fits on my 8 GB GPU. I have the actual model file locally; do you know if there is a way for your vicuna_server to support this?