Hi @d3ztr0yur3000, I'm not sure about the 4-bit model, as I've personally never run it. I was able to load an 8-bit LoRA Vicuna I fine-tuned myself using https://github.com/tloen/alpaca-lora
Can you share more details about the 4-bit version you're using, for instance where I can find it (or how to convert the original model to it)? I think it's a whole different format, isn't it?
Either way, you'd have to modify vicuna_server.py to load your model instead.
This is the model I am trying to use: TheBloke/vicuna-AlekseyKorshuk-7B-GPTQ-4bit-128g off of HF. I downloaded the actual file (vicuna-AlekseyKorshuk-7B-GPTQ-4bit-128g.no-act-order.pt) as the server can't seem to pull it from HF. I tried changing this code:
```python
model, tokenizer, seq = load_model(
    model_path="../learn-vicuna/vicuna-7b/",
    device=device,
    lora_weights="../vicuna-react-lora/vicuna-react",
)
```
I wanted to see if it could load from my local file system, but it seems it will only work if it can pull from HF. Also, your stock code never worked for me; it gives the same error.
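For reference, plain Hugging Face transformers does load from a local path, but only when the directory contains all the files the repo references (config, tokenizer, every weight shard). A minimal sketch, assuming a complete local copy of a regular (non-GPTQ) checkpoint:

```python
# Minimal sketch: from_pretrained accepts a local directory, but it must
# contain config.json, the tokenizer files, and every weight shard listed
# in pytorch_model.bin.index.json.
from transformers import AutoModelForCausalLM, AutoTokenizer

local_dir = "../learn-vicuna/vicuna-7b/"  # path from the snippet above
tokenizer = AutoTokenizer.from_pretrained(local_dir)
model = AutoModelForCausalLM.from_pretrained(local_dir)
```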
Yeah, my stock code loads a local model; I forgot to mention that it needs to be updated in the server code, sorry about that. I'll see if I can load this model.
Yeah, pulling directly from HF inside the server code gives me this error:
OSError: TheBloke/vicuna-AlekseyKorshuk-7B-GPTQ-4bit-128g does not appear to have a file named pytorch_model-00001-of-00014.bin. Checkout 'https://huggingface.co/TheBloke/vicuna-AlekseyKorshuk-7B-GPTQ-4bit-128g/main' for available files.
When inspecting this file from the repository (https://huggingface.co/TheBloke/vicuna-AlekseyKorshuk-7B-GPTQ-4bit-128g/blob/main/pytorch_model.bin.index.json), I noticed it has references to the model binary blobs:
```json
{
  "metadata": {
    "total_size": 13476851712
  },
  "weight_map": {
    "lm_head.weight": "pytorch_model-00014-of-00014.bin",
    "model.embed_tokens.weight": "pytorch_model-00001-of-00014.bin",
    "model.layers.0.input_layernorm.weight": "pytorch_model-00001-of-00014.bin",
    "model.layers.0.mlp.down_proj.weight": "pytorch_model-00001-of-00014.bin",
    "model.layers.0.mlp.gate_proj.weight": "pytorch_model-00001-of-00014.bin",
    "model.layers.0.mlp.up_proj.weight": "pytorch_model-00001-of-00014.bin",
    "model.layers.0.post_attention_layernorm.weight": "pytorch_model-00001-of-00014.bin",
    "model.layers.0.self_attn.k_proj.weight": "pytorch_model-00001-of-00014.bin",
```
However, these files are not present in https://huggingface.co/TheBloke/vicuna-AlekseyKorshuk-7B-GPTQ-4bit-128g/tree/main.
For instance, compare with this repository, which has the blobs: https://huggingface.co/AlekseyKorshuk/vicuna-7b/tree/main
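One quick way to confirm what the repo actually ships is the huggingface_hub API (an aside, not something run in this thread):

```python
# List the files in the repo to confirm the sharded .bin files referenced
# by pytorch_model.bin.index.json are indeed missing.
from huggingface_hub import list_repo_files

files = list_repo_files("TheBloke/vicuna-AlekseyKorshuk-7B-GPTQ-4bit-128g")
print(files)  # expect the .pt checkpoint, but no pytorch_model-*.bin shards
```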
I'm guessing the version you're trying to use requires different code to load it, e.g., loading the weights directly from the .pt file.
It looks like we might need to use PyTorch directly for this one:
https://pytorch.org/tutorials/beginner/saving_loading_models.html
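Just to inspect what's inside the checkpoint, something like this should work (a sketch; it assumes the .pt file is a plain pickled state dict, which GPTQ checkpoints typically are):

```python
# Load the raw checkpoint with PyTorch and peek at its tensor names.
# This only inspects the file; it does not build a usable model.
import torch

state_dict = torch.load(
    "vicuna-AlekseyKorshuk-7B-GPTQ-4bit-128g.no-act-order.pt",
    map_location="cpu",
)
print(list(state_dict.keys())[:5])
```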
Aha, take a look here: https://huggingface.co/TheBloke/vicuna-AlekseyKorshuk-7B-GPTQ-4bit-128g/discussions/5
The proper way to load this model is using https://github.com/qwopqwop200/GPTQ-for-LLaMa (and not the Hugging Face library).
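For the record, the loading there goes through that repo's llama_inference.py rather than from_pretrained; roughly like this (a sketch, since the exact load_quant signature varies between branches of GPTQ-for-LLaMa, so treat it as pseudocode):

```python
# Rough sketch of loading a 4-bit GPTQ checkpoint with GPTQ-for-LLaMa.
# load_quant comes from the repo's llama_inference.py; run this from inside
# a GPTQ-for-LLaMa checkout and check the signature of the version you cloned.
from transformers import AutoTokenizer
from llama_inference import load_quant

model_id = "TheBloke/vicuna-AlekseyKorshuk-7B-GPTQ-4bit-128g"
model = load_quant(
    model_id,                                                   # config source
    "vicuna-AlekseyKorshuk-7B-GPTQ-4bit-128g.no-act-order.pt",  # local weights
    4,    # wbits
    128,  # groupsize
).to("cuda")
tokenizer = AutoTokenizer.from_pretrained(model_id)
```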
OK, I got it running locally in this branch: https://github.com/paolorechia/learn-langchain/pull/5
Here's what I did: I copied the GPTQ-for-LLaMa code into a local package gptq_for_llama and modified it a bit, so I could import it as a module in my server.

However, the results were not great when running an agent; e.g., I got an output parser error:
```
(base) paolo@paolo-MS-7D08:~/learn-langchain$ python -m langchain_app.agents.hf_example_agent
> Entering new AgentExecutor chain...
Traceback (most recent call last):
File "/home/paolo/miniconda3/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/paolo/miniconda3/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/paolo/learn-langchain/langchain_app/agents/hf_example_agent.py", line 17, in <module>
agent.run("""
File "/home/paolo/miniconda3/lib/python3.10/site-packages/langchain/chains/base.py", line 213, in run
return self(args[0])[self.output_keys[0]]
File "/home/paolo/miniconda3/lib/python3.10/site-packages/langchain/chains/base.py", line 116, in __call__
raise e
File "/home/paolo/miniconda3/lib/python3.10/site-packages/langchain/chains/base.py", line 113, in __call__
outputs = self._call(inputs)
File "/home/paolo/miniconda3/lib/python3.10/site-packages/langchain/agents/agent.py", line 792, in _call
next_step_output = self._take_next_step(
File "/home/paolo/miniconda3/lib/python3.10/site-packages/langchain/agents/agent.py", line 672, in _take_next_step
output = self.agent.plan(intermediate_steps, **inputs)
File "/home/paolo/miniconda3/lib/python3.10/site-packages/langchain/agents/agent.py", line 385, in plan
return self.output_parser.parse(full_output)
File "/home/paolo/miniconda3/lib/python3.10/site-packages/langchain/agents/mrkl/output_parser.py", line 24, in parse
raise OutputParserException(f"Could not parse LLM output: `{text}`")
langchain.schema.OutputParserException: Could not parse LLM output: `I should fetch the website's HTML
Action:
...
```
You might have to tweak the prompts and see if you can get it to run something.
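One workaround you could try (my own assumption, not something from the repo): wrap the MRKL output parser so unparseable output becomes a final answer instead of an exception:

```python
# Hypothetical workaround: a lenient wrapper around langchain's MRKL output
# parser that returns the raw text as the final answer instead of raising
# OutputParserException. Pass it to initialize_agent via
# agent_kwargs={"output_parser": LenientOutputParser()}.
from langchain.agents.mrkl.output_parser import MRKLOutputParser
from langchain.schema import AgentFinish, OutputParserException

class LenientOutputParser(MRKLOutputParser):
    def parse(self, text: str):
        try:
            return super().parse(text)
        except OutputParserException:
            # Fall back to treating the raw text as the final answer.
            return AgentFinish({"output": text.strip()}, text)
```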
Another prompt produces what looks like gibberish; the agent keeps repeating the same step. Maybe I did a step wrong?
```
(base) paolo@paolo-MS-7D08:~/learn-langchain$ python -m langchain_app.agents.self_healing_code
> Entering new AgentExecutor chain...
I should use Python REPL to create a variable and print it
Action: Python REPL
Action Input:
cat = "meow"
print(cat)
Observation: meow
Thought:I should use Python REPL to create a variable and print it
Action: Python REPL
Action Input:
cat = "meow"
print(cat)
Observation: meow
Thought:I should use Python REPL to create a variable and print it
Action: Python REPL
Action Input:
cat = "meow"
print(cat)
```
Using the safetensors version yields better results: I don't get output parser errors. But the result is still not that great :(
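For reference, the safetensors checkpoint can be opened without pickle; a minimal sketch (the exact filename below is a guess, check the repo):

```python
# Inspect the safetensors checkpoint; load_file reads tensors without
# unpickling. The filename is an assumption -- verify the actual name
# in the HF repo before running.
from safetensors.torch import load_file

state_dict = load_file("vicuna-AlekseyKorshuk-7B-GPTQ-4bit-128g.safetensors")
print(list(state_dict.keys())[:5])
```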
Anyway, maybe you'll find a useful way to use the 4-bit model with langchain, have fun! Let me know if you still run into problems getting it running.
```
(base) paolo@paolo-MS-7D08:~/learn-langchain$ python -m langchain_app.agents.hf_example_agent
> Entering new AgentExecutor chain...
I should fetch the website's HTML
Action:
Action Input:
Observation: is not a valid tool, try another one.
Thought:I should use the 'requests' library
Action: Python REPL
Action Input:
import requests
response = requests.get('https://api.chucknorris.io/')
Thought: I should save the response in a variable called 'response'
Action:
Action Input:
Observation: invalid syntax (<string>, line 4)
Thought:I should use the 'response.text' attribute
Action:
Action Input:
Observation: is not a valid tool, try another one.
Thought:I should use the 'beautifulsoup' library
Action:
Action Input:
Observation: is not a valid tool, try another one.
Thought:I should use the 'lxml' library
Action:
Action Input:
Observation: is not a valid tool, try another one.
Thought:I should use the 'xml.etree.ElementTree' library
Action:
Action Input:
Observation: is not a valid tool, try another one.
Thought:I should use the 'lxml' library
Action:
Action Input:
Observation: is not a valid tool, try another one.
Thought:
```
@d3ztr0yur3000 if it's not yet clear how to run it, do the following:

1. Check out this branch: https://github.com/paolorechia/learn-langchain/tree/load-4-bit
2. Install the dependencies from gptq_for_llama/requirements.txt
3. Start vicuna_server.py

If you get problems with packages, try visiting the original library's installation guide: https://github.com/qwopqwop200/GPTQ-for-LLaMa
Thank you sir, you are amazing, I will hack around with it from here. I am investing time in this because it looks very promising for edge applications. If I figure it out, I will share back.
I get this error from the latest code: HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '../learn-vicuna/vicuna-7b/'. Use repo_type argument if needed.

I want to use a 4-bit model in any case so it fits on my 8 GB GPU. I have the actual model file locally; do you know if there is a way for your vicuna_server to support this?