guardrails-ai / guardrails

Adding guardrails to large language models.
https://www.guardrailsai.com/docs
Apache License 2.0

Support for Open source LLMs #236

Closed Anindyadeep closed 1 year ago

Anindyadeep commented 1 year ago

Description

There are several open-source LLMs and open-source LLM providers right now. Examples:

  1. GPT4All by Nomic AI
  2. Llama CPP
  3. Hugging Face models

Can we provide Guardrails support for these models and providers?

Why is this needed

The OpenAI GPT API is not the only one that needs validation in end-to-end LLM pipelines. It is just as important for open-source LLMs when developers are building and shipping use cases to production.

Implementation details

As far as I have seen in the code base, we might not need breaking changes. Rather, we might need to change how the LLM call is made: just as openai.Completion.create is used here, we would need support for the equivalent call functions of other LLM providers.

End result

If this feature gets implemented, we can do validation checks and evaluations for in-house LLMs without relying on OpenAI. This will also be very useful as an evaluation procedure for fine-tuning, and for integrating newer LLMs into a CI/CD process.

Here is the sample code:

import os
import torch 
import guardrails as gd

from peft import PeftConfig, PeftModel
from transformers import GenerationConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

device = 'cuda:0'

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

# my PEFT adapter after fine-tuning, or I can pull the same from the Hub
adapter_id = 'anindya64/falcon7b_finetuned'

peft_config = PeftConfig.from_pretrained(adapter_id)
model = AutoModelForCausalLM.from_pretrained(
    peft_config.base_model_name_or_path,
    return_dict=True,
    device_map="auto",
    trust_remote_code=True,
    quantization_config=quantization_config,  # maybe this could be an argument
)

model = PeftModel.from_pretrained(model, adapter_id)
tokenizer = AutoTokenizer.from_pretrained(peft_config.base_model_name_or_path)

def my_inference(text, generation_config):
    # tokenize the prompt and move it to the GPU
    inputs = tokenizer(text, return_tensors="pt", return_token_type_ids=False).to(device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=generation_config.max_new_tokens,
        generation_config=generation_config,
        pad_token_id=tokenizer.eos_token_id,
    )
    # keep only the assistant's reply from the fine-tuning prompt template
    generated_text = tokenizer.decode(outputs[0]).split('### Assistant:')[1].strip()
    return generated_text

# Wrap the custom LLM call with the `guard` object

doctors_notes = """49 y/o Male with chronic macular rash to face & hair, worse in beard, eyebrows & nares.
Itchy, flaky, slightly scaly. Moderate response to OTC steroid cream"""

guard = gd.Guard.from_rail('getting_started.rail')
raw_llm_output, validated_output = guard(
    my_inference,
    prompt_params={"doctors_notes": doctors_notes},
    # extra keyword arguments are forwarded to `my_inference`
    generation_config=GenerationConfig(max_new_tokens=1024, temperature=0.3),
)

# Print the validated output from the LLM
print(validated_output)

This can be done similarly for GPT4All and llama.cpp, assuming the user has already installed the dependencies. Our job would just be to call the function and run it under Guardrails.
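
For illustration, here is a rough sketch of what the same pattern could look like with llama-cpp-python, reusing the doctors_notes prompt parameter from above (the model path and the getting_started.rail spec are placeholders, and exactly which keyword arguments Guardrails forwards to the callable may differ):

import guardrails as gd
from llama_cpp import Llama

# load a local llama.cpp model (path is a placeholder)
llm = Llama(model_path="path/to/model.bin")

def llama_cpp_inference(prompt, **kwargs):
    # plain text-completion call; any extra kwargs from Guardrails land here
    output = llm(
        prompt,
        max_tokens=kwargs.get("max_tokens", 512),
        temperature=kwargs.get("temperature", 0.3),
    )
    return output["choices"][0]["text"]

guard = gd.Guard.from_rail('getting_started.rail')
raw_llm_output, validated_output = guard(
    llama_cpp_inference,
    prompt_params={"doctors_notes": doctors_notes},
)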

irgolic commented 1 year ago

Hey, Guardrails natively supports a couple of different LLM providers (OpenAI, Cohere), as well as Manifest, which supports many more. You can also pass in any arbitrary Python function that takes "prompt" and "instructions" as arguments.

Please see the LLM API docs for more details.
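
For completeness, the arbitrary-callable path looks roughly like this (a minimal sketch; my_llm_api, its body, and my_spec.rail are placeholders):

import guardrails as gd

def my_llm_api(prompt, instructions=None, **kwargs):
    # any callable that takes the prompt (and optionally instructions)
    # and returns the raw LLM output as a string will do;
    # this echo is just a placeholder for your own model call
    return f"<llm output for: {prompt}>"

guard = gd.Guard.from_rail('my_spec.rail')
raw_llm_output, validated_output = guard(
    my_llm_api,
    prompt_params={"doctors_notes": "..."},
)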

I'm closing this issue for now, but feel free to reopen it if you run into any issues :)

Anindyadeep commented 1 year ago

Yes, thanks @irgolic, that is exactly what I was looking for.