mrseanryan / gpt-dm

Data modelling via natural language using an LLM. Outputs JSON or SQL. Also generates Test data in SQL or CSV format.
MIT License

Try other LLM - HuggingFace's Code Llama (Llama 2 learns to code) #6

Open mrseanryan opened 10 months ago

mrseanryan commented 10 months ago

https://huggingface.co/blog/codellama

mrseanryan commented 10 months ago

Seems possible with the 13B version:

https://huggingface.co/codellama/CodeLlama-13b-hf#model-use

and a carefully written prompt:

https://huggingface.co/blog/codellama#how-to-use-code-llama


create a DOT graph to decide a mortgage loan. if credit score is greater than 700 then check years employed. else reject.

if years employed is greater than 3 then approve. else reject.

DOT:
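The base model is completion-only, so the prompt ends with a `DOT:` cue for the model to continue with graph source rather than prose. A small helper to build such prompts (the function name is hypothetical, just a sketch):

```python
def build_dot_prompt(description: str) -> str:
    """Build a completion-style prompt for a base code model.

    The trailing 'DOT:' cue nudges a pure completion model to
    continue with Graphviz DOT source rather than prose.
    """
    return f"create a DOT graph to {description}\n\nDOT:\n"

prompt = build_dot_prompt(
    "decide a mortgage loan. if credit score is greater than 700 "
    "then check years employed. else reject. "
    "if years employed is greater than 3 then approve. else reject."
)
print(prompt)
```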

mrseanryan commented 10 months ago

outputs:

create a DOT graph to decide a mortgage loan. if credit score is greater than 700 then check years employed. else reject.

if years employed is greater than 3 then approve. else reject.

DOT:

digraph {
    rankdir=LR;
    node [shape = box];
    start [label="start"];
    end [label="end"];
    start -> a [label="credit score > 700"];
    start -> b [label="credit score < 700"];
    a -> c [label="years employed > 3"];
    a -> d [label="years employed < 3"];
    b -> e [label="reject"];
    c -> f [label="approve"];
    d -> g [label="reject"];
    f -> end;
    g -> end;
}

(the model then continued with runaway repetition: `2.1.2.1.1.1.1…` — truncated here)
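Because the base model keeps completing past the graph (the runaway digits above), the raw completion needs trimming at the closing brace of the `digraph` block. A stdlib-only sketch (helper name is my own):

```python
def extract_digraph(completion: str) -> str:
    """Return the first balanced 'digraph { ... }' block from model output."""
    start = completion.find("digraph")
    if start == -1:
        raise ValueError("no digraph found in completion")
    depth = 0
    for i, ch in enumerate(completion[start:], start):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Include the closing brace, drop everything after it.
                return completion[start : i + 1]
    raise ValueError("unbalanced braces in completion")

raw = 'digraph {\n  a -> b [label="x"];\n}\n\n2.1.2.1.1.1.1.'
print(extract_digraph(raw))  # the graph only; the runaway digits are dropped
```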

mrseanryan commented 10 months ago

Describing a graph is possible via the separate Instruct 13B model:

https://huggingface.co/spaces/codellama/codellama-13b-chat

https://huggingface.co/codellama/CodeLlama-13b-Instruct-hf#model-use

prompt:

explain this DOT graph:

Prompt that gets a high-level summary in the first paragraph:

what is the overall purpose of this DOT graph:

mrseanryan commented 10 months ago

Maybe the Instruct model can be used for BOTH generation and description.

mrseanryan commented 10 months ago

The non-instruct model can also generate labels that refer to known properties, with node names drawn from a whitelist of prefixes:

create a DOT graph to decide a mortgage loan. if credit score is greater than 700 then check years employed. else reject.

if years employed is greater than 3 then approve. else reject.

name the DOT nodes with a prefix decision or end or other_.

In the labels, refer to the available properties: applicant.credit_score, applicant.years_employed, applicant.other

DOT:

---

create a DOT graph to decide a mortgage loan. if credit score is greater than 700 then check years employed. else reject.

if years employed is greater than 3 then approve. else reject.

name the DOT nodes with a prefix decision or end or other_.

In the labels, refer to the available properties: applicant.credit_score, applicant.years_employed, applicant.other

DOT:

digraph {
    rankdir=LR;
    node [shape=box];
    start [label="start"];
    end_reject [label="end_reject"];
    end_approve [label="end_approve"];
    decision_credit_score [label="credit_score"];
    decision_years_employed [label="years_employed"];
    start -> decision_credit_score;
    decision_credit_score -> end_reject [label="applicant.credit_score < 700"];
    decision_credit_score -> decision_years_employed [label="applicant.credit_score >= 700"];
    decision_years_employed -> end_approve [label="applicant.years_employed > 3"];
    decision_years_employed -> end_reject [label="applicant.years_employed <= 3"];
}
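The prefix whitelist from the prompt can be checked mechanically after generation. A minimal stdlib sketch (the regexes and function name are my own, not part of the repo; `start` is tolerated as the root node, matching the output above):

```python
import re

# Whitelisted prefixes from the prompt.
ALLOWED_PREFIXES = ("decision_", "end_", "other_")
# DOT keywords/attributes that the crude regexes below may also capture.
DOT_KEYWORDS = {"digraph", "node", "rankdir", "label", "shape"}

def check_node_prefixes(dot: str, extra_allowed=("start",)):
    """Return node names in the DOT source that violate the prefix whitelist."""
    # Identifiers declared before '[' or used on either side of '->'.
    names = set(re.findall(r"\b([A-Za-z_][A-Za-z0-9_]*)\s*(?:\[|->)", dot))
    names |= set(re.findall(r"->\s*([A-Za-z_][A-Za-z0-9_]*)", dot))
    bad = []
    for name in sorted(names - DOT_KEYWORDS):
        if name in extra_allowed or name.startswith(ALLOWED_PREFIXES):
            continue
        bad.append(name)
    return bad

dot = """digraph {
    start -> decision_credit_score;
    decision_credit_score -> end_reject [label="applicant.credit_score < 700"];
}"""
print(check_node_prefixes(dot))  # -> [] (all names conform)
```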
mrseanryan commented 8 months ago

To use via transformers (locally)

# (windows)
cd my-project
py -m venv env
.\env\Scripts\activate

# verify the venv's python is the active one:
where python
# expected output:
# .\env\Scripts\python.exe

# when done:
deactivate

ref = https://packaging.python.org/en/latest/guides/installing-using-pip-and-virtual-environments/

Then install the development version of transformers (required for this model), plus accelerate:

pip install git+https://github.com/huggingface/transformers.git@main accelerate

pip freeze > requirements.txt

Next time around, the environment can be restored via:

pip install -r requirements.txt

Using the LLM:

# Option 1: use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="codellama/CodeLlama-13b-Python-hf")

# Option 2: load the tokenizer and model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-13b-Python-hf")
model = AutoModelForCausalLM.from_pretrained("codellama/CodeLlama-13b-Python-hf")

mrseanryan commented 8 months ago

Try a quantized LLM

https://huggingface.co/TheBloke/CodeLlama-13B-GGUF

instruct version:

https://huggingface.co/TheBloke/CodeLlama-13B-Instruct-GGUF