guardrails-ai / guardrails

Adding guardrails to large language models.
https://www.guardrailsai.com/docs
Apache License 2.0

module 'numpy' has no attribute 'bool'. #166

Closed sharadregoti closed 1 year ago

sharadregoti commented 1 year ago

**Describe the bug**
I am using guardrails-ai for a small Python project. I have followed the getting started guide and modified the `.rail` spec and prompt as per my requirements.

The code snippet below is taken from the getting started guide.

`print(validated_output)` prints `None` to stdout; ideally I wanted a JSON string.

When I viewed the logs, I found that guardrails-ai was able to get the output from the LLM but was not able to return it to my Python code. The logs showed this error: `module 'numpy' has no attribute 'bool'`.
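For context on the underlying error: `np.bool` was a deprecated alias for the builtin `bool` and was removed in NumPy 1.24, so any dependency that still references it raises this `AttributeError`. One common workaround, assuming you control the environment, is to pin `numpy<1.24`, or to restore the alias with a small compatibility shim before importing the offending library (a generic sketch, not an official guardrails fix):

```python
import numpy as np

# np.bool was removed in NumPy 1.24; it had been a deprecated alias for the
# builtin bool. Restoring the alias keeps older libraries working.
if not hasattr(np, "bool"):
    np.bool = bool

print(np.bool(1))  # True
```

Pinning the version (`pip install "numpy<1.24"`) is the less invasive fix if the rest of your stack allows it.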

**To Reproduce**
Steps to reproduce the behavior:

  1. RAIL spec

```xml
<rail version="0.1">
I have shared sample data of offer letter which has a CTC amount and its breakdown in it {{table}} @complete_json_suffix_v2
```

2. Runtime arguments (e.g. `guard(...)`)
My Python code:

```python
import os
import tabula
import openai
import guardrails as gd

# Get the path to the PDF file
pdf_file_path = "/home/sharad/personal/test-python-salary-gpt/test.pdf"

# Extract the table from the PDF file
table = tabula.read_pdf(pdf_file_path)

prompt = """${table}"""

print(prompt.format(table=table))

guard = gd.Guard.from_rail('spec.rail')

# Set your OpenAI API key
os.environ["OPENAI_API_KEY"] = ""

# Wrap the OpenAI API call with the guard object
raw_llm_output, validated_output = guard(
    openai.Completion.create,
    prompt_params={"table": prompt.format(table=table)},
    engine="text-davinci-003",
    max_tokens=1024,
    temperature=0.5,
)

# Print the validated output from the LLM
print(validated_output)
```


I get the output as `None`.

**Expected behavior**
I should have gotten JSON output from the `print(validated_output)` statement.

**Library version:**
Guardrails version: 0.1.6

**Additional context**

Here are the guardrails logs:

[tmp.txt](https://github.com/ShreyaR/guardrails/files/11550720/tmp.txt)

Output when I run my program:

![image](https://github.com/ShreyaR/guardrails/assets/24411676/d18d207b-282e-41ec-92ed-c5a525feb0a4)
ShreyaR commented 1 year ago

Thanks for sharing the detailed instructions @sharadregoti!

I tried this rail spec, which worked for me -- can you give it a shot?

<rail version="0.1">

<output>
    <integer name="epf" description="Employee Provident Fund Amount (EPF) per annum" />
    <integer name="gratuity" description="Gratuity per annum" />
    <integer name="medialInsurance" description="Medical Insurance per annum" />
    <integer name="termInsurance" description="Term Insurance per annum" />
    <integer name="ctc" description="Cost To Company per annum" />
    <object name="miscellaneous" description="Cost To Company per annum">
    </object>
</output>

<prompt>

I have shared sample data of offer letter which has a CTC amount and its breakdown in it

{{table}}

@complete_json_suffix_v2
</prompt>
</rail>

The main change is in the spec: I changed the output type from string to integer for all dictionary values. This fixed the parser error, and validation then proceeded as expected.

sharadregoti commented 1 year ago

I had tried changing string to integer, but no luck -- same result.

(screenshots attached)

There is this `JSONDecodeError` -- could this be the issue?

-- message: {"'output'": '\'{\\n    "epf": 21,600,\\n    "gratuity": 35,760,\\n    "medialInsurance": 3,060,\\n    "termInsurance": 3,672,\\n    "ctc": 1,583,548,\\n    "miscellaneous": {\\n        "HRA": 371,748,\\n        "specialAllowance": 305,148,\\n        "internet": 30,000,\\n        "technicalBooks": 10,000,\\n        "giftVoucher": 5,004,\\n        "pvAmount": 53,112,\\n        "ctcMedicalPrem": 948\\n    }\\n}\'', "'output_as_dict'": 'None', "'error'": "JSONDecodeError('Expecting property name enclosed in double quotes: line 2 column 15 (char 16)')", "'timestamp'": '1684915582.1477263', "'task_uuid'": "'28d0be01-469f-470a-ba1b-d4fead86aa14'", "'task_level'": '[2, 2, 3, 2]', "'message_type'": "'info'"}
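Looking at that log line, the failure is reproducible with the standard library alone: the LLM returned numbers with thousands separators (`21,600`), which is not valid JSON, so `json.loads` fails exactly as logged. A minimal sketch with a hand-made payload (not the actual guardrails internals):

```python
import json

# Comma-formatted numbers are not valid JSON: after parsing "21" and the
# comma, the decoder expects the next object key and instead finds "6".
raw = '{"epf": 21,600, "gratuity": 35,760}'
try:
    json.loads(raw)
except json.JSONDecodeError as err:
    print(err)  # Expecting property name enclosed in double quotes: ...
```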
ShreyaR commented 1 year ago

@sharadregoti can you try this rail spec?

<rail version="0.1">

<output>
    <integer name="epf" description="Employee Provident Fund Amount (EPF) per annum" />
    <integer name="gratuity" description="Gratuity per annum" />
    <integer name="medialInsurance" description="Medical Insurance per annum" />
    <integer name="termInsurance" description="Term Insurance per annum" />
    <integer name="ctc" description="Cost To Company per annum" />
    <object name="miscellaneous" description="Cost To Company per annum">
    </object>
</output>

<instructions>
You are a helpful assistant only capable of communicating with valid JSON, and no other text.

@json_suffix_prompt_examples
</instructions>

<prompt>

I have shared sample data of offer letter which has a CTC amount and its breakdown in it.

{{table}}

If extracting any integer value, make sure to extract it as a number and not as a string.
This means that if the value is 1,00,000, then it should be extracted as 100000 and not as 1,00,000.

If you are unable to extract any value, use `null`.

@xml_prefix_prompt

{output_schema}
</prompt>
</rail>

I did some prompt engineering to make it work with your table. The issue was that the source table had numbers with commas, which broke the JSON decoding.
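If the model still slips commas in despite the prompt instructions, a generic post-processing fallback (a sketch, not a guardrails feature) is to strip thousands separators before decoding:

```python
import json
import re

raw = '{"epf": 21,600, "miscellaneous": {"HRA": 371,748}}'

# Drop commas that sit between two digits (thousands separators).
# Real JSON delimiters here are followed by whitespace or a quote, so they survive.
cleaned = re.sub(r"(?<=\d),(?=\d)", "", raw)
print(json.loads(cleaned))  # {'epf': 21600, 'miscellaneous': {'HRA': 371748}}
```

Note that this would also merge adjacent elements in minified arrays like `[1,2]`, so it only suits cleaning comma-separated thousands out of pretty-printed objects.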

Also, I recommend running this with temperature 0.0.

I tested this out with gpt-3, gpt-3.5 and gpt-4, and it worked across all 3.

# GPT-3
raw_llm_output, validated_output = guard(
    openai.Completion.create,
    prompt_params={"table": table},
    engine="text-davinci-003",
    max_tokens=1024,
    temperature=0.0,
)

# GPT-3.5
raw_llm_output, validated_output = guard(
    openai.ChatCompletion.create,
    prompt_params={"table": table},
    model="gpt-3.5-turbo",  # ChatCompletion takes a chat model, not text-davinci-003
    max_tokens=1024,
    temperature=0.0,
)
irgolic commented 1 year ago

Closed due to inactivity. Feel free to reopen.