nlmatics / llmsherpa

Developer APIs to Accelerate LLM Projects
https://www.nlmatics.com
MIT License
1.17k stars, 117 forks

LayoutPDFReader_Demo.ipynb test: request to confirm an error, and request for code to translate the fine-tuning section with GPT #31

Open EddyLab-AI opened 7 months ago

EddyLab-AI commented 7 months ago

I am reporting a problem I ran into while testing with LayoutPDFReader_Demo.ipynb.

1. Reading the PDF from an external URL fails

pdf_url = "https://arxiv.org/pdf/1910.13461.pdf"

```
UnboundLocalError Traceback (most recent call last)

in ()
      4 pdf_url = "https://arxiv.org/pdf/1910.13461.pdf" # also allowed is a file path e.g. /home/downloads/xyz.pdf
      5 pdf_reader = LayoutPDFReader(llmsherpa_api_url)
----> 6 doc = pdf_reader.read_pdf(pdf_url)
```

2. Reading the file locally (success)

I downloaded the PDF from pdf_url = "https://arxiv.org/pdf/1910.13461.pdf" and uploaded "1910.13461.pdf" to the "downloads" folder.

3. Question: What code should I use to ask GPT to translate the text of the fine-tuning section into another language?

```
from IPython.core.display import display, HTML
selected_section = None
# find a section in the document by title
for section in doc.sections():
    if section.title == '3 Fine-tuning BART':
        selected_section = section
        break
HTML(selected_section.to_html(include_children=True, recurse=True))
```

4. Custom summary of this text using a prompt (error)

resp = OpenAI().complete(f"read this text and answer question: {question}:\n{context}")

LocalProtocolError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/httpcore/_exceptions.py in map_exceptions(map)
      9     try:
---> 10         yield
     11     except Exception as exc:  # noqa: PIE786

I also tried asking a bug question about GPT, but couldn't find a suitable fix, so I'm leaving my question here.

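As a possible workaround for item 1, the PDF can be fetched to a local file first and the path handed to `read_pdf`, since the demo comment says a file path is also allowed and item 2 reports that the local copy worked. This is a minimal sketch using only the standard library. The helper name `download_pdf` and the destination path are hypothetical, and whether this avoids the `UnboundLocalError` in every case is an assumption.

```python
import urllib.request
from pathlib import Path


def download_pdf(url: str, dest: str) -> Path:
    """Fetch `url` and save it to `dest`, returning the local path."""
    dest_path = Path(dest)
    dest_path.parent.mkdir(parents=True, exist_ok=True)
    # urlopen raises URLError/HTTPError on failure, so a failed download
    # does not silently leave an empty file behind.
    with urllib.request.urlopen(url) as response:
        data = response.read()
    # PDF files start with the bytes "%PDF"; reject anything else
    # (for example an HTML error page).
    if not data.startswith(b"%PDF"):
        raise ValueError(f"{url} did not return a PDF")
    dest_path.write_bytes(data)
    return dest_path


# Usage (hypothetical path; pdf_reader is the LayoutPDFReader from the demo):
# local_pdf = download_pdf("https://arxiv.org/pdf/1910.13461.pdf", "downloads/1910.13461.pdf")
# doc = pdf_reader.read_pdf(str(local_pdf))
```
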
asukla commented 7 months ago

Hello,

I think your openai key is not set, hence the openai call is failing.

For translation you can change the prompt as follows:

resp = OpenAI().complete(f"read this text and translate to french: \n{context}")
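
The same prompt pattern can be parameterized by target language. This is a minimal sketch: `build_translation_prompt` is a hypothetical helper, not part of llmsherpa or llama_index, and the prompt wording simply follows the line above.

```python
def build_translation_prompt(context: str, language: str = "french") -> str:
    """Build a translation prompt in the same shape as the one above."""
    if not context.strip():
        # Fail early instead of sending an empty translation request.
        raise ValueError("context is empty; nothing to translate")
    return f"read this text and translate to {language}: \n{context}"


# Usage (assumes OpenAI is the llama_index LLM class and an API key is configured):
# resp = OpenAI().complete(build_translation_prompt(context, "korean"))
# print(resp.text)
```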

EddyLab-AI commented 7 months ago

Hi,

Thank you very much for the quick reply :) I will re-test the solution you provided and post a report.

1. Setting the OpenAI key

I have recently been using the Secrets feature that Colab provides for storing keys.

```
from google.colab import userdata
api_key = userdata.get('OPENAI_API_KEY')

from openai import OpenAI

# Client preparation
client = OpenAI(
    api_key=api_key
)
```

2. I confirmed that the OpenAI key was working normally, and the lookup with section.title == '3 Fine-tuning BART' returned the expected result.

```
from IPython.core.display import display, HTML
selected_section = None
# find a section in the document by title
for section in doc.sections():
    if section.title == '3 Fine-tuning BART':
        selected_section = section
        break

HTML(selected_section.to_html(include_children=True, recurse=True))
```

When it works, the output looks like this:

3 Fine-tuning BART

The representations produced by BART can be used in several ways for downstream applications.

3.1 Sequence Classification Tasks
. . .
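
The title lookup in step 2 can be wrapped in a small helper that returns `None` when no section matches, so a missing title shows up before `to_html` is called. This is a minimal sketch: `find_section` is a hypothetical helper, the case-insensitive comparison is my own choice, and the stand-in objects exist only so the example runs without llmsherpa installed.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Optional


def find_section(sections: Iterable[Any], title: str) -> Optional[Any]:
    """Return the first section whose title matches, or None if absent."""
    wanted = title.strip().lower()
    for section in sections:
        # Compare ignoring case and surrounding whitespace.
        if section.title.strip().lower() == wanted:
            return section
    return None


# Stand-in objects with a .title attribute (not real llmsherpa sections).
fake_sections = [
    SimpleNamespace(title="2 Model"),
    SimpleNamespace(title="3 Fine-tuning BART"),
]
match = find_section(fake_sections, "3 fine-tuning bart")
print(match.title if match else "section not found")

# With the real document (assumes doc comes from pdf_reader.read_pdf):
# selected_section = find_section(doc.sections(), '3 Fine-tuning BART')
```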

===

3. Apply translation prompt

```
from llama_index.llms import OpenAI
context = selected_section.to_html(include_children=True, recurse=True)
resp = OpenAI().complete(f"read this text and translate to korean: \n{context}")
print(resp.text)
```


LocalProtocolError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/httpcore/_exceptions.py in map_exceptions(map)
      9     try:
---> 10         yield
     11     except Exception as exc:  # noqa: PIE786

106 frames

/usr/local/lib/python3.10/dist-packages/httpcore/_sync/http11.py in _send_request_headers(self, request)
    141         with map_exceptions({h11.LocalProtocolError: LocalProtocolError}):
--> 142             event = h11.Request(
    143                 method=request.method,

/usr/local/lib/python3.10/dist-packages/h11/_events.py in __init__(self, method, headers, target, http_version, _parsed)
     95         object.__setattr__(
---> 96             self, "headers", normalize_and_validate(headers, _parsed=_parsed)
     97         )

/usr/local/lib/python3.10/dist-packages/h11/_headers.py in normalize_and_validate(headers, _parsed)
    163             validate(_field_name_re, name, "Illegal header name {!r}", name)
--> 164             validate(_field_value_re, value, "Illegal header value {!r}", value)
    165             assert isinstance(name, bytes)

/usr/local/lib/python3.10/dist-packages/h11/_util.py in validate(regex, data, msg, *format_args)
     90             msg = msg.format(*format_args)
---> 91         raise LocalProtocolError(msg)
     92     return match.groupdict()

LocalProtocolError: Illegal header value b'Bearer '

The above exception was the direct cause of the following exception:

LocalProtocolError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/httpx/_transports/default.py in map_httpcore_exceptions()
     65     try:
---> 66         yield
     67     except Exception as exception:

. . . . .

I confirmed that the same error occurred.
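
The header value `b'Bearer '` in the traceback suggests the request was sent with an empty API key. One possible explanation, which is an assumption and not confirmed in this thread, is that the `client` created in step 1 is never used by `OpenAI()` from `llama_index.llms`. That object is constructed without a key, so it has to find one elsewhere, for example in the `OPENAI_API_KEY` environment variable. The sketch below is a guard that fails early with a readable message; `export_openai_key` is a hypothetical helper.

```python
import os


def export_openai_key(api_key):
    """Validate the key and expose it via the OPENAI_API_KEY environment variable."""
    key = (api_key or "").strip()
    if not key:
        # An empty key is consistent with the header value b'Bearer ' seen above.
        raise ValueError("OpenAI API key is empty or missing")
    os.environ["OPENAI_API_KEY"] = key
    return key


# Usage in Colab (assumptions: the secret is named OPENAI_API_KEY, notebook
# access to it has been granted, and context is the section HTML from step 3):
# from google.colab import userdata
# from llama_index.llms import OpenAI
# api_key = export_openai_key(userdata.get('OPENAI_API_KEY'))
# resp = OpenAI(api_key=api_key).complete(f"read this text and translate to korean: \n{context}")
# print(resp.text)
```

Passing the key explicitly and also exporting it covers both cases: libraries that take an `api_key` argument and libraries that only read the environment.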