zerebom / gpt-pdf-summarizer

PDF summarizer that leverages GPT AI to generate summaries from uploaded PDF files. The application uses FastAPI for the backend and Streamlit for the frontend. The project was created with the assistance of AI language models.

For some reason, error 500 appears. #4

Open ngonbei opened 1 year ago

ngonbei commented 1 year ago

(attached screenshots: image1, image2)

An error appears as shown in the screenshots above. I kept trying on my own for two days, but it still does not work.

I created a new OpenAI API key and have confirmed that it works with other API services and applications.

ChatGPT4 advised me that I need to change pdf_summary.py and summary_service.py.

The build environment is as follows: Windows 10 Pro, VSCode, Docker Compose.

Line endings have been converted from CRLF to LF.

Thank you in advance for any advice.

zerebom commented 1 year ago

Thank you for your question. It appears that an unintended change was made to a certain function during refactoring, causing it to crash with an error. This has been fixed in the commit here: https://github.com/zerebom/gpt-pdf-summarizer/commit/597753dbbd7c50082d1b520a7d2cd8275b61e762. Please confirm.

Also, please note that the current implementation outputs summaries in Japanese. If necessary, kindly rewrite the prompt for the generate_summary function.
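For illustration, a minimal sketch of what an English-output prompt could look like, assuming generate_summary is built on the pre-1.0 openai ChatCompletion API; the model name and prompt wording here are assumptions, not the repository's actual code:

import openai

def generate_summary(text: str, max_length: int = 100) -> str:
    # Assumed prompt: ask for an English summary instead of a Japanese one.
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",  # assumed model name
        messages=[
            {"role": "system",
             "content": f"Summarize the user's text in English in about {max_length} words."},
            {"role": "user", "content": text},
        ],
    )
    return response["choices"][0]["message"]["content"]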

Moreover, be aware that the current implementation throws an error when inputting more than 4,000 tokens. If you're willing, we would greatly appreciate it if you could submit a PR to fix this issue.

ngonbei commented 1 year ago

Thank you for your answer.

I tried but could not solve the problem.

I get the exact same error as well.

597753d. Please confirm.
↑ I tried applying that commit, but the error still occurs.

Also, please note that the current implementation outputs summaries in Japanese. If necessary, kindly rewrite the prompt for the generate_summary function.

Moreover, be aware that the current implementation throws an error when inputting more than 4,000 tokens. If you're willing, we would greatly appreciate it if you could submit a PR to fix this issue.
↑ I asked ChatGPT4 to write a Japanese text of less than 1,000 tokens, based on the number of tokens ChatGPT4 can answer, and I also had Japanese language professionals check the text, so I don't think the 4,000 token limit is the cause.

jdwngdev commented 1 year ago

I am also getting the same issue. I've made sure that my document is under the 4k token limit too. It looks like the error is coming from something else.

masakiq commented 1 year ago

@ngonbei @jdwngdev

This error is probably caused by either "the environment variable OPENAI_API_KEY is not set" or "the maximum number of tokens has been exceeded".

When "the environment variable OPENAI_API_KEY is not set"

Cause

This web application runs two processes: the main Streamlit frontend process and the FastAPI backend process that accepts file uploads.

This error is occurring in the "process that accepts file uploads". If you look closely at the error image, you will see the URL http://0.0.0.0:8001.


The API key entered in the "Type your OPENAI_API_KEY here" field shown on the screen is only used by the main process. Therefore, the "process that accepts file uploads" cannot obtain the API key and an error will occur.
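As a rough illustration of why the environment variable matters: the upload/summarization process would typically pick up the key from its own environment at startup, along these lines (a minimal sketch, not the repository's actual code; configure_openai is a hypothetical helper name):

import os
import openai

def configure_openai() -> None:
    # Hypothetical helper: the backend process only sees the key if it is
    # set in its own environment; the value typed into the Streamlit UI
    # stays in the main (frontend) process.
    api_key = os.getenv("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set for the backend process")
    openai.api_key = api_key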

If you see the following error log on the terminal screen, it is because OPENAI_API_KEY is not set.

openai: error_code=None error_message="You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys." error_param=None error_type=invalid_request_error message='OpenAI API error received' stream_error=False

Solution

Set the API key obtained from OpenAI in the environment variable OPENAI_API_KEY (xxxxxxx below stands for your actual key).

export OPENAI_API_KEY=xxxxxxx
echo $OPENAI_API_KEY
xxxxxxx ← OK if the value you set is printed.

Then, start up with the following command.

docker-compose up --build

When "The maximum number of tokens has been exceeded"

Cause

If you get the following error on the terminal screen, it is a "Token size exceeded the allowed amount" error.

openai: error_code=context_length_exceeded error_message="This model's maximum context length is 4097 tokens. Please reduce the length of the messages." error_param=messages error_type=invalid_request_error message='OpenAI API error received' stream_error=False

Actually, in Japanese, one character is roughly 1.3 ~ 1.5 tokens. Therefore, the 4097-token limit corresponds to about 2500 ~ 3000 Japanese characters.
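As a rough check: 4097 tokens / 1.5 tokens per character is about 2,700 characters, and 4097 / 1.3 is about 3,100 characters, which is where the 2500 ~ 3000 character guideline comes from (leaving some margin for the prompt and the generated summary).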

Solution

Modify app/services/summary_service.py as follows and restart. If the error occurs even at 3000 characters, adjust the value to something like 2500.

diff --git a/app/services/summary_service.py b/app/services/summary_service.py
index 0e5df1b..a11e353 100644
--- a/app/services/summary_service.py
+++ b/app/services/summary_service.py
@@ -40,7 +40,7 @@ def generate_summary(text: str, max_length: int = 100) -> str:
 def summarize_large_text(conversations: Conversations,
                          text: str,
                          max_summarize_chars: int = 9000,
-                         max_chars_per_request: int = 4000,
+                         max_chars_per_request: int = 3000,
                          summary_length: int = 1000) -> Conversations:
     wrapped_text = textwrap.wrap(text, max_chars_per_request)
     length =  max_summarize_chars // max_chars_per_request

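For reference, textwrap.wrap splits the input into chunks of at most max_chars_per_request characters, so lowering this value keeps each individual request (chunk plus prompt) within the model's 4097-token context window.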
A better method is to "reduce the number of tokens and re-run when the token size is exceeded". Please refer to the following.

https://github.com/masakiq/gpt-pdf-summarizer/commit/6397cfd000b3a4f442793c0db8ac58558bcbee04
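The general shape of that retry approach, sketched very roughly with the pre-1.0 openai library (this is not the code from the commit above; the function name and the halving strategy are assumptions for illustration):

import openai

def summarize_with_retry(text: str, max_chars: int = 4000) -> str:
    # Hypothetical helper: if the request exceeds the model's context
    # window, halve the amount of input text and try again.
    while max_chars > 500:
        try:
            response = openai.ChatCompletion.create(
                model="gpt-3.5-turbo",
                messages=[{
                    "role": "user",
                    "content": f"Summarize the following text:\n{text[:max_chars]}",
                }],
            )
            return response["choices"][0]["message"]["content"]
        except openai.error.InvalidRequestError as e:
            if "maximum context length" not in str(e):
                raise
            max_chars //= 2
    raise RuntimeError("Could not fit the text into the model's context window")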

ngonbei commented 1 year ago

Thank you for your answer. Unfortunately, I have already tried the export and echo steps you described, and the error still was not resolved.

I have tried Mac, Windows, and Linux operating systems, and I have tried bash, zsh, PowerShell, and cmd in the terminal, but the problem persists.

By the way, when I try the export method you suggested, I get the following error:

RateLimitError: Your account is not active, please check your billing details on our website.

However, this cannot be right. I am entering an API key that actually works in other programs, so it is not that the key is unusable, and it is definitely active.

I really don't understand.

Did it actually work for you (the respondent)?