scottleibrand / gpt-summarizer

Extract text from PDF, summarize each section w/ GPT, and provide a summarized outline of the paper
MIT License

quota error? #1

Open jasonbehrstock opened 1 year ago

jasonbehrstock commented 1 year ago

Scott,

I haven't played with OpenAI at all, but your project looked cool, so I set up an account and thought I'd see how it worked. I've run it on a few PDFs and keep getting the same error. Maybe you can let me know whether you think this is on my end or is a bug.

Below is what I get when it runs. I just updated Python and all relevant packages.

Thanks, Jason

iMac ~/Downloads/gpt-summarizer-main $ python summarize.py test.pdf
Text extracted from test.pdf and written to test.full.txt
Total token count: 19034
Header: Title-Abstract
Title-Abstract (7357 characters, 2095 tokens) written to test.TitleAbstract.full.txt
Traceback (most recent call last):
  File "/Users/jason/Downloads/gpt-summarizer-main/summarize.py", line 453, in <module>
    summary = generate_summary(subcontent, prompt, model_engine, max_tokens)
  File "/Users/jason/Downloads/gpt-summarizer-main/summarize.py", line 295, in generate_summary
    completions = openai.Completion.create(
  File "/usr/local/lib/python3.10/site-packages/openai/api_resources/completion.py", line 25, in create
    return super().create(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 115, in create
    response, _, api_key = requestor.request(
  File "/usr/local/lib/python3.10/site-packages/openai/api_requestor.py", line 181, in request
    resp, got_stream = self._interpret_response(result, stream)
  File "/usr/local/lib/python3.10/site-packages/openai/api_requestor.py", line 396, in _interpret_response
    self._interpret_response_line(
  File "/usr/local/lib/python3.10/site-packages/openai/api_requestor.py", line 429, in _interpret_response_line
    raise self.handle_error_response(
openai.error.RateLimitError: You exceeded your current quota, please check your plan and billing details.
iMac ~/Downloads/gpt-summarizer-main $

scottleibrand commented 1 year ago

That's not a software installation problem, that's a problem with your OpenAI account. You can go to https://beta.openai.com/account/usage to see your overall daily/cumulative usage, and a breakdown of a given day's usage down to the level of individual 5 minute periods by model. You can also go to https://beta.openai.com/account/billing/limits to set monthly usage limits, as well as a soft limit (email warning).

If you see a large number of text-davinci requests at the time you were running this script, let me know. The number of requests should match the number of summaries you see generated in the command-line output.

scottleibrand commented 1 year ago

For comparison, here's what I see on https://beta.openai.com/account/usage for the two script runs used to generate the examples at https://github.com/scottleibrand/gpt-summarizer/tree/main/examples. At $0.02 per 1k tokens, that comes to $0.76 worth of usage to summarize those two documents. Not cheap, but if they're documents I'd have had to read otherwise and my free time is worth $20/hr, the summaries would need to save me at least 1-2 minutes of reading time per document to be worth doing.

[image: usage page screenshot]
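To make the break-even estimate above concrete, here is a small sketch of the arithmetic. The function name and parameters are made up for illustration; the only inputs taken from the comment are the $0.76 total, the two documents, and the $20/hr figure.

```python
def break_even_minutes(total_cost_usd: float, documents: int, hourly_value_usd: float) -> float:
    """Minutes of reading time each summary must save to pay for itself."""
    cost_per_document = total_cost_usd / documents
    value_per_minute = hourly_value_usd / 60
    return cost_per_document / value_per_minute


# Figures quoted in the comment above: $0.76 for two documents, time valued at $20/hr.
minutes = break_even_minutes(total_cost_usd=0.76, documents=2, hourly_value_usd=20.0)
print(f"Break-even reading time saved per document: {minutes:.2f} minutes")
```

That works out to a little over one minute per document, which is where the "1-2 minutes" estimate comes from.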

jasonbehrstock commented 1 year ago

Thanks! Yes, that makes sense; I hadn't realized that this was the same account I had been using to play with DALL-E.

In a different vein though, I was curious how this would do on a mathematics paper, so I ran one of my papers through this and got a summary that is clearly unrelated to the paper. I have attached the files in case you want to try to figure out what went wrong...

HHS_quasiflats.full.txt HHS_quasiflats.overall_summary.txt HHS_quasiflats.pdf HHS_quasiflats.TitleAbstract.full.txt HHS_quasiflats.TitleAbstract.summary.txt

scottleibrand commented 1 year ago

That has a TOC that was tripping up my code, in the format:

Contents

Introduction
1. Background
2. Cubulation of hulls
3. Quasiflats and asymptotic cones
4. Orthants and quasiflats
5. Induced maps on hinges: mapping class group rigidity
6. Factored spaces
References

It was stripping everything after that instance of References, which isn't helpful in this case. ;-)

As a quick fix (commit c2a1d9bd8a61c51b10af852174465fb9b8a3e97b), I modified the regex to require two newlines before References to treat it as a "real" References section and strip it out. But we should probably figure out a more durable fix.
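Roughly, the idea behind the quick fix looks like the sketch below. This is not the exact regex from the commit; the pattern and the `strip_references` helper are assumptions written only to illustrate the "two newlines before References" rule.

```python
import re

# Hypothetical pattern: treat "References" as a real section heading only when
# it follows a blank line (two newlines) and sits alone on its own line.
REFERENCES_HEADING = re.compile(r"\n\s*\nReferences[ \t]*\n")


def strip_references(text: str) -> str:
    """Drop everything from the first 'real' References heading onward."""
    match = REFERENCES_HEADING.search(text)
    if match is None:
        return text
    return text[: match.start()] + "\n"


sample = (
    "Contents\nIntroduction\n6. Factored spaces\nReferences\n\n"
    "1. Background\nBody text.\n\n"
    "References\n[1] Some entry\n"
)
cleaned = strip_references(sample)
print(cleaned)
```

In the sample, the TOC entry follows "6. Factored spaces" after a single newline, so it is left alone, while the heading after the blank line is treated as the start of the reference list. A TOC laid out with blank lines between entries would still fool it, which is why a more durable fix is needed.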

scottleibrand commented 1 year ago

HHS_quasiflats.tar.gz

HHS_quasiflats.overall_summary.txt

scottleibrand commented 1 year ago

That overall summary is only based on the first 3000 tokens of the section summaries:

Exceeded 3000 tokens, stopping concatenation of summaries
Overall summary written to /Users/sleibrand/Downloads/HHS_quasiflats.overall_summary.txt
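For anyone following along, the cutoff behaves roughly like this sketch. The function names and the 4-characters-per-token estimate are assumptions for illustration; the script itself counts tokens with a real tokenizer and may handle the boundary differently.

```python
from typing import Callable, List


def rough_token_count(text: str) -> int:
    # Crude stand-in for a real tokenizer: assume about 4 characters per token.
    return max(1, len(text) // 4)


def concatenate_within_budget(
    summaries: List[str],
    max_tokens: int = 3000,
    count_tokens: Callable[[str], int] = rough_token_count,
) -> str:
    """Join summaries in order, stopping before the token budget is exceeded."""
    selected: List[str] = []
    used = 0
    for summary in summaries:
        cost = count_tokens(summary)
        if used + cost > max_tokens:
            print(f"Exceeded {max_tokens} tokens, stopping concatenation of summaries")
            break
        selected.append(summary)
        used += cost
    return "\n\n".join(selected)
```

Because the summaries are taken in document order, a long paper exhausts the budget on its early sections and the later ones never reach the overall summary.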

scottleibrand commented 1 year ago

It should probably summarize each section based on the section part summaries, and then use only the section summaries to generate the overall summary, but that didn't work in this case. Here's the full output:

~/src/gpt-summarizer $ ./summarize.py ~/Downloads/HHS_quasiflats.pdf
Found 13 sections.
Text extracted from /Users/sleibrand/Downloads/HHS_quasiflats.pdf and written to /Users/sleibrand/Downloads/HHS_quasiflats.full.txt
Total token count: 72851
Header:  Title-Abstract
Title-Abstract. Section intro (7388 characters, 1969 tokens) written to /Users/sleibrand/Downloads/HHS_quasiflats.TitleAbstractSectionintro.full.txt
Summary already exists at /Users/sleibrand/Downloads/HHS_quasiflats.TitleAbstractSectionintro.summary.txt
Title-Abstract. Section intro-part2 (5542 characters, 1585 tokens) written to /Users/sleibrand/Downloads/HHS_quasiflats.TitleAbstractSectionintropart2.full.txt
Summary already exists at /Users/sleibrand/Downloads/HHS_quasiflats.TitleAbstractSectionintropart2.summary.txt
Title-Abstract. Section intro-part3 (6382 characters, 1940 tokens) written to /Users/sleibrand/Downloads/HHS_quasiflats.TitleAbstractSectionintropart3.full.txt
Summary already exists at /Users/sleibrand/Downloads/HHS_quasiflats.TitleAbstractSectionintropart3.summary.txt
Title-Abstract. Section intro-part4 (5007 characters, 1439 tokens) written to /Users/sleibrand/Downloads/HHS_quasiflats.TitleAbstractSectionintropart4.full.txt
Summary already exists at /Users/sleibrand/Downloads/HHS_quasiflats.TitleAbstractSectionintropart4.summary.txt
Title-Abstract. Section intro-part5 (5524 characters, 1614 tokens) written to /Users/sleibrand/Downloads/HHS_quasiflats.TitleAbstractSectionintropart5.full.txt
Summary already exists at /Users/sleibrand/Downloads/HHS_quasiflats.TitleAbstractSectionintropart5.summary.txt
Title-Abstract. Section intro-part6 (7019 characters, 1949 tokens) written to /Users/sleibrand/Downloads/HHS_quasiflats.TitleAbstractSectionintropart6.full.txt
Summary already exists at /Users/sleibrand/Downloads/HHS_quasiflats.TitleAbstractSectionintropart6.summary.txt
Title-Abstract. Section intro-part7 (1473 characters, 389 tokens) written to /Users/sleibrand/Downloads/HHS_quasiflats.TitleAbstractSectionintropart7.full.txt
Summary already exists at /Users/sleibrand/Downloads/HHS_quasiflats.TitleAbstractSectionintropart7.summary.txt
No summary files found for section TitleAbstractSectionintropart7
Header:  1. Background
1. Section intro (5838 characters, 2002 tokens) written to /Users/sleibrand/Downloads/HHS_quasiflats.1Sectionintro.full.txt
Summary already exists at /Users/sleibrand/Downloads/HHS_quasiflats.1Sectionintro.summary.txt
1. Section intro-part2 (4517 characters, 1609 tokens) written to /Users/sleibrand/Downloads/HHS_quasiflats.1Sectionintropart2.full.txt
Summary already exists at /Users/sleibrand/Downloads/HHS_quasiflats.1Sectionintropart2.summary.txt
1. Section intro-part3 (5808 characters, 1990 tokens) written to /Users/sleibrand/Downloads/HHS_quasiflats.1Sectionintropart3.full.txt
Summary already exists at /Users/sleibrand/Downloads/HHS_quasiflats.1Sectionintropart3.summary.txt
1. Section intro-part4 (4569 characters, 1512 tokens) written to /Users/sleibrand/Downloads/HHS_quasiflats.1Sectionintropart4.full.txt
Summary already exists at /Users/sleibrand/Downloads/HHS_quasiflats.1Sectionintropart4.summary.txt
1. Section intro-part5 (6288 characters, 2330 tokens) written to /Users/sleibrand/Downloads/HHS_quasiflats.1Sectionintropart5.full.txt
Summary already exists at /Users/sleibrand/Downloads/HHS_quasiflats.1Sectionintropart5.summary.txt
1. Section intro-part6 (3489 characters, 1385 tokens) written to /Users/sleibrand/Downloads/HHS_quasiflats.1Sectionintropart6.full.txt
Summary already exists at /Users/sleibrand/Downloads/HHS_quasiflats.1Sectionintropart6.summary.txt
1. Section intro-part7 (1715 characters, 740 tokens) written to /Users/sleibrand/Downloads/HHS_quasiflats.1Sectionintropart7.full.txt
Summary already exists at /Users/sleibrand/Downloads/HHS_quasiflats.1Sectionintropart7.summary.txt
1. Section intro-part8 (4453 characters, 1975 tokens) written to /Users/sleibrand/Downloads/HHS_quasiflats.1Sectionintropart8.full.txt
Summary already exists at /Users/sleibrand/Downloads/HHS_quasiflats.1Sectionintropart8.summary.txt
1. Section intro-part9 (3450 characters, 1196 tokens) written to /Users/sleibrand/Downloads/HHS_quasiflats.1Sectionintropart9.full.txt
Summary already exists at /Users/sleibrand/Downloads/HHS_quasiflats.1Sectionintropart9.summary.txt
1. Section intro-part10 (2680 characters, 905 tokens) written to /Users/sleibrand/Downloads/HHS_quasiflats.1Sectionintropart10.full.txt
Summary already exists at /Users/sleibrand/Downloads/HHS_quasiflats.1Sectionintropart10.summary.txt
1. Section intro-part11 (4561 characters, 1565 tokens) written to /Users/sleibrand/Downloads/HHS_quasiflats.1Sectionintropart11.full.txt
Summary already exists at /Users/sleibrand/Downloads/HHS_quasiflats.1Sectionintropart11.summary.txt
1. Section intro-part12 (3290 characters, 1183 tokens) written to /Users/sleibrand/Downloads/HHS_quasiflats.1Sectionintropart12.full.txt
Summary already exists at /Users/sleibrand/Downloads/HHS_quasiflats.1Sectionintropart12.summary.txt
No summary files found for section 1Sectionintropart12
Header:  2. Cubulation of hulls
2. Section intro (5178 characters, 1763 tokens) written to /Users/sleibrand/Downloads/HHS_quasiflats.2Sectionintro.full.txt
Summary already exists at /Users/sleibrand/Downloads/HHS_quasiflats.2Sectionintro.summary.txt
2. Section intro-part2 (5212 characters, 1990 tokens) written to /Users/sleibrand/Downloads/HHS_quasiflats.2Sectionintropart2.full.txt
Summary already exists at /Users/sleibrand/Downloads/HHS_quasiflats.2Sectionintropart2.summary.txt
2. Section intro-part3 (4746 characters, 1881 tokens) written to /Users/sleibrand/Downloads/HHS_quasiflats.2Sectionintropart3.full.txt
Summary already exists at /Users/sleibrand/Downloads/HHS_quasiflats.2Sectionintropart3.summary.txt
2. Section intro-part4 (812 characters, 273 tokens) written to /Users/sleibrand/Downloads/HHS_quasiflats.2Sectionintropart4.full.txt
Summary already exists at /Users/sleibrand/Downloads/HHS_quasiflats.2Sectionintropart4.summary.txt
2. Section intro-part5 (4597 characters, 1716 tokens) written to /Users/sleibrand/Downloads/HHS_quasiflats.2Sectionintropart5.full.txt
Summary already exists at /Users/sleibrand/Downloads/HHS_quasiflats.2Sectionintropart5.summary.txt
2. Section intro-part6 (1517 characters, 622 tokens) written to /Users/sleibrand/Downloads/HHS_quasiflats.2Sectionintropart6.full.txt
Summary already exists at /Users/sleibrand/Downloads/HHS_quasiflats.2Sectionintropart6.summary.txt
2. Section intro-part7 (8048 characters, 2974 tokens) written to /Users/sleibrand/Downloads/HHS_quasiflats.2Sectionintropart7.full.txt
Summary already exists at /Users/sleibrand/Downloads/HHS_quasiflats.2Sectionintropart7.summary.txt
2. Section intro-part8 (1816 characters, 708 tokens) written to /Users/sleibrand/Downloads/HHS_quasiflats.2Sectionintropart8.full.txt
Summary already exists at /Users/sleibrand/Downloads/HHS_quasiflats.2Sectionintropart8.summary.txt
2. Section intro-part9 (6631 characters, 2319 tokens) written to /Users/sleibrand/Downloads/HHS_quasiflats.2Sectionintropart9.full.txt
Summary already exists at /Users/sleibrand/Downloads/HHS_quasiflats.2Sectionintropart9.summary.txt
No summary files found for section 2Sectionintropart9
Header:  3. Quasiflats and asymptotic cones
3. Section intro (4841 characters, 1579 tokens) written to /Users/sleibrand/Downloads/HHS_quasiflats.3Sectionintro.full.txt
Summary already exists at /Users/sleibrand/Downloads/HHS_quasiflats.3Sectionintro.summary.txt
3. Section intro-part2 (5176 characters, 1775 tokens) written to /Users/sleibrand/Downloads/HHS_quasiflats.3Sectionintropart2.full.txt
Summary already exists at /Users/sleibrand/Downloads/HHS_quasiflats.3Sectionintropart2.summary.txt
3. Section intro-part3 (1398 characters, 443 tokens) written to /Users/sleibrand/Downloads/HHS_quasiflats.3Sectionintropart3.full.txt
Summary already exists at /Users/sleibrand/Downloads/HHS_quasiflats.3Sectionintropart3.summary.txt
3. Section intro-part4 (3356 characters, 1318 tokens) written to /Users/sleibrand/Downloads/HHS_quasiflats.3Sectionintropart4.full.txt
Summary already exists at /Users/sleibrand/Downloads/HHS_quasiflats.3Sectionintropart4.summary.txt
3. Section intro-part5 (2594 characters, 1077 tokens) written to /Users/sleibrand/Downloads/HHS_quasiflats.3Sectionintropart5.full.txt
Summary already exists at /Users/sleibrand/Downloads/HHS_quasiflats.3Sectionintropart5.summary.txt
No summary files found for section 3Sectionintropart5
Header:  4. Orthants and quasiflats
4. Section intro (4937 characters, 1727 tokens) written to /Users/sleibrand/Downloads/HHS_quasiflats.4Sectionintro.full.txt
Summary already exists at /Users/sleibrand/Downloads/HHS_quasiflats.4Sectionintro.summary.txt
4. Section intro-part2 (4280 characters, 1651 tokens) written to /Users/sleibrand/Downloads/HHS_quasiflats.4Sectionintropart2.full.txt
Summary already exists at /Users/sleibrand/Downloads/HHS_quasiflats.4Sectionintropart2.summary.txt
4. Section intro-part3 (4392 characters, 1714 tokens) written to /Users/sleibrand/Downloads/HHS_quasiflats.4Sectionintropart3.full.txt
Summary already exists at /Users/sleibrand/Downloads/HHS_quasiflats.4Sectionintropart3.summary.txt
4. Section intro-part4 (5089 characters, 1944 tokens) written to /Users/sleibrand/Downloads/HHS_quasiflats.4Sectionintropart4.full.txt
Summary already exists at /Users/sleibrand/Downloads/HHS_quasiflats.4Sectionintropart4.summary.txt
4. Section intro-part5 (3745 characters, 1407 tokens) written to /Users/sleibrand/Downloads/HHS_quasiflats.4Sectionintropart5.full.txt
Summary already exists at /Users/sleibrand/Downloads/HHS_quasiflats.4Sectionintropart5.summary.txt
4. Section intro-part6 (5036 characters, 1939 tokens) written to /Users/sleibrand/Downloads/HHS_quasiflats.4Sectionintropart6.full.txt
Summary already exists at /Users/sleibrand/Downloads/HHS_quasiflats.4Sectionintropart6.summary.txt
No summary files found for section 4Sectionintropart6
Header:  5. Induced maps on hinges: mapping class group rigidity
5. Section intro (6321 characters, 1902 tokens) written to /Users/sleibrand/Downloads/HHS_quasiflats.5Sectionintro.full.txt
Summary already exists at /Users/sleibrand/Downloads/HHS_quasiflats.5Sectionintro.summary.txt
5. Section intro-part2 (4302 characters, 1571 tokens) written to /Users/sleibrand/Downloads/HHS_quasiflats.5Sectionintropart2.full.txt
Summary already exists at /Users/sleibrand/Downloads/HHS_quasiflats.5Sectionintropart2.summary.txt
5. Section intro-part3 (4250 characters, 1347 tokens) written to /Users/sleibrand/Downloads/HHS_quasiflats.5Sectionintropart3.full.txt
Summary already exists at /Users/sleibrand/Downloads/HHS_quasiflats.5Sectionintropart3.summary.txt
5. Section intro-part4 (2545 characters, 837 tokens) written to /Users/sleibrand/Downloads/HHS_quasiflats.5Sectionintropart4.full.txt
Summary already exists at /Users/sleibrand/Downloads/HHS_quasiflats.5Sectionintropart4.summary.txt
No summary files found for section 5Sectionintropart4
Header:  6. Factored spaces
6. Section intro (3048 characters, 1099 tokens) written to /Users/sleibrand/Downloads/HHS_quasiflats.6Sectionintro.full.txt
Summary already exists at /Users/sleibrand/Downloads/HHS_quasiflats.6Sectionintro.summary.txt
6. Section intro-part2 (5316 characters, 1918 tokens) written to /Users/sleibrand/Downloads/HHS_quasiflats.6Sectionintropart2.full.txt
Summary already exists at /Users/sleibrand/Downloads/HHS_quasiflats.6Sectionintropart2.summary.txt
6. Section intro-part3 (1011 characters, 427 tokens) written to /Users/sleibrand/Downloads/HHS_quasiflats.6Sectionintropart3.full.txt
Summary already exists at /Users/sleibrand/Downloads/HHS_quasiflats.6Sectionintropart3.summary.txt
No summary files found for section 6Sectionintropart3
No abstract found for /Users/sleibrand/Downloads/HHS_quasiflats
Concatenated 0 summaries into a single summary with 2 characters and 1 tokens
Concatenated subsection summaries have less than 500 tokens, reading in all summaries
Exceeded 3000 tokens, stopping concatenation of summaries
Overall summary written to /Users/sleibrand/Downloads/HHS_quasiflats.overall_summary.txt
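A rough sketch of the two-level approach described before the log: group the part summaries by section, summarize each section from its parts, then build the overall summary from the section summaries only. The filename pattern is inferred from the log above, and `group_part_summaries`, `summarize_hierarchically`, and the `summarize` callable (a stand-in for the GPT call) are hypothetical names, not functions from summarize.py.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Filename shape inferred from the log: <base>.<section>Sectionintro[partN].summary.txt
PART_SUMMARY = re.compile(
    r"^(?P<base>.+)\.(?P<section>.+?)Sectionintro(?:part(?P<part>\d+))?\.summary\.txt$"
)


def group_part_summaries(filenames: List[str]) -> Dict[str, List[str]]:
    """Group part-summary filenames by section, ordered by part number."""
    groups: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for name in filenames:
        match = PART_SUMMARY.match(name)
        if match is None:
            continue  # e.g. the overall summary file
        part = int(match.group("part") or 1)  # the unnumbered file is part 1
        groups[match.group("section")].append((part, name))
    return {section: [n for _, n in sorted(parts)] for section, parts in groups.items()}


def summarize_hierarchically(
    part_texts_by_section: Dict[str, List[str]],
    summarize: Callable[[str], str],
) -> Tuple[Dict[str, str], str]:
    """Summarize each section from its parts, then summarize the section summaries."""
    section_summaries = {
        section: summarize("\n\n".join(parts))
        for section, parts in part_texts_by_section.items()
    }
    overall = summarize("\n\n".join(section_summaries.values()))
    return section_summaries, overall
```

Keying the lookup on the section name alone, rather than on the name of the last part, is one way to avoid the "No summary files found for section ...part7" messages in the log, assuming those come from the lookup using the part-suffixed name.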

jasonbehrstock commented 1 year ago

Thanks. It is good to know that AI won't be taking over the world so quickly.