polyrabbit / hacker-news-digest

:newspaper: Let ChatGPT Summarize Hacker News for You
http://hackernews.betacat.io/
GNU Lesser General Public License v3.0
667 stars 87 forks source link

experiment with thought process prompt #22

Open thiswillbeyourgithub opened 1 year ago

thiswillbeyourgithub commented 1 year ago

Hi,

Just wanted to flag that there are, in my opinion, much more promising ways to summarize documents.

This is more costly since it runs several LangChain calls per article, but I think the added value is tremendous.

To me this is the kind of feature that would make me pay for that service.

Also, with a small adjustment this could handle comments nicely too: a prompt that extracts the new information in the comments, summarizes opinions, surfaces new facts, etc. #5
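For the comments idea, a dedicated extraction prompt could look something like this (just a sketch; `COMMENT_PROMPT` and `build_comment_prompt` are illustrative names I made up, not anything from this repo):

```python
# Hypothetical prompt for extracting what's NEW in a comment thread,
# relative to the article itself.
COMMENT_PROMPT = """The following are Hacker News comments on an article.
Extract only what is NEW relative to the article itself, as markdown bullet points:
- new facts or corrections
- summarized opinions (note agreement and disagreement)
- links or resources mentioned

'''
{comments}
'''

NEW INFORMATION AS MARKDOWN BULLET POINTS:"""

def build_comment_prompt(comments: str) -> str:
    # Fill the template with the raw comment text.
    return COMMENT_PROMPT.format(comments=comments)
```

This would slot into the same refine-style chain as the article prompt, just with the comment thread as input.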

I did a quick try earlier today and found it very promising. I'm insanely busy at the moment, so I thought you might be interested in the raw code directly. The idea is to ask the model to summarize not the key facts but the author's reasoning, paragraph by paragraph, as logically indented markdown bullet points. Here's a quick proof of concept (just add your API key and pass a txt file as argument; note that for testing I shortened the input via `[:1000]`):

# source https://python.langchain.com/en/latest/modules/chains/index_examples/summarize.html

from pathlib import Path
import os
from langchain.chat_models import ChatOpenAI
from langchain.text_splitter import CharacterTextSplitter
from langchain.prompts import PromptTemplate
from langchain.docstore.document import Document
from langchain.chains.summarize import load_summarize_chain

assert Path("API_KEY.txt").exists(), "No API key found"
os.environ["OPENAI_API_KEY"] = Path("API_KEY.txt").read_text().strip()

llm = ChatOpenAI(
        model_name="gpt-3.5-turbo",
        temperature=0,
        verbose=True,
        )

text_splitter = CharacterTextSplitter()

def load_doc(path):
    assert Path(path).exists(), f"file not found: '{path}'"
    with open(path) as f:
        content = f.read()[:1000]  # truncated for testing; remove for real use
    texts = text_splitter.split_text(content)
    if len(texts) > 5:
        ans = input(f"Number of texts splits: '{len(texts)}'. Continue? (y/n)\n>")
        if ans != "y":
            raise SystemExit("Quitting")
    docs = [Document(page_content=t) for t in texts]
    return docs

prompt_template = """Write a very concise summary of the author's reasoning paragraph by paragraph as logically indented markdown bullet points:

'''
{text}
'''

CONCISE SUMMARY AS LOGICALLY INDENTED MARKDOWN BULLET POINTS:"""
PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])
refine_template = (
    """Your job is to continue a summary of a long text as logically indented markdown bullet points of the author's reasoning.
    We have provided an existing summary up to this point:
    '''
    {existing_answer}
    '''

    You have to continue the summary by adding bullet points for the following part of the article (only if relevant; stay concise and avoid restating what is already implied by the previous bullet points):
    '''
    {text}
    '''
    Given this new section of the document, refine the summary as logically indented markdown bullet points. If the new section is not worth including, simply return the original summary."""
)
refine_prompt = PromptTemplate(
    input_variables=["existing_answer", "text"],
    template=refine_template,
)

if __name__ == "__main__":
    import sys
    docs = load_doc(sys.argv[-1])
    chain = load_summarize_chain(
        llm,
        chain_type="refine",
        return_intermediate_steps=True,
        question_prompt=PROMPT,
        refine_prompt=refine_prompt,
    )
    out = chain({"input_documents": docs}, return_only_outputs=True)

    t = out["output_text"]
    for bulletpoint in t.split("\n"):
        print(bulletpoint)

    print("Opening console.")
    import code ; code.interact(local=locals())
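For what it's worth, the `chain_type="refine"` strategy above is essentially a fold over the chunks: summarize the first chunk, then repeatedly refine the running summary with each subsequent chunk. A minimal pure-Python sketch of that control flow (no LLM calls; the lambdas just stand in for the model):

```python
def refine_summarize(chunks, summarize_first, refine_step):
    """Fold a list of text chunks into one summary, refine-style:
    summarize the first chunk, then refine with each later chunk."""
    summary = summarize_first(chunks[0])
    for chunk in chunks[1:]:
        summary = refine_step(summary, chunk)
    return summary

# Dummy "model" steps, just to show the flow of data.
first = lambda text: f"- {text}"
refine = lambda existing, text: existing + f"\n- {text}"

print(refine_summarize(["intro", "argument", "conclusion"], first, refine))
# → - intro
#   - argument
#   - conclusion
```

The upside over a map-reduce summary is that each refine step sees the summary so far, so the bullet structure stays coherent; the downside is that the calls are sequential and can't be parallelized.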

Thoughts?

thiswillbeyourgithub commented 1 month ago

By the way, I implemented all of this in my own CLI project, which does RAG as well as summaries: https://github.com/thiswillbeyourgithub/DocToolsLLM/