time-less-ness / trust-assembly

For SomeGuy's Trust Assembly Project

LLM Integration for New Headline Creation #5

Open MelvinSninkle opened 1 week ago

MelvinSninkle commented 1 week ago

Title: Integrate LLM for Generating New Headlines

Description: Use an LLM to generate new headlines based on the scraped headlines from news sites. The model should aim to create non-clickbait, factual headlines.

Tasks:

  1. Research suitable models for this task. Potential models include OpenAI’s GPT-4, Cohere, or other fine-tuned models for headline generation. (Note: I can pay for some credits on here if we need)
  2. Set up API integration to send scraped headlines as input and receive new headlines as output.
  3. Implement instructions for the LLM to produce fact-based, clear headlines that align with specified styles.
  4. Test the model’s accuracy and output quality, including retries for unusable headlines.
  5. Implement support for multiple different prompts, where each prompt carries a unique identification label. This will be needed to get our first few influencers onboarded.

Acceptance Criteria:

  • LLM generates new headlines for all scraped articles.
  • New headlines are fact-based and aligned with the style guidelines.
  • Clear logging of model output, errors, and retries.

Priority: High
Labels: LLM, Backend, Headline Generation, MVP

Melvillian commented 5 days ago

I think all the steps except for (3) (pasted below) are do-able within the scope of a single issue. However I have thoughts on (3), and I think there is a 2b. or 3a. step where we need a scraper for the article content itself. We need the article content correctly scraped, or we won't know how to mutate the headline.
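For that 2b./3a. scraper step, a bare-bones stdlib sketch that pulls paragraph text out of an article page is below. Real news sites will need something sturdier (readability heuristics, per-site selectors), so treat this purely as a starting point:

```python
from html.parser import HTMLParser

class ArticleTextExtractor(HTMLParser):
    """Collects text found inside <p> tags; ignores scripts, nav, and markup."""

    def __init__(self):
        super().__init__()
        self.in_paragraph = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_paragraph = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_paragraph = False

    def handle_data(self, data):
        if self.in_paragraph and data.strip():
            self.chunks.append(data.strip())

def extract_article_text(html: str) -> str:
    """Return the concatenated paragraph text of an article page."""
    parser = ArticleTextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)
```

Whatever we end up using, the key contract is the same: headline and body text come out as separate fields, so the LLM step knows what it is mutating.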

  1. Implement instructions for the LLM to produce fact-based, clear headlines that align with specified styles.

It seems like, as it stands, (3) requires us to implement some sort of automated fact writer, which, unless we have a Bertrand Russell somewhere in the crew, is going to be very difficult to impossible to create.

We need some more discussion on what we really need here, so we can implement something feasible.

I see two possible approaches. If you see a third one, please reply back!

1) We don't try to tell the truth in terms of The Truth, but rather we simply replace the headline with a single-sentence summarization of the content in the article. If the article is talking about a coup led by lizard people currently taking place, then the headline will be some sterilized headline like "lizard people initiate coup on Washington". This wouldn't be fact-based, but it would be a more true-to-the-article-content headline than the existing ones, which might be:

(Fox): Clintons Finally Reveal Their True Scales in Illegal Coup
(MSNBC): Clintons Protect Democracy by Halting Fascist Takeover

2) We try something more ambitious: use an internet-connected LLM API like Perplexity (or, :shudder:, build our own) that analyzes the article content, does some automated fact-checking work (maybe we can piggyback off of this automated fact-checker tool?), and then produces a headline which is both internally consistent (it factually represents everything said in the article) and externally consistent (it aligns with The Truth as understood by experts in the domain being written about). So with the same example of the lizard people coup above, we'd have trust-assembly output:

"Lizard People Coup is a Hoax, According to [relevant twitter link showing no coup is taking place]"

This being incredibly new, I strongly push for #1.

MelvinSninkle commented 5 days ago

I think all the steps except for (3) (pasted below) are do-able within the scope of a single issue. However I have thoughts on (3), and I think there is a 2b. or 3a. step where we need a scraper for the article content itself. We need the article content correctly scraped, or we won't know how to mutate the headline.

Agreed: we need to be able to distinguish between the headline and the article text, and this is critical functionality for the tool to operate as intended.

  1. Implement instructions for the LLM to produce fact-based, clear headlines that align with specified styles.

It seems like, as it stands, (3) requires us to implement some sort of automated fact writer, which, unless we have a Bertrand Russell somewhere in the crew, is going to be very difficult to impossible to create.

We need some more discussion on what we really need here, so we can implement something feasible.

Should have added some epistemic clarity here. As far as the current generation of LLMs goes, I want us to treat the article as an internally consistent reality for any kind of automated feature. So the prompt should be something like you mention below: “You are summarizing an article for someone who needs to keep up to speed on the news but doesn’t have the time to read all the articles themselves. This person needs to be able to take a look at a headline and understand both its premise and its conclusion. Take a neutral stance on the content of the article. Write a one or two sentence headline summarizing both the content of the article and its conclusion.”
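That prompt could live as a labeled template that we fill with the scraped article text, roughly along these lines (the constant name, label, and truncation limit here are all made up for illustration):

```python
# Hypothetical prompt template for the "treat the article as its own reality" approach.
SUMMARY_PROMPT = (
    "You are summarizing an article for someone who needs to keep up to speed "
    "on the news but doesn't have the time to read all the articles themselves. "
    "This person needs to be able to take a look at a headline and understand "
    "both its premise and its conclusion. Take a neutral stance on the content "
    "of the article. Write a one or two sentence headline summarizing both the "
    "content of the article and its conclusion.\n\nArticle:\n{article_text}"
)

def build_summary_prompt(article_text: str, max_chars: int = 8000) -> str:
    # Truncate long articles so the prompt stays inside the model's context window.
    return SUMMARY_PROMPT.format(article_text=article_text[:max_chars])
```

Keeping the template as data (rather than hard-coding it into the API call) is also what makes the per-prompt identification labels in task 5 straightforward.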

I see two possible approaches. If you see a third one, please reply back!

  1. We don't try to tell the truth in terms of The Truth, but rather we simply replace the headline with a single-sentence summarization of the content in the article. If the article is talking about a coup led by lizard people currently taking place, then the headline will be some sterilized headline like "lizard people initiate coup on Washington". This wouldn't be fact-based, but it would be a more true-to-the-article-content headline than the existing ones, which might be:

(Fox): Clintons Finally Reveal Their True Scales in Illegal Coup
(MSNBC): Clintons Protect Democracy by Halting Fascist Takeover

Agreed on this approach. If anyone finds the new headline to also be misleading, that implies the article itself is misleading, which will need to be addressed once we add in our challenge phase.

  2. We try something more ambitious: use an internet-connected LLM API like Perplexity (or, :shudder:, build our own) that analyzes the article content, does some automated fact-checking work (maybe we can piggyback off of this automated fact-checker tool?), and then produces a headline which is both internally consistent (it factually represents everything said in the article) and externally consistent (it aligns with The Truth as understood by experts in the domain being written about). So with the same example of the lizard people coup above, we'd have trust-assembly output:

"Lizard People Coup is a Hoax, According to [relevant twitter link showing no coup is taking place]"

This being incredibly new, I strongly push for #1.

I agree we can’t do #2 with the current generation of LLMs in a way that people will like. As we start to get more group buy-in and have more diversity in the polities using the system, this is worth taking a look at, especially as LLMs get more advanced.

time-less-ness commented 4 days ago

It seems most legacy press still mostly tells the truth in the article itself, so yeah: have the LLM read the article and replace the headline with a more neutral version, and/or tell us if the headline is already factually accurate. We seem to have converged on this anyway, but just stating what I think is doable.