scripting / Scripting-News

I'm starting to use GitHub for work on my blog. Why not? It's got good communication and collaboration tools. Why not hook it up to a blog?

Automattic's chatbot for my blog #267

Open scripting opened 1 year ago

scripting commented 1 year ago

Background

As you may know a few months ago we tried creating a chatbot just for Scripting News stuff, and the results were less than amazing, so I decided to wait for another chance to try it out, and one came along a few weeks ago.

I've been working on a project with Automattic involving feeds and social media, and in a casual conversation a few weeks ago, it came up that they had been working with Seth Godin to do a chatbot for his blog. By coincidence, I was talking with Seth about exactly this topic and was aware that he had worked out something, but I didn't know with whom.

Everyone was very excited! So we decided to give it a whirl with Scripting News archives in OPML which they were able to download from GitHub.

Here it is --

And now we have something to try out.

https://a8c.scripting.com/

An example

Here's an example of a question I asked: What does dave think about Tesla?

This is what came back.

Dave Winer has expressed mixed opinions about Tesla. He adores driving his Tesla, often looking for excuses to drive somewhere, and finds it much more enjoyable than his Subaru. He does think that Tesla has a big problem because of the lack of CarPlay or Android Auto support, which inhibits the potential for an app ecosystem.

However, Dave has frustrations with the computer system of Tesla cars. He has mentioned that the temperature controls have been needlessly moved across updates, and other details about the interface that once were visible are now hidden. He believes that these updates, which can alter the interface without forewarning, can lead to user confusion and frustration. Despite operating their cars largely through tech industry standard UI, Dave finds that Tesla lacks an app ecosystem, which he regards as a must-have feature. He describes his Tesla as both a functional car and a 'fucked up' computer system, implying a disconnect between the two elements.

My comments

I have a lot to say about this.

  1. Call me Dave, please.
  2. I do love driving my Tesla. I think it's an incredibly wonderful product. A recent piece I wrote sums it up, imho, much better than the chatbot did. But I guess I know myself better than it does? ;-)
  3. And that was my hope and still is, that a chatbot would be able to teach me things about what I think, by showing me over time how my thinking has changed, and not at a superficial level. But I see that as aspirational. You can see where there's potential in the technology, and it's frustrating that it seems so far from achieving it.

Other questions I've asked, honestly, it got less right. For example, I asked what I thought was a very easy question -- who did Dave support for president in 2016? It didn't have any idea. How is that possible? I wrote so much about it.

But -- Automattic is an excellent development company, and this is just the beginning of what I hope will be a long experiment. I want this to work, as much as I wanted personal computer networks to work in the 80s and 90s, and now they do. We will get there. Will it be in time to save humanity? Not so sure about that. ;-)

I encourage everyone to try this, and give feedback. Let's play around, have fun, see if you can get me saying really stupid things (I'm sure that's no problem).

And thanks to the good folk at Automattic for coming out to play. ;-)

grantdunham commented 1 year ago

Keep us posted on this work. I have a couple of projects that have really long FAQs and still don't answer all the questions. This would help. (Which gets me to thinking - Many people don't read carefully. The answers are on the sites. Will this reduce the importance of reading?)

scripting commented 1 year ago

@grantdunham -- i think it has great potential for software docs.

i use chatgpt now for that, when a product is popular enough that it has all the data about it.

i've become a much better MySQL developer as a result. i wish it had been around when I was developing ThinkTank or Frontier. I want to get FeedLand into this flow.

cagrimmett commented 1 year ago

Other questions I've asked, honestly, it got less right. For example, I asked what I thought was a very easy question -- who did Dave support for president in 2016? It didn't have any idea. How is that possible? I wrote so much about it.

Interesting. I tried multiple variations on that question to see what the different results were:

Of these, the "Did Dave vote for Donald Trump or Hillary Clinton in the 2016 Presidential Election?" provided the best answer IMO:

There is sufficient evidence from Dave Winer's writings that he expressed his support for Hillary Clinton in the 2016 U.S Presidential Election. On October 29, 2016 he clearly stated, "In any case, I'm voting for Hillary. If they had early voting in New York, I would have already voted. Nothing is going to change my vote." Furthermore, in a post on November 5, 2016, he confirmed his support for Clinton and argued that it was the "sane thing to do". He ended this post revealing that he intended to vote on Election Day and had donated to Clinton’s campaign. Throughout the election period, Dave also consistently criticized Trump and voiced his concern about the potential consequences if Trump won.

I bet this has to do with this part of the pre-prompt behind the scenes:

If the provided content doesn't answer the initial question, explain that you don't know.

Combining that with some of the shorter questions might have given enough uncertainty to not give a definitive answer.

I am curious why each of my questions did not include a link to your October 30, 2016, post where you clearly state that you voted for Hillary Clinton: http://scripting.com/2016/10/30/imOkayWithTrumpWinning.html

The way the current implementation works is:

  1. Take the user input and run a series of search queries against the site's index in Elasticsearch.
  2. Give the search results + pre-prompt + user question to the OpenAI API.
  3. Return the OpenAI results + links to the posts that Elasticsearch passed to OpenAI.

So it is clear that something about these questions did not return the relevant posts in Elasticsearch to be provided to OpenAI. I'll take a look and see if I can figure out why.
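The three steps above can be sketched in a few lines of Python. This is only an illustration of the flow, not Automattic's actual code: the search and model calls are stubbed out, and every function name here is hypothetical. The one real detail is the URL of Dave's October 30, 2016 post, mentioned earlier in this thread.

```python
# Minimal sketch of the retrieval-augmented flow described above.
# The Elasticsearch and OpenAI calls are stubbed; names are hypothetical.

PRE_PROMPT = (
    "If the provided content doesn't answer the initial question, "
    "explain that you don't know."
)

def search_index(question):
    """Stand-in for step 1: the Elasticsearch queries against the site's index."""
    corpus = {
        "http://scripting.com/2016/10/30/imOkayWithTrumpWinning.html":
            "In any case, I'm voting for Hillary.",
    }
    return [(url, text) for url, text in corpus.items()
            if any(word in text.lower() for word in question.lower().split())]

def ask_model(prompt):
    """Stand-in for step 2: the call to the OpenAI API."""
    return "stubbed answer based on: " + prompt[:40]

def answer(question):
    results = search_index(question)
    context = "\n".join(text for _, text in results)
    reply = ask_model(PRE_PROMPT + "\n" + context + "\n" + question)
    # Step 3: return the model's answer plus links to the retrieved posts.
    return {"answer": reply, "links": [url for url, _ in results]}

print(answer("who did dave support, hillary or trump?"))
```

The stub makes the failure mode visible: if the keyword match in `search_index` misses, the model gets no context and the pre-prompt forces an "I don't know."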

scripting commented 1 year ago

@cagrimmett -- thanks for taking the time to post such a detailed answer. for me the best part is where you explain how the current implementation works. i gather from the response that you're hooking into OpenAI and are not hosting the software and database at Automattic? For me this stuff is a black box, i have no idea what's going on behind the UI of ChatGPT, which is kind of nice, but want to know how to think about this.

anyway, i don't have a record of what question i asked initially. i'm going to be more careful about that.

i did the query just now as i think i would have asked. here's the result in a screen shot.

jsavin commented 1 year ago

The way the current implementation works is:…

Interesting, so basically you've got a keyword search whose results are used for prompt context for the LLM along with the pre-prompt to generate the response? How do you extract keywords from the prompt to do the search on Elasticsearch? Are you passing the text content of the search results to OpenAI? Or the URLs for OpenAI to pull content from itself?

Assuming you can share, what else does the pre-prompt contain besides "If the provided content doesn't answer the initial question, explain that you don't know."? (For Dave: The pre-prompt can have a big influence on both the content and the tone of the chat response.)

cagrimmett commented 1 year ago

Hi, @jsavin!

Interesting, so basically you've got a keyword search whose results are used for prompt context for the LLM along with the pre-prompt to generate the response? How do you extract keywords from the prompt to do the search on Elastic Search? Are you passing the text content of the search results to OpenAI? Or the URLs for OpenAI to pull content from itself?

One step I forgot to add above: The user input is submitted to OpenAI first, which generates 3 search terms from the input, and those are used for keyword search in Elasticsearch. Elasticsearch passes back the full content for 5 URLs that match the search terms, then we concatenate the content and pass it to the OpenAI API (using the gpt-4 model) along with a prompt and the original user submitted question.
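Putting that missing step together with the earlier list, the full pipeline is: generate search terms from the user input, retrieve up to 5 matching posts, then package the concatenated content with the prompt and original question. A rough sketch, with the term-generation and retrieval faked since the real code isn't public (every name is a guess):

```python
# Rough sketch of the four-step pipeline cagrimmett describes.
# All names are hypothetical; the term generator and index are fakes.

def generate_search_terms(user_input):
    """Step 1: the real system asks OpenAI for 3 search terms;
    here we fake it by picking the 3 longest distinct words."""
    words = [w.strip("?.,").lower() for w in user_input.split()]
    return sorted(set(words), key=len, reverse=True)[:3]

def retrieve_posts(terms, index, limit=5):
    """Step 2: stand-in for the Elasticsearch keyword query that
    returns full content for up to 5 matching URLs."""
    hits = [(url, text) for url, text in index.items()
            if any(t in text.lower() for t in terms)]
    return hits[:limit]

def build_messages(prompt, posts, question):
    """Step 3: concatenate the post content and hand it to the chat
    model along with the prompt and the original user question."""
    context = "\n\n".join(url + "\n" + text for url, text in posts)
    return [
        {"role": "system", "content": prompt},
        {"role": "user", "content": context + "\n\nQuestion: " + question},
    ]

index = {"http://scripting.com/2016/10/30/imOkayWithTrumpWinning.html":
         "In any case, I'm voting for Hillary."}
question = "Did Dave vote for Hillary Clinton?"
terms = generate_search_terms(question)
posts = retrieve_posts(terms, index)
messages = build_messages("You are a search bot...", posts, question)
```

In the real system the `messages` list would go to the gpt-4 model; the point of the sketch is that everything downstream depends on what step 1 produces.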

Assuming you can share, what else does the pre-prompt contain besides "If the provided content doesn't answer the initial question, explain that you don't know."? (For Dave: The pre-prompt can have a big influence on both the content and the tone of the chat response.)

Here is the prompt:

You are a search bot on the site of software developer and writer Dave Winer.
Your job is to answer the question or query based on the initial question, content, and URLs provided below, all from Dave's writings.
Do this by reporting on and summarizing Dave's writings, in markdown format, without any additional information not provided in the content.
Link to the provided URLs, and only to those URLs, where appropriate, within the answer.
Choose only appropriate words or phrases within the answer for link text, and don't use document names as link text.
Never use superscript citations, superscript referencing, footnote-style, or superscript notations.
Avoid lists.
Answer in one to three paragraphs.
If the provided content doesn't answer the initial question, explain that you don't know.
Answer in Dave's style and tone, but never answer as Dave himself.

We are very much open to improving this prompt if you have suggestions!

A couple improvements we are considering:

  1. Switching to GPT-3.5-turbo. GPT-4 is slow, but it pays more attention to the prompt. Recently OpenAI improved GPT-3.5-turbo to be more steerable, so we are testing that out.
  2. Testing out embeddings/semantic search rather than keyword search algorithms.
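The second idea above — semantic search via embeddings — can be illustrated with cosine similarity over vectors. In a real system the vectors would come from an embedding model; the tiny hand-made vectors below are purely illustrative, just to show why a meaning-based match can succeed where keyword matching fails:

```python
# Sketch of embeddings-based retrieval: rank posts by the cosine
# similarity between the question vector and each post vector.
# The 3-dimensional vectors here are toy values, not real embeddings.

import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

posts = {
    "post-about-tesla": [0.9, 0.1, 0.0],
    "post-about-election": [0.1, 0.9, 0.2],
}

def semantic_search(query_vec, k=1):
    ranked = sorted(posts, key=lambda p: cosine(query_vec, posts[p]),
                    reverse=True)
    return ranked[:k]

print(semantic_search([0.2, 0.8, 0.1]))  # the election post ranks first
```

Unlike the keyword approach, a question phrased with none of the post's exact words can still land near it in embedding space, which speaks directly to the "who did Dave support in 2016" failure discussed above.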
jsavin commented 1 year ago

Thanks for the details, @cagrimmett :-)

I'll admit I'm not an expert when it comes to the capabilities of the specific models, but I've done some experimentation with ChatGPT using both the GPT-3.5 and GPT-4 models, and with and without enabling plugins, so I have a general feel for the types of things they do really well and things they struggle with.

My general impression from just a little experimentation is that the results tend toward "I don't know" or "I can't be certain" type responses more than I would expect. The first thing I would look at is what search terms are being generated by the LLM in the first call to OpenAI, which are then passed to Elasticsearch. The summaries you get will be largely constrained by the results from that search, and therefore by the search terms themselves, given that you're instructing the model to generate summaries based only on the content from the search results against Dave's indexed writing.

It's too bad we don't know what data is in the model training corpus for either LLM. ;-) I've been surprised by how much the models already "know" about some well-known people and their artifacts without providing any constraints. It may be that GPT knows more about Dave than it can summarize when restricting its responses by "reporting on and summarizing Dave's writings, … without any additional information not provided in the content". I wonder if this prompt isn't over-constraining the responses. It might be worth experimenting with asking GPT to take a multi-step approach, first generating a summary based on whatever is known about Dave generally, and as a second step to edit its own summary based on Dave's own writings.

The idea about testing embeddings/semantic search seems promising, but is beyond my expertise. Definitely interested to hear what you learn with that approach.