The AI help button is very good but it links to a feature that should not exist #9230

Open · nyeogmi opened this issue 1 year ago

nyeogmi commented 1 year ago

Summary

I made a previous issue pointing out that the AI Help feature lies to people and should not exist because of potential harm to novices.

This was renamed by @caugner to "AI Help is linked on all pages." AI Help being linked on all pages is the intended behavior of the feature, and @caugner therefore pointed out that the button looks good and works even better, which I agree with -- it is a fantastic button and when I look at all the buttons on MDN, the AI Help button clearly stands out to me as the radiant star of the show.

The issue was therefore closed without being substantively addressed. (because the button is so good, which I agree with)

I think there are several reasons the feature shouldn't exist which have been observed across multiple threads on platforms Mozilla does not control. Actually, the response has been universally negative, except on GitHub where the ability to have a universally negative response was quietly disabled Monday morning.

Here is a quick summary of some of those reasons.

One: the AI model is frequently wrong. Mozilla claims it intends to fix this, but Mozilla doesn't employ any GPT-3.5 developers, and OpenAI has been promising to fix it for months. It's unlikely this will actually happen.

Two: contrary to @caugner's opinion, it's very often wrong about core web topics, including trivial information where there is no obvious excuse. Here are some examples:

Even examples posted by people who support the existence of the AI contain significant errors:

(I say examples, but note: this is the only usage example provided by a person who supported the existence of the feature, and it contained an error.)

This is identical to one of the categories of problems seen on StackExchange when StackExchange introduced its generative AI assistant based on the same model, and it led to Stack removing the assistant because it was generating bizarre garbage.

Three: it's not clear that any documentation contributors were involved in developing the feature. Actually, it's still unclear who outside of @fiji-flo and @caugner was involved in the feature. Some contributors including @sideshowbarker have now objected and the process has produced a default outcome, which is that AI Explain was voluntarily rolled back and AI Help remains in the product.

It is probably OK for those contributors to review each other's code, but they're also managing the response to the backlash. After a bunch of people have already signaled "hey, I have an active interest in this feature" by engaging with a relevant issue, excluding those people reflects that a ruling of "actually, you do not have an active interest!" has been reached, and it's not clear what basis that ruling would have been reached on.

Four: the existence of this feature suggests that product decisions are being made by people who don't understand the technology or who don't think I understand it.


Overall, the change tells the story that MDN doesn't know who their average user is, but assumes that the average user is (1) highly dissimilar to the GitHub users who were involved in the backlash (2) easy to sell to.

The fact is that in one day, measured in upvotes, you attracted backlash comparable to what the entire StackOverflow strike attracted in a month. It would be a mistake to think only a small group of people are concerned. That attitude would be wishful thinking.

It seems like the fork in the road for MDN is:

If option 1 isn't sustainable, then between option 2 and option 3, option 3 is obviously better for humanity in the long-run and I would encourage MDN to make plans for its own destruction.

In the worst possible world, the attitude is correct and the users are easy to sell to. Well, in that case, you've created another product company and in doing so you've metaphorically elected to serve both God and money -- and as is evidenced by the recent implosions of every siloed social media company, that is always a great idea.


Again, the AI Help button is absolutely gorgeous and functions as intended. This issue is not about the AI Help button and therefore should not be closed as a button-related wontfix, or renamed by @caugner into a description of the behavior of the button.

URL

https://github.com/mdn/yari/issues/9208
https://github.com/mdn/yari/issues/9214

Reproduction steps

Pivot to a more aggressive funding model, then engage in a mix of panic and corporate groupthink.

Expected behavior

I think the button is amazing and you are doing a great job.

Actual behavior

The AI help feature should not exist.

Device

Desktop

Browser

Chrome

Browser version

Stable

Operating system

Windows

Screenshot

(screenshot attached)

Anything else?

No response

nyeogmi commented 12 months ago

As for the copyright infringement claim: that remains to be seen and settled in courts around the world.

I'm not sure I think the question "morally, is this right?" should be settled by asking "well, wait for the lawsuits."

falemagn commented 12 months ago

As for the copyright infringement claim: that remains to be seen and settled in courts around the world.

I'm not sure I think the question "morally, is this right?" should be settled by asking "well, wait for the lawsuits."

I think it's morally right to train an AI on the same content a human could be trained on; I don't see any difference whatsoever.

It might surprise you that my view is shared by no less than the Creative Commons folks, who - you'd concur - might know a thing or two about copyright and the morality thereof: https://creativecommons.org/2023/02/17/fair-use-training-generative-ai/

What makes you think your morality is the right one?

nyeogmi commented 12 months ago

For what it's worth, the new blogpost strongly implies the features will be reintroduced regardless of the backlash:

https://blog.mozilla.org/en/products/mdn/responsibly-empowering-developers-with-ai-on-mdn/

If you take the experimental design at face value and assume that a Like means good output, a Dislike means bad output, and any other result can be safely ignored, then this outcome suggests a 25% rate of bad output, at minimum. Teixeira's comments imply that anything less than a 50% rate of bad output would result in a "keep."

This ignores other problems with the experimental design: users aren't required to leave a rating at all, and the vast majority of users don't, so if there's some confound that makes one category of users way more likely to rate than the other, then this output is just garbage.

(I pointed out some of these problems in this post, so I guess my thought is "yes, I'm basically right about how it was run and no, Mozilla made no effort to deal with the statistical problems with their methodology even though I loudly made Mozilla aware of them.")

Be-ing commented 12 months ago

https://blog.mozilla.org/en/products/mdn/responsibly-empowering-developers-with-ai-on-mdn/

Everything about this blog post is disgusting. It's abundantly clear at this point that Mozilla is an irresponsible steward of web documentation and should have no further involvement in it. I find it repugnant that Mozilla is trying to profit off content that they're barely even writing anymore. That they are trying to profit from content they're not writing, using a proprietary LLM that they didn't make either, makes this situation even more bizarre.

Fork it.

astearns commented 12 months ago

As far as I can tell, there is one genuinely useful part of AI Help - the links.

If this feature were scaled back to remove the problematic summaries and just provide (possibly) relevant links to a conversational question, I think it would be a good addition to MDN.

snoozysoft commented 12 months ago

Hi @caugner ! I'm your supposed target audience for this tool. I'm mostly tech literate, but I don't code regularly enough to have proper knowledge; I know only the basics of HTML 4 at best, with a sneeze's worth of CSS and absolutely nothing else. (Clarification: my ownership of a GitHub account is purely because some software refuses to have easy bug/issue reporting outside of this site, which necessitated making this account. You'll see I haven't actually contributed anything or made any repos on here. Just in case you want to write off my point because I'm on here.) If what you've said and implied in the past is true, I'm the exact sort of person you want to be using this "AI" tool on MDN.

Let me be clear and to the point with you about it. I don't want anything like this.

When I go to documentation, the proper technological stuff that explains specific bits of a large complex bit of tech, I want it to be clear, concise, and most importantly, accurately written. You cannot guarantee any of that with an LLM. You know how you can guarantee it? By using the same method of creating the site as you have been for years in the past: with a community of dedicated volunteers working their asses off to make sure that the reference site they've chosen to work on is the best damn site possible for that information.

I'll admit, out the gate, I'm biased against LLMs and so-called "AI" for a multitude of reasons. But mostly I'm against it here because not only does it spit in the face of absolutely every volunteer, past and present, who has worked on MDN, but it offers absolutely nothing unique whilst pretending it does. There's nothing stopping me from going to literally any other LLM and asking the same question of "what does this function do" and getting either the same answer or, given what we've seen so far of generated examples, a better answer.

Your lack of understanding of why people are against this is baffling to me, especially coming from a company like Mozilla. I used to think the distrust of the company was overblown, but after seeing all of this play out both in this thread and @eevee's thread, that distrust and hate has only become horrifyingly apparent and understandable. You work for Mozilla. The company that, when it initially launched, ran faux Communist propaganda to advertise itself as a way of indicating that the company stands for the free and open web. A more understanding internet, one that listens to the community, to the people that actually use it, rather than bowing down to what makes money or what makes advertisers happy. And here I see a top staffer at the company refusing to back down from what is little more than jumping on a bandwagon for the sake of it. To say the absolute least, it's utterly disheartening.

I'm already disenchanted with technology as it is, but things like this don't help one bit. I want easy-to-read and correct documentation that I can look up and, after 5 minutes, immediately understand what the bit of code does, how it functions, how to use it, why to use it, etc. This LLM is blatantly not fit for the job, even if it were accurate, as it consistently misses out important extra information and/or context. Context which I can read by just scrolling down.

Let me put it another way that sums up my feelings on this. If the content already on MDN is not good enough, such that it requires the usage of an LLM, then why are you using MDN's content as the base for that LLM? If all the information needed is right below the AI Help button, then what is even the point of the button? Surely your time, effort and money would be better spent working on a big community effort to add some easier-to-understand explanations of everything on the site? Explanations that could, say, be hidden underneath a button that says "Explain this for me", with accurate text that appears instantly and doesn't require sending vague amounts of data off to a company run by billionaires that I don't trust as far as I can throw them.

I'm rambling a lot here, I know, but to me it seems very much obvious as to WHY this has had so much backlash and why no one wants it. And so far all you've done is the equivalent of a child putting their fingers in their ears and going "la la la la can't hear you, our addition is perfect, la la la". And, yes, I'm very much aware that is perhaps the most immature thing I could say here, and you could claim that it renders everything else pointless because of that, but frankly I just don't care. It's genuinely how you've appeared to me throughout both threads. It's sad.

Engage with your community, don't deflect them. Especially when they're the reason people go to your site in the first place, for goodness sake.

nschonni commented 12 months ago

https://blog.mozilla.org/en/products/mdn/responsibly-empowering-developers-with-ai-on-mdn/

It makes sense now, looking up the author of that post: https://blog.mozilla.org/en/mozilla/steve-teixeira-mozilla-new-chief-product-officer/

Steve comes to us most recently from Twitter, where he spent eight months as a Vice President of Product for their Machine Learning and Data platforms. Prior to that, Steve led Product Management, Design and Research in Facebook’s Infrastructure organization.

I don't think the Yari devs can roll this back, even if they wanted to.

nicuveo commented 12 months ago

Reading this issue is extremely frustrating, because the answers to it read like PR damage control, and do not address the core issue. So to restate it:

A technical reference's most important attribute is to be accurate. An LLM cannot be guaranteed to be accurate.

That's it. That's the core issue. That should be the end of this debate. It doesn't matter how much spin you put on it. It is irrelevant whether using an LLM is moral or not, or whether it respects copyright or not: even in a world where it was moral, didn't involve as much underpaid human labour, and wasn't a copyright nightmare, it would not address the core issue: the output of an LLM, especially one that the user does not control, cannot be guaranteed to be accurate, and a technical reference simply cannot tolerate such a large margin of error.

The fact that proponents of this feature seem to be willing to disregard this in order to push this feature suggests that they either wrongly believe that the LLM can be made accurate, or that it's okay to compromise the accuracy of the reference. I don't know which is worse.

develleoper commented 12 months ago

Just a confirmation that the concerns I raised in my closed duplicate issue (i.e. the loss of MDN's data authenticity due to this feature, and concerns regarding its deployment) are in fact quite well and faithfully represented here.

hillexed commented 12 months ago

In reply to https://blog.mozilla.org/en/products/mdn/responsibly-empowering-developers-with-ai-on-mdn/ :

Steve, the author, seems like a person who values data-driven decision-making, so I'd like to point out some data. Issue #9208, at the time of writing, has 1287 upvotes to four downvotes. That's 1287 technical people who know about MDN and also know to read the GitHub issues - exactly MDN's target audience. I can also add the 125 likes on this much younger issue at the time of writing.

The dashboard image posted in the article shows 1017 likes on AI Explain and 129 likes on AI Help. Ignoring the fact that only ~3% of everyone who viewed that survey clicked like, if I add up both of those like numbers, that's still less than the number of people who see the existence of AI Explain and AI Help as an issue. And that's before counting the hundreds of dislikes.
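
Spelling out that comparison with the numbers above:

    1017 (AI Explain likes) + 129 (AI Help likes)    = 1146 total likes
    1287 (#9208 upvotes)    + 125 (likes on this issue) = 1412 objections
    1146 < 1412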

Even though MDN has a convenient link to the survey, and makes every effort to try to get people to click the like button if they like the product, the tiny ~3% of people who clicked like is still outweighed by the 1287 community members who came to this somewhat obscure repo to say this is a terrible idea.

I'd recommend you listen to the data.

jjl commented 12 months ago

it's a bit of a kick in the teeth to see the tone-deaf blog post use the word "responsibly" in the title. i fear that by the gaslighting stage, there probably isn't a whole lot of trust left in the community to salvage.

RIP one of the best technical references on the internet.

eevee commented 12 months ago

One domain where we see high value is in training LLMs on reference documentation...

AI Help is limited to offering information only based on MDN content, and is now in beta and available to logged-in MDN readers.

it's deeply disheartening to see the feature described like this, because it's outright misleading and anyone who's been within spitting distance of machine learning knows it.

this is implying that mozilla has its own model that it's trained, or at least finetuned. but it doesn't — as far as i can tell this feature just searches for relevant MDN articles, ships the entire article contents to ChatGPT, and asks it nicely in english prose to only refer to those articles to answer the question. that is not what "training" means in an ML context, and it is not "limited" in any serious way.
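
to be concrete, here's a minimal sketch of what that pattern amounts to -- every name in it is an illustrative stand-in, not mozilla's actual code:

    // hypothetical sketch of "prompt stuffing": ordinary retrieval plus a
    // polite request, as opposed to training or fine-tuning a model.
    declare function searchMdn(query: string): Promise<{ markdown: string }[]>;
    declare function chatCompletion(
      messages: { role: "system" | "user"; content: string }[],
    ): Promise<string>;

    async function aiHelp(question: string): Promise<string> {
      const articles = await searchMdn(question); // plain search, no ML "training"
      const context = articles.map((a) => a.markdown).join("\n\n");
      return chatCompletion([
        // the model is merely *asked* to stay within the context;
        // nothing technically enforces the limit
        { role: "system", content: "Answer using ONLY this MDN content:\n\n" + context },
        { role: "user", content: question },
      ]);
    }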

maybe i'm missing something. if so, it would've been nice to explain whatever that is in the blog post about how responsible this feature is.

ghdude commented 12 months ago

One domain where we see high value is in training LLMs on reference documentation...

AI Help is limited to offering information only based on MDN content, and is now in beta and available to logged-in MDN readers.

https://github.com/mdn/yari/issues/9230#issuecomment-1624862416

Both of those statements could be true - when taken individually.

But my reading of the post and those specific parts leads me to believe exactly what @eevee concluded, which is that the two statements are conflating things and/or glossing over the fact that the system wasn't trained on the MDN docs.

resuna commented 12 months ago

It doesn't really matter what it was trained on: ChatGPT doesn't generate accurate or factual summaries of its training or input corpus; that's not what it's designed for. It's designed to satisfy some interpretation of the Turing test, and thus to deceive humans. It generates texts that closely resemble the training corpus and are credible continuations of the input, texts that seem likely to the reader to be the result of reasoning. It frequently contradicts its training data as well as its own ongoing output. To be precise, it explicitly has no mechanism to suppress generated output that contradicts either class of input.

JoannaFalkowska commented 12 months ago

For one more line of argument against AI Help's existence:

Can I ask for an explanation of how, exactly, this feature is supposed to improve over time?


From what we know, it seems that this feature is based on the currently existing MDN pages (maintained by the community), the regular OpenAI LLM (a black box maintained externally), and a prompt (the only thing directly controlled by Mozilla as the provider of the AI Help feature).

So let's say 1000 AI Help users see an answer to a common question X and decide to submit feedback that it is clearly incorrect for a reason Y. What will you do when you review the source MDN pages and it turns out they are correct, but AI Help's answers aren't? It seems the only option will be to change the prompt.


So how do you plan to do that? Will you fine-tune the prompt until it gives a correct answer to question X specifically? How do you plan to do that without changing all the answers to all of the other questions? How will you ensure that overall answer quality went up after the change? Will you just discard all the feedback that you received to date every time you update the prompt, and gather it from scratch? Do you plan to write unit tests for thousands and thousands of common questions and answers? Or how exactly do you plan to make changes that you can show are actively improving the feature, not just slightly shifting in which cases it happens to be right or wrong?

Bonus points for an explanation of how users are supposed to know whether they should put more trust in the answer they received yesterday, or in the answer they are given today after somebody pushed a prompt update. With human-made changes to doc pages, it is clear that the more up-to-date version is expected to be the more accurate one. How are users supposed to decide which version is more accurate after a prompt update?

More bonus points for an explanation of how you plan to prepare for changes to the base OpenAI LLM once they force you to use an updated version of it. Do you have any idea how those changes might affect the answer quality? Do you have any plan for what to do with the user feedback you've gathered so far when that happens?

Honestly, do you have any plan for anything at all that's related to improving the accuracy of AI Help answers?
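
(For illustration, the "unit tests" option alluded to above would amount to something like the following sketch, in which every name is hypothetical rather than an existing MDN/Yari API:)

    // hypothetical prompt-regression harness: replay a fixed set of known
    // questions against a candidate prompt and score the answers.
    interface EvalCase {
      question: string;
      mustMention: string[]; // facts a correct answer has to contain
    }

    declare function aiHelpAnswer(prompt: string, question: string): Promise<string>;

    async function scorePrompt(prompt: string, cases: EvalCase[]): Promise<number> {
      let passed = 0;
      for (const c of cases) {
        const answer = await aiHelpAnswer(prompt, c.question);
        // crude correctness proxy; a real harness would still need human review
        if (c.mustMention.every((fact) => answer.includes(fact))) passed++;
      }
      return passed / cases.length; // compare before shipping a prompt change
    }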

webrunner42 commented 12 months ago

One thing to say about the accuracy of AI Help: its subtly incorrect, rambling, nuance-ignoring, and concerning responses are an accurate match for the kind of "sounds good" responses we're getting from the team on this.

There is a place for AI to help you find useful information, but it's like people said with Wikipedia: it's not useful as a primary source, but it can be very useful for finding the primary source.

If this is going to go through regardless of concerns, it should at least be presented as a search engine.

NoraCodes commented 12 months ago

There is a place for AI to help you find useful information, but it's like people said with Wikipedia: it's not useful as a primary source, but it can be very useful for finding the primary source.

This is a great point, and it mirrors a conversation I had in the MDN Matrix channel yesterday. Personally, while I still view OpenAI as a pretty strange bedfellow for Mozilla due to their horrendous labor practices, I could definitely support a limited version of this system that only surfaces links to MDN articles, provided:

  1. It did not use an OpenAI product, and
  2. There was some actual technical assurance that it would not pontificate or confabulate.

A chat agent that could surface relevant articles would be really useful, unlike this system.

Zarthus commented 12 months ago

https://github.com/mdn/yari/issues/9208#issuecomment-1625683795

I think it would be great if we submitted some of the good questions in this thread to their call.

falemagn commented 12 months ago

while I still view OpenAI as a pretty strange bedfellow for Mozilla due to their horrendous labor practices,

And here I was hoping you simply had not seen the reply of mine in which I give evidence that these "horrendous labor practices" aren't exactly what you claim them to be, hence the no-reply that followed.

I must therefore conclude, given also the number of downvotes those responses got, that in this part of the internet facts are of no use if they contradict one's preconceived, libel-bordering view. Quite ironic, since that's precisely the accusation being leveled at the MDN folks - that they do not care about the evidence.

NoraCodes commented 12 months ago

I reserve my right to disagree with you about morality. That's not libel. If you're going to threaten me with legal action, even obliquely, I don't see any particular reason to continue to engage with your opinions.

For those following along, though, let's remember that there are three major issues here:

Any one of these should be enough to condemn AI Help and AI Explain as a bad idea; that you think I'm wrong about what is perhaps the most subjective of the three doesn't change the overall validity of the argument that AI Help and AI Explain are bad ideas implemented poorly.

ToxicFrog commented 12 months ago

There is a place for AI to help you find useful information, but it's like people said with Wikipedia: it's not useful as a primary source, but it can be very useful for finding the primary source.

Unfortunately, using LLMs to find primary sources has the same problems as using them as a primary source.

Since the output is not (and cannot be) guaranteed to be accurate, only contextually plausible, they will readily generate citations of sources that look superficially appropriate, but in actuality:

  1. do not support the citing text;
  2. actively contradict the citing text;
  3. are entirely irrelevant to the original query;
  4. or do not exist at all.

Sometimes they also generate relevant citations that support the citing text, but they cannot be relied upon to do so consistently.

In practice this generally makes them worse than a traditional search engine, which is still prone to problems 1 through 3 but usually manages to avoid 4, and does so using a tiny fraction of the computing power.

falemagn commented 12 months ago

I reserve my right to disagree with you about morality.

Oh, you can disagree all you want, but you didn't. No reply from you on the issue, in spite of the evidence presented. One can disagree with opinions, certainly not with facts. 2+2=4, any disagreement about that?

That's not libel.

You are literally throwing mud at a whole company and the people working in it - who might even take pride in working there - without solid evidence to support your claims. That's precisely what libel is.

If you're going to threaten me with legal action, even obliquely, I don't see any particular reason to continue to engage with your opinions.

Do you feel threatened by the fact that somebody with no ties to OpenAI makes you aware that you are trashing them without solid evidence? Or, funnily, do you think I am in some way representative of OpenAI - which I am not? Either way, I am pretty sure you've engaged enough.

NoraCodes commented 12 months ago

Since the output is not (and cannot be) guaranteed to be accurate, only superficially plausible, they will readily generate citations of sources that look superficially appropriate, but in actuality [are not].

In general, I think this is true and a good criticism. However, in this case, I think the problem could be mitigated by:

Of course, that doesn't ensure that those links are actually relevant, and I would reach for a search engine long before an LLM, but I do think this use case is at least theoretically reasonable.

develleoper commented 12 months ago

Can people please stop cluttering the thread and filling our inboxes with petty immaturity? Thank you.

Valid critiques have been raised; no amount of loyalty or defensiveness toward a company or technology will improve the conversation.

eevee commented 12 months ago

@falemagn:

And here I was hoping you simply had not seen the reply of mine in which I give evidence that these "horrendous labor practices" aren't exactly what you claim them to be, hence the no-reply that followed.

i don't think "outsourcing to somewhere with dramatically lower wages so you can pay slightly more than average and look moderately impressive there, while still paying nowhere near the wages in your own region" is, uh, great, exactly. and "labor practices" are more than just wages.

not to take a strong stance about OpenAI's use of labor; only to say that it's odd to act as though you hit a home run here. your comments seem to follow a pattern of glossing over details in order to pluck out one that's convenient, then presuming victory. i don't see how this is constructive.

falemagn commented 12 months ago

i don't think "outsourcing to somewhere with dramatically lower wages

And a dramatically lower cost of living. I know, an easy-to-forget detail, right?

so you can pay slightly more than average

According to the figures I have presented, it's way more than average.

and look moderately impressive there, while still paying nowhere near the wages in your own region" is, uh, great, exactly.

Nobody in this conversation has claimed there's any "greatness" to it. On the other hand, on more than one occasion, some of you have used adjectives on the opposite end of the scale from "greatness". I have asked what alternative source of income you would suggest to those Kenyans but, unsurprisingly at this point, no answer has been provided. If you cared to follow one of the links I gave, you'd read Kenyans' opinion about it, which greatly differs from yours. But certainly you know better than the people involved, I am sure.

and "labor practices" are more than just wages.

Which other practices are you talking about, then?

not to take a strong stance about OpenAI's use of labor; only to say that it's odd to act as though you hit a home run here.

Did I? Well, that was not meant to seem like it, but if it did, may I suggest you take your time to ponder why?

your comments seem to follow a pattern of glossing over details

Tell me about it: I have provided plenty of details myself - the aforementioned evidence - which were literally ignored.

rileyinman commented 12 months ago

No matter what your opinion of OpenAI, the issue we're here to discuss is LLM integration in MDN. Let's not get derailed from the true point.

axlevxa commented 12 months ago

As a regular user of MDN, I think it's irresponsible for Mozilla to incorporate an LLM into the service. Thanks to those who are trying to escalate this issue with them.

snoozysoft commented 12 months ago

@falemagn hey bud can you just be chill for a bit and accept that this situation you're arguing is purely an agree-to-disagree situation that will likely never be resolved in a satisfying manner? because it's starting to get a bit ridiculous and i would argue that at this point it is starting to become off-topic to the actual issue and discussion at hand. you've said your piece, everyone has understood it, whether they agree or not, and whether you have more to say or not, let's just leave it for now. this isn't really the time or place to be debating these things.

nyeogmi commented 12 months ago

This message appeared in the Mozilla discord:

(screenshot of the Discord message)

This is one of the good ways to make info available to Steve Teixeira, but I frankly think he will ignore us unless there is a level of backlash that is externally visible. His comments suggest he knows this is unpopular but doesn't think the public knows that, so I think it would be better if the public knows that.


I think this is clearly a governance issue and that it needs to be brought to a space other than mdn/yari. This issue is only visible to people who deliberately click into it. It should be left open because, at this point, the press has noticed it (including The Register).

I would like a petition or something, not because that obligates anyone to respond but because it's easier to boost on social media and it's less likely to be deleted or closed. (in particular: falemagn is engaging in provocateur-style antics that are likely to get this issue locked as "too heated")

I will make this later this weekend if no one else does. I would prefer that someone who is a member of a Mozilla project make the website or put their name on it.

@sideshowbarker You're in contact with mousetail, who made the stackoverflow petition. I have just contacted vantablack who ran fedipact on Mastodon. They say I can steal the text from their campaign if anything is useful, and they sent me some pragmatic details about how they verified people and stuff.

@eevee I think you have the best posts in this thread and the people upvoting you obviously agree. Do you want to write something for the Mozilla conversation or for a webpage complaining about this more publicly?

General question: does anyone know either [1] content creators who can loudly complain about this or [2] someone who can provide actual engineering contacts inside Mozilla who might be disgruntled? I am hoping this can be discussed on Mozilla Foundation's Slack, which is where the cryptocurrency donations issue was litigated.

(For my part re [1]: I DMed fasterthanlime, who was involved in the Rust Project situation -- I suspect the Rust team and Mozilla are overlapping. I am reluctant to DM ThePrimeagen, who I am sure would complain about this publicly but whose fans have occasionally harassed people.)

colin-p-hill commented 12 months ago

The quantitative feedback we have received so far suggests that the feature is used and we get much more "This answer is helpful" than "This answer is not helpful" votes.

That tells you that the user considered it helpful. It doesn't tell you whether it was correct. It's easy to mistake a plausible, confident-sounding -- but actually misleading or incorrect -- answer for a helpful one. I see no safeguards to prevent this.

I'd like to second this methodological concern. Can we take these metrics at face value? Is there any way to account for this potential bias? It seems reasonable to speculate that the users most likely to use this feature are precisely the users least likely to spot inaccuracies, which would cast doubt on whether an immediate subjective impression of "helpful" really means it's a high-quality, non-misleading answer.

I might propose that, rather than a user's first impression which may not include validation of correctness, a more relevant measure for the specific concerns raised here would be to fact-check a random sampling of the AI responses and find an overall error rate. If this rate is above what MDN tolerates under its editorial standards, then obviously the feature itself fails to live up to those standards; but if errors occur at a tolerable rate, then proponents of this feature will have a direct quantitative rebuttal to the concerns raised here. Either way, this will settle the question of how prevalent the issue of misinformation is, which is crucial information that hand-selected anecdotal examples have proven inadequate to settle in either direction.
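
To make the proposal concrete with illustrative numbers (these are not MDN's figures): fact-check a random sample of n answers, count k errors, and apply standard binomial bounds:

    estimated error rate = k / n
    for k = 0, the 95% upper bound is roughly 3 / n  (the "rule of three")
    e.g. 0 errors found in n = 300 sampled answers -> true error rate below ~1%, at 95% confidence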

nyeogmi commented 12 months ago

Related to my previous comment: here's an infopage I wrote. I want to use it elsewhere.

You all have Comment access if you want to make the text less bad. (I'm not a writer.)

https://docs.google.com/document/d/1fKfCy83SvHP3zMtmPkTNPfjMb8koqCqEFQn1qpBGrz8/edit?usp=sharing

vintprox commented 12 months ago

In the spirit of open conversation, we are inviting you to attend the MDN community call on Wednesday, 12th of July at 4:30 pm UTC. We plan to discuss the recent releases of AI Help and AI Explain, future plans and go through some of the feedback received so far.

Way to ruin an "open conversation" by locking an open issue with legitimate concerns about OpenAI. Well, I don't question your power position, dear collaborator: no-no-no, you can do whatever, it's your repository.


I'd like to second this methodological concern. Can we take these metrics at face value? Is there any way to account for this potential bias? It seems reasonable to speculate that the users most likely to use this feature are precisely the users least likely to spot inaccuracies, which would cast doubt on whether an immediate subjective impression of "helpful" really means it's a high-quality, non-misleading answer.

Citing @colin-p-hill, as his is a great summary of how most AI features need an assessment outside the very bubble of users that use them.

Ultrabenosaurus commented 12 months ago

Related to my previous comment: here's an infopage I wrote. I want to use it elsewhere.

You all have Comment access if you want to make the text less bad. (I'm not a writer.)

https://docs.google.com/document/d/1fKfCy83SvHP3zMtmPkTNPfjMb8koqCqEFQn1qpBGrz8/edit?usp=sharing

@nyeogmi I love your document; it's definitely a solid foundation and summary of most key points. There is one thing I believe it is missing: the importance of accuracy in technical documentation. It has been raised several times in these GitHub issues and somewhat touched on coincidentally in some points of your document, but if this content is made more publicly visible than on GitHub, it will be seen by many people without that context and understanding, which reduces its impact and perceived criticality.


I am also not much of a writer, but some combination of points from a few great comments would be ideal. For example:

eevee

if it is wrong, you can't directly correct it the way you might correct a static article. all you can do is keep feeding it more text and cross your fingers that it starts babbling more correctly,

nicuveo

A technical reference's most important attribute is to be accurate. An LLM cannot be guaranteed to be accurate.

The fact that proponents of this feature seem to be willing to disregard this in order to push this feature suggests that they either wrongly believe that the LLM can be made accurate, or that it's okay to compromise the accuracy of the reference. I don't know which is worse.

Zarthus

the target audience it seems to aim for is new developers or someone unfamiliar with the concept they are trying to learn about

You need someone to fact-check the response from an LLM; a four-eyes principle (one writer, and at least one reviewer) is often applied to technical docs, and it is missing from the LLM.

Therefore, there is a significantly increased risk that the LLM provides wrong information to someone not knowledgeable enough about the subject to tell whether the AI is confidently providing misinformation or is actually accurate.

eevee

what possible use is a disclaimer to a reader who can't judge the accuracy of the generated text themselves (because if they could, they wouldn't need it)? either they ignore the disclaimer as meaningless fluff and absorb false information, or they listen to the disclaimer and have no real choice but to avoid the feature entirely. the only purpose a disclaimer serves is to blame the reader when they inevitably believe what MDN tells them. is that the message a technical reference should be sending?


Essentially, some introductory paragraph to stress:

  1. MDN is a technical documentation platform
  2. Truth and accuracy are critical, fundamental necessities for technical documentation
  3. LLMs are not designed to generate true and accurate information, only plausible, human-like content
  4. (not sure on this one) This implementation is a 3rd party LLM which MDN / Mozilla do not have control over and which was trained on plenty of non-technical, non-MDN material, which cannot be fully restricted and excluded from its output
  5. Those most likely to want a simple summary of technical documentation are those least likely to determine the truth and accuracy of an LLM's output supposedly explaining the content they are not knowledgeable about

Simon-Tesla commented 12 months ago

Count me among those baffled by this decision. Unless you can guarantee the accuracy of the information this LLM provides (I refuse to call it AI, as it's definitely not intelligent), it has absolutely no place in documentation.

If you guys want to play around with LLMs, do it somewhere other than MDN's documentation, please.

nyeogmi commented 12 months ago

@Ultrabenosaurus Agreed! I'll make relevant edits soon. Someone wrote basically the text we would need, so I'll probably make minor wording changes based on the round of review I did with gpappas on Matrix and then incorporate it.

yoe commented 12 months ago

I can think of precisely one way in which an LLM on Mozilla Developer Network can be helpful:

Add the possibility for the user to describe a problem they're trying to solve:

I want my website to be pink, but I have no idea how to do that, how do I do that?

Let the LLM synthesize the text, and search through all of MDN's pages to figure out the ones most relevant to the question asked:

You seem to want to set the background color of an HTML document. These three pages seem most relevant to your question: (links follow, possibly with relevant paragraphs quoted)

That is, use the LLM to improve your search functionality so that it works even if a user did not use the correct keywords, which is something that will actually help novice users. A rough sketch of that flow follows.
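
One way that could be implemented is retrieval without any generated prose; in this sketch, every name is hypothetical:

    // hypothetical LLM-as-search: the model never writes text shown to the
    // user; it only helps select from real, existing MDN pages.
    declare function embedText(text: string): Promise<number[]>;
    declare function closestPages(
      vector: number[],
      count: number,
    ): Promise<{ url: string; title: string }[]>;

    async function searchByDescription(problem: string) {
      const vector = await embedText(problem); // e.g. "I want my website to be pink"
      return closestPages(vector, 3); // rendered as plain links, nothing generated
    }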

Anything that tries to use *statistical* language models to *explain* things about code is doomed from the outset, for reasons you can make out if you look at the italicized words in this sentence.

nyeogmi commented 12 months ago

Re the above: I just ran into a pretty serious IRL situation outside of this, so if anyone wants to take it from me, go ahead -- I can't come back to this right now and might not be able to for a while.

fgaz commented 12 months ago

@caugner you write in https://github.com/mdn/yari/issues/9230#issuecomment-1622448605

I agree with you, I would also expect an unsupported property to be undefined, rather than false. But looking at the content that AI Help consulted, this is exactly what’s written on the Navigator.onLine page. I hope we can agree that this is an MDN content issue, not an AI Help issue.

I just want to point out that this isn't true either. That is not "exactly what’s written". The MDN page reads (emphasis mine):

If the browser doesn't support navigator.onLine the above example will always come out as false/undefined.

While the wording is not the best, in this context I would interpret "come out as" to refer to the evaluation of the if condition to a falsy value. The LLM instead distorts "come out as" to "return", which is definitely wrong. So it looks like this feature is failing even the basic task of quoting relevant snippets from the documentation.

I think there is a way to turn this into something useful: have the LLM return only links to MDN pages, and present those to the user. Links can't be wrong if you check that the resource exists first. For example, if you take the original question "How can I detect that I'm in offline mode?", neither the embedded search nor a search engine shows the expected result, while the LLM does. Conventional searches such as "detect offline" do return the expected result on conventional search engines, so this advantage is limited to conversation-style searches.

obfusk commented 12 months ago

Just came across this, which sums up the problem here really well:

Something that seems fundamental to me about ChatGPT, which gets lost over and over again:
When you enter text into it, you're asking "What would a response to this sound like?" If you put in a scientific question, and it comes back with a response citing a non-existent paper with a plausible title, using a real journal name and an author name who's written things related to your question, it's not being tricky or telling lies or doing anything at all surprising! This is what a response to that question would sound like! It did the thing! But people keep wanting the "say something that sounds like an answer" machine to be doing something else, and believing it is doing something else.
It's good at generating things that sound like responses to being told it was wrong, so people think that it's engaging in introspection or looking up more information or something, but it's not, it's only, ever, saying something that sounds like the next bit of the conversation.

LeoMcA commented 11 months ago

Hi all, thanks for the concerns raised; we hope to answer them all in the community call on Wednesday. But one suggestion which has come up a few times with regard to AI Help is really interesting, because we've had a very similar discussion internally, and I'd like to answer/expand upon it a little bit immediately. It's been asked a few times in a few ways; I'll answer the most recent instance:

I think there is a way to turn this into something useful: have the LLM return only links to MDN pages, and present those to the user. Links can't be wrong if you check that the resource exists first.

Yes, indeed, and in a way we already do this.

We don't ask the LLM to only use MDN docs in its reply, and then go and check if the references exist, because there's quite a few problems with that approach. First and foremost, as has been mentioned a few times in this thread, an LLM can just completely make up a source, and if we check that and that's happened, that really leaves us with nowhere to go with the response. I guess we'd have to return an error. These models also have a training cutoff point, so we would never be able to respond to queries about new pieces of documentation.

Instead, what we do is generate embeddings using an embedding model for each section of each piece of documentation on MDN in its raw Markdown form, store those, and then also generate an embedding for a user's question. Think of an embedding as placing a piece of text on a number line (but this number line has multiple dimensions): we can then find pieces of text which are similar, because they're close to each other on this number line. And that's what we do, we find the sections of MDN's documentation which are closest to the query.

We then go and feed all that back into an LLM to effectively summarise those sections of content in the context of the question posed. But you're free to fully ignore that summary, and just skip straight to the links at the bottom of the response ("MDN content that I've consulted that you might want to check") - those are the pages found through the similarity search.
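
(For readers who want the mechanism spelled out, here is a rough sketch of those two steps. The identifiers are illustrative, not Yari's actual code:)

    // "similarity" step: find the documentation sections nearest to the
    // question via cosine similarity over pre-computed embeddings.
    declare function embed(text: string): Promise<number[]>; // embedding model
    declare const sections: { url: string; markdown: string; vec: number[] }[];
    declare function summarise(question: string, context: string): Promise<string>;

    function cosine(a: number[], b: number[]): number {
      let dot = 0, na = 0, nb = 0;
      for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        na += a[i] * a[i];
        nb += b[i] * b[i];
      }
      return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    async function aiHelpViaEmbeddings(question: string) {
      const q = await embed(question);
      const top = [...sections]
        .sort((x, y) => cosine(q, y.vec) - cosine(q, x.vec))
        .slice(0, 4); // the sections closest to the question
      // "summary" step: optional, and skippable in principle
      const summary = await summarise(question, top.map((s) => s.markdown).join("\n\n"));
      return { summary, links: top.map((s) => s.url) }; // links come from search, not the LLM
    }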

Indeed, when trying your "How can I detect that I'm in offline mode" query, the pages found are all relevant to your query (depending on whether you're in a webworker or not):

What we've discussed a bit internally, and don't have a conclusion on yet, is if we separate that "summary" step more clearly from the "similarity" step, and perhaps hide the summary (and indeed never generate it) for users who just want to skip straight to the reference documentation anyway.

Zarthus commented 11 months ago

@LeoMcA maybe i'm just a nitpicker, but it still feels kind of weird to me that the public response from Mozilla has largely been focused on 3.7% of the concerns of the users and 96.3% on promoting positive use of AI Help. (indeed, ~100 characters of "we hope to answer your concerns soon", ~2600 characters of "but we think this ai feature you've touched on is really cool")

I get that you're interested in talking about positive ways to utilize your tech, but when it comes to public relations and handling feedback I feel like y'all massively dropped the ball, and answering our questions in a call is not going to undo the damage done to the reputation of MDN / Mozilla, though I appreciate that it's something rather than nothing.

The problem is, you cannot really get me excited about the future of your idea without first addressing the concerns from me or others. @caugner, @fiji-flo and you (@LeoMcA) have done very little to do that so far; every engagement so far has left me feeling worse and less confident, not better.

Maybe that'll change after the community call has been had, but I'm not hopeful.

LeoMcA commented 11 months ago

it still feels kind of weird to me that the public response from Mozilla

@Zarthus well, yes, exactly: your phrasing kinda proves the point I'm about to make: so many eyeballs are on this thread that it's very difficult for any of us to respond specifically to concerns raised, because as you just said, they'll be considered Mozilla's public "official" response, rather than merely the thoughts of some of the engineers who worked on the feature.

but when it comes to public relations

Again, you're sort-of proving my previous point: I'm not a public relations person, I'm a software engineer. I was going about my job very normally until a few days ago when everything sort of blew up and people were very angry about something I'd helped build. I can't say I learned how to deal with that in between learning about variables, objects and functions.

There are also a lot of questions, many of which touch on the same themes, so it makes more sense for us to address them together, which we will be doing in the community call.

We'll be releasing our post-mortem on AI Explain soon, too, which will address some of the concerns raised generally about AI-powered features on MDN.

Maybe that'll change after the community call has been had, but I'm not hopeful.

Please have some hope: in fact, I'm currently thinking about how to best phrase my answer to the excellent question you posed for the meeting, because I think it's a really interesting topic, and I want to best explain the thoughts I have on it (which isn't necessarily something which comes naturally).

obfusk commented 11 months ago

But you're free to fully ignore that summary, and just skip straight to the links at the bottom of the response

This has me worried. We've raised multiple concerns about the inaccuracy of the LLM output. Saying "you can ignore it" just shifts the responsibility for determining whether the output is inaccurate and should be ignored or fact-checked to the users, which is especially problematic given that:

Those most likely to want a simple summary of technical documentation are those least likely to determine the truth and accuracy of an LLM's output supposedly explaining the content they are not knowledgeable about

caugner commented 11 months ago

Just in case you missed it, @Zarthus: You can find Mozilla's first public response about AI on MDN in this blog post.

Ultrabenosaurus commented 11 months ago

I'm quite concerned that, yet again, we've had an official response which makes absolutely no reference to any sort of "correctness" for the output. In fact, @LeoMcA seems to explicitly state that they intentionally don't do any such checking, despite acknowledging that LLMs outright make up sources and information? Because then they'd have to display an error...?

We don't ask the LLM to only use MDN docs in its reply, and then go and check if the references exist, because there's quite a few problems with that approach. First and foremost, as has been mentioned a few times in this thread, an LLM can just completely make up a source, and if we check that and that's happened, that really leaves us with nowhere to go with the response. I guess we'd have to return an error. These models also have a training cutoff point, so we would never be able to respond to queries about new pieces of documentation.

I know the paragraph quoted was in regards to a different methodology the system is not using, but it displays an attitude of "if we used this method we wouldn't check for incorrect data because the world would end if we show the user an error message" as if that's somehow worse than misinformation? It does not inspire confidence in how the chosen methodology has subsequently been implemented.

In fact, the following paragraphs that explain the chosen methodology very quickly skip over the actual output-generation part, and even seem to suggest... "blame" isn't quite what I want to convey, but essentially that the onus for the impact of any generated misinformation lies on users who choose to read the output summary instead of just skipping it for the source document links? Again, linking back to the point that the people who most need a simple summary of a technical concept are those least likely to notice the output is wrong, and among the most likely to inherently trust MDN for the reputation it has built up over the years.

In attempting to justify the current implementation, my concerns have actually been reinforced.

LeoMcA commented 11 months ago

so many eyeballs are on this thread that it's very difficult for any of us to respond specifically to concerns raised, because as you just said, they'll be considered Mozilla's public "official" response, rather than merely the thoughts of some of the engineers who worked on the feature

I'm quite concerned that, yet again, we've had an official response

I mean, come on folks.

resuna commented 11 months ago

I'm not a public relations person, I'm a software engineer. I was going about my job very normally until a few days ago when everything sort of blew up and people were very angry about something I'd helped build. I can't say I learned how to deal with that in between learning about variables, objects and functions.

This is what managers are for, sitting between developers and the outside world. Get your direct or their direct to deal with the policy decision. You are not the person who needs to be in this discussion, because you're not in a position to resolve it if you're not willing to tell your direct "this is a bad idea and we need to not do it".

I understand the sunk cost thing. I dropped almost a year of work a few years back. It happens.

Zarthus commented 11 months ago

https://github.com/mdn/yari/issues/9230#issuecomment-1631147949

@LeoMcA Thank you, I think that's a lot better :)

I'm not trying to intentionally seek things to pin on Mozilla/MDN; I'm completely accepting of mistakes, and even a simple "We don't know" or "We haven't got the answer to this yet" would do - humility and humanity go a long way. I'm a bit allergic to dodging questions, and I guess others are relatively fed up at this point too.

Sorry if I made you feel a bit too pressured - I think there could be product fits for AI Help within the MDN space, I just don't think the team has hit any of them.

@caugner I have read the article but I'll admit I also sort of forgot about it, because I thought it was a mediocre response at best. Regardless, I appreciate you pointed it out.

caugner commented 11 months ago

I'm quite concerned that, yet again, we've had an official response which makes absolutely no reference to any sort of "correctness" for the output

@Ultrabenosaurus We are well aware of your questions about helpfulness vs correctness, and in all fairness, you should wait to hear our answers in the community call.