mdn / yari

The platform code behind MDN Web Docs
Mozilla Public License 2.0

MDN can now automatically lie to people seeking technical information #9208

Open eevee opened 1 year ago

eevee commented 1 year ago

Summary

MDN's new "ai explain" button on code blocks generates human-like text that may be correct by happenstance, or may contain convincing falsehoods. this is a strange decision for a technical reference.

URL

https://developer.mozilla.org/en-US/docs/Web/CSS/grid

Reproduction steps

as soon as i heard about this, i visited the first MDN article in my address bar history (for the grid property), hit "ai explain" on the first code block encountered (the syntax summary), and received the following information:

grid: "a" 100px "b" 1fr;: This value sets the grid template to have two rows and two columns. The first row has a height of 100 pixels and the second row has a height of 1 fraction unit (1fr). The columns are named "a" and "b".

which is deeply but subtly incorrect — this creates only one column (more would require a slash), and the quoted strings are names of areas, not columns. but it's believable, and it's interwoven with explanations of other property values that are correct. this is especially bad since grid is a complex property with a complex shorthand syntax — exactly the sort of thing someone might want to hit an "explain" button on.
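To make the correction concrete, here is a minimal sketch of that shorthand next to its longhand expansion, plus a two-column variant for comparison. The class names are hypothetical and only there to keep the three cases apart; the shorthand also resets the `grid-auto-*` properties, which the longhand rule below leaves out.

```css
/* The value from the syntax summary: two rows, one column. */
.shorthand {
  display: grid;
  grid: "a" 100px "b" 1fr;
}

/* Roughly equivalent longhands. "a" and "b" are area names, one per row. */
.longhand {
  display: grid;
  grid-template-areas:
    "a"
    "b";
  grid-template-rows: 100px 1fr;
  grid-template-columns: none; /* no slash in the shorthand, so no explicit column tracks */
}

/* A second column needs two names per row string and a track list after the slash. */
.two-columns {
  display: grid;
  grid:
    "a b" 100px
    "c d" 1fr
    / 1fr 2fr;
}
```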

the generated text appears to be unreviewed, unreliable, unaccountable, and even unable to be corrected. at least if the text were baked into a repository, it could be subject to human oversight and pull requests, but as best i can tell it's just in a cache somewhere? it seems like this feature was conceived, developed, and deployed without even considering that an LLM might generate convincing gibberish, even though that's precisely what they're designed to do.

and far from disclaiming that the responses might be confidently wrong, you have called it a "trusted companion". i don't understand this.

Expected behavior

i would like MDN to contain correct information

Actual behavior

MDN has generated a convincing-sounding lie and there is no apparent process for correcting it

Device

Desktop

Browser

Firefox

Browser version

Stable

Operating system

Linux

Screenshot

No response

Anything else?

No response


MrLightningBolt commented 1 year ago

Confirming. This "AI" snake oil is worse than useless for the reasons described above; other examples are trivial to create. It makes MDN worse just being there.

catleeball commented 1 year ago

Generated code (without any human vetting for correctness and human curation for relevance) is a hazard since it can produce plausible-sounding disinformation.

I strongly feel that the AI help feature is likely to cause much more damage than it would possibly help.


Edit: To clarify, I think the best path forward is to offer only documentation written by humans, ideally reviewed by people who have domain expertise. E.g., pay technical writers and keep documentation available for community-suggested edits.

MrPetovan commented 1 year ago

You don't explain code blocks using plausibly-sounding plain text, you explain it visually by linking to a JsFiddle page with that code.

This is a deeply misled feature that will produce disinformation at scale, like all other LLM applications.

Eragonfr commented 1 year ago

You don't explain code blocks using plausibly-sounding plain text, you explain it visually by linking to a JsFiddle page with that code.

I don't agree that linking to a fiddle is enough; you need a plain-text explanation of what the code does. But the explanation needs to be peer-reviewed and fact-checked, not some plausible garbage generated by an AI.

lifning commented 1 year ago

and it's hardly an isolated occurrence. this is pervasively impossible to trust. the AI incorrectly claiming that a given code example isn't valid CSS

mandynicole commented 1 year ago

Releasing this alongside the long-awaited Play was also a choice. We should be empowering the community to create examples built on MDN itself.

Xkeeper0 commented 1 year ago

As another example, https://developer.mozilla.org/en-US/docs/Web/HTML/Element/portal

From the page text (emphasis mine):

A <portal> is similar to an <iframe>. An <iframe> allows a separate browsing context to be embedded. However, the embedded content of a <portal> is more limited than that of an <iframe>. It cannot be interacted with, and therefore is not suitable for embedding widgets into a document. Instead, the <portal> acts as a preview of the content of another page. It can be navigated into therefore allowing for seamless transition to the embedded content.

The "AI Explain" feature:

<portal id="exampleportal" src="https://example.com/"></portal>

The code example you provided is an HTML code snippet that demonstrates the usage of the <portal> element.

The <portal> element is a new HTML element introduced in the HTML5.3 specification. It allows you to embed a remote document into your current document, creating a portal-like experience. In the code example, there is a <portal> element with an id attribute set to "exampleportal" and a src attribute set to "https://example.com/". The id attribute is used to uniquely identify the portal element, while the src attribute specifies the URL of the remote document that will be displayed within the portal. When the page is loaded, the browser will fetch the remote document specified by the src attribute and display it within the <portal> element. The remote document can be any valid HTML document, and it can contain interactive content such as forms, videos, or other web components. The <portal> element provides a way to seamlessly integrate remote content into your web page, allowing users to interact with it as if it were part of the current document. This can be useful for scenarios such as displaying live data from another website or embedding a third-party widget into your page. It's important to note that the <portal> element is still an experimental feature and may not be supported by all browsers. Therefore, it's recommended to check the browser compatibility before using it in production.

In this case, the AI explanation directly contradicts the page itself!

datarocks commented 1 year ago

As someone who occasionally dips his toes in CSS when no one else is available and a thing needs to be fixed, I depend on these docs to be dependable. This is even more true today than it was in the past, as LLM generated chum spreads across the web. Please keep MDN as a trusted source of human generated developer documentation.

patrickhlauke commented 1 year ago

This is both sad and deeply embarrassing for the MDN team...

ericwbailey commented 1 year ago

The inclusion of this feature represents a great deal of risk in folks' ability to learn code effectively, especially code where the output explanation has the potential to perpetuate bias and misunderstanding from the content the LLM model is sourced from.

I would also like to note the ethical and environmental concerns surrounding how LLMs are constructed. It saddens me to see this feature as a former MDN editor.

avdi commented 1 year ago

I didn't spend a decade trying to convince people to use MDN over the shovelfuls of low-quality SEO-farming craptext on W3Schools, only for them to be presented with shovelfuls of low-quality AI craptext on MDN

alensiljak commented 1 year ago

The next generation of AI will be trained on this. Just sayin'...

Nyumat commented 1 year ago

Considering the fact that MDN's "AI Help" feature is a semi-paid service, this is a huge let down to both see and use.

This new feature claims to be powered by OpenAI's GPT 3.5, yet ChatGPT is purely a language model, not a knowledge model. Its job is to generate outputs that seem like they were written by a human, not be right about everything.

In the context of web development as a whole, we cannot count on LLMs to "facilitate our learning". I cannot overstate how terrible and drastic this blow to customer trust is. ❌

MDN has been one of the leading resources for aspiring and current professional developers in the web world. This new beta "help" feature is taking away from the integrity and trustworthiness of a once fantastic site to learn from.

Thank you OP for opening this issue, MDN's team needs to be better.

dwminer commented 1 year ago

I use MDN because it's a comprehensive and accurate source of documentation with no fluff. I fail to see how LLM output prone to egregious inaccuracies improves that. It dramatically weakens my confidence in MDN and I fear that its inclusion will promote an over-reliance on cheap but unreliable text generation.

brndnmtthws commented 1 year ago

We've come full circle and we've learned nothing.

Clippy-letter

aardrian commented 1 year ago

Another example from the Accessibility concerns section of <s>: The Strikethrough element which offers this CSS:

s::before,
s::after {
  clip-path: inset(100%);
  clip: rect(1px, 1px, 1px, 1px);
  height: 1px;
  overflow: hidden;
  position: absolute;
  white-space: nowrap;
  width: 1px;
}

s::before {
  content: " [start of stricken text] ";
}

s::after {
  content: " [end of stricken text] ";
}

The AI wraps up its explanation with this:

Overall, this code creates a strikethrough effect by hiding the content of the "s" element and adding visible text before and after it.

That is demonstrably wrong. There is no demo of that code showing it in action. A developer who uses this code and expects the outcome the AI said to expect would be disappointed (at best).
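A minimal sketch of the distinction, assuming the usual "visually hidden" reading of the rules quoted above: the strikethrough line comes from `text-decoration`, while the clipped pseudo-elements only add off-screen text that some screen readers may announce. The `.struck` class is hypothetical, used so the sketch does not restyle every `<s>`.

```css
/* The visible line through the text is drawn by text-decoration. */
s.struck {
  text-decoration: line-through;
}

/* The markers: generated text clipped to an invisible 1px box.
   Nothing extra appears on screen, and the element's own content is not hidden. */
s.struck::before,
s.struck::after {
  clip-path: inset(100%);
  height: 1px;
  overflow: hidden;
  position: absolute;
  white-space: nowrap;
  width: 1px;
}

s.struck::before {
  content: " [start of stricken text] ";
}

s.struck::after {
  content: " [end of stricken text] ";
}
```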

That was from the very first page I hit that had an accessibility note. Which means I am wary of what genuine user-harming advice this tool will offer on more complex concepts than simple stricken text.

ericwbailey commented 1 year ago

To @aardrian's point: Utilizing inaccessible code may have legal ramifications, to say nothing about the ethical problems of restricting others' access. What risk and responsibilities does the MDN incur if an organization incorporates inaccessible code suggestions and advice provided by this feature?

fenndev commented 1 year ago

As a person working towards becoming a web developer, I trust MDN to contain accurate, fact-checked information. For every minute this may save someone, it would surely cost hours of troubleshooting for another, especially newer developers who utilize MDN as a learning and reference tool extensively. This is damaging both to the developer community and the reputation of MDN as a trusted resource; while I might not have extensive experience as a web developer, I hope that a newbie perspective might also be helpful.

fernandoacorreia commented 1 year ago

Deciding to implement this feature implies a fundamental misunderstanding about what LLMs do. MDN users are looking for authoritative, correct information, not for plausible-looking autogenerated fiction. This puts the good judgment of MDN's team in question.

AMDAndy commented 1 year ago

I am warning my team about this feature and letting them know not to trust it.

colin-p-hill commented 1 year ago

This feature does not seem to be well-targeted at the problem it is meant to solve. Writing technical documentation is time-consuming and difficult, so wanting to automate is understandable – but the target audience are precisely those people who do not have the requisite knowledge to spot mistakes, so the "Was this answer useful?" feedback buttons don't seem likely to weed out bad explanations quickly or reliably enough to avoid problems.

There is already some work done on reasoning about where and how to automate tasks appropriately and effectively, and I recommend using it as a starting point for designing features like this. It may be more appropriate in this case, for example, to build a tool at Sheridan and Verplank's LOA 3 by using AI to generate text assets which are then reviewed and edited by a human expert before publication.

PrivateGER commented 1 year ago

Placing GPT-based generations on a website that used to be for accurate documentation is so incredibly off-brand that I find it just...confusing. Newbies will find this, they will use this, and they will be fed misinformation that they cannot reasonably be expected to discern.

There's nothing to really be gained by this feature, it just smells like chasing trends with no thoughts given to the actual downsides. Not even to mention the legal issues that stem from generations of code matching public licensed code, which remains an unsolved problem.

krryan commented 1 year ago

It is beyond bizarre that I will now have to recommend people avoid MDN and use w3schools instead.

aardrian commented 1 year ago

This from the <mark> element page gets the same CSS concept wrong in a fun new way.

mark::before,
mark::after {
  clip-path: inset(100%);
  clip: rect(1px, 1px, 1px, 1px);
  height: 1px;
  overflow: hidden;
  position: absolute;
  white-space: nowrap;
  width: 1px;
}

mark::before {
  content: " [highlight start] ";
}

mark::after {
  content: " [highlight end] ";
}

From the fake-AI:

Overall, this code example creates a highlight effect by using pseudo-elements to add invisible elements before and after the content of the <mark> element. These invisible elements are positioned absolutely and have a small size, effectively hiding them from view. The content property is used to add visible text before and after the <mark> element's content, creating the highlight effect.

The essentially same code from the <del> element page gets this explanation:

Overall, this code example creates a visual representation of a deleted text by hiding the content of the <del> element and adding " [deletion start] " before the hidden content and " [deletion end] " after the hidden content.

I will spare you the same advice for the same code on the <ins> page.

The point is, the LLM in use does not understand CSS. Nor accessibility.

patrickhlauke commented 1 year ago

i mean at this stage, should at the very least add a big fat "this explanation may actually be complete toss" warning in front of it. or, you know, reevaluate what the actual point of having this "feature" is if it's a crap-shoot whether it's useful or just a pile of hallucinated rubbish

DavidJCobb commented 1 year ago

What is this feature even meant to offer? It's taking documentation and examples authored by thinking human beings who are capable of comprehending things, and bolting on clumsily-generated nonsense written by an uncomprehending automaton. That is: there's already an explanation; the entire page is an explanation; and if that explanation is insufficient, it should be edited by a thinking human being to improve it; sloppily bolting an AI-generated addendum onto it is not the right approach.

Even just looking at more of the code blocks on the article for grid: I clicked "AI Explain" on the HTML for one of the examples -- a code block with a #container element and several empty divs. Predictably, the LLM spat out three or four paragraphs of "middle schooler padding for word count"-tier dross about how the example "demonstrates how to add child elements," because the LLM couldn't comprehend the context of the code block. It couldn't and didn't properly explain the code block in the context of the grid property, the broader thing that that HTML was meant to demonstrate. "The HTML was setting up a single grid container, and a set of divs that would be rendered as colored blocks to visually illustrate the grid layout." If an explanation is actually necessary, that's a proper explanation.

Everything about this is blatantly, obviously wrong. What understanding of LLMs and of documentation as a concept could possibly lead to someone thinking this is a good idea?

patrickhlauke commented 1 year ago

What understanding of LLMs and of documentation as a concept could possibly lead to someone thinking this is a good idea?

the thinking of "actual human writers are expensive (if actually employed) ... we can save money through the power of AI"

makyen commented 1 year ago

I view this "feature" as a fundamental betrayal of the core MDN mission. That this "feature" would make it past the concept stage to even begin implementation demonstrates either a total lack of understanding of how LLM machine learning works and what "genAI" is capable of and/or a total disregard of MDN's mission. Having either of those happen in the process from concept to implementation is a complete failure.

By implementing and deploying this "feature", MDN has convinced me to stop contributing to MDN and cease donating to the Mozilla Foundation, because I am completely unwilling to participate in perpetuating the massive disinformation which this "feature" presents to users and the dramatic confusion and waste of people's time which it will cause.

Obviously, I will also stop recommending MDN as a good source of documentation. I will also need to remove links to MDN from everything I've written which can be edited.

I am so very, very disappointed in Mozilla/MDN.

acdha commented 1 year ago

This was very disappointing as a now-former MDN contributor and subscriber. The whole point of MDN was authoritative content but until there are some fundamental improvements in LLMs I might as well be supporting W3 Schools.

mcclure commented 1 year ago

Can confirm, the "AI" buttons seem to be on every page on the site.

Aside from the ethical, legal and reputational issues here— practically speaking, until I have been assured that all "AI" integration and content has been permanently removed from MDN, I cannot trust, or use, MDN for any purpose. If you put it in one place, how do I know you have not put it in another? The "AI" corruption is already interleaved with the content. Currently it seems you have to click in specific marked places to get the "AI" content to generate, but how can I be sure that this will remain the case in future?

The function that MDN serves for me is to be an authoritative source. Lots of websites can tell me how to use, I dunno, the vertical-align attribute, but (other than the spec itself, which is not always practical as a day-to-day reference) developer.mozilla is the one place I can go to look something up and know that it is unequivocally accurate and grounded in the spec and browser practice. If developer.mozilla is now to be an admixture of verified information and speech-like strings randomly generated by a text statistical model, then developer.mozilla no longer serves that function (being an authoritative source). Either I have to double-check what I'm reading or I don't.

nickautomatic commented 1 year ago

I think this might be the single worst use of machine learning that I've yet seen. It's genuinely shocking that this has managed to find its way from "terrible idea someone had" to production: that it has - that either no-one said "this is a bad idea", or that the people who said it were not listened to - suggests serious organisational problems. MDN is an absolutely indispensable technical reference: the decision to add software capable of generating plausible-sounding explanatory text with absolutely no way to guarantee its correctness beggars belief.

StarWitch commented 1 year ago

Wage suppression at its finest. Pay people to write documentation, or don't even pretend to be in the business of providing "standards" documentation at all. Simply rm -rf the whole repository and call it a day.

LLMs are simply the most sophisticated form of "garbage in, garbage out" that we have seen thus far, yet if you call it "AI" and claim that it "hallucinates" rather than "presents incorrect information that its mathematical formulas determine is correct-ish (maybe)", people will add it to anything and everything just to stay relevant and rake in those sweet VC bucks.

chris-barry-rs commented 1 year ago

I can offer another example of the explain feature being both obviously and subtly incorrect:

image

This response is clearly wrong in its statement that there is no closing tag, but also incorrect in its statement that all HTML must have a closing tag; while this is correct for XHTML, HTML5 allows for void elements that do not require a closing tag.

Qix- commented 1 year ago

I've championed MDN as being some of the best documentation for web development on the internet for what feels like a decade. These new changes are absurd and antithetical to everything the site has been about.

Out of curiosity, does anyone know of any alternatives? My SO is learning to code via MDN and I don't want to have to un-teach whatever nonsense LLMs spew out.

Kiloku commented 1 year ago

I made sure to leave my feedback about the feature at the form offered on AI Help's page

NotNite commented 1 year ago

Putting my two cents in - this is legitimately depressing. MDN has been the greatest resource for learning web development in the last while, and it has saved me countless hours of Googling with explanations that work. As seen by the AI lying countless times already, stapling new technology onto a website just because it's new doesn't mean it's a good idea.

Helpful documentation now suddenly feeding garbage to its users both impacts the long time fans and new people just starting out. People starting to learn web technologies are going to read actively wrong and harmful information from a source they were told they could trust.

What else is there - the SEO hell of W3Schools? The sea of StackOverflow questions? MDN can't go down this route, because it's one of the few places that you can rely on. Maybe that 'can' will be a 'could' if this keeps up.

stevefaulkner commented 1 year ago

Remove this feature now before MDN’s credibility and trustworthiness is destroyed.

immibis commented 1 year ago

I think this is a great feature. Until now I've had to hack into my competitors' internet connections to make them download fake information from the internet, and TLS made that quite difficult. But now that Mozilla is putting its own digital signature on outright lies, my job is greatly simplified. I just have to send them a link to MDN on Discord and my job is done. Great work, Mozilla!

Fire-Dragon-DoL commented 1 year ago

From my perspective a reference documentation should use no AI at all. Accuracy is the most important quality of a reference doc.

ai has to learn from somewhere and that somewhere should be mdn

instructorkjrsten commented 1 year ago

I teach web development. I've always recommended and championed MDN's content, and warned students against using W3Schools and other sources that are not as committed to accuracy and usability.

Now I cannot recommend MDN.

sideshowbarker commented 1 year ago

MDN core reviewer/maintainer here.

Until @stevefaulkner pinged me about this (thanks, Steve), I myself wasn’t aware that this “AI Explain” thing was added. Nor, as far as I know, were any of the other core reviewers/maintainers aware it’d been added. Nor, as far as I know, did anybody get an OK for this from the MDN Steering Committee (the group of people responsible for governance of MDN) — nor even just inform the Steering Committee about it at all.

The change seems to have landed in the sources two days ago, in https://github.com/mdn/yari/commit/e342081cbf92073ca2071e8af8a9a329b05f3d29, without any associated issue, instead only a PR at https://github.com/mdn/yari/pull/9188 that includes absolutely no discussion or background info of any kind.

At this point, it looks to me to be something that Mozilla decided to do on their own without giving any heads-up of any kind to any other MDN stakeholders. (I could be wrong; I've been away a bit — a lot of my time over the last month has been spent elsewhere, unfortunately, and that’s prevented me from being able to be doing MDN work I’d have otherwise normally been doing.)

Anyway, this “AI Explain” thing is a monumentally bad idea, clearly — for obvious reasons (but also for the specific reasons that others have taken time to add comments to this issue to help make clear).

At this point, I can at least promise that I’m personally going to escalate this internally as high as I can, with as much urgency as I can (and have already started doing that, before even posting this comment) — with the aim of getting it removed absolutely as soon as possible.

SaphireLattice commented 1 year ago

I am rather concerned with the direction this mis-feature is suggesting.

Because let's admit, almost every single "AI integration" everywhere is actually just poking OpenAI with some data, sharing it to them, etc etc. It's not cheap, either. There's now a bunch of things that do not ever need this kind of integration yet it got shoehorned in there for no reason other than I guess PR?

That MDN maintainers were (apparently?) not involved in this, consulted or even notified about it, is extremely concerning. Their contributions have been basically said to not be good enough to explain the snippets and that a model whose entire purpose is to spit tokens one by one to simulate what a text might look like, can somehow do a better job.

It's reminiscent of these "explain/roast/dunk/whatever this" bots on Twitter that get pinged to... paraphrase the thing they were thrown at in a slightly different manner, possibly introducing fake data or dropping important parts of it.

If a feature to explain a snippet line by line (or close enough) was required, I imagine just letting contributors add a snippet would have worked just fine. Yes it would take a while to populate, but if there is an actual need for this, then it would eventually cover most things. Unless the goal is not to have an explanation but to just check off "we use AI!". Oh and this feature does not allow someone to add a manual explanation either...

eevee commented 1 year ago

well! this is quite a twist.

it's reassuring that this isn't widely considered a great idea internally, though disheartening that _some_one thought it was such a great idea that it wasn't worth telling anyone about. i did find the complete lack of a paper trail odd, but i don't know much about how MDN operates internally.

thanks for taking this seriously, @sideshowbarker. if this was handed down from higher up, then i hope MDN can find a way to insulate itself against further decisions from the responsible party.

erisdev commented 1 year ago

Incredible. So until this is fixed, Mozilla have created a situation in which I can no longer, in good conscience, recommend MDN as a reliable reference. I never thought I'd see the day.

I would have the job of whoever approved this if it were in my power. Mozilla needs responsible stewards, not trend chasing fools at the helm.

ericflo commented 1 year ago

I think this is a good feature and it should be improved upon with some better copy indicating it's generated automatically and may have errors. I think readers benefit overall, and that the quality will improve over time, especially (as has been suggested) if feedback mechanisms are improved so better training data can be collected.

DethRaid commented 1 year ago

This feature is providing incorrect information. How is it a good feature?

eevee commented 1 year ago

what possible use is a disclaimer to a reader who can't judge the accuracy of the generated text themselves (because if they could, they wouldn't need it)? either they ignore the disclaimer as meaningless fluff and absorb false information, or they listen to the disclaimer and have no real choice but to avoid the feature entirely. the only purpose a disclaimer serves is to blame the reader when they inevitably believe what MDN tells them. is that the message a technical reference should be sending?

the "better training data" treadmill is a nice fantasy, but perhaps it would be more efficient to simply write prose, rather than endlessly try and trick a predictive text engine into saying true things (and have it say false things in the meantime). remember, the tool powering this feature isn't designed to know things. it's designed to generate text that a human might plausibly write, and that's what it does.

DethRaid commented 1 year ago

it's designed to generate text that a human might plausibly write

This is a very important thing to remember with all these large language models. They're text generators, not oracles of truth. The text they generate happens to be true a lot simply because the text they were trained on happened to contain a lot of truths. If you only trained an LLM on the Star Wars expanded universe, it wouldn't know the first thing about CSS.

fenndev commented 1 year ago

@ericflo There is a world of difference between documentation that is poorly written and documentation that is incorrect or misleading. The former is frustrating, but is still usable; the latter is worse than useless.

And - "improve over time" through "feedback mechanisms"? You mean the process of improving documentation that already happens, is done by people with domain-specific knowledge, and is reviewed by other people in that field for accuracy, consistency, and ease of understanding?

We don't let users run untested code on our websites and put in a great deal of effort to prevent that. Why on earth would we let AI generate unreviewed text for our documentation? It defeats the entire purpose of docs in the first place.

nicolas17 commented 1 year ago

At this point, I can at least promise that I’m personally going to escalate this internally as high as I can, with as much urgency as I can (and have already started doing that, before even posting this comment) — with the aim of getting it removed absolutely as soon as possible.

Just send a pull request reverting it. How come something like this can get added without any review, background information, or informing the Steering Committee; but you need to escalate, discuss, and justify yourself to remove it again?

"Given the lack of discussion clearly this was merged prematurely by accident".