mdn / sprints

Archived: MDN Web Docs issues are tracked in the content repository.
https://github.com/mdn/content
Creative Commons Zero v1.0 Universal

Choose a Markdown format for GitHub-hosted content #1505

Closed wbamberg closed 5 years ago

wbamberg commented 5 years ago

We expect to use Markdown as an authoring format for GitHub-hosted content.

But there are several different versions of Markdown, with different features and levels of support by tools.

We should:

Acceptance criteria

ddbeck commented 5 years ago

OK, I wrote a gazillion words to myself to sort out this Markdown question. For more about:

see this Google doc here, but these are the important bits:

Requirements

The Markdown we choose must:

Beyond that, certain features are desirable for stumptown. It would be nice if the Markdown we choose has:

Subjectively, I’d expect our chosen Markdown specification and implementation to have:

If our chosen Markdown is missing a desirable feature, we may choose a plugin or extension which supports that feature, provided the plugin meets the same expectations we have for the main implementation, such as being well-documented, maintained, and cooperative with other ecosystem tools.

Proposal: use remark (via unified)

We should use GitHub Flavored Markdown (GFM) with the unified library (or its included remark library directly).

GFM meets our requirements and brings some benefits on top of CommonMark:

That said, GFM isn’t a perfect selection. It does have some shortcomings:

For implementation with stumptown, we should use remark or its parent library, unified.

unified/remark meets most of our requirements and demonstrates some strengths over other possible implementations. Some points in its favor include:

unified/remark does have a few drawbacks, but they seem surmountable. For example:

Ultimately, I couldn’t make a solid recommendation about whether to adopt any specific extensions to GFM or unified/remark, particularly for parsing front matter or definition lists. If we can commit to never exposing Markdown to stumptown consumers, then we can follow a principle I lay out below. If we can’t make that commitment, then we need to strictly adhere to GFM, leaving raw HTML (perhaps with custom tags) as our only option for “extending” GFM.

As a principle, if we adopt any extensions to GFM, then we should test those extensions for cooperation with GFM as specified. In other words, our extensions to GFM should be readable (if not pretty) in GitHub renders; GFM spell checkers or linters should be able to provide meaningful, if not complete, checks on our source. Neither GFM nor unified/remark appears to be a barrier to this principle, but I haven’t yet had an opportunity to test specific extensions for this.

wbamberg commented 5 years ago

From the Google doc (sorry I started commenting there then realised this place is probably better):

I’m going to assume that we’re choosing a Markdown only for internal use to stumptown and that we’re not going to ask stumptown consumers to parse Markdown.

We've mentioned before that l10n might be a consumer of Markdown. This doesn't seem certain, but how would that affect your recommendation?

wbamberg commented 5 years ago

Thanks for this, @ddbeck , it looks very sensible.

GFM and its implementations support features we already recognize as desirable beyond those offered by CommonMark, such as fenced code blocks (with language annotation) and tables.

These are both important features.

GFM itself doesn’t specify front matter,

I think we will need a way to process front matter (as the stumptown structures are currently defined). Currently we're using gray-matter, apparently (https://github.com/mdn/stumptown-experiment/blob/master/scripts/build-json/compose-examples.js#L9). I haven't tried this, but it looks as if this would be independent of our choice of Markdown parser (it seems like it just gives you the Markdown in content, and you can then parse that as you like).
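As a rough illustration of that separation, here is a minimal sketch in plain JavaScript. It is not gray-matter's implementation: `splitFrontMatter` is a hypothetical helper, and it assumes the front matter is fenced by `---` lines starting on the very first line of the file. It returns the raw front matter and the Markdown body separately, so the body could be handed to whichever Markdown parser is chosen.

```javascript
// Hypothetical helper (not part of gray-matter): split a document into
// its raw front matter text and its Markdown body.
// Assumption: the front matter is fenced by lines containing only `---`
// and the opening fence is the very first line.
function splitFrontMatter(source) {
  const lines = source.split(/\r?\n/);
  if (lines[0] !== "---") {
    // No front matter: the whole file is Markdown.
    return { frontMatter: null, content: source };
  }
  const end = lines.indexOf("---", 1);
  if (end === -1) {
    // Unterminated fence: treat everything as Markdown rather than guess.
    return { frontMatter: null, content: source };
  }
  return {
    frontMatter: lines.slice(1, end).join("\n"),
    content: lines.slice(end + 1).join("\n"),
  };
}

// Made-up sample document, for illustration only.
const sampleDocument = [
  "---",
  "some: yaml",
  "goes: here",
  "---",
  "# Title",
  "",
  "Body text.",
].join("\n");

const split = splitFrontMatter(sampleDocument);
console.log(split.frontMatter); // raw YAML text, still unparsed
console.log(split.content); // Markdown body, ready for any parser
```

The raw front matter would still need a YAML parser, and the body a Markdown parser; the point of the sketch is only that the two steps do not have to come from the same tool.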

If we can commit to never exposing Markdown to stumptown consumers, then we can follow a principle I lay out below. If we can’t make that commitment, then we need to strictly adhere to GFM, leaving raw HTML (perhaps with custom tags) as our only option for “extending” GFM.

I think that even if we couldn't commit to "never exposing Markdown to stumptown consumers", we might still be able to commit to never exposing front matter.

ddbeck commented 5 years ago

Thanks for taking a look at this, @wbamberg! I realize there was a lot to go over.

We've mentioned before that l10n might be a consumer of Markdown. This doesn't seem certain, but how would that affect your recommendation?

If we can obligate translators to handle Markdown in a particular way (unified has a nifty preset API for making it easier for this to happen), then my recommendation still stands. In other words, if localized Markdown is ultimately converted to HTML for general consumption, then we can treat localization as an "internal" use, even if localization consumes a JSON structure that contains Markdown instead of HTML.

On the other hand, if we mean for Markdown to be an option for general consumption alongside or in place of HTML, then my recommendation would be to strictly follow GFM and use raw HTML for any extension use cases (e.g., use plain <dl> tags instead of an extension to Markdown). We might be able to do some tricks with custom elements/Web Components for more complex cases, but for the most part we'd be constrained to plain GFM and HTML.

I think we will need a way to process front matter

Yes, definitely. I sorta skated past that. You're right that we don't need to ever expose it to consumers—an unstated assumption on my part—and I didn't give it much thought beyond that. But to expand on the front matter situation a little:

The bad news is that nobody included front matter in a specification. The good news is that it doesn't seem to matter much, provided we use some conventional-looking front matter. That basically means YAML, blocked like this:

---
some: yaml
goes: here
---

(Or we could use TOML with +++ fencing, but I recognize that TOML is unusual and I'm in the tiny minority that prefers it.)

I was mistaken in my original write-up: unified does have a package for parsing front matter, which we could use, or we could stick to gray-matter. The semantics of remark's approach are slightly different (the front matter becomes a YAML content node of the document rather than something cleaved from the content), but it doesn't seem any harder to work with, if we want to stick to one ecosystem.
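To make the difference between the two approaches concrete, here is a sketch of the two shapes. Both objects are hand-written for illustration rather than produced by running either library. The first mimics a result where the front matter has been cleaved from the content (gray-matter style, with `data` and `content`). The second mimics a syntax tree in which the front matter is a `yaml` node among the document's children (the shape one would expect from remark with a front matter plugin). `frontMatterFromTree` is a hypothetical helper.

```javascript
// Shape 1: front matter cleaved from the content (gray-matter style).
// Hand-written for illustration, not actual library output.
const cleaved = {
  data: { some: "yaml", goes: "here" },
  content: "# Title\n",
};

// Shape 2: front matter kept as a node of the document tree (remark style).
// Also hand-written; real trees carry more fields, such as positions.
const tree = {
  type: "root",
  children: [
    { type: "yaml", value: "some: yaml\ngoes: here" },
    { type: "heading", depth: 1, children: [{ type: "text", value: "Title" }] },
  ],
};

// Hypothetical helper: pull the raw YAML text out of such a tree.
// Returns null when the document has no front matter node.
function frontMatterFromTree(root) {
  const node = root.children.find((child) => child.type === "yaml");
  return node ? node.value : null;
}

console.log(cleaved.data.some); // already-parsed value
console.log(frontMatterFromTree(tree)); // raw YAML text, still to be parsed
```

In the tree shape the YAML is still raw text, so a YAML parser is needed either way. The practical difference is whether the front matter travels with the document tree or alongside it.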

wbamberg commented 5 years ago

OK, thanks for the clarifications @ddbeck . I'm happy with the choice and the process you've used to arrive at it.

To close this issue, looking at the AC above:

decision is documented with a rationale

It would be good to record this choice and the reasoning for it in the stumptown repo rather than a random issue under mdn/sprints (basically copying your doc or a version of it somewhere there), but otherwise I think we can call this done.

the chosen tool(s) along with any extensions is deployed

I guess this is a quite simple change to stumptown-experiment.

ddbeck commented 5 years ago

It would be good to record this choice and the reasoning for it in the stumptown repo rather than a random issue under mdn/sprints (basically copying your doc or a version of it somewhere there), but otherwise I think we can call this done.

OK. I'm going to max out on hours this week. Should we put this officially in the next sprint, to open a PR summarizing the decision?

wbamberg commented 5 years ago

Should we put this officially in the next sprint, to open a PR summarizing the decision?

We talked about this in the planning meeting today. I think it would be good to keep this issue open to track this last bit, and add it to the next sprint, but as a lower priority for you than BCD. If you get time after BCD, then great, otherwise you can do that in a later sprint. I think with the work you've done here we have a solid basis to move ahead, and the remaining stuff is just paperwork really.

Does that make sense to you?

ddbeck commented 5 years ago

Sounds good!

a2sheppy commented 5 years ago

I generally feel that if we are going to use Markdown, we need to be able to avoid having to fall back to HTML as much as is remotely practicable. Any time you have to mix and match them to accomplish your tasks is a potential failure point in the markup that it would be best to avoid.

There are a number of articles about why Markdown is not a great choice for writing documentation, so I won't add to the ranting on that front, other than to say that I agree it is not a good choice, and that having to write while reading markup at the same time is tedious and awkward, so I hope we find a WYSIWYG editor to offer. But I presume that ship has sailed at this point anyway. :)

Some thoughts I have on this issue:

I know there's more but that is what comes immediately to mind. I hope we find a solution that supports everything we need well.

jpmedley commented 5 years ago

I started reading Eric's comments with the intent of arguing against him because Google has a site where we've successfully used markdown for years.

But he convinced me.

We frequently mix markdown and HTML specifically because of figures and videos like the ones he mentions. After reading the list of additional ways that MDN would have to mix md and HTML, it appears that you will likely only be saving us from typing a handful of tags: hn, p, code, and pre.

(By the way, I think the code problem alluded to was specific to your engine. Google's site has never had this problem that I am aware of.)

a2sheppy commented 5 years ago

@jpmedley Yeah, it's really a matter of how intricate the content is and how much of a focus you put on detailed presentation with embedded examples and whatnot. The more complex the content, the harder it is to shoehorn it into a Markdown world comfortably. We have a lot of figuring out to do if we really are going to migrate back to Markdown after all these years.

wbamberg commented 5 years ago

I don't expect that in this future MDN you will be able to do all the same things you could do in the old one, and I don't particularly think this is a bad thing. To make a very close analogy: you don't have as much freedom for how to represent compat data now as you used to have, and the payoff for this is (1) highly consistent tables, (2) the ability to easily change the appearance of tables, and (3) authors don't have to hand-craft the tables and can just focus on the content.

So I don't think we should approach this like: "we can do X now: therefore our replacement must also do X". Instead we want to understand better what are the things that an authoring format absolutely must support, and to do this we have chosen to experiment with Markdown. I agree that Markdown is very limited. But choosing an authoring format is going to be an exercise in compromise. There isn't a perfect authoring format: if there were, everyone would be using it.

So when we encounter things in MDN that Markdown can't support, we need to ask questions like:

Maybe Markdown will turn out to be too limited. We'll find that out by trying to migrate pages and asking the questions above, and if we do, we'll have to think again. Do you have a suggestion for a format that would be better? (I don't think HTML + CKEditor is better: for any nontrivial edit I usually find myself in the source view anyway, many of our pages contain junk HTML from people pasting things into the editor, and reviewing diffs is really difficult.)

Finally: for any content we structure, the pages are going to be built by software, and this will sometimes help with elements that aren't supported in Markdown. For example, the list of HTML element attributes is rendered as a <dl> from a list of attributes in the structured content. So you still get the <dl> in the rendered page, without Markdown needing to understand it. This kind of thing covers most of the uses of <dl> that we have in MDN (for example, lists of properties of an interface or lists of parameters to a function).
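A minimal sketch of that idea follows. It assumes a hypothetical structured-content shape (an array of `{ name, description }` objects) and a hypothetical `renderAttributesAsDl` function; stumptown's real data structures and renderer may look quite different.

```javascript
// Escape plain text for safe inclusion in HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical renderer: builds a <dl> from structured attribute data,
// so authors never have to write the <dl> themselves.
// Assumption: descriptions are plain text; in practice they might be
// Markdown that has already been converted to HTML.
function renderAttributesAsDl(attributes) {
  const items = attributes.map(
    ({ name, description }) =>
      `  <dt><code>${escapeHtml(name)}</code></dt>\n  <dd>${escapeHtml(description)}</dd>`
  );
  return `<dl>\n${items.join("\n")}\n</dl>`;
}

// Made-up sample data, for illustration only.
const sampleAttributes = [
  { name: "href", description: "The URL that the hyperlink points to." },
  { name: "download", description: "Prompts the user to save the linked URL." },
];

console.log(renderAttributesAsDl(sampleAttributes));
```

Because the <dl> is produced at build time, changing how attribute lists look means changing this one function rather than editing every page.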

ddbeck commented 5 years ago

Will has said nearly everything I started to write last night, but I wanted to add a few things:

I agitated for this process of making an explicit choice of Markdown specification and implementation because I think Markdown is flawed. I'm a Markdown hater. I will happily talk to anyone, at embarrassing length, about what's wrong with Markdown[1]. I do not think for a second that we will escape all of Markdown's shortcomings (though, in a lot of cases, it's no great harm to this project, as we already have many of those shortcomings and worse, on the wiki right now). But I think we can avoid the worst shortcomings of Markdown by being smart about how we use it, particularly in ways that decouple the way we author content and the way it's presented on the final page.

[1] Though it pains me to acknowledge this, Markdown has fewer problems in 2019 than it did five years ago. I have fewer bad things to say about Markdown today and that makes me wistful.