Using content to extend front-matter?

harrisjose commented 4 years ago

Thank you for making next-mdx-enhanced ❤️

How would you recommend using the mdxContent that process() receives in extendFrontMatter? Wouldn't it be better if the content part from gray-matter were passed into extendFrontMatter instead?

Right now I'm parsing it again using gray-matter, but this is already being done internally, so I figured we might as well reuse that. Also, do I convert it to HTML using mdx-js and react so that I can use it in my layout?

Something like

  extendFrontMatter: {
    process: (mdxContent) => {
      // matter comes from require('gray-matter')
      const { content } = matter(mdxContent) // mdx-enhanced already does this internally

      /* code to render mdxContent to HTML using mdx-js and react */

      return { content }
    },
    phase: 'prebuild',
  },

Am I missing something here? My use case is fairly straightforward: an index page that shows the title as well as the first few lines from each article. Any help or ideas would be great 😅

jescalan commented 4 years ago

Hey, I think you should be able to use scans for this. They behave the way you expect: https://github.com/hashicorp/next-mdx-enhanced#scanning-mdx-content

You'd just scan for a certain number of characters or lines, then retrieve this off the __scans property on the frontmatter
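For reference, a scan entry along these lines might look like the sketch below. The key name `excerpt`, the pattern, and the 140-character limit are made up for illustration, and the `pattern`/`transform` shape is taken from the README section linked above, so check it against the version you have installed. The last few lines run the regex by hand to show what the transform would return; they are not the library's own code path.

```javascript
// Hypothetical scan entry: grab up to the first 140 characters of the body.
// Key name, pattern and limit are assumptions for illustration.
const scan = {
  excerpt: {
    // Skip a leading front matter block if one is present, then capture
    // up to 140 characters of whatever follows.
    pattern: /^(?:---[\s\S]*?---\s*)?([\s\S]{1,140})/,
    // transform receives the regex match array; keep the captured group.
    transform: (match) => match[1].trim(),
  },
}

// Stand-alone check of what the pattern extracts from a sample file.
const sample = '---\ntitle: Hello\n---\n\nFirst paragraph of the post.\n'
const match = sample.match(scan.excerpt.pattern)
const excerpt = match ? scan.excerpt.transform(match) : ''
console.log(excerpt)
```

In a layout the result would then be read from `frontMatter.__scans`, under whatever key you chose.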

jescalan commented 4 years ago

Related, if you plan to use snippets, I recommend making a snippet frontmatter property on its own. Cutting out a short portion of existing content is initially a tempting way to save work and duplication, but it ends up being a fussy and complex process if you want it to actually work well.

First you generally want to make sure you aren't cutting off any words, so you will want to split by spaces initially. But then you can still cut off awkwardly in the middle of a sentence, so better to see if you can add or subtract a few words to get the sentences to line up right. This makes things a lot more of a mess. Then you have to start thinking about markup - since you're pulling raw markdown, what if the first sentence has a link? Do you include that, or cut it out? If you cut it out, the text might not make any sense, depending on writing style, so better to make sure it's still included. But then what if you end up cutting in the middle of a link? Now you have invalid markdown, or worse, an open tag that never closes and makes the rest of your page a link. And what about images? You probably don't want those in the snippet. Plus, you have to actually render that raw markdown - this library takes care of the render for the actual content, but not for the custom snippet extraction. Does that happen at runtime? Does it use the same library?

I could keep going but surely the point is clear: it's infinitely easier to have manually defined post snippets than to attempt to automate their creation. If you are really passionate about this though, honestly the best road would probably be a contribution to this library, or a separate library that takes care of snippet creation, since it's such a buggy and involved process.
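To make the trade-off concrete, here is a minimal sketch that prefers a hand-written `snippet` property and only falls back to a naive word-boundary cut. The function name `getSnippet` and the `snippet` key are hypothetical, and the fallback is intentionally simplistic so that it reproduces one of the failure modes described above.

```javascript
// Prefer a hand-written `snippet` front matter property; fall back to a naive
// word-boundary cut. The fallback ignores links, images and sentence
// boundaries, which is exactly where the problems described above begin.
function getSnippet(frontMatter, body, maxLength = 80) {
  if (typeof frontMatter.snippet === 'string' && frontMatter.snippet.trim()) {
    return frontMatter.snippet.trim()
  }
  const text = body.trim()
  if (text.length <= maxLength) return text
  // Look one character past the limit so a space exactly at the limit counts.
  const cut = text.slice(0, maxLength + 1)
  const lastSpace = cut.lastIndexOf(' ')
  // If there is no space at all, fall back to a hard cut.
  const end = lastSpace > 0 ? lastSpace : maxLength
  return text.slice(0, end).trimEnd() + '…'
}

const manual = getSnippet({ snippet: 'A hand-written summary.' }, 'Ignored body text.')
const automatic = getSnippet({}, 'See the [installation guide](https://example.com/install) first.', 30)
console.log(manual)
console.log(automatic)
```

The second call ends in the middle of a markdown link, leaving an unclosed `[`, which is the kind of breakage that makes the manual property the safer default.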

harrisjose commented 4 years ago

Thanks for taking the time out to explain this in detail. I was trying to get this working for my blog so that I could eventually start using it for larger sites. But I did run into the same issues you mention here.

I didn't add a frontmatter property right away because that would mean changing all the existing markdown files (that's a real pain for the large-ish sites at work) and also, it would introduce another thing for the content team to keep in mind while updating a page.

I was looking at how Gatsby does this and they have a way to get an excerpt for each markdown file. What's notable is that when a standard slice of the content doesn't work for a specific page, there's a way to define where to stop the excerpt using an excerpt_separator, a marker (for example an HTML comment) placed in the content itself.
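The separator idea is simple enough to sketch in plain JavaScript. This is not Gatsby's implementation, just an illustration of the behaviour; the marker value and the function name are assumptions, and whether a given marker survives MDX compilation is something to check separately.

```javascript
// Hypothetical marker; any string the content team agrees on would work.
const EXCERPT_SEPARATOR = '<!-- end -->'

// Return everything before the separator, or null when the file has none so
// the caller can fall back to another strategy.
function excerptFromSeparator(body, separator = EXCERPT_SEPARATOR) {
  const index = body.indexOf(separator)
  return index === -1 ? null : body.slice(0, index).trim()
}

const withMarker = 'Intro paragraph.\n\n<!-- end -->\n\nThe rest of the post.'
console.log(excerptFromSeparator(withMarker))
console.log(excerptFromSeparator('No marker here.'))
```

Returning null for files without a marker keeps existing content untouched, so only the pages that need a custom cut-off have to be edited.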

I'm going with defining a frontmatter prop right now because that makes more sense for a blog, but I'll probably hack around to see if I can copy what Gatsby does within extendFrontMatter when I get around to using this at work.

Again, thanks for open-sourcing this package, it has saved me a lot of time 😍

jescalan commented 4 years ago

Of course, you're very welcome! I'd be happy to work with anyone on adding excerpt logic into this library, I think it's a feature that makes sense. But it is a lot of work and I don't personally have a need for it right now, so I'm not the best choice to lead that effort 😀

joshclow commented 4 years ago

Popping in here to say that I'm also playing with this. Rather than modifying next-mdx-enhanced (which doesn't seem that helpful, since it doesn't parse the MDX itself, so there's no existing AST to hook into), I'm hooking in through extendFrontMatter, parsing the content with remark, and then running that AST through copies of the functions that Gatsby uses for this. Those filter on text nodes, make smart-ish decisions about word boundaries to pare the filtered tree down, and then stringify back to an excerpt.

I'll put this version up somewhere as an example once I sort out a bug with ignoring files
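As a rough illustration of the text-node approach, the sketch below walks a hand-written tree shaped like the mdast output of a remark parser, keeps only `text` nodes, and cuts at a word boundary. It has no dependencies and is neither the gist nor Gatsby's code; the tree, the function names, and the truncation rule are all assumptions, and a real version would get its tree from remark.

```javascript
// Hand-written stand-in for the mdast tree a remark parser would produce for:
// "Read the [docs](https://example.com) before *upgrading* anything."
const tree = {
  type: 'root',
  children: [
    {
      type: 'paragraph',
      children: [
        { type: 'text', value: 'Read the ' },
        { type: 'link', url: 'https://example.com', children: [{ type: 'text', value: 'docs' }] },
        { type: 'text', value: ' before ' },
        { type: 'emphasis', children: [{ type: 'text', value: 'upgrading' }] },
        { type: 'text', value: ' anything.' },
      ],
    },
    { type: 'image', url: '/cover.png', alt: 'cover' },
  ],
}

// Depth-first walk that keeps only the values of `text` nodes.
function collectText(node, out = []) {
  if (node.type === 'text') out.push(node.value)
  for (const child of node.children || []) collectText(child, out)
  return out
}

// Join the text, collapse whitespace, and cut at the last full word that fits.
function plainTextExcerpt(root, maxLength) {
  const text = collectText(root).join('').replace(/\s+/g, ' ').trim()
  if (text.length <= maxLength) return text
  const cut = text.slice(0, maxLength + 1)
  const lastSpace = cut.lastIndexOf(' ')
  return text.slice(0, lastSpace > 0 ? lastSpace : maxLength) + '…'
}

console.log(plainTextExcerpt(tree, 20))
console.log(plainTextExcerpt(tree, 200))
```

Because link and emphasis markup live in the node types rather than in the text values, the excerpt cannot end inside a half-written link, and image nodes drop out on their own.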

harrisjose commented 4 years ago

That would be super helpful @joshclow.

joshclow commented 4 years ago

Here's a gist with my current experiment: https://gist.github.com/joshclow/2609a82355e4a452ca5812794547688d

To use it, you'd pass processFrontMatter as the value of extendFrontMatter in the options for withMdxEnhanced.

This is only the "plain text" excerpt to start with (because that's what I've been using on my Gatsby based blog so it's the easiest way for me to get an apples-to-apples comparison), but I did also try a rough hack to use the Markdown excerpt that Gatsby can generate too, mostly to make sure that the AST navigation stuff worked as I expected it would.

The equivalent code in Gatsby starts here: https://github.com/gatsbyjs/gatsby/blob/876ba70f9b9f10672a5e8bb3a9633174d0af20ec/packages/gatsby-transformer-remark/src/extend-node-type.js#L467

and the Markdown and HTML excerpt code is in the functions immediately above this one (getExcerptMarkdown and getExcerptHTML). I found that the code overall is pretty self-contained, but it's very, very easy to lose track of whether you're looking at a function that takes plaintext/Markdown/HTML as input or one that emits that format, and the function names don't always disambiguate.

The main bit of redundant work in this implementation is that I run the content through gray-matter again, even though next-mdx-enhanced already does that. I would hesitate to bring this code into next-mdx-enhanced directly because it feels more opinionated than the Next ecosystem typically is (if you want an opinionated static site system, Gatsby is like, right there), but there might be an argument for enhancing extendFrontMatter so that you can get both the completely unprocessed mdx content (as comes through now) and the content with the frontmatter stripped off.
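Until something like that exists, the stripping has to happen inside process itself. The sketch below uses a dependency-free regex as a stand-in for gray-matter's `content`, purely for illustration; `stripFrontMatter` and the `summary` field are hypothetical names, and the `(mdxContent, frontMatter)` signature is an assumption to verify against the README.

```javascript
// Dependency-free stand-in for gray-matter's `content`: drop a leading
// `---` ... `---` block if present. Real front matter parsing has more edge
// cases (other delimiters, empty blocks), so treat this as illustration only.
function stripFrontMatter(mdxContent) {
  return mdxContent.replace(/^---\r?\n[\s\S]*?\r?\n---[ \t]*(?:\r?\n|$)/, '')
}

// Hypothetical extendFrontMatter value: derives a `summary` field from the
// first paragraph of the body unless the file already defines one.
const extendFrontMatter = {
  process: (mdxContent, frontMatter) => {
    const body = stripFrontMatter(mdxContent).trim()
    const firstParagraph = body.split(/\r?\n\s*\r?\n/)[0]
    return { summary: frontMatter.summary || firstParagraph }
  },
  phase: 'prebuild',
}

// In next.config.js this object would be passed along the lines of:
// module.exports = withMdxEnhanced({ extendFrontMatter })(/* next config */)

const raw = '---\ntitle: Hello\n---\n\nFirst paragraph.\n\nSecond paragraph.\n'
const extended = extendFrontMatter.process(raw, { title: 'Hello' })
console.log(extended)
```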

@harrisjose

hashicorp / next-mdx-enhanced

Using content to extend front-matter? #65