Generalize markdown driven page - generate table of contents

ghost commented 3 years ago

[x] Modify HTML generation to add ids to heading tags using remark-slug.
[x] Define minimal bindings for the relevant remark libraries to parse a markdown file.
[x] Using the remark bindings, generate a tree of headings from a markdown file, ~~save it into the front matter of the markdown file~~, and pass the result to the table of contents component.
[x] Switch from using mdxjs to rehype-stringify to render markdown.
[x] Use better ReScript utilities and code style in #296 with a follow-up PR.

Currently, we are implicitly using the following overly complex transformations when running mdx-js (https://github.com/mdx-js/mdx/blob/main/packages/mdx/index.js#L13):

unified()
  .use(remarkParse)
  .use(remarkMdx)
  .use(options.remarkPlugins)  // remarkSlug
  .use(customToc)
  .use(mdxAstToMdxHast)
We can simplify it down to:

unified()
  .use(remarkParse)
  .use(remarkSlug)
  .use(customToc)
  .use(remark2rehype)
  .use(rehypeStringify)

In particular, we will no longer be parsing and generating arbitrary javascript interleaved in the markdown.

ghost commented 3 years ago

Example of adding heading ids to markdown tree: https://github.com/facebook/docusaurus/blob/master/packages/docusaurus-mdx-loader/src/remark/headings/index.js
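To make the id step concrete, here is a minimal sketch of how heading text could be turned into unique ids. It is a simplified stand-in and not the code remark-slug runs: `makeSlugger` is a hypothetical helper, and the character rules (ASCII only, punctuation dropped) are an assumption made to keep the example short.

```javascript
// Simplified stand-in for the slugging that remark-slug performs.
// Assumption: ASCII-only rules; real slug libraries handle far more cases.
function makeSlugger() {
  const seen = new Map(); // base slug -> number of times seen so far
  return function slug(text) {
    const base = text
      .toLowerCase()
      .trim()
      .replace(/[^a-z0-9 _-]/g, "") // drop punctuation and other characters
      .replace(/ /g, "-"); // spaces become hyphens
    const count = seen.get(base) || 0;
    seen.set(base, count + 1);
    // The first occurrence keeps the plain slug; repeats get a numeric suffix.
    return count === 0 ? base : `${base}-${count}`;
  };
}
```

Each page would get its own slugger so that repeated headings on the same page still receive distinct ids.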

agarwal commented 3 years ago

> save it into the front matter of the markdown file

Why is that useful? It stores in the markdown file information that is computable from the file itself. I feel this should not be done.

ghost commented 3 years ago

> Why is that useful? It stores in the markdown file information that is computable from the file itself. I feel this should not be done.

I am still thinking through how data will be passed between the phases of markdown processing (#245). I started with the simplest approach possible: each phase would read a markdown file and output a markdown file with more information or rendering added.

ghost commented 3 years ago

We are leaning towards performing table of contents generation in ReScript using remark, so I am updating the tasks above.

ghost commented 3 years ago

We might be able to use this general low-level table of contents library (https://github.com/syntax-tree/mdast-util-toc) to implement the rendered table of contents.

ghost commented 3 years ago

mdast-util-toc makes too many assumptions about how the TOC will be rendered. The docusaurus example is a good match and I will imitate that example.

The docusaurus example uses visit and mutates a resulting tree, but there are also operations like filter and map in the syntax tree utility library. I can keep polishing the traversal after it is working.

The docusaurus unit tests demonstrate some edge cases such as having a level three heading without a preceding level two heading.
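As a rough illustration of that edge case, the sketch below nests a flat list of headings with a stack. The input shape (`depth`, `text`, `id`) and the name `buildToc` are hypothetical, and attaching an orphaned level three heading to the nearest shallower heading is just one possible policy, not necessarily what docusaurus does.

```javascript
// Hypothetical input shape: [{ depth, text, id }, ...] in document order.
function buildToc(headings) {
  const root = { depth: 0, children: [] };
  const stack = [root]; // chain of currently open ancestors
  for (const { depth, text, id } of headings) {
    const node = { depth, text, id, children: [] };
    // Close every open heading that is at the same level or deeper.
    while (stack[stack.length - 1].depth >= depth) stack.pop();
    // The heading now on top is the nearest shallower one, even if levels are skipped.
    stack[stack.length - 1].children.push(node);
    stack.push(node);
  }
  return root.children;
}
```

With this policy, a level three heading that directly follows a level one heading becomes a direct child of the level one heading instead of being dropped.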

The implementation of rehype-toc (https://github.com/JS-DevTools/rehype-toc/blob/master/src/create-toc.ts) might provide useful hints on edge cases to be aware of.

ghost commented 3 years ago

> The docusaurus example uses visit and mutates a resulting tree, but there are also operations like filter and map in the syntax tree utility library. I can keep polishing the traversal after it is working.

Ultimately, there are two outputs from the markdown processing: a tree of JS data representing the table of contents, and the transformed markdown expressed as a tree of JS React elements. The mdast and unist utilities are meant to be used in the process of going from a markdown string to a tree of JS React elements. The table of contents data doesn't necessarily have to stay within the syntax tree format. It can begin by transforming a syntax tree and making use of the syntax tree utilities if we find them convenient, or it can go directly into a custom tree structure.
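A sketch of the "go directly into a custom structure" option: a hand-written depth-first walk over an mdast-shaped tree that collects plain heading records without any syntax tree utilities. The helper names `textOf` and `collectHeadings` are hypothetical, and the code assumes a slug plugin has already stored each id under `node.data.id`.

```javascript
// Concatenate the text of all descendant nodes that carry a string `value`.
function textOf(node) {
  if (typeof node.value === "string") return node.value;
  return (node.children || []).map(textOf).join("");
}

// Depth-first walk that collects headings in document order.
function collectHeadings(node, out = []) {
  if (node.type === "heading") {
    // Assumption: a slug plugin has set data.id; otherwise id is undefined.
    const id = node.data && node.data.id;
    out.push({ depth: node.depth, text: textOf(node), id });
  }
  for (const child of node.children || []) collectHeadings(child, out);
  return out;
}
```

The resulting flat list is ordinary JS data, so it can be nested or handed to the table of contents component independently of the rendered output.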

ghost commented 3 years ago

Create a follow-up issue to investigate whether this plugin is powerful enough to forgo writing custom code: https://github.com/JS-DevTools/rehype-toc ... Upon further thought, rehype-toc won't be flexible enough for the mobile view of the table of contents.

ghost commented 3 years ago

> Modify HTML generation to add ids to heading tags using remark-slug.
>
> Define minimal bindings for the relevant remark libraries to parse a markdown file.
>
> Using the remark bindings, generate a tree of headings from a markdown file, save it into the front matter of the markdown file, and pass the result to the table of contents component.
>
> Switch from using mdxjs to rehype-stringify to render markdown.

These will be completed when #296 is merged.

ghost commented 3 years ago

> Use better ReScript utilities and code style in #296 with a follow-up PR.

This will be done in #322.

ghost commented 3 years ago

We will be converting this to an OCaml implementation.

ghost commented 3 years ago

This is partially done in #445.

ghost commented 3 years ago

This PR provides some insightful discussion on the evolution of the TOC generation implementation: https://github.com/ocaml/ood/pull/44#issuecomment-866901600.

ghost commented 3 years ago

I am closing this under the assumption that the HTML generation will remain in ood, so no additional work is needed.

ocaml / v3.ocaml.org-rescript

Generalize markdown driven page - generate table of contents #243