ocaml / omd

extensible Markdown library and tool in "pure OCaml"
ISC License

Support for GitHub-Flavoured Markdown tables #292

Closed bobatkey closed 1 year ago

bobatkey commented 1 year ago

First cut of support for GitHub-Flavoured Markdown (GFM) tables. No tests yet, but I thought I'd submit a preliminary pull request to make sure that there was interest. This handles simple tables that always have headers, allow only inline content in cells, and have no multi-column or multi-row cells.

All of the existing tests pass.

Addresses part of issue #205.

GFM-style tables are documented here: https://github.github.com/gfm/#tables-extension- . The code so far seems to work for all the examples in that document, but I haven't made a proper test suite yet, mostly because I don't yet understand how the testing works. I've noticed that there are a few differences between the GFM documentation and Pandoc's gfm implementation too.
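
For readers unfamiliar with the GFM delimiter row (the `| :-- | :-: | --: |` line under the header), here is a minimal sketch of how it maps to column alignments. It is not the code in this pull request: the `alignment` type and both function names are hypothetical, and it uses only the OCaml standard library rather than omd's `StrSlice`. It also ignores context that a real block parser needs, such as checking that the header row has the same number of cells.

```ocaml
(* Hypothetical type and names for illustration; not omd's actual API. *)
type alignment = Default | Left | Right | Centre

(* True when [s] is non-empty and consists only of '-' characters. *)
let all_dashes (s : string) : bool =
  let ok = ref (String.length s > 0) in
  String.iter (fun c -> if c <> '-' then ok := false) s;
  !ok

(* Classify one delimiter cell such as ":--", "--:", ":-:" or "---".
   Returns [None] when the cell is not a valid delimiter cell. *)
let alignment_of_cell (cell : string) : alignment option =
  let s = String.trim cell in
  let n = String.length s in
  if n = 0 then None
  else
    let left = s.[0] = ':' in
    let right = n > 1 && s.[n - 1] = ':' in
    let start = if left then 1 else 0 in
    let stop = if right then n - 1 else n in
    if not (all_dashes (String.sub s start (stop - start))) then None
    else
      match left, right with
      | true, true -> Some Centre
      | true, false -> Some Left
      | false, true -> Some Right
      | false, false -> Some Default

(* Parse a whole delimiter row. One optional leading and one optional
   trailing pipe are dropped before splitting on '|'. Returns [None]
   if any cell is not a valid delimiter cell. *)
let parse_delimiter_row (line : string) : alignment list option =
  let s = String.trim line in
  let n = String.length s in
  let start = if n > 0 && s.[0] = '|' then 1 else 0 in
  let stop = if n > start && s.[n - 1] = '|' then n - 1 else n in
  let cells = String.split_on_char '|' (String.sub s start (stop - start)) in
  List.fold_right
    (fun cell acc ->
      match acc, alignment_of_cell cell with
      | Some rest, Some a -> Some (a :: rest)
      | _ -> None)
    cells (Some [])
```

With these definitions, `parse_delimiter_row "| :-- | :-: | --: | --- |"` evaluates to `Some [Left; Centre; Right; Default]`, and a row containing ordinary text evaluates to `None`.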

I added some functions to the StrSlice module to make the parsing of tables easier.

There are choices in how to represent the cell alignment in the HTML output. I went for the one chosen by Pandoc (CSS style information), but the align="left/right/center" used in the GFM document might be better.
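
To make the two options concrete, here is a small sketch of both renderings of a cell's opening tag. The `alignment` type and the function names are hypothetical and are not taken from omd's HTML printer; the CSS form is my understanding of what Pandoc emits, and the attribute form follows the examples in the GFM document.

```ocaml
(* Hypothetical type and names for illustration; not omd's actual API. *)
type alignment = Default | Left | Right | Centre

(* The keyword used by both the CSS property and the align attribute;
   [None] means no alignment was specified for the column. *)
let keyword : alignment -> string option = function
  | Default -> None
  | Left -> Some "left"
  | Right -> Some "right"
  | Centre -> Some "center"

(* CSS style information on the cell (the Pandoc-like option).
   [tag] is expected to be "th" or "td". *)
let cell_open_css (tag : string) (a : alignment) : string =
  match keyword a with
  | None -> Printf.sprintf "<%s>" tag
  | Some k -> Printf.sprintf "<%s style=\"text-align: %s;\">" tag k

(* The align attribute, as shown in the GFM document's examples. *)
let cell_open_attr (tag : string) (a : alignment) : string =
  match keyword a with
  | None -> Printf.sprintf "<%s>" tag
  | Some k -> Printf.sprintf "<%s align=\"%s\">" tag k
```

For a centred header cell, the first function produces `<th style="text-align: center;">` and the second produces `<th align="center">`. Columns without an alignment get a bare `<th>` or `<td>` in both cases.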

I made up a conversion to Sexprs. I also noticed that inline code and images are not properly converted into Sexprs. What uses this translation?

I'm not sure if I'm handling link definitions properly in the case of lines at the end of blocks that look like they might be table headers, but aren't (line 80 in block_parser.ml).

bobatkey commented 1 year ago

These four new commits do the following:

  1. Fix up the parsing to cover some more cases, and to bring it in line with the GitHub-Flavoured Markdown specification. There is one exception, where the GFM spec contradicts itself on whether or not backslashes are always preserved in code spans.
  2. Adjust the HTML output to better match the GFM spec when printing tables. This makes testing easier.
  3. Add some tests. There are two sets of tests. The first is the excerpt of the GFM specification that covers tables. I modified the one that has the non-CommonMark backslash-in-code-span behaviour. The second set contains further examples covering cases that I discovered while deciding how to parse properly.
  4. Fix the code formatting by applying ocamlformat.
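
The backslash question in item 1 comes from how table rows are split into cells. As a rough illustration only (this is not the parser in these commits, and `split_row` is a hypothetical name), the sketch below splits a row on unescaped pipes and turns `\|` into a literal `|` everywhere, including inside backticks, which is the behaviour shown in the GFM tables examples. All other backslashes are left untouched for the inline parser.

```ocaml
(* Hypothetical helper for illustration; not omd's actual API.
   Splits one table row into trimmed cell strings. *)
let split_row (line : string) : string list =
  let s = String.trim line in
  let n = String.length s in
  let buf = Buffer.create n in
  let cells = ref [] in
  (* Close the current cell and start a new one. *)
  let flush () =
    cells := String.trim (Buffer.contents buf) :: !cells;
    Buffer.clear buf
  in
  (* Skip one optional leading pipe. *)
  let i = ref (if n > 0 && s.[0] = '|' then 1 else 0) in
  while !i < n do
    (match s.[!i] with
     | '\\' when !i + 1 < n && s.[!i + 1] = '|' ->
       (* Escaped pipe: keep the pipe, drop the backslash. *)
       Buffer.add_char buf '|';
       incr i
     | '|' -> flush ()
     | c -> Buffer.add_char buf c);
    incr i
  done;
  (* Text after the last unescaped pipe is a final cell; a trailing
     pipe has already closed the last cell, leaving the buffer empty. *)
  if Buffer.length buf > 0 then flush ();
  List.rev !cells
```

For example, the row written as ``| `\|` | b \| c |`` splits into the two cells `` `|` `` and `b | c`. Because the unescaping happens before inline parsing, a code span inside a table cell cannot keep the backslash in front of a pipe, unlike a code span in plain CommonMark.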

I hope this is a useful contribution!

shonfeder commented 1 year ago

Thank you very much for the contribution: It is most welcome!

From a superficial survey, the extensions look great, and the additions to the StrSlice module are most welcome. I will set aside some time for a proper review this weekend :)

shonfeder commented 1 year ago

> There are choices in how to represent the cell alignment in the HTML output. I went for the one chosen by Pandoc

I think following the lead of Pandoc is a good choice, unless there are compelling reasons to differ :)

> I made up a conversion to Sexprs. I also noticed that inline code and images are not properly converted into Sexprs. What uses this translation?

These aren't used for anything in Omd. I think they were only meant for debugging and possibly for interop with some other libraries?

Thanks for including an s-expr conversion!

tmattio commented 1 year ago

Thanks a lot for the fantastic work on this!

@shonfeder do you think we could cut a release of omd? Having this would solve a rather annoying issue in ocaml.org: https://github.com/ocaml/ocaml.org/issues/59

shonfeder commented 1 year ago

Hi @tmattio! I’ll make time to cut a new alpha release next week. :)

shonfeder commented 1 year ago

@tmattio a bit delayed, but see https://github.com/ocaml/opam-repository/pull/22654

tmattio commented 1 year ago

Thanks a lot @shonfeder!