mity / md4c

C Markdown parser. Fast. SAX-like interface. Compliant to CommonMark specification.
MIT License
759 stars 140 forks source link

Making the HTML renderer part of public interface? #82

Closed mity closed 4 years ago

mity commented 5 years ago

It seems there is some (growing?) demand to have the HTML renderer (implemented in md2html/render_html.[hc] + md2html/entity.[hc]) available for easy reuse. See e.g. #80 or https://www.reddit.com/r/rust/comments/b5cul0/benchmarking_markdown_libraries_comrak/ejg6c8m/

In some ways, I see it as quite a big step, especially if it becomes part of some "stable official API", and twice so if some Linux distros start to ship it and it becomes subject of some inter-package dependencies; and also because it could possibly make sort of a precedent for the future if some other renderers to more output formats perhaps get implemented.

So I have a lot of questions (and although I may have some opinions about some of them I am still open to discussion and interested in opinion of other people in different roles).

  1. Should its code live in a new separate dir; or be moved into md4c/?
  2. Should the renderer be built as a separate lib when configured to build as a static lib; or merged into libmd4c.a?
  3. Should the renderer be built as a separate lib when configured to build as a shared lib; or merged into libmd4c.so?
  4. Should it be instalable? (I.e. are distros or other projects interested in that?)
  5. Should the header render_html.h be merged into md4c.h or live as a standalone?

In any case it should likely undergo some renaming (render_html.h or potential librender_html.[a|so] is likely not specific enough; the names should contain some "md4c" to mitigate name conflicts etc.), also its API should be reviewed and polished before making it officially public so do not expect this to happen instantly.

Feedback? Opinions? Complains?

(cc @perezmeyer @ec1oud @hvoigt)

perezmeyer commented 5 years ago

Note: I'm understanding: provide the renderer as a library, not as an application.

  1. I would keep it where it is, or even split the test application into it's own directory.

  2. For static builds I think that a separate lib makes it more clear that they are different things. Then the related .cmake config file should list first md4c and then libmd4crenderer as dependencies, so everything goes smooth (static linking should be done in the right order).

  3. Same as above, dependencies will be handled automatically.

  4. If you provide stable API I don't see why it shouldn't be installable.

  5. Both would work, as the renderer will be highly dependant of the libmd4c version.

As for the naming: as I already wrote I think either libmd4crenderer is more clear and straightforward.

Hope that helps, feel free to ping me for more info/feedback if needed. And thanks a lot for asking!

hvoigt commented 5 years ago

Here my 2 cents:

  1. I would move it into md4c why split? Its not that there is a lot of stuff in there already. A directory can hold more that two files ;) IMO, the use case "convert markdown to html" is such a typical one that it deserves to be part of the library. I would make it as easy as possible for users.
  2. Same library
  3. Same library
  4. Yes why not installable as a library (but I would keep it as one)
  5. Keep it one header md4c.h. Simpler and the current header is not that full

I like the generic approach md4c has taken, which makes it very flexible. But with that flexibility comes complexity. One thing I believe in is: "Make typical things, simple to do". HTML output is such a typical thing to do, that I would make that simple. The current renderer is also very generic, so that it actually was very easy to interface directly with Qt's native QByteArray. IMO it deserves to be part of the main library. The example application I would keep seperate though.

As far as the name is concerned I do not really have an opinion, since I would keep it all in one library.

mity commented 4 years ago

I have created the PR #89 addressing it.

In short, I tried the approach all-sources-in-one-dir but standalone renderer build target (library) with its own public header, for reasons specified below.

all-sources-in-one-dir rationale:

To be honest, I'm not that sure it is a good idea -- actually I generally prefer one-target-per-dir approach. But I simply failed to come up with a dir structure which would really provide some value and which would scale well if we implement more renderers. I assume most renderes will just need one source + one header; and also that likely all (or most of them) will need to reuse the entity.h + entity.c.

standalone renderer lib rationale:

I don't want to have a big lib with a dozen of renderers, if/when those are implemented. And I also don't like an idea of a "priviledged" renderer which would live together. Both in general and also specifically here (although I understand that Markdown and HTML are historically close to each other, I also think this relation is much weaker for languages as C which are not that much used for writing webapps.)

Imho, it should still be easy to use: HTML-generating app can just include md4c-html.h instead of md4c.h and links -lmd4c-html instead of -lmd4c. We might even make the header completely independent on md4c.h. (Now such an app may need to use the macros for the parser flags if it is not happy with the default zero corresponding strictly to the CommonMark specs.; see also the comment discussing this in the PR.)

(Feedback is welcome.)

martinrotter commented 4 years ago

I included your fantastic code into my Textosaurus and I moved renderer files into md4c folder. So for my use case it is better that all those files are in single directory. It also makes sense because they are closely interconnected and related to each other. Also, we are talkou about <10 files so maybe folder fragmenration is not desirable/useful here.

dominickpastore commented 4 years ago

I have created the PR #89 addressing it.

In short, I tried the approach all-sources-in-one-dir but standalone renderer build target (library) with its own public header, for reasons specified below.

all-sources-in-one-dir rationale:

To be honest, I'm not that sure it is a good idea -- actually I generally prefer one-target-per-dir approach. But I simply failed to come up with a dir structure which would really provide some value and which would scale well if we implement more renderers. I assume most renderes will just need one source + one header; and also that likely all (or most of them) will need to reuse the entity.h + entity.c.

I wonder if a different two-directory (or even three-directory) approach might make sense:

Most applications using the library will need a renderer, but few will need the standalone md2html utility, so it seems like a logical separation to me. Then, md2html (which, ideally, could be built separately) would serve both as a useful tool and an example for how to use the libraries.

standalone renderer lib rationale:

I don't want to have a big lib with a dozen of renderers, if/when those are implemented. And I also don't like an idea of a "priviledged" renderer which would live together. Both in general and also specifically here (although I understand that Markdown and HTML are historically close to each other, I also think this relation is much weaker for languages as C which are not that much used for writing webapps.)

Imho, it should still be easy to use: HTML-generating app can just include md4c-html.h instead of md4c.h and links -lmd4c-html instead of -lmd4c. We might even make the header completely independent on md4c.h. (Now such an app may need to use the macros for the parser flags if it is not happy with the default zero corresponding strictly to the CommonMark specs.; see also the comment discussing this in the PR.)

I really like this approach. Most applications are going to need a renderer of some sort, most likely the HTML renderer, so making that a library makes a lot of sense. But different applications will sometimes need different renderers (thinking of e.g. PDF or manpage converters), or their own custom renderer, and this allows maximum flexibility.

I hope we can see this merged into master soon!

(Feedback is welcome.)

mity commented 4 years ago

I included your fantastic code into my Textosaurus and I moved renderer files into md4c folder. So for my use case it is better that all those files are in single directory. It also makes sense because they are closely interconnected and related to each other. Also, we are talkou about <10 files so maybe folder fragmenration is not desirable/useful here.

@martinrotter Imagine we do it that way and that the dir absorbs 5 other renderers you are not interested in (e.g. to man, pdf, rtf etc.). You would have then in your project filter what you link into it and what not. So, would it still be preferred in one dir?

mity commented 4 years ago

standalone renderer lib rationale: ...

I really like this approach ...

The standalone lib for every renderer is in my head more or less decided.

The source dir structure is not but at least that one is not subject of ABI, so we can likely proceed anyway and change the dir structure later if needed.

I wonder if a different two-directory (or even three-directory) approach might make sense:

src/ - Containing MD4C core library sources and renderer library sources (or even the renderer library sources in a subdirectory with entity.h/.c) md2html/ - Containing the sources for the standalone md2html utility.

That might be a reasonable compromise. I'm still not sure whether it is ideal, but it is definitely better then all in one dir: People likely don't need to reuse the utility sources so it makes good sense to separate that.

I'll adapt the PR #89 that way.

I hope we can see this merged into master soon!

If there are no complains in a week or two after the PR update, I will then merge it.

mity commented 4 years ago

The PR has been merged. Closing the issue.

dominickpastore commented 4 years ago

I'm not really sure where else to put this, so I'm just putting it here.

I have written Python bindings for MD4C based on the updates from this issue: PyMD4C. After PR #89 makes it into a MD4C release, I will make the first release for PyMD4C and add it to PyPI. Meanwhile, a beta version is available at the link.

(Side note: "PyMD4C" seemed like a logical name but if you prefer something that bears less similarity to "MD4C," I can change it.)

mity commented 4 years ago

I have written Python bindings for MD4C based on the updates from this issue: PyMD4C. After PR #89 makes it into a MD4C release, I will make the first release for PyMD4C and add it to PyPI. Meanwhile, a beta version is available at the link.

Good to know. Feel free to make a PR adding the link into the README.md to get some visibility if you like. There are already some links to some SW based on MD4C.

(Side note: "PyMD4C" seemed like a logical name but if you prefer something that bears less similarity to "MD4C," I can change it.)

No objections at all.