mity / md4c

C Markdown parser. Fast. SAX-like interface. Compliant to CommonMark specification.
MIT License
776 stars 146 forks source link

Provide a dynamic library #48

Closed perezmeyer closed 5 years ago

perezmeyer commented 5 years ago

Hi! The Qt project is considering adding this library to be used in QTextDocument:

https://codereview.qt-project.org/#/c/214842/

As such distro maintainers like me would really prefer to have this library as a dynamic/shared one. Would you accept a patch for this?

mity commented 5 years ago

Hi! The Qt project is considering adding this library to be used in QTextDocument:

https://codereview.qt-project.org/#/c/214842/

I see that as big compliment. :-)

As such distro maintainers like me would really prefer to have this library as a dynamic/shared one. Would you accept a patch for this?

Sure.

But in that case I should review/reiterate the parser API so that we can provide reasonable binary compatibility into the future. I believe the API has already reached some good and stable shape, but better to be sure.

Would you mind to postpone creation of the patch about for few days? (That would guarantee that some too early binaries don't circulate out there.) Of course, I'd notify you here when I consider it ready.

BTW, do you plan to include just the parser; or also the HTML renderer into the package? (In general I would prefer the former, as I am not sure I want to get feature requests for implementing all possible output formats, and it would also require some refactorization, but I am open to discussion.)

perezmeyer commented 5 years ago

I have actually started the patch, but I have no problem in updating it later.

I really value the fact that you want to review the API first.

With respect to the HTML renderer: it can be left as an example, so packagers will directly know that it's not intended for everyday usage.

Please, do not hesitate on pinging me if you nee dmore help/data/whatever.

Thanks!

mity commented 5 years ago

@perezmeyer

The md4c.h IMHO looks quite good, except the following points:

  1. MD_VERSION_xxx macros should likely be removed and replaced by some CMake machinery to output the right lib names.

  2. I am unsure about what versioning policy should be adopted with respect to the shared lib and you, as a package maintainer, can likely provide valuable feedback: Whenever newer CommonMark specification version is out, there might be changes how the markdown input is interpreted in some corner cases, but without any need for changes in the API. That has already happened in the past (e.g. in specification 0.27 -> 0.28 transition). Should such transitions be seen as incompatible changes (and lead to increasing the shared lib major version number) or not?

  3. Encoding is currently controlled with preprocessor macros (see https://github.com/mity/md4c/wiki/Embedding-Parser%3A-Encoding). I guess that (at least for Linux) the shared lib should likely be built with -DMD4C_USE_UTF8 by default. Do you know what Qt expects? Or should the UTF-8 support rather be controlled by some run-time flag? (That would require some work.) Any opinion?

perezmeyer commented 5 years ago

@perezmeyer

The md4c.h IMHO looks quite good, except the following points:

  1. MD_VERSION_xxx macros should likely be removed and replaced by some CMake machinery to output the right lib names.

That's how is normally handled within KDE projects :-/ I really don't know any other way to solve this. Maybe you have an idea of what you expect so I can take a look?

  1. I am unsure about what versioning policy should be adopted with respect to the shared lib and you, as a package maintainer, can likely provide valuable feedback: Whenever newer CommonMark specification version is out, there might be changes how the markdown input is interpreted in some corner cases, but without any need for changes in the API. That has already happened in the past (e.g. in specification 0.27 -> 0.28 transition). Should such transitions be seen as incompatible changes (and lead to increasing the shared lib major version number) or not?

Now that's an interesting question. The SONAME (so, the major version) could definitely change so any user of the library will know in advance that something might break, but then I'm not entirely sure this is really needed.

In this specific case I would start by not bumping the version/SONAME and see what happens.

  1. Encoding is currently controlled with preprocessor macros (see https://github.com/mity/md4c/wiki/Embedding-Parser%3A-Encoding). I guess that (at least for Linux) the shared lib should likely be built with -DMD4C_USE_UTF8 by default. Do you know what Qt expects? Or should the UTF-8 support rather be controlled by some run-time flag? (That would require some work.) Any opinion?

I have just pinged Shawn Rutledge, who is the developer in charge of adding markdown support. He will surely have a much better idea of this.

Thanks for your fast replies!

perezmeyer commented 5 years ago

@perezmeyer

The md4c.h IMHO looks quite good, except the following points:

  1. MD_VERSION_xxx macros should likely be removed and replaced by some CMake machinery to output the right lib names.

Sorry! Now that I re read this I understood it. That's what I did in https://github.com/mity/md4c/pull/49

mitya57 commented 5 years ago
ec1oud commented 5 years ago

I'm the one who started this idea (and the patches) to integrate it into Qt. I'm only interested in the parser so far: my proposal is to use it to provide another import format for QTextDocument. (And I have a way to export too, but wrote that from scratch.) However it seems Qt Creator has a use for HTML generation: https://codereview.qt-project.org/#/c/249032/

Yes, there has been backlash that it would be nice if it were available as a system library. I was surprised a bit because we bundle other third-party code, but now there's a movement to stop doing that. I suppose the idea is that if a bug is found, the third-party library can be updated independently, but not quite sure if that's the main motivation.

Using discount has been proposed... but it doesn't expose any parsing API AFAICT, only functions to generate HTML or some other parse tree format which would then have to be parsed again. I didn't really consider cmark because I think CommonMark is too limited by itself, whereas the features that QTextDocument and those that md4c support are nearly the same, so that's why it seems to be a good fit.

Fuzzing will probably continue, so we'll see if any more bugs are found that way.

I don't have experience putting things into Arch AUR, but I suppose I could give it a shot if nobody else gets around to it, just so we can test out the abililty of Qt to depend on a system lib for this.

So I'm glad you consider the shared lib a good idea.

1) sorry I'm not an expert on cmake or how library versioning ought to be done.

2) every change may be an improvement for someone and/or may generate a bug from someone else. I guess we'll see how that plays out. Hopefully there won't be big changes that break parsing of many existing documents?

3) I think it's fine if the Qt markdown importing feature supports UTF-8 only, unless I'm missing something. But Qt is known for having good i18n support, and I haven't tested markdown in other languages than English, so I hope it works. (I should do that, actually)

perezmeyer commented 5 years ago

I'm the one who started this idea (and the patches) to integrate it into Qt. I'm only interested in the parser so far: my proposal is to use it to provide another import format for QTextDocument. (And I have a way to export too, but wrote that from scratch.) However it seems Qt Creator has a use for HTML generation: https://codereview.qt-project.org/#/c/249032/

Thanks for chimming in!

Yes, there has been backlash that it would be nice if it were available as a system library. I was surprised a bit because we bundle other third-party code, but now there's a movement to stop doing that. I suppose the idea is that if a bug is found, the third-party library can be updated independently, but not quite sure if that's the main motivation.

That's one of the most important ones, indeed. But definitely not new, it has always been like that within distros. Maybe we became more verbose within Qt lately ;-)

Using discount has been proposed... but it doesn't expose any parsing API AFAICT, only functions to generate HTML or some other parse tree format which would then have to be parsed again. I didn't really consider cmark because I think CommonMark is too limited by itself, whereas the features that QTextDocument and those that md4c support are nearly the same, so that's why it seems to be a good fit.

Fuzzing will probably continue, so we'll see if any more bugs are found that way.

I don't have experience putting things into Arch AUR, but I suppose I could give it a shot if nobody else gets around to it, just so we can test out the abililty of Qt to depend on a system lib for this.

Well, I do handle the Debian (and by cascade, Ubuntu) side, but if you find any issues let me know, I'll see to fix them.

So I'm glad you consider the shared lib a good idea.

  1. sorry I'm not an expert on cmake or how library versioning ought to be done.
  2. every change may be an improvement for someone and/or may generate a bug from someone else. I guess we'll see how that plays out. Hopefully there won't be big changes that break parsing of many existing documents?
  3. I think it's fine if the Qt markdown importing feature supports UTF-8 only, unless I'm missing something. But Qt is known for having good i18n support, and I haven't tested markdown in other languages than English, so I hope it works. (I should do that, actually)
mity commented 5 years ago

I have done some changes to the MD_RENDERER struct:

App developers may need to touch the initialization of the structure. Sorry for that, but I believe it is now much more friendly for the ABI compatibility maintenance.

So, right now, I consider the API ready for the shared lib support, but anyone feel free to review/comment until we make next release.

I also added new file CHANGELOG.md to track the changes in human readable form, so distro packages can install it somewhere into /usr/share/.

Unless there are any objections to the current shape, I consider it ready for adding the support for shared lib as soon as #49 is ready; and then making RC and give it some time (two weeks?) to see if there is any feedback, e.g. from the Qt team, and then releasing it with some fanfares.

Does anyone have some strong preference whether the next release should be 0.3.0 or 1.0.0? I am undecided between the two.

mity commented 5 years ago

@perezmeyer Maybe I was not explicit enough in the previous comment (if so, sorry for it); or perhaps you are just busy on something else. Just to be sure that the message is not lost, I consider the API ready and we can go ahead as soon as your patch is ready.

perezmeyer commented 5 years ago

It's me that is busy :-) ACK, will try to finish that soon.

perezmeyer commented 5 years ago

@mity The static lib was meant just to build md2html or are you expecting to ship the static lib too?

Considering there where no install targets I assume it's the first option. If that's so, then maybe just building the shared library is enough, thus simplifying the rest of #49.

mity commented 5 years ago

To be honest, I didn't care about installing of anything. Installing the shared lib is likely good enough until anyone needs the static one.