jgm / pandoc

Universal markup converter
https://pandoc.org

Include Cabal foreign-library to build C shared library? #6611

Open Sarke opened 4 years ago

Sarke commented 4 years ago

Cabal has supported this for a couple of years now, so would it be possible to also build a shared library and increase the usefulness of pandoc?

https://cabal.readthedocs.io/en/latest/cabal-package.html#foreign-libraries

I know there is https://github.com/ShabbyX/libpandoc but it doesn't seem active, and it was made before Cabal introduced its foreign-library feature.

This would be great because many languages, not just C-based ones, would be able to link it directly (Go, Rust, PHP, Ruby, Python, etc.).

jgm commented 4 years ago

I'm definitely open to this, if someone wants to try, though I know nothing about it. Some thought would have to go into the interface.

tarleb commented 4 years ago

Slightly related: I keep thinking about whether it would make sense to rebuild some part of pandoc's Lua filter with C: AST elements would be marshaled to C structs, and accessors would be written in C.

My gut feeling is that this would increase code size of our current approach by a factor of 2, but coupling should stay approximately the same. The advantage would be that we'd get a C library to modify the pandoc AST; possibly some better performance, too.

All my previous attempts to reduce the amount of Lua code failed due to performance; callbacks from Lua/C into Haskell are comparatively slow, despite my optimization efforts. Rewriting in C wouldn't help with code reduction, but it might help others to bind to pandoc.

estebanlm commented 3 years ago

Hello, I'm here to push for this issue :) (someone in the Google group recommended that I do so). I currently work on developing a language that is not very well known (pharo.org), whose documentation system uses Markdown. I am developing a tool that would let us generate documentation in different formats, and I want to use pandoc for it (why do worse what you already do the right way? ;) ).

Using the command line for this is a bit annoying. Yes, I can use a system call, etc., but the ability to use pandoc through FFI would ease development a lot and encourage other Pharo users to use it ;)

mb21 commented 3 years ago

Does anybody know how stable the ABI would be? Would it change with every major GHC release?

> I keep thinking about whether it would make sense to rebuild some part of pandoc's Lua filter with C

What about Rust? 😄

jgm commented 3 years ago

The simplest possible interface would just export convertWithOpts, using a JSON serialization of the Opts structure representing command-line options and arguments. This would not be hard to do at all, but it would offer minimal advantages over shelling out to the command-line tool. It would only allow you to do conversions on files, or via stdout/stdin; I imagine many users would want an interface more like:

convert :: Opts -> ByteString -> IO ByteString
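
For comparison, here is a minimal Python sketch of what callers do today to get roughly that shape without FFI: one subprocess per call, bytes in and bytes out. The function name `convert` and the `exe` parameter are hypothetical and only mirror the signature above; nothing here is an existing pandoc API.

```python
from __future__ import annotations

import subprocess


def convert(opts: list[str], data: bytes, exe: str = "pandoc") -> bytes:
    """Run `exe` with `opts`, feed `data` on stdin, and return stdout.

    Hypothetical helper: it imitates the proposed interface by spawning
    one subprocess per call, which is the overhead an exported library
    function would avoid.
    """
    result = subprocess.run(
        [exe, *opts],
        input=data,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

With pandoc on the PATH, something like `convert(["--from=markdown", "--to=html"], b"*hello*")` should return the rendered HTML as bytes.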

jgm commented 3 years ago

See https://cabal.readthedocs.io/en/3.4/cabal-package.html?highlight=foreign-library#pkg-field-foreign-library-options

It seems to say that a "standalone" library is only possible for Windows currently; on other platforms, you'd need a whole slew of shared libraries (libHSrts, libHSbase, and everything else pandoc depends on). This might make this approach less attractive than it seems at first.

ickc commented 3 years ago

What are the available solutions for cross-compiling to another language? (e.g. tweag/asterius: A Haskell to WebAssembly compiler)

Some of those solutions (at least asterius) can be truly standalone, so maybe that would be useful?

jgm commented 3 years ago

Cross-compiling support isn't strong yet. A pandoc in WebAssembly would be useful, but that should be another issue. Anyway, I don't think the tooling is quite there yet.

ickc commented 3 years ago

Personally I think that even just a function call replicating what the CLI does would be better than calling the pandoc CLI directly (which is what most, if not all, current filter frameworks do). The first reason is that it just feels wrong to do it that way (not building an API interface directly); the second is performance. For example, if I currently have a loop that needs to call pandoc repeatedly, the bottleneck is the overhead of each subprocess. Loops like this are very common, e.g. when we have a table (which can easily have a ton of cell elements). So currently one has to resort to a hack like this for non-standalone conversions: essentially, wrap those n calls into divs and unpack them after calling pandoc.
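
To make the div trick concrete, here is a rough Python sketch at the JSON AST level. It is only an illustration of the idea, not necessarily how any existing framework implements it. The names `wrap_fragments`, `unwrap_fragments`, `batch`, and the `frag-N` identifiers are made up, `run` stands for whatever performs the single pandoc call, and the API version `(1, 22)` is a placeholder that would have to match the installed pandoc.

```python
from __future__ import annotations

from typing import Callable

Block = dict  # one pandoc JSON block, e.g. {"t": "Para", "c": [...]}


def wrap_fragments(fragments: list[list[Block]], api_version: list[int]) -> dict:
    """Put each fragment's blocks into its own Div inside a single document."""
    divs = [
        # A Div is encoded as [attr, blocks], with attr = [id, classes, key-values].
        {"t": "Div", "c": [[f"frag-{i}", [], []], blocks]}
        for i, blocks in enumerate(fragments)
    ]
    return {"pandoc-api-version": api_version, "meta": {}, "blocks": divs}


def unwrap_fragments(doc: dict) -> list[list[Block]]:
    """Recover the per-fragment block lists from the top-level Divs."""
    return [block["c"][1] for block in doc["blocks"] if block["t"] == "Div"]


def batch(fragments, run: Callable[[dict], dict], api_version=(1, 22)):
    """Process all fragments with ONE call to `run` instead of one call each."""
    return unwrap_fragments(run(wrap_fragments(fragments, list(api_version))))
```

The point is that the subprocess cost is paid once per table rather than once per cell; the price is the extra wrapping and unwrapping code.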

One argument for just wrapping the CLI as a function call (the convertWithOpts approach) is that it would probably be very easy for existing solutions to port over, as they would have essentially the same interface, even if we need to use stdin/stdout (which probably won't be much of a bottleneck?).

In a certain sense it would be great if we could obtain the AST directly (i.e. all the elements as proper types rather than a JSON representation). E.g. in panflute we need to reconstruct an AST representation from the JSON representation, which is quite error-prone and unintuitive (but that's hidden away from panflute users). So a typical life cycle would be "pandoc AST -> JSON AST -> panflute AST -> JSON AST -> pandoc AST".
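
As a rough illustration of the middle of that life cycle, here is a small Python sketch that works on the JSON AST as plain dicts and lists, without building typed objects the way panflute does. The helper names `walk`, `upper_str`, and `run_filter` are hypothetical; the only assumption about pandoc is the usual JSON encoding of elements as objects with a "t" tag and an optional "c" payload.

```python
from __future__ import annotations

import json


def walk(node, action):
    """Apply `action` to every tagged element ({"t": ...}), bottom-up."""
    if isinstance(node, list):
        return [walk(item, action) for item in node]
    if isinstance(node, dict):
        rebuilt = {key: walk(value, action) for key, value in node.items()}
        # Only tagged elements are handed to the action; the top-level
        # document object and the metadata map are left as they are.
        return action(rebuilt) if "t" in rebuilt else rebuilt
    return node  # strings, numbers, booleans, None


def upper_str(element: dict) -> dict:
    """Example action: upper-case the text of every Str inline."""
    if element["t"] == "Str":
        return {"t": "Str", "c": element["c"].upper()}
    return element


def run_filter(json_text: str) -> str:
    """JSON AST in, JSON AST out: the part of the life cycle a filter sees."""
    return json.dumps(walk(json.loads(json_text), upper_str))
```

Everything outside `run_filter` (parsing into the real pandoc AST and serializing it back) still happens inside pandoc, which is why direct access to the AST would remove two of the four conversion steps.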

But on the other hand, I'm not sure that would be a "stable" approach for writing filters, as the JSON representation is relatively stable and breaking changes rarely occur.

Another point is that it would be nice to provide a "parallel" version of that function call, like converts(texts: List[str], ...) -> List[str] as a variant of convert(text: str, ...) -> str, and let pandoc/Haskell handle the parallelism, because people calling pandoc multiple times (say, from a static site generator) is quite a likely scenario.
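
Until something like that exists on the pandoc side, a caller-side approximation could look like the sketch below. It is only a stand-in: `convert` is passed in as a parameter (any callable from text to text, e.g. a subprocess wrapper or a future FFI binding), and the parallelism comes from a Python thread pool rather than from Haskell.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def converts(
    texts: list[str],
    convert: Callable[[str], str],
    max_workers: int | None = None,
) -> list[str]:
    """Apply `convert` to every text concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of `texts`,
        # regardless of which call finishes first.
        return list(pool.map(convert, texts))
```

Threads only help here if `convert` spends its time outside the Python interpreter (waiting on a subprocess, or in a foreign call that releases the GIL); a native `converts` in pandoc would not have that limitation.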

ickc commented 3 years ago

Sorry, one more comment: the dynamic library is probably not that much less attractive. On the one hand pandoc is hard to compile, but if we (3rd parties) can provide it through some package managers then it would not be a problem for end users. The one I'm interested in is conda (for Windows, Linux, macOS and perhaps arch variants), and I think brew, apt, and pacman are probably popular enough among pandoc users to have someone on board.