Open Sarke opened 4 years ago
I'm definitely open to this, if someone wants to try, though I know nothing about it. Some thought would have to go into the interface.
Slightly related: I keep thinking about whether it would make sense to rebuild some part of pandoc's Lua filter with C: AST elements would be marshaled to C structs, and accessors would be written in C.
My gut feeling is that this would increase code size of our current approach by a factor of 2, but coupling should stay approximately the same. The advantage would be that we'd get a C library to modify the pandoc AST; possibly some better performance, too.
All my previous attempts to reduce the amount of Lua code failed due to performance; callbacks from Lua/C into Haskell are comparatively slow, despite my optimization efforts. Rewriting in C wouldn't help with code reduction, but it might help others to bind to pandoc.
Hello, I'm here to push for this issue :) (someone in the Google group recommended that I do so). I currently work on a language that is not very well known (pharo.org), whose documentation system uses Markdown. I am developing a tool that would let us generate documentation in different formats, and I want to use pandoc for it (why do worse what you already do the right way? ;) ).
Using the command line for this is a bit annoying. Yes, I can use a system call, etc., but being able to use pandoc through FFI would ease development a lot and encourage other Pharo users to use it ;)
Does anybody know how stable the ABI would be? Would it change with every major GHC release?
I keep thinking about whether it would make sense to rebuild some part of pandoc's Lua filter with C
What about Rust? 😄
The simplest possible interface would just export convertWithOpts, using a JSON serialization of the Opts structure representing command-line options and arguments. This would not be hard to do at all, but it would offer minimal advantages over shelling out to the command-line tool. It would only allow you to do conversions on files, or via stdout/stdin; I imagine many users would want an interface more like:
convert :: Opts -> ByteString -> IO ByteString
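To make the shape of such an interface concrete from the caller's side, here is a rough Python sketch using ctypes. Everything about the library is an assumption: the exported symbol name pandoc_convert, its two C-string arguments (JSON-encoded options, then input text), and its C-string return value are hypothetical and not an existing pandoc API. The option keys are only illustrative.

```python
import ctypes
import json


def bind_convert(lib):
    """Wrap a HYPOTHETICAL C export: char *pandoc_convert(char *opts_json, char *input).

    `lib` would normally be a ctypes.CDLL handle. The symbol name and its
    signature are assumptions made for illustration only.
    """
    fn = lib.pandoc_convert
    fn.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
    fn.restype = ctypes.c_char_p  # ctypes copies the C string into Python bytes

    def convert(opts, text):
        # Options travel as JSON, mirroring the "JSON serialization of Opts" idea.
        opts_json = json.dumps(opts).encode("utf-8")
        out = fn(opts_json, text.encode("utf-8"))
        # A real binding would also need a way to free the returned buffer,
        # and the Haskell runtime would have to be initialized beforehand.
        return out.decode("utf-8")

    return convert


# Usage, assuming such a shared library existed (path is hypothetical):
# convert = bind_convert(ctypes.CDLL("libpandoc.so"))
# html = convert({"from": "markdown", "to": "html"}, "*hello*")
```

The point of the sketch is only that a string-in, string-out export is trivial to bind from other languages; who owns the returned memory is the part a real design would have to settle.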
It seems to say that a "standalone" library is only possible for Windows currently; on other platforms, you'd need a whole slew of shared libraries (libHSrts, libHSbase, and everything else pandoc depends on). This might make this approach less attractive than it seems at first.
What are the available solutions for cross-compiling to another language? (e.g. tweag/asterius: A Haskell to WebAssembly compiler)
Some of those solutions (at least asterius) can produce a truly standalone build; maybe that would be useful?
Cross-compiling support isn't strong yet. A pandoc in webassembly would be useful, but that should be another issue. Anyway, I don't think the tooling is quite there yet.
Personally, I think even a single function call replicating what the CLI does would be better than calling the pandoc CLI directly (which is what most, if not all, current filter frameworks do). The first reason is that it just feels wrong to do it that way (not binding to an API directly), and the second is performance. For example, if I have a loop that needs to call pandoc repeatedly, the bottleneck is the overhead of each subprocess. Loops like this are very common, e.g. when we have a table (which can easily have a ton of cell elements). So currently one has to resort to a hack like this for non-standalone conversions: essentially, wrap those n inputs in divs, call pandoc once, and unpack the result afterwards.
One argument for just wrapping the CLI as a function call (convertWithOpts) is that it would probably be very easy for existing solutions to port over, as they essentially have the same interface, even if we need to use stdin/stdout (which probably won't be much of a bottleneck?).
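For readers who haven't seen the div trick, a minimal Python sketch of the idea follows. It is not the code from any particular filter framework: the helper names and the cell-N identifier scheme are made up here, and it assumes the snippets themselves contain no ::: fence lines. The only pandoc facts it relies on are the fenced-div Markdown syntax and the JSON AST layout of a Div ([attr, blocks], where attr is [id, classes, key-values]).

```python
import json
import subprocess


def wrap_in_divs(texts):
    """Join Markdown snippets into one document, one fenced div per snippet."""
    # Assumes no snippet contains a ":::" fence line of its own.
    return "\n\n".join(
        f"::: {{#cell-{i}}}\n{text}\n:::" for i, text in enumerate(texts)
    )


def unpack_divs(doc, count):
    """Recover the block list of each snippet from the parsed JSON AST."""
    cells = [None] * count
    for block in doc["blocks"]:
        if block["t"] != "Div":
            continue
        (ident, _classes, _keyvals), blocks = block["c"]
        if ident.startswith("cell-"):
            cells[int(ident[len("cell-"):])] = blocks
    return cells


def run_pandoc(markdown):
    """One subprocess for the whole batch instead of one per snippet."""
    result = subprocess.run(
        ["pandoc", "-f", "markdown", "-t", "json"],
        input=markdown, capture_output=True, text=True, check=True,
    )
    return result.stdout


def batch_to_ast(texts, run=run_pandoc):
    """Parse many snippets with a single pandoc call."""
    doc = json.loads(run(wrap_in_divs(texts)))
    return unpack_divs(doc, len(texts))
```

With a table of a few hundred cells this replaces a few hundred process launches with one, which is exactly the overhead a direct function call would avoid without the wrapping step.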
In a certain sense it would be great if we could obtain the AST directly (i.e. all the elements as proper types rather than a JSON representation). E.g. in panflute we need to reconstruct an AST representation from the JSON representation, which is quite error-prone and unintuitive (but that's hidden away from panflute users). So a typical life cycle would be "pandoc AST -> JSON AST -> panflute AST -> JSON AST -> pandoc AST".
But on the other hand, I'm not sure that would be a "stable" approach for writing filters, as the JSON representation is relatively stable and breaking changes rarely occur.
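To illustrate what working on the JSON AST looks like without any intermediate object model, here is a small Python sketch of a generic walker. It is not panflute's implementation, just a simplified stand-in; it relies only on the JSON convention that each element is an object with a "t" tag and, where applicable, a "c" payload.

```python
def walk(node, action):
    """Return a copy of a JSON AST with `action` applied to every tagged element.

    `action(tag, content)` returns a replacement element, or None to keep
    the element and continue into its children.
    """
    if isinstance(node, list):
        return [walk(item, action) for item in node]
    if isinstance(node, dict):
        if "t" in node:
            replacement = action(node["t"], node.get("c"))
            if replacement is not None:
                return replacement
        return {key: walk(value, action) for key, value in node.items()}
    return node  # strings, numbers, booleans, None


def shout(tag, content):
    """Example action: upper-case every Str element."""
    if tag == "Str":
        return {"t": "Str", "c": content.upper()}
    return None


doc = {
    "pandoc-api-version": [1, 23],
    "meta": {},
    "blocks": [
        {"t": "Para", "c": [
            {"t": "Str", "c": "hello"},
            {"t": "Space"},
            {"t": "Str", "c": "pandoc"},
        ]},
    ],
}
result = walk(doc, shout)
```

Every filter framework ends up writing some version of this plus the decode/encode steps around it, which is the round trip that direct AST access would remove.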
Another point is that it would be nice to provide a "parallel" version of that function call, like converts(texts: List[str], ...) -> List[str] as a variant of convert(text: str, ...) -> str, and let pandoc/Haskell handle the parallelism, because calling pandoc multiple times is quite a likely scenario (say, in a static site generator).
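Until something like that exists on the pandoc side, the same signature can be emulated on the client. The Python sketch below is that emulation and nothing more: it runs one pandoc subprocess per text from a thread pool, so it does not have pandoc/Haskell handle the parallelism as proposed. The parameter names (source, target, convert_one, max_workers) are choices made for this example.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def convert(text: str, source: str = "markdown", target: str = "html") -> str:
    """Convert one text by shelling out to the pandoc CLI."""
    result = subprocess.run(
        ["pandoc", "-f", source, "-t", target],
        input=text, capture_output=True, text=True, check=True,
    )
    return result.stdout


def converts(
    texts: List[str],
    convert_one: Callable[[str], str] = convert,
    max_workers: int = 4,
) -> List[str]:
    """Convert many texts concurrently; results keep the input order."""
    # Threads suffice here because each worker mostly waits on a subprocess.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(convert_one, texts))
```

This still pays the per-process startup cost for every text; it only overlaps those costs, which is why a library-level batch call would be the better fix.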
Sorry, one more comment: the dynamic library is probably not that much less attractive. Pandoc is hard to compile, but if we (third parties) can provide it through package managers, that would not be a problem for end users. The one I'm interested in is conda (for Windows, Linux, macOS and perhaps arch variants), and I think brew, apt, and pacman would probably be popular enough among pandoc users to have someone on board.
Since Cabal has supported this for a couple of years now, would it be possible to also build a shared library and increase the usefulness of pandoc?
https://cabal.readthedocs.io/en/latest/cabal-package.html#foreign-libraries
I know there is https://github.com/ShabbyX/libpandoc but it doesn't seem active, and it was made before Cabal introduced its foreign-library feature. This would be great because many languages, not just C-based ones, would be able to link it directly (Go, Rust, PHP, Ruby, Python, etc).