Feature request: make cmark functions callable

dmurdoch commented 9 months ago

I'd like to insert a filter into the parsing process: parse the markdown input, execute my filter, render to output format. I think this would be relatively straightforward if I was writing the filter in C.

Doing this would require that the cmark header files be made available, and entry points be registered using R_RegisterCCallable, and probably some more things I don't know about.

jeroen commented 9 months ago

About where in the commonmark C code would the filter have to be invoked?

dmurdoch commented 9 months ago

If it was being called from package code, it would go here: https://github.com/dmurdoch/commonmark/blob/58d435515ea2dc1da08abe186e27b4d461d639f6/src/wrapper.c#L88, just after parsing, before rendering. When I wrote the request, I was thinking my code would duplicate most of R_render_markdown, but it would be even better if that function had another argument to provide a filter, and called it if one was provided.

I don't think an R function would work here; any filter would need access to functions like cmark_iter_new, cmark_iter_get_node, etc.

dmurdoch commented 9 months ago

I've just committed a first version of a small demo to my fork of commonmark on branch filter: install_github("dmurdoch/commonmark@filter"). This doesn't register any of the cmark functions so it wouldn't be usable from another package, but this code uses it with code added to the package on that branch:

library(commonmark)
md <- readLines("https://raw.githubusercontent.com/yihui/knitr/master/NEWS.md")[1:10]

tex <- markdown_latex(md, filter=c("commonmark", "latex_sourcepos"))

cat(tex)

Currently the filter just prints the TEXT nodes, but I'll be adding in sourcepos info.

dmurdoch commented 9 months ago

I've updated the filter to insert macros indicating the source location. The script from the last comment now prints something like this:

\section{\sourcepos{1:3}CHANGES IN knitr VERSION 1.46}

\subsection{\sourcepos{3:4}NEW FEATURES}

\begin{itemize}
\item \sourcepos{5:3}Added a new chunk option \texttt{tab.cap} 
...

The feature request is that commonmark should allow me to put something like this in my own package.

jeroen commented 9 months ago

If it is really just about supporting sourcepos for latex, maybe your solution is a bit overly complex.

I tried something simpler based on onjecting latex comments after each newline: https://github.com/r-lib/commonmark/commit/2f032ad5454063341228eec8b2477dce09abdba4

Can you test if this is something that could be workable?

install_github("jeroen/commonmark@latex_sourcepos")

library(commonmark)
md <- readLines("https://raw.githubusercontent.com/yihui/knitr/master/NEWS.md")[1:10]
markdown_latex(md, sourcepos = TRUE)

dmurdoch commented 9 months ago

I'll give it a try. At first glance, it looks as though that would solve my immediate problem.

An advantage of the approach I proposed is that it could be used for other problems, similar to the way filters are used in Pandoc to allow simple Commonmark extensions to be implemented. For example, when I wrote RmdConcord the hope was that it would replace pdf_document, which uses Pandoc markdown. I didn't try to implement all of the extensions in Pandoc markdown, but I did add one that allows \pagebreak or \newpage to be interpreted as page breaks. This page: https://pandoc.org/lua-filters.html gives some other examples.

jeroen commented 9 months ago

I see that it is powerful but I'm a little worried about the can of worms that we open if we want to support callable C filters. Given that this is a widely used package and I have less maintenance time than I used to, I would prefer to keep things simple is possible.

dmurdoch commented 9 months ago

Sure, that makes sense. I haven't had a chance to check your earlier sourcepos patch yet, but I should get to that today.

dmurdoch commented 9 months ago

That patch works very well for me. I was worried it could affect the LaTeX (adding the space, commenting out the newline) but it is very easy for me to remove the comments after processing them, so LaTeX will be fine.

jeroen commented 9 months ago

OK great. If you want we can still change the format to have more spaces or different braces, I just through a comment would be the easiest way to add metadata without affecting the content.

Also does the sourcepos comment appear everywhere in the document where you need it? I think it is inserted before every blank line now, but we could try to inject it in more places if needed.

dmurdoch commented 9 months ago

The format is fine. For my purposes (forward and reverse linking from an Rmd file to a .pdf file) all I'm using is the starting line number. If it's off by a bit it doesn't really matter.

Adding it at the end of every line would be helpful if people write really long paragraphs, or cases where knitr has inserted or removed a lot of lines.

As I said, my code removes the comment before sending the result to LaTeX, so it shouldn't cause parsing issues.

r-lib / commonmark

Feature request: make cmark functions callable #28