Open dmurdoch opened 9 months ago
About where in the commonmark C code would the filter have to be invoked?
If it was being called from package code, it would go here: https://github.com/dmurdoch/commonmark/blob/58d435515ea2dc1da08abe186e27b4d461d639f6/src/wrapper.c#L88, just after parsing, before rendering. When I wrote the request, I was thinking my code would duplicate most of R_render_markdown
, but it would be even better if that function had another argument to provide a filter, and called it if one was provided.
I don't think an R function would work here; any filter would need access to functions like cmark_iter_new
, cmark_iter_get_node
, etc.
I've just committed a first version of a small demo to my fork of commonmark
on branch filter
: install_github("dmurdoch/commonmark@filter")
. This doesn't register any of the cmark functions so it wouldn't be usable from another package, but this code uses it with code added to the package on that branch:
library(commonmark)
md <- readLines("https://raw.githubusercontent.com/yihui/knitr/master/NEWS.md")[1:10]
tex <- markdown_latex(md, filter=c("commonmark", "latex_sourcepos"))
cat(tex)
Currently the filter just prints the TEXT nodes, but I'll be adding in sourcepos info.
I've updated the filter to insert macros indicating the source location. The script from the last comment now prints something like this:
\section{\sourcepos{1:3}CHANGES IN knitr VERSION 1.46}
\subsection{\sourcepos{3:4}NEW FEATURES}
\begin{itemize}
\item \sourcepos{5:3}Added a new chunk option \texttt{tab.cap}
...
The feature request is that commonmark should allow me to put something like this in my own package.
If it is really just about supporting sourcepos for latex, maybe your solution is a bit overly complex.
I tried something simpler based on onjecting latex comments after each newline: https://github.com/r-lib/commonmark/commit/2f032ad5454063341228eec8b2477dce09abdba4
Can you test if this is something that could be workable?
install_github("jeroen/commonmark@latex_sourcepos")
library(commonmark)
md <- readLines("https://raw.githubusercontent.com/yihui/knitr/master/NEWS.md")[1:10]
markdown_latex(md, sourcepos = TRUE)
I'll give it a try. At first glance, it looks as though that would solve my immediate problem.
An advantage of the approach I proposed is that it could be used for other problems, similar to the way filters are used in Pandoc to allow simple Commonmark extensions to be implemented. For example, when I wrote RmdConcord the hope was that it would replace pdf_document, which uses Pandoc markdown. I didn't try to implement all of the extensions in Pandoc markdown, but I did add one that allows \pagebreak
or \newpage
to be interpreted as page breaks. This page: https://pandoc.org/lua-filters.html gives some other examples.
I see that it is powerful but I'm a little worried about the can of worms that we open if we want to support callable C filters. Given that this is a widely used package and I have less maintenance time than I used to, I would prefer to keep things simple is possible.
Sure, that makes sense. I haven't had a chance to check your earlier sourcepos patch yet, but I should get to that today.
That patch works very well for me. I was worried it could affect the LaTeX (adding the space, commenting out the newline) but it is very easy for me to remove the comments after processing them, so LaTeX will be fine.
OK great. If you want we can still change the format to have more spaces or different braces, I just through a comment would be the easiest way to add metadata without affecting the content.
Also does the sourcepos comment appear everywhere in the document where you need it? I think it is inserted before every blank line now, but we could try to inject it in more places if needed.
The format is fine. For my purposes (forward and reverse linking from an Rmd file to a .pdf file) all I'm using is the starting line number. If it's off by a bit it doesn't really matter.
Adding it at the end of every line would be helpful if people write really long paragraphs, or cases where knitr has inserted or removed a lot of lines.
As I said, my code removes the comment before sending the result to LaTeX, so it shouldn't cause parsing issues.
I'd like to insert a filter into the parsing process: parse the markdown input, execute my filter, render to output format. I think this would be relatively straightforward if I was writing the filter in C.
Doing this would require that the cmark header files be made available, and entry points be registered using R_RegisterCCallable, and probably some more things I don't know about.