Support for sourcepos for latex output

r-lib / commonmark

High Performance CommonMark and Github Markdown Rendering in R

https://docs.ropensci.org/commonmark/

Support for sourcepos for latex output #29

Open jeroen opened 8 months ago

jeroen commented 8 months ago

Requested by @dmurdoch. Fixes #28

Once we have settled on the format, we should try to upstream this to cmark.

dmurdoch commented 8 months ago

I suspect they won't like it, because there are bound to be cases where the comments change the meaning of the LaTeX source, e.g. if one ends up in some verbatim environment.

jeroen commented 8 months ago

Can you try it again? I have now tried to insert the comment at each line break, except within verbatim environments.
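For readers following along, the rule described here (annotate each output line, but leave verbatim content untouched) can be sketched in a few lines of Python. This is only an illustration, not the C code in the pull request: the `annotate` function, the `(text, pos)` input shape, and the choice to append the comment at the end of the line are all assumptions. Only the `%sourcepos(l:c-l:c)` comment shape is taken from the thread.

```python
from typing import List, Tuple

# Hypothetical position type: (start_line, start_col, end_line, end_col)
Pos = Tuple[int, int, int, int]


def format_sourcepos(pos: Pos) -> str:
    """Format a position as a LaTeX comment, e.g. %sourcepos(1:1-1:5)."""
    return "%sourcepos({}:{}-{}:{})".format(*pos)


def annotate(lines: List[Tuple[str, Pos]]) -> List[str]:
    """Append a sourcepos comment to each line outside verbatim environments."""
    out = []
    in_verbatim = False
    for text, pos in lines:
        if text.startswith(r"\begin{verbatim}"):
            # Anything after \begin{verbatim} is verbatim content: no comment.
            in_verbatim = True
            out.append(text)
            continue
        if in_verbatim:
            # A '%' here would be printed literally, so the line is left alone.
            if text.startswith(r"\end{verbatim}"):
                in_verbatim = False
            out.append(text)
            continue
        out.append(f"{text} {format_sourcepos(pos)}")
    return out


rendered = annotate([
    ("Hello", (1, 1, 1, 5)),
    (r"\begin{verbatim}", (3, 1, 5, 3)),
    ("x % y", (4, 1, 4, 5)),
    (r"\end{verbatim}", (3, 1, 5, 3)),
    ("Bye", (7, 1, 7, 3)),
])
```

In this sketch the three lines belonging to the verbatim block come back unchanged, while `Hello` and `Bye` each gain a trailing comment.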

dmurdoch commented 8 months ago

I see a couple of spots where it emits %sourcepos(0:0-0:0). I'm not sure what they have in common, but it might be worth skipping the write if the sourcepos is empty. Here's the input I used:


$equation$

$$ equation $$

This is a sample vignette that contains an error which will be detected
by HTML Tidy.

The error is the use of a non-existent tag, "foobar":

<foobar>

Run `example(processConcordance)` to see before and after reports.

This is *italic*, **bold**, _italic_, __bold__ .

# Header 1

## Header 2

### Header 3

* Item 1
* Item 2
    + Item 2a
    + Item 2b

1. Item 1
2. Item 2
3. Item 3
    + Item 3a
    + Item 3b    

Roses are red,   
Violets are blue.

http://example.com

[linked phrase](http://r-project.org)

![alt text](mesh.png)

jeroen commented 8 months ago

Thanks, I updated it to remove those.
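The fix discussed above amounts to a guard before the comment is written. A minimal Python sketch of that guard follows; the function name `sourcepos_comment` and the use of `None` for "emit nothing" are assumptions made for illustration, not the package's actual implementation.

```python
from typing import Optional


def sourcepos_comment(start_line: int, start_col: int,
                      end_line: int, end_col: int) -> Optional[str]:
    """Return a %sourcepos(...) comment, or None when the position is empty."""
    if (start_line, start_col, end_line, end_col) == (0, 0, 0, 0):
        # An all-zero position carries no information, so skip the write.
        return None
    return f"%sourcepos({start_line}:{start_col}-{end_line}:{end_col})"


empty = sourcepos_comment(0, 0, 0, 0)
real = sourcepos_comment(40, 1, 41, 13)
```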

dmurdoch commented 8 months ago

This is working nicely now. I'd be happy with this change.

One other thought about suggesting it for cmark: users of cmark don't need this, since it's not that hard to insert new code between parsing and rendering if you are working in C.

A way to do that in your package without running external C code would be to offer the R interface in two steps: one to parse, one to render to some format. The return from the parse step could be an R list object that contains all the information from the parse tree, and the rendering step could use that as input and rebuild a new parse tree from it. (Pandoc used to do this by generating JSON and then reinterpreting it, but I don't think there's a need for that here.) If you're interested in pursuing this I'd be happy to try to write it, or to test it if you write it.
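To make the two-step idea concrete, here is a rough Python sketch in which the "parse" result is a plain nested dictionary and the "render" step walks it. Everything in it is hypothetical: the node layout (`type`, `children`, `literal`), the `render_latex` function, and the tiny set of node types are invented for illustration and do not correspond to any existing commonmark or cmark interface.

```python
def render_latex(node: dict) -> str:
    """Render a hypothetical plain-data parse tree to LaTeX."""
    kind = node["type"]
    children = "".join(render_latex(c) for c in node.get("children", []))
    if kind == "document":
        return children
    if kind == "paragraph":
        return children + "\n\n"
    if kind == "emph":
        return r"\emph{" + children + "}"
    if kind == "text":
        return node["literal"]
    raise ValueError(f"unsupported node type: {kind}")


# Step 1 would produce a tree like this one (hand-built here for illustration).
tree = {
    "type": "document",
    "children": [{
        "type": "paragraph",
        "children": [
            {"type": "text", "literal": "This is "},
            {"type": "emph",
             "children": [{"type": "text", "literal": "italic"}]},
            {"type": "text", "literal": "."},
        ],
    }],
}

# Step 2: user code could inspect or modify `tree` here, then render it.
latex = render_latex(tree)
```

The sketch also hints at the fragility raised in the reply below: a tree with a missing `literal` key or an unknown `type` fails only at render time.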

jeroen commented 8 months ago

I think it would be very difficult to convert the entire cmark parse-tree structure into an R object that users can manipulate, and even more difficult to convert an arbitrarily modified R list back into a parse tree for cmark. Honestly, I don't think there is a reliable way to do this; there would be many ways for the user to accidentally corrupt the parse tree.

If you want to manipulate the parsed document, the best way is to work with the XML representation of the parse tree via tinkr, rather than trying to convert everything to R lists and back.
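As a rough illustration of what working with the XML representation looks like, the Python sketch below reads `sourcepos` attributes from a small XML fragment using only the standard library. It does not use tinkr (which is an R package), and the `SAMPLE_XML` string is hand-written in the style of cmark's XML output rather than generated by it, so treat the exact attributes as an assumption.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the style of cmark's XML output (an assumption,
# not actual renderer output).
SAMPLE_XML = """<document sourcepos="1:1-3:10" xmlns="http://commonmark.org/xml/1.0">
  <heading sourcepos="1:1-1:10" level="1">
    <text sourcepos="1:3-1:10" xml:space="preserve">Header 1</text>
  </heading>
  <paragraph sourcepos="3:1-3:10">
    <text sourcepos="3:1-3:10" xml:space="preserve">Some text.</text>
  </paragraph>
</document>"""


def node_positions(xml_text: str) -> list:
    """Return (node type, sourcepos) pairs in document order."""
    root = ET.fromstring(xml_text)
    result = []
    for node in root.iter():
        pos = node.get("sourcepos")
        if pos is None:
            continue
        # ElementTree tags look like '{namespace}name'; keep only the name.
        tag = node.tag.split("}", 1)[-1]
        result.append((tag, pos))
    return result


positions = node_positions(SAMPLE_XML)
```

Because the tree stays in XML, edits can be made with ordinary XML tooling and the result remains a well-formed document, which sidesteps the round-trip problem described above.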

dmurdoch commented 8 months ago

Thanks, I was unaware of tinkr. I think it could have solved all of the issues I had, though I imagine it would introduce a lot of dependencies, so your latex changes are still preferable.