rundel / md4r

An R wrapper for the md4c markdown parsing library
https://rundel.github.io/md4r/

Roundtrip #3

Closed krlmlr closed 6 months ago

krlmlr commented 6 months ago

Is it reasonable to expect a "good enough" roundtrip from parse_md() %>% to_md()? I'm looking for a pandoc alternative for fledge. Currently:

library(md4r)

md <- "
# Header 1

## Header 2

- Bullet 1.
- Bullet 2.
"

x <- parse_md(md)
writeLines(to_md(x))
#> # Header 1
#> ## Header 2
#>  - Bullet 1.
#>  - Bullet 2.

Created on 2024-02-18 with reprex v2.1.0

Expected:

writeLines(to_md(x))
#>
#> # Header 1
#>
#> ## Header 2
#>
#> - Bullet 1.
#> - Bullet 2.

From a coarse look, it seems that neither the number of spaces before a bullet marker nor the number of blank lines around a header is captured. Short of patching the C library, it would already help if there were always one blank line after a header and if no extra space were added in front of the bullets. The space before a bullet seems intentional; I wonder why.

https://github.com/rundel/md4r/blob/c84bd1f9b1d96051c713baf04ac76438241dce27/R/to_md.R#L237
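To make the two requested tweaks concrete, here is a minimal sketch of a post-processing step over already rendered lines. It is not part of md4r: `normalize_md_spacing()` is a hypothetical helper, and it assumes the rendered markdown is available as a character vector with one element per line. It is deliberately naive and would also touch lines inside fenced code blocks or nested lists.

```r
# Hypothetical helper, not part of md4r: adjust spacing in rendered markdown.
# Assumes `lines` is a character vector with one element per output line.
normalize_md_spacing <- function(lines) {
  # Drop the single leading space in front of top-level bullet markers.
  lines <- sub("^ ([-*+] )", "\\1", lines)

  out <- character()
  for (i in seq_along(lines)) {
    out <- c(out, lines[[i]])
    is_header <- grepl("^#{1,6} ", lines[[i]])
    next_is_blank <- i < length(lines) && lines[[i + 1]] == ""
    # Insert one blank line after an ATX header unless one already follows.
    if (is_header && !next_is_blank) {
      out <- c(out, "")
    }
  }
  out
}

# The output reported above, written out by hand as a character vector.
rendered <- c("# Header 1", "## Header 2", " - Bullet 1.", " - Bullet 2.")
writeLines(normalize_md_spacing(rendered))
```

A fix inside the output routine itself would be preferable, since it knows which lines are real headers and list items; the sketch only shows the intended spacing rules.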

krlmlr commented 6 months ago

I see that tests start failing when I tweak the output spacing. I wonder if this is a limitation of the AST format, or if the output routine can be adapted to handle this.
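One way to see exactly where a roundtrip changes the text is to compare input and output line by line. The sketch below is an illustration only: `roundtrip_diff()` and `drop_blank()` are hypothetical names, and `drop_blank()` is a stand-in that merely mimics the reported loss of blank lines. With md4r installed, something like `function(md) to_md(parse_md(md))` could be passed instead, assuming `to_md()` returns one string per line.

```r
# Hypothetical helper: report the lines at which a roundtrip changes the text.
# `roundtrip` is any function mapping a markdown string to a vector of lines.
roundtrip_diff <- function(md, roundtrip) {
  before <- strsplit(md, "\n", fixed = TRUE)[[1]]
  after <- roundtrip(md)
  n <- max(length(before), length(after))
  # Pad the shorter side with NA so both sides line up.
  length(before) <- n
  length(after) <- n
  changed <- which(is.na(before) | is.na(after) | before != after)
  data.frame(line = changed, before = before[changed], after = after[changed])
}

# Stand-in roundtrip that drops blank lines, mimicking the behaviour reported.
drop_blank <- function(md) {
  lines <- strsplit(md, "\n", fixed = TRUE)[[1]]
  lines[lines != ""]
}

d <- roundtrip_diff("# Header 1\n\n## Header 2", drop_blank)
print(d)
```

An empty data frame means the roundtrip reproduced the input exactly; any rows point at the first places where spacing or content diverged.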

rundel commented 6 months ago

It's been a while since I looked at the C part of this project, but I believe you are correct that the AST just doesn't capture some of these details, as generating markdown equivalent to the input was never really an intended use case.

Having a blank line after a header definitely makes sense to me and should not be a big issue.

The bullets are much more likely to be an issue and were a lot more finicky, but I'm happy to play with it a bit and see what works.

rundel commented 6 months ago

Bullets were easier than headings, it turns out. This should be working as expected now.

library(md4r)

md <- "
# Header 1

## Header 2

- Bullet 1.
- Bullet 2.
"

x <- parse_md(md)
writeLines(to_md(x))
#> # Header 1
#> 
#> ## Header 2
#> 
#> - Bullet 1.
#> - Bullet 2.

Created on 2024-02-19 with reprex v2.1.0