rundel / parsermd

https://rundel.github.io/parsermd/

Fails to parse raw html blocks #13

Closed gadenbuie closed 3 years ago

gadenbuie commented 3 years ago

pandoc allows for raw HTML or TeX blocks, but they break the syntax parser. These blocks aren't used often in R Markdown, but the new RStudio visual editor will add them if it identifies raw HTML, so they will likely become more common in the wild.

parsermd::parse_rmd("```{=html}\n<p>boom</p>\n```")
#> Error: Failed to parse line 1, expected chunk engine
#> ```{=html}
#> ~~~~^~~~~~

Created on 2021-02-24 by the reprex package (v0.3.0)

Session info

``` r
sessionInfo()
#> R version 3.6.3 Patched (2020-04-28 r79534)
#> Platform: x86_64-apple-darwin15.6.0 (64-bit)
#> Running under: macOS 10.16
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices datasets utils methods base
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.5 crayon_1.3.4 digest_0.6.25
#> [4] backports_1.1.9 lifecycle_0.2.0 magrittr_1.5
#> [7] evaluate_0.14 highr_0.8 pillar_1.4.4
#> [10] rlang_0.4.10 stringi_1.4.6 renv_0.12.5
#> [13] checkmate_2.0.0 vctrs_0.3.1 ellipsis_0.3.1
#> [16] rmarkdown_2.5 tools_3.6.3 stringr_1.4.0
#> [19] xfun_0.16 yaml_2.2.1 compiler_3.6.3
#> [22] pkgconfig_2.0.3 parsermd_0.0.1.9000 htmltools_0.5.1.9000
#> [25] knitr_1.29.4 tibble_3.0.1
```

rundel commented 3 years ago

I don't think I had seen that before; my assumption was that all chunk engines needed to be alphanumeric. It should be an easy fix in the parser.

rundel commented 3 years ago

It seems like these ```{=<doc>}``` style code chunks are common with the visual editor. Is there some documentation on them somewhere?

gadenbuie commented 3 years ago

If you mean the pandoc docs for raw attribute blocks, those docs are here: https://pandoc.org/MANUAL.html#extension-raw_attribute
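
For readers who have not seen the syntax, here is a minimal base R sketch of recognizing the opening fence of such a block. The helper `raw_attr_format()` and its regular expression are illustrative assumptions, not parsermd's actual grammar, and the pattern only covers backtick fences with a simple format name.

```r
# Hypothetical helper (not part of parsermd): returns the raw format name,
# e.g. "html", if `line` opens a pandoc raw attribute block, otherwise NA.
raw_attr_format <- function(line) {
  # three or more backticks, then {=format}, with optional surrounding whitespace
  pattern <- "^\\s*`{3,}\\s*\\{=([A-Za-z0-9_+-]+)\\}\\s*$"
  m <- regmatches(line, regexec(pattern, line, perl = TRUE))[[1]]
  if (length(m) == 0) NA_character_ else m[2]
}

raw_attr_format("```{=html}")   # raw attribute fence
raw_attr_format("```{r setup}") # ordinary R chunk header
```

The first call should pick up the `html` format, while the ordinary chunk header is not treated as a raw block.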

rundel commented 3 years ago

Ah ok, I didn't realize that this was a pandoc-specific feature - this might be worth having as its own node within the AST.

rundel commented 3 years ago

The raw_attr branch has a quick fix for this - these types of chunks will now parse, but they get treated as regular Rmd code chunks, which feels a bit awkward but not game breaking. If you have a use case one way or the other for these types of chunks, that would be useful to know about when making a final decision.

gadenbuie commented 3 years ago

This looks good! Raw attribute blocks can't have names or other attributes. I think it'd be appropriate to keep engine and code (or maybe renamed language and content, but I can see advantages to using similar names to the R chunks). Would it also make sense to give these a new type, like rmd_raw_chunk or rmd_raw_attr?
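
To make the suggestion concrete, a node along these lines could be sketched as a plain S3 object. The constructor name `new_raw_chunk()` is hypothetical, and the field names `engine` and `code` simply follow the naming floated above; this is not how parsermd necessarily represents these blocks.

```r
# Hypothetical constructor, for illustration only: a raw attribute block
# carries just a format ("engine") and its verbatim contents ("code"),
# with no chunk name or options.
new_raw_chunk <- function(engine, code) {
  stopifnot(is.character(engine), length(engine) == 1, is.character(code))
  structure(
    list(engine = engine, code = code),
    class = "rmd_raw_chunk"
  )
}

chunk <- new_raw_chunk("html", "<p>boom</p>")
inherits(chunk, "rmd_raw_chunk")
```

A distinct class like this would let downstream code dispatch on raw blocks separately from executable chunks.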

rundel commented 3 years ago

Yeah, I think having something like rmd_raw_chunk would make the most sense - I think I have a good way of doing this without major changes to the parser. I'll need to add a bit of other stuff to handle the new class.

rundel commented 3 years ago

I think this should be working now - if you want to check out an Rmd and see that everything looks good from your end I can merge it into master.

gadenbuie commented 3 years ago

This is great, I love it! Thank you!