yihui / knitr

A general-purpose tool for dynamic report generation in R
https://yihui.org/knitr/
2.36k stars 873 forks source link

Add a handler to improve YAML error message #2294

Closed pedropark99 closed 8 months ago

pedropark99 commented 9 months ago

Hello! This PR seeks to fulfill the objective proposed at the issue #2226

This PR adds a tryCatch() block to the step of parsing the YAML with the chunk options, and uses the error handler to build a better error message, as proposed in the issue #2226 . The new error message is in the following format:

> knit("C:\\Users\\pedro\\Desktop\\test.Rmd")
Failed to parse YAML: 
Scanner error: mapping values are not allowed in this context at line 2, column 12

label: test multiqueries -5
 connection: db
             ^~~~~~

This error message was produced while knitting the following Rmd:

---
title: "Example"
format: pdf
engine: knitr
editor: visual
---

```{r}
# install.packages(c("DBI", "RSQLite"))
db = DBI::dbConnect(RSQLite::SQLite(), ":memory:")
#| label: test multiqueries -5
#|  connection: db

select 1


However, this current format of the error message is still not complete. In other words, it only partially meets the full description given by @yihui in the mentioned issue. According to the full description given by @yihui , there are two groups of indeces (or indexes) that would help to improve the error message, which are:

- The line and column indexes that point to the exact position inside the YAML where the error occured;
- The line index (or label) of the code chunk that contains the YAML where the error occured;

The current format of the error message that I showed above only contains the first group of indexes (the line and column indexes inside the YAML). So I can still improve this version by including the line index (or label) of the code chunk. According to @yihui we could get the code chunk index by using a `groups` variable. I did quickly looked for a similar variable at the parent environment from the `tryCatch()` block, with `ls(envir = parent.env(environment()))`, but I did not found this variable, or, a similar global variable from which I could extract the code chunk index.

But I am more than happy to implement this second index if you can give more instructions on how to get this code chunk index/label from inside the `tryCatch()` block 😉.

Furthermore, since the `knitr` codebase tries to stick with base R as much as possible I ended up using `base::regexpr()` to extract/parse the column and line index from the original error message. But I fell that the parsing is still a bit clunk, so, I think that this part of the code can be improved.

Another thing that we can improve/change is how to print/return the improved error message. Since is an error message, I did try to print the message with a `stop()` call, however, the message was not being printed correctly. That is why, I ended up chosing to just returning the error message at the end of the error handler.
pedropark99 commented 8 months ago

Hey @yihui ! First of all, thank you for the review!

OBS: There is one task in the CI pipeline that is failling because of an outdated R version which is not supported by a dependency. Because I understand that the fix to this error is out of this PR scope, I will leave this error to you 😉.

I made the changes you requested in your review.

After that, I did a quick "debug walking" through the parser code, to get a better understanding on how it works. Then, I followed your instructions on how to get the chunk lines. With my latest changes, the error message is now on the following format:

> knit("C:\\Users\\pedro\\Desktop\\test.Rmd")
In partition_chunk(engine, code, block_index) :
  Invalid YAML option format in chunk: 
Failed to parse YAML inside code chunk at lines 15-19. Scanner error: mapping values are not allowed in this context at line 2, column 7

connection: db
 label: test multiqueries -5
       ^~~~~~

I think that maybe the above message can still be improved if we change the last bit of the message like this:

Failed to parse YAML inside code chunk at lines 15-19. Mapping values are not allowed in a code chunk context. Error found at line 2, column 7 inside the code chunk.

connection: db
 label: test multiqueries -5
       ^~~~~~

What is your opinion on it? If you think that this is a good addition, I can make the changes for us.

Now, in addition to parser.R, since I basically changed the function signatures for partition_chunk() and parse_block(), I also had to make changes in two more places:

The changes above led me to also recompile all functions documentation with devtools::document().