Open emilyriederer opened 2 years ago
Thanks for discovering this bug! I will get to fixing it soon.
Okay! Made a fix which I think is not too bad (skips a step when tidyselect expressions don't match any columns). Documentation currently doesn't state this is what happens (it definitely something to follow up on, if this is even the right approach). There may be unintended side effects but basic testing was added.
In fact, I'm going to re-open this so (1) you could weigh in on the changes and (2) there's more opportunities to find bugs, fix 'em, and add more tests.
Wow - thanks so much, @rich-iannone ! That was so fast.
Overall, this definitely works great for my purposes. All four cases I list above work and produce reasonable interrogation output and YAML files. 🎉
The only thing I notice is cosmetic. For res1
and res2
where there is no match, the check shows up as follows in the interrogation output:
That's completely reasonable and not an issue, but I can imagine it might be mildly confusing to a non-developer to see a check with no columns in the output. I wonder if it might be helpful either the have the tidyselect
conditions print where the column names should go and/or to suppress checks with no matching columns from the tabular results? 🤔 Again, that is a tiny aesthetic point and really not material either way.
Thanks again!
Emily, this is great feedback (and of the sort of I was looking for)! I definitely want the reporting to be broadly understandable so I think that I can at least add some text stating that no columns were found (prior to other, better refinements).
You mentioned suppressing checks with no matching columns... does that mean excluding the step entirely from the resulting data and hence also the report (deleting instead of skipping)? There could be an option for this behavior available for that (probably in interrogate()
?).
There are probably lots of little inconsistencies and possible design improvements for the reporting. Feel free to bring those to the fore by making one or several issues (I do want to improve this aspect of the package). Your feedback is super valuable and will be addressed!
I'm honestly torn if I like the idea of checking it completely or not (arguing against my previous statement). On further reflection I can imagine two cases where I would definitely want the check to show even if not used:
So perhaps simple context as you said is better without suppression!
Hi - just wanted to check in on this since I see it tagged for the next version milestone!
If I understand correctly, the initial issue about yaml writing seems to have been resolved. As for the remaining concern about the informativeness of the columns part of the report, the dev version's tidyselect
overhaul now gives us the following reports (with some columns omitted for space) for the res1
and res2
examples. Note that the reprex has been slightly edited to keep the code up to date and avoid deprecation warnings.
agent1 <-
create_agent(tbl = data.frame(IND_A = 1, AMT_B = 2)) %>%
col_vals_not_null(starts_with("ID"), step_id = 1) %>%
col_vals_not_null(starts_with("IND"), step_id = 2)
res1 <- interrogate(agent1)
res1
agent2 <-
create_agent(tbl = data.frame(IND_A = 1, AMT_B = 2)) %>%
col_vals_not_null(
starts_with("ID"), step_id = 1,
active = ~ . %>% has_columns(starts_with("ID"))
) %>%
col_vals_not_null(starts_with("IND"), step_id = 2)
res2 <- interrogate(agent2)
res2
To summarize the relevant changes in the current dev version:
1) When tidyselect
helpers are used to select columns but no columns are found, the agent report will show the original tidyselect expression (in this case, starts_with("ID")
, though it's unfortunately cut off - I may revisit the font size). (#499)
2) The has_columns()
helper now also supports tidyselect, which lets res2
express something like "validate that columns starting with ID do not have any missing values, if such columns exist". Even if this makes a step inactive, the attempted tidyselect expression is still shown in the report to provide some context for the skipping. (#500)
Please let me know how these changes feel for the examples! If there's any other concerns about the report, we can move it to a separate issue.
@yjunechoe I don't find this all that bad! At least it doesn't fail in dev tidyselect and I suppose hovering over the starts_with(...
text provides the rest of the string (though it's unformatted, it is at least informative). I think this is okay for the examples; the unfortunate aspect of that is that if the example output involves a screenshot of the table, those ones have to be regenerated.
BTW I've moved some of the items from the upcoming release milestone (mostly to do with overhaul work for scan_data()
into the release after that). We could even punt on what's in the 2 items left in the upcoming release (I'm kinda thinking yeah on that one).
Sounds good to me! I also don't have strong feelings on the remaining items (they seem isolated enough to get fixed in later patch versions). I only selfishly want to get a new release published but I don't mean to rush :)
Let me know if you need anything from my end!
I also want to get the release out! I’ll push the remaining items to the next milestone and get this going.
Prework
Description
The
yaml_write()
function does not appear to work whenactive
is provided an anonymous function the returnsFALSE
.I suspect #345 could be related since it covers another edge case where
write_yaml()
fails a receiving a function versus a value as input. However, I thought this seemed like a distinct enough problem to merit its own issue.The broader context for this problem is:
starts_with('N')
when data may or may not have any such columns)pointblank
0.6.0, I was able to do this so long as I explicitly provided thestep_id
pointblank
0.8.0 and beyond, I get errors unless I use theactive
argument (shown in first part of reprex)active
makes interrogation work but breakswrite_yaml()
if the function returnsFALSE
Reproducible example
Created on 2021-09-17 by the reprex package (v0.3.0)
Session info
``` r devtools::session_info() ``` #> Error in get(genname, envir = envir) : object 'testthat_print' not found #> - Session info --------------------------------------------------------------- #> setting value #> version R version 4.0.2 (2020-06-22) #> os Windows 10 x64 #> system x86_64, mingw32 #> ui RTerm #> language (EN) #> collate English_United States.1252 #> ctype English_United States.1252 #> tz America/Chicago #> date 2021-09-17 #> #> - Packages ------------------------------------------------------------------- #> package * version date lib #> assertthat 0.2.1 2019-03-21 [1] #> backports 1.1.7 2020-05-13 [1] #> blastula 0.3.2 2020-05-19 [1] #> callr 3.4.3 2020-03-28 [1] #> cli 2.5.0 2021-04-26 [1] #> crayon 1.3.4 2017-09-16 [1] #> DBI 1.1.0 2019-12-15 [1] #> desc 1.2.0 2018-05-01 [1] #> devtools 2.3.1 2020-07-21 [1] #> digest 0.6.27 2020-10-24 [1] #> dplyr 1.0.7 2021-06-18 [1] #> ellipsis 0.3.2 2021-04-29 [1] #> evaluate 0.14 2019-05-28 [1] #> fansi 0.4.1 2020-01-08 [1] #> fs 1.5.0 2020-07-31 [1] #> generics 0.1.0 2020-10-31 [1] #> glue 1.4.2 2020-08-27 [1] #> highr 0.8 2019-03-20 [1] #> htmltools 0.5.1.1 2021-01-22 [1] #> knitr 1.33.8 2021-08-08 [1] #> lifecycle 1.0.0 2021-02-15 [1] #> magrittr 2.0.1 2020-11-17 [1] #> memoise 1.1.0 2017-04-21 [1] #> pillar 1.6.2 2021-07-29 [1] #> pkgbuild 1.1.0 2020-07-13 [1] #> pkgconfig 2.0.3 2019-09-22 [1] #> pkgload 1.1.0 2020-05-29 [1] #> pointblank * 0.8.0.9000 2021-09-17 [1] #> prettyunits 1.1.1 2020-01-24 [1] #> processx 3.4.3 2020-07-05 [1] #> ps 1.3.4 2020-08-11 [1] #> purrr 0.3.4 2020-04-17 [1] #> R6 2.5.0 2020-10-28 [1] #> remotes 2.2.0 2020-07-21 [1] #> rlang 0.4.11 2021-04-30 [1] #> rmarkdown 2.8 2021-05-07 [1] #> rprojroot 1.3-2 2018-01-03 [1] #> rstudioapi 0.13 2020-11-12 [1] #> sessioninfo 1.1.1 2018-11-05 [1] #> stringi 1.4.6 2020-02-17 [1] #> stringr 1.4.0 2019-02-10 [1] #> testthat 2.3.2 2020-03-02 [1] #> tibble 3.1.4 2021-08-25 [1] #> tidyselect 1.1.1 2021-04-30 [1] #> usethis 2.0.1 2021-02-10 [1] #> utf8 1.1.4 2018-05-24 [1] #> vctrs 0.3.8 2021-04-29 [1] #> withr 2.4.2 2021-04-18 [1] #> xfun 0.23 2021-05-15 [1] #> yaml 2.2.1 2020-02-01 [1] #> source #> CRAN (R 4.0.2) #> CRAN (R 4.0.0) #> CRAN (R 4.0.2) #> CRAN (R 4.0.2) #> CRAN (R 4.0.5) #> CRAN (R 4.0.2) #> CRAN (R 4.0.2) #> CRAN (R 4.0.3) #> CRAN (R 4.0.2) #> CRAN (R 4.0.3) #> CRAN (R 4.0.5) #> CRAN (R 4.0.5) #> CRAN (R 4.0.2) #> CRAN (R 4.0.2) #> CRAN (R 4.0.2) #> CRAN (R 4.0.3) #> CRAN (R 4.0.5) #> CRAN (R 4.0.2) #> CRAN (R 4.0.5) #> Github (yihui/knitr@55a2df9) #> CRAN (R 4.0.5) #> CRAN (R 4.0.3) #> CRAN (R 4.0.2) #> CRAN (R 4.0.5) #> CRAN (R 4.0.2) #> CRAN (R 4.0.2) #> CRAN (R 4.0.2) #> Github (rich-iannone/pointblank@40212d4) #> CRAN (R 4.0.2) #> CRAN (R 4.0.2) #> CRAN (R 4.0.2) #> CRAN (R 4.0.2) #> CRAN (R 4.0.5) #> CRAN (R 4.0.2) #> CRAN (R 4.0.5) #> CRAN (R 4.0.5) #> CRAN (R 4.0.2) #> CRAN (R 4.0.3) #> CRAN (R 4.0.2) #> CRAN (R 4.0.0) #> CRAN (R 4.0.2) #> CRAN (R 4.0.2) #> CRAN (R 4.0.5) #> CRAN (R 4.0.5) #> CRAN (R 4.0.5) #> CRAN (R 4.0.2) #> CRAN (R 4.0.5) #> CRAN (R 4.0.5) #> CRAN (R 4.0.5) #> CRAN (R 4.0.2) #> #> [1] C:/Users/emily/Documents/R/win-library/4.0 #> [2] C:/Program Files/R/R-4.0.2/librarytitle: reprex_reprex.R author: emily date: ‘2021-09-17’
Expected result
I would expect
agent2
to behave the same was asagent3
when passed towrite_yaml()
Session info
End the reproducible example with a call to
sessionInfo()
in the same session (e.g.reprex(si = TRUE)
) and include the output.