The package has worked extremely well on processing "traditional" non-fillable forms -- thank you.
In my first attempts at using it with "fillable forms", I can't seem to find a way to distinguish between radio buttons or checkboxes that are selected and those that are not. I'm not sure whether I'm missing some nuance, making a complete mistake, or whether the functions simply don't support this.
An example "blank" original form is at https://www.cdc.gov/infectioncontrol/pdf/icar/IPC-demo-LTC-508.pdf (the same URL the reprex downloads). For the reprex below I am focusing on a small segment of the form on page 1, which I have included as screenshots in the original state and after filling out and saving a few entries.
I would like to know whether there is a way to detect that "Long-term Care" is selected in the filled-out form versus not selected in the original.
Thank you in advance. Below is what I hope is a helpful reprex. Since I could not find an easy, safe place to post the example filled-out form, I used dput() to put the resulting data in the reprex. Users can of course grab the original and save changes to their local filesystem if desired.
suppressPackageStartupMessages(library(dplyr))
library(arsenal)
## Not sure if poppler version matters?
library(pdftools)
#> Using poppler version 23.04.0
## Download and save the original form as original.pdf
## mode = "wb" writes the PDF as binary, which matters on Windows
download.file("https://www.cdc.gov/infectioncontrol/pdf/icar/IPC-demo-LTC-508.pdf",
              "original.pdf", mode = "wb")
## Let's use just the first page for the reprex
## Using pdf_data() for the convenience of having a tibble
## Same problem if I use pdf_text
original_pageone <- pdf_data("original.pdf")[[1]]
original_pageone_segment <-
  original_pageone %>%
  filter(y >= 229, y <= 290)
# no obvious errors, but it is difficult to see the radio button
# "text" in the RStudio console
# original_pageone_segment %>% print(n = Inf)
# Fill in the form with some data. It works and I can see
# traditional text such as "1234" and "5678" I entered on the form
# filled_pageone <- pdf_data("example_filled_form.pdf")[[1]]
# use dput to capture the resulting tibble for the reprex
# filled_pageone %>%
# filter(y >= 229, y <= 290) %>% dput()
filled_pageone_segment <-
structure(list(width = c(28L, 18L, 41L, 13L, 53L, 19L, 16L, 49L,
8L, 13L, 18L, 8L, 32L, 7L, 22L, 17L, 31L, 3L, 26L, 25L, 31L,
7L, 40L, 17L, 7L, 90L, 17L, 7L, 22L, 32L, 8L, 48L, 17L, 17L,
28L, 8L, 8L, 48L, 17L),
height = c(11L, 11L, 11L, 11L, 11L, 11L,
11L, 11L, 11L, 11L, 11L, 11L, 11L, 9L, 11L, 11L, 11L, 11L, 11L,
11L, 11L, 9L, 11L, 11L, 9L, 11L, 11L, 9L, 11L, 11L, 11L, 11L,
7L, 11L, 11L, 11L, 11L, 11L, 7L),
x = c(31L, 61L, 81L, 125L,
140L, 195L, 217L, 31L, 82L, 92L, 108L, 128L, 138L, 37L, 49L,
73L, 92L, 126L, 131L, 159L, 186L, 37L, 49L, 91L, 37L, 49L, 142L,
37L, 49L, 73L, 395L, 406L, 459L, 275L, 294L, 325L, 335L, 346L,
399L),
y = c(229L, 229L, 229L, 229L, 229L, 229L, 229L, 240L,
240L, 240L, 240L, 240L, 240L, 255L, 254L, 254L, 254L, 254L, 254L,
254L, 254L, 267L, 266L, 266L, 278L, 278L, 278L, 290L, 290L, 290L,
229L, 229L, 230L, 248L, 248L, 248L, 249L, 249L, 250L),
space = c(TRUE,
TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE,
TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE,
TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE,
TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE),
text = c("Facility",
"type", "(Complete", "the", "demographic", "form", "that", "corresponds",
"to", "the", "type", "of", "facility):", "●", "Acute", "Care",
"Hospital", "/", "Critical", "Access", "Hospital", "●", "Long-term",
"Care", "●", "Outpatient/Ambulatory", "Care", "●", "Other",
"(specify):", "(if", "applicable):", "1234", "CMS", "Facility",
"ID", "(if", "applicable):", "5678")),
class = c("tbl_df", "tbl",
"data.frame"),
row.names = c(NA, -39L))
## Use arsenal to compare tibbles in detail
summary(comparedf(original_pageone_segment, filled_pageone_segment, by = c("x", "y")))
#>
#>
#> Table: Summary of data.frames
#>
#> version arg ncol nrow
#> -------- ------------------------- ----- -----
#> x original_pageone_segment 6 37
#> y filled_pageone_segment 6 39
#>
#>
#>
#> Table: Summary of overall comparison
#>
#> statistic value
#> ------------------------------------------------------------ ------
#> Number of by-variables 2
#> Number of non-by variables in common 4
#> Number of variables compared 4
#> Number of variables in x but not y 0
#> Number of variables in y but not x 0
#> Number of variables compared with some values unequal 1
#> Number of variables compared with all values equal 3
#> Number of observations in common 37
#> Number of observations in x but not y 0
#> Number of observations in y but not x 2
#> Number of observations with some compared variables unequal 2
#> Number of observations with all compared variables equal 35
#> Number of values unequal 2
#>
#>
#>
#> Table: Variables not shared
#>
#>
#> ------------------------
#> No variables not shared
#> ------------------------
#>
#>
#>
#> Table: Other variables not compared
#>
#>
#> --------------------------------
#> No other variables not compared
#> --------------------------------
#>
#>
#>
#> Table: Observations not shared
#>
#> version x y observation
#> -------- ---- ---- ------------
#> y 399 250 39
#> y 459 230 33
#>
#>
#>
#> Table: Differences detected by variable
#>
#> var.x var.y n NAs
#> ------- ------- --- ----
#> width width 0 0
#> height height 0 0
#> space space 2 0
#> text text 0 0
#>
#>
#>
#> Table: Differences detected
#>
#> var.x var.y x y values.x values.y row.x row.y
#> ------ ------ ---- ---- --------- --------- ------ ------
#> space space 346 249 FALSE TRUE 37 38
#> space space 406 229 FALSE TRUE 32 32
#>
#>
#>
#> Table: Non-identical attributes
#>
#>
#> ----------------------------
#> No non-identical attributes
#> ----------------------------
sessionInfo()
#> R version 4.3.2 (2023-10-31)
#> Platform: aarch64-apple-darwin20 (64-bit)
#> Running under: macOS Sonoma 14.2.1
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> time zone: America/New_York
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] pdftools_3.4.0 arsenal_3.6.3 dplyr_1.1.4
#>
#> loaded via a namespace (and not attached):
#> [1] vctrs_0.6.5 cli_3.6.2 knitr_1.45 rlang_1.1.3
#> [5] xfun_0.42 purrr_1.0.2 styler_1.10.2 generics_0.1.3
#> [9] glue_1.7.0 askpass_1.2.0 qpdf_1.3.2 htmltools_0.5.7
#> [13] fansi_1.0.6 rmarkdown_2.25 R.cache_0.16.0 tibble_3.2.1
#> [17] evaluate_0.23 fastmap_1.1.1 yaml_2.3.8 lifecycle_1.0.4
#> [21] compiler_4.3.2 fs_1.6.3 Rcpp_1.0.12 pkgconfig_2.0.3
#> [25] rstudioapi_0.15.0 R.oo_1.26.0 R.utils_2.12.3 digest_0.6.34
#> [29] R6_2.5.1 tidyselect_1.2.0 utf8_1.2.4 reprex_2.1.0
#> [33] pillar_1.9.0 magrittr_2.0.3 R.methodsS3_1.8.2 tools_4.3.2
#> [37] withr_3.0.0
Created on 2024-03-29 with reprex v2.1.0
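To narrow the question down, here is a minimal sketch that isolates just the four radio-button rows from the filled-out form. The values are copied by hand from the dput() output above; the names `bullets` and `n_patterns`, and the `label` column, are my own additions for illustration and are not something pdf_data() returns. It only checks whether anything in the returned columns sets the selected "Long-term Care" button apart from the other three.

```r
library(dplyr)

# The four "●" rows from filled_pageone_segment (rows 14, 22, 25 and 28),
# copied from the dput() output. `label` is a hand-written annotation
# (an assumption for readability), not a pdf_data() column.
bullets <- tibble(
  width  = c(7L, 7L, 7L, 7L),
  height = c(9L, 9L, 9L, 9L),
  x      = c(37L, 37L, 37L, 37L),
  y      = c(255L, 267L, 278L, 290L),
  space  = c(TRUE, TRUE, TRUE, TRUE),
  text   = rep("●", 4),
  label  = c("Acute Care Hospital / Critical Access Hospital",
             "Long-term Care",   # the option selected in the filled form
             "Outpatient/Ambulatory Care",
             "Other (specify):")
)

# Count the distinct combinations of every pdf_data() column except the
# vertical position, which necessarily differs from row to row.
n_patterns <- bullets %>%
  distinct(width, height, x, space, text) %>%
  nrow()

n_patterns
#> [1] 1
```

With only one distinct combination, the selected button looks identical to the unselected ones in this output, which matches the comparedf() summary: the only differences it reports are the two typed entries ("1234" and "5678") and the `space` flag on the word before each of them.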