pridiltal / staplr

PDF Toolkit. :paperclip: :hammer: :wrench: :scissors: :bookmark_tabs: :file_folder::paperclip: :bookmark: :construction: :construction_worker:
https://pridiltal.github.io/staplr/

get_fields() Error in XML::htmlParse(fields, asText = TRUE, encoding = "UTF-8") : empty or no content specified #51

Closed: jorgesinval closed this issue 3 years ago

jorgesinval commented 3 years ago

I am trying to get the fields of the attached PDF:

subset_1_part1.pdf

using:

get_fields("path/to/pdf/example.pdf")

the following error is generated:

Error in XML::htmlParse(fields, asText = TRUE, encoding = "UTF-8") : 
  empty or no content specified

oganm commented 3 years ago

The file you are trying to read doesn't appear to be a fillable form, so get_fields cannot find any field data in it. You will have a better chance with something like pdftools::pdf_text here.
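To make the suggestion above concrete, here is a minimal sketch of trying get_fields first and falling back to pdftools::pdf_text when no field data can be parsed. The helper name `read_fields_or_text` and its arguments are hypothetical and not part of staplr; it assumes both staplr and pdftools are installed when the defaults are used.

```r
# Hypothetical helper (not part of staplr): try to read form fields and
# fall back to plain text extraction if get_fields() raises an error,
# as it does for a PDF with no fillable fields.
read_fields_or_text <- function(path,
                                field_reader = staplr::get_fields,
                                text_reader = pdftools::pdf_text) {
  tryCatch(
    # Happy path: the PDF has form fields that can be parsed.
    list(kind = "fields", content = field_reader(path)),
    error = function(e) {
      message("No form fields could be read (", conditionMessage(e),
              "); falling back to text extraction.")
      # pdf_text() returns a character vector with one element per page.
      list(kind = "text", content = text_reader(path))
    }
  )
}
```

Calling `read_fields_or_text("path/to/pdf/example.pdf")` would then return either the field list or the page text, with `kind` saying which one you got. The two reader arguments exist only so the fallback can be exercised without a real PDF.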

jorgesinval commented 3 years ago

Can you point me to a link showing how that function can help edit the PDF? Thanks.

oganm commented 3 years ago

For editing, this won't be useful. I thought you were only trying to get the information out.

If you need to fill this form, you will have to modify it to include fillable form fields or re-create it from scratch. I believe the go-to way to do that would be Adobe Acrobat, but I created my test files using LibreOffice Writer.
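Once a PDF does contain fillable fields, filling it with staplr would look roughly like the sketch below. The helper `apply_values` is hypothetical, and the file name `my_form.pdf` and field name `TextField1` are placeholders. It assumes the list returned by get_fields has one named entry per field, each with a `value` element.

```r
# Hypothetical helper (not part of staplr): copy new values into a field
# list of the shape returned by staplr::get_fields(), refusing names that
# do not exist in the form.
apply_values <- function(fields, values) {
  unknown <- setdiff(names(values), names(fields))
  if (length(unknown) > 0) {
    stop("Field(s) not found in the form: ", paste(unknown, collapse = ", "))
  }
  for (nm in names(values)) {
    fields[[nm]]$value <- values[[nm]]
  }
  fields
}

# Usage sketch (not run; file and field names are placeholders):
# fields <- staplr::get_fields("my_form.pdf")
# fields <- apply_values(fields, list(TextField1 = "some text"))
# staplr::set_fields("my_form.pdf", "my_form_filled.pdf", fields)
```

Stopping on unknown field names is a design choice here, so that a typo in a field name is reported instead of silently adding an entry the form does not have.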

sctyner commented 3 years ago

Hello @oganm, I am receiving the same error message with a fillable PDF. The PDF is available for download here. My reprex is below. Thank you for any assistance you can provide.

library(staplr)
#> Warning in fun(libname, pkgname): couldn't connect to display ":0"
template_ads <- get_fields("aia0014.pdf")
#> Error in XML::htmlParse(fields, asText = TRUE, encoding = "UTF-8") : 
#>   empty or no content specified

Created on 2021-03-22 by the reprex package (v0.3.0)

Session info

``` r
devtools::session_info()
#> Error in get(genname, envir = envir) : object 'testthat_print' not found
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.0.2 (2020-06-22)
#>  os       Ubuntu 18.04.2 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  C.UTF-8
#>  ctype    C.UTF-8
#>  tz       Etc/UTC
#>  date     2021-03-22
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date       lib source
#>  assertthat    0.2.1   2019-03-21 [2] CRAN (R 4.0.2)
#>  backports     1.1.9   2020-08-24 [2] CRAN (R 4.0.2)
#>  callr         3.5.1   2020-10-13 [1] CRAN (R 4.0.2)
#>  cli           2.2.0   2020-11-20 [1] CRAN (R 4.0.2)
#>  crayon        1.3.4   2017-09-16 [2] CRAN (R 4.0.2)
#>  desc          1.2.0   2018-05-01 [2] CRAN (R 4.0.2)
#>  devtools      2.3.1   2020-07-21 [2] CRAN (R 4.0.2)
#>  digest        0.6.27  2020-10-24 [1] CRAN (R 4.0.2)
#>  ellipsis      0.3.1   2020-05-15 [2] CRAN (R 4.0.2)
#>  evaluate      0.14    2019-05-28 [2] CRAN (R 4.0.2)
#>  fansi         0.4.1   2020-01-08 [2] CRAN (R 4.0.2)
#>  fs            1.5.0   2020-07-31 [2] CRAN (R 4.0.2)
#>  glue          1.4.2   2020-08-27 [2] CRAN (R 4.0.2)
#>  highr         0.8     2019-03-20 [2] CRAN (R 4.0.2)
#>  htmltools     0.5.1.1 2021-01-22 [1] CRAN (R 4.0.2)
#>  knitr         1.31    2021-01-27 [1] CRAN (R 4.0.2)
#>  magrittr      2.0.1   2020-11-17 [1] CRAN (R 4.0.2)
#>  memoise       1.1.0   2017-04-21 [2] CRAN (R 4.0.2)
#>  pkgbuild      1.1.0   2020-07-13 [2] CRAN (R 4.0.2)
#>  pkgload       1.1.0   2020-05-29 [2] CRAN (R 4.0.2)
#>  prettyunits   1.1.1   2020-01-24 [2] CRAN (R 4.0.2)
#>  processx      3.4.5   2020-11-30 [1] CRAN (R 4.0.2)
#>  ps            1.5.0   2020-12-05 [1] CRAN (R 4.0.2)
#>  R6            2.5.0   2020-10-28 [1] CRAN (R 4.0.2)
#>  remotes       2.2.0   2020-07-21 [2] CRAN (R 4.0.2)
#>  rJava         0.9-13  2020-07-06 [1] CRAN (R 4.0.2)
#>  rlang         0.4.10  2020-12-30 [1] CRAN (R 4.0.2)
#>  rmarkdown     2.6     2020-12-14 [1] CRAN (R 4.0.2)
#>  rprojroot     1.3-2   2018-01-03 [2] CRAN (R 4.0.2)
#>  sessioninfo   1.1.1   2018-11-05 [2] CRAN (R 4.0.2)
#>  staplr      * 3.1.1   2021-01-11 [1] CRAN (R 4.0.2)
#>  stringi       1.5.3   2020-09-09 [2] CRAN (R 4.0.2)
#>  stringr       1.4.0   2019-02-10 [2] CRAN (R 4.0.2)
#>  testthat      2.3.2   2020-03-02 [2] CRAN (R 4.0.2)
#>  usethis       1.6.1   2020-04-29 [2] CRAN (R 4.0.2)
#>  withr         2.3.0   2020-09-22 [1] CRAN (R 4.0.2)
#>  xfun          0.21    2021-02-10 [1] CRAN (R 4.0.2)
#>  yaml          2.2.1   2020-02-01 [2] CRAN (R 4.0.2)
#>
#> [1] /home/tyners/R/x86_64-pc-linux-gnu-library/4.0
#> [2] /opt/R/4.0.2/lib/R/library
```