mjwestgate / synthesisr

Data import and deduplication for evidence synthesis projects

Import breaks when bibtex field is empty #25

Open LukasWallrich opened 1 year ago

LukasWallrich commented 1 year ago

Currently, empty BibTeX fields break the import, for instance when Crossref returns an empty author field.

library(synthesisr)
bib <- "@article{68,
  author = {},
  title = {Evaluation of technology transferring: The experiences of the first Navy Domestic Technology Transfair. Final report},
  journal = {Reviews},
  publisher = {Office of Scientific and Technical Information (OSTI)},
  date = {2003},
  year = {2003},
  address = {Arlington, VA},
  doi = {10.2172/10138039}
}"

t <- tempfile(fileext = ".bib")
writeLines(bib, t)
res <- read_ref(t)
#> Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 1, 0

Created on 2023-01-26 with reprex v2.0.2
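The error message comes from `data.frame()`, which refuses to combine columns of different lengths. The sketch below reproduces that message without synthesisr. It assumes that an empty field such as `author = {}` ends up as a zero-length vector inside the parsed entry, which is a guess about the package internals rather than something confirmed here. The `entry` list is a made-up stand-in for one parsed reference.

```r
# Hypothetical stand-in for one parsed entry; the zero-length `author`
# mimics an empty BibTeX field (assumption, not taken from synthesisr)
entry <- list(type = "article", author = character(0), title = "Some title")

# data.frame() cannot combine length-1 columns with a length-0 column,
# so capture the error message instead of stopping
result <- tryCatch(
  do.call(data.frame, entry),
  error = function(e) conditionMessage(e)
)
print(result)

# Replacing zero-length fields with NA gives every column length 1,
# so the conversion succeeds
entry[lengths(entry) == 0] <- NA
fixed <- do.call(data.frame, entry)
print(fixed)
```

If that assumption holds, filling empty fields with `NA` before the conversion would be one way to avoid the failure.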

LukasWallrich commented 1 year ago

Just sharing my workaround here in case it is useful to someone else: importing as a list works (`return_df = FALSE`), and the conversion to a data.frame can then happen outside the package.

library(synthesisr)
library(purrr)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

bib <- "
@article{66,
  author = {Ilgen, Daniel and Hollenbeck, John and Johnson, Michael and Jundt, Dustin},
  title = {Teams in Organizations: From Input-Process-Output Models to IMOI Models},
  journal = {Annual Review of Psychology},
  publisher = {Annual Reviews},
  editor = {S. Fiske Schacter, & A. Kasdin},
  date = {2005-02-22},
  year = {2005},
  month = {2},
  day = {22},
  pages = {517-543},
  volume = {56},
  number = {1},
  doi = {10.1146/annurev.psych.56.091103.070250}
}

@article{68,
  author = {},
  title = {Evaluation of technology transferring: The experiences of the first Navy Domestic Technology Transfair. Final report},
  journal = {Reviews},
  publisher = {Office of Scientific and Technical Information (OSTI)},
  date = {2003},
  year = {2003},
  address = {Arlington, VA},
  doi = {10.2172/10138039}
}"

t <- tempfile(fileext = ".bib")
writeLines(bib, t)

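# Import as a list to bypass the failing data.frame conversion
res <- read_ref(t, return_df = FALSE)

# Replace empty fields with NA, collapse multi-valued fields (e.g. several
# authors) into one string, then build one row per entry and stack the rows
map(res, ~{.x[lengths(.x) == 0] <- NA; .x}) %>% 
  map(~{.x[lengths(.x) > 1] <- map(.x[lengths(.x) > 1], paste, collapse = " and "); .x}) %>% 
  map(data.frame) %>% 
  bind_rows()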
#>      type
#> 1 article
#> 2 article
#>                                                                      author
#> 1 Ilgen, Daniel and Hollenbeck, John and Johnson, Michael and Jundt, Dustin
#> 2                                                                      <NA>
#>                                                                                                                  title
#> 1                                              Teams in Organizations: From Input-Process-Output Models to IMOI Models
#> 2 Evaluation of technology transferring: The experiences of the first Navy Domestic Technology Transfair. Final report
#>                       journal
#> 1 Annual Review of Psychology
#> 2                     Reviews
#>                                               publisher
#> 1                                        Annual Reviews
#> 2 Office of Scientific and Technical Information (OSTI)
#>                           editor       date year month  day   pages volume
#> 1 S. Fiske Schacter, & A. Kasdin 2005-02-22 2005     2   22 517-543     56
#> 2                           <NA>       2003 2003  <NA> <NA>    <NA>   <NA>
#>   number                                    doi       address
#> 1      1 10.1146/annurev.psych.56.091103.070250          <NA>
#> 2   <NA>                       10.2172/10138039 Arlington, VA

Created on 2023-01-26 with reprex v2.0.2
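For anyone who would rather not depend on purrr and dplyr, the same idea can be sketched in base R. `refs_to_df()` is a hypothetical helper, not part of synthesisr, and it assumes each entry is a plain list of fields like the one returned with `return_df = FALSE`. The example runs on a small hand-built list rather than on real `read_ref()` output.

```r
# Hypothetical helper (not part of synthesisr): convert a list of parsed
# reference entries into a data.frame with one row per entry
refs_to_df <- function(refs) {
  rows <- lapply(refs, function(entry) {
    # Empty fields become NA; multi-valued fields are collapsed to one string
    vapply(entry, function(field) {
      if (length(field) == 0) {
        NA_character_
      } else {
        paste(field, collapse = " and ")
      }
    }, character(1))
  })
  # Union of all field names, in order of first appearance
  cols <- unique(unlist(lapply(rows, names)))
  # Align every entry to the full column set; absent fields become NA
  aligned <- lapply(rows, function(r) {
    out <- r[cols]
    names(out) <- cols
    out
  })
  as.data.frame(do.call(rbind, aligned), stringsAsFactors = FALSE)
}

# Hand-built example entries (assumed structure, for illustration only)
refs <- list(
  list(type = "article", author = c("Ilgen, Daniel", "Hollenbeck, John"),
       title = "Teams in Organizations"),
  list(type = "article", author = character(0),
       title = "Evaluation of technology transferring",
       address = "Arlington, VA")
)
df <- refs_to_df(refs)
print(df)
```

All columns come back as character in this sketch, so numeric fields such as `year` would still need converting afterwards.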