ropensci / bib2df

Parse a BibTeX file to a tibble
https://docs.ropensci.org/bib2df

Import bibtex from scopus generates thousands of variables #52

Open ccamara opened 2 years ago

ccamara commented 2 years ago

Whenever I import a Scopus export, the resulting data frame is completely messed up and has thousands of columns. Apparently this should have been fixed by #33 or #34, but I'm afraid it is not.

Steps:

  1. Make a query in Scopus
  2. Export a BibTeX file (see here: 20210604_scopus_urban_commons.zip)
  3. Install the development version of bib2df (devtools::install_github("ropensci/bib2df") - 28th November 2021)
  4. Run testbib <- bib2df::bib2df("<attached file>")

Result:

 testbib
# A tibble: 307 × 55
   CATEGORY   BIBTEXKEY ADDRESS ANNOTE AUTHOR BOOKTITLE CHAPTER CROSSREF EDITION EDITOR HOWPUBLISHED INSTITUTION JOURNAL KEY   MONTH
   <chr>      <chr>     <chr>   <chr>  <list> <chr>     <chr>   <chr>    <chr>   <list> <chr>        <chr>       <chr>   <chr> <chr>
 1 ARTICLE    Köpper20… NA      NA     <chr … NA        NA      NA       NA      <chr … NA           NA          Resear… NA    NA   
 2 CONFERENCE Manfredi… NA      NA     <chr … NA        NA      NA       NA      <chr … NA           NA          IOP Co… NA    NA   
 3 ARTICLE    Avdikos2… NA      NA     <chr … NA        NA      NA       NA      <chr … NA           NA          Geofor… NA    NA   
 4 ARTICLE    Petrescu… NA      NA     <chr … NA        NA      NA       NA      <chr … NA           NA          Enviro… NA    NA   
 5 ARTICLE    Parikh20… NA      NA     <chr … NA        NA      NA       NA      <chr … NA           NA          Enviro… NA    NA   
 6 ARTICLE    Dekeyser… NA      NA     <chr … NA        NA      NA       NA      <chr … NA           NA          Enviro… NA    NA   
 7 BOOK       Stuber20… NA      NA     <chr … NA        NA      NA       NA      <chr … NA           NA          Balanc… NA    NA   
 8 ARTICLE    Wang2021… NA      NA     <chr … NA        NA      NA       NA      <chr … NA           NA          Americ… NA    NA   
 9 ARTICLE    Marino20… NA      NA     <chr … NA        NA      NA       NA      <chr … NA           NA          Territ… NA    NA   
10 ARTICLE    Sardeshp… NA      NA     <chr … NA        NA      NA       NA      <chr … NA           NA          Cities  NA    NA   
# … with 297 more rows, and 40 more variables: NOTE <chr>, NUMBER <chr>, ORGANIZATION <chr>, PAGES <chr>, PUBLISHER <chr>,
#   SCHOOL <chr>, SERIES <chr>, TITLE <chr>, TYPE <chr>, VOLUME <chr>, YEAR <dbl>, DOI <chr>, URL <chr>, AFFILIATION <chr>,
#   ABSTRACT <chr>, AUTHOR_KEYWORDS <chr>, REFERENCES <chr>, ISSN <chr>, LANGUAGE <chr>, ABBREV_SOURCE_TITLE <chr>,
#   DOCUMENT_TYPE <chr>, SOURCE <chr>, ART_NUMBER <chr>, KEYWORDS <chr>, FUNDING_DETAILS <chr>, FUNDING_TEXT <chr>,
#   CORRESPONDENCE_ADDRESS1 <chr>, SPONSORS <chr>, FUNDING_TEXT.1 <chr>, FUNDING_DETAILS.1 <chr>, FUNDING_DETAILS.2 <chr>,
#   ISBN <chr>, FUNDING_DETAILS.3 <chr>, FUNDING_DETAILS.4 <chr>, FUNDING_TEXT.2 <chr>, CODEN <chr>, FUNDING_DETAILS.5 <chr>,
#   PUBMED_ID <chr>, PAGE_COUNT <chr>, CHEMICALS_CAS <chr>
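
The `.1`, `.2`, ... suffixes on columns such as FUNDING_DETAILS and FUNDING_TEXT may indicate that some entries in the export repeat the same field name, although that is only a guess from the output above. A minimal base-R sketch for listing such suffixed columns in any data frame follows; the helper `suffixed_columns` and the `toy` data frame are hypothetical stand-ins, not part of bib2df or of the attached file.

```r
# Hypothetical helper: return the column names that end in ".<digits>",
# the pattern seen in FUNDING_DETAILS.1, FUNDING_TEXT.2, and so on.
suffixed_columns <- function(df) {
  grep("\\.[0-9]+$", names(df), value = TRUE)
}

# Toy stand-in for the tibble returned by bib2df::bib2df();
# the values are made up purely for illustration.
toy <- data.frame(
  TITLE             = c("a", "b"),
  FUNDING_DETAILS   = c("x", NA),
  FUNDING_DETAILS.1 = c("y", NA),
  FUNDING_TEXT.2    = c(NA, "z"),
  stringsAsFactors  = FALSE
)

# List the columns that carry a numeric suffix
suffixed_columns(toy)
```

Running the same helper on `testbib` would show which fields were split into numbered columns, which could help narrow down the entries that trigger the problem.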

sessionInfo():

R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: KDE neon User - Plasma 25th Anniversary Edition

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3

locale:
 [1] LC_CTYPE=ca_ES.UTF-8       LC_NUMERIC=C               LC_TIME=es_ES.UTF-8        LC_COLLATE=ca_ES.UTF-8    
 [5] LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=ca_ES.UTF-8    LC_PAPER=es_ES.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.7         rstudioapi_0.13    magrittr_2.0.1     tidyselect_1.1.1   R6_2.5.1           rlang_0.4.12      
 [7] fansi_0.5.0        stringr_1.4.0      httr_1.4.2         dplyr_1.0.7        tools_4.1.2        humaniformat_0.6.0
[13] utf8_1.2.2         cli_3.1.0          DBI_1.1.1          ellipsis_0.3.2     assertthat_0.2.1   tibble_3.1.5      
[19] lifecycle_1.0.1    crayon_1.4.2       purrr_0.3.4        vctrs_0.3.8        glue_1.4.2         stringi_1.7.5     
[25] compiler_4.1.2     pillar_1.6.4       generics_0.1.1     renv_0.13.2        bib2df_1.1.2       pkgconfig_2.0.3