rOpenGov / eurostat

R tools for Eurostat data
http://ropengov.github.io/eurostat

Enhancements to get_bibentry #268

Closed pitkant closed 8 months ago

pitkant commented 1 year ago

The current implementation of get_bibentry produces some nonsensical results. Take, for instance, the example in the function documentation:

my_bibliography <- get_bibentry(
  code = c("tran_hv_frtra", "t2020_rk310", "tec00001"),
  keywords = list(
    c("railways", "freight", "transport"),
    c("railways", "passengers", "modal split")
  ),
  format = "Biblatex"
)

This prints the following:

> my_bibliography
@Misc{tec00001_15-08-2023,
  title = {Gross domestic product at market prices [tran_hv_frtra]},
  url = {https://ec.europa.eu/eurostat/web/products-datasets/-/tran_hv_frtra},
  language = {en},
  year = {15.08.2023},
  publisher = {Eurostat},
  author = {{Eurostat}},
  keywords = {railways, freight, transport},
  urldate = {2023-08-16},
}

@Misc{tran_hv_frtra_15-03-2023,
  title = {Volume of freight transport relative to GDP [t2020_rk310]},
  url = {https://ec.europa.eu/eurostat/web/products-datasets/-/t2020_rk310},
  language = {en},
  year = {15.03.2023},
  publisher = {Eurostat},
  author = {{Eurostat}},
  keywords = {railways, passengers, modal split},
  urldate = {2023-08-16},
}

We can spot the following things:

European Commission, Eurostat, 'Airport traffic data by reporting airport and airlines' (avia_tf_apal), most recent data 2021-09-01, https://ec.europa.eu/eurostat/databrowser/view/avia_tf_apal/default/table?lang=en

European Commission, Eurostat, 'Total length of motorways' (ttr00002), accessed 2021-10-15, https://ec.europa.eu/eurostat/databrowser/view/ttr00002/default/table?lang=en

European Commission, Eurostat, 'Real GDP growth rate -- volume' (tec00115), updated 2021-09-28, https://ec.europa.eu/eurostat/databrowser/view/tec00115/default/table?lang=en

I am linking this to the original issue where this was discussed: #128. Possibly also related: #199.

sessionInfo:

> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.5

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Helsinki
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] eurostat_4.0.0.9003

loaded via a namespace (and not attached):
 [1] utf8_1.2.3         generics_0.1.3     tidyr_1.3.0        class_7.3-22      
 [5] xml2_1.3.5         KernSmooth_2.23-22 stringi_1.7.12     hms_1.1.3         
 [9] digest_0.6.33      magrittr_2.0.3     countrycode_1.5.0  timechange_0.2.0  
[13] ISOweek_0.6-2      cellranger_1.1.0   rprojroot_2.0.3    plyr_1.8.8        
[17] jsonlite_1.8.7     e1071_1.7-13       backports_1.4.1    httr_1.4.6        
[21] purrr_1.0.2        fansi_1.0.4        regions_0.1.8      bibtex_0.5.1      
[25] cli_3.6.1          crayon_1.5.2       rlang_1.1.1        bit64_4.0.5       
[29] withr_2.5.0        parallel_4.3.1     tools_4.3.1        tzdb_0.4.0        
[33] dplyr_1.1.2        here_1.0.1         curl_5.0.2         assertthat_0.2.1  
[37] vctrs_0.6.3        R6_2.5.1           proxy_0.4-27       lifecycle_1.0.3   
[41] lubridate_1.9.2    classInt_0.4-9     RefManageR_1.4.0   stringr_1.5.0     
[45] bit_4.0.5          vroom_1.6.3        pkgconfig_2.0.3    pillar_1.9.0      
[49] glue_1.6.2         Rcpp_1.0.11        tibble_3.2.1       tidyselect_1.2.0  
[53] rstudioapi_0.15.0  readr_2.1.4        compiler_4.3.1     readxl_1.4.3      

antagomir commented 1 year ago

Is having RefManageR as a package import and BibLaTeX as an output option really necessary? For example, the pxweb package just uses the base R utils::bibentry and print(utils::citation) method, which is probably what most users need. RefManageR seemed most useful when I wanted to insert citations in .md/.Rmd files in RStudio and manage my bibliographies there, but it now seems like overkill for the simple purpose of printing a data citation.

-> I tend to agree. It might also be useful to support similar conventions across packages in general.
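For illustration, a base R data citation along the lines suggested above could look roughly like the sketch below. This is not the eurostat package's implementation: `make_data_citation` is a hypothetical helper, and the title, year and URL values are only illustrative (the URL pattern is copied from the printed output earlier in this thread).

```r
# Hypothetical helper, not part of the eurostat package: builds a data
# citation with base R only (utils::bibentry), without RefManageR.
make_data_citation <- function(code, title, year, url) {
  utils::bibentry(
    bibtype = "Misc",
    key = code,
    title = paste0(title, " [", code, "]"),
    author = utils::person(given = "Eurostat"),
    year = year,
    url = url
  )
}

# Illustrative values; the year is assumed, not looked up from Eurostat.
entry <- make_data_citation(
  code = "tec00001",
  title = "Gross domestic product at market prices",
  year = "2023",
  url = "https://ec.europa.eu/eurostat/web/products-datasets/-/tec00001"
)

print(entry, style = "text")  # plain-text reference
utils::toBibtex(entry)        # BibTeX entry that can be written to a .bib file
```

Because the dataset code is passed once and reused for both the key and the title, the key and title cannot drift apart for a single entry in the way shown in the output at the top of this issue.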

pitkant commented 1 year ago

I sent a question to Eurostat user support and received the following instructions on how to properly cite Eurostat datasets:

Users are free to choose the method to cite Eurostat’s datasets. However, the following guidelines must be followed to reference our statistical data:

· The origin of the data should always be mentioned as “Source: Eurostat”.

· The online dataset codes(s) should also be provided in order to ensure transparency and facilitate access to the Eurostat data and related methodological information. For example: “Source: Eurostat (online data code: namq_10_gdp)”

· Online publications (e.g. web pages, PDF) should include a clickable link to the dataset using the bookmark functionality available in the Eurostat data browser.

It should be avoided to associate different entities (e.g. Eurostat, National Statistical Offices, other data providers) to the same dataset or indicator without specifying the role of each of them in the treatment of data.

Something to take into consideration here.
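As a rough sketch of what following these guidelines could mean in practice, the hypothetical helper below builds a "Source: Eurostat" note with the online data code and a link. The function name is made up for illustration, and the URL pattern is an assumption copied from the data browser links quoted earlier in this thread; a link created with the data browser's bookmark functionality may look different.

```r
# Hypothetical helper, not part of the eurostat package.
# Assumption: the data browser URL pattern matches the links quoted
# earlier in this thread; bookmark links may use another format.
eurostat_source_note <- function(code, lang = "en") {
  url <- sprintf(
    "https://ec.europa.eu/eurostat/databrowser/view/%s/default/table?lang=%s",
    code, lang
  )
  sprintf("Source: Eurostat (online data code: %s), %s", code, url)
}

eurostat_source_note("namq_10_gdp")
```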

pitkant commented 1 year ago

After a further inquiry to Eurostat user support, I received clarification on the "guidelines that must be followed":

Please note that the three general guidelines sent have been prepared for the citation of statistical data in European Commission publications (traditional publications but also social media, website posts…) to enable the users to quickly and easily identify the origin of the data and, if needed, access the source data for themselves, while not overloading the main document with excessive detail. Therefore, unfortunately, no further details are provided for academic articles.

and then the following information that is also included in at least some eurostat package function documentation:

Eurostat does not advocate a specific way of making bibliographic citations to Eurostat data in academic articles and, unfortunately, we do not provide guidance in this regard.

Simply, what is required under our copyright notice is to mention that the source of the data is Eurostat and, where possible, to provide the link: Copyright notice and free re-use of data - Eurostat (europa.eu)

All statistical data, metadata, content of web pages or other dissemination tools, official publications and other documents published on its website, with the exceptions listed below, can be reused without any payment or written licence provided that:

  • the source is indicated as Eurostat;
  • when re-use involves modifications to the data or text, this must be stated clearly to the end user of the information.

So it seems that Eurostat user support could not give us definitive advice.

I think returning to roughly the same practice as before is the safest option. I removed the mention of the European Commission from the Eurostat citation, as it was only found on the data.europa.eu website and I'm not sure whether the Publications Office of the European Union is the final authority to decide that the European Commission should be attributed.

pitkant commented 8 months ago

Closed with the CRAN release of package version 4.0.0.