rOpenGov / eurostat

R tools for Eurostat data
http://ropengov.github.io/eurostat

get_eurostat() does not save .rds files #258

Closed · opened by rideofyourlife · closed 8 months ago

rideofyourlife commented 1 year ago

Hello.

Until recently everything worked perfectly, but not long ago the get_eurostat() function stopped saving .rds files. This happens even though I set cache = TRUE and made sure that cache_dir points to the correct directory. No file appears in the target location, regardless of which directory I choose. However, loading the dataset into a temporary variable in R memory works fine.

For example:

temp <- get_eurostat(id = "namq_10_gdp", cache = TRUE, cache_dir = "XYZ")

A data frame temp is created, but no .rds file appears in the XYZ folder.
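One quick way to confirm whether a call wrote anything to the cache directory is to compare the .rds files present before and after it. This is a minimal base-R sketch, assuming a throwaway temporary directory; new_rds_files() is a hypothetical helper, and saveRDS() stands in for the get_eurostat() call so the example runs without network access:

```r
# Hypothetical helper: report which .rds files appeared in `dir`
# while `expr` was being evaluated.
new_rds_files <- function(dir, expr) {
  before <- list.files(dir, pattern = "\\.rds$", ignore.case = TRUE)
  force(expr)  # lazy evaluation: the call only runs here, after `before`
  after <- list.files(dir, pattern = "\\.rds$", ignore.case = TRUE)
  setdiff(after, before)
}

cache_dir <- tempfile("cache_check_")
dir.create(cache_dir)

# Stand-in for get_eurostat(..., cache = TRUE, cache_dir = cache_dir)
written <- new_rds_files(
  cache_dir,
  saveRDS(data.frame(x = 1:3), file.path(cache_dir, "example.rds"))
)
print(written)
```

Swapping the saveRDS() call for the get_eurostat() call from the report, an empty result (character(0)) would mean nothing was written to cache_dir.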

Is anyone else experiencing the same issue? Looking forward to your insight. Paweł

Edit: My R version is 4.2.3 and the installed version of the 'eurostat' package is 3.8.2. I suspect it might have something to do with the fact that, despite having version 3.8.2 of the package, I don't have access to the functions introduced in 3.7.14. That version was published on 22 March, and indeed my most recent RDS files predate it. For example, I can't access the get_eurostat_raw2 function, although in theory I should be able to.

pitkant commented 1 year ago

Thank you for your report, I will start looking into this.

For clarification: by being unable to access the get_eurostat_raw2 function, do you mean that you can't find it even with the triple colon operator (eurostat:::get_eurostat_raw2), or only with the regular double colon operator (eurostat::get_eurostat_raw2)? If it is the latter, you shouldn't be able to, as it is an internal function (as is its predecessor, get_eurostat_raw), but if it is the former, it is weird indeed.
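The difference between the two operators can be checked programmatically. A minimal sketch in base R; visibility() is a hypothetical helper that asks the package namespace whether a name is exported (reachable with ::) or only defined internally (reachable with :::):

```r
# Hypothetical helper: classify how a function name is reachable
# from a package namespace.
#   "exported"  -> reachable with pkg::name
#   "internal"  -> only reachable with pkg:::name
#   "not found" -> not defined in the namespace at all
visibility <- function(pkg, name) {
  ns <- asNamespace(pkg)
  if (name %in% getNamespaceExports(ns)) {
    "exported"
  } else if (exists(name, envir = ns, inherits = FALSE)) {
    "internal"
  } else {
    "not found"
  }
}

visibility("stats", "sd")
```

With the eurostat versions discussed here installed, visibility("eurostat", "get_eurostat_raw2") would be expected to return "internal", consistent with the function working only through eurostat:::.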

rideofyourlife commented 1 year ago

The triple colon operator worked. I had no idea it existed; I thought only the double colon operator worked. That leaves only the saving of RDS files as an open problem.

pitkant commented 1 year ago

@dieghernan Do you have, off the top of your head, any insight into what might be causing this caching issue?

rideofyourlife commented 1 year ago

Unfortunately, no.

pitkant commented 1 year ago

I ran your examples and went through the code. It would seem that there is technically no bug or unintended behaviour here, as the get_eurostat documentation states:

cache: a logical whether to do caching. Default is TRUE. Affects only queries from the bulk download facility.

This is not strictly true, as caching is also available for JSON queries if there are filters in the query, like this:

dd <- get_eurostat("nama_10_gdp",
                   filters = list(
                       geo = "FI",
                       na_item = "B1GQ",
                       unit = "CLV_I10"
                   ), cache_dir = "inst/extras")

In the get_eurostat code, caching is explicitly turned off for queries that don't have filters:

    # No cache for json
    if (is.null(filters) || identical(filters, "none")) {
      cache <- FALSE
    }
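To make the effect of this condition concrete, here is a minimal sketch that isolates it in plain R, assuming (as noted later in this thread) that "none" is the default value of filters. Both function names are hypothetical, and the sketch ignores everything else get_eurostat() does:

```r
# The condition quoted above, pulled out into a standalone function.
# In eurostat the check is written inline inside get_eurostat().
json_cache_disabled <- function(filters) {
  is.null(filters) || identical(filters, "none")
}

# Apply the rule to a requested `cache` value.
effective_cache <- function(cache, filters = "none") {
  if (json_cache_disabled(filters)) FALSE else cache
}

effective_cache(TRUE)                              # default filters: FALSE
effective_cache(TRUE, filters = list(geo = "FI"))  # filtered query: TRUE
```

Under this rule, cache = TRUE only has an effect when filters are supplied, which matches what the original report describes.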

I think this could be changed in the future so that caching is also available for queries without filters. I will label this issue as an enhancement to be considered in a future version, and as documentation, so that more explicit documentation gets written (or as a reminder to update the documentation if/when the caching functionality is changed).

rideofyourlife commented 1 year ago

I don't use filters, because then I receive a message that my query exceeds 50 observations, and I usually need many more than that. Was this default cache <- FALSE introduced by the 3.7.14 update or by the new Eurostat API? I ask because I really did start experiencing this issue only recently.

pitkant commented 1 year ago

Judging by the history of the get_eurostat.R file, it would seem that the intention to prevent caching has existed for at least some years. In February this year I changed the code snippet slightly from this version:

    # No cache for json
    if (is.null(filters) || filters != "none") {
      cache <- FALSE
    }

to this version:

    # No cache for json
    if (is.null(filters) || identical(filters, "none")) {
      cache <- FALSE
    }

Actually, now that I look at the examples above, the old code seems to imply that cache was set to FALSE when filters was not equal to "none". On the other hand, cache is also set to FALSE if filters is NULL (which it rarely would be, as "none" is the default parameter value). This is probably why I changed it: to make the code consistent with the comment text as well.

If you are comfortable editing the source code of the function, you could clone this repository in RStudio, change identical(filters, "none") to !identical(filters, "none"), and build the package; it should then behave the same way it used to.
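A minimal sketch of what that one-character patch changes, comparing the current condition with the negated one across three kinds of filters value. The function names are hypothetical; TRUE means the condition would switch caching off:

```r
# Current condition versus the suggested local patch with !identical().
disables_cache_current <- function(filters) {
  is.null(filters) || identical(filters, "none")
}
disables_cache_patched <- function(filters) {
  is.null(filters) || !identical(filters, "none")
}

# list() keeps the NULL element, so all three cases are tested.
cases <- list(unfiltered = "none", filtered = list(geo = "FI"), null = NULL)

current <- sapply(cases, disables_cache_current)
patched <- sapply(cases, disables_cache_patched)
print(rbind(current, patched))
```

With the patch, an unfiltered query keeps the caller's cache = TRUE, while filtered queries are no longer cached; a NULL filters value disables caching in both versions.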

pitkant commented 1 year ago

This issue could be solved together with #257 by caching datasets that are downloaded without filters while not caching filtered datasets.

rideofyourlife commented 1 year ago

Sorry for the newbie question, but how do I find eurostat.R here?

pitkant commented 1 year ago

@rideofyourlife, thank you for your question! There is no eurostat.R, but if you are looking for get_eurostat.R, it should be in the R folder (along with the other functions). I think you could just edit the R code snippet in question and start using the modified package immediately, since R packages are usually interpreted rather than compiled, but I don't know whether there are downsides to that. If you are using RStudio, the recommended workflow would be the following:

  1. start a new project
  2. clone this GitHub repository
  3. make the above-mentioned changes to the get_eurostat.R code
  4. build and install the customized package.

Here is information about starting new projects in RStudio: https://support.posit.co/hc/en-us/articles/200526207-Using-RStudio-Projects

Alternatively, you could fork this repository, make the change in your fork, and then install the modified package with devtools using a command like this: devtools::install_github("rideofyourlife/eurostat")

rideofyourlife commented 1 year ago

Yes, sorry. I meant get_eurostat.R.

rideofyourlife commented 1 year ago

I can't find any 'get_eurostat.R' file in "C:\Users\ropia\AppData\Local\R\win-library\4.3\eurostat", where supposedly the package is installed. What am I doing wrong?

pitkant commented 1 year ago

Right, sorry. I forgot that installed R packages only have .rdb and .rdx files in the R folder instead of function files.
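A minimal base-R sketch of what this looks like for any installed package: system.file() finds the installation folder, and deparse() prints a function's source from the loaded object rather than from a file. "stats" is used only because it ships with every R installation; replacing it with "eurostat" should inspect that package in the same way:

```r
# Locate an installed package and list what its R/ folder contains.
pkg_dir <- system.file(package = "stats")
r_files <- list.files(file.path(pkg_dir, "R"))
print(r_files)  # typically a lazy-load database (.rdb/.rdx), not .R sources

# The function source can still be recovered from the loaded object.
sd_source <- deparse(stats::sd)
cat(sd_source, sep = "\n")
```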

You could try the other alternatives I mentioned above, or see what happens if you run edit(get_eurostat) in the RStudio console and save the changes. Another option mentioned on Stack Overflow is to use the trace function to make edits, but these are probably not permanent, so you would have to redo them every time.

rideofyourlife commented 1 year ago

I decided to go ahead with editing the function via edit(). The problem is this: I type the following in the RStudio console: edit(eurostat::get_eurostat)

I add the exclamation mark in the place you mentioned (!identical(filters, "none")) and save, but nothing changes: when I open the function in edit() again, my change is gone.

Can you think of a reason for this?

pitkant commented 1 year ago

Did you try assigning the edited function to a new temporary function, like this:

edited_get_eurostat <- edit(get_eurostat)

I was able to save the cached file to my desired folder ("cache" folder in my home directory) after this.

temp <- edited_get_eurostat(id = "namq_10_gdp", cache = TRUE, cache_dir = "cache")

EDIT: I realize I was a bit unclear in my instructions above, so here is the relevant passage from the edit function documentation:

It is important to realize that edit does not change the object called name. Instead, a copy of name is made and it is that copy which is changed. Should you want the changes to apply to the object name you must assign the result of edit to name. (Try fix if you want to make permanent changes to an object.)
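The same copy-then-assign idea can be reproduced without the interactive editor. The sketch below is hypothetical: toy_get() is a made-up stand-in that contains only the caching condition from this thread, and the patch is applied by rewriting the function body with body<- rather than by hand in edit():

```r
# Hypothetical stand-in for get_eurostat() with the same caching condition.
toy_get <- function(filters = "none", cache = TRUE) {
  if (is.null(filters) || identical(filters, "none")) {
    cache <- FALSE
  }
  cache
}

# Like edit(), build a modified *copy*; the original stays untouched.
patched_body <- body(toy_get)
cond <- patched_body[[2]][[2]]      # is.null(filters) || identical(filters, "none")
cond[[3]] <- call("!", cond[[3]])   # wrap the identical() check in !
patched_body[[2]][[2]] <- cond
attributes(patched_body) <- NULL    # drop source references so printing shows the patch

toy_get_patched <- toy_get
body(toy_get_patched) <- patched_body

toy_get()          # original: FALSE (cache disabled for the unfiltered default)
toy_get_patched()  # patched copy: TRUE
```

As with edit(), nothing happens to toy_get itself; the change only exists in the object the result is assigned to.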

Learn something new every day.

pitkant commented 8 months ago

This issue should now be fixed in the new version. Closed with the CRAN release of package version 4.0.0.