rOpenGov / eurostat

R tools for Eurostat data
http://ropengov.github.io/eurostat

missing tables in get_eurostat_toc() #259

Closed aptscbs closed 1 year ago

aptscbs commented 1 year ago

When getting the table of contents for the Eurostat tables with get_eurostat_toc() from the eurostat package, many tables are missing. These tables are on the website, but not in the table of contents.

For example: ilc_li11 https://ec.europa.eu/eurostat/databrowser/view/ILC_LI11__custom_1763573/bookmark/table?lang=en&bookmarkId=6cbe3951-fe2d-4467-b62a-f934f6de9e15

Is there some explanation about this?

pitkant commented 1 year ago

I tried replicating this report by using the following code:

toc <- get_eurostat_toc()
"ilc_li11" %in% toc$code
[1] TRUE

In this example, please note case-sensitivity:

"ILC_LI11" %in% toc$code
[1] FALSE
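
Since the codes in the table of contents are lowercase in this example, a lookup can be made insensitive to case by normalising both sides first. The following is a minimal sketch: `toc_demo` is a hypothetical stand-in for the object returned by get_eurostat_toc(), and `has_code()` is an illustrative helper that is not part of the eurostat package.

```r
# Hypothetical stand-in for the `code` column of get_eurostat_toc();
# the values are illustrative only.
toc_demo <- data.frame(
  code = c("ilc_li11", "ilc_li02"),
  stringsAsFactors = FALSE
)

# Illustrative helper (not part of eurostat): compare codes
# after lowercasing both the query and the column.
has_code <- function(toc, code) {
  tolower(code) %in% tolower(toc$code)
}

upper_hit <- has_code(toc_demo, "ILC_LI11")
lower_hit <- has_code(toc_demo, "ilc_li11")
```

With the real table of contents, the same helper would be called as has_code(toc, "ILC_LI11").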

My downloaded toc object has 11005 observations and 8 variables.

aptscbs commented 1 year ago

My downloaded object has only 3059 records and 8 variables. That is strange and probably the explanation. But why? I am using eurostat 3.7.10.

pitkant commented 1 year ago

I'm not exactly sure; there could have been a change to the URL that the toc is downloaded from. You can check the get_eurostat_toc() function help for the direct URL. In the most recent CRAN release, version 3.8.2, it is https://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&file=table_of_contents_en.txt

In any case, I would recommend updating the package, because eurostat queries an API that changed in January 2023. The old version cannot download data from the JSON web service, so its functionality is more limited.

aptscbs commented 1 year ago

OK, thanks. I installed the newest version, but I still get only 4593 records, so there is some progress ;-) I will ask at my office and try it at home to make sure that it is not a problem here with security or proxy settings.

pitkant commented 1 year ago

Thank you for your report. The function effectively downloads the .txt file and parses it into a tibble, so I'm at a loss as to why it would have only 4593 or 3059 rows. When I open table_of_contents_en.txt on my computer, it has 11006 rows, the top row being the column names. It could be that the 1.7 MB download is interrupted for some reason and the function can read only part of it.
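
One way to test the interrupted-download idea is to save the file locally and compare the number of raw lines with the number of rows a parser returns. The sketch below is only an illustration of that check: `check_toc_file()` is a hypothetical helper, not a eurostat function, and it runs here on a tiny made-up file so that it works offline. Quoting is switched off in read.delim() on the assumption that each line of the file is exactly one record.

```r
# Hypothetical helper: compare raw line count with parsed row count
# for a tab-separated file that has one header line.
check_toc_file <- function(path) {
  raw_lines <- readLines(path, warn = FALSE)
  raw_lines <- raw_lines[nzchar(raw_lines)]  # ignore blank lines
  parsed <- utils::read.delim(
    path,
    quote = "",               # assumption: no quoted multi-line fields
    stringsAsFactors = FALSE
  )
  data_lines <- length(raw_lines) - 1L       # minus the header row
  list(
    data_lines  = data_lines,
    parsed_rows = nrow(parsed),
    complete    = data_lines == nrow(parsed)
  )
}

# Made-up three-line file standing in for table_of_contents_en.txt.
demo_path <- tempfile(fileext = ".txt")
writeLines(
  c("title\tcode\ttype",
    "Table A\ttab_a\ttable",
    "Table B\ttab_b\tdataset"),
  demo_path
)
toc_check <- check_toc_file(demo_path)

# For the real file, download it first, for example with
# download.file(<URL from the function help>, demo_path, mode = "wb"),
# and then call check_toc_file(demo_path).
```

If `data_lines` is already far below the expected count, the download itself is incomplete; if `data_lines` looks right but `parsed_rows` is lower, the parsing step is the more likely culprit.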

aptscbs commented 1 year ago

Maybe that is the problem. If I run the code in a separate R file, I get 4817 records. I will have to ask the IT service here.

aptscbs commented 1 year ago

I read the file with read.delim() and that works! I have all the records now and the program works fine again. Thanks for thinking this through with me!

pitkant commented 1 year ago

Glad to hear it worked. We're actually using readr::read_tsv in the get_eurostat_toc / set_eurostat_toc functions, but it might be worth considering simply using the base R solution.
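
To see whether the two readers disagree on a given machine, the same local file can be parsed with both and the row counts compared. This is a rough sketch and not the package's implementation: `compare_readers()` is a hypothetical helper, the demo file is made up, and the readr branch runs only if readr is installed. It makes no claim about why the counts differed in this thread.

```r
# Hypothetical helper: row counts from base R and, if available, readr.
compare_readers <- function(path) {
  base_rows <- nrow(utils::read.delim(path, stringsAsFactors = FALSE))

  readr_rows <- NA_integer_
  if (requireNamespace("readr", quietly = TRUE)) {
    parsed <- readr::read_tsv(
      path,
      # read every column as character to avoid type-guessing messages
      col_types = readr::cols(.default = readr::col_character())
    )
    readr_rows <- nrow(parsed)
  }

  c(base = base_rows, readr = readr_rows)
}

# Made-up tab-separated file used only to exercise the helper.
demo_tsv <- tempfile(fileext = ".txt")
writeLines(
  c("title\tcode\ttype",
    "Table A\ttab_a\ttable",
    "Table B\ttab_b\tdataset"),
  demo_tsv
)
reader_counts <- compare_readers(demo_tsv)
```

Running the helper on a saved copy of table_of_contents_en.txt would show directly whether read.delim() and readr::read_tsv return different numbers of rows for the same bytes.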