rOpenGov / eurostat

R tools for Eurostat data
http://ropengov.github.io/eurostat

Transitioning from Eurostat Bulk Download to API #243

Closed djhurio closed 10 months ago

djhurio commented 2 years ago

I am sure you are aware of the planned changes to Eurostat data dissemination. For example, the Eurostat Bulk Download will be replaced by an API. This will happen this winter, probably starting from November 2022.

We are heavy users of the "eurostat" package. Thank you for the excellent tool you have developed! Do you plan to make the necessary changes to the "eurostat" package so that it remains operational after these changes in Eurostat data dissemination?

Let me know if you need help with development or testing regarding this issue.

antaldaniel commented 2 years ago

@djhurio @antagomir I think that this is a good opportunity to think about the future of the package in the context of our new project and our planned ecosystem. I want to draw attention to a package in an early development phase, dataset, that I have sent for peer review to rOpenSci. In my opinion, it would be a better basis for all rOpenGov packages that interact with APIs, as it creates a special tibble that contains provenance metadata and other relevant information about the data acquisition.

The aim of the package, in the context of eurostat use, is to properly document the results at every step, from downloading, through analysis, to the eventual publication of the end result on open science repositories or in knowledge graphs (RDF).

As most EU open data services are moving towards RDF and SPARQL, I think that we should also think about moving in this direction. This would enable the user to create, for example, subjective interpretations of Eurostat datasets, and release them (such as our improved regional datasets) in full sync with the Eurostat datasets (i.e. whenever Eurostat releases a new version of the dataset, the changes go through the entire chain up to the published subjective version of the dataset).

I wanted to bring this up anyway next week when we start planning our 3-year project, but this deadline with Eurostat probably makes these new strategic developments more urgent.

I would suggest starting a project that ends in November 2022 with a transition to the new Eurostat API and that at the same time reviews potential breaking changes in the package, for example, moving from tibble to the inherited dataset class, which harmonizes well with connectors to Zenodo or the rdflib bindings.

antagomir commented 2 years ago

Also ping @pitkant

antaldaniel commented 2 years ago

@antagomir @pitkant I think that this would be a good opportunity to think through what we want to do in OpenMuse, and what we must do by November (these changes will thoroughly affect the iotables package, too). We should carefully plan resources so that we can have a smooth transition to OpenMuse while avoiding a disruption of critical packages.

antagomir commented 2 years ago

So @djhurio we hope to have a solution in advance, but currently this depends a bit on the availability of a suitably skilled person to work on the implementation. Ideas and contributions are very welcome. We will report on the progress here.

pitkant commented 2 years ago

From Migrating_to_API_TSV.pdf pages 8-9 I got the impression that the current ("legacy") way of downloading data will not (at least immediately) be removed but will continue to function. Migrating to API should therefore be seen as a new functionality that we can implement at our own pace, instead of rushing it before November.

I can try and get this confirmed from someone in Eurostat.

antagomir commented 2 years ago

Great, please do check if this can be confirmed. We would proceed with the updates as soon as possible but this may take a bit longer than November.

antaldaniel commented 2 years ago

One thing is very funny: the new dataset package, which I would eventually like to use as a dependency for potentially all rOpenGov / eurostat related packages, actually implements some of these changes already. I think Eurostat is just moving closer to the new SDMX/RDF standard reconciliation, and that is exactly what the dataset package is aiming to do.

So in a way we are on track, but the dataset package is not expected to be rolled out by November, it is currently waiting for further reviewers in rOpenSci, and I would like to develop it in OpenMuse.

pitkant commented 2 years ago

I received the following answer from Eurostat user support:

"The bulk download and the API will coexist for a few months.

When the communication announcing the decommissioning will come, you can count around half a year before the decommissioning happens.

Nevertheless, we could only advise you to transition to the new API as soon as possible."

espinielli commented 1 year ago

There is also rsdmx...

antagomir commented 1 year ago

Yes, I think the advantage of eurostat pkg has been that it is more specific to eurostat and serves that particular use case better than the general-purpose rsdmx package. These things can be reconsidered if there is evidence otherwise.

espinielli commented 1 year ago

Yes. To be frank, I gave rsdmx a try but I really struggled (and failed) to understand how to connect to Eurostat (I should probably give the author feedback...)

Still it could be useful to use its infrastructure to present a much more user-friendly interface, like for the {eurostat} package, to the user.

antagomir commented 1 year ago

If we have to rewrite the package with the new API changes, then this is potentially something to look at as one possible solution, at least.

djhurio commented 1 year ago

Dear all, it looks like the old Eurostat API is no longer operating, and this has broken the API download in the eurostat package.
Issue: https://github.com/rOpenGov/eurostat/issues/251
Broken API URL: https://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/prc_hicp_inw?geo=EA&sinceTimePeriod=1996

antagomir commented 1 year ago

Thanks for reporting.

We are actively looking for a solution to this, any support will be appreciated.

pitkant commented 1 year ago

@djhurio wrote:

Dear all, it looks like the old Eurostat API is no longer operating, and this has broken the API download in the eurostat package.
Issue: #251
Broken API URL: https://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/prc_hicp_inw?geo=EA&sinceTimePeriod=1996

As mentioned in #251, bulk download is still working, so please use it. In the meantime I'm working on fixing the get_eurostat_json() function so that it can retrieve data in JSON-stat format through the Eurostat API Statistics web service (https://wikis.ec.europa.eu/display/EUROSTATHELP/API+Statistics+-+data+query).
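For anyone who wants to see what the equivalent request might look like on the new service, here is a minimal sketch in base R that only builds the query URL (no download). The helper name build_eurostat_url() is hypothetical and not part of the eurostat package. The base URL and the format / lang parameters are my reading of the API Statistics documentation linked above, so treat them as assumptions and check them against that page.

```r
# Hypothetical helper, not part of the eurostat package.
# Assumption: the API Statistics base URL and the 'format'/'lang'
# parameters follow the Eurostat documentation linked above.
build_eurostat_url <- function(id, filters = list(), lang = "EN") {
  base <- "https://ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data/"
  # Fixed parameters first, then the user-supplied filters
  params <- c(list(format = "JSON", lang = lang), filters)
  # A filter with several values becomes repeated key=value pairs
  keys <- rep(names(params), lengths(params))
  vals <- vapply(
    as.character(unlist(params, use.names = FALSE)),
    utils::URLencode, character(1),
    reserved = TRUE, USE.NAMES = FALSE
  )
  paste0(base, id, "?", paste(keys, vals, sep = "=", collapse = "&"))
}

# Same dataset and filters as the broken URL quoted above
build_eurostat_url(
  "prc_hicp_inw",
  filters = list(geo = "EA", sinceTimePeriod = 1996)
)
```

The resulting string can be passed to any HTTP client or JSON-stat reader. Until get_eurostat_json() is fixed, this is only meant to illustrate how the old query maps to the new URL scheme.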

pitkant commented 1 year ago

Message from the Eurostat BulkDownloadListing website:

Information message The bulk download will no longer be available as of October 2023. Users are invited to start using the new features that are available as replacement:

For any additional questions, please contact user support.

We will aim to complete the migration from old BulkDownload to the new API and remove references to BulkDownload from code before October.

pitkant commented 1 year ago

The issue, as it was described in the opening message, is now fixed in v4-dev branch of eurostat: https://github.com/rOpenGov/eurostat/tree/v4-dev

Another avenue worth exploring might be to make more use of pure "SDMX-ML", as the above-mentioned rsdmx package does - at least for fetching dataset metadata from Data Structure Definition (DSD) files. I tried the package, and handling big XML files felt a bit slow, so I'm not sure if it's the way to go for most users. Probably the easiest way to achieve this would be to rely on the rsdmx package as a dependency / import and wrap its functions in functions similar to the other, existing functions of the eurostat package.

We're looking into that now but, as mentioned in the first message, the "Transitioning from Eurostat Bulk Download to API" part of this issue has now been fixed and the fix will be available on CRAN in the near future.


@antaldaniel wrote on Sep 13, 2022:

One thing is very funny: the new dataset package, which I would eventually like to use as a dependency for potentially all rOpenGov / eurostat related packages, actually implements some of these changes already. I think Eurostat is just moving closer to the new SDMX/RDF standard reconciliation, and that is exactly what the dataset package is aiming to do.

So in a way we are on track, but the dataset package is not expected to be rolled out by November, it is currently waiting for further reviewers in rOpenSci, and I would like to develop it in OpenMuse.

I am personally not familiar with the relation between SDMX and RDF. The SDMX Roadmap 2021-2025 does mention that

The 2020 SDMX roadmap put a lot of emphasis on the exploration of links between SDMX and other standards (e.g. XBRL, RDF, DDI etc.)

and it would seem that while there certainly is interest in translating SDMX data / metadata to RDF and vice versa in building open data portals and other interoperable infrastructure, it is not something that is a core concern for the SDMX community and institutions that utilize SDMX, such as Eurostat and ECB, at least when they are providing documentation for end users who just need to fetch data regularly.

I think eurostat package users also appreciate the "do one thing and do it well" aspect of the package. I would not lightly add packages like rdflib/redland as a dependency, as they, in turn, depend on the Redland RDF C library, which complicates things in CI and testing.

I think these types of ideas for added features would be best discussed in Discussions.

pitkant commented 10 months ago

Closed with the CRAN release of package version 4.0.0