@djhurio @antagomir I think that this is a good opportunity to think about the future of the package in the context of our new project and our planned ecosystem. I want to draw attention to an early-development-phase package, dataset, that I have submitted to rOpenSci for peer review. In my opinion, it would be a better basis for all rOpenGov packages that interact with APIs, as it creates a special tibble that contains provenance metadata and other relevant information about the data acquisition.
The aim of the package, in the context of eurostat use, is to properly document the results all the way from download, through analysis, to the eventual publication of the end results in open science repositories or in knowledge graphs (RDF).
As most EU open data services are moving towards RDF and SPARQL, I think that we should also consider moving in this direction. This would enable users to create, for example, subjective interpretations of Eurostat datasets and release them (such as our improved regional datasets) in full sync with the Eurostat datasets (i.e. whenever Eurostat releases a new version of a dataset, the changes propagate through the entire chain up to the published subjective version of the dataset).
I wanted to bring this up next week anyway, when we start planning our 3-year project, but this Eurostat deadline probably makes these new strategic developments more urgent.
I would suggest starting a project that ends in November 2022 with a transition to the new Eurostat API and, at the same time, reviews potential breaking changes in the package, for example moving from tibble to the inherited dataset class, which harmonizes well with connectors to Zenodo or the rdflib bindings.
Also ping @pitkant
@antagomir @pitkant I think that this would be a good opportunity to think through what we want to do in OpenMuse and what we must do by November (these changes will also thoroughly affect the iotables package). We should carefully plan resources so that we can have a smooth transition to OpenMuse while avoiding disruption to critical packages.
So, @djhurio, we hope to have a solution in advance, but currently this depends a bit on the availability of a suitably skilled person to work on the implementation. Ideas and contributions are very welcome. We will post updates on progress here.
From Migrating_to_API_TSV.pdf pages 8-9 I got the impression that the current ("legacy") way of downloading data will not (at least immediately) be removed but will continue to function. Migrating to API should therefore be seen as a new functionality that we can implement at our own pace, instead of rushing it before November.
I can try and get this confirmed from someone in Eurostat.
Great, please do check if this can be confirmed. We would proceed with the updates as soon as possible but this may take a bit longer than November.
Interestingly, the new dataset package, which I would eventually like to use as a dependency for potentially all rOpenGov / eurostat related packages, already implements some of these changes. I think Eurostat is simply moving closer to the new SDMX/RDF standard reconciliation, and that is exactly what the dataset package aims to do.
So in a way we are on track, but the dataset package is not expected to be rolled out by November, it is currently waiting for further reviewers in rOpenSci, and I would like to develop it in OpenMuse.
I received the following answer from Eurostat user support:
"The bulk download and the API will coexist for a few months.
When the communication announcing the decommissioning will come, you can count around half a year before the decommissioning happens.
Nevertheless, we could only advise you to transition to the new API as soon as possible."
There is also rsdmx...
Yes, I think the advantage of eurostat pkg has been that it is more specific to eurostat and serves that particular use case better than the general-purpose rsdmx package. These things can be reconsidered if there is evidence otherwise.
Yes. To be frank, I gave rsdmx a try but I really struggled (and failed) to understand how to connect to Eurostat (I should probably give the author some feedback...).
Still, it could be useful to build on its infrastructure to present a much more user-friendly interface to the user, like the one the {eurostat} package offers.
If we have to rewrite the package with the new API changes, then this is potentially something to look at as one possible solution, at least.
Dear all, it looks like the old Eurostat API is no longer operating, and this has broken API downloads in the eurostat package.
Issue: https://github.com/rOpenGov/eurostat/issues/251
Broken API URL: https://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/prc_hicp_inw?geo=EA&sinceTimePeriod=1996
Thanks for reporting.
We are actively looking for a solution to this, any support will be appreciated.
As mentioned in #251, bulk download is still working, so please use it. In the meantime, I'm working on fixing the get_eurostat_json() function, which retrieves data in JSON-stat format through the Eurostat API Statistics web service (https://wikis.ec.europa.eu/display/EUROSTATHELP/API+Statistics+-+data+query).
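The get_eurostat_json() migration can be illustrated by rewriting the broken legacy URL from #251 into an API Statistics query. This is a minimal Python sketch, not the package's R implementation: the helper name migrate_json_url is hypothetical, and the new base URL and the format/lang query parameters reflect our reading of the Eurostat API Statistics documentation linked above, so they should be checked against the current docs before relying on them.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Assumed base URL of the Eurostat "API Statistics" (JSON-stat) service,
# based on our reading of Eurostat's migration guide; verify before use.
NEW_BASE = "https://ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data/"


def migrate_json_url(old_url: str) -> str:
    """Rewrite a legacy wdds/rest JSON URL into an API Statistics URL.

    Hypothetical helper. Assumes the legacy path ends in
    /json/<lang>/<dataset_code> and that filter parameters such as
    geo=... and sinceTimePeriod=... carry over unchanged.
    """
    parts = urlsplit(old_url)
    segments = [s for s in parts.path.split("/") if s]
    # Legacy layout: .../wdds/rest/data/v2.1/json/<lang>/<dataset_code>
    i = segments.index("json")
    lang, dataset = segments[i + 1], segments[i + 2]
    filters = parse_qsl(parts.query)
    # The new service takes output format and language as query parameters
    # (assumption: upper-case language code, e.g. lang=EN).
    query = urlencode([("format", "JSON"), ("lang", lang.upper())] + filters)
    return f"{NEW_BASE}{dataset}?{query}"


if __name__ == "__main__":
    old = ("https://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/"
           "prc_hicp_inw?geo=EA&sinceTimePeriod=1996")
    print(migrate_json_url(old))
```

For the URL reported in #251, this prints https://ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data/prc_hicp_inw?format=JSON&lang=EN&geo=EA&sinceTimePeriod=1996. The dataset code and the filters stay the same; only the host path and the way format and language are passed change.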
Message from the Eurostat BulkDownloadListing website:
Information message: The bulk download will no longer be available as of October 2023. Users are invited to start using the new features that are available as replacement:
- Manual downloads using the data browser 'Download operations' section
- Automatic downloads using the new API. For details, please consult the 'developer's corner' which is part of the data browser online help section. The chapter 'Transition - from Eurostat bulk download to API' contains important information if you need to perform a migration.
For any additional questions, please contact user support.
We will aim to complete the migration from old BulkDownload to the new API and remove references to BulkDownload from code before October.
The issue, as described in the opening message, is now fixed in the v4-dev branch of eurostat: https://github.com/rOpenGov/eurostat/tree/v4-dev
- get_eurostat() and get_eurostat_raw() now use the SDMX 2.1 API to download data as TSV files, as per the Eurostat instructions: Transition from Eurostat Bulk Download to API
- get_eurostat_json() now uses "API Statistics" and has been migrated as per the Eurostat instructions: Migrating from JSON web service to API Statistics

Another avenue worth exploring might be to make more use of pure "SDMX-ML", as the above-mentioned rsdmx package does, at least for fetching dataset metadata from Data Structure Definition (DSD) files. I tried the package, and handling big XML files felt a bit slow, so I'm not sure it's the way to go for most users. Probably the easiest way to achieve this would be to rely on the rsdmx package as a dependency / import and wrap its functions in functions similar to the existing ones.
We're looking into that now, but, as mentioned in the first message, the "Transitioning from Eurostat Bulk Download to API" part of this issue has now been fixed, and it will be available on CRAN in the near future.
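For readers curious about what the SDMX 2.1 TSV downloads look like to a parser, here is a rough Python sketch (again, not the package's R code). The endpoint, the format=TSV parameter, and the file layout described in the docstring are assumptions based on our reading of Eurostat's "Transition from Eurostat Bulk Download to API" guide; tsv_url and parse_eurostat_tsv are hypothetical names.

```python
import csv
import io

# Assumed SDMX 2.1 dissemination endpoint (per our reading of Eurostat's
# "Transition from Eurostat Bulk Download to API" guide); verify before use.
SDMX_BASE = "https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/"


def tsv_url(dataset: str) -> str:
    """Hypothetical helper: URL for a whole dataset as uncompressed TSV."""
    return f"{SDMX_BASE}{dataset}?format=TSV"


def parse_eurostat_tsv(text: str) -> list[dict]:
    """Turn Eurostat's wide TSV layout into long-format records.

    Assumed layout: the first header cell lists the dimension codes
    separated by commas and ends with "\\TIME_PERIOD" (for example
    "freq,geo\\TIME_PERIOD"); the other header cells are time periods;
    each data row starts with the comma-joined dimension codes; each cell
    holds a value, optionally followed by a space and status flags, or ":"
    when the observation is missing.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header = [cell.strip() for cell in rows[0]]
    # Drop the "\TIME_PERIOD" suffix to recover the dimension names.
    dims = header[0].split("\\")[0].split(",")
    periods = header[1:]
    records = []
    for row in rows[1:]:
        if not row:
            continue  # skip blank lines
        codes = row[0].strip().split(",")
        for period, cell in zip(periods, row[1:]):
            # "2.3 p" -> value "2.3", flags "p"; ": c" -> value ":", flags "c"
            value, _, flags = cell.strip().partition(" ")
            record = dict(zip(dims, codes))
            record["time"] = period
            # ":" marks a missing observation; keep it as None.
            record["value"] = None if value.startswith(":") else float(value)
            record["flags"] = flags.strip()
            records.append(record)
    return records
```

Reshaping to long format keeps one observation per record, so status flags stay next to the value they qualify, and a missing ":" cell becomes None instead of breaking numeric conversion.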
@antaldaniel wrote on Sep 13, 2022:
One thing is very funny, the new dataset package that I would like to use eventually as a dependency for potentially all rOpenGov / eurostat related packages, actually implements some of these changes already. I think Eurostat just moves closer to the new SDMX/RDF standard reconciliation and that is what exactly the dataset package is aiming to do. So in a way we are on track, but the dataset package is not expected to be rolled out by November, it is currently waiting for further reviewers in rOpenSci, and I would like to develop it in OpenMuse.
I am personally not familiar with the relation between SDMX and RDF. The SDMX Roadmap 2021-2025 does mention that
The 2020 SDMX roadmap put a lot of emphasis on the exploration of links between SDMX and other standards (e.g. XBRL, RDF, DDI etc.)
and it would seem that while there certainly is interest in translating SDMX data / metadata to RDF and vice versa in building open data portals and other interoperable infrastructure, it is not something that is a core concern for the SDMX community and institutions that utilize SDMX, such as Eurostat and ECB, at least when they are providing documentation for end users who just need to fetch data regularly.
I think eurostat package users also appreciate the "do one thing and do it well" aspect of the package. I would not lightly add packages like rdflib/redland as a dependency as they, in turn, depend on Redland RDF C library that complicates things in CI and testing.
I think these types of ideas for added features would be best discussed in Discussions.
Closed with the CRAN release of package version 4.0.0
I am sure you are aware of the planned changes for the Eurostat data dissemination. For example the Eurostat Bulk Download will be changed to API. It will happen this winter, probably starting from November 2022.
We are heavy users of the "eurostat" package. Thank you for the excellent tool you have developed! Have you planned to make the necessary changes for the "eurostat" package so it will be operational also after the mentioned changes in Eurostat data dissemination?
Let me know if you need help with development or testing regarding this issue.