opensdmx / rsdmx

Tools for reading SDMX data and metadata in R
https://github.com/opensdmx/rsdmx/wiki

Warning when querying large datasets at Eurostat #50

Closed rbagd closed 9 years ago

rbagd commented 9 years ago

Take this query for data at Eurostat:

It returns a message with code 413 indicating that the query is too large and provides a link to a zip file which contains the desired XML file. In this case, the file is 4.7 MB.

It would be useful to be warned whenever this happens, and possibly to have the link to the zip file printed in the console output. As of now, rsdmx parses the message correctly, but the user is not warned. Only when you get a NULL data frame do you start investigating the problem.

eblondel commented 9 years ago

Thanks @rbagd for this. I'm going to look into this problem ASAP and see how to proceed. As far as possible, I will provide a small enhancement for this. Before that, I need to check whether the message is supported by the SDMX standard or is an ad hoc handler implemented by Eurostat.

From a user's point of view, do you think a warning alone would be enough, or should it be even smarter and extract the data from the zip file and read it? (This could also be an option in readSDMX.)

rbagd commented 9 years ago

Here's some more documentation from Eurostat on this. I haven't encountered this message elsewhere even though some datasets from OECD I've tried are twice as large.

It's true that it would be handy to have it processed automatically in most cases. I'm just a little worried that in those few other cases you could accidentally lock up your machine because of some wildcard abuse. Is that likely? An explicit option is probably the best idea - whether it's enabled by default could be left for the user.

eblondel commented 9 years ago

Sounds good. I've looked further into the SDMX standard: I didn't find this footer supported in SDMX 2.0, but it is in 2.1. In the schemas, it is defined as follows:

Footer is used to communicate information such as error and warnings after the payload of a message.

I'm testing its integration in the package. The first step is to make the footer part of the rsdmx object model (rsdmx intends to provide an R image of the SDMX information model), after which I will add messaging if a footer exists in the response. Later I will investigate the download-and-read option.

rbagd commented 9 years ago

That's awesome. Thanks @eblondel for the great work on this package.

eblondel commented 9 years ago

@rbagd You can now test it! readSDMX now raises an additional warning when the SDMX document contains a footer.

However, I looked at a possible follow-up, downloading and unzipping the file, but since it's not part of the SDMX standard (which does not include any specific element for an alternative link), I would not add this to readSDMX, which should stay generic.
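For anyone who wants to react to that warning programmatically rather than read it in the console, here is a minimal base R sketch. The helper name collect_warnings is hypothetical and not part of rsdmx; it just evaluates an expression, records any warnings raised, and returns them next to the result.

```r
# Hypothetical helper (not part of rsdmx): evaluate an expression,
# capture every warning message it raises, and return both.
collect_warnings <- function(expr) {
  warnings <- character(0)
  value <- withCallingHandlers(
    expr,
    warning = function(w) {
      # Record the message, then resume evaluation without printing it
      warnings <<- c(warnings, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(value = value, warnings = warnings)
}

# Stand-in for a call that warns and still returns a value
res <- collect_warnings({
  warning("footer present")
  42
})
```

With rsdmx installed, the same wrapper could presumably be used as collect_warnings(rsdmx::readSDMX(url)), where url is whatever query you are testing; the exact wording of the warning is not assumed here.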

Let me know when you have tested it!

rbagd commented 9 years ago

It appears to work nicely with Eurostat: warnings do appear as expected with the few queries I tested. I'll keep an eye out in the future for other data providers who have something similar.

The automatic processing for this particular issue at Eurostat can be easily worked around at the user level now that the footer is there, i.e. if length(query@footer@messages) > 0, then download.file -> unzip -> readSDMX, so it's really no big deal if it's not part of the function.
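A rough sketch of that workaround, under a few assumptions: that each footer message object exposes its text in a messages slot, that the text contains the download link as a plain URL, and that readSDMX accepts a local file via file = ... with isURL = FALSE. The helper names extract_url and read_with_fallback are made up for illustration, so check the slot and argument names against your installed rsdmx version.

```r
# Hypothetical helper: return the first http(s) URL found in a character
# vector, or NA if there is none. Trailing punctuation is stripped.
extract_url <- function(texts) {
  texts <- as.character(texts)
  pattern <- "https?://[^[:space:]<>\"]+"
  hits <- regmatches(texts, regexpr(pattern, texts))
  if (length(hits) == 0) return(NA_character_)
  sub("[.,;)]+$", "", hits[1])
}

# Hypothetical wrapper: read the query; if a footer points to a zip file,
# download it, unzip it and read the extracted XML instead.
read_with_fallback <- function(url, workdir = tempdir()) {
  sdmx <- rsdmx::readSDMX(url)
  if (length(sdmx@footer@messages) == 0) return(sdmx)

  # Assumption: each footer message stores its text in a `messages` slot
  texts <- unlist(lapply(sdmx@footer@messages, function(m) m@messages))
  zip_url <- extract_url(texts)
  if (is.na(zip_url)) return(sdmx)

  zip_path <- file.path(workdir, "sdmx_response.zip")
  download.file(zip_url, zip_path, mode = "wb")
  xml_files <- unzip(zip_path, exdir = workdir)

  # Assumption: the archive holds a single SDMX-ML file
  rsdmx::readSDMX(file = xml_files[1], isURL = FALSE)
}
```

The file behind the link may not be available immediately after the first request, so a real script would probably need to wait or retry before downloading.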

I think we can close the issue now. Thanks again for a swift response.

eblondel commented 9 years ago

Thanks for your feedback, much appreciated. Feel free to open new tickets when you feel it appropriate.