khaeru / sdmx

SDMX information model and client in Python
https://sdmx1.readthedocs.io
Apache License 2.0

`HTTPError` when using `COMP` client #162

Closed: leostimpfle closed this issue 4 months ago

leostimpfle commented 5 months ago

Using sdmx v2.13, I receive an `HTTPError` when trying to get dataflows with the COMP client. It is, however, possible to get the data directly.

Looking at the testing summary, I expected the dataflow to work for the COMP client.

Code

```python
import sdmx

source = 'COMP'
flow = 'AID_SCB_OBJ'
client = sdmx.Client(source)

response = client.data(flow)  # this works
dsd = client.dataflow(flow)  # this throws HTTPError
```

Output

```
Traceback (most recent call last):

  Cell In[11], line 8
    dsd = client.dataflow(flow)  # this throws HTTPError

  File ~/Documents/code/venvs/etl-venv/lib/python3.12/site-packages/sdmx/client.py:456 in get
    response.raise_for_status()

  File ~/Documents/code/venvs/etl-venv/lib/python3.12/site-packages/requests/models.py:1021 in raise_for_status
    raise HTTPError(http_error_msg, response=self)

HTTPError: 400 Client Error: Bad Request for url: https://webgate.ec.europa.eu/comp/redisstat/api/dissemination/sdmx/2.1/dataflow/COMP/AID_SCB_OBJ/latest?references=all
```

khaeru commented 5 months ago

Hi! Thanks for the clear report.

I opened the final URL from your message in a browser and saw this:

```xml
<S:Fault xmlns:S="http://schemas.xmlsoap.org/soap/envelope/">
  <faultcode>140</faultcode>
  <faultstring>ERR_GEN_FLOW_REFERENCES: The reference of return detail attribute must be set to one of None, Children, Descendants</faultstring>
</S:Fault>
```

Setting aside the unfamiliar XML tags, this appears to be a complaint about the `?references=all` part of the URL: this particular web service appears to want a different value. For instance:

```python
import sdmx

client = sdmx.Client("COMP")
sm = client.dataflow("AID_SCB_OBJ", params=dict(references="children"))
sm
```

shows:

```
<sdmx.StructureMessage>
  <Header>
    id: 'DF1706515980'
    prepared: '2024-01-29T08:13:00.871000+00:00'
    sender: <Agency COMP>
    source: 
    test: False
  response: <Response [200]>
  Codelist (4): CL_FREQ CL_GEO CL_UNIT CL_OBJ_SCB
  ConceptScheme (1): CS_COMP
  DataflowDefinition (1): AID_SCB_OBJ
  DataStructureDefinition (1): DSD_SCB_OBJ
```

Based on the error message, this source accepts only "none", "children", or "descendants", not "all". This is a quirk of this particular source: the value "all" is part of the SDMX REST standard, and most other sources do support it.
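As a sketch of the constraint just described, the following checks a requested `references` value against what a source is believed to accept. The COMP set comes only from the fault message above; the six keywords are those defined for `references` in the SDMX 2.1 REST specification. The names `check_references` and `ACCEPTED_REFERENCES` are hypothetical and not part of sdmx1.

```python
# SDMX 2.1 REST keywords for the `references` parameter. Specific structure
# types (e.g. "codelist") are also valid in the standard but omitted here.
SDMX21_REFERENCES = {
    "none", "parents", "parentsandsiblings", "children", "descendants", "all",
}

# Hypothetical per-source table; a source not listed has no known restriction.
ACCEPTED_REFERENCES = {
    "COMP": {"none", "children", "descendants"},  # from the COMP fault message
}


def check_references(source_id: str, references: str) -> str:
    """Return the (lower-cased) value if `source_id` is believed to accept it.

    Raises ValueError with an explanatory message otherwise.
    """
    value = references.lower()
    if value not in SDMX21_REFERENCES:
        raise ValueError(f"{references!r} is not a standard SDMX 2.1 references keyword")
    accepted = ACCEPTED_REFERENCES.get(source_id)
    if accepted is not None and value not in accepted:
        raise ValueError(
            f"Source {source_id} does not support references={value!r}; "
            f"use one of {sorted(accepted)}"
        )
    return value
```

With this table, `check_references("COMP", "all")` raises a `ValueError` that names the accepted values, while any other source passes "all" through unchanged.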

Why does this occur with sdmx1? sdmx1 carries forward an old (old…) default that sets `?references=all`. But the code also recognizes that some sources do not support this value, and provides workarounds for them. For instance, see here: https://github.com/khaeru/sdmx/blob/7a2de97237860eac894399fd8c4cba9e09180234/sdmx/source/estat.py#L52-L66
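The general idea of such a workaround can be illustrated without touching sdmx1's internals: rewrite the query string of the outgoing URL so that `references=all` becomes a value the source accepts. This is a minimal, standard-library-only sketch; the function name `downgrade_references` is hypothetical, choosing "descendants" as the replacement is an assumption (it is the broadest of the three values COMP reports accepting), and it does not claim to reproduce what the linked ESTAT code does.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def downgrade_references(url: str, unsupported: str = "all",
                         replacement: str = "descendants") -> str:
    """Return `url` with references=<unsupported> replaced by `replacement`.

    Other query parameters are kept, in their original order.
    """
    parts = urlsplit(url)
    query = [
        (key, replacement if key == "references" and value == unsupported else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Applied to the URL from the traceback, this yields the same path ending in `latest?references=descendants`.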

So we could fix this in one or more of several ways:

  1. Leave a message in the docs or just this issue to warn people that this source only supports a subset of
  2. Add a similar Source subclass for COMP that makes these changes automatically and warns the user. In this case I might actually generalize the ESTAT code, because when two or more services use the same SDMX server software, they seem to end up with the same limitations or idiosyncrasies. So it's better not to duplicate code.
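The "generalize rather than duplicate" idea in option 2 might look roughly like this: one shared class holds the parameter fix-up and the warning, and each affected source only declares its ID. All class and attribute names are hypothetical stand-ins rather than sdmx1's `Source` API, and which services actually share the limitation (ESTAT is included here only as an example) is an assumption to confirm per source.

```python
import warnings


class ReferencesQuirk:
    """Hypothetical shared behaviour for services that reject references=all."""

    source_id = "?"
    # Assumption: "descendants" is the closest supported substitute for "all".
    fallback = "descendants"

    def adjust_params(self, params: dict) -> dict:
        """Return a copy of `params` with references=all replaced, warning the user."""
        adjusted = dict(params)
        # A missing value is treated as "all", mirroring the default described above.
        if adjusted.get("references", "all") == "all":
            warnings.warn(
                f"{self.source_id} does not support references=all; "
                f"using references={self.fallback!r} instead",
                stacklevel=2,
            )
            adjusted["references"] = self.fallback
        return adjusted


class CompQuirks(ReferencesQuirk):
    source_id = "COMP"


class EstatQuirks(ReferencesQuirk):
    # Assumption: same server software, hence the same limitation.
    source_id = "ESTAT"
```

Adding another affected service then costs two lines instead of a copy of the fix-up logic.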

In either case, it would be a big help if you could:

leostimpfle commented 5 months ago

Hi Paul,

Thanks for the quick response.

I think that simply providing a more helpful error message would be very valuable, as explicitly providing the references parameter resolves the HTTPError.

Please let me know your thoughts and if there's anything I can help with.
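One way to produce a more helpful error, sketched with the standard library only: parse the SOAP fault body the service returns (as in the response quoted earlier in this thread) and surface its `faultstring` instead of a bare "400 Bad Request". The function name `fault_message` is hypothetical, and the sketch assumes the layout seen above, with `S:Fault` as the root element and unqualified `faultcode`/`faultstring` children; a full SOAP envelope would need an extra lookup.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace-qualified tag of the SOAP 1.1 Fault element seen in the response.
SOAP_FAULT = "{http://schemas.xmlsoap.org/soap/envelope/}Fault"


def fault_message(body: str) -> Optional[str]:
    """Return "faultcode: faultstring" from a SOAP fault body, or None otherwise."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return None  # not XML at all
    if root.tag != SOAP_FAULT:
        return None  # XML, but not a bare SOAP Fault element
    # faultcode/faultstring carry no namespace in the response above.
    code = root.findtext("faultcode", default="?")
    text = root.findtext("faultstring", default="")
    return f"{code}: {text}"
```

In use, it would be given the body of the failed response (for a `requests.HTTPError`, `exc.response.text`), and any non-None result appended to the error shown to the user.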

khaeru commented 5 months ago

Great, thanks for the confirmation and for looking into the provider's docs.

Indeed, since it's probably the same team at the agency running these several sources, they are likely running the same software for each, with the same limitations. Maybe if they weren't duplicating work they'd have time for better documentation 😅

I will at some point open a PR to make the changes mentioned above. In the meantime, this issue will stay open as documentation. Contributions from you or anyone else who wants a quicker fix are also welcome.