khaeru / sdmx

SDMX information model and client in Python
https://sdmx1.readthedocs.io
Apache License 2.0

verify=False etc. are not passed to pre-query for data key validation #77

Closed albertame closed 3 years ago

albertame commented 3 years ago

import sdmx

ecb = sdmx.Client('ecb')
ecb.data(
    resource_id='YC',
    key={
        'FREQ': 'B',
        'REF_AREA': 'U2',
        'CURRENCY': 'EUR',
        'PROVIDER_FM': '4F',
        'INSTRUMENT_FM': 'G_N_A',
        'PROVIDER_FM_ID': 'SV_C_YM',
        'DATA_TYPE_FM': ['BETA0', 'BETA1', 'BETA2', 'BETA3', 'TAU1', 'TAU2'],
    },
    params={'startPeriod': '2007-01-01'},
)

SSLError: HTTPSConnectionPool(host='sdw-wsrest.ecb.europa.eu', port=443): Max retries exceeded with url: /service/dataflow/ECB/YC/latest?references=all (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1129)')))

Would it be possible to make very=false as default, or adding an additional parameter for that? It worked fine until last week.

The problem seems to be specific for the ECB (SDW) data portal. I don't have the same problem for EUROSTAT for example.

Many thanks in advance. Kind regards, Alberto

khaeru commented 3 years ago

Would it be possible to make very=false as default, or adding an additional parameter for that?

Do you mean verify=False? This is already explicitly collected in Client.get() and passed on: https://github.com/khaeru/sdmx/blob/589928c4238cb911796b3051b7e2262a5c0603db/sdmx/client.py#L402-L405

Have you tried it? What was the output?

P.S. I edited the issue description to add triple backticks (```) around your code snippet. This makes reading much easier. See the Markdown help linked from the little icon in the bottom-right of the text box: https://guides.github.com/features/mastering-markdown/

albertame commented 3 years ago

yes, I meant verify=False. I have never used the get() function, the parameters to be passed are different than data(). But looking at the URL passed, I think for ECB dataflow should be replaced by data.

Thank you also for the advice on the triple backticks.

khaeru commented 3 years ago

I have never used the get() function, the parameters to be passed are different than data().

data(…) is nothing more than a shortcut for get(resource_type="data", …). Extra keyword arguments like verify should be handled the same way by each.

But looking at the URL passed, I think for ECB dataflow should be replaced by data.

No, this is as expected. See the documentation for get() around:

…the key argument is validated against the relevant DataStructureDefinition, either given with the dsd keyword argument, or retrieved from the web service before the main query.

The failure you're seeing occurs during this step, before the “main query”.

albertame commented 3 years ago

ecb.get(
    resource_type='data',
    resource_id='YC',
    dsd='ECB_FMD2',
    key={
        'FREQ': 'B',
        'REF_AREA': 'U2',
        'CURRENCY': 'EUR',
        'PROVIDER_FM': '4F',
        'INSTRUMENT_FM': 'G_N_A',
        'PROVIDER_FM_ID': 'SV_C_YM',
        'DATA_TYPE_FM': 'BETA0+BETA1+BETA2+BETA3+TAU1+TAU2',
    },
    params={'startPeriod': '2007-01-01'},
    verify=False,
)

Traceback (most recent call last):

  File "<ipython-input-86-c823d8e1c2d4>", line 1, in <module>
    ecb.get(resource_type = 'data',

  File "C:\Users\al005366\AppData\Roaming\Python\Python39\site-packages\sdmx\client.py", line 411, in get
    req = self._request_from_args(kwargs)

  File "C:\Users\al005366\AppData\Roaming\Python\Python39\site-packages\sdmx\client.py", line 233, in _request_from_args
    key, dsd = self._make_key(resource_type, resource_id, key, dsd)

  File "C:\Users\al005366\AppData\Roaming\Python\Python39\site-packages\sdmx\client.py", line 157, in _make_key
    cc = dsd.make_constraint(key)

AttributeError: 'str' object has no attribute 'make_constraint'

ecb.get(
    resource_type='data',
    resource_id='YC',
    key={
        'FREQ': 'B',
        'REF_AREA': 'U2',
        'CURRENCY': 'EUR',
        'PROVIDER_FM': '4F',
        'INSTRUMENT_FM': 'G_N_A',
        'PROVIDER_FM_ID': 'SV_C_YM',
        'DATA_TYPE_FM': 'BETA0+BETA1+BETA2+BETA3+TAU1+TAU2',
    },
    params={'startPeriod': '2007-01-01'},
    verify=False,
)

Traceback (most recent call last):

  File "<ipython-input-87-0be03fbe7e40>", line 1, in <module>
    ecb.get(resource_type = 'data',

  File "C:\Users\al005366\AppData\Roaming\Python\Python39\site-packages\sdmx\client.py", line 411, in get
    req = self._request_from_args(kwargs)

  File "C:\Users\al005366\AppData\Roaming\Python\Python39\site-packages\sdmx\client.py", line 233, in _request_from_args
    key, dsd = self._make_key(resource_type, resource_id, key, dsd)

  File "C:\Users\al005366\AppData\Roaming\Python\Python39\site-packages\sdmx\client.py", line 141, in _make_key
    self.dataflow(

  File "C:\Users\al005366\AppData\Roaming\Python\Python39\site-packages\sdmx\client.py", line 446, in get
    raise e from None

  File "C:\Users\al005366\AppData\Roaming\Python\Python39\site-packages\sdmx\client.py", line 443, in get
    response = self.session.send(req, **send_kwargs)

  File "c:\program files\python39\lib\site-packages\requests\sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)

  File "c:\program files\python39\lib\site-packages\requests\adapters.py", line 514, in send
    raise SSLError(e, request=request)

SSLError: HTTPSConnectionPool(host='sdw-wsrest.ecb.europa.eu', port=443): Max retries exceeded with url: /service/dataflow/ECB/YC/latest?references=all (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1129)')))

OK. I have tried many different combinations, but I still cannot get the data from the ECB. Could you please provide a working example?

khaeru commented 3 years ago

Thanks for providing these examples. I think we're getting closer to diagnosing the issue.

Can you say:

albertame commented 3 years ago
Name: sdmx1
Version: 2.4.1
Summary: Statistical Data and Metadata eXchange (SDMX)
Home-page: https://github.com/khaeru/sdmx
Author: SDMX Python developers
Author-email: mail@paul.kishimoto.name
License: UNKNOWN
Location: c:\users\al005366\appdata\roaming\python\python39\site-packages
Requires: lxml, pandas, pydantic, setuptools, python-dateutil, requests
Required-by:

khaeru commented 3 years ago
* it downloads a file called **latest**

"It" meaning a browser, or curl?

If a browser works, but the Python code (run at the same moment—did you try to re-run it just now?)¹ still does not, then this indicates that the way your browser connects to the server (sdw-wsrest.ecb.europa.eu) is somehow different than the way Python does.

That could have many causes, e.g. using a proxy which is properly configured in your browser, but you are not giving the same proxy settings to Python requests via sdmx1. Or (possibly) the "self-signed certificate" per the error message was directly installed in your browser and is recognized, whereas it's not recognized by requests.

Again, these would not be due to the code in this package, so I can only provide very limited help. I'd suggest maybe you look at the snippet at the top of the requests docs: https://docs.python-requests.org/en/master/ and try to run similar code, like:

import requests
r = requests.get("https://sdw-wsrest.ecb.europa.eu/service/dataflow/ECB/YC/latest?references=all")

If this fails, then the problem is not in sdmx1.


¹ To emphasize: because a web server's status may change from moment to moment, then when we're comparing its response to 2+ different requests (browser vs. Python code) we have to run them at the same time. "Request A worked yesterday, Request B works today" is not enough to diagnose.
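The comparison described in the footnote can be made back to back in one script. The following is only a sketch: `compare_verify_settings` and its `fetch` argument are hypothetical names introduced here, not part of requests or sdmx1. The helper calls whatever function it is given once with `verify=True` and once with `verify=False`, and records which of the two attempts failed.

```python
from typing import Callable, Dict


def compare_verify_settings(fetch: Callable[[bool], object]) -> Dict[bool, str]:
    """Call fetch(verify) once with verify=True and once with verify=False.

    Both calls happen back to back, so the outcomes reflect (nearly) the
    same server state. `fetch` is a hypothetical callable supplied by the user.
    """
    outcomes = {}
    for verify in (True, False):
        try:
            fetch(verify)
            outcomes[verify] = "ok"
        except Exception as exc:  # e.g. requests.exceptions.SSLError
            outcomes[verify] = f"failed: {type(exc).__name__}"
    return outcomes


# Example with the requests package (not executed here; needs network access):
#
#   import requests
#   URL = "https://sdw-wsrest.ecb.europa.eu/service/dataflow/ECB/YC/latest?references=all"
#   print(compare_verify_settings(lambda v: requests.get(URL, verify=v, timeout=30)))
```

If only the `verify=True` attempt fails, the difference lies in certificate verification rather than in the server's availability at that moment.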

albertame commented 3 years ago

Thanks a lot for the help!

"It" means a browser.

Just to conclude, this works:

requests.get("https://sdw-wsrest.ecb.europa.eu/service/dataflow/ECB/YC/latest?references=all", verify = False)

This does not work:

requests.get("https://sdw-wsrest.ecb.europa.eu/service/dataflow/ECB/YC/latest?references=all")

khaeru commented 3 years ago

Thanks for this confirmation. I've done some testing (details below the line), and the failure appears to be here:

…the key argument is validated against the relevant DataStructureDefinition, either given with the dsd keyword argument, or retrieved from the web service before the main query.

The failure you're seeing occurs during this step, before the “main query”.

Specifically verify=False is not being applied to this preliminary query, so that fails—even though the main query does get the right setting ("Case 0" below). So we've found a bug! Thanks for the help in diagnosing it. I'll fix when possible.
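To make the mechanism concrete, here is a toy model of this kind of bug. It is a sketch only: `ToyClient`, `forward_kwargs` and `_send` are hypothetical names invented for illustration, and this is not the code in sdmx/client.py nor necessarily how the fix is implemented. The point is just that a nested, preliminary request falls back to the default `verify=True` unless the extra keyword arguments are passed along.

```python
class ToyClient:
    """Hypothetical stand-in for a client; NOT the sdmx1 implementation."""

    def __init__(self, forward_kwargs):
        self.forward_kwargs = forward_kwargs
        self.sent = []  # records (url, send_kwargs) for each simulated request

    def _send(self, url, verify=True):
        # Stand-in for sending a request; verification is on unless overridden
        self.sent.append((url, {"verify": verify}))

    def get(self, resource_type, resource_id, key=None, **kwargs):
        if resource_type == "data" and isinstance(key, dict):
            # Preliminary structure query, needed to validate a dict key
            pre_kwargs = kwargs if self.forward_kwargs else {}
            self.get("dataflow", resource_id, **pre_kwargs)
        self._send(f"{resource_type}/{resource_id}", **kwargs)

    def data(self, resource_id, **kwargs):
        # Shortcut in the spirit of data(...) == get(resource_type="data", ...)
        return self.get("data", resource_id, **kwargs)


buggy = ToyClient(forward_kwargs=False)
buggy.data("YC", key={"FREQ": "B"}, verify=False)
print(buggy.sent)
# [('dataflow/YC', {'verify': True}), ('data/YC', {'verify': False})]

fixed = ToyClient(forward_kwargs=True)
fixed.data("YC", key={"FREQ": "B"}, verify=False)
print(fixed.sent)
# [('dataflow/YC', {'verify': False}), ('data/YC', {'verify': False})]
```

The "buggy" variant shows the same pattern as the two `send_kwargs` lines printed for "Case 0" in the output below: `verify` is True for the first request and False for the second.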

In the meantime, one way to work around this ("Case 2" below) is to give the key as a string and pass validate=False, so that no preliminary query is made.

This also has the advantage of being faster, as the data structure query is big & slow.
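For "Case 2" the key has to be a string already. Here is a minimal sketch of building one from a dict, assuming the dict lists the dimensions in the same order as the data structure definition (nothing checks this once validation is off). `make_str_key` is a hypothetical helper, not an sdmx1 function. It joins dimensions with "." and several values for one dimension with "+", as in the test script below.

```python
def make_str_key(dict_key):
    """Join dimension values with '.', and multiple values of one dimension with '+'.

    Assumes the dict's insertion order matches the dimension order of the
    data structure definition; this is not verified here.
    """
    parts = []
    for value in dict_key.values():
        if isinstance(value, str):
            parts.append(value)
        else:
            parts.append("+".join(value))
    return ".".join(parts)


dict_key = {
    "FREQ": "B",
    "REF_AREA": "U2",
    "CURRENCY": "EUR",
    "PROVIDER_FM": "4F",
    "INSTRUMENT_FM": "G_N_A",
    "PROVIDER_FM_ID": "SV_C_YM",
    "DATA_TYPE_FM": ["BETA0", "BETA1", "BETA2", "BETA3", "TAU1", "TAU2"],
}

str_key = make_str_key(dict_key)
print(str_key)
# B.U2.EUR.4F.G_N_A.SV_C_YM.BETA0+BETA1+BETA2+BETA3+TAU1+TAU2

# The string can then be used as in "Case 2" (not executed here; needs network access):
#   ECB.data("YC", key=str_key, params={"startPeriod": "2007-01-01"},
#            verify=False, validate=False)
```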

You could also use 2 separate queries:

# Retrieve the DataflowDefinition and all related structures
structure_msg = ECB.dataflow("YC", verify=False)

# Get the associated DataStructureDefinition
dsd = structure_msg.dataflow["YC"].structure

# Use the already-retrieved DSD to convert `dict_key` to a string;
# no preliminary query is performed; avoids the bug
data_msg = ECB.data("YC", key=dict_key, dsd=dsd, verify=False)

To confirm: insert print("send_kwargs:", send_kwargs) before this line: https://github.com/khaeru/sdmx/blob/589928c4238cb911796b3051b7e2262a5c0603db/sdmx/client.py#L443

Then run:

import sdmx

ECB = sdmx.Client("ECB")

# Dimensions are in order
dict_key = {
    "FREQ": "B",
    "REF_AREA": "U2",
    "CURRENCY": "EUR",
    "PROVIDER_FM": "4F",
    "INSTRUMENT_FM": "G_N_A",
    "PROVIDER_FM_ID": "SV_C_YM",
    "DATA_TYPE_FM": "BETA0+BETA1+BETA2+BETA3+TAU1+TAU2",
}

str_key = ".".join(dict_key.values())

for case, (verify, validate, key) in enumerate((
    (False, True, dict_key),
    (True, True, dict_key),
    (False, False, str_key),
    (False, False, dict_key),
)):
    print("Case", case)
    print("verify:", verify)
    print("validate:", validate)
    print("key type:", type(key))

    ECB.data(
        "YC",
        key=key,
        params={'startPeriod': '2007-01-01'},
        verify=verify,
        validate=validate,
    )

Output:

Case 0
verify: False
validate: True
key type: <class 'dict'>
send_kwargs: {'verify': True, 'proxies': OrderedDict(), 'stream': False, 'cert': None}
send_kwargs: {'verify': False, 'proxies': OrderedDict(), 'stream': False, 'cert': None}
Case 1
verify: True
validate: True
key type: <class 'dict'>
send_kwargs: {'verify': True, 'proxies': OrderedDict(), 'stream': False, 'cert': None}
Case 2
verify: False
validate: False
key type: <class 'str'>
send_kwargs: {'verify': False, 'proxies': OrderedDict(), 'stream': False, 'cert': None}
Case 3
verify: False
validate: False
key type: <class 'dict'>
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-eb72a280b29d> in <module>
     28     print("key type:", type(key))
     29
---> 30     ECB.data(
     31         "YC",
     32         key=key,

~/vc/sdmx/sdmx/client.py in get(self, resource_type, resource_id, tofile, use_cache, dry_run, **kwargs)
    409             req = self._request_from_url(kwargs)
    410         else:
--> 411             req = self._request_from_args(kwargs)
    412
    413         req = self.session.prepare_request(req)

~/vc/sdmx/sdmx/client.py in _request_from_args(self, kwargs)
    237
    238         # Assemble final URL
--> 239         url = "/".join(filter(None, url_parts))
    240
    241         # Parameters: set 'references' to sensible defaults

TypeError: sequence item 3: expected str instance, dict found

albertame commented 3 years ago

Thanks for the help!

khaeru commented 3 years ago

Closed in #80 and will be released in the next version after 2.4.1.