orchestracities / ngsi-timeseries-api

QuantumLeap: a FIWARE Generic Enabler to support the usage of NGSIv2 (and NGSI-LD experimentally) data in time-series databases
https://quantumleap.rtfd.io/
MIT License

Can't connect to Timescale over SSL #393

Open c0c0n3 opened 3 years ago

c0c0n3 commented 3 years ago

Describe the bug

QuantumLeap bombs out when attempting to connect to Timescale over SSL. Here's the stack trace from the logs (irrelevant lines omitted):

INFO:translators.factory:Backend selected for tenant 'vecciano' is: timescale
INFO:translators.timescale:Env variable POSTGRES_HOST set to 'pg-patroni', using this value.
INFO:translators.timescale:Env variable POSTGRES_PORT set to '5432', using this value.
INFO:translators.timescale:Env variable POSTGRES_USE_SSL set to 't', using this value.
INFO:translators.timescale:Env variable POSTGRES_DB_NAME set to 'quantumleap', using this value.
INFO:translators.timescale:Env variable POSTGRES_DB_USER set to 'quantumleap', using this value.
INFO:translators.timescale:Env variable POSTGRES_DB_PASS set, using its value.
ERROR:app:Exception on /v2/notify [POST]
Traceback (most recent call last):
...
  File "/src/ngsi-timeseries-api/src/reporter/reporter.py", line 188, in notify
 ...
  File "/src/ngsi-timeseries-api/src/translators/timescale.py", line 74, in setup
    self.conn = pg8000.connect(host=self.host, port=self.port, ssl_context=self.ssl,
  File "/usr/local/lib/python3.8/site-packages/pg8000/__init__.py", line 49, in connect
    return Connection(
  File "/usr/local/lib/python3.8/site-packages/pg8000/core.py", line 1186, in __init__
    self._usock = ssl_context.wrap_socket(
AttributeError: 'dict' object has no attribute 'wrap_socket'
DEBUG:connexion.apis.abstract:Getting data and status code
DEBUG:connexion.apis.abstract:Prepared body and status code (500)
DEBUG:connexion.apis.abstract:Got framework response
127.0.0.1 - - [12/Nov/2020 16:10:19] "POST /v2/notify HTTP/1.1" 500 -
INFO:werkzeug:127.0.0.1 - - [12/Nov/2020 16:10:19] "POST /v2/notify HTTP/1.1" 500 -

To Reproduce

Configure QuantumLeap to use the Timescale backend for a tenant named "vecciano" and set the POSTGRES_USE_SSL env var to true. Then run:

$ curl -v -X POST localhost:8668/v2/notify \
     -H 'Content-Type: application/json' \
     -H 'Fiware-Service: vecciano' \
     -d '
{
  "data": [
    {
       "id": "urn:ngsi-ld:Device:d3",
       "type": "Device",
       "airpressure": {
         "type": "Number",
         "value": 987654321
       }
    }
  ]
}'

You should get a nasty 500 back and if you look at the logs, you should be able to see a fat stack trace like the one above.

Expected behavior

QuantumLeap should be able to establish an SSL connection to Timescale.
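
For reference, the trace above shows pg8000 calling `wrap_socket` on whatever it receives as `ssl_context`, so a plain dict cannot work. As far as I can tell from the pg8000 docs, `ssl_context` accepts `None` (no SSL), `True` (default context) or an `ssl.SSLContext` instance. Below is a minimal sketch of how the boolean flag could be mapped onto that. The helper name `make_ssl_context` and the `ca_file` parameter are made up for illustration and are not part of QuantumLeap.

```python
import ssl
from typing import Optional


def make_ssl_context(use_ssl: bool,
                     ca_file: Optional[str] = None) -> Optional[ssl.SSLContext]:
    """Hypothetical helper: turn an on/off SSL flag into a value that
    can be handed to pg8000.connect(ssl_context=...)."""
    if not use_ssl:
        # None means "do not use SSL" for pg8000.
        return None
    # With ca_file=None the system CA bundle is trusted; pass the path of
    # a CA (or self-signed server) certificate to trust that one as well.
    return ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)


# Intended use (not executed here, since it needs a running database):
# conn = pg8000.connect(host=host, port=port,
#                       ssl_context=make_ssl_context(use_ssl),
#                       database=db_name, user=db_user, password=db_pass)
```

Returning a real `SSLContext` rather than `True` keeps the certificate settings in our own hands, which matters for servers with self-signed certificates.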

Environment

Additional context

Issue cropped up in Orchestra prod when connecting QL to the old pg-patroni cluster and then to the new pgsql-patroni instance. If you get a Python shell on the QL pod, you can clearly see where things go awry:

>>> import pg8000
>>> conn = pg8000.connect(
                      host='pg-patroni',
                      port=5432, 
                      ssl_context={},   # this is what gets passed in when QL is configured with SSL = true
                      database='quantumleap',
                      user='quantumleap',
                      password='*')     # real password omitted :-)
c0c0n3 commented 3 years ago

Ideally, we should add some QL/Timescale integration tests where we test SSL connections. Timescale container set up should be similar to what we already have in the timescale-container/test dir, see:

c0c0n3 commented 3 years ago

Here's some more debug info we can use later to concoct a fix.

First off, start Timescale w/ SSL using this docker compose:

Notice the server certificates are self-signed. Now if you start a Python interpreter in e.g. the QL image, you can see what's going on

>>> import certifi    # see https://stackoverflow.com/questions/50236117
>>> import ssl
>>> import pg8000

Now the implementation of ssl must've changed since we tested. In fact, it looks like that's where all hell actually breaks loose, pg8000 just sits there peacefully.

>>> pg8000.paramstyle = "qmark"
>>> con = pg8000.connect(host='localhost', port=5432, ssl_context={}, database='quantumleap', user='quantumleap', password='*')
...
AttributeError: 'dict' object has no attribute 'wrap_socket'

as expected. Now the pg8000 implementation changed too since we tested and if you want to use a default SSL context, you should pass in True instead of {}:

Ah, so we can fix it! Or can we?

>>> con = pg8000.connect(host='localhost', port=5432, ssl_context=True, database='quantumleap', user='quantumleap', password='*')
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Users/andrea/.local/share/virtualenvs/ngsi-timeseries-api-MeJ80LMF/lib/python3.8/site-packages/pg8000/__init__.py", line 56, in connect
    return Connection(
  File "/Users/andrea/.local/share/virtualenvs/ngsi-timeseries-api-MeJ80LMF/lib/python3.8/site-packages/pg8000/core.py", line 674, in __init__
    self._usock = ssl_context.wrap_socket(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/ssl.py", line 1040, in _create
    self.do_handshake()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1123)

Computer says no. Surely this could be b/c Postgres got started w/ a self-signed cert. In fact,

>>> ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
>>> con = pg8000.connect(host='localhost', port=5432, ssl_context=ctx, database='quantumleap', user='quantumleap', password='*')
...
# same error as before

But here's a little surprise

>>> certs = ctx.load_default_certs()
>>> print(certs)
None
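
A side note on the output above: `SSLContext.load_default_certs()` has no return value, so it yields `None` no matter how many certificates it loaded. A `None` here does not by itself mean the trust store is empty. To see what a context actually holds, `cert_store_stats()` is one option. This is just a quick sketch for poking around, and the numbers depend on the machine it runs on.

```python
import ssl

# create_default_context already loads the system's default CA certs
# when no cafile/capath/cadata is given.
ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)

# Always None: the method loads certs into ctx and returns nothing.
result = ctx.load_default_certs()

# Counts of loaded X.509 certs, CA certs and CRLs. Certificates that
# live in a capath directory may not be counted until they are used.
stats = ctx.cert_store_stats()

print(result)
print(stats)
```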

So yah, even if we had a proper cert, I don't think we'd go too far. Could it be our version of certifi is too old? Anyhoo, if you don't care about server authentication, you could work around this

>>> ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
>>> con = pg8000.connect(host='localhost', port=5432, ssl_context=ctx, database='quantumleap', user='quantumleap', password='*')
>>> con.run("select count(*) from mtv.etdevice")
([10],)
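
The `CLIENT_AUTH` trick above works because that context does not verify the peer. My understanding is that newer Python versions build the `CLIENT_AUTH` context as a server-only one, so it may refuse to wrap a client socket there. Here is a sketch of two more explicit alternatives using only the standard `ssl` module: trusting the self-signed certificate, or switching verification off on purpose. Both function names and the certificate path are hypothetical.

```python
import ssl


def context_trusting(ca_file: str) -> ssl.SSLContext:
    """Verify the server, trusting the given CA or self-signed cert file.
    The host name used to connect must still match the certificate."""
    return ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)


def context_without_verification() -> ssl.SSLContext:
    """Encrypt the connection but skip server authentication entirely.
    Only reasonable for testing or fully trusted networks."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    # Order matters: check_hostname has to be turned off before
    # verify_mode can be set to CERT_NONE.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


# Intended use (not executed here, since it needs a running database):
# ctx = context_trusting('/path/to/server.crt')   # hypothetical path
# con = pg8000.connect(host='localhost', port=5432, ssl_context=ctx,
#                      database='quantumleap', user='quantumleap', password='*')
```

The first variant keeps server authentication, so it is the one to prefer whenever the server certificate can be shipped to the QL container.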
Panzki commented 3 years ago

Hi, what is the status on this? For a productive setup SSL is a must have. Are there any plans to fix this issue?

For anyone who wants to run Quantum Leap with TimescaleDB without an SSL connection, I can confirm that the following workaround is possible:

  1. Allow non-SSL connections for the quantumleap database in your TimescaleDB pg_hba.conf by adding this line of configuration:
    hostnossl quantumleap all all md5
  2. Tell Quantum Leap not to use SSL for connecting to TimescaleDB by setting: POSTGRES_USE_SSL:f
c0c0n3 commented 2 years ago

Hi @Panzki

what is the status on this?

Can you check if it's still broken in the latest QL release v0.8.3?

#586, which went into 0.8.3, might've fixed it...

For a productive setup SSL is a must have

Agree :-)

Are there any plans to fix this issue?

Not a priority at the moment, but we'll try fixing this some time in the next couple of months if it's still broken...

bkd231 commented 1 year ago

Hi @c0c0n3! Yes, it's still broken in the v0.8.3 release.

c0c0n3 commented 1 year ago

@bkdkmd oh deary deary, bugs never sleep :-)

@pooja1pathak do you guys have dev cycles to look into this and give it high priority?

FR-ADDIX commented 1 year ago

Is this bug still being worked on, or do we have to do without SSL for now? It is now 2023, the bug has been known since 2020, and it has persisted across versions. In the current version, quantumleap:0.8.3, it now comes back to bite us when we want to use a ReplicaSet of the TimescaleDB. :-(

c0c0n3 commented 1 year ago

hello @FR-ADDIX :-)

This is still a bug unfortunately. I share your frustration as a developer, but am also sure you'll appreciate we don't have enough resources to work on Quantum Leap on a full-time basis at the moment, so it's sort of a best-effort approach for us, mainly driven by what our clients request.

This is open-source after all, so if you're willing to roll up your sleeves and contribute this fix to the community we'll gladly merge your PR!