matthewfranglen / postgres-elasticsearch-fdw

Postgres to Elastic Search Foreign Data Wrapper
MIT License
111 stars 32 forks source link

Elasticsearch SSL authentication - errors #25

Closed Saketha016 closed 2 years ago

Saketha016 commented 3 years ago

Hi, Want to know whether "pg_es_fdw" supports SSL authentication.

I have an Elasticsearch cluster setup with SSL authentication. When I tried to insert data into elasticsearch after the foreign table creation step, I see below authentication error:

postgres=# INSERT INTO articles_es1
    (
        id,
        title,
        body,
        metadata
    )
VALUES
    (
        1,
        'foo',
        'spike',
        '{"score": 3}'::json
    );
ERROR:  INDEX for /article-index/1 and document {'title': 'foo', 'body': 'spike', 'metadata': {'score': 3}, 'query': None, 'score': None, 'sort': None} failed: ConnectionError(('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))) caused by: ProtocolError(('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')))

Moreover, I tried with "SSL" option too by providing the pem file location as shown below, but no luck:

CREATE FOREIGN TABLE articles_es1
    (
        id BIGINT,
        title TEXT,
        body TEXT,
        metadata JSON,
        query TEXT,
        score NUMERIC,
        sort TEXT
    )
SERVER multicorn_es
OPTIONS
    (
        host '192.168.1.194',
        port '9200',
        index 'article-index',
        rowid_column 'id',
        query_column 'query',
        query_dsl 'false',
        score_column 'score',
        default_sort 'last_updated:desc',
        sort_column 'sort',
        refresh 'false',
        complete_returning 'false',
        timeout '20',
        username 'admin',
        password 'admin',
        ssl '/opt/root-ca.pem'
)
;

Thanks in advance !!

matthewfranglen commented 3 years ago

Basically you should first be able to connect to your cluster using the raw python client: https://elasticsearch-py.readthedocs.io/en/master/

You're probably looking for a client configured like this:

es = Elasticsearch(
    {'host': '192.168.1.194', 'port': '9200'},
    # turn on SSL
    use_ssl=True,
    # make sure we verify SSL certificates
    verify_certs=True,
    # provide a path to CA certs on disk
    ca_certs='/opt/root-ca.pem'
)

You can then perform a simple query:

es.get(index="article-index", id=1)

This step establishes if the problem is with the underlying python client. When you are able to connect and use the connection then you just need to translate the settings to the FDW.

Any option to the FDW which is not explicitly listed in the README is passed through to the elasticsearch client unaltered. So you are likely going to add:

use_ssl 'true',
ca_certs '/opt/root-ca.pem'
Saketha016 commented 3 years ago

Hi Matthew, Thanks for helping !!

I'm successfully able to test the connection with the script which you provided. Below is the code for reference:

from datetime import datetime
from elasticsearch import Elasticsearch
es = Elasticsearch(host='192.168.1.194', port='9200', http_auth=('admin', 'admin'),use_ssl=True,verify_certs=False,ca_certs='/opt/root-ca.pem')
res = es.get(index="article-index1", id=1)
print(res)

Output:

{'_index': 'article-index1', '_type': '_doc', '_id': '1', '_version': 2, '_seq_no': 1, '_primary_term': 1, 'found': True, '_source': {'title': 'foo', 'body': 'spike', 'metadata': {'score': 3}, 'query': None, 'score': None, 'sort': None}}

But when it comes to FDW, this is the query that I used to build a table...

CREATE FOREIGN TABLE articles_es1
    (
        id BIGINT,
        title TEXT,
        body TEXT,
        metadata JSON,
        query TEXT,
        score NUMERIC,
        sort TEXT
    )
SERVER multicorn_es
OPTIONS
    (
        host '192.168.1.194',
        port '9200',
        index 'article-index',
        type 'article',
        rowid_column 'id',
        query_column 'query',
        query_dsl 'false',
        score_column 'score',
        default_sort 'last_updated:desc',
        sort_column 'sort',
        refresh 'false',
        complete_returning 'false',
        timeout '20',
        username 'admin',
        password 'admin',
        verify_certs 'false',
        use_ssl 'true',
        ca_certs '/opt/root-ca.pem'
    )
;

Output: CREATE FOREIGN TABLE

And, when I tried to insert data, error comes up:

INSERT INTO articles_es1
    (
        id,
        title,
        body,
        metadata
    )
VALUES
    (
        1,
        'foo',
        'spike',
        '{"score": 3}'::json
    );

Output: ERROR: INDEX for /article-index/1 and document {'title': 'foo', 'body': 'spike', 'metadata': {'score': 3}, 'query': None, 'score': None, 'sort': None} failed: ConnectionError([SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate (_ssl.c:1056)) caused by: SSLError([SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate (_ssl.c:1056))

Even when I tried with verify_certs 'true', there's no change in the error. Please try to correct me if I'm wrong at any step.

matthewfranglen commented 3 years ago

Really you want the certificate validation to be false. The problem is that the flag is not interpreted as a boolean, instead it's a string (as all options to the fdw are) and a non blank string is true. The "false" string is the blank string so try verify_certs ''.

This is something I should fix properly though.

Saketha016 commented 3 years ago

Thanks, @matthewfranglen Appreciate your help !!

It worked now :)

Saketha016 commented 3 years ago

Hello Matthew,

I know I'm not asking this question related to this issue ticket, but what to check with you and need your help on this... I want to create a foreign table in pgsql which maps to the timestamp field of elasticsearch. Is there any option to do so? I tried the below query, but it resulted in an empty column.

Table creation query:

CREATE FOREIGN TABLE articles_es8
    (
        id TEXT,
        query TEXT,
        timestamp DATE
    )
SERVER multicorn_es
OPTIONS
    (
        host '192.168.1.194',
        port '9200',
        index 'security-auditlog-2021.04.15',
        type 'article',
        rowid_column 'id',
        query_column 'query',
        query_dsl 'false',
        sort_column 'audit_format_version',
        refresh 'false',
        complete_returning 'false',
        timeout '20',
        username 'admin',
        password 'admin',
        verify_certs '',
        use_ssl 'true',
        ca_certs '/opt/root-ca.pem'
    )
;

Select query:

postgres=# select * from articles_es8 where id='Pu3T0ngBkrEbtt6b1d8X';
          id          | query | timestamp 
----------------------+-------+-----------
 Pu3T0ngBkrEbtt6b1d8X |       | 
(1 row)

Actual data from Kibana: image

Appreciate your help !!

matthewfranglen commented 3 years ago

there are several open tickets about proper type conversion which I haven't got round to unfortunately. I need to add a proper type conversion system to the FDW. For now you may be able to get it working by having the column be TEXT and then casting it in pg.

matthewfranglen commented 3 years ago

Ooh and are you sure that the column is named timestamp in the json? You could try naming the column @timestamp in the FDW table.

Saketha016 commented 3 years ago

Even timestamp TEXT didn't work. It was unable to map.

And, When I tried with @timestamp it was throwing syntax error as shown below. That's why I tried with timestamp.

ERROR:  syntax error at or near "@"
LINE 5:         @timestamp TEXT
                ^
Saketha016 commented 3 years ago

It worked Matthew, I tried the below code: (i.e., wrapped with double quotes)

"@timestamp" TEXT

Thanks for your suggestions !!

matthewfranglen commented 3 years ago

Out of interest does it work with "@timestamp" DATE ?

Saketha016 commented 3 years ago

Yes, but the output differs here.

For e.g.: If we chose "@timestamp" TEXT, then output looks like...

postgres=# select * from articles_es8 where id='Pu3T0ngBkrEbtt6b1d8X';
          id          | query |          @timestamp           
----------------------+-------+-------------------------------
 Pu3T0ngBkrEbtt6b1d8X |       | 2021-04-15T00:00:00.896+00:00
(1 row)

If we chose "@timestamp" DATE, the output looks like...

postgres=# select * from articles_es8 where id='Pu3T0ngBkrEbtt6b1d8X';
          id          | query | @timestamp 
----------------------+-------+------------
 Pu3T0ngBkrEbtt6b1d8X |       | 2021-04-15
(1 row)
matthewfranglen commented 3 years ago

Oh that's correct though. There are several date related types in postgres. https://www.postgresql.org/docs/current/datatype-datetime.html

You might want to try out TIMESTAMP WITH TIME ZONE.

matthewfranglen commented 2 years ago

closing this due to inactivity. Hope that the timestamp with time zone tip worked.