reportportal / service-metrics-gatherer

Apache License 2.0

External opensearch instance with TLS enabled #73

Open nevesing opened 6 months ago

nevesing commented 6 months ago

Describe the bug: I am trying to use an external OpenSearch instance with TLS enabled. However, the environment variables described in the docs are not working as expected.

Steps to Reproduce:

Set these environment variables for the service-metrics-gatherer:5.11.0 container:

  metrics-gatherer:
    image: reportportal/service-metrics-gatherer:5.11.0
    container_name: reportportal-metrics-gatherer
    logging:
      <<: *logging
    environment:
      LOGGING_LEVEL: debug
      ES_USER: xxx
      ES_PASSWORD: xxxx
      ES_HOST: https://xxxx-external-opensearch:9200
      ES_VERIFY_CERTS: true
      ES_USE_SSL: true
      ES_SSL_SHOW_WARN: true
      ES_TURN_OFF_SSL_VERIFICATION: false
      ES_CA_CERT: /backend/tls/ca.crt
      ES_CLIENT_CERT: /backend/tls/tls.crt
      ES_CLIENT_KEY: /backend/tls/tls.key
      POSTGRES_USER: *db_user
      POSTGRES_PASSWORD: *db_password
      POSTGRES_DB: *db_name
      POSTGRES_HOST: *db_host
      POSTGRES_PORT: 5432
      ALLOWED_START_TIME: "22:00"
      ALLOWED_END_TIME: "08:00"
      AMQP_VIRTUAL_HOST: analyzer
      AMQP_URL: amqp://rabbitmq:rabbitmq@rabbitmq:5672
    volumes:
      - ./opensearch/ca.crt:/backend/tls/ca.crt
      - ./opensearch/tls.crt:/backend/tls/tls.crt
      - ./opensearch/tls.key:/backend/tls/tls.key
    networks:
      - reportportal

Expected behavior: Successful connection to OpenSearch

Actual behavior:

reportportal-metrics-gatherer  | 2024-04-16 21:15:12,439 - ERROR - metricsGatherer.es_client - Elasticsearch is not healthy
reportportal-metrics-gatherer  | 2024-04-16 21:15:12,440 - ERROR - metricsGatherer.es_client - list indices must be integers or slices, not str
reportportal-metrics-gatherer  | 2024-04-16 21:15:12,644 - ERROR - metricsGatherer - Metrics gatherer health check status failed: Elasticsearch is not healthy;
reportportal-metrics-gatherer  | [pid: 6|app: 0|req: 1/1] 127.0.0.1 () {28 vars in 294 bytes} [Tue Apr 16 21:15:12 2024] GET / => generated 43 bytes in 293 msecs (HTTP/1.1 503) 3 headers in 120 bytes (1 switches on core 0)
reportportal-metrics-gatherer  | 2024-04-16 21:16:12,796 - ERROR - metricsGatherer.es_client - Error with loading url: https://xxxx-external-opensearch:9200/_cluster/health
reportportal-metrics-gatherer  | 2024-04-16 21:16:12,796 - ERROR - metricsGatherer.es_client - HTTPSConnectionPool(host='xxxx-external-opensearch', port=9200): Max retries exceeded with url: /_cluster/health (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1007)')))

Artifact version: 5.11.0

Additional info

I verified the certificates with curl, and they work fine:

curl https://xxxx-external-opensearch:9200/_cluster/health \
--cacert ca.crt --cert tls.crt --key tls.key -u xxx

HardNorth commented 6 months ago

@nevesing According to the log you provided, your OpenSearch instance answered HTTP/1.1 503, which means Service Unavailable. So how is that related to TLS or our configuration? It also shows that the TLS configuration works fine, since a response was retrieved.

So please investigate why your OpenSearch instance is in an unhealthy state; this has nothing to do with ReportPortal.

nevesing commented 6 months ago

@HardNorth - I don't think the HTTP 503 is relevant here, since the curl command I mentioned above works fine, which means OpenSearch is healthy. The error log also contains CERTIFICATE_VERIFY_FAILED, so it is related to TLS. Could you please help?

HardNorth commented 6 months ago

@nevesing After CERTIFICATE_VERIFY_FAILED there is an error description: unable to get local issuer certificate (_ssl.c:1007). Are your certificates issued by a public issuer?

nevesing commented 6 months ago

@HardNorth No, they are internal to our org only. Again, curl was successful with the same CA, cert, and key. The volume mounts are also fine: I can get into the container and cat those files without permission issues.

Below commands were executed from inside the metrics-gatherer container:

uwsgi@eabc486bc0c5:/backend/tls$ ls -al
total 12
drwxr-xr-x 2 root root   50 Apr 17 15:04 .
drwxr-xr-x 1 root root   17 Apr 17 15:04 ..
-rw-r--r-- 1 root root 3974 Apr 16 16:29 ca.crt
-rw-r--r-- 1 root root 2115 Apr 16 16:29 tls.crt
-rw-r--r-- 1 root root 1705 Apr 16 16:29 tls.key

uwsgi@eabc486bc0c5:/backend/tls$ curl https://xxxx-external-opensearch:9200/_cluster/health --cacert ca.crt --cert tls.crt --key tls.key -u xxx
Enter host password for user 'xxx':
{"cluster_name":"PRD_Cluster","status":"green","timed_out":false,"number_of_nodes":5,"number_of_data_nodes":5,"discovered_master":true,"discovered_cluster_manager":true,"active_primary_shards":34,"active_shards":131,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":0,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0,"active_shards_percent_as_number":100.0}
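Since the service is written in Python, the same check could also be repeated with Python's standard library from inside the container, which would rule out differences between how curl and Python handle trust stores. The following is only a sketch under assumptions: the function names are hypothetical, the host, user, and file paths are the placeholders used in this thread, and it is not the code the service itself runs.

```python
import base64
import json
import ssl
import urllib.request


def make_context(ca_file=None, cert_file=None, key_file=None):
    """Build a verifying TLS context, optionally with a client certificate."""
    # create_default_context turns on hostname checks and CERT_REQUIRED.
    ctx = ssl.create_default_context(cafile=ca_file)
    if cert_file:
        # Present a client certificate, as curl does with --cert/--key.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def build_health_request(base_url, user, password):
    """Prepare (but do not send) a basic-auth request to /_cluster/health."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(base_url.rstrip("/") + "/_cluster/health")
    req.add_header("Authorization", f"Basic {token}")
    return req


def check_health(base_url, user, password, ca_file, cert_file, key_file):
    """Send the request; TLS failures surface as ssl.SSLError or URLError."""
    ctx = make_context(ca_file, cert_file, key_file)
    req = build_health_request(base_url, user, password)
    with urllib.request.urlopen(req, context=ctx, timeout=10) as resp:
        return json.load(resp)


# Example call with the placeholders from this thread (not executed here):
# check_health("https://xxxx-external-opensearch:9200", "xxx", "xxxx",
#              "/backend/tls/ca.crt", "/backend/tls/tls.crt",
#              "/backend/tls/tls.key")
```

If this succeeds inside the container while the service still fails, that would point at how the service passes the certificate settings rather than at the certificates themselves.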

uwsgi@eabc486bc0c5:/backend/tls$ printenv | grep ^ES_
ES_HOST=https://xxxx-external-opensearch:9200
ES_CLIENT_CERT=/backend/tls/tls.crt
ES_USE_SSL=true
ES_TURN_OFF_SSL_VERIFICATION=false
ES_VERIFY_CERTS=true
ES_CLIENT_KEY=/backend/tls/tls.key
ES_PASSWORD=xxxx
ES_SSL_SHOW_WARN=true
ES_CA_CERT=/backend/tls/ca.crt
ES_USER=xxx

nevesing commented 6 months ago

Shall we reopen the issue?

nevesing commented 6 months ago

I also tried adding the REQUESTS_CA_BUNDLE environment variable to the list, and now the error message is different. I feel like es_client.py is not using the TLS parameters at all. Could you please test whether HTTPS actually works?

reportportal-metrics-gatherer  | 2024-04-22 13:41:18,678 - ERROR - metricsGatherer.es_client - HTTPSConnectionPool(host='xxxx-external-opensearch', port=9200): Max retries exceeded with url: /_cluster/health (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_BAD_CERTIFICATE] sslv3 alert bad certificate (_ssl.c:2578)')))
reportportal-metrics-gatherer  | 2024-04-22 13:41:18,678 - ERROR - metricsGatherer.es_client - Elasticsearch is not healthy
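One possible reading of the change in error: the HTTPSConnectionPool in the traceback comes from urllib3, which the requests library uses. requests picks up REQUESTS_CA_BUNDLE from the environment as its default CA bundle, but it has no equivalent environment variable for a client certificate. So the server certificate may now be verified, while the server rejects the handshake because the client certificate is not presented or not accepted. The sketch below shows how the ES_* variables from this thread could be mapped to the `verify` and `cert` arguments of `requests.get`. The mapping and the function names are assumptions for illustration, not the service's actual code.

```python
import os


def _truthy(value):
    """Interpret common string spellings of a boolean flag."""
    return str(value).strip().lower() in ("1", "true", "yes", "on")


def tls_kwargs_from_env(env=os.environ):
    """Map the ES_* variables to requests' TLS arguments.

    Hypothetical mapping for illustration; not the service's real logic.
    """
    kwargs = {}
    if _truthy(env.get("ES_TURN_OFF_SSL_VERIFICATION", "false")):
        kwargs["verify"] = False              # skip server verification
    elif env.get("ES_CA_CERT"):
        kwargs["verify"] = env["ES_CA_CERT"]  # trust this CA bundle
    else:
        kwargs["verify"] = True               # default trust store
    cert, key = env.get("ES_CLIENT_CERT"), env.get("ES_CLIENT_KEY")
    if cert and key:
        kwargs["cert"] = (cert, key)          # client certificate for mTLS
    return kwargs


def cluster_health(env=os.environ):
    """Call /_cluster/health through requests with the mapped TLS settings."""
    import requests  # third-party; imported lazily so the helper stays stdlib-only

    url = env["ES_HOST"].rstrip("/") + "/_cluster/health"
    auth = (env.get("ES_USER", ""), env.get("ES_PASSWORD", ""))
    resp = requests.get(url, auth=auth, timeout=10, **tls_kwargs_from_env(env))
    resp.raise_for_status()
    return resp.json()
```

Running `cluster_health()` inside the container with the environment shown earlier would exercise both the CA bundle and the client certificate in a single requests call.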

I also tried a minimal Python program using the Elasticsearch client, which works fine with my certificate combination:

from elasticsearch import Elasticsearch

ELASTIC_PASSWORD = "xxxxxxxx"

client = Elasticsearch(
    "https://xxxx-external-opensearch:9200",
    ca_certs="ca.crt",
    client_cert="tls.crt",
    client_key="tls.key",
    basic_auth=("xxx", ELASTIC_PASSWORD)
)

client.info()

HardNorth commented 6 months ago

@nevesing This all looks suspicious to me. Why then does your Analyzer work fine? Or is there something you are not telling us?

nevesing commented 6 months ago

@nevesing This all looks suspicious to me. Why then does your Analyzer work fine? Or is there something you are not telling us?

@HardNorth - analyzer, analyzer-train, and metrics-gatherer: all three containers have the same issue.

nevesing commented 6 months ago

@HardNorth were you able to troubleshoot?

nevesing commented 5 months ago

@HardNorth could you please help?