opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

[BUG] Registering MinIO (S3) snapshot repository fails with "Connect timed out" #16305

Open pjuri opened 1 month ago

pjuri commented 1 month ago

Describe the bug

I’m running OpenSearch as part of a Graylog Helm installation on Kubernetes. I’m trying to register a snapshot repository backed by MinIO, following this document: https://opensearch.org/docs/latest/tuning-your-cluster/availability-and-recovery/snapshots/snapshot-restore/

When I try to register the repository with curl (using the REST API), I get a "Connect timed out" error. Using tcpdump, I can see that no connection to the provided IP address is attempted. When I manually test the connection to MinIO with curl, it works, so it is not a network issue.

If I remove the s3.client.default.endpoint setting, I can see OpenSearch connecting to Amazon servers, which is not what I want.

I suspect this might be just a misconfiguration, but no matter what I try, I get the same results.

Related component

Plugins

To Reproduce

[opensearch@opensearch-cluster-master-0 ~]$ opensearch-keystore create
An opensearch keystore already exists. Overwrite? [y/N]y
Created opensearch keystore in /usr/share/opensearch/config/opensearch.keystore
[opensearch@opensearch-cluster-master-0 ~]$ opensearch-keystore add s3.client.default.access_key
Enter value for s3.client.default.access_key:
[opensearch@opensearch-cluster-master-0 ~]$ opensearch-keystore add s3.client.default.secret_key
Enter value for s3.client.default.secret_key:
[opensearch@opensearch-cluster-master-0 ~]$ grep s3.client.default config/opensearch.yml
s3.client.default.protocol: "http"
s3.client.default.endpoint: "http://1.2.3.4:9000/"
s3.client.default.path_style_access: "true"

I did the steps above on all 3 cluster members.

[opensearch@opensearch-cluster-master-0 ~]$ curl -X POST "http://localhost:9200/_nodes/reload_secure_settings"
{"_nodes":{"total":3,"successful":3,"failed":0},"cluster_name":"opensearch-cluster","nodes":{"Ug2a4ZiqS_6sNDvKlFRNbg":{"name":"opensearch-cluster-master-2"},"zi7xQcAsT0WyPEXLozMEJQ":{"name":"opensearch-cluster-master-0"},"R6I3MgjqRrS85OjyIWHCaw":{"name":"opensearch-cluster-master-1"}}}
[opensearch@opensearch-cluster-master-0 ~]$ curl -X PUT "http://localhost:9200/_snapshot/minio-repo?pretty" -H 'Content-Type: application/json' -d '
{
  "type": "s3",
  "settings": {
    "bucket": "opensearch",
    "base_path": "opensearch/snapshot/"
  }
}'
{
  "error" : {
    "root_cause" : [
      {
        "type" : "repository_verification_exception",
        "reason" : "[minio-repo] path [opensearch/snapshot/] is not accessible on cluster-manager node"
      }
    ],
    "type" : "repository_verification_exception",
    "reason" : "[minio-repo] path [opensearch/snapshot/] is not accessible on cluster-manager node",
    "caused_by" : {
      "type" : "i_o_exception",
      "reason" : "Unable to upload object [opensearch/snapshot//tests-nZNGJ5szRh-Pd5gX3q44dA/master.dat] using a single upload",
      "caused_by" : {
        "type" : "sdk_client_exception",
        "reason" : "sdk_client_exception: Failed to connect to service endpoint: ",
        "caused_by" : {
          "type" : "i_o_exception",
          "reason" : "Connect timed out"
        }
      }
    }
  },
  "status" : 500
}

tcpdump shows no traffic to MinIO
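
The useful part of a response like the one above is usually the innermost "reason" in the nested caused_by chain. As a rough sketch, it can be pulled out with standard text tools. The function name innermost_reason is hypothetical, the sample is a copy of the response from this report with root_cause omitted, and the approach assumes each "reason" appears before its "caused_by" and contains no escaped double quotes. It is a text-level shortcut, not a JSON parser.

```shell
#!/bin/sh
# Print the last "reason" value found in an OpenSearch error response on stdin.
# Assumption: in a nested caused_by chain the last "reason" in document order
# is the innermost one, and reasons contain no escaped double quotes.
innermost_reason() {
  grep -o '"reason"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | tail -n 1 \
    | sed 's/^"reason"[[:space:]]*:[[:space:]]*"//; s/"$//'
}

# Copy of the error response from this report (root_cause omitted for brevity).
response='{ "error" : { "type" : "repository_verification_exception", "reason" : "[minio-repo] path [opensearch/snapshot/] is not accessible on cluster-manager node", "caused_by" : { "type" : "i_o_exception", "reason" : "Unable to upload object [opensearch/snapshot//tests-nZNGJ5szRh-Pd5gX3q44dA/master.dat] using a single upload", "caused_by" : { "type" : "sdk_client_exception", "reason" : "sdk_client_exception: Failed to connect to service endpoint: ", "caused_by" : { "type" : "i_o_exception", "reason" : "Connect timed out" } } } }, "status" : 500 }'

printf '%s\n' "$response" | innermost_reason
```

Against a live cluster, the same function could be fed from the curl call shown above by piping its output into innermost_reason.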

Test if the Minio endpoint is reachable:

[opensearch@opensearch-cluster-master-0 ~]$ curl http://1.2.3.4:9000/
<?xml version="1.0" encoding="UTF-8"?>
AccessDeniedAccess Denied./minio17FE44A7FEAD5E72dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8
[opensearch@opensearch-cluster-master-0 ~]$ ## tcpdump shows connection with MinIO was established

Expected behavior

The snapshot repository should be registered successfully, allowing me to take and restore snapshots.

Additional Details

Plugins
plugins:
  enabled: true
  installList:

Host/Environment (please complete the following information):

Additional context
Kubernetes: v1.28.14
Containerd: 1.7.2-0ubuntu1~22.04.1
Docker image: opensearchproject/opensearch:2.4.0
Helm chart: graylog-2.3.10 - uses https://artifacthub.io/packages/helm/opensearch-project-helm-charts/opensearch

pjuri commented 1 month ago

Here is the list of installed plugins from _cat/plugins (first node only):

opensearch-cluster-master-0 opensearch-alerting 2.4.0.0
opensearch-cluster-master-0 opensearch-anomaly-detection 2.4.0.0
opensearch-cluster-master-0 opensearch-asynchronous-search 2.4.0.0
opensearch-cluster-master-0 opensearch-cross-cluster-replication 2.4.0.0
opensearch-cluster-master-0 opensearch-geospatial 2.4.0.0
opensearch-cluster-master-0 opensearch-index-management 2.4.0.0
opensearch-cluster-master-0 opensearch-job-scheduler 2.4.0.0
opensearch-cluster-master-0 opensearch-knn 2.4.0.0
opensearch-cluster-master-0 opensearch-ml 2.4.0.0
opensearch-cluster-master-0 opensearch-neural-search 2.4.0.0
opensearch-cluster-master-0 opensearch-notifications 2.4.0.0
opensearch-cluster-master-0 opensearch-notifications-core 2.4.0.0
opensearch-cluster-master-0 opensearch-observability 2.4.0.0
opensearch-cluster-master-0 opensearch-performance-analyzer 2.4.0.0
opensearch-cluster-master-0 opensearch-reports-scheduler 2.4.0.0
opensearch-cluster-master-0 opensearch-security 2.4.0.0
opensearch-cluster-master-0 opensearch-security-analytics 2.4.0.0
opensearch-cluster-master-0 opensearch-sql 2.4.0.0
opensearch-cluster-master-0 repository-s3 2.4.0

jwitko commented 1 month ago

You have to set an environment variable to disable the AWS EC2 instance metadata lookup. OpenSearch is trying to reach the AWS metadata "magic IP" (169.254.169.254) from your server and, of course, failing. We just hit this same issue.
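
A quick way to confirm that the variable is actually visible inside the OpenSearch container is a check along these lines. This is only a sketch: the function name ec2_metadata_disabled is hypothetical, and comparing the value case-insensitively against "true" is an assumption about how the AWS SDK reads AWS_EC2_METADATA_DISABLED.

```shell
#!/bin/sh
# Succeed if AWS_EC2_METADATA_DISABLED is set to "true" in this environment.
# Assumption: the value is matched case-insensitively.
ec2_metadata_disabled() {
  value=$(printf '%s' "${AWS_EC2_METADATA_DISABLED:-}" | tr '[:upper:]' '[:lower:]')
  [ "$value" = "true" ]
}

# To inspect the environment of a running pod instead (pod name taken from this report):
#   kubectl exec opensearch-cluster-master-0 -- env | grep AWS_EC2_METADATA_DISABLED
if ec2_metadata_disabled; then
  echo "EC2 metadata lookup disabled"
else
  echo "EC2 metadata lookup still enabled"
fi
```

The variable has to be set in the environment of the OpenSearch process on every node, so checking a single pod is a spot check rather than proof.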

pjuri commented 1 month ago

Thanks, @jwitko, setting AWS_EC2_METADATA_DISABLED helped!

It would be good if this was mentioned in the documentation.

Here's what I put in my Helm values:

opensearch:
  extraEnvs:
    - name: AWS_EC2_METADATA_DISABLED
      value: "true"

dblock commented 2 weeks ago

[Catch All Triage - 1, 2]

@pjuri Looks like we got to the bottom of this, care to contribute to the documentation?

pjuri commented 2 weeks ago

@dblock sure, if you tell me where to put it.

dblock commented 2 weeks ago

Would the right place be https://opensearch.org/docs/latest/tuning-your-cluster/availability-and-recovery/snapshots/snapshot-restore/? If so that's in https://github.com/opensearch-project/documentation-website.

pjuri commented 1 week ago

@dblock done. Here's the pull request: https://github.com/opensearch-project/documentation-website/pull/8734