terascope / teraslice

Scalable data processing pipelines in JavaScript
https://terascope.github.io/teraslice/
Apache License 2.0

Discussion of Elasticsearch Client #3245

Open jsnoble opened 2 years ago

jsnoble commented 2 years ago

Because the elasticsearch client and opensearch client are diverging, we need a unified client that works in both scenarios. The current client usage is heavily entrenched in the code, so there are several considerations to weigh.

graph TD;
  terafoundation_connector --> context.apis.op_runner.getClient;
  terafoundation_connector --> context.foundation.getConnection;
  context.foundation.getConnection --> apis;
  context.foundation.getConnection --> spaces-api-utils;
  context.apis.op_runner.getClient --> operations/processors/slicers;
  terafoundation_connector --> elasticsearch_api;
  elasticsearch_api --> teraslice_state_storage;
  elasticsearch_api --> elasticsearch_asset_apis;
  elasticsearch_asset_apis --> elasticsearch_assets;
  elasticsearch_asset_apis --> spaces-queries;
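One way to picture the unified client is as an adapter that normalizes whichever driver the connector was configured with. Below is a minimal TypeScript sketch of that idea; every name here (ClientAdapter, wrapElasticsearch, RawEsClient) is hypothetical and not part of the actual teraslice API:

```typescript
// Hypothetical sketch of a common interface over the diverging drivers.
// None of these types exist in teraslice; they only illustrate the shape.
interface SearchResult {
    total: number;
    docs: unknown[];
}

interface ClientAdapter {
    distribution: 'elasticsearch' | 'opensearch';
    search(index: string, query: unknown): Promise<SearchResult>;
}

// Stand-in for the real driver client; the real one would come from
// @elastic/elasticsearch (or @opensearch-project/opensearch).
interface RawEsClient {
    search(params: { index: string; body: unknown }): Promise<{
        body: { hits: { total: { value: number }; hits: unknown[] } };
    }>;
}

function wrapElasticsearch(raw: RawEsClient): ClientAdapter {
    return {
        distribution: 'elasticsearch',
        async search(index, query) {
            const { body } = await raw.search({ index, body: query });
            // Normalize the response shape so callers never see driver differences.
            return { total: body.hits.total.value, docs: body.hits.hits };
        },
    };
}
```

A matching wrapOpensearch would normalize that driver's response shape to the same SearchResult, so operations, processors, and slicers would only ever depend on ClientAdapter.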

Questions:

jsnoble commented 2 years ago

Various authentication options for connecting to elasticsearch: https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/client-connecting.html#authentication
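For reference, the 7.x JavaScript client takes these as constructor options; a sketch of the three shapes most relevant here, with placeholder credentials (in real code each object would be passed to `new Client(options)` from @elastic/elasticsearch):

```javascript
// Placeholder values throughout; option names follow the 7.x client docs.
const basicAuthOptions = {
    node: 'https://localhost:9200',
    auth: { username: 'elastic', password: 'changeme' },
};

const apiKeyOptions = {
    node: 'https://localhost:9200',
    // a base64-encoded API key, or an { id, api_key } pair
    auth: { apiKey: 'base64EncodedKey==' },
};

const tlsOptions = {
    node: 'https://localhost:9200',
    auth: { username: 'elastic', password: 'changeme' },
    // the 7.x client calls this option `ssl`; the 8.x client renamed it to `tls`
    ssl: { ca: '-----BEGIN CERTIFICATE-----\n...' },
};
```

The `ssl`/`tls` rename is one of the divergences a unified wrapper would have to paper over.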

godber commented 2 years ago

Here are instructions to:

  1. enable the ES security plugin with user auth
  2. generate local test CA
  3. generate certs from that CA
  4. configure ES to use SSL and that cert
  5. connect to ES with curl using the CA cert

First, extract ES:

tar -zxpvf elasticsearch-7.9.3-darwin-x86_64.tar.gz
cd elasticsearch-7.9.3/

Now edit the config/elasticsearch.yml file and paste the following at the end:

discovery.type: single-node
xpack.security.enabled: true
#xpack.security.http.ssl.enabled: true
#xpack.security.http.ssl.keystore.path: "http.p12"

Start ES so you can bootstrap users:

bin/elasticsearch

In a separate terminal bootstrap users:

bin/elasticsearch-setup-passwords auto
Initiating the setup of passwords for reserved users elastic,apm_system,kibana,kibana_system,logstash_system,beats_system,remote_monitoring_user.
The passwords will be randomly generated and printed to the console.
Please confirm that you would like to continue [y/N]y

Changed password for user apm_system
PASSWORD apm_system = muoQLASugCwQTspArywf

Changed password for user kibana_system
PASSWORD kibana_system = EdI5G376M5PpMhjBv5If

Changed password for user kibana
PASSWORD kibana = EdI5G376M5PpMhjBv5If

Changed password for user logstash_system
PASSWORD logstash_system = CPn292dL0H8yMNp5YEQf

Changed password for user beats_system
PASSWORD beats_system = dXQ85NS9JbBjTmfOBdbk

Changed password for user remote_monitoring_user
PASSWORD remote_monitoring_user = JSWYzaElLImQSrdzMTOe

Changed password for user elastic
PASSWORD elastic = N486tydU4d7J1DwcQ1lL

[NOTE: There is a bin/elasticsearch-setup-passwords interactive command if you prefer to type in your own passwords]

Ctrl+C out of elasticsearch, then generate your CA (Certificate Authority):

bin/elasticsearch-certutil ca --pem
This tool assists you in the generation of X.509 certificates and certificate
signing requests for use with SSL/TLS in the Elastic stack.

The 'ca' mode generates a new 'certificate authority'
This will create a new X.509 certificate and private key that can be used
to sign certificate when running in 'cert' mode.

Use the 'ca-dn' option if you wish to configure the 'distinguished name'
of the certificate authority

By default the 'ca' mode produces a single PKCS#12 output file which holds:
    * The CA certificate
    * The CA's private key

If you elect to generate PEM format certificates (the -pem option), then the output will
be a zip file containing individual files for the CA certificate and private key

Please enter the desired output file [elastic-stack-ca.zip]:

Make a temp directory and extract the CA certificate and key into it:

mkdir temp; unzip elastic-stack-ca.zip -d temp
Archive:  elastic-stack-ca.zip
   creating: temp/ca/
  inflating: temp/ca/ca.crt
  inflating: temp/ca/ca.key

Then list the complete paths to ca.crt and ca.key; you'll need them in the following step:

ls -1 `pwd`/temp/ca/ca.{crt,key}
/Users/godber/temp/ca/elasticsearch-7.9.3/temp/ca/ca.crt
/Users/godber/temp/ca/elasticsearch-7.9.3/temp/ca/ca.key

Now run bin/elasticsearch-certutil http to generate the ES HTTP certificates using the CA you created earlier. Note that this example assumes you access the ES cluster using the hostname localhost, which resolves to the IP 127.0.0.1. This is the most common case for local development, but it's possible this won't be sufficient in other cases, such as CI testing or Docker.

bin/elasticsearch-certutil http

## Elasticsearch HTTP Certificate Utility

The 'http' command guides you through the process of generating certificates
for use on the HTTP (Rest) interface for Elasticsearch.

This tool will ask you a number of questions in order to generate the right
set of files for your needs.

## Do you wish to generate a Certificate Signing Request (CSR)?

A CSR is used when you want your certificate to be created by an existing
Certificate Authority (CA) that you do not control (that is, you don't have
access to the keys for that CA).

If you are in a corporate environment with a central security team, then you
may have an existing Corporate CA that can generate your certificate for you.
Infrastructure within your organisation may already be configured to trust this
CA, so it may be easier for clients to connect to Elasticsearch if you use a
CSR and send that request to the team that controls your CA.

If you choose not to generate a CSR, this tool will generate a new certificate
for you. That certificate will be signed by a CA under your control. This is a
quick and easy way to secure your cluster with TLS, but you will need to
configure all your clients to trust that custom CA.

Generate a CSR? [y/N]N

## Do you have an existing Certificate Authority (CA) key-pair that you wish to use to sign your certificate?

If you have an existing CA certificate and key, then you can use that CA to
sign your new http certificate. This allows you to use the same CA across
multiple Elasticsearch clusters which can make it easier to configure clients,
and may be easier for you to manage.

If you do not have an existing CA, one will be generated for you.

Use an existing CA? [y/N]y

## What is the path to your CA?

Please enter the full pathname to the Certificate Authority that you wish to
use for signing your new http certificate. This can be in PKCS#12 (.p12), JKS
(.jks) or PEM (.crt, .key, .pem) format.
CA Path: /Users/godber/temp/ca/elasticsearch-7.9.3/temp/ca/ca.crt

## What is the path to your CA key?

/Users/godber/temp/ca/elasticsearch-7.9.3/temp/ca/ca.crt appears to be a PEM formatted certificate file.
In order to use it for signing we also need access to the private key
that corresponds to that certificate.

CA Key: /Users/godber/temp/ca/elasticsearch-7.9.3/temp/ca/ca.key

## How long should your certificates be valid?

Every certificate has an expiry date. When the expiry date is reached clients
will stop trusting your certificate and TLS connections will fail.

Best practice suggests that you should either:
(a) set this to a short duration (90 - 120 days) and have automatic processes
to generate a new certificate before the old one expires, or
(b) set it to a longer duration (3 - 5 years) and then perform a manual update
a few months before it expires.

You may enter the validity period in years (e.g. 3Y), months (e.g. 18M), or days (e.g. 90D)

For how long should your certificate be valid? [5y]

## Do you wish to generate one certificate per node?

If you have multiple nodes in your cluster, then you may choose to generate a
separate certificate for each of these nodes. Each certificate will have its
own private key, and will be issued for a specific hostname or IP address.

Alternatively, you may wish to generate a single certificate that is valid
across all the hostnames or addresses in your cluster.

If all of your nodes will be accessed through a single domain
(e.g. node01.es.example.com, node02.es.example.com, etc) then you may find it
simpler to generate one certificate with a wildcard hostname (*.es.example.com)
and use that across all of your nodes.

However, if you do not have a common domain name, and you expect to add
additional nodes to your cluster in the future, then you should generate a
certificate per node so that you can more easily generate new certificates when
you provision new nodes.

Generate a certificate per node? [y/N]N

## Which hostnames will be used to connect to your nodes?

These hostnames will be added as "DNS" names in the "Subject Alternative Name"
(SAN) field in your certificate.

You should list every hostname and variant that people will use to connect to
your cluster over http.
Do not list IP addresses here, you will be asked to enter them later.

If you wish to use a wildcard certificate (for example *.es.example.com) you
can enter that here.

Enter all the hostnames that you need, one per line.
When you are done, press <ENTER> once more to move on to the next step.

localhost

You entered the following hostnames.

 - localhost

Is this correct [Y/n]

## Which IP addresses will be used to connect to your nodes?

If your clients will ever connect to your nodes by numeric IP address, then you
can list these as valid IP "Subject Alternative Name" (SAN) fields in your
certificate.

If you do not have fixed IP addresses, or not wish to support direct IP access
to your cluster then you can just press <ENTER> to skip this step.

Enter all the IP addresses that you need, one per line.
When you are done, press <ENTER> once more to move on to the next step.

127.0.0.1

You entered the following IP addresses.

 - 127.0.0.1

Is this correct [Y/n]

## Other certificate options

The generated certificate will have the following additional configuration
values. These values have been selected based on a combination of the
information you have provided above and secure defaults. You should not need to
change these values unless you have specific requirements.

Key Name: localhost
Subject DN: CN=localhost
Key Size: 2048

Do you wish to change any of these options? [y/N]

## What password do you want for your private key(s)?

Your private key(s) will be stored in a PKCS#12 keystore file named "http.p12".
This type of keystore is always password protected, but it is possible to use a
blank password.

If you wish to use a blank password, simply press <enter> at the prompt below.
Provide a password for the "http.p12" file:  [<ENTER> for none]

## Where should we save the generated files?

A number of files will be generated including your private key(s),
public certificate(s), and sample configuration options for Elastic Stack products.

These files will be included in a single zip archive.

What filename should be used for the output zip file? [/Users/godber/temp/ca/elasticsearch-7.9.3/elasticsearch-ssl-http.zip]

Zip file written to /Users/godber/temp/ca/elasticsearch-7.9.3/elasticsearch-ssl-http.zip

Now unzip the generated elasticsearch-ssl-http.zip file to access the http.p12:

unzip elasticsearch-ssl-http.zip -d temp/
Archive:  elasticsearch-ssl-http.zip
   creating: temp/elasticsearch/
  inflating: temp/elasticsearch/README.txt
  inflating: temp/elasticsearch/http.p12
  inflating: temp/elasticsearch/sample-elasticsearch.yml
   creating: temp/kibana/
  inflating: temp/kibana/README.txt
  inflating: temp/kibana/elasticsearch-ca.pem
  inflating: temp/kibana/sample-kibana.yml

Copy the http.p12 file to config/:

cp temp/elasticsearch/http.p12 config/

Now edit the config/elasticsearch.yml file and uncomment the two lines with ssl in them:

discovery.type: single-node
xpack.security.enabled: true
xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.keystore.path: "http.p12"

Now start elasticsearch:

bin/elasticsearch

And in another terminal, you should be able to use the temp/ca/ca.crt to access ES with curl over TLS as a user you created earlier:

curl --cacert temp/ca/ca.crt https://elastic:N486tydU4d7J1DwcQ1lL@localhost:9200/
{
  "name" : "shank.localdomain",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "vWUTD4T-Q_C6k_0RL2TUvw",
  "version" : {
    "number" : "7.9.3",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "c4138e51121ef06a6404866cddc601906fe5c868",
    "build_date" : "2020-10-16T10:36:16.141335Z",
    "build_snapshot" : false,
    "lucene_version" : "8.6.2",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
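If the curl call succeeds, the JSON above is also easy to check programmatically; here is a small sketch (the helper name clusterSummary is made up) that pulls out the fields a connection check would care about, using an abridged copy of the response shown above:

```javascript
// Hypothetical helper: summarize the ES root ("/") info response.
function clusterSummary(info) {
    return {
        cluster: info.cluster_name,
        version: info.version.number,
        lucene: info.version.lucene_version,
    };
}

// Abridged from the curl response above:
const info = {
    name: 'shank.localdomain',
    cluster_name: 'elasticsearch',
    version: { number: '7.9.3', lucene_version: '8.6.2' },
};
console.log(clusterSummary(info));
// { cluster: 'elasticsearch', version: '7.9.3', lucene: '8.6.2' }
```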

References:

These may be handy if you're trying to adapt these instructions to docker:

godber commented 2 years ago

I just came across this issue:

https://github.com/terascope/teraslice/issues/1211

ciorg commented 2 years ago

Since we're starting to build a client wrapper to unify all our ES calls, here's the list of functions we should support:

* document related operations
    * bulk - jared
    * count - jared
    * create - jared
    * delete - jared
    * get - jared
    * index - jared
    * delete_by_query - jared
    * update - jared
    * search - charlie
    * exists - charlie
    * info - charlie
    * mget - charlie
    * msearch - charlie
    * ping - charlie
    * reindex - charlie
* cluster
    * get_settings - jared
    * health - jared
* indices
    * create - charlie
    * delete - charlie
    * exists - charlie
    * get - charlie
    * exists_template - charlie
    * delete_template - charlie
    * get_template - charlie
    * put_template - charlie
    * get_index_template - charlie
    * get_mapping - jared
    * put_mapping - jared
    * get_field_mapping - jared
    * get_settings - jared
    * put_settings - jared
    * refresh - jared
    * recovery - jared
    * validateQuery - jared
* tasks
    * cancel - charlie
    * get - charlie
    * list - charlie
* cat
    * indices - jared
* node
    * stats - jared
    * info - jared

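The list above implies a typed surface for the wrapper. A rough TypeScript sketch, covering only a handful of the methods, with names and signatures that are purely illustrative (not the actual teraslice types):

```typescript
// Illustrative only: groups mirror the list above, signatures are guesses.
type Json = Record<string, unknown>;

interface DocumentOps {
    get(index: string, id: string): Promise<Json>;
    index(index: string, doc: Json, id?: string): Promise<Json>;
    search(index: string, query: Json): Promise<Json[]>;
    count(index: string, query?: Json): Promise<number>;
}

interface IndicesOps {
    create(index: string, settings?: Json): Promise<void>;
    exists(index: string): Promise<boolean>;
    getMapping(index: string): Promise<Json>;
}

interface ClusterOps {
    health(): Promise<Json>;
    getSettings(): Promise<Json>;
}

// The wrapper under discussion would expose the groups from one object,
// with document operations at the top level and the rest namespaced.
interface UnifiedClient extends DocumentOps {
    indices: IndicesOps;
    cluster: ClusterOps;
}
```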
kstaken commented 1 month ago

@jsnoble @ciorg is the work related to this issue complete?

jsnoble commented 1 month ago

The unified base client for elasticsearch/opensearch is done, but the elasticsearch-api side of this discussion is not. We need to migrate and refactor it a bit more before we can close this issue.