yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
https://www.yugabyte.com
Other
8.93k stars 1.06k forks source link

[YSQL][DocDB] Postgres fails when using `node_to_node_encryption_use_client_certificates` #7335

Open Arnav15 opened 3 years ago

Arnav15 commented 3 years ago

Jira Link: DB-2054 The following error happens when using client verification for TLS. This is the error that is generated in the postgres log.

"000000010000300080000000000004ec", num_attempts: 1) failed: Invalid argument (yb/rpc/secure_stream.cc:761): Failed to use private key: passed a null parameter
2021-02-19 13:47:57.044 UTC [22985] FATAL:  Not found: Error loading table with oid 1260 in database with oid 1: Failed to use private key: passed a null parameter
m-iancu commented 3 years ago

From @iSignal on Slack:

it seems like when we call CreateSecureContext in pggate: https://github.com/yugabyte/yugabyte-db/blob/238099e717e51fb5c3a14080e825bcaa4ab06c18/src/yb/yql/pggate/pggate.cc we are not setting the third parameter (node name), so we don't end up loading the node certs/keys in the pg client: https://github.com/yugabyte/yugabyte-db/blob/e092e3f47af47672235ec7d43ccc3fc47cac929d/ent/src/yb/server/secure.cc#L133

tedyu commented 3 years ago

Maybe we can do the following:

diff --git a/src/yb/yql/pggate/pggate.cc b/src/yb/yql/pggate/pggate.cc
index d5f2d50bd..67d681f63 100644
--- a/src/yb/yql/pggate/pggate.cc
+++ b/src/yb/yql/pggate/pggate.cc
@@ -34,6 +34,7 @@
 #include "yb/yql/pggate/ybc_pggate.h"

 #include "yb/util/flag_tags.h"
+#include "yb/util/net/net_util.h"
 #include "yb/client/client_fwd.h"
 #include "yb/client/client_utils.h"
 #include "yb/client/table.h"
@@ -80,9 +81,12 @@ Result<PgApiImpl::MessengerHolder> BuildMessenger(
     const std::shared_ptr<MemTracker>& parent_mem_tracker) {
   std::unique_ptr<rpc::SecureContext> secure_context;
   if (FLAGS_use_node_to_node_encryption) {
+    string hostname;
+    RETURN_NOT_OK(GetHostname(&hostname));
     secure_context = VERIFY_RESULT(server::CreateSecureContext(
         FLAGS_certs_dir,
-        server::UseClientCerts(FLAGS_node_to_node_encryption_use_client_certificates)));
+        server::UseClientCerts(FLAGS_node_to_node_encryption_use_client_certificates),
+        hostname));
   }
   auto messenger = VERIFY_RESULT(client::CreateClientMessenger(
       client_name, num_reactors, metric_entity, parent_mem_tracker, secure_context.get()));
pkj415 commented 3 years ago

We can't use GetHostname because the filename for private key can be derived from any of these three -

  1. The cert_node_filename tserver flag
  2. First of server_broadcast_addresses
  3. First of rpc_bind_addresses

This is from https://github.com/yugabyte/yugabyte-db/issues/5641#issuecomment-705754829

And to solve this exact issue, there was this diff - https://phabricator.dev.yugabyte.com/D9944

I am checking what went wrong after the above fix and will update with a diff here shortly.

pkj415 commented 3 years ago

As per @iSignal 's slack comment - is is true that a third function call argument (which provides file name prefix for key/cert files) is missing. While that can be fixed, let's take a step back - to discuss if we intend to have tls between postgres and tserver processes, and if so in what cases.

Some facts -

  1. use_node_to_node_encryption - this is the gflag to enforce use of tls for all communication between tserver<->tserver, master<->master and tserver<->master. The gflag is supposed to be set to the same value for both master and tserver processes. If they are not the same, tservers and master can't communicate with each other.
  2. node_to_node_encryption_use_client_certificates - this gflag enforces verification of certificates when using tls.
  3. use_client_to_server_encryption - used to enforce tls between external clients and postgres backend processes. If enabled, tablet_server_main sets three gucs in ysql_pg.conf so that postgres can read the key/cert files. The gucs are - ssl_cert_file, ssl_key_file, ssl_ca_file

Now in this issue, when we turn on use_node_to_node_encryption and node_to_node_encryption_use_client_certificates, the tserver processes expect to use tls even with the postgres side.

  1. Do we want to enforce tls communication between postgres and tserver processes?
  2. Answer to 1. depends on whether postgres backends connect just to the local tserver process or even remote tserver processes. We don't really need tls for local tserver comm, but do need it for remote comm. (This is controlled using this gflag - ysql_forward_rpcs_to_local_tserver)

@bmatican tagging since you might know what we need.