teamhephy / postgres


support the s3 endpoint config option #3

Closed: rwos closed this issue 3 years ago

rwos commented 6 years ago

part of https://github.com/teamhephy/workflow/issues/52

kingdonb commented 5 years ago

So this looks kind of ham-handed, but it will get the job done for anyone who is trying to use AWS_ENDPOINT instead of AWS_REGION.
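To make the intent concrete, here is a minimal sketch (not the actual patch) of what "connect by endpoint instead of by region" looks like in boto2, which is the library the create_bucket script uses; the variable names and endpoint format here are illustrative:

```python
# Illustrative sketch only, not the repo's code: use an explicit S3 endpoint
# when one is configured, otherwise fall back to an AWS region lookup.
import os
from urllib.parse import urlparse

import boto.s3
from boto.s3.connection import S3Connection, OrdinaryCallingFormat

endpoint = os.environ.get("S3_ENDPOINT", "")   # e.g. "https://minio.example.com:9000" (assumed)
region = os.environ.get("AWS_REGION", "us-east-1")

if endpoint:
    parsed = urlparse(endpoint)
    conn = S3Connection(
        aws_access_key_id=os.environ.get("AWS_ACCESS_KEY_ID"),
        aws_secret_access_key=os.environ.get("AWS_SECRET_ACCESS_KEY"),
        host=parsed.hostname,
        port=parsed.port,
        is_secure=(parsed.scheme == "https"),
        calling_format=OrdinaryCallingFormat(),  # path-style URLs suit non-AWS S3 implementations
    )
else:
    # boto.s3.connect_to_region returns None if the region name is not recognized
    conn = boto.s3.connect_to_region(region)
```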

I would like to see a design document to go with this change per https://docs.teamhephy.info/contributing/design-documents/

(We should nonetheless maintain a merged branch for those with S3 endpoints that are not in AWS regions, IMHO)

rwos commented 5 years ago

@kingdonb I put a design doc here: https://github.com/teamhephy/workflow/issues/72 - let me know if that makes sense, or if anything is missing!

Cryptophobia commented 5 years ago

@rwos, you will have to rebase from master, as we just merged #4, where the base image and some scripts changed when we switched to the official postgresql image.

rwos commented 5 years ago

@Cryptophobia rebased!

Cryptophobia commented 5 years ago

Thanks, I hope to get to this soon.

kingdonb commented 5 years ago

I will be happy to test this on DigitalOcean tonight, if I can get the cluster to deploy this time.

Cryptophobia commented 5 years ago

Let us know how it goes @kingdonb :popcorn: :movie_camera:

kingdonb commented 5 years ago

If I don't get around to the testing this evening, then definitely this weekend. This is the only thing stopping me from declaring DigitalOcean K8s support as "basically production-ready," in an experimental sense at least.

kingdonb commented 5 years ago

I am excited to test this! Can anyone confirm that they've already shown this to be working in a cluster? I'm assuming from the discussion that it is so, but I haven't seen it for myself yet.

There are a number of components that demand an S3 API, and they do not all work in the same way; I'd like to test them all together, if possible. But this is the big one.

rwos commented 5 years ago

@kingdonb FWIW, I did have all the parts running in a cluster, and I did run the end-to-end tests against them, with an off-cluster S3 (Virtuozzo Storage at first, currently Minio). It was a non-production setup, though, so it's entirely possible that there's still stuff I have overlooked.

kingdonb commented 5 years ago

I'm building it now... I just noticed DEIS_REGISTY in the Makefile

yebyen commented 5 years ago

2018-11-22 22:18:53.001 UTC [35] LOG:  database system is ready to accept connections
 done
server started
CREATE DATABASE

CREATE ROLE

/docker-entrypoint.sh: running /docker-entrypoint-initdb.d/001_setup_envdir.sh

/docker-entrypoint.sh: running /docker-entrypoint-initdb.d/002_create_bucket.sh
Traceback (most recent call last):
  File "/bin/create_bucket", line 47, in <module>
    if not bucket_exists(conn, bucket_name):
  File "/bin/create_bucket", line 28, in bucket_exists
    bucket = conn.lookup(name)
AttributeError: 'NoneType' object has no attribute 'lookup'

It is possible that I built it wrong, or made some other configuration error... I still don't have CI in place to test a change like this.
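For what it's worth, a conn of None at that point usually just means boto's region lookup came back empty; a tiny illustration (not the repo's code):

```python
# boto2 returns None rather than raising when it does not recognize a region
# name, which is what then produces the AttributeError above.
import boto.s3

conn = boto.s3.connect_to_region("not-a-real-region")
print(conn)  # None -> conn.lookup(...) fails with 'NoneType' object has no attribute 'lookup'
```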

I will try your prebuilt images, @rwos

yebyen commented 5 years ago

I tried again after reading the patch and realizing I probably was meant to include "https://" in my S3_ENDPOINT value.

This time I get (after some time elapses):

CREATE DATABASE

CREATE ROLE

/docker-entrypoint.sh: running /docker-entrypoint-initdb.d/001_setup_envdir.sh

/docker-entrypoint.sh: running /docker-entrypoint-initdb.d/002_create_bucket.sh
Traceback (most recent call last):
  File "/bin/create_bucket", line 52, in <module>
    conn.create_bucket(bucket_name, location=region)
  File "/usr/local/lib/python3.5/dist-packages/boto/s3/connection.py", line 619, in create_bucket
    data=data)
  File "/usr/local/lib/python3.5/dist-packages/boto/s3/connection.py", line 671, in make_request
    retry_handler=retry_handler
  File "/usr/local/lib/python3.5/dist-packages/boto/connection.py", line 1071, in make_request
    retry_handler=retry_handler)
  File "/usr/local/lib/python3.5/dist-packages/boto/connection.py", line 1030, in _mexe
    raise ex
  File "/usr/local/lib/python3.5/dist-packages/boto/connection.py", line 943, in _mexe
    request.body, request.headers)
  File "/usr/lib/python3.5/http/client.py", line 1107, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python3.5/http/client.py", line 1152, in _send_request
    self.endheaders(body)
  File "/usr/lib/python3.5/http/client.py", line 1103, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python3.5/http/client.py", line 934, in _send_output
    self.send(msg)
  File "/usr/lib/python3.5/http/client.py", line 877, in send
    self.connect()
  File "/usr/lib/python3.5/http/client.py", line 1261, in connect
    server_hostname=server_hostname)
  File "/usr/lib/python3.5/ssl.py", line 385, in wrap_socket
    _context=self)
  File "/usr/lib/python3.5/ssl.py", line 760, in __init__
    self.do_handshake()
  File "/usr/lib/python3.5/ssl.py", line 996, in do_handshake
    self._sslobj.do_handshake()
  File "/usr/lib/python3.5/ssl.py", line 641, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:720)

At this point I believe there is at least some fine-tuning needed, or documentation about the feature... if I remove the "https" and use an unencrypted s3 endpoint, it does appear to reach the bucket.

I exec'd into the container to see if I could diagnose it, and since we've used the upstream image, I can see that wget and curl are missing. Fortunately the failure takes some time to time out, so I had enough time to install them... and to find that the ca-certificates package is missing.

Without the encryption, I get: ssl.SSLError: [SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:720)

I'd say the client is trying to enforce SSL, which is probably good. So chances are, simply adding the ca-certificates package to this image is going to resolve the situation.
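Assuming the base image is alpine-based, the fix would be roughly a one-liner in the Dockerfile; this is a sketch of the idea, not the exact change:

```dockerfile
# Sketch only: install the CA bundle so boto can verify the S3 endpoint's TLS certificate.
# The base image name is an assumption here.
FROM postgres:11-alpine
RUN apk add --no-cache ca-certificates
```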

yebyen commented 5 years ago

That was the issue; now I have:

<?xml version="1.0" encoding="UTF-8"?><Error><Code>InvalidArgument</Code><Message>Server Side Encryption with KMS managed key requires HTTP header x-amz-server-side-encryption : aws:kms</Message><BucketName>hephy-rocks-deis-database</BucketName><RequestId>tx00000000000000616c94e-005bf87191-79b1a-sfo2a</RequestId><HostId>79b1a-sfo2a-sfo</HostId></Error>

It's going to be hard to ensure the database backup is KMS-encrypted when I'm using DigitalOcean Spaces... there is a line in rootfs/patcher-script.d/patch_wal_e_s3.py which requests this encryption if you asked for "s3" in DATABASE_STORAGE, which I assume is set from global.storage in values.yaml.

I've just converted it to a blanket "False" for my own testing, and we'll see if that makes it usable. I wonder if this has changed at all in newer versions of boto?

yebyen commented 5 years ago

2018-11-23 21:40:05.977 UTC [1] LOG: database system is ready to accept connections

It's unencrypted, but it works! The code is not mergeable though; we need a way to detect that we're using something other than AWS, so that KMS encryption is not requested.

I'm afraid of the idea of making an S3_REGION setting that amounts to saying "I am not using AWS, so don't encrypt my database backups", though, because no matter how many times we write "ONLY SET S3_REGION IF YOU ARE NOT USING AWS" in the values.yaml, someone is going to set it anyway, and they'll probably prefer to have their backups encrypted, since the backups may store some sensitive bits.
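One way out of that, sketched only (this is not what the patch does): infer "not AWS" from the endpoint host itself rather than from a separate flag, and only request KMS when the host really is AWS:

```python
# Sketch only, not the repo's code: request KMS server-side encryption only when
# the configured endpoint actually points at AWS S3. Assumes the endpoint value
# carries a scheme, e.g. "https://nyc3.digitaloceanspaces.com".
from urllib.parse import urlparse

def wants_kms(endpoint):
    if not endpoint:                              # no custom endpoint -> assume real AWS S3
        return True
    host = urlparse(endpoint).hostname or ""
    return host == "s3.amazonaws.com" or host.endswith(".amazonaws.com")
```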

duanhongyi commented 5 years ago

@yebyen @kingdonb #5

ca-certificates has been added to postgres:11-alpine.