root-gg / plik

Plik is a temporary file upload system (Wetransfer like) in Go.
https://plik.root.gg

Docker: offer to configure S3 storage by environment variables #450

Open oupala opened 2 years ago

oupala commented 2 years ago

It looks like it is not currently possible to configure S3 storage via environment variables. The only way to configure an S3 storage is by editing the plikd.cfg config file.

When you are using a Kubernetes cluster that has an S3 operator, the S3 bucket is dynamically provisioned on startup and credentials are made accessible as environment variables (mainly via configMaps).

It would be great if plik could retrieve its S3 credentials from environment variables (e.g. from configMaps) so that it can be dynamically linked to the S3 bucket. Any setting also provided as an environment variable would then override the corresponding setting in plikd.cfg.

I think this requires a change in the plikd binary so that variables can be read from the environment in addition to the configuration file. Am I right?

camathieu commented 2 years ago

Hello,

I think that this is already possible. You should be able to pass a JSON config to the PLIKD_DATA_BACKEND_CONFIG environment variable. See: https://github.com/root-gg/plik#configuration-

oupala commented 2 years ago

A config file is not really the same thing as an environment variable.

A config file provides all variables at once in a single file.

I was expecting to be able to pass each variable as an individual environment variable. This is especially useful when the S3 bucket is being provisioned by a K8S operator that makes all credentials available as environment variables.

camathieu commented 2 years ago

Each configuration parameter can be overridden using an environment variable, as follows:

One can specify configuration parameters using environment variables, with the configuration parameter name in screaming snake case:

PLIKD_DEBUG_REQUESTS=true ./plikd

For arrays and config maps, values must be provided in JSON format. Arrays are overridden, but maps are merged:

PLIKD_DATA_BACKEND_CONFIG='{"Directory":"/var/files"}' ./plikd

Is having to pass the whole data backend config as JSON in a single environment variable an issue?

camathieu commented 2 years ago

If needed, we could improve the environment variable parser to understand things like:

PLIKD_DATA_BACKEND_CONFIG_DIRECTORY="/var/files"
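Such a parser improvement could look roughly like this in Go (a hypothetical sketch, not existing Plik code; `toScreamingSnake` and `overrideFromEnv` are invented names):

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"unicode"
)

// toScreamingSnake converts a Go-style field name like "AccessKeyID"
// to the env-variable form "ACCESS_KEY_ID", keeping acronym runs
// (ID, SSL, SSE) together.
func toScreamingSnake(name string) string {
	var b strings.Builder
	runes := []rune(name)
	for i, r := range runes {
		if i > 0 && unicode.IsUpper(r) {
			prevLower := unicode.IsLower(runes[i-1])
			nextLower := i+1 < len(runes) && unicode.IsLower(runes[i+1])
			if prevLower || nextLower {
				b.WriteRune('_')
			}
		}
		b.WriteRune(unicode.ToUpper(r))
	}
	return b.String()
}

// overrideFromEnv replaces each known config key with the value of
// PLIKD_DATA_BACKEND_CONFIG_<KEY>, if that variable is set. As a
// simplification it only looks at keys already present in the map.
func overrideFromEnv(config map[string]interface{}) {
	for key := range config {
		env := "PLIKD_DATA_BACKEND_CONFIG_" + toScreamingSnake(key)
		if value, ok := os.LookupEnv(env); ok {
			config[key] = value
		}
	}
}

func main() {
	config := map[string]interface{}{
		"Endpoint":    "127.0.0.1:9000",
		"AccessKeyID": "access_key_id",
	}
	os.Setenv("PLIKD_DATA_BACKEND_CONFIG_ACCESS_KEY_ID", "from-env")
	overrideFromEnv(config)
	fmt.Println(config["Endpoint"], config["AccessKeyID"])
	// prints "127.0.0.1:9000 from-env"
}
```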

oupala commented 2 years ago

Here is an extract from the plikd.cfg config file:

   DataBackend  = "s3"
   [DataBackendConfig]
       Endpoint = "127.0.0.1:9000"
       AccessKeyID = "access_key_id"
       SecretAccessKey = "access_key_secret"
       Bucket = "plik"
       Location = "us-east-1"
       Prefix = ""
       UseSSL = true
        PartSize = 16000000 // Chunk size when the file size is not known (defaults to 16MB).
                            // Multiply by 10000 to get the max upload file size (max 160GB).
       SSE = ""  // the following encryption methods are available :
                 //  - SSE-C: server-side-encryption with customer provided keys ( managed by Plik )
                 //  - S3:    server-side-encryption using S3 storage encryption ( managed by the S3 backend )

As far as I understand, some of these variables can be set via environment variables:

  • DataBackend => PLIKD_DATA_BACKEND

But for the following variables, I think an improvement of the data parser would be required:

  • [DataBackendConfig]
    • Endpoint => PLIKD_DATA_BACKEND_CONFIG_ENDPOINT
    • AccessKeyID => PLIKD_DATA_BACKEND_CONFIG_ACCESS_KEY_ID
    • SecretAccessKey => PLIKD_DATA_BACKEND_CONFIG_SECRET_ACCESS_KEY
    • Bucket => PLIKD_DATA_BACKEND_CONFIG_BUCKET
    • Location => PLIKD_DATA_BACKEND_CONFIG_LOCATION
    • Prefix => PLIKD_DATA_BACKEND_CONFIG_PREFIX
    • UseSSL => PLIKD_DATA_BACKEND_CONFIG_USE_SSL
    • PartSize => PLIKD_DATA_BACKEND_CONFIG_PART_SIZE
    • SSE => PLIKD_DATA_BACKEND_CONFIG_SSE

In fact, the S3 operator dynamically creates some environment variables: the endpoint, the access key ID, the secret access key, and the bucket name. It would not be possible to pass a JSON file, as the S3 operator does not provide one, but only individual environment variables.

camathieu commented 2 years ago

As of now you can already pass the data backend config as a JSON string (not a JSON file) with the data backend settings. I'll see if I can implement the improvement you described.


oupala commented 2 years ago

> As of now you can already pass the data backend config as a JSON string (not a JSON file) with the data backend settings. I'll see if I can implement the improvement you described.

Is that behavior documented?

How would you do that?

camathieu commented 2 years ago

For example, this in the plikd.cfg config file:

DataBackend  = "s3"
[DataBackendConfig]
    Endpoint = "127.0.0.1:9000"
    AccessKeyID = "access_key_id"
    SecretAccessKey = "access_key_secret"
    Bucket = "plik"
    Location = "us-east-1"

would look like this using environment variables:

export PLIKD_DATA_BACKEND="s3"
export PLIKD_DATA_BACKEND_CONFIG='{"Endpoint":"127.0.0.1:9000", "AccessKeyID":"access_key_id", "SecretAccessKey":"access_key_secret", "Bucket":"plik", "Location":"us-east-1"}'

As maps/dicts are merged, you could specify "safe" parameters like Endpoint or Bucket in the config file and pass only the "secret" parameters using an environment variable, like this:

export PLIKD_DATA_BACKEND_CONFIG='{"SecretAccessKey":"access_key_secret"}'

oupala commented 2 years ago

Thanks for the quick reply.

It would be great if the content of the previous comment were added to the documentation.

There is still a use case where individual environment variables would be required for streamlined automation with Kubernetes.

oupala commented 2 years ago

For example, when our S3 K8S operator creates a new bucket, the operator creates a configMap and a secret that become available in the K8S namespace.

The best solution would be for plik to be able to use these predefined environment variables via the following deployment syntax:

- name: PLIKD_DATA_BACKEND_CONFIG_ENDPOINT
  valueFrom:
    configMapKeyRef:
      name: <bucket-name>-configmap
      key: BUCKET_HOST
- name: PLIKD_DATA_BACKEND_CONFIG_ACCESS_KEY_ID
  valueFrom:
    secretKeyRef:
      name: <bucket-name>-secret
      key: AWS_ACCESS_KEY_ID
[...]
# and so on for all other variables

mattjhammond commented 1 year ago

I've been trying, unsuccessfully, to get an AWS S3 backend working. I'm not sure what the Endpoint should be set to, and should Prefix be set?

Endpoint = "127.0.0.1:9000"
AccessKeyID = "mykey"
SecretAccessKey = "mysecretkey"
Bucket = "mytestbucket"
Location = "us-east-1"
Prefix = ""
UseSSL = true
PartSize = 16000000 #// Chunk size when file size is not known. (default to 16MB)
                    #// Multiply by 10000 to get the max upload file size (max upload file size 160GB)
SSE = "S3"  #// the following encryption methods are available :

unable to start Plik server : unable to initialize data backend : unable to check if bucket mytestbucket exists : Get "https://127.0.0.1:9000/mytestbucket/?location=": dial tcp 127.0.0.1:9000: connect: connection refused

bodji commented 1 year ago

The endpoint is the AWS S3 URL.

You should find the one corresponding to the region you want here: https://docs.aws.amazon.com/general/latest/gr/s3.html

Let us know if you make any progress.
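For instance, for a bucket in us-east-1, the relevant part of plikd.cfg would look something like this (a sketch only: bucket name and credentials are placeholders, and s3.us-east-1.amazonaws.com is the standard regional S3 endpoint from the AWS page above):

```
Endpoint = "s3.us-east-1.amazonaws.com"
AccessKeyID = "mykey"
SecretAccessKey = "mysecretkey"
Bucket = "mytestbucket"
Location = "us-east-1"
UseSSL = true
```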

mattjhammond commented 1 year ago

@bodji Thanks for the endpoint assistance, the service is working now!

oupala commented 1 year ago

@mattjhammond Next time, please open a new issue, as your issue is not the same as the one in the title. One thread = one subject => clearer discussions.