kubernetes / test-infra

Test infrastructure for the Kubernetes project.

S3 support in Prow #11260

Closed apesternikov closed 4 years ago

apesternikov commented 5 years ago

Currently GCS is the only storage option for logs in Prow.

**What would you like to be added:** It would be very useful to be able to use AWS/Azure blob services in addition to GCS.

**Why is this needed:** Some companies use non-GCS blob storage extensively, and it would be convenient if they could extend its use to Prow. Also, certain corporate IT policies/restrictions may prevent corporate users from adopting Prow as a CI/CD solution.

apesternikov commented 5 years ago

We are ready to start working on the implementation.

apesternikov commented 5 years ago

It feels reasonable to try the blob implementation from https://github.com/google/go-cloud as an abstraction layer. To configure the storage, we could deprecate GCSConfiguration (while keeping compatibility, of course) and replace it with the gs://my-bucket notation used in go-cloud. Alternatively, we could keep GCSConfiguration and add something like S3Configuration and AzureConfiguration alongside it, with a structure similar to GCSConfiguration. As a free benefit, go-cloud also has implementations for file:// and mem://, which could be useful for testing purposes. Thoughts?
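
For reference, a minimal sketch of how go-cloud's blob package abstracts providers behind URL schemes. `blob.OpenBucket`, `WriteAll`, and `ReadAll` are the real gocloud.dev/blob API; the mem:// bucket and the `logs/build.txt` key are just for illustration:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"gocloud.dev/blob"
	// Each driver registers its URL scheme (gs://, s3://, file://, mem://).
	_ "gocloud.dev/blob/fileblob"
	_ "gocloud.dev/blob/gcsblob"
	_ "gocloud.dev/blob/memblob"
	_ "gocloud.dev/blob/s3blob"
)

func main() {
	ctx := context.Background()
	// The same code path works for any registered provider; only the URL
	// changes, e.g. "gs://my-bucket" or "s3://my-bucket" instead of mem://.
	bucket, err := blob.OpenBucket(ctx, "mem://")
	if err != nil {
		log.Fatal(err)
	}
	defer bucket.Close()

	// Write and read back an object through the provider-agnostic API.
	if err := bucket.WriteAll(ctx, "logs/build.txt", []byte("hello"), nil); err != nil {
		log.Fatal(err)
	}
	data, err := bucket.ReadAll(ctx, "logs/build.txt")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(data))
}
```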

krzyzacy commented 5 years ago

I would prefer to see a design doc, and maybe propose it in the weekly sig-testing meeting - this sounds like it's going to be a big change.

cc @kubernetes/test-infra-maintainers

fejta commented 5 years ago

I would like to see us move towards more natural structures like gs://bucket/prefix URIs, with or without adding support for other clouds.

I would also like us to standardize on using prefix/org/repo rather than the current convoluted strategy (which, depending on whether it's the main repo or in the main org, often doesn't include the org).

A more detailed proposal we can review and comment on would be great.

rverma-nikiai commented 5 years ago

Are we accepting pull requests for S3? We could implement a blob_store_configuration interface to replace gcs_configuration, with AWS as one of the implementations.

stevekuznetsov commented 5 years ago

@rverma-nikiai sure thing! Let's start with a design document that would outline the approach you want to take, focusing on the interface that would unite the storage systems as well as the migration path for current deployments.

rverma-nikiai commented 5 years ago

@stevekuznetsov maybe @apesternikov's suggestion would be a better approach. I think the end goal is to reduce the direct dependency on the "cloud.google.com/go/storage" package. We could also implement a generic storage interface to achieve the same. I have created a small doc at https://docs.google.com/document/d/1lAEfYW7xkn1IrW3GKh1WsWb1jxMB4RMrLVsZvXkKBHA/edit.

I would be able to evolve it as per your guidelines :)

fejta commented 5 years ago

Thanks!

The k8s.io/test-infra/pkg/io package already provides methods to read/write to a path, transparently choosing GCS or the local filesystem as appropriate.

I would like to see this design evolve in that direction and extend this package to support cloud storage systems beyond GCS:

https://github.com/kubernetes/test-infra/blob/0dbf9f84c5424c5b70773448ab296d8ae97eb7ee/pkg/io/opener.go#L100-L122
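
Roughly, the dispatch idea in that package looks like the following sketch. The names and signatures here are hypothetical; the linked opener.go has the real code:

```go
package main

import (
	"context"
	"fmt"
	"io"
	"os"
	"strings"
)

// Opener mirrors the idea behind k8s.io/test-infra/pkg/io: one entry point
// for reading a path, dispatching on the path's scheme.
type Opener struct{}

func (Opener) Reader(ctx context.Context, path string) (io.ReadCloser, error) {
	if strings.HasPrefix(path, "gs://") {
		// The real package opens a GCS object reader here via
		// cloud.google.com/go/storage; elided in this sketch.
		return nil, fmt.Errorf("GCS not wired up in this sketch: %s", path)
	}
	// Anything without a recognized scheme is treated as a local file.
	return os.Open(path)
}

func main() {
	r, err := Opener{}.Reader(context.Background(), "/etc/hostname")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	defer r.Close()
	io.Copy(os.Stdout, r)
}
```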

fejta-bot commented 5 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/lifecycle stale

fejta-bot commented 5 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/lifecycle rotten

sbueringer commented 5 years ago

@fejta @apesternikov @krzyzacy @rverma-nikiai

I would like to take this over if nobody is currently working on it. If I understood correctly, the general direction would be to extend the pkg/io Opener to handle S3 as well as GCS. I'll play around with it a bit, try to get a working prototype, and open a new proposal for this.

stevekuznetsov commented 5 years ago

Sure thing! I think you've got the right direction there.

/remove-lifecycle rotten
/assign @sbueringer
/woof

sbueringer commented 5 years ago

I have a working prototype for s3 :).

I wrote up a proposal here: https://docs.google.com/document/d/13O1zRX5rpmPCqsPzukavRVqyh4pLp95IeIBVWAZx6Oo/edit?usp=sharing

alvaroaleman commented 5 years ago

Someone did, btw, start working on this for S3 but doesn't seem to have the time to finish up that work: https://github.com/kubernetes/test-infra/pull/13400 https://kubernetes.slack.com/archives/CDECRSC5U/p1564678637187100?thread_ts=1562678138.497900&cid=CDECRSC5U

sbueringer commented 5 years ago

@alvaroaleman Thx, good to know. Not sure what's easier: starting a series of PRs from my working state or going through this huge PR :/

sbueringer commented 5 years ago

@stevekuznetsov @fejta What would be a useful next step? One idea would be to open a first PR implementing S3 support for Tide, which would be a relatively small step, and the design could be discussed further in the PR. What do you think?

mikesplain commented 4 years ago

Great work on this @sbueringer and all. I was taking a look at your proposal and would be more than happy to help out. I was wondering what the state of your prototype (https://github.com/sbueringer/test-infra/tree/c445) is, as I get a number of errors when attempting to build with bazel. I'm doing some more testing to make sure it's not just my build and will post details later. Let me know if there's anything I can help with!

sbueringer commented 4 years ago

@mikesplain the current state is that we've been running it internally with S3 for a few weeks. Not sure about the bazel errors. I'm currently on vacation, but when there's enough support to get this reviewed, I'll start opening the first PRs in early January (afaik I proposed a few PRs in the doc). If you want to start opening PRs, be my guest 😀

mikesplain commented 4 years ago

Thanks @sbueringer! For those wishing to give this a try, I've gotten it running and plan to start more in depth testing as well.

To fix a few bazel issues, you will likely need to run `hack/update-bazel.sh`.

To save others some time, here are some of the steps I took; I'll write up something more detailed later. (I apologize if this is repetitive or off to those more familiar with the test-infra codebase. This assumes you've followed the basic getting-started guide and have a base understanding of the components and how to deploy them.)

```sh
# I was unable to compile gerrit and issue-creator on this branch,
# so I commented them out in prow/BUILD.bazel
hack/update-bazel.sh
# Point bazel to your docker registry
export PROW_REPO_OVERRIDE=gcr.io/<your repo>/k8s-prow
export EDGE_PROW_REPO=gcr.io/<your repo>/k8s-prow-edge
# Do the below or use prow/push.sh
bazel run //prow:release-push --platforms=@io_bazel_rules_go//go/toolchain:linux_amd64
# Bump your yaml manifests or use bump.sh to update your prow
prow/bump.sh
# Apply manifests or run the bazel command:
bazel run //prow/cluster:production.apply --platforms=@io_bazel_rules_go//go/toolchain:linux_amd64
```

You'll also need to add an AWS credentials secret and mount it into deck, tide, and a few other components. Right now it still uses the GCS flags. Create a gcs-credentials secret in your prow namespace and test-pods namespace with this content:

```json
{
  "region": "us-east-2",
  "access_key": "keyhere",
  "secret_key": "secrethere",
  "endpoint": "s3.amazonaws.com",
  "insecure": false,
  "s3_force_path_style": true,
  "bucket": "s3://s3bucketname"
}
```
stevekuznetsov commented 4 years ago

Excited to see a PR to merge that tree in!

tmcgahern commented 4 years ago

Any progress on creating a PR for this issue? We are trying to deploy Prow on an AWS cluster and we would like to use the decorator for our prowjobs, but it's failing because the initupload container requires a GCS bucket.

sbueringer commented 4 years ago

Unfortunately, I didn't have much time over the last few weeks. The goal is to refactor step by step. The first PR "only" refactors the pkg/io package and enables tide and status-reconciler to use S3: https://github.com/kubernetes/test-infra/pull/16317

I created a PR in my fork to show the differences from the full-S3-support version I'm using in our environment: https://github.com/sbueringer/test-infra/pull/2 (the full-S3-support version currently completely ignores migration aspects, e.g. configuration, but the main components I'm using work).

sbueringer commented 4 years ago

The first PR is merged, and tide, gerrit & status-reconciler can now be used with S3. The second PR, for gcsupload, initupload and sidecar, can be found here: https://github.com/kubernetes/test-infra/pull/16722

fejta commented 4 years ago

Great work! Looking forward to getting the rest of this upstream too :D

sbueringer commented 4 years ago

The next PR is merged; S3 is now also supported by initupload and sidecar. The next PR will focus on deck/spyglass. This makes it possible to use all core components with S3 (I guess "core" depends on your point of view).

@fejta @alvaroaleman What other components are currently using GCS? I only know about crier.

alvaroaleman commented 4 years ago

yeah, Deck/Spyglass and crier seem to be the remaining ones:

```
$ ack -Q cloud.google.com/go/storage prow/
prow/cmd/deck/main.go
38: "cloud.google.com/go/storage"

prow/cmd/deck/job_history.go
32: "cloud.google.com/go/storage"

prow/cmd/deck/pr_history.go
30: "cloud.google.com/go/storage"

prow/cmd/crier/main.go
26: "cloud.google.com/go/storage"

prow/cmd/gerrit/main_test.go
31: "cloud.google.com/go/storage"

prow/crier/reporters/gcs/internal/util/util.go
26: "cloud.google.com/go/storage"

prow/crier/reporters/gcs/kubernetes/reporter.go
34: "cloud.google.com/go/storage"

prow/crier/reporters/gcs/reporter.go
27: "cloud.google.com/go/storage"

prow/pod-utils/gcs/upload.go
28: "cloud.google.com/go/storage"

prow/spyglass/gcsartifact_test.go
27: "cloud.google.com/go/storage"

prow/spyglass/testgrid.go
25: "cloud.google.com/go/storage"

prow/spyglass/spyglass.go
31: "cloud.google.com/go/storage"

prow/spyglass/gcsartifact.go
25: "cloud.google.com/go/storage"

prow/spyglass/gcsartifact_fetcher.go
32: "cloud.google.com/go/storage"

prow/tide/history/history_test.go
29: "cloud.google.com/go/storage"
```

ramirezag commented 4 years ago

Hi @sbueringer,

First of all, amazing job on implementing https://github.com/kubernetes/test-infra/pull/16722. If you don't mind, would you please support the default credential provider chain (see the diff below)? Static credentials are not possible in our case, since we only have a federated account with MFA enabled; we heavily rely on the built-in providers.

```diff
----------------------- prow/io/providers/providers.go ------------------------
index eac46a8c0..3911918a9 100644
@@ -94,11 +94,13 @@ func getS3Bucket(ctx context.Context, creds []byte, bucketName string) (*blob.Bu
     if err := json.Unmarshal(creds, s3Credentials); err != nil {
         return nil, fmt.Errorf("error getting S3 credentials from JSON: %v", err)
     }
-
-   staticCredentials := credentials.NewStaticCredentials(s3Credentials.AccessKey, s3Credentials.SecretKey, "")
+   var configCreds *credentials.Credentials
+   if s3Credentials.AccessKey != "" && s3Credentials.SecretKey != "" {
+       configCreds = credentials.NewStaticCredentials(s3Credentials.AccessKey, s3Credentials.SecretKey, "")
+   }

    sess, err := session.NewSession(&aws.Config{
-       Credentials:      staticCredentials,
+       Credentials:      configCreds,
        Endpoint:         aws.String(s3Credentials.Endpoint),
        DisableSSL:       aws.Bool(s3Credentials.Insecure),
        S3ForcePathStyle: aws.Bool(s3Credentials.S3ForcePathStyle),
```

sbueringer commented 4 years ago

@ramirezag I assume you're running Prow in AWS. Did you try using it without specifying an s3-credentials-file? I couldn't try it, but it should auto-discover everything (this should be the default in gocloud: https://github.com/kubernetes/test-infra/blob/master/prow/io/providers/providers.go#L71). The io package falls back to gocloud's default discovery if no s3-credentials-file is specified.

Of course, this assumes that you don't need anything else from the configuration file.
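
To illustrate the fallback being described, a minimal aws-sdk-go (v1) sketch: when no explicit Credentials are set in the config, the SDK walks its default provider chain (env vars, shared credentials file, IAM role credentials). The region here is just an example value:

```go
package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
)

func main() {
	// No Credentials field set: the default provider chain is used.
	sess, err := session.NewSession(&aws.Config{
		Region: aws.String("eu-central-1"), // illustrative region
	})
	if err != nil {
		log.Fatal(err)
	}
	// Resolve credentials once to show which provider supplied them.
	creds, err := sess.Config.Credentials.Get()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("credentials resolved from provider:", creds.ProviderName)
}
```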

ramirezag commented 4 years ago

@sbueringer Yes, we're using Prow in AWS. OK, I will test without specifying an s3-credentials-file.

ramirezag commented 4 years ago

@sbueringer GCS or S3 credentials are required when using pod-utilities.

sbueringer commented 4 years ago

@ramirezag I think that check should be removed. Can you test against a version without this check?

ramirezag commented 4 years ago

@sbueringer I honestly haven't figured out how to build or generate the prow docker images - I'm still waiting for a reply in the #prow channel. So I cannot test what you suggested.

ramirezag commented 4 years ago

Removed `if o.GcsCredentialsFile == "" && o.S3CredentialsFile == ""` in prow/gcsupload/options.go and prow/apis/prowjobs/v1/types.go, and now I get:

```
{"component":"initupload","error":"failed to upload to blob storage: failed to upload to blob storage: encountered errors during upload: [writer close error: blob (code=Unknown): MissingRegion: could not find region configuration writer close error: blob (code=Unknown): MissingRegion: could not find region configuration upload error: writer close error: blob (code=Unknown): MissingRegion: could not find region configuration]","file":"prow/cmd/initupload/main.go:45","func":"main.main","level":"fatal","msg":"Failed to initialize job","time":"2020-04-13T20:35:30Z"}
```

This is my config:

```yaml
plank:
  pod_pending_timeout: 15m
  pod_unscheduled_timeout: 1m
  default_decoration_configs:
    '*':
      timeout: 2h
      grace_period: 15s
      utility_images:
        clonerefs: "{{ .Values.prow_component_image_repo }}/clonerefs:{{ .Values.prow_component_image_tag }}"
        initupload: "{{ .Values.prow_component_image_repo }}/initupload:{{ .Values.prow_component_image_tag }}"
        entrypoint: "{{ .Values.prow_component_image_repo }}/entrypoint:{{ .Values.prow_component_image_tag }}"
        sidecar: "{{ .Values.prow_component_image_repo }}/sidecar:{{ .Values.prow_component_image_tag }}"
      gcs_configuration:
        bucket: s3://somebucket?region=eu-central-1
        path_strategy: "single"
        default_org: org
        default_repo: repo
```

alvaroaleman commented 4 years ago

@ramirezag can you please open a dedicated issue for the S3 default credential provider chain and keep this one to "Add S3/Azure support to Prow where we currently support GCS only"?

sbueringer commented 4 years ago

prow: deck: add s3 support #17183 is almost merged.

Some follow-ups for the deck PR:

New PR for crier: prow: crier: add s3 support #17618

jhohertz commented 4 years ago

Hello! Ridiculously excited at the prospect of trying out Prow in AWS; I've had my eye on this effort for a while.

Looks like everything pending has been merged. Curious if it's pretty much good to go now?

alvaroaleman commented 4 years ago

@jhohertz yes, S3 support is finished, check out config/prow/cluster/starter-s3.yaml.

/retitle S3 support in Prow

@apesternikov I will close this issue as the S3 support is finished. If you (or anyone else, for that matter) are still interested in Azure support, please open a dedicated issue for it. It should be much easier now that we have more than one supported provider.

alvaroaleman commented 4 years ago

/close

sbueringer commented 4 years ago

@jhohertz Yup, should be good to go. I, and I guess a few other people, have been running this with S3 for months now (without any custom patches). I'm only using it with MinIO, but we had 1-2 PRs for AWS, so I guess it also works there.

mikesplain commented 4 years ago

We've been using it with native AWS S3 for a few months without issue. Thanks to everyone for the hard work!

jhohertz commented 4 years ago

Awesome! Thanks for the pointers on getting started; looking forward to diving in.