hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0
9.74k stars 9.1k forks source link

Add SageMaker support #2493

Closed jckuester closed 4 years ago

jckuester commented 6 years ago

Amazon has announced the new service Sage​Maker to build, train, and deploy machine learning models at scale.

Prerequisite

aws-sdk-go v1.12.36 (https://github.com/terraform-providers/terraform-provider-aws/pull/2474)

Affected Resource(s)

The following resources are needed.

For building model and training:

For the deployment/hosting part:

Expected Behavior

Create, update, delete, and import for above listed SageMaker resources of deployment part:

resource "aws_sagemaker_endpoint" "foo" {
    name = "endpoint-foo"
    endpoint_config_name = "${aws_sagemaker_endpoint_configuration.ec.name}"
}

resource "aws_sagemaker_endpoint_configuration" "ec" {
    name = "endpoint-config-foo"

    production_variants {
                variant_name            = "variant-1"
        model_name = "${aws_sagemaker_model.m.name}"
                initial_instance_count  = 1
                instance_type           = "m3.xlarge"
                initial_variant_weight  = 1
    }
}

resource "aws_sagemaker_model" "m" {
    name = "my-model"

    primary_container {
       image = "111111111111.ecr.us-west-2.amazonaws.com/my-docker-image:latest"
       model_data_url  = "s3://111111111111-foo/model.tar.gz"
    }
}

Example

A terraform example that shows how to use the above resources to deploy your own model is here: https://github.com/terraform-providers/terraform-provider-aws/pull/2585

References

darrenhaken commented 6 years ago

Has anyone picked this one up yet?

jckuester commented 6 years ago

@darrenhaken yes, I am working on it. See the linked PRs in the issue :-)

darrenhaken commented 6 years ago

@jckuester I can't see a PR for the notebook instances, are you working on that too?

jckuester commented 6 years ago

No, please start working on them if you like :)

ddcprg commented 6 years ago

Hi @jckuester @darrenhaken , anyone working on the training job resource? I'd like to contribute to this if possible

jckuester commented 6 years ago

@ddcprg I am only working on the linked PRs above (i.e. for deployment). So please contribute the training job resource if you like.

ddcprg commented 6 years ago

cool @jckuester ! I'll be taking a look soon and comment back on this ticket with an initial prototype

ddcprg commented 6 years ago

@jckuester I now you've done this in your PR but I've raised #2924 just to add the vendor as per the README

I have the training job nearly done, I'm working on testing and I'll add the PR to this ticket as soon as I consider it ready

ddcprg commented 6 years ago

Training Job PR -> #2955

If no one has picked up the notebook resource yet I'll take it

ddcprg commented 6 years ago

Notebook Instance PR -> #2999

ddcprg commented 6 years ago

How can we help to put some traction on these PR's? I'm going to go over the code and add any new features included by AWS recently if needed

ddcprg commented 6 years ago

@jckuester would you mind adding a new item to the list above to include support for "lifecycle configurations"? I'll take a look and post the PR number later

ddcprg commented 6 years ago

@randomcamel @radeksimko how can we help to push these PR's forward?

smrtslckr commented 6 years ago

+1, very interested in this functionality

ashleyjkell commented 6 years ago

Not been any movement on this for a while (looks like failed integration tests), any chance of getting it pushed on?

ddcprg commented 6 years ago

This has been open for long time, we are still waiting feedback from Hashicorp. The failures in the build are due to other issues in master at the time of merging, they may well have gone away by now

jckuester commented 6 years ago

@ashleyjkell from time to time, I rebased my PRs (https://github.com/terraform-providers/terraform-provider-aws/pull/2477 - https://github.com/terraform-providers/terraform-provider-aws/pull/2479) to make integration test green again (I guess they still are). But as long as there is no feedback coming from HashiCorp I will not wasting any more energy here to keeping PRs uptodate/resolving conflicts with main branch.

rainmanh commented 5 years ago

Any updates on this one? Anyone from Hashicorp?

chasmosis commented 5 years ago

I am interested in this functionality as well. Updates?

ddcprg commented 5 years ago

I've just updated my PR's so they are green again. It'd be nice to have some feedback from @bflad or other TF committers. The PR's have been opened for about 10 months now

bcatubig commented 5 years ago

@bflad Is there any way to get these PR's approved and merged?

t-lanigan commented 5 years ago

Also adding my voice to this one. Interested in getting this merged

mbfrahry commented 5 years ago

Hello! I'm sorry for the delay and I'm actively looking at getting SageMaker support merged into the aws provider.

mbfrahry commented 5 years ago

I've just finished my review of the instance and training job resources and will be moving down the list noted here shortly.

ddcprg commented 5 years ago

awesome @mbfrahry ! I'll try to go through you review today and make the necessary changes

jckuester commented 5 years ago

Thanks @mbfrahry! I'll try find to time on the weekend/next week to go through your reviews.

reynico commented 5 years ago

Hi! We're using Terraform v0.11.3. Do we need to update it to last Terraform version in order to use this new module (when merged) or it will get sync'd during terraform init (keeping the 0.11.3)?

mbfrahry commented 5 years ago

Hey @reynico, you won't have to do anything with modifying the version of Terraform. If you're pinned to a specific version of the aws provider, then you'll have to modify that to get these changes when they're released.

rainmanh commented 5 years ago

Any estimate at all when this will be released?

ddcprg commented 5 years ago

Hello everyone. Unfortunately I've not been able to do the modifications as per @mbfrahry 's review and right now my access to AWS is limited and won't be able to test the code with the changes.

If anyone is willing to pick the first two bullets in the description up that'd be great, the PR is #2999 Perhaps @jckuester ? Whoever takes this on can squash my commits or start from scratch if you prefer.

mbfrahry commented 5 years ago

Hey @ddcprg, sorry to hear about your troubles. I'll take this on with @tracypholmes if it hasn't been picked up yet

ddcprg commented 5 years ago

Sure @mbfrahry pick it up, apologies for the inconveniences

bflad commented 5 years ago

The new aws_sagemaker_notebook_instance resource has been released in version 1.56.0 of the AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

smrtslckr commented 5 years ago

That is a great milestone! I am sorry to be "that person", but I have been more interested in the 3 outstanding deployment and hosting PRs: 2477, 2478, and 2479. Is there a current ETA on those now too?

jckuester commented 5 years ago

@Erstwild we are very close :) I addressed all comments on the sagemaker PRs (which are in the 2nd feedback round now) and am now waiting for approval.

rainmanh commented 5 years ago

@julesjcraske Is there any plan for implementing the ability of disabling the Direct internet access And what about of attaching Lifecycle configurations ?

Many thanks

bcatubig commented 5 years ago

I would say the following are required in a followup pr.

saritajoshi9389 commented 5 years ago

Hi there, Do we have any specific roadmap or timelines by which we are certain about PRs: 2477, 2478, and 2479? Along with attaching Lifecycle configurations ?

bflad commented 5 years ago

The new aws_sagemaker_model resource has been released in version 1.58.0 of the AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

jckuester commented 5 years ago

Added a new PR for the aws_sagemaker_lifecycle_config resource (https://github.com/terraform-providers/terraform-provider-aws/pull/7585). I also have updates for the aws_sagemaker_notebook_instance in the pipeline (@saritajoshi9389, @bcatubig), as we need those resources in our company @yoyolabsio.

bcatubig commented 5 years ago

Added a PR for DirectInternetAccess support #7884 . Would love some feedback to see if things could be improved.

jckuester commented 5 years ago

Here is another PR (https://github.com/terraform-providers/terraform-provider-aws/pull/8011) that adds all missing attributes to aws_sagemaker_instance, ie. adapts the resource to the latest AWS API.

bflad commented 5 years ago

The aws_sagemaker_endpoint_configuration resource has been released in version 2.4.0 of the Terraform AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

gattil commented 5 years ago

It would be really great to have the notebook lifecycle implemented. This is crucial for one of my clients.

bflad commented 5 years ago

Hi folks 👋 The following will be releasing in version 2.5.0 of the Terraform AWS Provider shortly:

rverma-nikiai commented 5 years ago

@bflad Is this released?

tdmalone commented 5 years ago

@rverma-nikiai The resources available for the AWS provider can be found at https://www.terraform.io/docs/providers/aws/ - if you type 'sagemaker' in the search box at the top left the resources mentioned above (and the new argument) have indeed been released.

bcatubig commented 5 years ago

Created a new PR for DirectInternetAccess support as I nuked my previous PR by mistake. The new PR is https://github.com/terraform-providers/terraform-provider-aws/pull/8618

LarsNeR commented 5 years ago

Is it possible to change the timeout for creating the sagemaker endpoint? For me it stops after 1m saying

ResourceNotReady: failed waiting for successful resource state

Usually creating the endpoint I have takes around 15 minutes. Adding timeouts { create = "30m" delete = "30m" } like it is possible on other resources gives me:

Error decoding timeout: Timeout Key (create) is not supported

bflad commented 5 years ago

Hi @LarsNeR 👋 My recommendation would be to open submit new GitHub issues following the bug report and feature request templates for existing functionality. These large "support service X" GitHub issues tend to not have a clear definition of done and usually get closed out in preference of more targeted issues for tracking individual asks.