stelligent / mu

A full-stack DevOps on AWS framework
https://getmu.io
MIT License

AWS Batch support via mu? #361

Open AndreyMarchuk opened 5 years ago

AndreyMarchuk commented 5 years ago
  1. What do you think about adding AWS Batch support to mu?
  2. What do you think about wrapping the Batch compute environment into a mu environment?

Example of a Batch compute environment, Batch queues, and Batch job definitions:

environments:
  - name: dev
    ...
    # AWS batch compute env
    batchComputeEnv:
      - name: myBatchEnv
        type: managed | unmanaged
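        serviceRole: # IAM role that AWS Batch assumes to manage resources on your behalf
        instanceRole: # ECS instance profile for the container instances
        keyName:  # EC2 key pair name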
        # compute resources
        provisionModel: onDemand | spot
        allowedInstances: optimal | <c4.large> ... # Optimal chooses the best fit of M4, C4, and R4 instance types available in the region.
        minimumVCPUs:
        desiredVCPUs:
        maximumVCPUs:
        imageId: # AMI id
        # networking
        # omit vpcId and instanceSubnetIds to use the environment's vpcTarget settings
        vpcId: 
        instanceSubnetIds: 
        securityGroups:
    # Batch queues
    # Job queues with a higher integer value for priority are given preference for compute resources.
    # Jobs are submitted to the connected compute environments based on the order they are listed and the available capacity of those environments.
    batchQueues:
      - name: priority1
        priority: 1 
        computeEnvs:
          - myBatchEnv
      - name: priority5
        priority: 5
        computeEnvs:
          - myBatchEnv
      - name: priority10
        priority: 10
        computeEnvs:
          - myBatchEnv

# AWS batch job definitions    
batchJobs:
  - name: myBatchJob1
    jobRole: # ECS IAM role
    containerImage: amazonlinux
    command: # (optional)
    vCPUs: 2
    memory: 100  # MiB
    attempts: 1
    execTimeout: 100  # Time (in seconds) to allow each job attempt to run. If your job runs longer than the specified time, it will stop and be moved to FAILED.
    uLimits:
      - name: CORE
        soft: 10
        hard: 80
    parameters:
      param1: value1
    environment:
      envvar1: val1
    # security
    privileged: true
    user: nobody
    # volumes
    volumes: 
      volname: sourcepath
    readOnlyFilesystem: false
    mountPoints:
      - containerPath: '/srv/www'
        sourcePath: '/opt/build'
        readOnly: false
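To make the semantics of the proposed `batchComputeEnv` and `batchQueues` keys concrete, here is a minimal Go sketch of how mu might validate them before deploying. Everything in it is hypothetical: the types `BatchComputeEnv` and `BatchQueue` and the helpers `validateComputeEnv`, `validateQueues`, and `byPreference` are illustrative names, not code from the POC branch. It assumes the usual AWS Batch constraints that desired vCPUs lie between the minimum and maximum and that each queue references at least one compute environment. It also mirrors the comment above that a higher `priority` value gets preference. The vCPU numbers are made up, since the proposal leaves those fields blank.

```go
package main

import (
	"fmt"
	"sort"
)

// BatchComputeEnv mirrors a subset of the proposed batchComputeEnv keys.
// Hypothetical type for illustration only; not part of mu.
type BatchComputeEnv struct {
	Name         string
	MinimumVCPUs int
	DesiredVCPUs int
	MaximumVCPUs int
}

// BatchQueue mirrors the proposed batchQueues entries (hypothetical type).
type BatchQueue struct {
	Name        string
	Priority    int
	ComputeEnvs []string
}

// validateComputeEnv checks 0 <= minimumVCPUs <= desiredVCPUs <= maximumVCPUs.
func validateComputeEnv(env BatchComputeEnv) error {
	if env.MinimumVCPUs < 0 {
		return fmt.Errorf("%s: minimumVCPUs must not be negative", env.Name)
	}
	if env.DesiredVCPUs < env.MinimumVCPUs || env.DesiredVCPUs > env.MaximumVCPUs {
		return fmt.Errorf("%s: desiredVCPUs must be between minimumVCPUs and maximumVCPUs", env.Name)
	}
	return nil
}

// validateQueues checks that every queue references at least one compute
// environment and that each referenced environment is declared.
func validateQueues(queues []BatchQueue, envs []BatchComputeEnv) error {
	known := make(map[string]bool, len(envs))
	for _, e := range envs {
		known[e.Name] = true
	}
	for _, q := range queues {
		if len(q.ComputeEnvs) == 0 {
			return fmt.Errorf("%s: at least one compute env is required", q.Name)
		}
		for _, name := range q.ComputeEnvs {
			if !known[name] {
				return fmt.Errorf("%s: unknown compute env %q", q.Name, name)
			}
		}
	}
	return nil
}

// byPreference returns a copy of queues ordered by preference for compute
// resources: higher priority value first.
func byPreference(queues []BatchQueue) []BatchQueue {
	out := append([]BatchQueue(nil), queues...)
	sort.SliceStable(out, func(i, j int) bool { return out[i].Priority > out[j].Priority })
	return out
}

func main() {
	envs := []BatchComputeEnv{{Name: "myBatchEnv", MinimumVCPUs: 0, DesiredVCPUs: 2, MaximumVCPUs: 16}}
	queues := []BatchQueue{
		{Name: "priority1", Priority: 1, ComputeEnvs: []string{"myBatchEnv"}},
		{Name: "priority5", Priority: 5, ComputeEnvs: []string{"myBatchEnv"}},
		{Name: "priority10", Priority: 10, ComputeEnvs: []string{"myBatchEnv"}},
	}
	for _, e := range envs {
		if err := validateComputeEnv(e); err != nil {
			fmt.Println("invalid:", err)
		}
	}
	if err := validateQueues(queues, envs); err != nil {
		fmt.Println("invalid:", err)
	}
	// Print queues in the order they would be preferred.
	for _, q := range byPreference(queues) {
		fmt.Println(q.Name, q.Priority)
	}
}
```

The actual scheduling is done by AWS Batch itself; the ordering helper only illustrates the rule, which mu could use for example to print queues in a `describe`-style listing.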
AndreyMarchuk commented 5 years ago

Created the following POC:

  1. Batch compute environments and job queues are handled via a mu extension
  2. The Batch pipeline and job definition are handled via a new 'batch' entity in mu.yml
  3. Job definitions are deployed via mu batch deploy

Here is the code for items 2 and 3: https://github.com/stelligent/mu/compare/develop...AndreyMarchuk:feature/batch-job?expand=1

Now the question is: would it be better handled as Service.Provider = batch?

cplee commented 5 years ago

I think having a Service.Provider = batch would be ideal, as it avoids a lot of duplication. How plausible would it be to implement?

AndreyMarchuk commented 5 years ago

POC for Service.Provider = batch

https://github.com/stelligent/mu/compare/develop...AndreyMarchuk:batch-as-service-provider?expand=1

cplee commented 5 years ago

Looking good! Curious, what is this for?

ProviderOverride     string                 `yaml:"provider,omitempty"`

https://github.com/stelligent/mu/compare/develop...AndreyMarchuk:batch-as-service-provider?expand=1#diff-5db7db86b937470e53496a6ce29a1d3dR160

AndreyMarchuk commented 5 years ago

Currently, mu fetches the environment stack to get the provider from the environment. ProviderOverride allows the provider to be specified at the service level, so the environment stack does not have to exist. Batch job definition registration does not depend on an environment.

service:
  name: my-batch-job

  # deployed as AWS Batch job
  provider: batch

It also forces the service to be treated as a Batch job even if the user mistakenly deploys it onto a non-Batch environment (e.g. ecs, ec2, etc.).