lyraproj / lyra

Open Source Workflow Engine for Cloud Native Infrastructure
https://lyraproj.github.io
Apache License 2.0
212 stars 37 forks

TypeScript Support #42

Closed kenazk closed 5 years ago

kenazk commented 5 years ago

Is your feature request related to a problem? Please describe. When I am authoring a Workflow manifest, I want to use TypeScript to describe the resources I'm creating so that I can leverage my existing knowledge and skillset.

Describe the solution you'd like I'd like to be able to describe a set of declarative resources or imperative actions with TypeScript.

Describe alternatives you've considered None

nmuldavin commented 5 years ago

@kenazk Finally got around to writing down that feedback!

I've been messing around with Pulumi's Typescript / Javascript implementation. I think their vision (as expressed in marketing materials) is exactly right but in my view they've gotten some of the details wrong in a way that I hope Lyra can avoid.

The core issue is the complicated relationship between the pulumi package, app.pulumi.com and the nodejs runtime. To illustrate I'll describe what happens when you want to run some code:

  1. User (or ci/cd) runs pulumi up
  2. The pulumi executable looks for a Pulumi.yaml file in the current working directory with the following structure:
    name: my-stack
    runtime: nodejs
    description: Pulumi test stack
  3. Based on the runtime value, the pulumi executable will then search for the remainder of my pulumi code in a bespoke manner. I'll continue my description for nodejs / typescript:
  4. The pulumi executable searches for the entry file specified by the main field in the current working directory's package.json
  5. The pulumi executable then spawns a new process executing my node.js files using some internal node vm (presumably not the one on my machine? Unclear, actually).
  6. As my code executes, the javascript / typescript nodejs library communicates with the main pulumi process, indicating which resources to create. According to the docs the main process should also be constructing a dependency graph, but I've never gotten this to work.
  7. Once my code finishes executing, the pulumi process figures out the new state of the resources you want to create. It pulls the current state (stored at pulumi.com, accessed with your supplied credentials), diffs, and presents you with the update plan for approval (unless given pre-approval).
  8. If approved, it then implements your resource updates, theoretically in order given its understanding of your dependency graph, but unfortunately with the aforementioned bug this isn't working properly.
  9. Once updated, the process syncs with pulumi.com and closes.
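For nodejs, steps 2–4 boil down to something like the following sketch. This is just my illustration of the lookup, not Pulumi's actual code; `resolveEntry` is a name I made up:

```typescript
// Simplified sketch of steps 2-4 above (illustration only, not Pulumi's
// actual code): given the contents of Pulumi.yaml and package.json, work
// out which entry file the CLI would hand to the nodejs runtime.
function resolveEntry(pulumiYaml: string, packageJson: string): string {
  // Steps 2-3: read the runtime from Pulumi.yaml (naive line match,
  // good enough for the flat file shown above).
  const match = pulumiYaml.match(/^runtime:\s*(\S+)/m);
  if (!match || match[1] !== "nodejs") {
    throw new Error("unsupported runtime");
  }
  // Step 4: for nodejs, the entry point is package.json's "main" field.
  const pkg = JSON.parse(packageJson);
  return pkg.main ?? "index.js";
}
```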

Here are my issues:

  1. A lot of it flat-out doesn't work, in particular the dependency-graph parsing. I know that a lot of their content is generated with a terraform bridge script. I suspect it doesn't work as well as they think without manual oversight / editing. That said, I understand they are just starting up.
  2. I understand why they did it this way, but the fact that I can only execute my Pulumi js code by way of the complicated process above makes it extremely inflexible. I cannot, for example, execute the code without first cd-ing to the correct directory, have pulumi code living alongside other javascript code (because it references the main field from package.json, which in a nodejs project would be used for other things), or maintain fine-grained control over the runtime environment of my pulumi code.
  3. The complicated manner of executing Pulumi js code is a UX issue because it is different from how I execute all of my other nodejs code. It produces an immediate trust issue: once I figured out Pulumi was going to execute my code for me, all of a sudden I'm wondering: "Is it doing anything weird to my code before it executes it?", "How and with what is it executing my nodejs code, is it using the node vm on my machine or a separate one?", "Is it actually running it with node or is it parsing it to an AST and doing something funky with it?". The truth isn't so bad, but I had to dive all the way into their source code to figure out exactly what was going on there.
  4. The way that Pulumi code reads is fundamentally misleading ... idiomatically it is expressed as though you are creating the resources with your code (i.e. const myCluster = new gcp.container.Cluster('...')), when in fact you are synchronously informing Pulumi of your intent to create those resources once your script has completed. I don't necessarily have a problem with it working this way, but it is subtly confusing and it creates downstream issues in the resultant code.
  5. Pulumi's way of understanding your dependency graph from your js / ts code requires that all entities be instances of pulumi.Input and pulumi.Output. If, in defining a resource, I use an instance of pulumi.Output as a field value, it then infers a dependency between the two resources. The problem is, this restricts the user to programming exclusively with these classes in order to maintain the dependency graph. The pulumi typescript library provides a set of prototype methods on these instances for manipulation, such as output.apply(() => ...), pulumi.all([...outputs]), and pulumi.interpolate, so that you can do basic stuff, but they are limited, and more importantly they prevent use of any existing libraries from the ts / js ecosystem.

The net result is something mixed: It is better than a DSL because I don't have to learn a new syntax, and is better than yaml because it's a real language, but it is distinctly not normal Typescript or Javascript and therefore fails to leverage the full flexibility of the existing abstractions.

I think lyra is set up to do better.

Here's my proposal:

  1. Individual language frontends (including Typescript) are libraries written entirely in the language of choice. They should run as standard language executables to be compiled and executed by the consumer, with full control over the runtime environment. This means I can run node /path/to/myInfraWorkflow.js from wherever, however, and it will work. I could even, if I wanted, write a server that listens to some data source and automatically runs certain workflows. By maintaining this flexibility, we don't have to support your use case because you (the user) can write it and execute it in your language.
  2. Individual language-frontend libraries should communicate with the Lyra workflow engine through an exposed API. The libraries would only need a server and some kind of auth configuration to know how to communicate.
  3. Workflows are defined and executed by the language code itself, not by Lyra. As an example, let's say I want to build a cluster, in a subnet, that's in a network. Here's some js code I could write:

async function makeCluster() {
  const network = await gcp.compute.network('my-net', {...options});

  const subnet = await gcp.compute.subnetwork('my-subnet-1', {
    network: network.name,
    ...options
  });

  const cluster = await gcp.container.cluster('my-cluster', {
    network: network.name,
    subnetwork: subnet.name,
    ...options
  });

  return {
    cluster,
    subnet,
    network
  };
}

makeCluster();

Bam, there it is, I've coded my infrastructure with javascript, with lyra running under the hood to achieve my desired state. I do not need lyra to understand that I need my subnet to be created before my cluster, because I just did it in code. For lyra to parse and understand it on top of that would be redundant. I'm also using regular javascript language constructs, not anything Lyra specific, which means I could abstract it using my favorite libraries if I wanted.
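That flexibility also means ordinary composition just works. For example, creating several subnets concurrently is plain Promise.all, nothing framework-specific. The gcp client below is a stub I wrote so the sketch runs on its own:

```typescript
// Stub standing in for the real client, just to keep the sketch
// self-contained; the real call would actually create the resource.
const gcp = {
  compute: {
    subnetwork: async (name: string, opts: { network: string }) =>
      ({ name, network: opts.network }),
  },
};

async function makeSubnets(networkName: string) {
  // Plain Promise.all: three subnets created concurrently, using only
  // standard javascript constructs.
  return Promise.all(
    ["a", "b", "c"].map(suffix =>
      gcp.compute.subnetwork(`my-subnet-${suffix}`, { network: networkName })
    )
  );
}
```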

kenazk commented 5 years ago

@thallgren ^^

thallgren commented 5 years ago

@nmuldavin the current TypeScript implementation runs as a separate process and communicates with Lyra using grpc. Lyra is in charge of starting that process. Exactly how the packaging will work is still TBD, but I totally agree that we should interfere as little as possible with existing nodejs packaging.

I like your example. Simple and easy to read. But perhaps a bit too simple.

  1. It's not declarative. This means that if you make changes to your manifest, it will be hard to compute the delta that needs to be executed when applying it.
  2. Without the notion of a delta it's hard to implement the desired preview functionality.
  3. The lyra wf-engine performs a lot of validations on the declared workflow prior to execution (deadlocks, validation of state transitions, type consistency, etc.). The makeCluster() function short-circuits all of that since it actually takes over the responsibility of workflow execution.
  4. Lyra is language neutral and can assemble a workflow from multiple manifests written in different languages, communicating using different types of RPC. I'm not too keen on giving up that notion and transferring top-level control to a language frontend.
  5. Our declarative approach gives us the ability to control various aspects of how the workflow will execute by annotating the resource types with things like dependency information, which properties can be changed without recreating a resource, which properties are generated, etc. Since that information will be made available by the resource providers, there's no need for the user to repeat it in the manifests.

Here's an example of why the type information and the declarative approach is important.

Assume that the user has a vpc with subnets. The manifests have been applied and the resources exist in the cloud. Now an attribute of the vpc is changed in the manifest, which is applied a second time. Two things can happen:

  1. The modified attribute is mutable and can be directly updated by a simple call to the external provider.
  2. The modified attribute is immutable. The only way to update the infrastructure is to first delete the vpc (which implies that all subnets must have been deleted first), then recreate the vpc with the new value, and then recreate its subnets. A complicating factor can be a requirement that the old vpc must continue to function until the new vpc and its subnets are fully operational; at that point, the old vpc and its subnets are deleted.
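The two cases above can be sketched as a small planning function. The names are hypothetical and the immutable set is hard-coded here; in Lyra that metadata would come from the resource providers:

```typescript
type Attrs = Record<string, string | boolean>;

// Sketch of the decision described above: given old and new desired state
// plus metadata about which attributes are immutable, decide whether a
// simple update suffices or a delete-and-recreate is needed.
function planVpcChange(
  oldAttrs: Attrs,
  newAttrs: Attrs,
  immutable: Set<string>
): "no-op" | "update-in-place" | "replace" {
  const changed = Object.keys(newAttrs).filter(k => oldAttrs[k] !== newAttrs[k]);
  if (changed.length === 0) return "no-op";
  // Case 2: an immutable attribute changed, so the resource (and everything
  // depending on it, like the subnets) must be recreated.
  if (changed.some(k => immutable.has(k))) return "replace";
  // Case 1: all changed attributes are mutable, so one provider call suffices.
  return "update-in-place";
}
```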

thallgren commented 5 years ago

This is the declarative syntax that I'm about to implement (same example as in plugins/yamltest.yaml):

Lyra.serve('aws', workflow({
  input: {
    tags: {type: 'StringMap', lookup: 'aws.tags'}
  },

  output: {
    vpc_id       : 'string',
    subnet_id    : 'string',
    routetable_id: 'string'
  },

  activities: {
    vpc: resource({
      output: 'vpc_id',
      state : (region: string, tags: StringMap) => new Aws.Vpc({
        amazon_provided_ipv6_cidr_block: false,
        cidr_block                     : '192.168.0.0/16',
        enable_dns_hostnames           : false,
        enable_dns_support             : false,
        is_default                     : false,
        state                          : 'available',
        tags                           : tags,
      })
    }),

    subnet: resource({
      output: 'subnet_id',
      state : (vpc_id: string, region: string, tags : StringMap) => new Aws.Subnet({
        vpc_id                         : vpc_id,
        cidr_block                     : '192.168.1.0/24',
        tags                           : tags,
        assign_ipv6_address_on_creation: false,
        map_public_ip_on_launch        : false,
        default_for_az                 : false,
        state                          : 'available'
      })
    }),

    routetable: resource({
      output: 'routetable_id',
      state : (vpc_id: string, tags: StringMap) => new Aws.RouteTable({
        vpc_id: vpc_id,
        tags  : tags
      })
    })
  }
}));

nmuldavin commented 5 years ago

Thanks for your thorough and well stated response. I think a key point is this one: " 4. Lyra is language neutral and can assemble a workflow from multiple manifests written in different languages, communicating using different types of RPC. " If it were written only for Typescript, there would be no reason not to make it more imperative like my example, but keeping it neutral is a great reason not to.

I'm already liking your example a lot better than the Pulumi stuff. Where they fail is their fancy way of implicitly determining resource dependencies from your code. Your example makes the user define inputs and outputs, but I think that's a good thing, so that there's less magic going on. It's also implemented with very low-level constructs (functions and objects) and is therefore flexible. The state field as a function of inputs is great.

In that example you're essentially using typescript to write something you could have written in yaml. How, in your syntax, would I create reusable logic? The prime example is kubernetes ... their config options are extraordinarily lengthy, when really most of the time you want to tweak maybe 3-4 parameters and leave the rest fixed. Helm tries to solve this problem by templating yaml with more yaml (!!?!??!), but we can definitely do better in our language frontends. Here's an example I could do in Pulumi:

const makeK8sDeployment = (
  name: string,
  imageName: string,
  replicas = 1,
  containerPort = 3000,
) => {
  const labels = { component: name };

  return new k8s.apps.v1.Deployment(name, {
    spec: {
      replicas,
      selector: {
        matchLabels: labels,
      },
      template: {
        metadata: {
          labels,
        },
        spec: {
          containers: [
            {
              name,
              image: imageName,
              imagePullPolicy: 'Always',
              ports: [
                {
                  containerPort,
                },
              ],
            },
          ],
        },
      },
    },
  })
}

Then whenever I need a new deployment it's a one-liner:

const myComponentDeployment = makeK8sDeployment('my-component', 'my-component-image');
const myOtherDeployment = makeK8sDeployment('my-other-component', 'my-other-component-image');

How would you do something similar with Lyra? The reusable logic will need to be abstracted from a single workflow (one could imagine a workflow for updating each service in your cluster, each calling the same makeK8sDeployment() method). You should be able to create reusable logic that also takes inputs (imagine one day we have a Docker provider, which would create an image as a managed resource to be fed into your deployment resource).
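One could imagine combining the two styles: an ordinary function that returns a resource definition in the declarative syntax. The resource helper below is a stub I wrote so the sketch runs on its own (the real Lyra signature may differ), and all field names are illustrative:

```typescript
// Stub standing in for Lyra's resource() helper, just to keep the sketch
// self-contained; the real signature may differ.
const resource = <T>(config: T): T => config;

// A reusable factory: an ordinary function that exposes only the 3-4
// interesting parameters and returns a declarative resource definition.
const makeK8sDeployment = (
  name: string,
  imageName: string,
  replicas = 1,
  containerPort = 3000,
) =>
  resource({
    output: `${name}_deployment_id`,
    state: () => ({
      kind: 'Deployment',
      spec: {
        replicas,
        selector: { matchLabels: { component: name } },
        template: {
          metadata: { labels: { component: name } },
          spec: {
            containers: [
              { name, image: imageName, ports: [{ containerPort }] },
            ],
          },
        },
      },
    }),
  });

// Reuse is then a one-liner per deployment, as in the Pulumi example:
const myComponent = makeK8sDeployment('my-component', 'my-component-image');
```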

thallgren commented 5 years ago

@nmuldavin Having an arbitrary function produce a resource body should work just fine. It may, under some circumstances, introduce the slight inconvenience of having to state the actual resource type in the resource. Unless the type is explicitly given in the resource (it's omitted in my example), Lyra will try to infer the type by first looking at the body of the state function. Failing that (and it will fail if the body is just a function call), it will use the qualified name of the resource activity. The qualified name is based on the location of the activity in the workflow (in my example, that will give the names 'aws.vpc', 'aws.subnet', and 'aws.routetable').

ahpook commented 5 years ago

The implementation is done; we'll close this and build new issues around writing and packaging workflows.