pulumi / pulumi

Pulumi - Infrastructure as Code in any programming language 🚀
https://www.pulumi.com
Apache License 2.0

Support community language plugins #11882

Open Frassle opened 1 year ago

Frassle commented 1 year ago

Hello!

Issue details

We have a number of issues requesting new languages:

It is not feasible for Pulumi (the company) to build and maintain support for every language. However we would like to support engineers writing Pulumi programs in the language of their choice, and so need to support the community building their own language plugins.

Currently languages are heavily coupled to the pulumi cli itself:

All of this needs fixing to give the community a good experience building and publishing their own language plugins. The best way to support this is for our own language plugins not to be privileged; that is, the CLI should not know about any of our languages. Everything should be done via the language plugin interface.

We made some progress cleaning up this space last year: for example, a lot of the language-specific code in pulumi new was moved into the language plugins, and all of the language-specific code in pulumi about was moved there too.

Beyond the work above, which really must be done for the community to have a chance of building their own languages, we should also invest in reducing the manual work needed to complete a language SDK. This should be done in two ways:

  1. Moving as much logic as possible to the engine rather than the SDKs. There are a lot of things that language SDKs need to deal with today, like aliases, dependency chains, URN parsing, parent tracking, etc., which can probably be moved fully into the engine.
  2. Making code generation cover some of the core SDK. Bringing up a language means you have to write a code generator for that language, so we should see if there are any simple types in the core SDKs that could be code generated, given that a code generator has to be built anyway. This could possibly cover things like the provider interfaces, resource options, and property values, which are all defined in basically the same way for every language.

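
As a rough illustration of the second point, here is a minimal sketch assuming a hypothetical, language-neutral description of a few resource options (the OptionField format and the Scala type mapping are made up for this example, not an existing Pulumi format). It renders one shared description into a Scala case class; a real generator would also need docs, defaults, and the full option set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OptionField:
    """One field of a hypothetical language-neutral option description."""
    name: str       # snake_case; each generator re-cases it for its language
    type_name: str  # neutral type name, mapped per target language


# A few familiar resource options, described once for every language.
RESOURCE_OPTIONS = [
    OptionField("protect", "bool"),
    OptionField("ignore_changes", "list[string]"),
    OptionField("delete_before_replace", "bool"),
]

# Assumed neutral-to-Scala type mapping (illustrative, not official).
SCALA_TYPES = {"bool": "Boolean", "list[string]": "List[String]"}


def camel_case(snake: str) -> str:
    head, *rest = snake.split("_")
    return head + "".join(word.capitalize() for word in rest)


def render_scala_case_class(class_name: str, fields: list) -> str:
    """Render a Scala case class in which every field is optional."""
    params = ",\n".join(
        f"  {camel_case(f.name)}: Option[{SCALA_TYPES[f.type_name]}] = None"
        for f in fields
    )
    return f"final case class {class_name}(\n{params}\n)"


print(render_scala_case_class("ResourceOptions", RESOURCE_OPTIONS))
```

Each language would ship only a small renderer like this one, while the description itself is shared across all SDKs.
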
lbialy commented 1 year ago

Here are some of the issues that we have encountered while implementing all of the JVM-based Pulumi SDKs (Java, Kotlin, Scala). Most of them are related to provider package schemas but some apply to general architectural decisions and feature support.

  1. Sometimes references to types seem to have an invalid (according to the spec) or at least ambiguous format, e.g. (in aws): "$ref": "aws:index/region:Region" or "$ref": "aws:apigateway/restApi:RestApi" instead of "$ref": "#/types/aws:index/region:Region" and "$ref": "#/resources/aws:apigateway/restApi:RestApi". Can we rely on some kind of heuristic to fix such references? What if some type references abbreviated in this way were defined in both the types and resources sections of a schema?

  2. Even with the possible heuristic shown above, references to some types can’t be resolved as they are not defined in a given schema at all, e.g. (in aws): "$ref": "aws:index/aRN:ARN"

  3. There are also issues regarding letter casing that lead to similar ambiguities, e.g. azure-native:network:IPAllocationMethod vs azure-native:network:IpAllocationMethod. Is there any explicit rule regarding resource types that says they should be considered equal when lower-cased, or are those just exceptions?

  4. What are the expected semantics of empty types, e.g. (in aws-native):

    "aws-native:databrew:RecipeParameterMap": {
      "type": "object"
    }

Should this really result in generation of an empty class? Some usages of such types seem to suggest something different, e.g. (inside the definition of aws-native:databrew:RecipeAction):

        "parameters": {
          "oneOf": [
            {
              "$ref": "#/types/aws-native:databrew:RecipeParameters"
            },
            {
              "$ref": "#/types/aws-native:databrew:RecipeParameterMap"
            }
          ]
        }

The intent of using RecipeParameterMap in an alternative seems to be to allow passing arbitrary parameters not defined in RecipeParameters so should RecipeParameterMap be in fact just a map/dictionary?

  5. Definitions of some types cause us to run into quantitative limitations of the JVM, e.g.: a) too many parameters for a method, where each parameter corresponds to an (input) property of a type or resource, e.g. for aws:config/endpoints:endpoints; b) type names so long that the resulting classfile names exceed the typical maximum file name length of an operating system’s file system, e.g. (in aws): “TemplateDefinitionSheetVisualScatterPlotVisualChartConfigurationFieldWellsScatterPlotCategoricallyAggregatedFieldWellsCategoryCategoricalDimensionFieldFormatConfigurationNumericFormatConfigurationCurrencyDisplayFormatConfigurationNullValueFormatConfiguration” (259 characters). Is there some generic mechanism for excluding problematic types like the one mentioned above from code generation? E.g. java doesn’t generate a class for this type. However, other languages which are not directly bound by these limitations, e.g. typescript, don’t include this type in their codegen either.

  6. Some methods available on resources, e.g. event handlers for AWS lambdas like aws.s3.Bucket.onObjectCreated don’t seem to be specified in the schemas of particular providers. How can we know what methods should be generated and what their exact signatures should be? More generally speaking: some providers do seem to contain non-generated code. Is there some kind of knowledge base or at least a list of github issues / PRs that introduce these features into provider package codebases so that we can reimplement those?

  7. Do descriptions of types, properties, etc. have any fixed semantic structure? According to the developers’ docs, descriptions are “Interpreted as Markdown”. However sometimes they seem to have some sections delimited by {{% examples %}} or {{% example %}}. Is this some official (undocumented?) syntax? Does it support some other types of structures which might require special handling in code generation (e.g. removing unnecessary example snippets for other languages)? Again, more generally speaking, how can makers of community-driven language SDKs “hijack” this to provide their own docs in generated provider libraries as the current mechanism seems limited to languages officially supported by Pulumi.

  8. Some parts of providers’ schemas can have language-specific extensions like

    "language": {
      "nodejs": {
        "requiredOutputs": [
          "contains",
          "criteria",
          "eqs",
          "exists",
          "neqs"
        ]
      }
    }

    Is there an exhaustive list of such extensions so that we can figure out which of them we could reuse in our code generation? E.g. currently we reuse the mapping of package names from java.

  9. Do particular components of the Pulumi stack (Pulumi engine, SDKs for languages, schemas for providers, automation API, etc.) use a common and consistent versioning scheme to guarantee specific levels of compatibility? E.g. is semantic versioning enforced? Can changes in a language SDK enforce a (breaking) version bump for a provider’s SDK? Should adding an optional (input) property to a resource in a schema guarantee that binary compatibility of generated method signatures is preserved? Our concern regards the situation in which there are user-built component libraries released as binaries (jars) with dependencies on both the SDK and other provider packages. The JVM has a flat classpath, which means there’s a large potential for version conflicts between the provider packages used by different user-built libraries. These will lead to binary compatibility errors at runtime.

  10. Remote components implementation design issues: in what cases can arbitrary resources be deserialized from the wire? There’s a branch in the pulumi-java code that was ported from C#. It references a ResourcePackages class that can find all classes on the classpath that are annotated with a @Resource annotation and then, via further reflective calls, find their constructors and instantiate resources reflectively. In what situations is this branch executed? When can a Resource class arrive serialized to a language SDK from the Pulumi engine? From what we can see, the only data that we get are the resource type, the URN and the resource/package version. Should the resource always be rehydrated from the engine by issuing a ReadResource gRPC call after it arrives in this fashion? In the current implementation of pulumi-java this seems to be the case, but the resource is initialized with ResourceOptions containing only the urn parameter, leading to a readResource call. We are aware that this is somehow related to the mechanics of multi-language components, but were under the impression that this feature is limited to actual component resources and that resources, in general, don’t expose other resources via their properties, only plain objects (is this understanding correct?). We have an idea of how to deal with this problem on the JVM without reflection (we are committed to not using runtime reflection in pulumi-scala, as we consider it too error-prone): leverage the Java SPI mechanism and code-generate a service provider implementation in each pulumi-scala package that exposes factories by resource type. This cuts the unsafety down to a single, relatively safe downcast from Resource to the particular resource class bound to the resource type. This is, however, a relatively large issue for languages like Rust that, AFAIK, have no way to do anything resembling classpath search. I’m guessing only something like dynamic loading could help in this case, but I’m definitely not an expert in Rust.

In general, remote resources as a feature seem quite complex and hard to understand. For instance, in the case of importing a foreign stack, is it possible that a serialized resource will be received whose resource type is not available in the currently executing Pulumi program (e.g. an AWS EC2 instance resource arriving in a Pulumi program using the Java or Scala SDK that has no direct dependency on the AWS package, so no jar is added to the classpath and no Instance class is available)? If not, how so?

  11. There are a bunch of options available on RegisterResourceRequest that do not seem to be relevant in the implementations we've been basing our work on (js, go, java), e.g. acceptSecrets or supportsPartialValues. Is there some kind of documentation available that would explain those flags and parameters in depth?

  12. Going deeper on the implementation side of things: are the semantics of propagating properties like secretness or the protect flag declared anywhere (as rules that implementations have to follow)? While secretness is generally quite understandable (anything derived from a secret must be a secret unless the user explicitly makes the output abandon secrecy), it's not that obvious in the case of the protect flag. The Java implementation does something that seems to make rational sense: it checks whether the parent resource is protected and, if it is, the child inherits the protection. That's not what happens in pulumi-go, for instance, where each resource has its protect flag set separately.

lbialy commented 1 year ago

Regarding integration with Pulumi itself: for besom (pulumi-scala) we have found out that by implementing a language host and manually installing language plugin we are able to use besom with pulumi cli without any issues. We haven't really gotten to the point where our codegen would require integration with pulumi tooling (cli, crd2pulumi, tf2pulumi) but we strongly hope that's something that is going to be possible in the future.

There's also the question of integration with docs and examples. We assume that snippets in javadoc/godoc/*doc (mentioned in point 7 above) and/or proper pulumi.com website docs are not written by hand and are also code generated. It would be massively helpful to know the following:

a) What are these docs generated from? We'd love to generate them for our SDKs too. b) Will you host community-driven SDK docs on your site or just provide a link to the doc websites of external SDKs? c) If the latter is true, what are the expected standards for such docs? Our best guess was that we should probably set up our own site (based on something like Docusaurus or similar) and mirror much of the structure of the Pulumi docs, e.g. concepts, programming model with examples, provider library docs and such, plus an explicit declaration of which things differ from other SDKs.

Frassle commented 1 year ago

Sometimes references to types seem to have an invalid (according to the spec) or at least ambiguous format, e.g. (in aws): Can we rely on some kind of heuristic to fix such references? What if some type references abbreviated in this way were defined in both types and resources sections of a schema?

I suspect this is just missing validation; I don't think it's actually a valid schema if the same name is used for a Type and a Resource. "package:module:member" is probably sufficient everywhere, and we should just unify and validate towards that.
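
Until schemas are validated that way, a generator could normalize references itself. The following is a minimal sketch of such a heuristic (the function name resolve_ref is hypothetical): canonical and external references are passed through, a bare token is looked up in both the types and resources sections, and a token found in both is rejected, since that is the invalid case described above.

```python
def resolve_ref(ref: str, schema: dict) -> str:
    """Normalize a $ref to '#/types/<token>' or '#/resources/<token>'.

    The bare-token fallback is a heuristic for malformed schemas; it is
    not part of the Pulumi schema specification.
    """
    # Canonical local refs ("#/types/...") and external refs
    # ("pulumi.json#/Any", ".../schema.json#/types/...") all contain '#'.
    if "#" in ref:
        return ref
    sections = [s for s in ("types", "resources") if ref in schema.get(s, {})]
    if not sections:
        raise KeyError(f"unresolvable $ref: {ref}")
    if len(sections) > 1:
        raise ValueError(f"ambiguous $ref {ref}: defined as a type and a resource")
    return f"#/{sections[0]}/{ref}"


schema = {
    "types": {"aws:index/region:Region": {"type": "string"}},
    "resources": {"aws:apigateway/restApi:RestApi": {}},
}
print(resolve_ref("aws:index/region:Region", schema))
```

A reference like "aws:index/aRN:ARN", which is defined nowhere, still fails with a KeyError; no heuristic can repair it.
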

Even with the possible heuristic shown above, references to some types can’t be resolved as they are not defined in a given schema at all, e.g. (in aws): "$ref": "aws:index/aRN:ARN"

This is just missing validation. We shouldn't be publishing schemas like this.
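
A publish-time check for this could be as simple as walking the schema and reporting every reference that is either non-canonical or points at a token that is not defined. The sketch below only checks local #/types/ and #/resources/ references; the function names, messages, and the schema fragment are made up for illustration, not an existing Pulumi validator.

```python
def iter_refs(node):
    """Yield every $ref string found anywhere in a schema fragment."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                yield value
            else:
                yield from iter_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_refs(item)


def ref_problems(schema: dict) -> list:
    """Return sorted, de-duplicated problems with local $refs."""
    problems = set()
    for ref in iter_refs(schema):
        if "#" not in ref:
            problems.add(f"non-canonical $ref: {ref}")
            continue
        for section in ("types", "resources"):
            prefix = f"#/{section}/"
            if ref.startswith(prefix) and ref[len(prefix):] not in schema.get(section, {}):
                problems.add(f"dangling $ref: {ref}")
    return sorted(problems)


schema = {
    "types": {},
    "resources": {
        "aws:iam/role:Role": {
            "inputProperties": {
                "arn": {"$ref": "#/types/aws:index/aRN:ARN"},
                "region": {"$ref": "aws:index/region:Region"},
            }
        }
    },
}
for problem in ref_problems(schema):
    print(problem)
```
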

There are also issues regarding letter casing that lead to similar ambiguities, e.g. azure-native:network:IPAllocationMethod vs azure-native:network:IpAllocationMethod. Is there any explicit rule regarding resource types that says they should be considered equal when lower-cased, or are those just exceptions?

I think the current rule is that Pulumi is case-sensitive. I would like to change this to instead be case-insensitive, but with restrictions on what casing is allowed in a name. The above would then have to be written as "IP_allocation_method"; mixed case within words would be disallowed.
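
A minimal sketch of the proposed restriction, under the assumption that words are separated by underscores and that a word may be all upper case or all lower case but never mixed. With that rule each word has one unambiguous casing, so every generator can re-case names into its own convention, and neither spelling from the question is accepted. The rule and the helper names are assumptions for illustration, not an existing Pulumi validation.

```python
import re

# Hypothetical rule: underscore-separated words, each entirely lowercase or
# entirely uppercase (digits allowed after the first letter).
WORD = re.compile(r"[a-z][a-z0-9]*|[A-Z][A-Z0-9]*")


def is_valid_name(name: str) -> bool:
    return all(WORD.fullmatch(word) for word in name.split("_"))


def to_pascal_case(name: str) -> str:
    """Re-case a valid name for a language that uses PascalCase types."""
    return "".join(word.capitalize() for word in name.split("_"))


def to_snake_case(name: str) -> str:
    """Re-case a valid name for a language that uses snake_case."""
    return name.lower()


for name in ("IPAllocationMethod", "IpAllocationMethod", "IP_allocation_method"):
    print(name, is_valid_name(name))
```
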

What are the expected semantics of empty types, e.g. (in aws-native):

I think an empty type should just be an empty object. I suspect the use of this type here in aws-native is probably a bug?
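
For generators, this mostly means telling an empty object type apart from a map. A minimal sketch, assuming the usual schema convention that maps are written as "type": "object" with "additionalProperties"; the classify helper and its return labels are hypothetical.

```python
def classify(spec: dict) -> str:
    """Decide how a generator might emit a schema type spec (sketch)."""
    if spec.get("type") != "object":
        return "other"
    if "additionalProperties" in spec:
        # e.g. {"type": "object", "additionalProperties": {"type": "string"}}
        return "map"
    if spec.get("properties"):
        return "object"
    # No properties and no additionalProperties, like RecipeParameterMap:
    # emitted as an empty object type rather than as a map.
    return "empty-object"


print(classify({"type": "object"}))
```
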

Definitions of some types cause us to run into quantitative limitations of the JVM, e.g.: a) too many parameters for a method, where each parameter corresponds to an (input) property of a type or resource, e.g. for aws:config/endpoints:endpoints; b) type names so long that the resulting classfile names exceed the typical maximum file name length of an operating system’s file system, e.g. (in aws): “TemplateDefinitionSheetVisualScatterPlotVisualChartConfigurationFieldWellsScatterPlotCategoricallyAggregatedFieldWellsCategoryCategoricalDimensionFieldFormatConfigurationNumericFormatConfigurationCurrencyDisplayFormatConfigurationNullValueFormatConfiguration” (259 characters). Is there some generic mechanism for excluding problematic types like the one mentioned above from code generation? E.g. java doesn’t generate a class for this type. However, other languages which are not directly bound by these limitations, e.g. typescript, don’t include this type in their codegen either.

I don't think there's any way to exclude types from generation. If a language can't generate a type, it should probably just make that decision itself and print a warning that the SDK hasn't been fully generated. Ideally every type should be generated, either via workaround generation (e.g. different parameter styles, or untyped maps or something) or by us putting reasonable limitations on what's allowed in a schema. We've got to be careful about balancing lowest-common-denominator languages against good experiences in capable languages there, though.
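
A minimal sketch of a generator making that decision itself, assuming a JVM target. The 255-slot limit comes from the JVM specification's rule for method descriptors (long and double take two slots, and a constructor also counts the implicit this reference); 255 bytes is a common per-file-name limit on Linux and macOS file systems. The fallback strategies (skip with a warning, or a builder instead of a positional constructor) and all names here are hypothetical.

```python
from dataclasses import dataclass, field

MAX_FILE_NAME_BYTES = 255  # common per-name limit (e.g. ext4, APFS)
MAX_PARAM_SLOTS = 255      # JVM method descriptor limit, including `this`


@dataclass
class GenerationReport:
    skipped: list = field(default_factory=list)
    builder_only: list = field(default_factory=list)


def plan_type(name: str, param_types: list, report: GenerationReport) -> str:
    """Choose how a hypothetical JVM generator emits one schema type."""
    if len((name + ".class").encode("utf-8")) > MAX_FILE_NAME_BYTES:
        report.skipped.append(name)
        return "skip"
    # `this` takes one slot in a constructor; long and double take two.
    slots = 1 + sum(2 if t in ("long", "double") else 1 for t in param_types)
    if slots > MAX_PARAM_SLOTS:
        report.builder_only.append(name)
        return "builder"
    return "constructor"


def warn_if_incomplete(report: GenerationReport) -> None:
    if report.skipped:
        print(f"warning: SDK not fully generated; skipped {len(report.skipped)} type(s)")


report = GenerationReport()
print(plan_type("Endpoints", ["String"] * 300, report))  # builder
warn_if_incomplete(report)  # prints nothing: no type was skipped
```

Skipping is the bluntest option; a generator could instead shorten over-long names (for example by hashing a suffix), which keeps the SDK complete at the cost of less readable class names.
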

Some methods available on resources, e.g. event handlers for AWS lambdas like aws.s3.Bucket.onObjectCreated don’t seem to be specified in the schemas of particular providers. How can we know what methods should be generated and what their exact signatures should be? More generally speaking: some providers do seem to contain non-generated code. Is there some kind of knowledge base or at least a list of github issues / PRs that introduce these features into provider package codebases so that we can reimplement those?

These are "overlays". We're probably going to remove these; they are by their nature extremely language-specific and are probably better handled by extension packages written manually per language (e.g. we'd have "@pulumi/aws" and "@pulumi/aws-extensions", where aws-extensions would be a manually written package supplying those extensions).

Do descriptions of types, properties, etc. have any fixed semantic structure? According to the developers’ docs, descriptions are “Interpreted as Markdown”. However sometimes they seem to have some sections delimited by {{% examples %}} or {{% example %}}. Is this some official (undocumented?) syntax? Does it support some other types of structures which might require special handling in code generation (e.g. removing unnecessary example snippets for other languages)? Again, more generally speaking, how can makers of community-driven language SDKs “hijack” this to provide their own docs in generated provider libraries as the current mechanism seems limited to languages officially supported by Pulumi.

This needs an overhaul. We're thinking we'll probably stick to Markdown because it's so common, but we're not 100% certain. The example sections will be replaced with dedicated "example" attributes, with examples written in PCL (our internal cross-language markup) or YAML.
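
Until dedicated example attributes exist, a community generator that wants to drop the per-language snippets can strip those sections from a description. A minimal sketch, assuming each section is closed by a matching {{% /examples %}} tag (Hugo shortcode style) and that sections are not nested inside one another; the helper name is hypothetical.

```python
import re

# Non-greedy match from an opening examples tag to the first closing tag.
# Inner {{% example %}} ... {{% /example %}} blocks are removed with it.
EXAMPLES_SECTION = re.compile(
    r"\{\{% examples %\}\}.*?\{\{% /examples %\}\}", re.DOTALL
)


def strip_examples(description: str) -> str:
    """Remove examples sections from a Markdown description."""
    return EXAMPLES_SECTION.sub("", description).strip()


description = (
    "Provides an S3 bucket resource.\n\n"
    "{{% examples %}}\n## Example Usage\n"
    "{{% example %}}\nconst bucket = new aws.s3.Bucket(\"b\");\n{{% /example %}}\n"
    "{{% /examples %}}\n"
)
print(strip_examples(description))  # Provides an S3 bucket resource.
```
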

Some parts of providers’ schemas can have language-specific extensions like Is there an exhaustive list of such extensions so that we can figure out which of them we could reuse in our code generation? E.g. currently we reuse the mapping of package names from java.

Most of these should probably be removed. In general, add what makes sense for your language; don't copy other languages' options. We're also looking at a way to supply this data side by side rather than inline in the schema.json.
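
In practice, that means a generator reads only its own key from a node's "language" map and ignores the rest. The sketch below also merges in a side-by-side file; that file's shape, the "scala" key, and its "basePackage" option are hypothetical, since the side-by-side mechanism is still being designed.

```python
from typing import Optional


def language_options(node: dict, language: str, sidecar: Optional[dict] = None) -> dict:
    """Return one language's options for a schema node.

    Inline options come from node["language"][language]; options from a
    hypothetical side-by-side file (keyed the same way) override them.
    Other languages' sections, such as "nodejs", are ignored.
    """
    inline = node.get("language", {}).get(language, {})
    external = (sidecar or {}).get(language, {})
    return {**inline, **external}


node = {
    "language": {
        "nodejs": {"requiredOutputs": ["contains", "criteria"]},
        "scala": {"basePackage": "com.example.inline"},
    }
}
print(language_options(node, "scala", {"scala": {"basePackage": "com.example.sidecar"}}))
```
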

Do particular components of the Pulumi stack (Pulumi engine, SDKs for languages, schemas for providers, automation API, etc.) use a common and consistent versioning scheme to guarantee specific levels of compatibility? E.g. is semantic versioning enforced? Can changes in a language SDK enforce a (breaking) version bump for a provider’s SDK? Should adding an optional (input) property to a resource in a schema guarantee that binary compatibility of generated method signatures is preserved? Our concern regards the situation in which there are user-built component libraries released as binaries (jars) with dependencies on both the SDK and other provider packages. The JVM has a flat classpath, which means there’s a large potential for version conflicts between the provider packages used by different user-built libraries. These will lead to binary compatibility errors at runtime.

We try to stick to semver but it's probably not perfect and nothing automatically enforces it.

Remote components implementation design issues: in what cases can arbitrary resources be deserialized from the wire? There’s a branch in the pulumi-java code that was ported from C#. It references a ResourcePackages class that can find all classes on the classpath that are annotated with a @Resource annotation and then, via further reflective calls, find their constructors and instantiate resources reflectively. In what situations is this branch executed? When can a Resource class arrive serialized to a language SDK from the Pulumi engine? From what we can see, the only data that we get are the resource type, the URN and the resource/package version. Should the resource always be rehydrated from the engine by issuing a ReadResource gRPC call after it arrives in this fashion? In the current implementation of pulumi-java this seems to be the case, but the resource is initialized with ResourceOptions containing only the urn parameter, leading to a readResource call. We are aware that this is somehow related to the mechanics of multi-language components, but were under the impression that this feature is limited to actual component resources and that resources, in general, don’t expose other resources via their properties, only plain objects (is this understanding correct?). We have an idea of how to deal with this problem on the JVM without reflection (we are committed to not using runtime reflection in pulumi-scala, as we consider it too error-prone): leverage the Java SPI mechanism and code-generate a service provider implementation in each pulumi-scala package that exposes factories by resource type. This cuts the unsafety down to a single, relatively safe downcast from Resource to the particular resource class bound to the resource type. This is, however, a relatively large issue for languages like Rust that, AFAIK, have no way to do anything resembling classpath search. I’m guessing only something like dynamic loading could help in this case, but I’m definitely not an expert in Rust.

I think this should only really occur with remote component packages. It probably doesn't strictly need reflection, because the schema should say which resources are expected to be returned.
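
One way to do that without classpath scanning, close to the SPI idea from earlier in the thread, is for each generated package to register a factory per resource type token, with the schema telling the generator which types can come back. A minimal sketch under those assumptions: the registry, the Bucket stand-in, and the UnknownResource fallback (one possible answer to the unknown-type question raised earlier) are all hypothetical, and a real SDK would still need to read the resource's state from the engine.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Resource:
    urn: str


@dataclass
class Bucket(Resource):
    """Stand-in for a generated resource class."""


@dataclass
class UnknownResource(Resource):
    """Fallback when no loaded package provides the type token."""
    type_token: str


# Each generated package would add its own entries when it is loaded.
FACTORIES: Dict[str, Callable[[str], Resource]] = {}


def register(type_token: str, factory: Callable[[str], Resource]) -> None:
    FACTORIES[type_token] = factory


def rehydrate(type_token: str, urn: str) -> Resource:
    """Turn a (type token, URN) resource reference into an object."""
    factory = FACTORIES.get(type_token)
    if factory is None:
        return UnknownResource(urn=urn, type_token=type_token)
    return factory(urn)


register("aws:s3/bucket:Bucket", lambda urn: Bucket(urn=urn))
print(rehydrate("aws:s3/bucket:Bucket", "urn:pulumi:dev::app::aws:s3/bucket:Bucket::logs"))
```
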

In general, remote resources as a feature seem quite complex and hard to understand. For instance, in the case of importing a foreign stack, is it possible that a serialized resource will be received whose resource type is not available in the currently executing Pulumi program (e.g. an AWS EC2 instance resource arriving in a Pulumi program using the Java or Scala SDK that has no direct dependency on the AWS package, so no jar is added to the classpath and no Instance class is available)? If not, how so?

This is a good point. I think they were heavily designed around a TypeScript world with dynamic types. We'll need to think about this.

There's a bunch of options available on RegisterResourceRequest that do not seem to have any relevancy in implementations we've been basing our work on (js, go, java), e.g. acceptSecrets or supportsPartialValues. Is there some kind of documentation available that would explain those flags and parameters in depth?

Not really. A lot of these are for compat with old engines and old SDKs. We should probably document these better in the protobuf files themselves.

Going deeper on the implementation side of things: are the semantics of propagating properties like secretness or the protect flag declared anywhere (as rules that implementations have to follow)? While secretness is generally quite understandable (anything derived from a secret must be a secret unless the user explicitly makes the output abandon secrecy), it's not that obvious in the case of the protect flag. The Java implementation does something that seems to make rational sense: it checks whether the parent resource is protected and, if it is, the child inherits the protection. That's not what happens in pulumi-go, for instance, where each resource has its protect flag set separately.

Again, this isn't really documented anywhere, but there are two paths of work happening to improve it. Firstly, we're trying to remove the need for logic around these properties from SDKs: things like inheriting protect from the parent will be done by the engine so SDKs don't need to. Secondly, we're starting up a new way to supply test programs to language plugins that will validate they behave the way the engine expects; this will include things like merging secret and non-secret properties together and checking they are still secret.
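
The two rules discussed here can be stated compactly. A minimal sketch, assuming a simplified Output that carries a single secret flag: a combined value is secret if any input is, only an explicit opt-out clears the flag, and a child with no explicit protect setting inherits its parent's (the rule the Java SDK follows and the engine is expected to take over). The names are hypothetical, not any SDK's API, and letting an explicit child setting win is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class Output:
    """Simplified output value carrying only a secret flag."""
    value: Any
    secret: bool = False


def combine(f: Callable[..., Any], *inputs: Output) -> Output:
    """Apply f to the input values; the result is secret if any input is."""
    return Output(f(*(o.value for o in inputs)), secret=any(o.secret for o in inputs))


def unsecret(o: Output) -> Output:
    """Explicit opt-out: the only way a derived value stops being secret."""
    return Output(o.value, secret=False)


def effective_protect(explicit: Optional[bool], parent_protect: bool) -> bool:
    """A child without its own protect setting inherits its parent's."""
    return parent_protect if explicit is None else explicit


password = Output("hunter2", secret=True)
host = Output("db.internal")
conn = combine(lambda h, p: f"postgres://app:{p}@{h}", host, password)
print(conn.secret)  # True
```
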

Regarding integration with Pulumi itself: for besom (pulumi-scala) we have found out that by implementing a language host and manually installing language plugin we are able to use besom with pulumi cli without any issues. We haven't really gotten to the point where our codegen would require integration with pulumi tooling (cli, crd2pulumi, tf2pulumi) but we strongly hope that's something that is going to be possible in the future.

crd2pulumi and tf2pulumi are both on the path to deprecation (tf2pulumi is in fact already deprecated) so don't worry about integration with those. We are working on new RPC interfaces to allow language plugins to supply codegen to the CLI rather than it being built in and in fact the Go and NodeJS language plugins are already working this way for most cli integrations.

There's also the question of integration with docs and examples. We assume that snippets in javadoc/godoc/*doc (mentioned in point 7 above) and/or proper pulumi.com website docs are not written by hand and are also code generated. It would be massively helpful to know the following: a) What are these docs generated from? We'd love to generate them for our SDKs too. b) Will you host community-driven SDK docs on your site or just provide a link to the doc websites of external SDKs? c) If the latter is true, what are the expected standards for such docs? Our best guess was that we should probably set up our own site (based on something like Docusaurus or similar) and mirror much of the structure of the Pulumi docs, e.g. concepts, programming model with examples, provider library docs and such, plus an explicit declaration of which things differ from other SDKs.

They're generated by our "docgen" tool, which is split across https://github.com/pulumi/pulumi-hugo/ and https://github.com/pulumi/pulumi. It's also due for a rework, so I wouldn't invest much time in learning it now. I don't know if we'll host community-driven SDK docs; I suspect we probably will, but it's something we need to discuss and plan for internally.

rigzba21 commented 11 months ago

Are the links in the docs for "How can I add support for my favorite language?" still the best place to learn more about this?

Supported languages run out of process and communicate over gRPC with the Pulumi engine and resource providers. Check out the protocol definitions along with the language providers themselves. You can explore how we added support for Go, which should help with scoping. There is also a summary of the core work items needed as part of adding support for a typical new language on the New Language wiki page.

Frassle commented 11 months ago

still the best place to learn more about this?

Yes and feel free to ask questions on the #contribute channel in our community slack.