inngest / inngest-js

The developer platform for easily building reliable workflows with zero infrastructure for TypeScript & JavaScript
https://www.inngest.com/
GNU General Public License v3.0

Add `@inngest/middleware-remote-state` #639

Open jpwilliams opened 4 months ago

jpwilliams commented 4 months ago

Summary

Adds @inngest/middleware-remote-state, giving consumers an easy way to push state to a remote store with a dataloader-like API.
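The PR doesn't show the API yet. As a rough illustration of what "dataloader-like" usually means (the batching pattern popularized by the `dataloader` package), here is a minimal sketch over a hypothetical `RemoteStore` interface. It uses an in-memory store here; a real one might wrap S3. `load()` calls made in the same tick are deduplicated and resolved with a single `getMany()` round trip. All names (`RemoteStore`, `MemoryStore`, `BatchLoader`) are assumptions for illustration, not the middleware's actual API.

```typescript
// Hypothetical remote store interface; a real implementation might wrap S3.
interface RemoteStore {
  putMany(entries: Map<string, string>): Promise<void>;
  getMany(keys: string[]): Promise<Map<string, string>>;
}

// In-memory stand-in that counts round trips so batching is observable.
class MemoryStore implements RemoteStore {
  private data = new Map<string, string>();
  roundTrips = 0;

  async putMany(entries: Map<string, string>): Promise<void> {
    this.roundTrips++;
    for (const [k, v] of entries) this.data.set(k, v);
  }

  async getMany(keys: string[]): Promise<Map<string, string>> {
    this.roundTrips++;
    const out = new Map<string, string>();
    for (const k of keys) {
      const v = this.data.get(k);
      if (v !== undefined) out.set(k, v);
    }
    return out;
  }
}

type Waiter = {
  resolve: (v: string | undefined) => void;
  reject: (e: unknown) => void;
};

// Dataloader-style reader: load() calls made in the same tick are
// collected (duplicate keys share one fetch) and flushed together.
class BatchLoader {
  private store: RemoteStore;
  private pending = new Map<string, Waiter[]>();
  private scheduled = false;

  constructor(store: RemoteStore) {
    this.store = store;
  }

  load(key: string): Promise<string | undefined> {
    return new Promise<string | undefined>((resolve, reject) => {
      const waiters = this.pending.get(key) ?? [];
      waiters.push({ resolve, reject });
      this.pending.set(key, waiters);
      if (!this.scheduled) {
        this.scheduled = true;
        // Defer until every load() in the current tick has been queued.
        queueMicrotask(() => void this.flush());
      }
    });
  }

  private async flush(): Promise<void> {
    const batch = this.pending;
    this.pending = new Map();
    this.scheduled = false;
    try {
      const values = await this.store.getMany([...batch.keys()]);
      for (const [key, waiters] of batch) {
        for (const w of waiters) w.resolve(values.get(key));
      }
    } catch (err) {
      // One failed round trip rejects every caller in the batch.
      for (const waiters of batch.values()) {
        for (const w of waiters) w.reject(err);
      }
    }
  }
}
```

The design choice is the usual DataLoader trade-off: a step that reads several pieces of remote state pays for one network round trip instead of one per key.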

We could also:

Checklist

changeset-bot[bot] commented 4 months ago

⚠️ No Changeset found

Latest commit: 7df196196650155a9c0bdcb23567aa8d999f61e4

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets. When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types.

JonParton commented 4 months ago

Hi Guys!

Keeping a watch on this and loving the progress on the Cross Compatible Encryption Middleware too 🙌

I have a question about how this is planned to be implemented that I can't answer from the stub code so far.

Is the plan to let you store some of the event data in remote state rather than all of it, similar to the encryption middleware's "Customizing Event Encryption" options?

This is to make sure that Inngest features still work, such as triggering based on a CEL `if` expression that references non-sensitive top-level data, while all the data-residency-sensitive fields are stored in a `sensitive`, `encrypted`, or similar sub-property of the event payload.

```ts
import { Inngest } from "inngest";

const inngest = new Inngest({ id: "my-app" }); // illustrative app ID

await inngest.send({
  name: "app/account.created",
  data: {
    billingPlan: "pro", // Non-sensitive; could be used to trigger functions with an `if` expression, etc.
    sensitive: {
      // The sensitive data to be stored remotely or encrypted
      accountId: "645e9f6794e10937e9bdc201",
      name: "Dave Smith",
    },
  },
  user: {
    external_id: "645ea000129f1c40109ca7ad",
    email: "taylor@example.com",
  }, // If you were being PII-sensitive, you presumably wouldn't put real PII in `user`, as Inngest uses it for "delete me" requests, etc.
});
```

Of course, from my understanding, all step data should be stored remotely, since it could contain all sorts of data from variables, but being able to select which parts of the event data go remote makes sense!
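A minimal sketch of the partial-offload idea described here, assuming a hypothetical key-value store and a hypothetical reference shape `{ __remote: key }`: only `data.sensitive` is moved out and replaced by a small reference, so top-level fields such as `billingPlan` stay visible to CEL expressions. The names `KeyValueStore`, `InMemoryKV`, `offloadField` and `restoreField` are illustrative only, not the middleware's API.

```typescript
// Hypothetical reference left in the payload where the data used to be.
interface RemoteRef {
  __remote: string;
}

// Hypothetical store; a real one might be an S3 bucket in the client's
// own region for data residency.
interface KeyValueStore {
  put(value: unknown): Promise<string>;
  get(key: string): Promise<unknown>;
}

class InMemoryKV implements KeyValueStore {
  private data = new Map<string, string>();
  private nextId = 0;

  async put(value: unknown): Promise<string> {
    const key = `obj-${this.nextId++}`;
    this.data.set(key, JSON.stringify(value));
    return key;
  }

  async get(key: string): Promise<unknown> {
    const raw = this.data.get(key);
    if (raw === undefined) throw new Error(`missing remote object ${key}`);
    return JSON.parse(raw);
  }
}

type EventData = Record<string, unknown>;

function isRemoteRef(v: unknown): v is RemoteRef {
  return typeof v === "object" && v !== null && typeof (v as RemoteRef).__remote === "string";
}

// Before sending: swap data[field] for a reference, leave other fields alone.
async function offloadField(data: EventData, field: string, store: KeyValueStore): Promise<EventData> {
  if (!(field in data)) return data;
  const key = await store.put(data[field]);
  const ref: RemoteRef = { __remote: key };
  return { ...data, [field]: ref };
}

// Inside the function run: swap the reference back for the real value.
async function restoreField(data: EventData, field: string, store: KeyValueStore): Promise<EventData> {
  const value = data[field];
  if (!isRemoteRef(value)) return data;
  return { ...data, [field]: await store.get(value.__remote) };
}
```

Only the reference ever reaches Inngest, so a CEL expression like `event.data.billingPlan == "pro"` keeps working while the PII stays in the remote store.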

Are there also wheels in motion to implement this remote state middleware in Python too? 🙈

jpwilliams commented 3 months ago

Hi @JonParton! 👋

Awesome! Aye, the cross-language encryption middleware is teetering on the edge of release, which also opens up possibilities for building on top of it, like this middleware.

As you say, just like the encryption middleware, pushing some data to a remote store (say, S3) and keeping other parts available to Inngest for CEL expressions would indeed be one of the features of this middleware too.

It can be used for some of the same purposes as the E2E encryption middleware (protecting PII), but may also complement it and be used alongside encryption, perhaps for data residency purposes or for steps with very large outputs (generated files, for example) that may be best stored somewhere external.
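For the large-output case, one possible approach is to keep small step results inline and push anything over a size threshold to external storage, returning a pointer instead. This is a minimal sketch under assumptions: the `BlobStore` interface, the `Stored<T>` shape and the `maxInlineBytes` threshold are all hypothetical, and no particular Inngest size limit is implied.

```typescript
// Hypothetical blob store; in practice this might be S3 or similar.
interface BlobStore {
  put(body: string): Promise<string>;
  get(key: string): Promise<string>;
}

class MemoryBlobStore implements BlobStore {
  private blobs = new Map<string, string>();
  private nextId = 0;

  async put(body: string): Promise<string> {
    const key = `step-output-${this.nextId++}`;
    this.blobs.set(key, body);
    return key;
  }

  async get(key: string): Promise<string> {
    const body = this.blobs.get(key);
    if (body === undefined) throw new Error(`missing blob ${key}`);
    return body;
  }
}

// Either the value itself, or a pointer to where it was stored.
type Stored<T> = { inline: T } | { blobKey: string; bytes: number };

// Measure the serialized UTF-8 size and offload only if it is too big.
async function storeStepOutput<T>(output: T, store: BlobStore, maxInlineBytes: number): Promise<Stored<T>> {
  const json = JSON.stringify(output);
  const bytes = new TextEncoder().encode(json).length;
  if (bytes <= maxInlineBytes) return { inline: output };
  return { blobKey: await store.put(json), bytes };
}

// Resolve a stored output back into the original value.
async function loadStepOutput<T>(stored: Stored<T>, store: BlobStore): Promise<T> {
  if ("inline" in stored) return stored.inline;
  return JSON.parse(await store.get(stored.blobKey)) as T;
}
```

Keeping small outputs inline avoids a remote round trip for the common case; only generated files and similar bulky results pay for external storage.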

No immediate plans yet for Python, but the aim is that this is a relatively thin layer on top of the base we build for @inngest/middleware-encryption; carrying it across to Python would hopefully be a small lift to write once we've settled on the APIs.

I believe you're using the encryption middleware already? I'd love to know:

JonParton commented 3 months ago

Hi @jpwilliams !

So we are not actually using the encryption middleware just yet: we have backend services in both Python and TypeScript written by different teams, so until the two versions hit parity it was a no-go. However, the team is now changing our event schemas in preparation for using both of these middlewares.

Are you customizing which fields are encrypted outside of the default data.encrypted?

Our current idea is to structure it as the example above (and repeated below):

```ts
import { Inngest } from "inngest";

const inngest = new Inngest({ id: "my-app" }); // illustrative app ID

await inngest.send({
  name: "app/account.created",
  data: {
    billingPlan: "pro", // Non-sensitive; could be used to trigger functions with an `if` expression, etc.
    sensitive: {
      // The sensitive data to be stored remotely or encrypted
      accountId: "645e9f6794e10937e9bdc201",
      name: "Dave Smith",
    },
  },
  user: {
    external_id: "645ea000129f1c40109ca7ad",
  }, // If you were being PII-sensitive, you presumably wouldn't put real PII in `user`, as Inngest uses it for "delete me" requests, etc.
});
```

Basically, we store all of the values that need encrypting (or remote storing) in the `data.sensitive` sub-property. We chose this over `data.encrypted` because, depending on each deployment's needs (we will have multiple environments for different clients), we will use either encryption for security or remote state storage to help with data residency. The more generic `data.sensitive` name fits both cases, and it is where we will put any PII we deal with.
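The per-deployment choice described above could be modelled as a strategy selected from configuration, applied to the same `data.sensitive` field either way. This is a minimal sketch under assumptions: `SensitiveMode`, `SensitiveStrategy`, `strategyFor` and `protectSensitive` are hypothetical names, and AES-256-GCM via Node's `node:crypto` is used purely for illustration. It is not claimed to match the encryption middleware's default service, and cross-SDK compatibility would require the Python side to use the same scheme.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical per-deployment setting.
type SensitiveMode = "encrypt" | "remote";

interface SensitiveStrategy {
  protect(value: unknown): Promise<unknown>;
  reveal(stored: unknown): Promise<unknown>;
}

// Illustrative AES-256-GCM; not necessarily what the encryption middleware uses.
class EncryptStrategy implements SensitiveStrategy {
  private key: Buffer;

  constructor(key: Buffer) {
    if (key.length !== 32) throw new Error("AES-256 needs a 32-byte key");
    this.key = key;
  }

  async protect(value: unknown): Promise<unknown> {
    const iv = randomBytes(12); // fresh nonce per message
    const cipher = createCipheriv("aes-256-gcm", this.key, iv);
    const ct = Buffer.concat([cipher.update(JSON.stringify(value), "utf8"), cipher.final()]);
    return { iv: iv.toString("base64"), tag: cipher.getAuthTag().toString("base64"), data: ct.toString("base64") };
  }

  async reveal(stored: unknown): Promise<unknown> {
    const { iv, tag, data } = stored as { iv: string; tag: string; data: string };
    const decipher = createDecipheriv("aes-256-gcm", this.key, Buffer.from(iv, "base64"));
    decipher.setAuthTag(Buffer.from(tag, "base64")); // final() throws if tampered
    const pt = Buffer.concat([decipher.update(Buffer.from(data, "base64")), decipher.final()]);
    return JSON.parse(pt.toString("utf8"));
  }
}

// In-memory stand-in for a region-pinned remote store.
class RemoteStrategy implements SensitiveStrategy {
  private objects = new Map<string, string>();
  private nextId = 0;

  async protect(value: unknown): Promise<unknown> {
    const key = `sensitive-${this.nextId++}`;
    this.objects.set(key, JSON.stringify(value));
    return { __remote: key };
  }

  async reveal(stored: unknown): Promise<unknown> {
    const key = (stored as { __remote: string }).__remote;
    const raw = this.objects.get(key);
    if (raw === undefined) throw new Error(`missing ${key}`);
    return JSON.parse(raw);
  }
}

function strategyFor(mode: SensitiveMode, key: Buffer): SensitiveStrategy {
  return mode === "encrypt" ? new EncryptStrategy(key) : new RemoteStrategy();
}

// Same event shape in every deployment; only data.sensitive is transformed.
async function protectSensitive(data: Record<string, unknown>, s: SensitiveStrategy): Promise<Record<string, unknown>> {
  if (!("sensitive" in data)) return data;
  return { ...data, sensitive: await s.protect(data["sensitive"]) };
}
```

Because the field name is the same in both modes, application code and CEL expressions don't change between client environments; only the configured strategy does.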

Do you use your own encryption service or the default provided by the package?

The plan is to use the default provided unless you'd advise otherwise! The main thing is to make sure there is compatibility between the language SDKs!


We would definitely love the state store middleware to be available in Python too 🙏 Our first client with strong data residency requirements will need cross-language compatibility for this in 2-3 months, but will be using the TS version sooner. We could of course patch this ourselves if all the middleware hooks are available in Python, but we'd prefer to have it out of the box, ideally as something we could contribute back to! 💖


Thanks for all the work on this, Jack! And also for nailing another idea I briefly discussed with Tony the other day, actually using the Zod schemas for validation, without me even realising it was in flight! 🚀