serverlessworkflow / specification

Contains the official specification for the Serverless Workflow Domain Specific Language. It provides detailed guidelines and standards for defining, executing, and managing workflows in serverless environments, ensuring consistency and interoperability across implementations.
http://serverlessworkflow.io
Apache License 2.0
741 stars 147 forks source link

Properly decide on availability of `$secrets` argument #979

Open matthias-pichler opened 2 months ago

matthias-pichler commented 2 months ago

What would you like to be added:

Screenshot 2024-08-19 at 20 35 56

currently $secrets is documented to be available everywhere, but I know @cdavernas mentioned it should only be available in input.from to avoid leaks.

Why is this needed:

JBBianchi commented 2 months ago

I understand the reasoning but I'm concerned it might have unintended consequences. Keeping $secrets accessible everywhere makes it clear to the author that they're dealing with sensitive information, ensuring caution in its use. However, if $secrets is only available in input.from, there's a risk that as the data moves through the pipeline, the secret could be aliased, potentially obscuring its sensitive nature.

do:
- someUnsafeCall:
  call: http
  with:
    method: get
    endpoint: ${ "https://server.com/api/unsafe-auth/\($secrets.token)" # <-- the use of sensitive data is clear

vs

do:
- someUnsafeCall:
  input:
    from:
      token: $secrets.token
  call: http
  with:
    method: get
    endpoint: https://server.com/api/unsafe-auth/{token} # <-- the use of sensitive data is not obvious
cdavernas commented 2 months ago

@JBBianchi I agree with you, but on the other hand leaving them accessible everywhere is like telling the authors it's ok to use them everywhere, like when calling a (potentially custom) function.

Plus, if the user explicitly decided to flow it down, the responsibility is IMHO his.

matthias-pichler commented 2 months ago

@JBBianchi good point. It's much harder to accidentally leak the $secrets arg than the input

cdavernas commented 2 months ago

@JBBianchi good point. It's much harder to accidentally leak the $secrets arg than the input

I disagree. I don't see why it would be harder. For most users, $secrets is just gonna be yet another runtime expression variable.

Annyways, whatever you guys decide is fine by me. I just think we should be extra careful with those, as it's our responsibility to "educate" authors on the proper approach to adopt when using this feature.

cdavernas commented 2 months ago

To quote ChatGPT's opinion on it:

In a Serverless Workflow, the availability of the $secrets argument should be carefully considered to balance security and flexibility.

1. Security Considerations

2. Flexibility Considerations

Best Practice Recommendation

Given the above considerations, the most secure approach is to limit the availability of $secrets to the workflow input and specific, well-defined points within the workflow where access is necessary. This could include:

Implementation Strategy

This approach balances security with the necessary flexibility, ensuring that secrets are protected while still allowing workflows to function as needed.

matthias-pichler commented 2 months ago

I think it's easier to leak when the secret has to be put in the input

imagine I want to send some object (e.g. pet) to an api key authenticated API, first I write:

do:
  - apiCall:
      call: http
      with:
        method: post
        endpoint: https://server.com/api/pets
        body: ${ . }

... then I realise I forgot auth so I add it

do:
  - apiCall:
      input:
        from:
          key: ${ $secrets.apiKey }
      call: http
      with:
        method: post
        endpoint: https://server.com/api/pets
        headers:
          X-API-Key: ${ .key }
        body: ${ . }

and I already made the subtle mistake of no ONLY sending my api key, because input transformations have so much impact an the whole task. It would have been much harder to make such a mistake when using $secrets

JBBianchi commented 2 months ago

Maybe there is a mitigation to be found with implicitly "sharing" a new $secrets variable ?

e.g.:

do:
  - apiCall:
      input:
        $secrets:
          key: ${ $secrets.apiKey }
      call: http
      with:
        method: post
        endpoint: https://server.com/api/pets
        headers:
          X-API-Key: ${ $secrets.key }
        body: ${ . }

And, for instance

do:
  - apiCall:
      input:
        from:
          key: ${ $secrets.apiKey }
#...

Would fail because we do not inject $secrets when processing input.from but only input.$secrets ?

Just throwing ideas here.

matthias-pichler commented 2 months ago

Would fail because we do not inject $secrets when processing input.from but only input.$secrets ?

This feels like it achieves the same just with extra steps because now I have to use ${ .$secrets.foo } (extra .) instead of ${ $secrets.foo } inside the task which seems really confusing. Plus I see this to just being used as

do:
  - apiCall:
      input:
        $secrets: ${ $secrets } # also looks terrible doesn't it 😅 
      call: http
      with:
        ...
cdavernas commented 2 months ago

@matthias-pichler very good sample, which perfectly demonstrates your rightful concerns. You could however leak them in a similar fashion using the runtime expression argument.

Anyways, I won't endlessly ramble on the subject: I think you make a strong and valid point.

@JBBianchi even though I love both the idea and the (awesome) will to find a consensual solution, I feel this would complexify things for authors and might even be confusing.


To conclude, you guys convinced me, and have my vote to definitely close that issue!

JBBianchi commented 2 months ago

I don't think we should dismiss this topic, the concern is real. Imagine a distributed runtime where each activity processor is a FaaS for instance. Is it really a good idea to always send the all $secrets over the wire ? Shouldn't we find a way to explicitly send the $secrets required by the activity ? In that case, shouldn't we enforce a variable name that clearly underlines the sensitive nature of the data to the author ?

cdavernas commented 2 months ago

Imagine a distributed runtime where each activity processor is a FaaS for instance.

Very good point, which was also, if I remember properly, part of the motivation of the initial restriction.

@matthias-pichler Any idea/suggestion?

matthias-pichler commented 2 months ago

Imagine a distributed runtime where each activity processor is a FaaS for instance.

Our runtime is currently implemented such that every task is run in a Lambda Function.

I understand that it's not ideal/least privilege that every Lambda Function needs access to the secrets, however given the tradeoffs I would still prefer having $secrets everywhere. Furthermore currently we have input transformation and task execution in two separate functions but due to performance considerations I am currently merging input/output transforms and task executions into one Lambda Function. So given these circumstances (which I assume are fairly common) it wouldn't change anything and every Task Lambda Function would still need access to secrets for the sake of input transformations

JBBianchi commented 2 months ago

Imagine a distributed runtime where each activity processor is a FaaS for instance.

Our runtime is currently implemented such that every task is run in a Lambda Function.

I understand that it's not ideal/least privilege that every Lambda Function needs access to the secrets, however given the tradeoffs I would still prefer having $secrets everywhere. Furthermore currently we have input transformation and task execution in two separate functions but due to performance considerations I am currently merging input/output transforms and task executions into one Lambda Function. So given these circumstances (which I assume are fairly common) it wouldn't change anything and every Task Lambda Function would still need access to secrets for the sake of input transformations

Well, I'm glad that my hypothetical use-case is an actual one. But I don't think it's OK to share secrets with "everyone", that's quite against their purpose.

What are the arguments against "explicitly select" the secrets provided to a task ?

The following is just a conceptual example, the how can be adjusted:

document:
  #...
use:
  secrets:
  - GitHubToken
  - XToken
do:
  - release:
      secrets:
        - GitHubToken # explicitly share the `GitHubToken` secret
      call: http
      with:
        method: GET
        endpoint:
          uril: https://api.github.com/my-repo/release
          token: ${ $secrets.GitHubToken }
- announceOnX:
      secrets:
        - XToken # explicitly share the `XToken` secret
      call: http
      with:
        method: POST
        endpoint:
          uril: https://api.github.com/my-repo/release
          token: ${ $secrets.XToken }
        body: ${ "Checkout our new release at \(.artifact.url)" }
- updateInternalRegistry:
      call: http
      with:
        method: POST
        endpoint:
          uril: https://server.local
          token: ${ $secrets.GitHubToken } # <--- error, $secrets is empty

It does add an extra step but it doesn't seem to be a huge overhead.

matthias-pichler commented 2 months ago

Well, I'm glad that my hypothetical use-case is an actual one. But I don't think it's OK to share secrets with "everyone", that's quite against their purpose.

I don't quite agree with "share secrets with "everyone"" ... all the Lambda Functions are managed and implemented by us and it would not be any different if it were run in a container. If a custom function (which we don't control) is invoked the input and any expressions in it are evaluated on our side and then the untrusted code is executed with the parsed input... I don't really see who'd we be sharing the secrets with?

What are the arguments against "explicitly select" the secrets provided to a task ?

I think the only one I have is that it's just more work for authors and it seems super repetitive, e.g. when building a workflow that calls the same API 4 times having to specify it 5 times in total (including at the workflow level). It's also so much more overhead around secrets that I have not seen from any other workflow engine. E.g. Pipedream, GitHub, GitLab, etc you define secrets and then they are available as environment variables. And I don't really see any reason why we should make it much more complicated

JBBianchi commented 2 months ago

I don't quite agree with "share secrets with "everyone"" ... all the Lambda Functions are managed and implemented by us and it would not be any different if it were run in a container. If a custom function (which we don't control) is invoked the input and any expressions in it are evaluated on our side and then the untrusted code is executed with the parsed input... I don't really see who'd we be sharing the secrets with?

I must admit I dramatized the situation (although I wouldn't say relying on a 3rd party infrastructure is the same as running a container on your own isolated host). My concern doesn't 100% apply on the situation you describe. The problem would rather be when calling a 3rd party activity processor, or if the network has been compromised (e.g. subject to MitM attacks).

I think the only one I have is that it's just more work for authors and it seems super repetitive, e.g. when building a workflow that calls the same API 4 times having to specify it 5 times in total (including at the workflow level). It's also so much more overhead around secrets that I have not seen from any other workflow engine. E.g. Pipedream, GitHub, GitLab, etc you define secrets and then they are available as environment variables. And I don't really see any reason why we should make it much more complicated

I agree, it's not very pretty and quite cumbersome.