Open matthias-pichler opened 2 months ago
I understand the reasoning but I'm concerned it might have unintended consequences. Keeping $secrets accessible everywhere makes it clear to the author that they're dealing with sensitive information, ensuring caution in its use. However, if $secrets is only available in input.from, there's a risk that as the data moves through the pipeline, the secret could be aliased, potentially obscuring its sensitive nature.
do:
- someUnsafeCall:
call: http
with:
method: get
endpoint: ${ "https://server.com/api/unsafe-auth/\($secrets.token)" # <-- the use of sensitive data is clear
vs
do:
- someUnsafeCall:
input:
from:
token: $secrets.token
call: http
with:
method: get
endpoint: https://server.com/api/unsafe-auth/{token} # <-- the use of sensitive data is not obvious
@JBBianchi I agree with you, but on the other hand leaving them accessible everywhere is like telling the authors it's ok to use them everywhere, like when calling a (potentially custom) function.
Plus, if the user explicitly decided to flow it down, the responsibility is IMHO his.
@JBBianchi good point. It's much harder to accidentally leak the $secrets
arg than the input
@JBBianchi good point. It's much harder to accidentally leak the $secrets arg than the input
I disagree. I don't see why it would be harder. For most users, $secrets
is just gonna be yet another runtime expression variable.
Annyways, whatever you guys decide is fine by me. I just think we should be extra careful with those, as it's our responsibility to "educate" authors on the proper approach to adopt when using this feature.
To quote ChatGPT's opinion on it:
In a Serverless Workflow, the availability of the $secrets
argument should be carefully considered to balance security and flexibility.
$secrets
available only at the workflow input (or in controlled, specific places) reduces the risk of accidental exposure or misuse of sensitive data. The broader the availability of secrets, the higher the chances of them being logged, passed to unintended functions, or exposed in runtime errors.$secrets
to workflow input, you adhere to this principle, ensuring that secrets are only available to components that truly need them.$secrets
to be available throughout the workflow can enhance flexibility and make the workflow easier to design and maintain.$secrets
.Given the above considerations, the most secure approach is to limit the availability of $secrets
to the workflow input and specific, well-defined points within the workflow where access is necessary. This could include:
$secrets
available by default only at the workflow input.$secrets
within the workflow. This could be achieved via configuration or annotations, requiring an extra step to expose secrets, thus making users consciously decide when and where secrets are needed.This approach balances security with the necessary flexibility, ensuring that secrets are protected while still allowing workflows to function as needed.
I think it's easier to leak when the secret has to be put in the input
imagine I want to send some object (e.g. pet) to an api key authenticated API, first I write:
do:
- apiCall:
call: http
with:
method: post
endpoint: https://server.com/api/pets
body: ${ . }
... then I realise I forgot auth so I add it
do:
- apiCall:
input:
from:
key: ${ $secrets.apiKey }
call: http
with:
method: post
endpoint: https://server.com/api/pets
headers:
X-API-Key: ${ .key }
body: ${ . }
and I already made the subtle mistake of no ONLY sending my api key, because input transformations have so much impact an the whole task. It would have been much harder to make such a mistake when using $secrets
Maybe there is a mitigation to be found with implicitly "sharing" a new $secrets
variable ?
e.g.:
do:
- apiCall:
input:
$secrets:
key: ${ $secrets.apiKey }
call: http
with:
method: post
endpoint: https://server.com/api/pets
headers:
X-API-Key: ${ $secrets.key }
body: ${ . }
And, for instance
do:
- apiCall:
input:
from:
key: ${ $secrets.apiKey }
#...
Would fail because we do not inject $secrets
when processing input.from
but only input.$secrets
?
Just throwing ideas here.
Would fail because we do not inject $secrets when processing input.from but only input.$secrets ?
This feels like it achieves the same just with extra steps because now I have to use ${ .$secrets.foo }
(extra .
) instead of ${ $secrets.foo }
inside the task which seems really confusing. Plus I see this to just being used as
do:
- apiCall:
input:
$secrets: ${ $secrets } # also looks terrible doesn't it 😅
call: http
with:
...
@matthias-pichler very good sample, which perfectly demonstrates your rightful concerns. You could however leak them in a similar fashion using the runtime expression argument.
Anyways, I won't endlessly ramble on the subject: I think you make a strong and valid point.
@JBBianchi even though I love both the idea and the (awesome) will to find a consensual solution, I feel this would complexify things for authors and might even be confusing.
To conclude, you guys convinced me, and have my vote to definitely close that issue!
I don't think we should dismiss this topic, the concern is real. Imagine a distributed runtime where each activity processor is a FaaS for instance.
Is it really a good idea to always send the all $secrets
over the wire ?
Shouldn't we find a way to explicitly send the $secrets
required by the activity ?
In that case, shouldn't we enforce a variable name that clearly underlines the sensitive nature of the data to the author ?
Imagine a distributed runtime where each activity processor is a FaaS for instance.
Very good point, which was also, if I remember properly, part of the motivation of the initial restriction.
@matthias-pichler Any idea/suggestion?
Imagine a distributed runtime where each activity processor is a FaaS for instance.
Our runtime is currently implemented such that every task is run in a Lambda Function.
I understand that it's not ideal/least privilege that every Lambda Function needs access to the secrets, however given the tradeoffs I would still prefer having $secrets
everywhere. Furthermore currently we have input transformation and task execution in two separate functions but due to performance considerations I am currently merging input/output transforms and task executions into one Lambda Function. So given these circumstances (which I assume are fairly common) it wouldn't change anything and every Task Lambda Function would still need access to secrets for the sake of input transformations
Imagine a distributed runtime where each activity processor is a FaaS for instance.
Our runtime is currently implemented such that every task is run in a Lambda Function.
I understand that it's not ideal/least privilege that every Lambda Function needs access to the secrets, however given the tradeoffs I would still prefer having
$secrets
everywhere. Furthermore currently we have input transformation and task execution in two separate functions but due to performance considerations I am currently merging input/output transforms and task executions into one Lambda Function. So given these circumstances (which I assume are fairly common) it wouldn't change anything and every Task Lambda Function would still need access to secrets for the sake of input transformations
Well, I'm glad that my hypothetical use-case is an actual one. But I don't think it's OK to share secrets with "everyone", that's quite against their purpose.
What are the arguments against "explicitly select" the secrets provided to a task ?
The following is just a conceptual example, the how can be adjusted:
document:
#...
use:
secrets:
- GitHubToken
- XToken
do:
- release:
secrets:
- GitHubToken # explicitly share the `GitHubToken` secret
call: http
with:
method: GET
endpoint:
uril: https://api.github.com/my-repo/release
token: ${ $secrets.GitHubToken }
- announceOnX:
secrets:
- XToken # explicitly share the `XToken` secret
call: http
with:
method: POST
endpoint:
uril: https://api.github.com/my-repo/release
token: ${ $secrets.XToken }
body: ${ "Checkout our new release at \(.artifact.url)" }
- updateInternalRegistry:
call: http
with:
method: POST
endpoint:
uril: https://server.local
token: ${ $secrets.GitHubToken } # <--- error, $secrets is empty
It does add an extra step but it doesn't seem to be a huge overhead.
Well, I'm glad that my hypothetical use-case is an actual one. But I don't think it's OK to share secrets with "everyone", that's quite against their purpose.
I don't quite agree with "share secrets with "everyone"" ... all the Lambda Functions are managed and implemented by us and it would not be any different if it were run in a container. If a custom function (which we don't control) is invoked the input and any expressions in it are evaluated on our side and then the untrusted code is executed with the parsed input... I don't really see who'd we be sharing the secrets with?
What are the arguments against "explicitly select" the secrets provided to a task ?
I think the only one I have is that it's just more work for authors and it seems super repetitive, e.g. when building a workflow that calls the same API 4 times having to specify it 5 times in total (including at the workflow level). It's also so much more overhead around secrets that I have not seen from any other workflow engine. E.g. Pipedream, GitHub, GitLab, etc you define secrets and then they are available as environment variables. And I don't really see any reason why we should make it much more complicated
I don't quite agree with "share secrets with "everyone"" ... all the Lambda Functions are managed and implemented by us and it would not be any different if it were run in a container. If a custom function (which we don't control) is invoked the input and any expressions in it are evaluated on our side and then the untrusted code is executed with the parsed input... I don't really see who'd we be sharing the secrets with?
I must admit I dramatized the situation (although I wouldn't say relying on a 3rd party infrastructure is the same as running a container on your own isolated host). My concern doesn't 100% apply on the situation you describe. The problem would rather be when calling a 3rd party activity processor, or if the network has been compromised (e.g. subject to MitM attacks).
I think the only one I have is that it's just more work for authors and it seems super repetitive, e.g. when building a workflow that calls the same API 4 times having to specify it 5 times in total (including at the workflow level). It's also so much more overhead around secrets that I have not seen from any other workflow engine. E.g. Pipedream, GitHub, GitLab, etc you define secrets and then they are available as environment variables. And I don't really see any reason why we should make it much more complicated
I agree, it's not very pretty and quite cumbersome.
What would you like to be added:
currently
$secrets
is documented to be available everywhere, but I know @cdavernas mentioned it should only be available ininput.from
to avoid leaks.Why is this needed: