jqlang / jq

Command-line JSON processor
https://jqlang.github.io/jq/
Other
29.59k stars 1.54k forks source link

Add `--sandbox` flag to prevent dynamic loading of other files/data. #3092

Open jemc opened 2 months ago

jemc commented 2 months ago

Some use cases for jq may involve accepting untrusted input. See discussion in #1361 for some security considerations that may be relevant for those use cases.

This commit adds a --sandbox flag which is meant to mitigate one category of security issue with untrusted input: features of jq which are meant to let the jq filter access files/data other than the direct input data given to the CLI.

Specifically, the new --sandbox flag blocks the implicit loading of $HOME/.jq, and also blocks the use of import and include for loading other jq files.

If other features are added to jq in the future which allow for reading files/data as part of the filter syntax, it is intended that the --sandbox flag would also gate access to those.

itchyny commented 2 months ago

Should this flag block use of env?

wader commented 2 months ago

Should this flag block use of env?

Good point, block i think. Would env and $ENV just be empty objects, null or not defined at all?

Any special consideration around the input* functions?

jemc commented 2 months ago

I will have a crack at updating the PR to make env and $ENV be empty objects when the sandbox flag is present.

Regarding the input* functions, my thinking was that because these related to letting the jq filter interact with the inputs that were explicitly given to the filter by the invoker, I saw this as being "within the sandbox".

jemc commented 2 months ago

I've updated the PR to clear the $ENV value and env builtin, including tests and documentation changes to match. Thanks for the feedback!

Let me know if there is anything else.

itchyny commented 2 months ago

The input, inputs are safe, but I think wader is talking about input_filename and input_line_number.

jemc commented 2 months ago

I've pushed an amended commit that fixes the copy/paste mistake in the test echo strings noticed by @emanuele6.

Regarding input_filename and input_line_number:

There's possibly an argument for saying that input_filename should hide the real filename(s) of the input, but I personally don't find it that compelling. However, if someone here wants to specify a special behavior for that case, I will happily implement it as specified. For reference, note that input_filename is the string "<stdin>" when STDIN is used as the input.

For input_line_number I don't think it's leaking any unexpected security-sensitive information. It lets you examine an aspect of the input which is outside of the scope of JSON data structures, but it's still part of the input, so I think it's definitely fair game for a sandboxed filter. Please note that it also works properly when used with STDIN input, so it's not a feature that needs to peek at a real filesystem file to work properly - it's only looking at the string of the input.

jemc commented 2 months ago

@itchyny @wader @emanuele6 - can any of you give this PR a deeper review and approve it (or give further comments on things to be improved)?

emanuele6 commented 2 months ago

I am not personally particularly interested in this feature as it is implemented, so I can't comment much on it.

itchyny commented 2 months ago

Sorry for waiting, but I think we need more discussion on this topic. Using seccomp(2) could be better, or simple chroot(2) might be enough. Yes, platform-independent block like this PR could be the best. For cli usage we already provide Docker image so that might be enough. Anyway I think we need to collect more opinions, we are unpaid volunteers with the fear of being blamed for security issues.

wader commented 2 months ago

Good points @itchyny. It would be great to hear what @nicowilliams thinks and how it would affect other future sandbox efforts.

One thing might be to very clearly document what --sandbox actually means and also mention other methods to use if stronger isolation is needed like wasm, chroot, seccomp etc.

jemc commented 2 months ago

If there's a fear that --sandbox "implies too much" about the security properties of the feature, or otherwise create confusion, I could rename it to more specifically pinpoint what it actually does.

Naming is hard, but perhaps something like --disable-external-file-access would be less objectionable?

The main point here is that jq has some features which allow for external file access, and at minimum I just want a way to turn those features off.

pkoppstein commented 2 months ago

In the interests of making some progress on this feature, which in some way or another has been discussed since at least March 2017, I'd like to propose that --sandbox be added as an experimental feature, or at least one with explicit indications that certain details are subject to change.

A related thought is that, since there are so many different considerations and criteria regarding "security" and "sandbox", the "sandbox" flag could refer to a JSON object that specifies which "security flags" are to be enabled. E.g. {"now": false, "env": false} would mean that access to now and env are disabled.

jemc commented 1 month ago

A related thought is that, since there are so many different considerations and criteria regarding "security" and "sandbox", the "sandbox" flag could refer to a JSON object that specifies which "security flags" are to be enabled. E.g. {"now": false, "env": false} would mean that access to now and env are disabled.

Are there any other existing examples in jq of a CLI option that takes a JSON object as its value? I didn't find one.

Seems more idiomatic to have separate flags that control separate features.

Again, I'm only interested in disabling the features that allow for arbitrary file access from within the filter, so if I need to rename this flag to only mention the file restriction, I'm happy to rename it. Other flags could be added for controlling other features.

I just need guidance from an authoritative maintainer on what the flag name and desired behavior should be, and I can adapt the PR to match.