Open vi opened 1 month ago
Branch | 2068/merge |
Testbed | ubuntu-latest |
β οΈ WARNING: The following Measure does not have a Threshold. Without a Threshold, no Alerts will ever be generated!
Click here to create a new Threshold
For more information, see the Threshold documentation.
To only post results if a Threshold exists, set the--ci-only-thresholds
CLI flag.
Benchmark | Latency | nanoseconds (ns) |
---|---|---|
fibonacci 10 | π view plot β οΈ NO THRESHOLD | 488,400.00 |
foldl arrays 50 | π view plot β οΈ NO THRESHOLD | 1,821,900.00 |
foldl arrays 500 | π view plot β οΈ NO THRESHOLD | 23,810,000.00 |
foldr strings 50 | π view plot β οΈ NO THRESHOLD | 7,016,800.00 |
foldr strings 500 | π view plot β οΈ NO THRESHOLD | 61,943,000.00 |
generate normal 250 | π view plot β οΈ NO THRESHOLD | 40,043,000.00 |
generate normal 50 | π view plot β οΈ NO THRESHOLD | 1,806,400.00 |
generate normal unchecked 1000 | π view plot β οΈ NO THRESHOLD | 58,861,000.00 |
generate normal unchecked 200 | π view plot β οΈ NO THRESHOLD | 2,890,200.00 |
pidigits 100 | π view plot β οΈ NO THRESHOLD | 3,152,900.00 |
pipe normal 20 | π view plot β οΈ NO THRESHOLD | 1,484,500.00 |
pipe normal 200 | π view plot β οΈ NO THRESHOLD | 12,034,000.00 |
product 30 | π view plot β οΈ NO THRESHOLD | 836,050.00 |
scalar 10 | π view plot β οΈ NO THRESHOLD | 1,508,200.00 |
sum 30 | π view plot β οΈ NO THRESHOLD | 809,790.00 |
I personally think sigils are a good way forward; they're simple, easy to add, and there's precedent of their use on the CLI. My personal preference would to not duplicate them too much, at the risk of having a cryptic CLI that resembles a Perl expression: for example we could just use @
everywhere, but have a kind of URL/protocol specification that is maybe slightly less succinct but easier to understand. For example: @file:foo.txt
, @env:ENV_VAR
, etc.
We could have sensible but simplistic default: if nothing is specified, it's a file, and we use the extension to deduce the format. We probably also want a way to specify the format explicitly if needed, so that we can write @file:text:foo.txt
. This also solves the issue of how to get a Number directly instead of a string: if an environment variable store a number that we want to interpret as such,, we could do @env:nickel:ENV_VAR
. The default would be text, but this can be discussed.
Shall they be parsed from grammar.lalrpop, just like regular assignments? If so, I am uncertain how to design it properly.
This isn't required, if the parsing is really trivial. I think field paths on the CLI as in foo.bar.baz
in nickel eval config.ncl -- foo.bar.baz=5
used to be parsed separately. But field paths are a bit more involved - you can have escaping, so stuff like foo."bar.baz".balrg
, the code was already existing the grammar as Nickel needs to parse such field definitions as well. So it made sense to avoid duplication and do everything in the grammar.
This is not the case here though, as the syntax we propose isn't valid Nickel. So as long as we know exactly what characters we expect and we don't have any kind of stack to maintain like for well-bracketed expressions, parsing the sigil manually is most probably ok.
For the rest of your questions, I think @foo.txt
should behave exactly as sugar for foo='import "foo.txt"'
. We can even make it actual desugaring: that way, we don't have to duplicate any logic or think too hard about each detail. We can't really do that for environment variables, but this can be a guide: if for any question, the answer is probably how would that behave if we put the same content in a file and imported it instead.
if nothing is specified, it's a file, and we use the extension to deduce the format
if an environment variable store a number that we want to interpret as such
What is security standpoint for Nickel in general? Should be in general protect against mixing code and data?
Shortcuts like @$MY_FILE
meaning @file:text:$MY_FILE
can lead to surprising injections where MY_FILE
variable can be bent to contain a prefix like literal:nickel:evil_code
instead of a file name.
Should Nickel strive to make secure way the easiest and avoid easy to code, superficially working, but unsound pitfalls? Filename extension sniffing with a fallback from passive json/yaml/text to active Nickel code is already a step in the wrong direction if it does.
exactly as sugar for foo='import "foo.txt"'
It would be tricky to just transform @foo.txt
to import "foo.txt"
reliably, as foo.txt
may contain embedded "
or %{}
s. Escaping, then unescaping looks ugly and can also be a security issue if done poorly.
Proper way also opens a path to support non-Unicode filenames, to allow any files to be specified e.g. on Linux, like -- my_field=@file:text:$'ABC\xFF\xFE\xFF.txt'
. Such filenames are cumbersome to round trip through a Nickel code fragment.
But field paths are a bit more involved - you can have escaping, so stuff like foo."bar.baz".balrg, the code was already existing the grammar as Nickel needs to parse such field definitions as well.
Currently implementation does reuse proper parser for the part on the left side of =
character.
It does assume that =
cannot happen in the field name part. Is it the case?
Do I correctly assume that current way of specifying fields on CLI is not stable, so we break backwards compatibility and e.g. require some additional prefix to specify a nontrivial Nickel code as a field value?
Shortcuts like @$MY_FILE meaning @file:text:$MY_FILE can lead to surprising injections where MY_FILE variable can be bent to contain a prefix like literal:nickel:evil_code instead of a file name.
It's important to note that Nickel is entirely pure at the time, and can't dynamically load files. So all an evil file or Nickel snippet could do is...to produce a value (well, or loop indefinitely, there's that). While it's a valid point, code injection isn't the same for Nickel as for say for python.
On one hand I'm tempted to say that it's the user's job to use @file:text:$MY_FILE
if $MY_FILE
is untrusted content. On the other hand you're right that we should make the default, simplest approach the safest. One possibility is to always require the file
part explicitly. Or to chose a syntax where the short one can't be a prefix of the longest one, in some sense. Anyway the @file:text
was just a suggestion: I just want to vouch for readable syntax rather than having 5 ad-hoc Nickel-specific sigils. I hope we can find a syntax that satisfies both readability and avoid potential injections.
It would be tricky to just transform @foo.txt to import "foo.txt" reliably, as foo.txt may contain embedded " or %{}s. Escaping, then unescaping looks ugly and can also be a security issue if done poorly.
I'm not sure to understand this part. Are you talking about the filename that can have embedded "
or %{}s
, or the content of the file? Note we don't have to go through a round trip of actually printing and parsing Nickel code. We're in the interpreter, so we can generate an Import
AST node directly without caring about escaping. What I mean is that the semantics should be the same as if we did the round trip, not that we have to actually do it exactly as text substitution. That should still answer all the questions like "should this go to the import cache" and whatnot.
Currently implementation does reuse proper parser for the part on the left side of = character. It does assume that = cannot happen in the field name part. Is it the case?
Ah, this is a good point. I was thinking about parsing just the sigil part (the value) but we still have to find the value part and it can be non-trivial: indeed fields can contain =
as in foo."bar=baz?".value=true
. So maybe it's safest to do it in the parser. Or maybe we can just take the last =
?
Do I correctly assume that current way of specifying fields on CLI is not stable, so we break backwards compatibility and e.g. require some additional prefix to specify a nontrivial Nickel code as a field value?
In theory, yes. In practice, this means like we need to prefix stuff like true
, 5
, etc. So we have to weight each possibility and see if it's really worth it. Even if it's unstable, people are still probably using it, so breakage must be justified.
So all an evil file or Nickel snippet could do is...to produce a value
It can include arbitrary file from a filesystem, which may be unsandboxed. That value may unexpectedly contain secrets.
Are you talking about the filename that can have embedded " or %{}s, or the content of the file?
The filename. Only \x00
and /
cannot happen in a filename on Linux.
Or maybe we can just take the last =?
What if the filename or a literal string contains a =
?
Maybe CliFieldAssignment
should not parse both Vec<LocIdent>
(the left-hand part) and RichTerm
(today's right hand part) in one go, but do it two steps: firstly just Vec<LocIdent>
with =
and some arbitrary string chunk after =
, then (with the appropriate prefix to indicate arbitrary Nickel expression) it would be a RichTerm
?
require some additional prefix to specify a nontrivial Nickel code as a field value
In practice, this means like we need to prefix stuff like true, 5, etc.
5
and true
counts as a trivial Nickel code. Nontrivial means there can be imports, lets, string interpolation, etc.
If we go the grammar.lalrpop
way then we can discriminate between trivial terms and complex ones.
What if the filename or a literal string contains a =?
Maybe CliFieldAssignment should not parse both Vec
(the left-hand part) and RichTerm (today's right hand part) in one go, but do it two steps: firstly just Vec with = and some arbitrary string chunk after =, then (with the appropriate prefix to indicate arbitrary Nickel expression) it would be a RichTerm?
Hmm. Parsing arbitrary stuff is annoying to do in the current parser infrastructure, because the lexer still produces tokens that are tailored to the Nickel language. It's not impossible to do, but might prove annoying - requiring something like a new lexer mode that is toggled by the parser. Maybe the simplest is to add lexer rules for stuff like @....
instead.
In general I don't see how a prefix for complex expressions is going to help though, because if you do foo=$VAR
then $VAR
could just include this prefix. You probably want the converse - have a prefix to restrict what comes after, but then it becomes the user's responsibility to add it, which is a problem, because the natural foo=$VAR
becomes dangerous (beside files, as you mention, I reckon it could also contain something like @env:PATH
or @env:SECRET
).
Maybe what we should do is the following: by default, restrict everything that can be given as a field value on the CLI. That is, we accept string literals, numbers, bools etc - all literals, and maybe arrays and records of literals - and that's it. It's easy to check after parsing, we can just validate the right hand side.
Then for anything more involved, like foo.bar='import "other.json"'
, Nickel would require the usage of additional flags or permissions - like --allow import
, and --allow expression
for more complex expressions. There are small design questions around if you want those flags to apply to one field only, or all, or flags for both cases.
For the sigils, maybe not having shortcuts is the way forward: you always need to write @file:
and @env:
, which isn't too bad. And we can have a different separator for the format so that it can't be injected, for example @file/json:foo.txt
. So when you do @file:$MYVAR
, MYVAR
can't change the protocol at all.
@
is used for array concatenation in Nickel, but if we chose a different character, we can also put the protocol before, which would solve the injection issue similarly. Say we use ~
as a sigil (we don't want to, but for the sake of the argument). Then the syntax could be file~foo.txt
, file:json~foo.txt
, and ~$MY_VAR
would be a safe shortcut.
foo."bar=baz?".value=true
How important are those field names containing =
? Maybe it can be worth to just limit CLI to simpler, reasonable field names and require a workaround (e.g. a small Nickel file that copies one field to another) to set tricky names?
How important are those field names containing =? Maybe it can be worth to just limit CLI to simpler, reasonable field names and require a workaround (e.g. a small Nickel file that copies one field to another) to set tricky names?
I don't expect that to happen a lot. But I would still be a bit disappointed that an implementation detail (it's annoying to parse) leaks to the interface and someone somewhere on day will get a failure because they use a =
in their field name and won't understand why.
Thinking about this again, in fact we could find the right =
by leveraging the lexer. The lexer is indeed able to handle strings properly, so we could lex the whole thing and extract the first normal =
token that we find (which will provide a span). This =
must be the one of the assignment, as strings are lexed as separate tokens.
Thinking about this again, in fact we could find the right = by leveraging the lexer.
I also though about it.
the whole thing
Can lalrpop parse prefix of a string; not expect EOF, just stop parsing when a specified token is found?
This can be used even to handle non-Unicode filenames that cannot be put in a String
- remember proper byte position of the =
, then apply it to original OsString
argument to get the filename reliably. Should I try implementing it (introduces complexity) or avoid handling this for now to keep things simpler?
Can lalrpop parse prefix of a string; not expect EOF, just stop parsing when a specified token is found?
Not that I know. LALRPOP just pops tokens from the lexer until it can't parse stuff anymore, to the best of my knowledge. But honestly the more I think about it, the more I think the lexer is the easiest: it's cheap (lexing should be pretty fast), it's simple (take until =
), it doesn't need to modify the grammar or to do strange LALRPOP hacks...and if there is no sigil, we can just fallback to parsing it again entirely in a normal way. There is no way the cost of lexing the field path two times is even measurable, I suspect.
And if we think that the sigils are the exception and that normal field assignment should be the fast path, we can do the converse: try to parse, if there is an error, restart a lexer and try to sigil-parse, and if both fails report the original parse error. But I'm not sure this will make any difference, honestly.
^
to use literal string without interpretation@
to read string from file%
to read string from environment variableExample:
Related to #2043.
Open questions:
nickel-lang-cli export ... -- the_number=123
may lead inaccurate user tothe_number=$UNTRUSTED_USER_INPUT
, where the user would assume that it would just fail if it is not a number.grammar.lalrpop
, just like regular assignments? If so, I am uncertain how to design it properly.FieldOverride
(that may gain something likeFieldOverrideValue
enum)?@
go though the cache?IOError
or extend the mainError
type?