Open philipaconrad opened 2 years ago
// Structs formatted with %v show field values in their default formats.
// The %+v form shows the fields by name, while %#v formats the struct in
// Go source format.
person := struct {
Name string
Age int
}{"Kim", 22}
fmt.Printf("%v %+v %#v\n", person, person, person)
// Result: {Kim 22} {Name:Kim Age:22} struct { Name string; Age int }{Name:"Kim", Age:22}
(source)
Hmm %+v and %#v format verbs will always leak implementation details, and I'm not sure I know a single valid use. We should deal with them in some way, and keep backwards-compatibility to some extend. I think overriding %+v and %#v to %v would be reasonable. WDYT?
I'm fine with that decision. It will require some reworking of how sprintf
currently handles verbs, but that was going to have to happen to resolve the bug anyway. :sweat_smile:
@srenatus Do we have a formal list anywhere of what verbs / verb forms we want sprintf
to handle? :thinking:
I'm thinking if we have an exact set of verbs we want to support, that could make this work somewhat easier, as we can then take verbs not on that list that are found during parsing, and rewrite them into escaped literals, so that fmt.Sprintf
won't process them as verbs.
Currently, it looks like our tests cover:
Verbs:
%d
%f
%s
%v
Verb forms:
%<verb>
(ex: %s
)%<prefix><verb>
(ex: %.1f
)%[<idx>]<verb>
(ex: %[2]d
)%<prefix>[<idx>]<verb>
(ex: %.3[1]f
)This is great. We should definitely "own" the list of supported verbs instead of referring to the golang docs.
Ideally, we'd fall back to something for those we don't support so we don't break stuff for anyone... 🤔
@srenatus I looked over the fmt
docs, and here is my proposed support list/rationales for formatting verbs. Let me know what you think!
%T
(prints type-of-value) might be worth keeping for debugging purposes alone. :thinking: %
. (Encourages consistent golang-style %%
escaping.)fmt.Sprintf
call treats the verb characters as a raw string. (Discourages trying to abuse the underlying implementation details.)Rationale: The formatting directives (or types of directives) below are well-supported across several modern programming languages (e.g. Python, Ruby), or could readily be implemented from one of the other directives included in the list. We want to hit a useful, common-case subset of behavior, since we can always add more stuff to it later, but taking things away from the list will often hurt users. This may also help implementation support in other environments, such as WASM, by reducing the scope of the formatting support required.
Following the style from the Golang fmt
docs:
General:
%v the value in a default format
%T the (Rego) type of the value
%% a literal percent sign; consumes no value
%T
%#v
)
%T
directive because of @srenatus 's suggestion.Boolean:
%t the word true or false
Integer:
%b base 2
%d base 10
%o base 8
%O base 8 with 0o prefix
%c
, %q
, %U
)%x
, %X
)
Floating-point and complex constituents:
%e scientific notation, e.g. -1.234456e+78
%E scientific notation, e.g. -1.234456E+78
%f decimal point but no exponent, e.g. 123.456
%F synonym for %f
%g %e for large exponents, %f otherwise. Precision is discussed below.
%G %E for large exponents, %F otherwise
%b
)%c
, %q
, %U
)%x
, %X
)
String and slice of bytes (treated equivalently with these verbs):
%s the uninterpreted bytes of the string or slice
%q a double-quoted string safely escaped with Go syntax
%q
)%x
, %X
)Slice:
%p
)Pointer:
%p
)+
, -
, #
, ' '
, 0
)%.3f
)fmt.Sprintf("%[2]*.2f", 6, 12.2)
) thankfully only admits integers in the quoted spot, so this won't be too awful to support.Of all the verbs booted from the list, I think %T (prints type-of-value) might be worth keeping for debugging purposes alone.
It doesn't need to happen right away, but printing the rego type for %T would be nice.
Looks like %T
isn't as bad as I thought -- we're not leaking golang struct types -- but it's still got some things to fix:
call | output | expected |
---|---|---|
'sprintf("%T", [true])' | "string" | "boolean" |
'sprintf("%T", [1])' | "int" | "number" |
'sprintf("%T", [1.0])' | "float64" | "number" |
'sprintf("%T", ["1"])' | "string" | ✔️ |
'sprintf("%T", [[]])' | "string" | "array" |
'sprintf("%T", [{}])' | "string" | "object" |
'sprintf("%T", [set()])' | "string" | "set" |
@srenatus, @anderseknert I think I've just run into another case where #4797 has come back to bite us.
The %x
/%X
directives have different meaning for floats versus integers, and this is a problem because we freely switch between the two representations for Number
types behind the scenes, but there are cases where we imply one representation or the other; such as the bitwise builtins (ints), the number builtins (ints, floats), and the units builtins (ints, floats).
I'm honestly not sure how to handle this collision case for our Number
type. :slightly_frowning_face:
Here are some of the options I've considered though:
%x
/%X
entirely. (Removes the problem case, but also a very useful formatting option for bitwise operations. :disappointed: )%x
/%X
(The "pick one" option. Still axes useful functionality, and may subvert user expectations if they plugged in a "float", but get an "integer" %x
/%X
formatting. :shrug: )Thoughts?
We could remove %x/%X entirely. (Removes the problem case, but also a very useful formatting option for bitwise operations. 😞 )
Those bitwise ops would be using integers, only, though? I'm wondering if "pick one" with int wouldn't be the best option here. But that might be my bias never having to use floats much.
Just some thoughts from the hammock here, but let’s not forget that options that might be very useful for bitwise operations, aren’t necessarily very useful for policy authoring in general. The number of policies in the wild using the bitwise functions ought to be somewhere close to a handful, and the number of policies also pretty-printing these closer to zero.
The fixes previously made to handling float/int presentation with sprintf however came from real problems reported by end-users. If we need to prioritize one over the other, it should be an easy choice.
There's also an option 4, which is ... imitate what JS does when it's confronted with bitwise operations: convert to a 32-bit signed integer, and go (Ex: Bitwise OR docs).
This is, admittedly, kind of silly behavior, but it's what JS does. :man_shrugging:
I think for now, if there are no objections, the best path forward is to strike %x
/%X
from the supported verbs list.
If someone complains about wanting this kind of formatting support, we can revisit this decision later, and try to do something reasonable. (Easier to drop now and add it back later, than to add something half-baked.)
Thank you for your inputs, @anderseknert and @srenatus!
Integers do seem a little silly though without hexadecimal support, since we support other bases (2, 8, 10). :thinking:
x := sprintf("%x", [100])
y := sprintf("%x", [100.0])
⬇️
{
"x": "64",
"y": "0x1.9p+06"
}
I'm an ignorant, I have no idea what y
means. 😅
I think striking is fine, but it would be preferable if there was some advise to hand out to anyone needing it. Do we have other means to get the hexadecimal of an integer? 🤔
My $0.02 is that we should either drop support for integers entirely, or we should support every common radix (2, 8, 10, 16). I think simply converting floats to the nearest integer before feeding them to these formatting verbs is a reasonable approach, and is the behavior I would expect if I was an end user and had not read the docs.
If we don't think that's reasonable, we should drop %b, %d, %o, and %O to be consistent, then people can screw with %f to get the right number of decimal places for their use case.
A third option would be to just print the floats in the alternative radixes. FWIW, C++ defines a syntax for hexadecimal floats (source).
This issue has been automatically marked as inactive because it has not had any activity in the last 30 days.
Short description
sprintf
now can panic when given malformed format strings. However, this is a difficultpanic
to invoke outside of direct use of the parser/evaluation functions in Golang.Examples:
50b79b92fbc99c, several PRs after thev0.41.0
tag)Steps To Reproduce
Add a new test case to the end of
test/cases/testdata/sprintf/test-sprintf.yaml
:panic
:goroutine 1124 [running]: testing.tRunner.func1.2({0xb39580, 0xc0007fcc78}) /usr/local/go/src/testing/testing.go:1389 +0x24e testing.tRunner.func1() /usr/local/go/src/testing/testing.go:1392 +0x39f panic({0xb39580, 0xc0007fcc78}) /usr/local/go/src/runtime/panic.go:838 +0x207 github.com/open-policy-agent/opa/topdown.builtinSprintf({0xcbd418?, 0xc000dfdfe0?}, {0xcbcf18?, 0xc000c2d140?}) /github-devel-folder/opa/topdown/strings.go:507 +0x41c github.com/open-policy-agent/opa/topdown.functionalWrapper2.func1({{0xcbc150, 0xc0000240c0}, {0xcbe1a0, 0xc0008114e0}, {0xcb77e0, 0xc00007e2d0}, 0xc00075f710, {0x0, 0x0}, 0x0, ...}, ...) /github-devel-folder/opa/topdown/builtins.go:146 +0x73 github.com/open-policy-agent/opa/topdown.evalBuiltin.eval({0xc00109b4a0, 0x10aaa00, {{0xcbc150, 0xc0000240c0}, {0xcbe1a0, 0xc0008114e0}, {0xcb77e0, 0xc00007e2d0}, 0xc00075f710, {0x0, ...}, ...}, ...}, ...) /github-devel-folder/opa/topdown/eval.go:1686 +0x33f github.com/open-policy-agent/opa/topdown.(eval).evalCall(0xc00109b4a0, {0xc0002e0ab0, 0x4, 0x6}, 0xc0002fcc60) /github-devel-folder/opa/topdown/eval.go:813 +0xa85 github.com/open-policy-agent/opa/topdown.(eval).evalStep(0xc00109b4a0, 0xc000d70d50) /github-devel-folder/opa/topdown/eval.go:358 +0x75c github.com/open-policy-agent/opa/topdown.(eval).evalExpr(0xc00109b4a0, 0xc0002fb7a0) /github-devel-folder/opa/topdown/eval.go:332 +0xec github.com/open-policy-agent/opa/topdown.(eval).next(...) /github-devel-folder/opa/topdown/eval.go:164 github.com/open-policy-agent/opa/topdown.(eval).evalExpr.func1(0xc00109b4a0) /github-devel-folder/opa/topdown/eval.go:333 +0x29 github.com/open-policy-agent/opa/topdown.(eval).evalStep.func1() /github-devel-folder/opa/topdown/eval.go:353 +0x39 github.com/open-policy-agent/opa/topdown.(*eval).biunifyValues(0xc00109b4a0, 0xc000b85e78, 0xc000b85e90, 0xc0002fcab0, 0xc0002fcab0, 0xc0002fcba0) ...