GreyCat opened this issue 6 years ago
You have suggested checking such relations using syntax expressed in YAML. Having this syntax in YAML has both drawbacks and benefits. The drawback is that we have to support it alongside the usual expressions. The benefit is that it is YAML: any tool working with YAML would be able to parse it.
So should we drop our current expr syntax and start using an entirely YAML-based one?
So should we drop our current expr syntax
What exactly is "our current expr syntax", and what do you mean by "dropping" it? We had no syntax proposal for validation until now, and I suspect what you imply is covered in `valid/expr`, for example, checking that a value is even:
```yaml
- id: foo
  type: u4
  valid:
    expr: _ % 2 == 0
```
A map, which can feature keys
If the field is supposed to contain an enum value, can I check that by specifying a key of `enum`, or can `any-of` take the enum type-name instead of a list?
@webbnh Probably for enums we can just do the following:
```yaml
- id: foo
  type: u4
  enum: bar
  valid:
    enum-key: true
```
What exactly is "our current expr syntax", and what do you mean by "dropping" it?
What you have proposed can be expressed as a logical expression, assuming that we have some facilities:
- `eq: expected` <=> `throw: { if: _ != expected }`
- `min: expected` <=> `if: _ < expected`
- `max: expected` <=> `if: _ > expected`
- `any-of`, `enum-key` <=> #434
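These equivalences can be sketched as a small lowering helper. This is a hypothetical function for illustration, not part of KSC; it turns a declarative `valid` map into a single boolean expression string over `_`:

```python
def lower_to_expr(valid):
    """Translate a declarative `valid` map into one expression string (sketch)."""
    parts = []
    if "eq" in valid:
        parts.append(f"_ == {valid['eq']}")   # eq: x  <=>  _ == x
    if "min" in valid:
        parts.append(f"_ >= {valid['min']}")  # min: x <=>  _ >= x
    if "max" in valid:
        parts.append(f"_ <= {valid['max']}")  # max: x <=>  _ <= x
    if "any-of" in valid:
        # any-of: [a, b] <=> _ == a or _ == b
        parts.append(" or ".join(f"_ == {v}" for v in valid["any-of"]))
    if "expr" in valid:
        parts.append(valid["expr"])
    return " and ".join(f"({p})" for p in parts)

print(lower_to_expr({"min": 42}))  # (_ >= 42)
```

This direction (declarative keys lowered to expressions) is always possible; the interesting question discussed below is the reverse.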
But we can also go the other way: eliminate textual expressions and force the developer into using a YAML AST. For example, `if: a != 0 and b > _io.size` would become something like
```yaml
size:
  and:
    - '!=':
        - a
        - 0
    - '>':
        - b
        - '_io.size'
```
I know that it is monstrous, space-wasting and unusable, but I'm not sure ;)
My point here is actually very simple. We could just do it as one expression (which we actually still can, i.e. stuff like this is legal):

```yaml
valid:
  expr: _ == 42 # the same as `eq: 42`
```

```yaml
valid:
  expr: _ >= 42 # the same as `min: 42`
```
It's easy to do, but we lose declarativity this way. If it's only an expression, one can't just look at the validation spec and render it as a range input (for example, if min/max boundaries are known), or as a combo box with a set of choices (if a set of choices is known). Basically, if we only have a validation expression, the only thing we can do (without reversing it and applying some symbolic solver) is check every particular value for validity, and that's it.

Declarative configuration offers more: for example, you can clearly get a list of valid values, or boundaries, or stuff like that. The proposed mechanism is also extensible. For example, if we want to add size limits for strings in the future, we can do something like
```yaml
valid:
  max-size: 42
```
and it could be rendered as a length-limited input box, instead of asking the user to enter an arbitrary string and then rejecting it for some mysterious reason.
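To illustrate the declarativity argument, here is a hypothetical sketch (not part of any Kaitai tool; all names are illustrative) of deriving UI hints from a declarative `valid` map. An opaque `expr` carries no such structure and falls through to the generic case:

```python
def ui_hints(valid):
    """Derive a widget suggestion from a declarative `valid` map (sketch)."""
    if "min" in valid or "max" in valid:
        # known boundaries -> render as a range input
        return {"widget": "range", "lo": valid.get("min"), "hi": valid.get("max")}
    if "any-of" in valid:
        # known set of choices -> render as a combo box
        return {"widget": "combo", "choices": valid["any-of"]}
    if "max-size" in valid:
        # known size limit -> length-limited text input
        return {"widget": "text", "maxlength": valid["max-size"]}
    # an opaque `expr` (or nothing) gives us no structure to exploit
    return {"widget": "text"}

print(ui_hints({"min": 0, "max": 100}))  # {'widget': 'range', 'lo': 0, 'hi': 100}
```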
I hadn't thought about using that info as metadata for GUI purposes. It's a cool idea. In that case,
reversing it and applying some symbolic solver
IMHO is the best way to do it. If we run into performance problems, we can try to cache solver results.
It's easy to do, but we lose declarativity this way.
I guess we don't lose declarativity, since we still declare "this value must satisfy this logical expression" and all the needed info is there; we just lose some verbosity. What we win is that we don't make an assumption about the programmer using the proposed verbose syntax instead of exprs. So even if a programmer used exprs, if we have a system recovering this info from exprs, we would expect it to work correctly, even retroactively: when a new extractor is added, it starts working even on old specs without any modification.
Please take into account bit-sized values like:

```yaml
- id: fixed_10
  type: b2
  contents: [2]
- id: fixed_10
  contents: [true, false]
```
I went forward and did a proof-of-concept implementation in the current compiler. Namely, `valid` is now recognized and parsed in its short form => only equals is supported now, but it's really easy to extend it to support ranges, lists, arbitrary expressions, etc. `valid_*.ksy`:
The current KSC (0.9-SNAPSHOT) generates incorrect code when `valid` is used on an optional field (with `if`). The validation check is executed outside the `if` clause instead of inside it.
Test code:

```yaml
meta:
  id: valid_if_test
seq:
  - id: optional
    if: false
    type: u1
    valid: 42
```
produces (in JavaScript, but it happens in all languages):
```javascript
ValidIfTest.prototype._read = function() {
  if (false) {
    this.optional = this._io.readU1();
  }
  if (!(this.optional == 42)) {
    throw new KaitaiStream.ValidationNotEqualError(42, this.optional, this._io, "/seq/0");
  }
}
```
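A hedged sketch of what corrected output could look like, with the check moved inside the guard (Python-flavoured pseudocode of the generated reader; class and error names are illustrative, not actual KSC output):

```python
class ValidationNotEqualError(Exception):
    """Stand-in for the runtime's validation error (illustrative)."""

class ValidIfTest:
    def _read(self):
        if False:  # the `if: false` condition from the spec
            self.optional = self._io.read_u1()
            # the validation check is now guarded by the same condition,
            # so a field that was never parsed is never validated
            if not (self.optional == 42):
                raise ValidationNotEqualError(42, self.optional)
```

With the check inside the guard, calling `_read()` on a never-parsed field raises nothing and leaves `optional` unset.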
@generalmimon
The current KSC (0.9-SNAPSHOT) generates incorrect code when `valid` is used on an optional field (with `if`). The validation check is executed outside the `if` clause instead of inside it.
Probably moving it inside the `if` would make sense, good catch! Can I ask you to contribute a relevant test to the tests repo?
Sure.
Recent compiler updates fixed the `ValidFailInst` and `ValidNotParsedIf` tests by @generalmimon. Thanks for enhancing our test base!
Not closely related, but you may find it interesting how CUE unifies types and constraints: https://cuelang.org/docs/concepts/logic/
Implemented `min`, `max` and `any-of` clauses. Works for C++, JS, Python, Ruby. Pending runtime fixes for Java.
Maybe name the key `expect`? I think it sounds better:

```yaml
id: answer
expect: 0x42
```

```yaml
expect:
  min: 42
  max: 42
  any-of: [answer]
  expr: _ == 42
```
AFAIK, this feature is basically implemented (all `valid` sub-keys described in https://github.com/kaitai-io/kaitai_struct/issues/435#issue-323332353 work), but no proper type checks of whether the type of the attribute can be compared with the type of the expressions in `valid` sub-keys (like `eq`, `min`, `any-of`, ...) are done yet. For example, this spec should raise the same "can't compare ... and ..." error as the `==` operator currently does:
```yaml
meta:
  id: attr_valid_eq_bad
seq:
  - id: foo
    type: u1
    valid:
      eq: '"bar"'
```
So that means implementing the checks in the compiler + adding relevant `ErrorMessagesSpec`s testing whether the error message is correct. I think it makes sense to make the message look like the one in the existing test for `==` type mismatch: tests/formats_err/expr_compare_enum2.ksy:1
However, I think that this little thing can wait until we release the 0.9 version, so I'm moving this issue from the 0.9 milestone to 0.10.
Given that the keyword is mostly implemented, is it time to add it to the .ksy schema? This would be beneficial for users (like me) who use a YAML language server for validation while writing structures.
I found out that it will only pay attention to the first `valid` key of a field and ignore the others.
```yaml
seq:
  - id: first_byte
    type: u1
    valid:
      eq: 0x00
    # ignores this:
    valid: 0x01
# ...
    valid:
      expr: _ >= clip_bottom
      # ignores these:
      expr: _ <= clip_top
      expr: _ == 0x7 or _ == 0x17 or _ == 0x23 or _ == 0x68 or _ == 0xB5
```
A reasonable behaviour would be a logical AND of all `valid` expressions.
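The ANDing behaviour suggested here can be sketched in a few lines (illustrative Python with made-up values for `clip_bottom`/`clip_top`; this is the proposed semantics, not what current KSC does):

```python
def check_all(value, constraints):
    """Proposed semantics: a value is valid iff every constraint holds."""
    return all(check(value) for check in constraints)

clip_bottom, clip_top = 0x00, 0xB5  # illustrative bounds
constraints = [
    lambda v: v >= clip_bottom,                    # min: clip_bottom
    lambda v: v <= clip_top,                       # max: clip_top
    lambda v: v in (0x7, 0x17, 0x23, 0x68, 0xB5),  # any-of: [...]
]

print(check_all(0x17, constraints))  # True
print(check_all(0x18, constraints))  # False
```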
It strictly separates `min`/`max`, `eq`, `any-of` and `expr`, so that combinations of these are not accepted. The error message might be confusing, stating that the key is unknown.
Writing it like this is not possible:

```yaml
seq:
  - id: clip_bottom
    type: u1
  - id: clip_top
    type: u1
  - id: byte
    type: u1
    valid:
      min: clip_bottom
      max: clip_top
      any-of: [0x7, 0x17, 0x23, 0x68, 0xB5]
```
If this worked, it would be a fair solution:

```yaml
# ...
    valid:
      min: clip_bottom
      max: clip_top
    valid:
      any-of: [0x7, 0x17, 0x23, 0x68, 0xB5]
```
The only combination which works at the moment is a single `min` + `max`. Multiple `min` or `max` keys are ignored. A useful behaviour would be an n-ary `min` and `max`. But if these were ANDed instead, multiple `min` keys would effectively validate against the supremum of the lower bounds, and multiple `max` keys against the infimum of the upper bounds.
This is not a pressing issue. There is no functional limitation. One could write the set of validation formulas as a single formula:
```yaml
# ...
    valid:
      expr: |
        _ >= clip_bottom and
        _ <= clip_top and
        (_ == 0x7 or _ == 0x17 or _ == 0x23 or _ == 0x68 or _ == 0xB5)
```
I found out that it will only pay attention to the first valid key of the field but ignore the others.
`.ksy` is the file extension, where the letter `y` stands for YAML. What is placed into ksy files is a subset of YAML. YAML is a language for serializing and deserializing possibly nested data, like JSON.
in JavaScript it'd look like

```javascript
let field = {};
// ...
field.valid = {eq: 0x00};
field.valid = 0x01;
```
As you see, the latter assignment overrides the former. When you store data, it is clear that this must never happen. Different YAML parsers react differently to that: some raise an error, others just take one of the values and silently ignore the rest.
So your mental model is wrong, repeated keys don't mean what you assume them to mean.
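The override can be demonstrated with Python's JSON parser (JSON sharing YAML's mapping data model); this is an illustration of the general point, not KSC behaviour:

```python
import json

# A duplicated key does not accumulate: the parser silently keeps only
# the last value, just like many YAML parsers do.
doc = '{"valid": {"eq": 0}, "valid": 1}'
print(json.loads(doc))  # {'valid': 1}
```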
Sorry, thank you for clarifying.
Considering that we can assign `contents: ["some string", 0x30, 30]`, shouldn't we also be able to assign

```yaml
seq:
  - id: sig
    size: 6
    valid:
      any-of: [ [ "AA", 0, 0, 1, 0 ], [ "BB", 0, 0, 1, 0 ] ]
```

Currently the compiler throws an error, as it expects "string"? I assume that means KSY expressions, so I tried `[ '["AA", 0, 0, 1, 0]', '["BB", 0, 0, 1, 0]' ]` to no avail... `can't combine output types: CalcStrType vs Int1Type(true)`
Heck, even the painful approach doesn't work:

```yaml
seq:
  - id: sig1
    type: str
    size: 2
    valid:
      any-of: [ "AA", "BB" ]
  - id: sig2
    contents: [ 0, 0, 1, 0 ]
```
KSC seems to evaluate `AA` and `BB` as KSY expressions, which makes it search for `id`s matching those values, but an `id` can only contain lower-case letters, hence an error is thrown...
I accidentally discovered that this was a thing. When will the relevant keywords be added to the docs and the YAML schema (the kaitai-struct-vscode plugin uses it)?
Which version of the VSCode extension do you have? I have v0.8.1 and it does not have the `valid` expressions included; they are underlined red in my version. It seems to be outdated, since it doesn't know the `bit-endian` meta property.
A year ago, I tried to contribute documentation about the `valid` property, but it was entirely ignored. Here it is: valid expression documentation (Jan 2022)
Which version of the VSCode extension do you have? I have v0.8.1
Same.
I almost forgot to mention that there is now a new subkey of `valid` called `valid/in-enum`, based on the idea in https://github.com/kaitai-io/kaitai_struct/issues/435#issuecomment-389597122. I suppose this is also related to https://github.com/kaitai-io/kaitai_struct/issues/778, because it was also discussed there. It's implemented for all 9 languages where `valid` is implemented in the first place (well, after checking `ValidFailInEnum` at https://ci.kaitai.io/; except for Rust, which would be the 10th such language). The only thing I changed was the name. IMHO `valid/in-enum` sounds more like a validation constraint than `valid/enum-key`, and it seems to be a more intuitive name when you see it in an actual .ksy spec like this:
```yaml
seq:
  - id: foo
    type: u4
    enum: animal
    valid:
      in-enum: true
enums:
  animal:
    4: dog
    12: chicken
```
`valid/in-enum: true` checks that the parsed enum value is "in the `enum`", i.e. that it is a known value defined in the enum specified by the `enum` key. It's functionally equivalent to using `valid/any-of` with an exhaustive list of all enum entries:
```yaml
seq:
  - id: foo
    type: u4
    enum: animal
    valid:
      any-of:
        - animal::dog
        - animal::chicken
enums:
  animal:
    4: dog
    12: chicken
```
... but obviously more convenient to use and easier to maintain. Of course, this `valid/any-of` pattern still has a use case when you only want to allow a subset of enum values, but if the intent is that any value in the enum is acceptable, `valid/in-enum: true` should be preferred.
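The check `in-enum` performs boils down to a membership test against the set of defined enum values; a minimal illustrative sketch, using a plain dict in place of a generated enum class (names here are not actual KSC output):

```python
# The `animal` enum from the spec above, as value -> name pairs.
ANIMAL = {4: "dog", 12: "chicken"}

def in_enum(value, enum_map):
    """True iff the parsed integer is a defined entry of the enum."""
    return value in enum_map

print(in_enum(4, ANIMAL))  # True
print(in_enum(5, ANIMAL))  # False
```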
It's intended only for fields with the `enum` key. If used on a field without it, a compilation error is planned, but it's not implemented yet at the time of writing (as with most other `valid` compile-time checks).
In case you were wondering, `true` is the only allowed value for `valid/in-enum`. `valid/in-enum: false` won't work, and there's an already passing test for that:

```
attr_bad_valid_in_enum_false.ksy: /seq/0/valid/in-enum:
	error: only `true` is supported as value, got `false` (if you don't want any validation, omit the `valid` key)
```
Extracting the easier part of #81, with a concrete proposal on implementation.
Context: we want to have some sort of way to state that certain attributes have some sort of "valid" values. Stating that has two implications:
The simplest existing implementation is `contents`: it is essentially a byte array which validates that its contents are exactly as specified in the ksy file. On parsing, we'll throw an error if the byte array contents don't match the expected value. On generation, we can just write the expected value and not bother the end-user with setting the attribute manually.

Syntax proposal

Add a `valid` key to the attribute spec. Inside it, there could be:

- `eq: expected` states that the attribute value must be equal to the given value (i.e. the same as above)
- `min: expected` states the minimum valid value for this attribute (compared with `a >= expected`)
- `max: expected` states the maximum valid value for this attribute (compared with `a <= expected`)
- `any-of: list-of-expected` states that a valid value must be equal to one in the given list of expressions
- `expr: expression` states that the given expression needs to evaluate to true (substituting `_` with the actual attribute value) to treat the attribute value as valid

All `expected` and `expression` values are KS expression strings, which can be constants or can depend on other attributes. `list-of-expected` must be a YAML array of KS expression strings. The inferred type of `expected` must be comparable with the type of the attribute. The inferred type of `expression` must be boolean.
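The proposed semantics could be sketched as follows (illustrative Python; `validate` and `ValidationError` are hypothetical names, not KSC's generated code, and `expr` is modeled here as a Python callable rather than a KS expression string):

```python
class ValidationError(Exception):
    """Raised when a parsed value fails a `valid` constraint (illustrative)."""

def validate(value, valid):
    """Apply each `valid` sub-key in turn; raise on the first failure."""
    if "eq" in valid and not value == valid["eq"]:
        raise ValidationError("expected %r, got %r" % (valid["eq"], value))
    if "min" in valid and value < valid["min"]:
        raise ValidationError("expected at least %r, got %r" % (valid["min"], value))
    if "max" in valid and value > valid["max"]:
        raise ValidationError("expected at most %r, got %r" % (valid["max"], value))
    if "any-of" in valid and value not in valid["any-of"]:
        raise ValidationError("%r is not one of %r" % (value, valid["any-of"]))
    if "expr" in valid and not valid["expr"](value):
        raise ValidationError("%r failed the expression check" % (value,))
    return value

validate(50, {"min": 42, "max": 100})            # passes
validate(42, {"expr": lambda v: v % 2 == 0})     # passes
```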