KOLANICH opened 7 years ago
Hooking is indeed an interesting feature, yet so far I fail to understand how it would help you in this particular case.
> For now, if an error occurs, parsing stops, an error is thrown, and the whole parsing result is discarded.
Actually, in `--debug` mode, you'll get a "best effort"-filled structure, i.e. everything except for the last element that failed to be read. This is done by a clear separation of the "object creation" and "setting object attributes from read values" stages, i.e. the API changes:
```
// Normal API
a = new Foo(kaitaiStream);

// Debug API
a = new Foo(kaitaiStream); // never fails due to read errors
try {
    a._read();
} catch (...) {
    // ...
}
// "a" would still exist afterwards, i.e. here
```
But, AFAIR, it's not implemented for Python.
The broken record has an incorrect, nonsensical value for its size, which results in running out of the stream's boundaries.
The problem with that is that to resume parsing you need some valid size, and it's nowhere to be found. If you know that, for example, a length of 0xffffffff means zero-length data, you can express that in the expression language:
```yaml
- id: data
  size: 'len_data == 0xffffffff ? 0 : len_data'
```
If you don't know something like that, then I don't really understand how hooking would help.
That struct was in a substream of known size, so failure to parse it won't break the rest of the stream. See `pcap.ksy` and assume the struct in the `body` has its `type` replaced with `usbpcap`.
The debug API won't give you a way to skip the failed element and continue parsing; you need to be inside the loop to do that. Introducing hook points would allow you to save a KSC-generated function that parses an element and replace it with a new one calling the KSC-generated one, which can catch exceptions, do pre- and postprocessing, or anything else you like. This doesn't require any modification of the ksy file.
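As a sketch of that idea in Python (the class and method names here are hypothetical stand-ins, not actual ksc output): the generated per-element parse function is saved and replaced with a wrapper that catches read errors and substitutes a placeholder, without touching the ksy file.

```python
# Sketch: wrapping a hypothetical ksc-generated element parser with a hook.
# "Record" and "_read_entry" stand in for generated code; they are not
# real Kaitai Struct output.

class ParseError(Exception):
    pass

class Record:
    def __init__(self, data):
        self.data = data

    def _read_entry(self, raw):
        # Pretend "generated" parser: fails on malformed input.
        if raw < 0:
            raise ParseError("negative size")
        return raw * 2

# Save the original generated function and install a hook around it.
_orig_read_entry = Record._read_entry

def _hooked_read_entry(self, raw):
    try:
        return _orig_read_entry(self, raw)   # pre-processing could go here
    except ParseError:
        return None                          # skip the broken element

Record._read_entry = _hooked_read_entry

r = Record(b"")
results = [r._read_entry(x) for x in [1, -5, 3]]
print(results)  # [2, None, 6]
```

The point of the pattern is that the replacement happens at runtime, from user code, so the generated module stays untouched.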
I have also thought about an extension point mechanism in the context of signature matching: a type refers to a special type, which searches through the signatures (all the ones marked as signatures with a hint in the library) and applies the first one that parses successfully.
Construct does support lazy parsing (so corrupted data can be accidentally skipped), but it does not support this kind of semantics.
I have created a separate issue, since I guess it's better to express that in KS than in third-party code. The hook is still useful for that if we need more complex decisions, for example ones involving ML.
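A minimal sketch of that signature-matching extension point in plain Python (the parsers and magic bytes below are invented for illustration, not taken from any library): try each registered signature parser in order and return the first result that parses successfully.

```python
# Sketch: try every registered "signature" parser, keep the first success.
# The formats and magic bytes are illustrative only.

def parse_png(buf):
    if not buf.startswith(b"\x89PNG"):
        raise ValueError("not PNG")
    return ("png", buf[4:])

def parse_gzip(buf):
    if not buf.startswith(b"\x1f\x8b"):
        raise ValueError("not gzip")
    return ("gzip", buf[2:])

SIGNATURE_PARSERS = [parse_png, parse_gzip]

def parse_by_signature(buf):
    for parser in SIGNATURE_PARSERS:
        try:
            return parser(buf)
        except ValueError:
            continue
    raise ValueError("no known signature matched")

kind, body = parse_by_signature(b"\x1f\x8b\x08payload")
print(kind)  # gzip
```

A hook point in the generated code would let this dispatch logic (or an ML-based variant of it) be plugged in without compiler support.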
Construct added hooks, see docs: https://construct.readthedocs.io/en/latest/basics.html#processing-on-the-fly
It's worth pointing out that there is a second, related feature. GreedyRange added a discard option, so that each item can be parsed, processed by the hook, and then discarded. This way gigabyte-sized files can be parsed without using gigabytes of RAM.
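The parse-process-discard pattern can be sketched in plain Python (this does not use Construct itself, and the 4-byte record format is made up for the example): each fixed-size record is read, handed to a hook, and then dropped, so memory use stays constant regardless of file size.

```python
import io
import struct

# Sketch: stream fixed-size records, process each via a hook, discard it.
# The little-endian u4 record format is invented for illustration.

def for_each_record(stream, hook, record_size=4):
    count = 0
    while True:
        chunk = stream.read(record_size)
        if len(chunk) < record_size:
            break
        value = struct.unpack("<I", chunk)[0]  # parse one record
        hook(value)                            # process it...
        count += 1                             # ...then let it be discarded
    return count

data = io.BytesIO(struct.pack("<3I", 10, 20, 30))
total = []
n = for_each_record(data, total.append)
print(n, sum(total))  # 3 60
```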
For now, if an error occurs, parsing stops, an error is thrown, and the whole parsing result is discarded.
I guess there may be good use cases where parsing broken files is needed, so we need a way to deal with such cases. In particular, we need a way to capture the exceptions produced and do some custom processing.
IMHO the best way to deal with this is to have some hooks. Which hooks do we need? For every property in a struct, KSC should create a method which is used to parse that field. For parsing a sequence, it should create a pair of methods: one for parsing an item, and another one, using the former, for parsing the whole sequence. Methods should be deduplicated: if we have n > 1 properties of the same type, the compiler should generate, depending on the language, n+1 methods (1 method to read the type and n references to it to read the members). What should the method signatures look like? I propose:
- for fields and instances
- for sequence elements
- for types.
The example should generate something like:
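The per-field method scheme described above might look roughly like this in Python (a sketch under my own naming assumptions; real kaitai-struct-compiler output differs): one method per field, one shared element reader deduplicated across same-typed members, and a sequence reader built on top of it.

```python
import io
import struct

# Sketch of the proposed per-field method layout; names are hypothetical,
# not actual kaitai-struct-compiler output.

class Foo:
    def __init__(self, stream):
        self._io = stream
        self._read()

    def _read(self):
        self.magic = self._read_magic()
        self.entries = self._read_seq_entries()

    # One method per field...
    def _read_magic(self):
        return self._io.read(2)

    # ...a shared element reader (deduplicated across same-typed members)...
    def _read_u2(self):
        return struct.unpack("<H", self._io.read(2))[0]

    # ...and a sequence reader that calls the element reader in a loop.
    def _read_seq_entries(self):
        count = self._read_u2()
        return [self._read_u2() for _ in range(count)]

foo = Foo(io.BytesIO(b"AB" + struct.pack("<3H", 2, 7, 9)))
print(foo.magic, foo.entries)  # b'AB' [7, 9]
```

With this layout, a hook only has to replace one small method (e.g. the element reader) instead of the whole `_read`.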