kaitai-io / kaitai_struct

Kaitai Struct: declarative language to generate binary data parsers in C++ / C# / Go / Java / JavaScript / Lua / Nim / Perl / PHP / Python / Ruby
https://kaitai.io

Theoretical background of binary parsing algorithms #40

Closed ponyatov closed 8 years ago

ponyatov commented 8 years ago

What introductory papers can you recommend on the topic of error-resistant binary parsing? What papers can you recommend for getting into binary parsing in general?

My neighbors @ http://ikp.ssau.ru/ have a long-neglected problem with parsing telemetry data that contains a lot of broken packets. I found your project as a possible solution, but are Kaitai's algorithms resistant to partial/inconsistent data?

I'm thinking about some algorithm with a sliding parse window and backtracking validation. Can you name some widely known methods for that?

GreyCat commented 8 years ago

Now that's a lot of questions :) I'll try to answer them from the latter to the former.

Unfortunately, Kaitai Struct is probably not the project you're looking for: it's not about error-resistant encodings and the like. Error resistance usually means two things: one needs to (1) detect errors and (2) correct them. Kaitai Struct offers fairly limited error detection (basically, there are only 2 checks: fixed contents expected at certain locations, and running out of bytes to parse in a stream or substream, i.e. an "end-of-stream" exception). There are no "error correction" facilities per se, mostly because error correction is not about parsing but about the whole protocol implementation. You either use an encoding that survives modification of a certain number of bits, or you implement automatic retransmission ("ARQ") in the protocol, triggered when the receiver detects an error in a message.
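
To make those two detection checks concrete, here is a minimal hand-written Python sketch. It is not Kaitai Struct's generated code or runtime API; the packet layout, the `MAGIC` value, and the names `read_exact`, `parse_packet`, `MagicMismatch` and `EndOfStream` are all assumptions made up for illustration.

```python
MAGIC = b"\xAA\x55"  # hypothetical fixed contents expected at the start of a packet


class MagicMismatch(Exception):
    """Fixed contents were not found where expected."""


class EndOfStream(Exception):
    """The buffer ran out of bytes before the structure was complete."""


def read_exact(buf, pos, n):
    # Return n bytes starting at pos plus the new position,
    # or fail if the buffer is too short.
    if pos + n > len(buf):
        raise EndOfStream(f"need {n} bytes at offset {pos}")
    return buf[pos:pos + n], pos + n


def parse_packet(buf, pos=0):
    # Assumed layout: 2-byte magic, 1-byte payload length, payload.
    magic, pos = read_exact(buf, pos, 2)
    if magic != MAGIC:
        raise MagicMismatch(f"expected {MAGIC!r}, got {magic!r}")
    raw_len, pos = read_exact(buf, pos, 1)
    payload, pos = read_exact(buf, pos, raw_len[0])
    return payload, pos
```

With this layout, `parse_packet(b"\xAA\x55\x03abc")` returns the payload and the offset just past it, a wrong first byte raises `MagicMismatch`, and a length byte larger than the remaining data raises `EndOfStream`. A flipped bit inside the payload passes both checks unnoticed, which is why this level of detection is so limited.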

Last but not least, in 99% of cases, noise-resistant encodings deal with bits, not bytes, i.e. several layers below what KS normally deals with. Even once #9 is implemented, it probably still won't be enough to implement error-correcting codes (ECC) / forward error correction (FEC) during parsing.

As for papers on error-resistant data transmission, I'm not sure what level you're talking about, so I'd recommend starting with the Wikipedia "Error detection and correction" article and the material referenced there. My own knowledge is mostly limited to standard university material, i.e. parity bits, Gray codes, basic checksums / CRCs, Reed-Solomon codes, etc. I've heard that in the modern world (e.g. in 3G / 4G / WiMAX telecom transmissions) people tend to use "turbo codes" and "LDPC", especially after the patent on "turbo codes" expired in 2013, but I barely know any implementation details.
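
As a rough illustration of how a basic CRC can be combined with the sliding-window idea from the question, here is a small Python sketch. It is only a sketch under stated assumptions: the frame layout, the `SYNC` marker, and the helper names `make_frame` and `extract_frames` are hypothetical, and nothing here is part of Kaitai Struct. It only detects and skips damaged frames; it does not correct anything.

```python
import zlib

SYNC = b"\xAA\x55"  # hypothetical sync marker


def make_frame(payload):
    # Assumed layout: SYNC, 1-byte length, payload,
    # then a big-endian CRC-32 computed over length + payload.
    body = bytes([len(payload)]) + payload
    return SYNC + body + zlib.crc32(body).to_bytes(4, "big")


def extract_frames(stream):
    # Scan for sync markers; accept a candidate frame only if its CRC
    # matches, otherwise step forward one byte and try again.
    frames = []
    pos = 0
    while True:
        pos = stream.find(SYNC, pos)
        if pos < 0:
            break
        start = pos + len(SYNC)
        if start >= len(stream):
            break
        length = stream[start]
        end = start + 1 + length + 4
        if end <= len(stream):
            body = stream[start:start + 1 + length]
            crc = int.from_bytes(stream[end - 4:end], "big")
            if zlib.crc32(body) == crc:
                frames.append(body[1:])  # drop the length byte
                pos = end
                continue
        pos += 1  # bogus sync or damaged frame: resume the search
    return frames
```

Feeding it a stream of leading garbage, a good frame, a frame with one corrupted payload byte, and another good frame should yield only the two intact payloads. Whether this kind of resynchronisation is good enough depends on how often the sync marker occurs by chance in the data and on how the length field itself gets damaged.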

GreyCat commented 8 years ago

Given the lack of discussion and that it's not really an issue in KS, I'm closing this.