kaitai-io / kaitai_struct

Kaitai Struct: declarative language to generate binary data parsers in C++ / C# / Go / Java / JavaScript / Lua / Nim / Perl / PHP / Python / Ruby
https://kaitai.io

un-aligned, bit-based data stream #576

Open smarek opened 5 years ago

smarek commented 5 years ago

Hi, I've tried to implement the TETRA air-interface protocol (specifically SDS text/data messages) using Kaitai.

TETRA protocol documentation https://www.etsi.org/deliver/etsi_en/300300_300399/30039202/02.03.02_60/en_30039202v020302p.pdf

TETRA is a bit-based, byte-unaligned protocol with many sub-structures (PDUs), all of which are bit-packed because of radio-level optimizations.

Sample project with data and readme can be seen here: https://github.com/smarek/kaitai-tetra-sds

What I've been implementing can be found in the documentation on page 436 (clause 21.4.1, table 321) and page 443 (clause 21.4.3.1, table 329), where the "length" column always gives the number of bits the field takes.
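Since the "length" column counts bits and no padding is inserted, each field starts exactly where the previous one ended. A minimal sketch of a table-driven, MSB-first reader under that assumption is below; `read_fields` is a hypothetical helper (not part of any Kaitai runtime), the field widths are borrowed from the generated snippet later in this thread, and the two sample bytes are made up.

```python
def read_fields(data: bytes, layout, bit_offset: int = 0) -> dict:
    """Read consecutive big-endian (MSB-first) bit fields with no padding.

    `layout` is a list of (name, bit_length) pairs, like the "length"
    column of a PDU table. Returns the field values keyed by name.
    """
    as_int = int.from_bytes(data, "big")
    total_bits = len(data) * 8
    values = {}
    for name, width in layout:
        if bit_offset + width > total_bits:
            raise EOFError(f"field {name!r} runs past the end of the data")
        shift = total_bits - bit_offset - width
        values[name] = (as_int >> shift) & ((1 << width) - 1)
        bit_offset += width  # next field starts immediately: no alignment
    return values


# Hypothetical layout: widths of the first five fields in the snippet below.
LAYOUT = [
    ("fill_bit_indication", 1),
    ("position_of_grant", 1),
    ("encryption_mode", 2),
    ("random_access_flag", 1),
    ("length_indication", 6),
]

fields = read_fields(bytes([0b1001_0101, 0b1010_0000]), LAYOUT)
print(fields)
# {'fill_bit_indication': 1, 'position_of_grant': 0, 'encryption_mode': 1, 'random_access_flag': 0, 'length_indication': 45}
```

Note that `length_indication` straddles the byte boundary (bits 5 to 10), which is exactly the case byte-oriented parsing has trouble with.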

I had to edit the Python runtime (kaitaistruct.py) and turn the align_to_byte() method into a no-op so that it works at least partially.

Is it possible to make the whole project completely unaligned, and which runtimes support this setting?

Thank you

tan-wei commented 5 years ago

Maybe you can have a look at issue #12.

smarek commented 5 years ago

@tan-wei thank you, I've seen some of these discussions; however, my use case is, IMHO, different, because the usual suggestion is to manually calculate the bit offset or handle padding. In my case there is no padding, just fields with specific bit lengths (some of them strings, some ints, some BCDs, ...) with no padding added between them. That's why I created a new ticket, for protocols that are completely unaligned, which is not the same as "some of the fields are aligned and some are not".
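To illustrate what "strings, ints and BCDs with no padding" means for a reader: once reading is bit-addressed, a BCD number or a string can start at any bit offset. The sketch below uses hypothetical helpers and assumes 4-bit BCD digits and 8-bit Latin-1 characters purely for illustration (the actual TETRA encodings are defined in the ETSI document). The sample bytes were built by hand: a 3-bit integer, the BCD digits 1 2 3, the text "Hi", and one trailing pad bit.

```python
def read_bits(data: bytes, offset: int, width: int) -> int:
    """Return `width` bits starting at bit `offset` (MSB-first)."""
    total = len(data) * 8
    if offset + width > total:
        raise EOFError("not enough bits")
    return (int.from_bytes(data, "big") >> (total - offset - width)) & ((1 << width) - 1)


def read_bcd(data: bytes, offset: int, digits: int) -> str:
    """Decode `digits` 4-bit BCD digits starting at an arbitrary bit offset."""
    out = []
    for i in range(digits):
        nibble = read_bits(data, offset + 4 * i, 4)
        if nibble > 9:
            raise ValueError(f"invalid BCD nibble {nibble:#x}")
        out.append(str(nibble))
    return "".join(out)


def read_text(data: bytes, offset: int, count: int, bits_per_char: int = 8) -> str:
    """Decode `count` characters packed back to back, `bits_per_char` bits each.

    Latin-1 is an assumption of this sketch, not the TETRA text coding.
    """
    if not 1 <= bits_per_char <= 8:
        raise ValueError("bits_per_char must be between 1 and 8")
    codes = bytes(read_bits(data, offset + bits_per_char * i, bits_per_char)
                  for i in range(count))
    return codes.decode("latin-1")


data = bytes([0xA2, 0x46, 0x90, 0xD2])
prefix = read_bits(data, 0, 3)   # 3-bit integer field
number = read_bcd(data, 3, 3)    # BCD starts at bit 3
text = read_text(data, 15, 2)    # text starts at bit 15
print(prefix, number, text)      # 5 123 Hi
```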

smarek commented 5 years ago

Also, from what I've seen, there are two issues:

Snippet of the generated Python code:

        def _read(self):
            self.fill_bit_indication = self._io.read_bits_int(1) != 0
            self.position_of_grant = self._io.read_bits_int(1) != 0
            self.encryption_mode = self._io.read_bits_int(2)
            self.random_access_flag = self._io.read_bits_int(1) != 0
            self.length_indication = self._io.read_bits_int(6)
            self._io.align_to_byte()  # forced byte alignment before the MacAddressWithType sub-structure
            self.address_type = self._root.MacAddressWithType(self._io, self, self._root)
            self.power_control_flag = self._io.read_bits_int(1) != 0
            if self.power_control_flag:
                self.power_control_element = self._io.read_bits_int(4)

            self.slot_granting_flag = self._io.read_bits_int(1) != 0
            if self.slot_granting_flag:
                self.slot_granting_element = self._io.read_bits_int(8)

            self.channel_allocation_flag = self._io.read_bits_int(1) != 0
            self._io.align_to_byte()  # forced again before MacChannelAllocationElement
            self.channel_allocation_element = self._root.MacChannelAllocationElement(self._io, self, self._root)

The generated code calls self._io.align_to_byte() on line 7 before parsing the subsequent user type (and does so again before channel_allocation_element), which is, again, incorrect in my case.
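A toy model of why the forced alignment breaks the parse: after the five header fields (1 + 1 + 2 + 1 + 6 = 11 bits), the generated code jumps to bit 16, so the 5 bits that belong to the address sub-structure are skipped. `BitReader` below is a hypothetical stand-in that only mimics the position bookkeeping of `read_bits_int`/`align_to_byte`; it is not the kaitaistruct runtime.

```python
class BitReader:
    """Toy MSB-first bit reader mimicking read_bits_int/align_to_byte.

    Not the kaitaistruct runtime: only the bit position is modelled.
    With unaligned=True, align_to_byte() is a no-op, which is the effect
    of the runtime patch described at the top of this thread.
    """

    def __init__(self, data: bytes, unaligned: bool = False):
        self.data = data
        self.pos = 0  # current position, in bits
        self.unaligned = unaligned

    def read_bits_int(self, n: int) -> int:
        value = 0
        for _ in range(n):
            if self.pos >= len(self.data) * 8:
                raise EOFError("end of stream")
            byte = self.data[self.pos // 8]
            value = (value << 1) | ((byte >> (7 - self.pos % 8)) & 1)
            self.pos += 1
        return value

    def align_to_byte(self) -> None:
        if not self.unaligned:
            self.pos = (self.pos + 7) // 8 * 8  # skip to the next byte boundary


def header_end(reader: BitReader) -> int:
    """Read the snippet's five header fields, align like the generated code."""
    reader.read_bits_int(1)  # fill_bit_indication
    reader.read_bits_int(1)  # position_of_grant
    reader.read_bits_int(2)  # encryption_mode
    reader.read_bits_int(1)  # random_access_flag
    reader.read_bits_int(6)  # length_indication
    reader.align_to_byte()   # emitted before MacAddressWithType
    return reader.pos        # bit where the sub-structure would start


print(header_end(BitReader(bytes(4))))                  # 16
print(header_end(BitReader(bytes(4), unaligned=True)))  # 11
```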

tan-wei commented 5 years ago


Glad to see you have the same requirement as me. According to the author's reply, no workaround is possible for now. Maybe it will be solved when #12 is closed. I'll keep an eye on this issue.

tan-wei commented 5 years ago

What about #576?

smarek commented 5 years ago

@tan-wei ?? You've linked the ID of this very ticket.

GreyCat commented 5 years ago

@smarek

> Is it possible to set the project to be completely un-aligned and what runtimes do support this setting?

At the moment, the answer is "no, it's not possible".

The reference to #12 is generally correct. Unfortunately, it's not as easy as it looks. It's not just a matter of removing align_to_byte calls; it's also a matter of:

In general, it's a complex task. The syntax suggested in #12 seems to answer at least some of these questions, and that's the first logical step to take. If this feature is important to you, please consider contributing some tests, compiler code, etc., to speed it up.

smarek commented 5 years ago

@GreyCat thank you for your insight. I'm currently motivated to push some work into the topic.

If this is a matter of the runtime, I can probably contribute the steps you've listed; if it's a matter of the compiler, that is probably beyond my capacity at the moment.

GreyCat commented 5 years ago

@smarek As the first step, we'll need a comprehensive test suite for unaligned bit reading, so probably tests is the first major aim :)
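A possible starting point for such a test suite, sketched with `unittest`: pack random fields MSB-first with no padding, shift the whole sequence to every starting bit offset from 0 to 7, and check that each field reads back unchanged. `pack_bits` and `read_bits` are hypothetical helpers standing in for a real runtime's writer and reader; the field widths (1 to 33 bits) and the seed are arbitrary choices.

```python
import random
import unittest


def pack_bits(fields) -> bytes:
    """Pack (value, width) pairs MSB-first with no padding between fields.

    Only the final byte is zero-padded on the right, if needed.
    """
    acc, nbits = 0, 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        acc = (acc << width) | value
        nbits += width
    pad = -nbits % 8
    return (acc << pad).to_bytes((nbits + pad) // 8, "big")


def read_bits(data: bytes, offset: int, width: int) -> int:
    """Reader under test: `width` bits at bit `offset`, MSB-first."""
    total = len(data) * 8
    if offset + width > total:
        raise EOFError("not enough bits")
    return (int.from_bytes(data, "big") >> (total - offset - width)) & ((1 << width) - 1)


class UnalignedRoundTrip(unittest.TestCase):
    def test_every_start_offset(self):
        rng = random.Random(576)  # fixed seed keeps failures reproducible
        for lead in range(8):  # shift the whole sequence by 0..7 bits
            widths = [rng.randint(1, 33) for _ in range(20)]
            values = [rng.getrandbits(w) for w in widths]
            data = pack_bits([(0, lead)] + list(zip(values, widths)))
            pos = lead
            for value, width in zip(values, widths):
                with self.subTest(lead=lead, offset=pos, width=width):
                    self.assertEqual(read_bits(data, pos, width), value)
                pos += width

    def test_read_past_end(self):
        with self.assertRaises(EOFError):
            read_bits(b"\xff", 3, 6)
```

Saved as, say, `test_unaligned.py`, it runs with `python -m unittest test_unaligned`. A real suite would replace `read_bits` with calls into each runtime.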

The runtime will need some attention (e.g. emulating bit-level little-endian reads will require implementing https://github.com/kaitai-io/kaitai_struct/issues/155, which will need runtime fixes too), and, yes, quite a few things will have to be fixed in the compiler.
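For context on the bit-order point: big-endian (MSB-first) bit order numbers bits from the most significant bit of the first byte, while little-endian (LSB-first) order starts from the least significant bit, so the same bytes split into different field values. The standalone functions below are only a sketch of the two conventions, not runtime API; the bytes and field layout are made up, and the middle field deliberately crosses a byte boundary.

```python
def read_bits_be(data: bytes, offset: int, width: int) -> int:
    """MSB-first: bit 0 is the most significant bit of byte 0."""
    total = len(data) * 8
    if offset + width > total:
        raise EOFError("not enough bits")
    return (int.from_bytes(data, "big") >> (total - offset - width)) & ((1 << width) - 1)


def read_bits_le(data: bytes, offset: int, width: int) -> int:
    """LSB-first: bit 0 is the least significant bit of byte 0."""
    if offset + width > len(data) * 8:
        raise EOFError("not enough bits")
    return (int.from_bytes(data, "little") >> offset) & ((1 << width) - 1)


data = bytes([0b1011_0001, 0b0000_0110])
layout = [(0, 3), (3, 10), (13, 3)]  # (bit offset, width); field 2 spans both bytes

print([read_bits_be(data, o, w) for o, w in layout])  # [5, 544, 6]
print([read_bits_le(data, o, w) for o, w in layout])  # [1, 214, 0]
```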

CrustyAuklet commented 4 years ago

Just found this project and it is really awesome! I ran into the same situation as #504 though. Is there any progress on this (#576) issue?

I actually have a Python library I use internally at work that would be trivial to use as a Python runtime for non-aligned, bit-based data streams. Instead of the struct library, I used the bitstruct library; not sure if that has been considered? I also have a C++ implementation, but it is for embedded use (non-owning, statically sized buffers), so it would need to be modified to work with streams.

Would love to contribute, just want to see where everything is at since the last comment was ~7 months ago.

tan-wei commented 2 years ago

@CrustyAuklet Unfortunately, there seems to be no progress on this so far.

generalmimon commented 1 year ago

Related: https://github.com/kaitai-io/kaitai_struct/issues/207

aronwk-aaron commented 4 months ago

Just wanted to add to this. I ran into this issue while documenting data packets for LEGO Universe: https://github.com/lcdr/lu_formats/pull/6#issuecomment-2028490586 We used Kaitai to document the file formats, since they are all byte-aligned, but the networking side uses RakNet bit streams rather than byte streams.