google / emboss

Emboss is a tool for generating code that reads and writes binary data structures.
Apache License 2.0
68 stars 21 forks source link

Built in function to count the number of bits set #98

Open fsareshwala opened 10 months ago

fsareshwala commented 10 months ago

Suppose you have a bits type like the following:

bits Features:
  0     [+1] Flag foo
  $next [+1] Flag bar
  $next [+1] Flag baz
  $next [+1] Flag abc
  $next [+1] Flag xyz

A packet contains a bitset like the one above with the number of flags enabled determining the size of a variable length array.

struct Data:
  0     [+1] Features features

  let num_features = (features.foo ? 1 : 0) + (features.bar ? 1 : 0) + ...
  $next [+num_features] UInt feature_values

The calculation of num_features can become quite a bit of toil. We have to physically write out the bits we wish to check. It would be easy to forget to add here if another needed to be checked in the future. It would be much easier to simply call out to a built in function which can do the counting for us.

Something like this would be so much better:

struct Data:
  0     [+1] Features features

  let num_features = bits_set(features)
  $next [+num_features] UInt feature_values
reventlov commented 9 months ago

Are you asking for a POPCOUNT function + a way to reinterpret_cast a bits to UInt, or something that would actually analyze the bits and only pay attention to known Flag fields?

fsareshwala commented 9 months ago

I was asking more for a std::popcount type approach.

reventlov commented 9 months ago

Adding $popcount should be straightforward: a 1-adic function that takes an integer argument. Min value is 0, max value is ciel(log2(input max val)), +1 if the input can be negative. Would need some #ifs in the C++ implementation to use std::popcount() or __builtin_popcount() where available, and fallback to a C++11 native implementation otherwise.

Internally, a bits field and a UInt field (also Flag, Int, Float, etc.) are treated similarly, so some equivalent of reinterpret_cast that basically says "pretend this field is this other type and read it out" should be doable. We would have to figure out a readable syntax -- there aren't many languages that let you cast pointers that way, so $reinterpret_cast<UInt>(field) might be the least-bad option.