tskit-dev / tskit

Population-scale genomics
MIT License
155 stars 73 forks source link

Refactor tsk_bit_array_* structures #2834

Open jeromekelleher opened 1 year ago

jeromekelleher commented 1 year ago

I'm a bit confused about what the tsk_bit_array_ struct really is. I thought it was a straightforward bit-set implementation, but there's the idea of rows which I'm confused by. Is it a 2D bit set? So, a list of N independent bit sets? If so, I think we should change the API to be more explicit about this, and make operations work on the rows and bits rather than using the get_row operation to get a row, and then having methods which just work on a single row (like intersect, substract, etc).

Also, I think it would be clearer if we used set theoretic operations through out, so add -> union etc.

So, to be clear, we'd have operations like

tsk_bit_array_set_bit(self, row, bit)
tsk_bit_array_contains(self, row, bit)

etc

What do you thing @lkirk?

lkirk commented 1 year ago

Yes, indeed these arrays are a list of N independent bit sets. I actually like the word "bit set" more than "bit array" as well. In addition to the refactor, maybe renaming things to tsk_bitset_* is a good idea as well?

I like your suggestions, they will simplify the calling code quite a bit.