quantumlib / Stim

A fast stabilizer circuit library.

Reduce space overhead of small stim.Tableaus #169

Open Strilanc opened 2 years ago

Strilanc commented 2 years ago

A 30x30 tableau will use 65000 bits instead of 900 bits because the major axis is padded up to a multiple of 256. This padding is not actually necessary. Redesign simd_bit_table to allow an unpadded major axis, and rewrite any code that relies on that assumption. Maybe template the class to allow using smaller words?
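
For concreteness, the arithmetic (assuming both axes are currently padded up to a multiple of 256, as the constructor shown in the comments below suggests):

30 * 30 = 900 bits (the data actually needed)
256 * 256 = 65536 bits (both axes padded, the ~65000 figure above)
30 * 256 = 7680 bits (major axis unpadded, minor axis still padded for SIMD)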

danielbarter commented 2 years ago

OK, I've been exploring this issue a bit. Just to clarify, here is the issue (inlining is still working; I am compiling with -O0 here):

>>> import stim
>>> p = stim.PauliString("XZ")
>>> t = p.to_tableau()
Process 361469 stopped
* thread #1, name = 'python', stop reason = signal SIGINT
    frame #0: 0x00007ffff7e020a1 libpthread.so.0`raise(sig=<unavailable>) at raise.c:50:1
(lldb) up 1
frame #1: 0x00007ffff71bce28 stim.so`stim::simd_bit_table<256ul>::simd_bit_table(this=0x00007fffffffd200, min_bits_major=2, min_bits_minor=2) at simd_bit_table.inl:32:15
   29      : num_simd_words_major(min_bits_to_num_simd_words<W>(min_bits_major)),
   30        num_simd_words_minor(min_bits_to_num_simd_words<W>(min_bits_minor)),
   31        data(min_bits_to_num_bits_padded<W>(min_bits_minor) * min_bits_to_num_bits_padded<W>(min_bits_major)) {
-> 32      std::raise(SIGINT);
   33   }
   34   
   35   template <size_t W>
(lldb) p *this
(stim::simd_bit_table<256>) $0 = {
  num_simd_words_major = 1
  num_simd_words_minor = 1
  data = {
    num_simd_words = 256
     = {
      u8 = 0x0000555555d1dce0 ""
      u64 = 0x0000555555d1dce0
      ptr_simd = 0x0000555555d1dce0
    }
  }
}
(lldb) p min_bits_major
(size_t) $1 = 2

In theory, we could have data.num_simd_words = 2. This seems fixable, assuming we are OK maintaining the assumption that simd_bit_tables represent square matrices. In that case, we would just keep the padding rows required so that columns can still be encoded in simd_bits. All the relevant simd_bit_table methods would need to be updated. @Strilanc: am I missing anything dumb?
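
In numbers, assuming the layout from the transcript above (one 256-bit word per padded minor-axis row): the current allocation is 256 padded rows * 1 word = 256 words = 65536 bits, while an unpadded major axis would need only 2 rows * 1 word = 2 words = 512 bits.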

danielbarter commented 2 years ago

Ah, I am missing something dumb: simd_bit_table does get used for non-square data sometimes.

Strilanc commented 2 years ago

The simd bit table class is templated now, so smaller word sizes could be used for smaller tables, hidden behind the implementation details of a pybind tableau class. But ideally the bit table would be refactored so that the class knew the true desired number of columns and rows, and the padding of the minor axis was a slightly more internal (though not hidden) implementation detail.
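
As a rough illustration of that shape (hypothetical names and layout, not Stim's actual simd_bit_table; SIMD alignment of the storage is glossed over by using std::vector<uint64_t>):

#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch: the table remembers the true row/column counts, pads only the minor
// axis up to a multiple of W bits, and treats that padding as an internal detail.
template <size_t W>
struct small_bit_table {
    static_assert(W % 64 == 0, "word size is a multiple of 64 bits");
    size_t num_rows;              // true major-axis size, no padding
    size_t num_cols;              // true minor-axis size
    size_t row_u64;               // 64-bit words per row after minor padding
    std::vector<uint64_t> bits;   // num_rows * row_u64 words

    small_bit_table(size_t rows, size_t cols)
        : num_rows(rows),
          num_cols(cols),
          row_u64(((cols + W - 1) / W) * (W / 64)),
          bits(rows * row_u64, 0) {}

    bool get(size_t r, size_t c) const {
        return (bits[r * row_u64 + c / 64] >> (c % 64)) & 1;
    }
    void set(size_t r, size_t c, bool v) {
        uint64_t &w = bits[r * row_u64 + c / 64];
        uint64_t m = uint64_t{1} << (c % 64);
        w = v ? (w | m) : (w & ~m);
    }
};

For W = 256, a 30x30 table then stores 30 * 256 = 7680 bits instead of 256 * 256 = 65536.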

danielbarter commented 2 years ago

OK, I need to ramble about transposing.

First, just some background to make sure we are on the same page. Let's say that a word has W bits and we have a matrix:

A11   A12    …   A1n
A21   A22    …   A2n
 ⋮
An1   An2    …   Ann

Each Aij consists of W words (W * W bits) and is stored in memory with a stride of n * W bits. To do a bitwise transpose, we first bitwise-transpose each Aij and then swap Aij with Aji. We have inplace_transpose_square(bitword<W> *data, size_t stride), which handles the first part, and std::swap, which handles the second part.
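
To make the two phases concrete, here is a minimal self-contained model with W = 64 and plain uint64_t standing in for bitword<W>. The helper names and the stride convention (measured in 64-bit words between consecutive rows) are assumptions of this sketch, not the actual inplace_transpose_square API:

#include <cstddef>
#include <cstdint>
#include <utility>

// In-place transpose of one 64x64 bit block. Row k of the block lives at
// a[k * stride]; stride is the number of 64-bit words per row of the table.
static void transpose64_block(uint64_t *a, size_t stride) {
    uint64_t m = 0x00000000FFFFFFFFULL;
    for (int j = 32; j != 0; j >>= 1, m ^= m << j) {
        for (int k = 0; k < 64; k = (k + j + 1) & ~j) {
            uint64_t t = (a[k * stride] ^ (a[(k + j) * stride] >> j)) & m;
            a[k * stride] ^= t;
            a[(k + j) * stride] ^= t << j;
        }
    }
}

// Bitwise transpose of a square (n*64) x (n*64) bit matrix stored row-major
// with n words per row: phase 1 transposes each 64x64 block Aij in place,
// phase 2 swaps block (i, j) with block (j, i) word by word.
static void blockwise_transpose(uint64_t *data, size_t n) {
    for (size_t bi = 0; bi < n; bi++)
        for (size_t bj = 0; bj < n; bj++)
            transpose64_block(data + (bi * 64) * n + bj, n);
    for (size_t bi = 0; bi < n; bi++)
        for (size_t bj = bi + 1; bj < n; bj++)
            for (size_t k = 0; k < 64; k++)
                std::swap(data[(bi * 64 + k) * n + bj],
                          data[(bj * 64 + k) * n + bi]);
}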

Now suppose we are in a situation without major axis padding. We have a matrix which looks like this:

A11   A12    …   A1n
A21   A22    …   A2n
 ⋮
Bn1   Bn2    …   Bnn

where Aij still consists of W words, but Bnj might have fewer. I am imagining that we proceed as follows: for the Aij blocks, transpose them in place using inplace_transpose_square. Allocate an array C of W words on the stack and zero it out. Swaps that don't involve the last row can happen in place as before. For swapping Bnj and Ajn: copy Bnj into C, call inplace_transpose_square on C (with stride 0), swap C and Ajn, and then copy the required rows from C back into Bnj. For Bnn, we just copy it into C, transpose it, and then copy the required rows back into Bnn.
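
A sketch of that Bnj/Ajn step in the same simplified W = 64 model as above (reusing transpose64_block; names like r_last are illustrative only):

// Swap the partial block B(n, j) (only r_last < 64 valid rows) with the full,
// already-transposed block A(j, n), via a zeroed scratch block C on the stack.
// n_blocks is the table width in 64-bit words.
static void swap_partial_block(uint64_t *data, size_t n_blocks, size_t bj, size_t r_last) {
    const size_t bn = n_blocks - 1;  // index of the last (partial) block-row
    const size_t stride = n_blocks;  // words per padded row of the table
    uint64_t C[64] = {0};            // scratch block, missing rows stay zero

    // Copy the partial block B(n, j) into C.
    for (size_t k = 0; k < r_last; k++)
        C[k] = data[(bn * 64 + k) * stride + bj];

    // Transpose C as a full 64x64 block (its rows are contiguous, so stride 1).
    transpose64_block(C, 1);

    // Swap C with block A(j, n): that slot now holds the transposed B(n, j),
    // and C now holds the transposed A(j, n).
    for (size_t k = 0; k < 64; k++)
        std::swap(C[k], data[(bj * 64 + k) * stride + bn]);

    // Copy only the valid rows of C back into the slot of B(n, j).
    for (size_t k = 0; k < r_last; k++)
        data[(bn * 64 + k) * stride + bj] = C[k];
}

The diagonal block Bnn is the same minus the middle swap: copy in, transpose, copy the valid rows back.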

I think that all makes sense, but just wanted to write it out to be sure.

Strilanc commented 2 years ago

I would say it's illegal to call inplace_transpose on a rectangular table if there's no longer major axis padding, because transposing can increase the amount of space required and is therefore not in place.

In general I find it quite tricky to write the transpose methods and rely heavily on randomized unit tests to ensure correctness.

danielbarter commented 2 years ago

Yeah, definitely don't want to call inplace_transpose_square on something that isn't square. I think I get around this by allocating the square (and aligned) C on the stack and only transposing the Bnj blocks after they have been copied into C.
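
For the "aligned" part, something along these lines should work; alignas is standard C++, and 32 bytes matches a 256-bit SIMD word, though the exact requirement depends on the word type:

// Zeroed scratch block for one W-bit-wide block of W rows.
// For W = 256 this is 256 rows of 256 bits = 8 KiB on the stack.
alignas(32) uint64_t C[256 * 256 / 64] = {};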