microsoft / qsharp-language

Official repository for design of the quantum programming language Q# and its core libraries

QIR spec: `Pauli` data type values should be listed as signed or adjusted to `i8` #109

Closed: swernli closed this issue 2 years ago

swernli commented 2 years ago

Right now the Pauli data type is explicitly listed as i2 with the values 0, 1, 2, and 3 (see https://github.com/microsoft/qsharp-language/blob/9cb9b91a49341d691c4fdc7d3f464453cc8eae36/Specifications/QIR/Data-Types.md#simple-types). This is not technically accurate: LLVM displays integer constants as signed two's-complement values, so an i2 holds the values 0, 1, -1, -2.

When attempting interop with other languages such as C, C++, or Rust, if the Pauli type is declared as the smallest addressable type in that language (an 8-bit signed integer or char), the compiler will automatically represent the binary value 11 stored in an i2 as -1, because it is "smart" enough to preserve the sign bit when extending into the larger type.

To make it possible to implement functions that accept a Pauli QIR type without extra handling, we should either update the list of integer values to 0, 1, -1, -2 (and update the corresponding generation code in qsharp-compiler and the implementation in qsharp-runtime) or consider changing the type to i8 so that it matches the commonly available minimum addressable type in other languages.
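To make the difference concrete, here is a minimal sketch of how the four 2-bit patterns read when widened with and without sign preservation. The helper names `sign_extend` and `zero_extend` are hypothetical and only model the arithmetic; they are not part of QIR, LLVM, or any runtime.

```python
def sign_extend(bits: int, width: int = 2) -> int:
    """Read the low `width` bits of `bits` as a two's-complement value."""
    mask = (1 << width) - 1
    bits &= mask
    sign_bit = 1 << (width - 1)
    # If the top bit is set, the value is negative in two's complement.
    return bits - (1 << width) if bits & sign_bit else bits


def zero_extend(bits: int, width: int = 2) -> int:
    """Read the low `width` bits of `bits` as an unsigned value."""
    return bits & ((1 << width) - 1)


# Show both readings for every 2-bit pattern.
for pattern in range(4):
    print(f"{pattern:02b}: signed={sign_extend(pattern):2d} unsigned={zero_extend(pattern)}")
```

Under these assumptions the pattern 11 reads as -1 when the sign bit is preserved and as 3 when it is not, which is the mismatch described above.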

@alan-geller, @bettinaheim, @kuzminrobin FYI.

cgranade commented 2 years ago

I'd expect that rather large arrays of Pauli would be somewhat common (e.g., in Clifford tracking for QEC), so requiring promotion to i8 feels a bit suboptimal. Is there something we could do to specialize arrays of Pauli so that they can be stored as bit arrays or similar, perhaps?
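As a rough illustration of the bit-array idea, the sketch below packs four 2-bit Pauli codes into each byte. The functions `pack_paulis` and `unpack_paulis` and the chosen bit order (lowest bits first) are assumptions made for this example; nothing here reflects how QIR or the Q# runtime actually stores arrays.

```python
# Unsigned Pauli codes as listed in the spec table: I=0, X=1, Z=2, Y=3.
PAULI_I, PAULI_X, PAULI_Z, PAULI_Y = 0, 1, 2, 3


def pack_paulis(paulis):
    """Pack 2-bit Pauli codes, four per byte, lowest bits first."""
    packed = bytearray((len(paulis) + 3) // 4)
    for i, code in enumerate(paulis):
        if not 0 <= code <= 3:
            raise ValueError(f"not a 2-bit Pauli code: {code}")
        packed[i // 4] |= code << (2 * (i % 4))
    return packed


def unpack_paulis(packed, count):
    """Recover `count` Pauli codes from a buffer produced by pack_paulis."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(count)]


codes = [PAULI_I, PAULI_X, PAULI_Y, PAULI_Z, PAULI_Y]
packed = pack_paulis(codes)
print(len(codes), "Paulis stored in", len(packed), "bytes")
```

A packed layout like this uses a quarter of the space of one i8 per element, at the cost of a shift and mask on every access.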

swernli commented 2 years ago

@kuzminrobin pointed out the spec already shows the global Pauli constants as negative numbers:

@PauliI = constant i2 0
@PauliX = constant i2 1
@PauliY = constant i2 -1 ; The value 3 (binary 11) is displayed as a 2-bit signed value of -1 (binary 11).
@PauliZ = constant i2 -2 ; The value 2 (binary 10) is displayed as a 2-bit signed value of -2 (binary 10).

So I guess my ask here is that the table entry above that in the spec be updated to use either binary (00, 01, 10, 11) or explicitly signed values (0, 1, -1, -2) instead of the unsigned values it lists today:

| Type | LLVM Representation | Comments |
| --- | --- | --- |
| Pauli | %Pauli = i2 | 0 is PauliI, 1 is PauliX, 3 is PauliY, and 2 is PauliZ. |

Then, when implementing in a language like C/C++, the Pauli type can be treated as int8_t, and LLVM will automatically do the right thing and sign-extend the values.
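For comparison, the "extra handling" a callee would otherwise need is small. The sketch below, using a hypothetical helper `pauli_from_int8`, keeps only the low two bits of the incoming 8-bit value so that the signed and unsigned readings resolve to the same Pauli. It is an illustration of the masking approach, not code from qsharp-runtime.

```python
# Names keyed by the unsigned 2-bit codes from the spec table.
PAULI_NAMES = {0: "PauliI", 1: "PauliX", 2: "PauliZ", 3: "PauliY"}


def pauli_from_int8(value: int) -> str:
    """Map an 8-bit integer to a Pauli name using only its low two bits.

    Both -1 (sign-extended) and 3 (zero-extended) have low bits 11,
    so they resolve to the same entry. Higher bits are ignored, not validated.
    """
    return PAULI_NAMES[value & 0b11]


for raw in (0, 1, 2, 3, -1, -2):
    print(f"{raw:2d} -> {pauli_from_int8(raw)}")
```

Masking makes the function indifferent to how the caller widened the i2, which may be simpler than changing the spec's listed values.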

swernli commented 2 years ago

While working on some changes related to this, I was able to get this working as-is, so I don't think this is as important a fix anymore. The spec can be left as-is. I'm not sure why it didn't work before, but now the 0, 1, 2, 3 values are working fine.