openxla / stablehlo

Backward compatible ML compute opset inspired by HLO/MHLO
Apache License 2.0

Wrong `stablehlo.add` semantics for booleans in docs #2630

Open mofeing opened 3 days ago

mofeing commented 3 days ago

What happened?

StableHLO spec says that stablehlo.add op on booleans should behave as "logical OR".

[screenshot of the StableHLO spec defining `add` on booleans as logical OR]

but that is neither the expected behavior nor what the implementation actually does (the implementation performs a logical XOR).
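To make the discrepancy concrete, here is a minimal plain-Python sketch (not StableHLO code; the helper names are made up for illustration) of the two candidate semantics for boolean addition:

```python
def add_as_or(a: bool, b: bool) -> bool:
    """Saturating add: 1 + 1 clamps to 1 (the spec's "logical OR")."""
    return a or b

def add_as_xor(a: bool, b: bool) -> bool:
    """Wrapping add: 1 + 1 overflows back to 0 (the observed "logical XOR")."""
    return a != b

inputs = [(False, False), (False, True), (True, False), (True, True)]
print([add_as_or(a, b) for a, b in inputs])   # [False, True, True, True]
print([add_as_xor(a, b) for a, b in inputs])  # [False, True, True, False]
```

The two semantics differ only on the `(True, True)` input, which is exactly the case exercised below.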

Steps to reproduce your issue

Using Reactant (a Julia frontend to Enzyme + XLA), we can see that `stablehlo.add` behaves as a logical XOR:

julia> a = ConcreteRArray([false, false, true, true])
4-element ConcreteRArray{Bool, 1}:
 0
 0
 1
 1

julia> b = ConcreteRArray([false, true, false, true])
4-element ConcreteRArray{Bool, 1}:
 0
 1
 0
 1

julia> @jit Ops.add(a, b)
4-element ConcreteRArray{Bool, 1}:
 0
 1
 1
 0

Version information

No response

mofeing commented 3 days ago

CC @wsmoses

ghpvnist commented 3 days ago

Hi @mofeing, the spec reflects the behavior that XLA follows. I'm not familiar with Reactant -- is it using StableHLO APIs in a way that incorrectly results in the XOR behavior?

GleasonK commented 3 days ago

Sample file:

$ cat /tmp/t.mlir
func.func @main() -> tensor<4xi1> {
  %c = stablehlo.constant dense<[false, false, true, true]> : tensor<4xi1>
  %c_0 = stablehlo.constant dense<[false, true, false, true]> : tensor<4xi1>
  %0 = stablehlo.add %c, %c_0 : tensor<4xi1>
  return %0 : tensor<4xi1>
}

Using StableHLO reference interpreter:

$ stablehlo-translate --interpret /tmp/t.mlir 
tensor<4xi1> {
  [false, true, true, true]
}

Using HLO Runner:

$ run_hlo_module /tmp/t.mlir  --platform=cpu --input_format=stablehlo --print_literals
** Result with test runner Host **
pred[4] {0, 1, 1, 0}
Running HLO module with runner Interpreter...
... compiled and ran in 0.00130981s.
execution time for runner Interpreter: 0.000424s.

** Result with reference runner Interpreter **
pred[4] {0, 1, 1, 0}

Indeed looks like the XLA behavior is XOR, perhaps a bug in the spec.

NumPy behavior, for completeness: it follows the StableHLO spec, while XLA differs:

>>> import numpy as np
>>> x = np.array([False, False, True, True])
>>> y = np.array([False, True, False, True])
>>> np.add(x,y)
array([False,  True,  True,  True])
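The NumPy result above can be cross-checked against both semantics; this is a small sketch (plain NumPy, not StableHLO) confirming that `np.add` on booleans matches logical OR, while the XLA result shown by `run_hlo_module` matches bitwise XOR:

```python
import numpy as np

x = np.array([False, False, True, True])
y = np.array([False, True, False, True])

# NumPy's add on boolean arrays behaves as logical OR,
# matching the current StableHLO spec text...
assert np.array_equal(np.add(x, y), np.logical_or(x, y))

# ...while XLA's observed output {0, 1, 1, 0} matches elementwise XOR:
assert np.array_equal(np.bitwise_xor(x, y),
                      np.array([False, True, True, False]))
```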

mofeing commented 3 days ago

I see, so `stablehlo.add` on booleans is a "saturating add" rather than an "overflowing add". I always assumed the "overflowing add" (logical XOR) was the correct behavior, because that is what you use when implementing an integer adder in electronics.
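The integer-adder intuition can be sketched as a 1-bit half adder (plain Python, for illustration only): the sum bit is XOR and the carry-out is AND, so dropping the carry, i.e. an overflowing add, yields exactly the XOR behavior observed above.

```python
def half_adder(a: bool, b: bool) -> tuple[bool, bool]:
    """Return (sum, carry) for single-bit addition."""
    return (a != b, a and b)  # sum = a XOR b, carry = a AND b

# 1 + 1 = binary 10: the sum bit overflows to 0, the carry is 1.
print(half_adder(True, True))  # (False, True)
```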

Anyway, yes, there seems to be a mismatch between XLA and the StableHLO spec for this op. Should I move the issue to XLA then?

GleasonK commented 2 days ago

My guess is we'll fix the StableHLO spec. We intended the spec to capture XLA behavior, and divergences are bugs. It seems like XLA's behavior has always been truncation. The fix may instead live in the JAX layer, in how it lowers boolean addition, so that it matches NumPy semantics.