pytorch / torcharrow

High performance model preprocessing library on PyTorch
https://pytorch.org/torcharrow/beta/index.html
BSD 3-Clause "New" or "Revised" License

Consolidation of the type of operations #26

Open · OswinC opened this issue 2 years ago

OswinC commented 2 years ago

When it comes to the type system of numerical binary operations (including the return type of the operation and the type promotion rules), there are still some discrepancies between TA's behavior and PyTorch's, and we should follow PyTorch's behavior as closely as possible. For example, when a binary op on a boolean tensor is called with an integer scalar (e.g. boolTensor + 3), PyTorch promotes the result to integer, whereas for an integer or float tensor PyTorch always honors the dtype of the tensor ([PR#] is an example of how to fix this). As we onboard more users onto TA, we should catch all such type-behavior discrepancies and either fix them or document them, so that the behaviors are finalized as early as possible.
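For reference, a minimal sketch of the PyTorch behavior described above, using only the public torch API; this is the behavior TA would need to match:

```python
import torch

# A binary op between a bool tensor and an int scalar promotes to integer.
bool_t = torch.tensor([True, False])
print((bool_t + 3).dtype)  # torch.int64 -- promoted, not bool

# For an integer or float tensor, the tensor's dtype wins over the scalar.
int_t = torch.tensor([1, 2], dtype=torch.int32)
print((int_t + 3).dtype)   # torch.int32 -- tensor dtype is honored

float_t = torch.tensor([1.0, 2.0], dtype=torch.float32)
print((float_t + 3).dtype)  # torch.float32
```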

Another example is that PyTorch uses float32 instead of float64 as the default float type. We have already taken a stab at matching this behavior, but we may still need thorough testing to find the places we missed.
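A quick check along these lines might look like the sketch below (assuming TA's `ta.column` factory and the `.dtype` property on columns; the expected output is PyTorch's default, which TA should match):

```python
import torch
import torcharrow as ta

# PyTorch infers float32 for Python float literals by default.
print(torch.tensor([1.0]).dtype)  # torch.float32

# TA should infer the same default float type; if float64 is reported
# here, that is exactly the kind of discrepancy this issue is about.
col = ta.column([1.0, 2.0])
print(col.dtype)  # expected: float32
```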

At a high level, this problem can be approached in two steps:

1. Write comprehensive tests to validate type behaviors, covering, for example, all type promotion cases. The intention is to find all discrepancies between TA and PyTorch (see the sketch after this list).
2. Fix as many of the discrepancies as possible, and document which cases are treated differently in TA and PyTorch.
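One possible shape for such a test is the sketch below. It crosses a few element kinds through `+` in both libraries and reports the resulting dtypes side by side; the `ta.column` calls are assumptions about TA's public API, and combinations that either library rejects are surfaced rather than hidden, since an op PyTorch supports but TA rejects (or vice versa) is itself a finding:

```python
import itertools
import torch
import torcharrow as ta

# Python literals of each kind; each library applies its own
# inference and promotion rules to them.
SAMPLES = {
    "bool": [True, False],
    "int": [1, 2],
    "float": [1.0, 2.0],
}

def result_dtype(make, lhs, rhs):
    """dtype of lhs + rhs, or the exception name if the op is unsupported."""
    try:
        return (make(lhs) + make(rhs)).dtype
    except Exception as e:
        return f"raised {type(e).__name__}"

for (lname, lhs), (rname, rhs) in itertools.product(SAMPLES.items(), repeat=2):
    torch_dtype = result_dtype(torch.tensor, lhs, rhs)
    ta_dtype = result_dtype(ta.column, lhs, rhs)
    # A mismatch on any row is a discrepancy to fix or document.
    print(f"{lname} + {rname}: torch={torch_dtype}, ta={ta_dtype}")
```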

wenleix commented 2 years ago

Yeah. TorchArrow's expected arithmetic behavior is PyTorch's, covering both type promotion and semantics (e.g. underflow/overflow/division-by-zero behavior).
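For concreteness, two of the PyTorch semantics TA would inherit under this expectation (plain torch calls; the printed values follow IEEE 754 and modular wrap-around):

```python
import torch

# Float division by zero follows IEEE 754: inf / -inf / nan, no exception.
print(torch.tensor([1.0, -1.0, 0.0]) / 0.0)  # tensor([inf, -inf, nan])

# Unsigned integer overflow wraps around rather than raising.
print(torch.tensor([255], dtype=torch.uint8) + 1)  # tensor([0], dtype=torch.uint8)
```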