Optimize casts where the source and destination types are the same: use optimized tensor copy operations instead of Tensor::map, and skip the copy altogether when the Cast op is run in-place.
Refactor tests to make adding new cases easier, and add some tests for i8/u8 conversions.
Follow up to https://github.com/robertknight/rten/pull/387.
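
For reviewers, here is a rough sketch of the idea behind the same-type fast path. It does not use rten's actual types: `Buffer`, `DataType`, `cast` and `cast_in_place` are hypothetical stand-ins, and the real change works on tensors rather than flat `Vec`s. The assumption illustrated is only that a same-type cast can be a bulk copy instead of a per-element map, and that an in-place same-type cast can hand back the input untouched.

```rust
/// Hypothetical element types, standing in for the operator's supported dtypes.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Float,
    Int32,
}

/// Hypothetical stand-in for a tensor's element storage (not rten's type).
#[derive(Debug, Clone, PartialEq)]
enum Buffer {
    Float(Vec<f32>),
    Int32(Vec<i32>),
}

impl Buffer {
    fn dtype(&self) -> DataType {
        match self {
            Buffer::Float(_) => DataType::Float,
            Buffer::Int32(_) => DataType::Int32,
        }
    }
}

/// Cast that produces a new buffer and leaves the input untouched.
fn cast(input: &Buffer, to: DataType) -> Buffer {
    match (input, to) {
        // Same type: a bulk copy of the whole buffer, with no per-element closure.
        (Buffer::Float(x), DataType::Float) => Buffer::Float(x.clone()),
        (Buffer::Int32(x), DataType::Int32) => Buffer::Int32(x.clone()),
        // Different types: convert element by element.
        (Buffer::Float(x), DataType::Int32) => {
            Buffer::Int32(x.iter().map(|&v| v as i32).collect())
        }
        (Buffer::Int32(x), DataType::Float) => {
            Buffer::Float(x.iter().map(|&v| v as f32).collect())
        }
    }
}

/// In-place variant: takes ownership of the input. When the types already
/// match, the input is returned as-is, so no copy happens at all.
fn cast_in_place(input: Buffer, to: DataType) -> Buffer {
    if input.dtype() == to {
        return input;
    }
    cast(&input, to)
}
```

In this sketch, the in-place same-type path returns the original allocation, which is the "skip the copy altogether" case described above.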