This PR addresses #155, where `bifrost.reduce()` throws a BF_STATUS_DEVICE_ERROR when given a slice. The fix checks that the input data/slice is aligned on a vector-size boundary before launching a vectorized kernel. If it is not aligned, the looped kernel is used instead.
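To illustrate the idea, here is a minimal Python sketch of the alignment test and kernel fallback. It is not the code in this PR: the function names, the vector length of 4, and the example addresses are hypothetical, and the real check may look at more than the start address (for example strides or the reduction length).

```python
def is_vector_aligned(byte_address: int, itemsize: int, vector_len: int) -> bool:
    """Return True if byte_address sits on an (itemsize * vector_len)-byte boundary.

    Hypothetical helper: a vectorized load of `vector_len` elements is assumed
    to require the start address to be a multiple of the full vector width.
    """
    return byte_address % (itemsize * vector_len) == 0


def choose_kernel(byte_address: int, itemsize: int, vector_len: int = 4) -> str:
    """Pick the vectorized kernel only when the input is aligned; otherwise loop."""
    if is_vector_aligned(byte_address, itemsize, vector_len):
        return "vectorized"
    return "looped"


# Assumed example: float32 data (4 bytes per element) in a buffer whose base
# address is 0, so a slice starting at element i begins at byte 4 * i.
unsliced = choose_kernel(0 * 4, itemsize=4)       # start of the buffer
slice_from_1 = choose_kernel(1 * 4, itemsize=4)   # data[1:] starts mid-vector
slice_from_4 = choose_kernel(4 * 4, itemsize=4)   # data[4:] starts on a boundary
```

In this sketch a slice that starts one element into the buffer is no longer on a 16-byte boundary, so it falls back to the looped kernel, while a slice starting at element 4 can still use the vectorized path.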