Closes #98, or at least moves it forward. This is my first ever contribution to an open-source project, so my apologies if I'm doing something very wrong here.
This is a first attempt at allowing you to use vectorized floats as if they were vecN's, with matching functions. As a test I implemented normalize, which seems to work on both the WGPU and CUDA backends.
You can now write the following code:
And depending on the vectorization factor of your input array it will normalize 1D, 2D, 3D or 4D vectors.
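To illustrate the intended semantics (this is a hypothetical sketch in plain Rust, not the actual kernel code from this PR): with a vectorization factor of N, each element of the input array is an N-wide line that gets treated as a vecN and normalized independently.

```rust
// Hypothetical sketch of what `normalize` means for one N-wide line:
// divide every component by the Euclidean length of the line.
fn normalize(line: &[f32]) -> Vec<f32> {
    let len = line.iter().map(|x| x * x).sum::<f32>().sqrt();
    line.iter().map(|x| x / len).collect()
}

fn main() {
    // One 4-wide line, i.e. a single vec4 with length 5.
    let v = normalize(&[3.0, 0.0, 4.0, 0.0]);
    println!("{:?}", v); // [0.6, 0.0, 0.8, 0.0]
}
```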
Problems with xtask
I tried to follow the contributor guidelines as closely as possible, but I ran into a problem with the xtask command:
Float as vecN, or vecN as a newtype?
Is this the way forward, or do we want specialized types for vecN's?
Temporary variables and shadowing
Is my approach for the CUDA implementation correct? I need a temporary variable, which introduces a risk of shadowing. I made sure the generated code is in its own scope, but even then it could shadow a preexisting variable. This seems to be the first time this comes up in the CUDA backend, so the solution I came up with is a `unique_variable_name!` macro. Alternatively, we could store all variable names in the compiler to guarantee that each temporary is unique, but that sounds pretty complicated for potentially little benefit.
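For context, the idea behind such a macro can be sketched as follows (a hypothetical illustration, not the actual implementation in this PR): a process-wide counter hands out a fresh suffix for every generated temporary, so two emitted names can never collide regardless of scope.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Monotonic counter shared by all name requests.
static COUNTER: AtomicUsize = AtomicUsize::new(0);

// Hypothetical helper mirroring what a `unique_variable_name!` macro
// could expand to: prefix plus a never-repeating numeric suffix.
fn unique_variable_name(prefix: &str) -> String {
    let id = COUNTER.fetch_add(1, Ordering::Relaxed);
    format!("{prefix}_{id}")
}

fn main() {
    let a = unique_variable_name("l_tmp");
    let b = unique_variable_name("l_tmp");
    // Distinct names, so the second temporary cannot shadow the first.
    assert_ne!(a, b);
    println!("{a} {b}"); // l_tmp_0 l_tmp_1
}
```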