Convert int64 to long long for antares cuda backend

I merged this PR into my dev branch. It partly solves the antares kernel problem, but it introduces another issue:

extern "C" int kernel_entry(int32_t* Parameter_0_0_0, long long** Result_2_0_0)
{
    // kernel_entry_init
    // name=result
    ArgMax_int32_t_int64_t_cuda_ArgMax_1_0(0, 0, Parameter_0_0_0, tensor_1, ArgMax_1_0_temp0);
    // name=Result_2_0
    Result_int64_t_int64_t_cuda_lib_Result_2_0(tensor_1, Result_2_0_0);
    return 0;
}

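The issue arises because tensor_1 and Result_2_0_0 are declared as long long, but the kernel Result_int64_t_int64_t_cuda_lib_Result_2_0 expects int64_t* and int64_t** arguments. On 64-bit Linux, int64_t is typically a typedef for long, which is a distinct type from long long even though both are 64 bits wide, so the pointers do not convert implicitly: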

// Node name:   Result_2_0
// Description: Result
// Input:
//  - name: tensor_1    type: int64_t   shape: Shape{2}
// Output:
//  - name: Result_2_0_0    type: int64_t   shape: Shape{2}
void Result_int64_t_int64_t_cuda_lib_Result_2_0(int64_t* input0, int64_t** output0)
{
    *output0 = input0;
}
// 0: CUDA_GPU; 1: ROCM_GPU; 2: GENERIC_CPU; 3: HLSL; 4: GraphCore; 5: UNKNOWN
int get_device_type()
{
    return 0;
}
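
One possible workaround for the mismatch described above, as a minimal sketch only: keep the int64_t kernel as generated and forward the long long buffers through a thin wrapper. The names result_kernel and result_kernel_ll are hypothetical stand-ins, not identifiers from nnfusion, and the sketch assumes long long and int64_t have the same width (checked at compile time). The cleaner fix is probably to make the codegen emit one type consistently in both the declaration and the kernel signature.

```cpp
#include <cstdint>

// Stand-in with the same shape as the generated Result kernel quoted above
// (hypothetical name; the real one is Result_int64_t_int64_t_cuda_lib_Result_2_0).
void result_kernel(int64_t* input0, int64_t** output0)
{
    *output0 = input0;
}

// Assumption: both types are 64 bits wide, so converting the pointer
// does not change how the underlying buffer is laid out.
static_assert(sizeof(long long) == sizeof(int64_t),
              "long long and int64_t must have the same width");

// Hypothetical wrapper: accepts the long long declarations used by
// kernel_entry and forwards them to the int64_t kernel.
void result_kernel_ll(long long* input0, long long** output0)
{
    // Use a temporary of the kernel's own pointer type instead of
    // casting long long** to int64_t**.
    int64_t* tmp = nullptr;
    result_kernel(reinterpret_cast<int64_t*>(input0), &tmp);
    *output0 = reinterpret_cast<long long*>(tmp);
}
```

With such a wrapper, kernel_entry could call result_kernel_ll(tensor_1, Result_2_0_0) without changing the long long declarations.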

microsoft / nnfusion

Convert int64 to long long for antares cuda backend #485