[Fix/Feat] Correct the fp16 inference of resnet50.onnx

Refers to issue #431.

Uncomment the fp16 Dot CUDA codegen code; it now works.
Fix the incorrect reading of tensor data from fp16 ONNX models.
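
For context on the data-read fix, here is a minimal Python sketch of the two layouts an ONNX `TensorProto` can use for float16 data: little-endian bytes in `raw_data`, or uint16 bit patterns stored one per entry in `int32_data`. This is an illustration under assumptions, not the code changed in this PR; the helper names `decode_fp16_raw` and `decode_fp16_bits` are hypothetical, and the description above does not say which part of the read path was wrong.

```python
import struct

def decode_fp16_raw(raw_bytes):
    """Decode little-endian IEEE 754 half-precision values (2 bytes each)."""
    count = len(raw_bytes) // 2
    return list(struct.unpack(f"<{count}e", raw_bytes))

def decode_fp16_bits(bit_patterns):
    """Decode fp16 values given as integer bit patterns (0..65535).

    Each integer must be reinterpreted bit-wise as a half float,
    not converted numerically to a float.
    """
    packed = struct.pack(f"<{len(bit_patterns)}H", *bit_patterns)
    return decode_fp16_raw(packed)

# Round-trip a few values that are exactly representable in fp16.
values = [1.0, -2.5, 0.5]
raw = struct.pack("<3e", *values)          # raw_data-style bytes
bits = list(struct.unpack("<3H", raw))     # int32_data-style bit patterns

print(decode_fp16_raw(raw))
print(decode_fp16_bits(bits))
```

Treating the bit patterns as ordinary integers, or reading the bytes as 4-byte floats, would yield wrong weights, so both paths have to reinterpret the 16-bit pattern.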

The resnet50-fp16.onnx test passed. The output is: -3.080564e-01 7.984395e-02 -1.190038e+00 -1.483669e+00 -5.135902e-01 3.682717e-01 -2.163917e+00 -8.705018e-01 -1.881244e+00 -1.607677e-01 .. (size = 64000, ends with 2.435706e-01)

The onnxruntime output is: [-0.3066 0.0791 -1.19 -1.487 -0.5127 0.371 -2.168 -0.874 -1.883 -0.1605] ... (size = 64000, ends with 0.2446)
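
The two outputs can be compared element-wise. The sketch below covers only the ten leading values and the final value quoted in this description, not the full 64000-element tensors; the `1e-2` absolute tolerance is an assumed threshold for fp16 results, not a value taken from the test suite.

```python
# Values copied from the outputs quoted in this description
# (first ten elements plus the last element of each tensor).
nnfusion_out = [
    -3.080564e-01, 7.984395e-02, -1.190038e+00, -1.483669e+00,
    -5.135902e-01, 3.682717e-01, -2.163917e+00, -8.705018e-01,
    -1.881244e+00, -1.607677e-01, 2.435706e-01,
]
onnxruntime_out = [
    -0.3066, 0.0791, -1.19, -1.487, -0.5127,
    0.371, -2.168, -0.874, -1.883, -0.1605, 0.2446,
]

TOLERANCE = 1e-2  # assumed absolute tolerance for fp16 inference

abs_diffs = [abs(a - b) for a, b in zip(nnfusion_out, onnxruntime_out)]
max_diff = max(abs_diffs)
all_close = max_diff <= TOLERANCE

print(f"max abs diff = {max_diff:.6f}, within tolerance: {all_close}")
```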
