define <4 x i32> @fptosi_2f16_to_4i32(<2 x half> %a) {
%cvt = fptosi <2 x half> %a to <2 x i32>
%ext = shufflevector <2 x i32> %cvt, <2 x i32> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
ret <4 x i32> %ext
}
Latest trunk now gives the above assembly, ideally we would only have a single vcvtph2ps node, and avoid all the shuffles which are just trying to move elements into the lowest element:
llc -mcpu=x86-64-v3
Latest trunk now gives the above assembly, ideally we would only have a single vcvtph2ps node, and avoid all the shuffles which are just trying to move elements into the lowest element: