Fix fp16 converter bugs involving Cast nodes and topology in subgraphs

Many bugs surfaced in the float16 converter after ORT 1.17 was released, because the optimization rules in ORT changed. Customers have also reported a number of fp16 converter issues; these are corner cases, but they still need to be fixed. The original implementation is hard to understand and hard to maintain, so I rewrote the function convert_float_to_float16. The new logic is simpler to follow, and each component is easy to modify. This PR also adds 3 new features:

Remove Identity nodes when they are unnecessary.
Remove Cast nodes when they are unnecessary.
Sort the graph topologically when ORT complains that the nodes are not in order, i.e. that the input of node 'x' is not the output of any previous node.
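
For reviewers, the topology sort in the third item can be sketched roughly as follows. This is only an illustration using Kahn's algorithm, not the code in this PR: `Node` and `sort_topology` are hypothetical names, and `Node` is a plain stand-in for an ONNX node that keeps just the input and output tensor names.

```python
from collections import deque
from typing import NamedTuple


class Node(NamedTuple):
    # Hypothetical stand-in for an ONNX node: a name plus tensor names.
    name: str
    inputs: tuple
    outputs: tuple


def sort_topology(nodes):
    """Return nodes ordered so every input is produced before it is consumed.

    Inputs that no node produces are assumed to be graph inputs,
    initializers, or values from an outer graph, so they add no dependency.
    """
    # Map each tensor name to the index of the node that produces it.
    producer = {out: i for i, n in enumerate(nodes) for out in n.outputs}

    deps = []  # number of producing nodes each node still waits on
    consumers = {i: [] for i in range(len(nodes))}
    for i, n in enumerate(nodes):
        upstream = {producer[t] for t in n.inputs if t in producer}
        deps.append(len(upstream))
        for p in upstream:
            consumers[p].append(i)

    # Kahn's algorithm: repeatedly emit nodes whose dependencies are all met.
    ready = deque(i for i, d in enumerate(deps) if d == 0)
    ordered = []
    while ready:
        i = ready.popleft()
        ordered.append(nodes[i])
        for c in consumers[i]:
            deps[c] -= 1
            if deps[c] == 0:
                ready.append(c)

    if len(ordered) != len(nodes):
        raise ValueError("graph contains a cycle")
    return ordered
```

For example, given `[Node("add", ("c",), ("y",)), Node("cast", ("x",), ("c",))]`, the sketch moves `cast` ahead of `add`, since `add` consumes the tensor `c` that `cast` produces. In a real model the same ordering would presumably have to be applied to each subgraph as well.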

microsoft / onnxconverter-common

Fix fp16 converter bugs involving Cast nodes and topology in subgraphs #291