`vector_algorithms.cpp`: `minmax` for 64-bit elements: replace ugly x86 workaround with a nice one

AlexGuteniev commented 1 month ago

This: piece https://github.com/microsoft/STL/blob/8dc4faadafb52e3e0a627e046b41258032d9bc6a/stl/src/vector_algorithms.cpp#L1116-L1123 works around the oddity of not having _mm_cvtsi128_si64 on 32-bit x86

It has been problematic:

An internal bug was reported; fixed by #2821
A compiler bug was caught, it blocked #3922

I have discovered a nicer workaround!

If we spill the reg into the stack, the spill will optimize away. On 32-bit with at least /arch:SSE2 it even produces better code than the existing workaround. Demo: https://godbolt.org/z/ErGWz8GYT

It still does the actual spill on /arch:IA32. But given that this path is executed only once per function call (there are no intermediate reductions for 64-bit elements), and there's a plan to lift to /arch:SSE2, I think that's fine.

StephanTLavavej commented 1 month ago

Thanks, this is great! :heart_eyes_cat: I pushed further changes to centralize the logic, please meow if you have concerns.

AlexGuteniev commented 1 month ago

This centralization is already done in #4659 , there would just be more conflicts, after doing again here

AlexGuteniev commented 1 month ago

Oh, you also did the implementation of _Get_any via _Get_v_pos. Didn't think about it, bu seems it will be no worse, as 0 is expected to constant propagate.

StephanTLavavej commented 1 month ago

I'm drowning in PRs right now, so I think I'm going to have to clear out the backlog in multiple batches (in between investigating a non-STL bug that I can't weasel out of investigating forever). I'd like to land this PR first, then resolve conflicts in #4659.

StephanTLavavej commented 1 month ago

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

StephanTLavavej commented 1 month ago

Thanks for noticing how to make this code way more elegant! :magic_wand: :heart_eyes_cat: :green_heart:

microsoft / STL

`vector_algorithms.cpp`: `minmax` for 64-bit elements: replace ugly x86 workaround with a nice one #4661