There are a number of cases where we might want to unfold memory loads from instructions, but keep the memory load/store in the same basic block:
- [ ] Instructions that are notably slower in their folded form (#72530, #14640)
- [ ] optsize/minsize builds: loading a vector constant that can be compressed by X86FixupVectorConstants
- [ ] RMW scalar arithmetic on many Intel targets (#40176)
I imagine this being similar to MachineLICM: driven by register pressure and scheduler throughput/latency, but operating within a single basic block.
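
As a rough illustration of the kind of decision such a pass might make, here is a minimal standalone sketch. Everything in it is hypothetical: `FoldCandidate`, `BlockState` and `shouldUnfoldLoad` are made-up names, not existing LLVM APIs, and the costs are plain integers standing in for what a real pass would query from the scheduling model and register pressure tracking.

```cpp
// Hypothetical costs for one load-folded instruction and its unfolded
// equivalent (separate load + register-register op). A real pass would
// derive these from the target's scheduling model and encoding sizes.
struct FoldCandidate {
  unsigned FoldedLatency;   // cycles for the load-folded form
  unsigned UnfoldedLatency; // cycles for load + reg-reg op
  unsigned FoldedSize;      // encoded bytes, folded form
  unsigned UnfoldedSize;    // encoded bytes, unfolded form
};

// Hypothetical snapshot of the basic block at the candidate instruction.
struct BlockState {
  unsigned LiveRegs;  // registers of the relevant class live here
  unsigned RegLimit;  // pressure limit for that register class
  bool OptForSize;    // optsize/minsize function
};

// Assumed heuristic: unfolding needs one extra temporary register, so
// refuse if that would exceed the pressure limit. Otherwise unfold only
// when it is a strict win on the metric that matters for this function.
bool shouldUnfoldLoad(const FoldCandidate &C, const BlockState &S) {
  if (S.LiveRegs + 1 > S.RegLimit)
    return false;
  if (S.OptForSize)
    return C.UnfoldedSize < C.FoldedSize;
  return C.UnfoldedLatency < C.FoldedLatency;
}
```

This only captures the register pressure vs. latency/size trade-off; it ignores throughput, port pressure and where the unfolded load would be scheduled within the block.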