Open dimakuv opened 4 months ago
Comments from an expert:
XSAVEC is only a drop-in replacement for XSAVE if XSAVEC and XRSTOR are the only instructions that read or write that XSAVE area. There are some scenarios where software (e.g., an exception handler) may need to examine and possibly modify the XSAVE area. In these scenarios, XSAVEC (and also XSAVEOPT) cannot be used as a drop-in replacement for XSAVE because the contents of the XSAVE area will differ from what XSAVE would have written: XSAVEC uses the compacted format, and XSAVEOPT may leave some state components in memory unwritten.
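To illustrate what "examining the XSAVE area" involves, here is a minimal sketch of how a handler could tell which format an area is in. It relies on the SDM layout (a 64-byte XSAVE header at byte offset 512, with XSTATE_BV in bytes 0-7 and XCOMP_BV in bytes 8-15, where bit 63 of XCOMP_BV marks the compacted format). The struct and function names are hypothetical and are not taken from Gramine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* The XSAVE header follows the 512-byte legacy (FXSAVE) region. */
#define XSAVE_HEADER_OFFSET 512
/* Bit 63 of XCOMP_BV is set when the area uses the compacted format. */
#define XCOMP_BV_COMPACTED (1ULL << 63)

/* Hypothetical view of the first two header fields. */
struct xsave_header_view {
    uint64_t xstate_bv; /* which state components are present in the area */
    uint64_t xcomp_bv;  /* format bit plus component bitmap (compacted only) */
};

/* Copy the header fields out of a raw XSAVE area (little-endian host assumed,
 * which holds on x86). */
static struct xsave_header_view read_xsave_header(const uint8_t* area) {
    assert(area != NULL);
    struct xsave_header_view h;
    memcpy(&h.xstate_bv, area + XSAVE_HEADER_OFFSET, sizeof(h.xstate_bv));
    memcpy(&h.xcomp_bv, area + XSAVE_HEADER_OFFSET + 8, sizeof(h.xcomp_bv));
    return h;
}

/* True if the area was written in the compacted format (e.g., by XSAVEC). */
static bool xsave_area_is_compacted(const uint8_t* area) {
    return (read_xsave_header(area).xcomp_bv & XCOMP_BV_COMPACTED) != 0;
}
```

Code that assumes the standard format's fixed component offsets would have to branch on a check like this before touching any component beyond the legacy region.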
This statement is not quite accurate: “XSAVEOPT checks whether the XSAVE state was modified, by consulting the XSAVE area (restored previously with XRSTOR).” When XRSTOR is executed, it clears the processor’s “modified” tracker bits. As soon as some extended state is updated (e.g., an AVX register), the associated “modified” bit is set. Then when the next XSAVEOPT is executed to write to the same XSAVE area used by the most recent XRSTOR, XSAVEOPT consults with the tracker bits to determine what state has been modified, and thus what state must be updated in the XSAVE area. This behavior is described in Section 13.9 of the SDM, volume 1.
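The tracking described above can be pictured with a toy software model. This is only a sketch of the idea, not of the hardware: the real processor also applies the init optimization and compares more context than the area address, and all names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of the processor-internal tracking state. */
struct toy_cpu {
    uint64_t modified;        /* one "modified" bit per state component */
    const void* tracked_area; /* area used by the most recent XRSTOR */
};

/* XRSTOR clears the modified bits and remembers which area it loaded from. */
static void toy_xrstor(struct toy_cpu* cpu, const void* area) {
    cpu->modified = 0;
    cpu->tracked_area = area;
}

/* Any update to a state component (e.g., an AVX register) sets its bit. */
static void toy_touch(struct toy_cpu* cpu, unsigned component) {
    assert(component < 64);
    cpu->modified |= 1ULL << component;
}

/* Returns the bitmap of components XSAVEOPT has to write to memory. */
static uint64_t toy_xsaveopt(const struct toy_cpu* cpu, const void* area,
                             uint64_t requested) {
    if (area != cpu->tracked_area)
        return requested;             /* different area: write everything requested */
    return requested & cpu->modified; /* same area: only what changed since XRSTOR */
}
```

In this model the decision comes entirely from the tracker bits held by the CPU, not from reading the XSAVE area in memory, which is the point of the correction.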
Description of the feature
Gramine-SGX currently uses the XSAVE instruction on EEXIT and XRSTOR on EENTER:
There is anecdotal evidence that the optimized instructions XSAVEC and XSAVEOPT can benefit workloads that perform many EEXIT/EENTER transitions (e.g., workloads that perform many OCALLs).
Interestingly, Intel SGX PSW already uses XSAVEC instead of XSAVE, assuming that XSAVEC is always available on SGX-supporting processors: https://github.com/intel/linux-sgx/blob/c1ceb4fe146e0feb1097dee81c7e89925443e43c/psw/urts/se_detect.cpp#L80-L81
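Rather than assuming availability, support could be detected at startup. The following is a minimal sketch, assuming GCC or Clang on x86 and code running outside the enclave (CPUID cannot be executed inside one). It uses CPUID leaf 0DH, sub-leaf 1, where EAX bit 0 reports XSAVEOPT and EAX bit 1 reports XSAVEC. The struct and function names are hypothetical and are not the PSW's or Gramine's.

```c
#include <assert.h>
#include <cpuid.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result type for the two optimized save instructions. */
struct xsave_features {
    bool xsaveopt;
    bool xsavec;
};

/* Decode EAX of CPUID.(EAX=0DH, ECX=1). */
static struct xsave_features decode_xsave_features(unsigned int eax) {
    struct xsave_features f = {
        .xsaveopt = (eax & (1u << 0)) != 0, /* bit 0: XSAVEOPT supported */
        .xsavec   = (eax & (1u << 1)) != 0, /* bit 1: XSAVEC supported */
    };
    return f;
}

/* Query the CPU; returns false if leaf 0DH is not available at all. */
static bool detect_xsave_features(struct xsave_features* out) {
    assert(out != NULL);
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid_count(0x0D, 1, &eax, &ebx, &ecx, &edx))
        return false;
    *out = decode_xsave_features(eax);
    return true;
}
```

The untrusted runtime could run such a check once and pass the result into the enclave, falling back to plain XSAVE when neither bit is set.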
Two more notes:
Why Gramine should implement it?
XSAVE is executed on every EEXIT. Each OCALL leads to an EEXIT, as does each return from an interrupt handler, so the overhead of saving the extended state can be significant.
XSAVEC and XSAVEOPT can reduce this overhead, improving overall performance in OCALL-heavy workloads such as Redis.