Zebartin closed this issue 1 year ago.
What is the version of the Linux kernel inside your VM?
- If the Linux kernel is older than version 5.11, then you're supposed to install the gramine-dcap package.
- If the Linux kernel is 5.11 or newer, then you're supposed to install the gramine package (what you currently have).
Actually I have tried two different distributions, CentOS 8 and Ubuntu 20.04. On the CentOS one, I built and installed Linux kernel 5.18.15 with the SGX feature turned on; the kernel version on the Ubuntu one is 5.15.0.
Both of them have /dev/sgx, /dev/sgx_enclave and /dev/sgx_provision, while the Ubuntu one has /dev/sgx_vepc and the CentOS one does not. Both of them have produced the same result as mentioned above.
Hm. Can you try to install Intel SGX SDK and run some examples from there? For example, https://github.com/intel/linux-sgx/tree/master/SampleCode/SampleEnclave.
I currently don't understand what's going wrong with your machine.
Thanks for your reply.
I tried running some examples there and everything seems all right to me. For SampleEnclave, make with hardware mode, a debug or pre-release build, and no mitigation, as well as with simulation mode and any build type, works well. They produced similar results, one of which is as follows:
XXX@ubuntu-worker:~/linux-sgx/SampleCode/SampleEnclave$ make SGX_PRERELEASE=1 SGX_DEBUG=0
make[1]: Entering directory '/home/XXX/linux-sgx/SampleCode/SampleEnclave'
GEN => App/Enclave_u.h
CC <= App/Enclave_u.c
CXX <= App/App.cpp
CXX <= App/Edger8rSyntax/Types.cpp
CXX <= App/Edger8rSyntax/Pointers.cpp
CXX <= App/Edger8rSyntax/Arrays.cpp
CXX <= App/Edger8rSyntax/Functions.cpp
CXX <= App/TrustedLibrary/Thread.cpp
CXX <= App/TrustedLibrary/Libcxx.cpp
CXX <= App/TrustedLibrary/Libc.cpp
LINK => app
GEN => Enclave/Enclave_t.h
CC <= Enclave/Enclave_t.c
CXX <= Enclave/Edger8rSyntax/Arrays.cpp
CXX <= Enclave/Edger8rSyntax/Functions.cpp
Enclave/Edger8rSyntax/Pointers.cpp: In function ‘void ecall_pointer_string_const(const char*)’:
Enclave/Edger8rSyntax/Pointers.cpp:174:12: warning: ‘char* strncpy(char*, const char*, size_t)’ output truncated before terminating nul copying as many bytes from a string as its length [-Wstringop-truncation]
174 | strncpy(temp, str, strlen(str));
| ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~
CXX <= Enclave/Edger8rSyntax/Pointers.cpp
CXX <= Enclave/Edger8rSyntax/Types.cpp
CXX <= Enclave/Enclave.cpp
CXX <= Enclave/TrustedLibrary/Libc.cpp
CXX <= Enclave/TrustedLibrary/Libcxx.cpp
CXX <= Enclave/TrustedLibrary/Thread.cpp
LINK => enclave.so
<EnclaveConfiguration>
<ProdID>0</ProdID>
<ISVSVN>0</ISVSVN>
<StackMaxSize>0x40000</StackMaxSize>
<HeapMaxSize>0x100000</HeapMaxSize>
<TCSNum>10</TCSNum>
<TCSPolicy>1</TCSPolicy>
<!-- Recommend changing 'DisableDebug' to 1 to make the enclave undebuggable for enclave release -->
<DisableDebug>0</DisableDebug>
<MiscSelect>0</MiscSelect>
<MiscMask>0xFFFFFFFF</MiscMask>
</EnclaveConfiguration>
tcs_num 10, tcs_max_num 10, tcs_min_pool 1
The required memory is 4055040B.
The required memory is 0x3de000, 3960 KB.
Succeed.
SIGN => enclave.signed.so
The project has been built in pre-release hardware mode.
make[1]: Leaving directory '/home/XXX/linux-sgx/SampleCode/SampleEnclave'
XXX@ubuntu-worker:~/linux-sgx/SampleCode/SampleEnclave$ ./app
Checksum(0x0x7fff24f835b0, 100) = 0xfffd4143
Info: executing thread synchronization, please wait...
Info: SampleEnclave successfully returned.
Enter a character before exit ...
When it comes to mitigation, make does not work, producing errors like unrecognized option '-mlfence-after-load=yes', which I suppose is not very important for this issue right now.
Interesting. Error -5 is EIO. This error happens when the ECREATE instruction results in an error, see here. This seems to only happen on exceptions during the ECREATE instruction.
Out of all the possible reasons for exceptions, I can only see this one happening: "If SECS.SSAFRAMESIZE is insufficient."
May I ask which Intel CPU you are using? What is the output of cpuid | grep "SAVE area"? Currently Gramine has a hard-coded value of SSAFRAMESIZE = 4 pages, which is equal to 16KB -- enough for any Intel CPU I'm aware of.
The output about the CPU in the VM is as follows:
XXX@ubuntu-worker:~$ cat /proc/cpuinfo | grep 'model name'
model name : Intel(R) Xeon(R) Silver 4314 CPU @ 2.40GHz
model name : Intel(R) Xeon(R) Silver 4314 CPU @ 2.40GHz
model name : Intel(R) Xeon(R) Silver 4314 CPU @ 2.40GHz
model name : Intel(R) Xeon(R) Silver 4314 CPU @ 2.40GHz
XXX@ubuntu-worker:~$ cpuid | grep "SAVE area"
SAVE area size in bytes = 0x00000988 (2440)
SAVE area size in bytes = 0x00000988 (2440)
SAVE area size in bytes = 0x00000988 (2440)
SAVE area size in bytes = 0x00000988 (2440)
The CPU info of the host machine is almost the same, but with more cores. However, Gramine runs perfectly well on the host.
@Zebartin Could you also show the output of is-sgx-available on the host?
I wonder if @mythi can provide any insights on running a VM that supports Intel SGX. I currently don't understand what is going wrong with EENTER.
The output of is-sgx-available on the host is almost the same as that on the guest VM, except for EPC size.
$ is-sgx-available
SGX supported by CPU: true
SGX1 (ECREATE, EENTER, ...): true
SGX2 (EAUG, EACCEPT, EMODPR, ...): true
Flexible Launch Control (IA32_SGXPUBKEYHASH{0..3} MSRs): true
SGX extensions for virtualizers (EINCVIRTCHILD, EDECVIRTCHILD, ESETCONTEXT): false
Extensions for concurrent memory management (ETRACKC, ELDBC, ELDUC, ERDINFO): false
CET enclave attributes support (See Table 37-5 in the SDM): false
Key separation and sharing (KSS) support (CONFIGID, CONFIGSVN, ISVEXTPRODID, ISVFAMILYID report fields): true
Max enclave size (32-bit): 0x80000000
Max enclave size (64-bit): 0x100000000000000
EPC size: 0x1fc800000
SGX driver loaded: true
AESMD installed: true
SGX PSW/libsgx installed: true
I haven't seen any problems with it. @Zebartin what version tag of qemu did you use from that Intel repository? I guess it's worth pointing out that upstream Qemu has supported SGX since 6.2. I've only used Qemu 6.2+ from the distros (e.g., Ubuntu 22.04 has it). Do you see anything strange in the guest's dmesg after this error?
I'm in agreement with Mikko... I haven't seen any issues using SGX in a QEMU 6.2+ guest VM. Not sure what's going on here.
One question I didn't see asked... What is the host OS distro and kernel version? I assume it has the SGX kernel module and the /dev/sgx_vepc device?
I am aware of that. I am using Qemu built from the official gitlab repository, with options --with-git-submodules=ignore --enable-kvm --enable-vnc --enable-curses --enable-spice --enable-gtk --target-list=x86_64-softmmu --disable-werror --enable-usb-redir, as indicated in the Intel article mentioned above.
I cannot find anything strange in dmesg.
XXX@ubuntu-worker:~$ dmesg | grep sgx
[ 0.626037] sgx: EPC section 0x180000000-0x19fffffff
[ 0.630174] sgx: [Firmware Bug]: Unable to map EPC section to online node. Fallback to the NUMA node 0.
The host OS distro is CentOS 8. The kernel version is 5.18 and was built with SGX and SGX virtualization enabled, so there is /dev/sgx_vepc.
Maybe I should try reinstalling my Qemu... Since others like you are using Qemu with SGX without problems, I am confused now.
And it strikes me that I had to modify some Qemu code in order to make it run, according to this. Did you do so?
If you want to use libvirt with qemu, yes, there are some changes that have to be made, like this. That part is still being worked on. But if you use qemu 6.2+ only and directly (no libvirt), I have personally verified that this works.
I've also used a Qemu built manually and everything worked as expected. I did not have to modify anything SGX-related to make it work. Btw you can try what I've used from here: https://github.com/gramineproject/device-testing-tools/tree/master/qemu
Update: I used qemu directly, without libvirt
AFAIK, it was libvirt that was failing. My setup is using vanilla Ubuntu 22.04 but with libvirt 0.8.6 installed from Kinetic repo.
Maybe a silly question, but have you tried "make clean" then "make SGX=1" in the VM? If the .token file has some flag mismatch, it will also cause error -5.
It turns out that Qemu 7.1.0 is the problem.
I tried 6.2.0 as @boryspoplawski suggested, and also tried 7.0.0; they both work well with Gramine. I also tried integrating Qemu 7.0.0 with libvirt, and it works fine. The ioctl error -5 comes up only when I use Qemu 7.1.0 built from the official download page or from the latest version of the git repo. I cannot tell whether it is Qemu's fault or not.
Thank you all for your suggestions and guidance!
Indeed. With Qemu 7.1 on an ICX, Gramine works only when I remove this line:
https://github.com/gramineproject/gramine/blob/d5599d52d2076006d31493da00f096099298aaf0/python/graminelibos/sgx_get_token.py#L40
If either the avx or the avx512 bit is set in the token, Gramine will fail with error -5. But with Qemu 6.2 and the same HW + image, it works fine.
I'm not familiar with Qemu and don't know how to fix it. This is the command line I used to launch Qemu 7.1:
./qemu-system-x86_64 -enable-kvm -cpu host,+sgx-provisionkey -object memory-backend-epc,id=mem1,size=8G,prealloc=on -M sgx-epc.0.memdev=mem1,sgx-epc.0.node=0 -smp 8 -m 16384 -drive ... -netdev ...
Looks like the issue is solved on the Gramine side (solution: do not use the latest QEMU v7.1).
It would be interesting to debug why QEMU started failing, and how is this related to AVX/AVX512 XFRM
bits. @lejunzhu Do we know anyone who is working on SGX + QEMU/KVM development? It would be good to attract their attention to this bug.
Would this also be an additional argument for not forcing .token creation on DCAP platforms?
But I agree it'd make sense to understand what's going on. We could submit an issue to https://github.com/intel/qemu-sgx
I think that the fact that a change in sgx_get_token.py makes something work implies they are using EPID? On DCAP we just ignore these tokens?
On DCAP we just ignore these tokens?
No. We do create a dummy token: https://github.com/gramineproject/gramine/blob/d5599d52d2076006d31493da00f096099298aaf0/python/graminelibos/sgx_get_token.py#L148-L149
But the attributes (SECS.ATTRIBUTES.XFRM) are unconditionally taken from the host system's available CPU features: https://github.com/gramineproject/gramine/blob/d5599d52d2076006d31493da00f096099298aaf0/python/graminelibos/sgx_get_token.py#L123
So both EPID and DCAP populate the .token file with host CPU features, which is used by Gramine at startup.
No. We do create a dummy token:
I know we create it, but I thought that "dummy" means that we don't use it later (except to just preserve the interface).
I don't see how we use it on DCAP; maybe only to initialize the starting attributes of an enclave? Even if so, why would they differ from the ones we'd choose when taking them from the host?
@lejunzhu Do we know anyone who is working on SGX + QEMU/KVM development? It would be good to attract their attention to this bug.
Yes, I will try to contact them.
No. We do create a dummy token:
I know we create it, but I thought that "dummy" means that we don't use it later (except to just preserve the interface).
This was also my question and I was referring to #363.
I know we create it, but I thought that "dummy" means that we don't use it later (except to just preserve the interface).
Not true. We use the "dummy" token (.token file), though not for any DCAP/launch purposes, but purely for the "starting attributes"; see my next reply.
I don't see how we use it on DCAP; maybe only to initialize the starting attributes of an enclave? Even if so, why would they differ from the ones we'd choose when taking them from the host?
Yes, you're exactly correct. Even if we use DCAP, we still generate a .token that contains the starting attributes of an enclave. And then we read the .token contents, extract these attributes and assign them to the SECS page (which is used during ECREATE): https://github.com/gramineproject/gramine/blob/b491bb9c007b0b0a8e1d360c4cd08a3aa6980ff2/pal/src/host/linux-sgx/host_framework.c#L122
Even if so, why would they differ from the ones we'd choose when taking them from the host?
Not sure I understand this question. Since the .token is generated on the host on which the SGX enclave will run, the attributes will be the same as on the host (in the case of this issue, host = VM).
Of course, we could move the logic of determining the "starting attributes of the enclave" to our Gramine untrusted-PAL startup C code. Currently this logic is in the Python code (I gave links above). When we discussed #363, I mentioned somewhere that the only thing we should "lift" from the dummy token generated by Python is this "what are the starting attributes" logic (which queries /proc/cpuinfo flags and decides which CPU features can be added to the starting enclave attributes). This was not trivial to implement, so nobody cared to create such a PR at that moment.
Since the .token is generated on the host on which the SGX enclave will run, the attributes will be the same as on the host (in the case of this issue, host = VM).
So, isn't this @Zebartin's bug here, not qemu's/gramine's? The token should be generated on the machine where you run the enclave, not where you build it. And I think he's generating it on the host, and then uses inside a VM (which is a different machine, technically).
I agree that moving this logic to startup code could make this case easier, but it may be complicated: we'd need separate logic for EPID, where we have to take the attributes from the token. If this won't end up super complex then we could do it, but it's hard for me to say whether that's the case.
Oh, is this how @Zebartin does it? This wasn't clear to me from the issue description. I thought that all the files are generated inside the VM, i.e. the whole Gramine testing/building/tweaking happens inside a VM at all times.
But if it's indeed the case that @Zebartin generates the .token file on the host (not in a VM) and then tries to use this file in a VM, then yes -- it will probably break.
Well, the only things I did were all from the Quick start part of the docs. I know neither what the .token file exactly means, nor how to generate it.
It seems that @lejunzhu reproduced the same result, and I believe that @lejunzhu would not make the same mistakes as me, if there are any.
Not really. Even when I generate the token file on the same VM, the issue still happens. There are two different findings here:
- According to the SDM, the XFRM value should match XCR0. But Gramine sets the flags from /proc/cpuinfo. The SDK apps, however, use XCR0, so that's why SDK apps can run.
- When using QEMU 7.1, the XCR0 value somehow does not contain AVX, although /proc/cpuinfo says the CPU is capable of AVX. Although SDK apps can run, they lose AVX and have to use basic instructions only. I've contacted a QEMU developer and he is looking at it.
When using QEMU 7.1, the XCR0 value somehow does not contain AVX, although /proc/cpuinfo says the CPU is capable of AVX.
Wow, this is clearly a bug. Isn't it affecting not only SGX, but any VM on any machine?
In this QEMU 7.1 situation, it is most likely a bug.
But there is also the kernel command line option "noxsave". In such a (rarely used) case, I think XCR0 won't match /proc/cpuinfo, although I haven't tried it.
I was wrong about the above statement. When booted with "noxsave", /proc/cpuinfo does not contain the AVX flag either.
A patch has been sent to the Qemu community to fix this issue: https://lists.nongnu.org/archive/html/qemu-devel/2022-10/msg01842.html
@lejunzhu also verified this issue in his environment. Thanks for reporting it to me!
@lejunzhu We hit this problem again, on QEMU 8.0.
The SDK apps, however, use XCR0, so that's why SDK apps can run.
How do SDK apps "learn" about XCR0? I didn't find any way to get XCR0 values from Linux.
How do SDK apps "learn" about XCR0? I didn't find any way to get XCR0 values from Linux.
It uses XGETBV instruction. I think this is the function that does it:
https://github.com/intel/linux-sgx/blob/master/psw/urts/se_detect.cpp#L97
Thanks @lejunzhu. I created two PRs that align with SGX SDK style (and which is the correct way):
- [PAL/Linux-SGX] Remove no-XSAVE code paths (dead code) #1402
- [PAL/Linux-SGX] Enable optional CPU features only if allowed by XCR0 #1403
Could you maybe also do a (light) review of these PRs? Does the logic seem right to you?
I'm not familiar with how XSAVE is used in Gramine, so I have no idea about #1402.
#1403 looks correct. I have only one comment: in case this issue happens again in the future, should we print a warning message when the CPU and the OS report things differently?
Done. See the fixup commit in #1403.
Just for info.
The patch that fixes the XCR0 and CPUID.12.1 issue was merged in QEMU in April 2023 and appears in QEMU starting from v8.0.1. See for details:
Description of the problem
I tried running Gramine on a virtual machine but failed to run the helloworld example.
Steps to reproduce
Steps 1 to 3 are based on this article, part of which is somewhat deprecated though.
Create a VM with virt-manager and install Ubuntu 20.04 in it.
Expected results
Actual results
Gramine commit hash
Gramine was installed via Ubuntu's apt: