gramineproject / gramine

A library OS for Linux multi-process applications, with Intel SGX support
GNU Lesser General Public License v3.0
595 stars 197 forks source link

[LibOS] dentry cache for filesystem can cause conflict between stat() result and host file status #88

Closed llly closed 5 months ago

llly commented 4 years ago

Description of the problem

shim_do_stat(), shim_do_lstat() etc. check dentry cache for filesystem first. If the file is already looked up and exists in dentry cache, these function return the cached data. However the cache will be out of sync after other programs modify it. Then program in Graphene can get conflict result when calling open() to access real file.

Steps to reproduce

A scenario that a program stat() a file, exec rm program of OS to delete it, stat() the file again to conform that rm works.

Expected results

The second stat() return ENOENT.

Actual results

The second stat() return 0 and cached stat.

Additional information

Here is the related log of the scenario.

[P27950:T2:java] ---- shim_stat("/tmp/blockmgr-87d441cc-2080-455c-837f-d4337e45ba79",0x9293f03f0) = -2
[P27950:T2:java] ---- shim_mkdir("/tmp/blockmgr-87d441cc-2080-455c-837f-d4337e45ba79",511) = 0
[P27950:T2:java] ---- shim_lstat("/tmp/blockmgr-87d441cc-2080-455c-837f-d4337e45ba79",0x9293ee340) = 0
20/09/24 10:30:00 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-87d441cc-2080-455c-837f-d4337e45ba79
[P27950:T90:java] ---- shim_stat("/tmp/blockmgr-87d441cc-2080-455c-837f-d4337e45ba79",0x888990410) = 0
[P28139:T91:java] ---- shim_execve("/bin/rm",[rm,-rf,/tmp/blockmgr-87d441cc-2080-455c-837f-d4337e45ba79,],[LD_LIBRARY_PATH=/lib:/lib/x86_64-linux-gnu:/usr//lib/x86_64-linux-gnu:/opt/jdk:/opt/jdk/lib/jli,PATH=/opt/jdk/bin:/usr/sbin:/usr/bin:/sbin:/bin,]..
[P28139:T91:rm] ---- shim_newfstatat(AT_FDCWD,"/tmp/blockmgr-87d441cc-2080-455c-837f-d4337e45ba79",0x8e91dd648,256) = 0
[P28139:T91:rm] ---- shim_openat(AT_FDCWD,"/tmp/blockmgr-87d441cc-2080-455c-837f-d4337e45ba79",O_RDONLY|604400,0000) = 3
[P28139:T91:rm] ---- shim_openat(AT_FDCWD,"/tmp/blockmgr-87d441cc-2080-455c-837f-d4337e45ba79",O_RDONLY|2604400,0000) = 3
[P28139:T91:rm] ---- shim_unlinkat(AT_FDCWD,"/tmp/blockmgr-87d441cc-2080-455c-837f-d4337e45ba79",512) = 0
[P27950:T90:java] ---- shim_stat("/tmp/blockmgr-87d441cc-2080-455c-837f-d4337e45ba79",0x888990250) = 0
20/09/24 10:33:27 WARN JavaUtils: Attempt to delete using native Unix OS command failed for path = /tmp/blockmgr-87d441cc-2080-455c-837f-d4337e45ba79. Falling back to Java IO way
java.io.IOException: Failed to delete: /tmp/blockmgr-87d441cc-2080-455c-837f-d4337e45ba79
[P27950:T90:java] ---- shim_stat("/tmp/blockmgr-87d441cc-2080-455c-837f-d4337e45ba79",0x888990220) = 0
[P27950:T90:java] ---- shim_lstat("/tmp/blockmgr-87d441cc-2080-455c-837f-d4337e45ba79",0x88898e0c0) = 0
[P27950:T90:java] ---- shim_stat("/tmp/blockmgr-87d441cc-2080-455c-837f-d4337e45ba79",0x8889901d0) = 0
[P27950:T90:java] ---- shim_openat(AT_FDCWD,"/tmp/blockmgr-87d441cc-2080-455c-837f-d4337e45ba79",O_RDONLY|2204000,0000) = -2
20/09/24 10:33:27 ERROR DiskBlockManager: Exception while deleting local spark dir: /tmp/blockmgr-87d441cc-2080-455c-837f-d4337e45ba79
java.io.IOException: Failed to list files for dir: /tmp/blockmgr-87d441cc-2080-455c-837f-d4337e45ba79
llly commented 4 years ago

I don't find a similar one in the issue list of Filesystem code needs to be rewritten. Help me to close this one if it's a know issue.

dimakuv commented 4 years ago

It's actually a feature :) The original idea behind this caching was that Graphene doesn't share FS state with the host OS. In other words, only Graphene processes are expected to modify the files in mounted directories.

But yes, this feature probably doesn't make sense. @mkow @boryspoplawski @yamahata Should we just remove FS caching inside Graphene and allow the host to modify files? (Of course, SGX Protected Files will still correctly detect modifications by the host, by their design.)

mkow commented 4 years ago

Yes, I think this feature is broken by-design and should be removed. I linked this issue to the list in gramineproject/graphene#1803 (fs rewrite).

boryspoplawski commented 4 years ago

1) I like the idea of separation of Graphene and host filesystems and I don't think we can implement some features (like mounting) without it. 2) From what I can see from the logs, this is two in Graphene processes not seeing each others changes to fs, which is definitely an issue we need to address (be it now or when rewriting fs).

yamahata commented 4 years ago
pwmarcz commented 3 years ago

Current status:

We don't want to support host modifying files without Graphene knowing, but we definitely do want to support multiple processes modifying files. This likely requires synchronizing information using IPC (at the very least, sending information that cache needs to be invalidated for some process).

I did some early attempts ("sync engine", gramineproject/graphene#2158) but I feel that the solution needs to be rethought and simplified before going forward.

dimakuv commented 5 months ago

This was fixed by https://github.com/gramineproject/gramine/commit/29f127542a4d81a8b461f05a4b36344d89bed3fa