rust-vmm / vm-memory

Virtual machine's guest memory crate
Apache License 2.0
306 stars 98 forks source link

mmap: add support for devdax and block devices #189

Open uarif1 opened 2 years ago

uarif1 commented 2 years ago

metadata.len() returns 0 when trying to mmap files such as block devices and devdax (character devices) that are not regular files, hence returning MappingPastEof even if the mapping would fit at the provided file_offset.

This patch adds support for checking the size of devdax and block devices, and returns a new error, InvalidFileType, if the mmap being created is not for a regular file, block or devdax device, or if the size the devices couldn't be found in sysfs.

Usecase: Devdax and block devices can be used in cloud-hypervisor as memory-zone. MmapRegion::build from vm-memory is called while creating a GuestRegionMmap for the VM memory-zone.

Reviewed-by: Fam Zheng fam.zheng@bytedance.com Signed-off-by: Usama Arif usama.arif@bytedance.com

jiangliu commented 2 years ago

The implementation is linux specific and not generic enough. For devdax device, it return 0 for metadata(). How about seek()? Suppose file.seek(SeekFrom::Start(mmap_size)) should work as expect and is generic enough:)

uarif1 commented 2 years ago

Hi,

Thanks for the reply!

check_file_offset is already unix specific, so i thought it calling another unix function calculate_file_size wouldn't be an issue?

For devdax devices seek isnt supported (https://elixir.bootlin.com/linux/latest/source/drivers/dax/device.c#L371) so i dont think this implementation would work.

Another issue is as it says in https://doc.rust-lang.org/stable/std/io/trait.Seek.html#tymethod.seek "A seek beyond the end of a stream is allowed, but behavior is defined by the implementation." so maybe some block devices could just wrap around and give a Result eventhough mmap_size might be bigger than seek size?

Also i guess this would make mmap much slower if we seek rather than just read from sysfs? and also would be dependent on the charachter/block device file_operations actually supporting seek.

I can move check_file_offset and calculate_file_size to mmap_unix.rs. I think that would make it better?

Thanks, Usama

jiangliu commented 2 years ago

Hi,

Thanks for the reply!

check_file_offset is already unix specific, so i thought it calling another unix function calculate_file_size wouldn't be an issue?

For devdax devices seek isnt supported (https://elixir.bootlin.com/linux/latest/source/drivers/dax/device.c#L371) so i dont think this implementation would work.

Another issue is as it says in https://doc.rust-lang.org/stable/std/io/trait.Seek.html#tymethod.seek "A seek beyond the end of a stream is allowed, but behavior is defined by the implementation." so maybe some block devices could just wrap around and give a Result eventhough mmap_size might be bigger than seek size?

Also i guess this would make mmap much slower if we seek rather than just read from sysfs? and also would be dependent on the charachter/block device file_operations actually supporting seek.

I can move check_file_offset and calculate_file_size to mmap_unix.rs. I think that would make it better?

Thanks, Usama

I means the code between line 128-line 160 are linux/block specific. Actually we have encountered similar case, but the device is a special memory device instead of block/dax device. So I hope we could handle them in a generic way. Something like seek() and read() or pread() should work.

uarif1 commented 2 years ago

Hi, Thanks for the reply! check_file_offset is already unix specific, so i thought it calling another unix function calculate_file_size wouldn't be an issue? For devdax devices seek isnt supported (https://elixir.bootlin.com/linux/latest/source/drivers/dax/device.c#L371) so i dont think this implementation would work. Another issue is as it says in https://doc.rust-lang.org/stable/std/io/trait.Seek.html#tymethod.seek "A seek beyond the end of a stream is allowed, but behavior is defined by the implementation." so maybe some block devices could just wrap around and give a Result eventhough mmap_size might be bigger than seek size? Also i guess this would make mmap much slower if we seek rather than just read from sysfs? and also would be dependent on the charachter/block device file_operations actually supporting seek. I can move check_file_offset and calculate_file_size to mmap_unix.rs. I think that would make it better? Thanks, Usama

I means the code between line 128-line 160 are linux/block specific.

Yes but this should not be an issue as this code is part of calculate_file_size which is just called by check_file_offset which is itself linux specific. check_file_offset is only called in the mmap build in mmap_unix so it should be ok being linux specific?

Actually we have encountered similar case, but the device is a special memory device instead of block/dax device. So I hope >we could handle them in a generic way. Something like seek() and read() or pread() should work.

For devdax devices seek isnt supported (https://elixir.bootlin.com/linux/latest/source/drivers/dax/device.c#L371) so this implementation would not work. Trying seek will only give a nop and that would cause the seek to always pass, no matter the size of the devdax.

Also even if seek is supported in other devices, there could be other issues as mentioned above as seek beyond the end of a stream is allowed, but behavior is defined by the implementation. Seek would probably also be slower than just reading from sysfs?

famz commented 2 years ago

Since the later mmap() will return an error anyway if we get a wrong size, maybe we can just do a best effort check here for regular files or maybe block devices only (with BLKGETSIZE64 ioctl), and ignore char devices and defer it to the actual mmap()? IIUC there isn't much user experience difference whether or not we do this size pre-check, is there?

uarif1 commented 2 years ago

If the BLKGETSIZE64 ioctl is linux specific as well and wont work in other unix OS, maybe we just do below and let mmap return an error later if the devdax/block device size is not correct as suggested by Fam?

` pub fn check_file_offset( file_offset: &FileOffset, size: usize, ) -> result::Result<(), MmapRegionError> { let file = file_offset.file(); let start = file_offset.start();

if let Some(end) = start.checked_add(size as u64) {
    if let Ok(metadata) = file.metadata() {
        if metadata.is_file() {
            if metadata.len() < end {
                return Err(MmapRegionError::MappingPastEof);
            }
        }
    }
} else {
    return Err(MmapRegionError::InvalidOffsetLength);
}

Ok(())

}

`

uarif1 commented 2 years ago

Or we could also do something like below as well, the file.seek(SeekFrom::Start((size as u64))) will always pass for devdax as it doesnt support seek, but maybe for other devices it is better. I guess this is what you meant @jiangliu ?

diff --git a/src/mmap.rs b/src/mmap.rs
index 3f83516..c342534 100644
--- a/src/mmap.rs
+++ b/src/mmap.rs
@@ -15,7 +15,9 @@
 use std::borrow::Borrow;
 use std::error;
 use std::fmt;
-use std::io::{Read, Write};
+use std::io::{Read, Seek, SeekFrom, Write};
+#[cfg(unix)]
+use std::os::unix::fs::FileTypeExt;
 use std::ops::Deref;
 use std::result;
 use std::sync::atomic::Ordering;
@@ -116,13 +118,28 @@ pub fn check_file_offset(
     file_offset: &FileOffset,
     size: usize,
 ) -> result::Result<(), MmapRegionError> {
-    let file = file_offset.file();
+    let mut file = file_offset.file();
     let start = file_offset.start();

     if let Some(end) = start.checked_add(size as u64) {
         if let Ok(metadata) = file.metadata() {
-            if metadata.len() < end {
-                return Err(MmapRegionError::MappingPastEof);
+            let file_type = metadata.file_type();
+
+            if file_type.is_file() {
+                if metadata.len() < end {
+                    return Err(MmapRegionError::MappingPastEof);
+                }
+            } else if file_type.is_block_device() || file_type.is_char_device() {
+                // if seek is not supported by the device, this will always return Ok
+                if let Err(_) = file.seek(SeekFrom::Start((size as u64))) {
+                    return Err(MmapRegionError::MappingPastEof);
+                }
+                // Rewind file to the start after seek.
+                if let Err(_) = file.rewind() {
+                    return Err(MmapRegionError::InvalidFileType);
+                }
+            } else {
+                return Err(MmapRegionError::InvalidFileType);
             }
         }
     } else {
diff --git a/src/mmap_unix.rs b/src/mmap_unix.rs
index 2f1a744..6779aaf 100644
--- a/src/mmap_unix.rs
+++ b/src/mmap_unix.rs
@@ -29,6 +29,8 @@ pub enum Error {
     InvalidOffsetLength,
     /// The specified pointer to the mapping is not page-aligned.
     InvalidPointer,
+    /// The specified file type is invalid.
+    InvalidFileType,
     /// The specified size for the region is not a multiple of the page size.
     InvalidSize,
     /// The forbidden `MAP_FIXED` flag was specified.
@@ -56,6 +58,7 @@ impl fmt::Display for Error {
                 f,
                 "The specified size for the region is not a multiple of the page size",
             ),
+            Error::InvalidFileType => write!(f, "The specified file type is invalid"),
             Error::MapFixed => write!(f, "The forbidden `MAP_FIXED` flag was specified"),
             Error::MappingOverlap => write!(
                 f,
famz commented 2 years ago

@jiangliu any suggestions? Thanks.

uarif1 commented 2 years ago

Hi,

To summarize i believe there are 2 options available:

I am happy with either of these approaches and can change the commit to it. Which one do you prefer @jiangliu ? Happy to try other ideas as well.

uarif1 commented 2 years ago

Hi, I just wanted to check if there were any opinions on the above 2 options or any other ideas? Thanks! @jiangliu @bonzini @alexandruag