relan / exfat

Free exFAT file system implementation
GNU General Public License v2.0

Support fast space allocation? #45

Open Lekensteyn opened 8 years ago

Lekensteyn commented 8 years ago

I have recently bought an IODD-2541 USB disk enclosure which should allow me to expose files as virtual disk devices. It supports FAT32/exFAT and NTFS, but since NTFS seems like overkill and is less well supported on Linux, I decided to use exFAT.

Due to the design, files must be contiguously allocated. When I ran truncate -s 20G disk.vhd, it took a while before all bytes were written to the SSD. This can probably be explained by the requirement to initialize the extended part with zeroes (\0).

Proposal: a method to request allocation of some sectors without writing zeroes to new blocks.

I am fully aware of the security issues of exposing possibly deleted data, but am willing to risk that for creating "empty" disk images. The underlying SSD does not hold confidential data, and the test images are just that: test images.

Possible mechanism 1: The Linux-specific fallocate(2) function combined with the FALLOC_FL_NO_HIDE_STALE mode could probably be used here. Originally proposed in April 2012 (see https://lwn.net/Articles/492959/, https://lwn.net/Articles/492920/). Apparently still in use via out-of-tree patches in production according to the ext4 maintainer, writing in September 2015:

However, this patch is in active use in practically every single data center kernel for Google, and it's in use in at least one other very large publicly traded company that uses cluster file systems such as Hadoopfs. And if someone wants a copy of the FALLOC_FL_NO_HIDE_STALE patch for ext4, I'm happy to give it to them.

Unfortunately the FUSE layer rejects such flags, so more work would be needed:

static long fuse_file_fallocate(struct file *file, int mode, loff_t offset,
                loff_t length)
{
    ...
    if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
        return -EOPNOTSUPP;

Possible mechanism 2: Introduce an ioctl that could preallocate some blocks (restricting it to callers with CAP_SYS_RAWIO).

Possible mechanism 3: Introduce a separate utility that can allocate such a file on an offline (unmounted) image.

Hopefully I have shown enough research and made the intent clear. Until then I have to waste some SSD write cycles and wait somewhat longer.

relan commented 8 years ago

Proposal: a method to request allocation of some sectors without writing zeroes to new blocks.

Well, it looks like the infrastructure (kernel, FUSE and programs) just isn't ready for such a feature.

Possible mechanism 4: Implement the FUSE 2.9.1 fallocate operation and add an optional mount parameter that disables data zeroing for this particular call. Some programs may break, though.

Possible mechanism 5: Zero data using discard (trim) command when it's supported by the device (and sets flash memory blocks to 0). Hopefully it will be faster.

Until then I have to waste some SSD write cycles and wait somewhat longer.

If the SSD controller supports data compression, it will hardly write anything to flash in this case.

Lekensteyn commented 8 years ago

Option (4) would probably be the easiest to add and use, but it is a violation of the interface and may break programs.

Option (5) is better than nothing, but I am not sure whether it works on the USB disk enclosure. SSDs with data compression solve the problem at the wrong layer; I believe only (some?) SandForce SSDs do this hack.

moneytoo commented 7 years ago

I intended to run a Samba share off an exFAT-formatted external drive, but this issue is what prevents me from doing that.

After I connect to the Samba share from a Windows client and attempt to copy a large file there (onto the exFAT drive), it starts by truncating the file first. Because the truncate is slow (it's via USB 2), the Windows client times out (and reports an error) if the actual transfer doesn't start within 20 seconds. Transferring the file from macOS is OK. A Windows-hosted share on exFAT is also working fine, which I don't get: is it an SMB version thing or the truncate implementation in Windows?

https://bugzilla.samba.org/show_bug.cgi?id=3583 http://www.gossamer-threads.com/lists/linux/kernel/683607

relan commented 7 years ago

is it an SMB version thing or the truncate implementation in Windows?

Can be both.

A possible solution that comes to my mind:

  1. On truncate, set size to the desired value and leave valid_size intact. Do not initialize the allocated blocks.
  2. On a write beyond valid_size, adjust it accordingly. Initialize the blocks between the old and new valid_size if they were not overwritten.
  3. On read, return zeros for blocks beyond valid_size.

The FS should remain in a consistent state throughout all those operations while avoiding extra initialization of blocks that will soon be overwritten. But this would be quite a complex change.

JsBergbau commented 6 years ago

Is there anything new on this topic? The problem still exists and makes exFAT unusable as an underlying file system.

JsBergbau commented 5 years ago

bump... Problem still exists :(

josephernest commented 3 years ago

Same problem here!

After I connect to the Samba share from a Windows client and attempt to copy a large file there (onto the exFAT drive), it starts by truncating the file first. Because the truncate is slow (it's via USB 2), the Windows client times out (and reports an error) if the actual transfer doesn't start within 20 seconds.

Tested with both exfat-fuse and Linux Kernel 5.4's exfat (non-fuse).

So when sending 2 GB from Windows Explorer to a Linux+Samba+exFAT computer, 4 GB are actually written.

It doubles the transfer time.

Does anyone have an idea?