littlefs-project / littlefs

A little fail-safe filesystem designed for microcontrollers

Overhead calculator #95

Open X-Ryl669 opened 6 years ago

X-Ryl669 commented 6 years ago

Can you provide a formula to figure out the overhead used by the filesystem compared to the data stored?

Something that takes page size and block size as input, and gives something like Y * (number of files of size XXX)?

Thanks

geky commented 6 years ago

Hi @X-Ryl669, sorry for the late response, it's taken me a bit to catch up on open issues.

These are the current minimums:

- superblock: 2 blocks
- each directory: 2 blocks
- each file: 1 block

So if you had 10 files stored in root, that would take: 2 for the superblock + 2 for directories (root is a directory) + 10 for files = 14 blocks at minimum.

This is useful for most uses of embedded filesystems when files are smaller than the block size.

The full formula that handles when directory blocks/file blocks overflow is a bit more complicated. The full details can be found in the SPEC.md. But if you want a direct formula it would be something like this:

$$\mathtt{total\_blocks} = 2 + \sum{\mathtt{dir\_blocks}} + \sum{\left\lceil\frac{\mathtt{file\_size}}{\mathtt{block\_size}-8}\right\rceil}$$

where the 2 is the superblock and each directory needs at least 2 blocks, growing as its metadata overflows.

Currently this isn't great when there are very few blocks (for example internal flash), but it works ok for most forms of external flash. Most notably, small files always take up a full block at minimum.

As a sidenote: I'm currently working on this as a part of the 2.0 work which will get rid of the 2-block superblock requirement and allow small files to be inlined in their parent's directory blocks: https://github.com/ARMmbed/littlefs/pull/85

X-Ryl669 commented 6 years ago

Thanks a lot!

perigoso commented 2 years ago

@geky for the v2 could you show what formulas one could use to calculate overhead?

ExtremeGTX commented 1 year ago

@geky is it still the same overhead for v2.x?

geky commented 1 year ago

Hi @ExtremeGTX, @perigoso, sorry about missing this earlier.

This has changed in v2; it's improved in some ways but has also become a bit more complicated:

The file data structure is the same, so if $\mathtt{file\_size} > \mathtt{cache\_size}$:

$$\mathtt{file\_blocks} = \left\lceil\frac{\mathtt{file\_size}}{\mathtt{block\_size}-8}\right\rceil$$
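For example (my numbers, just plugging into the formula): a 1 MiB file with a 4096-byte block size needs

$$\left\lceil\frac{1048576}{4096-8}\right\rceil = 257 \text{ blocks}$$

one more than the 256 blocks the raw data alone would occupy, or roughly 0.4% overhead.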

Metadata gets a bit more complicated. Each directory still needs at minimum one pair of metadata blocks, and each piece of metadata has a 4x overhead (2x for block pairs, 2x to avoid exponential runoff):

$$\mathtt{dir\_blocks} = 2 \times \left\lceil\frac{2 \times \left(\mathtt{dir\_metadata} + \sum{\mathtt{file\_metadata}}\right)}{\mathtt{block\_size}}\right\rceil$$

$$\mathtt{total\_blocks} = \sum{\mathtt{dir\_blocks}} + \sum{\mathtt{file\_blocks}}$$

Where $\mathtt{dir\_metadata}$ includes some bookkeeping things: a 4-byte revision count + ~40 bytes according to this comment. Though this is subject to change in updates.

The $\mathtt{file\_metadata}$ includes the filename, inline data or an 8-byte pointer to the file data structure, and any custom attributes. Each of these pieces includes a 4-byte tag. So filename + inline data + 3 attributes would need an extra 5×4 bytes to store everything.

So, at the time of writing, this is roughly:

$$\mathtt{dir\_metadata} \approx 44$$

$$\begin{aligned}
\mathtt{file\_metadata} &\approx 4 + \mathtt{name\_size} \\
& + 4 + \begin{cases}
    8 & \mathtt{file\_size} > \mathtt{cache\_size} \\
    \mathtt{file\_size} & \mathtt{file\_size} \le \mathtt{cache\_size}
\end{cases} \\
& + \sum{\left(4 + \mathtt{uattr\_size}\right)}
\end{aligned}$$
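Plugging in some example numbers (mine, not from the thread above): a single directory holding 10 non-inlined files, each with an 8-byte name and no custom attributes, on a 4096-byte block size, gives $\mathtt{file\_metadata} \approx 4 + 8 + 4 + 8 = 24$ bytes per file and

$$\mathtt{dir\_blocks} = 2 \times \left\lceil\frac{2 \times (44 + 10 \times 24)}{4096}\right\rceil = 2 \times \left\lceil\frac{568}{4096}\right\rceil = 2$$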

Some other things to note:

  1. Files are copy-on-write. This means when you write to a file, the old contents remain reserved until you call lfs_file_sync. So you effectively need to pay ~2x the cost for open files!

  2. If dynamic wear-leveling is enabled: Directories are "copy-on-bounded-write", which means worst case you may have a copy-on-write copy of the directory tree. So worst case ~2x the path from the root to your file. This only occurs during metadata updates, and is fixed immediately, but can be a problem for a near-full filesystem.

  3. If dynamic wear-leveling is enabled: Superblocks are inserted into the filesystem if too many writes propagate to the root. This becomes exponentially unlikely as the filesystem grows, so it's unlikely for there to ever be more than one superblock (2 blocks). littlefs avoids inserting a superblock if the filesystem is >1/2 full, but this can still be a problem in a write-heavy application since inserted superblocks are never reclaimed.

For these reasons I would suggest some amount of extra storage to avoid near-full issues, something in the range of 1.5x-2x. These extra blocks will also contribute to dynamic wear-leveling, extending the lifetime of the device, so they're not really wasted in a sense.
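If it helps to see the above in one place, here's a rough back-of-the-envelope sketch in C (not part of the littlefs API, just the formulas above; it assumes a single directory of non-inlined files with no custom attributes):

```c
#include <stdint.h>

// Rough estimate of littlefs v2 block usage for one directory of
// non-inlined files with no custom attributes, using the formulas above.
// Real usage is also affected by copy-on-write copies, wear-leveling,
// and inserted superblocks (see notes 1-3).
static uint32_t lfs_estimate_blocks(uint32_t block_size,
        const uint32_t *file_sizes, const uint32_t *name_sizes,
        uint32_t file_count) {
    // dir_metadata: ~44 bytes of directory bookkeeping
    uint32_t metadata = 44;
    uint32_t file_blocks = 0;

    for (uint32_t i = 0; i < file_count; i++) {
        // file_metadata: 4-byte tag + name, 4-byte tag + 8-byte CTZ pointer
        metadata += 4 + name_sizes[i] + 4 + 8;
        // file_blocks: ceil(file_size / (block_size-8))
        file_blocks += (file_sizes[i] + (block_size-8) - 1) / (block_size-8);
    }

    // dir_blocks: 2 x ceil(2*metadata / block_size)
    uint32_t dir_blocks = 2 * ((2*metadata + block_size - 1) / block_size);

    return dir_blocks + file_blocks;
}
```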

Hopefully this info helps.

kyrreaa commented 12 months ago

What about the variable skip-list section of each block? It varies with the block number and basically grows with the size of the file?

geky commented 12 months ago

@kyrreaa, great question! It's super unintuitive, but because the variable pointers form a perfectly balanced binary tree, the overhead on average never exceeds $2w$ per block.

Since our word size $w$ is 32-bits, or 4 bytes, this is where the 8 comes from in the $\mathtt{file\_blocks}$ equation:

$$\mathtt{file\_blocks} = \left\lceil\frac{\mathtt{file\_size}}{\mathtt{block\_size}-8}\right\rceil$$

We can work through a couple examples to see this is true:

   0        1        2        3        4        5        6        7        8
.-----.  .-----.  .-----.  .-----.  .-----.  .-----.  .-----.  .-----.  .-----.
|     |<---ptr |<---ptr |<---ptr |<---ptr |<---ptr |<---ptr |<---ptr |<---ptr |
|     |<-|     |----ptr |<-|     |----ptr |<-|     |----ptr |<-|     |----ptr |
|     |<-|     |--|     |--|     |----ptr |<-|     |--|     |--|     |----ptr |
|     |<-|     |--|     |--|     |--|     |--|     |--|     |--|     |----ptr |
'-----'  '-----'  '-----'  '-----'  '-----'  '-----'  '-----'  '-----'  '-----'
 0 ptrs   1 ptrs   2 ptrs   1 ptrs   3 ptrs   1 ptrs   2 ptrs   1 ptrs   4 ptrs
'-----------.-----------'
     3 ptrs in 3 blocks
          3 < 6 (3*2)
'--------------------.--------------------'
              7 ptrs in 5 blocks
                   7 < 10 (5*2)
'--------------------------------------.--------------------------------------'
                               15 ptrs in 9 blocks
                                    15 < 18 (9*2)

One way to prove this is to look at each row of pointers. Ignoring the first block, the first row has $n$ pointers, the second has $\frac{n}{2}$ pointers, the third has $\frac{n}{4}$, then $\frac{n}{8}$ etc:

$$n + \frac{n}{2} + \frac{n}{4} + \frac{n}{8} + \cdots$$

This gives us a geometric series that converges to $2n$, or an average of $2$ pointers per block:

$$n\cdot\sum_{i=0}^{\infty}{\left(\frac{1}{2}\right)^i} = 2n$$
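(Checking against the diagram above: with 9 blocks the rows contribute 8 + 4 + 2 + 1 = 15 pointers, which is indeed under 2 × 9 = 18.)
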
kyrreaa commented 12 months ago

Yeah, I figured out the number of pointers from the block number: count the trailing zeroes of the block index and add one. Since it's a bit convoluted, I guess the cost of an "actual size on disk" stat would be too high.

The reason I ask is that I spent a few days hunting down why my superblocks disappeared. The problem went away when I started using a properly sized lookahead, after finding your comment on a previous issue that a lookahead much bigger than the number of blocks in the fs may break something. Seems you were right. (I was writing 20-50 MB files until I got a no-more-space error, but not all files showed up when listed. Upon reboot the filesystem was unmountable and the superblocks were gone. My hope was to see if stopping before it was actually full would help.)

shahawi-sumup commented 7 months ago

Hi @geky, I want to understand the following case please: I have a filesystem (LFS v2.4) used only for logs, which will store only 2 files.

Filesystem size = 65536 bytes
Block size = 4096 bytes
Total blocks = 16

Available_space = 64K - 2*4K (overhead(1x SuperBlock + 1x RootDir)) = 56K
Max_Size_per_log_file = 56K / 2 = 28672 bytes
Available_size_per_file = 28672 - 1*4KB (overhead (file_metadata)) = 24576
LogFileName = log.0000 (8 bytes)

But this calculation gives me an error (no space left on the filesystem) unless I subtract 28 more bytes from the file size, i.e. (24576 - 28) = 24548 bytes.

I assume these 28 bytes are (8 bytes (filename) + 0 bytes (inline data) + 5x4 bytes (attributes)), but shouldn't this already be part of the 1 block subtracted above for (file_metadata)?

Thank you.

geky commented 7 months ago

Hi @shahawi-sumup, those 28 bytes come from the CTZ skip-list overhead:

$$\mathtt{file\_blocks} = \left\lceil\frac{\mathtt{file\_size}}{\mathtt{block\_size}-8}\right\rceil$$

littlefs carves out 8 bytes for each block for its own bookkeeping (the details are a bit more complicated, see above).
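To make that concrete (just applying the formula above): with a 4096-byte block size, each block holds at most 4096 - 8 = 4088 bytes of file data, so 6 blocks hold at most 6 × 4088 = 24528 bytes, and a 24576-byte file already spills into a 7th block.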

That being said, littlefs doesn't reserve a block for each file's metadata; the file's metadata is stored inside the parent directory's blocks. So you shouldn't need the - 1*4KB (overhead (file_metadata)) term.

Two things come to mind:

  1. littlefs is trying to create copies of the metadata blocks to wear-level.

    Consider setting block_cycles=-1 to disable block-level wear-leveling (see the config sketch below). At 16 blocks wear-leveling may not get you much anyway. littlefs still wear-levels metadata blocks internally, and data blocks are still dynamically wear-leveled via copy-on-write.

  2. The extra blocks are needed for copies of the file's tail block when appending.

    It's rare for files to be perfectly block-size aligned, so sometimes littlefs needs to create a copy of the last block in a file in order to perform a write.

In general I would avoid trying to fill up littlefs completely. The copy-on-write system is kind of complex and can result in ENOSPC in strange situations like what you're seeing. I would save at least ~10% of storage for copy-on-write operations. Though in your case, at 16 blocks, saving 2 blocks (or maybe even just 1 if block_cycles=-1?) is probably sufficient.
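
For reference, here's a minimal config sketch along those lines. It assumes a 64KiB flash with 4KiB erase blocks; the flash_read/flash_prog/flash_erase/flash_sync callbacks and the read/prog/cache/lookahead sizes are placeholders for whatever your flash driver actually needs:

```c
#include "lfs.h"

// hypothetical flash driver callbacks, provided elsewhere
extern int flash_read(const struct lfs_config *c, lfs_block_t block,
        lfs_off_t off, void *buffer, lfs_size_t size);
extern int flash_prog(const struct lfs_config *c, lfs_block_t block,
        lfs_off_t off, const void *buffer, lfs_size_t size);
extern int flash_erase(const struct lfs_config *c, lfs_block_t block);
extern int flash_sync(const struct lfs_config *c);

const struct lfs_config cfg = {
    .read  = flash_read,
    .prog  = flash_prog,
    .erase = flash_erase,
    .sync  = flash_sync,

    .read_size      = 16,
    .prog_size      = 16,
    .block_size     = 4096,
    .block_count    = 16,
    .cache_size     = 256,
    .lookahead_size = 16,

    // disable block-level wear-leveling; with only 16 blocks the extra
    // metadata copies likely cost more than the leveling gains
    .block_cycles   = -1,
};
```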

geky commented 5 months ago

This is being referenced externally, and a bit incorrectly (unfortunately the exact constraints are complicated), so I just wanted to clarify:

littlefs can fit in 2 blocks, as of v2.

Iff all files are "inlineable", which requires:

  1. they fit in RAM (cache_size specifically)
  2. they fit in 1/4 block_size

Inline files are more expensive, using ~4x the storage, but this is better than storing small files in full blocks.

If you have a small amount of storage, say, 4 blocks, you may also want to consider using a larger logical block size, e.g. block_size = 2 × the physical block size, to allow for larger inline files and better storage utilization.
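
As a rough worked example of the constraints above (my numbers): with block_size = 4096 and cache_size = 1024, files can only be inlined up to min(cache_size, block_size/4) = 1024 bytes; doubling the logical block size to 8192 (and cache_size to 2048) raises that limit to 2048 bytes.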