littlefs-project / littlefs

A little fail-safe filesystem designed for microcontrollers
BSD 3-Clause "New" or "Revised" License
5.22k stars 800 forks source link

Notation and more detailed derivation of index and offset in DESIGN.md #745

Open fourstring opened 2 years ago

fourstring commented 2 years ago

I'm investigating the design and implementaion of littlefs. After reading DESIGN.md for several times, I'm still confused about the formula deriving index and offset from size of a CTZ skip-list. And I have not found any issues or other materials giving a more detailed derivation. Following is my thought and questions. Does I misunderstand the content of the document? Can I have some more detailed explanations? @geky

Thanks in advance!

  1. the term "index" is 0-indexed or 1-indexed?

From figures in DESIGN.md, blocks in a CTZ skip-list seem like to be numbered from 0. However, if so, the formula calculating available space for data in a block would be invalid:

$$ B-\frac{\omega}{8}(\text{ctz}(i)+1) $$

ctz(0) is an undefined value here. Besides, if we assume $\text{ctz}(0)=0$, the formula above is still invalid, because there is obviously no pointer stored in block 0, so its available space should be $B$ instead of $B-\frac{\omega}{8}$

  1. Is the first block in the CTZ skip-list counted in the size?

No matter the first block is numbered 0 or 1, it should contains no pointer and store $B$ bytes. But it seems that these bytes are not regarded as part of size in formulas given in DESIGN.md. For instance, this equation(namely equation(1))

$$ N=\sum_i^n[B-\frac{\omega}{8}(\text{ctz}(i)+1)] $$

gives the most intuitive form of relationship between number of blocks and file size. If we assume the first block is block 0, consider a CTZ skip-list of 2 blocks, there is only 1 pointer in the block 1 apparently, and the size of the list should be $B + (B - \frac{\omega}{8})$. But if we let $n=1$ in equation(1), we get(we also have to ignore undefined ctz(0)!) $B-\frac{\omega}{8}$, $B$ bytes in block 0 are gone! Otherwise, if we assume the first block is block 1, consider a list only containing 1 block, then we get size is $B-\frac{\omega}{8}$ bytes instead of $B$ bytes from equation(1).

  1. How the formula of index is derived actually?

Index can be calculated by the formula(namely equation(2)):

$$ n=\left \lfloor \frac{N-\frac{\omega}{8}(\text{popcount}(\frac{N}{B-2\frac{\omega}{8}}-1)+2)}{B-2\frac{\omega}{8}} \right \rfloor $$

As described in DESIGN.md, equation (2) is derived as follow:

But what we can do is solve for an n' index that is greater than n with error bounded by the range of the popcount function. We can repeatedly substitute n' into the original equation until the error is smaller than our integer resolution. As it turns out, we only need to perform this substitution once

I can't understand how the $n'$ is solved and how substitute $n'$ back works, could you please give a more detailed process?

  1. What the term "offset" refers to and how it's derived from equation(1)?

As stated in DESIGN.md:

This last question is how do we store CTZ skip-lists? We need a pointer to the head block, the size of the skip-list, the index of the head block, and our offset in the head block.

And offset can be calculated using equation(3):

$$ off=N-(B-2\frac{\omega}{8})n-\frac{\omega}{8}\text{popcount}(n) $$

But "offset in the head block" seems not so clear here. Does this refers to offset of data bytes in the head block? the offset of the first data byte in the whole file? or other things? It seems that the first explanation is far from the form of equation(3), because we can get offset of data bytes in the head block using something like $\frac{\omega}{8}(\text{ctz}(n)+1)$.

Nevertheless, there may be still a little distance from the second explanation to equation(3). From my understanding, the offset of the first data byte in the whole file is also summation of data bytes stored in all previous blocks, i.e.:

$$ N-off=B-\frac{\omega}{8}(\text{ctz}(n)+1) $$

$$ off=N-B+\frac{\omega}{8}(\text{ctz}(n)+1) $$

Or,

$$ off=\sum_0^{n-1}[B-\frac{\omega}{8}(\text{ctz}(i)+1)]=(B-2\frac{\omega}{8})(n-1)+\frac{\omega}{8}\text{popcount}(n-1) $$

Any my derivation above does not match equation(3).

geky commented 1 year ago

Hi @fourstring, sorry for the late response, let me see if I can offer some pointers. I'm not really a mathematician, so some of the math is a bit hand-wavey in places.

You're definitely right I should have written more down on this. I had a bit of trouble reproducing the the results myself, which isn't great. Some of this is rederived which may make it different from DESIGN.md.

  1. the term "index" is 0-indexed or 1-indexed?

The index here is 0-indexed. You're right that $\operatorname{ctz}(0)$ would make this formula undefined at $i=0$, but I ignore this for the most part since we can easily handle special cases in the code.

B - \frac{w}{8} \left(\operatorname{ctz}(i) + 1\right)

This formula works out if you define $\operatorname{ctz}(i)$ to be $-1$. Point taken this should really be in the documentation.

  1. Is the first block in the CTZ skip-list counted in the size?

Yes, if $\operatorname{ctz}(i)$ is defined as $-1$ the first block gets the full $B$ bytes. There may be a "more correct" way to define this as a piecewise equation, but this works for our purposes.

  1. How the formula of index is derived actually?

  2. What the term "offset" refers to and how it's derived from equation(1)?

It may help to frame the problem.

We're trying to map a file onto the CTZ skip-list on disk:

                    data (size=105)
                   .-----------------------------.
                   | abcdef yzabcd wxyzab uvwxyz |
                   | ghijkl efghij cdefgh abcdef |
                   | mnopqr klmnop ijklmn ghijkl |
                   | stuvwx qrstuv opqrst        |
                   '-----------------------------'

                                  |
                                  v

 block 0     block 1     block 2     block 3     block 4     block 5
.--------.  .--------.  .--------.  .--------.  .--------.  .--------.
| abcdef |<-| <ptr0> |<-| <ptr0> |<-| <ptr0> |<-| <ptr0> |<-| <ptr0> |<- ptr=block 5
| ghijkl |<-| yzabcd |--| <ptr1> |--| cdefgh |--| <ptr1> |  | abcdef |   size=105
| mnopqr |<-| efghij |--| qrstuv |--| ijklmn |--| <ptr2> |  | ghijkl |
| stuvwx |  | klmnop |  | wxyzab |  | opqrst |  | uvwxyz |  |        |
'--------'  '--------'  '--------'  '--------'  '--------'  '--------'

To keep track of the CTZ skip-list we store a pointer to the "head block", and the size of the file. If we want to read/write to a specific offset in the file, we need to transform the file-relative offset ($\mathit{off_f}$) into the on-disk block number ($i$) and block-relative offset ($\mathit{off_i}$).

I'm going to change the variables to hopefully be more consistent, sorry if this adds confusion.

Also there may be a simpler/faster way to solve this. This was just my approach at finding a solution with the knowledge I have.

                    data (size=105)       .-- off_f=84
                   .----------------------v------.
                   | abcdef yzabcd wxyzab uvwxyz |
                   | ghijkl efghij cdefgh abcdef |
                   | mnopqr klmnop ijklmn ghijkl |
                   | stuvwx qrstuv opqrst        |
                   '-----------------------------'

                                  |
                                  v

 block 0     block 1     block 2     block 3     block 4     block 5
.--------.  .--------.  .--------.  .--------.  .--------.  .--------.
| abcdef |<-| <ptr0> |<-| <ptr0> |<-| <ptr0> |<-| <ptr0> |<-| <ptr0> |<- ptr=block 5
| ghijkl |<-| yzabcd |--| <ptr1> |--| cdefgh |--| <ptr1> |  | abcdef |   size=105
| mnopqr |<-| efghij |--| qrstuv |--| ijklmn |--| <ptr2> |  | ghijkl |
| stuvwx |  | klmnop |  | wxyzab |  | opqrst |  | uvwxyz |  |        |
'--------'  '--------'  '--------'  '--------'  '-^------'  '--------'
                                                  '-- i=block 4
                                                      off_i=3*(w/8)

Note that file size and file-relative offset are the same thing here.

To start solving this, we can first phrase create an equation for $\mathit{off_f}$ given $i$. This is the sum of the available space in the blocks minus the pointer overhead:

\mathit{off_f} = \sum_j^i\left[B - \frac{w}{8}\left(\operatorname{ctz}(j)+1\right)\right]

We can move some of the constants out of the summation:

\mathit{off_f} = Bi - \frac{w}{8}\sum_j^i\left(\operatorname{ctz}(j)+1\right)

And here we can substitute in our $\operatorname{popcount}$ property:

\sum_j^i\left(\operatorname{ctz}(j)+1\right) = 2i-\operatorname{popcount}(i)

So:

\mathit{off_f} = Bi - \frac{w}{8}\left(2i - \operatorname{popcount}(i)\right)

But our goal is to solve for $i$. We can rearrange the terms:

\mathit{off_f} = Bi - 2\frac{w}{8}i - \frac{w}{8}\operatorname{popcount}(i)
Bi - 2\frac{w}{8}i = \mathit{off_f} - \frac{w}{8}\operatorname{popcount}(i)
i = \left\lfloor\frac{\mathit{off_f} - \frac{w}{8}\operatorname{popcount}(i)}{B-2\frac{w}{8}}\right\rfloor

But $\operatorname{popcount}$ gets in the way here and is a big pain.

One interesting thing to note is that the output of $\operatorname{popcount}(i)$ is bounded to $\le \operatorname{nlog}_2(n)$. So in theory, if we substituted this equation for $i$ into itself, it should eventually converge:

i = \left\lfloor\cfrac{\mathit{off_f} - \frac{w}{8}\operatorname{popcount}\left(\left\lfloor\cfrac{\mathit{off_f} - \frac{w}{8}\operatorname{popcount}\left(\left\lfloor\cfrac{\mathit{off_f} - \frac{w}{8}\operatorname{popcount}(\dots)}{B-2\frac{w}{8}}\right\rfloor\right)}{B-2\frac{w}{8}}\right\rfloor\right)}{B-2\frac{w}{8}}\right\rfloor

So we could solve for $i$ iteratively and call it a day. But we can also solve for $i$ algebraically.

First lets guess an $i'$, say we just drop the annoying $\operatorname{popcount}$ part of the equation.

Note that $i'$ is strictly $\geq i$. This will be important later:

i' = \left\lfloor\frac{\mathit{off_f}}{B-2\frac{w}{8}}\right\rfloor

The error $\epsilon_{i'}$ is of course that annoying $\operatorname{popcount}$ part:

\begin{aligned}
\epsilon_{i'} &= \left|i'-i\right| \\
&= \left|
    \left\lfloor\frac{\mathit{off_f}}{B-2\frac{w}{8}}\right\rfloor
    - \left\lfloor\frac{\mathit{off_f} - \frac{w}{8}\operatorname{popcount}(i)}{B-2\frac{w}{8}}\right\rfloor\right| \\
&= \left|\left\lfloor\
    \frac{\mathit{off_f} - \mathit{off_f} + \frac{w}{8}\operatorname{popcount}(i)}
        {B-2\frac{w}{8}}\right\rfloor\right| \\
&= \left\lfloor\
    \frac{\frac{w}{8}\operatorname{popcount}(i)}
        {B-2\frac{w}{8}}\right\rfloor
\end{aligned}

Though we can make some interesting observations about $\epsilon_{i'}$. The output of $\operatorname{popcount}(i)$ is limited to $\operatorname{nlog}_2(i-1)$, and $\frac{w}{8}\operatorname{nlog}_2(i)$ can't exceed the block size $B$, otherwise we wouldn't be able to fit all of our pointers in our blocks. So $\frac{w}{8}\operatorname{popcount}(i)$ must be $\le B$:

\epsilon_{i'} \le \left\lfloor\
    \frac{B}
        {B-2\frac{w}{8}}\right\rfloor

Assuming enough spare space in our blocks to actually store data, $B \ge 3\frac{w}{8}$, and truncating division, our error $\epsilon_{i'}$ never actually exceeds 1:

\epsilon_{i'} \le 1

This is enough to limit the correct solution to either $i'$ or $i'-1$. In our code we could simply calculate both and test which is the right answer, but we can keep going to see if we end up with an unconditional solution.

Lets go ahead and substitute $i'$ into our original equation for $i$. This gives us a new guess, $i''$:

i'' = \left\lfloor\frac{\mathit{off_f} - \frac{w}{8}\operatorname{popcount}(i')}{B-2\frac{w}{8}}\right\rfloor

What's our new error $\epsilon_{i''}$?

\begin{aligned}
\epsilon_{i''} &= \left|i''-i\right| \\
&= \left|
    \left\lfloor\frac{\mathit{off_f} - \frac{w}{8}\operatorname{popcount}(i')}{B-2\frac{w}{8}}\right\rfloor
    - \left\lfloor\frac{\mathit{off_f} - \frac{w}{8}\operatorname{popcount}(i)}{B-2\frac{w}{8}}\right\rfloor\right| \\
&= \left|\left\lfloor\
    \frac{\mathit{off_f} - \frac{w}{8}\operatorname{popcount}(i') - \mathit{off_f} + \frac{w}{8}\operatorname{popcount}(i)}
        {B-2\frac{w}{8}}\right\rfloor\right| \\
&= \left|\left\lfloor\
    \frac{\frac{w}{8}\left(\operatorname{popcount}(i)-\operatorname{popcount}(i')\right)}
        {B-2\frac{w}{8}}\right\rfloor\right|
\end{aligned}

This looks like a dead-end, but we can make some rather interesting observations about this $\operatorname{popcount}(i)-\operatorname{popcount}(i')$ expression.

This is basically asking what is the largest number of bits that could flip between $i$ and $i'$.

We know from above that the error between $i$ and $i'$, $\epsilon_{i'}$, is at most 1. If we think about when a change of 1 results in the most bit-flips, it would be when a number "rolls over" during addition:

i  = 0111...1111
i' = 1000...0000

What's really interesting about this, is that the number of bits that flip in this case is equal to $\operatorname{ctz}(i')+1$:

i  = 0111...1111
i' = 1000...0000
     ^'---+----'
     |    '-- ctz(i')
     '------- + 1

I suspect this is somehow related to the $\operatorname{popcount}$ property we've been using.

This relationship depends on which $i$ is the one "rolling over":

\left|\operatorname{popcount}(i)-\operatorname{popcount}(i')\right| \le \operatorname{ctz}\left(\operatorname{max}(i, i')\right)+1

But since we know $i'$ is strictly $\ge i$ from above, we can simplify this:

\left|\operatorname{popcount}(i)-\operatorname{popcount}(i')\right| \le \operatorname{ctz}(i')+1

Plugging this in:

\begin{aligned}
\epsilon_{i''} &\le \left|\left\lfloor\
    \frac{\frac{w}{8}\left(\operatorname{popcount}(i)-\operatorname{popcount}(i')\right)}
        {B-2\frac{w}{8}}\right\rfloor\right| \\
&\le \left\lfloor\
    \frac{\frac{w}{8}\left(\operatorname{ctz}(i')+1\right)}
        {B-2\frac{w}{8}}\right\rfloor
\end{aligned}

We've managed to remove any trace of $i$ from our new equation, which is probably a good thing. But it's still not low enough to be reliable.

Let's create a new estimate $i'''$, where we subtract the error from our estimate $i''$:

\begin{aligned}
i''' &= i'' - \epsilon_{i''} \\
&= \left\lfloor\frac{\mathit{off_f} - \frac{w}{8}\operatorname{popcount}(i')}{B-2\frac{w}{8}}\right\rfloor
    - \left\lfloor\
    \frac{\frac{w}{8}\left(\operatorname{ctz}(i')+1\right)}
        {B-2\frac{w}{8}}\right\rfloor \\
&= \left\lfloor\frac{\mathit{off_f} - \frac{w}{8}\operatorname{popcount}(i') - \frac{w}{8}\left(\operatorname{ctz}(i')+1\right)}
    {B-2\frac{w}{8}}\right\rfloor \\
&= \left\lfloor\frac{\mathit{off_f} - \frac{w}{8}\left(\operatorname{popcount}(i') + \operatorname{ctz}(i')+1\right)}
    {B-2\frac{w}{8}}\right\rfloor
\end{aligned}

So now what is our error $\epsilon_{i'''}$?

\begin{aligned}
\epsilon_{i'''} &= \left|i''' - i\right| \\
&= \left|\left\lfloor \frac{
    \mathit{off_f} - \frac{w}{8}\left(\operatorname{popcount}(i') + \operatorname{ctz}(i')+1 \right)}
    {B-2\frac{w}{8}}\right\rfloor
    - \left\lfloor \frac{
    \mathit{off_f} - \frac{w}{8}\operatorname{popcount}(i)}
    {B-2\frac{w}{8}}\right\rfloor\right|\\
&= \left|\left\lfloor \frac{
    \mathit{off_f} - \frac{w}{8}\left(\operatorname{popcount}(i') + \operatorname{ctz}(i')+1 \right)
    - \mathit{off_f} + \frac{w}{8}\operatorname{popcount}(i)}
    {B-2\frac{w}{8}}\right\rfloor\right|\\
&= \left|\left\lfloor \frac{
    \mathit{off_f} - \frac{w}{8}\operatorname{popcount}(i') - \frac{w}{8}\left(\operatorname{ctz}(i')+1\right)
    - \mathit{off_f} + \frac{w}{8}\operatorname{popcount}(i)}
    {B-2\frac{w}{8}}\right\rfloor\right|\\

&= \left|\left\lfloor \frac{
    \mathit{off_f} - \mathit{off_f} + \frac{w}{8}\operatorname{popcount}(i) - \frac{w}{8}\operatorname{popcount}(i') - \frac{w}{8}\left(\operatorname{ctz}(i')+\frac{w}{8}\right)}
    {B-2\frac{w}{8}}\right\rfloor\right|\\
&= \left|\left\lfloor \frac{\frac{w}{8}\left(\operatorname{popcount}(i) + \operatorname{popcount}(i')\right) - \frac{w}{8}\left(\operatorname{ctz}(i')+\frac{w}{8}\right)}
    {B-2\frac{w}{8}}\right\rfloor\right|
\end{aligned}

Oh hey, there's our $\operatorname{popcount}(i)-\operatorname{popcount}(i')$ expression again. Let's substitute in $\operatorname{ctz}(i')+1$:

\begin{aligned}
\epsilon_{i'''} &= \left|\left\lfloor \frac{\frac{w}{8}\left(\operatorname{popcount}(i) + \operatorname{popcount}(i')\right) - \frac{w}{8}\left(\operatorname{ctz}(i')+1\right)}
    {B-2\frac{w}{8}}\right\rfloor\right| \\
&\le \left|\left\lfloor \frac{\frac{w}{8}\left(\operatorname{ctz}(i')+1\right) - \frac{w}{8}\left(\operatorname{ctz}(i')+1\right)}
    {B-2\frac{w}{8}}\right\rfloor\right| \\
&\le \left|\left\lfloor \frac{0}
    {B-2\frac{w}{8}}\right\rfloor\right| \\
&\le 0
\end{aligned}

And now our error $\epsilon_{i'''}$ is zero, which means $i''' = i$ if all of our above assumptions hold. This gives us our final equation:

i = \left\lfloor\frac{\mathit{off_f} - \frac{w}{8}\left(\operatorname{popcount}(i') + \operatorname{ctz}(i')+1\right)}
    {B-2\frac{w}{8}}\right\rfloor

But there's a few more tweaks. See, both $\operatorname{popcount}$ and $\operatorname{ctz}$ can be hardware instructions, but if they aren't available they need to be emulated with $O(w)$ loops, which isn't great. If we could get rid of one of them that would be nice.

Let's take a look at the $\operatorname{popcount}$ part:

\operatorname{popcount}(i') + \operatorname{ctz}(i')+1

We can shove in our $\operatorname{popcount}$ property here, albiet a bit ungracefully:

\operatorname{popcount}(i') + \operatorname{ctz}(i')+1
-\left(2i'-\operatorname{popcount}(i')-2i'\right) + \operatorname{ctz}(i')+1
-\left(\sum_j^{i'}\left[\operatorname{ctz}(j)+1\right]-2i'\right) + \operatorname{ctz}(i')+1
-\left(\sum_j^{i'-1}\left[\operatorname{ctz}(j)+1\right]+\operatorname{ctz}(i')+1-2i'\right) + \operatorname{ctz}(i')+1
-\left(2(i'-1) - \operatorname{popcount}(i'-1)+\operatorname{ctz}(i')+1-2i'\right) + \operatorname{ctz}(i')+1
-\left(2i'-2 - \operatorname{popcount}(i'-1)+\operatorname{ctz}(i')+1-2i'\right) + \operatorname{ctz}(i')+1
-\left(2i'-2i' - \operatorname{popcount}(i'-1)+\operatorname{ctz}(i')+1-2\right) + \operatorname{ctz}(i')+1
-\left(-\operatorname{popcount}(i'-1)+\operatorname{ctz}(i')-1\right) + \operatorname{ctz}(i')+1
\operatorname{popcount}(i'-1)-\operatorname{ctz}(i')+1 + \operatorname{ctz}(i')+1
\operatorname{popcount}(i'-1)+\operatorname{ctz}(i')-\operatorname{ctz}(i')+1+1
\operatorname{popcount}(i'-1)+2

Plugging this back in to our result gives us the equation used in the code:

i = \left\lfloor\frac{\mathit{off_f} - \frac{w}{8}\left(\operatorname{popcount}(i'-1) + 2\right)}
    {B-2\frac{w}{8}}\right\rfloor

Oh I almost forgot to put $i'$ back in there:

i = \left\lfloor\frac{\mathit{off_f} - \frac{w}{8}\left(\operatorname{popcount}\left(\frac{\mathit{off_f}}{B-2\frac{w}{8}}-1\right) + 2\right)}
    {B-2\frac{w}{8}}\right\rfloor

Once we have $i$, solving for $\mathit{off_i}$ is relatively straightforward. Our solution for $i$ intentionally truncates to the nearest block boundary. So our relative offset in the block is the difference between our original $\mathit{off_f}$ and the truncated offset, $\mathit{off_f}'$, calculated from $i$:

\begin{aligned}
\mathit{off_i} &= \mathit{off_f} - \mathit{off_f}' \\
&= \mathit{off_f} - \left(Bi-\frac{w}{8}\left(2i-\operatorname{popcount}(i)\right)\right) \\
&= \mathit{off_f} - Bi + \frac{w}{8}\left(2i-\operatorname{popcount}(i)\right) \\
&= \mathit{off_f} - Bi + \frac{w}{8}2i - \frac{w}{8}\operatorname{popcount}(i) \\
\end{aligned}

We just have one abnormal thing to watch out for. Since we're a filesystem, these numbers can be quite large. $i$ can be near the upper bound of a 32-bit integer, and in littlefs block addresses and offsets are managed as separate integers to allow the address limit to generall cover a larger range even on 32-bit devices.

Long story short, that $Bi$ is a bit of a problem since it can easily overflow our off/size types. Fortunately we can rearrange the equation a bit to avoid this:

\begin{aligned}
\mathit{off_i} &= \mathit{off_f} - Bi + \frac{w}{8}2i - \frac{w}{8}\operatorname{popcount}(i) \\
&= \mathit{off_f} - \left(B - 2\frac{w}{8}\right)i - \frac{w}{8}\operatorname{popcount}(i)
\end{aligned}

After all that we get our two equations for $i$ and $\mathit{off_i}$:

i = \left\lfloor\frac{\mathit{off_f} - \frac{w}{8}\left(\operatorname{popcount}\left(\frac{\mathit{off_f}}{B-2\frac{w}{8}}-1\right) + 2\right)}
    {B-2\frac{w}{8}}\right\rfloor
\mathit{off_i} = \mathit{off_f} - \left(B - 2\frac{w}{8}\right)i - \frac{w}{8}\operatorname{popcount}(i)

This comment turned into a bit of a long one, but hopefully it helps explain the thought process.

Do ask if something looks wrong or could be simplified. I'll probably add this to the DESIGN.md next chance I get.