Split core implementation into smaller pieces

BenBE commented 8 months ago

Looking at the current implementation of littlefs there's (apart from one function in lfs_utils.c) just one big blob of code implementing the whole filesystem (in lfs.c). While this is a valid approach, it makes the source a bit hard to work with outside an IDE.

I'd thus suggest considering a split of the library into smaller files, each aimed at a more specific part of the implementation. A possible split could put each public API function and its "raw" function into one file, with one or a few further files implementing common internals like skiplist handling, directory walking, block allocation/management and so forth.

This issue is mainly for discussion, and if there are reasons for the current layout of the code (apart from historic growth ;-)), I'd be glad to hear about them. There's also no pressing need to implement any discussed changes for the earliest upcoming release; it's more of a mid-term item on the list.

geky commented 8 months ago

I'd be curious exactly what makes the single-file approach hard to work with. I can imagine it's easy to get lost in, but I'm curious what other challenges you have run into.

The whole single-file organization wasn't really intentional, it just meant I didn't have to think too hard about organization during early development. There are a few points in the code that break through several abstraction layers because of how commits need to be atomic.

That being said, there have been a few surprising benefits to having a single file:

1. C doesn't have a concept of "modules", but you can get surprisingly close with static and everything being in a single file. Aside from the structs, very little of the internal implementation leaks out. If you can call it, it's part of the public API.
2. Having everything in a single file acts as a sort of "poor-man's" LTO. The current code size measurements don't use LTO because it doesn't add much (and breaks some scripts). Turns out link-time optimizations don't do much if you don't link.

   Because of this, breaking up littlefs into multiple files risks a code size increase. Turning on LTO would avoid this, but it's a hard ask for dependencies to always compile with LTO.
3. I've heard some users say a single file can make it easier to integrate in idiosyncratic project-specific build systems, but I don't know if this is actually the case in practice.
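
To illustrate the first two points, here is a minimal sketch of the `static`-as-module pattern. It is not littlefs code: `demo_checksum` and `demo_checksum_step` are hypothetical names invented for this example. The helper has internal linkage, so it cannot be called from another file, and because the caller sits in the same translation unit the compiler is free to inline it without LTO.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical internal helper. `static` gives it internal linkage: it is
 * not visible to other translation units, so it never becomes part of the
 * library's callable surface. */
static uint32_t demo_checksum_step(uint32_t acc, uint8_t byte) {
    /* rotate the accumulator left by one bit, then xor in the byte */
    return ((acc << 1) | (acc >> 31)) ^ byte;
}

/* Hypothetical public entry point: the only symbol this file exports.
 * The call to the helper can be inlined because both live in one file. */
uint32_t demo_checksum(const uint8_t *buf, size_t len) {
    assert(buf != NULL || len == 0);

    uint32_t acc = 0;
    for (size_t i = 0; i < len; i++) {
        acc = demo_checksum_step(acc, buf[i]);
    }
    return acc;
}
```

If the helper moved to a second `.c` file it would have to drop `static`, which both exposes the symbol and, without LTO, typically prevents the compiler from inlining it across the file boundary.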

For these reasons I haven't really considered changing the organization. Though I wonder if there are things that can be done to make it easier to navigate. Like a table of contents in comments, though such a thing would be tricky to keep up to date...

BenBE commented 8 months ago

Having quite a bit of experience working with different projects and styles of structuring code, I find nothing too surprising about having the code in just one file. With a proper IDE, navigating that ~7kLOC file is not too difficult either. Thus restructuring this is not a pressing issue (and as mentioned before, having everything in one file is a valid approach); the issue is primarily intended as a starting point for further discussion.

The reason I bring this topic up is not some optimization concern or the code being messy (I've seen far worse), but the ease of working with the code with limited tool support (I heard there are dev setups that still use plain text editors). Also, when the source is split into multiple smaller files (e.g. block layer management, CTZ handling, allocation, directory/metadata handling, …), you get focused files that each concentrate the work needed for one particular task. This also helps with some stupid compilers (avr-gcc is kinda notorious for this) that link whole object files (I know about -ffunction-sections and -fdata-sections), bloating the binary if you're not overly careful.

As you may be aware, even FatFS splits parts of its code (mostly related to charset handling) into a separate compilation unit. A similar thing could be done for littlefs by pulling most of the code out into "include files" and retaining the main C source file as the central anchor point (if you want to keep the single compilation unit). This anchor point could in turn act as a kind of ToC, using the various forward declarations placed right above the includes for all the functional units. For my own projects I often try to stay below about 1kLOC per "module" so that each file is focused on certain tasks. This helps other people trying to understand the code find their way around, even without looking at the whole implementation.

BTW: Having smaller chunks of code in separate files also helps with reviewing the code as it documents the architecture of how things relate and makes jumping between different sections a lot easier.

TL;DR: No urgent need to change anything. Just an invitation for general discussion.

littlefs-project / littlefs

Split core implementation into smaller pieces #888