slimsag opened 1 year ago
There are a few layers of a game dev oriented filesystem that I'm interested in exploring. This issue addresses one of the foundational layers, an archive file format, but in the medium/long term here's what I'm hoping to go for. Feedback is welcome.
```zig
server.openStream(path: []const u8) io.StreamSource
```
and would report whether such a path existed in the server. There could be more than one type of asset server depending on how needs develop.
```zig
fs.addAssetServer(WebAssetServer.init("https://supercoolfallbackgameassetserver.net/whatishappening.php.bf"));
fs.addAssetServer(LocalDirectoryAssetServer.init(fs.getExecutableBaseDirectory(allocator)));
fs.addAssetServer(ArchiveAssetServer.init("path/to/base_game_assets.pck"));
fs.addAssetServer(ArchiveAssetServer.init("path/to/mod_that_overwrites_base_game_assets.pck"));
```
When someone requests a file from the filesystem, it would poll each server and retrieve the first matching stream it can find. Asset servers here function like mounting an archive in PhysFS, but more generalized.
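A minimal sketch of what that lookup could look like. Everything here (`AssetServer`, the function-pointer interface, the iteration order) is illustrative and not an existing Mach API:

```zig
const std = @import("std");

// Hypothetical interface: each asset server either yields a stream
// for the requested path, or null if it doesn't have that path.
const AssetServer = struct {
    openStreamFn: *const fn (self: *AssetServer, path: []const u8) ?std.io.StreamSource,

    fn openStream(self: *AssetServer, path: []const u8) ?std.io.StreamSource {
        return self.openStreamFn(self, path);
    }
};

const FileSystem = struct {
    servers: std.ArrayList(*AssetServer),

    // Polls servers most-recently-added first (an assumption), so a mod
    // archive mounted after the base archive transparently overrides it.
    fn openStream(self: *FileSystem, path: []const u8) ?std.io.StreamSource {
        var i = self.servers.items.len;
        while (i > 0) {
            i -= 1;
            if (self.servers.items[i].openStream(path)) |stream| return stream;
        }
        return null;
    }
};
```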
As far as the archive format itself goes, I've currently prototyped a writer (but no reader yet) for a simple archive format I'm calling `mach-pck`, since `pck` is commonly used as a generic archive extension. I'd appreciate better/more specific name suggestions.
The format is just a header followed by a list of data blocks, each of which reports its name, checksum, compression mode, and the byte range in the file where its data can be found. The writer currently concatenates all file data into one big blob at the end of the file, but since each file block just reports the range where its data lives, it's also possible to interlace file data with descriptors, which would make archiving files sequentially easier and possibly use less memory.
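As a rough sketch, the per-file descriptor might translate to something like the struct below. The field names and widths are guesses based on "name, checksum, compression mode, and the byte range", not the actual mach-pck layout:

```zig
// Hypothetical descriptor for one file entry in the archive.
// Not the real mach-pck layout; integers assumed little-endian on disk.
const FileBlock = struct {
    name_len: u16, // length of the UTF-8 name that follows on disk
    // name: [name_len]u8 bytes follow here in the serialized form
    checksum: u32, // e.g. CRC32 of the uncompressed data
    compression: u8, // enum(u8) tag selecting the decompression method
    offset: u64, // byte offset of this file's data within the archive
    len: u64, // byte length of the (possibly compressed) data
};
```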
I'm currently working on a reader for the format and a CLI tool to pack files into it. For most compression methods, file data will need to be compressed before it's written into the archive, but I'll just use the stdlib's decompression routines when reading.
@spindlebink in general this sounds like a good direction to me, and the file format sounds correct as well - but the devil is in the details. I'd encourage sending these changes one at a time very incrementally to the main repo so we can start to integrate them and make sure we see eye to eye as we go, so to speak. I also imagine the CLI can be part of the mach editor CLI here: https://github.com/hexops/mach/tree/main/src/editor
Does that sound like a good starting point?
Some other thoughts:
So I think the ideal end state for files stored in a `.pck` file would be:
The only question would be whether we employ some compression for the 'header' of the file with all the metadata. I think probably this is a good reason to keep the metadata at the start of the file, and not scattered, as it keeps this option open.
The tag for the decompression method is per file block, just an `enum(u8)`, which means adding decompression methods for specific endpoints only requires implementing it in the reader and adding an enum member. Designing a file import workflow is important, so I'll open a new issue for that. I'm trying to keep the archive format generalized, since design work there will reflect on design work here.
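For example, the tag might be declared as a non-exhaustive enum so a reader can report an unknown method instead of misbehaving. The member names here are purely illustrative:

```zig
// Hypothetical compression tag: one byte per file block.
const Compression = enum(u8) {
    none = 0,
    deflate = 1,
    zstd = 2,
    _, // non-exhaustive: tags from a newer writer can be detected and rejected
};
```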
> whether we employ some compression for the header of the file
Right now, the header contains no structural information about the file other than the total length and the interlacing mode, which is necessary for the reader to know where to start reading the next block [*]. Directly following the header are individual data blocks prefixed by a block type (`enum(u8)` again), which is where the archive stores file info.
```
header: signature version block_mode body_len
file block: filename range [...]
file block: filename range [...]
file block: filename range [...]
big binary blob: ............
```
It's easy to add a `CompressedBlockRange` block indicating a range to decompress and read as a list of blocks:
```
header: signature version block_mode body_len
compressed block range: compression_mode range
  (when uncompressed)
  file block: filename range [...]
  file block: filename range [...]
  file block: filename range [...]
big binary blob: ............
```
Then when the reader encounters a compressed block range block type, it slices out the range indicated, decompresses it, and uses the same archive reading routines to parse the blocks inside it.
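That recursive step might be sketched like this. `Compression`, `decompress`, and `readBlocks` are stand-ins for the format's real tag type, a stdlib-backed decompression helper, and the top-level block parser; none of these names come from the actual codebase:

```zig
const std = @import("std");

// Hypothetical handler for a compressed-block-range block.
fn readCompressedBlockRange(
    allocator: std.mem.Allocator,
    archive: []const u8,
    offset: usize,
    len: usize,
    mode: Compression,
) !void {
    // Slice out the range the block points at...
    const compressed = archive[offset .. offset + len];
    // ...decompress it with the method named by the block's tag...
    const bytes = try decompress(allocator, compressed, mode);
    defer allocator.free(bytes);
    // ...and feed the result back into the same block parser,
    // so nested block ranges come for free.
    try readBlocks(allocator, bytes);
}
```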
A concern: files will need some information provided per file when packing them (i.e. compression mode, file name relative to the archive if it needs to be different from cwd, more if the import pipeline needs it). Specifying that information over the CLI every time the user builds an archive means that the CLI will become unwieldy for anything of scale.
It could be cool to include archive manifest information in the `build.zig` as part of the build step, but A) this possibly implies repacking every archive every build and B) this makes it more of a pain to build archives outside of the game source tree (e.g. for asset mods and more generally for keeping the artist workflow distinct from the code workflow).
My proposal instead is that the archive CLI rely on a manifest file which is passed to the CLI when packing. I don't currently see a way to parse `.zon` via the Zig standard library, so in the meantime I'll use INI or JSON or something.
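As a purely hypothetical illustration of such a manifest, every key name below is invented, not part of any proposed spec:

```json
{
  "output": "base_game_assets.pck",
  "files": [
    { "path": "textures/player.png", "compression": "zstd" },
    { "path": "audio/theme.ogg", "name": "music/theme.ogg", "compression": "none" }
  ]
}
```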
[*] It occurs to me that including the offset of the next block in each block would do away with needing to differentiate between interlaced and non-interlaced modes entirely and also help with both version incompatibility warnings and validation. There might be some issues there I'm not thinking of, though, so I'll think on it.
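The footnote's idea would amount to each block carrying a small self-describing prefix, something like (illustrative only):

```zig
// Hypothetical per-block prefix. A reader that doesn't understand
// block_type can still skip ahead to next_block_offset, which helps
// with forward compatibility, version warnings, and validation.
const BlockHeader = struct {
    block_type: u8,
    next_block_offset: u64, // absolute file offset of the following block
};
```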
> Right now, the header contains no structural information about the file other than the total length and the interlacing mode [...] A concern: files in this archive format need some per-file information attached to them (i.e. compression mode, file name relative to the archive if it needs to be different from cwd, more if the import pipeline needs it).

It is this information that I'd expect to be in the header of the file, and which I'd like to see compressed.

> Specifying that information means a purely CLI approach to archive management would be unwieldy to use for anything of size.
I don't think so; I would imagine the CLI could expose unix-like file commands, e.g. `ls`, `mv`, `cp`, `touch`, `stat`, which would decompress+read the header and perform operations, updating/writing to the pack file if needed.
> My proposal instead is that the archive CLI rely on a manifest file which is passed to the CLI when packing. I don't currently see a way to parse .zon via the Zig standard library, so in the meantime I'll use INI or JSON or something.
The manifest being a separate file (or perhaps just a chunk at the end of the file) is interesting.
It might be worth thinking about this as two concepts: one blob of data that is just ranges of file bytes, one blob of data that is describing everything about those ranges of bytes (file name, modtime, compression type, etc.)
I would suggest this: a single file which has a layout like:
```
binary data length (u64)
binary data (a big []const u8): zero metadata, just arbitrary bytes
  [file1]
  [file2]
  [file3]
  [...]
metadata length (u32)
metadata (whatever type is needed):
  [file1 byte range]
  [file1 compression type]
  [file1 name]
  [file1 ...]
```
I'd also suggest using a binary format for the metadata instead of JSON/INI/whatever.
When implementing tools, we'd simply read the binary data length and seek/skip over that many bytes to get to the metadata, at which point you can find any file in the pack. To add files, you would just trim the metadata off the end of the file, append the new file, and write the updated metadata. To delete files, you could zero the bytes and have a special metadata field that marks the file as 'deleted' until you run a garbage-collection operation that rewrites the whole file without that byte range.
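The seek-to-metadata step could look roughly like this, assuming the layout proposed above (length-prefixed blob, then length-prefixed metadata). Exact `std.io` signatures vary between Zig versions, so treat this as a sketch:

```zig
const std = @import("std");

// Read the trailing metadata of a pack file laid out as:
// [binary data length (u64)][binary blob][metadata length (u32)][metadata]
fn readMetadata(allocator: std.mem.Allocator, file: std.fs.File) ![]u8 {
    const reader = file.reader();
    const binary_len = try reader.readInt(u64, .little);
    // Skip straight over the file-contents blob; the metadata follows it.
    try file.seekTo(@sizeOf(u64) + binary_len);
    const meta_len = try reader.readInt(u32, .little);
    const metadata = try allocator.alloc(u8, meta_len);
    errdefer allocator.free(metadata);
    try reader.readNoEof(metadata);
    return metadata;
}
```

Appending a file would then be the inverse: truncate at the metadata offset, append the new bytes, and rewrite the (updated) metadata and its length.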
Sorry, reading back I realize I wasn't very clear. I was talking about two different subjects: the metadata for each file and its byte range is currently stored in binary in the file itself, and can definitely be compressed. The manifest would describe the file tree pre-packing, such that packing assets into the archive could be a single step done when building a release game package.
Unix-style commands for the CLI make a lot of sense; the problem I was trying to solve in the latter half of my comment was that updating the pack file every time an asset changes, and re-supplying the per-file information each time, could get tedious for projects of scale.
Gotcha, that makes more sense. I understand what you meant now.
What about a model of, say: create an archive from `<this directory>` and everything from that directory gets included, no manifest required? We could support a `.machignore` file to allow you to exclude certain files.
Oooh, that'd be super clean. I guess in that case we'd rely on opinionated defaults (a la your comment above) for most file types (identified via extension), then maybe support a means to override them in specific cases, like Godot's `.import` files?
Yeah, that sounds reasonable on the surface 👍 In general I think opinionated defaults are best
I'd like to have something like this, yes. No immediate plans to work on this (since we have higher priority things before then), but it would be written in Zig when we do (rather than, say, bindings to PhysFS), and the design may be a bit different. It wouldn't go in mach-core, since mach-core aims to be very minimal: just window+input+GPU. Rather, it'd go in a separate package/module somewhere else.
Good question; I definitely haven't thought about it super in-depth, but I could start doing so if you're eager to work on something like this in the mach codebase (which I'd be happy to have). On the surface, I see a few things here.
So, stepping back from those high-level thoughts, I think the first implementation of this could be something like: a library which implements an archive file format with metadata first and zstd-compressed file regions afterwards, plus some tooling to create it and develop/work with it nicely.
Once such a thing exists, an 'overlay' concept / PATH analog ('you load one or more archives into a sort of PATH analog, which makes modding and patching completely transparent') would be very nice to have. I would also very much like that as a way to support modding nicely.