proposal: Resource management system

hexops / mach

zig game engine & graphics toolkit

https://machengine.org

proposal: Resource management system #357

Open iddev5 opened 2 years ago

iddev5 commented 2 years ago

Description

A general-purpose resource management engine meant to be cross-platform (desktop, WASM, Android, etc.).

The general idea is that, in order to access resources, we will use URIs instead of paths. All resources will be accessed through the provided API, and std.fs.* should not be used directly (though the library may use it internally).

Internally, the URI will evaluate to a path (on desktop) or to a URL (on the web). On desktop it will use the std.fs APIs, and on the web it will stream resources served by your web server.
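As a rough sketch of the URI-to-location step, assuming a `scheme://relative/path` URI shape (as in `{name}://textures/abc.png` in the API below) and hypothetical `Platform`, `Location`, `splitUri` and `resolveUri` names that are not part of Mach, the path part could simply be joined onto a root directory on desktop or onto a base URL on the web:

```zig
const std = @import("std");

/// Target platforms assumed by this sketch (hypothetical).
pub const Platform = enum { native, web };

/// Where a resource URI points after resolution.
pub const Location = union(enum) {
    path: []const u8, // opened with std.fs on desktop
    url: []const u8, // streamed over HTTP on the web
};

/// The two halves of "scheme://path".
pub const UriParts = struct {
    scheme: []const u8,
    path: []const u8,
};

pub fn splitUri(uri: []const u8) error{InvalidUri}!UriParts {
    const sep = std.mem.indexOf(u8, uri, "://") orelse return error.InvalidUri;
    if (sep == 0) return error.InvalidUri; // empty scheme
    return .{ .scheme = uri[0..sep], .path = uri[sep + 3 ..] };
}

/// Joins the path part of `uri` onto `root` (a directory on desktop, a base
/// URL on the web), writing the result into `buf`.
pub fn resolveUri(buf: []u8, platform: Platform, root: []const u8, uri: []const u8) !Location {
    const parts = try splitUri(uri);
    const joined = try std.fmt.bufPrint(buf, "{s}/{s}", .{ root, parts.path });
    return switch (platform) {
        .native => Location{ .path = joined },
        .web => Location{ .url = joined },
    };
}
```

Keeping the platform decision in one tagged `Location` value means the rest of the loader only switches on it once, when it picks `std.fs` or an HTTP fetch.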

Mechanism

Let's say the project needs a.png, b.png and c.mp3. Then the resources have to be arranged in your directory as follows:

```
data/
- textures/
  - a.png
  - b.png
- audio/
  - c.mp3
```

This arrangement can be done automatically by some code that takes a config file (JSON/YAML/TOML) as input and sorts the files into their directories. The file could look like this (using JSON as an example):

```json
{
    "collections": [
        {
            "name": "textures",
            "resources": [
                "a.png",
                "b.png"
            ]
        },
        {
            "name": "audio",
            "resources": [
                "c.mp3"
            ]
        }
    ]
}
```

Arranging/installing the resources can be part of a build step for seamless building. This step may simply install the files, or optionally generate an archive file from all the resources.

Inside the application, the resources will be referred to as textures/a.png, textures/b.png and audio/c.mp3. But why not data/textures/a.png? (Answered below in API.)

Remember that the names textures, audio, etc. above are purely cosmetic and have no significance; they can be named anything. [1]

API

The library will provide the following API:

```zig
// The main high-level API (interface sketch: function bodies omitted)
pub const ResourceManager = struct {
    pub fn init(resource_types: []const ResourceType) ResourceManager;

    pub const ResourceType = struct {
        name: []const u8,
        ext: []const []const u8, // [2]
        is_binary: bool, // May not be needed
        // Function pointer types cannot use an inferred error set, hence anyerror.
        load: *const fn (mem: []const u8) anyerror!*anyopaque,
        unload: *const fn (resource: *anyopaque) void,
    };

    // Loads the ``name`` collection into the resource manager.
    //
    // ``name`` is the address of the particular collection. This is relevant in the context
    // of having multiple archives. This name will be used when accessing the resources:
    // ``{name}://textures/abc.png``.
    //
    // The ``root`` param dictates where the resources will be loaded from; by default
    // it is set to ``{appdir}/data/``. This makes it convenient to support different
    // mods/themes/resource packs for the end user. It can also point to a file (archive),
    // like ``{appdir}/data.mres``.
    pub fn loadCollection(self: *ResourceManager, name: []const u8, root: ?[]const u8) !void;

    // Creates a new resource group
    pub fn pushGroup(self: *ResourceManager) !ResourceGroup;

    pub const ResourceGroup = struct {
        // Removes/unloads all resources in the current group. It works by checking whether
        // ``resources[resources.len - i].group == current_group`` and, if so, unloading the
        // resource, repeating this in backward order until a different group is reached.
        pub fn popGroup(self: *ResourceGroup) !void;

        // Loads the resource into the current group. If the resource is already present,
        // does nothing.
        // Let's say the URI is ``res://textures/ziggy.png`` (res is the collection name); on a
        // native device, it will load the resource present at ``{root}/textures/ziggy.png``.
        pub fn loadResource(self: *ResourceGroup, uri: []const u8) !void;

        // Returns the Resource, loading it first if it is not already loaded.
        pub fn getResource(self: *ResourceGroup, uri: []const u8) !Resource;
    };

    // Creates a new resource group whose resources are all requested up front
    // using loadResource(). This is useful for large games, where resource loading
    // may take time.
    // For noticeable results, it should be called before popGroup of the current group.
    pub fn prepareGroup(self: *ResourceManager) !ResourceGroup;

    pub const Resource = struct {
        // Returns the raw data, which you can use in any way. Internally it is stored
        // as an *anyopaque.
        pub fn getData(self: Resource, comptime T: type) *const T;

        // Returns some info like size, type (binary/text) or whatever we want
        pub fn getInfo(self: Resource) Info;
    };
};

// The lower-level API (for interacting with the fs directly, but within the comforts of Mach).
// Needed for loading custom user content and saving savefiles/screenshots. Must be
// used with caution, and it will require manual ``if (platform) ... else ...;``
// Future TODO (less important)
```
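The popGroup comment above describes walking the resource list backwards and unloading entries until one from a different group is reached. A minimal sketch of that behavior, assuming a hypothetical fixed-capacity `GroupStack` and a counter standing in for the real `unload()` callback:

```zig
const std = @import("std");

/// One loaded resource, tagged with the group that requested it (hypothetical).
const Entry = struct {
    uri: []const u8,
    group: u32,
};

/// Loaded resources in load order; groups are pushed and popped LIFO.
const GroupStack = struct {
    entries: [64]Entry = undefined,
    len: usize = 0,
    current_group: u32 = 0,
    unloaded: usize = 0, // counts unload() calls, for illustration only

    fn pushGroup(self: *GroupStack) void {
        self.current_group += 1;
    }

    /// Loads `uri` into the current group unless it is already present.
    fn loadResource(self: *GroupStack, uri: []const u8) !void {
        for (self.entries[0..self.len]) |e| {
            if (std.mem.eql(u8, e.uri, uri)) return; // already loaded: do nothing
        }
        if (self.len == self.entries.len) return error.OutOfSpace;
        self.entries[self.len] = .{ .uri = uri, .group = self.current_group };
        self.len += 1;
    }

    /// Unloads entries from the end while they belong to the current group.
    fn popGroup(self: *GroupStack) void {
        std.debug.assert(self.current_group > 0);
        while (self.len > 0 and self.entries[self.len - 1].group == self.current_group) {
            self.len -= 1;
            self.unloaded += 1; // a real manager would call ResourceType.unload here
        }
        self.current_group -= 1;
    }
};
```

Note that with the "already present, do nothing" rule, a resource stays owned by whichever group loaded it first.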

Open Questions

Should it be a separate Mach module (mach-res), like mach-gpu, mach-audio, etc., or just be part of the main Mach application framework?
Should []const ResourceType be taken as a comptime parameter, since the types of resources the application will use are known beforehand?
How do we actually recognise the type of a requested resource?
- Should we actually make [1] a mechanism, or should we use the file extensions? [2] Or should [load/get]Resource take an additional parameter?

Future TODOs

Come up with a suitable archive format.
- Because loading individual files on Windows is very slow, and to some extent it may help with the web too.
- Should ideally be uncompressed for faster resource loading.
- References: https://github.com/raysan5/rres/tree/master/design
Multi-threading support.
- It has not been planned properly yet. For example, the prepareGroup() function can't be implemented because we don't have any function to ensure all resources are loaded. In a way, the current API is blocking and single-threaded, but with some consideration for multi-threading.
Finish the low-level API.

EDIT: Oversights

Resources that themselves reference external files (like glTF), and resources made up of multiple files (like OBJ+MTL), have been overlooked here.

Note: In all instances where multi-threading is mentioned, it's about loading resources in parallel, not about the thread safety of the API.

slimsag commented 2 years ago

I like the design overall. Thanks for starting this!

Questions

When can loadCollection be called? It seems to return void so presumably it must be called before any other methods?
If I understand correctly, ResourceGroup could be used to describe a scene for example. In this context: Let's say scene A depends on textures (a, b, c) and you want to transition to scene B which requires textures (b, c, d) - ideally (b, c) are not unloaded, while (a) is unloaded and (d) is now loaded. Does the loadResource and popGroup API support this? (it's unclear to me)
In games there is often a need to load resources which are used to compute/build something else. For example, you might load a PNG texture into memory in order to upload it and create a gpu.Texture. In this situation, you ideally do not want the PNG texture to remain in memory after upload to the GPU finishes on a separate thread. Additionally, if the GPU device context is lost (can happen at any point on mobile and web), the GPU may just "lose all data entirely" and we'd need to be able to handle this event by loading such resources from disk (PNG) and uploading them to the GPU again. How can this system support this?
How would a resource loader for textures get access to the GPU, say to upload the texture to the GPU? Would the resource manager be responsible for that, or does it merely handle loading of resources from disk?
Have you given thought to what custom resources might look like? For example, I can imagine some applications needing to define their own resource formats like Minecraft block chunks which are expensive to compute/produce, require some information from the game (ECS), and get saved in a custom binary format they define.
Can you give some real-world examples of how you envision "Collections" being used? I see textures and audio used as examples right now; it's not fully clear to me what benefits collections give us.

Thoughts

Should it be a separate mach module (mach-res) as in something like mach-gpu, mach-audio etc? or just be part of main mach application framework.

For things like this, I suspect it is mostly useful in the context of Mach only, and should be deeply integrated. So this can live in a library at src/resource perhaps, and accessible via @import("mach").resource. What do you think?

I think I'll have more thoughts on the other questions you posed once I learn a bit more from your responses to mine.

iddev5 commented 2 years ago

For the sake of explanation, let's just assume that a Collection is a single archive. Having multiple collections really only makes sense for archives, and that's why the concept exists, but it can also be used with local directory-based structures. For example, say you have a game whose data is contained in an archive called gamedata.res. Since the assets are isolated, you can easily treat gamedata as a collection. Now take another case, in which the game directory contains textures.res, audio.res, some.dll and game.exe. Here you need separate collections (textures and audio). Note that this can be a deliberate design decision: for a large game, storing resources in separate archives may make more sense. This answers your last question (Q6).

Do note that you do not need multiple collections in your game. You can store all resources in one collection/archive just fine (though it may not scale well for organizing large applications). A collection internally follows a tree structure, so it can contain directories.

1) loadCollection should ideally be called before you create any groups. Collections aren't meant to differ between scenes/chunks; that would be very inefficient. So you should call it right after ResourceManager.init(). If you're thinking about just passing it to init() as a parameter, that could work too, I guess.

2) (it's unclear to me) That's exactly how it will work. ResourceGroups themselves don't store resources; they just tell the engine that they want to use them. loadResource and popGroup take care of this. (But now that I'm thinking about it, the names shouldn't be pushGroup and popGroup, since the system can get more complicated: you may load resources for a different scene while the current scene is still running or about to end.)
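For the scene A (a, b, c) to scene B (b, c, d) case, one possible way for the engine to track what groups "want" is a reference count per URI; this is an assumption, not something the proposal specifies. The hypothetical `RefCounts` below only loads on the first request and only unloads when the last requester lets go:

```zig
const std = @import("std");

/// How many groups currently want each resource (hypothetical sketch).
const RefCounts = struct {
    uris: [32][]const u8 = undefined,
    counts: [32]u32 = undefined,
    len: usize = 0,
    loads: usize = 0, // number of real loads performed
    unloads: usize = 0, // number of real unloads performed

    fn find(self: *const RefCounts, uri: []const u8) ?usize {
        for (self.uris[0..self.len], 0..) |u, i| {
            if (std.mem.eql(u8, u, uri)) return i;
        }
        return null;
    }

    /// A group asks for `uri`; the data is only loaded on the first request.
    fn acquire(self: *RefCounts, uri: []const u8) !void {
        if (self.find(uri)) |i| {
            self.counts[i] += 1;
            return;
        }
        if (self.len == self.uris.len) return error.OutOfSpace;
        self.uris[self.len] = uri;
        self.counts[self.len] = 1;
        self.len += 1;
        self.loads += 1; // a real manager would call ResourceType.load here
    }

    /// A group releases `uri`; the data is unloaded when nobody wants it anymore.
    fn release(self: *RefCounts, uri: []const u8) void {
        const i = self.find(uri) orelse return;
        self.counts[i] -= 1;
        if (self.counts[i] == 0) {
            // Swap-remove the entry; a real manager would call unload() here.
            self.len -= 1;
            self.uris[i] = self.uris[self.len];
            self.counts[i] = self.counts[self.len];
            self.unloads += 1;
        }
    }
};
```

Requesting B's resources before releasing A's is what keeps b and c from being unloaded and reloaded; this is the ordering prepareGroup() is meant to provide.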

3) While preparing the ResourceManager, you pass it some callbacks (load() and unload()). load() receives the raw PNG data, and you are free to use that data inside the function to create a gpu.Texture. One small oversight here was that we also need to pass in some context (an additional context: *anyopaque param).

To handle reloads, I think there should be a function to force-reload a resource (i.e. reload it even if it's already loaded, freeing and discarding the previously present data). This action can be signaled by the ECS. The open question is the API: a per-resource function like ``fn ResourceGroup.reloadResource``, similar to ``loadResource``, or a more general ``fn ResourceGroup.reloadResourceType(resource_type_name: []const u8) !void``?

4) Answered above in 3), via the context parameter. But I am open to better suggestions.
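A sketch of a loader that receives the extra `context: *anyopaque` parameter, plus a force-reload helper like the one suggested for lost GPU devices. `FakeGpu`, `loadTexture`, `unloadTexture` and `reloadResource` are hypothetical stand-ins; a real loader would decode the PNG and create a gpu.Texture instead of recording a byte count:

```zig
const std = @import("std");

/// ResourceType with the extra context parameter discussed above (hypothetical).
const ResourceType = struct {
    name: []const u8,
    context: *anyopaque,
    load: *const fn (context: *anyopaque, mem: []const u8) anyerror!*anyopaque,
    unload: *const fn (context: *anyopaque, resource: *anyopaque) void,
};

/// Stand-in for a GPU device.
const FakeGpu = struct {
    uploads: usize = 0,
    textures: [8]usize = undefined, // a "texture" here is just the uploaded byte count
};

fn loadTexture(context: *anyopaque, mem: []const u8) anyerror!*anyopaque {
    const gpu: *FakeGpu = @ptrCast(@alignCast(context));
    if (gpu.uploads == gpu.textures.len) return error.OutOfSpace;
    gpu.textures[gpu.uploads] = mem.len; // decode + upload would happen here
    gpu.uploads += 1;
    // Only the GPU-side object is returned; `mem` is not retained.
    return &gpu.textures[gpu.uploads - 1];
}

fn unloadTexture(context: *anyopaque, resource: *anyopaque) void {
    _ = context;
    _ = resource; // a real loader would destroy the gpu.Texture here
}

/// Force-reload: discard the old object and run load() again on fresh bytes,
/// e.g. after the GPU device was lost.
fn reloadResource(rt: ResourceType, old: *anyopaque, mem: []const u8) anyerror!*anyopaque {
    rt.unload(rt.context, old);
    return rt.load(rt.context, mem);
}
```

Because the loader only borrows `mem`, the resource manager is free to release the PNG bytes once load() returns; a reload then re-reads the file and calls load() again.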

5) Technically speaking, all resources are custom in the eyes of this system. The system just loads a chunk of data (from a file or files), passes it to your provided callback (fn load()), and stores whatever it returns. So I'm unsure how the case you mentioned would be any different. All ECS data and such can easily be accessed through the context parameter. How the file is structured is not this system's concern; you should have your own functions to parse that file.

6) Answered in the first paragraph.

I suspect it is mostly useful in the context of Mach only

Sounds right. I didn't have an opinion on this, so I decided it was better to ask, just in case.

iddev5 commented 2 years ago

The JSON collection example I initially created didn't take multiple collections into consideration (in fact, it used the word collection for any folder). So here's an updated one:

```json
{
    "collections": [
        {
            "name": "texture_collection",
            "tree": [
                {
                    "dir": "player",
                    "tree": [
                        "player_standing.png",
                        "player_running.png"
                    ]
                },
                "tree.png",
                "rock.png"
            ]
        },
        {
            "name": "audio_collection",
            "tree": [
                "c.mp3",
                {
                    "dir": "ambient",
                    "tree": [
                        "chirping.mp3",
                        "wind.mp3"
                    ]
                }
            ]
        }
    ]
}
```

JSON is not great for writing trees by hand.

slimsag commented 2 years ago

OK this makes a lot of sense.

I assume that the collections config file format (JSON above) is not actually something end users of Mach would need to deal with; it would be produced automatically by a tool based on the directory structure, correct? E.g. in build.zig you invoke something, give it data/, and it finds data/textures and data/audio and produces the config file for those two collections?
- This being the case, it could be a binary file (not human readable) or a generated Zig code file (I see what you were saying earlier in Matrix chat now), is that right?
I presume textures and audio are only examples, in reality you'd most likely have a single collection for all assets in your game, except in cases like:
- Modding, theming, etc. where you want users to be able to easily override a specific resource group, providing either an archive of their own (single file) or not (plain directory)
- "Take for instance its a large game and storing all resources in separate archives makes more sense" -> I assume this would be something like 'we split our resource collections every few GiB because any larger than that and they're a little slow to build/produce' or something like this?

iddev5 commented 2 years ago

1) Yes and no. I think it's better to say that it's optional: if someone wants to, they can provide a list and structure of resources. What are the advantages? I think it would make sense in the case of an editor, where you are too busy, so you just throw your .psd files beside the exported .tga files but selectively import only the .tga files into your editor's asset menu.

The one I mentioned in the Matrix chat was an unrelated feature, which I didn't add to this proposal because I'm not sure about it yet. See, archives would need to maintain a record of what files they contain anyway. For plain directories, I don't think it's really needed; the system can just return error.FileNotFound. Generating the structure of a collection won't give us any advantage, because we can't validate at comptime which assets will be loaded in the future, nor can we validate their types.

What I was actually referring to on Matrix is a way to automatically generate the bunch of ``loadResource`` function calls instead of typing them out one by one. The problem is that we need to provide a list of the resources that are going to be used in that scene. I'm not sure this is a good idea. Of course, it would be completely optional, and with editors it could be done easily. It has the additional disadvantage that it can break the resource streaming system, so I'm unsure until we come up with a decent plan. Either way, I think it's better to push this to a future proposal/plan, since it's just a convenience feature and not a functional one.

2) Yes.

- Actually, if you want to override a specific resource pack, you just need to change root (see above).
- Yes, and that's where being able to have multiple collections makes sense.

slimsag commented 2 years ago

Collections is the abstraction / part of this proposal I feel least confident about. I think it enforces a certain way of working with your game data that may not be very clear, and the benefits are not always obvious.

Being very critical of it, most of what it solves can be resolved in other ways:

Modding

Modding, theming, etc. where you want users to be able to easily override a specific resource group, providing either an archive of their own (single file) or not (plain directory)

I think the best-case scenario for modding would be "Here's my mod directory/archive, it wants to override very specific resources, oh and i might have a few others of those.. just use a.png from one of the mod folders if you find it there first, otherwise fallback to the game's a.png file"

Collections don't seem like they would do this at all: they would only let us override resources in aggregate, one whole collection at a time. If you wanted to override a few audio files and texture files in a game, for example, you'd need to provide an entirely new audio and texture collection with all game files in it, rather than just overriding the files you want to modify. Additionally, it's not clear that this system could support multiple mods wanting to override multiple different files.

Chunking

We had discussed this benefit of collections:

building the archive of data for my game takes a really long time, it's over a hundred GiB, we need a way to split it into chunks

But I don't think it's necessarily a big benefit. A single file can manage all assets in a reasonable way, so long as the format of that file is reasonable. A good example of this is Guild Wars, where both the original game and the newer Guild Wars 2 are distributed as a single exe file which downloads a single gw2.dat file with all files in it. As you navigate the game and need new content, it updates that .dat file with more assets.

Excluding of certain files

I think it would make sense in case of editor where you are too busy so you just throw your .psd files beside the exported .tga but selectively only import the .tga file in your editor asset menu.

We could support this with a .gitignore-type file easily.

slimsag commented 2 years ago

Suggestion: update the proposal with a solidified set of goals and non-goals. We could start with this set:

Goals

Cross-platform (Desktop, WASM, Android, iOS, etc.)
Handles loading resources from disk (native) and over HTTP (WASM)
Enables modding of game resources for any game made in Mach in a simple way
Protect developers from typical resource management footguns, such as:
- Accidentally reloading assets when transitioning between e.g. levels in a game (if both levels require similar, subtly different resources)
- Keeping unnecessary resources in memory, accidentally duplicating resources needlessly
- Using too many separate files, rather than one big archive (which can substantially harm performance on Windows)
Loading of non-disk assets (custom loader functions)

Non-goals (to be handled at a later date)

Game patching
Downloads of game assets from a server (e.g. in the case of an MMO)
Scene management (which assets are required for a given scene)

slimsag commented 2 years ago

Suggestion: We could remove the idea of collections entirely, and instead add explicit support to the proposal for exclusion of assets and modding:

In order to access a resource, you use a URI instead of a file path:

data://textures/a.png
data://audio/c.mp3
data://junk.txt

Depending on where the application is running, and in what mode (release/debug), behavior will differ by default (but you can choose):

Release
- Natively
  - std.fs.* APIs will be used to access a single-file archive, data.res for example, which contains all assets for the game packed into a single file. The file will contain a header which describes where to locate files within the single-file archive, so we can e.g. seek to a specific file in the archive to read it. We can chat more about the specifics of this file format, but I think it's safe to assume that in general we can come up with a good way to pack multiple files into a single one and manage that in a way that is performant and can support incremental updates (adding new files, updating existing ones, deleting ones, etc.)
  - The data.res file is produced from your game's data/ directory via build.zig at build time. The only constraint the system imposes is that you provide a single directory where your assets will live.
- Browser
  - Same as native, except we use JavaScript HTTP Range requests to query byte ranges of the data.res file. Effectively, this works the same way as file seeking does natively.
Debug
- Natively
  - Assets will be loaded directly from the data/ directory (this enables swapping out assets at runtime without rebuilding/updating the data.res archive file).
- Browser
  - Assets will be loaded one file at a time via HTTP GET requests.
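To make the archive-header idea concrete: assuming a hypothetical index of `(name, offset, size)` entries in the data.res header, finding a file is a lookup, after which native builds read `size` bytes at `offset` and the browser sends an HTTP `Range` header for the same bytes. Byte ranges in HTTP are inclusive, hence the `- 1`. `IndexEntry`, `findEntry`, `rangeHeader` and `readEntry` are illustrative names, and the in-memory `readEntry` stands in for a real seek-and-read on a `std.fs.File`:

```zig
const std = @import("std");

/// One entry of a hypothetical data.res header: where a file lives in the archive.
const IndexEntry = struct {
    name: []const u8,
    offset: usize,
    size: usize,
};

fn findEntry(index: []const IndexEntry, name: []const u8) ?IndexEntry {
    for (index) |entry| {
        if (std.mem.eql(u8, entry.name, name)) return entry;
    }
    return null;
}

/// Formats the HTTP Range header value for an entry (last byte is inclusive).
fn rangeHeader(buf: []u8, entry: IndexEntry) ![]u8 {
    if (entry.size == 0) return error.EmptyEntry;
    return std.fmt.bufPrint(buf, "bytes={d}-{d}", .{ entry.offset, entry.offset + entry.size - 1 });
}

/// Native counterpart: the same (offset, size) pair selects bytes of the archive.
fn readEntry(archive: []const u8, entry: IndexEntry) ![]const u8 {
    if (entry.offset + entry.size > archive.len) return error.OutOfBounds;
    return archive[entry.offset .. entry.offset + entry.size];
}
```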

Exclusion of assets

Generally speaking, you put all game assets under a folder called data/. In some cases, you may want files that live alongside your game assets, such as .psd or .blend files, to be excluded from your final data.res archive. There will thus be a way (TBD, maybe similar to .gitignore, maybe via build.zig options, maybe something else) to exclude files using patterns.

When excluded, they will not end up in the final data.res and will also not be accessible via the API in debug builds either (to prevent accidentally relying on assets which get excluded in release builds.)
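The exclusion syntax is still TBD. Assuming a .gitignore-like list of patterns where `*` matches any run of characters (deliberately simpler than real .gitignore rules: no `**`, no negation, and `*` also crosses `/`), a sketch of the check could look like this; `globMatch` and `isExcluded` are hypothetical names:

```zig
const std = @import("std");

/// Matches `text` against `pattern`; '*' matches any run of characters
/// (including '/'), every other byte matches itself.
fn globMatch(pattern: []const u8, text: []const u8) bool {
    var p: usize = 0;
    var t: usize = 0;
    var star: ?usize = null; // index of the last '*' seen in pattern
    var mark: usize = 0; // index in text where that '*' started matching
    while (t < text.len) {
        if (p < pattern.len and pattern[p] == '*') {
            star = p;
            mark = t;
            p += 1;
        } else if (p < pattern.len and pattern[p] == text[t]) {
            p += 1;
            t += 1;
        } else if (star) |s| {
            // Let the last '*' absorb one more character and retry.
            p = s + 1;
            mark += 1;
            t = mark;
        } else {
            return false;
        }
    }
    // Trailing '*'s can match the empty string.
    while (p < pattern.len and pattern[p] == '*') p += 1;
    return p == pattern.len;
}

/// True if any exclusion pattern matches `path` (relative to data/).
fn isExcluded(patterns: []const []const u8, path: []const u8) bool {
    for (patterns) |pattern| {
        if (globMatch(pattern, path)) return true;
    }
    return false;
}
```

Applying the same `isExcluded` check both when packing data.res and when loading from data/ in debug builds is what keeps the two modes consistent.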

Modding

To enable resource modding of Mach games/applications generally, the following will occur:

If running natively (not supported in wasm for now), then a mods folder can live alongside data.res:

```
game.exe
data.res
mods/
- mytexturepack.res
- newmod/
  - a.png
```

mods/ can either be a .res file (same format as data.res), or just plain directories (newmod/).

When loading a file, say a.png, first each mod is checked in alphanumeric order for an a.png file to override the game's resource with. If none is found, then a.png is loaded from data.res.
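A sketch of the override lookup, assuming each mod (directory or .res archive) has already been reduced to a list of the files it contains. `Mod`, `sortMods` and `resolveSource` are hypothetical, and "alphanumeric" is taken here to mean byte-wise lexicographic order of the mod names:

```zig
const std = @import("std");

/// A mod (directory or .res archive) reduced to the files it contains.
const Mod = struct {
    name: []const u8,
    files: []const []const u8,

    fn has(self: Mod, path: []const u8) bool {
        for (self.files) |f| {
            if (std.mem.eql(u8, f, path)) return true;
        }
        return false;
    }
};

fn modLessThan(_: void, a: Mod, b: Mod) bool {
    return std.mem.lessThan(u8, a.name, b.name);
}

/// Sorts mods by name once, e.g. at startup.
fn sortMods(mods: []Mod) void {
    std.mem.sort(Mod, mods, {}, modLessThan);
}

/// Returns the first (sorted) mod that provides `path`, or "data.res" if none does.
fn resolveSource(mods: []const Mod, path: []const u8) []const u8 {
    for (mods) |m| {
        if (m.has(path)) return m.name;
    }
    return "data.res";
}
```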

slimsag commented 2 years ago

Problem: I think the scoping logic may not be sufficient, OR I don't exactly understand how it should work. I see a few use cases we should support with scopes:

"load level 1, now load level 2 and free level 1 - but only load/free the assets that change between the two levels"
"load level 1, now prepare level 2 for me (I'll switch to it soon and free level 1)"
"I'm building an open-world game, I need to load assets near the player and free assets not near the player myself. Grouping wouldn't hurt, but maybe I really do just need to handle loading/unloading myself entirely"

How could the API support all 3?

slimsag commented 2 years ago

Answers:

is_binary: bool, // May not be needed

I agree, not needed. Detecting if a file is binary (and what that actually means) is notoriously difficult/annoying.

How do we actually recognize what is the type of requested resource?

We need a way to register load functions, right? As in, "here are the bytes of the file, now turn it into a type T" (like PNG -> gpu.Texture). But I guess even if we had a bunch of these functions registered, we still wouldn't know, from a given file/bytes, which one to call.

We could require that such a function be provided to the getResource function (so you pass it the function that knows how to turn PNG bytes -> gpu.Texture with context.) That's the first thing that comes to mind for me, and doesn't seem too bad. Thoughts?

I think file-extension-based detection would be bad, because some resources with the same extension need to be interpreted differently. For example, a .png could be a gpu.Texture, or it could really be a PNG image someone wants to load and handle themselves (such as for a heightmap, or to do something else funky with). Similarly, .json files might go into different data structures.

Resources which themselves reference external files (like glTF) / resources which are formed of multiple files obj+mtl has been overlooked here.

This may be quite important to sort out.

One small oversight here was that we also need to pass in some context (an additional context: *anyopaque param).

Agreed.

slimsag commented 2 years ago

I think we've gotten all of the major discussion points out on the table, so we can do one of two things (whatever you're comfortable with):

Update the proposal to take into account all we've discussed so far, and continue from there.
OR just go ahead with the implementation, taking into account all that's been said here as best you can / as you see fit. Then continue discussion as we review PRs to implement this.

I'm OK with either at this point, I don't want to place a burden of writing out more stuff here on you just for the sake of it.

iddev5 commented 2 years ago

Problem: I think the scoping logic may not be sufficient,

1 and 2 are solved by the function ResourceManager.prepareGroup(). With it, you register which resources are needed for the upcoming scene. This function is called when you're about to end the current scene. The whole system will work in a different thread, so this function will not block. When you call popGroup() on the current group, the system already has one more group on top of it, so it will be careful about what to remove.

Do note that pushGroup and popGroup are slightly misleading names as mentioned earlier.

3) I don't know if I understand it correctly. Is it about being able to load any resource individually? If so, that will be covered by the low-level API. But if the resource is of a registered type, maybe we can provide an additional function to load it directly by URI?

iddev5 commented 2 years ago

Alright, I agree with everything here

Collections need to go.
I like the idea of using archives in release mode and folders in debug mode. But I think the library should not enforce this, and the user should be able to change the behavior for release mode if they want to.
For exclusion, I think we can indeed use an exclusion list. And then we can get rid of the JSON. Great!
Modding: I generally like this part, but would like to propose minor changes. Let the resource manager initializer take a list of paths. It will try to load a resource from the 1st path; if it's not present there, it tries the 2nd, and so on, with the actual resource archive last in the list. This is similar to how the PATH environment variable works. The reason for this proposal is that I think we shouldn't dictate to the application what installing mods looks like, so it can simply disallow mods if it wants.
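With the PATH-like variant, the order comes from the application rather than from a mods/ folder. A sketch assuming a hypothetical `findRoot` function and an `exists` callback that, in a real implementation, would check a directory or an archive index:

```zig
const std = @import("std");

/// Hypothetical existence check for `path` under `root` (directory or archive).
const ExistsFn = *const fn (root: []const u8, path: []const u8) bool;

/// Tries each root in order, like PATH, and returns the first one that has
/// `path`. The game's own data archive is expected to be the last root.
fn findRoot(roots: []const []const u8, path: []const u8, exists: ExistsFn) ?[]const u8 {
    for (roots) |root| {
        if (exists(root, path)) return root;
    }
    return null;
}
```

An application that wants to disallow mods would simply pass a list containing only its own data root.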

iddev5 commented 2 years ago

We could require that such a function be provided to the getResource function (so you pass it the function that knows how to turn PNG bytes -> gpu.Texture with context.) That's the first thing that comes to mind for me, and doesn't seem too bad. Thoughts?

The job of converting PNG bytes -> gpu.Texture and so on is performed automatically behind the scenes by the load() function we provide in ResourceManager.init. What I think we can do here is: if the load() function returns a particular error like error.IncorrectResourceType, the system moves on and tries a different resource loader. In addition, a function called checkMagic() can be added beside load() which checks the file magic to figure out the type; if it returns false, try the next type, and so on. This is a trick used in SDL's helper libraries.
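A sketch of the SDL-style approach, assuming hypothetical `Loader`, `detect`, `isPng` and `isOgg` names. The PNG check uses the standard 8-byte PNG signature, and the Ogg check uses the `OggS` capture pattern that starts every Ogg page:

```zig
const std = @import("std");

/// A loader entry with a magic-number check (hypothetical).
const Loader = struct {
    name: []const u8,
    checkMagic: *const fn (mem: []const u8) bool,
};

fn isPng(mem: []const u8) bool {
    // Every PNG file starts with this 8-byte signature.
    return std.mem.startsWith(u8, mem, "\x89PNG\r\n\x1a\n");
}

fn isOgg(mem: []const u8) bool {
    // Ogg pages start with the capture pattern "OggS".
    return std.mem.startsWith(u8, mem, "OggS");
}

/// Returns the first loader whose magic check accepts the bytes.
fn detect(loaders: []const Loader, mem: []const u8) ?Loader {
    for (loaders) |l| {
        if (l.checkMagic(mem)) return l;
    }
    return null;
}
```

As the next comment points out, a magic number only says what format the bytes are in, not how the application wants to use them: both a texture and a heightmap are plain PNGs to `detect`.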

slimsag commented 2 years ago

3 ) I dont know if i understand it correctly. Is it about being able to individually load any resource? Well then it will be covered with the low level API. But let's say if the resource is a registered one, maybe we can provide an additional function to directly load with the URI?

Yes, and sounds good. The point is just being able to load/free resources manually, without groups (think basically "I want to implement my own grouping logic on top, can I?")

slimsag commented 2 years ago

The job to convert PNG bytes -> gpu.Texture and so one is performed automatically behind the scenes by the load() function which we provide in ResourceManager.init. What I think can be done here that let say if load() function returns a particular error like error.IncorrectResourceType then it will move on and try using a different resource loader. Plus a function called checkMagic() can be added besides load() which checks file magic to figure out the type, if it returns false, try with the next type and so on. This is a trick used in SDL's helper libraries.

The problem with this is we can't handle resources in different ways. Let's say my application needs to do two things:

Load PNGs as gpu.Textures
Load a few PNG which I decode pixels of myself, and use as a heightmap to generate some vertices for heightmap terrain. I cannot use a gpu.Texture and I don't want it to be uploaded to the GPU anyway.

We need some sort of way to support "handle the same resource type in different ways" I think.

The problem is more noticeable when you talk about e.g. loading .json files: you don't want one generic load function for this, you want to be able to handle JSON decoding based on say the resource URI using different functions.

iddev5 commented 2 years ago

Can we use the URI scheme for this, since we aren't using it for anything else right now? Like texture://images/player.png and sheet://images/player_anim.png where "texture" and "sheet" must be the name of any one ResourceType we added in ResourceManager.init().
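A sketch of scheme-based dispatch, assuming the registered ResourceType names double as URI schemes. `Lookup` and `lookup` are hypothetical, and the load/unload callbacks are left out to keep the focus on selection:

```zig
const std = @import("std");

/// A registered resource type; its name doubles as the URI scheme (hypothetical).
const ResourceType = struct {
    name: []const u8, // e.g. "texture" or "sheet"; load/unload omitted here
};

const Lookup = struct {
    resource_type: ResourceType,
    path: []const u8, // e.g. "images/player.png"
};

/// Picks the ResourceType whose name equals the URI scheme.
fn lookup(types: []const ResourceType, uri: []const u8) !Lookup {
    const sep = std.mem.indexOf(u8, uri, "://") orelse return error.MissingScheme;
    const scheme = uri[0..sep];
    for (types) |t| {
        if (std.mem.eql(u8, t.name, scheme)) {
            return .{ .resource_type = t, .path = uri[sep + 3 ..] };
        }
    }
    return error.UnknownResourceType;
}
```

The same file can then be requested under different schemes, e.g. `texture://images/player.png` for the GPU and a hypothetical `heightmap://images/player.png` for CPU-side decoding, without relying on the extension or on checkMagic.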

slimsag commented 2 years ago

That sounds like a great solution! Then we don't have to rely on extension or checkMagic either!

In the future (not part of this library), we could also allow loader functions to be defined as part of mach.Modules, specifying a list of URI schemes and the loader functions that implement them.

desttinghim commented 2 years ago

I have some audio-centric questions that I didn't see answered.

Can assets be streamed from the disk? Long audio clips usually shouldn't be stored in memory if it isn't necessary.
If assets can be streamed, is there a way to prioritize loading certain assets over others? It's generally more important to ensure continuous audio playback than it is to load textures quickly.

silversquirl commented 2 years ago

It's very difficult to ensure stutter-free streaming, since you can't predict where the file comes from. If a user has the game stored on a slow HDD, it may not be able to keep up. Similarly for a WASM game trying to stream over a slow mobile data connection.

desttinghim commented 2 years ago

Right, those are factors that can affect streaming. Obviously it would be impossible to guarantee stutter-free streaming in those situations. There are plenty of other places where it is possible to stream the audio: consoles, most modern computers, phones (I assume? I actually don't know how fast storage is on mobile), etc. I guess the question then comes down to each individual game and its target audience. Do you want to target older computers to increase your possible reach, or do you want to push the limits of modern hardware, or something in between?

I mainly wanted to know if that use case had been planned for or bring attention to the possibility if it hadn't.