hexops / mach

zig game engine & graphics toolkit
https://machengine.org

proposal: Resource management system #357

Open iddev5 opened 2 years ago

iddev5 commented 2 years ago

Description

A general-purpose resource management system which is meant to be cross-platform (works on desktop, wasm, Android, etc.)

The general idea is that in order to access resources, we will use URIs instead of paths. All resources will be accessed through the provided API, and std.fs.* should not be used directly (but the library may use it internally).

The URI will internally evaluate to a path (on desktop) or to a URL (on the web). On desktop it will use the std.fs APIs, and on the web it will stream resources which are served by your web server.
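As a rough illustration of that evaluation step, here is a minimal sketch assuming Zig 0.11 or newer. It takes the `{name}://path` URI shape used in the API comments of this proposal; `parseUri` and `resolve` are hypothetical names, not part of Mach. The same join works whether `root` is a data directory on desktop or a base URL on the web.

```zig
const std = @import("std");

// Hypothetical result of splitting "res://textures/a.png".
const ParsedUri = struct {
    collection: []const u8,
    path: []const u8,
};

// Splits a URI at the first "://"; both halves must be non-empty.
fn parseUri(uri: []const u8) error{InvalidUri}!ParsedUri {
    const sep = "://";
    const idx = std.mem.indexOf(u8, uri, sep) orelse return error.InvalidUri;
    if (idx == 0 or idx + sep.len >= uri.len) return error.InvalidUri;
    return ParsedUri{
        .collection = uri[0..idx],
        .path = uri[idx + sep.len ..],
    };
}

// Joins `root` (a directory or a base URL, assumed) with the URI's path part.
fn resolve(buf: []u8, root: []const u8, uri: []const u8) ![]const u8 {
    const parsed = try parseUri(uri);
    // Drop trailing slashes from the root so the join has exactly one.
    var end = root.len;
    while (end > 0 and root[end - 1] == '/') end -= 1;
    const out = try std.fmt.bufPrint(buf, "{s}/{s}", .{ root[0..end], parsed.path });
    return out;
}
```

The collection name is parsed but not used in the join here; a real implementation would presumably use it to pick which root to resolve against.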

Mechanism

Let's say the project needs a.png, b.png and c.mp3. Then the resources have to be arranged in your directory as such:

This arrangement can be done automatically by some code which takes a config file (json/yaml/toml) as input and sorts the files by their names. The file could look like this (using JSON as an example):

{
    "collections": [
        {
            "name": "textures",
            "resources": [
                "a.png",
                "b.png"
            ]
        },
        {
            "name": "audio",
            "resources": [
                "c.mp3"     
            ]
        }
    ]
}

Arranging/installing the resources can be part of a build step for seamless building. This step may just install the resources, or optionally generate an archive file from all of them.

The resources will be referred to inside the application as such: textures/a.png, textures/b.png, audio/c.mp3. But why not data/textures/a.png? (Answered below in API)

Remember that the names textures, audio etc. above are purely cosmetic and have no significance. They can be named anything. [1]

API

The library will provide the following API:

// The main high level API
pub const ResourceManager = struct {
    pub fn init(resource_types: []const ResourceType) ResourceManager;

    pub const ResourceType = struct {
        name: []const u8,
        ext: []const []const u8, // [2]
        is_binary: bool, // May not be needed
        load: *const fn (mem: []const u8) !*anyopaque,
        unload: *const fn (resource: *anyopaque) void,
    };

    // Loads ``name`` collection to the resource manager.
    //
    // ``name`` is the address of the particular collection. This is relevant in context of
    // when we have multiple archives. This name will be used when accessing the resources:
    // ``{name}://textures/abc.png``.
    //
    // The ``root`` param dictates where the resources are loaded from; by default
    // it is set to ``{appdir}/data/``. This gives us the convenience of being able
    // to support different mods/themes/resource packs for the end user. It can also
    // point to a file (archive), like ``{appdir}/data.mres``
    pub fn loadCollection(self, name: []const u8, root: ?[]const u8) !void;

    // Create a new resource group
    pub fn pushGroup(self) !ResourceGroup;

    pub const ResourceGroup = struct {
        // Removes/unloads all resources which are in the present group. The way it works is: check if
        // ``resources[resources.len - i].group == current_group``, then unload the resource.
        // Repeat this until we reach a different group in backward order.
        pub fn popGroup(self) !void;

        // Loads the resource into current group. If the resource is already present, then don't
        // do anything.
        // Lets say uri is ``res://textures/ziggy.png`` (res is the collection name), so in a 
        // native device, it will load the resource present at ``{root}/textures/ziggy.png``
        pub fn loadResource(self, uri: []const u8) !void;

        // Returns the Resource. Loads if not already loaded.
        pub fn getResource(self, uri: []const u8) !Resource;
    };

    // Create a new resource group in which all the resources have already been asked to be
    // loaded using loadResource(). This is useful when you have a large game and resource
    // loading may take time.
    // For noticeable results, it should be called before popGroup of the current group.
    pub fn prepareGroup(self) !ResourceGroup;

    pub const Resource = struct {
        // Returns the raw data, which you can use in any way. Internally it is stored
        // as an *anyopaque
        pub fn getData(comptime T: type) *const T;

        // Returns some info like size, type (binary/text) or whatever we want
        pub fn getInfo() Info;
    };
};

// The lower level API (for interacting with fs directly but within comforts of mach)
// Needed for loading custom user content and saving savefiles/screenshots. Must be
// used with caution and it will require manual ``if (platform) ... else ...;``
// Future TODO (less important)
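The backward walk described in the popGroup comment above could look roughly like the following. This is only a sketch under assumptions: `Entry` and `Stack` are hypothetical stand-ins for the manager's internal storage, the capacity is fixed at 16 for simplicity, and no actual unloading happens.

```zig
const std = @import("std");

// Hypothetical record of one loaded resource and the group that requested it.
const Entry = struct { uri: []const u8, group: u32 };

// Minimal fixed-capacity stack of loaded resources.
const Stack = struct {
    entries: [16]Entry = undefined,
    len: usize = 0,

    fn push(self: *Stack, uri: []const u8, group: u32) !void {
        if (self.len == self.entries.len) return error.OutOfSpace;
        self.entries[self.len] = .{ .uri = uri, .group = group };
        self.len += 1;
    }

    // Walks backward from the top and drops entries until an entry from a
    // different group is reached. Returns how many entries were removed.
    fn popGroup(self: *Stack, group: u32) usize {
        var removed: usize = 0;
        while (self.len > 0 and self.entries[self.len - 1].group == group) {
            self.len -= 1; // a real implementation would call unload() here
            removed += 1;
        }
        return removed;
    }
};
```

Note that this only works when a group's resources are contiguous at the top of the stack, which is one reason the push/pop naming gets questioned later in the thread.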

Open Questions

Future TODOs

EDIT: Oversights


Note: In all instances where multi-threading is mentioned, it's about loading resources in parallel and not about the thread safety of the API.

slimsag commented 2 years ago

I like the design overall. Thanks for starting this!

Questions

Thoughts

Should it be a separate mach module (mach-res) as in something like mach-gpu, mach-audio etc? or just be part of main mach application framework.

For things like this, I suspect it is mostly useful in the context of Mach only, and should be deeply integrated. So this can live in a library at src/resource perhaps, and accessible via @import("mach").resource. What do you think?

I'll have more thoughts on other questions you posed once I learn a bit more from your responses to my questions I think.

iddev5 commented 2 years ago

For the sake of explanation, let's assume that a Collection is a single archive. Having multiple collections really only makes sense in the case of archives, and that's why it's there, but it can also be used with local directory-based structures. For example, say you have a game whose data is contained in an archive called gamedata.res. Since the assets are isolated, you can easily have gamedata as a collection. Take another case, in which the game directory contains textures.res, audio.res, some.dll and game.exe. In this case you need different collections (textures and audio). Note that this can be a deliberate design decision: for a large game, storing resources in separate archives may make more sense. This is the answer to your last question. (Q6)

Do note that you do not need to have multiple collections in your game. You can perfectly well store all resources in one collection/archive (but it may not scale well for organizing a large application). A collection itself internally follows a tree structure, so it can have directories inside.

1) loadCollection should ideally be called before you create any groups. Collections aren't meant to be different for different scenes/chunks; that would be very inefficient. So you should call it just after ResourceManager.init(). If you're thinking about just putting it in init() as a param, that can work I guess.

2) (it's unclear to me) That's exactly how it will work. ResourceGroups themselves don't store resources; they just tell the engine that they want to use them. loadResource and popGroup take care of this. (But now that I'm thinking about it, the names should not be pushGroup and popGroup, since the system can get more complicated when you load resources for a different scene while the current scene is still running or about to end.)

3) While preparing the ResourceManager, you pass it some callbacks (load() and unload()) which will provide you with the raw PNG data, and you are free to use this data inside the function to generate a gpu.Texture. One small oversight here was that we also need to pass in some context (an additional context: *anyopaque param).

To handle reloads, I think there should be a function to force-reload a resource (i.e. reload it even if it's already loaded, freeing and discarding the previous data). This action can be signaled by the ECS. The question now is the API: a function like ``fn ResourceGroup.reloadResource`` similar to how ``loadResource`` looks (i.e. individual to each resource), or a more general ``fn ResourceGroup.reloadResourceType(resource_type_name: []const u8) !void``?
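A possible shape for the callbacks once the context pointer is added, as a sketch only (assuming Zig 0.11 or newer for the single-argument `@ptrCast`). `TextureStore`, `loadTexture` and `unloadTexture` are hypothetical; the store stands in for whatever state a real loader needs, such as a GPU device, and it just records byte lengths instead of creating textures.

```zig
const std = @import("std");

// Assumed variant of the proposal's ResourceType with a context parameter.
const ResourceType = struct {
    name: []const u8,
    load: *const fn (context: *anyopaque, mem: []const u8) anyerror!*anyopaque,
    unload: *const fn (context: *anyopaque, resource: *anyopaque) void,
};

// Hypothetical user state passed as the context.
const TextureStore = struct {
    sizes: [4]usize = [_]usize{0} ** 4,
    count: usize = 0,
};

fn loadTexture(context: *anyopaque, mem: []const u8) anyerror!*anyopaque {
    // Recover the typed context from the opaque pointer.
    const store: *TextureStore = @ptrCast(@alignCast(context));
    if (store.count == store.sizes.len) return error.OutOfSlots;
    const slot = &store.sizes[store.count];
    slot.* = mem.len; // placeholder for decoding and uploading the data
    store.count += 1;
    return @as(*anyopaque, slot);
}

fn unloadTexture(context: *anyopaque, resource: *anyopaque) void {
    _ = context;
    const slot: *usize = @ptrCast(@alignCast(resource));
    slot.* = 0; // placeholder for releasing the resource
}
```

The manager itself never needs to know what the context is; it only forwards the pointer it was given at registration time.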

4) Answer in last paragraph with context. But I am open to better suggestions.

5) Technically speaking, all resources are custom in the eyes of this system. The system just loads a chunk of data (from a file or files), passes it to your provided callback (fn load()) and stores whatever it returns. So I'm unsure how anything would be different for the case you mentioned. All ECS data and such can be easily accessed through the context parameter. How the file is structured is not this system's problem; you should have your own functions to parse that file.

6) Answered in first para.

I suspect it is mostly useful in the context of Mach only

Sounds right. I didn't have an opinion on this, so I decided it was better to ask, just in case.

iddev5 commented 2 years ago

The json collection example I initially created didn't take into consideration having multiple collections (in fact it used the word collection for just any folder). So here's an updated one:

{
    "collections": [
        {
            "name": "texture_collection",
            "tree": [
                {
                    "dir": "player",
                    "tree": [
                        "player_standing.png",
                        "player_running.png"
                    ]
                },
                "tree.png",
                "rock.png"
            ]
        },
        {
            "name": "audio_collection",
            "tree": [
                "c.mp3",
                {
                    "dir": "ambient",
                    "tree": [
                        "chirping.mp3",
                        "wind.mp3"
                    ]
                }
            ]
        }
    ]
}

JSON is not good for manually creating trees.

slimsag commented 2 years ago

OK this makes a lot of sense.

  1. I assume that collections config file format (JSON above) is not actually something end users of Mach would need to deal with, it would be produced automatically via a tool based on directory structure, correct? E.g. in build.zig you invoke something and give it data/ and it finds data/textures, data/audio and produces the config file for those two collections?
    • This being the case, it could be a binary file (not human readable) or a generated Zig code file (I see what you were saying earlier in Matrix chat now), is that right?
  2. I presume textures and audio are only examples, in reality you'd most likely have a single collection for all assets in your game, except in cases like:
    • Modding, theming, etc. where you want users to be able to easily override a specific resource group, providing either an archive of their own (single file) or not (plain directory)
    • "Take for instance its a large game and storing all resources in separate archives makes more sense" -> I assume this would be something like 'we split our resource collections every few GiB because any larger than that and they're a little slow to build/produce' or something like this?
iddev5 commented 2 years ago

1) Yes and no. I think it's better to say that it's optional. So if someone wants, they can provide a list and structure of resources. What are the advantages? Well, I think it would make sense in the case of an editor, where you are too busy so you just throw your .psd files beside the exported .tga but selectively import only the .tga file in your editor asset menu.

2) Yes

slimsag commented 2 years ago

Collections is the abstraction / part of this proposal I feel least confident about. I think it enforces a certain way of working with your game data that may not be very clear, and the benefits are not always obvious.

Being very critical of it, most of what it solves can be resolved in other ways:

Modding

Modding, theming, etc. where you want users to be able to easily override a specific resource group, providing either an archive of their own (single file) or not (plain directory)

I think the best-case scenario for modding would be "Here's my mod directory/archive, it wants to override very specific resources, oh and i might have a few others of those.. just use a.png from one of the mod folders if you find it there first, otherwise fallback to the game's a.png file"

Collections don't seem like they would do this at all: they would only let us override specific resources, and only in aggregate. If you wanted to override a few audio files and texture files in a game, for example, you'd need to provide an entirely new audio and texture collection with all game files in it, rather than just overriding the files you want to modify. Additionally, it's not clear that this system could support multiple mods wanting to override multiple different files.

Chunking

We had discussed this benefit of collections:

building the archive of data for my game takes a really long time, it's over a hundred GiB, we need a way to split it into chunks

But, actually I think it's not a big benefit necessarily. A single file can manage all assets in a reasonable way, so long as the file format of that file is reasonable. A good example of this is Guild Wars, where both the original game and newer version 2 game are distributed as a single exe file which downloads a single gw2.dat file with all files in it. As you navigate the game and need new content, it updates that .dat file with more assets.

Excluding of certain files

I think it would make sense in case of editor where you are too busy so you just throw your .psd files beside the exported .tga but selectively only import the .tga file in your editor asset menu.

We could support this with a .gitignore-type file easily.

slimsag commented 2 years ago

Suggestion: update the proposal with a solidified set of goals and non-goals. We could start with this set:

Goals

Non-goals (to be handled at a later date)

slimsag commented 2 years ago

Suggestion: We could remove the idea of collections entirely, and instead add explicit support to the proposal for exclusion of assets and modding:


In order to access a resource, you use a URI instead of a file path:

Depending on where the application is running, and in what mode (release/debug), behavior will differ by default (but you can choose):

Exclusion of assets

Generally speaking, you put all game assets under a folder called data/. In some cases, it may make sense to have files you want to live alongside your game assets such as .psd or .blend files excluded from being included in your final data.res archive. There will thus be a way (TBD, maybe similar to .gitignore, maybe via build.zig options, maybe something else) to exclude files using patterns.

When excluded, they will not end up in the final data.res and will also not be accessible via the API in debug builds either (to prevent accidentally relying on assets which get excluded in release builds.)
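Since the exclusion mechanism is still TBD, the following is only one possible sketch of pattern-based exclusion. It assumes a deliberately tiny pattern language (either an exact path, or a leading `*` followed by a literal suffix such as `*.psd`), which is far simpler than real .gitignore rules. `matches` and `isExcluded` are hypothetical names.

```zig
const std = @import("std");

// Assumed pattern form: "*<suffix>" matches any path ending in <suffix>;
// anything else must equal the path exactly.
fn matches(pattern: []const u8, path: []const u8) bool {
    if (pattern.len > 0 and pattern[0] == '*') {
        return std.mem.endsWith(u8, path, pattern[1..]);
    }
    return std.mem.eql(u8, pattern, path);
}

// Returns true when any pattern matches, i.e. the file should be left out of
// data.res and also hidden from the API in debug builds.
fn isExcluded(patterns: []const []const u8, path: []const u8) bool {
    for (patterns) |p| {
        if (matches(p, path)) return true;
    }
    return false;
}
```

Running the same check in both the archive build step and the debug-mode loader is what would keep debug and release behavior consistent.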

Modding

To enable resource modding of Mach games/applications generally, the following will occur:

If running natively (not supported in wasm for now), then a mods folder can live alongside data.res:

mods/ can either be a .res file (same format as data.res), or just plain directories (newmod/).

When loading a file, say a.png, first each mod is checked in alphanumeric order for an a.png file to override the game's resource with. If none is found, then a.png is loaded from data.res.
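The lookup order described above might be sketched as follows, assuming Zig 0.11 or newer for `std.mem.sort`. `Mod` and `findOverride` are hypothetical, each mod's contents are modeled as an in-memory list of file names rather than a directory or .res archive, and "alphanumeric order" is assumed here to mean plain byte-wise ordering of mod names.

```zig
const std = @import("std");

// Hypothetical mod entry: a name plus the files it provides.
const Mod = struct {
    name: []const u8,
    files: []const []const u8,

    fn has(self: Mod, file: []const u8) bool {
        for (self.files) |f| {
            if (std.mem.eql(u8, f, file)) return true;
        }
        return false;
    }
};

fn modLessThan(_: void, a: Mod, b: Mod) bool {
    return std.mem.lessThan(u8, a.name, b.name);
}

// Sorts the mods by name, then returns the name of the first mod providing
// `file`, or null when the game's own copy in data.res should be used.
fn findOverride(mods: []Mod, file: []const u8) ?[]const u8 {
    std.mem.sort(Mod, mods, {}, modLessThan);
    for (mods) |m| {
        if (m.has(file)) return m.name;
    }
    return null;
}
```

With this ordering, a mod can override a single file without shipping a whole replacement archive, and multiple mods touching different files coexist naturally.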

slimsag commented 2 years ago

Problem: I think the scoping logic may not be sufficient, OR I don't exactly understand how it should work. I see a few use cases we should support with scopes:

  1. "load level 1, now load level 2 and free level 1 - but only load/free the assets that change between the two levels"
  2. "load level 1, now prepare level 2 for me (I'll switch to it soon and free level 1)"
  3. "I'm building an open-world game, I need to load assets near the player and free assets not near the player myself. Grouping wouldn't hurt, but maybe I really do just need to handle loading/unloading myself entirely"

How could the API support all 3?

slimsag commented 2 years ago

Answers:

is_binary: bool, // May not be needed

I agree, not needed. Detecting if a file is binary (and what that actually means) is notoriously difficult/annoying.

How do we actually recognize what is the type of requested resource?

We need a way to register load functions, right? As in, "here are the bytes of the file, now turn it into a type T" (like PNG -> gpu.Texture) - but I guess even if we had a bunch of these functions registered, we also don't know, based on a given file/bytes, which one to call either.

We could require that such a function be provided to the getResource function (so you pass it the function that knows how to turn PNG bytes -> gpu.Texture with context.) That's the first thing that comes to mind for me, and doesn't seem too bad. Thoughts?

I think file extension-based would be bad, because some resources with the same extension need to be interpreted differently (e.g. .png could be a gpu.Texture, or it could really be a PNG image someone wants to load and handle themselves (such as for a heightmap, or to do something else funky with.) Similarly, .json files might go into different data structures)

Resources which themselves reference external files (like glTF) / resources which are formed of multiple files obj+mtl has been overlooked here.

This may be quite important to sort out.

One small oversight here was that we also need to pass in some context (an additional context: *anyopaque param).

Agreed.

slimsag commented 2 years ago

I think we've gotten all of the major discussion points out on the table, so we can do one of two things (whatever you're comfortable with):

I'm OK with either at this point, I don't want to place a burden of writing out more stuff here on you just for the sake of it.

iddev5 commented 2 years ago

Problem: I think the scoping logic may not be sufficient,

1 and 2 are solved by the function ResourceManager.prepareGroup(). With this you register which resources are needed for the upcoming scene. This function is called when you're about to end the current scene. The whole system will work in a different thread, so this function will not block. When you call popGroup() on the current group, the system already has one more group on top of it, so it will be careful about what to remove.

Do note that pushGroup and popGroup are slightly misleading names, as mentioned earlier.

3) I don't know if I understand it correctly. Is it about being able to individually load any resource? Well, then it will be covered by the low-level API. But if the resource is a registered one, maybe we can provide an additional function to load it directly with the URI?
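Being careful about what to remove amounts to a set difference: only resources that the prepared group does not also need get unloaded. A minimal sketch of that idea follows; `toUnload` is a hypothetical helper that operates on plain URI lists, not on the proposed ResourceGroup type.

```zig
const std = @import("std");

fn contains(list: []const []const u8, uri: []const u8) bool {
    for (list) |item| {
        if (std.mem.eql(u8, item, uri)) return true;
    }
    return false;
}

// Writes into `out` every URI from `current` that `next` does not need and
// returns that prefix of `out`. `out` must hold at least current.len items.
fn toUnload(
    out: [][]const u8,
    current: []const []const u8,
    next: []const []const u8,
) [][]const u8 {
    std.debug.assert(out.len >= current.len);
    var n: usize = 0;
    for (current) |uri| {
        if (!contains(next, uri)) {
            out[n] = uri;
            n += 1;
        }
    }
    return out[0..n];
}
```

The mirror-image difference (URIs in `next` but not in `current`) would be the set that prepareGroup actually has to load.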

iddev5 commented 2 years ago

Alright, I agree with everything here

iddev5 commented 2 years ago

We could require that such a function be provided to the getResource function (so you pass it the function that knows how to turn PNG bytes -> gpu.Texture with context.) That's the first thing that comes to mind for me, and doesn't seem too bad. Thoughts?

The job of converting PNG bytes -> gpu.Texture and so on is performed automatically behind the scenes by the load() function which we provide in ResourceManager.init. What I think can be done here is that if the load() function returns a particular error like error.IncorrectResourceType, then it will move on and try a different resource loader. Plus, a function called checkMagic() can be added beside load() which checks the file magic to figure out the type; if it returns false, try the next type, and so on. This is a trick used in SDL's helper libraries.
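A rough sketch of that checkMagic() fallback, under the assumption that each registered type carries a predicate over the file's leading bytes. `Loader`, `isPng`, `isOgg` and `pickLoader` are hypothetical names; the PNG and Ogg signatures themselves are the standard ones.

```zig
const std = @import("std");

// Hypothetical registered loader: a type name plus a magic-number check.
const Loader = struct {
    name: []const u8,
    checkMagic: *const fn (mem: []const u8) bool,
};

fn isPng(mem: []const u8) bool {
    // Standard 8-byte PNG signature.
    return std.mem.startsWith(u8, mem, "\x89PNG\r\n\x1a\n");
}

fn isOgg(mem: []const u8) bool {
    // Ogg pages begin with the capture pattern "OggS".
    return std.mem.startsWith(u8, mem, "OggS");
}

// Tries each loader in registration order and returns the first whose magic
// check accepts the data, or null when no loader recognizes it.
fn pickLoader(loaders: []const Loader, mem: []const u8) ?[]const u8 {
    for (loaders) |l| {
        if (l.checkMagic(mem)) return l.name;
    }
    return null;
}
```

This identifies the file format but not the intended use, so two loaders for the same format would both match and only the first would ever be picked.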

slimsag commented 2 years ago

3) I don't know if I understand it correctly. Is it about being able to individually load any resource? Well, then it will be covered by the low-level API. But if the resource is a registered one, maybe we can provide an additional function to load it directly with the URI?

Yes, and sounds good. The point is just being able to load/free resources manually, without groups (think basically "I want to implement my own grouping logic on top, can I?")

slimsag commented 2 years ago

The job of converting PNG bytes -> gpu.Texture and so on is performed automatically behind the scenes by the load() function which we provide in ResourceManager.init. What I think can be done here is that if the load() function returns a particular error like error.IncorrectResourceType, then it will move on and try a different resource loader. Plus, a function called checkMagic() can be added beside load() which checks the file magic to figure out the type; if it returns false, try the next type, and so on. This is a trick used in SDL's helper libraries.

The problem with this is we can't handle resources in different ways. Let's say my application needs to do two things:

  1. Load PNGs as gpu.Textures
  2. Load a few PNGs whose pixels I decode myself and use as a heightmap to generate some vertices for heightmap terrain. I cannot use a gpu.Texture and I don't want it to be uploaded to the GPU anyway.

We need some sort of way to support "handle the same resource type in different ways" I think.

The problem is more noticeable when you talk about e.g. loading .json files: you don't want one generic load function for this, you want to be able to handle JSON decoding based on say the resource URI using different functions.

iddev5 commented 2 years ago

Can we use the URI scheme for this, since we aren't using it for anything else right now? Like texture://images/player.png and sheet://images/player_anim.png where "texture" and "sheet" must be the name of any one ResourceType we added in ResourceManager.init().
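Dispatching on the scheme could be as simple as the sketch below. `schemeOf` and `findType` are hypothetical helpers, and `ResourceType` is reduced to just its name for brevity.

```zig
const std = @import("std");

// Reduced stand-in for the proposal's ResourceType; only the name matters here.
const ResourceType = struct {
    name: []const u8,
};

// Returns the part of the URI before "://", or null if there is none.
fn schemeOf(uri: []const u8) ?[]const u8 {
    const idx = std.mem.indexOf(u8, uri, "://") orelse return null;
    if (idx == 0) return null;
    return uri[0..idx];
}

// Looks up the registered type whose name equals the URI's scheme.
fn findType(types: []const ResourceType, uri: []const u8) ?*const ResourceType {
    const scheme = schemeOf(uri) orelse return null;
    for (types) |*t| {
        if (std.mem.eql(u8, t.name, scheme)) return t;
    }
    return null;
}
```

Because the caller names the type explicitly, the same file can be loaded in different ways (for example as a texture and as a sprite sheet) without relying on extensions or magic bytes.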

slimsag commented 2 years ago

That sounds like a great solution! Then we don't have to rely on extension or checkMagic either!

In the future (not part of library), we could also allow for loader functions to be defined as part of mach.Modules, specifying a list of URI schemes and loader functions that implement that.

desttinghim commented 2 years ago

I have some audio-centric questions that I didn't see answered.

silversquirl commented 2 years ago

It's very difficult to ensure stutter-free streaming, since you can't predict where the file comes from. If a user has the game stored on a slow HDD, it may not be able to keep up. Similarly for a WASM game trying to stream over a slow mobile data connection.

desttinghim commented 2 years ago

Right, those are factors that can affect streaming. Obviously it would be impossible to guarantee stutter-free streaming in those situations. There are plenty of other places where it is possible to stream the audio - consoles, most modern computers, phones (I assume? I actually don't know how fast storage is on mobile), etc. I guess the question will then come down to each individual game and what your target audience is. Do you want to target older computers to increase your possible reach, or do you want to push the limits of modern hardware, or something in between?

I mainly wanted to know if that use case had been planned for or bring attention to the possibility if it hadn't.