Asset splitting - Githubissues

raphnet commented 2 years ago

If --allowsplitting is passed on the command-line, when an asset larger than 16kB is encountered, instead of generating an error, assets2banks will split it in multiple 16kB chunks and place those in consecutive banks. All parts will end up being consecutive in ROM.

The string "_PARTx" where x is the part number (starting at 0) will be appended to the default resource name.

For instance, bigfile would be declared as

bigfile_PART0 bigfile_PART1

sverx commented 2 years ago

I admit I was thinking about adding this feature myself, but you've been quicker :sweat_smile:

I'm just not totally sure about the way you handled it. I mean, as you know assets2banks can do some processing on the assets such as manipulate the values, add headers and trailing data, etc.

If I understand your code correctly, this won't happen with assets that needs splitting.

If I'm right, do you think there's a way to make it work again?

It probably requires that the assets that needs splitting are handled in a separate list, and to fully process all of them before all the remaining assets...

Your thoughts on this?

raphnet commented 2 years ago

You are right, header/footer and modify operations are not carried through. I don't use those features, and it was easier to omit them. A quick fix would be to have assets2bank at least emit an error if one attempts to use those features with a split asset.

Otherwise, I can try doing better. Here is the approach I think I would take, let me know what you think.

At the moment, the assets source files are read at the last minute and header/footer and modify operations are carried on the fly as the assets are being outputted in the final format (C files or REL files). In my current modification, I already added a data field to the Asset class and added a load method to read the data from the file. My proposition is to go even further and to:

Read source files for all assets (no longer only for those that need splitting) in advance as assets instances are created.
Move the code to prepend/append footer/header to the Asset class, so the asset itself can return a buffer containing its final modified form (header/footer and modifications applied) - perhaps this could be a applyTransformations() method.
Simplify the output code to work with Assets buffers instead of reading from files and applying modifications.

With the above, by the time the AssetGroups get sorted and then iterated on to be added to Banks, all modifications would have been applied so when needed, assets cold still be split like I already do.

sverx commented 2 years ago

sure this approach would work, what doesn't sell this idea to me is that you don't really need to read data in advance because all you need is the asset original size and then its final size after adding header and footer to calculate placement and/or if splitting is needed (also, an asset could need splitting only after adding an header and/or trailing data...)

if you can work out a way for this improvement to work this way I'd be very happy :smiley:

raphnet commented 2 years ago

Sure, I don't really need to read in advance, but the ramifications of not doing it like this make me really want to ;-)

Is there a reason to avoid reading everything in advance? Perhaps due to RAM issues when working with very large assets (not likely for the SMS)?

Because unless I misunderstand, it seems to me that what you suggest would be a lot more complex and involves a lot of checks and corners cases to handle, enough for me to doubt I'll be able to get those right without too much work.

I mean, there are a lot of annoying corner cases. The split can happen anywhere. What if the split occurs within the footer instead of within the data? The remainder of the footer becomes the header of the following block.. And if a modify operation spans across the split point? Then the two blocks need each a new Modify with correctly updated start offset, length, split list of values...

In contrast, when reading everything in advance, and then performing operations (prepend, append and modify) on complete chunks before splitting, none of those corner cases exist! It feels a lot easier to implement, at least for me...

sverx commented 2 years ago

Because unless I misunderstand, it seems to me that what you suggest would be a lot more complex and involves a lot of checks and corners cases to handle, enough for me to doubt I'll be able to get those right without too much work.

Honestly I don't see them. Once you know the asset file size and how much data will be added by headers and footers you know the asset final size, and you can calculate exactly each part size. Then of course in the second part of the process you'll have to assemble the contents of the asset file and its headers and footers and use the sizes of the parts to create the output files.

Now that I think about it, assets2banks should not just support single files bigger than 16 KB but even AssetGroups bigger than that, because the point of the asset groups was to keep data together (originally in the same bank but we would expand that of course)

What do you think about it?

raphnet commented 2 years ago

Oh, I thought of asset groups as a feature meant to have things in a same bank, this is why I thought it did not make sense to support splitting those and even generated an error in my first patch for this situation. But of course it can be done. I pushed an update which now supports splitting of asset groups. I also fixed header/footer and modification support.

I tried to reconsider, but in the end I did not change my approach, sorry. I read the files, apply modifications and add the header/footer earlier in the process, in fact right before assigning asset groups to banks.

Doing it like this feels more natural to me as a newcomer to a codebase I did not originally write and simplifies the implementation details a lot in my opinion, because as I stated before, you do not need to worry about does the split occur in the header, footer or data, and do not need to worry about Modifies which may overlap the split point. Also, when the code reaches the point where it starts adding asset groups to banks, the filename of the source disk asset and the required header/footer and modifications to apply won't change, they are final, so why insist on postponing those operations to the output routine?

Speaking of the output routine, I moved the file loading, header/footer and modification stuff from the output routine to the Asset class. Removing this from the output routine may be a good thing in the long run. For instance if one day someone (I have a feeling I may want .asm output for wla-dx in the future) wants to add support for other output formats (obj, elf, asm, etc), they can create a function which takes the Bank list in argument and just worry about outputting the bytes in their format, since everything else is done in advance (each asset's data[] member is ready to go!)

sverx commented 2 years ago

Well, you're probably right, and even if I do still believe everything would have worked even without loading the data from the files 'sooner', it is likely that there's no real harm in doing that.

BTW, did you check if your code correctly handles both the assets present in the asset folder but not listed in the config file and those that are instead listed in the config file? I'm mostly worrying about the latter, as you surely need to first read all the attached attributes (such as :format unsigned int for instance) before reading the file contents. If you do read the file(s) contents after having read the whole config file than there should be no problem - and reading your code it seems to me you're doing this already.

What I mean is: as far as the whole process goes following this flow:

read the config file (if present) and prepare assets and asset groups (no file contents read here)
scan the asset folder for any other asset not listed in the config file and prepare assets for those (no file contents read here)
read the contents from the asset files, process them if needed (modify data, prepend header, append footer...)
create banks
write output in the requested output format

I'm fine with it, as I wasn't much worrying about how much RAM we need to keep all the data before writing output files.

Let me know - and thanks for your improvements!

raphnet commented 2 years ago

Yes, it is done exactly in the order of your bullet list. Reading and processing the assets is done right before creating banks, after reading the config file and scanning the directory.

During development, I did test that it works with or without a config file, as well as for mixed use (i.e. some files omitted from the config file). As for attached attributes, I only tested the :header attribute however, but I see no reason for others to work. Unfortunately I do not have a test suite covering all use cases. I was hoping that maybe you had one ;-)

My changes did not break my project, but I am not using config files nor attributes...

sverx commented 2 years ago

I only tested the :header attribute however, but I see no reason for others to work.

LOL! :sweat_smile:

I will run a few tests ASAP but I don't expect problems. I'll then merge this PR.

Thank you so much for adding this feature!

sverx commented 2 years ago

I just realized this part here https://github.com/sverx/devkitSMS/blob/master/assets2banks/src/assets2banks.py#L307 doesn't work correctly with assets with :format unsigned int attribute.

This part can be either fixed or removed.

sverx commented 2 years ago

Also seems like the compiled output is broken (seems again related to the asset group size). I don't know what's going on, I'm quite sure it was working fine...

sverx commented 2 years ago

Not even the not compiled output is working now. Is my computer going crazy? This doesn't make any sense!

sverx commented 2 years ago

I'm an idiot! :sweat_smile:

I had commented the whole block:

for ag in AssetGroupList:
    for a in ag.assets:
        a.process()
        if len(a.data) != a.size:
            print("Fatal: Internal error")
            sys.exit(1)

thus removing the process call completely.

So we just need to fix the check for the :format unsigned int attribute, likely

... sorry... :bowing_man:

sverx commented 2 years ago

commenting only this part now:

        # ~ if len(a.data) != a.size:
            # ~ print("Fatal: Internal error")
            # ~ sys.exit(1)

and everything seems fine

sverx commented 2 years ago

this seems to be enough to fix the issue:

        if (a.style == 0 and len(a.data) != a.size) or (a.style == 1 and len(a.data) != a.size/2):
            print("Fatal: Internal error")
            sys.exit(1)

sverx commented 2 years ago

Commit pushed. Sorry for all this mess of messages :sweat_smile:

sverx / devkitSMS

Asset splitting #35