profi200 / open_agb_firm

open_agb_firm is a bare metal app for running GBA homebrew/games using the 3DS builtin GBA hardware.
GNU General Public License v3.0

Creating a save type database. #9

Open profi200 opened 4 years ago

profi200 commented 4 years ago

Some people probably noticed when i pushed this file. The idea is to make a database of save types for all GBA games ever released, numbered using the no-intro.org database. Similar databases (script to convert) exist but there are a number of problems with them. They are not numbered using the no-intro db and i found several broken games after testing only a handful of them (mostly the less popular games).

The big question: Why? Because all save type detection attempts so far didn't give satisfying results. The biggest problem is EEPROM games, where i just can't properly detect whether the chip is 4k or 64k.

I surely won't go over the almost 3000 games alone so i want to make this a community effort.

Rules:

To make this easier i will probably make a build including a save type selector later. This is subject to change.

TurdPooCharger commented 4 years ago

This submission is a rough draft. Treat the spreadsheet as a starting template.

I might be able to later help verify those listed games and their compatible save types.

Currently, I don't have a large enough capacity SD card that makes it possible to batch install 300 GBA VC injections and have made prior arrangements to debug test other homebrews.

profi200 commented 4 years ago

That's quite a lot of games already. Thanks.

Release number seems to be missing for a few but otherwise ok. The difference between EEPROM save types 0 and 1 (and 2 and 3) is the ROM size. The latter type is usually used when the ROM is bigger than 16 MiB. Should probably include that. The SDK save strings are often misleading btw. Which is why we have this problem in the first place. The ROM header doesn't tell us the save type either.

Also yes, usually the flash manufacturer doesn't matter but if we are building such a db we might as well use the correct one used in real carts.
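
For readers following along, the EEPROM numbering described in this comment can be sketched as below. This is only an illustration: the function name is hypothetical, and since the comment says the second variant is "usually" used for ROMs bigger than 16 MiB, the result is a first guess, not a rule.

```python
# Hypothetical helper for illustration; not code from open_agb_firm.
SIXTEEN_MIB = 16 * 1024 * 1024


def guess_eeprom_type(rom_size: int, is_64k: bool) -> int:
    """Return the EEPROM save type number (0-3) suggested by the ROM size.

    Types 0/1 form the smaller EEPROM pair and 2/3 the 64k pair. Within a
    pair, the second number is usually the one for ROMs bigger than 16 MiB.
    """
    base = 2 if is_64k else 0
    return base + (1 if rom_size > SIXTEEN_MIB else 0)


# Example: a 32 MiB game with a 64k EEPROM maps to type 3.
print(guess_eeprom_type(32 * 1024 * 1024, is_64k=True))
```

Any game where this guess fails still has to be tested by hand.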

TurdPooCharger commented 4 years ago

I went back and edited the spreadsheet for those 32 MB (256 Mbit) games with EEPROM save types to correspond to 0x1 or 0x3. See my above first post for the updated list.

In the sorted tab, the numbers in the first column next to Region do not reflect the DAT-o-MATIC numbers. They were pulled from ADVANsCEne in the order they were listed when searched with "AGB-". There wasn't an easy way to separate their lists of GBA Release, Pure, and X-Files.

Here's the gba dat list added to another spreadsheet. If tagging those 1712 titles listed in sorted is absolutely necessary, this will have to be later revisited and manually added in to the first .xlsx file.

There are additional titles on D-o-M not found on AS. I went ahead and added those to sorted.

profi200 commented 4 years ago

Meanwhile i'm fiddling with parsing the no-intro dat file i got or rather a version converted to json to auto generate every entry of the header file. In theory it's then as easy as just filling in the missing save types.

profi200 commented 4 years ago

Got a good looking list out of my crappy converter. Only problem is some betas and prototypes use the same game code.

Warning. Huge list: https://gist.github.com/profi200/f6788190c6cec54f187cfa11ef1ed32c

SirLoopy commented 4 years ago

Would there be a benefit to adding a CRC type code to each entry for verification on game load? This could help when multiple betas/protos use the same code. Also could help identify invalid roms.

profi200 commented 4 years ago

CRC is useless for bigger amounts of data imho. You can get collisions very easily. For smaller data it's fine. There is no acceleration for CRC32 on the 3DS while there is for SHA1 and SHA256. At least the db contains SHA1 hashes.

For now this list above is to show how it could look. I think this entry layout looks good and it's easy to find releases. We will see how to solve the same game code issue. Possibly i will also alter the save type values to make it more universal for emulators and not just this project.

profi200 commented 4 years ago

Ok, i think we have a solution for the problem. The idea came from Wolfvak. Instead of searching linearly by the game code, we do a binary search using the hash of the ROM. This will require some changes to the db format and unfortunately means separate entries for each region/language of a game instead of all in one entry. Also, to avoid hardcoding such a huge db, it will be moved to a file on the SD card where it can be independently updated.

I will not be doing much in the next 7 or so days because we have a heat wave. It's just too hot.

profi200 commented 4 years ago

A little update: It's still very hot here and that is not going to change until the end of the week. I will try to work on this in the evenings.

profi200 commented 4 years ago

It's been quite some time. I finally made the promised build so people can start creating the save type database.

Place "gba_db.bin" (SHA1 2f28d3338b27a21ae4f404bc3c0e8b4a21edda23) on the root of the SD card and launch the included FIRM file. You will get a small dialog for save type selection after choosing a game. This dialog will show what save type is currently saved in the db file (none by default) and what the auto detection thinks. Don't rely on auto detection but use it as a guidance which category of save hardware the game uses (EEPROM, Flash, S/FRAM). Each selectable type has the numbers listed so you can match them to what is printed as being in the database.

Keep the rules at the bottom of the touchscreen in mind when trying save types. Make backups of the database sometimes in case you screwed up or it gets corrupted. If a game doesn't work with any save type, report it here.
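
Before copying a database file to the SD card, it can be checked against the SHA1 posted with the build. A minimal sketch using Python's standard `hashlib`; the function names are made up for this example:

```python
import hashlib


def sha1_of_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the lowercase hex SHA-1 digest of the file at `path`."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # Read in chunks so large files don't have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_posted_hash(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA-1 with a hash copied from a post."""
    return sha1_of_file(path) == expected_hex.strip().lower()


# Usage (the path is whatever your download location is):
# matches_posted_hash("gba_db.bin", "2f28d3338b27a21ae4f404bc3c0e8b4a21edda23")
```

The same check works for keeping track of which database backup is which.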

Removed build. See below for a less buggy one.

dicelander commented 4 years ago

Hi! I know this is still an early alpha and everything is hardcoded, but is a build that lets you select the save type still on the roadmap? Or will you stick with this approach of using a database for detecting save types?

I'm trying this build for building a database and have been attempting a few games so far, but I'm asking because I like messing with romhacking and game patching, so I've noticed that if it doesn't find the hash in the database it doesn't allow you to select a save type. Could it be made so that it allows selecting it anyway if the hash isn't found, but just doesn't modify the database (as it is only for no-intro games)?

Wouldn't it be easier to build a database based on user selection on the fly instead of having a hardcoded one? Maybe having a separated config file (something like "RomFileName.cfg") for games for which the user has changed options that deviate from the autodetected ones?

Thanks a lot. I'll be sure to contribute to the database anyway with my no-intro verified dumps.

Edit: Also, Classic NES Series - The Legend of Zelda only works with Flash 512k RTC, is that really supposed to happen? I know classic nes series games have some weird stuff going on, but... real time clock?

profi200 commented 4 years ago

Per game save type override is coming later. Having a reliable database to make all the original games work is higher priority right now.

That doesn't sound right. All the Classic NES Series games use EEPROM 4k iirc. I will check that.

EDIT: Yeah, ok. There was a bug. As for Zelda it works with EEPROM 64k only it seems.

@TurdPooCharger Mentioning you in case you are still interested in helping.

If anyone is using the above build please replace it immediately. gba_db.bin SHA1 bc376a624f214d70b404d8d2afae42d5b26a7fb9 [removed old build]

profi200 commented 4 years ago

A little update:

I will collect database files and merge them as needed. I will see if i can give status updates on which games still need testing and which are working either using a GitHub gist or a .csv table you can open in any office program.

gba_db.bin SHA1 daf3c0683526efef0c5d3935d44b9c55514e3d84 open_agb_firm_save_db_build3.zip

dicelander commented 4 years ago

Another weird thing I've noticed: When using this save_db build, for some games (I've noticed it on Classic NES Series Dr. mario and Classic NES Series Bomberman), it incorrectly auto detects the save type as 2 - EEPROM 64k, and I set it to EEPROM 4k/8k. The game doesn't even start (shows error screen) when save type is 2.

But, when using the release build, the game starts normally. Shouldn't it give an error? Or is the release build, when it detects EEPROM, defaulting EEPROM saves to type 0 even when it detects type 2?

Edit: I didn't express it very well, so basically: Release build: game starts and saves correctly, no info about save detection. save_db build: detects save type 2 when it's actually (0,1). So I assumed the release build would also detect and try to use type 2 and the game would give an error.

profi200 commented 4 years ago

As explained above the auto detection is flawed. It can be considered a hint only which tells you which save technology the game uses (EEPROM, Flash, SRAM or FRAM where SRAM and FRAM are basically the same). It works for some games in the release builds because i have a small list of overrides for specific games and it will also force EEPROM 4k/8k for Classic NES Series games.

The entire point of this special build is to collect the correct save types for every existing game ;)

dicelander commented 4 years ago

Ah, so there are overrides already in place for some games in the release builds. I see, thanks! I was wondering if it was some kind of bug that could help in solving the save detection issue.

Also, is support for games that use RTC and EEPROM (boktai series are the only ones I know that use this combo, so patches are needed anyway) possible or there is some kind of limitation?

Thanks!

profi200 commented 4 years ago

There is no hardware support for this combination. It will probably need patches. https://github.com/profi200/open_agb_firm/blob/master/include/hardware/lgy.h#L23-L38

As for the override list: https://github.com/profi200/open_agb_firm/blob/kernel_experiments/source/arm11/main.c#L157-L160 https://github.com/profi200/open_agb_firm/blob/kernel_experiments/source/arm11/main.c#L168-L172

profi200 commented 3 years ago

I should give an update for this.

I'm a little disappointed. Everyone wants a proper AGB_FIRM replacement/alternative with more features than the original but no one wants to help with the save type db. I'm about 1400 in the list and had to go through some pretty weird games and i can't even read Japanese.

I will continue but i surely will not spend all my free time to go through the list of games.

Kirit29 commented 3 years ago

I completely forgot about this. Umm is there a newer build to test the saves or is the one linked up above still fine? Also do you have an updated savetype db.bin? I may be able to help out in the coming weeks. Although I can't read Japanese either unfortunately.

Masamune3210 commented 3 years ago

I literally just found this project recently, if you can outline or point out an outline on how to go through the process, I could probably start going through games in my free time, as I have access to most of the library of GBA games

profi200 commented 3 years ago

I will upload a new build and the current db later.

As for how it works it's pretty simple usually. Auto detect tells you what it thinks the save type is. For example EEPROM 64k. That tells you the save type is most likely one of the 2 EEPROM types. You start with EEPROM 4k and test if the game is properly saving and loading the save. If not you go one size higher and delete the old save with X in case any is found. If the game is still not saving you can try other types (smallest first).

If the game uses any extra hardware like tilt sensor or RTC note that down somewhere. If you can't figure out how to save you can load the game in mGBA and see what size save file it creates. 512 bytes is EEPROM 4k, 8 KiB (or KB if you prefer) is EEPROM 64k... and so on. In rare cases a game might not work at all or is very glitchy. Note that down and move on.

If auto detect says the save type is a flash save type with RTC do not set it to flash + RTC immediately. Try without first. The auto detect assumes all flash types have RTC because it can't detect if the game uses RTC.
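
The mGBA size check mentioned above can be written down as a small lookup. Only the two EEPROM rows come from this comment. The SRAM and flash rows are filled in from commonly documented GBA save sizes and are an assumption here, as are all names in the sketch.

```python
# Save file sizes in bytes mapped to a save type hint.
# Only the EEPROM rows are taken from the thread; the rest are assumptions
# based on the usual GBA save sizes and should be double-checked.
SAVE_SIZE_HINTS = {
    512: "EEPROM 4k",
    8 * 1024: "EEPROM 64k",
    32 * 1024: "SRAM/FRAM",
    64 * 1024: "Flash 512k",
    128 * 1024: "Flash 1m",
}


def hint_from_save_size(size_bytes: int) -> str:
    """Return a save type hint for an emulator save file of this size."""
    return SAVE_SIZE_HINTS.get(size_bytes, "unknown")


# Usage with a real file:
#   import os
#   hint_from_save_size(os.path.getsize("game.sav"))
```

The size says nothing about RTC, and a size that is not in the table should be treated as "unknown" instead of being rounded to the nearest entry.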

If you note problematic games or ones with extra hardware down at least note the no-intro release number and the problem. I have a .txt file for this.

Also a tip. Often you can avoid going ingame by changing settings/options and seeing if the changes persist across reboots of the game. If they don't with any save type, continue ingame. If the game also says saving failed that obviously means the save type is wrong.

edit: New build: open_agb_firm_save_db_build4.zip gba_db.bin SHA1 231690f25497b857084ef061b8580b2c29d9781f

Current notes:

dir | release | problem/notes
2   | 0059    | Multiple save SDK strings but actually legit SRAM? Detects as EEPROM.
5   | 0246    | Not in db. Corrupt dump.
9   | 0405    | This game has a tilt sensor. Needs a patch.
12  | 0574    | Can't get to the first save point. EEPROM 4k according to mGBA.
17  | 0810    | Can't get the highscore. EEPROM 4k according to mGBA.
17  | 0814    | How the fuck do i save? EEPROM 4k according to mGBA.
20  | 0987    | No, just no. Just let me save. EEPROM 64k according to mGBA.
21  | 1019    | "Plust Gate" extra hardware? SRAM according to mGBA.
21  | 1021    | Saves with both EEPROM types wtf. EEPROM 64k according to mGBA. But saves within the first 0xF8 bytes so EEPROM 4k?
22  | 1053    | How the fuck do i save? EEPROM 64k according to mGBA.
22  | 1082    | How the fuck do i save? EEPROM 64k according to mGBA.
22  | 1085    | Needs solar sensor patches.
24  | 1191    | Falsely detected as EEPROM? Doesn't save anywhere. At least one minigame uses passwords.
25  | 1240    | EEPROM 4k but game acts very buggy/glitchy with static sounds. Works with ROM mirroring. Bad dump?
26  | 1252    | How the fuck do i save? Writes something on new game only with EEPROM 4k. EEPROM 4k according to mGBA.
27  | 1302    | EEPROM SDK string but uses a password system. mGBA detects nothing.
28  | 1354    | Should not this release be called "Corvette Anniversary"?
28  | 1385    | How the fuck do i save? EEPROM 4k according to mGBA.

htv04 commented 3 years ago

@profi200

Okay, on my fork, I made a change that made it so that every EEPROM type defaulted to 8k since that was the more common variant for all of the EEPROM types. I was also looking for faster ways to add save type exceptions to open_agb_firm, and I came across this issue and found @TurdPooCharger's Excel sheet.

After some tinkering, I came up with this function, which looks for "EEPROM v(any) - 64 kbit" entries and generates a string that can be copied/pasted into the exception list if it finds one: =IF(COUNTIF(TRIM(G1), "Eeprom v??? - 64 kbit"), "{""" & MID(F1,5,4) & """, SAVE_TYPE_EEPROM_64k}, // " & TRIM(C1),)

However, it's probably not feasible in the long run to add all of these, especially when there's a database in the works. So I thought of a better idea, formatting the data into a database as you made. Only I would have to do it differently since TurdPooCharger's Excel sheet doesn't contain the SHA-1 hashes.

To format my custom database, I thought of the most critical things needed in a save type database, the title ID, and the save type. The other things your database has, like the SHA-1 (at least I think it's the SHA-1?) and the game title are unnecessary since the SHA-1 can potentially trip up ROM hacks (and overrides for ROM hacks that change the save type can be left for the user to add) and the game title isn't used by anything. Then, I created another function (=IF(COUNTIF(TRIM(G1), "Eeprom v??? - 64 kbit"), MID(F1,5,4) & ",EEPROM_64k",)) that separated the needed data in the CSV format. After this, I copied/pasted the column with the generated entries into a text editor, used it to delete all of the blank lines in between the entries.

This is the result.

I don't know how hard it would be to implement a CSV parser into open_agb_firm, but the format is very simple, and probably better to use since it doesn't waste space. If not, it can be converted to a DB like yours, I think.
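
To show how little parsing the two-column format needs, here is a sketch in Python, not in the C that open_agb_firm would actually require. The game codes in the example are placeholders, and the function name is hypothetical.

```python
import csv
import io


def parse_save_csv(text: str) -> dict:
    """Parse lines of the form 'CODE,SAVE_TYPE' into a dict keyed by code."""
    overrides = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        game_code, save_type = (field.strip() for field in row)
        if game_code and save_type:
            overrides[game_code] = save_type
    return overrides


# Placeholder codes, not real game entries.
sample = "AAAA,EEPROM_64k\n\nBBBB,EEPROM_64k\n"
print(parse_save_csv(sample))
```

A later duplicate code simply overwrites the earlier one, which is also where this format runs into trouble when several releases share a code.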

profi200 commented 3 years ago

The db contains unnecessary data to make conversion back to a human readable format easier. That bloat will be removed later. I plan to convert the db to json later and host that maybe in a separate repo. The binary format in oaf will stay since it's required for fast lookup. The way the db is ordered right now is by the number given by the first 8 bytes of the SHA1 hash (entries ascending). This way i can do a binary search on the entries. And the game code is not helpful in identifying a game either because that often collides with betas and prototypes i have found. I want to support not just retail games.
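
The lookup described here can be sketched roughly as follows. This is not the actual oaf code or file format: the in-memory lists, the function names, and reading the 8 key bytes as a big-endian integer are all assumptions made for the example.

```python
import bisect
import hashlib


def sha1_key(rom: bytes) -> int:
    """Sort key: the first 8 bytes of the ROM's SHA-1 as an integer.

    Assumption: the bytes are interpreted as big-endian.
    """
    return int.from_bytes(hashlib.sha1(rom).digest()[:8], "big")


def build_db(roms_with_types):
    """Return (keys, save_types) with entries in ascending key order."""
    pairs = sorted((sha1_key(rom), save_type) for rom, save_type in roms_with_types)
    return [k for k, _ in pairs], [t for _, t in pairs]


def find_save_type(keys, save_types, rom: bytes):
    """Binary search for the ROM's key; return None when there is no entry."""
    key = sha1_key(rom)
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return save_types[i]
    return None


# Stand-in byte strings instead of real ROM images.
keys, types = build_db([(b"rom-a", 0), (b"rom-b", 2), (b"rom-c", 3)])
print(find_save_type(keys, types, b"rom-b"))
```

With roughly 3000 entries, a binary search needs about 12 comparisons per lookup instead of up to 3000 for a linear scan.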

And as i mentioned the SDK string doesn't help at all to identify the EEPROM size. It only tells you it's EEPROM and which SDK version the game was made with. mGBA is a better way of getting many save types fast but it's not always accurate either (found a few edge cases).

Meanwhile i'm at 1500. Didn't do many games since i was using the time for other things.

htv04 commented 3 years ago

I see. Does your database have any other games than EEPROM? The database I generated in my post has all 436 EEPROM 64k entries from TurdPooCharger's Excel sheet. I might be able to extend the function I made to support the 8k entries and other save types as needed (in fact, do we need a database for any games other than the ones that use EEPROM).

As for the SHA-1 hashes, I suppose if those are needed, I'll try to add them in.

If you want to stick with the DB format, I guess I can try to format my database for it. What tool do you use to export it (or do you add entries manually)?

htv04 commented 3 years ago

@profi200

Update.

I was able to completely reverse-engineer your gba_db.bin so that I could recreate it. Since TurdPooCharger's Excel sheet didn't have enough info to add everything needed for the database, I resorted to MAME's gba.xml, which had everything I needed.

Then, after an hour or so of learning how the hell xml.etree.ElementTree works, I wrote a Python script that completely parses MAME's gba.xml into a gba_db.bin file compatible with open_agb_firm.
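
For anyone who wants to try the same approach, the parsing step could look roughly like this. It is not the script from the repo. The embedded XML is a made-up fragment shaped like a MAME software list as commonly laid out, so the element and attribute names are assumptions to verify against the real gba.xml.

```python
import xml.etree.ElementTree as ET

# Made-up sample entry; the hash and names are placeholders.
SAMPLE = """<softwarelist name="gba">
  <software name="example">
    <description>Example Game (World)</description>
    <part name="cart" interface="gba_cart">
      <feature name="slot" value="gba_eeprom_64k"/>
      <dataarea name="rom" size="4194304">
        <rom name="example.bin" size="4194304"
             sha1="0123456789abcdef0123456789abcdef01234567"/>
      </dataarea>
    </part>
  </software>
</softwarelist>"""


def parse_softlist(xml_text: str):
    """Return a list of (sha1_hex, rom_size, slot) tuples, one per ROM."""
    root = ET.fromstring(xml_text)
    results = []
    for software in root.iter("software"):
        slot = None  # stays None when the entry has no save type feature
        for feature in software.iter("feature"):
            if feature.get("name") == "slot":
                slot = feature.get("value")
        for rom in software.iter("rom"):
            sha1 = rom.get("sha1")
            if sha1:
                # Base 0 accepts both decimal and 0x-prefixed sizes.
                size = int(rom.get("size", "0"), 0)
                results.append((sha1.lower(), size, slot))
    return results


print(parse_softlist(SAMPLE))
```

Entries that come back with `slot` set to `None` are the ones with a missing save type and would need manual testing.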

I also included the resulting gba_db.bin in the repo, if you want to try it out. There are exactly 3,011 entries. It seems to be just a little smaller than yours, so maybe the MAME database is missing a few games that your database has. I noticed that some of the entries you had were not given a save type when they had one, though. Later on, I could try to modify the script to append any missing entries from the database into the resulting gba_db.bin.

Hope this helps!

profi200 commented 3 years ago

I have seen this db/XML before but seeing games with missing save type entry right at the beginning was not very convincing. And not sure how up to date this is compared to no-intro. That's why i ditched the idea of using it.

My db contains all games and not just EEPROM. It's probably also not up to date anymore with the latest no-intro changes. That's another reason why i included seemingly unnecessary data. Makes it a lot easier to match against an updated db.

And yeah, for now i encoded the ROM size in log2 within the same field as the save type. This "attribute" field eventually gets even more bits but i have not decided on a layout yet. Basically it should contain extra bits that say what extra hardware a game uses or if it supports special features like Game Boy Player.
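
As an illustration of packing log2 of the ROM size next to the save type, here is one possible bit layout. It is purely hypothetical, since the comment says the real layout has not been decided: the low 4 bits hold the save type and the bits above hold log2 of the ROM size.

```python
# Hypothetical layout for illustration only; not the gba_db.bin format.
SAVE_TYPE_BITS = 4
SAVE_TYPE_MASK = (1 << SAVE_TYPE_BITS) - 1


def pack_attr(save_type: int, rom_size: int) -> int:
    """Pack a save type and a power-of-two ROM size into one integer."""
    if rom_size <= 0 or rom_size & (rom_size - 1):
        raise ValueError("ROM size must be a power of two")
    if not 0 <= save_type <= SAVE_TYPE_MASK:
        raise ValueError("save type does not fit in the field")
    log2_size = rom_size.bit_length() - 1
    return (log2_size << SAVE_TYPE_BITS) | save_type


def unpack_attr(attr: int):
    """Return (save_type, rom_size) from a packed attribute value."""
    return attr & SAVE_TYPE_MASK, 1 << (attr >> SAVE_TYPE_BITS)


packed = pack_attr(2, 8 * 1024 * 1024)
print(packed, unpack_attr(packed))
```

Storing log2 works because the size is only kept as a power of two, and any spare upper bits could later carry flags for extra hardware.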

I will see later if i can import most of the missing entries from your db. But i want to go over the >3000 games/betas/prototypes anyway to make sure it's as accurate as possible.

htv04 commented 3 years ago

Compared to No-Intro, it's probably way more up-to-date. No-Intro seems to have a lot of wrong save type entries. For example, one of the games in your problem list with scene number 0059 actually uses SRAM and is accurately reflected in the MAME db while it's wrong in the No-Intro db (shows up as EEPROM).

Additionally, there was an interesting save type it had for a few games, namely the one you mentioned that worked with both 4k and 64k EEPROM types. The save type was just "gba_eeprom," instead of "gba_eeprom_4k" or "gba_eeprom_64k." However, the MAME GBA emulator seems to just take it in as 4k, so I adjusted my script to match it to the 8k save type (change not committed yet as of writing).

My script grabs all of the known save types now, not just EEPROM like with the previous CSV db I had. Which games were missing from the MAME db? The only things I really saw missing were the Virtual Console versions of some of the games, and the db has a lot of "bad dump" versions of games I want to remove (might add No-Intro DAT support for filtering the games and giving them proper titles by matching their SHA-1 hashes).

As for making sure the db is as complete as possible, I don't really see a problem with that, but imo I think an alpha update should be released with gba_db.bin support, since most of the games people would play through open_agb_firm are now supported.

profi200 commented 3 years ago

I think you misunderstood that. no-intro doesn't really track save types (with some entries being the exception) and it's not in the exportable XML either which is why i don't bother. But the db i want to create follows their numbering and naming schemes.

htv04 commented 3 years ago

Oh, I see, sorry. Anyways, I can make a pull request with all the changes from my repo, if you want. It's mostly minor things, like the addition of the resources folder for the db build script and db file, as well as some save exception and other QOL changes.

pistonfish commented 2 years ago

Wouldn't it be better to use IDs as an identifier for the games instead of hashes? That would make the whole thing much easier for modified games and the used database could be much smaller.

profi200 commented 2 years ago

There are multiple reasons for using hashes:

pistonfish commented 2 years ago

The software also starts games that are not in the dat file. I also think that the end user should be responsible to get working files. I see that using the id alone is not perfect but using the hashes is not ideal either. It might be a little personal, but I use modified games such as rom hacks and trims much more often than prototypes and demos.

profi200 commented 2 years ago

Trimmed ROMs will work since they get automatically untrimmed while loading. Not convinced regarding responsibility. I'm 100% sure people will blame oaf for their corrupted files not working and open more issues. The main deal breaker for me is the database lookup. I don't want to search through thousands of entries linearly.

The only compromise i can offer is to fall back to searching for the serial if no match was found + a warning. I'm not yet sure how that will work because the db doesn't tell me if an entry is for a prototype/demo or not.
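
The proposed compromise could be sketched like this. Plain dicts stand in for the real database, and every name below is hypothetical. The open question from the comment remains: nothing here can tell whether a serial match belongs to a prototype or demo.

```python
import warnings


def lookup_save_type(sha1_hex: str, serial: str, by_hash: dict, by_serial: dict):
    """Look up by ROM hash first, then fall back to the serial with a warning."""
    entry = by_hash.get(sha1_hex)
    if entry is not None:
        return entry
    entry = by_serial.get(serial)
    if entry is not None:
        # The ROM is modified or unknown, so the match is only a guess.
        warnings.warn(
            f"No hash match; using save type for serial {serial!r}. "
            "This may be wrong for ROM hacks, prototypes and demos."
        )
    return entry


# Placeholder data, not real database entries.
by_hash = {"aa": 2}
by_serial = {"AAAA": 0}
print(lookup_save_type("aa", "AAAA", by_hash, by_serial))
```

Returning `None` when both lookups fail leaves room for the planned per-game override or for auto detection.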