Open i30817 opened 6 years ago
This is a fair point. These duplicate entries being shown need a solution of some sorts.
There is also the case where two or more different dump groups have the same exact rom (hashcode too obviously). In that case my recommendation is to ... let the more popular/respected group win the race (ie: no intro would always win over goodtools).
In fact i'd suggest duplicated entries like that don't even get into the database, just one canonical for the most respected group atm.
(this might actually already being done).
Serials are useful when you want the ambiguity (for instance, game configs that apply to multiple versions of the game because they're all broken or enhanced the same way), or displaying the same cover for a single 'key' (although this is sometimes a bit inaccurate for extensive hacks, which are rare). So they're still useful in the database but more as a secondary key.
You know what, i'm going to give a suggestion on the assumption you're not going to decide to revert the serial scanning.
Give a option, either:
You or a 3rd party would still be able to bundle game specific configuration bundles if the serial was treated as primary key (say for fixing letterboxing in ps1 games or frame delay or having arcade cabinet overlays) but they'd would have false positives for certain hacks configurations (which would require 'specific' overriding, ie: if the crc/md5 is known and matches, overlay these changes).
Basically don't assume that just because you have a serial match on the database that the hashcode is the same and think case by case if the operation requires a unique rom.
Leaving things 'lazy' on 1 might be a acceptable result, if you put in a little warning on the user trying a feature requiring exact matches on roms that are 'large enough' on size 'this feature requires a long crc calculation Y/N?'
This way, features would all be accessible, but would have a small delay the first time around or a warning that the delay is going to be much longer. Hack names and descriptions would be the main victim of this idea and the reason why I feel the 2. is still necessary, because 1. would give the wrong result right away for those things.
Alternative to 2 that would work with 1. as the only (invisible) option, force games which have known hacks to be identified by hash:
Keep the hack database separate from the 'published rom' database. After finding a serial check first if it's in the hack database (it would need to be modified to get them but it's feasible). If so require a hashcode calculation before creating the playlist entry and check if it isn't a hack with that before checking the 'published' database (problem: not all hacks are in the database, but that would just mean a coder decision, either assume the serial is a unknown variation of a original published rom for the 'lazy' info of 1. or not put a new entry on the playlist at all and if you're security paranoid even warn the user that romhack is fishy).
This is not as slow as it seems because outside of the snes and nes there aren't really hundreds of hacks (especially cd games which are the slowest offenders).
The reddit discussion also made me realize something else. Having a serial scanner is even worse for 'security' than accepting hacks with the right crcs from romhacking.net, which was the shade being thrown at me (not as bad as listing anything with the right extension though).
Any ROM found anywhere that has the right serial could be confused with a whitelisted version of the original game, when if you kept using the hashes and avoid anything that doesn't match, they'd at least have to put a malicious rom on romhacking.net (and when caught, it would be removed and removed from libretro-database, like the hacks with game ending bugs - at least if people - ie: me - keeps using rhdndat to validate dats).
have these of fbalpha, nes, snes and others. considering playlists are created based on hash, the other hashes should not show. (posting here to reconfirm this issue)
I noticed this too, it seems dependent on the game more than the core as some games won't cause it.
This is pretty backward and weird that it hasn't been addressed for so long. Just don't use the serial as a UID, it makes no sense because it doesn't work.
It's fast and that's all that matters to (l)users. Until there is a option to opt-out of the insanity this will still happen.
My main beef with it - besides not being given the option - is how it's warping the functionality of both RA and libretro around this 'heuristic' model. The database becomes almost useless for advanced metadata, not that it has any. Even hack names can't reliably be identified, even if they're there.
Causing bugs like this is bad too ofc.
It's fast and that's all that matters to (l)users.
It depends, doesn't it? Even some users scanning folders full with gigabytes of isos may be more inclined to be patient and have their collection properly arranged, but I understand the performance benefit - especially if you're going to scan a folder multiple times.
Until there is a option to opt-out of the insanity this will still happen.
An opt-out option at the very least would be nice. At least for single file scanning it could consider calculating a checksum by default (with serial as opt-in), and for folder-wide scan being opt-out.
I'm curious to see some benchmarks on crc32 across platforms and file sizes, it will obviously be a whole lot slower than seeking a few bytes at an offset for sure.
oh, it takes hours, depending if you have a 10 years+ harddisk instead of a ssd. But with the magic of fileystem extended attributes you could only do it once. And as a bonus you could even make softpatches record the 'correct' hardpatched hash, with some trickery.
xattrib memoization doesn't work for hardlinks of ROMs unfortunately, because linux was stupid here, but it may work if your filesystem does native deduplication (not sure about this myself, i only verified that changing a xattrib on a hardlink changed it too on the original).
Or you could be a bit more intelligent when scanning playlists and reusing entries crcs whose rom file last modification is > than the older playlist creation time and have the same path, that would probably be much more portable. Then for softpatches you'd make tool or option or extra property to modify/record the hardpatched CRC in the playlist (modify and not recreate so that 'rom file last modification is > than the older playlist creation time' keeps happening).
'Only 2 hard problems in computing science, naming, caching and off-by-one'.
First and foremost consider this:
Description
Some rom information that comes from libretro-database is being duplicated on RA rom information page.
for instance this is a megadrive playlist
Alien Soldier (Europe)
:There are more hashsums listed but not visible (about 7 of them all different as far as i can tell). The top is also obviously a result of a merge of a different style of dat since it includes the serial.
The real rom has crc 0496e06c (the last listing on the screenshot and also the 'first' in a different style, of info, probably because take from another type of dat).
Other crcs (not listing md5 to simplify) are 0A9C04C9, 9B1A1D7F, B5CB1CEB, 3F5746BA, 3DB10C48, C4EB3F76
None of these other crcs appear on the original no-intro dat (but i have a slightly older one as should be obvious on the first entry of the list bellow), but it's definitely the result of using serials to 'identify' games and ending up with multiple versions of the game on the 'info' instead of the right version.
grep on a repository of the libretro-database confirms it
Anyway, it's extremely sketchy to use serials to 'identify' roms especially if RA is going to try to merge all possible dumps on the database - not that only using no-intro would help, for instance the first on that list is the no-intro virtual console version.
Expected behavior
Stop using serials to uniquely identify games. Use them as shared data foreign keys (like pictures of covers) but not for 'actually unique' information like hashcodes or other properties on the per-game entries on dats. A single rom should give a single result, end of story.
Actual behavior
Trying to accommodate every rom version and not calculate hashcodes one time 'because it's slower to scan' leads to this, among other things like hacks getting identified as the 'game' and not using their actual database entries.
Even if you completely hid the info page just to keep using the faster serial scan, a hashcode check is necessary if you want to support features which ask for exactness. What if two roms with the same serial but different hashes try to netplay? What if people want the display name to be the hack name? What if people want the ability to go to a hack page (as in my recently open request) and this opens 'all' of the pages for hacks on that game?
If you're trying to move away from hashcodes upfront calculation - features which need it would need it calculated before a feature which requires uniquness is used which is a much worse user experience imo if the hash is going to be a '4 gb wii disc' or similar.
Steps to reproduce the bug
Bisect Results
[Try to bisect and tell us when this started happening]
Version/Commit
You can find this information under Information/System Information
Environment information