libretro / RetroArch

Cross-platform, sophisticated frontend for the libretro API. Licensed GPLv3.
http://www.libretro.com
GNU General Public License v3.0

[PLAYLISTS] Add an option to scan unknown roms #8624

Closed. andiandi13 closed this issue 4 years ago.

andiandi13 commented 5 years ago

[ISSUE UPDATED] The previous one was blocked.

Many, many users complain that RetroArch doesn't add files to their playlists because the CRC doesn't match the original dump.

This can happen because of a translation patch, an SRAM patch, and so on.

So I suggest a very simple solution:

Just add an option to the playlist menu, named something like 'Ignore CRC while scanning roms' or 'Add unknown roms to playlists', whatever.

The option would replace this line:

"crc32": "XXXXXXXX|crc"

with

"crc32": "DETECT"

This would make it easy to display all roms in playlists, with labels taken from the rom filenames. Roms with a correct label would display thumbnails, and roms with another label (like "T-eng 100%", etc.) would not display a cover, which is not a big deal.
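A minimal sketch of what such an entry could look like. The field names follow RetroArch's JSON playlist entries as we understand them; writing "DETECT" into `crc32` is this issue's proposal, not existing behavior. The character substitution in `thumbnail_name` assumes the libretro-thumbnails rule that `&*/:`<>?\|` become `_` in thumbnail filenames; `unknown_rom_entry` and `thumbnail_name` are hypothetical helpers, and paths are assumed POSIX-style.

```python
import json
from pathlib import PurePosixPath

# Assumption: characters replaced by "_" in libretro thumbnail filenames.
THUMB_UNSAFE = set('&*/:`<>?\\|')

def unknown_rom_entry(rom_path: str, db_name: str) -> dict:
    """Playlist entry for a rom that did not match the database (proposal)."""
    label = PurePosixPath(rom_path).stem  # label taken from the filename
    return {
        "path": rom_path,
        "label": label,
        "core_path": "DETECT",
        "core_name": "DETECT",
        "crc32": "DETECT",                # instead of "XXXXXXXX|crc"
        "db_name": db_name,
    }

def thumbnail_name(label: str) -> str:
    """Filename a thumbnail would need for this label to get a cover."""
    return "".join("_" if c in THUMB_UNSAFE else c for c in label) + ".png"

entry = unknown_rom_entry("/roms/gba/Zelda (USA) [T-En].gba",
                          "Nintendo - Game Boy Advance.lpl")
print(json.dumps(entry, indent=2))
print(thumbnail_name(entry["label"]))
```

A translated rom keeps its own label, so it only gets a cover if a thumbnail with that exact (sanitized) name exists.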

RetroArch playlists are painful to use because of those unique CRC32s, and it would be wonderful to simplify them this way!

For now, I'm creating my playlists with RetroArch Playlists Manager on Windows, to add my SRAM-patched GBA roms as well as all my undetected translated roms.

Thanks for reading, I really hope to see that option.


i30817 commented 5 years ago

I should apologize. I misread the previous issue and thought it was about 'disabling CRCs', when I already knew the CRCs weren't used anymore and was pissed about it. To be constructive, I suggest naming the option 'Add unknown roms to playlists'.

However, I'm not sure it's that easy to match a rom to an image/data by filename. Filenames are not unique, not only because of hacks, but because dumping groups (where the names come from) can use the same name for the same release of the same game on different consoles. This is somewhat unintuitive, but I'd expect it to happen a lot on Redump, for instance.

So the key will have to be a tuple (scanned CD/rom console type, filename). I'm also unsure this is any better than the serial¹: if serial matching is failing, it's likely failing while classifying the CD or extracting the serial because of users' non-standard dump formats, and that would also affect the first element of the tuple. For example, the feature might get implemented and the missing games still not appear, because the bug was elsewhere.

edit: ¹ Actually, it sounds slightly better, in that different versions get different names under the Redump and TruRip naming standards (though not re-editions, I think), so at least there would be no more serial duplicates in those sets (except re-editions, which, if you squint, don't count), assuming such duplicates actually happen with serials (I only tested on the Genesis, where they do).

andiandi13 commented 5 years ago

They're not unique not only because of hacks but because dumping groups (from where the name comes from) can have the same name for the same release of the same game on a different console. This is kind of unintuitive, but i'd expect this to happen a bunch on redump for instance.

What about the header? When I check a rom's header, I see the name and other info.

E.g. if the header says FINAL FANTASY TACTICS and the extension is .gba, then it should recognize it as FF Tactics for the Game Boy Advance and match the thumbnail and the title in the db.

It's funny how easy it is to create manual playlists on a PC, yet it seems very hard to do directly in RetroArch :/

i30817 commented 5 years ago

What about the header ?

tl;dr: the header is (mostly) irrelevant for identifying the platform of roms, and for CDs the 'header' can mean many different things across platforms. All of them share the same problem: heuristic platform identification of CD images is necessary to get anywhere, because the extension does not indicate the platform.

For roms, the platform can easily be recognized by the extension, so parsing is 'not even necessary' most of the time. I know of one exception, though: a .bin referenced by a .cue and a .bin Genesis rom dump need to be distinguished.
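A simplified sketch of how the two kinds of .bin could be told apart from their first bytes: a raw 2352-byte CD data sector starts with the 12-byte sync pattern (00, ten FF bytes, 00), while a Genesis/Mega Drive rom in plain .bin layout normally has "SEGA" at offset 0x100. This is an illustrative heuristic, not RetroArch's actual detection code; `classify_bin` is a hypothetical name, and interleaved .smd dumps or headers with a leading space would fall through to "unknown".

```python
# 12-byte sync pattern at the start of every raw (2352-byte) CD sector.
CD_SYNC = b"\x00" + b"\xff" * 10 + b"\x00"

def classify_bin(head: bytes) -> str:
    """Rough guess at what a .bin holds, from its first 0x110 bytes."""
    if head[:12] == CD_SYNC:
        return "cd-track"      # raw data track, normally referenced by a .cue
    if head[0x100:0x104] == b"SEGA":
        return "genesis-rom"   # Mega Drive/Genesis header starts at 0x100
    return "unknown"           # e.g. audio tracks, interleaved .smd dumps

def classify_bin_file(path: str) -> str:
    with open(path, "rb") as f:
        return classify_bin(f.read(0x110))
```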

For CD consoles, however, there is a proliferation of CD image formats and meta-formats, and a user who owns the game will just dump it in an arbitrary format.

You can generally trust the dumping groups to use a consistent format (but not hacks, for technical reasons). However, that format usually doesn't have a platform-specific extension. A CD image is a CD image, unless it's an .ngc file, in which case it has a platform extension (the only reason Redump did this accidental good deed was that NGC 'CDs' really aren't normal CDs, so they can't be read by normal CD mounters or burnt to ISO).

So the RA code actually needs to parse them, but it doesn't read the bytes as they would appear if the CD image were mounted, because a multiplatform, multiformat CD image mounter that runs on toasters is hard work. Instead it uses heuristics: it tries to find the track that has the 'magic bytes' identifying the platform (which sometimes isn't track one, and sometimes you even have to chase down the real file because the tracks are split into separate files) and then attempts to read those bytes directly from the file to find a match.

This sometimes fails when the CD image is in a format the magic-byte check didn't expect (the user-dump case). For example, most PSX cues point to MODE2/2352 (a .bin). However, they might just as easily point to MODE2/2048 (an .iso). These files not only have different block sizes but also different sector headers, so the magic bytes are not where the code looks for them. Or the code didn't expect the user to feed, say, a Nero image to the scanner, so with the Nero header in front, the 'SONY CORPORATION blah blah' string is not even at the expected position.

So you end up with images that describe the same game, but whose byte representation differs enough that the RA platform identification routine often gets confused (for personal dumps).
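To make the offset problem concrete, here is a small sketch (not RetroArch code) that computes where the user data of a given sector starts in three common layouts, and checks for the ISO 9660 "CD001" signature, which lives at bytes 1-5 of the primary volume descriptor in sector 16. The sector layouts are standard CD facts; the layout names and `has_iso9660_pvd` are hypothetical labels chosen for the example. The same signature sits 32769 bytes into a cooked .iso but 37657 bytes into a raw MODE2/2352 .bin, so a check that hard-codes one offset misses the other.

```python
# (bytes per sector in the file, offset of the 2048 user-data bytes in a sector)
SECTOR_LAYOUTS = {
    "ISO/2048":   (2048, 0),    # cooked image, user data only
    "MODE1/2352": (2352, 16),   # 12 sync + 4 header bytes
    "MODE2/2352": (2352, 24),   # 12 sync + 4 header + 8 subheader (XA Form 1)
}

def user_data_offset(lba: int, layout: str) -> int:
    """File offset of the user data of sector `lba` in the given layout."""
    size, header = SECTOR_LAYOUTS[layout]
    return lba * size + header

def has_iso9660_pvd(image: bytes, layout: str) -> bool:
    """ISO 9660 keeps its primary volume descriptor in sector 16; bytes 1-5 are 'CD001'."""
    off = user_data_offset(16, layout)
    return image[off + 1:off + 6] == b"CD001"
```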

Hacks are mislabeled for both roms and CDs when using serials, as I already explained, so I much prefer checksums, provided they aren't flawed themselves (a bunch of open and closed bugs show that CRC checks can easily go wrong, because the dumping groups made it hard, and slow, to get them right).

IMO, the only way this will get tamed is if RA starts directing people to use the MAME CHD format, which has a sane internal checksum that doesn't need to be calculated at runtime.
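As an illustration of "no runtime calculation": a CHD v5 header stores SHA-1 digests directly, so a scanner could read one from the first 124 bytes instead of hashing gigabytes. This sketch assumes the v5 header layout from MAME's chd.cpp (big-endian fields; raw-data SHA-1 at offset 64, combined data+metadata SHA-1 at offset 84). Note that the stored digest is SHA-1, not CRC32, so it could not be compared directly against CRC-keyed database entries; `chd_v5_sha1` is a hypothetical helper.

```python
import struct

CHD_V5_HEADER_SIZE = 124

def chd_v5_sha1(header: bytes) -> str:
    """Return the SHA-1 stored in a CHD v5 header (no data is hashed)."""
    if header[:8] != b"MComprHD":
        raise ValueError("not a CHD file")
    length, version = struct.unpack(">II", header[8:16])  # big-endian fields
    if version != 5 or length < CHD_V5_HEADER_SIZE:
        raise ValueError(f"unsupported CHD header (version {version})")
    # Assumed v5 layout: raw-data SHA-1 at 64, combined data+metadata SHA-1 at 84
    return header[84:104].hex()

def chd_file_sha1(path: str) -> str:
    with open(path, "rb") as f:
        return chd_v5_sha1(f.read(CHD_V5_HEADER_SIZE))
```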

andiandi13 commented 5 years ago

If you want a list like that may as well just browse to the folder.

lol

More seriously, here is an example of a playlist entry that I want RetroArch to make:

1. C:\retroarch\roms\Zelda (USA).gba
2. Zelda (USA)
3. DETECT
4. DETECT
5. DETECT
6. Nintendo - Game Boy Advance.lpl

Let's take the problem line by line:

  1. Can RetroArch list all files in a folder in a playlist, whether or not they have a known extension? YES
  2. Can RetroArch copy the filename to the title? YES
  3. Can RetroArch write DETECT? YES
  4. Same
  5. Same
  6. Can RetroArch match a console with a file to determine which files go in which playlist, and name those playlists? I don't know, and I think that's the main issue.
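Lines 1 to 5 above are mechanical, which a short sketch of the old six-line .lpl entry format makes visible: path, label, core path, core name, CRC, database name. The playlist name is passed in as a parameter here precisely because choosing it (line 6) is the open problem; `legacy_lpl_entry` is a hypothetical helper, not RetroArch's writer.

```python
from pathlib import PureWindowsPath

def legacy_lpl_entry(rom_path: str, playlist_name: str) -> str:
    """Six lines of one entry in the old plain-text .lpl format."""
    lines = [
        rom_path,                          # 1. full path
        PureWindowsPath(rom_path).stem,    # 2. title copied from the filename
        "DETECT",                          # 3. core path
        "DETECT",                          # 4. core name
        "DETECT",                          # 5. CRC
        playlist_name,                     # 6. database/playlist name (the hard part)
    ]
    return "\n".join(lines) + "\n"

print(legacy_lpl_entry(r"C:\retroarch\roms\Zelda (USA).gba",
                       "Nintendo - Game Boy Advance.lpl"), end="")
```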

Here are some ideas:

Alternatively, a detection file could be placed in a folder, whatever its name, to help RetroArch determine which playlist the roms inside should go to, without forcing users to give their folders a specific name. E.g. in the folder /roms/MySNESroms, we just copy in a file named "Nintendo - Super Nintendo Entertainment System.whatever", and RetroArch would detect that "whatever" extension and put all the files in that folder into a Super Nintendo playlist.
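A minimal sketch of the marker-file rule, assuming the marker extension is `.detect` (the name used later in this thread) and that the marker's base name is the playlist name. The pure function works on a list of filenames so it can be checked without touching the disk; `assign_playlist` and `assign_folder` are hypothetical names.

```python
from pathlib import Path
from typing import Dict, List

MARKER_EXT = ".detect"  # hypothetical marker extension proposed in this thread

def assign_playlist(filenames: List[str]) -> Dict[str, List[str]]:
    """Map a folder's files to the playlist named by its marker file."""
    markers = sorted(f for f in filenames if f.endswith(MARKER_EXT))
    if not markers:
        return {}  # no marker: leave the folder to the normal scanner
    playlist = markers[0][: -len(MARKER_EXT)] + ".lpl"
    content = sorted(f for f in filenames if not f.endswith(MARKER_EXT))
    return {playlist: content}

def assign_folder(folder: Path) -> Dict[str, List[str]]:
    return assign_playlist([p.name for p in folder.iterdir() if p.is_file()])
```

Because the folder name is never consulted, /roms/MySNESroms works as well as any other name.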

I think the first option is pretty easy and feasible...

i30817 commented 5 years ago

I honestly agree with the idea that the user should be able to start a 'dumb' scan on a folder, with the resulting files either assigned to a console playlist automatically (per directory name) or chosen manually (per GUI). Organized users already place their games per platform on the filesystem, and it would be good to have a scan type that doesn't assume hacks are the same game as the original. (Though of course I'd prefer that the libretro-database CRCs were actually used, giving correct metadata instead of bare filename entries with no metadata because the games weren't found in the database; or at least that the CRCs found were actually correct, instead of being a secondary property of a query on a non-unique key that may or may not give the right result.)

But I think the RA team would rather first find and fix the reason why some (original) games aren't scanned correctly by the automatic scanner, shrug. I think the most likely cause is that the scanner is failing at platform identification, but that really needs to be tested and debugged with concrete failing files. It may also be that the platform is correctly found but the CD image is funky enough that the serial parser can't extract the serial with just a byte search.

andiandi13 commented 5 years ago

Yeah, as long as we have a choice of scan type, it's OK.

Let's wait and see what the devs think about it :)

hizzlekizzle commented 5 years ago

Simple directory listing seems to be what most people want out of the playlists, but they also want boxart/thumbnails, and that's where the problem comes in: hooking things up when we have no way of knowing what's what. We can do fuzzy matching, but then we have to add a bunch of stuff (menus, etc.) to correct false positives.

andiandi13 commented 5 years ago

@hizzlekizzle As I said, you just have to give your rom the same name as the thumbnail.

i30817 commented 5 years ago

edit: I accidentally deleted a post, but tl;dr: if using the tuple (console-type, rom filename), false negatives could be kept to a minimum if users are careful to name things the same as the dump sets, and I don't really see why that would require a new GUI. You could make users select the console type manually, of course, but you already need those heuristics to even start scanning serials, because each console has a different header structure.

As a fuzzy search, this is likely to be slightly superior to the serial, because a filename will not repeat within the same dump set (files would overwrite each other in a complete set), while a serial absolutely could, if the console manufacturer is crazy (SEGA) or the game is misidentified (hacks). If the people adding hacks and translations to libretro-database are careful to name translations the same as the original game, translations get the same image as the original; hacks named differently get no image but a distinct name (my PRs there were organized like that). In all cases, the CRC fetched by this query (as well as by the 'serials' query) can't be trusted, so it shouldn't be used for RetroAchievements or netplay (or any other feature that needs it).

Unless you put entries from two or more dumping groups with exactly the same kind of filename rules into the database... TruRip and Redump, maybe? Even if that happens, this idea is not worthless, because it's a fuzzy search anyway and 'non-uniqueness' would just result in the same entry twice (possibly with a different CRC, but not necessarily). If two dumping groups use exactly the same name for the same game on the same console, it's 99.999% certain it's the same version of the game (though there could be 'exceptions' when a set 'corrects' a mistake), even if the CRC might differ (because of different dumping formats or strategies).

To share images, you could keep the game images associated with the serial, like today, and get the info in stages:

  1. (console-type, rom filename) -> [data] <-- fetch all the matches from the database
  2. iterate over [data] and find (if possible) one with a serial property (this is the 'original game')
  3. if such an original entry with a serial is found, use the serial to find the image with the same mechanism as today (this will find images for both original games and translations, if translations obey the rule of being named the same as the original game). Hacks get no serial 'fallback' (because their name differs from the original game's, and libretro-database entries for both hacks and translations don't specify the serial), so they get no image.
  4. fill the GUI with the found name, the found image (if any) and any other data found. I'm moderately sure you can't preserve the exact metadata for translations in this fuzzy search (because it depends on confusing a translation with an original game), so this logic might 'force' you to display the original game's data for translations.
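The four stages above can be sketched with plain dictionaries standing in for the database. The data shapes (`name`/`serial` keys, an image table keyed by serial) and the function name `staged_lookup` are hypothetical; the point is the order of lookups and that the key is the (console, filename) tuple, so a same-named file on another console does not match.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str]  # (console type, rom filename)

def staged_lookup(db: Dict[Key, List[dict]], images_by_serial: Dict[str, str],
                  console: str, filename: str) -> Optional[dict]:
    matches = db.get((console, filename), [])                        # 1. all matches
    if not matches:
        return None                                                  # unknown rom
    original = next((m for m in matches if m.get("serial")), None)   # 2. entry with a serial
    image = images_by_serial.get(original["serial"]) if original else None  # 3. serial -> image
    shown = original or matches[0]
    return {"name": shown["name"], "image": image, "data": shown}    # 4. what the GUI shows
```

A hack entry without a serial still gets a name from step 4, but no image, which matches the behavior described in step 3.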

It's not ideal, because it's misleading (at the very least, translations are not marked as such). It's possible that hacks would be marked as hacks with this, but I feel that would be inconsistent and confusing because of the translation issue. But this is the curse of not using unique keys, and the price you pay for the 'convenience' of not using checksums as the key.

edit: more edits

Ferk commented 5 years ago

Just add an option in playlist menu that could be named 'Ignore CRC while scanning roms' or 'Add unknown roms to playlists', whatever.

I don't understand why you would need DETECT in the playlist for this.

If such an option (adding unknown roms) was added, I would expect the scanning process to compute the CRC of the unknown file and save that CRC into the playlist (without checking whether it matches any known CRC from the database); it would not need to store DETECT. Maybe the only thing you would gain by storing DETECT is some scanning speed, since no CRC would be generated, but at the cost of more processing later, when you actually need the CRC for other things (I imagine it would make scrolling through your list of games much slower, which is far more painful than leaving the scan running for an extra hour, and it would also be troublesome when searching your library for content that matches the CRC of a Netplay room). And as far as I understand, the problem you are trying to address isn't scanning speed anyway.
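Computing and storing the real CRC at scan time is cheap in memory if the file is read in chunks, as in this sketch. It uses `zlib.crc32` with a running value and formats the result like the playlist's `"XXXXXXXX|crc"` field; `crc32_field` and `read_chunks` are hypothetical helper names and the 1 MiB chunk size is an arbitrary choice.

```python
import zlib
from typing import Iterable, Iterator

def crc32_field(chunks: Iterable[bytes]) -> str:
    """CRC32 over a stream of chunks, formatted like the playlist crc32 field."""
    crc = 0
    for chunk in chunks:
        crc = zlib.crc32(chunk, crc)  # running CRC, so the file is never fully loaded
    return f"{crc:08X}|crc"

def read_chunks(path: str, size: int = 1 << 20) -> Iterator[bytes]:
    with open(path, "rb") as f:
        while chunk := f.read(size):
            yield chunk

# Usage: crc32_field(read_chunks("Game (USA).gba"))
```

Memory use stays constant, but the time cost still grows with file size, which is the slowness complaint raised later in the thread.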

In my opinion, what would help with scanning content that isn't in the database is this: when the scanner finds a file with a particular extension (say, ".retroentry"), it reads the file and adds its content as a playlist entry (without checking whether it's in the database and without any CRC calculation, as long as the CRC is already provided inside the file). The file could also contain the name of the playlist the entry is meant to be added to.

Then you could place a "mycustomgame.retroentry" file next to your custom game, fill it with the relative path to the file, the label, the CRC and maybe any future metadata allowed by the new JSON playlist format (thumbnail URL/path?), and always keep it next to the game file. That way, whenever you scan the folder with your games, the entry is added to the playlist automatically, without having to set up your collection again for every RetroArch device you load it on.

Kodi already does something similar with ".nfo" files: it reads the local metadata stored on disk and uses it to add movies and shows.
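One way such a sidecar could be read, assuming (hypothetically) that a `.retroentry` file is a small JSON object with `playlist`, `path`, optional `label` and optional `crc32` keys, and that `path` is relative to the sidecar's folder. Neither the format nor `entry_from_sidecar` exists in RetroArch; this only shows that no hashing or database lookup is needed.

```python
import json
from pathlib import PurePosixPath

def entry_from_sidecar(sidecar_path: str, text: str) -> dict:
    """Turn a hypothetical .retroentry sidecar (JSON) into a playlist entry."""
    meta = json.loads(text)
    folder = PurePosixPath(sidecar_path).parent
    rom = PurePosixPath(meta["path"])
    return {
        "playlist": meta["playlist"],                # target playlist named in the file
        "entry": {
            "path": str(folder / rom),               # relative path resolved against the sidecar
            "label": meta.get("label", rom.stem),    # fall back to the filename
            "crc32": meta.get("crc32", "DETECT"),    # trusted as given, never recomputed
        },
    }
```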

andiandi13 commented 5 years ago

It's kinda like what I suggested in my third post (first idea).

If RetroArch sees a file named "Sega - Game Gear.retroentry" (to keep your nomenclature) in a folder, then it adds all the content of that folder to a playlist named Sega - Game Gear.lpl, whatever roms are in the folder.

Or... just based on the extension.

It's really easy

Ferk commented 5 years ago

Making it based on extensions would complicate things. It's not a good thing to let the scanner try to be too smart, because then you risk it doing dumb things. Not every ISO/EXE/BIN file is game content, and not every game content is just one file.

I think it's better to keep things simple but flexible. Scan for playlists within the content folders and include their entries (accounting for relative paths) when they are scanned; that would be more than enough to make me happy.

This would also decouple the scanning process. You could scan your content folder with some fancy third-party tool on your computer if you want to, and then copy the resulting file along with your content to your Switch or whatever device you want to run it on. Scan it once, reuse it everywhere.

Content already added by including those playlists can be skipped by the actual scan. This would also help weaker devices that are not well equipped for heavy calculations or heavy IO.
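A sketch of the "include bundled playlists, then skip what they cover" idea: entry paths in a playlist shipped inside a content folder are resolved against that folder, and only the files not already covered are handed to the real scanner. The function name, the entry shape and the rule that `.lpl` files themselves are never scanned are assumptions for illustration.

```python
from pathlib import PurePosixPath
from typing import List, Set, Tuple

def merge_local_playlist(folder: str, local_entries: List[dict],
                         files_in_folder: List[str]) -> Tuple[List[dict], List[str]]:
    """Resolve a bundled playlist's relative paths and list files still to scan."""
    base = PurePosixPath(folder)
    resolved: List[dict] = []
    covered: Set[str] = set()
    for e in local_entries:
        full = str(base / e["path"])          # relative to the playlist's folder
        resolved.append({**e, "path": full})
        covered.add(full)
    remaining = [f for f in files_in_folder
                 if not f.endswith(".lpl") and str(base / f) not in covered]
    return resolved, remaining                # only `remaining` needs CRC/serial work
```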

andiandi13 commented 5 years ago

Yes, of course, there is an obvious issue with many extensions. The idea of a small "detection" file is good IMO.

i30817 commented 5 years ago

Making it based on extensions would complicate things. It's not a good thing to let the scanner try to be too smart, because then you risk it doing dumb things. Not every ISO/EXE/BIN file is game content, and not every game content is just one file.

The scanner is 'already' too smart.

The first part of what you're complaining about already happens with CD images, and it must happen in order to scan for serials: serial parsing differs per platform, so the platform must be identified first or the parsers will extract a bunch of nonsense. Serials are the currently chosen (and in some situations non-unique) key for the available game metadata (RetroAchievements, cheats, images, publisher data, etc.).

The alternative, scanning checksums, was removed/changed to this because people complained it didn't catch enough games (on their personal dumps with arbitrary CD formats and procedures, which result in different bytes) and that it was too slow (when scanning bare files, as opposed to zip files, which store a pre-calculated CRC32 as a header field). CHDs could also be handled without runtime calculation, using their internal checksum (even better, because that checksum doesn't care whether the CD image was split into multiple files, a common source of errors in the older CRC scanner).
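The zip case is easy to demonstrate: every zip member's CRC32 is recorded in the archive's central directory, and Python's `zipfile` exposes it as `ZipInfo.CRC`, so it can be read without decompressing anything. The helper name and the `"XXXXXXXX|crc"` formatting are illustrative; RetroArch's own zip handling is in C.

```python
import io
import zipfile

def zip_member_crcs(zip_file) -> dict:
    """CRC32s read from the zip's central directory; nothing is decompressed."""
    with zipfile.ZipFile(zip_file) as zf:
        return {info.filename: f"{info.CRC:08X}|crc"
                for info in zf.infolist() if not info.is_dir()}

# Usage with an in-memory zip (a path to a .zip works the same way):
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("Game (USA).bin", b"123456789")
print(zip_member_crcs(buf))
```

The cost is paid once, by whoever created the zip, which is the amortization the comment refers to.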

I don't really blame them for the 'slowness' complaint, because scanning gigabytes or terabytes of bare files is brutal (hours on 10-year-old hardware); I imagine it exhausts the patience of the kids on phones. But supporting a fuzzy scan without also supporting a checksum scan has several disadvantages for hacks, and for reliability with certain consoles and titles (misprints, and versions that changed the game but not the serial).

As an aside, the second part of what you're complaining about (many false positives) is likely to be avoided by technical users, because they already organize their folder structure by console. In fact the opposite problem (false negatives) is more likely, because the format was 'unexpected' by the RetroArch scanner for that console (for instance, the parser can't identify a Saturn game dumped or converted to cue/iso), or the game is homebrew and has no serial or whatever string RA uses to identify it as a game for console 'x'.

Ferk commented 5 years ago

That's unfortunate, but if the scanner is already imprecise for CD images (I guess this does not apply to file formats that do not contain serials), then that's all the more reason to allow additional methods that let users override the scanning behavior, like distributing playlist files within the content folders.

That way, regardless of what the scanner normally does, the user has a way to define what gets added, and at the same time the approach would be fine for 10-year-old hardware too, since you could skip scanning folders that have pregenerated playlists to include.

The drawback is that most people probably won't know about this feature, especially at the beginning. But an option could be added later to, for example, export the currently scanned playlists to their respective content folders; that way the feature would get some exposure.

i30817 commented 5 years ago

A new file in the game folder is a terrible solution to the problem of pointing RA to the game version. I'd seriously stop using RA if that was the only method, because it's already fucking terrible when it happens for ScummVM, since the RA scanner is simple enough that it can't emulate ScummVM's detection algorithm. And I'm not the only one who would agree. Can you imagine the 'normal user' being asked to write hundreds of files with different content, or to copy a file, out of thousands, hundreds of times into different folders?¹

No, any 'solution' that is not automatic is not going to fly. The scanner needs to be more complicated, not less, though the filename strategy requested here is an interesting idea for a fuzzy mode, even if it is not truly simpler: as I showed, it still needs to parse the platform out of platform-agnostic CD image files, and it won't work for data that is not uniquely named in any way (like, say, Sierra SCI files).

¹ Which already happens today if you have a complete ScummVM collection, and is one of the reasons it's not worth using the RA port unless you really, really don't have a native port on the platform. I'd prefer that the scanner's playlist generator just parsed the scummvm.ini file after we used the native ScummVM scanner.

andiandi13 commented 5 years ago

Why are you talking about hundreds of files?

I'm talking about one file per console/per folder.

In your Game Boy folder, named Nintendo - Game Boy, you put a Nintendo - Game Boy.detect file.

In your Sony - PlayStation folder, you put a Sony - PlayStation.detect file, and so on.

All the files in each folder go to a playlist named after the .detect file, whether they're .GB, .ISO, .TXT, .JPG, etc.

RetroArch could come with a pre-created "games" folder containing many subfolders, named after each console, with all the .detect files inside. Then you'd just have to put your roms in the right place.

Is that painful? Is that complicated?

i30817 commented 5 years ago

OK, that's actually a nice idea; sorry for misunderstanding. I was immediately reminded of the RA ScummVM scanning strategy, which is basically horrible, and thought you were proposing to add an ID file per game.

'Just' overriding the heuristic data once per platform sounds doable, and a nice standard way to override the fallible platform parser. I support that idea.

andiandi13 commented 5 years ago

Yes, it would have been terrible to create a file per game. That method seems pretty feasible for the devs, though.

i30817 commented 5 years ago

Most people already organize their games in 'platform subtrees'. In fact, there is confusing code in the scanner that is supposed to take advantage of this to scan faster, either by 'remembering' which database file the last game was found in, or by relying on the name of the scanned subtree being the same as the name of the playlist (I can't recall which convention the code used).

The idea proposed here would probably require some major code modifications there, plus passing the found platform into the scanner itself as an 'override' to bypass the platform identification part. Sounds like a PR someone new to the project could do.

andiandi13 commented 5 years ago

Hmm, I see... I wish I could do a PR. For now I'm going to wait for a response from a team member.

i30817 commented 5 years ago

Why's that? It sounds like a good way to avoid both the 'we have to create a GUI' problem and the 'the scan lets false positives / false negatives through' problem, for people who are organized enough.

There would still be false positives and negatives because the scanner uses serials, of course, but that won't change until some version of the checksum scanner mode is available again, and this could avoid some silly false negatives caused by different file formats (though it's an open question whether the serial scanner wouldn't just give up on an 'unusual' CD image format if the filename scanner doesn't become a thing).

andiandi13 commented 5 years ago

@fr500 What then?

I never said that this new feature would replace the current scan at all!

It's an extra option, an advanced feature for advanced users who know what they are doing.

It's just making retroarch scan specific folders and put the content in playlists.

And if you don't like the small-files idea, I actually came up with two ideas, and the first one is also good IMO.

Summary

First option

So if I have that folder :

RetroArch/games/Sony - PlayStation

RetroArch would detect the folder (thanks to its correct name AND its path), and write all the .ISO files into a Sony - PlayStation.lpl file.
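The first option amounts to a path rule: a file belongs to playlist "X.lpl" if it sits under a fixed games root, inside a folder whose name is a known system name. In this sketch the root (`RetroArch/games`), the short `KNOWN_SYSTEMS` set and the function name are all hypothetical; a real implementation would take the system names from the libretro-database playlist names.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical subset; in practice these would come from the database names.
KNOWN_SYSTEMS = {"Sony - PlayStation", "Nintendo - Game Boy", "Sega - Game Gear"}
GAMES_ROOT = PurePosixPath("RetroArch/games")  # hypothetical fixed root

def playlist_for(path: str) -> Optional[str]:
    """Playlist chosen purely from the folder layout, or None."""
    try:
        rel = PurePosixPath(path).relative_to(GAMES_ROOT)
    except ValueError:
        return None  # outside the games root: not handled by this rule
    system = rel.parts[0] if len(rel.parts) > 1 else None
    return f"{system}.lpl" if system in KNOWN_SYSTEMS else None
```

The name must match exactly, which is why the thread later moves to marker files for folders like /snes.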

Second option

The second idea is to create small empty files and put them wherever we want, to help RetroArch determine which folder belongs to which console.

Example, with two folders :

/snes

/roms/gbaroms

Here, the names don't help RetroArch. So, in the first folder, we manually put a file named Nintendo - Super Nintendo Entertainment System.detect, and in the second folder, a file named Nintendo - Game Boy Advance.detect.

Then, when RetroArch scans the entire device, it will put all the files in /snes into a new Super Nintendo playlist, and all the files in /roms/gbaroms into a Game Boy Advance playlist.

What about thumbnails

The titles of roms in the playlist would be taken from the filename, so well-named roms would display thumbnails and roms with other names would not.

For CRC, playlists would show DETECT.

I know I repeat the same things, but it seems so obvious and simple...

andiandi13 commented 5 years ago

@bparker06 @twinaphex What do you think of that idea ?

i30817 commented 5 years ago

I'm going to mention (again) that I dislike 'false CRCs' in playlists, even if they already occur today with serial scanning.

'False CRC' means 'useless, misleading, bug-causing CRC', especially since so many advanced RA features require byte-for-byte identical games (netplay, RetroAchievements, cheats, sharing savestates, etc.).

'False CRCs' are one of the reasons the RetroAchievements system in RetroArch has to redo the calculation when it's used, which is simply bizarre (the other reason being that they strip out NES headers to identify roms, in order to 'catch the most', which may or may not be a mistake, depending on how the header influences the runtime behavior of the code). Redoing the calculation is also unfeasible if RetroAchievements wants to spread to CD consoles, though they'd have further trouble there from the 'not a single file means not a single CRC' problem too.
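The header point is easy to see concretely: an iNES file starts with a 16-byte header whose first four bytes are "NES" followed by 0x1A, and hashing with or without it gives two different checksums for the same game data. CRC32 is used here only for illustration (this sketch does not claim which hash RetroAchievements uses), and `nes_crc32` is a hypothetical helper.

```python
import zlib

INES_MAGIC = b"NES\x1a"   # first 4 bytes of an iNES header
INES_HEADER_SIZE = 16

def nes_crc32(rom: bytes, strip_header: bool) -> str:
    """CRC32 of a NES rom, optionally skipping the 16-byte iNES header."""
    if strip_header and rom[:4] == INES_MAGIC:
        rom = rom[INES_HEADER_SIZE:]
    return f"{zlib.crc32(rom):08X}"
```

Two tools that disagree on `strip_header` will never agree on the checksum, even for a byte-identical file.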

andiandi13 commented 5 years ago

@i30817 You're right. I just checked my manually created playlists with DETECT, and RetroArch did manage to associate the true CRCs with rom information from the database, and nothing for patched roms.

So it's better to set the CRC line to DETECT (I edited my post above).

hizzlekizzle commented 5 years ago

Something fr500 and I discussed on Discord the other day: keep the existing CRC calculation, try to match thumbnails on name alone, and add a flag (maybe a child node in the JSON playlists and an icon/character beside the name) to show that a file is "unverified" if it doesn't match the databases.

Would that be acceptable?
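A sketch of that proposal: the computed CRC is kept as-is, and a flag records whether it was found in the database, with a marker appended to the displayed name. The `verified` key, the `display_label` key and the " [?]" marker are hypothetical; the thread only suggests "a child node" and "an icon/character".

```python
from typing import Set

def tag_entry(entry: dict, known_crcs: Set[str]) -> dict:
    """Keep the computed CRC, but flag entries the database doesn't know."""
    crc = entry["crc32"].split("|")[0].upper()
    verified = crc in known_crcs
    return {
        **entry,
        "verified": verified,  # hypothetical child node in the JSON entry
        "display_label": entry["label"] if verified else entry["label"] + " [?]",
    }
```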

andiandi13 commented 5 years ago

try to match thumbnails on name alone

Do you mean the filename, or the title looked up from the CRC while scanning?

The idea of verified/unverified files is good.

But I guess you intend to add unknown roms to playlists? Did you discuss that?

RobLoach commented 5 years ago

Possible duplicate of #2033 Alternative Scanning Method, which currently has a $15 bounty.

i30817 commented 5 years ago

Something fr500 and I discussed on discord the other day is: keeping the existing CRC calculation, try to match thumbnails on name alone, and adding a flag (maybe a child node in the JSON playlists and an icon/character beside the name) to show that a file is "unverified" if it doesn't match the databases.

Fine by me, but at the risk of being pedantic again, (file)name matching has the problem discussed in this issue of identically named games on different platforms, and needs a further subkey (platform) to get the right cover image. I like the 'detect' file or folder idea myself (because it means RA doesn't need to parse exotic CD image formats, just pass them to the emulator), but fr500 already mentioned he doesn't.

And hacks/translations need exact methods; even serials don't really work (though you can 'cheat' and name them the same as the original, or use the serial to 'inherit' the original's images/data, which is inappropriate because all the translation info is lost, the hacks will be completely different, and you'll end up with 'duplicate' games if you have the original).

To sum up, if I were a C whiz, I'd try to organize the scanner into

  1. 'serial' (which requires a good platform identification function that works on multiple types of CD image, and parsing the serial out after identifying the platform and image type): what exists today, with all of its speed, false negatives and problems for features that require exact CRCs; AND
  2. 'filename-extension + the idea here of an ID file or folder' (since this mode doesn't require that kind of parsing, it's very 'reliable' across file formats if the convention is followed, i.e. users follow instructions, but bad for hacks and features that require exact CRCs); AND
  3. a 'hard CRC' mode that only supports amortized (pre-calculated) CRCs, i.e. CD image files in zip files and CHDs. Because of that, it'd be 'fast enough' to use. However, I'd very much prefer that it only be offered once many CD emulators support CHD without decompressing the whole file to tmp, which is a disk killer. Eventually I'd also like to extend it to support a custom xattr (something like user.crc32) for large 'bare' files on a compressed OS filesystem, if the user is savvy enough to set that up. This removes the need for the user to care whether emulators support compression/CHD while still preserving the amortization; I'd have to use a filesystem that supports both compression and xattrs and script the compression myself, so it's a niche idea.

If the playlist had 'DETECT' in place of the actual checksum, RA could disable features that need a 'correct' checksum when one wasn't even attempted, and maybe offer an option to force a calculation and save it in the playlist to enable them. It's a bit lame, because you may be forcing a checksum calculation you have no way to use later (i.e. if the game's netplay room is empty, or if the game doesn't have RetroAchievements).
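A sketch of that gating logic: an entry whose `crc32` is "DETECT" only exposes features that don't depend on exact bytes, and a user-triggered action computes and stores the checksum to unlock the rest. The feature names and both helper functions are hypothetical; `compute` stands in for whatever CRC routine is used.

```python
from typing import Callable, Set

NEEDS_EXACT_CRC = {"netplay", "achievements", "cheats", "savestate_sharing"}
ALWAYS_OK = {"launch", "thumbnails"}

def available_features(entry: dict) -> Set[str]:
    """Hide CRC-dependent features when no checksum was ever attempted."""
    if entry.get("crc32", "DETECT") == "DETECT":
        return set(ALWAYS_OK)
    return ALWAYS_OK | NEEDS_EXACT_CRC

def force_checksum(entry: dict, compute: Callable[[str], str]) -> dict:
    """User-triggered: compute the CRC once and store it in the entry."""
    return {**entry, "crc32": compute(entry["path"])}
```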

andiandi13 commented 5 years ago

@RobLoach Thanks, I didn't see it. However, I'm suggesting a specific solution to the issue.

Also, the issue goes back to 2015 and there is no new scanning method despite the $15 bounty.

Is there a bounty amount that would make sure the issue gets solved?

Ferk commented 5 years ago

keeping the existing CRC calculation, try to match thumbnails on name alone, and adding a flag (maybe a child node in the JSON playlists and an icon/character beside the name) to show that a file is "unverified" if it doesn't match the databases.

What determines whether a file is added as "unverified" or excluded entirely from the playlist? I imagine you would now have people complaining about false positives and extra entries in their playlists. I wouldn't want to run a blind scanner across my folders of custom WADs for the PrBoom core, for example. Not every WAD is a game, much like not every CD image is, or every BAT/EXE file in DOSBox.

I think trying to get the perfect scanner that works automagically even for unlisted content and across different types of cores is a lost battle. Just let those of us who don't mind managing our collections manually (or with our own third-party software or scripts) have at least a reusable way to maintain them, so we can distribute the metadata along with the content.

i30817 commented 5 years ago

keeping the existing CRC calculation, try to match thumbnails on name alone, and adding a flag (maybe a child node in the JSON playlists and an icon/character beside the name) to show that a file is "unverified" if it doesn't match the databases.

What determines whether a file is added or excluded? I imagine people would then complain about false positives and extra entries in their playlists. I wouldn't want to run a blind scanner across my folders of custom wads for the prboom core, for example. Not every wad is a game, just as not every CD image is, nor every bat/exe file in Dosbox.

This can be done easily and reliably with just a bit of convention. Organized people already sort their games by platform, so a standard dir name or an 'id file' that makes the scanner treat all subsequent files that could belong to that 'platform' as targets is more than enough. It's even better than the normal scanner, because it won't depend on very fallible parsing and could accept 'weird' CD image files which the cores accept but RA has no idea how to parse. For instance, there are some games on the PS2 (as a hypothetical example, of course) that are ISOs instead of DVDs. Some games/translations on the Dreamcast were converted to a 'normal' ISO instead of whatever odd format the Dreamcast uses, etc.

You could even use a scheme where the 'detect' files contain the extensions to detect, so the user can choose 'I want cues but not bins' or 'I want dosbox.conf files but not bat files'.
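A sketch of the per-directory whitelist, assuming a hypothetical detect-file format with one glob pattern per line (blank lines and `#` comments ignored). Using patterns rather than bare extensions covers both cases mentioned: `*.cue` without `*.bin`, and a fixed name like `dosbox.conf` without `*.bat`. Names and patterns are lowercased and matched with `fnmatch.fnmatchcase`, so the result does not depend on the host OS's case rules.

```python
from fnmatch import fnmatchcase


def parse_detect(text):
    """Return the patterns listed in a detect file, skipping blanks and # comments."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line.lower())
    return patterns


def is_target(filename, patterns):
    """True if the file name matches any whitelisted pattern (case-insensitive)."""
    name = filename.lower()
    return any(fnmatchcase(name, pattern) for pattern in patterns)
```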

You'd need to educate the users, so this should be optional.

I think trying to get the perfect scanner that works automagically even for unlisted content and across different types of cores is a lost battle.

Just so. That's why this idea doesn't even try to use the part that screws up: the parsing for platform attribution and the consequent parsing of the serial.

I'd also like a hard checksum method as an option to get correct metadata in certain cases, but you can read about that in my last post.

Ferk commented 5 years ago

a standard dir name or a 'id file' that makes the scanner treat all subsequent files that can be for that 'platform' as targets

How do you exclude subsequent files within that folder that are not meant to be targets? To illustrate, take the example from https://github.com/libretro/libretro-prboom/issues/72, where there are 3 files:

The actual game is original.wad, while orig15.wad is an optional file that provides some extra stuff but it's not possible to load orig15.wad by itself (although you can set your configuration after loading original.wad to load orig15.wad).

A blind scanner could easily assume that orig15.wad is a game, since it has the .wad extension, and it cannot really tell otherwise since both just look like custom wads.

i30817 commented 5 years ago

To that I ask: is that situation any better right now?

Single points of entry are usually maintained, and when they aren't, RA tends to support the 'cmd file' or 'm3u' hacks. In that case, I'd expect RA to simply support 'cmd' files for the Doom engine and require users to put 'only accepts .cmd' in their detect files and create the .cmd files (just like it does when you want to load multiple floppies in x68k at startup).

If you want to make it easy for savvy users, I bet you can let the path iteration override a previous 'detect' so the user doesn't have to create .cmd files when they aren't needed. Like this:

doom games dir
--------DOOM.detect with '.wad' content
--------doomgame dir with 1 wad
--------doomgame dir with 2 wads required
-------------DOOM.detect with '.cmd'
-------------doomgame_hd.cmd with the right order for the 2 wads.
-------------doomgame.cmd with only the 'original' game wad.

You'd need to make the scanner iterate in such a way that it never 'forgets' what it's supposed to be searching for before entering one of the branches where the 'detect' file changes while there are more branches left to search, but that is a simple tree traversal algorithm (I don't have the brainpower to work out the most efficient way, but you're all good programmers and will figure out something good: probably an auxiliary stack to remember which detect variant is active, popping it when returning from the branch where that detect file was seen).
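One way to do the traversal described above uses an explicit stack. Each pending directory is pushed together with the rule it inherited, so a nested `DOOM.detect` overrides the rule only for its own branch, and sibling directories still see the parent's rule without any explicit pop. To stay self-contained, the directory tree is an in-memory dict (dict value = subdirectory, str value = file contents). The `<Playlist>.detect` naming and the one-glob-pattern-per-line format are assumptions, not an existing RetroArch feature.

```python
from fnmatch import fnmatchcase


def scan_tree(tree):
    """Return sorted (playlist, path) pairs for files whitelisted by the nearest .detect file."""
    found = []
    stack = [(tree, "", None)]  # (directory, path so far, inherited rule)
    while stack:
        directory, path, rule = stack.pop()
        # A .detect file in this directory replaces the inherited rule for this branch.
        for name, value in sorted(directory.items()):
            if isinstance(value, str) and name.lower().endswith(".detect"):
                patterns = [p.strip().lower() for p in value.splitlines() if p.strip()]
                rule = (name[: -len(".detect")], patterns)
                break
        for name, value in directory.items():
            child = f"{path}/{name}" if path else name
            if isinstance(value, dict):
                stack.append((value, child, rule))  # children inherit the active rule
            elif rule and not name.lower().endswith(".detect"):
                playlist, patterns = rule
                if any(fnmatchcase(name.lower(), p) for p in patterns):
                    found.append((playlist, child))
    return sorted(found)
```

Applied to the Doom layout from the comment, the plain wads in the 2-wad folder are skipped and only its .cmd launchers are picked up.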

Ferk commented 5 years ago

So far the PrBoom core does not use 'cmd' files. Wouldn't that bring the same problem as the '.scummvm' files that you criticised?

Right now I tackled the problem by allowing people to distribute a .cfg file along with the content so it's used as the default PrBoom settings when the wad is loaded (PR is still open, but people seem ok with the solution). Within the configuration you can set additional wads to load, so you still open the wad itself (not the cfg or any cmd) and the cfg file next to the wad will indicate which other wads to load.

i30817 commented 5 years ago

So far the PrBoom core does not use 'cmd' files. Wouldn't that bring the same problem as the '.scummvm' files that you criticised?

Far less. It's a question of degree: you made the point that there is an 'exception', and the proposed scheme is supposed to be 'general'. So I proposed a way to deal with the exception. Cores that habitually require multiple files that are not already in a zip are either:

CD console cores with multi-disc games, like the PS1, which require the user to create an m3u file. I already have a solution for that myself and proposed incorporating the same solution into RA (it just depends on the dumping group's naming convention, though for that reason I don't really expect RA to adopt it).

Cores that habitually require .cmd files or equivalent: basically x68k and ScummVM. There the situation is deplorable, because every single game needs one and ScummVM alone has hundreds.
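The naming-convention approach for multi-disc games mentioned above could work roughly like this, assuming Redump-style names where each disc carries a `(Disc N)` tag. Files that share a name once the tag is removed are grouped, ordered numerically by disc, and turned into the text of an `.m3u` playlist. The function names are hypothetical, and files that don't follow the convention are simply left alone, which is exactly the dependency on dumping-group naming the comment warns about.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath

DISC_TAG = re.compile(r"\s*\(Disc (\d+)\)", re.IGNORECASE)


def group_multidisc(filenames, extensions=(".cue", ".chd")):
    """Map '<base name>.m3u' -> disc files in disc order, for games with 2+ discs."""
    groups = defaultdict(list)
    for filename in filenames:
        path = PurePosixPath(filename)
        if path.suffix.lower() not in extensions:
            continue  # e.g. skip .bin tracks referenced by a .cue
        match = DISC_TAG.search(path.stem)
        if match is None:
            continue  # single-disc game or unconventional name
        base = DISC_TAG.sub("", path.stem, count=1)
        groups[base].append((int(match.group(1)), filename))
    return {
        f"{base}.m3u": [name for _, name in sorted(discs)]
        for base, discs in groups.items()
        if len(discs) > 1
    }


def m3u_text(disc_files):
    """An .m3u file is one entry per line, in load order."""
    return "".join(f"{name}\n" for name in disc_files)
```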

Right now I tackled the problem by allowing people to distribute a .cfg file along with the content so it's used as the default PrBoom settings when the wad is loaded (PR is still open, but people seem ok with the solution). Within the configuration you can set additional wads to load, so you still open the wad itself (not the cfg or any cmd) and the cfg file next to the wad will indicate which other wads to load.

And hey, I'm not saying 'don't do better solutions'. If you want this, by all means: users can use your cfg as a better entry point with more options (the idea is to be flexible about what the user is allowed to specify, not depend on fallible parsing, and still produce an acceptable metadata entry, even without the certainty of CRCs). Personally, I need the dosbox core to support loading/scanning dosbox.conf files before I'll use RA dosbox (though I probably won't anyway, because I also need further patches for larger HD files to run Windows 95 games in dosbox). RA ignores far, far too much configuration by not loading those files for a DOS collection to be usable.

i30817 commented 5 years ago

Thinking about it, this idea is orthogonal to the scan type.

The idea provides: a way to whitelist formats for directory branches and say which console (playlist) they belong to.

serial scan -> needs to figure out the game's playlist and 'understand' the file format the game is in to parse the serial. With this, it only needs to 'understand' the file format to parse the serial.

CRC scan -> needs to figure out the game's playlist and 'understand' the file format well enough to know which file to checksum (in the case of split files).

filename scan -> needs to figure out the game's playlist and to produce fewer false positives.

I think I'll open a request to have this as an optional 'hidden' alternative to relying on file format heuristics to figure out which playlist a game goes in.

andiandi13 commented 5 years ago

Why bother with such advanced solutions?

A path, a folder with a good name, and that's all.

I mean... RetroArch knows where to look for thumbnails based on paths (/retroarch/thumbnails/Nintendo - Nintendo 64...), so why would it be any harder to identify consoles based on paths?

i30817 commented 5 years ago

Files give the user control over whitelisting at any directory level, which folder names do not. I agree simpler is good, but this was too good an opportunity to pass up. If you have further suggestions or criticism of this method, I opened an issue for the idea (since it's orthogonal to filename scanning).

RobLoach commented 5 years ago

@andiandi13 Also, the issue goes back to 2015 and there is no new scanning method despite the $15 bounty.

There is the Qt interface which you can use to build custom playlists, but it would be great to have it directly in the RetroArch menu. Here's a video demonstrating the Qt interface https://www.youtube.com/watch?v=hfuioGjCItw

Is there a specific bounty value to be sure that the issue will be solved?

It varies depending on motivation and skill for people implementing. There have been bounties that got to ~$150 and were done, and there were bounties that were $0 and were done.

andiandi13 commented 5 years ago

There is the Qt interface which you can use to build custom playlists, but it would be great to have it directly in the RetroArch menu. Here's a video demonstrating the Qt interface https://www.youtube.com/watch?v=hfuioGjCItw

Thanks, but I actually tested it once and found that RetroArch Playlist Manager is faster for creating quick playlists with just drag and drop.

It varies depending on motivation and skill for people implementing. There have been bounties that got to ~$150 and were done, and there were bounties that were $0 and were done.

I see... But honestly, although I'm not a developer, what I suggest seems so easy to implement.

Scanning folders, recognizing paths and names, and creating .lpl files according to the content of each folder linked to a console.
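A sketch of the folder-based approach just described: every file sits under a folder named exactly like a playlist (and like the matching thumbnail folder), and each such folder becomes one playlist whose labels are the filenames. The entry fields mimic the JSON `.lpl` layout (`path`, `label`, `core_path`, `core_name`, `crc32`, `db_name`), but treat the exact schema as an assumption; CRCs are left as `DETECT` for a later pass, and all function names are hypothetical.

```python
import json
import os
from pathlib import PurePosixPath


def build_playlists(root, relative_paths):
    """Group 'Playlist Name/.../game.ext' paths into one playlist dict per top folder."""
    playlists = {}
    for rel in relative_paths:
        parts = PurePosixPath(rel).parts
        if len(parts) < 2:
            continue  # files directly in the root belong to no console
        name = parts[0]
        playlist = playlists.setdefault(name, {"version": "1.0", "items": []})
        playlist["items"].append({
            "path": PurePosixPath(root, rel).as_posix(),
            "label": PurePosixPath(rel).stem,  # filename doubles as the title
            "core_path": "DETECT",
            "core_name": "DETECT",
            "crc32": "DETECT",  # to be filled by a later checksum pass
            "db_name": f"{name}.lpl",
        })
    return playlists


def scan_root(root):
    """Yield file paths relative to `root`, using '/' separators."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            rel = os.path.relpath(os.path.join(dirpath, filename), root)
            yield rel.replace(os.sep, "/")


def write_playlists(playlists, out_dir):
    """Write each playlist as '<name>.lpl' JSON into out_dir."""
    for name, playlist in playlists.items():
        with open(os.path.join(out_dir, f"{name}.lpl"), "w", encoding="utf-8") as f:
            json.dump(playlist, f, indent=2)
```

A full run would be `write_playlists(build_playlists(root, scan_root(root)), playlist_dir)`; files in deeper subfolders still land in the playlist of their top-level console folder.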

ghost commented 5 years ago

While it may be possible with an "ignore CRC" or similar option to add cartridge-based games to the right playlist (based on file extension<->core info/database mappings), this is a lot more difficult for CD systems because we don't have detection methods for all kinds of images and systems.
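To illustrate why cartridges and CDs differ, here is a sketch of an extension-based lookup built from core info data. Core info files list `supported_extensions` and `database` as pipe-separated values; the four records below are simplified and purely illustrative, not copied from real `.info` files. An extension claimed by exactly one database can be placed automatically, while a generic CD extension like `.cue` maps to several systems and cannot.

```python
from collections import defaultdict
from pathlib import PurePath

# Illustrative, simplified core info records (not real .info contents).
CORE_INFO = [
    {"supported_extensions": "gba", "database": "Nintendo - Game Boy Advance"},
    {"supported_extensions": "gg", "database": "Sega - Game Gear"},
    {"supported_extensions": "cue|chd|iso", "database": "Sony - PlayStation"},
    {"supported_extensions": "cue|chd|iso", "database": "Sega - Saturn"},
]


def extension_map(core_info):
    """Map each extension to the set of databases whose cores accept it."""
    mapping = defaultdict(set)
    for info in core_info:
        for ext in info["supported_extensions"].split("|"):
            mapping[ext.lower()].update(info["database"].split("|"))
    return mapping


def guess_playlist(filename, mapping):
    """Return the only database claiming this extension, else None (ambiguous or unknown)."""
    ext = PurePath(filename).suffix.lstrip(".").lower()
    candidates = mapping.get(ext, set())
    return next(iter(candidates)) if len(candidates) == 1 else None
```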

andiandi13 commented 5 years ago

@bparker06: Again, it doesn't have to be that complicated.

So, for CDs (PSX, PSP, GC...), you'd just have to create the appropriate folder (or RetroArch would create them as soon as the option is enabled) and put your .ISO files in the appropriate folders.

I don't know what I could suggest that's easier.

ghost commented 5 years ago

Not everyone wants to make folders.

Where is the playlist name going to come from?

How do you know what database to associate each entry with?

Thumbnails aren't going to work.

Playlist asset icons won't work.

Database metadata won't work.

What's the point now? Might as well just use Load Content.

I promise it's not actually easy to make even the majority of users happy.

andiandi13 commented 5 years ago

You obviously didn't read my previous posts.

Not everyone wants to make folders.

No one is forced to use that new option, and RetroArch can create the folders for us.

Where is the playlist name going to come from?

From the fact that files are inside folders named like playlists, e.g. Game Gear ROMs have to be put in a folder named "Sega - Game Gear" (under a base path that could be changed in the Directory options, e.g. "retroarch/roms" or just "/", the same way you set the playlists or thumbnails folder in the options).

How do you know what database to associate each entry with?

I never said that CRC32 would not work! ROMs would still be scanned: known ROMs would get their information from the database, and unknown ROMs would not. You know, I create my playlists manually on Windows, and thanks to CRC32, my ROMs have database info.

Thumbnails aren't going to work.

Yes they are: as long as your ROMs are named correctly (like the thumbnails) and put in the right folder, thumbnails can be associated. How? Just by taking the filename and using it as the title in the playlist, like "smart" software can do on Windows.

Playlist asset icons won't work.

They would work

Database metadata won't work.

CRC32
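The answers above boil down to two lookups, sketched here under stated assumptions: the CRC decides the title when it is in the database, the filename is the fallback, and the thumbnail is then found by title. The path follows the `thumbnails/<playlist>/Named_Boxarts/<title>.png` layout, and the character substitution is modelled on the libretro-thumbnails convention of replacing characters such as `&`, `*`, `/`, `:`, `<`, `>`, `?`, `\`, `|`, `"` and the backtick with `_`; treat that exact set as an assumption to verify. `database` is a stand-in dict and the function names are hypothetical.

```python
from pathlib import PurePath, PurePosixPath

# Assumed set of characters that cannot appear in thumbnail file names.
THUMBNAIL_UNSAFE = '&*/:`<>?\\|"'


def resolve_label(filename, crc, database):
    """Database title for a known CRC, otherwise the filename without extension."""
    return database.get(crc, PurePath(filename).stem)


def thumbnail_path(playlist, label, kind="Named_Boxarts"):
    """Build the expected thumbnail path for a playlist entry label."""
    safe = "".join("_" if ch in THUMBNAIL_UNSAFE else ch for ch in label)
    return PurePosixPath("thumbnails", playlist, kind, f"{safe}.png").as_posix()
```

A translated ROM named exactly like the original dump resolves to the original box art; a name like "Game (T-Eng 100%)" simply finds no file, which is the "not a big deal" trade-off from the original request.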

ghost commented 5 years ago

I meant that people aren't going to want to have specifically named files and folders in order for all the features/effects of the scanner and databases to work, whether or not the CRC/serial/etc. is checked. And that creates problems.

andiandi13 commented 5 years ago

If people don't want that, they just have to download clean ROMs with clean CRCs and they're good.

What I suggest here is an extra option, an advanced option, maybe even a hidden option.

Other CRC problems can be dealt with in other issues.

But that feature would satisfy everybody who makes their playlists manually on Windows/Mac/Linux, and I know they are numerous.

It's a hundred times easier for them, for us, than using another device to manage playlists and then copying them to the target device. It's so painful, especially when you add new ROMs often.

If naming folders is so painful, RetroArch could let us choose custom paths for each console, as it does with the default core.

i30817 commented 5 years ago

I meant that people aren't going to want to have specifically named files and folders in order for all the features/effects of the scanner and databases to work, whether or not the CRC/serial/etc. is checked. And that creates problems.

This idea (both his and mine) is supplementary to the agnostic scanner that exists today, which allows zero customization. Frankly, the main problem I have with the scanner is that it doesn't allow any adjustment in a sane manner, and the second is the false negatives that this would help work around.

Don't get me wrong, I'm the first to criticize things like '.scummvm files' for the ScummVM core, and the first to wish the scanner were more specialized in that case and could pick up the game folders without that 'help', which causes much, much more work for users. However, even in this example, imagine that tomorrow the scanner gained the ability to recognize ScummVM-compatible game folders without the extra files (from upstream). Would it be enabled? No, precisely because it would slow down the general scanner, and the scanner has no way of knowing when to use that strategy other than the files strategy that already exists, so there is 'no reason' to add the feature. Except there is: with a control strategy, the scanner could switch to the ScummVM strategy when it encounters a single file in the root of the 'scummvm games folder', instead of needing one file per game, each with different contents, for hundreds of games.

So yes, I support a 'hidden' way to configure the scanner that won't bother people who don't care, and I support a more complex version than just renaming directories (the file algorithm I proposed in https://github.com/libretro/RetroArch/issues/8672), because it allows more and better control and would give better chances for specialization even in the scanner (whitelisting, scanning 'two games in a single dir', or even 'just use files of this type to launch the game and forget about game CRC/name checking', etc.).

For instance, some cores already have a specialized 'launcher file' upstream, with several config features, that simply can't be used as-is in the scanner for one reason or another (mainly that they're useless for identification, because they're user-made rather than supplied with the original game by the developer or the dumper). dosbox.conf files are one example, but not the only one. Just in this issue there is a dev writing about DOOM launcher files.

diegopau commented 5 years ago

I am very new to RetroArch and have little knowledge of its inner workings. I am using RetroArch on a PlayStation Classic (Retroboot). I think I can be considered a "regular user", so I wanted to share my experience with this issue:

When I first tried RetroArch I moved my collection of Mega Drive ROMs to a "megadrive" folder and then used the Scan Folder function inside RetroArch. Honestly, I expected this to just work, since all the ROMs are either .gen, .md, .smd or a zip file with the ROM in one of those formats, and all I wanted was to have them in a playlist that can be quickly accessed. After scanning them I found that around half of the games were there and the other half were not. My reaction was to look at the folder structure and try to match the folder names of thumbnails, playlists and the ROMs folder. This time I was really convinced I understood the problem; I even renamed a few files to match the thumbnail names. I thought: "if all the folders have the same name, RetroArch will know these are Mega Drive files because I am matching the name "Sega - Mega Drive - Genesis" everywhere, and the thumbnails have the same names as the files... this has to work!"

After another scan the problem was still there. I could see some thumbnails now, but a big part of the ROMs were missing. Those ROMs come from different sources collected over time; some are hacks and translations, and all (or at least most) have always worked in other emulators. I then found out that RetroArch tries to match every single file against a database, and if a file doesn't match, the game is not added. It was a disappointment. I tried to look at the database file, but it doesn't seem to be plain text. I gave up. My father was struggling with the same thing when adding his ZX Spectrum games, some of which he programmed himself decades ago. Yes, you can always load a game manually and add it to favorites, but it's not the same: what we love about RetroArch is precisely the feeling of having your full collection there, tidy, with playlists and thumbnails, and right now that is only possible by manually building playlists that basically bypass RetroArch's way of doing things.

Another problem with not having an option to build playlists without database matching is that scanning takes a huge amount of time in some cases, and if you have a big collection and have just added one new game, it seems to try to match everything again; sometimes that means leaving the PlayStation Classic working for hours just to add a single game. Some people like me just want to add games to a playlist on a basis of one entry per file or one entry per subfolder, and skip database matching entirely. Then the thumbnails just have to match the filename, which in my opinion is much easier for the average user to understand than working out which specific ROM you need to get (if you can) for the scanner to accept it, or manually editing playlists (I actually tried that and wasn't successful).
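Both wishes in the previous paragraph (only add what's new, and one entry per file or per subfolder with no database matching) are cheap in principle. A minimal sketch, assuming the existing playlist is a list of dicts with a `path` key; `add_new_entries` and its `per_subfolder` option are hypothetical. In subfolder mode each game folder becomes one entry pointing at its first file in sorted order, which is only a placeholder for a real launcher-file choice.

```python
from pathlib import PurePosixPath


def add_new_entries(items, paths, per_subfolder=False):
    """Append entries for paths not yet in the playlist, without any database matching.

    `items` is the existing list of playlist entries (dicts with a "path" key);
    `paths` are POSIX-style paths relative to the console folder.
    Returns how many entries were added.
    """
    known_paths = {item["path"] for item in items}
    known_folders = {PurePosixPath(p).parts[0] for p in known_paths
                     if len(PurePosixPath(p).parts) > 1}
    added = 0
    for path in sorted(paths):
        parts = PurePosixPath(path).parts
        if per_subfolder and len(parts) > 1:
            if parts[0] in known_folders:
                continue  # this game folder already has its entry
            known_folders.add(parts[0])
            label = parts[0]  # folder name is the title
        else:
            if path in known_paths:
                continue  # already listed: no need to rescan it
            label = PurePosixPath(path).stem  # file name is the title
        items.append({"path": path, "label": label})
        known_paths.add(path)
        added += 1
    return added
```

Because existing paths are skipped with a set lookup, adding one game to a large collection only touches that one new entry.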

I just wanted to share my point of view as a new user. Maybe many of the things I comment on or suggest here come from not knowing how it should work, and they're just not possible. Thank you for reading, and big thanks for RetroArch!