Open i30817 opened 4 years ago
To avoid the need for a parsing and filtering hierarchy in the C code (for m3u at least), I was thinking that maybe parsing could be triggered from certain files.
The best I could think of was an operator | that could parse a file based on its extension and 'destructure' the strings one by one on the right-hand side, to be used with the glob 'both sides' rule.
complete redump 'rule':
*.m3u | *.cue => * (Track 1).bin:CRC32
*.m3u | *.cue => *.bin:CRC32
*.cue => * (Track 1).bin:CRC32
*.cue => *.bin:CRC32
I realized that in order for this to work I needed to mandate that the file format filters out the mappings too. I.e. in that case the 'successful' cues in *.m3u | *.cue => * (Track 1).bin that mapped a cue to an actual file would get removed from the pool of the following *.cue => * (Track 1).bin, which led to the realization that I needed to filter out files matched on previous lines for everything, and I added that to the spec. But I don't really like the 'parsing operator' idea.
I think it's a combination of things that makes this a bad idea: it increases the number of rules for a single case, sabotages the universality of the 'both sides' rule, could match multiple relative directories if the m3u pointed to a subdir or relative dir, and looks weird.
Another alternative I thought of would be to embrace the parsing and remove the globbing as name id: turn the leftmost entries in the DF files into file suffixes. If possible, the [] operator at the rightmost end of a token is interpreted as 'access string(s) that you can extract from a parse'.
The 'complete redump' would be:
.m3u[*][0]:XATTR
.cue[0]:XATTR
but I don't know how this scheme would work when you do have to use the name to associate the LF with the SF, because the parse is too complicated or not applicable:
dosbox.config ????
So between these two options I chose the name-centric one as more universal and still applicable to particular cases.
Advice welcome.
Description
This idea requires massive changes to the scanner, and the playlist GUI or playlist fileformat, but i think it'd be worth it. Recently there have been some requests for a 'playlist/console scanner' option for the GUI on the filesystem and this could be used to implement that too.
This is a restatement of https://github.com/libretro/RetroArch/issues/8672 into a single post.
glossary:
RA: RetroArch.
LF (launcher file): the file that appears on a playlist to be given to the core. Often a rom, but in this proposal it can be something else too. There can be multiple LF per game.
DF (detect file): a file whose format holds a language to describe the expected name of a LF and, optionally, a mapping from instances of that match to a libretro-database entry. This optional mapping is needed because LF are most often editable and not part of a game distribution, and also often not modified by hacks, so they're not suitable to identify a game.
SF (signature files): the files that libretro-database uses to identify a game. There can be multiple of these per game; for instance, DOS games may have an installer SF and a game SF.
Expected behavior
This idea is orthogonal and cannot replace scanner heuristics that map a game rom/iso to a playlist (ie: console identification) completely, if you want users that didn't organize their games on the filesystem to 'scan' a random mishmash of rom types. It's just a way to disable that mess and take control by making the scanner only consider roms of a console below or in a directory. If you have any idea to make this type of configuration replace the heuristics, please comment, because that is the messiest part of the scanner code from what i've seen.
As an aside to this proposal: if you have, for instance, a game with a cue referencing an iso (for mednafen), and both are valid LF, both appear on the playlist. 'Filtering' this is not handled in this idea, but it could be done after having the whole collection, through an understanding of certain files. For instance, after acquiring a cue or m3u file for a playlist, files referenced there could be hidden from the playlist and their metadata added to the corresponding cue or m3u (if it is missing there and exists on the files pointed to).
The idea
The main idea is that the users organize their games per platform on the filesystem and then when the scanner is given a start directory to scan, the recursive function doing that gains a stack argument and does this:
Detect file format
Each line of text maps to either a single file or a list of files, and those optionally map to a single libretro-database entry (ideally) or to multiple entries (if there is no other option).
If a line has no mapping ( => ) but just a single file with a glob or not, that file serves as both LF and SF.
Matching lines (for the files, not database entries) remove the leftmost file (LF) in the mapping from the pool of the next line; for performance, and correctness. Again for performance the scan should cache calculated checksums in the case a file doesn't match a line, though the XATTR proposed method would be a longer lived cache and i'd use it when possible.
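The pool rule and the checksum cache described above can be sketched roughly as follows. This is only an illustration, not RetroArch code: the names `scan` and `crc32_of` are hypothetical, and each DF line is reduced to a hypothetical (LF suffix, SF suffix) pair instead of the real glob syntax.

```python
import zlib
from pathlib import Path

def crc32_of(path, cache):
    """Return the CRC32 of a file, computing it at most once per scan."""
    if path not in cache:
        cache[path] = zlib.crc32(Path(path).read_bytes()) & 0xFFFFFFFF
    return cache[path]

def scan(rules, files):
    """Apply DF lines in order and return (LF, SF) pairs.

    rules: list of (lf_suffix, sf_suffix) pairs, a simplified stand-in
           for real DF lines (assumption made for this sketch).
    files: file names found in one directory.
    """
    pool = set(files)       # LF candidates still available to later lines
    all_files = set(files)  # SFs are never removed, so they can be shared
    entries = []
    for lf_suffix, sf_suffix in rules:
        for lf in sorted(pool):
            if not lf.endswith(lf_suffix):
                continue
            sf = lf[: -len(lf_suffix)] + sf_suffix
            if sf in all_files:
                entries.append((lf, sf))
                pool.discard(lf)  # a matched LF is invisible to later lines
    return entries
```

With the two rules `(".cue", " (Track 1).bin")` and `(".cue", ".bin")`, a cue that already matched the first rule is never considered by the second one, even if a plain `.bin` with the same name also exists.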
+ is a metacharacter for globbing that doesn't cross directories (filename only), and which if it appears on 'both' sides of a mapping is restricted to only add a playlist entry if it's the same on both sides. This is done to allow + globbed libretro-database mappings to map to different metadata entries than the first; being essentially a way to allow users to 'connect' a LF to a SF and not have to create DF for every single game if they use LF, 'just' name/rename the LF correctly.
* is a metacharacter for (directory only) globbing that should only appear on the right-hand side of a line, does cross directories, and is potentially empty. It's a way for the SF side of a line to search for a directory tree, not only a single directory, and to still allow the 'both sides' rule of '+' to take effect.
It's best if the directory separator here is always '/', a 'fixed' choice, so the files work on both unix and windows, and best if the lines are matched to files case-insensitively.
In the libretro-database mapping it might be helpful to allow different methods (CRC32, NAME, MD5) on the SF, and it might be helpful to allow a fixed entry without a search. If no suffix is given, no attempt is made to fetch metadata.
Of special interest here is that these libretro-database matching methods can be extended. For instance, if in the future the database gets support for the CHD internal sha1 checksum, you could have a 'CHD' method coded, or a 'PS1SERIAL' one, or even a unix-only 'XATTR' one to reuse a checksum recorded in the file that supports softpatches etc.
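A rough sketch of how the two metacharacters and the 'both sides' rule could be checked, by translating each side of a line into a regular expression. Several things here are assumptions and not part of the proposal: the function names, that '+' matches at least one character, that there is at most one '+' per side, that '*' is written as `*/` in a pattern, and that both paths are given relative to the directory containing the LF.

```python
import re

def compile_side(pattern):
    """Translate one side of a DF line into a case-insensitive regex.

    '+'  -> one capture group that never crosses a '/' (filename only).
    '*/' -> zero or more whole directory components (may be empty).
    Anything else, including a '*' not followed by '/', is taken literally.
    """
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("*/", i):
            out.append(r"(?:[^/]+/)*")
            i += 2
        elif pattern[i] == "+":
            out.append(r"([^/]+)")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out), re.IGNORECASE)

def both_sides_match(lf_pattern, sf_pattern, lf_path, sf_path):
    """True if both paths match and any '+' captured the same text on both sides."""
    lf = compile_side(lf_pattern).fullmatch(lf_path)
    sf = compile_side(sf_pattern).fullmatch(sf_path)
    if lf is None or sf is None:
        return False
    if lf.groups() and sf.groups():
        # the 'both sides' rule: '+' must stand for the same name on each side
        return lf.group(1).lower() == sf.group(1).lower()
    return True
```

For example, under these assumptions `both_sides_match("+.conf", "game/*/+.exe", "doom.conf", "game/bin/Doom.exe")` holds, while pairing `doom.conf` with `game/setup.exe` does not.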
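One way to keep the matching methods extensible is a simple table from the suffix name to a function, sketched below. The function names and the returned `(method, key)` shape are hypothetical; only the CRC32, MD5 and NAME suffixes come from the text above, and a real scanner would of course do this in C.

```python
import hashlib
import zlib
from pathlib import Path

def crc32_key(path):
    return format(zlib.crc32(Path(path).read_bytes()) & 0xFFFFFFFF, "08X")

def md5_key(path):
    return hashlib.md5(Path(path).read_bytes()).hexdigest()

def name_key(path):
    return Path(path).stem  # file name without directory or extension

# New methods (e.g. a 'CHD' or 'XATTR' one) would be registered here.
METHODS = {"CRC32": crc32_key, "MD5": md5_key, "NAME": name_key}

def database_key(sf_spec):
    """Split 'path:METHOD' and compute the lookup key for the database.

    Returns None when no suffix is given, meaning no metadata is fetched.
    """
    path, sep, method = sf_spec.rpartition(":")
    if not sep:
        return None
    if method not in METHODS:
        raise ValueError("unknown matching method: " + method)
    return method, METHODS[method](path)
```

Adding a method is then just one more entry in `METHODS`, without touching the line parser.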
To make it clear, some examples:
Sony - Playstation.df
with content

In and under the DF dir, search for any cue file as LF and get the metadata in libretro-database from the CRC32 of a '.bin' file in the same dir with the same name (minus extension).
Sony - Playstation.df
with content

As above, but you 'know' that you have redump files, therefore you can have separate tracks if the game has digital audio.
NEC - Turbografx-16.df
with content

Turbografx CDs need to use the second track for identification, because that is where the game actually is.
DOS.df
with contents

For any fixed-name dosbox.conf file in and under the DF dir, use it as LF and get the metadata in libretro-database by looking up, in the DOS database, the CRC32 of the first executable you find under the 'game' subdir, and place it on the 'DOS' playlist. As you probably noticed, in this case the metadata is uncertain, because there might be more than one executable in the database that matches. Resolving this ambiguity can be done by using a fixed
DOS.df
with contents
dosbox.conf => game/game.exe:CRC32
but that is not scalable, thus the allowance to show 'a' metadata entry in this case.
DOS.df
with contents

Speaking of scalable: here the '+' 'both sides' rule applies, but the * allows the right-hand side to search the whole game/ directory tree for all executables, which is much more reliable.
DOS.df
with contents

If there are two or more conf files in the dir and they match that exe, they're all entered with the same metadata but different LF. This also shows why only the LF is removed from the pool of possible files after a line; other LFs could use the same SF, but no line should match the same LF again.
DOS.df
with contents

A fixed mapping, where the libretro database is only used to fetch an already calculated CRC. Not that useful an idea, but it may help if there is a several-GB file you don't want to rescan, or a hack that is not in libretro-database and that you want to force onto a particular metadata entry.
Sony - Playstation.df
with content

As was mentioned before, but I want to emphasize: a DF can have multiple lines for multiple types of allowed LF, and m3us are a 'special case' of LF whose libretro-database mapping requires parsing and filtering, which would be inappropriate for this file format, so it's simply not done here.
After the scanner is over (or the playlist is shown), the m3us would be parsed and entries in it that match LF on the playlist would have those LF hidden and their metadata used as metadata of the m3u file. This is the main part of this scheme that I think requires GUI code modification, specifically the code to hide entries, though it should be done anyway if you want m3u support in the playlist.
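The post-scan m3u step could look something like the sketch below. It assumes, only for illustration, that the playlist is a dict from normalized LF path to a metadata object (or `None` when no metadata was found), and that an m3u is a plain list of paths with `#` comment lines; `referenced_files` and `fold_m3u` are hypothetical names.

```python
import os

def referenced_files(m3u_path):
    """Return the normalized paths listed in an m3u, relative to its directory."""
    base = os.path.dirname(m3u_path)
    refs = []
    with open(m3u_path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line and not line.startswith("#"):
                refs.append(os.path.normpath(os.path.join(base, line)))
    return refs

def fold_m3u(playlist):
    """Hide LFs referenced by an m3u and let the m3u inherit their metadata.

    Returns (visible, hidden): the entries to show and the set of hidden paths.
    """
    visible = dict(playlist)
    hidden = set()
    for lf in playlist:
        if lf not in visible or not lf.lower().endswith(".m3u"):
            continue
        for ref in referenced_files(lf):
            if ref != lf and ref in visible:
                meta = visible.pop(ref)
                hidden.add(ref)
                if visible[lf] is None:
                    visible[lf] = meta  # first referenced entry with data wins
    return visible, hidden
```

Keeping the hidden set separate, instead of deleting entries from the stored playlist, matches the point that the per-file metadata stays cached and no new scan is needed.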
I think it cannot be done on the playlist format because the metadata of individual files in the m3u is cached there, and removing them from there would necessitate a new scan unless the playlist format gets special support for m3u and multiple 'keys' for metadata, one per item on it.
Taking into account that a file mapping match removes the leftmost file from the possibilities of the lines below it, a complete redump + extras rule on Linux could be:
In order, from more to less specific: 2 normal redump mappings, hacked games that the patch turned to iso mappings (and the user named them iso instead of bin), remaining cues where the name of the cue has no relation to the bin/iso and so have no libretro-database candidate key (without parsing), then m3u that will filter out all of the others that match and consume their metadata after the scan. If possible, the metadata would be fetched by checking it on extended attributes first, and potentially saving it there if missing after calculation, so only the first scan is slow.
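The 'check extended attributes first, save there if missing' idea can be sketched with Python's Linux-only `os.getxattr` / `os.setxattr`. The attribute name `user.df_crc32` and the function name are made up for this example, and the sketch silently falls back to plain calculation when xattrs are unavailable (other platforms, filesystems without user xattr support, read-only files).

```python
import os
import zlib

XATTR_NAME = "user.df_crc32"  # hypothetical attribute name

def cached_crc32(path):
    """Return the file's CRC32, reusing a value stored in an xattr if present."""
    try:
        return int(os.getxattr(path, XATTR_NAME).decode("ascii"), 16)
    except (OSError, AttributeError, ValueError):
        pass  # no xattr support, attribute missing, or unreadable value

    # Slow path: hash the file in chunks so large images do not fill memory.
    crc = 0
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            crc = zlib.crc32(chunk, crc)
    crc &= 0xFFFFFFFF

    try:
        os.setxattr(path, XATTR_NAME, format(crc, "08x").encode("ascii"))
    except (OSError, AttributeError):
        pass  # caching is best effort only
    return crc
```

Note that this sketch never invalidates the stored value; a real implementation would probably also need to record something like the modification time to notice that a file changed.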
I think that's all; any suggestions or ideas? If you have any idea how to simplify the RA heuristics code beyond this I encourage you to post it.
edits: