[Request] Scan de-compressed files only

tony971 commented 7 years ago

Right now, scanning a compressed file (.7z, for example) scans both the .7z file itself and each file it contains. Scanning the .7z file itself takes considerable time and could be skipped when the user only cares about the files inside. Would it be possible to add an option for "Scan de-compressed files only"?

RobLoach commented 7 years ago

Compile with HAVE_COMPRESSION=0?

tony971 commented 7 years ago

Would that still allow scanning files contained within zips and such?

ghost commented 7 years ago

@tony971 Some cores, for example MAME and FBA, accept compressed files directly, and our databases do contain checksums for those zip/7z files. However, the scanner should only check the archives themselves against databases whose corresponding core info files indicate that the core can load an archive directly. Removing this functionality would break scanning for those cores... unless you know of a better alternative.
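A rough sketch of the filtering described above, for illustration only. `CoreInfo` and its `accepts_archives` field are hypothetical simplifications of what a core info file declares; the database names are placeholders, not RetroArch's actual identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreInfo:
    """Hypothetical, simplified view of one core info file."""
    name: str
    database: str           # database this core's content is matched against
    accepts_archives: bool  # assumption: True if the core loads .zip/.7z directly


def databases_for_archive(cores):
    """Return the databases an archive file itself should be checked against.

    Only cores that load archives directly can have archive-level checksums
    worth matching, so every other core's database is skipped.
    """
    return sorted({c.database for c in cores if c.accepts_archives})
```

With a MAME core and an FBA core (both accepting archives) plus a SNES core that does not, only the MAME and FBA databases would be consulted for the .7z file itself; the SNES database would still see the files extracted from it.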

tony971 commented 7 years ago

I was hoping for a configurable setting to skip zip/7z files themselves and only scan their contents. Users should know whether they need the zip/7z files scanned or not.
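A minimal sketch of how such a setting might change what gets hashed for one archive. The `contents_only` flag stands in for the proposed option and is hypothetical; the `archive#entry` naming for files inside an archive is also an assumption made for this example.

```python
def items_to_scan(archive_path, entries, contents_only):
    """Return the items the scanner would hash for one archive.

    contents_only: hypothetical "Scan de-compressed files only" setting.
    Entries inside the archive are named "archive#entry" (an assumption).
    """
    # Hash the archive itself only when the option is off.
    items = [] if contents_only else [archive_path]
    # The files inside the archive are always scanned.
    items.extend(f"{archive_path}#{entry}" for entry in entries)
    return items
```

For example, `items_to_scan("games.7z", ["a.sfc"], True)` yields only `["games.7z#a.sfc"]`, while passing `False` also includes `"games.7z"` at the front.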

aarononeal commented 7 years ago

I'd like to see this improved without UI configuration. The main issues are performance and lack of hinting.

One problem with the current scanning approach is that each zip/7z has to be checked against every database of every installed core that supports compressed files. After I added MAME, scanning slowed down significantly because its massive database was being checked even when I wasn't scanning MAME content. I now remove that core before every scan just to avoid this.

To make matters worse, the scanning logic re-reads the database header for every file it scans, then memory-maps the entire database and frees it again, once per scanned file. This is hugely wasteful and completely defeats the caching that the database layer attempts through mmap.

Rather than looping over every file in the scan set and checking each core for it, it would be much faster to loop over each core and check every file in the scan set against it. That way each database is loaded only once per scan, and record caching actually works.
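To make the cost difference concrete, here is a toy comparison of the two loop orders. `Database` is a hypothetical stand-in that only counts how often it is "loaded" (in the real scanner a load would be the header read plus mmap); matching on a plain checksum set is a simplification.

```python
from collections import defaultdict


class Database:
    """Stand-in for a content database that counts how often it is loaded."""

    def __init__(self, name, crcs):
        self.name = name
        self._crcs = frozenset(crcs)
        self.loads = 0

    def load(self):
        # Real scanner: read header + mmap. Here we only count the load.
        self.loads += 1
        return self._crcs


def scan_per_file(files, databases):
    """Current order: for each file, load every database again."""
    matches = defaultdict(list)
    for path, crc in files.items():
        for db in databases:
            if crc in db.load():
                matches[db.name].append(path)
    return dict(matches)


def scan_per_database(files, databases):
    """Proposed order: load each database once, then check every file."""
    matches = defaultdict(list)
    for db in databases:
        crcs = db.load()
        for path, crc in files.items():
            if crc in crcs:
                matches[db.name].append(path)
    return dict(matches)
```

Both orders find the same matches, but the per-file order performs (files × databases) loads while the per-database order performs one load per database, regardless of how many files are scanned.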

That alone might speed up zip/7z scans enough that no further optimization is needed.

Another optimization would be to check the content path. If the name of the content folder matches the name of a content database, that makes it perfectly clear that only that single database needs to be scanned. The hint would then need no in-app configuration; the user would just have to name the content folder explicitly on their filesystem.
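A small sketch of the folder-name hint, assuming an exact, case-sensitive match between the folder's name and a database name, with a fallback to all databases otherwise. The function name and the example database names are illustrative, not taken from RetroArch's code.

```python
from pathlib import PurePath


def databases_for_folder(content_dir, database_names):
    """Choose which databases to check for a scanned folder.

    If the folder's own name exactly matches a database name (assumed
    case-sensitive), only that database is checked; otherwise all are.
    """
    folder = PurePath(content_dir).name
    if folder in database_names:
        return [folder]
    return list(database_names)
```

Scanning `/roms/Nintendo - Game Boy` against a list containing `"Nintendo - Game Boy"` would then touch only that database, while a folder named `misc` would fall back to the full list. Whether the match should ignore case is a design choice this sketch leaves strict.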

libretro / RetroArch #5440