muldjord / skyscraper

Powerful and versatile game scraper written in C++
GNU General Public License v3.0

Duplicated data - every asset (video-images) is copied from the cache when generating xml gamelist #237

Closed: theshinyknight closed this issue 4 years ago

theshinyknight commented 4 years ago

Describe the bug After data is scraped and saved in skyscraper/cache/[PLATFORM], creating an XML gamelist copies the data instead of moving it. This means that if you have 4 GB of data for a platform, generating the gamelist file copies that data from the corresponding cache folder to the final location (you can see this clearly if you save the media and XML file in the ROM folder of each platform).

I'm not sure why the data is saved in the cache and then also copied to the final location; this ends up duplicating files and takes up a lot of space on the SD card. I understand that this may be done to keep track of files, and perhaps as a safeguard in case the media is deleted, but it should be a user option: either keep two copies of all the media you download, or keep it only in the cache and generate a gamelist file that points at the resources in the cache, instead of copying all assets somewhere else and using twice the space.

To Reproduce Download a substantial number of videos for a platform; check the size of the cache folder for that platform; in the Skyscraper script, select the option to copy data to the ROM folder; generate the gamelist file.

Current result Data is duplicated; every media file is copied instead of being moved to the roms folder.

Special circumstances none

Terminal output none

Technical information

Additional context None
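The "check the size of the cache folder" step in the reproduction above can be done with any disk-usage tool. As a minimal sketch, the Python function below sums the sizes of regular files under a directory while skipping symlinks, so it reports only data that actually occupies space there. The function name `dir_size_bytes` is a hypothetical helper for illustration and is not part of Skyscraper.

```python
import os


def dir_size_bytes(root):
    """Sum the sizes of regular files under root.

    Symlinked files are skipped and symlinked directories are not
    descended into (os.walk does not follow them by default), so the
    result reflects only data stored inside root itself.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total
```

Calling it once on the platform's cache folder and once on the ROM folder, before and after generating the gamelist, shows how much data was copied rather than moved.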

muldjord commented 4 years ago

Yes, this is intentional and the point of the cache. Skyscraper supports multiple frontends. ES is just one. Each has its own way of storing videos for its gamelists. If you decide to change frontends or change any other part of your Skyscraper config (artwork setup or title look), all you have to do is regenerate the gamelists with the new frontend and you are up and running. No need to rescrape data, as it is already in the cache.

Using videos takes up a lot of space. This is why the --flags symlink flag and the config.ini option symlink="true" exist. Please read the documentation here and here.

If you prefer to remove the data, you can do so with the --cache purge option, also documented in one of the above links. If you do, you will have to rescrape if you make any changes to your Skyscraper setup or change frontends, unless you want to convert the data from ES manually.
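To illustrate why a symlink option saves space compared with copying, here is a minimal Python sketch. It is not Skyscraper's implementation: `export_media`, `disk_bytes`, `demo`, and the folder names are hypothetical, and it assumes a POSIX system where creating symlinks needs no special privileges. A dummy 1 MB "video" is placed in a temporary cache folder and then exported twice, once by copy and once by symlink.

```python
import os
import shutil
import tempfile


def export_media(cache_file, dest_file, symlink=False):
    """Place a cached media file at dest_file, by symlink or by copy."""
    os.makedirs(os.path.dirname(dest_file), exist_ok=True)
    if symlink:
        # The link stores only the target path, not the file contents.
        os.symlink(os.path.abspath(cache_file), dest_file)
    else:
        # A copy stores the full contents a second time.
        shutil.copy2(cache_file, dest_file)


def disk_bytes(path):
    """Size of the entry at path itself; lstat does not follow symlinks."""
    return os.lstat(path).st_size


def demo():
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical cache entry: 1 MB of dummy data standing in for a video.
        cache_file = os.path.join(root, "cache", "video.mp4")
        os.makedirs(os.path.dirname(cache_file))
        with open(cache_file, "wb") as f:
            f.write(b"\0" * 1_000_000)

        copied = os.path.join(root, "roms_copy", "video.mp4")
        linked = os.path.join(root, "roms_link", "video.mp4")
        export_media(cache_file, copied, symlink=False)
        export_media(cache_file, linked, symlink=True)
        return disk_bytes(copied), disk_bytes(linked), os.path.islink(linked)
```

In this sketch the copied file is as large as the cached original, while the symlink entry is only as long as the path it points to. The trade-off is that the link stops working if the cache entry it points to is removed.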

muldjord commented 4 years ago

I'm not sure if you were aware that this wasn't a bug, but if you were, for similar inquiries in the future I would point you to the RetroPie subreddit and/or the RetroPie forums. There are many Skyscraper users there who could answer this question. Although I strive to provide support and often answer questions there as well, I would like to give others the opportunity to provide it, to lighten my workload.

All Skyscraper documentation can be found here.

If you wish to use a non-caching scraper, I would direct you to sselph's scraper (simply called scraper, if I recall correctly), which is also found in the RetroPie installation script.

theshinyknight commented 4 years ago

Thanks for the clarification. Since it was not mentioned anywhere that you end up with data both in the cache and in the final destination, and since in software design and development you rarely have redundancy unless it is necessary (for example, a DB kept in multiple copies with links to assets by reference), I thought it was a bug.

It makes sense in this context to keep the cache around; what is not explained is why this is not optional. Taking up space on an SD card without informing the user may cause confusion, and a simple option would let users decide whether to keep their files in the cache in case they switch frontends one day (EmulationStation is the default on the default image for retrogaming, so it is safe to assume that the majority are using EmulationStation).

There was a time when space, memory and CPU were at a premium, and as such, you would design your systems to be as efficient as possible. The fact that we have huge SD cards makes us developers lower our standards, thinking that because storage is cheap, there is no reason to worry about using space inefficiently.

I have retrieved the data I needed, so I am fine with removing the cache; I can add extra items to the gamelist by hand. Thanks for the help in reaching my goal of having all ROMs scraped on my arcade. Cheers.