schellingb / dosbox-pure

DOSBox Pure is a new fork of DOSBox built for RetroArch/Libretro aiming for simplicity and ease of use.
GNU General Public License v2.0
759 stars 62 forks

Request: 7zip archive support #39

Open ryz opened 3 years ago

ryz commented 3 years ago

Hello!

It'd be great if DOSBox Pure would support 7z archives (as well as loading games from them on-the-fly, just like zip files do) due to the superior compression ratios with LZMA2 compared to zip archives.

With that said, thanks for your hard work! I've been using DOSBox Pure extensively for the last few days and it works really well so far - stellar release! Features like loading games from archives, save states, and fast forward which 95% of the other forks ignore are absolute game changers, so really, thanks again!

schellingb commented 3 years ago

One big problem I'd expect with supporting 7z is feasibility performance-wise. In a ZIP file, each and every file is compressed completely independently, so accessing and "streaming" a specific file out of a ZIP is very possible. On top of that, thanks to the seek functionality I implemented in this core, even a large file can be streamed without having to uncompress it completely into memory or onto disk.
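
To illustrate the per-file independence of ZIP entries, here is a minimal sketch using Python's standard `zipfile` module. It is not the ZIP code used in this core, and the file names and contents are made up for the example.

```python
import io
import zipfile

# Build a small ZIP in memory; names and contents are invented for illustration.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("GAME.EXE", b"\x90" * 200_000)
    zf.writestr("README.TXT", b"Hello from a tiny text file.\r\n")
    zf.writestr("DATA.BIN", bytes(range(256)) * 1000)

with zipfile.ZipFile(buf) as zf:
    # Every entry has its own offset and compressed size in the directory,
    # so one entry can be decoded without decoding any of the others.
    info = zf.getinfo("README.TXT")
    readme = zf.read("README.TXT")

    # A large entry can be streamed in pieces instead of inflated all at once.
    with zf.open("DATA.BIN") as member:
        first_chunk = member.read(16)

print(info.compress_size, readme, first_chunk)
```

Reading `README.TXT` here never touches the compressed bytes of `GAME.EXE` or `DATA.BIN`, which is the property that makes on-the-fly loading from ZIP practical.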

If you've ever tried to look at a small .TXT file in a large .7Z, you've probably noticed it takes a long time. That is because in 7Z, files aren't compressed separately but as one big thing. So opening even a small file can take a long time.

While I had a bit of experience with ZIP files and deflate, the most common compression algorithm used in them, before starting this core, I don't know much about the insides of 7zip and LZMA2. Maybe someone has done a random access streaming solution for it that I could look at? But if the highly regarded 7zip application is this slow with opening a file, I don't have high hopes :-)

ryz commented 3 years ago

Hey @schellingb, thanks for the answer!

I was just reading DOSBox-X feature highlights and they state the following:

"Support for mounting ZIP/7Z archives as drives: You can mount ZIP or 7Z archives as DOSBox-X drives and run your DOS programs or games in these mounted drives directly, which will operate in read-only mode."

Maybe it'll help to see how they implemented it?

Cheers!

schellingb commented 3 years ago

@ryz I wrote a little test program to compare the seek performance and for ZIP this turned out as expected.

https://user-images.githubusercontent.com/14200249/103455365-c347db00-4d2f-11eb-94e8-f73366fd5bec.mp4

(notice I'm not even starting the test at the same time but in DOSBox Pure it is over almost instantly)

This is explained by a comment in the PhysFS code as used by DOSBox-X:

        /*
         * If seeking backwards, we need to redecode the file
         *  from the start and throw away the compressed bits until we hit
         *  the offset we need. If seeking forward, we still need to
         *  decode, but we don't rewind first.
         */
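
The behaviour described in that comment can be sketched as follows. This is a rough illustration in Python with `zlib`, not the PhysFS code; the class name `ForwardOnlyReader` and the `restarts` counter are made up for the example.

```python
import zlib


class ForwardOnlyReader:
    """Emulates seek() on a deflate stream by decoding and discarding data."""

    def __init__(self, compressed: bytes):
        self._compressed = compressed
        self.restarts = 0  # how often decoding had to restart from the beginning
        self._rewind()

    def _rewind(self) -> None:
        self._decoder = zlib.decompressobj()
        self._input = self._compressed
        self._pos = 0  # current offset in the uncompressed data

    def read(self, size: int) -> bytes:
        if size <= 0:
            return b""
        out = self._decoder.decompress(self._input, size)
        self._input = self._decoder.unconsumed_tail
        self._pos += len(out)
        return out

    def seek(self, offset: int) -> None:
        if offset < self._pos:
            # Seeking backwards: start over and redecode from the beginning.
            self._rewind()
            self.restarts += 1
        while self._pos < offset:
            # Seeking forwards: decode and throw away until the offset is hit.
            if not self.read(min(65536, offset - self._pos)):
                break  # reached the end of the stream


data = bytes(i % 251 for i in range(300_000))
reader = ForwardOnlyReader(zlib.compress(data))
reader.seek(200_000)            # forward seek: decode 200,000 bytes, no rewind
forward = reader.read(10)
reader.seek(100)                # backward seek: rewind, then decode 100 bytes
backward = reader.read(10)
print(forward == data[200_000:200_010], backward == data[100:110], reader.restarts)
```

With this approach the cost of a seek grows with the target offset, and every backward seek pays for a full restart, which fits the slow ZIP result in the video.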

With 7z things look a bit different. Somehow running DIR on a mounted .7z with just a single file takes a long time. Opening a file also takes quite long. But reading and seeking inside the file then isn't that bad! I can't test actual games because the mounting in DOSBox-X is just read-only so games that depend on fast seek like Descent refuse to run. Also mounting ISO from inside a mounted archive is not supported in DOSBox-X so it's hard to compare the performance with larger games.

Indexing (DIR) can certainly be optimized. But if opening files always is that slow, it could be usable for games with one large file, but games with many small files would still be quite a problem. Overall I learned that 7z seeking performance is not as hellish as I assumed, so maybe there's a future for .7z in DOSBox Pure. No promises :-)

PoloniumRain commented 3 years ago

If you've ever tried to look at a small .TXT file in a large .7Z, you've probably noticed it takes a long time. That is because in 7Z, files aren't compressed separately but as one big thing. So opening even a small file can take a long time.

Hey @schellingb, are you aware of the 'Non-solid' 7zip compression option? :) Have you done any tests with it?


With non-solid, the entire .7z file doesn't have to be uncompressed when opening a single file. So in your example above, you could now open that small .TXT file without having to uncompress the entire thing. It's way faster. The compression ratio isn't as good with non-solid, but files are still considerably smaller than what zip is capable of. Currently I don't use zips for my DOS games because the mediocre compression won't save that much space, and DOSBox-SVN and Core can't open zips. Not worth the effort. But if I could use .7z I'd compress them anyway because the space saving would be pretty significant. It would be such a great feature to have!
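
The trade-off can be imitated with Python's standard `lzma` module. This is only a simulation under stated assumptions: it does not read or write real .7z archives, "solid" is modelled as one stream over all files back to back, "non-solid" as one stream per file, and the file names and contents are invented.

```python
import lzma

# Made-up "game files" that share a lot of content.
shared = b"The quick brown fox jumps over the lazy dog. " * 200
files = {
    f"LEVEL{i:02}.DAT": shared + f"unique data for level {i}".encode() * 50
    for i in range(20)
}

# Solid-like: a single compressed stream containing every file.
solid = lzma.compress(b"".join(files.values()))

# Non-solid-like: each file is its own independent stream.
non_solid = {name: lzma.compress(data) for name, data in files.items()}


def read_from_solid(name: str) -> bytes:
    """Decode from the start of the stream until the wanted file is complete."""
    start = 0
    for other, data in files.items():  # stands in for the archive's index
        if other == name:
            break
        start += len(data)
    end = start + len(files[name])
    decoded = lzma.LZMADecompressor().decompress(solid, max_length=end)
    return decoded[start:end]


def read_from_non_solid(name: str) -> bytes:
    """Decode only the one stream that belongs to the wanted file."""
    return lzma.decompress(non_solid[name])


print("solid size:    ", len(solid))
print("non-solid size:", sum(len(c) for c in non_solid.values()))
```

In this toy setup the solid stream is smaller because repeated content across files is only stored once, but reading the last file means decoding everything in front of it, while the non-solid read decodes a single small stream.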

Augusto7743 commented 1 year ago

About the lower loading performance of 7-Zip compared to ZIP: the slowdown isn't really big if you use a good CPU, and many DOS games are less than 100 MB in total size. Even an AMD Athlon II dual core at 3.5 GHz runs at good speed.

PoloniumRain commented 1 year ago

@schellingb Any updates on 7-Zip support? (or CHD?).

BTW, the latest Windows 11 update, released publicly tomorrow, has native support for .7z files.

Augusto7743 commented 1 year ago

Is 7z support only for the Windows version of the core?

PoloniumRain commented 1 year ago

Is 7z support only for the Windows version of the core?

No, I meant Microsoft added native 7-Zip support to Windows. Previously you had to download the third-party 7-Zip software to open or extract a .7z archive file. That's no longer needed.

I think this change will help make .7z files more popular, which is great because ZIP offers awful compression.

schellingb commented 1 year ago

Yeah, @PoloniumRain is right, having .7z support in Windows won't suddenly make the core magically able to support 7zip. Even if Windows 11 had an "Extract .7z file" API, it wouldn't give us the low-level .7z file access we need.

I recently did a deep dive into .7z source code and now with what I've learned there, we are much closer to getting 7zip support in DOSBox Pure. One issue with 7zip is the amount of compression formats it supports. A file inside 7zip can be compressed with the algorithms "LZMA2", "LZMA", "PPMd", "BZip2", "Delta", "BCJ", "PPC", "IA64", "ARM", "ARMT" and "SPARC". On top of that a file can be compressed multiple times with different algorithms to achieve maximum compression. For example an .EXE file can be compressed with "BCJ" which is specifically for code inside an .EXE file and then on top of that be compressed with "LZMA2". This means a lot of code needs to be added to DOSBox Pure for just this one feature.
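
As a small illustration of such a chain, Python's standard `lzma` module (which wraps liblzma, not the 7-Zip code, and does not handle the .7z container) can apply a BCJ x86 filter followed by LZMA2. The `fake_exe` bytes are an invented stand-in for real executable code.

```python
import lzma

# Filters are applied in this order when compressing: BCJ (x86) first, then LZMA2.
chain = [
    {"id": lzma.FILTER_X86},
    {"id": lzma.FILTER_LZMA2, "preset": 6},
]

# Invented stand-in for .EXE code: x86 CALL instructions padded with NOPs.
fake_exe = (b"\xE8\x10\x00\x00\x00" + b"\x90" * 11) * 4096

# FORMAT_RAW stores no header, so the decoder has to be told the same chain
# and needs an implementation of every filter in it.
packed = lzma.compress(fake_exe, format=lzma.FORMAT_RAW, filters=chain)
unpacked = lzma.decompress(packed, format=lzma.FORMAT_RAW, filters=chain)

print(len(fake_exe), len(packed), unpacked == fake_exe)
```

The decoder has to run the chain in reverse, so each coder an archive may use is one more piece of code the reader has to carry.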

I still want to do it but I just need time, which I currently don't have much of. The little bit of time I have I want to spend on cleaning up what I prepared for the next version and finally getting that out, without losing myself again in adding another feature to it. It's been too long since the last update...

Augusto7743 commented 1 year ago

Windows having native support for 7z is a miracle.

PoloniumRain commented 1 year ago

One issue with 7zip is the amount of compression formats it supports. A file inside 7zip can be compressed with the algorithms "LZMA2", "LZMA", "PPMd", "BZip2", "Delta", "BCJ", "PPC", "IA64", "ARM", "ARMT" and "SPARC". On top of that a file can be compressed multiple times with different algorithms to achieve maximum compression. For example an .EXE file can be compressed with "BCJ" which is specifically for code inside an .EXE file and then on top of that be compressed with "LZMA2". This means a lot of code needs to be added to DOSBox Pure for just this one feature.

Hmm, if it would be easier and mean less code, then maybe DBP could support only a few of these algorithms, because some of them would be useless in this case, and others like LZMA are deprecated. Or maybe just support LZMA2 alone, which almost all .7z files use anyway since it's the default 7-Zip setting. I've tested most of these methods before on around 50 random DOS games and found that LZMA2 definitely produces the smallest file sizes overall. Although if this were done, I'm sure you'd eventually get posts on here saying "this 7z file is not working" because people don't read the instructions lol, but I'd say it's still worth doing. I think some other emulators/cores also do this but I can't remember which ones.
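
Roughly what I have in mind, as a hypothetical sketch: nothing here is DBP code, the names `SUPPORTED_METHODS`, `UnsupportedMethodError` and `check_archive_methods` are made up, and the list of methods per archive is assumed to come from whatever parses the .7z header.

```python
# Hypothetical whitelist: only the default 7-Zip method is accepted.
SUPPORTED_METHODS = {"LZMA2"}


class UnsupportedMethodError(Exception):
    """Raised when an archive uses a method outside the whitelist."""


def check_archive_methods(archive_name: str, methods: list[str]) -> None:
    """Reject the archive early with a message that says how to fix it."""
    unsupported = sorted(set(methods) - SUPPORTED_METHODS)
    if unsupported:
        raise UnsupportedMethodError(
            f"{archive_name}: unsupported 7z method(s) {', '.join(unsupported)}; "
            f"please re-create the archive using only "
            f"{', '.join(sorted(SUPPORTED_METHODS))}"
        )


check_archive_methods("DOOM.7z", ["LZMA2"])  # accepted, no error

try:
    check_archive_methods("KEEN.7z", ["BCJ", "LZMA2"])
except UnsupportedMethodError as err:
    message = str(err)
    print(message)
```

A clear message like that would at least cut down on the "this 7z file is not working" posts.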