tsolomko / SWCompression

A Swift framework for working with compression, archives and containers.
MIT License

LZ4 decompress speed #46

Closed by sarensw 1 year ago

sarensw commented 1 year ago

Hi, thanks for providing the lib and especially for the support of LZ4. I'm trying to use this to decompress lz4 files using the following code:

do {
    if let data = try? Data(contentsOf: sourceFilePath, options: .mappedIfSafe) {
        print("data loaded")
        let decompressedData = try LZ4.multiDecompress(data: data)
        print("decompressed")

        let combinedData = decompressedData.reduce(Data()) { (result, data) in
            var mutableResult = result
            mutableResult.append(data)
            return mutableResult
        }

        FileManager.default.createFile(atPath: extractedFilePathName.path, contents: combinedData)
        print("file written... in theory")
    } else {
        print("could not load")
    }
} catch {
    print("ran in error")
    print(error)
}

It works. But compared to running lz4 via command line, it is really slow. My input has a size of around 30 MB, extracted 126 MB. Using the terminal it takes a second. Using the code above it takes about a minute. After "data loaded" and before "decompressed", the CPU goes up to 100% and memory goes up by roughly 125 MB, which is expected.

Is it expected to take so long to decompress?

I have also tried this code on your test archive SWCompressionSourceCode.tar.lz4 which I found in the test files repo. There the code just returns "corrupted". Am I loading the file wrong?

tsolomko commented 1 year ago

Hi @sarensw,

First, I would like to ask whether there is any specific reason why you are using LZ4.multiDecompress instead of LZ4.decompress. The former is only useful if several "LZ4 frames" are concatenated together. This seems to be quite rare in practice, so unless you know that this is likely to be the case, you should use LZ4.decompress, which in theory should be slightly faster when there is a single "LZ4 frame". In addition, you can then remove the code that combines the data.
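A minimal sketch of the single-frame approach suggested above, not the library's own sample code. The helper names `decompressFile` and `joinFrames` are hypothetical, and the decompressor is passed in as a closure so that the sketch compiles with Foundation alone; it assumes `LZ4.decompress(data:)` has the shape `(Data) throws -> Data`, as its use in this thread suggests.

```swift
import Foundation

/// Hypothetical helper: reads `source`, decompresses it with the supplied
/// closure, and writes the result to `destination`.
/// In an app that imports SWCompression, the closure would be `LZ4.decompress(data:)`.
func decompressFile(
    at source: URL,
    to destination: URL,
    using decompress: (Data) throws -> Data
) throws {
    // Let read errors propagate instead of hiding them behind `try?`.
    let compressed = try Data(contentsOf: source, options: .mappedIfSafe)
    let decompressed = try decompress(compressed)
    try decompressed.write(to: destination, options: .atomic)
}

/// Hypothetical helper, only needed in the multi-frame case: joins the array
/// returned by `LZ4.multiDecompress(data:)`. `reduce(into:)` mutates one
/// accumulator in place rather than building a new `Data` value per step.
func joinFrames(_ frames: [Data]) -> Data {
    frames.reduce(into: Data()) { result, frame in result.append(frame) }
}
```

With SWCompression imported, a call could look like `try decompressFile(at: sourceFilePath, to: extractedFilePathName, using: LZ4.decompress(data:))`, reusing the two URLs from the original snippet.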

With regard to performance, let me first ask: have you compiled SWCompression in Release mode? If not, I suggest you try it; the difference is usually quite substantial.
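To compare Debug and Release builds with numbers rather than impressions, a small timing helper can be wrapped around the decompression call. This is only a sketch: `timed` and `buildConfigurationName` are hypothetical names, and the `DEBUG` check assumes the compilation condition that Xcode's default templates and SwiftPM debug builds define. It reports how the calling target was built, which is not necessarily how SWCompression itself was compiled.

```swift
import Foundation

/// Hypothetical helper: runs `work`, prints the elapsed wall-clock time,
/// and returns whatever `work` returned.
func timed<T>(_ label: String, _ work: () throws -> T) rethrows -> T {
    let start = ProcessInfo.processInfo.systemUptime
    defer {
        let elapsed = ProcessInfo.processInfo.systemUptime - start
        print("\(label): \(String(format: "%.3f", elapsed)) s")
    }
    return try work()
}

/// Hypothetical helper: reports the configuration of the target that contains
/// this code, assuming `DEBUG` is defined for Debug builds.
func buildConfigurationName() -> String {
    #if DEBUG
    return "Debug"
    #else
    return "Release"
    #endif
}
```

In the original snippet this could be used as `let decompressed = try timed("LZ4 (\(buildConfigurationName()))") { try LZ4.decompress(data: data) }`.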

I have also tried this code on your test archive SWCompressionSourceCode.tar.lz4 which I found in the test files repo. There the code just returns "corrupted". Am I loading the file wrong?

I am not sure why this is happening. Before writing this sentence I've double-checked and tested both LZ4.multiDecompress and LZ4.decompress with that file, and they work fine for me. I suspect that you may have got a "wrong" file. The test files in that repository are stored using Git LFS, so depending on how you have downloaded it, you may have got a git lfs reference file instead of the actual file.
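One way to tell the two cases apart before decompressing is to look at the first bytes of the file. The sketch below is not part of SWCompression; `FileKind` and `sniffFileKind` are hypothetical names. It relies on two facts: a Git LFS pointer is a small text file beginning with `version https://git-lfs.github.com/spec/v1`, and a standard LZ4 frame begins with the magic number 0x184D2204 stored little-endian. Legacy and skippable LZ4 frames use other magic numbers and are reported as unknown here.

```swift
import Foundation

/// Hypothetical classification of a file based on its leading bytes.
enum FileKind {
    case lz4Frame
    case gitLFSPointer
    case unknown
}

/// Hypothetical helper: inspects only the start of `data`; it does not validate the rest.
func sniffFileKind(_ data: Data) -> FileKind {
    // First line of every Git LFS pointer file.
    let lfsPrefix = Data("version https://git-lfs.github.com/spec/v1".utf8)
    // LZ4 frame magic number 0x184D2204 in little-endian byte order.
    let lz4Magic = Data([0x04, 0x22, 0x4D, 0x18])

    if data.starts(with: lfsPrefix) { return .gitLFSPointer }
    if data.starts(with: lz4Magic) { return .lz4Frame }
    return .unknown
}
```

If this returns `.gitLFSPointer` for a test file, the download contains the reference rather than the archive, and a "corrupted" error from the decompressor is what one would expect.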

sarensw commented 1 year ago

Hi @tsolomko , thanks for the quick answer.

The only reason I used LZ4.multiDecompress is that with LZ4.decompress all files fail with corrupt. Yours, and mine. I have just downloaded the test repo as a zip to have all the files available.

I'm honestly not sure whether I use Release mode or not. I just used the GitHub link in Xcode to install the package. I'll check later.

tsolomko commented 1 year ago

The only reason I used LZ4.multiDecompress is that with LZ4.decompress all files fail with corrupt. Yours, and mine. I have just downloaded the test repo as a zip to have all the files available.

When the test repo is downloaded as a zip, it contains git-lfs reference files instead of the actual files, so that's why they are "corrupted". As for why your files fail, if you can share an example with me, I can have a look.

sarensw commented 1 year ago

You are right of course. I have downloaded the correct file now. It works as expected. Also: Not sure whether I was too tired to see it. But I just switched to decompress (from multiDecompress) and it works for both files. Yours and mine. ✅

The only question that remains is the speed. You were right regarding the Release mode. It only takes seconds (if not less) to decompress. In Debug mode, decompressing an LZ4 file of 31 MB (decompressed: ~126 MB tar) takes ~150 seconds. As long as the Release mode works I'm fine. ✅

I'll close the issue now. Thanks for your support.

sarensw commented 11 months ago

@tsolomko do you know any way to work with the release mode of the package while debugging an app? I have to use the Release mode of my app to try out things as it is really slow in Debug mode. Google didn't really help. Maybe you have a tip?

tsolomko commented 11 months ago

@sarensw

This depends on how the package is installed. If you use Carthage, for instance, you can supply the --configuration Release option to the carthage update/bootstrap invocation. For CocoaPods, I think you can change the Xcode build settings of the SWCompression target in the generated Xcode workspace.

sarensw commented 11 months ago

@tsolomko I'm using Swift Package Manager 😅

tsolomko commented 11 months ago

@tsolomko I'm using Swift Package Manager 😅

I was hoping that you wouldn't say this 😄. I don't think there is any option that you can set in the package manifest to enable this, but if you're using SPM via its integration into Xcode maybe there are some target-specific build settings that you can change? (I have zero experience with SPM integration into Xcode, so I am just guessing.)

sarensw commented 11 months ago

😄 I thought so. I just started out with Swift last year and I have used SPM since then (Xcode integration). I have googled already but couldn't find this. Thank you so much for your help anyway. Maybe I will try out Carthage then :).

sarensw commented 11 months ago

I'm using Carthage now and it works like a charm. Thanks again for your time and suggestion.