Closed ANogin closed 1 month ago
Would it make sense to add a benchmark test with this to record the speedup? Perhaps a recursive unpack of a filesystem pulled from http://tinycorelinux.net/15.x/x86/release/ or somewhere else?
We would probably want to find a pytest fixture that can record several runs of one test case, and/or perhaps fail if the execution time exceeds a certain threshold. Alternatively, we just add a fixture that records test duration
Please update the changelog
Done.
Since we're adding fallbacks like this, it probably also makes sense to fall back on gzip
if pigz
is not installed. We can be sure that it will be present in Docker images, but it likely will not for people using pip install
OFRAK.
@rbs-jacob, interesting idea re:gzip
/pigz
. Implementing that cleanly with our ComponentExternalTool
setup might take some additional thought. I'm in favor of merging this in for now to get the performance improvement -- if you see value in that as a further change please add an issue.
One sentence summary of this PR (This should go in the CHANGELOG!) Use GzipFile python unpacker for speed, fall back on pigz if needed
Link to Related Issue(s) N/A
Please describe the changes in your request. Back before OFRAK became public we switched to using external gzip, and then later in #79 we switched to pigz. The reason the initial switch to gzip was made is because unlike python gzip, gzip and pigz binaries are willing to accept gzipped data with extra junk at the end as a "warning" (exit code 2, as opposed to exit code 1 for errors). Unfortunately, spawning external processes has nontrivial overhead, and asyncio is not enough to mitigate it. This commit tries a compromise - try
gzip.GzipFile
in python first, then fall back to pigz if BadGzipFile is raise.Anyone you think should look at this, specifically? @whyitfor