Filtering sub-dirs when unzipping runs in exponential time.

twolfson / grunt-zip

Zip and unzip files via a grunt plugin

MIT License

87 stars 19 forks source link

Filtering sub-dirs when unzipping runs in exponential time. #27

Closed michaelsantiago closed 10 years ago

michaelsantiago commented 10 years ago

In release 0.15.0, the process for identifying leaves and filtering sub-dirs when unzipping ran in exponential time. This made unzipping file with content in the thousands take an inordinate amount of time.

This commit adds an extra parameter called "flat", which can be specified in the config. If true, then the exponential block is not executed. If false or unspecified, the filtering block will run. I'm not sure if the filtering block is completely necessary.

twolfson commented 10 years ago

Unfortunately, we need the filtering since on Windows there are issues if we try to write a leaf directory as a file. Instead of covering up the problem with a new option, are there any data structures or ways to improve the algorithm to decrease the runtime?

Maybe we can convert this to 2 linear loops, one that builds out a directory tree, and the second that does filtering.

michaelsantiago commented 10 years ago

Understood, and I would be interested in investigating this further. My next questions would be:

What sort of ill effects appear when trying to write a leaf directory as a file?
What kind of scenario would cause this? I tested unzipping files with various sub-dirs and that seemed to work fine. Does depth matter?

twolfson commented 10 years ago

Alright, so I have done my research and found the original cause of the issue. It first appeared in grunt-zip@0.2.0

https://github.com/twolfson/grunt-zip/commit/d4524a9ecab8e04fa8b06bb2266a3f60fc082539

The issue was that we were creating files over nested directories. This was a naive patch that would wind up only creating files and leveraging mkdirp (grunt.file.mkdir in grunt) to create the higher level directories for those files.

Since then, the file/directory generator has become a lot more robust as it now will create empty directories and what not.

https://github.com/twolfson/grunt-zip/blob/0.15.0/tasks/zip.js#L164-L193

It seems like your initial statement that this block is irrelevant was correct. Removing the code in the latest iteration still gets tests to pass where in 0.2.0 (or more specifically e2437edf6d37109839df98d1b3764f2fa13a6fa5, when I could test with grunt@0.4), they would fail.

michaelsantiago commented 10 years ago

Might this be release worthy? Is there further validation necessary to make the removal of that block safe?

twolfson commented 10 years ago

Yep, I prefer to release frequently; easier to catch bugs due to more granularity.

I think the test is validation enough. If there any reports of issues then we will revert or fix the bug more appropriately.

michaelsantiago commented 10 years ago

Want me to submit another pull request with that block of code removed?

twolfson commented 10 years ago

Updating this PR works as well

michaelsantiago commented 10 years ago

Updated with the code block in question removed.

twolfson commented 10 years ago

This has been merged and released in 0.16.0