npm / fstream

Advanced FS Streaming for Node
ISC License
208 stars 43 forks source link

dir-reader: account for entries being changed after _read #50

Closed evanlucas closed 8 years ago

evanlucas commented 8 years ago

If this.entries is changed after _read() has been called, we will be out of sync and try to access an invalid index of this.entries. When the entry cannot be found, we emit end and close, which can drop files from reading.

Apologies for no test...I am having trouble reproducing with actually using npm at this point. Thanks!

Should fix https://github.com/npm/npm/issues/5082

Thanks!

othiym23 commented 8 years ago

This is great work, and something like it will be incorporated into fstream with credit to you, but I can't land it until there are meaningful tests to prevent regressions, and without doing some more digging to ensure that the underlying race condition is well and truly addressed. As @addaleax pointed out on #node-dev, this fixes the test cases in https://github.com/othiym23/eliminate-5082, which is a very good sign. However, as this code base is very complicated, I want to be careful.

That said, I believe it's on the CLI team to come up with the tests for this, because the current test suite pretty much needs to be rewritten out of its current whimsical style and into something that gives us a better sense of how much of the code is covered by the tests, and this is part of that. I'm going to start working on a new test suite for it immediately, because I'd like to get this fix landed and released in next week's npm. The current state of things is not great both for npm and for Node 6 users.

Thanks so much for your time and the patch, and to @addaleax for doing the bisection to find where the change to Node landed!

evanlucas commented 8 years ago

Cool, happy to help. I'll keep trying to come up with a standalone, reproducible test case for it for the time being

addaleax commented 8 years ago

Yeah, either of you, let me know if I can do something to help – I have access to a machine where I can reproduce the bug (almost?) 100 % of the time with Node v6 (at least for the eliminate-5082 repo).

stevenbenner commented 8 years ago

I tested this patch and it does fix the npm/npm#5082 issue for me.

But I did find a side effect. The .npmignore filter no longer prevents my .editorconfig and .npmignore files from being included in the package. It's a 100% repro for my project in my Arch Linux environment using the same repro steps I listed in npm/npm#12542.

Environment
> uname -rs
Linux 4.5.2-1-ARCH
> node --version
v6.1.0
> npm --version
3.8.9
Reproduction steps
  1. git clone https://github.com/stevenbenner/jquery-powertip.git
  2. cd jquery-powertip
  3. npm pack
  4. tar -tvf jquery-powertip-1.2.0.tgz (see that .npmignore is in the tgz file)
  5. npm install
  6. npm pack
  7. tar -tvf jquery-powertip-1.2.0.tgz (see that .npmignore and .editorconfig are in the tgz file)
Console
> npm pack                               
jquery-powertip-1.2.0.tgz

> tar -tvf jquery-powertip-1.2.0.tgz     
-rw-r--r-- 1000/100         38 2016-04-24 10:40 package/.npmignore
-rw-r--r-- 1000/100       6109 2016-04-24 10:40 package/CHANGELOG.yml
-rw-r--r-- 1000/100       1084 2016-04-24 10:40 package/LICENSE.txt
-rw-r--r-- 1000/100       4132 2016-05-01 14:04 package/README.md
-rw-r--r-- 1000/100       1324 2016-05-08 13:57 package/package.json

> npm install
[... install spam ...]

> npm pack                          
jquery-powertip-1.2.0.tgz

> tar -tvf jquery-powertip-1.2.0.tgz
-rw-r--r-- 1000/100        175 2015-09-19 13:12 package/.editorconfig
-rw-r--r-- 1000/100         38 2016-04-24 10:40 package/.npmignore
-rw-r--r-- 1000/100       6109 2016-04-24 10:40 package/CHANGELOG.yml
-rw-r--r-- 1000/100       1084 2016-04-24 10:40 package/LICENSE.txt
-rw-r--r-- 1000/100       4132 2016-05-01 14:04 package/README.md
-rw-r--r-- 1000/100       1324 2016-05-08 13:57 package/package.json
Notes

That said, having extra files in the package is far better than randomly excluding files silently. I still vote that you land this PR asap.

othiym23 commented 8 years ago

Reworded commit message and landed as a55ae72b23e6cdd6c540dac896a361060c56dd17 / published as fstream@1.0.9. I'm still working on writing tests that would exercise this fix, but decided to ship to avoid further messes in the ecosystem. Thanks so much to @evanlucas for the fix, and to all for helping out!

othiym23 commented 8 years ago

It turns out this is causing some other problems within npm and / or node-tar, so either this masks another race condition or an fstream-ignore issue exists independently of this race. More tests to write!

evanlucas commented 8 years ago

So, i've spent some time looking more into this. I'm not sure how we can actually make it 100% unless we double pass all of the entries. fstream-ignore modifies what entries are emitted up, but there is no guarantee that we have not already emitted a file when we read an ignore file. So, in order to make it 100% we would either have to filter all of the entries that need filtering prior to emitting any, or have a way to back out an entry in node-tar (which could already be an option, I just haven't seen it yet)

othiym23 commented 8 years ago

Thanks for looking into this @evanlucas. I suspected that something like this may have been the case. I'll look into finding a way to make this work, but I would vastly prefer to have some kind of strategy that decides synchronously, and once, whether a given path should be included or not. Right now this is all feeling pretty rickety.