nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org

buffer: discuss future direction of Buffer constructor API #9531

Closed addaleax closed 7 years ago

addaleax commented 7 years ago

In today’s CTC meeting we discussed reverting the DeprecationWarning for calling Buffer without new that was introduced in v7 (PR up here), and it became clear that we need to come up with a long-term plan on what exactly we want to achieve, how to do that and improve our messaging about it, both before and after our actions. I’ll try to sum up what exactly we are talking about; obviously, I am somewhat biased, having been involved in plenty of the previous discussion here. (This has still gotten pretty long btw, so I hope a lot of people will find the information in here useful enough to warrant a Wall of Text.)

The Buffer constructor has the usability flaw that it accepts input with different type signatures, so new Buffer('abcdef') and new Buffer(100) will both return valid buffers, and in the latter case, the Buffer will contain 100 bytes of uninitialized memory. This is a security problem for two reasons:

  1. When passing unvalidated user input (e.g. from a JSON request) to the Buffer constructor where a string is expected but a number is actually passed, uninitialized memory will be returned:
// This is a dangerous example of converting a string to Base64!
new Buffer(someStringReceivedFromTheUser).toString('base64')

Passing the value 100 here will return a slice of memory that may contain garbage, but generally can contain any value previously stored in memory, including credentials, source code, and much more. @ChALkeR has a pretty good write-up of this: https://github.com/ChALkeR/notes/blob/master/Buffer-knows-everything.md

  2. Accidentally accepting large numeric values can very quickly increase resource usage, and can be turned into a Denial-of-Service attack against vulnerable applications.
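Both problems above come from a number reaching a call site that expected a string. A minimal sketch of a safer version of the Base64 example follows; the helper name `toBase64` is hypothetical and not part of Node. It checks the type up front and uses Buffer.from(), which does not allocate by size:

```javascript
'use strict';

// Hypothetical helper (not a Node API): only ever turn a *string* into Base64.
function toBase64(input) {
  // Reject anything that is not a string before it gets near a Buffer call,
  // so a JSON payload such as {"value": 100} cannot select the "allocate N bytes" path.
  if (typeof input !== 'string') {
    throw new TypeError('expected a string, got ' + typeof input);
  }
  // Buffer.from(string, encoding) copies the string's bytes; it never
  // interprets its first argument as a size.
  return Buffer.from(input, 'utf8').toString('base64');
}
```

With this shape, a numeric input fails loudly at the boundary instead of silently returning memory contents or allocating a large block.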

Again, @ChALkeR has a very good write-up on these security issues at https://github.com/ChALkeR/notes/blob/master/Lets-fix-Buffer-API.md. It predates the current Buffer.alloc()/Buffer.from() situation, but it contains a helpful FAQ with answers to questions like “Why not just make Buffer(number) zero-fill everything by default?”.

So far, in Node v6.0.0 the safer Buffer.alloc()/Buffer.from() API was introduced and later backported to the v5.10.0 and v4.5.0 releases. Additionally, v6.0.0 came with a documentation-only deprecation of the old Buffer() API.
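For readers who have not used the newer API, here is a short sketch of how the explicit functions divide the work that the single constructor used to do. The variable names are illustrative only:

```javascript
'use strict';

// Allocate by size, zero-filled: the safe replacement for new Buffer(number).
const zeroed = Buffer.alloc(8);

// Allocate by size WITHOUT initializing the memory. The contents are arbitrary
// until overwritten, so this sketch fills the buffer before using it.
const scratch = Buffer.allocUnsafe(8);
scratch.fill(0xff);

// Convert existing data: the replacement for new Buffer(string) and friends.
const fromText = Buffer.from('abcdef', 'utf8');
const fromBytes = Buffer.from([0x61, 0x62, 0x63]);

console.log(zeroed, scratch, fromText, fromBytes.toString());
```

Because the intent (size versus data) is chosen by the function name rather than by the argument type, a number that arrives where a string was expected cannot silently switch behaviors.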

In June, https://github.com/nodejs/node/pull/7152 was opened, which seeks to deprecate the old Buffer() API using a runtime deprecation, i.e. printing a single warning per Node process when Buffer() or new Buffer() is executed for the first time. Currently, that PR is still open. A reduced version of it, https://github.com/nodejs/node/pull/8169, was landed as a semver-major change in v7.0.0, that emits and displays DeprecationWarnings for uses of Buffer() only, but excludes uses of new Buffer().
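The "single warning per process" behavior described above can be sketched roughly as follows. This is an illustration only, not Node's internal implementation; `createOnceWarner` and its `emit` parameter are hypothetical names, while process.emitWarning() is the real Node API used as the default sink:

```javascript
'use strict';

// Hypothetical helper, not Node's internal code: returns a function that
// reports a deprecation the first time it is called and stays silent afterwards.
function createOnceWarner(message, emit) {
  // Default sink: Node's process.emitWarning(), tagged as a DeprecationWarning.
  emit = emit || function (msg) {
    process.emitWarning(msg, 'DeprecationWarning');
  };
  let warned = false;
  return function warnOnce() {
    if (warned) return false; // already reported in this process
    warned = true;
    emit(message);
    return true;
  };
}
```

A deprecated entry point would call the returned function on every invocation; only the first call produces output, which is why users see one line per process rather than one line per call.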

I had summarized the goals and possible actions before that decision was made in https://github.com/nodejs/node/pull/7152#issuecomment-241355246 ¹; and @jasnell has written a then-current long-term plan in https://github.com/nodejs/node/pull/7152#issuecomment-240753218 that would include runtime deprecations of new Buffer() in v8.0.0 and later actual breaking changes to the Buffer constructor.

The reason for this distinction was trying to keep the possibility of making Buffer a proper ES6 class at some point in the far, far future open, which would mean that calling new Buffer() may always work. (Effects of turning Buffer into a class would be proper subclassability and breaking Buffer() without new. It is, however, completely possible to add a separate class to the API that would behave like the current Buffer implementation does, only with these differences.)
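To make the class argument concrete, here is a small sketch of why an ES6 class would break calls without new while a function constructor does not. `LegacyThing`, `ClassThing` and `callWithoutNew` are made-up names for illustration and do not mirror the real Buffer implementation:

```javascript
'use strict';

// Function-style constructor: can forward a call without `new` to itself.
function LegacyThing(size) {
  if (!(this instanceof LegacyThing)) return new LegacyThing(size);
  this.size = size;
}

// ES6 class: the language itself throws a TypeError when called without `new`.
class ClassThing {
  constructor(size) {
    this.size = size;
  }
}

// Helper that reports what happens when a constructor is called as a plain function.
function callWithoutNew(Ctor, arg) {
  try {
    return Ctor(arg);
  } catch (err) {
    return err;
  }
}

// In exchange, classes get straightforward subclassing.
class SubThing extends ClassThing {}
```

The trade-off in the paragraph above is exactly this: the class form gains `extends`, and loses the ability to be invoked as `Thing(...)`.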

As a result of that deprecation for Buffer() without new in v7.0.0, significant pushback from well-known members of the community ensued, both in the threads on https://github.com/nodejs/node/pull/7152 and https://github.com/nodejs/node/pull/8169. On the one hand, it became clear that we failed in our messaging to make clear that the primary motivation for that change was helping our users avoid serious security issues; on the other hand, the added deprecation warning seemed to be incongruent with the expectations of stability and backwards compatibility that module authors and consumers have, as far as Node core is concerned.

As a result of this, the CTC decided to consider reverting the deprecation warning, possibly temporarily, and the corresponding PR is in https://github.com/nodejs/node/pull/9529. The decision on that has yet to be made, but the desire has been expressed to reach a decision soon to limit the number of v7.x versions with possibly incongruent behaviour.

From following the discussions, it is obvious that the path forward is a contentious issue; right now, the opinions range from never introducing a runtime deprecation for any version of the Buffer constructor, to applying one for all uses of it at the next semver-major release in v8.0.0.

The strongest and most frequently expressed argument for fully runtime-deprecating the Buffer constructor soon remains that users may not be aware that parts of their application use an unsafe API and should be warned about that.

On the other side, the warning itself is perceived as a very disruptive change to the ecosystem, suggesting that it is definitely worth exploring alternative ways to reduce the usage of both Buffer() and new Buffer().

/cc @nodejs/collaborators


¹ It may or may not be obvious from the way I articulate my thoughts here – I try to stick to stating facts – but in hindsight, I regret writing it this way.

addaleax commented 7 years ago

More opinionated stuff from myself: I do not think we should ever turn Buffer into an ES6 class; we can create a separate API for that and make Buffer a wrapper around that, as it has been discussed before, and nothing about Buffer() ever needs to be truly broken. I know my comment linked above mentions a possible breakage as a motivation for deprecating Buffer() earlier than new Buffer(), but I definitely don’t agree with my past self on this anymore. I do however see that there is significant security risk involved with their usage, and that that may warrant a full runtime deprecation at some point in the future.

Correspondingly, I would ask of others (although I am aware that I obviously can’t speak for everyone) to try and avoid giving “Subclassability” and “ES6 class” as reasons for the deprecations here; I think it is obvious that our messaging broke here, as reflected by the comments on https://github.com/nodejs/node/pull/7152 and https://github.com/nodejs/node/pull/8169. IMHO, this is primarily about security, and all other benefits of future changes pale in front of that.

mafintosh commented 7 years ago

@addaleax Thank you for the detailed write up. Much appreciated.

I'll suggest a way forward that has been mentioned by other people already.

Instead of deprecating anything start zero-filling buffers returned by the Buffer constructor, similar to Buffer.alloc. Back-port this change to old versions of node (I'd like to see this added all the way back to 0.10 but I know that isn't officially supported anymore).

A benefit of this approach is that old code will work just like before - no changes needed.

It also introduces an incentive for module authors to upgrade to the new API as the zero filling has a perceived performance penalty.

jasnell commented 7 years ago

@mafintosh ... automatically zero-filling only addresses part of the issue. There are other aspects of the existing Buffer() use that are problematic -- but that can be patched over in much the same way. Unfortunately, doing so makes it less obvious that users on older versions of Node.js are doing the wrong thing -- and transparently fixing the issue in new versions of Node.js could lead to developers being completely unaware that their older version of Node.js may have an issue. That's not to say that zero-filling by default and limiting the maximum size to avoid DoS is not a good approach, it's just that there are still ecosystem and usability issues that go along with it.

not-an-aardvark commented 7 years ago

@mafintosh For more context on what @jasnell said, see https://github.com/nodejs/node/issues/4660#issuecomment-171262864 (from before Buffer.alloc existed). The concern is that this would actually create security issues, because people would stop using new Buffer(num).fill(0) in modules, and then anyone using an older version of Node without the zerofill behavior would be vulnerable.

This might be more doable now that Buffer.alloc exists. If we decide to go that route, I think we should at least keep Buffer() and new Buffer() soft-deprecated to avoid this issue and to encourage everyone to use Buffer.alloc instead.
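As a rough illustration of the portability concern, a module that has to run on releases with and without Buffer.alloc() could allocate through a small wrapper. This is a sketch under assumptions: `allocZeroed` and the `MAX_BYTES` cap are hypothetical, and the cap value is arbitrary:

```javascript
'use strict';

// Hypothetical upper bound; a real module would pick a limit that fits its use case.
var MAX_BYTES = 1024 * 1024;

// Hypothetical helper: always returns a zero-filled buffer, on old and new releases.
function allocZeroed(size) {
  // Reject non-integers (including NaN and Infinity), negatives and oversized
  // requests, which addresses the resource-usage concern as well.
  if (typeof size !== 'number' || size % 1 !== 0 || size < 0 || size > MAX_BYTES) {
    throw new RangeError('size must be an integer between 0 and ' + MAX_BYTES);
  }
  if (typeof Buffer.alloc === 'function') {
    return Buffer.alloc(size); // zero-filled by definition
  }
  // Releases without Buffer.alloc(): zero the memory explicitly rather than
  // relying on whatever the constructor happens to do in that version.
  var buf = new Buffer(size);
  buf.fill(0);
  return buf;
}
```

The point of the explicit fill in the fallback branch is that the code does not depend on any particular release having zero-fill semantics in the constructor.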

mafintosh commented 7 years ago

Just to clarify: I'm all for the soft deprecating of the constructor. I was referring to doing the zero-fill instead of hard-deprecating it.

bnoordhuis commented 7 years ago

I do not think we should ever turn Buffer into an ES6 class; we can create a separate API for that and make Buffer a wrapper around that

I'm going to disagree with that. The additive approach of "just add a new API" leads to a sprawling and uncohesive design. Newcomers to node already complain there is so much to take in, let's not make it worse.

Accidentally accepting large numeric values can very quickly increase resource usage

v6.x and v7.x solve that by throwing an exception when calling Buffer(1234, 'encoding') (but not for Buffer('1234', 'encoding')), we should discuss back-porting that to v4.x.

I'm in favor, it improves security and I can only see it breaking code that is already prone to getting broken by attackers. Better it's us than some black hat, am I right?

addaleax commented 7 years ago

v6.x and v7.x solve that by throwing an exception when calling Buffer(1234, 'encoding') (but not for Buffer('1234', 'encoding'))

They don’t when there’s only a single argument, Buffer(1234) vs Buffer('1234').

sam-github commented 7 years ago

I'm on record for supporting deprecation of Node.js APIs that don't make sense (https://www.youtube.com/watch?v=jJaIwea8r2A, https://gist.github.com/sam-github/4c5c019b92cf95fb6571), and I even support the deprecation of these Buffer APIs (slowly, properly communicated), but I think the pain of these kind of changes is not well appreciated.

For one, console messages have huge negative impacts downstream, because people who can do nothing about them see them, see https://github.com/nodejs/node/pull/9483

But for another, trivial changes become less and less trivial as they work their way up module dependencies.

For example, say a version of glob is deprecated because there is a security problem. A new major comes out. The only change is in an obscure corner case I don't care about, and most don't. But, it's a functional change, it must be a major. That's OK.

To update to the new major for packages that directly depend on glob is trivial, just bump the major in your package, the API didn't actually change, so no code changes. So far, so good. So packages that depend on glob do this, takes a couple minutes, republish, no problem.

But authors do this on only the head of their packages. Going back into history and doing a patch release of every major version that has ever used glob? That's a lot of work, so only the latest. And maybe they bump their package (EDIT: major) version because, hey, who knows, maybe some downstream dep depended on that particular glob corner case.

And the changes slowly work their way up, 3 levels up through other packages that have had major updates, used by tap.... and now the latest tap depends on the latest glob, which is A-OK.

But I have packages with unit tests I haven't touched in years... and now I need to bump to the latest tap to get rid of the messages about glob... and while glob's change was tiny and insignificant, tap has changed a lot, and all my tests are now failing, and I'm spending hours and hours fixing them, even though the original change in glob required no js code changes for the update!

This has been happening to me lately, with the graceful-fs and glob updates. I support those updates, and I'm doing the work, and I don't mind.

But it's worth remembering that both those updates were at their root trivial, graceful-fs and glob had new versions that were drop-in replacements for the previous majors. So, upgrading should have been trivial, and it was for direct dependencies, but as the changes ripple up the dependency trees, the changes become less and less trivial, because they start to get bundled with changes that are not so trivial.

This is the kind of thing that will happen with the Buffer changes. Immediate code that uses Buffer will change trivially, but further up the tree, it's going to hurt a lot more.

bnoordhuis commented 7 years ago

They don’t when there’s only a single argument, Buffer(1234) vs Buffer('1234').

Yes, and it's regrettable, but that can't be changed anytime soon. Back-porting the changes for the two argument case would at least mitigate @ChALkeR's hoek example.

billiegoose commented 7 years ago

I've been thinking a lot about this, and actually written a bit but haven't shared my thoughts yet because they're incomplete. But I wanted to share one of my ideas: perhaps the fundamental problem had nothing to do with Buffers or APIs at all, but how we deprecate things.

There are two values that we all share but right now they are in conflict: Security and Stability. Core wants to motivate module authors to fix two highly dangerous flaws that might exist in their code: leaking secrets via uninitialized buffers, and DoS attacks via buffers constructed with integers that are not bounds-checked. Having created the flawed API, it seems reasonable that they assumed responsibility for trying to fix the mistake. But we can't always solve the problems we create on our own, and this is a case where core had dug itself into a hole, and would need module authors to help lift them out. The Buffer API was a one-way (lossy) function: you could easily convert Buffer.from, Buffer.alloc, and Buffer.allocUnsafe calls to the unsafe "Buffer()" form, but given a Buffer() call it is impossible to automatically choose a safe behavior based on the arguments, because knowing whether it was being used safely requires analyzing the source code of the calling function, which the Buffer() function doesn't have access to.

When your only tool is a hammer, every problem looks like a nail. So they used the only tools they had: the docs website for node, and a console message in node. That's when we ran into a stability conflict. Because apparently the console output of Node is treated as part of its API by test runners, which saw the deprecation message and freaked out. Which sort of succeeded at the original aim: get module authors' attention so they can fix this flaw. However it flew in the face of another value: Stability. Module authors freaked out, thinking core was randomly changing the Buffer API, and complained loudly. And rightly so, because broken test cases, multiplied by every project that includes that somewhere as a dependency, create chaos. (See leftpadgate)

The literal cause of the problem is not deprecating Buffer - it's how it was communicated, which turned out to break modules, much to core's surprise, I think. I certainly wouldn't have thought one little console log message would break builds. But apparently it does! So what can we do?

Core values stability just like module authors, I'm certain. Therefore it seems clear that deprecating things by printing to the console log is the wrong strategy, because that will always interfere with program output. I think that deprecating strategy... has to go. Should be removed from core's toolbox.

Now here's my actual novelty/contribution. If the goal of deprecation is to get module authors' attention so they fix their code, node core is not the right venue for that. Npm is. I think core should reach out to npm and see if they could help publicize deprecations. Place a warning banner on every module that uses the old unsafe API. This would have zero effect on how the code runs and therefore would not threaten the stability of the module ecosystem. Yes, developers might be pissed to be publicly called out for their module using insecure APIs. So take advantage of the fact that npm has the email address for all the authors and let them know in advance. Maybe just show a small warning that gets bigger over time. The point is, deprecation is fundamentally a social process done via communication, not a technical one that can be done by modifying the node engine.

Sorry if I rambled on a bit.

seishun commented 7 years ago

I certainly wouldn't have thought one little console log message would break builds. But apparently it does!

So far I've only heard of one module that was actually functionally broken by Buffer-without-new deprecation. All other cases of breakage I've encountered were in tests which look at stderr output.

Place a warning banner on every module that uses the old unsafe API.

Static analysis doesn't always work well in JavaScript.

billiegoose commented 7 years ago

All other cases of breakage I've encountered were in tests which look at stderr output.

A broken test is a broken test, is a broken test. It is disruptive, especially if you rely on Continuous Deployment and a breaking test halts deploying to production. But that's a hypothetical on my part. I am a little curious though about the particulars. Can someone (@mafintosh @substack etc) speak to what horrors rained down on them as a consequence of test suites breaking?

sam-github commented 7 years ago

Unexpected console output when using a node CLI program is a functional breakage.

feross commented 7 years ago

@mafintosh and I were the original reporters of this issue. After reading through this discussion (and dealing with many issues opened by users about this issue), I think the best way forward is to zero-fill buffers created with Buffer() and new Buffer().

Proposal: Zero-fill buffers created with Buffer() and new Buffer() in v7 (as well as active LTS versions: v4 and v6)

This solves the most important issue that is at stake here: the possibility of accidental data leakage via an unintentional Buffer(num). The DOS issue of Buffer(really_large_num) remains, but denial-of-service is far, far better than data leakage.

Since Buffer() is already soft-deprecated in the docs, modules written in the future should not be using it. Existing code that uses Buffer() will only benefit from the zero-filling, which may fix existing security issues.

The risk that module authors will start assuming that Buffer(num) will zero-fill, leading to insecure modules on older Node versions is reduced if we ship zero-filling to all active LTS versions.

In conclusion, zero-filling fixes the most important issue – data leakage – without major breaking changes and without hard-deprecating the Buffer API.

This solution lets us support the legacy Buffer() API for the foreseeable future, ensuring existing code continues to work, without it being a security risk for new code that happens to use Buffer().

seishun commented 7 years ago

It seems there is consensus that security issues in old modules are a real problem, but some people think it could be better solved by zero-filling new Buffer(num) instead. While this alternative proposal has some merits, it also has its shortcomings, which in my opinion outweigh the merits. I'll try to summarize them below.

Solution A: one-time runtime warning (aka hard deprecation)

Solution B: zero-fill buffers created with Buffer(num) and new Buffer(num)

mcollina commented 7 years ago

There will not be a solution that pleases everyone. I'm definitely 👎 on zero-filling Buffer, and at the same time I am 👎 on hard deprecating anything. And I am 👎 on delivering insecure software. I am also aware that one of the above will happen. It's a bad situation.

Just to clarify, the major issue I see with the old API is Buffer(string) and new Buffer(string), but not new Buffer('hello world') and new Buffer(42). For old modules, the only problematic case is new Buffer(string), where string is a variable.

Emitting the warning just for Buffer(str), but not Buffer(str, 'utf8') and Buffer(int) might also be a viable option. This should reduce the surface area for maintainers to just the cases that are problematic. It might be a silly idea.

Can we start adding rules in the linting/security software to catch this? Maybe there is already, but it might be an easy way to get started.
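As a very rough sketch of what such a check could look like, the snippet below flags lines that appear to call Buffer(...) or new Buffer(...). It is a naive text scan written for illustration: the function name and regular expression are made up, it is not an existing lint rule, and it shares the weakness of static analysis raised earlier in the thread, since an aliased constructor slips through:

```javascript
'use strict';

// Matches `Buffer(` or `new Buffer(` when not preceded by `.`, an identifier
// character or `$`, so `Buffer.from(` and `SlowBuffer(` are not flagged.
var LEGACY_CALL = /(?:^|[^.\w$])(?:new\s+)?Buffer\s*\(/;

// Hypothetical helper: returns the 1-based line numbers that look like legacy calls.
function findLegacyBufferCalls(source) {
  var hits = [];
  source.split('\n').forEach(function (line, index) {
    if (LEGACY_CALL.test(line)) hits.push(index + 1);
  });
  return hits;
}

var sample = [
  'var a = new Buffer(10);',        // flagged
  "var b = Buffer.from('x');",      // not flagged: new API
  'var c = Buffer(userInput);',     // flagged
  'var B = Buffer; var d = B(10);'  // missed: the constructor is aliased
].join('\n');

console.log(findLegacyBufferCalls(sample));
```

A real rule would work on the parsed syntax tree rather than on raw text, which avoids false positives in comments and strings but still cannot follow every alias.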

ChALkeR commented 7 years ago

Emitting the warning just for Buffer(str), but not Buffer(str, 'utf8') and Buffer(int) might also be a viable option. This should reduce the surface area for maintainers to just the cases that are problematic.

I also thought about that, but no, that won't work. Buffer(42, 'utf8') does the bad thing on 4.x LTS, so Buffer(str, 'utf8') is not universally safe. And adding a throw into v4.x LTS is hard to justify.

new Buffer(42) could probably be fine, though I am not yet sure, need to think about that a bit more. Upd: at least one con: if we keep it, we won't be able to zero-fill new Buffer(42) for the reasons noted below.

ChALkeR commented 7 years ago

@feross Re: zero-fill, the only viable time to zero-fill is in the same release that introduces a runtime deprecation warning for the API. E.g. when/if Buffer(42) is runtime-deprecated with a clear message, we will be sure that no people would add that in new code, thinking that it will zero-fill on all releases that they care about.

The problem with zero-fill is that if we introduce it in vN.0.0 but don't introduce a runtime-deprecation for Buffer(42) in that release, then at least some library authors caring about only vN.0.0 would rely on Buffer(42) being zero-filled, and when someone runs that code on a previous release, things will go even worse than they are now.

Also don't forget that zero-fill doesn't prevent DoS issues.

yoshuawuyts commented 7 years ago

Re: zero-fill, the only viable time to zero-fill is in the same release that introduces a runtime deprecation warning for the API. E.g. when/if Buffer(42) is runtime-deprecated with a clear message, we will be sure that no people would add that in new code, thinking that it will zero-fill on all releases that they care about.

@ChALkeR Making things safe and deprecating an API are two separate things. By zero-filling security becomes guaranteed which is a different concern from migrating people onto a new API. I feel while the two might be related, they are by no means tied to each other.

@ChALkeR if I understand you correctly you try and make the point that if people don't migrate to a new API the npm ecosystem will be worse off than it currently is. Given that in the current situation nobody is using the new API this is not possible. Instead what we can do is make all buffer code instantly safe by zero-filling, and move the ecosystem forward in a giant leap.

Then there's the point of moving people onto a new API. I feel Node.js has a good outreach through release notes, blog posts, talks and other channels. More than enough to make people aware of a changed API. Besides that there's also an incentive for developers to pick up the new API in the form of improved performance. If we do this well all developers that care about performance will have the means and reasons to move to the new API, all without needing to make maintainers sad.

I hope this sounds reasonable; I'm in favor of zero-filling by default as it seems like the least problematic way forward. Fwiw I'd also be keen to see the perf implications of zero-filling, as I don't recall seeing any numbers so far and having them would be great to incentivize people to move to a new API.

mcollina commented 7 years ago

I hope this sounds reasonable; I'm in favor of zero-filling by default as it seems like the least problematic way forward. Fwiw I'd also be keen to see the perf implications of zero-filling, as I don't recall seeing any numbers so far and having them would be great to incentivize people to move to a new API.

console.time('new Buffer(1024)')
for (var i = 0; i < 10000000; i++) {
  new Buffer(1024)
}
console.timeEnd('new Buffer(1024)')

Here it is, roughly 1.7x slower with zero-filling:

$ node --zero-fill-buffers test.js
new Buffer(1024): 4329.844ms
$ node test.js
new Buffer(1024): 2562.717ms

(node v6.9.1)
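A comparable measurement can be sketched with the newer API, without needing the --zero-fill-buffers flag. This is an illustration, not a rigorous benchmark: `timeAllocations` is a hypothetical helper, process.hrtime.bigint() requires a reasonably recent Node release, and small Buffer.allocUnsafe() allocations may be served from a shared pool, so the results are not directly comparable to the figures above:

```javascript
'use strict';

// Hypothetical helper: runs `allocate(1024)` repeatedly and returns elapsed milliseconds.
function timeAllocations(allocate, iterations) {
  const start = process.hrtime.bigint();
  for (let i = 0; i < iterations; i++) {
    allocate(1024);
  }
  // hrtime.bigint() is in nanoseconds; convert the difference to milliseconds.
  return Number(process.hrtime.bigint() - start) / 1e6;
}

const iterations = 100000;
const zeroFilledMs = timeAllocations((n) => Buffer.alloc(n), iterations);
const uninitializedMs = timeAllocations((n) => Buffer.allocUnsafe(n), iterations);

console.log('Buffer.alloc(1024):       %s ms', zeroFilledMs.toFixed(1));
console.log('Buffer.allocUnsafe(1024): %s ms', uninitializedMs.toFixed(1));
```

The absolute numbers depend heavily on hardware, Node version and allocation size, so any ratio obtained this way should be treated as indicative only.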

IMHO making an old API slower is worse than emitting a warning, because you are "forcing" people to move, with the additional problem that the culprit is not easy to track. On the good side, tests will not cause warnings to be emitted. It might be ok as a semver-major change together with a warning, but not in v4, v6 or v7.

I am still 👎 on zero-filling, as I am 👎 on the deprecation warnings. However, if we go for a deprecation warning, zero-filling any new Buffer() and Buffer() call is acceptable.

Maybe we can warn something like:

"new Buffer() and Buffer are now zero filled for security reasons, see LINK". In LINK, we show the new APIs, and how to move to them. If we can write something like standard --fix that does the job automatically, it would be of great help.

The above warning is annoying, but it might cause less worry than a full deprecation. It is not a generic "the api might go away", but it's positive: we have deprecated this api to make you safe, read this link to make this warning disappear.

bnoordhuis commented 7 years ago

Keep in mind that zero-filling is currently done in an unsophisticated way. Preallocating memory coupled with offloading zeroing to a separate thread should remove much of the overhead.

feross commented 7 years ago

@bnoordhuis Keep in mind that zero-filling is currently done in an unsophisticated way.

Good to know! If there's only a negligible performance difference to the zero-filling, that seems the best option to me.

feross commented 7 years ago

@seishun Most of the cons in your list aren't correct.

If this is not backported, it will create new security vulnerabilities, see @ChALkeR's summary

If we add zero-filling to supported release lines, then the impact will be minimal. Users on v4, v6, v7 (and 0.12 if this is resolved in time) will be covered.

As for users on 0.10 or other unsupported versions -- they're already playing with fire by running an unsupported version in production. Even so, they're unlikely to be affected by new code that uses Buffer() unsafely IMO. Buffer() is already deprecated in the docs. New code should not be using it. (Developers who are familiar with the old behavior and haven't read the docs recently will continue to assume that they are responsible for zeroing/overwriting the buffer, so no harm done.)

If this is backported, it will silently introduce non-trivial performance degradation to some users

This is not an issue. See @bnoordhuis's comment: https://github.com/nodejs/node/issues/9531#issuecomment-261773876

Raises the question why we didn't do this in the first place instead of introducing new APIs ("if new Buffer is OK in old code, then it should be OK in new code, so what's the point of Buffer.alloc*?"). Might create an impression of bad long-term planning.

The point of the new APIs (Buffer.alloc(), etc.) is to split apart two very different behaviors into explicit functions. Buffer.alloc() allocates num bytes of memory. Buffer.from() converts an object into a Buffer.

ChALkeR commented 7 years ago

@feross

Users on v4, v6, v7 (and 0.12 if this is resolved in time) will be covered.

No, they won't. You are assuming an immediate update. I expect the number of users who will run into problems from this (i.e. those who don't update immediately to the latest version in the branch, or don't update fast enough before they install a package that relies on zero-filling) to be non-zero and significant. Moreover, I estimate this to be more disruptive to the ecosystem than forcing maintainers of popular packages to do trivial updates.

Also, every time someone proposes zero-filling, they are ignoring the DoS issue.

seishun commented 7 years ago

Developers who are familiar with the old behavior and haven't read the docs recently will continue to assume that they are responsible for zeroing/overwriting the buffer, so no harm done.

Why do you think these developers won't make the same mistakes that triggered the creation of the new Buffer API in the first place?

This is not an issue. See @bnoordhuis's comment: #9531 (comment)

I wouldn't be so hasty with conclusions. Remember that people run Node.js on all kinds of hardware.

The point of the new APIs (Buffer.alloc(), etc.) is to split apart two very different behaviors into explicit functions.

No, splitting them apart was not the end goal. The end goal was to fix security and DoS vulnerabilities, and we agreed that it should be solved by introducing new API rather than changing the old one. Zero-filling now means backing down on that decision.

feross commented 7 years ago

@ChALkeR forcing maintainers of popular packages to do trivial updates

The impact is way more serious than either you or @seishun understand.

You can't just send a few PRs, do a few npm publishes, and be done with it. Lots of packages have dependencies on packages that are not the latest version. So, in order to make the warning go away in a top-level package, you'll need to upgrade past a semver major or two, and potentially rewrite a lot of code. It's not accurate to keep calling this "trivial", IMO.

The day that node.js deprecates Buffer() will be the day that everyone sees a permanent deprecation error when they start node.

Maybe a deprecation message and forcing everyone to do a ton of work is the right way to go in the end, but you shouldn't minimize the HUGE and very real impact this will have on the community.

feross commented 7 years ago

No, splitting them apart was not the end goal. The end goal was to fix security and DoS vulnerabilities

This is not correct at all. There was never any security or denial-of-service vulnerability in Node.js because of Buffer(). The behavior of Buffer() is well-documented, and the implementation has always worked as documented.

I opened the original issue because Buffer() was a potential foot-gun. It does very different things depending on the type of the argument that is passed in. The changes were always about splitting apart the safe and "unsafe" methods so users can be very explicit when they want uninitialized memory.

Zero-filling now means backing down on that decision.

We can make the original, flawed Buffer() API safer by zero-filling, while also recommending the newer, explicit APIs.

@seishun Why do you think these developers won't make the same mistakes that triggered the creation of the new Buffer API in the first place?

They might make that mistake. But in that case, Node users with zero-filled Buffers would be safe, while Node users without it wouldn't be. In such a scenario, we want as many users to have zero-filling as possible.

ChALkeR commented 7 years ago

@feross

The impact is way more serious than either you or @seishun understand.

I do understand the impact. Moreover, I have built a dependency graph of packages from npm which takes versions into an account, especially for tracking update propagations. (An older version is available here, entry points are recent package versions). I have not yet built a model which combines that graph with Buffer() usage and download counts, but I'm planning to do that soon, before the next deprecation lands, collecting and filing security issues still takes a lot of time.

Based on the current data and previous experience of propagating package updates through dependency chains, I expect the total usage (per downloads) to decrease by about 90% in several months. I hope to obtain a better estimate soon.

The day that node.js deprecates Buffer() will be the day that everyone sees a permanent deprecation error when they start node.

  1. That's not an error, it's a one-time (per launch) warning.
  2. That warning could be suppressed using a command-line flag.
  3. In a significant percentage of cases, that warning would hint at a security problem somewhere in the dependencies. Which is now hidden for the same reasons you mentioned.

We can make the original, flawed Buffer() API safer by zero-filling, while also recommending the newer, explicit APIs.

Did you read my arguments against that above? The number of setups whose security would be lowered by such a change is going to be non-zero. If you disagree with that statement, please explain.

Node users with zero-filled Buffers would be safe

No, they won't, because they will still have the DoS vulnerability in their projects.
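As an illustration of why zero-filling alone leaves the resource-usage problem open, here is a rough sketch of the kind of size check a caller would still need. `allocChecked` and `MAX_BYTES` are hypothetical names and values chosen for this example, not part of Node.js.

```javascript
'use strict';

// Hypothetical upper bound; a real application would pick its own limit.
const MAX_BYTES = 1024 * 1024;

// Hypothetical helper: allocate a zero-filled buffer only for sane sizes.
function allocChecked(size) {
  if (!Number.isInteger(size)) {
    throw new TypeError('size must be an integer');
  }
  if (size < 0 || size > MAX_BYTES) {
    throw new RangeError('size must be between 0 and ' + MAX_BYTES);
  }
  return Buffer.alloc(size);
}

console.log(allocChecked(16).length);
```

Zero-filling decides what the bytes contain; a check like this decides how many bytes an untrusted value may request.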

No, splitting them apart was not the end goal. The end goal was to fix security and DoS vulnerabilities

This is not correct at all. There was never any security or denial-of-service vulnerability in Node.js because of Buffer().

To the end user, it does not matter where the vulnerability was, there is no point in blaming anyone or saying «that's not my problem» here. The thing that matters here is the ecosystem security (or end user security, whichever you prefer to call it), and there was definitely a problem there. And we are still having that problem.

feross commented 7 years ago

@ChALkeR version is available here

Your link is dead.

That's not an error, it's a one-time (per launch) warning.

I should have said "warning" instead of "error", but my point still stands. Almost all node users will see warnings spew to the console at start.

That warning could be suppressed using a command-line flag.

Doesn't matter. 95% of users will not suppress the warnings, and maintainers will still get inundated with issues about code that worked fine before.

In a significant percentage of cases, that warning would hint at a security problem somewhere in the dependencies. Which is now hidden for the same reasons you mentioned.

If instead of showing a warning we just zero-fill all buffers, then there is no data leakage security issue anymore.

Did you read my arguments against that above? The number of setups whose security would be lowered by such a change is going to be non-zero. If you disagree with that statement, please explain.

I do disagree with that statement.

Your argument, as I understand it, is that if we ship zero-filled Buffers, then new code might be written by module authors who assume zero-filling happens on all Node versions, making users on older versions less secure.

For that to happen, ALL the following things need to go wrong, which I find unlikely:

1) Developers would need to write new code using Buffer(num), despite the strong recommendation against doing so in the docs.

2) Developers would need to use Buffer(num) in such a way that the buffer is not fully overwritten by new data before being exposed to a client.

3) Developers would need to install this new code and ship it to production. (This is very unlikely because users on v0.10 or v0.12 can't really upgrade their dependencies these days without some ES6 or ES7 sneaking in and breaking everything.)

4) Developers would need to be running an insecure version of Node.js in production. (This is a really bad idea, and any user/company that cares about security will be on a supported version to keep on top of critical OpenSSL fixes, etc.)

5) An attacker would need to identify a vulnerable service and create an exploit.

Security is a continuum. Different users are willing to tolerate different levels of risk. Users with the most stringent security requirements will run a supported version: 0.12, v4 LTS, v6 LTS, or v7. Users on these versions will get zero-filled buffers and be better off than before.

No, they won't, because they will still have the DoS vulnerability in their projects.

Neither solution fixes the "DoS vulnerability". Printing a deprecation warning doesn't prevent DoS.

Also, it's stretching the truth to call this a "DoS vulnerability". This is an API usability issue that made it easy to write code that might be DoS'd one day. Regular expressions are also notorious for the same problem -- are you willing to deprecate those too?

To the end user, it does not matter where the vulnerability was, there is no point in blaming anyone or saying «that's not my problem» here.

It actually does matter where the source of the problem is.

It's extremely easy to write a regex that has extreme DoS potential; for example, (a+)+ or (a|aa)+ can be exploited to take a server offline. Is it Node's responsibility to prevent this? Of course not! You can't deprecate regex, and you can't deprecate 1000s of packages on npm because Buffer(large_num) might slow down some sites.
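For readers unfamiliar with the regex comparison, a small sketch of the backtracking pattern follows. The `^` and `$` anchors and the trailing `!` are additions for this illustration, so that the match has to fail after trying the many ways of splitting the run of "a"s; the input is kept deliberately short.

```javascript
'use strict';

// Nested quantifier from the comment above, anchored for the demonstration.
const risky = /^(a+)+$/;

// Accepts the same strings, but without the nested quantifier.
const linear = /^a+$/;

// In a backtracking engine, each extra "a" roughly doubles the work of the
// failed match against `risky`, so the input stays small here.
const input = 'a'.repeat(20) + '!';

console.log(risky.test(input));  // false
console.log(linear.test(input)); // false
```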

The thing that matters here is the ecosystem security (or end user security, whichever you prefer to call it), and there was definitely a problem there. And we are still having that problem.

Yes, I can agree with that statement :) We're all trying to make it (1) easier for users to do the right thing, while (2) minimizing ecosystem disruption.

We just disagree on how much ecosystem disruption is acceptable. Node can't make huge backwards-incompatible changes anymore. The Web, which is WAY bigger than Node, doesn't even make breaking changes anymore.

seishun commented 7 years ago

95% of users will not suppress the warnings, and maintainers will still get inundated with issues about code that worked fine before.

The important point is that the users have such an option if they don't want to be annoyed. And I'm not sure what you mean by "inundated". As far as I know, the deprecation of Buffer without new didn't cause any project to receive more than one issue about the warning. Most modules I've seen also received PRs with fixes.

Neither solution fixes the "DoS vulnerability". Printing a deprecation warning doesn't prevent DoS.

It does, indirectly. By encouraging people to update their code or their dependencies.

This is an API usability issue that made it easy to write code that might be DoS'd one day.

We could have applied the exact same logic to new Buffer(num), but there's a reason we didn't.

Regular expressions are also notorious for the same problem -- are you willing to deprecate those too?

Only if their misuse was as widespread as that of Buffer, and could be deep inside the dependency tree.

Fishrock123 commented 7 years ago

After thinking about it more I don't think deprecating the entire constructor is viable due to constructor inheritance.

Inheritance is something that requiring new was supposed to open up more of, and since it opens more use cases to our users, I think it is something we are unlikely to back down on.

seishun commented 7 years ago

After thinking about it more I don't think deprecating the entire constructor is viable due to constructor inheritance.

Actually, you can add a check to prevent the deprecation warning if it's a child class. See here.

jasnell commented 7 years ago

fwiw, @trevnorris' suggestion can be simplified just a bit to... if (new.target && new.target === Buffer) { /** **/ }

ChALkeR commented 7 years ago

@feross

Your link is dead.

Fixed, thanks. I forgot http://, that was quite obvious =).

For that to happen, ALL the following things need to go wrong, which I find unlikely:

«Unlikely» does not work that way. Assign probabilities to each one, multiply them, multiply them by the ecosystem size. It's the large ecosystem that makes the net effect non-zero here. If there were only 10 users in the ecosystem, then this would not have been an issue, of course.

Developers would need to write new code using Buffer(num), despite the strong recommendation against doing so in the docs.

They already do. A lot of users ignore deprecation warnings in the docs. p > 0.1.

Developers would need to use Buffer(num) in such a way that the buffer is not fully overwritten by new data before being exposed to a client.

They already do.

Developers would need to install this new code and ship it to production. (This is very unlikely because users on v0.10 or v0.12 can't really upgrade their dependencies these days without some ES6 or ES7 sneaking in and breaking everything.)

Why are you mentioning 0.10 and 0.12 only? Think v6.9.1, v7.1.0, v4.6.2. That's all the current node users, a significant number of whom would install/update new packages without updating Node.js. p > 0.3 (from the affected package users).

Developers would need to be running an insecure version of Node.js in production. (This is a really bad idea, and any user/company that cares about security will be on a supported version to keep on top of critical OpenSSL fixes, etc.)

Insecure? E.g. v7.1.0? The fact that v7.1.0 is going to become an «insecure version» is only because of this change you propose.

An attacker would need to identify a vulnerable service and create an exploit.

When one gets here and this becomes their last hope, things are pretty bad already. If the service is vulnerable, this is a problem by itself, even if it was not actually attacked. One of the problems is that in many cases one can't be sure whether the server was compromised or not. Also, I estimate that hacking something vulnerable (which accepts typed input, e.g. through JSON) using just a params fuzzer is quite probable.

Printing a deprecation warning doesn't prevent DoS.

It does, through highlighting potentially dangerous code and making the packages migrate from that to safer code.

yoshuawuyts commented 7 years ago

Alright, let's get some facts straight:

I feel we can have a productive discussion on what the best solutions are given these facts. But trying to argue against them again and again is quite unproductive. It makes people with possibly valuable perspectives grow tired and leave the thread, and I'm quite sure that turning a thread into an echo chamber is not in the best interest of the project.

That said: I'm leaving this thread now. I'm tired.

seishun commented 7 years ago

I'd say that presenting opinions as facts isn't really productive either. I'm not sure if there's any point in responding, but I'll try to be terse anyway.

in no case is throwing a deprecation warning safer than zero-filling buffers

As mentioned, zero-filling doesn't help with DoS, and there is a non-zero chance that some devs will rely on it, creating security issues for users of older versions of node.

throwing a deprecation warning will be disruptive for the ecosystem and maintainers alike

It's indeed a fact that this creates annoyance for users and work for maintainers, but I'd reserve the term "disruption" for situations where things break and stop working, i.e. a whole different level of bad. (examples: realpath fiasco, removal of leftpad, etc)

Fishrock123 commented 7 years ago

Aside: "throwing" deprecation warnings is a misnomer. If that were true, your program would have unhandled exceptions. It does not unless you use a flag.
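A small sketch of the distinction, using `util.deprecate()` on a hypothetical `legacyAlloc` function (the name and message are made up for this example). By default the warning is printed to stderr and the call still returns normally; Node's `--no-deprecation` flag silences such warnings and `--throw-deprecation` turns them into thrown errors.

```javascript
'use strict';
const util = require('util');

// Hypothetical legacy function, wrapped so that calling it emits a
// DeprecationWarning instead of throwing.
const legacyAlloc = util.deprecate(
  (size) => Buffer.alloc(size),
  'legacyAlloc() is deprecated; use Buffer.alloc() instead.'
);

const first = legacyAlloc(4);  // warning is emitted, the call still succeeds
const second = legacyAlloc(8); // this wrapper does not warn a second time

console.log(first.length, second.length);
```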

billiegoose commented 7 years ago

If you skipped over my first comment, go back and read it.

@feross @mafintosh @substack @sam-github I haven't authored any popular modules, so I haven't been "inundated" with Github issues. You have all written hundreds of packages, and seem to think that the deprecation warning message causes the end of the world. Can you link to some of the flood of issues this deprecation has created, for the benefit of the rest of us (i.e. @seishun @ChALkeR @Fishrock123)? Because I'm genuinely curious. And comments like @Fishrock123's seem to suggest they don't think deprecation warnings are a "real" error.

My 2 cents: DoS simply means in the worst case, your server fails. Maybe you lose money. I'm in this position - apparently five different packages in production introduce a RegExp DoS vulnerability, even though I have the latest versions of all my own dependencies installed. Whatever.

Leaking secrets from the server memory? Worst case, you have a major lawsuit on your hands from all the users whose passwords and sexual predilections were just made public.

The first is merely an issue of availability. The second is an issue of privacy.

But everyone is missing the forest for the trees:

To the end user, it does not matter where the vulnerability was, there is no point in blaming anyone or saying «that's not my problem» here. It actually does matter where the source of the problem is.

^This is the question that the Node TSC needs to discuss, elaborate, and resolve. Everybody wants safe code. But frankly, I think that core is over-reaching in trying to protect users here. The TSC needs to have a contract specifying what to do in cases like this. Frankly, this vulnerability has been around for a long time, so by inaction you've basically chosen "backwards compatibility" over "forward security". But that needs to be discussed, agreed on, and set in stone somewhere. You need to explicitly decide where the buck stops in terms of security. Right now, core is trying to foist that responsibility on module authors ("we'll break your existing code to motivate you to fix it"), and module authors are trying to foist it onto core ("just zero-fill the buffers already"), and neither side has even suggested that security is the user's responsibility. Which is fine until there's a lawsuit, and then I guarantee it's the user who will be held responsible for running outdated, insecure code.

sam-github commented 7 years ago

@wmhilton spelunking through github issues is time consuming, and a lot of our issues come through private support channels, but here are some somewhat related examples:

https://github.com/strongloop/strongloop/issues/296

scroll to the end, you can see the user saying "it worked"! You can also see our yeoman generator output:

$ slc loopback
(node:36574) fs: re-evaluating native module sources is not supported. If you are using the graceful-fs module, please update it to a more recent version.
loopback deprecated loopback.cookieParser is deprecated. Use `require('cookie-parser');` instead. ../../../../usr/local/lib/node_modules/strongloop/node_modules/loopback-workspace/server/server.js:45:17

     _-----_
    |       |    .--------------------------.
    |--(o)--|    |  Let's create a LoopBack |
   `---------´   |       application!       |
    ( _´U`_ )    '--------------------------'
    /___A___\    
     |  ~  |     
   __'.___.'__   
 ´   `  |° ´ Y ` 

? What's the name of your application?

Ouch.... we fixed that. Note the command functions perfectly, it doesn't need those capabilities, but there isn't anything "soft" about those deprecations, they are VERY in-the-end-user's-face, even though they are only actionable by a developer.

Also, https://github.com/strongloop/strongloop/issues/200#issuecomment-76652267, this again isn't super specific, but you can see the user dumped a bunch of scary messages, assumed it was a problem, and we have to walk through and say "ok, but really, didn't it all actually work?", which is quite time consuming.

I tried to find an example of the glob "security deprecation" that came out during install of various tooling (even for CLIs that did not accept glob patterns), but couldn't find one on github.

sam-github commented 7 years ago

^--- and to follow on, we treat all npm scary output as critical bugs (not "soft" messages), because when something does go wrong, customers tend to not give us the full npm-debug.log as npm gently asks them to, but just a screen shot of the first scary output they saw, even though a more experienced node user would recognize the warnings as advisory, and not related to the eventual install failure. The back and forth with them to get the relevant info is painful for everyone, and not worth the support cost.

mafintosh commented 7 years ago

@wmhilton The tone of most replies in this issue is too stubborn / confrontational / aggressive for me to participate. I share @feross's opinion from earlier and stand by my original reply. I think around 1/4 of the modules I have authored use the Buffer constructor in some way, so any deprecation involved here impacts me a lot, which is why I care about this issue. Feel free to ping me on IRC / Twitter if you have questions :)

billiegoose commented 7 years ago

Sorry @mafintosh, I'm going to keep engaging! :) So would I be right in saying the consensus of module developers is that it is NOT node's job to ever print warnings to the console? Stdout and stderr are sacred and should not be used by node core? (I tend to agree with this.)

@sam-github I think you may be conflating node runtime api deprecation warnings with npm installation deprecated package warnings. Although from a user's experience they may be perceptually similar, the team of developers responsible for node and for npm are completely independent AFAIK.

trevnorris commented 7 years ago

It's after 5 am, I've been up all night and currently watching some anime while working through issues. So forgive me if I missed something in this large thread.

I'd like to first solidify two things about any future changes to the Buffer constructor.

  1. The Buffer constructor can't be fully removed. We can force new, print a warning or leave it as-is, but it can't be fully removed.
  2. The minimum parameters Buffer must accept are those that Uint8Array also accepts.

These are more facts about where we can go than requests. So I'd like to establish that the main question here is whether Buffer without new will be deprecated and, if so, how we will go about it.

seishun commented 7 years ago

@trevnorris It seems it's generally agreed that "ES6 classes" is insufficient justification for hard-deprecation. There's some agreement that hard-deprecation for security reasons is justified, but that implies hard-deprecating the Buffer constructor entirely (not removing, mind you), which has a nice side effect of also hard-deprecating Buffer without new (with possibility of later removal).

sam-github commented 7 years ago

@wmhilton You asked "how do deprecations affect node module authors", or did I misunderstand? It was a node semver-major change that broke graceful-fs, and it is node that introduced the warning about how the version of graceful-fs you are using needs to be updated, and did so before making the breaking change to give developers advance notice (but the message got seen by end-users and developers). If/when Buffer constructors are deprecated, it will cause a wave of npm modules deprecating older versions, so while the deprecation messages will come from npm, the root cause would be here, in node.

feross commented 7 years ago

I'm considering making Buffer() usage into an error in the standard linter that I maintain. [issue]

standard's ruleset has 450K downloads/month, so this could be a good, opt-in way to discourage usage of Buffer(). The other popular rulesets maintained by Airbnb, Google, etc. might also be receptive to a rule like this.

But, this investigation has shed more light for me on just how disruptive a change like this deprecation will be: Of the 300 repos in our test suite, around 70 of them fail due to Buffer() usage. That's 25%!

trevnorris commented 7 years ago

Apart from all of this, there's a solution for allowing Buffer to be extended that I believe falls outside the security perimeter. Basically it would come down to:

function Buffer(arg, encodingOrOffset, length) {
  if (typeof new.target === 'function' && new.target !== Buffer) {
    // continue w/ the old API
  }
  // ...
}

I believe allowing the old API to be used, w/o a warning, is specific enough that it wouldn't be a security concern. So the only question in this case is: would it be too strange an API discrepancy to have the existing API deprecated for a plain new Buffer() but not when it's extended?
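A runnable sketch of that `new.target` idea, using a hypothetical stand-in constructor rather than the real `Buffer`. `LegacyBuffer`, `SubBuffer` and the `warned` array are names invented for this example, and recording into an array stands in for printing a deprecation warning.

```javascript
'use strict';

// Records which calls would have triggered a deprecation warning.
const warned = [];

// Hypothetical stand-in for Buffer; not Node's actual implementation.
function LegacyBuffer(size) {
  // Direct use: called without new, or constructed as LegacyBuffer itself.
  const direct = new.target === undefined || new.target === LegacyBuffer;
  if (direct) {
    warned.push(size);
  }
  // Without new there is no `this`, so create the instance by hand.
  const self = new.target === undefined
    ? Object.create(LegacyBuffer.prototype)
    : this;
  self.size = size;
  return self;
}

// A subclass reaches the constructor with new.target === SubBuffer,
// so it continues with the old API and no warning is recorded.
class SubBuffer extends LegacyBuffer {}

const extended = new SubBuffer(1);
const constructed = new LegacyBuffer(2);
const called = LegacyBuffer(3);

console.log(warned);
```

Only the two direct uses end up in `warned`; the subclass instance is created silently.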

billiegoose commented 7 years ago

@sam-github Ahh, that makes sense. I did not know that the root cause of graceful-fs being deprecated was a change in node. It has a ripple effect.

@feross Now that is an interesting idea. I wonder how many builds would break though because they rely on the linter passing? But I think this is the right kind of thinking. Who else could we involve in promoting the new APIs? Obviously npmjs. But maybe even popular bloggers. We could get "I rewrote my modules using the new Buffer API, and so should you" trending on Medium and Twitter. Or the Node Security Project. It's a social engineering problem at its core.

ChALkeR commented 7 years ago

@trevnorris Several questions:

  1. Am I correct that extending is not possible now in any way that would hit that «new» code once we introduce it?
  2. Would it be possible to zero-fill the extended Buffer?
  3. Does the extended Buffer need the Buffer(string) API?

addaleax commented 7 years ago

@ChALkeR I think I can answer these:

  1. Yes.
  2. Yes.
  3. We haven’t really discussed that yet but it would be possible to allow that, and to do so in a safe way.