Closed feross closed 8 years ago
Personally, I use new Buffer(number) a lot in my code. I wouldn't mind replacing those calls with Buffer.alloc(number), but I would mind having the word "unsafe" all over my code that doesn't even send data anywhere. Not every Node.js program is a web server, and the concept of "safety" depends heavily on the use case.
Sure, but that's not what this issue is about.
Exactly, that's why I don't see why the addition of Buffer.safe is being discussed here.
@seishun I am now preparing a lengthy explanation of my proposal that covers many questions, including yours. It will be done in a few hours.
@seishun ... I can definitely see the argument for not using the unsafe term. Perhaps instead of Buffer.unsafe()/Buffer.safe() we can simply use Buffer.alloc()/Buffer.allocSafe() and go from there? Seems to be a more acceptable solution.
@jasnell I'd go with Buffer.alloc()/Buffer.calloc() instead. As I said, the "safety" factor of a zero-filling allocation is use case dependent.
@jasnell Whatever you name them, the safe variant (the one that zero-fills the memory) should be the most obvious one to use. Buffer.alloc()/Buffer.allocSafe() will not do that.
The most straightforward and recommended way of allocating Buffers should do that in a safe way to ensure that users use that by default, and resort to allocating unsafe Buffers only if they are absolutely sure what they are doing and definitely need it.
Don't forget that not everyone will read the docs.
@seishun @jasnell If you want Buffer.alloc() (or Buffer.allocate()) to be present, then it should allocate zero-filled buffers, and the non zero-filled one should be called something like Buffer.allocUnsafe(), Buffer.allocRaw(), or something like that.
Perhaps Buffer.raw() would also be fine, but there were some objections against that, if I remember things correctly. @joepie91, perhaps?
Most people should use the zero-filled variant.
@ChALkeR
the safe variant (the one that zero-fills the memory) should be the most obvious to use one
So far I haven't seen any evidence that people erroneously assume that new Buffer(number) zero-fills the memory. The original message only gives examples of erroneously calling new Buffer(number) instead of new Buffer(string). Zero-filling would only help in one case – if new Buffer(number) returned zero-filled memory, then the issue presented would only result in incorrect behavior instead of a security issue. In other words, if we aren't going to change new Buffer(number) to return zero-filled memory, then zero-filling is completely irrelevant to this issue.
So far I haven't seen any evidence that people erroneously assume that new Buffer(number) zero-fills the memory.
I have seen those people, found issues in code, reported those, and talked with such people over Gitter. I remember someone being surprised. Also, look at Node.js issue reports regarding zero-filling Buffer(). Also, at least one of Node.js active collaborators was surprised by the fact that Buffer(number) is not zero filled, afaik. Also, I guess @evilpacket has seen many such people.
«I have not seen any evidence» is not a good attitude to an issue that targets improving the ecosystem security.
So far I haven't seen any evidence that people erroneously assume that new Buffer(number) zero-fills the memory.
I, for one, was really surprised to find that the buffers are not zero-filled.
@ChALkeR Well, you're the first person to mention that in this issue. Would you care to provide some examples?
In any case, such confusion should be resolved by improving documentation, not by changing method names to something scary. I bet for most users performance is more important than safety for the case when they forget to write to the buffer.
I think using Buffer.alloc() and Buffer.allocUnsafe() would be a good start. The word "unsafe" could be used just like "sync": some methods have the default safety, others don't (Foo.bar() - Foo.barUnsafe()). Go has an unsafe package too: https://golang.org/pkg/unsafe/.
It's not good to see the word "unsafe" in code, but at least it makes developers cautious, which I think is the point. They will immediately know that something has to be taken care of when they use it.
I don't like Buffer.safe() and Buffer.unsafe() because it is not clear what they are doing. On the other hand alloc describes the purpose of the function.
@seishun For example: https://github.com/nodejs/node-v0.x-archive/issues/7230
I don't want to paste Gitter chat logs now, I really want to finish my post about the current proposal today, and it's already 22:35 here. What I already mentioned should be enough.
For example: nodejs/node-v0.x-archive#7230
And it was resolved by improving the documentation. I'm actually surprised that the docs didn't mention it until less than a year ago. That must mean not a lot of people were surprised/affected by this behavior, otherwise someone would have complained about it and got the docs fixed much earlier.
As said earlier, the fact that it is "unsafe" is use case dependent.
The best proposal I've seen so far, from a user POV, is deprecating Buffer(num) altogether and introducing Buffer.alloc(num) that takes on the old behavior of Buffer(num). Introducing a Buffer.calloc is a good idea, too, although I agree that it is separate.
This seems like the easiest change to explain, implement, and maintain (one simple find and replace). Overloading the constructor this way is dangerous, proven by the fact that advanced developers didn't catch it. It's clearly going to continue to be a problem if the behavior is not deprecated.
@karissa
and introducing Buffer.alloc(num) that takes on the old behavior of Buffer(num). Introducing a Buffer.calloc is a good idea, too, although I agree that it is separate.
That won't help. Please see explanations above or wait for my post.
I don't actually understand why that wouldn't help, can you give me more information as to why you think that?
I was under the impression that the problem is from overloading the constructor, passing a number when you expected it to be a string.
I'm a Node user, and a longtime security engineer at Twitter and now EFF. I was very surprised to hear that a common userland API in Node can return uninitialized memory, and it makes me really concerned about the safety of the Node project I maintain. I don't use Buffer directly, but I use a number of packages. The dynamic nature of Node means it's impossible to programmatically check whether any of those packages may be misusing Buffer. As a Node user, I very much want this API to be fixed (to zero memory) and backported.
As far as I can tell, there are two objections to fixing the API and backporting:
For (1), how can it be an API-breaking change? The existing API specifies that the memory is uninitialized, which means it can have any value. Zeros are a valid value.
For (2), I haven't seen any benchmarks showing the presumed slowdown on real world loads. I would venture to say that most applications won't even notice the difference. That said, for applications that get slower when upgrading to a backported version, the path is clear and easy: Update their critical Buffer(num) calls to the new Buffer.allocUnsafe(num).
@mhart linked earlier to https://matt.sh/howto-c#_misc-thoughts, a very good, credible source with the recommendation that C programmers should "always use calloc. There is no performance penalty for getting zero'd memory." I'd like to repeat that recommendation. Best practice for C coders in 2016 is to always zero memory. I'd really like Node in 2016 to be at least as good as C's best practices in terms of memory safety.
@jsha Changing new Buffer(number) in a way so that it will zero-fill the memory will cause more security issues and make this matter even worse. Please read the above discussion or wait for my post.
@ChALkeR Your argument is that if node 1.6 is secure against this and node 1.5 is not, this is worse than both being insecure. I think that's wrong.
@wbl No, that's not it. Please read the discussion above.
@ChALkeR I found your long post, I don't see how (other than my getting the numbers wrong) that isn't your argument. People on 5.x are getting pwnd now because of this, we can stop that without requiring any code changes or breaking existing code.
For the sake of testing I have sent in a WIP PR that replaces all instances of malloc in buffer with calloc. I have a CI run going and a CITGM run going so we can see if it introduces any weirdness into the ecosystem.
@seishun @karissa @wbl Please wait a few hours for a lengthy explanation of my current proposal before asking further questions.
@TheAlphaNerd ... if you would, please run a similar CITGM run on https://github.com/nodejs/node/pull/4682
Adding an API seems like a very poor solution. This would mean that individual modules decide whether they want to be faster or safer and would leave users of modules with a combination of slower and less safe modules when they want to choose a specific characteristic.
The best solution I can see would be a command line flag that turns on these safety features globally. Adding API only makes the situation worse.
As far as changing the default, that's a huge departure from Node's prior behaviour. Considering the range of use cases Node.js has this could be detrimental to many users and may not be the particular profile they prefer.
@mikeal That point of view is also covered in my forthcoming post =).
To give a little more perspective, think of all the small devices that are now running Node.js on incredibly limited resources. Changing this default wouldn't just slow them down, it might make them entirely unusable, and for a concern they may not even share. This project is responsible for maintaining compatibility with a wide range of use cases, and while we should certainly add features that help subsets of those use cases, we have to be cognisant of the fact that some behaviours are going to win as the default simply because that's how they have been for long enough that altering them causes a huge amount of breakage and upgrade pain for too many people. So the burden of turning some of this stuff on will land on users who need the new behaviour rather than the old.
@mikeal The current proposal does not deal with changing the existing behaviour.
@ChALkeR I know, but not everyone in thread shares the perspective driving some of the proposed solutions so I'm trying to build a better understanding of the motivations behind them for everyone involved :)
@mikeal I am currently in the process of writing a lengthy post that will explain everything better, including the proposed approach, motivations, and possible considerations.
I think the main problem here is that the current Buffer constructor overloads behaviour. One overload, Buffer(number), allocates a new buffer. Buffer(string) converts a string into a buffer. In a dynamically typed language you sometimes mess up types, which results in the wrong constructor being called (which can be hard to notice and result in security issues like @feross describes). Adding two new explicit APIs would help solve this issue:
- var buf = Buffer.allocate(number) to allocate a non-zeroed out buffer
- var buf = Buffer.from(val) to turn a string, uint8 array etc into a buffer
User-land libraries could easily poly/pony-fill these APIs for old node. If we add these APIs we could deprecate the old Buffer constructor as well.
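A rough sketch of what such a userland pony-fill could look like follows. The helper names bufferFrom and bufferAllocate are hypothetical, and the feature detection is deliberately simplistic: it assumes Buffer.from and Buffer.allocUnsafe (the names current Node.js uses) are either usable as-is or absent, which a real library would need to check more carefully.

```javascript
// Hypothetical pony-fill helpers; the names are illustrative, not an existing package API.

// Convert a string, array, or typed array into a Buffer. Numbers are rejected,
// so this path can never allocate memory by accident.
function bufferFrom (value, encoding) {
  if (typeof value === 'number') {
    throw new TypeError('bufferFrom() does not accept a number')
  }
  if (typeof Buffer.from === 'function') {
    return Buffer.from(value, encoding)
  }
  return new Buffer(value, encoding) // fallback for old node
}

// Allocate a non-zeroed buffer. Only non-negative integer sizes are accepted.
function bufferAllocate (size) {
  if (typeof size !== 'number' || size < 0 || !isFinite(size) || Math.floor(size) !== size) {
    throw new TypeError('bufferAllocate() requires a non-negative integer size')
  }
  if (typeof Buffer.allocUnsafe === 'function') {
    return Buffer.allocUnsafe(size)
  }
  return new Buffer(size) // fallback for old node
}

console.log(bufferFrom('abc').toString('hex')) // 616263
console.log(bufferAllocate(8).length) // 8
```

Because the two entry points are separate functions, a number that reaches bufferFrom by mistake raises an error instead of silently allocating memory.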
@mafintosh Do I get it right that you mean something close to my proposal, with the only addition being that Buffer(val) will be also deprecated and moved to Buffer.from(val)?
@ChALkeR yea sounds about right. Buffer.from vs Buffer(val) is a bit easier to feature detect for which makes it easier to add old node support in a userland library. Would also allow for static analysis to find uses of the deprecated constructor
@mafintosh That one (Buffer.from) is new and is not covered in my forthcoming post, but I think that it's out of scope of the current issue; it doesn't give any security benefits compared to the current proposal.
Let's deprecate Buffer(number) first; introducing more deprecations in the same issue will delay the discussion and make coming to a consensus less likely.
Also note that Buffer(value) is widely used, maybe even more widely than Buffer(number).
@ChALkeR since both of the actual security issues @feross and I have found in modules resulted from the fact that Buffer(number) was called by accident when someone was trying to call Buffer(string), I think it's related. A deprecation warning wouldn't necessarily help here since it would only be shown when the faulty constructor is being called (during an attack). It's better than doing nothing though.
@ChALkeR I think that @mafintosh's point is relevant.
In the current proposal, developers are still forced to keep using Buffer(variable) to convert from a string, leaving them vulnerable to the original issues that affected bittorrent-dht and ws.
With these:
- Buffer.from(value) - convert from any type to a buffer
- Buffer.alloc(size) - create an uninitialized buffer with given size
the developer intent is clearly conveyed, and potentially unsafe functionality is put into its own API.
I think if the API were being designed today, this is how it would be done. Let's deprecate Buffer(value) entirely.
@mafintosh Hm. That's a valid point. Perhaps that justifies adding a new API endpoint to create Buffers from values.
I think if the API were being designed today, this is how it would be done. Let's deprecate Buffer(value) entirely.
The assertiveness of your statement is a little off putting. At minimum this needs to be voted on by the @nodejs/ctc and a change this drastic may warrant a proper EPS.
@mafintosh, @feross I just gave you the access to the post about the current proposal, it's about 70% done, so that you can view it and comment here or somewhere else.
@trevnorris «What the API would look like if it was designed today» is the first most important question that we should study and think about. The second one is «how do we get there from where we are now in a best possible way»?
https://github.com/nodejs/node/issues/4660#issuecomment-171382461
@ChALkeR thanks. i'll have a look.
@ChALkeR "What should it look like today" is always going to be subjective. I understand the merit in asking that question, but for example designing the fs module today from scratch would bring on quite the argument. Instead I would suggest "what feels most natural with the existing API". Maybe it's not "the best", but at least it will work the way the community expects it to.
Quick update on https://github.com/nodejs/node/pull/4682 :
- --zero-fill-buffers command line option is implemented.
- Buffer.alloc() and Buffer.zalloc().
  - Buffer.zalloc() == 'zeroed-allocation'. It essentially just creates and returns a new Uint8Array that is always zero-filled.
  - Buffer.alloc() is equivalent to the existing new Buffer(num).
- SlowBuffer.alloc() and SlowBuffer.zalloc() methods are added as well
- Buffer(num) and SlowBuffer(num) are hard deprecated.
As a reminder, this is a work in progress PR that is not ready to land as is. It is intended to give us something concrete to work with as opposed to going around in circles over whether or not there's really a bug in Buffer or not.
Using the --zero-fill-buffers command line flag would force Buffer.alloc() and Buffer(num) to fall back to Buffer.zalloc() (and SlowBuffer.alloc()/SlowBuffer(num) to fall back to SlowBuffer.zalloc()).
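The fallback behaviour described here can be modelled in a few lines. This is only a sketch: makeAllocators and its zeroFillBuffers parameter are hypothetical stand-ins for the process-wide flag, and Buffer.allocUnsafe() (the uninitialized allocator in current Node.js) is used in place of the PR's internals.

```javascript
// Hypothetical model of the alloc()/zalloc() pair and the flag's precedence.
function makeAllocators (zeroFillBuffers) {
  // zalloc: always returns zero-filled memory.
  function zalloc (size) {
    return Buffer.allocUnsafe(size).fill(0)
  }

  // alloc: uninitialized memory, unless the flag forces zero-filling.
  function alloc (size) {
    return zeroFillBuffers ? zalloc(size) : Buffer.allocUnsafe(size)
  }

  return { alloc: alloc, zalloc: zalloc }
}

const flagged = makeAllocators(true)
console.log(flagged.alloc(4)) // <Buffer 00 00 00 00>
```

With the flag off, alloc() makes no promise about the contents; with it on, both functions behave identically, which is the precedence rule the comment describes.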
This approach gives us two distinct ways of addressing the problem:
- --zero-fill-buffers flag to enforce that all buffers be zeroed
- Buffer.alloc() or Buffer.zalloc() based on their specific needs, with the --zero-fill-buffers command line flag taking precedence.
For LTS, I would propose that only the --zero-fill-buffers flag would be backported to v4, v0.12 and v0.10. Note, however, that due to the differences in the underlying Buffer implementation in v0.12/v0.10, the actual implementation would be different than v4 and above. The differences in the implementation are likely inconsequential here.
This all said, I'm still personally unconvinced that the new Buffer.alloc() and Buffer.zalloc() methods are even required (similar to @mikeal's objection here). I think the new methods simply add confusion and make the API even muddier. However, I won't let my personal objections block the action on this.
Changing new Buffer(number) in a way so that it will zero-fill the memory will cause more security issues and make this matter even worse. Please read the above discussion or wait for my post.
I read the entire thread in detail before replying. I see two arguments that de-fanging the present Buffer(num) API might cause security issues:
Such code is already critically, dangerously broken, and hopefully very rare. I don't see this as a good reason to hold up a very real security fix affecting a broad range of modules.
I have been taking it for granted that de-fanging the current API would come along with deprecating it, which would avoid this problem.
Are there arguments I'm missing about why this would cause additional security issues?
To give a little more perspective, think of all the small devices that are now running Node.js on incredibly limited resources.
I understand this concern, but can you give some examples of apps and/or devices? I would be very surprised if even the smallest devices can't afford to zero their memory. It would also be useful to have examples of popular npm modules that intentionally use Buffer(num) for performance reasons. It would then be possible to benchmark those modules under realistic conditions.
Even if we assume there is a device and app that would become unusable with a de-fanged API, the maintainer can just switch to some new Buffer.unsafe_alloc() method, if that's really what they intend to do.
Another argument for Buffer.from and Buffer.alloc as a replacement for the current overloaded Buffer(value) constructor is that it's familiar because of Array.from and TypedArray.from.
Those already work in the way that Buffer.from would work.
The Array.from() method creates a new Array instance from an array-like or iterable object.
Almost all the types that Buffer(value) takes today (except for number, of course, and ArrayBuffer) match what Array.from and TypedArray.from do today.
Almost all the types that Buffer(value) takes today (except for number, of course, and ArrayBuffer) match what Array.from and TypedArray.from do today.
So except for type number, ArrayBuffer, string and JSON type (leaving TypedArray and Array) almost all the types match?
Actually, Array.from works on string.
The whole point of this issue is that number shouldn't be included in the same constructor.
So, yes. Array.from works on TypedArray, Array, and string. Not on ArrayBuffer or JSON.
Doesn't work the same way.
Uint8Array.from('123') -> [1, 2, 3]
Buffer('123') -> [0x31, 0x32, 0x33]
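The difference is easy to check directly. The sketch below uses Buffer.from('123') as a stand-in for the Buffer('123') call in the comment (an assumption: on current Node.js the two produce the same bytes for a string argument), and prints both results as plain arrays of decimal values.

```javascript
// Uint8Array.from iterates the string and converts each character to a number.
const viaTypedArray = Array.from(Uint8Array.from('123'))

// Buffer encodes the string as UTF-8 bytes (hex 31 32 33).
const viaBuffer = Array.from(Buffer.from('123'))

console.log(viaTypedArray) // [ 1, 2, 3 ]
console.log(viaBuffer) // [ 49, 50, 51 ]
```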
Fair point. So it's not exactly the same. I just suggested that it would be familiar.
tl;dr
This issue proposes:
1. Change new Buffer(number) to return safe, zeroed-out memory
2. Create a new API for uninitialized memory, Buffer.alloc(number)
Update: Jan 15, 2016
Upon further consideration, I think that returning zeroed out memory is a separate issue. The core issue is: unsafe buffer allocation should be in a different API.
I now support adding two APIs:
- Buffer.from(value) - convert from any type to a buffer
- Buffer.alloc(size) - create an uninitialized buffer with given size
This solves the core problem that affected ws and bittorrent-dht, which is Buffer(variable) getting tricked into taking a number argument.
Why is Buffer unsafe?
Today, the node.js Buffer constructor is overloaded to handle many different argument types like String, Array, Object, TypedArrayView (Uint8Array, etc.), ArrayBuffer, and also Number.
The API is optimized for convenience: you can throw any type at it, and it will try to do what you want.
Because the Buffer constructor is so powerful, you often see code like this:
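A minimal sketch of the kind of helper meant here, assuming a function named toHex (the name is taken from the question that follows; the body is an illustration, not necessarily the original snippet). It uses the legacy constructor on purpose, which current Node.js releases still accept but flag as deprecated.

```javascript
// Convert a UTF-8 string to hex. Nothing checks that `str` really is a string.
function toHex (str) {
  return new Buffer(str).toString('hex')
}

console.log(toHex('abc')) // 616263
```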
_But what happens if toHex is called with a Number argument?_
Remote Memory Disclosure
If an attacker can make your program call the Buffer constructor with a Number argument, then they can make it allocate uninitialized memory from the node.js process. This could potentially disclose TLS private keys, user data, or database passwords.
When the Buffer constructor is passed a Number argument, it returns an UNINITIALIZED block of memory of the specified size. When you create a Buffer like this, you MUST overwrite the contents before returning it to the user.
Would this ever be a problem in real code?
Yes. It's surprisingly common to forget to check the type of your variables in a dynamically-typed language like JavaScript.
Usually the consequence of assuming the wrong type is that your program crashes with an uncaught exception. But the failure mode for forgetting to check the type of arguments to the Buffer constructor is more catastrophic.
Here's an example of a vulnerable service that takes a JSON payload and converts it to hex:
In this example, an http client just has to send:
and it will get back 1,000 bytes of uninitialized memory from the server.
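As a rough illustration of the scenario described above, the sketch below reduces the service to the function that would run once the request body has been read. The function name convertPayload and the str field are assumptions made for this example. Note that Node.js 8 and later zero-fill the memory returned by new Buffer(number), so on a current runtime the second call returns zeros rather than stale memory; the size of the response is what the example demonstrates.

```javascript
// Hypothetical request handler body: expects {"str": "some string"}
// and responds with the hex encoding of that string.
function convertPayload (json) {
  const body = JSON.parse(json)
  // No type check: a number here selects the "allocate N bytes" overload.
  return new Buffer(body.str).toString('hex')
}

// Intended use.
console.log(convertPayload('{"str": "hi"}')) // 6869

// A request carrying a number instead of a string.
const leaked = convertPayload('{"str": 1000}')
console.log(leaked.length / 2) // 1000 (bytes of memory sent back to the client)
```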
This is a very serious bug. It's similar in severity to the Heartbleed bug that allowed disclosure of OpenSSL process memory by remote attackers.
Which real-world packages were vulnerable?
bittorrent-dht
@mafintosh and I found this issue in one of our own packages, bittorrent-dht. The bug would allow anyone on the internet to send a series of messages to a user of bittorrent-dht and get them to reveal 20 bytes at a time of uninitialized memory from the node.js process.
Here's the commit that fixed it. We released a new fixed version, created a Node Security Project disclosure, and deprecated all vulnerable versions on npm so users will get a warning to upgrade to a newer version.
ws
That got us wondering if there were other vulnerable packages. Sure enough, within a short period of time, we found the same issue in ws, the most popular WebSocket implementation in node.js.
If certain APIs were called with Number parameters instead of String or Buffer as expected, then uninitialized server memory would be disclosed to the remote peer.
These were the vulnerable methods:
Here's a vulnerable socket server with some echo functionality:
socket.send(number) called on the server, will disclose server memory.
Here's the release where the issue was fixed, with a more detailed explanation. Props to @3rd-Eden for the quick fix. Here's the Node Security Project disclosure.
What's the solution?
It's important that node.js offers a fast way to get memory; otherwise, performance-critical applications would needlessly get a lot slower.
But we need a better way to signal our intent as programmers. When we want uninitialized memory, we should request it explicitly.
Sensitive functionality should not be packed into a developer-friendly API that loosely accepts many different types. This type of API encourages the lazy practice of passing variables in without checking the type very carefully.
Buffer.alloc(number)
The functionality of creating buffers with uninitialized memory should be part of another API. We propose Buffer.alloc(number). This way, it's not part of an API that frequently gets user input of all sorts of different types passed into it.
How do we fix node.js core?
We sent a PR (merged as semver-major) which defends against one case:
In this situation, it's implied that the programmer intended the first argument to be a string, since they passed an encoding as a second argument. Today, node.js will allocate uninitialized memory in the case of new Buffer(number, encoding), which is probably not what the programmer intended.
But this is only a partial solution, since if the programmer does new Buffer(variable) (without an encoding parameter) there's no way to know what they intended. If variable is sometimes a number, then uninitialized memory will sometimes be returned.
What's the real long-term fix?
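To see why a longer-term fix is needed, the partial guard described above can be sketched as a wrapper. The function name createBuffer and the error message are hypothetical; this models the idea of the check and is not the code that landed in core.

```javascript
// Hypothetical wrapper around the legacy constructor.
function createBuffer (value, encoding) {
  // An encoding implies the caller meant to pass a string.
  if (typeof value === 'number' && typeof encoding === 'string') {
    throw new TypeError('first argument must be a string when an encoding is given')
  }
  return new Buffer(value, encoding)
}

// Caught: a number slipped in where a string was intended.
try {
  createBuffer(16, 'utf8')
} catch (err) {
  console.log(err.message)
}

// Not caught: without an encoding the number still allocates 16 bytes.
console.log(createBuffer(16).length) // 16
```

The second call is the case the guard cannot distinguish from a deliberate allocation.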
We could deprecate and remove new Buffer(number) and use Buffer.alloc(number) when we need uninitialized memory. But that would break 1000s of packages. So that's a no-go.
Instead, we believe the best solution is to:
1. Change new Buffer(number) to return safe, zeroed-out memory
2. Create a new API for uninitialized memory, Buffer.alloc(number)
This way, existing code continues working and the impact on the npm ecosystem will be minimal. Over time, npm maintainers can migrate performance-critical code to use Buffer.alloc(number) instead of new Buffer(number).
Conclusion
We think there's a serious design issue with the Buffer API as it exists today. It promotes insecure software by putting high-risk functionality into a convenient API with friendly "developer ergonomics".
This wasn't merely a theoretical exercise because we found the issue in some of the most popular npm packages.
Eventually, we hope that node.js core can switch to this new, safer behavior. We believe the impact on the ecosystem would be minimal since it's not a breaking change. Well-maintained, popular packages would be updated to use Buffer.alloc quickly, while older, insecure packages would magically become safe from this attack vector.