Buffer(number) is unsafe

feross commented 8 years ago

tl;dr

This issue proposes:

Change new Buffer(number) to return safe, zeroed-out memory
Create a new API for creating uninitialized Buffers, Buffer.alloc(number)
Update: Jan 15, 2016

Upon further consideration, I think that returning zeroed out memory is a separate issue. The core issue is: unsafe buffer allocation should be in a different API.

I now support adding two APIs:

Buffer.from(value) - convert from any type to a buffer
Buffer.alloc(size) - create an uninitialized buffer with given size

This solves the core problem that affected ws and bittorrent-dht, which is Buffer(variable) getting tricked into taking a number argument.
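
A minimal sketch of how this split plays out, assuming a Node.js version that ships `Buffer.from` (later releases do; it did not exist when this was written). In those releases the uninitialized allocator is named `Buffer.allocUnsafe(size)`, while `Buffer.alloc(size)` zero-fills, so the sketch uses `Buffer.allocUnsafe` to stand in for the `Buffer.alloc(size)` proposed here. The helper name `toBuffer` is hypothetical.

```javascript
// Conversion: Buffer.from accepts strings, arrays, buffers, etc., but not numbers.
function toBuffer (value) {
  return Buffer.from(value)
}

// A string is encoded as bytes (UTF-8 by default).
var fromString = toBuffer('abc')

// A number is rejected instead of being treated as a size.
var rejected = false
try {
  toBuffer(1000)
} catch (err) {
  rejected = err instanceof TypeError
}

// Allocation is a separate, explicit call.
var raw = Buffer.allocUnsafe(16) // uninitialized: overwrite before use
raw.fill(0)

console.log(fromString.toString('hex'), rejected, raw.length)
```

With the two operations separated, a number that sneaks in where a string was expected produces an exception rather than a block of memory.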

Why is `Buffer` unsafe?

Today, the node.js Buffer constructor is overloaded to handle many different argument types like String, Array, Object, TypedArrayView (Uint8Array, etc.), ArrayBuffer, and also Number.

The API is optimized for convenience: you can throw any type at it, and it will try to do what you want.

Because the Buffer constructor is so powerful, you often see code like this:

// Convert UTF-8 strings to hex
function toHex (str) {
  return new Buffer(str).toString('hex')
}

_But what happens if toHex is called with a Number argument?_

Remote Memory Disclosure

If an attacker can make your program call the Buffer constructor with a Number argument, then they can make it allocate uninitialized memory from the node.js process. This could potentially disclose TLS private keys, user data, or database passwords.

When the Buffer constructor is passed a Number argument, it returns an UNINITIALIZED block of memory of the specified size. When you create a Buffer like this, you MUST overwrite the contents before returning it to the user.

Would this ever be a problem in real code?

Yes. It's surprisingly common to forget to check the type of your variables in a dynamically-typed language like JavaScript.

Usually the consequence of assuming the wrong type is that your program crashes with an uncaught exception. But the failure mode for forgetting to check the type of arguments to the Buffer constructor is more catastrophic.

Here's an example of a vulnerable service that takes a JSON payload and converts it to hex:

// Take a JSON payload {str: "some string"} and convert it to hex
var server = http.createServer(function (req, res) {
  var data = ''
  req.setEncoding('utf8')
  req.on('data', function (chunk) {
    data += chunk
  })
  req.on('end', function () {
    var body = JSON.parse(data)
    res.end(new Buffer(body.str).toString('hex'))
  })
})

server.listen(8080)

In this example, an http client just has to send:

{
  "str": 1000
}

and it will get back 1,000 bytes of uninitialized memory from the server.
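
One way to close this particular hole, independent of any change to `Buffer`, is to check the type of `body.str` before converting it. The sketch below pulls the conversion into a hypothetical `payloadToHex` helper that the `'end'` handler above could call. The helper name and the `{ status, body }` return shape are assumptions for illustration, and the code assumes a Node.js version that provides `Buffer.from`.

```javascript
// Hypothetical helper: turn a raw JSON request body into a response description.
function payloadToHex (rawJson) {
  var body
  try {
    body = JSON.parse(rawJson)
  } catch (err) {
    return { status: 400, body: 'invalid JSON' }
  }

  // JSON can carry numbers, arrays, and objects, so check before converting.
  if (body === null || typeof body !== 'object' || typeof body.str !== 'string') {
    return { status: 400, body: 'expected {"str": "<string>"}' }
  }

  return { status: 200, body: Buffer.from(body.str, 'utf8').toString('hex') }
}

var ok = payloadToHex('{"str": "hi"}')
var attack = payloadToHex('{"str": 1000}')
console.log(ok.status, ok.body, attack.status)
```

The point of the explicit check is that the failure becomes a rejected request instead of depending on what the Buffer constructor happens to do with the value.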

This is a very serious bug. It's similar in severity to the Heartbleed bug that allowed disclosure of OpenSSL process memory by remote attackers.

Which real-world packages were vulnerable?

`bittorrent-dht`

@mafintosh and I found this issue in one of our own packages, bittorrent-dht. The bug would allow anyone on the internet to send a series of messages to a user of bittorrent-dht and get them to reveal 20 bytes at a time of uninitialized memory from the node.js process.

Here's the commit that fixed it. We released a new fixed version, created a Node Security Project disclosure, and deprecated all vulnerable versions on npm so users will get a warning to upgrade to a newer version.

`ws`

That got us wondering if there were other vulnerable packages. Sure enough, within a short period of time, we found the same issue in ws, the most popular WebSocket implementation in node.js.

If certain APIs were called with Number parameters instead of String or Buffer as expected, then uninitialized server memory would be disclosed to the remote peer.

These were the vulnerable methods:

socket.send(number)
socket.ping(number)
socket.pong(number)

Here's a vulnerable socket server with some echo functionality:

server.on('connection', function (socket) {
  socket.on('message', function (message) {
    message = JSON.parse(message)
    if (message.type === 'echo') {
      socket.send(message.data) // send back the user's message
    }
  })
})

When socket.send(number) is called on the server, it will disclose server memory.
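
A sketch of one possible application-side mitigation, separate from the fix in `ws` itself: coerce the echoed value to a string so that a JSON number can never reach `socket.send` as a number. The `echo` helper and the stub `fakeSocket` object are hypothetical and only stand in for a real `ws` connection.

```javascript
// Hypothetical handler body for the 'message' event shown above.
function echo (socket, rawMessage) {
  var message
  try {
    message = JSON.parse(rawMessage)
  } catch (err) {
    return // ignore malformed input
  }
  if (message !== null && typeof message === 'object' && message.type === 'echo') {
    // String(1000) is '1000', so the peer gets its own input back
    // rather than a 1000-byte buffer.
    socket.send(String(message.data))
  }
}

// Stub socket that records what would have been sent.
var sent = []
var fakeSocket = { send: function (data) { sent.push(data) } }

echo(fakeSocket, '{"type": "echo", "data": 1000}')
echo(fakeSocket, '{"type": "echo", "data": "hello"}')
console.log(sent)
```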

Here's the release where the issue was fixed, with a more detailed explanation. Props to @3rd-Eden for the quick fix. Here's the Node Security Project disclosure.

What's the solution?

It's important that node.js offers a fast way to get memory; otherwise, performance-critical applications would needlessly get a lot slower.

But we need a better way to signal our intent as programmers. When we want uninitialized memory, we should request it explicitly.

Sensitive functionality should not be packed into a developer-friendly API that loosely accepts many different types. This type of API encourages the lazy practice of passing variables in without checking the type very carefully.

`Buffer.alloc(number)`

The functionality of creating buffers with uninitialized memory should be part of another API. We propose Buffer.alloc(number). This way, it's not part of an API that frequently gets user input of all sorts of different types passed into it.

var buf = Buffer.alloc(16) // careful, uninitialized memory!

// Immediately overwrite the uninitialized buffer with data from another buffer
for (var i = 0; i < buf.length; i++) {
  buf[i] = otherBuf[i]
}
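
For comparison, a runnable sketch of the same pattern on later Node.js releases, where the uninitialized allocator is named `Buffer.allocUnsafe(size)` and `Buffer.alloc(size)` zero-fills (so it is not the API proposed here). The contents of `otherBuf` are made up for illustration.

```javascript
var otherBuf = Buffer.from('0123456789abcdef', 'utf8') // 16 bytes of sample data

var buf = Buffer.allocUnsafe(16) // careful, uninitialized memory!

// Immediately overwrite every byte; copy() returns the number of bytes copied.
var copied = otherBuf.copy(buf)

console.log(copied, buf.equals(otherBuf))
```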

How do we fix node.js core?

We sent a PR (merged as semver-major) which defends against one case:

var str = 16
new Buffer(str, 'utf8')

In this situation, it's implied that the programmer intended the first argument to be a string, since they passed an encoding as a second argument. Today, node.js will allocate uninitialized memory in the case of new Buffer(number, encoding), which is probably not what the programmer intended.

But this is only a partial solution, since if the programmer does new Buffer(variable) (without an encoding parameter) there's no way to know what they intended. If variable is sometimes a number, then uninitialized memory will sometimes be returned.

What's the real long-term fix?

We could deprecate and remove new Buffer(number) and use Buffer.alloc(number) when we need uninitialized memory. But that would break 1000s of packages. So that's a no-go.

Instead, we believe the best solution is to:

Change new Buffer(number) to return safe, zeroed-out memory
Create a new API for creating uninitialized Buffers. We propose: Buffer.alloc(number)

This way, existing code continues working and the impact on the npm ecosystem will be minimal. Over time, npm maintainers can migrate performance-critical code to use Buffer.alloc(number) instead of new Buffer(number).

Conclusion

We think there's a serious design issue with the Buffer API as it exists today. It promotes insecure software by putting high-risk functionality into a convenient API with friendly "developer ergonomics".

This wasn't merely a theoretical exercise because we found the issue in some of the most popular npm packages.

Eventually, we hope that node.js core can switch to this new, safer behavior. We believe the impact on the ecosystem would be minimal since it's not a breaking change. Well-maintained, popular packages would be updated to use Buffer.alloc quickly, while older, insecure packages would magically become safe from this attack vector.

seishun commented 8 years ago

Personally, I use new Buffer(number) a lot in my code. I wouldn't mind replacing those calls with Buffer.alloc(number), but I would mind having the word "unsafe" all over my code that doesn't even send data anywhere. Not every Node.js program is a web server, and the concept of "safety" depends heavily on the use case.

Sure, but that's not what this issue is about.

Exactly, that's why I don't see why the addition of Buffer.safe is being discussed here.

ChALkeR commented 8 years ago

@seishun I am now preparing a lengthy explanation of my proposal that covers many questions, including yours. It will be done in a few hours.

jasnell commented 8 years ago

@seishun ... I can definitely see the argument for not using the unsafe term. Perhaps instead of Buffer.unsafe()/Buffer.safe() we can simply use Buffer.alloc()/Buffer.allocSafe() and go from there? Seems to be a more acceptable solution.

seishun commented 8 years ago

@jasnell I'd go with Buffer.alloc()/Buffer.calloc() instead. As I said, the "safety" factor of a zero-filling allocation is use case dependent.

ChALkeR commented 8 years ago

@jasnell Whatever you name them, the safe variant (the one that zero-fills the memory) should be the most obvious to use one. Buffer.alloc()/Buffer.allocSafe() will not do that.

The most straightforward and recommended way of allocating Buffers should do that in a safe way to ensure that users use that by default, and resort to allocating unsafe Buffers only if they are absolutely sure what they are doing and definitely need it.

Don't forget that not everyone will read the docs.

ChALkeR commented 8 years ago

@seishun @jasnell If you want Buffer.alloc() (or Buffer.allocate()) to be present, then it should allocate zero-filled buffers, and the non zero-filled one should be called something like Buffer.allocUnsafe(), Buffer.allocRaw(), or something like that.

Perhaps Buffer.raw() would be also fine, but there were some objections against that, if I remember things correctly. @joepie91, perhaps?

Most people should use the zero-filled variant.

seishun commented 8 years ago

@ChALkeR

the safe variant (the one that zero-fills the memory) should be the most obvious to use one

So far I haven't seen any evidence that people erroneously assume that new Buffer(number) zero-fills the memory. The original message only gives examples of erroneously calling new Buffer(number) instead of new Buffer(string). Zero-filling would only help in one case – if new Buffer(number) returned zero-filled memory, then the issue presented would only result in incorrect behavior instead of a security issue. In other words, if we aren't going to change new Buffer(number) to return zero-filled memory, then zero-filling is completely irrelevant to this issue.
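
To make that one case concrete, here is a sketch of what the `toHex` example from the original post would return for a number if the allocation were zero-filled. It assumes a Node.js version that provides `Buffer.from` and a zero-filling `Buffer.alloc(size)` (which differs from the uninitialized `Buffer.alloc` proposed at the top of this issue); `toHexZeroFilled` is a hypothetical name.

```javascript
// Models `new Buffer(value).toString('hex')` with a zero-filling number overload.
function toHexZeroFilled (value) {
  var buf = typeof value === 'number'
    ? Buffer.alloc(value) // zero-filled, so nothing is leaked
    : Buffer.from(value)
  return buf.toString('hex')
}

console.log(toHexZeroFilled('5')) // hex of the character "5"
console.log(toHexZeroFilled(5)) // five zero bytes: wrong answer, but no leak
```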

ChALkeR commented 8 years ago

So far I haven't seen any evidence that people erroneously assume that new Buffer(number) zero-fills the memory.

I have seen those people, found issues in code, reported those, and talked with such people over Gitter. I remember someone being surprised. Also, look at Node.js issue reports regarding zero-filling Buffer(). Also, at least one of Node.js active collaborators was surprised by the fact that Buffer(number) is not zero filled, afaik. Also, I guess @evilpacket has seen many such people.

«I have not seen any evidence» is not a good attitude to an issue that targets improving the ecosystem security.

thefourtheye commented 8 years ago

So far I haven't seen any evidence that people erroneously assume that new Buffer(number) zero-fills the memory.

I, for one, was really surprised to find that the buffers are not zero-filled.

seishun commented 8 years ago

@ChALkeR Well, you're the first person to mention that in this issue. Would you care to provide some examples?

In any case, such confusion should be resolved by improving documentation, not by changing method names to something scary. I bet for most users performance is more important than safety for the case when they forget to write to the buffer.

gergoerdosi commented 8 years ago

I think using Buffer.alloc() and Buffer.allocUnsafe() would be a good start. The word "unsafe" could be used just like "sync", some methods have the default safety, others don't (Foo.bar() - Foo.barUnsafe()). Go has an unsafe package too: https://golang.org/pkg/unsafe/.

It's not good to see the word "unsafe" in code, but at least it makes developers cautious, which I think is the point. They will immediately know that something has to be taken care of when they use it.

I don't like Buffer.safe() and Buffer.unsafe() because it is not clear what they are doing. On the other hand alloc describes the purpose of the function.

ChALkeR commented 8 years ago

@seishun For example: https://github.com/nodejs/node-v0.x-archive/issues/7230

I don't want to paste Gitter chat logs now, I really want to finish my post about the current proposal today, and it's already 22:35 here. What I already mentioned should be enough.

seishun commented 8 years ago

For example: nodejs/node-v0.x-archive#7230

And it was resolved by improving the documentation. I'm actually surprised that the docs didn't mention it until less than a year ago. That must mean not a lot of people were surprised/affected by this behavior, otherwise someone would have complained about it and got the docs fixed much earlier.

okdistribute commented 8 years ago

As said earlier, the fact that it is "unsafe" is use case dependent.

The best proposal I've seen so far, from a user POV, is deprecating Buffer(num) altogether and introducing Buffer.alloc(num) that takes on the old behavior of Buffer(num). Introducing a Buffer.calloc is a good idea, too, although I agree that it is separate.

This seems like the easiest change to explain, implement, and maintain (one simple find and replace). Overloading the constructor this way is dangerous, proven by the fact that advanced developers didn't catch it. It's clearly going to continue to be a problem if the behavior is not deprecated.

ChALkeR commented 8 years ago

@karissa

and introducing Buffer.alloc(num) that takes on the old behavior of Buffer(num). Introducing a Buffer.calloc is a good idea, too, although I agree that it is separate.

That won't help. Please see explanations above or wait for my post.

okdistribute commented 8 years ago

I don't actually understand why that wouldn't help, can you give me more information as to why you think that?

I was under the impression that the problem is from overloading the constructor, passing a number when you expected it to be a string.

jsha commented 8 years ago

I'm a Node user, and a longtime security engineer at Twitter and now EFF. I was very surprised to hear that a common userland API in Node can return uninitialized memory, and it makes me really concerned about the safety of the Node project I maintain. I don't use Buffer directly, but I use a number of packages. The dynamic nature of Node means it's impossible to programmatically check whether any of those packages may be misusing Buffer. As a Node user, I very much want this API to be fixed (to zero memory) and backported.

As far as I can tell, there are two objections to fixing the API and backporting:

1. It's an API-breaking change.
2. There will be performance impacts.

For (1), how can it be an API-breaking change? The existing API specifies that the memory is uninitialized, which means it can have any value. Zeros are a valid value.

For (2), I haven't seen any benchmarks showing the presumed slowdown on real world loads. I would venture to say that most applications won't even notice the difference. That said, for applications that get slower when upgrading to a backported version, the path is clear and easy: Update their critical Buffer(num) calls to the new Buffer.allocUnsafe(num).

@mhart linked earlier to https://matt.sh/howto-c#_misc-thoughts, a very good, credible source with the recommendation that C programmers should "always use calloc. There is no performance penalty for getting zero'd memory." I'd like to repeat that recommendation. Best practice for C coders in 2016 is to always zero memory. I'd really like Node in 2016 to be at least as good as C's best practices in terms of memory safety.

ChALkeR commented 8 years ago

@jsha Changing new Buffer(number) in a way so that it will zero-fill the memory will cause more security issues and make this matter even worse. Please read the above discussion or wait for my post.

wbl commented 8 years ago

@ChALkeR Your argument is that if node 1.6 is secure against this and node 1.5 is not, this is worse than both being insecure. I think that's wrong.

ChALkeR commented 8 years ago

@wbl No, that's not it. Please read the discussion above.

wbl commented 8 years ago

@ChALkeR I found your long post, and I don't see how (other than my getting the numbers wrong) that isn't your argument. People on 5.x are getting pwnd now because of this, we can stop that without requiring any code changes or breaking existing code.

MylesBorins commented 8 years ago

For the sake of testing I have sent in a WIP PR that replaces all instances of malloc in buffer with calloc. I have a CI run going and a CITGM run going so we can see if it introduces any weirdness into the ecosystem.

ChALkeR commented 8 years ago

@seishun @karissa @wbl Please wait a few hours for a lengthy explanation of my current proposal before asking further questions.

jasnell commented 8 years ago

@TheAlphaNerd ... if you would, please run a similar CITGM run on https://github.com/nodejs/node/pull/4682

mikeal commented 8 years ago

Adding an API seems like a very poor solution. This would mean that individual modules decide whether they want to be faster or safer and would leave users of modules with a combination of slower and less safe modules when they want to choose a specific characteristic.

The best solution I can see would be a command line flag that turns on these safety features globally. Adding API only makes the situation worse.

As far as changing the default, that's a huge departure from Node's prior behaviour. Considering the range of use cases Node.js has this could be detrimental to many users and may not be the particular profile they prefer.

ChALkeR commented 8 years ago

@mikeal That point of view is also covered in my forthcoming post =).

mikeal commented 8 years ago

To give a little more perspective, think of all the small devices that are now running Node.js on incredibly limited resources. Changing this default wouldn't just slow them down, it might make them entirely unusable, and for a concern they may not even share. This project is responsible for maintaining compatibility with a wide range of use cases. While we should certainly add features that help subsets of those use cases, we have to be cognisant of the fact that some behaviours are going to win as the default simply because that's how they have been for long enough that altering them causes a huge amount of breakage and upgrade pain for too many people. So the burden of turning some of this stuff on will land on users who need the new behaviour rather than the old.

ChALkeR commented 8 years ago

@mikeal The current proposal does not deal with changing the existing behaviour.

mikeal commented 8 years ago

@ChALkeR I know, but not everyone in thread shares the perspective driving some of the proposed solutions so I'm trying to build a better understanding of the motivations behind them for everyone involved :)

ChALkeR commented 8 years ago

@mikeal I am currently in the process of writing a lengthy post that will explain everything better, including the proposed approach, motivations, and possible considerations.

mafintosh commented 8 years ago

I think the main problem here is that the current Buffer constructor overloads behaviour. One overload, Buffer(number), allocates a new buffer. Buffer(string) converts a string into a buffer. In a dynamically typed language you sometimes mess up types, which results in the wrong constructor being called (which can be hard to notice and result in security issues like @feross describes). Adding two new explicit APIs would help solve this issue:

var buf = Buffer.allocate(number) to allocate a non-zeroed out buffer
var buf = Buffer.from(val) to turn a string, uint8 array etc into a buffer

User-land libraries could easily poly/pony-fill these APIs for old node. If we add these APIs we could deprecate the old Buffer constructor as well.
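
A rough sketch of what such a ponyfill could look like. The function names `bufferFrom` and `bufferAllocate` are hypothetical, and `Buffer.allocUnsafe` is the name the non-zeroed allocator has in later Node.js releases. The feature check is an assumption about how a native `Buffer.from` would be detected: on some Node versions `Buffer.from` may exist only because it is inherited from `Uint8Array.from`, which behaves differently, so the sketch treats that case as unsupported.

```javascript
// True when Buffer.from is a real Buffer API and not the inherited Uint8Array.from.
var hasNativeFrom = typeof Buffer.from === 'function' &&
  Buffer.from !== Uint8Array.from

// Convert a value to a Buffer; never treat a number as a size.
function bufferFrom (value, encoding) {
  if (typeof value === 'number') {
    throw new TypeError('"value" argument must not be a number')
  }
  if (hasNativeFrom) {
    // Only strings take an encoding; default to UTF-8 when none is given.
    return typeof value === 'string'
      ? Buffer.from(value, encoding || 'utf8')
      : Buffer.from(value)
  }
  // Old node: fall back to the legacy constructor (numbers already excluded).
  return new Buffer(value, encoding)
}

// Allocate a non-zeroed buffer; only ever accepts a size.
function bufferAllocate (size) {
  if (typeof size !== 'number') {
    throw new TypeError('"size" argument must be a number')
  }
  return typeof Buffer.allocUnsafe === 'function'
    ? Buffer.allocUnsafe(size)
    : new Buffer(size)
}

var converted = bufferFrom('abc')
var allocated = bufferAllocate(4)
console.log(converted.toString(), allocated.length)
```

Because each function accepts only one kind of argument, a mixed-up type surfaces as a `TypeError` in both old and new runtimes.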

ChALkeR commented 8 years ago

@mafintosh Do I get it right that you mean something close to my proposal, with the only addition being that Buffer(val) will be also deprecated and moved to Buffer.from(val)?

mafintosh commented 8 years ago

@ChALkeR yea sounds about right. Buffer.from vs Buffer(val) is a bit easier to feature detect for which makes it easier to add old node support in a userland library. Would also allow for static analysis to find uses of the deprecated constructor

ChALkeR commented 8 years ago

@mafintosh That one (Buffer.from) is new and is not covered in my forthcoming post, but I think that it's out of scope of the current issue: it doesn't give any security benefits compared to the current proposal.

Let's deprecate Buffer(number) first, introducing more deprecations in the same issue will delay the discussion and make coming to a consensus less likely.

Also note that Buffer(value) is widely used, maybe even more widely than Buffer(number).

mafintosh commented 8 years ago

@ChALkeR since both of the actual security issues @feross and I have found in modules resulted from the fact that Buffer(number) was called by accident when someone was trying to call Buffer(string), I think it's related. A deprecation warning wouldn't necessarily help here since it would only be shown when the faulty constructor is being called (during an attack). It's better than doing nothing though.

feross commented 8 years ago

@ChALkeR I think that @mafintosh's point is relevant.

In the current proposal, developers are still forced to keep using Buffer(variable) to convert a string to a Buffer, leaving them vulnerable to the original issues that affected bittorrent-dht and ws.

With these:

Buffer.from(value) - convert from any type to a buffer
Buffer.alloc(size) - create an uninitialized buffer with given size

the developer intent is clearly conveyed, and potentially unsafe functionality is put into its own API.

I think if the API were being designed today, this is how it would be done. Let's deprecate Buffer(value) entirely.

ChALkeR commented 8 years ago

@mafintosh Hm. That's a valid point. Perhaps that justifies adding a new API endpoint to create Buffers from values.

trevnorris commented 8 years ago

I think if the API were being designed today, this is how it would be done. Let's deprecate Buffer(value) entirely.

The assertiveness of your statement is a little off-putting. At minimum this needs to be voted on by the @nodejs/ctc and a change this drastic may warrant a proper EPS.

ChALkeR commented 8 years ago

@mafintosh, @feross I just gave you the access to the post about the current proposal, it's about 70% done, so that you can view it and comment here or somewhere else.

ChALkeR commented 8 years ago

@trevnorris «What the API would look like if it was designed today» is the first most important question that we should study and think about. The second one is «how do we get there from where we are now in the best possible way»?

https://github.com/nodejs/node/issues/4660#issuecomment-171382461

mafintosh commented 8 years ago

@ChALkeR thanks. i'll have a look.

trevnorris commented 8 years ago

@ChALkeR "What should it look like today" is always going to be subjective. I understand the merit in asking that question, but for example designing the fs module today from scratch would bring on quite the argument. Instead I would suggest "what feels most natural with the existing API". Maybe it's not "the best", but at least it will work the way the community expects it to.

jasnell commented 8 years ago

Quick update on https://github.com/nodejs/node/pull/4682 :

The --zero-fill-buffers command line option is implemented.
The new APIs are now named Buffer.alloc() and Buffer.zalloc().
- Buffer.zalloc() == 'zeroed-allocation'. It essentially just creates and returns a new Uint8Array that is always zero-filled.
- Buffer.alloc() is equivalent to the existing new Buffer(num).
Equivalent SlowBuffer.alloc() and SlowBuffer.zalloc() methods are added as well
The existing Buffer(num) and SlowBuffer(num) are hard deprecated.
Documentation is updated as well.

As a reminder, this is a work in progress PR that is not ready to land as is. It is intended to give us something concrete to work with as opposed to going around in circles over whether or not there's really a bug in Buffer or not.

Using the --zero-fill-buffers command line flag would force Buffer.alloc() and Buffer(num) to fall back to Buffer.zalloc() (and SlowBuffer.alloc()/SlowBuffer(num) to fall back to SlowBuffer.zalloc()).

This approach gives us two distinct ways of addressing the problem:

Policy: Node can be run with the --zero-fill-buffers flag to enforce that all buffers be zeroed
Dev: Developers can choose either Buffer.alloc() or Buffer.zalloc() based on their specific needs, with the --zero-fill-buffers command line flag taking precedence.
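
The precedence described above can be modelled in a few lines. This is only an illustration, not the code in the PR: `makeAllocators` and its `zeroFillBuffers` option are hypothetical stand-ins for the command line flag, and the sketch relies on `Buffer.allocUnsafe(size)` (uninitialized) and `buf.fill(0)` from later Node.js releases.

```javascript
// Returns the two allocation functions as they would behave under a given policy.
function makeAllocators (options) {
  var forceZero = Boolean(options && options.zeroFillBuffers)

  function zalloc (size) {
    return Buffer.allocUnsafe(size).fill(0) // always zero-filled
  }

  function alloc (size) {
    // The developer asked for uninitialized memory, but the policy takes precedence.
    return forceZero ? zalloc(size) : Buffer.allocUnsafe(size)
  }

  return { alloc: alloc, zalloc: zalloc }
}

var strict = makeAllocators({ zeroFillBuffers: true })
var relaxed = makeAllocators()

console.log(strict.alloc(8).toString('hex')) // zero-filled because the policy forces it
console.log(relaxed.alloc(8).length) // contents unspecified, length as requested
```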

For LTS, I would propose that only the --zero-fill-buffers flag would be backported to v4, v0.12 and v0.10. Note, however, that due to the differences in the underlying Buffer implementation in v0.12/v0.10, the actual implementation would be different than v4 and above. The differences in the implementation are likely inconsequential here.

This all said, I'm still personally unconvinced that the new Buffer.alloc() and Buffer.zalloc() methods are even required (similar to @mikeal's objection here). I think the new methods simply add confusion and make the API even muddier. However, I won't let my personal objections block the action on this.

jsha commented 8 years ago

Changing new Buffer(number) in a way so that it will zero-fill the memory will cause more security issues and make this matter even worse. Please read the above discussion or wait for my post.

I read the entire thread in detail before replying. I see two arguments that de-fanging the present Buffer(num) API might cause security issues:

Users could be depending on uninitialized memory to provide entropy to a security-critical CSPRNG.

Such code is already critically, dangerously broken, and hopefully very rare. I don't see this as a good reason to hold up a very real security fix affecting a broad range of modules.

Module authors might notice the new zero-filling behavior (without reading the docs) and start to depend on it in new code that intentionally calls Buffer with a number argument. Users running old versions of Node would import those modules and get incorrect behavior.

I have been taking it for granted that de-fanging the current API would come along with deprecating it, which would avoid this problem.

Are there arguments I'm missing about why this would cause additional security issues?

To give a little more perspective, think of all the small devices that are now running Node.js on incredibly limited resources.

I understand this concern, but can you give some examples of apps and/or devices? I would be very surprised if even the smallest devices can't afford to zero their memory. It would also be useful to have examples of popular npm modules that intentionally use Buffer(num) for performance reasons. It would then be possible to benchmark those modules under realistic conditions.

Even if we assume there is a device and app that would become unusable with a de-fanged API, the maintainer can just switch to some new Buffer.unsafe_alloc() method, if that's really what they intend to do.

feross commented 8 years ago

Another argument for Buffer.from and Buffer.alloc as a replacement for the current overloaded Buffer(value) constructor is that it's familiar because of Array.from and TypedArray.from.

Those already work in the way that Buffer.from would work.

The Array.from() method creates a new Array instance from an array-like or iterable object.

Almost all the types that Buffer(value) takes today (except for number, of course, and ArrayBuffer) match what Array.from and TypedArray.from do today.

trevnorris commented 8 years ago

Almost all the types that Buffer(value) takes today (except for number, of course, and ArrayBuffer) match what Array.from and TypedArray.from do today.

So except for type number, ArrayBuffer, string and JSON type (leaving TypedArray and Array) almost all the types match?

feross commented 8 years ago

Actually, Array.from works on string.

feross commented 8 years ago

The whole point of this issue is that number shouldn't be included in the same constructor.

So, yes. Array.from works on TypedArray, Array, and string. Not on ArrayBuffer or JSON.

trevnorris commented 8 years ago

Doesn't work the same way:

Uint8Array.from('123') -> [1, 2, 3]
Buffer('123') -> [0x31, 0x32, 0x33]

feross commented 8 years ago

Fair point. So it's not exactly the same. I just suggested that it would be familiar.
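
For reference, the difference can be checked directly. The sketch assumes a Node.js version where `Buffer.from(string)` exists and encodes strings the same way `Buffer(string)` does; the Buffer values quoted above are the hexadecimal forms of the bytes printed here in decimal.

```javascript
// TypedArray.from iterates the string and converts each character to a number.
var viaTypedArray = Array.from(Uint8Array.from('123'))

// Buffer encodes the string as UTF-8 bytes: '1' is 0x31, '2' is 0x32, '3' is 0x33.
var viaBuffer = Array.from(Buffer.from('123'))

console.log(viaTypedArray) // [ 1, 2, 3 ]
console.log(viaBuffer) // [ 49, 50, 51 ]
console.log(Buffer.from('123').toString('hex')) // 313233
```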
