Closed feross closed 8 years ago
For those interested in getting the new behavior we propose without a change to node core, we published a user-land package: safe-buffer
.
Now, the new Buffer(number, encoding) will throw exception, detail is here: https://github.com/nodejs/node/commit/3b27dd5ce15942a054904b26e3dca295806038d8 . I introduce a new API Buffer.encode() for encoding case.
You could make similar arguments about child_process
being unsafe, about loading native addons being unsafe, even about fs
being unsafe. The more bubble-wrap we put in core, the more users are likely to make the false assumption that programming in Node.js is like programming in the browser, which it absolutely is not. That's not a -1 to this proposal (in fact, this exact discussion is an ongoing on that keeps on coming up, we have an active thread in the security repo to thrash this out too), just a word of caution because we keep on having discussions like this where we want to move closer to a sandbox but that will never be the case for Node and it would be even more unsafe to lull users into the perception that they are as protected as in the browser (there's a very good chance that's exactly what happened to TrendMicro).
It should be noted that the primary reason for not zero-filling by default is performance. So new Buffer()
should be considered a form of malloc()
. In the past, it has been demonstrated that allocating filled memory has significant performance penalties to a plain malloc()
. I keep on suggesting that those who are advocating switching the default to a clean allocation should do some benchmarking to demonstrate what kind of impact it might have. Otherwise, this argument is dead-in-the-water because the "it's faster" argument currently holds this one in place.
This discussion has been had several times in the last few weeks.
If a "high-risk functionality" application doesn't have proper type checking in place then it already has a security issue. This problem can't be dished into Buffer as if it's at fault.
Not zero filling the Buffer is completely a performance issue, but the impact on performance is more than trivial. Even at allocations as low as 1KB. And while this type of impact probably wouldn't harm your standard web app, it can noticeably affect performance of a node process.
The concern is when a number is passed when something else is expected. Again this is a bug in the developers code, and TBH it doesn't make much sense to impact the performance of the many modules out there today and force them to update their code to use new syntax because of that.
And let's be honest, in the years that Buffer has worked exactly this way how many times has this been reportedly seen as an issue in the wild. The only reason you're here now is because it affected modules you were directly involved with.
@rvagg @trevnorris You both raise good points. Node is not the browser, and performance is critical.
The difference with modules like child_process
and fs
is that they're very obviously doing powerful and sensitive operations. Even the most trivial use of them will confront the developer with the fact that they're interacting with their OS on a low level and need to be careful.
Buffer
is different. 95% of the time, it's safe and idiot-proof. 5% of the time you need to be very, very careful.
You can pass in:
Array
–> safeBuffer
–> safeArrayBuffer
–> safeUint8Array
–> safeUint16Array
–> safeObject
(reverse of .toJSON()
) –> safeString
–> safeNumber
–> _SURPRISE! REMOTE MEMORY DISCLOSURE!_The API seems like it's trying to lay a trap for the user.
Look at the contributors on each of those repos. Some of the most talented node.js developers looked at this code extensively and didn't notice this issue for years.
And while this type of impact probably wouldn't harm your standard web app, it can noticeably affect performance of a node process.
To be clear, I'm not suggesting that node core should use zeroed-out buffers. It would continue using unallocated buffers, but that functionality would move to a different API away from the one that looks and behaves (and now actually is) a Uint8Array
.
-1 to changing new Buffer(number)
to zero-out by default. Even if there was a separate API to preserve the behavior, that means a lot of code change for a lot of packages and application code. I'd personally be more open to a separate method that created zeroed-out Buffers, that way it's an explicit opt-in and you are (or should be) at that point aware of the tradeoff you're making.
separate method that created zeroed-out Buffers
That would not fix anything, there is already a new Buffer(number).fill(0)
.
I see around 20-25% perf regression using .fill()
on small buffers (on ARM here):
node -e "for(var i = 1e6; i >= 0; i--) Buffer(1e3)" 6.16s user 0.19s system 100% cpu 6.287 total
node -e "for(var i = 1e6; i >= 0; i--) Buffer(1e3)" 6.10s user 0.19s system 101% cpu 6.222 total
node -e "for(var i = 1e6; i >= 0; i--) Buffer(1e3)" 5.97s user 0.17s system 101% cpu 6.074 total
node -e "for(var i = 1e6; i >= 0; i--) Buffer(1e3).fill()" 8.41s user 0.16s system 100% cpu 8.504 total
node -e "for(var i = 1e6; i >= 0; i--) Buffer(1e3).fill()" 7.89s user 0.14s system 100% cpu 7.968 total
node -e "for(var i = 1e6; i >= 0; i--) Buffer(1e3).fill()" 7.94s user 0.24s system 100% cpu 8.111 total
I see around 20-25% perf regression using
.fill()
on small buffers
Nice.
Remember too, this is not a 20-25% perf regression across the board. It's only going to apply when all of these conditions are true:
new Buffer(number)
because they need uninitialized memoryBuffer.alloc()
Node.js core would not be slower. Uses of new Buffer(string)
, new Buffer(array)
, etc. would not be slower.
I think I'm in support of this, but I'm not aware of all the use cases for Buffer(number)
, so take that with a grain of salt.
Instead of introducing new API, how about an opt-in?
Buffer(100); // safe
Buffer.unsafe = true;
Buffer(100); // unsafe
Thanks feross for explaining this well, I learned something tonight! I am relatively new to Node - I gained my open source chops in Python land. In Python, slow performance is sort of a given. If you want to make a Python program more performant, you can optimize it in certain ways, which are well known. Performance then is an advanced problem. The old adage "make it work, then make it fast" works well for these kinds of dynamic, interpreted languages. After all, we are not writing in C here, and the beauty of languages like Node and Python is the ease with which external, more performant packages can be accessed when necessary.
20-25% is actually quite a lot slower, true -- but how many calls to Buffer are using Buffer(Number) and require high performance? It seems as though this being fixed in core will bring great gratitude from newcomers and old hats alike. As a relatively new node programmer, I am certainly grateful - my memory won't be stolen!
I generally agree that the API is problematic.
I just want to point out, that fixing Buffer
to be zero-filled potentially introduces security risks to code that would use the constructor to get some entropy. Consider this (arguably badly coded) example, where someone might use it instead of crypto.randomBytes
.
var someEntropy = (new Buffer(256)).toString('hex')
Updating to a new node major version with the proposed new Buffer API would not break the code, but would introduce (an even higher) security risk to that code, without the user noticing right away.
I just wanted to point that out for the sake of argument.
@finnp that API is so insecure that I would flag it as a security vulnerability, for the same assumption of this PR.
So, I'm :-1: on anything that make node (or apps that runs on node) slower. However, I see the security vulnerability.
My gut tells me: let's benchmark this change in node. Let's benchmark this change in userland (all the popular modules have benchmarks), and let's see where we stand. Does express gets slower? or Hapi? or socket.io? To what degree?
I think the problem is that malloc
and encodings should not be on the same API. I propose to deprecate anything that is not new Buffer(number)
, and prepare a factory method for all the other cases. new Buffer(number)
will stay like malloc, and we migrate all users to a safe API, that does not support the 'number'
argument.
I think anyone depending on that kind of buffer entropy is already walking in the land of the undocumented, and the security risk represented by the issue versus the pretty modest performance ding make this one a no brainer. To put this back into a C context, returning memory without a memset in response to a 'client' request is effectively insanity.
+1 on making uninitialized Buffer creation a separate function in the API, and removing it from the "type-guessing" constructor, or replacing it with a 'safe' variant. This kind of potentially dangerous functionality should be explicitly opt-in.
I'm not sure why there's any discussion about it at all, to be honest. It's well understood that having safe defaults considerably decreases the chance of vulnerabilities, and moving uninitialized creation to a separate function would not incur any performance issues for maintained code (because they can simply use the new function if they know what they are doing, and actually intend uninitialized creation).
The unmaintained code would not receive security patches anyway, and in those cases a slowdown is better than an outright vulnerability. Node.js, as a project that follows semantic versioning, is in a fairly unique position to actually be able to make such patches without widespread ecosystem breakage.
@silverwind: Instead of introducing new API, how about an opt-in?
I would vote against a global opt-in. It doesn't really fix the problem (as a single intended opt-in means your entire codebase is now unprotected), while still introducing the same kind of break. That, plus global state gets messy in general.
@trevnorris: The concern is when a number is passed when something else is expected. Again this is a bug in the developers code, and TBH it doesn't make much sense to impact the performance of the many modules out there today and force them to update their code to use new syntax because of that.
It absolutely does. Safe defaults are the single most important thing a language/runtime can possibly provide for fostering a secure ecosystem around it. No matter how competent people are, they make mistakes, and the amount of possibly footguns should be reduced to an absolute minimum. The code change would be trivial, and provide significant security gains.
@trevnorris: And let's be honest, in the years that Buffer has worked exactly this way how many times has this been reportedly seen as an issue in the wild. The only reason you're here now is because it affected modules you were directly involved with.
Observance of security issues does not correlate to severity or how badly they need to be solved. Not to mention that you have no idea whether this has been discovered (and exploited) in the wild or not, nor any way to check, nor would the perpetrators have any reason to disclose as such.
The security community has spent several decades now understanding how to assess the severity of a security issue and how to prevent common issues, and it would probably be advisable to work from that knowledge, rather than repeating that entire process again in the Node.js community and exposing many users to danger in the process.
@mscdex: I'd personally be more open to a separate method that created zeroed-out Buffers, that way it's an explicit opt-in and you are (or should be) at that point aware of the tradeoff you're making.
Secure opt-ins don't work. See also this and the general principle of people following the path of least resistance.
While I first thought that making Buffer(number)
zero-filled by default is a good idea, I do not think that now. Imo it should better be deprecated and replaced by two separate methods.
Making Buffer(number)
zero-filled will solve the issue long-term, but will cause havoc short-term. For example, if Alice writes a lib, sees that with Node.js 6.x, 7.x etc Buffer(number)
is zero-filled, and does not zero-fill the buffer in her code (she doesn't want to zero-fill it twice!), and Bob uses that lib under Node.js 5.x where Buffer is not zero-filled by default — it will cause more damage that it solves.
Documenting that in a way «zero-fills since 6.x, zero-fill it twice if you use a lower version» would not help, Alice could think «I don't need 5.x» and publish a lib that does not manually zero-fill Buffers
, but some other person could use that lib under 5.x and will be affected. There is no way around that — if Buffer(number)
becomes zero-filled by default, the user will have to choose between zero-filling it two times in newer versions of Node.js or not zero-filling it at all in less recent versions of Node.js (and the third option would be — write some ugly version-detection code). Guess what will the user choose?
Alternate proposal:
Buffer.allocateRaw(number)
to allocate a non-zero filled Buffer
, document it, try hard to note the possible security consequences of careless usage.Buffer.allocate(number)
to allocate a zero-filled Buffer
.Buffer(number)
(first in doc for at least a whole major release, perhaps), tell people to use Buffer.allocate(number)
instead.Buffer(number)
calls, so that the topmost app could enforce zero-filling all Buffers. I think this point was already discussed and supported.Buffer(number)
, point people to Buffer.allocate(number)
. Perhaps make Buffer(number)
zero-filled by default at the same time (but don't let people rely on that) and remove the command-line flag, because it being hard-deprecated would mean that the usage is low and no new code is expected to use that.Thoughts?
This will also solve unintentional calls to Buffer(number)
when a person wanted to call Buffer('200')
instead (and called Buffer(200)
).
@finnp Anyone who uses the code you provided is already doomed.
At this point it's easier just to soft-deprecate Buffer
altogether and encourage devs to use typed arrays instead.
At this point it's easier just to soft-deprecate Buffer altogether and encourage devs to use typed arrays instead.
I think this is even harder than changing Buffer api, as Buffer is part of the Stream API, and also a lot of C++ code.
@vkurchatkin Buffer(number)
has much lower usage than general Buffer
, plus the perfomance concerns of Buffer
vs typed arrays. Let's be realistic here and let's try to solve this somehow. Deprecating Buffer
, if that would ever happen, will take long time.
I think this is even harder than changing Buffer api, as Buffer is part of the Stream API, and also a lot of C++ code.
it shouldn't be that hard to change, since Buffers ARE typed arrays now.
@ChALkeR
plus the perfomance concerns of Buffer vs typed arrays
The only concern is instantiation. We can still provide a function to create uninitialized typed array or array buffer.
Let's be realistic here and let's try to solve this somehow. Deprecating Buffer, if that would ever happen, will take long time.
I mean not really deprecating them, but discouraging and maybe eventually removing from docs.
I'm +1 on the general approach @ChALkeR suggests in https://github.com/nodejs/node/issues/4660#issuecomment-171262864
By introducing two new alternative factory functions (one safe, one unsafe) and deprecating the current unsafe constructor, we ensure that existing code isn't adversely affected while giving new code an appropriate path forward. Changing the expected behavior of the Buffer constructor or even deprecating Buffer altogether does not address all of the requirements here.
One tweak I would make to @ChALkeR's suggestion is: instead of allocate
and allocateRaw
, I would use allocateSafe
and allocateUnsafe
, to make sure it's clearly indicated that there is a safety/security choice being made.
Some comments, in no particular order:
At this point it's easier just to soft-deprecate
Buffer
altogether and encourage devs to use typed arrays instead.
@vkurchatkin Except that Buffer
has extra APIs that make it more useful both to core and userland.
20-25% is actually quite a lot slower, true -- but how many calls to Buffer are using Buffer(Number) and require high performance? It seems as though this being fixed in core will bring great gratitude from newcomers and old hats alike. As a relatively new node programmer, I am certainly grateful - my memory won't be stolen!
@karissa Core's are probably most important, but if you are, say, running a heavy realtime websocket application, and ws
were using zero-filled buffers for everything, you'd likely loose a lot of throughput.
In Python, slow performance is sort of a given. If you want to make a Python program more performant, you can optimize it in certain ways, which are well known. Performance then is an advanced problem. The old adage "make it work, then make it fast" works well for these kinds of dynamic, interpreted languages.
Python is a tricky comparison because the language just comes with everything. Sure, we expect JavaScript to not execute as fast as C, or even Java, but in Node, being able to achieve high throughput of I/O operations is pretty darn important. That's what we do in a nutshell after all.
Secure opt-ins don't work. See also this and the general principle of people following the path of least resistance.
I would generally agree with this.
Alternate proposal:
- Introduce
Buffer.allocateRaw(number)
to allocate a non-zero filledBuffer
, document it, try hard to note the possible security consequences of careless usage.- Introduce
Buffer.allocate(number)
to allocate a zero-filledBuffer
.- Soft-deprecate
Buffer(number)
(first in doc for at least a whole major release), tell people to useBuffer.allocate(number)
instead.- Introduce a command-line flag for opt-in to zero-filling all
Buffer(number)
calls, so that the topmost app could enforce zero-filling all Buffers. I think this point was already discussed and supported.- Hard-deprecate
Buffer(number)
, point people toBuffer.allocate(number)
. Perhaps makeBuffer(number)
zero-filled by default at the same time (but don't let people rely on that) and remove the command-line flag, because it being hard-deprecated would mean that the usage is low and no new code is expected to use that.
I think this the most sound proposal here, although I wouldn't mind tweaking it like @jasnell pointed out.
Alternate proposal:
Introduce
Buffer.allocateRaw(number)
to allocate a non-zero filledBuffer
, document it, try hard to note the possible security consequences of careless usage.
-1
Introduce
Buffer.allocate(number)
to allocate a zero-filledBuffer
.
+1
Soft-deprecate
Buffer(number)
(first in doc for at least a whole major release, perhaps), tell people to useBuffer.allocate(number)
instead.
-1
Introduce a command-line flag for opt-in to zero-filling all
Buffer(number)
calls, so that the topmost app could enforce zero-filling all Buffers. I think this point was already discussed and supported.
+2
Hard-deprecate
Buffer(number)
, point people toBuffer.allocate(number)
. Perhaps makeBuffer(number)
zero-filled by default at the same time (but don't let people rely on that) and remove the command-line flag, because it being hard-deprecated would mean that the usage is low and no new code is expected to use that.
-1
I agree with @jasnell. Buffer.allocateSafe()
and Buffer.allocateUnsafe()
are the best ones. I would shorten them up to Buffer.safe()
and Buffer.unsafe()
, but that's my taste.
@mscdex I actually disagree more with that, allocate()
does not sound safe-by-default to me.
Imho, new API would just complicate things unneccessary. +1 for a --safe-buffers
flag.
@silverwind Just an opt-in command-line flag will not fix anything in the ecosystem. It will fix things only for those setups that care, and those will receive unneeded performance penalties (by double-zeroing the Buffers). It should be viewed only as a temporary measure.
@Fishrock123 I was +1 to that particular idea, not the specific function name. I do agree a better name could be chosen if that route was taken (allocZeroed()
?).
@ChALkeR we could just make it --unsafe-buffers
if we really want to go that far, but I think a few people here would rather have zeroed buffers as an opt-in because performance is a key feature of Node.js after all.
@silverwind
When trying to solve a problem, we should ask these questions:
Preserving new Buffer(string)
, new Buffer(number)
, and adding just the weird command-line flag that affects the behaviour of the latter is not the way how this should have been done.
I'm a bit iffy about a long hard-deprecation period (as this is a critical security issue that will affect existing applications right now, likely without awareness of the operator), but in principle, I agree with @ChALkeR's suggestion to deprecate the current uninitialized constructor entirely, and to introduce two new methods.
That having been said, I think @jasnell's suggested tweak is a very good idea, and even an essential one, from an "encouraging safe implementations where possible" point of view. Rust takes the same approach with unsafe blocks.
It requires the developer to commit to the choice of using something 'unsafe', which will make them less likely to do so without understanding the consequences. As a bonus, it acts as a 'red flag' to others reviewing the code - a clear indication that they need to be careful when modifying it, as an 'unsafe' feature is being used.
I think the --safe-buffers
flag as a temporary mitigation strategy is a good proposal as well, but it should absolutely not be the only mitigation against this issue in the long term. It comes with a severe performance penalty, and will thus introduce a "security vs. performance" trade-off, and I think we all know how that is going to turn out when managers get involved.
@joepie91 A hard deprecation done sooner would leave us in a situation when module authors can't use the old API because it's hard deprecated and can't use the new API becase old Node.js versions do not support it.
I guess that will not object to hard-deprecating Buffer(number)
sooner that one major release away, if all the other supported Node.js branches receive the two new methods (as backports). But I doubt that anyone else would support that.
I guess that will not object to hard-deprecating Buffer(number) sooner that one major release away, if all the other supported Node.js branches receive the two new methods (as backports). But I doubt that anyone else would support that.
Releasing a minor update for existing branches with the new methods would probably be a good idea anyway, for module compatibility reasons. Requiring developers to "detect" which API to use is a great way to ensure that they will just use the old (dangerous) API for as long as they can get away with it instead...
To be clear, I'm not suggesting that node core should use zeroed-out buffers.
This is mostly moot. The majority of memory for core purposes is allocated on the native side, bypassing the ArrayBufferAllocator
. The reason for how it operates today was always done for the best interest of users, and our assumption that developers understood the clearly documented API they were using.
The API is optimized for convenience: you can throw any type at it, and it will try to do what you want.
This makes it sound like Buffer does a best guess. Every acceptable value is documented, and what will happen with that value is also documented.
The difference with modules like child_process and fs is that they're very obviously doing powerful and sensitive operations.
node is full of operations that allow us to do power OS interactive things with a simplistic API. This is one of the most powerful aspects of node. I remember when I first started using node (and was completely new to server side development in general) I did all sorts of stupid things to my machine. I learned, admitted my stupidity and didn't do it again. Truth be told I'm definitely not done doing stupid things.
Those who want performance will figure out how to get it.
Let's zero fill buffers by default. Please let's not create a
mysql_real_escape_string
situation. Just fix it.
On Wednesday, January 13, 2016, Sven Slootweg notifications@github.com wrote:
I guess that will not object to hard-deprecating Buffer(number) sooner that one major release away, if all the other supported Node.js branches receive the two new methods (as backports). But I doubt that anyone else would support that.
Releasing a minor update for existing branches with the new methods would probably be a good idea anyway, for module compatibility reasons. Requiring developers to "detect" which API to use is a great way to ensure that they will just use the old (dangerous) API for as long as they can get away with it instead...
— Reply to this email directly or view it on GitHub https://github.com/nodejs/node/issues/4660#issuecomment-171472775.
Karissa McKelvey http://karissa.github.io/
@karissa Zero-filling new Buffer(number)
by default will actually make the situation worse from the security point of view, taking the current ecosystem into account. See https://github.com/nodejs/node/issues/4660#issuecomment-171262864 for explanation and an alternate approach.
(added a security
label, memory
is about memory leaks)
Btw, I made that old note (that was unfinished before) public: https://github.com/ChALkeR/notes/blob/master/Buffer-knows-everything.md
This makes it sound like Buffer does a best guess. Every acceptable value is documented, and what will happen with that value is also documented.
You're right: it's already documented. But the disclosure is a total of 39 words, amidst a 6000 word document. We can blame users for not reading the docs – but there's no denying it's bad API design to mix safe and unsafe functionality in the same constructor. It's laying a trap for the user.
node is full of operations that allow us to do power OS interactive things with a simplistic API. This is one of the most powerful aspects of node.
It's interesting that you bring up other core APIs.
There are many core APIs that helpfully coerce from String
to Number
, or the reverse. For example, setTimeout(fn, '1000')
becomes setTimeout(fn, 1000)
.
Even buffer itself does coercion: buf[0] = '40'
sets the index to the number 40
, and buf.readInt8('0')
reads out the value from index 0
. Both automatically coerce from String
to Number
.
Can users really be blamed for expecting new Buffer(myVariable)
to be safe?
I also think the existence of Uint8Array
and the other TypedArrayView
types plays a role here.
Buffer
's similarity to these safe typed arrays lulls users into a false sense of security. The fact that Buffer
is now also a subclass of Uint8Array
makes the potential for confusion even greater.
Every new Buffer should allocate new sandboxed memset, fulfilled with zeros depends on it size.
@bricss Switching to zero-filled (by default) Buffer(number)
will most likely make things even worse in the current situation, even from the security point of view. Please read my explanation above.
@ChALkeR Btw, the point you raised is valid and I hadn't considered it. Thanks for bringing it to light. Any solution should take that into account.
@feross What do you think about the proposal at https://github.com/nodejs/node/issues/4660#issuecomment-171262864?
@nodejs/ctc: Distilling the conversation up to this point: the existing behavior of Buffer(number)
is likely not going to change. Given that zero-filling by default would be a significant breaking behavioral change, it would take at least one full semver-major cycle to do and would make Buffer(number)
behavior expectations different across different versions of Node, which will lead to more issues than it would solve. Given that, changing the current behavior of Buffer(number)
is unlikely to happen. Scanning through the thread, I believe there is consensus on this point among the core contributor participants.
There also appears to be consensus among the core contributors that adding a new factory method for creating a "safe" zero-filled Buffer instance is a good thing. There is some disagreement over what to call it. I believe simply calling it Buffer.safe(number)
is sufficient. The implementation of Buffer.safe(number)
would essentially be return new Buffer(n).fill(0)
.
There appears to be a difference of opinion among the core contributors about whether the Buffer(number)
constructor should be deprecated and replaced by a Buffer.unsafe(number)
type of factory method. I can understand the reasoning for this. For my part, I'm in favor of introducing a new Buffer.unsafe(number)
factory method to be symmetrical with Buffer.safe(number)
-- which would actually make it easier to understand. But I get why folks don't want Buffer(number)
deprecated and I can live with that.
There appears to be consensus among the core contributors about having a command line switch that would explicitly change the default behavior of Buffer(number)
to zero-fill by default.
There also appears to be consensus that the documentation can do a better job of spelling out the risks.
The tl/dr; version is:
Buffer(number)
: NoBuffer.safe(number)
: Yes--safe-buffers
(or similarly named): YesIf we went with this now, we would still have the ability to add Buffer.unsafe(number)
and deprecate Buffer(number)
later.
I suggest we go with this approach.
Change or Deprecate Buffer(number): No
This is unacceptable. This API is extremely dangerous and likely already an issue in many applications due to the ease of introducing it (any typed user-supplied input is a problem!). It absolutely needs to be deprecated as soon as possible, at the very least a soft deprecation.
@jasnell Without deprecation of Buffer(number)
, this has no sense. There is already an opt-in (Buffer(number).fill(0)
) for those who care, but users should be guarded from inadvertently creating unitialized Buffers in the long term, so Buffer(number)
must be deprecated or changed. Changing it will break more stuff, so the deprecation must happen.
@ChALkeR ... ok. As I said, I prefer that option also. @trevnorris and @mscdex , if we had symmetrical Buffer.unsafe(number)
and Buffer.safe(number)
with a soft-deprecated Buffer(number)
, is that something you could live with?
tl;dr
This issue proposes:
new Buffer(number)
to return safe, zeroed-out memoryBuffer.alloc(number)
Update: Jan 15, 2016
Upon further consideration, I think that returning zeroed out memory is a separate issue. The core issue is: unsafe buffer allocation should be in a different API.
I now support adding two APIs:
Buffer.from(value)
- convert from any type to a bufferBuffer.alloc(size)
- create an uninitialized buffer with given sizeThis solves the core problem that affected
ws
andbittorrent-dht
which isBuffer(variable)
getting tricked into taking a number argument.Why is
Buffer
unsafe?Today, the node.js
Buffer
constructor is overloaded to handle many different argument types likeString
,Array
,Object
,TypedArrayView
(Uint8Array
, etc.),ArrayBuffer
, and alsoNumber
.The API is optimized for convenience: you can throw any type at it, and it will try to do what you want.
Because the Buffer constructor is so powerful, you often see code like this:
_But what happens if
toHex
is called with aNumber
argument?_Remote Memory Disclosure
If an attacker can make your program call the
Buffer
constructor with aNumber
argument, then they can make it allocate uninitialized memory from the node.js process. This could potentially disclose TLS private keys, user data, or database passwords.When the
Buffer
constructor is passed aNumber
argument, it returns an UNINITIALIZED block of memory of the specifiedsize
. When you create aBuffer
like this, you MUST overwrite the contents before returning it to the user.Would this ever be a problem in real code?
Yes. It's surprisingly common to forget to check the type of your variables in a dynamically-typed language like JavaScript.
Usually the consequences of assuming the wrong type is that your program crashes with an uncaught exception. But the failure mode for forgetting to check the type of arguments to the
Buffer
constructor is more catastrophic.Here's an example of a vulnerable service that takes a JSON payload and converts it to hex:
In this example, an http client just has to send:
and it will get back 1,000 bytes of uninitialized memory from the server.
This is a very serious bug. It's similar in severity to the the Heartbleed bug that allowed disclosure of OpenSSL process memory by remote attackers.
Which real-world packages were vulnerable?
bittorrent-dht
@mafintosh and I found this issue in one of our own packages,
bittorrent-dht
. The bug would allow anyone on the internet to send a series of messages to a user ofbittorrent-dht
and get them to reveal 20 bytes at a time of uninitialized memory from the node.js process.Here's the commit that fixed it. We released a new fixed version, created a Node Security Project disclosure, and deprecated all vulnerable versions on npm so users will get a warning to upgrade to a newer version.
ws
That got us wondering if there were other vulnerable packages. Sure enough, within a short period of time, we found the same issue in
ws
, the most popular WebSocket implementation in node.js.If certain APIs were called with
Number
parameters instead ofString
orBuffer
as expected, then uninitialized server memory would be disclosed to the remote peer.These were the vulnerable methods:
Here's a vulnerable socket server with some echo functionality:
socket.send(number)
called on the server, will disclose server memory.Here's the release where the issue was fixed, with a more detailed explanation. Props to @3rd-Eden for the quick fix. Here's the Node Security Project disclosure.
What's the solution?
It's important that node.js offers a fast way to get memory otherwise performance-critical applications would needlessly get a lot slower.
But we need a better way to signal our intent as programmers. When we want uninitialized memory, we should request it explicitly.
Sensitive functionality should not be packed into a developer-friendly API that loosely accepts many different types. This type of API encourages the lazy practice of passing variables in without checking the type very carefully.
Buffer.alloc(number)
The functionality of creating buffers with uninitialized memory should be part of another API. We propose
Buffer.alloc(number)
. This way, it's not part of an API that frequently gets user input of all sorts of different types passed into it.How do we fix node.js core?
We sent a PR (merged as
semver-major
) which defends against one case:In this situation, it's implied that the programmer intended the first argument to be a string, since they passed an encoding as a second argument. Today, node.js will allocate uninitialized memory in the case of
new Buffer(number, encoding)
, which is probably not what the programmer intended.But this is only a partial solution, since if the programmer does
new Buffer(variable)
(without anencoding
parameter) there's no way to know what they intended. Ifvariable
is sometimes a number, then uninitialized memory will sometimes be returned.What's the real long-term fix?
We could deprecate and remove
new Buffer(number)
and useBuffer.alloc(number)
when we need uninitialized memory. But that would break 1000s of packages. So that's a no-go.Instead, we believe the best solution is to:
new Buffer(number)
to return safe, zeroed-out memoryBuffer.alloc(number)
This way, existing code continues working and the impact on the npm ecosystem will be minimal. Over time, npm maintainers can migrate performance-critical code to use
Buffer.alloc(number)
instead ofnew Buffer(number)
.Conclusion
We think there's a serious design issue with the
Buffer
API as it exists today. It promotes insecure software by putting high-risk functionality into a convenient API with friendly "developer ergonomics".This wasn't merely a theoretical exercise because we found the issue in some of the most popular npm packages.
Eventually, we hope that node.js core can switch to this new, safer behavior. We believe the impact on the ecosystem would be minimal since it's not a breaking change. Well-maintained, popular packages would be updated to use
Buffer.alloc
quickly, while older, insecure packages would magically become safe from this attack vector.