Closed springmeyer closed 7 years ago
Awesome, thanks for this writeup @springmeyer .
Currently the skel does this...
std::string name = *v8::String::Utf8Value(info[0]->ToString());
Does the implementation in your comment above differ in any way re: allocation, since it seems as if Nan::Utf8String utf8_value(info[0]);
is allocating on the stack? But I suppose this is trivial since it's using a pointer to pass into the C++ string instantiation?
Also curious where exactly allocation is happening. At std::string name =
?
My real question is...is Nan::Utf8String utf8_value(info[0]);
allocating memory?
Also trying to figure out why ToString()
is no longer necessary in your example. Because of Nan?
Currently the skel does this...
Sorry, made a copy/paste error. You are right - I've fixed the post above to note what we are currently doing and should avoid.
Also trying to figure out why ToString() is no longer necessary in your example. Because of Nan?
Really insightful! That is exactly the kind of question that will help you be cautious and write secure and robust code. In this case Nan has all the smarts needed internally to validate, in a fast way, that the string is valid, see https://github.com/nodejs/nan/blob/7291008d755f658343bcb61f456cb0a6744a6235/nan.h#L911. So if any of those fail then the length will be 0
.
My real question is...is Nan::Utf8String utf8_value(info[0]); allocating memory?
GREAT question. You are thinking like a performance engineer!!!. So the answer is maybe
. This is why NanUtf8String
is so fast. Because it uses a kind of SSO (discussed at the bottom of https://github.com/mapbox/cpp/blob/master/glossary.md#instantiate). What that means is that it uses stack allocation for short strings and only falls back to dynamic allocation for strings longer than 1024
bytes. You can see this in action at https://github.com/nodejs/nan/blob/7291008d755f658343bcb61f456cb0a6744a6235/nan.h#L915-L918
Sorry, made a copy/paste error. You are right
Ah, wasnt saying you were wrong. It tied into my initial comment which was digging into the comparison between the old way and the Nan way (the strike-through above), but then I altered my qvestion.
The reason I asked about whether or not Nan::Utf8String utf8_value(info[0]);
is allocating memory is because I was pondering if it was possible to bypass this allocation and just set std::string var to some pointer of info[0]...but I'm realizing now that's probably impossible since std::string can only take a value it understands, which is the v8/nan type. And we would want to check length anyway.
but then I altered my qvestion.
👍
The reason I asked about whether or not Nan::Utf8String utf8_value(info[0]); is allocating memory is because I was pondering if it was possible to bypass this allocation and just set std::string var to some pointer of info[0]...but I'm realizing now that's probably impossible since std::string can only take a value it understands, which is the v8/nan type. And we would want to check length anyway.
Right, std::string
needs to own the data it contains. It cannot point to memory managed elsewhere. So I think the only way to create a std::string that bypass's allocation is to create a std::string
by moving another std::string
. That basically moves ownership of the memory from one variable to the next.
While we are thinking about bypassing allocation, another alternative would be to pass the pointer that references a V8 object directly into the threadpool (instead of converting to a std::string). This is the ideal. I've not tried this with Nan::Utf8String
yet but it is commonly done with node::Buffer
objects. Instead of converting the node::Buffer
to a std::string
or some other kind of C++ object to hold the bytes, we pass the buffer object directly to the threadpool and access its internal memory directly, without copies. This is done over at https://github.com/mapnik/node-mapnik/blob/9f0d98c685e87f7aef8df24fb1e3fc1112726f3b/src/mapnik_image.cpp#L3452-L3454. That 1) creates variables to point at the buffer data pointer (without copying anything but the pointer) and 2) marks the JS buffer object as persistent to keep it alive while working with its data in the threadpool.
Nan string PR merged at https://github.com/mapbox/node-cpp-skel/pull/55
@springmeyer Good to close this and carry future improvement convo per your last comment to separate ticket?
Good to close 👍 As far as last convo about moving to a node::Buffer
over std::string
, my feelins that would be a decent lift, so perhaps something to keep in mind, but not critical to ticket.
A common way to get data from a v8::String to pass to C++ is:
While concise (by v8/c++ terms) this can be improved. Instead of
v8::String::Utf8Value
we can useNan::Utf8String
and we should validate the input before trying to create astd::string
by:info[0]
actually is a stringThe
Nan::Utf8String
api both validates strings internally and is fast. It has an optimized conversion routine similar to node core. And it works across all recent node versions.So we should convert all usage of
String::Utf8Value
toNan::Utf8String
. Like theString::Utf8Value
interface, theNan::Utf8String
API exposes is the ability to check the length of the string before accessing in the value: this is a very good idea for safety and robustness. We don't want to risk passing a null string around, which might create unpredictable behavior or a crash like in https://github.com/mapbox/node-fontnik/issues/124.To keep 100% test coverage, after adding lines that check the line, we will need to add a test that tries to pass an empty string, and will need to assert reasonable behavior in this case. So, let's move to code like this:
An additional (albeit minor unless the string is massive) performance win of getting the
len
of the string is that we can pass it to thestd::string
constructor. Doingstd::string string_copy(*utf8_value);
would work, but providing the length allows thestd::string
constructor to avoid calculating the length internally and should be faster since it skips an operation. Thestd::string
class has a bewildering number of different constructors. This one we are using that accepts aconst char *
as the first arg and anint
as the second argument is the(4)
from http://en.cppreference.com/w/cpp/string/basic_string/basic_string./cc @GretaCB