Magic also for pointer type

andoma commented 10 years ago

It would be immensely useful to have some kind of 'magic' value paired with pointer values so that native code can check that the pointer passed in is of the expected type (or even act differently depending on which internal object is being pointed to).

I'm not sure if this can be made to fit within duk_tval or not.

The other option is of course that all native objects exposed to duktape start with some kind of class or vtable pointer so the native code can figure out what's what. I'm fine with this option as well but before starting to implement that I would just want to know if the pointer magic is something you've considered implementing.
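
As a rough illustration of the "class pointer at the start of every native object" option, here is a minimal plain C sketch. All names (`native_header_t`, `native_check`, the type tags) are hypothetical and not part of Duktape; the check only works if every pointer handed to script code really does point at a struct that begins with the header.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type tags, one per kind of exposed native object. */
typedef enum { NATIVE_TYPE_FILE = 1, NATIVE_TYPE_SOCKET = 2 } native_type_t;

/* Every native struct exposed to script code starts with this header. */
typedef struct { native_type_t type; } native_header_t;

typedef struct { native_header_t hdr; int fd; } native_file_t;
typedef struct { native_header_t hdr; int port; } native_socket_t;

/* Returns ptr if it carries the expected type tag, NULL otherwise.
 * A real binding would throw an error instead of returning NULL. */
static void *native_check(void *ptr, native_type_t expected) {
    native_header_t *hdr = (native_header_t *) ptr;
    if (hdr == NULL || hdr->type != expected) {
        return NULL;
    }
    return ptr;
}
```

A binding function would call `native_check()` on the raw pointer it receives before casting it to `native_file_t *` or `native_socket_t *`.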

svaarala commented 10 years ago

The magic value for functions is stored in the heap structure of a native function (duk_hnativefunction to be specific), so it's a bit different in that respect.

But I can't see why one wouldn't be able to embed a 16-bit magic value into a pointer. For 32-bit environments there are 16 spare bits for pointers in the packed duk_tval representation. For 64-bit environments duk_tval is unpacked and usually aligns to 16 bytes, so there is plenty of space there.
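
To make the "16 spare bits" idea concrete, here is a small sketch of an 8-byte packed value holding a 16-bit tag, a 16-bit magic, and a 32-bit pointer. The field layout, the tag value, and all function names are assumptions for illustration only; this is not Duktape's actual duk_tval encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tag marking a packed value as a pointer. */
#define SKETCH_TAG_POINTER 0xfff1u

/* 8-byte packed value: [16-bit tag][16-bit magic][32-bit pointer]. */
typedef uint64_t packed_val_t;

static packed_val_t pack_pointer(uint32_t ptr32, uint16_t magic) {
    return ((uint64_t) SKETCH_TAG_POINTER << 48) |
           ((uint64_t) magic << 32) |
           (uint64_t) ptr32;
}

static int is_pointer(packed_val_t v) {
    return (v >> 48) == SKETCH_TAG_POINTER;
}

static uint16_t unpack_magic(packed_val_t v) {
    return (uint16_t) ((v >> 32) & 0xffffu);
}

static uint32_t unpack_pointer(packed_val_t v) {
    return (uint32_t) (v & 0xffffffffu);
}
```

Because the magic lives inside the value itself, a plain copy of the packed value carries the magic along with the pointer.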

The changes required to do this would be mostly API changes so that you could create a pointer with a magic value. Pointer assignments etc would copy the magic over automatically. Serialization of pointers would currently lose the magic value, unless it is also added to the JX/JC custom serialization formats.

Just thinking quickly I can see why this would be a common concern so it's certainly worth thinking over. Even if the pointer magic is lost in some situations, calling code which manages its own pointers could quite easily ensure that doesn't happen for the datatypes it manages.

andoma commented 10 years ago

Just some feedback on this.

I realized that having a "magic" or "type" for the pointer only covers half the story for me. I also need to know when the value is no longer reachable from duktape. Thus I need a finalizer as well.

So instead I suppose it makes more sense to create an object with a property that's just a pointer and then a finalizer on the object.

What I've come up with now is to use the C native finalizer's magic to identify the type of the pointer stored in the object.

This makes these "native objects" resistant to accidental deep copies (which would otherwise create two pointers in the Ecmascript world with only one refcount in the native code). Indeed, the deep-copied object will contain the property, but without a matching finalizer all native code that accepts this type of "native" object will just throw.

In addition to this the objects could be frozen to avoid anyone swapping around the pointer values and thus causing a crash or something worse.
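
The check described above can be modelled in plain C without any Duktape calls. This is only a sketch of the logic: `model_object_t` and the helper names are hypothetical, the magic is stored next to the finalizer for simplicity (in Duktape it would live on the native finalizer function), and `model_unwrap()` returns NULL where a real binding would throw.

```c
#include <assert.h>
#include <stddef.h>

typedef struct model_object model_object_t;
typedef void (*model_finalizer_t)(model_object_t *obj);

struct model_object {
    void *ptr;                    /* the "pointer property" */
    model_finalizer_t finalizer;  /* NULL when no finalizer is set */
    int finalizer_magic;          /* type tag carried by the finalizer */
};

enum { MAGIC_FILE = 1 };  /* hypothetical type tag */

static void file_finalizer(model_object_t *obj) {
    obj->ptr = NULL;  /* a real finalizer would release the native resource */
}

/* Deep copy as script code would do it: properties only, no finalizer. */
static model_object_t model_deep_copy(const model_object_t *src) {
    model_object_t copy = { src->ptr, NULL, 0 };
    return copy;
}

/* Returns the wrapped pointer only if finalizer and magic both match. */
static void *model_unwrap(const model_object_t *obj,
                          model_finalizer_t expected_fin, int expected_magic) {
    if (obj->finalizer != expected_fin ||
        obj->finalizer_magic != expected_magic) {
        return NULL;
    }
    return obj->ptr;
}
```

The deep copy still holds the raw pointer, but since it has no finalizer it never passes the check, so native code never acts on the duplicate.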

sva-p commented 10 years ago

Hmm, an interesting approach, I'll need to think about it a bit :-)

Another thing you could do is to use an internal property name for storing the pointer value. These properties are not enumerable and won't even be returned when iterating non-enumerable properties from normal Javascript code - e.g. Object.getOwnPropertyNames() won't return them.

You can create internal properties by using a property name whose C representation begins with an FF byte (e.g. "\xFF" "ptr"; careful with C's hex escape interpretation). See https://github.com/svaarala/duktape/blob/master/website/guide/internalproperties.html for more.
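
To spell out the hex escape caveat: a C hex escape consumes as many hex digits as follow it, so the key is best written as two adjacent string literals. The key name `data` and the helper `is_internal_key` below are made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* Adjacent literals are concatenated after escapes are processed, so the
 * hex escape ends at the closing quote of the first literal. Writing
 * "\xFFdata" as one literal would make the compiler read \xFFda as a
 * single out-of-range escape, because 'd' and 'a' are hex digits. */
static const char INTERNAL_KEY_DATA[] = "\xFF" "data";

/* True if the key's first byte is 0xFF, i.e. it names an internal property. */
static int is_internal_key(const char *key) {
    return (unsigned char) key[0] == 0xFF;
}
```

A name such as `ptr` happens to be safe either way because `p` is not a hex digit, but splitting the literal avoids having to think about it.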

It's still possible to access these properties from Ecmascript code if you create the property name through a buffer-to-string conversion for instance. But they should be pretty difficult to access accidentally.

andoma commented 10 years ago

Yeah, I've thought a bit about hiding them with the \xFF but wasn't sure if I would accidentally conflict with duktape's own hidden properties (perhaps in the future). Maybe I can namespace them with something...

svaarala commented 10 years ago

That's certainly an issue - for now I've been suggesting to prefix user internal properties with two \xFF bytes. They should never then conflict with Duktape's internal properties. (Perhaps this convention should be reversed :-).
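
A small macro can keep the two-byte prefix convention in one place. `USER_INTERNAL_KEY`, the `mylib.` namespace, and `is_user_internal_key` are hypothetical names used only for this sketch.

```c
#include <assert.h>
#include <string.h>

/* Prefix user-defined internal keys with two 0xFF bytes. The name must be
 * a string literal so the pieces are concatenated at compile time. */
#define USER_INTERNAL_KEY(name) "\xFF\xFF" name

/* Example key, additionally namespaced with a library-specific prefix. */
static const char MYLIB_PTR_KEY[] = USER_INTERNAL_KEY("mylib." "ptr");

/* True if the key starts with two 0xFF bytes. */
static int is_user_internal_key(const char *key) {
    return (unsigned char) key[0] == 0xFF && (unsigned char) key[1] == 0xFF;
}
```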

Somewhat related if you run untrusted code: #44.

creationix commented 10 years ago

How do you set internal properties on pointer types? I'm getting "invalid base value" errors.

I'm also fine with using the buffer type if that's preferred, I can handle the memory either way.

In Lua I use full userdata to wrap the libuv structs because they can be given a metatable on the C side to tag the struct, which avoids segfaults caused by users passing in the wrong type.

svaarala commented 10 years ago

As some background:

At the moment Duktape's plain pointer and buffer types cannot hold any properties, so a finalizer setup involves a wrapper object which holds the plain pointer/buffer in some key (if this key should be hidden from Ecmascript code, it can be stored in an "internal property"). Changing this would be possible but always involves a compromise between functionality and memory footprint.

Pointers are basically just tagged void * values without a heap allocation, so adding e.g. a finalizer reference would not work directly. However, there is space for 16 flag bits which could be used as some sort of a "magic" value for pointer validation (or whatever user code wishes to use it for, it could be a struct size just as well). A pointer value has no reference count so there is no way to tell that the last reference to that specific pointer value was removed.

For buffers, adding additional fields like a finalizer or prototype reference would be manageable as they are heap allocated; however, adding a field for a finalizer or some other reference might not be a good compromise for low memory systems. As for plain pointers, there is space for 16 flag bits for buffers too. Buffers have a reference count so it would be possible to trigger a fixed finalizer, or perhaps a finalizer chosen by some of the flag bits. This would provide limited finalizer support for plain buffer values with no additional footprint impact, but the mechanism would be somewhat unmodular.

(There are also full object Buffer and Pointer values (similar to String vs. string difference). They hold the plain buffer/pointer reference internally, but can hold additional properties, finalizers, etc.)

creationix commented 10 years ago

For me, what would be ideal is to use the 16 bits to allow 2^16 different kinds of pointer types globally among all bindings. I don't want to hide my pointers as properties in objects because that makes everything much more complex and uses more memory. I don't need finalizers but I understand others do.

As far as modularity goes, we can just say that users don't choose their 16-bit magic directly but rather register a new type using the ref system or something like it, so they can associate arbitrary data with a class of pointers or buffers. As long as there aren't 2^16 different types of pointers in total, many libraries can coexist without trampling on each other.
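
A registration scheme like this could look roughly as follows. This is a sketch under my own assumptions: the function names are hypothetical, magic 0 is reserved here to mean "untyped", and the table is a plain static array rather than anything tied to Duktape's stash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POINTER_TYPE_LIMIT 65536u  /* 2^16 possible magic values */

/* Maps each allocated magic value to the data registered for it. */
static const void *pointer_type_info[POINTER_TYPE_LIMIT];
static uint32_t pointer_type_count = 1;  /* magic 0 is reserved: untyped */

/* Allocates the next free magic value and associates type_info with it.
 * Returns 0 if all magic values are taken. */
static uint16_t register_pointer_type(const void *type_info) {
    if (pointer_type_count >= POINTER_TYPE_LIMIT) {
        return 0;
    }
    pointer_type_info[pointer_type_count] = type_info;
    return (uint16_t) pointer_type_count++;
}

/* Returns the data registered for a magic value, or NULL if none. */
static const void *lookup_pointer_type(uint16_t magic) {
    return pointer_type_info[magic];
}
```

Each library would call `register_pointer_type()` once per struct kind at startup and remember the returned magic, so no two libraries have to agree on numbers in advance.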

Personally, I just need to store a string constant or void* opaque value so that I know which struct the void* pointer holds.

If you allow the 16-bit magic to work with pointers, such a system could be done outside duktape, much like my ref system uses the stash system.

svaarala / duktape

Magic also for pointer type #37