So to get the same behavior without reference structs, you would simply make the reference type be `object`, and then the subobject could itself be a struct:

```js
var Point = new StructType({ x: int32, y: int32 });
var PackedLine = new StructType({ start: Point, end: Point });
var SharedLine = new StructType({ start: object, end: object });
```
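For what it's worth, a hedged sketch of how the two layouts would be expected to differ in behavior (assuming, as elsewhere in this thread, that embedded fields copy their value while `object` fields store references):

```js
// Illustrative only; assumes embedded fields copy on construction,
// while object-typed fields store references.
var p = new Point({ x: 1, y: 2 });
var packed = new PackedLine({ start: p, end: p }); // copies p's fields inline
var shared = new SharedLine({ start: p, end: p }); // stores references to p

p.x = 99;
packed.start.x; // 1: the embedded value was copied at construction
shared.start.x; // 99: the field aliases p
```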
So does this alternate design provide additional performance benefits, because you get richer type information about heap references? In the past I think @andhow said engines could do that automatically, so `object` was sufficient.
As for ergonomics, there's definitely appeal to your proposal, since the defaults match the expected use case. It bothers me a little that the concept of an "embedded struct type" is somewhat confusing, although I think the existing system already confused people with its inconsistency with existing practice.
Would it maybe make sense for inline struct types not to actually be full types, in that they could only be embedded in other types but not themselves constructed or given methods etc.? So

```js
new StructType({ start: Point.embed(), end: Point.embed() })
```

is legal but not

```js
var EmbeddedPoint = Point.embed();
var x = new EmbeddedPoint(); // error: wat are you doing??
```
Also, bikeshed: s/embed/inline/? And another bikeshed: just make it a getter that produces a memoized singleton instead of requiring a function call? I.e.:

```js
var PackedLine = new StructType({ start: Point.inline, end: Point.inline });
```
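A minimal sketch of what that memoized getter might look like internally (the `InlineType` wrapper and the toy `StructType` class here are purely hypothetical, just to show the memoization):

```js
// Hypothetical sketch; InlineType is a marker wrapper, not proposed API.
class InlineType {
  constructor(structType) { this.structType = structType; }
}

class StructType {
  get inline() {
    // Lazily create and cache one inline descriptor per struct type,
    // so `Point.inline === Point.inline` always holds.
    if (!this._inline) this._inline = new InlineType(this);
    return this._inline;
  }
}

const Point = new StructType();
Point.inline === Point.inline; // true
```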
Dave
With some significant extensions to TI (adding a notion of subtyping), I think SM's TI system could do a pretty good job avoiding these guards in JIT code. asm.js, on the other hand, would have major problems, since it would have to handle the case that an unexpected type was assigned to an object field by outside code. In these cases, JIT code is invalidated/bailed, but the whole point of asm.js is not to have to do that (it's worse for codegen and requires more runtime metadata). Other JS engines don't have sound heap summarization like TI (afaik), so they'd do worse (in fact, if all they had was hidden classes, they'd do much worse on polymorphic code).
Another non-AOT optimization advantage of typed references is that, even if heap summarization could (with work) do pretty well on well-typed code, the user may not write well-typed code (or know exactly what qualifies as "well-typed code"). For example, using a single type definition object to create instances in two different contexts which store unrelated types in the same field will lose type precision on the field and thus introduce guards at getprops. With typed references, the programmer who intends to write efficient code has clear rules to follow (with feedback when they make a mistake).
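A hedged illustration of that precision-loss case, assuming object-typed fields rather than typed references (the type and field names are made up for the example):

```js
// Illustrative only: two unrelated struct types stored in the same object-typed field.
const Point = new StructType({ x: int32, y: int32 });
const Size  = new StructType({ w: int32, h: int32 });
const Node  = new StructType({ payload: object });

// One call site always stores Points in `payload`...
const a = new Node({ payload: new Point({ x: 1, y: 2 }) });
// ...another stores Sizes. The inferred type of `payload` becomes the union of
// the two, so JIT code for `node.payload.x` now needs a guard (or falls back).
const b = new Node({ payload: new Size({ w: 3, h: 4 }) });
```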
Ref-as-default and inline-as-default both make sense depending on your use case.
If your primary use case is C-style structs ('struct' in C#, etc) you want inline-as-default because it's the most obvious layout and it has better performance characteristics. If your primary use case is Java/C#-style heap objects ('class' in C#), then you want member values to be ref, MOST of the time. In C# it's still possible that you might want to have a 'class' that only has inline 'struct' members instead of having all its members be heap references.
I do think that ref-as-default is probably less error prone for the average neophyte programmer. Having foo.x appear to be a typed object but actually be an alias is probably a surprising behavior.
For inline members, like `Line.x` in your example, is there a trivial way to 'box' them onto the heap? That's the mechanism typically used in C# to make it easy to extract an inline structure onto the GC heap. In most cases it is done automatically:

```csharp
Line l = ...;
Point p = l.x; // Inline, stored directly on the stack. Not a reference.
fixed (Line* pL = &l) {
    Point* pP = &(pL->x); // Reference
}
object oP = l.x; // Boxed - storing a value into a local of type 'object' boxes it
p = (Point)oP;   // Unbox
```
Typed objects could expose a method called 'box' or something that automatically clones them into a heap instance if they are currently pointers into a buffer. If they're already a heap instance it could be a no-op. This would make it trivially safe to pass a typed object across a function boundary, or store it into a field, by calling `.box()` on all the values. Without a mechanism like that I think you need global knowledge of your code to know whether a typed object member is safe to pass around?
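A hedged sketch of how that might read on the JS side (`.box()` is hypothetical, and `Line`/`Point` are the types from the original post):

```js
// Illustrative only: .box() is a hypothetical method, not proposed spec text.
const line = new Line({ x: { x: 0, y: 0 }, y: { x: 3, y: 4 } });

const view  = line.x;        // an alias: a pointer into line's backing buffer
const boxed = line.x.box();  // hypothetical: clone the value into a standalone heap instance

line.x.x = 10;
view.x;   // 10: still aliases the buffer
boxed.x;  // 0: a detached copy, safe to store or pass around
```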
So, @tschneidereit and I worked through this in some detail recently. Our conclusion was that things ultimately worked more smoothly if we retained the "embedded by default" style of the specification. However, it's also important to be able to have first-class, typed references, so we want to have every type descriptor offer a `ref` variant that gives you a typed reference. (For example, if `Point` is a `StructType` instance, then `Point.ref` would be a type for references to `Point` values, versus embedded values.) (This suggests, as an aside, that we should perhaps rename the type for object references to `ref`, for consistency.) Therefore, I'm inclined to close this issue and leave things as they are.
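Concretely, a hedged sketch of how that `ref` variant might read (assuming the naming above; this is not final spec text):

```js
// Illustrative only.
const Point = new StructType({ x: int32, y: int32 });

// Embedded by default: start/end are laid out inline in the struct's storage.
const PackedLine = new StructType({ start: Point, end: Point });

// Typed references: start/end hold references to separately allocated Points.
const SharedLine = new StructType({ start: Point.ref, end: Point.ref });
```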
Note that introducing something like `Point.ref` will require us to be able to make "incomplete" struct types that can only be used by reference (the equivalent of `struct Foo;` in C), so that we can support recursive types.
@nikomatsakis I've been trying to come up with a way of dealing with incomplete structs for the lib I'm working on. It's tricky to do it in a safe and fast way using the existing proposals here, so I've started using something similar to `Promise`'s revealing constructor pattern.
If the struct does not contain recursive references, fields can be defined with an object as normal:
```js
const Point = new StructType('Point', {x: int32, y: int32});
```
If recursive references are required, we pass in a function instead of an object:
```js
const TreeNode = new StructType('TreeNode', TreeNode => {
  return {
    value: int32,
    left: TreeNode,
    right: TreeNode
  };
});
```
This approach also allows mutually recursive references:
```js
let User, Role;
User = new StructType('User', User => {
  Role = new StructType('Role', {
    name: string,
    users: User.vector()
  });
  return {
    name: string,
    roles: Role.vector()
  };
});
```
But generally speaking I think this is nicer if it's reference-by-default; that's how JS normally behaves, and that's what users will expect. Otherwise, if people don't know what they're doing, this has weird effects:
```js
const roles = new Role.Vector([
  {name: 'Admin'},
  {name: 'Guest'}
]);
const alice = new User({name: 'Alice', roles: [roles[0]]});
roles[0].name = 'Administrator';
alice.roles[0].name === roles[0].name; // false
```
I think users should have to opt in for that.
@phpnode what I had always assumed we would do is create "incomplete" struct types:
```js
const Tree = new StructType();
```
The only thing you can legally do with a `Tree` in this state is create a reference type via `Tree.ref`. Then at some point you can call `fulfill`. So, for example, to make a binary tree type, I might fulfill the fields like so (note that `Tree.ref` here executes before `fulfill` is called):
```js
Tree.fulfill({data: Any, left: Tree.ref, right: Tree.ref})
```
@nikomatsakis the issue (for me, in my implementation at least) is that deciding whether or not a struct is finalized/fulfilled in this way seems pretty expensive / convoluted. An unfulfilled struct would poison anything which depends on it, and that dependency tree can be pretty deep. How do I efficiently decide when a particular struct type can be instantiated?
@phpnode an unfulfilled struct does not, I think, have to poison anything that uses it. You can't instantiate (or embed) the struct until it is fulfilled. You can have types that use `Foo.ref`, but the only valid value they would be able to supply is `null`.
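A hedged sketch of that behavior, using the incomplete-type and `fulfill` API sketched above (names are illustrative):

```js
// Illustrative only; assumes the incomplete-type/fulfill API described in this thread.
const Foo = new StructType();                     // incomplete: declares the name only
const Holder = new StructType({ foo: Foo.ref });  // legal: only uses the reference type

const h = new Holder({ foo: null });  // fine: null is the only valid value so far
// new Foo();                         // would throw: Foo has not been fulfilled yet

Foo.fulfill({ value: int32 });
h.foo = new Foo({ value: 42 });       // now real Foo instances can be stored
```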
> If recursive references are required, we pass in a function instead of an object:
>
> ```js
> const TreeNode = new StructType('TreeNode', TreeNode => {
>   return { value: int32, left: TreeNode, right: TreeNode };
> });
> ```
I don't think this self-as-argument function approach covers all the cases that `Type.ref` does, though. I certainly wouldn't be able to use it to solve my problem. Some definition cycles span 3 or more types, and you need a way to get at all of the types in advance, not just self. Forward declaration is the way you solve that.
Forward declaration also simplifies multiple-phase type initialization, which is something you'll see in runtimes: declare all the names first, then define all their shapes, in single dumb passes. Using the self-as-argument function approach would require you to painstakingly construct dependency graphs to figure out what order to initialize the types in.
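For example, a hedged sketch of a cycle spanning three types using the forward-declaration/`fulfill` style described above (type names are made up):

```js
// Illustrative only: declare all the names first...
const A = new StructType();
const B = new StructType();
const C = new StructType();

// ...then define all their shapes in a second dumb pass.
A.fulfill({ next: B.ref });
B.fulfill({ next: C.ref });
C.fulfill({ next: A.ref });
```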
At the point where you're creating an instance of a type all the types it interacts with have been initialized, so I don't think this implies any runtime overhead. It's not 'expensive', and arguably it's not convoluted either, for either API consumer or API implementer. It's just splitting the task of initialization into pieces.
ok, thanks for your comments, I'm just going to try and implement it in the way you're suggesting here and report back if I run into issues.
If people use Typed Objects as "better JS objects", then most non-primitive fields will be references, not embedded sub-objects. Thus, we should consider making reference be the default, that is:

```js
var A = new StructType(...);
var B = new StructType({a: A});
var a = new A(...);
var b = new B({a: a});
assert(b.a === a);
```

and, to get an embedded subobject, you'd derive an "embedded" type definition:

```js
var Point = new StructType({x: int32, y: int32});
var Line = new StructType({x: Point.embed(), y: Point.embed()});
```
One consequence of this change is that we'd need to overload a type definition's constructor based on whether it was called with 'new' or not. That is, `Point(p)` would either return `p` (if `p` was a Point or extended Point) or throw, and `new Point({x: 1, y: 2})` would construct a new Point object. In some sense this is similar to other primitive constructors like String and Number.
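A minimal sketch of that call/construct split in plain JS, using `new.target` to tell the two forms apart (purely illustrative, not how the spec would define it):

```js
// Illustrative only: a hand-rolled Point standing in for a StructType-generated constructor.
function Point(value) {
  if (new.target === undefined) {
    // Point(p): behave like a checked cast, in the spirit of String(s) or Number(n).
    if (value instanceof Point) return value;
    throw new TypeError('expected a Point');
  }
  // new Point({x, y}): construct a fresh instance.
  this.x = value.x | 0;
  this.y = value.y | 0;
}

const p = new Point({x: 1, y: 2});
Point(p) === p; // true: the same instance is returned
// Point({x: 1, y: 2}); // would throw: a plain object is not a Point
```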