gracelang / language

Design of the Grace language and its libraries
GNU General Public License v2.0

What are primitive objects and methods, and how do you get to them? #152

Open kjx opened 6 years ago

kjx commented 6 years ago

The spec defines things like Number, String, etc., but doesn't say where they come from or how you can extend their code, etc. In particular, the spec leaves open the question as to whether you can inherit from primitive types.

What are some design options:

I like the NS model, but talking to @apblack, I realised it is asymmetric - it moves the magic behaviour out of the NS objects, but leaves the magic 'data' there. This certainly suits e.g. tagged VMs, where integers or floats are represented by some subset of bit patterns that are invalid as pointers.

The cleanest model I can think of is to go the 'full William Cook'. The VM provides an (interlinked) set of ADTs - not objects. Individual instances of these ADTs offer no Grace methods, not even equality. To Grace code, they are opaque 'magic cookies' that can be stored in Grace object fields, passed as arguments, etc. Perhaps there are branded types that can distinguish the ADTs to which they belong, but nothing more. Behaviour for these ADTs is provided by a vm object (perhaps a set of singleton VM objects, one per ADT) that has methods that take and return those magic cookies. Then all the Grace-level behaviour for primitives is written as vanilla Grace code that manipulates the magic cookies. There's no question, e.g., about inheriting from magic cookies; you just can't. In this design, the integer class would be 100% Grace and would look like this:

import "vmIntegerADT" as vmIntegerADT
 class integer(cookie' : VmInteger) -> GraceInteger  { // construction is normal
   def cookie is public = cookie' 
   method +(other : GraceInteger) -> GraceInteger { integer( vmIntegerADT.add(cookie, other.cookie) ) }
}

This gives a clear interface to the VM ADTs, a cleaner data model (objects are either all primitive or all Grace, but nothing in between), and separates the language specification (the interface of the Grace Integer or Number classes) from the VM interface of the vmIntegerADT module object.

You'd probably want to do stuff with branded types, both on GraceInteger and VmIntegers. The cookie def could be handled specially in most VMs (perhaps the branded type is enough; perhaps you'd need an annotation too).

I fear there is a subtle problem: because the cookie must be publicly exposed to Grace code, it may offer a route to an encapsulation breach if e.g. Grace code can even just do type tests. I guess that problem can be addressed by putting the brands and types into the vmModule too - so you really can only manipulate them if somehow you can get to code in those VM modules. In a typical Grace system, only the module implementing the Grace-level number class would import that module.

In terms of e.g. adding ownership or provenance or taint to Grace numbers or other primitives, this design resolves it nicely: those features could all be done in Grace code, and the interface to the underlying primitives wouldn't need to be modified. I fear you'd need brands (or ownership) at the Grace level to make everything secure, though (because the core interface for non-receiver method arguments is just type { cookie -> VmInteger }, which will admit a multitude of sins).
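
To make the concern concrete, here is a rough sketch of the structural type that non-receiver arguments would be checked against (names follow the class above; the brand machinery that would plug the hole is omitted):

// purely structural: without brands, any object that happens to offer a
// matching cookie (and +) method satisfies this type
type GraceInteger = interface {
    cookie -> VmInteger
    +(other : GraceInteger) -> GraceInteger
}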

Thinking for a minute about implementation, though, this model need not necessarily be slower or more memory-intensive than one based on primitive parts and inheritance. This is because in non-boxed VMs - e.g. running Grace on top of JavaScript, or Java/Truffle/Graal - there's no way to inherit from primitives: you'll have to have an implementation field in the representation of the Grace object that holds the JS or Java VM integer (or string or whatever) anyway. Given branded types, the cookie field could be implemented as precisely such a primitive field. Reading or writing that field would have to box and/or unbox the primitives to make the opaque Grace cookies, but everything at that level would be functional, and if you can just inline the ADT code into the Grace code, the overhead should go away.

apblack commented 6 years ago

I think that @kjx is making this more complicated than it needs to be.

We can make classes number and string intrinsic, and the meaning of a number and a string literal be given by methods in the current dialect. In the common case, these dialect methods would just re-export the intrinsic number and string classes; the compiler would notice this, and avoid any additional overhead. (Recall that the code of the dialect is known at compile time.)
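
As a sketch, assuming the compiler requests a hook method in the dialect for each kind of literal (the hook names and the argument type are exactly the open questions discussed below), the common case might be nothing more than:

import "intrinsic" as intrinsic

// pass literals straight through to the intrinsic classes; a compiler that
// recognises this pattern can avoid any extra overhead
method number(literalString) { intrinsic.number(literalString) }
method string(literalString) { intrinsic.string(literalString) }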

How the intrinsics are defined is not part of the language spec. What the intrinsics do is part of the spec, but their implementation should be left to the language implementor. The implementation of the intrinsics will obviously differ from one implementation to another. In minigrace I've implemented the native (language) code (implementation) method, which is essentially like asm in C. When writing the implementation string, one needs to know all of the compiler's implementation secrets. I don't think that this is bad, provided that we use this sort of non-portable code only in a limited number of places.

I really don't like appealing to ADTs (aren't objects good enough?), nor understand why doing so is any better than writing the intrinsic module using objects directly. Something like:

class number(literalString) -> Number  {
    def value is public = numberfromString(literalString)  
    method +(other : Number) -> Number { other.reversePlusIntrinsicNumber(self) }
    method reversePlusIntrinsicNumber(other) { 
        // here we know that self and other are both intrinsicNumbers
        native "st" code ‹^ (other + self) asGraceNumber›
    }
    ...
}

In fact, I don't see how the "ADT model" allows for bigNums or the Integer/Rational/Float hierarchy that we have talked about in the past. It seems to limit Grace to providing whatever cookies implement. Maybe I don't understand what it is really doing?

apblack commented 6 years ago

the spec leaves open the question as to whether you can inherit from primitive types.

I presume that what you meant to say is that the spec leaves open the question as to whether you can reuse built-in objects like booleans, numbers and strings.

I don't agree with this. The spec says (or at least used to say) we can reuse the booleans. Why would the numbers and strings be any different?

kjx commented 6 years ago

I think that @kjx is making this more complicated than it needs to be.

certainly!

We can make classes number and string intrinsic,

sure. They have to be intrinsic some way or other. The question is whether we want those kinds of magic classes or not. It's probably simpler if we do...

and the meaning of a number and a string literal be given by methods in the current dialect.

or rather, methods in the dialect can return the dialect-specific versions of integers etc. The catch is: what do those methods accept as arguments? The answer is: the intrinsic objects (or cookies).

makeString(s : IntrinsicString) -> GraceString
makeNumber(s : IntrinsicNumber) -> GraceNumber
   // should this take IntrinsicString?   look at GHC.
   // if so, we'd still need some way to go from an IntrinsicString to an IntrinsicNumber

classes; the compiler would notice this, and avoid any additional overhead.

well in most cases those methods would just be identity. Like you said elsewhere, the question is where the behaviour goes. In the objects, or elsewhere...

(Recall that the code of the dialect is known at compile time.)

Hmm. Does that mean calls to things like makeString are magically bound only to dialects? Or do we generate calls like myDialect.makeString "foo"? Why not just use implicit/internal requests (which leads to a bunch of interesting options)?

kjx commented 6 years ago

I really don't like appealing to ADTs (aren't objects good enough?),

not according to William Cook - those intrinsics really are ADTs, not objects.

I don't see how the "ADT model" allows for bigNums or the Integer/Rational/Float hierarchy

Of course you can - in some sense, that's the whole point. The cookies reify platform primitive intrinsic data - that's all. In the ADT model, all interactions with platform primitive intrinsic data are mediated by Grace code. You can write literally whatever Grace code you want: you interact with platform primitive intrinsic data via the explicit interfaces provided by the ADTs. The model is pretty much exactly the same as the way the "platform/memory" interface works (low-level methods to allocate primitive "arrays", and to read and write them, but no other protocol); they're only effective when wrapped in a Grace object. The twist is that, ideally, we'd want the compiler to inline at least one of those primitive fields when possible. That's a bit harder to do for arrays, which are mutable and shareable (but not impossible); it's easier for machine floats and integers, which are values.
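
For comparison, a sketch of that kind of wrapping, with a made-up vmArrayADT module that offers nothing beyond allocate, read and write:

import "vmArrayADT" as vmArrayADT

// a plain Grace object wrapping an opaque primitive-array cookie; all the
// rest of the list protocol would be ordinary Grace code layered on top
class primitiveList(size' : Number) {
    def cookie = vmArrayADT.allocate(size')
    def size is public = size'
    method at(index : Number) { vmArrayADT.read(cookie, index) }
    method at(index : Number) put(value) { vmArrayADT.write(cookie, index, value) }
}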

nor understand why doing so is any better than writing the intrinsic module using objects directly.

Using intrinsic objects directly could be fine --- my cookies could just be your intrinsics. The catch, though, is that either the intrinsics must have both primitive parts and object parts, or we can't extend them (normally) by writing Grace code. The temptation will always be there to use the intrinsics directly. The ADT interface should be horrible enough to avoid that temptation --- while still permitting the cookies to be implemented directly as unboxed machine values --- on JS or the JVM, cookie numbers would be actual host VM numbers...

kjx commented 6 years ago

I presume that what you meant to say is that the spec leaves open the question as to whether you can reuse built-in objects like booleans, numbers and strings.

OK so I shouldn't have said types - but on the other hand, I don't particularly like reuse meaning only inheritance but not composition.

The spec says (or at least used to say) we can reuse the booleans.

It doesn't now: true and false are constants or literals. They're not generative (that's what #154 is about).

Why would the numbers and strings be any different?

Because they're literals, not generative, and because (unless you want to write one literal class for every Number or String value) I can't see how the superclass could be manifest? It might work if the spec also said that there were special number and string classes that you could inherit from; but that again comes down to making an explicit decision that those classes are heritable. The cookie design avoids this because it's clear you can't inherit from the cookies - why would you want to anyway - but all the Grace behaviour is in plain Grace code. The intrinsic design could be used in just the same way - but the temptation will always be there to go to the intrinsics directly.
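
For instance, if the spec did name such a heritable class (call it number here, purely hypothetically), the reuse in question would look something like:

class taintedNumber(n : Number, taint') {
    inherit number(n)    // only meaningful if the spec declares number heritable
    def taint is public = taint'
}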

kjx commented 6 years ago

Why would the numbers and strings be any different?

because it's conceivable to implement Church Booleans efficiently, but not Church numerals?

apblack commented 6 years ago

In "Grace's Inheritace" we write:

This design also makes it possible to use the object true as a trait; this works because true in Grace, as in Smalltalk, has no state — just methods like

method or(another:Block) { self }
method and(another:Block) { another.apply }
method ifTrue(trueBlock:Block) ifFalse(falseBlock:Block) { trueBlock.apply }

Such a use is illustrated in the following example, which is motivated by Homer et al.’s design for object-oriented pattern-matching [HNB+12].

class successfulMatch(result′, bindings′) {
    use true
    def result is public = result′
    def bindings is public = bindings′
    method asString { "SuccessfulMatch(result = {result}, bindings = {bindings})" }
}

This of course implies that true is generative — which does no harm, because == on true tests abstract equality, not object identity. The same can be true of Strings and Numbers: there is no reason to assume that "Hello" is a unique object, rather than a newly generated object. (In minigrace, strings and numbers are generative — something that I've tried eliminating for efficiency reasons, but gave up on because caching and reusing the objects turned out to be slower than generating new ones.)

This line of reasoning makes me even more unhappy with our freshness constraint on inheritance — it insists that the programmer ensure something that should be an invisible implementation detail.

kjx commented 6 years ago

This design also makes it possible to use the object true as a trait

we might have written that, but perhaps we shouldn't have. There were designs where traits were actual objects - that's not the current design. The spec seems pretty clear.

This line of reasoning makes me even more unhappy with our freshness constraint on inheritance

I think removing the freshness constraint is incompatible with retaining pseudo-Javarian inheritance semantics. As Michael would say: objects, classes, pick one. The current design pretty much picks classes, and then the Black Equalities #134 let us pretend we're doing something closer to objects.

Interpreted - ideally in a language with dynamic variables (or implicits) available - I don't think the current semantics come out as too bad - or rather, the freshness part of them isn't too bad. My dynamic semantics (#136) has 64/2000 lines dealing with the "creation" dynamic/implicit parameter --- but at least half of those are just wiring the thing through. Either generic parameters or a more general mechanism for dynamic flags in the interpreter could streamline this, I think, so that fewer than 20 lines would actually manage the freshness tracking.

apblack commented 6 years ago

The point that I was making is not that the freshness constraint is hard to implement, but that it wires into the semantics an implementation issue that should be invisible. Whether an object that has only abstract identity (what @kjx calls a value object in #154) is cached or freshly generated ought to be a secret of its implementation.

I think removing the freshness constraint is incompatible with retaining pseudo-Javarian inheritance semantics.

I may be mis-remembering, but I think that's not quite right. We invented the freshness constraint to avoid having copy built into the language semantics. In other words, we pushed the responsibility of creating the new object onto the programmer, because it meant that the language did not have to define what copy meant. I think that it would be possible (but perhaps strange) to retain the pseudo-Javarian initialization semantics and nevertheless let the implementation do the copy — we would just initialize again after the copy.

I see two advantages of doing it this way:

  1. The language-defined copy need not actually make a copy, if it can cheat and not be caught.
  2. It exposes our initialization after inheritance rules as being bizarre.

kjx commented 6 years ago

It exposes our initialization after inheritance rules as being bizarre.

Well, in that case we should take them out and shoot them. !!TRUMP!! Actually, I was saying the opposite: I think that with dynamic variable support in the interpreter / semantics, the freshness rules don't look too bad at all. That doesn't mean there aren't things we should take out and shoot though (I HAS LIST, as I suspect we all do).

we would just initialize again after the copy.

so at that point, aren't we just reinventing the freshness semantics? To put it another way, we find the class of the object from which we're inheriting, and then instantiate that class. This seems a bit much, just to be able to inherit from

def x = object { 
  method a { ... } 
  def b = ...
  var c = ...
}

as well as

trait x { 
  method a { ... } 
  def b = ...
  var c = ...
}

Whether an object that has only abstract identity (what @kjx calls a value object in #154) is cached or freshly generated ought to be a secret of its implementation

well sure, but inheritance is an implementation relationship - superclasses don't have any private secrets kept from subclasses. Perhaps this argument works better the other way around: when a trait generates value objects, which probably means it captures no mutable state, there's no difference between an instance of the trait and a singleton object. Now there's something the implementation can elide; but once the trait is a class, once the object has imperative implementation code, then the distinction is observable, whatever the equality semantics.

kjx commented 6 years ago

I think that it would be possible (but perhaps strange) to retain the pseudo-Javarian initialization semantics and nevertheless let the implementation do the copy — we would just initialize again after the copy.

thinking this over (when I should have been doing something else), it seems the catch with the copy is that, given Grace object initialisation is side-effectful, programs can (and likely will) observe the copy. Notably, if we extend the copy semantics straightforwardly to inheriting from a method whose tail returns an object constructor, then the initialisation code in that object constructor will be run twice: once to actually run the object constructor, in the context of the object described by that constructor, and then once again in the context of the copy - the actual final object.

It's worse than that - I fear every parental part-object is a copy, and that copying initialises that part again. So doesn't that go to O(N^2) in the very worst case?

apblack commented 6 years ago

This issue is way too abstract for me. The primitive objects are mirrors, exceptions, graceObject, done, numbers, strings and booleans. This list will probably grow as we find the need, but should be as small as possible. They are in a module called intrinsic, which is special, because it is not written in Grace and its implementation is known to the compiler or interpreter.

When user code needs to access these things, it can import intrinsic. In particular, the standardGrace dialect will import intrinsic, and re-export things like Exception and Done.
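
A sketch of the kind of re-export intended (how a type like Done gets re-exported is part of the engineering question below):

import "intrinsic" as intrinsic

// user code written in the standardGrace dialect then sees Exception under
// its usual name, without importing intrinsic itself
def Exception is public = intrinsic.Exception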

I agree that there is some engineering necessary to make this work, without intrinsic incorporating the whole world. Is that what this issue is about? If so, I think that this can best be explored in the context of a particular implementation. I don't see that it needs to be part of the language definition.

The content of intrinsic should be in the language definition.