Simulating Objects: Performance

ChristianGruen commented 10 months ago

Related to #953, #917 and #916, I wonder whether we are aware enough of the essential differences when we think of objects in a functional language:

Mutable objects are extremely efficient, as an update is a simple main-memory value change.
Immutable data structures need to be fully copied if a single value changes. As a result, the update of a map with, let’s say, 1 string and 50 functions would be a new map with 1 string and 50 functions. Even with efficient immutable map implementations that we have, I doubt that it makes sense to create full copies with 1+50 entries, of which only 1 string will be different.
Imagine a FLWOR expression that creates 1000 of such maps, with possibly 1 value that’s different in each instance. We don’t need 1000 copies of 50 functions; the memory consumption would be much smaller if we only stored relevant values.

This thread is not about premature optimization; I just want to be sure we think about the obstacles when using maps for objects. Maybe the solutions are already on the horizon; maybe we could tackle some of the concerns with the definition of default values…

declare record person(
  name   as xs:string,
  title  := (),
  full   := fn { string-join((?title, ?name), ' ') }
);

…and maps with type annotations. If we don’t materialize defaults, the embedded annotation would indeed need to effect functions like map:get, as questioned by Michael in https://github.com/qt4cg/qtspecs/pull/953#issuecomment-1896078605.

dnovatchev commented 10 months ago

The issue is more the cost of allowing them to do this when they don't need to, not the cost of them actually doing it and trying to stop them. Const tells the environment to not bother allowing them when they dont, and thus it can optimise on that basis.

C# does not support const methods, properties, or events.

Well, what does this (above fact) mean? Even such an imperative/mutable language as C# is free of this "const wisdom".

As of now, all XPath system functions can be regarded as what in C# are static methods (because we do not have objects yet), and the namespaces they reside in play the same role as C# static classes.

A static class is the same as a singleton class. It doesn't make any sense (and is dangerous) to programmatically alter any members of a static class, because this may affect other, unknown clients of this static class. People of course know this and do not shoot themselves in the foot.

We need to learn more from the best practices and years of accumulated experience and wisdom in our favorite programming languages.

MarkNicholls commented 10 months ago

I'm a little confused, C# (and most mainstream OO languages) DO implement methods as effectively "const". The method is held once in single definition and referred to from the "object" via a vtable (usually).

You CAN obviously define a class with mutable Func<A,B> fields, (i.e. not "const") and the data is then held on the object. If you want to follow the C# example then you would need "const" (you may want to call it "method" but its the same thing in a different costume).

This conversation is about the analogous situation, declaring a "method" as const, would mean it could be held once.

But from my perspective we're flogging a dead horse, I think its at worst its a premature optimization, and at best unnecessary, but still worth a chat.

dnovatchev commented 10 months ago

I'm a little confused, C# (and most mainstream OO languages) DO implement methods as effectively "const". The method is held once in single definition and referred to from the "object" via a vtable (usually).

Exactly!

This is why prefixing a method with const is meaningless.

Why would we want to do this in a immutable/functional language is beyond any logic.

MarkNicholls commented 10 months ago

ok, for the moment, lets just forget that we probably both dont think this is a sensible thing to do, but for different reasons.

So maybe there a nuance to the syntax I dont comprehend,

in C# these 2 things means 2 different but similar things.

    class Foo
    {
        public string name = "name";
        public string title = "title";
        public Func<string> full;

        public Foo()
        {
            this.full = () => this.name + " " + this.title;
        }
    }

    class Bar
    {
        public string name = "name";
        public string title = "title";

        public Bar()
        {
        }

        public string full() => this.name + " " + this.title;
    }

the 2nd version has a method, i.e. its readonly, its "const". the 1st version in theory can be amended to so something different.

and because of that the 1st version has a pointer on each object to "full" that the 2nd version doesnt.

I thought the record declaration was basically the equivalent to the 1st version, i.e. there was no difference in principle between name/title and full...they were all held on the object, they just have different types.

dnovatchev commented 10 months ago

he 2nd version has a method, i.e. its readonly, its "const". the 1st version in theory can be amended to so something different.

Yes, absolutely correct.

The only sensible use for the 1-st scenario that I am aware of, is implementing dependency injection, and in .NET Core and .NET 6+ this is done very often.

And nobody is complaining that this hinders performance (due to extraordinary memory consumption) - because we do know what we are doing and we want it this way.

One could think of having a way to ask for the value of Full dynamically - only when really needed - something like lazy/deferred execution. But this would be of little use if there are many independent tasks all running asynchronously.

Here's code that actually is compiled and executed successfully, where in the Main method we do our own "dependency injection":

    public class Program
    {
        static void Main(string[] args)
        {
            var foo = new Foo();
            foo.Full = () => DateTime.Now.ToString();
            Console.WriteLine($"Hello, World - {foo.Full()}");
        }
    }

    public class Foo
    {
        public string Name = "name";
        public string Title = "title";
        public Func<string> Full;

        public Foo()
        {
            this.Full = () => this.Name + " " + this.Title;
        }
    }

ChristianGruen commented 10 months ago

I thought the record declaration was basically the equivalent to the 1st version, i.e. there was no difference in principle between name/title and full...they were all held on the object, they just have different types.

@MarkNicholls Right (provided we’ll add record constructors, which is likely). I think it’s foreseeable that the second variant in your code is out of scope for our languages (or at least version 4).

Maybe this here…

class Person {
  string name;
  Func<string> welcome;

  Person(name, welcome) {
    this.name = name;
    this.welcome = welcome;
  }
}
new Person('X', () => "Hello " + this.name);

…is even closer to the record constructor approach. And this here…

class Person {
  string name;
  Func<string> welcome = () => "Hello " + this.name;

  Person(name) {
    this.name = name;
  }
}

…could be simulated if we support default values in the constructor.

As a strictly-typed language, C# offers good type safety. With the given proposals, we cannot offer that, as the value of a record entry can easily be replaced by a value of a different type, or completely removed:

declare item type person as record(name as xs:string, hello as function() as xs:string);
let $person := person('John', fn($this) { 'Hello ' || $this })
let $updated := map:put($person, $key, 'X')
return $updated

If $key is welcome, $updated?welcome() raises an error; if it is name, $updated?welcome() returns Hello X. In fact, $updated is no person record anymore if an update takes place that violates the record declaration, but that’s something a user will not be aware of. We could potentially improve this by…

introducing typed maps, or/and
adding keywords to prevent developers (who may not necessarily the ones who wrote the record declaration!) from removing or updating specific map entries.

But back to performance: Typing is a good example of something that’s beneficial for both developers and optimizers. The same applies to many keywords in C# or Java (like final, sealed, readonly, etc.): It controls how users work with data structures, and it helps optimizers to do things more efficiently.

MarkNicholls commented 10 months ago

@dnovatchev

in my experience OO dependency injection is done by passing interfaces with "const" methods, not setting functions (i.e. example 2), but I accept it is possible. I have seen people (FPers) "dictionary passing" (using example 1), but not that often, "dictionaries" still tend to be defined statically, i.e. in types with "const" function pointers (i.e. in a similar manner to scala/haskell), and it ends up looking like OO DI (I don't understand or like magic injection frameworks, they may work differently). I have never seen people embed functions in large scale "data" (e.g. records in databases or xml), so I can't really comment whether this causes performance issues, BUT if i did this, I wouldn't really worry about it (in C# etc), unless the class definitions were bloated and I saw sluggish performance, but C# is another notch up on the performance metrics, so who knows.

I'm still slightly confused by your comments, on one hand you say C# doesnt support const methods, and then accept that methods are by definition const pointers, that is the prevailing OO doctrine, and the proposed record syntax doesn't do that.

@ChristianGruen

I agree with your psudeo C# code. I don't completely understand your XQuery example (I don't understand how I can create a person with only a name and not get an error), but I think i take the point, and we've discussed and agreed(?) previously that we think records and maps are 2 different things and really should be dealt with differently (even if they share some common underlying mechanisms) - or did I dream it?

I think we agree that the question is valid, but I'm of the opinion that its probably premature to introduce a keyword to help any optimisation because

1) I'm not convinced it will be an issue, rather than an edge case, which can be optimised by the developer by explicitly embedding a vtable inside the record themselves...ugly yes, but not a show stopper. 2) IF this emerges as a real issue then a) the implementation can probably be done in a way to flip between a "const" (space efficient) representation of the "object" to a dynamic "overriden" local values (as MK said I don't think it takes "too much ingenuity" and naively suggested a simplisitic mechanism to achieve it), without the need for an explicit "const" keywork. b) worst case is you introduce "const" or whatever, later, and kick the can down the road.

I'd kick the can down the road.

"premature optimisation is the root of all evil" (Knuth)

ChristianGruen commented 10 months ago

I don't completely understand your XQuery example

…my fault, I’ve just fixed it (I’ve added the function argument and replaced 'welcome' with $key).

we've discussed and agreed(?) previously that we think records and maps are 2 different things and really should be dealt with differently

In the current specification and in the upcoming proposals, they are very similar and treated equally: Record tests can be used to check if a map matches certain criteria, but its type doesn’t change. A record is nothing else than a map.

ChristianGruen commented 10 months ago

@MarkNicholls To avoid misunderstandings: With my last comment, I mostly wanted to summarize how records are going to be designed in our language, and that a record is nothing else than a map.

What I took away from the fruitful discussion is that there’s no real need to restrict the discussed optimizations to records. Instead, they should probably be applied to arbitrary maps. For example, an optimizer could rewrite code like…

declare function local:update-number($map, $n) {
  map:put($map, 'n', $n)
};
let $map := map { 'n': 0, 'inc': fn($this) { $this?n + 1 } }
for $n in 1 to 10
let $updated-map := local:update-number($map, $n)
return $updated-map ? inc($updated-map)

…to…

...
return fn($this) { $this?n + 1 }($updated-map)

…in order to avoid the repeated lookup of the inc key in the return clause (for this, it must be detected that intermediate map:put calls won’t remove or overwrite the inc key from the original map). Next, the internal representation of $updated-map needn’t contain all keys from $map. Instead, it can internally reference the existing map and only store the updated values.

For the example above, optimizations of this kind are of like shooting sparrows with a cannon (or do you say “breaking a butterfly on a wheel”? Pardon my English). For more sophisticated code, it would certainly be much more relevant.

ChristianGruen commented 10 months ago

To quote Donald Knuth: "Premature optimization is the root of all evil..."

;) See my initial comment. – And as we've all learnt, it continues with: “Yet we should not pass up our opportunities in that critical 3 %. A good programmer will not be lulled into complacency by such reasoning.”. It may be less known that the quote is from one of his papers that advocates the use of goto statements.

dnovatchev commented 10 months ago

@dnovatchev

in my experience OO dependency injection is done by passing interfaces with "const" methods, not setting functions (i.e. example 2), but I accept it is possible.

@MarkNicholls ,

There is the so called "lifecycle management". Dependency injection could be global (using the AddSingleton method), or per-request (using the AddScoped method) or completely dynamic - using the AddTransient method.

As for property injection in C#, see this.

And yes, it is typical for the property of an object, that needs to be injected to be declared with type that is an interface, but I think the type could equally enough be an object that is not sealed (either abstract or having virtual methods)

I'm still slightly confused by your comments, on one hand you say C# doesnt support const methods

This is not what I am saying but what the Microsoft documentation states - as was shown with a screenshot of the content of the provided link.

dnovatchev commented 10 months ago

Maybe looking at the rules for C# Record types could be helpful, especially the rules for Nondestructive Mutation

ChristianGruen commented 10 months ago

For the sake of completeness, a link to the JavaScript Record proposal.

For updates, JavaScript’s spread operator is used:

// Add a Record field
let rec = #{ a: 1, x: 5 }
#{ ...rec, b: 2 }  // #{ a: 1, b: 2, x: 5 }

// Change a Record field
#{ ...rec, x: 6 }  // #{ a: 1, x: 6 }

MarkNicholls commented 9 months ago

@ChristianGruen

so this is basically the same as C#/F#, though i suspect you can 'extend' the object? (which in C#)

As an aside, F# supports "anonymous records" whereby the type of the record is inferred simply by the instantiation of a record

i.e. this is the normal record declaration and usage

type NormalRecordLikeCS = { x ; int }
let normalRecordLikeCS = { x = 1 }

and this....

// this is fine
let normalRecordLikeCS2 = { normalRecordLikeCS with x = 2 }
// this is a type error
let normalRecordLikeCS3 = { normalRecordLikeCS with z = 567 }

i.e. the field z isnt part of the declaration

for anonymous records it works like this

// this is fine, it is in effect a type declaration and a call the "constructor"
let anonRecord1 = {| x = "2" |}
// this is also fine...and has the same type
let anonRecord2 = {| anonRecord1 with x = "567" |}
// this is ALSO fine, but defines a new type too
let anonRecord3 = {| anonRecord2 with newField = "new field" |}

there is no nominal relationship between the two anonymous types (I think), it doesnt make the 2nd one a subtype of the first. (F#s type system allows polymorhism is slightly more esoteric ways than C# so this isnt a massive issue...I think)

be worth looking at typescript ("thinking man's/woman's" javascript).

As I've said the issue with all these basically idental syntactical constructs is deeply nested records require the developer to write a long update statement effectively unpacking the record to the point it needs to change, before "setting" the member.

In extremis I've used "Lenses"

https://www.fpcomplete.com/haskell/tutorial/lens/

but they arent pain free.

rhdunn commented 9 months ago

Regarding JavaScript, the JS Engines track the shape of an object [1] and essentially have the object instance as an array with n items (one for each property/key value) and a shape type (with an equivalently sized array to track the information about each property -- its name, etc.).

I could see something similar being done by an XPath/XQuery/XSLT engine. Here, the record objects could be made into predefined shape instances. Then, when map:put etc are called on the object, the shape is modified accordingly (creating a new shape if one doesn't exist).

A record type could then have a set of shapes that conform to it, so the only things needed to be done for instanceof checks are: 1) check if the instance's shape is in the set of shapes for the record; and 2) check the types of each property.

As long as the language provides a suitable framework to allow these performance optimizations then that should be sufficient. It is then upto the engines to make the optimizations where necessary for their target platform, e.g. by taking advantage of JavaScript engine shape optimizations, database tables, immutable data structures, etc.

[1] https://mathiasbynens.be/notes/shapes-ics

michaelhkay commented 9 months ago

Thanks for the link. I had assumed Javascript did something like that, but hadn't seen a description in writing. We do have a similar approach in Saxon, insofar as the map implementation adapts itself to the characteristics of the map; I have been planning to do a new map implementation for maps that are known to conform to a specific record type but haven't quite got there yet. I did some crude statistics and found that the vast majority of maps are never modified (i.e. subjected to put/remove operations) after initial creation, so optimising for read-only certainly makes sense.

There's also the possibility of doing something like Saxon does for XML element and attribute names: allocate integer fingerprints to names that are recognised during static analysis, and replace string searches with integer searches in those cases.

I do think that the JS experience should convince us there is no need to be unduly influenced by performance factors in the language design: if it's cleanly specified, implementors will find a way to optimise it. (There are exceptions, however: in the XML node tree model, the use of node identity and parent pointers does severely limit performance options however hard you try.)

ChristianGruen commented 9 months ago

Regarding JavaScript, the JS Engines track the shape of an object [1] and essentially have the object instance as an array with n items (one for each property/key value) and a shape type (with an equivalently sized array to track the information about each property -- its name, etc.).

Thanks as well, Reece. Your summary and the article are a great addition on how dynamic lookups can be turned into fixed-size access via offsets.

Ironically, my initial comment in this thread was mostly about the price we pay for immutability – i.e., a challenge that JavaScript doesn’t have – but the idea of shapes is inspiring: When a record is updated, the original record could be preserved and accessed for values that don’t change. In principle, that’s also what hash tries do (but more generally).

michaelhkay commented 8 months ago

This has been a very stimulating discussion and we have all learnt a lot, but there has been no concrete proposal for action so I think it is time to close it.

ndw commented 8 months ago

The CG decided to close this issue without any further action at meeting 069.

qt4cg / qtspecs

Simulating Objects: Performance #961