red / REP

Red Enhancement Process
BSD 3-Clause "New" or "Revised" License
11 stars 4 forks source link

WISH: replace class-of with a word-matching routine #102

Open hiiamboris opened 3 years ago

hiiamboris commented 3 years ago

class-of has proven itself to be a design you can't rely on, see https://github.com/red/red/issues/4396

>> class-of a: object [x: 1]
== 1000444
>> class-of b: make a []
== 1000444
>> class-of b: make a [y: 2]
== 1000445            ;) once you add smth, it's a new class

It will still work for objects you never expect to be extended, but I like much more Python's "if it quacks like a duck, it must be a duck" philosophy because it's reliable. It boils down to this:

child-of: func [model [object!] child [object!]] [
    find/match words-of child words-of model
]

It works because make proto new adds new words after the words of the proto.

Problem with this mezz is that both words-of calls allocate new blocks, so one can easily overload the GC. Another problem is that this operation is supposed to be lightning fast like, because it may be used in on-deep-change*, in timers, in event filters - time critical places.

I suggest we implement a routine version of it and ship it with Red runtime.

There can be naming variants though..

What name and type (func/op) should work better?

loziniak commented 3 years ago

What name and type (func/op) should work better?

Perhaps made-from, since we use make to create a child/descendant? I'd also count only any-function! words, to not relay on internal fields.

hiiamboris commented 3 years ago

any-function won't do what if you're testing if object [x: 1 y: 2 z: 3] is a descendant of object [x: 4 y: 5]?

loziniak commented 3 years ago

If we talk about duck-typing, we only care about functionality, not data/state, don't we? In the case you provided, I'd say object 1 is a functional descendant of object 2, because it includes it's functionality, which is NO functionality :-)

hiiamboris commented 3 years ago

OK ;) But think about the data model. object 1 extends data model of object 2. Anyway, I don't know a use case for checking functions only. If you have any, let's consider it as an option.

hiiamboris commented 3 years ago

Another more flexible option would be to check presence of all words of one object in the second object (regardless of their order), but this will be slower due to comparisons becoming hash lookups. It's more into the field of multiple-inheritance than just make A [stuff].

greggirwin commented 3 years ago

Seems there are conflicting goals: 1) flexible, 2) lighting fast.

I agree that Duck Typing means we care about compatibility, not what prototype(s) might have been used. This also plays against speed. Wording is tricky too, and I lean toward made-from, but even that may not be informative or complete enough. It's also a lie if we're talking about ducks that came from different bloodlines.

Two thoughts:

1) I have old set op funcs whose names might work, and be useful beyond this case.

subset?: func [
    "Returns true if A is a subset of B; false otherwise."
    a [series! bitset!]
    b [series! bitset!]
][
    empty? exclude a b
]

superset?: func [
    "Returns true if A is a superset of B; false otherwise."
    a [series! bitset!]
    b [series! bitset!]
][
    subset? b a
]

Since all they return is a logic! result, in theory we could do it at the R/S level and optimize copies away. That's a lot more work, without knowing how useful it will be in general. I would have to scour my brain to remember why I needed them in the past, though I know I did. This is going to be extremely limited from a feature standpoint, but can be fast for what are common cases with small objects.

2) A powerful spec matcher, which is not going to be light or fast. I've had this on my list for a long time, and draft notes and code to go with it. I think this will be great to have, even if as an optional module.

hiiamboris commented 3 years ago

Seems there are conflicting goals: 1) flexible, 2) lighting fast.

True. After some more study of the code, I think that flexible version will be fast enough for the task.

Objects already possess a hash table. Setup code may be done only once: https://github.com/red/red/blob/9d91cdd273a9c9f7851d7a3a70750a3b77b22f71/runtime/hashtable.reds#L1723-L1746

Lookup code should be repeated for every word, but they're pretty light - should be 1-2 iterations on average, no hashing done. Plus it will iterate until the first failed lookup, making negative cases even quicker: https://github.com/red/red/blob/9d91cdd273a9c9f7851d7a3a70750a3b77b22f71/runtime/hashtable.reds#L1748-L1766

Red's baseline speed (on my laptop) is about 40-50ns per token. Thing as simple as in obj 'word takes about 250ns. class-of face! - 240ns. find/match [a b c] [a b] - 350ns. find/match set-of-30 set-of-25 - 800ns (success), 300ns (early failure). That's the top throughput we can count on: ~1.2M ops per sec. So if timer fires at 60 FPS, it's 20k ops per one timer iteration max. If timer manages to keep it under 2k ops per iteration, slowdown is acceptable, under 400 ops - negligible. Divide op count by 5-10 for old machines. Benchmarks will tell for sure, but just from the looks of the code I don't see why ~25 lookups (count of words in a face) should take up total to more than 1000ns if lookups are made in one batch. In fact, I don't see why it can't be done in 500ns (considering that find has to account for a lot of other stuff, this routine may be faster than find).

hiiamboris commented 3 years ago

On subset? & superset?.. hopefully the algorithm can be generalized to support also these two cases. Super duper spec matching is for another day though ;)

greggirwin commented 3 years ago

Thanks for the numbers, which are like candy to me. Actually, better because I don't eat much candy.

The most important thing is correctness in the context of use. @dockimbel needs to weigh in on his original design goal for class-of, or if it was just a placeholder. These new approaches seem to augment that, not replace it though. In looking at my old object lib collection, there are a number of things that we can consider in letting people play with objects in different ways. Super-spec is bigger, yes.

greggirwin commented 3 years ago

The great thing about code that might be too slow is that we'll find out when it is, and can optimize it then. As long as we don't botch the interface to things too badly, I'm all for slow code.

qtxie commented 2 years ago

Related issue https://github.com/red/red/issues/4269.

hiiamboris commented 2 years ago

Work on Spaces has opened a new dimension to this topic.

Yes, as a user of the object, duck-typing is enough for me to interact with it. But I implemented per-class type & value checking and on-change handlers. Per-class for two reasons:

For this resource economy use case, two seemingly identical ducks may belong to different classes and vary in their checks or on-change handlers. For example, general list is different from a specialized list belonging to list-view widget (mainly to support infinite dimensions). What I do is tell Red that "this object belongs to that class" at construction time. Often I reclassify it multiple times, a simplistic illustration:

ascendant-template: [
    classify-object self 'ascendant
    x: 1  #type [integer!]
]
descendant-template: append copy/deep ascendant-template [
    classify-object self 'descandant
    x: 1x1   #type [pair!]
]

If we had language (R/S) support for features of #132 then I suppose it could be fast enough and not require too much RAM as long as on-change body is referenced, not copied. Then it could be supported on per-object granularity, and there would be no need for such non-duck-like classes. It would still require though per object word: a typeset and a few references to checking code, and the checking code itself, so still a few times the RAM a simple hash-backed object takes, so doesn't seem like a sound option. Maybe if we had both simple and typed objects as different types, or if simple object would be promoted to typed one on demand, so users would have a choice, that would be more acceptable. Still doesn't grab me yet. Having a hidden class name or reference seems like a better solution.

So this is deeper than I thought. Basically it comes down to two interpretations: "does it behave like 'class?" or "does it work internally as 'class?" (is this duck live or robotic?).

hiiamboris commented 2 years ago

Also a bit of an issue here is that system/words will qualify as any other duck, because it has all the words :)

>> face? system/words
== true