Experimental branch on aggregation operators for spaces

ghost commented 6 years ago

The branch aggregation contains a new interface and new operators for aggregation:

t, _ := s.PutAgg(f, x...): Put a tuple to space s obtained by using a user defined aggregation function f on the tuples matched by a template x.... If no tuples are matched, the template x... is converted to its intrinsic tuple, that is, the concrete values read from the template itself, and put in s instead.
t, _ := s.GetAgg(f, x...): Get a tuple from space s obtained by using a user defined aggregation function f on the tuples matched by a template x..., while removing the tuples in the process.
t, _ := s.QueryAgg(f, x...): Query for a tuple from space s obtained by using a user defined aggregation function f on the tuples matched by a template x..., leaving space s intact.

All operators are non-blocking and return a copy of aggregate tuple t. The returned copy can be ignored.

Below is a visualisation of the three operators PutAgg(), GetAgg() and QueryAgg(). t corresponds to a tuple returned, while template T = x.... The empty tuple is (), and small crosses, circles and squares indicate tuples of possibly different type.

aggregation

The interface referred to here will be referred to as the Star Library.

ghost commented 6 years ago

Example:

package main

import (
    "fmt"
    . "github.com/pspaces/gospace"
)

func sum(a Tuple, b Tuple) Tuple {
    s := make([]interface{}, a.Length())

    for i := 0; i < a.Length(); i += 1 {
        va := a.GetFieldAt(i).(int)
        vb := b.GetFieldAt(i).(int)
        s[i] = va + vb
    }

    return CreateTuple(s...)
}

func main() {
    spc := NewSpace("space")

    spc.Put(1, 2, 3, 4, 5)
    spc.Put(2, 3, 4, 5, 6)
    spc.Put(3, 4, 5, 6, 7)
    spc.Put(4, 5, 6, 7, 8)
    spc.Put(5, 6, 7, 8, 9)

    var i, j, k, l, m int

    q, _ := spc.Query(1, &i, &j, &k, &l)
    fmt.Printf("Query: %v\n", q)
    fmt.Printf("Size: %v\n", spc.Size())

    a, _ := spc.QueryAgg(sum, &i, &j, &k, &l, &m)
    fmt.Printf("QueryAgg: %v\n", a)
    fmt.Printf("Size: %v\n", spc.Size())

    b, _ := spc.GetAgg(sum, &i, &j, &k, &l, &m)
    fmt.Printf("GetAgg: %v\n", b)
    fmt.Printf("Size: %v\n", spc.Size())

    c, _ := spc.PutAgg(sum, "a", &a, "b", &b)
    fmt.Printf("PutAgg: %v\n", c)
    fmt.Printf("Size: %v\n", spc.Size())

    spc.Put(1, 2, 3, 4, 5)
    spc.Put(2, 3, 4, 5, 6)
    spc.Put(3, 4, 5, 6, 7)
    spc.Put(4, 5, 6, 7, 8)
    spc.Put(5, 6, 7, 8, 9)

    d, _ := spc.PutAgg(sum, &i, &j, &k, &l, &m)
    fmt.Printf("PutAgg: %v\n", d)
    fmt.Printf("Size: %v\n", spc.Size())
}

Outputs:

Query: (1, 2, 3, 4, 5)
Size: 5
QueryAgg: (15, 20, 25, 30, 35)
Size: 5
GetAgg: (15, 20, 25, 30, 35)
Size: 0
PutAgg: ("a", (15, 20, 25, 30, 35), "b", (15, 20, 25, 30, 35))
Size: 1
PutAgg: (15, 20, 25, 30, 35)
Size: 2

ghost commented 6 years ago

@albertolluch @mshPolo

See commit 3cc3d778a6a66f6554dfeca63b93f33117c6b896.

Comments are welcome.

albertolluch commented 6 years ago

Operation PutAgg is not clear to me.

ghost commented 6 years ago

Description has been clarified and visualisation of the operators are provided for easier understanding.

albertolluch commented 6 years ago

It seems to me that PutAgg can be implemented/defined via other operations as follows.

t, _ := s.PutAgg(f, x...)

can be obtained with:

t, _ := s.GetAgg(f, x...)
s.Put(t)

If this is indeed the case, I don't see the need of PutAgg. I guess it is a matter of minimality vs symmetry in the API. Is that the point?

The meaning of the template x... is converted to its intrinsic tuple is not cleary to me. What is the intrinsic tuple of template &x?

On a related note, the behaviour of aggregations when the set of matching tuples is the empty set should be explained. I would say that every aggregation function f should be defined for multisets and hence, the value of f(nil) should be well defined.

sequenze commented 6 years ago

t, _ := s.PutAgg(f, x...): Put a tuple to space s obtained by using a user defined aggregation function f on the tuples matched by a template x.... If no tuples are matched, the template x... is converted to its intrinsic tuple, that is, the concrete values read from the template itself, and put in s instead.

If im understanding correctly, PutAgg takes a function f, and a template to match the tuples in space s. For all tuples which match the template will be aggregated over. The result is then stored in the tuple space. If however, no tuples match the template, then the template is instead considered a tuple itself and subsequently stored in the tuple space?

a) I get the first part. That is a natural aggregation. But I do not understand the second part. What is the utility in storing the pattern / template?

b) Im going out on a limb here. But it looks like PutAgg is trying to do too much in one operation. not unlike a god-functions / god classes.

2) This only works for local spaces right?

Other than that it looks nice :D

albertolluch commented 6 years ago

@sequenze: me and @luhac discussed this offline and the description will be updated. Long story short: putAgg is getAgg+put in one shot, atomically.

I can see this as a frequent pattern when you want do reduce/abstract large amounts of data. A name like "reduce" or "compress" could be more faithful to the intended semantics.

As for having it in the API, I posed the same question. Minimality of the API would argue against including the operation. @luhac argues here for some form of symmetry/homogeneity in the API (same argument for the current name). I guess that efficiency could be also an issue here: atomicity of the operation can be better handled directly wrt. requiring the programmer to get it using lock tuples, for the same reason as we have query (which can be obtained as get+put).

sequenze commented 6 years ago

@sequenze: me and @luhac discussed this offline and the description will be updated. Long story short: putAgg is getAgg+put in one shot, atomically.

I can see this as a frequent pattern when you want do reduce/abstract large amounts of data. A name like "reduce" or "compress" could be more faithful to the intended semantics.

I get that part. That makes sense. Having the possibility of aggregating and putting in one atomic operation is nice.

My question was merely related to the storing of the pattern? As far as i understand, it stores the pattern/template when no tuples match the pattern/template. Wouldn't it be more true to just not put any tuple at all? considering its a non-blocking call then perhaps have it return a boolean. true if a tuple was created; otherwise false.

Edit: Why is PuttAgg also returning the values that was put'ted?

albertolluch commented 6 years ago

We also discussed this.

In my view aggregation functions f must have multisets (unordereded lists) of tuples as domain and tuples as co-domain.

If there is no matching tuple, the resulting aggregated tuple should just be f(\emptyset).

Recall that f is user-defined. In my view it is the responsibility of the programmer to provide a well-formed f. For many aggregation functions, the result of f(\emptyset) is the identity of the function. Examples: sum(\emptyset) is 0, max(\emptyset) is the minimumu value in the corresponding domain, dually for min().

For other functions we have different story. For example, avg(\emptyset) is nonsense. The result should be undefined. When it comes to reducing a set of tuples I would go here for just not adding tuple to the tuple space. This could be accomplished by requiring f() to return an error in those cases, so that putAgg knows that nothing is to be done.

Alternatively, and more simply: the semantics of putAgg when no tuple is matched is just to do nothing.

I see advantages/disadvantages in both solutions.

albertolluch commented 6 years ago

PS. the rationale behind putAgg returning the tuple added is the same as for the normal put in Go. @luhac proposed this to make put have a similar return signature as get/query. I said ok for trying this experimental feature. It may be interesting for instance if one applies some form of access control that may accept tuples after modifying them (e.g. sanitizing/anonymizing some fields) and may notify the user how the modified tuple is actually stored.

sequenze commented 6 years ago

This quote just popped into my head: "Better to remain silent and be thought a fool than to speak and to remove all doubt."

So it occurs to me that i am better of being silent :D

albertolluch commented 6 years ago

I don't agree. @luhac and me forgot to update this thread, and all your concerns/doubts were more than reasonable. Thanks for your feedback!

pSpaces / goSpace

Experimental branch on aggregation operators for spaces #7