google / mangle

Apache License 2.0

Calculating over aggregate values #38

Closed maxott closed 1 month ago

maxott commented 1 month ago

I have a list of observations of coverage of some species at various locations within various sites. There will be multiple observations per site, and they can be for the same species. Through the group_by I can calculate the sum and the count for a particular site/species pair, but how could I calculate CoverAvg = CoverSum / Count? Wherever I put it, it didn't work. Do I need to put that into a sub-relation? Actually, I'm still very confused about the semantics of the |> operator.

query(SiteID, Species, CoverSum, Count) :-
    survey_image(SiteID, Species, Cover),
    |> do fn:group_by(SiteID, Species),
        let CoverSum = fn:float:sum(Cover),
        let Count = fn:count().

burakemir commented 1 month ago

One way to look at this is: We are missing the avg() reducer function. I should add that.

If we stick to what is available now, the workaround you describe is perfectly good. You then need a second rule:

query_avg(SiteID, Species, Avg) :- query(SiteID, Species, CoverSum, Count)
  |> let Avg = fn:float:div(CoverSum, Count). 

Indeed |> looks like an operator and on some level it is, but Mangle does not support using it multiple times. This example shows how it would make sense to support it.

|> do fn:group_by(key...) is something that transforms the set of rows (per "key") to a single row (per "key").

|> let ... is something that transforms every row (like "map").
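
To make the two kinds of transform concrete, here is a rough Python sketch of the same idea. It is only an analogy for the semantics described above, not how Mangle evaluates rules; the sample rows and the helper names group_by_sum_count and let_avg are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample facts for survey_image(SiteID, Species, Cover).
survey_image = [
    ("site1", "kelp", 0.5),
    ("site1", "kelp", 0.25),
    ("site1", "coral", 1.0),
    ("site2", "kelp", 0.75),
]


def group_by_sum_count(rows):
    """Analogue of `|> do fn:group_by(SiteID, Species)` with a sum and a count:
    many rows per key go in, exactly one row per key comes out."""
    groups = defaultdict(list)
    for site_id, species, cover in rows:
        groups[(site_id, species)].append(cover)
    return [
        (site_id, species, sum(covers), len(covers))
        for (site_id, species), covers in groups.items()
    ]


def let_avg(rows):
    """Analogue of `|> let Avg = fn:float:div(CoverSum, Count)`:
    each input row is mapped to exactly one output row."""
    return [
        (site_id, species, cover_sum / count)
        for site_id, species, cover_sum, count in rows
    ]


query = group_by_sum_count(survey_image)  # one row per (SiteID, Species)
query_avg = let_avg(query)                # same number of rows, new column
```

The point of the sketch is that the group_by step changes the number of rows while the let step does not, which is why the average has to be computed after the aggregation has produced CoverSum and Count.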

After some time, support was added so that the mapping, aka "let-transform", can also be done more conveniently like this:

query_avg(SiteID, Species, Avg) :-
  query(SiteID, Species, CoverSum, Count), Avg = fn:float:div(CoverSum, Count). 

Thus there is no real need for let-transform as long as one cannot compose multiple uses of |>. I still would like to eventually support composition of |>, and then let-transforms become actually useful for doing some transformation in between aggregations.
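
As a rough illustration of what "a transformation in between aggregations" could mean, here is a Python sketch of a three-stage pipeline: aggregate, map, aggregate again. This is not Mangle syntax and says nothing about how composition of |> would eventually look; the sample rows, the aggregate helper, and the "best average per site" question are all assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical sample facts for survey_image(SiteID, Species, Cover).
rows = [
    ("site1", "kelp", 0.5),
    ("site1", "kelp", 0.25),
    ("site1", "coral", 1.0),
    ("site2", "kelp", 0.75),
]


def aggregate(rows, key, reduce):
    """Group rows by key(row) and collapse each group to one row via reduce."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    return [reduce(k, group) for k, group in groups.items()]


# Stage 1 (aggregation): sum and count per (SiteID, Species).
sums = aggregate(
    rows,
    key=lambda r: (r[0], r[1]),
    reduce=lambda k, g: (k[0], k[1], sum(r[2] for r in g), len(g)),
)

# Stage 2 (let-transform): map every row to its average.
avgs = [(site, species, total / n) for site, species, total, n in sums]

# Stage 3 (aggregation): highest species average per SiteID.
best = aggregate(
    avgs,
    key=lambda r: r[0],
    reduce=lambda k, g: (k, max(r[2] for r in g)),
)
```

Without composition, each stage here would have to be its own rule, as in the query / query_avg workaround above.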

Does this explanation help?

I will consider this issue as two feature requests, one for fn:avg(), one for composing |>.

maxott commented 1 month ago

Thanks for the explanation. I do have a PR for you to allow users to add "custom" operators, but I got sidetracked as I first needed to work out whether my company had already signed the necessary agreement or whether I could do that myself. Worked that out, but now I need to find the time to sync with your current version ... hopefully soon.