t27duck / active_reporting

OLAP-like DSL for ActiveRecord-based reporting
MIT License
136 stars 19 forks source link

Consider null values ​​when group by dimension #31

Closed germanotm closed 4 years ago

germanotm commented 4 years ago

Problem

Considering this models

class Game < ActiveRecord::Base
  belongs_to :platform
end

class Platform < ActiveRecord::Base
  has_many :games
end

I want a report of Games grouped by Platform dimension. This could be easy achieve adding a dimension platform to GameFactModel. But, what if i have some non categorized Games that have platform_id equals nil.

In this scenario the report will only get the Games that have a Platform. Games without a Platform will be missing.


Possible Solution

This behavior can be achieve using a Model.left_outer_joins instead using Model.joins on ActiveReporting::Report#statement.

I'm not sure what is the best way to add this behavior to the gem.

I could add another type of dimension , a left_join type, id addition to :standard and :degenerate. Then when passing the dimension to Metric i add a type to hash with dimension type. This would allow to have dimensions defined with standard type or left_join type.

Or maybe a global configuration to use INNER JOIN or LEFT OUTER JOIN is more suitable for this task.

Not sure if the LEFT OUTER JOIN behavior should be the default, that way I would never have missing values.


Any thoughts about this matter?

@andresgutgon @niborg

t27duck commented 4 years ago

The original codebase where I extracted/rewrote this gem from had one or two edge cases where we also wanted to do an outer join to group by the "missing relationship". The way it was "handled" for those cases was ... somewhere we had an include_blank: true option either with the dimension list or somewhere on the Report object itself. Setting that flag would result in an outer join being done on the dimension (back then, we didn't have left_outer_joins on ActiveRecord so we had to build the SQL snippet by hand... poorly I might add 😦 ). I'm about 3 years removed from that employer so the actual implementation detail is lost in my memory.

My guy says either something like your idea of a separate dimension type or passing in some flag from the report object, down the chain so it knows to do an outer join for a dimension.

germanotm commented 4 years ago

I decided to add a join_method option on report, this way i have more flexibility to chose when have "missing relationship" and where not. The dimension already accept field and name options, so it look like the right place to me.

Now you can pass an option on dimension array.

Ex: [{ dimension1: { join_method: :left_outer_joins }}]

and even combine left joins with inner joins in the same report

Ex: [{ dimension1: { join_method: :left_outer_joins }},:dimension2]

The only available options is :joins and :left_outer_joins. The default is joins.

A real example of using the option:

metric = ActiveReporting::Metric.new( :a_metric, 
  fact_model: GameFactModel, 
  dimensions: [{ platform: { join_method: :left_outer_joins }}], 
  aggregate: :sum
)
report = ActiveReporting::Report.new(metric)

I submit a pull request #32.