Open diego91964 opened 3 years ago
The aggregation feature lets Power BI keep a cached copy of aggregated data inside the Power BI Service. The aggregated data is imported to the cloud, and your report loads its visuals faster because each query hits this cache first, before a query is submitted to your data source (DirectQuery).
In order to achieve this you need to know about:
To use the aggregation feature, here are the steps we need to follow to understand and configure it:
This article explains how to consume large data sets from the Cargill Data Platform using Power BI.
In this example, we are going to read the customer profitability table from customers
There are several ways to find the size of your table; the simplest in this case is to run a count.
SELECT
COUNT(*) AS rows_quantity
FROM
schema_name.customer_profit
The result of this example is:
| rows_quantity |
|---|
| 343,165,452 |
We have already run some tests using dataflows and our best gateway (the one closest to the Hadoop servers, hosted on Amazon AWS), and unfortunately the full table cannot be read: the load runs for hours until a time-out is returned.
To get around this problem, we will use dataflows to aggregate our data, so that a given analysis only needs to read far fewer records.
The next query groups the data by the following dimensions and computes a sum measure.
SELECT account_description,
billing_date,
bpc_account_parent_4_description,
customer_segment_description,
l2_code_description,
plant_country,
Sum(usd_value) AS sum_usd
FROM schema.table
WHERE 1 = 1
AND bpc_account_parent_4_description = 'Gross Profit'
AND customer_segment_description IS NOT NULL
AND customer_segment_description <> ''
AND customer_segment_description <> 'Non Classified'
AND customer_segment_description <> 'X'
GROUP BY account_description,
billing_date,
bpc_account_parent_4_description,
customer_segment_description,
l2_code_description,
plant_country
The above query returns a total of 4,250,566 rows, which is only about 1.2% of our total rows.
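To make the row reduction concrete, here is a minimal sketch in Python that runs the same kind of GROUP BY on a tiny in-memory SQLite table and then repeats the ratio calculation for the real counts quoted above. The toy rows, the reduced column list, and the use of SQLite instead of Impala are all assumptions made for illustration only; they are not part of the original setup.

```python
import sqlite3

# Toy stand-in for the detail table; these rows and values are made up.
rows = [
    # (billing_date, customer_segment_description, plant_country, usd_value)
    ("2021-01-01", "Retail", "BR", 100.0),
    ("2021-01-01", "Retail", "BR", 50.0),
    ("2021-01-01", "Retail", "US", 25.0),
    ("2021-01-02", "Food Service", "US", 10.0),
    ("2021-01-02", "Food Service", "US", 40.0),
    ("2021-01-02", "X", "US", 999.0),  # removed by the WHERE filter
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_profit ("
    "billing_date TEXT, customer_segment_description TEXT, "
    "plant_country TEXT, usd_value REAL)"
)
conn.executemany("INSERT INTO customer_profit VALUES (?, ?, ?, ?)", rows)

# Same pattern as the article's query: filter, group by dimensions, sum the measure.
aggregated = conn.execute(
    """
    SELECT billing_date, customer_segment_description, plant_country,
           SUM(usd_value) AS sum_usd
    FROM customer_profit
    WHERE customer_segment_description <> 'X'
    GROUP BY billing_date, customer_segment_description, plant_country
    ORDER BY billing_date, customer_segment_description, plant_country
    """
).fetchall()

detail_count = conn.execute("SELECT COUNT(*) FROM customer_profit").fetchone()[0]
print(f"toy table: {detail_count} detail rows -> {len(aggregated)} aggregated rows")

# The same ratio for the row counts reported in this article.
detail_rows = 343_165_452
aggregated_rows = 4_250_566
share = aggregated_rows / detail_rows
print(f"aggregated share of detail rows: {share:.2%}")
```

The fewer dimensions the aggregated query keeps, the fewer rows it returns, so the choice of GROUP BY columns is a trade-off between table size and how many visuals the aggregation can serve.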
let
Source = Odbc.Query("host=hostnamem;port=00000;driver={Cloudera ODBC Driver for Impala};usesystemtruststore=1;ssl=1;checkcertrevocation=0;authmech=1","")
in
Source
A composite model allows a report to have two or more data connections from different source groups, such as one or more DirectQuery connections and an import connection, two or more DirectQuery connections, or any combination thereof.
In order to apply your aggregation, you'll need to map your fields and measures.
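As a rough mental model of what this mapping expresses, the sketch below describes each column of the aggregated table by a summarization (GroupBy or Sum) and the detail column it corresponds to, then checks whether a requested query could be answered from the aggregated table alone. This is a simplified illustration, not Power BI's actual matching logic; the names `AggMapping`, `AGG_MAPPINGS`, `can_use_aggregation`, and the `material_code` column are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AggMapping:
    summarization: str  # "GroupBy" for dimensions, "Sum" for the measure
    detail_column: str  # column in the detail (DirectQuery) table


# Hypothetical mapping: aggregated-table column -> detail-table column.
AGG_MAPPINGS = {
    "billing_date": AggMapping("GroupBy", "billing_date"),
    "customer_segment_description": AggMapping("GroupBy", "customer_segment_description"),
    "plant_country": AggMapping("GroupBy", "plant_country"),
    "sum_usd": AggMapping("Sum", "usd_value"),
}


def can_use_aggregation(group_by, measures, mappings=AGG_MAPPINGS):
    """Return True if every requested dimension and measure is covered by a mapping."""
    grouped = {
        m.detail_column for m in mappings.values() if m.summarization == "GroupBy"
    }
    summarized = {
        (m.summarization, m.detail_column)
        for m in mappings.values()
        if m.summarization != "GroupBy"
    }
    return set(group_by) <= grouped and set(measures) <= summarized


# Covered by the mapping: could be served from the cached aggregated table.
hit = can_use_aggregation(["plant_country"], [("Sum", "usd_value")])
# "material_code" is not mapped: the query would fall back to the data source.
miss = can_use_aggregation(["material_code"], [("Sum", "usd_value")])
print(hit, miss)
```

A query that groups by an unmapped column, or asks for a summarization that was not mapped, cannot be served from the cache and goes to the data source through DirectQuery instead.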