woocommerce / woocommerce-admin

(Deprecated) This plugin has been merged to woocommerce/woocommerce
https://woocommerce.github.io/woocommerce-admin/#/
Other
362 stars 146 forks source link

Filters to alter the columns used by a DataStore class #4147

Closed daigo75 closed 4 years ago

daigo75 commented 4 years ago

Description of the issue

I need to be able to extend the reports generated by the WC Admin so that it takes different values. This is needed to be able to prepare correct summary reports in a multi-currency scenario.

Current behaviour DataStore uses columns like the following (only relevant columns included):

$this->report_columns = array(
    'gross_sales'     => $gross_sales,
    'total_sales'     => "SUM({$table_name}.total_sales) AS total_sales",
    'coupons'         => 'COALESCE( SUM(discount_amount), 0 ) AS coupons', // SUM() all nulls gives null.
    'refunds'         => "{$refunds} AS refunds",
    'taxes'            => "SUM({$table_name}.tax_total) AS taxes",
    'shipping'        => "SUM({$table_name}.shipping_total) AS shipping",
    'net_revenue'     => "SUM({$table_name}.net_total) AS net_revenue",
    'avg_order_value' => "SUM( {$table_name}.net_total ) / SUM( CASE WHEN {$table_name}.parent_id = 0 THEN 1 ELSE 0 END ) AS avg_order_value",
);

This produces incorrect results when orders are in different currencies. For example, in a store with EUR as the base currency, running SUM({$table_name}.total_sales) AS total_sales against orders for 100 EUR, 100 USD and 100 GBP, would return a total 300 EUR. That's obviously incorrect.

A relatively easy way to fix this issue would be to alter the columns returned by the SELECT, as follows:

'total_sales' => "SUM({$table_name}.total_sales* FX_RATE_META.meta_value) AS total_sales",
'coupons' => 'COALESCE( SUM(discount_amount * FX_RATE_META.meta_value), 0 ) AS coupons', // SUM() all nulls gives null.
'refunds' => "{$refunds} AS refunds",
'taxes' => "SUM({$table_name}.tax_tota * FX_RATE_META.meta_value) AS taxes",
'shipping' => "SUM({$table_name}.shipping_total * FX_RATE_META.meta_value) AS shipping",
'net_revenue' => "SUM({$table_name}.net_total * FX_RATE_META.meta_value) AS net_revenue",
'avg_order_value' => "SUM( {$table_name}.net_total * FX_RATE_META.meta_value ) / SUM( CASE WHEN {$table_name}.parent_id = 0 THEN 1 ELSE 0 END ) AS avg_order_value",

Note the addition of FX_RATE_META.meta_value to convert the amounts to the same currency. The value of FX_RATE_META.meta_value would be the exchange rate stored by the multi-currency plugin (preferably mine, Aelia FTW 😁), used to convert each value to its shop's base currency equivalent. With this trick, the values mentioned before would change as follows:

Proposed solution

Adding a filter to the return value of DataStore::assign_report_columns() should be sufficient to allow filtering the columns with the new ones, adding the necessary calculations.

Here's an example, applied to class Automattic\WooCommerce\Admin\API\Reports\Order\DataStore:

protected function assign_report_columns() {
    $table_name = self::get_db_table_name();
    // Avoid ambigious columns in SQL query. 
    $this->report_columns = apply_filters('wc_analytics_orders_report_columns', array(
        'order_id'         => "{$table_name}.order_id",
        'parent_id'        => "{$table_name}.parent_id",
        'date_created'     => "{$table_name}.date_created",
        'date_created_gmt' => "{$table_name}.date_created_gmt",
        'status'           => "REPLACE({$table_name}.status, 'wc-', '') as status",
        'customer_id'      => "{$table_name}.customer_id",
        'net_total'        => "{$table_name}.net_total",
        'total_sales'      => "{$table_name}.total_sales",
        'num_items_sold'   => "{$table_name}.num_items_sold",
        'customer_type'    => "(CASE WHEN {$table_name}.returning_customer = 1 THEN 'returning' WHEN {$table_name}.returning_customer = 0 THEN 'new' ELSE '' END) as customer_type",
    ), $table_name, $this);
}

Then, in my custom code, I write a filter for wc_analytics_orders_report_columns and replace the appropriate one. Example:

public function wc_analytics_orders_report_columns($columns, $table_name, $data_store) {
    // ORDERS_BCER stays for ORDERS_BASE_CURRENCY_EXCHANGE_RATE. It's an alias for
    // the post_meta table, where the exchange rate is stored
    $columns['net_total'] = "({$table_name}.net_total * ORDERS_BCER.meta_value)";
    $columns['total_sales'] = "({$table_name}.total_sales * ORDERS_BCER.meta_value)";
    return $columns;
}

With this trick, the totals become correct.

Notes

Alternatives considered

There isn't any, as far as I could see. πŸ€·β€β™‚οΈ

Should this be prioritized?

Kind of. Multi-currency sites are becoming ubiquitous and the Analytics still don't allow to handle them properly. Also, adding these filters can take some time, but it shouldn't be too difficult.

Additional context

The proposed solution relies on a strict assumption: all the tables used by the WC Admin tool must NOT store "global" aggregate data, but only order-specific data (i.e. that there is ALWAYS one row per order). That's absolutely vital. If there were any table with an aggregate figure, for example containing the sum of multiple orders (e.g. the "300" from my initial example), that would make it impossible to calculate correct the figures. That's because each order can have a different exchange rate, which can't be applied to a single total.

If any "summary" table is going to be introduced in the future, the multi-currency aspect must be taken into account before such feature is implemented. I will be happy to discuss this with your developers, so that the common pitfalls can be avoided. I have quite a bit of experience in this field (I'm the one who brought multi-currency to WooCommerce, 7 years ago), I know what lurks around the corner. πŸ™‚

psealock commented 4 years ago

Have you tried any of the SQL modification filters?

Here are some of them in use as part of a multi-currency extension example I created: https://github.com/woocommerce/woocommerce-admin/blob/master/docs/examples/extensions/sql-modification/woocommerce-admin-sql-modification.php

More on that in https://woocommerce.wordpress.com/2020/02/20/extending-wc-admin-reports/

Using these filters, you can manipulate each clause however you'd like: JOIN, SELECT, or WHERE clauses are fully available to be changed or appended.

daigo75 commented 4 years ago

I remember trying that a while ago, but for some reason it didn't work for me. I will have another look.

daigo75 commented 4 years ago

I reviewed again the filters for the SELECT clauses, I'm afraid that they won't do. The filter receives a clause with a single entry and a "blob" of text for the "core" SELECT clause. Example:

SELECT SUM(wp_wc_order_stats.net_total) AS net_revenue, SUM( wp_wc_order_stats.net_total ) / SUM( CASE WHEN wp_wc_order_stats.parent_id = 0 THEN 1 ELSE 0 END ) AS avg_order_value, SUM( CASE WHEN wp_wc_order_stats.parent_id = 0 THEN 1 ELSE 0 END ) as orders_count, SUM( wp_wc_order_stats.num_items_sold ) / SUM( CASE WHEN wp_wc_order_stats.parent_id = 0 THEN 1 ELSE 0 END ) AS avg_items_per_order, SUM(wp_wc_order_stats.num_items_sold) as num_items_sold, COALESCE( SUM(discount_amount), 0 ) AS coupons, COALESCE( coupons_count, 0 ) as coupons_count, ( COUNT( DISTINCT( wp_wc_order_stats.customer_id ) ) - COUNT( DISTINCT( CASE WHEN wp_wc_order_stats.returning_customer = 0 THEN wp_wc_order_stats.customer_id END ) ) ) AS num_returning_customers, COUNT( DISTINCT( CASE WHEN wp_wc_order_stats.returning_customer = 0 THEN wp_wc_order_stats.customer_id END ) ) AS num_new_customers

While this might be sufficient for the custom module you developed, which only added clauses to it, it's not suitable for a module that needs to modify the existing clauses.

The only way to alter the SELECT clause in this case would be by calling str_replace() against it, to replace some text with with other text. The "search and replace" approach can easily lead to errors, for example if the search string matches what it shouldn't. It's a bit like trying to translate the content of an HTML page after the page has been generated, rather than processing the raw data that goes into it. I would definitely prefer not to do that.

In terms of practical use, here's an example showing why it's easier and safer to replace the columns before they are used to build the SQL clause:

Original method DataStore::assign_report_columns() I left my new filter, for clarity.

$table_name = self::get_db_table_name();
// Avoid ambigious columns in SQL query.
$refunds     = "ABS( SUM( CASE WHEN {$table_name}.net_total < 0 THEN {$table_name}.net_total ELSE 0 END ) )";
$gross_sales =
    "( SUM({$table_name}.total_sales)" .
    ' + COALESCE( SUM(discount_amount), 0 )' . // SUM() all nulls gives null.
    " - SUM({$table_name}.tax_total)" .
    " - SUM({$table_name}.shipping_total)" .
    " + {$refunds}" .
    ' ) as gross_sales';
$this->report_columns = apply_filters('wc_analytics_orders_stats_report_columns', array(
    'orders_count'            => "SUM( CASE WHEN {$table_name}.parent_id = 0 THEN 1 ELSE 0 END ) as orders_count",
    'num_items_sold'          => "SUM({$table_name}.num_items_sold) as num_items_sold",
    'gross_sales'             => $gross_sales,
    'total_sales'             => "SUM({$table_name}.total_sales) AS total_sales",
    'coupons'                 => 'COALESCE( SUM(discount_amount), 0 ) AS coupons', // SUM() all nulls gives null.
    'coupons_count'           => 'COALESCE( coupons_count, 0 ) as coupons_count',
    'refunds'                 => "{$refunds} AS refunds",
    'taxes'                   => "SUM({$table_name}.tax_total) AS taxes",
    'shipping'                => "SUM({$table_name}.shipping_total) AS shipping",
    'net_revenue'             => "SUM({$table_name}.net_total) AS net_revenue",
    'avg_items_per_order'     => "SUM( {$table_name}.num_items_sold ) / SUM( CASE WHEN {$table_name}.parent_id = 0 THEN 1 ELSE 0 END ) AS avg_items_per_order",
    'avg_order_value'         => "SUM( {$table_name}.net_total ) / SUM( CASE WHEN {$table_name}.parent_id = 0 THEN 1 ELSE 0 END ) AS avg_order_value",
    // Count returning customers as ( total_customers - new_customers ) to get an accurate number and count customers in with both new and old statuses as new.
    'num_returning_customers' => "( COUNT( DISTINCT( {$table_name}.customer_id ) ) -  COUNT( DISTINCT( CASE WHEN {$table_name}.returning_customer = 0 THEN {$table_name}.customer_id END ) ) ) AS num_returning_customers",
    'num_new_customers'       => "COUNT( DISTINCT( CASE WHEN {$table_name}.returning_customer = 0 THEN {$table_name}.customer_id END ) ) AS num_new_customers",
), $table_name, $this);

Replacement with a filter for the "raw" columns

$refunds = "ABS( SUM( CASE WHEN {$table_name}.net_total < 0 THEN {$table_name}.net_total * ORDERS_BCER.meta_value ELSE 0 END ) )";
$gross_sales =
"( SUM({$table_name}.total_sales * ORDERS_BCER.meta_value)" .
' + COALESCE( SUM(discount_amount * ORDERS_BCER.meta_value), 0 )' . // SUM() all nulls gives null.
" - SUM({$table_name}.tax_total * ORDERS_BCER.meta_value)" .
" - SUM({$table_name}.shipping_total * ORDERS_BCER.meta_value)" .
" + {$refunds}" .
' ) as gross_sales';

return array_merge($columns, array(
    'gross_sales' => $gross_sales,
    'total_sales' => "SUM({$table_name}.total_sales* ORDERS_BCER.meta_value) AS total_sales",
    'coupons' => 'COALESCE( SUM(discount_amount * ORDERS_BCER.meta_value), 0 ) AS coupons', // SUM() all nulls gives null.
    'refunds' => "{$refunds} AS refunds",
    'taxes' => "SUM({$table_name}.tax_tota * ORDERS_BCER.meta_value) AS taxes",
    'shipping' => "SUM({$table_name}.shipping_total * ORDERS_BCER.meta_value) AS shipping",
    'net_revenue' => "SUM({$table_name}.net_total * ORDERS_BCER.meta_value) AS net_revenue",
    'avg_order_value' => "SUM( {$table_name}.net_total * ORDERS_BCER.meta_value ) / SUM( CASE WHEN {$table_name}.parent_id = 0 THEN 1 ELSE 0 END ) AS avg_order_value",
));

This has several advantages:

  1. It's immediately clear which columns are being modified, for for which table.
  2. There's no need to parse the SQL to find the name table (e.g. to extract wp_wc_order_stats.net_total from the blob of text)
  3. There's no risk to match the wrong piece of text and replace it with a different clause, like it could happen with a str_replace() call.

Conclusion

As it is, the filter for the SELECT clauses (i.e. woocommerce_analytics_clauses_select_<report>_subquery) is not suitable to modify the query. Perhaps, if it were modified to return "raw" data, such as separate columns and the $table_name, which are available in method assign_report_columns(), that would make things easier. If that's not possible, adding a filter directly to assign_report_columns() is still a good option, in my opinion.

daigo75 commented 4 years ago

It seems that Github "went nuts" and duplicated my last comment a dozen times. I'm going to delete the extra ones.

rrennick commented 4 years ago

The filter receives a clause with a single entry and a "blob" of text for the "core" SELECT clause. > Example:

SELECT SUM(wp_wc_order_stats.net_total) AS net_revenue, SUM( wp_wc_order_stats.net_total ) / SUM( CASE WHEN wp_wc_order_stats.parent_id = 0 THEN 1 ELSE 0 END ) AS avg_order_value, 
....

@daigo75 Which filter are you using? Both filters in https://github.com/woocommerce/woocommerce-admin/blob/master/src/API/Reports/SqlQuery.php#L116:L122 pass arrays of clauses.

daigo75 commented 4 years ago

@rrennick as indicated in my last reply, filter woocommerce_analytics_clauses_select_<report>_subquery receives an array, which contains exactly one element. That element is the whole SELECT clause, i.e. the "blob" of text.

That is:

add_filter('woocommerce_analytics_clauses_select_orders_subquery`, function($clauses) {
  // $clauses contains only one entry. $clauses[0] contains the whole SELECT clause, as a single block of text
  // The $table_name variable, which is important, is absent, and so is the link between the original
  // columns and the corresponding SELECTed field
  /* SUM(wp_wc_order_stats.net_total) AS net_revenue, SUM( wp_wc_order_stats.net_total ) / SUM( CASE WHEN wp_wc_order_stats.parent_id = 0 THEN 1 ELSE 0 END ) AS avg_order_value, SUM( CASE WHEN wp_wc_order_stats.parent_id = 0 THEN 1 ELSE 0 END ) as orders_count, SUM( wp_wc_order_stats.num_items_sold ) / SUM( CASE WHEN wp_wc_order_stats.parent_id = 0 THEN 1 ELSE 0 END ) AS avg_items_per_order, SUM(wp_wc_order_stats.num_items_sold) as num_items_sold, COALESCE( SUM(discount_amount), 0 ) AS coupons, COALESCE( coupons_count, 0 ) as coupons_count, ( COUNT( DISTINCT( wp_wc_order_stats.customer_id ) ) - COUNT( DISTINCT( CASE WHEN wp_wc_order_stats.returning_customer = 0 THEN wp_wc_order_stats.customer_id END ) ) ) AS num_returning_customers, COUNT( DISTINCT( CASE WHEN wp_wc_order_stats.returning_customer = 0 THEN wp_wc_order_stats.customer_id END ) ) AS num_new_customers
  */

  return $clauses;
});

As explained, I can't use a search and replace approach to alter the query, because it's too easy to introduce errors. Due to that, the woocommerce_analytics_clauses_select_<report>_subquery filter doesn't help.

daigo75 commented 4 years ago

Here's the logic, method DataStore::get_data( $query_args ).

public function get_data( $query_args ) {
    // Some stuff...

    // This call returns a single value, which is a blob of text with all the SELECT,
    // columns, separated by comma. I need to alter that BEFORE such string is built
    $selections = $this->selected_columns( $query_args );

    // Some stuff...

    $this->subquery->clear_sql_clause( 'select' );

    // This call adds a single entry to the 'select' clauses, which is the blob of text I mentioned
    $this->subquery->add_sql_clause( 'select', $selections );

    // Rest of the code
}

Here's the logic, method DataStore::selected_columns( $query_args ).

/**
 * Returns a list of columns selected by the query_args FORMATTED AS A COMMA SEPARATED STRING.
 *
 * @param array $query_args User-supplied options.
 * @return string
 */
protected function selected_columns( $query_args ) {
    // This method takes the fields from $this->report_columns, which is what I must change
    $selections = $this->report_columns;

    if ( isset( $query_args['fields'] ) && is_array( $query_args['fields'] ) ) {
        $keep = array();
        foreach ( $query_args['fields'] as $field ) {
            if ( isset( $selections[ $field ] ) ) {
                $keep[ $field ] = $selections[ $field ];
            }
        }
        $selections = implode( ', ', $keep );
    } else {
        $selections = implode( ', ', $selections );
    }
    return $selections;
}
rrennick commented 4 years ago

I need to be able to extend the reports generated by the WC Admin so that it takes different values.

@daigo75 After giving this discussion some thought I realized that there is a distinction between extend and modify. You are attempting to modify queries using filters designed to allow the queries to be extended.

The issue with modifying these queries is that a modified query may work fine when the extension modifying it is the only extension active. But in a year or two's time when there are 50 extensions attempting to modify these queries there are bound to be conflicts that will break the core reports.

If you need to modify the columns then the data store classes are extendable. You could extend the class in your plugin and include your column definitions in your classes assign_report_columns().

daigo75 commented 4 years ago

Sorry if I come across as harsh, but I disagree with this statement:

The issue with modifying these queries is that a modified query may work fine when the extension modifying it is the only extension active. But in a year or two's time when there are 50 extensions attempting to modify these queries there are bound to be conflicts that will break the core reports.

Any filter or action can be used by multiple plugins, the risk of conflicts is always present. Anyone can abuse a filter or a method. I could already write a filter for the unsuitable filter woocommerce_analytics_clauses_select_<report>_subquery and return a bunch of random fields, breaking all the sales reports.

To put it bluntly, you don't have to worry about what other developers do with the filters offered by WooCommerce, or the WooCommerce Admin tool. Whoever writes the filter will be responsible for the changes. Due to that, I can't accept that as a justification for not adding a simple filter to the value returned by a method.

I also disagree with the suggestion of extending a whole class, just to change a bunch of columns. In fact, it can cause even more conflicts, when multiple plugins, or custom code decide to override the same class multiple times. Everyone will have a "I need to use my custom class" approach, causing far more issues. I already quite a bit of experience with that to know that it's a bad idea, having done it for years in WooCommerce 1.6, all the way up to 2.1, when there were no filters to extend the sales reports. At the time, I had to write a dozen of subclasses for the reports, each one overriding whole methods to just change a couple of fields. Add to that the presence of private properties and methods scattered all over the place, which had to be copy/pasted into the descendant classes (which, as you know, are not accessible by the descendants), and you get an idea of what mess it became.

In WC 2.2, they added filter _woocommerce_reports_get_order_report_dataargs, and it allowed to change the whole query used by core sales report (see the similarity?), including the fields for the SELECT clause, before such clause was generated. I could finally delete all the subclasses, and replaced them with a single filter, which worked with all reports. The same filter has been in place, untouched, since WooCommerce 2.2. I didn't need to touch that filter ever since.

To summarise, extending the core reports in WC 2.1 and earlier involved replacing whole classes with custom ones, and it was a mess. This changed for the better in WC 2.2, with the introduction of a simple filter. Sorry, but I can't accept a workaround that involves going back to the clunky approach of class overrides.

I still don't understand what the difficulty in adding my proposed filters is, it seems a reasonable request to me. If that's not going to happen, I will just have to inform my users that, due to limitations in the design of the Analytics, some features can't be added to them like it was done with the old sales reports.

rrennick commented 4 years ago

In fact, it can cause even more conflicts, when multiple plugins, or custom code decide to override the same class multiple times.

Sorry for the confusion. In my comment I wasn't suggesting that you replace the class. If an extension needs a report with different aggregate columns then creating a new report would be the route to avoid conflicts.

daigo75 commented 4 years ago

Thanks for the clarification. However, I don't want to create a new report, I want to fix the existing ones, just like I do with the core sales reports. Hence my request for a filter that can allow that, without changing anything in the existing logic. Sorry for insisting, but adding a filter still seems a very reasonable request to me.

To reiterate the reason behind my request, I received dozens of requests to fix the report so that it stops aggregating totals for orders in different currencies. As indicated in my original post:

[The default logic] produces incorrect results when orders are in different currencies. For example, in a store with EUR as the base currency, running SUM({$table_name}.total_sales) AS total_sales against orders for 100 EUR, 100 USD and 100 GBP, would return a total 300 EUR. That's obviously incorrect.

That's a major issue for any customer who runs a multi-currency shop (which means a lot of them) and has to be fixed.

The "filter by currency" feature in your examples doesn't solve the issue. It allows to run a partial report, by showing a total for EUR orders, USD orders and GBP orders. As per my example, each one of them would show "100" as the total. However, that's not what the customers need. They need to see a grand total, including all orders, in all currencies, but, obviously, with the correct conversion applied. Again, from my original post:

This is not a new report, it's a fix to make all core reports work correctly in a multi-currency environment. I'm not looking for different aggregate columns, the columns are the same. It's the exact, identical, same report, just with a conversion calculation applied to each amount.

Back to my previous conclusion: a filter will suffice, and it's easy to add. I managed to add it myself, and I understand a fraction of what the WC Admin plugin does. πŸ˜… There's no need to override classes and reports.

What if someone else (ab)uses the filter?

As I said, that's not something you need to worry about. As always, when a plugin introduces features or changes, its authors are responsible for fixing issues. Compatibility issues and conflicts will always exist. In fact, quite a few of the tasks I work on I get are related to conflicts caused by plugins I never heard of, which were not designed to work with one or more of the plugins installed on a site. It's a risk that we will have to take.

daigo75 commented 4 years ago

Just a "ping" to see if there are any further questions about this improvement. I can send a pull request, if needed.