matomo-org / matomo


[Bug] Not able to build Segments on CustomDimensions #22253

Open jorgeuos opened 5 months ago

jorgeuos commented 5 months ago

What happened?

I have created a segment that looks like this:

mysql> SELECT * 
    -> FROM matomo_segment\G
*************************** 1. row ***************************
         idsegment: 1
              name: NameOfSegment
        definition: dimension1==quepapoloco
              hash: 2aae533a0383136ee4f1d30576bca987
             login: jorgeuos
  enable_all_users: 1
enable_only_idsite: 1
      auto_archive: 1
        ts_created: 2024-05-10 10:00:16
      ts_last_edit: 2024-05-15 13:07:00
           deleted: 0
*************************** 2. row ***************************
         Other rows...
7 rows in set (0.02 sec)

mysql> 

And I want to invalidate past dates to run the archiver on these segments:

./console core:invalidate-report-data --dates=2024-05-01,2024-05-24 --segment=1
ERROR     [19:30:44] 21673  Uncaught exception: /var/www/html/core/Segment.php(253): Segment 'dimension1' is not a supported segment. [Query: , CLI mode: 1]

In Segment.php line 253:

  Segment 'dimension1' is not a supported segment.  

core:invalidate-report-data [--dates DATES] [--sites SITES] [--periods PERIODS] [--segment SEGMENT] [--cascade] [--dry-run] [--plugin PLUGIN] [--ignore-log-deletion-limit]

The same error occurs if I pass the segment definition instead of the segment ID:

./console core:invalidate-report-data --dates=2024-05-01,2024-05-24 --segment=dimension1==quepapoloco 
ERROR     [19:37:24] 21682  Uncaught exception: /var/www/html/core/Segment.php(253): Segment 'dimension1' is not a supported segment. [Query: , CLI mode: 1]

In Segment.php line 253:

  Segment 'dimension1' is not a supported segment.  

core:invalidate-report-data [--dates DATES] [--sites SITES] [--periods PERIODS] [--segment SEGMENT] [--cascade] [--dry-run] [--plugin PLUGIN] [--ignore-log-deletion-limit]

If I invalidate without a segment definition or ID and then rerun core:archive, I only see rows like:

INFO      [20:34:09] 22708  Archived website id 1, period = week, date = 2024-05-06, segment = 'dimension1==quepapoloco', 0 visits found. Time elapsed: 0.766s

But I can see that there is data in that custom_dimension:

mysql> SELECT 
    -> COUNT(mlv.idvisit)
    -> FROM matomo.matomo_log_visit mlv
    -> WHERE mlv.idsite = 1
    ->     AND mlv.custom_dimension_1="quepapoloco"
    -> AND mlv.visit_last_action_time BETWEEN "2024-05-01" AND "2024-05-24";
+--------------------+
| COUNT(mlv.idvisit) |
+--------------------+
|              23312 |
+--------------------+
1 row in set (0.32 sec)

Why is this happening? I have tried multiple segments and custom dimensions, with both numerical and string values. I have tried building the segment in the UI and with API calls, using these operators:

Operator  Behavior     Example
==        Equals       &segment=dimension1==quepapoloco
!=        Not equals   &segment=dimension1!= (e.g. not empty)
=@        Contains     &segment=dimension1=@quepapo
=^        Starts with  &segment=dimension1=^quepapo

I have tried custom dimensions in both visit scope and action scope. I don't know why it's not working, but the error points to this function:

    private function getSegmentByName($name)
    {
        $segments = $this->getAvailableSegments();

        if (array_key_exists($name, $segments)) {
            if ($segments[$name] === null) {
                throw new NoAccessException("You do not have enough permission to access the segment " . $name);
            }

            return $segments[$name];
        }

        throw new Exception("Segment '$name' is not a supported segment.");
    }

If I follow getAvailableSegments(), I can see that it calls PiwikCache::getTransientCache(), so I thought the cache might be the issue. I've tried clearing the caches too:

# Both:
./console cache:clear
# And:
./console core:clear-caches

But nothing changes.

Config relevant to the segments:

./console config:get --section=General --format=text | grep segment
enable_segments_cache = 1
anonymous_user_enable_use_segments_API = 1
enable_create_realtime_segments = 1
enable_segment_suggested_values = 1
adding_segment_requires_access = view
allow_adding_segments_for_all_websites = 1
process_new_segments_from = beginning_of_time
disable_archiving_segment_for_plugins =
pivot_by_filter_enable_fetch_by_segment = 0
data_comparison_segment_limit = 5
rearchive_reports_in_past_exclude_segments = 0

Any thoughts on why this is happening? Or is it not possible to use custom_dimension_x with segments?

What should happen?

I want to add a custom dimension with visit scope, so that a session can be tracked and filtered from the Segment editor.

How can this be reproduced?

Add a Custom dimension with visit scope, populate the custom_dimension_x column for the matching rows, invalidate past dates, run core archiving.

Matomo version

5.0.2-5.0.3

PHP version

8.2.19

Server operating system

Alpine Linux 3.19

What browsers are you seeing the problem on?

Chrome

Computer operating system

MacOS

Relevant log output

ERROR     [21:08:02] 28942  Uncaught exception: /var/www/html/core/Segment.php(253): Segment 'dimension1' is not a supported segment. [Query: , CLI mode: 1]

In Segment.php line 253:

  Segment 'dimension1' is not a supported segment.  

core:invalidate-report-data [--dates DATES] [--sites SITES] [--periods PERIODS] [--segment SEGMENT] [--cascade] [--dry-run] [--plugin PLUGIN] [--ignore-log-deletion-limit]


sgiehl commented 5 months ago

@jorgeuos Custom Dimensions are site-specific, so maybe it fails because the segment isn't available globally. We need to investigate that. As a workaround: did you try running the invalidation with --sites=1, to limit it to the site where the segment is available?

sgiehl commented 5 months ago

I've quickly reproduced that locally. It is indeed a problem with the segment being available only for a specific site rather than globally. We could rework the command to automatically skip sites where a segment is not available and print an info message in that case.

The suggested workaround from above works as expected, so this might not need to be highest priority.
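
For reference, the workaround can be applied per site. This is a minimal sketch, assuming ./console is run from the Matomo root as in the commands above. The helper name invalidate_cmd is hypothetical, and it only prints the command so it can be reviewed before running:

```shell
#!/bin/sh
# Sketch: build the invalidation command limited to a single site.
# invalidate_cmd is a hypothetical helper; it prints the command, it does not run it.
invalidate_cmd() {
    # $1 = site id, $2 = date range, $3 = segment id or segment definition
    printf "./console core:invalidate-report-data --dates=%s --sites=%s --segment='%s'\n" \
        "$2" "$1" "$3"
}

# Print the command for site 1 and the segment from the report above.
invalidate_cmd 1 2024-05-01,2024-05-24 'dimension1==quepapoloco'
```

Piping the printed line to sh would execute it. Repeating the call with other site ids covers the remaining sites where the dimension exists.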

jorgeuos commented 5 months ago

The workaround looks promising!

I'll continue to update some sites and if I hit any roadblocks, I'll update here.

jorgeuos commented 5 months ago

I'm not sure, but I'm guessing that there's a bug with how custom dimensions are used in segments. I believe that in the segment, it's looking for the custom dimension by the id, but when the custom dimension log table is created, it's looking for the index.

Because I just ran:

./console core:invalidate-report-data --dates=2024-02-01,2024-05-28 --sites=30 --segment=15
./console core:archive --force-idsegments=15 --force-idsites=30

And all of a sudden, another custom dimension is being used in the segment.

Could this be the issue?

File: matomo/plugins/CustomDimensions/Dao/LogTable.php

    public static function buildCustomDimensionColumnName($indexOrDimension)
    {
        if (is_array($indexOrDimension) && isset($indexOrDimension['index'])) { // <-- indexOrDimension
            $indexOrDimension = $indexOrDimension['index'];
        }

        $indexOrDimension = (int) $indexOrDimension;

        if ($indexOrDimension >= 1) {
            return 'custom_dimension_' . (int) $indexOrDimension;
        }
    }

File: matomo/plugins/CustomDimensions/DataTable/Filter/AddSegmentMetadata.php

    /**
     * @param DataTable $table
     */
    public function filter($table)
    {
        $dimension = CustomDimensionsRequestProcessor::buildCustomDimensionTrackingApiName($this->idDimension);

        foreach ($table->getRows() as $row) {
            $label = $row->getColumn('label');
            if ($label !== false) {
                if ($label === Archiver::LABEL_CUSTOM_VALUE_NOT_DEFINED) {
                    $label = '';
                }
                $row->setMetadata('segment', $dimension . '==' . urlencode($label));
            }

            $subTable = $row->getSubtable();
            if ($subTable) {
                $subTable->filter('Piwik\Plugins\CustomDimensions\DataTable\Filter\AddSubtableSegmentMetadata', array($this->idDimension, $label));
            }
        }
    }
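
One way to check the id-versus-index guess is to list each dimension's id next to its index for the site. This is a minimal sketch, assuming the default matomo_ table prefix and that the dimension definitions are stored in a matomo_custom_dimensions table with idcustomdimension, index, scope and name columns. That is an assumption about the schema, not something confirmed in this thread. The helper dimension_lookup_sql is hypothetical and only prints the query:

```shell
#!/bin/sh
# Sketch: print a query that lists dimension id vs. index for one site.
# Table and column names are assumptions about the schema; adjust the prefix if needed.
dimension_lookup_sql() {
    # $1 = site id; single quotes keep the backticks around `index` literal
    printf 'SELECT idcustomdimension, `index`, scope, name FROM matomo_custom_dimensions WHERE idsite = %d;\n' "$1"
}

# Print the query for site 30; pipe it into the mysql client to run it.
dimension_lookup_sql 30
```

If a dimension's id and index turn out to differ, that would be consistent with the guess above: a segment on dimensionN and a raw query on custom_dimension_N would not necessarily refer to the same dimension.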