Open anthosz opened 4 years ago
It seems that, indeed, all separators are handled as slashes in https://github.com/matomo-org/matomo/blob/3.14.1/plugins/SitesManager/SiteUrls.php
Do you have something like a patch to allow other separators?
@anthosz If I understand correctly, you want to match only paths that start with `/!*`, whereas Matomo currently would only support excluding URLs where the path is `/!/*`? Do I understand this right?
This is somewhat intentional at the moment, if I understand things correctly, since Matomo currently has no way to tell which behaviour someone expects.
@anthosz Yes, that's what I would like: the possibility to also take "/!*" into account.
A simple way could be to check whether import_url (the URL in the log or request) starts with the site URL as a plain prefix (instead of forcing `url/`) and, if so, to use this website. An option could also be added that keeps this behaviour disabled by default (so there is no impact on existing instances) and allows enabling it on demand.
A bonus would be to allow regular expressions in the URL (not related to this issue, but it could be useful if someone wants to use another separator, like "/(!|&)") ^^
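The difference between the two matching behaviours discussed above can be sketched in Python. This is only an illustration, not Matomo's actual code: the function names `matches_segment_prefix` and `matches_plain_prefix` are hypothetical, and the slash-delimited variant is an assumption about how a site URL path is compared today.

```python
from urllib.parse import urlsplit


def split_host_path(url):
    """Return (host, path) for a URL, tolerating a missing scheme."""
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    return parts.netloc.lower(), parts.path or "/"


def matches_segment_prefix(action_url, site_url):
    """Slash-delimited matching (assumed current behaviour):
    the site path only matches on a '/' boundary."""
    a_host, a_path = split_host_path(action_url)
    s_host, s_path = split_host_path(site_url)
    if a_host != s_host:
        return False
    s_path = s_path.rstrip("/") + "/"
    return (a_path.rstrip("/") + "/").startswith(s_path)


def matches_plain_prefix(action_url, site_url):
    """Plain string-prefix matching (the behaviour requested in this issue)."""
    a_host, a_path = split_host_path(action_url)
    s_host, s_path = split_host_path(site_url)
    return a_host == s_host and a_path.startswith(s_path)
```

With the site URL `http://example.com/!`, the action URL `http://example.com/!abcd` is accepted only by `matches_plain_prefix`, while `http://example.com/!/abcd` is accepted by both.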
Thanks @anthosz, I've updated the title to make it a bit clearer for us. Generally we would likely only be able to support some simple wildcards like * (if that's even possible), as I think we might sometimes be using the site URLs for other purposes too. To be checked.
Do I see this right that it might already help in your case if the include-path parameter in the log importer supported this (e.g. include-path='/!*')?
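As a rough illustration of what such an include-path filter would do with the pattern `/!*`, here is a minimal sketch that assumes shell-style glob semantics as provided by Python's `fnmatch` module. The helper `path_included` is hypothetical; the importer's source should be checked for its actual behaviour.

```python
from fnmatch import fnmatchcase


def path_included(path, include_patterns):
    """Return True if the path matches any glob pattern.

    An empty pattern list means every path is included (an assumption
    made for this sketch).
    """
    if not include_patterns:
        return True
    return any(fnmatchcase(path, pattern) for pattern in include_patterns)
```

Under these assumptions, `path_included("/!abcd", ["/!*"])` is true and `path_included("/about", ["/!*"])` is false. In glob syntax `!` is only special inside brackets, so `/!*` is read literally as "starts with `/!`".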
@tsteur Yes and no. Currently it seems to work if we also specify the site ID, but in that case we need to run import_logs.py multiple times, which is slow (especially when we have more than 10 million log lines to parse and multiple websites).
We got another request for this feature today.
The user would like to be able to use a wildcard for subdomains, for example:
https://*.example.org
instead of having to specify every subdomain individually.
Maybe even with the ability to use regular expressions, similar to the field "Excluded User Agents".
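A wildcard subdomain match of this kind could look roughly like the following Python sketch. It is not an existing Matomo feature: `host_matches` is a hypothetical helper, and glob-style matching via `fnmatch` is just one possible interpretation of the `*`.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def host_matches(action_url, site_pattern):
    """Return True if the action URL's host matches a wildcard site pattern."""
    # urlsplit().hostname is already lower-cased.
    host = urlsplit(action_url).hostname or ""
    # Extract the host part of the pattern by hand, since it may contain '*'.
    pattern_host = site_pattern.split("://", 1)[-1].split("/", 1)[0].lower()
    return fnmatchcase(host, pattern_host)
```

With the pattern `https://*.example.org`, this accepts `https://shop.example.org/cart` but not the bare domain `https://example.org/`, which would still have to be listed separately.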
be patient :)
We have another request from a Matomo user for this feature today.
Hey. Was support for regular expressions for website URLs added? My colleague assures me this used to work, but I can't seem to get it going myself, so it would be nice if you could confirm.
Thanks
Hello,
I have an issue when I use import_logs.py with "Only track visits and actions when the action URL starts with one of the above URLs." checked and a site URL like example.com/! (for a URL shortener). My goal is to create a website with a report for all URLs/pages starting with "/!*".
Example URL (also tried with https and with a * at the end): http://example.com/!
Scenario 1 (doesn't work): Enabled: Only track visits and actions when the action URL starts with one of the above URLs. Log:
example.com X.X.X.X [21/Sep/2020:14:30:01 +0200] "GET /!abcd" 200
./import_logs.py --idsite=1 --url='http://example.com/piwik/' --recorders=3 --log-format-regex="(?P<host>\S+) (?P<ip>\S+) \[(?P<date>.*?) (?P<timezone>.*?)\] \"\S+ (?P<path>.*?)\" (?P<status>\S+)" access.log
-> Nothing new in log_link_visit_action table
Scenario 2 (works): Disabled: Only track visits and actions when the action URL starts with one of the above URLs. Log:
example.com X.X.X.X [21/Sep/2020:14:30:01 +0200] "GET /!abcd" 200
./import_logs.py --idsite=1 --url='http://example.com/piwik/' --recorders=3 --log-format-regex="(?P<host>\S+) (?P<ip>\S+) \[(?P<date>.*?) (?P<timezone>.*?)\] \"\S+ (?P<path>.*?)\" (?P<status>\S+)" access.log
-> New entry in log_link_visit_action table
Scenario 3 (works): Disabled: Only track visits and actions when the action URL starts with one of the above URLs. Log:
example.com X.X.X.X [21/Sep/2020:14:30:01 +0200] "GET /!abcd" 200
./import_logs.py --idsite=1 --url='http://example.com/piwik/' --recorders=3 --log-format-regex="(?P<host>\S+) (?P<ip>\S+) \[(?P<date>.*?) (?P<timezone>.*?)\] \"\S+ (?P<path>.*?)\" (?P<status>\S+)" --hostname=example.com --include-path='/!*' access.log
-> New entry in log_link_visit_action table (so it works if I force the path in import_logs.py but not in Matomo, which means I need to run import_logs.py several times in this case)
In this case, my goal is to use a path separated not by a slash (/) but by an exclamation mark "!".
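To rule out the log format as the cause, the --log-format-regex from the scenarios above can be tried directly in Python against the sample log line. This is only a quick check outside of import_logs.py; the shell-escaped `\"` becomes a plain `"` inside a Python raw string.

```python
import re

# Same pattern as passed to --log-format-regex, written as a raw string.
LOG_FORMAT = re.compile(
    r'(?P<host>\S+) (?P<ip>\S+) \[(?P<date>.*?) (?P<timezone>.*?)\] '
    r'"\S+ (?P<path>.*?)" (?P<status>\S+)'
)

line = 'example.com X.X.X.X [21/Sep/2020:14:30:01 +0200] "GET /!abcd" 200'

match = LOG_FORMAT.match(line)
fields = match.groupdict() if match else {}
print(fields)
```

The `path` group comes out as `/!abcd` and `host` as `example.com`, so the hit is parsed as intended and the difference between the scenarios lies in how the path is matched against the site URL.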
If you need more information, don't hesitate to ask.
Thank you!