matomo-org / plugin-GoogleAnalyticsImporter

Google Analytics to Matomo importer

Missing page views due to page size limit defaulting to 1000 #337

Open shanerutter-kempston opened 1 year ago

shanerutter-kempston commented 1 year ago

Finding that Matomo is massively under-reporting compared to GA. If I export the page URLs from Matomo, there always seems to be a maximum of 1000 unique pages, but in Analytics we have 20k unique page URLs for that same day.

I checked the Analytics API and did some quick testing, and it appears the Reporting API defaults to a page size of 1000 results. As a quick modification, in the file Google\GoogleQueryObjectFactory.php I added $request->setPageSize(100000); after line 58, then ran a quick import, and I can now see it pulling page views for all unique URLs.

However, it only gets through a couple of days, maybe a month's worth of data, before it crashes.
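As a rough illustration of the paging behaviour described above (this is not the plugin's code): the Reporting API returns rows in pages, so a caller that reads only the first response sees at most one page of results. The Python sketch below uses a fake in-memory API. `fetch_page`, `make_fake_api` and the row format are hypothetical stand-ins; only the `nextPageToken` field name and the 1000-row default are taken from the Reporting API v4.

```python
from typing import Callable, Dict, List, Optional

Page = Dict[str, object]


def fetch_all_rows(
    fetch_page: Callable[[Optional[str], int], Page],
    page_size: int = 1000,
) -> List[str]:
    """Follow nextPageToken until the API stops returning one."""
    rows: List[str] = []
    token: Optional[str] = None
    while True:
        page = fetch_page(token, page_size)
        rows.extend(page["rows"])
        token = page.get("nextPageToken")
        if not token:
            return rows


def make_fake_api(total_rows: int) -> Callable[[Optional[str], int], Page]:
    """Hypothetical stand-in for the real API: serves total_rows fake URLs."""

    def fetch_page(page_token: Optional[str], page_size: int) -> Page:
        start = int(page_token) if page_token else 0
        end = min(start + page_size, total_rows)
        page: Page = {"rows": [f"/page/{i}" for i in range(start, end)]}
        if end < total_rows:
            # More rows remain, so hand back a token for the next page.
            page["nextPageToken"] = str(end)
        return page

    return fetch_page


if __name__ == "__main__":
    api = make_fake_api(20000)
    first_page_only = api(None, 1000)["rows"]
    everything = fetch_all_rows(api)
    print(len(first_page_only), len(everything))
```

Raising the page size, as in the modification above, reduces the number of requests; following the token is what guarantees that nothing is dropped when a day has more rows than one page can hold.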

AltamashShaikh commented 1 year ago

@shanerutter-kempston Are you using the latest version of the plugin, v4.4.6? We fixed this issue in #329. What error do you get?

shanerutter-kempston commented 1 year ago

I see. I've updated to that version now, but I'm still getting the same error after it has processed a couple of days of data. I found that if I wait an hour or so and then continue the import, it works fine for another couple of days and then fails again with the same issue.

Error message: Error on day 2023-01-18, { "error": { "code": 401, "message": "Request had invalid authentication credentials. Expected OAuth 2 access token, login cookie or other valid authentication credential. See https://developers.google.com/identity/sign-in/web/devconsole-project.", "errors": [ { "message": "Invalid Credentials", "domain": "global", "reason": "authError", "location": "Authorization", "locationType": "header" } ], "status": "UNAUTHENTICATED" } } These errors are unexpected and will likely continue every time you run the import on this day. To resolve this issue, please [ask on the forums](https://forum.matomo.org/). If you can provide access to your GA account to a member of Matomo's support team it will provide a quicker resolution.

AltamashShaikh commented 1 year ago

@shanerutter-kempston Can you check the token grant rate graph? It will be in your OAuth consent screen.

(Screenshot from 2023-01-26 11-51-28)

AltamashShaikh commented 1 year ago

I am assuming rate limits are the cause of this issue.

AltamashShaikh commented 1 year ago

@shanerutter-kempston Can you confirm whether your OAuth app is internal or external? If it is external, can you try publishing it by following this doc, then reauthorize and check again?

shanerutter-kempston commented 1 year ago

@AltamashShaikh It's an external app. I have just published it and left it running for a couple of hours. It pulled through more data but eventually ended with the same error. Pictures of the requested screens are below.

Error message: Error on day 2022-11-15, { "error": { "code": 401, "message": "Request had invalid authentication credentials. Expected OAuth 2 access token, login cookie or other valid authentication credential. See https://developers.google.com/identity/sign-in/web/devconsole-project.", "errors": [ { "message": "Invalid Credentials", "domain": "global", "reason": "authError", "location": "Authorization", "locationType": "header" } ], "status": "UNAUTHENTICATED" } } These errors are unexpected and will likely continue every time you run the import on this day. To resolve this issue, please [ask on the forums](https://forum.matomo.org/). If you can provide access to your GA account to a member of Matomo's support team it will provide a quicker resolution.

(three screenshots attached)

AltamashShaikh commented 1 year ago

@shanerutter-kempston I am still checking why we would get this error after the import has been running for a few hours. Do you have any idea how much data it imports before throwing the error?

I am trying to reproduce it but have not been able to. Can you run the import with verbose logging and share the log file here? ./console googleanalyticsimporter:import-reports --idsite={YOUR_IMPORT_ID_SITE} -vvv

shanerutter-kempston commented 1 year ago

@AltamashShaikh Nothing in the log indicates a problem, other than the Google API responding with invalid credentials. I have gotten around the issue by setting up a cron job to run the CLI import command each hour. It has managed to import a year's worth of data so far.

AltamashShaikh commented 1 year ago

@shanerutter-kempston Have you set ./console googleanalyticsimporter:import-reports --idsite={YOUR_IMPORT_ID_SITE} -vvv to run every hour like this, and does it work without any error?

shanerutter-kempston commented 1 year ago

Without the -vvv, but yes. It appears the Google API rejects the credentials every now and again, but if you set up the CLI command to run every hour, it continues the import and Google accepts the same credentials again without issue. It's a strange issue. It's not the best way, but it is at least getting my data downloaded now.
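The workaround above amounts to "resume, and on an authentication failure wait a while before resuming again". A minimal Python sketch of that pattern follows. It is an illustration only, not how the plugin or cron does it: `AuthError`, `import_next_day` and `make_flaky_importer` are hypothetical names, and the one-hour wait simply mirrors the hourly schedule mentioned above.

```python
import time
from typing import Callable, Dict, Tuple


class AuthError(Exception):
    """Hypothetical stand-in for the 401 UNAUTHENTICATED response."""


def import_with_retries(
    import_next_day: Callable[[], bool],
    wait_seconds: float = 3600,
    max_retries: int = 24,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call import_next_day until it reports no days remain.

    import_next_day returns True while more days remain and False when
    the import is finished. On AuthError, wait and resume. Returns True
    if the import finished, False if max_retries was exhausted.
    """
    retries = 0
    while True:
        try:
            if not import_next_day():
                return True
        except AuthError:
            if retries >= max_retries:
                return False
            retries += 1
            sleep(wait_seconds)


def make_flaky_importer(
    total_days: int, fail_every: int
) -> Tuple[Callable[[], bool], Dict[str, int]]:
    """Fake importer (assumption): every fail_every-th call raises AuthError."""
    state = {"done": 0, "calls": 0}

    def import_next_day() -> bool:
        state["calls"] += 1
        if state["calls"] % fail_every == 0:
            raise AuthError("Invalid Credentials")
        state["done"] += 1
        return state["done"] < total_days

    return import_next_day, state


if __name__ == "__main__":
    importer, state = make_flaky_importer(total_days=10, fail_every=3)
    waits = []
    finished = import_with_retries(importer, sleep=waits.append)
    print(finished, state["done"], len(waits))
```

Passing `sleep` as a parameter keeps the sketch testable without actually waiting an hour. The real import keeps its own progress between runs, which is why simply re-running the command resumes where it stopped.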

AltamashShaikh commented 1 year ago

@shanerutter-kempston Strange. If you have already set up an archiving cron, then there is already a task that runs every hour, and you don't need to do this separately.

This is the guide to set up the auto-archiving cron, which will trigger this task.