Open snake14 opened 6 months ago
@AltamashShaikh Do these changes look like they should work to you?
@snake14 No change, unfortunately. I'm pasting output from the console below. Please let me know if there are any specific debugging steps I could try that would help. I'll poke at it some today and see if I can figure anything out.
I did confirm that setting the https_proxy
env var before a console run allows OAuth to succeed and the import to begin. I can use that to get around this issue for our current use case, but I still want to help with getting this PR to release.
/var/www/html# php ./console googleanalyticsimporter:import-ga4-reports --property="properties/XXXXXXXXX" --dates=2023-07-01,2024-01-01
ERROR [2024-04-29 16:25:13] 124 Uncaught exception:
/var/www/html/plugins/GoogleAnalyticsImporter/vendor/prefixed/guzzlehttp/guzzle/src/Handler/CurlFactory.php(146):
cURL error 28: Failed to connect to oauth2.googleapis.com port 443 after 300000 ms: Timeout was reached (see
https://curl.haxx.se/libcurl/c/libcurl-errors.html) for https://oauth2.googleapis.com/token [Query: , CLI mode: 1]
In CurlFactory.php line 146:
cURL error 28: Failed to connect to oauth2.googleapis.com port 443 after 300000 ms: Timeout was reached (see https:
//curl.haxx.se/libcurl/c/libcurl-errors.html) for https://oauth2.googleapis.com/token
For the record, with this PR the GA3 import still works, no regression. With a little debug I can see that the new code in AuthorizationGA4.php getClientClassArguments()
is setting what looks like a properly configured Guzzle client (pasted below). I'm trying to figure out a way to verify that the passing authHttpHandler
to BetaAnalyticsDataClient
and AnalyticsAdminServiceClient
actually has the desired effect.
\\Matomo\\Dependencies\\GoogleAnalyticsImporter\\GuzzleHttp\\Client::__set_state(array(
'config' =>
array (
'proxy' => 'http://proxy.foo.com:3128',
'exceptions' => false,
'base_uri' =>
\\Matomo\\Dependencies\\GoogleAnalyticsImporter\\GuzzleHttp\\Psr7\\Uri::__set_state(array(
'scheme' => 'https',
'userInfo' => '',
'host' => 'www.googleapis.com',
'port' => NULL,
'path' => '',
'query' => '',
'fragment' => '',
'composedComponents' => NULL,
)),
'handler' =>
\\Matomo\\Dependencies\\GoogleAnalyticsImporter\\GuzzleHttp\\HandlerStack::__set_state(array(
'handler' =>
\\Closure::__set_state(array(
)),
'stack' =>
array (
0 =>
array (
0 =>
\\Closure::__set_state(array(
)),
1 => 'http_errors',
),
1 =>
array (
0 =>
\\Closure::__set_state(array(
)),
1 => 'allow_redirects',
),
2 =>
array (
0 =>
\\Closure::__set_state(array(
)),
1 => 'cookies',
),
3 =>
array (
0 =>
\\Closure::__set_state(array(
)),
1 => 'prepare_body',
),
),
'cached' => NULL,
)),
'allow_redirects' =>
array (
'max' => 5,
'protocols' =>
array (
0 => 'http',
1 => 'https',
),
'strict' => false,
'referer' => false,
'track_redirects' => false,
),
'http_errors' => true,
'decode_content' => true,
'verify' => true,
'cookies' => false,
'idn_conversion' => false,
'headers' =>
array (
'User-Agent' => 'GuzzleHttp/7',
),
),
))
Thank you @machinehum . Looking at the GuzzleHttp\Client you output, I'd say that looks correct. What does the https://github.com/matomo-org/plugin-GoogleAnalyticsImporter/blob/cad809ff8b9c7dddf13ba24607080f3a17a7152b/vendor/prefixed/google/gax/src/CredentialsWrapper.php#L110 look like if you comment out line 52 where authHttpHandler gets set in the arguments? I'm wondering if maybe I used the wrong base_uri
value for GA4?
@snake14 Hm. In CredentialsWrapper.php
at line 110 I dumped the value of $args['authHttpHandler']
, and then of $authHttpHandler
after it is set. $args['authHttpHandler']
is alway null and $authHttpHandler
is always a vanilla GuzzleHttp\Client instance, without the proxy or other custom settings. This is true regardless of whether line 52 in AuthorizationGA4.php
is commented out or not.
I did confirm that the conditional on line 51-53 is being entered. So it seems like the new form of setting the arguments isn't actually passing them through? I'm going to try some more debugging. I suppose you could do similar even without a proxy to see whether this version of AuthorizationGA4.php
is effectively passing the credentials argument or not.
With the change in this PR to AuthorizationGA4, the authHttpHandler
argument is being set in an array next to \Matomo\Dependencies\GoogleAnalyticsImporter\Google\ApiCore\CredentialsWrapper, but from looking at the Google API code it looks like it needs to be an argument to the credentials wrapper builder, alongside keyFile
. This change to AuthorizationGA4::getClientClassArguments():
$arguments = [
'keyFile' => $this->getClientConfiguration()
];
$proxyHttpClient = StaticContainer::get('GoogleAnalyticsImporter.proxyHttpClient');
if ($proxyHttpClient) {
$proxyHttpHandler = \Matomo\Dependencies\GoogleAnalyticsImporter\Google\Auth\HttpHandler\HttpHandlerFactory::build($proxyHttpClient);
$arguments['authHttpHandler'] = $proxyHttpHandler;
}
$credentialWrapper = \Matomo\Dependencies\GoogleAnalyticsImporter\Google\ApiCore\CredentialsWrapper::build($arguments);
return ['credentials' => $credentialWrapper];
}
results in CredentialsWrapper showing what looks like a good GuzzleHttp\Client object. However, the initial OAuth request still isn't using the proxy. The BetaAnalyticsDataGapicClient and AnalyticsAdminServiceGapicClient builders also allows for passing in a set of config arguments via credentialsConfig
, as opposed to a preconstructed CredentialWrapper object. I tried that and got the same result. I'll keep poking at it.
Nice find @machinehum . I meant to pass the handler into the CredentialsWrapper
constructor. I went ahead and updated the PR with the latest changes that you shared. It looks like maybe passing the transport
argument similar to credentials
might work? I found mention of it in Google's documentation.
@snake14 Yeah, I did try the transport config eventually as well. That didn't make any difference either. Ended up spending 9 hours debugging this today. The documentation on the google PHP API side seems a little scattered. I did see that some of the modules of google-cloud-php, like BigTables, implement their own proxy settings. This makes me suspicious about whether the "official" solution is fully tested and working. Most of the threads I can find on it say "just use environment variables", heh. I opened an issue at https://github.com/googleapis/google-cloud-php/issues/7274 to hopefully get some guidance.
Thank you for all of your effort @machinehum ; it's greatly appreciated. I agree that it's hard to find a clear answer in Google's documentation. Pretty much all of my searches for using a proxy with Google's PHP API classes ended up pointing to answers for UA and not GA4.
Hi @machinehum . I tried to follow the input that was provided on the issue you created. I tried to test the changes, but got some Guzzle error. If you get a chance to try out the latest changes and let me know how it goes, it would be greatly appreciated.
Hi @machinehum . Reading your most recent comment, it looks like you received the same error that I did when I tried testing the suggested workaround. I'll keep watching that thread :+1:
@machinehum . I made the most recently suggested change and it appears to fix the Guzzle error. Does this PR work correctly for your proxy now?
@snake14 Thanks for keeping an eye on it! I'm continuing to test. Unfortunately it still doesn't use the proxy with your newest commits. But, I have some new insight.
As recommended in that issue I created a test script that authenticates and makes a GET request to an API endpoint directly with the Guzzle client, without using the PHP libraries. I modified it slightly to use an OAuth client (like the importer does) instead of a service account. That test worked appropriately with and without the proxy. I tried inserting the same code into ImportGA4Reports::doExecute()
and invoking it via starting an import from the console, and the import timed out on an initial OAuth call.
So, this points to an issue in the importer context. I did more exhaustive tests on the importer code, and found something interesting. I tried hard-coding client setup (just to rule out any issues in the chain of configs loading), and an AnalyticsAdmin API call in AuthorizationGA4
, and in ImportGA4Reports::doExecute()
, like this:
'client_id' => "redacted.apps.googleusercontent.com",
'client_secret' => "redacted",
'refresh_token' => 'redacted'];
$client = new Client([
'proxy' => 'http://proxyhost.com',
'verify' => false,
'exceptions' => false,
'base_uri' => 'https://www.googleapis.com',
]);
$proxyHttpHandler = HttpHandlerFactory::build($client);
$arguments['credentialsConfig'] = ['authHttpHandler' => $proxyHttpHandler, 'keyFile' => $clientConfig];
$arguments['transport'] = 'rest';
$arguments['transportConfig'] = ['rest' => ['httpHandler' => [$proxyHttpHandler, 'async']]];
$adminClient = new AnalyticsAdminServiceClient([
// pass the HTTP handler in to the "credentialsConfig" option
'credentialsConfig' => ['keyFile' => $clientConfig, 'authHttpHandler' => $proxyHttpHandler],
// ensure we are using "rest" transport (gRPC is the default when the grpc.so extension is enabled)
'transport' => 'rest',
// set the http handler on the RestTransport
'transportConfig' => ['rest' => ['httpHandler' => [$proxyHttpHandler, 'async']]]
]);
$request = (new GetPropertyRequest())
->setName('properties/redacted');
$response = $adminClient->getProperty('properties/redacted');
print_r((string) var_dump($response));
return 1;
Then I tried an import run from the console:
https_proxy
environment variable, then it works. BUT, if the proxy in the code is set to a non-existent URL, this changes:
https_proxy
environment variable to a valid proxy URL, the code quickly fails with a "not found" Guzzle/curl error trying to reach the non-existent proxy URL value from the Guzzle client. I replicated the same results by removing this hard-coded stuff and running the same tests with this branch, varying the proxy settings in config/config.php
.
All of this taken together is leading me to believe that before code execution gets to AuthorizationGA4
or ImportGA4Reports::doExecute()
, there is another OAuth attempt (that doesn't pick up on the proxy Guzzle client that gets set in AuthorizationGA4
).
This call, wherever it is, times out if the https_proxy env var isn't set. If the env var is set to a non-existent URL, there's an immediate Guzzle/curl failure on connecting to that URL. If that env var is set to a valid proxy URL, then:
config.php
, set via the Guzzle client in AuthorizationGA4
, and either succeed or fail depending on the settings. I've traced the code some but I'm not sure where that unknown OAuth attempt is coming from. You might have more luck. The other thing I've learned from all of this is that you should be able to test the issue without actually having a proxy server to test with -- if you create a Guzzle client with a non-existent proxy URL it should surface as a Guzzle/curl error. Create clients with different 'bad' proxy URLs in different places, and the error message will reveal which one failed because it includes the URL that it attempted to proxy through.
Combine this with setting a non-existent URL in the https_proxy
env var, and that will reveal when an execution path is not picking up any of these Guzzle clients created in the code, and falls through to trying to use the env var setting (anything set in the code in a Guzzle client will supersede the env var). You can also inspect the requests with a network traffic monitor like Little Snitch/OpenSnitch to see what requests are using which settings.
Does this make sense? I'm not sure if I explained it well or if it's too wordy.
Thank you @machinehum . I haven't been able to spend much time on this today, but I'll try to do some more debugging first thing next week.
@machinehum It took some serious debugging, but I think I found where it's using the ENV variable instead of the defined proxy. In Google's CredentialsWrapper line 171, it doesn't pass the http handler like it does on line 175, so a default handler gets created. I tried passing $authHttpHandler
as a third parameter on line 171 and I received the curl error for the expected proxy URL instead of the ENV variable one.
Edit: Looks like the issue was fixed toward the end of April. I'll have to see if we can update to that version of the dependency or if it would be a breaking change.
@snake14 Great catch! Thanks, and I'll be looking forward to any updates.
@machinehum I'll keep working on the UI build failure, but I'm pretty sure that it's an issue in the test and not the dependency upgrade itself. Can you please test if things work correctly with your proxy?
@AltamashShaikh Can you please review again since I upgraded multiple dependencies?
@machinehum I think that this PR might finally be ready to go if it works for your proxy. Please let me know how your testing goes when you get a chance.
Hi @machinehum . It's been a while, so I thought that I'd check in. I hope that you're doing well. Any chance you'll have an opportunity to test this PR soon?
Description:
We previously added the ability for UA imports to work with a proxy. This is to make GA4 imports work with a proxy as well. Fixes Issue: #521
Review