matomo-org / plugin-GoogleAnalyticsImporter

Google Analytics to Matomo importer
24 stars 14 forks source link

Making GA4 imports work with a proxy #522

Open snake14 opened 6 months ago

snake14 commented 6 months ago

Description:

We previously added the ability for UA imports to work with a proxy. This is to make GA4 imports work with a proxy as well. Fixes Issue: #521

Review

snake14 commented 6 months ago

@AltamashShaikh Do these changes look like they should work to you?

machinehum commented 6 months ago

@snake14 No change, unfortunately. I'm pasting output from the console below. Please let me know if there are any specific debugging steps I could try that would help. I'll poke at it some today and see if I can figure anything out.

I did confirm that setting the https_proxy env var before a console run allows OAuth to succeed and the import to begin. I can use that to get around this issue for our current use case, but I still want to help with getting this PR to release.

/var/www/html# php ./console googleanalyticsimporter:import-ga4-reports --property="properties/XXXXXXXXX" --dates=2023-07-01,2024-01-01
ERROR     [2024-04-29 16:25:13] 124  Uncaught exception: 
/var/www/html/plugins/GoogleAnalyticsImporter/vendor/prefixed/guzzlehttp/guzzle/src/Handler/CurlFactory.php(146):
cURL error 28: Failed to connect to oauth2.googleapis.com port 443 after 300000 ms: Timeout was reached (see
 https://curl.haxx.se/libcurl/c/libcurl-errors.html) for https://oauth2.googleapis.com/token [Query: , CLI mode: 1]

In CurlFactory.php line 146:

  cURL error 28: Failed to connect to oauth2.googleapis.com port 443 after 300000 ms: Timeout was reached (see https:
  //curl.haxx.se/libcurl/c/libcurl-errors.html) for https://oauth2.googleapis.com/token
machinehum commented 6 months ago

For the record, with this PR the GA3 import still works, no regression. With a little debug I can see that the new code in AuthorizationGA4.php getClientClassArguments() is setting what looks like a properly configured Guzzle client (pasted below). I'm trying to figure out a way to verify that the passing authHttpHandler to BetaAnalyticsDataClient and AnalyticsAdminServiceClient actually has the desired effect.

\\Matomo\\Dependencies\\GoogleAnalyticsImporter\\GuzzleHttp\\Client::__set_state(array(
   'config' => 
  array (
    'proxy' => 'http://proxy.foo.com:3128',
    'exceptions' => false,
    'base_uri' => 
    \\Matomo\\Dependencies\\GoogleAnalyticsImporter\\GuzzleHttp\\Psr7\\Uri::__set_state(array(
       'scheme' => 'https',
       'userInfo' => '',
       'host' => 'www.googleapis.com',
       'port' => NULL,
       'path' => '',
       'query' => '',
       'fragment' => '',
       'composedComponents' => NULL,
    )),
    'handler' => 
    \\Matomo\\Dependencies\\GoogleAnalyticsImporter\\GuzzleHttp\\HandlerStack::__set_state(array(
       'handler' => 
      \\Closure::__set_state(array(
      )),
       'stack' => 
      array (
        0 => 
        array (
          0 => 
          \\Closure::__set_state(array(
          )),
          1 => 'http_errors',
        ),
        1 => 
        array (
          0 => 
          \\Closure::__set_state(array(
          )),
          1 => 'allow_redirects',
        ),
        2 => 
        array (
          0 => 
          \\Closure::__set_state(array(
          )),
          1 => 'cookies',
        ),
        3 => 
        array (
          0 => 
          \\Closure::__set_state(array(
          )),
          1 => 'prepare_body',
        ),
      ),
       'cached' => NULL,
    )),
    'allow_redirects' => 
    array (
      'max' => 5,
      'protocols' => 
      array (
        0 => 'http',
        1 => 'https',
      ),
      'strict' => false,
      'referer' => false,
      'track_redirects' => false,
    ),
    'http_errors' => true,
    'decode_content' => true,
    'verify' => true,
    'cookies' => false,
    'idn_conversion' => false,
    'headers' => 
    array (
      'User-Agent' => 'GuzzleHttp/7',
    ),
  ),
))
snake14 commented 6 months ago

Thank you @machinehum . Looking at the GuzzleHttp\Client you output, I'd say that looks correct. What does the https://github.com/matomo-org/plugin-GoogleAnalyticsImporter/blob/cad809ff8b9c7dddf13ba24607080f3a17a7152b/vendor/prefixed/google/gax/src/CredentialsWrapper.php#L110 look like if you comment out line 52 where authHttpHandler gets set in the arguments? I'm wondering if maybe I used the wrong base_uri value for GA4?

machinehum commented 6 months ago

@snake14 Hm. In CredentialsWrapper.php at line 110 I dumped the value of $args['authHttpHandler'], and then of $authHttpHandler after it is set. $args['authHttpHandler'] is alway null and $authHttpHandler is always a vanilla GuzzleHttp\Client instance, without the proxy or other custom settings. This is true regardless of whether line 52 in AuthorizationGA4.php is commented out or not.

I did confirm that the conditional on line 51-53 is being entered. So it seems like the new form of setting the arguments isn't actually passing them through? I'm going to try some more debugging. I suppose you could do similar even without a proxy to see whether this version of AuthorizationGA4.php is effectively passing the credentials argument or not.

machinehum commented 6 months ago

With the change in this PR to AuthorizationGA4, the authHttpHandler argument is being set in an array next to \Matomo\Dependencies\GoogleAnalyticsImporter\Google\ApiCore\CredentialsWrapper, but from looking at the Google API code it looks like it needs to be an argument to the credentials wrapper builder, alongside keyFile. This change to AuthorizationGA4::getClientClassArguments():

       $arguments = [
              'keyFile' => $this->getClientConfiguration()
        ];

        $proxyHttpClient = StaticContainer::get('GoogleAnalyticsImporter.proxyHttpClient');
        if ($proxyHttpClient) {
            $proxyHttpHandler = \Matomo\Dependencies\GoogleAnalyticsImporter\Google\Auth\HttpHandler\HttpHandlerFactory::build($proxyHttpClient);
            $arguments['authHttpHandler'] = $proxyHttpHandler;
        }

        $credentialWrapper =  \Matomo\Dependencies\GoogleAnalyticsImporter\Google\ApiCore\CredentialsWrapper::build($arguments);

        return ['credentials' => $credentialWrapper];
    }

results in CredentialsWrapper showing what looks like a good GuzzleHttp\Client object. However, the initial OAuth request still isn't using the proxy. The BetaAnalyticsDataGapicClient and AnalyticsAdminServiceGapicClient builders also allows for passing in a set of config arguments via credentialsConfig, as opposed to a preconstructed CredentialWrapper object. I tried that and got the same result. I'll keep poking at it.

snake14 commented 6 months ago

Nice find @machinehum . I meant to pass the handler into the CredentialsWrapper constructor. I went ahead and updated the PR with the latest changes that you shared. It looks like maybe passing the transport argument similar to credentials might work? I found mention of it in Google's documentation.

machinehum commented 6 months ago

@snake14 Yeah, I did try the transport config eventually as well. That didn't make any difference either. Ended up spending 9 hours debugging this today. The documentation on the google PHP API side seems a little scattered. I did see that some of the modules of google-cloud-php, like BigTables, implement their own proxy settings. This makes me suspicious about whether the "official" solution is fully tested and working. Most of the threads I can find on it say "just use environment variables", heh. I opened an issue at https://github.com/googleapis/google-cloud-php/issues/7274 to hopefully get some guidance.

snake14 commented 6 months ago

Thank you for all of your effort @machinehum ; it's greatly appreciated. I agree that it's hard to find a clear answer in Google's documentation. Pretty much all of my searches for using a proxy with Google's PHP API classes ended up pointing to answers for UA and not GA4.

snake14 commented 3 months ago

Hi @machinehum . I tried to follow the input that was provided on the issue you created. I tried to test the changes, but got some Guzzle error. If you get a chance to try out the latest changes and let me know how it goes, it would be greatly appreciated.

snake14 commented 3 months ago

Hi @machinehum . Reading your most recent comment, it looks like you received the same error that I did when I tried testing the suggested workaround. I'll keep watching that thread :+1:

snake14 commented 3 months ago

@machinehum . I made the most recently suggested change and it appears to fix the Guzzle error. Does this PR work correctly for your proxy now?

machinehum commented 3 months ago

@snake14 Thanks for keeping an eye on it! I'm continuing to test. Unfortunately it still doesn't use the proxy with your newest commits. But, I have some new insight.

As recommended in that issue I created a test script that authenticates and makes a GET request to an API endpoint directly with the Guzzle client, without using the PHP libraries. I modified it slightly to use an OAuth client (like the importer does) instead of a service account. That test worked appropriately with and without the proxy. I tried inserting the same code into ImportGA4Reports::doExecute() and invoking it via starting an import from the console, and the import timed out on an initial OAuth call.

So, this points to an issue in the importer context. I did more exhaustive tests on the importer code, and found something interesting. I tried hard-coding client setup (just to rule out any issues in the chain of configs loading), and an AnalyticsAdmin API call in AuthorizationGA4, and in ImportGA4Reports::doExecute(), like this:

 'client_id' => "redacted.apps.googleusercontent.com",
 'client_secret' => "redacted",
 'refresh_token' => 'redacted'];

 $client = new Client([
    'proxy'      => 'http://proxyhost.com',
    'verify'     => false,
    'exceptions' => false,
    'base_uri' => 'https://www.googleapis.com',
 ]);

 $proxyHttpHandler = HttpHandlerFactory::build($client);
    $arguments['credentialsConfig'] = ['authHttpHandler' => $proxyHttpHandler, 'keyFile' => $clientConfig];
    $arguments['transport'] = 'rest';
    $arguments['transportConfig'] = ['rest' => ['httpHandler' => [$proxyHttpHandler, 'async']]];

    $adminClient = new AnalyticsAdminServiceClient([
    // pass the HTTP handler in to the "credentialsConfig" option
    'credentialsConfig' => ['keyFile' => $clientConfig, 'authHttpHandler' => $proxyHttpHandler],
    // ensure we are using "rest" transport (gRPC is the default when the grpc.so extension is enabled)
    'transport' => 'rest',
    // set the http handler on the RestTransport
    'transportConfig' => ['rest' => ['httpHandler' => [$proxyHttpHandler, 'async']]]
 ]);

 $request = (new GetPropertyRequest())
        ->setName('properties/redacted');
 $response = $adminClient->getProperty('properties/redacted');
 print_r((string) var_dump($response));
 return 1;

Then I tried an import run from the console:

BUT, if the proxy in the code is set to a non-existent URL, this changes:

I replicated the same results by removing this hard-coded stuff and running the same tests with this branch, varying the proxy settings in config/config.php.

All of this taken together is leading me to believe that before code execution gets to AuthorizationGA4 or ImportGA4Reports::doExecute(), there is another OAuth attempt (that doesn't pick up on the proxy Guzzle client that gets set in AuthorizationGA4).

This call, wherever it is, times out if the https_proxy env var isn't set. If the env var is set to a non-existent URL, there's an immediate Guzzle/curl failure on connecting to that URL. If that env var is set to a valid proxy URL, then:

  1. That unknown OAuth attempt succeeds using the env var proxy
  2. Subsequent requests use the proxy settings from config.php, set via the Guzzle client in AuthorizationGA4, and either succeed or fail depending on the settings.

I've traced the code some but I'm not sure where that unknown OAuth attempt is coming from. You might have more luck. The other thing I've learned from all of this is that you should be able to test the issue without actually having a proxy server to test with -- if you create a Guzzle client with a non-existent proxy URL it should surface as a Guzzle/curl error. Create clients with different 'bad' proxy URLs in different places, and the error message will reveal which one failed because it includes the URL that it attempted to proxy through.

Combine this with setting a non-existent URL in the https_proxy env var, and that will reveal when an execution path is not picking up any of these Guzzle clients created in the code, and falls through to trying to use the env var setting (anything set in the code in a Guzzle client will supersede the env var). You can also inspect the requests with a network traffic monitor like Little Snitch/OpenSnitch to see what requests are using which settings.

Does this make sense? I'm not sure if I explained it well or if it's too wordy.

snake14 commented 3 months ago

Thank you @machinehum . I haven't been able to spend much time on this today, but I'll try to do some more debugging first thing next week.

snake14 commented 3 months ago

@machinehum It took some serious debugging, but I think I found where it's using the ENV variable instead of the defined proxy. In Google's CredentialsWrapper line 171, it doesn't pass the http handler like it does on line 175, so a default handler gets created. I tried passing $authHttpHandler as a third parameter on line 171 and I received the curl error for the expected proxy URL instead of the ENV variable one.

Edit: Looks like the issue was fixed toward the end of April. I'll have to see if we can update to that version of the dependency or if it would be a breaking change.

machinehum commented 3 months ago

@snake14 Great catch! Thanks, and I'll be looking forward to any updates.

snake14 commented 3 months ago

@machinehum I'll keep working on the UI build failure, but I'm pretty sure that it's an issue in the test and not the dependency upgrade itself. Can you please test if things work correctly with your proxy?

@AltamashShaikh Can you please review again since I upgraded multiple dependencies?

snake14 commented 2 months ago

@machinehum I think that this PR might finally be ready to go if it works for your proxy. Please let me know how your testing goes when you get a chance.

snake14 commented 1 month ago

Hi @machinehum . It's been a while, so I thought that I'd check in. I hope that you're doing well. Any chance you'll have an opportunity to test this PR soon?