pirsch-analytics / pirsch-php-proxy

A self-hosted proxy for Pirsch Analytics.
https://pirsch.io
MIT License

Proxy hit error in headless setup #5

Closed npgfg closed 1 week ago

npgfg commented 3 weeks ago

Hi,

thanks for providing this proxy solution!

I am having trouble setting it up though. My frontend is a static build made with Astro JS. The Pirsch PHP proxy is on a separate server together with the CMS. My frontend is at staging level on, let's say, "stage.xyz.de". This is also the host I added in the Pirsch dashboard. The content URL is "content.xyz.de". This is where I load the scripts from, which works fine. I created a client in the dashboard and added the ID and secret to the config.php, with "stage.xyz.de" as the host.

Now when I load the page, I get a console error alongside a 400 Bad Request from the "hit" as well as the "session" requests, saying: "Identification code not found for domain. Make sure you're sending requests from the configured domain or subdomain. Identification codes and domains are cached for up to 5 minutes."

I know there is a separate identification code in the dashboard. I've tried adding it via the data-code attribute on the script tag, but that did not help.

Any hint or help much appreciated, thanks np

Kugelschieber commented 3 weeks ago

Hi,

did you set up the data-endpoint attribute correctly? The proxy doesn't use the identification code, as it directly makes use of the API. The error message however looks like it has been directly sent to pirsch.io. Can you check your browser network tab to see if that's the case?

<script defer type="text/javascript"
    src="/custom/path/pirsch.min.js"
    id="pirschjs"
    data-endpoint="/custom/path/hit.php"></script>

npgfg commented 3 weeks ago

Hey,

thanks for getting back. As for the data-endpoint, I was under the impression that I only have to set it if I use a custom path. Since I have the proxy inside a folder named pirsch in the docroot, I assumed that was the default.

As for the request itself, it looks like it targets the API as expected:

:authority: api.pirsch.io
:method: GET
:path: /hit?nc=1724139002250&code=zO ...
:scheme: https
accept: */*
accept-encoding: gzip, deflate, br, zstd
accept-language: de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7
cache-control: no-cache
origin: https://stage. ...
pragma: no-cache
priority: u=1, i
referer: https://stage. ...
sec-ch-ua: "Not)A;Brand";v="99", "Google Chrome";v="127", "Chromium";v="127"
sec-ch-ua-mobile: ?0
sec-ch-ua-platform: "macOS"
sec-fetch-dest: empty
sec-fetch-mode: cors
sec-fetch-site: cross-site
user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36

Just to be clear, the script tags in the frontend look like this:

<script defer type="text/javascript" src="https://content.xyz.de/pirsch/pirsch.min.js" id="pirschjs"></script>
<script defer type="text/javascript" src="https://content.xyz.de/pirsch/pirsch-sessions.min.js" id="pirschsessionsjs"></script>

Kugelschieber commented 3 weeks ago

Okay, in that case, you need to add the data-endpoint attribute. Our JavaScript endpoint will look for the identification code that is missing from your request. The proxy directly uses the API, where the identification code isn't required (auth is handled by the client access key or id + secret).

The entire point of the proxy is that all requests are first-party. Right now you're only hosting the script files yourself.

npgfg commented 3 weeks ago

Ok. Got it. The error is gone. Now I get a CORS problem, but I guess that is a problem in my setup and out of scope here? The script tags look like this now. For the error, see the attached screenshots.

<script defer type="text/javascript" src="https://content.xyz.de/pirsch/pirsch.min.js" id="pirschjs" data-endpoint="https://content.xyz.de/pirsch/hit.php"></script>
<script defer type="text/javascript" src="https://content.xyz.de/pirsch/pirsch-sessions.min.js" id="pirschsessionsjs" data-endpoint="https://content.xyz.de/pirsch/session.php"></script>

I don't really get why the overview states "CORS Error" while the detailed view says "200", but I guess I have to figure that out with hosting.

[Screenshots: request_detail, request_overview]

Kugelschieber commented 3 weeks ago

Yup, you can try setting the CORS headers on the proxy:

header("Access-Control-Allow-Origin: *");
header("Access-Control-Allow-Headers: *");

Setting them through .htaccess might also be an option.
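
A more restrictive variant of the two headers above could look like the sketch below. It is only an illustration under assumptions: `corsHeadersFor` is a hypothetical helper that is not part of pirsch-php-proxy, and `https://stage.xyz.de` stands in for the frontend origin mentioned in this thread. It echoes the origin back only when it is on an allow-list, and answers preflight requests early.

```php
<?php

// Hypothetical helper (not part of pirsch-php-proxy): builds the CORS headers
// for a request origin, allowing only the origins listed in $allowedOrigins.
function corsHeadersFor(?string $origin, array $allowedOrigins): array {
    if ($origin === null || !in_array($origin, $allowedOrigins, true)) {
        return [];
    }

    return [
        'Access-Control-Allow-Origin: ' . $origin,
        'Access-Control-Allow-Headers: *',
        'Vary: Origin',
    ];
}

// Only emit headers when handling a web request, not on the command line.
if (PHP_SAPI !== 'cli') {
    // Assumed frontend origin taken from this thread; adjust to your own setup.
    $allowed = ['https://stage.xyz.de'];

    foreach (corsHeadersFor($_SERVER['HTTP_ORIGIN'] ?? null, $allowed) as $line) {
        header($line);
    }

    // Answer preflight requests without running the proxy logic.
    if (($_SERVER['REQUEST_METHOD'] ?? '') === 'OPTIONS') {
        http_response_code(204);
        exit;
    }
}
```

This would have to run before the proxy endpoint produces any output. The wildcard version above is simpler and may be all that is needed for a staging setup.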

npgfg commented 3 weeks ago

Another step. I've set the headers. CORS is fine now.

The requests seem to work as expected. I still don't get any data in my dashboard though :( I've checked the client ID, secret and host. I also added the data-code attribute to the scripts. The hostname (also in the dashboard) is stage.xyz.de, as that is my frontend URL. I've also tried setting up a second website in the dashboard with content.xyz.de as the host, added a new client there and used its settings for the config.php. Still no hits or page views in the dashboard.

Any idea what could be the problem? My frontend is protected by htaccess / passwd, but I guess that cannot be the problem since the content server is communicating with the API 🤔

Kugelschieber commented 3 weeks ago

The requests could be getting stuck in the bot filter. Is your proxy behind a load balancer or another proxy (Cloudflare?)? In that case, you might need to configure an HTTP header from which the real visitor IP address is extracted. Otherwise, our service only receives your server's or proxy's IP address, which will be filtered.

The PHP SDK and proxy are a bit outdated and don't support this out of the box. I need to update it. It would still be good to know if that's the cause of your problem :)

npgfg commented 2 weeks ago

I am not much of an expert in that area, but load balancers are definitely involved. It is a pretty nice but rather special hosting solution. https://docs.freistilbox.com/how_it_works/

Should I switch to a Cloudflare Worker in the meantime?

Kugelschieber commented 2 weeks ago

Yes, I think that would be the better option for now. The Go proxy supports configuring a header.

Just for context: the PHP proxy currently only accepts the remote IP. If a proxy or load balancer is involved, the remote IP will be that of the proxy/LB instead of the actual visitor IP address. It's possible to get around this by extracting the visitor IP out of a header set by the proxy/LB (like X-Forwarded-For for example).
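
As a rough sketch of the idea described above (not the proxy's actual code), the lookup could work as follows. `resolveVisitorIP` is a hypothetical helper: it checks a list of configured header names, takes the first entry of a comma-separated value, and falls back to the remote address if nothing valid is found.

```php
<?php

// Hypothetical helper, not the proxy's actual code: returns the first valid
// IP found in the given headers, falling back to the remote address.
function resolveVisitorIP(array $server, array $ipHeaders): string {
    foreach ($ipHeaders as $name) {
        // PHP exposes request headers as HTTP_* keys, e.g. HTTP_X_FORWARDED_FOR.
        $key = 'HTTP_' . strtoupper(str_replace('-', '_', $name));

        if (empty($server[$key])) {
            continue;
        }

        // Take the first entry of a comma-separated list.
        $candidate = trim(explode(',', $server[$key])[0]);

        // Accept the value only if it is a well-formed IPv4 or IPv6 address.
        if (filter_var($candidate, FILTER_VALIDATE_IP) !== false) {
            return $candidate;
        }
    }

    return $server['REMOTE_ADDR'] ?? '';
}
```

It would be called as `resolveVisitorIP($_SERVER, ['X-Forwarded-For'])`. Note that clients can send this header themselves, so it should only be trusted when the proxy or load balancer in front of the server sets or overwrites it.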

Kugelschieber commented 2 weeks ago

I've released version 2 of the proxy, although it doesn't support setting the headers right now...

Can we close this issue for now?

npgfg commented 2 weeks ago

Yes. Sorry for the delay.

npgfg commented 1 week ago

I just got word that we can get the original IP via X-Forwarded-For, just as you expected. Hosting says X-Forwarded-For is a comma-separated list of addresses where the first one is the original visitor IP. Can you give me a hint on how to actually pass this address to the script?

Kugelschieber commented 1 week ago

Well, that's the issue with the PHP proxy currently. It only supports the remote IP address:

https://github.com/pirsch-analytics/pirsch-php-proxy/blob/master/p/proxy.php#L67

As you can see, I've started implementing header support but didn't finish yet. I guess I should do that right now to get it out of the way. I'll let you know when it's ready, probably later today :)

Kugelschieber commented 1 week ago

Alright, could you give it a shot?

https://github.com/pirsch-analytics/pirsch-php-proxy/releases/tag/v2.1.0

You need to update the configuration to include the header:

return (object) array(
    'ipHeader' => array('X-Forwarded-For'),
    'clients' => array(
        (object) array(
            'secret' => 'your-client-secret'
        )
    )
);

npgfg commented 1 week ago

I've deployed 2.1.0 on the server at content.xyz.de/p. The updated config looks like this:

return (object) array(
  'ipHeader' => array('X-Forwarded-For'),
  'clients' => array(
    (object) array(
      'id' => 'KUr...',
      'secret' => 'jFy...',
      'hostname' => 'content.xyz.de'
    )
  )
);

Script in the frontend head looks like this:

<script defer type="text/javascript"
      src="https://content.xyz.de/p/p/p.js.php"
      id="pianjs"
      data-hit-endpoint="https://content.xyz.de/p/p/pv.php"
      data-event-endpoint="https://content.xyz.de/p/p/e.php"
      data-session-endpoint="https://content.xyz.de/p/p/s.php"></script>

When I visit the frontend, I see one related request:

[Screenshot: requests]

Nothing in the dashboard yet. Am I missing something?

Kugelschieber commented 1 week ago

Hmm, looks good to me. It might take a few seconds before the page view appears on the dashboard though.

If it doesn't, could you add a PHP file with the following code to see if it spits out your real IP address?

<?php

function parseXForwardedForHeader($value) {
    if (!isset($value)) {
        return '';
    }

    $parts = explode(',', $value);

    if (count($parts) > 0) {
        return cleanIP(trim($parts[0]));
    }

    return '';
}

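function cleanIP($ip) {
    // Strip a trailing port from an IPv4 address ("203.0.113.7:1234").
    // IPv6 addresses contain several colons and are returned unchanged.
    if (substr_count($ip, ':') === 1) {
        // A limit of 2 splits into address and port; a limit of 1 would return the whole string.
        $parts = explode(':', $ip, 2);
        return $parts[0];
    }

    return $ip;
}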

if (array_key_exists('HTTP_X_FORWARDED_FOR', $_SERVER)) {
    echo parseXForwardedForHeader($_SERVER['HTTP_X_FORWARDED_FOR']);
} else {
    echo 'X-Forwarded-For header not found!';
}

npgfg commented 1 week ago

Looks good. Echoes my IP.

Kugelschieber commented 1 week ago

Perfect! It should track then. Are you behind a proxy or VPN or something? Could you send me a link to support@pirsch.io so that I can have a look?

npgfg commented 1 week ago

Just in case somebody else stumbles across this issue: stats work fine via the 2.1.0 proxy. The last issues were related to some special circumstances with the hosting.

Kugelschieber commented 1 week ago

Awesome! Thank you for your patience :)