pluginkollektiv / statify

Statify – statistics plugin for WordPress
https://wordpress.org/plugins/statify/
GNU General Public License v3.0
76 stars 22 forks source link

Empty target column in database #266

Open Zodiac1978 opened 1 year ago

Zodiac1978 commented 1 year ago

Describe the bug A user in the support forums reported that his website has empty target columns in the database: https://wordpress.org/support/topic/beste-inhalte-2/ (German)

What can cause this issue? Any idea?

Do I need to ask more about the environment, then please add a comment?

Zodiac1978 commented 1 year ago

The website has no redirect from http to https. Can this lead to this anomaly?

stklcode commented 1 year ago

~I don't see any reason why this should not~ The target might be empty, if there is no path in the URI, e.g. https://www.example.com was requested and not https://www.example.com/

While almost all browsers do, not all webservers enforce a / for empty paths. So the request might not be issued by humans, but we can't tell this for sure.

The target website does not seem to use JS tracking, so we use $_SERVER['REQUEST_URI']: https://github.com/pluginkollektiv/statify/blob/5fcea33069c8cbaf10edc0842f09f1d8d23c73ff/inc/class-statify-frontend.php#L48-L50

And we just have a fallback for failed sanitization, not for empty content: https://github.com/pluginkollektiv/statify/blob/5fcea33069c8cbaf10edc0842f09f1d8d23c73ff/inc/class-statify-frontend.php#L57-L59

(which I don't thinks is really elegant, because the JS-content might contain any kind of bogus data which fails sanitization, so I'd personally prefer null here and just don't count those requests for top lists - but that's not the point here)

But a few lines below we should opt-out on empty or invalid targets: https://github.com/pluginkollektiv/statify/blob/5fcea33069c8cbaf10edc0842f09f1d8d23c73ff/inc/class-statify-frontend.php#L65-L67

Zodiac1978 commented 1 year ago

As there is a "/" also in the list, I think it is a request from http which is empyting the referrer because of the default Referrer-Policy set to strict-origin-when-cross-origin which is not sending the referrer in this case:

Don't send the Referer header to less secure destinations (HTTPS→HTTP).

stklcode commented 1 year ago

We are not talking about an empty referrer, but an empty target... :thinking:

Zodiac1978 commented 1 year ago

Oh, I mixed this up ... sorry, you are right.

Zodiac1978 commented 1 year ago

The target might be empty, if there is no path in the URI, e.g. https://www.example.com/ was requested and not https://www.example.com/

But the mentioned website redirects from without to the one with trailing slash ...

stklcode commented 1 year ago

Only path that comes to my mind would be a valid URI like https://example.com/ with Home URL / and $wp_rewrite->use_trailing_slashes === false, i.e. this call https://github.com/pluginkollektiv/statify/blob/5fcea33069c8cbaf10edc0842f09f1d8d23c73ff/inc/class-statify-frontend.php#L95

with user_trailingslashit/)

    if ( $wp_rewrite->use_trailing_slashes ) {
        $url = trailingslashit( $url );
    } else {
        $url = untrailingslashit( $url );
    }

calling untrailingslashit().

No idea how to achieve this :shrug:

stklcode commented 1 year ago

But the mentioned website redirects from without to the one with trailing slash ...

No. Likely your browser does, but a call without slash with a slightly more primitive client gives a 200 response immediately:

$ curl -i https://**********.de
HTTP/2 200 
x-ua-compatible: IE=edge
link: <https://**********.de/wp-json/>; rel="https://api.w.org/", <https://**********.de/wp-json/wp/v2/pages/150>; rel="alternate"; type="application/json", <https://**********.de/>; rel=shortlink
cache-control: max-age=0
expires: Sun, 16 Jul 2023 17:04:45 GMT
vary: Accept-Encoding
content-type: text/html; charset=UTF-8
date: Sun, 16 Jul 2023 17:04:45 GMT
server: Apache

<!DOCTYPE html>
[...]

So here is no client-side redirection, but in general you are right, the path should always be root, i.e. /. Highly likely something else messed up the path in the process. Probably temporarily due to some reconfiguration.

I would recommend observing the behavior and if it does not grow, just ignore or remove the 31 entries.


And there seems to be a 301 redirection from HTTPS to HTTP now, maybe added after your advice:

$ curl -i http://**********.de 
HTTP/1.1 301 Moved Permanently
Date: Sun, 16 Jul 2023 17:08:04 GMT
Server: Apache
X-UA-Compatible: IE=edge
Expires: Sun, 16 Jul 2023 18:08:04 GMT
Cache-Control: max-age=3600
X-Redirect-By: WordPress
Upgrade: h2,h2c
Connection: Upgrade
Location: https://**********.de/
Content-Length: 0
Content-Type: text/html; charset=UTF-8
Zodiac1978 commented 1 year ago

And there seems to be a 301 redirection from HTTPS to HTTP now, maybe added after your advice:

This does WordPress if the Site/Home is set to https, for pages WP doesn't do this by default: http://example.de/impressum/ still does not redirect to https.