milesmcc / shynet

Modern, privacy-friendly, and detailed web analytics that works without cookies or JS.
Apache License 2.0

All hosts showing up as CLOUDFLARENET (true IP not being analyzed) #190

milesmcc opened this issue 2 years ago (status: open)

milesmcc commented 2 years ago

This is what I currently see in the session-level logs:

[Screenshot of the session-level logs]

Anyone else run into this issue recently with Cloudflare?

CasperVerswijvelt commented 2 years ago

I saw something similar a while ago. I no longer have the data, since I cleared those entries using the admin panel, but there was an unusually high amount of traffic (relative to my normal stats) coming from CLOUDFLARENET. I remember there being a couple of concurrent sessions at all times during this phenomenon, with most sessions (possibly all of them, I'm not sure) being just a single hit. It stopped as randomly as it started, so I think it may have been an issue on Cloudflare's end, during crawling or something similar.

Maybe the same thing is happening to you right now.

CasperVerswijvelt commented 2 years ago

To expand on my previous comment, my shynet instance is behind cloudflare, while the tracked site is not.

agucova commented 2 years ago

This is weird; I've always used Cloudflare (both as a reverse proxy and through Cloudflare Pages) and I've never seen it. This could be Cloudflare's browser analytics or the old Always Online feature (which is now managed by the Internet Archive).

Cloudflare usually sends a descriptive user agent. Do you still have the issue, and do you have the user agents for those sessions? @milesmcc

I'm actually having a similar issue, but with the Internet Archive's ASN. I presumed it was Cloudflare's Always Online, but the frequency doesn't match and the user agent is "Chrome".

[Screenshot of sessions from the Internet Archive ASN]

agucova commented 2 years ago

If it's a crawler, this could be fixed by modifying the bot filter to also filter out Cloudflare's and the Internet Archive's ASNs (though we would need to be careful not to block users of Cloudflare's WARP VPN).
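A minimal sketch of what such a filter could look like, assuming each session record carries an ASN number and a user-agent string. The function and constant names are hypothetical and this is not Shynet's actual bot filter. AS13335 is Cloudflare's ASN; AS7941 is assumed here to be the Internet Archive's and should be verified. Because WARP users can also appear under Cloudflare's network, the sketch only drops AS13335 sessions when the user agent also looks automated, while Internet Archive sessions (which, as noted above, can report a plain "Chrome" user agent) are dropped unconditionally.

```python
from typing import Optional

# Hypothetical ASN-based crawler filter; not Shynet's actual bot filter.
CLOUDFLARE_ASN = 13335        # Cloudflare, Inc.
INTERNET_ARCHIVE_ASN = 7941   # assumed Internet Archive ASN; verify before use

# ASNs whose sessions are always treated as crawler traffic.
ALWAYS_FILTER = {INTERNET_ARCHIVE_ASN}

# ASNs that also carry real users (e.g. WARP), so they are only
# filtered when the user agent looks automated as well.
FILTER_IF_BOT_UA = {CLOUDFLARE_ASN}

# Simple keyword heuristics for automated user agents.
BOT_UA_HINTS = ("bot", "crawler", "spider", "cloudflare")


def looks_like_bot_ua(user_agent: str) -> bool:
    """Return True if the user agent contains a common crawler keyword."""
    ua = (user_agent or "").lower()
    return any(hint in ua for hint in BOT_UA_HINTS)


def should_filter_session(asn: Optional[int], user_agent: str) -> bool:
    """Decide whether a session should be dropped as crawler traffic."""
    if asn in ALWAYS_FILTER:
        return True
    if asn in FILTER_IF_BOT_UA:
        return looks_like_bot_ua(user_agent)
    return False
```

The split into two sets is a design choice for the WARP concern: a blanket ASN block is simple but would silently discard real visitors who browse through Cloudflare's network.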

I'm wary of doing this without understanding why it happens, though. I'll try putting my web server in debug mode to collect more information on those sessions.

null-domain commented 2 years ago

Currently running into the same issue with all networks showing as CLOUDFLARENET. For me, the first occurrence is April 4th, although something may have happened between March 18th and then, as March 18th is the last entry I have that doesn't appear as CLOUDFLARENET.

milesmcc commented 2 years ago

Maybe Cloudflare changed their headers? We do get the origin IP... https://github.com/milesmcc/shynet/blob/master/shynet/analytics/views/ingress.py#L28

null-domain commented 2 years ago

It's definitely something on the Cloudflare end of things. My shynet instance is currently proxied behind Cloudflare; switching to "DNS only" starts returning correct network names and information. Switching it back starts returning entries for CLOUDFLARENET again.

danya02 commented 1 year ago

Since Cloudflare is a proxy, a Cloudflare server will download the page from your server, process it in some way, then send it to the user. Because of this, if you look at the IP address, it will seem that only Cloudflare ever visits your site.

Cloudflare provides information about the request it is proxying through a set of headers. For this discussion, the important ones are X-Forwarded-For and CF-Connecting-IP. Cloudflare recommends using the latter. X-Forwarded-For is a comma-separated list that each proxy along the way may append to (or overwrite), whereas CF-Connecting-IP is set only by Cloudflare and holds a single address. So if Cloudflare is your first proxy and every proxy after it passes the header through unchanged, your app can simply read CF-Connecting-IP to get the user's IP.
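A minimal sketch of that header precedence, written against a Django-style `request.META` dictionary, where Django exposes the `CF-Connecting-IP` header as `HTTP_CF_CONNECTING_IP` and `X-Forwarded-For` as `HTTP_X_FORWARDED_FOR`. The function name is hypothetical and this is not Shynet's actual ingress code.

```python
from typing import Mapping


def client_ip_from_meta(meta: Mapping[str, str]) -> str:
    """Pick the visitor IP from Django-style request metadata.

    Order of preference (a sketch, assuming the app sits behind Cloudflare):
    1. CF-Connecting-IP: a single address set by Cloudflare.
    2. X-Forwarded-For: a comma-separated chain; the leftmost entry is the
       client as reported to the first proxy (and can be spoofed).
    3. REMOTE_ADDR: the direct peer, i.e. the nearest proxy when proxied.
    """
    cf_ip = meta.get("HTTP_CF_CONNECTING_IP", "").strip()
    if cf_ip:
        return cf_ip

    forwarded = meta.get("HTTP_X_FORWARDED_FOR", "")
    if forwarded:
        first = forwarded.split(",")[0].strip()
        if first:
            return first

    # With no forwarding headers, the TCP peer is the best we have.
    return meta.get("REMOTE_ADDR", "")
```

When neither header is present and the app is proxied, the last fallback returns the proxy's address, which is exactly the CLOUDFLARENET symptom reported in this issue.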

Headers can be spoofed, of course, but if you know that your app is only ever going to be accessed through Cloudflare (which you can ensure with an intermediate proxy that will only accept connections from Cloudflare's IP ranges), then you can use that to figure out the user's IP.
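If the app receives connections directly from Cloudflare, the same guard can be applied in the app itself: trust `CF-Connecting-IP` only when the direct peer address falls inside Cloudflare's IP ranges. A minimal sketch using Python's standard `ipaddress` module; the range list is a small hard-coded excerpt for illustration only and should be replaced with Cloudflare's current published list (https://www.cloudflare.com/ips/). Function and constant names are hypothetical.

```python
import ipaddress
from typing import Mapping

# Partial, illustrative excerpt of Cloudflare's published ranges.
# Replace with the full, current list before relying on this.
CLOUDFLARE_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "173.245.48.0/20",
        "104.16.0.0/13",
        "172.64.0.0/13",
        "162.158.0.0/15",
        "2606:4700::/32",
    )
]


def is_cloudflare_peer(addr: str) -> bool:
    """Return True if addr parses as an IP inside one of the listed ranges."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return False
    # An IPv4 address tested against an IPv6 network (or vice versa) is False.
    return any(ip in net for net in CLOUDFLARE_NETWORKS)


def trusted_client_ip(meta: Mapping[str, str]) -> str:
    """Use CF-Connecting-IP only when the direct peer is a Cloudflare address."""
    peer = meta.get("REMOTE_ADDR", "")
    cf_ip = meta.get("HTTP_CF_CONNECTING_IP", "").strip()
    if cf_ip and is_cloudflare_peer(peer):
        return cf_ip
    # Otherwise the header may be spoofed, so fall back to the peer address.
    return peer
```

In a setup like the Traefik one described below, the app's direct peer is the reverse proxy rather than Cloudflare, so this check would belong in the proxy instead, as the paragraph above suggests.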

For my part, I'm running Shynet in a container behind Traefik, and Traefik is behind Cloudflare. I've run into the same problem, so as a workaround I'll clone the repo and build a custom version of the Shynet application container with the user's IP detection method changed to instead just use the CF-Connecting-IP header.
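One possible shape for that change, sketched as a Django-style middleware that rewrites the client address from `CF-Connecting-IP` before any view runs. The class name is hypothetical, and it has not been checked which of `REMOTE_ADDR` or `X-Forwarded-For` Shynet's ingress code reads, so the sketch overwrites both. It should only be enabled when every request is guaranteed to pass through Cloudflare, since the header is otherwise client-controlled.

```python
class CloudflareConnectingIPMiddleware:
    """Hypothetical middleware: trust CF-Connecting-IP as the client address.

    Only safe when every request is guaranteed to arrive via Cloudflare,
    because a direct client could otherwise set this header to any value.
    """

    def __init__(self, get_response):
        # Django passes the next handler in the chain.
        self.get_response = get_response

    def __call__(self, request):
        cf_ip = request.META.get("HTTP_CF_CONNECTING_IP", "").strip()
        if cf_ip:
            # Overwrite both values so code that reads either one
            # sees the visitor's address instead of a proxy's.
            request.META["REMOTE_ADDR"] = cf_ip
            request.META["HTTP_X_FORWARDED_FOR"] = cf_ip
        return self.get_response(request)
```

Such a class would be registered by adding its dotted path to Django's `MIDDLEWARE` setting, near the top of the list so later middleware and views see the rewritten values. A middleware keeps the change out of the ingress view itself, which may make a custom build easier to rebase onto upstream.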