graphiclunarkid closed this issue 10 years ago
The nightly results dump contains everything. We can change this. The ISP display list was edited down to ORG probe ISPs some time shortly after the nightly dumps were implemented, so the change didn't propagate there.
It contains the probe ID, purely for the purposes of allowing analysts to assess accuracy by probe, but no other information about the probe is provided.
Another related question is, should we reject result submissions for ISPs that we don't have rules for? We've collected results for wireless providers that might perform some blocking, but without having added block-detection rules to the probe config, they will never return a "blocked" result.
The spreadsheet shown in the screenshot isn't the nightly dump, though; it's a one-off export of the Alexa 100k URLs that was produced just before the site relaunch. The nightly dump contains timestamp information for the results, while the snapshot was essentially a pivot table showing the current status as of that date. The results may have changed since then.
Prior to go-live, we did run most of the Alexa 100k against Virgin Mobile. However, we don't currently have a probe connected to that mobile provider, so there won't be live results for it. When we chose to limit the list of ISPs displayed to the ORG ones, we limited it to live ORG probes.
OK, so it looks like the site "adultesextube.com" was actually OK on Virgin Mobile, and we are confident of that result. The site is actually a redirect to another porn site, so that may be why it was detected as OK; that's just a guess, however.
If I remember correctly, we follow 301 and 302 redirects, and report results according to where we end up. I'm not sure how this works with respect to storing results for the original URL, though, nor do I know whether we treat temporary and permanent redirects differently.
@dantheta I agree we should reject submissions for networks where we can't detect blocking. Otherwise, don't we risk polluting the database with false-negative reports?
@jimkillock Have you checked this result in the nightly data-dumps?
On Fri, Jul 18, 2014 at 08:11:32AM -0700, Richard King wrote:
> If I remember correctly, we follow 301 and 302 redirects, and report results according to where we end up. I'm not sure how this works with respect to storing results for the original URL, though, nor do I know whether we treat temporary and permanent redirects differently.
We don't store a separate status for the intermediate locations. If we detect site1 -> (302) -> site2 -> blocked, then the status for site1 is blocked, and nothing extra is stored for site2, unless a request is submitted for site2's url separately. We don't treat permanent and temporary redirects differently.
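As a rough illustration of the behaviour described above (a sketch only, not the actual probe code): the status found at the end of the redirect chain is recorded against the original URL, and nothing is stored for the intermediate hops. The names `resolve_status`, `record`, and the `fetch` callback are hypothetical and exist only for this example.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical fetch result: (HTTP status code, redirect target or None, blocked?)
FetchResult = Tuple[int, Optional[str], bool]


def resolve_status(url: str,
                   fetch: Callable[[str], FetchResult],
                   max_hops: int = 10) -> str:
    """Follow 301/302 redirects and return the status for the ORIGINAL url."""
    current = url
    for _ in range(max_hops):
        code, location, blocked = fetch(current)
        if blocked:
            return "blocked"
        # Permanent (301) and temporary (302) redirects are handled identically.
        if code in (301, 302) and location:
            current = location
            continue
        return "ok" if code < 400 else "error"
    return "error"  # assumption: give up after too many redirects


def record(results: Dict[str, str], url: str,
           fetch: Callable[[str], FetchResult]) -> None:
    """Store one status, keyed by the submitted URL only."""
    results[url] = resolve_status(url, fetch)
```

With site1 -> (302) -> site2 -> blocked, calling `record(results, site1, fetch)` leaves a single "blocked" entry for site1 and no entry for site2.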
> @dantheta I agree we should reject submissions for networks where we can't detect blocking. Otherwise, don't we risk polluting the database with false-negative reports?
OK - I'll open a new issue for this one, since I want to make sure that we've addressed all of the surrounding issues. I think not accepting results (or even setting up a queue) for unknown ISPs is the right call. It will keep the results much cleaner, and will also save us some memory/disk space as we won't have to maintain queues for those ISPs.
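A minimal sketch of what rejecting submissions for unknown ISPs could look like, assuming results arrive as simple dictionaries and that we keep a set of ISPs with block-detection rules. `accept_submission`, `UnknownISPError`, and the in-memory `store` are hypothetical names for illustration, not the real API.

```python
from typing import Dict, List, Set


class UnknownISPError(ValueError):
    """Raised when a result is submitted for an ISP we have no rules for."""


def accept_submission(isp: str,
                      result: dict,
                      known_isps: Set[str],
                      store: Dict[str, List[dict]]) -> None:
    """Store a result only if the ISP has block-detection rules."""
    if isp not in known_isps:
        # Reject before creating any queue or storage for this ISP.
        raise UnknownISPError(f"no block-detection rules for ISP: {isp}")
    store.setdefault(isp, []).append(result)
```

Checking the ISP before touching `store` means no queue or result list is ever created for an ISP without rules.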
> @jimkillock Have you checked this result in the nightly data-dumps?
The result we got for Virgin Mobile for that URL was from 2014-05-17; it might have changed since then.
The results for the extra ISPs have been archived and purged. Virgin Mobile results are still there, as they were gathered by an ORG probe and we may run a probe against that network again in the future.
There seems to be a discrepancy between the data snapshot and the results as reported in the front end.
https://www.blocked.org.uk/results?url=http://www.adulterfree.com
https://www.blocked.org.uk/results?url=http://adultesextube.com
http://adultesextube.com appears to be a redirect page to http://winx-xxx.com (a possible reason it wouldn’t be blocked?)
I'm not sure of the status of the probe behind the Virgin Mobile entry below. It's not listed in the front-end results above, so it's possible that it is a "user result" that we don't yet trust.
We might want to advise on the status of some of these results, as the snapshot is reporting results that aren't available in the front end, but as it is just a dump, it's not clear what these results are or mean.
Perhaps we should consider restricting the data dump to official probes only to match the website?
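One way this restriction could work, sketched under the assumption that each dump row carries a probe ID and that we have a list of official probe IDs. The field name `probe_id` and the function `filter_official` are hypothetical; the real dump format may differ.

```python
from typing import Iterable, List, Set


def filter_official(rows: Iterable[dict],
                    official_probe_ids: Set[str]) -> List[dict]:
    """Keep only the rows that were produced by an official probe."""
    return [row for row in rows if row["probe_id"] in official_probe_ids]
```

Applied just before the dump is written, this would make the export match the ISP list shown on the website.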