internetstandards / Internet.nl

Internet standards compliance test suite
https://internet.nl

External use of Internet.nl #1255

Open bwbroersma opened 7 months ago

bwbroersma commented 7 months ago
simonbesters commented 6 months ago

Hi, product owner of Digital Insights Platform here. We are happy to add (more) content about internet.nl within our product. For now it reads:

To determine the security score, we employ the standards and benchmarks established by the Dutch government. These are tested through the platform internet.nl.

bwbroersma commented 6 months ago

Hey @simonbesters, thanks for joining the conversation. Currently zero attribution is required, but we're thinking of some 'advised' attribution to make some clearer distinction between Internet.nl and tools using Internet.nl results.

I wondered: do you currently 'scrape' the site or do you use the json REST API via batch.internet.nl?

BTW Internet.nl is in the process of moving the API server to Docker (see #1253). There are still a few release blockers, but after that it should be fairly easy to set up a private Internet.nl-API instance (and then you could also set up a brand in the scraper UA, see #1257).

simonbesters commented 6 months ago

We scrape, outside office hours.

bwbroersma commented 4 months ago

> We scrape, outside office hours.

@simonbesters: Please note the new rule number 7 of the application form (which is not yet deployed on the online application form).

dipnluser commented 4 months ago

Hi @bwbroersma, DIP programmer here. We really like the 'scraping' method, because it gives us a nice internet.nl page to give to the user (like a municipality CISO). We poll the request status very slowly, and we visit the result page only once. We request about 300 domains per day, in random order (in our checks queue), so your server gets them spread across 6 - 12 hours. We scrape some key results from the result page (e.g. https://internet.nl/site/www.waterland.nl/2722343/) and save the permalink, and that's what we give the user if they want to see why their website didn't get a 100%. We really need that internet.nl result page for the CISO. I don't think the API makes one, or does it? And if it does, why use the API instead of the front-end?
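
For illustration only, the "poll very slowly, fetch the result once" pattern described above might look roughly like the sketch below. It is not DIP's actual code: `get_status` is a hypothetical callable standing in for whatever request-status check is used, and the `"done"` value, the 30-second interval, and the attempt limit are all assumptions.

```python
import time
from typing import Callable, Optional


def poll_slowly(
    get_status: Callable[[], str],
    interval_s: float = 30.0,          # assumed interval, not taken from the thread
    max_attempts: int = 20,            # assumed give-up limit
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Call get_status() until it returns "done".

    Returns the number of status checks made, or None if the limit was reached.
    The caller fetches the result page once, only after a non-None return.
    """
    for attempt in range(1, max_attempts + 1):
        if get_status() == "done":     # "done" is a hypothetical status value
            return attempt
        sleep(interval_s)              # wait before asking again
    return None


if __name__ == "__main__":
    # Fake status source so the sketch runs without any network access.
    statuses = iter(["running", "running", "done"])
    checks = poll_slowly(lambda: next(statuses), sleep=lambda s: None)
    print(f"finished after {checks} status checks")
```

Injecting `sleep` keeps the loop testable without real waiting.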

bwbroersma commented 4 months ago

See other users of the API: batch calls will, next to JSON, also give a result page, e.g. https://batch.internet.nl/site/www.rijksoverheid.nl/5899563/ in this case. The benefit of using the API is that it performs fair scheduling and makes better use of the resources: you can make 1 request for 300 domains, or 2000 domains, and don't have to guess how best to handle the scheduling. Furthermore, the batch resources are separate from the single test (internet.nl) instance, so large batches will never slow down regular users of the site.
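
To make "1 request for 300 domains" concrete, here is a minimal sketch of folding a day's queue into one payload per test type. The field names `type`, `name`, and `domains` and the labels `web` and `mail` are assumptions for illustration; the thread does not specify the batch API's request format, so check the API documentation before relying on them. Nothing is sent over the network here.

```python
from typing import Dict, Iterable, List


def build_batch_payload(domains: Iterable[str], test_type: str, name: str) -> Dict[str, object]:
    """Build one batch payload from a queue of domains.

    Domains are trimmed, lower-cased, and de-duplicated while keeping their order.
    The keys used here are hypothetical, not the confirmed API schema.
    """
    cleaned = (d.strip().lower() for d in domains if d.strip())
    unique: List[str] = list(dict.fromkeys(cleaned))
    return {"type": test_type, "name": name, "domains": unique}


if __name__ == "__main__":
    queue = ["www.waterland.nl", "WWW.Waterland.nl ", "www.rijksoverheid.nl"]
    # Two payloads covering every domain, instead of two requests per domain.
    payloads = [build_batch_payload(queue, kind, "daily-check") for kind in ("web", "mail")]
    for payload in payloads:
        print(payload["type"], len(payload["domains"]), "domains")
```

With this shape, 300 queued domains become 2 requests (site and mail) rather than 300 x 2.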

dipnluser commented 4 months ago

What do you mean, scheduling? Our jobs run synchronously, so we will wait for results. (About 20 sec on average in 2024 and 2023.) Will the batch api requests be much slower, or more unpredictable? We will still do requests per 1 domain, not all 300 at the same time, because the queue doesn't know that. If possible we'll ask mail and site for 1 domain in the same request, so 300 requests x 2, but not 2 requests x 300.

dipnluser commented 4 months ago

From the TOS:

Causing heavy loads for the Service makes things slower for other users. We therefore request the users to honor the following ‘fair use’ rules:

  • Maximum 2 batch requests per week;
  • Per batch request a maximum of 5000 domain names;

That sounds like a problem... Even if I completely change the way the queue works, it would be 7 batches per week. And users do ad hoc tests for a single site (site & mail), so there would also be batches of 1, or those would still use the scraping method. I'm gonna sleep on it. I've requested API access, and I'll give the batch API a try soon.
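
The arithmetic behind that concern can be sketched as below, using the two limits quoted from the TOS. How the Service counts batches in practice (for example, whether site and mail count separately) is not stated in the thread, so the check is a simplification, and the pooled weekly schedule is a hypothetical alternative rather than something proposed here.

```python
MAX_BATCHES_PER_WEEK = 2      # from the quoted fair-use rules
MAX_DOMAINS_PER_BATCH = 5000  # from the quoted fair-use rules


def fits_fair_use(batches_per_week: int, domains_per_batch: int) -> bool:
    """Check a schedule against the two quoted limits (simplified reading)."""
    return (
        batches_per_week <= MAX_BATCHES_PER_WEEK
        and domains_per_batch <= MAX_DOMAINS_PER_BATCH
    )


if __name__ == "__main__":
    domains_per_day = 300  # figure mentioned earlier in the thread

    # One batch per day: small batches, but 7 of them per week.
    print("daily batches:", fits_fair_use(7, domains_per_day))

    # Hypothetical alternative: pool a whole week (7 * 300 = 2100 domains) into one batch.
    print("weekly pooled batch:", fits_fair_use(1, 7 * domains_per_day))
```

The daily schedule fails on the batch count while staying far below the size limit, which is exactly the mismatch raised above.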

dipnluser commented 1 month ago

@bwbroersma Our batch API implementation is live, and it works beautifully ❤️ so this should unload the internet.nl instance somewhat. Thanks guys.