iRail / hyperRail

The repo for the iRail.be webapp
https://iRail.be
Creative Commons Zero v1.0 Universal

IP Address blocked from irail.be, api.irail.be #286

Closed PolMrt closed 6 years ago

PolMrt commented 6 years ago

Hi, I have an app that collects data from your API to build delay analytics. Since yesterday at 2 pm, it doesn't work anymore. My app requests all the departures from each station every 30 minutes. Have I been blocked?

Bertware commented 6 years ago

Hi!

Thanks for getting in touch. The IP address was blocked yesterday, as you were making a huge number of requests which were returning a 503 error. At the moment you were blocked, we had already received 6000 invalid requests. We expect you to treat this status code as a sign to lower the number of requests, as returning that 503 error still uses server resources which we'd like to keep available to other users.

image

As you can see from this excerpt, you were polling the same vehicle over and over, multiple times per second. These are only the blocked requests; there were still 3 requests per second passing through.

Please update your code to do the following:

Please change your user agent so we can contact you should this happen again in the future. Your user agent is currently empty, so we had no lead on who was causing these requests.

If you need to retrieve information on stations:

If you need to retrieve information on trains, I suggest the following strategy:

Your IP address was unblocked for a moment at 10:38 today, and your server started a never-ending flow of requests, so I blocked it again. Please fix your code, as this behaviour will always result in a block. In comparison, a typical client triggers fewer than a hundred 503 responses a day. You just caused 4500 blocked requests in the first minute. You can resolve this by implementing throttling on your side, or by aborting requests when you receive a 503. Storing failed requests in a queue to rerun when the server comes back online also needs throttling, or you'll run into this issue again.

image
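To make the throttling advice concrete, here is a minimal sketch of a polling loop that waits between requests and stops at the first 503. It is only an illustration under assumptions: `pollThrottled`, the `$fetch` callable (takes a URL, returns the HTTP status code) and the one-second default interval are hypothetical names and values, not part of the iRail API or its documented limits.

```php
<?php
/**
 * Fetch each URL in turn, waiting at least $minIntervalSeconds between
 * requests, and stop as soon as the server answers 503.
 *
 * $fetch is a hypothetical callable: it receives a URL and returns the
 * HTTP status code. $sleep is injectable so the loop can be exercised
 * without actually waiting.
 *
 * Returns the URLs that still need to be fetched later, starting with
 * the one that received the 503. An empty array means everything was sent.
 */
function pollThrottled(array $urls, callable $fetch, float $minIntervalSeconds = 1.0, ?callable $sleep = null): array
{
    $sleep = $sleep ?? static function (float $seconds): void {
        usleep((int) round($seconds * 1000000));
    };

    $urls = array_values($urls);
    foreach ($urls as $index => $url) {
        if ($index > 0) {
            // Throttle: pause before every request except the first.
            $sleep($minIntervalSeconds);
        }
        if ($fetch($url) === 503) {
            // Abort instead of hammering the server; the caller decides
            // when to resume with the remaining URLs.
            return array_slice($urls, $index);
        }
    }
    return [];
}
```

The returned list is the retry queue. Feeding it back through the same function after a much longer pause (minutes rather than seconds) keeps the retries throttled as well.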

Feel free to ask for more specific advice if you need help optimizing your queries. Please comment in this issue when you have resolved this behaviour to get the IP unblocked.

See also https://docs.irail.be/#header-best-practices-when-using-the-api

We know the rate limiting might be annoying, but we're trying to keep the data available for everyone. We're working on a new system which will allow you to get all this data every few seconds with simple requests, instead of having to search every train. However, we're still ironing out the kinks, and it might take a few months before it's stable and 100% correct. Follow us on Twitter, our blog, or Gitter to stay updated and get notified when the new system is available.

You can read more here: https://hello.irail.be/2018/03/16/one-million-daily-requests-where-do-they-come-from-and-how-well-cope-with-them/

PolMrt commented 6 years ago

Can you help me configure the user agent in PHP, please?

Bertware commented 6 years ago

How you set the user agent in PHP depends on the method you're using to retrieve data. Below I describe how to set your user agent for file_get_contents and cURL, as I suspect you're using one of those to download data.

Setting your user agent using file_get_contents:

$options  = array('http' => array('user_agent' => 'custom user agent string'));
$context  = stream_context_create($options);
$response = file_get_contents('http://domain/path/to/uri', false, $context);

source: https://joshtronic.com/2013/06/04/specifying-a-user-agent-when-using-file_get_contents/
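If you use file_get_contents, you also need a way to notice a 503 so you can stop sending requests. The sketch below shows one possible approach, reading the status line from `$http_response_header`, which PHP's HTTP stream wrapper fills in after the call. The helper names `httpStatusFromHeaders` and `fetchWithStatus` and the 10-second timeout are my own illustrative choices.

```php
<?php
/**
 * Extract the final HTTP status code from the header lines that PHP's
 * HTTP stream wrapper exposes in $http_response_header.
 * Returns null when no status line is present.
 */
function httpStatusFromHeaders(array $headerLines): ?int
{
    $status = null;
    foreach ($headerLines as $line) {
        // Status lines look like "HTTP/1.1 503 Service Unavailable".
        if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1]; // keep the last one, e.g. after redirects
        }
    }
    return $status;
}

/** Fetch a URL with a custom user agent; returns [status, body]. */
function fetchWithStatus(string $url, string $userAgent): array
{
    $context = stream_context_create(['http' => [
        'user_agent'    => $userAgent,
        'ignore_errors' => true, // return the body on 4xx/5xx instead of failing
        'timeout'       => 10,   // seconds; an arbitrary example value
    ]]);
    $body = @file_get_contents($url, false, $context);
    // The HTTP wrapper populates $http_response_header in the local scope.
    $status = isset($http_response_header)
        ? httpStatusFromHeaders($http_response_header)
        : null;
    return [$status, $body === false ? null : $body];
}
```

When the returned status is 503, the caller should stop and wait rather than retry straight away.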

Setting your user agent using cURL:

curl_setopt($curl, CURLOPT_USERAGENT, 'your user agent here');
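For context, this is roughly how that option fits into a complete cURL request. Treat it as a sketch: `buildUserAgent` and `curlFetch` are hypothetical helpers, the "app/version (website; email)" layout simply mirrors the user agent that shows up in the logs later in this thread, and the 10-second timeout is an arbitrary example value.

```php
<?php
/**
 * Build a user agent of the form "app/version (website; email)".
 * All four values are placeholders to replace with your own details.
 */
function buildUserAgent(string $app, string $version, string $website, string $email): string
{
    return sprintf('%s/%s (%s; %s)', $app, $version, $website, $email);
}

/** Fetch a URL with cURL; returns [status, body]. Requires the curl extension. */
function curlFetch(string $url, string $userAgent): array
{
    $curl = curl_init($url);
    curl_setopt_array($curl, [
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_TIMEOUT        => 10,   // seconds
    ]);
    $body   = curl_exec($curl);
    $status = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    return [$status, $body === false ? null : $body];
}
```

Building the user agent once and passing it to every request helps avoid the situation where some calls go out with an empty user agent.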

PolMrt commented 6 years ago

Here we are, I've updated everything! It should be good now.

Bertware commented 6 years ago

Your IP address has been unblocked. You are still exceeding the request limit, but there is already a significant improvement. This is a log of all blocked requests in one minute:

180.47 - - [22/Apr/2018:13:08:11 +0200] "GET /liveboard/?station=Poulseur&date=220418&time=1125&arrdep=departure&lang=fr&format=json&fast=false&alerts=false HTTP/1.0" 503 206 "-" "-" t:0.053 cgi:-
180.47 - - [22/Apr/2018:13:08:11 +0200] "GET /liveboard/?station=Sint-Joris-Weert&date=220418&time=1225&arrdep=departure&lang=fr&format=json&fast=false&alerts=false HTTP/1.0" 503 206 "-" "undefine/0.5 (delay.pol.tf; @)" t:0.050 cgi:-
180.47 - - [22/Apr/2018:13:08:12 +0200] "GET /liveboard/?station=Morlanwelz&date=220418&time=1025&arrdep=departure&lang=fr&format=json&fast=false&alerts=false HTTP/1.0" 503 206 "-" "-" t:0.051 cgi:-
180.47 - - [22/Apr/2018:13:08:12 +0200] "GET /liveboard/?station=Ruisbroek-Sauvegarde&date=220418&time=1155&arrdep=departure&lang=fr&format=json&fast=false&alerts=false HTTP/1.0" 503 206 "-" "-" t:0.053 cgi:-
180.47 - - [22/Apr/2018:13:08:12 +0200] "GET /liveboard/?station=Ninove&date=220418&time=1055&arrdep=departure&lang=fr&format=json&fast=false&alerts=false HTTP/1.0" 503 206 "-" "-" t:0.052 cgi:-
180.47 - - [22/Apr/2018:13:08:26 +0200] "GET /liveboard/?station=Pepinster&date=220418&time=1125&arrdep=departure&lang=fr&format=json&fast=false&alerts=false HTTP/1.0" 503 206 "-" "-" t:0.053 cgi:-
180.47 - - [22/Apr/2018:13:08:27 +0200] "GET /liveboard/?station=Serskamp&date=220418&time=1225&arrdep=departure&lang=fr&format=json&fast=false&alerts=false HTTP/1.0" 503 206 "-" "undefine/0.5 (delay.pol.tf; contact@gitewy.net)" t:0.051 cgi:-
180.47 - - [22/Apr/2018:13:08:27 +0200] "GET /liveboard/?station=Michelau&date=220418&time=1025&arrdep=departure&lang=fr&format=json&fast=false&alerts=false HTTP/1.0" 503 206 "-" "-" t:0.054 cgi:-
180.47 - - [22/Apr/2018:13:08:27 +0200] "GET /liveboard/?station=Rochefort-Jemelle&date=220418&time=1155&arrdep=departure&lang=fr&format=json&fast=false&alerts=false HTTP/1.0" 503 206 "-" "-" t:0.053 cgi:-
180.47 - - [22/Apr/2018:13:08:28 +0200] "GET /liveboard/?station=Neerpelt&date=220418&time=1055&arrdep=departure&lang=fr&format=json&fast=false&alerts=false HTTP/1.0" 503 206 "-" "-" t:0.054 cgi:-
180.47 - - [22/Apr/2018:13:08:41 +0200] "GET /liveboard/?station=Muizen&date=220418&time=1055&arrdep=departure&lang=fr&format=json&fast=false&alerts=false HTTP/1.0" 503 206 "-" "-" t:0.051 cgi:-
180.47 - - [22/Apr/2018:13:08:42 +0200] "GET /liveboard/?station=Oostkamp&date=220418&time=1125&arrdep=departure&lang=fr&format=json&fast=false&alerts=false HTTP/1.0" 503 206 "-" "-" t:0.053 cgi:-
180.47 - - [22/Apr/2018:13:08:42 +0200] "GET /liveboard/?station=Scheldewindeke&date=220418&time=1225&arrdep=departure&lang=fr&format=json&fast=false&alerts=false HTTP/1.0" 503 206 "-" "undefine/0.5 (delay.pol.tf; contact@gitewy.net)" t:0.053 cgi:-
180.47 - - [22/Apr/2018:13:08:56 +0200] "GET /liveboard/?station=Pont de Bois&date=220418&time=1155&arrdep=departure&lang=fr&format=json&fast=false&alerts=false HTTP/1.0" 503 206 "-" "-" t:0.050 cgi:-
180.47 - - [22/Apr/2018:13:08:57 +0200] "GET /liveboard/?station=Moortsele&date=220418&time=1055&arrdep=departure&lang=fr&format=json&fast=false&alerts=false HTTP/1.0" 503 206 "-" "-" t:0.054 cgi:-
180.47 - - [22/Apr/2018:13:08:58 +0200] "GET /liveboard/?station=Marseille-Saint-Charles&date=220418&time=1025&arrdep=departure&lang=fr&format=json&fast=false&alerts=false HTTP/1.0" 503 206 "-" "-" t:0.050 cgi:-
180.47 - - [22/Apr/2018:13:08:58 +0200] "GET /liveboard/?station=Marne-la-Vall\xC3\xA9e - Chessy&date=220418&time=1025&arrdep=departure&lang=fr&format=json&fast=false&alerts=false HTTP/1.0" 503 206 "-" "-" t:0.050 cgi:-

Note: I redacted your e-mail address to prevent you from getting spam, but it is there, so that's good!

Please fix the following issues:

As your IP address is now unblocked, it should be easier for you to debug. Thanks for the quick fix!