searxng / searx-space

Statistics of the public SearX(NG) instances
https://searx.space/
GNU Affero General Public License v3.0

Response times are server based and not based on the location of the visitor #1

Open unixfox opened 4 years ago

unixfox commented 4 years ago

Currently, the response times are measured from a single server, and therefore from a single location in the world. As a result, some Searx instances may rank poorly simply because they are far away from the server that serves the statistics.

One way to fix that would be to test every Searx instance from multiple locations. The problem is that renting servers that run all the time in multiple locations around the world is expensive.

So I think the best way to test each Searx instance without paying anything would be to use Zeit's CDN. They have quite a lot of servers in different locations around the world: https://zeit.co/docs/v2/network/regions-and-providers/ We would just have to implement the test as a lambda function.

unixfox commented 4 years ago

I found a really cheap hosting provider that offers VPSes for only $2/year: https://hosting.gullo.me/pricing The VPSes support IPv6, run the latest version of Debian, and are available in New York, Chicago, Los Angeles, Canada, Germany and Finland. Unfortunately, this provider has no location in Asia, Africa or Oceania.

But according to serverhunter, it seems that there are other cheap VPS providers for the 3 missing regions:

dalf commented 4 years ago

I agree with this point. Thank you for all the suggestions.

I would like the configuration to be the same across the different locations:

About measurements from different locations:

Amazon EC2 seems to offer 750 hours for t2.micro (?)

unixfox commented 4 years ago

Yeah, AWS EC2 seems to be a good target for that, but they only offer one year of free t2.micro. After that, you are free to create as many accounts as you want, as long as you have a credit card that has not yet been used on their services.

dalf commented 4 years ago

Note about the VPS choice:

Other notes:

unixfox commented 4 years ago

most probably, whatever the hosting solution, each instance can't provide a free pass for the tests (bypass filtron).

Isn't that already the case? I mean, apart from my antibot: it's the only one that whitelists your IP.

Maybe we could introduce a token that is passed in the HTTP headers so that the anti bot solution knows that it's searx-stats2?
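
A minimal sketch of the token idea, in Python. Everything named here is an assumption made for illustration: the header name `X-Searx-Stats-Token`, the two helper functions, and the way the token is shared are not part of searx-stats2 or of any existing anti-bot tool.

```python
import hmac
import urllib.request

# Hypothetical header name, chosen for this sketch only.
TOKEN_HEADER = "X-Searx-Stats-Token"


def build_stats_request(url: str, token: str) -> urllib.request.Request:
    """Client side: attach the shared token to the outgoing request."""
    return urllib.request.Request(url, headers={TOKEN_HEADER: token})


def is_stats_request(headers: dict, expected_token: str) -> bool:
    """Server side: check the token sent by the client.

    `headers` is assumed to be a plain dict keyed by the exact header name.
    hmac.compare_digest compares in constant time, so the check does not
    leak how many leading characters of the token were correct.
    """
    received = headers.get(TOKEN_HEADER, "")
    return hmac.compare_digest(received.encode(), expected_token.encode())
```

The anti-bot layer would call something like `is_stats_request` before applying its usual rules, and fall back to normal filtering when the header is missing or wrong.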

unixfox commented 2 years ago

Now that we control the antibot solution in searxng, maybe we could have a way to bypass the limiter in order to do tests from multiple IP addresses around the world.

This could be a public text file in the searx-space repository listing all the whitelisted IP addresses. SearXNG would refresh this list every X hours and add the addresses to the Redis database.

For the searx(ng) instances that don't use the builtin limiter we would just test them normally from a single IP address.

Now that we have donations we can buy multiple VPS servers around the world.
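
The whitelist file idea above could look roughly like this in Python. The file format (one address or CIDR range per line, `#` for comments) and both function names are assumptions made for this sketch; the periodic refresh and the Redis storage are left out, and the sample addresses come from the reserved documentation ranges.

```python
import ipaddress


def parse_allowlist(text: str) -> list:
    """Parse one IP address or CIDR range per line; '#' starts a comment."""
    networks = []
    for line in text.splitlines():
        entry = line.split("#", 1)[0].strip()
        if entry:
            # strict=False also accepts a host address such as 192.0.2.10
            networks.append(ipaddress.ip_network(entry, strict=False))
    return networks


def is_allowlisted(client_ip: str, networks) -> bool:
    """Return True if client_ip falls inside any parsed network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in networks)


# Hypothetical file content, not a real list of monitoring servers.
SAMPLE = """
# searx-space monitoring servers (example data)
192.0.2.10
198.51.100.0/24   # example IPv4 range
2001:db8::/32     # example IPv6 range
"""

networks = parse_allowlist(SAMPLE)
print(is_allowlisted("198.51.100.7", networks))
print(is_allowlisted("203.0.113.5", networks))
```

An instance that uses the built-in limiter would skip its checks when `is_allowlisted` is true for the client address.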

dalf commented 11 months ago

Dumb idea using JS (since the website requires JS anyway): add a "Find the fastest(*) instance for me" button to searx.space.

The button makes a few requests to the front page of each selected instance, then sorts the table according to the response times.

The Resource Timing API can help with that.

It will never be as accurate as a continuous measurement of the response time, but I hope it can be a good approximation.

A possible improvement to the measurement: with the user's agreement, the JS stores the response times in local storage:

unixfox commented 11 months ago

@dalf you can't do that due to CORS

dalf commented 11 months ago

Here is a workaround: a script to run locally

curl https://gist.githubusercontent.com/dalf/66f8962460048d8d5a6d9b4eaeab197a/raw/a9c389d1721723ed1491b78b7bb7603f528bf4f9/findmyinstance.py | python

See: https://gist.github.com/dalf/66f8962460048d8d5a6d9b4eaeab197a

The script makes a few requests to the front page of each instance, and then displays the median and mean response times. It is far from perfect, but it gives an idea of the response times without guesswork.

It relies on Python, but it has no external dependencies and should work on any OS.
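
For readers who want the general shape of such a measurement without opening the gist, here is a minimal standard-library sketch. It is not the gist's code: the function names, the default of three attempts, and the injectable `fetch` parameter are assumptions made for illustration.

```python
import statistics
import time
import urllib.request


def fetch_front_page(url: str, timeout: float = 10.0) -> None:
    """Download the front page and discard the body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()


def measure(url: str, attempts: int = 3, fetch=fetch_front_page) -> dict:
    """Time several fetches of one URL and summarise them in seconds."""
    samples = []
    for _ in range(attempts):
        start = time.perf_counter()
        fetch(url)
        samples.append(time.perf_counter() - start)
    return {"median": statistics.median(samples), "mean": statistics.mean(samples)}


def rank(urls, attempts: int = 3, fetch=fetch_front_page) -> list:
    """Return (url, stats) pairs sorted by median, fastest first."""
    results = []
    for url in urls:
        try:
            results.append((url, measure(url, attempts, fetch)))
        except OSError:
            # URLError, connection failures and timeouts are OSError subclasses;
            # instances that fail this way are left out of the ranking.
            continue
    return sorted(results, key=lambda item: item[1]["median"])
```

Calling `rank()` with a list of instance URLs returns the fastest instance first, as seen from the machine that runs the script. The `fetch` parameter only exists so that the timing logic can be exercised without network access.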

jazzzooo commented 11 months ago

Sorry if this is a dumb suggestion, but why don't we just subtract the ping time from the response times to get the underlying time?
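
As a rough sketch of that subtraction in Python (the function name and the sample numbers are made up for illustration): it takes the median of each series and clamps the result at zero. Note that a request over a fresh HTTPS connection needs several network round trips, so subtracting one ping time removes only part of the network cost.

```python
import statistics


def estimate_server_time(response_times, ping_times) -> float:
    """Median response time minus median ping, clamped at zero (seconds)."""
    estimate = statistics.median(response_times) - statistics.median(ping_times)
    return max(estimate, 0.0)


# Made-up sample values in seconds, not real measurements.
responses = [0.42, 0.40, 0.47]
pings = [0.11, 0.10, 0.12]
print(round(estimate_server_time(responses, pings), 3))
```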