searx / searx

Privacy-respecting metasearch engine
https://searx.github.io/searx/
GNU Affero General Public License v3.0

Yandex Engine via XML #3210

Open ghost opened 2 years ago

ghost commented 2 years ago

Working URL to the engine: https://yandex.com

Why do you want to add this engine? It is the most unfiltered and complete search engine I could find.

Features of this engine: They don't filter results. Superior results compared to Google/Bing.

How can Searx fetch the information from this engine? Yandex.XML is a service that lets you send queries to the Yandex search engine and get responses in XML format: https://yandex.com/dev/xml/doc/dg/concepts/about.html

You need to register an IP but having the option would be amazing!
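For illustration, a minimal sketch of the request/parse flow such an engine would need. Everything specific here is an assumption, not taken from the Yandex.XML docs linked above: the endpoint path, the `user`/`key`/`query` parameter names, and the `doc`/`url`/`title` element layout are hypothetical placeholders, and `SAMPLE_XML` is a hand-written sample, not a real response. It only shows the general shape: build a URL, get XML back, pull out a list of results.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumed endpoint; verify against the Yandex.XML documentation before use.
BASE_URL = "https://yandex.com/search/xml"


def build_url(query, user, key):
    """Build a request URL. The parameter names are assumptions."""
    return BASE_URL + "?" + urlencode({"user": user, "key": key, "query": query})


# Hand-written sample with an assumed layout, not a real Yandex response.
SAMPLE_XML = """<yandexsearch version="1.0">
  <response>
    <results>
      <grouping>
        <group>
          <doc>
            <url>https://example.org/</url>
            <title>Example <hlword>Domain</hlword></title>
          </doc>
        </group>
      </grouping>
    </results>
  </response>
</yandexsearch>"""


def parse_results(xml_text):
    """Collect url/title pairs from every <doc> element in the response."""
    root = ET.fromstring(xml_text)
    results = []
    for doc in root.iter("doc"):
        title = doc.find("title")
        results.append({
            "url": doc.findtext("url", default=""),
            # itertext() flattens nested highlight tags into plain text
            "title": "".join(title.itertext()) if title is not None else "",
        })
    return results


if __name__ == "__main__":
    print(build_url("searx", "my-user", "my-key"))
    print(parse_results(SAMPLE_XML))
```

In an actual searx engine module, the URL building would go into the engine's `request` function and the parsing into its `response` function; the standalone functions above just keep the sketch runnable on its own.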

Applicable category of this engine: general, files, images, it, map, music, news, science, social media and videos.

Additional context: There is also the SerpApi API service, so they found a way to parse Yandex.

unixfox commented 2 years ago

The number of queries that can be made per day is very limited, though, unless you register a phone number.

Moreover, this requires registering an account, which is very bad for privacy, and only a few searx instances will do it.


ghost commented 2 years ago

We can use a pre-paid SIM card to register an account and get the key.

I think many will be willing to use the engine; at least having the option would be great 👍

unixfox commented 2 years ago

> We can use a pre-paid SIM card to register an account and get the key.
>
> I think many will be willing to use the engine; at least having the option would be great 👍

Well you could contribute to the code of this new engine :).

br4nnigan commented 2 years ago

I'm fine with 10 queries per day, I'm running my own instance.

dmigis commented 2 years ago

> We can use a pre-paid SIM card to register an account and get the key.
>
> I think many will be willing to use the engine; at least having the option would be great 👍

IMO, it is better to use virtual SIM services such as https://onlinesim.ru/ or sms-activate.org (depending on your budget). That removes the need for a physical prepaid SIM and for the cell phone itself.

dmigis commented 2 years ago

Also, they return only the first 1000 results per request (https://yandex.com/dev/xml/doc/dg/concepts/restrictions-new.html). I would prefer to parse Yandex using Selenium + headless Chrome, but I guess it is resource-consuming and in any case not suitable for a public instance: with multiple requests from one IP, even headless Chrome configured for stealth operation will run into a CAPTCHA or an instant ban. That can be solved with a pool of proxies (preferably with Russian IPs) or with mobile proxies (for example, https://mobileproxy.space/en/), but even that is no guarantee, and not every public instance owner will pay for such a service, so some will prefer to switch off Yandex support completely. If the developers are potentially ready to merge a PR for Selenium-based search engines, I'll take a shot at implementing such a thing as a POC.

unixfox commented 2 years ago

The current searx developers are not interested in adding any anti-privacy features (see https://github.com/searx/searx#is-searx-in-maintenance-mode), and a full browser falls into this category.