kitUIN / PicImageSearch

整合图片识别 API,用于以图搜源 / Aggregator for Reverse Image Search API
https://pic-image-search.kituin.fun/
MIT License

Class Google not working #56

Closed: ghost closed this issue 1 year ago

ghost commented 1 year ago

Google search doesn't work. Your library uses https://www.google.com/searchbyimage, but that URL no longer works; you need to search via Google Lens.

I would also like to see Yandex image search in your library; it searches by image better than Google, so please consider this option ^^

Examples (for URL):
Image URL: https://i1.sndcdn.com/avatars-000248291936-634zqi-t500x500.jpg
Google: https://lens.google.com/uploadbyurl?url=https://i1.sndcdn.com/avatars-000248291936-634zqi-t500x500.jpg
Yandex: https://yandex.com/images/search?rpt=imageview&url=https://i1.sndcdn.com/avatars-000248291936-634zqi-t500x500.jpg
TinEye: https://tineye.com/search?url=https://i1.sndcdn.com/avatars-000248291936-634zqi-t500x500.jpg
Bing: https://www.bing.com/images/searchbyimage?cbir=sbi&imgurl=https://i1.sndcdn.com/avatars-000248291936-634zqi-t500x500.jpg
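
For illustration, here is a minimal sketch of how these per-engine URLs can be assembled from a single image URL; the query templates simply mirror the examples above, and the image URL is percent-encoded before substitution:

import urllib.parse

image_url = "https://i1.sndcdn.com/avatars-000248291936-634zqi-t500x500.jpg"

# Templates mirroring the example links above; {} is replaced with the encoded image URL.
templates = {
    "Google Lens": "https://lens.google.com/uploadbyurl?url={}",
    "Yandex": "https://yandex.com/images/search?rpt=imageview&url={}",
    "TinEye": "https://tineye.com/search?url={}",
    "Bing": "https://www.bing.com/images/searchbyimage?cbir=sbi&imgurl={}",
}

encoded = urllib.parse.quote(image_url, safe="")
for engine, template in templates.items():
    print(f"{engine}: {template.format(encoded)}")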

I don't know how to do it for files, but it would be very useful.

NekoAria commented 1 year ago

Could you share the code you used for testing? I tested it using demo_google.py and it worked fine. Here's the code:

import asyncio

from loguru import logger

from PicImageSearch import Google, Network
from PicImageSearch.model import GoogleResponse
from PicImageSearch.sync import Google as GoogleSync

# proxies = "http://127.0.0.1:1081"
proxies = None
url = "https://i1.sndcdn.com/avatars-000248291936-634zqi-t500x500.jpg"
# file = "images/test03.jpg"

@logger.catch()
async def test() -> None:
    async with Network(proxies=proxies) as client:
        google = Google(client=client)
        resp = await google.search(url=url)
        # resp = await google.search(file=file)
        show_result(resp)
        resp2 = await google.goto_page(resp.get_page_url(2), 2)
        show_result(resp2)

def show_result(resp: GoogleResponse) -> None:
    # logger.info(resp.origin)  # Original Data
    # Results start at index 2, because that is where the matching images begin
    logger.info(resp.raw[2].origin)
    logger.info(resp.index)
    logger.info(resp.raw[2].thumbnail)
    logger.info(resp.raw[2].title)
    logger.info(resp.raw[2].url)
    logger.info(resp.page)
    logger.info("-" * 50)

if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    loop.run_until_complete(test())

Regarding Yandex support, I considered it before, but the main issue is that it might trigger Yandex SmartCaptcha (not Cloudflare interception, as I originally wrote), so it's currently on hold.

ghost commented 1 year ago

Even when using your code, I get the error. Even if you just follow this link in your browser, it redirects from "https://www.google.com/searchbyimage?image_url=https://i1.sndcdn.com/avatars-000248291936-634zqi-t500x500.jpg&safe=off" to "https://www.google.com/imghp?sbi=1".
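
(For reference, the redirect is easy to confirm outside the library; a quick sketch with httpx, used here only as a generic HTTP client:)

import httpx

# Hit the old endpoint without following redirects so the Location header is visible.
old_endpoint = (
    "https://www.google.com/searchbyimage"
    "?image_url=https://i1.sndcdn.com/avatars-000248291936-634zqi-t500x500.jpg&safe=off"
)
resp = httpx.get(old_endpoint, follow_redirects=False)
print(resp.status_code)              # expected to be a 3xx redirect
print(resp.headers.get("location"))  # e.g. https://www.google.com/imghp?sbi=1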

PS C:\test> & C:/Users/User/AppData/Local/Programs/Python/Python311/python.exe c:/test/text.py
2023-03-24 15:52:24.229 | ERROR    | asyncio.events:_run:80 - An error has been caught in function '_run', process 'MainProcess' (1064), thread 'MainThread' (5824):
Traceback (most recent call last):

  File "c:\test\text.py", line 40, in <module>
    loop.run_until_complete(test())
    │    │                  └ <function test at 0x00000242435A3880>
    │    └ <function BaseEventLoop.run_until_complete at 0x0000024241DE36A0>
    └ <ProactorEventLoop running=True closed=False debug=False>

  File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\asyncio\base_events.py", line 640, in run_until_complete
    self.run_forever()
    │    └ <function ProactorEventLoop.run_forever at 0x0000024241EA7380>
    └ <ProactorEventLoop running=True closed=False debug=False>
  File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\asyncio\windows_events.py", line 321, in run_forever
    super().run_forever()
  File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\asyncio\base_events.py", line 607, in run_forever
    self._run_once()
    │    └ <function BaseEventLoop._run_once at 0x0000024241DE9440>
    └ <ProactorEventLoop running=True closed=False debug=False>
  File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\asyncio\base_events.py", line 1922, in _run_once
    handle._run()
    │      └ <function Handle._run at 0x0000024241D32DE0>
    └ <Handle <TaskStepMethWrapper object at 0x000002424358CCA0>()>
> File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\asyncio\events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
    │    │            │    │           │    └ <member '_args' of 'Handle' objects>
    │    │            │    │           └ <Handle <TaskStepMethWrapper object at 0x000002424358CCA0>()>
    │    │            │    └ <member '_callback' of 'Handle' objects>
    │    │            └ <Handle <TaskStepMethWrapper object at 0x000002424358CCA0>()>
    │    └ <member '_context' of 'Handle' objects>
    └ <Handle <TaskStepMethWrapper object at 0x000002424358CCA0>()>

  File "c:\test\text.py", line 21, in test
    show_result(resp)
    │           └ <PicImageSearch.model.google.GoogleResponse object at 0x00000242435B1110>
    └ <function show_result at 0x00000242435A3920>

  File "c:\test\text.py", line 29, in show_result
    logger.info(resp.raw[2].origin)
    │      │    │    └ []
    │      │    └ <PicImageSearch.model.google.GoogleResponse object at 0x00000242435B1110>
    │      └ <function Logger.info at 0x0000024242440E00>
    └ <loguru.logger handlers=[(id=0, level=10, sink=<stderr>)]>

IndexError: list index out of range
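
(The IndexError itself only says that resp.raw came back empty here; reusing the names from the demo above, a defensive version of show_result would make that explicit. This is just a sketch, not part of the library:)

def show_result(resp: GoogleResponse) -> None:
    # resp.raw is empty when no results were parsed (e.g. after Google redirects the request),
    # so guard before indexing into it.
    if len(resp.raw) <= 2:
        logger.info(f"No matching results parsed ({len(resp.raw)} raw items)")
        return
    logger.info(resp.raw[2].title)
    logger.info(resp.raw[2].url)
    logger.info(resp.page)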

NekoAria commented 1 year ago

Sorry, I forgot to create a new release. The Google module was previously maintained by @lleans.

I am currently modifying the code logic of the Google module, so I will not be releasing a new version for the time being. You can use pip install git+https://github.com/kitUIN/PicImageSearch.git@4b5eaff5b02c9aec88b14aa3dbb341c66d6a3826 in the meantime.

NekoAria commented 1 year ago

You can now upgrade to v3.8.0 by using pip install -U PicImageSearch, and then try the code above.

In addition, I tried to support Yandex again, but unfortunately it triggered Yandex SmartCaptcha. Bypassing it may require a headless browser, which conflicts with the lightweight design of this library, so I had to put it on hold.

ghost commented 1 year ago

The new v3.8.0 made it work, thank you very much, but the documentation needs updating: instead of resp2 = await google.goto_page(resp.get_page_url(2), 2) you now need resp2 = await google.goto_page(resp=resp, index=2).
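
(With that change, the pagination part of the earlier demo's test() becomes something like the following; the call signature is the one quoted above:)

async def test() -> None:
    async with Network(proxies=proxies) as client:
        google = Google(client=client)
        resp = await google.search(url=url)
        show_result(resp)
        # v3.8.0: pass the previous response and the target page index directly.
        resp2 = await google.goto_page(resp=resp, index=2)
        show_result(resp2)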

NekoAria commented 1 year ago

Yes, your proposal is good. I will update the documentation when I have time.

However, the goto_page() function still has problems: the upper limit of resp.pages is 10, and resp.index may differ from the actual index. It may need to be replaced with pre_page() and next_page() in the future.

Furthermore, there is some good news: I found that using httpx or requests instead of aiohttp seems to bypass the Yandex SmartCaptcha. However, please wait until I have time to push that work forward.
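
(As a rough way to check that observation, a direct request with httpx against the Yandex URL format from the examples earlier in this thread looks like this; whether SmartCaptcha shows up still depends on the network and IP:)

import httpx

image_url = "https://i1.sndcdn.com/avatars-000248291936-634zqi-t500x500.jpg"
resp = httpx.get(
    "https://yandex.com/images/search",
    params={"rpt": "imageview", "url": image_url},
    follow_redirects=True,
)
print(resp.status_code)
# A captcha page contains the SmartCaptcha form instead of normal search results.
print("captcha" in resp.text.lower())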

NekoAria commented 1 year ago

Yandex is now supported, but due to dependency changes you need to install httpx by running pip install httpx.

Please assist in testing the current code. If there are no issues, I will release a new version.

As for the documentation updates, please wait until tomorrow when I have time.
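
(If it helps with testing, here is a sketch by analogy with the Google demo above; the Yandex class name and the item attributes are assumptions and should be checked against the repo's current demo code:)

import asyncio

from loguru import logger

# Class name assumed by analogy with the Google demo; check the repo's demos for the real import.
from PicImageSearch import Network, Yandex

url = "https://i1.sndcdn.com/avatars-000248291936-634zqi-t500x500.jpg"

async def test_yandex() -> None:
    async with Network() as client:
        yandex = Yandex(client=client)
        resp = await yandex.search(url=url)
        for item in resp.raw[:3]:
            # Attribute names mirror the Google demo and may differ in the actual model.
            logger.info(item.title)
            logger.info(item.url)

if __name__ == "__main__":
    asyncio.run(test_yandex())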

ghost commented 1 year ago

Google is working properly. Now I need to add Yandex to my project. Thank you for the updates

NekoAria commented 1 year ago

I may work on creating an English version of the documentation recently.

If possible, could you assist with maintaining the documentation in the future?

ghost commented 1 year ago

I am not a native English speaker either, but I can recommend the deepl.com translator; from my observations it works better than Google Translate.

ghost commented 1 year ago

A day ago this code was working. Now it isn't, why? I didn't change my IP, and I didn't change the code either. Before it returned 5 pages for the image; now it returns 1 blank page. When I do a manual search for the photo, it works fine. Any idea why this is happening?

async with Network() as client:
    google = Google(client=client)
    results = await google.search(file=photo_file)
    logger.info(results.origin)

NekoAria commented 1 year ago

Please try the latest release, v3.9.0.

ghost commented 1 year ago

Are there options to get around this? It shows this only with a proxy; without one it works fine. It also shows the following on the first request:

About this page
URL: https://www.google.com/search?tbs=sbi:AMhZZivzirwHe01r0mQJ_1MQe6wiD0NCyhzX6KgSxVLhHETl3IGQIYwqRHsQh-JFkp9QKmsYP_1hGruKpDOYmdbiurLqiiKxL4W0cXgUsOVXESu7MJyG_1cNcJPS0kUvWQH1eCj2Im9v_17kdyHTYBnI64WKmaekreOwBQ
Time: 2023-03-30T12:20:37Z
IP address: 45.142.37.173

Our systems have detected unusual traffic from your computer network. This page checks to see if it's really you sending the requests, and not a robot.

NekoAria commented 1 year ago

The reason you are encountering this problem is your network environment. You could try switching between using and not using a proxy, or configure the headers and cookies obtained from your browser into Network(); that may help, but I am not sure, because there is no particularly good way to deal with anti-crawling measures.
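
(A sketch of that suggestion, with placeholder values; the exact parameter names accepted by Network() should be checked against the library, and the header/cookie contents must come from your own browser session:)

# Values below are placeholders; copy the real ones from your browser's devtools.
headers = {
    "User-Agent": "Mozilla/5.0 ...",
    "Accept-Language": "en-US,en;q=0.9",
}
cookies = "NID=...; AEC=..."  # cookie string from the browser session

# Inside an async function, as in the demo earlier in this thread.
async with Network(headers=headers, cookies=cookies, proxies=proxies) as client:
    google = Google(client=client)
    resp = await google.search(url=url)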