paul-gauthier / aider

aider is AI pair programming in your terminal
https://aider.chat/
Apache License 2.0

Web scraper often returns error #1063

Closed zenturacp closed 3 weeks ago

zenturacp commented 1 month ago

Issue

When scraping websites I often run into issues like the one below.

I know there is a timeout, but the site is responding. I know the site is HUGE and has a ton of SVGs, which is probably what causes the issue. It would be nice if the scraper were able to just get the text.

From the error I'm really struggling to understand why this exact site fails.
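To make the "just get text" idea concrete, here is a rough sketch of stripping SVG, script and style content from already-fetched HTML using only the standard library. This is not how aider's scraper works; `TextOnly` and `html_to_text` are hypothetical names, and it would not avoid the page-load timeout itself, only shrink what gets passed on afterwards.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Hypothetical helper: collect visible text, skipping heavy non-text tags."""

    SKIP = {"svg", "script", "style"}

    def __init__(self):
        super().__init__()
        self.depth = 0   # > 0 while inside a skipped element
        self.parts = []  # collected text fragments

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep non-empty text only when outside every skipped element.
        if self.depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the text fragments of `html`, one per line."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)
```

For example, `html_to_text('<h1>Icons</h1><svg><path d="M0 0"/></svg><p>Use icons.</p>')` keeps the heading and paragraph text and drops the SVG markup.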

Add https://docs.eraser.io/docs/icons to the chat? y
Scraping https://docs.eraser.io/docs/icons...
Timeout while loading https://docs.eraser.io/docs/icons
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\users\cp\.local\bin\aider.exe\__main__.py", line 7, in <module>
  File "C:\Users\cp\pipx\venvs\aider-chat\Lib\site-packages\aider\main.py", line 620, in main
    coder.run()
  File "C:\Users\cp\pipx\venvs\aider-chat\Lib\site-packages\aider\coders\base_coder.py", line 685, in run
    self.run_one(user_message, preproc)
  File "C:\Users\cp\pipx\venvs\aider-chat\Lib\site-packages\aider\coders\base_coder.py", line 717, in run_one
    message = self.preproc_user_input(user_message)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\cp\pipx\venvs\aider-chat\Lib\site-packages\aider\coders\base_coder.py", line 709, in preproc_user_input
    self.check_for_urls(inp)
  File "C:\Users\cp\pipx\venvs\aider-chat\Lib\site-packages\aider\coders\base_coder.py", line 743, in check_for_urls
    inp += self.commands.cmd_web(url, paginate=False)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\cp\pipx\venvs\aider-chat\Lib\site-packages\aider\commands.py", line 139, in cmd_web
    content = self.scraper.scrape(url) or ""
              ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\cp\pipx\venvs\aider-chat\Lib\site-packages\aider\scrape.py", line 97, in scrape
    content, mime_type = self.scrape_with_playwright(url)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\cp\pipx\venvs\aider-chat\Lib\site-packages\aider\scrape.py", line 146, in scrape_with_playwright
    mime_type = response.header_value("content-type").split(";")[0]
                ^^^^^^^^
UnboundLocalError: cannot access local variable 'response' where it is not associated with a value
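Reading the traceback, the "Timeout while loading" message is printed first and then `response` is used even though the page load never assigned it. The sketch below reproduces that pattern and shows one possible guard. It is a guess at the shape of the problem, not aider's actual code or the actual fix; `FakePage`, `FakeTimeout`, `scrape_buggy` and `scrape_guarded` are made-up stand-ins so it runs without a browser.

```python
class FakeTimeout(Exception):
    """Hypothetical stand-in for a browser navigation timeout."""


class FakePage:
    """Hypothetical page object whose goto() always times out."""

    def goto(self, url):
        raise FakeTimeout(url)

    def content(self):
        return "<html>partial page</html>"


def scrape_buggy(page, url):
    try:
        response = page.goto(url)
    except FakeTimeout:
        print(f"Timeout while loading {url}")
    # If goto() raised, 'response' was never bound, so this line fails
    # with UnboundLocalError instead of reporting the timeout cleanly.
    mime_type = response.header_value("content-type").split(";")[0]
    return page.content(), mime_type


def scrape_guarded(page, url):
    response = None  # bound before the try, so it always exists
    try:
        response = page.goto(url)
    except FakeTimeout:
        print(f"Timeout while loading {url}")
    content = page.content()
    mime_type = None
    if response is not None:
        header = response.header_value("content-type")
        if header:
            mime_type = header.split(";")[0]
    return content, mime_type
```

With the fake page, `scrape_buggy` raises `UnboundLocalError` after printing the timeout message, while `scrape_guarded` returns whatever content is available with a MIME type of `None`.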

Version and model info

Aider v0.49.1
Models: claude-3-5-sonnet-20240620 with diff edit format, weak model claude-3-haiku-20240307
Git repo: .git with 1 files
Repo-map: using 1024 tokens
VSCode terminal detected, pretty output has been disabled.
Use /help for help, run "aider --help" to see cmd line args

paul-gauthier commented 1 month ago

Thanks for trying aider and filing this issue. I've fixed this bug and confirmed that aider can now scrape the https://docs.eraser.io/docs/icons url.

The change is available in the main branch. You can get it by installing the latest version from github:

python -m pip install --upgrade git+https://github.com/paul-gauthier/aider.git

If you have a chance to try it, let me know if it works better for you.

paul-gauthier commented 3 weeks ago

I'm going to close this issue for now, but feel free to add a comment here and I will re-open it, or file a new issue any time.