Invalid URL '<captcha_url>': No host supplied

kyngs commented 2 years ago

When downloading any file, the script errors out with the following error:

CAPTCHA protected download - CAPTCHA challenges will be displayed

Starting TOR...
Traceback (most recent call last):
  File "/usr/bin/ulozto-downloader", line 33, in <module>
    sys.exit(load_entry_point('ulozto-downloader==2.6.0', 'console_scripts', 'ulozto-downloader')())
  File "/usr/lib/python3.10/site-packages/uldlib/cmd.py", line 44, in run
    d.download(args.url, args.parts, args.output)
  File "/usr/lib/python3.10/site-packages/uldlib/downloader.py", line 237, in download
    download_url = next(self.captcha_download_links_generator)
  File "/usr/lib/python3.10/site-packages/uldlib/page.py", line 278, in captcha_download_links_generator
    captcha_answer = captcha_solve_func(
  File "/usr/lib/python3.10/site-packages/uldlib/captcha.py", line 94, in __call__
    u = requests.get(img_url)
  File "/usr/lib/python3.10/site-packages/requests/api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/lib/python3.10/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python3.10/site-packages/requests/sessions.py", line 515, in request
    prep = self.prepare_request(req)
  File "/usr/lib/python3.10/site-packages/requests/sessions.py", line 443, in prepare_request
    p.prepare(
  File "/usr/lib/python3.10/site-packages/requests/models.py", line 318, in prepare
    self.prepare_url(url, params)
  File "/usr/lib/python3.10/site-packages/requests/models.py", line 395, in prepare_url
    raise InvalidURL("Invalid URL %r: No host supplied" % url)
requests.exceptions.InvalidURL: Invalid URL 'https:https://xapca.uloz.to/081da61e2fe16217fa90522022c22f2f8557abfa/image.jpg': No host supplied

A few hours ago it still worked. Perhaps a change of API from uloz.to?
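
For readers wondering why the message says "No host supplied": the sketch below uses the standard library's urllib.parse to split the URL from the traceback. This is not the parser that requests uses internally, so treat it only as an illustration of how the doubled prefix is read. The variable names are made up for this example.

```python
from urllib.parse import urlsplit

# URL copied from the traceback above.
bad_url = "https:https://xapca.uloz.to/081da61e2fe16217fa90522022c22f2f8557abfa/image.jpg"

parts = urlsplit(bad_url)

# The first "https:" is consumed as the scheme. What remains starts with
# "https://", not "//", so no host (network location) is recognized and
# the rest of the string is treated as a path.
print("scheme:  ", parts.scheme)
print("netloc:  ", repr(parts.netloc))
print("hostname:", parts.hostname)
print("path:    ", parts.path)
```

With an empty host there is nothing to connect to, which matches the InvalidURL error raised in prepare_url.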

jise commented 2 years ago

Only "Download type: slow direct download (without CAPTCHA)" is still working.

em1tter commented 2 years ago

Same for me. Slow direct downloads without CAPTCHA are working, but nothing else.

hroncok commented 2 years ago

Hotpatching site-packages/requests/models.py PreparedRequest.prepare_url() with the following hack just to see if it is the only thing required:

        if url.startswith('https:https://'):
            url = url[6:]

Makes it work.

hroncok commented 2 years ago

This also makes it work: https://github.com/setnicka/ulozto-downloader/pull/83

jirikrepl commented 2 years ago

Thanks for the quick fix. LGTM, tested and it worked fine. Is this repository still maintained? Could we get this fix into the pip repository? Maybe, @hroncok, we can use your fork.

For a quick file edit, this is where the file is located on macOS if installed with pip and pyenv:

/Users/<username>/.pyenv/versions/3.7.10/lib/python3.7/site-packages/ulozto_downloader/page.py

em1tter commented 2 years ago

Hotpatching site-packages/requests/models.py PreparedRequest.prepare_url() with the following hack just to see if it is the only thing required:
        if url.startswith('https:https://'):
            url = url[6:]
Makes it work.

For me it also works. Thank you:)

VaclavTrpisovsky commented 2 years ago

Hotpatching site-packages/requests/models.py PreparedRequest.prepare_url() with the following hack just to see if it is the only thing required:
        if url.startswith('https:https://'):
            url = url[6:]
Makes it work.

Thank you. Myself, I didn't want to touch models.py, so I hacked a similar fix into captcha.py: I added the line img_url = img_url[6:] before every occurrence of u = requests.get(img_url). Please don't do this yourself; I am a beginner in Python and may have created an instability in the code...
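
One risk with slicing unconditionally is that a correctly formed URL would lose its first six characters as well. A guarded variant might look like the sketch below. The helper name fix_doubled_scheme and the constants are hypothetical and do not come from ulozto-downloader or requests; this is only an illustration of the check discussed in this thread, not the fix merged in the PR.

```python
# Hypothetical helper, not part of ulozto-downloader or requests.
DOUBLED_PREFIX = "https:https://"
EXTRA_SCHEME = "https:"


def fix_doubled_scheme(url: str) -> str:
    """Return url with one leading 'https:' removed if the scheme is doubled."""
    if url.startswith(DOUBLED_PREFIX):
        # Drop only the first 'https:' (6 characters); the rest is a valid URL.
        return url[len(EXTRA_SCHEME):]
    # Well-formed URLs are returned unchanged.
    return url


print(fix_doubled_scheme("https:https://xapca.uloz.to/example/image.jpg"))
print(fix_doubled_scheme("https://xapca.uloz.to/example/image.jpg"))
```

Both calls print the same well-formed URL, so the helper is safe to apply whether or not the prefix is doubled.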

setnicka commented 2 years ago

@hroncok thank you for the PR, I merged it into master. Later today (after reviewing other changes) I will publish a new version to the pip repository.

This project is in a low-maintenance state, especially during the summer when I have almost zero time for it. So sorry for the delay (this also applies to other issues and PRs... I will process them today).

hapashtiepa commented 2 years ago

Thanks a million, guys, but I am an amateur. I would need some kind of tutorial on how to use this fix. Or I will wait. Solving CAPTCHAs is always faster than the slow download.

VaclavTrpisovsky commented 2 years ago

I am an amateur. I would need some kind of tutorial on how to use this fix.

You need to modify the specified file in your Python*\Lib directory.

In a noob-friendly way: ulozto-downloader uses some external packages that are written in the scripting language called Python. Python is an interpreted language, which means that you don't compile source code into an executable; instead, a program called an interpreter decodes instructions on the fly. The Python interpreter is installed on your system and can either be fed .py files or manually typed commands. Python scripts import pre-written modules/libraries/packages to perform more complicated tasks, like opening a GUI window, manipulating images or making web requests.

Due to clashes between Python 2 and 3, there may be multiple Python versions on your system, so you had better find out which is the "default" one, that is, what gets run when you call python on the command line. (I am assuming you are using Windows; on Unix-based systems, substitute "/" for "\" in paths, "terminal" for "command line", etc.)

We will now open up the Python CLI and ask it where it is installed. Open Command Prompt (Win+R, cmd, Enter) and type python. You are now in the Python CLI, as indicated by >>> at the start of the line. Now we import the "sys" module so that we can access system information: use the command import sys. Then use print(sys.exec_prefix) to get the location of your Python directory printed on the command line.

Now navigate to that folder and proceed to the subfolders \Lib\site-packages\requests\. Open the file models.py in Notepad (or a better alternative like Notepad++, especially if using Windows <10 because of the CRLF/LF clash).

The error, as you have seen on the command line, is that a URL starting with https:https:// is not valid and the code in models.py cannot handle it. The double-prepending of https: likely happens elsewhere, but we will apply a quick and dirty fix by modifying the request-handling module to repair the URL if it erroneously begins with https:https://.

Go to line 410. This is where the definition of the function PreparedRequest.prepare_url begins. (The exact line number depends on your version of requests; if it does not match, search the file for def prepare_url.) Add the aforementioned lines to the beginning of its code. Keep the eight/twelve spaces at their beginning, as whitespace is crucial in Python scripts! So now, lines 411 and 412 should read:

        if url.startswith('https:https://'):
            url = url[6:]

The code checks the url variable for the erroneous prefix: if it starts with https:https://, then the second line removes the first 6 characters. That's it. Save and close.
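
As an alternative to navigating the folders by hand, the short script below asks Python where the requests package lives. It is a sketch under the assumption that requests is installed for the same interpreter that runs ulozto-downloader; the function name locate_requests_models is made up for this example. It only prints paths and does not modify anything.

```python
import importlib.util
import sys
from pathlib import Path
from typing import Optional


def locate_requests_models() -> Optional[Path]:
    """Return the expected path of requests/models.py, or None if requests is not found."""
    spec = importlib.util.find_spec("requests")
    if spec is None or spec.origin is None:
        # requests is not installed for this interpreter.
        return None
    # spec.origin points at requests/__init__.py; models.py sits next to it.
    return Path(spec.origin).parent / "models.py"


# Same information as print(sys.exec_prefix) in the interactive prompt.
print("Python directory:", sys.exec_prefix)
print("File to edit:    ", locate_requests_models())
```

If the second line prints None, the interpreter you started is probably not the one that ulozto-downloader uses.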

setnicka / ulozto-downloader

Invalid URL '<captcha_url>': No host supplied #82