rockdaboot / wget2

The successor of GNU Wget. Contributions preferred at https://gitlab.com/gnuwget/wget2. But accepted here as well 😍
GNU Lesser General Public License v3.0

File downloaded as an html page instead of a .txt file #337

Closed mruprich closed 4 days ago

mruprich commented 3 weeks ago

Running wget2 on a .txt file hosted on Dropbox does not download the actual file but returns an HTML page instead:

# wget2 -qO - 'https://www.dropbox.com/scl/fi/uryzfd2i7rgwr5my5lqod/hello.txt?rlkey=guhp9wxm1x8caqtd7a9qbo62d&st=11px7sc4&dl=0'
<!DOCTYPE html>
<html class="maestro global-header" xmlns="http://www.w3.org/1999/xhtml" lang="en">
<head><meta charset="utf-8" />
...
...
</script><script nonce="xaep3Xr5nsBfG9w05&#43;ikUJpGT24=" async src="/page_success/end?edison_page_name=scl_oboe_file&amp;path=%2Fscl%2Ffi%2Furyzfd2i7rgwr5my5lqod%2Fhello.txt&amp;request_id=97c8a4d93f7348279ce1ce123e1f43d2&amp;time=1724052721" crossorigin="anonymous"></script>
</body></html><!--status=200-->

With wget1 the actual file and its contents are printed:

# wget -qO - 'https://www.dropbox.com/scl/fi/uryzfd2i7rgwr5my5lqod/hello.txt?rlkey=guhp9wxm1x8caqtd7a9qbo62d&st=11px7sc4&dl=0'
Hello world.
# 

I tried --spider with the old wget (wget2 does not show this information in its output), and it reports the file type as [application/json]. Even so, wget1 downloads the file fine.

Regards, Michal Ruprich

rockdaboot commented 2 weeks ago

The request headers sent by wget1 and wget2 are slightly different. Some servers inspect the headers and choose the content they consider best for the request.

In many cases servers look at the User-Agent header, but the Accept-Encoding header can also change server behavior. I can't reproduce this with the URLs from your description; the response is always empty for me.
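One way to test the header theory is to rerun wget2 while imitating wget1's headers. The following is only a sketch: `try_wget1_headers` is a hypothetical helper name, the User-Agent value has to be copied from your own wget1 output, and wget2 may still add its own Accept-Encoding header next to the one passed with `--header`.

```shell
# Hypothetical helper: fetch a URL with wget2 while sending wget1-like headers.
#   $1 = URL to fetch
#   $2 = User-Agent string to send (assumption: taken from wget1's --debug output)
try_wget1_headers() {
    wget2 -qO - \
        --user-agent="$2" \
        --header='Accept-Encoding: identity' \
        "$1"
}

# Usage (not executed here; the User-Agent value is a placeholder):
#   try_wget1_headers 'https://example.com/hello.txt' 'Wget/x.y.z'
```

If the output changes to the plain text file, dropping one of the two options at a time should show which header the server reacts to.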

rockdaboot commented 2 weeks ago

You can see what the difference is with --debug. For example, wget2 may use HTTP/2, so a first try could be --no-http2 to use the same protocol as wget1.
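A minimal sketch of that comparison, assuming both `wget` and `wget2` are installed. `capture_debug_logs` and the log file names are made up for illustration; all output is redirected to the log files so nothing is lost regardless of which stream each tool writes its debug messages to.

```shell
# Hypothetical helper: record the debug output of both clients for one URL.
#   $1 = URL to fetch
# Writes wget1-debug.log and wget2-debug.log to the current directory.
capture_debug_logs() {
    # wget1 request, body discarded, all messages kept in the log
    wget  --debug -O /dev/null "$1" > wget1-debug.log 2>&1
    # wget2 request with HTTP/2 disabled to match wget1's protocol
    wget2 --debug --no-http2 -O /dev/null "$1" > wget2-debug.log 2>&1
}

# Usage (not executed here):
#   capture_debug_logs 'https://example.com/hello.txt'
```

Comparing the request headers in the two logs side by side should show which ones differ.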

rockdaboot commented 4 days ago

@mruprich Please feel free to reopen when you have more information or a reproducer that also works for people other than you.