nilfoer / gwaripper

Tool for conveniently downloading audios from r/gonewildaudio and similar subreddits
MIT License

WARNING - URL Error: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired #14

Closed: Moleculor closed this issue 1 year ago

Moleculor commented 1 year ago

A friend of mine is looking to back up her content before nuking everything.

I'm currently using the 0.6.8_single-folder version and running .\gwaripper.exe redditor 200 <username> (is there a way to not have to limit the # of posts?) and I'm getting nothing but the following kinds of errors:

2023-05-15 12:19:39,177 - gwaripper.download - WARNING - URL Error: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1125) (url: <entirely valid soundgasm URL that works if I copy/paste it into a browser>)
2023-05-15 12:19:39,179 - gwaripper.extractors.base - WARNING - ERROR - NO_RESPONSE - Request timed out or no response received! (URL was <entirely valid soundgasm URL that works if I copy/paste it into a browser>)

I also get this error if I just try to grab a post from the subreddit's current front page. For example:

.\gwaripper.exe links <any comments link from GWA, or any file link from Soundgasm>

gives me the same errors.

(Also, praw is complaining about a new version being out. But I don't think that matters here?)

If it helps, I've checked the certificate Soundgasm sends in a web browser, and it definitely isn't expired yet.

nilfoer commented 1 year ago

Unfortunately, reddit has a hard limit of 1,000 posts that you can query through its API. If you need more, you have to do it manually in the browser.

I can't reproduce your problem on my end (using the same single-folder version). In case something is broken inside the bundle, I built a new one with the most recent PyInstaller version and certifi package, if you want to try it. Which Windows version are you using?

Moleculor commented 1 year ago

Windows 10 Pro Version 10.0.19045 Build 19045

I downloaded the new one, extracted it to a new directory (didn't even try overwriting the old one so I knew I was starting fresh), and tried again.

I got the exact same errors.

Could this version be picking up Python packages installed elsewhere on my machine by mistake?

I grabbed the single-folder version because this machine has been used for all sorts of things over the course of a decade+, some of which probably involved Python of some sort. I was trying to avoid the hassle of finding yet another Python version, installing yet another different set of versions of various packages, etc. Plus conflicts and whatnot.

nilfoer commented 1 year ago

It shouldn't, but I can't rule it out. Maybe run python -m pip install --upgrade certifi to upgrade your CA bundle and then try again. If it works, then it was using something from your system.
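One way to check which Python a given run is actually using is a small report run with the same interpreter (or pasted temporarily into the program). This is a sketch: `interpreter_report` is a hypothetical helper, not part of gwaripper. It relies on `sys.frozen`, which PyInstaller sets on bundled executables, and lists both the system and per-user site-packages directories, since pip can install into either.

```python
import importlib.util
import site
import sys


def interpreter_report() -> dict:
    """Collect which Python is running and where it would import certifi from."""
    spec = importlib.util.find_spec("certifi")  # None if certifi isn't importable
    return {
        # PyInstaller sets sys.frozen on bundled executables; plain Python does not
        "frozen": getattr(sys, "frozen", False),
        "executable": sys.executable,
        # pip may install into the system site-packages or the per-user one,
        # so two copies of the same package can coexist
        "site_packages": site.getsitepackages() if hasattr(site, "getsitepackages") else [],
        "user_site_packages": site.getusersitepackages(),
        "certifi_origin": spec.origin if spec is not None else None,
    }


for key, value in interpreter_report().items():
    print(f"{key}: {value}")
```

If `frozen` is true but `certifi_origin` points outside the bundle directory, that would suggest the bundle is importing from the system installation.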

Moleculor commented 1 year ago

I've actually been trying to do that since my previous reply. 😅

It took some time to realize that I needed to do it with a terminal window that was running as an administrator, but I finally got it to stop reinstalling the old 2022 version I had on here.

Or so it claimed:

PS H:\> python -m pip install --upgrade certifi
Requirement already satisfied: certifi in c:\program files\python310 (2022.9.14)
Collecting certifi
  Using cached certifi-2023.5.7-py3-none-any.whl (156 kB)
Installing collected packages: certifi
  Attempting uninstall: certifi
    Found existing installation: certifi 2022.9.14
    Uninstalling certifi-2022.9.14:
      Successfully uninstalled certifi-2022.9.14
Successfully installed certifi-2023.5.7

PS H:\GWARipper-0.6.8_single-folder_dev2> python -m pip list --outdated
Package                Version   Latest   Type
---------------------- --------- -------- -----
<.... a bunch of other packages .... >
certifi                2022.9.14 2023.5.7 wheel
<.... a bunch of other packages .... >

PS H:\GWARipper-0.6.8_single-folder_dev2> python -m pip install --upgrade certifi
Requirement already satisfied: certifi in c:\users\moleculor\appdata\roaming\python\python310\site-packages (2022.9.14)
Collecting certifi
  Using cached certifi-2023.5.7-py3-none-any.whl (156 kB)
Installing collected packages: certifi
  Attempting uninstall: certifi
    Found existing installation: certifi 2022.9.14
    Uninstalling certifi-2022.9.14:
      Successfully uninstalled certifi-2022.9.14
Successfully installed certifi-2023.5.7

PS H:\GWARipper-0.6.8_single-folder_dev2> python -m pip list --outdated
Package                Version   Latest   Type
---------------------- --------- -------- -----
<.... a bunch of other packages .... >
<.... but certifi is no longer listed as outdated ....>

No, I have no idea why I had to install/upgrade twice to get it to stick.

(I have always found Python to be frustrating to work with. I'll add this to the list of reasons why.)

I still get the same errors.

Moleculor commented 1 year ago

I just tried doing it the other way: I downloaded GWARipper-0.6.8.zip, unzipped it to a directory, and ran python -m pip install -r requirements.txt.

PS H:\GWARipper-0.6.8> .\gwaripper-runner.py config -p .\DownloadedFiles
New root dir is: H:\GWARipper-0.6.8\DownloadedFiles
PS H:\GWARipper-0.6.8> .\gwaripper-runner.py redditor 200 <user_name>

Same errors. Or, well, mostly the same. Slightly different line number.

WARNING - URL Error: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:997) <soundgasm.net link>
WARNING - ERROR - NO_RESPONSE - Request timed out or no response received! (URL was <soundgasm.net link>)

So it's not a problem exclusive to the single-folder version.

Moleculor commented 1 year ago

I do know enough to stumble my way through Python to identify that the specific line throwing the error is https://github.com/nilfoer/gwaripper/blob/73964617866a09878dcf5089cf7f8cbf953de57f/gwaripper/download.py#LL330C67-L330C67 if that helps in any way?

You may already know/realize that. I'm currently researching to see if there's any way for urllib to show details of what it is getting when SSL fails for some reason, but I definitely don't know Python well enough to think I'll be able to figure that out any time soon.

If you want me to edit files or grab a development branch of some kind to play around with, I could.

Moleculor commented 1 year ago

I've spent some time trying to debug this and eventually realized that this is some sort of problem unique to my computer. I have a second computer on the exact same internet connection that works fine with some basic test code, while nothing more than a simple

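import socket
import ssl

hostname = 'soundgasm.net'
# Same default trust settings urllib uses when no context is passed
context = ssl.create_default_context()

# Open a raw TCP connection and perform the TLS handshake; a verification
# failure raises ssl.SSLCertVerificationError here
with socket.create_connection((hostname, 443)) as sock:
    with context.wrap_socket(sock, server_hostname=hostname) as ssock:
        print(ssock.version())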

can replicate the problem on the problem machine.

But I can't seem to find any methods of actually debugging ssl or urllib. If you've got ideas, I'd love to hear them, but this looks like it's a problem far outside the scope of your code. Sorry to have wasted any of your time.

Moleculor commented 1 year ago

Okay. I've done a bit of research, and this is my very inexpert "I don't know what I'm talking about" understanding of what might be going on.

Apparently there's currently a bug in Python or maybe OpenSSL or something like that.

Basically, on Windows, it just uses the first certificate it can find that might be appropriate within the Windows Certificate Store on that individual machine.

But the certificates that are in there are going to be wildly different from machine to machine, and older machines will have a lot more certificates in there.

I narrowed the issue down to possibly being related to the DST Root CA X3 certificate that expired back in 2021? Except that if I delete that certificate from my Windows Certificate Store, I just get a different error ([SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get issuer certificate (_ssl.c:997)).
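On Windows, `ssl.create_default_context()` loads certificates from the system "CA" and "ROOT" stores (via `ssl.enum_certificates()`) in addition to OpenSSL's default paths, which would be consistent with edits to the Windows Certificate Store changing the result. The sketch below summarizes what the default context starts from and, on Windows, looks for a named root such as DST Root CA X3. `default_trust_report` is a hypothetical helper, and the substring search over raw DER bytes is a crude heuristic that can also match certificates merely issued by that root.

```python
import ssl
import sys


def default_trust_report(needle: bytes = b"DST Root CA X3") -> dict:
    """Summarize the CA material urllib's default SSL context starts from."""
    context = ssl.create_default_context()
    paths = ssl.get_default_verify_paths()
    report = {
        "openssl_version": ssl.OPENSSL_VERSION,
        "default_cafile": paths.cafile,
        "default_capath": paths.capath,
        # Certificates already loaded into the context; CAs in a capath
        # directory are loaded lazily and are not counted here.
        "store_stats": context.cert_store_stats(),
        "windows_root_matches": [],
    }
    if sys.platform == "win32":
        # enum_certificates() yields (cert_bytes, encoding_type, trust) tuples
        for cert_bytes, encoding, _trust in ssl.enum_certificates("ROOT"):
            # Crude check: look for the name inside the raw DER bytes
            if encoding == "x509_asn" and needle in cert_bytes:
                report["windows_root_matches"].append(ssl.DER_cert_to_PEM_cert(cert_bytes))
    return report
```

Comparing the output on the working and the failing machine could show whether the failing one carries extra or stale roots.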

But here's the weird part: you asked me to update certifi, which implies that certifi should be handling root certificates, not Windows.

But it's fairly clear that Windows is handling certificates, not certifi, for two reasons:

  1. I changed the Windows Certificate Store and it impacted the behavior of the code
  2. certifi isn't mentioned anywhere in your code.

For a demonstration of why I think this is the case, see this answer on StackOverflow.

To further demonstrate this:

import ssl
import urllib.error
import urllib.request

import certifi

# Default context: OpenSSL default paths plus, on Windows, the system stores
context = ssl.create_default_context()

try:
    with urllib.request.urlopen("https://soundgasm.net", context=context) as response:
        print("Success with no certifi context.")
except urllib.error.URLError:
    print("Failure with no certifi context.")

# Context that trusts only certifi's CA bundle
context2 = ssl.create_default_context(cafile=certifi.where())

try:
    with urllib.request.urlopen("https://soundgasm.net", context=context2) as response:
        print("Success with certifi context!")
except urllib.error.URLError:
    print("Failure with certifi context.")

Outputs:

Failure with no certifi context.
Success with certifi context!

So, on my machine, if I specifically tell it to use the certificate authority file from certifi, I can connect to soundgasm just fine.

The thing is, I barely know enough Python to know how to test this in your code. Even just adding import certifi at the top of download.py fails to import certifi, and I'm not sure why.

nilfoer commented 1 year ago

Thank you for taking the time to investigate! It was an interesting read!

I wrongly assumed that certifi was used by Python internally, since some SO answers mentioned updating it as a solution. I'll think about using certifi in my code directly, but I'm torn, since I found issues in other packages about explicitly moving away from it so that the system bundle gets used.
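One possible middle ground is to keep the system trust store as the default and let users opt in to certifi's bundle. This is a sketch under assumptions: `make_ssl_context`, `fetch`, and the `use_certifi` switch are hypothetical names, not gwaripper's API. Passing `cafile` makes `ssl.create_default_context()` load only that file and skip the system (and Windows) stores, which is consistent with why the certifi context in the test script succeeded.

```python
import ssl
import urllib.request
import warnings


def make_ssl_context(use_certifi: bool = False) -> ssl.SSLContext:
    """Build a certificate-verifying context; optionally trust only certifi's bundle."""
    if use_certifi:
        try:
            import certifi  # third-party package, may be missing
        except ImportError:
            warnings.warn("certifi is not installed; using the system trust store")
        else:
            # With cafile set, create_default_context() loads only this file
            # and skips the system (and Windows) certificate stores.
            return ssl.create_default_context(cafile=certifi.where())
    return ssl.create_default_context()


def fetch(url: str, use_certifi: bool = False, timeout: float = 30.0) -> bytes:
    """Download url with the chosen trust source (hypothetical helper)."""
    context = make_ssl_context(use_certifi)
    with urllib.request.urlopen(url, context=context, timeout=timeout) as response:
        return response.read()
```

Usage would be something like `fetch(url, use_certifi=True)` behind a hypothetical config or command-line option, so the system bundle stays the default while affected machines have a way out.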

Moleculor commented 1 year ago

Would it help to know that using certifi would potentially work around a known bug in CPython that has sat unresolved for months? (I don't know whether you want this project linked from CPython's GitHub issues, so rather than pasting a link, I'll just point you to issue number 101738 on CPython's GitHub.)

That issue report might have alternative workarounds, though?

Moleculor commented 1 year ago

Actually, it turns out that certifi is supposedly being used by default? Though I'm not sure how or why altering my Windows Certificate Store would change the behavior in that case.

import requests
import certifi
print(requests.certs.where())
print(certifi.where())

prints C:\Program Files\Python310\lib\site-packages\certifi\cacert.pem twice.

So, color me confused. Specifying the location fixes the issue, but it's supposed to be using that location by default.

Unless requests and urllib aren't using the same store?
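As far as I know, that is the likely explanation: requests (through urllib3) verifies against certifi's bundle by default, and `requests.certs.where()` is simply `certifi.where()` re-exported, so the two prints are expected to match. urllib's default context does not consult certifi unless pointed at it, for example via `cafile=` or the `SSL_CERT_FILE` environment variable. The sketch below contrasts the two sources; `compare_ca_sources` is a hypothetical helper.

```python
import ssl


def compare_ca_sources() -> dict:
    """Contrast the CA file urllib's default context would use with certifi's bundle."""
    paths = ssl.get_default_verify_paths()
    try:
        import certifi  # requests' certs.where() comes from here
    except ImportError:
        bundle = None
    else:
        bundle = certifi.where()
    return {
        "urllib_default_cafile": paths.cafile,
        # Environment variable OpenSSL checks for an override of the default CA file
        "openssl_cafile_env": paths.openssl_cafile_env,
        "certifi_bundle": bundle,
        "same_file": bundle is not None and paths.cafile == bundle,
    }
```

On Windows, `urllib_default_cafile` will often be `None`, since the default context relies on the Windows stores instead.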