Closed Moleculor closed 1 year ago
Unfortunately reddit has a hard limit of 1k posts that you can query using their api. If you need more then you have to do it manually in the browser.
I can't reproduce your problem on my end (using the same single folder version). In case sth. is broken inside the bundle, I created a new one using the most recent pyinstaller version and certifi package if you want to try it. Which Windows version are you using?
Windows 10 Pro Version 10.0.19045 Build 19045
I downloaded the new one, extracted it to a new directory (didn't even try overwriting the old one so I knew I was starting fresh), and tried again.
I got the exact same errors.
Does or can this version leverage Python stuff installed elsewhere on my machine by mistake?
I grabbed the single-folder version because this machine has been used for all sorts of things over the course of a decade+, some of which probably involved Python of some sort. I was trying to avoid the hassle of finding yet another Python version, installing yet another different set of versions of various packages, etc. Plus conflicts and whatnot.
It shouldn't, but I can't rule it out. Maybe run python -m pip install --upgrade certifi to upgrade your CA bundle and then try again. If it works, then it was using sth. from your system.
I've actually been trying to do that since my previous reply. 😅
It took some time to realize that I needed to do it with a terminal window that was running as an administrator, but I finally got it to stop reinstalling the old 2022 version I had on here.
Or so it claimed:
PS H:\> python -m pip install --upgrade certifi
Requirement already satisfied: certifi in c:\program files\python310 (2022.9.14)
Collecting certifi
Using cached certifi-2023.5.7-py3-none-any.whl (156 kB)
Installing collected packages: certifi
Attempting uninstall: certifi
Found existing installation: certifi 2022.9.14
Uninstalling certifi-2022.9.14:
Successfully uninstalled certifi-2022.9.14
Successfully installed certifi-2023.5.7
PS H:\GWARipper-0.6.8_single-folder_dev2> python -m pip list --outdated
Package Version Latest Type
---------------------- --------- -------- -----
<.... a bunch of other packages .... >
certifi 2022.9.14 2023.5.7 wheel
<.... a bunch of other packages .... >
PS H:\GWARipper-0.6.8_single-folder_dev2> python -m pip install --upgrade certifi
Requirement already satisfied: certifi in c:\users\moleculor\appdata\roaming\python\python310\site-packages (2022.9.14)
Collecting certifi
Using cached certifi-2023.5.7-py3-none-any.whl (156 kB)
Installing collected packages: certifi
Attempting uninstall: certifi
Found existing installation: certifi 2022.9.14
Uninstalling certifi-2022.9.14:
Successfully uninstalled certifi-2022.9.14
Successfully installed certifi-2023.5.7
PS H:\GWARipper-0.6.8_single-folder_dev2> python -m pip list --outdated
Package Version Latest Type
---------------------- --------- -------- -----
<.... a bunch of other packages .... >
<.... but certifi is no longer listed as outdated ....>
No, I have no idea why I had to install/upgrade twice to get it to stick.
(I have always found Python to be frustrating to work with. I'll add this to the list of reasons why.)
I still get the same errors.
I just tried doing it the other way, where I download GWARipper-0.6.8.zip, unzip it to a directory, and run python -m pip install -r requirements.txt.
PS H:\GWARipper-0.6.8> .\gwaripper-runner.py config -p .\DownloadedFiles
New root dir is: H:\GWARipper-0.6.8\DownloadedFiles
PS H:\GWARipper-0.6.8> .\gwaripper-runner.py redditor 200 <user_name>
Same errors. Or, well, mostly the same. Slightly different line number.
WARNING - URL Error: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:997) <soundgasm.net link>
WARNING - ERROR - NO_RESPONSE - Request timed out or no response received! (URL was <soundgasm.net link>)
So it's not a problem exclusive to the single-folder version.
I do know enough to stumble my way through Python to identify that the specific line throwing the error is https://github.com/nilfoer/gwaripper/blob/73964617866a09878dcf5089cf7f8cbf953de57f/gwaripper/download.py#LL330C67-L330C67 if that helps in any way?
You may already know/realize that. I'm currently researching to see if there's any way for urllib to show details of what it is getting when SSL fails for some reason, but I definitely don't know Python well enough to think I'll be able to figure that out any time soon.
If you want me to edit files or grab a development branch of some kind to play around with, I could.
I've spent some time trying to debug this, eventually reaching the realization that this is some sort of problem unique to my computer. I have a second computer running on the exact same internet connection that works fine with some basic test code, while nothing more than a simple
import socket
import ssl
import urllib.request
hostname = 'soundgasm.net'
context = ssl.create_default_context()
with socket.create_connection((hostname, 443)) as sock:
    with context.wrap_socket(sock, server_hostname=hostname) as ssock:
        print(ssock.version())
can replicate the problem on the problem machine.
But I can't seem to find any methods of actually debugging ssl or urllib. If you've got ideas, I'd love to hear them, but this looks like it's a problem far outside the scope of your code. Sorry to have wasted any of your time.
Okay. I've done a bit of research, and this is my very inexpert "I don't know what I'm talking about" understanding of what might be going on.
Apparently there's currently a bug in Python or maybe OpenSSL or something like that.
Basically, on Windows, it just uses the first certificate it can find that might be appropriate within the Windows Certificate Store on that individual machine.
But the certificates that are in there are going to be wildly different from machine to machine, and older machines will have a lot more certificates in there.
I narrowed down the issue as being possibly related to the DST Root CA X3 certificate that expired back in 2021? Except if I delete that from my Windows Certificate Store I just get a different error ([SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get issuer certificate (_ssl.c:997)).
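For anyone trying to check the same thing, here is a rough sketch of how to list which CA certificates Python's default SSL context has loaded and flag the expired ones. It only uses the standard ssl module; the helper names subject_name and expired_cas are made up for this example. On Windows the default context should pull certificates from the system store, so an expired root such as DST Root CA X3 ought to show up here if it is present. On other platforms certificates may be loaded lazily, so the list can be empty.

```python
import ssl
import time


def subject_name(cert):
    """Return the commonName (or organizationName) of a decoded cert dict."""
    fields = {}
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            fields[key] = value
    return fields.get("commonName") or fields.get("organizationName") or "<unknown>"


def expired_cas(ca_certs, now=None):
    """Return (name, notAfter) pairs for certs whose notAfter lies in the past.

    ca_certs is a list of dicts in the format returned by
    SSLContext.get_ca_certs(); now is a POSIX timestamp (defaults to
    the current time).
    """
    now = time.time() if now is None else now
    expired = []
    for cert in ca_certs:
        not_after = cert.get("notAfter")
        if not_after and ssl.cert_time_to_seconds(not_after) < now:
            expired.append((subject_name(cert), not_after))
    return expired


if __name__ == "__main__":
    # The default context is what urllib uses when no context is passed.
    context = ssl.create_default_context()
    for name, not_after in expired_cas(context.get_ca_certs()):
        print(f"expired: {name} (notAfter {not_after})")
```

An expired root in the list does not by itself prove it is the one being picked during verification, but it narrows down which entries in the store are worth a closer look.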
But here's the weird part: You asked me to update certifi. Like certifi should be handling root certificates, not Windows.
But it's fairly clear that Windows is handling certificates, not certifi. Because of two reasons:
certifi isn't mentioned anywhere in your code.
For a demonstration of why I think this is the case, see this answer on StackOverflow.
To further demonstrate this:
import ssl
import urllib.request
import certifi
hostname = 'soundgasm.net'
# Default context: uses whatever CA certificates the system provides.
context = ssl.create_default_context()
try:
    with urllib.request.urlopen("https://soundgasm.net", context=context) as response:
        print("Success with no certifi context.")
except Exception:
    print("Failure with no certifi context.")
# Second context: trusts only the CA bundle shipped with certifi.
context2 = ssl.create_default_context(cafile=certifi.where())
try:
    with urllib.request.urlopen("https://soundgasm.net", context=context2) as response:
        print("Success with certifi context!")
except Exception:
    print("Failure with certifi context.")
Outputs:
Failure with no certifi context.
Success with certifi context!
So, on my machine, if I specifically tell it to use the certificate authority file from certifi, I can connect to soundgasm just fine.
The thing is, I barely know enough about Python to know how to test this in your code. Even me just typing import certifi at the top of download.py seems to not allow me to import certifi, and I'm not sure why.
Thank you for taking the time to investigate! It was an interesting read!
I wrongly assumed that certifi was being used by python internally, since some SO answers mentioned updating it as a solution. I'll think about adding certifi to my code directly, but I'm torn, since I found some issues of packages explicitly moving away from it so that the system bundle gets used.
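One possible middle ground, sketched below, is to keep the system bundle as the default and make certifi an opt-in. This is only an illustration and not how gwaripper currently works: make_ssl_context and its use_certifi and cafile parameters are hypothetical names, and certifi is treated as an optional dependency.

```python
import ssl

try:
    import certifi  # third-party; optional in this sketch
except ImportError:
    certifi = None


def make_ssl_context(use_certifi=False, cafile=None):
    """Build a verifying SSL context.

    With no arguments the system's default CA certificates are used.
    With use_certifi=True (and certifi installed) only certifi's bundle
    is trusted. An explicit cafile path takes precedence over both.
    """
    if cafile is None and use_certifi and certifi is not None:
        cafile = certifi.where()
    # When cafile is given, create_default_context loads only that file
    # instead of the system defaults.
    return ssl.create_default_context(cafile=cafile)
```

The resulting context could then be passed along the lines of urllib.request.urlopen(url, context=make_ssl_context(use_certifi=True)), with the flag driven by a config option so that users on affected machines can switch it on.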
Would it help knowing that using certifi would potentially be working around a known bug within cpython that has sat unresolved for months? (I don't know how much you want this specific project being linked to cpython issues in GitHub, so rather than pasting a link I'll just point you towards issue number 101738 at cpython's GitHub?)
That issue report might have alternative workarounds, though?
Actually, it turns out that maybe certifi is being used by default, supposedly? Though I'm not sure how/why me altering my Windows Certificate Store results in a change in behavior at that point.
import requests
import certifi
print(requests.certs.where())
print(certifi.where())
prints C:\Program Files\Python310\lib\site-packages\certifi\cacert.pem twice.
So, color me confused. Specifying the location fixes the issue, but it's supposed to be using that location by default.
Unless requests and urllib aren't using the same store?
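As far as I can tell, that is the case: requests points at certifi's bundle by default, while urllib goes through the ssl module's default context, which relies on OpenSSL's default paths and, on Windows, the system certificate store. A small sketch to compare the two sources side by side (trust_sources is a made-up helper name, and certifi is treated as optional):

```python
import ssl

try:
    import certifi  # third-party; this is the bundle requests uses by default
except ImportError:
    certifi = None


def trust_sources():
    """Collect the CA locations known to the ssl module and to certifi."""
    paths = ssl.get_default_verify_paths()
    return {
        # Where OpenSSL looks by default (either may be None).
        "openssl_cafile": paths.cafile,
        "openssl_capath": paths.capath,
        # certifi's bundle path, or None if certifi isn't installed.
        "certifi_bundle": certifi.where() if certifi is not None else None,
    }


if __name__ == "__main__":
    for source, location in trust_sources().items():
        print(f"{source}: {location}")
```

If the OpenSSL entries come back as None on Windows while the certifi path is set, that would be consistent with urllib falling back on the Windows Certificate Store and requests reading cacert.pem.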
A friend of mine is looking to back up her content before nuking everything.
I'm currently using the 0.6.8_single-folder version and running .\gwaripper.exe redditor 200 <username> (is there a way to not have to limit the # of posts?) and I'm getting nothing but the following kinds of errors:
I also get this error if I just try to grab a post from the subreddit's current front page. For example:
.\gwaripper.exe links <any comments link from GWA, or any file link from Soundgasm>
gives me the same errors.
(Also, praw is complaining about a new version being out. But I don't think that matters here?)
If it helps, I've checked the certificate sent by Soundgasm via web browser, and that one definitely isn't expired yet.