shadowmoose / RedditDownloader

Scrapes Reddit to download media of your choice.

In praw_wrapper.py the function praw_apply_filter is not handling 404 exceptions #225

Open scubanarc opened 2 years ago

scubanarc commented 2 years ago

Describe the bug

All downloads stop on some 404s, the UI hangs, and the RMD application has to be restarted.

Environment Info

Screenshots/Information

The following is the last data dumped:

Sep 22 12:44:35 vm-rmd RMD-ubuntu[85316]: HTTPSConnectionPool(host='assets', port=443): Max retries exceeded with url: /favicon-16x16.png (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f1eabaa3b10>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution'))
Sep 22 12:44:42 vm-rmd RMD-ubuntu[85313]: Process RedditElementLoader:
Sep 22 12:44:42 vm-rmd RMD-ubuntu[85313]: Traceback (most recent call last):
Sep 22 12:44:42 vm-rmd RMD-ubuntu[85313]: File "multiprocessing/process.py", line 297, in _bootstrap
Sep 22 12:44:42 vm-rmd RMD-ubuntu[85313]: File "processing/redditloader.py", line 30, in run
Sep 22 12:44:42 vm-rmd RMD-ubuntu[85313]: File "processing/redditloader.py", line 51, in load
Sep 22 12:44:42 vm-rmd RMD-ubuntu[85313]: File "processing/redditloader.py", line 65, in _scan_sources
Sep 22 12:44:42 vm-rmd RMD-ubuntu[85313]: File "sources/subreddit_posts_source.py", line 16, in get_elements
Sep 22 12:44:42 vm-rmd RMD-ubuntu[85313]: File "static/praw_wrapper.py", line 131, in subreddit_posts
Sep 22 12:44:42 vm-rmd RMD-ubuntu[85313]: File "static/praw_wrapper.py", line 221, in _praw_apply_filter
Sep 22 12:44:42 vm-rmd RMD-ubuntu[85313]: File "praw/models/listing/generator.py", line 63, in next
Sep 22 12:44:42 vm-rmd RMD-ubuntu[85313]: File "praw/models/listing/generator.py", line 73, in _next_batch
Sep 22 12:44:42 vm-rmd RMD-ubuntu[85313]: File "praw/reddit.py", line 566, in get
Sep 22 12:44:42 vm-rmd RMD-ubuntu[85313]: File "praw/reddit.py", line 672, in _objectify_request
Sep 22 12:44:42 vm-rmd RMD-ubuntu[85313]: File "praw/reddit.py", line 855, in request
Sep 22 12:44:42 vm-rmd RMD-ubuntu[85313]: File "prawcore/sessions.py", line 331, in request
Sep 22 12:44:42 vm-rmd RMD-ubuntu[85313]: File "prawcore/sessions.py", line 260, in _request_with_retries
Sep 22 12:44:42 vm-rmd RMD-ubuntu[85313]: prawcore.exceptions.NotFound: received 404 HTTP response

Additional context

It appears that in praw_apply_filter you are only handling:

except TypeError as e:

When prawcore/sessions.py gets a 404, the exception bubbles up the stack and stops RMD dead in its tracks.
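For illustration, here is a rough sketch of the kind of handling I mean. I have not seen the body of the real function, so this is not a patch: the helper name `iterate_tolerating_errors`, the `on_error` callback, and the `tolerated` parameter are all made up for the example. The only real name is `prawcore.exceptions.NotFound`, taken from the traceback above, and the sketch falls back to a stand-in class when prawcore is not installed. The idea is to catch the exception where the listing is advanced, report it, and end that one listing instead of letting the whole loader process die.

```python
try:
    # Real exception type from the traceback above.
    from prawcore.exceptions import NotFound
except ImportError:
    class NotFound(Exception):
        """Stand-in so this sketch runs without prawcore installed."""


def iterate_tolerating_errors(listing, on_error=None, tolerated=(NotFound,)):
    """Yield items from `listing`, ending quietly on a tolerated error.

    Hypothetical helper, not part of RMD or PRAW. `listing` is any
    iterable, `on_error` is an optional callback that receives the
    exception, and `tolerated` is a tuple of exception classes that
    should end this listing rather than propagate.
    """
    iterator = iter(listing)
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            return  # listing finished normally
        except tolerated as err:
            if on_error is not None:
                on_error(err)  # e.g. log which source failed
            return  # give up on this listing only; the caller keeps running
        yield item
```

A caller could then loop over `iterate_tolerating_errors(some_listing, on_error=print)` and move on to the next source after a 404. Anything not listed in `tolerated` still propagates, so unexpected failures stay visible.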

shadowmoose commented 2 years ago

Interesting. I wasn't aware it was possible to receive a 404 from Reddit mid-scan like that. Do you know what source is providing the problematic post?

scubanarc commented 2 years ago

I do not. I tried to figure it out, but the error logs said little about it, and the download queue was thousands of entries long.

It happens often, though. It may be related to using PIA VPN, but that is just wild conjecture.

shadowmoose commented 2 years ago

Thanks for checking anyway. I'll try to hunt this down, but I'm unlikely to keep pushing out updates to the Python build of RMD for much longer, since the rewrite is nearing feature parity. It is prohibitively difficult to chase down all the edge cases the Python build seems to produce for some users.