rhiever / reddit-analysis

A Python script that parses post titles, self-texts, and comments on reddit and makes word clouds out of the word frequencies.
286 stars 63 forks

Fix HTTP 500 Internal Server Error #10

Closed: rhiever closed this issue 11 years ago

rhiever commented 11 years ago

Sometimes in huge subreddits, there's so many submissions to scrape that the script dies with a HTTP 500 Internal Server Error.

(see comments below)

bboe commented 11 years ago

> there's so many submissions to scrape that the script dies with a HTTP 500 Internal Server Error

That's not the issue. The problem is that reddit currently has some broken comment trees, which result in 500 errors on some submissions. You just need to apply a fix similar to the one I made for subreddit_stats: https://github.com/praw-dev/prawtools/commit/3227acb23435d14540f8bda8e8a60f66980d1b78#L1R207

Probably just put the try/except around the processSubmission call in processSubreddit.
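
For illustration, here is a minimal sketch of that suggestion. It is not the actual reddit-analysis or prawtools code: `ServerError`, `process_submission`, and `process_subreddit` are hypothetical stand-ins for the real exception type and for processSubmission and processSubreddit. The exception the real client library raises on an HTTP 500 may differ, so check what the script actually sees before copying the `except` clause.

```python
class ServerError(Exception):
    """Hypothetical stand-in for the exception raised on an HTTP 500."""


def process_submission(submission, counts):
    """Stand-in for processSubmission: count the words in one title."""
    if submission.get("broken"):
        # Simulates a submission whose comment tree triggers a 500 error.
        raise ServerError("500 Internal Server Error")
    for word in submission["title"].lower().split():
        counts[word] = counts.get(word, 0) + 1


def process_subreddit(submissions):
    """Stand-in for processSubreddit: skip submissions that fail."""
    counts = {}
    skipped = 0
    for submission in submissions:
        try:
            process_submission(submission, counts)
        except ServerError:
            # One broken submission should not kill the whole run.
            skipped += 1
            continue
    return counts, skipped


submissions = [
    {"title": "Hello world"},
    {"title": "unreachable", "broken": True},
    {"title": "hello again"},
]
counts, skipped = process_subreddit(submissions)
print(counts, skipped)
```

Catching only the specific server error, rather than a bare `except`, keeps unrelated bugs visible while still letting the scrape continue past broken submissions.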

rhiever commented 11 years ago

Ahhh, thanks for pointing that out! Will split this into two issues then -- one to fix the bug, one to add the command line param.

bboe commented 11 years ago

Not seeing your comment anymore -- but you only need to wrap the processSubmission call in the processSubreddit code. The processSubmission call from processUser doesn't actually fetch the submission, so it won't receive the 500 error unless something is really broken on reddit, in which case the results will likely be badly wrong and you'll want to re-run.

rhiever commented 11 years ago

I removed my comment b/c I saw you already answered it above. :-)

rhiever commented 11 years ago

Fixed by https://github.com/rhiever/reddit-analysis/pull/15

bboe commented 11 years ago

Just curious: why are you making pull requests to merge? It's great to work in a separate branch, but with direct-push access you can merge directly without ever pushing the temporary branch to the remote.

rhiever commented 11 years ago

Just practicing working in a separate branch. Probably better practice in the long run?

bboe commented 11 years ago

Makes sense. I just wanted to check if you wanted me to follow the same procedure if/when I make changes. Given your response, it seems skipping the PR step is okay.

rhiever commented 11 years ago

Mostly trying to avoid commit issues like we ran into earlier. Those are icky.

bboe commented 11 years ago

> Mostly trying to avoid commit issues like we ran into earlier. Those are icky.

It's definitely going to happen from time to time no matter how much branching you do. The key takeaway is that if the merge can't happen automatically, you need to proceed with caution. For instance, you could have replied that the merge wouldn't auto-commit, and I could have updated it so that it would (that's essentially what I did with the branch you reset to). You probably noticed I didn't get it completely correct, as I forgot to propagate the file deletion as part of my merge.

rhiever commented 11 years ago

:+1: