rhiever / reddit-analysis

A Python script that parses post titles, self-texts, and comments on reddit and makes word clouds out of the word frequencies.

running word_freqs throws ValueError for all submissions #53

Open dotnwat opened 8 years ago

dotnwat commented 8 years ago

I'm having some trouble running word_freqs. When I run the tool, each submission is skipped due to a ValueError, and the output files are empty. Example:

(env) MacBook-Air:reddit-analysis john$ word_freqs -l 1 -p day johndoe /r/news
Analyzing /r/news
.
Skipping submission https://www.reddit.com/r/news/comments/4zkn83/itt_tech_banned_from_enrolling_new_students_who/ due to ValueError.

rhiever commented 8 years ago

Darn. Maybe Reddit changed their API and that's messing up the script. I've filed this as a probable bug, but I probably won't find time to look into this anytime soon. Add a comment here if you're interested in digging into this bug.

cc @bboe

dotnwat commented 8 years ago

I can dig into it a bit. What I'm seeing is that in parse_text an exception is thrown when the BeautifulSoup object is created.

The input that I printed out was

I just sold all my things and bought a one way ticket to the other side of the world. I consider 
myself opting out; engaged withdrawal. Experimenting with shedding the ways in which I've 
been conditioned by my culture and my comforts. I'll be camping down the coast by myself for 
the next few weeks before catching a flight with nothing but the pack on my back. I have very 
few solid plans, just trust. It's thrilling and extremely uncomfortable.

And it looks like the exception that is thrown is <class 'bs4.FeatureNotFound'>
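
For anyone reproducing this: bs4 raises FeatureNotFound when the parser name passed to the BeautifulSoup constructor is not installed (lxml is a common case), and FeatureNotFound subclasses ValueError, which would be consistent with the "due to ValueError" message. Below is a minimal sketch of a fallback, assuming the script requests a parser that is missing from the environment. The helper name make_soup and the parser order are hypothetical and not taken from word_freqs.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(text, parsers=("lxml", "html.parser")):
    """Build a BeautifulSoup object with the first parser that is available.

    Hypothetical helper: the name and the parser order are assumptions,
    not code from word_freqs.
    """
    for name in parsers:
        try:
            return BeautifulSoup(text, name)
        except FeatureNotFound:
            # The requested parser is not installed; try the next one.
            continue
    raise FeatureNotFound(
        "none of these parsers is installed: %s" % ", ".join(parsers)
    )


# FeatureNotFound is a ValueError, so a broad `except ValueError` also catches it.
print(issubclass(FeatureNotFound, ValueError))

soup = make_soup("<p>It's thrilling and <b>extremely</b> uncomfortable.</p>")
print(soup.get_text())
```

If this is the cause, installing the missing parser (for example `pip install lxml`) would be the other route, without touching the script.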

bboe commented 8 years ago

@rhiever I don't have any insight to add from Reddit's API side. Perhaps there is a new bs4 version that is backwards incompatible?

mikeygeeman commented 8 years ago

Try changing findAll to find_all

I think that changed somewhere
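
A short sketch of the snake_case spelling, for reference. As far as I know, bs4 kept findAll as a deprecated alias of find_all, so the rename by itself may not explain a FeatureNotFound. The markup below is made up for illustration.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration; "html.parser" ships with Python.
html = "<div><p>first</p><p>second</p></div>"
soup = BeautifulSoup(html, "html.parser")

# find_all is the bs4 name for the method BeautifulSoup 3 called findAll.
paragraphs = [p.get_text() for p in soup.find_all("p")]
print(paragraphs)
```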

bboe commented 6 years ago

Also this one, is it still an issue?

rhiever commented 6 years ago

Never got around to digging into this one to see if it is indeed a bug.