Closed rhiever closed 11 years ago
So I think this is where post-processing should be used. By itself, i.imgur.com won't get through, but something like i.imgur.com/blah.jpg will get through and be parsed as:
['imgur.com', 'blah.jpg']
Is it really a problem though -- were those significant enough to show up in the graph? I doubt many people will write out the same URL without the http:// prefix.
Is it really a problem though -- were those significant enough to show up in the graph?
Yep, /r/androidcirclejerk had big i.imgur.com and somefilename.jpg words in the word cloud. I had to manually remove them.
Hmm -- maybe the best thing to do is to sort the output file by count so it's easy to make removals before running through wordle.
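A rough sketch of the sort-by-count idea, in case it helps. The function name and the `word:count` line format are assumptions for illustration, not necessarily what the script currently writes:

```python
from collections import Counter


def sorted_count_lines(counts):
    """Return 'word:count' lines, highest count first.

    The 'word:count' format is an assumption made for this example.
    """
    # Sort by count descending, then alphabetically so ties come out
    # in a predictable order.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{word}:{count}" for word, count in ordered]


# Made-up sample data, only to show the ordering.
sample = Counter(["i.imgur.com", "phone", "i.imgur.com", "android", "phone", "i.imgur.com"])
for line in sorted_count_lines(sample):
    print(line)
```

With the most frequent words at the top, unwanted entries such as i.imgur.com are easy to spot and delete by hand before pasting into wordle.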
Edit: Ah, I think I have a fix.
Fix specific to this case in 6b4ff465d5cda78eacc0280680089b129b95f411. It will still allow somefilename.jpg to be added if it appears by itself, but that's probably okay if it appears by itself often enough to be included in the graphic.
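For anyone reading later, here is a minimal sketch of the kind of filtering described above. It is not the code from that commit; the pattern and the `strip_url_like` helper are hypothetical and only illustrate dropping tokens that look like a host plus a path, with or without the http:// prefix, before they get split into words:

```python
import re

# Hypothetical pattern: optional scheme, a dotted host name, then a slash
# and a path, e.g. "i.imgur.com/blah.jpg" or "http://i.imgur.com/blah.jpg".
URL_LIKE = re.compile(r"^(?:https?://)?(?:[\w-]+\.)+[a-z]{2,}/\S*$", re.IGNORECASE)


def strip_url_like(tokens):
    """Drop whitespace-separated tokens that look like a host plus a path."""
    return [t for t in tokens if not URL_LIKE.match(t)]


comment = "see i.imgur.com/blah.jpg and somefilename.jpg"
print(strip_url_like(comment.split()))
```

As with the behavior described above, a bare somefilename.jpg has no host and slash in front of it, so this sketch leaves it in.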
:+1:
I had one come up where it had i.imgur.com and somefilename.jpg in the final results.