t3nsor / quora-backup

Python scripts to download Quora answers and convert them into a more portable form
GNU General Public License v2.0
125 stars, 72 forks

Update crawler.py #18

Closed. dda closed this 5 years ago.

dda commented 7 years ago

Line #167:

    page_html = urllib.request.urlopen(url[0:22] + urllib.parse.quote(url[22:])).read()

Ad-hoc change to fix a bug where URLs with non-ASCII characters would break urlopen().
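Why the split at index 22 works: the first 22 characters of a Quora URL are `https://www.quora.com/`, and `urllib.parse.quote` percent-encodes the remaining path as UTF-8 while leaving `/` (its default `safe` character) untouched. In Python 3, passing a URL that still contains raw non-ASCII characters to `urlopen` typically fails with a `UnicodeEncodeError`, because the HTTP request line is encoded as ASCII. The sketch below reproduces the PR's idea without any network access. The helper name `quote_quora_path` and the constant `PREFIX` are hypothetical and do not come from crawler.py.

```python
from urllib.parse import quote

# Hypothetical constant: scheme + host + trailing slash, exactly 22 characters.
PREFIX = "https://www.quora.com/"


def quote_quora_path(url: str) -> str:
    """Percent-encode everything after the Quora host, mirroring url[0:22] + quote(url[22:])."""
    if not url.startswith(PREFIX):
        raise ValueError(f"expected a URL starting with {PREFIX!r}: {url!r}")
    # quote() encodes non-ASCII text as UTF-8 %XX escapes and keeps "/" as-is.
    return url[:len(PREFIX)] + quote(url[len(PREFIX):])


raw = "https://www.quora.com/Why-are-the-dx-i\u2019s-examples-of-1-forms-on-R-n/answer/Brian-Bi"
print(quote_quora_path(raw))
```

Slicing at a fixed offset only works while every URL begins with exactly that prefix. `urllib.parse.urlsplit` would be a more robust way to isolate the path.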

t3nsor commented 5 years ago

Can you clarify why this change is needed? For example in my answers.json I see stuff like this:

https://www.quora.com/Why-are-the-dx-i%E2%80%99s-examples-of-1-forms-on-R-n/answer/Brian-Bi

which, as you can see, already has the non-ASCII character correctly encoded.
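The question matters because `quote` also encodes `%`. Applying the patch to a URL that is already percent-encoded, like the one above, therefore double-encodes it: `%E2` becomes `%25E2`. The sketch below shows that effect. It also shows one possible guard, which quotes only when raw non-ASCII characters are still present. The guard and the helper names (`naive_quote`, `quote_if_needed`) are hypothetical and not part of crawler.py. The guard also assumes that an ASCII-only URL is already safely encoded, which would not hold for something like a raw space.

```python
from urllib.parse import quote

ENCODED = "https://www.quora.com/Why-are-the-dx-i%E2%80%99s-examples-of-1-forms-on-R-n/answer/Brian-Bi"
RAW = "https://www.quora.com/Why-are-the-dx-i\u2019s-examples-of-1-forms-on-R-n/answer/Brian-Bi"


def naive_quote(url: str) -> str:
    # The PR's expression, applied unconditionally.
    return url[0:22] + quote(url[22:])


def quote_if_needed(url: str) -> str:
    # Hypothetical guard: assume an ASCII-only URL is already percent-encoded
    # and leave it alone; otherwise encode the path after the 22-char prefix.
    if url.isascii():
        return url
    return url[0:22] + quote(url[22:])


print(naive_quote(ENCODED))      # "%" turned into "%25": double-encoded
print(quote_if_needed(ENCODED))  # unchanged
print(quote_if_needed(RAW))      # same as ENCODED
```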

Kongduino commented 5 years ago

Since that was almost 2 years ago, hell if I remember...

I submitted this because, back then, I got an error with the original code, and this change fixed it.

Whatever man...