nedchu closed this 5 years ago
Hi, thanks for highlighting the Unicode issue.
I believe replacing characters won't be a generic solution. Have you tried changing the file `open()` mode to binary, i.e. `"wb"` instead of `"w"`?
Hi, your suggestion works!
I've changed the file open mode to binary and now use the HTML's original encoding, `soup.original_encoding`, to decode `soup` and turn it into bytes.
During testing, the code could safely download http://moss.stanford.edu/results/305606753/.
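A minimal sketch of the approach described above, assuming the page has already been parsed into a BeautifulSoup object: encode the document back into bytes with its detected encoding and write those bytes in binary mode. The helper name `save_soup` and the UTF-8 fallback are hypothetical choices for illustration, not the code that was merged.

```python
def save_soup(soup, path):
    """Write a parsed document to `path` as raw bytes (hypothetical helper).

    `soup` is assumed to be a bs4.BeautifulSoup object, which exposes
    `original_encoding` (None when it was built from a str) and
    `encode(encoding)`, which returns bytes.
    """
    # Assumption: fall back to UTF-8 when the original encoding is unknown.
    encoding = soup.original_encoding or "utf-8"
    data = soup.encode(encoding)
    # "wb" writes the bytes unchanged, so no implicit conversion through the
    # platform's default text encoding happens (unlike text mode "w").
    with open(path, "wb") as f:
        f.write(data)
    return encoding
```

With bs4 installed, a call might look like `save_soup(BeautifulSoup(raw_bytes, "html.parser"), "result/index.html")`. Passing the raw response bytes (rather than an already-decoded string) to `BeautifulSoup` is what lets it detect and record `original_encoding`.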
Looks good, merged.
In `download_report.py`, when downloading http://moss.stanford.edu/results/305606753/, I get an error. This problem is fixed by:

Also in `download_report.py`, when calling `download_report(url, path)`, if `path` does not end with `/`, the directory name becomes a prefix of the file names. For example, calling `download_report(url, "./result")` produces `./resultindex.html` and `./resultmatch0.html`. This problem is fixed by:
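The prefix problem comes from concatenating the directory and file name as plain strings. A minimal sketch of one way to avoid it, assuming `os.path.join` is acceptable here (it is a swapped-in illustration, not necessarily the patch used in this thread): `os.path.join` inserts a separator only when `path` does not already end with one. The helper name `report_file_paths` is hypothetical.

```python
import os


def report_file_paths(path, num_matches):
    """Return output paths for index.html and match0.html ... (hypothetical helper).

    Plain concatenation such as path + "index.html" turns "./result" into
    "./resultindex.html"; os.path.join adds a separator only when needed,
    so "./result" and "./result/" both give files inside the directory.
    """
    # File names follow the pattern mentioned above: index.html, match0.html, ...
    names = ["index.html"] + [f"match{i}.html" for i in range(num_matches)]
    return [os.path.join(path, name) for name in names]
```

For example, `report_file_paths("./result", 2)` gives `./result/index.html`, `./result/match0.html` and `./result/match1.html` on POSIX systems, whether or not the trailing `/` was supplied.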