taganaka / polipus

Polipus: distributed and scalable web-crawler framework
MIT License
92 stars, 32 forks

Revert 59 fix utf8 support #60

Closed: pcboy closed this 9 years ago

pcboy commented 9 years ago

I found a better way to handle these unusual Japanese charsets. In the previous version I passed the encoding to Nokogiri::HTML manually, but Nokogiri does not seem to fully support Shift_JIS. So now I use kconv's String#toutf8 monkey patch to convert the source to UTF-8, and set the Nokogiri encoding to UTF-8. It works well and is much safer (and simpler).
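
For readers unfamiliar with kconv, the approach described above might look roughly like the sketch below. It is not the code from this pull request: the helper names `body_to_utf8` and `parse_html` are hypothetical, and the sample body is invented for illustration. It assumes the `kconv` library (part of Ruby's nkf extension) and the `nokogiri` gem are available.

```ruby
require 'kconv' # adds String#toutf8, which guesses the source encoding

# Hypothetical helper: convert a raw response body in an unknown Japanese
# encoding (Shift_JIS, EUC-JP, ISO-2022-JP, ...) to UTF-8.
def body_to_utf8(raw_body)
  raw_body.toutf8
end

# Hypothetical helper: parse the converted body and tell Nokogiri that
# the input is UTF-8, instead of passing the original charset.
def parse_html(raw_body)
  require 'nokogiri' # loaded lazily so the conversion works without the gem
  Nokogiri::HTML(body_to_utf8(raw_body), nil, 'UTF-8')
end

# Simulate a Shift_JIS page as raw bytes, as an HTTP client might return it.
sjis_body = '<html><body><p>日本語のページ</p></body></html>'.encode('Shift_JIS').b
utf8_body = body_to_utf8(sjis_body)
puts utf8_body.encoding
```

Because the conversion happens before parsing, Nokogiri only ever sees UTF-8 input, so the result no longer depends on which source charsets the parser supports. Note that `toutf8` guesses the input encoding, so very short or ambiguous inputs may be detected incorrectly.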

coveralls commented 9 years ago

Coverage increased (+0.5%) to 93.29% when pulling cd8d5838512bbc07c95c37e67c7e9099f4b4ae4f on pcboy:revert-59-fix_utf8_support into 95c325b6747bde6200cda04c13513ff407d4003c on taganaka:master.