I found a better way to handle Japanese charsets such as Shift_JIS.
In the previous version I passed the encoding to Nokogiri::HTML manually.
But it seems Shift_JIS is not fully supported by Nokogiri.
So now I use the String#toutf8 monkeypatch from kconv to convert the source to UTF-8, and I set the Nokogiri encoding to UTF-8. It works well and is much safer (and simpler), because Nokogiri only ever sees UTF-8 input.
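For reviewers, here is a minimal sketch of the approach described above; it is not the exact code in this PR. The helper names `to_utf8` and `parse_html` are hypothetical. It assumes kconv (Ruby standard library, which guesses the input encoding) and the nokogiri gem are available.

```ruby
require 'kconv' # adds String#toutf8, which guesses the source encoding

# Hypothetical helper: convert a fetched page body to UTF-8.
# The input may be Shift_JIS, EUC-JP, ISO-2022-JP or already UTF-8.
def to_utf8(raw)
  raw.toutf8
end

# Hypothetical helper: parse the page after normalising it to UTF-8.
# nokogiri is required lazily so the conversion step can be used alone.
def parse_html(raw)
  require 'nokogiri'
  # Arguments: input, base URL (none here), encoding of the input.
  Nokogiri::HTML(to_utf8(raw), nil, 'UTF-8')
end

# Example input: a small page encoded as Shift_JIS.
sjis_page = "<html><body><p>\u65E5\u672C\u8A9E</p></body></html>".encode('Shift_JIS')
utf8_page = to_utf8(sjis_page)
puts utf8_page.encoding
```

With nokogiri installed, `parse_html(sjis_page).at('p').text` should give back the paragraph text as a UTF-8 string. Since kconv guesses the input encoding, very short or ambiguous inputs could in principle be misdetected.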
Coverage increased (+0.5%) to 93.29% when pulling cd8d5838512bbc07c95c37e67c7e9099f4b4ae4f on pcboy:revert-59-fix_utf8_support into 95c325b6747bde6200cda04c13513ff407d4003c on taganaka:master.