sparklemotion / mechanize

Mechanize is a Ruby library that makes automated web interaction easy.
https://www.rubydoc.info/gems/mechanize/
MIT License

Mechanize::Page#encoding_error? raises "ArgumentError: invalid byte sequence in UTF-8" when a parse error message contains invalid bytes. #553

Closed (nejiko96 closed 3 years ago)

nejiko96 commented 4 years ago

I think this can be avoided with a patch like this:

--- a/lib/mechanize/page.rb
+++ b/lib/mechanize/page.rb
@@ -103,7 +103,7 @@ class Mechanize::Page < Mechanize::File
     parser = self.parser unless parser
     return false if parser.errors.empty?
     parser.errors.any? do |error|
-      error.message =~ /(indicate\ encoding)|
+      error.message.scrub =~ /(indicate\ encoding)|
                         (Invalid\ char)|
                         (input\ conversion\ failed)/x
     end

However, String#scrub requires Ruby 2.1.0 or later.
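
To illustrate the failure mode outside of Mechanize, here is a minimal sketch. The message text, the constant name, and both helper methods are made up for the example and are not part of Mechanize; the fallback branch is only one possible workaround for Rubies without String#scrub, not necessarily what the library does.

```ruby
# Pattern copied from the patch; the constant name is hypothetical.
ENCODING_ERROR_PATTERN = /(indicate\ encoding)|
                          (Invalid\ char)|
                          (input\ conversion\ failed)/x

# Hypothetical helper: return a copy of the message with invalid bytes replaced.
def safe_message(message)
  if message.respond_to?(:scrub)
    # String#scrub is available on Ruby 2.1 and later.
    message.scrub
  else
    # Assumed fallback for older Rubies: round-trip through UTF-16BE,
    # replacing bytes that cannot be converted.
    message.encode('UTF-16BE', invalid: :replace, undef: :replace).encode('UTF-8')
  end
end

# Hypothetical helper mirroring the check inside the patched block.
def encoding_error_message?(message)
  !!(safe_message(message) =~ ENCODING_ERROR_PATTERN)
end

# Made-up parser error message containing a stray 0xFF byte,
# which is not valid UTF-8.
bad_message = "Invalid char \xFF in document"

begin
  # Matching the raw string raises, as in the reported backtrace.
  bad_message =~ ENCODING_ERROR_PATTERN
rescue ArgumentError => e
  puts "unscrubbed match raised: #{e.message}"
end

# Matching the cleaned-up copy does not raise.
puts encoding_error_message?(bad_message)
```

The helper works on a copy, so the original error message is left untouched.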

flavorjones commented 3 years ago

@nejiko96 Can you help me reproduce the error you'd like to fix?

nejiko96 commented 3 years ago

require 'mechanize'

agent = Mechanize.new
page = agent.get('https://atcoder.jp/contests/apg4b/tasks/APG4b_r')
page.search('body')

.../mechanize-2.7.7/lib/mechanize/page.rb:108:in `block in encoding_error?': invalid byte sequence in UTF-8 (ArgumentError)
        from .../mechanize-2.7.7/lib/mechanize/page.rb:105:in `any?'
        from .../mechanize-2.7.7/lib/mechanize/page.rb:105:in `encoding_error?'
        from .../mechanize-2.7.7/lib/mechanize/page.rb:126:in `block in parser'
        from .../mechanize-2.7.7/lib/mechanize/page.rb:123:in `reverse_each'
        from .../mechanize-2.7.7/lib/mechanize/page.rb:123:in `parser'
        from .../ruby/2.4.0/forwardable.rb:223:in `search'
        from test.rb:5:in `<main>'

flavorjones commented 3 years ago

Thank you for your help. Working on a fix now.

flavorjones commented 3 years ago

I've released v2.8.1 which fixes this. Thanks again.

nejiko96 commented 3 years ago

I have confirmed that it is fixed. Thank you very much.