spk / validate-website

Web crawler for checking the validity of your documents.
https://spk.github.com/validate-website/
MIT License

Crash: `split': URI must be ascii only #15

Closed: nono closed this issue 8 years ago

nono commented 8 years ago
$ validate-website -s https://cozy.io/ -n   
WARNING: Nokogiri was built against LibXML version 2.9.1, but has dynamically loaded 2.9.2
validating https://cozy.io/

.../home/nono/.rubies/ruby-2.2.0/lib/ruby/2.2.0/uri/rfc3986_parser.rb:20:in `split': URI must be ascii only "../images/community/contributors/nicodel.jp\u{11d}" (URI::InvalidURIError)
    from /home/nono/.rubies/ruby-2.2.0/lib/ruby/2.2.0/uri/rfc3986_parser.rb:72:in `parse'
    from /home/nono/.rubies/ruby-2.2.0/lib/ruby/2.2.0/uri/common.rb:226:in `parse'
    from /home/nono/.gem/ruby/2.2.0/gems/validate-website-1.5.3/lib/validate_website/crawl.rb:38:in `block in extract_imgs_from_page'
    from /home/nono/.gem/ruby/2.2.0/gems/nokogiri-1.6.6.2/lib/nokogiri/xml/node_set.rb:187:in `block in each'
    from /home/nono/.gem/ruby/2.2.0/gems/nokogiri-1.6.6.2/lib/nokogiri/xml/node_set.rb:186:in `upto'
    from /home/nono/.gem/ruby/2.2.0/gems/nokogiri-1.6.6.2/lib/nokogiri/xml/node_set.rb:186:in `each'
    from /home/nono/.gem/ruby/2.2.0/gems/validate-website-1.5.3/lib/validate_website/crawl.rb:36:in `reduce'
    from /home/nono/.gem/ruby/2.2.0/gems/validate-website-1.5.3/lib/validate_website/crawl.rb:36:in `extract_imgs_from_page'
    from /home/nono/.gem/ruby/2.2.0/gems/validate-website-1.5.3/lib/validate_website/crawl.rb:63:in `block in on_every_html_page'
    from /home/nono/.gem/ruby/2.2.0/gems/spidr-0.4.1/lib/spidr/events.rb:238:in `block in every_html_page'
    from /home/nono/.gem/ruby/2.2.0/gems/spidr-0.4.1/lib/spidr/agent.rb:582:in `call'
    from /home/nono/.gem/ruby/2.2.0/gems/spidr-0.4.1/lib/spidr/agent.rb:582:in `block (2 levels) in visit_page'
    from /home/nono/.gem/ruby/2.2.0/gems/spidr-0.4.1/lib/spidr/agent.rb:582:in `each'
    from /home/nono/.gem/ruby/2.2.0/gems/spidr-0.4.1/lib/spidr/agent.rb:582:in `block in visit_page'
    from /home/nono/.gem/ruby/2.2.0/gems/spidr-0.4.1/lib/spidr/agent.rb:518:in `block in get_page'
    from /home/nono/.gem/ruby/2.2.0/gems/spidr-0.4.1/lib/spidr/agent.rb:684:in `prepare_request'
    from /home/nono/.gem/ruby/2.2.0/gems/spidr-0.4.1/lib/spidr/agent.rb:512:in `get_page'
    from /home/nono/.gem/ruby/2.2.0/gems/spidr-0.4.1/lib/spidr/agent.rb:578:in `visit_page'
    from /home/nono/.gem/ruby/2.2.0/gems/spidr-0.4.1/lib/spidr/agent.rb:249:in `run'
    from /home/nono/.gem/ruby/2.2.0/gems/spidr-0.4.1/lib/spidr/agent.rb:231:in `start_at'
    from /home/nono/.gem/ruby/2.2.0/gems/spidr-0.4.1/lib/spidr/agent.rb:184:in `site'
    from /home/nono/.gem/ruby/2.2.0/gems/spidr-0.4.1/lib/spidr/spidr.rb:96:in `site'
    from /home/nono/.gem/ruby/2.2.0/gems/validate-website-1.5.3/lib/validate_website/crawl.rb:44:in `spidr_crawler'
    from /home/nono/.gem/ruby/2.2.0/gems/validate-website-1.5.3/lib/validate_website/crawl.rb:21:in `crawl'
    from /home/nono/.gem/ruby/2.2.0/gems/validate-website-1.5.3/lib/validate_website/runner.rb:16:in `run_crawl'
    from /home/nono/.gem/ruby/2.2.0/gems/validate-website-1.5.3/bin/validate-website:5:in `<top (required)>'
    from /home/nono/.gem/ruby/2.2.0/bin/validate-website:23:in `load'
    from /home/nono/.gem/ruby/2.2.0/bin/validate-website:23:in `<main>'
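
Note on the trace above: the exception comes from `URI.parse` being handed an image path that contains a non-ASCII character (`\u{11d}`). The thread does not show how v1.5.4 fixed it, so the following is only a minimal sketch of one defensive approach, assuming it is acceptable to skip links that cannot be parsed. The helper names `parse_or_nil` and `parseable_image_uris` are hypothetical and are not part of validate-website.

```ruby
require 'uri'

# Hypothetical helper (not from validate-website): parse a link and
# return nil instead of raising when the string is not a valid URI.
def parse_or_nil(href)
  URI.parse(href)
rescue URI::InvalidURIError
  nil
end

# Hypothetical stand-in for the image extraction step: keep only the
# src values that parse cleanly, dropping the ones that do not.
def parseable_image_uris(srcs)
  srcs.map { |src| parse_or_nil(src) }.compact
end

srcs = [
  "../images/logo.png",                                  # plain ASCII path (made up for the example)
  "../images/community/contributors/nicodel.jp\u{11d}"   # non-ASCII path from the report
]

uris = parseable_image_uris(srcs)
puts uris.map(&:to_s).inspect
```

With a rescue like this the crawl would continue past the bad link instead of aborting. Whether to skip, log, or percent-encode such links is a design choice this sketch does not settle.
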
spk commented 8 years ago

Thanks for the report! v1.5.4 released!

nono commented 8 years ago

Thanks for the quick fix!