Closed mustiikhalil closed 6 years ago
I'm getting a similar error when crawling a site.
Along the lines of;
Failure/Error raise InvalidURIError, "path conflicts with opaque"
you can clone master in the gem file and it would work perfectly
Finally fixed in 0.6.1.
I'm trying to crawl stackoverflow but the crawler keeps on giving me this error. apparently the problem is happening whenever it reaches the following link
I'm not sure how to fix it. since "subject=Stack%20Overflow%20Question&body=Time%20series%20speed%20forecasting%20using%20regression%20with%20exogenous%20variables%0Ahttps%3a%2f%2fstackoverflow.com%2fq%2f49618734%3fsem%3d2"
Traceback (most recent call last): 21: from main.rb:4:in
start_crawling' 19: from /usr/local/lib/ruby/gems/2.5.0/gems/spidr-0.6.0/lib/spidr/spidr.rb:53:in
site' 18: from /usr/local/lib/ruby/gems/2.5.0/gems/spidr-0.6.0/lib/spidr/agent.rb:274:insite' 17: from /usr/local/lib/ruby/gems/2.5.0/gems/spidr-0.6.0/lib/spidr/agent.rb:355:in
start_at' 16: from /usr/local/lib/ruby/gems/2.5.0/gems/spidr-0.6.0/lib/spidr/agent.rb:373:inrun' 15: from /usr/local/lib/ruby/gems/2.5.0/gems/spidr-0.6.0/lib/spidr/agent.rb:665:in
visit_page' 14: from /usr/local/lib/ruby/gems/2.5.0/gems/spidr-0.6.0/lib/spidr/agent.rb:599:inget_page' 13: from /usr/local/lib/ruby/gems/2.5.0/gems/spidr-0.6.0/lib/spidr/agent.rb:788:in
prepare_request' 12: from /usr/local/lib/ruby/gems/2.5.0/gems/spidr-0.6.0/lib/spidr/agent.rb:605:inblock in get_page' 11: from /usr/local/lib/ruby/gems/2.5.0/gems/spidr-0.6.0/lib/spidr/agent.rb:679:in
block in visit_page' 10: from /usr/local/lib/ruby/gems/2.5.0/gems/spidr-0.6.0/lib/spidr/page/html.rb:238:ineach_url' 9: from /usr/local/lib/ruby/gems/2.5.0/gems/spidr-0.6.0/lib/spidr/page/html.rb:188:in
each_link' 8: from /usr/local/lib/ruby/gems/2.5.0/gems/nokogiri-1.8.2/lib/nokogiri/xml/node_set.rb:189:ineach' 7: from /usr/local/lib/ruby/gems/2.5.0/gems/nokogiri-1.8.2/lib/nokogiri/xml/node_set.rb:189:in
upto' 6: from /usr/local/lib/ruby/gems/2.5.0/gems/nokogiri-1.8.2/lib/nokogiri/xml/node_set.rb:190:inblock in each' 5: from /usr/local/lib/ruby/gems/2.5.0/gems/spidr-0.6.0/lib/spidr/page/html.rb:189:in
block in each_link' 4: from /usr/local/lib/ruby/gems/2.5.0/gems/spidr-0.6.0/lib/spidr/page/html.rb:182:inblock in each_link' 3: from /usr/local/lib/ruby/gems/2.5.0/gems/spidr-0.6.0/lib/spidr/page/html.rb:239:in
block in each_url' 2: from /usr/local/lib/ruby/gems/2.5.0/gems/spidr-0.6.0/lib/spidr/page/html.rb:283:into_absolute' 1: from /usr/local/Cellar/ruby/2.5.0_2/lib/ruby/2.5.0/uri/generic.rb:822:in
path=' /usr/local/Cellar/ruby/2.5.0_2/lib/ruby/2.5.0/uri/generic.rb:766:incheck_path': path conflicts with opaque (URI::InvalidURIError)