typhoeus / ethon

Very simple libcurl wrapper.
MIT License

Timeout ignored #30

Closed molfar closed 11 years ago

molfar commented 11 years ago

I set both timeout and connecttimeout in the Typhoeus::Request.new line: timeout: 30, followlocation: true, connecttimeout: 10. But at this place https://github.com/typhoeus/ethon/blob/master/lib/ethon/multi/operations.rb#L126 with a long-loading URL the script waits an infinite amount of time, ignoring both timeout values.

hanshasselberg commented 11 years ago

Connecttimeout is in seconds, please try connecttimeout_ms. http://rubydoc.info/github/typhoeus/ethon/Ethon/Easy/Options#connecttimeout%3D-instance_method

molfar commented 11 years ago

I just tried both timeouts in ms - no effect.

require "bundler"
require "typhoeus"

# Some statistics
start_time = Time.now
@success = 0
@timeout = 0
@failed = 0
@zero_code = 0

# Set callback
Typhoeus.on_complete do |response|
  if response.success?
    puts "Success #{response.effective_url}"
    @success += 1
  elsif response.timed_out?
    puts "Timeout #{response.effective_url}"
    @timeout += 1
  elsif response.code == 0
    puts "Zero code #{response.effective_url}"
    @zero_code += 1
  else
    puts "Failed #{response.effective_url}"
    @failed += 1
  end
  puts "#{(Time.now - start_time).round(1)} sec: S #{@success} / F #{@failed} / T #{@timeout} / Z #{@zero_code}"

  # What should I do here to release this processed response object from memory?
end

# Open file with 10K urls and break it into 10 packs
File.open("list.txt", "r").lines.each_slice(1000) do |pack|

  # I am not sure whether this must be here or outside, before the File.open block
  hydra = Typhoeus::Hydra.new max_concurrency: 50

  # Add each url from pack to queue
  pack.each do |line|
    request = Typhoeus::Request.new line, timeout_ms: 30000, followlocation: true, connecttimeout_ms: 10000
    hydra.queue request
  end

  # Run queue with only 1000 urls
  hydra.run

  # This never runs, because the script hangs on the one of queued urls
  # Both timeouts are ignored at all
  # https://github.com/typhoeus/ethon/issues/30 here is the issue
  puts "Pack completed!"

end
hanshasselberg commented 11 years ago

oh. can I see that url?

molfar commented 11 years ago

Thanks to you, I've got it. The list contains URLs with redirect cycles. I set maxredirs: 2 and everything should be OK, but... after the first loop (see my code), once the first pack of URLs was processed and "Pack completed!" was printed, all following URLs (about 9K) suddenly got a zero code.

hanshasselberg commented 11 years ago

Actually there is no need for that packing - at least not from Typhoeus's point of view. Try queueing everything at once AND lowering max_concurrency to 25.

hanshasselberg commented 11 years ago
# What should I do here to release this processed response object from memory?

Nothing. That's the job of the GC.

molfar commented 11 years ago

The problem comes back to memory usage. (screenshot attached)

hanshasselberg commented 11 years ago

Not sure what to do about it - there is no memory leak as far as I know. The GC just didn't kick in... I could show you what the GC does and you would see the memory going down. Other suggestions?

molfar commented 11 years ago

I manually start the GC after every 100 URLs, but free memory still decreases.

require "bundler"
require "typhoeus"

# Some statistics
start_time = Time.now
@success = 0
@timeout = 0
@failed = 0
@zero_code = 0
@total = 0

# Set callback
Typhoeus.on_complete do |response|
  if response.success?
    puts "Success #{response.effective_url}"
    @success += 1
  elsif response.timed_out?
    puts "Timeout #{response.effective_url}"
    @timeout += 1
  elsif response.code == 0
    puts "Zero code #{response.effective_url}"
    @zero_code += 1
  else
    puts "Failed #{response.effective_url}"
    @failed += 1
  end
  puts "#{(Time.now - start_time).round(1)} sec: S #{@success} / F #{@failed} / T #{@timeout} / Z #{@zero_code}"
  @total += 1

  if @total % 100 == 0
    puts "GC running"
    GC.start
  end

  response = nil
end

hydra = Typhoeus::Hydra.new max_concurrency: 50

# Open file with 10K urls and queue them all
File.open("list.txt", "r").each_line do |line|

  request = Typhoeus::Request.new line, timeout: 30, followlocation: true, connecttimeout: 10, maxredirs: 2
  hydra.queue request

end

hydra.run
hanshasselberg commented 11 years ago

@molfar I don't have much time atm - but I already have an idea: typhoeus maintains its own easy pool. Will get back to you.

richievos commented 11 years ago

Should we close this issue due to the memory leak fix? Seems like this has likely been tracked down.

hanshasselberg commented 11 years ago

@richievos I think so. @molfar Let me know in case that didn't fix your issue.

joshidhruv commented 10 years ago

I know this issue is closed, but I have a question. When I go through a couple of thousand URLs, it works fine in the beginning, then as soon as it passes the first if case it breaks for about 10 seconds before resuming. It's similar to the issue I guess you had, where you only see response_code = 0. Here is the little code.

hydra = Typhoeus::Hydra.new(max_concurrency: 10)
newArray.each do |url|
  request = Typhoeus::Request.new(url, maxredirs: 2)
  word = ''
  request.on_complete do |resp|
    if resp.success?
      test_word.each do |target|
        if resp.body.match(target)
          word = target
          puts resp.body
          #break
        end
      end
    end
  end
end

Let me know if you need some more info.

hanshasselberg commented 10 years ago

@joshidhruv thanks for reporting! Could you open a new issue? Could you also include your Typhoeus and Ethon versions, as well as your libcurl version (curl --version)? Could you please also try to spell correctly, because I had a hard time understanding your problem. Thanks!