Closed: molfar closed this issue 11 years ago.
connecttimeout is in seconds; please try connecttimeout_ms. http://rubydoc.info/github/typhoeus/ethon/Ethon/Easy/Options#connecttimeout%3D-instance_method
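The two option names differ only in their units; the libcurl options behind them are CURLOPT_CONNECTTIMEOUT (seconds) and CURLOPT_CONNECTTIMEOUT_MS (milliseconds). A minimal sketch of the equivalent option hashes (no request is made, this just shows the unit relationship):

```ruby
# Same 10-second connect limit, expressed in both units.
seconds_opts = { connecttimeout: 10 }         # seconds (CURLOPT_CONNECTTIMEOUT)
ms_opts      = { connecttimeout_ms: 10_000 }  # milliseconds (CURLOPT_CONNECTTIMEOUT_MS)

puts seconds_opts[:connecttimeout] * 1000 == ms_opts[:connecttimeout_ms]  # true
```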
I just tried both timeouts in ms - no effect.
require "bundler"
require "typhoeus"

# Some statistics
start_time = Time.now
@success = 0
@timeout = 0
@failed = 0
@zero_code = 0

# Set callback
Typhoeus.on_complete do |response|
  if response.success?
    puts "Success #{response.effective_url}"
    @success += 1
  elsif response.timed_out?
    puts "Timeout #{response.effective_url}"
    @timeout += 1
  elsif response.code == 0
    puts "Zero code #{response.effective_url}"
    @zero_code += 1
  else
    puts "Failed #{response.effective_url}"
    @failed += 1
  end
  puts "#{(Time.now - start_time).round(1)} sec: S #{@success} / F #{@failed} / T #{@timeout} / Z #{@zero_code}"
  # What should I do here to get this processed response object out of memory?
end

# Open the file with 10K urls and break it into 10 packs
File.open("list.txt", "r").each_line.each_slice(1000) do |pack|
  # I am not sure whether this must be here and not outside, before the File.open block
  hydra = Typhoeus::Hydra.new(max_concurrency: 50)
  # Add each url from the pack to the queue
  pack.each do |line|
    request = Typhoeus::Request.new(line, timeout_ms: 30_000, followlocation: true, connecttimeout_ms: 10_000)
    hydra.queue request
  end
  # Run the queue with only 1000 urls
  hydra.run
  # This never runs, because the script hangs on one of the queued urls;
  # both timeouts are ignored entirely.
  # https://github.com/typhoeus/ethon/issues/30 here is the issue
  puts "Pack completed!"
end
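The pack-splitting step above can be checked in isolation with plain Ruby, no Typhoeus involved. Here a hypothetical list of 10 urls is split into packs of 3, mirroring the each_slice(1000) call in the script:

```ruby
# Hypothetical url list; each_slice yields arrays of at most 3 elements.
urls  = (1..10).map { |i| "http://example.com/page#{i}" }
packs = urls.each_slice(3).to_a

puts packs.size       # 4 (three full packs plus a final pack of 1)
puts packs.last.size  # 1
```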
Oh, can I see that url?
Thanks to you, I've got it. The list contains urls with redirect cycles. I set maxredirs: 2 and everything should be OK, but... after the first loop (see my code), which processes the first pack of urls and prints "Pack completed!", all following urls (about 9K) suddenly get a zero code.
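How a limit like maxredirs: 2 stops a redirect cycle can be illustrated without any HTTP library, using a hypothetical redirect map. The names and logic below are illustrative only, not Typhoeus internals:

```ruby
# Hypothetical redirect map: a -> b -> a is a cycle, c -> d terminates.
REDIRECTS = { "a" => "b", "b" => "a", "c" => "d" }

def follow(url, maxredirs)
  maxredirs.times do
    nxt = REDIRECTS[url]
    return url if nxt.nil?  # no further redirect: final destination
    url = nxt
  end
  :too_many_redirects       # redirect limit hit, as with maxredirs: 2
end

puts follow("a", 2)  # too_many_redirects
puts follow("c", 2)  # d
```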
Actually there is no need for that packing - at least not from a Typhoeus point of view. Try queueing everything at once AND lowering max_concurrency to 25.
# What should I do here to get off this proccessed response object from memory?
Nothing. That's the work of the GC.
The problem gets back to memory usage.
Not sure what to do about it - there is no memory leak as far as I know. The GC just didn't kick in... I could show you what the GC does and you would see the memory going down. Other suggestions?
I manually start the GC after every 100 urls, but free memory still decreases.
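The every-100-urls trigger itself can be verified in plain Ruby. This is a sketch of just the counting logic, with GC.count used to confirm that collections actually ran:

```ruby
gc_before = GC.count  # number of GC runs so far in this process
total  = 0
forced = 0

1_000.times do
  total += 1
  if total % 100 == 0  # same trigger as in the on_complete callback
    GC.start
    forced += 1
  end
end

puts forced                # 10
puts GC.count > gc_before  # true
```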
require "bundler"
require "typhoeus"

# Some statistics
start_time = Time.now
@success = 0
@timeout = 0
@failed = 0
@zero_code = 0
@total = 0

# Set callback
Typhoeus.on_complete do |response|
  if response.success?
    puts "Success #{response.effective_url}"
    @success += 1
  elsif response.timed_out?
    puts "Timeout #{response.effective_url}"
    @timeout += 1
  elsif response.code == 0
    puts "Zero code #{response.effective_url}"
    @zero_code += 1
  else
    puts "Failed #{response.effective_url}"
    @failed += 1
  end
  puts "#{(Time.now - start_time).round(1)} sec: S #{@success} / F #{@failed} / T #{@timeout} / Z #{@zero_code}"
  @total += 1
  # Force a GC run after every 100 processed urls
  if @total % 100 == 0
    puts "GC running"
    GC.start
  end
  response = nil
end

hydra = Typhoeus::Hydra.new(max_concurrency: 50)

# Open the file with 10K urls and queue all of them at once
File.open("list.txt", "r").each_line do |line|
  request = Typhoeus::Request.new(line, timeout: 30, followlocation: true, connecttimeout: 10, maxredirs: 2)
  hydra.queue request
end
hydra.run
@molfar I don't have much time atm - but I already have an idea: typhoeus maintains its own easy pool. Will get back to you.
Should we close this issue due to the memory leak fix? Seems like this has likely been tracked down.
@richievos I think so. @molfar Let me know in case that didn't fix your issue.
I know this issue is closed, but I have a question. When I go through a couple of thousand urls, it works fine in the beginning, but as soon as it passes the first if case it stalls for about 10 seconds and then picks up again. It looks similar to the issue you had, where you only see response_code = 0. Here is the little code:
hydra = Typhoeus::Hydra.new(max_concurrency: 10)
newArray.each do |url|
  request = Typhoeus::Request.new(url, maxredirs: 2)
  word = ''
  request.on_complete do |resp|
    if resp.success?
      test_word.each do |target|
        if resp.body.match(target)
          word = target
          puts resp.body
          # break
        end
      end
    end
  end
  hydra.queue request
end
hydra.run
Let me know if you need some more info.
@joshidhruv thanks for reporting! Could you open a new issue? Could you also include your Typhoeus and Ethon versions, as well as your libcurl version (curl --version)? Could you please also try to spell correctly, because I had a hard time understanding your problem.
Thanks!
I set both timeout and connecttimeout: Typhoeus::Request.new line, timeout: 30, followlocation: true, connecttimeout: 10. But at this place, https://github.com/typhoeus/ethon/blob/master/lib/ethon/multi/operations.rb#L126, the script waits an infinite amount of time on a long-loading url, ignoring both timeout values.