benempson closed this issue 6 years ago
My understanding is similar to yours. HttpServicePointConnectionLimit defines how many outbound internet connections are allowed, so it should be greater than MaxConcurrentThreads. One caveat to keep in mind is that page processing (parsing of links, running any rules/checks, analytics, etc.) also runs on the threads created for MaxConcurrentThreads, so the relationship between HttpServicePointConnectionLimit and MaxConcurrentThreads is a little blurred (i.e., the theoretical math won't always work).
Hope that helps, Steven
On Thu, Jan 25, 2018 at 1:46 AM, benArrayx notifications@github.com wrote:
Hi there, I'm just trying to figure out these 2 properties. My interpretation is that if HttpServicePointConnectionLimit is less than MaxConcurrentThreads, then HttpServicePointConnectionLimit is going to be the limiting factor in the equation.
For example, if HttpServicePointConnectionLimit = 2 and MaxConcurrentThreads = 10 then only 2 concurrent requests are ever going to be made.
Conversely, if HttpServicePointConnectionLimit = 10 and MaxConcurrentThreads = 2 then again only 2 concurrent requests are ever going to be made, albeit for a different reason.
Is this correct? Is there any guidance about which setting to choose to rate limit a crawl?
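The two examples in the question come down to taking the smaller of the two limits. Here is a toy model of that arithmetic in Python. It is not Abot code: the function name is hypothetical, and it assumes every worker thread spends all of its time waiting on an HTTP request, which the caveat about page processing says is not strictly true in practice.

```python
def max_concurrent_requests(connection_limit: int, max_threads: int) -> int:
    """Upper bound on simultaneous requests in a simplified model.

    connection_limit stands in for HttpServicePointConnectionLimit and
    max_threads for MaxConcurrentThreads. Assumption: each thread issues
    at most one request at a time and needs a free connection to do so.
    """
    if connection_limit < 1 or max_threads < 1:
        raise ValueError("both limits must be at least 1")
    # A request needs both a free thread and a free connection,
    # so the scarcer resource sets the ceiling.
    return min(connection_limit, max_threads)


if __name__ == "__main__":
    print(max_concurrent_requests(connection_limit=2, max_threads=10))
    print(max_concurrent_requests(connection_limit=10, max_threads=2))
```

Both calls give the same ceiling, for the two different reasons described in the question.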
Also, questions like these should go to the forum, since it's a better format for question/answer.
Thanks for the response, Steven. Sure, I'll go to the forum in future, sorry about that. Just to finish up here: from what you are saying, I think it's best to set both properties to the same value, i.e. if I only want a maximum of 2 connections to be made, then set both to 2. Do you agree with that?
I would say that HttpServicePointConnectionLimit should be GREATER than MaxConcurrentThreads. Otherwise, any requests beyond the actual crawl (for example the robots.txt check, or requests made when you are using AbotX ParallelCrawler) would be queued. I would rather configure it high and then let my Abot/AbotX config limit it further.
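As a rough illustration of this advice, the following Python sketch simulates worker threads that each fetch a page through a shared connection pool and then process it on the same thread. It is only a model, not how Abot is implemented: the semaphore stands in for the connection limit, the thread pool for MaxConcurrentThreads, and the function name, page count, and timings are all made up for the example.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def peak_in_flight(connection_limit, max_threads, pages=20,
                   fetch_s=0.01, process_s=0.01):
    """Return the highest number of simulated requests open at once."""
    connections = threading.Semaphore(connection_limit)  # connection pool stand-in
    lock = threading.Lock()
    in_flight = 0
    peak = 0

    def crawl_page(_page):
        nonlocal in_flight, peak
        with connections:            # wait (queue) for a free connection
            with lock:
                in_flight += 1
                peak = max(peak, in_flight)
            time.sleep(fetch_s)      # simulated HTTP request
            with lock:
                in_flight -= 1
        time.sleep(process_s)        # page processing on the same thread

    # The pool size plays the role of MaxConcurrentThreads.
    with ThreadPoolExecutor(max_workers=max_threads) as executor:
        list(executor.map(crawl_page, range(pages)))
    return peak


if __name__ == "__main__":
    print("limit=2,  threads=10:", peak_in_flight(2, 10))
    print("limit=10, threads=2: ", peak_in_flight(10, 2))
```

In this model the peak never exceeds the smaller of the two settings. With the connection limit set high, the thread count alone governs the crawl and spare connections remain for anything else that needs one. Because each thread also spends time processing, the observed peak can sit below the thread count.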