Closed GoogleCodeExporter closed 9 years ago
After quite a bit of back and forth I decided not to implement this for the
reasons below.
1: Most of the methods available for detecting memory usage (listed below) only
provide estimates and are not reliable enough to count on. We could end up
stopping the crawl prematurely based on bad estimates.
-System.GC.GetTotalMemory()
-Process.GetCurrentProcess().VirtualMemorySize64
-Environment.WorkingSet
-There are several others, but they all have the same problem
2: Just because you have 2 GB of memory allocated to a process and the sum of
all your object instances is below 2 GB doesn't mean that you won't have memory
exceptions/problems. If you are using a List<string> (which is backed by an
array) and it grows to about 600 MB, you can get an OutOfMemoryException if
there is no 600 MB block of contiguous memory (which the array requires) to
hold it. The point is that just because the memory used by the process is below
the available memory doesn't mean you're OK. If this feature were implemented,
it would give a false sense of memory safety.
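To see how the readings listed under point 1 differ in practice, here is a minimal sketch that samples all three from the current process. The class and method names (MemoryMetricsDemo, ReadMetrics) are made up for illustration and are not part of Abot; only the three .NET calls come from the list above.

```csharp
using System;
using System.Diagnostics;

public static class MemoryMetricsDemo
{
    // Returns { managed heap, virtual memory size, working set }, all in bytes.
    public static long[] ReadMetrics()
    {
        // Passing false returns the current estimate without forcing a collection first.
        long managedHeap = GC.GetTotalMemory(false);

        using (Process current = Process.GetCurrentProcess())
        {
            long virtualSize = current.VirtualMemorySize64;
            long workingSet = Environment.WorkingSet;
            return new[] { managedHeap, virtualSize, workingSet };
        }
    }

    public static void Main()
    {
        long[] metrics = ReadMetrics();
        Console.WriteLine("GC.GetTotalMemory:           {0:N0} bytes", metrics[0]);
        Console.WriteLine("Process.VirtualMemorySize64: {0:N0} bytes", metrics[1]);
        Console.WriteLine("Environment.WorkingSet:      {0:N0} bytes", metrics[2]);
    }
}
```

The three numbers measure different things (managed heap only, reserved address space, and physical pages currently mapped), so they usually differ widely, and none of them says whether the next large contiguous allocation will succeed.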
I believe the key to properly handling memory issues from within a crawler is
the "MaxPagesToCrawl" config item. If you want to crawl millions of pages in a
single continuous crawl, you need to estimate how many page crawls your server
can handle. See this forum post on how to determine the amount of hardware
you'll need for your intended purpose. Also be sure to take page file size into
account (at least 1.5 times the RAM)...
https://groups.google.com/forum/?fromgroups=#!topic/abot-web-crawler/rsICtZgzpRQ
Bottom line: dynamically handling memory consumption/limits doesn't help much
from a crawler perspective. There are too many variables to consider, and it can
actually cause you more problems than you're trying to solve. My advice is to set
the "MaxPagesToCrawl" value to what you estimate (from the post above) your
machine can handle.
Original comment by sjdir...@gmail.com
on 28 Dec 2012 at 10:57
Found a dependable way to handle this using System.Runtime.MemoryFailPoint:
http://msdn.microsoft.com/en-us/library/system.runtime.memoryfailpoint(v=vs.100).aspx
Reopening this ticket to implement it.
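For readers unfamiliar with MemoryFailPoint, the following is a rough sketch of the general pattern: ask the runtime up front whether a given amount of memory is likely to be available, and skip the work if it is not. This is not the code from the revisions that closed this issue; the class name, the helper TryRunWithMemoryGate, and the 100 MB figure are assumptions made for illustration.

```csharp
using System;
using System.Runtime;

public static class MemoryGateDemo
{
    // Runs 'work' only if the runtime expects 'sizeInMegabytes' to be available.
    // Returns true if the work ran, false if the gate rejected it.
    public static bool TryRunWithMemoryGate(int sizeInMegabytes, Action work)
    {
        try
        {
            // The constructor throws InsufficientMemoryException when the
            // requested amount is unlikely to be satisfiable.
            using (new MemoryFailPoint(sizeInMegabytes))
            {
                work();
                return true;
            }
        }
        catch (InsufficientMemoryException)
        {
            // Raised before the work starts, so no state is left half-modified.
            return false;
        }
    }

    public static void Main()
    {
        // Hypothetical requirement: assume the next batch of work needs about 100 MB.
        bool ran = TryRunWithMemoryGate(100, () => Console.WriteLine("Work ran."));
        if (!ran)
        {
            Console.WriteLine("Not enough memory; skipping the work.");
        }
    }
}
```

The advantage over polling the usage counters is that the check fails before any work begins, rather than partway through an allocation. It is still a prediction, so an OutOfMemoryException inside the gated work remains possible.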
Original comment by sjdir...@gmail.com
on 27 Feb 2013 at 7:25
Original comment by sjdir...@gmail.com
on 1 Mar 2013 at 1:06
This issue was closed by revision r293.
Original comment by sjdir...@gmail.com
on 13 Mar 2013 at 9:18
This issue was closed by revision r294.
Original comment by sjdir...@gmail.com
on 13 Mar 2013 at 9:51
Original issue reported on code.google.com by
sjdir...@gmail.com
on 25 Dec 2012 at 11:58