mandiberg / printwikipedia


Fix Memory Leak #6

Closed mandiberg closed 10 years ago

mandiberg commented 11 years ago

As the code runs, each cycle takes longer and longer: the cycle time increases linearly with the number of articles processed. This occurs no matter where in the DB you start. We assume this is a memory leak of some kind. We have dereferenced things, which resulted in a small speed increase, but the cycle time still grows linearly.
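One way to check the memory-leak assumption is to record both the elapsed time and the traced memory after every cycle. The following is a minimal sketch, assuming the script is written in Python; `profile_cycles` and the `process_volume` callable are hypothetical names for illustration, not part of this repository.

```python
import time
import tracemalloc


def profile_cycles(process_volume, n_cycles):
    """Run a hypothetical per-volume function n_cycles times.

    Returns a list of (elapsed_seconds, traced_bytes) tuples, one per cycle.
    """
    tracemalloc.start()
    samples = []
    for i in range(n_cycles):
        start = time.perf_counter()
        process_volume(i)  # hypothetical: builds volume number i
        elapsed = time.perf_counter() - start
        # Bytes currently allocated by Python code since tracing started.
        current, _peak = tracemalloc.get_traced_memory()
        samples.append((elapsed, current))
    tracemalloc.stop()
    return samples
```

If the traced bytes climb together with the cycle time, retained objects are a plausible cause. If memory stays flat while the time still grows, the slowdown more likely comes from something else, such as a data structure or query that gets more expensive on every cycle.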

As a result, we implemented a graceful restart process: the script runs for a fixed period of time (currently 27 minutes), then finishes the last volume and exits. At 30 minutes a scheduler script restarts the process. The workaround works for the most part, but it is not an elegant long-term solution.
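The restart workaround can be sketched roughly as follows. This is an illustration under assumptions, not the actual script: `run_with_time_budget` and `build_volume` are hypothetical names, and the external scheduler that restarts the process is not shown.

```python
import time


def run_with_time_budget(build_volume, budget_seconds, clock=time.monotonic):
    """Build volumes until the time budget is spent, then return.

    The deadline is only checked between volumes, so a volume that is
    already in progress is always finished before the function returns.
    """
    deadline = clock() + budget_seconds
    volumes_built = 0
    while clock() < deadline:
        build_volume(volumes_built)  # hypothetical: builds one volume
        volumes_built += 1
    return volumes_built
```

With `budget_seconds=27 * 60`, the loop stops starting new volumes after 27 minutes, which leaves a few minutes for the last volume to finish before the scheduler starts the next run at 30 minutes. A real script would also need to persist the last completed article so that the next run resumes from the right place.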

Attached below is a screenshot of the graphs of the two sets of data, which gives a quick idea of what happened in the two tests we ran. The x-axis represents the volume number, and the y-axis is the time to complete each volume. The blue points are the data from when we started at article 6,000,000 and then ran for about half a day. The yellow line is a line of best fit for the blue data points. The orange points are from a run that started at article 6,500,000 and lasted about 2 hours, with the green line as its line of best fit. In both data sets the time per volume increases as more volumes are created, and both start at around 20 seconds per volume.

[Screenshot: volume_time, a plot of time per volume against volume number]
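A line of best fit like the ones described above can be computed with ordinary least squares. The sketch below is an assumption about how such a fit could be reproduced, not the method used for the screenshot; `fit_line` is a hypothetical helper that takes volume numbers and per-volume times in seconds.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept.

    xs: volume numbers; ys: seconds taken to complete each volume.
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two points and equal-length inputs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of the x/y cross products.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

The slope is the extra time each additional volume costs, and the intercept estimates the time for the first volume of a fresh run. Comparing the slopes of the two runs would show whether the slowdown rate depends on the starting article.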