openaustralia / morph

Take the hassle out of web scraping
https://morph.io
GNU Affero General Public License v3.0

Collect statistics (and totals) on scraper runs #18

Open mlandauer opened 10 years ago

mlandauer commented 10 years ago

Stats:

mlandauer commented 10 years ago

This blog post is the most useful thing I've found about capturing metrics with lxc containers: http://blog.docker.io/2013/10/gathering-lxc-docker-containers-metrics/

mlandauer commented 10 years ago

There's a much easier way that covers most of what we need. Since morph starts the scraping process itself, we can prefix the command with /usr/bin/time -v, which reports resource usage once the process has finished:

Command being timed: "find"
User time (seconds): 0.00
System time (seconds): 0.00
Percent of CPU this job got: 16%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.02
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 4208
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 2
Minor (reclaiming a frame) page faults: 312
Voluntary context switches: 43
Involuntary context switches: 65
Swaps: 0
File system inputs: 480
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
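
A minimal sketch of how that report could be collected and parsed. It assumes GNU time is installed at /usr/bin/time and supports `-v` and `-o`; the function names `parse_time_verbose` and `run_timed` are hypothetical and not taken from morph:

```python
import subprocess
import tempfile


def parse_time_verbose(report: str) -> dict:
    """Parse the "Key: value" lines written by `/usr/bin/time -v`."""
    stats = {}
    for line in report.splitlines():
        line = line.strip()
        # Split on the last ": " so that keys containing colons, such as
        # "Elapsed (wall clock) time (h:mm:ss or m:ss)", stay intact.
        key, sep, value = line.rpartition(": ")
        if not sep:
            continue  # not a "Key: value" line
        stats[key] = value
    return stats


def run_timed(command: list) -> dict:
    """Run `command` under GNU time and return the parsed report.

    The report is written to a temporary file with `-o` so that it does
    not get mixed up with whatever the command prints on stderr.
    """
    with tempfile.NamedTemporaryFile(mode="w+", suffix=".time") as report:
        subprocess.run(["/usr/bin/time", "-v", "-o", report.name, *command])
        return parse_time_verbose(report.read())
```

Calling `run_timed(["find"])` would return a dictionary keyed by the labels shown above, for example `"Maximum resident set size (kbytes)"`. All values are kept as strings; converting them is left to the caller.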

mlandauer commented 10 years ago

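The reported maximum resident set size may be 4x larger than it should be. As far as I can tell, the kernel already returns the value in kilobytes, but time treats it as a page count and converts it again, which with a 4096-byte page size multiplies it by 4. See https://groups.google.com/forum/#!topic/gnu.utils.help/u1MOsHL4bhg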

It looks like we're on GNU time 1.7 - we might need to update it?

mlandauer commented 10 years ago

Golly gosh. I checked the source of the time command and it appears to be unfixed in the version of Ubuntu that we're using.

mlandauer commented 10 years ago

Fixed it on the collection side
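
For illustration only, a correction on the collection side might look like the sketch below. It assumes the over-reporting comes from time multiplying a kilobyte value by (page size / 1024), and the function name `corrected_max_rss_kb` is hypothetical rather than morph's actual code:

```python
def corrected_max_rss_kb(reported_kb: int, page_size_bytes: int = 4096) -> int:
    """Undo the assumed (page size / 1024) inflation of the reported value.

    `reported_kb` is "Maximum resident set size (kbytes)" and
    `page_size_bytes` is "Page size (bytes)" from the same report.
    """
    return reported_kb * 1024 // page_size_bytes
```

With the figures from the sample report above, `corrected_max_rss_kb(4208, 4096)` gives 1052, a quarter of the reported value. Reading the page size from the report itself, instead of hard-coding a division by 4, keeps the correction valid on hosts with a different page size.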