xrmx / bootchart

merge of bootchart-collector and pybootchartgui
GNU General Public License v2.0

internal buffer overflow and other problem #4

Closed solsticedhiver closed 14 years ago

solsticedhiver commented 14 years ago

hi.

I don't know what's wrong. I booted after changing the kernel line in GRUB by adding: initcall_debug printk.time=y init=/sbin/bootchartd

Then my screen rapidly filled with 'bootchart-collector - internal buffer overflow !' messages. I let it boot anyway, and that worked. But now bootchart-collector is still using 100% CPU on one core, and I can't kill it. There is no /var/log/bootchart.tgz and no image either.

And:

     $ pybootchartgui -i
     No path given, trying /var/log/bootchart.tgz
     warning: path '/var/log/bootchart.tgz' does not exist, ignoring.
     Parse error: empty state: '/var/log/bootchart.tgz' does not contain a valid bootchart

using kernel-2.6.33.4 on archlinux

mmeeks commented 14 years ago

Goodness ! that is bad. Well, first off - I made that overflow message only churn out once - that should help a little.

As to overflowing - that should only happen if/as/when you have burned through 128MB of buffer logging space. It is unusual to consume more than around 10MB during a normal boot. So ...

Either - you set your hz value too high (I assume it is the default 50 ?) - at which point bootchart-collector will consume tons of CPU and wallop your system pretty hard, -or- your boot process takes a -really- long time, something like ten minutes - and you filled the buffers anyhow.
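As a rough sanity check on those numbers, the time needed to fill the buffer can be estimated from a reference boot. This is only a back-of-the-envelope sketch: the function name `seconds_to_fill` is hypothetical, the 40-second reference boot is an assumed figure, and it assumes log volume grows linearly with hz, which the real collector may not do exactly.

```c
#include <assert.h>

/*
 * Estimate how many seconds of logging it takes to fill the buffer.
 * ref_mb / ref_boot_s is the log rate observed on a reference boot at
 * ref_hz; the rate is assumed (not verified) to scale linearly with hz.
 */
double seconds_to_fill(double buffer_mb, double ref_mb, double ref_boot_s,
                       unsigned long ref_hz, unsigned long hz)
{
    assert(ref_mb > 0.0 && ref_boot_s > 0.0 && ref_hz > 0 && hz > 0);

    /* MB logged per second at the requested sampling frequency. */
    double rate = (ref_mb / ref_boot_s) * ((double)hz / (double)ref_hz);

    return buffer_mb / rate;
}
```

With 128MB of buffer, 10MB logged over an assumed 40-second boot, and hz at 50, `seconds_to_fill(128.0, 10.0, 40.0, 50, 50)` gives 512 seconds, which is in the same range as the "ten minutes" estimate. Raising hz to 500 under the same assumptions shortens that to about 51 seconds.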

Anyhow - I've changed the error in master to:

     fprintf (stderr, "bootchart-collector - internal buffer overflow! "
              "did you set hz (%lu) too high\n", hz);

which may help with further debugging.
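For illustration, printing the overflow message only once can be done with a simple guard flag. This is a minimal sketch, not the code that is in master: `warn_overflow_once` and `overflow_warned` are hypothetical names, and the output stream is passed in only to make the helper easy to exercise.

```c
#include <stdio.h>

/* Hypothetical guard flag: set once the warning has been emitted. */
static int overflow_warned = 0;

/*
 * Print the overflow warning the first time it is called and stay
 * silent afterwards. Returns 1 if the message was printed, 0 if it
 * was suppressed.
 */
int warn_overflow_once(FILE *out, unsigned long hz)
{
    if (overflow_warned)
        return 0;

    overflow_warned = 1;
    fprintf(out, "bootchart-collector - internal buffer overflow! "
                 "did you set hz (%lu) too high\n", hz);
    return 1;
}
```

Calling `warn_overflow_once(stderr, hz)` from the sampling loop would then report the first overflow and ignore the rest, instead of filling the console.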

solsticedhiver commented 14 years ago

I was using the default config, so 50 for hz. I got the internal buffer overflow error very early in the boot process, after the 'waiting 10 seconds for udev blah blah...'. Maybe there is something in the Arch Linux initrd that triggered the bug. It uses its own busybox. The boot did not take 10 minutes. Maybe a little longer than usual, but at most 10 or 30 seconds more (38 seconds with bootchart 1).

So now with the latest git, I got these messages:

     Starting bootchart-collector logging
     bootchart-collector started with 1 args: '50'

Maybe they were there before, but I never saw them.

Then no more messages filling up the screen, and I did not see any 'internal buffer overflow' error message anywhere. Did you change anything there?

Also, upon boot completion, bootchart-collector was using 25% of CPU. And most importantly, there is still no /var/log/bootchart.tgz or PNG.

mmeeks commented 14 years ago

Okay... interesting; so - 25% of CPU is not unreasonable for the collector. Instantly running out of space to log stuff to is fairly unusual. If it is still running, I guess we didn't think we were in the main system, and didn't launch the beastie that waits and stops the collector.

First - can you verify that "sudo /sbin/bootchartd stop 2>&1 | tee /tmp/b-log" - stops the collector and dumps the data (what there is - and it is prolly corrupted by the buffer wraparound). It'd also be interesting to get the b-log output too.

Secondly. I guess we need to improve our detection of running inside an initrd ;-) most likely that is what we are getting wrong; I'll think about that.

Thanks.

mmeeks commented 14 years ago

I've just pushed 0.11.4 - which has some substantial initrd related cleanups; I hope that finally resolves all of the issues here. Can you give it a go ?

Thanks :-)

solsticedhiver commented 14 years ago

sorry for the delay. I installed bootchart2 but forgot about it :-(

So another test shows no improvement. I get a new message at boot: "bootchart-collector running outside initrd"

I don't know if it's good or bad, but that's strange because I think it is supposed to be in the initrd at that time.

Still bootchart-collector running after boot and no log or png

So with the latest git, b-log is:

     bootchart-collector started as pid 1177 with 2 args: '--dump' '/tmp/bootchart.DBlbLlIgAQ'
     Extracting profile data from pid 346
     map 0xbfa31000 -> 0xbfa46000 size: 84k from 'bfa31000' 'bfa46000'
     Couldn't find state structures on pid 346's stack
     bootchart-collector unmounted proc / clean exit
     Can't find bootchart output in /tmp/bootchart.DBlbLlIgAQ - aborting
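The "size: 84k" figure in that log is consistent with the two addresses: 0xbfa46000 - 0xbfa31000 = 0x15000 = 86016 bytes = 84 KiB. A small sketch of that arithmetic follows. The helper name `map_size_kib` is hypothetical, and it only assumes the two quoted values are bare hexadecimal start and end addresses; it is not taken from the collector's source.

```c
#include <stdlib.h>

/*
 * Size in KiB of the address range [start, end), given both bounds as
 * hexadecimal strings such as "bfa31000". Returns 0 if either string
 * is not a complete hex number or if end is not above start.
 */
unsigned long long map_size_kib(const char *start_hex, const char *end_hex)
{
    char *rest_start;
    char *rest_end;
    unsigned long long start = strtoull(start_hex, &rest_start, 16);
    unsigned long long end = strtoull(end_hex, &rest_end, 16);

    /* Reject empty or partially parsed input. */
    if (rest_start == start_hex || rest_end == end_hex)
        return 0;
    if (*rest_start != '\0' || *rest_end != '\0')
        return 0;
    if (end <= start)
        return 0;

    return (end - start) / 1024;
}
```

Here `map_size_kib("bfa31000", "bfa46000")` evaluates to 84, so the mapping itself was located with the size the log reports; the failure is in the later step, finding the state structures inside it.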

A tarball of a VDI and the config XML is available at http://dl.free.fr/hqK1u76Qi if you wish to use a VirtualBox machine with Arch Linux 2010.05 and bootchart2-git. The md5sum is 88e96286b9fb6a872404c8235054798f. The root and test users have no passwords.