Change to 250hz and voluntary preemption

robingroppe commented 8 years ago

Please change the kernel to 250hz and voluntary preemption.

In everyday use, my Pi with the modified kernel can take on more work while staying responsive to all other running tasks. For example, when I run the MumbleRubyPluginbot, which needs to be fed with data every 20 ms or so, on the stock kernel and start an apt upgrade of some packages, it quickly starts to lag. With the modified kernel everything is fine. You said you would need some evidence, so I ran UnixBench on both kernels. I can't say much about starting graphical apps because my Pi is running headless, but I guess you have cross-compilers set up and can easily build a modified kernel. By the way, the two settings I am asking for are also used in the stock kernels of most distributions (Debian, Ubuntu...), and in my opinion there is a reason for that.

UnixBench, stock kernel: https://robingroppe.de/media/rpi2/orig.txt
UnixBench, modified kernel: https://robingroppe.de/media/rpi2/mod.txt
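For anyone who wants to put a number on the "starts to lag" part, here is a rough sketch of how wakeup lateness for a 20 ms periodic task could be measured under load. Only the 20 ms period comes from the description above; the function name `wakeup_lateness_ms` and its parameters are made up for illustration, and this is not how the bot itself works.

```python
import time


def wakeup_lateness_ms(period_s: float = 0.020, cycles: int = 50) -> list:
    """Sleep until each periodic deadline and record how late we woke up (ms)."""
    lateness = []
    deadline = time.monotonic() + period_s
    for _ in range(cycles):
        remaining = deadline - time.monotonic()
        if remaining > 0:
            time.sleep(remaining)
        # Difference between the actual wakeup time and the intended deadline.
        lateness.append((time.monotonic() - deadline) * 1000.0)
        # Advance from the deadline, not from "now", so errors do not accumulate.
        deadline += period_s
    return lateness


if __name__ == "__main__":
    samples = wakeup_lateness_ms()
    print(f"mean {sum(samples) / len(samples):.3f} ms, worst {max(samples):.3f} ms")
```

Running it once while idle and once during something like an apt upgrade, on each kernel, would give comparable figures for the worst-case lateness.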

robingroppe commented 8 years ago

Have you compared the raw performance of a 100hz kernel vs. a 1000hz kernel?

robingroppe commented 8 years ago

Context switches and scheduler interrupts are pretty cheap relative to how much smoother the system feels. I have stopped the bc thing ~2m950s with a 1000hz kernel.

Ferroin commented 8 years ago

That depends on what you mean by 'raw performance'. In a pure computational sense, HZ=1000 will give around 10x less computational performance than HZ=100, because you will have approximately 10 times the scheduling overhead. As for latency, I have no actual numbers, but I see a noticeable difference (probably subjective, but I did jump through hoops to do proper double-blind testing). That doesn't necessarily mean anything, though, and 1000Hz is overkill for almost anything but gaming.

robingroppe commented 8 years ago

Why do you think the performance will drop by about a factor of 10? How long does an interrupt take?

Ferroin commented 8 years ago

Apparently I'm horrible at typing numbers today...

I'm not entirely sure what I was intending to state there, but a 10x performance drop was definitely not it. The bit about the scheduling overhead still stands, though: if you're running the scheduler 10 times as often, you have 10 times the overhead.

In response to the statement before my comment: context switches are always expensive compared to almost anything else (you're saving a significant percentage of the processor state to memory, copying in some other saved state, and completely trashing the processor cache). That's part of why fork() is so slow on almost every system in existence, and why anybody doing real HPC work locks individual tasks to their own CPU core and then avoids syscalls at all costs. I don't remember the efficiency of the scheduler on Linux, but I'm pretty sure it's not O(1) and scales in some way with the number of tasks. On top of that, the more frequently interrupts are firing, the higher your power consumption. I feel that the trade-off is probably worth it for HZ=250, but it's almost certainly not worth it for HZ=1000.
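To make the "10 times as often means 10 times the overhead" point concrete, here is a back-of-the-envelope sketch. The per-tick cost is a pure placeholder, not a measurement from any Pi, and the model only counts time spent in the tick handler itself; it ignores the cache and context-switch effects described above.

```python
def tick_overhead_fraction(hz: int, tick_cost_us: float) -> float:
    """Fraction of one CPU's time spent handling periodic timer ticks."""
    return hz * tick_cost_us / 1_000_000


# ASSUMPTION: 2 microseconds per tick is a placeholder value, not measured.
ASSUMED_TICK_COST_US = 2.0

for hz in (100, 250, 1000):
    share = tick_overhead_fraction(hz, ASSUMED_TICK_COST_US)
    print(f"HZ={hz:4d}: {share:.4%} of CPU time in tick handling")
```

The overhead scales linearly with HZ, but whether it matters depends entirely on the real per-tick cost, which would have to be measured on the target board.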

robingroppe commented 8 years ago

I mentioned 1000hz to show the other extreme. Even there the loss was not even 1%. Maybe there are workloads where this really hurts. But as we are talking about 250hz, and I have tested it a bit, I have to say that I don't see a downside. I saw a loss of 0.048% when crunching numbers with bc but a massive gain on UnixBench, especially in multicore workloads. The Arch guys saw a massive gain on Linpack too.

popcornmix commented 8 years ago

We're happy enough with switching to CONFIG_PREEMPT_VOLUNTARY to match Ubuntu and OpenELEC. This is in latest rpi-update kernel.

We'll see if we get any positive or negative reports from this, and possibly increase the HZ value in a subsequent update.

mk01 commented 8 years ago

perhaps what needs to be understood:

1. voluntary <> preemptive: more preemption points doesn't mean preemption will always happen, only that it can happen.
2. periodicity of ticks: periodic ticks are (imho) many years obsolete. Sure, there can be specific requirements, but honestly they will be hard to find. In current kernels the cost of handling a tick is close to unmeasurable, so on its own it has barely any effect. Whether it is 100 or 1000 hardly matters as long as this can be handled dynamically, and it can (seeing as few as 5-10 ticks per second is not uncommon, so why still generate those 100?).
3. What I agree with is the 3-5% throughput gain in the voluntary case, but again, 95% of the time our embedded devices run with ONE active task anyhow, so there will be no difference.
4. Nobody checked other CONFIG params, especially HIGH_RES_TIMERS. As long as this is used, everything said / set / expected here is invalidated. Depending on the platform's timer implementation (and which timer is used for HR timers), it is responsible for 2000-3000 wakeups per second, so why discuss whether it is 250 or 300?

mk01

btw: the kernel is not the same as 20 years ago. That was when all those distros set their .config params, and they never looked back.

Ferroin commented 8 years ago

@mk01 The point about preemption is perfectly valid, but doesn't have much bearing on the fact that more preemption points means lower latency on stuff that's latency sensitive. As for tickless systems (which is what you appear to be referring to in the second and third points), that is all well and good, except that at least one CPU has to have something running to provide timekeeping, and as a result a generation 1 Pi can't be run tickless at all (because it has only one CPU). On top of that, the timer frequency still has an impact when the system isn't sitting idle, because when the tick is running, that's the average frequency it runs at (it's only the average because of how Linux's scheduler works, but that is beyond the scope of this discussion). The bit about hrtimers is also worth considering, but there isn't as much variance in that as you would think, and it only causes wakeups when something is directly using it.

As for the comment about distros not changing kernel configuration, that's blatantly wrong. Aside from the fact that the most widely used distros didn't exist 20 years ago, almost none of them just set config options the first time and never change them. It doesn't happen often in most distros, but it does happen. Usually it's as new features become stable (BTRFS and F2FS are both included as modules in all major distros that ship precompiled kernels; they didn't even exist as config options 5 years ago, let alone 20). Less frequently, distros change config options for performance reasons (this is why Ubuntu ships a standard kernel, a virtualization targeted kernel, a server targeted kernel, and a low-latency kernel (which has HZ=1000 and PREEMPT_FULL), and why they switched from the CFQ I/O scheduler to the deadline I/O scheduler), or for security reasons (I know of at least a few distros that recently disabled vm86 support in their default configs, most jumped on disabling 16-bit segment support, a lot of them quickly turn off any legacy syscall when an option to do so appears, etc).

P33M commented 7 years ago

We have settled on CONFIG_PREEMPT_VOLUNTARY and CONFIG_HZ=100 as the default. We have a microsecond-resolution timestamp source for precise userspace timing.
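As a rough illustration that userspace timing precision does not have to depend on the tick rate, the sketch below reads Python's monotonic clock, which is only assumed here to be backed by a clock source of that kind. The helper `elapsed_us` is a hypothetical name for this example.

```python
import time


def elapsed_us(fn) -> float:
    """Run fn() and return the elapsed wall time in microseconds."""
    start = time.monotonic_ns()
    fn()
    return (time.monotonic_ns() - start) / 1_000


# Resolution as reported by the Python runtime for its monotonic clock.
info = time.get_clock_info("monotonic")
print(f"reported resolution: {info.resolution * 1e6:.3f} us")
print(f"10 ms sleep took {elapsed_us(lambda: time.sleep(0.010)):.0f} us")
```

The reported resolution and the measured sleep duration will vary by kernel build and board; no particular values are implied.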

trabant-asb commented 8 months ago

I do realize that this discussion and associated pull requests are a few years old. I'm trying to figure out the best kernel settings for a Pi3 database server. The Pi 3 defconfig references a tick of 1000Hz and full preemption. The 2711 and 2712 defconfigs set the clock to 250Hz, also with full preemption.

May I ask why the recommended clock speed for the Pi 3 is that high? Is it perhaps related to memory access? As my database is all about I/O and interrupts, I will likely go for voluntary preemption, and I am not sure yet about the clock speed.
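For comparing what different config files actually set, a small sketch like the following can pull the tick rate and preemption model out of kernel config text. The `SAMPLE` text is illustrative only (it mirrors the VOLUNTARY / HZ=100 defaults mentioned earlier in this thread) and is not copied from any real defconfig; `summarize_config` is a hypothetical helper, and it does not account for newer options such as dynamic preemption.

```python
import re

# Checked in this order; the first one set to "y" is reported.
PREEMPT_MODELS = ("CONFIG_PREEMPT_NONE", "CONFIG_PREEMPT_VOLUNTARY", "CONFIG_PREEMPT")


def summarize_config(text: str) -> dict:
    """Extract CONFIG_HZ and the enabled preemption model from config text."""
    # Lines like "# CONFIG_FOO is not set" start with '#' and are skipped.
    enabled = dict(re.findall(r"^(CONFIG_\w+)=(.+)$", text, flags=re.MULTILINE))
    model = next((m for m in PREEMPT_MODELS if enabled.get(m) == "y"), None)
    hz = enabled.get("CONFIG_HZ")
    return {"hz": int(hz) if hz else None, "preempt": model}


# ASSUMPTION: made-up sample, not taken from a real Raspberry Pi defconfig.
SAMPLE = """\
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_HZ_100=y
CONFIG_HZ=100
"""

print(summarize_config(SAMPLE))
```

On a running system the config text may be available from /proc/config.gz if the kernel was built with that feature; whether it is present on a given image is something to check rather than assume.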

pelwell commented 8 months ago

The arm64 config file bcmrpi3_defconfig was the first 64-bit config file, originally contributed by users keen to experiment with 64-bit builds. We've kept it moderately in line with our "official" defconfigs, but never seriously vetted it. I think there's an argument for dropping it altogether, now that 64-bit has become mainstream.

raspberrypi / linux

Change to 250hz and voluntary preemption #1216