pcengines / coreboot

github mirror of coreboot.org's master repository
http://www.coreboot.org/
GNU General Public License v2.0

APU2 CPU turbo clock support #266

Closed soder10 closed 5 years ago

soder10 commented 5 years ago

Hello all,

since https://github.com/pcengines/coreboot/issues/196 seems to be finally solved, can you please look into why turbo clock activation for this CPU is not possible on OPNsense?

krystian-hebel commented 5 years ago

@miczyg1 will https://github.com/pcengines/coreboot/pull/257 enable it?

miczyg1 commented 5 years ago

@krystian-hebel yup. @soder10 have a look at:

https://github.com/pcengines/coreboot/blob/coreboot-4.0.x/CHANGELOG.md#v4024---2019-02-04
https://github.com/pcengines/coreboot/blob/release/CHANGELOG.md#v4902---2019-02-04

Core Performance Boost is what You are asking for, and it will be enabled starting with v4.9.0.2/v4.0.24 (should be published soon).

soder10 commented 5 years ago

@miczyg1: thanks for the info. I have subscribed to changes on this GitHub document, as it's very difficult to track when, what, why and how things change in this software. I am glad that this feature will finally be available. But due to the previous issues (reboot hang, CPU stuck in a downclocked state) I am extremely cautious about flashing this new release when it's published. Can you share any preliminary info about the impact?

https://github.com/pcengines/apu2-documentation/blob/master/docs/debug/cpu_frequency.md --> unfortunately this sounds like complete Chinese to me :( So a simpler summary would help us better understand what we can really expect from this new turbo feature.

1) Are there any conditions or limitations under which the turbo is not achievable?
2) Is the turbo performance gain measurable? Or is the gain so minimal that it cannot really be detected in benchmarks?
3) How can correct operation of the turbo speed be confirmed on a running system? For example, will I see the clock speed reported as 1.4 GHz in OPNsense via: sysctl dev.cpu.0.freq / dev.cpu.0.freq_levels ?
4) How long has it been tested to confirm it is stable and does not cause freezes, reboots, or hangs during the reboot phase?

miczyg1 commented 5 years ago

@soder10

  1. The turbo is enabled in automatic mode, so it is always achievable and handled by hardware.
  2. Yes, it is measurable. One can observe a memory bandwidth increase in memtest compared to older releases. Benchmarks work too. I have prepared a blog post that proves the correct operation of boost. The post will be published on the 3mdeb blog along with the v4.9.0.2 release, so stay tuned.
  3. Unfortunately not. Boosted processor states are hidden from the OS. Only benchmarks show the increase in overall performance. The blog post will present the exact results. And the boost is applied only to one core; that is how the feature works. The boosted frequency is 1200MHz. I know that some sites report 1.4GHz for the GX-412TC SoC, but this is incorrect.
  4. It is stable. We have sophisticated and detailed regression tests that are performed by an automated validation framework with each release. The tests give reliable information about the platform state. I have also tested the feature manually in many ways (stressing etc.). Additionally, the reboot hang has already been fixed in v4.9.0.1.

PS: Regarding following the updates, we are launching a newsletter for PC Engines firmware. Any news and introduced fixes/improvements will be described in emails. Subscription to the newsletter will be available next week, probably with the v4.9.0.2 release, on the https://pcengines.github.io site.

soder10 commented 5 years ago

Thanks for the detailed explanation. Eagerly waiting for the new release, to test it myself.

Regarding 3): all the official sources state the same thing:

https://www.amd.com/Documents/AMDGSeriesSOCProductBrief.pdf
https://www.amd.com/en/products/specifications/embedded/

Model: GX-412TC
Product Type: CPU
Family: AMD Embedded G-Series Processors
Line: 2nd Generation G-Series SOC
OPN: GE412TIYJ44JB
TDP: 6W
CPU Type: Jaguar+
CPU Base Freq.: 1GHz
CPU Max Freq.: 1.4GHz !!!
Number of CPU Cores: 4
Security Processor: No
Total L2 Cache: 2MB
System Memory Type: DDR3
DDR3 Rate (Max): 1333 MHz
ECC: Yes

So is AMD lying openly to the public?

miczyg1 commented 5 years ago

@soder10 to be clear, I said the boosted frequency is 1.2GHz; this is what the benchmarks report (I calculated the frequency from the overall performance improvement percentage). The specification says the max frequency is 1.4GHz (in the sense of a limit), not the boosted frequency. Most likely it is possible to tune/overclock it to 1.4GHz. The specification is not precise enough and many people misinterpret the values.

soder10 commented 5 years ago

I have never heard of either Intel or AMD specifying the maximum overclockable frequency of any of their CPUs in a product sheet. So I consider what's in the product sheet to be what the CPU must be able to achieve without any type of overclock. I mean it's a very tragic fact that, after so many years of this product being on the market, not a single person has stood in front of the community to reveal whether AMD is threatening their embedded partners not to reveal the true specifications of this damn chip.

soder10 commented 5 years ago

If I go to the nearest PC shop and buy a damn Intel Core i9-9900K, I can be sure its base clock speed will be 3.6 GHz, and turbo boost will be 5 GHz. And that's guaranteed. If I want, I can try to overclock it. Then it depends on my cooling system and Vcore value how far I can get.

What's so difficult about stating the same type of info clearly for the AMD GX-412TC?

miczyg1 commented 5 years ago

The thing is, Intel provides the info in the format: Max turbo frequency: x.x GHz. It clearly states that it is the turbo frequency. I wish AMD were as precise in their specifications. That is just how I understood the value. In my experience, I have often encountered situations where a datasheet did not match real hardware capabilities, thus I am sceptical about datasheets/specifications. My intention was not to offend You in any way, just to present my findings and own opinions.

If there is something to improve in this field, we will simply do it.

krystian-hebel commented 5 years ago

@soder10 please, watch Your language. We're doing our best to keep the community happy, but blame wars such as this make us wonder if it's even worth it.

Hardware controls the boost state internally because of its requirements. CPUs were tested at their nominal frequency, and leaving the OS with an option to go above it would generally be a bad idea (leaving overclocking aside; that is done without OS involvement and voids warranty). As the OS cannot control this, there really is no reason to report it. It most likely would mess up timeout loops: the OS thinks the CPU works at 1.4 GHz, enters a loop with a pause instruction, the CPU drops back to 1 GHz without the OS knowing about it, and the numbers just do not add up.
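The timing concern can be illustrated with a small calculation. This is only a sketch, not how any particular OS calibrates its delay loops: the function names and the one-cycle-per-iteration model are assumptions made for illustration.

```python
def loop_iterations(delay_us: float, assumed_mhz: float, cycles_per_iter: float = 1.0) -> int:
    """Iterations a busy-wait loop spins for delay_us, if the CPU really ran at assumed_mhz.

    1 MHz equals 1 cycle per microsecond, so delay_us * assumed_mhz is a cycle count.
    """
    return round(delay_us * assumed_mhz / cycles_per_iter)


def actual_delay_us(iterations: int, real_mhz: float, cycles_per_iter: float = 1.0) -> float:
    """Wall-clock time the same loop takes when the CPU actually runs at real_mhz."""
    return iterations * cycles_per_iter / real_mhz


# Hypothetical case: the OS believes the core runs at 1400 MHz and wants a 100 us delay,
# but the core has silently dropped back to 1000 MHz.
iters = loop_iterations(100, 1400)
real = actual_delay_us(iters, 1000)
print(iters, real)  # 140000 140.0
```

In this model the delay comes out 40% longer than requested. In the opposite direction (OS assumes 1 GHz, core boosts to 1.4 GHz) the delay would be too short, which is the more dangerous case for timeouts.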

I believe what @miczyg1 saw in benchmark was a net gain compared to non-boosted state. In the document You mentioned, based on the BKDG I wrote:

> CPU can temporarily go above the TDP for one core, given that enough of other cores are halted or waiting on IO operation.

Benchmark was run under complex, multiprocessor OS. In such an environment there is always something to do by CPU (even when it's only clock interrupt handler), so other cores are awoken every now and then. This forces the measured core to drop to "normal" P-state (1 GHz). When other cores are finished the first one can get back to boosted state (usually 1.4 GHz, sometimes 1.2 GHz, depending on multiple hardware and environment factors). Final result was an average of all states during test period. Perhaps with non-MP OS or different kernel configuration we would observe higher performance reported by (somewhat synthetic) benchmark, but we think that such results wouldn't be very trustworthy.

Tests done with memtest showed ~40% improvement in memory/cache speeds, but memtest doesn't use SMP unless explicitly requested. We could use these values to measure the gained performance, but this would be benchmarketing (aka lying). Full results will be available in a blog post soon, stay tuned.

To sum up, in real life only one core can get into boosted state, and only for some time. It wouldn't be called boosted state if it was able to work at 1.4 GHz 100% of time. Boosted states are neither reported nor controlled by the OS.
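The "average of all states" argument can be written down as a time-weighted mean. The sketch below assumes that benchmark throughput scales linearly with clock frequency and that the core only alternates between two P-states; the residency fractions are hypothetical, not measured values.

```python
def average_mhz(residency: dict) -> float:
    """Time-weighted mean frequency; residency maps frequency in MHz to share of time."""
    total = sum(residency.values())
    return sum(mhz * share for mhz, share in residency.items()) / total


def boosted_fraction(avg_mhz: float, base_mhz: float = 1000.0, boost_mhz: float = 1400.0) -> float:
    """Share of time spent boosted that would explain a given average (two-state model)."""
    return (avg_mhz - base_mhz) / (boost_mhz - base_mhz)


# Hypothetical residency: half the time at 1 GHz, half the time at 1.4 GHz.
print(average_mhz({1000: 0.5, 1400: 0.5}))  # 1200.0
# Conversely, an observed effective 1.2 GHz fits 50% boost residency in this model.
print(boosted_fraction(1200))  # 0.5
```

Under these assumptions, an effective 1.2 GHz seen in a benchmark is compatible with a core that reaches 1.4 GHz but is pushed back to 1 GHz about half of the time.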

soder10 commented 5 years ago

@krystian-hebel : thanks for the expansion of what 3mdeb is doing in the background, as transparency is what we need here. And I am not pissed off because of what you (miczyg1 or 3mdeb) guys are doing, I am pissed off because of AMD AND PCENGINES.

AMD says the GX-412TC is 1 GHz base + 1.4 GHz boost. And that's all. I, as a consumer, have no access to the secret terms-and-conditions document where they clarify whether 1.4 GHz is in fact just 1.2 GHz, or any similar sneaky dirty little cheats. Then PCENGINES says the clock speed is 1 GHz base and nothing else. Here I feel PCENGINES has cheated me. If the vendor AMD says it's capable of 1.4 GHz turbo, there has to be a strong clarification of why the OEM cannot provide that specification on their PCB.

Then you (3mdeb) say 1 GHz base, and 1.2-1.4 GHz boost depending on actual workload. Which is sort of the acceptable way to approach this matter.

You have to understand that I, as the customer, am in a very restricted situation: if the performance of the board is insufficient, I cannot throw out the old CPU and insert a more powerful one that has a higher clock speed and performs better. So when I checked the CPU specs on AMD's official site, they said 1 GHz + turbo capability up to 1.4 GHz. So if the base 1 GHz is not enough for my workload, there is still a chance the extra turbo gain can handle the occasional spikes. So I was assured that there is some reserve performance in my CPU if the base clock speed is not enough during short load spikes. Then I checked pcengines.ch -> they said the CPU speed is 1 GHz, so far so good. So I assumed(!!!) they were just too lazy to say a single word like "yes, by the way, we support the 1.4 GHz turbo on our PCB", as AMD had already confirmed this CPU is capable of activating turbo up to 1.4 GHz. I didn't feel PCENGINES was trying to cheat me by not supporting the turbo properly due to some unrevealed secret issues. Then I had to learn the hard way that it's indeed true: as of 2019 February 10 the latest available firmware still does not provide me the same characteristics that were advertised by the original vendor of the main CPU. I know you guys (3mdeb) are working on this to make it happen, but I still feel I was cheated by PCENGINES, in that their initial product, with the initial software running on it, was inferior to what I expected based on AMD specs.

miczyg1 commented 5 years ago

@soder10 I understand Your reasons and Your grudge against the hardware/silicon vendor as a customer. However, note that this is the coreboot repository, an open source firmware project. We discuss the firmware here, and only the firmware. Community feedback is very welcome and is often an inspiration for developers to improve the quality of the firmware. But this is still not the right place to blame other parties. Thank You in advance for Your consideration.

soder10 commented 5 years ago

Ok gents, just do your best!

silentcreek commented 5 years ago

@soder10 My experience is quite the opposite. In the embedded sector, it's not uncommon that hardware vendors deliberately choose to limit clock speed for the sake of power consumption or thermal management, for example. When I bought the APU2, I knew I'd get a device with a maximum clock speed of 1GHz and no ECC support. So, seeing that some smart developers add features like ECC support or turbo boost via firmware updates long after I bought that device, is simply awesome. I can only thank and applaud the 3mdeb engineers for that. Nevertheless, I'm not saying "you should have known." I understand how this can be confusing or misleading.

@miczyg1 Did you test how turbo frequencies affect power consumption and system temperatures?

soder10 commented 5 years ago

@silentcreek sorry, but I disagree with what you said.

1) ECC support was a clearly acknowledged feature since the product hit the market: https://pcengines.ch/apu2.htm --> DRAM 2 or 4 GB DDR3-1333 DRAM, with optional ECC support

It's NOT marked as optional because it's subject to whether PCENGINES is in the mood to support it or not. It's optional because not all APU2 boards support it, only the 4 GB SKU version. But that's a good point: there is no clear definition of why it's "optional".

2) PCENGINES advertises the APU2 as a board based on the AMD GX-412TC CPU. If so, I expect the board to provide the same specs as AMD published for this CPU. Otherwise PCENGINES should not be allowed to advertise their PCB with that specific CPU string in their product specification. Calling it a "custom modified AMD GX-412 CPU" would be a good workaround, if it is then technically specified that the custom modification means removal of the entire turbo clock feature. So again, specification vs. end-user expectation. It doesn't work without clear communication.

By the way, as 3mdeb already confirmed, CPU turbo activation/deactivation is handled entirely inside the CPU; nobody outside has any influence on it. So the AMD CPU's built-in control logic has to take care of overheat protection and TDP bucket enforcement. So by definition the turbo activation cannot have a negative effect on either parameter. This cannot be an excuse if the turbo clock feature is omitted from the product.

Again, I think this is a communication issue between the vendor and the customer. 3mdeb is trying to solve these missing deliverables.

pietrushnic commented 5 years ago

> It's NOT marked as optional because it's subject to whether PCENGINES is in the mood to support it or not. It's optional because not all APU2 boards support it, only the 4 GB SKU version. But that's a good point: there is no clear definition of why it's "optional".

@soder10 I wonder where those assumptions came from. The key point for ECC is that, because of a bug in AGESA (a binary firmware component fully controlled by the silicon vendor), there was no way to confirm that ECC works, so PC Engines could not claim it is enabled because it was not possible to prove that. It is optional because the memory controller supports it according to the SoC datasheet, but to truly enable it you have to fight with AMD yourself and win to make this option work. Please note that the firmware is open source, so you can fix it if you have enough resources. Luckily, PC Engines sponsored 3mdeb to go down the long and hard road of experimentation to figure out which bits to switch in order to prove to users that ECC really works. I believe @miczyg1 @krystian-hebel will provide a detailed explanation of that.

@soder10 any claims that you make about the bad intentions of someone you don't even know are IMO inappropriate. Please stick to facts. Interpreting someone's behavior doesn't lead to constructive discussion.

@soder10 overall, can we do something better to decrease your level of frustration with PC Engines? Can you help us in any way by testing, contributing documentation or developing code?

miczyg1 commented 5 years ago

@silentcreek not yet. I just focused on proving that the performance rises. Of course, it should be noted that turbo mode cannot be maintained forever. IMO the processor is in the turbo state only temporarily, for a defined maximum time, and then has to switch back to normal mode for some time.

@soder10 Regarding 1., @pietrushnic explained the reason clearly. Since AGESA is closed source, working around the problem in the binary firmware was much more difficult and time-consuming than a typical bug. Additionally, the documentation from AMD is very poor in certain areas; we often have to make assumptions about what to set and where.

Yes, ECC is supported on 4GB variants only, by board design. Questions like "why?" should not be addressed here.

  2. The processor manufacturing process is complex and there may be many things we do not know about. Even if the processor is a GX-412TC, there may be many variants of the processor in the world, i.e. some features may be fused off, and this does not apply only to this particular processor. I wouldn't be surprised if the silicon vendor did so. For example, they may sell unfused processors, at a slightly higher price, to customers that need more features in their products. It is hard to tell what really is "custom" and what is not, since the silicon vendor may do many things under the table that we do not have a clue about.

About the overheat protection:

silentcreek commented 5 years ago

> @silentcreek sorry, but I disagree with what you said.
>
> 1. ECC support was a clearly acknowledged feature since the product hit the market: https://pcengines.ch/apu2.htm --> DRAM 2 or 4 GB DDR3-1333 DRAM, with optional ECC support
>
> It's NOT marked as optional because it's subject to whether PCENGINES is in the mood to support it or not. It's optional because not all APU2 boards support it, only the 4 GB SKU version. But that's a good point: there is no clear definition of why it's "optional".

Actually, when I bought the board, the PC Engines website had a note that ECC is not supported by the firmware but may be added at a later point. Hence, I knew ECC would not work which was fine for me. I don't find it on their website anymore, but I'm quite sure there used to be such a note somewhere.

Again, I'm not saying your expectations were all wrong. I acknowledge the situation is confusing. But I wouldn't go as far as to insinuate that PC Engines knowingly advertised features that the product didn't meet.

madman2003 commented 5 years ago

When I bought the APU2C4 it also said ECC wasn't working yet. And the CPU clock on the product page says 1 GHz (to this day). I considered the restriction to 1 GHz to be a design or thermal choice of some kind, not strange considering it's a passively cooled embedded system. I did basic due diligence when I bought the APU2C4, and quickly realized what I was getting.

The current situation with transparent firmware development is a big improvement over the early days. At the end of the day I much prefer this over the shiny brand which gives you "you get X bios versions for a year or two and then it stops" without communicating that clearly.

Not to mention the security benefits of avoiding UEFI (or any firmware that approaches OS complexity) on an internet-facing firewall.

I of course look forward to seeing the occasional benefit of the boosted clock once the new release is actually downloadable. But in no way did I ever count on this feature being added, like ever.

Personally, once you realize 3mdeb's position relative to pcengines, I would give 3mdeb some gratitude for what they're doing and how they're doing it. Should anyone ever challenge pcengines on their communication to customers, I'd of course be curious about the outcome.

miczyg1 commented 5 years ago

New release is already here. Check out https://pcengines.github.io to download the newest firmware. Feel free to test CPU boost, feedback is welcome. Also consider subscribing to newsletter (available on the linked site) to be informed about all news and improvements in PC Engines firmware.

madman2003 commented 5 years ago

After upgrading to 4.9.0.2 I suffered two hardlocks in an evening (which I never had before). Network died, serial console not working. I've gone back to 4.9.0.1, if that's stable for a week or so, I will need some guidance on what could be causing the issue.

pietrushnic commented 5 years ago

@madman2003 that's definitely concerning. It would be great if we could reproduce that on our side. Is there any way we can reproduce your hardware configuration and workload?

madman2003 commented 5 years ago

> @madman2003 that's definitely concerning. It would be great if we could reproduce that on our side. Is there any way we can reproduce your hardware configuration and workload?

It's an APU2C4 using an mSATA (Crucial MX200) drive, no extra hardware installed.

It's an (Arch) Linux based internet gateway and firewall; it also firewalls traffic inside the network. I was listening to music, so every few seconds there will be a spike of traffic over the firewall. There is always ~10 Mbit/s of traffic with the internet as a baseline, which can go to several thousand places in the world (i.e. you need a server or p2p workload to replicate this).

Traffic is both IPv4/IPv6 and TCP/UDP, although I don't know whether that makes a difference.

krystian-hebel commented 5 years ago

@madman2003 is it the same platform that was suffering from reboot/CPU frequency problems before? If these problems are connected it would explain why we didn't catch this in our tests.

madman2003 commented 5 years ago

@krystian-hebel yes, it consistently didn't reboot and had frequency scaling issues; whether they are connected I obviously don't know.

miczyg1 commented 5 years ago

@madman2003 have You possibly tried the legacy v4.0.24 to see if the problem persists? The legacy version also has the CPU boost enabled.

miczyg1 commented 5 years ago

Here is the blog post about CPB: https://3mdeb.com/firmware/amd-cpu-boost/ We are describing here how it was enabled and tested. Feedback is welcome.

@madman2003 we will try to simulate production loads by stressing the processor for a few hours/a night/a day etc. Maybe we will run into the same issue.

pek-si commented 5 years ago

First of all, great job on implementing turbo clock support! I, for one, am more than happy to see such active and frequent open source support.

> Here is the blog post about CPB: https://3mdeb.com/firmware/amd-cpu-boost/ We are describing here how it was enabled and tested. Feedback is welcome.

Back to the main matter: compared to your results with stress-ng, the latest pfSense/FreeBSD (2.4.4) showed a 28 % improvement on 1x CPU core (for example, a 30s/1x CPU test run resulted in 523 bogo ops before and 671 after). Table 1 includes the numbers from my stress-ng tests. Overall, the system does feel slightly faster and more stable with WiFi connections (with Compex WLE200NX) than before.

Table 1. Results of various runs of stress-ng --cpu X --cpu-method matrixprod --timeout N --metrics where X is the number of CPUs and N is the test length in seconds. Columns after the stressor name: bogo ops, real time (s), user time (s), system time (s), bogo ops/s (real time), bogo ops/s (user+system time).

30 seconds/cpu count 4x-1x
4.9.0.1 (Baseline)
stress-ng: info:  [42799] cpu                1823     30.02    119.38      0.02        60.72        15.27
stress-ng: info:  [48934] cpu                 523     30.01     29.98      0.01        17.43        17.44
4.9.0.2 (CPB enabled)
stress-ng: info:  [41043] cpu                1845     30.02    119.44      0.01        61.46        15.45
stress-ng: info:  [39744] cpu                 671     30.02     29.99      0.00        22.36        22.37

5 seconds/cpu count 4x-2x-1x
4.9.0.1 (Baseline)
stress-ng: info:  [52546] cpu                 306      5.03     19.98      0.04        60.83        15.29
stress-ng: info:  [55842] cpu                 164      5.01     10.00      0.01        32.74        16.39
stress-ng: info:  [51970] cpu                  88      5.02      5.01      0.01        17.54        17.55
4.9.0.2 (CPB enabled)
stress-ng: info:  [61189] cpu                 308      5.03     19.98      0.02        61.24        15.41
stress-ng: info:  [32724] cpu                 178      5.00      9.99      0.00        35.57        17.81
stress-ng: info:  [34929] cpu                 113      5.02      5.02      0.00        22.49        22.53

1 second/cpu count 4x-2x-1x
4.9.0.1 (Baseline)
stress-ng: info:  [58044] cpu                  62      1.03      4.05      0.02        60.40        15.20
stress-ng: info:  [58938] cpu                  34      1.03      2.05      0.01        32.91        16.48
stress-ng: info:  [59324] cpu                  18      1.03      1.02      0.01        17.52        17.59
4.9.0.2 (CPB enabled)
stress-ng: info:  [27909] cpu                  64      1.03      4.07      0.03        62.04        15.60
stress-ng: info:  [21169] cpu                  36      1.05      2.08      0.02        34.34        17.19
stress-ng: info:  [20990] cpu                  23      1.03      1.02      0.00        22.44        22.47
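
The improvement quoted above can be recomputed directly from the metric lines in Table 1. The following is a minimal sketch; the regular expression and the helper names `parse_metrics` and `gain_percent` are illustrative assumptions and only cover the line layout shown in the table.

```python
import re

# Matches one stress-ng --metrics line as it appears in Table 1:
# stressor, bogo ops, real s, user s, system s, bogo ops/s (real), bogo ops/s (user+system)
METRIC_RE = re.compile(
    r"stress-ng: info:\s+\[\d+\]\s+(\S+)\s+(\d+)"
    r"\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)"
)


def parse_metrics(line: str) -> dict:
    """Split a metrics line into named fields; raises ValueError on other input."""
    m = METRIC_RE.search(line)
    if m is None:
        raise ValueError(f"not a stress-ng metrics line: {line!r}")
    return {
        "stressor": m.group(1),
        "bogo_ops": int(m.group(2)),
        "real_s": float(m.group(3)),
        "user_s": float(m.group(4)),
        "system_s": float(m.group(5)),
        "ops_per_s_real": float(m.group(6)),
        "ops_per_s_cpu": float(m.group(7)),
    }


def gain_percent(baseline_line: str, boosted_line: str) -> float:
    """Relative change in bogo ops between two runs of equal length, in percent."""
    base = parse_metrics(baseline_line)["bogo_ops"]
    new = parse_metrics(boosted_line)["bogo_ops"]
    return (new - base) / base * 100.0


# 30 second, 1x CPU rows from Table 1 (4.9.0.1 vs 4.9.0.2)
baseline = "stress-ng: info:  [48934] cpu                 523     30.01     29.98      0.01        17.43        17.44"
boosted = "stress-ng: info:  [39744] cpu                 671     30.02     29.99      0.00        22.36        22.37"
print(f"{gain_percent(baseline, boosted):.1f}%")  # 28.3%
```

Comparing raw bogo ops is only meaningful when both runs have the same duration, as they do here; for runs of different length the bogo ops/s columns would be the fairer basis.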

There are a few interesting things I discovered so far:

  1. There seems to be some kind of throttling, as the hotter the CPU became, the lower the results were. For example, when the temperature was 60 Celsius at the beginning of a 30 second test run and 57 C at the end, it scored 645 bogo ops, which is still 23 % more than the baseline result. I would guess this is totally expected behaviour, but it needs to be taken into account when comparing results.

    It might be interesting to find out the performance-critical temperatures, e.g., are the results higher on an actively cooled unit, or a lot worse at higher temperatures? For the record, my unit usually hovers around 52--56 Celsius (@20--21 C ambient) when idle and in normal use, and my results were stable within this temperature range. I was able to push the temperature up to 62 C with both iperf3 and stress-ng running simultaneously, but it cooled down below 60 C quite fast.

  2. Some minor improvements are seen on test runs with 2x CPU cores as well as 4x CPUs. Perhaps one of the cores is slightly faster at first, but overall it evens out if given enough time. Also, setting the affinity manually to one core did not improve the 1x CPU result.
  3. Before, it was quite a struggle to reach full 1 gigabit throughput on pfSense/FreeBSD, but now I have been able to get up to ~940 Mbits throughput more easily, even with one stream (i.e., iperf3 -P1). But...

    ... the throughput seems to fluctuate for no apparent reason. For example, two consecutive iperf runs could produce 941 Mbits/sec at first, and only 485 Mbits/sec on a second run immediately after (and vice versa). This is not a new problem, and it has probably been there all the time (my first documented iperf record of the phenomenon is from May 2018). When the reduced throughput occurs, the CPU seems to have a higher utilization rate and spends more time on interrupts than normal. See below for a comparison of "good" and "bad" cases. The output of top -P is an approximate representation of CPU usage during the iperf3 throughput test. So far I have tried to rule out simple causes such as different computers, laptops, cables, ethernet ports and various network configurations, but the problem persists.

    With the recent development, I think that the hardware is not the issue here; more likely it has something to do with how pfSense/FreeBSD handles the packets, or possibly some kind of driver issue. Any ideas on how to proceed from here? Can someone with pfSense check whether their setup behaves similarly?

"Good"
$ iperf3 -c 192.168.0.2 -t 20 -P1
Connecting to host 192.168.0.2, port 5201
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-20.00  sec  2.19 GBytes   941 Mbits/sec    0             sender
[  5]   0.00-20.00  sec  2.19 GBytes   941 Mbits/sec                  receiver

$ top -P
61 processes:  2 running, 59 sleeping
CPU 0:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 1:  0.0% user,  0.0% nice, 11.8% system,  0.0% interrupt, 88.2% idle
CPU 2:  0.0% user,  0.0% nice,  0.0% system, 44.1% interrupt, 55.9% idle
CPU 3:  5.9% user,  0.0% nice, 32.4% system,  0.0% interrupt, 61.8% idle
Mem: 74M Active, 88M Inact, 335M Wired, 29M Buf, 3412M Free

"Bad"
$ iperf3 -c 192.168.0.2 -t 20 -P1
Connecting to host 192.168.0.2, port 5201
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-20.00  sec   637 MBytes   267 Mbits/sec    0             sender
[  5]   0.00-20.00  sec   637 MBytes   267 Mbits/sec                  receiver

$ top -P
61 processes:  2 running, 59 sleeping
CPU 0:  0.8% user,  0.0% nice,  1.6% system,  5.5% interrupt, 92.2% idle
CPU 1:  8.6% user,  0.0% nice, 19.5% system,  0.0% interrupt, 71.9% idle
CPU 2:  0.0% user,  0.0% nice,  0.0% system,  100% interrupt,  0.0% idle
CPU 3:  5.5% user,  0.0% nice, 32.0% system,  0.0% interrupt, 62.5% idle
Mem: 74M Active, 88M Inact, 335M Wired, 29M Buf, 3412M Free
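
The difference between the two cases is easiest to see in the interrupt column. As a rough sketch, the per-CPU lines of top -P can be scanned for a core that is saturated by interrupt handling; the helper name `interrupt_bound_cpus` and the 90% threshold are arbitrary assumptions for illustration.

```python
import re

# Matches the per-CPU lines printed by FreeBSD's `top -P` as shown above.
CPU_RE = re.compile(
    r"CPU\s+(\d+):\s+([\d.]+)% user,\s+([\d.]+)% nice,\s+([\d.]+)% system,"
    r"\s+([\d.]+)% interrupt,\s+([\d.]+)% idle"
)


def interrupt_bound_cpus(top_output: str, threshold: float = 90.0) -> list:
    """Return the CPU numbers whose interrupt share is at or above threshold percent."""
    hot = []
    for m in CPU_RE.finditer(top_output):
        if float(m.group(5)) >= threshold:
            hot.append(int(m.group(1)))
    return hot


# CPU 2 lines copied from the "Good" and "Bad" samples above
good = "CPU 2:  0.0% user,  0.0% nice,  0.0% system, 44.1% interrupt, 55.9% idle"
bad = "CPU 2:  0.0% user,  0.0% nice,  0.0% system,  100% interrupt,  0.0% idle"
print(interrupt_bound_cpus(good))  # []
print(interrupt_bound_cpus(bad))   # [2]
```

In the "bad" run a single core has no idle time left while the others are mostly idle, which fits a bottleneck in interrupt handling on one core rather than a lack of overall CPU capacity.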

krystian-hebel commented 5 years ago

@pek-si first thing that comes to mind when seeing so much CPU time in interrupt is page fault handling, but given that there is still a lot of free memory it is probably something else.

Another probable cause was mentioned here. It is possible that another random connection happened during the "bad" tests. Unfortunately, the OS has no way of knowing in advance whether an interrupt handler will require much CPU time, so it cannot change its affinity without the risk of assigning it to the same core as another CPU-heavy handler. It is theoretically possible to route every interrupt with the same vector to only one core through the APIC, but it would make every case "bad" (well, at least it would be stable :slightly_smiling_face: ). Other than that, the target of an interrupt is chosen by hardware, with no way for software to control it, most likely for performance reasons.

As mentioned in the article I linked, this causes problems only for a small number of concurrent connections coupled with bad luck. When the connections are distributed evenly they can still fight over CPU time, but it doesn't matter, as network speed becomes the new choke point.

Also, a big Thank You for performing independent tests :smiley:

madman2003 commented 5 years ago

@miczyg1 I ran 4.9.0.1 stable for a week; 4.0.24 hardlocked within hours, so my bet is that this is caused by the performance boost feature. How do we move forward with this? I don't want to be stuck on one version of firmware forever.

miczyg1 commented 5 years ago

@madman2003 we have received many positive replies about the CPU boost. Unfortunately, nobody else has complained about hangs. We have been running some tests on our side, however we did not encounter issues.

We will make CPU boost runtime-configurable via sortbootorder as a fallback, so anyone with problems similar to Yours can still use the platform.

Of course it still worries me that Your platform is unstable with boost. I have been running memtest86 for 120 hours and nothing happened. Maybe OS-level tests, load simulation and stressing will bring some results.

krystian-hebel commented 5 years ago

@madman2003 to add to what @miczyg1 said, I strongly believe that this is connected with the previous problems with reboot hang/CPU frequency. From what we see, only platforms that were affected by them have problems with CPB now - this is based on user reports only, as we apparently don't have any affected hardware. I'm afraid that this could be a hardware bug that is impossible to fix in firmware...

If someone has a platform that did not show any of the mentioned problems earlier but has problems with CPB (or the other way around), please let us know. Has anyone tested the stability of 4.9.0.1 under heavy load on affected platforms?

madman2003 commented 5 years ago

@krystian-hebel @miczyg1 I understand your predicament. Any idea how many users were affected by the reboot issues? And whether they suffered a reproducible failure to reboot?

Does this sort bootloader feature require me to make my own builds of the firmware?

miczyg1 commented 5 years ago

@madman2003 sortbootorder is a payload which is able to modify the boot order configuration as well as the runtime configuration. You will not need to rebuild the firmware; it just lets You enable/disable certain features freely whenever You want. All You need is access to the serial console. We will implement the CPU boost switch in the v4.9.0.3 release. See also https://github.com/pcengines/sortbootorder/blob/master/README.md

Regarding people affected by reboot hang issue, You may visit this issue https://github.com/pcengines/apu2-documentation/issues/64

miczyg1 commented 5 years ago

@pek-si regarding Your results, have You seen and tried this? https://teklager.se/en/knowledge-base/apu2-1-gigabit-throughput-pfsense/ We want to test it on our side ASAP, to see if those settings improve the throughput of APU2. Would be great if You could test it on Your side and share Your results if You haven't set those settings yet.

soder10 commented 5 years ago

@miczyg1 Teklager (and everybody else, who keeps claiming that the APU2/3/4 can achieve full gigabit) fails to mention this 1 simple clause:

"and all speed measurements are true, as long as your internet connection is NOT a PPPOE".

miczyg1 commented 5 years ago

@soder10 I understand. However @pek-si did not mention his connection is PPPOE.

pek-si commented 5 years ago

@miczyg1 actually I had read that before upgrading to v4.9.0.2, and I had their recommended settings enabled while I ran the tests. Before applying those settings my results already looked quite similar, so I am not sure whether these particular configurations brought a measurable improvement. Looking back now, it seems that a lot of the suggestions are somewhat outdated, are already implemented in the OS by default, are not applicable to the intended role (e.g., desktop, server, router, firewall, 100mb, 1gb, 10gb...), or have an impact that is not measured or measurable. I do like gaining performance improvements, but I would also like to understand why they happen.

I re-ran my tests with the same setup (with only some reboots in between) but for some reason could not reach the same throughput on one stream anymore. So I just had to find out whether there was anything more I could do. Around 2 a.m. I finally decided that I had had enough :smile:

As a small summary, here is what I was able to learn about FreeBSD:

Settings that are probably useful for APU2 with FreeBSD/pfSense:

Some notes about tunables & settings that could be interesting, but may have unwanted side effects:

Sorry about the links/references in the text, they did not appear quite as I had wanted so here are the links I have used:

  1. https://people.freebsd.org/~olivier/talks/2018_AsiaBSDCon_Tuning_FreeBSD_for_routing_and_firewalling-Paper.pdf
  2. https://calomel.org/freebsd_network_tuning.html
  3. https://wiki.freebsd.org/NetworkPerformanceTuning
  4. https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203856
  5. https://docs.netgate.com/pfsense/en/latest/hardware/tuning-and-troubleshooting-network-cards.html
  6. https://www.intel.com/content/dam/www/public/us/en/documents/brochures/ethernet-controllers-phys-brochure.pdf
  7. http://caia.swin.edu.au/reports/070717B/CAIA-TR-070717B.pdf
  8. https://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/configtuning-kernel-limits.html

Disclaimer: Please do refer to the original sources, and decide by yourself before implementing any of the tweaks listed in this post.

pietrushnic commented 5 years ago

@pek-si that is a great effort you put into verifying and analyzing those resources. We really appreciate it. It will definitely take some time to digest all of that :)

miczyg1 commented 5 years ago

@madman2003 we have tried stressing the cores with CPU boost enabled in the operating system for 40 hours straight, but we have not managed to reproduce Your hangs. Given that, we will add runtime configuration for CPU boost in v4.9.0.3, with CPU boost enabled by default. If You still experience hangs, try disabling it.
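
For anyone who wants to try an OS-level load test on their own board, here is a minimal Python sketch. It is not the test that was run above, only an illustration: the function names `burn` and `stress_all_cores` are made up for this example, and a plain busy loop may not exercise the CPU in the same way as a dedicated stress tool.

```python
import multiprocessing
import time


def burn(seconds: float) -> int:
    """Keep one core busy with integer arithmetic for `seconds`.

    Returns the number of loop iterations completed.
    """
    deadline = time.monotonic() + seconds
    iterations = 0
    x = 1
    while time.monotonic() < deadline:
        # Arbitrary arithmetic; the result is not used for anything.
        x = (x * 1103515245 + 12345) % 2**31
        iterations += 1
    return iterations


def stress_all_cores(seconds: float) -> list:
    """Run burn() in one worker process per CPU reported by the OS."""
    workers = multiprocessing.cpu_count()
    with multiprocessing.Pool(workers) as pool:
        return pool.map(burn, [seconds] * workers)
```

Calling `stress_all_cores(3600)` from under an `if __name__ == "__main__":` guard would load every core for an hour. A hang during such a run with boost enabled, and none with it disabled, would at least point in the same direction as the reports in this thread.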

madman2003 commented 5 years ago

@miczyg1 I'm running 4.9.0.3 now, and I've indeed had to turn off CPU boost after a hang. I do wonder, is there a way to enable this setting without using serial console? Like setting it in the rom prior to flashing?

krystian-hebel commented 5 years ago

@madman2003 those settings are stored in the bootorder file in CBFS, so it should be possible, although there are no tools for this. It would require manually extracting the file with cbfstool and modifying it. Another issue is that this has to be repeated whenever a new option is added, so a bootorder file can't be modified once and then simply added to every new image before flashing.

Perhaps a tool for changing these settings offline can be developed? It would still require doing it before each flashing.
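
As a rough illustration of what the modification step of such an offline tool could look like, here is a minimal Python sketch. It rests on assumptions that are not confirmed in this thread: that the extracted bootorder file is plain text with one entry per line, that each runtime option is a name followed directly by `0` or `1`, and that the boost option is called `boosten` (a hypothetical name used only for this example). Extracting the file from the image and putting it back with cbfstool is not shown.

```python
import re


def set_bootorder_flag(text: str, name: str, enabled: bool) -> str:
    """Return `text` with the option line `<name>0` or `<name>1` set as requested.

    Assumes one option per line in the form name + single digit.
    Raises ValueError unless exactly one matching line is found.
    """
    # Match the option on a line of its own; tolerate CRLF line endings.
    pattern = re.compile(rf"^{re.escape(name)}[01](?=\r?$)", re.MULTILINE)
    new_text, count = pattern.subn(lambda m: f"{name}{int(enabled)}", text)
    if count != 1:
        raise ValueError(f"expected exactly one '{name}' line, found {count}")
    return new_text
```

With the extracted file read into a string, `set_bootorder_flag(contents, "boosten", False)` would return the text with boost disabled, under the assumptions above. Raising an error when the option is missing is deliberate: it catches the case where a new firmware release changes the set of options.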

michaelsteinmann commented 5 years ago

You could read the complete ROM with flashrom and apply that to other boards.

madman2003 commented 5 years ago

@krystian-hebel @michaelsteinmann

My use case is upgrading to a new version on a single device, not deployment across many devices. I just don't like having to physically hook up the serial console when flashing can be done via ssh, that's all. But it is a good suggestion for mass deployment cases.

miczyg1 commented 5 years ago

@madman2003 ok, we understand the intent. I have opened a separate issue where we can track the offline modification tool development: https://github.com/pcengines/coreboot/issues/308

As this issue concerns only CPU boost status/enablement, and that has been resolved, I am closing it.