Open SIA-77 opened 8 months ago
UPD: After we added 8GB of RAM, no more crashes were observed. Short tests (up to 1M cycles) show excellent results (no overruns, stable cycle). We then launched a test with 1,1 billion cycles. Settings:
Cycle Period: 200us
Job time: 140us
Jitter Threshold: 5us
Results:
Threshold overruns - 8707 (task started later than expected start time + 5us)
Overruns: 2
Max wake up latency - 15107,568us
Max period - 15302,57us
The difference between the expected and the actual execution time was 30048,31us, which means we had 2 overruns of about 15ms each.
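To make these figures easier to check, here is a minimal Python sketch of how a cyclic test could classify each wake-up under the settings above. The function names `classify_cycle` and `missed_cycles` are hypothetical, and so is the rule that an overrun means the wake-up latency exceeds the slack left after the job (period minus job time); the actual test tool may define these differently.

```python
# Settings taken from the test run described above (all values in microseconds).
CYCLE_PERIOD_US = 200.0
JOB_TIME_US = 140.0
JITTER_THRESHOLD_US = 5.0


def classify_cycle(wakeup_latency_us):
    """Classify one cycle by its wake-up latency.

    Assumption: an overrun happens when the task wakes so late that the
    140us job can no longer finish inside the 200us period.
    """
    slack_us = CYCLE_PERIOD_US - JOB_TIME_US  # 60us of headroom per cycle
    if wakeup_latency_us > slack_us:
        return "overrun"
    if wakeup_latency_us > JITTER_THRESHOLD_US:
        return "threshold_overrun"
    return "ok"


def missed_cycles(stall_us):
    """Number of whole cycle periods covered by a single stall."""
    return int(stall_us // CYCLE_PERIOD_US)


# 3.0 and 12.0 are made-up latencies; 15107.568 is the reported maximum.
print(classify_cycle(3.0))
print(classify_cycle(12.0))
print(classify_cycle(15107.568))
print(missed_cycles(15107.568))
```

Under these assumptions, a single 15ms stall covers 75 whole cycle periods, which is why even two such events matter far more than the 8707 threshold overruns.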
We suspect this is related to 1G hugepages (remapping, moving, or something similar) followed by a VM exit that lasts a significant amount of time. The problem happens only occasionally, and we think it is connected to the memory management engine. More memory lowers the chance of hitting the issue, but it can still occur. Any ideas how to fix this? This behavior could be very dangerous for real-time systems.
The problem was eventually solved by changing the TOLUD size. Still, it is not clear how to completely avoid such issues in the future. Some clarification on that would be appreciated.
Describe the bug GPU passthrough leads to significant sporadic interrupts due to VM exit
Platform i8250u 8GB (board.txt attached)
Codebase ACRN Hypervisor v3.2 ACRN kernel v3.2
Scenario Industrial, 1 RTVM and 1 HMI VM with GPU passthrough
To Reproduce Steps to reproduce the behavior:
Expected behavior We expected almost the same performance with some degradation.
Additional context Without GPU passthrough (100us period):
Max wakeup latency (jitter) = 17,2us
Max period = 114,382us
VMEXIT_IO_INSTRUCTION 0% (according to acrntrace; CSV attached)
With GPU passthrough (100us period):
Max wakeup latency = 14493,32us
Max period = 14589,257us
VMEXIT_IO_INSTRUCTION 7.36% (according to acrntrace; CSV attached)
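To put the two runs side by side, here is a small Python sketch that parses the values exactly as written above (decimal comma, `us` suffix) and compares them. The helper names `parse_us` and `periods_spanned` are hypothetical and are not part of acrntrace or any ACRN tool.

```python
def parse_us(text):
    """Parse a value such as '14493,32us' (decimal comma) into microseconds."""
    value = text.strip()
    if value.endswith("us"):
        value = value[:-2]
    return float(value.replace(",", "."))


PERIOD_US = 100.0  # cycle period used in both runs

# Figures copied from the two runs reported above.
baseline = {
    "max_wakeup_latency": parse_us("17,2us"),
    "max_period": parse_us("114,382us"),
}
passthrough = {
    "max_wakeup_latency": parse_us("14493,32us"),
    "max_period": parse_us("14589,257us"),
}


def periods_spanned(max_period_us, period_us=PERIOD_US):
    """How many nominal periods the longest observed period covers."""
    return max_period_us / period_us


# Ratio of worst-case wake-up latency with and without GPU passthrough.
latency_ratio = passthrough["max_wakeup_latency"] / baseline["max_wakeup_latency"]

print(round(latency_ratio))
print(round(periods_spanned(baseline["max_period"]), 2))
print(round(periods_spanned(passthrough["max_period"]), 2))
```

With these numbers, the worst-case wake-up latency grows by a factor of roughly 840, and the longest observed period covers about 146 nominal periods instead of just over one.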
We tried both i915.modeset=0 and 1.
The test device works with monitors with a resolution equal to or less than 1024x768. If we connect a high-resolution monitor (1980x1280) over HDMI, the host OS and the entire system crash.
These latencies make it impossible to use GPU passthrough with RT systems. Are there any means to reduce the latencies, or at least to keep the system from crashing?
We thought of the following means (but haven't tested them yet):
log_acrn.csv log_wo_overrun.csv board.txt launch_user_vm_id1.txt launch_user_vm_id2.txt