There's nothing to work with in this report, nor is this reproducible for us. We cannot solve per-user performance issues like this, sorry.
If you happen to come up with some kind of profiling or further information please feel free to attach it to this issue thread and we will re-visit it.
I am experiencing the exact same issue, and I have a few things to add to hopefully help solve this:
@pinembour, what DE are you using? I'm using KDE Plasma, maybe that could be a factor (though I doubt it)? One more thing to note is that we are both using high-refresh-rate monitors. Mine runs at 120 Hz, yours at 144 Hz. I tried lowering my refresh rate to 60 Hz but that didn't seem to improve things.
If you want to rule out garbage collection, you can try clicking "clear all caches" at the bottom of settings. This will force the harshest garbage collection level (which will almost never run). You can also see garbage collections as they show up as dots on the frame graph, but keep in mind they may not necessarily be the cause but the result (the runtime sees that nothing is happening so it runs a GC).
The spike will show on both graphs because draw is dependent on update. If there's no update run, there is nothing to draw on the screen so it will wait on the update to arrive. If it shows as "work" on both then it's guaranteed to be a draw-side issue (update blocking draw will show as sleep time on the draw thread). This 99.9999999999% rules out garbage collection, and points to a GPU, driver, or GL code issue.
We can't really work further without you performing profiling on your system to find what is causing the gpu pipeline blocking. What I can say is that we don't experience this on Windows or macOS, so it is likely driver or hardware specific.
Thank you for that crazy fast reply!
> If you want to rule out garbage collection, you can try clicking "clear all caches" at the bottom of settings. This will force the harshest garbage collection level (which will almost never run). You can also see garbage collections as they show up as dots on the frame graph, but keep in mind they may not necessarily be the cause but the result (the runtime sees that nothing is happening so it runs a GC).
I did that, and just like you said, there are dots appearing on each of those spikes (red, yellow, and green ones if that matters).
> The spike will show on both graphs because draw is dependent on update. If there's no update run, there is nothing to draw on the screen so it will wait on the update to arrive. If it shows as "work" on both then it's guaranteed to be a draw-side issue (update blocking draw will show as sleep time on the draw thread). This 99.9999999999% rules out garbage collection, and points to a GPU, driver, or GL code issue.
>
> We can't really work further without you performing profiling on your system to find what is causing the gpu pipeline blocking. What I can say is that we don't experience this on Windows or macOS, so it is likely driver or hardware specific.
I used `trace-cmd` to do some profiling, but I don't know how to interpret the results properly. I used `gpuvis` to visually inspect the resulting traces, and they do show a pattern, but I don't know what it means.
This is one stutter, for example:
And here it is zoomed further in:
I uploaded all three traces that I created, so hopefully you or anyone else can interpret them better: Profiling Traces.zip
Feel completely free to request any more info; I really want this to get figured out, as it's actually really annoying when playing. And please forgive me if I did something wrong; it's my first time doing something like this.
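For readers wanting to reproduce this kind of capture: a trace like the ones above can be recorded and inspected roughly as follows (a sketch; the exact event selection used for the uploaded traces isn't stated, so the events below are illustrative assumptions):

```bash
# Record scheduler and DRM (GPU) tracepoints system-wide for ~10 seconds,
# then open the resulting trace.dat in gpuvis. Event names are illustrative.
sudo trace-cmd record -e sched:sched_switch -e drm -o trace.dat sleep 10
gpuvis trace.dat
```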
Are those after manually pressing the button in settings? I may have not been clear: that button will cause a stutter (it does for me as well, for around 10ms) because it's doing a lot of clean-up that usually would not occur. The trace output you provide may be useful if you can confirm (or provide a second set of trace output) during the actual stutters you are experiencing during gameplay or otherwise.
Ah, sorry. These are actual stutters recorded during normal gameplay. I didn't press that button at all during those traces. Pressing the button causes a much larger stutter, should I record a trace for that as well, so you can compare the two?
I'm glad I'm not the only one facing this issue; we might be able to figure something out here. Every point you added @Micha-ohne-el applies to me as well. I can also add that the time between each stutter seems to be dependent on the difficulty. For example on one song, in an easy 4k map the stutters may appear every 11s or so, but in a harder 7k map they would appear every 8s. (I can also see a difference, although smaller, between a 4k easy and a harder 4k.)
> @pinembour, what DE are you using? I'm using KDE Plasma, maybe that could be a factor (though I doubt it)? One more thing to note is that we are both using high-refresh-rate monitors. Mine runs at 120 Hz, yours at 144 Hz. I tried lowering my refresh rate to 60 Hz but that didn't seem to improve things.
I am using GNOME; I also tried with Sway so I could rule out Mutter, with no improvement.
> The spike will show on both graphs [...]
I'm not sure it matters, but the spike is actually visible on all 4 graphs.
> I can also add that the time between each stutter seems to be dependent on the difficulty. For example on one song, in an easy 4k map the stutters may appear every 11s or so, but in a harder 7k map they would appear every 8s. (I can also see a difference, although smaller, between a 4k easy and a harder 4k.)
I was thinking the same thing, but I thought I was imagining it (since the stutters don't always take the same amount of time, I thought it was just a coincidence that harder maps tended to have more severe stutters). But yes, I can also confirm this: the more objects there are, the more frequent the stutters are (and possibly the longer they take).
> I am using GNOME; I also tried with Sway so I could rule out Mutter, with no improvement.
Yeah, seems as though the issue is within osu!. As an aside, I also tried running the latest LTS kernel (5.4) but the issue persisted, leading me to believe that the kernel is not the issue.
> I'm not sure it matters, but the spike is actually visible on all 4 graphs.
Yes, for me the Audio and Input graphs show it in blue (meaning Sleep), which makes sense if they're also dependent on Update like Draw is.
Please do not test using mania. This is the only ruleset which has not been optimised for memory usage. Test using the main menu (since you say you can reproduce there) or in another ruleset.
Okay, I'll keep that in mind, but I can still see that the difficulty correlates with the time between stutters in osu! or taiko.
> Yeah, seems as though the issue is within osu!. As an aside, I also tried running the latest LTS kernel (5.4) but the issue persisted, leading me to believe that the kernel is not the issue.
I'm not sure the blame falls fully on osu!. I remember the game working fine not that long ago, but even a build from November has the same issue atm. Are you using Arch or a derivative? I'm using Arch, but when I tried a live USB of Fedora I couldn't reproduce the issue.
So I recorded two more stutters with very controlled conditions so that they can be reproduced: I started the game, pressed the osu! cookie once, went to the top-bar-thingy to pause the music. Then I just sat there on the main menu, without doing anything, not even moving the mouse cursor.

This resulted in this stutter: As you can see, it is around 100 ms long. I had to wait quite a bit for it to happen (let's say 30 seconds). One thing to note is that the Update graph showed a blue spike this time! This is also reflected in the trace by a continuous purple line. No clue why it did that this time, it never did this any other time.

After this, I went into the settings and clicked the Clear All Caches button to get this stutter: As you can see, it looks very different; for one it is much longer at nearly 650 ms, but the events happening are also very different from the one above, further suggesting that this has nothing to do with GC.

Both traces can be found here: More Stutters.zip
> I'm not sure the blame falls fully on osu!. I remember the game working fine not that long ago, but even a build from November has the same issue atm.
Yeah, I remember that too, and I can also confirm the exact same thing, even older builds produce this behavior for me, despite me being reasonably sure that those builds worked fine for me before.
> Are you using Arch or a derivative? I'm using Arch, but when I tried a live USB of Fedora I couldn't reproduce the issue.
Yes, I'm using Manjaro, which is Arch-based. So it seems like some update in some package that Arch rolled out around a week to two weeks ago is causing this issue. Do you have any idea as to what package this could be?
It could be, but I have no idea what the specific package would be.
I'm willing to look into this and potentially point you in the right direction if you can record a trace in the standard dotnet format using `dotnet-trace collect -p <pid>` (you will need to install the dotnet sdk and use dotnet tools to install the trace tool).
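For anyone following along, the setup described above looks roughly like this (a sketch; the global-tool install is the standard route, and the `pgrep` pattern is an assumption):

```bash
# Install the trace tool once as a global dotnet tool, then attach to the
# running game by PID. The pgrep pattern is illustrative.
dotnet tool install --global dotnet-trace
dotnet-trace collect -p "$(pgrep 'osu!')"
```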
Here is the trace from my system, with the exact same procedure @Micha-ohne-el described earlier. The stutter happened 45 seconds after launching the game and the trace. trace.zip
> I'm willing to look into this and potentially point you in the right direction if you can record a trace in the standard dotnet format using `dotnet-trace collect -p <pid>` (you will need to install the dotnet sdk and use dotnet tools to install the trace tool).
@peppy, thank you for considering this!
I recorded a trace with `dotnet-trace`.
I was just sitting in the main menu after first launching the game (and entering and exiting the settings), so it took a short while for a stutter to occur. I tried to stop the trace as soon as that happened.
I don't know how to visualize this sort of trace, but since you asked for it, I'm sure you'll know what to do with it: trace.nettrace.zip
If anything interesting happens related to this, or if I manage to fix it, I'll post my findings in here.
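As an aside on visualizing `.nettrace` files: they can be converted for a browser-based flamegraph viewer such as speedscope (a sketch using dotnet-trace's standard convert verb):

```bash
# Convert the trace to speedscope format, viewable at https://www.speedscope.app
dotnet-trace convert trace.nettrace --format speedscope
```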
Second trace (from https://github.com/ppy/osu/issues/11800#issuecomment-784430804):
This is definitely looking to be a very deep GC execution. Not something we generally see happen, so it's curious as to why it's running. If possible, could you provide a longer trace where you stay at the main menu the whole time, but encounter two of these stutters? I'd like to cross-compare them to make sure it's the same thing each time.
Also could you please specify your general memory stats (used / free / swap available?)
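(For the memory stats, something like the following would capture what's being asked for; a minimal sketch, with the `pgrep` pattern as an assumption:)

```bash
# System-wide used/free memory and swap, plus osu!'s resident set size (KB).
free -h
ps -o rss= -p "$(pgrep 'osu!')"
```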
GC trigger reason:
Event Name | Time MSec | Process Name | Rest |
---|---|---|---|
Microsoft-Windows-DotNETRuntime/GC/Start | 24,480.796 | Process(22337) (22337) | ThreadID="22,388" ProcessorNumber="0" Count="15" Reason="AllocSmall" Depth="1" Type="NonConcurrentGC" ClrInstanceID="0" ClientSequenceNumber="0" |
GC stats:
Event Name | Time MSec | Process Name | Rest |
---|---|---|---|
Microsoft-Windows-DotNETRuntime/GC/Stop | 24,539.920 | Process(22337) (22337) | ThreadID="22,388" ProcessorNumber="0" DURATION_MSEC="59.124" Count="15" Depth="1" ClrInstanceID="0" |
Microsoft-Windows-DotNETRuntime/GC/HeapStats | 24,539.921 | Process(22337) (22337) | ThreadID="22,388" ProcessorNumber="0" TotalHeapSize="247,511,584" TotalPromoted="3,958,320" Depth="1" GenerationSize0="21,699,360" TotalPromotedSize0="1,538,960" GenerationSize1="1,491,296" TotalPromotedSize1="2,419,360" GenerationSize2="150,392,352" TotalPromotedSize2="0" GenerationSize3="73,928,552" TotalPromotedSize3="0" FinalizationPromotedSize="237,335" FinalizationPromotedCount="5" PinnedObjectCount="3" SinkBlockCount="151" GCHandleCount="46,199" ClrInstanceID="0" GenerationSize4="24" TotalPromotedSize4="0" |
Weirdly it looks to start as a gen1, which shouldn't take this long in the first place, but the full log shows it going deeper? Also note that the GC isn't even under much load...
Event Name | Time MSec | Process Name | Rest |
---|---|---|---|
Microsoft-Windows-DotNETRuntime/GC/Start | 24,480.796 | Process(22337) (22337) | ThreadID="22,388" ProcessorNumber="0" Count="15" Reason="AllocSmall" Depth="1" Type="NonConcurrentGC" ClrInstanceID="0" ClientSequenceNumber="0" |
Microsoft-Windows-DotNETRuntime/GC/GenerationRange | 24,480.799 | Process(22337) (22337) | ThreadID="22,388" ProcessorNumber="0" Generation="4" RangeStart="0x7f5597fff000" RangeUsedLength="0x00000018" RangeReservedLength="0x07fff000" ClrInstanceID="0" |
Microsoft-Windows-DotNETRuntime/GC/GenerationRange | 24,480.801 | Process(22337) (22337) | ThreadID="22,388" ProcessorNumber="0" Generation="3" RangeStart="0x7f558ffff000" RangeUsedLength="0x04680f68" RangeReservedLength="0x07fff000" ClrInstanceID="0" |
Microsoft-Windows-DotNETRuntime/GC/GenerationRange | 24,480.802 | Process(22337) (22337) | ThreadID="22,388" ProcessorNumber="0" Generation="2" RangeStart="0x7f557ffff000" RangeUsedLength="0x08f6ce20" RangeReservedLength="0x08f6ce20" ClrInstanceID="0" |
Microsoft-Windows-DotNETRuntime/GC/GenerationRange | 24,480.803 | Process(22337) (22337) | ThreadID="22,388" ProcessorNumber="0" Generation="1" RangeStart="0x7f547ffff000" RangeUsedLength="0x0137c2e8" RangeReservedLength="0x0137c2e8" ClrInstanceID="0" |
Microsoft-Windows-DotNETRuntime/GC/GenerationRange | 24,480.805 | Process(22337) (22337) | ThreadID="22,388" ProcessorNumber="0" Generation="0" RangeStart="0x7f548137b2e8" RangeUsedLength="0x0517f950" RangeReservedLength="0x0ec82d18" ClrInstanceID="0" |
Microsoft-Windows-DotNETRuntime/GC/MarkWithType | 24,481.802 | Process(22337) (22337) | ThreadID="22,388" ProcessorNumber="0" HeapNum="0" ClrInstanceID="0" Type="0" Promoted="280" |
Microsoft-Windows-DotNETRuntime/GC/MarkWithType | 24,481.804 | Process(22337) (22337) | ThreadID="22,388" ProcessorNumber="0" HeapNum="0" ClrInstanceID="0" Type="1" Promoted="0" |
Microsoft-Windows-DotNETRuntime/GC/MarkWithType | 24,481.846 | Process(22337) (22337) | ThreadID="22,388" ProcessorNumber="0" HeapNum="0" ClrInstanceID="0" Type="2" Promoted="60,096" |
Microsoft-Windows-DotNETRuntime/GC/MarkWithType | 24,486.546 | Process(22337) (22337) | ThreadID="22,388" ProcessorNumber="0" HeapNum="0" ClrInstanceID="0" Type="3" Promoted="3,656,106" |
Microsoft-Windows-DotNETRuntime/GC/BulkMovedObjectRanges | 24,535.396 | Process(22337) (22337) | ThreadID="22,388" ProcessorNumber="0" Index="0" Count="2,444" ClrInstanceID="0" |
Microsoft-Windows-DotNETRuntime/GC/GlobalHeapHistory | 24,539.910 | Process(22337) (22337) | HasStack="True" ThreadID="22,388" ProcessorNumber="0" FinalYoungestDesired="83,886,080" NumHeaps="1" CondemnedGeneration="1" Gen0ReductionCount="0" Reason="AllocSmall" GlobalMechanisms="Compaction, Promotion, Demotion, CardBundles" ClrInstanceID="0" PauseMode="SustainedLowLatency" MemoryPressure="13" CondemnReasons0="0" CondemnReasons1="0" |
Microsoft-Windows-DotNETRuntime/GC/PerHeapHistory | 24,539.913 | Process(22337) (22337) | ThreadID="22,388" ProcessorNumber="0" ClrInstanceID="0" FreeListAllocated="2,419,360" FreeListRejected="5,616" EndOfSegAllocated="0" CondemnedAllocated="0" PinnedAllocated="0" PinnedAllocatedAdvance="0" RunningFreeListEfficiency="94" CondemnReasons0="4" CondemnReasons1="48" CompactMechanisms="compact_low_ephemeral" ExpandMechanisms="not_specified" ConcurrentCompactMechanisms="" HeapIndex="0" ExtraGen0Commit="84,497,280" Count="5" MemoryPressure="" |
Microsoft-Windows-DotNETRuntime/GC/GenerationRange | 24,539.918 | Process(22337) (22337) | ThreadID="22,388" ProcessorNumber="0" Generation="4" RangeStart="0x7f5597fff000" RangeUsedLength="0x00000018" RangeReservedLength="0x07fff000" ClrInstanceID="0" |
Microsoft-Windows-DotNETRuntime/GC/GenerationRange | 24,539.918 | Process(22337) (22337) | ThreadID="22,388" ProcessorNumber="0" Generation="3" RangeStart="0x7f558ffff000" RangeUsedLength="0x04680f68" RangeReservedLength="0x07fff000" ClrInstanceID="0" |
Microsoft-Windows-DotNETRuntime/GC/GenerationRange | 24,539.919 | Process(22337) (22337) | ThreadID="22,388" ProcessorNumber="0" Generation="2" RangeStart="0x7f557ffff000" RangeUsedLength="0x08f6ce20" RangeReservedLength="0x08f6ce20" ClrInstanceID="0" |
Microsoft-Windows-DotNETRuntime/GC/GenerationRange | 24,539.919 | Process(22337) (22337) | ThreadID="22,388" ProcessorNumber="0" Generation="1" RangeStart="0x7f547ffff000" RangeUsedLength="0x0016c160" RangeReservedLength="0x0016c160" ClrInstanceID="0" |
Microsoft-Windows-DotNETRuntime/GC/GenerationRange | 24,539.920 | Process(22337) (22337) | ThreadID="22,388" ProcessorNumber="0" Generation="0" RangeStart="0x7f548016b160" RangeUsedLength="0x014b1b20" RangeReservedLength="0x0fe92ea0" ClrInstanceID="0" |
Microsoft-Windows-DotNETRuntime/GC/Stop | 24,539.920 | Process(22337) (22337) | ThreadID="22,388" ProcessorNumber="0" DURATION_MSEC="59.124" Count="15" Depth="1" ClrInstanceID="0" |
Microsoft-Windows-DotNETRuntime/GC/HeapStats | 24,539.921 | Process(22337) (22337) | ThreadID="22,388" ProcessorNumber="0" TotalHeapSize="247,511,584" TotalPromoted="3,958,320" Depth="1" GenerationSize0="21,699,360" TotalPromotedSize0="1,538,960" GenerationSize1="1,491,296" TotalPromotedSize1="2,419,360" GenerationSize2="150,392,352" TotalPromotedSize2="0" GenerationSize3="73,928,552" TotalPromotedSize3="0" FinalizationPromotedSize="237,335" FinalizationPromotedCount="5" PinnedObjectCount="3" SinkBlockCount="151" GCHandleCount="46,199" ClrInstanceID="0" GenerationSize4="24" TotalPromotedSize4="0" |
First trace (from https://github.com/ppy/osu/issues/11800#issuecomment-783984256):
GC trigger reason:
Event Name | Time MSec | Process Name | Rest |
---|---|---|---|
Microsoft-Windows-DotNETRuntime/GC/Start | 45,088.094 | Process(281057) (281057) | ThreadID="281,139" ProcessorNumber="0" Count="14" Reason="AllocSmall" Depth="1" Type="NonConcurrentGC" ClrInstanceID="0" ClientSequenceNumber="0" |
GC stats:
Event Name | Time MSec | Process Name | Rest |
---|---|---|---|
Microsoft-Windows-DotNETRuntime/GC/Stop | 45,129.702 | Process(281057) (281057) | ThreadID="281,139" ProcessorNumber="0" DURATION_MSEC="41.608" Count="14" Depth="1" ClrInstanceID="0" |
Microsoft-Windows-DotNETRuntime/GC/HeapStats | 45,129.703 | Process(281057) (281057) | ThreadID="281,139" ProcessorNumber="0" TotalHeapSize="192,293,344" TotalPromoted="9,875,232" Depth="1" GenerationSize0="24" TotalPromotedSize0="440,016" GenerationSize1="442,440" TotalPromotedSize1="9,435,216" GenerationSize2="117,489,816" TotalPromotedSize2="0" GenerationSize3="74,361,040" TotalPromotedSize3="0" FinalizationPromotedSize="243,375" FinalizationPromotedCount="11" PinnedObjectCount="0" SinkBlockCount="121" GCHandleCount="44,053" ClrInstanceID="0" GenerationSize4="24" TotalPromotedSize4="0" |
This one is shorter, at 42ms, but that's still longer than we can live with. Very weird indeed.
Event Name | Time MSec | Process Name | Rest |
---|---|---|---|
Microsoft-Windows-DotNETRuntime/GC/RestartEEStart | 45,086.027 | Process(281057) (281057) | ThreadID="281,095" ProcessorNumber="0" ClrInstanceID="0" |
Microsoft-Windows-DotNETRuntime/GC/RestartEEStop | 45,086.032 | Process(281057) (281057) | ThreadID="281,095" ProcessorNumber="0" ClrInstanceID="0" |
Microsoft-Windows-DotNETRuntime/GC/SuspendEEStart | 45,087.091 | Process(281057) (281057) | ThreadID="281,095" ProcessorNumber="0" Reason="SuspendOther" Count="-1" ClrInstanceID="0" |
Microsoft-Windows-DotNETRuntime/GC/SuspendEEStop | 45,087.099 | Process(281057) (281057) | ThreadID="281,095" ProcessorNumber="0" DURATION_MSEC="0.008" ClrInstanceID="0" |
Microsoft-Windows-DotNETRuntime/GC/RestartEEStart | 45,087.229 | Process(281057) (281057) | ThreadID="281,095" ProcessorNumber="0" ClrInstanceID="0" |
Microsoft-Windows-DotNETRuntime/GC/RestartEEStop | 45,087.235 | Process(281057) (281057) | ThreadID="281,095" ProcessorNumber="0" DURATION_MSEC="0.144" ClrInstanceID="0" |
Microsoft-Windows-DotNETRuntime/GC/SuspendEEStart | 45,088.023 | Process(281057) (281057) | ThreadID="281,139" ProcessorNumber="0" Reason="SuspendForGC" Count="13" ClrInstanceID="0" |
Microsoft-Windows-DotNETRuntime/GC/SuspendEEStop | 45,088.029 | Process(281057) (281057) | ThreadID="281,139" ProcessorNumber="0" DURATION_MSEC="0.006" ClrInstanceID="0" |
Microsoft-Windows-DotNETRuntime/GC/Triggered | 45,088.048 | Process(281057) (281057) | HasStack="True" ThreadID="281,139" ProcessorNumber="0" Reason="AllocSmall" ClrInstanceID="0" |
Microsoft-Windows-DotNETRuntime/GC/Start | 45,088.094 | Process(281057) (281057) | ThreadID="281,139" ProcessorNumber="0" Count="14" Reason="AllocSmall" Depth="1" Type="NonConcurrentGC" ClrInstanceID="0" ClientSequenceNumber="0" |
Microsoft-Windows-DotNETRuntime/GC/GenerationRange | 45,088.095 | Process(281057) (281057) | ThreadID="281,139" ProcessorNumber="0" Generation="4" RangeStart="0x7f318ffff000" RangeUsedLength="0x00000018" RangeReservedLength="0x07fff000" ClrInstanceID="0" |
Microsoft-Windows-DotNETRuntime/GC/GenerationRange | 45,088.095 | Process(281057) (281057) | ThreadID="281,139" ProcessorNumber="0" Generation="3" RangeStart="0x7f3187fff000" RangeUsedLength="0x046ea8d0" RangeReservedLength="0x07fff000" ClrInstanceID="0" |
Microsoft-Windows-DotNETRuntime/GC/GenerationRange | 45,088.095 | Process(281057) (281057) | ThreadID="281,139" ProcessorNumber="0" Generation="2" RangeStart="0x7f3177fff000" RangeUsedLength="0x0700c098" RangeReservedLength="0x0700c098" ClrInstanceID="0" |
Microsoft-Windows-DotNETRuntime/GC/GenerationRange | 45,088.096 | Process(281057) (281057) | ThreadID="281,139" ProcessorNumber="0" Generation="1" RangeStart="0x7f3023fff000" RangeUsedLength="0x01a32748" RangeReservedLength="0x01a32748" ClrInstanceID="0" |
Microsoft-Windows-DotNETRuntime/GC/GenerationRange | 45,088.096 | Process(281057) (281057) | ThreadID="281,139" ProcessorNumber="0" Generation="0" RangeStart="0x7f3025a31748" RangeUsedLength="0x050000b0" RangeReservedLength="0x0e5cc8b8" ClrInstanceID="0" |
Microsoft-Windows-DotNETRuntime/GC/MarkWithType | 45,088.303 | Process(281057) (281057) | ThreadID="281,139" ProcessorNumber="0" HeapNum="0" ClrInstanceID="0" Type="0" Promoted="0" |
Microsoft-Windows-DotNETRuntime/GC/MarkWithType | 45,088.303 | Process(281057) (281057) | ThreadID="281,139" ProcessorNumber="0" HeapNum="0" ClrInstanceID="0" Type="1" Promoted="0" |
Microsoft-Windows-DotNETRuntime/GC/MarkWithType | 45,088.400 | Process(281057) (281057) | ThreadID="281,139" ProcessorNumber="0" HeapNum="0" ClrInstanceID="0" Type="2" Promoted="122,160" |
Microsoft-Windows-DotNETRuntime/GC/MarkWithType | 45,095.341 | Process(281057) (281057) | ThreadID="281,139" ProcessorNumber="0" HeapNum="0" ClrInstanceID="0" Type="3" Promoted="9,482,610" |
Microsoft-Windows-DotNETRuntime/GC/BulkMovedObjectRanges | 45,120.223 | Process(281057) (281057) | ThreadID="281,139" ProcessorNumber="0" Index="0" Count="2,677" ClrInstanceID="0" |
Microsoft-Windows-DotNETRuntime/GC/BulkMovedObjectRanges | 45,120.610 | Process(281057) (281057) | ThreadID="281,139" ProcessorNumber="0" Index="1" Count="2,677" ClrInstanceID="0" |
Microsoft-Windows-DotNETRuntime/GC/BulkMovedObjectRanges | 45,120.869 | Process(281057) (281057) | ThreadID="281,139" ProcessorNumber="0" Index="2" Count="2,677" ClrInstanceID="0" |
Microsoft-Windows-DotNETRuntime/GC/BulkMovedObjectRanges | 45,120.984 | Process(281057) (281057) | ThreadID="281,139" ProcessorNumber="0" Index="3" Count="976" ClrInstanceID="0" |
Microsoft-Windows-DotNETRuntime/GC/GlobalHeapHistory | 45,129.693 | Process(281057) (281057) | HasStack="True" ThreadID="281,139" ProcessorNumber="0" FinalYoungestDesired="83,886,080" NumHeaps="1" CondemnedGeneration="1" Gen0ReductionCount="0" Reason="AllocSmall" GlobalMechanisms="Compaction, Promotion, CardBundles" ClrInstanceID="0" PauseMode="SustainedLowLatency" MemoryPressure="25" CondemnReasons0="0" CondemnReasons1="0" |
Microsoft-Windows-DotNETRuntime/GC/PerHeapHistory | 45,129.696 | Process(281057) (281057) | ThreadID="281,139" ProcessorNumber="0" ClrInstanceID="0" FreeListAllocated="9,435,216" FreeListRejected="0" EndOfSegAllocated="0" CondemnedAllocated="0" PinnedAllocated="0" PinnedAllocatedAdvance="0" RunningFreeListEfficiency="99" CondemnReasons0="4" CondemnReasons1="16" CompactMechanisms="compact_low_ephemeral" ExpandMechanisms="not_specified" ConcurrentCompactMechanisms="" HeapIndex="0" ExtraGen0Commit="108,285,856" Count="5" MemoryPressure="" |
Microsoft-Windows-DotNETRuntime/GC/GenerationRange | 45,129.700 | Process(281057) (281057) | ThreadID="281,139" ProcessorNumber="0" Generation="4" RangeStart="0x7f318ffff000" RangeUsedLength="0x00000018" RangeReservedLength="0x07fff000" ClrInstanceID="0" |
Microsoft-Windows-DotNETRuntime/GC/GenerationRange | 45,129.700 | Process(281057) (281057) | ThreadID="281,139" ProcessorNumber="0" Generation="3" RangeStart="0x7f3187fff000" RangeUsedLength="0x046ea8d0" RangeReservedLength="0x07fff000" ClrInstanceID="0" |
Microsoft-Windows-DotNETRuntime/GC/GenerationRange | 45,129.701 | Process(281057) (281057) | ThreadID="281,139" ProcessorNumber="0" Generation="2" RangeStart="0x7f3177fff000" RangeUsedLength="0x0700c098" RangeReservedLength="0x0700c098" ClrInstanceID="0" |
Microsoft-Windows-DotNETRuntime/GC/GenerationRange | 45,129.701 | Process(281057) (281057) | ThreadID="281,139" ProcessorNumber="0" Generation="1" RangeStart="0x7f3023fff000" RangeUsedLength="0x0006c048" RangeReservedLength="0x0006c048" ClrInstanceID="0" |
Microsoft-Windows-DotNETRuntime/GC/GenerationRange | 45,129.702 | Process(281057) (281057) | ThreadID="281,139" ProcessorNumber="0" Generation="0" RangeStart="0x7f302406b048" RangeUsedLength="0x00000018" RangeReservedLength="0x0ff92fb8" ClrInstanceID="0" |
Microsoft-Windows-DotNETRuntime/GC/Stop | 45,129.702 | Process(281057) (281057) | ThreadID="281,139" ProcessorNumber="0" DURATION_MSEC="41.608" Count="14" Depth="1" ClrInstanceID="0" |
Microsoft-Windows-DotNETRuntime/GC/HeapStats | 45,129.703 | Process(281057) (281057) | ThreadID="281,139" ProcessorNumber="0" TotalHeapSize="192,293,344" TotalPromoted="9,875,232" Depth="1" GenerationSize0="24" TotalPromotedSize0="440,016" GenerationSize1="442,440" TotalPromotedSize1="9,435,216" GenerationSize2="117,489,816" TotalPromotedSize2="0" GenerationSize3="74,361,040" TotalPromotedSize3="0" FinalizationPromotedSize="243,375" FinalizationPromotedCount="11" PinnedObjectCount="0" SinkBlockCount="121" GCHandleCount="44,053" ClrInstanceID="0" GenerationSize4="24" TotalPromotedSize4="0" |
Microsoft-Windows-DotNETRuntime/GC/RestartEEStart | 45,129.705 | Process(281057) (281057) | ThreadID="281,139" ProcessorNumber="0" ClrInstanceID="0" |
Microsoft-Windows-DotNETRuntime/GC/RestartEEStop | 45,129.729 | Process(281057) (281057) | ThreadID="281,139" ProcessorNumber="0" DURATION_MSEC="41.707" ClrInstanceID="0" |
Microsoft-Windows-DotNETRuntime/GC/FinalizersStart | 45,129.880 | Process(281057) (281057) | ThreadID="281,096" ProcessorNumber="0" ClrInstanceID="0" |
Microsoft-Windows-DotNETRuntime/GC/FinalizersStop | 45,129.918 | Process(281057) (281057) | ThreadID="281,096" ProcessorNumber="0" Count="11" ClrInstanceID="0" |
Microsoft-Windows-DotNETRuntime/GC/FinalizersStart | 45,129.920 | Process(281057) (281057) | ThreadID="281,096" ProcessorNumber="0" ClrInstanceID="0" |
Microsoft-Windows-DotNETRuntime/GC/FinalizersStop | 45,129.921 | Process(281057) (281057) | ThreadID="281,096" ProcessorNumber="0" Count="0" ClrInstanceID="0" |
Going to reopen to track this as it looks like an actual issue.
The real question here is that you both say that it worked fine at one point in time, which would point to a change in the way we are allocating memory or the underlying runtime. We recently switched to .NET 5, which could definitely have introduced some issues like this, but since you have already tried older builds (which should have the .NET runtime bundled) I'm not so sure...
Can you confirm that when you are checking old builds, you are using the AppImage, and not self-compiling or otherwise?
Thank you for reopening and taking a closer look at it. I can confirm that I tried both compiling from source and using the AppImage, on a recent and an old build.
I'll send a trace with multiple stutters in a few minutes.
The important part is that when you are testing on the old build, you are using the AppImage, or explicitly compiling with the older .NET runtime (3.x should do fine). Conversely, ensure you are building using .NET 5 when building the newer release (although I believe this is a requirement these days, so should be enforced at compile time).
> Can you confirm that when you are checking old builds, you are using the AppImage, and not self-compiling or otherwise?
Yes, I was using the AppImage, both with the most recent build, as well as the one from November last year. This is why I still think the issue is with Arch somehow. I actually reinstalled my entire system yesterday (for a different reason) and the issue still persists. Maybe I should install a non-Arch-based distro and see if the issue persists? Definitely very weird.
I'm currently on mobile, but I'll provide more traces and memory stats when I get on my PC.
Here are two new traces: trace-new with a current version of the AppImage, trace-old with a version from November, still using an AppImage (2020.1114.0). I definitely did not encounter the issue last November, but I do now.
- trace-new.zip: stutters at 40s, 1m09, and 1m40
- trace-old.zip: stutters at 38s, 1m06, and 1m40
Both traces were recorded with dotnet 5.0.3.sdk103
Just for confirmation, can you show how you are running the appimage? The .NET runtime should be bundled inside, so I want to confirm you aren't somehow forcing the old version to run on .NET 5 (which is your system installed runtime/sdk version), as this is feasible and could mean a lot in figuring out where this regressed.
I `chmod +x` the AppImage and either run it via the CLI or by double-clicking on it. Is there a way to ensure which runtime is used?
For the recent one, in order to capture the trace from startup, I used `dotnet-trace collect -- ./osu.AppImage`. When I tried to use the same method on the build from November, it complained about this method not working on .NET 3.1, so I suppose it uses the included runtime.
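(One way to verify which runtime a running build actually loaded; a sketch relying only on standard Linux `/proc` introspection, with the `pgrep` pattern as an assumption:)

```bash
# The path of the mapped coreclr library reveals which runtime is in use.
grep -m1 'libcoreclr' "/proc/$(pgrep 'osu!')/maps"
```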
That only brings more questions. We only started publishing .NET 5 releases in https://osu.ppy.sh/home/changelog/lazer/2021.118.1. All before that should be 3.1.
I don't see the issue here; the build from November seems to be using .NET 3.1, since dotnet-trace complains that it can't collect from startup on versions before 5. It works fine on the recent one, as it uses .NET 5 as it should.
How are you getting the trace then? Attaching post-startup?
Yes, using `dotnet-trace collect -p <PID>`.
@peppy: If I'm reading those correctly, these are cases where the GC is doing a compaction even though it's basically done no "worthwhile" (for lack of a better word) work in order to justify doing a compaction. It might be the compaction that's causing the large GC time, but if so the GC should not be doing compactions so often in the first place.
There are two things that worry me:

1. `ExtraGen0Commit`. It's hard to find documentation as to what this is, but if it's the total amount of memory allocated to these heaps internally (but not necessarily in use), then maybe this indicates high fragmentation?
2. Furthermore, I'm frequently seeing something like the following after clearing caches after exiting gameplay: the size never goes down from this after multiple collections. This is in the menu, where no allocations are happening.
I think I'm able to replicate the stutters seen here, but have not investigated too deeply; I dived into SOS and saw nothing too abnormal from the get-go, and looked into adding more GC stats for the F2 window (specifically that `Commit` value, which doesn't exist in ETW events).
Those gen0 sizes look way too large. Are you able to try commenting out the GC mode specification we have and see if it helps/changes things at all? https://github.com/ppy/osu-framework/blob/23ca45ece4b559d3fd255f987a16a6f43111e547/osu.Framework/Platform/GameHost.cs#L531
Commenting it out doesn't really improve things. If anything, it actually shows the issue even more clearly.
Here are two videos I captured:
Along with that, you can still see the great disparity in size between Gen0 and Gen1... I'm not sure how to explain any of this - probably requires manual digging into what's actually on the heap, if you want me to go into that.
Like... Are we allocating 33MB between two GC cycles to exhibit this behaviour?
It really does look like something is allocating in 16 MB blocks. Those display numbers only update on collection, so it looks like sometimes when you press the button there have been (close to) zero allocs, which is what I would expect at the main menu, while other times something has done a massive alloc.
Are you doing this from the main menu after a fresh startup? If so, I can't reproduce on macOS, so it may be something platform specific:
https://user-images.githubusercontent.com/191335/108975308-b27d6b00-76c9-11eb-82f7-007c63d034f4.mp4
Yeah, that's on a fresh startup. Are you logged in? I've noticed that this gen0 one doesn't happen when not logged in when performing the same inputs. Maybe something SignalR?
Yep, logged in to the dev server
Here is a capture of my screen while the stutters are happening. The graphs jump all over the place since I'm recording at the same time, but the stutters are still clearly visible around 3s and 25s. The stutters seem to coincide with collections, be they gen0 or gen1. video.zip
As we saw in the event logs, in your video it's only reporting a gen0 collection, which should be basically instant. I believe @smoogipoo has found a potential bug causing this to occur, so we should have enough information to act on this, thanks!
> This is definitely looking to be a very deep GC execution. Not something we generally see happen, so it's curious as to why it's running. If possible, could you provide a longer trace where you stay at the main menu the whole time, but encounter two of these stutters? I'd like to cross-compare them to make sure it's the same thing each time.
>
> Also could you please specify your general memory stats (used / free / swap available?)
@peppy, I'm not sure you still need this, as there has been a lot of development here, but I recorded more stutters on the main menu, this time with memory info.
To gather the memory info, I used this shell script:
```bash
#!/bin/bash
# Log osu!'s resident set size (RSS, in KB) and the change since the
# previous sample, looping as fast as possible.
last_rss=0
while true; do
  time=$(date +%X.%N)                     # wall-clock time with nanoseconds
  rss=$(ps -o rss= -p "$(pgrep 'osu!')")  # RSS in KB; 'rss=' suppresses the header
  rss_diff=$((rss - last_rss))
  last_rss=$rss
  echo -e "$time\t$rss\t$rss_diff" >> memoryinfo.txt
done
```
It outputs the memory usage of osu! in this format: `time current_mem mem_difference`. `time` is the current system time, and the values for `current_mem` and `mem_difference` are in KB. The values refer to the process's "resident set size, the non-swapped physical memory that a task has used" (from `man ps`). As I don't have any swap space, this should be the total memory used by the process.
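To line this log up with stutters, large jumps between samples can be flagged; a hypothetical helper, with the 16 MB threshold being an arbitrary choice:

```bash
# Print samples where RSS grew by more than 16 MB (16384 KB) since the
# previous sample; fields are tab-separated as written by the script above.
# (The first sample will always be flagged, since the baseline starts at 0.)
awk -F'\t' '$3 > 16384' memoryinfo.txt
```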
Here are the results of both the trace and the memory info script: profile_osu.zip
There were around 4 or 5 stutters total. Again, I was just sitting in the main menu, doing nothing, playing no music.
And to clarify, this was recorded on the latest AppImage build using `dotnet-trace collect -n osu!`.
Are you able to compile the latest master and check if it's any better for you?
There are still stutters, but it feels a little better. It might be placebo, I'm not sure. Here's what I'm seeing on latest master:

- When `Collections Gen0` increments, there is a stutter that might feel a little shorter than what it used to.
- When `Collection Gen1` increments, there is a longer stutter that looks about as long as before.
- As for `Collection Gen2`, it usually doesn't trigger when staying idle in the menu or in maps. I only see it often incrementing when browsing and selecting maps, but it stutters way too much doing this to know where to put the blame.

Here's a new trace from master: near the start there is a stutter with `Collections Gen1` incrementing, and a little later there is a second stutter when `Collection Gen0` and `Collection Gen2` both increment at the same time.
trace_master.zip
I can confirm the exact same things that @pinembour is describing. It's maybe a little better but the stutters definitely still happen.
I also recorded another trace: trace.nettrace.zip
I compiled the latest master branch (commit a792c3f13c21d207378e8dbdf151e84ad7e033ef) with `dotnet run --project osu.Desktop -c Release`. `dotnet --version` gives `5.0.103`, and that is the latest version that I can install atm.
And just to confirm beyond doubt: both of you are able to reproduce these stutters after starting the game, at the main menu without any gameplay involved?
Also: are you compiling locally in release mode? This is important when testing these kinds of issues (`dotnet run -c Release`).
Yes for me for both questions.
Yes and yes
Does running with the following environment variables make it any better?

```
COMPlus_TieredCompilation=0
COMPlus_TC_QuickJit=0
COMPlus_TC_QuickJitForLoops=0
```
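For anyone wanting to try the same thing, the variables can be set for a single run (a sketch; the AppImage filename is illustrative):

```bash
# Launch the game once with tiered compilation / quick JIT disabled.
COMPlus_TieredCompilation=0 COMPlus_TC_QuickJit=0 COMPlus_TC_QuickJitForLoops=0 ./osu.AppImage
```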
It doesn't look like those environment variables change anything.
This has nothing to do with the environment variables, but in my testing I found something odd. I was trying to reproduce on a live USB of Manjaro, but I couldn't. However, the F2 menu revealed something that might be useful:
We're still looking into this and do have some leads, but in the meantime: are you able to re-test at the main menu with the frame graphs hidden? Note that you may want to test for a longer period of time, as with them hidden there are fewer allocations and it may take longer for a GC run to kick in.
So far I have been able to draw a direct correlation between having that display visible and seeing increased GC time.
Thank you for your hard work. I tested again on the main menu without the frame graph and let the game run for 3 minutes. There were no stutters that I could find, but with so little moving it's really hard to catch them. I tried osu!taiko and even with the frame graph and the F2 menu closed there were noticeable stutters.
I recorded another trace. Latest `master` (a792c3f13c21d207378e8dbdf151e84ad7e033ef), FPS graph/display completely off. I was sitting in the main menu again, this time circling my cursor for a couple minutes. I didn't see any stutter, but that doesn't mean that there was none (they're hard to see when nothing is really moving, as @pinembour said).
Either way, here is the trace: trace.nettrace.zip
Describe the bug: A few times per map I encounter a stutter that makes the frame time spike for one frame. The screenshot shows the issue in mania, but I encounter it in every game mode. From what I saw, nothing in the logs seems to happen at the same time as the stutter.
I can reproduce this issue with a stock or custom kernel, 5.10 or 5.11, an AMD or Nvidia GPU, built from source or using the AppImage.
Screenshots or videos showing encountered issue:
osu!lazer version: 2021.212.0.r106.86faa7f465-1, reproducible with 2021.109.0-1 and anything in between; did not try anything earlier.
Logs:
logs.zip