temperlang / temper

Stabilize temper watch and langserver #125

Open tjpalmer opened 6 months ago

tjpalmer commented 6 months ago

We've sometimes called this the langserver death spiral. Whatever the name, it sometimes hangs, chews CPU, or crashes. We've been taking notes in an issue elsewhere, and there's likely more than one unrelated problem out there. But we need this to be mostly stable for people to be able to use our tooling effectively, and I don't yet know how to track down the remaining issues we've seen.

We should probably open new issues here for specific problems as we characterize them. This issue can then reference those.

tjpalmer commented 5 months ago

The temper java process went to nonstop high CPU on me again when I woke up my laptop this morning. I captured a thread dump, waited a bit, then captured another. Many threads aren't doing anything, but I noticed that at least these ones are very busy. First capture:

"VM Thread" os_prio=2 cpu=18937.50ms elapsed=225986.06s tid=0x000001c14d4a7d70 nid=0x16ca0 runnable  
"GC Thread#0" os_prio=2 cpu=196218.75ms elapsed=225986.10s tid=0x000001c128776460 nid=0x16c84 runnable  
"GC Thread#1" os_prio=2 cpu=194500.00ms elapsed=225984.62s tid=0x000001c1529b0710 nid=0xeff4 runnable  
"GC Thread#2" os_prio=2 cpu=193250.00ms elapsed=225984.62s tid=0x000001c1529b09c0 nid=0xd134 runnable  
"GC Thread#3" os_prio=2 cpu=227546.88ms elapsed=225984.62s tid=0x000001c1529b0c70 nid=0x18d4c runnable  
"GC Thread#4" os_prio=2 cpu=226781.25ms elapsed=225984.62s tid=0x000001c152c99300 nid=0x63d0 runnable  
"GC Thread#5" os_prio=2 cpu=194093.75ms elapsed=225984.62s tid=0x000001c152c995b0 nid=0x640 runnable  
"GC Thread#6" os_prio=2 cpu=192546.88ms elapsed=225984.61s tid=0x000001c152bd36b0 nid=0x6f68 runnable  
"GC Thread#7" os_prio=2 cpu=229828.12ms elapsed=225984.61s tid=0x000001c1528271e0 nid=0x26e8 runnable  

Second capture:

"VM Thread" os_prio=2 cpu=20640.62ms elapsed=226017.46s tid=0x000001c14d4a7d70 nid=0x16ca0 runnable  
"GC Thread#0" os_prio=2 cpu=218046.88ms elapsed=226017.50s tid=0x000001c128776460 nid=0x16c84 runnable  
"GC Thread#1" os_prio=2 cpu=216468.75ms elapsed=226016.02s tid=0x000001c1529b0710 nid=0xeff4 runnable  
"GC Thread#2" os_prio=2 cpu=214671.88ms elapsed=226016.02s tid=0x000001c1529b09c0 nid=0xd134 runnable  
"GC Thread#3" os_prio=2 cpu=251312.50ms elapsed=226016.02s tid=0x000001c1529b0c70 nid=0x18d4c runnable  
"GC Thread#4" os_prio=2 cpu=249562.50ms elapsed=226016.02s tid=0x000001c152c99300 nid=0x63d0 runnable  
"GC Thread#5" os_prio=2 cpu=215968.75ms elapsed=226016.02s tid=0x000001c152c995b0 nid=0x640 runnable  
"GC Thread#6" os_prio=2 cpu=214500.00ms elapsed=226016.02s tid=0x000001c152bd36b0 nid=0x6f68 runnable  
"GC Thread#7" os_prio=2 cpu=252984.38ms elapsed=226016.02s tid=0x000001c1528271e0 nid=0x26e8 runnable  
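
To put numbers on "very busy", the two captures can be diffed: the change in each thread's `cpu=` value divided by the change in its `elapsed=` value gives the fraction of one core that thread used between the dumps. Below is a minimal Python sketch of that arithmetic. The helper names (`parse_dump`, `core_usage`) are made up for illustration and are not part of temper. Working it by hand for `GC Thread#0` gives about 21.8 s of CPU over 31.4 s of wall time, or roughly 70% of a core, and the other GC threads land between roughly 68% and 76%. That pattern would be consistent with the JVM spending most of its time in garbage collection, though the dumps alone don't show why.

```python
import re

# Matches a thread header line from a HotSpot thread dump, for example:
# "GC Thread#0" os_prio=2 cpu=196218.75ms elapsed=225986.10s tid=... runnable
THREAD_RE = re.compile(
    r'^"(?P<name>[^"]+)".*?\bcpu=(?P<cpu>[\d.]+)ms\s+elapsed=(?P<elapsed>[\d.]+)s'
)


def parse_dump(text):
    """Map thread name -> (cpu_ms, elapsed_s) for every header line found.

    Assumption: thread names are unique within a dump. If two threads
    share a name, the later one wins.
    """
    threads = {}
    for line in text.splitlines():
        match = THREAD_RE.match(line.strip())
        if match:
            threads[match.group("name")] = (
                float(match.group("cpu")),
                float(match.group("elapsed")),
            )
    return threads


def core_usage(first_dump, second_dump):
    """Return thread name -> fraction of one core used between two dumps."""
    before = parse_dump(first_dump)
    after = parse_dump(second_dump)
    usage = {}
    for name, (cpu_after, wall_after) in after.items():
        if name not in before:
            continue  # thread started after the first capture
        cpu_before, wall_before = before[name]
        wall_ms = (wall_after - wall_before) * 1000.0
        if wall_ms > 0:
            usage[name] = (cpu_after - cpu_before) / wall_ms
    return usage


if __name__ == "__main__":
    # Two lines from each capture above, trimmed to the fields that matter.
    FIRST = '''
    "VM Thread" os_prio=2 cpu=18937.50ms elapsed=225986.06s tid=0x000001c14d4a7d70
    "GC Thread#0" os_prio=2 cpu=196218.75ms elapsed=225986.10s tid=0x000001c128776460
    '''
    SECOND = '''
    "VM Thread" os_prio=2 cpu=20640.62ms elapsed=226017.46s tid=0x000001c14d4a7d70
    "GC Thread#0" os_prio=2 cpu=218046.88ms elapsed=226017.50s tid=0x000001c128776460
    '''
    for name, fraction in sorted(core_usage(FIRST, SECOND).items()):
        print(f"{name}: {fraction:.0%} of one core")
```

Pasting the full text of both dumps into `core_usage` would rank every thread the same way, which makes it easier to spot whether anything besides the GC threads is burning CPU.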

tjpalmer commented 5 months ago

Example of temper watch hanging in nightly integration testing:

subprocess.TimeoutExpired: Command '['/tmp/tmp6g25sxb8/temper/temper-0.3.1-dev+10-g73d8edce9-linux-x64/temper', 'watch', '--limit', '1']' timed out after 240 seconds
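
When a hang like this is caught only by the timeout, whatever the process printed before hanging is often the best clue. The following is a rough Python sketch of a wrapper that keeps that partial output instead of losing it with the exception. It is not the actual nightly test harness, and `run_capturing_timeout` is a hypothetical name. It relies only on the standard `subprocess` module, whose `TimeoutExpired` exception carries the output captured so far (possibly as bytes, possibly `None`).

```python
import subprocess


def _as_text(data):
    """TimeoutExpired may hand back bytes, str, or None; normalize to str."""
    if data is None:
        return ""
    if isinstance(data, bytes):
        return data.decode("utf-8", errors="replace")
    return data


def run_capturing_timeout(cmd, timeout_s):
    """Run cmd and return a dict describing the result.

    On timeout, subprocess.run kills the child and raises TimeoutExpired;
    this wrapper returns any output captured up to that point rather than
    letting it disappear with the exception.
    """
    try:
        done = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired as exc:
        return {
            "timed_out": True,
            "returncode": None,
            "stdout": _as_text(exc.stdout),
            "stderr": _as_text(exc.stderr),
        }
    return {
        "timed_out": False,
        "returncode": done.returncode,
        "stdout": done.stdout,
        "stderr": done.stderr,
    }
```

Assuming `temper` is on the `PATH`, calling `run_capturing_timeout(["temper", "watch", "--limit", "1"], 240)` would mirror the command in the log above, and the returned `stdout` and `stderr` could be attached to the test failure.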