playframework / play1

Play framework
https://www.playframework.com/documentation/1.4.x/home
Other
1.58k stars 684 forks source link

Continuations - partially blocked pool threads #1304

Open sant0s opened 5 years ago

sant0s commented 5 years ago

References

Discuss Play Forum: https://discuss.lightbend.com/t/continuations-partially-blocked-pool-threads/3752 Lighthouse (copy of this ticket): https://play.lighthouseapp.com/projects/57987/tickets/2390-continuations-partially-blocked-pool-threads

Description

Play! version: 1.5.2 Java version: 1.8.0_201 Ubuntu 18.04.2 LTS

Consider a Play app consisting of a controller that makes use of continuations in order to allow simultaneous requests to be run by pool threads.

Create a test app

play new test

and change the application controller to

package controllers;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import play.Logger;
import play.mvc.Controller;

public class Application extends Controller {

    private static ExecutorService pool = Executors.newFixedThreadPool(10);

    public static void index() {
        Future future = pool.submit(() -> {
            try {
                Logger.info("before sleep");
                Thread.sleep(20000);
                Logger.info("after sleep");
            } catch (InterruptedException e) {
                Logger.error(e.getMessage());
            }
        });
        Logger.info("before await");
        await(future);
        Logger.info("after await");
        render();
    }

}

Then add conf/log4j.properties,

log4j.rootLogger=DEBUG, Console
log4j.appender.Console=org.apache.log4j.ConsoleAppender
log4j.appender.Console.layout=org.apache.log4j.PatternLayout
log4j.appender.Console.layout.ConversionPattern=%d{ABSOLUTE} %-5p ~ [%t] %m%n

start play

play run

and simultaneously open 3 browser tabs at http://localhost:9000/

The results are:

Logs show that:

Expected Behavior

Since the 3 tabs were opened at (around) the same time, they should all be loaded after (around) 20 seconds i.e. the 3 pool threads should start/complete handling them at (around) the same time. Furthermore, play-thread-1 should not be blocked.

Actual Behavior

All pool threads should start/complete handling simultaneous requests at (around) the same time and play-thread-1 should not block i.e. it should handle other requests while waiting for pool threads to complete.

Reproducible Test Case

The actual behaviour is consistently reproducible by following the above-mentioned steps.

LouisJackman commented 5 years ago

If true, this could defeat the purpose of await somewhat. That said, the 2nd and 3rd requests are apparently running concurrently; perhaps the first web request handling thread is incorrectly doing dual duty as both the request thread dispatching and as a request thread handler itself. If so, that could lead to every one request in N requests blocking all subsequent requests until it's complete.

I'll dig into the code to investigate further. I might be guessing the completely wrong thing.

sant0s commented 5 years ago

Any luck?

xabolcs commented 5 years ago

I'm unable to reproduce the issue with

Am I doing something wrong? I'm opening "simultaneously" that 3 tabs by quickly middle clicking on the link in this issue three times.

~        _            _ 
~  _ __ | | __ _ _  _| |
~ | '_ \| |/ _' | || |_|
~ |  __/|_|\____|\__ (_)
~ |_|            |__/   
~
~ play! 1.5.2, https://www.playframework.com
~
~
~ Warning: conflict on command doc:export
~
~ Ctrl+C to stop
~ 
~ using java version "1.8.0_212"
Listening for transport dt_socket at address: 8000
.
.
.
12:38:52,075 INFO  ~ [play-thread-1] Application 'play-issue-1304-blocked-pool-threads' is now started !
12:38:52,129 INFO  ~ [play-thread-1] before await
12:38:52,129 INFO  ~ [pool-5-thread-1] before sleep
12:38:52,129 DEBUG ~ [play-thread-1] suspend()
12:38:52,130 DEBUG ~ [play-thread-1] push object java.util.concurrent.FutureTask@1037802910/unknown [java.util.concurrent.FutureTask@3ddb9d9e] i[0],l[0],d[0],f[0],o[0],r[0]
12:38:52,130 DEBUG ~ [play-thread-1] push int 2 i[0],l[0],d[0],f[0],o[1],r[0]
12:38:52,133 WARN  ~ [play-thread-1] Start WaitForTasksCompletion
12:38:52,135 INFO  ~ [play-thread-1] before await
12:38:52,135 INFO  ~ [pool-5-thread-2] before sleep
12:38:52,135 DEBUG ~ [play-thread-1] suspend()
12:38:52,135 DEBUG ~ [play-thread-1] push object java.util.concurrent.FutureTask@1190650721/unknown [java.util.concurrent.FutureTask@46f7e361] i[0],l[0],d[0],f[0],o[0],r[0]
12:38:52,135 DEBUG ~ [play-thread-1] push int 2 i[0],l[0],d[0],f[0],o[1],r[0]
12:38:52,216 INFO  ~ [play-thread-1] before await
12:38:52,216 INFO  ~ [pool-5-thread-3] before sleep
12:38:52,216 DEBUG ~ [play-thread-1] suspend()
12:38:52,216 DEBUG ~ [play-thread-1] push object java.util.concurrent.FutureTask@114452413/unknown [java.util.concurrent.FutureTask@6d267bd] i[0],l[0],d[0],f[0],o[0],r[0]
12:38:52,216 DEBUG ~ [play-thread-1] push int 2 i[0],l[0],d[0],f[0],o[1],r[0]

12:39:12,129 INFO  ~ [pool-5-thread-1] after sleep
12:39:12,135 INFO  ~ [pool-5-thread-2] after sleep
12:39:12,163 DEBUG ~ [play-thread-1] pop int 2 i[0],l[0],d[0],f[0],o[1],r[0]
12:39:12,163 DEBUG ~ [play-thread-1] pop object java.util.concurrent.FutureTask@1037802910/unknown [java.util.concurrent.FutureTask@3ddb9d9e] 
12:39:12,171 INFO  ~ [play-thread-1] after await
12:39:12,216 INFO  ~ [pool-5-thread-3] after sleep
12:39:12,518 DEBUG ~ [play-thread-1] pop int 2 i[0],l[0],d[0],f[0],o[1],r[0]
12:39:12,518 DEBUG ~ [play-thread-1] pop object java.util.concurrent.FutureTask@1190650721/unknown [java.util.concurrent.FutureTask@46f7e361] 
12:39:12,518 INFO  ~ [play-thread-1] after await
12:39:12,526 DEBUG ~ [play-thread-1] pop int 2 i[0],l[0],d[0],f[0],o[1],r[0]
12:39:12,526 DEBUG ~ [play-thread-1] pop object java.util.concurrent.FutureTask@114452413/unknown [java.util.concurrent.FutureTask@6d267bd] 
12:39:12,526 INFO  ~ [play-thread-1] after await

The empty line is where the 20 seconds wait happens.

sant0s commented 5 years ago

@xabolcs You're doing nothing wrong and I am glad that Play works as expected, as this would be a major issue (literally a "blocker" as threads get blocked :)).

Your set up (Java and Play versions) is similar to mine, so I started wondering what else could be different. I re-tested and got the following results:

Scenario A: same behaviour as I reported i.e. it doesn't work Scenario B: same behaviour as you reported i.e. it works

Can you guess what's the difference? :)

Scenario A corresponds to a VirtualBox guest, while Scenario B corresponds to a plain non-virtualised machine (as I assume you tested on).

Any idea about the underlying reason for this mismatching behaviour? It might be related to thread scheduling, but no clue on details.

So these are good news for Play apps deployed in physical machines, but an issue for those deployed in virtualised servers (in production) as they may stall.

xabolcs commented 5 years ago

Nice findings, @sant0s ! Yes, I was testing on bare metal hardware.

How many processors does have that VirtualBox guest? In other words: is the bug related to processor count or to virtualization?

LouisJackman commented 5 years ago

Sorry, I didn't find the time to look too deeply into this as I originally promised. I was having difficulty reproducing, it's good to see I'm not the only one.

I can't see how more CPUs on a physical machine over a VM would change the problem (although, like most concurrency bugs, it can certainly exacerbate it.) This bug seems to be more related to concurrency and not parallelism.

If the pool is large and there are only three tabs open, is it possible that Play just doesn't allocate the main web request dispatch thread by sheer fluke? That seems pretty unlikely if you've refreshed the tabs enough though, so it seems more likely my guess is completely wrong...

sant0s commented 5 years ago

@xabolcs 2 CPUs. Both scenarios - A (VirtualBox guest) and B (host, bare metal) - are run under 2 CPUs each (the host has 4 CPUs). I'd guess virtualisation related.

@LouisJackman No problem.

I wonder if Play 2 is affected by this issue.