ninjaframework / ninja

Ninja is a full stack web framework for Java. Rock solid, fast and super productive.
http://www.ninjaframework.org
Apache License 2.0
1.91k stars 519 forks source link

Ninja app drastically slow / stop responding after some time. #502

Closed danielsawan closed 8 years ago

danielsawan commented 8 years ago

Hi there

I have a probleme that occure after 1 / 2 days of exploitation in my production server. The symptoms are :

"qtp55909012-2270" 
   java.lang.Thread.State: TIMED_WAITING
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
        at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:389)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:531)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.access$700(QueuedThreadPool.java:47)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:590)
        at java.lang.Thread.run(Thread.java:745)

After searching on the google I think that the problem is that the Ninja JVM reach the XMX (which I have not set at the startup command on my 32Go of ram server). And then the JVM use all the CPU to try to free some memory without success. So the CPU is fully used and the Ninja app stop responding.

And after looking at all dump I have like 6 threads running some clients code and all the rest is that sun.misc.Unsafe.park(Native Method) : about 150 occurrences of this. I don’t know if this is another symptom or the root cause of the problem. I am still waiting that the problem occure one more time in order to check my memory usage and log files.

It is the first time I am facing this sort of problem. And the website has not so many user yet : around 8/12 at the same time at maximum. I found on Google that it can be a lock problem. I am a little bit lost. Any help would be appreciated.

Thanks.

jjlauer commented 8 years ago

Daniel,

There are so many potential variables to diagnose your problem, it's probably difficult to point to anything specific.

The sun.misc.Unsafe.park(Native Method) are threads waiting for work. Sounds like you have 150 of them and since they are parked they are ready for work. So they are idle. Doesn't sound like that's your root problem.

Have you tried using jvisualvm to inspect the heap itself? It'll potentially help you find if something is taking up most of the heap. When the cpu spikes, you're correct it could be the garbage collector, but if you have a memory leak then it can't collect anything. With the right setup, you should be able to connect jvisualvm to your production jvm and inspect it.

-Joe

On Sat, Apr 9, 2016 at 10:29 AM, Daniel SAWAN notifications@github.com wrote:

Hi there

I have a probleme that occure after 1 / 2 days of exploitation in my production server. The symptoms are :

  • all the server 8 core CPU at 100% usage.
  • the ninja app stop responding to request
  • ninja app memory usage rise up to 5/6 Go of Ram
  • my server dump is full of (arround 150) :

"qtp55909012-2270" java.lang.Thread.State: TIMED_WAITING at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078) at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:389) at org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:531) at org.eclipse.jetty.util.thread.QueuedThreadPool.access$700(QueuedThreadPool.java:47) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:590) at java.lang.Thread.run(Thread.java:745)

After searching on the google I think that the problem is that the Ninja JVM reach the XMX (which I have not set at the startup command on my 32Go of ram server). And then the JVM use all the CPU to try to free some memory without success. So the CPU is fully used and the Ninja app stop responding.

And after looking at all dump I have like 6 threads running some clients code and all the rest is that sun.misc.Unsafe.park(Native Method) : about 150 occurrences of this. I don’t know if this is another symptom or the root cause of the problem. I am still waiting that the problem occure one more time in order to check my memory usage and log files.

It is the first time I am facing this sort of problem. And the website has not so many user yet : around 8/12 at the same time at maximum. I found on Google that it can be a lock problem. I am a little bit lost. Any help would be appreciated.

Thanks.

— You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub https://github.com/ninjaframework/ninja/issues/502

danielsawan commented 8 years ago

Thanks for your help. But i am wondering if all those sleeping threads should not be normally cleaned? Is this a normal thing that i get so many of them parked ? (150 for 10 simultaneous users)

I will try to connect a jvisualvm in order to check a memory leak.

jjlauer commented 8 years ago

150 threads isn't all that significant. Depending on how your have jetty configured, it could be a cached thread pool that grows up to X and if they don't timeout after an idle, then it'll stay at 150. Or if you're app is busy enough, it may very well keep them all busy frequently enough that they stick around. Threads usually get work with a round-robin approach, so if you only timeout a jetty worker after 60 seconds of idle, but you have a busy enough server, then all 150 threads would get some sort of work within 60 seconds. The fact that almost all of them are "parked" is significant since that means they are idle.

-Joe

On Sat, Apr 9, 2016 at 10:44 AM, Daniel SAWAN notifications@github.com wrote:

Thanks for your help. But i am wondering if all those sleeping threads should not be normally cleaned? Is this a normal thing that i get so many of them parked ?

I will try to connect a jvisualvm in order to check a memory leak.

— You are receiving this because you commented. Reply to this email directly or view it on GitHub https://github.com/ninjaframework/ninja/issues/502#issuecomment-207798267

danielsawan commented 8 years ago

I Think i get my answer after a head dump.

958211 insance of my UserDao : http://puu.sh/occnz/a22d3ccf01.png

Which is very strange because all my Controllers have @Singleton annotation. The only class where i inject UserDao and are not annotated with @Singleton are

I am not sure but should i make all those class @Singleton to ?

jjlauer commented 8 years ago

That's certainly a big issue. Hopefully its the only one.

I don't think making UserDao an @Singleton will fix the root cause -- you need to find out where the UserDao reference is being held onto indefinitely instead of being garbage collected. Are you caching it somewhere? Do you have any circular references -- like UserDao referring to itself.

Your heap dump offers some clues btw. There are an almost same number of RunnableScheduleFuture, ScheduledThreadPoolExecutor instances. Maybe you're endlessly scheduling something that never occurs and continuously just builds up over time. Thus, a memory leak.

On Sat, Apr 9, 2016 at 3:54 PM, Daniel SAWAN notifications@github.com wrote:

I Think i get my answer after a head dump.

958211 insance of my UserDao : http://puu.sh/occnz/a22d3ccf01.png

Which is very strange because all my Controllers have @Singleton https://github.com/Singleton annotation. The only class where i inject UserDao and are not annotated with @Singleton https://github.com/Singleton are

  • Validators
  • ArgumentExtractors
  • Filters

I am not sure but should i make all those class @Singleton https://github.com/Singleton to ?

— You are receiving this because you commented. Reply to this email directly or view it on GitHub https://github.com/ninjaframework/ninja/issues/502#issuecomment-207843542

danielsawan commented 8 years ago

I don't know if it is a good practice but should also every DAO be annoted with @Singleton ? And what about Validators, ArgumentExtractors and Filters ? @Singleton to ?

Thanks for your help.

raphaelbauer commented 8 years ago

@Singleton is not really important in terms of performance... the jvm is very good in creating instances... so - using @singleton everywhere is not a best practise.... Am 09.04.2016 22:27 schrieb "Daniel SAWAN" notifications@github.com:

I don't know if it is a good practice but should also every DAO be annoted with @Singleton https://github.com/Singleton ? And what about Validators, ArgumentExtractors and Filters ? @Singleton https://github.com/Singleton to ?

Thanks for your help.

— You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub https://github.com/ninjaframework/ninja/issues/502#issuecomment-207848682

danielsawan commented 8 years ago

That's really strange. I have a Filter that is called on all my controllers methods. In this Filter i am injecting my UserDao. For each request made on any controller method there is a new UserDao instance created so that the total UserDao instance is incremented by one.

Not any instance is cleared after the request finished. So it go like that until xmx is reached and CPU goes overload.

I still don't understand why each instance of UserDao is not cleared after the request is finished.

@raphaelbauer i know about jvm being good at creating instance but when i use @singleton on my userdao then i have no bug anymore. It is temporary (maybe final) solution if still don't understand what is happening here.

I continue my investigation.

jjlauer commented 8 years ago

Daniel,

You never answered my point yesterday. I'll copy n' paste.

Your heap dump offers some clues btw. There are an almost same number of RunnableScheduleFuture, ScheduledThreadPoolExecutor instances. Are you scheduling something to occur in the future somewhere in your code?

-Joe

On Sun, Apr 10, 2016 at 11:41 AM, Daniel SAWAN notifications@github.com wrote:

That's really strange. I have a Filter that is called on all my controllers requests. In this Filter i am injecting my UserDao. For each request made on any controller method there is a new UserDao instance created so that the total UserDao instance is incremented by one.

Not any instance is cleared after the request finished. So it go like that until xmx is reached and CPU goes overload.

I still don't understand why each instance of UserDao is not cleared after the request is finished.

@raphaelbauer https://github.com/raphaelbauer i know about jvm being good at creating instance but when i use @singleton https://github.com/singleton on my userdao then i have no bug anymore. It is temporary (maybe final) solution if still don't understand what is happening here.

I continue my investigation.

— You are receiving this because you commented. Reply to this email directly or view it on GitHub https://github.com/ninjaframework/ninja/issues/502#issuecomment-208005517

danielsawan commented 8 years ago

No not really (maybe i don't remember but i don't think so). By the way i made some new test

I just created a simple Controller

@Singleton
@Path("/test2")
public class TestController2 {

    @GET
    @Path({"/testoom"})
    public Result testoom() throws Exception {
        return Results.text().render("ok");
    }

}

As you can see there is no filters at all. I am saying that because i was expecting the memory leak comming from filters injecting UserDao. But i noticed that the user dao still increment by 2 when i call /test2/testoom

So i watch step by step the to see when this increment is done and here is what i found

First the full stack : http://puu.sh/oddKM/9513d5dc8c.png And then the increment happen at this point at run() : http://puu.sh/oddWV/cdc1a49af1.png

So then i immediatly thought it was coming from Ninja.java onRouteRequest. So i commented all UserDao reference on Ninja.java but still the same problem http://puu.sh/odekI/09d88cdd65.png

When i start my app i instantly have 233 instance of userdao and then, even when i call my testoom method, 2 new userdao are created i don't know why.

danielsawan commented 8 years ago

Oh sorry i made a mistake : yes i am scheduling email sending : http://puu.sh/odflL/b00baa1963.png I temporary disabled but nothing changed. When i am calling the /testoom there is no email sent anyway.

jjlauer commented 8 years ago

Daniel,

If just starting your app surfaces the issue then perhaps simply analyzing the stacktrace of where a UserDAO is being created may help find a pattern. In the constructor of your UserDAO throw an exception that you catch and log, but don't rethrow it. Something like this.

public class UserDAO {

// ... keep your code as is

public UserDAO() { // constructor stuff... try { throw new Exception("UserDAO called"); } catch (Exception e) { log.error("UserDAO created", e); } } }

On Apr 10, 2016 12:49 PM, "Daniel SAWAN" notifications@github.com wrote:

Oh sorry i made a mistake : yes i am scheduling email sending : http://puu.sh/odflL/b00baa1963.png I temporary disabled but nothing changed. When i am calling the /testoom there is no email sent anyway.

— You are receiving this because you commented. Reply to this email directly or view it on GitHub

danielsawan commented 8 years ago

Nice tips. I tried this and what happen is really strange. here is the stacktrace : http://puu.sh/odJ5L/5a2e8a3a97.png

so it seems to be called by Ninja.java onRouteRequest. here is my Ninja.java now : http://puu.sh/odJbk/0da803d427.png

the more i dig the less i understand.

jjlauer commented 8 years ago

All 233 instances at startup have that exact same stacktrace?

onRouteRequest shouldn't be called @ startup, only when you hit the first page.

-Joe

On Sun, Apr 10, 2016 at 8:01 PM, Daniel SAWAN notifications@github.com wrote:

Nice tips. I tried this and what happen is really strange. here is the stacktrace : http://puu.sh/odJ5L/5a2e8a3a97.png

so it seems to be called by Ninja.java onRouteRequest. here is my Ninja.java now : http://puu.sh/odJbk/0da803d427.png

the more i dig the less i understand.

— You are receiving this because you commented. Reply to this email directly or view it on GitHub https://github.com/ninjaframework/ninja/issues/502#issuecomment-208097622

danielsawan commented 8 years ago

The 233 instance doesn't have the same stack : http://puu.sh/odJQp/9c8d659a5b.png And something more strange on the first call of the /testoom i get a filter, which has nothing to have with testoom which is call (don't know why). And on the second call it we see the stack trace i sent you before.

I think i made the wrong job i guess would have a been an impressive magician !

But hey if you want to have a look i have Teamviewer installed. The last time on the session switching bug we solved the bug this way.

jjlauer commented 8 years ago

So "getArgumentExtractor" is trigging a UserDao to be built on app startup.

Can you share all your argument extractors that use/include a UserDao?

-Joe

On Sun, Apr 10, 2016 at 8:15 PM, Daniel SAWAN notifications@github.com wrote:

The 233 instance doesn't have the same stack : http://puu.sh/odJQp/9c8d659a5b.png And something more strange on the first call of the /testoom i get a filter, which has nothing to have with testoom which is call (don't know why). And on the second call it we see the stack trace i sent you before.

I think i made the wrong job i guess would have a been an impressive magician !

But hey if you want to have a look i have Teamviewer installed. The last time on the session switching bug we solved the bug this way.

— You are receiving this because you commented. Reply to this email directly or view it on GitHub https://github.com/ninjaframework/ninja/issues/502#issuecomment-208099763

danielsawan commented 8 years ago

Yes you are right. I get many arguments extractors and all of them inject UserDao. But i am not injecting UserDao on the same way i do in my controllers, instead i do that on the constructor because if i remember it wasn't working eitherway : http://puu.sh/odKFy/578d75a5b8.png

jjlauer commented 8 years ago

Well, I suspect the 233 instances at app start are not your problem. That's truly how many are created at start. ArgumentExtractors are built when compiling routes. If you are using them everywhere, it'd be pretty easy to get 233 of them. More below for advice.

I go back to my original point about how you have scheduled future instances matching your UserDao. All those ScheduleFuture instances are just as bad as the UserDao. So find where that's happening.

IMHO it would make a lot more sense to do your user lookup in a Filter, not in an argument extractor. If looked up successfully in the filter then add the User to the context as an attribute (e.g. context.setAttribute("USER", user)). Then you have a single argument extractor that extracts a User object from the context's attribute. The main reason is error handling. Your only option in an argument extractor is to throw a runtime exception, but in a Filter, you could choose to return a real Result (like forbidden, notFound, etc.).

On Sun, Apr 10, 2016 at 8:26 PM, Daniel SAWAN notifications@github.com wrote:

Yes you are right. I get many arguments extractors and all of them inject UserDao. But i am not injecting UserDao on the same way i do in my controllers, instead i do that on the constructor because if i remember it wasn't working eitherway : http://puu.sh/odKFy/578d75a5b8.png

— You are receiving this because you commented. Reply to this email directly or view it on GitHub https://github.com/ninjaframework/ninja/issues/502#issuecomment-208101099

danielsawan commented 8 years ago

Yes the 233 instances is definitly not a problem. And you are right about ScheduleFuture being as bad as all thoses UserDao. But i think that the main cause may be the same. So solving UserDao things can help me fix all others memory leak.

Just for info, here is the headdump just before xmx is reached (10Go): http://puu.sh/odMJn/2c155c8463.png

jjlauer commented 8 years ago

Daniel,

It is almost w/o question its your utils.Email that's the problem. Your heap dump shows 5 million utils.Email instances, 5 million UserDao instances, etc.

-Joe

On Sun, Apr 10, 2016 at 9:04 PM, Daniel SAWAN notifications@github.com wrote:

Yes the 233 instances is definitly not a problem. And you are right about ScheduleFuture being as bad as all thoses UserDao. But i think that the main cause may be the same. So solving UserDao things can help me fix all others memory leak.

Just for info, here is the headdump just before xmx is reached (10Go): http://puu.sh/odMJn/2c155c8463.png

— You are receiving this because you commented. Reply to this email directly or view it on GitHub https://github.com/ninjaframework/ninja/issues/502#issuecomment-208105685

danielsawan commented 8 years ago

yes the Email class contains everything :

But it is just a simple utils class that send mail : http://puu.sh/odNkL/40cb0484d2.png i use this class everywhere.

But in my /testoom exemple i am not sending emails. It is a simple hello world request and userdao is incremented.

There is some thing with those guice injection really. I feel like the GC don't manage to free them for an unknow reason.

danielsawan commented 8 years ago

After many tests i think i found out the problem. It was effectively a bug in my Email.java

private final ScheduledExecutorService executorService = Executors.newSingleThreadScheduledExecutor();

@Dispose
public void shutdownExecutor() {
     executorService.shutdown();
}

When shutdownExecutor is declared with the @Dispose annotation then the memory leak happen. If i just comment the annotation evrything is ok. Any idea of what is happening and why ?

And the 1M dollars questions : Why is Email.class called on /testoom which doesn't have any reference to any filters or the Email class itself

danielsawan commented 8 years ago

Here is the stack of the Email instance création when /testoom is called : http://puu.sh/og6u3/9711a35a2b.png

And my Ninja.java is just empty while this happening : http://puu.sh/og6xh/a3244f23fb.png

It look like the same stack then the userdao instentiation from somewhere without any reason. There is really something strange here. The memory leak is like a colateral effect of another bug. And i feel lucky that my prod goes out of memory if not i would have never look at this.

But i feel like there is something that make new instances and i don't have any revelent information. So first i looked at Ninja.java and just delete everything there. Then Filters but as you can see there is not any Filters on my /testoom method / Controllers.

danielsawan commented 8 years ago

Ok i undestand what happen. First of all are Filters linked in any way to the url path ? (from what i understood filters are Linked to method which is linked to path but there is no directlink betweet Filters and path.)

That's my root Controller

@Path({""})
public class ApplicationController {

@GET
@Path({"/"})
@FilterWith({FilterA.class, FilterB.class})
public Result index(@LoggedInUser User user, Context context) {

I feel like Each Filter declared on index is called evrywhere in my app at any url call.

Now if i change the @Path of my ApplicationController to

@Path({"/"})
public class ApplicationController {

Everything is ok.

The bug come from @Path({""}) on my ApplicationController

danielsawan commented 8 years ago

I made a sample app to illustrate the bug : http://dl.free.fr/gqiEQdC8L Just click on the /hello_world.json link on the main page and you will notice that the filter SimpleFilter is called. From what i understand from Filters it shouldn't. But maybe filters are linked to path. And it explains that if we put Filters on path "/" then every /toto /titi /* will be filtered. But it is not explained like that http://www.ninjaframework.org/documentation/basic_concepts/filters.html

jjlauer commented 8 years ago

I tried your demo out and I don't see SimpleFilter run when I click on /hello_world.json

When this occurs on your machine, does the content you expect from hello_world.json get returned or do you get the index page?

I don't personally use JaxyRoutes in my own apps, so I'm also curious if what seems specific to your machine also happens if you don't use JaxyRoutes?

-Joe

On Tue, Apr 12, 2016 at 8:18 PM, Daniel SAWAN notifications@github.com wrote:

I made a sample app to illustrate the bug : http://dl.free.fr/gqiEQdC8L Just click on the /hello_world.json link on the main page and you will notice that the filter SimpleFilter is called. From what i understand from Filters it shouldn't. But maybe filters are linked to path. And it explains that if we put Filters / then every /toto /toto /* will be filtered. But it is not explained like that http://www.ninjaframework.org/documentation/basic_concepts/filters.html

— You are receiving this because you commented. Reply to this email directly or view it on GitHub https://github.com/ninjaframework/ninja/issues/502#issuecomment-209160892

danielsawan commented 8 years ago

You don't see any stack or log message when you call /hello_world.json in your console ? And yes the content i expect is returned "{"content":"Hello World! Hello Json!"}" but it pass throught the Filter.

I also tried without Jaxy and i have the same bug but i am using Jaxy in my prod so i installed it for the test.

danielsawan commented 8 years ago

oh wow i just tested the project on a friend computer and the filter is not called when requesting /hello_world.json This is strange !

jjlauer commented 8 years ago

Did you try a different web-browser or use 'curl' from the command-line? For example, Chrome will sometimes pre-fetch other pages w/o you requesting it.

On Tue, Apr 12, 2016 at 9:07 PM, Daniel SAWAN notifications@github.com wrote:

You don't see any stack or log message when you call /hello_world.json ? And yes the content i expect is returned "{"content":"Hello World! Hello Json!"}" but it pass throught the Filter.

I also tried without Jaxy and i have the same bug.

— You are receiving this because you commented. Reply to this email directly or view it on GitHub https://github.com/ninjaframework/ninja/issues/502#issuecomment-209172773

danielsawan commented 8 years ago

I was using chrome and now with Firefox there is no problem. "Chrome will sometimes pre-fetch other pages" yes it can explains many things. But now i have to make all my tests again.

Any idea about

@Dispose
public void shutdownExecutor() {
     executorService.shutdown();
}

making memory leak ?

jjlauer commented 8 years ago

http://www.ninjaframework.org/documentation/lifecycle.html

Are you trying to use @Dispose in a service? You haven't sent a full copy of your Email class I believe, but I suspect you're actually creating lots of Email instances across your app?

I'd recommend re-architecting your app to only use thread executors inside a single service. If you are then you definitely need to add an @Singleton to it.

On Wed, Apr 13, 2016 at 6:11 PM, Daniel SAWAN notifications@github.com wrote:

I was using chrome and now with Firefox there is no problem. "Chrome will sometimes pre-fetch other pages" yes it can explains many things. But now i have to make all my tests again.

Any idea about

@Dispose public void shutdownExecutor() { executorService.shutdown(); }

making memory leak ?

— You are receiving this because you commented. Reply to this email directly or view it on GitHub https://github.com/ninjaframework/ninja/issues/502#issuecomment-209669582

danielsawan commented 8 years ago

yep i added Singleton and solved the problem. Thanks for your help @jjlauer The final world of this story is stop using chrome in dev ;)