marklogic-community / marklogic-samplestack

A sample implementation of the MarkLogic Reference Architecture
Apache License 2.0
82 stars 56 forks source link

JVM/Spring warm-up? #395

Closed laurelnaiad closed 9 years ago

laurelnaiad commented 9 years ago

In e2e tests, I have a "dummy" request that gives the Java middle-tier a poke in order to warm it up. This helps avoid timeouts.

In the actual application, we have no such option. Let's discuss what we could do around the tail end of the bootrun task to get the JVM/Spring into memory and ready to respond to requests, so that we can bring down the browser tineout. I've just set the browser timeout to 30 seconds, but for a local app (or really any app that's not having actual internet connectivity problems) that shouldn't be necessary.

grechaw commented 9 years ago

Hi @laurelnaiad , did your work on e2e tests make this issue moot?

laurelnaiad commented 9 years ago

The e2e tests handle it in a way that I'd like to see implemented in the gradle startup code itself, because it's end-users who are having the slow startup time end up causing errors to be reported when their machines are slow. The e2e tests invoke a dummy request of the midde tier and wait for a response before they start. If the gradle command that finalizes the startup of the app (something like a "poststart" event) were to do the same, then people wouldn't think the middle tier is ready when in fact it's "just warming up".

Not the highest priority in the world, but it is founded in the fact that we've had multiple bug reports come in on this issue.

grechaw commented 9 years ago

I think this would be straightforward, but I'm going to bump to 8.0-2 simply because it seems lower priority than our core work right now.

popzip commented 9 years ago

If related to current bug reports might be higher priority. Can someone clarify impact for me?

grechaw commented 9 years ago

I think @laurelnaiad should clarify that -- it really has not affected my development at all because I use the front end relatively rarely and because I have preconceived expectations of how the middle tier behaves... the startup is pretty long.

What occurs to me implementation-wise (without deep thought yet) is to implement a new groovy task that would be tacked on at the end of the bootrun execution that would hit a middle tier endpoint and block until it responds. Again, a too-quick estimate would be a half-day to a day, since it's an entirely new task.

laurelnaiad commented 9 years ago

It's not related to current bug reports, but past ones and local experience on slower machines in QA world and for developers. I agree not an 8.0-1 concern.

laurelnaiad commented 9 years ago

implement a new groovy task that would be tacked on at the end of the bootrun execution that would hit a middle tier endpoint and block until it responds.

Yes, this sounds good once time becomes available, if ever. :smile:

grechaw commented 9 years ago

I've experienced this now. It's somewhat mysterious, but it's actually MarkLogic taking a while to respond to the first query -- I think that bucket resolution the first time for the dates is a long operation.

I'd suggest that it might get quicker if you prime the pump so to speak with just one query to /v1/search. In the initial state there are no shadow queries, so the three initial /v1/search calls are the same. Not sure if this will help, but it's a real issue of latency somewhere between Java and MarkLogic, not in the middle tier itself.

laurelnaiad commented 9 years ago

That's kind of a complicated workaround as opposed to a gradle step.

Here's what I do in the e2e tests so that this doesn't happen. Can't we put this at the tail end of the startup process (but in Java)?

var pokeServer = function (cb) {
  var request = require('request');
  request.post(
    'http://localhost:8090/v1/search',
    { json: {} },
    cb
  );
};
laurelnaiad commented 9 years ago

@grechaw,

Gaurav is still timing out on the first request after the middle tier says it is ready -- it's not browser-specific, btw, but he does have a relatively slow machine.

I took a look at the commit that was to address this, and the warm up as implemented doesn't seem to hit the Spring HTTP layer at all. I think we need to do that before we can claim success here....

grechaw commented 9 years ago

I'm not completely sure how to test, or whether this fix actually addresses the issue 100%, but I saw significant improvement on a cold MarkLogic server with this commit in place. Let's discuss if you are not able to see improvement of the symptoms in CI.

grechaw commented 9 years ago

Fair enough @laurelnaiad I can take the issue back if @gghai has no success with this.

gghai commented 9 years ago

Thats right Charles . We verified on RC candidate while @laurelnaiad was here .

grechaw commented 9 years ago

Guess I need to put back to 8.0-2 since my hypothesis did not help

grechaw commented 9 years ago

Going to do an attempt at making an http call to the server at the end of the bootrun process to get rid of this delay.

grechaw commented 9 years ago

Not at all straightforward how to accomplish this in the Java application or in the gradle tasks.

The last thing the Spring container does is fire up the web server and hand control of the application over. The last thing bootRun does is invoke the Spring app and block.

So in order to hook into either of these places I'd have to deal with concurrency in a way not supported by either the gradle process or the Spring container. I figured out how to hook into the Spring container, but the point at which I can inject code is before the webserver is available, and I just get a socket error. I'm at a loss how to proceed.

Could the browser send a dummy search as part of ITS startup? It seems that this would accomplish the same thing, and I can't figure out at all where I'd put a call in.

Incidently I found a place where the request time is particularly bad -- after running dbconfigure, there's something about a first-time extension run that is very slow in MarkLogic. It could be that the root cause of the first hit is because of extension evaluations (program cache?) rather than search caches as I previously thought. Not that it matters...

laurelnaiad commented 9 years ago

The browser has no way of knowing whether the middle tier is functional. All users of Samplestack should not be responsible for finalizing middle-tier startup (ever single time anyone lands on the app we would be hitting the server and then waiting up to 30 seconds, or however long it might take to warm the server, until concluding it wasn't available or concluding that we had warmed it up successfully). This is kind of the opposite of what to teach in 3-tiered development, I think.

That said, in a later version, making an HTTP call from Javascript is easy, and we could combine the gradle tasks with a Node.js script which knows how to make an HTTP request before claiming "readiness" to the developer. Or we could wrap the Java server in a Node process which finalizes the startup. I think that doing so is really obtuse, but if gradle can't reference an HTTP client library and "do something" once bootrun has completed, then it is a theoretical option.

popzip commented 9 years ago

Thanks for looking into it Charles...given it does not sound like there is a straightforward interim solution, I recommend we don't invest more time now, and put more effort towards the underlying root cause of the delay later. I think we have a reasonable workaround (manual refresh). Can we document when it happens (I haven't seen it but sounds like the rest of you have..)?

laurelnaiad commented 9 years ago

"On first request to the Spring server (from a browser, for example), it can hang for over 30 seconds on slower machines before responding. When the server doeesn't respond within 30 seconds, the request to it times out. This results in an error message being displayed in the browser. The user may refresh the browser page one or more times to work around this."

(user-friendly "page unavailable" implementation tbd under #443 will make the mentioned error message friendlier)

grechaw commented 9 years ago

I think I found another event that puts me where I need to be for this call

grechaw commented 9 years ago

I'm trying to set this aside but it's really bothering me that I can't find the right application hook.

grechaw commented 9 years ago

Folks I think I have a solution. The slowdown may be due to a bug/unfortunate default in jetty. Jetty evidently scans for annotations on first request, and in a large project this can take some time. I didn't confirm that this is the case, and I didn't attempt to configure jetty, BUT

I swapped out tomcat for jetty and the startup time is much better. Or so it appears to me. I'd like to have someone else try it out; I'll put the commit up in a moment and we can see.

laurelnaiad commented 9 years ago

The jetty->tomcat commit is now on rc1 branch.

gghai commented 9 years ago

Thanks @grechaw @laurelnaiad

gghai commented 9 years ago

Able to deploy the app (all 3 platforms ) with latest code base that uses the tomcat . Will mark this as shipped after some sanity testing on the app .

laurelnaiad commented 9 years ago

@gghai said he still sees a timeout on commit 1bf118ae5ea7f04544d3d5f3b01c77d6642c5ccd.

If that is the case, and if the entire gradlew cache was clean before he tested, then Tomcat would seem not to have solved the problem... but I have my doubts about whether the gradle cache was clean. @grechaw will you please help him ensure that Tomcat is being used?

gghai commented 9 years ago

Below are the steps I did :

I did dbteardown clean ( update the latest code to commit 1bf118a.) ran appserver npm install bower install gulp run

I do see tomcat being used . Below are the logs :

essorChecker - Bean 'metaDataSourceAdvisor' of type [class org.springframework.s ecurity.access.intercept.aopalliance.MethodSecurityMetadataSourceAdvisor] is not eligible for getting processed by all BeanPostProcessors (for example: not elig ible for auto-proxying) 06:31:40.194 [main] INFO o.s.b.c.e.t.TomcatEmbeddedServletContainer - Tomcat in itialized with port(s): 8090 (http) 06:31:42.011 [main] INFO o.a.catalina.core.StandardService - Starting service T omcat 06:31:42.021 [main] INFO o.a.catalina.core.StandardEngine - Starting Servlet En gine: Apache Tomcat/8.0.15 06:31:42.984 [localhost-startStop-1] INFO o.a.c.c.C.[Tomcat].[localhost].[/] - Initializing Spring embedded WebApplicationContext 06:31:42.985 [localhost-startStop-1] INFO o.s.web.context.ContextLoader - Root WebApplicationContext: initialization completed in 12412 ms 06:31:49.567 [localhost-startStop-1] INFO c.m.client.impl.DocumentManagerImpl - Reading metadata and content for multiple uris beginning with com.marklogic.sam plestack.domain.Contributor/9611450a-0663-45a5-8a08-f1c71320475e.json 06:31:49.645 [Finalizer] INFO c.m.client.impl.DatabaseClientImpl - Releasing co nnection 06:31:51.164 [localhost-startStop-1] INFO c.m.client.impl.DocumentManagerImpl - Reading metadata and content for multiple uris beginning with com.marklogic.sam plestack.domain.Contributor/cf99542d-f024-4478-a6dc-7e723a51b040.json 06:31:51.227 [localhost-startStop-1] INFO c.m.s.d.MarkLogicContributorService - joe already in the database 06:31:51.227 [localhost-startStop-1] INFO c.m.s.d.MarkLogicContributorService - mary already in the database 06:31:51.693 [localhost-startStop-1] INFO o.s.s.l.DefaultSpringSecurityContextS ource - URL 'ldap://127.0.0.1:33389/dc=samplestack,dc=org', root DN is 'dc=samp lestack,dc=org' 06:31:51.722 [localhost-startStop-1] INFO o.s.l.c.s.AbstractContextSource - Pro perty 'userDn' not set - anonymous context will be used for read-write operation s 06:31:53.998 [localhost-startStop-1] INFO o.s.s.ldap.server.ApacheDSContainer - Setting working directory for LDAP_PROVIDER: C:\Users\builder\AppData\Local\Tem p\apacheds-spring-security-11711569264546 06:31:55.046 [localhost-startStop-1] INFO o.s.s.ldap.server.ApacheDSContainer - Starting directory server... 06:31:57.407 [localhost-startStop-1] ERROR o.a.d.s.s.r.DefaultAttributeTypeRegis try - attributeType w/ OID 2.5.4.16 not registered! 06:32:07.125 [localhost-startStop-1] INFO o.s.s.ldap.server.ApacheDSContainer - Loading LDIF file: C:\Users\builder\marklogic-samplestack\appserver\java-spring \src\main\resources\samplestack-ds.ldif 06:32:07.923 [localhost-startStop-1] INFO o.s.s.web.DefaultSecurityFilterChain

Using default security password: c7f672f9-90ac-46bb-b5f1-b5212adf0126

06:32:33.629 [main] INFO o.s.j.e.a.AnnotationMBeanExporter - Registering beans for JMX exposure on startup 06:32:33.712 [main] INFO o.a.coyote.http11.Http11NioProtocol - Initializing Pro tocolHandler ["http-nio-8090"] 06:32:33.742 [main] INFO o.a.coyote.http11.Http11NioProtocol - Starting Protoco lHandler ["http-nio-8090"] 06:32:33.762 [main] INFO o.a.tomcat.util.net.NioSelectorPool - Using a shared s elector for servlet write/read 06:32:33.877 [main] INFO o.s.b.c.e.t.TomcatEmbeddedServletContainer - Tomcat st arted on port(s): 8090 (http) 06:32:33.881 [main] INFO c.marklogic.samplestack.Application - Started Applicat ion in 68.883 seconds (JVM running for 87.676)

grechaw commented 9 years ago

Yes, tomcat is in use here.

laurelnaiad commented 9 years ago

@grechaw , if we are to conclude that tomcat didn't solve it (which seems reasonable), should we revert to jetty? Just wondering...

grechaw commented 9 years ago

Well, there's really nothing that would have me choose one over the other, except it seems there is an expectation of the use of tomcat in the field. tomcat's apache, jetty is eclipse. I'm turning on some debugging just to see where in the world this time is going.

laurelnaiad commented 9 years ago

Ok, if you want to back out Tomcat for Jetty , just let me know, otherwise I'll do "nothing". :smile:

grechaw commented 9 years ago

Well, it's not looking like a solution is going to appear for 1.0.0. I suggest we push back to 8.0-2 milestone, although I will keep investigating.

gghai commented 9 years ago

You can pass this back to me in case you want me to re-try one more time on windows m/c Thanks

laurelnaiad commented 9 years ago

@grechaw did http://stackoverflow.com/questions/12857823/web-application-deployed-on-tomcat-is-slow-on-first-access

grechaw commented 9 years ago

So we found that the timeout 'event' is on a GET from the express node server (port 3000) calling the middle tier REST server. @laurelnaiad set an explicit timeout on this GET so that the browser will not see it as a failed request.

I tested the change locally and the browser does indeed wait more patiently now for the slow first request, regardless of why its slow.

grechaw commented 9 years ago

Giving to gaurav for test -- I think we got to a 'root cause' of sorts, which was to find the appropriate timeout and deal with it.

laurelnaiad commented 9 years ago

I think we didn't so much get to the root cause but an exacerbating bug.... 20 seconds to make a request is still pretty heavy...we'll see how Gaurav's slow machine makes out. :)

gghai commented 9 years ago

Worked like a magic . I saw the spinner for few seconds eventually the app loaded fine this time . Win7/IE9

C:\Users\builder\marklogic-samplestack>git rev-parse HEAD f644324c3dd5e1f34d901c90dab6f79ae41a04cd

laurelnaiad commented 9 years ago

How many seconds?

grechaw commented 9 years ago

so the story before -- browser requests proxy requests proxy times out browser waits to timeout before hearing about proxy timeout.

Yes, the initial request is still slow, but at least all the components know about it now

gghai commented 9 years ago

Less then 30 i am guessing . Didnt count :-)

laurelnaiad commented 9 years ago

Will you please rerun and count the seconds? It's helpful in this case to understand what kind of time that machine is taking to do the warm up

laurelnaiad commented 9 years ago

May be able to deduce something about how much tomcat helped.

popzip commented 9 years ago

trying to catch up - is there consensus this is btw java & ML and browser is just a downstream manifestation, or could it still be something between java & browser?

laurelnaiad commented 9 years ago

@popzip I don't understand your question.

popzip commented 9 years ago

have we isolated where the root cause of the warm up issue is?

Is it experienced only in the browser or if we make curl calls to middle tier would we also see it?

laurelnaiad commented 9 years ago

There as a third ingrededient which was also timing out in addition to browser timing out, which was the proxy server. Charles discovered that and I upped its timeout.

I just want to know whether we're at 10, 15, 20, 25 seconds or what on the test machine now.

laurelnaiad commented 9 years ago

have we isolated where the root cause of the warm up issue is?

I think we don't know how much Charles' warm-up query helped, how much Tomcat helped, how much the proxy timeout change helped. Some combination of all three is in effect and it's working, but knowing which was which is why I'm still pursuing that number from Gaurav.... seconds to load...

laurelnaiad commented 9 years ago

if we make curl calls to middle tier would we also see it?

Whatever delay there is in warming up, you could see it in curl, too.

gghai commented 9 years ago

I would say be may be around 10-20 sec since you are pursing that number from me . I dont have exact count.