Closed laurelnaiad closed 9 years ago
Hi @laurelnaiad , did your work on e2e tests make this issue moot?
The e2e tests handle it in a way that I'd like to see implemented in the gradle startup code itself, because it's end-users who are having the slow startup time end up causing errors to be reported when their machines are slow. The e2e tests invoke a dummy request of the midde tier and wait for a response before they start. If the gradle command that finalizes the startup of the app (something like a "poststart" event) were to do the same, then people wouldn't think the middle tier is ready when in fact it's "just warming up".
Not the highest priority in the world, but it is founded in the fact that we've had multiple bug reports come in on this issue.
I think this would be straightforward, but I'm going to bump to 8.0-2 simply because it seems lower priority than our core work right now.
If related to current bug reports might be higher priority. Can someone clarify impact for me?
I think @laurelnaiad should clarify that -- it really has not affected my development at all because I use the front end relatively rarely and because I have preconceived expectations of how the middle tier behaves... the startup is pretty long.
What occurs to me implementation-wise (without deep thought yet) is to implement a new groovy task that would be tacked on at the end of the bootrun execution that would hit a middle tier endpoint and block until it responds. Again, a too-quick estimate would be a half-day to a day, since it's an entirely new task.
It's not related to current bug reports, but past ones and local experience on slower machines in QA world and for developers. I agree not an 8.0-1 concern.
implement a new groovy task that would be tacked on at the end of the bootrun execution that would hit a middle tier endpoint and block until it responds.
Yes, this sounds good once time becomes available, if ever. :smile:
I've experienced this now. It's somewhat mysterious, but it's actually MarkLogic taking a while to respond to the first query -- I think that bucket resolution the first time for the dates is a long operation.
I'd suggest that it might get quicker if you prime the pump so to speak with just one query to /v1/search. In the initial state there are no shadow queries, so the three initial /v1/search calls are the same. Not sure if this will help, but it's a real issue of latency somewhere between Java and MarkLogic, not in the middle tier itself.
That's kind of a complicated workaround as opposed to a gradle step.
Here's what I do in the e2e tests so that this doesn't happen. Can't we put this at the tail end of the startup process (but in Java)?
var pokeServer = function (cb) {
var request = require('request');
request.post(
'http://localhost:8090/v1/search',
{ json: {} },
cb
);
};
@grechaw,
Gaurav is still timing out on the first request after the middle tier says it is ready -- it's not browser-specific, btw, but he does have a relatively slow machine.
I took a look at the commit that was to address this, and the warm up as implemented doesn't seem to hit the Spring HTTP layer at all. I think we need to do that before we can claim success here....
I'm not completely sure how to test, or whether this fix actually addresses the issue 100%, but I saw significant improvement on a cold MarkLogic server with this commit in place. Let's discuss if you are not able to see improvement of the symptoms in CI.
Fair enough @laurelnaiad I can take the issue back if @gghai has no success with this.
Thats right Charles . We verified on RC candidate while @laurelnaiad was here .
Guess I need to put back to 8.0-2 since my hypothesis did not help
Going to do an attempt at making an http call to the server at the end of the bootrun process to get rid of this delay.
Not at all straightforward how to accomplish this in the Java application or in the gradle tasks.
The last thing the Spring container does is fire up the web server and hand control of the application over. The last thing bootRun does is invoke the Spring app and block.
So in order to hook into either of these places I'd have to deal with concurrency in a way not supported by either the gradle process or the Spring container. I figured out how to hook into the Spring container, but the point at which I can inject code is before the webserver is available, and I just get a socket error. I'm at a loss how to proceed.
Could the browser send a dummy search as part of ITS startup? It seems that this would accomplish the same thing, and I can't figure out at all where I'd put a call in.
Incidently I found a place where the request time is particularly bad -- after running dbconfigure, there's something about a first-time extension run that is very slow in MarkLogic. It could be that the root cause of the first hit is because of extension evaluations (program cache?) rather than search caches as I previously thought. Not that it matters...
The browser has no way of knowing whether the middle tier is functional. All users of Samplestack should not be responsible for finalizing middle-tier startup (ever single time anyone lands on the app we would be hitting the server and then waiting up to 30 seconds, or however long it might take to warm the server, until concluding it wasn't available or concluding that we had warmed it up successfully). This is kind of the opposite of what to teach in 3-tiered development, I think.
That said, in a later version, making an HTTP call from Javascript is easy, and we could combine the gradle tasks with a Node.js script which knows how to make an HTTP request before claiming "readiness" to the developer. Or we could wrap the Java server in a Node process which finalizes the startup. I think that doing so is really obtuse, but if gradle can't reference an HTTP client library and "do something" once bootrun has completed, then it is a theoretical option.
Thanks for looking into it Charles...given it does not sound like there is a straightforward interim solution, I recommend we don't invest more time now, and put more effort towards the underlying root cause of the delay later. I think we have a reasonable workaround (manual refresh). Can we document when it happens (I haven't seen it but sounds like the rest of you have..)?
"On first request to the Spring server (from a browser, for example), it can hang for over 30 seconds on slower machines before responding. When the server doeesn't respond within 30 seconds, the request to it times out. This results in an error message being displayed in the browser. The user may refresh the browser page one or more times to work around this."
(user-friendly "page unavailable" implementation tbd under #443 will make the mentioned error message friendlier)
I think I found another event that puts me where I need to be for this call
I'm trying to set this aside but it's really bothering me that I can't find the right application hook.
Folks I think I have a solution. The slowdown may be due to a bug/unfortunate default in jetty. Jetty evidently scans for annotations on first request, and in a large project this can take some time. I didn't confirm that this is the case, and I didn't attempt to configure jetty, BUT
I swapped out tomcat for jetty and the startup time is much better. Or so it appears to me. I'd like to have someone else try it out; I'll put the commit up in a moment and we can see.
The jetty->tomcat commit is now on rc1 branch.
Thanks @grechaw @laurelnaiad
Able to deploy the app (all 3 platforms ) with latest code base that uses the tomcat . Will mark this as shipped after some sanity testing on the app .
@gghai said he still sees a timeout on commit 1bf118ae5ea7f04544d3d5f3b01c77d6642c5ccd.
If that is the case, and if the entire gradlew cache was clean before he tested, then Tomcat would seem not to have solved the problem... but I have my doubts about whether the gradle cache was clean. @grechaw will you please help him ensure that Tomcat is being used?
Below are the steps I did :
I did dbteardown clean ( update the latest code to commit 1bf118a.) ran appserver npm install bower install gulp run
I do see tomcat being used . Below are the logs :
essorChecker - Bean 'metaDataSourceAdvisor' of type [class org.springframework.s ecurity.access.intercept.aopalliance.MethodSecurityMetadataSourceAdvisor] is not eligible for getting processed by all BeanPostProcessors (for example: not elig ible for auto-proxying) 06:31:40.194 [main] INFO o.s.b.c.e.t.TomcatEmbeddedServletContainer - Tomcat in itialized with port(s): 8090 (http) 06:31:42.011 [main] INFO o.a.catalina.core.StandardService - Starting service T omcat 06:31:42.021 [main] INFO o.a.catalina.core.StandardEngine - Starting Servlet En gine: Apache Tomcat/8.0.15 06:31:42.984 [localhost-startStop-1] INFO o.a.c.c.C.[Tomcat].[localhost].[/] - Initializing Spring embedded WebApplicationContext 06:31:42.985 [localhost-startStop-1] INFO o.s.web.context.ContextLoader - Root WebApplicationContext: initialization completed in 12412 ms 06:31:49.567 [localhost-startStop-1] INFO c.m.client.impl.DocumentManagerImpl - Reading metadata and content for multiple uris beginning with com.marklogic.sam plestack.domain.Contributor/9611450a-0663-45a5-8a08-f1c71320475e.json 06:31:49.645 [Finalizer] INFO c.m.client.impl.DatabaseClientImpl - Releasing co nnection 06:31:51.164 [localhost-startStop-1] INFO c.m.client.impl.DocumentManagerImpl - Reading metadata and content for multiple uris beginning with com.marklogic.sam plestack.domain.Contributor/cf99542d-f024-4478-a6dc-7e723a51b040.json 06:31:51.227 [localhost-startStop-1] INFO c.m.s.d.MarkLogicContributorService - joe already in the database 06:31:51.227 [localhost-startStop-1] INFO c.m.s.d.MarkLogicContributorService - mary already in the database 06:31:51.693 [localhost-startStop-1] INFO o.s.s.l.DefaultSpringSecurityContextS ource - URL 'ldap://127.0.0.1:33389/dc=samplestack,dc=org', root DN is 'dc=samp lestack,dc=org' 06:31:51.722 [localhost-startStop-1] INFO o.s.l.c.s.AbstractContextSource - Pro perty 'userDn' not set - anonymous context will be used for read-write operation s 06:31:53.998 [localhost-startStop-1] INFO o.s.s.ldap.server.ApacheDSContainer - Setting working directory for LDAP_PROVIDER: C:\Users\builder\AppData\Local\Tem p\apacheds-spring-security-11711569264546 06:31:55.046 [localhost-startStop-1] INFO o.s.s.ldap.server.ApacheDSContainer - Starting directory server... 06:31:57.407 [localhost-startStop-1] ERROR o.a.d.s.s.r.DefaultAttributeTypeRegis try - attributeType w/ OID 2.5.4.16 not registered! 06:32:07.125 [localhost-startStop-1] INFO o.s.s.ldap.server.ApacheDSContainer - Loading LDIF file: C:\Users\builder\marklogic-samplestack\appserver\java-spring \src\main\resources\samplestack-ds.ldif 06:32:07.923 [localhost-startStop-1] INFO o.s.s.web.DefaultSecurityFilterChain
Using default security password: c7f672f9-90ac-46bb-b5f1-b5212adf0126
06:32:33.629 [main] INFO o.s.j.e.a.AnnotationMBeanExporter - Registering beans for JMX exposure on startup 06:32:33.712 [main] INFO o.a.coyote.http11.Http11NioProtocol - Initializing Pro tocolHandler ["http-nio-8090"] 06:32:33.742 [main] INFO o.a.coyote.http11.Http11NioProtocol - Starting Protoco lHandler ["http-nio-8090"] 06:32:33.762 [main] INFO o.a.tomcat.util.net.NioSelectorPool - Using a shared s elector for servlet write/read 06:32:33.877 [main] INFO o.s.b.c.e.t.TomcatEmbeddedServletContainer - Tomcat st arted on port(s): 8090 (http) 06:32:33.881 [main] INFO c.marklogic.samplestack.Application - Started Applicat ion in 68.883 seconds (JVM running for 87.676)
Yes, tomcat is in use here.
@grechaw , if we are to conclude that tomcat didn't solve it (which seems reasonable), should we revert to jetty? Just wondering...
Well, there's really nothing that would have me choose one over the other, except it seems there is an expectation of the use of tomcat in the field. tomcat's apache, jetty is eclipse. I'm turning on some debugging just to see where in the world this time is going.
Ok, if you want to back out Tomcat for Jetty , just let me know, otherwise I'll do "nothing". :smile:
Well, it's not looking like a solution is going to appear for 1.0.0. I suggest we push back to 8.0-2 milestone, although I will keep investigating.
You can pass this back to me in case you want me to re-try one more time on windows m/c Thanks
@grechaw did http://stackoverflow.com/questions/12857823/web-application-deployed-on-tomcat-is-slow-on-first-access
So we found that the timeout 'event' is on a GET from the express node server (port 3000) calling the middle tier REST server. @laurelnaiad set an explicit timeout on this GET so that the browser will not see it as a failed request.
I tested the change locally and the browser does indeed wait more patiently now for the slow first request, regardless of why its slow.
Giving to gaurav for test -- I think we got to a 'root cause' of sorts, which was to find the appropriate timeout and deal with it.
I think we didn't so much get to the root cause but an exacerbating bug.... 20 seconds to make a request is still pretty heavy...we'll see how Gaurav's slow machine makes out. :)
Worked like a magic . I saw the spinner for few seconds eventually the app loaded fine this time . Win7/IE9
C:\Users\builder\marklogic-samplestack>git rev-parse HEAD f644324c3dd5e1f34d901c90dab6f79ae41a04cd
How many seconds?
so the story before -- browser requests proxy requests proxy times out browser waits to timeout before hearing about proxy timeout.
Yes, the initial request is still slow, but at least all the components know about it now
Less then 30 i am guessing . Didnt count :-)
Will you please rerun and count the seconds? It's helpful in this case to understand what kind of time that machine is taking to do the warm up
May be able to deduce something about how much tomcat helped.
trying to catch up - is there consensus this is btw java & ML and browser is just a downstream manifestation, or could it still be something between java & browser?
@popzip I don't understand your question.
have we isolated where the root cause of the warm up issue is?
Is it experienced only in the browser or if we make curl calls to middle tier would we also see it?
There as a third ingrededient which was also timing out in addition to browser timing out, which was the proxy server. Charles discovered that and I upped its timeout.
I just want to know whether we're at 10, 15, 20, 25 seconds or what on the test machine now.
have we isolated where the root cause of the warm up issue is?
I think we don't know how much Charles' warm-up query helped, how much Tomcat helped, how much the proxy timeout change helped. Some combination of all three is in effect and it's working, but knowing which was which is why I'm still pursuing that number from Gaurav.... seconds to load...
if we make curl calls to middle tier would we also see it?
Whatever delay there is in warming up, you could see it in curl, too.
I would say be may be around 10-20 sec since you are pursing that number from me . I dont have exact count.
In e2e tests, I have a "dummy" request that gives the Java middle-tier a poke in order to warm it up. This helps avoid timeouts.
In the actual application, we have no such option. Let's discuss what we could do around the tail end of the bootrun task to get the JVM/Spring into memory and ready to respond to requests, so that we can bring down the browser tineout. I've just set the browser timeout to 30 seconds, but for a local app (or really any app that's not having actual internet connectivity problems) that shouldn't be necessary.