oracle / opengrok

OpenGrok is a fast and usable source code search and cross reference engine, written in Java
http://oracle.github.io/opengrok/
Other
4.34k stars 745 forks source link

WebJarAssetLocator() fails due to NoSuchFileException #3479

Closed ChristopheBordieu closed 3 years ago

ChristopheBordieu commented 3 years ago

Sometime, in a random way, when (re)starting Tomcat, webapp opengrok (/grok) does not start cleanly. It happens with:

To Reproduce It is random issue.

Screenshots ==> /srv/apache-tomcat/current/logs/localhost.2021-03-12.log <== 12-Mar-2021 15:08:01.680 SEVERE [http-nio-8080-exec-2] org.apache.catalina.core.StandardWrapperValve.invoke Servlet.service() for servlet [jsp] in context with path [/grok] threw exception [Filter execution threw an exception] with root cause java.nio.file.NoSuchFileException: /home/opengrok/.ansible/tmp/ansible-moduletmp-1615561608.352246-7tsjb95r at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92) at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116) at java.base/sun.nio.fs.UnixPath.toRealPath(UnixPath.java:899) at nonapi.io.github.classgraph.utils.FileUtils.(FileUtils.java:93) at io.github.classgraph.ScanResult.init(ScanResult.java:215) at io.github.classgraph.ClassGraph.(ClassGraph.java:91) at org.webjars.WebJarAssetLocator.(WebJarAssetLocator.java:150) at org.opengrok.web.Scripts.(Scripts.java:103) at org.opengrok.web.PageConfig.(PageConfig.java:161) at org.opengrok.web.PageConfig.get(PageConfig.java:1557) at org.opengrok.web.StatisticsFilter.doFilter(StatisticsFilter.java:69) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:185) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:158) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:202) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:97) at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:543) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:92) at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:690) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:78) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:353) at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:374) at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:65) at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:870) at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1696) at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) at java.base/java.lang.Thread.run(Thread.java:834) 12-Mar-2021 15:08:01.698 SEVERE [http-nio-8080-exec-2] org.apache.catalina.core.ApplicationDispatcher.invoke Servlet.service() for servlet [error] threw exception java.lang.NoClassDefFoundError: Could not initialize class org.opengrok.web.Scripts at org.opengrok.web.PageConfig.(PageConfig.java:161) at org.opengrok.web.PageConfig.get(PageConfig.java:1557) at org.apache.jsp.error_jsp._jspService(error_jsp.java:147) at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:71) at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:770) at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:467) at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:379) at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:327) at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:770) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:223) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:158) at org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:692) at org.apache.catalina.core.ApplicationDispatcher.processRequest(ApplicationDispatcher.java:441) at org.apache.catalina.core.ApplicationDispatcher.doForward(ApplicationDispatcher.java:366) at org.apache.catalina.core.ApplicationDispatcher.forward(ApplicationDispatcher.java:294) at org.apache.catalina.core.StandardHostValve.custom(StandardHostValve.java:382) at org.apache.catalina.core.StandardHostValve.status(StandardHostValve.java:241) at org.apache.catalina.core.StandardHostValve.throwable(StandardHostValve.java:336) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:161) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:92) at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:690) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:78) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:353) at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:374) at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:65) at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:870) at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1696) at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) at java.base/java.lang.Thread.run(Thread.java:834) 12-Mar-2021 15:08:01.699 SEVERE [http-nio-8080-exec-2] org.apache.catalina.core.StandardHostValve.custom Exception Processing ErrorPage[errorCode=500, location=/error] org.apache.jasper.JasperException: An exception occurred processing [httpheader.jspf] at line [34]

31: 32: cfg = PageConfig.get(request); 33: cfg.setTitle("your title"); 34: 35: 36: --%> 37: <%@page contentType="text/html; charset=UTF-8" pageEncoding="UTF-8" import="

Stacktrace: at org.apache.jasper.servlet.JspServletWrapper.handleJspException(JspServletWrapper.java:611) at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:490) at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:379) at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:327) at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:770) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:223) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:158) at org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:692) at org.apache.catalina.core.ApplicationDispatcher.processRequest(ApplicationDispatcher.java:441) at org.apache.catalina.core.ApplicationDispatcher.doForward(ApplicationDispatcher.java:366) at org.apache.catalina.core.ApplicationDispatcher.forward(ApplicationDispatcher.java:294) at org.apache.catalina.core.StandardHostValve.custom(StandardHostValve.java:382) at org.apache.catalina.core.StandardHostValve.status(StandardHostValve.java:241) at org.apache.catalina.core.StandardHostValve.throwable(StandardHostValve.java:336) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:161) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:92) at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:690) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:78) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:353) at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:374) at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:65) at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:870) at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1696) at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) at java.base/java.lang.Thread.run(Thread.java:834) Caused by: jakarta.servlet.ServletException: java.lang.NoClassDefFoundError: Could not initialize class org.opengrok.web.Scripts at org.apache.jasper.runtime.PageContextImpl.handlePageException(PageContextImpl.java:674) at org.apache.jsp.error_jsp._jspService(error_jsp.java:379) at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:71) at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:770) at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:467) ... 26 more Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.opengrok.web.Scripts at org.opengrok.web.PageConfig.(PageConfig.java:161) at org.opengrok.web.PageConfig.get(PageConfig.java:1557) at org.apache.jsp.error_jsp._jspService(error_jsp.java:147) ... 29 more

Any idea what is this problem?

vladak commented 3 years ago

The problem happened when PageConfig instantiated the Scripts class which in turn instantiated the WebJarAssetLocator class (3rd party). It seems that it is performing a scan of various directories and it hit a problem there. Looks like a TOUTOC problem to me.

What is /home/opengrok/.ansible/tmp/ansible-moduletmp-1615561608.352246-7tsjb95r w.r.t. the OpenGrok source/data root ? Perhaps the WebJarAssetLocator constructor should be called with distinct path set to avoid looking in strange locations.

vladak commented 3 years ago

What is the outcome ? Does the webapp actually fail to deploy ?

ChristopheBordieu commented 3 years ago

If I restart Tomcat by hand, it always works. Issue happens when restarting Tomcat remotely via Ansible indeed, so when some temporary files needed by Ansible are available into remote OpenGrok server /home/opengrok/.ansible directory.

vladak commented 3 years ago

Scripts should use WebJarAssetLocator constructor with a whitelist: https://github.com/webjars/webjars-locator-core/blob/b43b2e9f275797c336651de57b58f03c7174cba6/src/main/java/org/webjars/WebJarAssetLocator.java#L157-L159

vladak commented 3 years ago

If I restart Tomcat by hand, it always works. Issue happens when restarting Tomcat remotely via Ansible indeed, so when some temporary files needed by Ansible are available into remote OpenGrok server /home/opengrok/.ansible directory.

Ok, what is /home/opengrok then ?

ChristopheBordieu commented 3 years ago

The home directory of user opengrok that is used to run the OpenGrok service (webapp) and indexing (so owner of OpenGrok source and data directories).

vladak commented 3 years ago

After looking around webjars and ClassGraph and tracing system calls done when calling WebJarAssetLocator(), I am not sure this is actually a bug. When Scripts() calls the WebJarAssetLocator() constructor, it does this:

    public WebJarAssetLocator() {
        this.allWebJars = this.scanForWebJars(new ClassGraph());
    }

The ClassGraph() empty argument constructor is called no matter how the WebJarAssetLocator() constructor is called. The ClassGraph() constructor does this:

    public ClassGraph() {
        ScanResult.init();
    }

and ScanResult has a static block that calls into FileUtils (https://github.com/classgraph/classgraph/blob/ce2c9afcd8e279c912ecdbea87248856db11428a/src/main/java/io/github/classgraph/ScanResult.java#L213):

 FileUtils.closeDirectByteBuffer(ByteBuffer.allocateDirect(32), /* log = */ null);

and FileUtils in turn has a static block https://github.com/classgraph/classgraph/blob/ce2c9afcd8e279c912ecdbea87248856db11428a/src/main/java/nonapi/io/github/classgraph/utils/FileUtils.java#L93-L110:

static {
        String currDirPathStr = "";
        try {
            // The result is moved to currDirPathStr after each step, so we can provide fine-grained debug info and
            // a best guess at the path, if the current dir doesn't exist (#109), or something goes wrong while
            // trying to get the current dir path.
            Path currDirPath = Paths.get("").toAbsolutePath();
            currDirPathStr = currDirPath.toString();
            currDirPath = currDirPath.normalize();
            currDirPathStr = currDirPath.toString();
            currDirPath = currDirPath.toRealPath(LinkOption.NOFOLLOW_LINKS);
            currDirPathStr = currDirPath.toString();
            currDirPathStr = FastPathResolver.resolve(currDirPathStr);
        } catch (final IOException e) {
            throw new RuntimeException("Could not resolve current directory: " + currDirPathStr, e);
        }
        CURR_DIR_PATH = currDirPathStr;
    }

where it is trying to resolve current working directory.

According to the stack trace the above call to currDirPath.toRealPath() throws NoSuchFileException which is subclass of IOException so the above code rethrows it as the unchecked RuntimeException and hence the web app goes down.

So, the above suggestion to call the WebJarAssetLocator() constructor with a whitelist directory would not really help.

I think this is may caused by how the Ansible action is done. Looks like it is creating a temporary directory. It is a bit wild speculation however could it be that it executes the Tomcat restart there after wiping out the directory underneath itself ? Or maybe the Ansible action starts Tomcat from this directory in the background, then removes the directory and by the time the web app tries to resolve the working directory it finds out it is hanging in void. It would make sense to trace the time of the removal of the Ansible temporary directory w.r.t. the web app startup time. Better yet, find the current working directory of the web app once it starts (e.g. via pwdx or lsof).

vladak commented 3 years ago

In a trivial program that basically calls just new WebJarAssetLocator() I put a breakpoint at the beginning of the (decompiled) static block in ClassGraphs's FileUtils, stopped there, removed the current working directory, continued and it resulted in the NoSuchFileException exception thrown from the currDirPath.toRealPath(LinkOption.NOFOLLOW_LINKS) call. I don't think we can do anything about that in OpenGrok.

ChristopheBordieu commented 3 years ago

Many thanks @vladak for the debug! It helps a lot. I have to ensure Tomcat restart is done from outside Ansible remote tmp directory so that this issue no more occurs. I have checked very quickly now and it seems to work ok now.