ops4j / org.ops4j.pax.web

OSGi R7 Http Service, Whiteboard and Web Applications (OSGi CMPN Release chapters 102, 140 and 128) implementation using Jetty 9, Tomcat 9 or Undertow 2.
https://ops4j1.jira.com/wiki/display/paxweb/Pax+Web

Wrong TCCL in HashSessionManager timer threads [PAXWEB-1010] #1288

Closed ops4j-issues closed 8 years ago

ops4j-issues commented 8 years ago

Grzegorz Grzybek created PAXWEB-1010

I'm investigating memory dumps of Karaf/Fuse applications after several redeployments of bundles/features.

I focused on finding bundle revisions that have more than one active bundle wiring (Felix implementation), and I looked for the GC roots of these wirings (which are connected with classloaders). I also checked why there was more than one active instance of org.eclipse.jetty.server.ServerConnector, and I saw this:

this     - value: org.eclipse.jetty.server.ServerConnector #2
 <- httpConnector     - class: org.ops4j.pax.web.service.jetty.internal.ServerControllerImpl, value: org.eclipse.jetty.server.ServerConnector #2
  <- serverController     - class: org.ops4j.pax.web.service.internal.Activator, value: org.ops4j.pax.web.service.jetty.internal.ServerControllerImpl #1
   <- m_activator     - class: org.apache.felix.framework.BundleImpl, value: org.ops4j.pax.web.service.internal.Activator #1
    <- [8]     - class: java.lang.Object[], value: org.apache.felix.framework.BundleImpl #96
     <- elementData     - class: java.util.ArrayList, value: java.lang.Object[] #13998
      <- bundles     - class: org.ops4j.pax.web.service.spi.util.ResourceDelegatingBundleClassLoader, value: java.util.ArrayList #13469
       <- contextClassLoader (thread object)     - class: java.lang.Thread, value: org.ops4j.pax.web.service.spi.util.ResourceDelegatingBundleClassLoader #3

The problem is that when the org.ops4j.pax.web PID was updated, pax-web-jetty services (such as ServerController or HttpService) were republished, but running web bundles were not fully unpublished (a separate bug report is coming soon; it is related to wrong reference counting in pax-web-extender-war). As a result, the running HashSessionManager-related threads had their TCCL (thread context classloader) set to the org.ops4j.pax.web.service.spi.util.ResourceDelegatingBundleClassLoader of the not-fully-unpublished WARs, and the chain of references goes from the bundle, through the activator and the server controller, to the ServerConnector.
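The retention mechanism can be reproduced outside OSGi. The following is a minimal sketch, not Pax Web or Jetty code: a plain `java.util.concurrent` scheduler stands in for the session manager's timer, an empty `URLClassLoader` stands in for ResourceDelegatingBundleClassLoader, and the names `TcclInheritanceDemo`, `startTimerUnder` and `tcclSeenBy` are hypothetical. It shows that a worker thread created while a given TCCL is active keeps that classloader after the creating thread has restored its own.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class TcclInheritanceDemo {

    /**
     * Hypothetical stand-in for a component whose start method creates a timer thread.
     * The worker thread is created on the calling thread, so it inherits the TCCL
     * that is active at that moment.
     */
    static ScheduledExecutorService startTimerUnder(ClassLoader callerTccl) {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(callerTccl);
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        try {
            // Submitting a task forces the single worker thread to be created now.
            Runnable noop = () -> { };
            timer.submit(noop).get(5, TimeUnit.SECONDS);
            return timer;
        } catch (Exception e) {
            timer.shutdownNow();
            throw new IllegalStateException(e);
        } finally {
            // The caller restores its own TCCL; the worker thread keeps the inherited one.
            current.setContextClassLoader(previous);
        }
    }

    /** Asks the timer's worker thread which TCCL it currently holds. */
    static ClassLoader tcclSeenBy(ScheduledExecutorService timer) {
        Callable<ClassLoader> probe = () -> Thread.currentThread().getContextClassLoader();
        try {
            return timer.submit(probe).get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Empty loader used only as an identity marker (stand-in for a web bundle's loader).
        ClassLoader webappLoader = new URLClassLoader(new URL[0]);
        ScheduledExecutorService timer = startTimerUnder(webappLoader);
        try {
            // The long-lived worker still references webappLoader.
            System.out.println(tcclSeenBy(timer) == webappLoader);
        } finally {
            timer.shutdownNow();
        }
    }
}
```

As long as such a worker thread is alive, it acts as a GC root for the classloader and for everything reachable from it, which matches the shape of the reference chain in the heap dump above.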

What's more, this (together with the pax-web-extender-war bug) randomly leads to java.lang.LinkageError: org/ops4j/pax/web/service/spi/model/ServerModel, because we had more than one connector, each related to a different wiring. Unpublished webapps were still available in org.ops4j.pax.web.service.jetty.internal.JettyServerHandlerCollection#_handlers.

I have a fix for this issue, which is part of resolving classloader-related problems in pax-web.


Affects: 4.2.8 Fixed in: 4.3.0, 6.0.0 Votes: 0, Watches: 1

ops4j-issues commented 8 years ago

Grzegorz Grzybek commented

Here's how it happens...

The PID for the cxf-rt-transport-http bundle is changed, and CXF registers a servlet:

"CM Configuration Updater (ManagedService Update: pid=[org.apache.cxf.osgi])@3043" daemon prio=5 tid=0x16 nid=NA runnable
  java.lang.Thread.State: RUNNABLE
      at org.eclipse.jetty.server.session.HashSessionManager.doStart(HashSessionManager.java:135)
      at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
      - locked <0x1a0f> (a java.lang.Object)
      at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:132)
      at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:114)
      at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:61)
      at org.eclipse.jetty.server.handler.ScopedHandler.doStart(ScopedHandler.java:120)
      at org.eclipse.jetty.server.session.SessionHandler.doStart(SessionHandler.java:116)
      at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
      - locked <0x1a10> (a java.lang.Object)
      at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:132)
      at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:114)
      at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:61)
      at org.eclipse.jetty.server.handler.ScopedHandler.doStart(ScopedHandler.java:120)
      at org.eclipse.jetty.server.handler.ContextHandler.startContext(ContextHandler.java:784)
      at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:294)
      at org.ops4j.pax.web.service.jetty.internal.HttpServiceContext.startContext(HttpServiceContext.java:603)
      at org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:741)
      at org.ops4j.pax.web.service.jetty.internal.HttpServiceContext.doStart(HttpServiceContext.java:261)
      at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
      - locked <0x1a11> (a java.lang.Object)
      at org.ops4j.pax.web.service.jetty.internal.JettyServerImpl$1.start(JettyServerImpl.java:279)
      at org.ops4j.pax.web.service.internal.HttpServiceStarted.registerServlet(HttpServiceStarted.java:221)
      at org.ops4j.pax.web.service.internal.HttpServiceStarted.registerServlet(HttpServiceStarted.java:195)
      at org.ops4j.pax.web.service.internal.HttpServiceStarted.registerServlet(HttpServiceStarted.java:179)
      - locked <0x1a12> (a java.lang.Object)
      at org.ops4j.pax.web.service.internal.HttpServiceProxy.registerServlet(HttpServiceProxy.java:64)
      at org.apache.cxf.transport.http.osgi.ServletExporter.updated(ServletExporter.java:108)
      at org.apache.felix.cm.impl.helper.ManagedServiceTracker.updated(ManagedServiceTracker.java:189)
      at org.apache.felix.cm.impl.helper.ManagedServiceTracker.updateService(ManagedServiceTracker.java:152)
      at org.apache.felix.cm.impl.helper.ManagedServiceTracker.provideConfiguration(ManagedServiceTracker.java:85)
      at org.apache.felix.cm.impl.ConfigurationManager$ManagedServiceUpdate.provide(ConfigurationManager.java:1440)
      at org.apache.felix.cm.impl.ConfigurationManager$ManagedServiceUpdate.run(ConfigurationManager.java:1396)
      at org.apache.felix.cm.impl.UpdateThread.run(UpdateThread.java:103)
      at java.lang.Thread.run(Thread.java:745)

This sets the TCCL of the scheduler's thread to the current TCCL of the calling thread, which is org.ops4j.pax.web.service.spi.util.ResourceDelegatingBundleClassLoader.
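One common way to avoid leaking a caller's TCCL into a long-lived thread is to swap the TCCL around the call that creates the thread. The sketch below illustrates that general technique only; it is not taken from the fix commits and may differ from what they do. The names `TcclSwap`, `callWithTccl`, `startTimer` and `tcclSeenBy` are hypothetical, and a plain `java.util.concurrent` scheduler again stands in for the session manager's timer.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

class TcclSwap {

    /** Runs the action with the given TCCL and always restores the previous one. */
    static <T> T callWithTccl(ClassLoader loader, Supplier<T> action) {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(loader);
        try {
            return action.get();
        } finally {
            current.setContextClassLoader(previous);
        }
    }

    /** Hypothetical stand-in for a start method that creates a timer thread. */
    static ScheduledExecutorService startTimer() {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        try {
            // Force creation of the worker thread while the swapped TCCL is active.
            Runnable noop = () -> { };
            timer.submit(noop).get(5, TimeUnit.SECONDS);
            return timer;
        } catch (Exception e) {
            timer.shutdownNow();
            throw new IllegalStateException(e);
        }
    }

    /** Asks the timer's worker thread which TCCL it currently holds. */
    static ClassLoader tcclSeenBy(ScheduledExecutorService timer) {
        Callable<ClassLoader> probe = () -> Thread.currentThread().getContextClassLoader();
        try {
            return timer.submit(probe).get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Thread current = Thread.currentThread();
        ClassLoader original = current.getContextClassLoader();
        // Stand-in for the web bundle's classloader that must not be pinned.
        ClassLoader webappLoader = new URLClassLoader(new URL[0]);
        // Assumption: the loader of the code that owns the timer is a safe choice.
        ClassLoader neutral = TcclSwap.class.getClassLoader();

        current.setContextClassLoader(webappLoader);
        ScheduledExecutorService timer = null;
        try {
            timer = callWithTccl(neutral, TcclSwap::startTimer);
            System.out.println(tcclSeenBy(timer) == neutral);                  // worker holds the neutral loader
            System.out.println(current.getContextClassLoader() == webappLoader); // caller's TCCL is untouched
        } finally {
            if (timer != null) {
                timer.shutdownNow();
            }
            current.setContextClassLoader(original);
        }
    }
}
```

Which classloader counts as "neutral" depends on the container. The choice here is only an assumption for the sketch.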

ops4j-issues commented 8 years ago

Grzegorz Grzybek commented

Fixed in web-4.2.x: https://github.com/ops4j/org.ops4j.pax.web/commit/e23acea613434378cdd450636308760d802b5968
Fixed in master: https://github.com/ops4j/org.ops4j.pax.web/commit/b6ec897a0d94afcf97b8575fbfd4b8ebe7f63fcf