ops4j / org.ops4j.pax.web

OSGi R7 Http Service, Whiteboard and Web Applications (OSGi CMPN Release chapters 102, 140 and 128) implementation using Jetty 9, Tomcat 9 or Undertow 2.
https://ops4j1.jira.com/wiki/display/paxweb/Pax+Web
Other
146 stars 184 forks source link

Incorrect reference counting for ServletContextInfo [PAXWEB-1013] #1291

Closed ops4j-issues closed 8 years ago

ops4j-issues commented 8 years ago

Grzegorz Grzybek created PAXWEB-1013

Each call to org.ops4j.pax.web.service.jetty.internal.JettyServerImpl#addServlet() is separate call to org.ops4j.pax.web.service.jetty.internal.JettyServerWrapper#getOrCreateContext() which in Jetty, uses ref counting.
org.ops4j.pax.web.service.jetty.internal.JettyServerImpl#removeServlet() only called removeContext() when number of servlets dropped to 0.

But if you have WAR with filters, listeners, welcome pages, you have multiple increaseRefCount calls and only few (one?) decreaseRefCount.

org.ops4j.pax.web.service.jetty.internal.ServerControllerImpl.Started#removeContext() is called from org.ops4j.pax.web.service.internal.HttpServiceStarted#stop() and should remove the context without respecting refcount.

org.ops4j.pax.web.service.internal.HttpServiceStarted#begin() calls org.ops4j.pax.web.service.spi.ServerController#getContext() and there's no matching removeContext().

That's why I think forced remove is needed.


Affects: 4.2.8 Fixed in: 4.3.0, 3.3.0, 6.0.0 Votes: 0, Watches: 2

ops4j-issues commented 8 years ago

Grzegorz Grzybek commented

Achim Nierbeck here's promised background.

I see two tests failing here - all because of my attempt to fix the refcounting for org.ops4j.pax.web.service.jetty.internal.JettyServerWrapper#contexts.
org.ops4j.pax.web.service.jetty.internal.JettyServerWrapper#removeContext() was actually removing the context (== stopping org.ops4j.pax.web.service.jetty.internal.JettyServerWrapper.ServletContextInfo#handler) only after refcount dropped to zero.

It was fine with whiteboard when registering/unregistering individual servlets, but never happened in webapps. Even registerFilter/unregisterFilter didn't compensate refcount. Registration of welcome files, error pages, filters, event listeners and other elements all increased the refcount, but unregistration of these elements - don't.

The effect of not stopping org.ops4j.pax.web.service.jetty.internal.HttpServiceContext was that JettyServerHandlerCollection._handlers array had stale webapp handlers. It's enough to start and stop a WAR bundle.

In extreme (or even usual) cases it leads even to java.lang.LinkageError - see https://issues.jboss.org/browse/ENTESB-5935.

https://github.com/ops4j/org.ops4j.pax.web/commit/bc21154e29395a5ec91ade0b34a6d3924e87bcd4 adds one new method to internal (I think so) interface org.ops4j.pax.web.service.jetty.internal.JettyServer, so when org.ops4j.pax.web.service.internal.HttpServiceStarted#stop() cleans server model and removes the context, ref count != 0 doesn't prevent the resources from being freed.

ref count is specific to Jetty part of pax-web (right?) and I as that even simple registration of servlet ref count was already equal to 3.

Also, if you look at org.ops4j.pax.web.extender.war.internal.RegisterWebAppVisitorWC#visit(), org.ops4j.pax.web.service.internal.HttpServiceStarted#begin() already calls serverController.getContext(contextModel), which increases the ref count.

And why Tomcat/Undertow parts don't use refcount?

That's all I can describe now - I'll think about proper refcount management in terms of shared context (which I didn't do before).

ops4j-issues commented 8 years ago

Achim Nierbeck commented

The source of all evil (regarding this counting with the jetty container) is somewhere in this Issue Tracker. My guess right now it's a issue number between 350 and 528. Unfortunately the sources have been re-factored quite often so I can't really tell the reason, but there had been a reason.

ops4j-issues commented 8 years ago

Achim Nierbeck commented

according to the jenkins we're down to one failing test:
http://ci.ops4j.org/jenkins/job/org.ops4j.pax.web-4.2.x_leak-fixes/2/testReport/

ops4j-issues commented 8 years ago

Grzegorz Grzybek commented

IMO "expected:<404> but was:<500>" is a hint for PAXWEB-981 == R6/whiteboard/error-pages as well. Test expected 404 because without my fix refcount > 0 prevents from unregistration of the handler from JettyServerHandlerCollection. With my "fix", we have 500, because actually there are no more handlers.
On Monday I'll try to write this test for Undertow and/or Tomcat as well to find some analogies and maybe it'll give me a hint about this issue too.

ops4j-issues commented 8 years ago

Grzegorz Grzybek commented

Achim Nierbeck, I've resolved the SharedContextFilterIntegrationTest failure and build is now OK: http://ci.ops4j.org/jenkins/job/org.ops4j.pax.web-4.2.x_leak-fixes/

I've rebased my ggrzybek-leak-fixes branch on top of 4.3.x. The fix for above test is here - if you think it's ok now, please merge to 4.3.x branch.

ops4j-issues commented 8 years ago

Achim Nierbeck commented

Looks good to me so far, let's go ahead and merge it.

ops4j-issues commented 8 years ago

Grzegorz Grzybek commented

Merging to master and building+testing locally now. Already merged+pushed to pax-web-4.3.x

ops4j-issues commented 8 years ago

Grzegorz Grzybek commented

Fixed and tested in pax-web-4.3.x:

Fixed and tested in master: