plone / plone.testing

Provides tools for writing unit and integration tests in a Zope and Plone environment. It is not tied to Plone, and it does not depend on Zope 2 (although it has some optional Zope 2-only features)
https://pypi.org/project/plone.testing/
3 stars 9 forks source link

WSGIServer layer isolation issues around thread locals (like SiteInfo) #58

Closed lukasgraf closed 4 years ago

lukasgraf commented 6 years ago

(Follow-up issue from plone/Products.CMFPlone#2592)

Steps to reproduce

(You may want to use ROBOT_BROWSER=chrome bin/test ... to run the robot tests with Chrome instead of FF)

=>

Tests fail with

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

because a plone.restapi got HTML instead of JSON in a response.


I've looked into this failure as part of https://github.com/plone/Products.CMFPlone/issues/2592#issuecomment-429538716.

The failure in plone.restapi happens because the subscriber registration of the mark_as_api_request for IPubStart seems to vanish when a robotframework test like the one from p.a.widgets runs before it.

This is the difference I'm seeing (all from inside a waitress request handler thread):


Passing test (just running the plone.restapi test alone):

(Pdb) sm = getSiteManager()
(Pdb) print list(sm.registeredHandlers())
[HandlerRegistration(<BaseGlobalComponents test-stack-3>, [IPubStart], u'', mark_as_api_request, File ".../plone/rest/configure.zcml", line 8.2-11.6)]

=> Site manager contains the event subscriber registration. Good.

Failing test (first the robot test from p.a.widgets, then the plone.restapi test):

(Pdb) sm = getSiteManager()
(Pdb) print list(sm.registeredHandlers())
[]

=> Registration is missing.


When I further inspect which exact stacked component registry instance we're working with, I see something strange:

Passing test:

(Pdb) sm
<BaseGlobalComponents test-stack-3>
(Pdb) from plone.testing import zca
(Pdb) zca._REGISTRIES
[<BaseGlobalComponents base>, <BaseGlobalComponents test-stack-1>, <BaseGlobalComponents test-stack-2>, <BaseGlobalComponents test-stack-3>]
(Pdb) zca._REGISTRIES[-1] is sm
True
(Pdb) id(zca._REGISTRIES[-1]), id(sm)
(4586995088, 4586995088)

=> The current site manager is the last one from plone.testing.zca's component registry stack. Good.

Failing test:

(Pdb) sm
<BaseGlobalComponents test-stack-3>
(Pdb) from plone.testing import zca
(Pdb) zca._REGISTRIES
[<BaseGlobalComponents base>, <BaseGlobalComponents test-stack-1>, <BaseGlobalComponents test-stack-2>, <BaseGlobalComponents test-stack-3>]
(Pdb) zca._REGISTRIES[-1] is sm
False
(Pdb) id(zca._REGISTRIES[-1]), id(sm)
(4652502480, 4583929552)

=> The current site manager is a different one than the last one from plone.testing.zca's stack, yet has the same name (test-stack-3) / stack depth number.


If we actually poke the last component registry from zca's stack (during the failing test), that's where our subscriber registration would be:

Failing test:

(Pdb) print list(zca._REGISTRIES[-1].registeredHandlers())
[HandlerRegistration(<BaseGlobalComponents test-stack-3>, [IPubStart], u'', mark_as_api_request, File ".../plone/rest/configure.zcml", line 8.2-11.6)]

This leads be to believe that there's an isolation issue around the component registry with the WSGIServer layer.

My hypothesis is this:

The reference to the current site manager is stored on the SiteInfo object in zope.component.hooks. SiteInfo is thread local. Since the WSGIServer layer starts its threads on layer setUp, and only tears them down at layer teardown, its threads start off with a SiteInfo instance after initial layer setup that is then detached from the main thread's SiteInfo (becomes thread-local) for the lifecycle of the layer.

That would then render plone.testing.zca's attempts to provide component registry isolation via its stacked configuration contexts ineffective, because its pushGlobalRegistry() / popGlobalRegistry() functions operate on the main thread's site manager, not on the waitress thread one's.

In an attempt to confirm this hypothesis, I changed SiteInfo to inherit from object instead of threading.local - and indeed, this did "fix" the test failure. I was also able to reproduce the test failure with other robot tests, not just p.a.widgets - the robot tests from p.a.multilingual or p.a.contenttypes cause exactly the same issue (if they happen to be run before the plone.restapi test).

/cc @davisagli (because you indicated you could maybe look into this eventually, and I'm out of my depth with threading issues around webtest/waitress/WSGI to be honest)

pbauer commented 6 years ago

@lukasgraf why are you setting it up with buildout-py3.cfg? In Python 3 this should not work at all because Products.ATContentTypes is not ported to Python 3. Instead you should use buildout-py2.cfg which pulls in all AT packages for the tests.

Another question: I do not understand why the issue only happens in TestBatchingArchetypes and not in TestBatchingSearch or any of the other DX testcases that use exactly the same event handler and test the same thing using a very similar testlayer (including plone.testing.zope.WSGIServer).

lukasgraf commented 6 years ago

@pbauer good point, I got that wrong. Should be buildout-py2.cfg as you say. I'll retest it and update the steps to reproduce accordingly.

As for the second question: I think it might be because of a different order the individual tests are run in. If my suspicion about the thread local SiteInfo being at the heart of the issue is correct, then it's basically a scenario where the first test's setup controls what instance of the stacked site manager is getting used for the rest of the layer.

jensens commented 4 years ago

I think this is n longer an issue, reopen if I am wrong.