Problems with threads - Githubissues

pydanny / cached-property

A decorator for caching properties in classes.

BSD 3-Clause "New" or "Revised" License

688 stars 78 forks source link

Problems with threads #24

Open vpavlin opened 9 years ago

vpavlin commented 9 years ago

Hi, I wanted to extend your project so I created a Dockerfile to run tests in container. I did some modifications and when I ran tests they failed from time to time. So I reverted my changes and ran it again against current HEAD.

See results here http://pastebin.centos.org/34206/

Simply put, tests for threading randomly fail - in this case in 6th run. I am not very familiar with multi threaded applications, but will be happy to assist with debugging.

pydanny commented 9 years ago

Yikes! I have some questions:

Are you saying cached_property fails on 1 out of every 6 runs on Fedora?
How many attempts have you made?
My docker-fu is weak. How exactly is cached_property being installed. Is it these lines: https://github.com/vpavlin/cached-property/blob/disk-stored-caching/Dockerfile#L6-L8

Tomorrow I'll try and replicate this on Fedora and Ubuntu.

vpavlin commented 9 years ago

No, in this case it was 1 of 6, in other run it was one of 10..it seems to be completely random. I'll let it run for longer time today and see the ratio.
Not many 5, maybe
Yes, ADD line copies repo content to the image, WORKDIR changes working directory so that I don't have to use absolute path for requirements.txt

pydanny commented 9 years ago

@ncoghlan, do you have any idea why this is happening? I can't reproduce it on Mac OS X or on the CI server.

ncoghlan commented 9 years ago

I was able to reproduce it without Docker on Fedora 22 - same symptoms as @vpavlin, with only 4 entries in the cache rather than the expected 5.

Dependencies were installed in the virtualenv via "pip -r requirements.txt".

Looking at the test case, I'm not seeing anything wrong except that I'm not sure the block comment is right about there being a race condition: run_threads() joins all the created threads before returning, so "cached_total" should always have been updated correctly before the assertions are checked.

ncoghlan commented 9 years ago

Running py.test --pdb rather than plain py.test so it drops into the debugger when the test fails, I can see that check.cached_total is 10, while check.add_cached is only 9.

The particular case that failed there was the one in test_threads_ttl (using freeze_time). The first failure I saw was in test_threads.

Fedora does patch the threading module, but we shouldn't be hitting that code since we aren't passing a timeout to Thread.wait: http://pkgs.fedoraproject.org/cgit/python.git/tree/00181-allow-arbitrary-timeout-in-condition-wait.patch

ncoghlan commented 9 years ago

Oh, now I understand that block comment in test_threads - I didn't realise what cached_property actually did :)

In that case, I suppose it's possible that for some reason one of the threads is dragging its feet and not starting running until after one of the other threads has already shadowed the property with a normal value.

ncoghlan commented 9 years ago

I'm heading out, but I've kicked off a test run using this modified implementation of run_threads in the tests:

    def run_threads(self, num_threads):
        launch = Event()
        def _replace_in_thread(started):
            return lambda: (started.set(), launch.wait(), self.add_cached)
        threads = []
        start_events = []
        for _ in range(num_threads):
            started = Event()
            thread = Thread(target=_replace_in_thread(started))
            thread.start()
            threads.append(thread)
            start_events.append(started)
        for started in start_events:
            started.wait()
        launch.set()
        for thread in threads:
            thread.join()

That means the sleep in the cached property test is only covering the threads reacting to the event being signalled, rather than actually starting them. (There's still be a problem to report to the Fedora python and/or kernel teams though, so it would be weird for 5 threads to be taking an entire second to all start)

ncoghlan commented 9 years ago

Even with that modified implementation, that still failed within 28 runs of the test suite. I'd suggest pinging the Fedora python-devel list at this point.

vpavlin commented 9 years ago

@ncoghlan Do you want me to do it, or will you? (Anyway thanks for looking at this)

ncoghlan commented 9 years ago

@vpavlin I was suggesting you do that. I'm now assuming there's something odd in the Fedora Python patches, and I'm the wrong person to ask about that.

vpavlin commented 9 years ago

@rkuska asked me to file a bug against python in Fedora: https://bugzilla.redhat.com/show_bug.cgi?id=1261445

audreyfeldroy commented 8 years ago

We just moved cached-property's Travis builds to use the container-based infrastructure. There, we ran into what appears to be the same issue: https://travis-ci.org/pydanny/cached-property/jobs/93074100

cedk commented 6 years ago

I got a similar bug report for the Gentoo Ebuild: https://bugs.gentoo.org/638250

pydanny commented 5 years ago

Is this still an issue? All current tests are passing: https://travis-ci.org/pydanny/cached-property

DarthGandalf commented 4 years ago

With 1.5.1:

==================== FAILURES ======================
____ TestCachedPropertyWithTTL.test_threads_ttl_expiry _________

self = <tests.test_cached_property.TestCachedPropertyWithTTL testMethod=test_threads_ttl_expiry>

    def test_threads_ttl_expiry(self):
        Check = CheckFactory(self.cached_property_factory(ttl=100000), threadsafe=True)
        check = Check()
        num_threads = 5

        # Same as in test_threads
        check.run_threads(num_threads)
        self.assert_cached(check, num_threads)
        self.assert_cached(check, num_threads)

        # The cache expires in the future
        with freeze_time("9999-01-01"):
            check.run_threads(num_threads)
>           self.assert_cached(check, 2 * num_threads)

tests/test_cached_property.py:207: 
_ _ _ _ _ _ _ _ 
tests/test_cached_property.py:69: in assert_cached
    self.assertEqual(check.add_cached, expected)
E   AssertionError: 6 != 10
======= 1 failed, 29 passed in 8.89 seconds ==========

All current tests are passing

Perhaps Travis CPU isn't loaded enough for the race to trigger?