Closed dhermes closed 7 years ago
@tseaver Having a not-green build is making me itchy. Are you planning on checking this out?
I will start on this issue today. I have zero clue why the following would fail on Python 3.4 on Windows but not on Linux:
____________________ TestTransactionPingingPool.test_bind _____________________
self = <unit_tests.test_pool.TestTransactionPingingPool testMethod=test_bind>
def test_bind(self):
pool = self._makeOne()
database = _Database('name')
SESSIONS = [_Session(database) for _ in range(10)]
database._sessions.extend(SESSIONS)
> pool.bind(database)
unit_tests\test_pool.py:567:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
.tox\py34\lib\site-packages\google\cloud\spanner\pool.py:413: in bind
self.begin_pending_transactions()
.tox\py34\lib\site-packages\google\cloud\spanner\pool.py:440: in begin_pending_transactions
super(TransactionPingingPool, self).put(session)
.tox\py34\lib\site-packages\google\cloud\spanner\pool.py:340: in put
self._sessions.put_nowait((_NOW() + self._delta, session))
c:\python34\Lib\queue.py:187: in put_nowait
return self.put(item, block=False)
c:\python34\Lib\queue.py:146: in put
self._put(item)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <queue.PriorityQueue object at 0x043BEF30>
item = (datetime.datetime(2017, 2, 15, 0, 14, 41, 439103), <unit_tests.test_pool._Session object at 0x043AED50>)
def _put(self, item):
> heappush(self.queue, item)
E TypeError: unorderable types: _Session() < _Session()
c:\python34\Lib\queue.py:230: TypeError
(There is an identical failure in unit_tests.test_pool.TestTransactionPingingPool testMethod=test_put_full
).
OK, my theory is that the clock resolution on the Appveyor Windows host is weird, such that successive calls to datetime.datetime.utcnow()
are returning the same time, which makes the priority queue switch to the mock session to distinguish between entries.
I can work around the test failure by making _Session
total-orderable: likely I need to ensure that the "real" Session
class can be ordered, too.
A "sporadic" failure: https://ci.appveyor.com/project/GoogleCloudPlatform/google-cloud-python/build/1.0.1081.master/job/rl8ms6nhw1gibeb1
@tseaver It seems very possible that the culprit is a unit test that talks to the backend on accident
It seems very possible that the culprit is a unit test that talks to the backend on accident
I don't think so in this case: the database and spanner_api objects are completely mocked out. I think what is happening is that the Appveyor machine is slow / loaded enough such that the very low retry timeout (0.01 seconds) is cutting the retry loop off before the first retry. We can try raising that timeout (maybe 50 ms vs. 10?)
Why don't we also mock the retry?
The whole point of that test is to ensure that run_with_transaction
does retry on a gRPC ABORT.
No worries. Just want to make sure the test is deterministic.
run_with_transaction(callback)
sets up a transaction and passes it to callback
, and afterwards attempts to commit the transaction. While the commit
back-end raises ABORT
, and the user-supplied timeout has not expired, it retries.
In order to exercise the retry loop without adding too much delay, I set the timeout to 10 msec. I didn't think it would be possible for the following to take longer than that:
def unit_of_work(txn):
called_with.append((txn, args, kw))
txn.insert(TABLE_NAME, COLUMNS, VALUES)
(along with the call to txn.commit()
, which then delegates to the mocked-out spanner_api
, raising an Abort
).
I didn't count on the fact that a loaded CI host might make that assumption.
Why does the timeout play any role?
The run_with_transaction
method keeps retrying until the user-supplied timeout expires. In the failing test, the timeout expired unexpectedly during the first run of the callback, which led to the assertion failure: there was no retry.
Got it. Can we mock out the part that decides when to retry so it doesn't rely on elapsed time (which is what makes it nondeterministic)?
Have a look at the MUT (Session.run_in_transaction
) is the bit that makes the decision.
@tseaver Maybe that's an indication that a re-factor would be useful?
https://ci.appveyor.com/project/GoogleCloudPlatform/google-cloud-python/build/1.0.1072.master/job/qvj0q69feleskbpu