Closed leycec closed 5 years ago
I have to admit, I really enjoyed reading this 😅
I briefly looked at the logs and it seems like it was only running the compiler migration and not the version bump. But that doesn't seem right.
The bot's been failing for 4 days in the `make_graph` stage. Here is the first failing build: https://circleci.com/gh/regro/circle_worker/4904
</forlorn_sigh>
Right-o. Obscure failures in graph theory. That sounds... non-trivial. Manual bumps it is, then! Thanks for the intrepid investigation, @scopatz and @CJ-Wright. May somebody head this stampede off at the pass before it tramples us all to death under a rising tide of boilerplate busywork.
I couldn't help myself. I must be an open-source sadist, because I ~~foolishly~~ diligently looked into this. Courtesy of the CircleCI logfile that @CJ-Wright kindly referenced, we find the following suspicious `multiprocessing` exception:
```
2019-01-20 15:10:49,892 INFO conda_forge_tick.make_graph || (37, 'davix')
Exception in thread Thread-4:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/opt/conda/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.6/concurrent/futures/process.py", line 272, in _queue_management_worker
    result_item = reader.recv()
  File "/opt/conda/lib/python3.6/multiprocessing/connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
TypeError: __init__() missing 1 required positional argument: 'lineno'
```
"But what's suspicious about that, @leycec?", you might now be asking yourself. I shall gladly pontificate.

What's suspicious about that is the failing line `return _ForkingPickler.loads(buf.getbuffer())`, which appears to contain no implicit calls to an `__init__()` method referenced by the exception message `TypeError: __init__() missing 1 required positional argument: 'lineno'` – or does it? In the *Pickling Class Instances* section of the official `pickle` documentation, we find this suspiciously vague hand-waving:
> In most cases, no additional code is needed to make instances picklable. By default, `pickle` will retrieve the class and the attributes of an instance via introspection. When a class instance is unpickled, its `__init__()` method is usually not invoked. The default behaviour first creates an uninitialized instance and then restores the saved attributes.
Clearly, the default behaviour does not apply to whatever non-standard object is being pickled here. We can safely infer that, against all sane expectations, an `__init__()` method is actually being called here. Unfortunately, the resulting exception is sufficiently vague that it's unclear exactly what type of non-standard object is to blame.
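The most common way an `__init__()` call sneaks into unpickling is an exception subclass whose constructor takes extra required arguments: `BaseException.__reduce__()` reconstructs the instance by calling the class with `self.args`, which bypasses the usual no-`__init__()` path. A minimal sketch of that failure mode (the class name here is hypothetical, not the actual conda object):

```python
import pickle

class RecipeError(Exception):
    """Hypothetical exception mimicking the failure mode in the traceback."""
    def __init__(self, message, lineno):  # extra REQUIRED positional argument
        super().__init__(message)         # only `message` lands in self.args
        self.lineno = lineno

# BaseException.__reduce__() pickles the instance as (RecipeError, self.args),
# so unpickling calls RecipeError("bad recipe") -- without `lineno`.
try:
    pickle.loads(pickle.dumps(RecipeError("bad recipe", 42)))
except TypeError as exc:
    print(exc)  # a TypeError complaining about the missing 'lineno' argument
```

Note that `pickle.dumps()` itself succeeds here; the crash only surfaces at `loads()` time, on the receiving end of the `multiprocessing` pipe – exactly where the traceback above points.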
I myself have suffered through the hair-ripping vagaries of both the `pickle` and `multiprocessing` APIs, whose well-known harmful interactions with one another border on outright broken. My sagely cave-dwelling advice is simple, proven, and effective: abandon the `pickle`-based `multiprocessing` API for a `dill`-based `multiprocessing` fork.

In 2019, `pickle` and any module requiring `pickle` (including `multiprocessing`) should be regarded as functionally illiterate, painfully obsolete, and effectively broken. `pickle` fails to pickle most real-world objects of interest with non-human-readable exceptions (see: above), which means that `multiprocessing` fails to multiprocess most real-world objects of interest. Relevant outstanding issues on the Python issue tracker include this, this, and this.
The Python stdlib should have switched out `pickle` for `dill`, which suffers none of the well-known deficits of `pickle` and is actually capable of pickling most real-world objects of interest, decades ago. Instead, `pickle` stagnated in its own dank pool of bodily fluids. Everyone (including us) just uses `dill` and `dill`-based substitutes instead. We're all happier and saner for it. Well-maintained substitutes include:
* `pathos`, whose `pathos.multiprocessing` subpackage is an API-compatible fork of the standard `multiprocessing` API that substitutes `dill` for `pickle`. In fact, given that the `pathos` tagline is "parallel graph management and execution in heterogeneous computing," the autoticker would probably benefit from more than merely `pathos.multiprocessing`. Also, did I mention that `pathos` is maintained by friggin' CalTech?
* `multiprocessing_on_dill`, an API-compatible fork of the standard `multiprocessing` API that (...wait for it) substitutes `dill` for `pickle`.

Technically, you might be able to brute-force your way through this specific issue, which would buy you a temporary reprieve and permit the autoticker to continue to leverage the standard `multiprocessing` API. Practically, you'll continue to hit obscure `pickle` edge cases and collectively wonder why someone didn't simply do the Right Thing™ and abandon the standard `multiprocessing` API for genuinely sane alternatives.
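To make the difference concrete, here's a minimal sketch (assuming `dill` is installed via `pip install dill`) of the class of object that stock `pickle` rejects outright but `dill` round-trips without ceremony:

```python
import pickle
import dill  # assumption: `pip install dill`

square = lambda x: x * x

# Stock pickle chokes on lambdas (and nested functions, closures, ...),
# because it serializes functions by qualified name, not by value.
try:
    pickle.dumps(square)
except (pickle.PicklingError, AttributeError, TypeError):
    print("pickle failed, as expected")

# dill serializes the function body itself, so the round-trip just works.
restored = dill.loads(dill.dumps(square))
print(restored(7))  # 49
```

The same round-trip is what a `dill`-backed pool does implicitly every time it ships a task to a worker process.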
Thus ends this broadcast of the emergency multiprocessing system.
Odds are we can also just substitute in dask-distributed, which uses cloudpickle under the covers.
Or loky. PRs on this front would be great!
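One nicety of the cloudpickle route worth noting: cloudpickle emits a standard pickle bytestream, so only the *sending* side needs the library – plain `pickle.loads()` on the worker can deserialize it. A minimal sketch, assuming `cloudpickle` is installed:

```python
import pickle
import cloudpickle  # assumption: `pip install cloudpickle`

# cloudpickle serializes interactively-defined functions and lambdas by
# value, embedding the reconstruction logic in the payload itself...
payload = cloudpickle.dumps(lambda x: x + 1)

# ...so the receiving end can unpickle with the plain stdlib pickle.
bump = pickle.loads(payload)
print(bump(41))  # 42
```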
I think the issue here is a conda exception. There is some serious jank associated with pickling specialized exceptions. Maybe? https://stackoverflow.com/questions/49715881/how-to-pickle-inherited-exceptions
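For what it's worth, the usual fix on the exception-definer's side (per that Stack Overflow thread) is to pass *every* constructor argument up to `BaseException.__init__()`, so that `self.args` contains enough to reconstruct the instance. A hedged sketch with a hypothetical class:

```python
import pickle

class PicklableRecipeError(Exception):
    """Hypothetical exception that survives a pickle round-trip."""
    def __init__(self, message, lineno):
        # Pass ALL positional arguments up, so BaseException.__reduce__()
        # records them in self.args for reconstruction on unpickle.
        super().__init__(message, lineno)
        self.lineno = lineno

err = pickle.loads(pickle.dumps(PicklableRecipeError("bad recipe", 42)))
print(err.lineno)  # 42
```

Of course, that only helps if you control the exception class – which, if it's coming from deep inside conda, the autoticker doesn't.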
I think this is mostly fixed.
cf-regro-autotick-bot appears to have arthritically fallen down and can no longer get it up – at least, with respect to the two principal feedstocks I maintain: BETSE and BETSEE.
BETSE 0.9.2 and BETSEE 0.9.2.0 were both released over twenty-four hours ago but have yet to receive any automated loving from my formerly favourite autoticker. In the halcyon days of yore, the autoticker would respond within an hour (typically, minutes) of a PyPI release. I can only assume that conda-forge automation loathes scientific simulators, presumably in a futile attempt to forestall the inevitable Machine Singularity. :robot:
Here Is Where I Sigh Forlornly
Of course, Life could be worse. Since manually bumping feedstocks is trivial (if needlessly time-consuming), this isn't necessarily the highest priority ticket. Since another feedstock also appears to now be ignored by cf-regro-autotick-bot, however, this may be a slightly more widespread issue. (In which case, this might be a high priority ticket after all.)
Unlike said feedstock, there are no nomenclature complications in the case of either BETSE or BETSEE. Their PyPI project names are the same as their conda-forge feedstock names. Likewise, there have been no significant changes to either their `meta.yaml` recipes or their `setup.py` installation scripts. Both comply with `conda-build: 3` syntax and semantics, including usage of the now-favoured `requirements/host:` list of mandatory dependencies. Moreover, both are tagged as `noarch` and thus quite minimal.

Unfortunately, I'm under contractual deadlines here. I'll probably need to manually bump both feedstocks before anything resembling a fleshy human gets a chance to examine these issues. I suspect that may complicate debugging on your end.
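For context, the shape of the recipes in question is roughly the following – a hypothetical, heavily trimmed `meta.yaml` sketch, not the actual BETSE recipe:

```yaml
package:
  name: betse
  version: "0.9.2"

build:
  noarch: python
  # --no-build-isolation injected to circumvent conda/conda-build#3318
  script: "{{ PYTHON }} -m pip install . --no-build-isolation -vv"

requirements:
  host:
    - python
    - pip
  run:
    - python
```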
Throw Us a Friggin' Bone Here
O.K., I have only one unfounded suspicion as to the underlying culprit. To circumvent conda/conda-build#3318, @jjhelmus kindly injected the `--no-build-isolation` option into the `build/script:` command for both feedstocks.

That's it, guys. That's all I've got. Admittedly, it's not much. Would someone on the bot sub-team mind examining the autoticker logs for any suspicious exceptions, warnings, or other incriminating evidence?
I dearly loved the cf-regro-autotick-bot for the copious free time it once granted me. Admittedly, I squandered it all playing fanservice-friendly JRPGs. Persona 5. Always Persona 5. I'd hate to have to start doing things for myself for once.