
Multiple noarch feedstocks fail to autotick #441

Closed leycec closed 5 years ago

leycec commented 5 years ago

cf-regro-autotick-bot appears to have arthritically fallen down and can no longer get it up – at least, with respect to the two principal feedstocks I maintain: BETSE and BETSEE.

BETSE 0.9.2 and BETSEE 0.9.2.0 were both released over twenty-four hours ago but have yet to receive any automated loving from my formerly favourite autoticker. In the halcyon days of yore, the autoticker would respond within an hour (typically, minutes) of a PyPI release. I can only assume that conda-forge automation loathes scientific simulators, presumably in a futile attempt to forestall the inevitable Machine Singularity. :robot:

Here Is Where I Sigh Forlornly

Of course, Life could be worse. Since manually bumping feedstocks is trivial (if needlessly time-consuming), this isn't necessarily the highest-priority ticket. Since another feedstock also appears to now be ignored by cf-regro-autotick-bot, however, this may be a slightly more widespread issue. (In which case, this might be a high-priority ticket after all.)

Unlike said feedstock, there are no nomenclature complications in the case of either BETSE or BETSEE. Their PyPI project names are the same as their conda-forge feedstock names. Likewise, there have been no significant changes to either their meta.yaml recipes or their setup.py installation scripts. Both comply with conda-build: 3 syntax and semantics, including usage of the now-favoured requirements/host: list of mandatory dependencies. Moreover, both are tagged as noarch and thus quite minimal.

Unfortunately, I'm under contractual deadlines here. I'll probably need to manually bump both feedstocks before anything resembling a fleshy human gets a chance to examine these issues. I suspect that may complicate debugging on your end.

Throw Us a Friggin' Bone Here

O.K., I have only one unfounded suspicion as to the underlying culprit. To circumvent conda/conda-build#3318, @jjhelmus kindly injected the --no-build-isolation option into the build/script: command for both feedstocks.

That's it, guys. That's all I've got. Admittedly, it's not much. Would someone on the bot sub-team mind examining the autoticker logs for any suspicious exceptions, warnings, or other incriminating evidence?

I dearly loved the cf-regro-autotick-bot for the copious free time it once granted me. Admittedly, I squandered it all playing fanservice-friendly JRPGs. Persona 5. Always Persona 5. I'd hate to have to start doing things for myself for once.

scopatz commented 5 years ago

I have to admit, I really enjoyed reading this 😅

I briefly looked at the logs and it seems like it was only running the compiler migration and not the version bump. But that doesn't seem right.

CJ-Wright commented 5 years ago

The bot's been failing for 4 days in the make_graph stage. Here is the first failing build. https://circleci.com/gh/regro/circle_worker/4904

leycec commented 5 years ago

</forlorn_sigh>

Right-o. Obscure failures in graph theory. That sounds... non-trivial. Manual bumps it is, then! Thanks for the intrepid investigation, @scopatz and @CJ-Wright. May somebody head this stampede off at the pass before it tramples us all to death under a rising tide of boilerplate busywork.

leycec commented 5 years ago

I couldn't help myself. I must be an open-source sadist, because I ~~foolishly~~ diligently looked into this. Courtesy of the CircleCI logfile that @CJ-Wright kindly referenced, we find the following suspicious multiprocessing exception:

2019-01-20 15:10:49,892 INFO     conda_forge_tick.make_graph || (37, 'davix')
Exception in thread Thread-4:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/opt/conda/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.6/concurrent/futures/process.py", line 272, in _queue_management_worker
    result_item = reader.recv()
  File "/opt/conda/lib/python3.6/multiprocessing/connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
TypeError: __init__() missing 1 required positional argument: 'lineno'

"But what's suspicious about that, @leycec?", you might now be asking yourself. I shall gladly pontificate.

What's suspicious about that is the failing line "return _ForkingPickler.loads(buf.getbuffer())", which appears to contain no implicit calls to the __init__() method referenced by the exception message "TypeError: __init__() missing 1 required positional argument: 'lineno'"... or does it? In the Pickling Class Instances section of the official pickle documentation, we find this suspiciously vague hand-waving:

In most cases, no additional code is needed to make instances picklable. By default, pickle will retrieve the class and the attributes of an instance via introspection. When a class instance is unpickled, its __init__() method is usually not invoked. The default behaviour first creates an uninitialized instance and then restores the saved attributes.

Clearly, the default behaviour does not apply to whatever non-standard object is being pickled here. We can safely infer that, against all sane expectations, an __init__() method is actually being called here. The usual suspects are exception subclasses: BaseException.__reduce__() reconstructs an instance by re-invoking __init__() with self.args, so any subclass whose __init__() signature diverges from its args tuple detonates on unpickling. Unfortunately, the resulting exception is sufficiently vague that it's unclear exactly which non-standard object is to blame.
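
To make the failure mode concrete, here's a minimal sketch reproducing the very same TypeError, with a hypothetical LintError class standing in for whatever conda object is actually at fault:

import pickle

# Hypothetical stand-in for the real (still unidentified) conda exception.
class LintError(Exception):
    def __init__(self, message, lineno):
        super().__init__(message)   # only "message" lands in self.args
        self.lineno = lineno

payload = pickle.dumps(LintError('recipe parse failure', 42))   # pickling succeeds...

# ...but unpickling re-invokes LintError('recipe parse failure'), raising:
#     TypeError: __init__() missing 1 required positional argument: 'lineno'
pickle.loads(payload)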

I myself have suffered through the hair-ripping vagaries of both the pickle and multiprocessing APIs, whose well-known harmful interactions with one another border on outright broken. My sagely cave-dwelling advice is simple, proven, and effective: abandon the pickle-based multiprocessing API for a dill-based multiprocessing fork.
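
A minimal sketch of what that migration buys you, assuming one such fork – the third-party multiprocess package, maintained alongside dill itself – is installed:

# pip install multiprocess
import multiprocess as mp

def make_cuber():
    # Nested functions are unpicklable by the stdlib pickler, so stock
    # multiprocessing chokes on this; dill (and hence multiprocess) does not.
    def cube(x):
        return x ** 3
    return cube

if __name__ == '__main__':
    with mp.Pool(4) as pool:
        print(pool.map(make_cuber(), range(8)))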

In 2019, pickle and any module requiring pickle (including multiprocessing) should be regarded as functionally illiterate, painfully obsolete, and effectively broken. pickle fails to pickle most real-world objects of interest with non-human-readable exceptions (see: above), which means that multiprocessing fails to multiprocess most real-world objects of interest. Relevant outstanding issues on the Python issue tracker include this, this, and this.

The Python stdlib should have switched out pickle for dill, which suffers none of the well-known deficits of pickle and is actually capable of pickling most real-world objects of interest, decades ago. Instead, pickle stagnated in its own dank pool of bodily fluids. Everyone (including us) just uses dill and dill-based substitutes instead. We're all happier and saner for it. Well-maintained substitutes include:

- multiprocess, a drop-in dill-powered fork of the stdlib multiprocessing
- pathos, a higher-level parallelism framework built atop multiprocess
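
A REPL-sized sketch of the capability gap in question, assuming dill is installed:

import pickle
import dill   # pip install dill

square = lambda x: x ** 2

dill.dumps(square)    # works: dill serializes lambdas, closures, nested functions, ...
pickle.dumps(square)  # raises pickle.PicklingError: attribute lookup on __main__ failed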

Technically, you might be able to brute-force your way through this specific issue, which would buy you a temporary reprieve and permit the autoticker to continue to leverage the standard multiprocessing API. Practically, you'll continue to hit obscure pickle edge cases and collectively wonder why someone didn't simply do the Right Thing™ and abandon the standard multiprocessing API for genuinely sane alternatives.
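
For the record, said brute-forcing would look something like the following sketch – once more featuring a hypothetical LintError as the patched culprit. Passing every constructor argument up to the superclass stores them all in self.args, which is exactly what BaseException.__reduce__() hands back to __init__() at unpickling time:

import pickle

class LintError(Exception):
    def __init__(self, message, lineno):
        super().__init__(message, lineno)   # both arguments now land in self.args
        self.lineno = lineno

pickle.loads(pickle.dumps(LintError('recipe parse failure', 42)))   # round-trips cleanly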

Thus ends this broadcast of the emergency multiprocessing system.

mariusvniekerk commented 5 years ago

Odds are we can also just substitute in dask-distributed, which uses cloudpickle under the covers.
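
e.g., a minimal sketch, assuming the distributed package is installed:

# pip install distributed
from distributed import Client

if __name__ == '__main__':
    client = Client()   # spins up a local scheduler and worker processes
    # cloudpickle ships lambdas (and most everything else) to workers intact
    futures = client.map(lambda x: x ** 2, range(8))
    print(client.gather(futures))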

CJ-Wright commented 5 years ago

Or loky. PRs on this front would be great!

I think the issue here is a conda exception. There is some serious jank associated with pickles of special errors. Maybe? https://stackoverflow.com/questions/49715881/how-to-pickle-inherited-exceptions
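
The usual workaround from that thread is overriding __reduce__() on the offending exception class so pickle gets back the full tuple of arguments that __init__() actually requires. A minimal sketch, with a hypothetical exception class:

import pickle

class SpecialError(Exception):
    def __init__(self, message, lineno):
        super().__init__(message)
        self.lineno = lineno

    def __reduce__(self):
        # Hand pickle the full argument tuple __init__() requires, rather
        # than the bare self.args that BaseException supplies by default.
        return (type(self), (self.args[0], self.lineno))

pickle.loads(pickle.dumps(SpecialError('oops', 3)))   # now round-trips cleanly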

CJ-Wright commented 5 years ago

I think this is mostly fixed.