Open liamhuber opened 11 months ago
Happened again today. Was on the Windows 3.11 CI (I want to start noting this here in case there is some pattern of where it comes up, e.g. on windows)
Happened again on the windows 3.11 CI
1/10 seems like an overestimate of how often it crops up, and so far really seems like a windows thing
Happened on Ubuntu 3.11 today, so I guess it's not just windows.
Happened on Ubuntu 3.11 today.
I can't remember the last time it happened on my local machine, so maybe it's exclusively an issue in the CI environment?
Seems to be happening consistency today in #70, so maybe there's a chance to nail this down
Ok, I added a little sleep and then suddenly the CI passed, so maybe this is fixed after all? Could it be simply that the callback function is executing to slowly on the CI (but fast enough on my local machine) that the test comes up before the state is ready for it?
returned_nodes = result.result() # Wait for the process to finish
from time import sleep # The "fixing" change (so far)
sleep(1) # The "fixing" change (so far)
self.assertIsNot(
original_one,
returned_nodes.one,
msg="Executing in a parallel process should be returning new instances"
)
self.assertIs(
macro,
macro.nodes.one.parent,
msg="Returned nodes should get the macro as their parent"
# Once upon a time there was some evidence that this test was failing
# stochastically, but I just ran the whole test suite 6 times and this test
# 8 times and it always passed fine, so maybe the issue is resolved...
)
Finally happened again today (windows), but it's been over a month, so I guess the sleep call is doing the trick.
Again on windows today. Pretty rare, but horrible that it exists at all. Adding the sleep
when collecting results really seems to have helped, but it's already a full second of extra waiting -- more would be a big pain, but is this going to have horrible scaling when the amount of data on the nodes increases? I still find this bug overall very disconcerting.
It finally happened on windows again! #241. It's been a while.
Again today on windows on #773. Maybe the months long stretch was just some kind of good luck?
The interval is nice and long, but it is still happening. Windows running on merge to main. https://github.com/pyiron/pyiron_workflow/actions/runs/9420579496/job/25952940926
Again after two months. Must be a thread safety thing?
It's quite infrequent (< 1 in 10), but sometimes this test fails in the following way:
I.e. when a composite is run on an executor and copies of its children are returned as the result, these new children are somehow failing to get the parent-child relationship set up reflexively (which is the expected behaviour). I vaguely suspect it has something to do with
Future.result()
returning new instances each call, or some sort of delayed execution that I am wrongly assuming is happening synchronously, or similar, but working with executors is pretty new to me and the error is not consistently reproducible, so I haven't managed to track it down yet.