Closed AymenFJA closed 1 year ago
That sleep is likely a leftover from a version where the submit call was not blocking and waiting for task completion, but just submitted the tasks. So yes, the sleep can indeed be removed now.
Oh great, happy to do so. I will open a PR with that. Thanks.
(Sent by email on Thursday, August 3, 2023, in reply to Andre Merzky's comment above on radical-cybertools/radical.pilot Issue #3003, "RAPTOR: raptor_master.py question".)
@andre-merzky I see a different behavior, which was originally highlighted by @mtitov. Removing the time.sleep(60) led to an instant termination of the master:
master.wait_workers(count=1)
master.start()
master.submit()
# TODO: can be run from thread?
master.stop()
# TODO: worker state callback
master.join()
1691163340.425 : master.000000.worker.0000 : 5158 : 139873672492864 : DEBUG : wait for registration to complete
1691163340.435 : master.000000.worker.0000 : 5112 : 140625709971264 : DEBUG : register: master.000000.worker.0000 / master.000000
1691163340.436 : master.000000.worker.0000 : 5112 : 140625709971264 : DEBUG : wait for registration to complete
1691163340.438 : master.000000.worker.0000 : 5112 : 140625709971264 : DEBUG : registration with master ok
1691163340.438 : master.000000.worker.0000 : 5158 : 139873672492864 : DEBUG : registration with master ok
1691163340.487 : master.000000.worker.0000 : 5176 : 140253084510016 : DEBUG : wait for registration to complete
1691163340.487 : master.000000.worker.0000 : 5176 : 140253084510016 : DEBUG : registration with master ok
1691163340.515 : master.000000.worker.0000 : 5120 : 140496532444992 : DEBUG : wait for registration to complete
1691163340.515 : master.000000.worker.0000 : 5120 : 140496532444992 : DEBUG : registration with master ok
1691163340.819 : master.000000.worker.0000 : 5158 : 139873593792256 : DEBUG : worker_terminate signal
1691163340.820 : master.000000.worker.0000 : 5158 : 139873593792256 : ERROR : callback error
Can you please clarify? Thanks.
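For readers following along, here is a minimal sketch of one possible reading of the log above: if submit() returns before the tasks have finished, then calling stop() right after it ends the run immediately, which is the situation the sleep would have papered over. This is a toy model built only on the Python standard library. ToyMaster, run, and the blocking_submit flag are hypothetical names made up for illustration; they are not the radical.pilot API, and the thread does not confirm that this is the actual cause.

```python
import threading


class ToyMaster:
    """Hypothetical stand-in for a master process; NOT the radical.pilot API."""

    def __init__(self, blocking_submit):
        self.blocking_submit = blocking_submit
        self.completed = 0
        self._stop_event = threading.Event()
        self._lock = threading.Lock()
        self._threads = []

    def _run_task(self, duration):
        # Event.wait() returns True if stop() was signalled before the
        # timeout expired, i.e. the task was interrupted before finishing.
        interrupted = self._stop_event.wait(timeout=duration)
        if not interrupted:
            with self._lock:
                self.completed += 1

    def submit(self, n_tasks, duration=0.2):
        self._threads = [
            threading.Thread(target=self._run_task, args=(duration,))
            for _ in range(n_tasks)
        ]
        for thread in self._threads:
            thread.start()
        if self.blocking_submit:
            # Blocking variant: return only once every task has finished.
            for thread in self._threads:
                thread.join()

    def stop(self):
        # Signal all still-running tasks to terminate.
        self._stop_event.set()

    def join(self):
        for thread in self._threads:
            thread.join()


def run(blocking_submit, n_tasks=4):
    """Mimic the submit / stop / join sequence without any sleep."""
    master = ToyMaster(blocking_submit)
    master.submit(n_tasks)
    master.stop()
    master.join()
    return master.completed


if __name__ == "__main__":
    print("blocking submit, tasks completed:    ", run(True))
    print("non-blocking submit, tasks completed:", run(False))
```

In this toy model the blocking variant completes every task before stop() is reached, while the non-blocking variant is stopped before its tasks can finish. Whether the real master behaves like the second case is exactly the open question in this thread.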
Hmm, that's unexpected. Let me step through that code to see what's going on...
In raptor_master.py, specifically at line https://github.com/radical-cybertools/radical.pilot/blob/10d25ebaf603f4a57ed08a129c310a1ebfb213d5/examples/misc/raptor_master.py#L301, there is a comment, but it is still not clear: why 60s?
This issue was raised while @mtitov and I were discussing this in the context of the examples.