python / cpython

SyncManager and Main Process fail to communicate after reboot or stopping with Ctrl-C #73043

Open cdf1499c-a2a6-4ba0-8e30-9a2e896fed7c opened 8 years ago

cdf1499c-a2a6-4ba0-8e30-9a2e896fed7c commented 8 years ago
BPO 28857
Nosy @bitdancer, @applio

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:
```python
assignee = None
closed_at = None
created_at =
labels = ['extension-modules', 'interpreter-core', 'type-crash']
title = 'SyncManager and Main Process fail to communicate after reboot or stoping with Ctrl - C'
updated_at =
user = 'https://bugs.python.org/NagarjunaArigapudi'
```

bugs.python.org fields:
```python
activity =
actor = 'r.david.murray'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Extension Modules', 'Interpreter Core']
creation =
creator = 'Nagarjuna Arigapudi'
dependencies = []
files = []
hgrepos = []
issue_num = 28857
keywords = []
message_count = 4.0
messages = ['282223', '282243', '282319', '282359']
nosy_count = 3.0
nosy_names = ['r.david.murray', 'davin', 'Nagarjuna Arigapudi']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'crash'
url = 'https://bugs.python.org/issue28857'
versions = ['Python 2.7', 'Python 3.5']
```

cdf1499c-a2a6-4ba0-8e30-9a2e896fed7c commented 8 years ago

The "SyncManager" process and the "Main Process" look at different directories and fail to communicate, which shuts down all processes immediately after the program starts. This behavior is seen in both 2.7 and 3.5. The logging in 2.7 is clearer: it shows the file name it is looking for.

Extract of Program:

manager = Manager()
lst1 = manager.list([[]]*V1)  
lst2 = manager.list(range(v2)) 
lst3 = manager.list([0]*V3)         
lst4 = manager.list([0]*V3) 
lst5 = manager.list([0]*V3)  
initializeData(lst1,lst2,lst3,lst4,lst5)
procs = []
for indx in range(noOfProcs):
    procs.append(Process(target=workerProc, args=(lst1,lst2,lst3,lst4,lst5)))
    procs[indx].start()

bContinueWorking = True
while (bContinueWorking):
    logger.debug("Main thread about to sleep")   
    time.sleep(300)
    globLOCK.acquire()
    if(not lst1):
        bContinueWorking = False
    try:
        doPickle(lst1)
    except Exception as ep:
        logger.error("failed to pickle: " + str(ep))
    finally:
        globLOCK.release()

The program works well, but once it has been terminated it will not start again. Rebooting or cleaning temporary files does not fix the issue.

Below is the log from a failing run, followed by the log from a successful run.

FAIL LOG

[DEBUG/MainProcess] Star of Application
[DEBUG/MainProcess] created semlock with handle 139965318860800
[DEBUG/MainProcess] created semlock with handle 139965318856704
[DEBUG/MainProcess] created semlock with handle 139965318852608
[DEBUG/MainProcess] created semlock with handle 139965318848512
[DEBUG/MainProcess] created semlock with handle 139965318844416
[DEBUG/MainProcess] created semlock with handle 139965318840320
[INFO/SyncManager-1] child process calling self.run()

***[INFO/SyncManager-1] created temp directory /tmp/pymp-xTqdkd***

[DEBUG/MainProcess] requesting creation of a shared 'list' object
[INFO/SyncManager-1] manager serving at '/tmp/pymp-xTqdkd/listener-eDG1yJ'
[DEBUG/SyncManager-1] 'list' callable returned object with id '7f4c34316f80'
[DEBUG/MainProcess] INCREF '7f4c34316f80'
[DEBUG/MainProcess] requesting creation of a shared 'list' object
[DEBUG/SyncManager-1] 'list' callable returned object with id '7f4c3432a758'
[DEBUG/MainProcess] INCREF '7f4c3432a758'
[DEBUG/MainProcess] requesting creation of a shared 'list' object
[DEBUG/SyncManager-1] 'list' callable returned object with id '7f4c3432a7a0'
[DEBUG/MainProcess] INCREF '7f4c3432a7a0'
[DEBUG/MainProcess] requesting creation of a shared 'list' object
[DEBUG/SyncManager-1] 'list' callable returned object with id '7f4c3432a7e8'
[DEBUG/MainProcess] INCREF '7f4c3432a7e8'
[DEBUG/MainProcess] requesting creation of a shared 'list' object
[DEBUG/SyncManager-1] 'list' callable returned object with id '7f4c3432a830'
[DEBUG/MainProcess] INCREF '7f4c3432a830'
[DEBUG/MainProcess] thread 'MainThread' does not own a connection
[DEBUG/MainProcess] making connection to manager
[DEBUG/SyncManager-1] starting server thread to service 'MainProcess'

***[DEBUG/MainProcess] failed to connect to address /tmp/pymp-LOMHoT/listener-EbLeup***

[Errno 2] No such file or directory
Initialization failed Exiting
[INFO/MainProcess] process shutting down

SUCCESS LOG

[DEBUG/MainProcess] Star of Application
[DEBUG/MainProcess] created semlock with handle 139830888992768
[DEBUG/MainProcess] created semlock with handle 139830888988672
[DEBUG/MainProcess] created semlock with handle 139830888984576
[DEBUG/MainProcess] created semlock with handle 139830888980480
[DEBUG/MainProcess] created semlock with handle 139830888976384
[DEBUG/MainProcess] created semlock with handle 139830888972288
[INFO/SyncManager-1] child process calling self.run()
[INFO/SyncManager-1] created temp directory /tmp/pymp-UiHuij
[DEBUG/MainProcess] requesting creation of a shared 'list' object
[INFO/SyncManager-1] manager serving at '/tmp/pymp-UiHuij/listener-lS7hf5'
[DEBUG/SyncManager-1] 'list' callable returned object with id '7f2ce78c6f80'
[DEBUG/MainProcess] INCREF '7f2ce78c6f80'
[DEBUG/MainProcess] requesting creation of a shared 'list' object
[DEBUG/SyncManager-1] 'list' callable returned object with id '7f2ce78da758'
[DEBUG/MainProcess] INCREF '7f2ce78da758'
[DEBUG/MainProcess] requesting creation of a shared 'list' object
[DEBUG/SyncManager-1] 'list' callable returned object with id '7f2ce78da7a0'
[DEBUG/MainProcess] INCREF '7f2ce78da7a0'
[DEBUG/MainProcess] requesting creation of a shared 'list' object
[DEBUG/SyncManager-1] 'list' callable returned object with id '7f2ce78da7e8'
[DEBUG/MainProcess] INCREF '7f2ce78da7e8'
[DEBUG/MainProcess] requesting creation of a shared 'list' object
[DEBUG/SyncManager-1] 'list' callable returned object with id '7f2ce78da830'
[DEBUG/MainProcess] INCREF '7f2ce78da830'
[DEBUG/MainProcess] thread 'MainThread' does not own a connection
[DEBUG/MainProcess] making connection to manager
[DEBUG/SyncManager-1] starting server thread to service 'MainProcess'
[DEBUG/MainProcess] Main thread about to sleep
[DEBUG/Process-7] INCREF '7f2ce78c6f80'
[DEBUG/Process-6] INCREF '7f2ce78c6f80'
[DEBUG/Process-5] INCREF '7f2ce78c6f80'
[DEBUG/Process-4] INCREF '7f2ce78c6f80'
[DEBUG/Process-3] INCREF '7f2ce78c6f80'
[DEBUG/Process-2] INCREF '7f2ce78c6f80'
[DEBUG/Process-6] INCREF '7f2ce78da758'
[DEBUG/Process-7] INCREF '7f2ce78da758'
[DEBUG/Process-5] INCREF '7f2ce78da758'
[DEBUG/Process-4] INCREF '7f2ce78da758'
[DEBUG/Process-3] INCREF '7f2ce78da758'
[DEBUG/Process-2] INCREF '7f2ce78da758'
[DEBUG/Process-6] INCREF '7f2ce78da7a0'
[DEBUG/Process-7] INCREF '7f2ce78da7a0'
[DEBUG/Process-5] INCREF '7f2ce78da7a0'
[DEBUG/Process-4] INCREF '7f2ce78da7a0'
[DEBUG/Process-3] INCREF '7f2ce78da7a0'
[DEBUG/Process-2] INCREF '7f2ce78da7a0'
[DEBUG/Process-6] INCREF '7f2ce78da7e8'
[DEBUG/Process-7] INCREF '7f2ce78da7e8'
[DEBUG/Process-5] INCREF '7f2ce78da7e8'
[DEBUG/Process-4] INCREF '7f2ce78da7e8'
[DEBUG/Process-3] INCREF '7f2ce78da7e8'
[DEBUG/Process-2] INCREF '7f2ce78da7e8'
[DEBUG/Process-6] INCREF '7f2ce78da830'
[INFO/Process-6] child process calling self.run()
[DEBUG/Process-6] thread 'MainThread' does not own a connection
[DEBUG/Process-6] making connection to manager
[DEBUG/Process-7] INCREF '7f2ce78da830'
[INFO/Process-7] child process calling self.run()
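
For anyone trying to reproduce this, the `[LEVEL/ProcessName]` prefix in the logs above matches the default format of `multiprocessing.log_to_stderr()`. It is an assumption that the reporter enabled logging this way; the report does not say. The helper name `enable_multiprocessing_debug_log` and the three-element list are illustrative only. A minimal sketch in Python 3:

```python
import logging
import multiprocessing


def enable_multiprocessing_debug_log():
    """Send multiprocessing's internal log records to stderr at DEBUG level."""
    # log_to_stderr() attaches a stderr handler to multiprocessing's logger
    # and returns that logger.
    logger = multiprocessing.log_to_stderr()
    logger.setLevel(logging.DEBUG)
    return logger


if __name__ == "__main__":
    logger = enable_multiprocessing_debug_log()
    logger.debug("Start of application")
    # Starting a manager and creating a shared list should emit records
    # from both the main process and the SyncManager process.
    with multiprocessing.Manager() as manager:
        shared = manager.list([0] * 3)
        print(list(shared))
```

The exact records emitted depend on the Python version and platform, so the output may not match the logs in this report line for line.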

bitdancer commented 7 years ago

There doesn't appear to be enough information here to determine what part of Python you are trying to report a bug in. It reads more like you are asking for help debugging your program :)

Maybe you could ask for help on the python-list mailing list to help you formulate a clearer bug report.

cdf1499c-a2a6-4ba0-8e30-9a2e896fed7c commented 7 years ago

I added the logs just to provide additional information. The problem is that one process uses "/tmp/pymp-xTqdkd" for communication while the other uses "/tmp/pymp-LOMHoT/listener-EbLeup". Because of this mismatch the processes cannot communicate, and the multiprocess application shuts down with the message "process shutting down". The files and directories (temporary files in /tmp) are not user defined; they are generated by Python.

This behavior is seen only after a running application is interrupted with Ctrl-C or the server is rebooted while the application is running. In simple terms, a non-clean shutdown makes the application unrunnable because the synchronization between the processes gets corrupted. Rebooting or clearing /tmp does not fix the issue.

The relevant log lines (from both SyncManager-1 and MainProcess) are emitted by Python code in the multiprocessing package:

[INFO/SyncManager-1] created temp directory /tmp/pymp-xTqdkd

[DEBUG/MainProcess] failed to connect to address /tmp/pymp-LOMHoT/listener-EbLeup

This results in a "file not found" error, and the application shuts down. The code in multiprocessing should use the same directory for communication.
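
The mismatch between the two log lines quoted above can be checked mechanically. The following is only a sketch for comparing the directories named in such a log; the function name `find_directory_mismatch` and the regular expressions are hypothetical and match just the two message formats shown in this report.

```python
import posixpath
import re

# Patterns for the two log messages quoted in this report.
CREATED_RE = re.compile(r"created temp directory (\S+)")
FAILED_RE = re.compile(r"failed to connect to address (\S+)")


def find_directory_mismatch(log_lines):
    """Return (manager_dir, client_dir) if the two directories differ, else None."""
    manager_dir = None
    client_dir = None
    for line in log_lines:
        created = CREATED_RE.search(line)
        if created:
            manager_dir = created.group(1)
        failed = FAILED_RE.search(line)
        if failed:
            # The client address is a socket path; compare its parent directory.
            client_dir = posixpath.dirname(failed.group(1))
    if manager_dir and client_dir and manager_dir != client_dir:
        return manager_dir, client_dir
    return None


if __name__ == "__main__":
    log = [
        "[INFO/SyncManager-1] created temp directory /tmp/pymp-xTqdkd",
        "[DEBUG/MainProcess] failed to connect to address /tmp/pymp-LOMHoT/listener-EbLeup",
    ]
    print(find_directory_mismatch(log))  # ('/tmp/pymp-xTqdkd', '/tmp/pymp-LOMHoT')
```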

Thanks

bitdancer commented 7 years ago

Hmm. I wasn't aware that multiprocessing used anything other than semaphores (i.e., not files in /tmp). Clearly I don't know enough to understand this problem, so I'll leave it to the multiprocessing experts.
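
As the logs in this report show, a manager on Unix serves on a listener socket under a `pymp-` temporary directory, so files in /tmp are involved in addition to semaphores. A small sketch for inspecting this: `manager.address` is the documented attribute on manager objects, while the helper `describe_address` is hypothetical. One assumption to flag: newer Python versions on Linux may report an abstract socket name (starting with a NUL byte) rather than a filesystem path, so the helper treats that case separately.

```python
import os
from multiprocessing import Manager


def describe_address(address):
    """Summarize a manager address and, for socket paths, whether the file exists."""
    if isinstance(address, str):
        if address.startswith("\0"):
            # Abstract Unix sockets have no entry in the filesystem.
            return {"kind": "abstract-unix-socket", "exists": None}
        if address.startswith("\\\\"):
            # Windows named pipes are not regular files either.
            return {"kind": "named-pipe", "exists": None}
        # Otherwise treat it as a filesystem path to a Unix domain socket.
        return {"kind": "unix-socket-path", "exists": os.path.exists(address)}
    # For example a (host, port) tuple for a TCP listener.
    return {"kind": "other", "exists": None}


if __name__ == "__main__":
    with Manager() as manager:
        print(manager.address)
        print(describe_address(manager.address))
```

If the reported failure were reproduced, calling the helper on the address the main process tries to reach would be expected to show `exists` as `False`, which corresponds to the `[Errno 2]` in the failing log.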