Closed Enlion91 closed 6 months ago
After pressing Ctrl+C and restarting training without data_sink, training does not start normally, and many Python processes are left running.
We recommend using the default config and dataset to reproduce the accuracy and performance reported in the README. If you disable data sink mode, you may need to change the config to get the network to start training. The many Python processes come from the multiprocessing used for data loading (it uses a parallel function); this is normal behavior.
Indeed, I am using the default mindyolo config and following the startup instructions. The following is the INFO-level log. It stays here forever.
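With help from Huawei engineers, the problem has been located: the child processes do not respond to exit signal 15 (SIGTERM), so the shutdown flow hangs. As a temporary workaround, I switched to a forced exit. This is a known issue, and a fix was merged in mindspore 2.3, but that fix has no effect in my environment and the processes still have to be force-killed. https://gitee.com/mindspore/mindspore/pulls/66995/files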
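The forced-exit workaround described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the change made in MindSpore or mindyolo: the helper names `terminate_then_kill` and `spawn_stubborn_child` are hypothetical, and the sketch assumes a POSIX system where you hold `subprocess.Popen` handles for the workers. It sends signal 15 first, waits for a grace period, then falls back to SIGKILL for any process that is still alive.

```python
import signal
import subprocess
import sys
import time


def terminate_then_kill(procs, grace_seconds=5.0):
    """Send SIGTERM to each process, then SIGKILL any that outlive the grace period.

    Returns the PIDs that had to be force-killed.
    """
    # Polite request first: signal 15 (SIGTERM).
    for p in procs:
        if p.poll() is None:
            p.send_signal(signal.SIGTERM)

    deadline = time.monotonic() + grace_seconds
    force_killed = []
    for p in procs:
        remaining = max(0.0, deadline - time.monotonic())
        try:
            p.wait(timeout=remaining)
        except subprocess.TimeoutExpired:
            # The process ignored SIGTERM: fall back to SIGKILL, which cannot be caught.
            p.kill()
            p.wait()
            force_killed.append(p.pid)
    return force_killed


def spawn_stubborn_child():
    """Hypothetical stand-in for a worker that ignores SIGTERM (for demonstration only)."""
    code = (
        "import signal, time;"
        "signal.signal(signal.SIGTERM, signal.SIG_IGN);"
        "print('ready', flush=True);"
        "time.sleep(60)"
    )
    p = subprocess.Popen([sys.executable, "-c", code], stdout=subprocess.PIPE)
    p.stdout.readline()  # Block until the child has installed its SIGTERM handler.
    return p
```

Calling `terminate_then_kill([spawn_stubborn_child()], grace_seconds=0.5)` returns the child's PID, since that child never reacts to SIGTERM. For leftover workers that were not started from the current process, the same two-step idea applies, but with `os.kill(pid, signal.SIGTERM)` followed by `os.kill(pid, signal.SIGKILL)` on PIDs you have identified yourself.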
Environment
Hardware Environment (Ascend/GPU/CPU): Ascend 910A, atlas800-9000 server
Software Environment:
Describe the current behavior
With mindyolo in data sink mode, training does not start.
Describe the expected behavior
Training starts, and data sink mode speeds it up.
Steps to reproduce the issue
Related log / screenshot
The log stops here and training does not proceed.
Special notes for this issue