Open ThomasLimWZ opened 2 months ago
@ThomasLimWZ Hello, thanks for your feedback. As far as I know, MindSpore's support for the Windows OS is incomplete. Please consider switching to Linux.
As for the problem of running distributed training tasks, you can try the dynamic cluster startup method. MindSpore provides three distributed parallel startup methods (refer to Distributed Parallel Startup Methods), two of which support GPU.
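As a rough sketch of what the dynamic cluster startup looks like (no OpenMPI needed), the launch below uses MindSpore's documented dynamic networking environment variables (`MS_WORKER_NUM`, `MS_SCHED_HOST`, `MS_SCHED_PORT`, `MS_ROLE`); the `train.py` invocation is the one from this thread, and the host/port values are placeholders you would adapt to your machine:

```shell
# Dynamic cluster startup sketch: one scheduler process plus two workers.
# Assumed values (host, port, worker count) are illustrative, not prescriptive.
export MS_WORKER_NUM=2          # total number of worker processes
export MS_SCHED_HOST=127.0.0.1  # scheduler address (single-node example)
export MS_SCHED_PORT=8118       # any free port

# Start the scheduler process in the background.
export MS_ROLE=MS_SCHED
python tools/train.py --config configs/det/dbnet/db_r50_icdar15.yaml > scheduler.log 2>&1 &

# Start the worker processes; each one runs the same training script.
export MS_ROLE=MS_WORKER
for i in 0 1; do
    python tools/train.py --config configs/det/dbnet/db_r50_icdar15.yaml > worker_$i.log 2>&1 &
done
```

Because the scheduler and workers discover each other through these variables, this avoids `mpirun` entirely, which is why it is the usual workaround when OpenMPI is unavailable.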
Hi, I tried using Windows Subsystem for Linux to run this repository and have already resolved the OpenMPI issue. However, I'm still facing problems with both standalone and distributed training. It returns this error message: `[mindspore/ccsrc/backend/common/mem_reuse/mem_dynamic_allocator.cc:303] CalMemBlockAllocSize] Memory not enough: current free memory size[0] is smaller than required size[262144000]`.
Can I know what the minimum hardware requirements for mindocr are? FYI, my RAM is 24 GB and my GPU is only an NVIDIA RTX 3050 Ti.
@ThomasLimWZ
As far as I know, there is currently no MindSpore API to query the required RAM or graphics memory. But I am afraid that the 4 GB of graphics memory on a 3050 Ti may be insufficient for training DBNet ResNet-50 with the default configuration.
You can try reducing the values of `train.loader.batch_size` and `train.loader.num_workers` in `configs/det/dbnet/db_r50_icdar15.yaml`. Also, you can try switching to DBNet ResNet-18.
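For example, the relevant portion of the config might be trimmed to something like the excerpt below. The key paths follow the `train.loader.*` names mentioned above; the reduced values are illustrative starting points for a 4 GB GPU, not tuned defaults:

```yaml
# configs/det/dbnet/db_r50_icdar15.yaml (excerpt, illustrative values)
train:
  loader:
    batch_size: 4    # smaller batches lower peak device memory
    num_workers: 2   # fewer data-loading processes reduce host RAM usage
```

Halving `batch_size` roughly halves activation memory, so if the allocator still reports "Memory not enough", keep lowering it (at the cost of slower and possibly noisier training).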
Hi, I am unable to run distributed training on GPU using `mpirun --allow-run-as-root -n 2 python tools/train.py --config configs/det/dbnet/db_r50_icdar15.yaml`. I know the issue is with OpenMPI, but my PC is Windows-based, and to my understanding OpenMPI is no longer supported on Windows. Do you have any advice on how to solve this?