ofsoundmind closed this issue 1 month ago
Hi @ofsoundmind, thanks for letting us know about this issue. Could you please provide some clarifying details to help us debug it? From your post, I can't tell which text you copied from the error message versus from the thread you linked. Can you provide: (1) the lines of code you are running in your Python script; (2) the full text of the error message; and (3) the versions of Python, opensoundscape, and torch in your Python environment?
Because you are on a Mac, I don't think the thread about Windows is relevant to your issue.
Hi @sammlapp. Thanks for your quick reply, and sorry for leaving out those details. I have posted the code I am working on and some test audio files at https://github.com/ofsoundmind/kihikihi.
Python 3.9.19, opensoundscape 0.10.2, PyTorch 2.3.1
Below is the error message I got when running the code without the if __name__ == "__main__": guard.
Training Epoch 0
0%| | 0/120 [00:00<?, ?it/s]
Training Epoch 0
0%| | 0/120 [00:00<?, ?it/s]
Traceback (most recent call last):
File "
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
If I include if __name__ == "__main__": then the code appears to run, but it returns the warning below for each worker instance:
/opt/homebrew/anaconda3/envs/kihikihi_env/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py:222: UserWarning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (Triggered internally at /Users/runner/work/_temp/anaconda/conda-bld/pytorch_1716905753886/work/aten/src/ATen/ParallelNative.cpp:228.)
  torch.set_num_threads(1)
Thanks for the details. I will try to reproduce the error later today, but I haven't seen anything like this before. It's especially surprising considering that you have set num_workers = 0 (FYI, we typically train with at least num_workers=4, and ideally higher). With 0, it should only be using the root process, so I'm not sure why it's trying to spawn processes with multiprocessing and create the _MultiProcessingDataLoaderIter object.
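For anyone following along, this is the distinction I mean, shown as a small self-contained sketch (not code from the script in question): with num_workers=0 the DataLoader iterates in the root process, while any value above 0 creates a _MultiProcessingDataLoaderIter backed by worker processes, which is where spawn/fork behavior comes into play.

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    if __name__ == "__main__":
        # toy dataset just to illustrate the iterator types
        ds = TensorDataset(torch.arange(8).float())

        # num_workers=0: samples are loaded in the root process
        print(type(iter(DataLoader(ds, num_workers=0))))  # _SingleProcessDataLoaderIter

        # num_workers>0: worker processes are started for loading
        print(type(iter(DataLoader(ds, num_workers=2))))  # _MultiProcessingDataLoaderIter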
@ofsoundmind can you confirm the value of num_workers that produced this error?
Based on a few other threads, wrapping the code that trains the model in if __name__ == '__main__': does seem to be the suggested solution here. In general, using this if block is good practice and is required on Windows; it is typically used to guard the entire main script rather than just a few lines (for details see this post). The idea is to make sure the code only runs once, in the main process, even if other processes are started (e.g., to avoid an infinite recursive loop of spawning new processes).
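For example, here is a minimal sketch of that structure for an opensoundscape training script. The architecture, class list, label CSVs, and save_path below are placeholders rather than anything from the kihikihi repo, and the train() keyword arguments shown are the usual ones from the 0.10.x docs:

    import pandas as pd
    from opensoundscape import CNN  # opensoundscape 0.10.x

    def main():
        # placeholders: your own multi-hot label dataframes, indexed by audio file
        # (or by file / start_time / end_time clip intervals)
        train_df = pd.read_csv("train_labels.csv", index_col=0)
        validation_df = pd.read_csv("validation_labels.csv", index_col=0)

        # placeholder architecture, class list, and clip duration
        model = CNN("resnet18", classes=train_df.columns.tolist(), sample_duration=2.0)

        model.train(
            train_df,
            validation_df,
            epochs=10,
            batch_size=64,
            num_workers=4,                  # DataLoader worker processes
            save_path="./model_training/",  # placeholder output directory
        )

    if __name__ == "__main__":
        # only the parent process runs training; spawned DataLoader workers
        # re-import this module but skip everything inside this block
        main()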
I've inquired here in the PyTorch forums about the need to do this on Mac OS, and will wait for a reply before we make a change to our documentation.
thanks for reporting the behavior
Thanks for looking into the issue.
In the script I have used num_workers = 0, which allows the code to run without any issues, but changing the number of workers to 1 or above causes the errors for me.
I'll have a look at the PyTorch forum post you linked. Thanks again.
Since we haven't seen any response from PyTorch, we will consider it best practice to use the if __name__ == "__main__": block for all scripts, whether on Windows or Mac.
~python3.9/multiprocessing/spawn.py on OSX 14.6.1
https://pytorch.org/docs/stable/notes/windows.html#multiprocessing-error-without-if-clause-protection
RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your child processes and you have forgotten to use the proper idiom in the main module:
The "freeze_support()" line can be omitted if the program is not going to be frozen to produce an executable.
The implementation of multiprocessing is different on Windows, which uses spawn instead of fork. So we have to wrap the code with an if-clause to protect the code from executing multiple times. Refactor your code into the following structure.
import torch

def main():
    for i, data in enumerate(dataloader):
        # do something here
        pass

if __name__ == '__main__':
    main()
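In the opensoundscape case, the enumerate(dataloader) loop above corresponds to the model.train(...) call, so the same refactor applies: keep only imports and function definitions at module level, and put the model setup and training call inside main() (or directly under the if __name__ == "__main__": block).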