Hi,
I suddenly ran into this error:

terminate called after throwing an instance of 'std::system_error'
  what(): open(/home/nguyen/tmp/tmp_gaq61e3/.torch_distributed_init): No such file or directory

Everything had worked fine until then: we got to 5 kimg without problems and then suddenly hit this error. Nothing has changed in the environment, and the original StyleGANV2-ADA ran fine in the same conda environment.

Here is the log around the error:

tick 6 kimg 24.0 time 25m 40s sec/tick 147.6 sec/kimg 36.90 maintenance 0.3 cpumem 5.44 gpumem 5.18 augment 0.000
terminate called after throwing an instance of 'std::system_error'
  what(): open(/home/nguyen/tmp/tmp_gaq61e3/.torch_distributed_init): No such file or directory
tick 7 kimg 28.0 time 28m 08s sec/tick 148.3 sec/kimg 37.08 maintenance 0.3 cpumem 5.44 gpumem 5.18 augment 0.000
tick 8 kimg 32.0 time 30m 37s sec/tick 148.7 sec/kimg 37.18 maintenance 0.3 cpumem 5.44 gpumem 5.18 augment 0.000

The weird thing is that training keeps going after the error instead of stopping. I am just worried that this may cause problems later in the run.
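In case it helps with diagnosis, here is a minimal sketch of how one could check whether the file named in the error message, and the temporary directory around it, still exist while training is running. The helper name `describe_init_file` is hypothetical and not part of the training code; it only uses the Python standard library.

```python
import os


def describe_init_file(path):
    """Hypothetical helper: report whether a file and its parent directory exist."""
    parent = os.path.dirname(path)
    return {
        "parent_exists": os.path.isdir(parent),
        "file_exists": os.path.isfile(path),
    }


if __name__ == "__main__":
    # Path copied from the error message in this report.
    print(describe_init_file("/home/nguyen/tmp/tmp_gaq61e3/.torch_distributed_init"))
```

If `parent_exists` comes back `False`, my guess (and it is only a guess) would be that something removed the temporary directory while the run was still using it.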
System:
Any ideas?

Steve