hkchengrex / Mask-Propagation

[CVPR 2021] MiVOS - Mask Propagation module. Reproduced STM (and better) with training code :star2:. Semi-supervised video object segmentation evaluation.
https://hkchengrex.github.io/MiVOS/
MIT License
127 stars, 22 forks

The server remained unresponsive for a long time when I tried to train your model. #4

Closed: longmalongma closed this issue 3 years ago

longmalongma commented 3 years ago

When I ran this line of code on our server, the server did not respond for a long time. Do you know why?

UDA_VISIBLE_DEVICES=0 OMP_NUM_THREADS=4 python -m torch.distributed.launch --master_port 9842 --nproc_per_node=1 train.py --id retrain_s0 --stage 0 --batch_size 4

hkchengrex commented 3 years ago

Did you miss the "C" at the front?

hkchengrex commented 3 years ago

but that's probably not the reason... Which line is it stuck at? You can use a tool to check: https://github.com/benfred/py-spy
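If installing py-spy is not an option, Python's standard-library `faulthandler` module can answer the same question from inside the process. This is a swapped-in technique, not py-spy itself. A minimal sketch, assuming the call is placed at the very top of train.py; the helper names and the 30-second default are illustrative and not part of this repository:

```python
import faulthandler
import sys


def arm_hang_watchdog(timeout_s=30.0, stream=sys.stderr):
    """Dump every thread's traceback to `stream` if the process is still
    running after `timeout_s` seconds (hypothetical helper, not from the repo)."""
    # repeat=True dumps again after every interval; exit=False keeps the
    # process alive so the hang itself can still be inspected.
    faulthandler.dump_traceback_later(
        timeout_s, repeat=True, file=stream, exit=False
    )


def disarm_hang_watchdog():
    """Cancel the pending dump once the suspect section has finished."""
    faulthandler.cancel_dump_traceback_later()
```

Calling `arm_hang_watchdog()` before the imports and `disarm_hang_watchdog()` after them would print the stuck stack frame to stderr if the imports never return.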

longmalongma commented 3 years ago

Thanks for your reply. I will try it.

hkchengrex commented 3 years ago

@longmalongma It seems that this problem has not been fixed. How about trying some simple debugging (like printing 1, 2, 3, 4) at different lines of train.py to see which line the program gets stuck at?
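One way to do this kind of print debugging is sketched below. It assumes the hang happens during an import, and the helper name `checked_import` is made up for illustration. The `flush=True` matters because output can be buffered when stdout is not a terminal, so a missing print does not by itself prove the line was never reached.

```python
import importlib
import time


def checked_import(module_name):
    """Import `module_name`, printing a flushed marker before and after so
    the last line on screen shows which import never returned."""
    print(f"[debug] importing {module_name} ...", flush=True)
    start = time.perf_counter()
    module = importlib.import_module(module_name)
    elapsed = time.perf_counter() - start
    print(f"[debug] imported {module_name} in {elapsed:.2f}s", flush=True)
    return module


# Stand-in modules from the standard library; in train.py these would be
# the script's real imports, checked one at a time.
for name in ("os", "math", "json"):
    checked_import(name)
```

If the output ends with an "importing ..." line that has no matching "imported ..." line, that module is the one that hangs.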

longmalongma commented 3 years ago

OK, now I will try it according to your suggestion.

longmalongma commented 3 years ago

[image] I can't get anything to print; it stays stuck at this screen.

longmalongma commented 3 years ago

@hkchengrex Is there a problem with the dataset? [image]

hkchengrex commented 3 years ago

If you cannot even finish importing, it's not because of the dataset... Can you add a few more print statements to see where it fails?

longmalongma commented 3 years ago

Sorry, none of the lines get printed. I think it's probably a problem with the dataset, because the difference between my setup and others' is how the dataset was downloaded.

longmalongma commented 3 years ago

I found a problem. It gets stuck when importing torch, but my environment can run STM. Do I need to update my version of PyTorch? [image]

hkchengrex commented 3 years ago

Maybe you can uninstall and re-install pytorch?

longmalongma commented 3 years ago

OK, my current environment is like this: torchaudio 0.8.1 py38 pytorch

longmalongma commented 3 years ago

I uninstalled and reinstalled PyTorch. The updated version still doesn't work, and the program is still stuck at the import torch line.

hkchengrex commented 3 years ago

What if you open up python, and only import torch?
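The same check can be scripted so that a hang shows up as a timeout instead of a frozen terminal. A minimal sketch, assuming a fresh interpreter is a fair stand-in for opening Python by hand; `import_finishes` is a hypothetical helper name and the timeout values are arbitrary:

```python
import subprocess
import sys


def import_finishes(module_name, timeout_s=60.0):
    """Return True if `import module_name` completes in a fresh interpreter
    within `timeout_s` seconds, False if it fails or times out."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", f"import {module_name}"],
            timeout=timeout_s,
            capture_output=True,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process when the timeout expires.
        return False
    return result.returncode == 0


# "json" is a stand-in; for this thread the module of interest is "torch".
print(import_finishes("json"))
```

If the import passes in isolation but hangs inside train.py, the cause is more likely something done earlier in the script than the PyTorch installation itself.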

longmalongma commented 3 years ago

I can import torch when I start Python from the command line, but it gets stuck at the import torch line when I run the code. Why? [image]

hkchengrex commented 3 years ago

What if you import torch first, before everything else? i.e., move import torch to the first line.

longmalongma commented 3 years ago

When I moved import torch to the first line, the import succeeded. Thank you for your help.

longmalongma commented 3 years ago

Why does import torch work on the first line but not on any other line? Do you know why? I'm finding it hard to understand.

hkchengrex commented 3 years ago

I have no idea. Maybe you can ask the PyTorch guys: https://github.com/pytorch/pytorch/issues

longmalongma commented 3 years ago

ok, thanks!