fermanda opened this issue 4 months ago
Hi @fermanda ,
Sorry for the delayed response. As far as I know, our code can run with one GPU. @WYJSJTU, could you please check this issue?
I think I've only tested it on devices with at least two GPUs. It might require some modifications to the code if it can't run on one GPU.
Hi @fermanda, how did you run the code? I ran it using the following script and did not encounter this error.
sh scripts/inference.sh data/checkpoint/aios_checkpoint.pth short_video.mp4 demo 2 0.1 1
Note: I have updated the code, so you should merge the latest code.
@ttxskk @WYJSJTU, thank you for your reply. I haven't tested with the latest code, but I ran it by following the previous README:
Inference
- Place the mp4 video for inference under `AiOS/demo/`
- Prepare the pretrained models to be used for inference under `AiOS/data/checkpoint`
- Inference output will be saved in `AiOS/demo/{INPUT_VIDEO}_out`

```
cd main
sh scripts/inference.sh {INPUT_VIDEO} {OUTPUT_DIR}
# For inferencing short_video.mp4 with output directory of demo/short_video_out
sh scripts/inference.sh short_video demo
```
I would be grateful, and will close this issue, if you could describe the specific computer specs needed to run this code (CPU, RAM, GPU, number of GPUs, and operating system) for easy replication.
Thank you in advance for your help.
I have the same problem as him, and I have multiple GPUs. I tried the command with both 1 and 2 GPUs, but hit the same error each time.
Hi @formoree, How do you run the script?
Hi, I solved that problem, but I ran into another one in [issue 24](https://github.com/ttxskk/AiOS/issues/24).
Hi @formoree, thanks for your feedback. Would you mind telling us how to solve it?
I'm sorry, it's been so long that I've forgotten some details, but I remember it was an issue with the Python package versions; adjusting them should help.
Thank you for the interesting work. Can you describe the required computer specs for running the inference?
I tried to run the inference, but I kept getting a device ordinal error. The `LOCAL_RANK` environment variable seems to be set to 1 by default. I suspect that you use multiple GPUs in a single computer? Here are my computer specs for running the inference code.
I also tested on other PyTorch versions, but the error persists.
I modified the `utils.init_distributed_mode(args)` function in `misc.py`, line 581, and added a print command at line 612 to print the GPU used. Here is the error result.
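To trace where a duplicate device setup comes from, a small debug helper like the sketch below can be dropped next to the print in `misc.py`. The helper name and the exact set of environment variables it reports are assumptions for illustration, not part of the AiOS code.

```python
import os

def debug_device_info(tag: str) -> str:
    """Hypothetical debug helper: report the distributed-launch environment
    each process sees, to trace why 'Cuda GPU device set to X' prints twice."""
    line = (f"[{tag}] pid={os.getpid()} "
            f"RANK={os.environ.get('RANK', 'unset')} "
            f"LOCAL_RANK={os.environ.get('LOCAL_RANK', 'unset')} "
            f"WORLD_SIZE={os.environ.get('WORLD_SIZE', 'unset')}")
    print(line)
    return line
```

If the line appears once per GPU with different `LOCAL_RANK` values (and different PIDs), the launcher is spawning one process per device, which would explain the duplicate print.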
The line `Cuda GPU device set to X` is printed twice, before `print("End torch.distributed.barrier()")`. I suppose this is because of the multiprocessing? I then forced it to use only `GPU:0` by adding `os.environ['LOCAL_RANK'] = "0"` at the beginning of `main.py`, but I got the following error. So, does it require multiple GPUs to run the inference?
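A common pattern for making distributed initialization degrade gracefully to one GPU is to default `RANK`/`WORLD_SIZE`/`LOCAL_RANK` when no launcher is detected. The sketch below is a hypothetical illustration of that fallback, not the actual `utils.init_distributed_mode` from AiOS:

```python
import os

def resolve_distributed_env():
    """Hypothetical sketch: read the env vars a launcher such as torchrun
    would set, and fall back to single-process, device-0 defaults."""
    if "RANK" in os.environ and "WORLD_SIZE" in os.environ:
        # Launched by torchrun / torch.distributed.launch
        return (int(os.environ["RANK"]),
                int(os.environ["WORLD_SIZE"]),
                int(os.environ.get("LOCAL_RANK", "0")))
    # No launcher detected: single process on device 0
    os.environ.setdefault("LOCAL_RANK", "0")
    return 0, 1, 0
```

Alternatively, restricting visibility with `CUDA_VISIBLE_DEVICES=0 sh scripts/inference.sh ...` makes device 0 the only ordinal any process can see, which often sidesteps "invalid device ordinal" errors regardless of what `LOCAL_RANK` ends up set to.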