Have you solved this problem?
Same here when running on CUDA on Linux.
Sorry, cannot reproduce this error on Linux
I'm working on an AWS EC2 instance of type g4dn.xlarge.
I ran:
python track.py --source v.mp4 --yolo-weights yolov7-e6e.pt --img 1280
And I got:
Downloading https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-e6e.pt to yolov7-e6e.pt...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 290M/290M [00:19<00:00, 15.4MB/s]
Fusing layers...
Downloading...
From: https://drive.google.com/uc?id=1Kkx2zW89jq_NETu4u42CFZTMVD5Hwm6e
To: /home/ec2-user/Yolov7_StrongSORT_OSNet/weights/osnet_x0_25_msmt17.pt
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.34M/9.34M [00:00<00:00, 17.9MB/s]
Model: osnet_x0_25
- params: 203,568
- flops: 82,316,000
Successfully loaded pretrained weights from "/home/ec2-user/Yolov7_StrongSORT_OSNet/weights/osnet_x0_25_msmt17.pt"
** The following layers are discarded due to unmatched keys or layer size: ['classifier.weight', 'classifier.bias']
(1, 256, 128, 3)
video 1/1 (1/1100) /home/ec2-user/Yolov7_StrongSORT_OSNet/v.mp4: Traceback (most recent call last):
File "/home/ec2-user/Yolov7_StrongSORT_OSNet/track.py", line 332, in <module>
main(opt)
File "/home/ec2-user/Yolov7_StrongSORT_OSNet/track.py", line 327, in main
run(**vars(opt))
File "/home/ec2-user/.local/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/ec2-user/Yolov7_StrongSORT_OSNet/track.py", line 149, in run
for frame_idx, (path, im, im0s, vid_cap) in enumerate(dataset):
File "/home/ec2-user/Yolov7_StrongSORT_OSNet/yolov7/utils/datasets.py", line 191, in __next__
img = letterbox(img0, self.img_size, stride=self.stride)[0]
File "/home/ec2-user/Yolov7_StrongSORT_OSNet/yolov7/utils/datasets.py", line 1000, in letterbox
dw, dh = np.mod(dw, stride), np.mod(dh, stride) # wh padding
File "/home/ec2-user/.local/lib/python3.9/site-packages/torch/_tensor.py", line 732, in __array__
return self.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
P.S. It works if I run on CPU, and it also works on this VM with YOLOv5-StrongSORT.
THX (:
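For context, the error above is generic PyTorch/numpy behavior rather than anything specific to this repo: numpy cannot view a tensor that still lives on the GPU, so any numpy call that receives a CUDA tensor (like the np.mod() inside letterbox()) fails with exactly this message. A minimal, self-contained sketch of the failure and of the host-side workaround (not the repo's code, just an illustration):

import numpy as np
import torch

stride = torch.tensor(64.0)
if torch.cuda.is_available():
    stride = stride.cuda()
    # np.mod(100, stride) would raise the same TypeError: numpy calls
    # Tensor.__array__(), and a cuda tensor cannot be exposed as a numpy array.
    print(np.mod(100, stride.cpu().numpy()))  # copy to host first -> works, prints 36.0
else:
    print(np.mod(100, stride.numpy()))        # CPU tensors convert fine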
git pull
and try again, please @yagelgen. Let's see if I managed to fix it now. I still can't reproduce this behavior on a newly cloned repo with:
python track.py --source v.mp4 --yolo-weights yolov7-e6e.pt --img 1280 --device 0
@mikel-brostrom Same error. Did you check it on an AWS EC2 g4dn?
(If you want, we can schedule like half hour zoom to try to fix it.)
Have not tried to deploy this on any cloud platform. I am available 11-12AM CET tomorrow. Otherwise, Wednesday 8-12.
I solved the problem. In /home/ec2-user/.local/lib/python3.9/site-packages/torch/_tensor.py, line 732, in __array__, change
return self.numpy()
to
return self.cpu().numpy()
After I revised it, there was no error reported. You can try it.
@Zhengzhiyang0000 yeah! now it works.
@mikel-brostrom do you know how to fix it in the code?
(If you want I'm available tomorrow - you can set half hour in google calendar - yagelgen@gmail.com)
Your fix is within torch itself @Zhengzhiyang0000? That is weird.
@mikel-brostrom
dataset = LoadImages(source, img_size=imgsz, stride=stride.cpu().numpy())
instead of
dataset = LoadImages(source, img_size=imgsz, stride=stride)
But it's too slow.
I fixed it by changing
stride = model.stride.max()
to
stride = int(model.stride.max())
in track.py (line 105), and also removing the .cpu().numpy() in the same file.
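For anyone landing here later, this is roughly what that change amounts to. A hedged sketch, with model_stride standing in for the real model.stride (the exact line number may differ in your checkout):

import torch

# model.stride in the YOLOv7 repo is a tensor; on GPU runs it can end up on cuda:0
model_stride = torch.tensor([8.0, 16.0, 32.0, 64.0])
if torch.cuda.is_available():
    model_stride = model_stride.cuda()

# stride = model_stride.max()      # old: a (possibly CUDA) 0-dim tensor
stride = int(model_stride.max())   # new: a plain Python int
print(stride)                      # 64, safe to pass into LoadImages()/letterbox()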
StrongSORT is still very slow in itself, so I see no application for it in real-time scenarios (~0.1 seconds per frame for StrongSORT alone on a 1660 Ti mobile, while my custom-trained YOLOv7-tiny needs an order of magnitude less than that).
I achieve the following inference times on my webcam with a modest Quadro P2000, which is way below a 1660 Ti in terms of specs @Jimmeimetis.
Yolov5s.pt + mobilenetv2_x1_0_msmt17.pt
0: 480x640 1 person, 3 cars, Done. YOLO:(0.024s), StrongSORT:(0.047s)
0: 480x640 1 person, 5 cars, Done. YOLO:(0.019s), StrongSORT:(0.031s)
0: 480x640 1 person, 5 cars, Done. YOLO:(0.018s), StrongSORT:(0.032s)
0: 480x640 1 person, 5 cars, Done. YOLO:(0.019s), StrongSORT:(0.030s)
0: 480x640 1 person, 4 cars, Done. YOLO:(0.018s), StrongSORT:(0.027s)
0: 480x640 1 person, 4 cars, Done. YOLO:(0.018s), StrongSORT:(0.027s)
0: 480x640 1 person, 4 cars, Done. YOLO:(0.019s), StrongSORT:(0.025s)
~20FPS
Yolov5s.engine + mobilenetv2_x1_0_msmt17.engine
0: 640x640 1 class0, 2 class2s, Done. YOLO:(0.018s), StrongSORT:(0.018s)
0: 640x640 1 class0, 3 class2s, Done. YOLO:(0.019s), StrongSORT:(0.020s)
0: 640x640 1 class0, 3 class2s, Done. YOLO:(0.017s), StrongSORT:(0.020s)
0: 640x640 1 class0, 3 class2s, Done. YOLO:(0.019s), StrongSORT:(0.020s)
0: 640x640 1 class0, 2 class2s, Done. YOLO:(0.018s), StrongSORT:(0.017s)
0: 640x640 1 class0, 2 class2s, Done. YOLO:(0.018s), StrongSORT:(0.016s)
0: 640x640 1 class0, 2 class2s, Done. YOLO:(0.018s), StrongSORT:(0.017s)
0: 640x640 1 class0, 2 class2s, Done. YOLO:(0.017s), StrongSORT:(0.017s)
~27FPS
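For reference, the rough FPS figures above follow directly from the per-stage times, assuming detection and tracking run sequentially per frame:

# .pt pipeline: ~0.019 s YOLO + ~0.030 s StrongSORT per frame
print(round(1 / (0.019 + 0.030)))  # ~20 FPS

# .engine pipeline: ~0.018 s YOLO + ~0.018 s StrongSORT per frame
print(round(1 / (0.018 + 0.018)))  # ~28 FPS, i.e. roughly the ~27 quoted above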
Notice that my main work is in my Yolov5StrongSORT repo, which is currently ahead of Yolov7StrongSORT.
These look much more reasonable given the GFLOPS of the models used in StrongSORT. I see lots of weird behavior on my Turing GPU (1660 Ti) compared to my Pascal one (1070): CUDA 11 makes my 1660 Ti detect nothing with YOLOv7, and on CUDA 10.2, which I'm running as a workaround, FP16 is significantly slower than FP32.
Also thanks for letting me know about your work on the yolov5 repo. Will test it later!
Ok, tested it. StrongSORT runtime is proper on the yolov5 repo, so I will use that implementation or port it to v7. Lastly, just disabling half precision in my CUDA 11 environment with the 1660 Ti seems to do the trick inference-wise (it now detects).
Will test it on a 3090 soon enough in an attempt to find the culprit. Thanks!
Notice that the more detections you have, the longer it will take StrongSORT to finish the association process. Btw, I don't think the 1660 Ti supports half-precision inference...
It does, and the issue is likely some poor interaction with PyTorch/CUDA. Even if it didn't support accelerated FP16 at 2x the rate of FP32, the performance should have been roughly the same, not degraded ~10x like it is on my side. I will get to the bottom of this eventually, but it's not a priority right now.
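If someone wants to check whether FP16 actually helps on their card, here is a minimal, self-contained sketch (not the repo's code) that times the same convolution in FP32 and FP16; on a healthy setup the FP16 pass should be at least as fast as FP32, not ~10x slower:

import time
import torch

if torch.cuda.is_available():
    x = torch.randn(1, 3, 640, 640, device="cuda")
    conv = torch.nn.Conv2d(3, 64, 3, padding=1).cuda()
    for dtype in (torch.float32, torch.float16):
        xi, m = x.to(dtype), conv.to(dtype)
        torch.cuda.synchronize()
        t0 = time.time()
        for _ in range(100):
            m(xi)
        torch.cuda.synchronize()
        print(dtype, f"{(time.time() - t0) * 1000 / 100:.2f} ms per forward pass")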
Thanks and have a good night
👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs. Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!
@Jimmeimetis have you found the culprit of this issue? I am using a 1660 too and StrongSORT processes 0.2 s per frame, which is pretty slow.
@NQHuy1905 I ported the StrongSORT tracker from the v5 repo to the v7 one, and the execution times lined up with the v5 ones. That being said, while it was able to run in real time with a very fast inference model, I did not consider it worth using over DeepSORT due to its higher execution time as-is (even with significantly smaller models for StrongSORT it wasn't good enough for my standards).
The porting of the code plus testing actually took place the day after my last post here. I did it as fast as possible to get the results I needed, so the changes are somewhat poorly made.
Either way, if you want to try it, I can try uploading the project somewhere this weekend.
@Jimmeimetis So you mean the reason for the high execution time is StrongSORT? I haven't tried DeepSORT with YOLOv7; have you tried it, and was the execution time lower? I tried the trackers in the v5 and v7 repos with smaller YOLO and StrongSORT models, and they weren't good enough for my standards either.
@NQHuy1905 Yes, I have been running YOLOv7 and v8 with DeepSORT. It has its own problems, but at this point I don't have the time to dive into other trackers. There are public repos out there that have paired v7 with DeepSORT if you want to try.
Search before asking
Yolov7_StrongSORT_OSNet Component
Tracking
Bug
(pytorch1.7) PS D:\Github\Yolov7_StrongSORT_OSNet> python track.py --source .\test.mp4 --strong-sort-weights osnet_x0_25_market1501.pt
D:\Github\Yolov7_StrongSORT_OSNet\strong_sort/deep/reid\torchreid\metrics\rank.py:11: UserWarning: Cython evaluation (very fast so highly recommended) is unavailable, now use python evaluation.
  warnings.warn(
Fusing layers...
RepConv.fuse_repvgg_block
RepConv.fuse_repvgg_block
RepConv.fuse_repvgg_block
Model: osnet_x0_25
Environment
v1.0
osnet_x0_25_market1501
Windows 10 64-bit
Python 3.8
PyTorch 1.7.1 + cu101
Minimal Reproducible Example
python track.py --source .\test.mp4 --strong-sort-weights osnet_x0_25_market1501.pt