yjh0410 commented 1 year ago

Thanks for open-sourcing YOWO, a real-time method for spatio-temporal action detection. I recently followed this repo to reimplement YOWO and achieved better performance, as shown in the tables below. I call this version YOWO-Plus. We also designed an efficient variant, YOWO-Nano, whose 3D backbone is the 3D-ShuffleNet-v2-1.0x proposed by the authors of YOWO. My code is available at https://github.com/yjh0410/PyTorch_YOWO.

Improvement

Better 2D backbone: We use the YOLOv2 weights from my project. Our YOLOv2 achieves a significantly higher AP (27 AP with 416 input) on the COCO dataset.
Better label assignment: For each ground truth, we assign every anchor box whose IoU with it is higher than the threshold 0.5, so a ground truth may be assigned multiple anchor boxes.
Better loss: We use GIoU loss as the box regression loss. The confidence loss and classification loss are the same as the ones used in YOWO. Finally, all the losses are normalized by the batch size.
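
For readers who want to see the label assignment and GIoU loss concretely, here is a minimal pure-Python sketch. It is not taken from the PyTorch_YOWO code. The function names (`iou_and_giou`, `assign_anchors`, `giou_loss`), the `(x1, y1, x2, y2)` box format, and the way anchors are placed relative to the ground truth are all assumptions made for illustration. The post also does not say what happens when no anchor exceeds the threshold, so the sketch simply returns an empty list in that case.

```python
def box_area(box):
    """Area of an (x1, y1, x2, y2) box; degenerate boxes have area 0."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def iou_and_giou(a, b):
    """Return (IoU, GIoU) for two (x1, y1, x2, y2) boxes."""
    # Intersection rectangle (clamped to zero when the boxes do not overlap).
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = box_area(a) + box_area(b) - inter
    iou = inter / union if union > 0 else 0.0

    # Smallest axis-aligned box enclosing both inputs.
    enclose = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    if enclose <= 0:
        return iou, iou
    # GIoU penalizes the empty space inside the enclosing box.
    giou = iou - (enclose - union) / enclose
    return iou, giou


def assign_anchors(gt_box, anchors, iou_thresh=0.5):
    """Indices of all anchors whose IoU with the ground truth exceeds the threshold.

    Hypothetical helper: several anchors can be positive for one ground truth.
    """
    return [i for i, anchor in enumerate(anchors)
            if iou_and_giou(gt_box, anchor)[0] > iou_thresh]


def giou_loss(pred_box, gt_box):
    """Box regression loss 1 - GIoU, which lies in [0, 2]."""
    return 1.0 - iou_and_giou(pred_box, gt_box)[1]


if __name__ == "__main__":
    gt = (0.0, 0.0, 10.0, 10.0)
    anchors = [(0.0, 0.0, 10.0, 10.0), (0.0, 0.0, 10.0, 8.0),
               (0.0, 0.0, 4.0, 4.0), (20.0, 20.0, 30.0, 30.0)]
    print(assign_anchors(gt, anchors))   # indices of the positive anchors
    print(giou_loss((1.0, 1.0, 3.0, 3.0), (0.0, 0.0, 2.0, 2.0)))
```

Unlike a plain IoU loss, the GIoU term still gives a useful signal when the predicted box and the ground truth do not overlap at all, because the enclosing-box penalty grows as the boxes move apart.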

Experiment

UCF101-24

Model	Clip	GFLOPs	Frame mAP	Video mAP	FPS	Weight
YOWO	16	43.8	80.4	48.8	-	-
YOWO-Plus	16	43.8	84.9	50.5	36	github
YOWO-Nano	16	6.0	81.0	49.7	91	github

AVA v2.2

Model	Clip	mAP	FPS	Weight
YOWO	16	17.9	31	-
YOWO	32	19.1	23	-
YOWO-Plus	16	20.6	33	github
YOWO-Plus	32	21.6	25	github
YOWO-Nano	16	18.4	100	github
YOWO-Nano	32	19.5	95	github

jaca-pereira commented 1 year ago

Hello! Do you plan on adding support for more resource efficient networks in the 3D backbone?

yjh0410 commented 1 year ago

@jaca-pereira I have added 3D-ShuffleNet-v2 to my repo. I will post the performance of YOWO with an efficient 3D backbone in the future.

yjh0410 commented 1 year ago

@jaca-pereira Hi! Dear friend, I recently released YOWO-Nano, whose 3D backbone is 3D-ShuffleNet-v2.

jaca-pereira commented 1 year ago

Hello! Thank you very much, I'll check it out.

wei-tim / YOWO

A stronger YOWO achieved by us. #90
