🐼 Panda-70M

This is the official GitHub repository of Panda-70M.

Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Tsai-Shien Chen, Aliaksandr Siarohin, Willi Menapace, Ekaterina Deyneka, Hsiang-wei Chao, Byung Eun Jeon, Yuwei Fang, Hsin-Ying Lee, Jian Ren, Ming-Hsuan Yang, Sergey Tulyakov
Computer Vision and Pattern Recognition (CVPR) 2024

Introduction

Panda-70M is a large-scale dataset with 70M high-quality video-caption pairs. This repository has three sections:

Dataset Dataloading includes the CSV files listing the data of Panda-70M and the code to download the dataset.
Splitting includes the code to split a long video into multiple semantics-consistent short clips.
Captioning includes the proposed video captioning model trained on Panda-70M.

Dataset

Collection Pipeline

Download

Split	Download	# Source Videos	# Samples	Video Duration	Storage Space
Training (full)	link (2.01 GB)	3,779,763	70,723,513	167 khrs	~36 TB
Training (10M)	link (381 MB)	3,755,240	10,473,922	37.0 khrs	~8.0 TB
Training (2M)	link (86.5 MB)	800,000	2,400,000	7.56 khrs	~1.6 TB
Validation	link (803 KB)	2,000	6,000	18.5 hrs	~4.0 GB
Testing	link (803 KB)	2,000	6,000	18.5 hrs	~4.0 GB
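
As a rough sanity check on the table above, per-clip averages can be derived from the listed totals. The sketch below assumes that "khrs" means thousands of hours and that storage uses decimal units (1 TB = 1000 GB). The storage figures in the table are approximate, so the results are estimates only. The name `per_clip_stats` is illustrative and not part of this repository.

```python
# Figures copied from the table above: (samples, total hours, storage in GB).
# Assumption: "khrs" = 1000 hours and 1 TB = 1000 GB.
SPLITS = {
    "Training (full)": (70_723_513, 167_000, 36_000),
    "Training (10M)": (10_473_922, 37_000, 8_000),
    "Training (2M)": (2_400_000, 7_560, 1_600),
    "Validation": (6_000, 18.5, 4.0),
    "Testing": (6_000, 18.5, 4.0),
}


def per_clip_stats(samples, hours, storage_gb):
    """Return (average seconds per clip, average megabytes per clip)."""
    seconds = hours * 3600 / samples
    megabytes = storage_gb * 1000 / samples
    return seconds, megabytes


for name, row in SPLITS.items():
    secs, mb = per_clip_stats(*row)
    print(f"{name:16s} ~{secs:5.1f} s/clip  ~{mb:4.2f} MB/clip")
```

Under these assumptions, every split averages roughly 8 to 13 seconds and well under 1 MB per clip, which is useful when budgeting disk space for a partial download.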

More details can be found in the Dataset Dataloading section.
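
Before running a full download, it can help to inspect one of the CSV files listed above. The following is a minimal sketch that reads only the header and the first few rows, so it stays cheap even for the 2.01 GB full training file. It makes no assumption about the column names; `peek_csv` and the file path are hypothetical and not part of this repository.

```python
import csv
from itertools import islice


def peek_csv(path, n_rows=3):
    """Return (header, first n_rows data rows) without reading the whole file."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])          # empty list if the file is empty
        rows = list(islice(reader, n_rows))  # stream only the rows we need
    return header, rows
```

For example, `header, rows = peek_csv("path/to/split.csv")` returns the column names and three sample rows, where the path is a placeholder for wherever the downloaded CSV was saved.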

Demonstration

Video-Caption Pairs in Panda-70M


A rhino and a lion are fighting in the dirt.	A person is holding a long haired dachshund in their arms.	A rocket launches into space on the launch pad.


A person is kneading dough and putting jam on it.	A little boy is playing with a basketball in the city.	A 3d rendering of a zoo with animals and a train.


A person in blue gloves is connecting an electrical supply to an injector.	There is a beach with waves and rocks in the foreground, and a city skyline in the background.	It is a rally car driving on a dirt road in the countryside, with people watching from the side of the road.

**We will remove video samples from our dataset / GitHub / project webpage / technical presentation upon request. Please contact tsaishienchen at gmail dot com to make a request.

Please check here for more samples.

Long Video Splitting and Captioning

https://github.com/snap-research/Panda-70M/assets/3857997/8144cf3d-c20c-4c18-a4bd-011451da9f9b

https://github.com/snap-research/Panda-70M/assets/3857997/b262128e-2152-41e8-873e-db2dc275c40f

License of Panda-70M

See license. The video samples are collected from a publicly available dataset. Users must follow the related license to use these video samples.

Citation

If you find this project useful for your research, please cite our paper. :blush:

@article{chen2024panda70m,
    title   = {Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers},
    author  = {Chen, Tsai-Shien and Siarohin, Aliaksandr and Menapace, Willi and Deyneka, Ekaterina and Chao, Hsiang-wei and Jeon, Byung Eun and Fang, Yuwei and Lee, Hsin-Ying and Ren, Jian and Yang, Ming-Hsuan and Tulyakov, Sergey},
    journal = {arXiv preprint arXiv:2402.19479},
    year    = {2024}
}

Contact Information

Tsai-Shien Chen: tsaishienchen@gmail.com
