Eliminating Warping Shakes for Unsupervised Online Video Stitching
We have released the complete code of StabStitch++ (an extension of StabStitch) with better alignment, fewer distortions, and higher stability. It contains the code for training, inference, and multi-video stitching.
This is the official implementation for StabStitch (ECCV2024).
Lang Nie1, Chunyu Lin1, Kang Liao2, Yun Zhang3, Shuaicheng Liu4, Rui Ai5, Yao Zhao1
1 Beijing Jiaotong University {nielang, cylin, yzhao}@bjtu.edu.cn
2 Nanyang Technological University
3 Communication University of Zhejiang
4 University of Electronic Science and Technology of China
5 HAMO.AI
Feature
Nowadays, videos captured by hand-held cameras are typically stable thanks to the advancement and widespread adoption of video stabilization in both hardware and software. Under these circumstances, we retarget video stitching to an emerging issue, warping shake, which describes the undesired content instability in non-overlapping regions, especially when image stitching technology is applied directly to videos. To address it, we propose the first unsupervised online video stitching framework, named StabStitch, which generates stitching trajectories and smooths them. The figure above shows the occurrence and elimination of warping shakes.
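As a rough intuition for the smoothing step, the sketch below low-pass filters warp trajectories over time with an off-the-shelf Gaussian filter. This is a minimal illustration, not the paper's method (StabStitch learns the smoothed warps with an unsupervised network), and the array layout here is an assumption:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def smooth_trajectories(traj, sigma=3.0):
    """Low-pass filter warp trajectories along the temporal axis.

    traj: (T, H, W, 2) array of per-frame mesh-vertex motions
          (a hypothetical layout, not the repository's actual format).
    """
    # Filter only along axis 0 (time), leaving the spatial layout untouched.
    return gaussian_filter1d(traj, sigma=sigma, axis=0)
```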
Video
Here, we provide a video (released on YouTube) showing the stitched results from StabStitch and other solutions.
The details of the dataset can be found in our paper. (arXiv)
The dataset is available at Google Drive or Baidu Cloud (extraction code: 1234).
We implement StabStitch on a single RTX4090Ti GPU. Refer to environment.yml for more details.
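If you use conda, the environment can typically be created from that file with standard conda usage (the environment name is defined inside environment.yml itself):

conda env create -f environment.yml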
The pre-trained models (spatial_warp.pth, temporal_warp.pth, and smooth_warp.pth) are available at Google Drive or Baidu Cloud (extraction code: 1234). Please download them and put them in the 'model' folder.
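As a quick sanity check that the downloads are in place, a minimal PyTorch snippet like the one below can load the three checkpoints. The relative paths assume you run it from the repository root, and the state-dict structure depends on the network definitions in the code:

```python
import torch

# Verify that the three pre-trained checkpoints load correctly.
# Paths assume the 'model' folder sits at the repository root.
for name in ['spatial_warp.pth', 'temporal_warp.pth', 'smooth_warp.pth']:
    state = torch.load('model/' + name, map_location='cpu')
    print(name, 'loaded,', len(state), 'top-level entries')
```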
Modify the test_path in Codes/test_online.py and run:
python test_online.py
Then, a folder named 'result' will be created automatically to store the stitched videos.
For the TPS warping function, we provide two modes to warp the frames. You can change the mode here.
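For illustration only, such a mode switch typically looks like the snippet below. The flag names and their semantics here are hypothetical; check the warping code in the repository for the real option:

```python
# Hypothetical flag names -- the real ones live in the repository's warping code.
FAST_MODE = 'fast'      # e.g., trades some quality/memory for speed (assumption)
NORMAL_MODE = 'normal'  # e.g., the default, higher-quality warping (assumption)

mode = NORMAL_MODE  # change the mode here
```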
Modify the test_path in Codes/test_metric.py and run:
python test_metric.py
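For reference, alignment quality in stitching is commonly scored with PSNR/SSIM restricted to the overlapping region. The sketch below shows a masked PSNR in that spirit; it is an illustration, not the exact metric code in test_metric.py:

```python
import numpy as np

def masked_psnr(img1, img2, mask):
    """PSNR between two uint8 images, restricted to a binary overlap mask."""
    diff = (img1.astype(np.float64) - img2.astype(np.float64)) ** 2
    mse = diff[mask > 0].mean()  # average only over overlapping pixels
    if mse == 0:
        return float('inf')
    return 10.0 * np.log10(255.0 ** 2 / mse)
```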
To test model generalization, we adopt the pre-trained model (trained on the StabStitch-D dataset) to conduct tests on traditional video stitching datasets. Surprisingly, performance severely degrades, with obvious distortions and artifacts, as illustrated in Figure (a) below. To further validate this, we collect additional video pairs from traditional video stitching datasets (over 30 video pairs) and retrain our model on the new dataset. As shown in Figure (b) below, the retrained model works well on the new dataset but fails to produce natural stitched videos on the StabStitch-D dataset.
We find that the performance degradation mainly occurs in the spatial warp model. Without accurate spatial warps, the subsequent smoothing process amplifies the distortion.
This raises the question of how to ensure generalization in learning-based stitching models. A simple and intuitive idea is to establish a large-scale real-world stitching benchmark dataset covering various complex scenes, which should benefit the generalization of various stitching networks. Another idea is to apply continual learning to stitching, enabling the network to work robustly across datasets with different distributions.
These are just a few simple proposals. We hope the intelligent minds in this field can help solve this problem and contribute to the advancement of the field. If you have ideas and want to discuss them with me, please feel free to drop me an email. I am open to any kind of collaboration.
If you have any questions about this project, please feel free to drop me an email.
NIE Lang -- nielang@bjtu.edu.cn
@article{nie2024eliminating,
title={Eliminating Warping Shakes for Unsupervised Online Video Stitching},
author={Nie, Lang and Lin, Chunyu and Liao, Kang and Zhang, Yun and Liu, Shuaicheng and Zhao, Yao},
journal={arXiv preprint arXiv:2403.06378},
year={2024}
}