Keyword: lidar
Investigating Spherical Epipolar Rectification for Multi-View Stereo 3D Reconstruction
Abstract
Multi-view stereo (MVS) reconstruction is essential for creating 3D models. The approach involves applying epipolar rectification followed by dense matching for disparity estimation. However, existing approaches struggle to apply dense matching to images taken from widely different viewpoints, primarily because of large differences in object scale. In this paper, we propose a spherical model for epipolar rectification to minimize distortions caused by differences in principal rays. We evaluate the proposed approach on two aerial datasets acquired with multi-camera head systems. Through qualitative and quantitative evaluation, we show that the proposed approach outperforms frame-based epipolar rectification, improving the completeness of point clouds by up to 4.05% and the accuracy by up to 10.23%, using LiDAR data as ground truth.
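The core of such a spherical rectification is to re-parameterise viewing rays on the unit sphere so that corresponding points end up on the same epipolar great circle. The snippet below is a minimal, illustrative sketch of that mapping, assuming known pinhole intrinsics K and a rotation R_align that takes camera rays into a frame whose x-axis coincides with the baseline; it is not the authors' implementation.

```python
import numpy as np

def spherical_rectify_pixel(px, K, R_align):
    """Map a pixel to angles on the unit sphere so that epipolar curves
    become lines of constant `phi`.

    px      : (u, v) pixel coordinates
    K       : 3x3 pinhole intrinsics (assumed known)
    R_align : rotation taking camera rays into a frame whose x-axis
              coincides with the stereo baseline
    """
    u, v = px
    ray = R_align @ (np.linalg.inv(K) @ np.array([u, v, 1.0]))  # pixel -> viewing ray
    ray /= np.linalg.norm(ray)                                  # point on the unit sphere
    phi = np.arctan2(ray[2], ray[1])                      # which epipolar plane (shared by matches)
    theta = np.arctan2(np.hypot(ray[1], ray[2]), ray[0])  # position along the epipolar great circle
    return phi, theta

# Example: map the image centre of a camera with 500 px focal length.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
print(spherical_rectify_pixel((320.0, 240.0), K, np.eye(3)))
```

Dense matching then only needs to search along theta at a fixed phi, which is what removes the scale-dependent distortions of frame-based rectification.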
Constrained Bundle Adjustment for Structure From Motion Using Uncalibrated Multi-Camera Systems
Abstract
Structure from motion using uncalibrated multi-camera systems is a challenging task. This paper proposes a bundle adjustment solution that incorporates a baseline constraint reflecting the fact that the cameras are rigidly fixed with respect to each other. We assume these cameras are mounted on a mobile platform, uncalibrated, and coarsely synchronized. To this end, we formulate a baseline constraint for the scenario in which the cameras have overlapping views, and incorporate it into the bundle adjustment to keep the relative motion of the cameras fixed. Experiments were conducted using video frames from two collocated GoPro cameras mounted on a vehicle, without any system calibration and positioned to capture overlapping content. We performed bundle adjustment with the proposed constraint and then produced dense 3D point clouds, which were evaluated against LiDAR reference data. Compared to traditional bundle adjustment, the proposed method achieved an improvement of 29.38%.
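As an illustration of how such a rigidity constraint can enter the adjustment, the sketch below adds one extra residual per epoch that penalises deviation of the estimated camera separation from a shared baseline length. The function name, weighting, and parameterisation are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def baseline_residuals(cam_centers_a, cam_centers_b, baseline_len, weight=1.0):
    """Extra bundle-adjustment residuals keeping two rigidly mounted cameras
    at a constant separation.

    cam_centers_a, cam_centers_b : (N, 3) estimated centres of cameras A and B
                                   at the same N (coarsely synchronised) epochs
    baseline_len                 : shared baseline length, itself a parameter
                                   the adjustment may refine
    Returns one residual per epoch: weight * (|C_a - C_b| - baseline_len).
    """
    separations = np.linalg.norm(cam_centers_a - cam_centers_b, axis=1)
    return weight * (separations - baseline_len)

# These residuals are stacked with the usual reprojection residuals inside a
# nonlinear least-squares solver, e.g. scipy.optimize.least_squares.
```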
A Credible and Robust approach to Ego-Motion Estimation using an Automotive Radar
Authors: Karim Haggag, Sven Lange, Tim Pfeifer, Peter Protzel
Abstract
Consistent motion estimation is fundamental for all mobile autonomous systems. While this sounds like an easy task, it often is not, because changing environmental conditions affect odometry obtained from vision, LiDAR, or the wheels themselves. Insensitive to challenging lighting and weather conditions, radar sensors are an obvious alternative. Usually, automotive radars return a sparse point cloud representing the surroundings. Utilizing this information for motion estimation is challenging due to unstable and phantom measurements, which result in a high rate of outliers. We introduce a credible and robust probabilistic approach to estimate the ego-motion based on these challenging radar measurements, intended to be used within a loosely coupled sensor fusion framework. Evaluated on the popular nuScenes dataset and others, we show that our proposed algorithm is more credible than existing solutions while not depending on explicit correspondence calculation.
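One common way to use such measurements without explicit correspondences is to fit the ego-velocity directly to the per-detection Doppler readings with a robust loss; the sketch below illustrates that generic idea for the planar case and is not the authors' probabilistic formulation.

```python
import numpy as np
from scipy.optimize import least_squares

def estimate_ego_velocity(azimuths, radial_velocities):
    """Estimate planar ego-velocity (vx, vy) from one automotive radar scan.

    For a static target at azimuth a, the measured Doppler velocity is
    approximately -(vx*cos(a) + vy*sin(a)).  Moving objects and phantom
    detections violate this model, so a robust (Cauchy) loss is used to
    down-weight them as outliers.
    """
    def residuals(v):
        predicted = -(v[0] * np.cos(azimuths) + v[1] * np.sin(azimuths))
        return predicted - radial_velocities

    return least_squares(residuals, x0=np.zeros(2), loss="cauchy", f_scale=0.5).x

# Example with a synthetic scan: 10 m/s forward motion plus a few outliers.
az = np.linspace(-np.pi / 3, np.pi / 3, 50)
vr = -(10.0 * np.cos(az))             # static world seen from a moving sensor
vr[::10] += 5.0                       # phantom / moving-object returns
print(estimate_ego_velocity(az, vr))  # approximately [10, 0]
```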
Metasurface-enhanced Light Detection and Ranging Technology
Authors: Renato Juliano Martins, Emil Marinov, M. Aziz Ben Youssef, Christina Kyrou, Mathilde Joubert, Constance Colmagro, Valentin Gâté, Colette Turbil, Pierre-Marie Coulon, Daniel Turover, Samira Khadir, Massimo Giudici, Charalambos Klitis, Marc Sorel, Patrice Genevet
Subjects: Robotics (cs.RO); Instrumentation and Detectors (physics.ins-det); Optics (physics.optics)
Abstract
Deploying advanced imaging solutions to robotic and autonomous systems by mimicking human vision requires the simultaneous acquisition of multiple fields of view, namely the peripheral and foveal regions. The low-resolution peripheral field provides coarse scene exploration that directs the eye to focus on a highly resolved foveal region for sharp imaging. Among 3D computer vision techniques, Light Detection and Ranging (LiDAR) is currently considered at the industrial level for robotic vision. LiDAR is an imaging technique that monitors pulses of light at optical frequencies to sense space and recover three-dimensional ranging information. Notwithstanding the efforts on LiDAR integration and optimization, commercially available devices have slow frame rates and low image resolution, notably limited by the performance of mechanical or slow solid-state deflection systems. Metasurfaces (MS) are versatile optical components that can distribute optical power into desired regions of space. Here, we report on an advanced LiDAR technology that uses ultrafast low-FoV deflectors cascaded with large-area metasurfaces to achieve a large FoV and simultaneous peripheral and central imaging zones. This technology achieves MHz frame rates for 2D imaging, and up to kHz for 3D imaging, with an extremely large FoV (up to 150° on both vertical and horizontal scanning axes). Combined with advanced learning algorithms, this disruptive LiDAR technology offers prospects for further improving the perception capabilities and decision-making of autonomous vehicles and robotic systems.
Keyword: loop detection
There is no result
Keyword: autonomous driving
There is no result
Keyword: mapping
TemporalUV: Capturing Loose Clothing with Temporally Coherent UV Coordinates
Authors: You Xie, Huiqi Mao, Angela Yao, Nils Thuerey
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
We propose a novel approach to generate temporally coherent UV coordinates for loose clothing. Our method is not constrained by human body outlines and can capture loose garments and hair. We implemented a differentiable pipeline to learn UV mapping between a sequence of RGB inputs and textures via UV coordinates. Instead of treating the UV coordinates of each frame separately, our data generation approach connects all UV coordinates via feature matching for temporal stability. Subsequently, a generative model is trained to balance the spatial quality and temporal stability. It is driven by supervised and unsupervised losses in both UV and image spaces. Our experiments show that the trained models output high-quality UV coordinates and generalize to new poses. Once a sequence of UV coordinates has been inferred by our model, it can be used to flexibly synthesize new looks and modified visual styles. Compared to existing methods, our approach reduces the computational workload to animate new outfits by several orders of magnitude.
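The differentiable link between predicted UV coordinates and image-space losses in such a pipeline is a texture lookup that can be backpropagated through. The sketch below illustrates this with PyTorch's grid_sample; the tensor shapes and the identity-UV example are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def render_from_uv(texture, uv):
    """Differentiably resample a texture with per-pixel UV coordinates.

    texture : (1, 3, Ht, Wt) RGB texture
    uv      : (1, H, W, 2) UV coordinates in [-1, 1], as predicted per frame
    Returns : (1, 3, H, W) rendered image; because grid_sample is
    differentiable, losses in image space propagate back to the UVs.
    """
    return F.grid_sample(texture, uv, mode="bilinear", align_corners=True)

# Example: look up a 256x256 texture with identity UVs for a 128x128 frame.
tex = torch.rand(1, 3, 256, 256)
ys, xs = torch.meshgrid(torch.linspace(-1, 1, 128),
                        torch.linspace(-1, 1, 128), indexing="ij")
uv = torch.stack([xs, ys], dim=-1).unsqueeze(0)
print(render_from_uv(tex, uv).shape)  # torch.Size([1, 3, 128, 128])
```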
Canonical Mean Filter for Almost Zero-Shot Multi-Task classification
Authors: Yong Li, Heng Wang, Xiang Ye
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The support set is key to providing a conditional prior for fast adaptation of the model in few-shot tasks. However, the strict form of the support set makes its construction difficult in practical applications. Motivated by ANIL, we rethink the role of adaptation in the feature extractor of CNAPs, a state-of-the-art representative few-shot method. To investigate this role, the Almost Zero-Shot (AZS) task is designed by fixing the support set, replacing the common scheme that provides a task-specific support set as the conditional prior for each task. The AZS experiments indicate that adaptation contributes little in the feature extractor. However, CNAPs is not robust to randomly selected support sets and performs poorly on some datasets of Meta-Dataset, because the simple mean operator produces scattered mean embeddings. To enhance the robustness of CNAPs, a Canonical Mean Filter (CMF) module is proposed to make the mean embeddings compact and stable in feature space by mapping the support sets into a canonical form. CMF makes CNAPs robust to any fixed support set, even a random matrix. This property allows CNAPs to remove the mean encoder and the parameter adaptation network at the test stage, while CNAP-CMF on AZS tasks matches its one-shot performance, yielding a substantial parameter reduction: 40.48% of parameters are dropped at the test stage. Also, CNAP-CMF outperforms CNAPs on one-shot tasks because it addresses unstable within-task performance. Classification performance, visualization, and clustering results verify that CMF makes CNAPs better and simpler.
Ontology Matching Through Absolute Orientation of Embedding Spaces
Authors: Jan Portisch, Guilherme Costa, Karolin Stefani, Katharina Kreplin, Michael Hladik, Heiko Paulheim
Abstract
Ontology matching is a core task when creating interoperable and linked open datasets. In this paper, we explore a novel structure-based mapping approach based on knowledge graph embeddings: the ontologies to be matched are embedded, and an approach known as absolute orientation is used to align the two embedding spaces. Alongside the approach, the paper presents a first, preliminary evaluation using synthetic and real-world datasets. In experiments with synthetic data, we find that the approach works very well on similarly structured graphs; it handles alignment noise better than it handles size and structural differences between the ontologies.
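The absolute-orientation step itself is the classical SVD-based (orthogonal Procrustes) alignment computed over a set of anchor pairs. The sketch below shows that textbook computation; the centring and the absence of a scaling step are assumptions made here for illustration, not necessarily the paper's exact setup.

```python
import numpy as np

def absolute_orientation(source, target):
    """Orthogonal map aligning one embedding space to another (Procrustes).

    source, target : (N, d) embeddings of N anchor concepts known to match
    Returns the d x d orthogonal matrix R minimising ||source @ R - target||_F
    after centring both sets; new source embeddings are then mapped into the
    target space with (x - mu_s) @ R + mu_t.
    """
    mu_s, mu_t = source.mean(0), target.mean(0)
    u, _, vt = np.linalg.svd((source - mu_s).T @ (target - mu_t))
    return u @ vt, mu_s, mu_t

# Example: recover a random orthogonal map applied to a toy embedding space.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))   # ground-truth orthogonal map
R, mu_s, mu_t = absolute_orientation(X, X @ Q)
print(np.allclose((X - mu_s) @ R + mu_t, X @ Q))  # True
```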
C-NMT: A Collaborative Inference Framework for Neural Machine Translation
Abstract
Collaborative Inference (CI) optimizes the latency and energy consumption of deep learning inference through the inter-operation of edge and cloud devices. Albeit beneficial for other tasks, CI has never been applied to the sequence-to-sequence mapping problem at the heart of Neural Machine Translation (NMT). In this work, we address the specific issues of collaborative NMT, such as estimating the latency required to generate the (unknown) output sequence, and show how existing CI methods can be adapted to these applications. Our experiments show that CI can reduce the latency of NMT by up to 44% compared to a non-collaborative approach.
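The central decision in such a collaborative setup is where to execute each request, which hinges on predicting the unknown output length before decoding. The snippet below is a deliberately simplified, hypothetical policy illustrating that trade-off; the latency model and the linear length predictor are assumptions, not the paper's estimators.

```python
def choose_device(n_input_tokens, edge_ms_per_token, cloud_ms_per_token,
                  network_ms, length_ratio=1.1):
    """Pick edge or cloud execution for one translation request.

    The output length is unknown before decoding, so it is predicted here as a
    simple linear function of the input length (length_ratio); a real
    estimator can be more elaborate.
    """
    predicted_output = length_ratio * n_input_tokens
    edge_latency = edge_ms_per_token * predicted_output
    cloud_latency = network_ms + cloud_ms_per_token * predicted_output
    return "edge" if edge_latency <= cloud_latency else "cloud"

# Short sentences stay on the edge; long ones amortise the network round-trip.
print(choose_device(8, edge_ms_per_token=30, cloud_ms_per_token=5, network_ms=400))   # edge
print(choose_device(80, edge_ms_per_token=30, cloud_ms_per_token=5, network_ms=400))  # cloud
```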
Keyword: localization
Spatiotemporal Augmentation on Selective Frequencies for Video Representation Learning
Authors: Jinhyung Kim, Taeoh Kim, Minho Shim, Dongyoon Han, Dongyoon Wee, Junmo Kim
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Recent self-supervised video representation learning methods focus on maximizing the similarity between multiple augmented views from the same video and largely rely on the quality of generated views. In this paper, we propose frequency augmentation (FreqAug), a spatio-temporal data augmentation method in the frequency domain for video representation learning. FreqAug stochastically removes undesirable information from the video by filtering out specific frequency components so that learned representation captures essential features of the video for various downstream tasks. Specifically, FreqAug pushes the model to focus more on dynamic features rather than static features in the video via dropping spatial or temporal low-frequency components. In other words, learning invariance between remaining frequency components results in high-frequency enhanced representation with less static bias. To verify the generality of the proposed method, we experiment with FreqAug on multiple self-supervised learning frameworks along with standard augmentations. Transferring the improved representation to five video action recognition and two temporal action localization downstream tasks shows consistent improvements over baselines.
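As a concrete illustration of the kind of frequency-domain filtering described, the sketch below removes low temporal frequencies from a clip with an FFT along the time axis. The cutoff and masking scheme are assumptions for illustration, not necessarily the paper's exact FreqAug procedure.

```python
import torch

def drop_temporal_low_freq(video, cutoff=2):
    """Remove low temporal frequencies from a video clip (FreqAug-style sketch).

    video  : (C, T, H, W) tensor
    cutoff : number of lowest temporal frequency bins to suppress
             (the DC bin is kept so overall brightness is preserved)
    """
    spec = torch.fft.fft(video, dim=1)          # FFT along the time axis
    freqs = torch.fft.fftfreq(video.shape[1])   # signed frequency per bin
    low = (freqs.abs() < cutoff / video.shape[1]) & (freqs != 0)
    spec[:, low] = 0                            # zero out low (non-DC) temporal bins
    return torch.fft.ifft(spec, dim=1).real

clip = torch.rand(3, 16, 112, 112)
print(drop_temporal_low_freq(clip).shape)  # torch.Size([3, 16, 112, 112])
```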
Keyword: SLAM
There is no result
Keyword: Visual inertial
There is no result
Keyword: livox
There is no result
Keyword: loam
There is no result
Keyword: Visual inertial odometry
There is no result
Keyword: lidar
Investigating Spherical Epipolar Rectification for Multi-View Stereo 3D Reconstruction
Constrained Bundle Adjustment for Structure From Motion Using Uncalibrated Multi-Camera Systems
A Credible and Robust approach to Ego-Motion Estimation using an Automotive Radar
Metasurface-enhanced Light Detection and Ranging Technology
Keyword: loop detection
There is no result
Keyword: autonomous driving
There is no result
Keyword: mapping
TemporalUV: Capturing Loose Clothing with Temporally Coherent UV Coordinates
Canonical Mean Filter for Almost Zero-Shot Multi-Task classification
Ontology Matching Through Absolute Orientation of Embedding Spaces
C-NMT: A Collaborative Inference Framework for Neural Machine Translation
Keyword: localization
Spatiotemporal Augmentation on Selective Frequencies for Video Representation Learning