Abstract
While lifelong SLAM addresses the capability of a robot to adapt to changes within a single environment over time, in this paper we introduce the task of continual SLAM. Here, a robot is deployed sequentially in a variety of different environments and has to transfer its knowledge of previously experienced environments to thus far unseen environments, while avoiding catastrophic forgetting. This is particularly relevant in the context of vision-based approaches, where the relevant features vary widely between different environments. We propose a novel approach for solving the continual SLAM problem by introducing CL-SLAM. Our approach consists of a dual-network architecture that handles both short-term adaptation and long-term memory retention by incorporating a replay buffer. Extensive evaluations of CL-SLAM in three different environments demonstrate that it outperforms several baselines inspired by existing continual learning-based visual odometry methods. The code of our work is publicly available at this http URL
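To illustrate the long-term memory component mentioned above, here is a minimal, hypothetical sketch of a replay buffer and the dual-network training idea. The class and variable names ("expert", "generalizer", reservoir sampling) are our assumptions for illustration, not the released CL-SLAM code.

```python
import random

class ReplayBuffer:
    """Reservoir-style buffer that keeps samples from all previously seen
    environments so they stay represented during later deployments (sketch)."""
    def __init__(self, capacity=10000):
        self.capacity, self.data, self.seen = capacity, [], 0

    def add(self, sample):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(sample)
        else:
            j = random.randrange(self.seen)   # reservoir sampling
            if j < self.capacity:
                self.data[j] = sample

    def sample(self, k):
        return random.sample(self.data, min(k, len(self.data)))

# dual-network idea (names are ours): an "expert" adapts online to the current
# environment only, while a "generalizer" is updated on a mix of current and
# replayed samples so knowledge of earlier environments is not forgotten.
#   expert_step(current_batch)
#   generalizer_step(current_batch + buffer.sample(k))
```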
STUN: Self-Teaching Uncertainty Estimation for Place Recognition
Authors: Kaiwen Cai, Chris Xiaoxuan Lu, Xiaowei Huang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Abstract
Place recognition is key to Simultaneous Localization and Mapping (SLAM) and spatial perception. However, place recognition in the wild often suffers from erroneous predictions due to image variations, e.g., changing viewpoints and street appearance. Integrating uncertainty estimation into the life cycle of place recognition is a promising way to mitigate the impact of such variations on place recognition performance. However, existing uncertainty estimation approaches in this vein are either computationally inefficient (e.g., Monte Carlo dropout) or come at the cost of reduced accuracy. This paper proposes STUN, a self-teaching framework that learns to simultaneously predict the place and estimate the prediction uncertainty given an input image. To this end, we first train a teacher net using a standard metric learning pipeline to produce embedding priors. Then, supervised by the pretrained teacher net, a student net with an additional variance branch is trained to fine-tune the embedding priors and estimate the uncertainty sample by sample. During the online inference phase, we only use the student net to generate a place prediction together with its uncertainty. Compared with place recognition systems that ignore uncertainty, our framework provides uncertainty estimation for free without sacrificing any prediction accuracy. Our experimental results on the large-scale Pittsburgh30k dataset demonstrate that STUN outperforms the state-of-the-art methods in both recognition accuracy and the quality of uncertainty estimation.
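As a rough illustration of the self-teaching scheme described above, the following PyTorch sketch adds a variance branch to a student network and trains it against frozen teacher embeddings with a Gaussian negative-log-likelihood style loss. Dimensions, module names, and the exact loss are assumptions on our part, not the authors' implementation.

```python
import torch
import torch.nn as nn

class StudentNet(nn.Module):
    """Backbone with a mean (embedding) branch and a variance branch (sketch)."""
    def __init__(self, feat_dim=512, emb_dim=256):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU())
        self.mu_head = nn.Linear(feat_dim, emb_dim)       # place embedding
        self.log_var_head = nn.Linear(feat_dim, emb_dim)  # per-dimension uncertainty

    def forward(self, x):
        h = self.backbone(x)
        return self.mu_head(h), self.log_var_head(h)

def self_teaching_loss(mu, log_var, teacher_emb):
    """Gaussian NLL against the frozen teacher embedding: a large variance
    down-weights hard samples but is penalized by the log term."""
    return (((mu - teacher_emb) ** 2) * torch.exp(-log_var) + log_var).mean()

# toy training step (the teacher is pretrained and frozen)
student = StudentNet()
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)
x = torch.randn(8, 512)            # image features for a mini-batch
teacher_emb = torch.randn(8, 256)  # embeddings produced by the teacher net
mu, log_var = student(x)
loss = self_teaching_loss(mu, log_var, teacher_emb)
loss.backward()
optimizer.step()
```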
Keyword: Visual inertial
There is no result
Keyword: livox
Intensity Image-based LiDAR Fiducial Marker System
Authors: Yibo Liu, Hunter Schofield, Jinjun Shan
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
Fiducial marker systems for LiDAR are crucial for many robotic applications, yet they remain rare to date. In this paper, an Intensity Image-based LiDAR Fiducial Marker (IILFM) system is developed. The system only requires an unstructured point cloud with intensity as input, and it places no restriction on marker placement and shape. A marker detection method that locates the predefined 3D fiducials in the point cloud through the intensity image is introduced. Then, an approach that utilizes the detected 3D fiducials to estimate the 6-DOF pose describing the transformation from the world coordinate system to the LiDAR coordinate system is developed. Moreover, all these processes run in real time (approx. 40 Hz on the Livox Mid-40 and approx. 143 Hz on the VLP-16). Qualitative and quantitative experiments demonstrate that the proposed system offers convenience and accuracy similar to conventional visual fiducial marker systems. The code and results are available at: https://github.com/York-SDCNLab/IILFM.
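A coarse sketch of the pipeline as we read it from the abstract: render an intensity image from the unordered point cloud, detect fiducial keypoints in that image (the detector itself is a placeholder below), map them back to 3D points, and recover the 6-DOF pose with an SVD-based (Kabsch) alignment. Projection parameters and function names are illustrative assumptions, not the released IILFM code.

```python
import numpy as np

def spherical_intensity_image(points, intensity, h=64, w=1024):
    """Project an unordered point cloud into an (h, w) intensity image.
    Also returns, per pixel, the index of the contributing point."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    yaw = np.arctan2(y, x)                                    # [-pi, pi]
    pitch = np.arcsin(z / (np.linalg.norm(points, axis=1) + 1e-9))
    u = ((yaw + np.pi) / (2 * np.pi) * (w - 1)).astype(int)
    v = ((pitch - pitch.min()) / (np.ptp(pitch) + 1e-9) * (h - 1)).astype(int)
    img = np.zeros((h, w))
    idx = -np.ones((h, w), dtype=int)
    img[v, u] = intensity
    idx[v, u] = np.arange(len(points))
    return img, idx

def kabsch(world_pts, lidar_pts):
    """Rigid transform (R, t) mapping world-frame fiducials onto LiDAR-frame ones."""
    cw, cl = world_pts.mean(0), lidar_pts.mean(0)
    H = (world_pts - cw).T @ (lidar_pts - cl)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                                  # reflection fix
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cl - R @ cw

# usage sketch: detect_marker_corners() is a placeholder for an AprilTag-style
# detector run on the intensity image; it would return pixel locations of fiducials.
#   lidar_corners = points[idx[detected_v, detected_u]]
#   R, t = kabsch(known_marker_corners_world, lidar_corners)
```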
Keyword: loam
There is no result
Keyword: Visual inertial odometry
There is no result
Keyword: lidar
Quantity over Quality: Training an AV Motion Planner with Large Scale Commodity Vision Data
Authors: Lukas Platinsky, Tayyab Naseer, Hui Chen, Ben Haines, Haoyue Zhu, Hugo Grimmett, Luca Del Pero
Abstract
With the Autonomous Vehicle (AV) industry shifting towards Autonomy 2.0, the performance of self-driving systems starts to rely heavily on large quantities of expert driving demonstrations. However, collecting this demonstration data typically involves expensive HD sensor suites (LiDAR + RADAR + cameras), which quickly becomes financially infeasible at the scales required. This motivates the use of commodity vision sensors for data collection, which are an order of magnitude cheaper than the HD sensor suites, but offer lower fidelity. If it were possible to leverage these for training an AV motion planner, observing the 'long tail' of driving events would become a financially viable strategy. As our main contribution we show it is possible to train a high-performance motion planner using commodity vision data which outperforms planners trained on HD-sensor data for a fraction of the cost. We do this by comparing the autonomy system performance when training on these two different sensor configurations, and showing that we can compensate for the lower sensor fidelity by means of increased quantity: a planner trained on 100h of commodity vision data outperforms one with 25h of expensive HD data. We also share the technical challenges we had to tackle to make this work. To the best of our knowledge, we are the first to demonstrate that this is possible using real-world data.
Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds
Abstract
3D single object tracking (3D SOT) in LiDAR point clouds plays a crucial role in autonomous driving. Current approaches all follow the Siamese paradigm based on appearance matching. However, LiDAR point clouds are usually textureless and incomplete, which hinders effective appearance matching. Moreover, previous methods greatly overlook the critical motion clues among targets. In this work, beyond 3D Siamese tracking, we introduce a motion-centric paradigm to handle 3D SOT from a new perspective. Following this paradigm, we propose a matching-free two-stage tracker, M^2-Track. In the 1st stage, M^2-Track localizes the target within successive frames via motion transformation. Then it refines the target box through motion-assisted shape completion in the 2nd stage. Extensive experiments confirm that M^2-Track significantly outperforms previous state-of-the-art methods on three large-scale datasets while running at 57 FPS (~8%, ~17%, and ~22% precision gains on KITTI, NuScenes, and the Waymo Open Dataset, respectively). Further analysis verifies each component's effectiveness and shows the motion-centric paradigm's promising potential when combined with appearance matching.
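To make the first, motion-centric stage concrete, here is a tiny sketch of how a previous target box could be propagated by a predicted inter-frame motion. The motion regressor and the stage-2 shape-completion refinement are omitted, and all names and numbers are illustrative assumptions rather than the paper's code.

```python
import numpy as np

def apply_relative_motion(box, motion):
    """Stage-1 idea from the abstract: propagate the previous target box by a
    predicted inter-frame rigid motion instead of appearance matching.
    box = (x, y, z, yaw); motion = (dx, dy, dz, dyaw) in the previous box frame."""
    x, y, z, yaw = box
    dx, dy, dz, dyaw = motion
    # translate in the box's own heading frame, then update the heading
    x_new = x + np.cos(yaw) * dx - np.sin(yaw) * dy
    y_new = y + np.sin(yaw) * dx + np.cos(yaw) * dy
    return (x_new, y_new, z + dz, yaw + dyaw)

# usage: the motion itself would come from a learned regressor over the two
# point-cloud segments; here we only illustrate the box update.
prev_box = (10.0, 2.0, -1.0, 0.3)
pred_motion = (0.8, 0.05, 0.0, 0.02)   # hypothetical network output
curr_box = apply_relative_motion(prev_box, pred_motion)
```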
Intensity Image-based LiDAR Fiducial Marker System
Authors: Yibo Liu, Hunter Schofield, Jinjun Shan
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
Fiducial marker systems for LiDAR are crucial for many robotic applications, yet they remain rare to date. In this paper, an Intensity Image-based LiDAR Fiducial Marker (IILFM) system is developed. The system only requires an unstructured point cloud with intensity as input, and it places no restriction on marker placement and shape. A marker detection method that locates the predefined 3D fiducials in the point cloud through the intensity image is introduced. Then, an approach that utilizes the detected 3D fiducials to estimate the 6-DOF pose describing the transformation from the world coordinate system to the LiDAR coordinate system is developed. Moreover, all these processes run in real time (approx. 40 Hz on the Livox Mid-40 and approx. 143 Hz on the VLP-16). Qualitative and quantitative experiments demonstrate that the proposed system offers convenience and accuracy similar to conventional visual fiducial marker systems. The code and results are available at: https://github.com/York-SDCNLab/IILFM.
Keyword: loop detection
There is no result
Keyword: autonomous driving
MUAD: Multiple Uncertainties for Autonomous Driving, a benchmark for multiple uncertainty types and tasks
Authors: Gianni Franchi, Xuanlong Yu, Andrei Bursuc, Rémi Kazmierczak, Séverine Dubuisson, Emanuel Aldea, David Filliat
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Predictive uncertainty estimation is essential for deploying Deep Neural Networks in real-world autonomous systems. However, disentangling the different types and sources of uncertainty is non-trivial for most datasets, especially since there is no ground truth for uncertainty. In addition, different degrees of weather conditions can disrupt neural networks, resulting in inconsistent training data quality. Thus, we introduce the MUAD dataset (Multiple Uncertainties for Autonomous Driving), consisting of 8,500 realistic synthetic images with diverse adverse weather conditions (night, fog, rain, snow), out-of-distribution objects, and annotations for semantic segmentation, depth estimation, and object and instance detection. MUAD allows one to better assess the impact of different sources of uncertainty on model performance. We present a study that shows the importance of having reliable Deep Neural Networks (DNNs) in multiple experiments, and we will release our dataset to allow researchers to benchmark their algorithms methodically in adverse conditions. More information and the download link for MUAD are available at https://muad-dataset.github.io/ .
Deformable Radar Polygon: A Lightweight and Predictable Occupancy Representation for Short-range Collision Avoidance
Abstract
Inferring the drivable area in a scene is a key capability for ensuring that a vehicle avoids obstacles and for enabling safe autonomous driving. However, a traditional occupancy grid map suffers from high memory consumption when a fine-resolution grid is formed for a large map. In this paper, we propose a lightweight, accurate, and predictable occupancy representation for automotive radars in short-range applications that are interested in the instantaneous free space surrounding the sensor. This occupancy format is a polygon composed of vertices selected from radar measurements; it covers the free space inside and associates a Doppler velocity with each vertex. It not only requires very little memory for storage and update at every timeslot, but also has a predictable shape-change property based on the vertex Doppler velocities. We name this occupancy representation the 'deformable radar polygon'. Two polygon formation algorithms are introduced, for a single timeslot and for continuous ISM update. To fit this new polygon representation, a matrix-form collision detection method has been modeled as well. The radar polygon algorithms and the collision detection model have been validated via extensive experiments with real collected data and simulations, showing that the deformable radar polygon is highly competitive in terms of completeness, smoothness, accuracy, memory footprint, and shape predictability. Our code will be made publicly available to facilitate future work.
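A compact sketch of how such a polygon could be formed, predicted forward, and used for a collision check. The sector count, the purely radial motion model, and the ray-casting test are our simplifications of the formation algorithms and matrix-form detection described in the abstract.

```python
import numpy as np

def form_radar_polygon(ranges, azimuths, dopplers, n_sectors=36):
    """Pick the closest detection per azimuth sector as a polygon vertex (sketch).
    Each vertex keeps its Doppler velocity so the polygon can be predicted forward."""
    edges = np.linspace(-np.pi, np.pi, n_sectors + 1)
    verts = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (azimuths >= lo) & (azimuths < hi)
        if not mask.any():
            continue
        i = np.argmin(ranges[mask])
        r, a, d = ranges[mask][i], azimuths[mask][i], dopplers[mask][i]
        verts.append((r * np.cos(a), r * np.sin(a), d, a))
    return np.array(verts)                      # columns: x, y, doppler, azimuth

def predict_polygon(verts, dt):
    """Shape prediction: move each vertex radially by its Doppler velocity."""
    out = verts.copy()
    out[:, 0] += verts[:, 2] * dt * np.cos(verts[:, 3])
    out[:, 1] += verts[:, 2] * dt * np.sin(verts[:, 3])
    return out

def point_inside(verts, px, py):
    """Ray-casting point-in-polygon test, a stand-in for the matrix-form check."""
    x, y = verts[:, 0], verts[:, 1]
    xj, yj = np.roll(x, 1), np.roll(y, 1)
    crossing = (y > py) != (yj > py)
    xint = (xj - x) * (py - y) / (yj - y + 1e-12) + x
    return np.count_nonzero(crossing & (px < xint)) % 2 == 1
```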
Spatial-Temporal Gating-Adjacency GCN for Human Motion Prediction
Authors: Chongyang Zhong, Lei Hu, Zihao Zhang, Yongjing Ye, Shihong Xia
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Predicting future motion based on a historical motion sequence is a fundamental problem in computer vision, with wide applications in autonomous driving and robotics. Some recent works have shown that Graph Convolutional Networks (GCNs) are instrumental in modeling the relationships between different joints. However, considering the variation and diversity of action types in human motion data, the cross-dependency of the spatial-temporal relationships is difficult to capture with a decoupled modeling strategy, which may also exacerbate the problem of insufficient generalization. Therefore, we propose the Spatial-Temporal Gating-Adjacency GCN (GAGCN) to learn the complex spatial-temporal dependencies over diverse action types. Specifically, we adopt gating networks to enhance the generalization of the GCN via a trainable adaptive adjacency matrix obtained by blending candidate spatial-temporal adjacency matrices. Moreover, GAGCN addresses the cross-dependency of space and time by balancing the weights of spatial-temporal modeling and fusing the decoupled spatial-temporal features. Extensive experiments on Human3.6M, AMASS, and 3DPW demonstrate that GAGCN achieves state-of-the-art performance in both short-term and long-term prediction. Our code will be released in the future.
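The gating idea (blending candidate adjacency matrices with input-dependent weights) can be sketched in a few lines of PyTorch. Joint count, feature sizes, and the gating network below are illustrative assumptions and not the GAGCN architecture itself.

```python
import torch
import torch.nn as nn

class GatedAdjacencyGCNLayer(nn.Module):
    """Sketch: blend K candidate adjacency matrices with input-dependent softmax
    weights, then apply one graph convolution over the joints."""
    def __init__(self, n_joints=22, in_dim=64, out_dim=64, n_candidates=4):
        super().__init__()
        self.candidates = nn.Parameter(torch.randn(n_candidates, n_joints, n_joints))
        self.gate = nn.Sequential(nn.Linear(n_joints * in_dim, 64), nn.ReLU(),
                                  nn.Linear(64, n_candidates))
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x):                       # x: (batch, n_joints, in_dim)
        w = torch.softmax(self.gate(x.flatten(1)), dim=-1)       # (batch, K)
        adj = torch.einsum('bk,kij->bij', w, self.candidates)    # blended adjacency
        return torch.relu(torch.einsum('bij,bjd->bid', adj, self.proj(x)))

# usage: y = GatedAdjacencyGCNLayer()(torch.randn(8, 22, 64))
```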
Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds
Abstract
3D single object tracking (3D SOT) in LiDAR point clouds plays a crucial role in autonomous driving. Current approaches all follow the Siamese paradigm based on appearance matching. However, LiDAR point clouds are usually textureless and incomplete, which hinders effective appearance matching. Moreover, previous methods greatly overlook the critical motion clues among targets. In this work, beyond 3D Siamese tracking, we introduce a motion-centric paradigm to handle 3D SOT from a new perspective. Following this paradigm, we propose a matching-free two-stage tracker, M^2-Track. In the 1st stage, M^2-Track localizes the target within successive frames via motion transformation. Then it refines the target box through motion-assisted shape completion in the 2nd stage. Extensive experiments confirm that M^2-Track significantly outperforms previous state-of-the-art methods on three large-scale datasets while running at 57 FPS (~8%, ~17%, and ~22% precision gains on KITTI, NuScenes, and the Waymo Open Dataset, respectively). Further analysis verifies each component's effectiveness and shows the motion-centric paradigm's promising potential when combined with appearance matching.
LatentFormer: Multi-Agent Transformer-Based Interaction Modeling and Trajectory Prediction
Authors: Elmira Amirloo, Amir Rasouli, Peter Lakner, Mohsen Rohani, Jun Luo
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Multi-agent trajectory prediction is a fundamental problem in autonomous driving. The key challenges in prediction are accurately anticipating the behavior of surrounding agents and understanding the scene context. To address these problems, we propose LatentFormer, a transformer-based model for predicting future vehicle trajectories. The proposed method leverages a novel technique for modeling interactions among dynamic objects in the scene. Contrary to many existing approaches which model cross-agent interactions during the observation time, our method additionally exploits the future states of the agents. This is accomplished using a hierarchical attention mechanism where the evolving states of the agents autoregressively control the contributions of past trajectories and scene encodings in the final prediction. Furthermore, we propose a multi-resolution map encoding scheme that relies on a vision transformer module to effectively capture both local and global scene context to guide the generation of more admissible future trajectories. We evaluate the proposed method on the nuScenes benchmark dataset and show that our approach achieves state-of-the-art performance and improves upon trajectory metrics by up to 40%. We further investigate the contributions of various components of the proposed technique via extensive ablation studies.
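Below is a minimal cross-attention decoding step in the spirit of the hierarchical, autoregressive decoding described above. The dimensions, the state-update rule, and the single-step structure are our assumptions for illustration only.

```python
import torch
import torch.nn as nn

class DecoderStep(nn.Module):
    """One autoregressive step: the evolving agent embedding queries the past
    trajectory and map tokens, then predicts the next waypoint offset (sketch)."""
    def __init__(self, dim=128):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.to_offset = nn.Linear(dim, 2)
        self.update = nn.Linear(dim + 2, dim)

    def forward(self, state, context):          # state: (B, 1, dim), context: (B, N, dim)
        fused, _ = self.cross_attn(state, context, context)
        offset = self.to_offset(fused)           # next (x, y) displacement
        new_state = torch.tanh(self.update(torch.cat([fused, offset], dim=-1)))
        return offset, new_state

# rollout sketch over 6 future steps
step = DecoderStep()
state = torch.randn(2, 1, 128)                  # agent embedding
context = torch.randn(2, 50, 128)               # past-trajectory + map tokens
trajectory = []
for _ in range(6):
    offset, state = step(state, context)
    trajectory.append(offset)
```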
Keyword: mapping
Adaptive Path Planning for UAVs for Multi-Resolution Semantic Segmentation
Authors: Felix Stache, Jonas Westheider, Federico Magistri, Cyrill Stachniss, Marija Popović
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Abstract
Efficient data collection methods play a major role in helping us better understand the Earth and its ecosystems. In many applications, the usage of unmanned aerial vehicles (UAVs) for monitoring and remote sensing is rapidly gaining momentum due to their high mobility, low cost, and flexible deployment. A key challenge is planning missions to maximize the value of acquired data in large environments given flight time limitations. This is, for example, relevant for monitoring agricultural fields. This paper addresses the problem of adaptive path planning for accurate semantic segmentation using UAVs. We propose an online planning algorithm which adapts the UAV paths to obtain the high-resolution semantic segmentations necessary in areas with fine details, as they are detected in incoming images. This enables us to perform close inspections at low altitudes only where required, without wasting energy on exhaustive mapping at maximum image resolution. A key feature of our approach is a new accuracy model for deep learning-based architectures that captures the relationship between UAV altitude and semantic segmentation accuracy. We evaluate our approach on different domains using real-world data, demonstrating the efficacy and generalizability of our solution.
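The altitude-accuracy trade-off described above can be caricatured as follows. The logistic accuracy model, its constants, and the descent rule are invented for illustration, whereas the paper learns this relationship from data.

```python
import numpy as np

def predicted_accuracy(altitude_m, a=0.95, k=0.04, h0=40.0):
    """Illustrative altitude-to-accuracy model (logistic decay); the constants
    here are made up, the paper fits this relationship for its own network."""
    return a / (1.0 + np.exp(k * (altitude_m - h0)))

def choose_next_altitude(altitude_m, interesting_fraction,
                         detail_thresh=0.2, acc_thresh=0.8, step=10.0, min_alt=15.0):
    """Descend only where fine detail was detected and the expected accuracy at
    the current altitude is insufficient; otherwise stay high to save flight time."""
    if interesting_fraction > detail_thresh and predicted_accuracy(altitude_m) < acc_thresh:
        return max(altitude_m - step, min_alt)
    return altitude_m

# usage: next_alt = choose_next_altitude(50.0, interesting_fraction=0.35)
```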
On an application of graph neural networks in population based SHM
Authors: G. Tsialiamanis, C. Mylonas, E. Chatzi, D.J. Wagg, N. Dervilis, K. Worden
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
Abstract
Attempts have been made recently in the field of population-based structural health monitoring (PBSHM) to transfer knowledge between SHM models of different structures. These attempts have focused on homogeneous and heterogeneous populations. A more general approach to transferring knowledge between structures is to consider all plausible structures as points on a multidimensional base manifold and to build a fibre bundle. The idea is quite powerful, since a mapping between points in the base manifold and their fibres (the potential states of any arbitrary structure) can be learnt. A smaller-scale but still useful problem is that of learning a specific point of every fibre, i.e. the one corresponding to the undamaged state of structures within a population. Under the framework of PBSHM, a data-driven approach to the aforementioned problem is developed. Structures are converted into graphs and inference is attempted within a population using a graph neural network (GNN) algorithm. The algorithm solves a major problem existing in such applications: structures have different sizes and are defined as abstract objects, so attempting to perform inference within a heterogeneous population is not trivial. The proposed approach is tested on a simulated population of trusses. The goal of the application is to predict the first natural frequency of trusses of different sizes, across different environmental temperatures, and with different bar member types. After training the GNN using part of the total population, it was tested on trusses that were not included in the training dataset. Results show that the accuracy of the regression is satisfactory even for structures with a higher number of nodes and members than those used to train it.
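For readers unfamiliar with how a GNN handles graphs of varying size, here is a generic message-passing regressor for a graph-level scalar such as a natural frequency. It is a textbook-style sketch, not the architecture used in the paper, and the feature dimensions are assumed.

```python
import torch
import torch.nn as nn

class SimpleTrussGNN(nn.Module):
    """Toy message-passing regressor for a graph-level scalar; node features
    could encode coordinates, temperature, and member type. Generic sketch only."""
    def __init__(self, in_dim=5, hid=64):
        super().__init__()
        self.msg = nn.Sequential(nn.Linear(2 * in_dim, hid), nn.ReLU())
        self.upd = nn.Sequential(nn.Linear(in_dim + hid, hid), nn.ReLU())
        self.readout = nn.Linear(hid, 1)

    def forward(self, x, edge_index):
        # x: (N, in_dim) node features; edge_index: (2, E) long tensor of members
        src, dst = edge_index
        m = self.msg(torch.cat([x[src], x[dst]], dim=-1))               # per-edge messages
        agg = torch.zeros(x.size(0), m.size(-1)).index_add_(0, dst, m)  # sum per node
        h = self.upd(torch.cat([x, agg], dim=-1))
        return self.readout(h.mean(dim=0))        # graph-level prediction, any graph size
```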
Informative Path Planning for Active Learning in Aerial Semantic Mapping
Authors: Julius Rückin, Liren Jin, Federico Magistri, Cyrill Stachniss, Marija Popović
Abstract
Semantic segmentation of aerial imagery is an important tool for mapping and earth observation. However, supervised deep learning models for segmentation rely on large amounts of high-quality labelled data, which is labour-intensive and time-consuming to generate. To address this, we propose a new approach for using unmanned aerial vehicles (UAVs) to autonomously collect useful data for model training. We exploit a Bayesian approach to estimate model uncertainty in semantic segmentation. During a mission, the semantic predictions and model uncertainty are used as input for terrain mapping. A key aspect of our pipeline is to link the mapped model uncertainty to a robotic planning objective based on active learning. This enables us to adaptively guide a UAV to gather the most informative terrain images to be labelled by a human for model training. Our experimental evaluation on real-world data shows the benefit of using our informative planning approach in comparison to static coverage paths in terms of maximising model performance and reducing labelling efforts.
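A minimal sketch of how Bayesian uncertainty could feed the planning objective: estimate per-pixel uncertainty with Monte Carlo dropout and greedily fly toward the most uncertain mapped cell. Function names and the greedy objective are our simplifications of the approach described above.

```python
import numpy as np

def mc_dropout_uncertainty(predict_fn, image, n_samples=10):
    """Per-pixel predictive entropy from stochastic forward passes (sketch);
    predict_fn is assumed to return (H, W, C) class probabilities with dropout active."""
    probs = np.stack([predict_fn(image) for _ in range(n_samples)]).mean(axis=0)
    return -(probs * np.log(probs + 1e-9)).sum(axis=-1)   # (H, W) entropy

def next_waypoint(uncertainty_map, candidate_cells):
    """Greedy active-learning objective: fly to the map cell with the largest
    accumulated model uncertainty, so the images collected there are labelled first."""
    scores = [uncertainty_map[r, c] for r, c in candidate_cells]
    return candidate_cells[int(np.argmax(scores))]
```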
Bridging the Source-to-target Gap for Cross-domain Person Re-Identification with Intermediate Domains
Authors: Yongxing Dai, Yifan Sun, Jun Liu, Zekun Tong, Yi Yang, Ling-Yu Duan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Cross-domain person re-identification (re-ID), such as unsupervised domain adaptive (UDA) re-ID, aims to transfer the identity-discriminative knowledge from the source to the target domain. Existing methods commonly consider the source and target domains to be isolated from each other, i.e., no intermediate status is modeled between the two domains. Directly transferring knowledge between two isolated domains can be very difficult, especially when the domain gap is large. From a novel perspective, we assume these two domains are not completely isolated, but can be connected through intermediate domains. Instead of directly aligning the source and target domains against each other, we propose to align the source and target domains against their intermediate domains for a smooth knowledge transfer. To discover and utilize these intermediate domains, we propose an Intermediate Domain Module (IDM) and a Mirrors Generation Module (MGM). IDM has two functions: 1) it generates multiple intermediate domains by mixing the hidden-layer features from the source and target domains, and 2) it dynamically reduces the domain gap between the source/target domain features and the intermediate domain features. While IDM achieves good domain alignment, it introduces a side effect, i.e., the mix-up operation may mix the identities into a new identity and lose the original identities. To compensate for this, MGM is introduced to map the features into the IDM-generated intermediate domains without changing their original identities. This allows the model to focus on minimizing domain variations to promote the alignment between the source/target domain and the intermediate domains, which reinforces IDM into IDM++. We extensively evaluate our method under both the UDA and domain generalization (DG) scenarios and observe that IDM++ yields consistent performance improvements for cross-domain re-ID, achieving a new state of the art.
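The feature mix-up at the heart of IDM can be sketched as follows in PyTorch. The gating network and feature sizes are assumptions, and the bridge losses that control where the mixed features lie are omitted.

```python
import torch
import torch.nn as nn

class IntermediateDomainMixer(nn.Module):
    """Sketch of the mix-up idea: blend source and target hidden features with
    predicted coefficients to synthesize intermediate-domain features."""
    def __init__(self, dim=2048):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, 512), nn.ReLU(),
                                  nn.Linear(512, 2), nn.Softmax(dim=-1))

    def forward(self, f_src, f_tgt):            # (B, dim), (B, dim)
        lam = self.gate(torch.cat([f_src, f_tgt], dim=-1))   # (B, 2), rows sum to 1
        return lam[:, :1] * f_src + lam[:, 1:] * f_tgt       # intermediate features

# usage: f_mix = IntermediateDomainMixer()(torch.randn(16, 2048), torch.randn(16, 2048))
```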
STUN: Self-Teaching Uncertainty Estimation for Place Recognition
Authors: Kaiwen Cai, Chris Xiaoxuan Lu, Xiaowei Huang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Abstract
Place recognition is key to Simultaneous Localization and Mapping (SLAM) and spatial perception. However, place recognition in the wild often suffers from erroneous predictions due to image variations, e.g., changing viewpoints and street appearance. Integrating uncertainty estimation into the life cycle of place recognition is a promising way to mitigate the impact of such variations on place recognition performance. However, existing uncertainty estimation approaches in this vein are either computationally inefficient (e.g., Monte Carlo dropout) or come at the cost of reduced accuracy. This paper proposes STUN, a self-teaching framework that learns to simultaneously predict the place and estimate the prediction uncertainty given an input image. To this end, we first train a teacher net using a standard metric learning pipeline to produce embedding priors. Then, supervised by the pretrained teacher net, a student net with an additional variance branch is trained to fine-tune the embedding priors and estimate the uncertainty sample by sample. During the online inference phase, we only use the student net to generate a place prediction together with its uncertainty. Compared with place recognition systems that ignore uncertainty, our framework provides uncertainty estimation for free without sacrificing any prediction accuracy. Our experimental results on the large-scale Pittsburgh30k dataset demonstrate that STUN outperforms the state-of-the-art methods in both recognition accuracy and the quality of uncertainty estimation.
An observer cascade for velocity and multiple line estimation
Authors: André Mateus, Pedro U. Lima, Pedro Miraldo
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Abstract
Previous incremental estimation methods consider estimating a single line, requiring as many observers as the number of lines to be mapped. This leads to the need for at least $4N$ state variables, with $N$ being the number of lines. This paper presents the first approach for multi-line incremental estimation. Since lines are common in structured environments, we aim to exploit that structure to reduce the state space. The modeling of structured environments proposed in this paper reduces the state space to $3N + 3$ and is also less susceptible to singular configurations. An assumption the previous methods make is that the camera velocity is available at all times. However, the velocity is usually retrieved from odometry, which is noisy. With this in mind, we propose coupling the camera with an Inertial Measurement Unit (IMU) and using an observer cascade: a first observer retrieves the scale of the linear velocity, and a second observer performs the line mapping. The stability of the entire system is analyzed. The cascade is shown to be asymptotically stable and to converge in experiments with simulated data.
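For context, a discrete-time sketch of a generic Luenberger-style observer is given below. It only illustrates what an "observer" is in this setting, with a toy linear system and an assumed gain, whereas the paper's cascade uses its own nonlinear observers for the velocity scale and the line parameters.

```python
import numpy as np

# toy linear system: position/velocity integrator with a position measurement
A = np.array([[0.0, 1.0], [0.0, 0.0]])   # state dynamics
C = np.array([[1.0, 0.0]])               # we only measure position
L = np.array([[2.0], [1.0]])             # assumed observer gain (stable error dynamics)

def observer_step(x_hat, y_meas, dt=0.01):
    """Euler step of x_hat_dot = A x_hat + L (y - C x_hat)."""
    innovation = y_meas - (C @ x_hat)
    return x_hat + dt * (A @ x_hat + L @ innovation)

# usage: the estimate converges toward the state explaining the measurements
x_hat = np.zeros((2, 1))
for y in [1.0, 1.02, 1.05]:               # noisy position measurements
    x_hat = observer_step(x_hat, y)
```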
Keyword: localization
Graph-based Multi-sensor Fusion for Consistent Localization of Autonomous Construction Robots
Authors: Julian Nubert, Shehryar Khattak, Marco Hutter
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Abstract
Enabling autonomous operation of large-scale construction machines, such as excavators, can bring key benefits for human safety and open up operational opportunities in dangerous and hazardous environments. To facilitate robot autonomy, robust and accurate state estimation remains a core component for enabling these machines to operate in a diverse set of complex environments. In this work, a method for multi-modal sensor fusion for robot state estimation and localization is presented, enabling operation of construction robots in real-world scenarios. The proposed approach presents a graph-based prediction-update loop that combines the benefits of filtering and smoothing in order to provide consistent state estimates at a high update rate, while maintaining accurate global localization for large-scale earth-moving excavators. Furthermore, the proposed approach enables a flexible integration of asynchronous sensor measurements and provides consistent pose estimates even during phases of sensor dropout. For this purpose, a dual-graph design for switching between two distinct optimization problems is proposed, directly addressing temporary failure and the subsequent return of global position estimates. The proposed approach is implemented on-board two Menzi Muck walking excavators and validated during real-world tests conducted in representative operational environments.
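The dual-graph switching logic can be caricatured as below. PoseGraph is a trivial stand-in for a real factor-graph library, and the measurement names and switching rule are our reading of the abstract rather than the authors' implementation.

```python
from dataclasses import dataclass, field

@dataclass
class PoseGraph:
    """Minimal stand-in for a factor graph: we only record which factors it holds."""
    factors: list = field(default_factory=list)

    def add(self, factor):
        self.factors.append(factor)

def select_graph(global_fix_available, global_graph, local_graph):
    """Dual-graph idea (our reading): optimize the graph with global factors when
    fixes arrive, otherwise fall back to the odometry-only graph during dropout."""
    return global_graph if global_fix_available else local_graph

# usage sketch inside the prediction-update loop
global_graph, local_graph = PoseGraph(), PoseGraph()
for t, meas in enumerate([{"odom": 1}, {"odom": 1, "gnss": 1}, {"odom": 1}]):
    local_graph.add(("odom", t))
    if "gnss" in meas:
        global_graph.add(("gnss", t))
    graph = select_graph("gnss" in meas, global_graph, local_graph)
    # 'graph' would now be optimized (e.g. with a smoother) to produce the state estimate
```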
Effect of Timing Error: A Case Study of Navigation Camera
Authors: Sandeep S. Kulkarni, Sanjay M. Joshi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
We focus on the problem of timing errors in a navigation camera as a case study of the broader problem of the effect of timing errors in cyber-physical systems. These systems rely on the requirement that certain events happen at the same time or recur periodically with some period $T$. However, as these systems get more complex, timing errors can occur between the components, thereby violating the assumption that events are simultaneous (or periodic). We consider the problem of a surgical navigation system where optical markers detected in the 2D pictures taken by two cameras are used to localize the markers in 3D space. A predefined array of such markers, known as a reference element, is used to navigate the corresponding CAD model of a surgical instrument on the patient's images. The cameras rely on the assumption that the pictures from both cameras are taken at exactly the same time. If a timing error occurs, then the instrument may have moved between the pictures. We find that, depending upon the location of the instrument, this can lead to a substantial error in the localization of the instrument. Specifically, we find that if the actual movement is $\delta$, then the observed movement may be as high as $5\delta$ in the operating range of the camera. Furthermore, we also identify potential issues that could affect the error in case of changes to the camera system or to the operating range.
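A small worked example (with made-up camera numbers) shows how a motion of $\delta$ between the two exposures can inflate the triangulated position error to several times $\delta$, in line with the $5\delta$ figure quoted above.

```python
import numpy as np

def triangulate(uL, uR, f, b):
    """Ideal rectified stereo: recover (X, Z) from the two horizontal image coordinates."""
    Z = f * b / (uL - uR)
    X = Z * uL / f - b / 2
    return np.array([X, Z])

# hypothetical numbers chosen to illustrate the effect, not the paper's setup
f, b = 1000.0, 0.5            # focal length (px), stereo baseline (m)
X, Z, delta = 2.0, 2.0, 0.01  # true marker position (m) and its motion between exposures (m)

uL = f * (X + b / 2) / Z                  # left image, marker at X
uR_late = f * (X + delta - b / 2) / Z     # right image taken late, marker moved to X + delta
estimate = triangulate(uL, uR_late, f, b)
error = np.linalg.norm(estimate - np.array([X, Z]))
print(error / delta)                      # several times the actual motion (about 6x here)
```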
Weakly Supervised Object Localization as Domain Adaption
Authors: Lei Zhu, Qi She, Qian Chen, Yunfei You, Boyu Wang, Yanye Lu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Weakly supervised object localization (WSOL) focuses on localizing objects only with the supervision of image-level classification labels. Most previous WSOL methods follow the classification activation map (CAM), which localizes objects based on a classification structure with the multi-instance learning (MIL) mechanism. However, the MIL mechanism makes CAM activate only discriminative object parts rather than the whole object, weakening its performance for localizing objects. To avoid this problem, this work provides a novel perspective that models WSOL as a domain adaption (DA) task, where the score estimator trained on the source/image domain is tested on the target/pixel domain to locate objects. Under this perspective, a DA-WSOL pipeline is designed to better engage DA approaches in WSOL to enhance localization performance. It utilizes a proposed target sampling strategy to select different types of target samples. Based on these types of target samples, a domain adaption localization (DAL) loss is elaborated. It aligns the feature distribution between the two domains via DA and makes the estimator perceive target-domain cues via Universum regularization. Experiments show that our pipeline outperforms SOTA methods on multiple benchmarks. Code is released at \url{https://github.com/zh460045050/DA-WSOL_CVPR2022}.
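As a toy illustration of aligning the image (source) and pixel (target) domains, a linear-kernel MMD term is sketched below. This is a generic stand-in, not the paper's DAL loss, which also involves target sampling and Universum regularization.

```python
import torch

def mmd_alignment(src_feats, tgt_feats):
    """Simple (linear-kernel) MMD between source/image-domain features and
    target/pixel-domain features; a stand-in for a domain-alignment term."""
    return ((src_feats.mean(dim=0) - tgt_feats.mean(dim=0)) ** 2).sum()

# usage sketch: src = sampled image-level features, tgt = sampled pixel features
src = torch.randn(32, 128)
tgt = torch.randn(256, 128)
loss = mmd_alignment(src, tgt)
```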
STUN: Self-Teaching Uncertainty Estimation for Place Recognition
Authors: Kaiwen Cai, Chris Xiaoxuan Lu, Xiaowei Huang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Abstract
Place recognition is key to Simultaneous Localization and Mapping (SLAM) and spatial perception. However, place recognition in the wild often suffers from erroneous predictions due to image variations, e.g., changing viewpoints and street appearance. Integrating uncertainty estimation into the life cycle of place recognition is a promising way to mitigate the impact of such variations on place recognition performance. However, existing uncertainty estimation approaches in this vein are either computationally inefficient (e.g., Monte Carlo dropout) or come at the cost of reduced accuracy. This paper proposes STUN, a self-teaching framework that learns to simultaneously predict the place and estimate the prediction uncertainty given an input image. To this end, we first train a teacher net using a standard metric learning pipeline to produce embedding priors. Then, supervised by the pretrained teacher net, a student net with an additional variance branch is trained to fine-tune the embedding priors and estimate the uncertainty sample by sample. During the online inference phase, we only use the student net to generate a place prediction together with its uncertainty. Compared with place recognition systems that ignore uncertainty, our framework provides uncertainty estimation for free without sacrificing any prediction accuracy. Our experimental results on the large-scale Pittsburgh30k dataset demonstrate that STUN outperforms the state-of-the-art methods in both recognition accuracy and the quality of uncertainty estimation.
Keyword: SLAM
Continual SLAM: Beyond Lifelong Simultaneous Localization and Mapping through Continual Learning
STUN: Self-Teaching Uncertainty Estimation for Place Recognition
Keyword: Visual inertial
There is no result
Keyword: livox
Intensity Image-based LiDAR Fiducial Marker System
Keyword: loam
There is no result
Keyword: Visual inertial odometry
There is no result
Keyword: lidar
Quantity over Quality: Training an AV Motion Planner with Large Scale Commodity Vision Data
Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds
Intensity Image-based LiDAR Fiducial Marker System
Keyword: loop detection
There is no result
Keyword: autonomous driving
MUAD: Multiple Uncertainties for Autonomous Driving, a benchmark for multiple uncertainty types and tasks
Deformable Radar Polygon: A Lightweight and Predictable Occupancy Representation for Short-range Collision Avoidance
Spatial-Temporal Gating-Adjacency GCN for Human Motion Prediction
Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds
LatentFormer: Multi-Agent Transformer-Based Interaction Modeling and Trajectory Prediction
Keyword: mapping
Adaptive Path Planning for UAVs for Multi-Resolution Semantic Segmentation
On an application of graph neural networks in population based SHM
Informative Path Planning for Active Learning in Aerial Semantic Mapping
Bridging the Source-to-target Gap for Cross-domain Person Re-Identification with Intermediate Domains
STUN: Self-Teaching Uncertainty Estimation for Place Recognition
An observer cascade for velocity and multiple line estimation
Keyword: localization
Graph-based Multi-sensor Fusion for Consistent Localization of Autonomous Construction Robots
Effect of Timing Error: A Case Study of Navigation Camera
Weakly Supervised Object Localization as Domain Adaption
STUN: Self-Teaching Uncertainty Estimation for Place Recognition