Abstract
We present a real-time semantic mapping approach for mobile vision systems with a 2D to 3D object detection pipeline and rapid data association for generated landmarks. Besides enriching the semantic map, the associated detections are further introduced as semantic constraints into a simultaneous localization and mapping (SLAM) system for pose correction purposes. This way, we are able to generate additional meaningful information that enables higher-level tasks, while simultaneously leveraging the view-invariance of object detections to improve the accuracy and robustness of the odometry estimation. We propose tracklets of locally associated object observations to handle ambiguous and false predictions, and an uncertainty-based greedy association scheme for accelerated processing. Our system reaches real-time capability with an average iteration duration of 65 ms and improves the pose estimation of a state-of-the-art SLAM system by up to 68% on a public dataset. Additionally, we implemented our approach as a modular ROS package that makes it straightforward to integrate into arbitrary graph-based SLAM methods.
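The uncertainty-based greedy association mentioned above can be pictured with a short sketch: detections are matched to existing landmarks by Mahalanobis distance, cheapest pairs first, with a chi-square gate for rejection. This is only a generic illustration of such a scheme, not the paper's implementation; the gate value, function names, and array layout are assumptions.

```python
import numpy as np

def greedy_associate(detections, landmarks, landmark_covs, gate=11.34):
    """Uncertainty-based greedy association sketch: score every
    detection/landmark pair by squared Mahalanobis distance, accept the
    cheapest pairs first, and leave anything beyond the chi-square gate
    (assumed 99% for 3 dof here) unmatched. The paper's tracklet logic and
    gating may differ.

    detections:    (N, 3) detected object centers
    landmarks:     (M, 3) mapped landmark positions
    landmark_covs: (M, 3, 3) positional covariances
    """
    pairs = []
    for i, d in enumerate(detections):
        for j, (l, cov) in enumerate(zip(landmarks, landmark_covs)):
            diff = d - l
            dist = diff @ np.linalg.solve(cov, diff)   # squared Mahalanobis
            if dist < gate:
                pairs.append((dist, i, j))
    pairs.sort()                                       # cheapest matches first
    matches, used_d, used_l = [], set(), set()
    for dist, i, j in pairs:
        if i not in used_d and j not in used_l:
            matches.append((i, j))
            used_d.add(i)
            used_l.add(j)
    return matches
```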
Keyword: Visual inertial
There is no result
Keyword: livox
There is no result
Keyword: loam
There is no result
Keyword: Visual inertial odometry
There is no result
Keyword: lidar
Direct LiDAR-Inertial Odometry
Authors: Kenny Chen, Ryan Nemiroff, Brett T. Lopez
Abstract
This paper proposes a new LiDAR-inertial odometry framework that generates accurate state estimates and detailed maps in real-time on resource-constrained mobile robots. Our Direct LiDAR-Inertial Odometry (DLIO) algorithm utilizes a hybrid architecture that combines the benefits of loosely-coupled and tightly-coupled IMU integration to enhance reliability and real-time performance while improving accuracy. The proposed architecture has two key elements. The first is a fast keyframe-based LiDAR scan-matcher that builds an internal map by registering dense point clouds to a local submap with a translational and rotational prior generated by a nonlinear motion model. The second is a factor graph and high-rate propagator that fuses the output of the scan-matcher with preintegrated IMU measurements for up-to-date pose, velocity, and bias estimates. These estimates enable us to accurately deskew the next point cloud using a nonlinear kinematic model for precise motion correction, in addition to initializing the next scan-to-map optimization prior. We demonstrate DLIO's superior localization accuracy, map quality, and lower computational overhead by comparing it to the state-of-the-art using multiple benchmark, public, and self-collected datasets on both consumer and hobby-grade hardware.
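As a rough illustration of the loosely-coupled side of such a hybrid architecture, the sketch below propagates position, velocity, and orientation through raw IMU samples to obtain a translational and rotational prior for the next scan-to-map registration. It is a generic propagation sketch under simple midpoint integration, not DLIO's actual nonlinear motion model; the function names and bias handling are assumptions.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

GRAVITY = np.array([0.0, 0.0, -9.81])

def propagate_imu(p, v, q, imu_samples, accel_bias, gyro_bias):
    """Propagate position p, velocity v, and orientation q (scipy Rotation)
    through a list of (dt, accel, gyro) IMU samples to form a pose prior."""
    for dt, accel, gyro in imu_samples:
        # Rotate the bias-corrected body acceleration into the world frame.
        a_world = q.apply(accel - accel_bias) + GRAVITY
        # Integrate orientation with the bias-corrected angular rate (body frame).
        q = q * R.from_rotvec((gyro - gyro_bias) * dt)
        # Midpoint-style integration of position and velocity.
        p = p + v * dt + 0.5 * a_world * dt**2
        v = v + a_world * dt
    return p, v, q
```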
ROLL: Long-Term Robust LiDAR-based Localization With Temporary Mapping in Changing Environments
Abstract
Long-term scene changes present challenges to localization systems that use a pre-built map. This paper presents a LiDAR-based system that can provide robust localization against those challenges. Our method temporarily activates a mapping process when global matching against the pre-built map is unreliable. The temporary map is merged into the pre-built map for later localization runs once reliable matching is obtained again. We further integrate a LiDAR-inertial odometry (LIO) to provide motion-compensated LiDAR scans and a reliable initial pose guess for the global matching module. To generate a smooth real-time trajectory for navigation purposes, we fuse poses from odometry and global matching by solving a pose graph optimization problem. We evaluate our localization system with extensive experiments on the NCLT dataset, which includes a variety of changing indoor and outdoor environments, and the results demonstrate robust and accurate localization performance for over a year. The implementations are open sourced on GitHub.
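The fusion of odometry and global matching poses via pose graph optimization, as described above, can be sketched as a small nonlinear least-squares problem. The 2D (x, y, yaw) formulation, weights, and function names below are assumptions for illustration; ROLL's actual graph structure and solver are not specified here.

```python
import numpy as np
from scipy.optimize import least_squares

def wrap(a):
    """Wrap an angle to (-pi, pi]."""
    return (a + np.pi) % (2 * np.pi) - np.pi

def fuse_poses(odom_edges, global_edges, n_poses, w_odom=1.0, w_global=5.0):
    """Fuse relative odometry edges and absolute global-matching poses by
    solving a small 2D (x, y, yaw) pose graph with nonlinear least squares.

    odom_edges:   list of (i, j, dx, dy, dyaw) relative constraints
    global_edges: list of (i, x, y, yaw) absolute constraints from map matching
    """
    def residuals(flat):
        poses = flat.reshape(n_poses, 3)
        res = []
        for i, j, dx, dy, dyaw in odom_edges:
            xi, yi, ti = poses[i]
            xj, yj, tj = poses[j]
            # Express pose j in the frame of pose i and compare to the measurement.
            c, s = np.cos(ti), np.sin(ti)
            rel_x = c * (xj - xi) + s * (yj - yi)
            rel_y = -s * (xj - xi) + c * (yj - yi)
            res += [w_odom * (rel_x - dx), w_odom * (rel_y - dy),
                    w_odom * wrap(tj - ti - dyaw)]
        for i, x, y, yaw in global_edges:
            res += [w_global * (poses[i, 0] - x), w_global * (poses[i, 1] - y),
                    w_global * wrap(poses[i, 2] - yaw)]
        return np.array(res)

    sol = least_squares(residuals, np.zeros(n_poses * 3))
    return sol.x.reshape(n_poses, 3)
```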
Analyzing General-Purpose Deep-Learning Detection and Segmentation Models with Images from a Lidar as a Camera Sensor
Authors: Yu Xianjia, Sahar Salimpour, Jorge Peña Queralta, Tomi Westerlund
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Over the last decade, robotic perception algorithms have significantly benefited from the rapid advances in deep learning (DL). Indeed, a significant amount of the autonomy stack of different commercial and research platforms relies on DL for situational awareness, especially vision sensors. This work explores the potential of general-purpose DL perception algorithms, specifically detection and segmentation neural networks, for processing image-like outputs of advanced lidar sensors. Rather than processing the three-dimensional point cloud data, this is, to the best of our knowledge, the first work to focus on low-resolution images with 360° field of view obtained with lidar sensors by encoding either depth, reflectivity, or near-infrared light in the image pixels. We show that with adequate preprocessing, general-purpose DL models can process these images, opening the door to their usage in environmental conditions where vision sensors present inherent limitations. We provide both a qualitative and quantitative analysis of the performance of a variety of neural network architectures. We believe that using DL models built for visual cameras offers significant advantages due to the much wider availability and maturity compared to point cloud-based perception.
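A minimal sketch of the kind of preprocessing the abstract describes: normalizing lidar depth, reflectivity, and near-infrared panoramas into a pseudo-RGB tensor and feeding it to an off-the-shelf, COCO-pretrained detector. The percentile normalization and the choice of torchvision's Faster R-CNN are assumptions, not the preprocessing or models evaluated in the paper.

```python
import numpy as np
import torch
import torchvision

def lidar_image_to_detector_input(depth, reflectivity, nir):
    """Stack lidar-derived panoramic channels into a pseudo-RGB image that a
    COCO-pretrained detector can consume (percentile normalization assumed)."""
    channels = []
    for img in (depth, reflectivity, nir):
        lo, hi = np.percentile(img, (1, 99))
        channels.append(np.clip((img - lo) / (hi - lo + 1e-6), 0.0, 1.0))
    stacked = np.stack(channels, axis=0).astype(np.float32)   # (3, H, W)
    return torch.from_numpy(stacked)

# Usage: an off-the-shelf general-purpose detector on the lidar "camera" image.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()
image = lidar_image_to_detector_input(np.random.rand(128, 1024),
                                      np.random.rand(128, 1024),
                                      np.random.rand(128, 1024))
with torch.no_grad():
    detections = model([image])[0]   # dict with 'boxes', 'labels', 'scores'
```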
A Lightweight and Detector-free 3D Single Object Tracker on Point Clouds
Authors: Yan Xia, Qiangqiang Wu, Tianyu Yang, Wei Li, Antoni B. Chan, Uwe Stilla
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Recent works on 3D single object tracking treat the tracking as a target-specific 3D detection task, where an off-the-shelf 3D detector is commonly employed for tracking. However, it is non-trivial to perform accurate target-specific detection since the point cloud of objects in raw LiDAR scans is usually sparse and incomplete. In this paper, we address this issue by explicitly leveraging temporal motion cues and propose DMT, a Detector-free Motion prediction based 3D Tracking network that totally removes the usage of complicated 3D detectors, which is lighter, faster, and more accurate than previous trackers. Specifically, the motion prediction module is firstly introduced to estimate a potential target center of the current frame in a point-cloud free way. Then, an explicit voting module is proposed to directly regress the 3D box from the estimated target center. Extensive experiments on KITTI and NuScenes datasets demonstrate that our DMT, without applying any complicated 3D detectors, can still achieve better performance (~10% improvement on the NuScenes dataset) and faster tracking speed (i.e., 72 FPS) than state-of-the-art approaches. Our codes will be released publicly.
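A much-simplified, non-learned stand-in for the two stages named in the abstract: a motion prior predicts the current target center from past centers, and nearby points refine it in place of the explicit voting module. The function names, constant-velocity assumption, and radius are illustrative only.

```python
import numpy as np

def predict_center(prev_centers):
    """Constant-velocity motion prior for the target center; the paper's
    motion prediction module is learned, this is a simplified stand-in."""
    c_prev2, c_prev1 = prev_centers[-2], prev_centers[-1]
    return c_prev1 + (c_prev1 - c_prev2)

def refine_center(points, predicted_center, radius=2.0):
    """Refine the predicted center using nearby points, a crude analogue of
    the explicit voting module that regresses the 3D box."""
    d = np.linalg.norm(points - predicted_center, axis=1)
    nearby = points[d < radius]
    return nearby.mean(axis=0) if len(nearby) else predicted_center
```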
Keyword: loop detection
There is no result
Keyword: autonomous driving
Occupancy Flow Fields for Motion Forecasting in Autonomous Driving
Authors: Reza Mahjourian, Jinkyu Kim, Yuning Chai, Mingxing Tan, Ben Sapp, Dragomir Anguelov
Abstract
We propose Occupancy Flow Fields, a new representation for motion forecasting of multiple agents, an important task in autonomous driving. Our representation is a spatio-temporal grid with each grid cell containing both the probability of the cell being occupied by any agent, and a two-dimensional flow vector representing the direction and magnitude of the motion in that cell. Our method successfully mitigates shortcomings of the two most commonly-used representations for motion forecasting: trajectory sets and occupancy grids. Although occupancy grids efficiently represent the probabilistic location of many agents jointly, they do not capture agent motion and lose the agent identities. To address these shortcomings, we propose a deep learning architecture that generates Occupancy Flow Fields with the help of a new flow trace loss that establishes consistency between the occupancy and flow predictions. We demonstrate the effectiveness of our approach using three metrics on occupancy prediction, motion estimation, and agent ID recovery. In addition, we introduce the problem of predicting speculative agents, which are currently-occluded agents that may appear in the future through dis-occlusion or by entering the field of view. We report experimental results on a large in-house autonomous driving dataset and the public INTERACTION dataset, and show that our model outperforms state-of-the-art models.
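The idea of coupling occupancy and flow predictions can be illustrated with a consistency term: occupancy at time t should agree with the previous occupancy warped along the predicted flow. The backward-flow convention, grid units, and the MSE form below are assumptions; the paper's flow trace loss is more elaborate.

```python
import torch
import torch.nn.functional as F

def flow_occupancy_consistency(occupancy_t, occupancy_prev, flow_t):
    """Sketch of an occupancy/flow consistency term.

    occupancy_t, occupancy_prev: (B, 1, H, W) occupancy probabilities
    flow_t: (B, 2, H, W) per-cell displacement in grid cells, pointing from
            each cell at time t back to its source at time t-1 (an assumption).
    """
    B, _, H, W = occupancy_t.shape
    # Base sampling grid in pixel coordinates.
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    base = torch.stack([xs, ys], dim=0).float().unsqueeze(0).expand(B, -1, -1, -1)
    src = base + flow_t                              # where each cell came from
    # Normalize to [-1, 1] for grid_sample (x, y order).
    src_x = 2.0 * src[:, 0] / (W - 1) - 1.0
    src_y = 2.0 * src[:, 1] / (H - 1) - 1.0
    grid = torch.stack([src_x, src_y], dim=-1)       # (B, H, W, 2)
    warped_prev = F.grid_sample(occupancy_prev, grid, align_corners=True)
    return F.mse_loss(occupancy_t, warped_prev)
```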
BEVSegFormer: Bird's Eye View Semantic Segmentation From Arbitrary Camera Rigs
Abstract
Semantic segmentation in bird's eye view (BEV) is an important task for autonomous driving. Though this task has attracted a large amount of research efforts, it is still challenging to flexibly cope with arbitrary (single or multiple) camera sensors equipped on the autonomous vehicle. In this paper, we present BEVSegFormer, an effective transformer-based method for BEV semantic segmentation from arbitrary camera rigs. Specifically, our method first encodes image features from arbitrary cameras with a shared backbone. These image features are then enhanced by a deformable transformer-based encoder. Moreover, we introduce a BEV transformer decoder module to parse BEV semantic segmentation results. An efficient multi-camera deformable attention unit is designed to carry out the BEV-to-image view transformation. Finally, the queries are reshaped according to the layout of grids in the BEV, and upsampled to produce the semantic segmentation result in a supervised manner. We evaluate the proposed algorithm on the public nuScenes dataset and a self-collected dataset. Experimental results show that our method achieves promising performance on BEV semantic segmentation from arbitrary camera rigs. We also demonstrate the effectiveness of each component via ablation study.
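A minimal sketch of the final step described above, in which BEV queries are reshaped into the grid layout and upsampled into a segmentation map. The grid size, embedding dimension, and simple upsampling head are assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class BEVQueryHead(nn.Module):
    """Reshape BEV queries into the grid layout and upsample to a
    segmentation map (generic sketch, not BEVSegFormer's decoder)."""
    def __init__(self, bev_h=50, bev_w=50, embed_dim=256, num_classes=4, scale=4):
        super().__init__()
        self.bev_h, self.bev_w = bev_h, bev_w
        self.upsample = nn.Sequential(
            nn.Upsample(scale_factor=scale, mode="bilinear", align_corners=False),
            nn.Conv2d(embed_dim, num_classes, kernel_size=1),
        )

    def forward(self, queries):                      # (B, bev_h * bev_w, C)
        B, N, C = queries.shape
        bev = queries.transpose(1, 2).reshape(B, C, self.bev_h, self.bev_w)
        return self.upsample(bev)                    # (B, classes, H*scale, W*scale)
```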
Lane Detection with Versatile AtrousFormer and Local Semantic Guidance
Authors: Jiaxing Yang, Lihe Zhang, Huchuan Lu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Lane detection is one of the core functions in autonomous driving and has recently attracted widespread attention. Networks that segment lane instances, especially lanes with poor appearance, must be able to exploit lane distribution properties. Most existing methods resort to CNN-based techniques; a few have tried incorporating the recently popular seq2seq Transformer \cite{transformer}. However, innate drawbacks of weak global information collection ability and exorbitant computation overhead prohibit a wide range of further applications. In this work, we propose the Atrous Transformer (AtrousFormer) to solve the problem. Its variant, the local AtrousFormer, is interleaved into the feature extractor to enhance extraction. Collecting information first by rows and then by columns in a dedicated manner equips our network with stronger information-gleaning ability and better computational efficiency. To further improve performance, we also propose a local semantic guided decoder to delineate the identities and shapes of lanes more accurately, in which the predicted Gaussian map of the starting point of each lane serves to guide the process. Extensive results on three challenging benchmarks (CULane, TuSimple, and BDD100K) show that our network performs favorably against the state of the art.
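The row-then-column information gathering can be illustrated with a small axial-attention module that attends along each row of the feature map and then along each column. This is a generic sketch under assumed dimensions, not the AtrousFormer block itself.

```python
import torch
import torch.nn as nn

class RowColumnAttention(nn.Module):
    """Self-attention applied first along rows, then along columns of a
    feature map (axial-attention sketch)."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.row_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.col_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                                       # (B, C, H, W)
        B, C, H, W = x.shape
        rows = x.permute(0, 2, 3, 1).reshape(B * H, W, C)       # attend along each row
        rows, _ = self.row_attn(rows, rows, rows)
        x = rows.reshape(B, H, W, C)
        cols = x.permute(0, 2, 1, 3).reshape(B * W, H, C)       # attend along each column
        cols, _ = self.col_attn(cols, cols, cols)
        return cols.reshape(B, W, H, C).permute(0, 3, 2, 1)     # back to (B, C, H, W)
```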
Keyword: mapping
UniXcoder: Unified Cross-Modal Pre-training for Code Representation
Authors: Daya Guo, Shuai Lu, Nan Duan, Yanlin Wang, Ming Zhou, Jian Yin
Subjects: Computation and Language (cs.CL); Programming Languages (cs.PL); Software Engineering (cs.SE)
Abstract
Pre-trained models for programming languages have recently demonstrated great success on code intelligence. To support both code-related understanding and generation tasks, recent works attempt to pre-train unified encoder-decoder models. However, such an encoder-decoder framework is sub-optimal for auto-regressive tasks, especially code completion, which requires a decoder-only manner for efficient inference. In this paper, we present UniXcoder, a unified cross-modal pre-trained model for programming languages. The model utilizes mask attention matrices with prefix adapters to control its behavior and leverages cross-modal contents such as the AST and code comments to enhance code representation. To encode the AST, which is represented as a tree, in parallel, we propose a one-to-one mapping method that transforms the AST into a sequence structure retaining all structural information from the tree. Furthermore, we propose to utilize multi-modal contents to learn representations of code fragments with contrastive learning, and then align representations among programming languages using a cross-modal generation task. We evaluate UniXcoder on five code-related tasks over nine datasets. To further evaluate the performance of code fragment representation, we also construct a dataset for a new task, called zero-shot code-to-code search. Results show that our model achieves state-of-the-art performance on most tasks, and analysis reveals that comments and the AST can both enhance UniXcoder.
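The one-to-one mapping of an AST to a sequence can be illustrated by serializing the tree with explicit left/right boundary tokens, so that the original structure remains recoverable. The sketch below uses Python's own ast module and an assumed token format; UniXcoder's actual encoding may differ.

```python
import ast

def flatten_ast(node):
    """Serialize an AST into a flat token sequence with explicit boundary
    markers so the tree structure stays recoverable from the sequence."""
    tokens = [f"<{type(node).__name__}:left>"]
    for child in ast.iter_child_nodes(node):
        tokens += flatten_ast(child)
    tokens.append(f"<{type(node).__name__}:right>")
    return tokens

# Usage on a tiny Python function.
tree = ast.parse("def add(a, b):\n    return a + b")
print(flatten_ast(tree)[:6])
```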
ROLL: Long-Term Robust LiDAR-based Localization With Temporary Mapping in Changing Environments
Abstract
Long-term scene changes present challenges to localization systems that use a pre-built map. This paper presents a LiDAR-based system that can provide robust localization against those challenges. Our method temporarily activates a mapping process when global matching against the pre-built map is unreliable. The temporary map is merged into the pre-built map for later localization runs once reliable matching is obtained again. We further integrate a LiDAR-inertial odometry (LIO) to provide motion-compensated LiDAR scans and a reliable initial pose guess for the global matching module. To generate a smooth real-time trajectory for navigation purposes, we fuse poses from odometry and global matching by solving a pose graph optimization problem. We evaluate our localization system with extensive experiments on the NCLT dataset, which includes a variety of changing indoor and outdoor environments, and the results demonstrate robust and accurate localization performance for over a year. The implementations are open sourced on GitHub.
An Online Semantic Mapping System for Extending and Enhancing Visual SLAM
Abstract
We present a real-time semantic mapping approach for mobile vision systems with a 2D to 3D object detection pipeline and rapid data association for generated landmarks. Besides enriching the semantic map, the associated detections are further introduced as semantic constraints into a simultaneous localization and mapping (SLAM) system for pose correction purposes. This way, we are able to generate additional meaningful information that enables higher-level tasks, while simultaneously leveraging the view-invariance of object detections to improve the accuracy and robustness of the odometry estimation. We propose tracklets of locally associated object observations to handle ambiguous and false predictions, and an uncertainty-based greedy association scheme for accelerated processing. Our system reaches real-time capability with an average iteration duration of 65 ms and improves the pose estimation of a state-of-the-art SLAM system by up to 68% on a public dataset. Additionally, we implemented our approach as a modular ROS package that makes it straightforward to integrate into arbitrary graph-based SLAM methods.
Universal Prototype Transport for Zero-Shot Action Recognition and Localization
Authors: Pascal Mettes
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
This work addresses the problem of recognizing action categories in videos for which no training examples are available. The current state-of-the-art enables such a zero-shot recognition by learning universal mappings from videos to a shared semantic space, either trained on large-scale seen actions or on objects. While effective, we find that universal action and object mappings are biased to their seen categories. Such biases are further amplified due to biases between seen and unseen categories in the semantic space. The compounding biases result in many unseen action categories simply never being selected during inference, hampering zero-shot progress. We seek to address this limitation and introduce universal prototype transport for zero-shot action recognition. The main idea is to re-position the semantic prototypes of unseen actions through transduction, i.e. by using the distribution of the unlabelled test set. For universal action models, we first seek to find a hyperspherical optimal transport mapping from unseen action prototypes to the set of all projected test videos. We then define a target prototype for each unseen action as the weighted Fréchet mean over the transport couplings. Equipped with a target prototype, we propose to re-position unseen action prototypes along the geodesic spanned by the original and target prototypes, acting as a form of semantic regularization. For universal object models, we outline a variant that defines target prototypes based on an optimal transport between unseen action prototypes and semantic object prototypes. Empirically, we show that universal prototype transport diminishes the biased selection of unseen action prototypes and boosts both universal action and object models, resulting in state-of-the-art performance for zero-shot classification and spatio-temporal localization.
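A compact sketch of the transduction step described above: compute an entropic optimal transport coupling between unseen-action prototypes and projected test embeddings, form a transport-weighted target per prototype, and move each prototype part-way toward its target. The Sinkhorn solver, cosine cost, and the linear interpolation in place of the hyperspherical geodesic and Fréchet mean are simplifying assumptions.

```python
import numpy as np

def sinkhorn(cost, eps=0.05, iters=200):
    """Entropic optimal transport coupling between uniform marginals."""
    K = np.exp(-cost / eps)
    a = np.ones(cost.shape[0]) / cost.shape[0]
    b = np.ones(cost.shape[1]) / cost.shape[1]
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

def transport_prototypes(prototypes, test_embeddings, alpha=0.5):
    """Move each unseen-action prototype toward a transport-weighted mean of
    the projected test videos (simplified re-positioning step)."""
    P = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    X = test_embeddings / np.linalg.norm(test_embeddings, axis=1, keepdims=True)
    cost = 1.0 - P @ X.T                       # cosine distance cost
    coupling = sinkhorn(cost)
    targets = coupling @ X
    targets /= np.linalg.norm(targets, axis=1, keepdims=True)
    moved = (1 - alpha) * P + alpha * targets  # linear step in place of a geodesic
    return moved / np.linalg.norm(moved, axis=1, keepdims=True)
```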
Bayesian Optimisation-Assisted Neural Network Training Technique for Radio Localisation
Authors: Xingchi Liu, Peizheng Li, Ziming Zhu
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
Abstract
Radio signal-based (indoor) localisation techniques are important for IoT applications such as smart factories and warehouses. Through machine learning, especially neural network methods, a more accurate mapping from signal features to target positions can be achieved. However, different radio protocols, such as WiFi, Bluetooth, etc., have different features in the transmitted signals that can be exploited for localisation purposes. Also, neural network methods often rely on carefully configured models and extensive training processes to obtain satisfactory performance in individual localisation scenarios. The above poses a major challenge in determining the neural network model structure, or hyperparameters, as well as in selecting training features from the available data. This paper proposes a neural network model hyperparameter tuning and training method based on Bayesian optimisation. Adaptive selection of model hyperparameters and training features can be realised with minimal need for manual model training design. With the proposed technique, the training process is optimised in a more automatic and efficient way, enhancing the applicability of neural networks in localisation.
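A minimal sketch of Bayesian-optimisation-based hyperparameter tuning using scikit-optimize's gp_minimize over an assumed search space; the objective below is a synthetic stand-in for training the localisation network and returning its validation error.

```python
import numpy as np
from skopt import gp_minimize
from skopt.space import Integer, Real

# Hypothetical hyperparameter space for a localisation network.
space = [
    Integer(1, 4,    name="hidden_layers"),
    Integer(16, 256, name="hidden_units"),
    Real(1e-4, 1e-1, name="learning_rate", prior="log-uniform"),
]

def objective(params):
    hidden_layers, hidden_units, lr = params
    # Stand-in for "train the network and return validation localisation error";
    # replace with the real training and evaluation routine.
    return (hidden_layers - 2) ** 2 + abs(np.log10(lr) + 2) + 64.0 / hidden_units

result = gp_minimize(objective, space, n_calls=30, random_state=0)
print("best hyperparameters:", result.x, "score:", result.fun)
```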
Proximal PanNet: A Model-Based Deep Network for Pansharpening
Authors: Xiangyong Cao, Yang Chen, Wenfei Cao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Recently, deep learning techniques have been extensively studied for pansharpening, which aims to generate a high resolution multispectral (HRMS) image by fusing a low resolution multispectral (LRMS) image with a high resolution panchromatic (PAN) image. However, existing deep learning-based pansharpening methods directly learn the mapping from LRMS and PAN to HRMS. These network architectures always lack sufficient interpretability, which limits further performance improvements. To alleviate this issue, we propose a novel deep network for pansharpening by combining the model-based methodology with the deep learning method. Firstly, we build an observation model for pansharpening using the convolutional sparse coding (CSC) technique and design a proximal gradient algorithm to solve this model. Secondly, we unfold the iterative algorithm into a deep network, dubbed as Proximal PanNet, by learning the proximal operators using convolutional neural networks. Finally, all the learnable modules can be automatically learned in an end-to-end manner. Experimental results on some benchmark datasets show that our network performs better than other advanced methods both quantitatively and qualitatively.
Keyword: localization
Direct LiDAR-Inertial Odometry
Authors: Kenny Chen, Ryan Nemiroff, Brett T. Lopez
Abstract
This paper proposes a new LiDAR-inertial odometry framework that generates accurate state estimates and detailed maps in real-time on resource-constrained mobile robots. Our Direct LiDAR-Inertial Odometry (DLIO) algorithm utilizes a hybrid architecture that combines the benefits of loosely-coupled and tightly-coupled IMU integration to enhance reliability and real-time performance while improving accuracy. The proposed architecture has two key elements. The first is a fast keyframe-based LiDAR scan-matcher that builds an internal map by registering dense point clouds to a local submap with a translational and rotational prior generated by a nonlinear motion model. The second is a factor graph and high-rate propagator that fuses the output of the scan-matcher with preintegrated IMU measurements for up-to-date pose, velocity, and bias estimates. These estimates enable us to accurately deskew the next point cloud using a nonlinear kinematic model for precise motion correction, in addition to initializing the next scan-to-map optimization prior. We demonstrate DLIO's superior localization accuracy, map quality, and lower computational overhead by comparing it to the state-of-the-art using multiple benchmark, public, and self-collected datasets on both consumer and hobby-grade hardware.
UWB-based Target Localization using Adaptive Belief Propagation in the HMM Framework
Abstract
This paper proposes a novel adaptive sample-space-based Viterbi algorithm for ultra-wideband (UWB) based target localization in an online manner. As the discretized area of interest is defined as a finite number of hidden states, the most probable trajectory of the unspecified agent is computed efficiently via dynamic programming in a Hidden Markov Model (HMM) framework. Furthermore, the approach requires no Gaussian assumption or linearization for Bayesian calculation. However, computational complexity becomes critical as the number of hidden states increases to achieve higher estimation accuracy and to cover larger spaces. Previous localization works based on discrete-state HMMs handle a small number of hidden variables, which represent specific paths or places. Inspired by the k-d tree algorithm (e.g., quadtree) commonly used in the computer vision field, we propose belief propagation in the most probable belief space, proceeding sequentially from low to high resolution, thus reducing the required resources significantly. Our method has three advantages for localization: (a) no Gaussian assumptions or linearization, (b) handling the whole area of interest, not specific or small map representations, and (c) reduced computation time and required memory size. Experimental tests demonstrate our results.
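The HMM backbone of the approach can be sketched with standard Viterbi decoding over discretized position states; the adaptive, quadtree-style restriction of the sample space described in the abstract is omitted. Array shapes and names are assumptions.

```python
import numpy as np

def viterbi(log_trans, log_emission, log_prior):
    """Viterbi decoding over discretized position states.

    log_trans:    (S, S) log transition probabilities between grid cells
    log_emission: (T, S) log likelihood of each UWB measurement per cell
    log_prior:    (S,)   log prior over the initial cell
    Returns the most probable sequence of cell indices.
    """
    T, S = log_emission.shape
    delta = log_prior + log_emission[0]
    backptr = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans            # (from, to)
        backptr[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emission[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]
```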
Weakly Supervised Semantic Segmentation using Out-of-Distribution Data
Abstract
Weakly supervised semantic segmentation (WSSS) methods are often built on pixel-level localization maps obtained from a classifier. However, when trained on class labels only, classifiers suffer from the spurious correlation between foreground and background cues (e.g. train and rail), fundamentally bounding the performance of WSSS. There have been previous endeavors to address this issue with additional supervision. We propose a novel source of information to distinguish foreground from background: Out-of-Distribution (OoD) data, or images devoid of foreground object classes. In particular, we utilize the hard OoDs on which the classifier is likely to make false-positive predictions. These samples typically carry key visual features of the background (e.g. rail) that classifiers often confuse as foreground (e.g. train), so these cues let classifiers correctly suppress spurious background cues. Acquiring such hard OoDs does not require extensive annotation effort; it only incurs a few additional image-level labeling costs on top of the original effort to collect class labels. We propose a method, W-OoD, for utilizing the hard OoDs. W-OoD achieves state-of-the-art performance on Pascal VOC 2012.
Towards Large-Scale Relative Localization in Multi-Robot Systems with Dynamic UWB Role Allocation
Authors: Paola Torrico Morón, Jorge Peña Queralta, Tomi Westerlund
Abstract
Ultra-wideband (UWB) ranging has emerged as a key radio technology for robot positioning and relative localization in multi-robot systems. Multiple works are now advancing towards more scalable systems, but challenges still remain. This paper proposes a novel approach to relative localization in multi-robot systems where the roles of the UWB nodes are dynamically allocated between active nodes (using time-of-flight for ranging estimation to other active nodes) and passive nodes (using time-difference-of-arrival for estimating range differences with respect to pairs of active nodes). We adaptively update UWB roles based on the location of the robots with respect to the convex envelope defined by the active nodes, and introduce constraints in the form of localization frequency and accuracy requirements. We demonstrate the applicability of the proposed approach and show that the localization errors remain comparable to those of fixed-role systems. We then show how the navigation of an autonomous drone is affected by changes in the localization system, obtaining significantly better trajectory tracking accuracy than when relying on passive localization only. Our results pave the way for UWB-based localization in large-scale multi-robot deployments, for either relative positioning or applications in GNSS-denied environments.
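The convex-envelope criterion for role allocation can be sketched with a point-in-hull test: robots inside the hull of the active nodes can be localized passively via TDOA, while robots outside it become candidates for an active role. The decision rule below ignores the frequency and accuracy constraints mentioned in the abstract and is purely illustrative; all names are assumptions.

```python
import numpy as np
from scipy.spatial import Delaunay

def assign_uwb_roles(robot_positions, active_ids):
    """Assign 'active'/'passive' roles based on whether each robot lies inside
    the convex envelope of the currently active (TOF) nodes."""
    hull = Delaunay(robot_positions[active_ids])
    roles = {}
    for i, p in enumerate(robot_positions):
        if i in active_ids:
            roles[i] = "active"
        else:
            roles[i] = "passive" if hull.find_simplex(p) >= 0 else "promote-to-active"
    return roles

# Usage with five robots in the plane, three of them currently active.
positions = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 3.0], [2.0, 1.0], [6.0, 5.0]])
print(assign_uwb_roles(positions, active_ids=[0, 1, 2]))
```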
ROLL: Long-Term Robust LiDAR-based Localization With Temporary Mapping in Changing Environments
Abstract
Long-term scene changes present challenges to localization systems that use a pre-built map. This paper presents a LiDAR-based system that can provide robust localization against those challenges. Our method temporarily activates a mapping process when global matching against the pre-built map is unreliable. The temporary map is merged into the pre-built map for later localization runs once reliable matching is obtained again. We further integrate a LiDAR-inertial odometry (LIO) to provide motion-compensated LiDAR scans and a reliable initial pose guess for the global matching module. To generate a smooth real-time trajectory for navigation purposes, we fuse poses from odometry and global matching by solving a pose graph optimization problem. We evaluate our localization system with extensive experiments on the NCLT dataset, which includes a variety of changing indoor and outdoor environments, and the results demonstrate robust and accurate localization performance for over a year. The implementations are open sourced on GitHub.
An Online Semantic Mapping System for Extending and Enhancing Visual SLAM
Abstract
We present a real-time semantic mapping approach for mobile vision systems with a 2D to 3D object detection pipeline and rapid data association for generated landmarks. Besides enriching the semantic map, the associated detections are further introduced as semantic constraints into a simultaneous localization and mapping (SLAM) system for pose correction purposes. This way, we are able to generate additional meaningful information that enables higher-level tasks, while simultaneously leveraging the view-invariance of object detections to improve the accuracy and robustness of the odometry estimation. We propose tracklets of locally associated object observations to handle ambiguous and false predictions, and an uncertainty-based greedy association scheme for accelerated processing. Our system reaches real-time capability with an average iteration duration of 65 ms and improves the pose estimation of a state-of-the-art SLAM system by up to 68% on a public dataset. Additionally, we implemented our approach as a modular ROS package that makes it straightforward to integrate into arbitrary graph-based SLAM methods.
Universal Prototype Transport for Zero-Shot Action Recognition and Localization
Authors: Pascal Mettes
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
This work addresses the problem of recognizing action categories in videos for which no training examples are available. The current state-of-the-art enables such a zero-shot recognition by learning universal mappings from videos to a shared semantic space, either trained on large-scale seen actions or on objects. While effective, we find that universal action and object mappings are biased to their seen categories. Such biases are further amplified due to biases between seen and unseen categories in the semantic space. The compounding biases result in many unseen action categories simply never being selected during inference, hampering zero-shot progress. We seek to address this limitation and introduce universal prototype transport for zero-shot action recognition. The main idea is to re-position the semantic prototypes of unseen actions through transduction, i.e. by using the distribution of the unlabelled test set. For universal action models, we first seek to find a hyperspherical optimal transport mapping from unseen action prototypes to the set of all projected test videos. We then define a target prototype for each unseen action as the weighted Fréchet mean over the transport couplings. Equipped with a target prototype, we propose to re-position unseen action prototypes along the geodesic spanned by the original and target prototypes, acting as a form of semantic regularization. For universal object models, we outline a variant that defines target prototypes based on an optimal transport between unseen action prototypes and semantic object prototypes. Empirically, we show that universal prototype transport diminishes the biased selection of unseen action prototypes and boosts both universal action and object models, resulting in state-of-the-art performance for zero-shot classification and spatio-temporal localization.
End-to-End Semi-Supervised Learning for Video Action Detection
Authors: Akash Kumar, Yogesh Singh Rawat
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
In this work, we focus on semi-supervised learning for video action detection which utilizes both labeled as well as unlabeled data. We propose a simple end-to-end consistency based approach which effectively utilizes the unlabeled data. Video action detection requires both, action class prediction as well as a spatio-temporal localization of actions. Therefore, we investigate two types of constraints, classification consistency, and spatio-temporal consistency. The presence of predominant background and static regions in a video makes it challenging to utilize spatio-temporal consistency for action detection. To address this, we propose two novel regularization constraints for spatio-temporal consistency; 1) temporal coherency, and 2) gradient smoothness. Both these aspects exploit the temporal continuity of action in videos and are found to be effective for utilizing unlabeled videos for action detection. We demonstrate the effectiveness of the proposed approach on two different action detection benchmark datasets, UCF101-24 and JHMDB-21. In addition, we also show the effectiveness of the proposed approach for video object segmentation on the Youtube-VOS dataset which demonstrates its generalization capability to other tasks. The proposed approach achieves competitive performance by using merely 20% of annotations on UCF101-24 when compared with recent fully supervised methods. On UCF101-24, it improves the score by +8.9% and +11% at 0.5 f-mAP and v-mAP respectively, compared to supervised approach.
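One way to picture the constraints named above is a combined unlabeled-data loss with a cross-view consistency term, a temporal coherency term on neighbouring frames, and a smoothness penalty on the temporal gradient. The tensor layout and the exact form of each term are assumptions, not the paper's loss.

```python
import torch.nn.functional as F

def spatiotemporal_consistency_loss(pred_a, pred_b):
    """Sketch of unlabeled-data constraints for video action detection.

    pred_a, pred_b: (B, T, H, W) per-frame action localization maps from two
    differently augmented versions of the same unlabeled clip (assumed setup).
    """
    # Classification/localization consistency between the two augmented views.
    consistency = F.mse_loss(pred_a, pred_b)
    # Temporal coherency: neighbouring frames should localize the action similarly.
    coherency = F.l1_loss(pred_a[:, 1:], pred_a[:, :-1])
    # Gradient smoothness: penalize abrupt changes of the temporal gradient.
    grad = pred_a[:, 1:] - pred_a[:, :-1]
    smoothness = F.l1_loss(grad[:, 1:], grad[:, :-1])
    return consistency + coherency + smoothness
```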
Keyword: SLAM
An Online Semantic Mapping System for Extending and Enhancing Visual SLAM
Keyword: Visual inertial
There is no result
Keyword: livox
There is no result
Keyword: loam
There is no result
Keyword: Visual inertial odometry
There is no result
Keyword: lidar
Direct LiDAR-Inertial Odometry
ROLL: Long-Term Robust LiDAR-based Localization With Temporary Mapping in Changing Environments
Analyzing General-Purpose Deep-Learning Detection and Segmentation Models with Images from a Lidar as a Camera Sensor
A Lightweight and Detector-free 3D Single Object Tracker on Point Clouds
Keyword: loop detection
There is no result
Keyword: autonomous driving
Occupancy Flow Fields for Motion Forecasting in Autonomous Driving
BEVSegFormer: Bird's Eye View Semantic Segmentation From Arbitrary Camera Rigs
Lane Detection with Versatile AtrousFormer and Local Semantic Guidance
Keyword: mapping
UniXcoder: Unified Cross-Modal Pre-training for Code Representation
ROLL: Long-Term Robust LiDAR-based Localization With Temporary Mapping in Changing Environments
An Online Semantic Mapping System for Extending and Enhancing Visual SLAM
Universal Prototype Transport for Zero-Shot Action Recognition and Localization
Bayesian Optimisation-Assisted Neural Network Training Technique for Radio Localisation
Proximal PanNet: A Model-Based Deep Network for Pansharpening
Keyword: localization
Direct LiDAR-Inertial Odometry
UWB-based Target Localization using Adaptive Belief Propagation in the HMM Framework
Weakly Supervised Semantic Segmentation using Out-of-Distribution Data
Towards Large-Scale Relative Localization in Multi-Robot Systems with Dynamic UWB Role Allocation
ROLL: Long-Term Robust LiDAR-based Localization With Temporary Mapping in Changing Environments
An Online Semantic Mapping System for Extending and Enhancing Visual SLAM
Universal Prototype Transport for Zero-Shot Action Recognition and Localization
End-to-End Semi-Supervised Learning for Video Action Detection