Keyword: SLAM
Unsupervised Simultaneous Learning for Camera Re-Localization and Depth Estimation from Video
Authors: Shun Taguchi, Noriaki Hirose
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Abstract
We present an unsupervised simultaneous learning framework for the task of monocular camera re-localization and depth estimation from unlabeled video sequences. Monocular camera re-localization refers to the task of estimating the absolute camera pose from a single image in a known environment, and it has been intensively studied as an alternative form of localization in GPS-denied environments. In recent works, camera re-localization methods are trained via supervised learning from pairs of camera images and camera poses. In contrast to previous works, we propose a completely unsupervised learning framework for camera re-localization and depth estimation that requires only monocular video sequences for training. In our framework, we train two networks that estimate the scene coordinates using directions and the depth map from each image, which are then combined to estimate the camera pose. The networks can be trained by minimizing loss functions based on our loop-closed view synthesis. In experiments with the 7-Scenes dataset, the proposed method outperformed the re-localization of the state-of-the-art visual SLAM system ORB-SLAM3. Our method also outperforms state-of-the-art monocular depth estimation in a trained environment.
MD-SLAM: Multi-cue Direct SLAM
Authors: Luca Di Giammarino, Leonardo Brizi, Tiziano Guadagnino, Cyrill Stachniss, Giorgio Grisetti
Abstract
Simultaneous Localization and Mapping (SLAM) systems are fundamental building blocks for any autonomous robot navigating in unknown environments. The SLAM implementation heavily depends on the sensor modality employed on the mobile platform. For this reason, assumptions on the scene's structure are often made to maximize estimation accuracy. This paper presents a novel direct 3D SLAM pipeline that works independently for RGB-D and LiDAR sensors. Building upon prior work on multi-cue photometric frame-to-frame alignment, our proposed approach provides an easy-to-extend and generic SLAM system. Our pipeline requires only minor adaptations within the projection model to handle different sensor modalities. We couple a position tracking system with an appearance-based relocalization mechanism that handles large loop closures. Loop closures are validated by the same direct registration algorithm used for odometry estimation. We present comparative experiments with state-of-the-art approaches on publicly available benchmarks using RGB-D cameras and 3D LiDARs. Our system performs well in heterogeneous datasets compared to other sensor-specific methods while making no assumptions about the environment. Finally, we release an open-source C++ implementation of our system.
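The sensor-agnostic design described above hinges on swapping only the projection model. The sketch below is illustrative only and is not MD-SLAM's C++ API: a pinhole projection serves RGB-D frames, a spherical projection serves LiDAR range images, and a generic direct-alignment routine could accept either callable unchanged.

```python
# Illustrative sketch, not the MD-SLAM implementation: the sensor-specific part of a
# direct pipeline reduces to the projection model mapping 3D points to image coordinates.
import numpy as np

def project_pinhole(points, fx, fy, cx, cy):
    """RGB-D style pinhole projection of Nx3 camera-frame points to pixel (u, v)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    return np.stack([fx * x / z + cx, fy * y / z + cy], axis=1)

def project_spherical(points, width, height, fov_up, fov_down):
    """LiDAR style spherical projection of Nx3 points to range-image coordinates (FOV in radians)."""
    r = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(points[:, 1], points[:, 0])
    pitch = np.arcsin(points[:, 2] / r)
    u = 0.5 * (1.0 - yaw / np.pi) * width
    v = (fov_up - pitch) / (fov_up - fov_down) * height
    return np.stack([u, v], axis=1)

# A generic multi-cue frame-to-frame alignment would receive one of these callables and
# minimize photometric/depth residuals over the warped pixels, independent of the sensor.
```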
Keyword: Visual inertial
There is no result
Keyword: livox
There is no result
Keyword: loam
There is no result
Keyword: Visual inertial odometry
There is no result
Keyword: lidar
Robust Onboard Localization in Changing Environments Exploiting Text Spotting
Authors: Nicky Zimmerman, Louis Wiesmann, Tiziano Guadagnino, Thomas Läbe, Jens Behley, Cyrill Stachniss
Abstract
Robust localization in a given map is a crucial component of most autonomous robots. In this paper, we address the problem of localizing in an indoor environment that changes over time and where prominent structures have no correspondence in the map built at a different point in time. To overcome the discrepancy between the map and the observed environment caused by such changes, we exploit human-readable localization cues to assist localization. These cues are readily available in most facilities and can be detected in RGB camera images using text spotting. We integrate these cues into a Monte Carlo localization framework using a particle filter that operates on 2D LiDAR scans and camera data. In this way, we provide a robust localization solution for environments with structural changes and dynamics caused by people walking. We evaluate our localization framework on multiple challenging indoor scenarios in an office environment. The experiments suggest that our approach is robust to structural changes and can run on an onboard computer. We release an open-source implementation of our approach (upon paper acceptance), which uses off-the-shelf text spotting and is written in C++ with a ROS wrapper.
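As a rough illustration of how such cues can enter a particle filter, the sketch below multiplies a standard LiDAR observation likelihood with a simple text-cue factor. The callables `lidar_likelihood` and `expected_room_label`, and the 0.9/0.1 weighting, are placeholders, not the paper's actual models.

```python
# Hypothetical weight update fusing a LiDAR model with a text-spotting cue in MCL.
import numpy as np

def update_weights(particles, weights, scan, spotted_text,
                   lidar_likelihood, expected_room_label):
    """Re-weight particles with a LiDAR observation model and an optional text cue."""
    for i, pose in enumerate(particles):
        w = lidar_likelihood(scan, pose)      # e.g. a likelihood-field or beam model
        if spotted_text is not None:
            # Favor particles whose map location carries the spotted label
            # (e.g. a room number read from a door sign); factors are placeholders.
            w *= 0.9 if expected_room_label(pose) == spotted_text else 0.1
        weights[i] = w
    weights = np.asarray(weights, dtype=float)
    return weights / weights.sum()
```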
AziNorm: Exploiting the Radial Symmetry of Point Cloud for Azimuth-Normalized 3D Perception
Abstract
Studying the inherent symmetry of data is of great importance in machine learning. Point clouds, the most important data format for 3D environmental perception, are naturally endowed with strong radial symmetry. In this work, we exploit this radial symmetry via a divide-and-conquer strategy to boost 3D perception performance and ease optimization. We propose Azimuth Normalization (AziNorm), which normalizes point clouds along the radial direction and eliminates the variability brought by differences in azimuth. AziNorm can be flexibly incorporated into most LiDAR-based perception methods. To validate its effectiveness and generalization ability, we apply AziNorm to both object detection and semantic segmentation. For detection, we integrate AziNorm into two representative detection methods, the one-stage SECOND detector and the state-of-the-art two-stage PV-RCNN detector. Experiments on the Waymo Open Dataset demonstrate that AziNorm improves SECOND and PV-RCNN by 7.03 mAPH and 3.01 mAPH, respectively. For segmentation, we integrate AziNorm into KPConv. On the SemanticKITTI dataset, AziNorm improves KPConv by 1.6/1.1 mIoU on the val/test sets. Besides, AziNorm remarkably improves data efficiency and accelerates convergence, reducing the required amount of data or number of training epochs by an order of magnitude. SECOND with AziNorm can significantly outperform fully trained vanilla SECOND even when trained with only 10% of the data or 10% of the epochs. Code and models are available at https://github.com/hustvl/AziNorm.
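A minimal sketch of the normalization idea, assuming patch-wise processing: each local patch is rotated about the sensor's vertical axis so that its centroid lies on a canonical azimuth, which removes azimuth-induced variability before the downstream detector or segmenter sees the points.

```python
import numpy as np

def azimuth_normalize(patch):
    """Rotate an Nx3 patch about the z-axis so its centroid's azimuth becomes zero."""
    centroid = patch.mean(axis=0)
    azimuth = np.arctan2(centroid[1], centroid[0])
    c, s = np.cos(-azimuth), np.sin(-azimuth)
    rot_z = np.array([[c, -s, 0.0],
                      [s,  c, 0.0],
                      [0.0, 0.0, 1.0]])
    return patch @ rot_z.T   # predictions are later rotated back by +azimuth
```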
MD-SLAM: Multi-cue Direct SLAM
Authors: Luca Di Giammarino, Leonardo Brizi, Tiziano Guadagnino, Cyrill Stachniss, Giorgio Grisetti
Abstract
Simultaneous Localization and Mapping (SLAM) systems are fundamental building blocks for any autonomous robot navigating in unknown environments. The SLAM implementation heavily depends on the sensor modality employed on the mobile platform. For this reason, assumptions on the scene's structure are often made to maximize estimation accuracy. This paper presents a novel direct 3D SLAM pipeline that works independently for RGB-D and LiDAR sensors. Building upon prior work on multi-cue photometric frame-to-frame alignment, our proposed approach provides an easy-to-extend and generic SLAM system. Our pipeline requires only minor adaptations within the projection model to handle different sensor modalities. We couple a position tracking system with an appearance-based relocalization mechanism that handles large loop closures. Loop closures are validated by the same direct registration algorithm used for odometry estimation. We present comparative experiments with state-of-the-art approaches on publicly available benchmarks using RGB-D cameras and 3D LiDARs. Our system performs well in heterogeneous datasets compared to other sensor-specific methods while making no assumptions about the environment. Finally, we release an open-source C++ implementation of our system.
Keyword: loop detection
There is no result
Keyword: autonomous driving
Extending MAPE-K to support Human-Machine Teaming
Authors: Jane Cleland-Huang, Ankit Agrawal, Michael Vierhauser, Michael Murphy, Mike Prieto
Abstract
The MAPE-K feedback loop has been established as the primary reference model for self-adaptive and autonomous systems in domains such as autonomous driving, robotics, and Cyber-Physical Systems. At the same time, the Human Machine Teaming (HMT) paradigm is designed to promote partnerships between humans and autonomous machines. It goes far beyond the degree of collaboration expected in human-on-the-loop and human-in-the-loop systems and emphasizes interactions, partnership, and teamwork between humans and machines. However, while MAPE-K enables fully autonomous behavior, it does not explicitly address the interactions between humans and machines as intended by HMT. In this paper, we present the MAPE-K-HMT framework which augments the traditional MAPE-K loop with support for HMT. We identify critical human-machine teaming factors and describe the infrastructure needed across the various phases of the MAPE-K loop in order to effectively support HMT. This includes runtime models that are constructed and populated dynamically across monitoring, analysis, planning, and execution phases to support human-machine partnerships. We illustrate MAPE-K-HMT using examples from an autonomous multi-UAV emergency response system, and present guidelines for integrating HMT into MAPE-K.
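The sketch below is a conceptual, self-contained skeleton (not code from the paper) of a MAPE-K loop whose shared knowledge holds runtime models a human teammate can inspect, and whose planning step merges a human proposal with the autonomous plan instead of overriding it.

```python
# Conceptual sketch of a MAPE-K loop with a human-machine teaming hook; all names,
# thresholds, and behaviors are illustrative placeholders, not the paper's framework.
class MapeKHmtLoop:
    def __init__(self):
        self.knowledge = {"state": None, "symptoms": [], "plan": []}  # shared runtime models

    def monitor(self, reading):
        self.knowledge["state"] = reading

    def analyze(self):
        # Placeholder analysis: flag a symptom when the battery level is low.
        state = self.knowledge["state"] or {}
        self.knowledge["symptoms"] = ["low_battery"] if state.get("battery", 1.0) < 0.2 else []

    def plan(self, human_proposal=None):
        autonomous = ["return_to_base"] if "low_battery" in self.knowledge["symptoms"] else []
        # HMT extension: the human teammate's proposal is merged rather than overridden.
        self.knowledge["plan"] = autonomous + (human_proposal or [])

    def execute(self):
        return list(self.knowledge["plan"])  # would dispatch to actuators in a real system

loop = MapeKHmtLoop()
loop.monitor({"battery": 0.15})
loop.analyze()
loop.plan(human_proposal=["hover_and_wait"])
print(loop.execute())  # ['return_to_base', 'hover_and_wait']
```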
Keyword: mapping
Methods2Test: A dataset of focal methods mapped to test cases
Authors: Michele Tufano, Shao Kun Deng, Neel Sundaresan, Alexey Svyatkovskiy
Abstract
Unit testing is an essential part of the software development process; it helps to identify issues with source code in early stages of development and prevents regressions. Machine learning has emerged as a viable approach to help software developers generate automated unit tests. However, generating reliable unit test cases that are semantically correct and capable of catching software bugs or unintended behavior via machine learning requires large, metadata-rich datasets. In this paper we present Methods2Test, a large, supervised dataset of test cases mapped to their corresponding methods under test (i.e., focal methods). The dataset contains 780,944 pairs of JUnit tests and focal methods, extracted from a total of 91,385 Java open-source projects hosted on GitHub with licenses permitting re-distribution. The main challenge behind the creation of Methods2Test was to establish a reliable mapping between a test case and the relevant focal method. To this aim, we designed a set of heuristics, based on developers' best practices in software testing, which identify the likely focal method for a given test case. To facilitate further analysis, we store a rich set of metadata for each method-test pair in JSON-formatted files. Additionally, we extract a textual corpus from the dataset at different context levels, which we provide in both raw and tokenized forms, in order to enable researchers to train and evaluate machine learning models for automated test generation. Methods2Test is publicly available at: https://github.com/microsoft/methods2test
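To give a flavor of the name-based heuristics mentioned above, the snippet below matches a JUnit test to a candidate focal method via class- and method-name conventions. The dataset's actual heuristics are richer; `candidates` is a hypothetical map from class names to method names, as one would extract from a parsed Java project.

```python
# Sketch of a naming-convention heuristic, not the dataset's full mapping pipeline.
def find_focal_method(test_class, test_method, candidates):
    """Guess the focal (class, method) for a JUnit test via naming conventions."""
    focal_class = test_class.removesuffix("Test").removeprefix("Test")
    method = test_method.removeprefix("test").lstrip("_")
    method = method[:1].lower() + method[1:] if method else method
    for m in candidates.get(focal_class, []):
        if m.lower() == method.lower():
            return focal_class, m
    return focal_class, None  # fall back to a class-level match only

# Example: testAdd in CalculatorTest -> Calculator.add
print(find_focal_method("CalculatorTest", "testAdd", {"Calculator": ["add", "sub"]}))
```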
Intrinsic Bias Identification on Medical Image Datasets
Authors: Shijie Zhang, Lanjun Wang, Lian Ding, Senhua Zhu, Dandan Tu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Machine-learning-based medical image analysis depends heavily on datasets. Biases in a dataset can be learned by the model and degrade the generalizability of the resulting applications. There are existing studies on debiasing models; however, it is difficult for scientists and practitioners to identify implicit biases in datasets, which causes a lack of reliable unbiased test datasets for validating models. To tackle this issue, we first define the data-intrinsic bias attribute and then propose a novel bias identification framework for medical image datasets. The framework contains two major components, KlotskiNet and Bias Discriminant Direction Analysis (bdda): KlotskiNet builds the mapping that makes backgrounds distinguish positive and negative samples, while bdda provides a theoretical solution for determining bias attributes. Experimental results on three datasets show the effectiveness of the bias attributes discovered by the framework.
WarpingGAN: Warping Multiple Uniform Priors for Adversarial 3D Point Cloud Generation
Abstract
We propose WarpingGAN, an effective and efficient 3D point cloud generation network. Unlike existing methods that generate point clouds by directly learning the mapping functions between latent codes and 3D shapes, WarpingGAN learns a unified local-warping function to warp multiple identical pre-defined priors (i.e., sets of points uniformly distributed on regular 3D grids) into 3D shapes driven by local structure-aware semantics. In addition, we utilize the principle of the discriminator and tailor a stitching loss to eliminate the gaps between different partitions of a generated shape corresponding to different priors, boosting quality. Owing to this novel generating mechanism, WarpingGAN, a single lightweight network after one-time training, is capable of efficiently generating uniformly distributed 3D point clouds at various resolutions. Extensive experimental results demonstrate the superiority of WarpingGAN over state-of-the-art methods in terms of quantitative metrics, visual quality, and efficiency. The source code is publicly available at https://github.com/yztang4/WarpingGAN.git.
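A rough PyTorch sketch of the warping idea, not tied to the paper's actual architecture: a shared MLP, conditioned on a latent code, displaces the points of a pre-defined uniform grid prior to form a shape. Layer sizes and names are illustrative assumptions.

```python
# Illustrative local-warping generator, not the WarpingGAN implementation.
import torch
import torch.nn as nn

class LocalWarper(nn.Module):
    def __init__(self, latent_dim=128, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, grid, z):
        # grid: (B, N, 3) points sampled on a regular 3D grid; z: (B, latent_dim) latent code
        cond = torch.cat([grid, z.unsqueeze(1).expand(-1, grid.size(1), -1)], dim=-1)
        return grid + self.mlp(cond)   # displace the prior points into a shape

warper = LocalWarper()
grid = torch.rand(2, 1024, 3)          # stand-in for a pre-defined uniform prior
z = torch.randn(2, 128)
points = warper(grid, z)               # (2, 1024, 3) generated point cloud
```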
Learning Dense Correspondence from Synthetic Environments
Authors: Mithun Lal, Anthony Paproki, Nariman Habili, Lars Petersson, Olivier Salvado, Clinton Fookes
Abstract
Estimation of human shape and pose from a single image is a challenging task. It is an even more difficult problem to map the identified human shape onto a 3D human model. Existing methods map manually labelled human pixels in real 2D images onto the 3D surface, which is prone to human error, and the sparsity of available annotated data often leads to sub-optimal results. We propose to solve the problem of data scarcity by training 2D-3D human mapping algorithms using automatically generated synthetic data for which exact and dense 2D-3D correspondence is known. Such a learning strategy using synthetic environments has a high generalisation potential towards real-world data. Using different camera parameter variations, background and lighting settings, we created precise ground-truth data that covers a wider distribution. We evaluate the performance of models trained on synthetic data using the COCO dataset and its validation framework. The results show that training 2D-3D mapping network models on synthetic data is a viable alternative to using real data.
MD-SLAM: Multi-cue Direct SLAM
Authors: Luca Di Giammarino, Leonardo Brizi, Tiziano Guadagnino, Cyrill Stachniss, Giorgio Grisetti
Abstract
Simultaneous Localization and Mapping (SLAM) systems are fundamental building blocks for any autonomous robot navigating in unknown environments. The SLAM implementation heavily depends on the sensor modality employed on the mobile platform. For this reason, assumptions on the scene's structure are often made to maximize estimation accuracy. This paper presents a novel direct 3D SLAM pipeline that works independently for RGB-D and LiDAR sensors. Building upon prior work on multi-cue photometric frame-to-frame alignment, our proposed approach provides an easy-to-extend and generic SLAM system. Our pipeline requires only minor adaptations within the projection model to handle different sensor modalities. We couple a position tracking system with an appearance-based relocalization mechanism that handles large loop closures. Loop closures are validated by the same direct registration algorithm used for odometry estimation. We present comparative experiments with state-of-the-art approaches on publicly available benchmarks using RGB-D cameras and 3D LiDARs. Our system performs well in heterogeneous datasets compared to other sensor-specific methods while making no assumptions about the environment. Finally, we release an open-source C++ implementation of our system.
Keyword: localization
Robust Onboard Localization in Changing Environments Exploiting Text Spotting
Authors: Nicky Zimmerman, Louis Wiesmann, Tiziano Guadagnino, Thomas Läbe, Jens Behley, Cyrill Stachniss
Abstract
Robust localization in a given map is a crucial component of most autonomous robots. In this paper, we address the problem of localizing in an indoor environment that changes over time and where prominent structures have no correspondence in the map built at a different point in time. To overcome the discrepancy between the map and the observed environment caused by such changes, we exploit human-readable localization cues to assist localization. These cues are readily available in most facilities and can be detected in RGB camera images using text spotting. We integrate these cues into a Monte Carlo localization framework using a particle filter that operates on 2D LiDAR scans and camera data. In this way, we provide a robust localization solution for environments with structural changes and dynamics caused by people walking. We evaluate our localization framework on multiple challenging indoor scenarios in an office environment. The experiments suggest that our approach is robust to structural changes and can run on an onboard computer. We release an open-source implementation of our approach (upon paper acceptance), which uses off-the-shelf text spotting and is written in C++ with a ROS wrapper.
Functional mimicry of Ruffini receptors with Fiber Bragg Gratings and Deep Neural Networks enables a bio-inspired large-area tactile sensitive skin
Authors: Luca Massari, Giulia Fransvea, Jessica D'Abbraccio, Mariangela Filosa, Giuseppe Terruso, Andrea Aliperta, Giacomo D'Alesio, Martina Zaltieri, Emiliano Schena, Eduardo Palermo, Edoardo Sinibaldi, Calogero Maria Oddo
Subjects: Robotics (cs.RO); Signal Processing (eess.SP); Systems and Control (eess.SY)
Abstract
Collaborative robots are expected to physically interact with humans in daily living and workplace settings, including industrial and healthcare environments. A related key enabling technology is tactile sensing, which currently requires addressing the outstanding scientific challenge of simultaneously detecting contact location and intensity by means of soft, conformable artificial skins that adapt over large areas to the complex curved geometries of robot embodiments. In this work, the development of a large-area sensitive soft skin with a curved geometry is presented, allowing for robot total-body coverage through modular patches. The biomimetic skin consists of a soft polymeric matrix, resembling a human forearm, embedded with photonic Fiber Bragg Grating (FBG) transducers, which partially mimic Ruffini mechanoreceptor functionality with diffuse, overlapping receptive fields. A Convolutional Neural Network deep learning algorithm and a multigrid Neuron Integration Process were implemented to decode the FBG sensor outputs and infer contact force magnitude and localization over the skin surface. The results achieved median errors of 35 mN (IQR = 56 mN) for force and 3.2 mm (IQR = 2.3 mm) for localization predictions. Demonstrations with an anthropomorphic arm pave the way towards AI-based integrated skins enabling safe human-robot cooperation via machine intelligence.
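As an illustration only, the sketch below shows a small 1D CNN regressing contact force magnitude and (x, y) location from a vector of FBG wavelength shifts. The number of sensors, layer sizes, and output parameterization are assumptions, not the paper's architecture.

```python
# Hypothetical decoder from FBG wavelength shifts to [force, x, y]; sizes are illustrative.
import torch
import torch.nn as nn

class FBGDecoder(nn.Module):
    def __init__(self, n_sensors=24):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.head = nn.Linear(32 * n_sensors, 3)   # [force magnitude, contact x, contact y]

    def forward(self, shifts):
        # shifts: (B, n_sensors) wavelength shifts from the embedded FBG array
        h = self.features(shifts.unsqueeze(1))
        return self.head(h.flatten(1))

decoder = FBGDecoder()
out = decoder(torch.randn(4, 24))   # (4, 3): force magnitude and contact location
```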
Unsupervised Simultaneous Learning for Camera Re-Localization and Depth Estimation from Video
Authors: Shun Taguchi, Noriaki Hirose
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Abstract
We present an unsupervised simultaneous learning framework for the task of monocular camera re-localization and depth estimation from unlabeled video sequences. Monocular camera re-localization refers to the task of estimating the absolute camera pose from a single image in a known environment, and it has been intensively studied as an alternative form of localization in GPS-denied environments. In recent works, camera re-localization methods are trained via supervised learning from pairs of camera images and camera poses. In contrast to previous works, we propose a completely unsupervised learning framework for camera re-localization and depth estimation that requires only monocular video sequences for training. In our framework, we train two networks that estimate the scene coordinates using directions and the depth map from each image, which are then combined to estimate the camera pose. The networks can be trained by minimizing loss functions based on our loop-closed view synthesis. In experiments with the 7-Scenes dataset, the proposed method outperformed the re-localization of the state-of-the-art visual SLAM system ORB-SLAM3. Our method also outperforms state-of-the-art monocular depth estimation in a trained environment.
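For context, the sketch below shows the plain view-synthesis objective that unsupervised depth-and-pose frameworks of this kind build on: target pixels are back-projected with the predicted depth, transformed by the predicted relative pose, re-projected into the source frame, and the photometric difference of the sampled source image is minimized. The paper's loop-closed variant adds further constraints that are not reproduced here.

```python
# Generic view-synthesis (photometric) loss; a sketch of the standard formulation,
# not the paper's loop-closed objective.
import torch
import torch.nn.functional as F

def view_synthesis_loss(target, source, depth, T, K, K_inv):
    # target, source: (B, 3, H, W); depth: (B, 1, H, W); T: (B, 4, 4); K, K_inv: (B, 3, 3)
    B, _, H, W = target.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).float()        # (3, H, W)
    pix = pix.view(1, 3, -1).expand(B, -1, -1)                             # (B, 3, H*W)
    cam = (K_inv @ pix) * depth.reshape(B, 1, -1)                          # back-project
    cam_h = torch.cat([cam, torch.ones(B, 1, H * W)], dim=1)               # homogeneous coords
    proj = K @ (T @ cam_h)[:, :3]                                          # into source frame
    uv = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)
    grid = torch.stack([uv[:, 0] / (W - 1), uv[:, 1] / (H - 1)], dim=-1) * 2 - 1
    warped = F.grid_sample(source, grid.view(B, H, W, 2), align_corners=True)
    return (warped - target).abs().mean()                                  # L1 photometric error
```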
Keypoints Tracking via Transformer Networks
Authors: Oleksii Nasypanyi, Francois Rameau
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
In this thesis, we propose a pioneering work on sparse keypoint tracking across images using transformer networks. While deep learning-based keypoint matching has been widely investigated using graph neural networks, and more recently transformer networks, these methods remain too slow to operate in real time and are particularly sensitive to the poor repeatability of keypoint detectors. In order to address these shortcomings, we propose to study the particular case of real-time and robust keypoint tracking. Specifically, we propose a novel architecture which ensures a fast and robust estimation of keypoint tracks between successive images of a video sequence. Our method takes advantage of a recent breakthrough in computer vision, namely visual transformer networks. It consists of two successive stages: a coarse matching followed by a fine localization of the keypoints' correspondence predictions. Through various experiments, we demonstrate that our approach achieves competitive results and exhibits high robustness against adverse conditions, such as illumination changes, occlusion and viewpoint differences.
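A compact sketch of a coarse matching stage in the spirit described above (this is not the thesis' architecture): cross-attention between keypoint descriptors from frame t and dense features of frame t+1 yields soft correspondence scores, from which a coarse location per keypoint is read out before a local refinement step.

```python
# Illustrative coarse tracking stage; layer sizes and interfaces are assumptions.
import torch
import torch.nn as nn

class CoarseTracker(nn.Module):
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, kp_desc, frame_feat, frame_pos):
        # kp_desc: (B, K, D) keypoint descriptors from frame t
        # frame_feat: (B, M, D) flattened feature map of frame t+1
        # frame_pos: (B, M, 2) pixel coordinates of those M feature locations
        _, scores = self.attn(kp_desc, frame_feat, frame_feat)   # (B, K, M) attention weights
        coarse = scores @ frame_pos                               # expected position per keypoint
        return coarse                                             # (B, K, 2), to be refined locally
```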
Is Geometry Enough for Matching in Visual Localization?
Authors: Qunjie Zhou, Sergio Agostinho, Aljosa Osep, Laura Leal-Taixe
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
In this paper, we propose to go beyond the well-established approach to vision-based localization that relies on visual descriptor matching between a query image and a 3D point cloud. While matching keypoints via visual descriptors makes localization highly accurate, it has significant storage demands, raises privacy concerns and increases map maintenance complexity. To elegantly address those practical challenges for large-scale localization, we present GoMatch, an alternative to visual-based matching that relies solely on geometric information for matching image keypoints to maps, represented as sets of bearing vectors. Our novel bearing-vector representation of 3D points significantly relieves the cross-domain challenge in geometry-based matching that prevented prior work from tackling localization in realistic environments. With additional careful architecture design, GoMatch improves over prior geometry-based matching work with a reduction of ($10.67m, 95.7^{\circ}$) and ($1.43m$, $34.7^{\circ}$) in average median pose errors on Cambridge Landmarks and 7-Scenes, while requiring as little as $1.5/1.7\%$ of the storage capacity in comparison to the best visual-based matching methods. This confirms its potential and feasibility for real-world localization and opens the door to future efforts in advancing city-scale visual localization methods that do not require storing visual descriptors.
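The bearing-vector representation itself is simple to illustrate: 2D keypoints become unit direction vectors via the camera intrinsics, and map points expressed in the camera frame are normalized the same way, so both sides of the matching problem live in the same geometric space. The sketch below shows only this representation, not GoMatch's matcher.

```python
# Bearing-vector construction for keypoints and map points; an illustration of the
# representation only.
import numpy as np

def keypoints_to_bearings(kpts, K):
    """kpts: (N, 2) pixel coordinates; K: (3, 3) intrinsics -> (N, 3) unit bearing vectors."""
    homog = np.concatenate([kpts, np.ones((kpts.shape[0], 1))], axis=1)
    rays = (np.linalg.inv(K) @ homog.T).T
    return rays / np.linalg.norm(rays, axis=1, keepdims=True)

def points_to_bearings(points_w, R_cw, t_cw):
    """3D map points (N, 3) expressed in a camera frame as unit bearing vectors."""
    points_c = (R_cw @ points_w.T).T + t_cw
    return points_c / np.linalg.norm(points_c, axis=1, keepdims=True)
```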
A Simulation Benchmark for Vision-based Autonomous Navigation
Authors: Lauri Suomela, Atakan Dag, Harry Edelman, Joni-Kristian Kämäräinen
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
This work introduces a simulator benchmark for vision-based autonomous navigation. The simulator offers control over real world variables such as the environment, time of day, weather and traffic. The benchmark includes a modular integration of different components of a full autonomous visual navigation stack. In the experimental part of the paper, state-of-the-art visual localization methods are evaluated as a part of the stack in realistic navigation tasks. To the authors' best knowledge, the proposed benchmark is the first to study modern visual localization methods as part of a full autonomous visual navigation stack.
MD-SLAM: Multi-cue Direct SLAM
Authors: Luca Di Giammarino, Leonardo Brizi, Tiziano Guadagnino, Cyrill Stachniss, Giorgio Grisetti
Abstract
Simultaneous Localization and Mapping (SLAM) systems are fundamental building blocks for any autonomous robot navigating in unknown environments. The SLAM implementation heavily depends on the sensor modality employed on the mobile platform. For this reason, assumptions on the scene's structure are often made to maximize estimation accuracy. This paper presents a novel direct 3D SLAM pipeline that works independently for RGB-D and LiDAR sensors. Building upon prior work on multi-cue photometric frame-to-frame alignment, our proposed approach provides an easy-to-extend and generic SLAM system. Our pipeline requires only minor adaptations within the projection model to handle different sensor modalities. We couple a position tracking system with an appearance-based relocalization mechanism that handles large loop closures. Loop closures are validated by the same direct registration algorithm used for odometry estimation. We present comparative experiments with state-of-the-art approaches on publicly available benchmarks using RGB-D cameras and 3D LiDARs. Our system performs well in heterogeneous datasets compared to other sensor-specific methods while making no assumptions about the environment. Finally, we release an open-source C++ implementation of our system.
Keyword: SLAM
Unsupervised Simultaneous Learning for Camera Re-Localization and Depth Estimation from Video
MD-SLAM: Multi-cue Direct SLAM
Keyword: Visual inertial
There is no result
Keyword: livox
There is no result
Keyword: loam
There is no result
Keyword: Visual inertial odometry
There is no result
Keyword: lidar
Robust Onboard Localization in Changing Environments Exploiting Text Spotting
AziNorm: Exploiting the Radial Symmetry of Point Cloud for Azimuth-Normalized 3D Perception
MD-SLAM: Multi-cue Direct SLAM
Keyword: loop detection
There is no result
Keyword: autonomous driving
Extending MAPE-K to support Human-Machine Teaming
Keyword: mapping
Methods2Test: A dataset of focal methods mapped to test cases
Intrinsic Bias Identification on Medical Image Datasets
WarpingGAN: Warping Multiple Uniform Priors for Adversarial 3D Point Cloud Generation
Learning Dense Correspondence from Synthetic Environments
MD-SLAM: Multi-cue Direct SLAM
Keyword: localization
Robust Onboard Localization in Changing Environments Exploiting Text Spotting
Functional mimicry of Ruffini receptors with Fiber Bragg Gratings and Deep Neural Networks enables a bio-inspired large-area tactile sensitive skin
Unsupervised Simultaneous Learning for Camera Re-Localization and Depth Estimation from Video
Keypoints Tracking via Transformer Networks
Is Geometry Enough for Matching in Visual Localization?
A Simulation Benchmark for Vision-based Autonomous Navigation
MD-SLAM: Multi-cue Direct SLAM