Keyword: SLAM
Collaborative Robot Mapping using Spectral Graph Analysis
Authors: Lukas Bernreiter, Shehryar Khattak, Lionel Ott, Roland Siegwart, Marco Hutter, Cesar Cadena
Subjects: Robotics (cs.RO); Multiagent Systems (cs.MA)
Abstract
In this paper, we deal with the problem of creating globally consistent pose graphs in a centralized multi-robot SLAM framework. For each robot to act autonomously, individual onboard pose estimates and maps are maintained, which are then communicated to a central server to build an optimized global map. However, inconsistencies between onboard and server estimates can occur due to onboard odometry drift or failure. Furthermore, robots cannot benefit from the collaborative map unless the server provides feedback in a computationally tractable and bandwidth-efficient manner. Motivated by this challenge, this paper proposes a novel collaborative mapping framework to enable accurate global mapping between the robots and the server. In particular, structural differences between robot and server graphs are exploited at different spatial scales using graph spectral analysis to generate the constraints needed by the individual robot pose graphs. The proposed approach is thoroughly analyzed and validated using several real-world multi-robot field deployments, in which we show improvements of up to 90% in the onboard system.
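As an illustration of the spectral machinery the abstract refers to (not the authors' implementation), the following minimal Python sketch compares the Laplacian spectra of a robot's odometry-only pose graph and the server's graph with an added loop closure; the leading eigenvalues act as a coarse-scale summary in which such structural differences show up. Graph sizes and edges are invented for the demo.

    import numpy as np

    def laplacian_spectrum(edges, n):
        """Eigen-decomposition of the graph Laplacian of a pose graph skeleton."""
        A = np.zeros((n, n))
        for i, j in edges:
            A[i, j] = A[j, i] = 1.0
        L = np.diag(A.sum(axis=1)) - A
        return np.linalg.eigh(L)

    # Robot graph: pure odometry chain. Server graph: same chain plus a
    # loop closure found by the server, which changes the global structure.
    n = 6
    robot_edges = [(i, i + 1) for i in range(n - 1)]
    server_edges = robot_edges + [(0, n - 1)]

    ev_r, _ = laplacian_spectrum(robot_edges, n)
    ev_s, _ = laplacian_spectrum(server_edges, n)

    # Low eigenvalues encode coarse (global) structure, high ones fine detail,
    # so comparing leading eigenvalues contrasts the graphs at a coarse scale.
    k = 3
    print("coarse-scale spectral difference:", np.abs(ev_r[:k] - ev_s[:k]))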
Descriptellation: Deep Learned Constellation Descriptors for SLAM
Authors: Chunwei Xing, Xinyu Sun, Andrei Cramariuc, Samuel Gull, Jen Jen Chung, Cesar Cadena, Roland Siegwart, Florian Tschopp
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Current global localization descriptors in Simultaneous Localization and Mapping (SLAM) often fail under severe viewpoint or appearance changes. Adding the topological information of semantic objects to the descriptors ameliorates the problem. However, hand-crafted topological descriptors extract limited information and are not robust to environmental noise, drastic perspective changes, or object occlusions and misdetections. To solve this problem, we formulate a learning-based approach that constructs constellations from semantically meaningful objects and uses Deep Graph Convolution Networks to map the constellation representation to a descriptor. We demonstrate the effectiveness of our Deep Learned Constellation Descriptor (Descriptellation) on the Paris-Rue-Lille and IQmulus datasets. Although Descriptellation is trained on randomly generated simulation datasets, it shows good generalization abilities on real-world datasets. Descriptellation outperforms PointNet and handcrafted constellation descriptors for global localization, and shows robustness against different types of noise.
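The following is a minimal, self-contained sketch of the general idea of a constellation descriptor: detected objects become graph nodes, and a (here randomly initialized, untrained) graph convolution stack pools them into a fixed-size vector. The features, classes, and layer sizes are invented for illustration and are not the paper's architecture.

    import numpy as np

    def gcn_layer(A, H, W):
        """One graph-convolution layer: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W)."""
        A_hat = A + np.eye(A.shape[0])
        d = A_hat.sum(axis=1)
        A_norm = A_hat / np.sqrt(np.outer(d, d))
        return np.maximum(A_norm @ H @ W, 0.0)

    # Constellation: each node is a detected semantic object; features are a
    # class one-hot plus its 3D position (hypothetical 3-class example).
    H = np.array([[1, 0, 0, 2.0, 1.0, 0.0],   # e.g. "tree" at (2, 1, 0)
                  [0, 1, 0, 5.0, 0.5, 0.0],   # e.g. "pole"
                  [0, 0, 1, 4.0, 3.0, 0.0]])  # e.g. "sign"
    A = np.array([[0, 1, 1],                  # edges connect nearby objects
                  [1, 0, 1],
                  [1, 1, 0]], dtype=float)

    rng = np.random.default_rng(0)
    W1, W2 = rng.normal(size=(6, 16)), rng.normal(size=(16, 32))
    descriptor = gcn_layer(A, gcn_layer(A, H, W1), W2).mean(axis=0)  # pool nodes
    print(descriptor.shape)  # (32,) place-recognition descriptor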
Keyword: Visual inertial
There is no result
Keyword: livox
There is no result
Keyword: loam
There is no result
Keyword: Visual inertial odometry
There is no result
Keyword: lidar
Elliptical Slice Sampling for Probabilistic Verification of Stochastic Systems with Signal Temporal Logic Specifications
Authors: Guy Scher, Sadra Sadraddini, Russ Tedrake, Hadas Kress-Gazit
Subjects: Systems and Control (eess.SY); Robotics (cs.RO)
Abstract
Autonomous robots typically incorporate complex sensors in their decision-making and control loops. These sensors, such as cameras and LiDARs, have imperfections in their sensing and are influenced by environmental conditions. In this paper, we present a method for probabilistic verification of linearizable systems with Gaussian and Gaussian mixture noise models (e.g. from perception modules, machine learning components). We compute the probabilities of task satisfaction under Signal Temporal Logic (STL) specifications, using its robustness semantics, with a Markov Chain Monte-Carlo slice sampler. As opposed to other techniques, our method avoids over-approximations and double-counting of failure events. Central to our approach is a method for efficient and rejection-free sampling of signals from a Gaussian distribution such that they satisfy or violate a given STL formula. We show illustrative examples from applications in robot motion planning.
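For readers unfamiliar with the sampler, here is a minimal elliptical slice sampling step specialized to a hard constraint: log-likelihood 0 when the signal satisfies a toy STL-style formula and minus infinity otherwise. The bracket-shrinking loop always terminates at the current feasible state, which is what makes the scheme rejection-free. The covariance and formula below are invented for the demo.

    import numpy as np

    def ess_step(f, chol, constraint, rng):
        """One elliptical slice sampling step targeting N(0, Sigma) restricted
        to {f : constraint(f)}, here signals satisfying an STL-style formula."""
        nu = chol @ rng.standard_normal(f.shape)    # auxiliary draw on the ellipse
        theta = rng.uniform(0.0, 2.0 * np.pi)
        lo, hi = theta - 2.0 * np.pi, theta
        while True:
            prop = f * np.cos(theta) + nu * np.sin(theta)
            if constraint(prop):
                return prop                         # accepted, never rejected
            if theta < 0.0:                         # shrink bracket toward 0,
                lo = theta                          # which recovers the current
            else:                                   # feasible state f
                hi = theta
            theta = rng.uniform(lo, hi)

    # Toy "STL" constraint on a discretized signal: always stay below 1.0,
    # i.e. G[0,T] (x < 1). The covariance is a simple exponential kernel.
    rng = np.random.default_rng(1)
    Sigma = np.exp(-0.1 * np.abs(np.subtract.outer(range(20), range(20))))
    chol = np.linalg.cholesky(Sigma)
    f = np.zeros(20)                                # feasible starting signal
    samples = []
    for _ in range(200):
        f = ess_step(f, chol, lambda x: x.max() < 1.0, rng)
        samples.append(f.copy())
    print(len(samples), samples[-1].max())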
Deep Camera Pose Regression Using Pseudo-LiDAR
Authors: Ali Raza, Lazar Lolic, Shahmir Akhter, Alfonso Dela Cruz, Michael Liut
Abstract
An accurate and robust large-scale localization system is an integral component of active areas of research such as autonomous vehicles and augmented reality. To this end, many learning algorithms have been proposed that predict 6DOF camera pose from RGB or RGB-D images. However, previous methods that incorporate depth typically treat the data the same way as RGB images, often adding depth maps as additional channels to RGB images and passing them through convolutional neural networks (CNNs). In this paper, we show that converting depth maps into pseudo-LiDAR signals, a representation previously shown to be useful for 3D object detection, is better suited to camera localization tasks, since the projected point clouds can accurately determine the 6DOF camera pose. This is demonstrated by first comparing the localization accuracies of a network operating exclusively on pseudo-LiDAR representations with networks operating exclusively on depth maps. We then propose FusionLoc, a novel architecture that uses pseudo-LiDAR to regress a 6DOF camera pose. FusionLoc is a dual-stream neural network that aims to remedy common issues with typical 2D CNNs operating on RGB-D images. The results from this architecture are compared against various other state-of-the-art deep pose regression implementations using the 7 Scenes dataset. The findings are that FusionLoc performs better than a number of other camera localization methods, being, on average, 0.33 m and 4.35° more accurate than RGB-D PoseNet. Establishing the validity of using pseudo-LiDAR signals instead of depth maps for localization raises new considerations for implementing large-scale localization systems.
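The pseudo-LiDAR conversion itself is a standard pinhole back-projection of the depth map into a 3D point cloud. A minimal version, with made-up intrinsics rather than the paper's setup, looks like this:

    import numpy as np

    def depth_to_pseudo_lidar(depth, fx, fy, cx, cy):
        """Back-project a depth map into a 3D point cloud (camera frame)."""
        v, u = np.indices(depth.shape)           # pixel grid
        z = depth
        x = (u - cx) * z / fx                    # inverse pinhole model
        y = (v - cy) * z / fy
        pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
        return pts[pts[:, 2] > 0]                # drop non-positive depths

    # Hypothetical 480x640 depth map and intrinsics.
    depth = np.random.default_rng(0).uniform(0.5, 10.0, size=(480, 640))
    cloud = depth_to_pseudo_lidar(depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5)
    print(cloud.shape)  # (N, 3) pseudo-LiDAR signal fed to a point-cloud network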
Spatiotemporal Transformer Attention Network for 3D Voxel Level Joint Segmentation and Motion Prediction in Point Cloud
Abstract
Environment perception, including detection, classification, tracking, and motion prediction, is a key enabler for automated driving systems and intelligent transportation applications. Fueled by advances in sensing technologies and machine learning techniques, LiDAR-based sensing systems have become a promising solution. The current challenges of this solution are how to effectively combine different perception tasks into a single backbone and how to efficiently learn spatiotemporal features directly from point cloud sequences. In this research, we propose a novel spatiotemporal attention network based on a transformer self-attention mechanism for joint semantic segmentation and motion prediction within a point cloud at the voxel level. The network is trained to simultaneously output the voxel-level class and predicted motion by learning directly from a sequence of point clouds. The proposed backbone includes both a temporal attention module (TAM) and a spatial attention module (SAM) to learn and extract complex spatiotemporal features. This approach has been evaluated on the nuScenes dataset, and promising performance has been achieved.
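As a reference for how such a TAM/SAM-style module operates at its core, here is bare scaled dot-product self-attention over per-voxel feature sequences (dimensions invented; the paper's modules add learned projections, masking, and task heads):

    import torch

    def self_attention(x, wq, wk, wv):
        """Scaled dot-product self-attention, the core of TAM/SAM-style modules.
        x: (N, L, C) with N voxels and L time steps (temporal) or neighbours
        (spatial)."""
        q, k, v = x @ wq, x @ wk, x @ wv
        attn = torch.softmax(q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5, dim=-1)
        return attn @ v                      # (N, L, C): mixed features

    # Hypothetical 5-frame sequence of features for 1000 occupied voxels.
    C = 64
    x = torch.randn(1000, 5, C)
    wq, wk, wv = (torch.randn(C, C) * C ** -0.5 for _ in range(3))
    out = self_attention(x, wq, wk, wv)
    print(out.shape)  # torch.Size([1000, 5, 64])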
Understanding the Challenges When 3D Semantic Segmentation Faces Class Imbalanced and OOD Data
Authors: Yancheng Pan, Fan Xie, Huijing Zhao
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
3D semantic segmentation (3DSS) is an essential process in the creation of a safe autonomous driving system. However, deep learning models for 3D semantic segmentation often suffer from the class imbalance problem and out-of-distribution (OOD) data. In this study, we explore how the class imbalance problem affects 3DSS performance and whether the model can detect the correctness of a category prediction, or whether data is ID (in-distribution) or OOD. For these purposes, we conduct two experiments using three representative 3DSS models and five trust scoring methods, and perform both a confusion and a feature analysis of each class. Furthermore, a data augmentation method for the 3D LiDAR dataset is proposed to create a new dataset based on SemanticKITTI and SemanticPOSS, called AugKITTI. We propose the wPre metric and TSD for a more in-depth analysis of the results, and follow our proposals with an insightful discussion. Based on the experimental results, we find that: (1) the classes are imbalanced not only in their data size but also in the basic properties of each semantic category; (2) intraclass diversity and interclass ambiguity make class learning difficult and greatly limit the models' performance, creating the challenges of semantic and data gaps; and (3) the trust scores are unreliable for classes whose features are confused with those of other classes. For 3DSS models, misclassified ID classes and OODs may also be given high trust scores, making the 3DSS predictions unreliable and leading to challenges in judging the trustworthiness of 3DSS results. All of these outcomes point to several research directions for improving the performance and reliability of 3DSS models in real-world applications.
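The abstract does not name the five trust scoring methods; as a representative example, the maximum softmax probability score, a common baseline for exactly this kind of correctness/OOD triage, can be sketched as follows:

    import numpy as np

    def msp_trust(logits):
        """Maximum softmax probability: one standard trust score for deciding
        whether a per-point prediction (here, a 3DSS point) can be believed."""
        z = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
        p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
        return p.max(axis=-1)

    logits = np.array([[4.0, 0.5, 0.2],    # confident prediction -> high trust
                       [1.1, 1.0, 0.9]])   # confused classes -> low trust
    scores = msp_trust(logits)
    reject = scores < 0.5                  # flag as unreliable / possibly OOD
    print(scores.round(3), reject)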
FP-Loc: Lightweight and Drift-free Floor Plan-assisted LiDAR Localization
Authors: Ling Gao, Laurent Kneip
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Abstract
We present a novel framework for floor plan-based, full six degree-of-freedom LiDAR localization. Our approach relies on robust ceiling and ground plane detection, which solves part of the pose and supports the segmentation of vertical structure elements such as walls and pillars. Our core contribution is a novel nearest neighbour data structure for an efficient look-up of nearest vertical structure elements from the floor plan. The registration is realized as a pair-wise regularized windowed pose graph optimization. Highly efficient, accurate and drift-free long-term localization is demonstrated on multiple scenes.
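A throwaway sketch of the kind of look-up the paper accelerates with its specialised data structure, here approximated with an off-the-shelf k-d tree over densely sampled floor-plan walls (geometry invented for the demo):

    import numpy as np
    from scipy.spatial import cKDTree

    # Floor plan as 2D wall segments; densely sample them into reference points.
    walls = [((0, 0), (10, 0)), ((10, 0), (10, 8)), ((10, 8), (0, 8))]
    ref = np.vstack([np.linspace(a, b, 200) for a, b in walls])
    tree = cKDTree(ref)   # stand-in for the paper's specialised NN structure

    # LiDAR points from vertical structures, projected to 2D once the
    # ceiling/ground planes have fixed height, roll, and pitch.
    scan = np.array([[2.1, 0.05], [9.9, 4.0], [5.0, 7.9]])
    dist, idx = tree.query(scan)          # nearest wall point per scan point
    residuals = scan - ref[idx]           # drive the pose-graph optimization
    print(dist)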
Keyword: loop detection
There is no result
Keyword: autonomous driving
Semi-supervised Deep Learning for Image Classification with Distribution Mismatch: A Survey
Authors: Saul Calderon-Ramirez, Shengxiang Yang, David Elizondo
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Deep learning methodologies have been employed in several different fields, with outstanding success in image recognition applications such as material quality control, medical imaging, autonomous driving, etc. Deep learning models rely on an abundance of labelled observations to train a prospective model. These models are composed of millions of parameters to estimate, increasing the need for more training observations. Gathering labelled observations is frequently expensive, making deep learning models less than ideal, as they might over-fit the data. In a semi-supervised setting, unlabelled data is used to improve the accuracy and generalization of a model with small labelled datasets. Nevertheless, in many situations different unlabelled data sources might be available. This raises the risk of a significant distribution mismatch between the labelled and unlabelled datasets. Such a phenomenon can cause a considerable performance hit to typical semi-supervised deep learning frameworks, which often assume that both labelled and unlabelled datasets are drawn from similar distributions. Therefore, in this paper we survey the latest approaches to semi-supervised deep learning for image recognition, with an emphasis on models designed to deal with a distribution mismatch between the labelled and unlabelled datasets. We address open challenges with the aim of encouraging the community to tackle them, and of overcoming the high data demand of traditional deep learning pipelines under real-world usage settings.
Understanding the Challenges When 3D Semantic Segmentation Faces Class Imbalanced and OOD Data
Authors: Yancheng Pan, Fan Xie, Huijing Zhao
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
3D semantic segmentation (3DSS) is an essential process in the creation of a safe autonomous driving system. However, deep learning models for 3D semantic segmentation often suffer from the class imbalance problem and out-of-distribution (OOD) data. In this study, we explore how the class imbalance problem affects 3DSS performance and whether the model can detect the correctness of a category prediction, or whether data is ID (in-distribution) or OOD. For these purposes, we conduct two experiments using three representative 3DSS models and five trust scoring methods, and perform both a confusion and a feature analysis of each class. Furthermore, a data augmentation method for the 3D LiDAR dataset is proposed to create a new dataset based on SemanticKITTI and SemanticPOSS, called AugKITTI. We propose the wPre metric and TSD for a more in-depth analysis of the results, and follow our proposals with an insightful discussion. Based on the experimental results, we find that: (1) the classes are imbalanced not only in their data size but also in the basic properties of each semantic category; (2) intraclass diversity and interclass ambiguity make class learning difficult and greatly limit the models' performance, creating the challenges of semantic and data gaps; and (3) the trust scores are unreliable for classes whose features are confused with those of other classes. For 3DSS models, misclassified ID classes and OODs may also be given high trust scores, making the 3DSS predictions unreliable and leading to challenges in judging the trustworthiness of 3DSS results. All of these outcomes point to several research directions for improving the performance and reliability of 3DSS models in real-world applications.
Adversarial samples for deep monocular 6D object pose estimation
Abstract
Estimating an object's 6D pose from an RGB image is important for many real-world applications, such as autonomous driving and robotic grasping, where robustness of the estimation is crucial. In this work, for the first time, we study adversarial samples that can fool state-of-the-art (SOTA) deep learning based 6D pose estimation models. In particular, we propose a Unified 6D pose estimation Attack, namely U6DA, which can successfully attack all three main categories of models for 6D pose estimation. The key idea of our U6DA is to fool the models into predicting wrong results for the object shapes that are essential for correct 6D pose estimation. Specifically, we explore a transfer-based black-box attack on 6D pose estimation. Adversarial samples are crafted by shifting the segmentation attention map away from its original position. We show that such adversarial samples are not only effective against direct 6D pose estimation models, but are also able to attack two-stage models regardless of their robust RANSAC modules. Extensive experiments on large-scale public benchmarks demonstrate the effectiveness of our U6DA. We also introduce a new U6DA-Linemod dataset for robustness studies of the 6D pose estimation task. Our code and dataset will be available at \url{https://github.com/cuge1995/U6DA}.
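U6DA's attention-shifting attack is not reproduced here; as a simplified stand-in, the following FGSM-style perturbation against a toy segmentation head (PyTorch, all shapes invented) illustrates the general mechanism of crafting adversarial samples that corrupt the shape cues pose estimators depend on:

    import torch

    def fgsm_attack(model, image, target_mask, eps=4.0 / 255):
        """Generic FGSM-style adversarial example against a segmentation head,
        a simplified stand-in for the attention-shifting idea in U6DA."""
        image = image.clone().requires_grad_(True)
        loss = torch.nn.functional.cross_entropy(model(image), target_mask)
        loss.backward()
        # One signed-gradient step; increasing the loss degrades the object
        # shape cues that downstream 6D pose estimation relies on.
        return (image + eps * image.grad.sign()).clamp(0, 1).detach()

    # Toy "segmentation model" so the demo runs end to end.
    model = torch.nn.Conv2d(3, 2, kernel_size=1)
    img = torch.rand(1, 3, 64, 64)
    mask = torch.zeros(1, 64, 64, dtype=torch.long)
    adv = fgsm_attack(model, img, mask)
    print((adv - img).abs().max())  # perturbation stays within the budget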
Keyword: mapping
Towards Targeted Change Detection with Heterogeneous Remote Sensing Images for Forest Mortality Mapping
Authors: Jørgen A. Agersborg, Luigi T. Luppino, Stian Normann Anfinsen, Jane Uhd Jepsen
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
In this paper we develop a method for mapping forest mortality in the forest-tundra ecotone using satellite data from heterogeneous sensors. We use medium-resolution imagery in order to capture the complex pattern of forest mortality in this sparsely forested area, which has been induced by an outbreak of geometrid moths. Specifically, Landsat-5 Thematic Mapper images from before the event are used, with RADARSAT-2 providing the post-event images. We obtain the difference images for both the multispectral optical and synthetic aperture radar (SAR) domains by using a recently developed deep learning method for translating between the two. These differences are stacked with the original pre- and post-event images in order to let our algorithm also learn how the areas appear before and after the change event. By doing this, and by focusing on learning only the changes of interest with one-class classification (OCC), we obtain good results with very little training data.
ERF: Explicit Radiance Field Reconstruction From Scratch
Authors: Samir Aroudj, Steven Lovegrove, Eddy Ilg, Tanner Schmidt, Michael Goesele, Richard Newcombe
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Abstract
We propose a novel explicit dense 3D reconstruction approach that processes a set of images of a scene, together with sensor poses and calibrations, and estimates a photo-real digital model. One of the key innovations is that the underlying volumetric representation is completely explicit, in contrast to neural network-based (implicit) alternatives. We encode scenes explicitly using clear and understandable mappings of optimization variables to scene geometry and their outgoing surface radiance. We represent them using hierarchical volumetric fields stored in a sparse voxel octree. Robustly reconstructing such a volumetric scene model with millions of unknown variables from registered scene images alone is a highly non-convex and complex optimization problem. To this end, we employ stochastic gradient descent (Adam) steered by an inverse differentiable renderer. We demonstrate that our method can reconstruct models of high quality that are comparable to state-of-the-art implicit methods. Importantly, we do not use a sequential reconstruction pipeline, where individual steps suffer from incomplete or unreliable information from previous stages; instead, we start our optimizations from uninformed initial solutions, with scene geometry and radiance that are far off from the ground truth. We show that our method is general and practical. It does not require a highly controlled lab setup for capturing, but allows for reconstructing scenes with a vast variety of objects, including challenging ones such as outdoor plants or furry toys. Finally, our reconstructed scene models are versatile thanks to their explicit design. They can be edited interactively, which is computationally too costly for implicit alternatives.
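To make "explicit optimization steered by a differentiable renderer" concrete, here is a toy single-ray version: per-sample densities and colors are fitted to one observed pixel with Adam through the standard emission-absorption rendering model. Everything here (sample count, target, spacing) is invented; the paper uses a sparse voxel octree over full images.

    import torch

    # Explicit scene parameters: densities and colors for S samples along a ray.
    S = 32
    sigma = torch.rand(S, requires_grad=True)   # volume density (optimized)
    rgb = torch.rand(S, 3, requires_grad=True)  # outgoing radiance (optimized)
    target = torch.tensor([0.2, 0.6, 0.9])      # observed pixel color
    delta = 0.1                                 # sample spacing along the ray

    opt = torch.optim.Adam([sigma, rgb], lr=1e-2)
    for _ in range(500):
        opt.zero_grad()
        alpha = 1.0 - torch.exp(-torch.relu(sigma) * delta)
        trans = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alpha[:-1]]), dim=0)
        weights = trans * alpha                 # emission-absorption model
        pixel = (weights[:, None] * torch.sigmoid(rgb)).sum(dim=0)
        loss = ((pixel - target) ** 2).sum()    # photometric loss
        loss.backward()
        opt.step()
    print(pixel.detach(), loss.item())          # pixel converges to the target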
Collaborative Robot Mapping using Spectral Graph Analysis
Authors: Lukas Bernreiter, Shehryar Khattak, Lionel Ott, Roland Siegwart, Marco Hutter, Cesar Cadena
Subjects: Robotics (cs.RO); Multiagent Systems (cs.MA)
Abstract
In this paper, we deal with the problem of creating globally consistent pose graphs in a centralized multi-robot SLAM framework. For each robot to act autonomously, individual onboard pose estimates and maps are maintained, which are then communicated to a central server to build an optimized global map. However, inconsistencies between onboard and server estimates can occur due to onboard odometry drift or failure. Furthermore, robots cannot benefit from the collaborative map unless the server provides feedback in a computationally tractable and bandwidth-efficient manner. Motivated by this challenge, this paper proposes a novel collaborative mapping framework to enable accurate global mapping between the robots and the server. In particular, structural differences between robot and server graphs are exploited at different spatial scales using graph spectral analysis to generate the constraints needed by the individual robot pose graphs. The proposed approach is thoroughly analyzed and validated using several real-world multi-robot field deployments, in which we show improvements of up to 90% in the onboard system.
MIRRAX: A Reconfigurable Robot for Limited Access Environments
Authors: Wei Cheah, Keir Groves, Horatio Martin, Harriet Peel, Simon Watson, Ognjen Marjanovic, Barry Lennox
Abstract
The development of mobile robot platforms for inspection has gained traction in recent years with the rapid advancement in hardware and software. However, conventional mobile robots are unable to address the challenge of operating in extreme environments, where the robot is required to traverse narrow gaps in highly cluttered areas with restricted access. This paper presents MIRRAX, a robot designed to meet these challenges, with the capability of reconfiguring itself both to access restricted environments through narrow ports and to navigate through tightly spaced obstacles. Controllers for the robot are detailed, along with an analysis of the robot's controllability given the use of Mecanum wheels in a variable configuration. Characterisation of the robot's performance identified suitable configurations for operating in narrow environments. The minimum lateral footprint width achievable for a stable configuration (<2° roll) was 0.19 m. Experimental validation of the robot's controllability shows good agreement with the theoretical analysis. A further series of experiments demonstrates the feasibility of the robot in addressing the challenges above: the capability to reconfigure itself for restricted entry through ports as small as 150 mm in diameter, and to navigate through cluttered environments. The paper also presents results from a deployment in a Magnox facility at the Sellafield nuclear site in the UK, the first robot ever to do so, for remote inspection and mapping.
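For context on the controllability analysis, the inverse kinematics of a conventional fixed rectangular Mecanum base is sketched below; MIRRAX analyses the variable-configuration generalization of this Jacobian, which this fixed-geometry sketch does not capture. Wheel radius and offsets are invented.

    import numpy as np

    def mecanum_wheel_speeds(vx, vy, wz, r=0.05, lx=0.2, ly=0.15):
        """Inverse kinematics of a standard fixed rectangular Mecanum base:
        body twist (vx, vy, wz) -> wheel angular velocities [FL, FR, RL, RR].
        r: wheel radius; lx, ly: half wheelbase and half track width."""
        k = lx + ly
        J = np.array([[1, -1, -k],
                      [1,  1,  k],
                      [1,  1, -k],
                      [1, -1,  k]]) / r
        return J @ np.array([vx, vy, wz])

    print(mecanum_wheel_speeds(0.0, 0.3, 0.0))  # pure lateral motion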
Two Classes of Power Mappings with Boomerang Uniformity 2
Abstract
Let $q$ be an odd prime power. Let $F_1(x)=x^{d_1}$ and $F_2(x)=x^{d_2}$ be power mappings over $\mathrm{GF}(q^2)$, where $d_1=q-1$ and $d_2=d_1+\frac{q^2-1}{2}=\frac{(q-1)(q+3)}{2}$. In this paper, we study the boomerang uniformity of $F_1$ and $F_2$ via their differential properties. It is shown that the boomerang uniformity of $F_i$ ($i=1,2$) is 2 under certain conditions on $q$.
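For reference, the boomerang uniformity in question is the maximum nontrivial entry of the Boomerang Connectivity Table; in the formulation that also applies to non-permutations (which, to our understanding, is the relevant one here, since $x^{q-1}$ does not permute $\mathrm{GF}(q^2)$):

    % Boomerang Connectivity Table entry for a map F over GF(q^2),
    % counted over pairs (x, y), and the resulting boomerang uniformity.
    \[
      \mathcal{B}_F(a,b) \;=\; \#\bigl\{ (x,y) \in \mathrm{GF}(q^2)^2 \;:\;
        F(x) - F(y) = b,\;\; F(x+a) - F(y+a) = b \bigr\},
    \]
    \[
      \beta_F \;=\; \max_{a \neq 0,\; b \neq 0} \mathcal{B}_F(a,b).
    \]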
E-LMC: Extended Linear Model of Coregionalization for Predictions of Spatial Fields
Abstract
Physical simulations based on partial differential equations typically generate spatial field results, which are used to calculate specific properties of a system for engineering design and optimization. Due to the intensive computational burden of the simulations, a surrogate model mapping the low-dimensional inputs to the spatial fields is commonly built from a relatively small dataset. To resolve the challenge of predicting the whole spatial field, the popular linear model of coregionalization (LMC) can disentangle complicated correlations within the high-dimensional spatial field outputs and deliver accurate predictions. However, LMC fails if the spatial field cannot be well approximated by a linear combination of base functions with latent processes. In this paper, we extend LMC by introducing an invertible neural network to linearize the highly complex and nonlinear spatial fields, such that LMC easily generalizes to nonlinear problems while preserving its traceability and scalability. Several real-world applications demonstrate that E-LMC can exploit spatial correlations effectively, showing a maximum improvement of about 40% over the original LMC and outperforming other state-of-the-art spatial field models.
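In symbols, the baseline LMC and the extension described in the abstract can be written as follows (notation ours, not necessarily the paper's):

    % LMC: each output dimension of the spatial field f(x) in R^D is a
    % linear mix of Q shared latent processes u_q.
    \[
      f_d(x) \;=\; \sum_{q=1}^{Q} a_{d,q}\, u_q(x),
      \qquad u_q \sim \mathcal{GP}(0, k_q),
    \]
    % E-LMC (sketched): an invertible network \Phi linearizes the field,
    % so the linear mixing acts in the latent space instead:
    \[
      f(x) \;=\; \Phi^{-1}\bigl(A\, u(x)\bigr).
    \]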
On genetic programming representations and fitness functions for interpretable dimensionality reduction
Authors: Thomas Uriot, Marco Virgolin, Tanja Alderliesten, Peter Bosman
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)
Abstract
Dimensionality reduction (DR) is an important technique for data exploration and knowledge discovery. However, most of the main DR methods are either linear (e.g., PCA), do not provide an explicit mapping between the original data and its lower-dimensional representation (e.g., MDS, t-SNE, isomap), or produce mappings that cannot be easily interpreted (e.g., kernel PCA, neural-based autoencoders). Recently, genetic programming (GP) has been used to evolve interpretable DR mappings in the form of symbolic expressions. There are a number of ways in which GP can be used to this end, and no study exists that compares them. In this paper, we fill this gap by comparing existing GP methods as well as devising new ones. We evaluate our methods on several benchmark datasets based on predictive accuracy and on how well the original features can be reconstructed using the lower-dimensional representation alone. Finally, we qualitatively assess the resulting expressions and their complexity. We find that various GP methods can be competitive with state-of-the-art DR algorithms and that they have the potential to produce interpretable DR mappings.
DynamicRetriever: A Pre-training Model-based IR System with Neither Sparse nor Dense Index
Abstract
Web search provides a promising way for people to obtain information and has been extensively studied. With the surge of deep learning and large-scale pre-training techniques, various neural information retrieval models have been proposed, and they have demonstrated their power for improving search (especially ranking) quality. All these existing search methods follow a common paradigm, i.e. index-retrieve-rerank: they first build an index of all documents based on document terms (a sparse inverted index) or representation vectors (a dense vector index), then retrieve and rerank the retrieved documents based on the similarity between the query and the documents via ranking models. In this paper, we explore a new paradigm of information retrieval with neither a sparse nor a dense index, but only a model. Specifically, we propose a pre-training model-based IR system called DynamicRetriever. In this system, the training stage embeds the token-level and document-level information (especially document identifiers) of the corpus into the model parameters; the inference stage then directly generates document identifiers for a given query. Compared with existing search methods, the model-based IR system has two advantages: i) it parameterizes the traditional static index with a pre-training model, which turns the document semantic mapping into a dynamic and updatable process; and ii) with separate document identifiers, it captures both the term-level and document-level information of each document. Extensive experiments conducted on the public search benchmark MS MARCO verify the effectiveness and potential of our proposed new paradigm for information retrieval.
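A deliberately stripped-down illustration of the "model as index" idea, with retrieval reduced to scoring document identifiers directly from a query embedding; the actual system uses a pre-trained language model and learned identifier generation, and all sizes here are invented:

    import torch

    # Toy "model-as-index": a network maps a query embedding straight to a
    # document identifier, so no inverted or vector index is kept at all.
    num_docs, dim = 10_000, 128
    model = torch.nn.Sequential(torch.nn.Linear(dim, 256), torch.nn.ReLU(),
                                torch.nn.Linear(256, num_docs))

    query_emb = torch.randn(1, dim)      # encoded query (hypothetical)
    scores = model(query_emb)            # one logit per document identifier
    topk = scores.topk(10).indices       # "retrieval" = docid generation
    print(topk)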
Descriptellation: Deep Learned Constellation Descriptors for SLAM
Authors: Chunwei Xing, Xinyu Sun, Andrei Cramariuc, Samuel Gull, Jen Jen Chung, Cesar Cadena, Roland Siegwart, Florian Tschopp
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Current global localization descriptors in Simultaneous Localization and Mapping (SLAM) often fail under severe viewpoint or appearance changes. Adding the topological information of semantic objects to the descriptors ameliorates the problem. However, hand-crafted topological descriptors extract limited information and are not robust to environmental noise, drastic perspective changes, or object occlusions and misdetections. To solve this problem, we formulate a learning-based approach that constructs constellations from semantically meaningful objects and uses Deep Graph Convolution Networks to map the constellation representation to a descriptor. We demonstrate the effectiveness of our Deep Learned Constellation Descriptor (Descriptellation) on the Paris-Rue-Lille and IQmulus datasets. Although Descriptellation is trained on randomly generated simulation datasets, it shows good generalization abilities on real-world datasets. Descriptellation outperforms PointNet and handcrafted constellation descriptors for global localization, and shows robustness against different types of noise.
CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding
Abstract
Manual annotation of large-scale point cloud datasets for varying tasks such as 3D object classification, segmentation and detection is often laborious owing to the irregular structure of point clouds. Self-supervised learning, which operates without any human labeling, is a promising approach to address this issue. We observe in the real world that humans are capable of mapping the visual concepts learnt from 2D images to understand the 3D world. Encouraged by this insight, we propose CrossPoint, a simple cross-modal contrastive learning approach to learn transferable 3D point cloud representations. It establishes a 3D-2D correspondence of objects by maximizing agreement between point clouds and their corresponding rendered 2D images in the invariant space, while encouraging invariance to transformations in the point cloud modality. Our joint training objective combines the feature correspondences within and across modalities, thus assembling a rich learning signal from both the 3D point cloud and 2D image modalities in a self-supervised fashion. Experimental results show that our approach outperforms previous unsupervised learning methods on a diverse range of downstream tasks, including 3D object classification and segmentation. Further, ablation studies validate the potency of our approach for better point cloud understanding. Code and pretrained models are available at this http URL
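The cross-modal agreement objective is in the family of the standard InfoNCE/NT-Xent loss; a minimal batch version between point-cloud and image embeddings (dimensions invented, not the paper's exact loss) is:

    import torch
    import torch.nn.functional as F

    def cross_modal_nt_xent(z_pc, z_img, tau=0.07):
        """Contrastive loss pulling each point-cloud embedding toward its own
        rendered-image embedding and away from other samples in the batch."""
        z_pc, z_img = F.normalize(z_pc, dim=1), F.normalize(z_img, dim=1)
        logits = z_pc @ z_img.t() / tau          # (B, B) similarity matrix
        labels = torch.arange(z_pc.shape[0])     # positives on the diagonal
        return 0.5 * (F.cross_entropy(logits, labels) +
                      F.cross_entropy(logits.t(), labels))

    loss = cross_modal_nt_xent(torch.randn(32, 256), torch.randn(32, 256))
    print(loss.item())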
Keyword: localization
Deep Camera Pose Regression Using Pseudo-LiDAR
Authors: Ali Raza, Lazar Lolic, Shahmir Akhter, Alfonso Dela Cruz, Michael Liut
Abstract
An accurate and robust large-scale localization system is an integral component of active areas of research such as autonomous vehicles and augmented reality. To this end, many learning algorithms have been proposed that predict 6DOF camera pose from RGB or RGB-D images. However, previous methods that incorporate depth typically treat the data the same way as RGB images, often adding depth maps as additional channels to RGB images and passing them through convolutional neural networks (CNNs). In this paper, we show that converting depth maps into pseudo-LiDAR signals, a representation previously shown to be useful for 3D object detection, is better suited to camera localization tasks, since the projected point clouds can accurately determine the 6DOF camera pose. This is demonstrated by first comparing the localization accuracies of a network operating exclusively on pseudo-LiDAR representations with networks operating exclusively on depth maps. We then propose FusionLoc, a novel architecture that uses pseudo-LiDAR to regress a 6DOF camera pose. FusionLoc is a dual-stream neural network that aims to remedy common issues with typical 2D CNNs operating on RGB-D images. The results from this architecture are compared against various other state-of-the-art deep pose regression implementations using the 7 Scenes dataset. The findings are that FusionLoc performs better than a number of other camera localization methods, being, on average, 0.33 m and 4.35° more accurate than RGB-D PoseNet. Establishing the validity of using pseudo-LiDAR signals instead of depth maps for localization raises new considerations for implementing large-scale localization systems.
FP-Loc: Lightweight and Drift-free Floor Plan-assisted LiDAR Localization
Authors: Ling Gao, Laurent Kneip
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Abstract
We present a novel framework for floor plan-based, full six degree-of-freedom LiDAR localization. Our approach relies on robust ceiling and ground plane detection, which solves part of the pose and supports the segmentation of vertical structure elements such as walls and pillars. Our core contribution is a novel nearest neighbour data structure for an efficient look-up of nearest vertical structure elements from the floor plan. The registration is realized as a pair-wise regularized windowed pose graph optimization. Highly efficient, accurate and drift-free long-term localization is demonstrated on multiple scenes.
Indoor Localization for Quadrotors using Invisible Projected Tags
Abstract
Augmented reality (AR) technology has been introduced into the robotics field to narrow the visual gap between indoor and outdoor environments. However, without signals from satellite navigation systems, flight experiments in these indoor AR scenarios need other accurate localization approaches. This work proposes a real-time centimeter-level indoor localization method based on psycho-visually invisible projected tags (IPT), requiring a projector as the sender and quadrotors with high-speed cameras as the receiver. The method includes a modulation process for the sender, as well as demodulation and pose estimation steps for the receiver, where screen-camera communication technology is applied to hide fiducial tags by exploiting properties of human vision. Experiments have demonstrated that IPT can achieve accuracy within ten centimeters and a speed of about ten FPS. Compared with other localization methods for AR robotics platforms, IPT is affordable, requiring only a projector and high-speed cameras as hardware, and convenient, as it omits a coordinate alignment step. To the authors' best knowledge, this is the first time screen-camera communication has been utilized for AR robot localization.
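The pose estimation step, recovering the camera pose from decoded tag corners, reduces to a standard PnP solve; a sketch with invented corner coordinates, tag size, and intrinsics:

    import numpy as np
    import cv2

    # Once a projected tag has been demodulated, the quadrotor pose follows
    # from the tag's known corner geometry via PnP (hypothetical values).
    tag = 0.20  # tag side length in metres
    obj = np.array([[0, 0, 0], [tag, 0, 0], [tag, tag, 0], [0, tag, 0]],
                   dtype=np.float64)                  # corners in tag frame
    img = np.array([[312.4, 248.1], [401.7, 251.3],
                    [398.2, 339.0], [309.9, 335.6]], dtype=np.float64)
    K = np.array([[900.0, 0, 320.0], [0, 900.0, 240.0], [0, 0, 1]])

    ok, rvec, tvec = cv2.solvePnP(obj, img, K, None)
    print(ok, tvec.ravel())  # camera position relative to the projected tag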
Descriptellation: Deep Learned Constellation Descriptors for SLAM
Authors: Chunwei Xing, Xinyu Sun, Andrei Cramariuc, Samuel Gull, Jen Jen Chung, Cesar Cadena, Roland Siegwart, Florian Tschopp
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Current global localization descriptors in Simultaneous Localization and Mapping (SLAM) often fail under severe viewpoint or appearance changes. Adding the topological information of semantic objects to the descriptors ameliorates the problem. However, hand-crafted topological descriptors extract limited information and are not robust to environmental noise, drastic perspective changes, or object occlusions and misdetections. To solve this problem, we formulate a learning-based approach that constructs constellations from semantically meaningful objects and uses Deep Graph Convolution Networks to map the constellation representation to a descriptor. We demonstrate the effectiveness of our Deep Learned Constellation Descriptor (Descriptellation) on the Paris-Rue-Lille and IQmulus datasets. Although Descriptellation is trained on randomly generated simulation datasets, it shows good generalization abilities on real-world datasets. Descriptellation outperforms PointNet and handcrafted constellation descriptors for global localization, and shows robustness against different types of noise.
Finite difference method for stochastic Cahn--Hilliard equation: Strong convergence rate and density convergence
Abstract
This paper presents the strong convergence rate and density convergence of a spatial finite difference method (FDM) applied to numerically solve the stochastic Cahn--Hilliard equation driven by multiplicative space-time white noise. The main difficulty lies in controlling the drift coefficient, which is neither globally Lipschitz nor one-sided Lipschitz. To handle this difficulty, we first utilize an interpolation approach to derive the discrete $H^1$-regularity of the numerical solution. This is the key to deriving the optimal strong convergence order $1$ of the numerical solution. Further, we propose a novel localization argument to estimate the total variation distance between the exact and numerical solutions, which, along with the existence of the density of the numerical solution, finally yields the convergence in $L^1(\mathbb{R})$ of the density of the numerical solution. This gives a partial positive answer to the open problem raised in [9, Section 5] on computing the density of the exact solution numerically.
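For reference, a standard form of the equation being discretized, with the cubic drift responsible for the non-Lipschitz difficulty (the paper's exact formulation may differ in constants and boundary conditions):

    % Stochastic Cahn--Hilliard equation driven by space-time white noise
    % dW, with the double-well drift f(u) = u^3 - u:
    \[
      \mathrm{d}u \;=\; -\Delta\bigl(\Delta u - f(u)\bigr)\,\mathrm{d}t
      \;+\; \mathrm{d}W(t), \qquad f(u) = u^3 - u .
    \]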
Algorithm Design and Integration for a Robotic Apple Harvesting System
Abstract
Due to the labor shortage and rising labor costs in the apple industry, there is an urgent need for the development of robotic systems to harvest apples efficiently and autonomously. In this paper, we present a system overview and algorithm design of our recently developed robotic apple harvester prototype. Our robotic system is enabled by the close integration of several core modules, including calibration, visual perception, planning, and control. This paper covers the main methods and advancements in robust extrinsic parameter calibration, deep learning-based multi-view fruit detection and localization, unified picking and dropping planning, and dexterous manipulation control. Indoor and field experiments were conducted to evaluate the performance of the developed system, which achieved an average picking time of 3.6 seconds per apple. This is a significant improvement over other reported apple harvesting robots, whose picking times range from 7 to 10 seconds per apple. The current prototype shows promising performance towards the further development of efficient and automated apple harvesting technology. Finally, limitations of the current system and future work are discussed.
A unified 3D framework for Organs at Risk Localization and Segmentation for Radiation Therapy Planning
Authors: Fernando Navarro, Guido Sasahara, Suprosanna Shit, Ivan Ezhov, Jan C. Peeken, Stephanie E. Combs, Bjoern H. Menze
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Automatic localization and segmentation of organs-at-risk (OAR) in CT are essential pre-processing steps in medical image analysis tasks, such as radiation therapy planning. For instance, the segmentation of OAR surrounding tumors enables the maximization of radiation to the tumor area without compromising the healthy tissues. However, the current medical workflow requires manual delineation of OAR, which is prone to errors and is annotator-dependent. In this work, we aim to introduce a unified 3D pipeline for OAR localization-segmentation rather than novel localization or segmentation architectures. To the best of our knowledge, our proposed framework fully enables the exploitation of the 3D context information inherent in medical imaging. In the first step, a 3D multi-variate regression network predicts organs' centroids and bounding boxes. In the second step, 3D organ-specific segmentation networks are leveraged to generate a multi-organ segmentation map. Our method achieved an overall Dice score of $0.9260 \pm 0.18\%$ on the VISCERAL dataset, which contains CT scans with varying fields of view and multiple organs.
Lodestar: An Integrated Embedded Real-Time Control Engine
Authors: Hamza El-Kebir, Joseph Bentsman, Melkior Ornik
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Abstract
In this work we present Lodestar, an integrated engine for rapid real-time control system development. Using a functional block diagram paradigm, Lodestar allows for complex multi-disciplinary control software design, while automatically resolving execution order, circular data-dependencies, and networking. In particular, Lodestar presents a unified set of control, signal processing, and computer vision routines to users, which may be interfaced with external hardware and software packages using interoperable user-defined wrappers. Lodestar allows user-defined block diagrams to be executed directly, or to be translated into overhead-free source code for integration in other programs. We demonstrate how our framework departs from approaches used in state-of-the-art simulation frameworks to enable real-time performance, and compare its capabilities to existing solutions in the realm of control software. To demonstrate the utility of Lodestar in real-time control systems design, we have applied it to implement two real-time torque-based controllers for a robotic arm. In addition, we have developed a novel autofocus algorithm for use in thermography-based localization and parameter estimation in electrosurgery and other areas of robot-assisted surgery. We compare our algorithm design approach in Lodestar to a classical ground-up approach, showing that Lodestar considerably eases the design process. We also show how Lodestar can seamlessly interface with existing simulation and networking frameworks in a number of simulation examples.
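As a flavour of the execution-order resolution Lodestar performs on block diagrams, here is a minimal topological-sort sketch (illustrative only, and not Lodestar's code: it side-steps circular-dependency handling by manually cutting the feedback edge, where a real engine would insert a unit delay at the cut):

    from graphlib import TopologicalSorter

    # A feedback loop: reference/sensor -> error -> controller -> plant -> sensor.
    # Each block lists the blocks whose outputs it consumes.
    blocks = {
        "controller": {"error"},
        "error":      {"reference", "sensor"},
        "plant":      {"controller"},
        "sensor":     {"plant"},
        "reference":  set(),
    }
    # Cut the feedback edge before sorting; a real engine would place a unit
    # delay here so the sensor reads the previous cycle's plant output.
    blocks["sensor"] = set()
    print(list(TopologicalSorter(blocks).static_order()))
    # e.g. ['reference', 'sensor', 'error', 'controller', 'plant']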
Keyword: SLAM
Collaborative Robot Mapping using Spectral Graph Analysis
Descriptellation: Deep Learned Constellation Descriptors for SLAM
Keyword: Visual inertial
There is no result
Keyword: livox
There is no result
Keyword: loam
There is no result
Keyword: Visual inertial odometry
There is no result
Keyword: lidar
Elliptical Slice Sampling for Probabilistic Verification of Stochastic Systems with Signal Temporal Logic Specifications
Deep Camera Pose Regression Using Pseudo-LiDAR
Spatiotemporal Transformer Attention Network for 3D Voxel Level Joint Segmentation and Motion Prediction in Point Cloud
Understanding the Challenges When 3D Semantic Segmentation Faces Class Imbalanced and OOD Data
FP-Loc: Lightweight and Drift-free Floor Plan-assisted LiDAR Localization
Keyword: loop detection
There is no result
Keyword: autonomous driving
Semi-supervised Deep Learning for Image Classification with Distribution Mismatch: A Survey
Understanding the Challenges When 3D Semantic Segmentation Faces Class Imbalanced and OOD Data
Adversarial samples for deep monocular 6D object pose estimation
Keyword: mapping
Towards Targeted Change Detection with Heterogeneous Remote Sensing Images for Forest Mortality Mapping
ERF: Explicit Radiance Field Reconstruction From Scratch
Collaborative Robot Mapping using Spectral Graph Analysis
MIRRAX: A Reconfigurable Robot for Limited Access Environments
Two Classes of Power Mappings with Boomerang Uniformity 2
E-LMC: Extended Linear Model of Coregionalization for Predictions of Spatial Fields
On genetic programming representations and fitness functions for interpretable dimensionality reduction
DynamicRetriever: A Pre-training Model-based IR System with Neither Sparse nor Dense Index
Descriptellation: Deep Learned Constellation Descriptors for SLAM
CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding
Keyword: localization
Deep Camera Pose Regression Using Pseudo-LiDAR
FP-Loc: Lightweight and Drift-free Floor Plan-assisted LiDAR Localization
Indoor Localization for Quadrotors using Invisible Projected Tags
Descriptellation: Deep Learned Constellation Descriptors for SLAM
Finite difference method for stochastic Cahn--Hilliard equation: Strong convergence rate and density convergence
Algorithm Design and Integration for a Robotic Apple Harvesting System
A unified 3D framework for Organs at Risk Localization and Segmentation for Radiation Therapy Planning
Lodestar: An Integrated Embedded Real-Time Control Engine