New submissions for Tue, 28 Dec 21

Keyword: SLAM

3D Point Cloud Reconstruction and SLAM as an Input

Authors: Ziyu Li, Fangyang Ye, Xinran Guan
Subjects: Multimedia (cs.MM)
Arxiv link: https://arxiv.org/abs/2112.12907
Pdf link: https://arxiv.org/pdf/2112.12907
Abstract To handle different types of surface reconstruction tasks, we replicated and modified several reconstruction methods and compared traditional and data-driven methods for reconstructing the surface of an object from a dense point cloud. On top of that, we propose a system that uses tightly-coupled SLAM as an input to generate a deskewed point cloud and odometry, together with a Truncated Signed Distance Function based Surface Reconstruction Library. To achieve higher accuracy, IMU (Inertial Measurement Unit) pre-integration and pose graph optimization are performed in the SLAM part. With the help of the Robot Operating System (ROS), we build a system containing these two parts, which can perform real-time outdoor surface reconstruction.
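The abstract names a Truncated Signed Distance Function (TSDF) based reconstruction library without detailing its fusion step. As a hedged illustration, here is the classic per-voxel TSDF update used in volumetric fusion: each new signed distance is truncated, normalized, and merged by a weighted running average. It is not the library the authors used; `Voxel`, `integrate`, the truncation distance, and the `max_weight` cap are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Voxel:
    tsdf: float = 1.0    # normalized truncated signed distance in [-1, 1]; 1.0 = free space
    weight: float = 0.0  # accumulated observation weight (confidence)


def integrate(voxel: Voxel, signed_dist: float, trunc: float,
              obs_weight: float = 1.0, max_weight: float = 100.0) -> None:
    """Fuse one signed-distance observation (positive when the voxel lies in
    front of the observed surface) into a voxel via a weighted running average."""
    if signed_dist < -trunc:
        return  # far behind the observed surface: treat as occluded, no update
    d = min(1.0, signed_dist / trunc)  # truncate in front, normalize to [-1, 1]
    total = voxel.weight + obs_weight
    voxel.tsdf = (voxel.tsdf * voxel.weight + d * obs_weight) / total
    voxel.weight = min(total, max_weight)  # cap so the map can still adapt


v = Voxel()
integrate(v, 0.02, trunc=0.1)   # surface observed 2 cm beyond the voxel
integrate(v, -0.02, trunc=0.1)  # later scan: surface 2 cm in front of the voxel
print(v.tsdf, v.weight)         # 0.0 2.0
```

Capping the weight is a common design choice that keeps the map responsive to scene changes; without a cap, old observations would dominate indefinitely.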
Edge Robotics: Edge-Computing-Accelerated Multi-Robot Simultaneous Localization and Mapping
Authors: Peng Huang, Liekang Zeng, Xu Chen, Ke Luo, Zhi Zhou, Shuai Yu
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2112.13222
Pdf link: https://arxiv.org/pdf/2112.13222
Abstract With the wide penetration of smart robots in multifarious fields, the Simultaneous Localization and Mapping (SLAM) technique in robotics has attracted growing attention in the community. Yet collaborative SLAM across multiple robots remains challenging because of the conflict between the intensive graphics computation of SLAM and the limited computing capability of robots. While traditional solutions resort to powerful cloud servers as an external computation provider, we show through real-world measurements that the significant communication overhead of data offloading makes this approach impractical for real deployments. To tackle these challenges, this paper brings the emerging edge computing paradigm into multi-robot SLAM and proposes RecSLAM, a multi-robot laser SLAM system that focuses on accelerating the map construction process under a robot-edge-cloud architecture. In contrast to conventional multi-robot SLAM, which generates graphic maps on robots and merges them entirely on the cloud, RecSLAM develops a hierarchical map fusion technique that directs robots' raw data to edge servers for real-time fusion and then sends the results to the cloud for global merging. To optimize the overall pipeline, an efficient collaborative processing framework for multi-robot SLAM is introduced to adaptively optimize robot-to-edge offloading under heterogeneous edge resource conditions, while ensuring workload balance among the edge servers. Extensive evaluations show that RecSLAM can achieve up to 39% processing latency reduction over the state of the art. In addition, a proof-of-concept prototype is developed and deployed in real scenes to demonstrate its effectiveness.
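The abstract does not specify how RecSLAM's collaborative processing framework chooses offloading targets. As a rough illustration only, here is a generic greedy earliest-finish-time heuristic (not the paper's optimizer) showing what robot-to-edge assignment under heterogeneous bandwidth and compute, with load balancing, can look like. `EdgeServer`, `assign_robots`, and all numbers are hypothetical, and transfers to one server are assumed to be serialized.

```python
from dataclasses import dataclass, field


@dataclass
class EdgeServer:
    name: str
    bandwidth_mbps: float    # robot-to-edge uplink, megabits per second
    fusion_mb_per_s: float   # map-fusion throughput, megabytes per second
    busy_until: float = 0.0  # estimated time (s) when queued work completes
    robots: list = field(default_factory=list)

    def finish_time(self, size_mb: float) -> float:
        transfer_s = size_mb * 8 / self.bandwidth_mbps  # MB -> Mb
        compute_s = size_mb / self.fusion_mb_per_s
        return self.busy_until + transfer_s + compute_s


def assign_robots(workloads_mb: dict, servers: list) -> dict:
    """Greedy earliest-finish-time offloading: largest workloads first, each to
    the server that would finish it soonest, which spreads load across servers."""
    plan = {}
    for robot, size in sorted(workloads_mb.items(), key=lambda kv: -kv[1]):
        best = min(servers, key=lambda s: s.finish_time(size))
        best.busy_until = best.finish_time(size)
        best.robots.append(robot)
        plan[robot] = best.name
    return plan


servers = [EdgeServer("edge-a", 80.0, 10.0), EdgeServer("edge-b", 40.0, 20.0)]
print(assign_robots({"r1": 10.0, "r2": 10.0, "r3": 5.0}, servers))
```

Here the second robot goes to the slower-uplink server because the faster one is already busy, which is the balancing effect a real scheduler would need to capture with far richer cost models.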
Simultaneous Location of Rail Vehicles and Mapping of Environment with Multiple LiDARs
Authors: Yusheng Wang, Weiwei Song, Yidong Lou, Fei Huang, Zhiyong Tu, Shimin Zhang
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2112.13224
Pdf link: https://arxiv.org/pdf/2112.13224
Abstract Precise and real-time rail vehicle localization, together with railway environment monitoring, is crucial for railroad safety. In this letter, we propose a multi-LiDAR based simultaneous localization and mapping (SLAM) system for railway applications. Our approach starts with measurement preprocessing to denoise and synchronize multiple LiDAR inputs. Different frame-to-frame registration methods are used depending on the LiDAR placement. In addition, we leverage plane constraints from extracted rail tracks to improve system accuracy. The local map is further aligned with the global map using absolute position measurements. Considering unavoidable metal abrasion and screw loosening, online extrinsic refinement is activated for long-duration operation. The proposed method is extensively verified on datasets gathered over 3000 km. The results demonstrate that the proposed system achieves accurate and robust localization together with effective mapping for large-scale environments. Our system has already been applied to a freight traffic railroad for monitoring tasks.
UV-SLAM: Unconstrained Line-based SLAM Using Vanishing Points for Structural Mapping
Authors: Hyunjun Lim, Jinwoo Jeon, Hyun Myung
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2112.13515
Pdf link: https://arxiv.org/pdf/2112.13515
Abstract In feature-based simultaneous localization and mapping (SLAM), line features complement the sparsity of point features, making it possible to map the surrounding environment structure. Existing approaches utilizing line features have primarily employed a measurement model that uses line re-projection. However, the direction vectors used in the 3D line mapping process cannot be corrected because the line measurement model employs only the lines' normal vectors in Plücker coordinates. As a result, problems such as degeneracy that occur during the 3D line mapping process cannot be solved. To tackle this problem, this paper presents UV-SLAM, an unconstrained line-based SLAM that uses vanishing points for structural mapping. This paper focuses on using structural regularities without any constraints, such as the Manhattan world assumption. For this, we use the vanishing points that can be obtained from the line features. The difference between the vanishing point observation computed from line features in the image and the vanishing point estimate computed from the direction vector is defined as a residual and added to the cost function of optimization-based SLAM. Furthermore, through rank analysis of the Fisher information matrix, we prove that vanishing point measurements guarantee a unique mapping solution. Finally, we demonstrate on public datasets that localization accuracy and mapping quality improve compared to state-of-the-art algorithms.
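To make the vanishing point residual concrete, here is a minimal sketch under a pinhole-camera assumption: the observed vanishing point is the intersection of two image segments that are parallel in 3D, and the estimated one is the projection v ~ K R d of the line's 3D direction. The function names, the intrinsics, and the plain pixel-space residual are illustrative assumptions; UV-SLAM's exact parameterization and weighting are not given in the abstract.

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def matvec(m, v):
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))


def dehomogenize(h):
    if abs(h[2]) < 1e-12:
        raise ValueError("vanishing point lies at infinity in the image")
    return (h[0] / h[2], h[1] / h[2])


def observed_vp(seg_a, seg_b):
    """Intersect two image segments ((u1, v1), (u2, v2)) assumed parallel in 3D."""
    line_a = cross((*seg_a[0], 1.0), (*seg_a[1], 1.0))  # homogeneous line through endpoints
    line_b = cross((*seg_b[0], 1.0), (*seg_b[1], 1.0))
    return dehomogenize(cross(line_a, line_b))


def estimated_vp(k, r_cw, direction_w):
    """Project a world-frame 3D line direction to its vanishing point: v ~ K R d."""
    return dehomogenize(matvec(k, matvec(r_cw, direction_w)))


def vp_residual(k, r_cw, direction_w, seg_a, seg_b):
    """Observed minus estimated vanishing point, in pixels."""
    (uo, vo), (ue, ve) = observed_vp(seg_a, seg_b), estimated_vp(k, r_cw, direction_w)
    return (uo - ue, vo - ve)


K = ((500.0, 0.0, 320.0), (0.0, 500.0, 240.0), (0.0, 0.0, 1.0))  # hypothetical intrinsics
R = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
seg_a = ((0.0, 240.0), (100.0, 240.0))
seg_b = ((720.0, 190.0), (620.0, 140.0))
print(vp_residual(K, R, (1.0, 0.0, 1.0), seg_a, seg_b))  # (0.0, 0.0)
```

Because the residual depends on the direction vector d, an optimizer can correct d directly, which is the quantity the abstract says a normal-vector-only line model leaves uncorrected.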
M2DGR: A Multi-sensor and Multi-scenario SLAM Dataset for Ground Robots
Authors: Jie Yin, Ang Li, Tao Li, Wenxian Yu, Danping Zou
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2112.13659
Pdf link: https://arxiv.org/pdf/2112.13659
Abstract We introduce M2DGR: a novel large-scale dataset collected by a ground robot with a full sensor suite including six fish-eye and one sky-pointing RGB cameras, an infrared camera, an event camera, a Visual-Inertial Sensor (VI-sensor), an inertial measurement unit (IMU), a LiDAR, a consumer-grade Global Navigation Satellite System (GNSS) receiver, and a GNSS-IMU navigation system with real-time kinematic (RTK) signals. All of these sensors were well calibrated and synchronized, and their data were recorded simultaneously. The ground truth trajectories were obtained by a motion capture device, a laser 3D tracker, and an RTK receiver. The dataset comprises 36 sequences (about 1 TB) captured in diverse scenarios including both indoor and outdoor environments. We evaluate state-of-the-art SLAM algorithms on M2DGR. Results show that existing solutions perform poorly in some scenarios. For the benefit of the research community, we make the dataset and tools public. The webpage of our project is https://github.com/SJTU-ViSYS/M2DGR.
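Evaluating SLAM algorithms against ground truth trajectories, as done on M2DGR, commonly relies on the absolute trajectory error (ATE). A minimal sketch of the ATE RMSE, assuming the two trajectories are already time-associated and expressed in the same frame; tools such as evo normally align them first (e.g. with the Umeyama method), a step omitted here. This is not M2DGR's official evaluation tooling, and `ate_rmse` is a hypothetical name.

```python
import math


def ate_rmse(estimated, ground_truth):
    """Absolute trajectory error: RMSE of position differences.

    Assumes both lists hold time-associated (x, y, z) positions already expressed
    in the same frame; trajectory alignment is deliberately left out."""
    if not estimated or len(estimated) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and equally long")
    squared = [sum((e - g) ** 2 for e, g in zip(p, q))
               for p, q in zip(estimated, ground_truth)]
    return math.sqrt(sum(squared) / len(squared))


est = [(0.0, 0.0, 0.0), (1.0, 0.1, 0.0), (2.0, 0.2, 0.0)]  # hypothetical drifting estimate
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(ate_rmse(est, gt))  # about 0.129 m
```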
Keyword: Visual inertial

There is no result

Keyword: livox

There is no result

Keyword: loam

There is no result

Keyword: Visual inertial odometry

There is no result

Keyword: lidar

Doppler velocity-based algorithm for Clustering and Velocity Estimation of moving objects
Authors: Mian Guo, Kai Zhong, Xiaozhi Wang
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2112.12984
Pdf link: https://arxiv.org/pdf/2112.12984
Abstract We propose a Doppler velocity-based clustering and velocity estimation algorithm that exploits the characteristics of FMCW LiDAR to achieve highly accurate, single-scan, real-time motion state detection and velocity estimation. We prove the continuity of the Doppler velocity on the same object. Based on this principle, we distinguish moving objects from the stationary background via a region-growing clustering algorithm. The resulting stationary background is used to estimate the velocity of the FMCW LiDAR by the least-squares method. We then estimate the velocity of the moving objects using the estimated LiDAR velocity and the Doppler velocity of the moving objects obtained by clustering. To ensure real-time processing, we choose appropriate least-squares parameters. To verify the effectiveness of the algorithm, we build an FMCW LiDAR model on the autonomous driving simulation platform CARLA to generate data. The results show that our algorithm can process at least 4.5 million points and estimate the velocity of 150 moving objects per second on a Ryzen 3600X CPU, with a motion state detection accuracy of over 99% and a velocity estimation accuracy of 0.1 m/s.
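The least-squares step can be sketched as follows, assuming a common FMCW sign convention in which each point's Doppler value is the rate of change of its range, so a stationary point seen along unit direction u gives doppler = -u·v_sensor. Given points already labelled stationary by clustering, the sensor velocity solves the normal equations; subtracting ego-motion then yields a moving point's radial speed. All names are hypothetical, and the paper's parameter choices are not reproduced.

```python
import math


def unit(p):
    n = math.sqrt(sum(c * c for c in p))
    return tuple(c / n for c in p)


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, b):
    """Solve the 3x3 system a x = b by Cramer's rule."""
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("degenerate geometry: point directions do not span 3D")
    return tuple(det3([[b[r] if c == col else a[r][c] for c in range(3)]
                       for r in range(3)]) / d for col in range(3))


def estimate_ego_velocity(static_points, doppler):
    """Least-squares sensor velocity from points already labelled stationary.

    Assumed model: doppler_i = -dot(u_i, v), u_i = unit direction to point i,
    so negative Doppler means the range is shrinking. Solves the normal
    equations (sum u u^T) v = -sum u * doppler."""
    ata = [[0.0] * 3 for _ in range(3)]
    atb = [0.0] * 3
    for p, vr in zip(static_points, doppler):
        u = unit(p)
        for r in range(3):
            atb[r] -= u[r] * vr
            for c in range(3):
                ata[r][c] += u[r] * u[c]
    return solve3(ata, atb)


def compensated_radial_speed(point, doppler, v_sensor):
    """Radial speed of a point after removing ego-motion: dot(u, v_object)."""
    u = unit(point)
    return doppler + sum(ui * vi for ui, vi in zip(u, v_sensor))


v_true = (10.0, 0.0, 0.0)  # hypothetical: sensor moving forward at 10 m/s
pts = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (1.0, 1.0, 0.0)]
dop = [-sum(a * b for a, b in zip(unit(p), v_true)) for p in pts]
print(estimate_ego_velocity(pts, dop))  # approximately (10.0, 0.0, 0.0)
```

A single Doppler value only constrains the radial component of a moving object's velocity, so recovering full object velocity needs several points per cluster with different viewing directions.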
Simultaneous Location of Rail Vehicles and Mapping of Environment with Multiple LiDARs
Authors: Yusheng Wang, Weiwei Song, Yidong Lou, Fei Huang, Zhiyong Tu, Shimin Zhang
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2112.13224
Pdf link: https://arxiv.org/pdf/2112.13224
Abstract Precise and real-time rail vehicle localization, together with railway environment monitoring, is crucial for railroad safety. In this letter, we propose a multi-LiDAR based simultaneous localization and mapping (SLAM) system for railway applications. Our approach starts with measurement preprocessing to denoise and synchronize multiple LiDAR inputs. Different frame-to-frame registration methods are used depending on the LiDAR placement. In addition, we leverage plane constraints from extracted rail tracks to improve system accuracy. The local map is further aligned with the global map using absolute position measurements. Considering unavoidable metal abrasion and screw loosening, online extrinsic refinement is activated for long-duration operation. The proposed method is extensively verified on datasets gathered over 3000 km. The results demonstrate that the proposed system achieves accurate and robust localization together with effective mapping for large-scale environments. Our system has already been applied to a freight traffic railroad for monitoring tasks.
M2DGR: A Multi-sensor and Multi-scenario SLAM Dataset for Ground Robots
Authors: Jie Yin, Ang Li, Tao Li, Wenxian Yu, Danping Zou
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2112.13659
Pdf link: https://arxiv.org/pdf/2112.13659
Abstract We introduce M2DGR: a novel large-scale dataset collected by a ground robot with a full sensor suite including six fish-eye and one sky-pointing RGB cameras, an infrared camera, an event camera, a Visual-Inertial Sensor (VI-sensor), an inertial measurement unit (IMU), a LiDAR, a consumer-grade Global Navigation Satellite System (GNSS) receiver, and a GNSS-IMU navigation system with real-time kinematic (RTK) signals. All of these sensors were well calibrated and synchronized, and their data were recorded simultaneously. The ground truth trajectories were obtained by a motion capture device, a laser 3D tracker, and an RTK receiver. The dataset comprises 36 sequences (about 1 TB) captured in diverse scenarios including both indoor and outdoor environments. We evaluate state-of-the-art SLAM algorithms on M2DGR. Results show that existing solutions perform poorly in some scenarios. For the benefit of the research community, we make the dataset and tools public. The webpage of our project is https://github.com/SJTU-ViSYS/M2DGR.
Keyword: loop detection

There is no result

Keyword: autonomous driving

Multi-Camera Sensor Fusion for Visual Odometry using Deep Uncertainty Estimation
Authors: Nimet Kaygusuz, Oscar Mendez, Richard Bowden
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2112.12818
Pdf link: https://arxiv.org/pdf/2112.12818
Abstract Visual Odometry (VO) estimation is an important source of information for vehicle state estimation and autonomous driving. Recently, deep learning based approaches have begun to appear in the literature. However, in the context of driving, single-sensor approaches are often prone to failure because of degraded image quality due to environmental factors, camera placement, etc. To address this issue, we propose a deep sensor fusion framework that estimates vehicle motion using both pose and uncertainty estimates from multiple on-board cameras. We extract spatio-temporal feature representations from a set of consecutive images using a hybrid CNN-RNN model. We then use a Mixture Density Network (MDN) to estimate the 6-DoF pose as a mixture of distributions, and a fusion module to estimate the final pose from the MDN outputs of multiple cameras. We evaluate our approach on nuScenes, a publicly available, large-scale autonomous vehicle dataset. The results show that the proposed fusion approach surpasses the state of the art and provides robust estimates and accurate trajectories compared to individual camera-based estimates.
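The abstract does not detail the learned fusion module. As a simpler stand-in, not the paper's MDN-based fusion, the sketch below fuses independent Gaussian per-camera estimates of one pose component with inverse-variance (precision) weighting, which illustrates how uncertainty estimates let a degraded camera be down-weighted. Treating each pose dimension independently and using single Gaussians instead of mixtures are both simplifying assumptions; `fuse_estimates` is a hypothetical name.

```python
def fuse_estimates(means, variances):
    """Precision-weighted fusion of independent Gaussian estimates of one quantity.

    Returns (fused_mean, fused_variance). Noisier cameras (larger variance)
    receive proportionally less weight."""
    if not means or len(means) != len(variances):
        raise ValueError("need one positive variance per mean")
    precisions = [1.0 / v for v in variances]
    total = sum(precisions)
    fused_mean = sum(p * m for p, m in zip(precisions, means)) / total
    return fused_mean, 1.0 / total


# Hypothetical forward-translation estimates (metres) from three cameras;
# the third camera is degraded and reports a large variance.
mean, var = fuse_estimates([1.02, 0.98, 1.60], [0.01, 0.01, 1.0])
print(round(mean, 3), round(var, 4))
```

The fused mean stays close to the two confident cameras, and the fused variance is smaller than any single camera's, which is the basic benefit multi-camera fusion aims for.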
Doppler velocity-based algorithm for Clustering and Velocity Estimation of moving objects
Authors: Mian Guo, Kai Zhong, Xiaozhi Wang
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2112.12984
Pdf link: https://arxiv.org/pdf/2112.12984
Abstract We propose a Doppler velocity-based clustering and velocity estimation algorithm that exploits the characteristics of FMCW LiDAR to achieve highly accurate, single-scan, real-time motion state detection and velocity estimation. We prove the continuity of the Doppler velocity on the same object. Based on this principle, we distinguish moving objects from the stationary background via a region-growing clustering algorithm. The resulting stationary background is used to estimate the velocity of the FMCW LiDAR by the least-squares method. We then estimate the velocity of the moving objects using the estimated LiDAR velocity and the Doppler velocity of the moving objects obtained by clustering. To ensure real-time processing, we choose appropriate least-squares parameters. To verify the effectiveness of the algorithm, we build an FMCW LiDAR model on the autonomous driving simulation platform CARLA to generate data. The results show that our algorithm can process at least 4.5 million points and estimate the velocity of 150 moving objects per second on a Ryzen 3600X CPU, with a motion state detection accuracy of over 99% and a velocity estimation accuracy of 0.1 m/s.
A Survey on Interpretable Reinforcement Learning
Authors: Claire Glanois, Paul Weng, Matthieu Zimmer, Dong Li, Tianpei Yang, Jianye Hao, Wulong Liu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2112.13112
Pdf link: https://arxiv.org/pdf/2112.13112
Abstract Although deep reinforcement learning has become a promising machine learning approach for sequential decision-making problems, it is still not mature enough for high-stakes domains such as autonomous driving or medical applications. In such contexts, a learned policy needs, for instance, to be interpretable, so that it can be inspected before any deployment (e.g., for safety and verifiability reasons). This survey provides an overview of various approaches to achieve higher interpretability in reinforcement learning (RL). To that aim, we distinguish interpretability (as a property of a model) and explainability (as a post-hoc operation, with the intervention of a proxy) and discuss them in the context of RL with an emphasis on the former notion. In particular, we argue that interpretable RL may embrace different facets: interpretable inputs, interpretable (transition/reward) models, and interpretable decision-making. Based on this scheme, we summarize and analyze recent work related to interpretable RL with an emphasis on papers published in the past 10 years. We also briefly discuss some related research areas and point to some promising research directions.
Adversarial Attack for Asynchronous Event-based Data
Authors: Wooju Lee, Hyun Myung
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2112.13534
Pdf link: https://arxiv.org/pdf/2112.13534
Abstract Deep neural networks (DNNs) are vulnerable to adversarial examples that are carefully designed to cause the deep learning model to make mistakes. Adversarial examples for 2D images and 3D point clouds have been extensively studied, but studies on event-based data are limited. Event-based data can be an alternative to 2D images under high-speed motion, such as in autonomous driving. However, adversarial events can expose current deep learning models to safety issues. In this work, we generate adversarial examples and then train robust models for event-based data, for the first time. Our algorithm shifts the time of the original events and generates additional adversarial events. Additional adversarial events are generated in two stages. First, null events are added to the event-based data to generate additional adversarial events; the perturbation size can be controlled with the number of null events. Second, the location and time of the additional adversarial events are set to mislead DNNs in a gradient-based attack. Our algorithm achieves an attack success rate of 97.95% on the N-Caltech101 dataset. Furthermore, the adversarially trained model improves robustness on adversarial event data compared to the original model.
An Empirical Study of Adder Neural Networks for Object Detection
Authors: Xinghao Chen, Chang Xu, Minjing Dong, Chunjing Xu, Yunhe Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2112.13608
Pdf link: https://arxiv.org/pdf/2112.13608
Abstract Adder neural networks (AdderNets) have shown impressive performance on image classification with only addition operations, which are more energy efficient than traditional convolutional neural networks built with multiplications. Compared with classification, there is a strong demand for reducing the energy consumption of modern object detectors via AdderNets in real-world applications such as autonomous driving and face detection. In this paper, we present an empirical study of AdderNets for object detection. We first reveal that the batch normalization statistics in the pre-trained adder backbone should not be frozen, due to the relatively large feature variance of AdderNets. Moreover, we insert more shortcut connections in the neck and design a new feature fusion architecture to avoid the sparse features of adder layers. We present extensive ablation studies to explore several design choices of adder detectors. Comparisons with state-of-the-art methods are conducted on the COCO and PASCAL VOC benchmarks. Specifically, the proposed Adder FCOS achieves 37.8% AP on the COCO val set, demonstrating performance comparable to its convolutional counterpart with about 1.4× energy reduction.
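AdderNets replace the multiply-accumulate of a linear or convolutional layer with a negative L1 distance between inputs and weights. A minimal pure-Python sketch of that idea follows; the function names are hypothetical, and real AdderNets also rely on modified gradients and normalization during training, which are not shown. Every output is a non-positive sum of absolute differences.

```python
def adder_linear(x, weights):
    """Adder 'fully connected' layer: output k is -sum_j |x_j - W_kj|,
    so no multiplications are needed (compare with sum_j x_j * W_kj)."""
    return [-sum(abs(xi - wi) for xi, wi in zip(x, w)) for w in weights]


def adder_conv1d(signal, kernel):
    """Valid 1D 'convolution' with an adder kernel: a sliding negative L1 distance."""
    k = len(kernel)
    return [-sum(abs(signal[i + j] - kernel[j]) for j in range(k))
            for i in range(len(signal) - k + 1)]


print(adder_linear([1.0, 2.0], [[1.0, 1.0], [0.0, 0.0]]))  # [-1.0, -3.0]
print(adder_conv1d([1, 2, 3, 4], [2, 3]))                   # [-2, 0, -2]
```

The output is largest (zero) where the input window matches the kernel exactly, so the adder layer still acts as a similarity-based feature detector.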
Keyword: mapping

One-to-One or One-to-many? What function inlining brings to binary2source similarity analysis
Authors: Ang Jia, Ming Fan, Wuxia Jin, Xi Xu, Ting Liu
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2112.12928
Pdf link: https://arxiv.org/pdf/2112.12928
Abstract Binary2source code matching is critical to many code-reuse-related tasks, including code clone detection, software license violation detection, and reverse engineering assistance. Existing binary2source works always apply a "1-to-1" (one-to-one) mechanism, i.e., one function in a binary file is matched against one function in a source file. However, we hypothesize that such mapping is usually a more complex "1-to-n" (one-to-many) problem due to the existence of function inlining. To the best of our knowledge, few existing works have systematically studied the effect of function inlining on binary2source matching tasks. This paper addresses this issue. To support our study, we first construct two datasets containing 61,179 binaries and 19,976,067 functions. We also propose an automated approach to label the dataset with line-level and function-level mapping. Based on our labeled dataset, we then investigate the extent of function inlining, the factors affecting function inlining, and the impact of function inlining on existing binary2source similarity methods. Finally, we discuss the interesting findings and give suggestions for designing more effective methodologies.
Continuous Spectral Reconstruction from RGB Images via Implicit Neural Representation
Authors: Ruikang Xu, Mingde Yao, Chang Chen, Lizhi Wang, Zhiwei Xiong
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2112.13003
Pdf link: https://arxiv.org/pdf/2112.13003
Abstract Existing methods for spectral reconstruction usually learn a discrete mapping from RGB images to a number of spectral bands. However, this modeling strategy ignores the continuous nature of spectral signatures. In this paper, we propose Neural Spectral Reconstruction (NeSR) to lift this limitation by introducing a novel continuous spectral representation. To this end, we embrace the concept of an implicit function and implement a parameterized embodiment with a neural network. Specifically, we first adopt a backbone network to extract spatial features of RGB inputs. Based on these, we devise a Spectral Profile Interpolation (SPI) module and a Neural Attention Mapping (NAM) module to enrich deep features, where the spatial-spectral correlation is incorporated for a better representation. Then, we view the number of sampled spectral bands as the coordinate of a continuous implicit function, so as to learn the projection from deep features to spectral intensities. Extensive experiments demonstrate the distinct advantage of NeSR in reconstruction accuracy over baseline methods. Moreover, NeSR extends the flexibility of spectral reconstruction by enabling an arbitrary number of spectral bands as the target output.
Edge Robotics: Edge-Computing-Accelerated Multi-Robot Simultaneous Localization and Mapping
Authors: Peng Huang, Liekang Zeng, Xu Chen, Ke Luo, Zhi Zhou, Shuai Yu
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2112.13222
Pdf link: https://arxiv.org/pdf/2112.13222
Abstract With the wide penetration of smart robots in multifarious fields, the Simultaneous Localization and Mapping (SLAM) technique in robotics has attracted growing attention in the community. Yet collaborative SLAM across multiple robots remains challenging because of the conflict between the intensive graphics computation of SLAM and the limited computing capability of robots. While traditional solutions resort to powerful cloud servers as an external computation provider, we show through real-world measurements that the significant communication overhead of data offloading makes this approach impractical for real deployments. To tackle these challenges, this paper brings the emerging edge computing paradigm into multi-robot SLAM and proposes RecSLAM, a multi-robot laser SLAM system that focuses on accelerating the map construction process under a robot-edge-cloud architecture. In contrast to conventional multi-robot SLAM, which generates graphic maps on robots and merges them entirely on the cloud, RecSLAM develops a hierarchical map fusion technique that directs robots' raw data to edge servers for real-time fusion and then sends the results to the cloud for global merging. To optimize the overall pipeline, an efficient collaborative processing framework for multi-robot SLAM is introduced to adaptively optimize robot-to-edge offloading under heterogeneous edge resource conditions, while ensuring workload balance among the edge servers. Extensive evaluations show that RecSLAM can achieve up to 39% processing latency reduction over the state of the art. In addition, a proof-of-concept prototype is developed and deployed in real scenes to demonstrate its effectiveness.
Simultaneous Location of Rail Vehicles and Mapping of Environment with Multiple LiDARs
Authors: Yusheng Wang, Weiwei Song, Yidong Lou, Fei Huang, Zhiyong Tu, Shimin Zhang
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2112.13224
Pdf link: https://arxiv.org/pdf/2112.13224
Abstract Precise and real-time rail vehicle localization, together with railway environment monitoring, is crucial for railroad safety. In this letter, we propose a multi-LiDAR based simultaneous localization and mapping (SLAM) system for railway applications. Our approach starts with measurement preprocessing to denoise and synchronize multiple LiDAR inputs. Different frame-to-frame registration methods are used depending on the LiDAR placement. In addition, we leverage plane constraints from extracted rail tracks to improve system accuracy. The local map is further aligned with the global map using absolute position measurements. Considering unavoidable metal abrasion and screw loosening, online extrinsic refinement is activated for long-duration operation. The proposed method is extensively verified on datasets gathered over 3000 km. The results demonstrate that the proposed system achieves accurate and robust localization together with effective mapping for large-scale environments. Our system has already been applied to a freight traffic railroad for monitoring tasks.
Deeper Clinical Document Understanding Using Relation Extraction
Authors: Hasham Ul Haq, Veysel Kocaman, David Talby
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2112.13259
Pdf link: https://arxiv.org/pdf/2112.13259
Abstract The surging amount of biomedical literature & digital clinical records presents a growing need for text mining techniques that can not only identify but also semantically relate entities in unstructured data. In this paper we propose a text mining framework comprising Named Entity Recognition (NER) and Relation Extraction (RE) models, which expands on previous work in three main ways. First, we introduce two new RE model architectures -- an accuracy-optimized one based on BioBERT and a speed-optimized one utilizing crafted features over a Fully Connected Neural Network (FCNN). Second, we evaluate both models on public benchmark datasets and obtain new state-of-the-art F1 scores on the 2012 i2b2 Clinical Temporal Relations challenge (F1 of 73.6, +1.2% over the previous SOTA), the 2010 i2b2 Clinical Relations challenge (F1 of 69.1, +1.2%), the 2019 Phenotype-Gene Relations dataset (F1 of 87.9, +8.5%), the 2012 Adverse Drug Events Drug-Reaction dataset (F1 of 90.0, +6.3%), and the 2018 n2c2 Posology Relations dataset (F1 of 96.7, +0.6%). Third, we show two practical applications of this framework -- for building a biomedical knowledge graph and for improving the accuracy of mapping entities to clinical codes. The system is built using the Spark NLP library which provides a production-grade, natively scalable, hardware-optimized, trainable & tunable NLP framework.
Miti-DETR: Object Detection based on Transformers with Mitigatory Self-Attention Convergence
Authors: Wenchi Ma, Tianxiao Zhang, Guanghui Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2112.13310
Pdf link: https://arxiv.org/pdf/2112.13310
Abstract Object Detection with Transformers (DETR) and related works reach or even surpass the highly optimized Faster R-CNN baseline with self-attention network architectures. Inspired by evidence that pure self-attention possesses a strong inductive bias that causes the transformer to lose expressive power with respect to network depth, we propose a transformer architecture with a mitigatory self-attention mechanism that applies direct mapping connections in the transformer architecture to mitigate rank collapse, thereby counteracting the loss of feature expression and enhancing model performance. We apply this proposal to object detection tasks and develop a model named Miti-DETR. Miti-DETR carries the inputs of each attention layer over to that layer's outputs, so that "non-attention" information participates in every attention propagation. The resulting residual self-attention network addresses two critical issues: (1) it prevents, to the greatest possible degree, the self-attention networks from degenerating to rank 1; and (2) it further diversifies the path distribution of parameter updates, so that attention is expected to be easier to learn. Miti-DETR significantly improves the average detection precision and convergence speed over existing DETR-based models on the challenging COCO object detection dataset. Moreover, the proposed transformer with the residual self-attention network can be easily generalized to or plugged into other related task models without specific customization.
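As a generic illustration of why carrying a layer's input over to its output helps (not Miti-DETR's actual layer), the sketch below uses single-head self-attention with identity query/key/value projections, a hypothetical simplification, and compares stacking pure attention with stacking attention plus an identity shortcut on a two-token toy input. Each pure-attention output row is a convex combination of the input rows, so repeated application pulls the tokens together (rank collapse), while the shortcut keeps them apart in this example.

```python
import math


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention with identity Q/K/V
    projections (a simplification: real layers learn these matrices)."""
    d = len(tokens[0])
    scores = [[sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
              for q in tokens]
    weights = [softmax(r) for r in scores]
    return [[sum(w * v[c] for w, v in zip(wr, tokens)) for c in range(d)]
            for wr in weights]


def residual_self_attention(tokens):
    """Carry the layer input over to its output: out = attention(x) + x."""
    return [[a + x for a, x in zip(att, tok)]
            for att, tok in zip(self_attention(tokens), tokens)]


def row_gap(tokens):
    """Euclidean distance between the first two token vectors."""
    return math.dist(tokens[0], tokens[1])


pure = resid = [[1.0, 0.0], [0.0, 1.0]]
for _ in range(4):
    pure = self_attention(pure)
    resid = residual_self_attention(resid)
print(row_gap(pure), row_gap(resid))  # pure gap shrinks toward 0; residual gap grows
```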
A Compact Neural Network-based Algorithm for Robust Image Watermarking
Authors: Hong-Bo Xu, Rong Wang, Jia Wei, Shao-Ping Lu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2112.13491
Pdf link: https://arxiv.org/pdf/2112.13491
Abstract Digital image watermarking seeks to protect digital media information from unauthorized access: a message is embedded into the digital image and extracted from it, even when noise or distortions are introduced by various data processing operations, including lossy image compression and interactive content editing. Traditional image watermarking solutions often lack robustness when specified with some prior constraints, while recent deep learning-based watermarking methods do not handle the information loss problem well because of their separate feature encoder and decoder pipelines. In this paper, we propose a novel digital image watermarking solution with a compact neural network, named Invertible Watermarking Network (IWN). Our IWN architecture is based on a single Invertible Neural Network (INN), whose bijective propagation framework enables us to effectively solve message embedding and extraction simultaneously, by treating them as a pair of mutually inverse problems and learning a stable invertible mapping. To enhance the robustness of our watermarking solution, we introduce a simple but effective bit message normalization module to condense the bit message to be embedded, and a noise layer is designed to simulate various practical attacks within our IWN framework. Extensive experiments demonstrate the superiority of our solution under various distortions.
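An INN-based scheme relies on layers whose inverse is exact. A common building block is the additive coupling layer popularized by NICE: split the input, keep one half, and shift the other half by an arbitrary function of the first; inversion subtracts the same shift. This sketch shows only that generic mechanism; IWN's actual layers, its bit message normalization module, and its noise layer are not reproduced, and `shift_net` is a hypothetical toy function.

```python
import math


def shift_net(x1):
    """Hypothetical stand-in for a learned sub-network; it need not be invertible."""
    return [2.0 * math.tanh(v) + 0.5 for v in x1]


def _split(v):
    if len(v) % 2:
        raise ValueError("expects an even-length vector")
    half = len(v) // 2
    return v[:half], v[half:]


def coupling_forward(x):
    """Additive coupling: y1 = x1, y2 = x2 + shift_net(x1)."""
    x1, x2 = _split(x)
    return x1 + [a + s for a, s in zip(x2, shift_net(x1))]


def coupling_inverse(y):
    """Exact inverse: x1 = y1, x2 = y2 - shift_net(y1)."""
    y1, y2 = _split(y)
    return y1 + [b - s for b, s in zip(y2, shift_net(y1))]


x = [0.3, -1.2, 4.0, 0.7]
y = coupling_forward(x)
print(max(abs(a - b) for a, b in zip(coupling_inverse(y), x)))  # ~0 up to rounding
```

In a watermarking analogy, the forward pass would play the role of embedding and the inverse pass the role of extraction; because the inverse reuses the same weights, no separate decoder has to be trained.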
UV-SLAM: Unconstrained Line-based SLAM Using Vanishing Points for Structural Mapping
Authors: Hyunjun Lim, Jinwoo Jeon, Hyun Myung
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2112.13515
Pdf link: https://arxiv.org/pdf/2112.13515
Abstract In feature-based simultaneous localization and mapping (SLAM), line features complement the sparsity of point features, making it possible to map the surrounding environment structure. Existing approaches utilizing line features have primarily employed a measurement model that uses line re-projection. However, the direction vectors used in the 3D line mapping process cannot be corrected because the line measurement model employs only the lines' normal vectors in Plücker coordinates. As a result, problems such as degeneracy that occur during the 3D line mapping process cannot be solved. To tackle this problem, this paper presents UV-SLAM, an unconstrained line-based SLAM that uses vanishing points for structural mapping. This paper focuses on using structural regularities without any constraints, such as the Manhattan world assumption. For this, we use the vanishing points that can be obtained from the line features. The difference between the vanishing point observation computed from line features in the image and the vanishing point estimate computed from the direction vector is defined as a residual and added to the cost function of optimization-based SLAM. Furthermore, through rank analysis of the Fisher information matrix, we prove that vanishing point measurements guarantee a unique mapping solution. Finally, we demonstrate on public datasets that localization accuracy and mapping quality improve compared to state-of-the-art algorithms.
Sparsest Univariate Learning Models Under Lipschitz Constraint
Authors: Shayan Aziznejad, Thomas Debarre, Michael Unser
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2112.13542
Pdf link: https://arxiv.org/pdf/2112.13542
Abstract Besides minimizing the prediction error, two of the most desirable properties of a regression scheme are stability and interpretability. Driven by these principles, we propose continuous-domain formulations for one-dimensional regression problems. In our first approach, we use the Lipschitz constant as a regularizer, which results in an implicit tuning of the overall robustness of the learned mapping. In our second approach, we control the Lipschitz constant explicitly using a user-defined upper bound and make use of a sparsity-promoting regularizer to favor simpler (and, hence, more interpretable) solutions. The theoretical study of the latter formulation is motivated in part by its equivalence, which we prove, with the training of a Lipschitz-constrained two-layer univariate neural network with rectified linear unit (ReLU) activations and weight decay. By proving representer theorems, we show that both problems admit global minimizers that are continuous and piecewise-linear (CPWL) functions. Moreover, we propose efficient algorithms that find the sparsest solution of each problem: the CPWL mapping with the fewest linear regions. Finally, we illustrate numerically the outcome of our formulations.
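For a CPWL function given by its knots, both quantities in the abstract are easy to compute: the Lipschitz constant is the largest absolute slope, and sparsity is the number of linear regions. The sketch below assumes the function is the linear interpolant of knots `(x_i, y_i)` on the interval they span; it only evaluates these quantities and is not the authors' algorithm for finding the sparsest solution.

```python
import numpy as np

def slopes(x, y):
    """Slopes of the linear interpolant through knots (x_i, y_i); x strictly increasing."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.diff(y) / np.diff(x)

def lipschitz_constant(x, y):
    """For a CPWL function, the Lipschitz constant equals the largest absolute slope."""
    return float(np.max(np.abs(slopes(x, y))))

def num_linear_regions(x, y, tol=1e-12):
    """Count linear regions: adjacent pieces with equal slope merge into one region."""
    s = slopes(x, y)
    return 1 + int(np.sum(np.abs(np.diff(s)) > tol))

x = [0, 1, 2, 3, 4]
y = [0, 1, 2, 2, 0]  # slopes 1, 1, 0, -2
print(lipschitz_constant(x, y), num_linear_regions(x, y))  # 2.0 3
```

Note that the knot at `x = 1` is redundant (both neighbouring pieces have slope 1), so the function has five knots but only three linear regions; removing such knots is what "sparsest" refers to.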
Semantic Characterizations of General Belief Base Revision
Authors: Faiq Miftakhul Falakh, Sebastian Rudolph, Kai Sauerwald
Subjects: Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)
Arxiv link: https://arxiv.org/abs/2112.13557
Pdf link: https://arxiv.org/pdf/2112.13557
Abstract The AGM postulates by Alchourrón, Gärdenfors, and Makinson continue to represent a cornerstone in research related to belief change. Katsuno and Mendelzon (K&M) adopted the AGM postulates for changing belief bases and characterized AGM belief base revision in propositional logic over finite signatures. We generalize K&M's approach to the setting of (multiple) base revision in arbitrary Tarskian logics, covering all logics with a classical model-theoretic semantics and hence a wide variety of logics used in knowledge representation and beyond. Our generic formulation applies to various notions of "base" (such as belief sets, arbitrary or finite sets of sentences, or single sentences). The core result is a representation theorem showing a two-way correspondence between AGM base revision operators and certain "assignments": functions mapping belief bases to total, yet not transitive, "preference" relations between interpretations. Alongside, we present a companion result for the case when the AGM postulate of syntax-independence is abandoned. We also provide a characterization of all logics for which our result can be strengthened to assignments producing transitive preference relations (as in K&M's original work), giving rise to two more representation theorems for such logics, according to syntax dependence vs. independence.
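For intuition about "assignments" in the original K&M setting (propositional logic, finite signature), Dalal's operator is a classic example: the base is mapped to a total preorder on interpretations by Hamming distance to its nearest model, and revision returns the minimal models of the new information. The sketch below illustrates only that special case, with formulas encoded as Python predicates (an assumption of the sketch); it does not cover the paper's generalization to arbitrary Tarskian logics or non-transitive preference relations.

```python
from itertools import product

def models(formula, atoms):
    """All interpretations (dict atom -> bool) that satisfy `formula`, a Python predicate."""
    result = []
    for values in product([False, True], repeat=len(atoms)):
        interp = dict(zip(atoms, values))
        if formula(interp):
            result.append(interp)
    return result

def hamming(i, j):
    """Number of atoms on which two interpretations disagree."""
    return sum(i[a] != j[a] for a in i)

def dalal_revise(base, new, atoms):
    """Keep the models of `new` that are closest (Hamming) to some model of `base`."""
    base_models, new_models = models(base, atoms), models(new, atoms)
    if not base_models or not new_models:
        return new_models  # unsatisfiable base or input: fall back to Mod(new)

    def dist(m):
        return min(hamming(m, b) for b in base_models)

    best = min(dist(m) for m in new_models)
    return [m for m in new_models if dist(m) == best]

atoms = ["p", "q"]
print(dalal_revise(lambda m: m["p"] and m["q"], lambda m: not m["p"], atoms))
# [{'p': False, 'q': True}]
```

Revising `p ∧ q` by `¬p` gives up `p` but keeps `q`, since the interpretation with `q` true differs from the base model in only one atom.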
Weakly Supervised Visual-Auditory Saliency Detection with Multigranularity Perception
Authors: Guotao Wang, Chenglizhao Chen, Dengping Fan, Aimin Hao, Hong Qin
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2112.13697
Pdf link: https://arxiv.org/pdf/2112.13697
Abstract Thanks to the rapid advances in deep learning techniques and the wide availability of large-scale training sets, the performance of video saliency detection models has been improving steadily and significantly. However, deep learning-based visual-audio fixation prediction is still in its infancy. At present, only a few visual-audio sequences have been furnished with real fixations recorded in real visual-audio environments. Hence, it would be neither efficient nor necessary to recollect real fixations under the same visual-audio circumstances. To address this problem, this paper proposes a novel weakly supervised approach that alleviates the demand for large-scale training sets for visual-audio model training. Using only video category tags, we propose selective class activation mapping (SCAM) and its upgrade (SCAM+). In the spatial-temporal-audio setting, the former follows a coarse-to-fine strategy to select the most discriminative regions, and these regions usually exhibit high consistency with real human-eye fixations. The latter equips SCAM with an additional multi-granularity perception mechanism, making the whole process more consistent with the real human visual system. Moreover, we distill knowledge from these regions to obtain completely new spatial-temporal-audio (STA) fixation prediction (FP) networks, enabling broad applications in cases where video tags are not available. Without resorting to any real human-eye fixations, these STA FP networks perform comparably to fully supervised networks. The code and results are publicly available at https://github.com/guotaowang/STANet.
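SCAM builds on class activation mapping (CAM). A minimal sketch of plain CAM plus a naive "keep the top fraction" region selector is shown below; the `keep` parameter and the quantile threshold are assumptions, and this is neither SCAM's coarse-to-fine selection nor its audio branch.

```python
import numpy as np

def class_activation_map(features, class_weights):
    """Plain CAM: weighted sum of K feature maps (K, H, W) using one class's weights (K,)."""
    cam = np.tensordot(class_weights, features, axes=1)  # -> (H, W)
    cam = np.maximum(cam, 0.0)                           # keep positive evidence only
    return cam / cam.max() if cam.max() > 0 else cam     # normalize to [0, 1]

def most_discriminative_mask(cam, keep=0.2):
    """Select the top `keep` fraction of locations by activation value."""
    thresh = np.quantile(cam, 1.0 - keep)
    return cam >= thresh

features = np.zeros((2, 4, 4))
features[0, 1, 1], features[1, 2, 2] = 1.0, 1.0
cam = class_activation_map(features, np.array([1.0, 0.5]))
print(np.argwhere(most_discriminative_mask(cam, keep=1 / 16)))  # [[1 1]]
```

Only category tags are needed to train the classifier whose weights drive the map, which is why CAM-style methods fit the weakly supervised setting described above.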
Keyword: localization

Recurrent Neural Networks (RNNs) with dimensionality reduction and break down in computational mechanics; application to multi-scale localization step
Authors: Ling Wu, Ludovic Noels
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Arxiv link: https://arxiv.org/abs/2112.12842
Pdf link: https://arxiv.org/pdf/2112.12842
Abstract Artificial Neural Networks (NNWs) are appealing substitutes for high-dimensional, non-linear, history-dependent problems in computational mechanics, since they offer the possibility to drastically reduce computational time. This feature has recently been exploited in the context of multi-scale simulations, in which NNWs serve as surrogate models of micro-scale finite element resolutions. Nevertheless, the literature has mainly considered the macro-stress-macro-strain response of the meso-scale boundary value problem, and the micro-structure information could not be recovered in a so-called localization step. In this work, we develop Recurrent Neural Networks (RNNs) as surrogates of the RVE response while being able to recover the evolution of the local micro-structure state variables for complex loading scenarios. The main difficulty is the high dimensionality of the RNN output, which consists of the internal state variable distribution in the micro-structure. We thus propose and compare several surrogate models based on dimensionality reduction: i) direct RNN modeling with implicit NNW dimensionality reduction, ii) RNN with PCA dimensionality reduction, and iii) RNN with PCA dimensionality reduction and dimensionality break down, i.e. the use of several RNNs instead of a single one. Besides, we optimize the sequential training strategy of the latter surrogate for GPU usage in order to speed up the process. Finally, through RNN modeling of the principal component coefficients, the connection between the physical state variables and the hidden variables of the RNN is revealed and exploited to select the hyper-parameters of the RNN-based surrogate models in their design stage.
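The PCA step in options ii) and iii) can be sketched as follows: fit principal directions to snapshots of the micro-structure state field, let the RNN predict only the few coefficients, and decode them back to the full field. The snapshot layout `(n_samples, n_dofs)`, the SVD-based fit, and the rank-2 toy data are assumptions for illustration; the RNN itself is omitted.

```python
import numpy as np

def fit_pca(snapshots, k):
    """snapshots: (n_samples, n_dofs). Returns the mean field and the top-k principal directions."""
    mean = snapshots.mean(axis=0)
    _, _, vt = np.linalg.svd(snapshots - mean, full_matrices=False)
    return mean, vt[:k]

def encode(fields, mean, basis):
    """Full state fields -> k PCA coefficients (the low-dimensional RNN target)."""
    return (fields - mean) @ basis.T

def decode(coeffs, mean, basis):
    """k PCA coefficients -> reconstructed full state fields."""
    return coeffs @ basis + mean

rng = np.random.default_rng(1)
snapshots = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 30)) + 3.0  # rank-2 toy data
mean, basis = fit_pca(snapshots, k=2)
coeffs = encode(snapshots, mean, basis)  # (50, 2): what an RNN would predict
print(coeffs.shape, np.allclose(decode(coeffs, mean, basis), snapshots))  # (50, 2) True
```

On real state-variable fields the reconstruction is only approximate, and the number of retained components trades accuracy against the size of the RNN output.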
DeepMTL Pro: Deep Learning Based Multiple Transmitter Localization and Power Estimation
Authors: Caitao Zhan, Mohammad Ghaderibaneh, Pranjal Sahu, Himanshu Gupta
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2112.13181
Pdf link: https://arxiv.org/pdf/2112.13181
Abstract In this paper, we address the problem of Multiple Transmitter Localization (MTL). The goal of MTL is to determine the locations of potentially multiple transmitters in a field, based on readings from a distributed set of sensors. In contrast to the widely studied single transmitter localization problem, the MTL problem has only recently been studied, in a few works. MTL is of great significance in many applications where intruders may be present. For example, in shared spectrum systems, detecting unauthorized transmitters and estimating their power are imperative for efficient utilization of the shared spectrum. In this paper, we present DeepMTL, a novel deep-learning approach to the MTL problem. In particular, we frame MTL as a sequence of two steps, each of which is a computer vision problem: image-to-image translation and object detection. The first step, image-to-image translation, maps an input image representing sensor readings to an image representing the distribution of transmitter locations, and the second step, object detection, derives precise transmitter locations from the image of transmitter distributions. For the first step, we design our learning model Sen2Peak, while for the second step, we customize a state-of-the-art object detection model, Yolo-cust. Using DeepMTL as a building block, we also develop techniques to estimate the transmit power of the localized transmitters. We demonstrate the effectiveness of our approach via extensive large-scale simulations, and show that our approach significantly outperforms previous approaches (by 50% or more) in accuracy metrics and incurs an order of magnitude less latency than other prior works. We also evaluate our techniques over a small-scale area with real testbed data.
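The second step turns an image of transmitter distributions into discrete locations. DeepMTL uses a customized object detector (Yolo-cust) for this; as a much simpler stand-in that shows what the step produces, the sketch below extracts local maxima above a threshold from a 2D grid. The threshold and the 8-neighbour rule are assumptions, and plateaus of equal values would yield several adjacent peaks.

```python
import numpy as np

def find_peaks_2d(img, threshold):
    """Return (row, col) of pixels that exceed `threshold` and are >= all 8 neighbours."""
    img = np.asarray(img, dtype=float)
    padded = np.pad(img, 1, mode="constant", constant_values=-np.inf)
    h, w = img.shape
    neighbours = [padded[1 + dr:1 + dr + h, 1 + dc:1 + dc + w]
                  for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    is_max = np.all([img >= n for n in neighbours], axis=0)
    rows, cols = np.nonzero(is_max & (img > threshold))
    return list(zip(rows.tolist(), cols.tolist()))

img = np.zeros((5, 5))
img[1, 1], img[3, 4], img[2, 2] = 0.9, 0.7, 0.1  # two "transmitters" and weak clutter
print(find_peaks_2d(img, threshold=0.5))  # [(1, 1), (3, 4)]
```

A learned detector replaces this hand-tuned rule in DeepMTL, which matters when peaks from nearby transmitters overlap.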
Edge Robotics: Edge-Computing-Accelerated Multi-Robot Simultaneous Localization and Mapping
Authors: Peng Huang, Liekang Zeng, Xu Chen, Ke Luo, Zhi Zhou, Shuai Yu
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2112.13222
Pdf link: https://arxiv.org/pdf/2112.13222
Abstract With the wide penetration of smart robots in multifarious fields, Simultaneous Localization and Mapping (SLAM) in robotics has attracted growing attention in the community. Yet collaborative SLAM over multiple robots remains challenging because of the tension between the intensive graphics computation of SLAM and the limited computing capability of robots. While traditional solutions resort to powerful cloud servers acting as external computation providers, we show through real-world measurements that the significant communication overhead of data offloading makes this impractical for real deployment. To tackle these challenges, this paper brings the emerging edge computing paradigm to multi-robot SLAM and proposes RecSLAM, a multi-robot laser SLAM system that focuses on accelerating the map construction process under a robot-edge-cloud architecture. In contrast to conventional multi-robot SLAM, which generates graphic maps on robots and completely merges them on the cloud, RecSLAM develops a hierarchical map fusion technique that directs robots' raw data to edge servers for real-time fusion and then sends the results to the cloud for global merging. To optimize the overall pipeline, an efficient multi-robot SLAM collaborative processing framework is introduced to adaptively optimize robot-to-edge offloading for heterogeneous edge resource conditions, while ensuring workload balance among the edge servers. Extensive evaluations show that RecSLAM can achieve up to 39% processing latency reduction over the state-of-the-art. In addition, a proof-of-concept prototype is developed and deployed in real scenes to demonstrate its effectiveness.
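To illustrate what "workload balancing among heterogeneous edge servers" means, here is a minimal sketch using the classic longest-processing-time-first greedy heuristic: each robot's workload goes to the server whose finish time would be smallest after taking it. The workload units, the per-server `speeds`, and the neglect of network transfer cost are assumptions; this is not RecSLAM's offloading framework.

```python
def assign_workloads(workloads, speeds):
    """Greedy LPT scheduling of robot workloads onto edge servers with given speeds.

    workloads: dict robot -> work units; speeds: work units per second per server.
    Returns (robot -> server index, makespan in seconds).
    """
    finish = [0.0] * len(speeds)
    assignment = {}
    # Largest workloads first; sorted() is stable, so ties keep insertion order.
    for robot, w in sorted(workloads.items(), key=lambda kv: -kv[1]):
        best = min(range(len(speeds)), key=lambda s: finish[s] + w / speeds[s])
        finish[best] += w / speeds[best]
        assignment[robot] = best
    return assignment, max(finish)

print(assign_workloads({"r1": 4, "r2": 3, "r3": 3, "r4": 2}, speeds=[1.0, 1.0]))
# ({'r1': 0, 'r2': 1, 'r3': 1, 'r4': 0}, 6.0)
```

With heterogeneous speeds, the same rule naturally sends more work to faster servers; a real system would also weigh the robot-to-edge transfer time that the abstract identifies as the bottleneck of cloud offloading.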
Simultaneous Location of Rail Vehicles and Mapping of Environment with Multiple LiDARs
Authors: Yusheng Wang, Weiwei Song, Yidong Lou, Fei Huang, Zhiyong Tu, Shimin Zhang
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2112.13224
Pdf link: https://arxiv.org/pdf/2112.13224
Abstract Precise and real-time rail vehicle localization as well as railway environment monitoring is crucial for railroad safety. In this letter, we propose a multi-LiDAR based simultaneous localization and mapping (SLAM) system for railway applications. Our approach starts with measurement preprocessing to denoise and synchronize multiple LiDAR inputs. Different frame-to-frame registration methods are used according to the LiDAR placement. In addition, we leverage plane constraints from extracted rail tracks to improve system accuracy. The local map is further aligned with the global map using absolute position measurements. Considering unavoidable metal abrasion and screw loosening, online extrinsic refinement is triggered for long-duration operation. The proposed method is extensively verified on datasets gathered over 3000 km. The results demonstrate that the proposed system achieves accurate and robust localization together with effective mapping for large-scale environments. Our system has already been applied to a freight traffic railroad for monitoring tasks.
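A plane constraint from extracted rail tracks can be sketched as a least-squares plane fit plus point-to-plane residuals. The sketch assumes the track points are already segmented and expressed in one frame; how the paper extracts tracks and adds the constraint to its optimizer is not shown.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through (N, 3) points: unit normal n and offset d with n.x + d = 0."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    n = vt[-1]  # direction of least variance = plane normal
    return n, -n @ centroid

def point_to_plane_residuals(points, n, d):
    """Signed distances of points to the plane; a SLAM back end would minimize these."""
    return points @ n + d

rng = np.random.default_rng(2)
track = np.column_stack([rng.uniform(0, 10, 20), rng.uniform(-1, 1, 20), np.full(20, 0.5)])
n, d = fit_plane(track)
print(np.abs(point_to_plane_residuals(track + [0, 0, 0.1], n, d)))  # all approximately 0.1
```

Because rail tracks are long and nearly planar, such residuals mainly constrain roll, pitch, and height, which complements frame-to-frame LiDAR registration.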
UV-SLAM: Unconstrained Line-based SLAM Using Vanishing Points for Structural Mapping
Authors: Hyunjun Lim, Jinwoo Jeon, Hyun Myung
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2112.13515
Pdf link: https://arxiv.org/pdf/2112.13515
Abstract In feature-based simultaneous localization and mapping (SLAM), line features complement the sparsity of point features, making it possible to map the structure of the surrounding environment. Existing approaches utilizing line features have primarily employed a measurement model based on line re-projection. However, the direction vectors used in the 3D line mapping process cannot be corrected, because the line measurement model employs only the lines' normal vectors in Plücker coordinates. As a result, problems such as degeneracy that occur during 3D line mapping cannot be solved. To tackle this problem, this paper presents UV-SLAM, an unconstrained line-based SLAM system that uses vanishing points for structural mapping. This paper focuses on exploiting structural regularities without any constraints such as the Manhattan world assumption. For this, we use the vanishing points that can be obtained from line features. The difference between the vanishing point observation computed from line features in the image and the vanishing point estimate computed from the direction vector is defined as a residual and added to the cost function of optimization-based SLAM. Furthermore, through a rank analysis of the Fisher information matrix, we prove that vanishing point measurements guarantee a unique mapping solution. Finally, we demonstrate on public datasets that localization accuracy and mapping quality improve compared to state-of-the-art algorithms.

zhuhu00 / Paper-Daily-Notice

New submissions for Tue, 28 Dec 21 #69

Keyword: SLAM

3D Point Cloud Reconstruction and SLAM as an Input

Edge Robotics: Edge-Computing-Accelerated Multi-Robot Simultaneous Localization and Mapping

Simultaneous Location of Rail Vehicles and Mapping of Environment with Multiple LiDARs

UV-SLAM: Unconstrained Line-based SLAM Using Vanishing Points for Structural Mapping

M2DGR: A Multi-sensor and Multi-scenario SLAM Dataset for Ground Robots

Keyword: Visual inertial

Keyword: livox

Keyword: loam

Keyword: Visual inertial odometry

Keyword: lidar

Doppler velocity-based algorithm for Clustering and Velocity Estimation of moving objects

Simultaneous Location of Rail Vehicles and Mapping of Environment with Multiple LiDARs

M2DGR: A Multi-sensor and Multi-scenario SLAM Dataset for Ground Robots

Keyword: loop detection

Keyword: autonomous driving

Multi-Camera Sensor Fusion for Visual Odometry using Deep Uncertainty Estimation

Doppler velocity-based algorithm for Clustering and Velocity Estimation of moving objects

A Survey on Interpretable Reinforcement Learning

Adversarial Attack for Asynchronous Event-based Data

An Empirical Study of Adder Neural Networks for Object Detection

Keyword: mapping

One-to-One or One-to-many? What function inlining brings to binary2source similarity analysis

Continuous Spectral Reconstruction from RGB Images via Implicit Neural Representation

Edge Robotics: Edge-Computing-Accelerated Multi-Robot Simultaneous Localization and Mapping

Simultaneous Location of Rail Vehicles and Mapping of Environment with Multiple LiDARs

Deeper Clinical Document Understanding Using Relation Extraction

Miti-DETR: Object Detection based on Transformers with Mitigatory Self-Attention Convergence

A Compact Neural Network-based Algorithm for Robust Image Watermarking

UV-SLAM: Unconstrained Line-based SLAM Using Vanishing Points for Structural Mapping

Sparsest Univariate Learning Models Under Lipschitz Constraint

Semantic Characterizations of General Belief Base Revision

Weakly Supervised Visual-Auditory Saliency Detection with Multigranularity Perception

Keyword: localization

Recurrent Neural Networks (RNNs) with dimensionality reduction and break down in computational mechanics; application to multi-scale localization step

DeepMTL Pro: Deep Learning Based Multiple Transmitter Localization and Power Estimation

Edge Robotics: Edge-Computing-Accelerated Multi-Robot Simultaneous Localization and Mapping

Simultaneous Location of Rail Vehicles and Mapping of Environment with Multiple LiDARs

UV-SLAM: Unconstrained Line-based SLAM Using Vanishing Points for Structural Mapping