Abstract
Modern autonomous vehicles (AVs) often rely on vision, LIDAR, and even radar-based simultaneous localization and mapping (SLAM) frameworks for precise localization and navigation. However, modern SLAM frameworks often lead to unacceptably high levels of drift (i.e., localization error) when AVs observe few visually distinct features or encounter occlusions due to dynamic obstacles. This paper argues that minimizing drift must be a key desideratum in AV motion planning, which requires an AV to make active control decisions to move towards feature-rich regions while also minimizing conventional control cost. To do so, we first introduce a novel data-driven perception module that observes LIDAR point clouds and estimates which features/regions an AV must navigate towards for drift minimization. Then, we introduce an interpretable model predictive controller (MPC) that moves an AV toward such feature-rich regions while avoiding visual occlusions and gracefully trading off drift and control cost. Our experiments on challenging, dynamic scenarios in the state-of-the-art CARLA simulator indicate our method reduces drift by up to 76.76% compared to benchmark approaches.
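To make the drift/control trade-off concrete, here is a minimal sketch of the kind of objective such a planner optimizes. It is not the paper's MPC: the candidate trajectories, the `drift_score` stand-in for the learned perception module, and the weight `lam` are all illustrative assumptions.

```python
import numpy as np

def plan_with_drift_tradeoff(candidates, drift_score, lam=0.5):
    """Pick the candidate trajectory minimizing control cost + lam * drift.

    candidates : list of (T, 2) arrays of planar waypoints.
    drift_score: callable mapping a trajectory to a scalar drift proxy
                 (a stand-in for the paper's learned perception module).
    """
    best, best_cost = None, np.inf
    for traj in candidates:
        # Conventional control effort: sum of squared waypoint increments.
        control = np.sum(np.diff(traj, axis=0) ** 2)
        cost = control + lam * drift_score(traj)
        if cost < best_cost:
            best, best_cost = traj, cost
    return best, best_cost

# Toy usage: drift shrinks as the trajectory nears a feature-rich region at x = 5.
feature_x = 5.0
drift = lambda traj: float(np.mean((traj[:, 0] - feature_x) ** 2))
cands = [np.column_stack([np.linspace(0, x_end, 10), np.zeros(10)])
         for x_end in (2.0, 5.0, 8.0)]
traj, cost = plan_with_drift_tradeoff(cands, drift)
print(traj[-1], cost)   # the planner steers toward the feature-rich region
```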
Keyword: Visual inertial
There is no result
Keyword: livox
There is no result
Keyword: loam
There is no result
Keyword: Visual inertial odometry
There is no result
Keyword: lidar
PillarGrid: Deep Learning-based Cooperative Perception for 3D Object Detection from Onboard-Roadside LiDAR
Authors: Zhengwei Bai, Guoyuan Wu, Matthew J. Barth, Yongkang Liu, Akin Sisbot, Kentaro Oguchi
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
3D object detection plays a fundamental role in enabling autonomous driving, which is widely regarded as key to relieving the bottlenecks of contemporary transportation systems in terms of safety, mobility, and sustainability. Most state-of-the-art (SOTA) object detection methods for point clouds are built on a single onboard LiDAR, whose performance is inevitably limited by range and occlusion, especially in dense traffic scenarios. In this paper, we propose PillarGrid, a novel cooperative perception method that fuses information from multiple 3D LiDARs (both onboard and roadside) to enhance situational awareness for connected and automated vehicles (CAVs). PillarGrid consists of four main phases: 1) cooperative preprocessing of point clouds, 2) pillar-wise voxelization and feature extraction, 3) grid-wise deep fusion of features from multiple sensors, and 4) convolutional neural network (CNN)-based augmented 3D object detection. A novel cooperative perception platform is developed for model training and testing. Extensive experimentation shows that PillarGrid outperforms SOTA single-LiDAR-based 3D object detection methods in both accuracy and range by a large margin.
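As a concrete illustration of phases 2 and 3, here is a minimal numpy sketch of pillar-wise voxelization with a simple pooled feature. The grid size, cell size, and max-height feature are illustrative assumptions, not the paper's learned encoder.

```python
import numpy as np

def pillarize(points, grid=(8, 8), cell=1.0):
    """Scatter (N, 3) points into x-y pillars and pool a simple feature.

    Returns a (grid_x, grid_y) map holding the max height per pillar,
    a stand-in for the learned per-pillar features fused grid-wise in PillarGrid.
    """
    gx, gy = grid
    feat = np.zeros((gx, gy))
    ix = np.clip((points[:, 0] / cell).astype(int), 0, gx - 1)
    iy = np.clip((points[:, 1] / cell).astype(int), 0, gy - 1)
    for i, j, z in zip(ix, iy, points[:, 2]):
        feat[i, j] = max(feat[i, j], z)   # max-pool height within each pillar
    return feat

pts = np.random.rand(100, 3) * [8, 8, 2]
print(pillarize(pts).shape)   # (8, 8) pillar feature map
```

Grid-wise fusion of two sensors can then be as simple as an element-wise combination of aligned feature maps, e.g. `np.maximum(feat_onboard, feat_roadside)`, whereas PillarGrid learns this fusion with a deep network.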
Implicit LiDAR Network: LiDAR Super-Resolution via Interpolation Weight Prediction
Abstract
Super-resolution of LiDAR range images is crucial to improving many downstream tasks such as object detection, recognition, and tracking. While deep learning has made remarkable advances in super-resolution techniques, typical convolutional architectures limit upscaling factors to the specific output resolutions seen in training. Recent work has shown that a continuous representation of an image and learning its implicit function enable almost limitless upscaling. However, that approach, which predicts values (depths) for neighboring input pixels and then linearly interpolates them, is a poor fit for LiDAR range images: it does not fill in unmeasured detail but instead creates a new image by regression in a high-dimensional space. In addition, linear interpolation blurs the sharp edges that provide important boundary information for objects in 3D points. To handle these problems, we propose a novel network, Implicit LiDAR Network (ILN), which learns not per-pixel values but interpolation weights, so that super-resolution is performed by blending the input pixel depths with non-linear weights. These weights can also be viewed as attention from the query to the neighboring pixels, so an attention module from the recent Transformer architecture can be leveraged. Our experiments with a novel large-scale synthetic dataset demonstrate that the proposed network reconstructs more accurately than state-of-the-art methods while achieving much faster convergence in training.
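The key idea, predicting blending weights instead of depths, fits in a few lines. The softmax weighting and toy logits below are assumptions for illustration, not ILN's architecture:

```python
import numpy as np

def blend_depths(neighbor_depths, logits):
    """ILN-style blending: the output depth is a convex combination of the
    input neighbor depths, with predicted (non-linear) weights.

    neighbor_depths : (N, 4) depths of the 4 nearest input pixels per query.
    logits          : (N, 4) scores a network would predict per neighbor.
    """
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)           # softmax: weights sum to 1
    return (w * neighbor_depths).sum(axis=1)    # blended, stays within input depths

d = np.array([[2.0, 2.1, 9.0, 9.1]])           # depth edge between 2 m and 9 m
sharp = blend_depths(d, np.array([[5.0, 5.0, -5.0, -5.0]]))
blurry = d.mean(axis=1)                         # plain linear interpolation
print(sharp, blurry)                            # ~2.05 vs 5.55: the edge survives
```

Because the output is a convex combination of input depths, a depth discontinuity is preserved whenever the predicted weights concentrate on one side of the edge.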
CVFNet: Real-time 3D Object Detection by Learning Cross View Features
Abstract
In recent years, 3D object detection from LiDAR point clouds has made great progress thanks to the development of deep learning technologies. Although voxel- or point-based methods are popular in 3D object detection, they usually involve time-consuming operations such as 3D convolutions on voxels or ball queries among points, making the resulting networks inappropriate for time-critical applications. On the other hand, 2D view-based methods offer high computing efficiency but usually obtain inferior performance to voxel- or point-based methods. In this work, we present a real-time, view-based, single-stage 3D object detector, namely CVFNet, to fulfill this task. To strengthen cross-view feature learning under demanding efficiency constraints, our framework extracts the features of different views and fuses them in an efficient progressive way. We first propose a novel Point-Range feature fusion module that deeply integrates point and range view features in multiple stages. Then, a special Slice Pillar is designed to preserve the 3D geometry when transforming the obtained deep point-view features into bird's eye view. To better balance the ratio of samples, a sparse pillar detection head is presented to focus the detection on non-empty grids. We conduct experiments on the popular KITTI and NuScenes benchmarks, and state-of-the-art performance is achieved in terms of both accuracy and speed.
A Single Correspondence Is Enough: Robust Global Registration to Avoid Degeneracy in Urban Environments
Abstract
Global registration using 3D point clouds is a crucial technology for mobile platforms to achieve localization or manage loop-closing situations. In recent years, numerous researchers have proposed global registration methods that handle a large number of outlier correspondences. Unfortunately, the degeneracy problem, in which the number of estimated inliers falls below three, remains potentially unavoidable. To tackle this problem, a degeneracy-robust, decoupling-based global registration method, called Quatro, is proposed. In particular, our method employs quasi-SO(3) estimation, leveraging the Atlanta world assumption in urban environments to avoid degeneracy in rotation estimation. Thus, the minimum degree of freedom (DoF) of our method is reduced from three to one. As verified on indoor and outdoor 3D LiDAR datasets, our proposed method yields robust global registration performance compared with other global registration methods, even for distant point cloud pairs. Furthermore, the experimental results confirm the applicability of our method as a coarse alignment. Our code is available at https://github.com/url-kaist/quatro.
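The 1-DoF rotation idea can be sketched as follows: if roll and pitch are fixed (e.g., from gravity under the Atlanta-world assumption), yaw has a closed-form estimate from the 2-D projections of the correspondences. This is a simplified stand-in for Quatro, without its outlier handling:

```python
import numpy as np

def yaw_only_register(src, dst):
    """Estimate a yaw-only rotation + translation aligning src to dst.

    Sketch of quasi-SO(3) estimation: only yaw is estimated, in closed
    form from centered 2-D projections of the (N, 3) correspondences.
    """
    sc, dc = src.mean(0), dst.mean(0)
    a, b = src - sc, dst - dc
    num = np.sum(a[:, 0] * b[:, 1] - a[:, 1] * b[:, 0])   # sum of 2-D cross products
    den = np.sum(a[:, 0] * b[:, 0] + a[:, 1] * b[:, 1])   # sum of 2-D dot products
    th = np.arctan2(num, den)
    R = np.array([[np.cos(th), -np.sin(th), 0],
                  [np.sin(th),  np.cos(th), 0],
                  [0, 0, 1]])
    t = dc - R @ sc
    return R, t

src = np.random.rand(50, 3)
th = 0.7
R_true = np.array([[np.cos(th), -np.sin(th), 0],
                   [np.sin(th),  np.cos(th), 0],
                   [0, 0, 1]])
dst = src @ R_true.T + [1.0, -2.0, 0.3]
R, t = yaw_only_register(src, dst)
print(np.allclose(R, R_true), np.round(t, 3))   # True [ 1. -2.  0.3]
```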
Drift Reduced Navigation with Deep Explainable Features
Authors: Mohd Omama, Sundar Sripada Venugopalaswamy Sriraman, Sandeep Chinchali, Arun Kumar Singh, K. Madhava Krishna
Abstract
Modern autonomous vehicles (AVs) often rely on vision, LIDAR, and even radar-based simultaneous localization and mapping (SLAM) frameworks for precise localization and navigation. However, modern SLAM frameworks often lead to unacceptably high levels of drift (i.e., localization error) when AVs observe few visually distinct features or encounter occlusions due to dynamic obstacles. This paper argues that minimizing drift must be a key desideratum in AV motion planning, which requires an AV to make active control decisions to move towards feature-rich regions while also minimizing conventional control cost. To do so, we first introduce a novel data-driven perception module that observes LIDAR point clouds and estimates which features/regions an AV must navigate towards for drift minimization. Then, we introduce an interpretable model predictive controller (MPC) that moves an AV toward such feature-rich regions while avoiding visual occlusions and gracefully trading off drift and control cost. Our experiments on challenging, dynamic scenarios in the state-of-the-art CARLA simulator indicate our method reduces drift by up to 76.76% compared to benchmark approaches.
LiDAR-based 4D Panoptic Segmentation via Dynamic Shifting Network
Abstract
With the rapid advance of autonomous driving, it becomes critical to equip its sensing system with more holistic 3D perception. However, existing works focus on parsing either the objects (e.g., cars and pedestrians) or the scene (e.g., trees and buildings) from the LiDAR sensor. In this work, we address the task of LiDAR-based panoptic segmentation, which aims to parse both objects and scenes in a unified manner. As one of the first endeavors towards this new and challenging task, we propose the Dynamic Shifting Network (DS-Net), which serves as an effective panoptic segmentation framework in the point cloud realm. In particular, DS-Net has three appealing properties: 1) A strong backbone design. DS-Net adopts the cylinder convolution specifically designed for LiDAR point clouds. 2) Dynamic shifting for complex point distributions. We observe that commonly used clustering algorithms are incapable of handling complex autonomous driving scenes with non-uniform point cloud distributions and varying instance sizes. Thus, we present an efficient learnable clustering module, dynamic shifting, which adapts kernel functions on the fly for different instances. 3) Extension to 4D prediction. Furthermore, we extend DS-Net to 4D panoptic LiDAR segmentation via temporally unified instance clustering on aligned LiDAR frames. To comprehensively evaluate the performance of LiDAR-based panoptic segmentation, we construct and curate benchmarks from two large-scale autonomous driving LiDAR datasets, SemanticKITTI and nuScenes. Extensive experiments demonstrate that our proposed DS-Net achieves superior accuracy over current state-of-the-art methods on both tasks. Notably, in the single-frame version of the task, we outperform the SOTA method by 1.8% in terms of the PQ metric. In the 4D version of the task, we surpass second place by 5.4% in terms of the LSTQ metric.
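A rough sketch of the dynamic-shifting intuition: mean-shift clustering in which each point carries its own kernel bandwidth, which DS-Net would regress per point rather than fixing globally. Everything below (kernel choice, bandwidths, iteration count) is an illustrative assumption:

```python
import numpy as np

def dynamic_shift(points, bandwidths, iters=5):
    """Mean shift with per-point bandwidths: each point is moved toward the
    weighted mean of its neighbours under a Gaussian kernel whose width
    varies per point (a stand-in for the learned kernels in DS-Net).
    """
    x = points.copy()
    for _ in range(iters):
        d2 = ((x[:, None, :] - points[None, :, :]) ** 2).sum(-1)
        w = np.exp(-d2 / (2 * bandwidths[:, None] ** 2))
        x = (w @ points) / w.sum(1, keepdims=True)   # shift to weighted mean
    return x   # points of one instance collapse to a common centre

pts = np.vstack([np.random.randn(30, 2) * 0.2,
                 np.random.randn(30, 2) * 0.2 + 5])   # two toy instances
bw = np.full(len(pts), 0.5)   # per-point bandwidths; a network would predict these
print(np.round(dynamic_shift(pts, bw)[[0, -1]], 2))   # ~[0, 0] and ~[5, 5]
```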
Keyword: loop detection
There is no result
Keyword: autonomous driving
PillarGrid: Deep Learning-based Cooperative Perception for 3D Object Detection from Onboard-Roadside LiDAR
Authors: Zhengwei Bai, Guoyuan Wu, Matthew J. Barth, Yongkang Liu, Akin Sisbot, Kentaro Oguchi
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
3D object detection plays a fundamental role in enabling autonomous driving, which is widely regarded as key to relieving the bottlenecks of contemporary transportation systems in terms of safety, mobility, and sustainability. Most state-of-the-art (SOTA) object detection methods for point clouds are built on a single onboard LiDAR, whose performance is inevitably limited by range and occlusion, especially in dense traffic scenarios. In this paper, we propose PillarGrid, a novel cooperative perception method that fuses information from multiple 3D LiDARs (both onboard and roadside) to enhance situational awareness for connected and automated vehicles (CAVs). PillarGrid consists of four main phases: 1) cooperative preprocessing of point clouds, 2) pillar-wise voxelization and feature extraction, 3) grid-wise deep fusion of features from multiple sensors, and 4) convolutional neural network (CNN)-based augmented 3D object detection. A novel cooperative perception platform is developed for model training and testing. Extensive experimentation shows that PillarGrid outperforms SOTA single-LiDAR-based 3D object detection methods in both accuracy and range by a large margin.
Characterizing and Understanding Software Security Vulnerabilities in Machine Learning Libraries
Authors: Nima Shiri Harzevili, Jiho Shin, Junjie Wang, Song Wang
Abstract
The use of machine learning (ML) libraries has increased tremendously across many domains, including autonomous driving systems, medicine, and other critical industries. Vulnerabilities in such libraries can have irreparable consequences. However, the characteristics of their software security vulnerabilities have not been well studied. In this paper, to bridge this gap, we take the first step towards characterizing and understanding the security vulnerabilities of five well-known ML libraries: TensorFlow, PyTorch, Scikit-learn, Pandas, and NumPy. To do so, we collected 596 security-related commits in total to explore five major factors: 1) vulnerability types, 2) root causes, 3) symptoms, 4) fixing patterns, and 5) fixing efforts of security vulnerabilities in ML libraries. The findings of this study can help developers better understand software security vulnerabilities across different ML libraries and gain better insight into their weaknesses. To make our findings actionable, we further developed DeepMut, an automated mutation testing tool, as a proof-of-concept application of our findings. DeepMut is designed to assess the adequacy of existing test suites of ML libraries against security-aware mutation operators extracted from the vulnerabilities studied in this work. We applied DeepMut to the TensorFlow kernel module and found more than 1k surviving mutants not considered by the existing test suites. The results demonstrate the usefulness of our findings.
Contrastive Learning for Automotive mmWave Radar Detection Points Based Instance Segmentation
Abstract
Automotive mmWave radar plays a key role in advanced driver assistance systems (ADAS) and autonomous driving. Deep learning-based instance segmentation enables real-time object identification from radar detection points. In the conventional training process, accurate annotation is key, but high-quality annotations of radar detection points are challenging to obtain due to their ambiguity and sparsity. To address this issue, we propose a contrastive learning approach to radar detection point-based instance segmentation. We define positive and negative samples according to the ground-truth labels, first train the model with the contrastive loss, and then train it for the downstream task. In addition, these two steps can be merged into one, and pseudo-labels can be generated for the unlabeled data to further improve performance. Thus, there are four different training settings for our method. Experiments show that even when ground-truth information is available for only 5% of the training data, our method still achieves performance comparable to the approach trained in a supervised manner with 100% ground-truth information.
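For reference, a plain supervised-contrastive loss of the kind described, with instance labels defining positives, can be written as follows. The temperature and embedding shapes are illustrative, and the paper's exact loss may differ:

```python
import numpy as np

def supervised_contrastive_loss(z, labels, tau=0.1):
    """Supervised contrastive loss over embedded radar points: pairs that
    share a ground-truth instance label are positives, all other pairs
    negatives. A minimal stand-in for the paper's pretraining step.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)                # exclude self-pairs
    logp = sim - np.log(np.exp(sim).sum(1, keepdims=True))   # log-softmax rows
    pos = (labels[:, None] == labels[None, :]) & ~np.eye(len(z), dtype=bool)
    return -np.mean(logp[pos])                    # pull positives together

z = np.random.randn(16, 8)                        # toy point embeddings
labels = np.repeat(np.arange(4), 4)               # 4 instances, 4 points each
print(supervised_contrastive_loss(z, labels))
```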
LiDAR-based 4D Panoptic Segmentation via Dynamic Shifting Network
Abstract
With the rapid advance of autonomous driving, it becomes critical to equip its sensing system with more holistic 3D perception. However, existing works focus on parsing either the objects (e.g., cars and pedestrians) or the scene (e.g., trees and buildings) from the LiDAR sensor. In this work, we address the task of LiDAR-based panoptic segmentation, which aims to parse both objects and scenes in a unified manner. As one of the first endeavors towards this new and challenging task, we propose the Dynamic Shifting Network (DS-Net), which serves as an effective panoptic segmentation framework in the point cloud realm. In particular, DS-Net has three appealing properties: 1) A strong backbone design. DS-Net adopts the cylinder convolution specifically designed for LiDAR point clouds. 2) Dynamic shifting for complex point distributions. We observe that commonly used clustering algorithms are incapable of handling complex autonomous driving scenes with non-uniform point cloud distributions and varying instance sizes. Thus, we present an efficient learnable clustering module, dynamic shifting, which adapts kernel functions on the fly for different instances. 3) Extension to 4D prediction. Furthermore, we extend DS-Net to 4D panoptic LiDAR segmentation via temporally unified instance clustering on aligned LiDAR frames. To comprehensively evaluate the performance of LiDAR-based panoptic segmentation, we construct and curate benchmarks from two large-scale autonomous driving LiDAR datasets, SemanticKITTI and nuScenes. Extensive experiments demonstrate that our proposed DS-Net achieves superior accuracy over current state-of-the-art methods on both tasks. Notably, in the single-frame version of the task, we outperform the SOTA method by 1.8% in terms of the PQ metric. In the 4D version of the task, we surpass second place by 5.4% in terms of the LSTQ metric.
Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation
Authors: Pier Giuseppe Sessa, Maryam Kamgarpour, Andreas Krause
Subjects: Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Abstract
We consider model-based multi-agent reinforcement learning, where the environment transition model is unknown and can only be learned via expensive interactions with the environment. We propose H-MARL (Hallucinated Multi-Agent Reinforcement Learning), a novel sample-efficient algorithm that can efficiently balance exploration, i.e., learning about the environment, and exploitation, i.e., achieving good equilibrium performance in the underlying general-sum Markov game. H-MARL builds high-probability confidence intervals around the unknown transition model and sequentially updates them based on newly observed data. Using these, it constructs an optimistic hallucinated game for the agents, for which equilibrium policies are computed at each round. We consider general statistical models (e.g., Gaussian processes, deep ensembles, etc.) and policy classes (e.g., deep neural networks), and theoretically analyze our approach by bounding the agents' dynamic regret. Moreover, we provide a convergence rate to the equilibria of the underlying Markov game. We demonstrate our approach experimentally on an autonomous driving simulation benchmark. H-MARL learns successful equilibrium policies after a few interactions with the environment and can significantly improve performance compared to non-exploratory methods.
Keyword: mapping
Bit-Metric Decoding Rate in Multi-User MIMO Systems: Theory
Authors: Pavan Koteshwar Srinath, Jakob Hoydis
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG)
Abstract
Link adaptation (LA) is one of the most important aspects of wireless communications, where the modulation and coding scheme (MCS) used by the transmitter is adapted to the channel conditions in order to meet a certain target error rate. In a single-user SISO (SU-SISO) system, LA is performed by computing the post-equalization signal-to-interference-noise ratio (SINR) at the receiver. The same technique can be employed in multi-user MIMO (MU-MIMO) receivers that use linear detectors. Another important use of post-equalization SINR is physical layer (PHY) abstraction, where several PHY blocks like the channel encoder, the detector, and the channel decoder are replaced by an abstraction model in order to speed up system-level simulations. This is achieved by mapping the post-equalization SINR to a codeword error rate (CER) or a block error rate (BLER). However, for MU-MIMO systems with non-linear receivers, like those that use variants of the sphere-decoder algorithm, there is no known equivalent of post-equalization SINR, which makes both LA and PHY abstraction extremely challenging. This important issue is addressed in this two-part paper. This part presents a metric called the bit-metric decoding rate (BMDR) of a detector for a set of channel realizations; BMDR is the proposed equivalent of post-equalization SINR for arbitrary detectors. Since BMDR does not have a closed-form expression that would enable its instantaneous calculation, a machine-learning approach to predict it is presented. The second part describes the algorithms for performing LA, detector selection, and PHY abstraction using BMDR for MU-MIMO systems with arbitrary detectors. Extensive simulation results corroborating the claims are presented.
G2GML: Graph to Graph Mapping Language for Bridging RDF and Property Graphs
Abstract
How can we maximize the value of accumulated RDF data? While RDF data can be queried using the SPARQL language, even SPARQL-based operations are limited when implementing traversal or analytical algorithms. Recently, a variety of database implementations dedicated to analyses on the property graph (PG) model have emerged. Importing RDF datasets into these graph analysis engines provides access to the accumulated datasets through various application interfaces. However, the RDF model and the PG model are not interoperable. Here, we developed a framework based on the Graph to Graph Mapping Language (G2GML) for mapping RDF graphs to PGs to make the most of accumulated RDF data. Using this framework, accumulated graph data described in the RDF model can be converted to the PG model, which can then be loaded into graph database engines for further analysis. To support different graph database implementations, we redefined the PG model and proposed exchangeable serialization formats for it. We demonstrate several use cases in which publicly available RDF data are extracted and converted to PGs. This study bridges RDF and PGs and contributes to the interoperable management of knowledge graphs, thereby expanding the use cases of accumulated RDF data.
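The underlying mapping idea can be illustrated without the actual G2GML syntax: literal-valued triples become node properties, and resource-valued triples become edges. This toy converter is an assumption-laden sketch, not the G2GML framework:

```python
# Minimal RDF-to-property-graph sketch (illustrative, not G2GML syntax):
# literal objects -> node properties; resource objects -> edges.
triples = [
    ("alice", "name", '"Alice"'),   # literal object  -> property on node
    ("alice", "knows", "bob"),      # resource object -> edge between nodes
    ("bob",   "name", '"Bob"'),
]

nodes, edges = {}, []
for s, p, o in triples:
    nodes.setdefault(s, {})
    if o.startswith('"'):           # crude literal test, enough for the sketch
        nodes[s][p] = o.strip('"')
    else:
        nodes.setdefault(o, {})
        edges.append((s, p, o))

print(nodes)   # {'alice': {'name': 'Alice'}, 'bob': {'name': 'Bob'}}
print(edges)   # [('alice', 'knows', 'bob')]
```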
A Mixed Quantization Network for Computationally Efficient Mobile Inverse Tone Mapping
Authors: Juan Borrego-Carazo, Mete Ozay, Frederik Laboyrie, Paul Wisbey
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Recovering a high dynamic range (HDR) image from a single low dynamic range (LDR) image, namely inverse tone mapping (ITM), is challenging due to the lack of information in over- and under-exposed regions. Current methods focus exclusively on training high-performing but computationally inefficient ITM models, which hinders deployment of ITM models in resource-constrained environments with limited computing power, such as edge and mobile device applications. To this end, we propose combining efficient deep neural network operations with a novel mixed quantization scheme to construct a well-performing yet computationally efficient mixed quantization network (MQN) that can perform single-image ITM on mobile platforms. In ablation studies, we explore the effect of different attention mechanisms, quantization schemes, and loss functions on the performance of MQN in ITM tasks. In comparative analyses, ITM models trained using MQN perform on par with state-of-the-art methods on benchmark datasets. MQN models provide up to a 10x improvement in latency and a 25x improvement in memory consumption.
Secret-to-Image Reversible Transformation for Generative Steganography
Authors: Zhili Zhou, Yuecheng Su, Q. M. Jonathan Wu, Zhangjie Fu, Yunqing Shi
Abstract
Recently, generative steganography, which transforms secret information into a generated image, has been a promising technique for resisting steganalysis detection. However, due to the inefficiency and irreversibility of the secret-to-image transformation, it is hard to find a good trade-off between information hiding capacity and extraction accuracy. To address this issue, we propose a secret-to-image reversible transformation (S2IRT) scheme for generative steganography. The proposed S2IRT scheme is based on a generative model, i.e., the Glow model, which enables a bijective mapping between a latent space with a multivariate Gaussian distribution and an image space with a complex distribution. In the process of S2I transformation, guided by a given secret message, we construct a latent vector and then map it to a generated image with the Glow model, so that the secret message is finally transformed into the generated image. Owing to the efficiency and reversibility of the S2IRT scheme, the proposed steganographic approach achieves both high hiding capacity and accurate extraction of the secret message from the generated image. Furthermore, a separate encoding-based S2IRT (SE-S2IRT) scheme is also proposed to improve robustness to common image attacks. The experiments demonstrate that the proposed steganographic approaches achieve high hiding capacity (up to 4 bpp) and accurate information extraction (almost 100% accuracy) simultaneously, while maintaining desirable anti-detectability and imperceptibility.
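A heavily simplified sketch of the reversible secret-to-latent idea: map k-bit groups to the centers of equiprobable quantile bins of a standard Gaussian, so extraction is exact. The bin construction is an illustrative assumption; the paper's S2IRT construction and the Glow decoder are not reproduced here (requires scipy):

```python
import numpy as np
from scipy.stats import norm

def bits_to_latent(bits, k=2):
    """Reversibly map k-bit groups to Gaussian latent values: each group
    selects one of 2**k equiprobable quantile bins and takes its centre.
    """
    groups = bits.reshape(-1, k)
    idx = groups @ (1 << np.arange(k)[::-1])      # k bits -> bin index
    return norm.ppf((idx + 0.5) / 2 ** k)         # bin-centre quantiles

def latent_to_bits(z, k=2):
    idx = np.clip((norm.cdf(z) * 2 ** k).astype(int), 0, 2 ** k - 1)
    return ((idx[:, None] >> np.arange(k)[::-1]) & 1).reshape(-1)

msg = np.random.randint(0, 2, 16)
z = bits_to_latent(msg)   # z would be fed to a Glow-style decoder to render an image
print(np.array_equal(latent_to_bits(z), msg))   # True: exact extraction
```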
FlexBlock: A Flexible DNN Training Accelerator with Multi-Mode Block Floating Point Support
Authors: Seock-Hwan Noh, Jahyun Koo, Seunghyun Lee, Jongse Park, Jaeha Kung
Abstract
Training deep neural networks (DNNs) is a computationally expensive job, which can take weeks or months even with high-performance GPUs. As a remedy for this challenge, the community has started exploring the use of more efficient data representations in the training process, e.g., block floating point (BFP). However, prior work on BFP-based DNN accelerators relies on a specific BFP representation, making the accelerators less versatile. This paper builds upon an algorithmic observation that we can accelerate training by leveraging multiple BFP precisions without compromising the final accuracy. Backed by this algorithmic opportunity, we develop a flexible DNN training accelerator, dubbed FlexBlock, which supports three different BFP precision modes, possibly different among activation, weight, and gradient tensors. While several prior works proposed such multi-precision support for DNN accelerators, they not only focus solely on inference but also achieve suboptimal core utilization at a fixed precision and for specific layer types when training is considered. Instead, FlexBlock is designed so that high core utilization is achievable for i) various layer types and ii) three BFP precisions by mapping data hierarchically to its compute units. We evaluate the effectiveness of the FlexBlock architecture using well-known DNNs on the CIFAR, ImageNet, and WMT14 datasets. As a result, training on FlexBlock improves training speed by 1.5-5.3x and energy efficiency by 2.4-7.0x on average compared to other training accelerators, while incurring marginal accuracy loss compared to full-precision training.
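For readers unfamiliar with BFP, here is a minimal sketch of the representation FlexBlock switches between: each block of values shares one exponent and keeps a few mantissa bits per value. The block size and bit width below are illustrative:

```python
import numpy as np

def to_bfp(x, mantissa_bits=4, block=8):
    """Quantize a vector to block floating point: each block of `block`
    values shares one exponent, keeping `mantissa_bits` signed mantissa
    bits per value. Returns the dequantized (rounded) values.
    """
    x = x.reshape(-1, block)
    exp = np.ceil(np.log2(np.abs(x).max(axis=1, keepdims=True) + 1e-30))
    scale = 2.0 ** (exp - (mantissa_bits - 1))        # LSB weight per block
    mant = np.clip(np.round(x / scale),
                   -(2 ** (mantissa_bits - 1)),
                   2 ** (mantissa_bits - 1) - 1)      # signed mantissa range
    return (mant * scale).reshape(-1)

v = np.random.randn(16).astype(np.float32)
print(np.abs(v - to_bfp(v)).max())   # small per-block quantization error
```

Switching `mantissa_bits` per tensor (activations, weights, gradients) is exactly the kind of multi-mode precision the accelerator exploits.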
Semi-Discrete Normalizing Flows through Differentiable Tessellation
Authors: Ricky T. Q. Chen, Brandon Amos, Maximilian Nickel
Abstract
Mapping between discrete and continuous distributions is a difficult task, and many have had to resort to approximate or heuristic approaches. We propose a tessellation-based approach that directly learns quantization boundaries on a continuous space, complete with exact likelihood evaluations. This is done by constructing normalizing flows on convex polytopes parameterized through a differentiable Voronoi tessellation. Using a simple homeomorphism with an efficient log-determinant Jacobian, we can then cheaply parameterize distributions on convex polytopes. We explore this approach in two application settings, mapping from discrete to continuous and vice versa. First, a Voronoi dequantization allows automatically learning quantization boundaries in a multidimensional space. The locations of boundaries and distances between regions can encode useful structural relations between the quantized discrete values. Second, a Voronoi mixture model has constant computation cost for likelihood evaluation regardless of the number of mixture components. Empirically, we show improvements over existing methods across a range of structured data modalities, and find that we can achieve a significant gain from simply adding Voronoi mixtures to a baseline model.
Drift Reduced Navigation with Deep Explainable Features
Authors: Mohd Omama, Sundar Sripada Venugopalaswamy Sriraman, Sandeep Chinchali, Arun Kumar Singh, K. Madhava Krishna
Abstract
Modern autonomous vehicles (AVs) often rely on vision, LIDAR, and even radar-based simultaneous localization and mapping (SLAM) frameworks for precise localization and navigation. However, modern SLAM frameworks often lead to unacceptably high levels of drift (i.e., localization error) when AVs observe few visually distinct features or encounter occlusions due to dynamic obstacles. This paper argues that minimizing drift must be a key desideratum in AV motion planning, which requires an AV to make active control decisions to move towards feature-rich regions while also minimizing conventional control cost. To do so, we first introduce a novel data-driven perception module that observes LIDAR point clouds and estimates which features/regions an AV must navigate towards for drift minimization. Then, we introduce an interpretable model predictive controller (MPC) that moves an AV toward such feature-rich regions while avoiding visual occlusions and gracefully trading off drift and control cost. Our experiments on challenging, dynamic scenarios in the state-of-the-art CARLA simulator indicate our method reduces drift by up to 76.76% compared to benchmark approaches.
Non-Parametric Modeling of Spatio-Temporal Human Activity Based on Mobile Robot Observations
Abstract
This work presents a non-parametric spatio-temporal model for mapping human activity by mobile autonomous robots in a long-term context. Based on Variational Gaussian Process Regression, the model incorporates prior information of spatial and temporal-periodic dependencies to create a continuous representation of human occurrences. The inhomogeneous data distribution resulting from movements of the robot is included in the model via a heteroscedastic likelihood function and can be accounted for as predictive uncertainty. Using a sparse formulation, data sets over multiple weeks and several hundred square meters can be used for model creation. The experimental evaluation, based on multi-week data sets, demonstrates that the proposed approach outperforms the state of the art both in terms of predictive quality and subsequent path planning.
Blind2Unblind: Self-Supervised Image Denoising with Visible Blind Spots
Authors: Zejin Wang, Jiazheng Liu, Guoqing Li, Hua Han
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Real noisy-clean pairs are costly and difficult to obtain at scale. Meanwhile, supervised denoisers trained on synthetic data perform poorly in practice. Self-supervised denoisers, which learn only from single noisy images, solve the data collection problem. However, self-supervised denoising methods, especially blind-spot-driven ones, suffer sizable information loss through their input masking or network design. The absence of valuable information dramatically reduces the upper bound of denoising performance. In this paper, we propose a simple yet efficient approach called Blind2Unblind to overcome the information loss in blind-spot-driven denoising methods. First, we introduce a global-aware mask mapper that enables global perception and accelerates training. The mask mapper samples all pixels at blind spots on denoised volumes and maps them to the same channel, allowing the loss function to optimize all blind spots at once. Second, we propose a re-visible loss to train the denoising network and make blind spots visible. The denoiser can learn directly from raw noisy images without losing information or being trapped in an identity mapping. We also theoretically analyze the convergence of the re-visible loss. Extensive experiments on synthetic and real-world datasets demonstrate the superior performance of our approach compared to previous work. Code is available at https://github.com/demonsjin/Blind2Unblind.
MotionSC: Data Set and Network for Real-Time Semantic Mapping in Dynamic Environments
Authors: Joey Wilson, Jingyu Song, Yuewei Fu, Arthur Zhang, Andrew Capodieci, Paramsothy Jayakumar, Kira Barton, Maani Ghaffari
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Abstract
This work addresses a gap in semantic scene completion (SSC) data by creating a novel outdoor data set with accurate and complete dynamic scenes. Our data set is formed from randomly sampled views of the world at each time step, which supervises generalizability to complete scenes without occlusions or traces. We create SSC baselines from state-of-the-art open source networks and construct a benchmark real-time dense local semantic mapping algorithm, MotionSC, by leveraging recent 3D deep learning architectures to enhance SSC with temporal information. Our network shows that the proposed data set can quantify and supervise accurate scene completion in the presence of dynamic objects, which can lead to the development of improved dynamic mapping algorithms. All software is available at https://github.com/UMich-CURLY/3DMapping.
Conservative Filtering for Heterogeneous Decentralized Data Fusion in Dynamic Robotic Systems
Authors: Ofer Dagan, Nisar R. Ahmed
Subjects: Robotics (cs.RO); Multiagent Systems (cs.MA)
Abstract
This paper presents a method for Bayesian multi-robot peer-to-peer data fusion where any pair of autonomous robots holds non-identical but overlapping parts of a global joint probability distribution, representing real-world inference tasks (e.g., mapping, tracking). It is shown that in dynamic stochastic systems, filtering, which corresponds to marginalization of past variables, results in direct and hidden dependencies between variables not mutually monitored by the robots, which can lead to an overconfident fused estimate. The paper makes both theoretical and practical contributions by providing (i) a rigorous analysis of the origin of the dependencies and (ii) a conservative filtering algorithm for heterogeneous data fusion in dynamic systems that can be integrated with existing fusion algorithms. This work uses factor graphs as both an analysis tool and an inference engine. Each robot in the network maintains a local factor graph and communicates only relevant parts of it (a sub-graph) to its neighboring robot. We discuss the applicability to various multi-robot applications and demonstrate the performance using a multi-robot multi-target tracking simulation, showing that the proposed algorithm produces conservative estimates at each robot.
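A classic example of the conservative-fusion guarantee the paper targets is covariance intersection, which never understates uncertainty regardless of the unknown cross-correlation between the two estimates. This standard rule is shown only as background; it is not the paper's algorithm:

```python
import numpy as np

def covariance_intersection(xa, Pa, xb, Pb, w=0.5):
    """Fuse two estimates with unknown cross-correlation conservatively.

    The fused covariance P is guaranteed consistent for any hidden
    dependence between the inputs, for any weight w in (0, 1).
    """
    Pa_inv, Pb_inv = np.linalg.inv(Pa), np.linalg.inv(Pb)
    P = np.linalg.inv(w * Pa_inv + (1 - w) * Pb_inv)
    x = P @ (w * Pa_inv @ xa + (1 - w) * Pb_inv @ xb)
    return x, P

xa, Pa = np.array([1.0, 2.0]), np.diag([1.0, 4.0])   # robot A's estimate
xb, Pb = np.array([1.5, 1.0]), np.diag([4.0, 1.0])   # robot B's estimate
print(covariance_intersection(xa, Pa, xb, Pb))
```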
Orchestrated Value Mapping for Reinforcement Learning
Abstract
We present a general convergent class of reinforcement learning algorithms that is founded on two distinct principles: (1) mapping value estimates to a different space using arbitrary functions from a broad class, and (2) linearly decomposing the reward signal into multiple channels. The first principle enables incorporating specific properties into the value estimator that can enhance learning. The second principle, on the other hand, allows for the value function to be represented as a composition of multiple utility functions. This can be leveraged for various purposes, e.g. dealing with highly varying reward scales, incorporating a priori knowledge about the sources of reward, and ensemble learning. Combining the two principles yields a general blueprint for instantiating convergent algorithms by orchestrating diverse mapping functions over multiple reward channels. This blueprint generalizes and subsumes algorithms such as Q-Learning, Log Q-Learning, and Q-Decompose. In addition, our convergence proof for this general class relaxes certain required assumptions in some of these algorithms. Based on our theory, we discuss several interesting configurations as special cases. Finally, to illustrate the potential of the design space that our theory opens up, we instantiate a particular algorithm and evaluate its performance on the Atari suite.
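A single tabular update from this class (one reward channel) can be sketched as follows: value estimates live in a mapped space via f, and the Bellman target is formed in the original space via f's inverse. With f the identity this is ordinary Q-learning; with f = log it resembles Log Q-Learning. The step size and mapping below are illustrative:

```python
import numpy as np

def mapped_q_update(q_tilde, s, a, r, s_next, f, f_inv, alpha=0.1, gamma=0.99):
    # Bellman target in the original value space, then mapped through f.
    target = r + gamma * f_inv(q_tilde[s_next].max())
    q_tilde[s, a] += alpha * (f(target) - q_tilde[s, a])
    return q_tilde

f, f_inv = np.log, np.exp        # one admissible mapping (values must stay positive)
q = np.full((4, 2), f(1.0))      # initialize q_tilde = f(1.0) = 0.0
q = mapped_q_update(q, s=0, a=1, r=0.5, s_next=3, f=f, f_inv=f_inv)
print(q[0, 1])                   # ~0.04: the update happened in log space
```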
Relational Diagrams: a pattern-preserving diagrammatic representation of non-disjunctive Relational Queries
Authors: Wolfgang Gatterbauer, Cody Dunne, Mirek Riedewald
Subjects: Databases (cs.DB); Logic in Computer Science (cs.LO); Programming Languages (cs.PL)
Abstract
Analyzing relational languages by their logical expressiveness is well understood. Something not well understood, or even formalized, is the vague concept of relational query patterns. What are query patterns? And how can we reason about query patterns across different relational languages, irrespective of their syntax and their procedural or declarative nature? In this paper, we formalize the concept of query patterns with a variant of pattern-preserving mappings between the relational atoms of queries. This formalism allows us to analyze the relative pattern expressiveness of relational query languages and to create a hierarchy of languages with equal logical expressiveness yet different pattern expressiveness. In this analysis, relational calculus can express more patterns than the basic operators of relational algebra. We additionally contribute an intuitive, complete, and sound diagrammatic representation of safe relational calculus that is not only relationally complete, but can also express all logical patterns for the large and useful fragment of non-disjunctive relational calculus. Among all the diagrammatic representations for relational queries that we are aware of, this is the only one that is relationally complete and can represent all logical patterns in the non-disjunctive fragment.
Inverse Online Learning: Understanding Non-Stationary and Reactionary Policies
Authors: Alex J. Chan, Alicia Curth, Mihaela van der Schaar
Abstract
Human decision making is well known to be imperfect and the ability to analyse such processes individually is crucial when attempting to aid or improve a decision-maker's ability to perform a task, e.g. to alert them to potential biases or oversights on their part. To do so, it is necessary to develop interpretable representations of how agents make decisions and how this process changes over time as the agent learns online in reaction to the accrued experience. To then understand the decision-making processes underlying a set of observed trajectories, we cast the policy inference problem as the inverse to this online learning problem. By interpreting actions within a potential outcomes framework, we introduce a meaningful mapping based on agents choosing an action they believe to have the greatest treatment effect. We introduce a practical algorithm for retrospectively estimating such perceived effects, alongside the process through which agents update them, using a novel architecture built upon an expressive family of deep state-space models. Through application to the analysis of UNOS organ donation acceptance decisions, we demonstrate that our approach can bring valuable insights into the factors that govern decision processes and how they change over time.
Keyword: localization
Distributed Dual Quaternion Based Localization of Visual Sensor Networks
Authors: Luca Varotto, Marco Fabris, Giulia Michieletto, Angelo Cenedese
Abstract
In this paper we consider the localization problem for a visual sensor network. Inspired by the alternating attitude and position distributed optimization framework discussed in [1], we propose an estimation scheme that exploits the unit dual quaternion algebra to describe the sensor poses. This representation is beneficial in the formulation of the optimization scheme, allowing the localization problem to be solved without designing two interlaced position and orientation estimators, thus improving the estimation error distribution over the two pose components and the overall localization performance. Furthermore, numerical experiments confirm the robustness of the proposed algorithm w.r.t. the initial conditions.
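As background, a pose (rotation plus translation) packs into a unit dual quaternion q = q_r + eps * q_d with q_d = 0.5 * t * q_r, where t is the translation written as a pure quaternion. A minimal sketch of the representation (the estimation scheme itself is not reproduced):

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of quaternions given as (w, x, y, z)."""
    w1, x1, y1, z1 = p
    w2, x2, y2, z2 = q
    return np.array([
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    ])

def pose_to_dual_quaternion(q_rot, t):
    """Unit dual quaternion (q_r, q_d) encoding rotation q_rot and translation t."""
    q_t = np.array([0.0, *t])        # translation as a pure quaternion
    q_d = 0.5 * qmul(q_t, q_rot)
    return q_rot, q_d

q_rot = np.array([np.cos(0.25), 0.0, 0.0, np.sin(0.25)])   # yaw of 0.5 rad
q_r, q_d = pose_to_dual_quaternion(q_rot, [1.0, 2.0, 0.0])
print(np.dot(q_r, q_d))   # 0.0: the unit dual quaternion constraint holds
```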
End-to-End Multi-Tab Website Fingerprinting Attack: A Detection Perspective
Abstract
Website fingerprinting attack (WFA) aims to deanonymize the website a user is visiting through anonymous network channels (e.g., Tor). Despite remarkable progress in the past years, most existing methods implicitly make two artificial assumptions: (1) only a single website (i.e., single-tab) is visited each time, and (2) website fingerprinting data are manually pre-trimmed into a single trace per website. In reality, a user often opens multiple tabs for multiple websites spontaneously. This multi-tab WFA (MT-WFA) setting has been studied in a few recent works, but all of them still fail to fully reflect real-world situations. In particular, the challenge of overlap between website fingerprints has never been investigated in depth. In this work, we redefine the problem of MT-WFA as detecting multiple monitored traces in natural, untrimmed traffic data that include monitored traces, unmonitored traces, and potentially unconstrained overlap between them. This eliminates the above assumptions, going beyond the conventional single-website fingerprint classification perspective taken by all previous WFA methods. To tackle this realistic MT-WFA problem, we formulate a novel Website Fingerprint Detection (WFD) model capable of accurately detecting the start and end points of all monitored traces and classifying them jointly, given long, untrimmed raw traffic data. WFD is end-to-end, with trace localization and website classification integrated into a single unified pipeline. To enable quantitative evaluation in our MT-WFA setting, we introduce new performance metrics. Extensive experiments on several newly constructed benchmarks show that our WFD outperforms state-of-the-art alternative methods in both accuracy and efficiency by a large margin, even with a very small training set. Code is available at https://github.com/WFDetector/WFDetection
Evaluating Explainable AI on a Multi-Modal Medical Imaging Task: Can Existing Algorithms Fulfill Clinical Requirements?
Authors: Weina Jin, Xiaoxiao Li, Ghassan Hamarneh
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Being able to explain predictions to clinical end-users is a necessity for leveraging the power of artificial intelligence (AI) models for clinical decision support. For medical images, a feature attribution map, or heatmap, is the most common form of explanation; it highlights important features for the AI model's prediction. However, it is unknown how well heatmaps explain decisions on multi-modal medical images, where each image modality or channel visualizes distinct clinical information about the same underlying biomedical phenomenon. Understanding such modality-dependent features is essential for clinical users' interpretation of AI decisions. To tackle this clinically important but technically ignored problem, we propose the modality-specific feature importance (MSFI) metric. It encodes the clinical image and explanation interpretation patterns of modality prioritization and modality-specific feature localization. We conduct a clinical requirement-grounded, systematic evaluation using computational methods and a clinician user study. Results show that the 16 examined heatmap algorithms failed to fulfill clinical requirements of correctly indicating the AI model's decision process or decision quality. The evaluation and the MSFI metric can guide the design and selection of XAI algorithms to meet clinical requirements for multi-modal explanation.
A Single Correspondence Is Enough: Robust Global Registration to Avoid Degeneracy in Urban Environments
Abstract
Global registration using 3D point clouds is a crucial technology for mobile platforms to achieve localization or manage loop-closing situations. In recent years, numerous researchers have proposed global registration methods that handle a large number of outlier correspondences. Unfortunately, the degeneracy problem, in which the number of estimated inliers falls below three, remains potentially unavoidable. To tackle this problem, a degeneracy-robust, decoupling-based global registration method, called Quatro, is proposed. In particular, our method employs quasi-SO(3) estimation, leveraging the Atlanta world assumption in urban environments to avoid degeneracy in rotation estimation. Thus, the minimum degree of freedom (DoF) of our method is reduced from three to one. As verified on indoor and outdoor 3D LiDAR datasets, our proposed method yields robust global registration performance compared with other global registration methods, even for distant point cloud pairs. Furthermore, the experimental results confirm the applicability of our method as a coarse alignment. Our code is available at https://github.com/url-kaist/quatro.
Towards Visual-Prompt Temporal Answering Grounding in Medical Instructional Video
Authors: Bin Li, Yixuan Weng, Bin Sun, Shutao Li
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract
Temporal answering grounding in video (TAGV) is a new task deriving naturally from temporal sentence grounding in video (TSGV). Given an untrimmed video and a text question, this task aims at locating the matching span in the video that can semantically answer the question. Existing methods tend to formulate the TAGV task as visual span-based question answering (QA) by matching the visual frame span queried by the text question. However, due to the weak correlations and large semantic gaps between the features of the textual question and the visual answer, existing methods that adopt a visual span predictor fail to perform well on the TAGV task. In this work, we propose a visual-prompt text span localizing (VPTSL) method, which enhances text span localization in a pre-trained language model (PLM) with visual highlight features. Specifically, context-query attention is used to perform cross-modal modeling between the textual and visual features. Then, highlight features are obtained through a highlight module with a linear layer to provide the visual prompt. To alleviate the differences in semantics and correlations between textual and visual features, we design the text span predictor by encoding the question, the subtitles, and the visual prompt in the PLM. As a result, the TAGV task is formulated as predicting the span of subtitles that matches the answering frame timeline. Extensive experiments on the medical instructional dataset MedVidQA show that the proposed VPTSL outperforms other state-of-the-art methods, demonstrating the effectiveness of the visual prompt and the text span predictor.
Drift Reduced Navigation with Deep Explainable Features
Authors: Mohd Omama, Sundar Sripada Venugopalaswamy Sriraman, Sandeep Chinchali, Arun Kumar Singh, K. Madhava Krishna
Abstract
Modern autonomous vehicles (AVs) often rely on vision, LIDAR, and even radar-based simultaneous localization and mapping (SLAM) frameworks for precise localization and navigation. However, modern SLAM frameworks often lead to unacceptably high levels of drift (i.e., localization error) when AVs observe few visually distinct features or encounter occlusions due to dynamic obstacles. This paper argues that minimizing drift must be a key desideratum in AV motion planning, which requires an AV to make active control decisions to move towards feature-rich regions while also minimizing conventional control cost. To do so, we first introduce a novel data-driven perception module that observes LIDAR point clouds and estimates which features/regions an AV must navigate towards for drift minimization. Then, we introduce an interpretable model predictive controller (MPC) that moves an AV toward such feature-rich regions while avoiding visual occlusions and gracefully trading off drift and control cost. Our experiments on challenging, dynamic scenarios in the state-of-the-art CARLA simulator indicate our method reduces drift by up to 76.76% compared to benchmark approaches.
MTLDesc: Looking Wider to Describe Better
Abstract
Limited by the locality of convolutional neural networks, most existing local feature description methods learn local descriptors from local information only and lack awareness of the global and surrounding spatial context. In this work, we focus on making local descriptors "look wider to describe better" by learning local Descriptors with More Than just Local information (MTLDesc). Specifically, we resort to context augmentation and spatial attention mechanisms to give MTLDesc non-local awareness. First, an Adaptive Global Context Augmented Module and a Diverse Local Context Augmented Module are proposed to construct robust local descriptors with context information from global to local. Second, a Consistent Attention Weighted Triplet Loss is designed to integrate spatial attention awareness into both the optimization and matching stages of local descriptor learning. Third, Local Feature Detection with Feature Pyramid is introduced to obtain more stable and accurate keypoint localization. With the above innovations, the performance of MTLDesc significantly surpasses prior state-of-the-art local descriptors on the HPatches, Aachen Day-Night localization, and InLoc indoor localization benchmarks.
RCL: Recurrent Continuous Localization for Temporal Action Detection
Authors: Qiang Wang, Yanhao Zhang, Yun Zheng, Pan Pan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Temporal representation is the cornerstone of modern action detection techniques. State-of-the-art methods mostly rely on a dense anchoring scheme, where anchors are sampled uniformly over the temporal domain with a discretized grid, and then regress the accurate boundaries. In this paper, we revisit this foundational stage and introduce Recurrent Continuous Localization (RCL), which learns a fully continuous anchoring representation. Specifically, the proposed representation builds upon an explicit model conditioned on video embeddings and temporal coordinates, which ensures the capability of detecting segments of arbitrary length. To optimize the continuous representation, we develop an effective scale-invariant sampling strategy and recurrently refine the prediction in subsequent iterations. Our continuous anchoring scheme is fully differentiable, allowing it to be seamlessly integrated into existing detectors, e.g., BMN and G-TAD. Extensive experiments on two benchmarks demonstrate that our continuous representation steadily surpasses other discretized counterparts by ~2% mAP. As a result, RCL achieves 52.92% mAP@0.5 on THUMOS14 and 37.65% mAP on ActivityNet v1.3, outperforming all existing single-model detectors.