Keyword: SLAM
Temporal Point Cloud Completion with Pose Disturbance
Abstract
Point clouds collected by real-world sensors are always unaligned and sparse, which makes it hard to reconstruct the complete shape of an object from a single frame of data. In this work, we provide complete point clouds from sparse inputs with pose disturbances of limited translation and rotation. We also use temporal information to enhance the completion model, refining the output with a sequence of inputs. With the help of gated recurrent units (GRUs) and attention mechanisms as temporal units, we propose a point cloud completion framework that accepts a sequence of unaligned and sparse inputs and outputs consistent and aligned point clouds. Our network performs in an online manner and produces a refined point cloud for each frame, which enables it to be integrated into any SLAM or reconstruction pipeline. To the best of our knowledge, our framework is the first to utilize temporal information and ensure temporal consistency under limited transformations. Through experiments on ShapeNet and KITTI, we show that our framework is effective on both synthetic and real-world datasets.
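As a hedged illustration of how a GRU can serve as the temporal unit in such a pipeline, the sketch below encodes each incoming frame with a PointNet-style encoder, carries a latent shape state across frames with `torch.nn.GRUCell`, and decodes a dense point set per frame. The encoder, decoder, and all sizes are illustrative assumptions, not the authors' architecture.

```python
# Hypothetical sketch of online temporal point cloud completion.
# The PointNet-style encoder, GRU state size, and decoder are assumptions;
# the paper's actual architecture is not reproduced here.
import torch
import torch.nn as nn

class TemporalCompleter(nn.Module):
    def __init__(self, feat_dim=256, out_points=2048):
        super().__init__()
        # Per-point MLP followed by max-pooling: a PointNet-like frame encoder.
        self.point_mlp = nn.Sequential(
            nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, feat_dim))
        # GRU cell carries a latent shape state from frame to frame.
        self.gru = nn.GRUCell(feat_dim, feat_dim)
        # Decoder maps the temporal state to a dense, aligned point set.
        self.decoder = nn.Linear(feat_dim, out_points * 3)
        self.out_points = out_points

    def forward(self, frames):
        # frames: list of (N_i, 3) sparse, unaligned partial scans.
        h = None
        outputs = []
        for pts in frames:
            feat = self.point_mlp(pts).max(dim=0).values  # (feat_dim,)
            h = self.gru(feat.unsqueeze(0), h)            # update temporal state
            dense = self.decoder(h).view(self.out_points, 3)
            outputs.append(dense)                         # one refined cloud per frame
        return outputs

completer = TemporalCompleter()
seq = [torch.randn(64, 3) for _ in range(4)]  # four sparse frames
refined = completer(seq)
print(refined[-1].shape)  # torch.Size([2048, 3])
```

The per-frame output is what makes the online, SLAM-friendly behavior described in the abstract possible: each frame yields a refined cloud without waiting for the full sequence.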
Autonomous Vehicles: Open-Source Technologies, Considerations, and Development
Abstract
Autonomous vehicles are the culmination of advances in many areas such as sensor technologies, artificial intelligence (AI), and networking. This paper introduces the reader to the technologies used to build autonomous vehicles, focusing on open-source tools and libraries that make it cheaper and easier for developers and researchers to participate in the field. The topics covered are as follows. First, we discuss the sensors used in autonomous vehicles and summarize their performance in different environments, their costs, and their unique features. Second, we cover Simultaneous Localization and Mapping (SLAM) and the algorithms for each sensing modality. Third, we review popular open-source driving simulators, a cost-effective way to train machine learning models and test vehicle software. Fourth, we highlight embedded operating systems and the security and development considerations involved in choosing one. Fifth, we discuss Vehicle-to-Vehicle (V2V) and Internet-of-Vehicles (IoV) communication, areas that fuse networking technologies with autonomous vehicles to extend their functionality. We then review the five levels of vehicle automation and the features of commercial and open-source Advanced Driver Assistance Systems. Finally, we touch on the major manufacturing and software companies involved in the field, their investments, and their partnerships. These topics give the reader an understanding of the industry, its technologies, active research, and the tools available for developers to build autonomous vehicles.
Keyword: Visual inertial
There is no result
Keyword: livox
There is no result
Keyword: loam
There is no result
Keyword: Visual inertial odometry
There is no result
Keyword: lidar
Vehicular Visible Light Communications for Automated Valet Parking
Authors: Bugra Turan, Ali Uyrus, Osman Nuri Koc, Emrah Kar, Sinem Coleri
Subjects: Networking and Internet Architecture (cs.NI)
Abstract
Visible light communication (VLC) is a promising optical wireless communications (OWC) scheme that has been demonstrated to provide secure, line-of-sight (LoS), short-distance vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communications. Recently, automated driving applications supported by V2I links have been proposed to increase the reliability of autonomous vehicles. In this regard, we propose a VLC-based V2I scheme that increases the V2I communication redundancy of automated valet parking (AVP) applications through the jam-free and location-based characteristics of VLC. In this paper, we demonstrate a novel architecture that supports indoor parking-garage online-map updates with vehicle on-board data transmission and location-based map-update dissemination through bidirectional VLC. The proposed system yields error-free LoS transmissions with Direct-Current-Biased Optical OFDM (DCO-OFDM) at transmitter-receiver distances of up to 33 m, enabling the sharing of vehicle CAN bus data, infrastructure camera video, and LiDAR point cloud data in an indoor parking garage.
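For readers unfamiliar with DCO-OFDM, the sketch below shows the core idea under simplified assumptions (no channel model, no cyclic prefix): QAM symbols are placed on subcarriers with Hermitian symmetry so the IFFT output is real-valued, then a DC bias is added and the waveform is clipped at zero so it can drive an LED. All parameters are illustrative, not the paper's configuration.

```python
# Minimal DCO-OFDM transmitter sketch (illustrative parameters, no channel
# model or cyclic prefix). Hermitian symmetry on the subcarriers makes the
# IFFT output real so it can intensity-modulate an LED after DC biasing.
import numpy as np

N = 64                         # FFT size
n_data = N // 2 - 1            # usable data subcarriers (k = 1 .. N/2-1)
rng = np.random.default_rng(0)

# Random 4-QAM symbols on the data subcarriers.
qam = (rng.choice([-1, 1], n_data) + 1j * rng.choice([-1, 1], n_data)) / np.sqrt(2)

# Build a Hermitian-symmetric spectrum: X[0] = X[N/2] = 0,
# X[N-k] = conj(X[k]) for k = 1 .. N/2-1.
X = np.zeros(N, dtype=complex)
X[1:N // 2] = qam
X[N // 2 + 1:] = np.conj(qam[::-1])

x = np.fft.ifft(X).real * np.sqrt(N)   # real-valued time-domain signal

# Add a DC bias proportional to the signal's RMS, then clip negatives
# (the LED cannot emit "negative" light).
bias = 2.0 * np.std(x)
tx = np.clip(x + bias, 0.0, None)

print("max imaginary residue:", np.abs(np.fft.ifft(X).imag).max())  # ~0
print("transmit signal is non-negative:", (tx >= 0).all())
```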
Comparative study of 3D object detection frameworks based on LiDAR data and sensor fusion techniques
Authors: Sreenivasa Hikkal Venugopala
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Precisely estimating and understanding a vehicle's surroundings is a basic and crucial step for an autonomous vehicle. The perception system plays a significant role in providing an accurate interpretation of the vehicle's environment in real time. Generally, the perception system involves various subsystems such as localization, static and dynamic obstacle detection and avoidance, and mapping. To perceive the environment, these vehicles are equipped with various exteroceptive sensors, both passive and active, in particular cameras, radars, and LiDARs. These systems use deep learning techniques to transform the huge amount of sensor data into semantic information on which the object detection and localization tasks are performed. For numerous driving tasks, accurate results require the location and depth information of a particular object. 3D object detection methods, by utilizing the additional pose data from sensors such as LiDARs and stereo cameras, provide information on the size and location of objects. Based on recent research, 3D object detection frameworks performing object detection and localization on LiDAR data and with sensor fusion techniques show significant improvements in performance. In this work, we perform a comparative study of the effect of using LiDAR data in object detection frameworks and of the performance improvement obtained with sensor fusion techniques, discussing various state-of-the-art methods in both cases, performing experimental analysis, and providing future research directions.
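A concrete building block behind most camera-LiDAR fusion pipelines is projecting LiDAR points into the image plane with the extrinsic and intrinsic calibration. The sketch below is a minimal, hedged example with placeholder calibration matrices; a real pipeline would load calibrated values (e.g., from the KITTI calibration files).

```python
# Minimal LiDAR-to-camera projection sketch. The calibration matrices here
# are placeholders; a real pipeline loads calibrated intrinsics/extrinsics.
import numpy as np

K = np.array([[700.0,   0.0, 320.0],      # camera intrinsics (fx, fy, cx, cy)
              [  0.0, 700.0, 240.0],
              [  0.0,   0.0,   1.0]])
T_cam_lidar = np.eye(4)                   # LiDAR -> camera extrinsics (placeholder)
T_cam_lidar[:3, 3] = [0.0, -0.08, -0.27]  # example lever arm

pts_lidar = np.random.rand(1000, 3) * [20, 10, 2] + [2, -5, -1]  # (N, 3)

# Homogeneous transform into the camera frame.
pts_h = np.hstack([pts_lidar, np.ones((len(pts_lidar), 1))])
pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]

# Keep points in front of the camera, then apply the pinhole projection.
front = pts_cam[:, 2] > 0.1
uvw = (K @ pts_cam[front].T).T
uv = uvw[:, :2] / uvw[:, 2:3]             # pixel coordinates

# Keep only projections that land inside a 640x480 image.
inside = (uv[:, 0] >= 0) & (uv[:, 0] < 640) & (uv[:, 1] >= 0) & (uv[:, 1] < 480)
print(f"{inside.sum()} of {len(pts_lidar)} points project into the image")
```

Once points and pixels are associated this way, fusion methods can paint image features onto points or lift image detections into 3D.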
LiDAR dataset distillation within Bayesian active learning framework: Understanding the effect of data augmentation
Authors: Ngoc Phuong Anh Duong, Alexandre Almin, Léo Lemarié, B Ravi Kiran
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Autonomous driving (AD) datasets have grown progressively larger in the past few years to enable better deep representation learning. Active learning (AL) has recently regained attention as a way to reduce annotation costs and dataset size. AL has remained relatively unexplored for AD datasets, especially for point cloud data from LiDARs. This paper performs a principled evaluation of AL-based dataset distillation on a quarter (1/4th) of the large Semantic-KITTI dataset. Furthermore, the gains in model performance due to data augmentation (DA) are demonstrated across different subsets of the AL loop. We also demonstrate how DA improves the selection of informative samples to annotate. We observe that data augmentation achieves full-dataset accuracy using only 60% of the samples from the selected dataset configuration, providing faster training and subsequent savings in annotation costs.
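As a hedged sketch of one step of such an AL loop, the snippet below scores an unlabeled pool by predictive entropy and selects the top-k samples for annotation. The acquisition function and fake model outputs are generic placeholders, not the paper's Bayesian setup or segmentation model.

```python
# One acquisition step of a generic pool-based active learning loop.
# Entropy-based uncertainty is a common heuristic; the paper's actual
# Bayesian acquisition and LiDAR segmentation model are not reproduced.
import numpy as np

def entropy_acquisition(probs: np.ndarray, k: int) -> np.ndarray:
    """Pick the k pool indices whose predictive distribution is most uncertain.

    probs: (n_pool, n_classes) class probabilities from the current model.
    """
    eps = 1e-12
    entropy = -(probs * np.log(probs + eps)).sum(axis=1)
    return np.argsort(entropy)[-k:]            # indices of the k highest entropies

rng = np.random.default_rng(0)
pool_probs = rng.dirichlet(alpha=np.ones(20), size=10_000)  # fake model outputs
to_label = entropy_acquisition(pool_probs, k=256)
print(to_label.shape)  # (256,) sample indices handed to annotators
```

The paper's observation that DA changes which samples look informative would enter here: augmented copies alter `pool_probs` and hence the selected indices.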
Simulation-to-Reality domain adaptation for offline 3D object annotation on pointclouds with correlation alignment
Authors: Weishuang Zhang, B Ravi Kiran, Thomas Gauthier, Yanis Mazouz, Theo Steger
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Annotating objects with 3D bounding boxes in LiDAR point clouds is a costly, human-driven process in an autonomous driving perception system. In this paper, we present a method to semi-automatically annotate real-world point clouds collected by deployment vehicles using simulated data. We train a 3D object detector model on labeled simulated data from CARLA jointly with real-world point clouds from our target vehicle. The supervised object detection loss is augmented with a CORAL loss term that reduces the distance between labeled simulated and unlabeled real point cloud feature representations. The goal is to learn representations that are invariant across the simulated (labeled) and real-world (unlabeled) target domains. We also provide an updated survey of domain adaptation methods for point clouds.
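CORAL (correlation alignment) penalizes the distance between the second-order statistics of source and target features. A minimal PyTorch version of the standard CORAL loss (Sun and Saenko, 2016) is sketched below; the feature shapes are assumptions, and how the loss attaches to the detector's feature maps is paper-specific.

```python
# Standard CORAL loss: squared Frobenius distance between source and target
# feature covariance matrices, normalized by 4*d^2 (Sun & Saenko, 2016).
import torch

def coral_loss(source: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """source, target: (n_s, d) and (n_t, d) feature batches."""
    d = source.size(1)

    def covariance(x: torch.Tensor) -> torch.Tensor:
        x = x - x.mean(dim=0, keepdim=True)
        return (x.T @ x) / (x.size(0) - 1)

    diff = covariance(source) - covariance(target)
    return (diff * diff).sum() / (4.0 * d * d)

sim_feats = torch.randn(128, 256)         # features from labeled CARLA frames
real_feats = torch.randn(128, 256) * 1.5  # features from unlabeled real frames
loss = coral_loss(sim_feats, real_feats)  # added to the supervised detection loss
print(loss.item())
```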
Multi-modal Sensor Fusion for Auto Driving Perception: A Survey
Authors: Keli Huang, Botian Shi, Xiang Li, Xin Li, Siyuan Huang, Yikang Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Multi-modal fusion is a fundamental task for the perception of an autonomous driving system and has recently attracted many researchers. However, achieving good performance is not easy due to noisy raw data, underutilized information, and the misalignment of multi-modal sensors. In this paper, we provide a literature review of existing multi-modal methods for perception tasks in autonomous driving. We analyze in detail over 50 papers that leverage perception sensors, including LiDAR and cameras, to solve object detection and semantic segmentation tasks. Unlike traditional methodologies for categorizing fusion models, we propose a taxonomy that divides them into two major classes and four minor classes based on the fusion stage. Moreover, we examine current fusion methods in depth, focusing on the remaining problems, and open a discussion of potential research opportunities. In conclusion, this paper presents a new taxonomy of multi-modal fusion methods for autonomous driving perception tasks and aims to provoke thoughts on future fusion-based techniques.
Keyword: loop detection
There is no result
Keyword: autonomous driving
Learning Interpretable, High-Performing Policies for Continuous Control Problems
Authors: Rohan Paleja, Yaru Niu, Andrew Silva, Chace Ritchie, Sugju Choi, Matthew Gombolay
Abstract
Gradient-based approaches in reinforcement learning (RL) have achieved tremendous success in learning policies for continuous control problems. While the performance of these approaches warrants real-world adoption in domains such as autonomous driving and robotics, the resulting policies lack interpretability, limiting deployability in safety-critical and legally regulated domains. Such domains require interpretable and verifiable control policies that maintain high performance. We propose Interpretable Continuous Control Trees (ICCTs), a tree-based model that can be optimized via modern gradient-based RL approaches to produce high-performing, interpretable policies. The key to our approach is a procedure for allowing direct optimization in a sparse decision-tree-like representation. We validate ICCTs against baselines across six domains, showing that ICCTs learn interpretable policy representations that match or outperform baselines by up to 33% in autonomous driving scenarios while achieving a 300x-600x reduction in the number of policy parameters relative to deep learning baselines.
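As a hedged illustration of the general idea of gradient-optimizable decision trees (not the authors' exact ICCT formulation, which uses sparse splits and a crispification procedure), the sketch below implements a depth-2 soft decision tree for a 1D continuous action: sigmoid-gated splits route the state to leaf linear controllers, and everything is differentiable end to end.

```python
# Soft (differentiable) decision tree sketch for continuous control.
# Illustrates gradient-optimizable tree policies in general; the ICCT
# paper's sparse splits and crispification are not reproduced.
import torch
import torch.nn as nn

class SoftDecisionTreePolicy(nn.Module):
    def __init__(self, state_dim: int, depth: int = 2):
        super().__init__()
        n_inner = 2 ** depth - 1            # internal decision nodes
        n_leaf = 2 ** depth                 # leaf linear controllers
        self.w = nn.Parameter(torch.randn(n_inner, state_dim))
        self.b = nn.Parameter(torch.zeros(n_inner))
        self.leaf = nn.Parameter(torch.randn(n_leaf, state_dim))
        self.depth = depth

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        # Probability of routing "right" at each internal node.
        gate = torch.sigmoid(s @ self.w.T + self.b)     # (B, n_inner)
        # Path probability of each leaf = product of gate decisions on its path.
        path_probs = [torch.ones(s.size(0), 1)]
        node = 0
        for _ in range(self.depth):
            next_probs = []
            for p in path_probs:
                g = gate[:, node:node + 1]
                next_probs += [p * (1 - g), p * g]      # go left, go right
                node += 1
            path_probs = next_probs
        leaf_prob = torch.cat(path_probs, dim=1)        # (B, n_leaf), rows sum to 1
        leaf_actions = s @ self.leaf.T                  # each leaf: linear controller
        return (leaf_prob * leaf_actions).sum(dim=1)    # expected action

policy = SoftDecisionTreePolicy(state_dim=4)
action = policy(torch.randn(8, 4))                      # differentiable w.r.t. params
print(action.shape)  # torch.Size([8])
```

The parameter count here is (2^depth - 1 + 2^depth) * state_dim plus biases, which makes the orders-of-magnitude reduction versus deep policies plausible.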
Temporal Robustness of Stochastic Signals
Authors: Lars Lindemann, Alena Rodionova, George J. Pappas
Subjects: Systems and Control (eess.SY); Formal Languages and Automata Theory (cs.FL)
Abstract
We study the temporal robustness of stochastic signals. This topic is of particular interest in interleaving processes such as multi-agent systems, where communication and individual agents induce timing uncertainty. For a deterministic signal and a given specification, we first introduce the synchronous and the asynchronous temporal robustness to quantify the signal's robustness with respect to synchronous and asynchronous time shifts in its sub-signals. We then define the temporal robustness risk by investigating the temporal robustness of the realizations of a stochastic signal. This definition can be interpreted as the risk that a stochastic signal does not satisfy a specification robustly in time. General forms of specifications, such as signal temporal logic specifications, are permitted. We show how the temporal robustness risk can be estimated from data in the case of the value-at-risk. The usefulness of the temporal robustness risk is underlined by both theoretical and empirical evidence. In particular, we provide various numerical case studies, including a T-intersection scenario in autonomous driving.
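A hedged sketch of the data-driven step: for a risk level beta, the value-at-risk of a random robustness value can be estimated by an empirical quantile of sampled robustness values. The snippet below shows this generic estimator only; the paper's precise risk definition, sign convention, and statistical guarantees are not reproduced.

```python
# Empirical value-at-risk (VaR) of a robustness random variable R.
# Here VaR_beta is taken as the beta-quantile of the loss -R, a common
# convention; the paper's exact definition and bounds are not reproduced.
import numpy as np

def empirical_var(robustness_samples: np.ndarray, beta: float) -> float:
    """beta-quantile of the loss -R estimated from i.i.d. samples of R."""
    return float(np.quantile(-robustness_samples, beta))

rng = np.random.default_rng(1)
# Pretend these are temporal robustness values of sampled signal realizations.
samples = rng.normal(loc=0.5, scale=0.3, size=5000)
print(empirical_var(samples, beta=0.95))  # large value => risk of non-robustness
```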
LiDAR dataset distillation within Bayesian active learning framework: Understanding the effect of data augmentation
Authors: Ngoc Phuong Anh Duong, Alexandre Almin, Léo Lemarié, B Ravi Kiran
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Autonomous driving (AD) datasets have grown progressively larger in the past few years to enable better deep representation learning. Active learning (AL) has recently regained attention as a way to reduce annotation costs and dataset size. AL has remained relatively unexplored for AD datasets, especially for point cloud data from LiDARs. This paper performs a principled evaluation of AL-based dataset distillation on a quarter (1/4th) of the large Semantic-KITTI dataset. Furthermore, the gains in model performance due to data augmentation (DA) are demonstrated across different subsets of the AL loop. We also demonstrate how DA improves the selection of informative samples to annotate. We observe that data augmentation achieves full-dataset accuracy using only 60% of the samples from the selected dataset configuration, providing faster training and subsequent savings in annotation costs.
Simulation-to-Reality domain adaptation for offline 3D object annotation on pointclouds with correlation alignment
Authors: Weishuang Zhang, B Ravi Kiran, Thomas Gauthier, Yanis Mazouz, Theo Steger
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Annotating objects with 3D bounding boxes in LiDAR point clouds is a costly, human-driven process in an autonomous driving perception system. In this paper, we present a method to semi-automatically annotate real-world point clouds collected by deployment vehicles using simulated data. We train a 3D object detector model on labeled simulated data from CARLA jointly with real-world point clouds from our target vehicle. The supervised object detection loss is augmented with a CORAL loss term that reduces the distance between labeled simulated and unlabeled real point cloud feature representations. The goal is to learn representations that are invariant across the simulated (labeled) and real-world (unlabeled) target domains. We also provide an updated survey of domain adaptation methods for point clouds.
Multi-modal Sensor Fusion for Auto Driving Perception: A Survey
Authors: Keli Huang, Botian Shi, Xiang Li, Xin Li, Siyuan Huang, Yikang Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Multi-modal fusion is a fundamental task for the perception of an autonomous driving system and has recently attracted many researchers. However, achieving good performance is not easy due to noisy raw data, underutilized information, and the misalignment of multi-modal sensors. In this paper, we provide a literature review of existing multi-modal methods for perception tasks in autonomous driving. We analyze in detail over 50 papers that leverage perception sensors, including LiDAR and cameras, to solve object detection and semantic segmentation tasks. Unlike traditional methodologies for categorizing fusion models, we propose a taxonomy that divides them into two major classes and four minor classes based on the fusion stage. Moreover, we examine current fusion methods in depth, focusing on the remaining problems, and open a discussion of potential research opportunities. In conclusion, this paper presents a new taxonomy of multi-modal fusion methods for autonomous driving perception tasks and aims to provoke thoughts on future fusion-based techniques.
Automated Vehicle Safety Guarantee, Verification and Certification: A Survey
Authors: Tong Zhao, Ekim Yurtsever, Joel Paulson, Giorgio Rizzoni
Abstract
Challenges related to automated driving are no longer focused just on the construction of automated vehicles (AVs), but also on assuring the safety of their operation. Recent advances in Level 3 and Level 4 autonomous driving have motivated more extensive study of safety guarantees for complicated AV maneuvers, which aligns with the goal of ISO 21448 (Safety of the Intended Functionality, or SOTIF), i.e., minimizing both known and unknown unsafe scenarios, as well as Vision Zero, i.e., eliminating highway fatalities by 2050. A majority of the approaches used to provide safety guarantees for AV motion control originate from formal methods, especially reachability analysis (RA), which relies on mathematical models of the system's dynamic evolution to provide guarantees. However, to the best of the authors' knowledge, there have been no review papers dedicated to describing and interpreting the state of the art of formal methods in the context of AVs. In this work, we provide an overview of the safety verification, validation, and certification process, and review the formal safety techniques best suited to AV applications. We also propose a unified scenario coverage framework that can provide either a formal or a sample-based estimate of safety verification for full AVs.
3D Object Detection from Images for Autonomous Driving: A Survey
Authors: Xinzhu Ma, Wanli Ouyang, Andrea Simonelli, Elisa Ricci
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
3D object detection from images, one of the fundamental and challenging problems in autonomous driving, has received increasing attention from both industry and academia in recent years. Benefiting from the rapid development of deep learning technologies, image-based 3D detection has achieved remarkable progress. Particularly, more than 200 works have studied this problem from 2015 to 2021, encompassing a broad spectrum of theories, algorithms, and applications. However, to date, no recent survey exists that collects and organizes this knowledge. In this paper, we fill this gap in the literature and provide the first comprehensive survey of this novel and continuously growing research field, summarizing the most commonly used pipelines for image-based 3D detection and deeply analyzing each of their components. Additionally, we propose two new taxonomies to organize the state-of-the-art methods into different categories, with the intent of providing a more systematic review of existing methods and facilitating fair comparisons with future works. Looking back at what has been achieved so far, we also analyze the current challenges in the field and discuss future directions for image-based 3D detection research.
Transformers in Self-Supervised Monocular Depth Estimation with Unknown Camera Intrinsics
Abstract
The advent of autonomous driving and advanced driver assistance systems necessitates continuous developments in computer vision for 3D scene understanding. Self-supervised monocular depth estimation, a method for pixel-wise distance estimation of objects from a single camera without the use of ground-truth labels, is an important task in 3D scene understanding. However, existing methods for this task are limited to convolutional neural network (CNN) architectures. In contrast to CNNs, which use localized linear operations and lose feature resolution across layers, vision transformers process at constant resolution with a global receptive field at every stage. While recent works have compared transformers against their CNN counterparts for tasks such as image classification, no study exists that investigates the impact of using transformers for self-supervised monocular depth estimation. Here, we first demonstrate how to adapt vision transformers for self-supervised monocular depth estimation. Thereafter, we compare the transformer- and CNN-based architectures on the KITTI depth prediction benchmarks, as well as their robustness to natural corruptions and adversarial attacks, including when the camera intrinsics are unknown. Our study demonstrates how a transformer-based architecture, though lower in run-time efficiency, achieves comparable performance while being more robust and generalizable.
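Self-supervised monocular depth methods typically train on a photometric reconstruction loss between a target frame and a source frame warped into it using the predicted depth and pose. A common choice, sketched below, mixes SSIM and L1 terms with the conventional 0.85/0.15 weighting (as popularized by Monodepth2); the warping step is omitted, and treating this as the paper's exact loss is an assumption.

```python
# Photometric loss widely used in self-supervised monocular depth training:
# a weighted mix of (1 - SSIM)/2 and L1 between the target image and the
# source image warped by predicted depth and pose (warping omitted here).
import torch
import torch.nn.functional as F

def ssim(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Simplified per-pixel SSIM over 3x3 windows; x, y: (B, C, H, W) in [0, 1]."""
    c1, c2 = 0.01 ** 2, 0.03 ** 2
    mu_x, mu_y = F.avg_pool2d(x, 3, 1, 1), F.avg_pool2d(y, 3, 1, 1)
    var_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    var_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    cov = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return (num / den).clamp(0, 1)

def photometric_loss(target: torch.Tensor, warped: torch.Tensor,
                     alpha: float = 0.85) -> torch.Tensor:
    l1 = (target - warped).abs().mean(1, keepdim=True)
    dssim = (1 - ssim(target, warped)).mean(1, keepdim=True) / 2
    return (alpha * dssim + (1 - alpha) * l1).mean()

target = torch.rand(2, 3, 64, 64)
warped = torch.rand(2, 3, 64, 64)   # source frame after depth+pose warping
print(photometric_loss(target, warped).item())
```

The loss is agnostic to the depth network's backbone, which is what allows the CNN-versus-transformer comparison the abstract describes.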
Discrete-Event Controller Synthesis for Autonomous Systems with Deep-Learning Perception Components
Authors: Radu Calinescu (1), Calum Imrie (1), Ravi Mangal (2), Corina Păsăreanu (2), Misael Alpizar Santana (1), Gricel Vázquez (1) ((1) University of York, (2) Carnegie Mellon University)
Abstract
We present DEEPDECS, a new method for the synthesis of correct-by-construction discrete-event controllers for autonomous systems that use deep neural network (DNN) classifiers for the perception step of their decision-making processes. Despite major advances in deep learning in recent years, providing safety guarantees for these systems remains very challenging. Our controller synthesis method addresses this challenge by integrating DNN verification with the synthesis of verified Markov models. The synthesised models correspond to discrete-event controllers guaranteed to satisfy the safety, dependability and performance requirements of the autonomous system, and to be Pareto optimal with respect to a set of optimisation criteria. We use the method in simulation to synthesise controllers for mobile-robot collision avoidance, and for maintaining driver attentiveness in shared-control autonomous driving.
Keyword: mapping
Condensation Jacobian with Adaptivity
Authors: Nicholas J. Weidner, Theodore Kim, Shinjiro Sueda
Abstract
We present a new approach that allows large time steps in dynamic simulations. Our approach, ConJac, is based on condensation, a technique for eliminating many degrees of freedom (DOFs) by expressing them in terms of the remaining degrees of freedom. In this work, we choose a subset of nodes to be dynamic nodes, and apply condensation at the velocity level by defining a linear mapping from the velocities of these chosen dynamic DOFs to the velocities of the remaining quasistatic DOFs. We then use this mapping to derive reduced equations of motion involving only the dynamic DOFs. We also derive a novel stabilization term that enables us to use complex nonlinear material models. ConJac remains stable at large time steps, exhibits highly dynamic motion, and displays minimal numerical damping. In marked contrast to subspace approaches, ConJac gives exactly the same configuration as the full space approach once the static state is reached. Furthermore, ConJac can automatically choose which parts of the object are to be simulated dynamically or quasistatically. Finally, ConJac works with a wide range of moderate to stiff materials, supports anisotropy and heterogeneity, handles topology changes, and can be combined with existing solvers including rigid body dynamics.
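To make the condensation idea concrete, here is a hedged sketch of classical static (Guyan) condensation with NumPy: partition the stiffness matrix into dynamic and quasistatic blocks and express the quasistatic velocities linearly in terms of the dynamic ones. ConJac's actual velocity-level mapping and stabilization term are more involved; this only illustrates the underlying linear-algebra pattern.

```python
# Classical static (Guyan) condensation: eliminate quasistatic DOFs q by
# expressing them through dynamic DOFs d. ConJac's velocity-level mapping
# and stabilization differ; this shows only the basic linear-algebra step.
import numpy as np

rng = np.random.default_rng(0)
n, n_d = 10, 4                      # total DOFs, dynamic DOFs
A = rng.standard_normal((n, n))
K = A @ A.T + n * np.eye(n)         # SPD "stiffness" matrix (toy example)

d = np.arange(n_d)                  # indices of dynamic DOFs
q = np.arange(n_d, n)               # indices of quasistatic DOFs
K_dd, K_dq = K[np.ix_(d, d)], K[np.ix_(d, q)]
K_qd, K_qq = K[np.ix_(q, d)], K[np.ix_(q, q)]

# Linear map W giving quasistatic velocities from dynamic DOF velocities:
# v_q = -K_qq^{-1} K_qd v_d  (Guyan reduction).
W = -np.linalg.solve(K_qq, K_qd)

# Reduced (condensed) system acting on the dynamic DOFs only:
K_red = K_dd + K_dq @ W             # Schur complement K_dd - K_dq K_qq^{-1} K_qd

v_d = rng.standard_normal(n_d)
v_full = np.concatenate([v_d, W @ v_d])   # full-space velocities from v_d alone
print(K_red.shape, v_full.shape)          # (4, 4) (10,)
```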
Neural Logic Analogy Learning
Authors: Yujia Fan, Yongfeng Zhang
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG); Logic in Computer Science (cs.LO)
Abstract
Letter-string analogy is an important analogy learning task that seems easy for humans but is very challenging for machines. The main idea behind current approaches to solving letter-string analogies is to design heuristic rules for extracting analogy structures and constructing analogy mappings. However, one key problem is that it is difficult to build a comprehensive and exhaustive set of analogy structures that fully describes the subtlety of analogies. This problem makes current approaches unable to handle complicated letter-string analogy problems. In this paper, we propose Neural logic analogy learning (Noan), a dynamic neural architecture driven by differentiable logic reasoning that solves analogy problems. Each analogy problem is converted into logical expressions consisting of logical variables and basic logical operations (AND, OR, and NOT). More specifically, Noan learns the logical variables as vector embeddings and each logical operation as a neural module. In this way, the model builds a computational graph integrating neural networks with logical reasoning to capture the internal logical structure of the input letter strings. The analogy learning problem then becomes a True/False evaluation problem over the logical expressions. Experiments show that our machine-learning-based Noan approach outperforms state-of-the-art approaches on standard letter-string analogy benchmark datasets.
Age of Information-based Scheduling for Wireless D2D Systems with a Deep Learning Approach
Authors: Ling Luo, Zhenyu Liu, Zhiyong Chen, Min Hua, Wenqing Li, Bin Xia
Abstract
Scheduling device-to-device (D2D) links to avoid excessive interference is critical to the success of wireless D2D communications. Most traditional scheduling schemes only consider the maximum throughput or fairness of the system and do not consider the freshness of information. In this paper, we propose a novel D2D link scheduling scheme that jointly optimizes the age of information (AoI) and throughput when D2D links transmit packets under the last-come-first-serve policy with packet replacement (LCFS-PR). It is motivated by the fact that maximum-throughput scheduling may reduce the activation probability of links with poor channel conditions, which results in poor AoI performance. Specifically, we derive an expression for the overall average AoI and throughput of the network under spatio-temporal interfering queue dynamics with a mean-field assumption. Moreover, a neural network structure is proposed to learn the mapping from geographic location to the optimal scheduling parameters under a stationary randomized policy, so that scheduling decisions can be made without estimating the channel state information (CSI) once the neural network is trained. To overcome the problem that implicit loss functions cannot be back-propagated, we derive a numerical solution for the gradient. Finally, numerical results reveal that the performance of the deep learning approach is close to that of a locally optimal algorithm with higher computational complexity. The trade-off curve of AoI and throughput is also obtained, where the AoI tends to infinity when throughput is maximized.
Security-Aware Virtual Network Embedding Algorithm based on Reinforcement Learning
Abstract
The virtual network embedding (VNE) algorithm is a key problem in network virtualization (NV) technology. At present, research in this field still has the following problems. The traditional way to solve the VNE problem is to use heuristic algorithms; however, this method relies on manually designed embedding rules, which do not reflect the actual situation of VNE, and as intelligent learning algorithms become the trend for solving VNE, it is gradually becoming outdated. At the same time, VNE has security problems, but no intelligent algorithm addresses them. For this reason, this paper proposes a security-aware VNE algorithm based on reinforcement learning (RL). In the training phase, we use a policy network as the learning agent and take a feature matrix formed from extracted attributes of the substrate nodes as input. The learning agent is trained in this environment to obtain the mapping probability of each substrate node. In the test phase, we map nodes according to the mapping probability and use a breadth-first search (BFS) strategy to map links. For the security problem, we add a security-requirement level constraint for each virtual node and a security level constraint for each substrate node; virtual nodes can only be embedded on substrate nodes whose security level is not lower than the virtual node's security requirement. Experimental results show that the proposed algorithm is superior to other typical algorithms in terms of long-term average revenue, long-term revenue-to-cost ratio, and virtual network request (VNR) acceptance rate.
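A hedged sketch of the constrained embedding step is below: it filters substrate candidates by the security-level constraint, picks nodes greedily by a stand-in mapping probability, and maps virtual links onto shortest substrate paths (NetworkX's unweighted shortest paths are BFS-based). The policy outputs, graphs, and attribute names are all illustrative, not the paper's implementation.

```python
# Hedged sketch of the embedding step: filter substrate nodes by the security
# constraint, pick nodes by a (here, made-up) mapping probability, then map
# virtual links onto shortest substrate paths (standing in for BFS mapping).
import networkx as nx
import numpy as np

rng = np.random.default_rng(0)
substrate = nx.erdos_renyi_graph(12, 0.4, seed=0)  # assumed connected here
for n in substrate.nodes:
    substrate.nodes[n]["security"] = int(rng.integers(0, 4))

virtual = nx.path_graph(3)               # toy VNR: 3 nodes, 2 links
for n in virtual.nodes:
    virtual.nodes[n]["sec_req"] = int(rng.integers(0, 3))

probs = rng.dirichlet(np.ones(substrate.number_of_nodes()))  # stand-in policy output

node_map = {}
for v in virtual.nodes:
    # Candidates: unused substrate nodes meeting the security-level constraint.
    cands = [s for s in substrate.nodes
             if s not in node_map.values()
             and substrate.nodes[s]["security"] >= virtual.nodes[v]["sec_req"]]
    if not cands:
        raise RuntimeError("VNR rejected: no feasible substrate node")
    node_map[v] = max(cands, key=lambda s: probs[s])  # greedy by mapping probability

link_map = {(u, v): nx.shortest_path(substrate, node_map[u], node_map[v])
            for u, v in virtual.edges}
print(node_map, link_map)
```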
ASHA: Assistive Teleoperation via Human-in-the-Loop Reinforcement Learning
Authors: Sean Chen, Jensen Gao, Siddharth Reddy, Glen Berseth, Anca D. Dragan, Sergey Levine
Abstract
Building assistive interfaces for controlling robots through arbitrary, high-dimensional, noisy inputs (e.g., webcam images of eye gaze) can be challenging, especially when it involves inferring the user's desired action in the absence of a natural 'default' interface. Reinforcement learning from online user feedback on the system's performance presents a natural solution to this problem, and enables the interface to adapt to individual users. However, this approach tends to require a large amount of human-in-the-loop training data, especially when feedback is sparse. We propose a hierarchical solution that learns efficiently from sparse user feedback: we use offline pre-training to acquire a latent embedding space of useful, high-level robot behaviors, which, in turn, enables the system to focus on using online user feedback to learn a mapping from user inputs to desired high-level behaviors. The key insight is that access to a pre-trained policy enables the system to learn more from sparse rewards than a naïve RL algorithm: using the pre-trained policy, the system can make use of successful task executions to relabel, in hindsight, what the user actually meant to do during unsuccessful executions. We evaluate our method primarily through a user study with 12 participants who perform tasks in three simulated robotic manipulation domains using a webcam and their eye gaze: flipping light switches, opening a shelf door to reach objects inside, and rotating a valve. The results show that our method successfully learns to map 128-dimensional gaze features to 7-dimensional joint torques from sparse rewards in under 10 minutes of online training, and seamlessly helps users who employ different gaze strategies, while adapting to distributional shift in webcam inputs, tasks, and environments.
Comparative study of 3D object detection frameworks based on LiDAR data and sensor fusion techniques
Authors: Sreenivasa Hikkal Venugopala
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Precisely estimating and understanding a vehicle's surroundings is a basic and crucial step for an autonomous vehicle. The perception system plays a significant role in providing an accurate interpretation of the vehicle's environment in real time. Generally, the perception system involves various subsystems such as localization, static and dynamic obstacle detection and avoidance, and mapping. To perceive the environment, these vehicles are equipped with various exteroceptive sensors, both passive and active, in particular cameras, radars, and LiDARs. These systems use deep learning techniques to transform the huge amount of sensor data into semantic information on which the object detection and localization tasks are performed. For numerous driving tasks, accurate results require the location and depth information of a particular object. 3D object detection methods, by utilizing the additional pose data from sensors such as LiDARs and stereo cameras, provide information on the size and location of objects. Based on recent research, 3D object detection frameworks performing object detection and localization on LiDAR data and with sensor fusion techniques show significant improvements in performance. In this work, we perform a comparative study of the effect of using LiDAR data in object detection frameworks and of the performance improvement obtained with sensor fusion techniques, discussing various state-of-the-art methods in both cases, performing experimental analysis, and providing future research directions.
Symmetric Volume Maps
Authors: S. Mazdak Abulnaga, Oded Stein, Polina Golland, Justin Solomon
Abstract
Although shape correspondence is a central problem in geometry processing, most methods for this task apply only to two-dimensional surfaces. The neglected task of volumetric correspondence--a natural extension relevant to shapes extracted from simulation, medical imaging, volume rendering, and even improving surface maps of boundary representations--presents unique challenges that do not appear in the two-dimensional case. In this work, we propose a method for mapping between volumes represented as tetrahedral meshes. Our formulation minimizes a distortion energy designed to extract maps symmetrically, i.e., without dependence on the ordering of the source and target domains. We accompany our method with theoretical discussion describing the consequences of this symmetry assumption, leading us to select a symmetrized ARAP energy that favors isometric correspondences. Our final formulation optimizes for near-isometry while matching the boundary. We demonstrate our method on a diverse geometric dataset, producing low-distortion matchings that align to the boundary.
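For intuition, a standard as-rigid-as-possible (ARAP) distortion energy for a map f between tetrahedral meshes, and the symmetrization idea the abstract describes, can be written as follows; this is a generic formulation under the assumption that J_t(f) denotes the per-tetrahedron Jacobian and w_t a per-element weight, not necessarily the paper's exact energy.

```latex
% Generic ARAP-style distortion energy for a volumetric map f, with J_t(f)
% the per-tetrahedron Jacobian; the symmetrized form also penalizes the
% inverse map so the energy does not depend on source/target ordering.
E_{\mathrm{ARAP}}(f) = \sum_{t \in \mathcal{T}} w_t
    \min_{R_t \in SO(3)} \left\lVert J_t(f) - R_t \right\rVert_F^2,
\qquad
E_{\mathrm{sym}}(f) = E_{\mathrm{ARAP}}(f) + E_{\mathrm{ARAP}}(f^{-1}).
```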
Ethics, Rules of Engagement, and AI: Neural Narrative Mapping Using Large Transformer Language Models
Authors: Philip Feldman, Aaron Dant, David Rosenbluth
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
The problem of determining whether a military unit has correctly understood an order and is properly executing it has bedeviled military planners throughout history. The advent of advanced language models such as OpenAI's GPT series offers new possibilities for addressing this problem. This paper presents a mechanism to harness the narrative output of large language models and produce diagrams or "maps" of the relationships that are latent in the weights of models such as GPT-3. The resulting "Neural Narrative Maps" (NNMs) are intended to provide insight into the organization of information, opinion, and belief in the model, which in turn provides a means to understand intent and response in the context of physical distance. This paper discusses the problem of mapping information spaces in general, and then presents a concrete implementation of this concept with OpenAI's GPT-3 language model for determining whether a subordinate is following a commander's intent in a high-risk situation. A subordinate's location within the NNM provides a novel capability to evaluate the subordinate's intent with respect to the commander. We show that it is possible to determine not only whether they are nearby in narrative space, but also how they are oriented and what "trajectory" they are on. Our results show that our method produces high-quality maps and demonstrates new ways of evaluating intent more generally.
Multi-domain Unsupervised Image-to-Image Translation with Appearance Adaptive Convolution
Authors: Somi Jeong, Jiyoung Lee, Kwanghoon Sohn
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Over the past few years, image-to-image (I2I) translation methods have been proposed to translate a given image into diverse outputs. Despite the impressive results, they mainly focus on I2I translation between two domains, so multi-domain I2I translation remains a challenge. To address this problem, we propose a novel multi-domain unsupervised image-to-image translation (MDUIT) framework that leverages decomposed content features and appearance-adaptive convolution to translate an image into a target appearance while preserving the given geometric content. We also exploit a contrastive learning objective, which improves disentanglement and effectively utilizes multi-domain image data in the training process by pairing semantically similar images. This allows our method to learn the diverse mappings between multiple visual domains with only a single framework. We show that the proposed method produces visually diverse and plausible results in multiple domains compared to state-of-the-art methods.
Metric-valued regression
Abstract
We propose an efficient algorithm for learning mappings between two metric spaces, $\mathcal{X}$ and $\mathcal{Y}$. Our procedure is strongly Bayes-consistent whenever $\mathcal{X}$ and $\mathcal{Y}$ are topologically separable and $\mathcal{Y}$ is "bounded in expectation" (our term; the separability assumption can be somewhat weakened). At this level of generality, ours is the first such learnability result for unbounded loss in the agnostic setting. Our technique is based on metric medoids (a variant of Fréchet means) and presents a significant departure from existing methods, which, as we demonstrate, fail to achieve Bayes-consistency on general instance- and label-space metrics. Our proofs introduce the technique of semi-stable compression, which may be of independent interest.
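As a small, hedged illustration of the medoid idea (the sample point minimizing the summed distance to all others, well-defined in any metric space, with no vector-space structure required), here is a generic NumPy version; the paper's algorithm and guarantees go well beyond this.

```python
# Metric medoid: the sample point that minimizes the summed distance to all
# other samples. Unlike the Frechet mean, it needs only a metric. This
# generic helper is illustrative, not the paper's algorithm.
import numpy as np

def medoid(points, dist):
    """Return the element of `points` minimizing the sum of distances to the rest."""
    costs = [sum(dist(p, q) for q in points) for p in points]
    return points[int(np.argmin(costs))]

rng = np.random.default_rng(0)
ys = [rng.standard_normal(3) for _ in range(50)]
m = medoid(ys, dist=lambda a, b: np.linalg.norm(a - b))  # Euclidean metric here
print(m)
```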
Sequential Channel Synthesis
Abstract
The channel synthesis problem has been widely investigated over the last decade. In this paper, we consider the sequential version, in which the encoder and the decoder work in a sequential way. Under a mild assumption on the target joint distribution, we provide a complete (single-letter) characterization of the solution for the point-to-point case, which shows that the canonical symbol-by-symbol mapping is not optimal in general, but is optimal under some additional assumptions on the encoder and decoder. We also extend this result to the broadcast scenario and the interactive communication scenario. We provide bounds in the broadcast setting and a complete characterization of the solution under a mild condition on the target joint distribution in the interactive communication case. Our proofs are based on a Rényi entropy method.
Autonomous Vehicles: Open-Source Technologies, Considerations, and Development
Abstract
Autonomous vehicles are the culmination of advances in many areas such as sensor technologies, artificial intelligence (AI), and networking. This paper introduces the reader to the technologies used to build autonomous vehicles, focusing on open-source tools and libraries that make it cheaper and easier for developers and researchers to participate in the field. The topics covered are as follows. First, we discuss the sensors used in autonomous vehicles and summarize their performance in different environments, their costs, and their unique features. Second, we cover Simultaneous Localization and Mapping (SLAM) and the algorithms for each sensing modality. Third, we review popular open-source driving simulators, a cost-effective way to train machine learning models and test vehicle software. Fourth, we highlight embedded operating systems and the security and development considerations involved in choosing one. Fifth, we discuss Vehicle-to-Vehicle (V2V) and Internet-of-Vehicles (IoV) communication, areas that fuse networking technologies with autonomous vehicles to extend their functionality. We then review the five levels of vehicle automation and the features of commercial and open-source Advanced Driver Assistance Systems. Finally, we touch on the major manufacturing and software companies involved in the field, their investments, and their partnerships. These topics give the reader an understanding of the industry, its technologies, active research, and the tools available for developers to build autonomous vehicles.
Network Resource Allocation Strategy Based on Deep Reinforcement Learning
Abstract
The traditional Internet has encountered a bottleneck in allocating network resources for emerging technology needs. Network virtualization (NV) is a future network architecture, and the virtual network embedding (VNE) algorithms that support it show great potential for solving resource allocation problems. Combined with efficient machine learning (ML) algorithms, a neural network model close to the substrate network environment is constructed to train a reinforcement learning agent. This paper proposes a two-stage VNE algorithm based on deep reinforcement learning (DRL) (TS-DRL-VNE) to address the problem that the mapping results of existing heuristic algorithms easily converge to local optima. For the problem that existing ML-based VNE algorithms often ignore the importance of the substrate network representation and the training mode, a DRL-based VNE algorithm using a full attribute matrix (FAM-DRL-VNE) is proposed. In view of the problem that existing VNE algorithms often ignore underlying resource changes between virtual network requests, a DRL-based VNE algorithm based on matrix perturbation theory (MPT-DRL-VNE) is proposed. Experimental results show that the above algorithms are superior to other algorithms.
GMC -- Geometric Multimodal Contrastive Representation Learning
Abstract
Learning representations of multimodal data that are both informative and robust to missing modalities at test time remains a challenging problem due to the inherent heterogeneity of data obtained from different channels. To address it, we present a novel Geometric Multimodal Contrastive (GMC) representation learning method comprising two main components: i) a two-level architecture consisting of modality-specific base encoders, which process an arbitrary number of modalities into intermediate representations of fixed dimensionality, and a shared projection head, which maps the intermediate representations to a latent representation space; ii) a multimodal contrastive loss function that encourages the geometric alignment of the learned representations. We experimentally demonstrate that GMC representations are semantically rich and achieve state-of-the-art performance with missing modality information on three different learning problems, including prediction and reinforcement learning tasks.
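A hedged sketch of the kind of contrastive alignment such a method uses: an InfoNCE-style loss that pulls each modality-specific projection toward the joint representation of the same sample and away from the other samples in the batch. The temperature and shapes are assumptions; GMC's exact loss may differ.

```python
# InfoNCE-style contrastive alignment between a modality-specific embedding
# and a joint embedding of the same samples. Illustrative of geometric
# alignment losses in general; not necessarily GMC's exact formulation.
import torch
import torch.nn.functional as F

def alignment_loss(z_mod: torch.Tensor, z_joint: torch.Tensor,
                   tau: float = 0.1) -> torch.Tensor:
    """z_mod, z_joint: (B, d) embeddings of the same batch of samples."""
    z_mod = F.normalize(z_mod, dim=1)
    z_joint = F.normalize(z_joint, dim=1)
    logits = z_mod @ z_joint.T / tau          # (B, B) cosine similarities
    labels = torch.arange(z_mod.size(0))      # positives on the diagonal
    return F.cross_entropy(logits, labels)

z_image = torch.randn(32, 64)   # projection of one modality (e.g., image)
z_all = torch.randn(32, 64)     # projection of the complete multimodal input
print(alignment_loss(z_image, z_all).item())
```

Aligning each modality to the joint representation is what makes test-time inference with missing modalities plausible: any single modality's projection can stand in for the complete one.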
Keyword: localization
Comparative study of 3D object detection frameworks based on LiDAR data and sensor fusion techniques
Authors: Sreenivasa Hikkal Venugopala
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Precisely estimating and understanding a vehicle's surroundings is a basic and crucial step for an autonomous vehicle. The perception system plays a significant role in providing an accurate interpretation of the vehicle's environment in real time. Generally, the perception system involves various subsystems such as localization, static and dynamic obstacle detection and avoidance, and mapping. To perceive the environment, these vehicles are equipped with various exteroceptive sensors, both passive and active, in particular cameras, radars, and LiDARs. These systems use deep learning techniques to transform the huge amount of sensor data into semantic information on which the object detection and localization tasks are performed. For numerous driving tasks, accurate results require the location and depth information of a particular object. 3D object detection methods, by utilizing the additional pose data from sensors such as LiDARs and stereo cameras, provide information on the size and location of objects. Based on recent research, 3D object detection frameworks performing object detection and localization on LiDAR data and with sensor fusion techniques show significant improvements in performance. In this work, we perform a comparative study of the effect of using LiDAR data in object detection frameworks and of the performance improvement obtained with sensor fusion techniques, discussing various state-of-the-art methods in both cases, performing experimental analysis, and providing future research directions.
Autonomous Vehicles: Open-Source Technologies, Considerations, and Development
Abstract
Autonomous vehicles are the culmination of advances in many areas such as sensor technologies, artificial intelligence (AI), and networking. This paper introduces the reader to the technologies used to build autonomous vehicles, focusing on open-source tools and libraries that make it cheaper and easier for developers and researchers to participate in the field. The topics covered are as follows. First, we discuss the sensors used in autonomous vehicles and summarize their performance in different environments, their costs, and their unique features. Second, we cover Simultaneous Localization and Mapping (SLAM) and the algorithms for each sensing modality. Third, we review popular open-source driving simulators, a cost-effective way to train machine learning models and test vehicle software. Fourth, we highlight embedded operating systems and the security and development considerations involved in choosing one. Fifth, we discuss Vehicle-to-Vehicle (V2V) and Internet-of-Vehicles (IoV) communication, areas that fuse networking technologies with autonomous vehicles to extend their functionality. We then review the five levels of vehicle automation and the features of commercial and open-source Advanced Driver Assistance Systems. Finally, we touch on the major manufacturing and software companies involved in the field, their investments, and their partnerships. These topics give the reader an understanding of the industry, its technologies, active research, and the tools available for developers to build autonomous vehicles.
Crafting Better Contrastive Views for Siamese Representation Learning
Authors: Xiangyu Peng, Kai Wang, Zheng Zhu, Yang You
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Recent self-supervised contrastive learning methods greatly benefit from the Siamese structure, which aims at minimizing distances between positive pairs. For high-performance Siamese representation learning, one of the keys is to design good contrastive pairs. Most previous works simply apply random sampling to make different crops of the same image, which overlooks the semantic information and may degrade the quality of views. In this work, we propose ContrastiveCrop, which effectively generates better crops for Siamese representation learning. First, a semantic-aware object localization strategy is proposed within the training process in a fully unsupervised manner. This guides us to generate contrastive views that avoid most false positives (i.e., object vs. background). Moreover, we empirically find that views with similar appearances are trivial for Siamese model training. Thus, a center-suppressed sampling is further designed to enlarge the variance of crops. Remarkably, our method carefully considers positive pairs for contrastive learning with negligible extra training overhead. As a plug-and-play and framework-agnostic module, ContrastiveCrop consistently improves SimCLR, MoCo, BYOL, and SimSiam by 0.4% ~ 2.0% classification accuracy on CIFAR-10, CIFAR-100, Tiny ImageNet, and STL-10. Superior results are also achieved on downstream detection and segmentation tasks when pre-trained on ImageNet-1K.
Keyword: SLAM
Temporal Point Cloud Completion with Pose Disturbance
Autonomous Vehicles: Open-Source Technologies, Considerations, and Development
Keyword: Visual inertial
There is no result
Keyword: livox
There is no result
Keyword: loam
There is no result
Keyword: Visual inertial odometry
There is no result
Keyword: lidar
Vehicular Visible Light Communications for Automated Valet Parking
Comparative study of 3D object detection frameworks based on LiDAR data and sensor fusion techniques
LiDAR dataset distillation within Bayesian active learning framework: Understanding the effect of data augmentation
Simulation-to-Reality domain adaptation for offline 3D object annotation on pointclouds with correlation alignment
Multi-modal Sensor Fusion for Auto Driving Perception: A Survey
Keyword: loop detection
There is no result
Keyword: autonomous driving
Learning Interpretable, High-Performing Policies for Continuous Control Problems
Temporal Robustness of Stochastic Signals
LiDAR dataset distillation within Bayesian active learning framework: Understanding the effect of data augmentation
Simulation-to-Reality domain adaptation for offline 3D object annotation on pointclouds with correlation alignment
Multi-modal Sensor Fusion for Auto Driving Perception: A Survey
Automated Vehicle Safety Guarantee, Verification and Certification: A Survey
3D Object Detection from Images for Autonomous Driving: A Survey
Transformers in Self-Supervised Monocular Depth Estimation with Unknown Camera Intrinsics
Discrete-Event Controller Synthesis for Autonomous Systems with Deep-Learning Perception Components
Keyword: mapping
Condensation Jacobian with Adaptivity
Neural Logic Analogy Learning
Age of Information-based Scheduling for Wireless D2D Systems with a Deep Learning Approach
Security-Aware Virtual Network Embedding Algorithm based on Reinforcement Learning
ASHA: Assistive Teleoperation via Human-in-the-Loop Reinforcement Learning
Comparative study of 3D object detection frameworks based on LiDAR data and sensor fusion techniques
Symmetric Volume Maps
Ethics, Rules of Engagement, and AI: Neural Narrative Mapping Using Large Transformer Language Models
Multi-domain Unsupervised Image-to-Image Translation with Appearance Adaptive Convolution
Metric-valued regression
Sequential Channel Synthesis
Autonomous Vehicles: Open-Source Technologies, Considerations, and Development
Network Resource Allocation Strategy Based on Deep Reinforcement Learning
GMC -- Geometric Multimodal Contrastive Representation Learning
Keyword: localization
Comparative study of 3D object detection frameworks based on LiDAR data and sensor fusion techniques
Autonomous Vehicles: Open-Source Technologies, Considerations, and Development
Crafting Better Contrastive Views for Siamese Representation Learning