Abstract
SLAM algorithms rely on the assumption that the environment is static. Dynamic elements in the environment violate this assumption and corrupt the matched points, which directly degrades the accuracy of subsequent camera pose estimation. Recent related works generally combine semantic and geometric constraints to handle dynamic objects, but they suffer from problems such as poor real-time performance, a tendency to treat people as rigid bodies, and poor performance in low-dynamic scenes. In this paper, we propose DYP-SLAM, a visual SLAM algorithm for dynamic scenes based on object detection and static probability. The algorithm combines semantic and geometric constraints to compute the static probabilities of objects, keypoints and map points, and uses them as weights in camera pose estimation. The proposed algorithm is evaluated on a public dataset and compared with a variety of state-of-the-art algorithms. It achieves the best results in almost all low-dynamic and high-dynamic scenarios while maintaining high real-time performance.
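The abstract says the static probabilities are used as weights in camera pose estimation but does not give the solver. As a minimal sketch of that weighting idea only, the Python below solves a weighted 2D rigid alignment (weighted Procrustes) between matched points instead of DYP-SLAM's full 3D camera pose problem; the function name `weighted_rigid_2d` and the weight values are illustrative assumptions. Each weight plays the role of a static probability, so a match on a moving object with a probability near zero has almost no influence on the estimated rotation and translation.

```python
import math


def weighted_rigid_2d(src, dst, weights):
    """Estimate (theta, t) minimizing sum_i w_i * ||R(theta) p_i + t - q_i||^2.

    src, dst: matched 2D points p_i and q_i as (x, y) tuples.
    weights:  non-negative per-match weights (here: assumed static probabilities).
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    # Weighted centroids of both point sets.
    px = sum(w * p[0] for p, w in zip(src, weights)) / total
    py = sum(w * p[1] for p, w in zip(src, weights)) / total
    qx = sum(w * q[0] for q, w in zip(dst, weights)) / total
    qy = sum(w * q[1] for q, w in zip(dst, weights)) / total
    # Weighted dot and cross products of the centered points.
    s_cos = 0.0
    s_sin = 0.0
    for (x, y), (u, v), w in zip(src, dst, weights):
        ax, ay = x - px, y - py
        bx, by = u - qx, v - qy
        s_cos += w * (ax * bx + ay * by)
        s_sin += w * (ax * by - ay * bx)
    # The optimal angle maximizes s_cos * cos(theta) + s_sin * sin(theta).
    theta = math.atan2(s_sin, s_cos)
    c, s = math.cos(theta), math.sin(theta)
    # The translation maps the rotated source centroid onto the target centroid.
    tx = qx - (c * px - s * py)
    ty = qy - (s * px + c * py)
    return theta, (tx, ty)


# Usage: four static matches share one rigid motion; the fifth lies on a moving object.
true_theta = math.radians(30.0)


def move(p):
    c, s = math.cos(true_theta), math.sin(true_theta)
    return (c * p[0] - s * p[1] + 2.0, s * p[0] + c * p[1] - 1.0)


src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (5.0, 5.0)]
dst = [move(p) for p in src[:4]] + [(20.0, -7.0)]
static_prob = [0.9, 0.9, 0.9, 0.9, 0.0]
theta, t = weighted_rigid_2d(src, dst, static_prob)
```

With weight 0.0 on the dynamic match, the recovered angle (30 degrees) and translation (2, -1) equal the true static motion; giving that match full weight pulls the estimate far away from it.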
Keyword: Visual inertial
There is no result
Keyword: livox
There is no result
Keyword: loam
There is no result
Keyword: Visual inertial odometry
There is no result
Keyword: lidar
There is no result
Keyword: loop detection
There is no result
Keyword: autonomous driving
A Machine Learning Smartphone-based Sensing for Driver Behavior Classification
Authors: Sarra Ben Brahim, Hakim Ghazzai, Hichem Besbes, Yehia Massoud
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Abstract
Driver behavior profiling is one of the main issues in the insurance industry and fleet management, so classifying driver behavior with low-cost mobile applications remains in the spotlight of autonomous driving. However, using mobile sensors raises security, privacy, and trust concerns. To overcome these challenges, we propose to use the CARLA simulator to collect data from the sensors available in smartphones (accelerometer, gyroscope, GPS) in order to classify driver behavior from speed, acceleration, direction, and the 3-axis rotation angles (yaw, pitch, roll), taking into account the speed limit of the current road and the weather conditions to better identify risky behavior. Second, after fusing inter-axial data from multiple sensors into a single file, we explore different machine learning algorithms for time series classification to determine which one achieves the highest performance.
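The abstract does not describe how the inter-axial sensor data are fused into a single file. One common approach, shown here as a hedged sketch rather than the paper's procedure, is to align every stream to a reference clock by taking each stream's most recent sample (a zero-order hold); the function `fuse_streams`, the stream names and the field names are hypothetical.

```python
from bisect import bisect_right


def fuse_streams(base, others):
    """Align timestamped sensor streams onto the timestamps of `base`.

    base:   list of (t, {field: value}) sorted by t, e.g. accelerometer samples
    others: {stream_name: list of (t, {field: value}) sorted by t}
    Each output row holds the base sample plus, for every other stream, its most
    recent sample with timestamp <= t (zero-order hold), prefixed by stream name.
    Fields are None until a stream has produced its first sample.
    """
    times = {name: [t for t, _ in stream] for name, stream in others.items()}
    fused = []
    for t, values in base:
        row = {"t": t, **values}
        for name, stream in others.items():
            if not stream:
                continue  # an empty stream contributes no columns
            i = bisect_right(times[name], t) - 1
            # Latest sample at or before t; otherwise None placeholders.
            sample = stream[i][1] if i >= 0 else dict.fromkeys(stream[0][1])
            for field, value in sample.items():
                row[f"{name}_{field}"] = value
        fused.append(row)
    return fused


# Usage with made-up samples: accelerometer at 10 Hz, GPS speed at a different rate.
acc = [(0.0, {"ax": 0.1}), (0.1, {"ax": 0.2}), (0.2, {"ax": 0.3})]
gps = [(0.05, {"speed": 10.0}), (0.15, {"speed": 11.0})]
rows = fuse_streams(acc, {"gps": gps})
```

Each resulting row could then be written as one line of the single fused file used for time series classification.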
Ad-datasets: a meta-collection of data sets for autonomous driving
Authors: Daniel Bogdoll, Felix Schreyer, J. Marius Zöllner
Abstract
Autonomous driving is among the largest domains in which deep learning has been fundamental for progress within the last years. The rise of datasets went hand in hand with this development. All the more striking is the fact that researchers do not have a tool available that provides a quick, comprehensive and up-to-date overview of data sets and their features in the domain of autonomous driving. In this paper, we present ad-datasets, an online tool that provides such an overview for more than 150 data sets. The tool enables users to sort and filter the data sets according to currently 16 different categories. ad-datasets is an open-source project with community contributions. It is in constant development, ensuring that the content stays up-to-date.
Fixed-Point Code Synthesis For Neural Networks
Abstract
Over the last few years, neural networks have started penetrating safety-critical systems to make decisions in robots, rockets, autonomous cars, etc. A problem is that these critical systems often have limited computing resources. They often use fixed-point arithmetic for its many advantages (speed, compatibility with small-memory devices). In this article, a new technique is introduced to tune the formats (precision) of already trained neural networks using fixed-point arithmetic, which can be implemented using integer operations only. The new optimized neural network computes its output with fixed-point numbers without degrading the accuracy beyond a threshold fixed by the user. Fixed-point code is synthesized for the new optimized neural network, guaranteeing that the threshold is respected for any input vector belonging to the range [xmin, xmax] determined during the analysis. From a technical point of view, we first perform a preliminary analysis of our floating-point neural network to determine the worst cases, then we generate a system of linear constraints over integer variables that we solve by linear programming. The solution of this system gives the new fixed-point format of each neuron. The experimental results show the efficiency of our method, which ensures that the new fixed-point neural network has the same behavior as the initial floating-point neural network.
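A fixed-point format can be described by the number of fractional bits f: a real value x is stored as the integer round(x * 2^f), and a neuron's weighted sum can then be evaluated with integer multiplications, additions and one right shift. The Python below sketches this for a single neuron; `smallest_format` is a hypothetical brute-force search that checks the user threshold on one input vector only, whereas the paper guarantees it for every input in [xmin, xmax] by solving a linear program.

```python
def to_fixed(x, frac_bits):
    """Encode real x as an integer with `frac_bits` fractional bits.

    Python's round() rounds to the nearest integer, with ties to even.
    """
    return round(x * (1 << frac_bits))


def from_fixed(n, frac_bits):
    """Decode an integer with `frac_bits` fractional bits back to a float."""
    return n / (1 << frac_bits)


def fixed_neuron(weights, inputs, bias, frac_bits):
    """Compute dot(weights, inputs) + bias with integer operations only.

    All arguments are already encoded with `frac_bits` fractional bits. Each
    product carries 2 * frac_bits fractional bits, so the accumulator is shifted
    right once (an arithmetic shift, i.e. floor) before the bias is added.
    """
    acc = 0
    for w, x in zip(weights, inputs):
        acc += w * x
    return (acc >> frac_bits) + bias


def smallest_format(weights, inputs, bias, threshold, max_bits=32):
    """Hypothetical brute-force search: the fewest fractional bits whose output
    stays within `threshold` of the floating-point result for ONE input vector."""
    exact = sum(w * x for w, x in zip(weights, inputs)) + bias
    for f in range(max_bits + 1):
        qw = [to_fixed(w, f) for w in weights]
        qx = [to_fixed(x, f) for x in inputs]
        out = from_fixed(fixed_neuron(qw, qx, to_fixed(bias, f), f), f)
        if abs(out - exact) <= threshold:
            return f, out
    return None
```

For example, `smallest_format([0.5, -0.25], [1.0, 2.0], 0.125, threshold=0.01)` returns `(3, 0.125)`: with 0, 1 or 2 fractional bits the rounded weights and bias miss the exact output 0.125 by more than the threshold.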
A Survey on Safety-critical Scenario Generation from Methodological Perspective
Abstract
Autonomous driving systems have developed greatly in recent years thanks to advances in sensing and decision-making. One critical obstacle to their massive deployment in the real world is safety evaluation. Most existing driving systems are still trained and evaluated on naturalistic scenarios, which account for the vast majority of daily driving, or on heuristically generated adversarial ones. However, the large population of cars requires an extremely low collision rate, so safety-critical scenarios collected in the real world are rare. Thus, methods to artificially generate such scenarios become critical to manage risk and reduce cost. In this survey, we focus on algorithms for safety-critical scenario generation. We first provide a comprehensive taxonomy of existing algorithms by dividing them into three categories: data-driven generation, adversarial generation, and knowledge-based generation. Then, we discuss useful tools for scenario generation, including simulation platforms and packages. Finally, we extend our discussion to five main challenges of current works (fidelity, efficiency, diversity, transferability, and controllability) and the research opportunities these challenges open up.
Keyword: mapping
Multi Objective Resource Optimization of Wireless Network Based on Cross Domain Virtual Network Embedding
Abstract
The rapid development of virtual network architectures makes the wide use of wireless networks possible. With the popularity of artificial intelligence (AI) in daily life, efficient resource allocation in wireless networks has become a problem, especially when network users request wireless network resources from different management domains, where many practical problems remain. From the perspective of virtual network embedding (VNE), this paper designs and implements a multi-objective VNE optimization algorithm for wireless network resource allocation. Resource allocation in a virtual network is essentially the problem of allocating underlying resources to virtual network requests (VNRs). In the proposed objective formula, we jointly optimize the mapping cost, network delay and VNR acceptance rate. VNE is completed through node mapping and link mapping. In experiments and simulations comparing it with other VNE algorithms, the cross-domain VNE algorithm proposed in this paper performs best on all three indicators, which shows its effectiveness for wireless network resource allocation.
Generative Modeling of Complex Data
Authors: Luca Canale, Nicolas Grislain, Grégoire Lothe, Johan Leduc
Abstract
In recent years, several models have improved the capacity to generate synthetic tabular datasets. However, such models focus on synthesizing simple columnar tables and are not usable on real-life data with complex structures. This paper puts forward a generic framework to synthesize more complex data structures with composite and nested types. It then proposes one practical implementation, built with causal transformers, for structs (mappings of types) and lists (repeated instances of a type). The results on standard benchmark datasets show that this implementation consistently outperforms current state-of-the-art models in terms of both machine learning utility and statistical similarity. Moreover, it shows very strong results on two complex hierarchical datasets with multiple nesting and sparse data, which were previously out of reach.
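The abstract does not specify how nested records are presented to the causal transformer. One plausible encoding, sketched below purely as an assumption, flattens structs and lists into a token sequence with explicit start and end markers, which a left-to-right model could generate one token at a time; the token names and both functions are hypothetical. The inverse function shows that this flattening loses no structure.

```python
def serialize(value):
    """Flatten a nested value into a list of tokens.

    Structs (dicts) become STRUCT_START, (KEY, child)..., STRUCT_END;
    lists become LIST_START, child..., LIST_END; leaves become ("LEAF", value).
    """
    if isinstance(value, dict):
        tokens = [("STRUCT_START",)]
        for key in sorted(value):  # fixed field order for a stable sequence
            tokens.append(("KEY", key))
            tokens.extend(serialize(value[key]))
        tokens.append(("STRUCT_END",))
        return tokens
    if isinstance(value, list):
        tokens = [("LIST_START",)]
        for item in value:  # repeated instances of the element type
            tokens.extend(serialize(item))
        tokens.append(("LIST_END",))
        return tokens
    return [("LEAF", value)]


def deserialize(tokens):
    """Inverse of serialize: rebuild the nested value from its token list."""

    def parse(i):
        kind = tokens[i][0]
        if kind == "LEAF":
            return tokens[i][1], i + 1
        if kind == "LIST_START":
            items, i = [], i + 1
            while tokens[i][0] != "LIST_END":
                item, i = parse(i)
                items.append(item)
            return items, i + 1
        if kind == "STRUCT_START":
            struct, i = {}, i + 1
            while tokens[i][0] != "STRUCT_END":
                key = tokens[i][1]  # a KEY token precedes each field value
                struct[key], i = parse(i + 1)
            return struct, i + 1
        raise ValueError(f"unexpected token {tokens[i]!r}")

    value, end = parse(0)
    if end != len(tokens):
        raise ValueError("trailing tokens after a complete value")
    return value


# Usage: a record with a nested list of structs and an empty list.
record = {"id": 1, "visits": [{"day": "mon", "cost": 2.5}, {"day": "tue", "cost": None}], "tags": []}
tokens = serialize(record)
```

A sparse field (here `cost: None`) or an empty list still produces explicit tokens, so the generator always sees where each nested level begins and ends.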
Geometrically Higher Order Unfitted Space-Time Methods for PDEs on Moving Domains
Authors: Fabian Heimann, Christoph Lehrenfeld, Janosch Preuß
Abstract
In this paper, we propose new geometrically unfitted space-time finite element methods of higher order accuracy in space and time for partial differential equations posed on moving domains. As a model problem, we study the convection-diffusion problem on a moving domain. For geometrically higher order accuracy, we apply a parametric mapping on a background space-time tensor-product mesh. For the discretisation in time, we consider discontinuous Galerkin methods, as well as related continuous (Petrov-)Galerkin and Galerkin collocation methods. A ghost penalty stabilisation is employed, both for stabilisation with respect to bad cut configurations and as the extension mechanism required for the latter two schemes. The article puts an emphasis on the techniques that make it possible to achieve robust but higher order geometry handling for smooth domains. We investigate the computational properties of the respective methods in a series of numerical experiments. These include studies in different dimensions with different polynomial degrees in space and time, validating the higher order accuracy in both variables.
EcoFlow: Efficient Convolutional Dataflows for Low-Power Neural Network Accelerators
Abstract
Dilated and transposed convolutions are widely used in modern convolutional neural networks (CNNs). These kernels are used extensively during CNN training and inference in applications such as image segmentation and high-resolution image generation. Although these kernels have grown in popularity, they stress current compute systems due to their high memory intensity, exascale compute demands, and large energy consumption. We find that commonly used low-power CNN inference accelerators based on spatial architectures are not optimized for these two convolutional kernels. Dilated and transposed convolutions introduce significant zero padding when mapped to the underlying spatial architecture, significantly degrading performance and energy efficiency. Existing approaches that address this issue require significant design changes to the otherwise simple, efficient, and well-adopted architectures used to compute direct convolutions. To address this challenge, we propose EcoFlow, a new set of dataflows and mapping algorithms for dilated and transposed convolutions. These algorithms are tailored to execute efficiently on existing low-cost, small-scale spatial architectures and require minimal changes to the network-on-chip of existing accelerators. EcoFlow eliminates zero padding through careful dataflow orchestration and data mapping tailored to the spatial architecture. EcoFlow enables flexible and high-performance transposed and dilated convolutions on architectures that are otherwise optimized for CNN inference. We evaluate the efficiency of EcoFlow on CNN training workloads and Generative Adversarial Network (GAN) training workloads. Experiments in our new cycle-accurate simulator show that EcoFlow 1) reduces end-to-end CNN training time by 7-85%, and 2) improves end-to-end GAN training performance by 29-42%, compared to state-of-the-art CNN inference accelerators.
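The zero padding mentioned above can be estimated with simple arithmetic. Assuming a naive mapping in which a dilated k x k kernel is stored as a dense kernel and a stride-s transposed convolution is computed as a direct convolution over a zero-upsampled input, the Python below counts the zeros the accelerator would process; it illustrates the problem EcoFlow addresses, not EcoFlow's own dataflow, and the function names are hypothetical.

```python
def dilated_kernel_zeros(k, d):
    """Zero entries when a k x k kernel with dilation d is stored densely.

    Returns (number_of_zeros, effective_kernel_size)."""
    k_eff = d * (k - 1) + 1  # taps are spread d positions apart
    return k_eff * k_eff - k * k, k_eff


def transposed_input_zeros(n, s):
    """Zeros inserted when an n x n input is upsampled for a stride-s transposed
    convolution by placing s - 1 zeros between neighboring elements.

    Border padding is not counted. Returns (number_of_zeros, upsampled_size)."""
    m = (n - 1) * s + 1
    return m * m - n * n, m


def zero_fraction(zeros, size):
    """Fraction of a size x size array that is zero."""
    return zeros / (size * size)
```

For example, a 3 x 3 kernel with dilation 2 becomes a 5 x 5 dense kernel in which 16 of the 25 entries are zero, and a 4 x 4 input upsampled for stride 2 becomes 7 x 7 with 33 of its 49 entries equal to zero; every multiply-accumulate on those entries is wasted work.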
Interactive Mobile App Navigation with Uncertain or Under-specified Natural Language Commands
Authors: Andrea Burns, Deniz Arsan, Sanjna Agrawal, Ranjitha Kumar, Kate Saenko, Bryan A. Plummer
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
Abstract
We introduce Mobile app Tasks with Iterative Feedback (MoTIF), a new dataset where the goal is to complete a natural language query in a mobile app. Current datasets for related tasks in interactive question answering, visual common sense reasoning, and question-answer plausibility prediction do not support research in resolving ambiguous natural language requests or operating in diverse digital domains. As a result, they fail to capture the complexities of real question answering or interactive tasks. In contrast, MoTIF contains natural language requests that are not satisfiable, making it the first work to investigate this issue for interactive vision-language tasks. MoTIF also contains follow-up questions for ambiguous queries to enable research on task uncertainty resolution. We introduce task feasibility prediction and propose an initial model which obtains an F1 score of 61.1. We next benchmark task automation with our dataset and find that adaptations of prior work perform poorly due to our realistic language requests, obtaining an accuracy of only 20.2% when mapping commands to grounded actions. We analyze performance and gain insights for future work that may bridge the gap between current model ability and what is needed for successful use in applications.
Keyword: localization
Danish Airs and Grounds: A Dataset for Aerial-to-Street-Level Place Recognition and Localization
Authors: Andrea Vallone, Frederik Warburg, Hans Hansen, Søren Hauberg, Javier Civera
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Abstract
Place recognition and visual localization are particularly challenging in wide-baseline configurations. In this paper, we contribute the Danish Airs and Grounds (DAG) dataset, a large collection of street-level and aerial images targeting such cases. Its main challenge lies in the extreme viewing-angle difference between query and reference images, with consequent changes in illumination and perspective. The dataset is larger and more diverse than currently available public data, covering more than 50 km of road in urban, suburban and rural areas. All images are associated with accurate 6-DoF metadata that allows the benchmarking of visual localization methods. We also propose a map-to-image re-localization pipeline that first estimates a dense 3D reconstruction from the aerial images and then matches query street-level images to street-level renderings of the 3D model. The dataset can be downloaded at: https://frederikwarburg.github.io/DAG
Multi-Output Gaussian Process-Based Data Augmentation for Multi-Building and Multi-Floor Indoor Localization
Authors: Zhe Tang, Sihao Li, Kyeong Soo Kim, Jeremy Smith
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG); Signal Processing (eess.SP)
Abstract
Location fingerprinting based on RSSI has become a mainstream indoor localization technique because it requires neither the installation of new infrastructure nor the modification of existing devices, especially given the prevalence of Wi-Fi-enabled devices and ubiquitous Wi-Fi access in modern buildings. AI/ML technologies like DNNs make location fingerprinting more accurate and reliable, especially for large-scale multi-building and multi-floor indoor localization. Applying DNNs to indoor localization, however, requires a large amount of preprocessed and deliberately labeled data for training. Considering the difficulty of data collection in indoor environments, especially during the current COVID-19 pandemic, we investigate three methods of RSSI data augmentation based on a Multi-Output Gaussian Process (MOGP): by a single floor, by neighboring floors, and by a single building. Unlike a Single-Output Gaussian Process (SOGP), an MOGP can take into account the correlation among RSSI observations from multiple Access Points (APs) deployed close to each other (e.g., APs on the same floor of a building) by handling them collectively. The feasibility of MOGP-based RSSI data augmentation is demonstrated through experiments with a state-of-the-art RNN indoor localization model and UJIIndoorLoc, the most popular publicly available multi-building and multi-floor indoor localization database. The RNN model trained with UJIIndoorLoc augmented by fitting an MOGP model to the whole RSSI data of a building (i.e., by a single building) outperforms the other two augmentation methods as well as the RNN model trained with the original UJIIndoorLoc database, achieving a mean three-dimensional positioning error of 8.42 m.
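The key difference from an SOGP is that an MOGP shares information among correlated outputs. A minimal sketch, assuming an intrinsic coregionalization model (ICM) kernel k((x, i), (x', j)) = B[i][j] * rbf(x, x'), which the abstract does not name, computes the zero-mean posterior mean in pure Python. The coregionalization matrix `B`, the length scale and the noise level are fixed illustrative values rather than hyperparameters fitted to UJIIndoorLoc, and real RSSI values would typically be mean-centered first.

```python
import math


def rbf(x, y, length_scale):
    """Squared-exponential kernel between two location vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * length_scale ** 2))


def icm_kernel(p, q, coreg, length_scale):
    """ICM kernel: k((x, i), (y, j)) = coreg[i][j] * rbf(x, y)."""
    (x, i), (y, j) = p, q
    return coreg[i][j] * rbf(x, y, length_scale)


def solve(a, b):
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[r]] for r, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def mogp_mean(train_pts, train_y, test_pts, coreg, length_scale=1.0, noise=1e-2):
    """Posterior mean of a zero-mean multi-output GP with an ICM kernel.

    train_pts / test_pts: lists of (location, output_index) pairs, where the
    output index identifies an access point (AP).
    """
    n = len(train_pts)
    gram = [[icm_kernel(train_pts[r], train_pts[c], coreg, length_scale)
             + (noise if r == c else 0.0) for c in range(n)] for r in range(n)]
    alpha = solve(gram, train_y)
    return [sum(icm_kernel(t, p, coreg, length_scale) * a
                for p, a in zip(train_pts, alpha)) for t in test_pts]


# Usage: AP 0 is observed at x = 0, 1, 2; AP 1 only at x = 0.
pts = [((0.0,), 0), ((1.0,), 0), ((2.0,), 0), ((0.0,), 1)]
y = [1.0, 1.0, 1.0, 1.0]
B_corr = [[1.0, 0.95], [0.95, 1.0]]
B_indep = [[1.0, 0.0], [0.0, 1.0]]
corr = mogp_mean(pts, y, [((2.0,), 1)], B_corr, noise=1e-8)[0]
indep = mogp_mean(pts, y, [((2.0,), 1)], B_indep, noise=1e-8)[0]
```

With the correlated `B_corr`, AP 1 at x = 2 is predicted from AP 0's observation there (about 0.96); with the identity matrix the two APs behave as independent SOGPs and the prediction decays toward the zero prior mean (exp(-2), about 0.135).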
Video Violence Recognition and Localization using a Semi-Supervised Hard-Attention Model
Authors: Hamid Mohammadi, Ehsan Nazerfard
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Empowering automated violence monitoring and surveillance systems amid growing social violence and extremist activity worldwide could keep communities safe and save lives. The questionable reliability of human monitoring personnel and the increasing number of surveillance cameras make automated, artificial intelligence-based solutions compelling. Improving current state-of-the-art deep learning approaches to video violence recognition to higher levels of accuracy and performance could make surveillance systems more reliable and scalable. The main contribution of the proposed deep reinforcement learning method is achieving state-of-the-art accuracy on the RWF, Hockey, and Movies datasets while removing some of the computationally expensive processes and input features used in previous solutions. Implementing hard attention with a semi-supervised learning method makes the proposed method capable of rough violence localization and increases the interpretability of the agent in the violence detection system.
Webly Supervised Concept Expansion for General Purpose Vision Models
Authors: Amita Kamath, Christopher Clark, Tanmay Gupta, Eric Kolve, Derek Hoiem, Aniruddha Kembhavi
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Abstract
General purpose vision (GPV) systems are models designed to solve a wide array of visual tasks without requiring architectural changes. Today, GPVs primarily learn both skills and concepts from large fully supervised datasets. Scaling GPVs to tens of thousands of concepts by acquiring data to learn each concept for every skill quickly becomes prohibitive. This work presents an effective and inexpensive alternative: learn skills from fully supervised datasets, learn concepts from web image search results, and leverage a key characteristic of GPVs, namely the ability to transfer visual knowledge across skills. We use a dataset of 1M+ images spanning 10k+ visual concepts to demonstrate webly supervised concept expansion for two existing GPVs (GPV-1 and VL-T5) on three benchmarks: five COCO-based datasets (80 primary concepts), a newly curated series of five datasets based on the OpenImages and VisualGenome repositories (~500 concepts), and the web-derived dataset (10k+ concepts). We also propose a new architecture, GPV-2, that supports a variety of tasks, from vision tasks like classification and localization to vision+language tasks like QA and captioning, and more niche ones like human-object interaction recognition. GPV-2 benefits hugely from web data, outperforms GPV-1 and VL-T5 across these benchmarks, and performs well in a zero-shot setting at action and attribute recognition.
Keyword: SLAM
DYP-SLAM: A Real-time Visual SLAM Based on YOLO and Probability in Dynamic Environments
Keyword: Visual inertial
There is no result
Keyword: livox
There is no result
Keyword: loam
There is no result
Keyword: Visual inertial odometry
There is no result
Keyword: lidar
There is no result
Keyword: loop detection
There is no result
Keyword: autonomous driving
A Machine Learning Smartphone-based Sensing for Driver Behavior Classification
Ad-datasets: a meta-collection of data sets for autonomous driving
Fixed-Point Code Synthesis For Neural Networks
A Survey on Safety-critical Scenario Generation from Methodological Perspective
Keyword: mapping
Multi Objective Resource Optimization of Wireless Network Based on Cross Domain Virtual Network Embedding
Generative Modeling of Complex Data
Geometrically Higher Order Unfitted Space-Time Methods for PDEs on Moving Domains
EcoFlow: Efficient Convolutional Dataflows for Low-Power Neural Network Accelerators
Interactive Mobile App Navigation with Uncertain or Under-specified Natural Language Commands
Keyword: localization
Danish Airs and Grounds: A Dataset for Aerial-to-Street-Level Place Recognition and Localization
Multi-Output Gaussian Process-Based Data Augmentation for Multi-Building and Multi-Floor Indoor Localization
Video Violence Recognition and Localization using a Semi-Supervised Hard-Attention Model
Webly Supervised Concept Expansion for General Purpose Vision Models