Keyword: SLAM
SLAM-Supported Self-Training for 6D Object Pose Estimation
Abstract
Recent progress in learning-based object pose estimation paves the way for developing richer object-level world representations. However, the estimators, often trained with out-of-domain data, can suffer performance degradation when deployed in novel environments. To address this problem, we present a SLAM-supported self-training procedure that autonomously improves a robot's object pose estimation ability during navigation. Combining the network predictions with robot odometry, we can build a consistent object-level environment map via pose graph optimization (PGO). Exploiting the state estimates from PGO, we pseudo-label robot-collected RGB images to fine-tune the pose estimators. Unfortunately, it is difficult to model the uncertainty of the estimator predictions, and the unmodeled uncertainty in the data used for PGO can result in low-quality object pose estimates. We develop an automatic covariance tuning method for robust PGO that allows the measurement uncertainty models to change as part of the optimization process. The formulation permits a straightforward alternating minimization procedure that re-scales covariances analytically and component-wise, enabling more flexible noise modeling for learning-based measurements. We test our method with the deep object pose estimator (DOPE) on the YCB video dataset and in real-world robot experiments. The method achieves significant performance gains in pose estimation and, in return, facilitates the success of object SLAM.
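The alternating scheme described in the abstract (optimize poses with fixed covariances, then re-scale the covariances analytically from the residuals) is easy to prototype. Below is a minimal sketch on a toy 1D pose graph; the closed-form scale update `sigma2 = max(r**2, eps)` and the setup are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

# Toy 1D pose graph: poses x[0..N-1], odometry edges, plus noisy
# "learning-based" absolute measurements with a few large outliers.
rng = np.random.default_rng(0)
N = 20
x_true = np.cumsum(rng.normal(1.0, 0.1, N))
odom = np.diff(np.concatenate([[0.0], x_true])) + rng.normal(0, 0.05, N)
meas = x_true + rng.normal(0, 0.1, N)
meas[[3, 11]] += 5.0  # simulated outlier predictions

def solve_poses(sigma2):
    """Weighted least squares over odometry edges (fixed noise) and
    absolute measurements (per-measurement variance sigma2)."""
    A, b, w = [], [], []
    for i in range(N):                      # odometry edges x_i - x_{i-1}
        row = np.zeros(N); row[i] = 1.0
        if i > 0:
            row[i - 1] = -1.0
        A.append(row); b.append(odom[i]); w.append(1.0 / 0.05**2)
    for i in range(N):                      # learned absolute measurements
        row = np.zeros(N); row[i] = 1.0
        A.append(row); b.append(meas[i]); w.append(1.0 / sigma2[i])
    A, b, W = np.array(A), np.array(b), np.diag(w)
    return np.linalg.solve(A.T @ W @ A, A.T @ W @ b)

sigma2 = np.full(N, 0.1**2)
for _ in range(10):                         # alternating minimization
    x = solve_poses(sigma2)
    r = meas - x                            # residuals of learned measurements
    sigma2 = np.maximum(r**2, 1e-4)         # analytic component-wise re-scale

print("max error:", np.abs(x - x_true).max())
```

After a few iterations the outlier measurements receive large variances and are effectively downweighted, which is the robustness mechanism the covariance re-scaling provides.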
Tune your Place Recognition: Self-Supervised Domain Calibration via Robust SLAM
Authors: Pierre-Yves Lajoie, Giovanni Beltrame
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Abstract
Visual place recognition techniques based on deep learning, which have established themselves as the state of the art in recent years, do not always generalize well to environments that are visually different from the training set. Thus, to achieve top performance, it is sometimes necessary to fine-tune the networks to the target environment. To this end, we propose a completely self-supervised domain calibration procedure that uses robust pose graph estimation from Simultaneous Localization and Mapping (SLAM) as the supervision signal, without requiring GPS or manual labeling. We first show that the training samples produced by our technique are sufficient to train a visual place recognition system from a pre-trained classification model. Then, we show that our approach can improve the performance of a state-of-the-art technique on a target environment dissimilar from the training set. We believe that this approach will help practitioners deploy more robust place recognition solutions in real-world applications.
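One way to realize this kind of self-supervision is to mine positive and negative image pairs from the SLAM-estimated trajectory: frames whose optimized poses lie within a small radius are treated as the same place, distant frames as different places. The sketch below, with assumed distance thresholds, illustrates only that pair-mining step; the paper's exact sampling strategy may differ.

```python
import numpy as np

def mine_pairs(positions, pos_radius=2.0, neg_radius=10.0, skip=30):
    """Mine (anchor, positive) and (anchor, negative) frame index pairs
    from SLAM-optimized camera positions (N x 3 array). `skip` ignores
    temporal neighbors so positives come from genuine revisits."""
    n = len(positions)
    dists = np.linalg.norm(positions[:, None] - positions[None, :], axis=-1)
    positives, negatives = [], []
    for i in range(n):
        for j in range(i + skip, n):
            if dists[i, j] < pos_radius:
                positives.append((i, j))    # same place, different visit
            elif dists[i, j] > neg_radius:
                negatives.append((i, j))    # clearly different places
    return positives, negatives

# Example: a loop trajectory that revisits its starting point.
t = np.linspace(0, 2 * np.pi, 200)
traj = np.stack([10 * np.cos(t), 10 * np.sin(t), np.zeros_like(t)], axis=1)
pos, neg = mine_pairs(traj)
print(len(pos), "positive pairs,", len(neg), "negative pairs")
```

The mined pairs can then feed a standard contrastive or triplet loss to fine-tune the place recognition network on the target environment.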
Keyword: Visual inertial
There is no result
Keyword: livox
There is no result
Keyword: loam
There is no result
Keyword: Visual inertial odometry
There is no result
Keyword: lidar
Pointillism: Accurate 3D bounding box estimation with multi-radars
Abstract
Autonomous perception requires high-quality environment sensing in the form of 3D bounding boxes of dynamic objects. The primary sensors used in automotive systems are light-based cameras and LiDARs. However, they are known to fail in adverse weather conditions. Radars can potentially solve this problem, as they are barely affected by adverse weather. However, specular reflections of wireless signals degrade the quality of radar point clouds. We introduce Pointillism, a system that combines data from multiple spatially separated radars, placed at an optimal separation, to mitigate these problems. We introduce the novel concept of Cross Potential Point Clouds, which exploits the spatial diversity induced by multiple radars to address the noise and sparsity of radar point clouds. Furthermore, we present the design of RP-net, a novel deep learning architecture designed explicitly for radar's sparse data distribution, to enable accurate 3D bounding box estimation. The spatial techniques designed and proposed in this paper are fundamental to radar point cloud distributions and would benefit other radar sensing applications.
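The core idea of fusing spatially separated radars can be illustrated with a simple cross-filtering step: a point from one radar is kept only if it is corroborated by a nearby point from the other radar, suppressing single-radar specular noise. This is a minimal sketch of that intuition; the threshold and the name `cross_filter` are assumptions, and the paper's actual Cross Potential Point Cloud construction is more involved.

```python
import numpy as np

def cross_filter(cloud_a, cloud_b, radius=0.5):
    """Keep points of cloud_a (N x 3, common world frame) that have at
    least one supporting point from cloud_b within `radius` meters."""
    d = np.linalg.norm(cloud_a[:, None, :] - cloud_b[None, :, :], axis=-1)
    supported = (d < radius).any(axis=1)
    return cloud_a[supported]

# Two radars see the same target, but each adds its own ghost points.
rng = np.random.default_rng(1)
target = rng.normal(size=(50, 3)) * 0.2 + np.array([5.0, 0.0, 0.0])
radar_a = np.vstack([target, rng.uniform(-10, 10, (30, 3))])
radar_b = np.vstack([target + rng.normal(size=(50, 3)) * 0.05,
                     rng.uniform(-10, 10, (30, 3))])

fused = np.vstack([cross_filter(radar_a, radar_b),
                   cross_filter(radar_b, radar_a)])
print("fused points:", len(fused))  # ghost points are mostly rejected
```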
Keyword: loop detection
There is no result
Keyword: autonomous driving
Fast Road Segmentation via Uncertainty-aware Symmetric Network
Abstract
The high performance of RGB-D based road segmentation methods contrasts with their rare use in commercial autonomous driving, which is due to two reasons: 1) prior methods cannot achieve high inference speed and high accuracy at the same time; 2) the different properties of RGB and depth data are not well exploited, limiting the reliability of the predicted road. In this paper, based on evidence theory, an uncertainty-aware symmetric network (USNet) is proposed to achieve a trade-off between speed and accuracy by fully fusing RGB and depth data. Firstly, the cross-modal feature fusion operations that are indispensable in prior RGB-D based methods are abandoned. We instead separately adopt two lightweight subnetworks to learn road representations from the RGB and depth inputs. The lightweight structure guarantees the real-time inference of our method. Moreover, a multiscale evidence collection (MEC) module is designed to collect evidence at multiple scales for each modality, which provides sufficient evidence for pixel class determination. Finally, in the uncertainty-aware fusion (UAF) module, the uncertainty of each modality is perceived to guide the fusion of the two subnetworks. Experimental results demonstrate that our method achieves state-of-the-art accuracy with a real-time inference speed of 43+ FPS. The source code is available at https://github.com/morancyc/USNet.
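In subjective-logic treatments of evidential deep learning, per-class evidence is turned into Dirichlet parameters, uncertainty falls out as u = K / S, and modalities can then be blended in proportion to their confidence. The sketch below shows that generic recipe for two modalities; it is an assumption-laden illustration of the idea, not USNet's actual UAF module.

```python
import numpy as np

def dirichlet_uncertainty(evidence):
    """Evidence (..., K) -> belief masses and uncertainty, following
    the subjective-logic convention alpha = evidence + 1."""
    alpha = evidence + 1.0
    strength = alpha.sum(axis=-1, keepdims=True)   # S = sum_k alpha_k
    belief = evidence / strength                   # b_k = e_k / S
    u = evidence.shape[-1] / strength              # u = K / S
    return belief, u

def fuse(evidence_rgb, evidence_depth):
    """Blend two modalities, weighting each by its confidence (1 - u)."""
    b_rgb, u_rgb = dirichlet_uncertainty(evidence_rgb)
    b_d, u_d = dirichlet_uncertainty(evidence_depth)
    w_rgb = (1.0 - u_rgb) / (2.0 - u_rgb - u_d)
    return w_rgb * b_rgb + (1.0 - w_rgb) * b_d

# Two classes (road / not road): one confident, one uncertain modality.
e_rgb = np.array([[9.0, 1.0]])     # strong road evidence
e_depth = np.array([[0.5, 0.5]])   # nearly uninformative
print(fuse(e_rgb, e_depth))        # dominated by the confident modality
```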
Bilateral Deep Reinforcement Learning Approach for Better-than-human Car Following Model
Abstract
In the coming years and decades, autonomous vehicles (AVs) will become increasingly prevalent, offering new opportunities for safer and more convenient travel and potentially smarter traffic control methods exploiting automation and connectivity. Car following is a prime function in autonomous driving, and car following based on reinforcement learning (RL) has received attention in recent years with the goal of achieving performance levels comparable to humans. However, most existing RL methods model car following as a unilateral problem, sensing only the vehicle ahead. Recent work by Wang and Horn [16], however, has shown that bilateral car following, which considers both the vehicle ahead and the vehicle behind, exhibits better system stability. In this paper, we hypothesize that bilateral car following can be learned with RL while also optimizing goals such as efficiency maximization, jerk minimization, and safety, leading to a learned model that outperforms human driving. We propose a Deep Reinforcement Learning (DRL) framework for car-following control that integrates bilateral information into both the state and the reward function, based on the bilateral control model (BCM). Furthermore, we use a decentralized multi-agent reinforcement learning framework to generate the corresponding control action for each agent. Our simulation results demonstrate that our learned policy is better than the human driving policy in terms of (a) inter-vehicle headways, (b) average speed, (c) jerk, (d) Time to Collision (TTC), and (e) string stability.
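The bilateral formulation is easy to picture in code: the agent observes gaps and relative speeds to both neighbors, and the reward balances the two gaps while penalizing jerk. The snippet below is a hedged sketch of such a state and reward; the field names and weights are illustrative assumptions, not the paper's tuned reward.

```python
from dataclasses import dataclass

@dataclass
class BilateralState:
    gap_front: float   # distance to vehicle ahead (m)
    gap_rear: float    # distance to vehicle behind (m)
    dv_front: float    # relative speed to vehicle ahead (m/s)
    dv_rear: float     # relative speed to vehicle behind (m/s)
    speed: float       # ego speed (m/s)

def bilateral_reward(s: BilateralState, jerk: float,
                     w_gap=1.0, w_speed=0.1, w_jerk=0.05):
    """Reward a position midway between neighbors (the BCM intuition),
    plus speed maintenance and comfort terms. Weights are assumptions."""
    balance = -w_gap * (s.gap_front - s.gap_rear) ** 2
    progress = w_speed * s.speed
    comfort = -w_jerk * jerk ** 2
    return balance + progress + comfort

s = BilateralState(gap_front=12.0, gap_rear=18.0,
                   dv_front=-0.5, dv_rear=0.3, speed=14.0)
print(bilateral_reward(s, jerk=0.2))
```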
Keyword: mapping
Tune your Place Recognition: Self-Supervised Domain Calibration via Robust SLAM
Authors: Pierre-Yves Lajoie, Giovanni Beltrame
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Abstract
Visual place recognition techniques based on deep learning, which have established themselves as the state of the art in recent years, do not always generalize well to environments that are visually different from the training set. Thus, to achieve top performance, it is sometimes necessary to fine-tune the networks to the target environment. To this end, we propose a completely self-supervised domain calibration procedure that uses robust pose graph estimation from Simultaneous Localization and Mapping (SLAM) as the supervision signal, without requiring GPS or manual labeling. We first show that the training samples produced by our technique are sufficient to train a visual place recognition system from a pre-trained classification model. Then, we show that our approach can improve the performance of a state-of-the-art technique on a target environment dissimilar from the training set. We believe that this approach will help practitioners deploy more robust place recognition solutions in real-world applications.
Autonomous Mosquito Habitat Detection Using Satellite Imagery and Convolutional Neural Networks for Disease Risk Mapping
Abstract
Mosquitoes are known vectors for disease transmission and cause over one million deaths globally each year. The majority of natural mosquito habitats are areas containing standing water, which are challenging to detect at a macro scale using conventional ground-based technology. Contemporary approaches such as drones, UAVs, and other aerial imaging technology are costly to deploy and are most accurate only at finer spatial scales, whereas the proposed convolutional neural network (CNN) approach can be applied for disease risk mapping and can guide preventative efforts on a more global scale. By assessing the performance of autonomous mosquito habitat detection technology, the transmission of mosquito-borne diseases can be prevented in a cost-effective manner. This approach aims to identify the spatiotemporal distribution of mosquito habitats in extensive areas that are difficult to survey using ground-based technology, by employing computer vision on satellite imagery as a proof of concept. The research evaluates three different CNN models and reports their accuracy in predicting large-scale mosquito habitats. For this approach, a dataset was constructed containing a variety of geographical features. Larger land cover variables such as ponds/lakes, inlets, and rivers were used to classify mosquito habitats, while smaller sites were omitted to improve accuracy at a larger scale. Using the dataset, multiple CNN networks were trained and evaluated for habitat prediction accuracy. A CNN-based approach on readily available satellite imagery is cost-effective and scalable, unlike most aerial imaging technology. Testing revealed that YOLOv4 achieved the highest accuracy in detecting large-scale mosquito habitats.
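A common preprocessing step when running detectors such as YOLOv4 on large satellite scenes is to tile the scene into fixed-size patches and map detections back to scene coordinates. The sketch below shows only that step; the tile size and overlap are illustrative assumptions, and the detector itself is stubbed out.

```python
import numpy as np

def tile_image(image, tile=512, overlap=64):
    """Split an H x W x C satellite scene into overlapping tiles,
    returning (tile, (row_px, col_px)) pairs so detections can be
    mapped back to scene coordinates."""
    step = tile - overlap
    h, w = image.shape[:2]
    tiles = []
    for r in range(0, max(h - overlap, 1), step):
        for c in range(0, max(w - overlap, 1), step):
            patch = image[r:r + tile, c:c + tile]
            if patch.shape[0] and patch.shape[1]:
                tiles.append((patch, (r, c)))
    return tiles

scene = np.zeros((2048, 2048, 3), dtype=np.uint8)  # stand-in scene
tiles = tile_image(scene)
print(len(tiles), "tiles")  # each tile would be passed to the detector
```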
Mapping global dynamics of benchmark creation and saturation in artificial intelligence
Authors: Adriano Barbosa-Silva, Simon Ott, Kathrin Blagec, Jan Brauner, Matthias Samwald
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Benchmarks are crucial to measuring and steering progress in artificial intelligence (AI). However, recent studies have raised concerns over the state of AI benchmarking, reporting issues such as benchmark overfitting, benchmark saturation, and increasing centralization of benchmark dataset creation. To facilitate monitoring of the health of the AI benchmarking ecosystem, we introduce methodologies for creating condensed maps of the global dynamics of benchmark creation and saturation. We curated data for 1688 benchmarks covering the entire domains of computer vision and natural language processing, and show that a large fraction of benchmarks quickly trended towards near-saturation, that many benchmarks fail to find widespread utilization, and that benchmark performance gains for different AI tasks were prone to unforeseen bursts. We conclude that future work should focus on large-scale community collaboration and on mapping benchmark performance gains to the real-world utility and impact of AI.
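A simple way to quantify the saturation dynamics discussed above is to track, for each benchmark, how much of the gap between the first reported state-of-the-art score and an estimated ceiling has been closed over time. This sketch computes such a saturation fraction from (year, score) records; the ceiling estimate and the 0.8 threshold are assumptions for illustration, not the paper's methodology.

```python
def saturation_fraction(results, ceiling=100.0):
    """results: list of (year, sota_score) tuples for one benchmark.
    Returns the fraction of the initial gap to `ceiling` closed so far."""
    results = sorted(results)
    first, latest = results[0][1], results[-1][1]
    gap = ceiling - first
    return 0.0 if gap <= 0 else (latest - first) / gap

# Example: a benchmark whose SOTA accuracy rose from 72% to 97%.
history = [(2016, 72.0), (2018, 88.5), (2020, 95.1), (2022, 97.0)]
frac = saturation_fraction(history)
print(f"{frac:.0%} of the gap to ceiling closed")   # ~89%
print("near-saturated" if frac > 0.8 else "still active")
```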
A Brain-Inspired Low-Dimensional Computing Classifier for Inference on Tiny Devices
Abstract
By mimicking brain-like cognition and exploiting parallelism, hyperdimensional computing (HDC) classifiers have emerged as a lightweight framework for efficient on-device inference. Nonetheless, they have two fundamental drawbacks: a heuristic training process and ultra-high dimensionality, which result in sub-optimal inference accuracy and model sizes beyond the capability of tiny devices with stringent resource constraints. In this paper, we address these drawbacks and propose a low-dimensional computing (LDC) alternative. Specifically, by mapping our LDC classifier into an equivalent neural network, we optimize our model using a principled training approach. Most importantly, we improve inference accuracy while reducing the ultra-high dimension of existing HDC models by orders of magnitude (e.g., 8000 vs. 4/64). We run experiments to evaluate our LDC classifier on different datasets for inference on tiny devices, and also implement different models on an FPGA platform for acceleration. The results highlight that our LDC classifier offers an overwhelming advantage over existing brain-inspired HDC models and is particularly suitable for inference on tiny devices.
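The HDC inference that LDC compresses is essentially "encode the input as a vector, then pick the class whose prototype is most similar". Below is a minimal binary HDC-style classifier conveying that pipeline; the fixed random-projection encoder and the dimension are illustrative assumptions, and the LDC contribution, learning a compact encoder via an equivalent neural network, is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
D, F, C = 64, 16, 3   # (low) hypervector dim, input features, classes

# Fixed random bipolar projection as the encoder (an assumption; LDC
# instead *learns* a low-dimensional encoder with principled training).
proj = rng.choice([-1.0, 1.0], size=(F, D))

def encode(x):
    return np.sign(x @ proj)            # bipolar hypervector in {-1, +1}^D

def train_prototypes(X, y):
    protos = np.zeros((C, D))
    for xi, yi in zip(X, y):
        protos[yi] += encode(xi)        # bundle training examples per class
    return np.sign(protos)

def predict(protos, x):
    return int(np.argmax(protos @ encode(x)))  # max dot-product similarity

# Synthetic clustered data: one Gaussian cluster per class.
means = rng.normal(scale=3.0, size=(C, F))
y = rng.integers(0, C, 300)
X = means[y] + rng.normal(size=(300, F))
protos = train_prototypes(X, y)
acc = np.mean([predict(protos, xi) == yi for xi, yi in zip(X, y)])
print(f"train accuracy: {acc:.2f}")
```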
Keyword: localization
Tune your Place Recognition: Self-Supervised Domain Calibration via Robust SLAM
Authors: Pierre-Yves Lajoie, Giovanni Beltrame
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Abstract
Visual place recognition techniques based on deep learning, which have established themselves as the state of the art in recent years, do not always generalize well to environments that are visually different from the training set. Thus, to achieve top performance, it is sometimes necessary to fine-tune the networks to the target environment. To this end, we propose a completely self-supervised domain calibration procedure that uses robust pose graph estimation from Simultaneous Localization and Mapping (SLAM) as the supervision signal, without requiring GPS or manual labeling. We first show that the training samples produced by our technique are sufficient to train a visual place recognition system from a pre-trained classification model. Then, we show that our approach can improve the performance of a state-of-the-art technique on a target environment dissimilar from the training set. We believe that this approach will help practitioners deploy more robust place recognition solutions in real-world applications.
Object-Based Visual Camera Pose Estimation From Ellipsoidal Model and 3D-Aware Ellipse Prediction
Abstract
In this paper, we propose a method for initial camera pose estimation from a single image that is robust to viewing conditions and does not require a detailed model of the scene. This method meets the growing need for easy deployment of robotics and augmented reality applications in any environment, especially those for which neither an accurate 3D model nor a large amount of ground-truth data is available. It exploits the ability of deep learning techniques to reliably detect objects regardless of viewing conditions. Previous works have also shown that abstracting the geometry of a scene of objects by an ellipsoid cloud allows the camera pose to be computed accurately enough for various application needs. Though promising, these approaches use the ellipses fitted to the detection bounding boxes as an approximation of the imaged objects. In this paper, we go one step further and propose a learning-based method that detects improved elliptic approximations of objects, coherent with the 3D ellipsoids in terms of perspective projection. Experiments show that our method significantly increases the accuracy of the computed pose. This is achieved with very little effort in terms of training data acquisition: a few hundred calibrated images, of which only three need manual object annotation. Code and models are released at https://gitlab.inria.fr/tangram/3d-aware-ellipses-for-visual-localization
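The baseline the paper improves upon, approximating each detected object by the ellipse inscribed in its bounding box, is nearly a one-liner, which makes its limitation easy to see: the ellipse is always axis-aligned and need not match the true perspective projection of the object's ellipsoid. A minimal sketch of that baseline, with an assumed (x_min, y_min, x_max, y_max) box format:

```python
import numpy as np

def ellipse_from_bbox(box):
    """Axis-aligned ellipse inscribed in a detection box
    (x_min, y_min, x_max, y_max) -> center, semi-axes, and rotation
    angle (always 0 for this crude baseline)."""
    x0, y0, x1, y1 = box
    center = np.array([(x0 + x1) / 2.0, (y0 + y1) / 2.0])
    semi_axes = np.array([(x1 - x0) / 2.0, (y1 - y0) / 2.0])
    return center, semi_axes, 0.0

center, axes, angle = ellipse_from_bbox((120.0, 80.0, 200.0, 140.0))
print(center, axes, angle)   # [160. 110.] [40. 30.] 0.0
# A learned, 3D-aware prediction would instead output an oriented
# ellipse consistent with the projection of the object's ellipsoid.
```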
Fragmentation analysis of a bar with the Lip-field approach
Authors: Nicolas Moës, Benoît Lé, Andrew Stershic
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Abstract
The Lip-field approach is a new way to regularize softening material models. It has already been tested on 1D and 2D quasistatic problems; this paper extends it to 1D dynamics, on the challenging problem of dynamic fragmentation. The Lip-field approach formulates the mechanical problem as an optimization problem in which the incremental potential to be minimized is the non-regularized one, and spurious localization is prevented by imposing a Lipschitz constraint on the damage field. The displacement and damage fields at each time step are obtained by a staggered algorithm: the displacement field is computed for a fixed damage field, and the damage field is then computed for a fixed displacement field. Indeed, these two problems are convex, which is not the case for the global problem in which the displacement and damage fields are sought simultaneously. The incremental potential is obtained by equivalence with a cohesive zone model, which makes material parameter calibration simple. A non-regularized local damage model equivalent to a cohesive zone model is also proposed; it is used as a reference for the Lip-field approach, without the need to implement displacement jumps. These approaches are applied to the brittle fragmentation of a 1D bar with randomly perturbed material properties to accelerate spatial convergence. Both explicit and implicit dynamic implementations are compared. Favorable comparisons with several analytical, numerical, and experimental references validate the modeling approach.
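The staggered scheme described above alternates two convex solves. The sketch below shows the control flow on a discretized 1D bar, with a quadratic degradation (1 - d)^2 and a crude sweep standing in for the Lipschitz constraint; both are illustrative assumptions, not the paper's incremental potential.

```python
import numpy as np

# 1D bar with n elements; u: nodal displacements, d: elementwise damage.
n, L, E = 100, 1.0, 1.0
h = L / n
lip = 10.0          # assumed Lipschitz bound |d'| <= lip

def solve_displacement(d, u_right=0.02):
    """Elastic solve with degraded stiffness (1-d)^2 E / h and a
    prescribed end displacement; convex for fixed damage."""
    k = (1.0 - d) ** 2 * E / h
    K = np.zeros((n + 1, n + 1))
    for e in range(n):                       # tridiagonal assembly
        K[e:e+2, e:e+2] += k[e] * np.array([[1, -1], [-1, 1]])
    f = -K[:, -1] * u_right                  # Dirichlet BCs: u[0]=0, u[n]=u_right
    u = np.zeros(n + 1); u[-1] = u_right
    u[1:-1] = np.linalg.solve(K[1:-1, 1:-1], f[1:-1])
    return u

def solve_damage(u, d_old, yc=1e-4):
    """Pointwise damage update from elastic energy, then forward and
    backward sweeps enforcing the Lipschitz bound (a stand-in for the
    actual constrained convex solve)."""
    eps = np.diff(u) / h
    energy = 0.5 * E * eps ** 2
    d = np.clip(np.maximum(d_old,                       # irreversibility
                           1.0 - yc / np.maximum(energy, 1e-12)), 0, 1)
    for i in range(1, n):
        d[i] = min(d[i], d[i-1] + lip * h)
    for i in range(n - 2, -1, -1):
        d[i] = min(d[i], d[i+1] + lip * h)
    return d

d = np.zeros(n)
for _ in range(20):                          # staggered iterations
    u = solve_displacement(d)
    d = solve_damage(u, d)
print("max damage:", d.max())
```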