Abstract
We present a robust and accurate depth refinement system, named GeoRefine, for geometrically-consistent dense mapping from monocular sequences. GeoRefine consists of three modules: a hybrid SLAM module using learning-based priors, an online depth refinement module leveraging self-supervision, and a global mapping module via TSDF fusion. The proposed system is online by design and achieves great robustness and accuracy via: (i) a robustified hybrid SLAM that incorporates learning-based optical flow and/or depth; (ii) self-supervised losses that leverage SLAM outputs and enforce long-term geometric consistency; (iii) careful system design that avoids degenerate cases in online depth refinement. We extensively evaluate GeoRefine on multiple public datasets and achieve absolute relative depth errors as low as 5%.
Keyword: Visual inertial
There is no result
Keyword: livox
There is no result
Keyword: loam
There is no result
Keyword: Visual inertial odometry
There is no result
Keyword: lidar
Cost-Aware Comparison of LiDAR-based 3D Object Detectors
Authors: Xiaofang Wang, Kris M. Kitani
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Considerable research efforts have been devoted to LiDAR-based 3D object detection and its empirical performance has been significantly improved. While the progress has been encouraging, we observe an overlooked issue: it is not yet common practice to compare different 3D detectors under the same cost, e.g., inference latency. This makes it difficult to quantify the true performance gain brought by recently proposed architecture designs. The goal of this work is to conduct a fair comparison of LiDAR-based 3D object detectors. Specifically, we focus on SECOND, a simple grid-based one-stage detector, and analyze its performance under different costs by scaling its original architecture. Then we compare the family of scaled SECOND with recent 3D detection methods, such as Voxel R-CNN and PV-RCNN++. The results are surprising. We find that, if allowed to use the same latency, SECOND can match the performance of PV-RCNN++, the current state-of-the-art method on the Waymo Open Dataset. Scaled SECOND also easily outperforms many recent 3D detection methods published during the past year. We recommend future research control the inference cost in their empirical comparison and include the family of scaled SECOND as a strong baseline when presenting novel 3D detection methods.
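As a rough illustration of the cost-controlled comparison advocated above, the following sketch measures the mean inference latency of a PyTorch model with warm-up and GPU synchronization so that detectors can be compared at an equal budget. The stand-in network and input shape are placeholders, not the paper's setup.

```python
# Minimal latency-measurement sketch (assumed setup, not the paper's benchmark code):
# times a PyTorch model with warm-up and GPU synchronization so different detectors
# can be compared at the same inference cost.
import time
import torch

def mean_latency_ms(model, example_input, warmup=10, iters=50):
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):          # warm-up to stabilize clocks and caches
            model(example_input)
        if example_input.is_cuda:
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            model(example_input)
        if example_input.is_cuda:
            torch.cuda.synchronize()     # wait for queued GPU work before stopping the clock
    return (time.perf_counter() - start) / iters * 1000.0

# Example with a stand-in network; a real comparison would time the detectors themselves.
net = torch.nn.Sequential(torch.nn.Conv2d(3, 16, 3, padding=1), torch.nn.ReLU())
x = torch.randn(1, 3, 256, 256)
print(f"{mean_latency_ms(net, x):.2f} ms per forward pass")
```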
Multimodal Detection of Unknown Objects on Roads for Autonomous Driving
Authors: Daniel Bogdoll, Enrico Eisen, Maximilian Nitsche, Christin Scheib, J. Marius Zöllner
Abstract
Tremendous progress in deep learning over recent years has led towards a future with autonomous vehicles on our roads. Nevertheless, the performance of their perception systems is strongly dependent on the quality of the utilized training data. As these usually only cover a fraction of all object classes an autonomous driving system will face, such systems struggle with handling the unexpected. In order to safely operate on public roads, the identification of objects from unknown classes remains a crucial task. In this paper, we propose a novel pipeline to detect unknown objects. Instead of focusing on a single sensor modality, we make use of lidar and camera data by combining state-of-the-art detection models in a sequential manner. We evaluate our approach on the Waymo Open Perception Dataset and point out current research gaps in anomaly detection.
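The abstract does not spell out the sequential fusion, but a common pattern for this kind of pipeline is sketched below: lidar cluster centroids are projected into the image and any centroid not covered by a known-class 2D box is flagged as a potential unknown. The camera matrix, clusters, and boxes are synthetic, and this is an illustration of the general idea rather than the authors' method.

```python
# Illustrative sketch only (not the authors' pipeline): project LiDAR cluster
# centroids into the image and flag those not explained by any known-class box.
import numpy as np

K = np.array([[720.0, 0.0, 640.0],
              [0.0, 720.0, 360.0],
              [0.0, 0.0, 1.0]])               # assumed pinhole intrinsics

def project(points_cam):
    """Project 3D points in the camera frame to pixel coordinates."""
    uvw = (K @ points_cam.T).T
    return uvw[:, :2] / uvw[:, 2:3]

def inside_any_box(uv, boxes):
    """boxes: (N, 4) array of [x1, y1, x2, y2] from the known-class 2D detector."""
    return any(b[0] <= uv[0] <= b[2] and b[1] <= uv[1] <= b[3] for b in boxes)

cluster_centroids = np.array([[2.0, 0.5, 10.0],    # e.g. a pedestrian-sized cluster
                              [-3.0, 0.2, 15.0]])  # e.g. unexplained debris
known_boxes = np.array([[700.0, 300.0, 820.0, 460.0]])

for c in cluster_centroids:
    uv = project(c[None])[0]
    status = "known" if inside_any_box(uv, known_boxes) else "unknown candidate"
    print(f"cluster at {c} -> pixel {uv.round(1)} -> {status}")
```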
RangeSeg: Range-Aware Real Time Segmentation of 3D LiDAR Point Clouds
Authors: Tzu-Hsuan Chen, Tian Sheuan Chang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Semantic outdoor scene understanding based on 3D LiDAR point clouds is a challenging task for autonomous driving due to the sparse and irregular data structure. This paper takes advantage of the uneven range distribution of different LiDAR laser beams to propose a range-aware instance segmentation network, RangeSeg. RangeSeg uses a shared encoder backbone with two range-dependent decoders: a heavy decoder computes only the top of the range image, where far and small objects are located, to improve small-object detection accuracy, while a light decoder computes the whole range image at low computational cost. The results are further clustered by DBSCAN with a resolution-weighted distance function to obtain instance-level segmentation. Experiments on the KITTI dataset show that RangeSeg outperforms state-of-the-art semantic segmentation methods with an enormous speedup and improves instance-level segmentation performance on small and far objects. The whole RangeSeg pipeline meets real-time requirements on an NVIDIA Jetson AGX Xavier at 19 frames per second on average.
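As a rough illustration of the clustering stage, the sketch below groups foreground points into instances with scikit-learn's DBSCAN under a range-weighted distance, so that distant (sparser) points are linked under a looser effective threshold. The weighting rule is an assumption for illustration, not the paper's exact resolution-weighted function.

```python
# Minimal sketch of range-weighted DBSCAN clustering; point features are (x, y, z, range).
# The weighting rule is illustrative, not RangeSeg's actual distance function.
import numpy as np
from sklearn.cluster import DBSCAN

def range_weighted_distance(a, b):
    # Shrink the Euclidean distance for far-away points so a fixed eps still
    # links the sparser returns at long range.
    euclid = np.linalg.norm(a[:3] - b[:3])
    mean_range = 0.5 * (a[3] + b[3])
    return euclid / (1.0 + 0.05 * mean_range)

pts = np.array([
    [1.0, 0.1, 0.0, 1.0], [1.2, 0.0, 0.0, 1.2],      # near object
    [30.0, 5.0, 0.5, 30.4], [30.8, 5.2, 0.5, 31.2],  # far object, sparser returns
])
labels = DBSCAN(eps=0.5, min_samples=1, metric=range_weighted_distance).fit_predict(pts)
print(labels)   # points sharing a label form one instance
```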
Keyword: loop detection
There is no result
Keyword: autonomous driving
Multimodal Detection of Unknown Objects on Roads for Autonomous Driving
Authors: Daniel Bogdoll, Enrico Eisen, Maximilian Nitsche, Christin Scheib, J. Marius Zöllner
Abstract
Tremendous progress in deep learning over recent years has led towards a future with autonomous vehicles on our roads. Nevertheless, the performance of their perception systems is strongly dependent on the quality of the utilized training data. As these usually only cover a fraction of all object classes an autonomous driving system will face, such systems struggle with handling the unexpected. In order to safely operate on public roads, the identification of objects from unknown classes remains a crucial task. In this paper, we propose a novel pipeline to detect unknown objects. Instead of focusing on a single sensor modality, we make use of lidar and camera data by combining state-of-the-art detection models in a sequential manner. We evaluate our approach on the Waymo Open Perception Dataset and point out current research gaps in anomaly detection.
Multitask Network for Joint Object Detection, Semantic Segmentation and Human Pose Estimation in Vehicle Occupancy Monitoring
Authors: Nikolas Ebert, Patrick Mangat, Oliver Wasenmüller
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
In order to ensure safe autonomous driving, precise information about the conditions in and around the vehicle must be available. Accordingly, the monitoring of occupants and objects inside the vehicle is crucial. In the state of the art, single or multiple deep neural networks are used for either object recognition, semantic segmentation, or human pose estimation. In contrast, we propose our Multitask Detection, Segmentation and Pose Estimation Network (MDSP) -- the first multitask network solving all three of these tasks jointly in the area of occupancy monitoring. Due to the shared architecture, memory and computing costs can be saved while achieving higher accuracy. Furthermore, our architecture allows a flexible combination of the three mentioned tasks during a single end-to-end training. We perform comprehensive evaluations on the public datasets SVIRO and TiCaM in order to demonstrate its superior performance.
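The shared-encoder idea can be made concrete with a minimal PyTorch sketch: one backbone feeds three lightweight heads for detection, segmentation, and pose estimation. This is not the MDSP architecture; all layer sizes and the number of classes and keypoints are illustrative.

```python
# Minimal shared-encoder multitask sketch (illustrative, not MDSP itself):
# one backbone, three task heads, trained by summing the per-task losses.
import torch
import torch.nn as nn

class SharedMultitaskNet(nn.Module):
    def __init__(self, num_classes=4, num_keypoints=17):
        super().__init__()
        self.backbone = nn.Sequential(                  # shared feature extractor
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Detection head: per-cell class logits + box offsets (dense, anchor-free style).
        self.det_head = nn.Conv2d(64, num_classes + 4, 1)
        # Segmentation head: per-pixel class logits at feature resolution.
        self.seg_head = nn.Conv2d(64, num_classes, 1)
        # Pose head: one heatmap per keypoint.
        self.pose_head = nn.Conv2d(64, num_keypoints, 1)

    def forward(self, x):
        f = self.backbone(x)
        return self.det_head(f), self.seg_head(f), self.pose_head(f)

net = SharedMultitaskNet()
det, seg, pose = net(torch.randn(2, 3, 128, 128))
print(det.shape, seg.shape, pose.shape)   # three task outputs from one shared encoder
```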
RangeSeg: Range-Aware Real Time Segmentation of 3D LiDAR Point Clouds
Authors: Tzu-Hsuan Chen, Tian Sheuan Chang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Semantic outdoor scene understanding based on 3D LiDAR point clouds is a challenging task for autonomous driving due to the sparse and irregular data structure. This paper takes advantage of the uneven range distribution of different LiDAR laser beams to propose a range-aware instance segmentation network, RangeSeg. RangeSeg uses a shared encoder backbone with two range-dependent decoders: a heavy decoder computes only the top of the range image, where far and small objects are located, to improve small-object detection accuracy, while a light decoder computes the whole range image at low computational cost. The results are further clustered by DBSCAN with a resolution-weighted distance function to obtain instance-level segmentation. Experiments on the KITTI dataset show that RangeSeg outperforms state-of-the-art semantic segmentation methods with an enormous speedup and improves instance-level segmentation performance on small and far objects. The whole RangeSeg pipeline meets real-time requirements on an NVIDIA Jetson AGX Xavier at 19 frames per second on average.
Keyword: mapping
Paving the Way for Mature Secondary Research: The Seven Types of Literature Review
Abstract
Confusion over different kinds of secondary research, and their divergent purposes, is undermining the effectiveness and usefulness of secondary studies in software engineering. This short paper therefore explains the differences between ad hoc reviews, case surveys, critical reviews, meta-analyses (aka systematic literature reviews), meta-synthesis (aka thematic analysis), rapid reviews and scoping reviews (aka systematic mapping studies). We envision that these guidelines will help researchers better select and describe their literature reviews, while helping reviewers select more appropriate evaluation criteria.
A Hybrid Defense Method against Adversarial Attacks on Traffic Sign Classifiers in Autonomous Vehicles
Authors: Zadid Khan, Mashrur Chowdhury, Sakib Mahmud Khan
Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Adversarial attacks can make deep neural network (DNN) models predict incorrect output labels, such as misclassified traffic signs, for autonomous vehicle (AV) perception modules. Resilience against adversarial attacks can help AVs navigate safely on the road by avoiding misclassification of signs or objects. This DNN-based study develops a resilient traffic sign classifier for AVs that uses a hybrid defense method. We use transfer learning to retrain the Inception-V3 and ResNet-152 models as traffic sign classifiers. This method also utilizes a combination of three different strategies: random filtering, ensembling, and local feature mapping. We use the random cropping and resizing technique for random filtering, plurality voting as the ensembling strategy, and an optical character recognition model as the local feature mapper. This DNN-based hybrid defense method has been tested for the no-attack scenario and against well-known untargeted adversarial attacks (e.g., Projected Gradient Descent or PGD, Fast Gradient Sign Method or FGSM, Momentum Iterative Method or MIM, and Carlini and Wagner or C&W). We find that our hybrid defense method achieves 99% average traffic sign classification accuracy for the no-attack scenario and 88% average traffic sign classification accuracy across all attack scenarios. Moreover, the hybrid defense method presented in this study improves traffic sign classification accuracy, compared to traditional defense methods (i.e., JPEG filtering, feature squeezing, binary filtering, and random filtering), by up to 6%, 50%, and 55% for FGSM, MIM, and PGD attacks, respectively.
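The plurality-voting step of the ensemble is simple to illustrate; the sketch below shows only that voting rule under an assumed interface, not the authors' full hybrid defense.

```python
# Plurality voting over ensemble predictions (illustrative interface, not the paper's code).
from collections import Counter

def plurality_vote(predictions):
    """predictions: list of label strings, one per ensemble member."""
    counts = Counter(predictions)
    # most_common is stable for equal counts, so ties resolve to the label seen first.
    return counts.most_common(1)[0][0]

# e.g. Inception-V3, ResNet-152, and an OCR-based local feature check disagree:
votes = ["stop", "stop", "speed_limit_30"]
print(plurality_vote(votes))   # -> "stop"
```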
GRAPHYP: A Scientific Knowledge Graph with Manifold Subnetworks of Communities. Detection of Scholarly Disputes in Adversarial Information Routes
Authors: Renaud Fabre (LED), Otmane Azeroual (DZHW), Patrice Bellot (LIS), Joachim Schöpfel (GERIICO), Daniel Egret (PSL)
Abstract
The cognitive manifold of published content is currently expanding in all areas of science. However, Scientific Knowledge Graphs (SKGs) provide only poor pictures of the adversarial directions and scientific controversies that feed the production of knowledge. In this article, we tackle the design of the information space of a cognitive representation of research activities, and the related bottlenecks that affect search interfaces when mapping structured objects into graphs. We propose, with the SKG GRAPHYP, a novel graph-designed geometric architecture that optimizes both the detection of the knowledge manifold of "cognitive communities" and the representation of alternative paths to adversarial answers to a research question, for instance in the context of academic disputes. With a methodology for designing "Manifold Subnetworks of Cognitive Communities", GRAPHYP provides a classification of distinct search paths in a research field. Users are detected from the variety of their search practices and classified into "cognitive communities" by analyzing the search history in their logs of scientific documentation. The manifold of practices is expressed through metrics of differentiated uses by triplets of nodes shaped into symmetrical graph subnetworks, with the following three parameters: Mass, Intensity, and Variety.
Copy Motion From One to Another: Fake Motion Video Generation
Abstract
One compelling application of artificial intelligence is to generate a video of a target person performing arbitrary desired motion (from a source person). While state-of-the-art methods are able to synthesize a video demonstrating similar broad-stroke motion details, they are generally lacking in texture details. A pertinent manifestation appears as distorted faces, feet, and hands, and such flaws are very sensitively perceived by human observers. Furthermore, current methods typically employ GANs with an L2 loss to assess the authenticity of the generated videos, inherently requiring a large amount of training samples to learn the texture details for adequate video generation. In this work, we tackle these challenges from three aspects: 1) We disentangle each video frame into foreground (the person) and background, focusing on generating the foreground to reduce the underlying dimension of the network output. 2) We propose a theoretically motivated Gromov-Wasserstein loss that facilitates learning the mapping from a pose to a foreground image. 3) To enhance texture details, we encode facial features with geometric guidance and employ local GANs to refine the face, feet, and hands. Extensive experiments show that our method is able to generate realistic target-person videos, faithfully copying complex motions from a source person. Our code and datasets are released at https://github.com/Sifann/FakeMotion
A Mapping Approach to Convert MTPs into a Capability and Skill Ontology
Authors: Aljosha Köcher, Lasse Beers, Alexander Fay
Abstract
Being able to quickly integrate new equipment and functions into an existing plant is a major goal for both discrete and process manufacturing. But currently, these two industry domains use different approaches to achieve this goal. While the Module Type Package (MTP) is being adopted more and more in practical applications of process manufacturing, so-called skill-based manufacturing approaches are favored in the context of discrete manufacturing. The two approaches are incompatible because their models feature different contents and they use different technologies. This contribution provides a comparison of the MTP with a skill-based approach as well as an automated mapping that can be used to transfer the contents of an MTP into a skill ontology. Through this mapping, an MTP can be semantically lifted in order to apply functions like querying or reasoning. Furthermore, machines that were previously described using two incompatible models can now be used in one production process.
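The "semantic lifting" idea can be sketched with rdflib: a service taken from an MTP-like description is expressed as triples in a capability/skill ontology so it becomes queryable with SPARQL. The namespace, class, and property names below are placeholders, not the ontology used in the paper.

```python
# Minimal semantic-lifting sketch with rdflib (hypothetical ontology terms).
from rdflib import Graph, Namespace, Literal, RDF

SKILL = Namespace("http://example.org/skill#")

g = Graph()
g.bind("skill", SKILL)

# One MTP-style service mapped to a skill individual with illustrative properties.
g.add((SKILL.DoseLiquid, RDF.type, SKILL.Skill))
g.add((SKILL.DoseLiquid, SKILL.providedBy, SKILL.DosingModule))
g.add((SKILL.DoseLiquid, SKILL.hasParameter, Literal("targetVolume")))

# Once lifted to RDF, the content becomes queryable:
q = "SELECT ?s WHERE { ?s a skill:Skill . }"
for row in g.query(q, initNs={"skill": SKILL}):
    print(row.s)
```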
Subspace Diffusion Generative Models
Authors: Bowen Jing, Gabriele Corso, Renato Berlinghieri, Tommi Jaakkola
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Score-based models generate samples by mapping noise to data (and vice versa) via a high-dimensional diffusion process. We question whether it is necessary to run this entire process at high dimensionality and incur all the inconveniences thereof. Instead, we restrict the diffusion via projections onto subspaces as the data distribution evolves toward noise. When applied to state-of-the-art models, our framework simultaneously improves sample quality -- reaching an FID of 2.17 on unconditional CIFAR-10 -- and reduces the computational cost of inference for the same number of denoising steps. Our framework is fully compatible with continuous-time diffusion and retains its flexible capabilities, including exact log-likelihoods and controllable generation. Code is available at https://github.com/bjing2016/subspace-diffusion.
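A toy sketch of the restriction step (not the authors' implementation): once the discarded directions are close to isotropic noise, the diffusion state can be projected onto a lower-dimensional subspace spanned by an orthonormal basis U and the remaining steps run in that subspace.

```python
# Toy subspace-projection sketch; the basis here is random, whereas in practice it
# would be data-dependent (e.g. low-frequency image components).
import numpy as np

rng = np.random.default_rng(0)
d, k = 64, 16                               # ambient and subspace dimensions

U, _ = np.linalg.qr(rng.standard_normal((d, k)))   # orthonormal basis of the subspace

x_t = rng.standard_normal(d)                # current diffusion state in R^d
z = U.T @ x_t                               # k-dimensional subspace coordinates
x_proj = U @ z                              # the same point embedded back in R^d

# Relative residual energy outside the subspace -- small when the projection loses little.
print(np.linalg.norm(x_t - x_proj) / np.linalg.norm(x_t))
```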
GeoRefine: Self-Supervised Online Depth Refinement for Accurate Dense Mapping
Authors: Pan Ji, Qingan Yan, Yuxin Ma, Yi Xu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
We present a robust and accurate depth refinement system, named GeoRefine, for geometrically-consistent dense mapping from monocular sequences. GeoRefine consists of three modules: a hybrid SLAM module using learning-based priors, an online depth refinement module leveraging self-supervision, and a global mapping module via TSDF fusion. The proposed system is online by design and achieves great robustness and accuracy via: (i) a robustified hybrid SLAM that incorporates learning-based optical flow and/or depth; (ii) self-supervised losses that leverage SLAM outputs and enforce long-term geometric consistency; (iii) careful system design that avoids degenerate cases in online depth refinement. We extensively evaluate GeoRefine on multiple public datasets and achieve absolute relative depth errors as low as 5%.
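The kind of self-supervised signal used for online depth refinement can be illustrated with a compact monodepth-style photometric reprojection loss: target pixels are back-projected with the predicted depth, moved by the SLAM relative pose, re-projected into the source frame, and the sampled source image is compared to the target. This is a generic sketch under assumed inputs, not GeoRefine's exact loss formulation.

```python
# Monodepth-style photometric reprojection loss (generic sketch, not GeoRefine's code).
import torch
import torch.nn.functional as F

def photometric_loss(target, source, depth, K, T_src_tgt):
    """target/source: (1,3,H,W); depth: (1,1,H,W); K: (3,3); T_src_tgt: (4,4)."""
    _, _, H, W = target.shape
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).reshape(3, -1)   # homogeneous pixels
    cam = torch.linalg.inv(K) @ pix * depth.reshape(1, -1)               # back-project with depth
    cam_h = torch.cat([cam, torch.ones(1, cam.shape[1])], 0)
    src = (T_src_tgt @ cam_h)[:3]                                        # move into source frame
    uv = K @ src
    uv = uv[:2] / uv[2].clamp(min=1e-6)                                  # re-project
    # Normalize to [-1, 1] for grid_sample and warp the source image onto the target.
    grid = torch.stack([uv[0] / (W - 1) * 2 - 1, uv[1] / (H - 1) * 2 - 1], -1)
    warped = F.grid_sample(source, grid.reshape(1, H, W, 2), align_corners=True)
    return (warped - target).abs().mean()

K = torch.tensor([[100.0, 0, 32], [0, 100.0, 24], [0, 0, 1]])
loss = photometric_loss(torch.rand(1, 3, 48, 64), torch.rand(1, 3, 48, 64),
                        torch.ones(1, 1, 48, 64) * 2.0, K, torch.eye(4))
print(loss)
```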
Keyword: localization
POCD: Probabilistic Object-Level Change Detection and Volumetric Mapping in Semi-Static Scenes
Authors: Jingxing Qian, Veronica Chatrath, Jun Yang, James Servos, Angela P. Schoellig, Steven L. Waslander
Abstract
Maintaining an up-to-date map to reflect recent changes in the scene is very important, particularly in situations involving repeated traversals by a robot operating in an environment over an extended period. Undetected changes may cause a deterioration in map quality, leading to poor localization, inefficient operations, and lost robots. Volumetric methods, such as truncated signed distance functions (TSDFs), have quickly gained traction due to their real-time production of a dense and detailed map, though map updating in scenes that change over time remains a challenge. We propose a framework that introduces a novel probabilistic object state representation to track object pose changes in semi-static scenes. The representation jointly models a stationarity score and a TSDF change measure for each object. A Bayesian update rule that incorporates both geometric and semantic information is derived to achieve consistent online map maintenance. To extensively evaluate our approach alongside the state-of-the-art, we release a novel real-world dataset in a warehouse environment. We also evaluate on the public ToyCar dataset. Our method outperforms state-of-the-art methods on the reconstruction quality of semi-static environments.
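The flavor of the per-object stationarity update can be shown with a toy Bayesian filter: each new observation of an object's TSDF change updates the probability that the object has stayed in place. The two likelihood values below are chosen purely for illustration and are not the paper's model.

```python
# Toy Bayesian update of a per-object stationarity probability (illustrative likelihoods).
def update_stationary(prior, tsdf_change, threshold=0.1):
    """prior: current P(object is stationary); tsdf_change: observed change magnitude."""
    small = tsdf_change < threshold
    p_obs_given_static = 0.9 if small else 0.2    # small change is expected if static
    p_obs_given_moved = 0.3 if small else 0.8     # large change is expected if moved
    evidence = p_obs_given_static * prior + p_obs_given_moved * (1.0 - prior)
    return p_obs_given_static * prior / evidence

p = 0.5                                   # uninformative prior
for change in [0.02, 0.03, 0.45, 0.50]:   # object later appears displaced
    p = update_stationary(p, change)
    print(f"observed change {change:.2f} -> P(stationary) = {p:.2f}")
```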
Keyword: SLAM
GeoRefine: Self-Supervised Online Depth Refinement for Accurate Dense Mapping
Keyword: Visual inertial
There is no result
Keyword: livox
There is no result
Keyword: loam
There is no result
Keyword: Visual inertial odometry
There is no result
Keyword: lidar
Cost-Aware Comparison of LiDAR-based 3D Object Detectors
Multimodal Detection of Unknown Objects on Roads for Autonomous Driving
RangeSeg: Range-Aware Real Time Segmentation of 3D LiDAR Point Clouds
Keyword: loop detection
There is no result
Keyword: autonomous driving
Multimodal Detection of Unknown Objects on Roads for Autonomous Driving
Multitask Network for Joint Object Detection, Semantic Segmentation and Human Pose Estimation in Vehicle Occupancy Monitoring
RangeSeg: Range-Aware Real Time Segmentation of 3D LiDAR Point Clouds
Keyword: mapping
Paving the Way for Mature Secondary Research: The Seven Types of Literature Review
A Hybrid Defense Method against Adversarial Attacks on Traffic Sign Classifiers in Autonomous Vehicles
GRAPHYP: A Scientific Knowledge Graph with Manifold Subnetworks of Communities. Detection of Scholarly Disputes in Adversarial Information Routes
Copy Motion From One to Another: Fake Motion Video Generation
A Mapping Approach to Convert MTPs into a Capability and Skill Ontology
Subspace Diffusion Generative Models
GeoRefine: Self-Supervised Online Depth Refinement for Accurate Dense Mapping
Keyword: localization
POCD: Probabilistic Object-Level Change Detection and Volumetric Mapping in Semi-Static Scenes