RL-PGO: Reinforcement Learning-based Planar Pose-Graph Optimization
Abstract
The objective of pose SLAM, or pose-graph optimization (PGO), is to estimate the trajectory of a robot given odometric and loop-closing constraints. State-of-the-art iterative approaches typically linearize a non-convex objective function and then repeatedly solve a set of normal equations. Moreover, these methods may converge to a local minimum, yielding sub-optimal results. In this work, we present, to the best of our knowledge, the first Deep Reinforcement Learning (DRL) based environment and agent for 2D pose-graph optimization. We demonstrate that the pose-graph optimization problem can be modeled as a partially observable Markov decision process and evaluate performance on real-world and synthetic datasets. The proposed agent outperforms the state-of-the-art solver g2o on challenging instances where traditional nonlinear least-squares techniques may fail or converge to unsatisfactory solutions. Experimental results indicate that iterative solvers bootstrapped with the proposed approach produce significantly higher-quality estimates. We believe that reinforcement learning-based PGO is a promising avenue for further accelerating research towards globally optimal algorithms; our work thus paves the way to new optimization strategies in the 2D pose SLAM domain.
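For context, the iterative baseline this abstract contrasts against linearizes the objective and solves the normal equations J^T J dx = -J^T r at every step. A minimal Gauss-Newton sketch of that loop, using a toy stand-in residual rather than actual pose-graph residuals:

```python
import numpy as np

def gauss_newton_step(residual_fn, x, eps=1e-6):
    """One Gauss-Newton iteration: linearize r(x), then solve the normal equations."""
    r = residual_fn(x)
    # Numerical Jacobian; a real PGO solver such as g2o uses analytic Jacobians.
    J = np.column_stack([(residual_fn(x + eps * e) - r) / eps for e in np.eye(len(x))])
    dx = np.linalg.solve(J.T @ J, -J.T @ r)   # normal equations
    return x + dx

# Toy usage with a stand-in residual function (not a pose-graph objective):
residual = lambda x: np.array([x[0] ** 2 - 2.0, x[0] + x[1] - 3.0, x[1] - 1.0])
x = np.array([1.0, 1.0])
for _ in range(10):
    x = gauss_newton_step(residual, x)
print(x)  # converges to the least-squares minimizer of the stand-in residual
```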
Keyword: Visual inertial
There is no result
Keyword: livox
There is no result
Keyword: loam
There is no result
Keyword: Visual inertial odometry
There is no result
Keyword: lidar
How much depth information can radar infer and contribute
Authors: Chen-Chou Lo, Patrick Vandewalle
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
Since the release of radar data in large-scale autonomous driving datasets, many works have proposed fusing radar data as an additional guidance signal into monocular depth estimation models. Although positive results are reported, it is still hard to tell how much depth information radar can actually infer and contribute to depth estimation models. In this paper, we conduct two experiments to investigate the intrinsic depth capability of radar data using state-of-the-art depth estimation models. Our experiments demonstrate that the depth estimated from only sparse radar input can capture the shape of the surroundings to a certain extent. Furthermore, a monocular depth estimation model supervised only by preprocessed radar during training can achieve 70% of the delta_1 score of a baseline model trained with sparse lidar.
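The delta_1 score referenced here is the standard depth-estimation accuracy metric: the fraction of valid pixels whose predicted-to-ground-truth depth ratio stays within a factor of 1.25. A minimal sketch:

```python
import numpy as np

def delta_1(pred, gt, thresh=1.25):
    """Fraction of valid pixels with max(pred/gt, gt/pred) < thresh."""
    mask = gt > 0  # evaluate only where ground-truth depth exists
    ratio = np.maximum(pred[mask] / gt[mask], gt[mask] / pred[mask])
    return float((ratio < thresh).mean())
```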
Robust Self-Supervised LiDAR Odometry via Representative Structure Discovery and 3D Inherent Error Modeling
Authors: Yan Xu, Junyi Lin, Jianping Shi, Guofeng Zhang, Xiaogang Wang, Hongsheng Li
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Abstract
Correct ego-motion estimation fundamentally relies on understanding the correspondences between adjacent LiDAR scans. However, given complex scenarios and low-resolution LiDAR, finding reliable structures for identifying correspondences can be challenging. In this paper, we delve into structure reliability for accurate self-supervised ego-motion estimation and aim to alleviate the influence of unreliable structures in the training, inference and mapping phases. We improve self-supervised LiDAR odometry substantially in three respects: 1) A two-stage odometry estimation network is developed, where we obtain the ego-motion by estimating a set of sub-region transformations and averaging them with a motion voting mechanism, encouraging the network to focus on representative structures. 2) The inherent alignment errors, which cannot be eliminated via ego-motion optimization, are down-weighted in the losses based on 3D point covariance estimates. 3) The discovered representative structures and learned point covariances are incorporated into the mapping module to improve the robustness of map construction. Our two-frame odometry outperforms the previous state of the art by 16%/12% in terms of translational/rotational errors on the KITTI dataset and performs consistently well on the Apollo-Southbay datasets. With our mapping module and more unlabeled training data, we can even rival fully supervised counterparts.
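The motion-voting idea, averaging per-sub-region rigid transforms into one ego-motion estimate, can be sketched roughly as a confidence-weighted mean over translations plus a sign-aligned weighted quaternion mean over rotations; the paper's actual voting mechanism may differ:

```python
import numpy as np

def vote_motion(translations, quaternions, weights):
    """Confidence-weighted fusion of per-sub-region rigid transforms.

    translations: (N, 3), quaternions: (N, 4) unit quaternions, weights: (N,).
    Quaternion averaging here is the simple normalized weighted sum after
    sign alignment -- an approximation valid for nearby rotations.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    t = (w[:, None] * translations).sum(axis=0)

    q = np.asarray(quaternions, dtype=float).copy()
    # Align signs to the first quaternion (q and -q encode the same rotation).
    q[(q @ q[0]) < 0] *= -1
    q_mean = (w[:, None] * q).sum(axis=0)
    return t, q_mean / np.linalg.norm(q_mean)
```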
Meta-RangeSeg: LiDAR Sequence Semantic Segmentation Using Multiple Feature Aggregation
Authors: Song Wang, Jianke Zhu, Ruixiang Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Abstract
The LiDAR sensor is essential to the perception system of autonomous vehicles and intelligent robots. To fulfill the real-time requirements of real-world applications, it is necessary to segment LiDAR scans efficiently. Most previous approaches directly project the 3D point cloud onto a 2D spherical range image so that the efficient 2D convolutional operations of image segmentation can be reused. Although encouraging results have been achieved, neighborhood information is not well preserved in the spherical projection. Moreover, temporal information is not taken into consideration in the single-scan segmentation task. To tackle these problems, we propose a novel approach to semantic segmentation of LiDAR sequences, named Meta-RangeSeg, where a novel range residual image representation is introduced to capture spatio-temporal information. Specifically, a Meta-Kernel is employed to extract meta features, which reduces the inconsistency between the 2D range image coordinates of the input and the Cartesian coordinates of the output. An efficient U-Net backbone is used to obtain multi-scale features. Furthermore, a Feature Aggregation Module (FAM) aggregates the meta features and multi-scale features, which tends to strengthen the role of the range channel. We have conducted extensive experiments on SemanticKITTI, the de-facto benchmark for LiDAR semantic segmentation. The promising results show that our proposed Meta-RangeSeg method is more efficient and effective than existing approaches.
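The spherical projection step shared by these range-image methods maps each point (x, y, z) to image coordinates via its azimuth and elevation. A common SemanticKITTI-style sketch; the field-of-view values are typical for a Velodyne HDL-64E and are assumptions here:

```python
import numpy as np

def spherical_projection(points, H=64, W=2048, fov_up=3.0, fov_down=-25.0):
    """Project an (N, 3) point cloud onto an H x W range image."""
    fov_up, fov_down = np.radians(fov_up), np.radians(fov_down)
    fov = fov_up - fov_down

    r = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(points[:, 1], points[:, 0])
    pitch = np.arcsin(points[:, 2] / r)

    u = (1.0 - (pitch - fov_down) / fov) * H        # row: top row = fov_up
    v = 0.5 * (yaw / np.pi + 1.0) * W               # col: wraps around azimuth
    u = np.clip(np.floor(u), 0, H - 1).astype(int)
    v = np.clip(np.floor(v), 0, W - 1).astype(int)

    image = np.full((H, W), -1.0)                   # -1 marks empty pixels
    order = np.argsort(r)[::-1]                     # far first, near overwrites
    image[u[order], v[order]] = r[order]
    return image
```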
Globally Optimal Boresight Alignment of UAV-LiDAR Systems
Authors: Smitha Gopinath, Hassan L. Hijazi, Adam Collins, Julian Dann, Nathan Lemons, Emily Schultz-Fellenz, Russell Bent, Amira Hijazi, Gert Riemersma
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
In airborne light detection and ranging (LiDAR) systems, misalignments between the LiDAR scanner and the inertial navigation system (INS) mounted on an unmanned aerial vehicle (UAV)'s frame can lead to inaccurate 3D point clouds. Determining the orientation offset, or boresight error, is key to many LiDAR-based applications. In this work, we introduce a mixed-integer quadratically constrained quadratic program (MIQCQP) that can solve this misalignment problem to global optimality. We also propose a nested spatial branch-and-bound (nsBB) algorithm that improves computational performance. The nsBB relies on novel preprocessing steps that progressively reduce the problem size. In addition, an adaptive grid search (aGS) that yields quick heuristic solutions is presented. Our algorithms are open-source, multi-threaded and multi-machine compatible.
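To make the grid-search idea concrete, the sketch below scores candidate boresight (roll, pitch, yaw) corrections by how thin a nominally flat ground patch becomes after re-rotating the points. Both the scoring criterion and the simplified geometry (rotating already-assembled points rather than re-applying the boresight inside the georeferencing chain) are illustrative assumptions, not the cost or algorithm used in the paper:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def boresight_grid_search(ground_points, span_deg=2.0, steps=9):
    """Brute-force search over small boresight angles minimizing ground 'thickness'."""
    angles = np.linspace(-span_deg, span_deg, steps)
    best = (np.inf, (0.0, 0.0, 0.0))
    for roll in angles:
        for pitch in angles:
            for yaw in angles:
                R = Rotation.from_euler("xyz", [roll, pitch, yaw], degrees=True)
                z = (ground_points @ R.as_matrix().T)[:, 2]
                score = z.std()  # a flat patch should have near-zero z-spread
                if score < best[0]:
                    best = (score, (roll, pitch, yaw))
    return best
```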
Cyber Mobility Mirror: Deep Learning-based Real-time 3D Object Perception and Reconstruction Using Roadside LiDAR
Authors: Zhengwei Bai, Saswat Priyadarshi Nayak, Xuanpeng Zhao, Guoyuan Wu, Matthew J. Barth, Xuewei Qi, Yongkang Liu, Kentaro Oguchi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Enabling Cooperative Driving Automation (CDA) requires high-fidelity and real-time perception information, which is available from onboard sensors or vehicle-to-everything (V2X) communications. Nevertheless, the accessibility of this information may suffer from the limited range and occlusion of perception, or from limited penetration rates in connectivity. In this paper, we introduce the prototype of Cyber Mobility Mirror (CMM), a next-generation real-time traffic surveillance system for 3D object detection, classification, tracking, and reconstruction, to provide CAVs with wide-range, high-fidelity perception information in a mixed-traffic environment. The CMM system consists of six main components: 1) a data pre-processor that retrieves and pre-processes raw data from the roadside LiDAR; 2) a 3D object detector that generates 3D bounding boxes from point cloud data; 3) a multi-object tracker that assigns unique IDs to detected objects and estimates their dynamic states; 4) a global locator that maps positioning information from the LiDAR coordinate system to geographic coordinates via a coordinate transformation; 5) a cloud-based communicator that transmits perception information from roadside sensors to equipped vehicles; and 6) an onboard advisor that reconstructs and displays the real-time traffic conditions via a Graphical User Interface (GUI). In this study, a field-operational prototype system is deployed at a real-world intersection, University Avenue and Iowa Avenue in Riverside, California, to assess the feasibility and performance of our CMM system. Results from field tests demonstrate that our CMM prototype can provide satisfactory perception performance with 96.99% precision and 83.62% recall. High-fidelity real-time traffic conditions (at the object level) can be displayed on the GUI of the equipped vehicle at a frequency of 3-4 Hz.
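The global-locator step (component 4) amounts to a rigid transform from the sensor frame into a projected map frame, followed by a datum conversion. A minimal sketch, where the 4x4 extrinsic matrix and its values are placeholders for whatever survey fixes the roadside unit's pose (EPSG:32611 is UTM zone 11N, which covers Riverside, CA):

```python
import numpy as np
from pyproj import Transformer

# Placeholder rigid transform from the LiDAR frame to a local UTM frame.
T_lidar_to_utm = np.eye(4)
T_lidar_to_utm[:3, 3] = [445000.0, 3760000.0, 300.0]  # fake UTM offset, meters

def lidar_to_latlon(points_lidar):
    """Map (N, 3) LiDAR points to (lat, lon) via UTM zone 11N."""
    pts = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])
    easting, northing, _, _ = T_lidar_to_utm @ pts.T
    to_wgs84 = Transformer.from_crs("EPSG:32611", "EPSG:4326", always_xy=True)
    lon, lat = to_wgs84.transform(easting, northing)
    return lat, lon
```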
Towards Class-agnostic Tracking Using Feature Decorrelation in Point Clouds
Authors: Shengjing Tian, Jun Liu, Xiuping Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Single object tracking in point clouds has been attracting more and more attention owing to the prevalence of LiDAR sensors in 3D vision. However, the existing methods based on deep neural networks focus mainly on training different models for different categories, which makes them unable to perform well in real-world applications when encountering classes unseen during the training phase. In this work, we therefore turn to a more challenging task in LiDAR point clouds, class-agnostic tracking, where a single general model is to be learned for any specified target of both observed and unseen categories. In particular, we first investigate the class-agnostic performance of state-of-the-art trackers by exposing unseen categories to them during testing, and find that a key factor for class-agnostic tracking is how to constrain the fused features between the template and search region so that they generalize when the distribution shifts from observed to unseen classes. We therefore propose a feature decorrelation method that eliminates the spurious correlations of the fused features through a set of learned weights and further makes the search region consistent among foreground points and distinctive between foreground and background points. Experiments on the KITTI and NuScenes benchmarks demonstrate that the proposed method achieves considerable improvements over the advanced trackers P2B and BAT, especially when tracking unseen objects.
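The paper's decorrelation works through learned sample weights. As a simpler stand-in for the general idea, the sketch below penalizes off-diagonal feature covariance directly, a covariance-penalty form of decorrelation, not the learned-weight scheme described above:

```python
import torch

def decorrelation_penalty(features):
    """Penalize off-diagonal covariance of a (batch, dim) feature matrix."""
    f = features - features.mean(dim=0, keepdim=True)
    cov = (f.T @ f) / (f.shape[0] - 1)
    off_diag = cov - torch.diag(torch.diag(cov))
    return (off_diag ** 2).sum()

# Usage: total_loss = tracking_loss + lamb * decorrelation_penalty(fused_features)
```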
Joint Camera Intrinsic and LiDAR-Camera Extrinsic Calibration
Abstract
Sensor-based environmental perception is a crucial step for autonomous driving systems, in which accurate calibration between multiple sensors plays a critical role. To calibrate LiDAR and camera, the common practice is to calibrate the intrinsics of the camera first and then calibrate the extrinsics between the LiDAR and the camera. If the camera's intrinsics are not calibrated correctly in the first stage, it is difficult to calibrate the LiDAR-camera extrinsics accurately. Owing to the complex internal structure of the camera and the lack of an effective quantitative evaluation method for the camera's intrinsic calibration, in practice the accuracy of extrinsic calibration is often degraded by small errors in the camera's intrinsic parameters. To this end, we propose a novel target-based method that jointly calibrates the camera intrinsics and the LiDAR-camera extrinsics. First, we design a novel calibration board pattern that adds four circular holes around the checkerboard for locating the LiDAR pose. Subsequently, a cost function defined under the reprojection constraints of the checkerboard and circular-hole features is designed to solve for the camera's intrinsic parameters, distortion factors, and the LiDAR-camera extrinsic parameters. Finally, quantitative and qualitative experiments are conducted in real and simulated environments, and the results show that the proposed method achieves accurate and robust performance. The open-source code is available at https://github.com/OpenCalib/JointCalib.
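A joint cost of this kind stacks the reprojection residuals of both feature types over one shared parameter vector. A rough single-frame sketch in which, for brevity, checkerboard corners and circle-hole centers are assumed to be expressed in a single common frame; the actual method (and the repository's API) carries separate board and LiDAR poses and a richer distortion model:

```python
import numpy as np
import cv2
from scipy.optimize import least_squares

def residuals(params, board_pts3d, board_pix, hole_pts3d, hole_pix):
    """Stacked reprojection errors for checkerboard corners and circle holes."""
    fx, fy, cx, cy, k1, k2 = params[:6]        # camera intrinsics + distortion
    rvec, tvec = params[6:9], params[9:12]     # extrinsic rotation/translation
    K = np.array([[fx, 0, cx], [0, fy, cy], [0, 0, 1.0]])
    dist = np.array([k1, k2, 0.0, 0.0])

    proj_b, _ = cv2.projectPoints(board_pts3d, rvec, tvec, K, dist)
    proj_h, _ = cv2.projectPoints(hole_pts3d, rvec, tvec, K, dist)
    return np.concatenate([(proj_b.reshape(-1, 2) - board_pix).ravel(),
                           (proj_h.reshape(-1, 2) - hole_pix).ravel()])

# result = least_squares(residuals, x0, args=(pts3d, pix, holes3d, holes_pix))
```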
ANTLER: Bayesian Nonlinear Tensor Learning and Modeler for Unstructured, Varying-Size Point Cloud Data
Authors: Michael Biehler, Hao Yan, Jianjun Shi
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Unstructured point clouds of varying sizes are increasingly acquired in a variety of environments through laser triangulation or Light Detection and Ranging (LiDAR). Predicting a scalar response based on unstructured point clouds is a common problem that arises in a wide variety of applications. The current literature relies on several pre-processing steps, such as structured subsampling and feature extraction, to analyze point cloud data. Those techniques lead to quantization artifacts and do not consider the relationship between the regression response and the point cloud during pre-processing. Therefore, we propose a general and holistic "Bayesian Nonlinear Tensor Learning and Modeler" (ANTLER) to model the relationship of unstructured, varying-size point cloud data with a scalar or multivariate response. The proposed ANTLER simultaneously optimizes a nonlinear tensor dimensionality reduction and a nonlinear regression model with a 3D point cloud input and a scalar or multivariate response. ANTLER is able to handle the complex data representation, high dimensionality, and inconsistent sizes of 3D point cloud data.
TEScalib: Targetless Extrinsic Self-Calibration of LiDAR and Stereo Camera for Automated Driving Vehicles with Uncertainty Analysis
Authors: Haohao Hu, Fengze Han, Frank Bieder, Jan-Hendrik Pauls, Christoph Stiller
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
In this paper, we present TEScalib, a novel extrinsic self-calibration approach for LiDAR and stereo cameras that uses the geometric and photometric information of the surrounding environment, without any calibration targets, for automated driving vehicles. Since LiDAR and stereo cameras are widely used for sensor data fusion on automated driving vehicles, their extrinsic calibration is highly important. However, most LiDAR and stereo camera calibration approaches are target-based and therefore time-consuming. Even the targetless approaches developed in recent years are either inaccurate or unsuitable for driving platforms. To address these problems, we introduce TEScalib. By applying a 3D mesh reconstruction-based point cloud registration, the geometric information is used to estimate the LiDAR-to-stereo-camera extrinsic parameters accurately and robustly. To calibrate the stereo camera, a photometric error function is built, and the LiDAR depth is used to transform key points from one camera to the other. During driving, these two parts are processed iteratively. In addition, we propose an uncertainty analysis that reflects the reliability of the estimated extrinsic parameters. Evaluated on the KITTI dataset, our TEScalib approach achieves very promising results.
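The photometric part boils down to warping a keypoint from the left into the right image using LiDAR depth and comparing intensities. A minimal sketch under a pinhole model with nearest-pixel sampling; the paper's actual error function may be richer:

```python
import numpy as np

def photometric_residual(p_left, depth, img_left, img_right, K_l, K_r, R, t):
    """Intensity difference after warping one left-image pixel into the right image.

    p_left: integer (u, v) pixel coordinates; depth: metric LiDAR depth at that
    pixel; (R, t): pose of the right camera relative to the left camera.
    """
    u, v = p_left
    ray = np.linalg.inv(K_l) @ np.array([u, v, 1.0])   # back-project the pixel
    X_left = depth * ray                               # 3D point in left frame
    X_right = R @ X_left + t                           # transform to right frame
    uvw = K_r @ X_right
    ur, vr = uvw[:2] / uvw[2]                          # project into right image
    return float(img_left[v, u]) - float(img_right[int(round(vr)), int(round(ur))])
```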
Large-Scale 3D Semantic Reconstruction for Automated Driving Vehicles with Adaptive Truncated Signed Distance Function
Authors: Haohao Hu, Hexing Yang, Jian Wu, Xiao Lei, Frank Bieder, Jan-Hendrik Pauls, Christoph Stiller
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Large-scale 3D reconstruction, texturing, and semantic mapping are nowadays widely used for automated driving vehicles, virtual reality, and automatic data generation. However, most approaches are developed for RGB-D cameras producing colored dense point clouds and are not suitable for large-scale outdoor environments with sparse LiDAR point clouds. Since a 3D surface can usually be observed in multiple camera images with different view poses, an optimal image patch selection for texturing and an optimal semantic class estimation for semantic mapping are still challenging. To address these problems, we propose a novel 3D reconstruction, texturing, and semantic mapping system using LiDAR and camera sensors. An Adaptive Truncated Signed Distance Function is introduced to describe surfaces implicitly; it can deal with different LiDAR point sparsities and improves model quality. The triangle mesh map extracted from this implicit function is then textured from a series of registered camera images by applying an optimal image patch selection strategy. In addition, a Markov Random Field-based data fusion approach is proposed to estimate the optimal semantic class for each triangle mesh. Our approach is evaluated on a synthetic dataset, the KITTI dataset, and a dataset recorded with our experimental vehicle. The results show that the 3D models generated with our approach are more accurate than those of other state-of-the-art approaches. The texturing and semantic mapping also achieve very promising results.
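For reference, the classic (non-adaptive) truncated signed distance update keeps a weighted running average per voxel; the adaptive variant in the paper varies the truncation with point sparsity, which this minimal sketch does not attempt:

```python
import numpy as np

def update_voxel(tsdf, weight, sdf_obs, trunc=0.3, w_obs=1.0):
    """Fuse one signed-distance observation into a voxel (running weighted mean)."""
    d = np.clip(sdf_obs / trunc, -1.0, 1.0)   # truncate and normalize
    new_weight = weight + w_obs
    new_tsdf = (tsdf * weight + d * w_obs) / new_weight
    return new_tsdf, new_weight

# Usage: for each LiDAR ray, compute sdf_obs = (measured range - voxel range)
# along the ray and call update_voxel on every voxel inside the truncation band.
```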
Keyword: loop detection
There is no result
Keyword: autonomous driving
Photonic reinforcement learning based on optoelectronic reservoir computing
Abstract
Reinforcement learning has been intensively investigated and developed in artificial intelligence for settings without training data, such as autonomous driving, robot control, internet advertising, and elastic optical networks. However, the computational cost of reinforcement learning with deep neural networks is extremely high, and reducing this learning cost is a challenging issue. We propose a photonic online implementation of reinforcement learning using optoelectronic delay-based reservoir computing, both experimentally and numerically. In the proposed scheme, reinforcement learning is accelerated to rates of several megahertz because reservoir computing requires no learning process for the internal connection weights. We evaluate the proposed scheme on two benchmark tasks, CartPole-v0 and MountainCar-v0. Our results represent the first hardware implementation of reinforcement learning based on photonic reservoir computing and pave the way for fast and efficient reinforcement learning as a novel photonic accelerator.
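The efficiency argument rests on the reservoir principle: the recurrent weights stay fixed and random, and only a linear readout is trained. A minimal software (echo state network) analogue of that principle, not the optoelectronic delay system itself:

```python
import numpy as np

rng = np.random.default_rng(0)
N_IN, N_RES = 4, 200                           # e.g. CartPole-v0 has 4 state variables

W_in = rng.uniform(-0.5, 0.5, (N_RES, N_IN))   # fixed random input weights
W = rng.normal(0, 1, (N_RES, N_RES))
W *= 0.9 / np.abs(np.linalg.eigvals(W)).max()  # scale spectral radius below 1

def run_reservoir(inputs, leak=0.3):
    """Collect reservoir states for a (T, N_IN) input sequence; nothing is trained."""
    x = np.zeros(N_RES)
    states = []
    for u in inputs:
        x = (1 - leak) * x + leak * np.tanh(W_in @ u + W @ x)
        states.append(x.copy())
    return np.array(states)

def fit_readout(states, y, ridge=1e-6):
    """Only the linear readout is fit, here by ridge regression to targets y."""
    return np.linalg.solve(states.T @ states + ridge * np.eye(N_RES), states.T @ y)
```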
Attacks and Faults Injection in Self-Driving Agents on the Carla Simulator -- Experience Report
Authors: Niccolò Piazzesi, Massimo Hong, Andrea Ceccarelli
Subjects: Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Abstract
Machine Learning applications are acknowledged as foundational to autonomous driving, because they are the enabling technology for most driving tasks. However, the inclusion of trained agents in automotive systems exposes the vehicle to novel attacks and faults that can result in safety threats to the driving tasks. In this paper we report on our experimental campaign on the injection of adversarial attacks and software faults in a self-driving agent running in a driving simulator. We show that adversarial attacks and faults injected in the trained agent can lead to erroneous decisions and severely jeopardize safety. The paper presents a feasible and easily reproducible approach based on an open-source simulator and tools, and the results clearly motivate the need for both protective measures and extensive testing campaigns.
How much depth information can radar infer and contribute
Authors: Chen-Chou Lo, Patrick Vandewalle
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
Since the release of radar data in large-scale autonomous driving datasets, many works have proposed fusing radar data as an additional guidance signal into monocular depth estimation models. Although positive results are reported, it is still hard to tell how much depth information radar can actually infer and contribute to depth estimation models. In this paper, we conduct two experiments to investigate the intrinsic depth capability of radar data using state-of-the-art depth estimation models. Our experiments demonstrate that the depth estimated from only sparse radar input can capture the shape of the surroundings to a certain extent. Furthermore, a monocular depth estimation model supervised only by preprocessed radar during training can achieve 70% of the delta_1 score of a baseline model trained with sparse lidar.
Aggressive Racecar Drifting Control Using Onboard Cameras and Inertial Measurement Unit
Abstract
Complex autonomous driving maneuvers, such as drifting, require high-precision and high-frequency pose information to ensure accuracy and safety, which is notably difficult when using only onboard sensors. In this paper, we propose a drift controller with two feedback control loops: a sideslip controller that stabilizes the sideslip angle by tuning the front-wheel steering angle, and a circle controller that maintains a stable trajectory radius and circle center by controlling the wheel rotational speed. We use an extended Kalman filter to estimate the state. A robustified KASA algorithm is further proposed to accurately estimate the parameters of the circle (i.e., the center and radius) that best fits the current trajectory. Under the premise that the vehicle undergoes uniform circular motion during stable drift, we use angle information instead of acceleration to describe the vehicle's dynamics. We implement our method on a 1/10-scale race car. The car drifts stably with a given center and radius, which illustrates the effectiveness of our method.
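The underlying KASA circle fit is a linear least-squares problem: writing the circle as x^2 + y^2 = 2ax + 2by + c gives one linear equation per trajectory point, after which the center (a, b) and radius follow in closed form. A sketch of the plain (non-robustified) fit; the paper's robustification against outliers is not reproduced here:

```python
import numpy as np

def kasa_circle_fit(x, y):
    """Least-squares circle fit via x^2 + y^2 = 2*a*x + 2*b*y + c."""
    A = np.column_stack([2 * x, 2 * y, np.ones_like(x)])
    rhs = x**2 + y**2
    (a, b, c), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    radius = np.sqrt(c + a**2 + b**2)   # since c = r^2 - a^2 - b^2
    return (a, b), radius

# Quick check on a noiseless circle of radius 2 centered at (1, -1):
t = np.linspace(0, 2 * np.pi, 50)
center, r = kasa_circle_fit(1 + 2 * np.cos(t), -1 + 2 * np.sin(t))
print(center, r)  # -> approximately (1.0, -1.0), 2.0
```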
Unsupervised Representation Learning for Point Clouds: A Survey
Authors: Aoran Xiao, Jiaxing Huang, Dayan Guan, Shijian Lu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Point cloud data have been widely explored due to their superior accuracy and robustness under various adverse situations. Meanwhile, deep neural networks (DNNs) have achieved very impressive success in various applications such as surveillance and autonomous driving. The convergence of point clouds and DNNs has led to many deep point cloud models, largely trained under the supervision of large-scale, densely labelled point cloud data. Unsupervised point cloud representation learning, which aims to learn general and useful point cloud representations from unlabelled point cloud data, has recently attracted increasing attention due to the cost of large-scale point cloud labelling. This paper provides a comprehensive review of unsupervised point cloud representation learning using DNNs. It first describes the motivation, general pipelines, and terminology of recent studies. Relevant background, including widely adopted point cloud datasets and DNN architectures, is then briefly presented. This is followed by an extensive discussion of existing unsupervised point cloud representation learning methods organized by their technical approaches. We also quantitatively benchmark and discuss the reviewed methods over multiple widely adopted point cloud datasets. Finally, we share our thoughts on several challenges and open problems for future research in unsupervised point cloud representation learning. A project associated with this survey has been built at https://github.com/xiaoaoran/3d_url_survey.
Joint Camera Intrinsic and LiDAR-Camera Extrinsic Calibration
Abstract
Sensor-based environmental perception is a crucial step for autonomous driving systems, in which accurate calibration between multiple sensors plays a critical role. To calibrate LiDAR and camera, the common practice is to calibrate the intrinsics of the camera first and then calibrate the extrinsics between the LiDAR and the camera. If the camera's intrinsics are not calibrated correctly in the first stage, it is difficult to calibrate the LiDAR-camera extrinsics accurately. Owing to the complex internal structure of the camera and the lack of an effective quantitative evaluation method for the camera's intrinsic calibration, in practice the accuracy of extrinsic calibration is often degraded by small errors in the camera's intrinsic parameters. To this end, we propose a novel target-based method that jointly calibrates the camera intrinsics and the LiDAR-camera extrinsics. First, we design a novel calibration board pattern that adds four circular holes around the checkerboard for locating the LiDAR pose. Subsequently, a cost function defined under the reprojection constraints of the checkerboard and circular-hole features is designed to solve for the camera's intrinsic parameters, distortion factors, and the LiDAR-camera extrinsic parameters. Finally, quantitative and qualitative experiments are conducted in real and simulated environments, and the results show that the proposed method achieves accurate and robust performance. The open-source code is available at https://github.com/OpenCalib/JointCalib.
Path-Aware Graph Attention for HD Maps in Motion Prediction
Abstract
The success of motion prediction for autonomous driving relies on the integration of information from HD maps. As maps are naturally graph-structured, investigation of graph neural networks (GNNs) for encoding HD maps has burgeoned in recent years. However, unlike many other applications where GNNs have been deployed straightforwardly, HD maps are heterogeneous graphs whose vertices (lanes) are connected by edges (lane-lane interaction relationships) of various kinds, and most graph-based models are not designed to understand the variety of edge types, which provide crucial cues for predicting how agents will travel along the lanes. To overcome this challenge, we propose Path-Aware Graph Attention, a novel attention architecture that infers the attention between two vertices by parsing the sequence of edges forming the paths that connect them. Our analysis illustrates how the proposed attention mechanism can facilitate learning in a didactic problem where existing graph networks such as GCN struggle. By improving map encoding, the proposed model surpasses the previous state of the art on the Argoverse Motion Forecasting dataset and won first place in the 2021 Argoverse Motion Forecasting Competition.
"If you could see me through my eyes": Predicting Pedestrian Perception
Abstract
Pedestrians are particularly vulnerable road users in urban traffic. With the arrival of autonomous driving, novel technologies can be developed specifically to protect pedestrians. We propose a machine learning toolchain to train artificial neural networks as models of pedestrian behavior. In a preliminary study, we use synthetic data from simulations of a specific pedestrian crossing scenario to train a variational autoencoder and a long short-term memory network to predict a pedestrian's future visual perception. We can accurately predict a pedestrian's future perceptions within relevant time horizons. Our results indicate that, by iteratively feeding these predicted frames back into the networks, they can be used as simulations of pedestrians. Such trained networks could later be used to predict pedestrian behavior even from the perspective of the autonomous car. Another future extension will be to re-train these networks with real-world video data.
Keyword: mapping
Conformal capacity and polycircular domains
Authors: Harri Hakula, Mohamed M. S. Nasser, Matti Vuorinen
Abstract
We study the numerical conformal mapping of multiply connected planar domains whose boundaries consist of unions of finitely many circular arcs, so-called polycircular domains, and compute the conformal capacities of condensers defined by such domains. Experimental error estimates are provided for the computed capacity and, when possible, the rate of convergence under refinement of the discretisation is analysed. The main ingredients of the computation are two computational methods: on the one hand, the boundary integral equation method combined with the fast multipole method, and on the other hand, the $hp$-FEM method. The results obtained with these two methods agree with high accuracy.
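For readers outside the area: the conformal capacity of a condenser $(E, F)$ in a domain $G$ is the usual Dirichlet-energy infimum, recalled here for convenience (normalization and admissibility conventions vary slightly across the literature):

```latex
% Conformal capacity of a condenser (E, F) in a planar domain G:
% infimum of the Dirichlet integral over admissible potentials u
% with boundary values u = 0 on E and u = 1 on F.
\operatorname{cap}(E, F; G)
  = \inf_{u \in \mathcal{A}} \int_{G} |\nabla u|^{2} \, dm,
\qquad
\mathcal{A} = \{\, u \in C(G) \cap W^{1,2}_{\mathrm{loc}}(G) : u|_{E} = 0,\ u|_{F} = 1 \,\}.
```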
Multi-view Gradient Consistency for SVBRDF Estimation of Complex Scenes under Natural Illumination
Authors: Alen Joy, Charalambos Poullis
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Abstract
This paper presents a process for estimating the spatially varying surface reflectance of complex scenes observed under natural illumination. In contrast to previous methods, our process is not limited to scenes viewed under controlled lighting conditions but can handle complex indoor and outdoor scenes viewed under arbitrary illumination conditions. An end-to-end process uses a model of the scene's geometry and several images capturing the scene's surfaces from arbitrary viewpoints and under various natural illumination conditions. We develop a differentiable path tracer that leverages least-squares conformal mapping to handle multiple disjoint objects appearing in the scene. We follow a two-step optimization process and introduce a multi-view gradient consistency loss, which results in up to a 30-50% improvement in the image reconstruction loss and can further achieve better disentanglement of the diffuse and specular BRDFs compared to other state-of-the-art methods. We demonstrate the process on real-world indoor and outdoor scenes from images in the wild and show that we can produce realistic renders consistent with actual images using the estimated reflectance properties. Experiments show that our technique produces realistic results for arbitrary outdoor scenes with complex geometry. The source code is publicly available at: https://gitlab.com/alen.joy/multi-view-gradient-consistency-for-svbrdf-estimation-of-complex-scenes-under-natural-illumination
Collision-free Path Planning on Arbitrary Optimization Criteria in the Latent Space through cGANs
Abstract
We propose a new method for collision-free path planning using Conditional Generative Adversarial Networks (cGANs), which maps their latent space to only the collision-free areas of the robot joint space when an obstacle map is given as a condition. When manipulating a robot arm, it is necessary, for safety reasons, to generate trajectories that avoid contact with the robot itself or the surrounding environment, and it is convenient to be able to generate multiple arbitrary trajectories suited to respective purposes. In the proposed method, various obstacle-avoiding trajectories can be generated by connecting the start and goal with arbitrary line segments in this latent space, as sketched below. Our method simply provides this collision-free latent space, after which any planner, using any optimization criteria, can be used to generate the most suitable paths on the fly. We verified this method with a simulated and an actual UR5e 6-DoF robotic arm, and confirmed that different trajectories can be generated according to different optimization criteria.
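A minimal sketch of the decode-a-line-segment idea; `generator` is a placeholder for a trained cGAN generator mapping a (latent code, obstacle map) pair to a joint configuration:

```python
import numpy as np

def latent_line_trajectory(generator, obstacle_map, z_start, z_goal, steps=50):
    """Decode points along a straight latent-space segment into joint configurations.

    If the latent space covers only collision-free configurations, every decoded
    waypoint is collision-free by construction.
    """
    alphas = np.linspace(0.0, 1.0, steps)
    return [generator((1 - a) * z_start + a * z_goal, obstacle_map) for a in alphas]
```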
Robust Self-Supervised LiDAR Odometry via Representative Structure Discovery and 3D Inherent Error Modeling
Authors: Yan Xu, Junyi Lin, Jianping Shi, Guofeng Zhang, Xiaogang Wang, Hongsheng Li
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Abstract
Correct ego-motion estimation fundamentally relies on understanding the correspondences between adjacent LiDAR scans. However, given complex scenarios and low-resolution LiDAR, finding reliable structures for identifying correspondences can be challenging. In this paper, we delve into structure reliability for accurate self-supervised ego-motion estimation and aim to alleviate the influence of unreliable structures in the training, inference and mapping phases. We improve self-supervised LiDAR odometry substantially in three respects: 1) A two-stage odometry estimation network is developed, where we obtain the ego-motion by estimating a set of sub-region transformations and averaging them with a motion voting mechanism, encouraging the network to focus on representative structures. 2) The inherent alignment errors, which cannot be eliminated via ego-motion optimization, are down-weighted in the losses based on 3D point covariance estimates. 3) The discovered representative structures and learned point covariances are incorporated into the mapping module to improve the robustness of map construction. Our two-frame odometry outperforms the previous state of the art by 16%/12% in terms of translational/rotational errors on the KITTI dataset and performs consistently well on the Apollo-Southbay datasets. With our mapping module and more unlabeled training data, we can even rival fully supervised counterparts.
TC-Net: Triple Context Network for Automated Stroke Lesion Segmentation
Authors: Xiuquan Du, Kunpeng Ma
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Accurate lesion segmentation plays a key role in the clinical mapping of stroke. Convolutional neural network (CNN) approaches based on U-shaped structures have achieved remarkable performance on this task. However, a single-stage encoder-decoder cannot resolve inter-class similarity due to its inadequate utilization of contextual information, such as lesion-tissue similarity. In addition, most approaches use fine-grained spatial attention to capture spatial context information, yet fail to generate accurate attention maps in the encoding stage and lack effective regularization. In this work, we propose a new network, the Triple Context Network (TC-Net), built around the capture of spatial contextual information. We first design a coarse-grained patch attention module that generates patch-level attention maps in the encoding stage to distinguish targets from patches and learn target-specific detail features. Then, to enrich the boundary information of these features, a cross-feature fusion module with global contextual information is introduced to guide the selective aggregation of 2D and 3D feature maps, compensating for the limited boundary-learning capability of 2D convolutions. Finally, we use multi-scale deconvolution instead of linear interpolation to enhance the recovery of target spatial and boundary information in the decoding stage. Our network is evaluated on the open ATLAS dataset, achieving the highest DSC score of 0.594, a Hausdorff distance of 27.005 mm, and an average symmetric surface distance of 7.137 mm, outperforming other state-of-the-art methods.
SFIP: Coarse-Grained Syscall-Flow-Integrity Protection in Modern Systems
Authors: Claudio Canella, Sebastian Dorn, Daniel Gruss, Michael Schwarz
Abstract
Growing code bases of modern applications have led to a steady increase in the number of vulnerabilities. Control-Flow Integrity (CFI) is one promising mitigation that is increasingly widely deployed and prevents numerous exploits. However, CFI focuses purely on one security domain: transitions between user space and kernel space are not protected by CFI. Furthermore, if user-space CFI is bypassed, the system and kernel interfaces remain unprotected, and an attacker can run arbitrary transitions. In this paper, we introduce the concept of syscall-flow-integrity protection (SFIP), which complements CFI with integrity for user-kernel transitions. Our proof-of-concept implementation relies on static analysis during compilation to automatically extract possible syscall transitions. An application can opt in to SFIP by providing the extracted information to the kernel for runtime enforcement. The concept is built on three fully automated pillars: first, a syscall state machine representing possible transitions according to a syscall digraph model; second, a syscall-origin mapping, which maps syscalls to the locations at which they can occur; and third, an efficient enforcement of syscall-flow integrity in a modified Linux kernel. In our evaluation, we show that SFIP can be applied to large-scale applications with minimal slowdowns: in a micro- and a macrobenchmark, it introduces an overhead of only 13.1% and 1.8%, respectively. In terms of security, we discuss and demonstrate its effectiveness in preventing control-flow-hijacking attacks in real-world applications. Finally, to highlight the reduction in attack surface, we analyze the state machines and syscall-origin mappings of several real-world applications. On average, SFIP decreases the number of possible transitions by 38.6% compared to seccomp and by 90.9% when no protection is applied.
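Conceptually, the syscall state machine is a digraph of allowed transitions consulted on every syscall entry. A toy user-space model of the enforcement idea; the syscall names and transitions here are invented for illustration, and the real mechanism lives in a modified Linux kernel:

```python
# Toy model of syscall-flow integrity: allowed successor syscalls per syscall.
ALLOWED = {
    "openat": {"read", "fstat", "close"},
    "read":   {"read", "write", "close"},
    "write":  {"write", "read", "close"},
    "close":  {"openat", "exit_group"},
}

class SyscallFlowMonitor:
    def __init__(self, start="openat"):
        self.state = start

    def enter(self, syscall):
        """Check one transition against the extracted digraph, then advance."""
        if syscall not in ALLOWED.get(self.state, set()):
            raise RuntimeError(f"SFIP violation: {self.state} -> {syscall}")
        self.state = syscall

m = SyscallFlowMonitor()
for sc in ["read", "write", "close", "exit_group"]:
    m.enter(sc)   # raises on any transition outside the digraph
```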
Large-Scale 3D Semantic Reconstruction for Automated Driving Vehicles with Adaptive Truncated Signed Distance Function
Authors: Haohao Hu, Hexing Yang, Jian Wu, Xiao Lei, Frank Bieder, Jan-Hendrik Pauls, Christoph Stiller
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Large-scale 3D reconstruction, texturing, and semantic mapping are nowadays widely used for automated driving vehicles, virtual reality, and automatic data generation. However, most approaches are developed for RGB-D cameras producing colored dense point clouds and are not suitable for large-scale outdoor environments with sparse LiDAR point clouds. Since a 3D surface can usually be observed in multiple camera images with different view poses, an optimal image patch selection for texturing and an optimal semantic class estimation for semantic mapping are still challenging. To address these problems, we propose a novel 3D reconstruction, texturing, and semantic mapping system using LiDAR and camera sensors. An Adaptive Truncated Signed Distance Function is introduced to describe surfaces implicitly; it can deal with different LiDAR point sparsities and improves model quality. The triangle mesh map extracted from this implicit function is then textured from a series of registered camera images by applying an optimal image patch selection strategy. In addition, a Markov Random Field-based data fusion approach is proposed to estimate the optimal semantic class for each triangle mesh. Our approach is evaluated on a synthetic dataset, the KITTI dataset, and a dataset recorded with our experimental vehicle. The results show that the 3D models generated with our approach are more accurate than those of other state-of-the-art approaches. The texturing and semantic mapping also achieve very promising results.
A Novel Viewport-Adaptive Motion Compensation Technique for Fisheye Video
Authors: Andy Regensky, Christian Herglotz, André Kaup
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
Although fisheye cameras are in high demand in many application areas due to their large field of view, many image and video signal processing tasks, such as motion compensation, suffer from the strong radial distortions they introduce. A recently proposed projection-based approach takes the fisheye projection into account to improve fisheye motion compensation. However, that approach does not consider the large field of view of fisheye lenses, which requires the consideration of different motion planes in 3D space. We propose a novel viewport-adaptive motion compensation technique that applies the motion vectors in different perspective viewports in order to realize these motion planes. Thereby, some pixels are mapped to so-called virtual image planes and require special treatment to obtain reliable mappings between the perspective viewports and the original fisheye image. While the state-of-the-art ultra-wide-angle compensation is sufficiently accurate, we propose a virtual image plane compensation that leads to perfect mappings. All in all, we achieve average gains of +2.40 dB in terms of PSNR compared to the state of the art in fisheye motion compensation.
Motion dynamics of inertial pair coupled via frictional interface
Authors: Michael Ruderman, Andrei Zagvozdkin, Dmitrii Rachinskii
Abstract
Understanding how the motion dynamics of two moving bodies coupled by an unbounded friction interface arise is essential for multiple system and control applications. The coupling terms in the dynamics of an inertial pair linked through a passive frictional contact are nontrivial and have long remained understudied. The problem is especially demanding from the viewpoint of the interaction forces and motion states. This paper introduces a generalized problem of relative motion in systems with an unbounded (i.e., free of motion constraints) frictional interface, assuming classical Coulomb friction with a discontinuity at the velocity zero crossing. We formulate the motion dynamics in the closed form of ordinary differential equations, which use the sign operator to capture both the Coulomb friction and the switching conditions, and discuss their validity in generalized force and motion coordinates. We study a system with one active degree of freedom (a driving body) and one passive degree of freedom (a driven body), and analyze and demonstrate the global convergence of trajectories in the free case, i.e., without external control. An illustrative case study is presented for a harmonic oscillator carrying a second, friction-coupled mass that is not connected (or joint-linked) to the ground. This example elucidates the problem statement and the proposed modeling framework. Relevant future developments and related open questions are discussed at the end of the paper.
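A minimal form consistent with this description, written here as an assumption rather than the paper's exact model: a driving mass $m_1$ (subject to an external or restoring force $u$) coupled to a driven mass $m_2$ only through Coulomb friction of magnitude $F_c$ acting against the relative velocity,

```latex
% Driving body and driven body, coupled only by a Coulomb frictional interface;
% the sign operator captures both the friction force and the switching conditions.
m_1 \ddot{x}_1 = u(x_1, \dot{x}_1, t) - F_c \,\operatorname{sign}(\dot{x}_1 - \dot{x}_2),
\qquad
m_2 \ddot{x}_2 = F_c \,\operatorname{sign}(\dot{x}_1 - \dot{x}_2).
```

For the harmonic-oscillator case study, $u$ would be the spring restoring force acting on the driving mass.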
Keyword: localization
An End-to-End Transformer Model for Crowd Localization
Authors: Dingkang Liang, Wei Xu, Xiang Bai
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Crowd localization, i.e., predicting head positions, is a more practical and higher-level task than simple counting. Existing methods employ pseudo-bounding boxes or pre-designed localization maps and rely on complex post-processing to obtain head positions. In this paper, we propose an elegant, end-to-end Crowd Localization TRansformer, named CLTR, that solves the task in a regression-based paradigm. The proposed method views crowd localization as a direct set prediction problem, taking extracted features and trainable embeddings as input to the transformer decoder. To achieve good matching results, we introduce a KMO-based Hungarian matcher, which innovatively revisits label assignment from a context view instead of an independent instance view. Extensive experiments conducted on five datasets under various data settings show the effectiveness of our method. In particular, the proposed method achieves the best localization performance on the NWPU-Crowd, UCF-QNRF, and ShanghaiTech Part A datasets.
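The Hungarian step in such set-prediction matchers assigns each predicted point to a ground-truth head by minimizing a global cost. A minimal sketch with a plain L2 cost; the KMO-based context cost is the paper's contribution and is not reproduced here:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_predictions(pred_pts, gt_pts):
    """One-to-one assignment of predicted to ground-truth head positions."""
    # Pairwise Euclidean distances as the assignment cost matrix.
    cost = np.linalg.norm(pred_pts[:, None, :] - gt_pts[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)   # Hungarian algorithm
    return list(zip(rows, cols))
```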
How to Debug Inclusivity Bugs? A Debugging Process with Information Architecture
Authors: Mariam Guizani, Igor Steinmacher, Jillian Emard, Abrar Fallatah, Margaret Burnett, Anita Sarma
Subjects: Software Engineering (cs.SE); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
Abstract
Although some previous research has found ways to detect inclusivity bugs (biases in software that introduce inequities), little attention has been paid to how to go about fixing such bugs. Without a process to move from finding to fixing, acting upon such findings is an ad-hoc activity, at the mercy of the skills of each individual developer. To address this gap, we created Why/Where/Fix, a systematic inclusivity debugging process whose inclusivity fault localization harnesses Information Architecture (IA) -- the way user-facing information is organized, structured, and labeled. We then conducted a multi-stage qualitative empirical evaluation of the effectiveness of Why/Where/Fix, using an Open Source Software (OSS) project's infrastructure as our setting. In our study, the OSS project team used the Why/Where/Fix process to find inclusivity bugs, localize the IA faults behind them, and then fix the IA to remove the inclusivity bugs they had found. Our results showed that using Why/Where/Fix reduced the number of inclusivity bugs that OSS newcomer participants experienced by 90%.
StrongSORT: Make DeepSORT Great Again
Authors: Yunhao Du, Yang Song, Bo Yang, Yanyun Zhao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Existing Multi-Object Tracking (MOT) methods can be roughly classified into tracking-by-detection and joint-detection-association paradigms. Although the latter has elicited more attention and demonstrates comparable performance to the former, we claim that the tracking-by-detection paradigm is still the optimal solution in terms of tracking accuracy. In this paper, we revisit the classic tracker DeepSORT and upgrade it in various aspects, i.e., detection, embedding, and association. The resulting tracker, called StrongSORT, sets new HOTA and IDF1 records on MOT17 and MOT20. We also present two lightweight, plug-and-play algorithms that further refine the tracking results. First, an appearance-free link model (AFLink) is proposed to associate short tracklets into complete trajectories; to the best of our knowledge, this is the first global link model without appearance information. Second, we propose Gaussian-smoothed interpolation (GSI) to compensate for missing detections. Instead of ignoring motion information as linear interpolation does, GSI is based on Gaussian process regression and achieves more accurate localizations. Moreover, AFLink and GSI can be plugged into various trackers at negligible extra computational cost (591.9 and 140.9 Hz, respectively, on MOT17). By integrating StrongSORT with the two algorithms, the final tracker, StrongSORT++, ranks first on MOT17 and MOT20 in terms of the HOTA and IDF1 metrics and surpasses the second-place method by 1.3 - 2.2. Code will be released soon.
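The GSI idea, filling detection gaps by regressing positions over frame indices with a Gaussian process rather than a straight line, can be sketched as follows; the RBF kernel and its length scale are assumptions, not the paper's exact configuration:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def gp_interpolate(frames, positions, missing_frames):
    """Predict (x, y) box centers for missing frames of one tracklet.

    frames: (N,) frame indices with detections; positions: (N, 2) box centers.
    """
    gpr = GaussianProcessRegressor(kernel=RBF(length_scale=10.0), alpha=1e-2)
    gpr.fit(np.asarray(frames).reshape(-1, 1), positions)
    return gpr.predict(np.asarray(missing_frames).reshape(-1, 1))
```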
Localization via Multiple Reconfigurable Intelligent Surfaces Equipped with Single Receive RF Chains
Authors: George C. Alexandropoulos, Ioanna Vinieratou, Henk Wymeersch
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
The extra degrees of freedom afforded by Reconfigurable Intelligent Surfaces (RISs) for smart signal propagation can be exploited for high-accuracy localization and tracking. In this paper, capitalizing on a recent RIS hardware architecture that incorporates a single receive Radio Frequency (RF) chain for measurement collection, we present a user localization method with multiple RISs. The proposed method includes an initial step of direction estimation at each RIS, followed by maximum-likelihood position estimation, which is initialized with a least-squares line intersection technique. Our numerical results showcase the accuracy of the proposed localization and verify our theoretical estimation analysis.
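The least-squares line intersection used for initialization has a closed form: with each RIS at position a_i and an estimated unit bearing d_i toward the user, the point minimizing the summed squared distances to all bearing lines solves a small linear system. A sketch:

```python
import numpy as np

def intersect_lines(anchors, directions):
    """Point minimizing the sum of squared distances to lines a_i + s * d_i.

    anchors: (N, 3) RIS positions; directions: (N, 3) unit bearing vectors.
    Solves sum_i (I - d_i d_i^T) p = sum_i (I - d_i d_i^T) a_i.
    """
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for a, d in zip(anchors, directions):
        P = np.eye(3) - np.outer(d, d)   # projector orthogonal to the line
        A += P
        b += P @ a
    return np.linalg.solve(A, b)
```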
SmartBelt: A Wearable Microphone Array for Sound Source Localization with Haptic Feedback
Authors: Simon Michaud, Benjamin Moffett, Ana Tapia Rousiouk, Victoria Duda, François Grondin
Abstract
This paper introduces SmartBelt, a wearable microphone array on a belt that performs sound source localization and returns the direction of arrival with respect to the user's waist. One of the haptic motors on the belt then vibrates in the corresponding direction to provide useful feedback to the user. We also introduce a simple calibration step to adapt the belt to different waist sizes. Experiments confirm the accuracy of this wearable sound source localization system, showing a Mean Average Error (MAE) of 2.90 degrees and a correct haptic motor selection rate of 92.3%. These results suggest the device can provide useful haptic feedback; it will be evaluated in a study with people with hearing impairments.
Keyword: SLAM
RL-PGO: Reinforcement Learning-based Planar Pose-Graph Optimization
Keyword: Visual inertial
There is no result
Keyword: livox
There is no result
Keyword: loam
There is no result
Keyword: Visual inertial odometry
There is no result
Keyword: lidar
How much depth information can radar infer and contribute
Robust Self-Supervised LiDAR Odometry via Representative Structure Discovery and 3D Inherent Error Modeling
Meta-RangeSeg: LiDAR Sequence Semantic Segmentation Using Multiple Feature Aggregation
Globally Optimal Boresight Alignment of UAV-LiDAR Systems
Cyber Mobility Mirror: Deep Learning-based Real-time 3D Object Perception and Reconstruction Using Roadside LiDAR
Towards Class-agnostic Tracking Using Feature Decorrelation in Point Clouds
Joint Camera Intrinsic and LiDAR-Camera Extrinsic Calibration
ANTLER: Bayesian Nonlinear Tensor Learning and Modeler for Unstructured, Varying-Size Point Cloud Data
TEScalib: Targetless Extrinsic Self-Calibration of LiDAR and Stereo Camera for Automated Driving Vehicles with Uncertainty Analysis
Large-Scale 3D Semantic Reconstruction for Automated Driving Vehicles with Adaptive Truncated Signed Distance Function
Keyword: loop detection
There is no result
Keyword: autonomous driving
Photonic reinforcement learning based on optoelectronic reservoir computing
Attacks and Faults Injection in Self-Driving Agents on the Carla Simulator -- Experience Report
How much depth information can radar infer and contribute
Aggressive Racecar Drifting Control Using Onboard Cameras and Inertial Measurement Unit
Unsupervised Representation Learning for Point Clouds: A Survey
Joint Camera Intrinsic and LiDAR-Camera Extrinsic Calibration
Path-Aware Graph Attention for HD Maps in Motion Prediction
"If you could see me through my eyes": Predicting Pedestrian Perception
Keyword: mapping
Conformal capacity and polycircular domains
Multi-view Gradient Consistency for SVBRDF Estimation of Complex Scenes under Natural Illumination
Collision-free Path Planning on Arbitrary Optimization Criteria in the Latent Space through cGANs
Robust Self-Supervised LiDAR Odometry via Representative Structure Discovery and 3D Inherent Error Modeling
TC-Net: Triple Context Network for Automated Stroke Lesion Segmentation
SFIP: Coarse-Grained Syscall-Flow-Integrity Protection in Modern Systems
Large-Scale 3D Semantic Reconstruction for Automated Driving Vehicles with Adaptive Truncated Signed Distance Function
A Novel Viewport-Adaptive Motion Compensation Technique for Fisheye Video
Motion dynamics of inertial pair coupled via frictional interface
Keyword: localization
An End-to-End Transformer Model for Crowd Localization
How to Debug Inclusivity Bugs? A Debugging Process with Information Architecture
StrongSORT: Make DeepSORT Great Again
Localization via Multiple Reconfigurable Intelligent Surfaces Equipped with Single Receive RF Chains
SmartBelt: A Wearable Microphone Array for Sound Source Localization with Haptic Feedback