SAGE: SLAM with Appearance and Geometry Prior for Endoscopy
Authors: Xingtong Liu, Zhaoshuo Li, Masaru Ishii, Gregory D. Hager, Russell H. Taylor, Mathias Unberath
Abstract
In endoscopy, many applications (e.g., surgical navigation) would benefit from a real-time method that can simultaneously track the endoscope and reconstruct the dense 3D geometry of the observed anatomy from a monocular endoscopic video. To this end, we develop a Simultaneous Localization and Mapping (SLAM) system that combines learning-based appearance and optimizable geometry priors with factor graph optimization. The appearance and geometry priors are explicitly learned in an end-to-end differentiable training pipeline to master the task of pair-wise image alignment, one of the core components of the SLAM system. In our experiments, the proposed SLAM system robustly handles the challenges of texture scarceness and illumination variation that are commonly seen in endoscopy. The system generalizes well to unseen endoscopes and subjects and performs favorably compared with a state-of-the-art feature-based SLAM system. The code repository is available at https://github.com/lppllppl920/SAGE-SLAM.git.
Keyword: Visual inertial
There is no result
Keyword: livox
There is no result
Keyword: loam
There is no result
Keyword: Visual inertial odometry
There is no result
Keyword: lidar
LiDAR-guided Stereo Matching with a Spatial Consistency Constraint
Authors: Yongjun Zhang, Siyuan Zou, Xinyi Liu, Xu Huang, Yi Wan, Yongxiang Yao
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
The complementary fusion of light detection and ranging (LiDAR) data and image data is a promising but challenging task for generating high-precision and high-density point clouds. This study proposes an innovative approach called LiDAR-guided stereo matching (LGSM), which considers the spatial consistency represented by continuous disparity or depth changes in the homogeneous regions of an image. LGSM first detects the homogeneous pixels of each LiDAR projection point based on their color or intensity similarity. Next, we propose a riverbed enhancement function that optimizes the cost volume of the LiDAR projection points and their homogeneous pixels to improve matching robustness. Our formulation expands the constraint scope of the sparse LiDAR projection points with the guidance of image information, optimizing the cost volume of as many pixels as possible. We applied LGSM to semi-global matching and AD-Census on both simulated and real datasets. When the percentage of LiDAR points in the simulated datasets was 0.16%, the matching accuracy of our method reached the subpixel level, while that of the original stereo matching algorithm was 3.4 pixels. The experimental results show that LGSM is suitable for indoor, street, aerial, and satellite image datasets and transfers well across semi-global matching and AD-Census. Furthermore, qualitative and quantitative evaluations demonstrate that LGSM is superior to two state-of-the-art cost-volume optimization methods, especially in reducing mismatches in difficult matching areas and refining object boundaries.
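To make the guidance idea concrete, here is a minimal 1-D sketch of the two steps the abstract describes: detecting homogeneous pixels by intensity similarity, then carving a "riverbed" into the cost volume around the LiDAR-given disparity. All names, thresholds (`tau`, `alpha`, `sigma`), and the penalty shape are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def homogeneous_pixels(intensities, center, tau=10):
    """1-D stand-in for the color/intensity similarity test: pixels
    whose intensity is within tau of the LiDAR point's pixel."""
    return np.where(np.abs(intensities - intensities[center]) <= tau)[0]

def riverbed_penalty(cost_volume, pixels, d_lidar, alpha=5.0, sigma=2.0):
    """Leave the cost unchanged at the guided disparity d_lidar and
    raise it away from d_lidar for each homogeneous pixel, so the
    guided disparity becomes the cost minimum ('riverbed' shape).
    alpha and sigma are illustrative parameters."""
    d = np.arange(cost_volume.shape[1])
    bonus = alpha * (1.0 - np.exp(-((d - d_lidar) ** 2) / (2 * sigma ** 2)))
    out = cost_volume.copy()
    out[pixels] += bonus
    return out

intensities = np.array([100, 102, 101, 180, 99], dtype=float)
pix = homogeneous_pixels(intensities, center=0)   # pixel 3 is excluded
cv = np.zeros((5, 8))                             # 5 pixels, 8 disparities
cv2 = riverbed_penalty(cv, pix, d_lidar=3)
best = cv2[0].argmin()                            # guided pixel prefers d=3
```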
PCSCNet: Fast 3D Semantic Segmentation of LiDAR Point Cloud for Autonomous Car using Point Convolution and Sparse Convolution Network
Authors: Jaehyun Park, Chansoo Kim, Kichun Jo
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The autonomous car must recognize its driving environment quickly for safe driving. As the Light Detection and Ranging (LiDAR) sensor is widely used in autonomous cars, fast semantic segmentation of the LiDAR point cloud, i.e., point-wise classification within the sensor frame rate, has attracted attention for recognizing the driving environment. Although voxel- and fusion-based models currently represent the state of the art in point cloud semantic segmentation, their real-time performance suffers from the high computational load caused by high voxel resolution. In this paper, we propose a fast voxel-based semantic segmentation model using point convolution and 3D sparse convolution (PCSCNet). The proposed model is designed to perform well at both high and low voxel resolutions using point convolution-based feature extraction, and it accelerates feature propagation using 3D sparse convolution after feature extraction. The experimental results demonstrate that the proposed model outperforms state-of-the-art real-time models in semantic segmentation on SemanticKITTI and nuScenes, and achieves real-time performance in LiDAR point cloud inference.
Keyword: loop detection
There is no result
Keyword: autonomous driving
Going Deeper into Recognizing Actions in Dark Environments: A Comprehensive Benchmark Study
Abstract
While action recognition (AR) has seen large improvements with the introduction of large-scale video datasets and the development of deep neural networks, AR models robust to challenging environments in real-world scenarios are still under-explored. We focus on the task of action recognition in dark environments, which can be applied to fields such as surveillance and autonomous driving at night. Intuitively, current deep networks along with visual enhancement techniques should be able to handle AR in dark environments; however, it is observed that this is not always the case in practice. To dive deeper into exploring solutions for AR in dark environments, we launched the UG2+ Challenge Track 2 (UG2-2) at IEEE CVPR 2021, with the goal of evaluating and advancing the robustness of AR models in dark environments. The challenge builds and expands on the novel ARID dataset, the first dataset for dark video AR, and guides models to tackle the task in both fully and semi-supervised manners. Baseline results utilizing current AR models and enhancement methods are reported, justifying the challenging nature of this task and showing substantial room for improvement. Thanks to active participation from the research community, notable advances have been made in participants' solutions, and analysis of these solutions has helped identify possible directions for tackling the challenge of AR in dark environments.
Multi-task Safe Reinforcement Learning for Navigating Intersections in Dense Traffic
Abstract
Multi-task intersection navigation, including unprotected left turns, right turns, and going straight in dense traffic, is still a challenging task for autonomous driving. For a human driver, the skill of negotiating with other interactive vehicles is the key to guaranteeing safety and efficiency, but it is hard to balance safety and efficiency for an autonomous vehicle performing multi-task intersection navigation. In this paper, we formulate a multi-task safe reinforcement learning framework with social attention to improve safety and efficiency when interacting with other traffic participants. Specifically, the social attention module is used to focus on the states of negotiating vehicles, and a safety layer is added to the multi-task reinforcement learning framework to guarantee safe negotiation. We compare experiments in the SUMO simulator with abundant traffic flows and in CARLA with high-fidelity vehicle models; both show that the proposed algorithm improves safety with consistent traffic efficiency for multi-task intersection navigation.
Adaptive Safe Merging Control for Heterogeneous Autonomous Vehicles using Parametric Control Barrier Functions
Abstract
With the increasing emphasis on safe autonomy for robots, model-based safe control approaches such as Control Barrier Functions (CBFs) have been extensively studied to ensure guaranteed safety during inter-robot interactions. In this paper, we introduce the Parametric Control Barrier Function (Parametric-CBF), a novel variant of the traditional CBF that extends its expressivity in describing different safe behaviors among heterogeneous robots. Instead of assuming cooperative and homogeneous robots using the same safe controllers, the ego robot models the neighboring robots' underlying safe controllers through different Parametric-CBFs fitted to observed data. Given the learned Parametric-CBFs and their proven forward invariance, the ego robot gains greater flexibility to coordinate with other heterogeneous robots with improved efficiency while enjoying formally provable safety guarantees. We demonstrate the use of Parametric-CBFs for behavior prediction and adaptive safe control in a ramp merging scenario drawn from autonomous driving. Compared to the traditional CBF, the Parametric-CBF has the advantage of capturing varying driver characteristics through a richer description of robot behavior in the context of safe control. Numerical simulations validate the effectiveness of the proposed method.
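For readers unfamiliar with CBFs, the toy below shows the core safety-filter mechanism that Parametric-CBFs build on: enforcing grad_h(x) @ u >= -alpha * h(x) for a single-integrator robot by projecting the desired input (a one-constraint QP with a closed-form solution). The dynamics, barrier function, and gain here are illustrative choices; the paper's Parametric-CBF additionally learns barrier parameters from observed data.

```python
import numpy as np

def cbf_filter(u_des, grad_h, h, alpha=1.0):
    """Minimal CBF safety filter for a single integrator x' = u:
    enforce grad_h @ u >= -alpha * h by projecting the desired input
    onto the constraint (closed-form one-constraint QP solution)."""
    a, b = grad_h, -alpha * h
    if a @ u_des >= b:
        return u_des                                  # already safe
    return u_des + (b - a @ u_des) * a / (a @ a)      # minimal correction

# Safe set: stay at distance >= 1 from the origin, h(x) = ||x||^2 - 1.
x = np.array([2.0, 0.0])
h = x @ x - 1.0
grad_h = 2 * x
u = cbf_filter(np.array([-3.0, 0.0]), grad_h, h)  # desired: rush at obstacle
```

The filtered input slows the approach exactly enough that the barrier condition holds with equality, which is how CBF-QPs remain minimally invasive.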
Vision-based Autonomous Driving for Unstructured Environments Using Imitation Learning
Abstract
Unstructured environments are difficult for autonomous driving because various unknown obstacles lie in the drivable space, which has no lanes and whose width and curvature vary widely. In such complex environments, searching for a path in real time is difficult, and inaccurate localization data reduce path tracking accuracy, increasing the risk of collision. Instead of searching for and tracking a path, an alternative approach reactively avoids obstacles in real time. Some methods track a global path while avoiding obstacles using candidate paths or an artificial potential field; however, they require heuristics to find specific parameters for handling various complex environments, and accurately tracking the global path is difficult in practice because of inaccurate localization data. If the drivable space is not accurately recognized (i.e., a noisy state), the vehicle may not drive smoothly or may collide with obstacles. In this study, we propose a method in which the vehicle drives toward the drivable space using only a vision-based occupancy grid map. The proposed method uses imitation learning, where a deep neural network is trained with expert driving data. The network can learn driving patterns suited to various complex and noisy situations because these situations are contained in the training data. Experiments with a vehicle in actual parking lots demonstrate the limitations of general model-based methods and the effectiveness of the proposed imitation learning method.
Jerk Constrained Velocity Planning for an Autonomous Vehicle: Linear Programming Approach
Abstract
Velocity planning for self-driving vehicles in a complex environment is one of the most challenging tasks. It must satisfy three requirements: safety with regard to collisions, respect for the maximum velocity limits defined by traffic rules, and passenger comfort. Achieving these goals requires accounting for jerk and dynamic obstacles, which renders the problem non-convex. In this paper, we propose a linear programming (LP) based velocity planning method with jerk limits and obstacle avoidance constraints for an autonomous driving system. To confirm the efficiency of the proposed method, we compare it with several optimization-based approaches and show that our method generates a velocity profile that satisfies the aforementioned requirements more efficiently than the compared methods. In addition, we tested our algorithm on a real vehicle at a test field to validate its effectiveness.
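The LP idea can be sketched with a toy discretization: maximize total velocity subject to velocity bounds, an acceleration limit on the first difference, and a jerk limit on the second difference. This illustrative version uses SciPy's `linprog`, omits the paper's obstacle constraints, and all limits are made-up numbers.

```python
import numpy as np
from scipy.optimize import linprog

n, dt = 6, 1.0                       # 6 time steps, 1 s apart (toy values)
v_max, a_max, j_max = 5.0, 1.0, 0.5  # illustrative limits

c = -np.ones(n)                      # minimize -sum(v) = maximize total speed
rows, rhs = [], []
for i in range(n - 1):               # |v[i+1] - v[i]| <= a_max * dt
    for sign in (1.0, -1.0):
        row = np.zeros(n)
        row[i], row[i + 1] = -sign, sign
        rows.append(row)
        rhs.append(a_max * dt)
for i in range(n - 2):               # |v[i+2] - 2v[i+1] + v[i]| <= j_max*dt^2
    for sign in (1.0, -1.0):
        row = np.zeros(n)
        row[i], row[i + 1], row[i + 2] = sign, -2 * sign, sign
        rows.append(row)
        rhs.append(j_max * dt ** 2)
A_eq = np.zeros((1, n))
A_eq[0, 0] = 1.0                     # start from rest: v[0] = 0
res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs),
              A_eq=A_eq, b_eq=[0.0], bounds=(0.0, v_max))
v = res.x                            # jerk- and acceleration-feasible profile
```

Because both constraints are linear in the decision variables, the problem stays an LP, which is the efficiency argument the abstract makes against generic non-convex solvers.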
Multi-Task Conditional Imitation Learning for Autonomous Navigation at Crowded Intersections
Authors: Zeyu Zhu, Huijing Zhao
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Abstract
In recent years, great efforts have been devoted to deep imitation learning for autonomous driving control, where raw sensory inputs are directly mapped to control actions. However, navigating through densely populated intersections remains challenging due to the uncertain behavior of surrounding traffic participants. We focus on autonomous navigation at crowded intersections that require interaction with pedestrians. A multi-task conditional imitation learning framework is proposed to adapt both lateral and longitudinal control tasks for safe and efficient interaction. A new benchmark called IntersectNav is developed, and human demonstrations are provided. Empirical results show that the proposed method achieves a success rate gain of up to 30% over the state of the art.
Keyword: mapping
Geometric Algebra based Embeddings for Static and Temporal Knowledge Graph Completion
Abstract
In recent years, Knowledge Graph Embeddings (KGEs) have shown promising performance on link prediction tasks by mapping the entities and relations of a Knowledge Graph (KG) into a geometric space, and have thus gained increasing attention. In addition, many recent Knowledge Graphs involve evolving data; e.g., the fact (\textit{Obama}, \textit{PresidentOf}, \textit{USA}) is valid only from 2009 to 2017. This introduces important challenges for knowledge representation learning, since such temporal KGs change over time. In this work, we strive to move beyond the complex or hypercomplex space for KGE and propose a novel geometric algebra based embedding approach, GeomE, which uses multivector representations and the geometric product to model entities and relations. GeomE subsumes several state-of-the-art KGE models and is able to model diverse relation patterns. On top of this, we extend GeomE to TGeomE for temporal KGE, which performs 4th-order tensor factorization of a temporal KG and devises a new linear temporal regularization for time representation learning. Moreover, we study the effect of time granularity on the performance of TGeomE models. Experimental results show that our proposed models achieve state-of-the-art performance on link prediction over four commonly-used static KG datasets and four well-established temporal KG datasets across various metrics.
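The geometric product the abstract relies on can be written out explicitly for the 2-D Clifford algebra Cl(2,0), where a multivector has one scalar, two vector, and one bivector component. This toy shows only the algebraic operation; it is not GeomE's actual scoring function or embedding dimensionality.

```python
import numpy as np

def geometric_product(a, b):
    """Geometric (Clifford) product in Cl(2,0), with multivectors stored
    as [scalar, e1, e2, e12] and basis rules e1*e1 = e2*e2 = 1,
    e1*e2 = -e2*e1 = e12, e12*e12 = -1."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return np.array([
        a0*b0 + a1*b1 + a2*b2 - a3*b3,   # scalar part
        a0*b1 + a1*b0 - a2*b3 + a3*b2,   # e1 part
        a0*b2 + a2*b0 + a1*b3 - a3*b1,   # e2 part
        a0*b3 + a3*b0 + a1*b2 - a2*b1,   # e12 (bivector) part
    ])

e1 = np.array([0., 1., 0., 0.])
e2 = np.array([0., 0., 1., 0.])
e12 = geometric_product(e1, e2)    # e1*e2 gives the bivector e12
sq = geometric_product(e12, e12)   # e12 squares to -1
```

The non-commutativity visible in the e12 component (a1*b2 - a2*b1) is what lets multivector embeddings model asymmetric relation patterns that purely commutative scoring functions cannot.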
Automated Attack Synthesis by Extracting Finite State Machines from Protocol Specification Documents
Authors: Maria Leonor Pacheco, Max von Hippel, Ben Weintraub, Dan Goldwasser, Cristina Nita-Rotaru
Subjects: Cryptography and Security (cs.CR); Formal Languages and Automata Theory (cs.FL); Machine Learning (cs.LG)
Abstract
Automated attack discovery techniques, such as attacker synthesis or model-based fuzzing, provide powerful ways to ensure network protocols operate correctly and securely. Such techniques, in general, require a formal representation of the protocol, often in the form of a finite state machine (FSM). Unfortunately, many protocols are only described in English prose, and implementing even a simple network protocol as an FSM is time-consuming and prone to subtle logical errors. Automatically extracting protocol FSMs from documentation can significantly contribute to increased use of these techniques and result in more robust and secure protocol implementations. In this work we focus on attacker synthesis as a representative technique for protocol security, and on RFCs as a representative format for protocol prose description. Unlike other works that rely on rule-based approaches or use off-the-shelf NLP tools directly, we suggest a data-driven approach for extracting FSMs from RFC documents. Specifically, we use a hybrid approach consisting of three key steps: (1) large-scale word-representation learning for technical language, (2) focused zero-shot learning for mapping protocol text to a protocol-independent information language, and (3) rule-based mapping from protocol-independent information to a specific protocol FSM. We show the generalizability of our FSM extraction by using the RFCs for six different protocols: BGPv4, DCCP, LTP, PPTP, SCTP and TCP. We demonstrate how automated extraction of an FSM from an RFC can be applied to the synthesis of attacks, with TCP and DCCP as case-studies. Our approach shows that it is possible to automate attacker synthesis against protocols by using textual specifications such as RFCs.
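Step (3), the rule-based mapping from protocol-independent information to a concrete FSM, can be illustrated by assembling extracted (state, event, next-state) triples into a transition table. The triples below are a hypothetical TCP-like fragment chosen for illustration, not output of the paper's pipeline.

```python
def build_fsm(transitions):
    """Assemble a deterministic FSM table from (state, event, next_state)
    triples, the kind of protocol-independent information extracted from
    an RFC. Conflicting triples indicate an extraction error."""
    fsm = {}
    for state, event, nxt in transitions:
        key = (state, event)
        if key in fsm and fsm[key] != nxt:
            raise ValueError(f"non-deterministic transition {key}")
        fsm[key] = nxt
    return fsm

def run(fsm, start, events):
    """Replay an event sequence through the FSM."""
    state = start
    for e in events:
        state = fsm[(state, e)]
    return state

# Hypothetical handshake fragment (not taken from any specific RFC)
triples = [("CLOSED", "passive_open", "LISTEN"),
           ("LISTEN", "recv_syn", "SYN_RCVD"),
           ("SYN_RCVD", "recv_ack", "ESTABLISHED")]
fsm = build_fsm(triples)
final = run(fsm, "CLOSED", ["passive_open", "recv_syn", "recv_ack"])
```

An attacker-synthesis tool would then search this explicit transition table for event sequences that drive the protocol into unsafe states.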
SAGE: SLAM with Appearance and Geometry Prior for Endoscopy
Authors: Xingtong Liu, Zhaoshuo Li, Masaru Ishii, Gregory D. Hager, Russell H. Taylor, Mathias Unberath
Abstract
In endoscopy, many applications (e.g., surgical navigation) would benefit from a real-time method that can simultaneously track the endoscope and reconstruct the dense 3D geometry of the observed anatomy from a monocular endoscopic video. To this end, we develop a Simultaneous Localization and Mapping (SLAM) system that combines learning-based appearance and optimizable geometry priors with factor graph optimization. The appearance and geometry priors are explicitly learned in an end-to-end differentiable training pipeline to master the task of pair-wise image alignment, one of the core components of the SLAM system. In our experiments, the proposed SLAM system robustly handles the challenges of texture scarceness and illumination variation that are commonly seen in endoscopy. The system generalizes well to unseen endoscopes and subjects and performs favorably compared with a state-of-the-art feature-based SLAM system. The code repository is available at https://github.com/lppllppl920/SAGE-SLAM.git.
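Pair-wise image alignment, the core task the priors are trained on, reduces in a toy 1-D setting to searching for the transformation that minimizes a feature reprojection error. The sketch below searches only integer shifts over hand-made feature vectors; the actual system optimizes full camera pose over learned dense feature maps.

```python
import numpy as np

def align_shift(feat_a, feat_b, max_shift=3):
    """Brute-force 1-D stand-in for pair-wise image alignment: find the
    integer shift s minimizing the mean squared error between feat_a
    shifted by s and feat_b on their overlapping region."""
    best, best_err = 0, np.inf
    for s in range(-max_shift, max_shift + 1):
        a = feat_a[max(0, s):len(feat_a) + min(0, s)]
        b = feat_b[max(0, -s):len(feat_b) + min(0, -s)]
        err = np.mean((a - b) ** 2)
        if err < best_err:
            best, best_err = s, err
    return best

f = np.array([0., 1., 4., 1., 0., 0., 0.])  # toy feature profile
g = np.roll(f, 2)                           # second "frame": f shifted by 2
shift = align_shift(g, f)                   # recovers the shift of 2
```

Learned dense features, rather than raw pixel intensities, are what make this kind of alignment robust to the texture scarcity and illumination changes the abstract mentions.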
Function-valued RKHS-based Operator Learning for Differential Equations
Abstract
Recently, a stream of works has sought to solve families of partial differential equations by casting the problem as computing the inverse operator map between the input and solution spaces. Toward this end, we incorporate function-valued reproducing kernel Hilbert spaces into our operator learning model and show that the approximate solution of the target operator has a special form. With an appropriate kernel and growing data, the approximate solution converges to the exact one. We then propose a neural network architecture based on this special form. We perform various experiments and show that the proposed architecture achieves desirable accuracy on linear and non-linear partial differential equations, even with a small amount of data. By learning the mappings between function spaces, the proposed method can find the solution for a high-resolution input after learning from lower-resolution data.
Implementing Boolean Functions with switching lattice networks
Abstract
The four-terminal switching network is an alternative structure for realizing logic functions in electronic circuit modeling. This network can implement a Boolean function with fewer switches than two-terminal CMOS switches. Each switch of the network is driven by a Boolean literal: a switch is connected to its four neighbors if its literal takes the value 1, and disconnected otherwise. In our work, we aimed to develop a technique for determining whether a given Boolean function can be implemented with a given four-terminal network, based on the paths of the lattice network. First, we developed a synthesis tool that creates a library of Boolean functions from a given four-terminal switching network and random Boolean literals; this tool can also check the output of any lattice network, functioning as a lattice network solver. Next, we used the library functions to develop and test our MAPPING tool, which takes functions as input and outputs their implementations as four-terminal lattice networks. Finally, we propose a systematic procedure to implement any Boolean function efficiently with a given type of lattice network.
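The path-based evaluation the abstract relies on can be made concrete: a four-terminal lattice outputs 1 exactly when the closed switches (literals evaluating to 1) form a 4-neighbor path from the top edge to the bottom edge. The 2x2 literal assignment below is a toy example chosen here, not one from the paper.

```python
from collections import deque

def lattice_output(grid):
    """Evaluate a four-terminal switching lattice: output 1 iff closed
    switches (cells equal to 1) connect the top edge to the bottom edge
    through 4-neighbor adjacency (breadth-first search)."""
    rows, cols = len(grid), len(grid[0])
    q = deque((0, c) for c in range(cols) if grid[0][c])
    seen = set(q)
    while q:
        r, c = q.popleft()
        if r == rows - 1:
            return 1                      # reached the bottom edge
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                if grid[nr][nc] and (nr, nc) not in seen:
                    seen.add((nr, nc))
                    q.append((nr, nc))
    return 0                              # no top-to-bottom path of 1s

def f(a, b):
    """Toy 2x2 lattice with literal layout [[a, b], [b, a]]; checking all
    four input pairs shows this layout realizes f = a AND b."""
    return lattice_output([[a, b], [b, a]])
```

A solver like the abstract's enumerates such path conditions to recover the Boolean function a given lattice implements.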
A Probabilistic Programming Idiom for Active Knowledge Search
Authors: Malte R. Damgaard, Rasmus Pedersen, Thomas Bak
Abstract
In this paper, we derive and implement a probabilistic programming idiom for the problem of acquiring new knowledge about an environment. The idiom is implemented utilizing a modern probabilistic programming language. We demonstrate the utility of this idiom by implementing an algorithm for the specific problem of active mapping and robot exploration. Finally, we evaluate the functionality of the implementation through an extensive simulation study utilizing the HouseExpo dataset.
Confidence-rich Localization and Mapping based on Particle Filter for Robotic Exploration
Authors: Yang Xu, Ronghao Zheng, Senlin Zhang, Meiqin Liu
Abstract
This paper studies information-theoretic exploration in an environmental representation with dense belief, considering pose uncertainty for range-sensing robots. Previous works mostly address active mapping/exploration with known poses or utilize inaccurate information metrics, resulting in imbalanced exploration. This motivates us to extend the confidence-rich mutual information (CRMI) with measurable pose uncertainty. Specifically, we propose a Rao-Blackwellized particle filter-based confidence-rich localization and mapping (RBPF-CRLM) scheme with a new closed-form weighting method. We further compute the uncertain CRMI (UCRMI) with the weighted particles by a more accurate approximation. Simulations and experimental evaluations show the localization accuracy and exploration performance of the proposed methods in unstructured environments.
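The particle weighting at the heart of any RBPF scheme can be sketched generically: multiply each particle's weight by its measurement likelihood, renormalize, and monitor the effective sample size to decide when to resample. The paper's closed-form CRMI-specific weighting is not reproduced here; the likelihood values below are made up.

```python
import numpy as np

def update_weights(weights, likelihoods):
    """One weighting step of a (Rao-Blackwellized) particle filter:
    multiply prior weights by per-particle measurement likelihoods
    and renormalize so they sum to one."""
    w = weights * likelihoods
    return w / w.sum()

def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2); a low value signals weight degeneracy
    and is the usual trigger for resampling."""
    return 1.0 / np.sum(weights ** 2)

w = np.full(4, 0.25)                    # four equally weighted pose particles
lik = np.array([0.9, 0.1, 0.1, 0.1])    # one particle fits the range scan
w = update_weights(w, lik)
ess = effective_sample_size(w)          # well below 4: resampling warranted
```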
Teaching Drones on the Fly: Can Emotional Feedback Serve as Learning Signal for Training Artificial Agents?
Authors: Manuela Pollak, Andrea Salfinger, Karin Anna Hummel
Abstract
We investigate whether \emph{naturalistic emotional human feedback} can be directly exploited as a \emph{reward signal} for training artificial agents via \emph{interactive human-in-the-loop reinforcement learning}. To answer this question, we devise an experimental setting inspired by animal training, in which human test subjects interactively teach an emulated drone agent their desired command-action mapping by providing emotional feedback on the drone's action selections. We present a first empirical proof-of-concept study and analysis confirming that human facial emotion expression can be directly exploited as a reward signal in such interactive learning settings. Thereby, we contribute empirical findings towards more naturalistic and intuitive forms of reinforcement learning especially designed for non-expert users.
Analytic continuation from limited noisy Matsubara data
Abstract
This note proposes a new algorithm for analytic continuation from limited noisy Matsubara data. We consider both the molecule and condensed matter cases. In both cases, the algorithm constructs an accurate interpolant of the Matsubara data and uses conformal mapping and Prony's method. Numerical results are provided to demonstrate the performance of the algorithm.
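Prony's method, one of the two tools the note combines, recovers exponential bases from uniform samples by solving a linear-prediction system and rooting its characteristic polynomial. Below is a minimal noise-free sketch; the conformal-mapping step and the Matsubara setting itself are omitted.

```python
import numpy as np

def prony_modes(x, p):
    """Classic Prony step: recover the p exponential bases r_k of
    x[n] = sum_k c_k * r_k**n from uniform samples. Sets up the linear
    prediction x[n] = -(a_1 x[n-1] + ... + a_p x[n-p]), solves it by
    least squares, and roots the characteristic polynomial."""
    A = np.column_stack([x[p - 1 - j:len(x) - 1 - j] for j in range(p)])
    b = -x[p:]
    a, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.roots(np.concatenate(([1.0], a)))

n = np.arange(12)
x = 2.0 * 0.5 ** n + 1.0 * 0.8 ** n   # toy two-exponential signal
r = prony_modes(x, 2)                 # bases ~0.5 and ~0.8
```

With noisy data this plain least-squares version degrades quickly, which is why methods like the note's combine Prony with a stabilizing interpolation step.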
SOInter: A Novel Deep Energy Based Interpretation Method for Explaining Structured Output Models
Authors: S. Fatemeh Seyyedsalehi, Mahdieh Soleymani, Hamid R. Rabiee
Abstract
We propose a novel interpretation technique to explain the behavior of structured output models, which simultaneously learn mappings from an input vector to a set of output variables. Because of the complex relationships among the computational paths of output variables in structured models, a feature can affect an output's value through other outputs. We focus on one of the outputs as the target and try to find the most important features the structured model uses to decide on the target in each locality of the input space. In this paper, we assume an arbitrary structured output model is available as a black box and argue how considering the correlations between output variables can improve explanation performance. The goal is to train a function over the input space as an interpreter for the target output variable. We introduce an energy-based training process for the interpreter function, which effectively considers the structural information incorporated into the model to be explained. The effectiveness of the proposed method is confirmed using a variety of simulated and real data sets.
SRL-SOA: Self-Representation Learning with Sparse 1D-Operational Autoencoder for Hyperspectral Image Band Selection
Authors: Mete Ahishali, Serkan Kiranyaz, Iftikhar Ahmad, Moncef Gabbouj
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Band selection in hyperspectral image (HSI) data processing is an important task considering its effect on computational complexity and accuracy. In this work, we propose a novel framework for the band selection problem: Self-Representation Learning (SRL) with a Sparse 1D-Operational Autoencoder (SOA). The proposed SRL-SOA approach introduces a novel autoencoder model, SOA, designed to learn a representation domain where the data are sparsely represented. Moreover, the network is composed of 1D-operational layers with a non-linear neuron model; hence, the learning capability of the neurons (filters) is greatly improved even with shallow architectures. Using compact architectures is especially crucial in autoencoders, as they tend to overfit easily because of their identity mapping objective. Overall, we show that the proposed SRL-SOA band selection approach outperforms competing methods on two HSI datasets, Indian Pines and Salinas-A, in terms of the achieved land cover classification accuracies. The software implementation of the SRL-SOA approach is shared publicly at https://github.com/meteahishali/SRL-SOA.
Guided Visual Attention Model Based on Interactions Between Top-down and Bottom-up Information for Robot Pose Prediction
Abstract
Learning to control a robot commonly requires mapping between robot states and camera images, where conventional deep vision models require large training datasets. Existing visual attention models, such as Deep Spatial Autoencoders, have improved data efficiency by training the model to selectively extract only the task-relevant image area. However, since these models cannot select attention targets on demand, the diversity of trainable tasks is limited. This paper proposes a novel Key-Query-Value formulated visual attention model that can be guided to a given attention target. The model creates an attention heatmap from Key and Query, and selectively extracts the attended data represented in Value. This structure can incorporate external inputs to create the Query, which is trained to represent the target objects. Separating Query creation improves the model's flexibility, enabling it to simultaneously obtain and switch between multiple targets in a top-down manner. The proposed model is evaluated in a simulator and a real-world environment, showing better performance than existing end-to-end robot vision models. The results of the real-world experiments indicate the model's high scalability and extendibility on robot control tasks.
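The Key-Query-Value mechanism the abstract describes reduces to a few lines: the Query scores each spatial location's Key, a softmax turns the scores into an attention heatmap, and the heatmap selects a weighted read-out of Value. The feature vectors below are hand-made; the real model learns Key/Value from images and builds the Query from an external target signal.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())          # shift for numerical stability
    return e / e.sum()

def guided_attention(key, query, value):
    """Key-Query-Value attention over N spatial locations: the Query
    scores the Keys, the softmax forms the attention heatmap, and the
    heatmap weights the Values into a single read-out."""
    scores = key @ query / np.sqrt(len(query))   # (N,) scaled dot products
    heat = softmax(scores)                       # attention heatmap
    return heat, heat @ value                    # heatmap, attended Value

key = np.array([[1., 0.], [0., 1.], [0., 0.]])   # features at 3 locations
value = np.array([[10.], [20.], [30.]])          # data stored per location
query = np.array([0., 5.])                        # target: match location 1
heat, out = guided_attention(key, query, value)  # heatmap peaks at index 1
```

Swapping in a different Query redirects the heatmap without retraining, which is the top-down target switching the abstract emphasizes.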
A self-adaptive RIS that estimates and shapes fading rich-scattering wireless channels
Authors: Chloé Saigre-Tardif, Philipp del Hougne
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP); Applied Physics (physics.app-ph)
Abstract
We present a framework for operating a self-adaptive RIS inside a fading rich-scattering wireless environment. We model the rich-scattering wireless channel as being double-parametrized by (i) the RIS, and (ii) dynamic perturbers (moving objects, etc.). Within each coherence time, first, the self-adaptive RIS estimates the status of the dynamic perturbers (e.g., the perturbers' orientations and locations) based on measurements with an auxiliary wireless channel. Then, second, using a learned surrogate forward model of the mapping from RIS configuration and perturber status to wireless channel, an optimized RIS configuration to achieve a desired functionality is obtained. We demonstrate our technique using a physics-based end-to-end model of RIS-parametrized communication with adjustable fading (PhysFad) for the example objective of maximizing the received signal strength indicator. Our results present a route toward convergence of RIS-empowered localization and sensing with RIS-empowered channel shaping beyond the simple case of operation in free space without fading.
GenStore: A High-Performance and Energy-Efficient In-Storage Computing System for Genome Sequence Analysis
Authors: Nika Mansouri Ghiasi, Jisung Park, Harun Mustafa, Jeremie Kim, Ataberk Olgun, Arvid Gollwitzer, Damla Senol Cali, Can Firtina, Haiyu Mao, Nour Almadhoun Alserr, Rachata Ausavarungnirun, Nandita Vijaykumar, Mohammed Alser, Onur Mutlu
Subjects: Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC); Operating Systems (cs.OS); Genomics (q-bio.GN)
Abstract
Read mapping is a fundamental, yet computationally-expensive step in many genomics applications. It is used to identify potential matches and differences between fragments (called reads) of a sequenced genome and an already known genome (called a reference genome). To address the computational challenges in genome analysis, many prior works propose various approaches such as filters that select the reads that must undergo expensive computation, efficient heuristics, and hardware acceleration. While effective at reducing the computation overhead, all such approaches still require the costly movement of a large amount of data from storage to the rest of the system, which can significantly lower the end-to-end performance of read mapping in conventional and emerging genomics systems. We propose GenStore, the first in-storage processing system designed for genome sequence analysis that greatly reduces both data movement and computational overheads of genome sequence analysis by exploiting low-cost and accurate in-storage filters. GenStore leverages hardware/software co-design to address the challenges of in-storage processing, supporting reads with 1) different read lengths and error rates, and 2) different degrees of genetic variation. Through rigorous analysis of read mapping processes, we meticulously design low-cost hardware accelerators and data/computation flows inside a NAND flash-based SSD. Our evaluation using a wide range of real genomic datasets shows that GenStore, when implemented in three modern SSDs, significantly improves the read mapping performance of state-of-the-art software (hardware) baselines by 2.07-6.05$\times$ (1.52-3.32$\times$) for read sets with high similarity to the reference genome and 1.45-33.63$\times$ (2.70-19.2$\times$) for read sets with low similarity to the reference genome.
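The filtering idea can be illustrated with a naive k-mer membership test that decides whether a read should undergo expensive alignment at all. This is only a conceptual stand-in chosen here: GenStore's actual filters are hardware data structures inside the SSD, and the k-mer length and hit threshold below are arbitrary.

```python
def kmer_index(reference, k=4):
    """Index all k-mers of a reference genome (toy stand-in for an
    in-storage filtering structure)."""
    return {reference[i:i + k] for i in range(len(reference) - k + 1)}

def needs_expensive_mapping(read, index, k=4, min_hits=2):
    """Cheap filter: a read sharing fewer than min_hits k-mers with the
    reference is unlikely to map, so it can skip costly alignment and
    never leave storage. Thresholds here are illustrative."""
    hits = sum(read[i:i + k] in index for i in range(len(read) - k + 1))
    return hits >= min_hits

ref = "ACGTACGTGGTT"
idx = kmer_index(ref)
keep = needs_expensive_mapping("ACGTACGT", idx)   # similar read: align it
drop = needs_expensive_mapping("TTTTAAAA", idx)   # dissimilar read: filter out
```

Running such a test near the data is what eliminates the storage-to-host transfer for filtered reads, which is the source of GenStore's end-to-end speedup.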
Keyword: localization
SAGE: SLAM with Appearance and Geometry Prior for Endoscopy
Authors: Xingtong Liu, Zhaoshuo Li, Masaru Ishii, Gregory D. Hager, Russell H. Taylor, Mathias Unberath
Abstract
In endoscopy, many applications (e.g., surgical navigation) would benefit from a real-time method that can simultaneously track the endoscope and reconstruct the dense 3D geometry of the observed anatomy from a monocular endoscopic video. To this end, we develop a Simultaneous Localization and Mapping (SLAM) system that combines learning-based appearance and optimizable geometry priors with factor graph optimization. The appearance and geometry priors are explicitly learned in an end-to-end differentiable training pipeline to master the task of pair-wise image alignment, one of the core components of the SLAM system. In our experiments, the proposed SLAM system robustly handles the challenges of texture scarceness and illumination variation that are commonly seen in endoscopy. The system generalizes well to unseen endoscopes and subjects and performs favorably compared with a state-of-the-art feature-based SLAM system. The code repository is available at https://github.com/lppllppl920/SAGE-SLAM.git.
Confidence-rich Localization and Mapping based on Particle Filter for Robotic Exploration
Authors: Yang Xu, Ronghao Zheng, Senlin Zhang, Meiqin Liu
Abstract
This paper studies information-theoretic exploration in an environmental representation with dense belief, considering pose uncertainty for range-sensing robots. Previous works focus on active mapping/exploration with known poses or utilize inaccurate information metrics, resulting in imbalanced exploration. This motivates us to extend the confidence-rich mutual information (CRMI) with measurable pose uncertainty. Specifically, we propose a Rao-Blackwellized particle filter-based confidence-rich localization and mapping (RBPF-CRLM) scheme with a new closed-form weighting method. We further compute the uncertain CRMI (UCRMI) with the weighted particles using a more accurate approximation. Simulations and experimental evaluations show the localization accuracy and exploration performance of the proposed methods in unstructured environments.
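The particle-filter machinery underlying an RBPF scheme like the one above can be sketched generically. This is a minimal, standard importance-weighting step (1D poses, Gaussian range likelihood), not the paper's new closed-form weighting method; all names and numbers are illustrative:

```python
import math

# Minimal Rao-Blackwellized particle filter skeleton for localization:
# each particle carries a pose hypothesis (here a 1D position) whose weight
# is updated from a range measurement; the per-particle map update that an
# RBPF would also perform is omitted.

def update_weights(particles, weights, measurement, landmark=10.0, sigma=0.5):
    """Reweight particles by the Gaussian likelihood of a range measurement."""
    new_weights = []
    for p, w in zip(particles, weights):
        expected = abs(landmark - p)           # expected range to the landmark
        lik = math.exp(-0.5 * ((measurement - expected) / sigma) ** 2)
        new_weights.append(w * lik)
    total = sum(new_weights)
    return [w / total for w in new_weights]    # normalise to sum to 1

particles = [2.0, 4.0, 6.0]
weights = [1 / 3] * 3
weights = update_weights(particles, weights, measurement=6.0)  # true pose ~4
best = particles[weights.index(max(weights))]
```

The spread of the normalized weights is also what lets such a filter quantify the pose uncertainty that the UCRMI metric then feeds on.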
Doubly Robust Distributionally Robust Off-Policy Evaluation and Learning
Abstract
Off-policy evaluation and learning (OPE/L) use offline observational data to make better decisions, which is crucial in applications where experimentation is necessarily limited. OPE/L is nonetheless sensitive to discrepancies between the data-generating environment and that where policies are deployed. Recent work proposed distributionally robust OPE/L (DROPE/L) to remedy this, but the proposal relies on inverse-propensity weighting, whose regret rates may deteriorate if propensities are estimated and whose variance is suboptimal even if not. For vanilla OPE/L, this is solved by doubly robust (DR) methods, but they do not naturally extend to the more complex DROPE/L, which involves a worst-case expectation. In this paper, we propose the first DR algorithms for DROPE/L with KL-divergence uncertainty sets. For evaluation, we propose Localized Doubly Robust DROPE (LDR$^2$OPE) and prove its semiparametric efficiency under weak product rate conditions. Notably, thanks to a localization technique, LDR$^2$OPE only requires fitting a small number of regressions, just like DR methods for vanilla OPE. For learning, we propose Continuum Doubly Robust DROPL (CDR$^2$OPL) and show that, under a product rate condition involving a continuum of regressions, it enjoys a fast regret rate of $\mathcal{O}(N^{-1/2})$ even when unknown propensities are nonparametrically estimated. We further extend our results to general $f$-divergence uncertainty sets. We illustrate the advantage of our algorithms in simulations.
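The "worst-case expectation over a KL-divergence uncertainty set" that makes DROPE/L hard has a well-known dual form: $\inf_{Q:\,\mathrm{KL}(Q\|P)\le\delta} \mathbb{E}_Q[X] = \sup_{\eta>0}\,\{-\eta \log \mathbb{E}_P[e^{-X/\eta}] - \eta\delta\}$. A sketch evaluating that dual on a grid (our own toy illustration, not the paper's estimators):

```python
import math

# Worst-case mean of rewards X over distributions Q with KL(Q || P) <= delta,
# computed via the standard one-dimensional dual:
#   sup_{eta > 0}  -eta * log E_P[exp(-X / eta)] - eta * delta.
# A coarse grid over eta gives a lower bound on the supremum.

def kl_dro_value(rewards, probs, delta, etas=None):
    """Worst-case expected reward under a KL ball of radius delta around P."""
    if etas is None:
        etas = [0.05 * i for i in range(1, 400)]
    best = -float("inf")
    for eta in etas:
        mgf = sum(p * math.exp(-x / eta) for x, p in zip(rewards, probs))
        best = max(best, -eta * math.log(mgf) - eta * delta)
    return best

rewards = [0.0, 1.0]
probs = [0.5, 0.5]
nominal = sum(x * p for x, p in zip(rewards, probs))   # plain mean: 0.5
robust = kl_dro_value(rewards, probs, delta=0.1)       # strictly smaller
```

Because the dual is a one-dimensional concave maximization, robustness only adds a scalar optimization on top of estimating the inner expectation, which is the quantity the paper's DR estimators target.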
Multiscale Crowd Counting and Localization By Multitask Point Supervision
Authors: Mohsen Zand, Haleh Damirchi, Andrew Farley, Mahdiyar Molahasani, Michael Greenspan, Ali Etemad
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
We propose a multitask approach for crowd counting and person localization in a unified framework. As the counting and localization tasks are well-correlated and can be jointly tackled, our model benefits from a multitask solution by learning multiscale representations of encoded crowd images, and subsequently fusing them. In contrast to the relatively more popular density-based methods, our model uses point supervision to allow for crowd locations to be accurately identified. We test our model on two popular crowd counting datasets, ShanghaiTech A and B, and demonstrate that our method achieves strong results on both counting and localization tasks, with MSE measures of 110.7 and 15.0 for crowd counting and AP measures of 0.71 and 0.75 for localization, on ShanghaiTech A and B respectively. Our detailed ablation experiments show the impact of our multiscale approach as well as the effectiveness of the fusion module embedded in our network. Our code is available at: https://github.com/RCVLab-AiimLab/crowd_counting.
Vision-based Autonomous Driving for Unstructured Environments Using Imitation Learning
Abstract
Unstructured environments are difficult for autonomous driving. This is because various unknown obstacles lie in the drivable space, which has no lanes, and its width and curvature change widely. In such complex environments, searching for a path in real time is difficult. Also, inaccurate localization data reduce path tracking accuracy, increasing the risk of collision. Instead of searching and tracking a path, an alternative approach has been proposed that reactively avoids obstacles in real time. Some methods track the global path while avoiding obstacles using candidate paths and the artificial potential field. However, these methods require heuristics to find specific parameters for handling various complex environments. In addition, it is difficult to track the global path accurately in practice because of inaccurate localization data. If the drivable space is not accurately recognized (i.e., a noisy state), the vehicle may not drive smoothly or may collide with obstacles. In this study, a method is proposed in which the vehicle drives toward drivable space using only a vision-based occupancy grid map. The proposed method uses imitation learning, in which a deep neural network is trained with expert driving data. The network can learn driving patterns suited to various complex and noisy situations because these situations are contained in the training data. Experiments with a vehicle in actual parking lots demonstrated the limitations of general model-based methods and the effectiveness of the proposed imitation learning method.
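The imitation-learning setup described above can be reduced to its essence: fit a policy to expert (observation, action) pairs and query it on new observations. The paper trains a deep network; as a hedged stand-in, the sketch below uses a 1-nearest-neighbour policy over a toy occupancy grid, with all data and labels invented for illustration:

```python
# Behaviour-cloning sketch: a policy is fit to expert pairs of
# (occupancy grid, steering command) so that, at test time, the vehicle
# steers toward free space directly from the grid, with no explicit
# path search or path tracking.

def clone_policy(dataset):
    """dataset: list of (occupancy_grid, steering_command) expert pairs."""
    def policy(observation):
        def dist(grid):
            return sum((a - b) ** 2 for a, b in zip(grid, observation))
        grid, action = min(dataset, key=lambda pair: dist(pair[0]))
        return action               # act like the most similar expert example
    return policy

# 1 = occupied cell, 0 = free cell; the expert steers away from obstacles.
expert = [
    ((1, 1, 0), "right"),   # obstacles on the left  -> steer right
    ((0, 1, 1), "left"),    # obstacles on the right -> steer left
]
policy = clone_policy(expert)
action = policy((1, 0, 0))  # obstacle on the far left only
```

The design choice the abstract argues for is visible even here: noisy or unusual grids are handled only to the extent that similar situations appear in the expert data, which is why the paper emphasizes covering complex and noisy situations during training.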
A self-adaptive RIS that estimates and shapes fading rich-scattering wireless channels
Authors: Chloé Saigre-Tardif, Philipp del Hougne
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP); Applied Physics (physics.app-ph)
Abstract
We present a framework for operating a self-adaptive RIS inside a fading rich-scattering wireless environment. We model the rich-scattering wireless channel as being doubly parametrized by (i) the RIS, and (ii) dynamic perturbers (moving objects, etc.). Within each coherence time, the self-adaptive RIS first estimates the status of the dynamic perturbers (e.g., the perturbers' orientations and locations) based on measurements with an auxiliary wireless channel. Second, an optimized RIS configuration that achieves a desired functionality is obtained using a learned surrogate forward model of the mapping from RIS configuration and perturber status to wireless channel. We demonstrate our technique using a physics-based end-to-end model of RIS-parametrized communication with adjustable fading (PhysFad) for the example objective of maximizing the received signal strength indicator. Our results present a route toward convergence of RIS-empowered localization and sensing with RIS-empowered channel shaping beyond the simple case of operation in free space without fading.
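The second step of the loop above, optimizing the RIS configuration against a learned surrogate, can be sketched with a toy surrogate. The quadratic "agreement" model and the exhaustive binary search below are our own illustrative assumptions; the paper learns the surrogate and targets the received signal strength indicator:

```python
from itertools import product

# Surrogate-based RIS configuration search: given a forward model mapping
# (RIS configuration, perturber status) -> predicted signal strength, pick
# the configuration that maximises the prediction.

def surrogate_rssi(config, perturber):
    """Toy surrogate: reward element-wise agreement with the perturber status."""
    return sum(1.0 for c, p in zip(config, perturber) if c == p)

def best_configuration(perturber, n_elements=4):
    """Exhaustively search binary RIS configurations under the surrogate."""
    return max(product([0, 1], repeat=n_elements),
               key=lambda cfg: surrogate_rssi(cfg, perturber))

perturber_status = (1, 0, 1, 1)     # estimated in step one of the loop
config = best_configuration(perturber_status)
```

With a real RIS the configuration space is far too large for exhaustive search, which is why a differentiable or otherwise cheap-to-query surrogate matters: it lets gradient-based or heuristic optimizers replace the brute-force loop.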
Improving Radioactive Material Localization by Leveraging Cyber-Security Model Optimizations
Authors: Ryan Sheatsley, Matthew Durbin, Azaree Lintereur, Patrick McDaniel
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Abstract
One of the principal uses of physical-space sensors in public safety applications is the detection of unsafe conditions (e.g., release of poisonous gases, weapons in airports, tainted food). However, current detection methods in these applications are often costly, slow to use, and can be inaccurate in complex, changing, or new environments. In this paper, we explore how machine learning methods used successfully in cyber domains, such as malware detection, can be leveraged to substantially enhance physical-space detection. We focus on one important exemplar application: the detection and localization of radioactive materials. We show that ML-based approaches can significantly exceed traditional table-based approaches in predicting angular direction. Moreover, the developed models can be expanded to include approximations of the distance to the radioactive material (a critical dimension that reference tables used in practice do not capture). With four- and eight-detector arrays, we collect counts of gamma rays as features for a suite of machine learning models to localize radioactive material. We explore seven unique scenarios via simulation frameworks frequently used for radiation detection and with physical experiments using radioactive material in laboratory environments. We observe that our approach can outperform the standard table-based method, reducing the angular error by 37% and reliably predicting distance within 2.4%. In this way, we show that advances in cyber-detection provide substantial opportunities for enhancing detection in public safety applications and beyond.
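The feature-to-angle idea above can be made concrete with a toy model. The idealized cosine response and the nearest-template predictor are our own stand-ins for the paper's simulated detector physics and ML models:

```python
import math

# Angular localization from a small detector array: the relative gamma-ray
# counts across detectors are the features, and a model maps them to the
# source direction. Here a nearest-template lookup over idealised count
# patterns plays the role of the learned model.

def expected_counts(angle_deg, n_detectors=4, strength=100.0):
    """Idealised counts: a detector sees more counts when facing the source."""
    counts = []
    for i in range(n_detectors):
        facing = math.radians(i * 360.0 / n_detectors)
        counts.append(strength * (1.0 + math.cos(math.radians(angle_deg) - facing)))
    return counts

def predict_angle(observed, candidates=range(0, 360, 5)):
    """Pick the candidate angle whose template best matches the observed counts."""
    def err(angle):
        template = expected_counts(angle)
        return sum((o - t) ** 2 for o, t in zip(observed, template))
    return min(candidates, key=err)

observed = expected_counts(90.0)   # noiseless counts from a source at 90 deg
angle = predict_angle(observed)
```

A learned model earns its keep exactly where this template lookup breaks down: noisy counts, scattering environments, and the distance dimension that fixed reference tables do not capture.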
Keyword: SLAM
SAGE: SLAM with Appearance and Geometry Prior for Endoscopy
Keyword: Visual inertial
There is no result
Keyword: livox
There is no result
Keyword: loam
There is no result
Keyword: Visual inertial odometry
There is no result
Keyword: lidar
LiDAR-guided Stereo Matching with a Spatial Consistency Constraint
PCSCNet: Fast 3D Semantic Segmentation of LiDAR Point Cloud for Autonomous Car using Point Convolution and Sparse Convolution Network
Keyword: loop detection
There is no result
Keyword: autonomous driving
Going Deeper into Recognizing Actions in Dark Environments: A Comprehensive Benchmark Study
Multi-task Safe Reinforcement Learning for Navigating Intersections in Dense Traffic
Adaptive Safe Merging Control for Heterogeneous Autonomous Vehicles using Parametric Control Barrier Functions
Vision-based Autonomous Driving for Unstructured Environments Using Imitation Learning
Jerk Constrained Velocity Planning for an Autonomous Vehicle: Linear Programming Approach
Multi-Task Conditional Imitation Learning for Autonomous Navigation at Crowded Intersections
Keyword: mapping
Geometric Algebra based Embeddings for Static and Temporal Knowledge Graph Completion
Automated Attack Synthesis by Extracting Finite State Machines from Protocol Specification Documents
SAGE: SLAM with Appearance and Geometry Prior for Endoscopy
Function-valued RKHS-based Operator Learning for Differential Equations
Implementing Boolean Functions with switching lattice networks
A Probabilistic Programming Idiom for Active Knowledge Search
Confidence-rich Localization and Mapping based on Particle Filter for Robotic Exploration
Teaching Drones on the Fly: Can Emotional Feedback Serve as Learning Signal for Training Artificial Agents?
Analytic continuation from limited noisy Matsubara data
SOInter: A Novel Deep Energy Based Interpretation Method for Explaining Structured Output Models
SRL-SOA: Self-Representation Learning with Sparse 1D-Operational Autoencoder for Hyperspectral Image Band Selection
Guided Visual Attention Model Based on Interactions Between Top-down and Bottom-up Information for Robot Pose Prediction
A self-adaptive RIS that estimates and shapes fading rich-scattering wireless channels
GenStore: A High-Performance and Energy-Efficient In-Storage Computing System for Genome Sequence Analysis