Abstract
Modelling individual objects as Neural Radiance Fields (NeRFs) within a robotic context can benefit many downstream tasks such as scene understanding and object manipulation. However, real-world training data collected by a robot deviate from the ideal in several key aspects. (i) The trajectories are constrained and full visual coverage is not guaranteed - especially when obstructions are present. (ii) The poses associated with the images are noisy. (iii) The objects are not easily isolated from the background. This paper addresses the above three points and uses the outputs of an object-based SLAM system to bound objects in the scene with coarse primitives and - in concert with instance masks - identify obstructions in the training images. Objects are therefore automatically bounded, and non-relevant geometry is excluded from the NeRF representation. The method's performance is benchmarked under ideal conditions and tested against errors in the poses and instance masks. Our results show that object-based NeRFs are robust to pose variations but sensitive to the quality of the instance masks.
Making Parameterization and Constrains of Object Landmark Globally Consistent via SPD(3) Manifold and Improved Cost Functions
Abstract
Object-level SLAM introduces semantically meaningful and compact object landmarks that help both indoor robot applications and outdoor autonomous driving tasks. However, the back end of object-level SLAM suffers from singularity problems because existing methods parameterize object landmarks separately by their scales and poses. Under that parameterization, the same abstract object can be represented by rotating the object coordinate frame by 90 deg and swapping its length and width values, making the pose of the same object landmark not globally consistent. To avoid the singularity problem, we first introduce the symmetric positive-definite (SPD) matrix manifold as an improved object-level landmark representation and further improve the cost functions in the back end to make them compatible with the representation. Our method demonstrates a faster convergence rate and greater robustness in simulation experiments. Experiments on real datasets also reveal that, using the same front-end data, our strategy improves the mapping accuracy by 22% on average.
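The ambiguity described in this abstract can be illustrated with a small sketch. It is a 2D toy (an ellipse, not the 3D landmarks of the paper), and the shape matrix R diag(s^2) R^T used here is an assumed stand-in for an SPD landmark representation; the abstract does not spell out the paper's exact parameterization. The helper names are hypothetical.

```python
import math

def rot(theta):
    """2D rotation matrix for angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matmul(a, b):
    """Plain-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def spd_from_pose_scale(theta, scales):
    """Assumed SPD shape matrix R diag(s^2) R^T of an ellipse."""
    r = rot(theta)
    d = [[scales[0] ** 2, 0.0], [0.0, scales[1] ** 2]]
    return matmul(matmul(r, d), transpose(r))

# Two different (pose, scale) pairs describing the same ellipse:
# rotate the frame by 90 degrees and swap length and width.
a = spd_from_pose_scale(0.3, (2.0, 1.0))
b = spd_from_pose_scale(0.3 + math.pi / 2, (1.0, 2.0))

# The separate pose/scale parameters differ, but the SPD matrix does not.
same_shape = all(math.isclose(a[i][j], b[i][j], abs_tol=1e-12)
                 for i in range(2) for j in range(2))
```

Here `same_shape` is true although the two angle and scale tuples differ, which is the sense in which a single matrix-valued representation removes the duplicate descriptions.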
Fast Autonomous Robotic Exploration Using the Underlying Graph Structure
Abstract
In this work, we fully define the existing relationships between traditional optimality criteria and the connectivity of the underlying pose-graph in Active SLAM, characterizing, therefore, the connection between Graph Theory and the Theory of Optimal Experimental Design. We validate the proposed relationships in 2D and 3D graph SLAM datasets, showing a remarkable relaxation of the computational load when using the graph structure. Furthermore, we present a novel Active SLAM framework which outperforms traditional methods by successfully leveraging the graphical facet of the problem so as to autonomously explore an unknown environment.
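The abstract does not say which connectivity index is used. As an illustration only, the sketch below computes one classical index, the number of spanning trees of a graph, via Kirchhoff's matrix-tree theorem (the determinant of the Laplacian with one row and column removed). Choosing this index, and the function names, are assumptions made for the example.

```python
from fractions import Fraction

def laplacian(n, edges):
    """Graph Laplacian of an undirected graph with nodes 0..n-1."""
    lap = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        lap[u][u] += 1
        lap[v][v] += 1
        lap[u][v] -= 1
        lap[v][u] -= 1
    return lap

def det(matrix):
    """Exact determinant by Gaussian elimination over rationals."""
    m = [row[:] for row in matrix]
    n = len(m)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result

def spanning_tree_count(n, edges):
    """Matrix-tree theorem: determinant of the reduced Laplacian."""
    lap = laplacian(n, edges)
    reduced = [row[1:] for row in lap[1:]]
    return int(det(reduced))

# A pure odometry chain versus the same chain with one loop closure.
chain = [(0, 1), (1, 2), (2, 3)]
with_loop = chain + [(3, 0)]
trees_chain = spanning_tree_count(4, chain)
trees_loop = spanning_tree_count(4, with_loop)
```

In this toy pose-graph the loop closure raises the count, so the index grows with connectivity without any covariance matrix being formed.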
Enough is Enough: Towards Autonomous Uncertainty-driven Stopping Criteria
Abstract
Autonomous robotic exploration has long attracted the attention of the robotics community and is a topic of high relevance. Deploying such systems in the real world, however, is still far from being a reality. In part, it can be attributed to the fact that most research is directed towards improving existing algorithms and testing novel formulations in simulation environments rather than addressing practical issues of real-world scenarios. This is the case of the fundamental problem of autonomously deciding when exploration has to be terminated or changed (stopping criteria), which has not received any attention recently. In this paper, we discuss the importance of using appropriate stopping criteria and analyse the behaviour of a novel criterion based on the evolution of optimality criteria in active graph-SLAM.
Keyword: Visual inertial
There is no result
Keyword: livox
There is no result
Keyword: loam
There is no result
Keyword: Visual inertial odometry
There is no result
Keyword: lidar
Pay "Attention" to Adverse Weather: Weather-aware Attention-based Object Detection
Authors: Saket S. Chaturvedi, Lan Zhang, Xiaoyong Yuan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Despite the recent advances of deep neural networks, object detection for adverse weather remains challenging due to the poor perception of some sensors in adverse weather. Instead of relying on one single sensor, multimodal fusion has been one promising approach to provide redundant detection information based on multiple sensors. However, most existing multimodal fusion approaches are ineffective in adjusting the focus of different sensors under varying detection environments in dynamic adverse weather conditions. Moreover, it is critical to simultaneously observe local and global information under complex weather conditions, which has been neglected in most early or late-stage multimodal fusion works. In view of these, this paper proposes a Global-Local Attention (GLA) framework to adaptively fuse the multi-modality sensing streams, i.e., camera, gated camera, and lidar data, at two fusion stages. Specifically, GLA integrates an early-stage fusion via a local attention network and a late-stage fusion via a global attention network to deal with both local and global information, which automatically allocates higher weights to the modality with better detection features at the late-stage fusion to cope with the specific weather condition adaptively. Experimental results demonstrate the superior performance of the proposed GLA compared with state-of-the-art fusion approaches under various adverse weather conditions, such as light fog, dense fog, and snow.
Keyword: loop detection
There is no result
Keyword: autonomous driving
Making Parameterization and Constrains of Object Landmark Globally Consistent via SPD(3) Manifold and Improved Cost Functions
Abstract
Object-level SLAM introduces semantically meaningful and compact object landmarks that help both indoor robot applications and outdoor autonomous driving tasks. However, the back end of object-level SLAM suffers from singularity problems because existing methods parameterize object landmarks separately by their scales and poses. Under that parameterization, the same abstract object can be represented by rotating the object coordinate frame by 90 deg and swapping its length and width values, making the pose of the same object landmark not globally consistent. To avoid the singularity problem, we first introduce the symmetric positive-definite (SPD) matrix manifold as an improved object-level landmark representation and further improve the cost functions in the back end to make them compatible with the representation. Our method demonstrates a faster convergence rate and greater robustness in simulation experiments. Experiments on real datasets also reveal that, using the same front-end data, our strategy improves the mapping accuracy by 22% on average.
Lossy compression of matrices by black-box optimisation of mixed-integer non-linear programming
Authors: Tadashi Kadowaki, Mitsuru Ambai
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Quantum Physics (quant-ph)
Abstract
In edge computing, suppressing data size is a challenge for machine learning models that perform complex tasks such as autonomous driving, in which computational resources (speed, memory size and power) are limited. Efficient lossy compression of matrix data has been introduced by decomposing it into the product of an integer matrix and a real matrix. However, its optimisation is difficult as it requires simultaneous optimisation of integer and real variables. In this paper, we improve this optimisation by utilising recently developed black-box optimisation (BBO) algorithms with an Ising solver for integer variables. In addition, the algorithm can be used to solve mixed-integer programming problems that are linear and non-linear in terms of real and integer variables, respectively. The differences between the choice of Ising solvers (simulated annealing (SA), quantum annealing (QA) and simulated quenching (SQ)) and the strategies of the BBO algorithms (BOCS, FMQA and their variations) are discussed for further development of the BBO techniques.
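A toy version of the decomposition problem can make the mixed-integer structure concrete. The sketch assumes a rank-1 factorisation W ≈ m c^T with entries of m restricted to ±1 (an assumption suggested by the Ising formulation, not stated in the abstract), and it replaces the BBO and Ising solvers with exhaustive search, which is feasible only for tiny matrices. The function name is hypothetical.

```python
from itertools import product

def best_rank1_binary_factor(w):
    """Exhaustively search m in {-1,+1}^n minimising ||W - m c^T||^2.

    For a fixed m the real vector c has a closed-form least-squares
    solution, c_j = sum_i m_i * w_ij / n, because m . m = n.
    """
    n, d = len(w), len(w[0])
    best = None
    for m in product((-1, 1), repeat=n):
        c = [sum(m[i] * w[i][j] for i in range(n)) / n for j in range(d)]
        err = sum((w[i][j] - m[i] * c[j]) ** 2
                  for i in range(n) for j in range(d))
        if best is None or err < best[0]:
            best = (err, m, c)
    return best

# A matrix that happens to be exactly representable in this form.
w = [[1.0, 2.0], [-1.0, -2.0], [1.0, 2.0]]
err, m, c = best_rank1_binary_factor(w)
approx = [[m[i] * c[j] for j in range(2)] for i in range(3)]
```

The integer part is the hard, combinatorial half of the problem (2^n candidates here), while the real part is linear once the integers are fixed. That split is what the abstract describes as linear in the real variables and non-linear in the integer ones.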
Keyword: mapping
Making Parameterization and Constrains of Object Landmark Globally Consistent via SPD(3) Manifold and Improved Cost Functions
Abstract
Object-level SLAM introduces semantically meaningful and compact object landmarks that help both indoor robot applications and outdoor autonomous driving tasks. However, the back end of object-level SLAM suffers from singularity problems because existing methods parameterize object landmarks separately by their scales and poses. Under that parameterization, the same abstract object can be represented by rotating the object coordinate frame by 90 deg and swapping its length and width values, making the pose of the same object landmark not globally consistent. To avoid the singularity problem, we first introduce the symmetric positive-definite (SPD) matrix manifold as an improved object-level landmark representation and further improve the cost functions in the back end to make them compatible with the representation. Our method demonstrates a faster convergence rate and greater robustness in simulation experiments. Experiments on real datasets also reveal that, using the same front-end data, our strategy improves the mapping accuracy by 22% on average.
Efficient Pipeline Planning for Expedited Distributed DNN Training
Authors: Ziyue Luo, Xiaodong Yi, Guoping Long, Shiqing Fan, Chuan Wu, Jun Yang, Wei Lin
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
To train modern large DNN models, pipeline parallelism has recently emerged, which distributes the model across GPUs and enables different devices to process different microbatches in a pipeline. Earlier pipeline designs allow multiple versions of model parameters to co-exist (similar to asynchronous training), and cannot ensure the same model convergence and accuracy performance as without pipelining. Synchronous pipelining has recently been proposed, which ensures model performance by enforcing a synchronization barrier between training iterations. Nonetheless, the synchronization barrier requires waiting for gradient aggregation from all microbatches and thus delays the training progress. Optimized pipeline planning is needed to minimize such wait and hence the training time, which has not been well studied in the literature. This paper designs efficient, near-optimal algorithms for expediting synchronous pipeline-parallel training of modern large DNNs over arbitrary inter-GPU connectivity. Our algorithm framework comprises two components: a pipeline partition and device mapping algorithm, and a pipeline scheduler that decides the processing order of microbatches over the partitions, which together minimize the per-iteration training time. We conduct thorough theoretical analysis, extensive testbed experiments and trace-driven simulation, and demonstrate our scheme can accelerate training by up to 157% compared with state-of-the-art designs.
A New Polar Code Design Based on Reciprocal Channel Approximation
Authors: Hideki Ochiai, Kosuke Ikeya, Patrick Mitran
Abstract
This paper revisits polar code design for a binary-input additive white Gaussian noise (BI-AWGN) channel when successive cancellation (SC) decoding is applied at the receiver. We focus on the reciprocal channel approximation (RCA), which is often adopted in the design of low-density parity-check (LDPC) codes. In order to apply RCA to polar code design for various codeword lengths, we derive rigorous closed-form approximations that are valid over a wide range of SNR over an AWGN channel, for both the mutual information of BPSK signaling and the corresponding reciprocal channel mapping. As a result, the computational complexity required for evaluating channel polarization is equivalent to that based on the popular Gaussian approximation (GA) approach. Simulation results show that the proposed polar code design based on RCA outperforms those based on GA as well as the so-called improved GA (IGA) approach, especially as the codeword length is increased. Furthermore, the RCA-based design yields a better block error rate (BLER) estimate compared to GA-based approaches.
Keyword: localization
DiRA: Discriminative, Restorative, and Adversarial Learning for Self-supervised Medical Image Analysis
Authors: Fatemeh Haghighi, Mohammad Reza Hosseinzadeh Taher, Michael B. Gotway, Jianming Liang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
Discriminative learning, restorative learning, and adversarial learning have proven beneficial for self-supervised learning schemes in computer vision and medical imaging. Existing efforts, however, omit their synergistic effects on each other in a ternary setup, which, we envision, can significantly benefit deep semantic representation learning. To realize this vision, we have developed DiRA, the first framework that unites discriminative, restorative, and adversarial learning in a unified manner to collaboratively glean complementary visual information from unlabeled medical images for fine-grained semantic representation learning. Our extensive experiments demonstrate that DiRA (1) encourages collaborative learning among three learning ingredients, resulting in more generalizable representation across organs, diseases, and modalities; (2) outperforms fully supervised ImageNet models and increases robustness in small data regimes, reducing annotation cost across multiple medical imaging applications; (3) learns fine-grained semantic representation, facilitating accurate lesion localization with only image-level annotation; and (4) enhances state-of-the-art restorative approaches, revealing that DiRA is a general mechanism for united representation learning. All code and pre-trained models are available at https://github.com/JLiangLab/DiRA.
Diverse Instance Discovery: Vision-Transformer for Instance-Aware Multi-Label Image Recognition
Abstract
Previous works on multi-label image recognition (MLIR) usually use CNNs as a starting point for research. In this paper, we take a pure Vision Transformer (ViT) as the research base and make full use of the Transformer's long-range dependency modeling to circumvent the limited local receptive field of CNNs. However, for multi-label images containing multiple objects from different categories, scales, and spatial relations, it is not optimal to use global information alone. Our goal is to leverage ViT's patch tokens and self-attention mechanism to mine rich instances in multi-label images, named diverse instance discovery (DiD). To this end, we propose a semantic category-aware module and a spatial relationship-aware module, and then combine the two by a re-constraint strategy to obtain instance-aware attention maps. Finally, we propose a weakly supervised object localization-based approach to extract multi-scale local features, to form a multi-view pipeline. Our method requires only weakly supervised information at the label level; no additional knowledge injection or other strongly supervised information is required. Experiments on three benchmark datasets show that our method significantly outperforms previous works and achieves state-of-the-art results under fair experimental comparisons.
Keyword: SLAM
Implicit Object Mapping With Noisy Data
Making Parameterization and Constrains of Object Landmark Globally Consistent via SPD(3) Manifold and Improved Cost Functions
Fast Autonomous Robotic Exploration Using the Underlying Graph Structure
Enough is Enough: Towards Autonomous Uncertainty-driven Stopping Criteria
Keyword: Visual inertial
There is no result
Keyword: livox
There is no result
Keyword: loam
There is no result
Keyword: Visual inertial odometry
There is no result
Keyword: lidar
Pay "Attention" to Adverse Weather: Weather-aware Attention-based Object Detection
Keyword: loop detection
There is no result
Keyword: autonomous driving
Making Parameterization and Constrains of Object Landmark Globally Consistent via SPD(3) Manifold and Improved Cost Functions
Lossy compression of matrices by black-box optimisation of mixed-integer non-linear programming
Keyword: mapping
Making Parameterization and Constrains of Object Landmark Globally Consistent via SPD(3) Manifold and Improved Cost Functions
Efficient Pipeline Planning for Expedited Distributed DNN Training
A New Polar Code Design Based on Reciprocal Channel Approximation
Keyword: localization
DiRA: Discriminative, Restorative, and Adversarial Learning for Self-supervised Medical Image Analysis
Diverse Instance Discovery: Vision-Transformer for Instance-Aware Multi-Label Image Recognition