Abstract
The intricate behaviors an organism can exhibit are predicated on its ability to sense and effectively interpret the complexities of its surroundings. Relevant information is often distributed across multiple modalities, requiring the organism to assimilate information in addition to actively seeking it. While biological beings leverage multiple sensing modalities for decision making, current robots rely heavily on visual inputs. In this work, we augment our robots with the ability to leverage the relatively under-explored modality of touch. To focus our investigation, we study the problem of scene reconstruction where touch is the only available sensing modality. We present Tactile SLAM (tSLAM), which prepares an agent to acquire information-seeking behavior and to use an implicit understanding of common household items to reconstruct the geometric details of the object under exploration. Using the anthropomorphic `ADROIT' hand, we demonstrate that tSLAM is highly effective in reconstructing objects of varying complexity within 6 seconds of interaction. We also establish the generality of tSLAM by training only on 3D Warehouse objects and testing on ContactDB objects.
Keyword: Visual inertial
There is no result
Keyword: livox
There is no result
Keyword: loam
There is no result
Keyword: Visual inertial odometry
There is no result
Keyword: lidar
A Survey of Robust 3D Object Detection Methods in Point Clouds
Abstract
The purpose of this work is to review the state-of-the-art LiDAR-based 3D object detection methods, datasets, and challenges. We describe novel data augmentation methods, sampling strategies, activation functions, attention mechanisms, and regularization methods. Furthermore, we list recently introduced normalization methods, learning rate schedules and loss functions. Moreover, we also cover advantages and limitations of 10 novel autonomous driving datasets. We evaluate novel 3D object detectors on the KITTI, nuScenes, and Waymo datasets and show their accuracy, speed, and robustness. Finally, we mention the current challenges in 3D object detection in LiDAR point clouds and list some open issues.
Real-Time and Robust 3D Object Detection Within Road-Side LiDARs Using Domain Adaptation
Authors: Walter Zimmer, Marcus Grabler, Alois Knoll
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
This work aims to address the challenges in domain adaptation of 3D object detection using infrastructure LiDARs. We design a model DASE-ProPillars that can detect vehicles in infrastructure-based LiDARs in real-time. Our model uses PointPillars as the baseline model with additional modules to improve the 3D detection performance. To prove the effectiveness of our proposed modules in DASE-ProPillars, we train and evaluate the model on two datasets, the open source A9-Dataset and a semi-synthetic infrastructure dataset created within the Regensburg Next project. We do several sets of experiments for each module in the DASE-ProPillars detector that show that our model outperforms the SE-ProPillars baseline on the real A9 test set and a semi-synthetic A9 test set, while maintaining an inference speed of 45 Hz (22 ms). We apply domain adaptation from the semi-synthetic A9-Dataset to the semi-synthetic dataset from the Regensburg Next project by applying transfer learning and achieve a 3D mAP@0.25 of 93.49% on the Car class of the target test set using 40 recall positions.
CAT-Det: Contrastively Augmented Transformer for Multi-modal 3D Object Detection
Authors: Yanan Zhang, Jiaxin Chen, Di Huang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
In autonomous driving, LiDAR point-clouds and RGB images are two major data modalities with complementary cues for 3D object detection. However, it is quite difficult to sufficiently use them, due to large inter-modal discrepancies. To address this issue, we propose a novel framework, namely Contrastively Augmented Transformer for multi-modal 3D object Detection (CAT-Det). Specifically, CAT-Det adopts a two-stream structure consisting of a Pointformer (PT) branch, an Imageformer (IT) branch along with a Cross-Modal Transformer (CMT) module. PT, IT and CMT jointly encode intra-modal and inter-modal long-range contexts for representing an object, thus fully exploring multi-modal information for detection. Furthermore, we propose an effective One-way Multi-modal Data Augmentation (OMDA) approach via hierarchical contrastive learning at both the point and object levels, significantly improving the accuracy only by augmenting point-clouds, which is free from complex generation of paired samples of the two modalities. Extensive experiments on the KITTI benchmark show that CAT-Det achieves a new state-of-the-art, highlighting its effectiveness.
Keyword: loop detection
There is no result
Keyword: autonomous driving
A Survey of Robust 3D Object Detection Methods in Point Clouds
Abstract
The purpose of this work is to review the state-of-the-art LiDAR-based 3D object detection methods, datasets, and challenges. We describe novel data augmentation methods, sampling strategies, activation functions, attention mechanisms, and regularization methods. Furthermore, we list recently introduced normalization methods, learning rate schedules and loss functions. Moreover, we also cover advantages and limitations of 10 novel autonomous driving datasets. We evaluate novel 3D object detectors on the KITTI, nuScenes, and Waymo datasets and show their accuracy, speed, and robustness. Finally, we mention the current challenges in 3D object detection in LiDAR point clouds and list some open issues.
CAT-Det: Contrastively Augmented Transformer for Multi-modal 3D Object Detection
Authors: Yanan Zhang, Jiaxin Chen, Di Huang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
In autonomous driving, LiDAR point-clouds and RGB images are two major data modalities with complementary cues for 3D object detection. However, it is quite difficult to sufficiently use them, due to large inter-modal discrepancies. To address this issue, we propose a novel framework, namely Contrastively Augmented Transformer for multi-modal 3D object Detection (CAT-Det). Specifically, CAT-Det adopts a two-stream structure consisting of a Pointformer (PT) branch, an Imageformer (IT) branch along with a Cross-Modal Transformer (CMT) module. PT, IT and CMT jointly encode intra-modal and inter-modal long-range contexts for representing an object, thus fully exploring multi-modal information for detection. Furthermore, we propose an effective One-way Multi-modal Data Augmentation (OMDA) approach via hierarchical contrastive learning at both the point and object levels, significantly improving the accuracy only by augmenting point-clouds, which is free from complex generation of paired samples of the two modalities. Extensive experiments on the KITTI benchmark show that CAT-Det achieves a new state-of-the-art, highlighting its effectiveness.
Keyword: mapping
AKF-SR: Adaptive Kalman Filtering-based Successor Representation
Authors: Parvin Malekzadeh, Mohammad Salimibeni, Ming Hou, Arash Mohammadi, Konstantinos N. Plataniotis
Abstract
Recent studies in neuroscience suggest that Successor Representation (SR)-based models adapt to changes in goal locations or the reward function faster than model-free algorithms, at a lower computational cost than model-based algorithms. However, it is not known how such representations might help animals manage uncertainty in their decision-making, and existing methods for SR learning do not capture uncertainty about the estimated SR. To address this issue, the paper presents a Kalman filter-based SR framework, referred to as Adaptive Kalman Filtering-based Successor Representation (AKF-SR). First, the Kalman temporal difference approach, a combination of the Kalman filter and the temporal difference method, is used within the AKF-SR framework to cast SR learning as a filtering problem; this provides an uncertainty estimate of the SR while reducing memory requirements and sensitivity to model parameters compared to deep neural network-based algorithms. An adaptive Kalman filtering approach is then applied within the proposed AKF-SR framework to tune the measurement noise covariance and the measurement mapping function of the Kalman filter, the parameters that most affect the filter's performance. Moreover, an active learning method that exploits the estimated uncertainty of the SR to form the behaviour policy, leading to more visits to less certain values, is proposed to improve the overall performance of the agent in terms of rewards received while interacting with its environment.
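The Kalman temporal difference idea at the heart of AKF-SR can be sketched in a few lines: the learned weights are treated as the hidden state of a Kalman filter, the observed reward is the measurement, and the TD error plays the role of the innovation. The following is a minimal illustrative sketch, not the authors' implementation; the feature shapes, noise parameters, and function names are assumptions.

```python
import numpy as np

def kalman_td_step(w, P, phi_s, phi_next, reward, gamma=0.9, R=1.0, Q=1e-3):
    """One Kalman TD update of weight estimate w and covariance P.

    The measurement map is H = phi(s) - gamma * phi(s'), so the predicted
    "measurement" H @ w is the TD target residual for the observed reward.
    """
    H = (phi_s - gamma * phi_next).reshape(1, -1)  # measurement mapping
    P = P + Q * np.eye(len(w))                     # process-noise inflation
    S = float(H @ P @ H.T) + R                     # innovation covariance
    K = (P @ H.T) / S                              # Kalman gain, shape (d, 1)
    innovation = reward - float(H @ w)             # TD error as innovation
    w = w + (K * innovation).ravel()               # state (weights) update
    P = P - K @ H @ P                              # covariance update
    return w, P
```

The covariance `P` is exactly the "uncertainty about the estimated SR" the abstract refers to; the adaptive part of AKF-SR would additionally tune `R` and `H` online.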
Making Pre-trained Language Models End-to-end Few-shot Learners with Contrastive Prompt Tuning
Abstract
Pre-trained Language Models (PLMs) have achieved remarkable performance for various language understanding tasks in IR systems, which require the fine-tuning process based on labeled training data. For low-resource scenarios, prompt-based learning for PLMs exploits prompts as task guidance and turns downstream tasks into masked language problems for effective few-shot fine-tuning. In most existing approaches, the high performance of prompt-based learning heavily relies on handcrafted prompts and verbalizers, which may limit the application of such approaches in real-world scenarios. To solve this issue, we present CP-Tuning, the first end-to-end Contrastive Prompt Tuning framework for fine-tuning PLMs without any manual engineering of task-specific prompts and verbalizers. It is integrated with the task-invariant continuous prompt encoding technique with fully trainable prompt parameters. We further propose the pair-wise cost-sensitive contrastive learning procedure to optimize the model in order to achieve verbalizer-free class mapping and enhance the task-invariance of prompts. It explicitly learns to distinguish different classes and makes the decision boundary smoother by assigning different costs to easy and hard cases. Experiments over a variety of language understanding tasks used in IR systems and different PLMs show that CP-Tuning outperforms state-of-the-art methods.
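A pair-wise, cost-sensitive contrastive objective of the kind the abstract describes can be sketched as follows. The cost-weighting rule below is an illustrative assumption, not CP-Tuning's exact loss, and all names are hypothetical.

```python
import numpy as np

def pairwise_contrastive_loss(emb, labels, margin=1.0, hard_cost=2.0):
    """Pull same-class embeddings together and push different-class
    embeddings apart, up-weighting "hard" pairs (an assumed stand-in for
    the paper's cost-sensitive scheme)."""
    loss, n = 0.0, 0
    for i in range(len(emb)):
        for j in range(i + 1, len(emb)):
            d = np.linalg.norm(emb[i] - emb[j])
            if labels[i] == labels[j]:
                term = d ** 2                        # positives: distance
            else:
                term = max(0.0, margin - d) ** 2     # negatives: hinge
            cost = hard_cost if term > margin / 2 else 1.0
            loss += cost * term
            n += 1
    return loss / max(n, 1)
```

Because the loss only compares embeddings of inputs with the same or different labels, class mapping needs no verbalizer, which is the property the abstract emphasizes.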
Towards gain tuning for numerical KKL observers
Authors: Mona Buisson-Fenet, Lukas Bahr, Florent Di Meglio
Abstract
This paper presents a first step towards tuning observers for nonlinear systems. Relying on recent results around Kazantzis-Kravaris/Luenberger (KKL) observers, we propose to design a family of observers parametrized by the cut-off frequency of a linear filter. We use neural networks to learn the mapping between the observer and the nonlinear system as a function of this frequency, and present a novel method to sample the state-space efficiently for nonlinear regression. We then propose a criterion related to noise sensitivity, which can be used to tune the observer by choosing the most appropriate frequency. We illustrate the merits of this approach in numerical simulations.
Keyword: localization
Ball 3D localization from a single calibrated image
Authors: Gabriel Van Zandycke, Christophe De Vleeschouwer
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
Ball 3D localization in team sports has various applications, including automatic offside detection in soccer and shot-release localization in basketball. Today, this task is either resolved using expensive multi-view setups or by restricting the analysis to ballistic trajectories. In this work, we propose to address the task on a single image from a calibrated monocular camera by estimating the ball diameter in pixels and using the knowledge of the real ball diameter in meters. This approach is suitable for any game situation where the ball is (even partly) visible. To achieve this, we use a small neural network trained on image patches around candidates generated by a conventional ball detector. Besides predicting the ball diameter, our network outputs the confidence of having a ball in the image patch. Validation on 3 basketball datasets reveals that our model gives remarkable predictions on ball 3D localization. In addition, through its confidence output, our model improves the detection rate by filtering the candidates produced by the detector. The contributions of this work are (i) the first model to address 3D ball localization from a single image, (ii) an effective method for ball 3D annotation from single calibrated images, and (iii) a high-quality 3D ball evaluation dataset annotated from a single viewpoint. In addition, the code to reproduce this research is made freely available at https://github.com/gabriel-vanzandycke/deepsport.
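The core geometric idea is the pinhole relation between apparent and real size: depth z = f · D / d, where f is the focal length in pixels, D the real diameter, and d the image diameter. A minimal sketch under ideal pinhole assumptions (names and values are illustrative, not the paper's code):

```python
def ball_depth_from_diameter(diameter_px: float,
                             diameter_m: float,
                             focal_px: float) -> float:
    """Depth (m) of a ball from its image diameter: z = f * D / d."""
    return focal_px * diameter_m / diameter_px

def ball_position_3d(u: float, v: float, diameter_px: float,
                     diameter_m: float, focal_px: float,
                     cx: float, cy: float) -> tuple:
    """Back-project the ball center (u, v) to camera coordinates,
    given the principal point (cx, cy)."""
    z = ball_depth_from_diameter(diameter_px, diameter_m, focal_px)
    x = (u - cx) * z / focal_px
    y = (v - cy) * z / focal_px
    return (x, y, z)

# Example: a basketball (~0.24 m diameter) imaged 48 px wide by a camera
# with a 1000 px focal length sits at depth 1000 * 0.24 / 48 = 5 m.
```

This is why the network only needs to regress the diameter in pixels: the calibration supplies f and the rules of the game supply D.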
Leakage Localization in Water Distribution Networks: A Model-Based Approach
Authors: Ludvig Lindstrom, Sebin Gracy, Sindri Magnusson, Henrik Sandberg
Abstract
The paper studies the problem of leakage localization in water distribution networks. For the case of a single pipe that suffers from a single leak, by taking recourse to pressure and flow measurements, and assuming those are noiseless, we provide closed-form expressions for the leak location, leak exponent, and leak constant. For the same setting, but with noisy pressure and flow measurements, an expression for estimating the location of the leak is provided. Finally, assuming the existence of a single leak in a network comprising more than one pipe and assuming that the network has a tree structure, we provide a systematic procedure for determining the leak location, the leak exponent, and the leak constant.
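For intuition on the single-pipe case, a linearized (laminar) friction model admits a simple closed form: the leak splits the pipe into two segments carrying different flows, and the measured end-to-end pressure drop pins down the split point. This is a heavily simplified stand-in for the paper's expressions, which the abstract does not reproduce; the linear friction model, the resistance `r`, and all names are assumptions.

```python
def leak_location(delta_p: float, q_in: float, q_out: float,
                  length: float, r: float) -> float:
    """Distance of the leak from the upstream end of a single pipe.

    Assumes head loss r * q per unit length, so with a leak at position x:
        delta_p = r * q_in * x + r * q_out * (length - x)
    which is solved for x. q_in > q_out whenever a leak is present.
    """
    if q_in == q_out:
        raise ValueError("inflow equals outflow: no leak to localize")
    return (delta_p - r * q_out * length) / (r * (q_in - q_out))
```

With noisy measurements, one would instead estimate x by least squares over repeated readings, which mirrors the noisy-case estimator the abstract mentions.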
TransGeo: Transformer Is All You Need for Cross-view Image Geo-localization
Authors: Sijie Zhu, Mubarak Shah, Chen Chen
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The dominant CNN-based methods for cross-view image geo-localization rely on polar transform and fail to model global correlation. We propose a pure transformer-based approach (TransGeo) to address these limitations from a different perspective. TransGeo takes full advantage of the strengths of the transformer in global information modeling and explicit position encoding. We further leverage the flexibility of the transformer input and propose an attention-guided non-uniform cropping method, so that uninformative image patches are removed with a negligible drop in performance, reducing computation cost. The saved computation can be reallocated to increase resolution only for informative patches, resulting in performance improvement with no additional computation cost. This "attend and zoom-in" strategy is highly similar to human behavior when observing images. Remarkably, TransGeo achieves state-of-the-art results on both urban and rural datasets, with significantly less computation cost than CNN-based methods. It does not rely on polar transform and infers faster than CNN-based methods. Code is available at https://github.com/Jeff-Zilence/TransGeo2022.
LASER: LAtent SpacE Rendering for 2D Visual Localization
Abstract
We present LASER, an image-based Monte Carlo Localization (MCL) framework for 2D floor maps. LASER introduces the concept of latent space rendering, where 2D pose hypotheses on the floor map are directly rendered into a geometrically-structured latent space by aggregating viewing ray features. Through a tightly coupled rendering codebook scheme, the viewing ray features are dynamically determined at rendering-time based on their geometries (i.e. length, incident-angle), endowing our representation with view-dependent fine-grain variability. Our codebook scheme effectively disentangles feature encoding from rendering, allowing the latent space rendering to run at speeds above 10KHz. Moreover, through metric learning, our geometrically-structured latent space is common to both pose hypotheses and query images with arbitrary fields of view. As a result, LASER achieves state-of-the-art performance on large-scale indoor localization datasets (i.e. ZInD and Structured3D) for both panorama and perspective image queries, while significantly outperforming existing learning-based methods in speed.
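To situate the abstract's terms, a toy Monte Carlo Localization step scores each pose hypothesis by comparing a rendering of that pose against the query observation and then resamples. LASER's contribution is replacing the hand-crafted likelihood below with fast latent-space rendering and a learned metric; everything here, including `render`, is an illustrative assumption.

```python
import math
import random

def mcl_step(particles, observation, render, sigma=0.5):
    """One MCL weight-and-resample step over pose hypotheses.

    `render(pose)` stands in for producing a view from a hypothesis
    (LASER does this in a learned latent space); weights follow a
    Gaussian likelihood on the rendering-vs-observation discrepancy.
    """
    weights = [math.exp(-((render(p) - observation) ** 2)
                        / (2 * sigma ** 2))
               for p in particles]
    total = sum(weights)
    if total == 0:
        return particles                       # degenerate: keep as-is
    weights = [w / total for w in weights]
    return random.choices(particles, weights=weights, k=len(particles))
```

Because thousands of hypotheses must be scored per query, the claimed >10KHz rendering rate is what makes this loop practical at scale.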
Bridging the Gap between Classification and Localization for Weakly Supervised Object Localization
Authors: Eunji Kim, Siwon Kim, Jungbeom Lee, Hyunwoo Kim, Sungroh Yoon
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Weakly supervised object localization aims to find a target object region in a given image with only weak supervision, such as image-level labels. Most existing methods use a class activation map (CAM) to generate a localization map; however, a CAM identifies only the most discriminative parts of a target object rather than the entire object region. In this work, we find the gap between classification and localization in terms of the misalignment of the directions between an input feature and a class-specific weight. We demonstrate that the misalignment suppresses the activation of CAM in areas that are less discriminative but belong to the target object. To bridge the gap, we propose a method to align feature directions with a class-specific weight. The proposed method achieves a state-of-the-art localization performance on the CUB-200-2011 and ImageNet-1K benchmarks.
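The CAM baseline the abstract critiques is simple to state: weight the final convolutional feature maps by the target class's classifier weights, then keep the positive evidence. A minimal sketch of that baseline (shapes and names are assumptions, and this is not the paper's alignment method):

```python
import numpy as np

def class_activation_map(features: np.ndarray,
                         class_weights: np.ndarray) -> np.ndarray:
    """features: (C, H, W) conv features; class_weights: (C,) weights of
    the target class in the final linear layer. Returns an (H, W) map
    normalized to [0, 1]."""
    # Weighted sum over channels: sum_c w[c] * features[c]
    cam = np.tensordot(class_weights, features, axes=([0], [0]))
    cam = np.maximum(cam, 0.0)          # keep positive class evidence
    if cam.max() > 0:
        cam = cam / cam.max()           # normalize for visualization
    return cam
```

The misalignment the paper targets lives in the dot product above: feature vectors that belong to the object but point away from `class_weights` get suppressed, which is why a plain CAM highlights only the most discriminative parts.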
Generic Event Boundary Captioning: A Benchmark for Status Changes Understanding
Authors: Yuxuan Wang, Difei Gao, Licheng Yu, Stan Weixian Lei, Matt Feiszli, Mike Zheng Shou
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Cognitive science has shown that humans perceive videos in terms of events separated by state changes of dominant subjects. State changes trigger new events and are one of the most useful among the large amount of redundant information perceived. However, previous research focuses on the overall understanding of segments without evaluating the fine-grained status changes inside. In this paper, we introduce a new dataset called Kinetic-GEBC (Generic Event Boundary Captioning). The dataset consists of over 170k boundaries associated with captions describing status changes in the generic events in 12K videos. Upon this new dataset, we propose three tasks supporting the development of a more fine-grained, robust, and human-like understanding of videos through status changes. We evaluate many representative baselines in our dataset, where we also design a new TPD (Temporal-based Pairwise Difference) Modeling method for current state-of-the-art backbones and achieve significant performance improvements. Besides, the results show there are still formidable challenges for current methods in the utilization of different granularities, representation of visual difference, and the accurate localization of status changes. Further analysis shows that our dataset can drive developing more powerful methods to understand status changes and thus improve video level comprehension.
A Global Modeling Approach for Load Forecasting in Distribution Networks
Abstract
Efficient load forecasting is needed to ensure better observability in the distribution networks, whereas such forecasting is made possible by an increasing number of smart meter installations. Because distribution networks include a large amount of different loads at various aggregation levels, such as individual consumers, transformer stations and feeders loads, it is impractical to develop individual (or so-called local) forecasting models for each load separately. Furthermore, such local models ignore the strong dependencies between different loads that might be present due to their spatial proximity and the characteristics of the distribution network. To address these issues, this paper proposes a global modeling approach based on deep learning for efficient forecasting of a large number of loads in distribution networks. In this way, the computational burden of training a large amount of local forecasting models can be largely reduced, and the cross-series information shared among different loads can be utilized. Additionally, an unsupervised localization mechanism and optimal ensemble construction strategy are also proposed to localize/personalize the forecasting model to different groups of loads and to improve the forecasting accuracy further. Comprehensive experiments are conducted on real-world smart meter data to demonstrate the superiority of the proposed approach compared to competing methods.
DFNet: Enhance Absolute Pose Regression with Direct Feature Matching
Authors: Shuai Chen, Xinghui Li, Zirui Wang, Victor Prisacariu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
We introduce a camera relocalization pipeline that combines absolute pose regression (APR) and direct feature matching. Existing photometric-based methods struggle in scenes with large photometric distortions, e.g. outdoor environments. By incorporating exposure-adaptive novel view synthesis, our method successfully addresses these challenges. Moreover, by introducing domain-invariant feature matching, our solution improves pose regression accuracy while using semi-supervised learning on unlabeled data. In particular, the pipeline consists of two components: a Novel View Synthesizer and FeatureNet (DFNet). The former synthesizes novel views compensating for changes in exposure, and the latter regresses camera poses and extracts robust features that bridge the domain gap between real images and synthetic ones. We show that domain-invariant feature matching effectively enhances camera pose estimation in both indoor and outdoor scenes. Hence, our method achieves state-of-the-art accuracy, outperforming existing single-image APR methods by as much as 56%, comparable to 3D structure-based methods.
Reference Network and Localization Architecture for Smart Manufacturing based on 5G
Authors: Stephan Ludwig, Doris Aschenbrenner, Marvin Schürle, Henrik Klessig, Michael Karrenbauer, Huanzhuo Wu, Maroua Taghouti, Pedro Lozano, Hans D. Schotten, Frank H. P. Fitzek
Subjects: Networking and Internet Architecture (cs.NI)
Abstract
5G promises to shift Industry 4.0 to the next level by allowing flexible production. However, many communication standards are used throughout a production site, and this will remain the case for the foreseeable future. Furthermore, localization of assets will be equally valuable in order to reach a higher level of automation. This paper proposes a reference architecture for a convergent localization and communication network for smart manufacturing that combines 5G with other existing technologies and focuses on high-mix low-volume applications, in particular at small and medium-sized enterprises. The architecture is derived from a set of functional requirements, and we describe different views on this architecture to show how the requirements can be fulfilled. It connects private and public mobile networks with local networking technologies to achieve a flexible setup addressing many industrial use cases.
Keyword: SLAM
Curiosity Driven Self-supervised Tactile Exploration of Unknown Objects