New submissions for Wed, 27 Apr 22

Keyword: SLAM

There is no result

Keyword: Visual inertial

There is no result

Keyword: livox

There is no result

Keyword: loam

There is no result

Keyword: Visual inertial odometry

There is no result

Keyword: lidar

Centimeter-level Positioning by Instantaneous Lidar-aided GNSS Ambiguity Resolution

Authors: Junjie Zhang, Amir Khodabandeh, Kourosh Khoshelham
Subjects: Robotics (cs.RO); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2204.12103
Pdf link: https://arxiv.org/pdf/2204.12103
Abstract High-precision vehicle positioning is key to the implementation of modern driving systems in urban environments. Global Navigation Satellite System (GNSS) carrier phase measurements can provide millimeter- to centimeter-level positioning, provided that the integer ambiguities are correctly resolved. Abundant code measurements are often used to facilitate integer ambiguity resolution (IAR), however, they suffer from signal blockage and multipath in urban canyons. In this contribution, a lidar-aided instantaneous ambiguity resolution method is proposed. Lidar measurements, in the form of 3D keypoints, are generated by a learning-based point cloud registration method using a pre-built HD map and integrated with GNSS observations in a mixed measurement model to produce precise float solutions, which in turn increase the ambiguity success rate. Closed-form expressions of the ambiguity variance matrix and the associated Ambiguity Dilution of Precision (ADOP) are developed to provide a priori evaluation of such lidar-aided ambiguity resolution performance. Both analytical and experimental results show that the proposed method enables successful instantaneous IAR with limited GNSS satellites and frequencies, leading to centimeter-level vehicle positioning.
Keyword: loop detection

There is no result

Keyword: autonomous driving

Where and What: Driver Attention-based Object Detection
Authors: Yao Rong, Naemi-Rebecca Kassautzki, Wolfgang Fuhl, Enkelejda Kasneci
Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2204.12150
Pdf link: https://arxiv.org/pdf/2204.12150
Abstract Human drivers use their attentional mechanisms to focus on critical objects and make decisions while driving. As human attention can be revealed from gaze data, capturing and analyzing gaze information has emerged in recent years to benefit autonomous driving technology. Previous works in this context have primarily aimed at predicting "where" human drivers look at and lack knowledge of "what" objects drivers focus on. Our work bridges the gap between pixel-level and object-level attention prediction. Specifically, we propose to integrate an attention prediction module into a pretrained object detection framework and predict the attention in a grid-based style. Furthermore, critical objects are recognized based on predicted attended-to areas. We evaluate our proposed method on two driver attention datasets, BDD-A and DR(eye)VE. Our framework achieves competitive state-of-the-art performance in the attention prediction on both pixel-level and object-level but is far more efficient (75.3 GFLOPs less) in computation.
Keyword: mapping

Logistic-ELM: A Novel Fault Diagnosis Method for Rolling Bearings
Authors: Zhenhua Tan, Jingyu Ning, Kai Peng, Zhenche Xia, Danke Wu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2204.11845
Pdf link: https://arxiv.org/pdf/2204.11845
Abstract The fault diagnosis of rolling bearings is a critical technique to realize predictive maintenance for mechanical condition monitoring. In real industrial systems, the main challenges for the fault diagnosis of rolling bearings pertain to the accuracy and real-time requirements. Most existing methods focus on ensuring the accuracy, and the real-time requirement is often neglected. In this paper, considering both requirements, we propose a novel fast fault diagnosis method for rolling bearings, based on extreme learning machine (ELM) and logistic mapping, named logistic-ELM. First, we identify 14 kinds of time-domain features from the original vibration signals according to mechanical vibration principles and adopt the sequential forward selection (SFS) strategy to select optimal features from them to ensure the basic predictive accuracy and efficiency. Next, we propose the logistic-ELM for fast fault classification, where the biases in ELM are omitted and the random input weights are replaced by the chaotic logistic mapping sequence which involves a higher uncorrelation to obtain more accurate results with fewer hidden neurons. We conduct extensive experiments on the rolling bearing vibration signal dataset of the Case Western Reserve University (CWRU) Bearing Data Centre. The experimental results show that the proposed approach outperforms existing SOTA comparison methods in terms of the predictive accuracy, and the highest accuracy is 100% in seven separate sub data environments. The relevant code is publicly available at https://github.com/TAN-OpenLab/logistic-ELM.
On Leveraging Variational Graph Embeddings for Open World Compositional Zero-Shot Learning
Authors: Muhammad Umer Anwaar, Zhihui Pan, Martin Kleinsteuber
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2204.11848
Pdf link: https://arxiv.org/pdf/2204.11848
Abstract Humans are able to identify and categorize novel compositions of known concepts. The task in Compositional Zero-Shot learning (CZSL) is to learn composition of primitive concepts, i.e. objects and states, in such a way that even their novel compositions can be zero-shot classified. In this work, we do not assume any prior knowledge on the feasibility of novel compositions i.e.open-world setting, where infeasible compositions dominate the search space. We propose a Compositional Variational Graph Autoencoder (CVGAE) approach for learning the variational embeddings of the primitive concepts (nodes) as well as feasibility of their compositions (via edges). Such modelling makes CVGAE scalable to real-world application scenarios. This is in contrast to SOTA method, CGE, which is computationally very expensive. e.g.for benchmark C-GQA dataset, CGE requires 3.94 x 10^5 nodes, whereas CVGAE requires only 1323 nodes. We learn a mapping of the graph and image embeddings onto a common embedding space. CVGAE adopts a deep metric learning approach and learns a similarity metric in this space via bi-directional contrastive loss between projected graph and image embeddings. We validate the effectiveness of our approach on three benchmark datasets.We also demonstrate via an image retrieval task that the representations learnt by CVGAE are better suited for compositional generalization.
Mapping Research Trajectories
Authors: Bastian Schäfermeier, Gerd Stumme, Tom Hanika
Subjects: Digital Libraries (cs.DL); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2204.11859
Pdf link: https://arxiv.org/pdf/2204.11859
Abstract Steadily growing amounts of information, such as annually published scientific papers, have become so large that they elude an extensive manual analysis. Hence, to maintain an overview, automated methods for the mapping and visualization of knowledge domains are necessary and important, e.g., for scientific decision makers. Of particular interest in this field is the development of research topics of different entities (e.g., scientific authors and venues) over time. However, existing approaches for their analysis are only suitable for single entity types, such as venues, and they often do not capture the research topics or the time dimension in an easily interpretable manner. Hence, we propose a principled approach for \emph{mapping research trajectories}, which is applicable to all kinds of scientific entities that can be represented by sets of published papers. For this, we transfer ideas and principles from the geographic visualization domain, specifically trajectory maps and interactive geographic maps. Our visualizations depict the research topics of entities over time in a straightforward interpr. manner. They can be navigated by the user intuitively and restricted to specific elements of interest. The maps are derived from a corpus of research publications (i.e., titles and abstracts) through a combination of unsupervised machine learning methods. In a practical demonstrator application, we exemplify the proposed approach on a publication corpus from machine learning. We observe that our trajectory visualizations of 30 top machine learning venues and 1000 major authors in this field are well interpretable and are consistent with background knowledge drawn from the entities' publications. Next to producing interactive, interpr. visualizations supporting different kinds of analyses, our computed trajectories are suitable for trajectory mining applications in the future.
Discrete-Continuous Smoothing and Mapping
Authors: Kevin J. Doherty, Ziqi Lu, Kurran Singh, John J. Leonard
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2204.11936
Pdf link: https://arxiv.org/pdf/2204.11936
Abstract We describe a general approach to smoothing and mapping with a class of discrete-continuous factor graphs commonly encountered in robotics applications. While there are openly available tools providing flexible and easy-to-use interfaces for specifying and solving optimization problems formulated in terms of either discrete or continuous graphical models, at present, no similarly general tools exist enabling the same functionality for hybrid discrete-continuous problems. We aim to address this problem. In particular, we provide a library, DC-SAM, extending existing tools for optimization problems defined in terms of factor graphs to the setting of discrete-continuous models. A key contribution of our work is a novel solver for efficiently recovering approximate solutions to discrete-continuous optimization problems. The key insight to our approach is that while joint inference over continuous and discrete state spaces is often hard, many commonly encountered discrete-continuous problems can naturally be split into a "discrete part" and a "continuous part" that can individually be solved easily. Leveraging this structure, we optimize discrete and continuous variables in an alternating fashion. In consequence, our proposed work enables straightforward representation of and approximate inference in discrete-continuous graphical models. We also provide a method to recover the uncertainty in estimates of both discrete and continuous variables. We demonstrate the versatility of our approach through its application to three distinct robot perception applications: point-cloud registration, robust pose graph optimization, and object-based mapping and localization.
C3: Continued Pretraining with Contrastive Weak Supervision for Cross Language Ad-Hoc Retrieval
Authors: Eugene Yang, Suraj Nair, Ramraj Chandradevan, Rebecca Iglesias-Flores, Douglas W. Oard
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2204.11989
Pdf link: https://arxiv.org/pdf/2204.11989
Abstract Pretrained language models have improved effectiveness on numerous tasks, including ad-hoc retrieval. Recent work has shown that continuing to pretrain a language model with auxiliary objectives before fine-tuning on the retrieval task can further improve retrieval effectiveness. Unlike monolingual retrieval, designing an appropriate auxiliary task for cross-language mappings is challenging. To address this challenge, we use comparable Wikipedia articles in different languages to further pretrain off-the-shelf multilingual pretrained models before fine-tuning on the retrieval task. We show that our approach yields improvements in retrieval effectiveness.
Learning Weighting Map for Bit-Depth Expansion within a Rational Range
Authors: Yuqing Liu, Qi Jia, Jian Zhang, Xin Fan, Shanshe Wang, Siwei Ma, Wen Gao
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2204.12039
Pdf link: https://arxiv.org/pdf/2204.12039
Abstract Bit-depth expansion (BDE) is one of the emerging technologies to display high bit-depth (HBD) image from low bit-depth (LBD) source. Existing BDE methods have no unified solution for various BDE situations, and directly learn a mapping for each pixel from LBD image to the desired value in HBD image, which may change the given high-order bits and lead to a huge deviation from the ground truth. In this paper, we design a bit restoration network (BRNet) to learn a weight for each pixel, which indicates the ratio of the replenished value within a rational range, invoking an accurate solution without modifying the given high-order bit information. To make the network adaptive for any bit-depth degradation, we investigate the issue in an optimization perspective and train the network under progressive training strategy for better performance. Moreover, we employ Wasserstein distance as a visual quality indicator to evaluate the difference of color distribution between restored image and the ground truth. Experimental results show our method can restore colorful images with fewer artifacts and false contours, and outperforms state-of-the-art methods with higher PSNR/SSIM results and lower Wasserstein distance. The source code will be made available at https://github.com/yuqing-liu-dut/bit-depth-expansion
Exact Wirelength of Embedding 3-Ary n-Cubes into certain Cylinders and Trees
Authors: Rajeshwari S, M Rajesh
Subjects: Discrete Mathematics (cs.DM)
Arxiv link: https://arxiv.org/abs/2204.12079
Pdf link: https://arxiv.org/pdf/2204.12079
Abstract Graph embeddings play a significant role in the design and analysis of parallel algorithms. It is a mapping of the topological structure of a guest graph G into a host graph H, which is represented as a one-to-one mapping from the vertex set of the guest graph to the vertex set of the host graph. In multiprocessing systems the interconnection networks enhance the efficient communication between the components in the system. Obtaining minimum wirelength in embedding problems is significant in the designing of network and simulating one architecture by another. In this paper, we determine the wirelength of embedding 3-ary n-cubes into cylinders and certain trees.
Time Series Prediction by Multi-task GPR with Spatiotemporal Information Transformation
Authors: Peng Tao, Xiaohu Hao, Jie Cheng, Luonan Chen
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2204.12085
Pdf link: https://arxiv.org/pdf/2204.12085
Abstract Making an accurate prediction of an unknown system only from a short-term time series is difficult due to the lack of sufficient information, especially in a multi-step-ahead manner. However, a high-dimensional short-term time series contains rich dynamical information, and also becomes increasingly available in many fields. In this work, by exploiting spatiotemporal information (STI) transformation scheme that transforms such high-dimensional/spatial information to temporal information, we developed a new method called MT-GPRMachine to achieve accurate prediction from a short-term time series. Specifically, we first construct a specific multi-task GPR which is multiple linked STI mappings to transform high dimensional/spatial information into temporal/dynamical information of any given target variable, and then makes multi step-ahead prediction of the target variable by solving those STI mappings. The multi-step-ahead prediction results on various synthetic and real-world datasets clearly validated that MT-GPRMachine outperformed other existing approaches.
ClothFormer:Taming Video Virtual Try-on in All Module
Authors: Jianbin Jiang, Tan Wang, He Yan, Junhui Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2204.12151
Pdf link: https://arxiv.org/pdf/2204.12151
Abstract The task of video virtual try-on aims to fit the target clothes to a person in the video with spatio-temporal consistency. Despite tremendous progress of image virtual try-on, they lead to inconsistency between frames when applied to videos. Limited work also explored the task of video-based virtual try-on but failed to produce visually pleasing and temporally coherent results. Moreover, there are two other key challenges: 1) how to generate accurate warping when occlusions appear in the clothing region; 2) how to generate clothes and non-target body parts (e.g. arms, neck) in harmony with the complicated background; To address them, we propose a novel video virtual try-on framework, ClothFormer, which successfully synthesizes realistic, harmonious, and spatio-temporal consistent results in complicated environment. In particular, ClothFormer involves three major modules. First, a two-stage anti-occlusion warping module that predicts an accurate dense flow mapping between the body regions and the clothing regions. Second, an appearance-flow tracking module utilizes ridge regression and optical flow correction to smooth the dense flow sequence and generate a temporally smooth warped clothing sequence. Third, a dual-stream transformer extracts and fuses clothing textures, person features, and environment information to generate realistic try-on videos. Through rigorous experiments, we demonstrate that our method highly surpasses the baselines in terms of synthesized video quality both qualitatively and quantitatively.
Managing Reliability Skew in DNA Storage
Authors: Dehui Lin, Yasamin Tabatabaee, Yash Pote, Djordje Jevdjic
Subjects: Emerging Technologies (cs.ET); Hardware Architecture (cs.AR); Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2204.12261
Pdf link: https://arxiv.org/pdf/2204.12261
Abstract DNA is emerging as an increasingly attractive medium for data storage due to a number of important and unique advantages it offers, most notably the unprecedented durability and density. While the technology is evolving rapidly, the prohibitive cost of reads and writes, the high frequency and the peculiar nature of errors occurring in DNA storage pose a significant challenge to its adoption. In this work we make a novel observation that the probability of successful recovery of a given bit from any type of a DNA-based storage system highly depends on its physical location within the DNA molecule. In other words, when used as a storage medium, some parts of DNA molecules appear significantly more reliable than others. We show that large differences in reliability between different parts of DNA molecules lead to highly inefficient use of error-correction resources, and that commonly used techniques such as unequal error-correction cannot be used to bridge the reliability gap between different locations in the context of DNA storage. We then propose two approaches to address the problem. The first approach is general and applies to any types of data; it stripes the data and ECC codewords across DNA molecules in a particular fashion such that the effects of errors are spread out evenly across different codewords and molecules, effectively de-biasing the underlying storage medium. The second approach is application-specific, and seeks to leverage the underlying reliability bias by using application-aware mapping of data onto DNA molecules such that data that requires higher reliability is stored in more reliable locations, whereas data that needs lower reliability is stored in less reliable parts of DNA molecules. We show that the proposed data mapping can be used to achieve graceful degradation in the presence of high error rates, or to implement the concept of approximate storage in DNA.
Monant Medical Misinformation Dataset: Mapping Articles to Fact-Checked Claims
Authors: Ivan Srba, Branislav Pecher, Matus Tomlein, Robert Moro, Elena Stefancova, Jakub Simko, Maria Bielikova
Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2204.12294
Pdf link: https://arxiv.org/pdf/2204.12294
Abstract False information has a significant negative influence on individuals as well as on the whole society. Especially in the current COVID-19 era, we witness an unprecedented growth of medical misinformation. To help tackle this problem with machine learning approaches, we are publishing a feature-rich dataset of approx. 317k medical news articles/blogs and 3.5k fact-checked claims. It also contains 573 manually and more than 51k automatically labelled mappings between claims and articles. Mappings consist of claim presence, i.e., whether a claim is contained in a given article, and article stance towards the claim. We provide several baselines for these two tasks and evaluate them on the manually labelled part of the dataset. The dataset enables a number of additional tasks related to medical misinformation, such as misinformation characterisation studies or studies of misinformation diffusion between sources.
ROMA: Cross-Domain Region Similarity Matching for Unpaired Nighttime Infrared to Daytime Visible Video Translation
Authors: Zhenjie Yu, Kai Chen, Shuang Li, Bingfeng Han, Chi Harold Liu, Shuigen Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2204.12367
Pdf link: https://arxiv.org/pdf/2204.12367
Abstract Infrared cameras are often utilized to enhance the night vision since the visible light cameras exhibit inferior efficacy without sufficient illumination. However, infrared data possesses inadequate color contrast and representation ability attributed to its intrinsic heat-related imaging principle. This makes it arduous to capture and analyze information for human beings, meanwhile hindering its application. Although, the domain gaps between unpaired nighttime infrared and daytime visible videos are even huger than paired ones that captured at the same time, establishing an effective translation mapping will greatly contribute to various fields. In this case, the structural knowledge within nighttime infrared videos and semantic information contained in the translated daytime visible pairs could be utilized simultaneously. To this end, we propose a tailored framework ROMA that couples with our introduced cRoss-domain regiOn siMilarity mAtching technique for bridging the huge gaps. To be specific, ROMA could efficiently translate the unpaired nighttime infrared videos into fine-grained daytime visible ones, meanwhile maintain the spatiotemporal consistency via matching the cross-domain region similarity. Furthermore, we design a multiscale region-wise discriminator to distinguish the details from synthesized visible results and real references. Extensive experiments and evaluations for specific applications indicate ROMA outperforms the state-of-the-art methods. Moreover, we provide a new and challenging dataset encouraging further research for unpaired nighttime infrared and daytime visible video translation, named InfraredCity. In particular, it consists of 9 long video clips including City, Highway and Monitor scenarios. All clips could be split into 603,142 frames in total, which are 20 times larger than the recently released daytime infrared-to-visible dataset IRVI.
Bifrost: End-to-End Evaluation and Optimization of Reconfigurable DNN Accelerators
Authors: Axel Stjerngren, Perry Gibson, José Cano
Subjects: Machine Learning (cs.LG); Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)
Arxiv link: https://arxiv.org/abs/2204.12418
Pdf link: https://arxiv.org/pdf/2204.12418
Abstract Reconfigurable accelerators for deep neural networks (DNNs) promise to improve performance such as inference latency. STONNE is the first cycle-accurate simulator for reconfigurable DNN inference accelerators which allows for the exploration of accelerator designs and configuration space. However, preparing models for evaluation and exploring configuration space in STONNE is a manual developer-timeconsuming process, which is a barrier for research. This paper introduces Bifrost, an end-to-end framework for the evaluation and optimization of reconfigurable DNN inference accelerators. Bifrost operates as a frontend for STONNE and leverages the TVM deep learning compiler stack to parse models and automate offloading of accelerated computations. We discuss Bifrost's advantages over STONNE and other tools, and evaluate the MAERI and SIGMA architectures using Bifrost. Additionally, Bifrost introduces a module leveraging AutoTVM to efficiently explore accelerator designs and dataflow mapping space to optimize performance. This is demonstrated by tuning the MAERI architecture and generating efficient dataflow mappings for AlexNet, obtaining an average speedup of $50\times$ for the convolutional layers and $11\times$ for the fully connected layers. Our code is available at www.github.com/gicLAB/bifrost.
Keyword: localization

Discrete-Continuous Smoothing and Mapping
Authors: Kevin J. Doherty, Ziqi Lu, Kurran Singh, John J. Leonard
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2204.11936
Pdf link: https://arxiv.org/pdf/2204.11936
Abstract We describe a general approach to smoothing and mapping with a class of discrete-continuous factor graphs commonly encountered in robotics applications. While there are openly available tools providing flexible and easy-to-use interfaces for specifying and solving optimization problems formulated in terms of either discrete or continuous graphical models, at present, no similarly general tools exist enabling the same functionality for hybrid discrete-continuous problems. We aim to address this problem. In particular, we provide a library, DC-SAM, extending existing tools for optimization problems defined in terms of factor graphs to the setting of discrete-continuous models. A key contribution of our work is a novel solver for efficiently recovering approximate solutions to discrete-continuous optimization problems. The key insight to our approach is that while joint inference over continuous and discrete state spaces is often hard, many commonly encountered discrete-continuous problems can naturally be split into a "discrete part" and a "continuous part" that can individually be solved easily. Leveraging this structure, we optimize discrete and continuous variables in an alternating fashion. In consequence, our proposed work enables straightforward representation of and approximate inference in discrete-continuous graphical models. We also provide a method to recover the uncertainty in estimates of both discrete and continuous variables. We demonstrate the versatility of our approach through its application to three distinct robot perception applications: point-cloud registration, robust pose graph optimization, and object-based mapping and localization.
Toward Consistent and Efficient Map-based Visual-inertial Localization: Theory Framework and Filter Design
Authors: Zhuqing Zhang, Yang Song, Shoudong Huang, Rong Xiong, Yue Wang
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2204.12108
Pdf link: https://arxiv.org/pdf/2204.12108
Abstract This paper focuses on designing a consistent and efficient filter for map-based visual-inertial localization. First, we propose a new Lie group with its algebra, based on which a novel invariant extended Kalman filter (invariant EKF) is designed. We theoretically prove that, when we do not consider the uncertainty of the map information, the proposed invariant EKF can naturally maintain the correct observability properties of the system. To consider the uncertainty of the map information, we introduce a Schmidt filter. With the Schmidt filter, the uncertainty of the map information can be taken into consideration to avoid over-confident estimation while the computation cost only increases linearly with the size of the map keyframes. In addition, we introduce an easily implemented observability-constrained technique because directly combining the invariant EKF with the Schmidt filter cannot maintain the correct observability properties of the system that considers the uncertainty of the map information. Finally, we validate our proposed system's high consistency, accuracy, and efficiency via extensive simulations and real-world experiments.
Map-based Visual-Inertial Localization: Consistency and Complexity
Authors: Zhuqing Zhang, Yanmei Jiao, Shoudong Huang, Yue Wang, Rong Xiong
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2204.12173
Pdf link: https://arxiv.org/pdf/2204.12173
Abstract Drift-free localization is essential for autonomous vehicles. In this paper, we address the problem by proposing a filter-based framework, which integrates the visual-inertial odometry and the measurements of the features in the pre-built map. In this framework, the transformation between the odometry frame and the map frame is augmented into the state and estimated on the fly. Besides, we maintain only the keyframe poses in the map and employ Schmidt extended Kalman filter to update the state partially, so that the uncertainty of the map information can be consistently considered with low computational cost. Moreover, we theoretically demonstrate that the ever-changing linearization points of the estimated state can introduce spurious information to the augmented system and make the original four-dimensional unobservable subspace vanish, leading to inconsistent estimation in practice. To relieve this problem, we employ first-estimate Jacobian (FEJ) to maintain the correct observability properties of the augmented system. Furthermore, we introduce an observability-constrained updating method to compensate for the significant accumulated error after the long-term absence (can be 3 minutes and 1 km) of map-based measurements. Through simulations, the consistent estimation of our proposed algorithm is validated. Through real-world experiments, we demonstrate that our proposed algorithm runs successfully on four kinds of datasets with the lower computational cost (20% time-saving) and the better estimation accuracy (45% trajectory error reduction) compared with the baseline algorithm VINS-Fusion, whereas VINS-Fusion fails to give bounded localization performance on three of four datasets because of its inconsistent estimation.
Contrastive Language-Action Pre-training for Temporal Localization
Authors: Mengmeng Xu, Erhan Gundogdu, Maksim Lapin, Bernard Ghanem, Michael Donoser, Loris Bazzani
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2204.12293
Pdf link: https://arxiv.org/pdf/2204.12293
Abstract Long-form video understanding requires designing approaches that are able to temporally localize activities or language. End-to-end training for such tasks is limited by the compute device memory constraints and lack of temporal annotations at large-scale. These limitations can be addressed by pre-training on large datasets of temporally trimmed videos supervised by class annotations. Once the video encoder is pre-trained, it is common practice to freeze it during fine-tuning. Therefore, the video encoder does not learn temporal boundaries and unseen classes, causing a domain gap with respect to the downstream tasks. Moreover, using temporally trimmed videos prevents to capture the relations between different action categories and the background context in a video clip which results in limited generalization capacity. To address these limitations, we propose a novel post-pre-training approach without freezing the video encoder which leverages language. We introduce a masked contrastive learning loss to capture visio-linguistic relations between activities, background video clips and language in the form of captions. Our experiments show that the proposed approach improves the state-of-the-art on temporal action localization, few-shot temporal action localization, and video language grounding tasks.
Sound Localization by Self-Supervised Time Delay Estimation
Authors: Ziyang Chen, David F. Fouhey, Andrew Owens
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2204.12489
Pdf link: https://arxiv.org/pdf/2204.12489
Abstract Sounds reach one microphone in a stereo pair sooner than the other, resulting in an interaural time delay that conveys their directions. Estimating a sound's time delay requires finding correspondences between the signals recorded by each microphone. We propose to learn these correspondences through self-supervision, drawing on recent techniques from visual tracking. We adapt the contrastive random walk of Jabri et al. to learn a cycle-consistent representation from unlabeled stereo sounds, resulting in a model that performs on par with supervised methods on "in the wild" internet recordings. We also propose a multimodal contrastive learning model that solves a visually-guided localization task: estimating the time delay for a particular person in a multi-speaker mixture, given a visual representation of their face. Project site: https://ificl.github.io/stereocrw/

zhuhu00 / Paper-Daily-Notice

New submissions for Wed, 27 Apr 22 #150

Keyword: SLAM

Keyword: Visual inertial

Keyword: livox

Keyword: loam

Keyword: Visual inertial odometry

Keyword: lidar

Centimeter-level Positioning by Instantaneous Lidar-aided GNSS Ambiguity Resolution

Keyword: loop detection

Keyword: autonomous driving

Where and What: Driver Attention-based Object Detection

Keyword: mapping

Logistic-ELM: A Novel Fault Diagnosis Method for Rolling Bearings

On Leveraging Variational Graph Embeddings for Open World Compositional Zero-Shot Learning

Mapping Research Trajectories

Discrete-Continuous Smoothing and Mapping

C3: Continued Pretraining with Contrastive Weak Supervision for Cross Language Ad-Hoc Retrieval

Learning Weighting Map for Bit-Depth Expansion within a Rational Range

Exact Wirelength of Embedding 3-Ary n-Cubes into certain Cylinders and Trees

Time Series Prediction by Multi-task GPR with Spatiotemporal Information Transformation

ClothFormer:Taming Video Virtual Try-on in All Module

Managing Reliability Skew in DNA Storage

Monant Medical Misinformation Dataset: Mapping Articles to Fact-Checked Claims

ROMA: Cross-Domain Region Similarity Matching for Unpaired Nighttime Infrared to Daytime Visible Video Translation

Bifrost: End-to-End Evaluation and Optimization of Reconfigurable DNN Accelerators

Keyword: localization

Discrete-Continuous Smoothing and Mapping

Toward Consistent and Efficient Map-based Visual-inertial Localization: Theory Framework and Filter Design

Map-based Visual-Inertial Localization: Consistency and Complexity

Contrastive Language-Action Pre-training for Temporal Localization

Sound Localization by Self-Supervised Time Delay Estimation