New submissions for Fri, 4 Feb 22

Keyword: SLAM

mSLAM: Massively multilingual joint pre-training for speech and text

Authors: Ankur Bapna, Colin Cherry, Yu Zhang, Ye Jia, Melvin Johnson, Yong Cheng, Simran Khanuja, Jason Riesa, Alexis Conneau
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2202.01374
Pdf link: https://arxiv.org/pdf/2202.01374
Abstract We present mSLAM, a multilingual Speech and LAnguage Model that learns cross-lingual cross-modal representations of speech and text by pre-training jointly on large amounts of unlabeled speech and text in multiple languages. mSLAM combines w2v-BERT pre-training on speech with SpanBERT pre-training on character-level text, along with Connectionist Temporal Classification (CTC) losses on paired speech and transcript data, to learn a single model capable of learning from and representing both speech and text signals in a shared representation space. We evaluate mSLAM on several downstream speech understanding tasks and find that joint pre-training with text improves quality on speech translation, speech intent classification and speech language-ID while being competitive on multilingual ASR, when compared against speech-only pre-training. Our speech translation model demonstrates zero-shot text translation without seeing any text translation data, providing evidence for cross-modal alignment of representations. mSLAM also benefits from multi-modal fine-tuning, further improving the quality of speech translation by directly leveraging text translation data during the fine-tuning process. Our empirical analysis highlights several opportunities and challenges arising from large-scale multimodal pre-training, suggesting directions for future research.
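mSLAM adds Connectionist Temporal Classification (CTC) losses on paired speech and transcript data. The sketch below is a minimal plain-NumPy version of the standard CTC forward (alpha) recursion. It is not mSLAM's implementation. It assumes index 0 is the blank symbol and works in probability space, which is fine for toy sizes but would underflow on real utterances; practical implementations work in log space. A brute-force sum over all alignments is included as a check.

```python
import itertools
import numpy as np

BLANK = 0  # assumption: vocabulary index 0 is the CTC blank symbol

def collapse(path):
    """CTC many-to-one map: merge repeated symbols, then drop blanks."""
    out, prev = [], None
    for s in path:
        if s != prev and s != BLANK:
            out.append(s)
        prev = s
    return out

def ctc_neg_log_likelihood(probs, target):
    """-log p(target | probs) via the CTC forward recursion.

    probs: (T, V) per-frame output distributions; target: label ids without blanks.
    """
    T = probs.shape[0]
    ext = [BLANK]                      # blank-interleaved target: _ y1 _ y2 _ ...
    for c in target:
        ext += [c, BLANK]
    S = len(ext)
    alpha = np.zeros((T, S))
    alpha[0, 0] = probs[0, ext[0]]
    if S > 1:
        alpha[0, 1] = probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1, s]                      # stay on the same symbol
            if s >= 1:
                a += alpha[t - 1, s - 1]             # advance by one
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                a += alpha[t - 1, s - 2]             # skip a blank between distinct labels
            alpha[t, s] = a * probs[t, ext[s]]
    # Valid alignments end on the last label or on the trailing blank.
    total = alpha[T - 1, S - 1] + (alpha[T - 1, S - 2] if S > 1 else 0.0)
    return -np.log(total)

def brute_force_nll(probs, target):
    """Reference: sum the probability of every frame-level path that collapses to target."""
    T, V = probs.shape
    total = 0.0
    for path in itertools.product(range(V), repeat=T):
        if collapse(path) == list(target):
            total += np.prod([probs[t, s] for t, s in enumerate(path)])
    return -np.log(total)
```

For example, with random softmax outputs over 4 frames and 3 symbols, `ctc_neg_log_likelihood(probs, [1, 2])` and `brute_force_nll(probs, [1, 2])` agree to floating-point precision.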
Keyword: Visual inertial

There is no result

Keyword: livox

There is no result

Keyword: loam

There is no result

Keyword: Visual inertial odometry

There is no result

Keyword: lidar

There is no result

Keyword: loop detection

There is no result

Keyword: autonomous driving

PanoDepth: A Two-Stage Approach for Monocular Omnidirectional Depth Estimation
Authors: Yuyan Li, Zhixin Yan, Ye Duan, Liu Ren
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2202.01323
Pdf link: https://arxiv.org/pdf/2202.01323
Abstract Omnidirectional 3D information is essential for a wide range of applications, such as virtual reality, autonomous driving, and robotics. In this paper, we propose a novel, model-agnostic, two-stage pipeline for omnidirectional monocular depth estimation. Our proposed framework, PanoDepth, takes one 360° image as input, produces one or more synthesized views in the first stage, and feeds the original image and the synthesized images into the subsequent stereo matching stage. In the second stage, we propose a differentiable Spherical Warping Layer to handle omnidirectional stereo geometry efficiently and effectively. By utilizing explicit stereo-based geometric constraints in the stereo matching stage, PanoDepth can generate dense, high-quality depth. We conducted extensive experiments and ablation studies to evaluate PanoDepth, covering both the full pipeline and the individual modules in each stage. Our results show that PanoDepth outperforms state-of-the-art approaches by a large margin for 360° monocular depth estimation.
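The abstract does not spell out the Spherical Warping Layer. As a rough illustration of the geometry such a layer has to handle, the sketch below maps equirectangular pixels to unit rays, scales them by depth, translates the resulting 3-D points by a baseline, and reprojects them into the second view. The longitude/latitude convention, the treatment of depth as distance along the ray, and all function names are assumptions for illustration, not PanoDepth's layer (which would additionally sample the image differentiably at these coordinates).

```python
import numpy as np

def pixel_to_ray(u, v, width, height):
    """Equirectangular pixel centre -> unit direction.
    Assumed convention: longitude in [-pi, pi) left to right, latitude pi/2 -> -pi/2 top to bottom."""
    lon = (u + 0.5) / width * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (v + 0.5) / height * np.pi
    return np.stack([np.cos(lat) * np.sin(lon), np.sin(lat), np.cos(lat) * np.cos(lon)], axis=-1)

def ray_to_pixel(d, width, height):
    """Direction (not necessarily unit length) -> continuous equirectangular coordinates."""
    d = d / np.linalg.norm(d, axis=-1, keepdims=True)
    lon = np.arctan2(d[..., 0], d[..., 2])
    lat = np.arcsin(np.clip(d[..., 1], -1.0, 1.0))
    u = (lon + np.pi) / (2.0 * np.pi) * width - 0.5
    v = (np.pi / 2.0 - lat) / np.pi * height - 0.5
    return u, v

def warp_coords(depth, baseline):
    """For each source pixel with depth (distance along its ray), return the (u, v)
    coordinates where that 3-D point appears in a view translated by `baseline`."""
    h, w = depth.shape
    vs, us = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    points = pixel_to_ray(us, vs, w, h) * depth[..., None]
    return ray_to_pixel(points - np.asarray(baseline, dtype=float), w, h)
```

With a zero baseline the warp is the identity; with a purely vertical baseline (along the assumed y-axis) only the row coordinate changes, since longitude depends on x and z alone.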
AI-as-a-Service Toolkit for Human-Centered Intelligence in Autonomous Driving
Authors: Valerio De Caro, Saira Bano, Achilles Machumilane, Alberto Gotta, Pietro Cassará, Antonio Carta, Christos Sardianos, Christos Chronis, Iraklis Varlamis, Konstantinos Tserpes, Vincenzo Lomonaco, Claudio Gallicchio, Davide Bacciu
Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2202.01645
Pdf link: https://arxiv.org/pdf/2202.01645
Abstract This paper presents a proof-of-concept implementation of the AI-as-a-service toolkit developed within the H2020 TEACHING project. The toolkit is designed to implement an autonomous driving personalization system driven by the output of an automatic driver-stress recognition algorithm; together, the two realize a Cyber-Physical System of Systems. In addition, we implemented a data-gathering subsystem that collects data from different sensors, i.e., wearables and cameras, to automate stress recognition. For testing, the system was attached to the driving emulation software CARLA, which allows the approach's feasibility to be tested at minimum cost and without putting drivers and passengers at risk. At the core of the respective subsystems, different learning algorithms were implemented using Deep Neural Networks, Recurrent Neural Networks, and Reinforcement Learning.
Keyword: mapping

VNE Solution for Network Differentiated QoS and Security Requirements: From the Perspective of Deep Reinforcement Learning
Authors: Chao Wang, Ranbir Singh Batth, Peiying Zhang, Gagangeet Singh Aujla, Youxiang Duan, Lihua Ren
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2202.01362
Pdf link: https://arxiv.org/pdf/2202.01362
Abstract The rapid development and deployment of network services has brought a series of challenges to researchers. On the one hand, the needs of Internet end users/applications are becoming increasingly differentiated, and they prioritize different aspects of service quality. On the other hand, with the explosive growth of information in the era of big data, much private information is stored in the network, so end users/applications naturally pay increasing attention to network security. To meet these differentiated quality of service (QoS) and security requirements, this paper proposes a virtual network embedding (VNE) algorithm based on deep reinforcement learning (DRL) that takes into account the CPU, bandwidth, delay, and security attributes of the substrate network. A DRL agent is trained in a network environment constructed from these attributes. Its purpose is to derive the mapping probability of each substrate node and to map each virtual node according to this probability. Finally, a breadth-first strategy (BFS) is used to map the virtual links. In the experiments, the DRL-based algorithm is compared with other representative algorithms in three respects: long-term average revenue, long-term revenue-to-consumption ratio, and acceptance rate. The results show that the proposed algorithm performs well, indicating that it can be effectively applied to meet differentiated end-user/application QoS and security requirements.
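The two mapping steps described above (sample a substrate node from per-node probabilities, then map virtual links breadth-first) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: `scores` is a hypothetical stand-in for the DRL policy output, the CPU feasibility mask and softmax are assumptions, and the link step simply finds a minimum-hop path whose links all have enough residual bandwidth.

```python
import math
import random
from collections import deque

def sample_substrate_node(scores, cpu_free, cpu_demand, rng):
    """Softmax-sample a substrate node among those with enough free CPU.
    `scores` is a hypothetical stand-in for the DRL agent's per-node output."""
    feasible = [n for n in scores if cpu_free[n] >= cpu_demand]
    if not feasible:
        return None
    m = max(scores[n] for n in feasible)           # subtract max for numerical stability
    weights = [math.exp(scores[n] - m) for n in feasible]
    return rng.choices(feasible, weights=weights, k=1)[0]

def bfs_link_path(adj, bandwidth, src, dst, demand):
    """Minimum-hop path from src to dst using only links with residual bandwidth >= demand.
    adj: node -> list of neighbours; bandwidth: frozenset({a, b}) -> residual bandwidth."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj[node]:
            if nxt not in parent and bandwidth[frozenset((node, nxt))] >= demand:
                parent[nxt] = node
                queue.append(nxt)
    return None

def map_virtual_link(adj, bandwidth, src, dst, demand):
    """Map one virtual link and reserve its bandwidth on every substrate link used."""
    path = bfs_link_path(adj, bandwidth, src, dst, demand)
    if path is None:
        return None                                # request rejected
    for a, b in zip(path, path[1:]):
        bandwidth[frozenset((a, b))] -= demand
    return path
```

On a four-node ring A-B-C-D-A where link A-B has too little bandwidth, a request from A to C is routed through D, and the reserved bandwidth is deducted so a second identical request is rejected.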
A multi-domain virtual network embedding algorithm with delay prediction
Authors: Peiying Zhang, Xue Pang, Yongjing Ni, Haipeng Yao, Xin Li
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2202.01473
Pdf link: https://arxiv.org/pdf/2202.01473
Abstract Virtual network embedding (VNE) is a crucial part of network virtualization (NV); it aims to map virtual networks (VNs) onto a shared substrate network (SN). With the emergence of various delay-sensitive applications, improving the delay performance of the system has become an active research topic. Based on extensive research, we propose a multi-domain virtual network embedding algorithm based on delay prediction (DP-VNE). First, candidate physical nodes are selected by estimating the delay of virtual requests; then the particle swarm optimization (PSO) algorithm is used to optimize the mapping process so as to reduce the delay of the system. Simulation results show that, compared with three other advanced algorithms, the proposed algorithm significantly reduces system delay while leaving other indicators unaffected.
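For readers unfamiliar with PSO, the sketch below is a generic global-best particle swarm optimizer on a continuous search space. It only illustrates the velocity/position update the abstract refers to; DP-VNE optimizes a discrete node mapping with a delay-based fitness, which would need a discrete encoding not shown here. The inertia and acceleration coefficients are common textbook values, chosen as an assumption.

```python
import numpy as np

def pso_minimize(f, dim, bounds, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO: each particle is pulled toward its own best position
    (cognitive term) and the swarm's best position (social term)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, size=(n_particles, dim))    # positions
    v = np.zeros_like(x)                                # velocities
    pbest = x.copy()
    pbest_val = np.array([f(p) for p in x])
    g = pbest[np.argmin(pbest_val)].copy()              # global best position
    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)                      # keep particles inside the bounds
        vals = np.array([f(p) for p in x])
        improved = vals < pbest_val
        pbest[improved] = x[improved]
        pbest_val[improved] = vals[improved]
        g = pbest[np.argmin(pbest_val)].copy()
    return g, pbest_val.min()
```

As a toy usage, minimizing the hypothetical objective `f(p) = ||p - (1, -2)||^2` on `[-5, 5]^2` drives the swarm to a point very close to `(1, -2)`.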
Keyword: localization

Training Semantic Descriptors for Image-Based Localization
Authors: Ibrahim Cinaroglu, Yalin Bastanlar
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2202.01212
Pdf link: https://arxiv.org/pdf/2202.01212
Abstract Vision-based solutions for vehicle localization have recently become popular. We employ an image-retrieval-based visual localization approach. The database images are stored with GPS coordinates, and the location of the retrieved database image serves as an approximate position of the query image. We show that localization can be performed using descriptors extracted solely from semantically segmented images. This approach is especially reliable when the environment is subject to severe illumination and seasonal changes. Our experiments reveal that the localization performance of a semantic descriptor can reach the level of state-of-the-art RGB-image-based methods.
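The retrieval step described above can be sketched as nearest-neighbour search over database descriptors, returning the GPS tag of the best match. The descriptors here are arbitrary vectors standing in for the trained semantic descriptors, cosine similarity is an assumed choice of metric, and the haversine helper (a standard great-circle formula, with an assumed mean Earth radius of 6,371 km) shows how a localization error in metres could be measured.

```python
import math
import numpy as np

def localize(query_desc, db_descs, db_gps):
    """Return the GPS tag of the most similar database image (cosine similarity)
    and its index; the retrieved position approximates the query position."""
    q = query_desc / np.linalg.norm(query_desc)
    d = db_descs / np.linalg.norm(db_descs, axis=1, keepdims=True)
    best = int(np.argmax(d @ q))
    return db_gps[best], best

def haversine_m(p, q, radius=6_371_000.0):
    """Great-circle distance in metres between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))
```

With toy three-dimensional descriptors, a query closest in direction to the second database vector is assigned that image's GPS tag; `haversine_m` between the returned and true positions then gives the localization error.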
A semi-discrete numerical scheme for nonlocally regularized KdV-type equations
Authors: H. A. Erbay, S. Erbay, A. Erkip
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2202.01262
Pdf link: https://arxiv.org/pdf/2202.01262
Abstract A general class of KdV-type wave equations regularized with a convolution-type nonlocality in space is considered. This class differs from the previously studied class of nonlinear nonlocal unidirectional wave equations by the addition of a linear convolution term involving a third-order derivative. To solve the Cauchy problem, we propose a semi-discrete numerical method based on a uniform spatial discretization, which extends previously published work by the present authors. We prove uniform convergence of the numerical method as the mesh size goes to zero. We also prove that the error resulting from localization to a finite domain is significantly less than a given threshold if the finite domain is large enough. To illustrate the theoretical results, numerical experiments are carried out for the Rosenau-KdV equation, the Rosenau-BBM-KdV equation, and a convolution-type integro-differential equation. The experiments, conducted for three particular choices of the kernel function, confirm the error estimates that we provide.
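A central ingredient of any semi-discrete scheme for such equations is approximating the convolution term on a uniform grid over a finite domain. The sketch below uses the rectangle rule, (β * v)(x_i) ≈ h Σ_j β(x_i − x_j) v_j, on a truncated interval [−L, L). It is not the authors' scheme; the rectangle rule, the domain size, and the exponential kernel β(x) = ½e^{−|x|} are illustrative assumptions.

```python
import numpy as np

def uniform_grid(L, n):
    """n equally spaced points on the truncated domain [-L, L) and the mesh size h."""
    h = 2.0 * L / n
    return -L + h * np.arange(n), h

def discrete_convolution(kernel, v, x, h):
    """Rectangle-rule approximation (beta * v)(x_i) ~= h * sum_j beta(x_i - x_j) * v_j."""
    diff = x[:, None] - x[None, :]
    return h * kernel(diff) @ v

def beta(x):
    """Illustrative exponential kernel with unit integral: beta(x) = exp(-|x|) / 2."""
    return 0.5 * np.exp(-np.abs(x))
```

Because this kernel integrates to 1, convolving it with a constant should return roughly that constant away from the domain edges; the small deviation at the centre comes from the quadrature (of order h²) and from truncating the domain (of order e^{−L}).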
Feasibility of Interactive 3D Map for Remote Sighted Assistance
Authors: Jingyi Xie, Rui Yu, Sooyeon Lee, Yao Lyu, Syed Masum Billah, John M. Carroll
Subjects: Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2202.01365
Pdf link: https://arxiv.org/pdf/2202.01365
Abstract Remote sighted assistance (RSA) has emerged as a conversational assistive technology in which remote sighted workers, i.e., agents, provide real-time assistance to users with vision impairments via video-chat-like communication. Researchers have found that agents' lack of environmental knowledge, the difficulty of orienting users in their surroundings, and the inability to estimate distances from users' camera feeds are key challenges for sighted agents. To address these challenges, researchers have suggested assisting agents with computer vision technologies, especially 3D reconstruction. This paper presents a high-fidelity prototype of such an RSA system, in which agents use interactive 3D maps with localization capability. Using this prototype, we conducted a walkthrough study with thirteen agents and one user with simulated vision impairment. The study revealed that, compared to baseline RSA, agents were significantly faster at providing navigational assistance to users and their mental workload was significantly reduced, all of which indicates the feasibility and promise of 3D maps in RSA.

zhuhu00 / Paper-Daily-Notice

New submissions for Fri, 4 Feb 22 #93

Keyword: SLAM

mSLAM: Massively multilingual joint pre-training for speech and text

Keyword: Visual inertial

Keyword: livox

Keyword: loam

Keyword: Visual inertial odometry

Keyword: lidar

Keyword: loop detection

Keyword: autonomous driving

PanoDepth: A Two-Stage Approach for Monocular Omnidirectional Depth Estimation

AI-as-a-Service Toolkit for Human-Centered Intelligence in Autonomous Driving

Keyword: mapping

VNE Solution for Network Differentiated QoS and Security Requirements: From the Perspective of Deep Reinforcement Learning

A multi-domain virtual network embedding algorithm with delay prediction

Keyword: localization

Training Semantic Descriptors for Image-Based Localization

A semi-discrete numerical scheme for nonlocally regularized KdV-type equations

Feasibility of Interactive 3D Map for Remote Sighted Assistance