Keyword: SLAM
Cluster on Wheels
Abstract
This paper presents a very compact 16-node cluster that forms the core of a future robot for collecting and storing massive amounts of sensor data for research on Simultaneous Localization and Mapping (SLAM). To the best of our knowledge, this is the first time such a cluster has been used in robotics. We first present the requirements and the different computing options for such a robot, and then describe the hardware and software of our solution in detail. The cluster consists of 16 nodes, each with an AMD Ryzen 7 5700U CPU, for a total of 128 cores. As a system to be mounted on a Clearpath Husky robot, it is very small, can be operated from battery power, and has all required power and networking components integrated. Stress tests on the completed cluster show that it performs well.
Keyword: odometry
DynPL-SVO: A New Method Using Point and Line Features for Stereo Visual Odometry in Dynamic Scenes
Authors: Xiaoguang Ma, Ya Wang, Baosheng Zhang, Hong-Jun Ma, Chunbo Luo
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Abstract
Stereo visual odometry is widely used to let a robot track its position and orientation using stereo cameras. Most approaches recover the robot's motion by matching and tracking point features along a sequence of stereo images, but in low-textured and dynamic scenes there are not enough robust static point features for motion estimation, causing much previous work to fail to reconstruct the robot's motion. Line features, however, can still be detected in such low-textured and dynamic scenes. In this paper, we propose DynPL-SVO, a stereo visual odometry method with a $dynamic$ $grid$ algorithm and a cost function containing both vertical and horizontal information of the line features. Stereo camera motion is obtained through Levenberg-Marquardt minimization of the re-projection error of point and line features. Experimental results on the KITTI and EuRoC MAV datasets show that DynPL-SVO is competitive with other state-of-the-art systems, producing more robust and accurate motion estimation, especially in low-textured and dynamic scenes.
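The core optimization step is standard enough to sketch. Below is a minimal, illustrative Python example (not the authors' code) of jointly minimizing point and line re-projection errors with Levenberg-Marquardt via SciPy; the pinhole intrinsics, residual definitions, and all variable names are assumptions for illustration.

```python
# Sketch: joint point + line reprojection error minimized with
# Levenberg-Marquardt, as the abstract describes. Illustrative only.
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

K = np.array([[718.0,   0.0, 607.0],
              [  0.0, 718.0, 185.0],
              [  0.0,   0.0,   1.0]])  # assumed pinhole intrinsics

def project(pose, pts3d):
    """Project 3D points with pose = [rx, ry, rz, tx, ty, tz]."""
    R = Rotation.from_rotvec(pose[:3]).as_matrix()
    cam = pts3d @ R.T + pose[3:]
    uv = cam @ K.T
    return uv[:, :2] / uv[:, 2:3]

def residuals(pose, pts3d, pts2d, line_eps3d, lines2d):
    # Point term: pixel reprojection error.
    r_pts = (project(pose, pts3d) - pts2d).ravel()
    # Line term: distance of projected 3D endpoints to the detected 2D
    # line l = (a, b, c), normalized so a^2 + b^2 = 1.
    ep = project(pose, line_eps3d.reshape(-1, 3)).reshape(-1, 2, 2)
    ones = np.ones((*ep.shape[:2], 1))
    r_lin = np.einsum('nij,nj->ni',
                      np.concatenate([ep, ones], -1), lines2d).ravel()
    return np.concatenate([r_pts, r_lin])

# pose = least_squares(residuals, np.zeros(6), method='lm',
#                      args=(pts3d, pts2d, line_eps3d, lines2d)).x
```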
Keyword: livox
There is no result
Keyword: loam
There is no result
Keyword: lidar
Efficient Stereo Depth Estimation for Pseudo LiDAR: A Self-Supervised Approach Based on Multi-Input ResNet Encoder
Authors: Sabir Hossain, Xianke Lin
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Perception and localization are essential for autonomous delivery vehicles and are mostly estimated from 3D LiDAR sensors due to their precise distance measurement capability. This paper presents a strategy to obtain real-time pseudo point clouds from an image sensor instead of a laser sensor. We propose using different depth estimators to obtain LiDAR-like pseudo point clouds for better performance. Moreover, the depth estimator is trained and validated on stereo imagery to produce more accurate depth estimates and point cloud results. Our approach to generating depth maps outperforms prior work on the KITTI benchmark while yielding point clouds significantly faster than other approaches.
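Since the pipeline hinges on turning a predicted depth map into a LiDAR-like point cloud, here is a minimal sketch of that standard back-projection step; the function name and the intrinsic values in the usage comment are illustrative, not taken from the paper.

```python
# Sketch: back-project a metric depth map into a 3D "pseudo LiDAR"
# point cloud using pinhole camera intrinsics.
import numpy as np

def depth_to_pseudo_pointcloud(depth, fx, fy, cx, cy):
    """depth: (H, W) metric depth map -> (H*W, 3) points, camera frame."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx   # pinhole back-projection
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Example with KITTI-like intrinsics (assumed values):
# cloud = depth_to_pseudo_pointcloud(depth, 721.5, 721.5, 609.6, 172.9)
```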
UnPWC-SVDLO: Multi-SVD on PointPWC for Unsupervised Lidar Odometry
Authors: Yiming Tu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Abstract
High-precision lidar odometry is an essential part of autonomous driving. In recent years, deep learning methods have been widely used for lidar odometry, but most current methods extract only global features of the point clouds, so more detailed point-level features cannot be obtained. In addition, only a fully connected layer is used to estimate the pose. Fully connected layers have achieved clear results in classification tasks, but changes in pose are a continuous rather than a discrete process, and high-precision pose estimation cannot be obtained with a fully connected layer alone. Our method avoids these problems. We use PointPWC, originally designed for scene flow estimation, as our backbone network. Scene flow estimation is strongly correlated with lidar odometry: target point clouds can be obtained by adding the scene flow to the source point clouds. We then recover the pose directly through an ICP step solved in closed form by SVD, so the fully connected layer is no longer needed. PointPWC extracts point-level features from point clouds at different sampling levels, which solves the problem of overly coarse feature extraction. We conduct experiments on the KITTI, Ford Campus Vision and Lidar, and Apollo-SouthBay datasets. Our results are comparable with the state-of-the-art unsupervised deep learning method SelfVoxeLO.
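The SVD step referred to above has a well-known closed form (the Kabsch algorithm). The sketch below, with assumed variable names, shows how a rigid pose could be recovered from source points and a predicted scene flow; it illustrates the technique rather than the authors' implementation.

```python
# Sketch: closed-form rigid pose from source points and predicted scene
# flow (source + flow = pseudo target correspondences), via SVD/Kabsch.
import numpy as np

def pose_from_flow(src, flow):
    """src: (N, 3) source points; flow: (N, 3) predicted scene flow."""
    dst = src + flow                       # pseudo target correspondences
    mu_s, mu_d = src.mean(0), dst.mean(0)
    H = (src - mu_s).T @ (dst - mu_d)      # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    S = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])  # fix reflections
    R = Vt.T @ S @ U.T
    t = mu_d - R @ mu_s
    return R, t
```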
Keyword: loop detection
There is no result
Keyword: autonomous driving
Design and Implement an Enhanced Simulator for Autonomous Delivery Robot
Abstract
As autonomous driving technology matures, autonomous delivery companies like Starship, Marble, and Nuro have been making progress in testing their autonomous delivery robots. Simulations and simulators are very important for bringing autonomous delivery robots to market, since these robots need to navigate sidewalks, campuses, and other urban scenarios; simulation avoids real damage to pedestrians and property caused by algorithm failures and programming errors, thereby accelerating the whole development procedure and cutting costs. This study therefore proposes an open-source simulator based on our autonomous delivery robot ZebraT to accelerate research on autonomous delivery. The simulator development procedure is illustrated step by step. Furthermore, we introduce the applications we are working on in the simulator, including autonomous navigation in the simulated urban environment, cooperation between an autonomous vehicle and an autonomous delivery robot, and reinforcement learning practice for task training in the simulator. We have published the proposed simulator on GitHub.
UnPWC-SVDLO: Multi-SVD on PointPWC for Unsupervised Lidar Odometry
Authors: Yiming Tu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Abstract
High-precision lidar odometry is an essential part of autonomous driving. In recent years, deep learning methods have been widely used for lidar odometry, but most current methods extract only global features of the point clouds, so more detailed point-level features cannot be obtained. In addition, only a fully connected layer is used to estimate the pose. Fully connected layers have achieved clear results in classification tasks, but changes in pose are a continuous rather than a discrete process, and high-precision pose estimation cannot be obtained with a fully connected layer alone. Our method avoids these problems. We use PointPWC, originally designed for scene flow estimation, as our backbone network. Scene flow estimation is strongly correlated with lidar odometry: target point clouds can be obtained by adding the scene flow to the source point clouds. We then recover the pose directly through an ICP step solved in closed form by SVD, so the fully connected layer is no longer needed. PointPWC extracts point-level features from point clouds at different sampling levels, which solves the problem of overly coarse feature extraction. We conduct experiments on the KITTI, Ford Campus Vision and Lidar, and Apollo-SouthBay datasets. Our results are comparable with the state-of-the-art unsupervised deep learning method SelfVoxeLO.
Landing AI on Networks: An equipment vendor viewpoint on Autonomous Driving Networks
Authors: Dario Rossi, Liang Zhang
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI)
Abstract
The tremendous achievements of Artificial Intelligence (AI) in computer vision, natural language processing, games, and robotics have extended the reach of the AI hype to other fields: in telecommunication networks, the long-term vision is to let AI fully manage, and autonomously drive, all aspects of network operation. In this industry vision paper, we discuss challenges and opportunities of Autonomous Driving Networks (ADN) driven by AI technologies. To understand how AI can be successfully landed in current and future networks, we start by outlining challenges that are specific to the networking domain, putting them in perspective with advances that AI has achieved in other fields. We then present a system view, clarifying how AI can be fitted into the network architecture. We finally discuss current achievements as well as future promises of AI in networks, sketching a roadmap to avoid bumps in the road that leads to true large-scale deployment of AI technologies in networks.
Keyword: mapping
Functional2Structural: Cross-Modality Brain Networks Representation Learning
Abstract
MRI-based modeling of brain networks has been widely used to understand functional and structural interactions and connections among brain regions, and factors that affect them, such as brain development and disease. Graph mining on brain networks may facilitate the discovery of novel biomarkers for clinical phenotypes and neurodegenerative diseases. Since brain networks derived from functional and structural MRI describe the brain topology from different perspectives, exploring a representation that combines these cross-modality brain networks is non-trivial. Most current studies aim to extract a fused representation of the two types of brain network by projecting the structural network to the functional counterpart. Since the functional network is dynamic and the structural network is static, mapping a static object to a dynamic object is suboptimal. However, mapping in the opposite direction is not feasible due to the non-negativity requirement of current graph learning techniques. Here, we propose a novel graph learning framework, known as Deep Signed Brain Networks (DSBN), with a signed graph encoder that, from an opposite perspective, learns the cross-modality representations by projecting the functional network to the structural counterpart. We validate our framework on clinical phenotype and neurodegenerative disease prediction tasks using two independent, publicly available datasets (HCP and OASIS). The experimental results clearly demonstrate the advantages of our model compared to several state-of-the-art methods.
An Extension to Basis-Hypervectors for Learning from Circular Data in Hyperdimensional Computing
Authors: Igor Nunes, Mike Heddes, Tony Givargis, Alexandru Nicolau
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT)
Abstract
Hyperdimensional Computing (HDC) is a computation framework based on properties of high-dimensional random spaces. It is particularly useful for machine learning in resource-constrained environments, such as embedded systems and IoT, as it achieves a good balance between accuracy, efficiency and robustness. The mapping of information to the hyperspace, named encoding, is the most important stage in HDC. At its heart are basis-hypervectors, responsible for representing the smallest units of meaningful information. In this work we present a detailed study on basis-hypervector sets, which leads to practical contributions to HDC in general: 1) we propose an improvement for level-hypervectors, used to encode real numbers; 2) we introduce a method to learn from circular data, an important type of information never before addressed in machine learning with HDC. Empirical results indicate that these contributions lead to considerably more accurate models for both classification and regression with circular data.
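To make the circular-data idea concrete, here is a minimal sketch of one way to build basis-hypervectors whose pairwise similarity is periodic: walk half the circle flipping fresh slices of components, then walk back unflipping them. This is our reading of the general construction, with assumed parameters, not the authors' exact method.

```python
# Sketch: bipolar hypervectors arranged on a circle so similarity falls
# with circular distance and is periodic (unlike level-hypervectors).
import numpy as np

def circular_hypervectors(n, dim, seed=0):
    """n points on a circle -> (n, dim) bipolar hypervectors."""
    rng = np.random.default_rng(seed)
    assert n % 2 == 0, "even n keeps the construction symmetric"
    v = rng.choice([-1, 1], size=dim)
    hvs = [v.copy()]
    # Flip a fresh slice per step for the first half of the circle,
    # then unflip them: the last vector is one flip away from the first.
    slices = np.array_split(rng.permutation(dim), n // 2)
    for i in range(n - 1):
        s = slices[i % (n // 2)]
        v[s] = -v[s]
        hvs.append(v.copy())
    return np.stack(hvs)

hvs = circular_hypervectors(8, 10_000)
sim = hvs @ hvs[0] / 10_000   # ~[1, .5, 0, -.5, -1, -.5, 0, .5]:
# similarity decreases to the antipode, then rises back -- periodic.
```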
Cluster on Wheels
Abstract
This paper presents a very compact 16-node cluster that forms the core of a future robot for collecting and storing massive amounts of sensor data for research on Simultaneous Localization and Mapping (SLAM). To the best of our knowledge, this is the first time such a cluster has been used in robotics. We first present the requirements and the different computing options for such a robot, and then describe the hardware and software of our solution in detail. The cluster consists of 16 nodes, each with an AMD Ryzen 7 5700U CPU, for a total of 128 cores. As a system to be mounted on a Clearpath Husky robot, it is very small, can be operated from battery power, and has all required power and networking components integrated. Stress tests on the completed cluster show that it performs well.
IIsy: Practical In-Network Classification
Authors: Changgang Zheng, Zhaoqi Xiong, Thanh T Bui, Siim Kaupmees, Riyad Bensoussane, Antoine Bernabeu, Shay Vargaftik, Yaniv Ben-Itzhak, Noa Zilberman
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)
Abstract
The rat race between user-generated data and data-processing systems is currently won by data. The increased use of machine learning leads to a further increase in processing requirements, while data volume keeps growing. To win the race, machine learning needs to be applied to the data as it goes through the network. In-network classification of data can reduce the load on servers, reduce response time, and increase scalability. In this paper, we introduce IIsy, which implements machine learning classification models in a hybrid fashion using off-the-shelf network devices. IIsy targets three main challenges of in-network classification: (i) mapping classification models to network devices, (ii) extracting the required features, and (iii) addressing resource and functionality constraints. IIsy supports a range of traditional and ensemble machine learning models, scaling independently of the number of stages in a switch pipeline. Moreover, we demonstrate the use of IIsy for hybrid classification, where a small model is implemented on a switch and a large model at the backend, achieving near-optimal classification results while significantly reducing latency and load on the servers.
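Challenge (i), mapping a model to a network device, is easiest to picture for a decision tree: each root-to-leaf path becomes a range-match table entry. The sketch below compiles a scikit-learn tree into such entries; the table layout and the toy features are illustrative assumptions, not IIsy's actual mapping.

```python
# Sketch: flatten a trained decision tree into (feature ranges -> class)
# entries, the shape of data a programmable switch's match-action tables
# could hold. Illustrative toy, not IIsy's compiler.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

X = np.random.rand(1000, 2)            # e.g. packet size, inter-arrival time
y = (X[:, 0] + X[:, 1] > 1).astype(int)
tree = DecisionTreeClassifier(max_depth=3).fit(X, y)

def tree_to_table(tree, lo, hi, node=0, rules=None):
    """Recursively turn each root-to-leaf path into one table entry."""
    t, rules = tree.tree_, [] if rules is None else rules
    if t.children_left[node] == -1:    # leaf: emit accumulated ranges
        rules.append((list(zip(lo, hi)), int(t.value[node].argmax())))
        return rules
    f, thr = t.feature[node], t.threshold[node]
    l_hi = hi.copy(); l_hi[f] = min(hi[f], thr)
    r_lo = lo.copy(); r_lo[f] = max(lo[f], thr)
    tree_to_table(tree, lo, l_hi, t.children_left[node], rules)
    tree_to_table(tree, r_lo, hi, t.children_right[node], rules)
    return rules

for ranges, label in tree_to_table(tree, [0.0, 0.0], [1.0, 1.0]):
    print(ranges, '->', label)         # each row: one candidate table entry
```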
Deep Supervised Information Bottleneck Hashing for Cross-modal Retrieval based Computer-aided Diagnosis
Abstract
Mapping X-ray images, radiology reports, and other medical data as binary codes in a common space, which can assist clinicians in retrieving pathology-related data from heterogeneous modalities (i.e., hashing-based cross-modal medical data retrieval), provides a new way to promote computer-aided diagnosis. Nevertheless, a barrier to boosting medical retrieval accuracy remains: how to reveal the ambiguous semantics of medical data without the distraction of superfluous information. To circumvent this drawback, we propose Deep Supervised Information Bottleneck Hashing (DSIBH), which effectively strengthens the discriminability of hash codes. Specifically, the Deep Deterministic Information Bottleneck (Yu, Yu, and Principe 2021) for a single modality is extended to the cross-modal scenario. Benefiting from this, superfluous information is reduced, which facilitates the discriminability of hash codes. Experimental results demonstrate the superior accuracy of the proposed DSIBH compared with state-of-the-art methods in cross-modal medical data retrieval tasks.
A Comprehensive Study on Artificial Intelligence Algorithms to Implement Safety Using Communication Technologies
Authors: Rafia Inam, Alberto Yukinobu Hata, Vlasjov Prifti, Sara Abbaspour Asadollah
Subjects: Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI)
Abstract
The recent development of artificial intelligence (AI) has increased the interest of researchers and practitioners in applying its techniques in multiple domains, such as automotive, health care, and airspace, to achieve automation. Alongside these applications, attempts to use AI techniques to address safety issues are currently progressing. As AI problems get more complex, large processing power is demanded for safety-critical systems to fulfill real-time requirements. These challenges can be solved through edge or cloud computing, which makes communication an integral part of the solution. This study aims to provide a comprehensive picture of state-of-the-art AI-based safety solutions that use different communication technologies in diverse application domains. To achieve this, a systematic mapping study is conducted: 565 relevant papers are shortlisted through a multistage selection process and then analyzed according to a systematically defined classification framework. The results of the study address these main objectives: to clarify current research gaps in the field, to identify the potential for increased usage of cellular communication in multiple domains, to identify the most used AI algorithms, and to summarize emerging future research trends on the topic. The results demonstrate that the automotive domain applies AI and communication the most to implement safety, with neural networks, clustering, and computer vision as the most used AI techniques in this domain; that the application of cellular communication to the automotive domain is highest; and that although non-cellular communication technologies are dominant, a clear trend of rapid increase in the use of cellular communication is observed, especially from 2020 with the roll-out of 5G technology.
Twenty-two years since revealing cross-site scripting attacks: a systematic mapping and a comprehensive survey
Abstract
Cross-site scripting (XSS) is one of the major threats to the privacy of data and the navigation of trusted web applications. Since it was first revealed in late 1999 by Microsoft security engineers, several techniques have been developed with the aim of securing web navigation and protecting web applications against XSS attacks. The problem worsened with the emergence of advanced web technologies such as web services and APIs and new programming styles such as AJAX, CSS3, and HTML5. While new technologies enable complex interactions and data exchanges between clients and servers in the network, new programming styles introduce new and complicated injection flaws in web applications. XSS has been, and still is, in the top 10 list of web vulnerabilities reported by the Open Web Applications Security Project (OWASP). Consequently, handling XSS attacks has become one of the major concerns of several web security communities. In this paper, we contribute by conducting a systematic mapping and a comprehensive survey. We summarize and categorize existing endeavors that aim to protect against XSS attacks and develop XSS-free web applications. The present review covers 147 high-quality studies published since 1999, including early publications from 2022. A comprehensive taxonomy is drawn up describing the different techniques used to prevent, detect, protect, and defend against XSS attacks. Despite the diversity of XSS attack types and the scripting languages that can be used to mount them, the systematic mapping revealed a remarkable bias toward basic and JavaScript XSS attacks and a dearth of vulnerability repair mechanisms. The survey highlighted the limitations of existing XSS attack defense mechanisms, discussed their potential, and identified remaining gaps.
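For readers unfamiliar with the attack class, a minimal illustration of reflected XSS and its basic mitigation (output escaping) follows; the snippet is a generic example, not drawn from the surveyed studies.

```python
# Sketch: reflecting user input into HTML unescaped enables script
# injection; escaping on output is the basic prevention.
from html import escape

user_input = '<script>alert(document.cookie)</script>'
unsafe = f"<p>Hello {user_input}</p>"        # reflected XSS: script executes
safe = f"<p>Hello {escape(user_input)}</p>"  # &lt;script&gt;... is inert text
```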
Systematic Mapping Protocol: Variability Management in Dynamic Software Product Lines for Self-Adaptive Systems
Abstract
Context: Dynamic variability management is important in Dynamic Software Product Lines. Objective: Define a protocol for conducting a systematic mapping study to summarize and synthesize evidence on dynamic variability management for Dynamic Software Product Lines in self-adaptive systems. Method: Apply the protocol to conduct a systematic mapping study according to the guidelines of K. Petersen. Results: A validated protocol to conduct a systematic mapping study. Conclusions: First findings show that it is necessary to envision new ways to manage variability in dynamic software product lines.
Control Interface Remapping for Bias-Aware Assistive Teleoperation
Authors: Andrew Thompson, Larisa Y.C. Loke, Brenna Argall
Abstract
Users of assistive devices vary in their extent of motor impairment, and hence their physical interaction with control interfaces can differ. There is potential for improved utility if control interface actuation is mapped to assistive device control signals in a manner customized to each user. In this paper, we present (1) a method for creating a custom interface-to-device control mapping based on a user's bias profile, (2) a procedure and virtual task for gathering the interface actuation data from which to build the bias profile and map, and (3) an evaluation of our method on 6 participants with upper limb motor impairments. Our results show that custom interface remapping based on user bias profiles shows promise in providing assistance via an improvement in the reachability of the device control space. This effect was especially pronounced for individuals who had a more limited reachable space.
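As one hypothetical way to picture such a remapping, the sketch below fits a linear per-user bias model from recorded (intended, actuated) pairs and inverts it at run time; the paper's actual bias-profile construction may well differ.

```python
# Sketch: calibrate a linear bias model from a user's recorded interface
# data, then invert it so raw actuation maps back to the intended command.
import numpy as np

def fit_bias_map(intended, actuated):
    """Least-squares A such that actuated ~= intended @ A.
    intended, actuated: (n_samples, n_dims) paired recordings."""
    A, *_ = np.linalg.lstsq(intended, actuated, rcond=None)
    return A

def remap(raw_actuation, A):
    """Invert the bias model to recover the intended control signal."""
    return raw_actuation @ np.linalg.pinv(A)
```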
Keyword: localization
Learning Car Speed Using Inertial Sensors
Abstract
A deep neural network (DNN) is trained to estimate the speed of a car driving in an urban area, using as input a stream of measurements from a low-cost six-axis inertial measurement unit (IMU). Three hours of data were collected by driving through the city of Ashdod, Israel in a car equipped with a global navigation satellite system (GNSS) real-time kinematic (RTK) positioning device and a synchronized IMU. Ground truth labels for the car speed were calculated using the position measurements obtained at the high rate of 50 Hz. A DNN architecture with long short-term memory layers is proposed to enable high-frequency speed estimation that accounts for previous input history and the nonlinear relation between speed, acceleration, and angular velocity. A simplified aided dead reckoning localization scheme, in which the trained model provides the speed pseudo-measurement, is formulated to assess the model. The trained model is shown to substantially improve position accuracy during a 4-minute drive without the use of GNSS position updates.
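A minimal sketch of the kind of model described, an LSTM regressor mapping a window of six-axis IMU samples to speed, is shown below; the layer sizes, window length, and names are illustrative assumptions, not the paper's architecture.

```python
# Sketch: LSTM regressor from a window of 6-axis IMU samples to speed.
import torch
import torch.nn as nn

class SpeedLSTM(nn.Module):
    def __init__(self, hidden=100):
        super().__init__()
        self.lstm = nn.LSTM(input_size=6, hidden_size=hidden,
                            num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, imu):            # imu: (batch, T, 6) accel + gyro
        out, _ = self.lstm(imu)
        return self.head(out[:, -1])   # speed at the end of the window

model = SpeedLSTM()
window = torch.randn(32, 100, 6)       # e.g. 2 s of IMU data at 50 Hz
speed = model(window)                  # (32, 1) estimated speed [m/s]
```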
Efficient Stereo Depth Estimation for Pseudo LiDAR: A Self-Supervised Approach Based on Multi-Input ResNet Encoder
Authors: Sabir Hossain, Xianke Lin
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Perception and localization are essential for autonomous delivery vehicles and are mostly estimated from 3D LiDAR sensors due to their precise distance measurement capability. This paper presents a strategy to obtain real-time pseudo point clouds from an image sensor instead of a laser sensor. We propose using different depth estimators to obtain LiDAR-like pseudo point clouds for better performance. Moreover, the depth estimator is trained and validated on stereo imagery to produce more accurate depth estimates and point cloud results. Our approach to generating depth maps outperforms prior work on the KITTI benchmark while yielding point clouds significantly faster than other approaches.
Cluster on Wheels
Abstract
This paper presents a very compact 16-node cluster that forms the core of a future robot for collecting and storing massive amounts of sensor data for research on Simultaneous Localization and Mapping (SLAM). To the best of our knowledge, this is the first time such a cluster has been used in robotics. We first present the requirements and the different computing options for such a robot, and then describe the hardware and software of our solution in detail. The cluster consists of 16 nodes, each with an AMD Ryzen 7 5700U CPU, for a total of 128 cores. As a system to be mounted on a Clearpath Husky robot, it is very small, can be operated from battery power, and has all required power and networking components integrated. Stress tests on the completed cluster show that it performs well.
Keyword: transformer
Unraveling Attention via Convex Duality: Analysis and Interpretations of Vision Transformers
Authors: Arda Sahiner, Tolga Ergen, Batu Ozturkler, John Pauly, Morteza Mardani, Mert Pilanci
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Optimization and Control (math.OC)
Abstract
Vision transformers using self-attention or its proposed alternatives have demonstrated promising results in many image-related tasks. However, the underpinning inductive bias of attention is not well understood. To address this issue, this paper analyzes attention through the lens of convex duality. For the non-linear dot-product self-attention, and alternative mechanisms such as MLP-mixer and Fourier Neural Operator (FNO), we derive equivalent finite-dimensional convex problems that are interpretable and solvable to global optimality. The convex programs lead to {\it block nuclear-norm regularization} that promotes low rank in the latent feature and token dimensions. In particular, we show how self-attention networks implicitly cluster the tokens, based on their latent similarity. We conduct experiments for transferring a pre-trained transformer backbone for CIFAR-100 classification by fine-tuning a variety of convex attention heads. The results indicate the merits of the bias induced by attention compared with the existing MLP or linear heads.
MATrIX -- Modality-Aware Transformer for Information eXtraction
Authors: Thomas Delteil, Edouard Belval, Lei Chen, Luis Goncalves, Vijay Mahadevan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
We present MATrIX - a Modality-Aware Transformer for Information eXtraction in the Visual Document Understanding (VDU) domain. VDU covers information extraction from visually rich documents such as forms, invoices, receipts, tables, graphs, presentations, or advertisements. In these, text semantics and visual information supplement each other to provide a global understanding of the document. MATrIX is pre-trained in an unsupervised way with specifically designed tasks that require the use of multi-modal information (spatial, visual, or textual). We consider the spatial and text modalities all at once in a single token set. To make the attention more flexible, we use a learned modality-aware relative bias in the attention mechanism to modulate the attention between the tokens of different modalities. We evaluate MATrIX on 3 different datasets, each with strong baselines.
ShiftAddNAS: Hardware-Inspired Search for More Accurate and Efficient Neural Networks
Abstract
Neural networks (NNs) with intensive multiplications (e.g., convolutions and transformers) are capable yet power-hungry, impeding their more extensive deployment into resource-constrained devices. As such, multiplication-free networks, which follow a common practice in energy-efficient hardware implementation by parameterizing NNs with more efficient operators (e.g., bitwise shifts and additions), have gained growing attention. However, multiplication-free networks usually under-perform their vanilla counterparts in terms of achieved accuracy. To this end, this work advocates hybrid NNs that consist of both powerful yet costly multiplications and efficient yet less powerful operators for marrying the best of both worlds, and proposes ShiftAddNAS, which can automatically search for more accurate and more efficient NNs. ShiftAddNAS highlights two enablers. Specifically, it integrates (1) the first hybrid search space that incorporates both multiplication-based and multiplication-free operators for facilitating the development of both accurate and efficient hybrid NNs; and (2) a novel weight sharing strategy that enables effective weight sharing among operators that follow heterogeneous distributions (e.g., Gaussian for convolutions vs. Laplacian for add operators) and simultaneously leads to a largely reduced supernet size and much better searched networks. Extensive experiments and ablation studies on various models, datasets, and tasks consistently validate the efficacy of ShiftAddNAS, e.g., achieving up to +7.7% higher accuracy or a +4.9 higher BLEU score compared to state-of-the-art NNs, while leading to up to 93% or 69% energy and latency savings, respectively. Codes and pretrained models are available at https://github.com/RICE-EIC/ShiftAddNAS.
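To illustrate the multiplication-free operator family the abstract contrasts with convolutions and transformers, here is a toy shift-based linear layer: weights are quantized to signed powers of two, so each multiply could be realized as a bitwise shift in hardware. This is an illustrative sketch, not ShiftAddNAS's actual search space.

```python
# Sketch: a "multiplication-free" linear layer with power-of-two weights
# and a straight-through estimator so it remains trainable.
import torch
import torch.nn as nn

class ShiftLinear(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d_out, d_in) * 0.1)

    def forward(self, x):
        w = self.weight
        # Quantize |w| to the nearest power of two; keep the sign.
        shift = torch.round(torch.log2(w.abs().clamp_min(1e-8)))
        w_q = torch.sign(w) * torch.pow(2.0, shift)
        # Straight-through: forward uses w_q, gradient flows through w.
        return nn.functional.linear(x, w + (w_q - w).detach())

layer = ShiftLinear(64, 32)
y = layer(torch.randn(8, 64))  # in hardware, 2**shift multiplies = shifts
```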
Efficient Unsupervised Sentence Compression by Fine-tuning Transformers with Reinforcement Learning
Authors: Demian Gholipour Ghalandari, Chris Hokamp, Georgiana Ifrim
Abstract
Sentence compression reduces the length of text by removing non-essential content while preserving important facts and grammaticality. Unsupervised, objective-driven methods for sentence compression can be used to create customized models without the need for ground-truth training data, while allowing flexibility in the objective function(s) used for learning and inference. Recent unsupervised sentence compression approaches use custom objectives to guide discrete search; however, guided search is expensive at inference time. In this work, we explore the use of reinforcement learning to train effective sentence compression models that are also fast when generating predictions. In particular, we cast the task as binary sequence labelling and fine-tune a pre-trained transformer using a simple policy gradient approach. Our approach outperforms other unsupervised models while also being more efficient at inference time.
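The training step described, binary sequence labelling updated with a simple policy gradient, can be sketched in a few lines; the per-token logits would come from the fine-tuned transformer, the reward function below is a stand-in for the paper's objective, and all names are assumptions.

```python
# Sketch: one REINFORCE update for a token keep/drop policy.
import torch

def reinforce_step(keep_logits, reward_fn, optimizer):
    """keep_logits: (batch, T) per-token scores from the encoder."""
    dist = torch.distributions.Bernoulli(logits=keep_logits)
    mask = dist.sample()                   # 1 = keep token, 0 = drop
    reward = reward_fn(mask)               # (batch,) e.g. fluency + brevity
    log_prob = dist.log_prob(mask).sum(-1)
    loss = -(reward.detach() * log_prob).mean()  # policy gradient
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```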
MulT: An End-to-End Multitask Learning Transformer
Abstract
We propose an end-to-end Multitask Learning Transformer framework, named MulT, to simultaneously learn multiple high-level vision tasks, including depth estimation, semantic segmentation, reshading, surface normal estimation, 2D keypoint detection, and edge detection. Based on the Swin transformer model, our framework encodes the input image into a shared representation and makes predictions for each vision task using task-specific transformer-based decoder heads. At the heart of our approach is a shared attention mechanism modeling the dependencies across the tasks. We evaluate our model on several multitask benchmarks, showing that our MulT framework outperforms both the state-of-the-art multitask convolutional neural network models and all the respective single-task transformer models. Our experiments further highlight the benefits of sharing attention across all the tasks, and demonstrate that our MulT model is robust and generalizes well to new domains. Our project website is at https://ivrl.github.io/MulT/.
A Study of the Attention Abnormality in Trojaned BERTs
Abstract
Trojan attacks raise serious security concerns. In this paper, we investigate the underlying mechanism of Trojaned BERT models. We observe the attention focus drifting behavior of Trojaned models: when encountering a poisoned input, the trigger token hijacks the attention focus regardless of the context. We provide a thorough qualitative and quantitative analysis of this phenomenon, revealing insights into the Trojan mechanism. Based on this observation, we propose an attention-based Trojan detector to distinguish Trojaned models from clean ones. To the best of our knowledge, this is the first paper to analyze the Trojan mechanism and to develop a Trojan detector based on the transformer's attention.
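The drift signal underlying such a detector can be summarized as the attention mass a single token attracts; a minimal sketch of that statistic follows, with assumed tensor shapes rather than the paper's exact procedure.

```python
# Sketch: average attention each token *receives*, over layers, heads,
# and query positions. A token that dominates regardless of context is
# a candidate trigger.
import torch

def attention_received(attn):
    """attn: (layers, heads, T, T) attention maps for one input.
    Returns (T,) average attention mass flowing into each token."""
    return attn.mean(dim=(0, 1)).mean(dim=0)  # avg layers/heads, then queries
```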
Moving Stuff Around: A study on efficiency of moving documents into memory for Neural IR models
Abstract
When training neural rankers using Large Language Models, a practitioner is expected to make use of multiple GPUs to accelerate training. By using more devices, deep learning frameworks like PyTorch allow the user to drastically increase the available VRAM pool, making larger batches possible and therefore shrinking training time. At the same time, one of the most critical processes, generally overlooked when running data-hungry models, is how data is managed between disk, main memory, and VRAM. Most open-source research implementations ignore this memory hierarchy and instead resort to loading all documents from disk into main memory and then letting the framework (e.g., PyTorch) handle moving data into VRAM. With the increasing sizes of datasets dedicated to IR research, a natural question arises: is this the optimal solution for optimizing training time? We study how three different popular approaches to handling documents for IR datasets behave and how they scale with multiple GPUs. Namely, loading documents directly into memory, reading documents directly from text files with a lookup table, and using a library for handling IR datasets (ir_datasets) differ both in performance (i.e., samples processed per second) and memory footprint. We show that, when using the most popular libraries for neural ranker research (i.e., PyTorch and Hugging Face's Transformers), the practice of loading all documents into main memory is not always the fastest option and is not feasible for setups with more than a couple of GPUs. Meanwhile, a good implementation of data streaming from disk can be faster, while being considerably more scalable. We also show how popular techniques for improving loading times, like memory pinning, multiple workers, and RAMDISK usage, can reduce training time further with minor memory overhead.
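A minimal sketch of the lookup-table strategy follows: index each document's byte offset in one linear pass, then seek and read lazily per training sample. The class name and JSONL layout are illustrative assumptions.

```python
# Sketch: random access into a large JSONL corpus without loading it
# into main memory -- one offset index, then lazy seek-and-read.
import json

class OffsetCorpus:
    def __init__(self, path):
        self.path, self.offsets = path, []
        with open(path, 'rb') as f:     # one pass to record line offsets
            while True:
                pos = f.tell()
                if not f.readline():
                    break
                self.offsets.append(pos)

    def __len__(self):
        return len(self.offsets)

    def __getitem__(self, i):           # called per training sample
        with open(self.path, 'rb') as f:
            f.seek(self.offsets[i])
            return json.loads(f.readline())
```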
Should attention be all we need? The epistemic and ethical implications of unification in machine learning
Abstract
"Attention is all you need" has become a fundamental precept in machine learning research. Originally designed for machine translation, transformers and the attention mechanisms that underpin them now find success across many problem domains. With the apparent domain-agnostic success of transformers, many researchers are excited that similar model architectures can be successfully deployed across diverse applications in vision, language and beyond. We consider the benefits and risks of these waves of unification on both epistemic and ethical fronts. On the epistemic side, we argue that many of the arguments in favor of unification in the natural sciences fail to transfer over to the machine learning case, or transfer over only under assumptions that might not hold. Unification also introduces epistemic risks related to portability, path dependency, methodological diversity, and increased black-boxing. On the ethical side, we discuss risks emerging from epistemic concerns, further marginalizing underrepresented perspectives, the centralization of power, and having fewer models across more domains of application
ColonFormer: An Efficient Transformer based Method for Colon Polyp Segmentation
Abstract
Identifying polyps is a challenging problem for automatic analysis of endoscopic images in computer-aided clinical support systems. Models based on convolutional networks (CNNs), transformers, and combinations of them have been proposed to segment polyps with promising results. However, those approaches are limited either to modeling only the local appearance of the polyps or by a lack of multi-level features for spatial dependency in the decoding process. This paper proposes a novel network, namely ColonFormer, to address these limitations. ColonFormer is an encoder-decoder architecture capable of modeling long-range semantic information at both encoder and decoder branches. The encoder is a lightweight architecture based on transformers for modeling global semantic relations at multiple scales. The decoder is a hierarchical network structure designed for learning multi-level features to enrich feature representation. In addition, a refinement module with a new skip connection technique refines the boundary of polyp objects in the global map for accurate segmentation. Extensive experiments have been conducted on five popular benchmark datasets for polyp segmentation: Kvasir, CVC-ClinicDB, CVC-ColonDB, EndoScene, and ETIS. Experimental results show that our ColonFormer achieves state-of-the-art performance on all benchmark datasets.
Feature Aggregation in Zero-Shot Cross-Lingual Transfer Using Multilingual BERT
Abstract
Multilingual BERT (mBERT), a language model pre-trained on large multilingual corpora, has impressive zero-shot cross-lingual transfer capabilities and performs surprisingly well on zero-shot POS tagging and Named Entity Recognition (NER), as well as on cross-lingual model transfer. At present, mainstream methods for cross-lingual downstream tasks always use the last transformer layer's output of mBERT as the representation of linguistic information. In this work, we explore the complementary properties of lower layers relative to the last transformer layer of mBERT. A feature aggregation module based on an attention mechanism is proposed to fuse the information contained in different layers of mBERT. The experiments are conducted on four zero-shot cross-lingual transfer datasets, and the proposed method obtains performance improvements on the key multilingual benchmark tasks XNLI (+1.5%), PAWS-X (+2.4%), NER (+1.2 F1), and POS (+1.5 F1). Through analysis of the experimental results, we show that the layers before the last layer of mBERT can provide extra useful information for cross-lingual downstream tasks, and we explore the interpretability of mBERT empirically.
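One simple, hypothetical realization of attention-based layer aggregation is learned scalar mixing over all hidden states, sketched below; the module details are assumptions, not necessarily the paper's exact mechanism.

```python
# Sketch: learn softmax weights over all mBERT layers instead of using
# only the last layer's output.
import torch
import torch.nn as nn

class LayerAttention(nn.Module):
    def __init__(self, n_layers=13):           # embeddings + 12 layers
        super().__init__()
        self.scores = nn.Parameter(torch.zeros(n_layers))

    def forward(self, hidden_states):
        """hidden_states: tuple of (batch, T, H) tensors, one per layer."""
        h = torch.stack(hidden_states, dim=0)   # (L, batch, T, H)
        w = torch.softmax(self.scores, dim=0)   # one weight per layer
        return torch.einsum('l,lbth->bth', w, h)

# Usage with Hugging Face Transformers (assumed API):
# out = bert(**enc, output_hidden_states=True)
# fused = LayerAttention()(out.hidden_states)
```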
Vision Transformer Adapter for Dense Predictions
Abstract
This work investigates a simple yet powerful adapter for Vision Transformer (ViT). Unlike recent visual transformers that introduce vision-specific inductive biases into their architectures, plain ViT achieves inferior performance on dense prediction tasks due to its lack of image-specific prior information. To solve this issue, we propose a Vision Transformer Adapter (ViT-Adapter), which can remedy the defects of ViT and achieve performance comparable to vision-specific models by introducing inductive biases via an additional architecture. Specifically, the backbone in our framework is a vanilla transformer that can be pre-trained with multi-modal data. When fine-tuning on downstream tasks, a modality-specific adapter is used to introduce the data and tasks' prior information into the model, making it suitable for these tasks. We verify the effectiveness of our ViT-Adapter on multiple downstream tasks, including object detection, instance segmentation, and semantic segmentation. Notably, when using HTC++, our ViT-Adapter-L yields 60.1 box AP and 52.1 mask AP on COCO test-dev, surpassing Swin-L by 1.4 box AP and 1.0 mask AP. For semantic segmentation, our ViT-Adapter-L establishes a new state-of-the-art of 60.5 mIoU on ADE20K val, 0.6 points higher than SwinV2-G. We hope the proposed ViT-Adapter can serve as an alternative to vision-specific transformers and facilitate future research.
Keyword: SLAM
Cluster on Wheels
Keyword: odometry
DynPL-SVO: A New Method Using Point and Line Features for Stereo Visual Odometry in Dynamic Scenes
Keyword: livox
There is no result
Keyword: loam
There is no result
Keyword: lidar
Efficient Stereo Depth Estimation for Pseudo LiDAR: A Self-Supervised Approach Based on Multi-Input ResNet Encoder
UnPWC-SVDLO: Multi-SVD on PointPWC for Unsupervised Lidar Odometry
Keyword: loop detection
There is no result
Keyword: autonomous driving
Design and Implement an Enhanced Simulator for Autonomous Delivery Robot
UnPWC-SVDLO: Multi-SVD on PointPWC for Unsupervised Lidar Odometry
Landing AI on Networks: An equipment vendor viewpoint on Autonomous Driving Networks
Keyword: mapping
Functional2Structural: Cross-Modality Brain Networks Representation Learning
An Extension to Basis-Hypervectors for Learning from Circular Data in Hyperdimensional Computing
Cluster on Wheels
IIsy: Practical In-Network Classification
Deep Supervised Information Bottleneck Hashing for Cross-modal Retrieval based Computer-aided Diagnosis
A Comprehensive Study on Artificial Intelligence Algorithms to Implement Safety Using Communication Technologies
Twenty-two years since revealing cross-site scripting attacks: a systematic mapping and a comprehensive survey
Systematic Mapping Protocol: Variability Management in Dynamic Software Product Lines for Self-Adaptive Systems
Control Interface Remapping for Bias-Aware Assistive Teleoperation
Keyword: localization
Learning Car Speed Using Inertial Sensors
Efficient Stereo Depth Estimation for Pseudo LiDAR: A Self-Supervised Approach Based on Multi-Input ResNet Encoder
Cluster on Wheels
Keyword: transformer
Unraveling Attention via Convex Duality: Analysis and Interpretations of Vision Transformers
MATrIX -- Modality-Aware Transformer for Information eXtraction
ShiftAddNAS: Hardware-Inspired Search for More Accurate and Efficient Neural Networks
Efficient Unsupervised Sentence Compression by Fine-tuning Transformers with Reinforcement Learning
MulT: An End-to-End Multitask Learning Transformer
A Study of the Attention Abnormality in Trojaned BERTs
Moving Stuff Around: A study on efficiency of moving documents into memory for Neural IR models
Should attention be all we need? The epistemic and ethical implications of unification in machine learning
ColonFormer: An Efficient Transformer based Method for Colon Polyp Segmentation
Feature Aggregation in Zero-Shot Cross-Lingual Transfer Using Multilingual BERT
Vision Transformer Adapter for Dense Predictions