Keyword: autonomous driving
EyeDAS: Securing Perception of Autonomous Cars Against the Stereoblindness Syndrome
Abstract
The ability to detect whether an object is a 2D or 3D object is extremely important in autonomous driving, since a detection error can have life-threatening consequences, endangering the safety of the driver, passengers, pedestrians, and others on the road. Methods proposed to distinguish between 2D and 3D objects (e.g., liveness detection methods) are not suitable for autonomous driving, because they are object dependent or do not consider the constraints associated with autonomous driving (e.g., the need for real-time decision-making while the vehicle is moving). In this paper, we present EyeDAS, a novel few-shot learning-based method aimed at securing an object detector (OD) against the threat posed by the stereoblindness syndrome (i.e., the inability to distinguish between 2D and 3D objects). We evaluate EyeDAS's real-time performance using 2,000 objects extracted from seven YouTube video recordings of street views taken by a dash cam from the driver's seat perspective. When applied to seven state-of-the-art ODs as a countermeasure, EyeDAS reduced the 2D misclassification rate from 71.42-100% to 2.4% with a 3D misclassification rate of 0% (TPR of 1.0). We also show that EyeDAS outperforms the baseline method, achieving an AUC of over 0.999 and a TPR of 1.0 with an FPR of 0.024.
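As a companion to the abstract, the sketch below shows how a 2D/3D validation stage could wrap an object detector's output in principle: each detected box is cropped and passed to a binary "is this a real 3D object?" predictor before being kept. The detector interface and the `is_three_dimensional` predictor are hypothetical placeholders, not the paper's actual few-shot ensemble.

```python
# Minimal sketch: wrapping an object detector with a 2D/3D validation stage.
# The detection format and `is_three_dimensional` predictor are hypothetical
# placeholders, NOT the EyeDAS few-shot ensemble described in the paper.
from dataclasses import dataclass
from typing import Callable, List

import numpy as np


@dataclass
class Detection:
    box: tuple          # (x1, y1, x2, y2) in pixel coordinates
    label: str
    score: float


def filter_flat_objects(
    frame: np.ndarray,
    detections: List[Detection],
    is_three_dimensional: Callable[[np.ndarray], float],
    threshold: float = 0.5,
) -> List[Detection]:
    """Keep only detections whose image crop is judged to be a real 3D object."""
    kept = []
    for det in detections:
        x1, y1, x2, y2 = map(int, det.box)
        crop = frame[y1:y2, x1:x2]
        if crop.size == 0:
            continue
        # Probability-like score in [0, 1], e.g., from a small few-shot classifier.
        if is_three_dimensional(crop) >= threshold:
            kept.append(det)
    return kept
```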
Optimized Partitioning and Priority Assignment of Real-Time Applications on Heterogeneous Platforms with Hardware Acceleration
Authors: Daniel Casini, Paolo Pazzaglia, Alessandro Biondi, Marco Di Natale
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
Hardware accelerators, such as those based on GPUs and FPGAs, offer an excellent opportunity to efficiently parallelize functionalities. Recently, modern embedded platforms have started being equipped with such accelerators, making them a compelling choice for emerging, highly computationally intensive workloads, like those required by next-generation autonomous driving systems. Alongside the need for computational efficiency, such workloads are commonly characterized by real-time requirements, which need to be satisfied to guarantee the safe and correct behavior of the system. To this end, this paper proposes a holistic framework to help designers partition real-time applications on heterogeneous platforms with hardware accelerators. The proposed model is inspired by a realistic setup of an advanced driving assistance system presented in the WATERS 2019 Challenge by Bosch, further generalized to encompass a broader range of heterogeneous architectures. The resulting analysis is linearized and used to encode an optimization problem that jointly (i) guarantees timing constraints, (ii) finds a suitable task-to-core mapping, (iii) assigns a priority to each task, and (iv) selects which computations to accelerate, seeking the most convenient trade-off between the smaller worst-case execution times provided by accelerators and the associated synchronization and queuing delays.
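For background on the kind of timing analysis such optimization problems encode, the sketch below iterates the classic fixed-priority response-time recurrence R_i = C_i + Σ_{j∈hp(i)} ⌈R_i/T_j⌉·C_j to a fixed point. The task set is made up for illustration; the paper's actual model additionally accounts for accelerator offloading, synchronization, and queuing delays.

```python
# Sketch of the classic fixed-priority response-time recurrence
#   R_i = C_i + sum_{j in hp(i)} ceil(R_i / T_j) * C_j
# Analyses like the one in the paper build on this kind of fixed-point iteration.
import math


def response_time(C: float, higher_prio: list, deadline: float) -> float:
    """Iterate the recurrence to a fixed point; return inf if the deadline is missed.

    higher_prio: list of (C_j, T_j) for higher-priority tasks on the same core.
    """
    R = C
    while True:
        interference = sum(math.ceil(R / T_j) * C_j for C_j, T_j in higher_prio)
        R_next = C + interference
        if R_next > deadline:
            return math.inf          # unschedulable under this mapping/priority order
        if R_next == R:
            return R                 # fixed point reached
        R = R_next


# Illustrative (made-up) task set: (C, T), deadlines equal to periods,
# listed in decreasing priority order.
tasks = [(1.0, 5.0), (2.0, 10.0), (3.0, 20.0)]
for i, (C, T) in enumerate(tasks):
    R = response_time(C, tasks[:i], deadline=T)
    print(f"task {i}: worst-case response time = {R}")
```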
Keyword: mapping
AK: Attentive Kernel for Information Gathering
Abstract
Robotic Information Gathering (RIG) relies on the uncertainty of a probabilistic model to identify critical areas for efficient data collection. Gaussian processes (GPs) with stationary kernels have been widely adopted for spatial modeling. However, real-world spatial data typically does not satisfy the assumption of stationarity, where different locations are assumed to have the same degree of variability. As a result, the prediction uncertainty does not accurately capture prediction error, limiting the success of RIG algorithms. We propose a novel family of nonstationary kernels, named the Attentive Kernel (AK), which is simple, robust, and can extend any existing kernel to a nonstationary one. We evaluate the new kernel in elevation mapping tasks, where AK provides better accuracy and uncertainty quantification over the commonly used RBF kernel and other popular nonstationary kernels. The improved uncertainty quantification guides the downstream RIG planner to collect more valuable data around the high-error area, further increasing prediction accuracy. A field experiment demonstrates that the proposed method can guide an Autonomous Surface Vehicle (ASV) to prioritize data collection in locations with high spatial variations, enabling the model to characterize the salient environmental features.
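To illustrate what "extending a stationary kernel to a nonstationary one" can look like, the sketch below mixes RBF components with location-dependent softmax weights, so different regions of the input space get different effective lengthscales. This is in the spirit of, but not identical to, the paper's Attentive Kernel; all parameter values are illustrative.

```python
# Sketch of an input-dependent mixture of RBF kernels: weights w_m(x) select
# among fixed lengthscales, so different regions get different smoothness.
# This illustrates the general idea of nonstationarity, NOT the exact AK formula.
import numpy as np


def rbf(x, y, lengthscale):
    d2 = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / lengthscale**2)


def mixture_weights(x, centers, temperature=1.0):
    """Softmax weights over M components, depending on the input location."""
    d2 = np.sum((x[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
    logits = -d2 / temperature
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)            # shape (N, M)


def nonstationary_kernel(x, y, lengthscales, centers):
    wx = mixture_weights(x, centers)                    # (Nx, M)
    wy = mixture_weights(y, centers)                    # (Ny, M)
    K = np.zeros((x.shape[0], y.shape[0]))
    for m, ls in enumerate(lengthscales):
        # Each term is a product of PSD kernels, so the sum stays PSD.
        K += np.outer(wx[:, m], wy[:, m]) * rbf(x, y, ls)
    return K


x = np.random.rand(50, 2)
K = nonstationary_kernel(x, x, lengthscales=[0.05, 0.5],
                         centers=np.array([[0.2, 0.2], [0.8, 0.8]]))
print(K.shape)   # (50, 50)
```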
Optimized Partitioning and Priority Assignment of Real-Time Applications on Heterogeneous Platforms with Hardware Acceleration
Authors: Daniel Casini, Paolo Pazzaglia, Alessandro Biondi, Marco Di Natale
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
Hardware accelerators, such as those based on GPUs and FPGAs, offer an excellent opportunity to efficiently parallelize functionalities. Recently, modern embedded platforms have started being equipped with such accelerators, making them a compelling choice for emerging, highly computationally intensive workloads, like those required by next-generation autonomous driving systems. Alongside the need for computational efficiency, such workloads are commonly characterized by real-time requirements, which need to be satisfied to guarantee the safe and correct behavior of the system. To this end, this paper proposes a holistic framework to help designers partition real-time applications on heterogeneous platforms with hardware accelerators. The proposed model is inspired by a realistic setup of an advanced driving assistance system presented in the WATERS 2019 Challenge by Bosch, further generalized to encompass a broader range of heterogeneous architectures. The resulting analysis is linearized and used to encode an optimization problem that jointly (i) guarantees timing constraints, (ii) finds a suitable task-to-core mapping, (iii) assigns a priority to each task, and (iv) selects which computations to accelerate, seeking the most convenient trade-off between the smaller worst-case execution times provided by accelerators and the associated synchronization and queuing delays.
Keyword: localization
Interface Networks for Failure Localization in Power Systems
Authors: Chen Liang, Alessandro Zocca, Steven H. Low, Adam Wierman
Abstract
Transmission power systems usually consist of interconnected sub-grids that are operated relatively independently. When a failure happens, it is desirable to localize its impact within the sub-grid where the failure occurs. This paper introduces three interface networks to connect sub-grids, achieving better failure localization while maintaining robust network connectivity. The proposed interface networks are validated with numerical experiments on the IEEE 118-bus test network under both DC and AC power flow models.
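For reference, the sketch below shows the DC power flow approximation the abstract mentions: bus angles solve a linear system built from line susceptances, and each line flow is f_ij = (θ_i − θ_j)/x_ij. The 4-bus network and injections are made up for illustration and are unrelated to the IEEE 118-bus test network used in the paper.

```python
# Minimal DC power flow sketch: solve B' * theta = P with the slack bus removed,
# then compute line flows f_ij = (theta_i - theta_j) / x_ij.
# The 4-bus network and injections below are made up for illustration.
import numpy as np

# Lines as (from_bus, to_bus, reactance x_ij)
lines = [(0, 1, 0.1), (1, 2, 0.2), (2, 3, 0.1), (3, 0, 0.2), (1, 3, 0.25)]
n_bus = 4
P = np.array([1.0, -0.5, -0.8, 0.3])   # injections (sum to zero); bus 0 is slack

# Build the susceptance Laplacian B
B = np.zeros((n_bus, n_bus))
for i, j, x in lines:
    b = 1.0 / x
    B[i, i] += b
    B[j, j] += b
    B[i, j] -= b
    B[j, i] -= b

# Solve with the slack bus angle fixed to zero
theta = np.zeros(n_bus)
theta[1:] = np.linalg.solve(B[1:, 1:], P[1:])

for i, j, x in lines:
    print(f"line {i}-{j}: flow = {(theta[i] - theta[j]) / x:+.3f}")
```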
Visuomotor Control in Multi-Object Scenes Using Object-Aware Representations
Authors: Negin Heravi, Ayzaan Wahid, Corey Lynch, Pete Florence, Travis Armstrong, Jonathan Tompson, Pierre Sermanet, Jeannette Bohg, Debidatta Dwibedi
Abstract
Perceptual understanding of the scene and the relationship between its different components is important for successful completion of robotic tasks. Representation learning has been shown to be a powerful technique for this, but most current methodologies learn task-specific representations that do not necessarily transfer well to other tasks. Furthermore, representations learned by supervised methods require large labeled datasets for each task, which are expensive to collect in the real world. Using self-supervised learning to obtain representations from unlabeled data can mitigate this problem. However, current self-supervised representation learning methods are mostly object agnostic, and we demonstrate that the resulting representations are insufficient for general-purpose robotics tasks, as they fail to capture the complexity of scenes with many components. In this paper, we explore the effectiveness of using object-aware representation learning techniques for robotic tasks. Our self-supervised representations are learned by observing the agent freely interacting with different parts of the environment and are queried in two different settings: (i) policy learning and (ii) object location prediction. We show that our model learns control policies in a sample-efficient manner and outperforms state-of-the-art object-agnostic techniques as well as methods trained on raw RGB images. Our results show a 20 percent increase in performance in low-data regimes (1,000 trajectories) when training policies with implicit behavioral cloning (IBC). Furthermore, our method outperforms the baselines for the task of object localization in multi-object scenes.
Self-Supervised Masking for Unsupervised Anomaly Detection and Localization
Abstract
Recently, anomaly detection and localization in multimedia data have received significant attention in the machine learning community. In real-world applications such as medical diagnosis and industrial defect detection, anomalies are present in only a fraction of the images. To extend reconstruction-based anomaly detection architectures to localized anomalies, we propose a self-supervised learning approach through random masking and then restoring, named Self-Supervised Masking (SSM), for unsupervised anomaly detection and localization. SSM not only enhances the training of the inpainting network but also leads to a great improvement in the efficiency of mask prediction at inference. Through random masking, each image is augmented into a diverse set of training triplets, thus enabling the autoencoder to learn to reconstruct with masks of various sizes and shapes during training. To improve the efficiency and effectiveness of anomaly detection and localization at inference, we propose a novel progressive mask refinement approach that progressively uncovers the normal regions and finally locates the anomalous regions. The proposed SSM method outperforms several state-of-the-art methods for both anomaly detection and anomaly localization, achieving 98.3% AUC on Retinal-OCT and 93.9% AUC on MVTec AD, respectively.
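The sketch below illustrates the general mask-and-restore recipe the abstract describes: randomly mask patches, train an inpainting network to restore them, and score anomalies at inference by the reconstruction error inside masked regions. The tiny convolutional network and random training data are placeholders under stated assumptions, not the paper's architecture or progressive mask refinement.

```python
# Sketch of the random-mask-and-restore recipe: mask random square patches,
# train an inpainting autoencoder to restore them, and score anomalies by the
# restoration error inside the masked regions. Toy model, not the paper's.
import torch
import torch.nn as nn


def random_mask(batch, patch=8):
    """Zero out one random square patch per image; return masked batch and mask."""
    b, c, h, w = batch.shape
    mask = torch.ones_like(batch)
    for i in range(b):
        y = torch.randint(0, h - patch + 1, (1,)).item()
        x = torch.randint(0, w - patch + 1, (1,)).item()
        mask[i, :, y:y + patch, x:x + patch] = 0.0
    return batch * mask, mask


model = nn.Sequential(                       # toy inpainting network
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

images = torch.rand(32, 1, 32, 32)           # stand-in for "normal" training images
for step in range(100):
    masked, mask = random_mask(images)
    restored = model(masked)
    loss = ((restored - images) ** 2 * (1 - mask)).mean()   # restore masked pixels only
    opt.zero_grad(); loss.backward(); opt.step()

# Inference: a large restoration error inside the mask suggests an anomaly there.
test = torch.rand(1, 1, 32, 32)
masked, mask = random_mask(test)
with torch.no_grad():
    err = ((model(masked) - test) ** 2 * (1 - mask)).sum() / (1 - mask).sum()
print(float(err))
```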
Scribble2D5: Weakly-Supervised Volumetric Image Segmentation via Scribble Annotations
Authors: Qiuhui Chen, Yi Hong
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Recently, weakly-supervised image segmentation using weak annotations like scribbles has gained great attention, since such annotations are much easier to obtain than time-consuming and labor-intensive labeling at the pixel/voxel level. However, because scribbles lack structural information about the region of interest (ROI), existing scribble-based methods suffer from poor boundary localization. Furthermore, most current methods are designed for 2D image segmentation and do not fully leverage volumetric information if applied directly to image slices. In this paper, we propose a scribble-based volumetric image segmentation method, Scribble2D5, which tackles 3D anisotropic image segmentation and improves boundary prediction. To achieve this, we augment a 2.5D attention UNet with a proposed label propagation module to extend semantic information from scribbles, and a combination of static and active boundary prediction to learn the ROI's boundary and regularize its shape. Extensive experiments on three public datasets demonstrate Scribble2D5 significantly outperforms current scribble-based methods and approaches the performance of fully-supervised ones. Our code is available online.
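A core ingredient of scribble supervision is that the segmentation loss is computed only on the sparse annotated voxels; the sketch below shows such a partial cross-entropy loss under that assumption. The paper's full Scribble2D5 model adds label propagation and boundary terms on top of this basic idea; the tensor shapes here are purely illustrative.

```python
# Sketch of a partial cross-entropy loss for scribble supervision: voxels without
# a scribble label are simply ignored. Scribble2D5 builds further on this idea.
import torch
import torch.nn.functional as F

IGNORE = 255  # label value for unannotated voxels


def partial_cross_entropy(logits, scribbles):
    """logits: (B, C, D, H, W); scribbles: (B, D, H, W) with IGNORE where unlabeled."""
    return F.cross_entropy(logits, scribbles, ignore_index=IGNORE)


logits = torch.randn(2, 3, 8, 64, 64)                       # 3-class toy prediction
scribbles = torch.full((2, 8, 64, 64), IGNORE, dtype=torch.long)
scribbles[:, 4, 30:34, 30:34] = 1                           # a few annotated voxels
print(partial_cross_entropy(logits, scribbles).item())
```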
Keyword: transformer
Improving Sequential Query Recommendation with Immediate User Feedback
Authors: Shameem A Puthiya Parambath, Christos Anagnostopoulos, Roderick Murray-Smith
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG); Machine Learning (stat.ML)
Abstract
We propose an algorithm for next-query recommendation in interactive data exploration settings, like knowledge discovery for information gathering. The state-of-the-art query recommendation algorithms are based on sequence-to-sequence learning approaches that exploit historical interaction data. We propose to augment transformer-based causal language models for query recommendation so that they adapt to immediate user feedback using a multi-armed bandit (MAB) framework. We conduct a large-scale experimental study using log files from a popular online literature discovery service and demonstrate that our algorithm substantially improves the cumulative regret with respect to state-of-the-art transformer-based query recommendation models, which do not make use of immediate user feedback. Our data model and source code are available at https://anonymous.4open.science/r/exp3_ss-9985/.
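To make the bandit component concrete, the sketch below runs EXP3 over K candidate next queries and updates from immediate feedback (e.g., a click as reward 1). The candidate generator and the feedback signal are stubs, and this is not the paper's exact algorithm; it only illustrates how a MAB layer can adapt to feedback on top of a transformer's suggestions.

```python
# Sketch of an EXP3 bandit choosing among K candidate next queries and updating
# from immediate user feedback. The candidates and the reward signal are stubs.
import math
import random

K = 5                 # candidate queries per round (e.g., produced by a transformer)
gamma = 0.1           # exploration rate
weights = [1.0] * K


def exp3_probabilities(weights, gamma):
    total = sum(weights)
    return [(1 - gamma) * w / total + gamma / len(weights) for w in weights]


def user_feedback(arm):
    """Stub reward in [0, 1]; in practice, an observed click or engagement signal."""
    return 1.0 if random.random() < 0.2 + 0.1 * arm else 0.0


for t in range(1000):
    probs = exp3_probabilities(weights, gamma)
    arm = random.choices(range(K), weights=probs)[0]
    reward = user_feedback(arm)
    estimated = reward / probs[arm]                  # importance-weighted reward
    weights[arm] *= math.exp(gamma * estimated / K)

print([round(p, 3) for p in exp3_probabilities(weights, gamma)])
```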
Deep Learning for Prawn Farming: Forecasting and Anomaly Detection
Authors: Joel Janek Dabrowski, Ashfaqur Rahman, Andrew Hellicar, Mashud Rana, Stuart Arnold
Abstract
We present a decision support system for managing water quality in prawn ponds. The system uses various sources of data and deep learning models in a novel way to provide 24-hour forecasting and anomaly detection of water quality parameters. It provides prawn farmers with tools to proactively avoid a poor growing environment, thereby optimising growth and reducing the risk of losing stock. This is a major shift for farmers, who are otherwise forced to manage ponds reactively by correcting poor water quality conditions. To our knowledge, we are the first to apply a Transformer as an anomaly detection model, and the first to apply anomaly detection in general to this aquaculture problem. Our technical contributions include adapting ForecastNet for multivariate data and adapting the Transformer and the Attention model to incorporate weather forecast data into their decoders. We attain an average mean absolute percentage error of 12% for dissolved oxygen forecasts, and we demonstrate two anomaly detection case studies. The system is successfully running in its second year of deployment on a commercial prawn farm.
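The sketch below illustrates the generic forecast-residual pattern behind such systems: predict the next values, compare against observations, and flag points whose residual exceeds a robust threshold. The naive 24-hour seasonal forecaster and synthetic dissolved-oxygen series are stand-ins, not the paper's ForecastNet or Transformer models.

```python
# Sketch of forecast-residual anomaly flagging: forecast, compute residuals,
# and flag points whose residual exceeds a robust (MAD-based) threshold.
# The naive seasonal forecaster and synthetic data are illustrative stand-ins.
import numpy as np

rng = np.random.default_rng(0)
hours = np.arange(24 * 14)
dissolved_oxygen = 6 + 1.5 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 0.2, hours.size)
dissolved_oxygen[200:204] -= 3.0                      # injected anomaly (e.g., aerator fault)

forecast = np.roll(dissolved_oxygen, 24)              # naive: "same value as 24 h ago"
residual = np.abs(dissolved_oxygen[24:] - forecast[24:])

# Robust threshold: median + k * scaled MAD of the residuals
mad = np.median(np.abs(residual - np.median(residual)))
threshold = np.median(residual) + 6 * 1.4826 * mad
anomalies = np.where(residual > threshold)[0] + 24
print("flagged hours:", anomalies)
```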
ViT5: Pretrained Text-to-Text Transformer for Vietnamese Language Generation
Authors: Long Phan, Hieu Tran, Hieu Nguyen, Trieu H. Trinh
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
We present ViT5, a pretrained Transformer-based encoder-decoder model for the Vietnamese language. With T5-style self-supervised pretraining, ViT5 is trained on a large corpus of high-quality and diverse Vietnamese texts. We benchmark ViT5 on two downstream text generation tasks, Abstractive Text Summarization and Named Entity Recognition. Although Abstractive Text Summarization has been widely studied for the English language thanks to its rich and large sources of data, there has been minimal research into the same task in Vietnamese, a much lower-resource language. In this work, we perform exhaustive experiments on both Vietnamese Abstractive Summarization and Named Entity Recognition, validating the performance of ViT5 against many other pretrained Transformer-based encoder-decoder models. Our experiments show that ViT5 significantly outperforms existing models and achieves state-of-the-art results on Vietnamese Text Summarization. On the task of Named Entity Recognition, ViT5 is competitive with previous best results from pretrained encoder-based Transformer models. Further analysis shows the importance of context length during self-supervised pretraining for downstream performance across different settings.
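For readers wanting to try such a model, the sketch below runs abstractive summarization with a T5-style encoder-decoder through the Hugging Face transformers API. The checkpoint identifier is a placeholder assumption; substitute the actually released ViT5 weights.

```python
# Sketch of abstractive summarization with a T5-style encoder-decoder via the
# Hugging Face `transformers` API. The checkpoint name below is a placeholder;
# replace it with the released ViT5 weights.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

checkpoint = "path/or/hub-id-of-vit5-checkpoint"   # placeholder, not a real model id
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

article = "..."  # Vietnamese source document to summarize
inputs = tokenizer(article, return_tensors="pt", truncation=True, max_length=1024)
summary_ids = model.generate(**inputs, max_length=128, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```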
Local Attention Graph-based Transformer for Multi-target Genetic Alteration Prediction
Authors: Daniel Reisenbüchler, Sophia J. Wagner, Melanie Boxberg, Tingying Peng
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Classical multiple instance learning (MIL) methods are often based on the assumption that instances are independent and identically distributed, hence neglecting the potentially rich contextual information beyond individual entities. On the other hand, Transformers with global self-attention modules have been proposed to model the interdependencies among all instances. However, in this paper we ask: is global relation modeling using self-attention necessary, or can we appropriately restrict self-attention calculations to local regimes in large-scale whole slide images (WSIs)? We propose a general-purpose local attention graph-based Transformer for MIL (LA-MIL), introducing an inductive bias by explicitly contextualizing instances in adaptive local regimes of arbitrary size. Additionally, an efficiently adapted loss function enables our approach to learn expressive WSI embeddings for the joint analysis of multiple biomarkers. We demonstrate that LA-MIL achieves state-of-the-art results in mutation prediction for gastrointestinal cancer, outperforming existing models on important biomarkers such as microsatellite instability for colorectal cancer. This suggests that local self-attention sufficiently models dependencies on par with global modules. Our implementation will be published.
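The core idea of restricting self-attention to local regimes can be illustrated with a k-nearest-neighbor mask over patch coordinates applied inside scaled dot-product attention, as in the sketch below. The shapes, random features, and single attention head are illustrative only; the paper's LA-MIL architecture is more involved.

```python
# Sketch of self-attention restricted to local regimes: build a k-NN mask from
# patch coordinates and use it in scaled dot-product attention, so each instance
# only attends to its spatial neighbors. LA-MIL itself is more involved.
import torch

N, d, k = 200, 64, 16                      # instances (WSI patches), feature dim, neighbors
features = torch.randn(N, d)
coords = torch.rand(N, 2)                  # patch (x, y) positions in the slide

# k-NN adjacency from pairwise distances (includes self at distance 0)
dist = torch.cdist(coords, coords)                          # (N, N)
knn = dist.topk(k, largest=False).indices                   # (N, k)
mask = torch.zeros(N, N, dtype=torch.bool)
mask.scatter_(1, knn, torch.ones_like(knn, dtype=torch.bool))

# Single-head local attention
Wq, Wk, Wv = (torch.randn(d, d) / d**0.5 for _ in range(3))
Q, K, V = features @ Wq, features @ Wk, features @ Wv
scores = (Q @ K.T) / d**0.5
scores = scores.masked_fill(~mask, float("-inf"))           # forbid non-neighbor attention
out = torch.softmax(scores, dim=-1) @ V
print(out.shape)                                            # (200, 64)
```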
Twitter-Based Gender Recognition Using Transformers
Authors: Zahra Movahedi Nia, Ali Ahmadi, Bruce Mellado, Jianhong Wu, James Orbinski, Ali Agary, Jude Dzevela Kong
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
Abstract
Social media contains useful information about people and society that could help advance research in many different areas, such as business and finance, health, socio-economic inequality, and gender vulnerability (e.g., by applying opinion mining, emotion/sentiment analysis, and statistical analysis). User demographics provide rich information that could help study the subject further. However, user demographics such as gender are considered private and are not freely available. In this study, we propose a model based on transformers to predict a user's gender from their images and tweets. We fine-tune a model based on Vision Transformers (ViT) to stratify female and male images. Next, we fine-tune another model based on Bidirectional Encoder Representations from Transformers (BERT) to recognize the user's gender from their tweets. This is highly beneficial, because not all users provide an image that indicates their gender; the gender of such users can instead be detected from their tweets. The combination model improves the accuracy of the image and text classification models by 6.98% and 4.43%, respectively. This shows that the image and text classification models are capable of complementing each other by providing additional information to one another. We apply our method to the PAN-2018 dataset and obtain an accuracy of 85.52%.
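One simple way to combine two such classifiers is late fusion of their class probabilities with a tunable weight, as in the sketch below. Both model outputs are stubbed out, and the paper's actual combination scheme may differ.

```python
# Sketch of late fusion: combine class probabilities from an image model (e.g., ViT)
# and a text model (e.g., BERT) with a tunable weight. The model outputs are stubs,
# and the paper's actual combination scheme may differ.
import numpy as np


def fuse(p_image: np.ndarray, p_text: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Weighted average of two probability vectors over the same classes."""
    p = alpha * p_image + (1 - alpha) * p_text
    return p / p.sum()


p_image = np.array([0.7, 0.3])     # stub image-model output over the two classes
p_text = np.array([0.4, 0.6])      # stub text-model output from the user's tweets
print(fuse(p_image, p_text, alpha=0.6))
```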
Keyword: SLAM
There is no result
Keyword: odometry
There is no result
Keyword: livox
There is no result
Keyword: loam
There is no result
Keyword: lidar
There is no result
Keyword: loop detection
There is no result
Keyword: autonomous driving
EyeDAS: Securing Perception of Autonomous Cars Against the Stereoblindness Syndrome
Optimized Partitioning and Priority Assignment of Real-Time Applications on Heterogeneous Platforms with Hardware Acceleration
Keyword: mapping
AK: Attentive Kernel for Information Gathering
Optimized Partitioning and Priority Assignment of Real-Time Applications on Heterogeneous Platforms with Hardware Acceleration
Keyword: localization
Interface Networks for Failure Localization in Power Systems
Visuomotor Control in Multi-Object Scenes Using Object-Aware Representations
Self-Supervised Masking for Unsupervised Anomaly Detection and Localization
Scribble2D5: Weakly-Supervised Volumetric Image Segmentation via Scribble Annotations
Keyword: transformer
Improving Sequential Query Recommendation with Immediate User Feedback
Deep Learning for Prawn Farming: Forecasting and Anomaly Detection
ViT5: Pretrained Text-to-Text Transformer for Vietnamese Language Generation
Local Attention Graph-based Transformer for Multi-target Genetic Alteration Prediction
Twitter-Based Gender Recognition Using Transformers