Abstract
For many data mining and machine learning tasks, the quality of a similarity measure is the key for their performance. To automatically find a good similarity measure from datasets, metric learning and similarity learning are proposed and studied extensively. Metric learning will learn a Mahalanobis distance based on positive semi-definite (PSD) matrix, to measure the distances between objectives, while similarity learning aims to directly learn a similarity function without PSD constraint so that it is more attractive. Most of the existing similarity learning algorithms are online similarity learning method, since online learning is more scalable than offline learning. However, most existing online similarity learning algorithms learn a full matrix with d 2 parameters, where d is the dimension of the instances. This is clearly inefficient for high dimensional tasks due to its high memory and computational complexity. To solve this issue, we introduce several Sparse Online Relative Similarity (SORS) learning algorithms, which learn a sparse model during the learning process, so that the memory and computational cost can be significantly reduced. We theoretically analyze the proposed algorithms, and evaluate them on some real-world high dimensional datasets. Encouraging empirical results demonstrate the advantages of our approach in terms of efficiency and efficacy.
Keyword: image retrieval
Learning Regional Attention over Multi-resolution Deep Convolutional Features for Trademark Retrieval
Authors: Osman Tursun, Simon Denman, Sridha Sridharan, Clinton Fookes
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Large-scale trademark retrieval is an important content-based image retrieval task. A recent study shows that off-the-shelf deep features aggregated with Regional-Maximum Activation of Convolutions (R-MAC) achieve state-of-the-art results. However, R-MAC suffers in the presence of background clutter/trivial regions and scale variance, and discards important spatial information. We introduce three simple but effective modifications to R-MAC to overcome these drawbacks. First, we propose the use of both sum and max pooling to minimise the loss of spatial information. We also employ domain-specific unsupervised soft-attention to eliminate background clutter and unimportant regions. Finally, we add multi-resolution inputs to enhance the scale-invariance of R-MAC. We evaluate these three modifications on the million-scale METU dataset. Our results show that all modifications bring non-trivial improvements, and surpass previous state-of-the-art results.
Keyword: face recognition
An Improved Real-Time Face Recognition System at Low Resolution Based on Local Binary Pattern Histogram Algorithm and CLAHE
Authors: Kamal Chandra Paul, Semih Aslan
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
This research presents an improved real-time face recognition system at a low resolution of 15 pixels with pose and emotion and resolution variations. We have designed our datasets named LRD200 and LRD100, which have been used for training and classification. The face detection part uses the Viola-Jones algorithm, and the face recognition part receives the face image from the face detection part to process it using the Local Binary Pattern Histogram (LBPH) algorithm with preprocessing using contrast limited adaptive histogram equalization (CLAHE) and face alignment. The face database in this system can be updated via our custom-built standalone android app and automatic restarting of the training and recognition process with an updated database. Using our proposed algorithm, a real-time face recognition accuracy of 78.40% at 15 px and 98.05% at 45 px have been achieved using the LRD200 database containing 200 images per person. With 100 images per person in the database (LRD100) the achieved accuracies are 60.60% at 15 px and 95% at 45 px respectively. A facial deflection of about 30 degrees on either side from the front face showed an average face recognition precision of 72.25% - 81.85%. This face recognition system can be employed for law enforcement purposes, where the surveillance camera captures a low-resolution image because of the distance of a person from the camera. It can also be used as a surveillance system in airports, bus stations, etc., to reduce the risk of possible criminal threats.
Robust Backdoor Attacks against Deep Neural Networks in Real Physical World
Authors: Mingfu Xue, Can He, Shichang Sun, Jian Wang, Weiqiang Liu
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Abstract
Deep neural networks (DNN) have been widely deployed in various practical applications. However, many researches indicated that DNN is vulnerable to backdoor attacks. The attacker can create a hidden backdoor in target DNN model, and trigger the malicious behaviors by submitting specific backdoor instance. However, almost all the existing backdoor works focused on the digital domain, while few studies investigate the backdoor attacks in real physical world. Restricted to a variety of physical constrains, the performance of backdoor attacks in the real world will be severely degraded. In this paper, we propose a robust physical backdoor attack method, PTB (physical transformations for backdoors), to implement the backdoor attacks against deep learning models in the physical world. Specifically, in the training phase, we perform a series of physical transformations on these injected backdoor instances at each round of model training, so as to simulate various transformations that a backdoor may experience in real world, thus improves its physical robustness. Experimental results on the state-of-the-art face recognition model show that, compared with the methods that without PTB, the proposed attack method can significantly improve the performance of backdoor attacks in real physical world. Under various complex physical conditions, by injecting only a very small ratio (0.5%) of backdoor instances, the success rate of physical backdoor attacks with the PTB method on VGGFace is 82%, while the attack success rate of backdoor attacks without the proposed PTB method is lower than 11%. Meanwhile, the normal performance of target DNN model has not been affected. This paper is the first work on the robustness of physical backdoor attacks, and is hopeful for providing guideline for the subsequent physical backdoor works.
TransRPPG: Remote Photoplethysmography Transformer for 3D Mask Face Presentation Attack Detection
Abstract
3D mask face presentation attack detection (PAD) plays a vital role in securing face recognition systems from emergent 3D mask attacks. Recently, remote photoplethysmography (rPPG) has been developed as an intrinsic liveness clue for 3D mask PAD without relying on the mask appearance. However, the rPPG features for 3D mask PAD are still needed expert knowledge to design manually, which limits its further progress in the deep learning and big data era. In this letter, we propose a pure rPPG transformer (TransRPPG) framework for learning intrinsic liveness representation efficiently. At first, rPPG-based multi-scale spatial-temporal maps (MSTmap) are constructed from facial skin and background regions. Then the transformer fully mines the global relationship within MSTmaps for liveness representation, and gives a binary prediction for 3D mask detection. Comprehensive experiments are conducted on two benchmark datasets to demonstrate the efficacy of the TransRPPG on both intra- and cross-dataset testings. Our TransRPPG is lightweight and efficient (with only 547K parameters and 763M FLOPs), which is promising for mobile-level applications.
Keyword: imagenet
Continual Learning From Unlabeled Data Via Deep Clustering
Authors: Jiangpeng He, Fengqing Zhu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Continual learning, a promising future learning strategy, aims to learn new tasks incrementally using less computation and memory resources instead of retraining the model from scratch whenever new task arrives. However, existing approaches are designed in supervised fashion assuming all data from new tasks have been annotated, which are not practical for many real-life applications. In this work, we introduce a new framework to make continual learning feasible in unsupervised mode by using pseudo label obtained from cluster assignments to update model. We focus on image classification task under class-incremental setting and assume no class label is provided for training in each incremental learning step. For illustration purpose, we apply k-means clustering, knowledge distillation loss and exemplar set as our baseline solution, which achieves competitive results even compared with supervised approaches on both challenging CIFAR-100 and ImageNet (ILSVRC) datasets. We also demonstrate that the performance of our baseline solution can be further improved by incorporating recently developed supervised continual learning techniques, showing great potential for our framework to minimize the gap between supervised and unsupervised continual learning.
See through Gradients: Image Batch Recovery via GradInversion
Authors: Hongxu Yin, Arun Mallya, Arash Vahdat, Jose M. Alvarez, Jan Kautz, Pavlo Molchanov
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Training deep neural networks requires gradient estimation from data batches to update parameters. Gradients per parameter are averaged over a set of data and this has been presumed to be safe for privacy-preserving training in joint, collaborative, and federated learning applications. Prior work only showed the possibility of recovering input data given gradients under very restrictive conditions - a single input point, or a network with no non-linearities, or a small 32x32 px input batch. Therefore, averaging gradients over larger batches was thought to be safe. In this work, we introduce GradInversion, using which input images from a larger batch (8 - 48 images) can also be recovered for large networks such as ResNets (50 layers), on complex datasets such as ImageNet (1000 classes, 224x224 px). We formulate an optimization task that converts random noise into natural images, matching gradients while regularizing image fidelity. We also propose an algorithm for target class label recovery given gradients. We further propose a group consistency regularization framework, where multiple agents starting from different random seeds work together to find an enhanced reconstruction of original data batch. We show that gradients encode a surprisingly large amount of information, such that all the individual images can be recovered with high fidelity via GradInversion, even for complex datasets, deep networks, and large batch sizes.
Keyword: augmentation
A Better-Than-2 Approximation for Weighted Tree Augmentation
Authors: Vera Traub, Rico Zenklusen
Subjects: Data Structures and Algorithms (cs.DS); Optimization and Control (math.OC)
Abstract
We present an approximation algorithm for Weighted Tree Augmentation with approximation factor $1+\ln 2 + \varepsilon < 1.7$. This is the first algorithm beating the longstanding factor of $2$, which can be achieved through many standard techniques.
On the Robustness of Goal Oriented Dialogue Systems to Real-world Noise
Authors: Jason Krone, Sailik Sengupta, Saab Mansoor
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
Goal oriented dialogue systems, that interact in real-word environments, often encounter noisy data. In this work, we investigate how robust goal oriented dialogue systems are to noisy data. Specifically, our analysis considers intent classification (IC) and slot labeling (SL) models that form the basis of most dialogue systems. We collect a test-suite for six common phenomena found in live human-to-bot conversations (abbreviations, casing, misspellings, morphological variants, paraphrases, and synonyms) and show that these phenomena can degrade the IC/SL performance of state-of-the-art BERT based models. Through the use of synthetic data augmentation, we are improve IC/SL model's robustness to real-world noise by +11.5 for IC and +17.3 points for SL on average across noise types. We make our suite of noisy test data public to enable further research into the robustness of dialog systems.
A Simple Baseline for Semi-supervised Semantic Segmentation with Strong Data Augmentation
Abstract
Recently, significant progress has been made on semantic segmentation. However, the success of supervised semantic segmentation typically relies on a large amount of labelled data, which is time-consuming and costly to obtain. Inspired by the success of semi-supervised learning methods in image classification, here we propose a simple yet effective semi-supervised learning framework for semantic segmentation. We demonstrate that the devil is in the details: a set of simple design and training techniques can collectively improve the performance of semi-supervised semantic segmentation significantly. Previous works [3, 27] fail to employ strong augmentation in pseudo label learning efficiently, as the large distribution change caused by strong augmentation harms the batch normalisation statistics. We design a new batch normalisation, namely distribution-specific batch normalisation (DSBN) to address this problem and demonstrate the importance of strong augmentation for semantic segmentation. Moreover, we design a self correction loss which is effective in noise resistance. We conduct a series of ablation studies to show the effectiveness of each component. Our method achieves state-of-the-art results in the semi-supervised settings on the Cityscapes and Pascal VOC datasets.
Memory Capacity of Neural Turing Machines with Matrix Representation
Abstract
It is well known that recurrent neural networks (RNNs) faced limitations in learning long-term dependencies that have been addressed by memory structures in long short-term memory (LSTM) networks. Matrix neural networks feature matrix representation which inherently preserves the spatial structure of data and has the potential to provide better memory structures when compared to canonical neural networks that use vector representation. Neural Turing machines (NTMs) are novel RNNs that implement notion of programmable computers with neural network controllers to feature algorithms that have copying, sorting, and associative recall tasks. In this paper, we study the augmentation of memory capacity with a matrix representation of RNNs and NTMs (MatNTMs). We investigate if matrix representation has a better memory capacity than the vector representations in conventional neural networks. We use a probabilistic model of the memory capacity using Fisher information and investigate how the memory capacity for matrix representation networks are limited under various constraints, and in general, without any constraints. In the case of memory capacity without any constraints, we found that the upper bound on memory capacity to be $N^2$ for an $N\times N$ state matrix. The results from our experiments using synthetic algorithmic tasks show that MatNTMs have a better learning capacity when compared to its counterparts.
Keyword: self-supervised
Self-Supervised Learning of Remote Sensing Scene Representations Using Contrastive Multiview Coding
Authors: Vladan Stojnić (1), Vladimir Risojević (1) ((1) Faculty of Electrical Engineering, University of Banja Luka, Bosnia and Herzegovina)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
In recent years self-supervised learning has emerged as a promising candidate for unsupervised representation learning. In the visual domain its applications are mostly studied in the context of images of natural scenes. However, its applicability is especially interesting in specific areas, like remote sensing and medicine, where it is hard to obtain huge amounts of labeled data. In this work, we conduct an extensive analysis of the applicability of self-supervised learning in remote sensing image classification. We analyze the influence of the number and domain of images used for self-supervised pre-training on the performance on downstream tasks. We show that, for the downstream task of remote sensing image classification, using self-supervised pre-training on remote sensing images can give better results than using supervised pre-training on images of natural scenes. Besides, we also show that self-supervised pre-training can be easily extended to multispectral images producing even better results on our downstream tasks.
Self-supervised Learning of 3D Object Understanding by Data Association and Landmark Estimation for Image Sequence
Authors: Hyeonwoo Yu, Jean Oh
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
In this paper, we propose a self-supervised learningmethod for multi-object pose estimation. 3D object under-standing from 2D image is a challenging task that infers ad-ditional dimension from reduced-dimensional information.In particular, the estimation of the 3D localization or orien-tation of an object requires precise reasoning, unlike othersimple clustering tasks such as object classification. There-fore, the scale of the training dataset becomes more cru-cial. However, it is challenging to obtain large amount of3D dataset since achieving 3D annotation is expensive andtime-consuming. If the scale of the training dataset can beincreased by involving the image sequence obtained fromsimple navigation, it is possible to overcome the scale lim-itation of the dataset and to have efficient adaptation tothe new environment. However, when the self annotation isconducted on single image by the network itself, trainingperformance of the network is bounded to the self perfor-mance. Therefore, we propose a strategy to exploit multipleobservations of the object in the image sequence in orderto surpass the self-performance: first, the landmarks for theglobal object map are estimated through network predic-tion and data association, and the corrected annotation fora single frame is obtained. Then, network fine-tuning is con-ducted including the dataset obtained by self-annotation,thereby exceeding the performance boundary of the networkitself. The proposed method was evaluated on the KITTIdriving scene dataset, and we demonstrate the performanceimprovement in the pose estimation of multi-object in 3D space.
Variational Co-embedding Learning for Attributed Network Clustering
Abstract
Recent works for attributed network clustering utilize graph convolution to obtain node embeddings and simultaneously perform clustering assignments on the embedding space. It is effective since graph convolution combines the structural and attributive information for node embedding learning. However, a major limitation of such works is that the graph convolution only incorporates the attribute information from the local neighborhood of nodes but fails to exploit the mutual affinities between nodes and attributes. In this regard, we propose a variational co-embedding learning model for attributed network clustering (VCLANC). VCLANC is composed of dual variational auto-encoders to simultaneously embed nodes and attributes. Relying on this, the mutual affinity information between nodes and attributes could be reconstructed from the embedding space and served as extra self-supervised knowledge for representation learning. At the same time, trainable Gaussian mixture model is used as priors to infer the node clustering assignments. To strengthen the performance of the inferred clusters, we use a mutual distance loss on the centers of the Gaussian priors and a clustering assignment hardening loss on the node embeddings. Experimental results on four real-world attributed network datasets demonstrate the effectiveness of the proposed VCLANC for attributed network clustering.
Self-Supervised Exploration via Latent Bayesian Surprise
Authors: Pietro Mazzaglia, Ozan Catal, Tim Verbelen, Bart Dhoedt
Abstract
Training with Reinforcement Learning requires a reward function that is used to guide the agent towards achieving its objective. However, designing smooth and well-behaved rewards is in general not trivial and requires significant human engineering efforts. Generating rewards in self-supervised way, by inspiring the agent with an intrinsic desire to learn and explore the environment, might induce more general behaviours. In this work, we propose a curiosity-based bonus as intrinsic reward for Reinforcement Learning, computed as the Bayesian surprise with respect to a latent state variable, learnt by reconstructing fixed random features. We extensively evaluate our model by measuring the agent's performance in terms of environment exploration, for continuous tasks, and looking at the game scores achieved, for video games. Our model is computationally cheap and empirically shows state-of-the-art performance on several problems. Furthermore, experimenting on an environment with stochastic actions, our approach emerged to be the most resilient to simple stochasticity. Further visualization is available on the project webpage.(https://lbsexploration.github.io/)
Self-supervised Video Object Segmentation by Motion Grouping
Authors: Charig Yang, Hala Lamdouar, Erika Lu, Andrew Zisserman, Weidi Xie
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Animals have evolved highly functional visual systems to understand motion, assisting perception even under complex environments. In this paper, we work towards developing a computer vision system able to segment objects by exploiting motion cues, i.e. motion segmentation. We make the following contributions: First, we introduce a simple variant of the Transformer to segment optical flow frames into primary objects and the background. Second, we train the architecture in a self-supervised manner, i.e. without using any manual annotations. Third, we analyze several critical components of our method and conduct thorough ablation studies to validate their necessity. Fourth, we evaluate the proposed architecture on public benchmarks (DAVIS2016, SegTrackv2, and FBMS59). Despite using only optical flow as input, our approach achieves superior or comparable results to previous state-of-the-art self-supervised methods, while being an order of magnitude faster. We additionally evaluate on a challenging camouflage dataset (MoCA), significantly outperforming the other self-supervised approaches, and comparing favourably to the top supervised approach, highlighting the importance of motion cues, and the potential bias towards visual appearance in existing video segmentation models.
Keyword: generalization
Web Question Answering with Neurosymbolic Program Synthesis
Abstract
In this paper, we propose a new technique based on program synthesis for extracting information from webpages. Given a natural language query and a few labeled webpages, our method synthesizes a program that can be used to extract similar types of information from other unlabeled webpages. To handle websites with diverse structure, our approach employs a neurosymbolic DSL that incorporates both neural NLP models as well as standard language constructs for tree navigation and string manipulation. We also propose an optimal synthesis algorithm that generates all DSL programs that achieve optimal F1 score on the training examples. Our synthesis technique is compositional, prunes the search space by exploiting a monotonicity property of the DSL, and uses transductive learning to select programs with good generalization power. We have implemented these ideas in a new tool called WebQA and evaluate it on 25 different tasks across multiple domains. Our experiments show that WebQA significantly outperforms existing tools such as state-of-the-art question answering models and wrapper induction systems.
Attentive Max Feature Map for Acoustic Scene Classification with Joint Learning considering the Abstraction of Classes
Authors: Hye-jin Shim, Ju-ho Kim, Jee-weon Jung, Ha-Jin Yu
Abstract
The attention mechanism has been widely adopted in acoustic scene classification. However, we find that during the process of attention exclusively emphasizing information, it tends to excessively discard information although improving the performance. We propose a mechanism referred to as the attentive max feature map which combines two effective techniques, attention and max feature map, to further elaborate the attention mechanism and mitigate the abovementioned phenomenon. Furthermore, we explore various joint learning methods that utilize additional labels originally generated for subtask B (3-classes) on top of existing labels for subtask A (10-classes) of the DCASE2020 challenge. We expect that using two kinds of labels simultaneously would be helpful because the labels of the two subtasks differ in their degree of abstraction. Applying two proposed techniques, our proposed system achieves state-of-the-art performance among single systems on subtask A. In addition, because the model has a complexity comparable to subtask B's requirement, it shows the possibility of developing a system that fulfills the requirements of both subtasks; generalization on multiple devices and low-complexity.
Regularizing Models via Pointwise Mutual Information for Named Entity Recognition
Authors: Minbyul Jeong, Jaewoo Kang
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
In Named Entity Recognition (NER), pre-trained language models have been overestimated by focusing on dataset biases to solve current benchmark datasets. However, these biases hinder generalizability which is necessary to address real-world situations such as weak name regularity and plenty of unseen mentions. To alleviate the use of dataset biases and make the models fully exploit data, we propose a debiasing method that our bias-only model can be replaced with a Pointwise Mutual Information (PMI) to enhance generalization ability while outperforming an in-domain performance. Our approach enables to debias highly correlated word and labels in the benchmark datasets; reflect informative statistics via subword frequency; alleviates a class imbalance between positive and negative examples. For long-named and complex-structure entities, our method can predict these entities through debiasing on conjunction or special characters. Extensive experiments on both general and biomedical domains demonstrate the effectiveness and generalization capabilities of the PMI.
Span Pointer Networks for Non-Autoregressive Task-Oriented Semantic Parsing
Authors: Akshat Shrivastava, Pierce Chuang, Arun Babu, Shrey Desai, Abhinav Arora, Alexander Zotov, Ahmed Aly
Abstract
An effective recipe for building seq2seq, non-autoregressive, task-oriented parsers to map utterances to semantic frames proceeds in three steps: encoding an utterance $x$, predicting a frame's length |y|, and decoding a |y|-sized frame with utterance and ontology tokens. Though empirically strong, these models are typically bottle necked by length prediction, as even small inaccuracies change the syntactic and semantic characteristics of resulting frames. In our work, we propose span pointer networks, non-autoregressive parsers which shift the decoding task from text generation to span prediction; that is, when imputing utterance spans into frame slots, our model produces endpoints (e.g., [i, j]) as opposed to text (e.g., "6pm"). This natural quantization of the output space reduces the variability of gold frames, therefore improving length prediction and, ultimately, exact match. Furthermore, length prediction is now responsible for frame syntax and the decoder is responsible for frame semantics, resulting in a coarse-to-fine model. We evaluate our approach on several task-oriented semantic parsing datasets. Notably, we bridge the quality gap between non-autogressive and autoregressive parsers, achieving 87 EM on TOPv2 (Chen et al. 2020). Furthermore,due to our more consistent gold frames, we show strong improvements in model generalization in both cross-domain and cross-lingual transfer in low-resource settings. Finally, due to our diminished output vocabulary, we observe 70% reduction in latency and 83% reduction in memory at beam size 5 compared to prior non-autoregressive parsers.
Abstract
Markov chains are the de facto finite-state model for stochastic dynamical systems, and Markov decision processes (MDPs) extend Markov chains by incorporating non-deterministic behaviors. Given an MDP and rewards on states, a classical optimization criterion is the maximal expected total reward where the MDP stops after $T$ steps, which can be computed by a simple dynamic programming algorithm. We consider a natural generalization of the problem where the stopping times can be chosen according to a probability distribution, such that the expected stopping time is $T$, to optimize the expected total reward. Quite surprisingly we establish inter-reducibility of the expected stopping-time problem for Markov chains with the Positivity problem (which is related to the well-known Skolem problem), for which establishing either decidability or undecidability would be a major breakthrough. Given the hardness of the exact problem, we consider the approximate version of the problem: we show that it can be solved in exponential time for Markov chains and in exponential space for MDPs.
Rehearsal revealed: The limits and merits of revisiting samples in continual learning
Authors: Eli Verwimp, Matthias De Lange, Tinne Tuytelaars
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Learning from non-stationary data streams and overcoming catastrophic forgetting still poses a serious challenge for machine learning research. Rather than aiming to improve state-of-the-art, in this work we provide insight into the limits and merits of rehearsal, one of continual learning's most established methods. We hypothesize that models trained sequentially with rehearsal tend to stay in the same low-loss region after a task has finished, but are at risk of overfitting on its sample memory, hence harming generalization. We provide both conceptual and strong empirical evidence on three benchmarks for both behaviors, bringing novel insights into the dynamics of rehearsal and continual learning in general. Finally, we interpret important continual learning works in the light of our findings, allowing for a deeper understanding of their successes.
Abstract
In this paper, it is shown that an auto-encoder using optimal reconstruction significantly outperforms a conventional auto-encoder. Optimal reconstruction uses the conditional mean of the input given the features, under a maximum entropy prior distribution. The optimal reconstruction network, which is called deterministic projected belied network (D-PBN), resembles a standard reconstruction network, but with special non-linearities that mist be iteratively solved. The method, which can be seen as a generalization of maximum entropy image reconstruction, extends to multiple layers. In experiments, mean square reconstruction error reduced by up to a factor of two. The performance improvement diminishes for deeper networks, or for input data with unconstrained values (Gaussian assumption).
Unlocking Compositional Generalization in Pre-trained Models Using Intermediate Representations
Authors: Jonathan Herzig, Peter Shaw, Ming-Wei Chang, Kelvin Guu, Panupong Pasupat, Yuan Zhang
Abstract
Sequence-to-sequence (seq2seq) models are prevalent in semantic parsing, but have been found to struggle at out-of-distribution compositional generalization. While specialized model architectures and pre-training of seq2seq models have been proposed to address this issue, the former often comes at the cost of generality and the latter only shows limited success. In this paper, we study the impact of intermediate representations on compositional generalization in pre-trained seq2seq models, without changing the model architecture at all, and identify key aspects for designing effective representations. Instead of training to directly map natural language to an executable form, we map to a reversible or lossy intermediate representation that has stronger structural correspondence with natural language. The combination of our proposed intermediate representations and pre-trained models is surprisingly effective, where the best combinations obtain a new state-of-the-art on CFQ (+14.8 accuracy points) and on the template-splits of three text-to-SQL datasets (+15.0 to +19.4 accuracy points). This work highlights that intermediate representations provide an important and potentially overlooked degree of freedom for improving the compositional generalization abilities of pre-trained seq2seq models.
Towards Deconfounding the Influence of Subject's Demographic Characteristics in Question Answering
Authors: Maharshi Gor, Kellie Webster, Jordan Boyd-Graber
Abstract
Question Answering (QA) tasks are used as benchmarks of general machine intelligence. Therefore, robust QA evaluation is critical, and metrics should indicate how models will answer any question. However, major QA datasets have skewed distributions over gender, profession, and nationality. Despite that skew, models generalize -- we find little evidence that accuracy is lower for people based on gender or nationality. Instead, there is more variation in question topic and question ambiguity. Adequately accessing the generalization of QA systems requires more representative datasets.
Demystify Optimization Challenges in Multilingual Transformers
Authors: Xian Li, Hongyu Gong
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract
Multilingual Transformer improves parameter efficiency and crosslingual transfer. How to effectively train multilingual models has not been well studied. Using multilingual machine translation as a testbed, we study optimization challenges from loss landscape and parameter plasticity perspectives. We found that imbalanced training data poses task interference between high and low resource languages, characterized by nearly orthogonal gradients for major parameters and the optimization trajectory being mostly dominated by high resource. We show that local curvature of the loss surface affects the degree of interference, and existing heuristics of data subsampling implicitly reduces the sharpness, although still face a trade-off between high and low resource languages. We propose a principled multi-objective optimization algorithm, Curvature Aware Task Scaling (CATS), which improves both optimization and generalization especially for low resource. Experiments on TED, WMT and OPUS-100 benchmarks demonstrate that CATS advances the Pareto front of accuracy while being efficient to apply to massive multilingual settings at the scale of 100 languages.
SCALE: Modeling Clothed Humans with a Surface Codec of Articulated Local Elements
Authors: Qianli Ma, Shunsuke Saito, Jinlong Yang, Siyu Tang, Michael J. Black
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Abstract
Learning to model and reconstruct humans in clothing is challenging due to articulation, non-rigid deformation, and varying clothing types and topologies. To enable learning, the choice of representation is the key. Recent work uses neural networks to parameterize local surface elements. This approach captures locally coherent geometry and non-planar details, can deal with varying topology, and does not require registered training data. However, naively using such methods to model 3D clothed humans fails to capture fine-grained local deformations and generalizes poorly. To address this, we present three key innovations: First, we deform surface elements based on a human body model such that large-scale deformations caused by articulation are explicitly separated from topological changes and local clothing deformations. Second, we address the limitations of existing neural surface elements by regressing local geometry from local features, significantly improving the expressiveness. Third, we learn a pose embedding on a 2D parameterization space that encodes posed body geometry, improving generalization to unseen poses by reducing non-local spurious correlations. We demonstrate the efficacy of our surface representation by learning models of complex clothing from point clouds. The clothing can change topology and deviate from the topology of the body. Once learned, we can animate previously unseen motions, producing high-quality point clouds, from which we generate realistic images with neural rendering. We assess the importance of each technical contribution and show that our approach outperforms the state-of-the-art methods in terms of reconstruction accuracy and inference time. The code is available for research purposes at https://qianlim.github.io/SCALE .
Keyword: detection
An Explainable Machine Learning-based Network Intrusion Detection System for Enabling Generalisability in Securing IoT Networks
Authors: Mohanad Sarhan, Siamak Layeghy, Marius Portmann
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)
Abstract
Machine Learning (ML)-based network intrusion detection systems bring many benefits for enhancing the security posture of an organisation. Many systems have been designed and developed in the research community, often achieving a perfect detection rate when evaluated using certain datasets. However, the high number of academic research has not translated into practical deployments. There are a number of causes behind the lack of production usage. This paper tightens the gap by evaluating the generalisability of a common feature set to different network environments and attack types. Therefore, two feature sets (NetFlow and CICFlowMeter) have been evaluated across three datasets, i.e. CSE-CIC-IDS2018, BoT-IoT, and ToN-IoT. The results showed that the NetFlow feature set enhances the two ML models' detection accuracy in detecting intrusions across different datasets. In addition, due to the complexity of the learning models, the SHAP, an explainable AI methodology, has been adopted to explain and interpret the classification decisions of two ML models. The Shapley values of the features have been analysed across multiple datasets to determine the influence contributed by each feature towards the final ML prediction.
Optimal Positioning of PMUs for Fault Detection and Localization in Active Distribution Networks
Authors: F. Conte, B. Gabriele, G.-P. Schiapparelli, F. Silvestro, C. Bossi, M. Cabiati
Abstract
This paper considers the problem of fault detection and localization in active distribution networks using PMUs. The proposed algorithm consists in computing a set of weighted least squares state estimates whose results are used to detect, characterize and localize the occurrence of a fault. Moreover, a criteria to minimize the number of PMUs required to correctly perform the proposed algorithm is defined. Such a criteria, based on system observability conditions, allows the design of an optimization problem to set the positions of PMUs along the grid, in order to get the desired fault localization resolution. The performances of the strategy are tested via simulations on a benchmark distribution system.
Learning structure-aware semantic segmentation with image-level supervision
Authors: Jiawei Liu, Jing Zhang, Yicong Hong, Nick Barnes
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Compared with expensive pixel-wise annotations, image-level labels make it possible to learn semantic segmentation in a weakly-supervised manner. Within this pipeline, the class activation map (CAM) is obtained and further processed to serve as a pseudo label to train the semantic segmentation model in a fully-supervised manner. In this paper, we argue that the lost structure information in CAM limits its application in downstream semantic segmentation, leading to deteriorated predictions. Furthermore, the inconsistent class activation scores inside the same object contradicts the common sense that each region of the same object should belong to the same semantic category. To produce sharp prediction with structure information, we introduce an auxiliary semantic boundary detection module, which penalizes the deteriorated predictions. Furthermore, we adopt smoothness loss to encourage prediction inside the object to be consistent. Experimental results on the PASCAL-VOC dataset illustrate the effectiveness of the proposed solution.
An Improved Real-Time Face Recognition System at Low Resolution Based on Local Binary Pattern Histogram Algorithm and CLAHE
Authors: Kamal Chandra Paul, Semih Aslan
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
This research presents an improved real-time face recognition system at a low resolution of 15 pixels with pose and emotion and resolution variations. We have designed our datasets named LRD200 and LRD100, which have been used for training and classification. The face detection part uses the Viola-Jones algorithm, and the face recognition part receives the face image from the face detection part to process it using the Local Binary Pattern Histogram (LBPH) algorithm with preprocessing using contrast limited adaptive histogram equalization (CLAHE) and face alignment. The face database in this system can be updated via our custom-built standalone android app and automatic restarting of the training and recognition process with an updated database. Using our proposed algorithm, a real-time face recognition accuracy of 78.40% at 15 px and 98.05% at 45 px have been achieved using the LRD200 database containing 200 images per person. With 100 images per person in the database (LRD100) the achieved accuracies are 60.60% at 15 px and 95% at 45 px respectively. A facial deflection of about 30 degrees on either side from the front face showed an average face recognition precision of 72.25% - 81.85%. This face recognition system can be employed for law enforcement purposes, where the surveillance camera captures a low-resolution image because of the distance of a person from the camera. It can also be used as a surveillance system in airports, bus stations, etc., to reduce the risk of possible criminal threats.
Phase noise in communication systems: from measures to models
Authors: Amina Piemontese, Giulio Colavolpe, Thomas Eriksson
Abstract
We consider the phase noise affecting communication systems, where local oscillators are employed to obtain a reference signal for frequency and timing synchronizations. We try to fill the gap between measurements and analytical models, in case of free-running and phase-locked oscillators. In particular, we propose a closed form expression for the power spectral density of the phase noise described by few parameters that are explicitly linked to the oscillator measurements. We also propose an analytical model for the oscillator power spectral density. We then consider the discrete-time domain, and analyse the most common discrete-time phase noise channel model with reference to the measurements parameters and to the system bandwidth. In this context, we show that the proposed models for power spectral densities lead to analytical models for the discrete-time phase noise signal, that can be used for the design of estimation/detection algorithms, for performance evaluation, or simply for fast simulations.
Weakly Supervised Video Anomaly Detection via Center-guided Discriminative Learning
Authors: Boyang Wan, Yuming Fang, Xue Xia, Jiajie Mei
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Anomaly detection in surveillance videos is a challenging task due to the diversity of anomalous video content and duration. In this paper, we consider video anomaly detection as a regression problem with respect to anomaly scores of video clips under weak supervision. Hence, we propose an anomaly detection framework, called Anomaly Regression Net (AR-Net), which only requires video-level labels in training stage. Further, to learn discriminative features for anomaly detection, we design a dynamic multiple-instance learning loss and a center loss for the proposed AR-Net. The former is used to enlarge the inter-class distance between anomalous and normal instances, while the latter is proposed to reduce the intra-class distance of normal instances. Comprehensive experiments are performed on a challenging benchmark: ShanghaiTech. Our method yields a new state-of-the-art result for video anomaly detection on ShanghaiTech dataset
Continual Learning for Fake Audio Detection
Authors: Haoxin Ma, Jiangyan Yi, Jianhua Tao, Ye Bai, Zhengkun Tian, Chenglong Wang
Abstract
Fake audio attack becomes a major threat to the speaker verification system. Although current detection approaches have achieved promising results on dataset-specific scenarios, they encounter difficulties on unseen spoofing data. Fine-tuning and retraining from scratch have been applied to incorporate new data. However, fine-tuning leads to performance degradation on previous data. Retraining takes a lot of time and computation resources. Besides, previous data are unavailable due to privacy in some situations. To solve the above problems, this paper proposes detecting fake without forgetting, a continual-learning-based method, to make the model learn new spoofing attacks incrementally. A knowledge distillation loss is introduced to loss function to preserve the memory of original model. Supposing the distribution of genuine voice is consistent among different scenarios, an extra embedding similarity loss is used as another constraint to further do a positive sample alignment. Experiments are conducted on the ASVspoof2019 dataset. The results show that our proposed method outperforms fine-tuning by the relative reduction of average equal error rate up to 81.62%.
OneLog: Towards End-to-End Training in Software Log Anomaly Detection
Abstract
In recent years, with the growth of online services and IoT devices, software log anomaly detection has become a significant concern for both academia and industry. However, at the time of writing this paper, almost all contributions to the log anomaly detection task, follow the same traditional architecture based on parsing, vectorizing, and classifying. This paper proposes OneLog, a new approach that uses a large deep model based on instead of multiple small components. OneLog utilizes a character-based convolutional neural network (CNN) originating from traditional NLP tasks. This allows the model to take advantage of multiple datasets at once and take advantage of numbers and punctuations, which were removed in previous architectures. We evaluate OneLog using four open data sets Hadoop Distributed File System (HDFS), BlueGene/L (BGL), Hadoop, and OpenStack. We evaluate our model with single and multi-project datasets. Additionally, we evaluate robustness with synthetically evolved datasets and ahead-of-time anomaly detection test that indicates capabilities to predict anomalies before occurring. To the best of our knowledge, our multi-project model outperforms state-of-the-art methods in HDFS, Hadoop, and BGL datasets, respectively setting getting F1 scores of 99.99, 99.99, and 99.98. However, OneLog's performance on the Openstack is unsatisfying with F1 score of only 21.18. Furthermore, Onelogs performance suffers very little from noise showing F1 scores of 99.95, 99.92, and 99.98 in HDFS, Hadoop, and BGL.
SDN-Based Intrusion Detection System for Early Detection and Mitigation of DDoS Attacks
Authors: Pedro Manso, Jose Moura, Carlos Serrao
Subjects: Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI)
Abstract
The current paper addresses relevant network security vulnerabilities introduced by network devices within the emerging paradigm of Internet of Things (IoT) as well as the urgent need to mitigate the negative effects of some types of Distributed Denial of Service (DDoS) attacks that try to explore those security weaknesses. We design and implement a Software-Defined Intrusion Detection System (IDS) that reactively impairs the attacks at its origin, ensuring the normal operation of the network infrastructure. Our proposal includes an IDS that automatically detects several DDoS attacks, and then as an attack is detected, it notifies a Software Defined Networking (SDN) controller. The current proposal also downloads some convenient traffic forwarding decisions from the SDN controller to network devices. The evaluation results suggest that our proposal timely detects several types of cyber-attacks based on DDoS, mitigates their negative impacts on the network performance, and ensures the correct data delivery of normal traffic. Our work sheds light on the programming relevance over an abstracted view of the network infrastructure to timely detect a Botnet exploitation, mitigate malicious traffic at its source, and protect benign traffic.
Microfluidic-based Bacterial Molecular Computing on a Chip
Authors: Daniel P. Martins, Michael Taynnan Barros, Benjamin O'Sullivan, Ian Seymour, Alan O'Riordan, Lee Coffey, Joseph Sweeney, Sasitharan Balasubramaniam
Subjects: Emerging Technologies (cs.ET); Signal Processing (eess.SP); Biomolecules (q-bio.BM)
Abstract
Biocomputing systems based on engineered bacteria can lead to novel tools for environmental monitoring and detection of metabolic diseases. In this paper, we propose a Bacterial Molecular Computing on a Chip (BMCoC) using microfluidic and electrochemical sensing technologies. The computing can be flexibly integrated into the chip, but we focus on engineered bacterial AND Boolean logic gate and ON-OFF switch sensors that produces secondary signals to change the pH and dissolved oxygen concentrations. We present a prototype with experimental results that shows the electrochemical sensors can detect small pH and dissolved oxygen concentration changes created by the engineered bacterial populations' molecular signals. Additionally, we present a theoretical model analysis of the BMCoC computation reliability when subjected to unwanted effects, i.e., molecular signal delays and noise, and electrochemical sensors threshold settings that are based on either standard or blind detectors. Our numerical analysis found that the variations in the production delay and the molecular output signal concentration can impact on the computation reliability for the AND logic gate and ON-OFF switch. The molecular communications of synthetic engineered cells for logic gates integrated with sensing systems can lead to a new breed of biochips that can be used for numerous diagnostic applications.
UIT-E10dot3 at SemEval-2021 Task 5: Toxic Spans Detection with Named Entity Recognition and Question-Answering Approaches
Authors: Phu Gia Hoang, Luan Thanh Nguyen, Kiet Van Nguyen
Abstract
The increment of toxic comments on online space is causing tremendous effects on other vulnerable users. For this reason, considerable efforts are made to deal with this, and SemEval-2021 Task 5: Toxic Spans Detection is one of those. This task asks competitors to extract spans that have toxicity from the given texts, and we have done several analyses to understand its structure before doing experiments. We solve this task by two approaches, Named Entity Recognition with spaCy library and Question-Answering with RoBERTa combining with ToxicBERT, and the former gains the highest F1-score of 66.99%.
MM-Rec: Multimodal News Recommendation
Authors: Chuhan Wu, Fangzhao Wu, Tao Qi, Yongfeng Huang
Abstract
Accurate news representation is critical for news recommendation. Most of existing news representation methods learn news representations only from news texts while ignore the visual information in news like images. In fact, users may click news not only because of the interest in news titles but also due to the attraction of news images. Thus, images are useful for representing news and predicting user behaviors. In this paper, we propose a multimodal news recommendation method, which can incorporate both textual and visual information of news to learn multimodal news representations. We first extract region-of-interests (ROIs) from news images via objective detection. Then we use a pre-trained visiolinguistic model to encode both news texts and news image ROIs and model their inherent relatedness using co-attentional Transformers. In addition, we propose a crossmodal candidate-aware attention network to select relevant historical clicked news for accurate user modeling by measuring the crossmodal relatedness between clicked news and candidate news. Experiments validate that incorporating multimodal news information can effectively improve news recommendation.
Ransomware Detection Using Deep Learning in the SCADA System of Electric Vehicle Charging Station
Authors: Manoj Basnet, Subash Poudyal, Mohd. Hasan Ali, Dipankar Dasgupta
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
Abstract
The Supervisory control and data acquisition (SCADA) systems have been continuously leveraging the evolution of network architecture, communication protocols, next-generation communication techniques (5G, 6G, Wi-Fi 6), and the internet of things (IoT). However, SCADA system has become the most profitable and alluring target for ransomware attackers. This paper proposes the deep learning-based novel ransomware detection framework in the SCADA controlled electric vehicle charging station (EVCS) with the performance analysis of three deep learning algorithms, namely deep neural network (DNN), 1D convolution neural network (CNN), and long short-term memory (LSTM) recurrent neural network. All three-deep learning-based simulated frameworks achieve around 97% average accuracy (ACC), more than 98% of the average area under the curve (AUC), and an average F1-score under 10-fold stratified cross-validation with an average false alarm rate (FAR) less than 1.88%. Ransomware driven distributed denial of service (DDoS) attack tends to shift the SOC profile by exceeding the SOC control thresholds. The severity has been found to increase as the attack progress and penetration increases. Also, ransomware driven false data injection (FDI) attack has the potential to damage the entire BES or physical system by manipulating the SOC control thresholds. It's a design choice and optimization issue that a deep learning algorithm can deploy based on the tradeoffs between performance metrics.
TransRPPG: Remote Photoplethysmography Transformer for 3D Mask Face Presentation Attack Detection
Abstract
3D mask face presentation attack detection (PAD) plays a vital role in securing face recognition systems from emergent 3D mask attacks. Recently, remote photoplethysmography (rPPG) has been developed as an intrinsic liveness clue for 3D mask PAD without relying on the mask appearance. However, the rPPG features for 3D mask PAD are still needed expert knowledge to design manually, which limits its further progress in the deep learning and big data era. In this letter, we propose a pure rPPG transformer (TransRPPG) framework for learning intrinsic liveness representation efficiently. At first, rPPG-based multi-scale spatial-temporal maps (MSTmap) are constructed from facial skin and background regions. Then the transformer fully mines the global relationship within MSTmaps for liveness representation, and gives a binary prediction for 3D mask detection. Comprehensive experiments are conducted on two benchmark datasets to demonstrate the efficacy of the TransRPPG on both intra- and cross-dataset testings. Our TransRPPG is lightweight and efficient (with only 547K parameters and 763M FLOPs), which is promising for mobile-level applications.
Points as Queries: Weakly Semi-supervised Object Detection by Points
Abstract
We propose a novel point annotated setting for the weakly semi-supervised object detection task, in which the dataset comprises small fully annotated images and large weakly annotated images by points. It achieves a balance between tremendous annotation burden and detection performance. Based on this setting, we analyze existing detectors and find that these detectors have difficulty in fully exploiting the power of the annotated points. To solve this, we introduce a new detector, Point DETR, which extends DETR by adding a point encoder. Extensive experiments conducted on MS-COCO dataset in various data settings show the effectiveness of our method. In particular, when using 20% fully labeled data from COCO, our detector achieves a promising performance, 33.3 AP, which outperforms a strong baseline (FCOS) by 2.0 AP, and we demonstrate the point annotations bring over 10 points in various AR metrics.
Bayesian and Dempster-Shafer models for combining multiple sources of evidence in a fraud detection system
Abstract
Combining evidence from different sources can be achieved with Bayesian or Dempster-Shafer methods. The first requires an estimate of the priors and likelihoods while the second only needs an estimate of the posterior probabilities and enables reasoning with uncertain information due to imprecision of the sources and with the degree of conflict between them. This paper describes the two methods and how they can be applied to the estimation of a global score in the context of fraud detection.
Abstract
Stance detection concerns the classification of a writer's viewpoint towards a target. There are different task variants, e.g., stance of a tweet vs. a full article, or stance with respect to a claim vs. an (implicit) topic. Moreover, task definitions vary, which includes the label inventory, the data collection, and the annotation protocol. All these aspects hinder cross-domain studies, as they require changes to standard domain adaptation approaches. In this paper, we perform an in-depth analysis of 16 stance detection datasets, and we explore the possibility for cross-domain learning from them. Moreover, we propose an end-to-end unsupervised framework for out-of-domain prediction of unseen, user-defined labels. In particular, we combine domain adaptation techniques such as mixture of experts and domain-adversarial training with label embeddings, and we demonstrate sizable performance gains over strong baselines -- both (i) in-domain, i.e., for seen targets, and (ii) out-of-domain, i.e., for unseen targets. Finally, we perform an exhaustive analysis of the cross-domain results, and we highlight the important factors influencing the model performance.
Advanced Lane Detection Model for the Virtual Development of Highly Automated Functions
Authors: Philip Pannagger, Demin Nalic, Faris Orucevic, Arno Eichberger, Branko Rogic
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Abstract
Virtual development and prototyping has already become an integral part in the field of automated driving systems (ADS). There are plenty of software tools that are used for the virtual development of ADS. One such tool is CarMaker from IPG Automotive, which is widely used in the scientific community and in the automotive industry. It offers a broad spectrum of implementation and modelling possibilities of the vehicle, driver behavior, control, sensors, and environmental models. Focusing on the virtual development of highly automated driving functions on the vehicle guidance level, it is essential to perceive the environment in a realistic manner. For the longitudinal and lateral path guidance line detection sensors are necessary for the determination of the relevant perceiving vehicle and for the planning of trajectories. For this purpose, a lane sensor model was developed in order to efficiently detect lanes in the simulation environment of CarMaker. The so-called advanced lane detection model (ALDM) is optimized regarding the calculation time and is for the lateral and longitudinal vehicle guidance in CarMaker.
An ADMM-based Optimal Transmission Frequency Management System for IoT Edge Intelligence
Authors: Hongde Wu, Noel E. O'Connor, Jennifer Bruton, Mingming Liu
Abstract
In this paper, we investigate a key problem of Internet of Things (IoT) applications in practice. Our research objective is to optimize the transmission frequencies for a group of IoT edge devices under practical constraints. Our key assumption is that different IoT devices may have different priority levels when transmitting data in a resource-constrained environment and that those priority levels may only be locally defined and accessible by edge devices for privacy concerns. To address this problem, we leverage the well-known Alternating Direction Method of Multipliers (ADMM) optimization method and demonstrate its applicability for effectively managing various IoT data streams in a decentralized framework. Our experimental results show that the transmission frequency of each edge device can converge to optimality with little delay using ADMM, and the proposed system can be adjusted dynamically when a new device connects to the system. In addition, we also introduce an anomaly detection mechanism to the system when a device's transmission frequency may be compromised by external manipulation, showing that the proposed system is robust and secure for various IoT applications.
Keyword: metric learning
Sparse online relative similarity learning
Keyword: image retrieval
Learning Regional Attention over Multi-resolution Deep Convolutional Features for Trademark Retrieval
Keyword: face recognition
An Improved Real-Time Face Recognition System at Low Resolution Based on Local Binary Pattern Histogram Algorithm and CLAHE
Robust Backdoor Attacks against Deep Neural Networks in Real Physical World
TransRPPG: Remote Photoplethysmography Transformer for 3D Mask Face Presentation Attack Detection
Keyword: imagenet
Continual Learning From Unlabeled Data Via Deep Clustering
See through Gradients: Image Batch Recovery via GradInversion
Keyword: augmentation
A Better-Than-2 Approximation for Weighted Tree Augmentation
On the Robustness of Goal Oriented Dialogue Systems to Real-world Noise
A Simple Baseline for Semi-supervised Semantic Segmentation with Strong Data Augmentation
Memory Capacity of Neural Turing Machines with Matrix Representation
Keyword: self-supervised
Self-Supervised Learning of Remote Sensing Scene Representations Using Contrastive Multiview Coding
Self-supervised Learning of 3D Object Understanding by Data Association and Landmark Estimation for Image Sequence
Variational Co-embedding Learning for Attributed Network Clustering
Self-Supervised Exploration via Latent Bayesian Surprise
Self-supervised Video Object Segmentation by Motion Grouping
Keyword: generalization
Web Question Answering with Neurosymbolic Program Synthesis
Attentive Max Feature Map for Acoustic Scene Classification with Joint Learning considering the Abstraction of Classes
Regularizing Models via Pointwise Mutual Information for Named Entity Recognition
Span Pointer Networks for Non-Autoregressive Task-Oriented Semantic Parsing
Stochastic Processes with Expected Stopping Time
Rehearsal revealed: The limits and merits of revisiting samples in continual learning
Maximum Entropy Auto-Encoding
Unlocking Compositional Generalization in Pre-trained Models Using Intermediate Representations
Towards Deconfounding the Influence of Subject's Demographic Characteristics in Question Answering
Demystify Optimization Challenges in Multilingual Transformers
SCALE: Modeling Clothed Humans with a Surface Codec of Articulated Local Elements
Keyword: detection
An Explainable Machine Learning-based Network Intrusion Detection System for Enabling Generalisability in Securing IoT Networks
Optimal Positioning of PMUs for Fault Detection and Localization in Active Distribution Networks
Learning structure-aware semantic segmentation with image-level supervision
An Improved Real-Time Face Recognition System at Low Resolution Based on Local Binary Pattern Histogram Algorithm and CLAHE
Phase noise in communication systems: from measures to models
Weakly Supervised Video Anomaly Detection via Center-guided Discriminative Learning
Continual Learning for Fake Audio Detection
OneLog: Towards End-to-End Training in Software Log Anomaly Detection
SDN-Based Intrusion Detection System for Early Detection and Mitigation of DDoS Attacks
Microfluidic-based Bacterial Molecular Computing on a Chip
UIT-E10dot3 at SemEval-2021 Task 5: Toxic Spans Detection with Named Entity Recognition and Question-Answering Approaches
MM-Rec: Multimodal News Recommendation
Ransomware Detection Using Deep Learning in the SCADA System of Electric Vehicle Charging Station
TransRPPG: Remote Photoplethysmography Transformer for 3D Mask Face Presentation Attack Detection
Points as Queries: Weakly Semi-supervised Object Detection by Points
Bayesian and Dempster-Shafer models for combining multiple sources of evidence in a fraud detection system
Cross-Domain Label-Adaptive Stance Detection
Advanced Lane Detection Model for the Virtual Development of Highly Automated Functions
An ADMM-based Optimal Transmission Frequency Management System for IoT Edge Intelligence
Keyword: line segment
There is no result
Keyword: text to image
There is no result