Abstract
Deep learning based device fingerprinting has emerged as a key method of identifying and authenticating devices solely via their captured RF transmissions. Conventional approaches are not portable to different domains in that if a model is trained on data from one domain, it will not perform well on data from a different but related domain. Examples of such domains include the receiver hardware used for collecting the data, the day/time on which data was captured, and the protocol configuration of devices. This work proposes Tweak, a technique that, using metric learning and a calibration process, enables a model trained with data from one domain to perform well on data from another domain. This process is accomplished with only a small amount of training data from the target domain and without changing the weights of the model, which makes the technique computationally lightweight and thus suitable for resource-limited IoT networks. This work evaluates the effectiveness of Tweak vis-a-vis its ability to identify IoT devices using a testbed of real LoRa-enabled devices under various scenarios. The results of this evaluation show that Tweak is viable and especially useful for networks with limited computational resources and applications with time-sensitive missions.
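The abstract does not spell out the calibration step, so the following is only a minimal sketch of one way a frozen metric-learning model could be adapted with a handful of target-domain samples: average the embeddings of each device into a prototype, then assign queries to the nearest prototype. The function names, the use of prototypes, and the Euclidean distance are assumptions for illustration, not the authors' method; the embeddings are assumed to come from an already-trained model whose weights are left untouched.

```python
import numpy as np

def calibrate_prototypes(embeddings, labels):
    """Average the few target-domain embeddings of each device into one prototype.

    Hypothetical calibration step: the embedding model itself is never retrained.
    """
    classes = np.unique(labels)
    prototypes = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    return classes, prototypes

def identify(query_embeddings, classes, prototypes):
    """Assign each query embedding to the device with the nearest prototype."""
    # Pairwise Euclidean distances, shape (num_queries, num_devices).
    diffs = query_embeddings[:, None, :] - prototypes[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    return classes[dists.argmin(axis=1)]

# Toy usage: two devices, two calibration samples each (made-up 2-D embeddings).
calib = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
calib_labels = np.array([0, 0, 1, 1])
classes, prototypes = calibrate_prototypes(calib, calib_labels)
predicted = identify(np.array([[0.2, 0.4], [9.0, 10.0]]), classes, prototypes)
```

Because only the prototypes change between domains, recalibration costs one forward pass per calibration sample, which is consistent with the lightweight setting the abstract describes.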
Keyword: image retrieval
There is no result
Keyword: self-supervised
Self-supervised Representation Learning on Electronic Health Records with Graph Kernel Infomax
Abstract
Learning representations of Electronic Health Records (EHRs) is an important yet under-explored research topic. It benefits various clinical decision support applications, e.g., medication outcome prediction or patient similarity search. Current approaches focus on task-specific label supervision on vectorized sequential EHR, which is not applicable to large-scale unsupervised scenarios. Recently, contrastive learning has shown great success on self-supervised representation learning problems. However, complex temporality often degrades the performance. We propose Graph Kernel Infomax, a self-supervised graph kernel learning approach on the graphical representation of EHR, to overcome the previous problems. Unlike the state-of-the-art, we do not change the graph structure to construct augmented views. Instead, we use Kernel Subspace Augmentation to embed nodes into two geometrically different manifold views. The entire framework is trained by contrasting nodes and graph representations on those two manifold views through the commonly used contrastive objectives. Empirically, using publicly available benchmark EHR datasets, our approach yields performance on clinical downstream tasks that exceeds the state-of-the-art. Theoretically, the variation on distance metrics naturally creates different views as data augmentation without changing graph structures.
BinImg2Vec: Augmenting Malware Binary Image Classification with Data2Vec
Authors: Joon Sern Lee, Kai Keng Tay, Zong Fu Chua
Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Rapid digitalisation spurred by the Covid-19 pandemic has resulted in more cyber crime. Malware-as-a-service is now a booming business for cyber criminals. With the surge in malware activities, it is vital for cyber defenders to understand more about the malware samples they have at hand, as such information can greatly influence their next course of action during a breach. Recently, researchers have shown how malware family classification can be done by first converting malware binaries into grayscale images and then passing them through neural networks for classification. However, most work focuses on studying the impact of different neural network architectures on classification performance. In the last year, researchers have shown that augmenting supervised learning with self-supervised learning can improve performance. Even more recently, Data2Vec was proposed as a modality-agnostic self-supervised framework to train neural networks. In this paper, we present BinImg2Vec, a framework for training malware binary image classifiers that incorporates both self-supervised learning and supervised learning to produce a model that consistently outperforms one trained only via supervised learning. We were able to achieve a 4% improvement in classification performance and a 0.5% reduction in performance variance over multiple runs. We also show how our framework produces embeddings that can be well clustered, facilitating model explainability.
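The binary-to-image conversion mentioned in the abstract can be sketched as follows. This is a minimal illustration, assuming each byte becomes one pixel intensity, a fixed row width of 256 pixels, and zero-padding of the last row; the abstract does not state these details, and the function name is hypothetical.

```python
import numpy as np

def bytes_to_grayscale(data: bytes, width: int = 256) -> np.ndarray:
    """Interpret raw bytes as 8-bit pixels and lay them out row by row."""
    buf = np.frombuffer(data, dtype=np.uint8)
    height = -(-len(buf) // width)  # ceiling division
    # Zero-pad so the final row is complete (an assumption of this sketch).
    padded = np.zeros(height * width, dtype=np.uint8)
    padded[: len(buf)] = buf
    return padded.reshape(height, width)

# Toy usage: ten bytes laid out in rows of four pixels.
image = bytes_to_grayscale(bytes(range(10)), width=4)
```

The resulting array can be fed to any image classifier after resizing; the resizing policy is left open here because the abstract does not describe it.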
Detection of diabetic retinopathy using longitudinal self-supervised learning
Abstract
Longitudinal imaging is able to capture both static anatomical structures and dynamic changes in disease progression towards earlier and better patient-specific pathology management. However, conventional approaches for detecting diabetic retinopathy (DR) rarely take advantage of longitudinal information to improve DR analysis. In this work, we investigate the benefit of exploiting self-supervised learning with a longitudinal nature for DR diagnosis purposes. We compare different longitudinal self-supervised learning (LSSL) methods to model the disease progression from longitudinal retinal color fundus photographs (CFP) to detect early DR severity changes using a pair of consecutive exams. The experiments were conducted on a longitudinal DR screening dataset with or without those trained encoders (LSSL) acting as a longitudinal pretext task. The baseline (model trained from scratch) achieves an AUC of 0.875, while early fusion using a simple ResNet-like architecture with frozen LSSL weights achieves an AUC of 0.96 (95% CI: 0.9593-0.9655, DeLong test; p-value < 2.2e-16), suggesting that the LSSL latent space can encode the dynamics of DR progression.
IMG2IMU: Applying Knowledge from Large-Scale Images to IMU Applications via Contrastive Learning
Abstract
Recent advances in machine learning showed that pre-training representations acquired via self-supervised learning could achieve high accuracy on tasks with small training data. Unlike in vision and natural language processing domains, such pre-training for IMU-based applications is challenging, as there are only a few publicly available datasets with sufficient size and diversity to learn generalizable representations. To overcome this problem, we propose IMG2IMU, a novel approach that adapts pre-trained representations from large-scale images to diverse few-shot IMU sensing tasks. We convert the sensor data into visually interpretable spectrograms for the model to utilize the knowledge gained from vision. Further, we apply contrastive learning on an augmentation set we designed to learn representations that are tailored to interpreting sensor data. Our extensive evaluations on five different IMU sensing tasks show that IMG2IMU consistently outperforms the baselines, illustrating that vision knowledge can be incorporated into a few-shot learning environment for IMU sensing tasks.
nnOOD: A Framework for Benchmarking Self-supervised Anomaly Localisation Methods
Authors: Matthew Baugh, Jeremy Tan, Athanasios Vlontzos, Johanna P. Müller, Bernhard Kainz
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The wide variety of in-distribution and out-of-distribution data in medical imaging makes universal anomaly detection a challenging task. Recently a number of self-supervised methods have been developed that train end-to-end models on healthy data augmented with synthetic anomalies. However, it is difficult to compare these methods as it is not clear whether gains in performance are from the task itself or the training pipeline around it. It is also difficult to assess whether a task generalises well for universal anomaly detection, as they are often only tested on a limited range of anomalies. To assist with this we have developed nnOOD, a framework that adapts nnU-Net to allow for comparison of self-supervised anomaly localisation methods. By isolating the synthetic, self-supervised task from the rest of the training process we perform a more faithful comparison of the tasks, whilst also making the workflow for evaluating over a given dataset quick and easy. Using this we have implemented the current state-of-the-art tasks and evaluated them on a challenging X-ray dataset.
Keyword: vision transformer
Transformers in Remote Sensing: A Survey
Authors: Abdulaziz Amer Aleissaee, Amandeep Kumar, Rao Muhammad Anwer, Salman Khan, Hisham Cholakkal, Gui-Song Xia, Fahad Shahbaz Khan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Deep learning-based algorithms have become massively popular in different areas of remote sensing image analysis over the past decade. Recently, transformer-based architectures, originally introduced in natural language processing, have pervaded the computer vision field, where the self-attention mechanism has been utilized as a replacement for the popular convolution operator for capturing long-range dependencies. Inspired by recent advances in computer vision, the remote sensing community has also witnessed an increased exploration of vision transformers for a diverse set of tasks. Although a number of surveys have focused on transformers in computer vision in general, to the best of our knowledge we are the first to present a systematic review of recent advances based on transformers in remote sensing. Our survey covers more than 60 recent transformer-based methods for different remote sensing problems in sub-areas of remote sensing: very high-resolution (VHR), hyperspectral (HSI) and synthetic aperture radar (SAR) imagery. We conclude the survey by discussing different challenges and open issues of transformers in remote sensing. Additionally, we intend to frequently update and maintain the latest transformers in remote sensing papers with their respective code at: https://github.com/VIROBO-15/Transformer-in-Remote-Sensing
Keyword: multimodal
3D Path Planning and Obstacle Avoidance Algorithms for Obstacle-Overcoming Robots
Authors: Yuanhao Huang, Shi Huang, Hao Wang, Ruifeng Meng
Abstract
This article introduces a multimodal motion planning (MMP) algorithm that combines three-dimensional (3-D) path planning and a DWA obstacle avoidance algorithm. The algorithms aim to plan the path and motion of obstacle-overcoming robots in complex unstructured scenes. A novel A-star algorithm is proposed that takes the characteristics of unstructured scenes into account and includes a strategy for switching to a greedy best-first search. Meanwhile, the path planning algorithm is integrated with the DWA algorithm so that the robot can perform local dynamic obstacle avoidance while moving along the globally planned path. Furthermore, when the proposed global path planning algorithm is combined with the local obstacle avoidance algorithm, the robot can correct the path after obstacle avoidance and obstacle overcoming. The simulation experiments in a factory with several complex environments verified the feasibility and robustness of the algorithms. The algorithms can quickly generate a reasonable 3-D path for obstacle-overcoming robots and perform reliable local obstacle avoidance under the premise of considering the characteristics of the scene and motion obstacles.
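The relationship between A-star and greedy best-first search can be illustrated with a small sketch. It is not the paper's algorithm: it assumes a 2-D occupancy grid instead of a 3-D scene, unit step costs, a Manhattan-distance heuristic, and a hypothetical `greedy` flag standing in for whatever switching rule the authors use. The only difference between the two modes is the priority: cost-so-far plus heuristic for A-star, heuristic alone for greedy best-first.

```python
import heapq

def plan(grid, start, goal, greedy=False):
    """Grid search; A* by default, greedy best-first when greedy=True.

    grid is a list of rows where 1 marks an obstacle and 0 a free cell.
    Returns the path as a list of (row, col) cells, or None if unreachable.
    """
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(heuristic(start), 0, start)]
    parent = {start: None}
    cost = {start: 0}
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        if g > cost[cell]:
            continue  # stale queue entry; a cheaper route was found later
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cell[0] + dr, cell[1] + dc)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            if grid[nxt[0]][nxt[1]]:
                continue
            new_g = g + 1
            if nxt not in cost or new_g < cost[nxt]:
                cost[nxt] = new_g
                parent[nxt] = cell
                # A*: g + h. Greedy best-first: h only.
                priority = heuristic(nxt) if greedy else new_g + heuristic(nxt)
                heapq.heappush(frontier, (priority, new_g, nxt))
    return None

# Toy usage: a wall forces a detour around the right-hand side.
toy_grid = [[0, 0, 0],
            [1, 1, 0],
            [0, 0, 0]]
astar_path = plan(toy_grid, (0, 0), (2, 0))
greedy_path = plan(toy_grid, (0, 0), (2, 0), greedy=True)
```

Greedy best-first typically expands fewer cells but gives up the shortest-path guarantee, which is the usual motivation for switching between the two.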
Keyword: CLIP
Zero-Shot Multi-Modal Artist-Controlled Retrieval and Exploration of 3D Object Sets
Authors: Kristofer Schlachter, Benjamin Ahlbrand, Zhu Wang, Valerio Ortenzi, Ken Perlin
Abstract
When creating 3D content, highly specialized skills are generally needed to design and generate models of objects and other assets by hand. We address this problem through high-quality 3D asset retrieval from multi-modal inputs, including 2D sketches, images and text. We use CLIP as it provides a bridge to higher-level latent features. We use these features to perform a multi-modality fusion to address the lack of artistic control that affects common data-driven approaches. Our approach allows for multi-modal conditional feature-driven retrieval through a 3D asset database, by utilizing a combination of input latent embeddings. We explore the effects of different combinations of feature embeddings across different input types and weighting methods.
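A minimal sketch of weighted multi-modal retrieval follows, assuming that the sketch, image, and text inputs have already been encoded into a shared embedding space (for example by CLIP) and that fusion is a weighted sum of normalized embeddings. The abstract mentions exploring several weighting methods without naming them, so the weighted sum, the function names, and the toy vectors are all assumptions.

```python
import numpy as np

def l2_normalize(x):
    """Scale vectors along the last axis to unit length."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def fuse_queries(embeddings, weights):
    """Combine per-modality query embeddings into one unit-length query."""
    embeddings = l2_normalize(np.asarray(embeddings, dtype=float))
    weights = np.asarray(weights, dtype=float)
    fused = (weights[:, None] * embeddings).sum(axis=0)
    return l2_normalize(fused)

def retrieve(query, asset_embeddings, k=3):
    """Return indices and cosine scores of the k assets closest to the query."""
    scores = l2_normalize(np.asarray(asset_embeddings, dtype=float)) @ query
    order = np.argsort(-scores)[:k]
    return order, scores[order]

# Toy usage with made-up 2-D embeddings for three assets and two query modalities.
assets = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
balanced = fuse_queries([[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5])
skewed = fuse_queries([[1.0, 0.0], [0.0, 1.0]], [0.9, 0.1])
top_balanced, _ = retrieve(balanced, assets, k=1)
top_skewed, _ = retrieve(skewed, assets, k=1)
```

Changing the weights shifts which asset ranks first, which is one simple way to expose the kind of artistic control the abstract describes.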
Temporal Contrastive Learning with Curriculum
Authors: Shuvendu Roy, Ali Etemad
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
We present ConCur, a contrastive video representation learning method that uses curriculum learning to impose a dynamic sampling strategy in contrastive training. More specifically, ConCur starts the contrastive training with easy positive samples (temporally close and semantically similar clips), and as the training progresses, it increases the temporal span, effectively sampling hard positives (temporally distant and semantically dissimilar). To learn better context-aware representations, we also propose an auxiliary task of predicting the temporal distance between a positive pair of clips. We conduct extensive experiments on two popular action recognition datasets, UCF101 and HMDB51, on which our proposed method achieves state-of-the-art performance on two benchmark tasks of video action recognition and video retrieval. We explore the impact of encoder backbones and pre-training strategies by using R(2+1)D and C3D encoders and pre-training on Kinetics-400 and Kinetics-200 datasets. Moreover, a detailed ablation study shows the effectiveness of each of the components of our proposed method.
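The curriculum over temporal span can be sketched as below. The linear schedule, the parameter names, and the uniform sampling of clip start frames are assumptions chosen for illustration; the abstract states only that the span grows as training progresses and that the temporal distance between the two clips is predicted as an auxiliary task.

```python
import random

def span_for_epoch(epoch, total_epochs, min_span, max_span):
    """Linearly widen the allowed temporal span as training progresses."""
    progress = min(max(epoch / max(total_epochs - 1, 1), 0.0), 1.0)
    return round(min_span + progress * (max_span - min_span))

def sample_positive_pair(num_frames, clip_len, span, rng=random):
    """Pick two clip start frames from one video, at most `span` frames apart.

    Returns (anchor_start, positive_start, temporal_distance); the distance
    could serve as the target of the auxiliary prediction task.
    """
    last_start = num_frames - clip_len
    anchor = rng.randint(0, last_start)
    low = max(0, anchor - span)
    high = min(last_start, anchor + span)
    positive = rng.randint(low, high)
    return anchor, positive, abs(positive - anchor)

# Toy usage: easy positives early in training, harder ones at the end.
rng = random.Random(0)
early_span = span_for_epoch(0, 10, min_span=4, max_span=64)
late_span = span_for_epoch(9, 10, min_span=4, max_span=64)
anchor, positive, distance = sample_positive_pair(300, 16, early_span, rng)
```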
Keyword: metric learning
Tweak: Towards Portable Deep Learning Models for Domain-Agnostic LoRa Device Authentication
Keyword: image retrieval
There is no result
Keyword: self-supervised
Self-supervised Representation Learning on Electronic Health Records with Graph Kernel Infomax
BinImg2Vec: Augmenting Malware Binary Image Classification with Data2Vec
Detection of diabetic retinopathy using longitudinal self-supervised learning
IMG2IMU: Applying Knowledge from Large-Scale Images to IMU Applications via Contrastive Learning
nnOOD: A Framework for Benchmarking Self-supervised Anomaly Localisation Methods
Keyword: vision transformer
Transformers in Remote Sensing: A Survey
Keyword: multimodal
3D Path Planning and Obstacle Avoidance Algorithms for Obstacle-Overcoming Robots
Keyword: CLIP
Zero-Shot Multi-Modal Artist-Controlled Retrieval and Exploration of 3D Object Sets
Temporal Contrastive Learning with Curriculum
Keyword: DALLE
There is no result