Abstract
Self-supervised learning shows promise in harnessing extensive unlabeled data, but it also confronts significant privacy concerns, especially in vision. In this paper, we aim to perform membership inference on visual self-supervised models in a more realistic setting: self-supervised training method and details are unknown for an adversary when attacking as he usually faces a black-box system in practice. In this setting, considering that self-supervised model could be trained by completely different self-supervised paradigms, e.g., masked image modeling and contrastive learning, with complex training details, we propose a unified membership inference method called PartCrop. It is motivated by the shared part-aware capability among models and stronger part response on the training data. Specifically, PartCrop crops parts of objects in an image to query responses with the image in representation space. We conduct extensive attacks on self-supervised models with different training protocols and structures using three widely used image datasets. The results verify the effectiveness and generalization of PartCrop. Moreover, to defend against PartCrop, we evaluate two common approaches, i.e., early stop and differential privacy, and propose a tailored method called shrinking crop scale range. The defense experiments indicate that all of them are effective. Our code is available at https://github.com/JiePKU/PartCrop
Differentially Private Verification of Survey-Weighted Estimates
Authors: Tong Lin, Jerome P. Reiter
Subjects: Cryptography and Security (cs.CR); Methodology (stat.ME)
Abstract
Several official statistics agencies release synthetic data as public use microdata files. In practice, synthetic data do not admit accurate results for every analysis. Thus, it is beneficial for agencies to provide users with feedback on the quality of their analyses of the synthetic data. One approach is to couple synthetic data with a verification server that provides users with measures of the similarity of estimates computed with the synthetic and underlying confidential data. However, such measures leak information about the confidential records, so that agencies may wish to apply disclosure control methods to the released verification measures. We present a verification measure that satisfies differential privacy and can be used when the underlying confidential are collected with a complex survey design. We illustrate the verification measure using repeated sampling simulations where the confidential data are sampled with a probability proportional to size design, and the analyst estimates a population total or mean with the synthetic data. The simulations suggest that the verification measures can provide useful information about the quality of synthetic data inferences.
Keyword: privacy
A Unified Membership Inference Method for Visual Self-supervised Encoder via Part-aware Capability
Authors: Jie Zhu, Jirong Zha, Ding Li, Leye Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Self-supervised learning shows promise in harnessing extensive unlabeled data, but it also confronts significant privacy concerns, especially in vision. In this paper, we aim to perform membership inference on visual self-supervised models in a more realistic setting: self-supervised training method and details are unknown for an adversary when attacking as he usually faces a black-box system in practice. In this setting, considering that self-supervised model could be trained by completely different self-supervised paradigms, e.g., masked image modeling and contrastive learning, with complex training details, we propose a unified membership inference method called PartCrop. It is motivated by the shared part-aware capability among models and stronger part response on the training data. Specifically, PartCrop crops parts of objects in an image to query responses with the image in representation space. We conduct extensive attacks on self-supervised models with different training protocols and structures using three widely used image datasets. The results verify the effectiveness and generalization of PartCrop. Moreover, to defend against PartCrop, we evaluate two common approaches, i.e., early stop and differential privacy, and propose a tailored method called shrinking crop scale range. The defense experiments indicate that all of them are effective. Our code is available at https://github.com/JiePKU/PartCrop
An Interpretable Client Decision Tree Aggregation process for Federated Learning
Authors: Alberto Argente-Garrido, Cristina Zuheros, M. Victoria Luzón, Francisco Herrera
Abstract
Trustworthy Artificial Intelligence solutions are essential in today's data-driven applications, prioritizing principles such as robustness, safety, transparency, explainability, and privacy among others. This has led to the emergence of Federated Learning as a solution for privacy and distributed machine learning. While decision trees, as self-explanatory models, are ideal for collaborative model training across multiple devices in resource-constrained environments such as federated learning environments for injecting interpretability in these models. Decision tree structure makes the aggregation in a federated learning environment not trivial. They require techniques that can merge their decision paths without introducing bias or overfitting while keeping the aggregated decision trees robust and generalizable. In this paper, we propose an Interpretable Client Decision Tree Aggregation process for Federated Learning scenarios that keeps the interpretability and the precision of the base decision trees used for the aggregation. This model is based on aggregating multiple decision paths of the decision trees and can be used on different decision tree types, such as ID3 and CART. We carry out the experiments within four datasets, and the analysis shows that the tree built with the model improves the local models, and outperforms the state-of-the-art.
Differentially Private Verification of Survey-Weighted Estimates
Authors: Tong Lin, Jerome P. Reiter
Subjects: Cryptography and Security (cs.CR); Methodology (stat.ME)
Abstract
Several official statistics agencies release synthetic data as public use microdata files. In practice, synthetic data do not admit accurate results for every analysis. Thus, it is beneficial for agencies to provide users with feedback on the quality of their analyses of the synthetic data. One approach is to couple synthetic data with a verification server that provides users with measures of the similarity of estimates computed with the synthetic and underlying confidential data. However, such measures leak information about the confidential records, so that agencies may wish to apply disclosure control methods to the released verification measures. We present a verification measure that satisfies differential privacy and can be used when the underlying confidential are collected with a complex survey design. We illustrate the verification measure using repeated sampling simulations where the confidential data are sampled with a probability proportional to size design, and the analyst estimates a population total or mean with the synthetic data. The simulations suggest that the verification measures can provide useful information about the quality of synthetic data inferences.
Learn to Disguise: Avoid Refusal Responses in LLM's Defense via a Multi-agent Attacker-Disguiser Game
Abstract
With the enhanced performance of large models on natural language processing tasks, potential moral and ethical issues of large models arise. There exist malicious attackers who induce large models to jailbreak and generate information containing illegal, privacy-invasive information through techniques such as prompt engineering. As a result, large models counter malicious attackers' attacks using techniques such as safety alignment. However, the strong defense mechanism of the large model through rejection replies is easily identified by attackers and used to strengthen attackers' capabilities. In this paper, we propose a multi-agent attacker-disguiser game approach to achieve a weak defense mechanism that allows the large model to both safely reply to the attacker and hide the defense intent. First, we construct a multi-agent framework to simulate attack and defense scenarios, playing different roles to be responsible for attack, disguise, safety evaluation, and disguise evaluation tasks. After that, we design attack and disguise game algorithms to optimize the game strategies of the attacker and the disguiser and use the curriculum learning process to strengthen the capabilities of the agents. The experiments verify that the method in this paper is more effective in strengthening the model's ability to disguise the defense intent compared with other methods. Moreover, our approach can adapt any black-box large model to assist the model in defense and does not suffer from model version iterations.
Deep Privacy Funnel Model: From a Discriminative to a Generative Approach with an Application to Face Recognition
Abstract
In this study, we apply the information-theoretic Privacy Funnel (PF) model to the domain of face recognition, developing a novel method for privacy-preserving representation learning within an end-to-end training framework. Our approach addresses the trade-off between obfuscation and utility in data protection, quantified through logarithmic loss, also known as self-information loss. This research provides a foundational exploration into the integration of information-theoretic privacy principles with representation learning, focusing specifically on the face recognition systems. We particularly highlight the adaptability of our framework with recent advancements in face recognition networks, such as AdaFace and ArcFace. In addition, we introduce the Generative Privacy Funnel ($\mathsf{GenPF}$) model, a paradigm that extends beyond the traditional scope of the PF model, referred to as the Discriminative Privacy Funnel ($\mathsf{DisPF}$). This $\mathsf{GenPF}$ model brings new perspectives on data generation methods with estimation-theoretic and information-theoretic privacy guarantees. Complementing these developments, we also present the deep variational PF (DVPF) model. This model proposes a tractable variational bound for measuring information leakage, enhancing the understanding of privacy preservation challenges in deep representation learning. The DVPF model, associated with both $\mathsf{DisPF}$ and $\mathsf{GenPF}$ models, sheds light on connections with various generative models such as Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), and Diffusion models. Complementing our theoretical contributions, we release a reproducible PyTorch package, facilitating further exploration and application of these privacy-preserving methodologies in face recognition systems.
Mixing Individual and Collective Behaviours to Predict Out-of-Routine Mobility
Abstract
Predicting human displacements is crucial for addressing various societal challenges, including urban design, traffic congestion, epidemic management, and migration dynamics. While predictive models like deep learning and Markov models offer insights into individual mobility, they often struggle with out-of-routine behaviours. Our study introduces an approach that dynamically integrates individual and collective mobility behaviours, leveraging collective intelligence to enhance prediction accuracy. Evaluating the model on millions of privacy-preserving trajectories across three US cities, we demonstrate its superior performance in predicting out-of-routine mobility, surpassing even advanced deep learning methods. Spatial analysis highlights the model's effectiveness near urban areas with a high density of points of interest, where collective behaviours strongly influence mobility. During disruptive events like the COVID-19 pandemic, our model retains predictive capabilities, unlike individual-based models. By bridging the gap between individual and collective behaviours, our approach offers transparent and accurate predictions, crucial for addressing contemporary mobility challenges.
Federated Computing -- Survey on Building Blocks, Extensions and Systems
Authors: René Schwermer, Ruben Mayer, Hans-Arno Jacobsen
Abstract
In response to the increasing volume and sensitivity of data, traditional centralized computing models face challenges, such as data security breaches and regulatory hurdles. Federated Computing (FC) addresses these concerns by enabling collaborative processing without compromising individual data privacy. This is achieved through a decentralized network of devices, each retaining control over its data, while participating in collective computations. The motivation behind FC extends beyond technical considerations to encompass societal implications. As the need for responsible AI and ethical data practices intensifies, FC aligns with the principles of user empowerment and data sovereignty. FC comprises of Federated Learning (FL) and Federated Analytics (FA). FC systems became more complex over time and they currently lack a clear definition and taxonomy describing its moving pieces. Current surveys capture domain-specific FL use cases, describe individual components in an FC pipeline individually or decoupled from each other, or provide a quantitative overview of the number of published papers. This work surveys more than 150 papers to distill the underlying structure of FC systems with their basic building blocks, extensions, architecture, environment, and motivation. We capture FL and FA systems individually and point out unique difference between those two.
Guarantees of confidentiality via Hammersley-Chapman-Robbins bounds
Authors: Kamalika Chaudhuri, Chuan Guo, Laurens van der Maaten, Saeed Mahloujifar, Mark Tygert
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computers and Society (cs.CY); Machine Learning (stat.ML)
Abstract
Protecting privacy during inference with deep neural networks is possible by adding noise to the activations in the last layers prior to the final classifiers or other task-specific layers. The activations in such layers are known as "features" (or, less commonly, as "embeddings" or "feature embeddings"). The added noise helps prevent reconstruction of the inputs from the noisy features. Lower bounding the variance of every possible unbiased estimator of the inputs quantifies the confidentiality arising from such added noise. Convenient, computationally tractable bounds are available from classic inequalities of Hammersley and of Chapman and Robbins -- the HCR bounds. Numerical experiments indicate that the HCR bounds are on the precipice of being effectual for small neural nets with the data sets, "MNIST" and "CIFAR-10," which contain 10 classes each for image classification. The HCR bounds appear to be insufficient on their own to guarantee confidentiality of the inputs to inference with standard deep neural nets, "ResNet-18" and "Swin-T," pre-trained on the data set, "ImageNet-1000," which contains 1000 classes. Supplementing the addition of noise to features with other methods for providing confidentiality may be warranted in the case of ImageNet. In all cases, the results reported here limit consideration to amounts of added noise that incur little degradation in the accuracy of classification from the noisy features. Thus, the added noise enhances confidentiality without much reduction in the accuracy on the task of image classification.
Keyword: machine learning
Remote sensing framework for geological mapping via stacked autoencoders and clustering
Authors: Sandeep Nagar, Ehsan Farahbakhsh, Joseph Awange, Rohitash Chandra
Abstract
Supervised learning methods for geological mapping via remote sensing face limitations due to the scarcity of accurately labelled training data. In contrast, unsupervised learning methods, such as dimensionality reduction and clustering have the ability to uncover patterns and structures in remote sensing data without relying on predefined labels. Dimensionality reduction methods have the potential to play a crucial role in improving the accuracy of geological maps. Although conventional dimensionality reduction methods may struggle with nonlinear data, unsupervised deep learning models such as autoencoders have the ability to model nonlinear relationship in data. Stacked autoencoders feature multiple interconnected layers to capture hierarchical data representations that can be useful for remote sensing data. In this study, we present an unsupervised machine learning framework for processing remote sensing data by utilizing stacked autoencoders for dimensionality reduction and k-means clustering for mapping geological units. We use the Landsat-8, ASTER, and Sentinel-2 datasets of the Mutawintji region in Western New South Wales, Australia to evaluate the framework for geological mapping. We also provide a comparison of stacked autoencoders with principal component analysis and canonical autoencoders. Our results reveal that the framework produces accurate and interpretable geological maps, efficiently discriminating rock units. We find that the stacked autoencoders provide better accuracy when compared to the counterparts. We also find that the generated maps align with prior geological knowledge of the study area while providing novel insights into geological structures.
Leveraging Machine Learning for Early Autism Detection via INDT-ASD Indian Database
Abstract
Machine learning (ML) has advanced quickly, particularly throughout the area of health care. The diagnosis of neurodevelopment problems using ML is a very important area of healthcare. Autism spectrum disorder (ASD) is one of the developmental disorders that is growing the fastest globally. The clinical screening tests used to identify autistic symptoms are expensive and time-consuming. But now that ML has been advanced, it's feasible to identify autism early on. Previously, many different techniques have been used in investigations. Still, none of them have produced the anticipated outcomes when it comes to the capacity to predict autistic features utilizing a clinically validated Indian ASD database. Therefore, this study aimed to develop a simple, quick, and inexpensive technique for identifying ASD by using ML. Various machine learning classifiers, including Adaboost (AB), Gradient Boost (GB), Decision Tree (DT), Logistic Regression (LR), Random Forest (RF), Gaussian Naive Bayes (GNB), Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), K-Nearest Neighbors (KNN), and Support Vector Machine (SVM), were used to develop the autism prediction model. The proposed method was tested with records from the AIIMS Modified INDT-ASD (AMI) database, which were collected through an application developed by AIIMS in Delhi, India. Feature engineering has been applied to make the proposed solution easier than already available solutions. Using the proposed model, we succeeded in predicting ASD using a minimized set of 20 questions rather than the 28 questions presented in AMI with promising accuracy. In a comparative evaluation, SVM emerged as the superior model among others, with 100 $\pm$ 0.05\% accuracy, higher recall by 5.34\%, and improved accuracy by 2.22\%-6.67\% over RF. We have also introduced a web-based solution supporting both Hindi and English.
A shared compilation stack for distributed-memory parallelism in stencil DSLs
Authors: George Bisbas, Anton Lydike, Emilien Bauer, Nick Brown, Mathieu Fehr, Lawrence Mitchell, Gabriel Rodriguez-Canal, Maurice Jamieson, Paul H. J. Kelly, Michel Steuwer, Tobias Grosser
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Mathematical Software (cs.MS)
Abstract
Domain Specific Languages (DSLs) increase programmer productivity and provide high performance. Their targeted abstractions allow scientists to express problems at a high level, providing rich details that optimizing compilers can exploit to target current- and next-generation supercomputers. The convenience and performance of DSLs come with significant development and maintenance costs. The siloed design of DSL compilers and the resulting inability to benefit from shared infrastructure cause uncertainties around longevity and the adoption of DSLs at scale. By tailoring the broadly-adopted MLIR compiler framework to HPC, we bring the same synergies that the machine learning community already exploits across their DSLs (e.g. Tensorflow, PyTorch) to the finite-difference stencil HPC community. We introduce new HPC-specific abstractions for message passing targeting distributed stencil computations. We demonstrate the sharing of common components across three distinct HPC stencil-DSL compilers: Devito, PSyclone, and the Open Earth Compiler, showing that our framework generates high-performance executables based upon a shared compiler ecosystem.
Virtual Sensor for Real-Time Bearing Load Prediction Using Heterogeneous Temporal Graph Neural Networks
Authors: Mengjie Zhao, Cees Taal, Stephan Baggerohr, Olga Fink
Abstract
Accurate bearing load monitoring is essential for their Prognostics and Health Management (PHM), enabling damage assessment, wear prediction, and proactive maintenance. While bearing sensors are typically placed on the bearing housing, direct load monitoring requires sensors inside the bearing itself. Recently introduced sensor rollers enable direct bearing load monitoring but are constrained by their battery life. Data-driven virtual sensors can learn from sensor roller data collected during a batterys lifetime to map operating conditions to bearing loads. Although spatially distributed bearing sensors offer insights into load distribution (e.g., correlating temperature with load), traditional machine learning algorithms struggle to fully exploit these spatial-temporal dependencies. To address this gap, we introduce a graph-based virtual sensor that leverages Graph Neural Networks (GNNs) to analyze spatial-temporal dependencies among sensor signals, mapping existing measurements (temperature, vibration) to bearing loads. Since temperature and vibration signals exhibit vastly different dynamics, we propose Heterogeneous Temporal Graph Neural Networks (HTGNN), which explicitly models these signal types and their interactions for effective load prediction. Our results demonstrate that HTGNN outperforms Convolutional Neural Networks (CNNs), which struggle to capture both spatial and heterogeneous signal characteristics. These findings highlight the importance of capturing the complex spatial interactions between temperature, vibration, and load.
Heat Death of Generative Models in Closed-Loop Learning
Authors: Matteo Marchi, Stefano Soatto, Pratik Chaudhari, Paulo Tabuada
Abstract
Improvement and adoption of generative machine learning models is rapidly accelerating, as exemplified by the popularity of LLMs (Large Language Models) for text, and diffusion models for image generation.As generative models become widespread, data they generate is incorporated into shared content through the public web. This opens the question of what happens when data generated by a model is fed back to the model in subsequent training campaigns. This is a question about the stability of the training process, whether the distribution of publicly accessible content, which we refer to as "knowledge", remains stable or collapses. Small scale empirical experiments reported in the literature show that this closed-loop training process is prone to degenerating. Models may start producing gibberish data, or sample from only a small subset of the desired data distribution (a phenomenon referred to as mode collapse). So far there has been only limited theoretical understanding of this process, in part due to the complexity of the deep networks underlying these generative models. The aim of this paper is to provide insights into this process (that we refer to as "generative closed-loop learning") by studying the learning dynamics of generative models that are fed back their own produced content in addition to their original training dataset. The sampling of many of these models can be controlled via a "temperature" parameter. Using dynamical systems tools, we show that, unless a sufficient amount of external data is introduced at each iteration, any non-trivial temperature leads the model to asymptotically degenerate. In fact, either the generative distribution collapses to a small set of outputs, or becomes uniform over a large set of outputs.
Effective Malware Detection for Embedded Computing Systems with Limited Exposure
Abstract
One of the pivotal security threats for the embedded computing systems is malicious software a.k.a malware. With efficiency and efficacy, Machine Learning (ML) has been widely adopted for malware detection in recent times. Despite being efficient, the existing techniques require a tremendous number of benign and malware samples for training and modeling an efficient malware detector. Furthermore, such constraints limit the detection of emerging malware samples due to the lack of sufficient malware samples required for efficient training. To address such concerns, we introduce a code-aware data generation technique that generates multiple mutated samples of the limitedly seen malware by the devices. Loss minimization ensures that the generated samples closely mimic the limitedly seen malware and mitigate the impractical samples. Such developed malware is further incorporated into the training set to formulate the model that can efficiently detect the emerging malware despite having limited exposure. The experimental results demonstrates that the proposed technique achieves an accuracy of 90% in detecting limitedly seen malware, which is approximately 3x more than the accuracy attained by state-of-the-art techniques.
Attribution Regularization for Multimodal Paradigms
Authors: Sahiti Yerramilli, Jayant Sravan Tamarapalli, Jonathan Francis, Eric Nyberg
Abstract
Multimodal machine learning has gained significant attention in recent years due to its potential for integrating information from multiple modalities to enhance learning and decision-making processes. However, it is commonly observed that unimodal models outperform multimodal models, despite the latter having access to richer information. Additionally, the influence of a single modality often dominates the decision-making process, resulting in suboptimal performance. This research project aims to address these challenges by proposing a novel regularization term that encourages multimodal models to effectively utilize information from all modalities when making decisions. The focus of this project lies in the video-audio domain, although the proposed regularization technique holds promise for broader applications in embodied AI research, where multiple modalities are involved. By leveraging this regularization term, the proposed approach aims to mitigate the issue of unimodal dominance and improve the performance of multimodal machine learning systems. Through extensive experimentation and evaluation, the effectiveness and generalizability of the proposed technique will be assessed. The findings of this research project have the potential to significantly contribute to the advancement of multimodal machine learning and facilitate its application in various domains, including multimedia analysis, human-computer interaction, and embodied AI research.
Obfuscated Malware Detection: Investigating Real-world Scenarios through Memory Analysis
Authors: S M Rakib Hasan, Aakar Dhakal
Subjects: Cryptography and Security (cs.CR); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract
In the era of the internet and smart devices, the detection of malware has become crucial for system security. Malware authors increasingly employ obfuscation techniques to evade advanced security solutions, making it challenging to detect and eliminate threats. Obfuscated malware, adept at hiding itself, poses a significant risk to various platforms, including computers, mobile devices, and IoT devices. Conventional methods like heuristic-based or signature-based systems struggle against this type of malware, as it leaves no discernible traces on the system. In this research, we propose a simple and cost-effective obfuscated malware detection system through memory dump analysis, utilizing diverse machine-learning algorithms. The study focuses on the CIC-MalMem-2022 dataset, designed to simulate real-world scenarios and assess memory-based obfuscated malware detection. We evaluate the effectiveness of machine learning algorithms, such as decision trees, ensemble methods, and neural networks, in detecting obfuscated malware within memory dumps. Our analysis spans multiple malware categories, providing insights into algorithmic strengths and limitations. By offering a comprehensive assessment of machine learning algorithms for obfuscated malware detection through memory analysis, this paper contributes to ongoing efforts to enhance cybersecurity and fortify digital ecosystems against evolving and sophisticated malware threats. The source code is made open-access for reproducibility and future research endeavours. It can be accessed at https://bit.ly/MalMemCode.
Designing a Photonic Physically Unclonable Function Having Resilience to Machine Learning Attacks
Authors: Elena R. Henderson, Jessie M. Henderson, Hiva Shahoei, William V. Oxford, Eric C. Larson, Duncan L. MacFarlane, Mitchell A. Thornton
Subjects: Cryptography and Security (cs.CR); Optics (physics.optics)
Abstract
Physically unclonable functions (PUFs) are designed to act as device 'fingerprints.' Given an input challenge, the PUF circuit should produce an unpredictable response for use in situations such as root-of-trust applications and other hardware-level cybersecurity applications. PUFs are typically subcircuits present within integrated circuits (ICs), and while conventional IC PUFs are well-understood, several implementations have proven vulnerable to malicious exploits, including those perpetrated by machine learning (ML)-based attacks. Such attacks can be difficult to prevent because they are often designed to work even when relatively few challenge-response pairs are known in advance. Hence the need for both more resilient PUF designs and analysis of ML-attack susceptibility. Previous work has developed a PUF for photonic integrated circuits (PICs). A PIC PUF not only produces unpredictable responses given manufacturing-introduced tolerances, but is also less prone to electromagnetic radiation eavesdropping attacks than a purely electronic IC PUF. In this work, we analyze the resilience of the proposed photonic PUF when subjected to ML-based attacks. Specifically, we describe a computational PUF model for producing the large datasets required for training ML attacks; we analyze the quality of the model; and we discuss the modeled PUF's susceptibility to ML-based attacks. We find that the modeled PUF generates distributions that resemble uniform white noise, explaining the exhibited resilience to neural-network-based attacks designed to exploit latent relationships between challenges and responses. Preliminary analysis suggests that the PUF exhibits similar resilience to generative adversarial networks, and continued development will show whether more-sophisticated ML approaches better compromise the PUF and -- if so -- how design modifications might improve resilience.
Task Agnostic Architecture for Algorithm Induction via Implicit Composition
Abstract
Different fields in applied machine learning such as computer vision, speech or natural language processing have been building domain-specialised solutions. Currently, we are witnessing an opposing trend towards developing more generalist architectures, driven by Large Language Models and multi-modal foundational models. These architectures are designed to tackle a variety of tasks, including those previously unseen and using inputs across multiple modalities. Taking this trend of generalization to the extreme suggests the possibility of a single deep network architecture capable of solving all tasks. This position paper aims to explore developing such a unified architecture and proposes a theoretical framework of how it could be constructed. Our proposal is based on the following assumptions. Firstly, tasks are solved by following a sequence of instructions, typically implemented in code for conventional computing hardware, which inherently operates sequentially. Second, recent Generative AI, especially Transformer-based models, demonstrate potential as an architecture capable of constructing algorithms for a wide range of domains. For example, GPT-4 shows exceptional capability at in-context learning of novel tasks which is hard to explain in any other way than the ability to compose novel solutions from fragments on previously learnt algorithms. Third, the observation that the main missing component in developing a truly generalised network is an efficient approach for self-consistent input of previously learnt sub-steps of an algorithm and their (implicit) composition during the network's internal forward pass. Our exploration delves into current capabilities and limitations of Transformer-based and other methods in efficient and correct algorithm composition and proposes a Transformer-like architecture as well as a discrete learning framework to overcome these limitations.
Creating a Trajectory for Code Writing: Algorithmic Reasoning Tasks
Authors: Shruthi Ravikumar, Margaret Hamilton, Charles Thevathayan, Maria Spichkova, Kashif Ali, Gayan Wijesinghe
Subjects: Software Engineering (cs.SE); Programming Languages (cs.PL)
Abstract
Many students in introductory programming courses fare poorly in the code writing tasks of the final summative assessment. Such tasks are designed to assess whether novices have developed the analytical skills to translate from the given problem domain to coding. In the past researchers have used instruments such as code-explain and found that the extent of cognitive depth reached in these tasks correlated well with code writing ability. However, the need for manual marking and personalized interviews used for identifying cognitive difficulties limited the study to a small group of stragglers. To extend this work to larger groups, we have devised several question types with varying cognitive demands collectively called Algorithmic Reasoning Tasks (ARTs), which do not require manual marking. These tasks require levels of reasoning which can define a learning trajectory. This paper describes these instruments and the machine learning models used for validating them. We have used the data collected in an introductory programming course in the penultimate week of the semester which required attempting ART type instruments and code writing. Our preliminary research suggests ART type instruments can be combined with specific machine learning models to act as an effective learning trajectory and early prediction of code-writing skills.
Abstract
In this mini-review, we explore the new prediction methods for drug combination synergy relying on high-throughput combinatorial screens. The fast progress of the field is witnessed in the more than thirty original machine learning methods published since 2021, a clear majority of them based on deep learning techniques. We aim to put these papers under a unifying lens by highlighting the core technologies, the data sources, the input data types and synergy scores used in the methods, as well as the prediction scenarios and evaluation protocols that the papers deal with. Our finding is that the best methods accurately solve the synergy prediction scenarios involving known drugs or cell lines while the scenarios involving new drugs or cell lines still fall short of an accurate prediction level.
An Interpretable Client Decision Tree Aggregation process for Federated Learning
Authors: Alberto Argente-Garrido, Cristina Zuheros, M. Victoria Luzón, Francisco Herrera
Abstract
Trustworthy Artificial Intelligence solutions are essential in today's data-driven applications, prioritizing principles such as robustness, safety, transparency, explainability, and privacy among others. This has led to the emergence of Federated Learning as a solution for privacy and distributed machine learning. While decision trees, as self-explanatory models, are ideal for collaborative model training across multiple devices in resource-constrained environments such as federated learning environments for injecting interpretability in these models. Decision tree structure makes the aggregation in a federated learning environment not trivial. They require techniques that can merge their decision paths without introducing bias or overfitting while keeping the aggregated decision trees robust and generalizable. In this paper, we propose an Interpretable Client Decision Tree Aggregation process for Federated Learning scenarios that keeps the interpretability and the precision of the base decision trees used for the aggregation. This model is based on aggregating multiple decision paths of the decision trees and can be used on different decision tree types, such as ID3 and CART. We carry out the experiments within four datasets, and the analysis shows that the tree built with the model improves the local models, and outperforms the state-of-the-art.
Space-time parallel scaling of Parareal with a Fourier Neural Operator as coarse propagator
Authors: Abdul Qadir Ibrahim, Sebastian Götschel, Daniel Ruprecht
Abstract
Iterative parallel-in-time algorithms like Parareal can extend scaling beyond the saturation of purely spatial parallelization when solving initial value problems. However, they require the user to build coarse models to handle the inevitably serial transport of information in time.This is a time consuming and difficult process since there is still only limited theoretical insight into what constitutes a good and efficient coarse model. Novel approaches from machine learning to solve differential equations could provide a more generic way to find coarse level models for parallel-in-time algorithms. This paper demonstrates that a physics-informed Fourier Neural Operator (PINO) is an effective coarse model for the parallelization in time of the two-asset Black-Scholes equation using Parareal. We demonstrate that PINO-Parareal converges as fast as a bespoke numerical coarse model and that, in combination with spatial parallelization by domain decomposition, it provides better overall speedup than both purely spatial parallelization and space-time parallelizaton with a numerical coarse propagator.
CSEPrompts: A Benchmark of Introductory Computer Science Prompts
Abstract
Recent advances in AI, machine learning, and NLP have led to the development of a new generation of Large Language Models (LLMs) that are trained on massive amounts of data and often have trillions of parameters. Commercial applications (e.g., ChatGPT) have made this technology available to the general public, thus making it possible to use LLMs to produce high-quality texts for academic and professional purposes. Schools and universities are aware of the increasing use of AI-generated content by students and they have been researching the impact of this new technology and its potential misuse. Educational programs in Computer Science (CS) and related fields are particularly affected because LLMs are also capable of generating programming code in various programming languages. To help understand the potential impact of publicly available LLMs in CS education, we introduce CSEPrompts, a framework with hundreds of programming exercise prompts and multiple-choice questions retrieved from introductory CS and programming courses. We also provide experimental results on CSEPrompts to evaluate the performance of several LLMs with respect to generating Python code and answering basic computer science and programming questions.
Adaptive hp-Polynomial Based Sparse Grid Collocation Algorithms for Piecewise Smooth Functions with Kinks
Abstract
High-dimensional interpolation problems appear in various applications of uncertainty quantification, stochastic optimization and machine learning. Such problems are computationally expensive and request the use of adaptive grid generation strategies like anisotropic sparse grids to mitigate the curse of dimensionality. However, it is well known that the standard dimension-adaptive sparse grid method converges very slowly or even fails in the case of non-smooth functions. For piecewise smooth functions with kinks, we construct two novel hp-adaptive sparse grid collocation algorithms that combine low-order basis functions with local support in parts of the domain with less regularity and variable-order basis functions elsewhere. Spatial refinement is realized by means of a hierarchical multivariate knot tree which allows the construction of localised hierarchical basis functions with varying order. Hierarchical surplus is used as an error indicator to automatically detect the non-smooth region and adaptively refine the collocation points there. The local polynomial degrees are optionally selected by a greedy approach or a kink detection procedure. Three numerical benchmark examples with different dimensions are discussed and comparison with locally linear and highest degree basis functions are given to show the efficiency and accuracy of the proposed methods.
On Future Power Systems Digital Twins: Towards a Standard Architecture
Authors: Wouter Zomerdijk, Peter Palensky, Tarek AlSkaif, Pedro P. Vergara
Abstract
The energy sector's digital transformation brings mutually dependent communication and energy infrastructure, tightening the relationship between the physical and the digital world. Digital twins (DT) are the key concept for this. This paper initially discusses the evolution of the DT concept across various engineering applications before narrowing its focus to the power systems domain. By reviewing different definitions and applications, we present a new definition of DTs specifically tailored to power systems. Based on the proposed definition and extensive deliberations and consultations with distribution system operators, energy traders, and municipalities, we introduce a standard DT ecosystem architecture that offers services beyond real-time updates and can seamlessly integrate with existing transmission and distribution system operators' processes, while reconciling with concepts such as microgrids and local energy communities based on a system-of-systems view. We also discuss the integration of power system DTs into various phases of the system's life cycle, such as long-term planning, emphasizing challenges that remain to be addressed, such as managing measurement and model errors, and uncertainty propagation. Finally, we present our vision of how artificial intelligence (AI) and machine learning (ML) can enhance several power system DT modules established in the proposed architecture.
Learning Alternative Ways of Performing a Task
Authors: David Nieves, María José Ramírez-Quintana, Carlos Monserrat, César Ferri, José Hernández-Orallo
Abstract
A common way of learning to perform a task is to observe how it is carried out by experts. However, it is well known that for most tasks there is no unique way to perform them. This is especially noticeable the more complex the task is because factors such as the skill or the know-how of the expert may well affect the way she solves the task. In addition, learning from experts also suffers of having a small set of training examples generally coming from several experts (since experts are usually a limited and expensive resource), being all of them positive examples (i.e. examples that represent successful executions of the task). Traditional machine learning techniques are not useful in such scenarios, as they require extensive training data. Starting from very few executions of the task presented as activity sequences, we introduce a novel inductive approach for learning multiple models, with each one representing an alternative strategy of performing a task. By an iterative process based on generalisation and specialisation, we learn the underlying patterns that capture the different styles of performing a task exhibited by the examples. We illustrate our approach on two common activity recognition tasks: a surgical skills training task and a cooking domain. We evaluate the inferred models with respect to two metrics that measure how well the models represent the examples and capture the different forms of executing a task showed by the examples. We compare our results with the traditional process mining approach and show that a small set of meaningful examples is enough to obtain patterns that capture the different strategies that are followed to solve the tasks.
LightFAt: Mitigating Control-flow Explosion via Lightweight PMU-based Control-flow Attestation
Authors: Jeferson Gonzalez-Gomez, Hassan Nassar, Lars Bauer, Jorg Henkel
Abstract
With the continuous evolution of computational devices, more and more applications are being executed remotely. The applications operate on a wide spectrum of devices, ranging from IoT nodes with low computational capabilities to large cloud providers with high capabilities. Remote execution often deals with sensitive data or executes proprietary software. Hence, the challenge of ensuring that the code execution will not be compromised rises. Remote Attestation deals with this challenge. It ensures the code is executed in a non-compromised environment by calculating a potentially large sequence of cryptographic hash values. Each hash calculation is computationally intensive and over a large sequence the overhead becomes extremely high. In this work, we propose LightFAt: a Lightweight Control Flow Attestation scheme. Instead of relying on the expensive cryptographic hash calculation, LightFAt leverages the readings from the processor's Performance Monitor Unit (PMU) in conjunction with a lightweight unsupervised machine learning (ML) classifier to detect whether a target application's control flow is compromised, hence improving the system's security. On the verifier's side, LightFAt reaches a detection accuracy of over 95%, with low false-negative and false-positive rates.
Towards detecting unanticipated bias in Large Language Models
Authors: Anna Kruspe
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract
Over the last year, Large Language Models (LLMs) like ChatGPT have become widely available and have exhibited fairness issues similar to those in previous machine learning systems. Current research is primarily focused on analyzing and quantifying these biases in training data and their impact on the decisions of these models, alongside developing mitigation strategies. This research largely targets well-known biases related to gender, race, ethnicity, and language. However, it is clear that LLMs are also affected by other, less obvious implicit biases. The complex and often opaque nature of these models makes detecting such biases challenging, yet this is crucial due to their potential negative impact in various applications. In this paper, we explore new avenues for detecting these unanticipated biases in LLMs, focusing specifically on Uncertainty Quantification and Explainable AI methods. These approaches aim to assess the certainty of model decisions and to make the internal decision-making processes of LLMs more transparent, thereby identifying and understanding biases that are not immediately apparent. Through this research, we aim to contribute to the development of fairer and more transparent AI systems.
Adversarial Attacks and Dimensionality in Text Classifiers
Abstract
Adversarial attacks on machine learning algorithms have been a key deterrent to the adoption of AI in many real-world use cases. They significantly undermine the ability of high-performance neural networks by forcing misclassifications. These attacks introduce minute and structured perturbations or alterations in the test samples, imperceptible to human annotators in general, but trained neural networks and other models are sensitive to it. Historically, adversarial attacks have been first identified and studied in the domain of image processing. In this paper, we study adversarial examples in the field of natural language processing, specifically text classification tasks. We investigate the reasons for adversarial vulnerability, particularly in relation to the inherent dimensionality of the model. Our key finding is that there is a very strong correlation between the embedding dimensionality of the adversarial samples and their effectiveness on models tuned with input samples with same embedding dimension. We utilize this sensitivity to design an adversarial defense mechanism. We use ensemble models of varying inherent dimensionality to thwart the attacks. This is tested on multiple datasets for its efficacy in providing robustness. We also study the problem of measuring adversarial perturbation using different distance metrics. For all of the aforementioned studies, we have run tests on multiple models with varying dimensionality and used a word-vector level adversarial attack to substantiate the findings.
Continual Learning of Numerous Tasks from Long-tail Distributions
Abstract
Continual learning, an important aspect of artificial intelligence and machine learning research, focuses on developing models that learn and adapt to new tasks while retaining previously acquired knowledge. Existing continual learning algorithms usually involve a small number of tasks with uniform sizes and may not accurately represent real-world learning scenarios. In this paper, we investigate the performance of continual learning algorithms with a large number of tasks drawn from a task distribution that is long-tail in terms of task sizes. We design one synthetic dataset and two real-world continual learning datasets to evaluate the performance of existing algorithms in such a setting. Moreover, we study an overlooked factor in continual learning, the optimizer states, e.g. first and second moments in the Adam optimizer, and investigate how it can be used to improve continual learning performance. We propose a method that reuses the optimizer states in Adam by maintaining a weighted average of the second moments from previous tasks. We demonstrate that our method, compatible with most existing continual learning algorithms, effectively reduces forgetting with only a small amount of additional computational or memory costs, and provides further improvements on existing continual learning algorithms, particularly in a long-tail task sequence.
Closing the Implementation Gap in MC: Fully Chemical Synchronization and Detection for Cellular Receivers
Authors: Bastian Heinlein, Lukas Brand, Malcolm Egan, Maximilian Schäfer, Robert Schober, Sebastian Lotter
Abstract
In the context of the Internet of Bio-Nano Things (IoBNT), nano-devices are envisioned to perform complex tasks collaboratively, i.e., by communicating with each other. One candidate for the implementation of such devices are engineered cells due to their inherent biocompatibility. However, because each engineered cell has only little computational capabilities, transmitter and receiver (RX) functionalities can afford only limited complexity. In this paper, we propose a simple, yet modular, architecture for a cellular RX that is capable of processing a stream of observed symbols using chemical reaction networks. Furthermore, we propose two specific detector implementations for the RX. The first detector is based on a machine learning model that is trained offline, i.e., before the cellular RX is deployed. The second detector utilizes pilot symbol-based training and is therefore able to continuously adapt to changing channel conditions online, i.e., after deployment. To coordinate the different chemical processing steps involved in symbol detection, the proposed cellular RX leverages an internal chemical timer. Furthermore, the RX is synchronized with the transmitter via external, i.e., extracellular, signals. Finally, the proposed architecture is validated using theoretical analysis and stochastic simulations. The presented results confirm the feasibility of both proposed implementations and reveal that the proposed online learning-based RX is able to perform reliable detection even in initially unknown or slowly changing channels. By its modular design and exclusively chemical implementation, the proposed RX contributes towards the realization of versatile and biocompatible nano-scale communication networks for IoBNT applications narrowing the existing implementation gap in cellular molecular communication (MC).
Empowering Biomedical Discovery with AI Agents
Authors: Shanghua Gao, Ada Fang, Yepeng Huang, Valentina Giunchiglia, Ayush Noori, Jonathan Richard Schwarz, Yasha Ektefaie, Jovana Kondic, Marinka Zitnik
Abstract
We envision 'AI scientists' as systems capable of skeptical learning and reasoning that empower biomedical research through collaborative agents that integrate machine learning tools with experimental platforms. Rather than taking humans out of the discovery process, biomedical AI agents combine human creativity and expertise with AI's ability to analyze large datasets, navigate hypothesis spaces, and execute repetitive tasks. AI agents are proficient in a variety of tasks, including self-assessment and planning of discovery workflows. These agents use large language models and generative models to feature structured memory for continual learning and use machine learning tools to incorporate scientific knowledge, biological principles, and theories. AI agents can impact areas ranging from hybrid cell simulation, programmable control of phenotypes, and the design of cellular circuits to the development of new therapies.
"Are Adversarial Phishing Webpages a Threat in Reality?" Understanding the Users' Perception of Adversarial Webpages
Authors: Ying Yuan, Qingying Hao, Giovanni Apruzzese, Mauro Conti, Gang Wang
Abstract
Machine learning based phishing website detectors (ML-PWD) are a critical part of today's anti-phishing solutions in operation. Unfortunately, ML-PWD are prone to adversarial evasions, evidenced by both academic studies and analyses of real-world adversarial phishing webpages. However, existing works mostly focused on assessing adversarial phishing webpages against ML-PWD, while neglecting a crucial aspect: investigating whether they can deceive the actual target of phishing -- the end users. In this paper, we fill this gap by conducting two user studies (n=470) to examine how human users perceive adversarial phishing webpages, spanning both synthetically crafted ones (which we create by evading a state-of-the-art ML-PWD) as well as real adversarial webpages (taken from the wild Web) that bypassed a production-grade ML-PWD. Our findings confirm that adversarial phishing is a threat to both users and ML-PWD, since most adversarial phishing webpages have comparable effectiveness on users w.r.t. unperturbed ones. However, not all adversarial perturbations are equally effective. For example, those with added typos are significantly more noticeable to users, who tend to overlook perturbations of higher visual magnitude (such as replacing the background). We also show that users' self-reported frequency of visiting a brand's website has a statistically negative correlation with their phishing detection accuracy, which is likely caused by overconfidence. We release our resources.
AI-augmented Automation for Real Driving Prediction: an Industrial Use Case
Authors: Romina Eramo, Hamzeh Eyal Salman, Matteo Spezialetti, Darko Stern, Pierre Quinton, Antonio Cicchetti
Abstract
The risen complexity of automotive systems requires new development strategies and methods to master the upcoming challenges. Traditional methods need thus to be changed by an increased level of automation, and a faster continuous improvement cycle. In this context, current vehicle performance tests represent a very time-consuming and expensive task due to the need to perform the tests in real driving conditions. As a consequence, agile/iterative processes like DevOps are largely hindered by the necessity of triggering frequent tests. This paper reports on a practical experience of developing an AI-augmented solution based on Machine Learning and Model-based Engineering to support continuous vehicle development and testing. In particular, historical data collected in real driving conditions is leveraged to synthesize a high-fidelity driving simulator and hence enable performance tests in virtual environments. Based on this practical experience, this paper also proposes a conceptual framework to support predictions based on real driving behavior.
Abstract
Human Activity Recognition is a subject of great research today and has its applications in remote healthcare, activity tracking of the elderly or the disables, calories burnt tracking etc. In our project, we have created an Android application that recognizes the daily human activities and calculate the calories burnt in real time. We first captured labeled triaxial acceleration readings for different daily human activities from the smartphone's embedded accelerometer. These readings were preprocessed using a median filter. 42 features were extracted using various methods. We then tested various machine learning algorithms along with dimensionality reduction. Finally, in our Android application, we used the machine learning algorithm and a subset of features that provided maximum accuracy and minimum model building time. This is used for real-time activity recognition and calculation of calories burnt using a formula based on Metabolic Equivalent.
Keyword: optimization
Trainable Least Squares to Reduce PAPR in OFDM-based Hybrid Beamforming Systems
Authors: Andrey Ivanov, Alexander Osinsky, Roman Bychkov, Vladimir Kalinin, Dmitry Lakontsev
Subjects: Other Computer Science (cs.OH); Information Theory (cs.IT)
Abstract
In this paper, we propose a trainable least squares (LS) approach for reducing the peak-to-average power ratio (PAPR) of orthogonal frequency division multiplexing (OFDM) signals in a hybrid beamforming (HBF) system. Compared to digital beamforming (DBF), in HBF technology the number of antennas exceeds the number of digital ports. Therefore, PAPR reduction capabilities are restricted by both a limited bandwidth and the reduced size of digital space. The problem is to meet both conditions. Moreover, the major HBF advantage is a reduced system complexity, thus the complexity of the PAPR reduction algorithm is expected to be low. To justify the performance of the proposed trainable LS, we provide a performance bound achieved by convex optimization using the CVX Matlab package. Moreover, the complexity of the proposed algorithm can be comparable to the minimal complexity of the digital ``twin'' calculation in HBF. The abovementioned features prove the feasibility of the trained LS for PAPR reduction in fully-connected HBF.
An Online Joint Optimization Approach for QoE Maximization in UAV-Enabled Mobile Edge Computing
Abstract
Given flexible mobility, rapid deployment, and low cost, unmanned aerial vehicle (UAV)-enabled mobile edge computing (MEC) shows great potential to compensate for the lack of terrestrial edge computing coverage. However, limited battery capacity, computing and spectrum resources also pose serious challenges for UAV-enabled MEC, which shorten the service time of UAVs and degrade the quality of experience (QoE) of user devices (UDs) {\color{b} without effective control approach}. In this work, we consider a UAV-enabled MEC scenario where a UAV serves as an aerial edge server to provide computing services for multiple ground UDs. Then, a joint task offloading, resource allocation, and UAV trajectory planning optimization problem (JTRTOP) is formulated to maximize the QoE of UDs under the UAV energy consumption constraint. To solve the JTRTOP that is proved to be a future-dependent and NP-hard problem, an online joint optimization approach (OJOA) is proposed. Specifically, the JTRTOP is first transformed into a per-slot real-time optimization problem (PROP) by using the Lyapunov optimization framework. Then, a two-stage optimization method based on game theory and convex optimization is proposed to solve the PROP. Simulation results validate that the proposed approach can achieve superior system performance compared to the other benchmark schemes.
Self-Organized Agents: A LLM Multi-Agent Framework toward Ultra Large-Scale Code Generation and Optimization
Authors: Yoichi Ishibashi, Yoshimasa Nishimura
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Abstract
Recent advancements in automatic code generation using large language model (LLM) agent have brought us closer to the future of automated software development. However, existing single-agent approaches face limitations in generating and improving large-scale, complex codebases due to constraints in context length. To tackle this challenge, we propose Self-Organized multi-Agent framework (SoA), a novel multi-agent framework that enables the scalable and efficient generation and optimization of large-scale code. In SoA, self-organized agents operate independently to generate and modify code components while seamlessly collaborating to construct the overall codebase. A key feature of our framework is the automatic multiplication of agents based on problem complexity, allowing for dynamic scalability. This enables the overall code volume to be increased indefinitely according to the number of agents, while the amount of code managed by each agent remains constant. We evaluate SoA on the HumanEval benchmark and demonstrate that, compared to a single-agent system, each agent in SoA handles significantly less code, yet the overall generated code is substantially greater. Moreover, SoA surpasses the powerful single-agent baseline by 5% in terms of Pass@1 accuracy.
LP++: A Surprisingly Strong Linear Probe for Few-Shot CLIP
Authors: Yunshi Huang, Fereshteh Shakeri, Jose Dolz, Malik Boudiaf, Houda Bahig, Ismail Ben Ayed
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
In a recent, strongly emergent literature on few-shot CLIP adaptation, Linear Probe (LP) has been often reported as a weak baseline. This has motivated intensive research building convoluted prompt learning or feature adaptation strategies. In this work, we propose and examine from convex-optimization perspectives a generalization of the standard LP baseline, in which the linear classifier weights are learnable functions of the text embedding, with class-wise multipliers blending image and text knowledge. As our objective function depends on two types of variables, i.e., the class visual prototypes and the learnable blending parameters, we propose a computationally efficient block coordinate Majorize-Minimize (MM) descent algorithm. In our full-batch MM optimizer, which we coin LP++, step sizes are implicit, unlike standard gradient descent practices where learning rates are intensively searched over validation sets. By examining the mathematical properties of our loss (e.g., Lipschitz gradient continuity), we build majorizing functions yielding data-driven learning rates and derive approximations of the loss's minima, which provide data-informed initialization of the variables. Our image-language objective function, along with these non-trivial optimization insights and ingredients, yields, surprisingly, highly competitive few-shot CLIP performances. Furthermore, LP++ operates in black-box, relaxes intensive validation searches for the optimization hyper-parameters, and runs orders-of-magnitudes faster than state-of-the-art few-shot CLIP adaptation methods. Our code is available at: \url{https://github.com/FereshteShakeri/FewShot-CLIP-Strong-Baseline.git}.
Prompts As Programs: A Structure-Aware Approach to Efficient Compile-Time Prompt Optimization
Authors: Tobias Schnabel, Jennifer Neville
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract
Large language models (LLMs) can now handle longer and more complex inputs, which facilitate the use of more elaborate prompts. However, prompts often require some tuning to improve performance for deployment. Recent work has proposed automatic prompt optimization methods, but as prompt complexity and LLM strength increase, many prompt optimization techniques are no longer sufficient and a new approach is needed to optimize {\em meta prompt programs}. To address this, we introduce SAMMO, a framework for {\em compile-time} optimizations of metaprompt programs, which represent prompts as structured objects that allows for a rich set of transformations that can be searched over during optimization. We show that SAMMO generalizes previous methods and improves the performance of complex prompts on (1) instruction tuning, (2) RAG pipeline tuning, and (3) prompt compression, across several different LLMs. We make all code available open-source at https://github.com/microsoft/sammo .
Comparative Study of Domain Driven Terms Extraction Using Large Language Models
Abstract
Keywords play a crucial role in bridging the gap between human understanding and machine processing of textual data. They are essential to data enrichment because they form the basis for detailed annotations that provide a more insightful and in-depth view of the underlying data. Keyword/domain driven term extraction is a pivotal task in natural language processing, facilitating information retrieval, document summarization, and content categorization. This review focuses on keyword extraction methods, emphasizing the use of three major Large Language Models(LLMs): Llama2-7B, GPT-3.5, and Falcon-7B. We employed a custom Python package to interface with these LLMs, simplifying keyword extraction. Our study, utilizing the Inspec and PubMed datasets, evaluates the performance of these models. The Jaccard similarity index was used for assessment, yielding scores of 0.64 (Inspec) and 0.21 (PubMed) for GPT-3.5, 0.40 and 0.17 for Llama2-7B, and 0.23 and 0.12 for Falcon-7B. This paper underlines the role of prompt engineering in LLMs for better keyword extraction and discusses the impact of hallucination in LLMs on result evaluation. It also sheds light on the challenges in using LLMs for keyword extraction, including model complexity, resource demands, and optimization techniques.
EnergAIze: Multi Agent Deep Deterministic Policy Gradient for Vehicle to Grid Energy Management
Authors: Tiago Fonseca, Luis Ferreira, Bernardo Cabral, Ricardo Severino, Isabel Praca
Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI)
Abstract
This paper investigates the increasing roles of Renewable Energy Sources (RES) and Electric Vehicles (EVs). While indicating a new era of sustainable energy, these also introduce complex challenges, including the need to balance supply and demand and smooth peak consumptions amidst rising EV adoption rates. Addressing these challenges requires innovative solutions such as Demand Response (DR), energy flexibility management, Renewable Energy Communities (RECs), and more specifically for EVs, Vehicle-to-Grid (V2G). However, existing V2G approaches often fall short in real-world adaptability, global REC optimization with other flexible assets, scalability, and user engagement. To bridge this gap, this paper introduces EnergAIze, a Multi-Agent Reinforcement Learning (MARL) energy management framework, leveraging the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm. EnergAIze enables user-centric and multi-objective energy management by allowing each prosumer to select from a range of personal management objectives, thus encouraging engagement. Additionally, it architects' data protection and ownership through decentralized computing, where each prosumer can situate an energy management optimization node directly at their own dwelling. The local node not only manages local energy assets but also fosters REC wide optimization. The efficacy of EnergAIze was evaluated through case studies employing the CityLearn simulation framework. These simulations were instrumental in demonstrating EnergAIze's adeptness at implementing V2G technology within a REC and other energy assets. The results show reduction in peak loads, ramping, carbon emissions, and electricity costs at the REC level while optimizing for individual prosumers objectives.
Abstract
Most 3D Gaussian Splatting (3D-GS) based methods for urban scenes initialize 3D Gaussians directly with 3D LiDAR points, which not only underutilizes LiDAR data capabilities but also overlooks the potential advantages of fusing LiDAR with camera data. In this paper, we design a novel tightly coupled LiDAR-Camera Gaussian Splatting (TCLC-GS) to fully leverage the combined strengths of both LiDAR and camera sensors, enabling rapid, high-quality 3D reconstruction and novel view RGB/depth synthesis. TCLC-GS designs a hybrid explicit (colorized 3D mesh) and implicit (hierarchical octree feature) 3D representation derived from LiDAR-camera data, to enrich the properties of 3D Gaussians for splatting. 3D Gaussian's properties are not only initialized in alignment with the 3D mesh which provides more completed 3D shape and color information, but are also endowed with broader contextual information through retrieved octree implicit features. During the Gaussian Splatting optimization process, the 3D mesh offers dense depth information as supervision, which enhances the training process by learning of a robust geometry. Comprehensive evaluations conducted on the Waymo Open Dataset and nuScenes Dataset validate our method's state-of-the-art (SOTA) performance. Utilizing a single NVIDIA RTX 3090 Ti, our method demonstrates fast training and achieves real-time RGB and depth rendering at 90 FPS in resolution of 1920x1280 (Waymo), and 120 FPS in resolution of 1600x900 (nuScenes) in urban scenarios.
A Unified Editing Method for Co-Speech Gesture Generation via Diffusion Inversion
Authors: Zeyu Zhao, Nan Gao, Zhi Zeng, Guixuan Zhang, Jie Liu, Shuwu Zhang
Abstract
Diffusion models have shown great success in generating high-quality co-speech gestures for interactive humanoid robots or digital avatars from noisy input with the speech audio or text as conditions. However, they rarely focus on providing rich editing capabilities for content creators other than high-level specialized measures like style conditioning. To resolve this, we propose a unified framework utilizing diffusion inversion that enables multi-level editing capabilities for co-speech gesture generation without re-training. The method takes advantage of two key capabilities of invertible diffusion models. The first is that through inversion, we can reconstruct the intermediate noise from gestures and regenerate new gestures from the noise. This can be used to obtain gestures with high-level similarities to the original gestures for different speech conditions. The second is that this reconstruction reduces activation caching requirements during gradient calculation, making the direct optimization on input noises possible on current hardware with limited memory. With different loss functions designed for, e.g., joint rotation or velocity, we can control various low-level details by automatically tweaking the input noises through optimization. Extensive experiments on multiple use cases show that this framework succeeds in unifying high-level and low-level co-speech gesture editing.
MOPAR: A Model Partitioning Framework for Deep Learning Inference Services on Serverless Platforms
Abstract
With its elastic power and a pay-as-you-go cost model, the deployment of deep learning inference services (DLISs) on serverless platforms is emerging as a prevalent trend. However, the varying resource requirements of different layers in DL models hinder resource utilization and increase costs, when DLISs are deployed as a single function on serverless platforms. To tackle this problem, we propose a model partitioning framework called MOPAR. This work is based on the two resource usage patterns of DLISs: global differences and local similarity, due to the presence of resource dominant (RD) operators and layer stacking. Considering these patterns, MOPAR adopts a hybrid approach that initially divides the DL model vertically into multiple slices composed of similar layers to improve resource efficiency. Slices containing RD operators are further partitioned into multiple sub-slices, enabling parallel optimization to reduce inference latency. Moreover, MOPAR comprehensively employs data compression and share-memory techniques to offset the additional time introduced by communication between slices. We implement a prototype of MOPAR and evaluate its efficacy using four categories of 12 DL models on OpenFaaS and AWS Lambda. The experiment results show that MOPAR can improve the resource efficiency of DLISs by 27.62\% on average, while reducing latency by about 5.52\%. Furthermore, based on Lambda's pricing, the cost of running DLISs is reduced by about 2.58 $\times$ using MOPAR.
Joint Optimization on Uplink OFDMA and MU-MIMO for IEEE 802.11ax: Deep Hierarchical Reinforcement Learning Approach
Authors: Hyeonho Noh, Harim Lee, Hyun Jong Yang
Subjects: Systems and Control (eess.SY); Information Theory (cs.IT)
Abstract
This letter tackles a joint user scheduling, frequency resource allocation (USRA), multi-input-multi-output mode selection (MIMO MS) between single-user MIMO and multi-user (MU) MIMO, and MU-MIMO user selection problem, integrating uplink orthogonal frequency division multiple access (OFDMA) in IEEE 802.11ax. Specifically, we focus on \textit{unsaturated traffic conditions} where users' data demands fluctuate. In unsaturated traffic conditions, considering packet volumes per user introduces a combinatorial problem, requiring the simultaneous optimization of MU-MIMO user selection and RA along the time-frequency-space axis. Consequently, dealing with the combinatorial nature of this problem, characterized by a large cardinality of unknown variables, poses a challenge that conventional optimization methods find nearly impossible to address. In response, this letter proposes an approach with deep hierarchical reinforcement learning (DHRL) to solve the joint problem. Rather than simply adopting off-the-shelf DHRL, we \textit{tailor} the DHRL to the joint USRA and MS problem, thereby significantly improving the convergence speed and throughput. Extensive simulation results show that the proposed algorithm achieves significantly improved throughput compared to the existing schemes under various unsaturated traffic conditions.
Versatile Scene-Consistent Traffic Scenario Generation as Optimization with Diffusion
Abstract
Generating realistic and controllable agent behaviors in traffic simulation is crucial for the development of autonomous vehicles. This problem is often formulated as imitation learning (IL) from real-world driving data by either directly predicting future trajectories or inferring cost functions with inverse optimal control. In this paper, we draw a conceptual connection between IL and diffusion-based generative modeling and introduce a novel framework Versatile Behavior Diffusion (VBD) to simulate interactive scenarios with multiple traffic participants. Our model not only generates scene-consistent multi-agent interactions but also enables scenario editing through multi-step guidance and refinement. Experimental evaluations show that VBD achieves state-of-the-art performance on the Waymo Sim Agents benchmark. In addition, we illustrate the versatility of our model by adapting it to various applications. VBD is capable of producing scenarios conditioning on priors, integrating with model-based optimization, sampling multi-modal scene-consistent scenarios by fusing marginal predictions, and generating safety-critical scenarios when combined with a game-theoretic solver.
Adaptive hp-Polynomial Based Sparse Grid Collocation Algorithms for Piecewise Smooth Functions with Kinks
Abstract
High-dimensional interpolation problems appear in various applications of uncertainty quantification, stochastic optimization and machine learning. Such problems are computationally expensive and request the use of adaptive grid generation strategies like anisotropic sparse grids to mitigate the curse of dimensionality. However, it is well known that the standard dimension-adaptive sparse grid method converges very slowly or even fails in the case of non-smooth functions. For piecewise smooth functions with kinks, we construct two novel hp-adaptive sparse grid collocation algorithms that combine low-order basis functions with local support in parts of the domain with less regularity and variable-order basis functions elsewhere. Spatial refinement is realized by means of a hierarchical multivariate knot tree which allows the construction of localised hierarchical basis functions with varying order. Hierarchical surplus is used as an error indicator to automatically detect the non-smooth region and adaptively refine the collocation points there. The local polynomial degrees are optionally selected by a greedy approach or a kink detection procedure. Three numerical benchmark examples with different dimensions are discussed and comparison with locally linear and highest degree basis functions are given to show the efficiency and accuracy of the proposed methods.
Solving a Real-World Optimization Problem Using Proximal Policy Optimization with Curriculum Learning and Reward Engineering
Abstract
We present a proximal policy optimization (PPO) agent trained through curriculum learning (CL) principles and meticulous reward engineering to optimize a real-world high-throughput waste sorting facility. Our work addresses the challenge of effectively balancing the competing objectives of operational safety, volume optimization, and minimizing resource usage. A vanilla agent trained from scratch on these multiple criteria fails to solve the problem due to its inherent complexities. This problem is particularly difficult due to the environment's extremely delayed rewards with long time horizons and class (or action) imbalance, with important actions being infrequent in the optimal policy. This forces the agent to anticipate long-term action consequences and prioritize rare but rewarding behaviours, creating a non-trivial reinforcement learning task. Our five-stage CL approach tackles these challenges by gradually increasing the complexity of the environmental dynamics during policy transfer while simultaneously refining the reward mechanism. This iterative and adaptable process enables the agent to learn a desired optimal policy. Results demonstrate that our approach significantly improves inference-time safety, achieving near-zero safety violations in addition to enhancing waste sorting plant efficiency.
Vocabulary Attack to Hijack Large Language Model Applications
Authors: Patrick Levi, Christoph P. Neumann
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
The fast advancements in Large Language Models (LLMs) are driving an increasing number of applications. Together with the growing number of users, we also see an increasing number of attackers who try to outsmart these systems. They want the model to reveal confidential information, specific false information, or offensive behavior. To this end, they manipulate their instructions for the LLM by inserting separators or rephrasing them systematically until they reach their goal. Our approach is different. It inserts words from the model vocabulary. We find these words using an optimization procedure and embeddings from another LLM (attacker LLM). We prove our approach by goal hijacking two popular open-source LLMs from the Llama2 and the Flan-T5 families, respectively. We present two main findings. First, our approach creates inconspicuous instructions and therefore it is hard to detect. For many attack cases, we find that even a single word insertion is sufficient. Second, we demonstrate that we can conduct our attack using a different model than the target model to conduct our attack with.
Leveraging Swarm Intelligence to Drive Autonomously: A Particle Swarm Optimization based Approach to Motion Planning
Authors: Sven Ochs, Jens Doll, Marc Heinrich, Philip Schörner, Sebastian Klemm, Marc René Zofka, J. Marius Zöllner
Abstract
Motion planning is an essential part of autonomous mobile platforms. A good pipeline should be modular enough to handle different vehicles, environments, and perception modules. The planning process has to cope with all the different modalities and has to have a modular and flexible design. But most importantly, it has to be safe and robust. In this paper, we want to present our motion planning pipeline with particle swarm optimization (PSO) at its core. This solution is independent of the vehicle type and has a clear and simple-to-implement interface for perception modules. Moreover, the approach stands out for being easily adaptable to new scenarios. Parallel calculation allows for fast planning cycles. Following the principles of PSO, the trajectory planer first generates a swarm of initial trajectories that are optimized afterward. We present the underlying control space and inner workings. Finally, the application to real-world automated driving is shown in the evaluation with a deeper look at the modeling of the cost function. The approach is used in our automated shuttles that have already driven more than 3.500 km safely and entirely autonomously in sub-urban everyday traffic.
Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models
Abstract
Kullback-Leiber divergence has been widely used in Knowledge Distillation (KD) to compress Large Language Models (LLMs). Contrary to prior assertions that reverse Kullback-Leibler (RKL) divergence is mode-seeking and thus preferable over the mean-seeking forward Kullback-Leibler (FKL) divergence, this study empirically and theoretically demonstrates that neither mode-seeking nor mean-seeking properties manifest in KD for LLMs. Instead, RKL and FKL are found to share the same optimization objective and both converge after a sufficient number of epochs. However, due to practical constraints, LLMs are seldom trained for such an extensive number of epochs. Meanwhile, we further find that RKL focuses on the tail part of the distributions, while FKL focuses on the head part at the beginning epochs. Consequently, we propose a simple yet effective Adaptive Kullback-Leiber (AKL) divergence method, which adaptively allocates weights to combine FKL and RKL. Metric-based and GPT-4-based evaluations demonstrate that the proposed AKL outperforms the baselines across various tasks and improves the diversity and quality of generated responses.
Automatic Prompt Selection for Large Language Models
Authors: Viet-Tung Do, Van-Khanh Hoang, Duy-Hung Nguyen, Shahab Sabahi, Jeff Yang, Hajime Hotta, Minh-Tien Nguyen, Hung Le
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract
Large Language Models (LLMs) can perform various natural language processing tasks with suitable instruction prompts. However, designing effective prompts manually is challenging and time-consuming. Existing methods for automatic prompt optimization either lack flexibility or efficiency. In this paper, we propose an effective approach to automatically select the optimal prompt for a given input from a finite set of synthetic candidate prompts. Our approach consists of three steps: (1) clustering the training data and generating candidate prompts for each cluster using an LLM-based prompt generator; (2) synthesizing a dataset of input-prompt-output tuples for training a prompt evaluator to rank the prompts based on their relevance to the input; (3) using the prompt evaluator to select the best prompt for a new input at test time. Our approach balances prompt generality-specificity and eliminates the need for resource-intensive training and inference. It demonstrates competitive performance on zero-shot question-answering datasets: GSM8K, MultiArith, and AQuA.
LiDAR4D: Dynamic Neural Fields for Novel Space-time View LiDAR Synthesis
Abstract
Although neural radiance fields (NeRFs) have achieved triumphs in image novel view synthesis (NVS), LiDAR NVS remains largely unexplored. Previous LiDAR NVS methods employ a simple shift from image NVS methods while ignoring the dynamic nature and the large-scale reconstruction problem of LiDAR point clouds. In light of this, we propose LiDAR4D, a differentiable LiDAR-only framework for novel space-time LiDAR view synthesis. In consideration of the sparsity and large-scale characteristics, we design a 4D hybrid representation combined with multi-planar and grid features to achieve effective reconstruction in a coarse-to-fine manner. Furthermore, we introduce geometric constraints derived from point clouds to improve temporal consistency. For the realistic synthesis of LiDAR point clouds, we incorporate the global optimization of ray-drop probability to preserve cross-region patterns. Extensive experiments on KITTI-360 and NuScenes datasets demonstrate the superiority of our method in accomplishing geometry-aware and time-consistent dynamic reconstruction. Codes are available at https://github.com/ispc-lab/LiDAR4D.
Unsupervised Occupancy Learning from Sparse Point Cloud
Abstract
Implicit Neural Representations have gained prominence as a powerful framework for capturing complex data modalities, encompassing a wide range from 3D shapes to images and audio. Within the realm of 3D shape representation, Neural Signed Distance Functions (SDF) have demonstrated remarkable potential in faithfully encoding intricate shape geometry. However, learning SDFs from 3D point clouds in the absence of ground truth supervision remains a very challenging task. In this paper, we propose a method to infer occupancy fields instead of SDFs as they are easier to learn from sparse inputs. We leverage a margin-based uncertainty measure to differentially sample from the decision boundary of the occupancy function and supervise the sampled boundary points using the input point cloud. We further stabilize the optimization process at the early stages of the training by biasing the occupancy function towards minimal entropy fields while maximizing its entropy at the input point cloud. Through extensive experiments and evaluations, we illustrate the efficacy of our proposed method, highlighting its capacity to improve implicit shape inference with respect to baselines and the state-of-the-art using synthetic and real data.
Planning for Robust Open-loop Pushing: Exploiting Quasi-static Belief Dynamics and Contact-informed Optimization
Authors: Julius Jankowski, Lara Brudermüller, Nick Hawes, Sylvain Calinon
Abstract
Non-prehensile manipulation such as pushing is typically subject to uncertain, non-smooth dynamics. However, modeling the uncertainty of the dynamics typically results in intractable belief dynamics, making data-efficient planning under uncertainty difficult. This article focuses on the problem of efficiently generating robust open-loop pushing plans. First, we investigate how the belief over object configurations propagates through quasi-static contact dynamics. We exploit the simplified dynamics to predict the variance of the object configuration without sampling from a perturbation distribution. In a sampling-based trajectory optimization algorithm, the gain of the variance is constrained in order to enforce robustness of the plan. Second, we propose an informed trajectory sampling mechanism for drawing robot trajectories that are likely to make contact with the object. This sampling mechanism is shown to significantly improve chances of finding robust solutions, especially when making-and-breaking contacts is required. We demonstrate that the proposed approach is able to synthesize bi-manual pushing trajectories, resulting in successful long-horizon pushing maneuvers without exteroceptive feedback such as vision or tactile feedback.
Wideband Beamforming for Near-Field Communications with Circular Arrays
Authors: Yunhui Guo, Yang Zhang, Zhaolin Wang, Yuanwei Liu
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
The beamforming performance of the uniform circular array (UCA) in near-field wideband communication systems is investigated. Compared to uniform linear array (ULA), UCA exhibits uniform effective array aperture in all directions, thus enabling more users to benefit from near-field communications. In this paper, the unique beam squint effect in near-field wideband UCA systems is comprehensively analyzed in both the distance and angular domains. It is rigorously demonstrated that the beam focal point only exists at a specific frequency in wideband UCA systems, resulting in significant beamforming loss. To alleviate this unique beam squint effect, the true-time delay (TTD)-based beamforming architecture is exploited. In particular, two wideband beamforming optimization approaches leveraging TTD units are proposed. 1) Analytical approach: In this approach, the phase shifters (PSs) and the time delay of TTD units are designed based on the analytical formula for beamforming gain. Following this design, the minimum number of TTD units required to achieve a predetermined beamforming gain is quantified. 2) Joint-optimization approach: In this method, the PSs and the TTD units are jointly optimized under practical maximum delay constraints to approximate the optimal unconstrained analog beamformer. Specifically, an efficient alternating optimization algorithm is proposed, where the PSs and the TTD units are alternately updated using either the closed-form solution or the low-complexity linear search approach. Extensive numerical results demonstrate that 1) the proposed beamforming schemes effectively mitigate the beam squint effect, and 2) the joint-optimization approach outperforms the analytical approach in terms of array gain and achievable spectral efficiency.
A Survey of Optimization-based Task and Motion Planning: From Classical To Learning Approaches
Authors: Zhigen Zhao, Shuo Chen, Yan Ding, Ziyi Zhou, Shiqi Zhang, Danfei Xu, Ye Zhao
Abstract
Task and Motion Planning (TAMP) integrates high-level task planning and low-level motion planning to equip robots with the autonomy to effectively reason over long-horizon, dynamic tasks. Optimization-based TAMP focuses on hybrid optimization approaches that define goal conditions via objective functions and are capable of handling open-ended goals, robotic dynamics, and physical interaction between the robot and the environment. Therefore, optimization-based TAMP is particularly suited to solve highly complex, contact-rich locomotion and manipulation problems. This survey provides a comprehensive review on optimization-based TAMP, covering (i) planning domain representations, including action description languages and temporal logic, (ii) individual solution strategies for components of TAMP, including AI planning and trajectory optimization (TO), and (iii) the dynamic interplay between logic-based task planning and model-based TO. A particular focus of this survey is to highlight the algorithm structures to efficiently solve TAMP, especially hierarchical and distributed approaches. Additionally, the survey emphasizes the synergy between the classical methods and contemporary learning-based innovations such as large language models. Furthermore, the future research directions for TAMP is discussed in this survey, highlighting both algorithmic and application-specific challenges.
BAdam: A Memory Efficient Full Parameter Training Method for Large Language Models
Abstract
This work presents BAdam, an optimizer that leverages the block coordinate optimization framework with Adam as the inner solver. BAdam offers a memory efficient approach to the full parameter finetuning of large language models and reduces running time of the backward process thanks to the chain rule property. Experimentally, we apply BAdam to instruction-tune the Llama 2-7B model on the Alpaca-GPT4 dataset using a single RTX3090-24GB GPU. The results indicate that BAdam exhibits superior convergence behavior in comparison to LoRA and LOMO. Furthermore, our downstream performance evaluation of the instruction-tuned models using the MT-bench shows that BAdam modestly surpasses LoRA and more substantially outperforms LOMO. Finally, we compare BAdam with Adam on a medium-sized task, i.e., finetuning RoBERTa-large on the SuperGLUE benchmark. The results demonstrate that BAdam is capable of narrowing the performance gap with Adam. Our code is available at https://github.com/Ledzy/BAdam.
Cherry on Top: Parameter Heterogeneity and Quantization in Large Language Models
Abstract
This paper reveals the phenomenon of parameter heterogeneity in large language models (LLMs). We find that a small subset of ``cherry'' parameters exhibit a disproportionately large influence on model performance, while the vast majority of parameters have minimal impact. This heterogeneity is found to be prevalent across different model families, scales, and types. Motivated by this observation, we propose CherryQ, a novel quantization method that unifies the optimization of mixed-precision parameters. CherryQ identifies and preserves the critical cherry parameters in high precision while aggressively quantizing the remaining parameters to low precision. Extensive experiments demonstrate the effectiveness of CherryQ. CherryQ outperforms existing quantization approaches in terms of perplexity and downstream task performance. Notably, our 3-bit quantized Vicuna-1.5 exhibits competitive performance compared to their 16-bit counterparts. These findings highlight the potential of CherryQ for enabling efficient deployment of LLMs by taking advantage of parameter heterogeneity.
Integrating Explanations in Learning LTL Specifications from Demonstrations
Abstract
This paper investigates whether recent advances in Large Language Models (LLMs) can assist in translating human explanations into a format that can robustly support learning Linear Temporal Logic (LTL) from demonstrations. Both LLMs and optimization-based methods can extract LTL specifications from demonstrations; however, they have distinct limitations. LLMs can quickly generate solutions and incorporate human explanations, but their lack of consistency and reliability hampers their applicability in safety-critical domains. On the other hand, optimization-based methods do provide formal guarantees but cannot process natural language explanations and face scalability challenges. We present a principled approach to combining LLMs and optimization-based methods to faithfully translate human explanations and demonstrations into LTL specifications. We have implemented a tool called Janaka based on our approach. Our experiments demonstrate the effectiveness of combining explanations with demonstrations in learning LTL specifications through several case studies.
Learning Quadrupedal Locomotion via Differentiable Simulation
Authors: Clemens Schwarke, Victor Klemm, Jesus Tordesillas, Jean-Pierre Sleiman, Marco Hutter
Abstract
The emergence of differentiable simulators enabling analytic gradient computation has motivated a new wave of learning algorithms that hold the potential to significantly increase sample efficiency over traditional Reinforcement Learning (RL) methods. While recent research has demonstrated performance gains in scenarios with comparatively smooth dynamics and, thus, smooth optimization landscapes, research on leveraging differentiable simulators for contact-rich scenarios, such as legged locomotion, is scarce. This may be attributed to the discontinuous nature of contact, which introduces several challenges to optimizing with analytic gradients. The purpose of this paper is to determine if analytic gradients can be beneficial even in the face of contact. Our investigation focuses on the effects of different soft and hard contact models on the learning process, examining optimization challenges through the lens of contact simulation. We demonstrate the viability of employing analytic gradients to learn physically plausible locomotion skills with a quadrupedal robot using Short-Horizon Actor-Critic (SHAC), a learning algorithm leveraging analytic gradients, and draw a comparison to a state-of-the-art RL algorithm, Proximal Policy Optimization (PPO), to understand the benefits of analytic gradients.
ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline
Abstract
Large language models (LLMs) have shown excellent mastering of human language, but still struggle in real-world applications that require mathematical problem-solving. While many strategies and datasets to enhance LLMs' mathematics are developed, it remains a challenge to simultaneously maintain and improve both language and mathematical capabilities in deployed LLM systems.In this work, we tailor the Self-Critique pipeline, which addresses the challenge in the feedback learning stage of LLM alignment. We first train a general Math-Critique model from the LLM itself to provide feedback signals. Then, we sequentially employ rejective fine-tuning and direct preference optimization over the LLM's own generations for data collection. Based on ChatGLM3-32B, we conduct a series of experiments on both academic and our newly created challenging dataset, MathUserEval. Results show that our pipeline significantly enhances the LLM's mathematical problem-solving while still improving its language ability, outperforming LLMs that could be two times larger. Related techniques have been deployed to ChatGLM\footnote{\url{https://chatglm.cn}}, an online serving LLM. Related evaluation dataset and scripts are released at \url{https://github.com/THUDM/ChatGLM-Math}.
A Mean Field Game Model for Timely Computation in Edge Computing Systems
Authors: Shubham Aggarwal, Muhammad Aneeq uz Zaman, Melih Bastopcu, Sennur Ulukus, Tamer Başar
Subjects: Information Theory (cs.IT); Computer Science and Game Theory (cs.GT); Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY)
Abstract
We consider the problem of task offloading in multi-access edge computing (MEC) systems constituting $N$ devices assisted by an edge server (ES), where the devices can split task execution between a local processor and the ES. Since the local task execution and communication with the ES both consume power, each device must judiciously choose between the two. We model the problem as a large population non-cooperative game among the $N$ devices. Since computation of an equilibrium in this scenario is difficult due to the presence of a large number of devices, we employ the mean-field game framework to reduce the finite-agent game problem to a generic user's multi-objective optimization problem, with a coupled consistency condition. By leveraging the novel age of information (AoI) metric, we invoke techniques from stochastic hybrid systems (SHS) theory and study the tradeoffs between increasing information freshness and reducing power consumption. In numerical simulations, we validate that a higher load at the ES may lead devices to upload their task to the ES less often.
Keyword: deep learning
Remote sensing framework for geological mapping via stacked autoencoders and clustering
Authors: Sandeep Nagar, Ehsan Farahbakhsh, Joseph Awange, Rohitash Chandra
Abstract
Supervised learning methods for geological mapping via remote sensing face limitations due to the scarcity of accurately labelled training data. In contrast, unsupervised learning methods, such as dimensionality reduction and clustering have the ability to uncover patterns and structures in remote sensing data without relying on predefined labels. Dimensionality reduction methods have the potential to play a crucial role in improving the accuracy of geological maps. Although conventional dimensionality reduction methods may struggle with nonlinear data, unsupervised deep learning models such as autoencoders have the ability to model nonlinear relationship in data. Stacked autoencoders feature multiple interconnected layers to capture hierarchical data representations that can be useful for remote sensing data. In this study, we present an unsupervised machine learning framework for processing remote sensing data by utilizing stacked autoencoders for dimensionality reduction and k-means clustering for mapping geological units. We use the Landsat-8, ASTER, and Sentinel-2 datasets of the Mutawintji region in Western New South Wales, Australia to evaluate the framework for geological mapping. We also provide a comparison of stacked autoencoders with principal component analysis and canonical autoencoders. Our results reveal that the framework produces accurate and interpretable geological maps, efficiently discriminating rock units. We find that the stacked autoencoders provide better accuracy when compared to the counterparts. We also find that the generated maps align with prior geological knowledge of the study area while providing novel insights into geological structures.
A Generative Deep Learning Approach for Crash Severity Modeling with Imbalanced Data
Abstract
Crash data is often greatly imbalanced, with the majority of crashes being non-fatal crashes, and only a small number being fatal crashes due to their rarity. Such data imbalance issue poses a challenge for crash severity modeling since it struggles to fit and interpret fatal crash outcomes with very limited samples. Usually, such data imbalance issues are addressed by data resampling methods, such as under-sampling and over-sampling techniques. However, most traditional and deep learning-based data resampling methods, such as synthetic minority oversampling technique (SMOTE) and generative Adversarial Networks (GAN) are designed dedicated to processing continuous variables. Though some resampling methods have improved to handle both continuous and discrete variables, they may have difficulties in dealing with the collapse issue associated with sparse discrete risk factors. Moreover, there is a lack of comprehensive studies that compare the performance of various resampling methods in crash severity modeling. To address the aforementioned issues, the current study proposes a crash data generation method based on the Conditional Tabular GAN. After data balancing, a crash severity model is employed to estimate the performance of classification and interpretation. A comparative study is conducted to assess classification accuracy and distribution consistency of the proposed generation method using a 4-year imbalanced crash dataset collected in Washington State, U.S. Additionally, Monte Carlo simulation is employed to estimate the performance of parameter and probability estimation in both two- and three-class imbalance scenarios. The results indicate that using synthetic data generated by CTGAN-RU for crash severity modeling outperforms using original data or synthetic data generated by other resampling methods.
Insights from the Use of Previously Unseen Neural Architecture Search Datasets
Authors: Rob Geada, David Towers, Matthew Forshaw, Amir Atapour-Abarghouei, A. Stephen McGough
Abstract
The boundless possibility of neural networks which can be used to solve a problem -- each with different performance -- leads to a situation where a Deep Learning expert is required to identify the best neural network. This goes against the hope of removing the need for experts. Neural Architecture Search (NAS) offers a solution to this by automatically identifying the best architecture. However, to date, NAS work has focused on a small set of datasets which we argue are not representative of real-world problems. We introduce eight new datasets created for a series of NAS Challenges: AddNIST, Language, MultNIST, CIFARTile, Gutenberg, Isabella, GeoClassing, and Chesseract. These datasets and challenges are developed to direct attention to issues in NAS development and to encourage authors to consider how their models will perform on datasets unknown to them at development time. We present experimentation using standard Deep Learning methods as well as the best results from challenge participants.
CHOSEN: Contrastive Hypothesis Selection for Multi-View Depth Refinement
Authors: Di Qiu, Yinda Zhang, Thabo Beeler, Vladimir Tankovich, Christian Häne, Sean Fanello, Christoph Rhemann, Sergio Orts Escolano
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
We propose CHOSEN, a simple yet flexible, robust and effective multi-view depth refinement framework. It can be employed in any existing multi-view stereo pipeline, with straightforward generalization capability for different multi-view capture systems such as camera relative positioning and lenses. Given an initial depth estimation, CHOSEN iteratively re-samples and selects the best hypotheses, and automatically adapts to different metric or intrinsic scales determined by the capture system. The key to our approach is the application of contrastive learning in an appropriate solution space and a carefully designed hypothesis feature, based on which positive and negative hypotheses can be effectively distinguished. Integrated in a simple baseline multi-view stereo pipeline, CHOSEN delivers impressive quality in terms of depth and normal accuracy compared to many current deep learning based multi-view stereo pipelines.
Smooth Deep Saliency
Authors: Rudolf Herdt, Maximilian Schmidt, Daniel Otero Baguer, Peter Maaß
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
In this work, we investigate methods to reduce the noise in deep saliency maps coming from convolutional downsampling, with the purpose of explaining how a deep learning model detects tumors in scanned histological tissue samples. Those methods make the investigated models more interpretable for gradient-based saliency maps, computed in hidden layers. We test our approach on different models trained for image classification on ImageNet1K, and models trained for tumor detection on Camelyon16 and in-house real-world digital pathology scans of stained tissue samples. Our results show that the checkerboard noise in the gradient gets reduced, resulting in smoother and therefore easier to interpret saliency maps.
Semantic Augmentation in Images using Language
Authors: Sahiti Yerramilli, Jayant Sravan Tamarapalli, Tanmay Girish Kulkarni, Jonathan Francis, Eric Nyberg
Abstract
Deep Learning models are incredibly data-hungry and require very large labeled datasets for supervised learning. As a consequence, these models often suffer from overfitting, limiting their ability to generalize to real-world examples. Recent advancements in diffusion models have enabled the generation of photorealistic images based on textual inputs. Leveraging the substantial datasets used to train these diffusion models, we propose a technique to utilize generated images to augment existing datasets. This paper explores various strategies for effective data augmentation to improve the out-of-domain generalization capabilities of deep learning models.
APC2Mesh: Bridging the gap from occluded building façades to full 3D models
Authors: Perpetual Hope Akwensi, Akshay Bharadwaj, Ruisheng Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The benefits of having digital twins of urban buildings are numerous. However, a major difficulty encountered in their creation from airborne LiDAR point clouds is the effective means of accurately reconstructing significant occlusions amidst point density variations and noise. To bridge the noise/sparsity/occlusion gap and generate high fidelity 3D building models, we propose APC2Mesh which integrates point completion into a 3D reconstruction pipeline, enabling the learning of dense geometrically accurate representation of buildings. Specifically, we leveraged complete points generated from occluded ones as input to a linearized skip attention-based deformation network for 3D mesh reconstruction. In our experiments, conducted on 3 different scenes, we demonstrate that: (1) APC2Mesh delivers comparatively superior results, indicating its efficacy in handling the challenges of occluded airborne building points of diverse styles and complexities. (2) The combination of point completion with typical deep learning-based 3D point cloud reconstruction methods offers a direct and effective solution for reconstructing significantly occluded airborne building points. As such, this neural integration holds promise for advancing the creation of digital twins for urban buildings with greater accuracy and fidelity.
A neuroergonomics model to evaluating nuclear power plants operators' performance under heat stress driven by ECG time-frequency spectrums and fNIRS prefrontal cortex network: a CNN-GAT fusion model
Authors: Yan Zhang, Ming Jia, Meng Li, JianYu Wang, XiangMin Hu, ZhiHui Xu, Tao Chen
Abstract
Operators experience complicated physiological and psychological states when exposed to extreme heat stress, which can impair cognitive function and decrease performance significantly, ultimately leading to severe secondary disasters. Therefore, there is an urgent need for a feasible technique to identify their abnormal states to enhance the reliability of human-cybernetics systems. With the advancement of deep learning in physiological modeling, a model for evaluating operators' performance driven by electrocardiogram (ECG) and functional near-infrared spectroscopy (fNIRS) was proposed, demonstrating high ecological validity. The model fused a convolutional neural network (CNN) backbone and a graph attention network (GAT) backbone to extract discriminative features from ECG time-frequency spectrums and fNIRS prefrontal cortex (PFC) network respectively with deeper neuroscience domain knowledge, and eventually achieved 0.90 AUC. Results supported that handcrafted features extracted by specialized neuroscience methods can alleviate overfitting. Inspired by the small-world nature of the brain network, the fNIRS PFC network was organized as an undirected graph and embedded by GAT. It is proven to perform better in information aggregation and delivery compared to a simple non-linear transformation. The model provides a potential neuroergonomics application for evaluating the human state in vital human-cybernetics systems under industry 5.0 scenarios.
MOPAR: A Model Partitioning Framework for Deep Learning Inference Services on Serverless Platforms
Abstract
With its elastic power and a pay-as-you-go cost model, the deployment of deep learning inference services (DLISs) on serverless platforms is emerging as a prevalent trend. However, the varying resource requirements of different layers in DL models hinder resource utilization and increase costs, when DLISs are deployed as a single function on serverless platforms. To tackle this problem, we propose a model partitioning framework called MOPAR. This work is based on the two resource usage patterns of DLISs: global differences and local similarity, due to the presence of resource dominant (RD) operators and layer stacking. Considering these patterns, MOPAR adopts a hybrid approach that initially divides the DL model vertically into multiple slices composed of similar layers to improve resource efficiency. Slices containing RD operators are further partitioned into multiple sub-slices, enabling parallel optimization to reduce inference latency. Moreover, MOPAR comprehensively employs data compression and share-memory techniques to offset the additional time introduced by communication between slices. We implement a prototype of MOPAR and evaluate its efficacy using four categories of 12 DL models on OpenFaaS and AWS Lambda. The experiment results show that MOPAR can improve the resource efficiency of DLISs by 27.62\% on average, while reducing latency by about 5.52\%. Furthermore, based on Lambda's pricing, the cost of running DLISs is reduced by about 2.58 $\times$ using MOPAR.
TSNet:A Two-stage Network for Image Dehazing with Multi-scale Fusion and Adaptive Learning
Authors: Xiaolin Gong, Zehan Zheng, Heyuan Du
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Image dehazing has been a popular topic of research for a long time. Previous deep learning-based image dehazing methods have failed to achieve satisfactory dehazing effects on both synthetic datasets and real-world datasets, exhibiting poor generalization. Moreover, single-stage networks often result in many regions with artifacts and color distortion in output images. To address these issues, this paper proposes a two-stage image dehazing network called TSNet, mainly consisting of the multi-scale fusion module (MSFM) and the adaptive learning module (ALM). Specifically, MSFM and ALM enhance the generalization of TSNet. The MSFM can obtain large receptive fields at multiple scales and integrate features at different frequencies to reduce the differences between inputs and learning objectives. The ALM can actively learn of regions of interest in images and restore texture details more effectively. Additionally, TSNet is designed as a two-stage network, where the first-stage network performs image dehazing, and the second-stage network is employed to improve issues such as artifacts and color distortion present in the results of the first-stage network. We also change the learning objective from ground truth images to opposite fog maps, which improves the learning efficiency of TSNet. Extensive experiments demonstrate that TSNet exhibits superior dehazing performance on both synthetic and real-world datasets compared to previous state-of-the-art methods.
Abstract
In this mini-review, we explore the new prediction methods for drug combination synergy relying on high-throughput combinatorial screens. The fast progress of the field is witnessed in the more than thirty original machine learning methods published since 2021, a clear majority of them based on deep learning techniques. We aim to put these papers under a unifying lens by highlighting the core technologies, the data sources, the input data types and synergy scores used in the methods, as well as the prediction scenarios and evaluation protocols that the papers deal with. Our finding is that the best methods accurately solve the synergy prediction scenarios involving known drugs or cell lines while the scenarios involving new drugs or cell lines still fall short of an accurate prediction level.
A Neural Multigrid Solver for Helmholtz Equations with High Wavenumber and Heterogeneous Media
Abstract
Solving high-wavenumber and heterogeneous Helmholtz equations presents a long-standing challenge in scientific computing. In this paper, we introduce a deep learning-enhanced multigrid solver to address this issue. By conducting error analysis on standard multigrid applied to a discrete Helmholtz equation, we devise a strategy to handle errors with different frequencies separately. For error components with frequencies distant from the wavenumber, we perform simple smoothing based on local operations at different levels to eliminate them. On the other hand, to address error components with frequencies near the wavenumber, we utilize another multigrid V-cycle to solve an advection-diffusion-reaction (ADR) equation at a coarse scale. The resulting solver, named Wave-ADR-NS, involves parameters learned through unsupervised training. Numerical results demonstrate that Wave-ADR-NS effectively resolves heterogeneous 2D Helmholtz equation with wavenumber up to 2000. Comparative experiments against classical multigrid preconditioners and existing deep learning-based multigrid preconditioners reveals the superior performance of Wave-ADR-NS.
Computationally Efficient Unsupervised Deep Learning for Robust Joint AP Clustering and Beamforming Design in Cell-Free Systems
Abstract
In this paper, we consider robust joint access point (AP) clustering and beamforming design with imperfect channel state information (CSI) in cell-free systems. Specifically, we jointly optimize AP clustering and beamforming with imperfect CSI to simultaneously maximize the worst-case sum rate and minimize the number of AP clustering under power constraint and the sparsity constraint of AP clustering. By transformations, the semi-infinite constraints caused by the imperfect CSI are converted into more tractable forms for facilitating a computationally efficient unsupervised deep learning algorithm. In addition, to further reduce the computational complexity, a computationally effective unsupervised deep learning algorithm is proposed to implement robust joint AP clustering and beamforming design with imperfect CSI in cell-free systems. Numerical results demonstrate that the proposed unsupervised deep learning algorithm achieves a higher worst-case sum rate under a smaller number of AP clustering with computational efficiency.
An Interpretable Power System Transient Stability Assessment Method with Expert Guiding Neural-Regression-Tree
Authors: Hanxuan Wang, Na Lu, Zixuan Wang, Jiacheng Liu, Jun Liu
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
Abstract
Deep learning based transient stability assessment (TSA) has achieved great success, yet the lack of interpretability hinders its industrial application. Although a great number of studies have tried to explore the interpretability of network solutions, many problems still remain unsolved: (1) the difference between the widely accepted power system knowledge and the generated interpretive rules is large, (2) the probability characteristics of the neural network have not been fully considered during generating the interpretive rules, (3) the cost of the trade-off between accuracy and interpretability is too heavy to take. To address these issues, an interpretable power system Transient Stability Assessment method with Expert guiding Neural-Regression-Tree (TSA-ENRT) is proposed. TSA-ENRT utilizes an expert guiding nonlinear regression tree to approximate the neural network prediction and the neural network can be explained by the interpretive rules generated by the tree model. The nonlinearity of the expert guiding nonlinear regression tree is endowed with the extracted knowledge from a simple two-machine three-bus power system, which forms an expert knowledge base and thus the generated interpretive rules are more consistent with human cognition. Besides, the expert guiding tree model can build a bridge between the interpretive rules and the probability prediction of neural network in a regression way. By regularizing the neural network with the average decision length of ENRT, the association of the neural network and tree model is constructed in the model training level which provides a better trade-off between accuracy and interpretability. Extensive experiments indicate the interpretive rules generated by the proposed TSA-ENRT are highly consistent with the neural network prediction and more agreed with human expert cognition.
Representation Alignment Contrastive Regularization for Multi-Object Tracking
Authors: Shujie Chen, Zhonglin Liu, Jianfeng Dong, Di Zhou
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Achieving high-performance in multi-object tracking algorithms heavily relies on modeling spatio-temporal relationships during the data association stage. Mainstream approaches encompass rule-based and deep learning-based methods for spatio-temporal relationship modeling. While the former relies on physical motion laws, offering wider applicability but yielding suboptimal results for complex object movements, the latter, though achieving high-performance, lacks interpretability and involves complex module designs. This work aims to simplify deep learning-based spatio-temporal relationship models and introduce interpretability into features for data association. Specifically, a lightweight single-layer transformer encoder is utilized to model spatio-temporal relationships. To make features more interpretative, two contrastive regularization losses based on representation alignment are proposed, derived from spatio-temporal consistency rules. By applying weighted summation to affinity matrices, the aligned features can seamlessly integrate into the data association stage of the original tracking workflow. Experimental results showcase that our model enhances the majority of existing tracking networks' performance without excessive complexity, with minimal increase in training overhead and nearly negligible computational and storage costs.
Fusing Multi-sensor Input with State Information on TinyML Brains for Autonomous Nano-drones
Abstract
Autonomous nano-drones (~10 cm in diameter), thanks to their ultra-low power TinyML-based brains, are capable of coping with real-world environments. However, due to their simplified sensors and compute units, they are still far from the sense-and-act capabilities shown in their bigger counterparts. This system paper presents a novel deep learning-based pipeline that fuses multi-sensorial input (i.e., low-resolution images and 8x8 depth map) with the robot's state information to tackle a human pose estimation task. Thanks to our design, the proposed system -- trained in simulation and tested on a real-world dataset -- improves a state-unaware State-of-the-Art baseline by increasing the R^2 regression metric up to 0.10 on the distance's prediction.
Active learning for efficient annotation in precision agriculture: a use-case on crop-weed semantic segmentation
Authors: Bart M. van Marrewijk, Charbel Dandjinou, Dan Jeric Arcega Rustia, Nicolas Franco Gonzalez, Boubacar Diallo, Jérôme Dias, Paul Melki, Pieter M. Blok
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Optimizing deep learning models requires large amounts of annotated images, a process that is both time-intensive and costly. Especially for semantic segmentation models in which every pixel must be annotated. A potential strategy to mitigate annotation effort is active learning. Active learning facilitates the identification and selection of the most informative images from a large unlabelled pool. The underlying premise is that these selected images can improve the model's performance faster than random selection to reduce annotation effort. While active learning has demonstrated promising results on benchmark datasets like Cityscapes, its performance in the agricultural domain remains largely unexplored. This study addresses this research gap by conducting a comparative study of three active learning-based acquisition functions: Bayesian Active Learning by Disagreement (BALD), stochastic-based BALD (PowerBALD), and Random. The acquisition functions were tested on two agricultural datasets: Sugarbeet and Corn-Weed, both containing three semantic classes: background, crop and weed. Our results indicated that active learning, especially PowerBALD, yields a higher performance than Random sampling on both datasets. But due to the relatively large standard deviations, the differences observed were minimal; this was partly caused by high image redundancy and imbalanced classes. Specifically, more than 89\% of the pixels belonged to the background class on both datasets. The absence of significant results on both datasets indicates that further research is required for applying active learning on agricultural datasets, especially if they contain a high-class imbalance and redundant images. Recommendations and insights are provided in this paper to potentially resolve such issues.
A Differentiable Integer Linear Programming Solver for Explanation-Based Natural Language Inference
Authors: Mokanarangan Thayaparan, Marco Valentino, André Freitas
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract
Integer Linear Programming (ILP) has been proposed as a formalism for encoding precise structural and semantic constraints for Natural Language Inference (NLI). However, traditional ILP frameworks are non-differentiable, posing critical challenges for the integration of continuous language representations based on deep learning. In this paper, we introduce a novel approach, named Diff-Comb Explainer, a neuro-symbolic architecture for explanation-based NLI based on Differentiable BlackBox Combinatorial Solvers (DBCS). Differently from existing neuro-symbolic solvers, Diff-Comb Explainer does not necessitate a continuous relaxation of the semantic constraints, enabling a direct, more precise, and efficient incorporation of neural representations into the ILP formulation. Our experiments demonstrate that Diff-Comb Explainer achieves superior performance when compared to conventional ILP solvers, neuro-symbolic black-box solvers, and Transformer-based encoders. Moreover, a deeper analysis reveals that Diff-Comb Explainer can significantly improve the precision, consistency, and faithfulness of the constructed explanations, opening new opportunities for research on neuro-symbolic architectures for explainable and transparent NLI in complex domains.
A Universal Deep Neural Network for Signal Detection in Wireless Communication Systems
Authors: Khalid Albagami, Nguyen Van Huynh, Geoffrey Ye Li
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI); Information Theory (cs.IT)
Abstract
Recently, deep learning (DL) has been emerging as a promising approach for channel estimation and signal detection in wireless communications. The majority of the existing studies investigating the use of DL techniques in this domain focus on analysing channel impulse responses that are generated from only one channel distribution such as additive white Gaussian channel noise and Rayleigh channels. In practice, to cope with the dynamic nature of the wireless channel, DL methods must be re-trained on newly non-aged collected data which is costly, inefficient, and impractical. To tackle this challenge, this paper proposes a novel universal deep neural network (Uni-DNN) that can achieve high detection performance in various wireless environments without retraining the model. In particular, our proposed Uni-DNN model consists of a wireless channel classifier and a signal detector which are constructed by using DNNs. The wireless channel classifier enables the signal detector to generalise and perform optimally for multiple wireless channel distributions. In addition, to further improve the signal detection performance of the proposed model, convolutional neural network is employed. Extensive simulations using the orthogonal frequency division multiplexing scheme demonstrate that the bit error rate performance of our proposed solution can outperform conventional DL-based approaches as well as least square and minimum mean square error channel estimators in practical low pilot density scenarios.
A Satellite Band Selection Framework for Amazon Forest Deforestation Detection Task
Authors: Eduardo Neto, Fabio A. Faria, Amanda A. S. de Oliveira, Álvaro L. Fazenda
Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
Abstract
The conservation of tropical forests is a topic of significant social and ecological relevance due to their crucial role in the global ecosystem. Unfortunately, deforestation and degradation impact millions of hectares annually, necessitating government or private initiatives for effective forest monitoring. This study introduces a novel framework that employs the Univariate Marginal Distribution Algorithm (UMDA) to select spectral bands from Landsat-8 satellite, optimizing the representation of deforested areas. This selection guides a semantic segmentation architecture, DeepLabv3+, enhancing its performance. Experimental results revealed several band compositions that achieved superior balanced accuracy compared to commonly adopted combinations for deforestation detection, utilizing segment classification via a Support Vector Machine (SVM). Moreover, the optimal band compositions identified by the UMDA-based approach improved the performance of the DeepLabv3+ architecture, surpassing state-of-the-art approaches compared in this study. The observation that a few selected bands outperform the total contradicts the data-driven paradigm prevalent in the deep learning field. Therefore, this suggests an exception to the conventional wisdom that 'more is always better'.
Can We Understand Plasticity Through Neural Collapse?
Authors: Guglielmo Bonifazi, Iason Chalas, Gian Hess, Jakub Łucki
Abstract
This paper explores the connection between two recently identified phenomena in deep learning: plasticity loss and neural collapse. We analyze their correlation in different scenarios, revealing a significant association during the initial training phase on the first task. Additionally, we introduce a regularization approach to mitigate neural collapse, demonstrating its effectiveness in alleviating plasticity loss in this specific setting.
Unsupervised Learning of Effective Actions in Robotics
Authors: Marko Zaric, Jakob Hollenstein, Justus Piater, Erwan Renaudo
Abstract
Learning actions that are relevant to decision-making and can be executed effectively is a key problem in autonomous robotics. Current state-of-the-art action representations in robotics lack proper effect-driven learning of the robot's actions. Although successful in solving manipulation tasks, deep learning methods also lack this ability, in addition to their high cost in terms of memory or training data. In this paper, we propose an unsupervised algorithm to discretize a continuous motion space and generate "action prototypes", each producing different effects in the environment. After an exploration phase, the algorithm automatically builds a representation of the effects and groups motions into action prototypes, where motions more likely to produce an effect are represented more than those that lead to negligible changes. We evaluate our method on a simulated stair-climbing reinforcement learning task, and the preliminary results show that our effect driven discretization outperforms uniformly and randomly sampled discretizations in convergence speed and maximum reward.
Mixing Individual and Collective Behaviours to Predict Out-of-Routine Mobility
Abstract
Predicting human displacements is crucial for addressing various societal challenges, including urban design, traffic congestion, epidemic management, and migration dynamics. While predictive models like deep learning and Markov models offer insights into individual mobility, they often struggle with out-of-routine behaviours. Our study introduces an approach that dynamically integrates individual and collective mobility behaviours, leveraging collective intelligence to enhance prediction accuracy. Evaluating the model on millions of privacy-preserving trajectories across three US cities, we demonstrate its superior performance in predicting out-of-routine mobility, surpassing even advanced deep learning methods. Spatial analysis highlights the model's effectiveness near urban areas with a high density of points of interest, where collective behaviours strongly influence mobility. During disruptive events like the COVID-19 pandemic, our model retains predictive capabilities, unlike individual-based models. By bridging the gap between individual and collective behaviours, our approach offers transparent and accurate predictions, crucial for addressing contemporary mobility challenges.
AQuA -- Combining Experts' and Non-Experts' Views To Assess Deliberation Quality in Online Discussions Using LLMs
Authors: Maike Behrendt, Stefan Sylvius Wagner, Marc Ziegele, Lena Wilms, Anke Stoll, Dominique Heinbach, Stefan Harmeling
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract
Measuring the quality of contributions in political online discussions is crucial in deliberation research and computer science. Research has identified various indicators to assess online discussion quality, and with deep learning advancements, automating these measures has become feasible. While some studies focus on analyzing specific quality indicators, a comprehensive quality score incorporating various deliberative aspects is often preferred. In this work, we introduce AQuA, an additive score that calculates a unified deliberative quality score from multiple indices for each discussion post. Unlike other singular scores, AQuA preserves information on the deliberative aspects present in comments, enhancing model transparency. We develop adapter models for 20 deliberative indices, and calculate correlation coefficients between experts' annotations and the perceived deliberativeness by non-experts to weigh the individual indices into a single deliberative score. We demonstrate that the AQuA score can be computed easily from pre-trained adapters and aligns well with annotations on other datasets that have not be seen during training. The analysis of experts' vs. non-experts' annotations confirms theoretical findings in the social science literature.
Enhancing Interpretability of Vertebrae Fracture Grading using Human-interpretable Prototypes
Authors: Poulami Sinhamahapatra, Suprosanna Shit, Anjany Sekuboyina, Malek Husseini, David Schinz, Nicolas Lenhart, Joern Menze, Jan Kirschke, Karsten Roscher, Stephan Guennemann
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Vertebral fracture grading classifies the severity of vertebral fractures, which is a challenging task in medical imaging and has recently attracted Deep Learning (DL) models. Only a few works attempted to make such models human-interpretable despite the need for transparency and trustworthiness in critical use cases like DL-assisted medical diagnosis. Moreover, such models either rely on post-hoc methods or additional annotations. In this work, we propose a novel interpretable-by-design method, ProtoVerse, to find relevant sub-parts of vertebral fractures (prototypes) that reliably explain the model's decision in a human-understandable way. Specifically, we introduce a novel diversity-promoting loss to mitigate prototype repetitions in small datasets with intricate semantics. We have experimented with the VerSe'19 dataset and outperformed the existing prototype-based method. Further, our model provides superior interpretability against the post-hoc method. Importantly, expert radiologists validated the visual interpretability of our results, showing clinical applicability.
FlightScope: A Deep Comprehensive Assessment of Aircraft Detection Algorithms in Satellite Imagery
Authors: Safouane El Ghazouali, Arnaud Gucciardi, Nicola Venturi, Michael Rueegsegger, Umberto Michelucci
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Object detection in remotely sensed satellite pictures is fundamental in many fields such as biophysical, and environmental monitoring. While deep learning algorithms are constantly evolving, they have been mostly implemented and tested on popular ground-based taken photos. This paper critically evaluates and compares a suite of advanced object detection algorithms customized for the task of identifying aircraft within satellite imagery. Using the large HRPlanesV2 dataset, together with a rigorous validation with the GDIT dataset, this research encompasses an array of methodologies including YOLO versions 5 and 8, Faster RCNN, CenterNet, RetinaNet, RTMDet, and DETR, all trained from scratch. This exhaustive training and validation study reveal YOLOv5 as the preeminent model for the specific case of identifying airplanes from remote sensing data, showcasing high precision and adaptability across diverse imaging conditions. This research highlight the nuanced performance landscapes of these algorithms, with YOLOv5 emerging as a robust solution for aerial object detection, underlining its importance through superior mean average precision, Recall, and Intersection over Union scores. The findings described here underscore the fundamental role of algorithm selection aligned with the specific demands of satellite imagery analysis and extend a comprehensive framework to evaluate model efficacy. The benchmark toolkit and codes, available via https://github.com/toelt-llc/FlightScope_Bench, aims to further exploration and innovation in the realm of remote sensing object detection, paving the way for improved analytical methodologies in satellite imagery applications.
Deep Image Composition Meets Image Forgery
Authors: Eren Tahir, Mert Bal
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Image forgery is a topic that has been studied for many years. Before the breakthrough of deep learning, forged images were detected using handcrafted features that did not require training. These traditional methods failed to perform satisfactorily even on datasets much worse in quality than real-life image manipulations. Advances in deep learning have impacted image forgery detection as much as they have impacted other areas of computer vision and have improved the state of the art. Deep learning models require large amounts of labeled data for training. In the case of image forgery, labeled data at the pixel level is a very important factor for the models to learn. None of the existing datasets have sufficient size, realism and pixel-level labeling at the same time. This is due to the high cost of producing and labeling quality images. It can take hours for an image editing expert to manipulate just one image. To bridge this gap, we automate data generation using image composition techniques that are very related to image forgery. Unlike other automated data generation frameworks, we use state of the art image composition deep learning models to generate spliced images close to the quality of real-life manipulations. Finally, we test the generated dataset on the SOTA image manipulation detection model and show that its prediction performance is lower compared to existing datasets, i.e. we produce realistic images that are more difficult to detect. Dataset will be available at https://github.com/99eren99/DIS25k .
Keyword: differential privacy
A Unified Membership Inference Method for Visual Self-supervised Encoder via Part-aware Capability
Differentially Private Verification of Survey-Weighted Estimates
Keyword: privacy
A Unified Membership Inference Method for Visual Self-supervised Encoder via Part-aware Capability
An Interpretable Client Decision Tree Aggregation process for Federated Learning
Differentially Private Verification of Survey-Weighted Estimates
Learn to Disguise: Avoid Refusal Responses in LLM's Defense via a Multi-agent Attacker-Disguiser Game
Deep Privacy Funnel Model: From a Discriminative to a Generative Approach with an Application to Face Recognition
Mixing Individual and Collective Behaviours to Predict Out-of-Routine Mobility
Federated Computing -- Survey on Building Blocks, Extensions and Systems
Guarantees of confidentiality via Hammersley-Chapman-Robbins bounds
Keyword: machine learning
Remote sensing framework for geological mapping via stacked autoencoders and clustering
Leveraging Machine Learning for Early Autism Detection via INDT-ASD Indian Database
A shared compilation stack for distributed-memory parallelism in stencil DSLs
Virtual Sensor for Real-Time Bearing Load Prediction Using Heterogeneous Temporal Graph Neural Networks
Heat Death of Generative Models in Closed-Loop Learning
Effective Malware Detection for Embedded Computing Systems with Limited Exposure
Attribution Regularization for Multimodal Paradigms
Obfuscated Malware Detection: Investigating Real-world Scenarios through Memory Analysis
Designing a Photonic Physically Unclonable Function Having Resilience to Machine Learning Attacks
Task Agnostic Architecture for Algorithm Induction via Implicit Composition
Creating a Trajectory for Code Writing: Algorithmic Reasoning Tasks
New methods for drug synergy prediction
An Interpretable Client Decision Tree Aggregation process for Federated Learning
Space-time parallel scaling of Parareal with a Fourier Neural Operator as coarse propagator
CSEPrompts: A Benchmark of Introductory Computer Science Prompts
Adaptive hp-Polynomial Based Sparse Grid Collocation Algorithms for Piecewise Smooth Functions with Kinks
On Future Power Systems Digital Twins: Towards a Standard Architecture
Learning Alternative Ways of Performing a Task
LightFAt: Mitigating Control-flow Explosion via Lightweight PMU-based Control-flow Attestation
Towards detecting unanticipated bias in Large Language Models
Adversarial Attacks and Dimensionality in Text Classifiers
Continual Learning of Numerous Tasks from Long-tail Distributions
Closing the Implementation Gap in MC: Fully Chemical Synchronization and Detection for Cellular Receivers
Empowering Biomedical Discovery with AI Agents
"Are Adversarial Phishing Webpages a Threat in Reality?" Understanding the Users' Perception of Adversarial Webpages
AI-augmented Automation for Real Driving Prediction: an Industrial Use Case
Human Activity Recognition using Smartphones
Keyword: optimization
Trainable Least Squares to Reduce PAPR in OFDM-based Hybrid Beamforming Systems
An Online Joint Optimization Approach for QoE Maximization in UAV-Enabled Mobile Edge Computing
Self-Organized Agents: A LLM Multi-Agent Framework toward Ultra Large-Scale Code Generation and Optimization
LP++: A Surprisingly Strong Linear Probe for Few-Shot CLIP
Prompts As Programs: A Structure-Aware Approach to Efficient Compile-Time Prompt Optimization
Comparative Study of Domain Driven Terms Extraction Using Large Language Models
EnergAIze: Multi Agent Deep Deterministic Policy Gradient for Vehicle to Grid Energy Management
TCLC-GS: Tightly Coupled LiDAR-Camera Gaussian Splatting for Surrounding Autonomous Driving Scenes
A Unified Editing Method for Co-Speech Gesture Generation via Diffusion Inversion
MOPAR: A Model Partitioning Framework for Deep Learning Inference Services on Serverless Platforms
Joint Optimization on Uplink OFDMA and MU-MIMO for IEEE 802.11ax: Deep Hierarchical Reinforcement Learning Approach
Versatile Scene-Consistent Traffic Scenario Generation as Optimization with Diffusion
Adaptive hp-Polynomial Based Sparse Grid Collocation Algorithms for Piecewise Smooth Functions with Kinks
Solving a Real-World Optimization Problem Using Proximal Policy Optimization with Curriculum Learning and Reward Engineering
Vocabulary Attack to Hijack Large Language Model Applications
Leveraging Swarm Intelligence to Drive Autonomously: A Particle Swarm Optimization based Approach to Motion Planning
Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models
Automatic Prompt Selection for Large Language Models
LiDAR4D: Dynamic Neural Fields for Novel Space-time View LiDAR Synthesis
Unsupervised Occupancy Learning from Sparse Point Cloud
Planning for Robust Open-loop Pushing: Exploiting Quasi-static Belief Dynamics and Contact-informed Optimization
Wideband Beamforming for Near-Field Communications with Circular Arrays
A Survey of Optimization-based Task and Motion Planning: From Classical To Learning Approaches
BAdam: A Memory Efficient Full Parameter Training Method for Large Language Models
Cherry on Top: Parameter Heterogeneity and Quantization in Large Language Models
Integrating Explanations in Learning LTL Specifications from Demonstrations
Learning Quadrupedal Locomotion via Differentiable Simulation
ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline
A Mean Field Game Model for Timely Computation in Edge Computing Systems
Keyword: deep learning
Remote sensing framework for geological mapping via stacked autoencoders and clustering
A Generative Deep Learning Approach for Crash Severity Modeling with Imbalanced Data
Insights from the Use of Previously Unseen Neural Architecture Search Datasets
CHOSEN: Contrastive Hypothesis Selection for Multi-View Depth Refinement
Smooth Deep Saliency
Semantic Augmentation in Images using Language
APC2Mesh: Bridging the gap from occluded building façades to full 3D models
A neuroergonomics model to evaluating nuclear power plants operators' performance under heat stress driven by ECG time-frequency spectrums and fNIRS prefrontal cortex network: a CNN-GAT fusion model
MOPAR: A Model Partitioning Framework for Deep Learning Inference Services on Serverless Platforms
TSNet:A Two-stage Network for Image Dehazing with Multi-scale Fusion and Adaptive Learning
New methods for drug synergy prediction
A Neural Multigrid Solver for Helmholtz Equations with High Wavenumber and Heterogeneous Media
Computationally Efficient Unsupervised Deep Learning for Robust Joint AP Clustering and Beamforming Design in Cell-Free Systems
An Interpretable Power System Transient Stability Assessment Method with Expert Guiding Neural-Regression-Tree
Representation Alignment Contrastive Regularization for Multi-Object Tracking
Fusing Multi-sensor Input with State Information on TinyML Brains for Autonomous Nano-drones
Active learning for efficient annotation in precision agriculture: a use-case on crop-weed semantic segmentation
A Differentiable Integer Linear Programming Solver for Explanation-Based Natural Language Inference
A Universal Deep Neural Network for Signal Detection in Wireless Communication Systems
A Satellite Band Selection Framework for Amazon Forest Deforestation Detection Task
Can We Understand Plasticity Through Neural Collapse?
Unsupervised Learning of Effective Actions in Robotics
Mixing Individual and Collective Behaviours to Predict Out-of-Routine Mobility
AQuA -- Combining Experts' and Non-Experts' Views To Assess Deliberation Quality in Online Discussions Using LLMs
Enhancing Interpretability of Vertebrae Fracture Grading using Human-interpretable Prototypes
FlightScope: A Deep Comprehensive Assessment of Aircraft Detection Algorithms in Satellite Imagery
Deep Image Composition Meets Image Forgery