New submissions for Mon, 8 Apr 24

Keyword: differential privacy

PrivShape: Extracting Shapes in Time Series under User-Level Local Differential Privacy

Authors: Yulian Mao, Qingqing Ye, Haibo Hu, Qi Wang, Kai Huang
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2404.03873
Pdf link: https://arxiv.org/pdf/2404.03873
Abstract Time series have numerous applications in finance, healthcare, IoT, and smart cities. In many of these applications, time series contain personal data, so privacy infringement may occur if they are released directly to the public. Recently, local differential privacy (LDP) has emerged as the state-of-the-art approach to protecting data privacy. However, existing works on LDP-based collection cannot preserve the shape of time series. A recent work, PatternLDP, attempts to address this problem, but it can only protect a finite group of elements in a time series due to its $\omega$-event-level privacy guarantee. In this paper, we propose PrivShape, a trie-based mechanism under user-level LDP that protects all elements. PrivShape first transforms a time series to reduce its length, and then adopts trie expansion and two-level refinement to improve utility. Through extensive experiments on real-world datasets, we demonstrate that PrivShape outperforms PatternLDP when adapted for offline use, and can effectively extract frequent shapes.
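PrivShape's trie expansion and two-level refinement are not described here in enough detail to reproduce. As background only, the following minimal sketch shows generalized randomized response (GRR), a standard LDP frequency oracle that a collector could use to estimate how often each candidate shape occurs when every user submits one perturbed shape label. The choice of GRR, the shape labels, and the function names are assumptions for illustration, not the paper's mechanism.

```python
import math
import random
from collections import Counter


def grr_perturb(value, domain, epsilon, rng):
    """Client side: keep the true value with probability p, else report another value uniformly."""
    k = len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return value
    return rng.choice([d for d in domain if d != value])


def grr_estimate(reports, domain, epsilon):
    """Server side: unbiased frequency estimate for every value in the domain."""
    k, n = len(domain), len(reports)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    q = 1.0 / (math.exp(epsilon) + k - 1)  # probability of reporting one specific wrong value
    counts = Counter(reports)
    # E[counts[d] / n] = f_d * p + (1 - f_d) * q, so invert that linear map.
    return {d: (counts[d] / n - q) / (p - q) for d in domain}


# Hypothetical example: each user holds one compressed "shape" label.
rng = random.Random(0)
domain = ["up-down", "up-up", "down-up", "flat"]
true_shapes = ["up-down"] * 12000 + ["flat"] * 8000
reports = [grr_perturb(s, domain, 2.0, rng) for s in true_shapes]
estimates = grr_estimate(reports, domain, 2.0)
```

In this sketch each user sends a single report, so the whole budget $\varepsilon$ covers that user's entire (compressed) series; the estimates always sum to 1, but individual entries can be slightly negative for rare shapes.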
From Theory to Comprehension: A Comparative Study of Differential Privacy and $k$-Anonymity
Authors: Saskia Nuñez von Voigt, Luise Mehner, Florian Tschorsch
Subjects: Cryptography and Security (cs.CR); Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2404.04006
Pdf link: https://arxiv.org/pdf/2404.04006
Abstract The notion of $\varepsilon$-differential privacy is a widely used concept for providing quantifiable privacy to individuals. However, it is unclear how to explain the level of privacy protection provided by a differential privacy mechanism with a given $\varepsilon$. In this study, we focus on users' comprehension of the privacy protection provided by a differential privacy mechanism. To do so, we study three ways of explaining the privacy protection provided by differential privacy: (1) the original mathematical definition; (2) $\varepsilon$ translated into a specific privacy risk; and (3) an explanation using the randomized response technique. We compare users' comprehension of privacy protection under these explanatory models with their comprehension of the privacy protection of $k$-anonymity, which serves as a baseline. Our findings suggest that participants' comprehension of differential privacy protection is enhanced by the privacy risk model and the randomized response-based model. Moreover, our results confirm our intuition that the privacy protection provided by $k$-anonymity is more comprehensible.
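Explanations (2) and (3) can be made concrete in a few lines. The following minimal sketch, assuming a single yes/no question, implements classic randomized response, which satisfies $\varepsilon$-DP when the true answer is kept with probability $e^\varepsilon/(1+e^\varepsilon)$, and translates $\varepsilon$ into one possible risk figure: the largest posterior belief an attacker can reach from a given prior. This particular risk translation is an assumption (the study's exact risk wording is not given here), and all function names are hypothetical.

```python
import math
import random


def rr_truth_probability(epsilon):
    """Probability that randomized response reports the true yes/no answer."""
    return math.exp(epsilon) / (1 + math.exp(epsilon))


def randomized_response(true_answer, epsilon, rng):
    """Report the true boolean answer with probability e^eps / (1 + e^eps), else flip it."""
    if rng.random() < rr_truth_probability(epsilon):
        return true_answer
    return not true_answer


def max_posterior(epsilon, prior):
    """Upper bound on an attacker's belief after one eps-DP output, for a prior in (0, 1).

    eps-DP limits the posterior odds to at most e^eps times the prior odds.
    """
    odds = math.exp(epsilon) * prior / (1 - prior)
    return odds / (1 + odds)


rng = random.Random(1)
eps = math.log(3)  # roughly 1.1
answers = [randomized_response(True, eps, rng) for _ in range(10000)]
```

With $\varepsilon = \ln 3$, each respondent answers truthfully with probability 0.75, and an attacker who starts at 50% belief can reach at most 75%.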
You Can Use But Cannot Recognize: Preserving Visual Privacy in Deep Neural Networks
Authors: Qiushi Li, Yan Zhang, Ju Ren, Qi Li, Yaoxue Zhang
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2404.04098
Pdf link: https://arxiv.org/pdf/2404.04098
Abstract Image data have been extensively used in Deep Neural Network (DNN) tasks in various scenarios, e.g., autonomous driving and medical image analysis, which raises significant privacy concerns. Existing privacy protection techniques are unable to efficiently protect such data. For example, Differential Privacy (DP), an emerging technique that protects data with a strong privacy guarantee, cannot effectively protect the visual features of an exposed image dataset. In this paper, we propose a novel privacy-preserving framework, VisualMixer, that protects the training data of visual DNN tasks by pixel shuffling, without injecting any noise. VisualMixer utilizes a new privacy metric called Visual Feature Entropy (VFE) to effectively quantify the visual features of an image from both biological and machine vision aspects. In VisualMixer, we devise a task-agnostic image obfuscation method to protect the visual privacy of data for DNN training and inference. For each image, it determines the regions for pixel shuffling and the sizes of these regions according to the desired VFE. It shuffles pixels both in the spatial domain and in the chromatic channel space within these regions, without injecting noise, so that visual features cannot be discerned and recognized, while incurring negligible accuracy loss. Extensive experiments on real-world datasets demonstrate that VisualMixer can effectively preserve visual privacy with negligible accuracy loss, i.e., 2.35 percentage points of model accuracy loss on average, and almost no performance degradation in model training.
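The abstract does not define VFE or the region-selection rule, so the following is only a minimal sketch of the shuffling step itself, assuming an image stored as nested H x W x C lists and a hand-picked rectangular region. Reading "chromatic channel space" as a per-pixel permutation of channel values is an assumption, and the function and parameter names are hypothetical. The sketch illustrates the property claimed above: pixel values are only permuted, never perturbed with noise.

```python
import copy
import random


def shuffle_region(image, top, left, height, width, rng):
    """Return a copy of an H x W x C image (nested lists) with one rectangle scrambled.

    No values are added or altered: pixels are permuted spatially within the region,
    and each pixel's channel values are permuted (assumed reading of "chromatic" shuffling).
    """
    out = copy.deepcopy(image)
    coords = [(r, c) for r in range(top, top + height) for c in range(left, left + width)]
    pixels = [list(image[r][c]) for r, c in coords]
    rng.shuffle(pixels)  # spatial shuffle within the region
    for (r, c), px in zip(coords, pixels):
        rng.shuffle(px)  # channel shuffle for this pixel
        out[r][c] = px
    return out


# Hypothetical 4x4 RGB image with distinct values; the region is chosen by hand here,
# whereas VisualMixer would choose regions and their sizes from a target VFE.
image = [[[(r * 4 + c) * 3 + ch for ch in range(3)] for c in range(4)] for r in range(4)]
mixed = shuffle_region(image, top=1, left=1, height=2, width=2, rng=random.Random(42))
```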
Keyword: privacy

Federated Unlearning for Human Activity Recognition
Authors: Kongyang Chen, Dongping Zhang, Yaping Chai, Weibin Zhang, Shaowei Wang, Jiaxing Shen
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2404.03659
Pdf link: https://arxiv.org/pdf/2404.03659
Abstract The rapid evolution of Internet of Things (IoT) technology has spurred the widespread adoption of Human Activity Recognition (HAR) in various domains of daily life. Federated Learning (FL) is frequently used to build a global HAR model by aggregating user contributions without transmitting raw individual data. Despite substantial progress in user privacy protection with FL, challenges persist. Regulations like the General Data Protection Regulation (GDPR) empower users to request data removal, raising a new question in FL: how can a HAR client request data removal without compromising other clients' privacy? In response, we propose a lightweight machine unlearning method for refining the FL HAR model by selectively removing a portion of a client's training data. Our method employs a third-party dataset unrelated to model training. Using KL divergence as the fine-tuning loss, we aim to align the predicted probability distribution on the forgotten data with that on the third-party dataset. Additionally, we introduce a membership inference evaluation method to assess unlearning effectiveness. Experimental results across diverse datasets show that our method achieves unlearning accuracy comparable to retraining methods, with speedups ranging from hundreds to thousands of times.
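A minimal sketch of a KL-based unlearning objective, assuming the target is the model's average predicted distribution on the third-party data and that the loss is KL(reference || prediction). The abstract specifies neither the KL direction nor how the reference is formed, and all names are hypothetical. In an actual FL setup this loss would be minimized by gradient descent on the model parameters, which is not shown.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def mean_distribution(prob_rows):
    """Average predicted distribution, e.g. over a third-party reference batch."""
    n = len(prob_rows)
    return [sum(col) / n for col in zip(*prob_rows)]


def unlearning_loss(forget_logits, reference_probs):
    """Mean KL(reference || prediction) over the samples to be forgotten."""
    losses = [kl_divergence(reference_probs, softmax(row)) for row in forget_logits]
    return sum(losses) / len(losses)


# Hypothetical numbers: the model is confident on a sample it should forget.
reference = mean_distribution([[0.6, 0.3, 0.1], [0.4, 0.3, 0.3]])  # about [0.5, 0.3, 0.2]
confident = unlearning_loss([[5.0, 0.0, 0.0]], reference)
aligned = unlearning_loss([[math.log(0.5), math.log(0.3), math.log(0.2)]], reference)
```

Here `confident` is large while `aligned` is essentially zero, so fine-tuning on this loss would push predictions on the forgotten samples toward the reference distribution.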
Mitigating Heterogeneity in Federated Multimodal Learning with Biomedical Vision-Language Pre-training
Authors: Zitao Shuai, Liyue Shen
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.03854
Pdf link: https://arxiv.org/pdf/2404.03854
Abstract Vision-language pre-training (VLP) has arisen as an efficient scheme for multimodal representation learning, but it requires large-scale multimodal data for pre-training, which is an obstacle especially for biomedical applications. To overcome this data limitation, federated learning (FL) can be a promising strategy to scale up the dataset for biomedical VLP while protecting data privacy. However, client data are often heterogeneous in real-world scenarios, and we observe that local training on heterogeneous client data would distort multimodal representation learning and lead to biased cross-modal alignment. To address this challenge, we propose the Federated distributional Robust Guidance-Based (FedRGB) learning framework for federated VLP that is robust to data heterogeneity. Specifically, we use a guidance-based local training scheme to reduce feature distortions, and employ a distribution-based min-max optimization to learn unbiased cross-modal alignment. Experiments on real-world datasets show that our method successfully promotes efficient federated multimodal learning for biomedical VLP under data heterogeneity.
CLUE: A Clinical Language Understanding Evaluation for LLMs
Authors: Amin Dada, Marie Bauer, Amanda Butler Contreras, Osman Alperen Koraş, Constantin Marc Seibold, Kaleb E Smith, Jens Kleesiek
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.04067
Pdf link: https://arxiv.org/pdf/2404.04067
Abstract Large Language Models (LLMs) have shown the potential to significantly contribute to patient care, diagnostics, and administrative processes. Emerging biomedical LLMs address healthcare-specific challenges, including privacy demands and computational constraints. However, evaluation of these models has primarily been limited to non-clinical tasks, which do not reflect the complexity of practical clinical applications. Additionally, there has been no thorough comparison between biomedical and general-domain LLMs for clinical tasks. To fill this gap, we present the Clinical Language Understanding Evaluation (CLUE), a benchmark tailored to evaluate LLMs on real-world clinical tasks. CLUE includes two novel datasets derived from MIMIC IV discharge letters and four existing tasks designed to test the practical applicability of LLMs in healthcare settings. Our evaluation covers several biomedical and general domain LLMs, providing insights into their clinical performance and applicability. CLUE represents a step towards a standardized approach to evaluating and developing LLMs in healthcare to align future model development with the real-world needs of clinical application. We publish our evaluation and data generation scripts: https://github.com/dadaamin/CLUE
Precision Guided Approach to Mitigate Data Poisoning Attacks in Federated Learning
Authors: K Naveen Kumar, C Krishna Mohan, Aravind Machiry
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.04139
Pdf link: https://arxiv.org/pdf/2404.04139
Abstract Federated Learning (FL) is a collaborative learning paradigm that enables participants to collectively train a shared machine learning model while preserving the privacy of their sensitive data. Nevertheless, the inherently decentralized and data-opaque characteristics of FL render it susceptible to data poisoning attacks. These attacks introduce malformed or malicious inputs during local model training, subsequently influencing the global model and resulting in erroneous predictions. Current FL defense strategies against data poisoning attacks either involve a trade-off between accuracy and robustness or require a uniformly distributed root dataset at the server. To overcome these limitations, we present FedZZ, which harnesses a zone-based deviating update (ZBDU) mechanism to effectively counter data poisoning attacks in FL. Further, we introduce a precision-guided methodology that actively characterizes these client clusters (zones), which in turn aids in recognizing and discarding malicious updates at the server. Our evaluation of FedZZ on two widely recognized datasets, CIFAR10 and EMNIST, demonstrates its efficacy in mitigating data poisoning attacks, surpassing prevailing state-of-the-art methods in both single- and multi-client attack scenarios and across varying attack volumes. Notably, FedZZ also functions as a robust client selection strategy, even in highly non-IID and attack-free scenarios. Moreover, as poisoning rates escalate, the model accuracy attained by FedZZ is more resilient than that of existing techniques. For instance, when 50% of clients are malicious, FedZZ sustains an accuracy of 67.43%, while the accuracy of the second-best solution, FL-Defender, drops to 43.36%.
Keyword: machine learning

Machine Learning in Proton Exchange Membrane Water Electrolysis -- Part I: A Knowledge-Integrated Framework
Authors: Xia Chen, Alexander Rex, Janis Woelke, Christoph Eckert, Boris Bensmann, Richard Hanke-Rauschenbach, Philipp Geyer
Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE)
Arxiv link: https://arxiv.org/abs/2404.03660
Pdf link: https://arxiv.org/pdf/2404.03660
Abstract In this study, we propose to adopt a novel framework, Knowledge-integrated Machine Learning, for advancing Proton Exchange Membrane Water Electrolysis (PEMWE) development. Given the significance of PEMWE in green hydrogen production and the inherent challenges in optimizing its performance, our framework aims to meld data-driven models with domain-specific insights systematically to address the domain challenges. We first identify the uncertainties originating from data acquisition conditions, data-driven model mechanisms, and domain expertise, highlighting their complementary characteristics in carrying information from different perspectives. Building upon this foundation, we showcase how to adeptly decompose knowledge and extract unique information to contribute to the data augmentation, modeling process, and knowledge discovery. We demonstrate a hierarchical three-level framework, termed the "Ladder of Knowledge-integrated Machine Learning", in the PEMWE context, applying it to three case studies within a context of cell degradation analysis to affirm its efficacy in interpolation, extrapolation, and information representation. This research lays the groundwork for more knowledge-informed enhancements in ML applications in engineering.
Machine learning augmented diagnostic testing to identify sources of variability in test performance
Authors: Christopher J. Banks, Aeron Sanchez, Vicki Stewart, Kate Bowen, Graham Smith, Rowland R. Kao
Subjects: Machine Learning (cs.LG); Populations and Evolution (q-bio.PE); Applications (stat.AP); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2404.03678
Pdf link: https://arxiv.org/pdf/2404.03678
Abstract Diagnostic tests that can detect pre-clinical or sub-clinical infection are among the most powerful tools in our armoury for controlling infectious diseases. Considerable effort has therefore been devoted to improving diagnostic testing for human, plant and animal diseases, including strategies for targeting diagnostic tests towards individuals who are more likely to be infected. Here, we follow other recent proposals to further refine this concept, using machine learning to assess the situational risk under which a diagnostic test is applied in order to augment its interpretation. We apply this to predict the occurrence of breakdowns of cattle herds due to bovine tuberculosis, exploiting the availability of exceptionally detailed testing records. We show that, without compromising test specificity, test sensitivity can be improved so that the proportion of infected herds detected by the skin test improves by over 16 percentage points. While many risk factors are associated with an increased risk of becoming infected, of note are several factors suggesting that in some herds there is a higher risk of infection going undetected, including effects correlated with the veterinary practice conducting the test and the number of livestock moved off the herd.
Predictive Analytics of Varieties of Potatoes
Authors: Fabiana Ferracina, Bala Krishnamoorthy, Mahantesh Halappanavar, Shengwei Hu, Vidyasagar Sathuvalli
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2404.03701
Pdf link: https://arxiv.org/pdf/2404.03701
Abstract We explore the application of machine learning algorithms to predict the suitability of Russet potato clones for advancement in breeding trials. Leveraging data from manually collected trials in the state of Oregon, we investigate the potential of a wide variety of state-of-the-art binary classification models. We conduct a comprehensive analysis of the dataset that includes preprocessing, feature engineering, and imputation to address missing values. We focus on several key metrics such as accuracy, F1-score, and Matthews correlation coefficient (MCC) for model evaluation. The top-performing models, namely the multi-layer perceptron (MLPC), histogram-based gradient boosting classifier (HGBC), and a support vector machine (SVC), demonstrate consistent and significant results. Variable selection further enhances model performance and identifies influential features in predicting trial outcomes. The findings emphasize the potential of machine learning in streamlining the selection process for potato varieties, offering benefits such as increased efficiency, substantial cost savings, and judicious resource utilization. Our study contributes insights into precision agriculture and showcases the relevance of advanced technologies for informed decision-making in breeding programs.
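The abstract does not spell out its evaluation metrics; the following minimal sketch uses the standard definitions of F1 and the Matthews correlation coefficient on binary confusion counts, with hypothetical function names and made-up labels. It shows why MCC is a useful complement to accuracy: a classifier that never predicts the positive class reaches 90% accuracy here, while F1 and MCC are both 0.

```python
import math


def confusion_counts(y_true, y_pred):
    """Binary confusion-matrix counts (tp, tn, fp, fn) with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def f1(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: report 0 when a marginal is empty and MCC is undefined.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical trial outcomes: 1 = clone advanced, 0 = not advanced.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [0] * 10  # always predicts "not advanced"
counts = confusion_counts(y_true, y_pred)
accuracy = (counts[0] + counts[1]) / len(y_true)
```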
On Extending the Automatic Test Markup Language (ATML) for Machine Learning
Authors: Tyler Cody, Bingtong Li, Peter A. Beling
Subjects: Software Engineering (cs.SE); Machine Learning (cs.LG); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2404.03769
Pdf link: https://arxiv.org/pdf/2404.03769
Abstract This paper addresses the urgent need for messaging standards in the operational test and evaluation (T&E) of machine learning (ML) applications, particularly in edge ML applications embedded in systems like robots, satellites, and unmanned vehicles. It examines the suitability of the IEEE Standard 1671 (IEEE Std 1671), known as the Automatic Test Markup Language (ATML), an XML-based standard originally developed for electronic systems, for ML application testing. The paper explores extending IEEE Std 1671 to encompass the unique challenges of ML applications, including the use of datasets and dependencies on software. Through modeling various tests such as adversarial robustness and drift detection, this paper offers a framework adaptable to specific applications, suggesting that minor modifications to ATML might suffice to address the novelties of ML. This paper differentiates ATML's focus on testing from other ML standards like Predictive Model Markup Language (PMML) or Open Neural Network Exchange (ONNX), which concentrate on ML model specification. We conclude that ATML is a promising tool for effective, near real-time operational T&E of ML applications, an essential aspect of AI lifecycle management, safety, and governance.
A Systems Theoretic Approach to Online Machine Learning
Authors: Anli du Preez, Peter A. Beling, Tyler Cody
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2404.03775
Pdf link: https://arxiv.org/pdf/2404.03775
Abstract The machine learning formulation of online learning is incomplete from a systems theoretic perspective. Typically, machine learning research emphasizes domains and tasks, and a problem-solving worldview. It focuses on algorithm parameters, features, and samples, and neglects the perspective offered by considering system structure and system behavior or dynamics. Online learning is an active field of research and has been widely explored in terms of statistical theory and computational algorithms; however, the literature still generally lacks formal systems theoretic frameworks for modeling online learning systems and resolving systems-related concept drift issues. Furthermore, while the machine learning formulation serves to classify methods and literature, the systems theoretic formulation presented herein provides a framework for the top-down design of online learning systems, including a novel definition of online learning and the identification of key design parameters. The framework is formulated in terms of input-output systems and is further divided into system structure and system behavior. Concept drift is a critical challenge in online learning, and this work formally treats it as part of the system behavior characteristics. Healthcare provider fraud detection using machine learning serves as a case study throughout the paper to ground the discussion in a real-world online learning challenge.
An ExplainableFair Framework for Prediction of Substance Use Disorder Treatment Completion
Authors: Mary M. Lucas, Xiaoyang Wang, Chia-Hsuan Chang, Christopher C. Yang, Jacqueline E. Braughton, Quyen M. Ngo
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2404.03833
Pdf link: https://arxiv.org/pdf/2404.03833
Abstract Fairness of machine learning models in healthcare has drawn increasing attention from clinicians, researchers, and even the highest levels of government. At the same time, the importance of developing and deploying interpretable or explainable models has been demonstrated, as explainability is essential to increasing the trustworthiness and likelihood of adoption of these models. The objective of this study was to develop and implement a framework that addresses both of these issues: fairness and explainability. We propose an explainable fairness framework, first developing a model with optimized performance, and then using an in-processing approach to mitigate model biases relative to the sensitive attributes of race and sex. We then explore and visualize explanations of the model changes produced by the fairness-enhancement process by examining changes in feature importance. Our resulting fairness-enhanced models retain high sensitivity with improved fairness, and the explanations of the fairness enhancement may provide helpful insights for healthcare providers to guide clinical decision-making and resource allocation.
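The abstract does not name its fairness metric. One common choice consistent with its emphasis on sensitivity is the equal-opportunity gap: the difference in true-positive rate between groups of a sensitive attribute. The minimal sketch below assumes binary labels and predictions; the function names and toy data are hypothetical, and the paper's in-processing mitigation itself is not shown.

```python
def group_sensitivity(y_true, y_pred, groups):
    """True-positive rate (sensitivity) per group, e.g. per sex or race category."""
    hits, positives = {}, {}
    for t, p, g in zip(y_true, y_pred, groups):
        if t == 1:
            positives[g] = positives.get(g, 0) + 1
            hits[g] = hits.get(g, 0) + (1 if p == 1 else 0)
    return {g: hits[g] / positives[g] for g in positives}


def equal_opportunity_gap(y_true, y_pred, groups):
    """Largest difference in sensitivity between any two groups (0 means parity)."""
    rates = group_sensitivity(y_true, y_pred, groups)
    return max(rates.values()) - min(rates.values())


# Hypothetical labels: 1 = completed treatment; groups "A" and "B" are placeholders.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
groups = ["A", "A", "B", "B", "A", "B"]
```

A fairness-enhancement step would aim to shrink this gap while keeping overall sensitivity high, which is the trade-off the abstract reports.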
Optimizing Convolutional Neural Networks for Identifying Invasive Pollinator Apis Mellifera and Finding a Ligand drug to Protect California's Biodiversity
Authors: Arnav Swaroop
Subjects: Machine Learning (cs.LG); Biomolecules (q-bio.BM); Quantitative Methods (q-bio.QM)
Arxiv link: https://arxiv.org/abs/2404.03870
Pdf link: https://arxiv.org/pdf/2404.03870
Abstract In North America, there are many diverse species of native bees that are crucial for the environment, as they are the primary pollinators of most native floral species. The Californian agriculture industry imports European honeybees (Apis mellifera) primarily for pollinating almonds. Unfortunately, this has had the unintended consequence of disrupting the native ecosystem and threatening many native bee species, which are outcompeted for food. Our first step in protecting the native species is identification, using a Convolutional Neural Network (CNN) to differentiate common native bee species from invasive ones. Removing invasive colonies efficiently without harming native species is difficult, as pesticides cause myriad diseases in native species. Our approach seeks to prevent the formation of new queens, causing the colony's collapse. Workers secrete royal jelly, a substance that promotes fertility and longevity; it is fed to future honeybee queens. Targeting the production of this substance is safe because no native species use it; small organic molecules (ligands) prevent the proteins Apisimin and MRJP1 from combining to produce an oligomer used to form the substance. Ideal ligands bind to only one of these proteins, preventing them from joining together: they have a high affinity for one receptor and a significantly lower affinity for the other. We optimized the CNN to provide a framework for creating machine learning models that excel at differentiating between subspecies of insects, by measuring the effects of image alteration and class grouping on model performance. The CNN achieves an accuracy of 82% in differentiating between invasive and native bee species, and three ligands have been identified as effective. Our new approach offers a promising solution to curb the spread of invasive bees within California through an identification and neutralization method.
Semantic SQL -- Combining and optimizing semantic predicates in SQL
Authors: Akash Mittal, Anshul Bheemreddy, Huili Tao
Subjects: Databases (cs.DB)
Arxiv link: https://arxiv.org/abs/2404.03880
Pdf link: https://arxiv.org/pdf/2404.03880
Abstract In recent years, the surge in unstructured data analysis, facilitated by advancements in Machine Learning (ML), has prompted diverse approaches for handling images, text documents, and videos. Analysts, leveraging ML models, can extract meaningful information from unstructured data and store it in relational databases, allowing the execution of SQL queries for further analysis. Simultaneously, vector databases have emerged, embedding unstructured data for efficient top-k retrieval based on textual queries. This paper introduces a novel framework, SSQL (Semantic SQL), that combines these two approaches, enabling the incorporation of semantic queries within SQL statements. Our approach extends SQL queries with dedicated keywords for specifying semantic queries alongside predicates related to ML model results and metadata. Our experimental results show that using semantic queries alone fails catastrophically to answer count and spatial queries in more than 60% of the cases. Our proposed method jointly optimizes queries containing both semantic predicates and predicates on structured tables, such as those generated by ML models or other metadata. Further, to improve the query results, we incorporated human-in-the-loop feedback to determine the optimal similarity score threshold for returning results.
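SSQL's actual keywords and optimizer are not described here, so this is a minimal sketch of the underlying idea only, assuming each row carries a structured column produced by an ML model plus an embedding vector. A structured predicate is applied first as a cheap filter, then a cosine-similarity threshold acts as the semantic predicate. The names `Row`, `hybrid_select`, the column `num_people`, and the threshold value are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Row:
    image_id: str
    num_people: int  # structured column, e.g. filled in by an object detector
    embedding: list  # e.g. from an image-text embedding model


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def hybrid_select(rows, query_embedding, threshold, structured_pred):
    """Rows satisfying a structured predicate AND a semantic similarity threshold."""
    candidates = [r for r in rows if structured_pred(r)]  # cheap filter first
    return [r for r in candidates if cosine(r.embedding, query_embedding) >= threshold]


rows = [
    Row("a", 3, [0.9, 0.1]),
    Row("b", 1, [1.0, 0.0]),
    Row("c", 4, [0.0, 1.0]),
]
beach_query = [1.0, 0.0]  # stand-in embedding for a text query such as "beach"
result = hybrid_select(rows, beach_query, threshold=0.9, structured_pred=lambda r: r.num_people >= 2)
```

A count query then reduces to `len(result)`; the `threshold` argument plays the role of the similarity score threshold that the authors tune with human-in-the-loop feedback.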
Multi-Task Learning for Lung sound & Lung disease classification
Authors: Suma K V, Deepali Koppad, Preethi Kumar, Neha A Kantikar, Surabhi Ramesh
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Sound (cs.SD)
Arxiv link: https://arxiv.org/abs/2404.03908
Pdf link: https://arxiv.org/pdf/2404.03908
Abstract In recent years, advancements in deep learning techniques have considerably enhanced the efficiency and accuracy of medical diagnostics. In this work, a novel approach using multi-task learning (MTL) for the simultaneous classification of lung sounds and lung diseases is proposed. Our proposed model applies MTL with four different deep learning models, namely 2D CNN, ResNet50, MobileNet and DenseNet, to extract relevant features from the lung sound recordings. The ICBHI 2017 Respiratory Sound Database was used in the current study. The MTL MobileNet model performed better than the other models considered, with an accuracy of 74% for lung sound analysis and 91% for lung disease classification. The experimental results demonstrate the efficacy of our approach in classifying both lung sounds and lung diseases concurrently. In this study, using the patients' demographic data from the database, a risk level for Chronic Obstructive Pulmonary Disease is also computed. For this computation, three machine learning algorithms, namely Logistic Regression, SVM and Random Forest classifiers, were employed. Among these, the Random Forest classifier had the highest accuracy, at 92%. This work can considerably reduce the physician's burden, not just of diagnosing the pathology but also of effectively communicating to the patient about the possible causes or outcomes.
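A minimal sketch of a multi-task objective, assuming one shared feature extractor (not shown) feeding two classification heads, one for lung sounds and one for diseases, trained on a weighted sum of their cross-entropy losses. The abstract does not state how the tasks are weighted, so the equal default weights and the placeholder class counts are assumptions; function names are hypothetical.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for one example, computed with log-sum-exp for stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def multitask_loss(sound_logits, sound_target, disease_logits, disease_target,
                   w_sound=1.0, w_disease=1.0):
    """Weighted sum of the two task losses (the heads would share a backbone, not shown)."""
    return (w_sound * cross_entropy(sound_logits, sound_target)
            + w_disease * cross_entropy(disease_logits, disease_target))


# Hypothetical head outputs for one recording (class counts are placeholders).
sound_logits = [0.0, 0.0, 0.0, 0.0]
disease_logits = [0.0, 0.0]
loss = multitask_loss(sound_logits, 2, disease_logits, 1)
```

Because gradients from both terms flow into the shared backbone, each task acts as a regularizer for the other; the weights control how much each task influences the shared features.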
Towards Understanding the Impact of Code Modifications on Software Quality Metrics
Authors: Thomas Karanikiotis, Andreas L. Symeonidis
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2404.03953
Pdf link: https://arxiv.org/pdf/2404.03953
Abstract Context: In the realm of software development, maintaining high software quality is a persistent challenge. However, this challenge is often impeded by the lack of comprehensive understanding of how specific code modifications influence quality metrics. Objective: This study ventures to bridge this gap through an approach that aspires to assess and interpret the impact of code modifications. The underlying hypothesis posits that code modifications inducing similar changes in software quality metrics can be grouped into distinct clusters, which can be effectively described using an AI language model, thus providing a simple understanding of code changes and their quality implications. Method: To validate this hypothesis, we built and analyzed a dataset from popular GitHub repositories, segmented into individual code modifications. Each project was evaluated against software quality metrics before and after each modification was applied. Machine learning techniques were utilized to cluster these modifications based on the induced changes in the metrics. Simultaneously, an AI language model was employed to generate descriptions of each modification's function. Results: The results reveal distinct clusters of code modifications, each accompanied by a concise description, revealing their collective impact on software quality metrics. Conclusions: The findings suggest that this research is a significant step towards a comprehensive understanding of the complex relationship between code changes and software quality, which has the potential to transform software maintenance strategies and enable the development of more accurate quality prediction models.
Transformers for molecular property prediction: Lessons learned from the past five years
Authors: Afnan Sultan, Jochen Sieg, Miriam Mathea, Andrea Volkamer
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Quantitative Methods (q-bio.QM)
Arxiv link: https://arxiv.org/abs/2404.03969
Pdf link: https://arxiv.org/pdf/2404.03969
Abstract Molecular Property Prediction (MPP) is vital for drug discovery, crop protection, and environmental science. Over the last decades, diverse computational techniques have been developed, from using simple physical and chemical properties and molecular fingerprints in statistical models and classical machine learning to advanced deep learning approaches. In this review, we aim to distill insights from current research on employing transformer models for MPP. We analyze the currently available models and explore key questions that arise when training and fine-tuning a transformer model for MPP. These questions encompass the choice and scale of the pre-training data, optimal architecture selections, and promising pre-training objectives. Our analysis highlights areas not yet covered in current research, inviting further exploration to enhance the field's understanding. Additionally, we address the challenges in comparing different models, emphasizing the need for standardized data splitting and robust statistical analysis.
Fast Genetic Algorithm for feature selection -- A qualitative approximation approach
Authors: Mohammed Ghaith Altarabichi, Sławomir Nowaczyk, Sepideh Pashami, Peyman Sheikholharam Mashhadi
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.03996
Pdf link: https://arxiv.org/pdf/2404.03996
Abstract Evolutionary Algorithms (EAs) are often challenging to apply in real-world settings since evolutionary computations involve a large number of evaluations of a typically expensive fitness function. For example, an evaluation could involve training a new machine learning model. An approximation (also known as a meta-model or surrogate) of the true function can be used in such applications to alleviate the computation cost. In this paper, we propose a two-stage surrogate-assisted evolutionary approach to address the computational issues arising from using a Genetic Algorithm (GA) for feature selection in a wrapper setting for large datasets. We define 'Approximation Usefulness' to capture the necessary conditions to ensure correctness of the EA computations when an approximation is used. Based on this definition, we propose a procedure to construct a lightweight qualitative meta-model by the active selection of data instances. We then use the meta-model to carry out the feature selection task. We apply this procedure to the GA-based algorithm CHC (Cross generational elitist selection, Heterogeneous recombination and Cataclysmic mutation) to create a Qualitative approXimations variant, CHCQX. We show that CHCQX converges faster to feature subset solutions of significantly higher accuracy (as compared to CHC), particularly for large datasets with over 100K instances. We also demonstrate the applicability of the thinking behind our approach more broadly to Swarm Intelligence (SI), another branch of the Evolutionary Computation (EC) paradigm, with results of PSOQX, a qualitative approximation adaptation of the Particle Swarm Optimization (PSO) method. A GitHub repository with the complete implementation is available.
Continual Learning with Weight Interpolation
Authors: Jędrzej Kozal, Jan Wasilewski, Bartosz Krawczyk, Michał Woźniak
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.04002
Pdf link: https://arxiv.org/pdf/2404.04002
Abstract Continual learning poses a fundamental challenge for modern machine learning systems, requiring models to adapt to new tasks while retaining knowledge from previous ones. Addressing this challenge necessitates the development of efficient algorithms capable of learning from data streams and accumulating knowledge over time. This paper proposes a novel approach to continual learning utilizing the weight consolidation method. Our method, a simple yet powerful technique, enhances robustness against catastrophic forgetting by interpolating between old and new model weights after each novel task, effectively merging two models to facilitate exploration of local minima emerging after the arrival of new concepts. Moreover, we demonstrate that our approach can complement existing rehearsal-based replay approaches, improving their accuracy and further mitigating the forgetting phenomenon. Additionally, our method provides an intuitive mechanism for controlling the stability-plasticity trade-off. Experimental results show that the proposed weight consolidation approach significantly enhances the performance of state-of-the-art experience replay algorithms. Our algorithm can be downloaded from https://github.com/jedrzejkozal/weight-interpolation-cl.
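The interpolation step described above can be illustrated with a minimal sketch, assuming plain linear interpolation controlled by a single hypothetical coefficient `alpha` and weights stored as Python lists. This is not the authors' released code; the linked repository holds their actual rule.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def interpolate_weights(old: Weights, new: Weights, alpha: float) -> Weights:
    """Return alpha * old + (1 - alpha) * new for every parameter.

    alpha near 1 keeps the previous-task model (stability);
    alpha near 0 keeps the freshly trained model (plasticity).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if old.keys() != new.keys():
        raise ValueError("both models must have the same parameter names")
    merged: Weights = {}
    for name, old_vals in old.items():
        new_vals = new[name]
        if len(old_vals) != len(new_vals):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [alpha * o + (1.0 - alpha) * n for o, n in zip(old_vals, new_vals)]
    return merged


# Usage: merge two toy models halfway.
old_model = {"w": [1.0, 2.0], "b": [0.0]}
new_model = {"w": [3.0, 4.0], "b": [1.0]}
print(interpolate_weights(old_model, new_model, 0.5))  # {'w': [2.0, 3.0], 'b': [0.5]}
```

Sweeping `alpha` between 0 and 1 is one simple way to picture the stability-plasticity knob the abstract mentions.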
Good Books are Complex Matters: Gauging Complexity Profiles Across Diverse Categories of Perceived Literary Quality
Authors: Yuri Bizzoni, Pascale Feldkamp, Ida Marie Lassen, Mia Jacobsen, Mads Rosendahl Thomsen, Kristoffer Nielbo
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2404.04022
Pdf link: https://arxiv.org/pdf/2404.04022
Abstract In this study, we employ a classification approach to show that different categories of literary "quality" display unique linguistic profiles, leveraging a corpus that encompasses titles from the Norton Anthology, Penguin Classics series, and the Open Syllabus project, contrasted against contemporary bestsellers, Nobel prize winners and recipients of prestigious literary awards. Our analysis reveals that canonical and so-called high-brow texts exhibit distinct textual features when compared to other quality categories such as bestsellers and popular titles as well as to control groups, likely responding to distinct (but not mutually exclusive) models of quality. We apply a classic machine learning approach, namely Random Forest, to distinguish quality novels from "control groups", achieving F1 scores of up to 77\% in differentiating between the categories. We find that quality categories tend to be easier to distinguish from control groups than from other quality categories, suggesting that literary quality features might be distinguishable but shared through quality proxies.
Cycle Life Prediction for Lithium-ion Batteries: Machine Learning and More
Authors: Joachim Schaeffer, Giacomo Galuppini, Jinwook Rhyu, Patrick A. Asinger, Robin Droop, Rolf Findeisen, Richard D. Braatz
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.04049
Pdf link: https://arxiv.org/pdf/2404.04049
Abstract Batteries are dynamic systems with complicated nonlinear aging, highly dependent on cell design, chemistry, manufacturing, and operational conditions. Predicting battery cycle life and estimating aging states are important for accelerating battery R&D and testing and for furthering the understanding of how batteries degrade. Beyond testing, battery management systems rely on real-time models and onboard diagnostics and prognostics for safe operation. Estimating the state of health and remaining useful life of a battery is important for optimizing performance and using resources efficiently. This tutorial begins with an overview of first-principles, machine learning, and hybrid battery models. Then, a typical pipeline for the development of interpretable machine learning models is explained and showcased for cycle life prediction from laboratory testing data. We highlight the challenges of machine learning models, motivating the incorporation of physics in hybrid modeling approaches, which are needed to decipher the aging trajectory of batteries but require more data and further work on the physics of battery degradation. The tutorial closes with a discussion on generalization and further research directions.
Derivative-free tree optimization for complex systems
Authors: Ye Wei, Bo Peng, Ruiwen Xie, Yangtao Chen, Yu Qin, Peng Wen, Stefan Bauer, Po-Yen Tung
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2404.04062
Pdf link: https://arxiv.org/pdf/2404.04062
Abstract A tremendous range of design tasks in materials, physics, and biology can be formulated as finding the optimum of an objective function depending on many parameters without knowing its closed-form expression or its derivative. Traditional derivative-free optimization techniques often rely on strong assumptions about objective functions, thereby failing at optimizing non-convex systems beyond 100 dimensions. Here, we present a tree search method for derivative-free optimization that enables accelerated optimal design of high-dimensional complex systems. Specifically, we introduce stochastic tree expansion, dynamic upper confidence bound, and short-range backpropagation mechanisms to evade local optima, iteratively approximating the global optimum using machine learning models. This development effectively confronts dimensionally challenging problems, achieving convergence to global optima across various benchmark functions up to 2,000 dimensions and surpassing existing methods by 10- to 20-fold. Our method is applicable to a wide range of real-world complex systems spanning materials, physics, and biology, considerably outperforming state-of-the-art algorithms. This enables efficient autonomous knowledge discovery and facilitates self-driving virtual laboratories. Although we focus on problems within the realm of natural science, the advancements in optimization techniques achieved herein are applicable to a broader spectrum of challenges across all quantitative disciplines.
Hierarchical Neural Additive Models for Interpretable Demand Forecasts
Authors: Leif Feddersen, Catherine Cleophas
Subjects: Machine Learning (cs.LG); Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2404.04070
Pdf link: https://arxiv.org/pdf/2404.04070
Abstract Demand forecasts are the crucial basis for numerous business decisions, ranging from inventory management to strategic facility planning. While machine learning (ML) approaches offer accuracy gains, their interpretability and acceptance are notoriously lacking. Addressing this dilemma, we introduce Hierarchical Neural Additive Models for time series (HNAM). HNAM expands upon Neural Additive Models (NAM) by introducing a time-series specific additive model with a level and interacting covariate components. Covariate interactions are only allowed according to a user-specified interaction hierarchy. For example, weekday effects may be estimated independently of other covariates, whereas a holiday effect may depend on the weekday and an additional promotion may depend on both former covariates that are lower in the interaction hierarchy. Thereby, HNAM yields an intuitive forecasting interface in which analysts can observe the contribution for each known covariate. We evaluate the proposed approach and benchmark its performance against other state-of-the-art machine learning and statistical models extensively on real-world retail data. The results reveal that HNAM offers competitive prediction performance whilst providing plausible explanations.
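To make the additive structure with an interaction hierarchy concrete, here is a toy sketch. All numbers are hypothetical, and hard-coded lookup tables stand in for the learned neural components of HNAM; only the shape of the decomposition (level plus a weekday effect, a holiday effect that may depend on the weekday, and a promotion effect that may depend on both) follows the abstract.

```python
from typing import Dict, Tuple

LEVEL = 100.0  # hypothetical baseline demand
# Weekday effect, estimated independently (0 = Monday ... 6 = Sunday).
WEEKDAY_EFFECT: Dict[int, float] = {d: (10.0 if d >= 5 else 0.0) for d in range(7)}
# Holiday effect may depend on the weekday (lower in the hierarchy).
HOLIDAY_EFFECT: Dict[int, float] = {d: (-20.0 if d >= 5 else 15.0) for d in range(7)}
# Promotion effect may depend on both weekday and holiday.
PROMO_EFFECT: Dict[Tuple[int, bool], float] = {
    (d, h): (5.0 if h else 12.0) for d in range(7) for h in (False, True)
}


def forecast(weekday: int, holiday: bool, promo: bool) -> Tuple[float, Dict[str, float]]:
    """Return the forecast plus the per-component contributions an analyst can inspect."""
    parts = {
        "level": LEVEL,
        "weekday": WEEKDAY_EFFECT[weekday],
        "holiday": HOLIDAY_EFFECT[weekday] if holiday else 0.0,
        "promotion": PROMO_EFFECT[(weekday, holiday)] if promo else 0.0,
    }
    return sum(parts.values()), parts


# Usage: a Monday holiday with a promotion running.
total, parts = forecast(weekday=0, holiday=True, promo=True)
print(total, parts)  # 120.0 {'level': 100.0, 'weekday': 0.0, 'holiday': 15.0, 'promotion': 5.0}
```

Because every component is returned separately, the forecast stays inspectable in the way the abstract describes, regardless of how each component is estimated.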
Machine Learning-Aided Cooperative Localization under Dense Urban Environment
Authors: Hoon Lee, Hong Ki Kim, Seung Hyun Oh, Sang Hyun Lee
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2404.04096
Pdf link: https://arxiv.org/pdf/2404.04096
Abstract Future wireless network technology provides automobiles with the connectivity feature to consolidate the concept of vehicular networks that collaborate on conducting cooperative driving tasks. The full potential of connected vehicles, which promises road safety and a quality driving experience, can be leveraged if machine learning models guarantee robustness in performing core functions, including localization and control. Location awareness, in particular, lends itself to the deployment of location-specific services and the improvement of operational performance. Localization entails direct communication with the network infrastructure, and the resulting centralized positioning solutions readily become intractable as the network scales up. As an alternative to centralized solutions, this article addresses the decentralized principle of vehicular localization reinforced by machine learning techniques in dense urban environments where reliable measurements are frequently inaccessible. In this setting, the collaboration of multiple vehicles enhances the positioning performance of machine learning approaches. A virtual testbed is developed to validate this machine learning model for real-map vehicular networks. Numerical results demonstrate universal feasibility of cooperative localization, in particular for dense urban area configurations.
Generalizable Temperature Nowcasting with Physics-Constrained RNNs for Predictive Maintenance of Wind Turbine Components
Authors: Johannes Exenberger, Matteo Di Salvo, Thomas Hirsch, Franz Wotawa, Gerald Schweiger
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2404.04126
Pdf link: https://arxiv.org/pdf/2404.04126
Abstract Machine learning plays an important role in the operation of current wind energy production systems. One central application is predictive maintenance to increase efficiency and lower electricity costs by reducing downtimes. Integrating physics-based knowledge into neural networks to enforce their physical plausibility is a promising method to improve current approaches, but incomplete system information often impedes their application in real-world scenarios. We describe a simple and efficient way for physics-constrained deep learning-based predictive maintenance for wind turbine gearbox bearings with partial system knowledge. The approach is based on temperature nowcasting constrained by physics, where unknown system coefficients are treated as learnable neural network parameters. Results show improved generalization performance to unseen environments compared to a baseline neural network, which is especially important in low data scenarios often encountered in real-world applications.
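The idea of treating unknown system coefficients as learnable parameters can be sketched under strong simplifying assumptions: a hypothetical single-node thermal model dT/dt = k(T_amb - T) + c·u with unknown `k` and `c`, synthetic noise-free data, and an ordinary least-squares fit standing in for the physics-constrained RNN trained in the paper.

```python
from typing import Tuple

import numpy as np


def simulate_temperature(k: float, c: float, t_amb: float, load: np.ndarray,
                         t0: float, dt: float) -> np.ndarray:
    """Euler-discretised lumped thermal model: dT/dt = k * (t_amb - T) + c * load."""
    temps = [t0]
    for u in load[:-1]:
        t = temps[-1]
        temps.append(t + dt * (k * (t_amb - t) + c * u))
    return np.array(temps)


def fit_coefficients(temps: np.ndarray, t_amb: float, load: np.ndarray,
                     dt: float) -> Tuple[float, float]:
    """Estimate the unknown (k, c) by least squares on finite differences."""
    dtemp_dt = np.diff(temps) / dt
    # Each row: [ambient gap, load]; the model is linear in (k, c).
    features = np.column_stack([t_amb - temps[:-1], load[:-1]])
    coef, *_ = np.linalg.lstsq(features, dtemp_dt, rcond=None)
    return float(coef[0]), float(coef[1])


# Usage on synthetic data generated with known (hypothetical) coefficients.
rng = np.random.default_rng(0)
load = rng.uniform(0.0, 1.0, size=200)
temps = simulate_temperature(k=0.05, c=2.0, t_amb=20.0, load=load, t0=20.0, dt=1.0)
k_hat, c_hat = fit_coefficients(temps, t_amb=20.0, load=load, dt=1.0)
print(round(k_hat, 4), round(c_hat, 4))
```

In the paper's setting the coefficients would be fitted jointly with the network weights by gradient descent; the closed-form fit here only shows that the physical structure makes them identifiable from temperature data.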
Precision Guided Approach to Mitigate Data Poisoning Attacks in Federated Learning
Authors: K Naveen Kumar, C Krishna Mohan, Aravind Machiry
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.04139
Pdf link: https://arxiv.org/pdf/2404.04139
Abstract Federated Learning (FL) is a collaborative learning paradigm enabling participants to collectively train a shared machine learning model while preserving the privacy of their sensitive data. Nevertheless, the inherently decentralized and data-opaque characteristics of FL render it susceptible to data poisoning attacks. These attacks introduce malformed or malicious inputs during local model training, subsequently influencing the global model and resulting in erroneous predictions. Current FL defense strategies against data poisoning attacks either involve a trade-off between accuracy and robustness or necessitate the presence of a uniformly distributed root dataset at the server. To overcome these limitations, we present FedZZ, which harnesses a zone-based deviating update (ZBDU) mechanism to effectively counter data poisoning attacks in FL. Further, we introduce a precision-guided methodology that actively characterizes these client clusters (zones), which in turn aids in recognizing and discarding malicious updates at the server. Our evaluation of FedZZ on two widely recognized datasets, CIFAR10 and EMNIST, demonstrates its efficacy in mitigating data poisoning attacks, surpassing the performance of prevailing state-of-the-art methodologies in both single and multi-client attack scenarios and varying attack volumes. Notably, FedZZ also functions as a robust client selection strategy, even in highly non-IID and attack-free scenarios. Moreover, in the face of escalating poisoning rates, the model accuracy attained by FedZZ displays superior resilience compared to existing techniques. For instance, when confronted with a 50% presence of malicious clients, FedZZ sustains an accuracy of 67.43%, while the accuracy of the second-best solution, FL-Defender, diminishes to 43.36%.
Reliable Feature Selection for Adversarially Robust Cyber-Attack Detection
Authors: João Vitorino, Miguel Silva, Eva Maia, Isabel Praça
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2404.04188
Pdf link: https://arxiv.org/pdf/2404.04188
Abstract The growing cybersecurity threats make it essential to use high-quality data to train Machine Learning (ML) models for network traffic analysis, without noisy or missing data. By selecting the most relevant features for cyber-attack detection, it is possible to improve both the robustness and computational efficiency of the models used in a cybersecurity system. This work presents a feature selection and consensus process that combines multiple methods and applies them to several network datasets. Two different feature sets were selected and were used to train multiple ML models with regular and adversarial training. Finally, an adversarial evasion robustness benchmark was performed to analyze the reliability of the different feature sets and their impact on the susceptibility of the models to adversarial examples. By using an improved dataset with more data diversity, selecting the best time-related features and a more specific feature set, and performing adversarial training, the ML models were able to achieve a better adversarially robust generalization. The robustness of the models was significantly improved without their generalization to regular traffic flows being affected, without increases of false alarms, and without requiring too many computational resources, which enables a reliable detection of suspicious activity and perturbed traffic flows in enterprise computer networks.
How Lexical is Bilingual Lexicon Induction?
Authors: Harsh Kohli, Helian Feng, Nicholas Dronen, Calvin McCarter, Sina Moeini, Ali Kebarighotbi
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2404.04221
Pdf link: https://arxiv.org/pdf/2404.04221
Abstract In contemporary machine learning approaches to bilingual lexicon induction (BLI), a model learns a mapping between the embedding spaces of a language pair. Recently, the retrieve-and-rank approach to BLI has achieved state-of-the-art results on the task. However, the problem remains challenging in low-resource settings due to the paucity of data. The task is further complicated by factors such as lexical variation across languages. We argue that incorporating additional lexical information into the recent retrieve-and-rank approach should improve lexicon induction. We demonstrate the efficacy of our proposed approach on XLING, improving over the previous state of the art by an average of 2\% across all language pairs.
Active Causal Learning for Decoding Chemical Complexities with Targeted Interventions
Authors: Zachary R. Fox, Ayana Ghosh
Subjects: Machine Learning (cs.LG); Chemical Physics (physics.chem-ph); Data Analysis, Statistics and Probability (physics.data-an); Biomolecules (q-bio.BM)
Arxiv link: https://arxiv.org/abs/2404.04224
Pdf link: https://arxiv.org/pdf/2404.04224
Abstract Predicting and enhancing inherent properties based on molecular structures is paramount to design tasks in medicine, materials science, and environmental management. Most of the current machine learning and deep learning approaches have become standard for predictions, but they face challenges when applied across different datasets due to reliance on correlations between molecular representation and target properties. These approaches typically depend on large datasets to capture the diversity within the chemical space, facilitating a more accurate approximation, interpolation, or extrapolation of the chemical behavior of molecules. In our research, we introduce an active learning approach that discerns underlying cause-effect relationships through strategic sampling with the use of a graph loss function. This method identifies the smallest subset of the dataset capable of encoding the most information representative of a much larger chemical space. The identified causal relations are then leveraged to conduct systematic interventions, optimizing the design task within a chemical space that the models have not encountered previously. While our implementation focused on the QM9 quantum-chemical dataset for a specific design task (finding molecules with a large dipole moment), our active causal learning approach, driven by intelligent sampling and interventions, holds potential for broader applications in molecular and materials design and discovery.
Evaluating Adversarial Robustness: A Comparison Of FGSM, Carlini-Wagner Attacks, And The Role of Distillation as Defense Mechanism
Authors: Trilokesh Ranjan Sarkar, Nilanjan Das, Pralay Sankar Maitra, Bijoy Some, Ritwik Saha, Orijita Adhikary, Bishal Bose, Jaydip Sen
Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.04245
Pdf link: https://arxiv.org/pdf/2404.04245
Abstract This technical report presents an in-depth exploration of adversarial attacks specifically targeted at Deep Neural Networks (DNNs) utilized for image classification. The study also investigates defense mechanisms aimed at bolstering the robustness of machine learning models. The research focuses on comprehending the ramifications of two prominent attack methodologies: the Fast Gradient Sign Method (FGSM) and the Carlini-Wagner (CW) approach. These attacks are examined with respect to three pre-trained image classifiers, Resnext50_32x4d, DenseNet-201, and VGG-19, using the Tiny-ImageNet dataset. Furthermore, the study proposes defensive distillation as a defense mechanism to counter FGSM and CW attacks. This defense mechanism is evaluated using the CIFAR-10 dataset, where CNN models, specifically resnet101 and Resnext50_32x4d, serve as the teacher and student models, respectively. The proposed defensive distillation model is effective in thwarting attacks such as FGSM. However, it is found to remain susceptible to more sophisticated techniques like the CW attack. The document presents a meticulous validation of the proposed scheme. It provides detailed and comprehensive results, elucidating the efficacy and limitations of the defense mechanisms employed. Through rigorous experimentation and analysis, the study offers insights into the dynamics of adversarial attacks on DNNs, as well as the effectiveness of defensive strategies in mitigating their impact.
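For reference, FGSM perturbs an input by a step of size eps in the direction of the sign of the loss gradient, and defensive distillation trains the student on teacher probabilities softened by a temperature. A minimal NumPy sketch of both ingredients follows, assuming a toy logistic-regression model (whose input gradient has the closed form (p - y) * w) rather than the ResNeXt, DenseNet and VGG classifiers studied in the report.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fgsm_logistic(x: np.ndarray, y: float, w: np.ndarray, b: float, eps: float) -> np.ndarray:
    """One-step FGSM: x_adv = clip(x + eps * sign(grad_x L), 0, 1).

    Model: p = sigmoid(w . x + b) with binary cross-entropy loss L,
    for which the input gradient is grad_x L = (p - y) * w.
    """
    p = sigmoid(np.dot(w, x) + b)
    grad_x = (p - y) * w
    return np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)


def softmax_with_temperature(logits: np.ndarray, temperature: float) -> np.ndarray:
    """Softened class probabilities; a larger temperature gives a flatter distribution."""
    z = logits / temperature
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


# Usage: perturb a toy input whose true label is 1 so that the loss increases.
x = np.array([0.5, 0.5])
w = np.array([1.0, -2.0])
x_adv = fgsm_logistic(x, y=1.0, w=w, b=0.0, eps=0.1)
print(x_adv)
print(softmax_with_temperature(np.array([2.0, 1.0, 0.0]), temperature=10.0))
```

Each feature moves by exactly eps, which is why FGSM is cheap; the CW attack instead solves an optimization problem per input, consistent with the abstract's finding that it is harder to stop.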
Keyword: optimization

Serial Parallel Reliability Redundancy Allocation Optimization for Energy Efficient and Fault Tolerant Cloud Computing
Authors: Gutha Jaya Krishna
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.03665
Pdf link: https://arxiv.org/pdf/2404.03665
Abstract Serial-parallel redundancy is a reliable way to ensure that services and systems remain available in cloud computing. That method involves making copies of the same system or program, with only one remaining active. When an error occurs, the inactive copy can step in as a backup right away; this provides continuous performance and uninterrupted operation. This approach is called parallel redundancy, otherwise known as active-active redundancy, and it is exceptional when it comes to strategy. It creates duplicates of a system or service that are all running at once. By doing this, fault tolerance increases, since if one copy fails, the workload can be distributed across any replica that is functioning properly. Reliability allocation depends on the features of a system and the availability and fault tolerance you want from it. Serial redundancy or parallel redundancy can be applied to increase the dependability of systems and services. To demonstrate how well this concept works, we looked into fixed serial-parallel reliability redundancy allocation problems and then used an innovative hybrid optimization technique to find the best possible allocation for peak dependability. We then compared our findings with other research.
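The textbook reliability of a series-parallel system, with stage i running n_i identical copies of reliability r_i, is R = prod_i [1 - (1 - r_i)^{n_i}]. The sketch below computes it and, as a stand-in for the paper's hybrid optimizer (which the abstract does not describe), exhaustively searches redundancy levels under a hypothetical per-copy cost budget.

```python
from itertools import product
from math import prod
from typing import Optional, Sequence, Tuple


def system_reliability(r: Sequence[float], n: Sequence[int]) -> float:
    """Series system of stages; stage i runs n[i] identical copies in parallel.

    A stage works if at least one copy works: 1 - (1 - r_i) ** n_i.
    The series system works only if every stage works, so stage reliabilities multiply.
    """
    return prod(1.0 - (1.0 - ri) ** ni for ri, ni in zip(r, n))


def best_allocation(r: Sequence[float], cost: Sequence[float], budget: float,
                    max_copies: int = 4) -> Optional[Tuple[Tuple[int, ...], float]]:
    """Try 1..max_copies copies per stage and keep the most reliable allocation
    whose total cost fits the budget (None if nothing fits)."""
    best = None
    for n in product(range(1, max_copies + 1), repeat=len(r)):
        if sum(c * k for c, k in zip(cost, n)) > budget:
            continue
        rel = system_reliability(r, n)
        if best is None or rel > best[1]:
            best = (n, rel)
    return best


# Usage: two stages with per-copy reliabilities 0.9 and 0.8, unit cost, budget 3.
# The search adds the spare copy to the weaker stage, giving allocation (1, 2).
print(best_allocation([0.9, 0.8], [1.0, 1.0], budget=3.0))
```

Exhaustive search grows as max_copies ** stages, which is exactly why heuristic or hybrid optimizers are used for realistic instances.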
SpikeExplorer: hardware-oriented Design Space Exploration for Spiking Neural Networks on FPGA
Authors: Dario Padovano, Alessio Carpegna, Alessandro Savino, Stefano Di Carlo
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.03714
Pdf link: https://arxiv.org/pdf/2404.03714
Abstract One of today's main concerns is to bring Artificial Intelligence power to embedded systems for edge applications. The hardware resources and power consumption required by state-of-the-art models are incompatible with the constrained environments observed in edge systems, such as IoT nodes and wearable devices. Spiking Neural Networks (SNNs) can represent a solution in this sense: inspired by neuroscience, they reach unparalleled power and resource efficiency when run on dedicated hardware accelerators. However, when designing such accelerators, the number of possible design choices is huge. This paper presents SpikeExplorer, a modular and flexible Python tool for hardware-oriented Automatic Design Space Exploration to automate the configuration of FPGA accelerators for SNNs. Using Bayesian optimization, SpikeExplorer enables hardware-centric multi-objective optimization, supporting factors such as accuracy, area, latency, power, and various combinations of these during the exploration process. The tool searches for the optimal network architecture, neuron model, and internal and training parameters, trying to reach the desired constraints imposed by the user. It allows for a straightforward network configuration, providing the full set of explored points for the user to pick the trade-off that best fits their needs. The potential of SpikeExplorer is showcased using three benchmark datasets. It reaches 95.8% accuracy on the MNIST dataset, with a power consumption of 180mW/image and a latency of 0.12 ms/image, making it a powerful tool for automatically optimizing SNNs.
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
Authors: Corby Rosset, Ching-An Cheng, Arindam Mitra, Michael Santacroce, Ahmed Awadallah, Tengyang Xie
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2404.03715
Pdf link: https://arxiv.org/pdf/2404.03715
Abstract This paper studies post-training large language models (LLMs) using preference feedback from a powerful oracle to help a model iteratively improve over itself. The typical approach for post-training LLMs involves Reinforcement Learning from Human Feedback (RLHF), which traditionally separates reward learning and subsequent policy optimization. However, such a reward maximization approach is limited by the nature of "point-wise" rewards (such as those of the Bradley-Terry model), which fail to express complex intransitive or cyclic preference relations. While advances on RLHF show that reward learning and policy optimization can be merged into a single contrastive objective for stability, they still remain tethered to the reward maximization framework. Recently, a new wave of research sidesteps the reward maximization presumptions in favor of directly optimizing over "pair-wise" or general preferences. In this paper, we introduce Direct Nash Optimization (DNO), a provable and scalable algorithm that marries the simplicity and stability of contrastive learning with theoretical generality from optimizing general preferences. Because DNO is a batched on-policy algorithm using a regression-based objective, its implementation is straightforward and efficient. Moreover, DNO enjoys monotonic improvement across iterations, which helps it improve even over a strong teacher (such as GPT-4). In our experiments, a resulting 7B parameter Orca-2.5 model aligned by DNO achieves the state-of-the-art win-rate against GPT-4-Turbo of 33% on AlpacaEval 2.0 (even after controlling for response length), an absolute gain of 26% (7% to 33%) over the initializing model. It outperforms models with far more parameters, including Mistral Large, Self-Rewarding LM (70B parameters), and older versions of GPT-4.
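The abstract's point that point-wise (Bradley-Terry) rewards cannot express cyclic preferences can be checked directly: Bradley-Terry sets P(a beats b) = sigmoid(r(a) - r(b)), so the induced preferences inherit the ordering of the scalar rewards. A small sketch with hypothetical rewards and a hypothetical rock-paper-scissors preference table follows; it illustrates the limitation only and is not DNO.

```python
import math
from itertools import permutations
from typing import Callable, Dict, Sequence, Tuple

# Probability that the first response beats the second.
Pref = Callable[[str, str], float]


def bradley_terry(rewards: Dict[str, float]) -> Pref:
    """Point-wise model: P(a beats b) = sigmoid(r(a) - r(b))."""
    return lambda a, b: 1.0 / (1.0 + math.exp(-(rewards[a] - rewards[b])))


def has_cycle(pref: Pref, items: Sequence[str]) -> bool:
    """True if some a, b, c satisfy a > b, b > c and c > a (each with probability > 0.5)."""
    return any(
        pref(a, b) > 0.5 and pref(b, c) > 0.5 and pref(c, a) > 0.5
        for a, b, c in permutations(items, 3)
    )


items = ["A", "B", "C"]
# Any scalar rewards induce a transitive ordering, so Bradley-Terry cannot cycle.
bt = bradley_terry({"A": 1.0, "B": 0.5, "C": 0.0})
# A general preference table (rock-paper-scissors style) can cycle.
table: Dict[Tuple[str, str], float] = {
    ("A", "B"): 0.7, ("B", "A"): 0.3,
    ("B", "C"): 0.7, ("C", "B"): 0.3,
    ("C", "A"): 0.7, ("A", "C"): 0.3,
}


def general(a: str, b: str) -> float:
    return table[(a, b)]


print(has_cycle(bt, items), has_cycle(general, items))  # False True
```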
Localized Distributional Robustness in Submodular Multi-Task Subset Selection
Authors: Ege C. Kaya, Abolfazl Hashemi
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2404.03759
Pdf link: https://arxiv.org/pdf/2404.03759
Abstract In this work, we approach the problem of multi-task submodular optimization from the perspective of local distributional robustness, within the neighborhood of a reference distribution which assigns an importance score to each task. We first propose adding to the standard multi-task objective a regularization term that makes use of the relative entropy. We then demonstrate through duality that this novel formulation itself is equivalent to the maximization of a submodular function, which may be efficiently carried out through standard greedy selection methods. This approach bridges the existing gap in the optimization of performance-robustness trade-offs in multi-task subset selection. To numerically validate our theoretical results, we test the proposed method in two different settings, one involving the selection of satellites in low Earth orbit constellations in the context of a sensor selection problem, and the other involving an image summarization task using neural networks. Our method is compared with two other algorithms, focused respectively on optimizing the performance of the worst-case task and on directly optimizing the performance on the reference distribution itself. We conclude that our novel formulation produces a solution that is locally distributionally robust and computationally inexpensive.
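The abstract reduces the problem to maximizing a submodular function with standard greedy selection. The sketch below shows that standard greedy routine on a hypothetical toy instance: a weighted sum of per-task coverage functions, with the weights playing the role of the reference distribution's importance scores. It does not reproduce the paper's relative-entropy-regularized objective.

```python
from typing import Callable, Dict, List, Optional, Set

# Hypothetical toy instance: which targets each candidate sensor covers, per task.
COVER: Dict[str, Dict[str, Set[int]]] = {
    "task1": {"s1": {1, 2, 3}, "s2": {3, 4}, "s3": set()},
    "task2": {"s1": set(), "s2": {1, 2}, "s3": {1, 2, 3, 4}},
}
# Importance score of each task, standing in for a reference distribution.
WEIGHTS: Dict[str, float] = {"task1": 0.7, "task2": 0.3}


def weighted_coverage(selected: Set[str]) -> float:
    """Sum over tasks of weight * number of covered targets.

    Monotone submodular, since each coverage function is and the weights are non-negative.
    """
    total = 0.0
    for task, cover in COVER.items():
        covered: Set[int] = set()
        for s in selected:
            covered |= cover[s]
        total += WEIGHTS[task] * len(covered)
    return total


def greedy_select(ground: List[str], k: int, f: Callable[[Set[str]], float]) -> List[str]:
    """Standard greedy: repeatedly add the element with the largest marginal gain."""
    chosen: List[str] = []
    current: Set[str] = set()
    for _ in range(k):
        best_e: Optional[str] = None
        best_gain = float("-inf")
        for e in ground:
            if e in current:
                continue
            gain = f(current | {e}) - f(current)
            if gain > best_gain:
                best_e, best_gain = e, gain
        if best_e is None:  # ground set exhausted
            break
        chosen.append(best_e)
        current.add(best_e)
    return chosen


print(greedy_select(["s1", "s2", "s3"], k=2, f=weighted_coverage))  # ['s1', 's2']
```

For monotone submodular objectives under a cardinality constraint, this greedy rule carries the classic (1 - 1/e) approximation guarantee, which is what makes the reduction in the abstract attractive.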
Understanding Language Modeling Paradigm Adaptations in Recommender Systems: Lessons Learned and Open Challenges
Authors: Lemei Zhang, Peng Liu, Yashar Deldjoo, Yong Zheng, Jon Atle Gulla
Subjects: Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2404.03788
Pdf link: https://arxiv.org/pdf/2404.03788
Abstract Large Language Models (LLMs) have achieved tremendous success in the field of Natural Language Processing owing to diverse training paradigms that empower them to effectively capture intricate linguistic patterns and semantic representations. In particular, the recent "pre-train, prompt and predict" training paradigm has attracted significant attention as an approach for learning generalizable models with limited labeled data. In line with this advancement, these training paradigms have recently been adapted to the recommendation domain and are seen as a promising direction in both academia and industry. This half-day tutorial aims to provide a thorough understanding of extracting and transferring knowledge from pre-trained models learned through different training paradigms to improve recommender systems from various perspectives, such as generality, sparsity, effectiveness and trustworthiness. In this tutorial, we first introduce the basic concepts and a generic architecture of the language modeling paradigm for recommendation purposes. Then, we focus on recent advancements in adapting LLM-related training strategies and optimization objectives for different recommendation tasks. After that, we will systematically introduce ethical issues in LLM-based recommender systems and discuss possible approaches to assessing and mitigating them. We will also summarize the relevant datasets and evaluation metrics, and present an empirical study on the recommendation performance of training paradigms. Finally, we will conclude the tutorial with a discussion of open challenges and future directions.
Optimization of resources for digital radio transmission over IBOC FM through max-min fairness
Authors: Mónica Rico Martínez, Juan Carlos Vesga Ferreira, Joel Carroll Vargas, María Consuelo Rodríguez Niño, Andrés Alejandro Diaz Toro, William Alexander Cuevas Carrero
Subjects: Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2404.03795
Pdf link: https://arxiv.org/pdf/2404.03795
Abstract The equitable distribution of resources in a network is a complex process, considering that not all nodes have the same requirements, and the In-Band On-Channel (IBOC) hybrid transmission system is no exception. The IBOC system uses hybrid in-band transmission to simultaneously broadcast analog and digital audio over the FM band. This article proposes a Max-Min Fairness (MMF) algorithm as a strategy to optimize resource allocation for IBOC FM transmission in a multiservice scenario. Additionally, the MMF algorithm has low computational complexity, suiting implementation in low-cost embedded systems, and aims to achieve fair resource distribution and provide adequate Quality of Service (QoS) levels for each node in the RF network, considering channel conditions and traffic types. The article explores a scenario under saturated traffic conditions to assess the optimization capabilities of the MMF algorithm under well-defined traffic and channel conditions. The evaluation yielded highly favorable results, indicating with 95% confidence that the MMF algorithm can be considered a viable alternative for bandwidth optimization in digital broadcasting over IBOC on FM, and it holds potential for implementation in other digital broadcasting systems.
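Max-min fairness is classically computed by progressive filling ("water-filling"). The sketch below assumes a single shared bandwidth pool and per-node demands in arbitrary units; it illustrates the fairness criterion only and does not model the paper's multiservice IBOC setting, channel conditions, or traffic types.

```python
from typing import List


def max_min_fair(capacity: float, demands: List[float]) -> List[float]:
    """Progressive filling (water-filling) over one shared resource.

    Nodes are served in order of increasing demand: any node whose demand is
    at most the current equal share gets exactly its demand, and the leftover
    is re-split among the rest. Remaining nodes all receive the final share.
    """
    alloc = [0.0] * len(demands)
    remaining = capacity
    pending = sorted(range(len(demands)), key=lambda i: demands[i])
    while pending:
        share = remaining / len(pending)
        smallest = pending[0]
        if demands[smallest] <= share:
            alloc[smallest] = demands[smallest]
            remaining -= demands[smallest]
            pending.pop(0)
        else:
            for i in pending:
                alloc[i] = share
            break
    return alloc


print(max_min_fair(10.0, [2.0, 2.6, 4.0, 5.0]))  # approximately [2.0, 2.6, 2.7, 2.7]
```

No node can gain bandwidth without taking it from a node that already has an equal or smaller allocation, which is the defining property of a max-min fair allocation. Sorting dominates the cost, so the loop is cheap enough for embedded use.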
Fast k-connectivity Restoration in Multi-Robot Systems for Robust Communication Maintenance
Authors: Md Ishat-E-Rabban, Guangyao Shi, Griffin Bonner, Pratap Tokekar
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2404.03834
Pdf link: https://arxiv.org/pdf/2404.03834
Abstract Maintaining a robust communication network plays an important role in the success of a multi-robot team jointly performing an optimization task. A key characteristic of a robust cooperative multi-robot system is the ability to repair the communication topology in the case of robot failure. In this paper, we focus on the Fast k-connectivity Restoration (FCR) problem, which aims to repair a network to make it k-connected with minimum robot movement. We develop a Quadratically Constrained Program (QCP) formulation of the FCR problem, which provides a way to optimally solve the problem, but cannot handle large instances due to high computational overhead. We therefore present a scalable algorithm, called EA-SCR, for the FCR problem using graph theoretic concepts. By conducting empirical studies, we demonstrate that the EA-SCR algorithm performs within 10 percent of the optimal while being orders of magnitude faster. We also show that EA-SCR outperforms existing solutions by 30 percent in terms of the FCR distance metric.
A Block-Coordinate Descent EMO Algorithm: Theoretical and Empirical Analysis
Authors: Benjamin Doerr, Joshua Knowles, Aneta Neumann, Frank Neumann
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.03838
Pdf link: https://arxiv.org/pdf/2404.03838
Abstract We consider whether conditions exist under which block-coordinate descent is asymptotically efficient in evolutionary multi-objective optimization, addressing an open problem. Block-coordinate descent, where an optimization problem is decomposed into $k$ blocks of decision variables and each of the blocks is optimized (with the others fixed) in a sequence, is a technique used in some large-scale optimization problems such as airline scheduling; however, its use in multi-objective optimization is less studied. We propose a block-coordinate version of GSEMO and compare its running time to the standard GSEMO algorithm. Theoretical and empirical results on a bi-objective test function, a variant of LOTZ, demonstrate the existence of cases where block-coordinate descent is faster. The result may yield wider insights into this class of algorithms.
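For reference, here is a minimal sketch of the standard GSEMO baseline on the classic LOTZ (LeadingOnesTrailingZeros) function. The block-coordinate variant and the paper's LOTZ variant are not reproduced; the iteration cap and seed are arbitrary choices for illustration.

```python
import random
from typing import Dict, Tuple

Bits = Tuple[int, ...]
Obj = Tuple[int, int]


def lotz(x: Bits) -> Obj:
    """(number of leading 1s, number of trailing 0s); both objectives are maximized."""
    n = len(x)
    leading_ones = next((i for i, b in enumerate(x) if b == 0), n)
    trailing_zeros = next((i for i, b in enumerate(reversed(x)) if b == 1), n)
    return leading_ones, trailing_zeros


def weakly_dominates(a: Obj, b: Obj) -> bool:
    return a[0] >= b[0] and a[1] >= b[1]


def gsemo_lotz(n: int, seed: int = 0, max_iters: int = 1_000_000):
    """Standard GSEMO; returns (iterations used, final population as bits -> objectives)."""
    rng = random.Random(seed)
    x = tuple(rng.randint(0, 1) for _ in range(n))
    pop: Dict[Bits, Obj] = {x: lotz(x)}
    for it in range(1, max_iters + 1):
        parent = rng.choice(list(pop))
        # standard bit mutation: flip each bit independently with probability 1/n
        child = tuple(1 - b if rng.random() < 1.0 / n else b for b in parent)
        fc = lotz(child)
        # accept the child unless some member strictly dominates it ...
        if not any(weakly_dominates(f, fc) and f != fc for f in pop.values()):
            # ... and drop every member the child weakly dominates
            pop = {z: f for z, f in pop.items() if not weakly_dominates(fc, f)}
            pop[child] = fc
        # the Pareto front of LOTZ is 1^i 0^(n-i), i.e. objective vectors (i, n - i)
        if len(pop) == n + 1 and all(sum(f) == n for f in pop.values()):
            return it, pop
    return max_iters, pop


iters, front = gsemo_lotz(8, seed=1)
print(iters, sorted(front.values()))
```

Standard GSEMO mutates all n bits of a randomly chosen parent; a block-coordinate version changes which variables are varied at each step, which is where the running-time comparison in the abstract comes from.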
The Low-Degree Hardness of Finding Large Independent Sets in Sparse Random Hypergraphs
Authors: Abhishek Dhawan, Yuzhou Wang
Subjects: Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS); Combinatorics (math.CO); Probability (math.PR); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2404.03842
Pdf link: https://arxiv.org/pdf/2404.03842
Abstract We study the algorithmic task of finding large independent sets in Erdos-Renyi $r$-uniform hypergraphs on $n$ vertices having average degree $d$. Krivelevich and Sudakov showed that the maximum independent set has density $\left(\frac{r\log d}{(r-1)d}\right)^{1/(r-1)}$. We show that the class of low-degree polynomial algorithms can find independent sets of density $\left(\frac{\log d}{(r-1)d}\right)^{1/(r-1)}$ but no larger. This extends and generalizes earlier results of Gamarnik and Sudan, Rahman and Virag, and Wein on graphs, and answers a question of Bal and Bennett. We conjecture that this statistical-computational gap holds for this problem. Additionally, we explore the universality of this gap by examining $r$-partite hypergraphs. A hypergraph $H=(V,E)$ is $r$-partite if there is a partition $V=V_1\cup\cdots\cup V_r$ such that each edge contains exactly one vertex from each set $V_i$. We consider the problem of finding large balanced independent sets (independent sets containing the same number of vertices in each partition) in random $r$-partite hypergraphs with $n$ vertices in each partition and average degree $d$. We prove that the maximum balanced independent set has density $\left(\frac{r\log d}{(r-1)d}\right)^{1/(r-1)}$ asymptotically. Furthermore, we prove an analogous low-degree computational threshold of $\left(\frac{\log d}{(r-1)d}\right)^{1/(r-1)}$. Our results recover and generalize recent work of Perkins and the second author on bipartite graphs. While the graph case has been extensively studied, this work is the first to consider statistical-computational gaps of optimization problems on random hypergraphs. Our results suggest that these gaps persist for larger uniformities as well as across many models. A somewhat surprising aspect of the gap for balanced independent sets is that the algorithm achieving the lower bound is a simple degree-1 polynomial.
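The size of the statistical-computational gap follows directly from the two densities quoted above. In the block below, $\alpha^{*}$ and $\alpha_{\mathrm{LD}}$ are labels introduced here for the maximum and low-degree densities, not notation from the paper:

```latex
\alpha^{*} = \left(\frac{r\log d}{(r-1)d}\right)^{1/(r-1)}, \qquad
\alpha_{\mathrm{LD}} = \left(\frac{\log d}{(r-1)d}\right)^{1/(r-1)}, \qquad
\frac{\alpha_{\mathrm{LD}}}{\alpha^{*}} = \left(\frac{1}{r}\right)^{1/(r-1)} = r^{-1/(r-1)}.
```

For $r = 2$ the ratio is $1/2$: the densities become $2\log d/d$ and $\log d/d$, the factor-2 gap known for graphs. For $r = 3$ the ratio is $3^{-1/2} \approx 0.577$. Since $\ln r/(r-1) \to 0$, the multiplicative ratio tends to 1 as $r \to \infty$.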
Mitigating Heterogeneity in Federated Multimodal Learning with Biomedical Vision-Language Pre-training
Authors: Zitao Shuai, Liyue Shen
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.03854
Pdf link: https://arxiv.org/pdf/2404.03854
Abstract Vision-language pre-training (VLP) has emerged as an efficient scheme for multimodal representation learning, but it requires large-scale multimodal data for pre-training, which is an obstacle especially for biomedical applications. To overcome this data limitation, federated learning (FL) can be a promising strategy to scale up the dataset for biomedical VLP while protecting data privacy. However, client data are often heterogeneous in real-world scenarios, and we observe that local training on heterogeneous client data would distort the multimodal representation learning and lead to biased cross-modal alignment. To address this challenge, we propose the Federated distributional Robust Guidance-Based (FedRGB) learning framework for federated VLP with robustness to data heterogeneity. Specifically, we use a guidance-based local training scheme to reduce feature distortions, and employ a distribution-based min-max optimization to learn unbiased cross-modal alignment. Experiments on real-world datasets show that our method successfully promotes efficient federated multimodal learning for biomedical VLP with data heterogeneity.
Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data
Authors: Jingyu Zhang, Marc Marone, Tianjian Li, Benjamin Van Durme, Daniel Khashabi
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2404.03862
Pdf link: https://arxiv.org/pdf/2404.03862
Abstract For humans to trust the fluent generations of large language models (LLMs), they must be able to verify their correctness against trusted, external sources. Recent efforts aim to increase verifiability through citations of retrieved documents or post-hoc provenance. However, such citations are prone to mistakes that further complicate their verifiability. To address these limitations, we tackle the verifiability goal with a different philosophy: we trivialize the verification process by developing models that quote verbatim statements from trusted sources in pre-training data. We propose Quote-Tuning, which demonstrates the feasibility of aligning LLMs to leverage memorized information and quote from pre-training data. Quote-Tuning quantifies quoting against large corpora with efficient membership inference tools, and uses the amount of quotes as an implicit reward signal to construct a synthetic preference dataset for quoting, without any human annotation. Next, the target model is aligned to quote using preference optimization algorithms. Experimental results show that Quote-Tuning significantly increases the percentage of LLM generation quoted verbatim from high-quality pre-training documents by 55% to 130% relative to untuned models while maintaining response quality. Further experiments demonstrate that Quote-Tuning generalizes quoting to out-of-domain data, is applicable in different tasks, and provides additional benefits to truthfulness. Quote-Tuning not only serves as a hassle-free method to increase quoting but also opens up avenues for improving LLM trustworthiness through better verifiability.
Heterogeneous Multi-Agent Reinforcement Learning for Zero-Shot Scalable Collaboration
Authors: Xudong Guo, Daming Shi, Junjie Yu, Wenhui Fan
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Robotics (cs.RO); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2404.03869
Pdf link: https://arxiv.org/pdf/2404.03869
Abstract The rise of multi-agent systems, especially the success of multi-agent reinforcement learning (MARL), is reshaping our future across diverse domains like autonomous vehicle networks. However, MARL still faces significant challenges, particularly in achieving zero-shot scalability, which allows trained MARL models to be directly applied to unseen tasks with varying numbers of agents. In addition, real-world multi-agent systems usually contain agents with different functions and strategies, while existing scalable MARL methods support only limited heterogeneity. To address this, we propose a novel MARL framework named Scalable and Heterogeneous Proximal Policy Optimization (SHPPO), integrating heterogeneity into parameter-shared PPO-based MARL networks. We first leverage a latent network to adaptively learn strategy patterns for each agent. Second, we introduce a heterogeneous layer for decision-making, whose parameters are specifically generated by the learned latent variables. Our approach is scalable as all the parameters are shared except for the heterogeneous layer, and gains both inter-individual and temporal heterogeneity at the same time. We implement our approach based on the state-of-the-art backbone PPO-based algorithm as SHPPO, while our approach is agnostic to the backbone and can be seamlessly plugged into any parameter-shared MARL method. SHPPO exhibits superior performance over the baselines such as MAPPO and HAPPO in classic MARL environments like Starcraft Multi-Agent Challenge (SMAC) and Google Research Football (GRF), showcasing enhanced zero-shot scalability and offering insights into the learned latent representation's impact on team performance by visualization.
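SHPPO builds on Proximal Policy Optimization. As background, the following is a minimal sketch of the standard PPO-Clip surrogate objective over a batch of samples; it does not include SHPPO's latent network or heterogeneous layer, and the default clip range eps = 0.2 is just a common choice.

```python
import math
from typing import Sequence


def ppo_clip_objective(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    eps: float = 0.2,
) -> float:
    """Mean clipped surrogate objective of PPO (to be maximized).

    ratio = pi_new(a|s) / pi_old(a|s), computed from log-probabilities.
    Clipping the ratio to [1 - eps, 1 + eps] and taking the minimum removes
    the incentive to move the policy far from the one that collected the data.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


print(ppo_clip_objective([0.0, 0.0], [0.0, 0.0], [1.0, 3.0]))  # 2.0 (ratio = 1, no clipping)
print(ppo_clip_objective([math.log(2.0)], [0.0], [1.0]))      # about 1.2 (ratio 2 clipped to 1 + eps)
```

In a parameter-shared multi-agent setup, the same objective is typically averaged over every agent's samples, so all agents update one shared set of weights.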
A proximal policy optimization based intelligent home solar management
Authors: Kode Creer, Imitiaz Parvez
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.03888
Pdf link: https://arxiv.org/pdf/2404.03888
Abstract In the smart grid, prosumers who own renewable energy sources and storage units can sell unused electricity back to the power grid. Maximizing their profits in a dynamic electricity market requires intelligent planning. To address this, we propose a framework based on Proximal Policy Optimization (PPO) using recurrent rewards. By using information about the rewards, modeled effectively with PPO, to maximize our objective, we obtained over 30\% improvement over other naive algorithms in accumulated total profits. This is promising for applying reinforcement learning algorithms to tasks that require planning actions in complex domains such as financial markets. We also introduce a novel method for embedding longs based on soliton waves that outperformed normal embedding in our use case with random floating point data augmentation.
Game-theoretic Distributed Learning Approach for Heterogeneous-cost Task Allocation with Budget Constraints
Authors: Weiyi Yang, Xiaolu Liu, Lei He, Yonghao Du, Yingwu Chen
Subjects: Computer Science and Game Theory (cs.GT)
Arxiv link: https://arxiv.org/abs/2404.03974
Pdf link: https://arxiv.org/pdf/2404.03974
Abstract This paper investigates heterogeneous-cost task allocation with budget constraints (HCTAB), wherein heterogeneity is manifested through the varying capabilities and costs associated with different agents for task execution. Unlike centralized optimization-based methods, the HCTAB problem is solved here using a fully distributed framework, and a coalition formation game is introduced to provide a theoretical guarantee for this distributed framework. To solve the coalition formation game, a convergence-guaranteed log-linear learning algorithm based on heterogeneous cost is proposed. This algorithm incorporates two improvement strategies, namely, a cooperative exchange strategy and a heterogeneous-cost log-linear learning strategy. These strategies are specifically designed to be compatible with the heterogeneous cost and budget constraints characteristic of the HCTAB problem. Through ablation experiments, we demonstrate the effectiveness of these two improvements. Finally, numerical results show that the proposed algorithm outperforms existing task allocation algorithms and learning algorithms in terms of solving the HCTAB problem.
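The core of log-linear learning is that a randomly chosen agent revises its action with probability proportional to exp(utility / tau). The minimal sketch below shows the textbook version on a hypothetical task-sharing game (two tasks whose values are split equally among the agents assigned to them); the paper's heterogeneous costs, cooperative exchange strategy, and budget constraints are not modeled.

```python
import math
import random
from typing import Callable, List, Sequence


def log_linear_probs(utilities: Sequence[float], tau: float) -> List[float]:
    """Boltzmann (softmax) choice probabilities with temperature tau."""
    m = max(utilities)  # shift by the max for numerical stability
    weights = [math.exp((u - m) / tau) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def log_linear_learning(
    n_agents: int,
    actions: Sequence[int],
    utility: Callable[[int, List[int]], float],
    tau: float,
    steps: int,
    seed: int = 0,
) -> List[int]:
    """Asynchronous log-linear learning: one random agent revises per step."""
    rng = random.Random(seed)
    profile = [rng.choice(actions) for _ in range(n_agents)]
    for _ in range(steps):
        i = rng.randrange(n_agents)
        utils = []
        for a in actions:  # evaluate each alternative with the others held fixed
            trial = profile.copy()
            trial[i] = a
            utils.append(utility(i, trial))
        profile[i] = rng.choices(actions, weights=log_linear_probs(utils, tau))[0]
    return profile


# Hypothetical example: two tasks with values 6 and 3; agents on a task share its value equally.
TASK_VALUES = [6.0, 3.0]


def shared_value(i: int, profile: List[int]) -> float:
    task = profile[i]
    return TASK_VALUES[task] / profile.count(task)


print(log_linear_learning(3, [0, 1], shared_value, tau=0.05, steps=500, seed=0))
```

A small tau makes revisions nearly greedy while still allowing occasional exploratory moves, which is what underpins the usual convergence guarantees in potential games.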
Rolling the dice for better deep learning performance: A study of randomness techniques in deep neural networks
Authors: Mohammed Ghaith Altarabichi, Sławomir Nowaczyk, Sepideh Pashami, Peyman Sheikholharam Mashhadi, Julia Handl
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2404.03992
Pdf link: https://arxiv.org/pdf/2404.03992
Abstract This paper investigates how various randomization techniques impact Deep Neural Networks (DNNs). Randomization techniques such as weight noise and dropout help reduce overfitting and enhance generalization, but their interactions are poorly understood. The study categorizes randomness techniques into four types and proposes new methods: adding noise to the loss function and randomly masking gradient updates. Using a Particle Swarm Optimizer (PSO) for hyperparameter optimization, it explores optimal configurations across the MNIST, FASHION-MNIST, CIFAR10, and CIFAR100 datasets. Over 30,000 configurations are evaluated, revealing data augmentation and weight initialization randomness as the main performance contributors. Correlation analysis shows that different optimizers prefer distinct randomization types. The complete implementation and dataset are available on GitHub.
Fast Genetic Algorithm for feature selection -- A qualitative approximation approach
Authors: Mohammed Ghaith Altarabichi, Sławomir Nowaczyk, Sepideh Pashami, Peyman Sheikholharam Mashhadi
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.03996
Pdf link: https://arxiv.org/pdf/2404.03996
Abstract Evolutionary Algorithms (EAs) are often challenging to apply in real-world settings since evolutionary computations involve a large number of evaluations of a typically expensive fitness function. For example, an evaluation could involve training a new machine learning model. An approximation (also known as a meta-model or surrogate) of the true function can be used in such applications to reduce the computation cost. In this paper, we propose a two-stage surrogate-assisted evolutionary approach to address the computational issues arising from using Genetic Algorithm (GA) for feature selection in a wrapper setting for large datasets. We define 'Approximation Usefulness' to capture the necessary conditions to ensure correctness of the EA computations when an approximation is used. Based on this definition, we propose a procedure to construct a lightweight qualitative meta-model by the active selection of data instances. We then use this meta-model to carry out the feature selection task. We apply this procedure to the GA-based algorithm CHC (Cross generational elitist selection, Heterogeneous recombination and Cataclysmic mutation) to create a Qualitative approXimations variant, CHCQX. We show that CHCQX converges faster to feature subset solutions of significantly higher accuracy (as compared to CHC), particularly for large datasets with over 100K instances. We also demonstrate the broader applicability of our approach to Swarm Intelligence (SI), another branch of the Evolutionary Computation (EC) paradigm, with results of PSOQX, a qualitative approximation adaptation of the Particle Swarm Optimization (PSO) method. A GitHub repository with the complete implementation is available.
Derivative-free tree optimization for complex systems
Authors: Ye Wei, Bo Peng, Ruiwen Xie, Yangtao Chen, Yu Qin, Peng Wen, Stefan Bauer, Po-Yen Tung
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2404.04062
Pdf link: https://arxiv.org/pdf/2404.04062
Abstract A tremendous range of design tasks in materials, physics, and biology can be formulated as finding the optimum of an objective function that depends on many parameters, without knowing its closed-form expression or derivative. Traditional derivative-free optimization techniques often rely on strong assumptions about objective functions, and thus fail to optimize non-convex systems beyond 100 dimensions. Here, we present a tree search method for derivative-free optimization that enables accelerated optimal design of high-dimensional complex systems. Specifically, we introduce stochastic tree expansion, dynamic upper confidence bound, and a short-range backpropagation mechanism to escape local optima, iteratively approximating the global optimum using machine learning models. This development effectively confronts dimensionally challenging problems, achieving convergence to global optima across various benchmark functions of up to 2,000 dimensions and surpassing existing methods by 10- to 20-fold. Our method applies to a wide range of real-world complex systems spanning materials, physics, and biology, considerably outperforming state-of-the-art algorithms. This enables efficient autonomous knowledge discovery and facilitates self-driving virtual laboratories. Although we focus on problems within the realm of natural science, the advancements in optimization techniques achieved herein are applicable to a broader spectrum of challenges across all quantitative disciplines.
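The abstract mentions a "dynamic upper confidence bound" without details. As background only, here is a minimal sketch of the standard UCB1 rule used to pick which child of a search-tree node to expand; the exploration constant c is the quantity a dynamic scheme would presumably adjust over time, but that is an assumption, not the paper's method.

```python
import math
from typing import List, Tuple


def ucb_score(total_value: float, visits: int, parent_visits: int, c: float) -> float:
    """UCB1: mean observed value plus an exploration bonus that shrinks with more visits."""
    if visits == 0:
        return math.inf  # always try unvisited children first
    return total_value / visits + c * math.sqrt(math.log(parent_visits) / visits)


def select_child(children: List[Tuple[float, int]], c: float = math.sqrt(2)) -> int:
    """children: (sum of observed values, visit count) per child; returns the index to expand."""
    parent_visits = max(1, sum(v for _, v in children))
    scores = [ucb_score(total, v, parent_visits, c) for total, v in children]
    return max(range(len(children)), key=lambda i: scores[i])


print(select_child([(1.0, 1), (0.0, 0)]))          # 1: the unvisited child wins
print(select_child([(9.0, 10), (0.0, 1)], c=0.0))  # 0: pure exploitation picks the higher mean
print(select_child([(9.0, 10), (0.0, 1)], c=2.0))  # 1: a large bonus favors the rarely visited child
```

Larger c favors rarely visited branches, which is one lever for escaping local optima; smaller c concentrates the search around the best region found so far.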
Queue-aware Network Control Algorithm with a High Quantum Computing Readiness-Evaluated in Discrete-time Flow Simulator for Fat-Pipe Networks
Authors: Arthur Witt
Subjects: Systems and Control (eess.SY); Emerging Technologies (cs.ET); Quantum Physics (quant-ph)
Arxiv link: https://arxiv.org/abs/2404.04080
Pdf link: https://arxiv.org/pdf/2404.04080
Abstract The emerging technology of quantum computing has the potential to change how problems will be solved in the future. This work presents a centralized network control algorithm executable on already existing quantum computers based on the principle of quantum annealing, such as the D-Wave Advantage. We introduce a resource reoccupation algorithm for traffic engineering in wide-area networks. The proposed optimization algorithm changes traffic steering and resource allocation in case of overloaded transceivers. Settings of active components like fiber amplifiers and transceivers are not changed, for reasons of stability. This algorithm is beneficial when network traffic fluctuates on time scales of seconds or spontaneous bursts occur. Further, we developed a discrete-time flow simulator to study the algorithm's performance in wide-area networks. Our network simulator considers backlog and loss modeling of buffered transmission lines. Concurrent flows are handled equally in case of a backlog. This work provides an ILP-based network configuration algorithm that is applicable to quantum annealing computers. We show that traffic losses can be reduced significantly, by a factor of 2, if a resource reoccupation algorithm is applied in a network with bursty traffic. As resources are used more efficiently by reoccupation in heavy-load situations, overprovisioning of networks can be reduced. Thus, this new form of network operation leads toward a zero-margin network. We show that our newly introduced network simulator enables analyses of short-time effects like buffering within fat-pipe networks. As the calculation of network configurations in real-sized networks is typically time-consuming, quantum computing can enable the proposed network configuration algorithm for application in real-sized wide-area networks.
Robust Preference Optimization with Provable Noise Tolerance for LLMs
Authors: Xize Liang, Chao Chen, Jie Wang, Yue Wu, Zhihang Fu, Zhihao Shi, Feng Wu, Jieping Ye
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2404.04102
Pdf link: https://arxiv.org/pdf/2404.04102
Abstract Preference alignment aims to enable large language models (LLMs) to generate responses that conform to human values, which is essential for developing general AI systems. Ranking-based methods, a promising class of alignment approaches, learn human preferences from datasets containing response pairs by optimizing the log-likelihood margins between preferred and dis-preferred responses. However, due to the inherent differences in annotators' preferences, ranking labels of comparisons for response pairs are unavoidably noisy. This seriously hurts the reliability of existing ranking-based methods. To address this problem, we propose a provably noise-tolerant preference alignment method, namely RObust Preference Optimization (ROPO). To the best of our knowledge, ROPO is the first preference alignment method with noise-tolerance guarantees. The key idea of ROPO is to dynamically assign conservative gradient weights to response pairs with high label uncertainty, based on the log-likelihood margins between the responses. By effectively suppressing the gradients of noisy samples, our weighting strategy ensures that the expected risk has the same gradient direction independent of the presence and proportion of noise. Experiments on three open-ended text generation tasks with four base models ranging in size from 2.8B to 13B demonstrate that ROPO significantly outperforms existing ranking-based methods.
3D Facial Expressions through Analysis-by-Neural-Synthesis
Authors: George Retsinas, Panagiotis P. Filntisis, Radek Danecek, Victoria F. Abrevaya, Anastasios Roussos, Timo Bolkart, Petros Maragos
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.04104
Pdf link: https://arxiv.org/pdf/2404.04104
Abstract While existing methods for 3D face reconstruction from in-the-wild images excel at recovering the overall face shape, they commonly miss subtle, extreme, asymmetric, or rarely observed expressions. We improve upon these methods with SMIRK (Spatial Modeling for Image-based Reconstruction of Kinesics), which faithfully reconstructs expressive 3D faces from images. We identify two key limitations in existing methods: shortcomings in their self-supervised training formulation, and a lack of expression diversity in the training images. For training, most methods employ differentiable rendering to compare a predicted face mesh with the input image, along with a plethora of additional loss functions. This differentiable rendering loss not only has to provide supervision to optimize for 3D face geometry, camera, albedo, and lighting, which is an ill-posed optimization problem, but the domain gap between rendering and input image further hinders the learning process. Instead, SMIRK replaces the differentiable rendering with a neural rendering module that, given the rendered predicted mesh geometry and sparsely sampled pixels of the input image, generates a face image. As the neural rendering gets color information from sampled image pixels, supervising with a neural rendering-based reconstruction loss can focus solely on the geometry. Further, it enables us to generate images of the input identity with varying expressions while training. These are then utilized as input to the reconstruction model and used as supervision with ground truth geometry. This effectively augments the training data and enhances the generalization for diverse expressions. Our qualitative, quantitative and particularly our perceptual evaluations demonstrate that SMIRK achieves new state-of-the-art performance on accurate expression reconstruction. Project webpage: https://georgeretsi.github.io/smirk/.
The Unreasonable Effectiveness Of Early Discarding After One Epoch In Neural Network Hyperparameter Optimization
Authors: Romain Egele, Felix Mohr, Tom Viering, Prasanna Balaprakash
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.04111
Pdf link: https://arxiv.org/pdf/2404.04111
Abstract To reach high performance with deep learning, hyperparameter optimization (HPO) is essential. This process is usually time-consuming due to costly evaluations of neural networks. Early discarding techniques limit the resources granted to unpromising candidates by observing the empirical learning curves and canceling neural network training as soon as the lack of competitiveness of a candidate becomes evident. Despite two decades of research, little is understood about the trade-off between the aggressiveness of discarding and the loss of predictive performance. Our paper studies this trade-off for several commonly used discarding techniques such as successive halving and learning curve extrapolation. Our surprising finding is that these commonly used techniques offer minimal to no added value compared to the simple strategy of discarding after a constant number of epochs of training. The chosen number of epochs depends mostly on the available compute budget. We call this approach i-Epoch (i being the constant number of epochs with which neural networks are trained) and suggest assessing the quality of early discarding techniques by comparing how their Pareto fronts (in consumed training epochs and predictive performance) complement the Pareto front of i-Epoch.
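The two strategies being compared can be sketched side by side. `successive_halving` below is the textbook scheme; `constant_epoch` is one simple reading of "discarding after a constant number of epochs" (keep only the best candidate after i epochs), an assumption here rather than the authors' i-Epoch code. The learning curves are synthetic and hypothetical.

```python
from typing import Callable, Hashable, Sequence

ScoreFn = Callable[[Hashable, int], float]  # (candidate, epochs trained) -> validation score


def successive_halving(candidates: Sequence[Hashable], score_at: ScoreFn, min_epochs: int = 1, eta: int = 2) -> Hashable:
    """Evaluate all survivors, keep the top 1/eta, multiply their epoch budget by eta, repeat."""
    survivors = list(candidates)
    epochs = min_epochs
    while len(survivors) > 1:
        scores = {c: score_at(c, epochs) for c in survivors}
        keep = max(1, len(survivors) // eta)
        survivors = sorted(survivors, key=lambda c: scores[c], reverse=True)[:keep]
        epochs *= eta
    return survivors[0]


def constant_epoch(candidates: Sequence[Hashable], score_at: ScoreFn, i: int) -> Hashable:
    """Discard everything after a fixed i epochs: keep only the best candidate at epoch i."""
    return max(candidates, key=lambda c: score_at(c, i))


# Synthetic learning curves: each candidate is its asymptotic accuracy, approached geometrically.
def curve(final_acc: float, epochs: int) -> float:
    return final_acc * (1.0 - 0.5 ** epochs)


configs = [0.60, 0.90, 0.75, 0.80]
print(successive_halving(configs, curve))   # 0.9
print(constant_epoch(configs, curve, i=1))  # 0.9
```

These synthetic curves never cross, so both strategies agree after very little training. With real learning curves, candidates can overtake one another, and the interesting comparison is how many epochs each strategy consumes for the accuracy it retains.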
Wireless Resource Optimization in Hybrid Semantic/Bit Communication Networks
Authors: Le Xia, Yao Sun, Dusit Niyato, Lan Zhang, Muhammad Ali Imran
Subjects: Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2404.04162
Pdf link: https://arxiv.org/pdf/2404.04162
Abstract Recently, semantic communication (SemCom) has shown great potential for significant resource savings and efficient information exchanges, thus naturally introducing a novel and practical cellular network paradigm where two modes of SemCom and conventional bit communication (BitCom) coexist. Nevertheless, the involved wireless resource management becomes rather complicated and challenging, given the unique background knowledge matching and time-consuming semantic coding requirements in SemCom. To this end, this paper jointly investigates user association (UA), mode selection (MS), and bandwidth allocation (BA) problems in a hybrid semantic/bit communication network (HSB-Net). Concretely, we first identify a unified performance metric of message throughput for both SemCom and BitCom links. Next, we develop a knowledge matching-aware two-stage tandem packet queuing model and theoretically derive the average packet loss ratio and queuing latency. Combined with practical constraints, we then formulate a joint optimization problem for UA, MS, and BA to maximize the overall message throughput of HSB-Net. Afterward, we propose an optimal resource management strategy by utilizing a Lagrange primal-dual transformation method and a preference list-based heuristic algorithm with polynomial-time complexity. Numerical results not only demonstrate the accuracy of our analytical queuing model, but also validate the performance superiority of our proposed strategy compared with different benchmarks.
Convex MPC and Thrust Allocation with Deadband for Spacecraft Rendezvous
Authors: Pedro Taborda, Hugo Matias, Daniel Silvestre, Pedro Lourenço
Subjects: Systems and Control (eess.SY); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2404.04197
Pdf link: https://arxiv.org/pdf/2404.04197
Abstract This paper delves into a rendezvous scenario involving a chaser and a target spacecraft, focusing on the application of Model Predictive Control (MPC) to design a controller capable of guiding the chaser toward the target. The operational principle of spacecraft thrusters, requiring a minimum activation time that leads to the existence of a control deadband, introduces mixed-integer constraints into the optimization, posing a considerable computational challenge due to the exponential complexity in the number of integer constraints. We address this complexity by presenting two solver algorithms that efficiently approximate the optimal solution in significantly less time than standard solvers, making them well-suited for real-time applications.
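A deadband of this kind can be written with one binary variable per thruster and time step. The block below is a generic mixed-integer formulation under assumed notation (thruster $j = 1,\dots,m$, step $k = 0,\dots,N-1$ of an $N$-step horizon, thrust $u_{j,k} \ge 0$, bounds $0 < u_{\min} \le u_{\max}$). It shows where the exponential complexity comes from and is not necessarily the paper's exact model:

```latex
u_{j,k} = 0 \quad \text{or} \quad u_{\min} \le u_{j,k} \le u_{\max}
\quad\Longleftrightarrow\quad
\delta_{j,k}\, u_{\min} \le u_{j,k} \le \delta_{j,k}\, u_{\max},
\qquad \delta_{j,k} \in \{0, 1\}.
```

With $mN$ binary variables, an exhaustive search over thruster on/off patterns faces up to $2^{mN}$ combinations. That is why approximate solvers, such as those the abstract proposes, matter for real-time MPC.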
Enhancing IoT Intelligence: A Transformer-based Reinforcement Learning Methodology
Authors: Gaith Rjoub, Saidul Islam, Jamal Bentahar, Mohammed Amin Almaiah, Rana Alrawashdeh
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.04205
Pdf link: https://arxiv.org/pdf/2404.04205
Abstract The proliferation of the Internet of Things (IoT) has led to an explosion of data generated by interconnected devices, presenting both opportunities and challenges for intelligent decision-making in complex environments. Traditional Reinforcement Learning (RL) approaches often struggle to fully harness this data due to their limited ability to process and interpret the intricate patterns and dependencies inherent in IoT applications. This paper introduces a novel framework that integrates transformer architectures with Proximal Policy Optimization (PPO) to address these challenges. By leveraging the self-attention mechanism of transformers, our approach enhances RL agents' capacity for understanding and acting within dynamic IoT environments, leading to improved decision-making processes. We demonstrate the effectiveness of our method across various IoT scenarios, from smart home automation to industrial control systems, showing marked improvements in decision-making efficiency and adaptability. Our contributions include a detailed exploration of the transformer's role in processing heterogeneous IoT data, a comprehensive evaluation of the framework's performance in diverse environments, and a benchmark against traditional RL methods. The results indicate significant advancements in enabling RL agents to navigate the complexities of IoT ecosystems, highlighting the potential of our approach to revolutionize intelligent automation and decision-making in the IoT landscape.
Modeling Kinematic Uncertainty of Tendon-Driven Continuum Robots via Mixture Density Networks
Authors: Jordan Thompson, Brian Y. Cho, Daniel S. Brown, Alan Kuntz
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2404.04241
Pdf link: https://arxiv.org/pdf/2404.04241
Abstract Tendon-driven continuum robot kinematic models are frequently computationally expensive, inaccurate due to unmodeled effects, or both. In particular, unmodeled effects produce uncertainties that arise during the robot's operation that lead to variability in the resulting geometry. We propose a novel solution to these issues through the development of a Gaussian mixture kinematic model. We train a mixture density network to output a Gaussian mixture model representation of the robot geometry given the current tendon displacements. This model computes a probability distribution that is more representative of the true distribution of geometries at a given configuration than a model that outputs a single geometry, while also reducing the computation time. We demonstrate one use of this model through a trajectory optimization method that explicitly reasons about the workspace uncertainty to minimize the probability of collision.
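As a rough illustration of what a mixture density network (MDN) outputs, and not the authors' model, the sketch below assumes a one-dimensional geometry variable and hypothetical raw network outputs `logits`, `means`, and `log_sigmas`. It converts them into mixture weights (softmax) and positive standard deviations (exp), evaluates the resulting Gaussian mixture density, and computes the average negative log-likelihood that MDNs are commonly trained on.

```python
import math

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mixture_pdf(x, logits, means, log_sigmas):
    """Density of a 1-D Gaussian mixture parameterised by raw MDN-style outputs."""
    weights = softmax(logits)                   # mixing coefficients sum to 1
    sigmas = [math.exp(s) for s in log_sigmas]  # exp keeps standard deviations positive
    density = 0.0
    for w, mu, sigma in zip(weights, means, sigmas):
        z = (x - mu) / sigma
        density += w * math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    return density

def mixture_nll(targets, logits, means, log_sigmas):
    """Average negative log-likelihood of observed targets under the mixture."""
    return -sum(math.log(mixture_pdf(t, logits, means, log_sigmas)) for t in targets) / len(targets)

# Hypothetical outputs for one tendon configuration: two equally likely modes.
logits, means, log_sigmas = [0.0, 0.0], [-1.0, 1.0], [math.log(0.2), math.log(0.2)]
print(round(mixture_pdf(1.0, logits, means, log_sigmas), 4))
```

Unlike a single-geometry regressor, the mixture can place probability mass on several distinct outcomes for the same input, which is the property the abstract relies on when reasoning about collision probability.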
DiffOp-net: A Differential Operator-based Fully Convolutional Network for Unsupervised Deformable Image Registration
Authors: Jiong Wu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.04244
Pdf link: https://arxiv.org/pdf/2404.04244
Abstract Existing unsupervised deformable image registration methods usually rely on metrics applied to the gradients of predicted displacement or velocity fields as a regularization term to ensure transformation smoothness, which potentially limits registration accuracy. In this study, we propose a novel approach to enhance unsupervised deformable image registration by introducing a new differential operator into the registration framework. This operator, acting on the velocity field and mapping it to a dual space, ensures the smoothness of the velocity field during optimization, facilitating accurate deformable registration. In addition, to tackle the challenge of capturing large deformations inside image pairs, we introduce a Cross-Coordinate Attention module (CCA) and embed it into a proposed Fully Convolutional Network (FCN)-based multi-resolution registration architecture. Evaluation experiments are conducted on two magnetic resonance imaging (MRI) datasets. Compared to various state-of-the-art registration approaches, including a traditional algorithm and three representative unsupervised learning-based methods, our method achieves superior accuracy while maintaining desirable diffeomorphic properties and exhibiting promising registration speed.
Keyword: deep learning

Securing Social Spaces: Harnessing Deep Learning to Eradicate Cyberbullying
Authors: Rohan Biswas, Kasturi Ganguly, Arijit Das, Diganta Saha
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2404.03686
Pdf link: https://arxiv.org/pdf/2404.03686
Abstract In today's digital world, cyberbullying is a serious problem that can harm the mental and physical health of people who use social media. This paper explains how serious cyberbullying is and how it affects individuals exposed to it. It also stresses how important it is to find better ways to detect cyberbullying so that online spaces can be safer, and argues that more accurate tools for spotting cyberbullying will be valuable in the future. Our paper introduces a deep learning-based approach, primarily employing BERT and BiLSTM architectures, to effectively address cyberbullying. This approach is designed to analyse large volumes of posts and predict potential instances of cyberbullying in online spaces. Our results demonstrate the superiority of the hateBERT model, an extension of BERT focused on hate speech detection, among the five models, achieving an accuracy rate of 89.16%. This research is a significant contribution to "Computational Intelligence for Social Transformation," promising a safer and more inclusive digital landscape.
Dendrites endow artificial neural networks with accurate, robust and parameter-efficient learning
Authors: Spyridon Chavlis, Panayiota Poirazi
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)
Arxiv link: https://arxiv.org/abs/2404.03708
Pdf link: https://arxiv.org/pdf/2404.03708
Abstract Artificial neural networks (ANNs) are at the core of most deep learning (DL) algorithms that successfully tackle complex problems like image recognition, autonomous driving, and natural language processing. However, unlike biological brains, which tackle similar problems very efficiently, DL algorithms require a large number of trainable parameters, making them energy-intensive and prone to overfitting. Here, we show that a new ANN architecture that incorporates the structured connectivity and restricted sampling properties of biological dendrites counteracts these limitations. We find that dendritic ANNs are more robust to overfitting and outperform traditional ANNs on several image classification tasks while using significantly fewer trainable parameters. This is achieved through the adoption of a different learning strategy, whereby most of the nodes respond to several classes, unlike classical ANNs that strive for class-specificity. These findings suggest that the incorporation of dendrites can make learning in ANNs precise, resilient, and parameter-efficient and shed new light on how biological features can impact the learning strategies of ANNs.
Explaining Explainability: Understanding Concept Activation Vectors
Authors: Angus Nicolson, Lisa Schut, J. Alison Noble, Yarin Gal
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2404.03713
Pdf link: https://arxiv.org/pdf/2404.03713
Abstract Recent interpretability methods propose using concept-based explanations to translate the internal representations of deep learning models into a language that humans are familiar with: concepts. This requires understanding which concepts are present in the representation space of a neural network. One popular method for finding concepts is Concept Activation Vectors (CAVs), which are learnt using a probe dataset of concept exemplars. In this work, we investigate three properties of CAVs. CAVs may be: (1) inconsistent between layers, (2) entangled with different concepts, and (3) spatially dependent. Each property provides both challenges and opportunities in interpreting models. We introduce tools designed to detect the presence of these properties, provide insight into how they affect the derived explanations, and provide recommendations to minimise their impact. Understanding these properties can be used to our advantage. For example, we introduce spatially dependent CAVs to test if a model is translation invariant with respect to a specific concept and class. Our experiments are performed on ImageNet and a new synthetic dataset, Elements. Elements is designed to capture a known ground truth relationship between concepts and classes. We release this dataset to facilitate further research in understanding and evaluating interpretability methods.
Learning smooth functions in high dimensions: from sparse polynomials to deep neural networks
Authors: Ben Adcock, Simone Brugiapaglia, Nick Dexter, Sebastian Moraga
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.03761
Pdf link: https://arxiv.org/pdf/2404.03761
Abstract Learning approximations to smooth target functions of many variables from finite sets of pointwise samples is an important task in scientific computing and its many applications in computational science and engineering. Despite well over half a century of research on high-dimensional approximation, this remains a challenging problem. Yet, significant advances have been made in the last decade towards efficient methods for doing this, commencing with so-called sparse polynomial approximation methods and continuing most recently with methods based on Deep Neural Networks (DNNs). In tandem, there have been substantial advances in the relevant approximation theory and analysis of these techniques. In this work, we survey this recent progress. We describe the contemporary motivations for this problem, which stem from parametric models and computational uncertainty quantification; the relevant function classes, namely, classes of infinite-dimensional, Banach-valued, holomorphic functions; fundamental limits of learnability from finite data for these classes; and finally, sparse polynomial and DNN methods for efficiently learning such functions from finite data. For the latter, there is currently a significant gap between the approximation theory of DNNs and the practical performance of deep learning. Aiming to narrow this gap, we develop the topic of practical existence theory, which asserts the existence of dimension-independent DNN architectures and training strategies that achieve provably near-optimal generalization errors in terms of the amount of training data.
Uniform Memory Retrieval with Larger Capacity for Modern Hopfield Models
Authors: Dennis Wu, Jerry Yao-Chieh Hu, Teng-Yun Hsiao, Han Liu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2404.03827
Pdf link: https://arxiv.org/pdf/2404.03827
Abstract We propose a two-stage memory retrieval dynamics for modern Hopfield models, termed $\mathtt{U\text{-}Hop}$, with enhanced memory capacity. Our key contribution is a learnable feature map $\Phi$ which transforms the Hopfield energy function into a kernel space. This transformation ensures convergence between the local minima of energy and the fixed points of retrieval dynamics within the kernel space. Consequently, the kernel norm induced by $\Phi$ serves as a novel similarity measure. It utilizes the stored memory patterns as learning data to enhance memory capacity across all modern Hopfield models. Specifically, we accomplish this by constructing a separation loss $\mathcal{L}_\Phi$ that separates the local minima of kernelized energy by separating stored memory patterns in kernel space. Methodologically, the $\mathtt{U\text{-}Hop}$ memory retrieval process consists of: \textbf{(Stage~I.)} minimizing separation loss for a more uniform memory (local minimum) distribution, followed by \textbf{(Stage~II.)} standard Hopfield energy minimization for memory retrieval. This results in a significant reduction of possible meta-stable states in the Hopfield energy function, thus enhancing memory capacity by preventing memory confusion. Empirically, with real-world datasets, we demonstrate that $\mathtt{U\text{-}Hop}$ outperforms all existing modern Hopfield models and SOTA similarity measures, achieving substantial improvements in both associative memory retrieval and deep learning tasks.
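For readers unfamiliar with Stage II, a minimal sketch of the standard modern Hopfield update, $\xi \leftarrow X\,\mathrm{softmax}(\beta X^\top \xi)$, is shown below in plain Python. It uses the ordinary dot-product similarity; the learned feature map $\Phi$ and separation loss of Stage I are not modelled, and the parameters `beta` and `steps` are illustrative choices, not values from the paper.

```python
import math

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def hopfield_retrieve(memories, query, beta=8.0, steps=3):
    """Standard modern Hopfield retrieval: xi <- X softmax(beta * X^T xi).

    `memories` holds the stored patterns (the columns of X) and `query` is a
    possibly corrupted probe. No learned kernel map Phi is applied here.
    """
    xi = list(query)
    for _ in range(steps):
        # Scaled dot-product similarity between the current state and each memory.
        scores = [beta * sum(m_i * x_i for m_i, x_i in zip(m, xi)) for m in memories]
        attn = softmax(scores)
        # The new state is the attention-weighted average of the stored patterns.
        xi = [sum(a * m[d] for a, m in zip(attn, memories)) for d in range(len(xi))]
    return xi

memories = [[1.0, 1.0, -1.0, -1.0], [-1.0, 1.0, 1.0, -1.0], [1.0, -1.0, 1.0, -1.0]]
noisy = [0.9, 0.8, -0.7, -1.0]  # corrupted version of the first stored pattern
print([round(v, 3) for v in hopfield_retrieve(memories, noisy)])
```

With a small `beta`, or with stored patterns that are very similar to each other, the softmax weights spread over several memories and the update settles on a blend of them (a meta-stable state). According to the abstract, this kind of memory confusion is what the Stage I separation loss is meant to reduce.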
Enhancing Breast Cancer Diagnosis in Mammography: Evaluation and Integration of Convolutional Neural Networks and Explainable AI
Authors: Maryam Ahmed, Tooba Bibi, Rizwan Ahmed Khan, Sidra Nasir
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2404.03892
Pdf link: https://arxiv.org/pdf/2404.03892
Abstract The study introduces an integrated framework combining Convolutional Neural Networks (CNNs) and Explainable Artificial Intelligence (XAI) for the enhanced diagnosis of breast cancer using the CBIS-DDSM dataset. Utilizing a fine-tuned ResNet50 architecture, our investigation not only provides effective differentiation of mammographic images into benign and malignant categories but also addresses the opaque "black-box" nature of deep learning models by employing XAI methodologies, namely Grad-CAM, LIME, and SHAP, to interpret CNN decision-making processes for healthcare professionals. Our methodology encompasses an elaborate data preprocessing pipeline, advanced data augmentation techniques to counteract dataset limitations, and transfer learning using pre-trained networks such as VGG-16, DenseNet, and ResNet. A focal point of our study is the evaluation of XAI's effectiveness in interpreting model predictions, highlighted by utilizing the Hausdorff measure to quantitatively assess the alignment between AI-generated explanations and expert annotations. Such quantitative assessment is critical for XAI to promote trustworthiness and ethical fairness in AI-assisted diagnostics. The findings from our research illustrate the effective collaboration between CNNs and XAI in advancing diagnostic methods for breast cancer, thereby facilitating a more seamless integration of advanced AI technologies within clinical settings. By enhancing the interpretability of AI-driven decisions, this work lays the groundwork for improved collaboration between AI systems and medical practitioners, ultimately enriching patient care. Furthermore, the implications of our research extend well beyond the current methodologies, advocating for subsequent inquiries into the integration of multimodal data and the refinement of AI explanations to satisfy the needs of clinical practice.
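To make the Hausdorff measure concrete, the sketch below computes the symmetric Hausdorff distance between two binary masks treated as pixel point sets. The abstract does not say how explanation heatmaps are converted into point sets; thresholding to a binary mask, the helper `mask_to_points`, and the toy masks are all assumptions for illustration. A smaller distance means the explanation lies closer to the expert annotation.

```python
import math

def mask_to_points(mask):
    """Return (row, col) coordinates of nonzero pixels in a binary mask (hypothetical helper)."""
    return [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]

def directed_hausdorff(a, b):
    # For each point in a, find its nearest point in b; report the worst such distance.
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

# Toy example: a thresholded explanation map shifted one row from the annotation.
explanation = [[0, 1, 1, 0],
               [0, 1, 1, 0],
               [0, 0, 0, 0],
               [0, 0, 0, 0]]
annotation = [[0, 0, 0, 0],
              [0, 1, 1, 0],
              [0, 1, 1, 0],
              [0, 0, 0, 0]]
print(hausdorff(mask_to_points(explanation), mask_to_points(annotation)))
```

This brute-force version is quadratic in the number of pixels and is only meant to show the definition; for real mammograms a library routine or a distance-transform approach would be preferable.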
Multi-Task Learning for Lung sound & Lung disease classification
Authors: Suma K V, Deepali Koppad, Preethi Kumar, Neha A Kantikar, Surabhi Ramesh
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Sound (cs.SD)
Arxiv link: https://arxiv.org/abs/2404.03908
Pdf link: https://arxiv.org/pdf/2404.03908
Abstract In recent years, advancements in deep learning techniques have considerably enhanced the efficiency and accuracy of medical diagnostics. In this work, a novel approach using multi-task learning (MTL) for the simultaneous classification of lung sounds and lung diseases is proposed. Our proposed model leverages MTL with four different deep learning models, namely 2D CNN, ResNet50, MobileNet, and DenseNet, to extract relevant features from the lung sound recordings. The ICBHI 2017 Respiratory Sound Database was employed in the current study. The MobileNet-based MTL model performed better than the other models considered, with an accuracy of 74% for lung sound analysis and 91% for lung disease classification. Results of the experimentation demonstrate the efficacy of our approach in classifying both lung sounds and lung diseases concurrently. In this study, using the demographic data of the patients from the database, risk level computation for Chronic Obstructive Pulmonary Disease is also carried out. For this computation, three machine learning algorithms, namely Logistic Regression, SVM, and Random Forest classifiers, were employed. Among these ML algorithms, the Random Forest classifier had the highest accuracy of 92%. This work helps considerably reduce the physician's burden of not just diagnosing the pathology but also effectively communicating to the patient about the possible causes or outcomes.
Deep Learning for Satellite Image Time Series Analysis: A Review
Authors: Lynn Miller, Charlotte Pelletier, Geoffrey I. Webb
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2404.03936
Pdf link: https://arxiv.org/pdf/2404.03936
Abstract Earth observation (EO) satellite missions have been providing detailed images about the state of the Earth and its land cover for over 50 years. Long-term missions, such as NASA's Landsat, Terra, and Aqua satellites, and more recently, the ESA's Sentinel missions, record images of the entire world every few days. Although single images provide point-in-time data, repeated images of the same area, or satellite image time series (SITS), provide information about the changing state of vegetation and land use. These SITS are useful for modeling dynamic processes and seasonal changes such as plant phenology. They have potential benefits for many aspects of land and natural resource management, including applications in agricultural, forest, water, and disaster management, urban planning, and mining. However, SITS are complex, incorporating information from the temporal, spatial, and spectral dimensions. Therefore, deep learning methods are often deployed, as they can analyze these complex relationships. This review presents a summary of the state-of-the-art methods of modelling environmental, agricultural, and other Earth observation variables from SITS data using deep learning methods. We aim to provide a resource for remote sensing experts interested in using deep learning techniques to enhance Earth observation models with temporal information.
Re-pseudonymization Strategies for Smart Meter Data Are Not Robust to Deep Learning Profiling Attacks
Authors: Ana-Maria Cretu, Miruna Rusu, Yves-Alexandre de Montjoye
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.03948
Pdf link: https://arxiv.org/pdf/2404.03948
Abstract Smart meters, devices measuring the electricity and gas consumption of a household, are currently being deployed at a fast rate throughout the world. The data they collect are extremely useful, including in the fight against climate change. However, these data and the information that can be inferred from them are highly sensitive. Re-pseudonymization, i.e., the frequent replacement of random identifiers over time, is widely used to share smart meter data while mitigating the risk of re-identification. We here show how, in spite of re-pseudonymization, households' consumption records can be pieced together with high accuracy in large-scale datasets. We propose the first deep learning-based profiling attack against re-pseudonymized smart meter data. Our attack combines neural network embeddings, which are used to extract features from weekly consumption records and are tailored to the smart meter identification task, with a nearest neighbor classifier. We evaluate six neural network architectures as the embedding model. Our results suggest that the Transformer and CNN-LSTM architectures vastly outperform previous methods as well as other architectures, successfully identifying the correct household 73.4% of the time among 5139 households based on electricity and gas consumption records (54.5% for electricity only). We further show that the features extracted by the embedding model maintain their effectiveness when transferred to a set of users disjoint from the one used to train the model. Finally, we extensively evaluate the robustness of our results. Taken together, our results strongly suggest that even frequent re-pseudonymization strategies can be reversed, strongly limiting their ability to prevent re-identification in practice.
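The linking step of such an attack can be sketched as a nearest-neighbor search over embeddings. In the minimal example below, the embeddings are hypothetical hand-written vectors standing in for the output of a trained embedding model (which is not shown), and cosine similarity is an assumed choice of similarity; the abstract does not specify the distance used by the paper's nearest neighbor classifier.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def link_records(reference, target):
    """Map each new pseudonym to the reference household with the most similar embedding.

    `reference` and `target` map identifiers to embedding vectors of weekly
    consumption records from two different pseudonymization periods.
    """
    return {
        pid: max(reference, key=lambda hid: cosine(reference[hid], emb))
        for pid, emb in target.items()
    }

def linking_accuracy(predicted, truth):
    return sum(predicted[p] == truth[p] for p in truth) / len(truth)

# Hypothetical embeddings: week 1 under known IDs, week 2 under fresh pseudonyms.
reference = {"h1": [1.0, 0.0, 0.2], "h2": [0.0, 1.0, 0.1], "h3": [0.5, 0.5, 1.0]}
target = {"p1": [0.1, 0.9, 0.1], "p2": [0.9, 0.1, 0.3], "p3": [0.4, 0.6, 0.9]}
truth = {"p1": "h2", "p2": "h1", "p3": "h3"}
predicted = link_records(reference, target)
print(predicted, linking_accuracy(predicted, truth))
```

The point of the sketch is that once weekly records from the same household map to nearby embeddings, replacing the identifier does not help: the pseudonyms can be re-linked by similarity alone.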
Transformers for molecular property prediction: Lessons learned from the past five years
Authors: Afnan Sultan, Jochen Sieg, Miriam Mathea, Andrea Volkamer
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Quantitative Methods (q-bio.QM)
Arxiv link: https://arxiv.org/abs/2404.03969
Pdf link: https://arxiv.org/pdf/2404.03969
Abstract Molecular Property Prediction (MPP) is vital for drug discovery, crop protection, and environmental science. Over the last decades, diverse computational techniques have been developed, from using simple physical and chemical properties and molecular fingerprints in statistical models and classical machine learning to advanced deep learning approaches. In this review, we aim to distill insights from current research on employing transformer models for MPP. We analyze the currently available models and explore key questions that arise when training and fine-tuning a transformer model for MPP. These questions encompass the choice and scale of the pre-training data, optimal architecture selections, and promising pre-training objectives. Our analysis highlights areas not yet covered in current research, inviting further exploration to enhance the field's understanding. Additionally, we address the challenges in comparing different models, emphasizing the need for standardized data splitting and robust statistical analysis.
Model Selection with Model Zoo via Graph Learning
Authors: Ziyu Li, Hilco van der Wilk, Danning Zhan, Megha Khosla, Alessandro Bozzon, Rihan Hai
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2404.03988
Pdf link: https://arxiv.org/pdf/2404.03988
Abstract Pre-trained deep learning (DL) models are increasingly accessible in public repositories, i.e., model zoos. Given a new prediction task, finding the best model to fine-tune can be computationally intensive and costly, especially when the number of pre-trained models is large. Selecting the right pre-trained models is crucial, yet complicated by the diversity of models from various model families (like ResNet, ViT, Swin) and the hidden relationships between models and datasets. Existing methods, which utilize basic information from models and datasets to compute scores indicating model performance on target datasets, overlook the intrinsic relationships, limiting their effectiveness in model selection. In this study, we introduce TransferGraph, a novel framework that reformulates model selection as a graph learning problem. TransferGraph constructs a graph using extensive metadata extracted from models and datasets, while capturing their inherent relationships. Through comprehensive experiments across 16 real datasets, both images and texts, we demonstrate TransferGraph's effectiveness in capturing essential model-dataset relationships, yielding up to a 32% improvement in correlation between predicted performance and the actual fine-tuning results compared to the state-of-the-art methods.
Physics-Inspired Synthesized Underwater Image Dataset
Authors: Reina Kaneko, Hiroshi Higashi, Yuichi Tanaka
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2404.03998
Pdf link: https://arxiv.org/pdf/2404.03998
Abstract This paper introduces the physics-inspired synthesized underwater image dataset (PHISWID), a dataset tailored for enhancing underwater image processing through physics-inspired image synthesis. Deep learning approaches to underwater image enhancement typically demand extensive datasets, yet acquiring paired clean and degraded underwater images poses significant challenges. While several underwater image datasets have been proposed using physics-based synthesis, a publicly accessible collection has been lacking. Additionally, most underwater image synthesis approaches do not intend to reproduce atmospheric scenes, resulting in incomplete enhancement. PHISWID addresses this gap by offering a set of paired ground-truth (atmospheric) and synthetically degraded underwater images, showcasing not only color degradation but also the often-neglected effects of marine snow, a composite of organic matter and sand particles that considerably impairs underwater image clarity. The dataset applies these degradations to atmospheric RGB-D images, enhancing the dataset's realism and applicability. PHISWID is particularly valuable for training deep neural networks in a supervised learning setting and for objectively assessing image quality in benchmark analyses. Our results reveal that even a basic U-Net architecture, when trained with PHISWID, substantially outperforms existing methods in underwater image enhancement. We intend to release PHISWID publicly, contributing a significant resource to the advancement of underwater imaging technology.
Finsler-Laplace-Beltrami Operators with Application to Shape Analysis
Authors: Simon Weber, Thomas Dagès, Maolin Gao, Daniel Cremers
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.03999
Pdf link: https://arxiv.org/pdf/2404.03999
Abstract The Laplace-Beltrami operator (LBO) emerges from studying manifolds equipped with a Riemannian metric. It is often called the Swiss army knife of geometry processing, as it allows one to capture intrinsic shape information and gives rise to heat diffusion, geodesic distances, and a multitude of shape descriptors. It also plays a central role in geometric deep learning. In this work, we explore Finsler manifolds as a generalization of Riemannian manifolds. We revisit the Finsler heat equation and derive a Finsler heat kernel and a Finsler-Laplace-Beltrami Operator (FLBO): a novel theoretically justified anisotropic Laplace-Beltrami operator (ALBO). In experimental evaluations we demonstrate that the proposed FLBO is a valuable alternative to the traditional Riemannian-based LBO and ALBOs for spatial filtering and shape correspondence estimation. We hope that the proposed Finsler heat kernel and the FLBO will inspire further exploration of Finsler geometry in the computer vision community.
A Comparison of Methods for Evaluating Generative IR
Authors: Negar Arabzadeh, Charles L. A. Clarke
Subjects: Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2404.04044
Pdf link: https://arxiv.org/pdf/2404.04044
Abstract Information retrieval systems increasingly incorporate generative components. For example, in a retrieval augmented generation (RAG) system, a retrieval component might provide a source of ground truth, while a generative component summarizes and augments its responses. In other systems, a large language model (LLM) might directly generate responses without consulting a retrieval component. While there are multiple definitions of generative information retrieval (Gen-IR) systems, in this paper we focus on those systems where the system's response is not drawn from a fixed collection of documents or passages. The response to a query may be entirely new text. Since traditional IR evaluation methods break down under this model, we explore various methods that extend traditional offline evaluation approaches to the Gen-IR context. Offline IR evaluation traditionally employs paid human assessors, but increasingly LLMs are replacing human assessment, demonstrating capabilities similar or superior to crowdsourced labels. Given that Gen-IR systems do not generate responses from a fixed set, we assume that methods for Gen-IR evaluation must largely depend on LLM-generated labels. Along with methods based on binary and graded relevance, we explore methods based on explicit subtopics, pairwise preferences, and embeddings. We first validate these methods against human assessments on several TREC Deep Learning Track tasks; we then apply these methods to evaluate the output of several purely generative systems. For each method we consider both its ability to act autonomously, without the need for human labels or other input, and its ability to support human auditing. To trust these methods, we must be assured that their results align with human assessments. In order to do so, evaluation criteria must be transparent, so that outcomes can be audited by human assessors.
The Unreasonable Effectiveness Of Early Discarding After One Epoch In Neural Network Hyperparameter Optimization
Authors: Romain Egele, Felix Mohr, Tom Viering, Prasanna Balaprakash
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.04111
Pdf link: https://arxiv.org/pdf/2404.04111
Abstract To reach high performance with deep learning, hyperparameter optimization (HPO) is essential. This process is usually time-consuming due to costly evaluations of neural networks. Early discarding techniques limit the resources granted to unpromising candidates by observing the empirical learning curves and canceling neural network training as soon as the lack of competitiveness of a candidate becomes evident. Despite two decades of research, little is understood about the trade-off between the aggressiveness of discarding and the loss of predictive performance. Our paper studies this trade-off for several commonly used discarding techniques such as successive halving and learning curve extrapolation. Our surprising finding is that these commonly used techniques offer minimal to no added value compared to the simple strategy of discarding after a constant number of epochs of training. The chosen number of epochs depends mostly on the available compute budget. We call this approach i-Epoch (i being the constant number of epochs with which neural networks are trained) and suggest assessing the quality of early discarding techniques by comparing how their Pareto-Front (in consumed training epochs and predictive performance) complements the Pareto-Front of i-Epoch.
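A minimal sketch of the i-Epoch idea follows: every candidate is trained for exactly `i` epochs and then compared on its validation score at that point. The callback `train_one_epoch` and the toy learning curves are hypothetical stand-ins for real training runs; the sketch only illustrates the trade-off between epochs consumed and the quality of the selected candidate, not the paper's experimental setup.

```python
def i_epoch_select(candidates, train_one_epoch, i):
    """Train every candidate for exactly i epochs, then keep the best.

    `train_one_epoch(name, epoch)` is a hypothetical callback that trains the
    named candidate for one more epoch and returns its validation score
    (higher is better). Returns (best name, final scores, total epochs used).
    """
    if i < 1:
        raise ValueError("i must be at least 1")
    scores, epochs_used = {}, 0
    for name in candidates:
        for epoch in range(1, i + 1):
            scores[name] = train_one_epoch(name, epoch)
            epochs_used += 1
    best = max(scores, key=scores.get)
    return best, scores, epochs_used

# Toy learning curves: validation accuracy after epochs 1..4 for three configurations.
curves = {
    "a": [0.60, 0.70, 0.75, 0.78],
    "b": [0.70, 0.72, 0.73, 0.74],
    "c": [0.50, 0.65, 0.76, 0.80],
}

def train(name, epoch):
    return curves[name][epoch - 1]

for i in (1, 4):
    best, _, used = i_epoch_select(list(curves), train, i)
    print(f"{i}-Epoch picks {best} after {used} epochs")
```

In this contrived example 1-Epoch spends 3 epochs and picks `b`, while training everything to the end spends 12 epochs and picks `c`. Sweeping `i` traces exactly the epochs-versus-performance trade-off whose Pareto-Front the abstract proposes as a baseline; whether the cheap choice is good enough on real benchmarks is the empirical question the paper studies.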
GNNBENCH: Fair and Productive Benchmarking for Single-GPU GNN System
Authors: Yidong Gong, Pradeep Kumar
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2404.04118
Pdf link: https://arxiv.org/pdf/2404.04118
Abstract We hypothesize that the absence of a standardized benchmark has allowed several fundamental pitfalls in GNN System design and evaluation that the community has overlooked. In this work, we propose GNNBench, a plug-and-play benchmarking platform focused on system innovation. GNNBench presents a new protocol to exchange their captive tensor data, supports custom classes in System APIs, and allows automatic integration of the same system module to many deep learning frameworks, such as PyTorch and TensorFlow. To demonstrate the importance of such a benchmark framework, we integrated several GNN systems. Our results show that integration with GNNBench helped us identify several measurement issues that deserve attention from the community.
Generalizable Temperature Nowcasting with Physics-Constrained RNNs for Predictive Maintenance of Wind Turbine Components
Authors: Johannes Exenberger, Matteo Di Salvo, Thomas Hirsch, Franz Wotawa, Gerald Schweiger
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2404.04126
Pdf link: https://arxiv.org/pdf/2404.04126
Abstract Machine learning plays an important role in the operation of current wind energy production systems. One central application is predictive maintenance, which increases efficiency and lowers electricity costs by reducing downtime. Integrating physics-based knowledge into neural networks to enforce their physical plausibility is a promising way to improve current approaches, but incomplete system information often impedes the application of such methods in real-world scenarios. We describe a simple and efficient way to perform physics-constrained, deep learning-based predictive maintenance for wind turbine gearbox bearings with partial system knowledge. The approach is based on temperature nowcasting constrained by physics, where unknown system coefficients are treated as learnable neural network parameters. Results show improved generalization performance to unseen environments compared to a baseline neural network, which is especially important in the low-data scenarios often encountered in real-world applications.
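The core idea of treating an unknown physical coefficient as a trainable parameter can be illustrated without an RNN. The sketch below assumes a hypothetical lumped thermal model, dT/dt = q·P(t) − k·(T − T_amb), where the heat-input gain q and ambient temperature T_amb are known and the cooling coefficient k is unknown; k is learned by gradient descent on the model's one-step residual. This is not the authors' architecture; the model, the names `simulate` and `fit_cooling_coefficient`, and all numbers are illustrative assumptions.

```python
from typing import List, Sequence


def simulate(k: float, q: float, t_amb: float, t0: float,
             power: Sequence[float], dt: float = 1.0) -> List[float]:
    """Hypothetical lumped thermal model integrated with explicit Euler:
    dT/dt = q * P(t) - k * (T - T_amb)."""
    temps = [t0]
    for p in power:
        t = temps[-1]
        temps.append(t + dt * (q * p - k * (t - t_amb)))
    return temps


def fit_cooling_coefficient(temps: Sequence[float], power: Sequence[float],
                            q: float, t_amb: float, dt: float = 1.0,
                            steps: int = 200) -> float:
    """Learn the unknown coefficient k by gradient descent.

    The known physics (q, t_amb) stays fixed; only k is trainable, and the loss
    is the mean squared one-step residual of the model above."""
    n = len(power)
    # Rearranged model: (T[j+1] - T[j]) / dt - q * P[j] = k * x_j, with x_j = T_amb - T[j].
    xs = [t_amb - temps[j] for j in range(n)]
    ys = [(temps[j + 1] - temps[j]) / dt - q * power[j] for j in range(n)]
    mean_x2 = sum(x * x for x in xs) / n
    if mean_x2 == 0.0:
        raise ValueError("temperature never leaves ambient; k is not identifiable")
    lr = 0.25 / mean_x2  # halves the distance to the least-squares optimum each step
    k = 0.0
    for _ in range(steps):
        # Derivative of mean((y - k * x)^2) with respect to k.
        grad = -2.0 * sum(x * (y - k * x) for x, y in zip(xs, ys)) / n
        k -= lr * grad
    return k


if __name__ == "__main__":
    power = [50.0 if j % 10 < 5 else 0.0 for j in range(100)]  # on/off load profile
    temps = simulate(k=0.1, q=0.02, t_amb=20.0, t0=60.0, power=power)
    print(fit_cooling_coefficient(temps, power, q=0.02, t_amb=20.0))  # close to 0.1
```

On noise-free synthetic data the learned k recovers the value used to generate the data; with real sensor data, the residual would be noisy and k would be estimated jointly with the network's other weights.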
Noisy Label Processing for Classification: A Survey
Authors: Mengting Li, Chuang Zhu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.04159
Pdf link: https://arxiv.org/pdf/2404.04159
Abstract In recent years, deep neural networks (DNNs) have achieved remarkable success in computer vision tasks, and that success often depends greatly on the richness of data. However, acquiring data and high-quality ground truth requires considerable manpower and money. In the long, tedious process of data annotation, annotators are prone to make mistakes, resulting in incorrect image labels, i.e., noisy labels. The emergence of noisy labels is inevitable. Moreover, since research shows that DNNs can easily fit noisy labels, their presence can significantly damage model training. Therefore, it is crucial to combat noisy labels in computer vision tasks, especially classification. In this survey, we first comprehensively review the evolution of deep learning approaches for combating noisy labels in image classification. We also review different noise patterns that have been proposed to design robust algorithms. Furthermore, we explore the underlying patterns of real-world label noise and propose an algorithm that generates synthetic label noise guided by real-world data. We test the algorithm on the well-known real-world dataset CIFAR-10N to form a new real-world data-guided synthetic benchmark and evaluate some typical noise-robust methods on the benchmark.
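For readers unfamiliar with the synthetic noise patterns such surveys review, the sketch below shows two common ones: symmetric noise (a label is flipped to any other class uniformly) and pair-flip noise (a label is flipped to a fixed neighbouring class). It is not the authors' real-world-data-guided generator; the function names and the `(y + 1) mod C` pairing are illustrative assumptions.

```python
import random
from typing import List, Sequence


def symmetric_noise(labels: Sequence[int], num_classes: int, rate: float,
                    seed: int = 0) -> List[int]:
    """With probability `rate`, replace a label by a different class chosen uniformly."""
    rng = random.Random(seed)
    noisy = []
    for y in labels:
        if rng.random() < rate:
            # Exclude the true class so every flip is a real error.
            noisy.append(rng.choice([c for c in range(num_classes) if c != y]))
        else:
            noisy.append(y)
    return noisy


def pair_flip_noise(labels: Sequence[int], num_classes: int, rate: float,
                    seed: int = 0) -> List[int]:
    """With probability `rate`, replace label y by its assumed neighbour (y + 1) mod C."""
    rng = random.Random(seed)
    return [(y + 1) % num_classes if rng.random() < rate else y for y in labels]


if __name__ == "__main__":
    clean = [i % 10 for i in range(10000)]
    noisy = symmetric_noise(clean, num_classes=10, rate=0.4)
    print(sum(a != b for a, b in zip(clean, noisy)) / len(clean))  # roughly 0.4
```

Real-world noise such as that in CIFAR-10N is typically neither uniform nor a fixed pairing, which is what motivates generators guided by real annotations.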
SCAResNet: A ResNet Variant Optimized for Tiny Object Detection in Transmission and Distribution Towers
Authors: Weile Li, Muqing Shi, Zhonghua Hong
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.04179
Pdf link: https://arxiv.org/pdf/2404.04179
Abstract Traditional deep learning-based object detection networks often resize images during the data preprocessing stage to achieve a uniform size and scale in the feature map. Resizing is done to facilitate model propagation and fully connected classification. However, resizing inevitably leads to object deformation and loss of valuable information in the images. This drawback becomes particularly pronounced for tiny objects like distribution towers with linear shapes and few pixels. To address this issue, we propose abandoning the resizing operation. Instead, we introduce Positional-Encoding Multi-head Criss-Cross Attention. This allows the model to capture contextual information and learn from multiple representation subspaces, effectively enriching the semantics of distribution towers. Additionally, we enhance Spatial Pyramid Pooling by reshaping three pooled feature maps into a new unified one while also reducing the computational burden. This approach allows images of different sizes and scales to generate feature maps with uniform dimensions and can be employed in feature map propagation. Our SCAResNet incorporates these aforementioned improvements into the backbone network ResNet. We evaluated our SCAResNet using the Electric Transmission and Distribution Infrastructure Imagery dataset from Duke University. Without any additional tricks, we employed various object detection models with Gaussian Receptive Field based Label Assignment as the baseline. When incorporating the SCAResNet into the baseline model, we achieved a 2.1% improvement in mAPs. This demonstrates the advantages of our SCAResNet in detecting transmission and distribution towers and its value in tiny object detection. The source code is available at https://github.com/LisavilaLee/SCAResNet_mmdet.
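The claim that pooling lets images of different sizes yield feature maps of uniform dimensions can be seen in classic Spatial Pyramid Pooling, sketched below for a single-channel map. This is the standard SPP idea, not SCAResNet's variant that reshapes three pooled maps into one; the pyramid levels (1, 2, 4) and function names are assumptions for illustration.

```python
from typing import List, Sequence


def adaptive_max_pool(fmap: Sequence[Sequence[float]], bins: int) -> List[float]:
    """Max-pool an H x W map into bins x bins cells, whatever H and W are.

    Cell i spans rows floor(i*H/bins) .. ceil((i+1)*H/bins) - 1 (likewise for
    columns), so adjacent cells may share a row or column when H is not divisible."""
    h, w = len(fmap), len(fmap[0])
    if h < bins or w < bins:
        raise ValueError("feature map must be at least bins x bins")
    out = []
    for i in range(bins):
        r0, r1 = (i * h) // bins, -(-((i + 1) * h) // bins)  # floor, ceil
        for j in range(bins):
            c0, c1 = (j * w) // bins, -(-((j + 1) * w) // bins)
            out.append(max(fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)))
    return out


def spatial_pyramid_pool(fmap: Sequence[Sequence[float]],
                         levels: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Concatenate the pooled cells of every pyramid level into one vector."""
    feats: List[float] = []
    for bins in levels:
        feats.extend(adaptive_max_pool(fmap, bins))
    return feats


if __name__ == "__main__":
    small = [[r * 5 + c for c in range(5)] for r in range(7)]   # 7 x 5 map
    large = [[r * 9 + c for c in range(9)] for r in range(13)]  # 13 x 9 map
    print(len(spatial_pyramid_pool(small)), len(spatial_pyramid_pool(large)))  # 21 21
```

Both inputs produce 1 + 4 + 16 = 21 values, so a downstream layer sees a fixed-length input without the image ever being resized.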
Active Causal Learning for Decoding Chemical Complexities with Targeted Interventions
Authors: Zachary R. Fox, Ayana Ghosh
Subjects: Machine Learning (cs.LG); Chemical Physics (physics.chem-ph); Data Analysis, Statistics and Probability (physics.data-an); Biomolecules (q-bio.BM)
Arxiv link: https://arxiv.org/abs/2404.04224
Pdf link: https://arxiv.org/pdf/2404.04224
Abstract Predicting and enhancing inherent properties based on molecular structures is paramount to design tasks in medicine, materials science, and environmental management. Current machine learning and deep learning approaches have become standard for prediction, but they face challenges when applied across different datasets because they rely on correlations between molecular representations and target properties. These approaches typically depend on large datasets to capture the diversity within the chemical space, facilitating a more accurate approximation, interpolation, or extrapolation of the chemical behavior of molecules. In our research, we introduce an active learning approach that discerns underlying cause-effect relationships through strategic sampling with the use of a graph loss function. This method identifies the smallest subset of the dataset capable of encoding the most information representative of a much larger chemical space. The identified causal relations are then leveraged to conduct systematic interventions, optimizing the design task within a chemical space that the models have not encountered previously. While our implementation focused on the QM9 quantum-chemical dataset for a specific design task (finding molecules with a large dipole moment), our active causal learning approach, driven by intelligent sampling and interventions, holds potential for broader applications in molecular and materials design and discovery.

New submissions for Mon, 8 Apr 24 #85

Keyword: differential privacy

PrivShape: Extracting Shapes in Time Series under User-Level Local Differential Privacy

From Theory to Comprehension: A Comparative Study of Differential Privacy and $k$-Anonymity

You Can Use But Cannot Recognize: Preserving Visual Privacy in Deep Neural Networks

Keyword: privacy

Federated Unlearning for Human Activity Recognition

Mitigating Heterogeneity in Federated Multimodal Learning with Biomedical Vision-Language Pre-training

PrivShape: Extracting Shapes in Time Series under User-Level Local Differential Privacy

From Theory to Comprehension: A Comparative Study of Differential Privacy and $k$-Anonymity

CLUE: A Clinical Language Understanding Evaluation for LLMs

You Can Use But Cannot Recognize: Preserving Visual Privacy in Deep Neural Networks

Precision Guided Approach to Mitigate Data Poisoning Attacks in Federated Learning

Keyword: machine learning

Machine Learning in Proton Exchange Membrane Water Electrolysis -- Part I: A Knowledge-Integrated Framework

Machine learning augmented diagnostic testing to identify sources of variability in test performance

Predictive Analytics of Varieties of Potatoes

On Extending the Automatic Test Markup Language (ATML) for Machine Learning

A Systems Theoretic Approach to Online Machine Learning

An ExplainableFair Framework for Prediction of Substance Use Disorder Treatment Completion

Optimizing Convolutional Neural Networks for Identifying Invasive Pollinator Apis Mellifera and Finding a Ligand drug to Protect California's Biodiversity

Semantic SQL -- Combining and optimizing semantic predicates in SQL

Multi-Task Learning for Lung sound & Lung disease classification

Towards Understanding the Impact of Code Modifications on Software Quality Metrics

Transformers for molecular property prediction: Lessons learned from the past five years

Fast Genetic Algorithm for feature selection -- A qualitative approximation approach

Continual Learning with Weight Interpolation

Good Books are Complex Matters: Gauging Complexity Profiles Across Diverse Categories of Perceived Literary Quality

Cycle Life Prediction for Lithium-ion Batteries: Machine Learning and More

Derivative-free tree optimization for complex systems

Hierarchical Neural Additive Models for Interpretable Demand Forecasts

Machine Learning-Aided Cooperative Localization under Dense Urban Environment

Generalizable Temperature Nowcasting with Physics-Constrained RNNs for Predictive Maintenance of Wind Turbine Components

Precision Guided Approach to Mitigate Data Poisoning Attacks in Federated Learning

Reliable Feature Selection for Adversarially Robust Cyber-Attack Detection

How Lexical is Bilingual Lexicon Induction?

Active Causal Learning for Decoding Chemical Complexities with Targeted Interventions

Evaluating Adversarial Robustness: A Comparison Of FGSM, Carlini-Wagner Attacks, And The Role of Distillation as Defense Mechanism

Keyword: optimization

Serial Parallel Reliability Redundancy Allocation Optimization for Energy Efficient and Fault Tolerant Cloud Computing

SpikeExplorer: hardware-oriented Design Space Exploration for Spiking Neural Networks on FPGA

Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences

Localized Distributional Robustness in Submodular Multi-Task Subset Selection

Understanding Language Modeling Paradigm Adaptations in Recommender Systems: Lessons Learned and Open Challenges

Optimization of resources for digital radio transmission over IBOC FM through max-min fairness

Fast k-connectivity Restoration in Multi-Robot Systems for Robust Communication Maintenance

A Block-Coordinate Descent EMO Algorithm: Theoretical and Empirical Analysis

The Low-Degree Hardness of Finding Large Independent Sets in Sparse Random Hypergraphs

Mitigating Heterogeneity in Federated Multimodal Learning with Biomedical Vision-Language Pre-training

Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data

Heterogeneous Multi-Agent Reinforcement Learning for Zero-Shot Scalable Collaboration

A proximal policy optimization based intelligent home solar management

Game-theoretic Distributed Learning Approach for Heterogeneous-cost Task Allocation with Budget Constraints

Rolling the dice for better deep learning performance: A study of randomness techniques in deep neural networks

Fast Genetic Algorithm for feature selection -- A qualitative approximation approach

Derivative-free tree optimization for complex systems

Queue-aware Network Control Algorithm with a High Quantum Computing Readiness-Evaluated in Discrete-time Flow Simulator for Fat-Pipe Networks

Robust Preference Optimization with Provable Noise Tolerance for LLMs

3D Facial Expressions through Analysis-by-Neural-Synthesis

The Unreasonable Effectiveness Of Early Discarding After One Epoch In Neural Network Hyperparameter Optimization

Wireless Resource Optimization in Hybrid Semantic/Bit Communication Networks

Convex MPC and Thrust Allocation with Deadband for Spacecraft Rendezvous

Enhancing IoT Intelligence: A Transformer-based Reinforcement Learning Methodology

Modeling Kinematic Uncertainty of Tendon-Driven Continuum Robots via Mixture Density Networks

DiffOp-net: A Differential Operator-based Fully Convolutional Network for Unsupervised Deformable Image Registration

Keyword: deep learning

Securing Social Spaces: Harnessing Deep Learning to Eradicate Cyberbullying

Dendrites endow artificial neural networks with accurate, robust and parameter-efficient learning

Explaining Explainability: Understanding Concept Activation Vectors

Learning smooth functions in high dimensions: from sparse polynomials to deep neural networks

Uniform Memory Retrieval with Larger Capacity for Modern Hopfield Models

Enhancing Breast Cancer Diagnosis in Mammography: Evaluation and Integration of Convolutional Neural Networks and Explainable AI

Multi-Task Learning for Lung sound & Lung disease classification

Deep Learning for Satellite Image Time Series Analysis: A Review

Re-pseudonymization Strategies for Smart Meter Data Are Not Robust to Deep Learning Profiling Attacks

Transformers for molecular property prediction: Lessons learned from the past five years

Model Selection with Model Zoo via Graph Learning

Physics-Inspired Synthesized Underwater Image Dataset

Finsler-Laplace-Beltrami Operators with Application to Shape Analysis