New submissions for Friday, 24 May 2024 (showing 955 of 955 entries )

Keyword: differential privacy

A Huber Loss Minimization Approach to Mean Estimation under User-level Differential Privacy

Authors: Puning Zhao, Lifeng Lai, Li Shen, Qingming Li, Jiafei Wu, Zhe Liu
Subjects: Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2405.13453
Pdf link: https://arxiv.org/pdf/2405.13453
Abstract Privacy protection of users' entire contribution of samples is important in distributed systems. The most effective approach is the two-stage scheme, which finds a small interval first and then gets a refined estimate by clipping samples into the interval. However, the clipping operation induces bias, which is serious if the sample distribution is heavy-tailed. Besides, users with large local sample sizes can make the sensitivity much larger, thus the method is not suitable for imbalanced users. Motivated by these challenges, we propose a Huber loss minimization approach to mean estimation under user-level differential privacy. The connecting points of Huber loss can be adaptively adjusted to deal with imbalanced users. Moreover, it avoids the clipping operation, thus significantly reducing the bias compared with the two-stage approach. We provide a theoretical analysis of our approach, which gives the noise strength needed for privacy protection, as well as the bound of mean squared error. The result shows that the new method is much less sensitive to the imbalance of user-wise sample sizes and the tail of sample distributions. Finally, we perform numerical experiments to validate our theoretical analysis.
Naturally Private Recommendations with Determinantal Point Processes
Authors: Jack Fitzsimons, Agustín Freitas Pasqualini, Robert Pisarczyk, Dmitrii Usynin
Subjects: Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2405.13677
Pdf link: https://arxiv.org/pdf/2405.13677
Abstract Often we consider machine learning models or statistical analysis methods which we endeavour to alter, by introducing a randomized mechanism, to make the model conform to a differential privacy constraint. However, certain models can often be implicitly differentially private or require significantly fewer alterations. In this work, we discuss Determinantal Point Processes (DPPs) which are dispersion models that balance recommendations based on both the popularity and the diversity of the content. We introduce DPPs, derive and discuss the alternations required for them to satisfy epsilon-Differential Privacy and provide an analysis of their sensitivity. We conclude by proposing simple alternatives to DPPs which would make them more efficient with respect to their privacy-utility trade-off.
Federated Domain-Specific Knowledge Transfer on Large Language Models Using Synthetic Data
Authors: Haoran Li, Xinyuan Zhao, Dadi Guo, Hanlin Gu, Ziqian Zeng, Yuxing Han, Yangqiu Song, Lixin Fan, Qiang Yang
Subjects: Subjects: Cryptography and Security (cs.CR); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2405.14212
Pdf link: https://arxiv.org/pdf/2405.14212
Abstract As large language models (LLMs) demonstrate unparalleled performance and generalization ability, LLMs are widely used and integrated into various applications. When it comes to sensitive domains, as commonly described in federated learning scenarios, directly using external LLMs on private data is strictly prohibited by stringent data security and privacy regulations. For local clients, the utilization of LLMs to improve the domain-specific small language models (SLMs), characterized by limited computational resources and domain-specific data, has attracted considerable research attention. By observing that LLMs can empower domain-specific SLMs, existing methods predominantly concentrate on leveraging the public data or LLMs to generate more data to transfer knowledge from LLMs to SLMs. However, due to the discrepancies between LLMs' generated data and clients' domain-specific data, these methods cannot yield substantial improvements in the domain-specific tasks. In this paper, we introduce a Federated Domain-specific Knowledge Transfer (FDKT) framework, which enables domain-specific knowledge transfer from LLMs to SLMs while preserving clients' data privacy. The core insight is to leverage LLMs to augment data based on domain-specific few-shot demonstrations, which are synthesized from private domain data using differential privacy. Such synthetic samples share similar data distribution with clients' private data and allow the server LLM to generate particular knowledge to improve clients' SLMs. The extensive experimental results demonstrate that the proposed FDKT framework consistently and greatly improves SLMs' task performance by around 5\% with a privacy budget of less than 10, compared to local training on private data.
A Systematic and Formal Study of the Impact of Local Differential Privacy on Fairness: Preliminary Results
Authors: Karima Makhlouf, Tamara Stefanovic, Heber H. Arcolezi, Catuscia Palamidessi
Subjects: Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2405.14725
Pdf link: https://arxiv.org/pdf/2405.14725
Abstract Machine learning (ML) algorithms rely primarily on the availability of training data, and, depending on the domain, these data may include sensitive information about the data providers, thus leading to significant privacy issues. Differential privacy (DP) is the predominant solution for privacy-preserving ML, and the local model of DP is the preferred choice when the server or the data collector are not trusted. Recent experimental studies have shown that local DP can impact ML prediction for different subgroups of individuals, thus affecting fair decision-making. However, the results are conflicting in the sense that some studies show a positive impact of privacy on fairness while others show a negative one. In this work, we conduct a systematic and formal study of the effect of local DP on fairness. Specifically, we perform a quantitative study of how the fairness of the decisions made by the ML model changes under local DP for different levels of privacy and data distributions. In particular, we provide bounds in terms of the joint distributions and the privacy level, delimiting the extent to which local DP can impact the fairness of the model. We characterize the cases in which privacy reduces discrimination and those with the opposite effect. We validate our theoretical findings on synthetic and real-world datasets. Our results are preliminary in the sense that, for now, we study only the case of one sensitive attribute, and only statistical disparity, conditional statistical disparity, and equal opportunity difference.
Keyword: privacy

StatAvg: Mitigating Data Heterogeneity in Federated Learning for Intrusion Detection Systems
Authors: Pavlos S. Bouzinis, Panagiotis Radoglou-Grammatikis, Ioannis Makris, Thomas Lagkas, Vasileios Argyriou, Georgios Th. Papadopoulos, Panagiotis Sarigiannidis, George K. Karagiannidis
Subjects: Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.13062
Pdf link: https://arxiv.org/pdf/2405.13062
Abstract Federated learning (FL) is a decentralized learning technique that enables participating devices to collaboratively build a shared Machine Leaning (ML) or Deep Learning (DL) model without revealing their raw data to a third party. Due to its privacy-preserving nature, FL has sparked widespread attention for building Intrusion Detection Systems (IDS) within the realm of cybersecurity. However, the data heterogeneity across participating domains and entities presents significant challenges for the reliable implementation of an FL-based IDS. In this paper, we propose an effective method called Statistical Averaging (StatAvg) to alleviate non-independently and identically (non-iid) distributed features across local clients' data in FL. In particular, StatAvg allows the FL clients to share their individual data statistics with the server, which then aggregates this information to produce global statistics. The latter are shared with the clients and used for universal data normalisation. It is worth mentioning that StatAvg can seamlessly integrate with any FL aggregation strategy, as it occurs before the actual FL training process. The proposed method is evaluated against baseline approaches using datasets for network and host Artificial Intelligence (AI)-powered IDS. The experimental results demonstrate the efficiency of StatAvg in mitigating non-iid feature distributions across the FL clients compared to the baseline methods.
EmInspector: Combating Backdoor Attacks in Federated Self-Supervised Learning Through Embedding Inspection
Authors: Yuwen Qian, Shuchi Wu, Kang Wei, Ming Ding, Di Xiao, Tao Xiang, Chuan Ma, Song Guo
Subjects: Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.13080
Pdf link: https://arxiv.org/pdf/2405.13080
Abstract Federated self-supervised learning (FSSL) has recently emerged as a promising paradigm that enables the exploitation of clients' vast amounts of unlabeled data while preserving data privacy. While FSSL offers advantages, its susceptibility to backdoor attacks, a concern identified in traditional federated supervised learning (FSL), has not been investigated. To fill the research gap, we undertake a comprehensive investigation into a backdoor attack paradigm, where unscrupulous clients conspire to manipulate the global model, revealing the vulnerability of FSSL to such attacks. In FSL, backdoor attacks typically build a direct association between the backdoor trigger and the target label. In contrast, in FSSL, backdoor attacks aim to alter the global model's representation for images containing the attacker's specified trigger pattern in favor of the attacker's intended target class, which is less straightforward. In this sense, we demonstrate that existing defenses are insufficient to mitigate the investigated backdoor attacks in FSSL, thus finding an effective defense mechanism is urgent. To tackle this issue, we dive into the fundamental mechanism of backdoor attacks on FSSL, proposing the Embedding Inspector (EmInspector) that detects malicious clients by inspecting the embedding space of local models. In particular, EmInspector assesses the similarity of embeddings from different local models using a small set of inspection images (e.g., ten images of CIFAR100) without specific requirements on sample distribution or labels. We discover that embeddings from backdoored models tend to cluster together in the embedding space for a given inspection image. Evaluation results show that EmInspector can effectively mitigate backdoor attacks on FSSL across various adversary settings. Our code is avaliable at this https URL.
Multi-domain Knowledge Graph Collaborative Pre-training and Prompt Tuning for Diverse Downstream Tasks
Authors: Yichi Zhang, Binbin Hu, Zhuo Chen, Lingbing Guo, Ziqi Liu, Zhiqiang Zhang, Lei Liang, Huajun Chen, Wen Zhang
Subjects: Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2405.13085
Pdf link: https://arxiv.org/pdf/2405.13085
Abstract Knowledge graphs (KGs) provide reliable external knowledge for a wide variety of AI tasks in the form of structured triples. Knowledge graph pre-training (KGP) aims to pre-train neural networks on large-scale KGs and provide unified interfaces to enhance different downstream tasks, which is a key direction for KG management, maintenance, and applications. Existing works often focus on purely research questions in open domains, or they are not open source due to data security and privacy in real scenarios. Meanwhile, existing studies have not explored the training efficiency and transferability of KGP models in depth. To address these problems, We propose a framework MuDoK to achieve multi-domain collaborative pre-training and efficient prefix prompt tuning to serve diverse downstream tasks like recommendation and text understanding. Our design is a plug-and-play prompt learning approach that can be flexibly adapted to different downstream task backbones. In response to the lack of open-source benchmarks, we constructed a new multi-domain KGP benchmark called KPI with two large-scale KGs and six different sub-domain tasks to evaluate our method and open-sourced it for subsequent research. We evaluated our approach based on constructed KPI benchmarks using diverse backbone models in heterogeneous downstream tasks. The experimental results show that our framework brings significant performance gains, along with its generality, efficiency, and transferability.
FedASTA: Federated adaptive spatial-temporal attention for traffic flow prediction
Authors: Kaiyuan Li, Yihan Zhang, Xinlei Chen
Subjects: Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2405.13090
Pdf link: https://arxiv.org/pdf/2405.13090
Abstract Mobile devices and the Internet of Things (IoT) devices nowadays generate a large amount of heterogeneous spatial-temporal data. It remains a challenging problem to model the spatial-temporal dynamics under privacy concern. Federated learning (FL) has been proposed as a framework to enable model training across distributed devices without sharing original data which reduce privacy concern. Personalized federated learning (PFL) methods further address data heterogenous problem. However, these methods don't consider natural spatial relations among nodes. For the sake of modeling spatial relations, Graph Neural Netowork (GNN) based FL approach have been proposed. But dynamic spatial-temporal relations among edge nodes are not taken into account. Several approaches model spatial-temporal dynamics in a centralized environment, while less effort has been made under federated setting. To overcome these challeges, we propose a novel Federated Adaptive Spatial-Temporal Attention (FedASTA) framework to model the dynamic spatial-temporal relations. On the client node, FedASTA extracts temporal relations and trend patterns from the decomposed terms of original time series. Then, on the server node, FedASTA utilize trend patterns from clients to construct adaptive temporal-spatial aware graph which captures dynamic correlation between clients. Besides, we design a masked spatial attention module with both static graph and constructed adaptive graph to model spatial dependencies among clients. Extensive experiments on five real-world public traffic flow datasets demonstrate that our method achieves state-of-art performance in federated scenario. In addition, the experiments made in centralized setting show the effectiveness of our novel adaptive graph construction approach compared with other popular dynamic spatial-temporal aware methods.
Generating A Crowdsourced Conversation Dataset to Combat Cybergrooming
Authors: Xinyi Zhang, Pamela J. Wisniewski, Jin-hee Cho, Lifu Huang, Sang Won Lee
Subjects: Subjects: Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2405.13154
Pdf link: https://arxiv.org/pdf/2405.13154
Abstract Cybergrooming emerges as a growing threat to adolescent safety and mental health. One way to combat cybergrooming is to leverage predictive artificial intelligence (AI) to detect predatory behaviors in social media. However, these methods can encounter challenges like false positives and negative implications such as privacy concerns. Another complementary strategy involves using generative artificial intelligence to empower adolescents by educating them about predatory behaviors. To this end, we envision developing state-of-the-art conversational agents to simulate the conversations between adolescents and predators for educational purposes. Yet, one key challenge is the lack of a dataset to train such conversational agents. In this position paper, we present our motivation for empowering adolescents to cope with cybergrooming. We propose to develop large-scale, authentic datasets through an online survey targeting adolescents and parents. We discuss some initial background behind our motivation and proposed design of the survey, such as situating the participants in artificial cybergrooming scenarios, then allowing participants to respond to the survey to obtain their authentic responses. We also present several open questions related to our proposed approach and hope to discuss them with the workshop attendees.
A Privacy-Preserving DAO Model Using NFT Authentication for the Punishment not Reward Blockchain Architecture
Authors: Talgar Bayan, Richard Banach
Subjects: Subjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2405.13156
Pdf link: https://arxiv.org/pdf/2405.13156
Abstract \This paper presents a novel decentralized autonomous organization (DAO) model leveraging non-fungible tokens (NFTs) for advanced access control and privacy-preserving interactions within a Punishment not Reward (PnR) blockchain framework. The proposed model introduces a dual NFT architecture: Membership NFTs ((NFT{auth})) for authentication and access control, and Interaction NFTs ((NFT{priv})) for enabling private, encrypted interactions among participants. Governance is enforced through smart contracts that manage reputation and administer punitive measures, such as conditional identity disclosure. By prioritizing privacy, security, and deterrence over financial rewards, this model addresses key challenges in existing blockchain incentive structures, paving the way for more sustainable and decentralized governance frameworks.
Bytes to Schlep? Use a FEP: Hiding Protocol Metadata with Fully Encrypted Protocols
Authors: Ellis Fenske, Aaron Johnson
Subjects: Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2405.13310
Pdf link: https://arxiv.org/pdf/2405.13310
Abstract Fully Encrypted Protocols (FEPs) have arisen in practice as a technique to avoid network censorship. Such protocols are designed to produce messages that appear completely random. This design hides communications metadata, such as version and length fields, and makes it difficult to even determine what protocol is being used. Moreover, these protocols frequently support padding to hide the length of protocol fields and the contained message. These techniques have relevance well beyond censorship circumvention, as protecting protocol metadata has security and privacy benefits for all Internet communications. The security of FEP designs depends on cryptographic assumptions, but neither security definitions nor proofs exist for them. We provide novel security definitions that capture the metadata-protection goals of FEPs. Our definitions are given in both the datastream and datagram settings, which model the ubiquitous TCP and UDP interfaces available to protocol designers. We prove relations among these new notions and existing security definitions. We further present new FEP constructions and prove their security. Finally, we survey existing FEP candidates and characterize the extent to which they satisfy FEP security. We identify novel ways in which these protocols are identifiable, including their responses to the introduction of data errors and the sizes of their smallest protocol messages.
On the Challenges of Creating Datasets for Analyzing Commercial Sex Advertisements to Assess Human Trafficking Risk and Organized Activity
Authors: Pablo Rivas, Tomas Cerny, Alejandro Rodriguez Perez, Javier Turek, Laurie Giddens, Gisela Bichler, Stacie Petter
Subjects: Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.13348
Pdf link: https://arxiv.org/pdf/2405.13348
Abstract Our study addresses the challenges of building datasets to understand the risks associated with organized activities and human trafficking through commercial sex advertisements. These challenges include data scarcity, rapid obsolescence, and privacy concerns. Traditional approaches, which are not automated and are difficult to reproduce, fall short in addressing these issues. We have developed a reproducible and automated methodology to analyze five million advertisements. In the process, we identified further challenges in dataset creation within this sensitive domain. This paper presents a streamlined methodology to assist researchers in constructing effective datasets for combating organized crime, allowing them to focus on advancing detection technologies.
AdpQ: A Zero-shot Calibration Free Adaptive Post Training Quantization Method for LLMs
Authors: Alireza Ghaffari, Sharareh Younesian, Vahid Partovi Nia, Boxing Chen, Masoud Asgharian
Subjects: Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2405.13358
Pdf link: https://arxiv.org/pdf/2405.13358
Abstract The ever-growing computational complexity of Large Language Models (LLMs) necessitates efficient deployment strategies. The current state-of-the-art approaches for Post-training Quantization (PTQ) often require calibration to achieve the desired accuracy. This paper presents AdpQ, a novel zero-shot adaptive PTQ method for LLMs that achieves the state-of-the-art performance in low-precision quantization (e.g. 3-bit) without requiring any calibration data. Inspired by Adaptive LASSO regression model, our proposed approach tackles the challenge of outlier activations by separating salient weights using an adaptive soft-thresholding method. Guided by Adaptive LASSO, this method ensures that the quantized weights distribution closely follows the originally trained weights and eliminates the need for calibration data entirely, setting our method apart from popular approaches such as SpQR and AWQ. Furthermore, our method offers an additional benefit in terms of privacy preservation by eliminating any calibration or training data. We also delve deeper into the information-theoretic underpinnings of the proposed method. We demonstrate that it leverages the Adaptive LASSO to minimize the Kullback-Leibler divergence between the quantized weights and the originally trained weights. This minimization ensures the quantized model retains the Shannon information content of the original model to a great extent, guaranteeing efficient deployment without sacrificing accuracy or information. Our results achieve the same accuracy as the existing methods on various LLM benchmarks while the quantization time is reduced by at least 10x, solidifying our contribution to efficient and privacy-preserving LLM deployment.
FedCache 2.0: Exploiting the Potential of Distilled Data in Knowledge Cache-driven Federated Learning
Authors: Quyang Pan, Sheng Sun, Zhiyuan Wu, Yuwei Wang, Min Liu, Bo Gao
Subjects: Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.13378
Pdf link: https://arxiv.org/pdf/2405.13378
Abstract Federated Edge Learning (FEL) has emerged as a promising approach for enabling edge devices to collaboratively train machine learning models while preserving data privacy. Despite its advantages, practical FEL deployment faces significant challenges related to device constraints and device-server interactions, necessitating heterogeneous, user-adaptive model training with limited and uncertain communication. In this paper, we introduce FedCache 2.0, a novel personalized FEL architecture that simultaneously addresses these challenges. FedCache 2.0 incorporates the benefits of both dataset distillation and knowledge cache-driven federated learning by storing and organizing distilled data as knowledge in the server-side knowledge cache. Moreover, a device-centric cache sampling strategy is introduced to tailor transferred knowledge for individual devices within controlled communication bandwidth. Extensive experiments on five datasets covering image recognition, audio understanding, and mobile sensor data mining tasks demonstrate that (1) FedCache 2.0 significantly outperforms state-of-the-art methods regardless of model structures, data distributions, and modalities. (2) FedCache 2.0 can train splendid personalized on-device models with at least $\times$28.6 improvement in communication efficiency.
The Illusion of Anonymity: Uncovering the Impact of User Actions on Privacy in Web3 Social Ecosystems
Authors: Bin Wang, Tianjian Liu, Wenqi Wang, Yuan Weng, Chao Li, Guangquan Xu, Meng Shen, Sencun Zhu, Wei Wang
Subjects: Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2405.13380
Pdf link: https://arxiv.org/pdf/2405.13380
Abstract The rise of Web3 social ecosystems signifies the dawn of a new chapter in digital interaction, offering significant prospects for user engagement and financial advancement. Nonetheless, this progress is shadowed by potential privacy concessions, especially as these platforms frequently merge with existing Web2.0 social media accounts, amplifying data privacy risks for users. In this study, we investigate the nuanced dynamics between user engagement on Web3 social platforms and the consequent privacy concerns. We scrutinize the widespread phenomenon of fabricated activities, which encompasses the establishment of bogus accounts aimed at mimicking popularity and the deliberate distortion of social interactions by some individuals to gain financial rewards. Such deceptive maneuvers not only distort the true measure of the active user base but also amplify privacy threats for all members of the user community. We also find that, notwithstanding their attempts to limit social exposure, users remain entangled in privacy vulnerabilities. The actions of those highly engaged users, albeit often a minority group, can inadvertently breach the privacy of the larger collective. By casting light on the delicate interplay between user engagement, financial motives, and privacy issues, we offer a comprehensive examination of the intrinsic challenges and hazards present in the Web3 social milieu. We highlight the urgent need for more stringent privacy measures and ethical protocols to navigate the complex web of social exchanges and financial ambitions in the rapidly evolving Web3.
Task-agnostic Decision Transformer for Multi-type Agent Control with Federated Split Training
Authors: Zhiyuan Wang, Bokui Chen, Xiaoyang Qu, Zhenhou Hong, Jing Xiao, Jianzong Wang
Subjects: Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2405.13445
Pdf link: https://arxiv.org/pdf/2405.13445
Abstract With the rapid advancements in artificial intelligence, the development of knowledgeable and personalized agents has become increasingly prevalent. However, the inherent variability in state variables and action spaces among personalized agents poses significant aggregation challenges for traditional federated learning algorithms. To tackle these challenges, we introduce the Federated Split Decision Transformer (FSDT), an innovative framework designed explicitly for AI agent decision tasks. The FSDT framework excels at navigating the intricacies of personalized agents by harnessing distributed data for training while preserving data privacy. It employs a two-stage training process, with local embedding and prediction models on client agents and a global transformer decoder model on the server. Our comprehensive evaluation using the benchmark D4RL dataset highlights the superior performance of our algorithm in federated split learning for personalized agents, coupled with significant reductions in communication and computational overhead compared to traditional centralized training approaches. The FSDT framework demonstrates strong potential for enabling efficient and privacy-preserving collaborative learning in applications such as autonomous driving decision systems. Our findings underscore the efficacy of the FSDT framework in effectively leveraging distributed offline reinforcement learning data to enable powerful multi-type agent decision systems.
A Huber Loss Minimization Approach to Mean Estimation under User-level Differential Privacy
Authors: Puning Zhao, Lifeng Lai, Li Shen, Qingming Li, Jiafei Wu, Zhe Liu
Subjects: Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2405.13453
Pdf link: https://arxiv.org/pdf/2405.13453
Abstract Privacy protection of users' entire contribution of samples is important in distributed systems. The most effective approach is the two-stage scheme, which finds a small interval first and then gets a refined estimate by clipping samples into the interval. However, the clipping operation induces bias, which is serious if the sample distribution is heavy-tailed. Besides, users with large local sample sizes can make the sensitivity much larger, thus the method is not suitable for imbalanced users. Motivated by these challenges, we propose a Huber loss minimization approach to mean estimation under user-level differential privacy. The connecting points of Huber loss can be adaptively adjusted to deal with imbalanced users. Moreover, it avoids the clipping operation, thus significantly reducing the bias compared with the two-stage approach. We provide a theoretical analysis of our approach, which gives the noise strength needed for privacy protection, as well as the bound of mean squared error. The result shows that the new method is much less sensitive to the imbalance of user-wise sample sizes and the tail of sample distributions. Finally, we perform numerical experiments to validate our theoretical analysis.
AdaFedFR: Federated Face Recognition with Adaptive Inter-Class Representation Learning
Authors: Di Qiu, Xinyang Lin, Kaiye Wang, Xiangxiang Chu, Pengfei Yan
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2405.13467
Pdf link: https://arxiv.org/pdf/2405.13467
Abstract With the growing attention on data privacy and communication security in face recognition applications, federated learning has been introduced to learn a face recognition model with decentralized datasets in a privacy-preserving manner. However, existing works still face challenges such as unsatisfying performance and additional communication costs, limiting their applicability in real-world scenarios. In this paper, we propose a simple yet effective federated face recognition framework called AdaFedFR, by devising an adaptive inter-class representation learning algorithm to enhance the generalization of the generic face model and the efficiency of federated training under strict privacy-preservation. In particular, our work delicately utilizes feature representations of public identities as learnable negative knowledge to optimize the local objective within the feature space, which further encourages the local model to learn powerful representations and optimize personalized models for clients. Experimental results demonstrate that our method outperforms previous approaches on several prevalent face recognition benchmarks within less than 3 communication rounds, which shows communication-friendly and great efficiency.
Naturally Private Recommendations with Determinantal Point Processes
Authors: Jack Fitzsimons, Agustín Freitas Pasqualini, Robert Pisarczyk, Dmitrii Usynin
Subjects: Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2405.13677
Pdf link: https://arxiv.org/pdf/2405.13677
Abstract Often we consider machine learning models or statistical analysis methods which we endeavour to alter, by introducing a randomized mechanism, to make the model conform to a differential privacy constraint. However, certain models can often be implicitly differentially private or require significantly fewer alterations. In this work, we discuss Determinantal Point Processes (DPPs) which are dispersion models that balance recommendations based on both the popularity and the diversity of the content. We introduce DPPs, derive and discuss the alternations required for them to satisfy epsilon-Differential Privacy and provide an analysis of their sensitivity. We conclude by proposing simple alternatives to DPPs which would make them more efficient with respect to their privacy-utility trade-off.
A Privacy Measure Turned Upside Down? Investigating the Use of HTTP Client Hints on the Web
Authors: Stephan Wiefling, Marian Hönscheid, Luigi Lo Iacono
Subjects: Subjects: Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2405.13744
Pdf link: https://arxiv.org/pdf/2405.13744
Abstract HTTP client hints are a set of standardized HTTP request headers designed to modernize and potentially replace the traditional user agent string. While the user agent string exposes a wide range of information about the client's browser and device, client hints provide a controlled and structured approach for clients to selectively disclose their capabilities and preferences to servers. Essentially, client hints aim at more effective and privacy-friendly disclosure of browser or client properties than the user agent string. We present a first long-term study of the use of HTTP client hints in the wild. We found that despite being implemented in almost all web browsers, server-side usage of client hints remains generally low. However, in the context of third-party websites, which are often linked to trackers, the adoption rate is significantly higher. This is concerning because client hints allow the retrieval of more data from the client than the user agent string provides, and there are currently no mechanisms for users to detect or control this potential data leakage. Our work provides valuable insights for web users, browser vendors, and researchers by exposing potential privacy violations via client hints and providing help in developing remediation strategies as well as further research.
CG-FedLLM: How to Compress Gradients in Federated Fune-tuning for Large Language Models
Authors: Huiwen Wu, Xiaohan Li, Deyi Zhang, Xiaogang Xu, Jiafei Wu, Puning Zhao, Zhe Liu
Subjects: Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2405.13746
Pdf link: https://arxiv.org/pdf/2405.13746
Abstract The success of current Large-Language Models (LLMs) hinges on extensive training data that is collected and stored centrally, called Centralized Learning (CL). However, such a collection manner poses a privacy threat, and one potential solution is Federated Learning (FL), which transfers gradients, not raw data, among clients. Unlike traditional networks, FL for LLMs incurs significant communication costs due to their tremendous parameters. This study introduces an innovative approach to compress gradients to improve communication efficiency during LLM FL, formulating the new FL pipeline named CG-FedLLM. This approach integrates an encoder on the client side to acquire the compressed gradient features and a decoder on the server side to reconstruct the gradients. We also developed a novel training strategy that comprises Temporal-ensemble Gradient-Aware Pre-training (TGAP) to identify characteristic gradients of the target model and Federated AutoEncoder-Involved Fine-tuning (FAF) to compress gradients adaptively. Extensive experiments confirm that our approach reduces communication costs and improves performance (e.g., average 3 points increment compared with traditional CL- and FL-based fine-tuning with LlaMA on a well-recognized benchmark, C-Eval). This improvement is because our encoder-decoder, trained via TGAP and FAF, can filter gradients while selectively preserving critical features. Furthermore, we present a series of experimental analyses focusing on the signal-to-noise ratio, compression rate, and robustness within this privacy-centric framework, providing insight into developing more efficient and secure LLMs.
Guarding Multiple Secrets: Enhanced Summary Statistic Privacy for Data Sharing
Authors: Shuaiqi Wang, Rongzhe Wei, Mohsen Ghassemi, Eleonora Kreacic, Vamsi K. Potluru
Subjects: Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2405.13804
Pdf link: https://arxiv.org/pdf/2405.13804
Abstract Data sharing enables critical advances in many research areas and business applications, but it may lead to inadvertent disclosure of sensitive summary statistics (e.g., means or quantiles). Existing literature only focuses on protecting a single confidential quantity, while in practice, data sharing involves multiple sensitive statistics. We propose a novel framework to define, analyze, and protect multi-secret summary statistics privacy in data sharing. Specifically, we measure the privacy risk of any data release mechanism by the worst-case probability of an attacker successfully inferring summary statistic secrets. Given an attacker's objective spanning from inferring a subset to the entirety of summary statistic secrets, we systematically design and analyze tailored privacy metrics. Defining the distortion as the worst-case distance between the original and released data distribution, we analyze the tradeoff between privacy and distortion. Our contribution also includes designing and analyzing data release mechanisms tailored for different data distributions and secret types. Evaluations on real-world data demonstrate the effectiveness of our mechanisms in practical applications.
Diffusion-Based Cloud-Edge-Device Collaborative Learning for Next POI Recommendations
Authors: Jing Long, Guanhua Ye, Tong Chen, Yang Wang, Meng Wang, Hongzhi Yin
Subjects: Subjects: Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2405.13811
Pdf link: https://arxiv.org/pdf/2405.13811
Abstract The rapid expansion of Location-Based Social Networks (LBSNs) has highlighted the importance of effective next Point-of-Interest (POI) recommendations, which leverage historical check-in data to predict users' next POIs to visit. Traditional centralized deep neural networks (DNNs) offer impressive POI recommendation performance but face challenges due to privacy concerns and limited timeliness. In response, on-device POI recommendations have been introduced, utilizing federated learning (FL) and decentralized approaches to ensure privacy and recommendation timeliness. However, these methods often suffer from computational strain on devices and struggle to adapt to new users and regions. This paper introduces a novel collaborative learning framework, Diffusion-Based Cloud-Edge-Device Collaborative Learning for Next POI Recommendations (DCPR), leveraging the diffusion model known for its success across various domains. DCPR operates with a cloud-edge-device architecture to offer region-specific and highly personalized POI recommendations while reducing on-device computational burdens. DCPR minimizes on-device computational demands through a unique blend of global and local learning processes. Our evaluation with two real-world datasets demonstrates DCPR's superior performance in recommendation accuracy, efficiency, and adaptability to new users and regions, marking a significant step forward in on-device POI recommendation technology.
Federated Learning in Healthcare: Model Misconducts, Security, Challenges, Applications, and Future Research Directions -- A Systematic Review
Authors: Md Shahin Ali, Md Manjurul Ahsan, Lamia Tasnim, Sadia Afrin, Koushik Biswas, Md Maruf Hossain, Md Mahfuz Ahmed, Ronok Hashan, Md Khairul Islam, Shivakumar Raman
Subjects: Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.13832
Pdf link: https://arxiv.org/pdf/2405.13832
Abstract Data privacy has become a major concern in healthcare due to the increasing digitization of medical records and data-driven medical research. Protecting sensitive patient information from breaches and unauthorized access is critical, as such incidents can have severe legal and ethical complications. Federated Learning (FL) addresses this concern by enabling multiple healthcare institutions to collaboratively learn from decentralized data without sharing it. FL's scope in healthcare covers areas such as disease prediction, treatment customization, and clinical trial research. However, implementing FL poses challenges, including model convergence in non-IID (independent and identically distributed) data environments, communication overhead, and managing multi-institutional collaborations. A systematic review of FL in healthcare is necessary to evaluate how effectively FL can provide privacy while maintaining the integrity and usability of medical data analysis. In this study, we analyze existing literature on FL applications in healthcare. We explore the current state of model security practices, identify prevalent challenges, and discuss practical applications and their implications. Additionally, the review highlights promising future research directions to refine FL implementations, enhance data security protocols, and expand FL's use to broader healthcare applications, which will benefit future researchers and practitioners.
AI-Protected Blockchain-based IoT environments: Harnessing the Future of Network Security and Privacy
Authors: Ali Mohammadi Ruzbahani
Subjects: Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2405.13847
Pdf link: https://arxiv.org/pdf/2405.13847
Abstract Integrating blockchain technology with the Internet of Things offers transformative possibilities for enhancing network security and privacy in the contemporary digital landscape, where interconnected devices and expansive networks are ubiquitous. This paper explores the pivotal role of artificial intelligence in bolstering blockchain-enabled IoT systems, potentially marking a significant leap forward in safeguarding data integrity and confidentiality across networks. Blockchain technology provides a decentralized and immutable ledger, ideal for the secure management of device identities and transactions in IoT networks. When coupled with AI, these systems gain the ability to not only automate and optimize security protocols but also adaptively respond to new and evolving cyber threats. This dual capability enhances the resilience of networks against cyber-attacks, a critical consideration as IoT devices increasingly permeate critical infrastructures. The synergy between AI and blockchain in IoT is profound. AI algorithms can analyze vast amounts of data from IoT devices to detect patterns and anomalies that may signify security breaches. Concurrently, blockchain can ensure that data records are tamper-proof, enhancing the reliability of AI-driven security measures. Moreover, this research evaluates the implications of AI-enhanced blockchain systems on privacy protection within IoT networks. IoT devices often collect sensitive personal data, making privacy a paramount concern. AI can facilitate the development of new protocols that ensure data privacy and user anonymity without compromising the functionality of IoT systems. Through comprehensive analysis and case studies, this paper aims to provide an in-depth understanding of how AI-enhanced blockchain technology can revolutionize network security and privacy in IoT environments.
What Do Privacy Advertisements Communicate to Consumers?
Authors: Xiaoxin Shen, Eman Alashwali, Lorrie Faith Cranor
Subjects: Subjects: Cryptography and Security (cs.CR); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2405.13857
Pdf link: https://arxiv.org/pdf/2405.13857
Abstract When companies release marketing materials aimed at promoting their privacy practices or highlighting specific privacy features, what do they actually communicate to consumers? In this paper, we explore the impact of privacy marketing materials on: (1) consumers' attitude towards the organizations providing the campaigns, (2) overall privacy awareness, and (3) the actionability of suggested privacy advice. To this end, we investigated the impact of four privacy advertising videos and one privacy game published by five different technology companies. We conducted 24 semi-structured interviews with participants randomly assigned to view one or two of the videos or play the game. Our findings suggest that awareness of privacy features can contribute to positive perceptions of a company or its products. The ads we tested were more successful in communicating the advertised privacy features than the game we tested. We observed that advertising a single privacy feature using a single metaphor in a short ad increased awareness of the advertised feature. The game failed to communicate privacy features or motivate study participants to use the features. Our results also suggest that privacy campaigns can be useful for raising awareness about privacy features and improving brand image, but may not be the most effective way to teach viewers how to use privacy features.
Rehearsal-free Federated Domain-incremental Learning
Authors: Rui Sun, Haoran Duan, Jiahua Dong, Varun Ojha, Tejal Shah, Rajiv Ranjan
Subjects: Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2405.13900
Pdf link: https://arxiv.org/pdf/2405.13900
Abstract We introduce a rehearsal-free federated domain incremental learning framework, RefFiL, based on a global prompt-sharing paradigm to alleviate catastrophic forgetting challenges in federated domain-incremental learning, where unseen domains are continually learned. Typical methods for mitigating forgetting, such as the use of additional datasets and the retention of private data from earlier tasks, are not viable in federated learning (FL) due to devices' limited resources. Our method, RefFiL, addresses this by learning domain-invariant knowledge and incorporating various domain-specific prompts from the domains represented by different FL participants. A key feature of RefFiL is the generation of local fine-grained prompts by our domain adaptive prompt generator, which effectively learns from local domain knowledge while maintaining distinctive boundaries on a global scale. We also introduce a domain-specific prompt contrastive learning loss that differentiates between locally generated prompts and those from other domains, enhancing RefFiL's precision and effectiveness. Compared to existing methods, RefFiL significantly alleviates catastrophic forgetting without requiring extra memory space, making it ideal for privacy-sensitive and resource-constrained devices.
Memory Scraping Attack on Xilinx FPGAs: Private Data Extraction from Terminated Processes
Authors: Bharadwaj Madabhushi, Sandip Kundu, Daniel Holcomb
Subjects: Subjects: Cryptography and Security (cs.CR); Hardware Architecture (cs.AR)
Arxiv link: https://arxiv.org/abs/2405.13927
Pdf link: https://arxiv.org/pdf/2405.13927
Abstract FPGA-based hardware accelerators are becoming increasingly popular due to their versatility, customizability, energy efficiency, constant latency, and scalability. FPGAs can be tailored to specific algorithms, enabling efficient hardware implementations that effectively leverage algorithm parallelism. This can lead to significant performance improvements over CPUs and GPUs, particularly for highly parallel applications. For example, a recent study found that Stratix 10 FPGAs can achieve up to 90\% of the performance of a TitanX Pascal GPU while consuming less than 50\% of the power. This makes FPGAs an attractive choice for accelerating machine learning (ML) workloads. However, our research finds privacy and security vulnerabilities in existing Xilinx FPGA-based hardware acceleration solutions. These vulnerabilities arise from the lack of memory initialization and insufficient process isolation, which creates potential avenues for unauthorized access to private data used by processes. To illustrate this issue, we conducted experiments using a Xilinx ZCU104 board running the PetaLinux tool from Xilinx. We found that PetaLinux does not effectively clear memory locations associated with a terminated process, leaving them vulnerable to memory scraping attack (MSA). This paper makes two main contributions. The first contribution is an attack methodology of using the Xilinx debugger from a different user space. We find that we are able to access process IDs, virtual address spaces, and pagemaps of one user from a different user space because of lack of adequate process isolation. The second contribution is a methodology for characterizing terminated processes and accessing their private data. We illustrate this on Xilinx ML application library.
Remote Keylogging Attacks in Multi-user VR Applications
Authors: Zihao Su, Kunlin Cai, Reuben Beeler, Lukas Dresel, Allan Garcia, Ilya Grishchenko, Yuan Tian, Christopher Kruegel, Giovanni Vigna
Subjects: Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2405.14036
Pdf link: https://arxiv.org/pdf/2405.14036
Abstract As Virtual Reality (VR) applications grow in popularity, they have bridged distances and brought users closer together. However, with this growth, there have been increasing concerns about security and privacy, especially related to the motion data used to create immersive experiences. In this study, we highlight a significant security threat in multi-user VR applications, which are applications that allow multiple users to interact with each other in the same virtual space. Specifically, we propose a remote attack that utilizes the avatar rendering information collected from an adversary's game clients to extract user-typed secrets like credit card information, passwords, or private conversations. We do this by (1) extracting motion data from network packets, and (2) mapping motion data to keystroke entries. We conducted a user study to verify the attack's effectiveness, in which our attack successfully inferred 97.62% of the keystrokes. Besides, we performed an additional experiment to underline that our attack is practical, confirming its effectiveness even when (1) there are multiple users in a room, and (2) the attacker cannot see the victims. Moreover, we replicated our proposed attack on four applications to demonstrate the generalizability of the attack. These results underscore the severity of the vulnerability and its potential impact on millions of VR social platform users.
Nearly Tight Black-Box Auditing of Differentially Private Machine Learning
Authors: Meenatchi Sundaram Muthu Selva Annamalai, Emiliano De Cristofaro
Subjects: Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.14106
Pdf link: https://arxiv.org/pdf/2405.14106
Abstract This paper presents a nearly tight audit of the Differentially Private Stochastic Gradient Descent (DP-SGD) algorithm in the black-box model. Our auditing procedure empirically estimates the privacy leakage from DP-SGD using membership inference attacks; unlike prior work, the estimates are appreciably close to the theoretical DP bounds. The main intuition is to craft worst-case initial model parameters, as DP-SGD's privacy analysis is agnostic to the choice of the initial model parameters. For models trained with theoretical $\varepsilon=10.0$ on MNIST and CIFAR-10, our auditing procedure yields empirical estimates of $7.21$ and $6.95$, respectively, on 1,000-record samples and $6.48$ and $4.96$ on the full datasets. By contrast, previous work achieved tight audits only in stronger (i.e., less realistic) white-box models that allow the adversary to access the model's inner parameters and insert arbitrary gradients. Our auditing procedure can be used to detect bugs and DP violations more easily and offers valuable insight into how the privacy analysis of DP-SGD can be further improved.
Federated Domain-Specific Knowledge Transfer on Large Language Models Using Synthetic Data
Authors: Haoran Li, Xinyuan Zhao, Dadi Guo, Hanlin Gu, Ziqian Zeng, Yuxing Han, Yangqiu Song, Lixin Fan, Qiang Yang
Subjects: Subjects: Cryptography and Security (cs.CR); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2405.14212
Pdf link: https://arxiv.org/pdf/2405.14212
Abstract As large language models (LLMs) demonstrate unparalleled performance and generalization ability, LLMs are widely used and integrated into various applications. When it comes to sensitive domains, as commonly described in federated learning scenarios, directly using external LLMs on private data is strictly prohibited by stringent data security and privacy regulations. For local clients, the utilization of LLMs to improve the domain-specific small language models (SLMs), characterized by limited computational resources and domain-specific data, has attracted considerable research attention. By observing that LLMs can empower domain-specific SLMs, existing methods predominantly concentrate on leveraging the public data or LLMs to generate more data to transfer knowledge from LLMs to SLMs. However, due to the discrepancies between LLMs' generated data and clients' domain-specific data, these methods cannot yield substantial improvements in the domain-specific tasks. In this paper, we introduce a Federated Domain-specific Knowledge Transfer (FDKT) framework, which enables domain-specific knowledge transfer from LLMs to SLMs while preserving clients' data privacy. The core insight is to leverage LLMs to augment data based on domain-specific few-shot demonstrations, which are synthesized from private domain data using differential privacy. Such synthetic samples share similar data distribution with clients' private data and allow the server LLM to generate particular knowledge to improve clients' SLMs. The extensive experimental results demonstrate that the proposed FDKT framework consistently and greatly improves SLMs' task performance by around 5\% with a privacy budget of less than 10, compared to local training on private data.
Time-FFM: Towards LM-Empowered Federated Foundation Model for Time Series Forecasting
Authors: Qingxiang Liu, Xu Liu, Chenghao Liu, Qingsong Wen, Yuxuan Liang
Subjects: Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.14252
Pdf link: https://arxiv.org/pdf/2405.14252
Abstract Unlike natural language processing and computer vision, the development of Foundation Models (FMs) for time series forecasting is blocked due to data scarcity. While recent efforts are focused on building such FMs by unlocking the potential of language models (LMs) for time series analysis, dedicated parameters for various downstream forecasting tasks need training, which hinders the common knowledge sharing across domains. Moreover, data owners may hesitate to share the access to local data due to privacy concerns and copyright protection, which makes it impossible to simply construct a FM on cross-domain training instances. To address these issues, we propose Time-FFM, a Federated Foundation Model for Time series forecasting by leveraging pretrained LMs. Specifically, we begin by transforming time series into the modality of text tokens. To bootstrap LMs for time series reasoning, we propose a prompt adaption module to determine domain-customized prompts dynamically instead of artificially. Given the data heterogeneity across domains, we design a personalized federated training strategy by learning global encoders and local prediction heads. Our comprehensive experiments indicate that Time-FFM outperforms state-of-the-arts and promises effective few-shot and zero-shot forecaster.
Variational Bayes for Federated Continual Learning
Authors: Dezhong Yao, Sanmu Li, Yutong Dai, Zhiqiang Xu, Shengshan Hu, Peilin Zhao, Lichao Sun
Subjects: Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2405.14291
Pdf link: https://arxiv.org/pdf/2405.14291
Abstract Federated continual learning (FCL) has received increasing attention due to its potential in handling real-world streaming data, characterized by evolving data distributions and varying client classes over time. The constraints of storage limitations and privacy concerns confine local models to exclusively access the present data within each learning cycle. Consequently, this restriction induces performance degradation in model training on previous data, termed "catastrophic forgetting". However, existing FCL approaches need to identify or know changes in data distribution, which is difficult in the real world. To release these limitations, this paper directs attention to a broader continuous framework. Within this framework, we introduce Federated Bayesian Neural Network (FedBNN), a versatile and efficacious framework employing a variational Bayesian neural network across all clients. Our method continually integrates knowledge from local and historical data distributions into a single model, adeptly learning from new data distributions while retaining performance on historical distributions. We rigorously evaluate FedBNN's performance against prevalent methods in federated learning and continual learning using various metrics. Experimental analyses across diverse datasets demonstrate that FedBNN achieves state-of-the-art results in mitigating forgetting.
EdgeShard: Efficient LLM Inference via Collaborative Edge Computing
Authors: Mingjin Zhang, Jiannong Cao, Xiaoming Shen, Zeyang Cui
Subjects: Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2405.14371
Pdf link: https://arxiv.org/pdf/2405.14371
Abstract Large language models (LLMs) have shown great potential in natural language processing and content generation. However, current LLMs heavily rely on cloud computing, leading to prolonged latency, high bandwidth cost, and privacy concerns. Edge computing is promising to address such concerns by deploying LLMs on edge devices, closer to data sources. Some works try to leverage model quantization to reduce the model size to fit the resource-constraint edge devices, but they lead to accuracy loss. Other works use cloud-edge collaboration, suffering from unstable network connections. In this work, we leverage collaborative edge computing to facilitate the collaboration among edge devices and cloud servers for jointly performing efficient LLM inference. We propose a general framework to partition the LLM model into shards and deploy on distributed devices. To achieve efficient LLM inference, we formulate an adaptive joint device selection and model partition problem and design an efficient dynamic programming algorithm to optimize the inference latency and throughput, respectively. Experiments of Llama2 serial models on a heterogeneous physical prototype demonstrate that EdgeShard achieves up to 50% latency reduction and 2x throughput improvement over baseline methods.
Gradient Transformation: Towards Efficient and Model-Agnostic Unlearning for Dynamic Graph Neural Networks
Authors: He Zhang, Bang Wu, Xiangwen Yang, Xingliang Yuan, Chengqi Zhang, Shirui Pan
Subjects: Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.14407
Pdf link: https://arxiv.org/pdf/2405.14407
Abstract Graph unlearning has emerged as an essential tool for safeguarding user privacy and mitigating the negative impacts of undesirable data. Meanwhile, the advent of dynamic graph neural networks (DGNNs) marks a significant advancement due to their superior capability in learning from dynamic graphs, which encapsulate spatial-temporal variations in diverse real-world applications (e.g., traffic forecasting). With the increasing prevalence of DGNNs, it becomes imperative to investigate the implementation of dynamic graph unlearning. However, current graph unlearning methodologies are designed for GNNs operating on static graphs and exhibit limitations including their serving in a pre-processing manner and impractical resource demands. Furthermore, the adaptation of these methods to DGNNs presents non-trivial challenges, owing to the distinctive nature of dynamic graphs. To this end, we propose an effective, efficient, model-agnostic, and post-processing method to implement DGNN unlearning. Specifically, we first define the unlearning requests and formulate dynamic graph unlearning in the context of continuous-time dynamic graphs. After conducting a role analysis on the unlearning data, the remaining data, and the target DGNN model, we propose a method called Gradient Transformation and a loss function to map the unlearning request to the desired parameter update. Evaluations on six real-world datasets and state-of-the-art DGNN backbones demonstrate its effectiveness (e.g., limited performance drop even obvious improvement) and efficiency (e.g., at most 7.23$\times$ speed-up) outperformance, and potential advantages in handling future unlearning requests (e.g., at most 32.59$\times$ speed-up).
GeoFaaS: An Edge-to-Cloud FaaS Platform
Authors: Mohammadreza Malekabbasi, Tobias Pfandzelter, Trever Schirmer, David Bermbach
Subjects: Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2405.14413
Pdf link: https://arxiv.org/pdf/2405.14413
Abstract The massive growth of mobile and IoT devices demands geographically distributed computing systems for optimal performance, privacy, and scalability. However, existing edge-to-cloud serverless platforms lack location awareness, resulting in inefficient network usage and increased latency. In this paper, we propose GeoFaaS, a novel edge-to-cloud Function-as-a-Service (FaaS) platform that leverages real-time client location information for transparent request execution on the nearest available FaaS node. If needed, GeoFaaS transparently offloads requests to the cloud when edge resources are overloaded, thus, ensuring consistent execution without user intervention. GeoFaaS has a modular and decentralized architecture: building on the single-node FaaS system tinyFaaS, GeoFaaS works as a stand-alone edge-to-cloud FaaS platform but can also integrate and act as a routing layer for existing FaaS services, e.g., in the cloud. To evaluate our approach, we implemented an open-source proof-of-concept prototype and studied performance and fault-tolerance behavior in experiments.
Worldwide Federated Training of Language Models
Authors: Alex Iacob, Lorenzo Sani, Bill Marino, Preslav Aleksandrov, Nicholas Donald Lane
Subjects: Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2405.14446
Pdf link: https://arxiv.org/pdf/2405.14446
Abstract The reliance of language model training on massive amounts of computation and vast datasets scraped from potentially low-quality, copyrighted, or sensitive data has come into question practically, legally, and ethically. Federated learning provides a plausible alternative by enabling previously untapped data to be voluntarily gathered from collaborating organizations. However, when scaled globally, federated learning requires collaboration across heterogeneous legal, security, and privacy regimes while accounting for the inherent locality of language data; this further exacerbates the established challenge of federated statistical heterogeneity. We propose a Worldwide Federated Language Model Training~(WorldLM) system based on federations of federations, where each federation has the autonomy to account for factors such as its industry, operating jurisdiction, or competitive environment. WorldLM enables such autonomy in the presence of statistical heterogeneity via partial model localization by allowing sub-federations to attentively aggregate key layers from their constituents. Furthermore, it can adaptively share information across federations via residual layer embeddings. Evaluations of language modeling on naturally heterogeneous datasets show that WorldLM outperforms standard federations by up to $1.91\times$, approaches the personalized performance of fully local models, and maintains these advantages under privacy-enhancing techniques.
Tighter Privacy Auditing of DP-SGD in the Hidden State Threat Model
Authors: Tudor Cebere, Aurélien Bellet, Nicolas Papernot
Subjects: Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2405.14457
Pdf link: https://arxiv.org/pdf/2405.14457
Abstract Machine learning models can be trained with formal privacy guarantees via differentially private optimizers such as DP-SGD. In this work, we study such privacy guarantees when the adversary only accesses the final model, i.e., intermediate model updates are not released. In the existing literature, this hidden state threat model exhibits a significant gap between the lower bound provided by empirical privacy auditing and the theoretical upper bound provided by privacy accounting. To challenge this gap, we propose to audit this threat model with adversaries that craft a gradient sequence to maximize the privacy loss of the final model without accessing intermediate models. We demonstrate experimentally how this approach consistently outperforms prior attempts at auditing the hidden state model. When the crafted gradient is inserted at every optimization step, our results imply that releasing only the final model does not amplify privacy, providing a novel negative result. On the other hand, when the crafted gradient is not inserted at every step, we show strong evidence that a privacy amplification phenomenon emerges in the general non-convex setting (albeit weaker than in convex regimes), suggesting that existing privacy upper bounds can be improved.
A Comprehensive Overview of Large Language Models (LLMs) for Cyber Defences: Opportunities and Directions
Authors: Mohammed Hassanin, Nour Moustafa
Subjects: Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2405.14487
Pdf link: https://arxiv.org/pdf/2405.14487
Abstract The recent progression of Large Language Models (LLMs) has witnessed great success in the fields of data-centric applications. LLMs trained on massive textual datasets showed ability to encode not only context but also ability to provide powerful comprehension to downstream tasks. Interestingly, Generative Pre-trained Transformers utilised this ability to bring AI a step closer to human being replacement in at least datacentric applications. Such power can be leveraged to identify anomalies of cyber threats, enhance incident response, and automate routine security operations. We provide an overview for the recent activities of LLMs in cyber defence sections, as well as categorization for the cyber defence sections such as threat intelligence, vulnerability assessment, network security, privacy preserving, awareness and training, automation, and ethical guidelines. Fundamental concepts of the progression of LLMs from Transformers, Pre-trained Transformers, and GPT is presented. Next, the recent works of each section is surveyed with the related strengths and weaknesses. A special section about the challenges and directions of LLMs in cyber security is provided. Finally, possible future research directions for benefiting from LLMs in cyber security is discussed.
Identity Inference from CLIP Models using Only Textual Data
Authors: Songze Li, Ruoxi Cheng, Xiaojun Jia
Subjects: Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2405.14517
Pdf link: https://arxiv.org/pdf/2405.14517
Abstract The widespread usage of large-scale multimodal models like CLIP has heightened concerns about the leakage of personally identifiable information (PII). Existing methods for identity inference in CLIP models, i.e., to detect the presence of a person's PII used for training a CLIP model, require querying the model with full PII, including textual descriptions of the person and corresponding images (e.g., the name and the face photo of the person). However, this may lead to potential privacy breach of the image, as it may have not been seen by the target model yet. Additionally, traditional membership inference attacks (MIAs) train shadow models to mimic the behaviors of the target model, which incurs high computational costs, especially for large CLIP models. To address these challenges, we propose a textual unimodal detector (TUNI) in CLIP models, a novel method for ID inference that 1) queries the target model with only text data; and 2) does not require training shadow models. Firstly, we develop a feature extraction algorithm, guided by the CLIP model, to extract features from a text description. TUNI starts with randomly generating textual gibberish that were clearly not utilized for training, and leverages their feature vectors to train a system of anomaly detectors. During inference, the feature vector of each test text is fed into the anomaly detectors to determine if the person's PII is in the training set (abnormal) or not (normal). Moreover, TUNI can be further strengthened integrating real images associated with the tested individuals, if available at the detector. Extensive experiments of TUNI across various CLIP model architectures and datasets demonstrate its superior performance over baselines, albeit with only text data.
Towards Privacy-Aware and Personalised Assistive Robots: A User-Centred Approach
Authors: Fernando E. Casado
Subjects: Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.14528
Pdf link: https://arxiv.org/pdf/2405.14528
Abstract The global increase in the elderly population necessitates innovative long-term care solutions to improve the quality of life for vulnerable individuals while reducing caregiver burdens. Assistive robots, leveraging advancements in Machine Learning, offer promising personalised support. However, their integration into daily life raises significant privacy concerns. Widely used frameworks like the Robot Operating System (ROS) historically lack inherent privacy mechanisms, complicating data-driven approaches in robotics. This research pioneers user-centric, privacy-aware technologies such as Federated Learning (FL) to advance assistive robotics. FL enables collaborative learning without sharing sensitive data, addressing privacy and scalability issues. This work includes developing solutions for smart wheelchair assistance, enhancing user independence and well-being. By tackling challenges related to non-stationary data and heterogeneous environments, the research aims to improve personalisation and user experience. Ultimately, it seeks to lead the responsible integration of assistive robots into society, enhancing the quality of life for elderly and care-dependent individuals.
PrivCirNet: Efficient Private Inference via Block Circulant Transformation
Authors: Tianshi Xu, Lemeng Wu, Runsheng Wang, Meng Li
Subjects: Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2405.14569
Pdf link: https://arxiv.org/pdf/2405.14569
Abstract Homomorphic encryption (HE)-based deep neural network (DNN) inference protects data and model privacy but suffers from significant computation overhead. We observe transforming the DNN weights into circulant matrices converts general matrix-vector multiplications into HE-friendly 1-dimensional convolutions, drastically reducing the HE computation cost. Hence, in this paper, we propose \method, a protocol/network co-optimization framework based on block circulant transformation. At the protocol level, PrivCirNet customizes the HE encoding algorithm that is fully compatible with the block circulant transformation and reduces the computation latency in proportion to the block size. At the network level, we propose a latency-aware formulation to search for the layer-wise block size assignment based on second-order information. PrivCirNet also leverages layer fusion to further reduce the inference cost. We compare PrivCirNet with the state-of-the-art HE-based framework Bolt (IEEE S\&P 2024) and the HE-friendly pruning method SpENCNN (ICML 2023). For ResNet-18 and Vision Transformer (ViT) on Tiny ImageNet, PrivCirNet reduces latency by $5.0\times$ and $1.3\times$ with iso-accuracy over Bolt, respectively, and improves accuracy by $4.1\%$ and $12\%$ over SpENCNN, respectively. For MobileNetV2 on ImageNet, PrivCirNet achieves $1.7\times$ lower latency and $4.2\%$ better accuracy over Bolt and SpENCNN, respectively. Our code and checkpoints are available in the supplementary materials.
A Systematic and Formal Study of the Impact of Local Differential Privacy on Fairness: Preliminary Results
Authors: Karima Makhlouf, Tamara Stefanovic, Heber H. Arcolezi, Catuscia Palamidessi
Subjects: Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2405.14725
Pdf link: https://arxiv.org/pdf/2405.14725
Abstract Machine learning (ML) algorithms rely primarily on the availability of training data, and, depending on the domain, these data may include sensitive information about the data providers, thus leading to significant privacy issues. Differential privacy (DP) is the predominant solution for privacy-preserving ML, and the local model of DP is the preferred choice when the server or the data collector are not trusted. Recent experimental studies have shown that local DP can impact ML prediction for different subgroups of individuals, thus affecting fair decision-making. However, the results are conflicting in the sense that some studies show a positive impact of privacy on fairness while others show a negative one. In this work, we conduct a systematic and formal study of the effect of local DP on fairness. Specifically, we perform a quantitative study of how the fairness of the decisions made by the ML model changes under local DP for different levels of privacy and data distributions. In particular, we provide bounds in terms of the joint distributions and the privacy level, delimiting the extent to which local DP can impact the fairness of the model. We characterize the cases in which privacy reduces discrimination and those with the opposite effect. We validate our theoretical findings on synthetic and real-world datasets. Our results are preliminary in the sense that, for now, we study only the case of one sensitive attribute, and only statistical disparity, conditional statistical disparity, and equal opportunity difference.
Recurrent Early Exits for Federated Learning with Heterogeneous Clients
Authors: Royson Lee, Javier Fernandez-Marques, Shell Xu Hu, Da Li, Stefanos Laskaridis, Łukasz Dudziak, Timothy Hospedales, Ferenc Huszár, Nicholas D. Lane
Subjects: Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2405.14791
Pdf link: https://arxiv.org/pdf/2405.14791
Abstract Federated learning (FL) has enabled distributed learning of a model across multiple clients in a privacy-preserving manner. One of the main challenges of FL is to accommodate clients with varying hardware capacities; clients have differing compute and memory requirements. To tackle this challenge, recent state-of-the-art approaches leverage the use of early exits. Nonetheless, these approaches fall short of mitigating the challenges of joint learning multiple exit classifiers, often relying on hand-picked heuristic solutions for knowledge distillation among classifiers and/or utilizing additional layers for weaker classifiers. In this work, instead of utilizing multiple classifiers, we propose a recurrent early exit approach named ReeFL that fuses features from different sub-models into a single shared classifier. Specifically, we use a transformer-based early-exit module shared among sub-models to i) better exploit multi-layer feature representations for task-specific prediction and ii) modulate the feature representation of the backbone model for subsequent predictions. We additionally present a per-client self-distillation approach where the best sub-model is automatically selected as the teacher of the other sub-models at each client. Our experiments on standard image and speech classification benchmarks across various emerging federated fine-tuning baselines demonstrate ReeFL's effectiveness over previous works.
Membership Inference on Text-to-Image Diffusion Models via Conditional Likelihood Discrepancy
Authors: Shengfang Zhai, Huanran Chen, Yinpeng Dong, Jiajun Li, Qingni Shen, Yansong Gao, Hang Su, Yang Liu
Subjects: Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2405.14800
Pdf link: https://arxiv.org/pdf/2405.14800
Abstract Text-to-image diffusion models have achieved tremendous success in the field of controllable image generation, while also coming along with issues of privacy leakage and data copyrights. Membership inference arises in these contexts as a potential auditing method for detecting unauthorized data usage. While some efforts have been made on diffusion models, they are not applicable to text-to-image diffusion models due to the high computation overhead and enhanced generalization capabilities. In this paper, we first identify a conditional overfitting phenomenon in text-to-image diffusion models, indicating that these models tend to overfit the conditional distribution of images given the text rather than the marginal distribution of images. Based on this observation, we derive an analytical indicator, namely Conditional Likelihood Discrepancy (CLiD), to perform membership inference. This indicator reduces the stochasticity in estimating the memorization of individual samples. Experimental results demonstrate that our method significantly outperforms previous methods across various data distributions and scales. Additionally, our method shows superior resistance to overfitting mitigation strategies such as early stopping and data augmentation.
Keyword: machine learning

A Systematic Analysis on the Temporal Generalization of Language Models in Social Media
Authors: Asahi Ushio, Jose Camacho-Collados
Subjects: Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.13017
Pdf link: https://arxiv.org/pdf/2405.13017
Abstract In machine learning, temporal shifts occur when there are differences between training and test splits in terms of time. For streaming data such as news or social media, models are commonly trained on a fixed corpus from a certain period of time, and they can become obsolete due to the dynamism and evolving nature of online content. This paper focuses on temporal shifts in social media and, in particular, Twitter. We propose a unified evaluation scheme to assess the performance of language models (LMs) under temporal shift on standard social media tasks. LMs are tested on five diverse social media NLP tasks under different temporal settings, which revealed two important findings: (i) the decrease in performance under temporal shift is consistent across different models for entity-focused tasks such as named entity recognition or disambiguation, and hate speech detection, but not significant in the other tasks analysed (i.e., topic and sentiment classification); and (ii) continuous pre-training on the test period does not improve the temporal adaptability of LMs.
A Machine Learning Approach for Predicting Upper Limb Motion Intentions with Multimodal Data in Virtual Reality
Authors: Pavan Uttej Ravva, Pinar Kullu, Mohammad Fahim Abrar, Roghayeh Leila Barmaki
Subjects: Subjects: Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2405.13023
Pdf link: https://arxiv.org/pdf/2405.13023
Abstract Over the last decade, there has been significant progress in the field of interactive virtual rehabilitation. Physical therapy (PT) stands as a highly effective approach for enhancing physical impairments. However, patient motivation and progress tracking in rehabilitation outcomes remain a challenge. This work addresses the gap through a machine learning-based approach to objectively measure outcomes of the upper limb virtual therapy system in a user study with non-clinical participants. In this study, we use virtual reality to perform several tracing tasks while collecting motion and movement data using a KinArm robot and a custom-made wearable sleeve sensor. We introduce a two-step machine learning architecture to predict the motion intention of participants. The first step predicts reaching task segments to which the participant-marked points belonged using gaze, while the second step employs a Long Short-Term Memory (LSTM) model to predict directional movements based on resistance change values from the wearable sensor and the KinArm. We specifically propose to transpose our raw resistance data to the time-domain which significantly improves the accuracy of the models by 34.6%. To evaluate the effectiveness of our model, we compared different classification techniques with various data configurations. The results show that our proposed computational method is exceptional at predicting participant's actions with accuracy values of 96.72% for diamond reaching task, and 97.44% for circle reaching task, which demonstrates the great promise of using multimodal data, including eye-tracking and resistance change, to objectively measure the performance and intention in virtual rehabilitation settings.
Intelligent Tutor: Leveraging ChatGPT and Microsoft Copilot Studio to Deliver a Generative AI Student Support and Feedback System within Teams
Authors: Wei-Yu Chen
Subjects: Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2405.13024
Pdf link: https://arxiv.org/pdf/2405.13024
Abstract This study explores the integration of the ChatGPT API with GPT-4 model and Microsoft Copilot Studio on the Microsoft Teams platform to develop an intelligent tutoring system. Designed to provide instant support to students, the system dynamically adjusts educational content in response to the learners' progress and feedback. Utilizing advancements in natural language processing and machine learning, it interprets student inquiries, offers tailored feedback, and facilitates the educational journey. Initial implementation highlights the system's potential in boosting students' motivation and engagement, while equipping educators with critical insights into the learning process, thus promoting tailored educational experiences and enhancing instructional effectiveness.
Assessing Political Bias in Large Language Models
Authors: Luca Rettenberger, Markus Reischl, Mark Schutera
Subjects: Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2405.13041
Pdf link: https://arxiv.org/pdf/2405.13041
Abstract The assessment of societal biases within Large Language Models (LLMs) has emerged as a critical concern in the contemporary discourse surrounding Artificial Intelligence (AI) ethics and their impact. Especially, recognizing and considering political biases is important for practical applications to gain a deeper understanding of the possibilities and behaviors and to prevent unwanted statements. As the upcoming elections of the European Parliament will not remain unaffected by LLMs, we evaluate the bias of the current most popular open-source models concerning political issues within the European Union (EU) from a German perspective. To do so, we use the "Wahl-O-Mat", a voting advice application used in Germany, to determine which political party is the most aligned for the respective LLM. We show that larger models, such as Llama3-70B, tend to align more closely with left-leaning political parties like GRÜNE and Volt, while smaller models often remain neutral, particularly in English. This highlights the nuanced behavior of LLMs and the importance of language in shaping their political stances. Our findings underscore the importance of rigorously assessing and addressing societal bias in LLMs to safeguard the integrity and fairness of applications that employ the power of modern machine learning methods.
Towards Contactless Elevators with TinyML using CNN-based Person Detection and Keyword Spotting
Authors: Anway S. Pimpalkar, Deeplaxmi V. Niture
Subjects: Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2405.13051
Pdf link: https://arxiv.org/pdf/2405.13051
Abstract This study presents a proof of concept for a contactless elevator operation system aimed at minimizing human intervention while enhancing safety, intelligence, and efficiency. A microcontroller-based edge device executing tiny Machine Learning (tinyML) inferences is developed for elevator operation. Using person detection and keyword spotting algorithms, the system offers cost-effective and robust units requiring minimal infrastructural changes. The design incorporates preprocessing steps and quantized convolutional neural networks in a multitenant framework to optimize accuracy and response time. Results show a person detection accuracy of 83.34% and keyword spotting efficacy of 80.5%, with an overall latency under 5 seconds, indicating effectiveness in real-world scenarios. Unlike current high-cost and inconsistent contactless technologies, this system leverages tinyML to provide a cost-effective, reliable, and scalable solution, enhancing user safety and operational efficiency without significant infrastructural changes. The study highlights promising results, though further exploration is needed for scalability and integration with existing systems. The demonstrated energy efficiency, simplicity, and safety benefits suggest that tinyML adoption could revolutionize elevator systems, serving as a model for future technological advancements. This technology could significantly impact public health and convenience in multi-floor buildings by reducing physical contact and improving operational efficiency, particularly relevant in the context of pandemics or hygiene concerns.
A Survey of Artificial Intelligence in Gait-Based Neurodegenerative Disease Diagnosis
Authors: Haocong Rao, Minlin Zeng, Xuejiao Zhao, Chunyan Miao
Subjects: Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2405.13082
Pdf link: https://arxiv.org/pdf/2405.13082
Abstract Recent years have witnessed an increasing global population affected by neurodegenerative diseases (NDs), which traditionally require extensive healthcare resources and human effort for medical diagnosis and monitoring. As a crucial disease-related motor symptom, human gait can be exploited to characterize different NDs. The current advances in artificial intelligence (AI) models enable automatic gait analysis for NDs identification and classification, opening a new avenue to facilitate faster and more cost-effective diagnosis of NDs. In this paper, we provide a comprehensive survey on recent progress of machine learning and deep learning based AI techniques applied to diagnosis of five typical NDs through gait. We provide an overview of the process of AI-assisted NDs diagnosis, and present a systematic taxonomy of existing gait data and AI models. Through an extensive review and analysis of 164 studies, we identify and discuss the challenges, potential solutions, and future directions in this field. Finally, we envision the prospective utilization of 3D skeleton data for human gait representation and the development of more efficient AI models for NDs diagnosis. We provide a public resource repository to track and facilitate developments in this emerging field: this https URL.
Combining Relevance and Magnitude for Resource-Aware DNN Pruning
Authors: Carla Fabiana Chiasserini, Francesco Malandrino, Nuria Molner, Zhiqiang Zhao
Subjects: Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2405.13088
Pdf link: https://arxiv.org/pdf/2405.13088
Abstract Pruning neural networks, i.e., removing some of their parameters whilst retaining their accuracy, is one of the main ways to reduce the latency of a machine learning pipeline, especially in resource- and/or bandwidth-constrained scenarios. In this context, the pruning technique, i.e., how to choose the parameters to remove, is critical to the system performance. In this paper, we propose a novel pruning approach, called FlexRel and predicated upon combining training-time and inference-time information, namely, parameter magnitude and relevance, in order to improve the resulting accuracy whilst saving both computational resources and bandwidth. Our performance evaluation shows that FlexRel is able to achieve higher pruning factors, saving over 35% bandwidth for typical accuracy targets.
The Role of Emotions in Informational Support Question-Response Pairs in Online Health Communities: A Multimodal Deep Learning Approach
Authors: Mohsen Jozani, Jason A. Williams, Ahmed Aleroud, Sarbottam Bhagat
Subjects: Subjects: Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2405.13099
Pdf link: https://arxiv.org/pdf/2405.13099
Abstract This study explores the relationship between informational support seeking questions, responses, and helpfulness ratings in online health communities. We created a labeled data set of question-response pairs and developed multimodal machine learning and deep learning models to reliably predict informational support questions and responses. We employed explainable AI to reveal the emotions embedded in informational support exchanges, demonstrating the importance of emotion in providing informational support. This complex interplay between emotional and informational support has not been previously researched. The study refines social support theory and lays the groundwork for the development of user decision aids. Further implications are discussed.
A novel reliability attack of Physical Unclonable Functions
Authors: Gaoxiang Li, Yu Zhuang
Subjects: Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.13147
Pdf link: https://arxiv.org/pdf/2405.13147
Abstract Physical Unclonable Functions (PUFs) are emerging as promising security primitives for IoT devices, providing device fingerprints based on physical characteristics. Despite their strengths, PUFs are vulnerable to machine learning (ML) attacks, including conventional and reliability-based attacks. Conventional ML attacks have been effective in revealing vulnerabilities of many PUFs, and reliability-based ML attacks are more powerful tools that have detected vulnerabilities of some PUFs that are resistant to conventional ML attacks. Since reliability-based ML attacks leverage information of PUFs' unreliability, we were tempted to examine the feasibility of building defense using reliability enhancing techniques, and have discovered that majority voting with reasonably high repeats provides effective defense against existing reliability-based ML attack methods. It is known that majority voting reduces but does not eliminate unreliability, we are motivated to investigate if new attack methods exist that can capture the low unreliability of highly but not-perfectly reliable PUFs, which led to the development of a new reliability representation and the new representation-enabled attack method that has experimentally cracked PUFs enhanced with majority voting of high repetitions.
Automated categorization of pre-trained models for software engineering: A case study with a Hugging Face dataset
Authors: Claudio Di Sipio, Riccardo Rubei, Juri Di Rocco, Davide Di Ruscio, Phuong T. Nguyen
Subjects: Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2405.13185
Pdf link: https://arxiv.org/pdf/2405.13185
Abstract Software engineering (SE) activities have been revolutionized by the advent of pre-trained models (PTMs), defined as large machine learning (ML) models that can be fine-tuned to perform specific SE tasks. However, users with limited expertise may need help to select the appropriate model for their current task. To tackle the issue, the Hugging Face (HF) platform simplifies the use of PTMs by collecting, storing, and curating several models. Nevertheless, the platform currently lacks a comprehensive categorization of PTMs designed specifically for SE, i.e., the existing tags are more suited to generic ML categories. This paper introduces an approach to address this gap by enabling the automatic classification of PTMs for SE tasks. First, we utilize a public dump of HF to extract PTMs information, including model documentation and associated tags. Then, we employ a semi-automated method to identify SE tasks and their corresponding PTMs from existing literature. The approach involves creating an initial mapping between HF tags and specific SE tasks, using a similarity-based strategy to identify PTMs with relevant tags. The evaluation shows that model cards are informative enough to classify PTMs considering the pipeline tag. Moreover, we provide a mapping between SE tasks and stored PTMs by relying on model names.
A machine learning framework for interpretable predictions in patient pathways: The case of predicting ICU admission for patients with symptoms of sepsis
Authors: Sandra Zilker, Sven Weinzierl, Mathias Kraus, Patrick Zschech, Martin Matzner
Subjects: Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.13187
Pdf link: https://arxiv.org/pdf/2405.13187
Abstract Proactive analysis of patient pathways helps healthcare providers anticipate treatment-related risks, identify outcomes, and allocate resources. Machine learning (ML) can leverage a patient's complete health history to make informed decisions about future events. However, previous work has mostly relied on so-called black-box models, which are unintelligible to humans, making it difficult for clinicians to apply such models. Our work introduces PatWay-Net, an ML framework designed for interpretable predictions of admission to the intensive care unit (ICU) for patients with symptoms of sepsis. We propose a novel type of recurrent neural network and combine it with multi-layer perceptrons to process the patient pathways and produce predictive yet interpretable results. We demonstrate its utility through a comprehensive dashboard that visualizes patient health trajectories, predictive outcomes, and associated risks. Our evaluation includes both predictive performance - where PatWay-Net outperforms standard models such as decision trees, random forests, and gradient-boosted decision trees - and clinical utility, validated through structured interviews with clinicians. By providing improved predictive accuracy along with interpretable and actionable insights, PatWay-Net serves as a valuable tool for healthcare decision support in the critical case of patients with symptoms of sepsis.
Pragmatic auditing: a pilot-driven approach for auditing Machine Learning systems
Authors: Djalel Benbouzid, Christiane Plociennik, Laura Lucaj, Mihai Maftei, Iris Merget, Aljoscha Burchardt, Marc P. Hauer, Abdeldjallil Naceri, Patrick van der Smagt
Subjects: Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.13191
Pdf link: https://arxiv.org/pdf/2405.13191
Abstract The growing adoption and deployment of Machine Learning (ML) systems came with its share of ethical incidents and societal concerns. It also unveiled the necessity to properly audit these systems in light of ethical principles. For such a novel type of algorithmic auditing to become standard practice, two main prerequisites need to be available: A lifecycle model that is tailored towards transparency and accountability, and a principled risk assessment procedure that allows the proper scoping of the audit. Aiming to make a pragmatic step towards a wider adoption of ML auditing, we present a respective procedure that extends the AI-HLEG guidelines published by the European Commission. Our audit procedure is based on an ML lifecycle model that explicitly focuses on documentation, accountability, and quality assurance; and serves as a common ground for alignment between the auditors and the audited organisation. We describe two pilots conducted on real-world use cases from two different organisations and discuss the shortcomings of ML algorithmic auditing as well as future directions thereof.
BenchNav: Simulation Platform for Benchmarking Off-road Navigation Algorithms with Probabilistic Traversability
Authors: Masafumi Endo, Kohei Honda, Genya Ishigami
Subjects: Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2405.13318
Pdf link: https://arxiv.org/pdf/2405.13318
Abstract As robotic navigation techniques in perception and planning advance, mobile robots increasingly venture into off-road environments involving complex traversability. However, selecting suitable planning methods remains a challenge due to their algorithmic diversity, as each offers unique benefits. To aid in algorithm design, we introduce BenchNav, an open-source PyTorch-based simulation platform for benchmarking off-road navigation with uncertain traversability. Built upon Gymnasium, BenchNav provides three key features: 1) a data generation pipeline for preparing synthetic natural environments, 2) built-in machine learning models for traversability prediction, and 3) consistent execution of path and motion planning across different algorithms. We show BenchNav's versatility through simulation examples in off-road environments, employing three representative planning algorithms from different domains. this https URL
FedCache 2.0: Exploiting the Potential of Distilled Data in Knowledge Cache-driven Federated Learning
Authors: Quyang Pan, Sheng Sun, Zhiyuan Wu, Yuwei Wang, Min Liu, Bo Gao
Subjects: Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.13378
Pdf link: https://arxiv.org/pdf/2405.13378
Abstract Federated Edge Learning (FEL) has emerged as a promising approach for enabling edge devices to collaboratively train machine learning models while preserving data privacy. Despite its advantages, practical FEL deployment faces significant challenges related to device constraints and device-server interactions, necessitating heterogeneous, user-adaptive model training with limited and uncertain communication. In this paper, we introduce FedCache 2.0, a novel personalized FEL architecture that simultaneously addresses these challenges. FedCache 2.0 incorporates the benefits of both dataset distillation and knowledge cache-driven federated learning by storing and organizing distilled data as knowledge in the server-side knowledge cache. Moreover, a device-centric cache sampling strategy is introduced to tailor transferred knowledge for individual devices within controlled communication bandwidth. Extensive experiments on five datasets covering image recognition, audio understanding, and mobile sensor data mining tasks demonstrate that (1) FedCache 2.0 significantly outperforms state-of-the-art methods regardless of model structures, data distributions, and modalities. (2) FedCache 2.0 can train splendid personalized on-device models with at least $\times$28.6 improvement in communication efficiency.
NFCL: Simply interpretable neural networks for a short-term multivariate forecasting
Authors: Wonkeun Jo, Dongil Kim
Subjects: Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2405.13393
Pdf link: https://arxiv.org/pdf/2405.13393
Abstract Multivariate time-series forecasting (MTSF) stands as a compelling field within the machine learning community. Diverse neural network based methodologies deployed in MTSF applications have demonstrated commendable efficacy. Despite the advancements in model performance, comprehending the rationale behind the model's behavior remains an enigma. Our proposed model, the Neural ForeCasting Layer (NFCL), employs a straightforward amalgamation of neural networks. This uncomplicated integration ensures that each neural network contributes inputs and predictions independently, devoid of interference from other inputs. Consequently, our model facilitates a transparent explication of forecast results. This paper introduces NFCL along with its diverse extensions. Empirical findings underscore NFCL's superior performance compared to nine benchmark models across 15 available open datasets. Notably, NFCL not only surpasses competitors but also provides elucidation for its predictions. In addition, Rigorous experimentation involving diverse model structures bolsters the justification of NFCL's unique configuration.
Why do explanations fail? A typology and discussion on failures in XAI
Authors: Clara Bove, Thibault Laugel, Marie-Jeanne Lesot, Charles Tijus, Marcin Detyniecki
Subjects: Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2405.13474
Pdf link: https://arxiv.org/pdf/2405.13474
Abstract As Machine Learning (ML) models achieve unprecedented levels of performance, the XAI domain aims at making these models understandable by presenting end-users with intelligible explanations. Yet, some existing XAI approaches fail to meet expectations: several issues have been reported in the literature, generally pointing out either technical limitations or misinterpretations by users. In this paper, we argue that the resulting harms arise from a complex overlap of multiple failures in XAI, which existing ad-hoc studies fail to capture. This work therefore advocates for a holistic perspective, presenting a systematic investigation of limitations of current XAI methods and their impact on the interpretation of explanations. By distinguishing between system-specific and user-specific failures, we propose a typological framework that helps revealing the nuanced complexities of explanation failures. Leveraging this typology, we also discuss some research directions to help AI practitioners better understand the limitations of XAI systems and enhance the quality of ML explanations.
A Near-Real-Time Processing Ego Speech Filtering Pipeline Designed for Speech Interruption During Human-Robot Interaction
Authors: Yue Li, Florian A. Kunneman, Koen V. Hindriks
Subjects: Subjects: Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2405.13477
Pdf link: https://arxiv.org/pdf/2405.13477
Abstract With current state-of-the-art automatic speech recognition (ASR) systems, it is not possible to transcribe overlapping speech audio streams separately. Consequently, when these ASR systems are used as part of a social robot like Pepper for interaction with a human, it is common practice to close the robot's microphone while it is talking itself. This prevents the human users to interrupt the robot, which limits speech-based human-robot interaction. To enable a more natural interaction which allows for such interruptions, we propose an audio processing pipeline for filtering out robot's ego speech using only a single-channel microphone. This pipeline takes advantage of the possibility to feed the robot ego speech signal, generated by a text-to-speech API, as training data into a machine learning model. The proposed pipeline combines a convolutional neural network and spectral subtraction to extract overlapping human speech from the audio recorded by the robot-embedded microphone. When evaluating on a held-out test set, we find that this pipeline outperforms our previous approach to this task, as well as state-of-the-art target speech extraction systems that were retrained on the same dataset. We have also integrated the proposed pipeline into a lightweight robot software development framework to make it available for broader use. As a step towards demonstrating the feasibility of deploying our pipeline, we use this framework to evaluate the effectiveness of the pipeline in a small lab-based feasibility pilot using the social robot Pepper. Our results show that when participants interrupt the robot, the pipeline can extract the participant's speech from one-second streaming audio buffers received by the robot-embedded single-channel microphone, hence in near-real time.
Bond Graphs for multi-physics informed Neural Networks for multi-variate time series
Authors: Alexis-Raja Brachet, Pierre-Yves Richard, Céline Hudelot
Subjects: Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2405.13586
Pdf link: https://arxiv.org/pdf/2405.13586
Abstract In the trend of hybrid Artificial Intelligence (AI) techniques, Physic Informed Machine Learning has seen a growing interest. It operates mainly by imposing a data, learning or inductive bias with simulation data, Partial Differential Equations or equivariance and invariance properties. While these models have shown great success on tasks involving one physical domain such as fluid dynamics, existing methods still struggle on tasks with complex multi-physical and multi-domain phenomena. To address this challenge, we propose to leverage Bond Graphs, a multi-physics modeling approach together with Graph Neural Network. We thus propose Neural Bond Graph Encoder (NBgE), a model agnostic physical-informed encoder tailored for multi-physics systems. It provides an unified framework for any multi-physics informed AI with a graph encoder readable for any deep learning model. Our experiments on two challenging multi-domain physical systems - a Direct Current Motor and the Respiratory system - demonstrate the effectiveness of our approach on a multi-variate time series forecasting task.
Almost sure convergence rates of stochastic gradient methods under gradient domination
Authors: Simon Weissmann, Sara Klein, Waïss Azizian, Leif Döring
Subjects: Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2405.13592
Pdf link: https://arxiv.org/pdf/2405.13592
Abstract Stochastic gradient methods are among the most important algorithms in training machine learning problems. While classical assumptions such as strong convexity allow a simple analysis they are rarely satisfied in applications. In recent years, global and local gradient domination properties have shown to be a more realistic replacement of strong convexity. They were proved to hold in diverse settings such as (simple) policy gradient methods in reinforcement learning and training of deep neural networks with analytic activation functions. We prove almost sure convergence rates $f(X_n)-f^*\in o\big( n^{-\frac{1}{4\beta-1}+\epsilon}\big)$ of the last iterate for stochastic gradient descent (with and without momentum) under global and local $\beta$-gradient domination assumptions. The almost sure rates get arbitrarily close to recent rates in expectation. Finally, we demonstrate how to apply our results to the training task in both supervised and reinforcement learning.
Timbre Perception, Representation, and its Neuroscientific Exploration: A Comprehensive Review
Authors: Hong Zhang, Jie Lin, Shengxuan Chen
Subjects: Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2405.13661
Pdf link: https://arxiv.org/pdf/2405.13661
Abstract Timbre, the sound's unique "color", is fundamental to how we perceive and appreciate music. This review explores the multifaceted world of timbre perception and representation. It begins by tracing the word's origin, offering an intuitive grasp of the concept. Building upon this foundation, the article delves into the complexities of defining and measuring timbre. It then explores the concept and techniques of timbre space, a powerful tool for visualizing how we perceive different timbres. The review further examines recent advancements in timbre manipulation and representation, including the increasingly utilized machine learning techniques. While the underlying neural mechanisms remain partially understood, the article discusses current neuroimaging techniques used to investigate this aspect of perception. Finally, it summarizes key takeaways, identifies promising future research directions, and emphasizes the potential applications of timbre research in music technology, assistive technologies, and our overall understanding of auditory perception.
Naturally Private Recommendations with Determinantal Point Processes
Authors: Jack Fitzsimons, Agustín Freitas Pasqualini, Robert Pisarczyk, Dmitrii Usynin
Subjects: Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2405.13677
Pdf link: https://arxiv.org/pdf/2405.13677
Abstract Often we consider machine learning models or statistical analysis methods which we endeavour to alter, by introducing a randomized mechanism, to make the model conform to a differential privacy constraint. However, certain models can often be implicitly differentially private or require significantly fewer alterations. In this work, we discuss Determinantal Point Processes (DPPs) which are dispersion models that balance recommendations based on both the popularity and the diversity of the content. We introduce DPPs, derive and discuss the alternations required for them to satisfy epsilon-Differential Privacy and provide an analysis of their sensitivity. We conclude by proposing simple alternatives to DPPs which would make them more efficient with respect to their privacy-utility trade-off.
Challenging Gradient Boosted Decision Trees with Tabular Transformers for Fraud Detection at Booking.com
Authors: Sergei Krutikov (1), Bulat Khaertdinov (2), Rodion Kiriukhin (1), Shubham Agrawal (1), Kees Jan De Vries (1) ((1) this http URL, (2) Maastricht University)
Subjects: Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.13692
Pdf link: https://arxiv.org/pdf/2405.13692
Abstract Transformer-based neural networks, empowered by Self-Supervised Learning (SSL), have demonstrated unprecedented performance across various domains. However, related literature suggests that tabular Transformers may struggle to outperform classical Machine Learning algorithms, such as Gradient Boosted Decision Trees (GBDT). In this paper, we aim to challenge GBDTs with tabular Transformers on a typical task faced in e-commerce, namely fraud detection. Our study is additionally motivated by the problem of selection bias, often occurring in real-life fraud detection systems. It is caused by the production system affecting which subset of traffic becomes labeled. This issue is typically addressed by sampling randomly a small part of the whole production data, referred to as a Control Group. This subset follows a target distribution of production data and therefore is usually preferred for training classification models with standard ML algorithms. Our methodology leverages the capabilities of Transformers to learn transferable representations using all available data by means of SSL, giving it an advantage over classical methods. Furthermore, we conduct large-scale experiments, pre-training tabular Transformers on vast amounts of data instances and fine-tuning them on smaller target datasets. The proposed approach outperforms heavily tuned GBDTs by a considerable margin of the Average Precision (AP) score. Pre-trained models show more consistent performance than the ones trained from scratch when fine-tuning data is limited. Moreover, they require noticeably less labeled data for reaching performance comparable to their GBDT competitor that utilizes the whole dataset.
Mining Action Rules for Defect Reduction Planning
Authors: Khouloud Oueslati, Gabriel Laberge, Maxime Lamothe, Foutse Khomh
Subjects: Subjects: Software Engineering (cs.SE); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.13740
Pdf link: https://arxiv.org/pdf/2405.13740
Abstract Defect reduction planning plays a vital role in enhancing software quality and minimizing software maintenance costs. By training a black box machine learning model and "explaining" its predictions, explainable AI for software engineering aims to identify the code characteristics that impact maintenance risks. However, post-hoc explanations do not always faithfully reflect what the original model computes. In this paper, we introduce CounterACT, a Counterfactual ACTion rule mining approach that can generate defect reduction plans without black-box models. By leveraging action rules, CounterACT provides a course of action that can be considered as a counterfactual explanation for the class (e.g., buggy or not buggy) assigned to a piece of code. We compare the effectiveness of CounterACT with the original action rule mining algorithm and six established defect reduction approaches on 9 software projects. Our evaluation is based on (a) overlap scores between proposed code changes and actual developer modifications; (b) improvement scores in future releases; and (c) the precision, recall, and F1-score of the plans. Our results show that, compared to competing approaches, CounterACT's explainable plans achieve higher overlap scores at the release level (median 95%) and commit level (median 85.97%), and they offer better trade-off between precision and recall (median F1-score 88.12%). Finally, we venture beyond planning and explore leveraging Large Language models (LLM) for generating code edits from our generated plans. Our results show that suggested LLM code edits supported by our plans are actionable and are more likely to pass relevant test cases than vanilla LLM code recommendations.
A Dynamic Model of Performative Human-ML Collaboration: Theory and Empirical Evidence
Authors: Tom Sühr, Samira Samadi, Chiara Farronato
Subjects: Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); General Economics (econ.GN)
Arxiv link: https://arxiv.org/abs/2405.13753
Pdf link: https://arxiv.org/pdf/2405.13753
Abstract Machine learning (ML) models are increasingly used in various applications, from recommendation systems in e-commerce to diagnosis prediction in healthcare. In this paper, we present a novel dynamic framework for thinking about the deployment of ML models in a performative, human-ML collaborative system. In our framework, the introduction of ML recommendations changes the data generating process of human decisions, which are only a proxy to the ground truth and which are then used to train future versions of the model. We show that this dynamic process in principle can converge to different stable points, i.e. where the ML model and the Human+ML system have the same performance. Some of these stable points are suboptimal with respect to the actual ground truth. We conduct an empirical user study with 1,408 participants to showcase this process. In the study, humans solve instances of the knapsack problem with the help of machine learning predictions. This is an ideal setting because we can see how ML models learn to imitate human decisions and how this learning process converges to a stable point. We find that for many levels of ML performance, humans can improve the ML predictions to dynamically reach an equilibrium performance that is around 92% of the maximum knapsack value. We also find that the equilibrium performance could be even higher if humans rationally followed the ML recommendations. Finally, we test whether monetary incentives can increase the quality of human decisions, but we fail to find any positive effect. Our results have practical implications for the deployment of ML models in contexts where human decisions may deviate from the indisputable ground truth.
Counterfactual Gradients-based Quantification of Prediction Trust in Neural Networks
Authors: Mohit Prabhushankar, Ghassan AlRegib
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.13758
Pdf link: https://arxiv.org/pdf/2405.13758
Abstract The widespread adoption of deep neural networks in machine learning calls for an objective quantification of esoteric trust. In this paper we propose GradTrust, a classification trust measure for large-scale neural networks at inference. The proposed method utilizes variance of counterfactual gradients, i.e. the required changes in the network parameters if the label were different. We show that GradTrust is superior to existing techniques for detecting misprediction rates on $50000$ images from ImageNet validation dataset. Depending on the network, GradTrust detects images where either the ground truth is incorrect or ambiguous, or the classes are co-occurring. We extend GradTrust to Video Action Recognition on Kinetics-400 dataset. We showcase results on $14$ architectures pretrained on ImageNet and $5$ architectures pretrained on Kinetics-400. We observe the following: (i) simple methodologies like negative log likelihood and margin classifiers outperform state-of-the-art uncertainty and out-of-distribution detection techniques for misprediction rates, and (ii) the proposed GradTrust is in the Top-2 performing methods on $37$ of the considered $38$ experimental modalities. The code is available at: this https URL
On the stability of second order gradient descent for time varying convex functions
Authors: Travis E. Gibson, Sawal Acharya, Anjali Parashar, Joseph E. Gaudio, Anurdha M. Annaswamy
Subjects: Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2405.13765
Pdf link: https://arxiv.org/pdf/2405.13765
Abstract Gradient based optimization algorithms deployed in Machine Learning (ML) applications are often analyzed and compared by their convergence rates or regret bounds. While these rates and bounds convey valuable information they don't always directly translate to stability guarantees. Stability and similar concepts, like robustness, will become ever more important as we move towards deploying models in real-time and safety critical systems. In this work we build upon the results in Gaudio et al. 2021 and Moreu and Annaswamy 2022 for second order gradient descent when applied to explicitly time varying cost functions and provide more general stability guarantees. These more general results can aid in the design and certification of these optimization schemes so as to help ensure safe and reliable deployment for real-time learning applications. We also hope that the techniques provided here will stimulate and cross-fertilize the analysis that occurs on the same algorithms from the online learning and stochastic optimization communities.
Efficient Two-Stage Gaussian Process Regression Via Automatic Kernel Search and Subsampling
Authors: Shifan Zhao, Jiaying Lu, Ji Yang (Carl), Edmond Chow, Yuanzhe Xi
Subjects: Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Probability (math.PR); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2405.13785
Pdf link: https://arxiv.org/pdf/2405.13785
Abstract Gaussian Process Regression (GPR) is widely used in statistics and machine learning for prediction tasks requiring uncertainty measures. Its efficacy depends on the appropriate specification of the mean function, covariance kernel function, and associated hyperparameters. Severe misspecifications can lead to inaccurate results and problematic consequences, especially in safety-critical applications. However, a systematic approach to handle these misspecifications is lacking in the literature. In this work, we propose a general framework to address these issues. Firstly, we introduce a flexible two-stage GPR framework that separates mean prediction and uncertainty quantification (UQ) to prevent mean misspecification, which can introduce bias into the model. Secondly, kernel function misspecification is addressed through a novel automatic kernel search algorithm, supported by theoretical analysis, that selects the optimal kernel from a candidate set. Additionally, we propose a subsampling-based warm-start strategy for hyperparameter initialization to improve efficiency and avoid hyperparameter misspecification. With much lower computational cost, our subsampling-based strategy can yield competitive or better performance than training exclusively on the full dataset. Combining all these components, we recommend two GPR methods-exact and scalable-designed to match available computational resources and specific UQ requirements. Extensive evaluation on real-world datasets, including UCI benchmarks and a safety-critical medical case study, demonstrates the robustness and precision of our methods.
Towards Explainable Test Case Prioritisation with Learning-to-Rank Models
Authors: Aurora Ramírez, Mario Berrios, José Raúl Romero, Robert Feldt
Subjects: Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2405.13786
Pdf link: https://arxiv.org/pdf/2405.13786
Abstract Test case prioritisation (TCP) is a critical task in regression testing to ensure quality as software evolves. Machine learning has become a common way to achieve it. In particular, learning-to-rank (LTR) algorithms provide an effective method of ordering and prioritising test cases. However, their use poses a challenge in terms of explainability, both globally at the model level and locally for particular results. Here, we present and discuss scenarios that require different explanations and how the particularities of TCP (multiple builds over time, test case and test suite variations, etc.) could influence them. We include a preliminary experiment to analyse the similarity of explanations, showing that they do not only vary depending on test case-specific predictions, but also on the relative ranks.
Koopcon: A new approach towards smarter and less complex learning
Authors: Vahid Jebraeeli, Bo Jiang, Derya Cansever, Hamid Krim
Subjects: Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2405.13866
Pdf link: https://arxiv.org/pdf/2405.13866
Abstract In the era of big data, the sheer volume and complexity of datasets pose significant challenges in machine learning, particularly in image processing tasks. This paper introduces an innovative Autoencoder-based Dataset Condensation Model backed by Koopman operator theory that effectively packs large datasets into compact, information-rich representations. Inspired by the predictive coding mechanisms of the human brain, our model leverages a novel approach to encode and reconstruct data, maintaining essential features and label distributions. The condensation process utilizes an autoencoder neural network architecture, coupled with Optimal Transport theory and Wasserstein distance, to minimize the distributional discrepancies between the original and synthesized datasets. We present a two-stage implementation strategy: first, condensing the large dataset into a smaller synthesized subset; second, evaluating the synthesized data by training a classifier and comparing its performance with a classifier trained on an equivalent subset of the original data. Our experimental results demonstrate that the classifiers trained on condensed data exhibit comparable performance to those trained on the original datasets, thus affirming the efficacy of our condensation model. This work not only contributes to the reduction of computational resources but also paves the way for efficient data handling in constrained environments, marking a significant step forward in data-efficient machine learning.
LOGIN: A Large Language Model Consulted Graph Neural Network Training Framework
Authors: Yiran Qiao, Xiang Ao, Yang Liu, Jiarong Xu, Xiaoqian Sun, Qing He
Subjects: Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2405.13902
Pdf link: https://arxiv.org/pdf/2405.13902
Abstract Recent prevailing works on graph machine learning typically follow a similar methodology that involves designing advanced variants of graph neural networks (GNNs) to maintain the superior performance of GNNs on different graphs. In this paper, we aim to streamline the GNN design process and leverage the advantages of Large Language Models (LLMs) to improve the performance of GNNs on downstream tasks. We formulate a new paradigm, coined "LLMs-as-Consultants," which integrates LLMs with GNNs in an interactive manner. A framework named LOGIN (LLM Consulted GNN training) is instantiated, empowering the interactive utilization of LLMs within the GNN training process. First, we attentively craft concise prompts for spotted nodes, carrying comprehensive semantic and topological information, and serving as input to LLMs. Second, we refine GNNs by devising a complementary coping mechanism that utilizes the responses from LLMs, depending on their correctness. We empirically evaluate the effectiveness of LOGIN on node classification tasks across both homophilic and heterophilic graphs. The results illustrate that even basic GNN architectures, when employed within the proposed LLMs-as-Consultants paradigm, can achieve comparable performance to advanced GNNs with intricate designs. Our codes are available at this https URL.
Memory Scraping Attack on Xilinx FPGAs: Private Data Extraction from Terminated Processes
Authors: Bharadwaj Madabhushi, Sandip Kundu, Daniel Holcomb
Subjects: Subjects: Cryptography and Security (cs.CR); Hardware Architecture (cs.AR)
Arxiv link: https://arxiv.org/abs/2405.13927
Pdf link: https://arxiv.org/pdf/2405.13927
Abstract FPGA-based hardware accelerators are becoming increasingly popular due to their versatility, customizability, energy efficiency, constant latency, and scalability. FPGAs can be tailored to specific algorithms, enabling efficient hardware implementations that effectively leverage algorithm parallelism. This can lead to significant performance improvements over CPUs and GPUs, particularly for highly parallel applications. For example, a recent study found that Stratix 10 FPGAs can achieve up to 90\% of the performance of a TitanX Pascal GPU while consuming less than 50\% of the power. This makes FPGAs an attractive choice for accelerating machine learning (ML) workloads. However, our research finds privacy and security vulnerabilities in existing Xilinx FPGA-based hardware acceleration solutions. These vulnerabilities arise from the lack of memory initialization and insufficient process isolation, which creates potential avenues for unauthorized access to private data used by processes. To illustrate this issue, we conducted experiments using a Xilinx ZCU104 board running the PetaLinux tool from Xilinx. We found that PetaLinux does not effectively clear memory locations associated with a terminated process, leaving them vulnerable to memory scraping attack (MSA). This paper makes two main contributions. The first contribution is an attack methodology of using the Xilinx debugger from a different user space. We find that we are able to access process IDs, virtual address spaces, and pagemaps of one user from a different user space because of lack of adequate process isolation. The second contribution is a methodology for characterizing terminated processes and accessing their private data. We illustrate this on Xilinx ML application library.
Exploring the Relationship Between Feature Attribution Methods and Model Performance
Authors: Priscylla Silva, Claudio T. Silva, Luis Gustavo Nonato
Subjects: Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.13957
Pdf link: https://arxiv.org/pdf/2405.13957
Abstract Machine learning and deep learning models are pivotal in educational contexts, particularly in predicting student success. Despite their widespread application, a significant gap persists in comprehending the factors influencing these models' predictions, especially in explainability within education. This work addresses this gap by employing nine distinct explanation methods and conducting a comprehensive analysis to explore the correlation between the agreement among these methods in generating explanations and the predictive model's performance. Applying Spearman's correlation, our findings reveal a very strong correlation between the model's performance and the agreement level observed among the explanation methods.
Analysis of Corrected Graph Convolutions
Authors: Robert Wang, Aseem Baranwal, Kimon Fountoulakis
Subjects: Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Discrete Mathematics (cs.DM); Statistics Theory (math.ST); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2405.13987
Pdf link: https://arxiv.org/pdf/2405.13987
Abstract Machine learning for node classification on graphs is a prominent area driven by applications such as recommendation systems. State-of-the-art models often use multiple graph convolutions on the data, as empirical evidence suggests they can enhance performance. However, it has been shown empirically and theoretically, that too many graph convolutions can degrade performance significantly, a phenomenon known as oversmoothing. In this paper, we provide a rigorous theoretical analysis, based on the contextual stochastic block model (CSBM), of the performance of vanilla graph convolution from which we remove the principal eigenvector to avoid oversmoothing. We perform a spectral analysis for $k$ rounds of corrected graph convolutions, and we provide results for partial and exact classification. For partial classification, we show that each round of convolution can reduce the misclassification error exponentially up to a saturation level, after which performance does not worsen. For exact classification, we show that the separability threshold can be improved exponentially up to $O({\log{n}}/{\log\log{n}})$ corrected convolutions.
AutoLCZ: Towards Automatized Local Climate Zone Mapping from Rule-Based Remote Sensing
Authors: Chenying Liu, Hunsoo Song, Anamika Shreevastava, Conrad M Albrecht
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Computational Engineering, Finance, and Science (cs.CE); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.13993
Pdf link: https://arxiv.org/pdf/2405.13993
Abstract Local climate zones (LCZs) established a standard classification system to categorize the landscape universe for improved urban climate studies. Existing LCZ mapping is guided by human interaction with geographic information systems (GIS) or modelled from remote sensing (RS) data. GIS-based methods do not scale to large areas. However, RS-based methods leverage machine learning techniques to automatize LCZ classification from RS. Yet, RS-based methods require huge amounts of manual labels for training. We propose a novel LCZ mapping framework, termed AutoLCZ, to extract the LCZ classification features from high-resolution RS modalities. We study the definition of numerical rules designed to mimic the LCZ definitions. Those rules model geometric and surface cover properties from LiDAR data. Correspondingly, we enable LCZ classification from RS data in a GIS-based scheme. The proposed AutoLCZ method has potential to reduce the human labor to acquire accurate metadata. At the same time, AutoLCZ sheds light on the physical interpretability of RS-based methods. In a proof-of-concept for New York City (NYC) we leverage airborne LiDAR surveys to model 4 LCZ features to distinguish 10 LCZ types. The results indicate the potential of AutoLCZ as promising avenue for large-scale LCZ mapping from RS data.
Practical $0.385$-Approximation for Submodular Maximization Subject to a Cardinality Constraint
Authors: Murad Tukan, Loay Mualem, Moran Feldman
Subjects: Subjects: Machine Learning (cs.LG); Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2405.13994
Pdf link: https://arxiv.org/pdf/2405.13994
Abstract Non-monotone constrained submodular maximization plays a crucial role in various machine learning applications. However, existing algorithms often struggle with a trade-off between approximation guarantees and practical efficiency. The current state-of-the-art is a recent $0.401$-approximation algorithm, but its computational complexity makes it highly impractical. The best practical algorithms for the problem only guarantee $1/e$-approximation. In this work, we present a novel algorithm for submodular maximization subject to a cardinality constraint that combines a guarantee of $0.385$-approximation with a low and practical query complexity of $O(n+k^2)$. Furthermore, we evaluate the empirical performance of our algorithm in experiments based on various machine learning applications, including Movie Recommendation, Image Summarization, and more. These experiments demonstrate the efficacy of our approach.
Bridging Operator Learning and Conditioned Neural Fields: A Unifying Perspective
Authors: Sifan Wang, Jacob H Seidman, Shyam Sankaran, Hanwen Wang, George J. Pappas, Paris Perdikaris
Subjects: Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2405.13998
Pdf link: https://arxiv.org/pdf/2405.13998
Abstract Operator learning is an emerging area of machine learning which aims to learn mappings between infinite dimensional function spaces. Here we uncover a connection between operator learning architectures and conditioned neural fields from computer vision, providing a unified perspective for examining differences between popular operator learning models. We find that many commonly used operator learning models can be viewed as neural fields with conditioning mechanisms restricted to point-wise and/or global information. Motivated by this, we propose the Continuous Vision Transformer (CViT), a novel neural operator architecture that employs a vision transformer encoder and uses cross-attention to modulate a base field constructed with a trainable grid-based positional encoding of query coordinates. Despite its simplicity, CViT achieves state-of-the-art results across challenging benchmarks in climate modeling and fluid dynamics. Our contributions can be viewed as a first step towards adapting advanced computer vision architectures for building more flexible and accurate machine learning models in physical sciences.
Towards A Comprehensive Assessment of AI's Environmental Impact
Authors: Srija Chakraborty
Subjects: Subjects: Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2405.14004
Pdf link: https://arxiv.org/pdf/2405.14004
Abstract Artificial Intelligence, machine learning (AI/ML) has allowed exploring solutions for a variety of environmental and climate questions ranging from natural disasters, greenhouse gas emission, monitoring biodiversity, agriculture, to weather and climate modeling, enabling progress towards climate change mitigation. However, the intersection of AI/ML and environment is not always positive. The recent surge of interest in ML, made possible by processing very large volumes of data, fueled by access to massive compute power, has sparked a trend towards large-scale adoption of AI/ML. This interest places tremendous pressure on natural resources, that are often overlooked and under-reported. There is a need for a framework that monitors the environmental impact and degradation from AI/ML throughout its lifecycle for informing policymakers, stakeholders to adequately implement standards and policies and track the policy outcome over time. For these policies to be effective, AI's environmental impact needs to be monitored in a spatially-disaggregated, timely manner across the globe at the key activity sites. This study proposes a methodology to track environmental variables relating to the multifaceted impact of AI around datacenters using openly available energy data and globally acquired satellite observations. We present a case study around Northern Virginia, United States that hosts a growing number of datacenters and observe changes in multiple satellite-based environmental metrics. We then discuss the steps to expand this methodology for comprehensive assessment of AI's environmental impact across the planet. We also identify data gaps and formulate recommendations for improving the understanding and monitoring AI-induced changes to the environment and climate.
Neural Scaling Laws for Embodied AI
Authors: Sebastian Sartor, Neil Thompson
Subjects: Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2405.14005
Pdf link: https://arxiv.org/pdf/2405.14005
Abstract Scaling laws have driven remarkable progress across machine learning domains like language modeling and computer vision. However, the exploration of scaling laws in embodied AI and robotics has been limited, despite the rapidly increasing usage of machine learning in this field. This paper presents the first study to quantify scaling laws for Robot Foundation Models (RFMs) and the use of LLMs in robotics tasks. Through a meta-analysis spanning 198 research papers, we analyze how key factors like compute, model size, and training data quantity impact model performance across various robotic tasks. Our findings confirm that scaling laws apply to both RFMs and LLMs in robotics, with performance consistently improving as resources increase. The power law coefficients for RFMs closely match those of LLMs in robotics, resembling those found in computer vision and outperforming those for LLMs in the language domain. We also note that these coefficients vary with task complexity, with familiar tasks scaling more efficiently than unfamiliar ones, emphasizing the need for large and diverse datasets. Furthermore, we highlight the absence of standardized benchmarks in embodied AI. Most studies indicate diminishing returns, suggesting that significant resources are necessary to achieve high performance, posing challenges due to data and computational limitations. Finally, as models scale, we observe the emergence of new capabilities, particularly related to data and model size.
Adversarial Training of Two-Layer Polynomial and ReLU Activation Networks via Convex Optimization
Authors: Daniel Kuelbs, Sanjay Lall, Mert Pilanci
Subjects: Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2405.14033
Pdf link: https://arxiv.org/pdf/2405.14033
Abstract Training neural networks which are robust to adversarial attacks remains an important problem in deep learning, especially as heavily overparameterized models are adopted in safety-critical settings. Drawing from recent work which reformulates the training problems for two-layer ReLU and polynomial activation networks as convex programs, we devise a convex semidefinite program (SDP) for adversarial training of polynomial activation networks via the S-procedure. We also derive a convex SDP to compute the minimum distance from a correctly classified example to the decision boundary of a polynomial activation network. Adversarial training for two-layer ReLU activation networks has been explored in the literature, but, in contrast to prior work, we present a scalable approach which is compatible with standard machine libraries and GPU acceleration. The adversarial training SDP for polynomial activation networks leads to large increases in robust test accuracy against $\ell^\infty$ attacks on the Breast Cancer Wisconsin dataset from the UCI Machine Learning Repository. For two-layer ReLU networks, we leverage our scalable implementation to retrain the final two fully connected layers of a Pre-Activation ResNet-18 model on the CIFAR-10 dataset. Our 'robustified' model achieves higher clean and robust test accuracies than the same architecture trained with sharpness-aware minimization.
A Concentration Inequality for Maximum Mean Discrepancy (MMD)-based Statistics and Its Application in Generative Models
Authors: Yijin Ni, Xiaoming Huo
Subjects: Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST)
Arxiv link: https://arxiv.org/abs/2405.14051
Pdf link: https://arxiv.org/pdf/2405.14051
Abstract Maximum Mean Discrepancy (MMD) is a probability metric that has found numerous applications in machine learning. In this work, we focus on its application in generative models, including the minimum MMD estimator, Generative Moment Matching Network (GMMN), and Generative Adversarial Network (GAN). In these cases, MMD is part of an objective function in a minimization or min-max optimization problem. Even if its empirical performance is competitive, the consistency and convergence rate analysis of the corresponding MMD-based estimators has yet to be carried out. We propose a uniform concentration inequality for a class of Maximum Mean Discrepancy (MMD)-based estimators, that is, a maximum deviation bound of empirical MMD values over a collection of generated distributions and adversarially learned kernels. Here, our inequality serves as an efficient tool in the theoretical analysis for MMD-based generative models. As elaborating examples, we applied our main result to provide the generalization error bounds for the MMD-based estimators in the context of the minimum MMD estimator and MMD GAN.
Formally Verifying Deep Reinforcement Learning Controllers with Lyapunov Barrier Certificates
Authors: Udayan Mandal, Guy Amir, Haoze Wu, Ieva Daukantas, Fletcher Lee Newell, Umberto J. Ravaioli, Baoluo Meng, Michael Durling, Milan Ganai, Tobey Shim, Guy Katz, Clark Barrett
Subjects: Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2405.14058
Pdf link: https://arxiv.org/pdf/2405.14058
Abstract Deep reinforcement learning (DRL) is a powerful machine learning paradigm for generating agents that control autonomous systems. However, the "black box" nature of DRL agents limits their deployment in real-world safety-critical applications. A promising approach for providing strong guarantees on an agent's behavior is to use Neural Lyapunov Barrier (NLB) certificates, which are learned functions over the system whose properties indirectly imply that an agent behaves as desired. However, NLB-based certificates are typically difficult to learn and even more difficult to verify, especially for complex systems. In this work, we present a novel method for training and verifying NLB-based certificates for discrete-time systems. Specifically, we introduce a technique for certificate composition, which simplifies the verification of highly-complex systems by strategically designing a sequence of certificates. When jointly verified with neural network verification engines, these certificates provide a formal guarantee that a DRL agent both achieves its goals and avoids unsafe behavior. Furthermore, we introduce a technique for certificate filtering, which significantly simplifies the process of producing formally verified certificates. We demonstrate the merits of our approach with a case study on providing safety and liveness guarantees for a DRL-controlled spacecraft.
Probabilistic Inference in the Era of Tensor Networks and Differential Programming
Authors: Martin Roa-Villescas, Xuanzhao Gao, Sander Stuijk, Henk Corporaal, Jin-Guo Liu
Subjects: Subjects: Machine Learning (cs.LG); Computational Physics (physics.comp-ph)
Arxiv link: https://arxiv.org/abs/2405.14060
Pdf link: https://arxiv.org/pdf/2405.14060
Abstract Probabilistic inference is a fundamental task in modern machine learning. Recent advances in tensor network (TN) contraction algorithms have enabled the development of better exact inference methods. However, many common inference tasks in probabilistic graphical models (PGMs) still lack corresponding TN-based adaptations. In this work, we advance the connection between PGMs and TNs by formulating and implementing tensor-based solutions for the following inference tasks: (i) computing the partition function, (ii) computing the marginal probability of sets of variables in the model, (iii) determining the most likely assignment to a set of variables, and (iv) the same as (iii) but after having marginalized a different set of variables. We also present a generalized method for generating samples from a learned probability distribution. Our work is motivated by recent technical advances in the fields of quantum circuit simulation, quantum many-body physics, and statistical physics. Through an experimental evaluation, we demonstrate that the integration of these quantum technologies with a series of algorithms introduced in this study significantly improves the effectiveness of existing methods for solving probabilistic inference tasks.
A Neighbor-Searching Discrepancy-based Drift Detection Scheme for Learning Evolving Data
Authors: Feng Gu, Jie Lu, Zhen Fang, Kun Wang, Guangquan Zhang
Subjects: Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.14153
Pdf link: https://arxiv.org/pdf/2405.14153
Abstract Uncertain changes in data streams present challenges for machine learning models to dynamically adapt and uphold performance in real-time. Particularly, classification boundary change, also known as real concept drift, is the major cause of classification performance deterioration. However, accurately detecting real concept drift remains challenging because the theoretical foundations of existing drift detection methods - two-sample distribution tests and monitoring classification error rate, both suffer from inherent limitations such as the inability to distinguish virtual drift (changes not affecting the classification boundary, will introduce unnecessary model maintenance), limited statistical power, or high computational cost. Furthermore, no existing detection method can provide information on the trend of the drift, which could be invaluable for model maintenance. This work presents a novel real concept drift detection method based on Neighbor-Searching Discrepancy, a new statistic that measures the classification boundary difference between two samples. The proposed method is able to detect real concept drift with high accuracy while ignoring virtual drift. It can also indicate the direction of the classification boundary change by identifying the invasion or retreat of a certain class, which is also an indicator of separability change between classes. A comprehensive evaluation of 11 experiments is conducted, including empirical verification of the proposed theory using artificial datasets, and experimental comparisons with commonly used drift handling methods on real-world datasets. The results show that the proposed theory is robust against a range of distributions and dimensions, and the drift detection method outperforms state-of-the-art alternative methods.
Fairness Hub Technical Briefs: Definition and Detection of Distribution Shift
Authors: Nicolas Acevedo, Carmen Cortez, Chris Brooks, Rene Kizilcec, Renzhe Yu
Subjects: Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2405.14186
Pdf link: https://arxiv.org/pdf/2405.14186
Abstract Distribution shift is a common situation in machine learning tasks, where the data used for training a model is different from the data the model is applied to in the real world. This issue arises across multiple technical settings: from standard prediction tasks, to time-series forecasting, and to more recent applications of large language models (LLMs). This mismatch can lead to performance reductions, and can be related to a multiplicity of factors: sampling issues and non-representative data, changes in the environment or policies, or the emergence of previously unseen scenarios. This brief focuses on the definition and detection of distribution shifts in educational settings. We focus on standard prediction problems, where the task is to learn a model that takes in a series of input (predictors) $X=(x_1,x_2,...,x_m)$ and produces an output $Y=f(X)$.
Eidos: Efficient, Imperceptible Adversarial 3D Point Clouds
Authors: Hanwei Zhang, Luo Cheng, Qisong He, Wei Huang, Renjue Li, Ronan Sicre, Xiaowei Huang, Holger Hermanns, Lijun Zhang
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2405.14210
Pdf link: https://arxiv.org/pdf/2405.14210
Abstract Classification of 3D point clouds is a challenging machine learning (ML) task with important real-world applications in a spectrum from autonomous driving and robot-assisted surgery to earth observation from low orbit. As with other ML tasks, classification models are notoriously brittle in the presence of adversarial attacks. These are rooted in imperceptible changes to inputs with the effect that a seemingly well-trained model ends up misclassifying the input. This paper adds to the understanding of adversarial attacks by presenting Eidos, a framework providing Efficient Imperceptible aDversarial attacks on 3D pOint cloudS. Eidos supports a diverse set of imperceptibility metrics. It employs an iterative, two-step procedure to identify optimal adversarial examples, thereby enabling a runtime-imperceptibility trade-off. We provide empirical evidence relative to several popular 3D point cloud classification models and several established 3D attack methods, showing Eidos' superiority with respect to efficiency as well as imperceptibility.
RAQ-VAE: Rate-Adaptive Vector-Quantized Variational Autoencoder
Authors: Jiwan Seo, Joonhyuk Kang
Subjects: Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2405.14222
Pdf link: https://arxiv.org/pdf/2405.14222
Abstract Vector Quantized Variational AutoEncoder (VQ-VAE) is an established technique in machine learning for learning discrete representations across various modalities. However, its scalability and applicability are limited by the need to retrain the model to adjust the codebook for different data or model scales. We introduce the Rate-Adaptive VQ-VAE (RAQ-VAE) framework, which addresses this challenge with two novel codebook representation methods: a model-based approach using a clustering-based technique on an existing well-trained VQ-VAE model, and a data-driven approach utilizing a sequence-to-sequence (Seq2Seq) model for variable-rate codebook generation. Our experiments demonstrate that RAQ-VAE achieves effective reconstruction performance across multiple rates, often outperforming conventional fixed-rate VQ-VAE models. This work enhances the adaptability and performance of VQ-VAEs, with broad applications in data reconstruction, generation, and computer vision tasks.
FloodDamageCast: Building Flood Damage Nowcasting with Machine Learning and Data Augmentation
Authors: Chia-Fu Liu, Lipai Huang, Kai Yin, Sam Brody, Ali Mostafavi
Subjects: Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.14232
Pdf link: https://arxiv.org/pdf/2405.14232
Abstract Near-real time estimation of damage to buildings and infrastructure, referred to as damage nowcasting in this study, is crucial for empowering emergency responders to make informed decisions regarding evacuation orders and infrastructure repair priorities during disaster response and recovery. Here, we introduce FloodDamageCast, a machine learning framework tailored for property flood damage nowcasting. The framework leverages heterogeneous data to predict residential flood damage at a resolution of 500 meters by 500 meters within Harris County, Texas, during the 2017 Hurricane Harvey. To deal with data imbalance, FloodDamageCast incorporates a generative adversarial networks-based data augmentation coupled with an efficient machine learning model. The results demonstrate the model's ability to identify high-damage spatial areas that would be overlooked by baseline models. Insights gleaned from flood damage nowcasting can assist emergency responders to more efficiently identify repair needs, allocate resources, and streamline on-the-ground inspections, thereby saving both time and effort.
Sparse $L^1$-Autoencoders for Scientific Data Compression
Authors: Matthias Chung, Rick Archibald, Paul Atzberger, Jack Michael Solomon
Subjects: Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2405.14270
Pdf link: https://arxiv.org/pdf/2405.14270
Abstract Scientific datasets present unique challenges for machine learning-driven compression methods, including more stringent requirements on accuracy and mitigation of potential invalidating artifacts. Drawing on results from compressed sensing and rate-distortion theory, we introduce effective data compression methods by developing autoencoders using high dimensional latent spaces that are $L^1$-regularized to obtain sparse low dimensional representations. We show how these information-rich latent spaces can be used to mitigate blurring and other artifacts to obtain highly effective data compression methods for scientific data. We demonstrate our methods for short angle scattering (SAS) datasets showing they can achieve compression ratios around two orders of magnitude and in some cases better. Our compression methods show promise for use in addressing current bottlenecks in transmission, storage, and analysis in high-performance distributed computing environments. This is central to processing the large volume of SAS data being generated at shared experimental facilities around the world to support scientific investigations. Our approaches provide general ways for obtaining specialized compression methods for targeted scientific datasets.
AdaGMLP: AdaBoosting GNN-to-MLP Knowledge Distillation
Authors: Weigang Lu, Ziyu Guan, Wei Zhao, Yaming Yang
Subjects: Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2405.14307
Pdf link: https://arxiv.org/pdf/2405.14307
Abstract Graph Neural Networks (GNNs) have revolutionized graph-based machine learning, but their heavy computational demands pose challenges for latency-sensitive edge devices in practical industrial applications. In response, a new wave of methods, collectively known as GNN-to-MLP Knowledge Distillation, has emerged. They aim to transfer GNN-learned knowledge to a more efficient MLP student, which offers faster, resource-efficient inference while maintaining competitive performance compared to GNNs. However, these methods face significant challenges in situations with insufficient training data and incomplete test data, limiting their applicability in real-world applications. To address these challenges, we propose AdaGMLP, an AdaBoosting GNN-to-MLP Knowledge Distillation framework. It leverages an ensemble of diverse MLP students trained on different subsets of labeled nodes, addressing the issue of insufficient training data. Additionally, it incorporates a Node Alignment technique for robust predictions on test data with missing or incomplete features. Our experiments on seven benchmark datasets with different settings demonstrate that AdaGMLP outperforms existing G2M methods, making it suitable for a wide range of latency-sensitive real-world applications. We have submitted our code to the GitHub repository (this https URL).
SmartCS: Enabling the Creation of ML-Powered Computer Vision Mobile Apps for Citizen Science Applications without Coding
Authors: Fahim Hasan Khan, Akila de Silva, Gregory Dusek, James Davis, Alex Pang
Subjects: Subjects: Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2405.14323
Pdf link: https://arxiv.org/pdf/2405.14323
Abstract It is undeniable that citizen science contributes to the advancement of various fields of study. There are now software tools that facilitate the development of citizen science apps. However, apps developed with these tools rely on individual human skills to correctly collect useful data. Machine learning (ML)-aided apps provide on-field guidance to citizen scientists on data collection tasks. However, these apps rely on server-side ML support, and therefore need a reliable internet connection. Furthermore, the development of citizen science apps with ML support requires a significant investment of time and money. For some projects, this barrier may preclude the use of citizen science effectively. We present a platform that democratizes citizen science by making it accessible to a much broader audience of both researchers and participants. The SmartCS platform allows one to create citizen science apps with ML support quickly and without coding skills. Apps developed using SmartCS have client-side ML support, making them usable in the field, even when there is no internet connection. The client-side ML helps educate users to better recognize the subjects, thereby enabling high-quality data collection. We present several citizen science apps created using SmartCS, some of which were conceived and created by high school students.
Explaining Graph Neural Networks via Structure-aware Interaction Index
Authors: Ngoc Bui, Hieu Trung Nguyen, Viet Anh Nguyen, Rex Ying
Subjects: Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.14352
Pdf link: https://arxiv.org/pdf/2405.14352
Abstract The Shapley value is a prominent tool for interpreting black-box machine learning models thanks to its strong theoretical foundation. However, for models with structured inputs, such as graph neural networks, existing Shapley-based explainability approaches either focus solely on node-wise importance or neglect the graph structure when perturbing the input instance. This paper introduces the Myerson-Taylor interaction index that internalizes the graph structure into attributing the node values and the interaction values among nodes. Unlike the Shapley-based methods, the Myerson-Taylor index decomposes coalitions into components satisfying a pre-chosen connectivity criterion. We prove that the Myerson-Taylor index is the unique one that satisfies a system of five natural axioms accounting for graph structure and high-order interaction among nodes. Leveraging these properties, we propose Myerson-Taylor Structure-Aware Graph Explainer (MAGE), a novel explainer that uses the second-order Myerson-Taylor index to identify the most important motifs influencing the model prediction, both positively and negatively. Extensive experiments on various graph datasets and models demonstrate that our method consistently provides superior subgraph explanations compared to state-of-the-art methods.
Exact Gauss-Newton Optimization for Training Deep Neural Networks
Authors: Mikalai Korbit, Adeyemi D. Adeoye, Alberto Bemporad, Mario Zanon
Subjects: Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2405.14402
Pdf link: https://arxiv.org/pdf/2405.14402
Abstract We present EGN, a stochastic second-order optimization algorithm that combines the generalized Gauss-Newton (GN) Hessian approximation with low-rank linear algebra to compute the descent direction. Leveraging the Duncan-Guttman matrix identity, the parameter update is obtained by factorizing a matrix which has the size of the mini-batch. This is particularly advantageous for large-scale machine learning problems where the dimension of the neural network parameter vector is several orders of magnitude larger than the batch size. Additionally, we show how improvements such as line search, adaptive regularization, and momentum can be seamlessly added to EGN to further accelerate the algorithm. Moreover, under mild assumptions, we prove that our algorithm converges to an $\epsilon$-stationary point at a linear rate. Finally, our numerical experiments demonstrate that EGN consistently exceeds, or at most matches the generalization performance of well-tuned SGD, Adam, and SGN optimizers across various supervised and reinforcement learning tasks.
Unraveling overoptimism and publication bias in ML-driven science
Authors: Pouria Saidi, Gautam Dasarathy, Visar Berisha
Subjects: Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2405.14422
Pdf link: https://arxiv.org/pdf/2405.14422
Abstract Machine Learning (ML) is increasingly used across many disciplines with impressive reported results across many domain areas. However, recent studies suggest that the published performance of ML models are often overoptimistic and not reflective of true accuracy were these models to be deployed. Validity concerns are underscored by findings of a concerning inverse relationship between sample size and reported accuracy in published ML models across several domains. This is in contrast with the theory of learning curves in ML, where we expect accuracy to improve or stay the same with increasing sample size. This paper investigates the factors contributing to overoptimistic accuracy reports in ML-based science, focusing on data leakage and publication bias. Our study introduces a novel stochastic model for observed accuracy, integrating parametric learning curves and the above biases. We then construct an estimator based on this model that corrects for these biases in observed data. Theoretical and empirical results demonstrate that this framework can estimate the underlying learning curve that gives rise to the observed overoptimistic results, thereby providing more realistic performance assessments of ML performance from a collection of published results. We apply the model to various meta-analyses in the digital health literature, including neuroimaging-based and speech-based classifications of several neurological conditions. Our results indicate prevalent overoptimism across these fields and we estimate the inherent limits of ML-based prediction in each domain.
LoRA-Ensemble: Efficient Uncertainty Modelling for Self-attention Networks
Authors: Michelle Halbheer, Dominik J. Mühlematter, Alexander Becker, Dominik Narnhofer, Helge Aasen, Konrad Schindler, Mehmet Ozgur Turkoglu
Subjects: Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.14438
Pdf link: https://arxiv.org/pdf/2405.14438
Abstract Numerous crucial tasks in real-world decision-making rely on machine learning algorithms with calibrated uncertainty estimates. However, modern methods often yield overconfident and uncalibrated predictions. Various approaches involve training an ensemble of separate models to quantify the uncertainty related to the model itself, known as epistemic uncertainty. In an explicit implementation, the ensemble approach has high computational cost and high memory requirements. This particular challenge is evident in state-of-the-art neural networks such as transformers, where even a single network is already demanding in terms of compute and memory. Consequently, efforts are made to emulate the ensemble model without actually instantiating separate ensemble members, referred to as implicit ensembling. We introduce LoRA-Ensemble, a parameter-efficient deep ensemble method for self-attention networks, which is based on Low-Rank Adaptation (LoRA). Initially developed for efficient LLM fine-tuning, we extend LoRA to an implicit ensembling approach. By employing a single pre-trained self-attention network with weights shared across all members, we train member-specific low-rank matrices for the attention projections. Our method exhibits superior calibration compared to explicit ensembles and achieves similar or better accuracy across various prediction tasks and datasets.
Bayesian Adaptive Calibration and Optimal Design
Authors: Rafael Oliveira, Dino Sejdinovic, David Howard, Edwin Bonilla
Subjects: Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2405.14440
Pdf link: https://arxiv.org/pdf/2405.14440
Abstract The process of calibrating computer models of natural phenomena is essential for applications in the physical sciences, where plenty of domain knowledge can be embedded into simulations and then calibrated against real observations. Current machine learning approaches, however, mostly rely on rerunning simulations over a fixed set of designs available in the observed data, potentially neglecting informative correlations across the design space and requiring a large amount of simulations. Instead, we consider the calibration process from the perspective of Bayesian adaptive experimental design and propose a data-efficient algorithm to run maximally informative simulations within a batch-sequential process. At each round, the algorithm jointly estimates the parameters of the posterior distribution and optimal designs by maximising a variational lower bound of the expected information gain. The simulator is modelled as a sample from a Gaussian process, which allows us to correlate simulations and observed data with the unknown calibration parameters. We show the benefits of our method when compared to related approaches across synthetic and real-data problems.
Tighter Privacy Auditing of DP-SGD in the Hidden State Threat Model
Authors: Tudor Cebere, Aurélien Bellet, Nicolas Papernot
Subjects: Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2405.14457
Pdf link: https://arxiv.org/pdf/2405.14457
Abstract Machine learning models can be trained with formal privacy guarantees via differentially private optimizers such as DP-SGD. In this work, we study such privacy guarantees when the adversary only accesses the final model, i.e., intermediate model updates are not released. In the existing literature, this hidden state threat model exhibits a significant gap between the lower bound provided by empirical privacy auditing and the theoretical upper bound provided by privacy accounting. To challenge this gap, we propose to audit this threat model with adversaries that craft a gradient sequence to maximize the privacy loss of the final model without accessing intermediate models. We demonstrate experimentally how this approach consistently outperforms prior attempts at auditing the hidden state model. When the crafted gradient is inserted at every optimization step, our results imply that releasing only the final model does not amplify privacy, providing a novel negative result. On the other hand, when the crafted gradient is not inserted at every step, we show strong evidence that a privacy amplification phenomenon emerges in the general non-convex setting (albeit weaker than in convex regimes), suggesting that existing privacy upper bounds can be improved.
Explainable automatic industrial carbon footprint estimation from bank transaction classification using natural language processing
Authors: Jaime González-González, Silvia García-Méndez, Francisco de Arriba-Pérez, Francisco J. González-Castaño, Óscar Barba-Seara
Subjects: Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.14505
Pdf link: https://arxiv.org/pdf/2405.14505
Abstract Concerns about the effect of greenhouse gases have motivated the development of certification protocols to quantify the industrial carbon footprint (CF). These protocols are manual, work-intensive, and expensive. All of the above have led to a shift towards automatic data-driven approaches to estimate the CF, including Machine Learning (ML) solutions. Unfortunately, the decision-making processes involved in these solutions lack transparency from the end user's point of view, who must blindly trust their outcomes compared to intelligible traditional manual approaches. In this research, manual and automatic methodologies for CF estimation were reviewed, taking into account their transparency limitations. This analysis led to the proposal of a new explainable ML solution for automatic CF calculations through bank transaction classification. Consideration should be given to the fact that no previous research has considered the explainability of bank transaction classification for this purpose. For classification, different ML models have been employed based on their promising performance in the literature, such as Support Vector Machine, Random Forest, and Recursive Neural Networks. The results obtained were in the 90 % range for accuracy, precision, and recall evaluation metrics. From their decision paths, the proposed solution estimates the CO2 emissions associated with bank transactions. The explainability methodology is based on an agnostic evaluation of the influence of the input terms extracted from the descriptions of transactions using locally interpretable models. The explainability terms were automatically validated using a similarity metric over the descriptions of the target categories. Conclusively, the explanation performance is satisfactory in terms of the proximity of the explanations to the associated activity sector descriptions.
A New Formulation for Zeroth-Order Optimization of Adversarial EXEmples in Malware Detection
Authors: Marco Rando, Luca Demetrio, Lorenzo Rosasco, Fabio Roli
Subjects: Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.14519
Pdf link: https://arxiv.org/pdf/2405.14519
Abstract Machine learning malware detectors are vulnerable to adversarial EXEmples, i.e. carefully-crafted Windows programs tailored to evade detection. Unlike other adversarial problems, attacks in this context must be functionality-preserving, a constraint which is challenging to address. As a consequence heuristic algorithms are typically used, that inject new content, either randomly-picked or harvested from legitimate programs. In this paper, we show how learning malware detectors can be cast within a zeroth-order optimization framework which allows to incorporate functionality-preserving manipulations. This permits the deployment of sound and efficient gradient-free optimization algorithms, which come with theoretical guarantees and allow for minimal hyper-parameters tuning. As a by-product, we propose and study ZEXE, a novel zero-order attack against Windows malware detection. Compared to state-of-the-art techniques, ZEXE provides drastic improvement in the evasion rate, while reducing to less than one third the size of the injected content.
Explaining Black-box Model Predictions via Two-level Nested Feature Attributions with Consistency Property
Authors: Yuya Yoshikawa, Masanari Kimura, Ryotaro Shimizu, Yuki Saito
Subjects: Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2405.14522
Pdf link: https://arxiv.org/pdf/2405.14522
Abstract Techniques that explain the predictions of black-box machine learning models are crucial to make the models transparent, thereby increasing trust in AI systems. The input features to the models often have a nested structure that consists of high- and low-level features, and each high-level feature is decomposed into multiple low-level features. For such inputs, both high-level feature attributions (HiFAs) and low-level feature attributions (LoFAs) are important for better understanding the model's decision. In this paper, we propose a model-agnostic local explanation method that effectively exploits the nested structure of the input to estimate the two-level feature attributions simultaneously. A key idea of the proposed method is to introduce the consistency property that should exist between the HiFAs and LoFAs, thereby bridging the separate optimization problems for estimating them. Thanks to this consistency property, the proposed method can produce HiFAs and LoFAs that are both faithful to the black-box models and consistent with each other, using a smaller number of queries to the models. In experiments on image classification in multiple instance learning and text classification using language models, we demonstrate that the HiFAs and LoFAs estimated by the proposed method are accurate, faithful to the behaviors of the black-box models, and provide consistent explanations.
Towards Privacy-Aware and Personalised Assistive Robots: A User-Centred Approach
Authors: Fernando E. Casado
Subjects: Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.14528
Pdf link: https://arxiv.org/pdf/2405.14528
Abstract The global increase in the elderly population necessitates innovative long-term care solutions to improve the quality of life for vulnerable individuals while reducing caregiver burdens. Assistive robots, leveraging advancements in Machine Learning, offer promising personalised support. However, their integration into daily life raises significant privacy concerns. Widely used frameworks like the Robot Operating System (ROS) historically lack inherent privacy mechanisms, complicating data-driven approaches in robotics. This research pioneers user-centric, privacy-aware technologies such as Federated Learning (FL) to advance assistive robotics. FL enables collaborative learning without sharing sensitive data, addressing privacy and scalability issues. This work includes developing solutions for smart wheelchair assistance, enhancing user independence and well-being. By tackling challenges related to non-stationary data and heterogeneous environments, the research aims to improve personalisation and user experience. Ultimately, it seeks to lead the responsible integration of assistive robots into society, enhancing the quality of life for elderly and care-dependent individuals.
Nuclear Norm Regularization for Deep Learning
Authors: Christopher Scarvelis, Justin Solomon
Subjects: Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2405.14544
Pdf link: https://arxiv.org/pdf/2405.14544
Abstract Penalizing the nuclear norm of a function's Jacobian encourages it to locally behave like a low-rank linear map. Such functions vary locally along only a handful of directions, making the Jacobian nuclear norm a natural regularizer for machine learning problems. However, this regularizer is intractable for high-dimensional problems, as it requires computing a large Jacobian matrix and taking its singular value decomposition. We show how to efficiently penalize the Jacobian nuclear norm using techniques tailor-made for deep learning. We prove that for functions parametrized as compositions $f = g \circ h$, one may equivalently penalize the average squared Frobenius norm of $Jg$ and $Jh$. We then propose a denoising-style approximation that avoids the Jacobian computations altogether. Our method is simple, efficient, and accurate, enabling Jacobian nuclear norm regularization to scale to high-dimensional deep learning problems. We complement our theory with an empirical study of our regularizer's performance and investigate applications to denoising and representation learning.
Rapid modelling of reactive transport in porous media using machine learning: limitations and solutions
Authors: Vinicius L S Silva, Geraldine Regnier, Pablo Salinas, Claire E Heaney, Matthew D Jackson, Christopher C Pain
Subjects: Subjects: Computational Engineering, Finance, and Science (cs.CE)
Arxiv link: https://arxiv.org/abs/2405.14548
Pdf link: https://arxiv.org/pdf/2405.14548
Abstract Reactive transport in porous media plays a pivotal role in subsurface reservoir processes, influencing fluid properties and geochemical characteristics. However, coupling fluid flow and transport with geochemical reactions is computationally intensive, requiring geochemical calculations at each grid cell and each time step within a discretized simulation domain. Although recent advancements have integrated machine learning techniques as surrogates for geochemical simulations, ensuring computational efficiency and accuracy remains a challenge. This chapter investigates machine learning models as replacements for a geochemical module in a reactive transport in porous media simulation. We test this approach on a well-documented cation exchange problem. While the surrogate models excel in isolated predictions, they fall short in rollout predictions over successive time steps. By introducing modifications, including physics-based constraints and tailored dataset generation strategies, we show that machine learning surrogates can achieve accurate rollout predictions. Our findings emphasize that, when judiciously designed, machine learning surrogates can substantially expedite the cation exchange problem without compromising accuracy, offering significant potential for a range of reactive transport applications.
Deep Learning Classification of Photoplethysmogram Signal for Hypertension Levels
Authors: Nida Nasir, Mustafa Sameer, Feras Barneih, Omar Alshaltone, Muneeb Ahmed
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Arxiv link: https://arxiv.org/abs/2405.14556
Pdf link: https://arxiv.org/pdf/2405.14556
Abstract Continuous photoplethysmography (PPG)-based blood pressure monitoring is necessary for healthcare and fitness applications. In Artificial Intelligence (AI), signal classification levels with the machine and deep learning arrangements need to be explored further. Techniques based on time-frequency spectra, such as Short-time Fourier Transform (STFT), have been used to address the challenges of motion artifact correction. Therefore, the proposed study works with PPG signals of more than 200 patients (650+ signal samples) with hypertension, using STFT with various Neural Networks (Convolution Neural Network (CNN), Long Short-Term Memory (LSTM), Bidirectional Long Short-Term Memory (Bi-LSTM), followed by machine learning classifiers, such as, Support Vector Machine (SVM) and Random Forest (RF). The classification has been done for two categories: Prehypertension (normal levels) and Hypertension (includes Stage I and Stage II). Various performance metrics have been obtained with two batch sizes of 3 and 16 for the fusion of the neural networks. With precision and specificity of 100% and recall of 82.1%, the LSTM model provides the best results among all combinations of Neural Networks. However, the maximum accuracy of 71.9% is achieved by the LSTM-CNN model. Further stacked Ensemble method has been used to achieve 100% accuracy for Meta-LSTM-RF, Meta- LSTM-CNN-RF and Meta- STFT-CNN-SVM.
Linear Mode Connectivity in Differentiable Tree Ensembles
Authors: Ryuichi Kanoh, Mahito Sugiyama
Subjects: Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.14596
Pdf link: https://arxiv.org/pdf/2405.14596
Abstract Linear Mode Connectivity (LMC) refers to the phenomenon that performance remains consistent for linearly interpolated models in the parameter space. For independently optimized model pairs from different random initializations, achieving LMC is considered crucial for validating the stable success of the non-convex optimization in modern machine learning models and for facilitating practical parameter-based operations such as model merging. While LMC has been achieved for neural networks by considering the permutation invariance of neurons in each hidden layer, its attainment for other models remains an open question. In this paper, we first achieve LMC for soft tree ensembles, which are tree-based differentiable models extensively used in practice. We show the necessity of incorporating two invariances: subtree flip invariance and splitting order invariance, which do not exist in neural networks but are inherent to tree architectures, in addition to permutation invariance of trees. Moreover, we demonstrate that it is even possible to exclude such additional invariances while keeping LMC by designing decision list-based tree architectures, where such invariances do not exist by definition. Our findings indicate the significance of accounting for architecture-specific invariances in achieving LMC.
Reinforcement Learning for Fine-tuning Text-to-speech Diffusion Models
Authors: Jingyi Chen, Ju-Seung Byun, Micha Elsner, Andrew Perrault
Subjects: Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2405.14632
Pdf link: https://arxiv.org/pdf/2405.14632
Abstract Recent advancements in generative models have sparked significant interest within the machine learning community. Particularly, diffusion models have demonstrated remarkable capabilities in synthesizing images and speech. Studies such as those by Lee et al. [19], Black et al. [4], Wang et al. [36], and Fan et al. [8] illustrate that Reinforcement Learning with Human Feedback (RLHF) can enhance diffusion models for image synthesis. However, due to architectural differences between these models and those employed in speech synthesis, it remains uncertain whether RLHF could similarly benefit speech synthesis models. In this paper, we explore the practical application of RLHF to diffusion-based text-to-speech synthesis, leveraging the mean opinion score (MOS) as predicted by UTokyo-SaruLab MOS prediction system [29] as a proxy loss. We introduce diffusion model loss-guided RL policy optimization (DLPO) and compare it against other RLHF approaches, employing the NISQA speech quality and naturalness assessment model [21] and human preference experiments for further evaluation. Our results show that RLHF can enhance diffusion-based text-to-speech synthesis models, and, moreover, DLPO can better improve diffusion models in generating natural and high quality speech audios.
Efficiency for Free: Ideal Data Are Transportable Representations
Authors: Peng Sun, Yi Jiang, Tao Lin
Subjects: Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2405.14669
Pdf link: https://arxiv.org/pdf/2405.14669
Abstract Data, the seminal opportunity and challenge in modern machine learning, currently constrains the scalability of representation learning and impedes the pace of model evolution. Existing paradigms tackle the issue of learning efficiency over massive datasets from the perspective of self-supervised learning and dataset distillation independently, while neglecting the untapped potential of accelerating representation learning from an intermediate standpoint. In this work, we delve into defining the ideal data properties from both optimization and generalization perspectives. We propose that model-generated representations, despite being trained on diverse tasks and architectures, converge to a shared linear space, facilitating effective linear transport between models. Furthermore, we demonstrate that these representations exhibit properties conducive to the formation of ideal data. The theoretical/empirical insights therein inspire us to propose a Representation Learning Accelerator (ReLA), which leverages a task- and architecture-agnostic, yet publicly available, free model to form a dynamic data subset and thus accelerate (self-)supervised learning. For instance, employing a CLIP ViT B/16 as a prior model for dynamic data generation, ReLA-aided BYOL can train a ResNet-50 from scratch with 50% of ImageNet-1K, yielding performance surpassing that of training on the full dataset. Additionally, employing a ResNet-18 pre-trained on CIFAR-10 can enhance ResNet-50 training on 10% of ImageNet-1K, resulting in a 7.7% increase in accuracy.
Defining error accumulation in ML atmospheric simulators
Authors: Raghul Parthipan, Mohit Anand, Hannah M. Christensen, J. Scott Hosking, Damon J. Wischik
Subjects: Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.14714
Pdf link: https://arxiv.org/pdf/2405.14714
Abstract Machine learning (ML) has recently shown significant promise in modelling atmospheric systems, such as the weather. Many of these ML models are autoregressive, and error accumulation in their forecasts is a key problem. However, there is no clear definition of what `error accumulation' actually entails. In this paper, we propose a definition and an associated metric to measure it. Our definition distinguishes between errors which are due to model deficiencies, which we may hope to fix, and those due to the intrinsic properties of atmospheric systems (chaos, unobserved variables), which are not fixable. We illustrate the usefulness of this definition by proposing a simple regularization loss penalty inspired by it. This approach shows performance improvements (according to RMSE and spread/skill) in a selection of atmospheric systems, including the real-world weather prediction task.
A Systematic and Formal Study of the Impact of Local Differential Privacy on Fairness: Preliminary Results
Authors: Karima Makhlouf, Tamara Stefanovic, Heber H. Arcolezi, Catuscia Palamidessi
Subjects: Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2405.14725
Pdf link: https://arxiv.org/pdf/2405.14725
Abstract Machine learning (ML) algorithms rely primarily on the availability of training data, and, depending on the domain, these data may include sensitive information about the data providers, thus leading to significant privacy issues. Differential privacy (DP) is the predominant solution for privacy-preserving ML, and the local model of DP is the preferred choice when the server or the data collector are not trusted. Recent experimental studies have shown that local DP can impact ML prediction for different subgroups of individuals, thus affecting fair decision-making. However, the results are conflicting in the sense that some studies show a positive impact of privacy on fairness while others show a negative one. In this work, we conduct a systematic and formal study of the effect of local DP on fairness. Specifically, we perform a quantitative study of how the fairness of the decisions made by the ML model changes under local DP for different levels of privacy and data distributions. In particular, we provide bounds in terms of the joint distributions and the privacy level, delimiting the extent to which local DP can impact the fairness of the model. We characterize the cases in which privacy reduces discrimination and those with the opposite effect. We validate our theoretical findings on synthetic and real-world datasets. Our results are preliminary in the sense that, for now, we study only the case of one sensitive attribute, and only statistical disparity, conditional statistical disparity, and equal opportunity difference.
CLIPScope: Enhancing Zero-Shot OOD Detection with Bayesian Scoring
Authors: Hao Fu, Naman Patel, Prashanth Krishnamurthy, Farshad Khorrami
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2405.14737
Pdf link: https://arxiv.org/pdf/2405.14737
Abstract Detection of out-of-distribution (OOD) samples is crucial for safe real-world deployment of machine learning models. Recent advances in vision language foundation models have made them capable of detecting OOD samples without requiring in-distribution (ID) images. However, these zero-shot methods often underperform as they do not adequately consider ID class likelihoods in their detection confidence scoring. Hence, we introduce CLIPScope, a zero-shot OOD detection approach that normalizes the confidence score of a sample by class likelihoods, akin to a Bayesian posterior update. Furthermore, CLIPScope incorporates a novel strategy to mine OOD classes from a large lexical database. It selects class labels that are farthest and nearest to ID classes in terms of CLIP embedding distance to maximize coverage of OOD samples. We conduct extensive ablation studies and empirical evaluations, demonstrating state of the art performance of CLIPScope across various OOD detection benchmarks.
Iterative Causal Segmentation: Filling the Gap between Market Segmentation and Marketing Strategy
Authors: Kaihua Ding, Jingsong Cui, Mohammad Soltani, Jing Jin
Subjects: Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2405.14743
Pdf link: https://arxiv.org/pdf/2405.14743
Abstract The field of causal Machine Learning (ML) has made significant strides in recent years. Notable breakthroughs include methods such as meta learners (arXiv:1706.03461v6) and heterogeneous doubly robust estimators (arXiv:2004.14497) introduced in the last five years. Despite these advancements, the field still faces challenges, particularly in managing tightly coupled systems where both the causal treatment variable and a confounding covariate must serve as key decision-making indicators. This scenario is common in applications of causal ML for marketing, such as marketing segmentation and incremental marketing uplift. In this work, we present our formally proven algorithm, iterative causal segmentation, to address this issue.
A Transformer-Based Approach for Smart Invocation of Automatic Code Completion
Authors: Aral de Moor, Arie van Deursen, Maliheh Izadi
Subjects: Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.14753
Pdf link: https://arxiv.org/pdf/2405.14753
Abstract Transformer-based language models are highly effective for code completion, with much research dedicated to enhancing the content of these completions. Despite their effectiveness, these models come with high operational costs and can be intrusive, especially when they suggest too often and interrupt developers who are concentrating on their work. Current research largely overlooks how these models interact with developers in practice and neglects to address when a developer should receive completion suggestions. To tackle this issue, we developed a machine learning model that can accurately predict when to invoke a code completion tool given the code context and available telemetry data. To do so, we collect a dataset of 200k developer interactions with our cross-IDE code completion plugin and train several invocation filtering models. Our results indicate that our small-scale transformer model significantly outperforms the baseline while maintaining low enough latency. We further explore the search space for integrating additional telemetry data into a pre-trained transformer directly and obtain promising results. To further demonstrate our approach's practical potential, we deployed the model in an online environment with 34 developers and provided real-world insights based on 74k actual invocations.
Applied Machine Learning to Anomaly Detection in Enterprise Purchase Processes
Authors: A. Herreros-Martínez, R. Magdalena-Benedicto, J. Vila-Francés, A.J. Serrano-López, S. Pérez-Díaz
Subjects: Subjects: Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.14754
Pdf link: https://arxiv.org/pdf/2405.14754
Abstract In a context of a continuous digitalisation of processes, organisations must deal with the challenge of detecting anomalies that can reveal suspicious activities upon an increasing volume of data. To pursue this goal, audit engagements are carried out regularly, and internal auditors and purchase specialists are constantly looking for new methods to automate these processes. This work proposes a methodology to prioritise the investigation of the cases detected in two large purchase datasets from real data. The goal is to contribute to the effectiveness of the companies' control efforts and to increase the performance of carrying out such tasks. A comprehensive Exploratory Data Analysis is carried out before using unsupervised Machine Learning techniques addressed to detect anomalies. A univariate approach has been applied through the z-Score index and the DBSCAN algorithm, while a multivariate analysis is implemented with the k-Means and Isolation Forest algorithms, and the Silhouette index, resulting in each method having a transaction candidates' proposal to be reviewed. An ensemble prioritisation of the candidates is provided jointly with a proposal of explicability methods (LIME, Shapley, SHAP) to help the company specialists in their understanding.
Fault Tolerant ML: Efficient Meta-Aggregation and Synchronous Training
Authors: Tehila Dahan, Kfir Y. Levy
Subjects: Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.14759
Pdf link: https://arxiv.org/pdf/2405.14759
Abstract In this paper, we investigate the challenging framework of Byzantine-robust training in distributed machine learning (ML) systems, focusing on enhancing both efficiency and practicality. As distributed ML systems become integral for complex ML tasks, ensuring resilience against Byzantine failures-where workers may contribute incorrect updates due to malice or error-gains paramount importance. Our first contribution is the introduction of the Centered Trimmed Meta Aggregator (CTMA), an efficient meta-aggregator that upgrades baseline aggregators to optimal performance levels, while requiring low computational demands. Additionally, we propose harnessing a recently developed gradient estimation technique based on a double-momentum strategy within the Byzantine context. Our paper highlights its theoretical and practical advantages for Byzantine-robust training, especially in simplifying the tuning process and reducing the reliance on numerous hyperparameters. The effectiveness of this technique is supported by theoretical insights within the stochastic convex optimization (SCO) framework.

qiaoyuet / arxiv_daily

New submissions for Friday, 24 May 2024 (showing 955 of 955 entries ) #108

Keyword: differential privacy

A Huber Loss Minimization Approach to Mean Estimation under User-level Differential Privacy

Naturally Private Recommendations with Determinantal Point Processes

Federated Domain-Specific Knowledge Transfer on Large Language Models Using Synthetic Data

A Systematic and Formal Study of the Impact of Local Differential Privacy on Fairness: Preliminary Results

Keyword: privacy

StatAvg: Mitigating Data Heterogeneity in Federated Learning for Intrusion Detection Systems

EmInspector: Combating Backdoor Attacks in Federated Self-Supervised Learning Through Embedding Inspection

Multi-domain Knowledge Graph Collaborative Pre-training and Prompt Tuning for Diverse Downstream Tasks

FedASTA: Federated adaptive spatial-temporal attention for traffic flow prediction

Generating A Crowdsourced Conversation Dataset to Combat Cybergrooming

A Privacy-Preserving DAO Model Using NFT Authentication for the Punishment not Reward Blockchain Architecture

Bytes to Schlep? Use a FEP: Hiding Protocol Metadata with Fully Encrypted Protocols

On the Challenges of Creating Datasets for Analyzing Commercial Sex Advertisements to Assess Human Trafficking Risk and Organized Activity

AdpQ: A Zero-shot Calibration Free Adaptive Post Training Quantization Method for LLMs

FedCache 2.0: Exploiting the Potential of Distilled Data in Knowledge Cache-driven Federated Learning

The Illusion of Anonymity: Uncovering the Impact of User Actions on Privacy in Web3 Social Ecosystems

Task-agnostic Decision Transformer for Multi-type Agent Control with Federated Split Training

A Huber Loss Minimization Approach to Mean Estimation under User-level Differential Privacy

AdaFedFR: Federated Face Recognition with Adaptive Inter-Class Representation Learning

Naturally Private Recommendations with Determinantal Point Processes

A Privacy Measure Turned Upside Down? Investigating the Use of HTTP Client Hints on the Web

CG-FedLLM: How to Compress Gradients in Federated Fune-tuning for Large Language Models

Guarding Multiple Secrets: Enhanced Summary Statistic Privacy for Data Sharing

Diffusion-Based Cloud-Edge-Device Collaborative Learning for Next POI Recommendations

Federated Learning in Healthcare: Model Misconducts, Security, Challenges, Applications, and Future Research Directions -- A Systematic Review

AI-Protected Blockchain-based IoT environments: Harnessing the Future of Network Security and Privacy

What Do Privacy Advertisements Communicate to Consumers?

Rehearsal-free Federated Domain-incremental Learning

Memory Scraping Attack on Xilinx FPGAs: Private Data Extraction from Terminated Processes

Remote Keylogging Attacks in Multi-user VR Applications

Nearly Tight Black-Box Auditing of Differentially Private Machine Learning

Federated Domain-Specific Knowledge Transfer on Large Language Models Using Synthetic Data

Time-FFM: Towards LM-Empowered Federated Foundation Model for Time Series Forecasting

Variational Bayes for Federated Continual Learning

EdgeShard: Efficient LLM Inference via Collaborative Edge Computing

Gradient Transformation: Towards Efficient and Model-Agnostic Unlearning for Dynamic Graph Neural Networks

GeoFaaS: An Edge-to-Cloud FaaS Platform

Worldwide Federated Training of Language Models

Tighter Privacy Auditing of DP-SGD in the Hidden State Threat Model

A Comprehensive Overview of Large Language Models (LLMs) for Cyber Defences: Opportunities and Directions

Identity Inference from CLIP Models using Only Textual Data

Towards Privacy-Aware and Personalised Assistive Robots: A User-Centred Approach

PrivCirNet: Efficient Private Inference via Block Circulant Transformation

A Systematic and Formal Study of the Impact of Local Differential Privacy on Fairness: Preliminary Results

Recurrent Early Exits for Federated Learning with Heterogeneous Clients

Membership Inference on Text-to-Image Diffusion Models via Conditional Likelihood Discrepancy

Keyword: machine learning

A Systematic Analysis on the Temporal Generalization of Language Models in Social Media

A Machine Learning Approach for Predicting Upper Limb Motion Intentions with Multimodal Data in Virtual Reality

Intelligent Tutor: Leveraging ChatGPT and Microsoft Copilot Studio to Deliver a Generative AI Student Support and Feedback System within Teams

Assessing Political Bias in Large Language Models

Towards Contactless Elevators with TinyML using CNN-based Person Detection and Keyword Spotting

A Survey of Artificial Intelligence in Gait-Based Neurodegenerative Disease Diagnosis

Combining Relevance and Magnitude for Resource-Aware DNN Pruning

The Role of Emotions in Informational Support Question-Response Pairs in Online Health Communities: A Multimodal Deep Learning Approach

A novel reliability attack of Physical Unclonable Functions

Automated categorization of pre-trained models for software engineering: A case study with a Hugging Face dataset

A machine learning framework for interpretable predictions in patient pathways: The case of predicting ICU admission for patients with symptoms of sepsis

Pragmatic auditing: a pilot-driven approach for auditing Machine Learning systems

BenchNav: Simulation Platform for Benchmarking Off-road Navigation Algorithms with Probabilistic Traversability

FedCache 2.0: Exploiting the Potential of Distilled Data in Knowledge Cache-driven Federated Learning

NFCL: Simply interpretable neural networks for a short-term multivariate forecasting

Why do explanations fail? A typology and discussion on failures in XAI

A Near-Real-Time Processing Ego Speech Filtering Pipeline Designed for Speech Interruption During Human-Robot Interaction

Bond Graphs for multi-physics informed Neural Networks for multi-variate time series

Almost sure convergence rates of stochastic gradient methods under gradient domination

Timbre Perception, Representation, and its Neuroscientific Exploration: A Comprehensive Review

Naturally Private Recommendations with Determinantal Point Processes

Challenging Gradient Boosted Decision Trees with Tabular Transformers for Fraud Detection at Booking.com

Mining Action Rules for Defect Reduction Planning

A Dynamic Model of Performative Human-ML Collaboration: Theory and Empirical Evidence

Counterfactual Gradients-based Quantification of Prediction Trust in Neural Networks

On the stability of second order gradient descent for time varying convex functions

Efficient Two-Stage Gaussian Process Regression Via Automatic Kernel Search and Subsampling

Towards Explainable Test Case Prioritisation with Learning-to-Rank Models

Koopcon: A new approach towards smarter and less complex learning

LOGIN: A Large Language Model Consulted Graph Neural Network Training Framework