New submissions for Tuesday, 11 June 2024 (showing 666 of 666 entries )

Keyword: differential privacy

Efficient Differentially Private Fine-Tuning of Diffusion Models

Authors: Jing Liu, Andrew Lowy, Toshiaki Koike-Akino, Kieran Parsons, Ye Wang
Subjects: Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2406.05257
Pdf link: https://arxiv.org/pdf/2406.05257
Abstract The recent developments of Diffusion Models (DMs) enable generation of astonishingly high-quality synthetic samples. Recent work showed that the synthetic samples generated by the diffusion model, which is pre-trained on public data and fully fine-tuned with differential privacy on private data, can train a downstream classifier, while achieving a good privacy-utility tradeoff. However, fully fine-tuning such large diffusion models with DP-SGD can be very resource-demanding in terms of memory usage and computation. In this work, we investigate Parameter-Efficient Fine-Tuning (PEFT) of diffusion models using Low-Dimensional Adaptation (LoDA) with Differential Privacy. We evaluate the proposed method with the MNIST and CIFAR-10 datasets and demonstrate that such efficient fine-tuning can also generate useful synthetic samples for training downstream classifiers, with guaranteed privacy protection of fine-tuning data. Our source code will be made available on GitHub.
Privacy-Preserving Optimal Parameter Selection for Collaborative Clustering
Authors: Maryam Ghasemian, Erman Ayday
Subjects: Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2406.05545
Pdf link: https://arxiv.org/pdf/2406.05545
Abstract This study investigates the optimal selection of parameters for collaborative clustering while ensuring data privacy. We focus on key clustering algorithms within a collaborative framework, where multiple data owners combine their data. A semi-trusted server assists in recommending the most suitable clustering algorithm and its parameters. Our findings indicate that the privacy parameter ($\epsilon$) minimally impacts the server's recommendations, but an increase in $\epsilon$ raises the risk of membership inference attacks, where sensitive information might be inferred. To mitigate these risks, we implement differential privacy techniques, particularly the Randomized Response mechanism, to add noise and protect data privacy. Our approach demonstrates that high-quality clustering can be achieved while maintaining data confidentiality, as evidenced by metrics such as the Adjusted Rand Index and Silhouette Score. This study contributes to privacy-aware data sharing, optimal algorithm and parameter selection, and effective communication between data owners and the server.
Comments on "Federated Learning with Differential Privacy: Algorithms and Performance Analysis"
Authors: Mahtab Talaei, Iman Izadi
Subjects: Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR); Performance (cs.PF)
Arxiv link: https://arxiv.org/abs/2406.05858
Pdf link: https://arxiv.org/pdf/2406.05858
Abstract In the paper by Wei et al. ("Federated Learning with Differential Privacy: Algorithms and Performance Analysis"), the convergence performance of the proposed differential privacy algorithm in federated learning (FL), known as Noising before Model Aggregation FL (NbAFL), was studied. However, the presented convergence upper bound of NbAFL (Theorem 2) is incorrect. This comment aims to present the correct form of the convergence upper bound for NbAFL.
Keyword: privacy

oBAKE: an Online Biometric-Authenticated Key Exchange Protocol
Authors: Haochen M. Kotoi-Xie, Takumi Moriyama
Subjects: Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2406.05134
Pdf link: https://arxiv.org/pdf/2406.05134
Abstract In this writing, we introduce a novel biometric-authenticated key exchange protocol that allows secure and privacy-preserving key establishment between a stateless biometric sensing system and a "smart" user token that possesses biometric templates of the user. The protocol yields a shared secret incorporating random nonce from both parties when they positively authenticate each other. Mutual positive authentication here is defined as when the feature vector calculated from the sensor data captured by the biometric sensing system only differs from the feature vector stored as the biometric template within the user token by less than a predefined threshold. The parties exchange only randomized data and cryptographically derived verifiers; no significant information regarding the vectors is ever exchanged. The protocol essentially utilizes the BBKDF scheme for feature vector matching, and as a result, the threshold is compared per component of the two vectors to be matched. This fact makes it straightforward to employ multiple biometric modalities. The protocol also allows online authentication where the biometric sensing system can potentially send multiple queries derived from different sensor data samples, in one or more rounds. The protocol is designed in such a way that the user token can very efficiently answer a multitude of such queries. This makes the protocol especially suitable for interactive systems while posing a minimal computational burden on the user token. The biometric sensing system can be made stateless, i.e. user registration in advance is not required. Furthermore, the protocol is bidirectionally privacy-preserving in the sense that unless mutual authentication is achieved first, neither the biometric sensing system, nor the user token can gain useful information, respectively regarding the biometric template, or sensor-data-derived feature vectors.
CredSec: A Blockchain-based Secure Credential Management System for University Adoption
Authors: Md. Ahsan Habib, Md. Mostafijur Rahman, Nieb Hasan Neom
Subjects: Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2406.05151
Pdf link: https://arxiv.org/pdf/2406.05151
Abstract University education play a critical role in shaping intellectual and professional development of the individuals and contribute significantly to the advancement of knowledge and society. Generally, university authority has a direct control of students result making and stores the credential in their local dedicated server. So, there is chance to alter the credential and also have a very high possibility to encounter various threats and different security attacks. To resolve these, we propose a blockchain based secure credential management system (BCMS) for efficiently storing, managing and recovering credential without involving the university authority. The proposed BCMS incorporates a modified two factor encryption (m2FE) technique, a combination of RSA cryptosystem and a DNA encoding to ensure credential privacy and an enhanced authentication scheme for teachers and students. Besides, to reduce size of the cipher credential and its conversion time, we use character to integer (C2I) table instead of ASCII table. Finally, the experimental result and analysis of the BCMS illustrate the effectiveness over state of the art works.
Federated LoRA with Sparse Communication
Authors: Kevin Kuo, Arian Raje, Kousik Rajesh, Virginia Smith
Subjects: Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2406.05233
Pdf link: https://arxiv.org/pdf/2406.05233
Abstract Low-rank adaptation (LoRA) is a natural method for finetuning in communication-constrained machine learning settings such as cross-device federated learning. Prior work that has studied LoRA in the context of federated learning has focused on improving LoRA's robustness to heterogeneity and privacy. In this work, we instead consider techniques for further improving communication-efficiency in federated LoRA. Unfortunately, we show that centralized ML methods that improve the efficiency of LoRA through unstructured pruning do not transfer well to federated settings. We instead study a simple approach, \textbf{FLASC}, that applies sparsity to LoRA during communication while allowing clients to locally fine-tune the entire LoRA module. Across four common federated learning tasks, we demonstrate that this method matches the performance of dense LoRA with up to $10\times$ less communication. Additionally, despite being designed primarily to target communication, we find that this approach has benefits in terms of heterogeneity and privacy relative to existing approaches tailored to these specific concerns. Overall, our work highlights the importance of considering system-specific constraints when developing communication-efficient finetuning approaches, and serves as a simple and competitive baseline for future work in federated finetuning.
A Language Model-Guided Framework for Mining Time Series with Distributional Shifts
Authors: Haibei Zhu, Yousef El-Laham, Elizabeth Fons, Svitlana Vyetrenko
Subjects: Subjects: Computational Engineering, Finance, and Science (cs.CE); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2406.05249
Pdf link: https://arxiv.org/pdf/2406.05249
Abstract Effective utilization of time series data is often constrained by the scarcity of data quantity that reflects complex dynamics, especially under the condition of distributional shifts. Existing datasets may not encompass the full range of statistical properties required for robust and comprehensive analysis. And privacy concerns can further limit their accessibility in domains such as finance and healthcare. This paper presents an approach that utilizes large language models and data source interfaces to explore and collect time series datasets. While obtained from external sources, the collected data share critical statistical properties with primary time series datasets, making it possible to model and adapt to various scenarios. This method enlarges the data quantity when the original data is limited or lacks essential properties. It suggests that collected datasets can effectively supplement existing datasets, especially involving changes in data distribution. We demonstrate the effectiveness of the collected datasets through practical examples and show how time series forecasting foundation models fine-tuned on these datasets achieve comparable performance to those models without fine-tuning.
Efficient Differentially Private Fine-Tuning of Diffusion Models
Authors: Jing Liu, Andrew Lowy, Toshiaki Koike-Akino, Kieran Parsons, Ye Wang
Subjects: Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2406.05257
Pdf link: https://arxiv.org/pdf/2406.05257
Abstract The recent developments of Diffusion Models (DMs) enable generation of astonishingly high-quality synthetic samples. Recent work showed that the synthetic samples generated by the diffusion model, which is pre-trained on public data and fully fine-tuned with differential privacy on private data, can train a downstream classifier, while achieving a good privacy-utility tradeoff. However, fully fine-tuning such large diffusion models with DP-SGD can be very resource-demanding in terms of memory usage and computation. In this work, we investigate Parameter-Efficient Fine-Tuning (PEFT) of diffusion models using Low-Dimensional Adaptation (LoDA) with Differential Privacy. We evaluate the proposed method with the MNIST and CIFAR-10 datasets and demonstrate that such efficient fine-tuning can also generate useful synthetic samples for training downstream classifiers, with guaranteed privacy protection of fine-tuning data. Our source code will be made available on GitHub.
Is On-Device AI Broken and Exploitable? Assessing the Trust and Ethics in Small Language Models
Authors: Kalyan Nakka, Jimmy Dani, Nitesh Saxena
Subjects: Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2406.05364
Pdf link: https://arxiv.org/pdf/2406.05364
Abstract In this paper, we present a very first study to investigate trust and ethical implications of on-device artificial intelligence (AI), focusing on ''small'' language models (SLMs) amenable for personal devices like smartphones. While on-device SLMs promise enhanced privacy, reduced latency, and improved user experience compared to cloud-based services, we posit that they might also introduce significant challenges and vulnerabilities compared to on-server counterparts. As part of our trust assessment study, we conduct a systematic evaluation of the state-of-the-art on-devices SLMs, contrasted to their on-server counterparts, based on a well-established trustworthiness measurement framework. Our results show on-device SLMs to be (statistically) significantly less trustworthy, specifically demonstrating more stereotypical, unfair and privacy-breaching behavior. Informed by these findings, we then perform our ethics assessment study by inferring whether SLMs would provide responses to potentially unethical vanilla prompts, collated from prior jailbreaking and prompt engineering studies and other sources. Strikingly, the on-device SLMs did answer valid responses to these prompts, which ideally should be rejected. Even more seriously, the on-device SLMs responded with valid answers without any filters and without the need for any jailbreaking or prompt engineering. These responses can be abused for various harmful and unethical scenarios including: societal harm, illegal activities, hate, self-harm, exploitable phishing content and exploitable code, all of which indicates the high vulnerability and exploitability of these on-device SLMs. Overall, our findings highlight gaping vulnerabilities in state-of-the-art on-device AI which seem to stem from resource constraints faced by these models and which may make typical defenses fundamentally challenging to be deployed in these environments.
PTF-FSR: A Parameter Transmission-Free Federated Sequential Recommender System
Authors: Wei Yuan, Chaoqun Yang, Liang Qu, Quoc Viet Hung Nguyen, Guanhua Ye, Hongzhi Yin
Subjects: Subjects: Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2406.05387
Pdf link: https://arxiv.org/pdf/2406.05387
Abstract Sequential recommender systems have made significant progress. Recently, due to increasing concerns about user data privacy, some researchers have implemented federated learning for sequential recommendation, a.k.a., Federated Sequential Recommender Systems (FedSeqRecs), in which a public sequential recommender model is shared and frequently transmitted between a central server and clients to achieve collaborative learning. Although these solutions mitigate user privacy to some extent, they present two significant limitations that affect their practical usability: (1) They require a globally shared sequential recommendation model. However, in real-world scenarios, the recommendation model constitutes a critical intellectual property for platform and service providers. Therefore, service providers may be reluctant to disclose their meticulously developed models. (2) The communication costs are high as they correlate with the number of model parameters. This becomes particularly problematic as the current FedSeqRec will be inapplicable when sequential recommendation marches into a large language model era. To overcome the above challenges, this paper proposes a parameter transmission-free federated sequential recommendation framework (PTF-FSR), which ensures both model and data privacy protection to meet the privacy needs of service providers and system users alike. Furthermore, since PTF-FSR only transmits prediction results under privacy protection, which are independent of model sizes, this new federated learning architecture can accommodate more complex and larger sequential recommendation models. Extensive experiments conducted on three widely used recommendation datasets, employing various sequential recommendation models from both ID-based and ID-free paradigms, demonstrate the effectiveness and generalization capability of our proposed framework.
Deconstructing The Ethics of Large Language Models from Long-standing Issues to New-emerging Dilemmas
Authors: Chengyuan Deng, Yiqun Duan, Xin Jin, Heng Chang, Yijun Tian, Han Liu, Henry Peng Zou, Yiqiao Jin, Yijia Xiao, Yichen Wang, Shenghao Wu, Zongxing Xie, Kuofeng Gao, Sihong He, Jun Zhuang, Lu Cheng, Haohan Wang
Subjects: Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2406.05392
Pdf link: https://arxiv.org/pdf/2406.05392
Abstract Large Language Models (LLMs) have achieved unparalleled success across diverse language modeling tasks in recent years. However, this progress has also intensified ethical concerns, impacting the deployment of LLMs in everyday contexts. This paper provides a comprehensive survey of ethical challenges associated with LLMs, from longstanding issues such as copyright infringement, systematic bias, and data privacy, to emerging problems like truthfulness and social norms. We critically analyze existing research aimed at understanding, examining, and mitigating these ethical risks. Our survey underscores integrating ethical standards and societal values into the development of LLMs, thereby guiding the development of responsible and ethically aligned language models.
PrivacyCube: Data Physicalization for Enhancing Privacy Awareness in IoT
Authors: Bayan Al Muhander, Nalin Arachchilage, Yasar Majib, Mohammed Alosaimi, Omer Rana, Charith Perera
Subjects: Subjects: Cryptography and Security (cs.CR); Emerging Technologies (cs.ET)
Arxiv link: https://arxiv.org/abs/2406.05451
Pdf link: https://arxiv.org/pdf/2406.05451
Abstract People are increasingly bringing Internet of Things (IoT) devices into their homes without understanding how their data is gathered, processed, and used. We describe PrivacyCube, a novel data physicalization designed to increase privacy awareness within smart home environments. PrivacyCube visualizes IoT data consumption by displaying privacy-related notices. PrivacyCube aims to assist smart home occupants to (i) understand their data privacy better and (ii) have conversations around data management practices of IoT devices used within their homes. Using PrivacyCube, households can learn and make informed privacy decisions collectively. To evaluate PrivacyCube, we used multiple research methods throughout the different stages of design. We first conducted a focus group study in two stages with six participants to compare PrivacyCube to text and state-of-the-art privacy policies. We then deployed PrivacyCube in a 14-day-long field study with eight households. Our results show that PrivacyCube helps home occupants comprehend IoT privacy better with significantly increased privacy awareness at p < .05 (p=0.00041, t= -5.57). Participants preferred PrivacyCube over text privacy policies because it was comprehensive and easier to use. PrivacyCube and Privacy Label, a state-of-the-art approach, both received positive reviews from participants, with PrivacyCube being preferred for its interactivity and ability to encourage conversations. PrivacyCube was also considered by home occupants as a piece of home furniture, encouraging them to socialize and discuss IoT privacy implications using this device.
PriviFy: Designing Tangible Interfaces for Configuring IoT Privacy Preferences
Authors: Bayan Al Muhander, Omer Rana, Charith Perera
Subjects: Subjects: Cryptography and Security (cs.CR); Emerging Technologies (cs.ET)
Arxiv link: https://arxiv.org/abs/2406.05459
Pdf link: https://arxiv.org/pdf/2406.05459
Abstract The Internet of Things (IoT) devices, such as smart speakers can collect sensitive user data, necessitating the need for users to manage their privacy preferences. However, configuring these preferences presents users with multiple challenges. Existing privacy controls often lack transparency, are hard to understand, and do not provide meaningful choices. On top of that, users struggle to locate privacy settings due to multiple menus or confusing labeling, which discourages them from using these controls. We introduce PriviFy (Privacy Simplify-er), a novel and user-friendly tangible interface that can simplify the configuration of smart devices privacy settings. PriviFy is designed to propose an enhancement to existing hardware by integrating additional features that improve privacy management. We envision that positive feedback and user experiences from our study will inspire consumer product developers and smart device manufacturers to incorporate the useful design elements we have identified. Using fidelity prototyping, we iteratively designed PriviFy prototype with 20 participants to include interactive features such as knobs, buttons, lights, and notifications that allow users to configure their data privacy preferences and receive confirmation of their choices. We further evaluated PriviFy high-fidelity prototype with 20 more participants. Our results show that PriviFy helps simplify the complexity of privacy preferences configuration with a significant usability score at p < .05 (P = 0.000000017, t = -8.8639). PriviFy successfully met users privacy needs and enabled them to regain control over their data. We conclude by recommending the importance of designing specific privacy configuration options.
Blockchain Integrated Federated Learning in Edge-Fog-Cloud Systems for IoT based Healthcare Applications A Survey
Authors: Shinu M. Rajagopal, Supriya M., Rajkumar Buyya
Subjects: Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2406.05517
Pdf link: https://arxiv.org/pdf/2406.05517
Abstract Modern Internet of Things (IoT) applications generate enormous amounts of data, making data-driven machine learning essential for developing precise and reliable statistical models. However, data is often stored in silos, and strict user-privacy legislation complicates data utilization, limiting machine learning's potential in traditional centralized paradigms due to diverse data probability distributions and lack of personalization. Federated learning, a new distributed paradigm, supports collaborative learning while preserving privacy, making it ideal for IoT applications. By employing cryptographic techniques, IoT systems can securely store and transmit data, ensuring consistency. The integration of federated learning and blockchain is particularly advantageous for handling sensitive data, such as in healthcare. Despite the potential of these technologies, a comprehensive examination of their integration in edge-fog-cloud-based IoT computing systems and healthcare applications is needed. This survey article explores the architecture, structure, functions, and characteristics of federated learning and blockchain, their applications in various computing paradigms, and evaluates their implementations in healthcare.
Privacy-Preserving Optimal Parameter Selection for Collaborative Clustering
Authors: Maryam Ghasemian, Erman Ayday
Subjects: Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2406.05545
Pdf link: https://arxiv.org/pdf/2406.05545
Abstract This study investigates the optimal selection of parameters for collaborative clustering while ensuring data privacy. We focus on key clustering algorithms within a collaborative framework, where multiple data owners combine their data. A semi-trusted server assists in recommending the most suitable clustering algorithm and its parameters. Our findings indicate that the privacy parameter ($\epsilon$) minimally impacts the server's recommendations, but an increase in $\epsilon$ raises the risk of membership inference attacks, where sensitive information might be inferred. To mitigate these risks, we implement differential privacy techniques, particularly the Randomized Response mechanism, to add noise and protect data privacy. Our approach demonstrates that high-quality clustering can be achieved while maintaining data confidentiality, as evidenced by metrics such as the Adjusted Rand Index and Silhouette Score. This study contributes to privacy-aware data sharing, optimal algorithm and parameter selection, and effective communication between data owners and the server.
CCSI: Continual Class-Specific Impression for Data-free Class Incremental Learning
Authors: Sana Ayromlou, Teresa Tsang, Purang Abolmaesumi, Xiaoxiao Li
Subjects: Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2406.05631
Pdf link: https://arxiv.org/pdf/2406.05631
Abstract In real-world clinical settings, traditional deep learning-based classification methods struggle with diagnosing newly introduced disease types because they require samples from all disease classes for offline training. Class incremental learning offers a promising solution by adapting a deep network trained on specific disease classes to handle new diseases. However, catastrophic forgetting occurs, decreasing the performance of earlier classes when adapting the model to new data. Prior proposed methodologies to overcome this require perpetual storage of previous samples, posing potential practical concerns regarding privacy and storage regulations in healthcare. To this end, we propose a novel data-free class incremental learning framework that utilizes data synthesis on learned classes instead of data storage from previous classes. Our key contributions include acquiring synthetic data known as Continual Class-Specific Impression (CCSI) for previously inaccessible trained classes and presenting a methodology to effectively utilize this data for updating networks when introducing new classes. We obtain CCSI by employing data inversion over gradients of the trained classification model on previous classes starting from the mean image of each class inspired by common landmarks shared among medical images and utilizing continual normalization layers statistics as a regularizer in this pixel-wise optimization process. Subsequently, we update the network by combining the synthesized data with new class data and incorporate several losses, including an intra-domain contrastive loss to generalize the deep network trained on the synthesized data to real data, a margin loss to increase separation among previous classes and new ones, and a cosine-normalized cross-entropy loss to alleviate the adverse effects of imbalanced distributions in training data.
A Superalignment Framework in Autonomous Driving with Large Language Models
Authors: Xiangrui Kong, Thomas Braunl, Marco Fahmi, Yue Wang
Subjects: Subjects: Robotics (cs.RO); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2406.05651
Pdf link: https://arxiv.org/pdf/2406.05651
Abstract Over the last year, significant advancements have been made in the realms of large language models (LLMs) and multi-modal large language models (MLLMs), particularly in their application to autonomous driving. These models have showcased remarkable abilities in processing and interacting with complex information. In autonomous driving, LLMs and MLLMs are extensively used, requiring access to sensitive vehicle data such as precise locations, images, and road conditions. These data are transmitted to an LLM-based inference cloud for advanced analysis. However, concerns arise regarding data security, as the protection against data and privacy breaches primarily depends on the LLM's inherent security measures, without additional scrutiny or evaluation of the LLM's inference outputs. Despite its importance, the security aspect of LLMs in autonomous driving remains underexplored. Addressing this gap, our research introduces a novel security framework for autonomous vehicles, utilizing a multi-agent LLM approach. This framework is designed to safeguard sensitive information associated with autonomous vehicles from potential leaks, while also ensuring that LLM outputs adhere to driving regulations and align with human values. It includes mechanisms to filter out irrelevant queries and verify the safety and reliability of LLM outputs. Utilizing this framework, we evaluated the security, privacy, and cost aspects of eleven large language model-driven autonomous driving cues. Additionally, we performed QA tests on these driving prompts, which successfully demonstrated the framework's efficacy.
Region of Interest Loss for Anonymizing Learned Image Compression
Authors: Christoph Liebender, Ranulfo Bezerra, Kazunori Ohno, Satoshi Tadokoro
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2406.05726
Pdf link: https://arxiv.org/pdf/2406.05726
Abstract The use of AI in public spaces continually raises concerns about privacy and the protection of sensitive data. An example is the deployment of detection and recognition methods on humans, where images are provided by surveillance cameras. This results in the acquisition of great amounts of sensitive data, since the capture and transmission of images taken by such cameras happens unaltered, for them to be received by a server on the network. However, many applications do not explicitly require the identity of a given person in a scene; An anonymized representation containing information of the person's position while preserving the context of them in the scene suffices. We show how using a customized loss function on region of interests (ROI) can achieve sufficient anonymization such that human faces become unrecognizable while persons are kept detectable, by training an end-to-end optimized autoencoder for learned image compression that utilizes the flexibility of the learned analysis and reconstruction transforms for the task of mutating parts of the compression result. This approach enables compression and anonymization in one step on the capture device, instead of transmitting sensitive, nonanonymized data over the network. Additionally, we evaluate how this anonymization impacts the average precision of pre-trained foundation models on detecting faces (MTCNN) and humans (YOLOv8) in comparison to non-ANN based methods, while considering compression rate and latency.
Methodology and Real-World Applications of Dynamic Uncertain Causality Graph for Clinical Diagnosis with Explainability and Invariance
Authors: Zhan Zhang, Qin Zhang, Yang Jiao, Lin Lu, Lin Ma, Aihua Liu, Xiao Liu, Juan Zhao, Yajun Xue, Bing Wei, Mingxia Zhang, Ru Gao, Hong Zhao, Jie Lu, Fan Li, Yang Zhang, Yiming Wang, Lei Zhang, Fengwei Tian, Jie Hu, Xin Gou
Subjects: Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2406.05746
Pdf link: https://arxiv.org/pdf/2406.05746
Abstract AI-aided clinical diagnosis is desired in medical care. Existing deep learning models lack explainability and mainly focus on image analysis. The recently developed Dynamic Uncertain Causality Graph (DUCG) approach is causality-driven, explainable, and invariant across different application scenarios, without problems of data collection, labeling, fitting, privacy, bias, generalization, high cost and high energy consumption. Through close collaboration between clinical experts and DUCG technicians, 46 DUCG models covering 54 chief complaints were constructed. Over 1,000 diseases can be diagnosed without triage. Before being applied in real-world, the 46 DUCG models were retrospectively verified by third-party hospitals. The verified diagnostic precisions were no less than 95%, in which the diagnostic precision for every disease including uncommon ones was no less than 80%. After verifications, the 46 DUCG models were applied in the real-world in China. Over one million real diagnosis cases have been performed, with only 17 incorrect diagnoses identified. Due to DUCG's transparency, the mistakes causing the incorrect diagnoses were found and corrected. The diagnostic abilities of the clinicians who applied DUCG frequently were improved significantly. Following the introduction to the earlier presented DUCG methodology, the recommendation algorithm for potential medical checks is presented and the key idea of DUCG is extracted.
Comments on "Federated Learning with Differential Privacy: Algorithms and Performance Analysis"
Authors: Mahtab Talaei, Iman Izadi
Subjects: Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR); Performance (cs.PF)
Arxiv link: https://arxiv.org/abs/2406.05858
Pdf link: https://arxiv.org/pdf/2406.05858
Abstract In the paper by Wei et al. ("Federated Learning with Differential Privacy: Algorithms and Performance Analysis"), the convergence performance of the proposed differential privacy algorithm in federated learning (FL), known as Noising before Model Aggregation FL (NbAFL), was studied. However, the presented convergence upper bound of NbAFL (Theorem 2) is incorrect. This comment aims to present the correct form of the convergence upper bound for NbAFL.
Source -Free Domain Adaptation for Speaker Verification in Data-Scarce Languages and Noisy Channels
Authors: Shlomo Salo Elia, Aviad Malachi, Vered Aharonson, Gadi Pinkas
Subjects: Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2406.05863
Pdf link: https://arxiv.org/pdf/2406.05863
Abstract Domain adaptation is often hampered by exceedingly small target datasets and inaccessible source data. These conditions are prevalent in speech verification, where privacy policies and/or languages with scarce speech resources limit the availability of sufficient data. This paper explored techniques of sourcefree domain adaptation unto a limited target speech dataset for speaker verificationin data-scarce languages. Both language and channel mis-match between source and target were investigated. Fine-tuning methods were evaluated and compared across different sizes of labeled target data. A novel iterative cluster-learn algorithm was studied for unlabeled target datasets.
Jailbreaking Quantum Computers
Authors: Chuanqi Xu, Jakub Szefer
Subjects: Subjects: Cryptography and Security (cs.CR); Quantum Physics (quant-ph)
Arxiv link: https://arxiv.org/abs/2406.05941
Pdf link: https://arxiv.org/pdf/2406.05941
Abstract This work presented the first thorough exploration of the attacks on the interface between gate-level and pulse-level quantum circuits and pulse-level quantum circuits themselves. Typically, quantum circuits and programs that execute on quantum computers, are defined using gate-level primitives. However, to improve the expressivity of quantum circuits and to allow better optimization, pulse-level circuits are now often used. The attacks presented in this work leverage the inconsistency between the gate-level description of the custom gate, and the actual, low-level pulse implementation of this gate. By manipulating the custom gate specification, this work proposes numerous attacks: qubit plunder, qubit block, qubit reorder, timing mismatch, frequency mismatch, phase mismatch, and waveform mismatch. This work demonstrates these attacks on the real quantum computer and simulator, and shows that most current software development kits are vulnerable to these new types of attacks. In the end, this work proposes a defense framework. The exploration of security and privacy issues of the rising pulse-level quantum circuits provides insight into the future development of secure quantum software development kits and quantum computer systems.
Dynamic Virtual Power Plants With Frequency Regulation Capacity
Authors: Xiang Zhu, Guangchun Ruan, Hua Geng
Subjects: Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2406.05976
Pdf link: https://arxiv.org/pdf/2406.05976
Abstract For integrating heterogeneous distributed energy resources to provide fast frequency regulation, this paper proposes a dynamic virtual power plant~(DVPP) with frequency regulation capacity. A parameter anonymity-based approach is established for DVPP aggregating small-scaled inverter-based resources~(IBRs) with privacy concerns. On this basis, a parameter-to-performance mapping is formulated to evaluate how control coefficients impact the DVPP-level power overshoot as well as the IBR-level costs. The objective is to design the best way to provide the frequency response with minimal impacts on grid and the most financial gains. Numerical experiments illustrate the effectiveness of the proposed approach and further analysis validates that our models are able to take dead bands into consideration.
CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models
Authors: Peng Xia, Ze Chen, Juanxi Tian, Yangrui Gong, Ruibo Hou, Yue Xu, Zhenbang Wu, Zhiyuan Fan, Yiyang Zhou, Kangyu Zhu, Wenhao Zheng, Zhaoyang Wang, Xiao Wang, Xuchao Zhang, Chetan Bansal, Marc Niethammer, Junzhou Huang, Hongtu Zhu, Yun Li, Jimeng Sun, Zongyuan Ge, Gang Li, James Zou, Huaxiu Yao
Subjects: Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2406.06007
Pdf link: https://arxiv.org/pdf/2406.06007
Abstract Artificial intelligence has significantly impacted medical applications, particularly with the advent of Medical Large Vision Language Models (Med-LVLMs), sparking optimism for the future of automated and personalized healthcare. However, the trustworthiness of Med-LVLMs remains unverified, posing significant risks for future model deployment. In this paper, we introduce CARES and aim to comprehensively evaluate the Trustworthiness of Med-LVLMs across the medical domain. We assess the trustworthiness of Med-LVLMs across five dimensions, including trustfulness, fairness, safety, privacy, and robustness. CARES comprises about 41K question-answer pairs in both closed and open-ended formats, covering 16 medical image modalities and 27 anatomical regions. Our analysis reveals that the models consistently exhibit concerns regarding trustworthiness, often displaying factual inaccuracies and failing to maintain fairness across different demographic groups. Furthermore, they are vulnerable to attacks and demonstrate a lack of privacy awareness. We publicly release our benchmark and code in this https URL.
The Curse of Popularity: Popular Entities have Catastrophic Side Effects when Deleting Knowledge from Language Models
Authors: Ryosuke Takahashi, Go Kamoda, Benjamin Heinzerling, Keisuke Sakaguchi, Kentaro Inui
Subjects: Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2406.06032
Pdf link: https://arxiv.org/pdf/2406.06032
Abstract Language models (LMs) encode world knowledge in their internal parameters through training. However, LMs may learn personal and confidential information from the training data, leading to privacy concerns such as data leakage. Therefore, research on knowledge deletion from LMs is essential. This study focuses on the knowledge stored in LMs and analyzes the relationship between the side effects of knowledge deletion and the entities related to the knowledge. Our findings reveal that deleting knowledge related to popular entities can have catastrophic side effects. Furthermore, this research is the first to analyze knowledge deletion in models trained on synthetic knowledge graphs, indicating a new direction for controlled experiments.
A Survey on Machine Unlearning: Techniques and New Emerged Privacy Risks
Authors: Hengzhu Liu, Ping Xiong, Tianqing Zhu, Philip S. Yu
Subjects: Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2406.06186
Pdf link: https://arxiv.org/pdf/2406.06186
Abstract The explosive growth of machine learning has made it a critical infrastructure in the era of artificial intelligence. The extensive use of data poses a significant threat to individual privacy. Various countries have implemented corresponding laws, such as GDPR, to protect individuals' data privacy and the right to be forgotten. This has made machine unlearning a research hotspot in the field of privacy protection in recent years, with the aim of efficiently removing the contribution and impact of individual data from trained models. The research in academia on machine unlearning has continuously enriched its theoretical foundation, and many methods have been proposed, targeting different data removal requests in various application scenarios. However, recently researchers have found potential privacy leakages of various of machine unlearning approaches, making the privacy preservation on machine unlearning area a critical topic. This paper provides an overview and analysis of the existing research on machine unlearning, aiming to present the current vulnerabilities of machine unlearning approaches. We analyze privacy risks in various aspects, including definitions, implementation methods, and real-world applications. Compared to existing reviews, we analyze the new challenges posed by the latest malicious attack techniques on machine unlearning from the perspective of privacy threats. We hope that this survey can provide an initial but comprehensive discussion on this new emerging area.
Federated learning in food research
Authors: Zuzanna Fendor, Bas H.M. van der Velden, Xinxin Wang, Andrea Jr. Carnoli, Osman Mutlu, Ali Hürriyetoğlu
Subjects: Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2406.06202
Pdf link: https://arxiv.org/pdf/2406.06202
Abstract Research in the food domain is at times limited due to data sharing obstacles, such as data ownership, privacy requirements, and regulations. While important, these obstacles can restrict data-driven methods such as machine learning. Federated learning, the approach of training models on locally kept data and only sharing the learned parameters, is a potential technique to alleviate data sharing obstacles. This systematic review investigates the use of federated learning within the food domain, structures included papers in a federated learning framework, highlights knowledge gaps, and discusses potential applications. A total of 41 papers were included in the review. The current applications include solutions to water and milk quality assessment, cybersecurity of water processing, pesticide residue risk analysis, weed detection, and fraud detection, focusing on centralized horizontal federated learning. One of the gaps found was the lack of vertical or transfer federated learning and decentralized architectures.
A LoRa-based Energy-efficient Sensing System for Urban Data Collection
Authors: Lukas Schulthess, Tiago Salzmann, Christian Vogt, Michele Magno
Subjects: Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2406.06404
Pdf link: https://arxiv.org/pdf/2406.06404
Abstract Nowadays, cities provide much more than shopping opportunities or working spaces. Individual locations such as parks and squares are used as meeting points and local recreation areas by many people. To ensure that they remain attractive in the future, the design of such squares must be regularly adapted to the needs of the public. These utilization trends can be derived using public data collection. The more diverse and rich the data sets are, the easier it is to optimize public space design through data analysis. Traditional data collection methods such as questionnaires, observations, or videos are either labor intensive or cannot guarantee to preserve the individual's privacy. This work presents a privacy-preserving, low-power, and low-cost smart sensing system that is capable of anonymously collecting data about public space utilization by analyzing the occupancy distribution of public seating. To support future urban planning the sensor nodes are capable of monitoring environmental noise, chair utilization, and their position, temperature, and humidity and provide them over a city-wide Long Range Wide Area Network (LoRaWAN). The final sensing system's robust operation is proven in a trial run at two public squares in a city with 16 sensor nodes over a duration of two months. By consuming 33.65 mWh per day with all subsystems enabled, including sitting detection based on a continuous acceleration measurement operating on a robust and simple threshold algorithm, the custom-designed sensor node achieves continuous monitoring during the 2-month trial run. The evaluation of the experimental results clearly shows how the two locations are used, which confirms the practicability of the proposed solution. All data collected during the field trial is publicly available as open data.
DiffAudit: Auditing Privacy Practices of Online Services for Children and Adolescents
Authors: Olivia Figueira, Rahmadi Trimananda, Athina Markopoulou, Scott Jordan
Subjects: Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2406.06473
Pdf link: https://arxiv.org/pdf/2406.06473
Abstract Children's and adolescents' online data privacy are regulated by laws such as the Children's Online Privacy Protection Act (COPPA) and the California Consumer Privacy Act (CCPA). Online services that are directed towards general audiences (i.e., including children, adolescents, and adults) must comply with these laws. In this paper, first, we present DiffAudit, a platform-agnostic privacy auditing methodology for general audience services. DiffAudit performs differential analysis of network traffic data flows to compare data processing practices (i) between child, adolescent, and adult users and (ii) before and after consent is given and user age is disclosed. We also present a data type classification method that utilizes GPT-4 and our data type ontology based on COPPA and CCPA, allowing us to identify considerably more data types than prior work. Second, we apply DiffAudit to a set of popular general audience mobile and web services and observe a rich set of behaviors extracted from over 440K outgoing requests, containing 3,968 unique data types we extracted and classified. We reveal problematic data processing practices prior to consent and age disclosure, lack of differentiation between age-specific data flows, inconsistent privacy policy disclosures, and sharing of linkable data with third parties, including advertising and tracking services.
OmniLytics+: A Secure, Efficient, and Affordable Blockchain Data Market for Machine Learning through Off-Chain Processing
Authors: Songze Li, Mingzhe Liu, Mengqi Chen
Subjects: Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2406.06477
Pdf link: https://arxiv.org/pdf/2406.06477
Abstract The rapid development of large machine learning (ML) models requires a massive amount of training data, resulting in booming demands of data sharing and trading through data markets. Traditional centralized data markets suffer from low level of security, and emerging decentralized platforms are faced with efficiency and privacy challenges. In this paper, we propose OmniLytics+, the first decentralized data market, built upon blockchain and smart contract technologies, to simultaneously achieve 1) data (resp., model) privacy for the data (resp. model) owner; 2) robustness against malicious data owners; 3) efficient data validation and aggregation. Specifically, adopting the zero-knowledge (ZK) rollup paradigm, OmniLytics+ proposes to secret share encrypted local gradients, computed from the encrypted global model, with a set of untrusted off-chain servers, who collaboratively generate a ZK proof on the validity of the gradient. In this way, the storage and processing overheads are securely offloaded from blockchain verifiers, significantly improving the privacy, efficiency, and affordability over existing rollup solutions. We implement the proposed OmniLytics+ data market as an Ethereum smart contract [41]. Extensive experiments demonstrate the effectiveness of OmniLytics+ in training large ML models in presence of malicious data owner, and the substantial advantages of OmniLytics+ in gas cost and execution time over baselines.
Keyword: machine learning

Evaluating the Effectiveness of Data Augmentation for Emotion Classification in Low-Resource Settings
Authors: Aashish Arora, Elsbeth Turcan
Subjects: Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2406.05190
Pdf link: https://arxiv.org/pdf/2406.05190
Abstract Data augmentation has the potential to improve the performance of machine learning models by increasing the amount of training data available. In this study, we evaluated the effectiveness of different data augmentation techniques for a multi-label emotion classification task using a low-resource dataset. Our results showed that Back Translation outperformed autoencoder-based approaches and that generating multiple examples per training instance led to further performance improvement. In addition, we found that Back Translation generated the most diverse set of unigrams and trigrams. These findings demonstrate the utility of Back Translation in enhancing the performance of emotion classification models in resource-limited situations.
Federated LoRA with Sparse Communication
Authors: Kevin Kuo, Arian Raje, Kousik Rajesh, Virginia Smith
Subjects: Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2406.05233
Pdf link: https://arxiv.org/pdf/2406.05233
Abstract Low-rank adaptation (LoRA) is a natural method for finetuning in communication-constrained machine learning settings such as cross-device federated learning. Prior work that has studied LoRA in the context of federated learning has focused on improving LoRA's robustness to heterogeneity and privacy. In this work, we instead consider techniques for further improving communication-efficiency in federated LoRA. Unfortunately, we show that centralized ML methods that improve the efficiency of LoRA through unstructured pruning do not transfer well to federated settings. We instead study a simple approach, \textbf{FLASC}, that applies sparsity to LoRA during communication while allowing clients to locally fine-tune the entire LoRA module. Across four common federated learning tasks, we demonstrate that this method matches the performance of dense LoRA with up to $10\times$ less communication. Additionally, despite being designed primarily to target communication, we find that this approach has benefits in terms of heterogeneity and privacy relative to existing approaches tailored to these specific concerns. Overall, our work highlights the importance of considering system-specific constraints when developing communication-efficient finetuning approaches, and serves as a simple and competitive baseline for future work in federated finetuning.
Automated Trustworthiness Testing for Machine Learning Classifiers
Authors: Steven Cho, Seaton Cousins-Baxter, Stefano Ruberto, Valerio Terragni
Subjects: Subjects: Machine Learning (cs.LG); Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2406.05251
Pdf link: https://arxiv.org/pdf/2406.05251
Abstract Machine Learning (ML) has become an integral part of our society, commonly used in critical domains such as finance, healthcare, and transportation. Therefore, it is crucial to evaluate not only whether ML models make correct predictions but also whether they do so for the correct reasons, ensuring our trust that will perform well on unseen data. This concept is known as trustworthiness in ML. Recently, explainable techniques (e.g., LIME, SHAP) have been developed to interpret the decision-making processes of ML models, providing explanations for their predictions (e.g., words in the input that influenced the prediction the most). Assessing the plausibility of these explanations can enhance our confidence in the models' trustworthiness. However, current approaches typically rely on human judgment to determine the plausibility of these explanations. This paper proposes TOWER, the first technique to automatically create trustworthiness oracles that determine whether text classifier predictions are trustworthy. It leverages word embeddings to automatically evaluate the trustworthiness of a model-agnostic text classifiers based on the outputs of explanatory techniques. Our hypothesis is that a prediction is trustworthy if the words in its explanation are semantically related to the predicted class. We perform unsupervised learning with untrustworthy models obtained from noisy data to find the optimal configuration of TOWER. We then evaluated TOWER on a human-labeled trustworthiness dataset that we created. The results show that TOWER can detect a decrease in trustworthiness as noise increases, but is not effective when evaluated against the human-labeled dataset. Our initial experiments suggest that our hypothesis is valid and promising, but further research is needed to better understand the relationship between explanations and trustworthiness issues.
Investigating Memory Failure Prediction Across CPU Architectures
Authors: Qiao Yu, Wengui Zhang, Min Zhou, Jialiang Yu, Zhenli Sheng, Jasmin Bogatinovski, Jorge Cardoso, Odej Kao
Subjects: Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2406.05354
Pdf link: https://arxiv.org/pdf/2406.05354
Abstract Large-scale datacenters often experience memory failures, where Uncorrectable Errors (UEs) highlight critical malfunction in Dual Inline Memory Modules (DIMMs). Existing approaches primarily utilize Correctable Errors (CEs) to predict UEs, yet they typically neglect how these errors vary between different CPU architectures, especially in terms of Error Correction Code (ECC) applicability. In this paper, we investigate the correlation between CEs and UEs across different CPU architectures, including X86 and ARM. Our analysis identifies unique patterns of memory failure associated with each processor platform. Leveraging Machine Learning (ML) techniques on production datasets, we conduct the memory failure prediction in different processors' platforms, achieving up to 15% improvements in F1-score compared to the existing algorithm. Finally, an MLOps (Machine Learning Operations) framework is provided to consistently improve the failure prediction in the production environment.
Design of reliable technology valuation model with calibrated machine learning of patent indicators
Authors: Seunghyun Lee, Janghyeok Yoon, Jaewoong Choi
Subjects: Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2406.05446
Pdf link: https://arxiv.org/pdf/2406.05446
Abstract Machine learning (ML) has revolutionized the digital transformation of technology valuation by predicting the value of patents with high accuracy. However, the lack of validation regarding the reliability of these models hinders experts from fully trusting the confidence of model predictions. To address this issue, we propose an analytical framework for reliable technology valuation using calibrated ML models, which provide robust confidence levels in model predictions. We extract quantitative patent indicators that represent various technology characteristics as input data, using the patent maintenance period as a proxy for technology values. Multiple ML models are developed to capture the nonlinear relationship between patent indicators and technology value. The reliability and accuracy of these models are evaluated, presenting a Pareto-front map where the expected calibration error, Matthews correlation coefficient and F1-scores are compared. After identifying the best-performing model, we apply SHapley Additive exPlanation (SHAP) analysis to pinpoint the most significant input features by confidence bin. Through a case study, we confirmed that the proposed approach offers a practical guideline for developing reliable and accurate ML-based technology valuation models, with significant implications for both academia and industry.
RandONet: Shallow-Networks with Random Projections for learning linear and nonlinear operators
Authors: Gianluca Fabiani, Ioannis G. Kevrekidis, Constantinos Siettos, Athanasios N. Yannacopoulos
Subjects: Subjects: Machine Learning (cs.LG); Dynamical Systems (math.DS); Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2406.05470
Pdf link: https://arxiv.org/pdf/2406.05470
Abstract Deep Operator Networks (DeepOnets) have revolutionized the domain of scientific machine learning for the solution of the inverse problem for dynamical systems. However, their implementation necessitates optimizing a high-dimensional space of parameters and hyperparameters. This fact, along with the requirement of substantial computational resources, poses a barrier to achieving high numerical accuracy. Here, inpsired by DeepONets and to address the above challenges, we present Random Projection-based Operator Networks (RandONets): shallow networks with random projections that learn linear and nonlinear operators. The implementation of RandONets involves: (a) incorporating random bases, thus enabling the use of shallow neural networks with a single hidden layer, where the only unknowns are the output weights of the network's weighted inner product; this reduces dramatically the dimensionality of the parameter space; and, based on this, (b) using established least-squares solvers (e.g., Tikhonov regularization and preconditioned QR decomposition) that offer superior numerical approximation properties compared to other optimization techniques used in deep-learning. In this work, we prove the universal approximation accuracy of RandONets for approximating nonlinear operators and demonstrate their efficiency in approximating linear nonlinear evolution operators (right-hand-sides (RHS)) with a focus on PDEs. We show, that for this particular task, RandONets outperform, both in terms of numerical approximation accuracy and computational cost, the ``vanilla" DeepOnets.
A Novel Generative AI-Based Framework for Anomaly Detection in Multicast Messages in Smart Grid Communications
Authors: Aydin Zaboli, Seong Lok Choi, Tai-Jin Song, Junho Hong
Subjects: Subjects: Cryptography and Security (cs.CR); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2406.05472
Pdf link: https://arxiv.org/pdf/2406.05472
Abstract Cybersecurity breaches in digital substations can pose significant challenges to the stability and reliability of power system operations. To address these challenges, defense and mitigation techniques are required. Identifying and detecting anomalies in information and communication technology (ICT) is crucial to ensure secure device interactions within digital substations. This paper proposes a task-oriented dialogue (ToD) system for anomaly detection (AD) in datasets of multicast messages e.g., generic object oriented substation event (GOOSE) and sampled value (SV) in digital substations using large language models (LLMs). This model has a lower potential error and better scalability and adaptability than a process that considers the cybersecurity guidelines recommended by humans, known as the human-in-the-loop (HITL) process. Also, this methodology significantly reduces the effort required when addressing new cyber threats or anomalies compared with machine learning (ML) techniques, since it leaves the models complexity and precision unaffected and offers a faster implementation. These findings present a comparative assessment, conducted utilizing standard and advanced performance evaluation metrics for the proposed AD framework and the HITL process. To generate and extract datasets of IEC 61850 communications, a hardware-in-the-loop (HIL) testbed was employed.
Attri-Net: A Globally and Locally Inherently Interpretable Model for Multi-Label Classification Using Class-Specific Counterfactuals
Authors: Susu Sun, Stefano Woerner, Andreas Maier, Lisa M. Koch, Christian F. Baumgartner
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2406.05477
Pdf link: https://arxiv.org/pdf/2406.05477
Abstract Interpretability is crucial for machine learning algorithms in high-stakes medical applications. However, high-performing neural networks typically cannot explain their predictions. Post-hoc explanation methods provide a way to understand neural networks but have been shown to suffer from conceptual problems. Moreover, current research largely focuses on providing local explanations for individual samples rather than global explanations for the model itself. In this paper, we propose Attri-Net, an inherently interpretable model for multi-label classification that provides local and global explanations. Attri-Net first counterfactually generates class-specific attribution maps to highlight the disease evidence, then performs classification with logistic regression classifiers based solely on the attribution maps. Local explanations for each prediction can be obtained by interpreting the attribution maps weighted by the classifiers' weights. Global explanation of whole model can be obtained by jointly considering learned average representations of the attribution maps for each class (called the class centers) and the weights of the linear classifiers. To ensure the model is ``right for the right reason", we further introduce a mechanism to guide the model's explanations to align with human knowledge. Our comprehensive evaluations show that Attri-Net can generate high-quality explanations consistent with clinical knowledge while not sacrificing classification performance.
G-Transformer: Counterfactual Outcome Prediction under Dynamic and Time-varying Treatment Regimes
Authors: Hong Xiong, Feng Wu, Leon Deng, Megan Su, Li-wei H Lehman
Subjects: Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2406.05504
Pdf link: https://arxiv.org/pdf/2406.05504
Abstract In the context of medical decision making, counterfactual prediction enables clinicians to predict treatment outcomes of interest under alternative courses of therapeutic actions given observed patient history. Prior machine learning approaches for counterfactual predictions under time-varying treatments focus on static time-varying treatment regimes where treatments do not depend on previous covariate history. In this work, we present G-Transformer, a Transformer-based framework supporting g-computation for counterfactual prediction under dynamic and time-varying treatment strategies. G-Transfomer captures complex, long-range dependencies in time-varying covariates using a Transformer architecture. G-Transformer estimates the conditional distribution of relevant covariates given covariate and treatment history at each time point using an encoder architecture, then produces Monte Carlo estimates of counterfactual outcomes by simulating forward patient trajectories under treatment strategies of interest. We evaluate G-Transformer extensively using two simulated longitudinal datasets from mechanistic models, and a real-world sepsis ICU dataset from MIMIC-IV. G-Transformer outperforms both classical and state-of-the-art counterfactual prediction models in these settings. To the best of our knowledge, this is the first Transformer-based architecture for counterfactual outcome prediction under dynamic and time-varying treatment strategies. Code will be released upon publication of the paper.
Blockchain Integrated Federated Learning in Edge-Fog-Cloud Systems for IoT based Healthcare Applications A Survey
Authors: Shinu M. Rajagopal, Supriya M., Rajkumar Buyya
Subjects: Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2406.05517
Pdf link: https://arxiv.org/pdf/2406.05517
Abstract Modern Internet of Things (IoT) applications generate enormous amounts of data, making data-driven machine learning essential for developing precise and reliable statistical models. However, data is often stored in silos, and strict user-privacy legislation complicates data utilization, limiting machine learning's potential in traditional centralized paradigms due to diverse data probability distributions and lack of personalization. Federated learning, a new distributed paradigm, supports collaborative learning while preserving privacy, making it ideal for IoT applications. By employing cryptographic techniques, IoT systems can securely store and transmit data, ensuring consistency. The integration of federated learning and blockchain is particularly advantageous for handling sensitive data, such as in healthcare. Despite the potential of these technologies, a comprehensive examination of their integration in edge-fog-cloud-based IoT computing systems and healthcare applications is needed. This survey article explores the architecture, structure, functions, and characteristics of federated learning and blockchain, their applications in various computing paradigms, and evaluates their implementations in healthcare.
Training Through Failure: Effects of Data Consistency in Parallel Machine Learning Training
Authors: Ray Cao, Sherry Luo, Steve Gan, Sujeeth Jinesh
Subjects: Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2406.05546
Pdf link: https://arxiv.org/pdf/2406.05546
Abstract In this study, we explore the impact of relaxing data consistency in parallel machine learning training during a failure using various parameter server configurations. Our failure recovery strategies include traditional checkpointing, chain replication (which ensures a backup server takes over in case of failure), and a novel stateless parameter server approach. In the stateless approach, workers continue generating gradient updates even if the parameter server is down, applying these updates once the server is back online. We compare these techniques to a standard checkpointing approach, where the training job is resumed from the latest checkpoint. To assess the resilience and performance of each configuration, we intentionally killed the parameter server during training for each experiment. Our experiment results indicate that the stateless parameter server approach continues to train towards convergence and improves accuracy as much as 10\% in the face of a failure despite using stale weights and gradients. The chain replication and checkpointing techniques demonstrate convergence but suffer from setbacks in accuracy due to restarting from old checkpoints. These results suggest that allowing workers to continue generating updates during server downtime and applying these updates later can effectively improve hardware utilization. Furthermore, despite higher resource usage, the stateless parameter server method incurs similar monetary costs in terms of hardware usage compared to standard checkpointing methods due to the pricing structure of common cloud providers.
Learning Human Detected Differences in Directed Acyclic Graphs
Authors: Kathrin Guckes (née Ballweg), Alena Beyer, Prof. Margit Pohl, Prof. Tatiana von Landesberger
Subjects: Subjects: Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2406.05561
Pdf link: https://arxiv.org/pdf/2406.05561
Abstract Prior research has shown that human perception of similarity differs from mathematical measures in visual comparison tasks, including those involving directed acyclic graphs. This divergence can lead to missed differences and skepticism about algorithmic results. To address this, we aim to learn the structural differences humans detect in graphs visually. We want to visualize these human-detected differences alongside actual changes, enhancing credibility and aiding users in spotting overlooked differences. Our approach aligns with recent research in machine learning capturing human behavior. We provide a data augmentation algorithm, a dataset, and a machine learning model to support this task. This work fills a gap in learning differences in directed acyclic graphs and contributes to better comparative visualizations.
Flexible Multi-Dimensional FFTs for Plane Wave Density Functional Theory Codes
Authors: Doru Thom Popovici, Mauro del Ben, Osni Marques, Andrew Canning
Subjects: Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Mathematical Software (cs.MS)
Arxiv link: https://arxiv.org/abs/2406.05577
Pdf link: https://arxiv.org/pdf/2406.05577
Abstract Multi-dimensional Fourier transforms are key mathematical building blocks that appear in a wide range of applications from materials science, physics, chemistry and even machine learning. Over the past years, a multitude of software packages targeting distributed multi-dimensional Fourier transforms have been developed. Most variants attempt to offer efficient implementations for single transforms applied on data mapped onto rectangular grids. However, not all scientific applications conform to this pattern, i.e. plane wave Density Functional Theory codes require multi-dimensional Fourier transforms applied on data represented as batches of spheres. Typically, the implementations for this use case are hand-coded and tailored for the requirements of each application. In this work, we present the Fastest Fourier Transform from Berkeley (FFTB) a distributed framework that offers flexible implementations for both regular/non-regular data grids and batched/non-batched transforms. We provide a flexible implementations with a user-friendly API that captures most of the use cases. Furthermore, we provide implementations for both CPU and GPU platforms, showing that our approach offers improved execution time and scalability on the HP Cray EX supercomputer. In addition, we outline the need for flexible implementations for different use cases of the software package.
Deep Learning to Predict Glaucoma Progression using Structural Changes in the Eye
Authors: Sayan Mandal
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2406.05605
Pdf link: https://arxiv.org/pdf/2406.05605
Abstract Glaucoma is a chronic eye disease characterized by optic neuropathy, leading to irreversible vision loss. It progresses gradually, often remaining undiagnosed until advanced stages. Early detection is crucial to monitor atrophy and develop treatment strategies to prevent further vision impairment. Data-centric methods have enabled computer-aided algorithms for precise glaucoma diagnosis. In this study, we use deep learning models to identify complex disease traits and progression criteria, detecting subtle changes indicative of glaucoma. We explore the structure-function relationship in glaucoma progression and predict functional impairment from structural eye deterioration. We analyze statistical and machine learning methods, including deep learning techniques with optical coherence tomography (OCT) scans for accurate progression prediction. Addressing challenges like age variability, data imbalances, and noisy labels, we develop novel semi-supervised time-series algorithms: 1. Weakly-Supervised Time-Series Learning: We create a CNN-LSTM model to encode spatiotemporal features from OCT scans. This approach uses age-related progression and positive-unlabeled data to establish robust pseudo-progression criteria, bypassing gold-standard labels. 2. Semi-Supervised Time-Series Learning: Using labels from Guided Progression Analysis (GPA) in a contrastive learning scheme, the CNN-LSTM architecture learns from potentially mislabeled data to improve prediction accuracy. Our methods outperform conventional and state-of-the-art techniques.
Which Backbone to Use: A Resource-efficient Domain Specific Comparison for Computer Vision
Authors: Pranav Jeevan, Amit Sethi
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2406.05612
Pdf link: https://arxiv.org/pdf/2406.05612
Abstract In contemporary computer vision applications, particularly image classification, architectural backbones pre-trained on large datasets like ImageNet are commonly employed as feature extractors. Despite the widespread use of these pre-trained convolutional neural networks (CNNs), there remains a gap in understanding the performance of various resource-efficient backbones across diverse domains and dataset sizes. Our study systematically evaluates multiple lightweight, pre-trained CNN backbones under consistent training settings across a variety of datasets, including natural images, medical images, galaxy images, and remote sensing images. This comprehensive analysis aims to aid machine learning practitioners in selecting the most suitable backbone for their specific problem, especially in scenarios involving small datasets where fine-tuning a pre-trained network is crucial. Even though attention-based architectures are gaining popularity, we observed that they tend to perform poorly under low data finetuning tasks compared to CNNs. We also observed that some CNN architectures such as ConvNeXt, RegNet and EfficientNet performs well compared to others on a diverse set of domains consistently. Our findings provide actionable insights into the performance trade-offs and effectiveness of different backbones, facilitating informed decision-making in model selection for a broad spectrum of computer vision domains. Our code is available here: this https URL
Cross Language Soccer Framework: An Open Source Framework for the RoboCup 2D Soccer Simulation
Authors: Nader Zare, Aref Sayareh, Alireza Sadraii, Arad Firouzkouhi, Amilcar Soares
Subjects: Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2406.05621
Pdf link: https://arxiv.org/pdf/2406.05621
Abstract RoboCup Soccer Simulation 2D (SS2D) research is hampered by the complexity of existing Cpp-based codes like Helios, Cyrus, and Gliders, which also suffer from limited integration with modern machine learning frameworks. This development paper introduces a transformative solution a gRPC-based, language-agnostic framework that seamlessly integrates with the high-performance Helios base code. This approach not only facilitates the use of diverse programming languages including CSharp, JavaScript, and Python but also maintains the computational efficiency critical for real time decision making in SS2D. By breaking down language barriers, our framework significantly enhances collaborative potential and flexibility, empowering researchers to innovate without the overhead of mastering or developing extensive base codes. We invite the global research community to leverage and contribute to the Cross Language Soccer (CLS) framework, which is openly available under the MIT License, to drive forward the capabilities of multi-agent systems in soccer simulations.
General Distribution Learning: A theoretical framework for Deep Learning
Authors: Binchuan Qi, Li Li, Wei Gong
Subjects: Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2406.05666
Pdf link: https://arxiv.org/pdf/2406.05666
Abstract There remain numerous unanswered research questions on deep learning (DL) within the classical learning theory framework. These include the remarkable generalization capabilities of overparametrized neural networks (NNs), the efficient optimization performance despite non-convexity of objectives, the mechanism of flat minima in generalization, and the exceptional performance of deep architectures, among others. This paper introduces a novel theoretical learning framework known as General Distribution Learning (GD Learning), which is designed to address a comprehensive range of machine learning and statistical tasks, including classification, regression and parameter estimation. Departing from statistical machine learning, GD Learning focuses on the true underlying distribution. In GD Learning, learning error, corresponding to the expected error in classical statistical learning framework, is divided into fitting errors caused by models and fitting algorithms, as well as sampling errors introduced by limited sampling data. The framework significantly incorporates prior knowledge, especially in scenarios characterized by data scarcity. This integration of external knowledge helps to minimize learning errors across the entire dataset, thereby enhancing performance. Within the GD Learning framework, we demonstrate that the global optimal solution to non-convex optimization problems, such as minimizing fitting error, can be approached by minimizing the gradient norm and the non-uniformity of the eigenvalues of the model's Jacobian matrix. This insight has led to the development of the gradient structure control algorithm. GD Learning also offers a fresh perspective on the questions on deep learning, including overparameterization and non-convex optimizations, bias-variance trade-off, and the mechanism of flat minima.
Certified Robustness to Data Poisoning in Gradient-Based Training
Authors: Philip Sosnin, Mark N. Müller, Maximilian Baader, Calvin Tsay, Matthew Wicker
Subjects: Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2406.05670
Pdf link: https://arxiv.org/pdf/2406.05670
Abstract Modern machine learning pipelines leverage large amounts of public data, making it infeasible to guarantee data quality and leaving models open to poisoning and backdoor attacks. However, provably bounding model behavior under such attacks remains an open problem. In this work, we address this challenge and develop the first framework providing provable guarantees on the behavior of models trained with potentially manipulated data. In particular, our framework certifies robustness against untargeted and targeted poisoning as well as backdoor attacks for both input and label manipulations. Our method leverages convex relaxations to over-approximate the set of all possible parameter updates for a given poisoning threat model, allowing us to bound the set of all reachable parameters for any gradient-based learning algorithm. Given this set of parameters, we provide bounds on worst-case behavior, including model performance and backdoor success rate. We demonstrate our approach on multiple real-world datasets from applications including energy consumption, medical imaging, and autonomous driving.
Understanding Open Source Contributor Profiles in Popular Machine Learning Libraries
Authors: Jiawen Liu, Haoxiang Zhang, Ying Zou
Subjects: Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2406.05685
Pdf link: https://arxiv.org/pdf/2406.05685
Abstract With the increasing popularity of machine learning (ML), many open-source software (OSS) contributors are attracted to developing and adopting ML approaches. Comprehensive understanding of ML contributors is crucial for successful ML OSS development and maintenance. Without such knowledge, there is a risk of inefficient resource allocation and hindered collaboration in ML OSS projects. Existing research focuses on understanding the difficulties and challenges perceived by ML contributors by user surveys. There is a lack of understanding of ML contributors based on their activities tracked from software repositories. In this paper, we aim to understand ML contributors by identifying contributor profiles in ML libraries. We further study contributors' OSS engagement from three aspects: workload composition, work preferences, and technical importance. By investigating 7,640 contributors from 6 popular ML libraries (TensorFlow, PyTorch, Keras, MXNet, Theano, and ONNX), we identify four contributor profiles: Core-Afterhour, Core-Workhour, Peripheral-Afterhour, and Peripheral-Workhour. We find that: 1) project experience, authored files, collaborations, and geographical location are significant features of all profiles; 2) contributors in Core profiles exhibit significantly different OSS engagement compared to Peripheral profiles; 3) contributors' work preferences and workload compositions significantly impact project popularity; 4) long-term contributors evolve towards making fewer, constant, balanced and less technical contributions.
Peptide Vaccine Design by Evolutionary Multi-Objective Optimization
Authors: Dan-Xuan Liu, Yi-Heng Xu, Chao Qian
Subjects: Subjects: Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2406.05743
Pdf link: https://arxiv.org/pdf/2406.05743
Abstract Peptide vaccines are growing in significance for fighting diverse diseases. Machine learning has improved the identification of peptides that can trigger immune responses, and the main challenge of peptide vaccine design now lies in selecting an effective subset of peptides due to the allelic diversity among individuals. Previous works mainly formulated this task as a constrained optimization problem, aiming to maximize the expected number of peptide-Major Histocompatibility Complex (peptide-MHC) bindings across a broad range of populations by selecting a subset of diverse peptides with limited size; and employed a greedy algorithm, whose performance, however, may be limited due to the greedy nature. In this paper, we propose a new framework PVD-EMO based on Evolutionary Multi-objective Optimization, which reformulates Peptide Vaccine Design as a bi-objective optimization problem that maximizes the expected number of peptide-MHC bindings and minimizes the number of selected peptides simultaneously, and employs a Multi-Objective Evolutionary Algorithm (MOEA) to solve it. We also incorporate warm-start and repair strategies into MOEAs to improve efficiency and performance. We prove that the warm-start strategy ensures that PVD-EMO maintains the same worst-case approximation guarantee as the previous greedy algorithm, and meanwhile, the EMO framework can help avoid local optima. Experiments on a peptide vaccine design for COVID-19, caused by the SARS-CoV-2 virus, demonstrate the superiority of PVD-EMO.
Rapid Optimization of Superposition Codes for Multi-Hop NOMA MANETs via Deep Unfolding
Authors: Tomer Alter, Nir Shlezinger
Subjects: Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2406.05747
Pdf link: https://arxiv.org/pdf/2406.05747
Abstract Various communication technologies are expected to utilize mobile ad hoc networks (MANETs). By combining MANETs with non-orthogonal multiple access (NOMA) communications, one can support scalable, spectrally efficient, and flexible network topologies. To achieve these benefits of NOMA MANETs, one should determine the transmission protocol, particularly the superposition code. However, the latter involves lengthy optimization that has to be repeated when the topology changes. In this work, we propose an algorithm for rapidly optimizing superposition codes in multi-hop NOMA MANETs. To achieve reliable tunning with few iterations, we adopt the emerging deep unfolding methodology, leveraging data to boost reliable settings. Our superposition coding optimization algorithm utilizes a small number of projected gradient steps while learning its per-user hyperparameters to maximize the minimal rate over past channels in an unsupervised manner. The learned optimizer is designed for both settings with full channel state information, as well as when the channel coefficients are to be estimated from pilots. We show that the combination of principled optimization and machine learning yields a scalable optimizer, that once trained, can be applied to different topologies. We cope with the non-convex nature of the optimization problem by applying parallel-learned optimization with different starting points as a form of ensemble learning. Our numerical results demonstrate that the proposed method enables the rapid setting of high-rate superposition codes for various channels.
Numerical solution of a PDE arising from prediction with expert advice
Authors: Jeff Calder, Nadejda Drenska, Drisana Mosaphir
Subjects: Subjects: Numerical Analysis (math.NA); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Analysis of PDEs (math.AP)
Arxiv link: https://arxiv.org/abs/2406.05754
Pdf link: https://arxiv.org/pdf/2406.05754
Abstract This work investigates the online machine learning problem of prediction with expert advice in an adversarial setting through numerical analysis of, and experiments with, a related partial differential equation. The problem is a repeated two-person game involving decision-making at each step informed by $n$ experts in an adversarial environment. The continuum limit of this game over a large number of steps is a degenerate elliptic equation whose solution encodes the optimal strategies for both players. We develop numerical methods for approximating the solution of this equation in relatively high dimensions ($n\leq 10$) by exploiting symmetries in the equation and the solution to drastically reduce the size of the computational domain. Based on our numerical results we make a number of conjectures about the optimality of various adversarial strategies, in particular about the non-optimality of the COMB strategy.
ControlLoc: Physical-World Hijacking Attack on Visual Perception in Autonomous Driving
Authors: Chen Ma, Ningfei Wang, Zhengyu Zhao, Qian Wang, Qi Alfred Chen, Chao Shen
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2406.05810
Pdf link: https://arxiv.org/pdf/2406.05810
Abstract Recent research in adversarial machine learning has focused on visual perception in Autonomous Driving (AD) and has shown that printed adversarial patches can attack object detectors. However, it is important to note that AD visual perception encompasses more than just object detection; it also includes Multiple Object Tracking (MOT). MOT enhances the robustness by compensating for object detection errors and requiring consistent object detection results across multiple frames before influencing tracking results and driving decisions. Thus, MOT makes attacks on object detection alone less effective. To attack such robust AD visual perception, a digital hijacking attack has been proposed to cause dangerous driving scenarios. However, this attack has limited effectiveness. In this paper, we introduce a novel physical-world adversarial patch attack, ControlLoc, designed to exploit hijacking vulnerabilities in entire AD visual perception. ControlLoc utilizes a two-stage process: initially identifying the optimal location for the adversarial patch, and subsequently generating the patch that can modify the perceived location and shape of objects with the optimal location. Extensive evaluations demonstrate the superior performance of ControlLoc, achieving an impressive average attack success rate of around 98.1% across various AD visual perceptions and datasets, which is four times greater effectiveness than the existing hijacking attack. The effectiveness of ControlLoc is further validated in physical-world conditions, including real vehicle tests under different conditions such as outdoor light conditions with an average attack success rate of 77.5%. AD system-level impact assessments are also included, such as vehicle collision, using industry-grade AD systems and production-grade AD simulators with an average vehicle collision rate and unnecessary emergency stop rate of 81.3%.
What Can We Learn from State Space Models for Machine Learning on Graphs?
Authors: Yinan Huang, Siqi Miao, Pan Li
Subjects: Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2406.05815
Pdf link: https://arxiv.org/pdf/2406.05815
Abstract Machine learning on graphs has recently found extensive applications across domains. However, the commonly used Message Passing Neural Networks (MPNNs) suffer from limited expressive power and struggle to capture long-range dependencies. Graph transformers offer a strong alternative due to their global attention mechanism, but they come with great computational overheads, especially for large graphs. In recent years, State Space Models (SSMs) have emerged as a compelling approach to replace full attention in transformers to model sequential data. It blends the strengths of RNNs and CNNs, offering a) efficient computation, b) the ability to capture long-range dependencies, and c) good generalization across sequences of various lengths. However, extending SSMs to graph-structured data presents unique challenges due to the lack of canonical node ordering in graphs. In this work, we propose Graph State Space Convolution (GSSC) as a principled extension of SSMs to graph-structured data. By leveraging global permutation-equivariant set aggregation and factorizable graph kernels that rely on relative node distances as the convolution kernels, GSSC preserves all three advantages of SSMs. We demonstrate the provably stronger expressiveness of GSSC than MPNNs in counting graph substructures and show its effectiveness across 10 real-world, widely used benchmark datasets, where GSSC achieves best results on 7 out of 10 datasets with all significant improvements compared to the state-of-the-art baselines and second-best results on the other 3 datasets. Our findings highlight the potential of GSSC as a powerful and scalable model for graph machine learning. Our code is available at this https URL.
Self-Distilled Disentangled Learning for Counterfactual Prediction
Authors: Xinshu Li, Mingling Gong, Lina Yao
Subjects: Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2406.05855
Pdf link: https://arxiv.org/pdf/2406.05855
Abstract The advancements in disentangled representation learning significantly enhance the accuracy of counterfactual predictions by granting precise control over instrumental variables, confounders, and adjustable variables. An appealing method for achieving the independent separation of these factors is mutual information minimization, a task that presents challenges in numerous machine learning scenarios, especially within high-dimensional spaces. To circumvent this challenge, we propose the Self-Distilled Disentanglement framework, referred to as $SD^2$. Grounded in information theory, it ensures theoretically sound independent disentangled representations without intricate mutual information estimator designs for high-dimensional representations. Our comprehensive experiments, conducted on both synthetic and real-world datasets, confirms the effectiveness of our approach in facilitating counterfactual inference in the presence of both observed and unobserved confounders.
Conserving Human Creativity with Evolutionary Generative Algorithms: A Case Study in Music Generation
Authors: Justin Kilb, Caroline Ellis
Subjects: Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2406.05873
Pdf link: https://arxiv.org/pdf/2406.05873
Abstract This study explores the application of evolutionary generative algorithms in music production to preserve and enhance human creativity. By integrating human feedback into Differential Evolution algorithms, we produced six songs that were submitted to international record labels, all of which received contract offers. In addition to testing the commercial viability of these methods, this paper examines the long-term implications of content generation using traditional machine learning methods compared with evolutionary algorithms. Specifically, as current generative techniques continue to scale, the potential for computer-generated content to outpace human creation becomes likely. This trend poses a risk of exhausting the pool of human-created training data, potentially forcing generative machine learning models to increasingly depend on their random input functions for generating novel content. In contrast to a future of content generation guided by aimless random functions, our approach allows for individualized creative exploration, ensuring that computer-assisted content generation methods are human-centric and culturally relevant through time.
Few-Shot Load Forecasting Under Data Scarcity in Smart Grids: A Meta-Learning Approach
Authors: Georgios Tsoumplekas, Christos L. Athanasiadis, Dimitrios I. Doukas, Antonios Chrysopoulos, Pericles A. Mitkas
Subjects: Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2406.05887
Pdf link: https://arxiv.org/pdf/2406.05887
Abstract Despite the rapid expansion of smart grids and large volumes of data at the individual consumer level, there are still various cases where adequate data collection to train accurate load forecasting models is challenging or even impossible. This paper proposes adapting an established model-agnostic meta-learning algorithm for short-term load forecasting in the context of few-shot learning. Specifically, the proposed method can rapidly adapt and generalize within any unknown load time series of arbitrary length using only minimal training samples. In this context, the meta-learning model learns an optimal set of initial parameters for a base-level learner recurrent neural network. The proposed model is evaluated using a dataset of historical load consumption data from real-world consumers. Despite the examined load series' short length, it produces accurate forecasts outperforming transfer learning and task-specific machine learning methods by $12.5\%$. To enhance robustness and fairness during model evaluation, a novel metric, mean average log percentage error, is proposed that alleviates the bias introduced by the commonly used MAPE metric. Finally, a series of studies to evaluate the model's robustness under different hyperparameters and time series lengths is also conducted, demonstrating that the proposed approach consistently outperforms all other models.
Event prediction and causality inference despite incomplete information
Authors: Harrison Lam, Yuanjie Chen, Noboru Kanazawa, Mohammad Chowdhury, Anna Battista, Stephan Waldert
Subjects: Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2406.05893
Pdf link: https://arxiv.org/pdf/2406.05893
Abstract We explored the challenge of predicting and explaining the occurrence of events within sequences of data points. Our focus was particularly on scenarios in which unknown triggers causing the occurrence of events may consist of non-consecutive, masked, noisy data points. This scenario is akin to an agent tasked with learning to predict and explain the occurrence of events without understanding the underlying processes or having access to crucial information. Such scenarios are encountered across various fields, such as genomics, hardware and software verification, and financial time series prediction. We combined analytical, simulation, and machine learning (ML) approaches to investigate, quantify, and provide solutions to this challenge. We deduced and validated equations generally applicable to any variation of the underlying challenge. Using these equations, we (1) described how the level of complexity changes with various parameters (e.g., number of apparent and hidden states, trigger length, confidence, etc.) and (2) quantified the data needed to successfully train an ML model. We then (3) proved our ML solution learns and subsequently identifies unknown triggers and predicts the occurrence of events. If the complexity of the challenge is too high, our ML solution can identify trigger candidates to be used to interactively probe the system under investigation to determine the true trigger in a way considerably more efficient than brute force methods. By sharing our findings, we aim to assist others grappling with similar challenges, enabling estimates on the complexity of their problem, the data required and a solution to solve it.
Whose Preferences? Differences in Fairness Preferences and Their Impact on the Fairness of AI Utilizing Human Feedback
Authors: Emilia Agis Lerner, Florian E. Dorner, Elliott Ash, Naman Goel
Subjects: Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2406.05902
Pdf link: https://arxiv.org/pdf/2406.05902
Abstract There is a growing body of work on learning from human feedback to align various aspects of machine learning systems with human values and preferences. We consider the setting of fairness in content moderation, in which human feedback is used to determine how two comments -- referencing different sensitive attribute groups -- should be treated in comparison to one another. With a novel dataset collected from Prolific and MTurk, we find significant gaps in fairness preferences depending on the race, age, political stance, educational level, and LGBTQ+ identity of annotators. We also demonstrate that demographics mentioned in text have a strong influence on how users perceive individual fairness in moderation. Further, we find that differences also exist in downstream classifiers trained to predict human preferences. Finally, we observe that an ensemble, giving equal weight to classifiers trained on annotations from different demographics, performs better for different demographic intersections; compared to a single classifier that gives equal weight to each annotation.
Explainable AI for Mental Disorder Detection via Social Media: A survey and outlook
Authors: Yusif Ibrahimov, Tarique Anwar, Tommy Yuan
Subjects: Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2406.05984
Pdf link: https://arxiv.org/pdf/2406.05984
Abstract Mental health constitutes a complex and pervasive global challenge, affecting millions of lives and often leading to severe consequences. In this paper, we conduct a thorough survey to explore the intersection of data science, artificial intelligence, and mental healthcare, focusing on the recent developments of mental disorder detection through online social media (OSM). A significant portion of the population actively engages in OSM platforms, creating a vast repository of personal data that holds immense potential for mental health analytics. The paper navigates through traditional diagnostic methods, state-of-the-art data- and AI-driven research studies, and the emergence of explainable AI (XAI) models for mental healthcare. We review state-of-the-art machine learning methods, particularly those based on modern deep learning, while emphasising the need for explainability in healthcare AI models. The experimental design section provides insights into prevalent practices, including available datasets and evaluation approaches. We also identify key issues and challenges in the field and propose promising future research directions. As mental health decisions demand transparency, interpretability, and ethical considerations, this paper contributes to the ongoing discourse on advancing XAI in mental healthcare through social media. The comprehensive overview presented here aims to guide researchers, practitioners, and policymakers in developing the area of mental disorder detection.
A Dual-View Approach to Classifying Radiology Reports by Co-Training
Authors: Yutong Han, Yan Yuan, Lili Mou
Subjects: Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2406.05995
Pdf link: https://arxiv.org/pdf/2406.05995
Abstract Radiology report analysis provides valuable information that can aid with public health initiatives, and has been attracting increasing attention from the research community. In this work, we present a novel insight that the structure of a radiology report (namely, the Findings and Impression sections) offers different views of a radiology scan. Based on this intuition, we further propose a co-training approach, where two machine learning models are built upon the Findings and Impression sections, respectively, and use each other's information to boost performance with massive unlabeled data in a semi-supervised manner. We conducted experiments in a public health surveillance study, and results show that our co-training approach is able to improve performance using the dual views and surpass competing supervised and semi-supervised methods.
fSEAD: a Composable FPGA-based Streaming Ensemble Anomaly Detection Library
Authors: Binglei Lou, David Boland, Philip H.W. Leong
Subjects: Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2406.05999
Pdf link: https://arxiv.org/pdf/2406.05999
Abstract Machine learning ensembles combine multiple base models to produce a more accurate output. They can be applied to a range of machine learning problems, including anomaly detection. In this paper, we investigate how to maximize the composability and scalability of an FPGA-based streaming ensemble anomaly detector (fSEAD). To achieve this, we propose a flexible computing architecture consisting of multiple partially reconfigurable regions, pblocks, which each implement anomaly detectors. Our proof-of-concept design supports three state-of-the-art anomaly detection algorithms: Loda, RS-Hash and xStream. Each algorithm is scalable, meaning multiple instances can be placed within a pblock to improve performance. Moreover, fSEAD is implemented using High-level synthesis (HLS), meaning further custom anomaly detectors can be supported. Pblocks are interconnected via an AXI-switch, enabling them to be composed in an arbitrary fashion before combining and merging results at run-time to create an ensemble that maximizes the use of FPGA resources and accuracy. Through utilizing reconfigurable Dynamic Function eXchange (DFX), the detector can be modified at run-time to adapt to changing environmental conditions. We compare fSEAD to an equivalent central processing unit (CPU) implementation using four standard datasets, with speed-ups ranging from $3\times$ to $8\times$.
EpiLearn: A Python Library for Machine Learning in Epidemic Modeling
Authors: Zewen Liu, Yunxiao Li, Mingyang Wei, Guancheng Wan, Max S.Y. Lau, Wei Jin
Subjects: Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2406.06016
Pdf link: https://arxiv.org/pdf/2406.06016
Abstract EpiLearn is a Python toolkit developed for modeling, simulating, and analyzing epidemic data. Although there exist several packages that also deal with epidemic modeling, they are often restricted to mechanistic models or traditional statistical tools. As machine learning continues to shape the world, the gap between these packages and the latest models has become larger. To bridge the gap and inspire innovative research in epidemic modeling, EpiLearn not only provides support for evaluating epidemic models based on machine learning, but also incorporates comprehensive tools for analyzing epidemic data, such as simulation, visualization, transformations, etc. For the convenience of both epidemiologists and data scientists, we provide a unified framework for training and evaluation of epidemic models on two tasks: Forecasting and Source Detection. To facilitate the development of new models, EpiLearn follows a modular design, making it flexible and easy to use. In addition, an interactive web application is also developed to visualize the real-world or simulated epidemic data. Our package is available at this https URL.
GraphStorm: all-in-one graph machine learning framework for industry applications
Authors: Da Zheng, Xiang Song, Qi Zhu, Jian Zhang, Theodore Vasiloudis, Runjie Ma, Houyu Zhang, Zichen Wang, Soji Adeshina, Israt Nisa, Alejandro Mottini, Qingjun Cui, Huzefa Rangwala, Belinda Zeng, Christos Faloutsos, George Karypis
Subjects: Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2406.06022
Pdf link: https://arxiv.org/pdf/2406.06022
Abstract Graph machine learning (GML) is effective in many business applications. However, making GML easy to use and applicable to industry applications with massive datasets remain challenging. We developed GraphStorm, which provides an end-to-end solution for scalable graph construction, graph model training and inference. GraphStorm has the following desirable properties: (a) Easy to use: it can perform graph construction and model training and inference with just a single command; (b) Expert-friendly: GraphStorm contains many advanced GML modeling techniques to handle complex graph data and improve model performance; (c) Scalable: every component in GraphStorm can operate on graphs with billions of nodes and can scale model training and inference to different hardware without changing any code. GraphStorm has been used and deployed for over a dozen billion-scale industry applications after its release in May 2023. It is open-sourced in Github: this https URL.
Diving into Underwater: Segment Anything Model Guided Underwater Salient Instance Segmentation and A Large-scale Dataset
Authors: Shijie Lian, Ziyi Zhang, Hua Li, Wenjie Li, Laurence Tianruo Yang, Sam Kwong, Runmin Cong
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2406.06039
Pdf link: https://arxiv.org/pdf/2406.06039
Abstract With the breakthrough of large models, Segment Anything Model (SAM) and its extensions have been attempted to apply in diverse tasks of computer vision. Underwater salient instance segmentation is a foundational and vital step for various underwater vision tasks, which often suffer from low segmentation accuracy due to the complex underwater circumstances and the adaptive ability of models. Moreover, the lack of large-scale datasets with pixel-level salient instance annotations has impeded the development of machine learning techniques in this field. To address these issues, we construct the first large-scale underwater salient instance segmentation dataset (USIS10K), which contains 10,632 underwater images with pixel-level annotations in 7 categories from various underwater scenes. Then, we propose an Underwater Salient Instance Segmentation architecture based on Segment Anything Model (USIS-SAM) specifically for the underwater domain. We devise an Underwater Adaptive Visual Transformer (UA-ViT) encoder to incorporate underwater domain visual prompts into the segmentation network. We further design an out-of-the-box underwater Salient Feature Prompter Generator (SFPG) to automatically generate salient prompters instead of explicitly providing foreground points or boxes as prompts in SAM. Comprehensive experimental results show that our USIS-SAM method can achieve superior performance on USIS10K datasets compared to the state-of-the-art methods. Datasets and codes are released on this https URL.
LLM-Based Intent Processing and Network Optimization Using Attention-Based Hierarchical Reinforcement Learning
Authors: Md Arafat Habib, Pedro Enrique Iturria Rivera, Yigit Ozcan, Medhat Elsayed, Majid Bavand, Raimundus Gaigalas, Melike Erol-Kantarci
Subjects: Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2406.06059
Pdf link: https://arxiv.org/pdf/2406.06059
Abstract Intent-based network automation is a promising tool to enable easier network management however certain challenges need to be effectively addressed. These are: 1) processing intents, i.e., identification of logic and necessary parameters to fulfill an intent, 2) validating an intent to align it with current network status, and 3) satisfying intents via network optimizing functions like xApps and rApps in O-RAN. This paper addresses these points via a three-fold strategy to introduce intent-based automation for O-RAN. First, intents are processed via a lightweight Large Language Model (LLM). Secondly, once an intent is processed, it is validated against future incoming traffic volume profiles (high or low). Finally, a series of network optimization applications (rApps and xApps) have been developed. With their machine learning-based functionalities, they can improve certain key performance indicators such as throughput, delay, and energy efficiency. In this final stage, using an attention-based hierarchical reinforcement learning algorithm, these applications are optimally initiated to satisfy the intent of an operator. Our simulations show that the proposed method can achieve at least 12% increase in throughput, 17.1% increase in energy efficiency, and 26.5% decrease in network delay compared to the baseline algorithms.
Learning Physical Simulation with Message Passing Transformer
Authors: Zeyi Xu, Yifei Li
Subjects: Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2406.06060
Pdf link: https://arxiv.org/pdf/2406.06060
Abstract Machine learning methods for physical simulation have achieved significant success in recent years. We propose a new universal architecture based on Graph Neural Network, the Message Passing Transformer, which incorporates a Message Passing framework, employs an Encoder-Processor-Decoder structure, and applies Graph Fourier Loss as loss function for model optimization. To take advantage of the past message passing state information, we propose Hadamard-Product Attention to update the node attribute in the Processor, Hadamard-Product Attention is a variant of Dot-Product Attention that focuses on more fine-grained semantics and emphasizes on assigning attention weights over each feature dimension rather than each position in the sequence relative to others. We further introduce Graph Fourier Loss (GFL) to balance high-energy and low-energy components. To improve time performance, we precompute the graph's Laplacian eigenvectors before the training process. Our architecture achieves significant accuracy improvements in long-term rollouts for both Lagrangian and Eulerian dynamical systems over current methods.
Supervised Radio Frequency Interference Detection with SNNs
Authors: Nicholas J. Pritchard, Andreas Wicenec, Mohammed Bennamoun, Richard Dodson
Subjects: Subjects: Neural and Evolutionary Computing (cs.NE); Instrumentation and Methods for Astrophysics (astro-ph.IM)
Arxiv link: https://arxiv.org/abs/2406.06075
Pdf link: https://arxiv.org/pdf/2406.06075
Abstract Radio Frequency Interference (RFI) poses a significant challenge in radio astronomy, arising from terrestrial and celestial sources, disrupting observations conducted by radio telescopes. Addressing RFI involves intricate heuristic algorithms, manual examination, and, increasingly, machine learning methods. Given the dynamic and temporal nature of radio astronomy observations, Spiking Neural Networks (SNNs) emerge as a promising approach. In this study, we cast RFI detection as a supervised multi-variate time-series segmentation problem. Notably, our investigation explores the encoding of radio astronomy visibility data for SNN inference, considering six encoding schemes: rate, latency, delta-modulation, and three variations of the step-forward algorithm. We train a small two-layer fully connected SNN on simulated data derived from the Hydrogen Epoch of Reionization Array (HERA) telescope and perform extensive hyper-parameter optimization. Results reveal that latency encoding exhibits superior performance, achieving a per-pixel accuracy of 98.8% and an f1-score of 0.761. Remarkably, these metrics approach those of contemporary RFI detection algorithms, notwithstanding the simplicity and compactness of our proposed network architecture. This study underscores the potential of RFI detection as a benchmark problem for SNN researchers, emphasizing the efficacy of SNNs in addressing complex time-series segmentation tasks in radio astronomy.
Sequential Binary Classification for Intrusion Detection in Software Defined Networks
Authors: Ishan Chokshi, Shrihari Vasudevan, Nachiappan Sundaram, Raaghul Ranganathan
Subjects: Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2406.06099
Pdf link: https://arxiv.org/pdf/2406.06099
Abstract Software-Defined Networks (SDN) are the standard architecture for network deployment. Intrusion Detection Systems (IDS) are a pivotal part of this technology as networks become more vulnerable to new and sophisticated attacks. Machine Learning (ML)-based IDS are increasingly seen as the most effective approach to handle this issue. However, IDS datasets suffer from high class imbalance, which impacts the performance of standard ML models. We propose Sequential Binary Classification (SBC) - an algorithm for multi-class classification to address this issue. SBC is a hierarchical cascade of base classifiers, each of which can be modelled on any general binary classifier. Extensive experiments are reported on benchmark datasets that evaluate the performance of SBC under different scenarios.
DiffInject: Revisiting Debias via Synthetic Data Generation using Diffusion-based Style Injection
Authors: Donggeun Ko, Sangwoo Jo, Dongjun Lee, Namjun Park, Jaekwang Kim
Subjects: Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2406.06134
Pdf link: https://arxiv.org/pdf/2406.06134
Abstract Dataset bias is a significant challenge in machine learning, where specific attributes, such as texture or color of the images are unintentionally learned resulting in detrimental performance. To address this, previous efforts have focused on debiasing models either by developing novel debiasing algorithms or by generating synthetic data to mitigate the prevalent dataset biases. However, generative approaches to date have largely relied on using bias-specific samples from the dataset, which are typically too scarce. In this work, we propose, DiffInject, a straightforward yet powerful method to augment synthetic bias-conflict samples using a pretrained diffusion model. This approach significantly advances the use of diffusion models for debiasing purposes by manipulating the latent space. Our framework does not require any explicit knowledge of the bias types or labelling, making it a fully unsupervised setting for debiasing. Our methodology demonstrates substantial result in effectively reducing dataset bias.
A Survey on Machine Unlearning: Techniques and New Emerged Privacy Risks
Authors: Hengzhu Liu, Ping Xiong, Tianqing Zhu, Philip S. Yu
Subjects: Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2406.06186
Pdf link: https://arxiv.org/pdf/2406.06186
Abstract The explosive growth of machine learning has made it a critical infrastructure in the era of artificial intelligence. The extensive use of data poses a significant threat to individual privacy. Various countries have implemented corresponding laws, such as GDPR, to protect individuals' data privacy and the right to be forgotten. This has made machine unlearning a research hotspot in the field of privacy protection in recent years, with the aim of efficiently removing the contribution and impact of individual data from trained models. The research in academia on machine unlearning has continuously enriched its theoretical foundation, and many methods have been proposed, targeting different data removal requests in various application scenarios. However, recently researchers have found potential privacy leakages of various of machine unlearning approaches, making the privacy preservation on machine unlearning area a critical topic. This paper provides an overview and analysis of the existing research on machine unlearning, aiming to present the current vulnerabilities of machine unlearning approaches. We analyze privacy risks in various aspects, including definitions, implementation methods, and real-world applications. Compared to existing reviews, we analyze the new challenges posed by the latest malicious attack techniques on machine unlearning from the perspective of privacy threats. We hope that this survey can provide an initial but comprehensive discussion on this new emerging area.
Federated learning in food research
Authors: Zuzanna Fendor, Bas H.M. van der Velden, Xinxin Wang, Andrea Jr. Carnoli, Osman Mutlu, Ali Hürriyetoğlu
Subjects: Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2406.06202
Pdf link: https://arxiv.org/pdf/2406.06202
Abstract Research in the food domain is at times limited due to data sharing obstacles, such as data ownership, privacy requirements, and regulations. While important, these obstacles can restrict data-driven methods such as machine learning. Federated learning, the approach of training models on locally kept data and only sharing the learned parameters, is a potential technique to alleviate data sharing obstacles. This systematic review investigates the use of federated learning within the food domain, structures included papers in a federated learning framework, highlights knowledge gaps, and discusses potential applications. A total of 41 papers were included in the review. The current applications include solutions to water and milk quality assessment, cybersecurity of water processing, pesticide residue risk analysis, weed detection, and fraud detection, focusing on centralized horizontal federated learning. One of the gaps found was the lack of vertical or transfer federated learning and decentralized architectures.
Lurking in the shadows: Unveiling Stealthy Backdoor Attacks against Personalized Federated Learning
Authors: Xiaoting Lyu, Yufei Han, Wei Wang, Jingkai Liu, Yongsheng Zhu, Guangquan Xu, Jiqiang Liu, Xiangliang Zhang
Subjects: Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2406.06207
Pdf link: https://arxiv.org/pdf/2406.06207
Abstract Federated Learning (FL) is a collaborative machine learning technique where multiple clients work together with a central server to train a global model without sharing their private data. However, the distribution shift across non-IID datasets of clients poses a challenge to this one-model-fits-all method hindering the ability of the global model to effectively adapt to each client's unique local data. To echo this challenge, personalized FL (PFL) is designed to allow each client to create personalized local models tailored to their private data. While extensive research has scrutinized backdoor risks in FL, it has remained underexplored in PFL applications. In this study, we delve deep into the vulnerabilities of PFL to backdoor attacks. Our analysis showcases a tale of two cities. On the one hand, the personalization process in PFL can dilute the backdoor poisoning effects injected into the personalized local models. Furthermore, PFL systems can also deploy both server-end and client-end defense mechanisms to strengthen the barrier against backdoor attacks. On the other hand, our study shows that PFL fortified with these defense methods may offer a false sense of security. We propose \textit{PFedBA}, a stealthy and effective backdoor attack strategy applicable to PFL systems. \textit{PFedBA} ingeniously aligns the backdoor learning task with the main learning task of PFL by optimizing the trigger generation process. Our comprehensive experiments demonstrate the effectiveness of \textit{PFedBA} in seamlessly embedding triggers into personalized local models. \textit{PFedBA} yields outstanding attack performance across 10 state-of-the-art PFL algorithms, defeating the existing 6 defense mechanisms. Our study sheds light on the subtle yet potent backdoor threats to PFL systems, urging the community to bolster defenses against emerging backdoor challenges.
Siren -- Advancing Cybersecurity through Deception and Adaptive Analysis
Authors: Girish Kulathumani, Samruth Ananthanarayanan, Ganesh Narayanan
Subjects: Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2406.06225
Pdf link: https://arxiv.org/pdf/2406.06225
Abstract Siren represents a pioneering research effort aimed at fortifying cybersecurity through strategic integration of deception, machine learning, and proactive threat analysis. Drawing inspiration from mythical sirens, this project employs sophisticated methods to lure potential threats into controlled environments. The system features a dynamic machine learning model for real-time analysis and classification, ensuring continuous adaptability to emerging cyber threats. The architectural framework includes a link monitoring proxy, a purpose-built machine learning model for dynamic link analysis, and a honeypot enriched with simulated user interactions to intensify threat engagement. Data protection within the honeypot is fortified with probabilistic encryption. Additionally, the incorporation of simulated user activity extends the system's capacity to capture and learn from potential attackers even after user disengagement. Siren introduces a paradigm shift in cybersecurity, transforming traditional defense mechanisms into proactive systems that actively engage and learn from potential adversaries. The research strives to enhance user protection while yielding valuable insights for ongoing refinement in response to the evolving landscape of cybersecurity threats.
PAC-Bayes Analysis for Recalibration in Classification
Authors: Masahiro Fujisawa, Futoshi Futami
Subjects: Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2406.06227
Pdf link: https://arxiv.org/pdf/2406.06227
Abstract Nonparametric estimation with binning is widely employed in the calibration error evaluation and the recalibration of machine learning models. Recently, theoretical analyses of the bias induced by this estimation approach have been actively pursued; however, the understanding of the generalization of the calibration error to unknown data remains limited. In addition, although many recalibration algorithms have been proposed, their generalization performance lacks theoretical guarantees. To address this problem, we conduct a generalization analysis of the calibration error under the probably approximately correct (PAC) Bayes framework. This approach enables us to derive a first optimizable upper bound for the generalization error in the calibration context. We then propose a generalization-aware recalibration algorithm based on our generalization theory. Numerical experiments show that our algorithm improves the Gaussian-process-based recalibration performance on various benchmark datasets and models.
A Guide to Stochastic Optimisation for Large-Scale Inverse Problems
Authors: Matthias J. Ehrhardt, Zeljko Kereta, Jingwei Liang, Junqi Tang
Subjects: Subjects: Numerical Analysis (math.NA); Computer Vision and Pattern Recognition (cs.CV); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2406.06342
Pdf link: https://arxiv.org/pdf/2406.06342
Abstract Stochastic optimisation algorithms are the de facto standard for machine learning with large amounts of data. Handling only a subset of available data in each optimisation step dramatically reduces the per-iteration computational costs, while still ensuring significant progress towards the solution. Driven by the need to solve large-scale optimisation problems as efficiently as possible, the last decade has witnessed an explosion of research in this area. Leveraging the parallels between machine learning and inverse problems has allowed harnessing the power of this research wave for solving inverse problems. In this survey, we provide a comprehensive account of the state-of-the-art in stochastic optimisation from the viewpoint of inverse problems. We present algorithms with diverse modalities of problem randomisation and discuss the roles of variance reduction, acceleration, higher-order methods, and other algorithmic modifications, and compare theoretical results with practical behaviour. We focus on the potential and the challenges for stochastic optimisation that are unique to inverse imaging problems and are not commonly encountered in machine learning. We conclude the survey with illustrative examples from imaging problems to examine the advantages and disadvantages that this new generation of algorithms bring to the field of inverse problems.
Controlling Emotion in Text-to-Speech with Natural Language Prompts
Authors: Thomas Bott, Florian Lux, Ngoc Thang Vu
Subjects: Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2406.06406
Pdf link: https://arxiv.org/pdf/2406.06406
Abstract In recent years, prompting has quickly become one of the standard ways of steering the outputs of generative machine learning models, due to its intuitive use of natural language. In this work, we propose a system conditioned on embeddings derived from an emotionally rich text that serves as prompt. Thereby, a joint representation of speaker and prompt embeddings is integrated at several points within a transformer-based architecture. Our approach is trained on merged emotional speech and text datasets and varies prompts in each training iteration to increase the generalization capabilities of the model. Objective and subjective evaluation results demonstrate the ability of the conditioned synthesis system to accurately transfer the emotions present in a prompt to speech. At the same time, precise tractability of speaker identities as well as overall high speech quality and intelligibility are maintained.
A Taxonomy of Challenges to Curating Fair Datasets
Authors: Dora Zhao, Morgan Klaus Scheuerman, Pooja Chitre, Jerone T.A. Andrews, Georgia Panagiotidou, Shawn Walker, Kathleen H. Pine, Alice Xiang
Subjects: Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2406.06407
Pdf link: https://arxiv.org/pdf/2406.06407
Abstract Despite extensive efforts to create fairer machine learning (ML) datasets, there remains a limited understanding of the practical aspects of dataset curation. Drawing from interviews with 30 ML dataset curators, we present a comprehensive taxonomy of the challenges and trade-offs encountered throughout the dataset curation lifecycle. Our findings underscore overarching issues within the broader fairness landscape that impact data curation. We conclude with recommendations aimed at fostering systemic changes to better facilitate fair dataset curation practices.
Foundation Inference Models for Markov Jump Processes
Authors: David Berghaus, Kostadin Cvejoski, Patrick Seifner, Cesar Ojeda, Ramses J. Sanchez
Subjects: Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2406.06419
Pdf link: https://arxiv.org/pdf/2406.06419
Abstract Markov jump processes are continuous-time stochastic processes which describe dynamical systems evolving in discrete state spaces. These processes find wide application in the natural sciences and machine learning, but their inference is known to be far from trivial. In this work we introduce a methodology for zero-shot inference of Markov jump processes (MJPs), on bounded state spaces, from noisy and sparse observations, which consists of two components. First, a broad probability distribution over families of MJPs, as well as over possible observation times and noise mechanisms, with which we simulate a synthetic dataset of hidden MJPs and their noisy observation process. Second, a neural network model that processes subsets of the simulated observations, and that is trained to output the initial condition and rate matrix of the target MJP in a supervised way. We empirically demonstrate that one and the same (pretrained) model can infer, in a zero-shot fashion, hidden MJPs evolving in state spaces of different dimensionalities. Specifically, we infer MJPs which describe (i) discrete flashing ratchet systems, which are a type of Brownian motors, and the conformational dynamics in (ii) molecular simulations, (iii) experimental ion channel data and (iv) simple protein folding models. What is more, we show that our model performs on par with state-of-the-art models which are finetuned to the target datasets.
Deep Generative Modeling Reshapes Compression and Transmission: From Efficiency to Resiliency
Authors: Jincheng Dai, Xiaoqi Qin, Sixian Wang, Lexi Xu, Kai Niu, Ping Zhang
Subjects: Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Multimedia (cs.MM)
Arxiv link: https://arxiv.org/abs/2406.06446
Pdf link: https://arxiv.org/pdf/2406.06446
Abstract Information theory and machine learning are inextricably linked and have even been referred to as "two sides of the same coin". One particularly elegant connection is the essential equivalence between probabilistic generative modeling and data compression or transmission. In this article, we reveal the dual-functionality of deep generative models that reshapes both data compression for efficiency and transmission error concealment for resiliency. We present how the contextual predictive capabilities of powerful generative models can be well positioned to be strong compressors and estimators. In this sense, we advocate for viewing the deep generative modeling problem through the lens of end-to-end communications, and evaluate the compression and error restoration capabilities of foundation generative models. We show that the kernel of many large generative models is powerful predictor that can capture complex relationships among semantic latent variables, and the communication viewpoints provide novel insights into semantic feature tokenization, contextual learning, and usage of deep generative models. In summary, our article highlights the essential connections of generative AI to source and channel coding techniques, and motivates researchers to make further explorations in this emerging topic.
How is the Pilot Doing: VTOL Pilot Workload Estimation by Multimodal Machine Learning on Psycho-physiological Signals
Authors: Jong Hoon Park, Lawrence Chen, Ian Higgins, Zhaobo Zheng, Shashank Mehrotra, Kevin Salubre, Mohammadreza Mousaei, Steven Willits, Blain Levedahl, Timothy Buker, Eliot Xing, Teruhisa Misu, Sebastian Scherer, Jean Oh
Subjects: Subjects: Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2406.06448
Pdf link: https://arxiv.org/pdf/2406.06448
Abstract Vertical take-off and landing (VTOL) aircraft do not require a prolonged runway, thus allowing them to land almost anywhere. In recent years, their flexibility has made them popular in development, research, and operation. When compared to traditional fixed-wing aircraft and rotorcraft, VTOLs bring unique challenges as they combine many maneuvers from both types of aircraft. Pilot workload is a critical factor for safe and efficient operation of VTOLs. In this work, we conduct a user study to collect multimodal data from 28 pilots while they perform a variety of VTOL flight tasks. We analyze and interpolate behavioral patterns related to their performance and perceived workload. Finally, we build machine learning models to estimate their workload from the collected data. Our results are promising, suggesting that quantitative and accurate VTOL pilot workload monitoring is viable. Such assistive tools would help the research field understand VTOL operations and serve as a stepping stone for the industry to ensure VTOL safe operations and further remote operations.
OmniLytics+: A Secure, Efficient, and Affordable Blockchain Data Market for Machine Learning through Off-Chain Processing
Authors: Songze Li, Mingzhe Liu, Mengqi Chen
Subjects: Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2406.06477
Pdf link: https://arxiv.org/pdf/2406.06477
Abstract The rapid development of large machine learning (ML) models requires a massive amount of training data, resulting in booming demands of data sharing and trading through data markets. Traditional centralized data markets suffer from low level of security, and emerging decentralized platforms are faced with efficiency and privacy challenges. In this paper, we propose OmniLytics+, the first decentralized data market, built upon blockchain and smart contract technologies, to simultaneously achieve 1) data (resp., model) privacy for the data (resp. model) owner; 2) robustness against malicious data owners; 3) efficient data validation and aggregation. Specifically, adopting the zero-knowledge (ZK) rollup paradigm, OmniLytics+ proposes to secret share encrypted local gradients, computed from the encrypted global model, with a set of untrusted off-chain servers, who collaboratively generate a ZK proof on the validity of the gradient. In this way, the storage and processing overheads are securely offloaded from blockchain verifiers, significantly improving the privacy, efficiency, and affordability over existing rollup solutions. We implement the proposed OmniLytics+ data market as an Ethereum smart contract [41]. Extensive experiments demonstrate the effectiveness of OmniLytics+ in training large ML models in presence of malicious data owner, and the substantial advantages of OmniLytics+ in gas cost and execution time over baselines.
Continuum Attention for Neural Operators
Authors: Edoardo Calvello, Nikola B. Kovachki, Matthew E. Levine, Andrew M. Stuart
Subjects: Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2406.06486
Pdf link: https://arxiv.org/pdf/2406.06486
Abstract Transformers, and the attention mechanism in particular, have become ubiquitous in machine learning. Their success in modeling nonlocal, long-range correlations has led to their widespread adoption in natural language processing, computer vision, and time-series problems. Neural operators, which map spaces of functions into spaces of functions, are necessarily both nonlinear and nonlocal if they are universal; it is thus natural to ask whether the attention mechanism can be used in the design of neural operators. Motivated by this, we study transformers in the function space setting. We formulate attention as a map between infinite dimensional function spaces and prove that the attention mechanism as implemented in practice is a Monte Carlo or finite difference approximation of this operator. The function space formulation allows for the design of transformer neural operators, a class of architectures designed to learn mappings between function spaces, for which we prove a universal approximation result. The prohibitive cost of applying the attention operator to functions defined on multi-dimensional domains leads to the need for more efficient attention-based architectures. For this reason we also introduce a function space generalization of the patching strategy from computer vision, and introduce a class of associated neural operators. Numerical results, on an array of operator learning problems, demonstrate the promise of our approaches to function space formulations of attention and their use in neural operators.

qiaoyuet / arxiv_daily

New submissions for Tuesday, 11 June 2024 (showing 666 of 666 entries ) #119

Keyword: differential privacy

Efficient Differentially Private Fine-Tuning of Diffusion Models

Privacy-Preserving Optimal Parameter Selection for Collaborative Clustering

Comments on "Federated Learning with Differential Privacy: Algorithms and Performance Analysis"

Keyword: privacy

oBAKE: an Online Biometric-Authenticated Key Exchange Protocol

CredSec: A Blockchain-based Secure Credential Management System for University Adoption

Federated LoRA with Sparse Communication

A Language Model-Guided Framework for Mining Time Series with Distributional Shifts

Efficient Differentially Private Fine-Tuning of Diffusion Models

Is On-Device AI Broken and Exploitable? Assessing the Trust and Ethics in Small Language Models

PTF-FSR: A Parameter Transmission-Free Federated Sequential Recommender System

Deconstructing The Ethics of Large Language Models from Long-standing Issues to New-emerging Dilemmas

PrivacyCube: Data Physicalization for Enhancing Privacy Awareness in IoT

PriviFy: Designing Tangible Interfaces for Configuring IoT Privacy Preferences

Blockchain Integrated Federated Learning in Edge-Fog-Cloud Systems for IoT based Healthcare Applications A Survey

Privacy-Preserving Optimal Parameter Selection for Collaborative Clustering

CCSI: Continual Class-Specific Impression for Data-free Class Incremental Learning

A Superalignment Framework in Autonomous Driving with Large Language Models

Region of Interest Loss for Anonymizing Learned Image Compression

Methodology and Real-World Applications of Dynamic Uncertain Causality Graph for Clinical Diagnosis with Explainability and Invariance

Comments on "Federated Learning with Differential Privacy: Algorithms and Performance Analysis"

Source -Free Domain Adaptation for Speaker Verification in Data-Scarce Languages and Noisy Channels

Jailbreaking Quantum Computers

Dynamic Virtual Power Plants With Frequency Regulation Capacity

CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models

The Curse of Popularity: Popular Entities have Catastrophic Side Effects when Deleting Knowledge from Language Models

A Survey on Machine Unlearning: Techniques and New Emerged Privacy Risks

Federated learning in food research

A LoRa-based Energy-efficient Sensing System for Urban Data Collection

DiffAudit: Auditing Privacy Practices of Online Services for Children and Adolescents

OmniLytics+: A Secure, Efficient, and Affordable Blockchain Data Market for Machine Learning through Off-Chain Processing

Keyword: machine learning

Evaluating the Effectiveness of Data Augmentation for Emotion Classification in Low-Resource Settings

Federated LoRA with Sparse Communication

Automated Trustworthiness Testing for Machine Learning Classifiers

Investigating Memory Failure Prediction Across CPU Architectures

Design of reliable technology valuation model with calibrated machine learning of patent indicators

RandONet: Shallow-Networks with Random Projections for learning linear and nonlinear operators

A Novel Generative AI-Based Framework for Anomaly Detection in Multicast Messages in Smart Grid Communications

Attri-Net: A Globally and Locally Inherently Interpretable Model for Multi-Label Classification Using Class-Specific Counterfactuals

G-Transformer: Counterfactual Outcome Prediction under Dynamic and Time-varying Treatment Regimes

Blockchain Integrated Federated Learning in Edge-Fog-Cloud Systems for IoT based Healthcare Applications A Survey

Training Through Failure: Effects of Data Consistency in Parallel Machine Learning Training

Learning Human Detected Differences in Directed Acyclic Graphs

Flexible Multi-Dimensional FFTs for Plane Wave Density Functional Theory Codes

Deep Learning to Predict Glaucoma Progression using Structural Changes in the Eye

Which Backbone to Use: A Resource-efficient Domain Specific Comparison for Computer Vision

Cross Language Soccer Framework: An Open Source Framework for the RoboCup 2D Soccer Simulation

General Distribution Learning: A theoretical framework for Deep Learning

Certified Robustness to Data Poisoning in Gradient-Based Training

Understanding Open Source Contributor Profiles in Popular Machine Learning Libraries

Peptide Vaccine Design by Evolutionary Multi-Objective Optimization

Rapid Optimization of Superposition Codes for Multi-Hop NOMA MANETs via Deep Unfolding

Numerical solution of a PDE arising from prediction with expert advice

ControlLoc: Physical-World Hijacking Attack on Visual Perception in Autonomous Driving

What Can We Learn from State Space Models for Machine Learning on Graphs?

Self-Distilled Disentangled Learning for Counterfactual Prediction

Conserving Human Creativity with Evolutionary Generative Algorithms: A Case Study in Music Generation

Few-Shot Load Forecasting Under Data Scarcity in Smart Grids: A Meta-Learning Approach

Event prediction and causality inference despite incomplete information

Whose Preferences? Differences in Fairness Preferences and Their Impact on the Fairness of AI Utilizing Human Feedback

Explainable AI for Mental Disorder Detection via Social Media: A survey and outlook

A Dual-View Approach to Classifying Radiology Reports by Co-Training

fSEAD: a Composable FPGA-based Streaming Ensemble Anomaly Detection Library

EpiLearn: A Python Library for Machine Learning in Epidemic Modeling

GraphStorm: all-in-one graph machine learning framework for industry applications

Diving into Underwater: Segment Anything Model Guided Underwater Salient Instance Segmentation and A Large-scale Dataset

LLM-Based Intent Processing and Network Optimization Using Attention-Based Hierarchical Reinforcement Learning

Learning Physical Simulation with Message Passing Transformer

Supervised Radio Frequency Interference Detection with SNNs

Sequential Binary Classification for Intrusion Detection in Software Defined Networks

DiffInject: Revisiting Debias via Synthetic Data Generation using Diffusion-based Style Injection

A Survey on Machine Unlearning: Techniques and New Emerged Privacy Risks

Federated learning in food research

Lurking in the shadows: Unveiling Stealthy Backdoor Attacks against Personalized Federated Learning

Siren -- Advancing Cybersecurity through Deception and Adaptive Analysis

PAC-Bayes Analysis for Recalibration in Classification