New submissions for Fri, 3 May 24

Keyword: differential privacy

The Privacy Power of Correlated Noise in Decentralized Learning

Authors: Youssef Allouah, Anastasia Koloskova, Aymane El Firdoussi, Martin Jaggi, Rachid Guerraoui
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2405.01031
Pdf link: https://arxiv.org/pdf/2405.01031
Abstract Decentralized learning is appealing as it enables the scalable usage of large amounts of distributed data and resources (without resorting to any central entity), while promoting privacy since every user minimizes the direct exposure of their data. Yet, without additional precautions, curious users can still leverage models obtained from their peers to violate privacy. In this paper, we propose Decor, a variant of decentralized SGD with differential privacy (DP) guarantees. Essentially, in Decor, users securely exchange randomness seeds in one communication round to generate pairwise-canceling correlated Gaussian noises, which are injected to protect local models at every communication round. We theoretically and empirically show that, for arbitrary connected graphs, Decor matches the central DP optimal privacy-utility trade-off. We do so under SecLDP, our new relaxation of local DP, which protects all user communications against an external eavesdropper and curious users, assuming that every pair of connected users shares a secret, i.e., an information hidden to all others. The main theoretical challenge is to control the accumulation of non-canceling correlated noise due to network sparsity. We also propose a companion SecLDP privacy accountant for public use.
Privacy-Enhanced Database Synthesis for Benchmark Publishing
Authors: Yongrui Zhong, Yunqing Ge, Jianbin Qin, Shuyuan Zheng, Bo Tang, Yu-Xuan Qiu, Rui Mao, Ye Yuan, Makoto Onizuka, Chuan Xiao
Subjects: Databases (cs.DB); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2405.01312
Pdf link: https://arxiv.org/pdf/2405.01312
Abstract Benchmarking is crucial for evaluating a DBMS, yet existing benchmarks often fail to reflect the varied nature of user workloads. As a result, there is increasing momentum toward creating databases that incorporate real-world user data to more accurately mirror business environments. However, privacy concerns deter users from directly sharing their data, underscoring the importance of creating synthesized databases for benchmarking that also prioritize privacy protection. Differential privacy has become a key method for safeguarding privacy when sharing data, but the focus has largely been on minimizing errors in aggregate queries or classification tasks, with less attention given to benchmarking factors like runtime performance. This paper delves into the creation of privacy-preserving databases specifically for benchmarking, aiming to produce a differentially private database whose query performance closely resembles that of the original data. Introducing PrivBench, an innovative synthesis framework, we support the generation of high-quality data that maintains privacy. PrivBench uses sum-product networks (SPNs) to partition and sample data, enhancing data representation while securing privacy. The framework allows users to adjust the detail of SPN partitions and privacy settings, crucial for customizing privacy levels. We validate our approach, which uses the Laplace and exponential mechanisms, in maintaining privacy. Our tests show that PrivBench effectively generates data that maintains privacy and excels in query performance, consistently reducing errors in query execution time, query cardinality, and KL divergence.
Navigating Heterogeneity and Privacy in One-Shot Federated Learning with Diffusion Models
Authors: Matias Mendieta, Guangyu Sun, Chen Chen
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.01494
Pdf link: https://arxiv.org/pdf/2405.01494
Abstract Federated learning (FL) enables multiple clients to train models collectively while preserving data privacy. However, FL faces challenges in terms of communication cost and data heterogeneity. One-shot federated learning has emerged as a solution by reducing communication rounds, improving efficiency, and providing better security against eavesdropping attacks. Nevertheless, data heterogeneity remains a significant challenge, impacting performance. This work explores the effectiveness of diffusion models in one-shot FL, demonstrating their applicability in addressing data heterogeneity and improving FL performance. Additionally, we investigate the utility of our diffusion model approach, FedDiff, compared to other one-shot FL methods under differential privacy (DP). Furthermore, to improve generated sample quality under DP settings, we propose a pragmatic Fourier Magnitude Filtering (FMF) method, enhancing the effectiveness of generated data for global model training.
Keyword: privacy

Federated Graph Learning for EV Charging Demand Forecasting with Personalization Against Cyberattacks
Authors: Yi Li, Renyou Xie, Chaojie Li, Yi Wang, Zhaoyang Dong
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2405.00742
Pdf link: https://arxiv.org/pdf/2405.00742
Abstract Mitigating cybersecurity risk in electric vehicle (EV) charging demand forecasting plays a crucial role in the safe operation of collective EV chargings, the stability of the power grid, and the cost-effective infrastructure expansion. However, existing methods either suffer from the data privacy issue and the susceptibility to cyberattacks or fail to consider the spatial correlation among different stations. To address these challenges, a federated graph learning approach involving multiple charging stations is proposed to collaboratively train a more generalized deep learning model for demand forecasting while capturing spatial correlations among various stations and enhancing robustness against potential attacks. Firstly, for better model performance, a Graph Neural Network (GNN) model is leveraged to characterize the geographic correlation among different charging stations in a federated manner. Secondly, to ensure robustness and deal with the data heterogeneity in a federated setting, a message passing that utilizes a global attention mechanism to aggregate personalized models for each client is proposed. Thirdly, by concerning cyberattacks, a special credit-based function is designed to mitigate potential threats from malicious clients or unwanted attacks. Extensive experiments on a public EV charging dataset are conducted using various deep learning techniques and federated learning methods to demonstrate the prediction accuracy and robustness of the proposed approach.
The Impact of IMSI Catcher Deployments on Cellular Network Security: Challenges and Countermeasures in 4G and 5G Networks
Authors: Karwan Mustafa Kareem
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2405.00793
Pdf link: https://arxiv.org/pdf/2405.00793
Abstract IMSI (International Mobile Subscriber Identity) catchers, also known as "Stingrays" or "cell site simulators," are rogue devices that pose a significant threat to cellular network security [1]. IMSI catchers can intercept and manipulate cellular communications, compromising the privacy and security of mobile devices and their users. With the advent of 4G and 5G networks, IMSI catchers have become more sophisticated and pose new challenges to cellular network security [2]. This paper provides an overview of the impact of IMSI catcher deployments on cellular network security in the context of 4G and 5G networks. It discusses the challenges posed by IMSI catchers, including the unauthorized collection of IMSI numbers, interception of communications, and potential misuse of subscriber information. It also highlights the potential consequences of IMSI catcher deployments, including the compromise of user privacy, financial fraud, and unauthorized surveillance. The paper further reviews the countermeasures that can be employed to mitigate the risks posed by IMSI catchers. These countermeasures include network-based solutions such as signal analysis, encryption, and authentication mechanisms, as well as user-based solutions such as mobile applications and device settings. The paper also discusses the limitations and effectiveness of these countermeasures in the context of 4G and 5G networks. Finally, the paper identifies research gaps and future directions for enhancing cellular network security against IMSI catchers in the era of 4G and 5G networks. This includes the need for improved encryption algorithms, authentication mechanisms, and detection techniques to effectively detect and prevent IMSI catcher deployments. The paper also emphasizes the importance of regulatory and policy measures to govern the deployment and use of IMSI catchers to protect user privacy and security.
Communication-Efficient Training Workload Balancing for Decentralized Multi-Agent Learning
Authors: Seyed Mahmoud Sajjadi Mohammadabadi, Lei Yang, Feng Yan, Junshan Zhang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Multiagent Systems (cs.MA); Performance (cs.PF)
Arxiv link: https://arxiv.org/abs/2405.00839
Pdf link: https://arxiv.org/pdf/2405.00839
Abstract Decentralized Multi-agent Learning (DML) enables collaborative model training while preserving data privacy. However, inherent heterogeneity in agents' resources (computation, communication, and task size) may lead to substantial variations in training time. This heterogeneity creates a bottleneck, lengthening the overall training time due to straggler effects and potentially wasting spare resources of faster agents. To minimize training time in heterogeneous environments, we present a Communication-Efficient Training Workload Balancing for Decentralized Multi-Agent Learning (ComDML), which balances the workload among agents through a decentralized approach. Leveraging local-loss split training, ComDML enables parallel updates, where slower agents offload part of their workload to faster agents. To minimize the overall training time, ComDML optimizes the workload balancing by jointly considering the communication and computation capacities of agents, which hinges upon integer programming. A dynamic decentralized pairing scheduler is developed to efficiently pair agents and determine optimal offloading amounts. We prove that in ComDML, both slower and faster agents' models converge, for convex and non-convex functions. Furthermore, extensive experimental results on popular datasets (CIFAR-10, CIFAR-100, and CINIC-10) and their non-I.I.D. variants, with large models such as ResNet-56 and ResNet-110, demonstrate that ComDML can significantly reduce the overall training time while maintaining model accuracy, compared to state-of-the-art methods. ComDML demonstrates robustness in heterogeneous environments, and privacy measures can be seamlessly integrated for enhanced data protection.
Quantum Federated Learning Experiments in the Cloud with Data Encoding
Authors: Shiva Raj Pokhrel, Naman Yash, Jonathan Kua, Gang Li, Lei Pan
Subjects: Machine Learning (cs.LG); Emerging Technologies (cs.ET); Quantum Physics (quant-ph)
Arxiv link: https://arxiv.org/abs/2405.00909
Pdf link: https://arxiv.org/pdf/2405.00909
Abstract Quantum Federated Learning (QFL) is an emerging concept that aims to unfold federated learning (FL) over quantum networks, enabling collaborative quantum model training along with local data privacy. We explore the challenges of deploying QFL on cloud platforms, emphasizing quantum intricacies and platform limitations. The proposed data-encoding-driven QFL, with a proof of concept (GitHub Open Source) using genomic data sets on quantum simulators, shows promising results.
Recovering Labels from Local Updates in Federated Learning
Authors: Huancheng Chen, Haris Vikalo
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2405.00955
Pdf link: https://arxiv.org/pdf/2405.00955
Abstract Gradient inversion (GI) attacks present a threat to the privacy of clients in federated learning (FL) by aiming to enable reconstruction of the clients' data from communicated model updates. A number of such techniques attempts to accelerate data recovery by first reconstructing labels of the samples used in local training. However, existing label extraction methods make strong assumptions that typically do not hold in realistic FL settings. In this paper we present a novel label recovery scheme, Recovering Labels from Local Updates (RLU), which provides near-perfect accuracy when attacking untrained (most vulnerable) models. More significantly, RLU achieves high performance even in realistic real-world settings where the clients in an FL system run multiple local epochs, train on heterogeneous data, and deploy various optimizers to minimize different objective functions. Specifically, RLU estimates labels by solving a least-square problem that emerges from the analysis of the correlation between labels of the data points used in a training round and the resulting update of the output layer. The experimental results on several datasets, architectures, and data heterogeneity scenarios demonstrate that the proposed method consistently outperforms existing baselines, and helps improve quality of the reconstructed images in GI attacks in terms of both PSNR and LPIPS.
FREE: Faster and Better Data-Free Meta-Learning
Authors: Yongxian Wei, Zixuan Hu, Zhenyi Wang, Li Shen, Chun Yuan, Dacheng Tao
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2405.00984
Pdf link: https://arxiv.org/pdf/2405.00984
Abstract Data-Free Meta-Learning (DFML) aims to extract knowledge from a collection of pre-trained models without requiring the original data, presenting practical benefits in contexts constrained by data privacy concerns. Current DFML methods primarily focus on the data recovery from these pre-trained models. However, they suffer from slow recovery speed and overlook gaps inherent in heterogeneous pre-trained models. In response to these challenges, we introduce the Faster and Better Data-Free Meta-Learning (FREE) framework, which contains: (i) a meta-generator for rapidly recovering training tasks from pre-trained models; and (ii) a meta-learner for generalizing to new unseen tasks. Specifically, within the module Faster Inversion via Meta-Generator, each pre-trained model is perceived as a distinct task. The meta-generator can rapidly adapt to a specific task in just five steps, significantly accelerating the data recovery. Furthermore, we propose Better Generalization via Meta-Learner and introduce an implicit gradient alignment algorithm to optimize the meta-learner. This is achieved as aligned gradient directions alleviate potential conflicts among tasks from heterogeneous pre-trained models. Empirical experiments on multiple benchmarks affirm the superiority of our approach, marking a notable speed-up (20$\times$) and performance enhancement (1.42\% $\sim$ 4.78\%) in comparison to the state-of-the-art.
Deep Learning Models in Speech Recognition: Measuring GPU Energy Consumption, Impact of Noise and Model Quantization for Edge Deployment
Authors: Aditya Chakravarty
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2405.01004
Pdf link: https://arxiv.org/pdf/2405.01004
Abstract Recent transformer-based ASR models have achieved word-error rates (WER) below 4%, surpassing human annotator accuracy, yet they demand extensive server resources, contributing to significant carbon footprints. The traditional server-based architecture of ASR also presents privacy concerns, alongside reliability and latency issues due to network dependencies. In contrast, on-device (edge) ASR enhances privacy, boosts performance, and promotes sustainability by effectively balancing energy use and accuracy for specific applications. This study examines the effects of quantization, memory demands, and energy consumption on the performance of various ASR model inference on the NVIDIA Jetson Orin Nano. By analyzing WER and transcription speed across models using FP32, FP16, and INT8 quantization on clean and noisy datasets, we highlight the crucial trade-offs between accuracy, speeds, quantization, energy efficiency, and memory needs. We found that changing precision from fp32 to fp16 halves the energy consumption for audio transcription across different models, with minimal performance degradation. A larger model size and number of parameters neither guarantees better resilience to noise, nor predicts the energy consumption for a given transcription load. These, along with several other findings offer novel insights for optimizing ASR systems within energy- and memory-limited environments, crucial for the development of efficient on-device ASR solutions. The code and input data needed to reproduce the results in this article are open sourced are available on [https://github.com/zzadiues3338/ASR-energy-jetson].
Towards Trust Proof for Secure Confidential Virtual Machines
Authors: Jingkai Mao, Haoran Zhu, Junchao Fan, Lin Li, Xiaolin Chang
Subjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2405.01030
Pdf link: https://arxiv.org/pdf/2405.01030
Abstract The Virtual Machine (VM)-based Trusted-Execution-Environment (TEE) technology, like AMD Secure-Encrypted-Virtualization (SEV), enables the establishment of Confidential VMs (CVMs) to protect data privacy. But CVM lacks ways to provide the trust proof of its running state, degrading the user confidence of using CVM. The technology of virtual Trusted Platform Module (vTPM) can be used to generate trust proof for CVM. However, the existing vTPM-based approaches have the weaknesses like lack of a well-defined root-of-trust, lack of vTPM protection, and lack of vTPM's trust proof. These weaknesses prevent the generation of the trust proof of the CVM. This paper proposes an approach to generate the trust proof for AMD SEV-based CVM so as to ensure its security by using a secure vTPM to construct Trusted Complete Chain for the CVM (T3CVM). T3CVM consists of three components: 1) TR-Manager, as the well-defined root-of-trust, helps to build complete trust chains for CVMs; 2) CN-TPMCVM, a special CVM provides secure vTPMs; 3) CN-CDriver, an enhanced TPM driver. Our approach overcomes the weaknesses of existing approaches and enables trusted computing-based applications to run seamlessly in the trusted CVM. We perform a formal security analysis of T3CVM, and implement a prototype system to evaluate its performance.
The Privacy Power of Correlated Noise in Decentralized Learning
Authors: Youssef Allouah, Anastasia Koloskova, Aymane El Firdoussi, Martin Jaggi, Rachid Guerraoui
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2405.01031
Pdf link: https://arxiv.org/pdf/2405.01031
Abstract Decentralized learning is appealing as it enables the scalable usage of large amounts of distributed data and resources (without resorting to any central entity), while promoting privacy since every user minimizes the direct exposure of their data. Yet, without additional precautions, curious users can still leverage models obtained from their peers to violate privacy. In this paper, we propose Decor, a variant of decentralized SGD with differential privacy (DP) guarantees. Essentially, in Decor, users securely exchange randomness seeds in one communication round to generate pairwise-canceling correlated Gaussian noises, which are injected to protect local models at every communication round. We theoretically and empirically show that, for arbitrary connected graphs, Decor matches the central DP optimal privacy-utility trade-off. We do so under SecLDP, our new relaxation of local DP, which protects all user communications against an external eavesdropper and curious users, assuming that every pair of connected users shares a secret, i.e., an information hidden to all others. The main theoretical challenge is to control the accumulation of non-canceling correlated noise due to network sparsity. We also propose a companion SecLDP privacy accountant for public use.
Fair Recommendations with Limited Sensitive Attributes: A Distributionally Robust Optimization Approach
Authors: Tianhao Shi, Yang Zhang, Jizhi Zhang, Fuli Feng, Xiangnan He
Subjects: Information Retrieval (cs.IR); Computers and Society (cs.CY); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.01063
Pdf link: https://arxiv.org/pdf/2405.01063
Abstract As recommender systems are indispensable in various domains such as job searching and e-commerce, providing equitable recommendations to users with different sensitive attributes becomes an imperative requirement. Prior approaches for enhancing fairness in recommender systems presume the availability of all sensitive attributes, which can be difficult to obtain due to privacy concerns or inadequate means of capturing these attributes. In practice, the efficacy of these approaches is limited, pushing us to investigate ways of promoting fairness with limited sensitive attribute information. Toward this goal, it is important to reconstruct missing sensitive attributes. Nevertheless, reconstruction errors are inevitable due to the complexity of real-world sensitive attribute reconstruction problems and legal regulations. Thus, we pursue fair learning methods that are robust to reconstruction errors. To this end, we propose Distributionally Robust Fair Optimization (DRFO), which minimizes the worst-case unfairness over all potential probability distributions of missing sensitive attributes instead of the reconstructed one to account for the impact of the reconstruction errors. We provide theoretical and empirical evidence to demonstrate that our method can effectively ensure fairness in recommender systems when only limited sensitive attributes are accessible.
Boosting Communication Efficiency of Federated Learning's Secure Aggregation
Authors: Niousha Nazemi, Omid Tavallaie, Shuaijun Chen, Albert Y. Zomaya, Ralph Holz
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2405.01144
Pdf link: https://arxiv.org/pdf/2405.01144
Abstract Federated Learning (FL) is a decentralized machine learning approach where client devices train models locally and send them to a server that performs aggregation to generate a global model. FL is vulnerable to model inversion attacks, where the server can infer sensitive client data from trained models. Google's Secure Aggregation (SecAgg) protocol addresses this data privacy issue by masking each client's trained model using shared secrets and individual elements generated locally on the client's device. Although SecAgg effectively preserves privacy, it imposes considerable communication and computation overhead, especially as network size increases. Building upon SecAgg, this poster introduces a Communication-Efficient Secure Aggregation (CESA) protocol that substantially reduces this overhead by using only two shared secrets per client to mask the model. We propose our method for stable networks with low delay variation and limited client dropouts. CESA is independent of the data distribution and network size (for higher than 6 nodes), preventing the honest-but-curious server from accessing unmasked models. Our initial evaluation reveals that CESA significantly reduces the communication cost compared to SecAgg.
Gradient-Congruity Guided Federated Sparse Training
Authors: Chris Xing Tian, Yibing Liu, Haoliang Li, Ray C.C. Cheung, Shiqi Wang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2405.01189
Pdf link: https://arxiv.org/pdf/2405.01189
Abstract Edge computing allows artificial intelligence and machine learning models to be deployed on edge devices, where they can learn from local data and collaborate to form a global model. Federated learning (FL) is a distributed machine learning technique that facilitates this process while preserving data privacy. However, FL also faces challenges such as high computational and communication costs regarding resource-constrained devices, and poor generalization performance due to the heterogeneity of data across edge clients and the presence of out-of-distribution data. In this paper, we propose the Gradient-Congruity Guided Federated Sparse Training (FedSGC), a novel method that integrates dynamic sparse training and gradient congruity inspection into federated learning framework to address these issues. Our method leverages the idea that the neurons, in which the associated gradients with conflicting directions with respect to the global model contain irrelevant or less generalized information for other clients, and could be pruned during the sparse training process. Conversely, the neurons where the associated gradients with consistent directions could be grown in a higher priority. In this way, FedSGC can greatly reduce the local computation and communication overheads while, at the same time, enhancing the generalization abilities of FL. We evaluate our method on challenging non-i.i.d settings and show that it achieves competitive accuracy with state-of-the-art FL methods across various scenarios while minimizing computation and communication costs.
Improving Membership Inference in ASR Model Auditing with Perturbed Loss Features
Authors: Francisco Teixeira, Karla Pizzi, Raphael Olivier, Alberto Abad, Bhiksha Raj, Isabel Trancoso
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2405.01207
Pdf link: https://arxiv.org/pdf/2405.01207
Abstract Membership Inference (MI) poses a substantial privacy threat to the training data of Automatic Speech Recognition (ASR) systems, while also offering an opportunity to audit these models with regard to user data. This paper explores the effectiveness of loss-based features in combination with Gaussian and adversarial perturbations to perform MI in ASR models. To the best of our knowledge, this approach has not yet been investigated. We compare our proposed features with commonly used error-based features and find that the proposed features greatly enhance performance for sample-level MI. For speaker-level MI, these features improve results, though by a smaller margin, as error-based features already obtained a high performance for this task. Our findings emphasise the importance of considering different feature sets and levels of access to target models for effective MI in ASR systems, providing valuable insights for auditing such models.
A Survey on Semantic Communication Networks: Architecture, Security, and Privacy
Authors: Shaolong Guo, Yuntao Wang, Ning Zhang, Zhou Su, Tom H. Luan, Zhiyi Tian, Xuemin Shen
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2405.01221
Pdf link: https://arxiv.org/pdf/2405.01221
Abstract Semantic communication, emerging as a breakthrough beyond the classical Shannon paradigm, aims to convey the essential meaning of source data rather than merely focusing on precise yet content-agnostic bit transmission. By interconnecting diverse intelligent agents (e.g., autonomous vehicles and VR devices) via semantic communications, the semantic communication networks (SemComNet) supports semantic-oriented transmission, efficient spectrum utilization, and flexible networking among collaborative agents. Consequently, SemComNet stands out for enabling ever-increasing intelligent applications, such as autonomous driving and Metaverse. However, being built on a variety of cutting-edge technologies including AI and knowledge graphs, SemComNet introduces diverse brand-new and unexpected threats, which pose obstacles to its widespread development. Besides, due to the intrinsic characteristics of SemComNet in terms of heterogeneous components, autonomous intelligence, and large-scale structure, a series of critical challenges emerge in securing SemComNet. In this paper, we provide a comprehensive and up-to-date survey of SemComNet from its fundamentals, security, and privacy aspects. Specifically, we first introduce a novel three-layer architecture of SemComNet for multi-agent interaction, which comprises the control layer, semantic transmission layer, and cognitive sensing layer. Then, we discuss its working modes and enabling technologies. Afterward, based on the layered architecture of SemComNet, we outline a taxonomy of security and privacy threats, while discussing state-of-the-art defense approaches. Finally, we present future research directions, clarifying the path toward building intelligent, robust, and green SemComNet. To our knowledge, this survey is the first to comprehensively cover the fundamentals of SemComNet, alongside a detailed analysis of its security and privacy issues.
Towards Inclusive Face Recognition Through Synthetic Ethnicity Alteration
Authors: Praveen Kumar Chandaliya, Kiran Raja, Raghavendra Ramachandra, Zahid Akhtar, Christoph Busch
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2405.01273
Pdf link: https://arxiv.org/pdf/2405.01273
Abstract Numerous studies have shown that existing Face Recognition Systems (FRS), including commercial ones, often exhibit biases toward certain ethnicities due to under-represented data. In this work, we explore ethnicity alteration and skin tone modification using synthetic face image generation methods to increase the diversity of datasets. We conduct a detailed analysis by first constructing a balanced face image dataset representing three ethnicities: Asian, Black, and Indian. We then make use of existing Generative Adversarial Network-based (GAN) image-to-image translation and manifold learning models to alter the ethnicity from one to another. A systematic analysis is further conducted to assess the suitability of such datasets for FRS by studying the realistic skin-tone representation using Individual Typology Angle (ITA). Further, we also analyze the quality characteristics using existing Face image quality assessment (FIQA) approaches. We then provide a holistic FRS performance analysis using four different systems. Our findings pave the way for future research works in (i) developing both specific ethnicity and general (any to any) ethnicity alteration models, (ii) expanding such approaches to create databases with diverse skin tones, (iii) creating datasets representing various ethnicities which further can help in mitigating bias while addressing privacy concerns.
Privacy-Enhanced Database Synthesis for Benchmark Publishing
Authors: Yongrui Zhong, Yunqing Ge, Jianbin Qin, Shuyuan Zheng, Bo Tang, Yu-Xuan Qiu, Rui Mao, Ye Yuan, Makoto Onizuka, Chuan Xiao
Subjects: Databases (cs.DB); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2405.01312
Pdf link: https://arxiv.org/pdf/2405.01312
Abstract Benchmarking is crucial for evaluating a DBMS, yet existing benchmarks often fail to reflect the varied nature of user workloads. As a result, there is increasing momentum toward creating databases that incorporate real-world user data to more accurately mirror business environments. However, privacy concerns deter users from directly sharing their data, underscoring the importance of creating synthesized databases for benchmarking that also prioritize privacy protection. Differential privacy has become a key method for safeguarding privacy when sharing data, but the focus has largely been on minimizing errors in aggregate queries or classification tasks, with less attention given to benchmarking factors like runtime performance. This paper delves into the creation of privacy-preserving databases specifically for benchmarking, aiming to produce a differentially private database whose query performance closely resembles that of the original data. Introducing PrivBench, an innovative synthesis framework, we support the generation of high-quality data that maintains privacy. PrivBench uses sum-product networks (SPNs) to partition and sample data, enhancing data representation while securing privacy. The framework allows users to adjust the detail of SPN partitions and privacy settings, crucial for customizing privacy levels. We validate our approach, which uses the Laplace and exponential mechanisms, in maintaining privacy. Our tests show that PrivBench effectively generates data that maintains privacy and excels in query performance, consistently reducing errors in query execution time, query cardinality, and KL divergence.
IDPFilter: Mitigating Interdependent Privacy Issues in Third-Party Apps
Authors: Shuaishuai Liu, Gergely Biczók
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2405.01411
Pdf link: https://arxiv.org/pdf/2405.01411
Abstract Third-party applications have become an essential part of today's online ecosystem, enhancing the functionality of popular platforms. However, the intensive data exchange underlying their proliferation has increased concerns about interdependent privacy (IDP). This paper provides a comprehensive investigation into the previously underinvestigated IDP issues of third-party apps. Specifically, first, we analyze the permission structure of multiple app platforms, identifying permissions that have the potential to cause interdependent privacy issues by enabling a user to share someone else's personal data with an app. Second, we collect datasets and characterize the extent to which existing apps request these permissions, revealing the relationship between characteristics such as the respective app platform, the app's type, and the number of interdependent privacy-related permissions it requests. Third, we analyze the various reasons IDP is neglected by both data protection regulations and app platforms and then devise principles that should be followed when designing a mitigation solution. Finally, based on these principles and satisfying clearly defined objectives, we propose IDPFilter, a platform-agnostic API that enables application providers to minimize collateral information collection by filtering out data collected from their users but implicating others as data subjects. We implement a proof-of-concept prototype, IDPTextFilter, that implements the filtering logic on textual data, and provide its initial performance evaluation with regard to privacy, accuracy, and efficiency.
Exploring Privacy Issues in Mission Critical Communication: Navigating 5G and Beyond Networks
Authors: Prajnamaya Dass, Marcel Gräfenstein, Stefan Köpsell
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2405.01492
Pdf link: https://arxiv.org/pdf/2405.01492
Abstract Mission critical communication (MCC) involves the exchange of information and data among emergency services, including the police, fire brigade, and other first responders, particularly during emergencies, disasters, or critical incidents. The widely-adopted TETRA (Terrestrial Trunked Radio)-based communication for mission critical services faces challenges including limited data capacity, coverage limitations, spectrum congestion, and security concerns. Therefore, as an alternative, mission critical communication over cellular networks (4G and 5G) has emerged. While cellular-based MCC enables features like real-time video streaming and high-speed data transmission, the involvement of network operators and application service providers in the MCC architecture raises privacy concerns for mission critical users and services. For instance, the disclosure of a policeman's location details to the network operator raises privacy concerns. To the best of our knowledge, no existing work considers the privacy issues in mission critical system with respect to 5G and upcoming technologies. Therefore, in this paper, we analyse the 3GPP standardised MCC architecture within the context of 5G core network concepts and assess the privacy implications for MC users, network entities, and MC servers. The privacy analysis adheres to the deployment strategies in the standard for MCC. Additionally, we explore emerging 6G technologies, such as off-network communications, joint communication and sensing, and non-3GPP communications, to identify privacy challenges in MCC architecture. Finally, we propose privacy controls to establish a next-generation privacy-preserving MCC architecture.
Navigating Heterogeneity and Privacy in One-Shot Federated Learning with Diffusion Models
Authors: Matias Mendieta, Guangyu Sun, Chen Chen
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.01494
Pdf link: https://arxiv.org/pdf/2405.01494
Abstract Federated learning (FL) enables multiple clients to train models collectively while preserving data privacy. However, FL faces challenges in terms of communication cost and data heterogeneity. One-shot federated learning has emerged as a solution by reducing communication rounds, improving efficiency, and providing better security against eavesdropping attacks. Nevertheless, data heterogeneity remains a significant challenge, impacting performance. This work explores the effectiveness of diffusion models in one-shot FL, demonstrating their applicability in addressing data heterogeneity and improving FL performance. Additionally, we investigate the utility of our diffusion model approach, FedDiff, compared to other one-shot FL methods under differential privacy (DP). Furthermore, to improve generated sample quality under DP settings, we propose a pragmatic Fourier Magnitude Filtering (FMF) method, enhancing the effectiveness of generated data for global model training.
Keyword: machine learning

Understanding Social Perception, Interactions, and Safety Aspects of Sidewalk Delivery Robots Using Sentiment Analysis
Authors: Yuchen Du, Tho V. Le
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.00688
Pdf link: https://arxiv.org/pdf/2405.00688
Abstract This article presents a comprehensive sentiment analysis (SA) of comments on YouTube videos related to Sidewalk Delivery Robots (SDRs). We manually annotated the collected YouTube comments with three sentiment labels: negative (0), positive (1), and neutral (2). We then constructed models for text sentiment classification and tested the models' performance on both binary and ternary classification tasks in terms of accuracy, precision, recall, and F1 score. Our results indicate that, in binary classification tasks, the Support Vector Machine (SVM) model using Term Frequency-Inverse Document Frequency (TF-IDF) and N-gram get the highest accuracy. In ternary classification tasks, the model using Bidirectional Encoder Representations from Transformers (BERT), Long Short-Term Memory Networks (LSTM) and Gated Recurrent Unit (GRU) significantly outperforms other machine learning models, achieving an accuracy, precision, recall, and F1 score of 0.78. Additionally, we employ the Latent Dirichlet Allocation model to generate 10 topics from the comments to explore the public's underlying views on SDRs. Drawing from these findings, we propose targeted recommendations for shaping future policies concerning SDRs. This work provides valuable insights for stakeholders in the SDR sector regarding social perception, interaction, and safety.
Joint torques prediction of a robotic arm using neural networks
Authors: Giulia d'Addato, Ruggero Carli, Eurico Pedrosa, Artur Pereira, Luigi Palopoli, Daniele Fontanelli
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.00695
Pdf link: https://arxiv.org/pdf/2405.00695
Abstract Accurate dynamic models are crucial for many robotic applications. Traditional approaches to deriving these models are based on the application of Lagrangian or Newtonian mechanics. Although these methods provide a good insight into the physical behaviour of the system, they rely on the exact knowledge of parameters such as inertia, friction and joint flexibility. In addition, the system is often affected by uncertain and nonlinear effects, such as saturation and dead zones, which can be difficult to model. A popular alternative is the application of Machine Learning (ML) techniques - e.g., Neural Networks (NNs) - in the context of a "black-box" methodology. This paper reports on our experience with this approach for a real-life 6 degrees of freedom (DoF) manipulator. Specifically, we considered several NN architectures: single NN, multiple NNs, and cascade NN. We compared the performance of the system by using different policies for selecting the NN hyperparameters. Our experiments reveal that the best accuracy and performance are obtained by a cascade NN, in which we encode our prior physical knowledge about the dependencies between joints, complemented by an appropriate optimisation of the hyperparameters.
Interactive Analysis of LLMs using Meaningful Counterfactuals
Authors: Furui Cheng, Vilém Zouhar, Robin Shing Moon Chan, Daniel Fürst, Hendrik Strobelt, Mennatallah El-Assady
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.00708
Pdf link: https://arxiv.org/pdf/2405.00708
Abstract Counterfactual examples are useful for exploring the decision boundaries of machine learning models and determining feature attributions. How can we apply counterfactual-based methods to analyze and explain LLMs? We identify the following key challenges. First, the generated textual counterfactuals should be meaningful and readable to users and thus can be mentally compared to draw conclusions. Second, to make the solution scalable to long-form text, users should be equipped with tools to create batches of counterfactuals from perturbations at various granularity levels and interactively analyze the results. In this paper, we tackle the above challenges and contribute 1) a novel algorithm for generating batches of complete and meaningful textual counterfactuals by removing and replacing text segments in different granularities, and 2) LLM Analyzer, an interactive visualization tool to help users understand an LLM's behaviors by interactively inspecting and aggregating meaningful counterfactuals. We evaluate the proposed algorithm by the grammatical correctness of its generated counterfactuals using 1,000 samples from medical, legal, finance, education, and news datasets. In our experiments, 97.2% of the counterfactuals are grammatically correct. Through a use case, user studies, and feedback from experts, we demonstrate the usefulness and usability of the proposed interactive visualization tool.
HLSTransform: Energy-Efficient Llama 2 Inference on FPGAs Via High Level Synthesis
Authors: Andy He, Darren Key, Mason Bulling, Andrew Chang, Skyler Shapiro, Everett Lee
Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.00738
Pdf link: https://arxiv.org/pdf/2405.00738
Abstract Graphics Processing Units (GPUs) have become the leading hardware accelerator for deep learning applications and are used widely in training and inference of transformers; transformers have achieved state-of-the-art performance in many areas of machine learning and are especially used in most modern Large Language Models (LLMs). However, GPUs require large amounts of energy, which poses environmental concerns, demands high operational costs, and causes GPUs to be unsuitable for edge computing. We develop an accelerator for transformers, namely, Llama 2, an open-source state-of-the-art LLM, using high level synthesis (HLS) on Field Programmable Gate Arrays (FPGAs). HLS allows us to rapidly prototype FPGA designs without writing code at the register-transfer level (RTL). We name our method HLSTransform, and the FPGA designs we synthesize with HLS achieve up to a 12.75x reduction and 8.25x reduction in energy used per token on the Xilinx Virtex UltraScale+ VU9P FPGA compared to an Intel Xeon Broadwell E5-2686 v4 CPU and NVIDIA RTX 3090 GPU respectively, while increasing inference speeds by up to 2.46x compared to CPU and maintaining 0.53x the speed of an RTX 3090 GPU despite the GPU's 4 times higher base clock rate. With the lack of existing open-source FPGA accelerators for transformers, we open-source our code and document our steps for synthesis. We hope this work will serve as a step in democratizing the use of FPGAs in transformer inference and inspire research into energy-efficient inference methods as a whole. The code can be found on https://github.com/HLSTransform/submission.
On the weight dynamics of learning networks
Authors: Nahal Sharafi, Christoph Martin, Sarah Hallerberg
Subjects: Machine Learning (cs.LG); Chaotic Dynamics (nlin.CD)
Arxiv link: https://arxiv.org/abs/2405.00743
Pdf link: https://arxiv.org/pdf/2405.00743
Abstract Neural networks have become a widely adopted tool for tackling a variety of problems in machine learning and artificial intelligence. In this contribution we use the mathematical framework of local stability analysis to gain a deeper understanding of the learning dynamics of feed forward neural networks. Therefore, we derive equations for the tangent operator of the learning dynamics of three-layer networks learning regression tasks. The results are valid for an arbitrary numbers of nodes and arbitrary choices of activation functions. Applying the results to a network learning a regression task, we investigate numerically, how stability indicators relate to the final training-loss. Although the specific results vary with different choices of initial conditions and activation functions, we demonstrate that it is possible to predict the final training loss, by monitoring finite-time Lyapunov exponents or covariant Lyapunov vectors during the training process.
Quantum AI for Alzheimer's disease early screening
Authors: Giacomo Cappiello, Filippo Caruso
Subjects: Emerging Technologies (cs.ET); Machine Learning (cs.LG); Quantum Physics (quant-ph)
Arxiv link: https://arxiv.org/abs/2405.00755
Pdf link: https://arxiv.org/pdf/2405.00755
Abstract Quantum machine learning is a new research field combining quantum information science and machine learning. Quantum computing technologies seem to be particularly well suited to solving problems in the health sector in an efficient way, because they may deal with large datasets more efficiently than classical AI. Alzheimer's disease is a neurodegenerative brain disorder that mostly affects elderly people, causing important cognitive impairments. It is the most common cause of dementia and it has an effect on memory, thought, learning abilities and movement control. This type of disease has no cure, consequently an early diagnosis is fundamental for reducing its impact. The analysis of handwriting can be effective for diagnosing, as many researches have conjectured. The DARWIN (Diagnosis AlzheimeR WIth haNdwriting) dataset contains handwriting samples from people affected by Alzheimer's disease and a group of healthy people. Here we apply quantum AI to this use-case. In particular, we use this dataset to test kernel methods for classification task and compare their performances with the ones obtained via quantum machine learning methods. We find that quantum and classical algorithms achieve similar performances and in some cases quantum methods perform even better. Our results pave the way for future new quantum machine learning applications in early-screening diagnostics in the healthcare domain.
HLSFactory: A Framework Empowering High-Level Synthesis Datasets for Machine Learning and Beyond
Authors: Stefan Abi-Karam, Rishov Sarkar, Allison Seigler, Sean Lowe, Zhigang Wei, Hanqiu Chen, Nanditha Rao, Lizy John, Aman Arora, Cong Hao
Subjects: Hardware Architecture (cs.AR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.00820
Pdf link: https://arxiv.org/pdf/2405.00820
Abstract Machine learning (ML) techniques have been applied to high-level synthesis (HLS) flows for quality-of-result (QoR) prediction and design space exploration (DSE). Nevertheless, the scarcity of accessible high-quality HLS datasets and the complexity of building such datasets present challenges. Existing datasets have limitations in terms of benchmark coverage, design space enumeration, vendor extensibility, or lack of reproducible and extensible software for dataset construction. Many works also lack user-friendly ways to add more designs, limiting wider adoption of such datasets. In response to these challenges, we introduce HLSFactory, a comprehensive framework designed to facilitate the curation and generation of high-quality HLS design datasets. HLSFactory has three main stages: 1) a design space expansion stage to elaborate single HLS designs into large design spaces using various optimization directives across multiple vendor tools, 2) a design synthesis stage to execute HLS and FPGA tool flows concurrently across designs, and 3) a data aggregation stage for extracting standardized data into packaged datasets for ML usage. This tripartite architecture ensures broad design space coverage via design space expansion and supports multiple vendor tools. Users can contribute to each stage with their own HLS designs and synthesis results and extend the framework itself with custom frontends and tool flows. We also include an initial set of built-in designs from common HLS benchmarks curated open-source HLS designs. We showcase the versatility and multi-functionality of our framework through six case studies: I) Design space sampling; II) Fine-grained parallelism backend speedup; III) Targeting Intel's HLS flow; IV) Adding new auxiliary designs; V) Integrating published HLS data; VI) HLS tool version regression benchmarking. Code at https://github.com/sharc-lab/HLSFactory.
Public Computing Intellectuals in the Age of AI Crisis
Authors: Randy Connolly
Subjects: Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2405.00860
Pdf link: https://arxiv.org/pdf/2405.00860
Abstract The belief that AI technology is on the cusp of causing a generalized social crisis became a popular one in 2023. Interestingly, some of these worries were voiced from within the tech sector itself. While there was no doubt an element of hype and exaggeration to some of these accounts, they do reflect the fact that there are troubling ramifications to this technology stack. This conjunction of shared concerns about social, political, and personal futures presaged by current developments in machine learning and data science presents the academic discipline of computing with a rare opportunity for self-examination and reconfiguration. This position paper endeavors to do so in four sections. The first expands on the nature of the AI crisis for computing. The second articulates possible critical responses to this crisis and advocates for a broader analytic focus on power relations. The third section presents a novel characterization of academic computing's epistemological field, one which includes not only the discipline's usual instrumental forms of knowledge but reflexive knowledge as well. This reflexive dimension integrates both the critical and public functions of the discipline as equal intellectual partners and a necessary component of any contemporary academic field. The final section will advocate for a conceptual archetype--the Public Computer Intellectual--as a way of practically imagining the expanded possibilities of academic practice in our discipline, one that provides both self-critique and an outward-facing orientation towards the public good. It will argue that the computer education research community can play a vital role in this regard.
Learning to Boost the Performance of Stable Nonlinear Systems
Authors: Luca Furieri, Clara Lucía Galimberti, Giancarlo Ferrari-Trecate
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.00871
Pdf link: https://arxiv.org/pdf/2405.00871
Abstract The growing scale and complexity of safety-critical control systems underscore the need to evolve current control architectures aiming for the unparalleled performances achievable through state-of-the-art optimization and machine learning algorithms. However, maintaining closed-loop stability while boosting the performance of nonlinear control systems using data-driven and deep-learning approaches stands as an important unsolved challenge. In this paper, we tackle the performance-boosting problem with closed-loop stability guarantees. Specifically, we establish a synergy between the Internal Model Control (IMC) principle for nonlinear systems and state-of-the-art unconstrained optimization approaches for learning stable dynamics. Our methods enable learning over arbitrarily deep neural network classes of performance-boosting controllers for stable nonlinear systems; crucially, we guarantee Lp closed-loop stability even if optimization is halted prematurely, and even when the ground-truth dynamics are unknown, with vanishing conservatism in the class of stabilizing policies as the model uncertainty is reduced to zero. We discuss the implementation details of the proposed control schemes, including distributed ones, along with the corresponding optimization procedures, demonstrating the potential of freely shaping the cost functions through several numerical experiments.
Artificial intelligence for context-aware visual change detection in software test automation
Authors: Milad Moradi, Ke Yan, David Colwell, Rhona Asgari
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2405.00874
Pdf link: https://arxiv.org/pdf/2405.00874
Abstract Automated software testing is integral to the software development process, streamlining workflows and ensuring product reliability. Visual testing within this context, especially concerning user interface (UI) and user experience (UX) validation, stands as one of crucial determinants of overall software quality. Nevertheless, conventional methods like pixel-wise comparison and region-based visual change detection fall short in capturing contextual similarities, nuanced alterations, and understanding the spatial relationships between UI elements. In this paper, we introduce a novel graph-based method for visual change detection in software test automation. Leveraging a machine learning model, our method accurately identifies UI controls from software screenshots and constructs a graph representing contextual and spatial relationships between the controls. This information is then used to find correspondence between UI controls within screenshots of different versions of a software. The resulting graph encapsulates the intricate layout of the UI and underlying contextual relations, providing a holistic and context-aware model. This model is finally used to detect and highlight visual regressions in the UI. Comprehensive experiments on different datasets showed that our change detector can accurately detect visual software changes in various simple and complex test scenarios. Moreover, it outperformed pixel-wise comparison and region-based baselines by a large margin in more complex testing scenarios. This work not only contributes to the advancement of visual change detection but also holds practical implications, offering a robust solution for real-world software test automation challenges, enhancing reliability, and ensuring the seamless evolution of software interfaces.
Wake Vision: A Large-scale, Diverse Dataset and Benchmark Suite for TinyML Person Detection
Authors: Colby Banbury, Emil Njor, Matthew Stewart, Pete Warden, Manjunath Kudlur, Nat Jeffries, Xenofon Fafoutis, Vijay Janapa Reddi
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2405.00892
Pdf link: https://arxiv.org/pdf/2405.00892
Abstract Machine learning applications on extremely low-power devices, commonly referred to as tiny machine learning (TinyML), promises a smarter and more connected world. However, the advancement of current TinyML research is hindered by the limited size and quality of pertinent datasets. To address this challenge, we introduce Wake Vision, a large-scale, diverse dataset tailored for person detection -- the canonical task for TinyML visual sensing. Wake Vision comprises over 6 million images, which is a hundredfold increase compared to the previous standard, and has undergone thorough quality filtering. Using Wake Vision for training results in a 2.41\% increase in accuracy compared to the established benchmark. Alongside the dataset, we provide a collection of five detailed benchmark sets that assess model performance on specific segments of the test data, such as varying lighting conditions, distances from the camera, and demographic characteristics of subjects. These novel fine-grained benchmarks facilitate the evaluation of model quality in challenging real-world scenarios that are often ignored when focusing solely on overall accuracy. Through an evaluation of a MobileNetV2 TinyML model on the benchmarks, we show that the input resolution plays a more crucial role than the model width in detecting distant subjects and that the impact of quantization on model robustness is minimal, thanks to the dataset quality. These findings underscore the importance of a detailed evaluation to identify essential factors for model development. The dataset, benchmark suite, code, and models are publicly available under the CC-BY 4.0 license, enabling their use for commercial use cases.
De-Biasing Models of Biased Decisions: A Comparison of Methods Using Mortgage Application Data
Authors: Nicholas Tenev
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Econometrics (econ.EM)
Arxiv link: https://arxiv.org/abs/2405.00910
Pdf link: https://arxiv.org/pdf/2405.00910
Abstract Prediction models can improve efficiency by automating decisions such as the approval of loan applications. However, they may inherit bias against protected groups from the data they are trained on. This paper adds counterfactual (simulated) ethnic bias to real data on mortgage application decisions, and shows that this bias is replicated by a machine learning model (XGBoost) even when ethnicity is not used as a predictive variable. Next, several other de-biasing methods are compared: averaging over prohibited variables, taking the most favorable prediction over prohibited variables (a novel method), and jointly minimizing errors as well as the association between predictions and prohibited variables. De-biasing can recover some of the original decisions, but the results are sensitive to whether the bias is effected through a proxy.
LOQA: Learning with Opponent Q-Learning Awareness
Authors: Milad Aghajohari, Juan Agustin Duque, Tim Cooijmans, Aaron Courville
Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.01035
Pdf link: https://arxiv.org/pdf/2405.01035
Abstract In various real-world scenarios, interactions among agents often resemble the dynamics of general-sum games, where each agent strives to optimize its own utility. Despite the ubiquitous relevance of such settings, decentralized machine learning algorithms have struggled to find equilibria that maximize individual utility while preserving social welfare. In this paper we introduce Learning with Opponent Q-Learning Awareness (LOQA), a novel, decentralized reinforcement learning algorithm tailored to optimizing an agent's individual utility while fostering cooperation among adversaries in partially competitive environments. LOQA assumes the opponent samples actions proportionally to their action-value function Q. Experimental results demonstrate the effectiveness of LOQA at achieving state-of-the-art performance in benchmark scenarios such as the Iterated Prisoner's Dilemma and the Coin Game. LOQA achieves these outcomes with a significantly reduced computational footprint, making it a promising approach for practical multi-agent applications.
Polynomial Chaos Expanded Gaussian Process
Authors: Dominik Polke, Tim Kösters, Elmar Ahle, Dirk Söffker
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2405.01052
Pdf link: https://arxiv.org/pdf/2405.01052
Abstract In complex and unknown processes, global models are initially generated over the entire experimental space, but they often fail to provide accurate predictions in local areas. Recognizing this limitation, this study addresses the need for models that effectively represent both global and local experimental spaces. It introduces a novel machine learning (ML) approach: Polynomial Chaos Expanded Gaussian Process (PCEGP), leveraging polynomial chaos expansion (PCE) to calculate input-dependent hyperparameters of the Gaussian process (GP). This approach provides a mathematically interpretable method that incorporates non-stationary covariance functions and heteroscedastic noise estimation to generate locally adapted models. The model performance is compared to different algorithms in benchmark tests for regression tasks. The results demonstrate low prediction errors of the PCEGP in these benchmark applications, highlighting model performance that is often competitive with or superior to previous methods. A key advantage of the presented model is the transparency and traceability in the calculation of hyperparameters and model predictions.
Explicitly Modeling Generality into Self-Supervised Learning
Authors: Jingyao Wang, Wenwen Qiang, Changwen Zheng
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2405.01053
Pdf link: https://arxiv.org/pdf/2405.01053
Abstract The goal of generality in machine learning is to achieve excellent performance on various unseen tasks and domains. Recently, self-supervised learning (SSL) has been regarded as an effective method to achieve this goal. It can learn high-quality representations from unlabeled data and achieve promising empirical performance on multiple downstream tasks. Existing SSL methods mainly constrain generality from two aspects: (i) large-scale training data, and (ii) learning task-level shared knowledge. However, these methods lack explicit modeling of the SSL generality in the learning objective, and the theoretical understanding of SSL's generality remains limited. This may cause SSL models to overfit in data-scarce situations and generalize poorly in the real world, making it difficult to achieve true generality. To address these issues, we provide a theoretical definition of generality in SSL and define a $\sigma$-measurement to help quantify it. Based on this insight, we explicitly model generality into self-supervised learning and further propose a novel SSL framework, called GeSSL. It introduces a self-motivated target based on $\sigma$-measurement, which enables the model to find the optimal update direction towards generality. Extensive theoretical and empirical evaluations demonstrate the superior performance of the proposed GeSSL.
Leverage Multi-source Traffic Demand Data Fusion with Transformer Model for Urban Parking Prediction
Authors: Yin Huang, Yongqi Dong, Youhua Tang, Li Li
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET)
Arxiv link: https://arxiv.org/abs/2405.01055
Pdf link: https://arxiv.org/pdf/2405.01055
Abstract The escalation in urban private car ownership has worsened the urban parking predicament, necessitating effective parking availability prediction for urban planning and management. However, the existing prediction methods suffer from low prediction accuracy with the lack of spatial-temporal correlation features related to parking volume, and neglect of flow patterns and correlations between similar parking lots within certain areas. To address these challenges, this study proposes a parking availability prediction framework integrating spatial-temporal deep learning with multi-source data fusion, encompassing traffic demand data from multiple sources (e.g., metro, bus, taxi services), and parking lot data. The framework is based on the Transformer as the spatial-temporal deep learning model and leverages K-means clustering to establish parking cluster zones, extracting and integrating traffic demand characteristics from various transportation modes (i.e., metro, bus, online ride-hailing, and taxi) connected to parking lots. Real-world empirical data was used to verify the effectiveness of the proposed method compared with different machine learning, deep learning, and traditional statistical models for predicting parking availability. Experimental results reveal that, with the proposed pipeline, the developed Transformer model outperforms other models in terms of various metrics, e.g., Mean Squared Error (MSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE). By fusing multi-source demanding data with spatial-temporal deep learning techniques, this approach offers the potential to develop parking availability prediction systems that furnish more accurate and timely information to both drivers and urban planners, thereby fostering more efficient and sustainable urban mobility.
A text-based, generative deep learning model for soil reflectance spectrum simulation in the VIS-NIR (400-2499 nm) bands
Authors: Tong Lei, Brian N. Bailey
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2405.01060
Pdf link: https://arxiv.org/pdf/2405.01060
Abstract Simulating soil reflectance spectra is invaluable for soil-plant radiative modeling and training machine learning models, yet it is difficult as the intricate relationships between soil structure and its constituents. To address this, a fully data-driven soil optics generative model (SOGM) for simulation of soil reflectance spectra based on soil property inputs was developed. The model is trained on an extensive dataset comprising nearly 180,000 soil spectra-property pairs from 17 datasets. It generates soil reflectance spectra from text-based inputs describing soil properties and their values rather than only numerical values and labels in binary vector format. The generative model can simulate output spectra based on an incomplete set of input properties. SOGM is based on the denoising diffusion probabilistic model (DDPM). Two additional sub-models were also built to complement the SOGM: a spectral padding model that can fill in the gaps for spectra shorter than the full visible-near-infrared range (VIS-NIR; 400 to 2499 nm), and a wet soil spectra model that can estimate the effects of water content on soil reflectance spectra given the dry spectrum predicted by the SOGM. The SOGM was up-scaled by coupling with the Helios 3D plant modeling software, which allowed for generation of synthetic aerial images of simulated soil and plant scenes. It can also be easily integrated with soil-plant radiation model used for remote sensin research like PROSAIL. The testing results of the SOGM on new datasets that not included in model training proved that the model can generate reasonable soil reflectance spectra based on available property inputs. The presented models are openly accessible on: https://github.com/GEMINI-Breeding/SOGM_soil_spectra_simulation.
Callico: a Versatile Open-Source Document Image Annotation Platform
Authors: Christopher Kermorvant, Eva Bardou, Manon Blanco, Bastien Abadie
Subjects: Computer Vision and Pattern Recognition (cs.CV); Digital Libraries (cs.DL)
Arxiv link: https://arxiv.org/abs/2405.01071
Pdf link: https://arxiv.org/pdf/2405.01071
Abstract This paper presents Callico, a web-based open source platform designed to simplify the annotation process in document recognition projects. The move towards data-centric AI in machine learning and deep learning underscores the importance of high-quality data, and the need for specialised tools that increase the efficiency and effectiveness of generating such data. For document image annotation, Callico offers dual-display annotation for digitised documents, enabling simultaneous visualisation and annotation of scanned images and text. This capability is critical for OCR and HTR model training, document layout analysis, named entity recognition, form-based key value annotation or hierarchical structure annotation with element grouping. The platform supports collaborative annotation with versatile features backed by a commitment to open source development, high-quality code standards and easy deployment via Docker. Illustrative use cases - including the transcription of the Belfort municipal registers, the indexing of French World War II prisoners for the ICRC, and the extraction of personal information from the Socface project's census lists - demonstrate Callico's applicability and utility.
Boosting Communication Efficiency of Federated Learning's Secure Aggregation
Authors: Niousha Nazemi, Omid Tavallaie, Shuaijun Chen, Albert Y. Zomaya, Ralph Holz
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2405.01144
Pdf link: https://arxiv.org/pdf/2405.01144
Abstract Federated Learning (FL) is a decentralized machine learning approach where client devices train models locally and send them to a server that performs aggregation to generate a global model. FL is vulnerable to model inversion attacks, where the server can infer sensitive client data from trained models. Google's Secure Aggregation (SecAgg) protocol addresses this data privacy issue by masking each client's trained model using shared secrets and individual elements generated locally on the client's device. Although SecAgg effectively preserves privacy, it imposes considerable communication and computation overhead, especially as network size increases. Building upon SecAgg, this poster introduces a Communication-Efficient Secure Aggregation (CESA) protocol that substantially reduces this overhead by using only two shared secrets per client to mask the model. We propose our method for stable networks with low delay variation and limited client dropouts. CESA is independent of the data distribution and network size (for higher than 6 nodes), preventing the honest-but-curious server from accessing unmasked models. Our initial evaluation reveals that CESA significantly reduces the communication cost compared to SecAgg.
Gradient-Congruity Guided Federated Sparse Training
Authors: Chris Xing Tian, Yibing Liu, Haoliang Li, Ray C.C. Cheung, Shiqi Wang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2405.01189
Pdf link: https://arxiv.org/pdf/2405.01189
Abstract Edge computing allows artificial intelligence and machine learning models to be deployed on edge devices, where they can learn from local data and collaborate to form a global model. Federated learning (FL) is a distributed machine learning technique that facilitates this process while preserving data privacy. However, FL also faces challenges such as high computational and communication costs regarding resource-constrained devices, and poor generalization performance due to the heterogeneity of data across edge clients and the presence of out-of-distribution data. In this paper, we propose the Gradient-Congruity Guided Federated Sparse Training (FedSGC), a novel method that integrates dynamic sparse training and gradient congruity inspection into federated learning framework to address these issues. Our method leverages the idea that the neurons, in which the associated gradients with conflicting directions with respect to the global model contain irrelevant or less generalized information for other clients, and could be pruned during the sparse training process. Conversely, the neurons where the associated gradients with consistent directions could be grown in a higher priority. In this way, FedSGC can greatly reduce the local computation and communication overheads while, at the same time, enhancing the generalization abilities of FL. We evaluate our method on challenging non-i.i.d settings and show that it achieves competitive accuracy with state-of-the-art FL methods across various scenarios while minimizing computation and communication costs.
Lying Graph Convolution: Learning to Lie for Node Classification Tasks
Authors: Daniele Castellana
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2405.01247
Pdf link: https://arxiv.org/pdf/2405.01247
Abstract In the context of machine learning for graphs, many researchers have empirically observed that Deep Graph Networks (DGNs) perform favourably on node classification tasks when the graph structure is homophilic (\ie adjacent nodes are similar). In this paper, we introduce Lying-GCN, a new DGN inspired by opinion dynamics that can adaptively work in both the heterophilic and the homophilic setting. At each layer, each agent (node) shares its own opinions (node embeddings) with its neighbours. Instead of sharing its opinion directly as in GCN, we introduce a mechanism which allows agents to lie. Such a mechanism is adaptive, thus the agents learn how and when to lie according to the task that should be solved. We provide a characterisation of our proposal in terms of dynamical systems, by studying the spectral property of the coefficient matrix of the system. While the steady state of the system collapses to zero, we believe the lying mechanism is still usable to solve node classification tasks. We empirically prove our belief on both synthetic and real-world datasets, by showing that the lying mechanism allows to increase the performances in the heterophilic setting without harming the results in the homophilic one.
Identification of Entailment and Contradiction Relations between Natural Language Sentences: A Neurosymbolic Approach
Authors: Xuyao Feng, Anthony Hunter
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2405.01259
Pdf link: https://arxiv.org/pdf/2405.01259
Abstract Natural language inference (NLI), also known as Recognizing Textual Entailment (RTE), is an important aspect of natural language understanding. Most research now uses machine learning and deep learning to perform this task on specific datasets, meaning their solution is not explainable nor explicit. To address the need for an explainable approach to RTE, we propose a novel pipeline that is based on translating text into an Abstract Meaning Representation (AMR) graph. For this we use a pre-trained AMR parser. We then translate the AMR graph into propositional logic and use a SAT solver for automated reasoning. In text, often commonsense suggests that an entailment (or contradiction) relationship holds between a premise and a claim, but because different wordings are used, this is not identified from their logical representations. To address this, we introduce relaxation methods to allow replacement or forgetting of some propositions. Our experimental results show this pipeline performs well on four RTE datasets.
An Online Gradient-Based Caching Policy with Logarithmic Complexity and Regret Guarantees
Authors: Damiano Carra, Giovanni Neglia
Subjects: Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI); Operating Systems (cs.OS)
Arxiv link: https://arxiv.org/abs/2405.01263
Pdf link: https://arxiv.org/pdf/2405.01263
Abstract The commonly used caching policies, such as LRU or LFU, exhibit optimal performance only for specific traffic patterns. Even advanced Machine Learning-based methods, which detect patterns in historical request data, struggle when future requests deviate from past trends. Recently, a new class of policies has emerged that makes no assumptions about the request arrival process. These algorithms solve an online optimization problem, enabling continuous adaptation to the context. They offer theoretical guarantees on the regret metric, which is the gap between the gain of the online policy and the gain of the optimal static cache allocation in hindsight. Nevertheless, the high computational complexity of these solutions hinders their practical adoption. In this study, we introduce a groundbreaking gradient-based online caching policy, the first to achieve logarithmic computational complexity relative to catalog size along with regret guarantees. This means our algorithm can efficiently handle large-scale data while minimizing the performance gap between real-time decisions and optimal hindsight choices. As requests arrive, our policy dynamically adjusts the probabilities of including items in the cache, which drive cache update decisions. Our algorithm's streamlined complexity is a key advantage, enabling its application to real-world traces featuring millions of requests and items. This is a significant achievement, as traces of this scale have been out of reach for existing policies with regret guarantees. To the best of our knowledge, our experimental results show for the first time that the regret guarantees of gradient-based caching policies bring significant benefits in scenarios of practical interest.
Invariant Risk Minimization Is A Total Variation Model
Authors: Zhao-Rong Lai, Wei-Wen Wang
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.01389
Pdf link: https://arxiv.org/pdf/2405.01389
Abstract Invariant risk minimization (IRM) is an arising approach to generalize invariant features to different environments in machine learning. While most related works focus on new IRM settings or new application scenarios, the mathematical essence of IRM remains to be properly explained. We verify that IRM is essentially a total variation based on $L^2$ norm (TV-$\ell_2$) of the learning risk with respect to the classifier variable. Moreover, we propose a novel IRM framework based on the TV-$\ell_1$ model. It not only expands the classes of functions that can be used as the learning risk, but also has robust performance in denoising and invariant feature preservation based on the coarea formula. We also illustrate some requirements for IRM-TV-$\ell_1$ to achieve out-of-distribution generalization. Experimental results show that the proposed framework achieves competitive performance in several benchmark machine learning scenarios.
Uncertainty for Active Learning on Graphs
Authors: Dominik Fuchsgruber, Tom Wollschläger, Bertrand Charpentier, Antonio Oroz, Stephan Günnemann
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.01462
Pdf link: https://arxiv.org/pdf/2405.01462
Abstract Uncertainty Sampling is an Active Learning strategy that aims to improve the data efficiency of machine learning models by iteratively acquiring labels of data points with the highest uncertainty. While it has proven effective for independent data its applicability to graphs remains under-explored. We propose the first extensive study of Uncertainty Sampling for node classification: (1) We benchmark Uncertainty Sampling beyond predictive uncertainty and highlight a significant performance gap to other Active Learning strategies. (2) We develop ground-truth Bayesian uncertainty estimates in terms of the data generating process and prove their effectiveness in guiding Uncertainty Sampling toward optimal queries. We confirm our results on synthetic data and design an approximate approach that consistently outperforms other uncertainty estimators on real datasets. (3) Based on this analysis, we relate pitfalls in modeling uncertainty to existing methods. Our analysis enables and informs the development of principled uncertainty estimation on graphs.
Common pitfalls to avoid while using multiobjective optimization in machine learning
Authors: Junaid Akhter, Paul David Fährmann, Konstantin Sonntag, Sebastian Peitz
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2405.01480
Pdf link: https://arxiv.org/pdf/2405.01480
Abstract Recently, there has been an increasing interest in exploring the application of multiobjective optimization (MOO) in machine learning (ML). The interest is driven by the numerous situations in real-life applications where multiple objectives need to be optimized simultaneously. A key aspect of MOO is the existence of a Pareto set, rather than a single optimal solution, which illustrates the inherent trade-offs between objectives. Despite its potential, there is a noticeable lack of satisfactory literature that could serve as an entry-level guide for ML practitioners who want to use MOO. Hence, our goal in this paper is to produce such a resource. We critically review previous studies, particularly those involving MOO in deep learning (using Physics-Informed Neural Networks (PINNs) as a guiding example), and identify misconceptions that highlight the need for a better grasp of MOO principles in ML. Using MOO of PINNs as a case study, we demonstrate the interplay between the data loss and the physics loss terms. We highlight the most common pitfalls one should avoid while using MOO techniques in ML. We begin by establishing the groundwork for MOO, focusing on well-known approaches such as the weighted sum (WS) method, alongside more complex techniques like the multiobjective gradient descent algorithm (MGDA). Additionally, we compare the results obtained from the WS and MGDA with one of the most common evolutionary algorithms, NSGA-II. We emphasize the importance of understanding the specific problem, the objective space, and the selected MOO method, while also noting that neglecting factors such as convergence can result in inaccurate outcomes and, consequently, a non-optimal solution. Our goal is to offer a clear and practical guide for ML practitioners to effectively apply MOO, particularly in the context of DL.
Digital Twin Generators for Disease Modeling
Authors: Nameyeh Alam, Jake Basilico, Daniele Bertolini, Satish Casie Chetty, Heather D'Angelo, Ryan Douglas, Charles K. Fisher, Franklin Fuller, Melissa Gomes, Rishabh Gupta, Alex Lang, Anton Loukianov, Rachel Mak-McCully, Cary Murray, Hanalei Pham, Susanna Qiao, Elena Ryapolova-Webb, Aaron Smith, Dimitri Theoharatos, Anil Tolwani, Eric W. Tramel, Anna Vidovszky, Judy Viduya, Jonathan R. Walsh
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2405.01488
Pdf link: https://arxiv.org/pdf/2405.01488
Abstract A patient's digital twin is a computational model that describes the evolution of their health over time. Digital twins have the potential to revolutionize medicine by enabling individual-level computer simulations of human health, which can be used to conduct more efficient clinical trials or to recommend personalized treatment options. Due to the overwhelming complexity of human biology, machine learning approaches that leverage large datasets of historical patients' longitudinal health records to generate patients' digital twins are more tractable than potential mechanistic models. In this manuscript, we describe a neural network architecture that can learn conditional generative models of clinical trajectories, which we call Digital Twin Generators (DTGs), that can create digital twins of individual patients. We show that the same neural network architecture can be trained to generate accurate digital twins for patients across 13 different indications simply by changing the training set and tuning hyperparameters. By introducing a general purpose architecture, we aim to unlock the ability to scale machine learning approaches to larger datasets and across more indications so that a digital twin could be created for any patient in the world.
Keyword: optimization

Comparative approach: Electric distribution optimization with loss minimization algorithm and particle swarm optimization
Authors: Soufiane Bouabbadi
Subjects: Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2405.00680
Pdf link: https://arxiv.org/pdf/2405.00680
Abstract Power systems are very large and complex, it can be influenced by many unexpected events this makes power system optimization problems difficult to solve, hence methods for solving these problems ought to be an active research topic. This review presents an overview of important mathematical comparaison of loss minimization algorithm and particle swarm optimization algorithm in terms of the performances of electric distribution.
Technical Report on BaumEvA Evolutionary Optimization Python-Library Testing
Authors: Vadim Tynchenko, Aleksei Kudryavtsev, Vladimir Nelyub, Aleksei Borodulin, Andrei Gantimurov
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2405.00686
Pdf link: https://arxiv.org/pdf/2405.00686
Abstract This report presents the test results Python library BaumEvA, which implements evolutionary algorithms for optimizing various types of problems, including computer vision tasks accompanied by the search for optimal model architectures. Testing was carried out to evaluate the effectiveness and reliability of the pro-posed methods, as well as to determine their applicability in various fields. Dur-ing testing, various test functions and parameters of evolutionary algorithms were used, which made it possible to evaluate their performance in a wide range of conditions. Test results showed that the library provides effective and reliable methods for solving optimization problems. However, some limitations were identified related to computational resources and execution time of algorithms on problems with large dimensions. The report includes a detailed description of the tests performed, the results obtained and conclusions about the applicability of the genetic algorithm in various tasks. Recommendations for choosing algorithm pa-rameters and using the library to achieve the best results are also provided. The report may be useful to developers involved in the optimization of complex com-puting systems, as well as to researchers studying the possibilities of using evo-lutionary algorithms in various fields of science and technology.
Life-long Learning and Testing for Automated Vehicles via Adaptive Scenario Sampling as A Continuous Optimization Process
Authors: Jingwei Ge, Pengbo Wang, Cheng Chang, Yi Zhang, Danya Yao, Li Li
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2405.00696
Pdf link: https://arxiv.org/pdf/2405.00696
Abstract Sampling critical testing scenarios is an essential step in intelligence testing for Automated Vehicles (AVs). However, due to the lack of prior knowledge on the distribution of critical scenarios in sampling space, we can hardly efficiently find the critical scenarios or accurately evaluate the intelligence of AVs. To solve this problem, we formulate the testing as a continuous optimization process which iteratively generates potential critical scenarios and meanwhile evaluates these scenarios. A bi-level loop is proposed for such life-long learning and testing. In the outer loop, we iteratively learn space knowledge by evaluating AV in the already sampled scenarios and then sample new scenarios based on the retained knowledge. Outer loop stops when all generated samples cover the whole space. While to maximize the coverage of the space in each outer loop, we set an inner loop which receives newly generated samples in outer loop and outputs the updated positions of these samples. We assume that points in a small sphere-like subspace can be covered (or represented) by the point in the center of this sphere. Therefore, we can apply a multi-rounds heuristic strategy to move and pack these spheres in space to find the best covering solution. The simulation results show that faster and more accurate evaluation of AVs can be achieved with more critical scenarios.
Soft Preference Optimization: Aligning Language Models to Expert Distributions
Authors: Arsalan Sharifnassab, Sina Ghiassian, Saber Salehkaleybar, Surya Kanoria, Dale Schuurmans
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2405.00747
Pdf link: https://arxiv.org/pdf/2405.00747
Abstract We propose Soft Preference Optimization (SPO), a method for aligning generative models, such as Large Language Models (LLMs), with human preferences, without the need for a reward model. SPO optimizes model outputs directly over a preference dataset through a natural loss function that integrates preference loss with a regularization term across the model's entire output distribution rather than limiting it to the preference dataset. Although SPO does not require the assumption of an existing underlying reward model, we demonstrate that, under the Bradley-Terry (BT) model assumption, it converges to a softmax of scaled rewards, with the distribution's "softness" adjustable via the softmax exponent, an algorithm parameter. We showcase SPO's methodology, its theoretical foundation, and its comparative advantages in simplicity, computational efficiency, and alignment precision.
Does Using Bazel Help Speed Up Continuous Integration Builds?
Authors: Shenyu Zheng, Bram Adams, Ahmed E. Hassan
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2405.00796
Pdf link: https://arxiv.org/pdf/2405.00796
Abstract A long continuous integration (CI) build forces developers to wait for CI feedback before starting subsequent development activities, leading to time wasted. In addition to a variety of build scheduling and test selection heuristics studied in the past, new artifact-based build technologies like Bazel have built-in support for advanced performance optimizations such as parallel build and incremental build (caching of build results). However, little is known about the extent to which new build technologies like Bazel deliver on their promised benefits, especially for long-build duration projects. In this study, we collected 383 Bazel projects from GitHub, then studied their parallel and incremental build usage of Bazel in 4 popular CI services, and compared the results with Maven projects. We conducted 3,500 experiments on 383 Bazel projects and analyzed the build logs of a subset of 70 buildable projects to evaluate the performance impact of Bazel's parallel builds. Additionally, we performed 102,232 experiments on the 70 buildable projects' last 100 commits to evaluate Bazel's incremental build performance. Our results show that 31.23% of Bazel projects adopt a CI service but do not use Bazel in the CI service, while for those who do use Bazel in CI, 27.76% of them use other tools to facilitate Bazel's execution. Compared to sequential builds, the median speedups for long-build duration projects are 2.00x, 3.84x, 7.36x, and 12.80x, at parallelism degrees 2, 4, 8, and 16, respectively, even though, compared to a clean build, applying incremental build achieves a median speedup of 4.22x (with a build system tool-independent CI cache) and 4.71x (with a build system tool-specific cache) for long-build duration projects. Our results provide guidance for developers to improve the usage of Bazel in their projects.
Sifting out communities in large sparse networks
Authors: Sharlee Climer, Kenneth Smith Jr, Wei Yang, Lisa de las Fuentes, Victor G. Dávila-Román, C. Charles Gu
Subjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.00816
Pdf link: https://arxiv.org/pdf/2405.00816
Abstract Research data sets are growing to unprecedented sizes and network modeling is commonly used to extract complex relationships in diverse domains, such as genetic interactions involved in disease, logistics, and social communities. As the number of nodes increases in a network, an increasing sparsity of edges is a practical limitation due to memory restrictions. Moreover, many of these sparse networks exhibit very large numbers of nodes with no adjacent edges, as well as disjoint components of nodes with no edges connecting them. A prevalent aim in network modeling is the identification of clusters, or communities, of nodes that are highly interrelated. Several definitions of strong community structure have been introduced to facilitate this task, each with inherent assumptions and biases. We introduce an intuitive objective function for quantifying the quality of clustering results in large sparse networks. We utilize a two-step method for identifying communities which is especially well-suited for this domain as the first step efficiently divides the network into the disjoint components, while the second step optimizes clustering of the produced components based on the new objective. Using simulated networks, optimization based on the new objective function consistently yields significantly higher accuracy than those based on the modularity function, with the widest gaps appearing for the noisiest networks. Additionally, applications to benchmark problems illustrate the intuitive correctness of our approach. Finally, the practicality of our approach is demonstrated in real-world data in which we identify complex genetic interactions in large-scale networks comprised of tens of thousands of nodes. Based on these three different types of trials, our results clearly demonstrate the usefulness of our two-step procedure and the accuracy of our simple objective.
HLSFactory: A Framework Empowering High-Level Synthesis Datasets for Machine Learning and Beyond
Authors: Stefan Abi-Karam, Rishov Sarkar, Allison Seigler, Sean Lowe, Zhigang Wei, Hanqiu Chen, Nanditha Rao, Lizy John, Aman Arora, Cong Hao
Subjects: Hardware Architecture (cs.AR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.00820
Pdf link: https://arxiv.org/pdf/2405.00820
Abstract Machine learning (ML) techniques have been applied to high-level synthesis (HLS) flows for quality-of-result (QoR) prediction and design space exploration (DSE). Nevertheless, the scarcity of accessible high-quality HLS datasets and the complexity of building such datasets present challenges. Existing datasets have limitations in terms of benchmark coverage, design space enumeration, vendor extensibility, or lack of reproducible and extensible software for dataset construction. Many works also lack user-friendly ways to add more designs, limiting wider adoption of such datasets. In response to these challenges, we introduce HLSFactory, a comprehensive framework designed to facilitate the curation and generation of high-quality HLS design datasets. HLSFactory has three main stages: 1) a design space expansion stage to elaborate single HLS designs into large design spaces using various optimization directives across multiple vendor tools, 2) a design synthesis stage to execute HLS and FPGA tool flows concurrently across designs, and 3) a data aggregation stage for extracting standardized data into packaged datasets for ML usage. This tripartite architecture ensures broad design space coverage via design space expansion and supports multiple vendor tools. Users can contribute to each stage with their own HLS designs and synthesis results and extend the framework itself with custom frontends and tool flows. We also include an initial set of built-in designs from common HLS benchmarks curated open-source HLS designs. We showcase the versatility and multi-functionality of our framework through six case studies: I) Design space sampling; II) Fine-grained parallelism backend speedup; III) Targeting Intel's HLS flow; IV) Adding new auxiliary designs; V) Integrating published HLS data; VI) HLS tool version regression benchmarking. Code at https://github.com/sharc-lab/HLSFactory.
A Convex Formulation of the Soft-Capture Problem
Authors: Ibrahima Sory Sow, Geordan Gutow, Howie Choset, Zachary Manchester
Subjects: Robotics (cs.RO); Systems and Control (eess.SY); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2405.00867
Pdf link: https://arxiv.org/pdf/2405.00867
Abstract We present a fast trajectory optimization algorithm for the soft capture of uncooperative tumbling space objects. Our algorithm generates safe, dynamically feasible, and minimum-fuel trajectories for a six-degree-of-freedom servicing spacecraft to achieve soft capture (near-zero relative velocity at contact) between predefined locations on the servicer spacecraft and target body. We solve a convex problem by enforcing a convex relaxation of the field-of-view constraint, followed by a sequential convex program correcting the trajectory for collision avoidance. The optimization problems can be solved with a standard second-order cone programming solver, making the algorithm both fast and practical for implementation in flight software. We demonstrate the performance and robustness of our algorithm in simulation over a range of object tumble rates up to 10{\deg}/s.
Learning to Boost the Performance of Stable Nonlinear Systems
Authors: Luca Furieri, Clara Lucía Galimberti, Giancarlo Ferrari-Trecate
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.00871
Pdf link: https://arxiv.org/pdf/2405.00871
Abstract The growing scale and complexity of safety-critical control systems underscore the need to evolve current control architectures aiming for the unparalleled performances achievable through state-of-the-art optimization and machine learning algorithms. However, maintaining closed-loop stability while boosting the performance of nonlinear control systems using data-driven and deep-learning approaches stands as an important unsolved challenge. In this paper, we tackle the performance-boosting problem with closed-loop stability guarantees. Specifically, we establish a synergy between the Internal Model Control (IMC) principle for nonlinear systems and state-of-the-art unconstrained optimization approaches for learning stable dynamics. Our methods enable learning over arbitrarily deep neural network classes of performance-boosting controllers for stable nonlinear systems; crucially, we guarantee Lp closed-loop stability even if optimization is halted prematurely, and even when the ground-truth dynamics are unknown, with vanishing conservatism in the class of stabilizing policies as the model uncertainty is reduced to zero. We discuss the implementation details of the proposed control schemes, including distributed ones, along with the corresponding optimization procedures, demonstrating the potential of freely shaping the cost functions through several numerical experiments.
A Differentiable Dynamic Modeling Approach to Integrated Motion Planning and Actuator Physical Design for Mobile Manipulators
Authors: Zehui Lu, Yebin Wang
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2405.00882
Pdf link: https://arxiv.org/pdf/2405.00882
Abstract This paper investigates the differentiable dynamic modeling of mobile manipulators to facilitate efficient motion planning and physical design of actuators, where the actuator design is parameterized by physically meaningful motor geometry parameters. These parameters impact the manipulator's link mass, inertia, center-of-mass, torque constraints, and angular velocity constraints, influencing control authority in motion planning and trajectory tracking control. A motor's maximum torque/speed and how the design parameters affect the dynamics are modeled analytically, facilitating differentiable and analytical dynamic modeling. Additionally, an integrated locomotion and manipulation planning problem is formulated with direct collocation discretization, using the proposed differentiable dynamics and motor parameterization. Such dynamics are required to capture the dynamic coupling between the base and the manipulator. Numerical experiments demonstrate the effectiveness of differentiable dynamics in speeding up optimization and advantages in task completion time and energy consumption over established sequential motion planning approach. Finally, this paper introduces a simultaneous actuator design and motion planning framework, providing numerical results to validate the proposed differentiable modeling approach for co-design problems.
Transformer-Based Self-Supervised Learning for Histopathological Classification of Ischemic Stroke Clot Origin
Authors: K. Yeh, M. S. Jabal, V. Gupta, D. F. Kallmes, W. Brinjikji, B. S. Erdal
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.00908
Pdf link: https://arxiv.org/pdf/2405.00908
Abstract Background and Purpose: Identifying the thromboembolism source in ischemic stroke is crucial for treatment and secondary prevention yet is often undetermined. This study describes a self-supervised deep learning approach in digital pathology of emboli for classifying ischemic stroke clot origin from histopathological images. Methods: The dataset included whole slide images (WSI) from the STRIP AI Kaggle challenge, consisting of retrieved clots from ischemic stroke patients following mechanical thrombectomy. Transformer-based deep learning models were developed using transfer learning and self-supervised pretraining for classifying WSI. Customizations included an attention pooling layer, weighted loss function, and threshold optimization. Various model architectures were tested and compared, and model performances were primarily evaluated using weighted logarithmic loss. Results: The model achieved a logloss score of 0.662 in cross-validation and 0.659 on the test set. Different model backbones were compared, with the swin_large_patch4_window12_384 showed higher performance. Thresholding techniques for clot origin classification were employed to balance false positives and negatives. Conclusion: The study demonstrates the extent of efficacy of transformer-based deep learning models in identifying ischemic stroke clot origins from histopathological images and emphasizes the need for refined modeling techniques specifically adapted to thrombi WSI. Further research is needed to improve model performance, interpretability, validate its effectiveness. Future enhancement could include integrating larger patient cohorts, advanced preprocessing strategies, and exploring ensemble multimodal methods for enhanced diagnostic accuracy.
X-Oscar: A Progressive Framework for High-quality Text-guided 3D Animatable Avatar Generation
Authors: Yiwei Ma, Zhekai Lin, Jiayi Ji, Yijun Fan, Xiaoshuai Sun, Rongrong Ji
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2405.00954
Pdf link: https://arxiv.org/pdf/2405.00954
Abstract Recent advancements in automatic 3D avatar generation guided by text have made significant progress. However, existing methods have limitations such as oversaturation and low-quality output. To address these challenges, we propose X-Oscar, a progressive framework for generating high-quality animatable avatars from text prompts. It follows a sequential Geometry->Texture->Animation paradigm, simplifying optimization through step-by-step generation. To tackle oversaturation, we introduce Adaptive Variational Parameter (AVP), representing avatars as an adaptive distribution during training. Additionally, we present Avatar-aware Score Distillation Sampling (ASDS), a novel technique that incorporates avatar-aware noise into rendered images for improved generation quality during optimization. Extensive evaluations confirm the superiority of X-Oscar over existing text-to-3D and text-to-avatar approaches. Our anonymous project page: https://xmu-xiaoma666.github.io/Projects/X-Oscar/.
Robust Decentralized Learning with Local Updates and Gradient Tracking
Authors: Sajjad Ghiasvand, Amirhossein Reisizadeh, Mahnoosh Alizadeh, Ramtin Pedarsani
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2405.00965
Pdf link: https://arxiv.org/pdf/2405.00965
Abstract As distributed learning applications such as Federated Learning, the Internet of Things (IoT), and Edge Computing grow, it is critical to address the shortcomings of such technologies from a theoretical perspective. As an abstraction, we consider decentralized learning over a network of communicating clients or nodes and tackle two major challenges: data heterogeneity and adversarial robustness. We propose a decentralized minimax optimization method that employs two important modules: local updates and gradient tracking. Minimax optimization is the key tool to enable adversarial training for ensuring robustness. Having local updates is essential in Federated Learning (FL) applications to mitigate the communication bottleneck, and utilizing gradient tracking is essential to proving convergence in the case of data heterogeneity. We analyze the performance of the proposed algorithm, Dec-FedTrack, in the case of nonconvex-strongly concave minimax optimization, and prove that it converges a stationary point. We also conduct numerical experiments to support our theoretical findings.
Active Cell Balancing for Extended Operational Time of Lithium-Ion Battery Systems in Energy Storage Applications
Authors: Yiming Xu, Xiaohua Ge, Ruohan Guo, Weixiang Shen
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2405.00973
Pdf link: https://arxiv.org/pdf/2405.00973
Abstract Cell inconsistency within a lithium-ion battery system poses a significant challenge in maximizing the system operational time. This study presents an optimization-driven active balancing method to minimize the effects of cell inconsistency on the system operational time while simultaneously satisfying the system output power demand and prolonging the system operational time in energy storage applications. The proposed method utilizes a fractional order model to forecast the terminal voltage dynamics of each cell within a battery system, enhanced with a particle-swarm-optimisation-genetic algorithm for precise parameter identification. It is implemented under two distinct cell-level balancing topologies: independent cell balancing and differential cell balancing. Subsequently, the current distribution for each topology is determined by resolving two optimization control problems constrained by the battery's operational specifications and power demands. The effectiveness of the proposed method is validated by extensive experiments based on the two balancing topologies. The results demonstrate that the proposed method increases the operational time by 3.2%.
Bayesian Optimization with LLM-Based Acquisition Functions for Natural Language Preference Elicitation
Authors: David Eric Austin, Anton Korikov, Armin Toroghi, Scott Sanner
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2405.00981
Pdf link: https://arxiv.org/pdf/2405.00981
Abstract Designing preference elicitation (PE) methodologies that can quickly ascertain a user's top item preferences in a cold-start setting is a key challenge for building effective and personalized conversational recommendation (ConvRec) systems. While large language models (LLMs) constitute a novel technology that enables fully natural language (NL) PE dialogues, we hypothesize that monolithic LLM NL-PE approaches lack the multi-turn, decision-theoretic reasoning required to effectively balance the NL exploration and exploitation of user preferences towards an arbitrary item set. In contrast, traditional Bayesian optimization PE methods define theoretically optimal PE strategies, but fail to use NL item descriptions or generate NL queries, unrealistically assuming users can express preferences with direct item ratings and comparisons. To overcome the limitations of both approaches, we formulate NL-PE in a Bayesian Optimization (BO) framework that seeks to generate NL queries which actively elicit natural language feedback to reduce uncertainty over item utilities to identify the best recommendation. We demonstrate our framework in a novel NL-PE algorithm, PEBOL, which uses Natural Language Inference (NLI) between user preference utterances and NL item descriptions to maintain preference beliefs and BO strategies such as Thompson Sampling (TS) and Upper Confidence Bound (UCB) to guide LLM query generation. We numerically evaluate our methods in controlled experiments, finding that PEBOL achieves up to 131% improvement in MAP@10 after 10 turns of cold start NL-PE dialogue compared to monolithic GPT-3.5, despite relying on a much smaller 400M parameter NLI model for preference inference.
QSimPy: A Learning-centric Simulation Framework for Quantum Cloud Resource Management
Authors: Hoa T. Nguyen, Muhammad Usman, Rajkumar Buyya
Subjects: Emerging Technologies (cs.ET); Quantum Physics (quant-ph)
Arxiv link: https://arxiv.org/abs/2405.01021
Pdf link: https://arxiv.org/pdf/2405.01021
Abstract Quantum cloud computing is an emerging computing paradigm that allows seamless access to quantum hardware as cloud-based services. However, effective use of quantum resources is challenging and necessitates robust simulation frameworks for effective resource management design and evaluation. To address this need, we proposed QSimPy, a novel discrete-event simulation framework designed with the main focus of facilitating learning-centric approaches for quantum resource management problems in cloud environments. Underpinned by extensibility, compatibility, and reusability principles, QSimPy provides a lightweight simulation environment based on SimPy, a well-known Python-based simulation engine for modeling dynamics of quantum cloud resources and task operations. We integrate the Gymnasium environment into our framework to support the creation of simulated environments for developing and evaluating reinforcement learning-based techniques for optimizing quantum cloud resource management. The QSimPy framework encapsulates the operational intricacies of quantum cloud environments, supporting research in dynamic task allocation and optimization through DRL approaches. We also demonstrate the use of QSimPy in developing reinforcement learning policies for quantum task placement problems, demonstrating its potential as a useful framework for future quantum cloud research.
Differentiable Particles for General-Purpose Deformable Object Manipulation
Authors: Siwei Chen, Yiqing Xu, Cunjun Yu, Linfeng Li, David Hsu
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2405.01044
Pdf link: https://arxiv.org/pdf/2405.01044
Abstract Deformable object manipulation is a long-standing challenge in robotics. While existing approaches often focus narrowly on a specific type of object, we seek a general-purpose algorithm, capable of manipulating many different types of objects: beans, rope, cloth, liquid, . . . . One key difficulty is a suitable representation, rich enough to capture object shape, dynamics for manipulation and yet simple enough to be acquired effectively from sensor data. Specifically, we propose Differentiable Particles (DiPac), a new algorithm for deformable object manipulation. DiPac represents a deformable object as a set of particles and uses a differentiable particle dynamics simulator to reason about robot manipulation. To find the best manipulation action, DiPac combines learning, planning, and trajectory optimization through differentiable trajectory tree optimization. Differentiable dynamics provides significant benefits and enable DiPac to (i) estimate the dynamics parameters efficiently, thereby narrowing the sim-to-real gap, and (ii) choose the best action by backpropagating the gradient along sampled trajectories. Both simulation and real-robot experiments show promising results. DiPac handles a variety of object types. By combining planning and learning, DiPac outperforms both pure model-based planning methods and pure data-driven learning methods. In addition, DiPac is robust and adapts to changes in dynamics, thereby enabling the transfer of an expert policy from one object to another with different physical properties, e.g., from a rigid rod to a deformable rope.
Fair Recommendations with Limited Sensitive Attributes: A Distributionally Robust Optimization Approach
Authors: Tianhao Shi, Yang Zhang, Jizhi Zhang, Fuli Feng, Xiangnan He
Subjects: Information Retrieval (cs.IR); Computers and Society (cs.CY); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.01063
Pdf link: https://arxiv.org/pdf/2405.01063
Abstract As recommender systems are indispensable in various domains such as job searching and e-commerce, providing equitable recommendations to users with different sensitive attributes becomes an imperative requirement. Prior approaches for enhancing fairness in recommender systems presume the availability of all sensitive attributes, which can be difficult to obtain due to privacy concerns or inadequate means of capturing these attributes. In practice, the efficacy of these approaches is limited, pushing us to investigate ways of promoting fairness with limited sensitive attribute information. Toward this goal, it is important to reconstruct missing sensitive attributes. Nevertheless, reconstruction errors are inevitable due to the complexity of real-world sensitive attribute reconstruction problems and legal regulations. Thus, we pursue fair learning methods that are robust to reconstruction errors. To this end, we propose Distributionally Robust Fair Optimization (DRFO), which minimizes the worst-case unfairness over all potential probability distributions of missing sensitive attributes instead of the reconstructed one to account for the impact of the reconstruction errors. We provide theoretical and empirical evidence to demonstrate that our method can effectively ensure fairness in recommender systems when only limited sensitive attributes are accessible.
Multi-user ISAC through Stacked Intelligent Metasurfaces: New Algorithms and Experiments
Authors: Ziqing Wang, Hongzheng Liu, Jianan Zhang, Rujing Xiong, Kai Wan, Xuewen Qian, Marco Di Renzo, Robert Caiming Qiu
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2405.01104
Pdf link: https://arxiv.org/pdf/2405.01104
Abstract This paper investigates a Stacked Intelligent Metasurfaces (SIM)-assisted Integrated Sensing and Communications (ISAC) system. An extended target model is considered, where the BS aims to estimate the complete target response matrix relative to the SIM. Under the constraints of minimum Signal-to-Interference-plus-Noise Ratio (SINR) for the communication users (CUs) and maximum transmit power, we jointly optimize the transmit beamforming at the base station (BS) and the end-to-end transmission matrix of the SIM, to minimize the Cram\'er-Rao Bound (CRB) for target estimation. Effective algorithms such as the alternating optimization (AO) and semidefinite relaxation (SDR) are employed to solve the non-convex SINR-constrained CRB minimization problem. Finally, we design and build an experimental platform for SIM, and evaluate the performance of the proposed algorithms for communication and sensing tasks.
Hypergraph $p$-Laplacian regularization on point clouds for data interpolation
Authors: Kehan Shi, Martin Burger
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG); Analysis of PDEs (math.AP)
Arxiv link: https://arxiv.org/abs/2405.01109
Pdf link: https://arxiv.org/pdf/2405.01109
Abstract As a generalization of graphs, hypergraphs are widely used to model higher-order relations in data. This paper explores the benefit of the hypergraph structure for the interpolation of point cloud data that contain no explicit structural information. We define the $\varepsilon_n$-ball hypergraph and the $k_n$-nearest neighbor hypergraph on a point cloud and study the $p$-Laplacian regularization on the hypergraphs. We prove the variational consistency between the hypergraph $p$-Laplacian regularization and the continuum $p$-Laplacian regularization in a semisupervised setting when the number of points $n$ goes to infinity while the number of labeled points remains fixed. A key improvement compared to the graph case is that the results rely on weaker assumptions on the upper bound of $\varepsilon_n$ and $k_n$. To solve the convex but non-differentiable large-scale optimization problem, we utilize the stochastic primal-dual hybrid gradient algorithm. Numerical experiments on data interpolation verify that the hypergraph $p$-Laplacian regularization outperforms the graph $p$-Laplacian regularization in preventing the development of spikes at the labeled points.
Faster Learned Sparse Retrieval with Block-Max Pruning
Authors: Antonio Mallia, Torten Suel, Nicola Tonellotto
Subjects: Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2405.01117
Pdf link: https://arxiv.org/pdf/2405.01117
Abstract Learned sparse retrieval systems aim to combine the effectiveness of contextualized language models with the scalability of conventional data structures such as inverted indexes. Nevertheless, the indexes generated by these systems exhibit significant deviations from the ones that use traditional retrieval models, leading to a discrepancy in the performance of existing query optimizations that were specifically developed for traditional structures. These disparities arise from structural variations in query and document statistics, including sub-word tokenization, leading to longer queries, smaller vocabularies, and different score distributions within posting lists. This paper introduces Block-Max Pruning (BMP), an innovative dynamic pruning strategy tailored for indexes arising in learned sparse retrieval environments. BMP employs a block filtering mechanism to divide the document space into small, consecutive document ranges, which are then aggregated and sorted on the fly, and fully processed only as necessary, guided by a defined safe early termination criterion or based on approximate retrieval requirements. Through rigorous experimentation, we show that BMP substantially outperforms existing dynamic pruning strategies, offering unparalleled efficiency in safe retrieval contexts and improved tradeoffs between precision and efficiency in approximate retrieval tasks.
Energy-Efficient Reconfigurable Holographic Surfaces Operating in the Presence of Realistic Hardware Impairments
Authors: Qingchao Li, Mohammed El-Hajjar, Yanshi Sun, Ibrahim Hemadeh, Arman Shojaeifard, Lajos Hanzo
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2405.01146
Pdf link: https://arxiv.org/pdf/2405.01146
Abstract Reconfigurable holographic surfaces (RHSs) constitute a promising technique of supporting energy-efficient communications. In this paper, we formulate the energy efficiency maximization problem of the switch-controlled RHS-aided beamforming architecture by alternately optimizing the holographic beamformer at the RHS, the digital beamformer, the total transmit power and the power sharing ratio of each user. Specifically, to deal with this challenging non-convex optimization problem, we decouple it into three sub-problems. Firstly, the coefficients of RHS elements responsible for the holographic beamformer are optimized to maximize the sum of the eigen-channel gains of all users by our proposed low-complexity eigen-decomposition (ED) method. Then, the digital beamformer is designed by the singular value decomposition (SVD) method to support multi-user information transfer. Finally, the total transmit power and the power sharing ratio are alternately optimized, while considering the effect of transceiver hardware impairments (HWI). We theoretically derive the spectral efficiency and energy efficiency performance upper bound for the RHS-based beamforming architectures in the presence of HWIs. Our simulation results show that the switch-controlled RHS-aided beamforming architecture achieves higher energy efficiency than the conventional fully digital beamformer and the hybrid beamformer based on phase shift arrays (PSA). Moreover, considering the effect of HWI in the beamforming design can bring about further energy efficiency enhancements.
Optimizing Satellite Network Infrastructure: A Joint Approach to Gateway Placement and Routing
Authors: Yuma Abe, Flor Ortiz, Eva Lagunas, Victor Monzon Baeza, Symeon Chatzinotas, Hiroyuki Tsuji
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2405.01149
Pdf link: https://arxiv.org/pdf/2405.01149
Abstract Satellite constellation systems are becoming more attractive to provide communication services worldwide, especially in areas without network connectivity. While optimizing satellite gateway placement is crucial for operators to minimize deployment and operating costs, reducing the number of gateways may require more inter-satellite link hops to reach the ground network, thereby increasing latency. Therefore, it is of significant importance to develop a framework that optimizes gateway placement, dynamic routing, and flow management in inter-satellite links to enhance network performance. To this end, we model an optimization problem as a mixed-integer problem with a cost function combining the number of gateways, flow allocation, and traffic latency, allowing satellite operators to set priorities based on their policies. Our simulation results indicate that the proposed approach effectively reduces the number of active gateways by selecting their most appropriate locations while balancing the trade-off between the number of gateways and traffic latency. Furthermore, we demonstrate the impact of different weights in the cost function on performance through comparative analysis.
GroupedMixer: An Entropy Model with Group-wise Token-Mixers for Learned Image Compression
Authors: Daxin Li, Yuanchao Bai, Kai Wang, Junjun Jiang, Xianming Liu, Wen Gao
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2405.01170
Pdf link: https://arxiv.org/pdf/2405.01170
Abstract Transformer-based entropy models have gained prominence in recent years due to their superior ability to capture long-range dependencies in probability distribution estimation compared to convolution-based methods. However, previous transformer-based entropy models suffer from a sluggish coding process due to pixel-wise autoregression or duplicated computation during inference. In this paper, we propose a novel transformer-based entropy model called GroupedMixer, which enjoys both faster coding speed and better compression performance than previous transformer-based methods. Specifically, our approach builds upon group-wise autoregression by first partitioning the latent variables into groups along spatial-channel dimensions, and then entropy coding the groups with the proposed transformer-based entropy model. The global causal self-attention is decomposed into more efficient group-wise interactions, implemented using inner-group and cross-group token-mixers. The inner-group token-mixer incorporates contextual elements within a group while the cross-group token-mixer interacts with previously decoded groups. Alternate arrangement of two token-mixers enables global contextual reference. To further expedite the network inference, we introduce context cache optimization to GroupedMixer, which caches attention activation values in cross-group token-mixers and avoids complex and duplicated computation. Experimental results demonstrate that the proposed GroupedMixer yields the state-of-the-art rate-distortion performance with fast compression speed.
Third Medium Finite Element Contact Formulation for Pneumatically Actuated Systems
Authors: Ondřej Faltus, Martin Horák, Martin Doškář, Ondřej Rokoš
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Arxiv link: https://arxiv.org/abs/2405.01185
Pdf link: https://arxiv.org/pdf/2405.01185
Abstract Mechanical metamaterials are artificially engineered microstructures that exhibit novel mechanical behavior on the macroscopic scale. Active metamaterials can be externally controlled. Pneumatically actuated metamaterials can change their mechanical, acoustic, or other types of effective behavior in response to applied pressure with possible applications ranging from soft robotic actuators to phononic crystals. To facilitate the design of such pneumatically actuated metamaterials and structures by topology optimization, a robust way of their computational modeling, capturing both pneumatic actuation of internal voids and internal contact, is needed. Since voids in topology optimization are often modeled using a soft material model, the third medium contact formulation lends itself as a suitable stepping stone. We propose a single hyperelastic material model capable of maintaining a prescribed hydrostatic Cauchy stress within a void in the pre-contact phase while simultaneously acting as a third medium to enforce frictionless contact, contrasting existing third medium approaches focused solely on contact. We split the overall third-medium energy density into contact, regularization, and pneumatic pressure contributions, all of which can be individually controlled and tuned. To prevent distortions of the compliant third medium, we include curvature penalization in our model. This improves on existing formulations in terms of compliant third medium behavior, leading ultimately to better numerical stability of the solution. Since our formulation is energetically consistent, we are able to employ more advanced finite element solvers, such as the modified Cholesky algorithm to detect instabilities. We demonstrate the behavior of the proposed formulation on several examples of traditional contact benchmarks, including a standard patch test, and validate it with experimental measurement.
Movable Antenna Enhanced Wireless Sensing Via Antenna Position Optimization
Authors: Wenyan Ma, Lipeng Zhu, Rui Zhang
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2405.01215
Pdf link: https://arxiv.org/pdf/2405.01215
Abstract In this paper, we propose a new wireless sensing system equipped with the movable-antenna (MA) array, which can flexibly adjust the positions of antenna elements for improving the sensing performance over conventional antenna arrays with fixed-position antennas (FPAs). First, we show that the angle estimation performance in wireless sensing is fundamentally determined by the array geometry, where the Cramer-Rao bound (CRB) of the mean square error (MSE) for angle of arrival (AoA) estimation is derived as a function of the antennas' positions for both one-dimensional (1D) and two-dimensional (2D) MA arrays. Then, for the case of 1D MA array, we obtain a globally optimal solution for the MAs' positions in closed form to minimize the CRB of AoA estimation MSE. While in the case of 2D MA array, we aim to achieve the minimum of maximum (min-max) CRBs of estimation MSE for the two AoAs with respect to the horizontal and vertical axes, respectively. In particular, for the special case of circular antenna movement region, an optimal solution for the MAs' positions is derived under certain numbers of MAs and circle radii. Thereby, both the lower- and upper-bounds of the min-max CRB are obtained for the antenna movement region with arbitrary shapes. Moreover, we develop an efficient alternating optimization algorithm to obtain a locally optimal solution for MAs' positions by iteratively optimizing one between their horizontal and vertical coordinates with the other being fixed. Numerical results demonstrate that our proposed 1D/2D MA arrays can significantly decrease the CRB of AoA estimation MSE as well as the actual MSE compared to conventional uniform linear arrays (ULAs)/uniform planar arrays (UPAs) with different values of uniform inter-antenna spacing.
Avoiding Redundant Restarts in Multimodal Global Optimization
Authors: Jacob de Nobel, Diederick Vermetten, Anna V. Kononova, Ofer M. Shir, Thomas Bäck
Subjects: Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2405.01226
Pdf link: https://arxiv.org/pdf/2405.01226
Abstract Na\"ive restarts of global optimization solvers when operating on multimodal search landscapes may resemble the Coupon's Collector Problem, with a potential to waste significant function evaluations budget on revisiting the same basins of attractions. In this paper, we assess the degree to which such ``duplicate restarts'' occur on standard multimodal benchmark functions, which defines the \textit{redundancy potential} of each particular landscape. We then propose a repelling mechanism to avoid such wasted restarts with the CMA-ES and investigate its efficacy on test cases with high redundancy potential compared to the standard restart mechanism.
Boosting Jailbreak Attack with Momentum
Authors: Yihao Zhang, Zeming Wei
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Cryptography and Security (cs.CR); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2405.01229
Pdf link: https://arxiv.org/pdf/2405.01229
Abstract Large Language Models (LLMs) have achieved remarkable success across diverse tasks, yet they remain vulnerable to adversarial attacks, notably the well-documented \textit{jailbreak} attack. Recently, the Greedy Coordinate Gradient (GCG) attack has demonstrated efficacy in exploiting this vulnerability by optimizing adversarial prompts through a combination of gradient heuristics and greedy search. However, the efficiency of this attack has become a bottleneck in the attacking process. To mitigate this limitation, in this paper we rethink the generation of adversarial prompts through an optimization lens, aiming to stabilize the optimization process and harness more heuristic insights from previous iterations. Specifically, we introduce the \textbf{M}omentum \textbf{A}ccelerated G\textbf{C}G (\textbf{MAC}) attack, which incorporates a momentum term into the gradient heuristic. Experimental results showcase the notable enhancement achieved by MAP in gradient-based attacks on aligned language models. Our code is available at https://github.com/weizeming/momentum-attack-llm.
An Online Gradient-Based Caching Policy with Logarithmic Complexity and Regret Guarantees
Authors: Damiano Carra, Giovanni Neglia
Subjects: Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI); Operating Systems (cs.OS)
Arxiv link: https://arxiv.org/abs/2405.01263
Pdf link: https://arxiv.org/pdf/2405.01263
Abstract The commonly used caching policies, such as LRU or LFU, exhibit optimal performance only for specific traffic patterns. Even advanced Machine Learning-based methods, which detect patterns in historical request data, struggle when future requests deviate from past trends. Recently, a new class of policies has emerged that makes no assumptions about the request arrival process. These algorithms solve an online optimization problem, enabling continuous adaptation to the context. They offer theoretical guarantees on the regret metric, which is the gap between the gain of the online policy and the gain of the optimal static cache allocation in hindsight. Nevertheless, the high computational complexity of these solutions hinders their practical adoption. In this study, we introduce a groundbreaking gradient-based online caching policy, the first to achieve logarithmic computational complexity relative to catalog size along with regret guarantees. This means our algorithm can efficiently handle large-scale data while minimizing the performance gap between real-time decisions and optimal hindsight choices. As requests arrive, our policy dynamically adjusts the probabilities of including items in the cache, which drive cache update decisions. Our algorithm's streamlined complexity is a key advantage, enabling its application to real-world traces featuring millions of requests and items. This is a significant achievement, as traces of this scale have been out of reach for existing policies with regret guarantees. To the best of our knowledge, our experimental results show for the first time that the regret guarantees of gradient-based caching policies bring significant benefits in scenarios of practical interest.
Model Predictive Guidance for Fuel-Optimal Landing of Reusable Launch Vehicles
Authors: Ki-Wook Jung, Sang-Don Lee, Cheol-Goo Jung, Chang-Hun Lee
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2405.01264
Pdf link: https://arxiv.org/pdf/2405.01264
Abstract This paper introduces a landing guidance strategy for reusable launch vehicles (RLVs) using a model predictive approach based on sequential convex programming (SCP). The proposed approach devises two distinct optimal control problems (OCPs): planning a fuel-optimal landing trajectory that accommodates practical path constraints specific to RLVs, and determining real-time optimal tracking commands. This dual optimization strategy allows for reduced computational load through adjustable prediction horizon lengths in the tracking task, achieving near closed-loop performance. Enhancements in model fidelity for the tracking task are achieved through an alternative rotational dynamics representation, enabling a more stable numerical solution of the OCP and accounting for vehicle transient dynamics. Furthermore, modifications of aerodynamic force in both planning and tracking phases are proposed, tailored for thrust-vector-controlled RLVs, to reduce the fidelity gap without adding computational complexity. Extensive 6-DOF simulation experiments validate the effectiveness and improved guidance performance of the proposed algorithm.
Non-iterative Optimization of Trajectory and Radio Resource for Aerial Network
Authors: Hyeonsu Lyu, Jonggyu Jang, Harim Lee, Hyun Jong Yang
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.01314
Pdf link: https://arxiv.org/pdf/2405.01314
Abstract We address a joint trajectory planning, user association, resource allocation, and power control problem to maximize proportional fairness in the aerial IoT network, considering practical end-to-end quality-of-service (QoS) and communication schedules. Though the problem is rather ancient, apart from the fact that the previous approaches have never considered user- and time-specific QoS, we point out a prevalent mistake in coordinate optimization approaches adopted by the majority of the literature. Coordinate optimization approaches, which repetitively optimize radio resources for a fixed trajectory and vice versa, generally converge to local optima when all variables are differentiable. However, these methods often stagnate at a non-stationary point, significantly degrading the network utility in mixed-integer problems such as joint trajectory and radio resource optimization. We detour this problem by converting the formulated problem into the Markov decision process (MDP). Exploiting the beneficial characteristics of the MDP, we design a non-iterative framework that cooperatively optimizes trajectory and radio resources without initial trajectory choice. The proposed framework can incorporate various trajectory planning algorithms such as the genetic algorithm, tree search, and reinforcement learning. Extensive comparisons with diverse baselines verify that the proposed framework significantly outperforms the state-of-the-art method, nearly achieving the global optimum. Our implementation code is available at https://github.com/hslyu/dbspf.
Constrained Reinforcement Learning Under Model Mismatch
Authors: Zhongchang Sun, Sihong He, Fei Miao, Shaofeng Zou
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.01327
Pdf link: https://arxiv.org/pdf/2405.01327
Abstract Existing studies on constrained reinforcement learning (RL) may obtain a well-performing policy in the training environment. However, when deployed in a real environment, it may easily violate constraints that were originally satisfied during training because there might be model mismatch between the training and real environments. To address the above challenge, we formulate the problem as constrained RL under model uncertainty, where the goal is to learn a good policy that optimizes the reward and at the same time satisfy the constraint under model mismatch. We develop a Robust Constrained Policy Optimization (RCPO) algorithm, which is the first algorithm that applies to large/continuous state space and has theoretical guarantees on worst-case reward improvement and constraint violation at each iteration during the training. We demonstrate the effectiveness of our algorithm on a set of RL tasks with constraints.
Improving Subject-Driven Image Synthesis with Subject-Agnostic Guidance
Authors: Kelvin C.K. Chan, Yang Zhao, Xuhui Jia, Ming-Hsuan Yang, Huisheng Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2405.01356
Pdf link: https://arxiv.org/pdf/2405.01356
Abstract In subject-driven text-to-image synthesis, the synthesis process tends to be heavily influenced by the reference images provided by users, often overlooking crucial attributes detailed in the text prompt. In this work, we propose Subject-Agnostic Guidance (SAG), a simple yet effective solution to remedy the problem. We show that through constructing a subject-agnostic condition and applying our proposed dual classifier-free guidance, one could obtain outputs consistent with both the given subject and input text prompts. We validate the efficacy of our approach through both optimization-based and encoder-based methods. Additionally, we demonstrate its applicability in second-order customization methods, where an encoder-based model is fine-tuned with DreamBooth. Our approach is conceptually simple and requires only minimal code modifications, but leads to substantial quality improvements, as evidenced by our evaluations and user studies.
ATOM: Attention Mixer for Efficient Dataset Distillation
Authors: Samir Khaki, Ahmad Sajedi, Kai Wang, Lucy Z. Liu, Yuri A. Lawryshyn, Konstantinos N. Plataniotis
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2405.01373
Pdf link: https://arxiv.org/pdf/2405.01373
Abstract Recent works in dataset distillation seek to minimize training expenses by generating a condensed synthetic dataset that encapsulates the information present in a larger real dataset. These approaches ultimately aim to attain test accuracy levels akin to those achieved by models trained on the entirety of the original dataset. Previous studies in feature and distribution matching have achieved significant results without incurring the costs of bi-level optimization in the distillation process. Despite their convincing efficiency, many of these methods suffer from marginal downstream performance improvements, limited distillation of contextual information, and subpar cross-architecture generalization. To address these challenges in dataset distillation, we propose the ATtentiOn Mixer (ATOM) module to efficiently distill large datasets using a mixture of channel and spatial-wise attention in the feature matching process. Spatial-wise attention helps guide the learning process based on consistent localization of classes in their respective images, allowing for distillation from a broader receptive field. Meanwhile, channel-wise attention captures the contextual information associated with the class itself, thus making the synthetic image more informative for training. By integrating both types of attention, our ATOM module demonstrates superior performance across various computer vision datasets, including CIFAR10/100 and TinyImagenet. Notably, our method significantly improves performance in scenarios with a low number of images per class, thereby enhancing its potential. Furthermore, we maintain the improvement in cross-architectures and applications such as neural architecture search.
Common pitfalls to avoid while using multiobjective optimization in machine learning
Authors: Junaid Akhter, Paul David Fährmann, Konstantin Sonntag, Sebastian Peitz
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2405.01480
Pdf link: https://arxiv.org/pdf/2405.01480
Abstract Recently, there has been an increasing interest in exploring the application of multiobjective optimization (MOO) in machine learning (ML). The interest is driven by the numerous situations in real-life applications where multiple objectives need to be optimized simultaneously. A key aspect of MOO is the existence of a Pareto set, rather than a single optimal solution, which illustrates the inherent trade-offs between objectives. Despite its potential, there is a noticeable lack of satisfactory literature that could serve as an entry-level guide for ML practitioners who want to use MOO. Hence, our goal in this paper is to produce such a resource. We critically review previous studies, particularly those involving MOO in deep learning (using Physics-Informed Neural Networks (PINNs) as a guiding example), and identify misconceptions that highlight the need for a better grasp of MOO principles in ML. Using MOO of PINNs as a case study, we demonstrate the interplay between the data loss and the physics loss terms. We highlight the most common pitfalls one should avoid while using MOO techniques in ML. We begin by establishing the groundwork for MOO, focusing on well-known approaches such as the weighted sum (WS) method, alongside more complex techniques like the multiobjective gradient descent algorithm (MGDA). Additionally, we compare the results obtained from the WS and MGDA with one of the most common evolutionary algorithms, NSGA-II. We emphasize the importance of understanding the specific problem, the objective space, and the selected MOO method, while also noting that neglecting factors such as convergence can result in inaccurate outcomes and, consequently, a non-optimal solution. Our goal is to offer a clear and practical guide for ML practitioners to effectively apply MOO, particularly in the context of DL.
NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment
Authors: Gerald Shen, Zhilin Wang, Olivier Delalleau, Jiaqi Zeng, Yi Dong, Daniel Egert, Shengyang Sun, Jimmy Zhang, Sahil Jain, Ali Taghibakhshi, Markel Sanz Ausin, Ashwath Aithal, Oleksii Kuchaiev
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.01481
Pdf link: https://arxiv.org/pdf/2405.01481
Abstract Aligning Large Language Models (LLMs) with human values and preferences is essential for making them helpful and safe. However, building efficient tools to perform alignment can be challenging, especially for the largest and most competent LLMs which often contain tens or hundreds of billions of parameters. We create NeMo-Aligner, a toolkit for model alignment that can efficiently scale to using hundreds of GPUs for training. NeMo-Aligner comes with highly optimized and scalable implementations for major paradigms of model alignment such as: Reinforcement Learning from Human Feedback (RLHF), Direct Preference Optimization (DPO), SteerLM, and Self-Play Fine-Tuning (SPIN). Additionally, our toolkit supports running most of the alignment techniques in a Parameter Efficient Fine-Tuning (PEFT) setting. NeMo-Aligner is designed for extensibility, allowing support for other alignment techniques with minimal effort. It is open-sourced with Apache 2.0 License and we invite community contributions at https://github.com/NVIDIA/NeMo-Aligner
D2PO: Discriminator-Guided DPO with Response Evaluation Models
Authors: Prasann Singhal, Nathan Lambert, Scott Niekum, Tanya Goyal, Greg Durrett
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2405.01511
Pdf link: https://arxiv.org/pdf/2405.01511
Abstract Varied approaches for aligning language models have been proposed, including supervised fine-tuning, RLHF, and direct optimization methods such as DPO. Although DPO has rapidly gained popularity due to its straightforward training process and competitive results, there is an open question of whether there remain practical advantages of using a discriminator, like a reward model, to evaluate responses. We propose D2PO, discriminator-guided DPO, an approach for the online setting where preferences are being collected throughout learning. As we collect gold preferences, we use these not only to train our policy, but to train a discriminative response evaluation model to silver-label even more synthetic data for policy training. We explore this approach across a set of diverse tasks, including a realistic chat setting, we find that our approach leads to higher-quality outputs compared to DPO with the same data budget, and greater efficiency in terms of preference data requirements. Furthermore, we show conditions under which silver labeling is most helpful: it is most effective when training the policy with DPO, outperforming traditional PPO, and benefits from maintaining a separate discriminator from the policy model.
Model-based Deep Learning for Rate Split Multiple Access in Vehicular Communications
Authors: Hanwen Zhang, Mingzhe Chen, Alireza Vahid, Haijian Sun
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2405.01515
Pdf link: https://arxiv.org/pdf/2405.01515
Abstract Rate split multiple access (RSMA) has been proven as an effective communication scheme for 5G and beyond, especially in vehicular scenarios. However, RSMA requires complicated iterative algorithms for proper resource allocation, which cannot fulfill the stringent latency requirement in resource constrained vehicles. Although data driven approaches can alleviate this issue, they suffer from poor generalizability and scarce training data. In this paper, we propose a fractional programming (FP) based deep unfolding (DU) approach to address resource allocation problem for a weighted sum rate optimization in RSMA. By carefully designing the penalty function, we couple the variable update with projected gradient descent algorithm (PGD). Following the structure of PGD, we embed few learnable parameters in each layer of the DU network. Through extensive simulation, we have shown that the proposed model-based neural networks has similar performance as optimal results given by traditional algorithm but with much lower computational complexity, less training data, and higher resilience to test set data and out-of-distribution (OOD) data.
FLAME: Factuality-Aware Alignment for Large Language Models
Authors: Sheng-Chieh Lin, Luyu Gao, Barlas Oguz, Wenhan Xiong, Jimmy Lin, Wen-tau Yih, Xilun Chen
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2405.01525
Pdf link: https://arxiv.org/pdf/2405.01525
Abstract Alignment is a standard procedure to fine-tune pre-trained large language models (LLMs) to follow natural language instructions and serve as helpful AI assistants. We have observed, however, that the conventional alignment process fails to enhance the factual accuracy of LLMs, and often leads to the generation of more false facts (i.e. hallucination). In this paper, we study how to make the LLM alignment process more factual, by first identifying factors that lead to hallucination in both alignment steps:\ supervised fine-tuning (SFT) and reinforcement learning (RL). In particular, we find that training the LLM on new knowledge or unfamiliar texts can encourage hallucination. This makes SFT less factual as it trains on human labeled data that may be novel to the LLM. Furthermore, reward functions used in standard RL can also encourage hallucination, because it guides the LLM to provide more helpful responses on a diverse set of instructions, often preferring longer and more detailed responses. Based on these observations, we propose factuality-aware alignment, comprised of factuality-aware SFT and factuality-aware RL through direct preference optimization. Experiments show that our proposed factuality-aware alignment guides LLMs to output more factual responses while maintaining instruction-following capability.
Customizing Text-to-Image Models with a Single Image Pair
Authors: Maxwell Jones, Sheng-Yu Wang, Nupur Kumari, David Bau, Jun-Yan Zhu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.01536
Pdf link: https://arxiv.org/pdf/2405.01536
Abstract Art reinterpretation is the practice of creating a variation of a reference work, making a paired artwork that exhibits a distinct artistic style. We ask if such an image pair can be used to customize a generative model to capture the demonstrated stylistic difference. We propose Pair Customization, a new customization method that learns stylistic difference from a single image pair and then applies the acquired style to the generation process. Unlike existing methods that learn to mimic a single concept from a collection of images, our method captures the stylistic difference between paired images. This allows us to apply a stylistic change without overfitting to the specific image content in the examples. To address this new task, we employ a joint optimization method that explicitly separates the style and content into distinct LoRA weight spaces. We optimize these style and content weights to reproduce the style and content images while encouraging their orthogonality. During inference, we modify the diffusion process via a new style guidance based on our learned weights. Both qualitative and quantitative experiments show that our method can effectively learn style while avoiding overfitting to image content, highlighting the potential of modeling such stylistic differences from a single image pair.
Keyword: deep learning

HLSTransform: Energy-Efficient Llama 2 Inference on FPGAs Via High Level Synthesis
Authors: Andy He, Darren Key, Mason Bulling, Andrew Chang, Skyler Shapiro, Everett Lee
Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.00738
Pdf link: https://arxiv.org/pdf/2405.00738
Abstract Graphics Processing Units (GPUs) have become the leading hardware accelerator for deep learning applications and are used widely in training and inference of transformers; transformers have achieved state-of-the-art performance in many areas of machine learning and are especially used in most modern Large Language Models (LLMs). However, GPUs require large amounts of energy, which poses environmental concerns, demands high operational costs, and causes GPUs to be unsuitable for edge computing. We develop an accelerator for transformers, namely, Llama 2, an open-source state-of-the-art LLM, using high level synthesis (HLS) on Field Programmable Gate Arrays (FPGAs). HLS allows us to rapidly prototype FPGA designs without writing code at the register-transfer level (RTL). We name our method HLSTransform, and the FPGA designs we synthesize with HLS achieve up to a 12.75x reduction and 8.25x reduction in energy used per token on the Xilinx Virtex UltraScale+ VU9P FPGA compared to an Intel Xeon Broadwell E5-2686 v4 CPU and NVIDIA RTX 3090 GPU respectively, while increasing inference speeds by up to 2.46x compared to CPU and maintaining 0.53x the speed of an RTX 3090 GPU despite the GPU's 4 times higher base clock rate. With the lack of existing open-source FPGA accelerators for transformers, we open-source our code and document our steps for synthesis. We hope this work will serve as a step in democratizing the use of FPGAs in transformer inference and inspire research into energy-efficient inference methods as a whole. The code can be found on https://github.com/HLSTransform/submission.
Federated Graph Learning for EV Charging Demand Forecasting with Personalization Against Cyberattacks
Authors: Yi Li, Renyou Xie, Chaojie Li, Yi Wang, Zhaoyang Dong
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2405.00742
Pdf link: https://arxiv.org/pdf/2405.00742
Abstract Mitigating cybersecurity risk in electric vehicle (EV) charging demand forecasting plays a crucial role in the safe operation of collective EV chargings, the stability of the power grid, and the cost-effective infrastructure expansion. However, existing methods either suffer from the data privacy issue and the susceptibility to cyberattacks or fail to consider the spatial correlation among different stations. To address these challenges, a federated graph learning approach involving multiple charging stations is proposed to collaboratively train a more generalized deep learning model for demand forecasting while capturing spatial correlations among various stations and enhancing robustness against potential attacks. Firstly, for better model performance, a Graph Neural Network (GNN) model is leveraged to characterize the geographic correlation among different charging stations in a federated manner. Secondly, to ensure robustness and deal with the data heterogeneity in a federated setting, a message passing that utilizes a global attention mechanism to aggregate personalized models for each client is proposed. Thirdly, by concerning cyberattacks, a special credit-based function is designed to mitigate potential threats from malicious clients or unwanted attacks. Extensive experiments on a public EV charging dataset are conducted using various deep learning techniques and federated learning methods to demonstrate the prediction accuracy and robustness of the proposed approach.
More is Better: Deep Domain Adaptation with Multiple Sources
Authors: Sicheng Zhao, Hui Chen, Hu Huang, Pengfei Xu, Guiguang Ding
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.00749
Pdf link: https://arxiv.org/pdf/2405.00749
Abstract In many practical applications, it is often difficult and expensive to obtain large-scale labeled data to train state-of-the-art deep neural networks. Therefore, transferring the learned knowledge from a separate, labeled source domain to an unlabeled or sparsely labeled target domain becomes an appealing alternative. However, direct transfer often results in significant performance decay due to domain shift. Domain adaptation (DA) aims to address this problem by aligning the distributions between the source and target domains. Multi-source domain adaptation (MDA) is a powerful and practical extension in which the labeled data may be collected from multiple sources with different distributions. In this survey, we first define various MDA strategies. Then we systematically summarize and compare modern MDA methods in the deep learning era from different perspectives, followed by commonly used datasets and a brief benchmark. Finally, we discuss future research directions for MDA that are worth investigating.
ICU Bloodstream Infection Prediction: A Transformer-Based Approach for EHR Analysis
Authors: Ortal Hirszowicz, Dvir Aran
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.00819
Pdf link: https://arxiv.org/pdf/2405.00819
Abstract We introduce RatchetEHR, a novel transformer-based framework designed for the predictive analysis of electronic health records (EHR) data in intensive care unit (ICU) settings, with a specific focus on bloodstream infection (BSI) prediction. Leveraging the MIMIC-IV dataset, RatchetEHR demonstrates superior predictive performance compared to other methods, including RNN, LSTM, and XGBoost, particularly due to its advanced handling of sequential and temporal EHR data. A key innovation in RatchetEHR is the integration of the Graph Convolutional Transformer (GCT) component, which significantly enhances the ability to identify hidden structural relationships within EHR data, resulting in more accurate clinical predictions. Through SHAP value analysis, we provide insights into influential features for BSI prediction. RatchetEHR integrates multiple advancements in deep learning which together provide accurate predictions even with a relatively small sample size and highly imbalanced dataset. This study contributes to medical informatics by showcasing the application of advanced AI techniques in healthcare and sets a foundation for further research to optimize these capabilities in EHR data analysis.
Transformer-Based Self-Supervised Learning for Histopathological Classification of Ischemic Stroke Clot Origin
Authors: K. Yeh, M. S. Jabal, V. Gupta, D. F. Kallmes, W. Brinjikji, B. S. Erdal
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.00908
Pdf link: https://arxiv.org/pdf/2405.00908
Abstract Background and Purpose: Identifying the thromboembolism source in ischemic stroke is crucial for treatment and secondary prevention yet is often undetermined. This study describes a self-supervised deep learning approach in digital pathology of emboli for classifying ischemic stroke clot origin from histopathological images. Methods: The dataset included whole slide images (WSI) from the STRIP AI Kaggle challenge, consisting of retrieved clots from ischemic stroke patients following mechanical thrombectomy. Transformer-based deep learning models were developed using transfer learning and self-supervised pretraining for classifying WSI. Customizations included an attention pooling layer, weighted loss function, and threshold optimization. Various model architectures were tested and compared, and model performances were primarily evaluated using weighted logarithmic loss. Results: The model achieved a logloss score of 0.662 in cross-validation and 0.659 on the test set. Different model backbones were compared, with the swin_large_patch4_window12_384 showed higher performance. Thresholding techniques for clot origin classification were employed to balance false positives and negatives. Conclusion: The study demonstrates the extent of efficacy of transformer-based deep learning models in identifying ischemic stroke clot origins from histopathological images and emphasizes the need for refined modeling techniques specifically adapted to thrombi WSI. Further research is needed to improve model performance, interpretability, validate its effectiveness. Future enhancement could include integrating larger patient cohorts, advanced preprocessing strategies, and exploring ensemble multimodal methods for enhanced diagnostic accuracy.
MTDT: A Multi-Task Deep Learning Digital Twin
Authors: Nooshin Yousefzadeh, Rahul Sengupta, Yashaswi Karnati, Anand Rangarajan, Sanjay Ranka
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.00922
Pdf link: https://arxiv.org/pdf/2405.00922
Abstract Traffic congestion has significant impacts on both the economy and the environment. Measures of Effectiveness (MOEs) have long been the standard for evaluating the level of service and operational efficiency of traffic intersections. However, the scarcity of traditional high-resolution loop detector data (ATSPM) presents challenges in accurately measuring MOEs or capturing the intricate temporospatial characteristics inherent in urban intersection traffic. In response to this challenge, we have introduced the Multi-Task Deep Learning Digital Twin (MTDT) as a solution for multifaceted and precise intersection traffic flow simulation. MTDT enables accurate, fine-grained estimation of loop detector waveform time series for each lane of movement, alongside successful estimation of several MOEs for each lane group associated with a traffic phase concurrently and for all approaches of an arbitrary urban intersection. Unlike existing deep learning methodologies, MTDT distinguishes itself through its adaptability to local temporal and spatial features, such as signal timing plans, intersection topology, driving behaviors, and turning movement counts. While maintaining a straightforward design, our model emphasizes the advantages of multi-task learning in traffic modeling. By consolidating the learning process across multiple tasks, MTDT demonstrates reduced overfitting, increased efficiency, and enhanced effectiveness by sharing representations learned by different tasks. Furthermore, our approach facilitates sequential computation and lends itself to complete parallelization through GPU implementation. This not only streamlines the computational process but also enhances scalability and performance.
CrossMPT: Cross-attention Message-Passing Transformer for Error Correcting Codes
Authors: Seong-Joon Park, Hee-Youl Kwak, Sang-Hyo Kim, Yongjune Kim, Jong-Seon No
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2405.01033
Pdf link: https://arxiv.org/pdf/2405.01033
Abstract Error correcting codes~(ECCs) are indispensable for reliable transmission in communication systems. The recent advancements in deep learning have catalyzed the exploration of ECC decoders based on neural networks. Among these, transformer-based neural decoders have achieved state-of-the-art decoding performance. In this paper, we propose a novel Cross-attention Message-Passing Transformer~(CrossMPT). CrossMPT iteratively updates two types of input vectors (i.e., magnitude and syndrome vectors) using two masked cross-attention blocks. The mask matrices in these cross-attention blocks are determined by the code's parity-check matrix that delineates the relationship between magnitude and syndrome vectors. Our experimental results show that CrossMPT significantly outperforms existing neural network-based decoders, particularly in decoding low-density parity-check codes. Notably, CrossMPT also achieves a significant reduction in computational complexity, achieving over a 50\% decrease in its attention layers compared to the original transformer-based decoder, while retaining the computational complexity of the remaining layers.
Few Shot Class Incremental Learning using Vision-Language models
Authors: Anurag Kumar, Chinmay Bharti, Saikat Dutta, Srikrishna Karanam, Biplab Banerjee
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2405.01040
Pdf link: https://arxiv.org/pdf/2405.01040
Abstract Recent advancements in deep learning have demonstrated remarkable performance comparable to human capabilities across various supervised computer vision tasks. However, the prevalent assumption of having an extensive pool of training data encompassing all classes prior to model training often diverges from real-world scenarios, where limited data availability for novel classes is the norm. The challenge emerges in seamlessly integrating new classes with few samples into the training data, demanding the model to adeptly accommodate these additions without compromising its performance on base classes. To address this exigency, the research community has introduced several solutions under the realm of few-shot class incremental learning (FSCIL). In this study, we introduce an innovative FSCIL framework that utilizes language regularizer and subspace regularizer. During base training, the language regularizer helps incorporate semantic information extracted from a Vision-Language model. The subspace regularizer helps in facilitating the model's acquisition of nuanced connections between image and text semantics inherent to base classes during incremental training. Our proposed framework not only empowers the model to embrace novel classes with limited data, but also ensures the preservation of performance on base classes. To substantiate the efficacy of our approach, we conduct comprehensive experiments on three distinct FSCIL benchmarks, where our framework attains state-of-the-art performance.
Leverage Multi-source Traffic Demand Data Fusion with Transformer Model for Urban Parking Prediction
Authors: Yin Huang, Yongqi Dong, Youhua Tang, Li Li
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET)
Arxiv link: https://arxiv.org/abs/2405.01055
Pdf link: https://arxiv.org/pdf/2405.01055
Abstract The escalation in urban private car ownership has worsened the urban parking predicament, necessitating effective parking availability prediction for urban planning and management. However, the existing prediction methods suffer from low prediction accuracy with the lack of spatial-temporal correlation features related to parking volume, and neglect of flow patterns and correlations between similar parking lots within certain areas. To address these challenges, this study proposes a parking availability prediction framework integrating spatial-temporal deep learning with multi-source data fusion, encompassing traffic demand data from multiple sources (e.g., metro, bus, taxi services), and parking lot data. The framework is based on the Transformer as the spatial-temporal deep learning model and leverages K-means clustering to establish parking cluster zones, extracting and integrating traffic demand characteristics from various transportation modes (i.e., metro, bus, online ride-hailing, and taxi) connected to parking lots. Real-world empirical data was used to verify the effectiveness of the proposed method compared with different machine learning, deep learning, and traditional statistical models for predicting parking availability. Experimental results reveal that, with the proposed pipeline, the developed Transformer model outperforms other models in terms of various metrics, e.g., Mean Squared Error (MSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE). By fusing multi-source demanding data with spatial-temporal deep learning techniques, this approach offers the potential to develop parking availability prediction systems that furnish more accurate and timely information to both drivers and urban planners, thereby fostering more efficient and sustainable urban mobility.
Callico: a Versatile Open-Source Document Image Annotation Platform
Authors: Christopher Kermorvant, Eva Bardou, Manon Blanco, Bastien Abadie
Subjects: Computer Vision and Pattern Recognition (cs.CV); Digital Libraries (cs.DL)
Arxiv link: https://arxiv.org/abs/2405.01071
Pdf link: https://arxiv.org/pdf/2405.01071
Abstract This paper presents Callico, a web-based open source platform designed to simplify the annotation process in document recognition projects. The move towards data-centric AI in machine learning and deep learning underscores the importance of high-quality data, and the need for specialised tools that increase the efficiency and effectiveness of generating such data. For document image annotation, Callico offers dual-display annotation for digitised documents, enabling simultaneous visualisation and annotation of scanned images and text. This capability is critical for OCR and HTR model training, document layout analysis, named entity recognition, form-based key value annotation or hierarchical structure annotation with element grouping. The platform supports collaborative annotation with versatile features backed by a commitment to open source development, high-quality code standards and easy deployment via Docker. Illustrative use cases - including the transcription of the Belfort municipal registers, the indexing of French World War II prisoners for the ICRC, and the extraction of personal information from the Socface project's census lists - demonstrate Callico's applicability and utility.
KDPrint: Passive Authentication using Keystroke Dynamics-to-Image Encoding via Standardization
Authors: Yooshin Kim, Namhyeok Kwon, Donghoon Shin
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2405.01080
Pdf link: https://arxiv.org/pdf/2405.01080
Abstract In contemporary mobile user authentication systems, verifying user legitimacy has become paramount due to the widespread use of smartphones. Although fingerprint and facial recognition are widely used for mobile authentication, PIN-based authentication is still employed as a fallback option if biometric authentication fails after multiple attempts. Consequently, the system remains susceptible to attacks targeting the PIN when biometric methods are unsuccessful. In response to these concerns, two-factor authentication has been proposed, albeit with the caveat of increased user effort. To address these challenges, this paper proposes a passive authentication system that utilizes keystroke data, a byproduct of primary authentication methods, for background user authentication. Additionally, we introduce a novel image encoding technique to capture the temporal dynamics of keystroke data, overcoming the performance limitations of deep learning models. Furthermore, we present a methodology for selecting suitable behavioral biometric features for image representation. The resulting images, depicting the user's PIN input patterns, enhance the model's ability to uniquely identify users through the secondary channel with high accuracy. Experimental results demonstrate that the proposed imaging approach surpasses existing methods in terms of information capacity. In self-collected dataset experiments, incorporating features from prior research, our method achieved an Equal Error Rate (EER) of 6.7\%, outperforming the existing method's 47.7\%. Moreover, our imaging technique attained a True Acceptance Rate (TAR) of 94.4\% and a False Acceptance Rate (FAR) of 8\% for 17 users.
MCMS: Multi-Category Information and Multi-Scale Stripe Attention for Blind Motion Deblurring
Authors: Nianzu Qiao, Lamei Di, Changyin Sun
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2405.01083
Pdf link: https://arxiv.org/pdf/2405.01083
Abstract Deep learning-based motion deblurring techniques have advanced significantly in recent years. This class of techniques, however, does not carefully examine the inherent flaws in blurry images. For instance, low edge and structural information are traits of blurry images. The high-frequency component of blurry images is edge information, and the low-frequency component is structure information. A blind motion deblurring network (MCMS) based on multi-category information and multi-scale stripe attention mechanism is proposed. Given the respective characteristics of the high-frequency and low-frequency components, a three-stage encoder-decoder model is designed. Specifically, the first stage focuses on extracting the features of the high-frequency component, the second stage concentrates on extracting the features of the low-frequency component, and the third stage integrates the extracted low-frequency component features, the extracted high-frequency component features, and the original blurred image in order to recover the final clear image. As a result, the model effectively improves motion deblurring by fusing the edge information of the high-frequency component and the structural information of the low-frequency component. In addition, a grouped feature fusion technique is developed so as to achieve richer, more three-dimensional and comprehensive utilization of various types of features at a deep level. Next, a multi-scale stripe attention mechanism (MSSA) is designed, which effectively combines the anisotropy and multi-scale information of the image, a move that significantly enhances the capability of the deep model in feature representation. Large-scale comparative studies on various datasets show that the strategy in this paper works better than the recently published measures.
Generative Relevance Feedback and Convergence of Adaptive Re-Ranking: University of Glasgow Terrier Team at TREC DL 2023
Authors: Andrew Parry, Thomas Jaenich, Sean MacAvaney, Iadh Ounis
Subjects: Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2405.01122
Pdf link: https://arxiv.org/pdf/2405.01122
Abstract This paper describes our participation in the TREC 2023 Deep Learning Track. We submitted runs that apply generative relevance feedback from a large language model in both a zero-shot and pseudo-relevance feedback setting over two sparse retrieval approaches, namely BM25 and SPLADE. We couple this first stage with adaptive re-ranking over a BM25 corpus graph scored using a monoELECTRA cross-encoder. We investigate the efficacy of these generative approaches for different query types in first-stage retrieval. In re-ranking, we investigate operating points of adaptive re-ranking with different first stages to find the point in graph traversal where the first stage no longer has an effect on the performance of the overall retrieval pipeline. We find some performance gains from the application of generative query reformulation. However, our strongest run in terms of P@10 and nDCG@10 applied both adaptive re-ranking and generative pseudo-relevance feedback, namely uogtr_b_grf_e_gb.
Detecting and clustering swallow events in esophageal long-term high-resolution manometry
Authors: Alexander Geiger, Lars Wagner, Daniel Rueckert, Dirk Wilhelm, Alissa Jell
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2405.01126
Pdf link: https://arxiv.org/pdf/2405.01126
Abstract High-resolution manometry (HRM) is the gold standard in diagnosing esophageal motility disorders. As HRM is typically conducted under short-term laboratory settings, intermittently occurring disorders are likely to be missed. Therefore, long-term (up to 24h) HRM (LTHRM) is used to gain detailed insights into the swallowing behavior. However, analyzing the extensive data from LTHRM is challenging and time consuming as medical experts have to analyze the data manually, which is slow and prone to errors. To address this challenge, we propose a Deep Learning based swallowing detection method to accurately identify swallowing events and secondary non-deglutitive-induced esophageal motility disorders in LTHRM data. We then proceed with clustering the identified swallows into distinct classes, which are analyzed by highly experienced clinicians to validate the different swallowing patterns. We evaluate our computational pipeline on a total of 25 LTHRMs, which were meticulously annotated by medical experts. By detecting more than 94% of all relevant swallow events and providing all relevant clusters for a more reliable diagnostic process among experienced clinicians, we are able to demonstrate the effectiveness as well as positive clinical impact of our approach to make LTHRM feasible in clinical care.
Uncertainty-aware self-training with expectation maximization basis transformation
Authors: Zijia Wang, Wenbin Yang, Zhisong Liu, Zhen Jia
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2405.01175
Pdf link: https://arxiv.org/pdf/2405.01175
Abstract Self-training is a powerful approach to deep learning. The key process is to find a pseudo-label for modeling. However, previous self-training algorithms suffer from the over-confidence issue brought by the hard labels, even some confidence-related regularizers cannot comprehensively catch the uncertainty. Therefore, we propose a new self-training framework to combine uncertainty information of both model and dataset. Specifically, we propose to use Expectation-Maximization (EM) to smooth the labels and comprehensively estimate the uncertainty information. We further design a basis extraction network to estimate the initial basis from the dataset. The obtained basis with uncertainty can be filtered based on uncertainty information. It can then be transformed into the real hard label to iteratively update the model and basis in the retraining process. Experiments on image classification and semantic segmentation show the advantages of our methods among confidence-aware self-training algorithms with 1-3 percentage improvement on different datasets.
Potential Energy based Mixture Model for Noisy Label Learning
Authors: Zijia Wang, Wenbin Yang, Zhisong Liu, Zhen Jia
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2405.01186
Pdf link: https://arxiv.org/pdf/2405.01186
Abstract Training deep neural networks (DNNs) from noisy labels is an important and challenging task. However, most existing approaches focus on the corrupted labels and ignore the importance of inherent data structure. To bridge the gap between noisy labels and data, inspired by the concept of potential energy in physics, we propose a novel Potential Energy based Mixture Model (PEMM) for noise-labels learning. We innovate a distance-based classifier with the potential energy regularization on its class centers. Embedding our proposed classifier with existing deep learning backbones, we can have robust networks with better feature representations. They can preserve intrinsic structures from the data, resulting in a superior noisy tolerance. We conducted extensive experiments to analyze the efficiency of our proposed model on several real-world datasets. Quantitative results show that it can achieve state-of-the-art performance.
DLAP: A Deep Learning Augmented Large Language Model Prompting Framework for Software Vulnerability Detection
Authors: Yanjing Yang, Xin Zhou, Runfeng Mao, Jinwei Xu, Lanxin Yang, Yu Zhangm, Haifeng Shen, He Zhang
Subjects: Software Engineering (cs.SE); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2405.01202
Pdf link: https://arxiv.org/pdf/2405.01202
Abstract Software vulnerability detection is generally supported by automated static analysis tools, which have recently been reinforced by deep learning (DL) models. However, despite the superior performance of DL-based approaches over rule-based ones in research, applying DL approaches to software vulnerability detection in practice remains a challenge due to the complex structure of source code, the black-box nature of DL, and the domain knowledge required to understand and validate the black-box results for addressing tasks after detection. Conventional DL models are trained by specific projects and, hence, excel in identifying vulnerabilities in these projects but not in others. These models with poor performance in vulnerability detection would impact the downstream tasks such as location and repair. More importantly, these models do not provide explanations for developers to comprehend detection results. In contrast, Large Language Models (LLMs) have made lots of progress in addressing these issues by leveraging prompting techniques. Unfortunately, their performance in identifying vulnerabilities is unsatisfactory. This paper contributes \textbf{\DLAP}, a \underline{\textbf{D}}eep \underline{\textbf{L}}earning \underline{\textbf{A}}ugmented LLMs \underline{\textbf{P}}rompting framework that combines the best of both DL models and LLMs to achieve exceptional vulnerability detection performance. Experimental evaluation results confirm that \DLAP outperforms state-of-the-art prompting frameworks, including role-based prompts, auxiliary information prompts, chain-of-thought prompts, and in-context learning prompts, as well as fine-turning on multiple metrics.
RaffeSDG: Random Frequency Filtering enabled Single-source Domain Generalization for Medical Image Segmentation
Authors: Heng Li, Haojin Li, Jianyu Chen, Zhongxi Qiu, Huazhu Fu, Lidai Wang, Yan Hu, Jiang Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2405.01228
Pdf link: https://arxiv.org/pdf/2405.01228
Abstract Deep learning models often encounter challenges in making accurate inferences when there are domain shifts between the source and target data. This issue is particularly pronounced in clinical settings due to the scarcity of annotated data resulting from the professional and private nature of medical data. Despite the existence of decent solutions, many of them are hindered in clinical settings due to limitations in data collection and computational complexity. To tackle domain shifts in data-scarce medical scenarios, we propose a Random frequency filtering enabled Single-source Domain Generalization algorithm (RaffeSDG), which promises robust out-of-domain inference with segmentation models trained on a single-source domain. A filter-based data augmentation strategy is first proposed to promote domain variability within a single-source domain by introducing variations in frequency space and blending homologous samples. Then Gaussian filter-based structural saliency is also leveraged to learn robust representations across augmented samples, further facilitating the training of generalizable segmentation models. To validate the effectiveness of RaffeSDG, we conducted extensive experiments involving out-of-domain inference on segmentation tasks for three human tissues imaged by four diverse modalities. Through thorough investigations and comparisons, compelling evidence was observed in these experiments, demonstrating the potential and generalizability of RaffeSDG. The code is available at https://github.com/liamheng/Non-IID_Medical_Image_Segmentation.
Identification of Entailment and Contradiction Relations between Natural Language Sentences: A Neurosymbolic Approach
Authors: Xuyao Feng, Anthony Hunter
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2405.01259
Pdf link: https://arxiv.org/pdf/2405.01259
Abstract Natural language inference (NLI), also known as Recognizing Textual Entailment (RTE), is an important aspect of natural language understanding. Most research now uses machine learning and deep learning to perform this task on specific datasets, meaning their solution is not explainable nor explicit. To address the need for an explainable approach to RTE, we propose a novel pipeline that is based on translating text into an Abstract Meaning Representation (AMR) graph. For this we use a pre-trained AMR parser. We then translate the AMR graph into propositional logic and use a SAT solver for automated reasoning. In text, often commonsense suggests that an entailment (or contradiction) relationship holds between a premise and a claim, but because different wordings are used, this is not identified from their logical representations. To address this, we introduce relaxation methods to allow replacement or forgetting of some propositions. Our experimental results show this pipeline performs well on four RTE datasets.
The Importance of Model Inspection for Better Understanding Performance Characteristics of Graph Neural Networks
Authors: Nairouz Shehata, Carolina Piçarra, Anees Kazi, Ben Glocker
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.01270
Pdf link: https://arxiv.org/pdf/2405.01270
Abstract This study highlights the importance of conducting comprehensive model inspection as part of comparative performance analyses. Here, we investigate the effect of modelling choices on the feature learning characteristics of graph neural networks applied to a brain shape classification task. Specifically, we analyse the effect of using parameter-efficient, shared graph convolutional submodels compared to structure-specific, non-shared submodels. Further, we assess the effect of mesh registration as part of the data harmonisation pipeline. We find substantial differences in the feature embeddings at different layers of the models. Our results highlight that test accuracy alone is insufficient to identify important model characteristics such as encoded biases related to data source or potentially non-discriminative features learned in submodels. Our model inspection framework offers a valuable tool for practitioners to better understand performance characteristics of deep learning models in medical imaging.
Quantifying Spatial Domain Explanations in BCI using Earth Mover's Distance
Authors: Param Rajpura, Hubert Cecotti, Yogesh Kumar Meena
Subjects: Human-Computer Interaction (cs.HC); Emerging Technologies (cs.ET); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2405.01277
Pdf link: https://arxiv.org/pdf/2405.01277
Abstract Brain-computer interface (BCI) systems facilitate unique communication between humans and computers, benefiting severely disabled individuals. Despite decades of research, BCIs are not fully integrated into clinical and commercial settings. It's crucial to assess and explain BCI performance, offering clear explanations for potential users to avoid frustration when it doesn't work as expected. This work investigates the efficacy of different deep learning and Riemannian geometry-based classification models in the context of motor imagery (MI) based BCI using electroencephalography (EEG). We then propose an optimal transport theory-based approach using earth mover's distance (EMD) to quantify the comparison of the feature relevance map with the domain knowledge of neuroscience. For this, we utilized explainable AI (XAI) techniques for generating feature relevance in the spatial domain to identify important channels for model outcomes. Three state-of-the-art models are implemented - 1) Riemannian geometry-based classifier, 2) EEGNet, and 3) EEG Conformer, and the observed trend in the model's accuracy across different architectures on the dataset correlates with the proposed feature relevance metrics. The models with diverse architectures perform significantly better when trained on channels relevant to motor imagery than data-driven channel selection. This work focuses attention on the necessity for interpretability and incorporating metrics beyond accuracy, underscores the value of combining domain knowledge and quantifying model interpretations with data-driven approaches in creating reliable and robust Brain-Computer Interfaces (BCIs).
Data Scoping: Effectively Learning the Evolution of Generic Transport PDEs
Authors: Jiangce Chen, Wenzhuo Xu, Zeda Xu, Noelia Grande Gutiérrez, Sneha Prabha Narra, Christopher McComb
Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE)
Arxiv link: https://arxiv.org/abs/2405.01319
Pdf link: https://arxiv.org/pdf/2405.01319
Abstract Transport phenomena (e.g., fluid flows) are governed by time-dependent partial differential equations (PDEs) describing mass, momentum, and energy conservation, and are ubiquitous in many engineering applications. However, deep learning architectures are fundamentally incompatible with the simulation of these PDEs. This paper clearly articulates and then solves this incompatibility. The local-dependency of generic transport PDEs implies that it only involves local information to predict the physical properties at a location in the next time step. However, the deep learning architecture will inevitably increase the scope of information to make such predictions as the number of layers increases, which can cause sluggish convergence and compromise generalizability. This paper aims to solve this problem by proposing a distributed data scoping method with linear time complexity to strictly limit the scope of information to predict the local properties. The numerical experiments over multiple physics show that our data scoping method significantly accelerates training convergence and improves the generalizability of benchmark models on large-scale engineering simulations. Specifically, over the geometries not included in the training data for heat transferring simulation, it can increase the accuracy of Convolutional Neural Networks (CNNs) by 21.7 \% and that of Fourier Neural Operators (FNOs) by 38.5 \% on average.
Common pitfalls to avoid while using multiobjective optimization in machine learning
Authors: Junaid Akhter, Paul David Fährmann, Konstantin Sonntag, Sebastian Peitz
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2405.01480
Pdf link: https://arxiv.org/pdf/2405.01480
Abstract Recently, there has been an increasing interest in exploring the application of multiobjective optimization (MOO) in machine learning (ML). The interest is driven by the numerous situations in real-life applications where multiple objectives need to be optimized simultaneously. A key aspect of MOO is the existence of a Pareto set, rather than a single optimal solution, which illustrates the inherent trade-offs between objectives. Despite its potential, there is a noticeable lack of satisfactory literature that could serve as an entry-level guide for ML practitioners who want to use MOO. Hence, our goal in this paper is to produce such a resource. We critically review previous studies, particularly those involving MOO in deep learning (using Physics-Informed Neural Networks (PINNs) as a guiding example), and identify misconceptions that highlight the need for a better grasp of MOO principles in ML. Using MOO of PINNs as a case study, we demonstrate the interplay between the data loss and the physics loss terms. We highlight the most common pitfalls one should avoid while using MOO techniques in ML. We begin by establishing the groundwork for MOO, focusing on well-known approaches such as the weighted sum (WS) method, alongside more complex techniques like the multiobjective gradient descent algorithm (MGDA). Additionally, we compare the results obtained from the WS and MGDA with one of the most common evolutionary algorithms, NSGA-II. We emphasize the importance of understanding the specific problem, the objective space, and the selected MOO method, while also noting that neglecting factors such as convergence can result in inaccurate outcomes and, consequently, a non-optimal solution. Our goal is to offer a clear and practical guide for ML practitioners to effectively apply MOO, particularly in the context of DL.
A separability-based approach to quantifying generalization: which layer is best?
Authors: Luciano Dyballa, Evan Gerritz, Steven W. Zucker
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2405.01524
Pdf link: https://arxiv.org/pdf/2405.01524
Abstract Generalization to unseen data remains poorly understood for deep learning classification and foundation models. How can one assess the ability of networks to adapt to new or extended versions of their input space in the spirit of few-shot learning, out-of-distribution generalization, and domain adaptation? Which layers of a network are likely to generalize best? We provide a new method for evaluating the capacity of networks to represent a sampled domain, regardless of whether the network has been trained on all classes in the domain. Our approach is the following: after fine-tuning state-of-the-art pre-trained models for visual classification on a particular domain, we assess their performance on data from related but distinct variations in that domain. Generalization power is quantified as a function of the latent embeddings of unseen data from intermediate layers for both unsupervised and supervised settings. Working throughout all stages of the network, we find that (i) high classification accuracy does not imply high generalizability; and (ii) deeper layers in a model do not always generalize the best, which has implications for pruning. Since the trends observed across datasets are largely consistent, we conclude that our approach reveals (a function of) the intrinsic capacity of the different layers of a model to generalize.

qiaoyuet / arxiv_daily

New submissions for Fri, 3 May 24 #99

Keyword: differential privacy

The Privacy Power of Correlated Noise in Decentralized Learning

Privacy-Enhanced Database Synthesis for Benchmark Publishing

Navigating Heterogeneity and Privacy in One-Shot Federated Learning with Diffusion Models

Keyword: privacy

Federated Graph Learning for EV Charging Demand Forecasting with Personalization Against Cyberattacks

The Impact of IMSI Catcher Deployments on Cellular Network Security: Challenges and Countermeasures in 4G and 5G Networks

Communication-Efficient Training Workload Balancing for Decentralized Multi-Agent Learning

Quantum Federated Learning Experiments in the Cloud with Data Encoding

Recovering Labels from Local Updates in Federated Learning

FREE: Faster and Better Data-Free Meta-Learning

Deep Learning Models in Speech Recognition: Measuring GPU Energy Consumption, Impact of Noise and Model Quantization for Edge Deployment

Towards Trust Proof for Secure Confidential Virtual Machines

The Privacy Power of Correlated Noise in Decentralized Learning

Fair Recommendations with Limited Sensitive Attributes: A Distributionally Robust Optimization Approach

Boosting Communication Efficiency of Federated Learning's Secure Aggregation

Gradient-Congruity Guided Federated Sparse Training

Improving Membership Inference in ASR Model Auditing with Perturbed Loss Features

A Survey on Semantic Communication Networks: Architecture, Security, and Privacy

Towards Inclusive Face Recognition Through Synthetic Ethnicity Alteration

Privacy-Enhanced Database Synthesis for Benchmark Publishing

IDPFilter: Mitigating Interdependent Privacy Issues in Third-Party Apps

Exploring Privacy Issues in Mission Critical Communication: Navigating 5G and Beyond Networks

Navigating Heterogeneity and Privacy in One-Shot Federated Learning with Diffusion Models

Keyword: machine learning

Understanding Social Perception, Interactions, and Safety Aspects of Sidewalk Delivery Robots Using Sentiment Analysis

Joint torques prediction of a robotic arm using neural networks

Interactive Analysis of LLMs using Meaningful Counterfactuals

HLSTransform: Energy-Efficient Llama 2 Inference on FPGAs Via High Level Synthesis

On the weight dynamics of learning networks

Quantum AI for Alzheimer's disease early screening

HLSFactory: A Framework Empowering High-Level Synthesis Datasets for Machine Learning and Beyond

Public Computing Intellectuals in the Age of AI Crisis

Learning to Boost the Performance of Stable Nonlinear Systems

Artificial intelligence for context-aware visual change detection in software test automation

Wake Vision: A Large-scale, Diverse Dataset and Benchmark Suite for TinyML Person Detection

De-Biasing Models of Biased Decisions: A Comparison of Methods Using Mortgage Application Data

LOQA: Learning with Opponent Q-Learning Awareness

Polynomial Chaos Expanded Gaussian Process

Explicitly Modeling Generality into Self-Supervised Learning

Leverage Multi-source Traffic Demand Data Fusion with Transformer Model for Urban Parking Prediction

A text-based, generative deep learning model for soil reflectance spectrum simulation in the VIS-NIR (400-2499 nm) bands

Callico: a Versatile Open-Source Document Image Annotation Platform

Boosting Communication Efficiency of Federated Learning's Secure Aggregation

Gradient-Congruity Guided Federated Sparse Training

Lying Graph Convolution: Learning to Lie for Node Classification Tasks

Identification of Entailment and Contradiction Relations between Natural Language Sentences: A Neurosymbolic Approach

An Online Gradient-Based Caching Policy with Logarithmic Complexity and Regret Guarantees

Invariant Risk Minimization Is A Total Variation Model

Uncertainty for Active Learning on Graphs

Common pitfalls to avoid while using multiobjective optimization in machine learning

Digital Twin Generators for Disease Modeling

Keyword: optimization

Comparative approach: Electric distribution optimization with loss minimization algorithm and particle swarm optimization

Technical Report on BaumEvA Evolutionary Optimization Python-Library Testing

Life-long Learning and Testing for Automated Vehicles via Adaptive Scenario Sampling as A Continuous Optimization Process

Soft Preference Optimization: Aligning Language Models to Expert Distributions

Does Using Bazel Help Speed Up Continuous Integration Builds?

Sifting out communities in large sparse networks

HLSFactory: A Framework Empowering High-Level Synthesis Datasets for Machine Learning and Beyond

A Convex Formulation of the Soft-Capture Problem

Learning to Boost the Performance of Stable Nonlinear Systems

A Differentiable Dynamic Modeling Approach to Integrated Motion Planning and Actuator Physical Design for Mobile Manipulators

Transformer-Based Self-Supervised Learning for Histopathological Classification of Ischemic Stroke Clot Origin

X-Oscar: A Progressive Framework for High-quality Text-guided 3D Animatable Avatar Generation

Robust Decentralized Learning with Local Updates and Gradient Tracking

Active Cell Balancing for Extended Operational Time of Lithium-Ion Battery Systems in Energy Storage Applications

Bayesian Optimization with LLM-Based Acquisition Functions for Natural Language Preference Elicitation

QSimPy: A Learning-centric Simulation Framework for Quantum Cloud Resource Management

Differentiable Particles for General-Purpose Deformable Object Manipulation

Fair Recommendations with Limited Sensitive Attributes: A Distributionally Robust Optimization Approach

Multi-user ISAC through Stacked Intelligent Metasurfaces: New Algorithms and Experiments

Hypergraph $p$-Laplacian regularization on point clouds for data interpolation

Faster Learned Sparse Retrieval with Block-Max Pruning

Energy-Efficient Reconfigurable Holographic Surfaces Operating in the Presence of Realistic Hardware Impairments

Optimizing Satellite Network Infrastructure: A Joint Approach to Gateway Placement and Routing

GroupedMixer: An Entropy Model with Group-wise Token-Mixers for Learned Image Compression

Third Medium Finite Element Contact Formulation for Pneumatically Actuated Systems