New submissions for Tue, 23 Jan 24

Keyword: sgd

Communication Efficient and Provable Federated Unlearning

Authors: Authors: Youming Tao, Cheng-Long Wang, Miao Pan, Dongxiao Yu, Xiuzhen Cheng, Di Wang
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2401.11018
Pdf link: https://arxiv.org/pdf/2401.11018
Abstract We study federated unlearning, a novel problem to eliminate the impact of specific clients or data points on the global model learned via federated learning (FL). This problem is driven by the right to be forgotten and the privacy challenges in FL. We introduce a new framework for exact federated unlearning that meets two essential criteria: \textit{communication efficiency} and \textit{exact unlearning provability}. To our knowledge, this is the first work to tackle both aspects coherently. We start by giving a rigorous definition of \textit{exact} federated unlearning, which guarantees that the unlearned model is statistically indistinguishable from the one trained without the deleted data. We then pinpoint the key property that enables fast exact federated unlearning: total variation (TV) stability, which measures the sensitivity of the model parameters to slight changes in the dataset. Leveraging this insight, we develop a TV-stable FL algorithm called \texttt{FATS}, which modifies the classical \texttt{\underline{F}ed\underline{A}vg} algorithm for \underline{T}V \underline{S}tability and employs local SGD with periodic averaging to lower the communication round. We also design efficient unlearning algorithms for \texttt{FATS} under two settings: client-level and sample-level unlearning. We provide theoretical guarantees for our learning and unlearning algorithms, proving that they achieve exact federated unlearning with reasonable convergence rates for both the original and unlearned models. We empirically validate our framework on 6 benchmark datasets, and show its superiority over state-of-the-art methods in terms of accuracy, communication cost, computation cost, and unlearning efficacy.
Understanding the Generalization Benefits of Late Learning Rate Decay
Authors: Authors: Yinuo Ren, Chao Ma, Lexing Ying
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2401.11600
Pdf link: https://arxiv.org/pdf/2401.11600
Abstract Why do neural networks trained with large learning rates for a longer time often lead to better generalization? In this paper, we delve into this question by examining the relation between training and testing loss in neural networks. Through visualization of these losses, we note that the training trajectory with a large learning rate navigates through the minima manifold of the training loss, finally nearing the neighborhood of the testing loss minimum. Motivated by these findings, we introduce a nonlinear model whose loss landscapes mirror those observed for real neural networks. Upon investigating the training process using SGD on our model, we demonstrate that an extended phase with a large learning rate steers our model towards the minimum norm solution of the training loss, which may achieve near-optimal generalization, thereby affirming the empirically observed benefits of late learning rate decay.
Momentum-SAM: Sharpness Aware Minimization without Computational Overhead
Authors: Authors: Marlon Becker, Frederick Altrock, Benjamin Risse
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.12033
Pdf link: https://arxiv.org/pdf/2401.12033
Abstract The recently proposed optimization algorithm for deep neural networks Sharpness Aware Minimization (SAM) suggests perturbing parameters before gradient calculation by a gradient ascent step to guide the optimization into parameter space regions of flat loss. While significant generalization improvements and thus reduction of overfitting could be demonstrated, the computational costs are doubled due to the additionally needed gradient calculation, making SAM unfeasible in case of limited computationally capacities. Motivated by Nesterov Accelerated Gradient (NAG) we propose Momentum-SAM (MSAM), which perturbs parameters in the direction of the accumulated momentum vector to achieve low sharpness without significant computational overhead or memory demands over SGD or Adam. We evaluate MSAM in detail and reveal insights on separable mechanisms of NAG, SAM and MSAM regarding training optimization and generalization. Code is available at https://github.com/MarlonBecker/MSAM.
The Dimension Strikes Back with Gradients: Generalization of Gradient Methods in Stochastic Convex Optimization
Authors: Authors: Matan Schliserman, Uri Sherman, Tomer Koren
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2401.12058
Pdf link: https://arxiv.org/pdf/2401.12058
Abstract We study the generalization performance of gradient methods in the fundamental stochastic convex optimization setting, focusing on its dimension dependence. First, for full-batch gradient descent (GD) we give a construction of a learning problem in dimension $d=O(n^2)$, where the canonical version of GD (tuned for optimal performance of the empirical risk) trained with $n$ training examples converges, with constant probability, to an approximate empirical risk minimizer with $\Omega(1)$ population excess risk. Our bound translates to a lower bound of $\Omega (\sqrt{d})$ on the number of training examples required for standard GD to reach a non-trivial test error, answering an open question raised by Feldman (2016) and Amir, Koren, and Livni (2021b) and showing that a non-trivial dimension dependence is unavoidable. Furthermore, for standard one-pass stochastic gradient descent (SGD), we show that an application of the same construction technique provides a similar $\Omega(\sqrt{d})$ lower bound for the sample complexity of SGD to reach a non-trivial empirical error, despite achieving optimal test performance. This again provides an exponential improvement in the dimension dependence compared to previous work (Koren, Livni, Mansour, and Sherman, 2022), resolving an open question left therein.
Keyword: optimization

Quantum Neural Network Software Testing, Analysis, and Code Optimization for Advanced IoT Systems: Design, Implementation, and Visualization
Authors: Authors: Soohyun Park, Joongheon Kim
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2401.10914
Pdf link: https://arxiv.org/pdf/2401.10914
Abstract This paper introduces a novel run-time testing, analysis, and code optimization (TACO) method for quantum neural network (QNN) software in advanced Internet-of-Things (IoT) systems, which visually presents the learning performance that is called a barren plateau. The run-time visual presentation of barren plateau situations is helpful for real-time quantum-based advanced IoT software testing because the software engineers can easily be aware of the training performances of QNN. Moreover, this tool is obviously useful for software engineers because it can intuitively guide them in designing and implementing high-accurate QNN-based advanced IoT software even if they are not familiar with quantum mechanics and quantum computing. Lastly, the proposed TACO is also capable of visual feedback because software engineers visually identify the barren plateau situations using tensorboard. In turn, they are also able to modify QNN structures based on the information.
Automatic dimensionality reduction of Twin-in-the-Loop Observers
Authors: Authors: Giacomo Delcaro, Federico Dettù, Simone Formentin, Sergio Matteo Savaresi
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.10945
Pdf link: https://arxiv.org/pdf/2401.10945
Abstract State-of-the-art vehicle dynamics estimation techniques usually share one common drawback: each variable to estimate is computed with an independent, simplified filtering module. These modules run in parallel and need to be calibrated separately. To solve this issue, a unified Twin-in-the-Loop (TiL) Observer architecture has recently been proposed: the classical simplified control-oriented vehicle model in the estimators is replaced by a full-fledged vehicle simulator, or digital twin (DT). The states of the DT are corrected in real time with a linear time invariant output error law. Since the simulator is a black-box, no explicit analytical formulation is available, hence classical filter tuning techniques cannot be used. Due to this reason, Bayesian Optimization will be used to solve a data-driven optimization problem to tune the filter. Due to the complexity of the DT, the optimization problem is high-dimensional. This paper aims to find a procedure to tune the high-complexity observer by lowering its dimensionality. In particular, in this work we will analyze both a supervised and an unsupervised learning approach. The strategies have been validated for speed and yaw-rate estimation on real-world data.
Decentralizing Coordination in Open Vehicle Fleets for Scalable and Dynamic Task Allocation
Authors: Authors: Marin Lujak, Stefano Giordani, Andrea Omicini, Sascha Ossowski
Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.10965
Pdf link: https://arxiv.org/pdf/2401.10965
Abstract One of the major challenges in the coordination of large, open, collaborative, and commercial vehicle fleets is dynamic task allocation. Self-concerned individually rational vehicle drivers have both local and global objectives, which require coordination using some fair and efficient task allocation method. In this paper, we review the literature on scalable and dynamic task allocation focusing on deterministic and dynamic two-dimensional linear assignment problems. We focus on multiagent system representation of open vehicle fleets where dynamically appearing vehicles are represented by software agents that should be allocated to a set of dynamically appearing tasks. We give a comparison and critical analysis of recent research results focusing on centralized, distributed, and decentralized solution approaches. Moreover, we propose mathematical models for dynamic versions of the following assignment problems well known in combinatorial optimization: the assignment problem, bottleneck assignment problem, fair matching problem, dynamic minimum deviation assignment problem, $\sum_{k}$-assignment problem, the semiassignment problem, the assignment problem with side constraints, and the assignment problem while recognizing agent qualification; all while considering the main aspect of open vehicle fleets: random arrival of tasks and vehicles (agents) that may become available after assisting previous tasks or by participating in the fleet at times based on individual interest.
Optimization of the Context-Free Language Reachability Matrix-Based Algorithm
Authors: Authors: Ilia Muravev
Subjects: Programming Languages (cs.PL)
Arxiv link: https://arxiv.org/abs/2401.11029
Pdf link: https://arxiv.org/pdf/2401.11029
Abstract Various static analysis problems are reformulated as instances of the Context-Free Language Reachability (CFL-r) problem. One promising way to make solving CFL-r more practical for large-scale interprocedural graphs is to reduce CFL-r to linear algebra operations on sparse matrices, as they are efficiently executed on modern hardware. In this work, we present five optimizations for a matrix-based CFL-r algorithm that utilize the specific properties of both the underlying semiring and the widely-used linear algebra library SuiteSparse:GraphBlas. Our experimental results show that these optimizations result in orders of magnitude speedup, with the optimized matrix-based CFL-r algorithm consistently outperforming state-of-the-art CFL-r solvers across four considered static analyses.
Chance-Constrained, Drift-Safe Guidance for Spacecraft Rendezvous
Authors: Authors: Andrew W. Berning Jr., Ethan R. Burnett, Stefan Bieniawski
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2401.11077
Pdf link: https://arxiv.org/pdf/2401.11077
Abstract A robust drift-safe rendezvous trajectory optimization tool is developed in this work, with applications to orbital rendezvous and proximity operations. The method is based on direct collocation and utilizes a sequential convex programming framework, and is extended from previous work to include passive safety constraints. The tool is then paired with a dispersion analysis framework to allow trajectories to be optimized subject to plant, navigation, and actuator uncertainties. The timing, direction, and magnitude of orbital maneuvers are optimized subject to the expected propellant usage, for a given navigation system performance. Representative trajectories are presented for the LEO flight regime, but the approach can also be applied to GEO and NRHO with minimal modification.
Adaptive Global-Local Representation Learning and Selection for Cross-Domain Facial Expression Recognition
Authors: Authors: Yuefang Gao, Yuhao Xie, Zeke Zexi Hu, Tianshui Chen, Liang Lin
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.11085
Pdf link: https://arxiv.org/pdf/2401.11085
Abstract Domain shift poses a significant challenge in Cross-Domain Facial Expression Recognition (CD-FER) due to the distribution variation across different domains. Current works mainly focus on learning domain-invariant features through global feature adaptation, while neglecting the transferability of local features. Additionally, these methods lack discriminative supervision during training on target datasets, resulting in deteriorated feature representation in target domain. To address these limitations, we propose an Adaptive Global-Local Representation Learning and Selection (AGLRLS) framework. The framework incorporates global-local adversarial adaptation and semantic-aware pseudo label generation to enhance the learning of domain-invariant and discriminative feature during training. Meanwhile, a global-local prediction consistency learning is introduced to improve classification results during inference. Specifically, the framework consists of separate global-local adversarial learning modules that learn domain-invariant global and local features independently. We also design a semantic-aware pseudo label generation module, which computes semantic labels based on global and local features. Moreover, a novel dynamic threshold strategy is employed to learn the optimal thresholds by leveraging independent prediction of global and local features, ensuring filtering out the unreliable pseudo labels while retaining reliable ones. These labels are utilized for model optimization through the adversarial learning process in an end-to-end manner. During inference, a global-local prediction consistency module is developed to automatically learn an optimal result from multiple predictions. We conduct comprehensive experiments and analysis based on a fair evaluation benchmark. The results demonstrate that the proposed framework outperforms the current competing methods by a substantial margin.
Meta Reinforcement Learning for Strategic IoT Deployments Coverage in Disaster-Response UAV Swarms
Authors: Authors: Marwan Dhuheir, Aiman Erbad, Ala Al-Fuqaha
Subjects: Machine Learning (cs.LG); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2401.11118
Pdf link: https://arxiv.org/pdf/2401.11118
Abstract In the past decade, Unmanned Aerial Vehicles (UAVs) have grabbed the attention of researchers in academia and industry for their potential use in critical emergency applications, such as providing wireless services to ground users and collecting data from areas affected by disasters, due to their advantages in terms of maneuverability and movement flexibility. The UAVs' limited resources, energy budget, and strict mission completion time have posed challenges in adopting UAVs for these applications. Our system model considers a UAV swarm that navigates an area collecting data from ground IoT devices focusing on providing better service for strategic locations and allowing UAVs to join and leave the swarm (e.g., for recharging) in a dynamic way. In this work, we introduce an optimization model with the aim of minimizing the total energy consumption and provide the optimal path planning of UAVs under the constraints of minimum completion time and transmit power. The formulated optimization is NP-hard making it not applicable for real-time decision making. Therefore, we introduce a light-weight meta-reinforcement learning solution that can also cope with sudden changes in the environment through fast convergence. We conduct extensive simulations and compare our approach to three state-of-the-art learning models. Our simulation results prove that our introduced approach is better than the three state-of-the-art algorithms in providing coverage to strategic locations with fast convergence.
The Bit Complexity of Dynamic Algebraic Formulas and their Determinants
Authors: Authors: Emile Anand, Jan van den Brand, Mehrdad Ghadiri, Daniel Zhang
Subjects: Computational Complexity (cs.CC)
Arxiv link: https://arxiv.org/abs/2401.11127
Pdf link: https://arxiv.org/pdf/2401.11127
Abstract Many iterative algorithms in optimization, computational geometry, computer algebra, and other areas of computer science require repeated computation of some algebraic expression whose input changes slightly from one iteration to the next. Although efficient data structures have been proposed for maintaining the solution of such algebraic expressions under low-rank updates, most of these results are only analyzed under exact arithmetic (real-RAM model and finite fields) which may not accurately reflect the complexity guarantees of real computers. In this paper, we analyze the stability and bit complexity of such data structures for expressions that involve the inversion, multiplication, addition, and subtraction of matrices under the word-RAM model. We show that the bit complexity only increases linearly in the number of matrix operations in the expression. In addition, we consider the bit complexity of maintaining the determinant of a matrix expression. We show that the required bit complexity depends on the logarithm of the condition number of matrices instead of the logarithm of their determinant. We also discuss rank maintenance and its connections to determinant maintenance. Our results have wide applications ranging from computational geometry (e.g., computing the volume of a polytope) to optimization (e.g., solving linear programs using the simplex algorithm).
Stability Plasticity Decoupled Fine-tuning For Few-shot end-to-end Object Detection
Authors: Authors: Yuantao Yin, Ping Yin
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.11140
Pdf link: https://arxiv.org/pdf/2401.11140
Abstract Few-shot object detection(FSOD) aims to design methods to adapt object detectors efficiently with only few annotated samples. Fine-tuning has been shown to be an effective and practical approach. However, previous works often take the classical base-novel two stage fine-tuning procedure but ignore the implicit stability-plasticity contradiction among different modules. Specifically, the random re-initialized classifiers need more plasticity to adapt to novel samples. The other modules inheriting pre-trained weights demand more stability to reserve their class-agnostic knowledge. Regular fine-tuning which couples the optimization of these two parts hurts the model generalization in FSOD scenarios. In this paper, we find that this problem is prominent in the end-to-end object detector Sparse R-CNN for its multi-classifier cascaded architecture. We propose to mitigate this contradiction by a new three-stage fine-tuning procedure by introducing an addtional plasticity classifier fine-tuning(PCF) stage. We further design the multi-source ensemble(ME) technique to enhance the generalization of the model in the final fine-tuning stage. Extensive experiments verify that our method is effective in regularizing Sparse R-CNN, outperforming previous methods in the FSOD benchmark.
Wideband Beamforming for RIS Assisted Near-Field Communications
Authors: Authors: Ji Wang, Jian Xiao, Yixuan Zou, Wenwu Xie, Yuanwei Liu
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2401.11141
Pdf link: https://arxiv.org/pdf/2401.11141
Abstract A near-field wideband beamforming scheme is investigated for reconfigurable intelligent surface (RIS) assisted multiple-input multiple-output (MIMO) systems, in which a deep learning-based end-to-end (E2E) optimization framework is proposed to maximize the system spectral efficiency. To deal with the near-field double beam split effect, the base station is equipped with frequency-dependent hybrid precoding architecture by introducing sub-connected true time delay (TTD) units, while two specific RIS architectures, namely true time delay-based RIS (TTD-RIS) and virtual subarray-based RIS (SA-RIS), are exploited to realize the frequency-dependent passive beamforming at the RIS. Furthermore, the efficient E2E beamforming models without explicit channel state information are proposed, which jointly exploits the uplink channel training module and the downlink wideband beamforming module. In the proposed network architecture of the E2E models, the classical communication signal processing methods, i.e., polarized filtering and sparsity transform, are leveraged to develop a signal-guided beamforming network. Numerical results show that the proposed E2E models have superior beamforming performance and robustness to conventional beamforming benchmarks. Furthermore, the tradeoff between the beamforming gain and the hardware complexity is investigated for different frequency-dependent RIS architectures, in which the TTD-RIS can achieve better spectral efficiency than the SA-RIS while requiring additional energy consumption and hardware cost.
Joint Beamforming Optimization and Mode Selection for RDARS-aided MIMO Systems
Authors: Authors: Jintao Wang, Chengzhi Ma, Shiqi Gong, Xi Yang, Shaodan Ma
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2401.11205
Pdf link: https://arxiv.org/pdf/2401.11205
Abstract Considering the appealing distribution gains of distributed antenna systems (DAS) and passive gains of reconfigurable intelligent surface (RIS), a flexible reconfigurable architecture called reconfigurable distributed antenna and reflecting surface (RDARS) is proposed. RDARS encompasses DAS and RIS as two special cases and maintains the advantages of distributed antennas while reducing the hardware cost by replacing some active antennas with low-cost passive reflecting surfaces. In this paper, we present a RDARS-aided uplink multi-user communication system and investigate the system transmission reliability with the newly proposed architecture. Specifically, in addition to the distribution gain and the reflection gain provided by the connection and reflection modes, respectively, we also consider the dynamic mode switching of each element which introduces an additional degree of freedom (DoF) and thus results in a selection gain. As such, we aim to minimize the total sum mean-square-error (MSE) of all data streams by jointly optimizing the receive beamforming matrix, the reflection phase shifts and the channel-aware placement of elements in the connection mode. To tackle this nonconvex problem with intractable binary and cardinality constraints, we propose an inexact block coordinate descent (BCD) based penalty dual decomposition (PDD) algorithm with the guaranteed convergence. Since the PDD algorithm usually suffers from high computational complexity, a low-complexity greedy-search-based alternating optimization (AO) algorithm is developed to yield a semi-closed-form solution with acceptable performance. Numerical results demonstrate the superiority of the proposed architecture compared to the conventional fully passive RIS or DAS. Furthermore, some insights about the practical implementation of RDARS are provided.
Selecting Walk Schemes for Database Embedding
Authors: Authors: Yuval Lev Lubarsky, Jan Tönshoff, Martin Grohe, Benny Kimelfeld
Subjects: Machine Learning (cs.LG); Databases (cs.DB)
Arxiv link: https://arxiv.org/abs/2401.11215
Pdf link: https://arxiv.org/pdf/2401.11215
Abstract Machinery for data analysis often requires a numeric representation of the input. Towards that, a common practice is to embed components of structured data into a high-dimensional vector space. We study the embedding of the tuples of a relational database, where existing techniques are often based on optimization tasks over a collection of random walks from the database. The focus of this paper is on the recent FoRWaRD algorithm that is designed for dynamic databases, where walks are sampled by following foreign keys between tuples. Importantly, different walks have different schemas, or "walk schemes", that are derived by listing the relations and attributes along the walk. Also importantly, different walk schemes describe relationships of different natures in the database. We show that by focusing on a few informative walk schemes, we can obtain tuple embedding significantly faster, while retaining the quality. We define the problem of scheme selection for tuple embedding, devise several approaches and strategies for scheme selection, and conduct a thorough empirical study of the performance over a collection of downstream tasks. Our results confirm that with effective strategies for scheme selection, we can obtain high-quality embeddings considerably (e.g., three times) faster, preserve the extensibility to newly inserted tuples, and even achieve an increase in the precision of some tasks.
On the Information Leakage Performance of Secure Finite Blocklength Transmissions over Rayleigh Fading Channels
Authors: Authors: Milad Tatar Mamaghani, Xiangyun Zhou, Nan Yang, A. Lee Swindlehurst, H. Vincent Poor
Subjects: Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2401.11219
Pdf link: https://arxiv.org/pdf/2401.11219
Abstract This paper presents a secrecy performance study of a wiretap communication system with finite blocklength (FBL) transmissions over Rayleigh fading channels, based on the definition of an average information leakage (AIL) metric. We evaluate the exact and closed-form approximate AIL performance, assuming that only statistical channel state information (CSI) of the eavesdropping link is available. Then, we reveal an inherent statistical relationship between the AIL metric in the FBL regime and the commonly-used secrecy outage probability in conventional infinite blocklength communications. Aiming to improve the secure communication performance of the considered system, we formulate a blocklength optimization problem and solve it via a low-complexity approach. Next, we present numerical results to verify our analytical findings and provide various important insights into the impacts of system parameters on the AIL. Specifically, our results indicate that i) compromising a small amount of AIL can lead to significant reliability improvements, and ii) the AIL experiences a secrecy floor in the high signal-to-noise ratio regime.
AFS-BM: Enhancing Model Performance through Adaptive Feature Selection with Binary Masking
Authors: Authors: Mehmet Y. Turali, Mehmet E. Lorasdagi, Ali T. Koc, Suleyman S. Kozat
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2401.11250
Pdf link: https://arxiv.org/pdf/2401.11250
Abstract We study the problem of feature selection in general machine learning (ML) context, which is one of the most critical subjects in the field. Although, there exist many feature selection methods, however, these methods face challenges such as scalability, managing high-dimensional data, dealing with correlated features, adapting to variable feature importance, and integrating domain knowledge. To this end, we introduce the ``Adaptive Feature Selection with Binary Masking" (AFS-BM) which remedies these problems. AFS-BM achieves this by joint optimization for simultaneous feature selection and model training. In particular, we do the joint optimization and binary masking to continuously adapt the set of features and model parameters during the training process. This approach leads to significant improvements in model accuracy and a reduction in computational requirements. We provide an extensive set of experiments where we compare AFS-BM with the established feature selection methods using well-known datasets from real-life competitions. Our results show that AFS-BM makes significant improvement in terms of accuracy and requires significantly less computational complexity. This is due to AFS-BM's ability to dynamically adjust to the changing importance of features during the training process, which an important contribution to the field. We openly share our code for the replicability of our results and to facilitate further research.
Long-Term Fair Decision Making through Deep Generative Models
Authors: Authors: Yaowei Hu, Yongkai Wu, Lu Zhang
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2401.11288
Pdf link: https://arxiv.org/pdf/2401.11288
Abstract This paper studies long-term fair machine learning which aims to mitigate group disparity over the long term in sequential decision-making systems. To define long-term fairness, we leverage the temporal causal graph and use the 1-Wasserstein distance between the interventional distributions of different demographic groups at a sufficiently large time step as the quantitative metric. Then, we propose a three-phase learning framework where the decision model is trained on high-fidelity data generated by a deep generative model. We formulate the optimization problem as a performative risk minimization and adopt the repeated gradient descent algorithm for learning. The empirical evaluation shows the efficacy of the proposed method using both synthetic and semi-synthetic datasets.
A Hierarchical Decision-Based Maintenance for a Complex Modular System Driven by the { MoMA} Algorithm
Authors: Authors: M.L. Gamiz, D. Montoro-Cazorla, M.C. Segovia-Garcia
Subjects: Systems and Control (eess.SY); Methodology (stat.ME)
Arxiv link: https://arxiv.org/abs/2401.11328
Pdf link: https://arxiv.org/pdf/2401.11328
Abstract This paper presents a maintenance policy for a modular system formed by K independent modules (n-subsystems) subjected to environmental conditions (shocks). For the modeling of this complex system, the use of the Matrix-Analytical Method (MAM) is proposed under a layered approach according to its hierarchical structure. Thus, the operational state of the system (top layer) depends on the states of the modules (middle layer), which in turn depend on the states of their components (bottom layer). This allows a detailed description of the system operation to plan maintenance actions appropriately and optimally. We propose a hierarchical decision-based maintenance strategy with periodic inspections as follows: at the time of the inspection, the condition of the system is first evaluated. If intervention is necessary, the modules are then checked to make individual decisions based on their states, and so on. Replacement or repair will be carried out as appropriate. An optimization problem is formulated as a function of the length of the inspection period and the intervention cost incurred over the useful life of the system. Our method shows the advantages, providing compact and implementable expressions. The model is illustrated on a submarine Electrical Control Unit (ECU).
Interactive AI with Retrieval-Augmented Generation for Next Generation Networking
Authors: Authors: Ruichen Zhang, Hongyang Du, Yinqiu Liu, Dusit Niyato, Jiawen Kang, Sumei Sun, Xuemin Shen, H. Vincent Poor
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2401.11391
Pdf link: https://arxiv.org/pdf/2401.11391
Abstract With the advance of artificial intelligence (AI), the emergence of Google Gemini and OpenAI Q* marks the direction towards artificial general intelligence (AGI). To implement AGI, the concept of interactive AI (IAI) has been introduced, which can interactively understand and respond not only to human user input but also to dynamic system and network conditions. In this article, we explore an integration and enhancement of IAI in networking. We first comprehensively review recent developments and future perspectives of AI and then introduce the technology and components of IAI. We then explore the integration of IAI into the next-generation networks, focusing on how implicit and explicit interactions can enhance network functionality, improve user experience, and promote efficient network management. Subsequently, we propose an IAI-enabled network management and optimization framework, which consists of environment, perception, action, and brain units. We also design the pluggable large language model (LLM) module and retrieval augmented generation (RAG) module to build the knowledge base and contextual memory for decision-making in the brain unit. We demonstrate the effectiveness of the framework through case studies. Finally, we discuss potential research directions for IAI-based networks.
Causal Generative Explainers using Counterfactual Inference: A Case Study on the Morpho-MNIST Dataset
Authors: Authors: Will Taylor-Melanson, Zahra Sadeghi, Stan Matwin
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.11394
Pdf link: https://arxiv.org/pdf/2401.11394
Abstract In this paper, we propose leveraging causal generative learning as an interpretable tool for explaining image classifiers. Specifically, we present a generative counterfactual inference approach to study the influence of visual features (i.e., pixels) as well as causal factors through generative learning. To this end, we first uncover the most influential pixels on a classifier's decision by varying the value of a causal attribute via counterfactual inference and computing both Shapely and contrastive explanations for counterfactual images with these different attribute values. We then establish a Monte-Carlo mechanism using the generator of a causal generative model in order to adapt Shapley explainers to produce feature importances for the human-interpretable attributes of a causal dataset in the case where a classifier has been trained exclusively on the images of the dataset. Finally, we present optimization methods for creating counterfactual explanations of classifiers by means of counterfactual inference, proposing straightforward approaches for both differentiable and arbitrary classifiers. We exploit the Morpho-MNIST causal dataset as a case study for exploring our proposed methods for generating counterfacutl explantions. We employ visual explanation methods from OmnixAI open source toolkit to compare them with our proposed methods. By employing quantitative metrics to measure the interpretability of counterfactual explanations, we find that our proposed methods of counterfactual explanation offer more interpretable explanations compared to those generated from OmnixAI. This finding suggests that our methods are well-suited for generating highly interpretable counterfactual explanations on causal datasets.
MolTailor: Tailoring Chemical Molecular Representation to Specific Tasks via Text Prompts
Authors: Authors: Haoqiang Guo, Sendong Zhao, Haochun Wang, Yanrui Du, Bing Qin
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Biomolecules (q-bio.BM)
Arxiv link: https://arxiv.org/abs/2401.11403
Pdf link: https://arxiv.org/pdf/2401.11403
Abstract Deep learning is now widely used in drug discovery, providing significant acceleration and cost reduction. As the most fundamental building block, molecular representation is essential for predicting molecular properties to enable various downstream applications. Most existing methods attempt to incorporate more information to learn better representations. However, not all features are equally important for a specific task. Ignoring this would potentially compromise the training efficiency and predictive accuracy. To address this issue, we propose a novel approach, which treats language models as an agent and molecular pretraining models as a knowledge base. The agent accentuates task-relevant features in the molecular representation by understanding the natural language description of the task, just as a tailor customizes clothes for clients. Thus, we call this approach MolTailor. Evaluations demonstrate MolTailor's superior performance over baselines, validating the efficacy of enhancing relevance for molecular representation learning. This illustrates the potential of language model guided optimization to better exploit and unleash the capabilities of existing powerful molecular representation methods. Our codes and appendix are available at https://github.com/SCIR-HI/MolTailor.
Robust Beamforming for Downlink Multi-Cell Systems: A Bilevel Optimization Perspective
Authors: Authors: Xingdi Chen, Yu Xiong, Kai Yang
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2401.11409
Pdf link: https://arxiv.org/pdf/2401.11409
Abstract Utilization of inter-base station cooperation for information processing has shown great potential in enhancing the overall quality of communication services (QoS) in wireless communication networks. Nevertheless, such cooperations require the knowledge of channel state information (CSI) at base stations (BSs), which is assumed to be perfectly known. However, CSI errors are inevitable in practice which necessitates beamforming techniques that can achieve robust performance in the presence of channel estimation errors. Existing approaches relax the robust beamforming design problems into semidefinite programming (SDP), which can only achieve a solution that is far from being optimal. To this end, this paper views robust beamforming design problems from a bilevel optimization perspective. In particular, we focus on maximizing the worst-case weighted sum-rate (WSR) in the downlink multi-cell multi-user multiple-input single-output (MISO) system considering bounded CSI errors. We first reformulate this problem into a bilevel optimization problem and then develop an efficient algorithm based on the cutting plane method. A distributed optimization algorithm has also been developed to facilitate the parallel processing in practical settings. Numerical results are provided to confirm the effectiveness of the proposed algorithm in terms of performance and complexity, particularly in the presence of CSI uncertainties.
Joint Downlink and Uplink Optimization for RIS-Aided FDD MIMO Communication Systems
Authors: Authors: Gyoseung Lee, Hyeongtaek Lee, Donghwan Kim, Jaehoon Chung, A. Lee. Swindlehurst, Junil Choi
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2401.11429
Pdf link: https://arxiv.org/pdf/2401.11429
Abstract This paper investigates reconfigurable intelligent surface (RIS)-aided frequency division duplexing (FDD) communication systems. Since the downlink and uplink signals are simultaneously transmitted in FDD, the phase shifts at the RIS should be designed to support both transmissions. Considering a single-user multiple-input multiple-output system, we formulate a weighted sum-rate maximization problem to jointly maximize the downlink and uplink system performance. To tackle the non-convex optimization problem, we adopt an alternating optimization (AO) algorithm, in which two phase shift optimization techniques are developed to handle the unit-modulus constraints induced by the reflection coefficients at the RIS. The first technique exploits the manifold optimization-based algorithm, while the second uses a lower-complexity AO approach. Numerical results verify that the proposed techniques rapidly converge to local optima and significantly improve the overall system performance compared to existing benchmark schemes.
Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback
Authors: Authors: Songyang Gao, Qiming Ge, Wei Shen, Shihan Dou, Junjie Ye, Xiao Wang, Rui Zheng, Yicheng Zou, Zhi Chen, Hang Yan, Qi Zhang, Dahua Lin
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2401.11458
Pdf link: https://arxiv.org/pdf/2401.11458
Abstract The success of AI assistants based on Language Models (LLMs) hinges on Reinforcement Learning from Human Feedback (RLHF) to comprehend and align with user intentions. However, traditional alignment algorithms, such as PPO, are hampered by complex annotation and training requirements. This reliance limits the applicability of RLHF and hinders the development of professional assistants tailored to diverse human preferences. In this work, we introduce \textit{Linear Alignment}, a novel algorithm that aligns language models with human preferences in one single inference step, eliminating the reliance on data annotation and model training. Linear alignment incorporates a new parameterization for policy optimization under divergence constraints, which enables the extraction of optimal policy in a closed-form manner and facilitates the direct estimation of the aligned response. Extensive experiments on both general and personalized preference datasets demonstrate that linear alignment significantly enhances the performance and efficiency of LLM alignment across diverse scenarios. Our code and dataset will be published on \url{https://github.com/Wizardcoast/Linear_Alignment.git}.
LR-CNN: Lightweight Row-centric Convolutional Neural Network Training for Memory Reduction
Authors: Authors: Zhigang Wang, Hangyu Yang, Ning Wang, Chuanfei Xu, Jie Nie, Zhiqiang Wei, Yu Gu, Ge Yu
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.11471
Pdf link: https://arxiv.org/pdf/2401.11471
Abstract In the last decade, Convolutional Neural Network with a multi-layer architecture has advanced rapidly. However, training its complex network is very space-consuming, since a lot of intermediate data are preserved across layers, especially when processing high-dimension inputs with a big batch size. That poses great challenges to the limited memory capacity of current accelerators (e.g., GPUs). Existing efforts mitigate such bottleneck by external auxiliary solutions with additional hardware costs, and internal modifications with potential accuracy penalty. Differently, our analysis reveals that computations intra- and inter-layers exhibit the spatial-temporal weak dependency and even complete independency features. That inspires us to break the traditional layer-by-layer (column) dataflow rule. Now operations are novelly re-organized into rows throughout all convolution layers. This lightweight design allows a majority of intermediate data to be removed without any loss of accuracy. We particularly study the weak dependency between two consecutive rows. For the resulting skewed memory consumption, we give two solutions with different favorite scenarios. Evaluations on two representative networks confirm the effectiveness. We also validate that our middle dataflow optimization can be smoothly embraced by existing works for better memory reduction.
ColorVideoVDP: A visual difference predictor for image, video and display distortions
Authors: Authors: Rafal K. Mantiuk, Param Hanji, Maliha Ashraf, Yuta Asano, Alexandre Chapiro
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2401.11485
Pdf link: https://arxiv.org/pdf/2401.11485
Abstract ColorVideoVDP is a video and image quality metric that models spatial and temporal aspects of vision, for both luminance and color. The metric is built on novel psychophysical models of chromatic spatiotemporal contrast sensitivity and cross-channel contrast masking. It accounts for the viewing conditions, geometric, and photometric characteristics of the display. It was trained to predict common video streaming distortions (e.g. video compression, rescaling, and transmission errors), and also 8 new distortion types related to AR/VR displays (e.g. light source and waveguide non-uniformities). To address the latter application, we collected our novel XR-Display-Artifact-Video quality dataset (XR-DAVID), comprised of 336 distorted videos. Extensive testing on XR-DAVID, as well as several datasets from the literature, indicate a significant gain in prediction performance compared to existing metrics. ColorVideoVDP opens the doors to many novel applications which require the joint automated spatiotemporal assessment of luminance and color distortions, including video streaming, display specification and design, visual comparison of results, and perceptually-guided quality optimization.
BA-LINS: A Frame-to-Frame Bundle Adjustment for LiDAR-Inertial Navigation
Authors: Authors: Hailiang Tang, Tisheng Zhang, Liqiang Wang, Man yuan, Xiaoji Niu
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2401.11491
Pdf link: https://arxiv.org/pdf/2401.11491
Abstract Bundle Adjustment (BA) has been proven to improve the accuracy of the LiDAR mapping. However, the BA method has not been properly employed in a dead-reckoning navigation system. In this paper, we present a frame-to-frame (F2F) BA for LiDAR-inertial navigation, named BA-LINS. Based on the direct F2F point-cloud association, the same-plane points are associated among the LiDAR keyframes. Hence, the plane-point BA measurement can be constructed using the same-plane points. The LiDAR BA measurements and the inertial measurement unit (IMU)-preintegration measurements are tightly integrated under the framework of factor graph optimization. An effective adaptive covariance estimation algorithm for LiDAR BA measurements is proposed to further improve the accuracy of BA-LINS. We conduct exhaustive real-world experiments on public and private datasets to examine the proposed BA-LINS. The results demonstrate that BA-LINS yields superior accuracy to state-of-the-art methods. Compared to the baseline system FF-LINS, the absolute translation accuracy and state-estimation efficiency of BA-LINS are improved by 29.5% and 28.7%, respectively, on the private dataset. Besides, the ablation experiment results exhibit that the proposed adaptive covariance estimation algorithm can notably improve the accuracy and robustness of BA-LINS.
Information-Theoretic State Variable Selection for Reinforcement Learning
Authors: Authors: Charles Westphal, Stephen Hailes, Mirco Musolesi
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2401.11512
Pdf link: https://arxiv.org/pdf/2401.11512
Abstract Identifying the most suitable variables to represent the state is a fundamental challenge in Reinforcement Learning (RL). These variables must efficiently capture the information necessary for making optimal decisions. In order to address this problem, in this paper, we introduce the Transfer Entropy Redundancy Criterion (TERC), an information-theoretic criterion, which determines if there is \textit{entropy transferred} from state variables to actions during training. We define an algorithm based on TERC that provably excludes variables from the state that have no effect on the final performance of the agent, resulting in more sample efficient learning. Experimental results show that this speed-up is present across three different algorithm classes (represented by tabular Q-learning, Actor-Critic, and Proximal Policy Optimization (PPO)) in a variety of environments. Furthermore, to highlight the differences between the proposed methodology and the current state-of-the-art feature selection approaches, we present a series of controlled experiments on synthetic data, before generalizing to real-world decision-making tasks. We also introduce a representation of the problem that compactly captures the transfer of information from state variables to actions as Bayesian networks.
Tempo: Confidentiality Preservation in Cloud-Based Neural Network Training
Authors: Authors: Rongwu Xu, Zhixuan Fang
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.11531
Pdf link: https://arxiv.org/pdf/2401.11531
Abstract Cloud deep learning platforms provide cost-effective deep neural network (DNN) training for customers who lack computation resources. However, cloud systems are often untrustworthy and vulnerable to attackers, leading to growing concerns about model privacy. Recently, researchers have sought to protect data privacy in deep learning by leveraging CPU trusted execution environments (TEEs), which minimize the use of cryptography, but existing works failed to simultaneously utilize the computational resources of GPUs to assist in training and prevent model leakage. This paper presents Tempo, the first cloud-based deep learning system that cooperates with TEE and distributed GPUs for efficient DNN training with model confidentiality preserved. To tackle the challenge of preserving privacy while offloading linear algebraic operations from TEE to GPUs for efficient batch computation, we introduce a customized permutation-based obfuscation algorithm to blind both inputs and model parameters. An optimization mechanism that reduces encryption operations is proposed for faster weight updates during backpropagation to speed up training. We implement Tempo and evaluate it with both training and inference for two prevalent DNNs. Empirical results indicate that Tempo outperforms baselines and offers sufficient privacy protection.
Deformable Endoscopic Tissues Reconstruction with Gaussian Splatting
Authors: Authors: Lingting Zhu, Zhao Wang, Zhenchao Jin, Guying Lin, Lequan Yu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2401.11535
Pdf link: https://arxiv.org/pdf/2401.11535
Abstract Surgical 3D reconstruction is a critical area of research in robotic surgery, with recent works adopting variants of dynamic radiance fields to achieve success in 3D reconstruction of deformable tissues from single-viewpoint videos. However, these methods often suffer from time-consuming optimization or inferior quality, limiting their adoption in downstream tasks. Inspired by 3D Gaussian Splatting, a recent trending 3D representation, we present EndoGS, applying Gaussian Splatting for deformable endoscopic tissue reconstruction. Specifically, our approach incorporates deformation fields to handle dynamic scenes, depth-guided supervision to optimize 3D targets with a single viewpoint, and a spatial-temporal weight mask to mitigate tool occlusion. As a result, EndoGS reconstructs and renders high-quality deformable endoscopic tissues from a single-viewpoint video, estimated depth maps, and labeled tool masks. Experiments on DaVinci robotic surgery videos demonstrate that EndoGS achieves superior rendering quality. Code is available at https://github.com/HKU-MedAI/EndoGS.
Real-Time Systems Optimization with Black-box Constraints and Hybrid Variables
Authors: Authors: Sen Wang, Dong Li, Shao-Yu Huang, Xuanliang Deng, Ashrarul H. Sifat, Changhee Jung, Ryan Williams, Haibo Zeng
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2401.11620
Pdf link: https://arxiv.org/pdf/2401.11620
Abstract When optimizing real-time systems, designers often face a challenging problem where the schedulability constraints are non-convex, non-continuous, or lack an analytical form to understand their properties. Although the optimization framework NORTH proposed in previous work is general (it works with arbitrary schedulability analysis) and scalable, it can only handle problems with continuous variables, which limits its application. In this paper, we extend the applications of the framework NORTH to problems with a hybrid of continuous and discrete variables. This is achieved in a coordinate-descent method, where the continuous and discrete variables are optimized separately during iterations. The new framework, NORTH+, improves around 20% solution quality than NORTH in experiments.
The Markov-Chain Polytope with Applications
Authors: Authors: Mordecai J. Golin, Albert John Lalim Patupat
Subjects: Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2401.11622
Pdf link: https://arxiv.org/pdf/2401.11622
Abstract This paper addresses the problem of finding a minimum-cost $m$-state Markov chain $(S0,\ldots,S{m-1})$ in a large set of chains. The chains studied have a reward associated with each state. The cost of a chain is its "gain", i.e., its average reward under its stationary distribution. Specifically, for each $k=0,\ldots,m-1$ there is a known set ${\mathbb S}_k$ of type-$k$ states. A permissible Markov chain contains exactly one state of each type; the problem is to find a minimum-cost permissible chain. The original motivation was to find a cheapest binary AIFV-$m$ lossless code on a source alphabet of size $n$. Such a code is an $m$-tuple of trees, in which each tree can be viewed as a Markov Chain state. This formulation was then used to address other problems in lossless compression. The known solution techniques for finding minimum-cost Markov chains were iterative and ran in exponential time. This paper shows how to map every possible type-$k$ state into a type-$k$ hyperplane and then define a "Markov Chain Polytope" as the lower envelope of all such hyperplanes. Finding a minimum-cost Markov chain can then be shown to be equivalent to finding a "highest" point on this polytope. The local optimization procedures used in the previous iterative algorithms are shown to be separation oracles for this polytope. Since these were often polynomial time, an application of the Ellipsoid method immediately leads to polynomial time algorithms for these problems.
MR.CAP: Multi-Robot Joint Control and Planning for Object Transport
Authors: Authors: Hussein Ali Jaafar, Cheng-Hao Kao, Sajad Saeedi
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2401.11634
Pdf link: https://arxiv.org/pdf/2401.11634
Abstract With the recent influx in demand for multi-robot systems throughout industry and academia, there is an increasing need for faster, robust, and generalizable path planning algorithms. Similarly, given the inherent connection between control algorithms and multi-robot path planners, there is in turn an increased demand for fast, efficient, and robust controllers. We propose a scalable joint path planning and control algorithm for multi-robot systems with constrained behaviours based on factor graph optimization. We demonstrate our algorithm on a series of hardware and simulated experiments. Our algorithm is consistently able to recover from disturbances and avoid obstacles while outperforming state-of-the-art methods in optimization time, path deviation, and inter-robot errors. See the code and supplementary video for experiments.
An Improved Grey Wolf Optimization Algorithm for Heart Disease Prediction
Authors: Authors: Sihan Niu, Yifan Zhou, Zhikai Li, Shuyao Huang, Yujun Zhou
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2401.11669
Pdf link: https://arxiv.org/pdf/2401.11669
Abstract This paper presents a unique solution to challenges in medical image processing by incorporating an adaptive curve grey wolf optimization (ACGWO) algorithm into neural network backpropagation. Neural networks show potential in medical data but suffer from issues like overfitting and lack of interpretability due to imbalanced and scarce data. Traditional Gray Wolf Optimization (GWO) also has its drawbacks, such as a lack of population diversity and premature convergence. This paper addresses these problems by introducing an adaptive algorithm, enhancing the standard GWO with a sigmoid function. This algorithm was extensively compared to four leading algorithms using six well-known test functions, outperforming them effectively. Moreover, by utilizing the ACGWO, we increase the robustness and generalization of the neural network, resulting in more interpretable predictions. Applied to the publicly accessible Cleveland Heart Disease dataset, our technique surpasses ten other methods, achieving 86.8% accuracy, indicating its potential for efficient heart disease prediction in the clinical setting.
Graph Condensation: A Survey
Authors: Authors: Xinyi Gao, Junliang Yu, Wei Jiang, Tong Chen, Wentao Zhang, Hongzhi Yin
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.11720
Pdf link: https://arxiv.org/pdf/2401.11720
Abstract The burgeoning volume of graph data poses significant challenges in storage, transmission, and particularly the training of graph neural networks (GNNs). To address these challenges, graph condensation (GC) has emerged as an innovative solution. GC focuses on synthesizing a compact yet highly representative graph, on which GNNs can achieve performance comparable to trained on the large original graph. The notable efficacy of GC and its broad prospects have garnered significant attention and spurred extensive research. This survey paper provides an up-to-date and systematic overview of GC, organizing existing research into four categories aligned with critical GC evaluation criteria: effectiveness, generalization, fairness, and efficiency. To facilitate an in-depth and comprehensive understanding of GC, we examine various methods under each category and thoroughly discuss two essential components within GC: optimization strategies and condensed graph generation. Additionally, we introduce the applications of GC in a variety of fields, and highlight the present challenges and novel insights in GC, promoting advancements in future research.
Fast and Scalable Network Slicing by Integrating Deep Learning with Lagrangian Methods
Authors: Authors: Tianlun Hu, Qi Liao, Qiang Liu, Antonio Massaro, Georg Carle
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.11731
Pdf link: https://arxiv.org/pdf/2401.11731
Abstract Network slicing is a key technique in 5G and beyond for efficiently supporting diverse services. Many network slicing solutions rely on deep learning to manage complex and high-dimensional resource allocation problems. However, deep learning models suffer limited generalization and adaptability to dynamic slicing configurations. In this paper, we propose a novel framework that integrates constrained optimization methods and deep learning models, resulting in strong generalization and superior approximation capability. Based on the proposed framework, we design a new neural-assisted algorithm to allocate radio resources to slices to maximize the network utility under inter-slice resource constraints. The algorithm exhibits high scalability, accommodating varying numbers of slices and slice configurations with ease. We implement the proposed solution in a system-level network simulator and evaluate its performance extensively by comparing it to state-of-the-art solutions including deep reinforcement learning approaches. The numerical results show that our solution obtains near-optimal quality-of-service satisfaction and promising generalization performance under different network slicing scenarios.
MetaSeg: Content-Aware Meta-Net for Omni-Supervised Semantic Segmentation
Authors: Authors: Shenwang Jiang, Jianan Li, Ying Wang, Wenxuan Wu, Jizhou Zhang, Bo Huang, Tingfa Xu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.11738
Pdf link: https://arxiv.org/pdf/2401.11738
Abstract Noisy labels, inevitably existing in pseudo segmentation labels generated from weak object-level annotations, severely hampers model optimization for semantic segmentation. Previous works often rely on massive hand-crafted losses and carefully-tuned hyper-parameters to resist noise, suffering poor generalization capability and high model complexity. Inspired by recent advances in meta learning, we argue that rather than struggling to tolerate noise hidden behind clean labels passively, a more feasible solution would be to find out the noisy regions actively, so as to simply ignore them during model optimization. With this in mind, this work presents a novel meta learning based semantic segmentation method, MetaSeg, that comprises a primary content-aware meta-net (CAM-Net) to sever as a noise indicator for an arbitrary segmentation model counterpart. Specifically, CAM-Net learns to generate pixel-wise weights to suppress noisy regions with incorrect pseudo labels while highlighting clean ones by exploiting hybrid strengthened features from image content, providing straightforward and reliable guidance for optimizing the segmentation model. Moreover, to break the barrier of time-consuming training when applying meta learning to common large segmentation models, we further present a new decoupled training strategy that optimizes different model layers in a divide-and-conquer manner. Extensive experiments on object, medical, remote sensing and human segmentation shows that our method achieves superior performance, approaching that of fully supervised settings, which paves a new promising way for omni-supervised semantic segmentation.
FedGTA: Topology-aware Averaging for Federated Graph Learning
Authors: Authors: Xunkai Li, Zhengyu Wu, Wentao Zhang, Yinlin Zhu, Rong-Hua Li, Guoren Wang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Databases (cs.DB); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2401.11755
Pdf link: https://arxiv.org/pdf/2401.11755
Abstract Federated Graph Learning (FGL) is a distributed machine learning paradigm that enables collaborative training on large-scale subgraphs across multiple local systems. Existing FGL studies fall into two categories: (i) FGL Optimization, which improves multi-client training in existing machine learning models; (ii) FGL Model, which enhances performance with complex local models and multi-client interactions. However, most FGL optimization strategies are designed specifically for the computer vision domain and ignore graph structure, presenting dissatisfied performance and slow convergence. Meanwhile, complex local model architectures in FGL Models studies lack scalability for handling large-scale subgraphs and have deployment limitations. To address these issues, we propose Federated Graph Topology-aware Aggregation (FedGTA), a personalized optimization strategy that optimizes through topology-aware local smoothing confidence and mixed neighbor features. During experiments, we deploy FedGTA in 12 multi-scale real-world datasets with the Louvain and Metis split. This allows us to evaluate the performance and robustness of FedGTA across a range of scenarios. Extensive experiments demonstrate that FedGTA achieves state-of-the-art performance while exhibiting high scalability and efficiency. The experiment includes ogbn-papers100M, the most representative large-scale graph database so that we can verify the applicability of our method to large-scale graph learning. To the best of our knowledge, our study is the first to bridge large-scale graph learning with FGL using this optimization strategy, contributing to the development of efficient and scalable FGL methods.
Towards Effective and General Graph Unlearning via Mutual Evolution
Authors: Authors: Xunkai Li, Yulin Zhao, Zhengyu Wu, Wentao Zhang, Rong-Hua Li, Guoren Wang
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2401.11760
Pdf link: https://arxiv.org/pdf/2401.11760
Abstract With the rapid advancement of AI applications, the growing needs for data privacy and model robustness have highlighted the importance of machine unlearning, especially in thriving graph-based scenarios. However, most existing graph unlearning strategies primarily rely on well-designed architectures or manual process, rendering them less user-friendly and posing challenges in terms of deployment efficiency. Furthermore, striking a balance between unlearning performance and framework generalization is also a pivotal concern. To address the above issues, we propose \underline{\textbf{M}}utual \underline{\textbf{E}}volution \underline{\textbf{G}}raph \underline{\textbf{U}}nlearning (MEGU), a new mutual evolution paradigm that simultaneously evolves the predictive and unlearning capacities of graph unlearning. By incorporating aforementioned two components, MEGU ensures complementary optimization in a unified training framework that aligns with the prediction and unlearning requirements. Extensive experiments on 9 graph benchmark datasets demonstrate the superior performance of MEGU in addressing unlearning requirements at the feature, node, and edge levels. Specifically, MEGU achieves average performance improvements of 2.7\%, 2.5\%, and 3.2\% across these three levels of unlearning tasks when compared to state-of-the-art baselines. Furthermore, MEGU exhibits satisfactory training efficiency, reducing time and space overhead by an average of 159.8x and 9.6x, respectively, in comparison to retraining GNN from scratch.
LightDiC: A Simple yet Effective Approach for Large-scale Digraph Representation Learning
Authors: Authors: Xunkai Li, Meihao Liao, Zhengyu Wu, Daohan Su, Wentao Zhang, Rong-Hua Li, Guoren Wang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2401.11772
Pdf link: https://arxiv.org/pdf/2401.11772
Abstract Most existing graph neural networks (GNNs) are limited to undirected graphs, whose restricted scope of the captured relational information hinders their expressive capabilities and deployments in real-world scenarios. Compared with undirected graphs, directed graphs (digraphs) fit the demand for modeling more complex topological systems by capturing more intricate relationships between nodes, such as formulating transportation and financial networks. While some directed GNNs have been introduced, their inspiration mainly comes from deep learning architectures, which lead to redundant complexity and computation, making them inapplicable to large-scale databases. To address these issues, we propose LightDiC, a scalable variant of the digraph convolution based on the magnetic Laplacian. Since topology-related computations are conducted solely during offline pre-processing, LightDiC achieves exceptional scalability, enabling downstream predictions to be trained separately without incurring recursive computational costs. Theoretical analysis shows that LightDiC utilizes directed information to achieve message passing based on the complex field, which corresponds to the proximal gradient descent process of the Dirichlet energy optimization function from the perspective of digraph signal denoising, ensuring its expressiveness. Experimental results demonstrate that LightDiC performs comparably well or even outperforms other SOTA methods in various downstream tasks, with fewer learnable parameters and higher training efficiency. Notably, LightDiC is the first DiGNN to provide satisfactory results in the most representative large-scale database (ogbn-papers100M).
Optimization in Sanger Sequencing
Authors: Authors: Luisa Carpente, Ana Cerdeira-Pena, Silvia Lorenzo-Freire, Ángeles S. Places
Subjects: Data Structures and Algorithms (cs.DS); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2401.11854
Pdf link: https://arxiv.org/pdf/2401.11854
Abstract The main objective of this paper is to solve the optimization problem that is associated with the classification of DNA samples in PCR plates for Sanger sequencing. To achieve this goal, we design an integer linear programming model. Given that the real instances involve the classification of thousands of samples and the linear model can only be solved for small instances, the paper includes a heuristic to cope with bigger problems. The heuristic algorithm is based on the simulated annealing technique. This algorithm obtains satisfactory solutions to the problem in a short amount of time. It has been tested with real data and yields improved results compared to some commercial software typically used in (clinical) laboratories. Moreover, the algorithm has already been implemented in the laboratory and is being successfully used.
Automation of Triangle Ruler-and-Compass Constructions Using Constraint Solvers
Authors: Authors: Milan Banković (Faculty of Mathematics, University of Belgrade, Serbia)
Subjects: Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.11903
Pdf link: https://arxiv.org/pdf/2401.11903
Abstract In this paper, we present an approach to automated solving of triangle ruler-and-compass construction problems using finite-domain constraint solvers. The constraint model is described in the MiniZinc modeling language, and is based on the automated planning. The main benefit of using general constraint solvers for such purpose, instead of developing dedicated tools, is that we can rely on the efficient search that is already implemented within the solver, enabling us to focus on geometric aspects of the problem. We may also use the solver's built-in optimization capabilities to search for the shortest possible constructions. We evaluate our approach on 74 solvable problems from the Wernick's list, and compare it to the dedicated triangle construction solver ArgoTriCS. The results show that our approach is comparable to dedicated tools, while it requires much less effort to implement. Also, our model often finds shorter constructions, thanks to the optimization capabilities offered by the constraint solvers.
Maximizing Spectral and Energy Efficiency in Multi-user MIMO OFDM Systems with RIS and Hardware Impairment
Authors: Authors: Mohammad Soleymani, Ignacio Santamaria, Aydin Sezgin, Eduard Jorswieck
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2401.11921
Pdf link: https://arxiv.org/pdf/2401.11921
Abstract An emerging technology to enhance the spectral efficiency (SE) and energy efficiency (EE) of wireless communication systems is reconfigurable intelligent surface (RIS), which is shown to be very powerful in single-carrier systems. However, in multi-user orthogonal frequency division multiplexing (OFDM) systems, RIS may not be as promising as in single-carrier systems since an independent optimization of RIS elements at each sub-carrier is impossible in multi-carrier systems. Thus, this paper investigates the performance of various RIS technologies like regular (reflective and passive), simultaneously transmit and reflect (STAR), and multi-sector beyond diagonal (BD) RIS in multi-user multiple-input multiple-output (MIMO) OFDM broadcast channels (BC). This requires to formulate and solve a joint MIMO precoding and RIS optimization problem. The obtained solution reveals that RIS can significantly improve the system performance even when the number of RIS elements is relatively low. Moreover, we develop resource allocation schemes for STAR-RIS and multi-sector BD-RIS in MIMO OFDM BCs, and show that these RIS technologies can outperform a regular RIS, especially when the regular RIS cannot assist the communications for all the users.
Bridging Evolutionary Algorithms and Reinforcement Learning: A Comprehensive Survey
Authors: Authors: Pengyi Li, Jianye Hao, Hongyao Tang, Xian Fu, Yan Zheng, Ke Tang
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.11963
Pdf link: https://arxiv.org/pdf/2401.11963
Abstract Evolutionary Reinforcement Learning (ERL), which integrates Evolutionary Algorithms (EAs) and Reinforcement Learning (RL) for optimization, has demonstrated remarkable performance advancements. By fusing the strengths of both approaches, ERL has emerged as a promising research direction. This survey offers a comprehensive overview of the diverse research branches in ERL. Specifically, we systematically summarize recent advancements in relevant algorithms and identify three primary research directions: EA-assisted optimization of RL, RL-assisted optimization of EA, and synergistic optimization of EA and RL. Following that, we conduct an in-depth analysis of each research direction, organizing multiple research branches. We elucidate the problems that each branch aims to tackle and how the integration of EA and RL addresses these challenges. In conclusion, we discuss potential challenges and prospective future research directions across various research directions.
Integrating Statistical Significance and Discriminative Power in Pattern Discovery
Authors: Authors: Leonardo Alexandre, Rafael S. Costa, Rui Henriques
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2401.12000
Pdf link: https://arxiv.org/pdf/2401.12000
Abstract Pattern discovery plays a central role in both descriptive and predictive tasks across multiple domains. Actionable patterns must meet rigorous statistical significance criteria and, in the presence of target variables, further uphold discriminative power. Our work addresses the underexplored area of guiding pattern discovery by integrating statistical significance and discriminative power criteria into state-of-the-art algorithms while preserving pattern quality. We also address how pattern quality thresholds, imposed by some algorithms, can be rectified to accommodate these additional criteria. To test the proposed methodology, we select the triclustering task as the guiding pattern discovery case and extend well-known greedy and multi-objective optimization triclustering algorithms, $\delta$-Trimax and TriGen, that use various pattern quality criteria, such as Mean Squared Residual (MSR), Least Squared Lines (LSL), and Multi Slope Measure (MSL). Results from three case studies show the role of the proposed methodology in discovering patterns with pronounced improvements of discriminative power and statistical significance without quality deterioration, highlighting its importance in supervisedly guiding the search. Although the proposed methodology is motivated over multivariate time series data, it can be straightforwardly extended to pattern discovery tasks involving multivariate, N-way (N>3), transactional, and sequential data structures. Availability: The code is freely available at https://github.com/JupitersMight/MOF_Triclustering under the MIT license.
A Survey of Advances in Optimization Methods for Wireless Communication System Design
Authors: Authors: Ya-Feng Liu, Tsung-Hui Chang, Mingyi Hong, Zheyu Wu, Anthony Man-Cho So, Eduard A. Jorswieck, Wei Yu
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2401.12025
Pdf link: https://arxiv.org/pdf/2401.12025
Abstract Mathematical optimization is now widely regarded as an indispensable modeling and solution tool for the design of wireless communications systems. While optimization has played a significant role in the revolutionary progress in wireless communication and networking technologies from 1G to 5G and onto the future 6G, the innovations in wireless technologies have also substantially transformed the nature of the underlying mathematical optimization problems upon which the system designs are based and have sparked significant innovations in the development of methodologies to understand, to analyze, and to solve those problems. In this paper, we provide a comprehensive survey of recent advances in mathematical optimization theory and algorithms for wireless communication system design. We begin by illustrating common features of mathematical optimization problems arising in wireless communication system design. We discuss various scenarios and use cases and their associated mathematical structures from an optimization perspective. We then provide an overview of recent advances in mathematical optimization theory and algorithms, from nonconvex optimization, global optimization, and integer programming, to distributed optimization and learning-based optimization. The key to successful solution of mathematical optimization problems is in carefully choosing and/or developing suitable optimization algorithms (or neural network architectures) that can exploit the underlying problem structure. We conclude the paper by identifying several open research challenges and outlining future research directions.
Momentum-SAM: Sharpness Aware Minimization without Computational Overhead
Authors: Authors: Marlon Becker, Frederick Altrock, Benjamin Risse
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.12033
Pdf link: https://arxiv.org/pdf/2401.12033
Abstract The recently proposed optimization algorithm for deep neural networks Sharpness Aware Minimization (SAM) suggests perturbing parameters before gradient calculation by a gradient ascent step to guide the optimization into parameter space regions of flat loss. While significant generalization improvements and thus reduction of overfitting could be demonstrated, the computational costs are doubled due to the additionally needed gradient calculation, making SAM unfeasible in case of limited computationally capacities. Motivated by Nesterov Accelerated Gradient (NAG) we propose Momentum-SAM (MSAM), which perturbs parameters in the direction of the accumulated momentum vector to achieve low sharpness without significant computational overhead or memory demands over SGD or Adam. We evaluate MSAM in detail and reveal insights on separable mechanisms of NAG, SAM and MSAM regarding training optimization and generalization. Code is available at https://github.com/MarlonBecker/MSAM.
The Dimension Strikes Back with Gradients: Generalization of Gradient Methods in Stochastic Convex Optimization
Authors: Authors: Matan Schliserman, Uri Sherman, Tomer Koren
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2401.12058
Pdf link: https://arxiv.org/pdf/2401.12058
Abstract We study the generalization performance of gradient methods in the fundamental stochastic convex optimization setting, focusing on its dimension dependence. First, for full-batch gradient descent (GD) we give a construction of a learning problem in dimension $d=O(n^2)$, where the canonical version of GD (tuned for optimal performance of the empirical risk) trained with $n$ training examples converges, with constant probability, to an approximate empirical risk minimizer with $\Omega(1)$ population excess risk. Our bound translates to a lower bound of $\Omega (\sqrt{d})$ on the number of training examples required for standard GD to reach a non-trivial test error, answering an open question raised by Feldman (2016) and Amir, Koren, and Livni (2021b) and showing that a non-trivial dimension dependence is unavoidable. Furthermore, for standard one-pass stochastic gradient descent (SGD), we show that an application of the same construction technique provides a similar $\Omega(\sqrt{d})$ lower bound for the sample complexity of SGD to reach a non-trivial empirical error, despite achieving optimal test performance. This again provides an exponential improvement in the dimension dependence compared to previous work (Koren, Livni, Mansour, and Sherman, 2022), resolving an open question left therein.
Collaborative Reinforcement Learning Based Unmanned Aerial Vehicle (UAV) Trajectory Design for 3D UAV Tracking
Authors: Authors: Yujiao Zhu, Mingzhe Chen, Sihua Wang, Ye Hu, Yuchen Liu, Changchuan Yin
Subjects: Multiagent Systems (cs.MA); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.12079
Pdf link: https://arxiv.org/pdf/2401.12079
Abstract In this paper, the problem of using one active unmanned aerial vehicle (UAV) and four passive UAVs to localize a 3D target UAV in real time is investigated. In the considered model, each passive UAV receives reflection signals from the target UAV, which are initially transmitted by the active UAV. The received reflection signals allow each passive UAV to estimate the signal transmission distance which will be transmitted to a base station (BS) for the estimation of the position of the target UAV. Due to the movement of the target UAV, each active/passive UAV must optimize its trajectory to continuously localize the target UAV. Meanwhile, since the accuracy of the distance estimation depends on the signal-to-noise ratio of the transmission signals, the active UAV must optimize its transmit power. This problem is formulated as an optimization problem whose goal is to jointly optimize the transmit power of the active UAV and trajectories of both active and passive UAVs so as to maximize the target UAV positioning accuracy. To solve this problem, a Z function decomposition based reinforcement learning (ZD-RL) method is proposed. Compared to value function decomposition based RL (VD-RL), the proposed method can find the probability distribution of the sum of future rewards to accurately estimate the expected value of the sum of future rewards thus finding better transmit power of the active UAV and trajectories for both active and passive UAVs and improving target UAV positioning accuracy. Simulation results show that the proposed ZD-RL method can reduce the positioning errors by up to 39.4% and 64.6%, compared to VD-RL and independent deep RL methods, respectively.
LearnedWMP: Workload Memory Prediction Using Distribution of Query Templates
Authors: Authors: Shaikh Quader, Andres Jaramillo, Sumona Mukhopadhyay, Ghadeer Abuoda, Calisto Zuzarte, David Kalmuk, Marin Litoiu, Manos Papagelis
Subjects: Databases (cs.DB); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.12103
Pdf link: https://arxiv.org/pdf/2401.12103
Abstract In a modern DBMS, working memory is frequently the limiting factor when processing in-memory analytic query operations such as joins, sorting, and aggregation. Existing resource estimation approaches for a DBMS estimate the resource consumption of a query by computing an estimate of each individual database operator in the query execution plan. Such an approach is slow and error-prone as it relies upon simplifying assumptions, such as uniformity and independence of the underlying data. Additionally, the existing approach focuses on individual queries separately and does not factor in other queries in the workload that may be executed concurrently. In this research, we are interested in query performance optimization under concurrent execution of a batch of queries (a workload). Specifically, we focus on predicting the memory demand for a workload rather than providing separate estimates for each query within it. We introduce the problem of workload memory prediction and formalize it as a distribution regression problem. We propose Learned Workload Memory Prediction (LearnedWMP) to improve and simplify estimating the working memory demands of workloads. Through a comprehensive experimental evaluation, we show that LearnedWMP reduces the memory estimation error of the state-of-the-practice method by up to 47.6%. Compared to an alternative single-query model, during training and inferencing, the LearnedWMP model and its variants were 3x to 10x faster. Moreover, LearnedWMP-based models were at least 50% smaller in most cases. Overall, the results demonstrate the advantages of the LearnedWMP approach and its potential for a broader impact on query performance optimization.
Energy-aware Trajectory Optimization for UAV-mounted RIS and Full-duplex Relay
Authors: Authors: Dimitrios Tyrovolas, Nikos A. Mitsiou, Thomas G. Boufikos, Prodromos-Vasileios Mekikis, Sotiris A. Tegos, Panagiotis D. Diamantoulakis, Sotiris Ioannidis, Christos K. Liaskos, George K. Karagiannidis
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2401.12107
Pdf link: https://arxiv.org/pdf/2401.12107
Abstract In the evolving landscape of sixth-generation (6G) wireless networks, unmanned aerial vehicles (UAVs) have emerged as transformative tools for dynamic and adaptive connectivity. However, dynamically adjusting their position to offer favorable communication channels introduces operational challenges in terms of energy consumption, especially when integrating advanced communication technologies like reconfigurable intelligent surfaces (RISs) and full-duplex relays (FDRs). To this end, by recognizing the pivotal role of UAV mobility, the paper introduces an energy-aware trajectory design for UAV-mounted RISs and UAV-mounted FDRs using the decode and forward (DF) protocol, aiming to maximize the network minimum rate and enhance user fairness, while taking into consideration the available on-board energy. Specifically, this work highlights their distinct energy consumption characteristics and their associated integration challenges by developing appropriate energy consumption models for both UAV-mounted RISs and FDRs that capture the intricate relationship between key factors such as weight, and their operational characteristics. Furthermore, a joint time-division multiple access (TDMA) user scheduling-UAV trajectory optimization problem is formulated, considering the power dynamics of both systems, while assuring that the UAV energy is not depleted mid-air. Finally, simulation results underscore the importance of energy considerations in determining the optimal trajectory and scheduling and provide insights into the performance comparison of UAV-mounted RISs and FDRs in UAV-assisted wireless networks.
Gradient Preserving Operator Inference: Data-Driven Reduced-Order Models for Equations with Gradient Structure
Authors: Authors: Yuwei Geng, Jasdeep Singh, Lili Ju, Boris Kramer, Zhu Wang
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2401.12138
Pdf link: https://arxiv.org/pdf/2401.12138
Abstract Hamiltonian Operator Inference has been introduced in [Sharma, H., Wang, Z., Kramer, B., Physica D: Nonlinear Phenomena, 431, p.133122, 2022] to learn structure-preserving reduced-order models (ROMs) for Hamiltonian systems. This approach constructs a low-dimensional model using only data and knowledge of the Hamiltonian function. Such ROMs can keep the intrinsic structure of the system, allowing them to capture the physics described by the governing equations. In this work, we extend this approach to more general systems that are either conservative or dissipative in energy, and which possess a gradient structure. We derive the optimization problems for inferring structure-preserving ROMs that preserve the gradient structure. We further derive an {\em a priori} error estimate for the reduced-order approximation. To test the algorithms, we consider semi-discretized partial differential equations with gradient structure, such as the parameterized wave and Korteweg-de-Vries equations in the conservative case and the one- and two-dimensional Allen-Cahn equations in the dissipative case. The numerical results illustrate the accuracy, structure-preservation properties, and predictive capabilities of the gradient-preserving Operator Inference ROMs.
Uncoded Storage Coded Transmission Elastic Computing with Straggler Tolerance in Heterogeneous Systems
Authors: Authors: Xi Zhong, Joerg Kliewer, Mingyue Ji
Subjects: Information Theory (cs.IT); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2401.12151
Pdf link: https://arxiv.org/pdf/2401.12151
Abstract In 2018, Yang et al. introduced a novel and effective approach, using maximum distance separable (MDS) codes, to mitigate the impact of elasticity in cloud computing systems. This approach is referred to as coded elastic computing. Some limitations of this approach include that it assumes all virtual machines have the same computing speeds and storage capacities, and it cannot tolerate stragglers for matrix-matrix multiplications. In order to resolve these limitations, in this paper, we introduce a new combinatorial optimization framework, named uncoded storage coded transmission elastic computing (USCTEC), for heterogeneous speeds and storage constraints, aiming to minimize the expected computation time for matrix-matrix multiplications, under the consideration of straggler tolerance. Within this framework, we propose optimal solutions with straggler tolerance under relaxed storage constraints. Moreover, we propose a heuristic algorithm that considers the heterogeneous storage constraints. Our results demonstrate that the proposed algorithm outperforms baseline solutions utilizing cyclic storage placements, in terms of both expected computation time and storage size.
DITTO: Diffusion Inference-Time T-Optimization for Music Generation
Authors: Authors: Zachary Novack, Julian McAuley, Taylor Berg-Kirkpatrick, Nicholas J. Bryan
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2401.12179
Pdf link: https://arxiv.org/pdf/2401.12179
Abstract We propose Diffusion Inference-Time T-Optimization (DITTO), a general-purpose frame-work for controlling pre-trained text-to-music diffusion models at inference-time via optimizing initial noise latents. Our method can be used to optimize through any differentiable feature matching loss to achieve a target (stylized) output and leverages gradient checkpointing for memory efficiency. We demonstrate a surprisingly wide-range of applications for music generation including inpainting, outpainting, and looping as well as intensity, melody, and musical structure control - all without ever fine-tuning the underlying model. When we compare our approach against related training, guidance, and optimization-based methods, we find DITTO achieves state-of-the-art performance on nearly all tasks, including outperforming comparable approaches on controllability, audio quality, and computational efficiency, thus opening the door for high-quality, flexible, training-free control of diffusion models. Sound examples can be found at https://DITTO-Music.github.io/web/.
Quality-Aware Hydraulic Control in Drinking Water Networks via Controllability Proxies
Authors: Authors: Salma M. Elsherif, Mohamad H. Kazma, Ahmad F. Taha
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2401.12214
Pdf link: https://arxiv.org/pdf/2401.12214
Abstract The operation of water distribution networks is a complex procedure aimed at efficiently delivering consumers with adequate water quantity while ensuring its safe quality. An added challenge is the dependency of the water quality dynamics on the system's hydraulics, which influences the performance of the water quality controller. Prior research has addressed either solving the optimum operational hydraulic setting problem or regulating the water quality dynamics as separate problems. Additionally, there have been efforts to couple these two problems and solve one compact problem resulting in trade-offs between the contradictory objectives. In contrast, this paper examines the dependency and influence from a control-theoretic standpoint. More specifically, we explore the influence of accountability for water quality controllability improvement when addressing the pump scheduling problem. We examine its effects on the cumulative cost of the interconnected systems as well as the subsequent performance of the water quality controller. To achieve this, we develop a framework that incorporates different controllability metrics within the operational hydraulic optimization problem; its aim is attaining an adequate level of water quality control across the system. We assess the aforementioned aspects' performance on various scaled networks with a wide range of numerical scenarios.
Keyword: adam

Momentum-SAM: Sharpness Aware Minimization without Computational Overhead
Authors: Authors: Marlon Becker, Frederick Altrock, Benjamin Risse
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.12033
Pdf link: https://arxiv.org/pdf/2401.12033
Abstract The recently proposed optimization algorithm for deep neural networks Sharpness Aware Minimization (SAM) suggests perturbing parameters before gradient calculation by a gradient ascent step to guide the optimization into parameter space regions of flat loss. While significant generalization improvements and thus reduction of overfitting could be demonstrated, the computational costs are doubled due to the additionally needed gradient calculation, making SAM unfeasible in case of limited computationally capacities. Motivated by Nesterov Accelerated Gradient (NAG) we propose Momentum-SAM (MSAM), which perturbs parameters in the direction of the accumulated momentum vector to achieve low sharpness without significant computational overhead or memory demands over SGD or Adam. We evaluate MSAM in detail and reveal insights on separable mechanisms of NAG, SAM and MSAM regarding training optimization and generalization. Code is available at https://github.com/MarlonBecker/MSAM.
Keyword: gradient

Helmholtz-Decomposition and Optical Flow: A new method to characterize GCamP recordings
Authors: Authors: Michael Gerstenberger, Dominic Juestel, Silviu Bodea
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.11008
Pdf link: https://arxiv.org/pdf/2401.11008
Abstract During deep sleep and under anaesthesia spontaneous patterns of cortical activation frequently take the form of slow travelling waves. Slow wave sleep is an important cognitive state especially because of its relevance for memory consolidation. However, despite extensive research the exact mechanisms are still ill-understood. Novel methods such as high speed widefield imaging of GCamP activity offer new potentials. Here we show how data recorded from transgenic mice under anesthesia can be processed to analyze sources, sinks and patterns of flow. To make the best possible use of the data novel means of data processing are necessary. Therefore, we (1) give a an brief account on processes that play a role in generating slow waves and demonstrate (2) a novel approach to characterize its patterns in GCamP recordings. While slow waves are highly variable, it shows that some are surprisingly similar. To enable quantitative means of analysis and examine the structure of such prototypical events we propose a novel approach for the characterization of slow waves: The Helmholtz-Decomposition of gradient-based Dense Optical Flow of the pixeldense GCamP contrast (df/f). It allows to detect the sources and sinks of activation and discern them from global patterns of neural flow. Aggregated features can be analyzed with variational autoencoders. The results unravel regularities between slow waves and shows how they relate to the experimental conditions. The approach reveals a complex topology of different features in latent slow wave space and identifies prototypical examples for each stage.
UltrAvatar: A Realistic Animatable 3D Avatar Diffusion Model with Authenticity Guided Textures
Authors: Authors: Mingyuan Zhou, Rakib Hyder, Ziwei Xuan, Guojun Qi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.11078
Pdf link: https://arxiv.org/pdf/2401.11078
Abstract Recent advances in 3D avatar generation have gained significant attentions. These breakthroughs aim to produce more realistic animatable avatars, narrowing the gap between virtual and real-world experiences. Most of existing works employ Score Distillation Sampling (SDS) loss, combined with a differentiable renderer and text condition, to guide a diffusion model in generating 3D avatars. However, SDS often generates oversmoothed results with few facial details, thereby lacking the diversity compared with ancestral sampling. On the other hand, other works generate 3D avatar from a single image, where the challenges of unwanted lighting effects, perspective views, and inferior image quality make them difficult to reliably reconstruct the 3D face meshes with the aligned complete textures. In this paper, we propose a novel 3D avatar generation approach termed UltrAvatar with enhanced fidelity of geometry, and superior quality of physically based rendering (PBR) textures without unwanted lighting. To this end, the proposed approach presents a diffuse color extraction model and an authenticity guided texture diffusion model. The former removes the unwanted lighting effects to reveal true diffuse colors so that the generated avatars can be rendered under various lighting conditions. The latter follows two gradient-based guidances for generating PBR textures to render diverse face-identity features and details better aligning with 3D mesh geometry. We demonstrate the effectiveness and robustness of the proposed method, outperforming the state-of-the-art methods by a large margin in the experiments.
FedRKG: A Privacy-preserving Federated Recommendation Framework via Knowledge Graph Enhancement
Authors: Authors: Dezhong Yao, Tongtong Liu, Qi Cao, Hai Jin
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2401.11089
Pdf link: https://arxiv.org/pdf/2401.11089
Abstract Federated Learning (FL) has emerged as a promising approach for preserving data privacy in recommendation systems by training models locally. Recently, Graph Neural Networks (GNN) have gained popularity in recommendation tasks due to their ability to capture high-order interactions between users and items. However, privacy concerns prevent the global sharing of the entire user-item graph. To address this limitation, some methods create pseudo-interacted items or users in the graph to compensate for missing information for each client. Unfortunately, these methods introduce random noise and raise privacy concerns. In this paper, we propose FedRKG, a novel federated recommendation system, where a global knowledge graph (KG) is constructed and maintained on the server using publicly available item information, enabling higher-order user-item interactions. On the client side, a relation-aware GNN model leverages diverse KG relationships. To protect local interaction items and obscure gradients, we employ pseudo-labeling and Local Differential Privacy (LDP). Extensive experiments conducted on three real-world datasets demonstrate the competitive performance of our approach compared to centralized algorithms while ensuring privacy preservation. Moreover, FedRKG achieves an average accuracy improvement of 4% compared to existing federated learning baselines.
Diffusion Model Conditioning on Gaussian Mixture Model and Negative Gaussian Mixture Gradient
Authors: Authors: Weiguo Lu, Xuan Wu, Deng Ding, Jinqiao Duan, Jirong Zhuang, Gangnan Yuan
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.11261
Pdf link: https://arxiv.org/pdf/2401.11261
Abstract Diffusion models (DMs) are a type of generative model that has a huge impact on image synthesis and beyond. They achieve state-of-the-art generation results in various generative tasks. A great diversity of conditioning inputs, such as text or bounding boxes, are accessible to control the generation. In this work, we propose a conditioning mechanism utilizing Gaussian mixture models (GMMs) as feature conditioning to guide the denoising process. Based on set theory, we provide a comprehensive theoretical analysis that shows that conditional latent distribution based on features and classes is significantly different, so that conditional latent distribution on features produces fewer defect generations than conditioning on classes. Two diffusion models conditioned on the Gaussian mixture model are trained separately for comparison. Experiments support our findings. A novel gradient function called the negative Gaussian mixture gradient (NGMG) is proposed and applied in diffusion model training with an additional classifier. Training stability has improved. We also theoretically prove that NGMG shares the same benefit as the Earth Mover distance (Wasserstein) as a more sensible cost function when learning distributions supported by low-dimensional manifolds.
Long-Term Fair Decision Making through Deep Generative Models
Authors: Authors: Yaowei Hu, Yongkai Wu, Lu Zhang
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2401.11288
Pdf link: https://arxiv.org/pdf/2401.11288
Abstract This paper studies long-term fair machine learning which aims to mitigate group disparity over the long term in sequential decision-making systems. To define long-term fairness, we leverage the temporal causal graph and use the 1-Wasserstein distance between the interventional distributions of different demographic groups at a sufficiently large time step as the quantitative metric. Then, we propose a three-phase learning framework where the decision model is trained on high-fidelity data generated by a deep generative model. We formulate the optimization problem as a performative risk minimization and adopt the repeated gradient descent algorithm for learning. The empirical evaluation shows the efficacy of the proposed method using both synthetic and semi-synthetic datasets.
Finding a Needle in the Adversarial Haystack: A Targeted Paraphrasing Approach For Uncovering Edge Cases with Minimal Distribution Distortion
Authors: Authors: Aly M. Kassem, Sherif Saad
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2401.11373
Pdf link: https://arxiv.org/pdf/2401.11373
Abstract Adversarial attacks against NLP Deep Learning models are a significant concern. In particular, adversarial samples exploit the model's sensitivity to small input changes. While these changes appear insignificant on the semantics of the input sample, they result in significant decay in model performance. In this paper, we propose Targeted Paraphrasing via RL (TPRL), an approach to automatically learn a policy to generate challenging samples that most likely improve the model's performance. TPRL leverages FLAN T5, a language model, as a generator and employs a self learned policy using a proximal policy gradient to generate the adversarial examples automatically. TPRL's reward is based on the confusion induced in the classifier, preserving the original text meaning through a Mutual Implication score. We demonstrate and evaluate TPRL's effectiveness in discovering natural adversarial attacks and improving model performance through extensive experiments on four diverse NLP classification tasks via Automatic and Human evaluation. TPRL outperforms strong baselines, exhibits generalizability across classifiers and datasets, and combines the strengths of language modeling and reinforcement learning to generate diverse and influential adversarial examples.
Adversarial Augmentation Training Makes Action Recognition Models More Robust to Realistic Video Distribution Shifts
Authors: Authors: Kiyoon Kim, Shreyank N Gowda, Panagiotis Eustratiadis, Antreas Antoniou, Robert B Fisher
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.11406
Pdf link: https://arxiv.org/pdf/2401.11406
Abstract Despite recent advances in video action recognition achieving strong performance on existing benchmarks, these models often lack robustness when faced with natural distribution shifts between training and test data. We propose two novel evaluation methods to assess model resilience to such distribution disparity. One method uses two different datasets collected from different sources and uses one for training and validation, and the other for testing. More precisely, we created dataset splits of HMDB-51 or UCF-101 for training, and Kinetics-400 for testing, using the subset of the classes that are overlapping in both train and test datasets. The other proposed method extracts the feature mean of each class from the target evaluation dataset's training data (i.e. class prototype) and estimates test video prediction as a cosine similarity score between each sample to the class prototypes of each target class. This procedure does not alter model weights using the target dataset and it does not require aligning overlapping classes of two different datasets, thus is a very efficient method to test the model robustness to distribution shifts without prior knowledge of the target distribution. We address the robustness problem by adversarial augmentation training - generating augmented views of videos that are "hard" for the classification model by applying gradient ascent on the augmentation parameters - as well as "curriculum" scheduling the strength of the video augmentations. We experimentally demonstrate the superior performance of the proposed adversarial augmentation approach over baselines across three state-of-the-art action recognition models - TSM, Video Swin Transformer, and Uniformer. The presented work provides critical insight into model robustness to distribution shifts and presents effective techniques to enhance video action recognition performance in a real-world deployment.
Frost Prediction Using Machine Learning Methods in Fars Province
Authors: Authors: Milad Barooni, Koorush Ziarati, Ali Barooni
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.11462
Pdf link: https://arxiv.org/pdf/2401.11462
Abstract One of the common hazards and issues in meteorology and agriculture is the problem of frost, chilling or freezing. This event occurs when the minimum ambient temperature falls below a certain value. This phenomenon causes a lot of damage to the country, especially Fars province. Solving this problem requires that, in addition to predicting the minimum temperature, we can provide enough time to implement the necessary measures. Empirical methods have been provided by the Food and Agriculture Organization (FAO), which can predict the minimum temperature, but not in time. In addition to this, we can use machine learning methods to model the minimum temperature. In this study, we have used three methods Gated Recurrent Unit (GRU), Temporal Convolutional Network (TCN) as deep learning methods, and Gradient Boosting (XGBoost). A customized loss function designed for methods based on deep learning, which can be effective in reducing prediction errors. With methods based on deep learning models, not only do we observe a reduction in RMSE error compared to empirical methods but also have more time to predict minimum temperature. Thus, we can model the minimum temperature for the next 24 hours by having the current 24 hours. With the gradient boosting model (XGBoost) we can keep the prediction time as deep learning and RMSE error reduced. Finally, we experimentally concluded that machine learning methods work better than empirical methods and XGBoost model can have better performance in this problem among other implemented.
How Robust Are Energy-Based Models Trained With Equilibrium Propagation?
Authors: Authors: Siddharth Mansingh, Michal Kucer, Garrett Kenyon, Juston Moore, Michael Teti
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.11543
Pdf link: https://arxiv.org/pdf/2401.11543
Abstract Deep neural networks (DNNs) are easily fooled by adversarial perturbations that are imperceptible to humans. Adversarial training, a process where adversarial examples are added to the training set, is the current state-of-the-art defense against adversarial attacks, but it lowers the model's accuracy on clean inputs, is computationally expensive, and offers less robustness to natural noise. In contrast, energy-based models (EBMs), which were designed for efficient implementation in neuromorphic hardware and physical systems, incorporate feedback connections from each layer to the previous layer, yielding a recurrent, deep-attractor architecture which we hypothesize should make them naturally robust. Our work is the first to explore the robustness of EBMs to both natural corruptions and adversarial attacks, which we do using the CIFAR-10 and CIFAR-100 datasets. We demonstrate that EBMs are more robust than transformers and display comparable robustness to adversarially-trained DNNs on gradient-based (white-box) attacks, query-based (black-box) attacks, and natural perturbations without sacrificing clean accuracy, and without the need for adversarial training or additional training techniques.
Tight Verification of Probabilistic Robustness in Bayesian Neural Networks
Authors: Authors: Ben Batten, Mehran Hosseini, Alessio Lomuscio
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Formal Languages and Automata Theory (cs.FL); Logic in Computer Science (cs.LO)
Arxiv link: https://arxiv.org/abs/2401.11627
Pdf link: https://arxiv.org/pdf/2401.11627
Abstract We introduce two algorithms for computing tight guarantees on the probabilistic robustness of Bayesian Neural Networks (BNNs). Computing robustness guarantees for BNNs is a significantly more challenging task than verifying the robustness of standard Neural Networks (NNs) because it requires searching the parameters' space for safe weights. Moreover, tight and complete approaches for the verification of standard NNs, such as those based on Mixed-Integer Linear Programming (MILP), cannot be directly used for the verification of BNNs because of the polynomial terms resulting from the consecutive multiplication of variables encoding the weights. Our algorithms efficiently and effectively search the parameters' space for safe weights by using iterative expansion and the network's gradient and can be used with any verification algorithm of choice for BNNs. In addition to proving that our algorithms compute tighter bounds than the SoA, we also evaluate our algorithms against the SoA on standard benchmarks, such as MNIST and CIFAR10, showing that our algorithms compute bounds up to 40% tighter than the SoA.
Reframing Offline Reinforcement Learning as a Regression Problem
Authors: Authors: Prajwal Koirala, Cody Fleming
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2401.11630
Pdf link: https://arxiv.org/pdf/2401.11630
Abstract The study proposes the reformulation of offline reinforcement learning as a regression problem that can be solved with decision trees. Aiming to predict actions based on input states, return-to-go (RTG), and timestep information, we observe that with gradient-boosted trees, the agent training and inference are very fast, the former taking less than a minute. Despite the simplification inherent in this reformulated problem, our agent demonstrates performance that is at least on par with established methods. This assertion is validated by testing it across standard datasets associated with D4RL Gym-MuJoCo tasks. We further discuss the agent's ability to generalize by testing it on two extreme cases, how it learns to model the return distributions effectively even with highly skewed expert datasets, and how it exhibits robust performance in scenarios with sparse/delayed rewards.
Differentiable Tree Search in Latent State Space
Authors: Authors: Dixant Mittal, Wee Sun Lee
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.11660
Pdf link: https://arxiv.org/pdf/2401.11660
Abstract In decision-making problems with limited training data, policy functions approximated using deep neural networks often exhibit suboptimal performance. An alternative approach involves learning a world model from the limited data and determining actions through online search. However, the performance is adversely affected by compounding errors arising from inaccuracies in the learnt world model. While methods like TreeQN have attempted to address these inaccuracies by incorporating algorithmic structural biases into their architectures, the biases they introduce are often weak and insufficient for complex decision-making tasks. In this work, we introduce Differentiable Tree Search (DTS), a novel neural network architecture that significantly strengthens the inductive bias by embedding the algorithmic structure of a best-first online search algorithm. DTS employs a learnt world model to conduct a fully differentiable online search in latent state space. The world model is jointly optimised with the search algorithm, enabling the learning of a robust world model and mitigating the effect of model inaccuracies. We address potential Q-function discontinuities arising from naive incorporation of best-first search by adopting a stochastic tree expansion policy, formulating search tree expansion as a decision-making task, and introducing an effective variance reduction technique for the gradient computation. We evaluate DTS in an offline-RL setting with a limited training data scenario on Procgen games and grid navigation task, and demonstrate that DTS outperforms popular model-free and model-based baselines.
Conjugate Direction Methods Under Inconsistent Systems
Authors: Authors: Alexander Lim, Yang Liu, Fred Roosta
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2401.11714
Pdf link: https://arxiv.org/pdf/2401.11714
Abstract Since the development of the conjugate gradient (CG) method in 1952 by Hestenes and Stiefel, CG, has become an indispensable tool in computational mathematics for solving positive definite linear systems. On the other hand, the conjugate residual (CR) method, closely related CG and introduced by Stiefel in 1955 for the same settings, remains relatively less known outside the numerical linear algebra community. Since their inception, these methods -- henceforth collectively referred to as conjugate direction methods -- have been extended beyond positive definite to indefinite, albeit consistent, settings. Going one step further, in this paper, we investigate theoretical and empirical properties of these methods under inconsistent systems. Among other things, we show that small modifications to the original algorithms allow for the pseudo-inverse solution. Furthermore, we show that CR is essentially equivalent to the minimum residual method, proposed by Paige and Saunders in 1975, in such contexts. Lastly, we conduct a series of numerical experiments to shed lights on their numerical stability (or lack thereof) and their performance for inconsistent systems. Surprisingly, we will demonstrate that, unlike CR and contrary to popular belief, CG can exhibit significant numerical instability, bordering on catastrophe in some instances.
GI-PIP: Do We Require Impractical Auxiliary Dataset for Gradient Inversion Attacks?
Authors: Authors: Yu sun, Gaojian Xiong, Xianxun Yao, Kailang Ma, Jian Cui
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.11748
Pdf link: https://arxiv.org/pdf/2401.11748
Abstract Deep gradient inversion attacks expose a serious threat to Federated Learning (FL) by accurately recovering private data from shared gradients. However, the state-of-the-art heavily relies on impractical assumptions to access excessive auxiliary data, which violates the basic data partitioning principle of FL. In this paper, a novel method, Gradient Inversion Attack using Practical Image Prior (GI-PIP), is proposed under a revised threat model. GI-PIP exploits anomaly detection models to capture the underlying distribution from fewer data, while GAN-based methods consume significant more data to synthesize images. The extracted distribution is then leveraged to regulate the attack process as Anomaly Score loss. Experimental results show that GI-PIP achieves a 16.12 dB PSNR recovery using only 3.8\% data of ImageNet, while GAN-based methods necessitate over 70\%. Moreover, GI-PIP exhibits superior capability on distribution generalization compared to GAN-based methods. Our approach significantly alleviates the auxiliary data requirement on both amount and distribution in gradient inversion attacks, hence posing more substantial threat to real-world FL.
LightDiC: A Simple yet Effective Approach for Large-scale Digraph Representation Learning
Authors: Authors: Xunkai Li, Meihao Liao, Zhengyu Wu, Daohan Su, Wentao Zhang, Rong-Hua Li, Guoren Wang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2401.11772
Pdf link: https://arxiv.org/pdf/2401.11772
Abstract Most existing graph neural networks (GNNs) are limited to undirected graphs, whose restricted scope of the captured relational information hinders their expressive capabilities and deployments in real-world scenarios. Compared with undirected graphs, directed graphs (digraphs) fit the demand for modeling more complex topological systems by capturing more intricate relationships between nodes, such as formulating transportation and financial networks. While some directed GNNs have been introduced, their inspiration mainly comes from deep learning architectures, which lead to redundant complexity and computation, making them inapplicable to large-scale databases. To address these issues, we propose LightDiC, a scalable variant of the digraph convolution based on the magnetic Laplacian. Since topology-related computations are conducted solely during offline pre-processing, LightDiC achieves exceptional scalability, enabling downstream predictions to be trained separately without incurring recursive computational costs. Theoretical analysis shows that LightDiC utilizes directed information to achieve message passing based on the complex field, which corresponds to the proximal gradient descent process of the Dirichlet energy optimization function from the perspective of digraph signal denoising, ensuring its expressiveness. Experimental results demonstrate that LightDiC performs comparably well or even outperforms other SOTA methods in various downstream tasks, with fewer learnable parameters and higher training efficiency. Notably, LightDiC is the first DiGNN to provide satisfactory results in the most representative large-scale database (ogbn-papers100M).
EPIC: a provable accelerated Eigensolver based on Preconditioning and Implicit Convexity
Authors: Authors: Nian Shao, Wenbin Chen, Zhaojun Bai
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2401.11786
Pdf link: https://arxiv.org/pdf/2401.11786
Abstract This paper is concerned with the extraction of the smallest eigenvalue and the corresponding eigenvector of a symmetric positive definite matrix pencil. We reveal implicit convexity of the eigenvalue problem in Euclidean space. A provable accelerated eigensolver based on preconditioning and implicit convexity (EPIC) is proposed. Theoretical analysis shows the acceleration of EPIC with the rate of convergence resembling the expected rate of convergence of the well-known locally optimal preconditioned conjugate gradient (LOPCG). A complete proof of the expected rate of convergence of LOPCG is elusive so far. Numerical results confirm our theoretical findings of EPIC.
Low-Tubal-Rank Tensor Recovery via Factorized Gradient Descent
Authors: Authors: Zhiyu Liu, Zhi Han, Yandong Tang, Xi-Le Zhao, Yao Wang
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2401.11940
Pdf link: https://arxiv.org/pdf/2401.11940
Abstract This paper considers the problem of recovering a tensor with an underlying low-tubal-rank structure from a small number of corrupted linear measurements. Traditional approaches tackling such a problem require the computation of tensor Singular Value Decomposition (t-SVD), that is a computationally intensive process, rendering them impractical for dealing with large-scale tensors. Aim to address this challenge, we propose an efficient and effective low-tubal-rank tensor recovery method based on a factorization procedure akin to the Burer-Monteiro (BM) method. Precisely, our fundamental approach involves decomposing a large tensor into two smaller factor tensors, followed by solving the problem through factorized gradient descent (FGD). This strategy eliminates the need for t-SVD computation, thereby reducing computational costs and storage requirements. We provide rigorous theoretical analysis to ensure the convergence of FGD under both noise-free and noisy situations. Additionally, it is worth noting that our method does not require the precise estimation of the tensor tubal-rank. Even in cases where the tubal-rank is slightly overestimated, our approach continues to demonstrate robust performance. A series of experiments have been carried out to demonstrate that, as compared to other popular ones, our approach exhibits superior performance in multiple scenarios, in terms of the faster computational speed and the smaller convergence error.
RUMBoost: Gradient Boosted Random Utility Models
Authors: Authors: Nicolas Salvadé, Tim Hillel
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2401.11954
Pdf link: https://arxiv.org/pdf/2401.11954
Abstract This paper introduces the RUMBoost model, a novel discrete choice modelling approach that combines the interpretability and behavioural robustness of Random Utility Models (RUMs) with the generalisation and predictive ability of deep learning methods. We obtain the full functional form of non-linear utility specifications by replacing each linear parameter in the utility functions of a RUM with an ensemble of gradient boosted regression trees. This enables piece-wise constant utility values to be imputed for all alternatives directly from the data for any possible combination of input variables. We introduce additional constraints on the ensembles to ensure three crucial features of the utility specifications: (i) dependency of the utilities of each alternative on only the attributes of that alternative, (ii) monotonicity of marginal utilities, and (iii) an intrinsically interpretable functional form, where the exact response of the model is known throughout the entire input space. Furthermore, we introduce an optimisation-based smoothing technique that replaces the piece-wise constant utility values of alternative attributes with monotonic piece-wise cubic splines to identify non-linear parameters with defined gradient. We demonstrate the potential of the RUMBoost model compared to various ML and Random Utility benchmark models for revealed preference mode choice data from London. The results highlight the great predictive performance and the direct interpretability of our proposed approach. Furthermore, the smoothed attribute utility functions allow for the calculation of various behavioural indicators and marginal utilities. Finally, we demonstrate the flexibility of our methodology by showing how the RUMBoost model can be extended to complex model specifications, including attribute interactions, correlation within alternative error terms and heterogeneity within the population.
Momentum-SAM: Sharpness Aware Minimization without Computational Overhead
Authors: Authors: Marlon Becker, Frederick Altrock, Benjamin Risse
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.12033
Pdf link: https://arxiv.org/pdf/2401.12033
Abstract The recently proposed optimization algorithm for deep neural networks Sharpness Aware Minimization (SAM) suggests perturbing parameters before gradient calculation by a gradient ascent step to guide the optimization into parameter space regions of flat loss. While significant generalization improvements and thus reduction of overfitting could be demonstrated, the computational costs are doubled due to the additionally needed gradient calculation, making SAM unfeasible in case of limited computationally capacities. Motivated by Nesterov Accelerated Gradient (NAG) we propose Momentum-SAM (MSAM), which perturbs parameters in the direction of the accumulated momentum vector to achieve low sharpness without significant computational overhead or memory demands over SGD or Adam. We evaluate MSAM in detail and reveal insights on separable mechanisms of NAG, SAM and MSAM regarding training optimization and generalization. Code is available at https://github.com/MarlonBecker/MSAM.
The Dimension Strikes Back with Gradients: Generalization of Gradient Methods in Stochastic Convex Optimization
Authors: Authors: Matan Schliserman, Uri Sherman, Tomer Koren
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2401.12058
Pdf link: https://arxiv.org/pdf/2401.12058
Abstract We study the generalization performance of gradient methods in the fundamental stochastic convex optimization setting, focusing on its dimension dependence. First, for full-batch gradient descent (GD) we give a construction of a learning problem in dimension $d=O(n^2)$, where the canonical version of GD (tuned for optimal performance of the empirical risk) trained with $n$ training examples converges, with constant probability, to an approximate empirical risk minimizer with $\Omega(1)$ population excess risk. Our bound translates to a lower bound of $\Omega (\sqrt{d})$ on the number of training examples required for standard GD to reach a non-trivial test error, answering an open question raised by Feldman (2016) and Amir, Koren, and Livni (2021b) and showing that a non-trivial dimension dependence is unavoidable. Furthermore, for standard one-pass stochastic gradient descent (SGD), we show that an application of the same construction technique provides a similar $\Omega(\sqrt{d})$ lower bound for the sample complexity of SGD to reach a non-trivial empirical error, despite achieving optimal test performance. This again provides an exponential improvement in the dimension dependence compared to previous work (Koren, Livni, Mansour, and Sherman, 2022), resolving an open question left therein.
Beyond TreeSHAP: Efficient Computation of Any-Order Shapley Interactions for Tree Ensembles
Authors: Authors: Maximilian Muschalik, Fabian Fumagalli, Barbara Hammer, Eyke Hüllermeier
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.12069
Pdf link: https://arxiv.org/pdf/2401.12069
Abstract While shallow decision trees may be interpretable, larger ensemble models like gradient-boosted trees, which often set the state of the art in machine learning problems involving tabular data, still remain black box models. As a remedy, the Shapley value (SV) is a well-known concept in explainable artificial intelligence (XAI) research for quantifying additive feature attributions of predictions. The model-specific TreeSHAP methodology solves the exponential complexity for retrieving exact SVs from tree-based models. Expanding beyond individual feature attribution, Shapley interactions reveal the impact of intricate feature interactions of any order. In this work, we present TreeSHAP-IQ, an efficient method to compute any-order additive Shapley interactions for predictions of tree-based models. TreeSHAP-IQ is supported by a mathematical framework that exploits polynomial arithmetic to compute the interaction scores in a single recursive traversal of the tree, akin to Linear TreeSHAP. We apply TreeSHAP-IQ on state-of-the-art tree ensembles and explore interactions on well-established benchmark datasets.
Improved accuracy of continuum surface flux models for metal additive manufacturing melt pool simulations
Authors: Authors: Nils Much, Magdalena Schreter-Fleischhacker, Peter Munch, Martin Kronbichler, Wolfgang A. Wall, Christoph Meier
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Arxiv link: https://arxiv.org/abs/2401.12114
Pdf link: https://arxiv.org/pdf/2401.12114
Abstract Computational modeling of the melt pool dynamics in laser-based powder bed fusion metal additive manufacturing (PBF-LB/M) promises to shed light on fundamental defect generation mechanisms. These processes are typically accompanied by rapid evaporation so that the evaporation-induced recoil pressure and cooling arise as major driving forces for fluid dynamics and temperature evolution. The magnitude of these interface fluxes depends exponentially on the melt pool surface temperature, which, therefore, must be predicted with high accuracy. The present work utilizes a diffuse interface model based on a continuum surface flux (CSF) description on the interfaces to study dimensionally reduced thermal two-phase problems representing PBF-LB/M in a finite element framework. It is demonstrated that the extreme temperature gradients combined with the high ratios of material properties between metal and ambient gas lead to significant errors in the interface temperatures and fluxes when classical CSF approaches, along with typical interface thicknesses and discretizations, are applied. A novel parameter-scaled CSF approach is proposed, which is constructed to yield a smoother temperature rate in the diffuse interface region, significantly increasing the solution accuracy. The interface thickness required to predict the temperature field with a given level of accuracy is less restrictive by at least one order of magnitude for the proposed parameter-scaled CSF approach compared to classical CSF, drastically reducing computational costs. Finally, we showcased the general applicability of the parameter-scaled CSF to a three-dimensional simulation of stationary laser melting of PBF-LB/M considering the fully coupled thermo-hydrodynamic multi-phase problem, including phase change.
Gradient Preserving Operator Inference: Data-Driven Reduced-Order Models for Equations with Gradient Structure
Authors: Authors: Yuwei Geng, Jasdeep Singh, Lili Ju, Boris Kramer, Zhu Wang
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2401.12138
Pdf link: https://arxiv.org/pdf/2401.12138
Abstract Hamiltonian Operator Inference has been introduced in [Sharma, H., Wang, Z., Kramer, B., Physica D: Nonlinear Phenomena, 431, p.133122, 2022] to learn structure-preserving reduced-order models (ROMs) for Hamiltonian systems. This approach constructs a low-dimensional model using only data and knowledge of the Hamiltonian function. Such ROMs can keep the intrinsic structure of the system, allowing them to capture the physics described by the governing equations. In this work, we extend this approach to more general systems that are either conservative or dissipative in energy, and which possess a gradient structure. We derive the optimization problems for inferring structure-preserving ROMs that preserve the gradient structure. We further derive an {\em a priori} error estimate for the reduced-order approximation. To test the algorithms, we consider semi-discretized partial differential equations with gradient structure, such as the parameterized wave and Korteweg-de-Vries equations in the conservative case and the one- and two-dimensional Allen-Cahn equations in the dissipative case. The numerical results illustrate the accuracy, structure-preservation properties, and predictive capabilities of the gradient-preserving Operator Inference ROMs.
DITTO: Diffusion Inference-Time T-Optimization for Music Generation
Authors: Authors: Zachary Novack, Julian McAuley, Taylor Berg-Kirkpatrick, Nicholas J. Bryan
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2401.12179
Pdf link: https://arxiv.org/pdf/2401.12179
Abstract We propose Diffusion Inference-Time T-Optimization (DITTO), a general-purpose frame-work for controlling pre-trained text-to-music diffusion models at inference-time via optimizing initial noise latents. Our method can be used to optimize through any differentiable feature matching loss to achieve a target (stylized) output and leverages gradient checkpointing for memory efficiency. We demonstrate a surprisingly wide-range of applications for music generation including inpainting, outpainting, and looping as well as intensity, melody, and musical structure control - all without ever fine-tuning the underlying model. When we compare our approach against related training, guidance, and optimization-based methods, we find DITTO achieves state-of-the-art performance on nearly all tasks, including outperforming comparable approaches on controllability, audio quality, and computational efficiency, thus opening the door for high-quality, flexible, training-free control of diffusion models. Sound examples can be found at https://DITTO-Music.github.io/web/.
Keyword: super-resolution

Simultaneous Blind Demixing and Super-resolution via Vectorized Hankel Lift
Authors: Authors: Haifeng Wang, Jinchi Chen, Hulei Fan, Yuxiang Zhao, Li Yu
Subjects: Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2401.11805
Pdf link: https://arxiv.org/pdf/2401.11805
Abstract In this work, we investigate the problem of simultaneous blind demixing and super-resolution. Leveraging the subspace assumption regarding unknown point spread functions, this problem can be reformulated as a low-rank matrix demixing problem. We propose a convex recovery approach that utilizes the low-rank structure of each vectorized Hankel matrix associated with the target matrix. Our analysis reveals that for achieving exact recovery, the number of samples needs to satisfy the condition $n\gtrsim Ksr \log (sn)$. Empirical evaluations demonstrate the recovery capabilities and the computational efficiency of the convex method.
Observation-Guided Meteorological Field Downscaling at Station Scale: A Benchmark and a New Method
Authors: Authors: Zili Liu, Hao Chen, Lei Bai, Wenyuan Li, Keyan Chen, Zhengyi Wang, Wanli Ouyang, Zhengxia Zou, Zhenwei Shi
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2401.11960
Pdf link: https://arxiv.org/pdf/2401.11960
Abstract Downscaling (DS) of meteorological variables involves obtaining high-resolution states from low-resolution meteorological fields and is an important task in weather forecasting. Previous methods based on deep learning treat downscaling as a super-resolution task in computer vision and utilize high-resolution gridded meteorological fields as supervision to improve resolution at specific grid scales. However, this approach has struggled to align with the continuous distribution characteristics of meteorological fields, leading to an inherent systematic bias between the downscaled results and the actual observations at meteorological stations. In this paper, we extend meteorological downscaling to arbitrary scattered station scales, establish a brand new benchmark and dataset, and retrieve meteorological states at any given station location from a coarse-resolution meteorological field. Inspired by data assimilation techniques, we integrate observational data into the downscaling process, providing multi-scale observational priors. Building on this foundation, we propose a new downscaling model based on hypernetwork architecture, namely HyperDS, which efficiently integrates different observational information into the model training, achieving continuous scale modeling of the meteorological field. Through extensive experiments, our proposed method outperforms other specially designed baseline models on multiple surface variables. Notably, the mean squared error (MSE) for wind speed and surface pressure improved by 67% and 19.5% compared to other methods. We will release the dataset and code subsequently.

zoq / arxiv-updates

New submissions for Tue, 23 Jan 24 #692

Keyword: sgd

Communication Efficient and Provable Federated Unlearning

Understanding the Generalization Benefits of Late Learning Rate Decay

Momentum-SAM: Sharpness Aware Minimization without Computational Overhead

The Dimension Strikes Back with Gradients: Generalization of Gradient Methods in Stochastic Convex Optimization

Keyword: optimization

Quantum Neural Network Software Testing, Analysis, and Code Optimization for Advanced IoT Systems: Design, Implementation, and Visualization

Automatic dimensionality reduction of Twin-in-the-Loop Observers

Decentralizing Coordination in Open Vehicle Fleets for Scalable and Dynamic Task Allocation

Optimization of the Context-Free Language Reachability Matrix-Based Algorithm

Chance-Constrained, Drift-Safe Guidance for Spacecraft Rendezvous

Adaptive Global-Local Representation Learning and Selection for Cross-Domain Facial Expression Recognition

Meta Reinforcement Learning for Strategic IoT Deployments Coverage in Disaster-Response UAV Swarms

The Bit Complexity of Dynamic Algebraic Formulas and their Determinants

Stability Plasticity Decoupled Fine-tuning For Few-shot end-to-end Object Detection

Wideband Beamforming for RIS Assisted Near-Field Communications

Joint Beamforming Optimization and Mode Selection for RDARS-aided MIMO Systems

Selecting Walk Schemes for Database Embedding

On the Information Leakage Performance of Secure Finite Blocklength Transmissions over Rayleigh Fading Channels

AFS-BM: Enhancing Model Performance through Adaptive Feature Selection with Binary Masking

Long-Term Fair Decision Making through Deep Generative Models

A Hierarchical Decision-Based Maintenance for a Complex Modular System Driven by the { MoMA} Algorithm

Interactive AI with Retrieval-Augmented Generation for Next Generation Networking

Causal Generative Explainers using Counterfactual Inference: A Case Study on the Morpho-MNIST Dataset

MolTailor: Tailoring Chemical Molecular Representation to Specific Tasks via Text Prompts

Robust Beamforming for Downlink Multi-Cell Systems: A Bilevel Optimization Perspective

Joint Downlink and Uplink Optimization for RIS-Aided FDD MIMO Communication Systems

Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback

LR-CNN: Lightweight Row-centric Convolutional Neural Network Training for Memory Reduction

ColorVideoVDP: A visual difference predictor for image, video and display distortions

BA-LINS: A Frame-to-Frame Bundle Adjustment for LiDAR-Inertial Navigation

Information-Theoretic State Variable Selection for Reinforcement Learning

Tempo: Confidentiality Preservation in Cloud-Based Neural Network Training

Deformable Endoscopic Tissues Reconstruction with Gaussian Splatting

Real-Time Systems Optimization with Black-box Constraints and Hybrid Variables

The Markov-Chain Polytope with Applications

MR.CAP: Multi-Robot Joint Control and Planning for Object Transport

An Improved Grey Wolf Optimization Algorithm for Heart Disease Prediction

Graph Condensation: A Survey

Fast and Scalable Network Slicing by Integrating Deep Learning with Lagrangian Methods

MetaSeg: Content-Aware Meta-Net for Omni-Supervised Semantic Segmentation

FedGTA: Topology-aware Averaging for Federated Graph Learning

Towards Effective and General Graph Unlearning via Mutual Evolution

LightDiC: A Simple yet Effective Approach for Large-scale Digraph Representation Learning

Optimization in Sanger Sequencing

Automation of Triangle Ruler-and-Compass Constructions Using Constraint Solvers

Maximizing Spectral and Energy Efficiency in Multi-user MIMO OFDM Systems with RIS and Hardware Impairment

Bridging Evolutionary Algorithms and Reinforcement Learning: A Comprehensive Survey

Integrating Statistical Significance and Discriminative Power in Pattern Discovery

A Survey of Advances in Optimization Methods for Wireless Communication System Design

Momentum-SAM: Sharpness Aware Minimization without Computational Overhead

The Dimension Strikes Back with Gradients: Generalization of Gradient Methods in Stochastic Convex Optimization

Collaborative Reinforcement Learning Based Unmanned Aerial Vehicle (UAV) Trajectory Design for 3D UAV Tracking

LearnedWMP: Workload Memory Prediction Using Distribution of Query Templates

Energy-aware Trajectory Optimization for UAV-mounted RIS and Full-duplex Relay

Gradient Preserving Operator Inference: Data-Driven Reduced-Order Models for Equations with Gradient Structure

Uncoded Storage Coded Transmission Elastic Computing with Straggler Tolerance in Heterogeneous Systems

DITTO: Diffusion Inference-Time T-Optimization for Music Generation

Quality-Aware Hydraulic Control in Drinking Water Networks via Controllability Proxies

Keyword: adam

Momentum-SAM: Sharpness Aware Minimization without Computational Overhead

Keyword: gradient

Helmholtz-Decomposition and Optical Flow: A new method to characterize GCamP recordings

UltrAvatar: A Realistic Animatable 3D Avatar Diffusion Model with Authenticity Guided Textures

FedRKG: A Privacy-preserving Federated Recommendation Framework via Knowledge Graph Enhancement

Diffusion Model Conditioning on Gaussian Mixture Model and Negative Gaussian Mixture Gradient

Long-Term Fair Decision Making through Deep Generative Models

Finding a Needle in the Adversarial Haystack: A Targeted Paraphrasing Approach For Uncovering Edge Cases with Minimal Distribution Distortion

Adversarial Augmentation Training Makes Action Recognition Models More Robust to Realistic Video Distribution Shifts

Frost Prediction Using Machine Learning Methods in Fars Province

How Robust Are Energy-Based Models Trained With Equilibrium Propagation?

Tight Verification of Probabilistic Robustness in Bayesian Neural Networks

Reframing Offline Reinforcement Learning as a Regression Problem

Differentiable Tree Search in Latent State Space

Conjugate Direction Methods Under Inconsistent Systems

GI-PIP: Do We Require Impractical Auxiliary Dataset for Gradient Inversion Attacks?

LightDiC: A Simple yet Effective Approach for Large-scale Digraph Representation Learning

EPIC: a provable accelerated Eigensolver based on Preconditioning and Implicit Convexity