Abstract
Privacy amplification exploits randomness in data selection to provide tighter differential privacy (DP) guarantees. This analysis is key to DP-SGD's success in machine learning, but, is not readily applicable to the newer state-of-the-art algorithms. This is because these algorithms, known as DP-FTRL, use the matrix mechanism to add correlated noise instead of independent noise as in DP-SGD. In this paper, we propose "MMCC", the first algorithm to analyze privacy amplification via sampling for any generic matrix mechanism. MMCC is nearly tight in that it approaches a lower bound as $\epsilon\to0$. To analyze correlated outputs in MMCC, we prove that they can be analyzed as if they were independent, by conditioning them on prior outputs. Our "conditional composition theorem" has broad utility: we use it to show that the noise added to binary-tree-DP-FTRL can asymptotically match the noise added to DP-SGD with amplification. Our amplification algorithm also has practical empirical utility: we show it leads to significant improvement in the privacy-utility trade-offs for DP-FTRL algorithms on standard benchmarks.
Learning From Free-Text Human Feedback -- Collect New Datasets Or Extend Existing Ones?
Authors: Authors: Dominic Petrak, Nafise Sadat Moosavi, Ye Tian, Nikolai Rozanov, Iryna Gurevych
Abstract
Learning from free-text human feedback is essential for dialog systems, but annotated data is scarce and usually covers only a small fraction of error types known in conversational AI. Instead of collecting and annotating new datasets from scratch, recent advances in synthetic dialog generation could be used to augment existing dialog datasets with the necessary annotations. However, to assess the feasibility of such an effort, it is important to know the types and frequency of free-text human feedback included in these datasets. In this work, we investigate this question for a variety of commonly used dialog datasets, including MultiWoZ, SGD, BABI, PersonaChat, Wizards-of-Wikipedia, and the human-bot split of the Self-Feeding Chatbot. Using our observations, we derive new taxonomies for the annotation of free-text human feedback in dialogs and investigate the impact of including such data in response generation for three SOTA language generation models, including GPT-2, LLAMA, and Flan-T5. Our findings provide new insights into the composition of the datasets examined, including error types, user response types, and the relations between them.
Convergence of Sign-based Random Reshuffling Algorithms for Nonconvex Optimization
Authors: Authors: Zhen Qin, Zhishuai Liu, Pan Xu
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC); Machine Learning (stat.ML)
Abstract
signSGD is popular in nonconvex optimization due to its communication efficiency. Yet, existing analyses of signSGD rely on assuming that data are sampled with replacement in each iteration, contradicting the practical implementation where data are randomly reshuffled and sequentially fed into the algorithm. We bridge this gap by proving the first convergence result of signSGD with random reshuffling (SignRR) for nonconvex optimization. Given the dataset size $n$, the number of epochs of data passes $T$, and the variance bound of a stochastic gradient $\sigma^2$, we show that SignRR has the same convergence rate $O(\log(nT)/\sqrt{nT} + |\sigma|_1)$ as signSGD \citep{bernstein2018signsgd}. We then present SignRVR and SignRVM, which leverage variance-reduced gradients and momentum updates respectively, both converging at $O(\log(nT)/\sqrt{nT})$. In contrast with the analysis of signSGD, our results do not require an extremely large batch size in each iteration to be of the same order as the total number of iterations \citep{bernstein2018signsgd} or the signs of stochastic and true gradients match element-wise with a minimum probability of 1/2 \citep{safaryan2021stochastic}. We also extend our algorithms to cases where data are distributed across different machines, yielding dist-SignRVR and dist-SignRVM, both converging at $O(\log(n_0T)/\sqrt{n_0T})$, where $n_0$ is the dataset size of a single machine. We back up our theoretical findings through experiments on simulated and real-world problems, verifying that randomly reshuffled sign methods match or surpass existing baselines.
Keyword: optimization
Fast Path Planning for Autonomous Vehicle Parking with Safety-Guarantee using Hamilton-Jacobi Reachability
Authors: Authors: Xuemin Chi, Jun Zeng, Jihao Huang, Zhitao Liu, Hongye Su
Abstract
We present a fast planning architecture called Hamilton-Jacobi-based bidirectional A (HJBA) to solve general tight parking scenarios. The algorithm is a two-layer composed of a high-level HJ-based reachability analysis and a lower-level bidirectional A search algorithm. In high-level reachability analysis, a backward reachable tube (BRT) concerning vehicle dynamics is computed by the HJ analysis and it intersects with a safe set to get a safe reachable set. The safe set is defined by constraints of positive signed distances for obstacles in the environment and computed by solving QP optimization problems offline. For states inside the intersection set, i.e., the safe reachable set, the computed backward reachable tube ensures they are reachable subjected to system dynamics and input bounds, and the safe set guarantees they satisfy parking safety with respect to obstacles in different shapes. For online computation, randomized states are sampled from the safe reachable set, and used as heuristic guide points to be considered in the bidirectional A search. The bidirectional A* search is paralleled for each randomized state from the safe reachable set. We show that the proposed two-level planning algorithm is able to solve different parking scenarios effectively and computationally fast for typical parking requests. We validate our algorithm through simulations in large-scale randomized parking scenarios and demonstrate it to be able to outperform other state-of-the-art parking planning algorithms.
Application of deep and reinforcement learning to boundary control problems
Authors: Authors: Zenin Easa Panthakkalakath, Juraj Kardoš, Olaf Schenk
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
Abstract
The boundary control problem is a non-convex optimization and control problem in many scientific domains, including fluid mechanics, structural engineering, and heat transfer optimization. The aim is to find the optimal values for the domain boundaries such that the enclosed domain adhering to the governing equations attains the desired state values. Traditionally, non-linear optimization methods, such as the Interior-Point method (IPM), are used to solve such problems. This project explores the possibilities of using deep learning and reinforcement learning to solve boundary control problems. We adhere to the framework of iterative optimization strategies, employing a spatial neural network to construct well-informed initial guesses, and a spatio-temporal neural network learns the iterative optimization algorithm using policy gradients. Synthetic data, generated from the problems formulated in the literature, is used for training, testing and validation. The numerical experiments indicate that the proposed method can rival the speed and accuracy of existing solvers. In our preliminary results, the network attains costs lower than IPOPT, a state-of-the-art non-linear IPM, in 51\% cases. The overall number of floating point operations in the proposed method is similar to that of IPOPT. Additionally, the informed initial guess method and the learned momentum-like behaviour in the optimizer method are incorporated to avoid convergence to local minima.
Neural Multi-Objective Combinatorial Optimization with Diversity Enhancement
Authors: Authors: Jinbiao Chen, Zizhen Zhang, Zhiguang Cao, Yaoxin Wu, Yining Ma, Te Ye, Jiahai Wang
Abstract
Most of existing neural methods for multi-objective combinatorial optimization (MOCO) problems solely rely on decomposition, which often leads to repetitive solutions for the respective subproblems, thus a limited Pareto set. Beyond decomposition, we propose a novel neural heuristic with diversity enhancement (NHDE) to produce more Pareto solutions from two perspectives. On the one hand, to hinder duplicated solutions for different subproblems, we propose an indicator-enhanced deep reinforcement learning method to guide the model, and design a heterogeneous graph attention mechanism to capture the relations between the instance graph and the Pareto front graph. On the other hand, to excavate more solutions in the neighborhood of each subproblem, we present a multiple Pareto optima strategy to sample and preserve desirable solutions. Experimental results on classic MOCO problems show that our NHDE is able to generate a Pareto front with higher diversity, thereby achieving superior overall performance. Moreover, our NHDE is generic and can be applied to different neural methods for MOCO.
Efficient Meta Neural Heuristic for Multi-Objective Combinatorial Optimization
Authors: Authors: Jinbiao Chen, Jiahai Wang, Zizhen Zhang, Zhiguang Cao, Te Ye, Siyuan Chen
Abstract
Recently, neural heuristics based on deep reinforcement learning have exhibited promise in solving multi-objective combinatorial optimization problems (MOCOPs). However, they are still struggling to achieve high learning efficiency and solution quality. To tackle this issue, we propose an efficient meta neural heuristic (EMNH), in which a meta-model is first trained and then fine-tuned with a few steps to solve corresponding single-objective subproblems. Specifically, for the training process, a (partial) architecture-shared multi-task model is leveraged to achieve parallel learning for the meta-model, so as to speed up the training; meanwhile, a scaled symmetric sampling method with respect to the weight vectors is designed to stabilize the training. For the fine-tuning process, an efficient hierarchical method is proposed to systematically tackle all the subproblems. Experimental results on the multi-objective traveling salesman problem (MOTSP), multi-objective capacitated vehicle routing problem (MOCVRP), and multi-objective knapsack problem (MOKP) show that, EMNH is able to outperform the state-of-the-art neural heuristics in terms of solution quality and learning efficiency, and yield competitive solutions to the strong traditional heuristics while consuming much shorter time.
Triple Simplex Matrix Completion for Expense Forecasting
Authors: Authors: Cheng Qian, Lucas Glass, Nikos Sidiropoulos
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR)
Abstract
Forecasting project expenses is a crucial step for businesses to avoid budget overruns and project failures. Traditionally, this has been done by financial analysts or data science techniques such as time-series analysis. However, these approaches can be uncertain and produce results that differ from the planned budget, especially at the start of a project with limited data points. This paper proposes a constrained non-negative matrix completion model that predicts expenses by learning the likelihood of the project correlating with certain expense patterns in the latent space. The model is constrained on three probability simplexes, two of which are on the factor matrices and the third on the missing entries. Additionally, the predicted expense values are guaranteed to meet the budget constraint without the need of post-processing. An inexact alternating optimization algorithm is developed to solve the associated optimization problem and is proven to converge to a stationary point. Results from two real datasets demonstrate the effectiveness of the proposed method in comparison to state-of-the-art algorithms.
Random Exploration in Bayesian Optimization: Order-Optimal Regret and Computational Efficiency
Abstract
We consider Bayesian optimization using Gaussian Process models, also referred to as kernel-based bandit optimization. We study the methodology of exploring the domain using random samples drawn from a distribution. We show that this random exploration approach achieves the optimal error rates. Our analysis is based on novel concentration bounds in an infinite dimensional Hilbert space established in this work, which may be of independent interest. We further develop an algorithm based on random exploration with domain shrinking and establish its order-optimal regret guarantees under both noise-free and noisy settings. In the noise-free setting, our analysis closes the existing gap in regret performance and thereby resolves a COLT open problem. The proposed algorithm also enjoys a computational advantage over prevailing methods due to the random exploration that obviates the expensive optimization of a non-convex acquisition function for choosing the query points at each iteration.
DoGE: Domain Reweighting with Generalization Estimation
Authors: Authors: Simin Fan, Matteo Pagliardini, Martin Jaggi
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract
The coverage and composition of the pretraining data corpus significantly impacts the generalization ability of large language models. Conventionally, the pretraining corpus is composed of various source domains (e.g. CommonCrawl, Wikipedia, Github etc.) according to certain sampling probabilities (domain weights). However, current methods lack a principled way to optimize domain weights for ultimate goal for generalization. We propose DOmain reweighting with Generalization Estimation (DoGE), where we reweigh the sampling probability from each domain based on its contribution to the final generalization objective assessed by a gradient-based generalization estimation function. First, we train a small-scale proxy model with a min-max optimization to obtain the reweighted domain weights. At each step, the domain weights are updated to maximize the overall generalization gain by mirror descent. Finally we use the obtained domain weights to train a larger scale full-size language model. On SlimPajama-6B dataset, with universal generalization objective, DoGE achieves better average perplexity and zero-shot reasoning accuracy. On out-of-domain generalization tasks, DoGE reduces perplexity on the target domain by a large margin. We further apply a parameter-selection scheme which improves the efficiency of generalization estimation.
Efficient Active Learning Halfspaces with Tsybakov Noise: A Non-convex Optimization Approach
Abstract
We study the problem of computationally and label efficient PAC active learning $d$-dimensional halfspaces with Tsybakov Noise~\citep{tsybakov2004optimal} under structured unlabeled data distributions. Inspired by~\cite{diakonikolas2020learning}, we prove that any approximate first-order stationary point of a smooth nonconvex loss function yields a halfspace with a low excess error guarantee. In light of the above structural result, we design a nonconvex optimization-based algorithm with a label complexity of $\tilde{O}(d (\frac{1}{\epsilon})^{\frac{8-6\alpha}{3\alpha-1}})$\footnote{In the main body of this work, we use $\tilde{O}(\cdot), \tilde{\Theta}(\cdot)$ to hide factors of the form $\polylog(d, \frac{1}{\epsilon}, \frac{1}{\delta})$}, under the assumption that the Tsybakov noise parameter $\alpha \in (\frac13, 1]$, which narrows down the gap between the label complexities of the previously known efficient passive or active algorithms~\citep{diakonikolas2020polynomial,zhang2021improved} and the information-theoretic lower bound in this setting.
Sensor Attacks and Resilient Defense on HVAC Systems for Energy Market Signal Tracking
Abstract
The power flexibility from smart buildings makes them suitable candidates for providing grid services. The building automation system (BAS) that employs model predictive control (MPC) for grid services relies heavily on sensor data gathered from IoT-based HVAC systems through communication networks. However, cyber-attacks that tamper sensor values can compromise the accuracy and flexibility of HVAC system power adjustment. Existing studies on grid-interactive buildings mainly focus on the efficiency and flexibility of buildings' participation in grid operations, while the security aspect is lacking. In this paper, we investigate the effects of cyber-attacks on HVAC systems in grid-interactive buildings, specifically their power-tracking performance. We design a stochastic optimization-based stealthy sensor attack and a corresponding defense strategy using a resilient control framework. The attack and its defense are tested in a physical model of a test building with a single-chiller HVAC system. Simulation results demonstrate that minor falsifications caused by a stealthy sensor attack can significantly alter the power profile, leading to large power tracking errors. However, the resilient control framework can reduce the power tracking error by over 70% under such attacks without filtering out compromised data.
Diverse Conventions for Human-AI Collaboration
Authors: Authors: Bidipta Sarkar, Andy Shih, Dorsa Sadigh
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Abstract
Conventions are crucial for strong performance in cooperative multi-agent games, because they allow players to coordinate on a shared strategy without explicit communication. Unfortunately, standard multi-agent reinforcement learning techniques, such as self-play, converge to conventions that are arbitrary and non-diverse, leading to poor generalization when interacting with new partners. In this work, we present a technique for generating diverse conventions by (1) maximizing their rewards during self-play, while (2) minimizing their rewards when playing with previously discovered conventions (cross-play), stimulating conventions to be semantically different. To ensure that learned policies act in good faith despite the adversarial optimization of cross-play, we introduce \emph{mixed-play}, where an initial state is randomly generated by sampling self-play and cross-play transitions and the player learns to maximize the self-play reward from this initial state. We analyze the benefits of our technique on various multi-agent collaborative games, including Overcooked, and find that our technique can adapt to the conventions of humans, surpassing human-level performance when paired with real users.
Fractal Landscapes in Policy Optimization
Authors: Authors: Tao Wang, Sylvia Herbert, Sicun Gao
Abstract
Policy gradient lies at the core of deep reinforcement learning (RL) in continuous domains. Despite much success, it is often observed in practice that RL training with policy gradient can fail for many reasons, even on standard control problems with known solutions. We propose a framework for understanding one inherent limitation of the policy gradient approach: the optimization landscape in the policy space can be extremely non-smooth or fractal for certain classes of MDPs, such that there does not exist gradient to be estimated in the first place. We draw on techniques from chaos theory and non-smooth analysis, and analyze the maximal Lyapunov exponents and H\"older exponents of the policy optimization objectives. Moreover, we develop a practical method that can estimate the local smoothness of objective function from samples to identify when the training process has encountered fractal landscapes. We show experiments to illustrate how some failure cases of policy optimization can be explained by such fractal landscapes.
Nested Control Co-design of a Spar Buoy Horizontal-axis Floating Offshore Wind Turbine
Authors: Authors: Saeid Bayat, Yong Hoon Lee, James T. Allison
Abstract
Floating offshore wind turbine (FOWT) systems involve several coupled physical analysis disciplines, including aeroelasticity, multi-body structural dynamics, hydrodynamics, and controls. Conventionally, physical structure (plant) and control design decisions are treated as two separate problems, and generally, control design is performed after the plant design is complete. However, this sequential design approach cannot fully capitalize upon the synergy between plant and control design decisions. These conventional design practices produce suboptimal designs, especially in cases with strong coupling between plant and control design decisions. Control co-design (CCD) is a holistic design approach that accounts fully for plant-control design coupling by optimizing these decisions simultaneously. CCD is especially advantageous for system design problems with complex interactions between physics disciplines, which is the case for FOWT systems. This paper presents and demonstrates a nested CCD approach using open-loop optimal control (OLOC) for a simplified reduced-order model that simulates FOWT dynamic behavior. This simplified model is helpful for optimization studies due to its computational efficiency, but is still sufficiently rich enough to capture important multidisciplinary physics couplings and plant-control design coupling associated with a horizontal-axis FOWT system with a spar buoy floating platform. The CCD result shows an improvement in the objective function, annual energy production (AEP), compared to the baseline design by more than eleven percent. Optimization studies at this fidelity level can provide system design engineers with insights into design directions that leverage design coupling to improve performance. These studies also provide a template for future more detailed turbine CCD optimization studies that utilize higher fidelity models and design representations.
Robust Representation Learning for Unified Online Top-K Recommendation
Abstract
In large-scale industrial e-commerce, the efficiency of an online recommendation system is crucial in delivering highly relevant item/content advertising that caters to diverse business scenarios. However, most existing studies focus solely on item advertising, neglecting the significance of content advertising. This oversight results in inconsistencies within the multi-entity structure and unfair retrieval. Furthermore, the challenge of retrieving top-k advertisements from multi-entity advertisements across different domains adds to the complexity. Recent research proves that user-entity behaviors within different domains exhibit characteristics of differentiation and homogeneity. Therefore, the multi-domain matching models typically rely on the hybrid-experts framework with domain-invariant and domain-specific representations. Unfortunately, most approaches primarily focus on optimizing the combination mode of different experts, failing to address the inherent difficulty in optimizing the expert modules themselves. The existence of redundant information across different domains introduces interference and competition among experts, while the distinct learning objectives of each domain lead to varying optimization challenges among experts. To tackle these issues, we propose robust representation learning for the unified online top-k recommendation. Our approach constructs unified modeling in entity space to ensure data fairness. The robust representation learning employs domain adversarial learning and multi-view wasserstein distribution learning to learn robust representations. Moreover, the proposed method balances conflicting objectives through the homoscedastic uncertainty weights and orthogonality constraints. Various experiments validate the effectiveness and rationality of our proposed method, which has been successfully deployed online to serve real business scenarios.
AMG: Automated Efficient Approximate Multiplier Generator for FPGAs via Bayesian Optimization
Abstract
Approximate computing is a promising approach to reduce the power, delay, and area in hardware design for many error-resilient applications such as machine learning (ML) and digital signal processing (DSP) systems, in which multipliers usually are key arithmetic units. Due to the underlying architectural differences between ASICs and FPGAs, existing ASIC-based approximate multipliers do not offer symmetrical gains when they are implemented by FPGA resources. In this paper, we propose AMG, an open-source automated approximate multiplier generator for FPGAs driven by Bayesian optimization (BO) with parallel evaluation. The proposed method simplifies the exact half adders (HAs) for the initial partial product (PP) compression in a multiplier while preserving coarse-grained additions for the following accumulation. The generated multipliers can be effectively mapped to lookup tables (LUTs) and carry chains provided by modern FPGAs, reducing hardware costs with acceptable errors. Compared with 1167 multipliers from previous works, our generated multipliers can form a Pareto front with 28.70%-38.47% improvements in terms of the product of hardware cost and error on average. All source codes, reproduced multipliers, and our generated multipliers are available at https://github.com/phyzhenli/AMG.
Topology Optimization with Text-Guided Stylization
Authors: Authors: Shengze Zhong, Parinya Punpongsanon, Daisuke Iwai, Kosuke Sato
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Abstract
We propose an approach for the generation of topology-optimized structures with text-guided appearance stylization. This methodology aims to enrich the concurrent design of a structure's physical functionality and aesthetic appearance. Users can effortlessly input descriptive text to govern the style of the structure. Our system employs a hash-encoded neural network as the implicit structure representation backbone, which serves as the foundation for the co-optimization of structural mechanical performance, style, and connectivity, to ensure full-color, high-quality 3D-printable solutions. We substantiate the effectiveness of our system through extensive comparisons, demonstrations, and a 3D printing test.
Optimization of process parameters in additive manufacturing based on the finite element method
Abstract
A design optimization framework for process parameters of additive manufacturing based on finite element simulation is proposed. The finite element method uses a coupled thermomechanical model developed for fused deposition modeling from the authors' previous work. Both gradient-based and gradient-free optimization methods are proposed. The gradient-based approach, which solves a PDE-constrained optimization problem, requires sensitivities computed from the fully discretized finite element model. We show the derivation of the sensitivities and apply them in a projected gradient descent algorithm. For the gradient-free approach, we propose two distinct algorithms: a local search algorithm called the method of local variations and a Bayesian optimization algorithm using Gaussian processes. To illustrate the effectiveness and differences of the methods, we provide two-dimensional design optimization examples using all three proposed algorithms.
Capacity-based Spatial Modulation Constellation and Pre-scaling Design
Abstract
Spatial Modulation (SM) can utilize the index of the transmit antenna (TA) to transmit additional information. In this paper, to improve the performance of SM, a non-uniform constellation (NUC) and pre-scaling coefficients optimization design scheme is proposed. The bit-interleaved coded modulation (BICM) capacity calculation formula of SM system is firstly derived. The constellation and pre-scaling coefficients are optimized by maximizing the BICM capacity without channel state information (CSI) feedback. Optimization results are given for the multiple-input-single-output (MISO) system with Rayleigh channel. Simulation result shows the proposed scheme provides a meaningful performance gain compared to conventional SM system without CSI feedback. The proposed optimization design scheme can be a promising technology for future 6G to achieve high-efficiency.
Accelerating Split Federated Learning over Wireless Communication Networks
Abstract
The development of artificial intelligence (AI) provides opportunities for the promotion of deep neural network (DNN)-based applications. However, the large amount of parameters and computational complexity of DNN makes it difficult to deploy it on edge devices which are resource-constrained. An efficient method to address this challenge is model partition/splitting, in which DNN is divided into two parts which are deployed on device and server respectively for co-training or co-inference. In this paper, we consider a split federated learning (SFL) framework that combines the parallel model training mechanism of federated learning (FL) and the model splitting structure of split learning (SL). We consider a practical scenario of heterogeneous devices with individual split points of DNN. We formulate a joint problem of split point selection and bandwidth allocation to minimize the system latency. By using alternating optimization, we decompose the problem into two sub-problems and solve them optimally. Experiment results demonstrate the superiority of our work in latency reduction and accuracy improvement.
Retrieval-based Knowledge Transfer: An Effective Approach for Extreme Large Language Model Compression
Authors: Authors: Jiduan Liu, Jiahao Liu, Qifan Wang, Jingang Wang, Xunliang Cai, Dongyan Zhao, Ran Lucien Wang, Rui Yan
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
Large-scale pre-trained language models (LLMs) have demonstrated exceptional performance in various natural language processing (NLP) tasks. However, the massive size of these models poses huge challenges for their deployment in real-world applications. While numerous model compression techniques have been proposed, most of them are not well-suited for achieving extreme model compression when there is a significant gap in model scale. In this paper, we introduce a novel compression paradigm called Retrieval-based Knowledge Transfer (RetriKT), which effectively transfers the knowledge of LLMs to extremely small-scale models (e.g., 1%). In particular, our approach extracts knowledge from LLMs to construct a knowledge store, from which the small-scale model can retrieve relevant information and leverage it for effective inference. To improve the quality of the model, soft prompt tuning and Proximal Policy Optimization (PPO) reinforcement learning techniques are employed. Extensive experiments are conducted on low-resource tasks from SuperGLUE and GLUE benchmarks. The results demonstrate that the proposed approach significantly enhances the performance of small-scale models by leveraging the knowledge from LLMs.
Deceptive Fairness Attacks on Graphs via Meta Learning
Abstract
We study deceptive fairness attacks on graphs to answer the following question: How can we achieve poisoning attacks on a graph learning model to exacerbate the bias deceptively? We answer this question via a bi-level optimization problem and propose a meta learning-based framework named FATE. FATE is broadly applicable with respect to various fairness definitions and graph learning models, as well as arbitrary choices of manipulation operations. We further instantiate FATE to attack statistical parity and individual fairness on graph neural networks. We conduct extensive experimental evaluations on real-world datasets in the task of semi-supervised node classification. The experimental results demonstrate that FATE could amplify the bias of graph neural networks with or without fairness consideration while maintaining the utility on the downstream task. We hope this paper provides insights into the adversarial robustness of fair graph learning and can shed light on designing robust and fair graph learning in future studies.
Physics-Informed with Power-Enhanced Residual Network for Interpolation and Inverse Problems
Authors: Authors: Amir Noorizadegan, D.L. Young, Y.C. Hon, C.S. Chen
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Analysis of PDEs (math.AP)
Abstract
This paper introduces a novel neural network structure called the Power-Enhancing residual network, designed to improve interpolation capabilities for both smooth and non-smooth functions in 2D and 3D settings. By adding power terms to residual elements, the architecture boosts the network's expressive power. The study explores network depth, width, and optimization methods, showing the architecture's adaptability and performance advantages. Consistently, the results emphasize the exceptional accuracy of the proposed Power-Enhancing residual network, particularly for non-smooth functions. Real-world examples also confirm its superiority over plain neural network in terms of accuracy, convergence, and efficiency. The study also looks at the impact of deeper network. Moreover, the proposed architecture is also applied to solving the inverse Burgers' equation, demonstrating superior performance. In conclusion, the Power-Enhancing residual network offers a versatile solution that significantly enhances neural network capabilities. The codes implemented are available at: \url{https://github.com/CMMAi/ResNet_for_PINN}.
Abstract
3D scene segmentation based on neural implicit representation has emerged recently with the advantage of training only on 2D supervision. However, existing approaches still requires expensive per-scene optimization that prohibits generalization to novel scenes during inference. To circumvent this problem, we introduce a generalizable 3D segmentation framework based on implicit representation. Specifically, our framework takes in multi-view image features and semantic maps as the inputs instead of only spatial information to avoid overfitting to scene-specific geometric and semantic information. We propose a novel soft voting mechanism to aggregate the 2D semantic information from different views for each 3D point. In addition to the image features, view difference information is also encoded in our framework to predict the voting scores. Intuitively, this allows the semantic information from nearby views to contribute more compared to distant ones. Furthermore, a visibility module is also designed to detect and filter out detrimental information from occluded views. Due to the generalizability of our proposed method, we can synthesize semantic maps or conduct 3D semantic segmentation for novel scenes with solely 2D semantic supervision. Experimental results show that our approach achieves comparable performance with scene-specific approaches. More importantly, our approach can even outperform existing strong supervision-based approaches with only 2D annotations. Our source code is available at: https://github.com/HLinChen/GNeSF.
Gradient-Based Eigenvalue Optimization for Electromagnetic Cavities with Built-in Mode Matching
Authors: Authors: Anna Ziegler, Robert Hahn, Victoria Isensee, Anh Duc Nguyen, Sebastian Schöps
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Abstract
Shape optimization with respect to eigenvalues of a cavity plays an important role in the design of new resonators or in the optimization of existing ones. In our paper, we propose a gradient-based optimization scheme, which we enhance with closed-form shape derivatives of the system matrices. Based on these, we can compute accurate derivatives of eigenvalues, eigenmodes and the cost function with respect to the geometry, which significantly reduces the computational effort of the optimizer. We demonstrate our work by applying it to the 9-cell TESLA cavity, for which we tune the design parameters of the computational model to match the design criteria for devices in realistic use cases. Since eigenvalues may cross during the shape optimization of a cavity, we propose a new algorithm based on an eigenvalue matching procedure, to ensure the optimization of the desired mode in order to also enable successful matching along large shape variations.
Improving generalization in large language models by learning prefix subspaces
Authors: Authors: Louis Falissard, Vincent Guigue, Laure Soulier
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract
This article focuses on large language models (LLMs) fine-tuning in the scarce data regime (also known as the "few-shot" learning setting). We propose a method to increase the generalization capabilities of LLMs based on neural network subspaces. This optimization method, recently introduced in computer vision, aims to improve model generalization by identifying wider local optima through the joint optimization of an entire simplex of models in parameter space. Its adaptation to massive, pretrained transformers, however, poses some challenges. First, their considerable number of parameters makes it difficult to train several models jointly, and second, their deterministic parameter initialization schemes make them unfit for the subspace method as originally proposed. We show in this paper that "Parameter Efficient Fine-Tuning" (PEFT) methods, however, are perfectly compatible with this original approach, and propose to learn entire simplex of continuous prefixes. We test our method on a variant of the GLUE benchmark adapted to the few-shot learning setting, and show that both our contributions jointly lead to a gain in average performances compared to sota methods. The implementation can be found at the following link: https://github.com/Liloulou/prefix_subspace
Unnatural language processing: How do language models handle machine-generated prompts?
Authors: Authors: Corentin Kervadec, Francesca Franzon, Marco Baroni
Abstract
Language model prompt optimization research has shown that semantically and grammatically well-formed manually crafted prompts are routinely outperformed by automatically generated token sequences with no apparent meaning or syntactic structure, including sequences of vectors from a model's embedding space. We use machine-generated prompts to probe how models respond to input that is not composed of natural language expressions. We study the behavior of models of different sizes in multiple semantic tasks in response to both continuous and discrete machine-generated prompts, and compare it to the behavior in response to human-generated natural-language prompts. Even when producing a similar output, machine-generated and human prompts trigger different response patterns through the network processing pathways, including different perplexities, different attention and output entropy distributions, and different unit activation profiles. We provide preliminary insight into the nature of the units activated by different prompt types, suggesting that only natural language prompts recruit a genuinely linguistic circuit.
Metric Clustering and MST with Strong and Weak Distance Oracles
Authors: Authors: MohammadHossein Bateni, Prathamesh Dharangutte, Rajesh Jayaram, Chen Wang
Abstract
We study optimization problems in a metric space $(\mathcal{X},d)$ where we can compute distances in two ways: via a ''strong'' oracle that returns exact distances $d(x,y)$, and a ''weak'' oracle that returns distances $\tilde{d}(x,y)$ which may be arbitrarily corrupted with some probability. This model captures the increasingly common trade-off between employing both an expensive similarity model (e.g. a large-scale embedding model), and a less accurate but cheaper model. Hence, the goal is to make as few queries to the strong oracle as possible. We consider both so-called ''point queries'', where the strong oracle is queried on a set of points $S \subset \mathcal{X} $ and returns $d(x,y)$ for all $x,y \in S$, and ''edge queries'' where it is queried for individual distances $d(x,y)$. Our main contributions are optimal algorithms and lower bounds for clustering and Minimum Spanning Tree (MST) in this model. For $k$-centers, $k$-median, and $k$-means, we give constant factor approximation algorithms with only $\tilde{O}(k)$ strong oracle point queries, and prove that $\Omega(k)$ queries are required for any bounded approximation. For edge queries, our upper and lower bounds are both $\tilde{\Theta}(k^2)$. Surprisingly, for the MST problem we give a $O(\sqrt{\log n})$ approximation algorithm using no strong oracle queries at all, and a matching $\Omega(\sqrt{\log n})$ lower bound. We empirically evaluate our algorithms, and show that their quality is comparable to that of the baseline algorithms that are given all true distances, but while querying the strong oracle on only a small fraction ($<1\%$) of points.
Beam Design and Signal Enhancement in RIS-Aided Multi-User Communication Systems: A Maximization-Minimization Approach
Authors: Authors: Rujing Xiong, Jialong Lu, Ke Yin, Tiebin Mi, Robert Caiming Qiu
Abstract
Most beam design schemes in RIS-aided multi-user systems suffer from imbalanced signal power gains and computation complexity. This paper tackles the crucial beam design issue by focusing on received power optimization. Concretely, we leverage the passive characteristics of RIS, modeling the reflecting signals and articulating a comprehensive max-min optimization framework. To efficiently address the beamforming issue, we propose the Moreau-Yosida approximation (MA) algorithm, which pursues the optimal beamforming designs based on arbitrary utility functions in the user's received power. The comprehensive numerical simulations and prototype experiments substantiate the effectiveness of our proposed algorithm.
Mitigate Domain Shift by Primary-Auxiliary Objectives Association for Generalizing Person ReID
Authors: Authors: Qilei Li, Shaogang Gong
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
While deep learning has significantly improved ReID model accuracy under the independent and identical distribution (IID) assumption, it has also become clear that such models degrade notably when applied to an unseen novel domain due to unpredictable/unknown domain shift. Contemporary domain generalization (DG) ReID models struggle in learning domain-invariant representation solely through training on an instance classification objective. We consider that a deep learning model is heavily influenced and therefore biased towards domain-specific characteristics, e.g., background clutter, scale and viewpoint variations, limiting the generalizability of the learned model, and hypothesize that the pedestrians are domain invariant owning they share the same structural characteristics. To enable the ReID model to be less domain-specific from these pure pedestrians, we introduce a method that guides model learning of the primary ReID instance classification objective by a concurrent auxiliary learning objective on weakly labeled pedestrian saliency detection. To solve the problem of conflicting optimization criteria in the model parameter space between the two learning objectives, we introduce a Primary-Auxiliary Objectives Association (PAOA) mechanism to calibrate the loss gradients of the auxiliary task towards the primary learning task gradients. Benefiting from the harmonious multitask learning design, our model can be extended with the recent test-time diagram to form the PAOA+, which performs on-the-fly optimization against the auxiliary objective in order to maximize the model's generative capacity in the test target domain. Experiments demonstrate the superiority of the proposed PAOA model.
GO-FEAP: Global Optimal UAV Planner Using Frontier-Omission-Aware Exploration and Altitude-Stratified Planning
Abstract
Autonomous exploration is a fundamental problem for various applications of unmanned aerial vehicles(UAVs). Existing methods, however, are demonstrated to static local optima and two-dimensional exploration. To address these challenges, this paper introduces GO-FEAP (Global Optimal UAV Planner Using Frontier-Omission-Aware Exploration and Altitude-Stratified Planning), aiming to achieve efficient and complete three-dimensional exploration. Frontier-Omission-Aware Exploration module presented in this work takes into account multiple pivotal factors, encompassing frontier distance, nearby frontier count, frontier duration, and frontier categorization, for a comprehensive assessment of frontier importance. Furthermore, to tackle scenarios with substantial vertical variations, we introduce the Altitude-Stratified Planning strategy, which stratifies the three-dimensional space based on altitude, conducting global-local planning for each stratum. The objective of global planning is to identify the most optimal frontier for exploration, followed by viewpoint selection and local path optimization based on frontier type, ultimately generating dynamically feasible three-dimensional spatial exploration trajectories. We present extensive benchmark and real-world tests, in which our method completes the exploration tasks with unprecedented completeness compared to state-of-the-art approaches.
Comparative Review of Unscented Kalman Filter Design for Agricultural Anaerobic Digestion Model
Authors: Authors: Simon Hellmann, Terrance Wilms, Stefan Streif, Sören Weinrich
Abstract
Dynamic operation of biological processes, such as anaerobic digestion (AD), requires reliable process monitoring to guarantee stable operating conditions at all times. Unscented Kalman filters (UKF) are an established tool for nonlinear state estimation, and there exist numerous variants of UKF implementations, treating state constraints, improvements of numerical performance and different noise scenarios. So far, however, a unified comparison of proposed methods emphasizing the algorithmic details is lacking. The present study thus examines multiple unconstrained and constrained UKF variants, addresses aspects crucial for direct implementation and applies them to a simplified AD model. The constrained UKF considering additive noise delivered the most accurate state estimations. The long run time of the underlying optimization could be vastly reduced through pre-calculated gradients and Hessian of the associated cost function, as well as by reformulation of the cost function as a quadratic program. However, unconstrained UKF variants showed lower run times and competitive estimation accuracy. This study provides useful advice to practitioners working with nonlinear Kalman filters by paying close attention to algorithmic details and modifications crucial for successful implementation.
Convergence of Sign-based Random Reshuffling Algorithms for Nonconvex Optimization
Authors: Authors: Zhen Qin, Zhishuai Liu, Pan Xu
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC); Machine Learning (stat.ML)
Abstract
signSGD is popular in nonconvex optimization due to its communication efficiency. Yet, existing analyses of signSGD rely on assuming that data are sampled with replacement in each iteration, contradicting the practical implementation where data are randomly reshuffled and sequentially fed into the algorithm. We bridge this gap by proving the first convergence result of signSGD with random reshuffling (SignRR) for nonconvex optimization. Given the dataset size $n$, the number of epochs of data passes $T$, and the variance bound of a stochastic gradient $\sigma^2$, we show that SignRR has the same convergence rate $O(\log(nT)/\sqrt{nT} + |\sigma|_1)$ as signSGD \citep{bernstein2018signsgd}. We then present SignRVR and SignRVM, which leverage variance-reduced gradients and momentum updates respectively, both converging at $O(\log(nT)/\sqrt{nT})$. In contrast with the analysis of signSGD, our results do not require an extremely large batch size in each iteration to be of the same order as the total number of iterations \citep{bernstein2018signsgd} or the signs of stochastic and true gradients match element-wise with a minimum probability of 1/2 \citep{safaryan2021stochastic}. We also extend our algorithms to cases where data are distributed across different machines, yielding dist-SignRVR and dist-SignRVM, both converging at $O(\log(n_0T)/\sqrt{n_0T})$, where $n_0$ is the dataset size of a single machine. We back up our theoretical findings through experiments on simulated and real-world problems, verifying that randomly reshuffled sign methods match or surpass existing baselines.
An Iterative Approach to Data-Driven Inference for Decoding Oscillator Network Structures
Authors: Authors: Shicheng Li, Bharat Singhal, Jr-Shin Li
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Abstract
In complex networks, interactions between multiple agents give rise to an array of intricate global dynamics, ranging from synchronization to cluster formations. Decoding the connectivity structure as well as the types of interactions from measurement data is the first step toward understanding these intriguing behaviors. In this paper, we present a bilinear optimization framework to infer both the connectivity and interaction functions of oscillator networks with the identical class of coupling functions. We then propose an iterative algorithm to solve the resulting bilinear problem and illustrate its convergence. We validate our approach on both simulated and noisy experimental datasets, where we demonstrate its effectiveness compared with existing approaches.
White-box Compiler Fuzzing Empowered by Large Language Models
Abstract
Compiler correctness is crucial, as miscompilation falsifying the program behaviors can lead to serious consequences. In the literature, fuzzing has been extensively studied to uncover compiler defects. However, compiler fuzzing remains challenging: Existing arts focus on black- and grey-box fuzzing, which generates tests without sufficient understanding of internal compiler behaviors. As such, they often fail to construct programs to exercise conditions of intricate optimizations. Meanwhile, traditional white-box techniques are computationally inapplicable to the giant codebase of compilers. Recent advances demonstrate that Large Language Models (LLMs) excel in code generation/understanding tasks and have achieved state-of-the-art performance in black-box fuzzing. Nonetheless, prompting LLMs with compiler source-code information remains a missing piece of research in compiler testing. To this end, we propose WhiteFox, the first white-box compiler fuzzer using LLMs with source-code information to test compiler optimization. WhiteFox adopts a dual-model framework: (i) an analysis LLM examines the low-level optimization source code and produces requirements on the high-level test programs that can trigger the optimization; (ii) a generation LLM produces test programs based on the summarized requirements. Additionally, optimization-triggering tests are used as feedback to further enhance the test generation on the fly. Our evaluation on four popular compilers shows that WhiteFox can generate high-quality tests to exercise deep optimizations requiring intricate conditions, practicing up to 80 more optimizations than state-of-the-art fuzzers. To date, WhiteFox has found in total 96 bugs, with 80 confirmed as previously unknown and 51 already fixed. Beyond compiler testing, WhiteFox can also be adapted for white-box fuzzing of other complex, real-world software systems in general.
Keyword: adam
There is no result
Keyword: gradient
Application of deep and reinforcement learning to boundary control problems
Authors: Authors: Zenin Easa Panthakkalakath, Juraj Kardoš, Olaf Schenk
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
Abstract
The boundary control problem is a non-convex optimization and control problem in many scientific domains, including fluid mechanics, structural engineering, and heat transfer optimization. The aim is to find the optimal values for the domain boundaries such that the enclosed domain adhering to the governing equations attains the desired state values. Traditionally, non-linear optimization methods, such as the Interior-Point method (IPM), are used to solve such problems. This project explores the possibilities of using deep learning and reinforcement learning to solve boundary control problems. We adhere to the framework of iterative optimization strategies, employing a spatial neural network to construct well-informed initial guesses, and a spatio-temporal neural network learns the iterative optimization algorithm using policy gradients. Synthetic data, generated from the problems formulated in the literature, is used for training, testing and validation. The numerical experiments indicate that the proposed method can rival the speed and accuracy of existing solvers. In our preliminary results, the network attains costs lower than IPOPT, a state-of-the-art non-linear IPM, in 51\% cases. The overall number of floating point operations in the proposed method is similar to that of IPOPT. Additionally, the informed initial guess method and the learned momentum-like behaviour in the optimizer method are incorporated to avoid convergence to local minima.
GradSim: Gradient-Based Language Grouping for Effective Multilingual Training
Abstract
Most languages of the world pose low-resource challenges to natural language processing models. With multilingual training, knowledge can be shared among languages. However, not all languages positively influence each other and it is an open research question how to select the most suitable set of languages for multilingual training and avoid negative interference among languages whose characteristics or data distributions are not compatible. In this paper, we propose GradSim, a language grouping method based on gradient similarity. Our experiments on three diverse multilingual benchmark datasets show that it leads to the largest performance gains compared to other similarity measures and it is better correlated with cross-lingual model performance. As a result, we set the new state of the art on AfriSenti, a benchmark dataset for sentiment analysis on low-resource African languages. In our extensive analysis, we further reveal that besides linguistic features, the topics of the datasets play an important role for language grouping and that lower layers of transformer models encode language-specific features while higher layers capture task-specific information.
ADMM Training Algorithms for Residual Networks: Convergence, Complexity and Parallel Training
Abstract
We design a series of serial and parallel proximal point (gradient) ADMMs for the fully connected residual networks (FCResNets) training problem by introducing auxiliary variables. Convergence of the proximal point version is proven based on a Kurdyka-Lojasiewicz (KL) property analysis framework, and we can ensure a locally R-linear or sublinear convergence rate depending on the different ranges of the Kurdyka-Lojasiewicz (KL) exponent, in which a necessary auxiliary function is constructed to realize our goal. Moreover, the advantages of the parallel implementation in terms of lower time complexity and less (per-node) memory consumption are analyzed theoretically. To the best of our knowledge, this is the first work analyzing the convergence, convergence rate, time complexity and (per-node) runtime memory requirement of the ADMM applied in the FCResNets training problem theoretically. Experiments are reported to show the high speed, better performance, robustness and potential in the deep network training tasks. Finally, we present the advantage and potential of our parallel training in large-scale problems.
Deep Integrated Explanations
Authors: Authors: Oren Barkan, Yehonathan Elisha, Jonathan Weill, Yuval Asher, Amit Eshel, Noam Koenigstein
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
This paper presents Deep Integrated Explanations (DIX) - a universal method for explaining vision models. DIX generates explanation maps by integrating information from the intermediate representations of the model, coupled with their corresponding gradients. Through an extensive array of both objective and subjective evaluations spanning diverse tasks, datasets, and model configurations, we showcase the efficacy of DIX in generating faithful and accurate explanation maps, while surpassing current state-of-the-art methods.
DoGE: Domain Reweighting with Generalization Estimation
Authors: Authors: Simin Fan, Matteo Pagliardini, Martin Jaggi
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract
The coverage and composition of the pretraining data corpus significantly impacts the generalization ability of large language models. Conventionally, the pretraining corpus is composed of various source domains (e.g. CommonCrawl, Wikipedia, Github etc.) according to certain sampling probabilities (domain weights). However, current methods lack a principled way to optimize domain weights for ultimate goal for generalization. We propose DOmain reweighting with Generalization Estimation (DoGE), where we reweigh the sampling probability from each domain based on its contribution to the final generalization objective assessed by a gradient-based generalization estimation function. First, we train a small-scale proxy model with a min-max optimization to obtain the reweighted domain weights. At each step, the domain weights are updated to maximize the overall generalization gain by mirror descent. Finally we use the obtained domain weights to train a larger scale full-size language model. On SlimPajama-6B dataset, with universal generalization objective, DoGE achieves better average perplexity and zero-shot reasoning accuracy. On out-of-domain generalization tasks, DoGE reduces perplexity on the target domain by a large margin. We further apply a parameter-selection scheme which improves the efficiency of generalization estimation.
Fractal Landscapes in Policy Optimization
Authors: Authors: Tao Wang, Sylvia Herbert, Sicun Gao
Abstract
Policy gradient lies at the core of deep reinforcement learning (RL) in continuous domains. Despite much success, it is often observed in practice that RL training with policy gradient can fail for many reasons, even on standard control problems with known solutions. We propose a framework for understanding one inherent limitation of the policy gradient approach: the optimization landscape in the policy space can be extremely non-smooth or fractal for certain classes of MDPs, such that there does not exist gradient to be estimated in the first place. We draw on techniques from chaos theory and non-smooth analysis, and analyze the maximal Lyapunov exponents and H\"older exponents of the policy optimization objectives. Moreover, we develop a practical method that can estimate the local smoothness of objective function from samples to identify when the training process has encountered fractal landscapes. We show experiments to illustrate how some failure cases of policy optimization can be explained by such fractal landscapes.
Private Learning with Public Features
Authors: Authors: Walid Krichene, Nicolas Mayoraz, Steffen Rendle, Shuang Song, Abhradeep Thakurta, Li Zhang
Abstract
We study a class of private learning problems in which the data is a join of private and public features. This is often the case in private personalization tasks such as recommendation or ad prediction, in which features related to individuals are sensitive, while features related to items (the movies or songs to be recommended, or the ads to be shown to users) are publicly available and do not require protection. A natural question is whether private algorithms can achieve higher utility in the presence of public features. We give a positive answer for multi-encoder models where one of the encoders operates on public features. We develop new algorithms that take advantage of this separation by only protecting certain sufficient statistics (instead of adding noise to the gradient). This method has a guaranteed utility improvement for linear regression, and importantly, achieves the state of the art on two standard private recommendation benchmarks, demonstrating the importance of methods that adapt to the private-public feature separation.
Optimization of process parameters in additive manufacturing based on the finite element method
Abstract
A design optimization framework for process parameters of additive manufacturing based on finite element simulation is proposed. The finite element method uses a coupled thermomechanical model developed for fused deposition modeling from the authors' previous work. Both gradient-based and gradient-free optimization methods are proposed. The gradient-based approach, which solves a PDE-constrained optimization problem, requires sensitivities computed from the fully discretized finite element model. We show the derivation of the sensitivities and apply them in a projected gradient descent algorithm. For the gradient-free approach, we propose two distinct algorithms: a local search algorithm called the method of local variations and a Bayesian optimization algorithm using Gaussian processes. To illustrate the effectiveness and differences of the methods, we provide two-dimensional design optimization examples using all three proposed algorithms.
VMAF Re-implementation on PyTorch: Some Experimental Results
Authors: Authors: Kirill Aistov, Maxim Koroteev
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Based on the standard VMAF implementation we propose an implementation of VMAF using PyTorch framework. For this implementation comparisons with the standard (libvmaf) show the discrepancy $\lesssim 10^{-2}$ in VMAF units. We investigate gradients computation when using VMAF as an objective function and demonstrate that training using this function does not result in ill-behaving gradients.
Deep ReLU neural networks overcome the curse of dimensionality in the numerical approximation of semilinear partial integro-differential equations
Authors: Authors: Ariel Neufeld, Tuan Anh Nguyen, Sizhou Wu
Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP); Probability (math.PR)
Abstract
We prove that deep neural networks with ReLU activation function are capable of approximating solutions of semilinear partial integro-differential equations in the case of gradient-independent and Lipschitz-continuous nonlinearities, while the required number of parameters in the neural networks grows at most polynomially in both the dimension $ d\in\mathbb{N} $ and the reciprocal of the prescribed accuracy $ \epsilon $.
Momentum Gradient-based Untargeted Attack on Hypergraph Neural Networks
Authors: Authors: Yang Chen, Stjepan Picek, Zhonglin Ye, Zhaoyang Wang, Haixing Zhao
Abstract
Hypergraph Neural Networks (HGNNs) have been successfully applied in various hypergraph-related tasks due to their excellent higher-order representation capabilities. Recent works have shown that deep learning models are vulnerable to adversarial attacks. Most studies on graph adversarial attacks have focused on Graph Neural Networks (GNNs), and the study of adversarial attacks on HGNNs remains largely unexplored. In this paper, we try to reduce this gap. We design a new HGNNs attack model for the untargeted attack, namely MGHGA, which focuses on modifying node features. We consider the process of HGNNs training and use a surrogate model to implement the attack before hypergraph modeling. Specifically, MGHGA consists of two parts: feature selection and feature modification. We use a momentum gradient mechanism to choose the attack node features in the feature selection module. In the feature modification module, we use two feature generation approaches (direct modification and sign gradient) to enable MGHGA to be employed on discrete and continuous datasets. We conduct extensive experiments on five benchmark datasets to validate the attack performance of MGHGA in the node and the visual object classification tasks. The results show that MGHGA improves performance by an average of 2% compared to the than the baselines.
Abstract
Image style transfer is a challenging task in computational vision. Existing algorithms transfer the color and texture of style images by controlling the neural network's feature layers. However, they fail to control the strength of textures in different regions of the content image. To address this issue, we propose a training method that uses a loss function to constrain the style intensity in different regions. This method guides the transfer strength of style features in different regions based on the gradient relationship between style and content images. Additionally, we introduce a novel feature fusion method that linearly transforms content features to resemble style features while preserving their semantic relationships. Extensive experiments have demonstrated the effectiveness of our proposed approach.
Query-adaptive DETR for Crowded Pedestrian Detection
Authors: Authors: Feng Gao, Jiaxu Leng, Ji Gan, Xinbo Gao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
DEtection TRansformer (DETR) and its variants (DETRs) have been successfully applied to crowded pedestrian detection, which achieved promising performance. However, we find that, in different degrees of crowded scenes, the number of DETRs' queries must be adjusted manually, otherwise, the performance would degrade to varying degrees. In this paper, we first analyze the two current query generation methods and summarize four guidelines for designing the adaptive query generation method. Then, we propose Rank-based Adaptive Query Generation (RAQG) to alleviate the problem. Specifically, we design a rank prediction head that can predict the rank of the lowest confidence positive training sample produced by the encoder. Based on the predicted rank, we design an adaptive selection method that can adaptively select coarse detection results produced by the encoder to generate queries. Moreover, to train the rank prediction head better, we propose Soft Gradient L1 Loss. The gradient of Soft Gradient L1 Loss is continuous, which can describe the relationship between the loss value and the updated value of model parameters granularly. Our method is simple and effective, which can be plugged into any DETRs to make it query-adaptive in theory. The experimental results on Crowdhuman dataset and Citypersons dataset show that our method can adaptively generate queries for DETRs and achieve competitive results. Especially, our method achieves state-of-the-art 39.4% MR on Crowdhuman dataset.
Gradient-Based Eigenvalue Optimization for Electromagnetic Cavities with Built-in Mode Matching
Authors: Authors: Anna Ziegler, Robert Hahn, Victoria Isensee, Anh Duc Nguyen, Sebastian Schöps
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Abstract
Shape optimization with respect to eigenvalues of a cavity plays an important role in the design of new resonators or in the optimization of existing ones. In our paper, we propose a gradient-based optimization scheme, which we enhance with closed-form shape derivatives of the system matrices. Based on these, we can compute accurate derivatives of eigenvalues, eigenmodes and the cost function with respect to the geometry, which significantly reduces the computational effort of the optimizer. We demonstrate our work by applying it to the 9-cell TESLA cavity, for which we tune the design parameters of the computational model to match the design criteria for devices in realistic use cases. Since eigenvalues may cross during the shape optimization of a cavity, we propose a new algorithm based on an eigenvalue matching procedure, to ensure the optimization of the desired mode in order to also enable successful matching along large shape variations.
Mitigate Domain Shift by Primary-Auxiliary Objectives Association for Generalizing Person ReID
Authors: Authors: Qilei Li, Shaogang Gong
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
While deep learning has significantly improved ReID model accuracy under the independent and identical distribution (IID) assumption, it has also become clear that such models degrade notably when applied to an unseen novel domain due to unpredictable/unknown domain shift. Contemporary domain generalization (DG) ReID models struggle in learning domain-invariant representation solely through training on an instance classification objective. We consider that a deep learning model is heavily influenced and therefore biased towards domain-specific characteristics, e.g., background clutter, scale and viewpoint variations, limiting the generalizability of the learned model, and hypothesize that the pedestrians are domain invariant owning they share the same structural characteristics. To enable the ReID model to be less domain-specific from these pure pedestrians, we introduce a method that guides model learning of the primary ReID instance classification objective by a concurrent auxiliary learning objective on weakly labeled pedestrian saliency detection. To solve the problem of conflicting optimization criteria in the model parameter space between the two learning objectives, we introduce a Primary-Auxiliary Objectives Association (PAOA) mechanism to calibrate the loss gradients of the auxiliary task towards the primary learning task gradients. Benefiting from the harmonious multitask learning design, our model can be extended with the recent test-time diagram to form the PAOA+, which performs on-the-fly optimization against the auxiliary objective in order to maximize the model's generative capacity in the test target domain. Experiments demonstrate the superiority of the proposed PAOA model.
Contrastive Learning-based Sentence Encoders Implicitly Weight Informative Words
Abstract
The performance of sentence encoders can be significantly improved through the simple practice of fine-tuning using contrastive loss. A natural question arises: what characteristics do models acquire during contrastive learning? This paper theoretically and experimentally shows that contrastive-based sentence encoders implicitly weight words based on information-theoretic quantities; that is, more informative words receive greater weight, while others receive less. The theory states that, in the lower bound of the optimal value of the contrastive learning objective, the norm of word embedding reflects the information gain associated with the distribution of surrounding words. We also conduct comprehensive experiments using various models, multiple datasets, two methods to measure the implicit weighting of models (Integrated Gradients and SHAP), and two information-theoretic quantities (information gain and self-information). The results provide empirical evidence that contrastive fine-tuning emphasizes informative words.
Comparative Review of Unscented Kalman Filter Design for Agricultural Anaerobic Digestion Model
Authors: Authors: Simon Hellmann, Terrance Wilms, Stefan Streif, Sören Weinrich
Abstract
Dynamic operation of biological processes, such as anaerobic digestion (AD), requires reliable process monitoring to guarantee stable operating conditions at all times. Unscented Kalman filters (UKF) are an established tool for nonlinear state estimation, and there exist numerous variants of UKF implementations, treating state constraints, improvements of numerical performance and different noise scenarios. So far, however, a unified comparison of proposed methods emphasizing the algorithmic details is lacking. The present study thus examines multiple unconstrained and constrained UKF variants, addresses aspects crucial for direct implementation and applies them to a simplified AD model. The constrained UKF considering additive noise delivered the most accurate state estimations. The long run time of the underlying optimization could be vastly reduced through pre-calculated gradients and Hessian of the associated cost function, as well as by reformulation of the cost function as a quadratic program. However, unconstrained UKF variants showed lower run times and competitive estimation accuracy. This study provides useful advice to practitioners working with nonlinear Kalman filters by paying close attention to algorithmic details and modifications crucial for successful implementation.
Convergence of Sign-based Random Reshuffling Algorithms for Nonconvex Optimization
Authors: Authors: Zhen Qin, Zhishuai Liu, Pan Xu
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC); Machine Learning (stat.ML)
Abstract
signSGD is popular in nonconvex optimization due to its communication efficiency. Yet, existing analyses of signSGD rely on assuming that data are sampled with replacement in each iteration, contradicting the practical implementation where data are randomly reshuffled and sequentially fed into the algorithm. We bridge this gap by proving the first convergence result of signSGD with random reshuffling (SignRR) for nonconvex optimization. Given the dataset size $n$, the number of epochs of data passes $T$, and the variance bound of a stochastic gradient $\sigma^2$, we show that SignRR has the same convergence rate $O(\log(nT)/\sqrt{nT} + |\sigma|_1)$ as signSGD \citep{bernstein2018signsgd}. We then present SignRVR and SignRVM, which leverage variance-reduced gradients and momentum updates respectively, both converging at $O(\log(nT)/\sqrt{nT})$. In contrast with the analysis of signSGD, our results do not require an extremely large batch size in each iteration to be of the same order as the total number of iterations \citep{bernstein2018signsgd} or the signs of stochastic and true gradients match element-wise with a minimum probability of 1/2 \citep{safaryan2021stochastic}. We also extend our algorithms to cases where data are distributed across different machines, yielding dist-SignRVR and dist-SignRVM, both converging at $O(\log(n_0T)/\sqrt{n_0T})$, where $n_0$ is the dataset size of a single machine. We back up our theoretical findings through experiments on simulated and real-world problems, verifying that randomly reshuffled sign methods match or surpass existing baselines.
Keyword: sgd
Privacy Amplification for Matrix Mechanisms
Learning From Free-Text Human Feedback -- Collect New Datasets Or Extend Existing Ones?
Convergence of Sign-based Random Reshuffling Algorithms for Nonconvex Optimization
Keyword: optimization
Fast Path Planning for Autonomous Vehicle Parking with Safety-Guarantee using Hamilton-Jacobi Reachability
Application of deep and reinforcement learning to boundary control problems
Neural Multi-Objective Combinatorial Optimization with Diversity Enhancement
Efficient Meta Neural Heuristic for Multi-Objective Combinatorial Optimization
Triple Simplex Matrix Completion for Expense Forecasting
Random Exploration in Bayesian Optimization: Order-Optimal Regret and Computational Efficiency
DoGE: Domain Reweighting with Generalization Estimation
Efficient Active Learning Halfspaces with Tsybakov Noise: A Non-convex Optimization Approach
Sensor Attacks and Resilient Defense on HVAC Systems for Energy Market Signal Tracking
Diverse Conventions for Human-AI Collaboration
Fractal Landscapes in Policy Optimization
Nested Control Co-design of a Spar Buoy Horizontal-axis Floating Offshore Wind Turbine
Robust Representation Learning for Unified Online Top-K Recommendation
AMG: Automated Efficient Approximate Multiplier Generator for FPGAs via Bayesian Optimization
Topology Optimization with Text-Guided Stylization
Optimization of process parameters in additive manufacturing based on the finite element method
Capacity-based Spatial Modulation Constellation and Pre-scaling Design
Accelerating Split Federated Learning over Wireless Communication Networks
Retrieval-based Knowledge Transfer: An Effective Approach for Extreme Large Language Model Compression
Deceptive Fairness Attacks on Graphs via Meta Learning
Physics-Informed with Power-Enhanced Residual Network for Interpolation and Inverse Problems
GNeSF: Generalizable Neural Semantic Fields
Gradient-Based Eigenvalue Optimization for Electromagnetic Cavities with Built-in Mode Matching
Improving generalization in large language models by learning prefix subspaces
Unnatural language processing: How do language models handle machine-generated prompts?
Metric Clustering and MST with Strong and Weak Distance Oracles
Beam Design and Signal Enhancement in RIS-Aided Multi-User Communication Systems: A Maximization-Minimization Approach
Mitigate Domain Shift by Primary-Auxiliary Objectives Association for Generalizing Person ReID
GO-FEAP: Global Optimal UAV Planner Using Frontier-Omission-Aware Exploration and Altitude-Stratified Planning
Comparative Review of Unscented Kalman Filter Design for Agricultural Anaerobic Digestion Model
Convergence of Sign-based Random Reshuffling Algorithms for Nonconvex Optimization
An Iterative Approach to Data-Driven Inference for Decoding Oscillator Network Structures
White-box Compiler Fuzzing Empowered by Large Language Models
Keyword: adam
There is no result
Keyword: gradient
Application of deep and reinforcement learning to boundary control problems
GradSim: Gradient-Based Language Grouping for Effective Multilingual Training
ADMM Training Algorithms for Residual Networks: Convergence, Complexity and Parallel Training
Deep Integrated Explanations
DoGE: Domain Reweighting with Generalization Estimation
Fractal Landscapes in Policy Optimization
Private Learning with Public Features
Optimization of process parameters in additive manufacturing based on the finite element method
VMAF Re-implementation on PyTorch: Some Experimental Results
Deep ReLU neural networks overcome the curse of dimensionality in the numerical approximation of semilinear partial integro-differential equations
Momentum Gradient-based Untargeted Attack on Hypergraph Neural Networks
Region-controlled Style Transfer
Query-adaptive DETR for Crowded Pedestrian Detection
Gradient-Based Eigenvalue Optimization for Electromagnetic Cavities with Built-in Mode Matching
Mitigate Domain Shift by Primary-Auxiliary Objectives Association for Generalizing Person ReID
Contrastive Learning-based Sentence Encoders Implicitly Weight Informative Words
Comparative Review of Unscented Kalman Filter Design for Agricultural Anaerobic Digestion Model
Convergence of Sign-based Random Reshuffling Algorithms for Nonconvex Optimization
Keyword: super-resolution
There is no result