New submissions for Tue, 3 Oct 23

Keyword: sgd

On Memorization and Privacy risks of Sharpness Aware Minimization

Authors: Young In Kim, Pratiksha Agrawal, Johannes O. Royset, Rajiv Khanna
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.00488
Pdf link: https://arxiv.org/pdf/2310.00488
Abstract Many recent works focus on designing algorithms that seek flatter optima for neural network loss optimization, as there is empirical evidence that flatter optima lead to better generalization performance on many datasets. In this work, we dissect these performance gains through the lens of data memorization in overparameterized models. We define a new metric that helps us identify on which data points, specifically, algorithms seeking flatter optima do better than vanilla SGD. We find that the generalization gains achieved by Sharpness Aware Minimization (SAM) are particularly pronounced for atypical data points, which necessitate memorization. This insight helps us unearth higher privacy risks associated with SAM, which we verify through exhaustive empirical evaluations. Finally, we propose mitigation strategies to achieve a more desirable accuracy vs. privacy tradeoff.
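For readers unfamiliar with SAM, here is a minimal sketch of one update step, assuming the commonly used formulation: perturb the weights by `rho` times the normalized gradient (an approximate worst case inside an L2 ball), then descend using the gradient taken at the perturbed point. The function names and the `lr`/`rho` values are illustrative assumptions; this is not the authors' code and does not implement their memorization metric.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One Sharpness Aware Minimization step (sketch). grad_fn(w) returns dLoss/dw."""
    g = grad_fn(w)
    # Ascent: move to the approximate worst-case point within an L2 ball of radius rho.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Descent: apply the gradient evaluated at the perturbed weights to the original weights.
    return w - lr * grad_fn(w + eps)

def sgd_step(w, grad_fn, lr=0.1):
    """Vanilla (full-gradient here) SGD step, for comparison."""
    return w - lr * grad_fn(w)

# Usage on the quadratic loss 0.5 * ||w||^2, whose gradient is w itself.
w = np.array([3.0, 4.0])
for _ in range(300):
    w = sam_step(w, lambda v: v, lr=0.1, rho=0.05)
# With a fixed rho, SAM does not settle exactly at the minimum; it keeps
# oscillating around it at a scale on the order of lr * rho.
print(np.linalg.norm(w))
```

The two gradient evaluations per step are the main extra cost of SAM relative to SGD.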
The Noise Geometry of Stochastic Gradient Descent: A Quantitative and Analytical Characterization
Authors: Mingze Wang, Lei Wu
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.00692
Pdf link: https://arxiv.org/pdf/2310.00692
Abstract Empirical studies have demonstrated that the noise in stochastic gradient descent (SGD) aligns favorably with the local geometry of the loss landscape. However, theoretical and quantitative explanations for this phenomenon remain scarce. In this paper, we offer a comprehensive theoretical investigation into the aforementioned noise geometry for over-parameterized linear models (OLMs) and two-layer neural networks. We scrutinize both average and directional alignments, paying special attention to how factors such as sample size and input data degeneracy affect the alignment strength. As a specific application, we leverage our noise geometry characterizations to study how SGD escapes from sharp minima, revealing that the escape direction has significant components along flat directions. This is in stark contrast to GD, which escapes only along the sharpest directions. To substantiate our theoretical findings, both synthetic and real-world experiments are provided.
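To make "noise aligned with local geometry" concrete, the sketch below computes, for an over-parameterized least-squares model, the covariance of per-sample gradients (the SGD noise covariance, up to a batch-size factor) and the Hessian, then compares them with a normalized Frobenius inner product. That alignment score is an illustrative assumption, not necessarily the average or directional alignment measure defined in the paper; the data are random placeholders.

```python
import numpy as np

def sgd_noise_covariance(X, y, w):
    """Covariance of per-sample gradients of 0.5 * (x_i . w - y_i)^2 at w."""
    residuals = X @ w - y                 # shape (n,)
    grads = X * residuals[:, None]        # per-sample gradients, shape (n, d)
    centered = grads - grads.mean(axis=0)
    return centered.T @ centered / X.shape[0]

def alignment(sigma, hessian):
    """<Sigma, H>_F / (||Sigma||_F * ||H||_F); lies in [0, 1] for PSD inputs."""
    return np.trace(sigma @ hessian) / (np.linalg.norm(sigma) * np.linalg.norm(hessian))

rng = np.random.default_rng(0)
n, d = 20, 50                             # over-parameterized: d > n
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)
w = rng.standard_normal(d)
H = X.T @ X / n                           # Hessian of the mean squared loss
print(alignment(sgd_noise_covariance(X, y, w), H))
```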
Efficient Algorithms for the CCA Family: Unconstrained Objectives with Unbiased Gradients
Authors: James Chapman, Ana Lawry Aguila, Lennie Wells
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.01012
Pdf link: https://arxiv.org/pdf/2310.01012
Abstract The Canonical Correlation Analysis (CCA) family of methods is foundational in multi-view learning. Regularised linear CCA methods can be seen to generalise Partial Least Squares (PLS) and can be unified within a Generalized Eigenvalue Problem (GEP) framework. However, classical algorithms for these linear methods are computationally infeasible for large-scale data. Extensions to Deep CCA show great promise, but current training procedures are slow and complicated. First, we propose a novel unconstrained objective that characterizes the top subspace of GEPs. Our core contribution is a family of fast algorithms for stochastic PLS, stochastic CCA, and Deep CCA, obtained simply by applying stochastic gradient descent (SGD) to the corresponding CCA objectives. These methods converge far faster and recover higher correlations than the previous state of the art on all standard CCA and Deep CCA benchmarks. This speed allows us to perform a first-of-its-kind PLS analysis of an extremely large biomedical dataset from the UK Biobank, with over 33,000 individuals and 500,000 variants. Finally, we not only match the performance of 'CCA-family' Self-Supervised Learning (SSL) methods on CIFAR-10 and CIFAR-100 with minimal hyper-parameter tuning, but also establish the first solid theoretical links to classical CCA, laying the groundwork for future insights.
Stability and Generalization for Minibatch SGD and Local SGD
Authors: Yunwen Lei, Tao Sun, Mingrui Liu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.01139
Pdf link: https://arxiv.org/pdf/2310.01139
Abstract The increasing scale of data propels the popularity of leveraging parallelism to speed up optimization. Minibatch stochastic gradient descent (minibatch SGD) and local SGD are two popular methods for parallel optimization. Existing theoretical studies show a linear speedup of these methods with respect to the number of machines, but this speedup is measured in terms of optimization error. By comparison, the stability and generalization of these methods are much less studied. In this paper, we pioneer the stability and generalization analysis of minibatch and local SGD to understand their learnability. We incorporate training errors into the stability analysis, which shows how small training errors help generalization for overparameterized models. Our stability bounds imply optimistic risk bounds that decay fast under a low-noise condition. We show that both minibatch and local SGD achieve a linear speedup in attaining the optimal risk bounds.
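The two parallel schemes differ in when machines communicate. A minimal sketch on a least-squares problem, assuming each simulated machine holds its own data shard: minibatch SGD averages gradients and takes one synchronized step per round, whereas local SGD lets every machine take several independent steps and then averages the models. Data, step size, and round counts are hypothetical; the sketch illustrates the algorithms, not the paper's stability analysis.

```python
import numpy as np

def grad(w, X, y):
    """Gradient of the mean squared loss 0.5 * mean((Xw - y)^2)."""
    return X.T @ (X @ w - y) / len(y)

def minibatch_sgd(shards, w, lr, rounds, batch, rng):
    for _ in range(rounds):
        grads = []
        for X, y in shards:  # every machine samples a minibatch at the shared iterate
            idx = rng.choice(len(y), size=batch, replace=False)
            grads.append(grad(w, X[idx], y[idx]))
        w = w - lr * np.mean(grads, axis=0)  # one synchronized step per round
    return w

def local_sgd(shards, w, lr, rounds, batch, local_steps, rng):
    for _ in range(rounds):
        local_models = []
        for X, y in shards:
            v = w.copy()
            for _ in range(local_steps):  # independent steps, no communication
                idx = rng.choice(len(y), size=batch, replace=False)
                v = v - lr * grad(v, X[idx], y[idx])
            local_models.append(v)
        w = np.mean(local_models, axis=0)  # average the models once per round
    return w

rng = np.random.default_rng(0)
d = 5
w_true = rng.standard_normal(d)
shards = []
for _ in range(4):  # four simulated machines
    X = rng.standard_normal((100, d))
    shards.append((X, X @ w_true + 0.1 * rng.standard_normal(100)))

def full_loss(w):
    return np.mean([0.5 * np.mean((X @ w - y) ** 2) for X, y in shards])

w0 = np.zeros(d)
w_mb = minibatch_sgd(shards, w0, lr=0.1, rounds=50, batch=10, rng=rng)
w_loc = local_sgd(shards, w0, lr=0.1, rounds=50, batch=10, local_steps=5, rng=rng)
print(full_loss(w0), full_loss(w_mb), full_loss(w_loc))
```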
Improving Dialogue Management: Quality Datasets vs Models
Authors: Miguel Ángel Medina-Ramírez, Cayetano Guerra-Artal, Mario Hernández-Tejera
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.01339
Pdf link: https://arxiv.org/pdf/2310.01339
Abstract Task-oriented dialogue systems (TODS) have become crucial for users to interact with machines and computers using natural language. One of their key components is the dialogue manager (DM), which guides the conversation towards a good goal for the user by providing the best possible response. Previous works have proposed rule-based systems (RBS), reinforcement learning (RL), and supervised learning (SL) as solutions for correct dialogue management, that is, selecting the best response given the user's input. However, this work argues that the leading cause of DMs not achieving maximum performance resides in the quality of the datasets rather than in the models employed thus far; that is, dataset errors such as mislabeling cause a large percentage of failures in dialogue management. To demonstrate this hypothesis, we studied the main errors in the most widely used datasets, MultiWOZ 2.1 and SGD. To do this, we designed a synthetic dialogue generator that fully controls the amount and type of errors introduced into the dataset. Using this generator, we demonstrated that errors in the datasets affect the performance of the models proportionally.

Keyword: optimization

Low-budget Black-box Optimization Algorithms Evaluated on BBOB and OpenAI Gym
Authors: Elena Raponi, Nathanael Rakotonirina Carraz, Jérémy Rapin, Carola Doerr, Olivier Teytaud
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.00077
Pdf link: https://arxiv.org/pdf/2310.00077
Abstract The growing ubiquity of machine learning (ML) has led it to enter various areas of computer science, including black-box optimization (BBO). Recent research is particularly concerned with Bayesian optimization (BO). BO-based algorithms are popular in the ML community, as they are used for hyperparameter optimization and, more generally, for algorithm configuration. However, their efficiency decreases as the dimensionality of the problem and the evaluation budget increase. Meanwhile, derivative-free optimization methods have evolved independently in the optimization community. We therefore seek to understand whether cross-fertilization is possible between the two communities, ML and BBO, i.e., whether algorithms that are heavily used in ML also work well in BBO and vice versa. Comparative experiments often involve rather small benchmarks and show visible problems in the experimental setup, such as poor initialization of baselines, overfitting due to problem-specific setting of hyperparameters, and low statistical significance. With this paper, we update and extend a comparative study presented by Hutter et al. in 2013. We compare BBO tools for ML with more classical heuristics, first on the well-known BBOB benchmark suite from the COCO environment and then on Direct Policy Search for OpenAI Gym, a reinforcement learning benchmark. Our results confirm that BO-based optimizers perform well on both benchmarks when budgets are limited, albeit with a higher computational cost, while they are often outperformed by algorithms from other families when the evaluation budget becomes larger. We also show that some algorithms from the BBO community perform surprisingly well on ML tasks.
Voice2Action: Language Models as Agent for Efficient Real-Time Interaction in Virtual Reality
Authors: Yang Su
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2310.00092
Pdf link: https://arxiv.org/pdf/2310.00092
Abstract Large Language Models (LLMs) are trained and aligned to follow natural language instructions with only a handful of examples, and they are prompted as task-driven autonomous agents to adapt to various execution environments. However, deploying agent LLMs in virtual reality (VR) has been challenging due to the lack of efficiency in online interactions and the complex manipulation categories in 3D environments. In this work, we propose Voice2Action, a framework that hierarchically analyzes customized voice signals and textual commands through action and entity extraction and divides the execution tasks into canonical interaction subsets in real time, with error prevention from environment feedback. Experimental results in an urban engineering VR environment with synthetic instruction data show that Voice2Action performs more efficiently and accurately than approaches without these optimizations.
Certified Robustness via Dynamic Margin Maximization and Improved Lipschitz Regularization
Authors: Mahyar Fazlyab, Taha Entesari, Aniket Roy, Rama Chellappa
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.00116
Pdf link: https://arxiv.org/pdf/2310.00116
Abstract To improve the robustness of deep classifiers against adversarial perturbations, many approaches have been proposed, such as designing new architectures with better robustness properties (e.g., Lipschitz-capped networks), or modifying the training process itself (e.g., min-max optimization, constrained learning, or regularization). These approaches, however, might not be effective at increasing the margin in the input (feature) space. As a result, there has been an increasing interest in developing training procedures that can directly manipulate the decision boundary in the input space. In this paper, we build upon recent developments in this category by developing a robust training algorithm whose objective is to increase the margin in the output (logit) space while regularizing the Lipschitz constant of the model along vulnerable directions. We show that these two objectives can directly promote larger margins in the input space. To this end, we develop a scalable method for calculating guaranteed differentiable upper bounds on the Lipschitz constant of neural networks accurately and efficiently. The relative accuracy of the bounds prevents excessive regularization and allows for more direct manipulation of the decision boundary. Furthermore, our Lipschitz bounding algorithm exploits the monotonicity and Lipschitz continuity of the activation layers, and the resulting bounds can be used to design new layers with controllable bounds on their Lipschitz constant. Experiments on the MNIST, CIFAR-10, and Tiny-ImageNet data sets verify that our proposed algorithm obtains competitively improved results compared to the state-of-the-art.
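As a point of reference for "upper bounds on the Lipschitz constant", the sketch below computes the classical, often loose bound: the product of the layers' spectral norms, valid for feedforward networks whose activations are 1-Lipschitz (such as ReLU). This is explicitly not the paper's tighter bounding algorithm, which additionally exploits activation monotonicity; the network sizes and weights are random placeholders.

```python
import numpy as np

def naive_lipschitz_bound(weights):
    """Product of layer spectral norms: an upper bound on the L2 Lipschitz
    constant of a feedforward net whose activations are all 1-Lipschitz."""
    return float(np.prod([np.linalg.norm(W, 2) for W in weights]))

def relu_net(x, weights):
    """Forward pass: ReLU after every layer except the last (logit) layer."""
    for W in weights[:-1]:
        x = np.maximum(W @ x, 0.0)  # ReLU is 1-Lipschitz
    return weights[-1] @ x

rng = np.random.default_rng(0)
weights = [rng.standard_normal((16, 8)),
           rng.standard_normal((16, 16)),
           rng.standard_normal((3, 16))]
bound = naive_lipschitz_bound(weights)
print(bound)
```

Because each layer's worst-case direction rarely lines up with the next layer's, this product can overestimate the true constant substantially, which is the kind of excessive regularization tighter bounds aim to avoid.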
On the Disconnect Between Theory and Practice of Overparametrized Neural Networks
Authors: Jonathan Wenger, Felix Dangel, Agustinus Kristiadi
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.00137
Pdf link: https://arxiv.org/pdf/2310.00137
Abstract The infinite-width limit of neural networks (NNs) has garnered significant attention as a theoretical framework for analyzing the behavior of large-scale, overparametrized networks. By approaching infinite width, NNs effectively converge to a linear model with features characterized by the neural tangent kernel (NTK). This establishes a connection between NNs and kernel methods, the latter of which are well understood. Based on this link, theoretical benefits and algorithmic improvements have been hypothesized and empirically demonstrated in synthetic architectures. These advantages include faster optimization, reliable uncertainty quantification and improved continual learning. However, current results quantifying the rate of convergence to the kernel regime suggest that exploiting these benefits requires architectures that are orders of magnitude wider than they are deep. This assumption raises concerns that practically relevant architectures do not exhibit behavior as predicted via the NTK. In this work, we empirically investigate whether the limiting regime either describes the behavior of large-width architectures used in practice or is informative for algorithmic improvements. Our empirical results demonstrate that this is not the case in optimization, uncertainty quantification or continual learning. This observed disconnect between theory and practice calls into question the practical relevance of the infinite-width limit.
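The neural tangent kernel mentioned above is the inner product of parameter gradients, K(x, x') = <df(x)/dtheta, df(x')/dtheta>. A minimal sketch for a two-layer ReLU network in the NTK parameterization f(x) = a . relu(W x) / sqrt(m), assuming standard-normal initialization; the architecture and widths are illustrative, not those studied in the paper.

```python
import numpy as np

def empirical_ntk(X, W, a):
    """Empirical NTK of f(x) = a @ relu(W @ x) / sqrt(m) w.r.t. parameters (W, a).

    X: (n, d) inputs, W: (m, d) first-layer weights, a: (m,) output weights.
    """
    m = W.shape[0]
    pre = X @ W.T                    # (n, m) pre-activations
    act = np.maximum(pre, 0.0)       # df/da_j = relu(w_j . x) / sqrt(m)
    mask = (pre > 0).astype(float)   # ReLU derivative 1[w_j . x > 0]
    k_a = act @ act.T                # output-layer contribution
    # df/dW_j = a_j 1[w_j . x > 0] x / sqrt(m), so the W contribution is
    # (x . x') * sum_j a_j^2 1[w_j . x > 0] 1[w_j . x' > 0].
    k_w = (X @ X.T) * ((mask * a**2) @ mask.T)
    return (k_a + k_w) / m

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))
for m in (10, 10_000):  # hypothetical widths
    W = rng.standard_normal((m, 3))
    a = rng.standard_normal(m)
    # Fluctuations across random initializations shrink as the width m grows.
    print(m, np.round(empirical_ntk(X, W, a), 2))
```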
3D Reconstruction in Noisy Agricultural Environments: A Bayesian Optimization Perspective for View Planning
Authors: Athanasios Bacharis, Konstantinos D. Polyzos, Henry J. Nelson, Georgios B. Giannakis, Nikolaos Papanikolopoulos
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.00145
Pdf link: https://arxiv.org/pdf/2310.00145
Abstract 3D reconstruction is a fundamental task in robotics that has gained attention due to its major impact in a wide variety of practical settings, including agriculture, underwater, and urban environments. An important approach for this task, known as view planning, is to judiciously place a number of cameras in positions that maximize the visual information, thereby improving the resulting 3D reconstruction. Circumventing the need for a large number of arbitrary images, geometric criteria can be applied to select fewer yet more informative images to markedly improve the 3D reconstruction performance. Nonetheless, incorporating the noise of the environment that exists in various real-world scenarios into these criteria may be challenging, particularly when prior information about the noise is not provided. To that end, this work advocates a novel geometric function that accounts for the existing noise, relying solely on a relatively small number of noise realizations without requiring its closed-form expression. With no analytic expression of the geometric function, this work puts forth a Bayesian optimization algorithm for accurate 3D reconstruction in the presence of noise. Numerical tests on noisy agricultural environments showcase the impressive merits of the proposed approach for 3D reconstruction with even a small number of available cameras.
Primal-Dual Continual Learning: Stability and Plasticity through Lagrange Multipliers
Authors: Juan Elenter, Navid NaderiAlizadeh, Tara Javidi, Alejandro Ribeiro
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2310.00154
Pdf link: https://arxiv.org/pdf/2310.00154
Abstract Continual learning is inherently a constrained learning problem. The goal is to learn a predictor under a no-forgetting requirement. Although several prior studies formulate it as such, they do not solve the constrained problem explicitly. In this work, we show that it is both possible and beneficial to undertake the constrained optimization problem directly. To do this, we leverage recent results in constrained learning through Lagrangian duality. We focus on memory-based methods, where a small subset of samples from previous tasks can be stored in a replay buffer. In this setting, we analyze two versions of the continual learning problem: a coarse approach with constraints at the task level and a fine approach with constraints at the sample level. We show that dual variables indicate the sensitivity of the optimal value with respect to constraint perturbations. We then leverage this result to partition the buffer in the coarse approach, allocating more resources to harder tasks, and to populate the buffer in the fine approach, including only impactful samples. We derive sub-optimality bounds, and empirically corroborate our theoretical results in various continual learning benchmarks. We also discuss the limitations of these methods with respect to the amount of memory available and the number of constraints involved in the optimization problem.
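The claim that dual variables measure the sensitivity of the optimal value to constraint perturbations can be checked on a toy problem. The sketch below runs projected dual ascent on a hypothetical one-dimensional problem, minimize (theta - 3)^2 subject to theta^2 <= 1 + eps, where the Lagrangian minimizer has the closed form theta = 3 / (1 + lambda). This is a textbook illustration of Lagrangian duality, not the paper's continual learning algorithm.

```python
def dual_ascent(eps=0.0, lr=0.5, iters=500):
    """Projected dual ascent for: minimize (theta - 3)^2 s.t. theta^2 <= 1 + eps."""
    lam = 0.0
    for _ in range(iters):
        theta = 3.0 / (1.0 + lam)              # argmin of the Lagrangian for fixed lam
        violation = theta**2 - (1.0 + eps)     # gradient of the dual function w.r.t. lam
        lam = max(0.0, lam + lr * violation)   # ascent step, projected onto lam >= 0
    return 3.0 / (1.0 + lam), lam

theta, lam = dual_ascent()
print(theta, lam)  # converges to theta = 1, lam = 2

def optimal_value(eps):
    return (dual_ascent(eps=eps)[0] - 3.0) ** 2

# Sensitivity: d(optimal value)/d(eps) should equal -lam.
h = 1e-3
print((optimal_value(h) - optimal_value(-h)) / (2 * h), -lam)
```

A large multiplier flags a constraint whose relaxation would most reduce the loss, which is the signal the abstract describes using to allocate buffer memory.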
LQ-OCP: Energy-Optimal Control for LQ Problems
Authors: Logan E. Beaver
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.00168
Pdf link: https://arxiv.org/pdf/2310.00168
Abstract This article presents a method to automatically generate energy-optimal trajectories for systems with linear dynamics, linear constraints, and a quadratic cost functional (LQ systems). First, using recent advancements in optimal control, we derive the optimal motion primitive generator for LQ systems; this yields linear differential equations that describe all dynamical motion primitives that the optimal system follows. We also derive the optimality conditions under which the system switches between motion primitives, which form a system of equations that are bilinear in the unknown junction time. Next, we demonstrate the performance of our approach on an energy-minimizing submersible robot with state and control constraints. We compare our approach to an energy-optimizing Linear Quadratic Regulator (LQR), where we learn the optimal weights of the LQR cost function to minimize energy consumption while ensuring convergence and constraint satisfaction. Our approach converges to the optimal solution 6,400% faster than the LQR weight optimization, and our solution is 350% more energy efficient. Finally, we disturb the initial state of the submersible to show that our approach still finds energy-efficient solutions faster than LQR when the unconstrained solution is infeasible.
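For context on the LQR baseline, the sketch below computes a discrete-time LQR gain by iterating the Riccati recursion for a discretized double integrator. The weights `Q` and `R` are exactly the quantities the baseline tunes to reduce energy use; the plant, time step, and weight values here are hypothetical, and this is not the paper's motion-primitive method.

```python
import numpy as np

def dlqr(A, B, Q, R, iters=1000):
    """Discrete-time LQR via fixed-point iteration of the Riccati equation.

    Returns the gain K (control law u = -K x) and the cost-to-go matrix P.
    """
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # P = Q + A^T P A - A^T P B (R + B^T P B)^{-1} B^T P A
        P = Q + A.T @ P @ (A - B @ K)
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P

dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])     # double integrator: position, velocity
B = np.array([[0.5 * dt**2], [dt]])
Q = np.eye(2)                             # hypothetical state weight
R = np.array([[1.0]])                     # hypothetical input (energy) weight
K, P = dlqr(A, B, Q, R)
print(K)
```

Raising the input weight `R` trades tracking speed for lower control effort, which is why tuning these weights is a natural energy-optimizing baseline.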
Degree Distribution Identifiability of Stochastic Kronecker Graphs
Authors: Daniel Alabi, Dimitris Kalimeris
Subjects: Data Structures and Algorithms (cs.DS); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2310.00171
Pdf link: https://arxiv.org/pdf/2310.00171
Abstract Large-scale analysis of network graphs observed in naturally occurring phenomena has revealed that the degrees of such graphs follow a power-law or lognormal distribution. Seshadhri, Pinar, and Kolda (J. ACM, 2013) proved that stochastic Kronecker graph (SKG) models cannot generate graphs whose degree distribution follows a power-law or lognormal distribution. As a result, variants of the SKG model have been proposed to generate graphs that approximately follow such degree distributions, without any significant oscillations. However, all existing solutions either require significant additional parameterization or have no provable guarantees on the degree distribution. -- In this work, we present statistical and computational identifiability notions which imply the separation of SKG models. Specifically, we prove that SKG models in different identifiability classes can be separated by the existence of isolated vertices and connected components in their corresponding generated graphs. This could explain the large (i.e., $>50\%$) fraction of isolated vertices in some popular graph generation benchmarks. -- We present and analyze an efficient algorithm that can get rid of oscillations in the degree distribution by mixing seeds of relatively prime dimensions. For an initial $2\times 1$ and $2\times 2$ seed, a crucial subroutine of this algorithm solves a degree-2 and degree-4 optimization problem in the variables of the initial seed, respectively. We generalize this approach to solving optimization problems for $m\times n$ seeds, for any $m, n\in\mathbb{N}$. -- The use of $3\times 3$ seeds alone cannot get rid of significant oscillations. We prove that such seeds result in a degree distribution that is bounded above by an exponential tail and thus cannot be a power-law or lognormal distribution.
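A stochastic Kronecker graph takes the k-th Kronecker power of a small seed matrix of probabilities and samples each edge independently. The sketch below does this for a hypothetical 2x2 seed and counts isolated vertices, the statistic the abstract links to identifiability. It covers only the basic SKG model; the paper's seed-mixing algorithm is not reproduced.

```python
import numpy as np

def skg_adjacency(seed, k, rng):
    """Sample a directed SKG adjacency matrix from the k-th Kronecker power of seed."""
    P = seed
    for _ in range(k - 1):
        P = np.kron(P, seed)  # edge-probability matrix of size seed_size**k
    # Each edge (i, j) is present independently with probability P[i, j].
    return (rng.random(P.shape) < P).astype(int)

def isolated_vertices(A):
    """Count vertices with no incoming or outgoing edges."""
    degree = A.sum(axis=0) + A.sum(axis=1)
    return int(np.sum(degree == 0))

rng = np.random.default_rng(0)
seed = np.array([[0.9, 0.5], [0.5, 0.1]])  # hypothetical seed
A = skg_adjacency(seed, 10, rng)           # 2**10 = 1024 vertices
print(isolated_vertices(A) / A.shape[0])   # fraction of isolated vertices
```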
Tight Bounds for Volumetric Spanners and Applications
Authors: Aditya Bhaskara, Sepideh Mahabadi, Ali Vakilian
Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.00175
Pdf link: https://arxiv.org/pdf/2310.00175
Abstract Given a set of points of interest, a volumetric spanner is a subset of the points in terms of which all the points can be expressed using "small" coefficients (measured in an appropriate norm). Formally, given a set of vectors $X = \{v_1, v_2, \dots, v_n\}$, the goal is to find $T \subseteq [n]$ such that every $v \in X$ can be expressed as $\sum_{i\in T} \alpha_i v_i$, with $\|\alpha\|$ being small. This notion, which has also been referred to as a well-conditioned basis, has found several applications, including bandit linear optimization, determinant maximization, and matrix low rank approximation. In this paper, we give almost optimal bounds on the size of volumetric spanners for all $\ell_p$ norms, and show that they can be constructed using a simple local search procedure. We then show the applications of our result to other tasks and in particular the problem of finding coresets for the Minimum Volume Enclosing Ellipsoid (MVEE) problem.
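To illustrate how local search yields small coefficients, the sketch below handles the simplest case, the $\ell_\infty$ norm with exactly d basis vectors (the classical barycentric-spanner construction): swap columns while the absolute determinant grows by more than a factor 1 + delta. By Cramer's rule, each coefficient is a ratio of two such determinants, so at termination every |alpha_i| <= 1 + delta. This is an assumed stand-in for intuition; the paper's $\ell_p$ bounds and its exact procedure are not reproduced.

```python
import numpy as np

def barycentric_spanner(V, delta=0.01):
    """Local search over d columns of V (shape d x n) maximizing |det| up to (1 + delta).

    Assumes the first d columns are linearly independent (true almost surely
    for random data).
    """
    d, n = V.shape
    T = list(range(d))
    improved = True
    while improved:
        improved = False
        current = abs(np.linalg.det(V[:, T]))
        for pos in range(d):
            for j in range(n):
                if j in T:
                    continue
                cand = T.copy()
                cand[pos] = j  # try replacing basis column `pos` with column j
                if abs(np.linalg.det(V[:, cand])) > (1 + delta) * current:
                    T, improved = cand, True
                    break
            if improved:
                break
    return T

rng = np.random.default_rng(0)
V = rng.standard_normal((4, 30))
T = barycentric_spanner(V, delta=0.01)
coeffs = np.linalg.solve(V[:, T], V)  # column j holds alpha for v_j
print(T, np.abs(coeffs).max())
```

Termination is guaranteed because each swap multiplies the determinant by more than 1 + delta and the determinant over a finite set is bounded.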
Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment
Authors: Tianhao Wu, Banghua Zhu, Ruoyu Zhang, Zhaojin Wen, Kannan Ramchandran, Jiantao Jiao
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2310.00212
Pdf link: https://arxiv.org/pdf/2310.00212
Abstract Large Language Models (LLMs) can acquire extensive world knowledge through pre-training on large corpora. However, due to exposure to low-quality data, LLMs may exhibit harmful behavior without aligning with human values. The dominant approach for steering LLMs towards beneficial behavior involves Reinforcement Learning with Human Feedback (RLHF), with Proximal Policy Optimization (PPO) serving as the default RL optimizer. Despite its effectiveness, PPO has limitations when optimizing rewards trained from comparison-based loss. Primarily, PPO is not invariant to equivalent reward functions containing identical preference information due to the need to calibrate the reward scale. Additionally, PPO's necessity for token-wise updates introduces complexity in both function approximation and algorithm design compared to trajectory-wise optimization. This paper proposes a new framework, reinforcement learning with relative feedback, and a novel trajectory-wise policy gradient algorithm, Pairwise Proximal Policy Optimization (P3O) that operates directly on comparative rewards. We show theoretically that P3O is invariant to equivalent rewards and avoids the complexity of PPO. Empirical evaluations demonstrate that P3O outperforms PPO in the KL-Reward trade-off and can align with human preferences as well as or better than prior methods. In summary, this work introduces a simpler yet effective approach for aligning LLMs to human preferences through relative feedback.
Bridging the Gap Between Foundation Models and Heterogeneous Federated Learning
Authors: Sixing Yu, J. Pablo Muñoz, Ali Jannesari
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2310.00247
Pdf link: https://arxiv.org/pdf/2310.00247
Abstract Federated learning (FL) offers privacy-preserving decentralized machine learning, optimizing models at edge clients without sharing private data. Simultaneously, foundation models (FMs) have gained traction in the artificial intelligence (AI) community due to their exceptional performance across various tasks. However, integrating FMs into FL presents challenges, primarily due to their substantial size and intensive resource requirements. This is especially true when considering the resource heterogeneity in edge FL systems. We present an adaptive framework for Resource-aware Federated Foundation Models (RaFFM) to address these challenges. RaFFM introduces specialized model compression algorithms tailored for FL scenarios, such as salient parameter prioritization and high-performance subnetwork extraction. These algorithms enable dynamic scaling of given transformer-based FMs to fit heterogeneous resource constraints at the network edge during both FL's optimization and deployment stages. Experimental results demonstrate that RaFFM shows significant superiority in resource utilization efficiency and uses fewer resources to deploy FMs to FL. Despite the lower resource consumption, target models optimized by RaFFM achieve performance on par with traditional FL methods applied to full-sized FMs. This is evident across tasks in both natural language processing and computer vision domains.
A bibliometric Analysis on Spectrum Sensing in Wireless Networks
Authors: Nyashadzashe Tamuka, Khulumani Sibanda
Subjects: Networking and Internet Architecture (cs.NI); Digital Libraries (cs.DL)
Arxiv link: https://arxiv.org/abs/2310.00278
Pdf link: https://arxiv.org/pdf/2310.00278
Abstract Spectrum scarcity is a prevalent problem in wireless networks due to the strict allotment of the spectrum (frequency bands) to licensed users by network regulatory bodies. Such an operation implies that unlicensed users (secondary wireless spectrum users) have to evacuate the spectrum when the primary wireless spectrum users (licensed users) are utilizing the frequency bands, in order to avoid interference. Cognitive radio alleviates the spectrum shortage by detecting unoccupied frequency bands. This reduces the underutilization of frequency bands in wireless networks. There have been numerous related studies on spectrum sensing; however, few studies have conducted a bibliometric analysis of this subject. The goal of this study was to conduct a bibliometric analysis of the optimization of spectrum sensing. The PRISMA methodology formed the basis of the bibliometric analysis, which aimed to identify the limitations of existing spectrum sensing techniques. The findings revealed that various machine learning or hybrid models outperformed traditional techniques such as matched filters and energy detectors at the lowest signal-to-noise ratio (SNR). SNR is the ratio of the desired signal magnitude to the background noise magnitude. This study, therefore, recommends that researchers propose alternative techniques to optimize (improve) spectrum sensing in wireless networks. More work should be done to develop models that optimize spectrum sensing at low SNR.
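As a concrete reference for the "traditional" energy detector mentioned above, the sketch below sums the energy of N real-valued samples and compares it with a threshold chosen for a target false-alarm probability, assuming known noise variance and a Gaussian approximation to the noise-only chi-square statistic. SNR is computed here as a power ratio in decibels. Signal shape, sample count, and thresholds are illustrative assumptions.

```python
import numpy as np
from statistics import NormalDist

def energy_threshold(n, noise_var, pfa):
    """Noise-only threshold for the energy statistic (Gaussian approximation
    to a chi-square with n degrees of freedom, real-valued samples)."""
    z = NormalDist().inv_cdf(1.0 - pfa)
    return noise_var * (n + z * np.sqrt(2.0 * n))

def energy_detect(samples, noise_var, pfa=0.01):
    """Declare the band occupied if the received energy exceeds the threshold."""
    energy = float(np.sum(samples ** 2))
    return energy > energy_threshold(len(samples), noise_var, pfa)

def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio as a power ratio in decibels."""
    return 10.0 * np.log10(signal_power / noise_power)

rng = np.random.default_rng(0)
n = 1000
t = np.arange(n)
signal = np.sqrt(2.0) * np.cos(0.1 * t)   # hypothetical primary-user signal, unit average power
noise = rng.standard_normal(n)            # unit noise variance
print(snr_db(1.0, 1.0))                   # 0 dB
print(energy_detect(signal + noise, noise_var=1.0))
```

At much lower SNR the signal energy becomes small relative to the noise fluctuations, and this detector's miss rate rises unless N grows, which is the low-SNR weakness the abstract refers to.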
RIS-aided Near-Field MIMO Communications: Codebook and Beam Training Design
Authors: Suyu Lv, Yuanwei Liu, Xiaodong Xu, Arumugam Nallanathan, A. Lee Swindlehurst
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2310.00294
Pdf link: https://arxiv.org/pdf/2310.00294
Abstract Downlink reconfigurable intelligent surface (RIS)-assisted multi-input-multi-output (MIMO) systems are considered with far-field, near-field, and hybrid-far-near-field channels. According to the angular or distance information contained in the received signals, 1) a distance-based codebook is designed for near-field MIMO channels, based on which a hierarchical beam training scheme is proposed to reduce the training overhead; 2) a combined angular-distance codebook is designed for mixed-far-near-field MIMO channels, based on which a two-stage beam training scheme is proposed to achieve alignment in the angular and distance domains separately. For maximizing the achievable rate while reducing the complexity, an alternating optimization algorithm is proposed to carry out the joint optimization iteratively. Specifically, the RIS coefficient matrix is optimized through the beam training process, the optimal combining matrix is obtained from the closed-form solution for the mean square error (MSE) minimization problem, and the active beamforming matrix is optimized by exploiting the relationship between the achievable rate and MSE. Numerical results reveal that: 1) the proposed beam training schemes achieve near-optimal performance with a significantly decreased training overhead; 2) compared to the angular-only far-field channel model, taking the additional distance information into consideration will effectively improve the achievable rate when carrying out beam design for near-field communications.
Red Teaming Game: A Game-Theoretic Framework for Red Teaming Language Models
Authors: Chengdong Ma, Ziran Yang, Minquan Gao, Hai Ci, Jun Gao, Xuehai Pan, Yaodong Yang
Subjects: Computation and Language (cs.CL); Computer Science and Game Theory (cs.GT)
Arxiv link: https://arxiv.org/abs/2310.00322
Pdf link: https://arxiv.org/pdf/2310.00322
Abstract Deployable Large Language Models (LLMs) must conform to the criterion of helpfulness and harmlessness, thereby achieving consistency between LLM outputs and human values. Red-teaming techniques constitute a critical way towards this criterion. Existing work relies solely on manual red team designs and heuristic adversarial prompts for vulnerability detection and optimization. These approaches lack rigorous mathematical formulation, thus limiting the exploration of diverse attack strategies within quantifiable measures and the optimization of LLMs under convergence guarantees. In this paper, we present Red-teaming Game (RTG), a general game-theoretic framework without manual annotation. RTG is designed for analyzing the multi-turn attack and defense interactions between Red-team Language Models (RLMs) and a Blue-team Language Model (BLM). Within the RTG, we propose the Gamified Red-teaming Solver (GRTS) with a diversity measure of the semantic space. GRTS is an automated red teaming technique that solves RTG towards Nash equilibrium through meta-game analysis, which corresponds to the theoretically guaranteed optimization direction of both RLMs and BLM. Empirical results in multi-turn attacks with RLMs show that GRTS autonomously discovered diverse attack strategies and effectively improved the security of LLMs, outperforming existing heuristic red-team designs. Overall, RTG has established a foundational framework for red teaming tasks and constructed a new scalable oversight technique for alignment.
A DSP shared is a DSP earned: HLS Task-Level Multi-Pumping for High-Performance Low-Resource Designs
Authors: Giovanni Brignone, Mihai T. Lazarescu, Luciano Lavagno
Subjects: Hardware Architecture (cs.AR)
Arxiv link: https://arxiv.org/abs/2310.00330
Pdf link: https://arxiv.org/pdf/2310.00330
Abstract High-level synthesis (HLS) enhances digital hardware design productivity through a high abstraction level. Although the HLS abstraction prevents fine-grained manual register-transfer level (RTL) optimizations, it also enables automatable optimizations that would be infeasible or hard to automate at RTL. Specifically, we propose a task-level multi-pumping methodology to reduce resource utilization, particularly digital signal processors (DSPs), while preserving the throughput of HLS kernels modeled as dataflow graphs (DFGs) targeting field-programmable gate arrays. The methodology exploits HLS resource sharing to automatically insert the logic for reusing the same functional unit for different operations. In addition, it relies on multi-clock DFGs to run the multi-pumped tasks at higher frequencies. The methodology scales the pipeline initiation interval (II) and the clock frequency constraints of resource-intensive tasks by a multi-pumping factor (M). The looser II allows sharing the same resource among M different operations, while the tighter clock frequency preserves the throughput. We verified that our methodology opens a new Pareto front in the throughput and resource space by applying it to open-source HLS designs using state-of-the-art commercial HLS and implementation tools by Xilinx. The multi-pumped designs require up to 40% fewer DSP resources at the same throughput as the original designs optimized for performance (i.e., running at the maximum clock frequency) and achieve up to 50% better throughput using the same DSPs as the original designs optimized for resources with a single clock.
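The throughput argument behind multi-pumping is simple arithmetic: a pipelined task produces one result every II cycles, so throughput is clock / II, and scaling both II and the clock by M leaves it unchanged while letting M operations share one DSP. The sketch below encodes that idealized model, assuming perfect sharing and that the faster clock actually closes timing; the task numbers are hypothetical, and real designs (such as those reported above) save less than the ideal factor.

```python
import math

def throughput_per_us(clock_mhz, ii):
    """Results per microsecond of a pipelined task issuing every `ii` cycles."""
    return clock_mhz / ii

def multi_pump(clock_mhz, ii, dsp_ops, m):
    """Idealized multi-pumping by factor m: clock and II both scale by m, and
    each DSP is shared by m operations (ignores timing closure and sharing logic)."""
    return clock_mhz * m, ii * m, math.ceil(dsp_ops / m)

base_clock, base_ii, base_dsps = 200.0, 1, 12   # hypothetical task
clock, ii, dsps = multi_pump(base_clock, base_ii, base_dsps, m=2)
print(throughput_per_us(base_clock, base_ii), throughput_per_us(clock, ii), dsps)
# 200.0 200.0 6
```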
Diffusion Posterior Illumination for Ambiguity-aware Inverse Rendering
Authors: Linjie Lyu, Ayush Tewari, Marc Habermann, Shunsuke Saito, Michael Zollhöfer, Thomas Leimkühler, Christian Theobalt
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2310.00362
Pdf link: https://arxiv.org/pdf/2310.00362
Abstract Inverse rendering, the process of inferring scene properties from images, is a challenging inverse problem. The task is ill-posed, as many different scene configurations can give rise to the same image. Most existing solutions incorporate priors into the inverse-rendering pipeline to encourage plausible solutions, but they do not consider the inherent ambiguities and the multi-modal distribution of possible decompositions. In this work, we propose a novel scheme that integrates a denoising diffusion probabilistic model pre-trained on natural illumination maps into an optimization framework involving a differentiable path tracer. The proposed method allows sampling from combinations of illumination and spatially-varying surface materials that are both natural and explain the image observations. We further conduct an extensive comparative study of different priors on illumination used in previous work on inverse rendering. Our method excels in recovering materials and producing highly realistic and diverse environment map samples that faithfully explain the illumination of the input images.
Composition of Control Barrier Functions With Differing Relative Degrees for Safety Under Input Constraints
Authors: Pedram Rabiee, Jesse B. Hoagg
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.00363
Pdf link: https://arxiv.org/pdf/2310.00363
Abstract This paper presents a new approach for guaranteed safety subject to input constraints (e.g., actuator limits) using a composition of multiple control barrier functions (CBFs). First, we present a method for constructing a single CBF from multiple CBFs, which can have different relative degrees. This construction relies on a soft minimum function and yields a CBF whose $0$-superlevel set is a subset of the intersection of the $0$-superlevel sets of all the CBFs used in the construction. Next, we extend the approach to systems with input constraints. Specifically, we introduce control dynamics that allow us to express the input constraints as CBFs in the closed-loop state (i.e., the state of the system and the controller). The CBFs constructed from input constraints do not have the same relative degree as the safety constraints. Thus, the composite soft-minimum CBF construction is used to combine the input-constraint CBFs with the safety-constraint CBFs. Finally, we present a feasible real-time-optimization control that guarantees that the state remains in the $0$-superlevel set of the composite soft-minimum CBF. We demonstrate these approaches on a nonholonomic ground robot example.
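Why a soft minimum gives a conservative composite set can be checked numerically. Assuming the common log-sum-exp form $\mathrm{softmin}_\kappa(h) = -\tfrac{1}{\kappa}\log\sum_i e^{-\kappa h_i}$ (the paper's exact soft minimum and its parameter may differ), one has $\min_i h_i - \tfrac{\log N}{\kappa} \le \mathrm{softmin}_\kappa(h) \le \min_i h_i$, so a nonnegative soft minimum implies every $h_i \ge 0$. The CBF values below are hypothetical.

```python
import math

def soft_min(values, kappa=10.0):
    """Smooth lower bound on min(values): -(1/kappa) * log(sum(exp(-kappa * v))).

    Shifting by the true minimum keeps exp() from overflowing.
    """
    m = min(values)
    s = sum(math.exp(-kappa * (v - m)) for v in values)
    return m - math.log(s) / kappa

# Hypothetical values h_i(x) of three CBFs at a single state x.
h = [0.8, 0.3, 1.5]
print(soft_min(h), min(h))
```

Larger `kappa` tightens the approximation toward the true minimum while keeping the composite function smooth, which is what allows it to be used as a single CBF.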
Distilling Inductive Bias: Knowledge Distillation Beyond Model Compression
Authors: Gousia Habib, Tausifa Jan Saleem, Brejesh Lall
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.00369
Pdf link: https://arxiv.org/pdf/2310.00369
Abstract With the rapid development of computer vision, Vision Transformers (ViTs) offer the tantalizing prospect of unified information processing across visual and textual domains. However, because ViTs lack inherent inductive biases, they require an enormous amount of data for training. To make their applications practical, we introduce an innovative ensemble-based distillation approach that distills inductive bias from complementary lightweight teacher models. Prior systems relied solely on convolution-based teaching. In contrast, our method incorporates an ensemble of lightweight teachers with different architectural tendencies, such as convolution and involution, to instruct the student transformer jointly. Because of these unique inductive biases, the teachers can accumulate a wide range of knowledge, even from readily identifiable stored datasets, which leads to enhanced student performance. Our proposed framework also involves precomputing and storing logits, essentially the unnormalized predictions of the model, in advance. This optimization can accelerate the distillation process by eliminating the need for repeated forward passes during knowledge distillation, significantly reducing the computational burden and enhancing efficiency.
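A minimal sketch of distilling from cached teacher logits, assuming the standard temperature-scaled KL distillation loss and a simple average of the teachers' softened probabilities; how the paper actually combines its convolution and involution teachers is not stated in the abstract, and the `cached` dictionary stands in for hypothetical logits stored on disk.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax, shifted by the max logit for stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits_list, temperature=4.0):
    """KL(teacher_mix || student) on softened distributions, scaled by T^2.

    Assumption: the target is the mean of the teachers' softened probabilities.
    """
    teacher_probs = [softmax(t, temperature) for t in teacher_logits_list]
    target = [sum(col) / len(teacher_probs) for col in zip(*teacher_probs)]
    student = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q) for p, q in zip(target, student) if p > 0)
    return kl * temperature ** 2

# Teacher logits computed once per image and reused every epoch, so the
# teachers never need another forward pass during distillation.
cached = {"img_0": [[2.0, 0.5, -1.0], [1.5, 1.0, -0.5]]}
print(kd_loss([1.8, 0.7, -0.9], cached["img_0"]))
```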
Joint Power and 3D Trajectory Optimization for UAV-enabled Wireless Powered Communication Networks with Obstacles
Authors: Hongyang Pan, Yanheng Liu, Geng Sun, Junsong Fan, Shuang Liang, Chau Yuen
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.00384
Pdf link: https://arxiv.org/pdf/2310.00384
Abstract Unmanned aerial vehicle (UAV)-enabled wireless powered communication networks (WPCNs) are a promising technology for 5G/6G wireless communications, but UAV power allocation and scheduling to enhance energy utilization efficiency remain challenging, particularly in the presence of obstacles. In this work, we consider a UAV-enabled WPCN scenario in which a UAV needs to cover the ground wireless devices (WDs). During the coverage process, the UAV needs to collect data from the WDs and charge them simultaneously. To this end, we formulate a joint-UAV power and three-dimensional (3D) trajectory optimization problem (JUPTTOP) to simultaneously increase the total number of covered WDs, increase the time efficiency, and reduce the total flying distance of the UAV, so as to improve the energy utilization efficiency of the network. Because this problem is difficult and complex, we decompose it into two subproblems: the UAV power allocation optimization problem (UPAOP) and the UAV 3D trajectory optimization problem (UTTOP). Then, we propose an improved non-dominated sorting genetic algorithm-II with K-means initialization operator and Variable dimension mechanism (NSGA-II-KV) for solving the UPAOP. For UTTOP, we first introduce a pretreatment method, and then use an improved particle swarm optimization with Normal distribution initialization, Genetic mechanism, Differential mechanism and Pursuit operator (PSO-NGDP) to deal with this subproblem. Simulation results verify the effectiveness of the proposed strategies under different scales and settings of the networks.
Order-Preserving GFlowNets
Authors: Yihang Chen, Lukas Mauch
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.00386
Pdf link: https://arxiv.org/pdf/2310.00386
Abstract Generative Flow Networks (GFlowNets) have been introduced as a method to sample a diverse set of candidates with probabilities proportional to a given reward. However, GFlowNets can only be used with a predefined scalar reward, which can be either computationally expensive or not directly accessible, in the case of multi-objective optimization (MOO) tasks for example. Moreover, to prioritize identifying high-reward candidates, the conventional practice is to raise the reward to a higher exponent, the optimal choice of which may vary across different environments. To address these issues, we propose Order-Preserving GFlowNets (OP-GFNs), which sample with probabilities in proportion to a learned reward function that is consistent with a provided (partial) order on the candidates, thus eliminating the need for an explicit formulation of the reward function. We theoretically prove that the training process of OP-GFNs gradually sparsifies the learned reward landscape in single-objective maximization tasks. The sparsification concentrates on candidates of a higher hierarchy in the ordering, ensuring exploration at the beginning and exploitation towards the end of the training. We demonstrate OP-GFN's state-of-the-art performance in single-objective maximization (totally ordered) and multi-objective Pareto front approximation (partially ordered) tasks, including synthetic datasets, molecule generation, and neural architecture search.
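The reward-exponent practice that OP-GFNs aim to replace can be seen directly on the target distribution: a GFlowNet trained on reward $R(x)^\beta$ samples $x$ with probability proportional to $R(x)^\beta$, so raising $\beta$ concentrates mass on high-reward candidates. This sketch only computes that target distribution for hypothetical rewards; it does not train a GFlowNet or implement OP-GFNs.

```python
def sampling_probs(rewards, beta):
    """Target probabilities proportional to R(x) ** beta."""
    weights = [r ** beta for r in rewards]
    total = sum(weights)
    return [w / total for w in weights]

rewards = [1.0, 2.0, 4.0]  # hypothetical rewards of three candidates
for beta in (1, 4, 16):
    print(beta, [round(p, 3) for p in sampling_probs(rewards, beta)])
```

Choosing `beta` trades exploration for exploitation, and the best value is environment-dependent, which is the tuning burden the abstract describes.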
Privacy-Preserving Distributed Market Mechanism for Active Distribution Networks
Authors: Matthias Franke, Ognjen Stanojev, Lesia Mitridati, Gabriela Hug
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.00387
Pdf link: https://arxiv.org/pdf/2310.00387
Abstract Amidst the worldwide efforts to decarbonize power networks, Local Electricity Markets (LEMs) in distribution networks are gaining importance due to the increased adoption of renewable energy sources and prosumers. Considering that LEMs involve data exchange among independent entities, privacy and cybersecurity are some of the main practical challenges in LEM design. This paper proposes a secure market protocol using innovations from distributed optimization and Secure MultiParty Computation (SMPC). The considered LEM is formulated as an uncertainty-aware joint market for energy and reserves with affine balancing policies. To achieve scalability and enable the use of SMPC, market clearing is solved using the Consensus ADMM algorithm. Subsequently, the data exchange among participants via ADMM iterations is protected using the Shamir secret-sharing scheme to ensure privacy. The market protocol is further reinforced by a secure and verifiable settlement process that uses SMPC and ElGamal commitments to verify market quantities and by a secure recovery scheme for missing network measurements. Finally, the feasibility and performance of the proposed LEM are evaluated on a 15-bus test network.
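Shamir secret sharing, which the protocol uses to protect ADMM data exchange, can be sketched in a few lines: a value is hidden as the constant term of a random polynomial over a prime field, any t shares recover it by Lagrange interpolation at zero, and share-wise sums are shares of the sum. This is a minimal illustration, not the paper's protocol; the field modulus, the (n, t) choice, and the integer "bids" are assumptions (real market quantities would need a fixed-point encoding).

```python
import random

PRIME = 2**61 - 1  # illustrative prime field modulus (a Mersenne prime)

def share(secret, n, t, rng):
    """Split secret into n shares; any t of them reconstruct it."""
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(t - 1)]

    def poly(x):
        acc = 0
        for c in reversed(coeffs):  # Horner's rule, constant term = secret
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, poly(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation of the shared polynomial at x = 0 over GF(PRIME)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

rng = random.Random(0)
a = share(120, n=3, t=2, rng=rng)  # hypothetical value held by one participant
b = share(75, n=3, t=2, rng=rng)   # hypothetical value held by another
# Adding shares point-wise yields shares of 120 + 75 without revealing either input.
summed = [(x, (ya + yb) % PRIME) for (x, ya), (_, yb) in zip(a, b)]
print(reconstruct(summed[:2]))  # 195
```

The additive property is what makes sharing compatible with the averaging steps of consensus ADMM, where participants mainly need sums of other parties' variables.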
New SDP Roundings and Certifiable Approximation for Cubic Optimization
Authors: Jun-Ting Hsieh, Pravesh K. Kothari, Lucas Pesenti, Luca Trevisan
Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC)
Arxiv link: https://arxiv.org/abs/2310.00393
Pdf link: https://arxiv.org/pdf/2310.00393
Abstract We give new rounding schemes for SDP relaxations for the problems of maximizing cubic polynomials over the unit sphere and the $n$-dimensional hypercube. In both cases, the resulting algorithms yield a $O(\sqrt{n/k})$ multiplicative approximation in $2^{O(k)} \text{poly}(n)$ time. In particular, we obtain a $O(\sqrt{n/\log n})$ approximation in polynomial time. For the unit sphere, this improves on the rounding algorithms of Bhattiprolu et al. [BGG+17] that need quasi-polynomial time to obtain a similar approximation guarantee. Over the $n$-dimensional hypercube, our results match the guarantee of a search algorithm of Khot and Naor [KN08] that obtains a similar approximation ratio via techniques from convex geometry. Unlike their method, our algorithm obtains an upper bound on the integrality gap of SDP relaxations for the problem and, as a result, also yields a certificate on the optimum value of the input instance. Our results naturally generalize to homogeneous polynomials of higher degree and imply improved algorithms for approximating satisfiable instances of Max-3SAT. Our main motivation is the stark lack of rounding techniques for SDP relaxations of higher degree polynomial optimization, in sharp contrast to a rich theory of SDP roundings for the quadratic case. Our rounding algorithms introduce two new ideas: 1) a new polynomial reweighting based method to round sum-of-squares relaxations of higher degree polynomial maximization problems, and 2) a general technique to compress such relaxations down to substantially smaller SDPs by relying on an explicit construction of certain hitting sets. We hope that our work will inspire improved rounding algorithms for polynomial optimization and related problems.
Joint Scheduling and Trajectory Optimization of Charging UAV in Wireless Rechargeable Sensor Networks
Authors: Yanheng Liu, Hongyang Pan, Geng Sun, Aimin Wang, Jiahui Li, Shuang Liang
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.00396
Pdf link: https://arxiv.org/pdf/2310.00396
Abstract Wireless rechargeable sensor networks with a charging unmanned aerial vehicle (CUAV) have broad application prospects for powering rechargeable sensor nodes (SNs). However, how to schedule a CUAV and design its trajectory to improve the charging efficiency of the entire system remains a vital problem. In this paper, we formulate a joint-CUAV scheduling and trajectory optimization problem (JSTOP) to simultaneously minimize the number of hovering points of the CUAV, the number of repeatedly covered SNs, and the flying distance of the CUAV for charging all SNs. Due to the complexity of JSTOP, it is decomposed into two optimization subproblems: the CUAV scheduling optimization problem (CSOP) and the CUAV trajectory optimization problem (CTOP). CSOP is a hybrid optimization problem with both continuous and discrete solution spaces, and its solution dimension is not fixed, since it changes with the number of hovering points of the CUAV. Moreover, CTOP is a completely discrete optimization problem. Thus, we propose a particle swarm optimization (PSO) with a flexible dimension mechanism, a K-means operator and a punishment-compensation mechanism (PSOFKP) and a PSO with a discretization factor, a 2-opt operator and a path crossover reduction mechanism (PSOD2P) to solve the converted CSOP and CTOP, respectively. Simulation results demonstrate the benefits of PSOFKP and PSOD2P under different scales and settings of the network, and the stability of the proposed algorithms is verified.
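Both UAV papers above build their solvers on particle swarm optimization. As a reference point, here is a minimal sketch of plain global-best PSO for continuous minimization; it contains none of the added mechanisms (flexible dimension, K-means operator, punishment-compensation, discretization, 2-opt), and the inertia and acceleration coefficients are common textbook-style assumptions rather than the papers' settings.

```python
import random

def pso(objective, dim, n_particles=20, iters=200, w=0.7, c1=1.5, c2=1.5,
        bounds=(-5.0, 5.0), seed=0):
    """Basic global-best particle swarm optimization (minimization)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # each particle's best position
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm-wide best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])   # pull to own best
                             + c2 * r2 * (gbest[d] - pos[i][d]))     # pull to swarm best
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))  # clamp to bounds
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

best, val = pso(lambda x: sum(v * v for v in x), dim=2)  # toy sphere objective
print(best, val)
```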
Better Situational Graphs by Inferring High-level Semantic-Relational Concepts
Authors: Jose Andres Millan-Romera, Hriday Bavle, Muhammad Shaheer, Martin R. Oswald, Holger Voos, Jose Luis Sanchez-Lopez
Subjects: Machine Learning (cs.LG); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2310.00401
Pdf link: https://arxiv.org/pdf/2310.00401
Abstract Recent works on SLAM extend their pose graphs with higher-level semantic concepts and the relationships between them, not only to provide a richer representation of the situation/environment but also to improve the accuracy of its estimation. Concretely, our previous work, Situational Graphs (S-Graphs), a pioneer in jointly leveraging semantic relationships in the factor optimization process, relies on semantic entities such as wall surfaces and rooms, whose relationship is mathematically defined. Nevertheless, extracting these high-level concepts relying exclusively on the lower-level factor graph remains a challenge, and it is currently done with ad-hoc algorithms, which limits its capability to include new semantic-relational concepts. To overcome this limitation, in this work, we propose a Graph Neural Network (GNN) for learning high-level semantic-relational concepts that can be inferred from the low-level factor graph. We have demonstrated that we can infer room entities and their relationship to the mapped wall surfaces more accurately and more efficiently than the baseline algorithm. Additionally, to demonstrate the versatility of our method, we provide a new semantic concept, i.e., wall, and its relationship with its wall surfaces. Our proposed method has been integrated into S-Graphs+, and it has been validated in both simulated and real datasets. A Docker container with our software will be made available to the scientific community.
mmWave Beam Selection in Analog Beamforming Using Personalized Federated Learning
Authors: Martin Isaksson, Filippo Vannella, David Sandberg, Rickard Cöster
Subjects: Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2310.00406
Pdf link: https://arxiv.org/pdf/2310.00406
Abstract Using analog beamforming in mmWave frequency bands we can focus the energy towards a receiver to achieve high throughput. However, this requires the network to quickly find the best downlink beam configuration in the face of non-IID data. We propose a personalized Federated Learning (FL) method to address this challenge, where we learn a mapping between uplink Sub-6GHz channel estimates and the best downlink beam in heterogeneous scenarios with non-IID characteristics. We also devise FedLion, a FL implementation of the Lion optimization algorithm. Our approach reduces the signaling overhead and provides superior performance, up to 33.6% higher accuracy than a single FL model and 6% higher than a local model.
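FedLion is described as a federated implementation of the Lion optimizer. The sketch below shows only the standard Lion update (Chen et al., 2023): the step direction is the sign of an interpolation between momentum and gradient, with decoupled weight decay, and the momentum is then updated with a separate coefficient. How FedLion distributes, communicates, or aggregates these updates is not given in the abstract and is not modeled here; the learning rate and the toy quadratic are assumptions.

```python
def sign(x):
    """Return -1, 0, or 1."""
    return (x > 0) - (x < 0)

def lion_step(theta, m, grad, lr=1e-2, beta1=0.9, beta2=0.99, wd=0.0):
    """One Lion update applied per coordinate; returns (new_theta, new_m)."""
    new_theta, new_m = [], []
    for t, mi, g in zip(theta, m, grad):
        c = beta1 * mi + (1 - beta1) * g                 # interpolated direction
        new_theta.append(t - lr * (sign(c) + wd * t))    # sign step + decoupled decay
        new_m.append(beta2 * mi + (1 - beta2) * g)       # momentum uses beta2
    return new_theta, new_m

theta, m = [1.0, -2.0], [0.0, 0.0]
grad = [2 * t for t in theta]  # gradient of sum(t**2) at theta
theta, m = lion_step(theta, m, grad)
print(theta)  # each coordinate moved by exactly lr toward zero
```

Because the update is a sign, every coordinate moves by the same magnitude `lr` regardless of gradient scale, which is one reason Lion is often paired with a smaller learning rate than Adam.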
Optimizing Parameters of the DC Power Flow
Authors: Babak Taheri, Daniel K. Molzahn
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.00447
Pdf link: https://arxiv.org/pdf/2310.00447
Abstract Many power system operation and planning problems use the DC power flow approximation to address computational challenges from the nonlinearity of the AC power flow equations. The DC power flow simplifies the AC power flow equations to a linear form that relates active power flows to phase angle differences across branches, parameterized by coefficients based on the branches' susceptances. Inspired by techniques for training machine learning models, this paper proposes an algorithm that seeks optimal coefficient and bias parameters to improve the DC power flow approximation's accuracy. Specifically, the proposed algorithm selects the coefficient and bias parameter values that minimize the discrepancy, across a specified set of operational scenarios, between the power flows given by the DC approximation and the power flows from the AC equations. Gradient-based optimization methods like Broyden-Fletcher-Goldfarb-Shanno (BFGS), Limited-Memory BFGS (L-BFGS), and Truncated Newton Conjugate-Gradient (TNC) enable solution of the proposed algorithm for large systems. After an off-line training phase, the optimized parameters are used to improve the accuracy of the DC power flow during on-line computations. Numerical results show several orders of magnitude improvements in accuracy relative to a hot-start DC power flow approximation across a range of test cases.
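The idea of fitting DC power flow coefficients and biases to AC flows can be illustrated on a single lossless line, where the AC flow is $P = V_i V_j b \sin\theta_{ij}$ and the standard DC approximation is $P \approx b\,\theta_{ij}$. This sketch assumes one branch and synthetic scenarios, and because the model is linear in its two parameters it uses closed-form least squares in place of the BFGS, L-BFGS, or TNC solvers the paper uses for whole networks. Since the standard DC choice $(b, 0)$ is one feasible parameter pair, the fitted error can never be larger on the training scenarios.

```python
import math
import random

def ac_flow(vi, vj, theta_ij, b):
    """Active power on a lossless line (conductance ignored), per unit."""
    return vi * vj * b * math.sin(theta_ij)

def fit_line(xs, ys):
    """Least-squares coefficient and bias for y ~ beta * x + gamma."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    beta = sxy / sxx
    return beta, my - beta * mx

rng = random.Random(1)
b = 10.0  # hypothetical branch susceptance (per unit)
scenarios = [(rng.uniform(0.95, 1.05), rng.uniform(0.95, 1.05), rng.uniform(0.05, 0.5))
             for _ in range(200)]
angles = [t for _, _, t in scenarios]
flows = [ac_flow(vi, vj, t, b) for vi, vj, t in scenarios]

beta, gamma = fit_line(angles, flows)                                 # offline "training"
sse_dc = sum((b * t - p) ** 2 for t, p in zip(angles, flows))         # standard DC: P = b * theta
sse_fit = sum((beta * t + gamma - p) ** 2 for t, p in zip(angles, flows))
print(beta, gamma, sse_dc, sse_fit)
```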
Diff-DOPE: Differentiable Deep Object Pose Estimation
Authors: Jonathan Tremblay, Bowen Wen, Valts Blukis, Balakumar Sundaralingam, Stephen Tyree, Stan Birchfield
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2310.00463
Pdf link: https://arxiv.org/pdf/2310.00463
Abstract We introduce Diff-DOPE, a 6-DoF pose refiner that takes as input an image, a 3D textured model of an object, and an initial pose of the object. The method uses differentiable rendering to update the object pose to minimize the visual error between the image and the projection of the model. We show that this simple, yet effective, idea is able to achieve state-of-the-art results on pose estimation datasets. Our approach is a departure from recent methods in which the pose refiner is a deep neural network trained on a large synthetic dataset to map inputs to refinement steps. Rather, our use of differentiable rendering allows us to avoid training altogether. Our approach performs multiple gradient descent optimizations in parallel with different random learning rates to avoid local minima from symmetric objects, similar appearances, or wrong step size. Various modalities can be used, e.g., RGB, depth, intensity edges, and object segmentation masks. We present experiments examining the effect of various choices, showing that the best results are found when the RGB image is accompanied by an object mask and depth image to guide the optimization process.
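The "parallel gradient descent with random learning rates" pattern can be sketched independently of rendering: run several independent descents from the same initial estimate, each with a randomly drawn step size, and keep the run with the lowest final loss, so that one poorly chosen step size does not decide the outcome. Here a hypothetical two-basin 1-D function stands in for Diff-DOPE's image-space error over a 6-DoF pose, and the learning-rate range and step count are assumptions.

```python
import math
import random

def loss(x):
    """Hypothetical 1-D stand-in for a render-and-compare error with two basins."""
    return (x * x - 1.0) ** 2 + 0.3 * x

def grad(x):
    return 4.0 * x * (x * x - 1.0) + 0.3

def refine(x0, lr, steps=200):
    """Plain gradient descent; a diverging run is reported with infinite loss."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
        if not math.isfinite(x) or abs(x) > 1e3:
            return x0, math.inf
    return x, loss(x)

rng = random.Random(0)
lrs = [10 ** rng.uniform(-3, -0.7) for _ in range(8)]  # random learning rates
runs = [refine(1.2, lr) for lr in lrs]                  # independent, so parallelizable
best_x, best_loss = min(runs, key=lambda r: r[1])
print(best_x, best_loss)
```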
On Memorization and Privacy risks of Sharpness Aware Minimization
Authors: Young In Kim, Pratiksha Agrawal, Johannes O. Royset, Rajiv Khanna
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.00488
Pdf link: https://arxiv.org/pdf/2310.00488
Abstract In many recent works, there is an increased focus on designing algorithms that seek flatter optima for neural network loss optimization as there is empirical evidence that it leads to better generalization performance in many datasets. In this work, we dissect these performance gains through the lens of data memorization in overparameterized models. We define a new metric that helps us identify which data points specifically do algorithms seeking flatter optima do better when compared to vanilla SGD. We find that the generalization gains achieved by Sharpness Aware Minimization (SAM) are particularly pronounced for atypical data points, which necessitate memorization. This insight helps us unearth higher privacy risks associated with SAM, which we verify through exhaustive empirical evaluations. Finally, we propose mitigation strategies to achieve a more desirable accuracy vs privacy tradeoff.
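For contrast with vanilla SGD, the standard Sharpness Aware Minimization step (Foret et al.) first perturbs the weights by $\epsilon = \rho\,\nabla L(w)/\lVert\nabla L(w)\rVert$ and then descends using the gradient at $w + \epsilon$. This sketch shows only that update on a hypothetical quadratic that is sharp along one axis and flat along the other; the paper's memorization metric and privacy evaluation are not reproduced.

```python
import math

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One SAM update: move to w + eps, then descend with the gradient there."""
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12
    eps = [rho * gi / norm for gi in g]                   # first-order worst-case perturbation
    g_adv = grad_fn([wi + ei for wi, ei in zip(w, eps)])  # gradient at perturbed weights
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

def sgd_step(w, grad_fn, lr=0.1):
    return [wi - lr * gi for wi, gi in zip(w, grad_fn(w))]

# Hypothetical loss L(w) = 0.5 * (a * w0**2 + w1**2): sharp along w0, flat along w1.
a = 10.0
grad_fn = lambda w: [a * w[0], w[1]]
print(sam_step([0.5, 0.5], grad_fn), sgd_step([0.5, 0.5], grad_fn))
```

Setting `rho=0` recovers the plain SGD step exactly, which makes the extra gradient evaluation the only computational difference between the two optimizers.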
Exploring Benchmarks for Self-Driving Labs using Color Matching
Authors: Tobias Ginsburg, Kyle Hippe, Ryan Lewis, Doga Ozgulbas, Aileen Cleary, Rory Butler, Casey Stone, Abraham Stroka, Ian Foster
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2310.00510
Pdf link: https://arxiv.org/pdf/2310.00510
Abstract Self-Driving Labs (SDLs) that combine automation of experimental procedures with autonomous decision making are gaining popularity as a means of increasing the throughput of scientific workflows. The task of identifying quantities of supplied colored pigments that match a target color, the color matching problem, provides a simple and flexible SDL test case, as it requires experiment proposal, sample creation, and sample analysis, three common components in autonomous discovery applications. We present a robotic solution to the color matching problem that allows for fully autonomous execution of a color matching protocol. Our solution leverages the WEI science factory platform to enable portability across different robotic hardware, the use of alternative optimization methods for continuous refinement, and automated publication of results for experiment tracking and post-hoc analysis.
Are Graph Neural Networks Optimal Approximation Algorithms?
Authors: Morris Yau, Eric Lu, Nikolaos Karalias, Jessica Xu, Stefanie Jegelka
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2310.00526
Pdf link: https://arxiv.org/pdf/2310.00526
Abstract In this work we design graph neural network architectures that can be used to obtain optimal approximation algorithms for a large class of combinatorial optimization problems using powerful algorithmic tools from semidefinite programming (SDP). Concretely, we prove that polynomial-sized message passing algorithms can represent the most powerful polynomial time algorithms for Max Constraint Satisfaction Problems assuming the Unique Games Conjecture. We leverage this result to construct efficient graph neural network architectures, OptGNN, that obtain high-quality approximate solutions on landmark combinatorial optimization problems such as Max Cut and maximum independent set. Our approach achieves strong empirical results across a wide range of real-world and synthetic datasets against both neural baselines and classical algorithms. Finally, we take advantage of OptGNN's ability to capture convex relaxations to design an algorithm for producing dual certificates of optimality (bounds on the optimal solution) from the learned embeddings of OptGNN.
A primal-dual perspective for distributed TD-learning
Authors: Han-Dong Lim, Donghwan Lee
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2310.00638
Pdf link: https://arxiv.org/pdf/2310.00638
Abstract The goal of this paper is to investigate distributed temporal difference (TD) learning for a networked multi-agent Markov decision process. The proposed approach is based on distributed optimization algorithms, which can be interpreted as primal-dual ordinary differential equation (ODE) dynamics subject to null-space constraints. Based on the exponential convergence behavior of the primal-dual ODE dynamics subject to null-space constraints, we examine the behavior of the final iterate in various distributed TD-learning scenarios, considering both constant and diminishing step-sizes and incorporating both i.i.d. and Markovian observation models. Unlike existing methods, the proposed algorithm does not require the assumption that the underlying communication network structure is characterized by a doubly stochastic matrix.
Fewer is More: Trojan Attacks on Parameter-Efficient Fine-Tuning
Authors: Lauren Hong (1), Ting Wang (1) ((1) Stony Brook University)
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2310.00648
Pdf link: https://arxiv.org/pdf/2310.00648
Abstract Parameter-efficient fine-tuning (PEFT) enables efficient adaptation of pre-trained language models (PLMs) to specific tasks. By tuning only a minimal set of (extra) parameters, PEFT achieves performance comparable to full fine-tuning. However, despite its prevalent use, the security implications of PEFT remain largely unexplored. In this paper, we conduct a pilot study revealing that PEFT exhibits unique vulnerability to trojan attacks. Specifically, we present PETA, a novel attack that accounts for downstream adaptation through bilevel optimization: the upper-level objective embeds the backdoor into a PLM while the lower-level objective simulates PEFT to retain the PLM's task-specific performance. With extensive evaluation across a variety of downstream tasks and trigger designs, we demonstrate PETA's effectiveness in terms of both attack success rate and unaffected clean accuracy, even after the victim user performs PEFT over the backdoored PLM using untainted data. Moreover, we empirically provide possible explanations for PETA's efficacy: the bilevel optimization inherently 'orthogonalizes' the backdoor and PEFT modules, thereby retaining the backdoor throughout PEFT. Based on this insight, we explore a simple defense that omits PEFT in selected layers of the backdoored PLM and unfreezes a subset of these layers' parameters, which is shown to effectively neutralize PETA.
Optimization or Architecture: How to Hack Kalman Filtering
Authors: Ido Greenberg, Netanel Yannay, Shie Mannor
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2310.00675
Pdf link: https://arxiv.org/pdf/2310.00675
Abstract In non-linear filtering, it is traditional to compare non-linear architectures such as neural networks to the standard linear Kalman Filter (KF). We observe that this mixes the evaluation of two separate components: the non-linear architecture, and the parameter optimization method. In particular, the non-linear model is often optimized, whereas the reference KF model is not. We argue that both should be optimized similarly, and to that end present the Optimized KF (OKF). We demonstrate that the KF may become competitive with neural models if optimized using OKF. This implies that experimental conclusions of certain previous studies were derived from a flawed process. The advantage of OKF over the standard KF is further studied theoretically and empirically, in a variety of problems. Conveniently, OKF can replace the KF in real-world systems by merely updating the parameters.
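The distinction between a KF whose noise parameters are set from noise statistics and one whose parameters are chosen to minimize the tracking error can be sketched with a scalar random-walk filter. Assumptions: the ground truth is a smooth sinusoid (so the random-walk model is misspecified), the "standard" baseline sets q from the variance of true increments and r from the sensor variance, and a small grid search stands in for whatever optimizer OKF actually uses. Tuning and scoring happen on the same data here; a real comparison would evaluate on held-out trajectories.

```python
import math
import random

def kf_mse(zs, truth, q, r, x0=0.0, p0=1.0):
    """Scalar random-walk Kalman filter; returns the MSE of its estimates against truth."""
    x, p, err = x0, p0, 0.0
    for z, s in zip(zs, truth):
        p += q               # predict: x_k = x_{k-1} + w_k, Var(w_k) = q
        k = p / (p + r)      # Kalman gain
        x += k * (z - x)     # update with z_k = x_k + v_k, Var(v_k) = r
        p *= 1.0 - k
        err += (x - s) ** 2
    return err / len(zs)

rng = random.Random(0)
truth = [math.sin(0.05 * t) for t in range(500)]  # smooth, not a random walk
zs = [s + rng.gauss(0.0, 0.5) for s in truth]

# Baseline: q and r taken from noise statistics (increment and sensor variances).
incs = [b - a for a, b in zip(truth, truth[1:])]
q_est = sum(d * d for d in incs) / len(incs)
r_est = 0.5 ** 2
baseline = kf_mse(zs, truth, q_est, r_est)

# Optimized variant: pick (q, r) that directly minimize the MSE (grid search here).
grid = [(q, r) for q in (q_est, 1e-5, 1e-4, 1e-3, 1e-2) for r in (r_est, 0.1, 0.5, 1.0)]
best_q, best_r = min(grid, key=lambda qr: kf_mse(zs, truth, *qr))
optimized = kf_mse(zs, truth, best_q, best_r)
print(baseline, optimized)
```

Only the two parameters change between the variants; the filter equations are identical, which mirrors the abstract's point that OKF can replace a KF by updating its parameters.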
Active Implicit Reconstruction Using One-Shot View Planning
Authors: Hao Hu, Sicong Pan, Liren Jin, Marija Popović, Maren Bennewitz
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2310.00685
Pdf link: https://arxiv.org/pdf/2310.00685
Abstract Active object reconstruction using autonomous robots is attracting growing interest. A primary goal in this task is to maximize the information gathered about the object to be reconstructed, given limited on-board resources. Previous view planning methods are inefficient since they rely on an iterative paradigm based on explicit representations, consisting of (1) planning a path to the next-best view only; and (2) requiring a considerable number of low-gain views in terms of surface coverage. To address these limitations, we integrate implicit representations into One-Shot View Planning (OSVP). The key idea behind our approach is to use implicit representations to obtain the small missing surface areas instead of observing them with extra views. Therefore, we design a deep neural network, named OSVP, to directly predict a set of views given a dense point cloud refined from an initial sparse observation. To train our OSVP network, we generate supervision labels using dense point clouds refined by implicit representations and set covering optimization problems. Simulated experiments show that our method achieves sufficient reconstruction quality, outperforming several baselines under limited view and movement budgets. We further demonstrate the applicability of our approach in a real-world object reconstruction scenario.
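The set covering problem used to generate view labels can be illustrated with the classic greedy heuristic: repeatedly pick the candidate view that observes the most still-uncovered surface points. The abstract does not say how the paper solves its set covering problems (it may use an exact solver); greedy is a swapped-in technique with a logarithmic approximation guarantee, and the view names and visibility sets below are hypothetical.

```python
def greedy_set_cover(universe, candidate_views):
    """Pick views until every surface point is covered (greedy ln(n)-approximation).

    candidate_views maps a view id to the set of surface-point ids it observes.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # max() returns the first view with the largest gain, so ties go to dict order.
        view = max(candidate_views, key=lambda v: len(candidate_views[v] & uncovered))
        gain = candidate_views[view] & uncovered
        if not gain:
            raise ValueError("remaining points are not visible from any view")
        chosen.append(view)
        uncovered -= gain
    return chosen

# Hypothetical visibility sets for five candidate viewpoints around an object.
views = {
    "v0": {0, 1, 2, 3},
    "v1": {3, 4, 5},
    "v2": {5, 6, 7, 8},
    "v3": {0, 8, 9},
    "v4": {1, 4, 7},
}
print(greedy_set_cover(range(10), views))  # ['v0', 'v2', 'v1', 'v3']
```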
A Simple Yet Effective Strategy to Robustify the Meta Learning Paradigm
Authors: Qi Wang, Yiqin Lv, Yanghe Feng, Zheng Xie, Jincai Huang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.00708
Pdf link: https://arxiv.org/pdf/2310.00708
Abstract Meta learning is a promising paradigm to enable skill transfer across tasks. Most previous methods employ the empirical risk minimization principle in optimization. However, the resulting poor fast adaptation on the worst subset of tasks can be catastrophic in risk-sensitive scenarios. To robustify fast adaptation, this paper optimizes meta learning pipelines from a distributionally robust perspective and meta trains models with the measure of expected tail risk. We adopt a two-stage heuristic strategy to solve the robust meta learning problem, controlling the worst fast adaptation cases at a certain probabilistic level. Experimental results show that our simple method can improve the robustness of meta learning to task distributions and reduce the conditional expectation of the worst fast adaptation risk.
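Expected tail risk over tasks is commonly measured by the empirical conditional value-at-risk: the mean loss over the worst $(1-\alpha)$ fraction of tasks. A two-stage reading of this, assumed here since the abstract gives no details, is (1) rank a batch of tasks by fast-adaptation loss and keep the tail, then (2) average (and, in training, meta-update on) only those tasks. The per-task losses are hypothetical.

```python
def tail_tasks(task_losses, alpha=0.7):
    """Stage 1: indices of the worst (1 - alpha) fraction of tasks (size rounded, at least 1)."""
    k = max(1, round((1 - alpha) * len(task_losses)))
    ranked = sorted(range(len(task_losses)), key=lambda i: task_losses[i], reverse=True)
    return ranked[:k]

def cvar(task_losses, alpha=0.7):
    """Stage 2 objective: mean loss over the tail tasks (empirical CVaR_alpha)."""
    idx = tail_tasks(task_losses, alpha)
    return sum(task_losses[i] for i in idx) / len(idx)

losses = [0.2, 1.5, 0.4, 0.9, 3.0, 0.1, 0.3, 0.8, 2.2, 0.5]  # hypothetical per-task losses
print(tail_tasks(losses), cvar(losses), sum(losses) / len(losses))
```

Minimizing this tail average instead of the plain mean targets exactly the worst fast-adaptation cases that empirical risk minimization can neglect.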
Automatic Data Repair: Are We Ready to Deploy?
Authors: Wei Ni, Xiaoye Miao, Xiangyu Zhao, Yangyang Wu, Jianwei Yin
Subjects: Databases (cs.DB)
Arxiv link: https://arxiv.org/abs/2310.00711
Pdf link: https://arxiv.org/pdf/2310.00711
Abstract Data quality is paramount in today's data-driven world, especially in the era of generative AI. Dirty data with errors and inconsistencies usually leads to flawed insights, unreliable decision-making, and biased or low-quality outputs from generative models. The study of repairing erroneous data has gained significant importance. Existing data repair algorithms differ in information utilization and problem settings, and have been tested only in limited scenarios. In this paper, we first compare and summarize these algorithms using a new guided information-based taxonomy. We then systematically conduct a comprehensive evaluation of 12 mainstream data repair algorithms under various data error rates, error types, and downstream analysis tasks, assessing their error reduction performance with a novel metric. We also develop an effective and unified repair optimization strategy that substantially benefits state-of-the-art methods, as empirically confirmed. We demonstrate that purely clean data may not necessarily yield the best performance in data analysis tasks and that data is always worth repairing regardless of the error rate. Based on these observations and insights, we provide practical guidelines for 5 scenarios and 2 main data analysis tasks. We anticipate that this paper will enable researchers and users to better understand and deploy data repair algorithms in practice. Finally, we outline research challenges and promising future directions in the data repair field.
Efficient MPC for Emergency Evasive Maneuvers, Part II: Comparative Assessment for Hybrid Control
Authors: Leila Gharavi, Bart De Schutter, Simone Baldi
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.00716
Pdf link: https://arxiv.org/pdf/2310.00716
Abstract Optimization-based approaches such as Model Predictive Control (MPC) are promising for proactive control in safety-critical applications with changing environments, such as automated driving systems. However, the computational complexity of the MPC optimization problem, coupled with the need for real-time control in hazardous scenarios, is the main bottleneck in realizing automation levels four and five for driving systems. In this paper, we construct hybrid formulations of the nonlinear MPC problem for tracking control during emergency evasive maneuvers and assess their computational efficiency in terms of accuracy and solution time. To hybridize the MPC problem, we combine three hybrid approximations of the prediction model and four approximations of the nonlinear stability and tire saturation constraints, and simulate the closed-loop behavior of the resulting controllers during five emergency maneuvers for different prediction horizons. Further, we compare the robustness of the controllers in the presence of friction uncertainty to assess the accuracy-time trade-off in cases where the road friction is either unknown or has an offset error with respect to the prediction model. This robustness is studied for different levels of friction uncertainty and with respect to the proximity to the vehicle handling limits. We show that hybridizing the MPC problem is an efficient approach for real-time implementation of MPC during emergency evasive maneuvers, paving the way for high levels of automation.
Spectral Neural Networks: Approximation Theory and Optimization Landscape
Authors: Chenghui Li, Rishi Sonthalia, Nicolas Garcia Trillos
Subjects: Machine Learning (cs.LG); Analysis of PDEs (math.AP); Numerical Analysis (math.NA); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.00729
Pdf link: https://arxiv.org/pdf/2310.00729
Abstract There is a large variety of machine learning methodologies that are based on the extraction of spectral geometric information from data. However, the implementations of many of these methods often depend on traditional eigensolvers, which present limitations when applied in practical online big data scenarios. To address some of these challenges, researchers have proposed different strategies for training neural networks as alternatives to traditional eigensolvers, with one such approach known as Spectral Neural Network (SNN). In this paper, we investigate key theoretical aspects of SNN. First, we present quantitative insights into the tradeoff between the number of neurons and the amount of spectral geometric information a neural network learns. Second, we initiate a theoretical exploration of the optimization landscape of SNN's objective to shed light on the training dynamics of SNN. Unlike typical studies of convergence to global solutions of NN training dynamics, SNN presents an additional complexity due to its non-convex ambient loss function.
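To make the "non-convex ambient loss" concrete, here is a small sketch assuming the commonly used SNN objective ||Y Y^T - A||_F^2, where A is a symmetric similarity matrix. In an actual SNN, Y would be the output of a neural network; here Y is optimized directly by gradient descent (the ambient problem), and the toy matrix and step size are illustrative choices, not taken from the paper.

```python
import numpy as np

def snn_ambient_loss(Y, A):
    """Squared Frobenius distance ||Y Y^T - A||_F^2."""
    R = Y @ Y.T - A
    return float(np.sum(R * R))

def fit_ambient(A, r, lr=0.01, steps=3000, seed=0):
    """Gradient descent on the ambient loss over Y in R^{n x r}.

    For symmetric A the gradient is 4 (Y Y^T - A) Y; Y is optimized
    directly instead of being produced by a network.
    """
    rng = np.random.default_rng(seed)
    Y = 0.1 * rng.standard_normal((A.shape[0], r))  # small random init
    for _ in range(steps):
        Y = Y - lr * 4.0 * (Y @ Y.T - A) @ Y
    return Y

A = np.diag([3.0, 1.0, 0.2])  # toy symmetric "similarity" matrix
Y = fit_ambient(A, r=1)
loss = snn_ambient_loss(Y, A)  # approaches 1.0**2 + 0.2**2 = 1.04
```

With r = 1, Y Y^T recovers the leading eigenpair (3 times e1 e1^T), so the residual loss equals the energy of the discarded eigenvalues; the number of columns r plays the role of the "number of neurons vs. spectral information" trade-off only loosely here.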
Deterministic Langevin Unconstrained Optimization with Normalizing Flows
Authors: James M. Sullivan, Uros Seljak
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2310.00745
Pdf link: https://arxiv.org/pdf/2310.00745
Abstract We introduce a global, gradient-free surrogate optimization strategy for expensive black-box functions inspired by the Fokker-Planck and Langevin equations. These can be written as an optimization problem where the objective is the target function to maximize minus the logarithm of the current density of evaluated samples. This objective balances exploitation of the target objective with exploration of low-density regions. The method, Deterministic Langevin Optimization (DLO), relies on a Normalizing Flow density estimate to perform active learning and select proposal points for evaluation. This strategy differs qualitatively from the widely-used acquisition functions employed by Bayesian Optimization methods, and can accommodate a range of surrogate choices. We demonstrate superior or competitive progress toward objective optima on standard synthetic test functions, as well as on non-convex and multi-modal posteriors of moderate dimension. On real-world objectives, such as scientific and neural network hyperparameter optimization, DLO is competitive with state-of-the-art baselines.
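The acquisition rule described above (target value minus the log-density of already-evaluated samples) can be sketched in one dimension. This sketch swaps the paper's Normalizing Flow for a Gaussian kernel density estimate, uses a cheap function in place of the surrogate, and scores a fixed candidate list; all names and the bandwidth are hypothetical.

```python
import math

def kde_log_density(x, samples, bandwidth=0.3):
    # Gaussian KDE used here as a stand-in for the normalizing-flow density.
    total = sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
    norm = len(samples) * bandwidth * math.sqrt(2.0 * math.pi)
    return math.log(total / norm + 1e-300)  # floor avoids log(0)

def propose_next(target, evaluated, candidates, bandwidth=0.3):
    """Pick the candidate maximizing target(x) - log p(x), where p is the density
    of already-evaluated points: exploitation plus exploration of sparse regions."""
    return max(candidates,
               key=lambda x: target(x) - kde_log_density(x, evaluated, bandwidth))

target = lambda x: -(x - 1.0) ** 2  # hypothetical cheap surrogate, peak at x = 1
choice = propose_next(target, evaluated=[0.0, 0.1, -0.1], candidates=[0.0, 1.0, 2.0])
```

With only three evaluated points clustered near 0, the density term dominates and the far candidate x = 2.0 wins over the surrogate's optimum at 1.0; as samples accumulate, the exploration bonus shrinks.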
RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models
Authors: Zekun Moore Wang, Zhongyuan Peng, Haoran Que, Jiaheng Liu, Wangchunshu Zhou, Yuhan Wu, Hongcheng Guo, Ruitong Gan, Zehao Ni, Man Zhang, Zhaoxiang Zhang, Wanli Ouyang, Ke Xu, Wenhu Chen, Jie Fu, Junran Peng
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.00746
Pdf link: https://arxiv.org/pdf/2310.00746
Abstract The advent of Large Language Models (LLMs) has paved the way for complex tasks such as role-playing, which enhances user interactions by enabling models to imitate various characters. However, the closed-source nature of state-of-the-art LLMs and their general-purpose training limit role-playing optimization. In this paper, we introduce RoleLLM, a framework to benchmark, elicit, and enhance role-playing abilities in LLMs. RoleLLM comprises four stages: (1) Role Profile Construction for 100 roles; (2) Context-Based Instruction Generation (Context-Instruct) for role-specific knowledge extraction; (3) Role Prompting using GPT (RoleGPT) for speaking style imitation; and (4) Role-Conditioned Instruction Tuning (RoCIT) for fine-tuning open-source models along with role customization. By Context-Instruct and RoleGPT, we create RoleBench, the first systematic and fine-grained character-level benchmark dataset for role-playing, with 168,093 samples. Moreover, RoCIT on RoleBench yields RoleLLaMA (English) and RoleGLM (Chinese), significantly enhancing role-playing abilities and even achieving results comparable to RoleGPT (using GPT-4).
SEED: Simple, Efficient, and Effective Data Management via Large Language Models
Authors: Zui Chen, Lei Cao, Sam Madden, Ju Fan, Nan Tang, Zihui Gu, Zeyuan Shang, Chunwei Liu, Michael Cafarella, Tim Kraska
Subjects: Databases (cs.DB); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.00749
Pdf link: https://arxiv.org/pdf/2310.00749
Abstract We introduce SEED, an LLM-centric system that allows users to easily create efficient and effective data management applications. SEED comprises three main components (code generation, model generation, and augmented LLM query) to address the challenges that LLM services are computationally and economically expensive and do not always work well in all cases for a given data management task. SEED addresses the expense challenge by localizing LLM computation as much as possible. This includes replacing most LLM calls with local code and local models, and augmenting LLM queries with batching and data access tools. To ensure effectiveness, SEED features a range of optimization techniques to enhance the localized solution and the LLM queries, including automatic code validation, code ensemble, model representative selection, and selective tool usage. Moreover, with SEED users can easily construct a data management solution customized to their applications. It allows users to configure each component and compose an execution pipeline in natural language. SEED then automatically compiles it into an executable program. We showcase the efficiency and effectiveness of SEED on diverse data management tasks such as data imputation and NL2SQL translation, achieving state-of-the-art few-shot performance while significantly reducing the number of required LLM calls.
Data-driven adaptive building thermal controller tuning with constraints: A primal-dual contextual Bayesian optimization approach
Authors: Wenjie Xu, Bratislav Svetozarevic, Loris Di Natale, Philipp Heer, Colin N Jones
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.00758
Pdf link: https://arxiv.org/pdf/2310.00758
Abstract We study the problem of tuning the parameters of a room temperature controller to minimize its energy consumption, subject to the constraint that the daily cumulative thermal discomfort of the occupants is below a given threshold. We formulate it as an online constrained black-box optimization problem where, on each day, we observe some relevant environmental context and adaptively select the controller parameters. In this paper, we propose to use a data-driven Primal-Dual Contextual Bayesian Optimization (PDCBO) approach to solve this problem. In a simulation case study on a single room, we apply our algorithm to tune the parameters of a Proportional Integral (PI) heating controller and the pre-heating time. Our results show that PDCBO can save up to 4.7% energy consumption compared to other state-of-the-art Bayesian optimization-based methods while keeping the daily thermal discomfort below the given tolerable threshold on average. Additionally, PDCBO can automatically track time-varying tolerable thresholds while existing methods fail to do so. We then study an alternative constrained tuning problem where we aim to minimize the thermal discomfort with a given energy budget. With this formulation, PDCBO reduces the average discomfort by up to 63% compared to state-of-the-art safe optimization methods while keeping the average daily energy consumption below the required threshold.
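The primal-dual idea behind the constrained tuning problem above can be illustrated with a toy loop: each day, pick the parameter minimizing energy plus a dual-weighted discomfort, then update the dual variable by projected ascent on the constraint violation. This is a hedged sketch only; known deterministic functions replace the paper's contextual Gaussian-process models, and the setpoint grid, step size, and function names are hypothetical.

```python
def primal_dual_tuning(energy, discomfort, candidates, threshold, days, step=0.5):
    """Toy primal-dual loop (no GP, no context): returns (choice, dual) per day."""
    lam = 0.0  # dual variable for the discomfort constraint
    history = []
    for _ in range(days):
        # Primal step: minimize the Lagrangian energy + lam * discomfort.
        x = min(candidates, key=lambda c: energy(c) + lam * discomfort(c))
        # Dual step: projected ascent on the observed constraint violation.
        violation = discomfort(x) - threshold
        lam = max(0.0, lam + step * violation)
        history.append((x, lam))
    return history

history = primal_dual_tuning(
    energy=lambda c: float(c),                    # hypothetical: energy grows with setpoint
    discomfort=lambda c: float(max(0, 21 - c)),   # hypothetical: discomfort below 21 C
    candidates=[18, 19, 20, 21, 22], threshold=1.0, days=9)
```

On day 1 the dual variable is zero, so the cheapest setpoint is chosen; repeated violations raise the dual variable until warmer setpoints become preferable, which is how the constraint is enforced on average rather than every day.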
Bayesian Design Principles for Frequentist Sequential Learning
Authors: Yunbei Xu, Assaf Zeevi
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Statistics Theory (math.ST)
Arxiv link: https://arxiv.org/abs/2310.00806
Pdf link: https://arxiv.org/pdf/2310.00806
Abstract We develop a general theory to optimize the frequentist regret for sequential learning problems, where efficient bandit and reinforcement learning algorithms can be derived from unified Bayesian principles. We propose a novel optimization approach to generate "algorithmic beliefs" at each round, and use Bayesian posteriors to make decisions. The optimization objective to create "algorithmic beliefs," which we term "Algorithmic Information Ratio," represents an intrinsic complexity measure that effectively characterizes the frequentist regret of any algorithm. To the best of our knowledge, this is the first systematic approach to make Bayesian-type algorithms prior-free and applicable to adversarial settings, in a generic and optimal manner. Moreover, the algorithms are simple and often efficient to implement. As a major application, we present a novel algorithm for multi-armed bandits that achieves "best-of-all-worlds" empirical performance in stochastic, adversarial, and non-stationary environments. We also illustrate how these principles can be used in linear bandits, bandit convex optimization, and reinforcement learning.
Parameter-Efficient Tuning Helps Language Model Alignment
Authors: Tianci Xue, Ziqi Wang, Heng Ji
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2310.00819
Pdf link: https://arxiv.org/pdf/2310.00819
Abstract Aligning large language models (LLMs) with human preferences is essential for safe and useful LLMs. Previous works mainly adopt reinforcement learning from human feedback (RLHF) and direct preference optimization (DPO) for alignment. Nevertheless, these methods have certain drawbacks. One limitation is that they can only align models with one preference at training time (e.g., they cannot learn to generate concise responses when the preference data prefers detailed responses), or they impose constraints on the data format (e.g., DPO only supports pairwise preference data). To address this, prior works incorporate controllable generation into alignment, so that language models learn multiple preferences and can provide outputs with different preferences during inference if asked. Controllable generation also offers more flexibility with regard to data format (e.g., it supports pointwise preference data). Specifically, it uses different control tokens for different preferences during training and inference, making LLMs behave differently when required. Current controllable generation methods use either a special token or hand-crafted prompts as control tokens, and optimize them together with LLMs. As control tokens are typically much lighter than LLMs, this optimization strategy may not effectively optimize control tokens. We therefore first use parameter-efficient tuning (e.g., prompt tuning and low-rank adaptation) to optimize control tokens and then fine-tune models for controllable generation, similar to prior works. Our approach, alignMEnt with parameter-Efficient Tuning (MEET), improves the quality of control tokens, thus consistently improving controllable generation quality by a clear margin on two well-recognized datasets compared with prior works.
Online Sensitivity Optimization in Differentially Private Learning
Authors: Filippo Galli, Catuscia Palamidessi, Tommaso Cucinotta
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.00829
Pdf link: https://arxiv.org/pdf/2310.00829
Abstract Training differentially private machine learning models requires constraining an individual's contribution to the optimization process. This is achieved by clipping the $2$-norm of their gradient at a predetermined threshold prior to averaging and batch sanitization. This choice of threshold adversely influences optimization in two opposing ways: lower values exacerbate the bias due to excessive clipping, while higher values augment the sanitization noise. The best threshold depends strongly on factors such as the dataset and model architecture, and even varies within the same optimization run, demanding meticulous tuning that is usually accomplished through a grid search. In order to circumvent the privacy expenses incurred in hyperparameter tuning, we present a novel approach to dynamically optimize the clipping threshold. We treat this threshold as an additional learnable parameter, establishing a clean relationship between the threshold and the cost function. This allows us to optimize the former with gradient descent, with minimal repercussions on the overall privacy analysis. Our method is thoroughly assessed against alternative fixed and adaptive strategies across diverse datasets, tasks, model dimensions, and privacy levels. Our results demonstrate comparable or superior performance in all evaluated scenarios, given the same privacy requirements.
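The clipping-and-sanitization step that the threshold controls can be sketched as standard DP-SGD-style aggregation: clip each per-sample gradient to 2-norm at most c, sum, add Gaussian noise with standard deviation proportional to c, and average. This illustrates only the bias/noise trade-off described above; the paper's gradient-based update of the threshold itself is not reproduced, and the function names are hypothetical.

```python
import math
import random

def clip(grad, c):
    """Scale a per-sample gradient so its 2-norm is at most c."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, c / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def sanitized_mean(per_sample_grads, c, noise_multiplier, rng):
    """Clip each gradient to norm c, sum, add N(0, (noise_multiplier * c)^2)
    noise per coordinate, then average over the batch."""
    dim = len(per_sample_grads[0])
    total = [0.0] * dim
    for g in per_sample_grads:
        for i, v in enumerate(clip(g, c)):
            total[i] += v
    n = len(per_sample_grads)
    return [(t + rng.gauss(0.0, noise_multiplier * c)) / n for t in total]

grads = [[3.0, 4.0], [0.3, 0.4]]
noiseless = sanitized_mean(grads, c=1.0, noise_multiplier=0.0, rng=random.Random(0))
```

A small c shrinks the large gradient [3, 4] to [0.6, 0.8] (bias), while a large c leaves gradients intact but scales the injected noise with it, which is why the threshold is worth tuning or learning.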
Necessary and Sufficient Watermark for Large Language Models
Authors: Yuki Takezawa, Ryoma Sato, Han Bao, Kenta Niwa, Makoto Yamada
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.00833
Pdf link: https://arxiv.org/pdf/2310.00833
Abstract In recent years, large language models (LLMs) have achieved remarkable performance in various NLP tasks. They can generate texts that are indistinguishable from those written by humans. Such remarkable performance increases the risk of LLMs being used for malicious purposes, such as generating fake news articles. Therefore, it is necessary to develop methods for distinguishing texts written by LLMs from those written by humans. Watermarking is one of the most powerful methods for achieving this. Although existing watermarking methods have successfully detected texts generated by LLMs, they significantly degrade the quality of the generated texts. In this study, we propose the Necessary and Sufficient Watermark (NS-Watermark) for inserting watermarks into generated texts without degrading the text quality. More specifically, we derive the minimum constraints that must be imposed on the generated texts to distinguish whether LLMs or humans wrote them. Then, we formulate the NS-Watermark as a constrained optimization problem and propose an efficient algorithm to solve it. Through experiments, we demonstrate that the NS-Watermark can generate more natural texts than existing watermarking methods and distinguish more accurately between texts written by LLMs and those written by humans. Especially in machine translation tasks, the NS-Watermark can outperform the existing watermarking method by up to 30 BLEU points.
Regulating CPU Temperature With Thermal-Aware Scheduling Using a Reduced Order Learning Thermal Model
Authors: Anthony Dowling, Lin Jiang, Ming-Cheng Cheng, Yu Liu
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.00854
Pdf link: https://arxiv.org/pdf/2310.00854
Abstract Modern real-time systems utilize considerable amounts of power while executing computation-intensive tasks. The execution of these tasks leads to significant power dissipation and heating of the device. It therefore results in severe thermal issues such as temperature escalation, high thermal gradients, and excessive hot spot formation, which may degrade chip performance, accelerate device aging, and cause premature failure. Thermal-Aware Scheduling (TAS) enables the optimization of thermal dissipation to maintain a safe thermal state. In this work, we implement a new TAS algorithm, POD-TAS, which manages the thermal behavior of the cores based on a defined set of states and their transitions. We compare the performance of a dynamic Resistor-Capacitor (RC) thermal circuit simulator (HotSpot) and a reduced order Proper Orthogonal Decomposition (POD)-based thermal model, and we select the latter for use in our POD-TAS algorithm. We implement a novel simulation-based evaluation methodology to compare TAS algorithms. This methodology is used to evaluate the performance of the proposed POD-TAS algorithm with high spatiotemporal resolution. Additionally, we compare the performance of a state-of-the-art TAS algorithm, RT-TAS, to our proposed POD-TAS algorithm. Furthermore, we utilize the Clarkson Open-source Multi-physics Benchmark Suite (COMBS) to provide CPU workloads for task scheduling. Our experimental results on a multi-core processor using a set of 4 benchmarks demonstrate that the proposed POD-TAS method can improve thermal performance by decreasing the peak thermal variance by 53.0% and the peak chip temperature by 29.01%. Using a set of 8 benchmarks, the comparison of the algorithms demonstrates that POD-TAS decreases the peak spatial variance of the chip temperature and the peak chip temperature by 29.57% and 26.26%, respectively.
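The reduced order POD thermal model mentioned above rests on a standard construction: collect temperature snapshots, subtract the mean field, and keep the leading left singular vectors as modes. The sketch below shows only that model-reduction step on synthetic data; the paper's training data, mode count, and the scheduling logic of POD-TAS are not reproduced, and the helper names are hypothetical.

```python
import numpy as np

def pod_modes(snapshots, k):
    """POD of a snapshot matrix (rows: spatial points, columns: time samples).
    Returns the mean field, the k leading modes, and the captured energy fraction."""
    mean = snapshots.mean(axis=1, keepdims=True)
    U, s, _ = np.linalg.svd(snapshots - mean, full_matrices=False)
    energy = float(np.sum(s[:k] ** 2) / np.sum(s ** 2))
    return mean, U[:, :k], energy

def reduce_and_reconstruct(field, mean, modes):
    """Project a temperature field onto the modes and lift it back."""
    coeffs = modes.T @ (field - mean)
    return mean + modes @ coeffs, coeffs

x = np.linspace(0.0, 1.0, 50)[:, None]  # 50 spatial points
t = np.linspace(0.0, 1.0, 20)[None, :]  # 20 time samples
# Synthetic temperatures: base value plus two separable space-time patterns.
snapshots = 40.0 + 5.0 * np.sin(np.pi * x) * np.cos(3 * t) + 2.0 * x * t
mean, modes, energy = pod_modes(snapshots, k=2)
recon, coeffs = reduce_and_reconstruct(snapshots[:, [5]], mean, modes)
```

Because the synthetic field has only two independent space-time patterns, two modes capture essentially all of the fluctuation energy, and each 50-point field is summarized by 2 coefficients; real chip temperatures need more modes, which is the accuracy/cost knob of a POD model.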
Dynamic Manipulation of a Deformable Linear Object: Simulation and Learning
Authors: Qi Jing Chen, Timothy Bretl, Nghia Vuong, Quang-Cuong Pham
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2310.00911
Pdf link: https://arxiv.org/pdf/2310.00911
Abstract We show that it is possible to learn an open-loop policy in simulation for the dynamic manipulation of a deformable linear object (DLO) -- e.g., a rope, wire, or cable -- that can be executed by a real robot without additional training. Our method is enabled by integrating an existing state-of-the-art DLO model (Discrete Elastic Rods) with MuJoCo, a robot simulator. We describe how this integration was done, check that validation results produced in simulation match what we expect from analysis of the physics, and apply policy optimization to train an open-loop policy from data collected only in simulation that uses a robot arm to fling a wire precisely between two obstacles. This policy achieves a success rate of 76.7% when executed by a real robot in hardware experiments without additional training on the real task.
Trained Latent Space Navigation to Prevent Lack of Photorealism in Generated Images on Style-based Models
Authors: Takumi Harada, Kazuyuki Aihara, Hiroyuki Sakai
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.00936
Pdf link: https://arxiv.org/pdf/2310.00936
Abstract Recent studies on StyleGAN variants show promising performance for various generation tasks. In these models, latent codes have traditionally been manipulated and searched for the desired images. However, this approach sometimes suffers from a lack of photorealism in generated images due to a lack of knowledge about the geometry of the trained latent space. In this paper, we show a simple unsupervised method that provides a well-trained local latent subspace, enabling latent code navigation while preserving the photorealism of the generated images. Specifically, the method identifies densely mapped latent spaces and restricts latent manipulations to the local latent subspace. Experimental results demonstrate that images generated within the local latent subspace maintain photorealism even when the latent codes are significantly and repeatedly manipulated. Moreover, experiments show that the method can be applied to latent code optimization for various types of style-based models. We expect our empirical evidence for the method to benefit applications of style-based models.
Multi-Agent Bayesian Optimization with Coupled Black-Box and Affine Constraints
Authors: Wenjie Xu, Yuning Jiang, Bratislav Svetozarevic, Colin N. Jones
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2310.00962
Pdf link: https://arxiv.org/pdf/2310.00962
Abstract This paper studies the problem of distributed multi-agent Bayesian optimization with both coupled black-box constraints and known affine constraints. A primal-dual distributed algorithm is proposed that achieves similar regret/violation bounds as those in the single-agent case for the black-box objective and constraint functions. Additionally, the algorithm guarantees an $\mathcal{O}(N\sqrt{T})$ bound on the cumulative violation for the known affine constraints, where $N$ is the number of agents. Hence, it is ensured that the average of the samples satisfies the affine constraints up to the error $\mathcal{O}({N}/{\sqrt{T}})$. Furthermore, we characterize certain conditions under which our algorithm can bound a stronger metric of cumulative violation and provide best-iterate convergence without affine constraint. The method is then applied to both sampled instances from Gaussian processes and a real-world optimal power allocation problem for wireless communication; the results show that our method simultaneously provides close-to-optimal performance and maintains minor violations on average, corroborating our theoretical analysis.
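The step from the cumulative bound to the averaged bound follows from linearity of the affine constraints. Assuming the constraints take the form $Ax \le b$ and that cumulative violation is measured as the norm of the positive part of the summed violations (the abstract does not spell out the exact definition), with $\bar{x}_T$ the average of the samples $x_1, \dots, x_T$:

```latex
\bar{x}_T = \frac{1}{T}\sum_{t=1}^{T} x_t, \qquad
A\bar{x}_T - b = \frac{1}{T}\sum_{t=1}^{T}\left(Ax_t - b\right)
\;\Longrightarrow\;
\bigl\|[A\bar{x}_T - b]_+\bigr\|
= \frac{1}{T}\Bigl\|\Bigl[\sum_{t=1}^{T}(Ax_t - b)\Bigr]_+\Bigr\|
\le \frac{\mathcal{O}(N\sqrt{T})}{T}
= \mathcal{O}\!\left(\frac{N}{\sqrt{T}}\right)
```

The equality uses $[cv]_+ = c\,[v]_+$ for any $c > 0$, so dividing the cumulative bound by $T$ gives the stated $\mathcal{O}(N/\sqrt{T})$ error for the averaged sample.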
All by Myself: Learning Individualized Competitive Behaviour with a Contrastive Reinforcement Learning optimization
Authors: Pablo Barros, Alessandra Sciutti
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.00964
Pdf link: https://arxiv.org/pdf/2310.00964
Abstract In a competitive game scenario, a set of agents has to learn decisions that maximize their own goals and minimize their adversaries' goals at the same time. Besides dealing with the increased dynamics of the scenarios due to the opponents' actions, they usually have to understand how to overcome the opponents' strategies. Most common solutions, usually based on continual learning or centralized multi-agent experiences, however, do not allow the development of personalized strategies to face individual opponents. In this paper, we propose a novel model composed of three neural layers that learn a representation of a competitive game, learn how to map the strategy of specific opponents, and learn how to disrupt them. The entire model is trained online, using a composed loss based on a contrastive optimization, to learn competitive and multiplayer games. We evaluate our model on a Pokemon duel scenario and the four-player competitive Chef's Hat card game. Our experiments demonstrate that our model achieves better performance when playing against offline, online, and competitive-specific models, in particular when playing against the same opponent multiple times. We also discuss the impact of our model, in particular how well it deals with learning specific strategies in each of the two scenarios.
BeBOP -- Combining Reactive Planning and Bayesian Optimization to Solve Robotic Manipulation Tasks
Authors: Jonathan Styrud, Matthias Mayr, Erik Hellsten, Volker Krueger, Christian Smith
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2310.00971
Pdf link: https://arxiv.org/pdf/2310.00971
Abstract Robotic systems for manipulation tasks are increasingly expected to be easy to configure for new tasks. While in the past, robot programs were often written statically and tuned manually, the current, faster transition times call for robust, modular and interpretable solutions that also allow a robotic system to learn how to perform a task. We propose the method Behavior-based Bayesian Optimization and Planning (BeBOP) that combines two approaches for generating behavior trees: we build the structure using a reactive planner and learn specific parameters with Bayesian optimization. The method is evaluated on a set of robotic manipulation benchmarks and is shown to outperform state-of-the-art reinforcement learning algorithms by being up to 46 times faster while simultaneously being less dependent on reward shaping. We also propose a modification to the uncertainty estimate for the random forest surrogate models that drastically improves the results.
ViPlanner: Visual Semantic Imperative Learning for Local Navigation
Authors: Pascal Roth, Julian Nubert, Fan Yang, Mayank Mittal, Marco Hutter
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2310.00982
Pdf link: https://arxiv.org/pdf/2310.00982
Abstract Real-time path planning in outdoor environments still challenges modern robotic systems due to differences in terrain traversability, diverse obstacles, and the necessity for fast decision-making. Established approaches have primarily focused on geometric navigation solutions, which work well for structured geometric obstacles but have limitations regarding the semantic interpretation of different terrain types and their affordances. Moreover, these methods fail to identify traversable geometric occurrences, such as stairs. To overcome these issues, we introduce ViPlanner, a learned local path planning approach that generates local plans based on geometric and semantic information. The system is trained using the Imperative Learning paradigm, for which the network weights are optimized end-to-end based on the planning task objective. This optimization uses a differentiable formulation of a semantic costmap, which enables the planner to distinguish between the traversability of different terrains and accurately identify obstacles. The semantic information is represented in 30 classes using an RGB colorspace that can effectively encode the multiple levels of traversability. We show that the planner can adapt to diverse real-world environments without requiring any real-world training. In fact, the planner is trained purely in simulation, enabling a highly scalable training data generation. Experimental results demonstrate resistance to noise, zero-shot sim-to-real transfer, and a decrease of 38.02% in terms of traversability cost compared to purely geometric-based approaches. Code and models are made publicly available: https://github.com/leggedrobotics/viplanner.
A Robust Machine Learning Approach for Path Loss Prediction in 5G Networks with Nested Cross Validation
Authors: Ibrahim Yazıcı, Emre Gures
Subjects: Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2310.01030
Pdf link: https://arxiv.org/pdf/2310.01030
Abstract The design and deployment of fifth-generation (5G) wireless networks pose significant challenges due to the increasing number of wireless devices. Path loss is of key importance in network performance optimization, and accurate prediction of path loss, which characterizes the attenuation of signal power during transmission, is critical for effective network planning, coverage estimation, and optimization. In this sense, we use machine learning (ML) methods, which overcome the drawbacks of conventional path loss prediction models, to predict path loss in a 5G network system, facilitating more accurate network planning, resource optimization, and performance improvement in wireless communication systems. To this end, we use a novel approach, a nested cross validation scheme, with ML to prevent overfitting, thereby obtaining a better generalization error and stable results for ML deployment. First, we acquire a publicly available dataset obtained through a comprehensive measurement campaign conducted in an urban macro-cell scenario in Beijing, China. The dataset includes crucial information such as longitude, latitude, elevation, altitude, clutter height, and distance, which are used as essential features to predict path loss in the 5G network system. We deploy Support Vector Regression (SVR), CatBoost Regression (CBR), eXtreme Gradient Boosting Regression (XGBR), Artificial Neural Network (ANN), and Random Forest (RF) methods to predict path loss, and compare the prediction results in terms of Mean Absolute Error (MAE) and Mean Square Error (MSE). According to the results, XGBR outperforms the other methods. It outperforms CBR by a slight margin of 0.4% and 1% in terms of MAE and MSE, respectively, and outperforms the remaining methods by clear margins.
A Novel Approach for Machine Learning-based Load Balancing in High-speed Train System using Nested Cross Validation
Authors: Ibrahim Yazici, Emre Gures
Subjects: Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2310.01034
Pdf link: https://arxiv.org/pdf/2310.01034
Abstract Fifth-generation (5G) mobile communication networks have recently emerged in various fields, including high-speed trains. However, the dense deployment of 5G millimeter wave (mmWave) base stations (BSs) and the high speed of moving trains lead to frequent handovers (HOs), which can adversely affect the Quality-of-Service (QoS) of mobile users. As a result, HO optimization and resource allocation are essential considerations for managing mobility in high-speed train systems. In this paper, we model the system performance of a high-speed train system with a novel machine learning (ML) approach, a nested cross validation scheme that prevents information leakage from model evaluation into model parameter tuning, thereby avoiding overfitting and resulting in a better generalization error. To this end, we employ ML methods for the high-speed train system scenario. Handover Margin (HOM) and Time-to-Trigger (TTT) values are used as features, several KPIs are used as outputs, and several ML methods, including Gradient Boosting Regression (GBR), Adaptive Boosting (AdaBoost), CatBoost Regression (CBR), Artificial Neural Network (ANN), Kernel Ridge Regression (KRR), Support Vector Regression (SVR), and k-Nearest Neighbor Regression (KNNR), are employed for the problem. Finally, the cross validation schemes are compared across these methods in terms of mean absolute error (MAE) and mean square error (MSE). According to the results, the boosting methods (ABR, CBR, GBR) with the nested cross validation scheme clearly outperform the same methods under the conventional cross validation scheme. On the other hand, SVR, KNNR, KRR, and ANN with the nested scheme produce promising results for the prediction of some KPIs relative to their conventional-scheme counterparts.
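The nested cross validation scheme used in this and the preceding entry can be sketched without any ML library: an inner CV loop selects a hyperparameter using only the outer-training portion, and the outer loop measures error on folds that never influenced tuning. The sketch tunes k for a 1-D k-nearest-neighbour regressor (KNNR is one of the listed methods); the toy data, fold counts, and helper names are hypothetical, not the authors' setup.

```python
def knn_predict(train, x, k):
    """k-NN regression on 1-D features: mean target of the k closest points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def mean_abs_error(train, test, k):
    return sum(abs(knn_predict(train, x, k) - y) for x, y in test) / len(test)

def split(data, n_folds):
    """Interleaved split: item i goes to fold i mod n_folds."""
    return [data[i::n_folds] for i in range(n_folds)]

def cv_error(data, k, n_folds):
    """Plain K-fold CV estimate of the MAE for a fixed k."""
    parts = split(data, n_folds)
    errs = []
    for i, held_out in enumerate(parts):
        fit = [p for j, part in enumerate(parts) if j != i for p in part]
        errs.append(mean_abs_error(fit, held_out, k))
    return sum(errs) / n_folds

def nested_cv(data, ks, outer=5, inner=3):
    """Outer folds estimate generalization error; k is tuned by an inner CV that
    sees only the outer-training portion, so the outer test fold never leaks into tuning."""
    scores, chosen = [], []
    parts = split(data, outer)
    for i, test in enumerate(parts):
        train = [p for j, part in enumerate(parts) if j != i for p in part]
        best_k = min(ks, key=lambda k: cv_error(train, k, inner))
        chosen.append(best_k)
        scores.append(mean_abs_error(train, test, best_k))
    return sum(scores) / outer, chosen

# Toy noisy line: y = 2x plus a deterministic +/-1 perturbation.
data = [(x, 2.0 * x + (1.0 if x % 3 == 0 else -1.0)) for x in range(30)]
score, chosen = nested_cv(data, ks=[1, 3, 5])
```

Conventional CV would pick k and report the error on the same folds, which is the leakage the abstract refers to; here each outer fold can even select a different k, and the reported score averages only held-out errors.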
Language Model Decoding as Direct Metrics Optimization
Authors: Haozhe Ji, Pei Ke, Hongning Wang, Minlie Huang
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2310.01041
Pdf link: https://arxiv.org/pdf/2310.01041
Abstract Despite the remarkable advances in language modeling, current mainstream decoding methods still struggle to generate texts that align with human texts across different aspects. In particular, sampling-based methods produce less repetitive texts that are often disjointed in discourse, while search-based methods maintain topic coherence at the cost of increased repetition. Overall, these methods fall short of achieving holistic alignment across a broad range of aspects. In this work, we frame decoding from a language model as an optimization problem whose goal is to strictly match the expected performance of human texts, as measured by multiple metrics of the desired aspects simultaneously. The resulting decoding distribution has an analytical solution that scales the input language model distribution via a sequence-level energy function defined by these metrics. Most importantly, we prove that this induced distribution is guaranteed to improve the perplexity on human texts, which suggests a better approximation of the underlying distribution of human texts. To enable tractable sampling from this globally normalized distribution, we adopt the Sampling-Importance-Resampling technique. Experiments on various domains and model scales demonstrate the superiority of our method over strong baselines in metric alignment with human texts and in human evaluation.
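Sampling-Importance-Resampling from an energy-reweighted distribution can be sketched as follows. This assumes a target of the form q(x) ∝ p(x)·exp(-E(x)), with the base language model p used as the proposal and a hypothetical sequence-level energy E (here a toy repetition penalty); the paper's actual metrics and energy are not reproduced.

```python
import math
import random

def sir_sample(propose, energy, num_candidates=16, rng=random):
    """One draw from q(x) ∝ p(x) * exp(-energy(x)) via Sampling-Importance-Resampling.

    `propose(rng)` samples a full sequence from the base model p, so the
    importance weight q/p reduces to exp(-energy(x)) up to a constant.
    """
    candidates = [propose(rng) for _ in range(num_candidates)]
    log_w = [-energy(x) for x in candidates]
    top = max(log_w)
    weights = [math.exp(lw - top) for lw in log_w]  # shift by max for stability
    return rng.choices(candidates, weights=weights, k=1)[0]

def repetition_energy(text, scale=10.0):
    # Hypothetical energy: penalize repeated tokens.
    tokens = text.split()
    return scale * (len(tokens) - len(set(tokens)))

if __name__ == "__main__":
    rng = random.Random(0)
    propose = lambda r: r.choice(["a b a b", "a b c d", "a a a a"])
    print(sir_sample(propose, repetition_energy, num_candidates=32, rng=rng))
```

Because weights are only needed up to a constant, the global normalizer of q never has to be computed, which is what makes SIR attractive here.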
Advancements in Optimization: Adaptive Differential Evolution with Diversification Strategy
Authors: Sarit Maitra
Subjects: Neural and Evolutionary Computing (cs.NE); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2310.01057
Pdf link: https://arxiv.org/pdf/2310.01057
Abstract This study presents a population-based evolutionary optimization algorithm (Adaptive Differential Evolution with Diversification Strategies or ADEDS). The algorithm was initially developed using the sinusoidal objective function and subsequently evaluated with a wide-ranging set of 22 benchmark functions, including Rosenbrock, Rastrigin, Ackley, and DeVilliersGlasser02, among others. This work employs single-objective optimization in a two-dimensional space and runs ADEDS on each of these benchmark functions with multiple iterations. The optimization algorithms used in supply chain analytics have a direct impact on the efficiency and cost-effectiveness of supply chain operations. The findings reveal the effectiveness of ADEDS in finding better solutions, which implies its importance for improving supply chain efficiency, reducing costs, and enhancing overall performance.
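For orientation, the classic DE/rand/1/bin scheme that adaptive variants build on is sketched below. This is standard differential evolution with fixed `F` and `CR`, not ADEDS itself; the adaptive parameters and diversification strategy described above are omitted, and the Rastrigin usage is only an example.

```python
import math
import random

def differential_evolution(f, bounds, pop_size=20, F=0.5, CR=0.9, generations=200, seed=0):
    """Classic DE/rand/1/bin minimizer over a box (baseline, not ADEDS)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct members other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            mutant = [pop[a][d] + F * (pop[b][d] - pop[c][d]) for d in range(dim)]
            # Binomial crossover; j_rand guarantees at least one mutant gene.
            j_rand = rng.randrange(dim)
            trial = [mutant[d] if (rng.random() < CR or d == j_rand) else pop[i][d]
                     for d in range(dim)]
            # Clip to the search box.
            trial = [min(max(v, lo), hi) for v, (lo, hi) in zip(trial, bounds)]
            ft = f(trial)
            if ft <= fitness[i]:  # greedy one-to-one selection
                pop[i], fitness[i] = trial, ft
    best = min(range(pop_size), key=fitness.__getitem__)
    return pop[best], fitness[best]

def rastrigin(x):
    return 10 * len(x) + sum(v * v - 10 * math.cos(2 * math.pi * v) for v in x)

if __name__ == "__main__":
    print(differential_evolution(rastrigin, [(-5.12, 5.12)] * 2))
```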
On Fulfilling the Exigent Need for Automating and Modernizing Logistics Infrastructure in India: Enabling AI-based Integration, Digitalization, and Smart Automation of Industrial Parks and Robotic Warehouses
Authors: Shaurya Shriyam, Prashant Palkar, Amber Srivastava
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2310.01077
Pdf link: https://arxiv.org/pdf/2310.01077
Abstract To stay competitive, Low- and Middle-Income Countries (LMICs) need to embrace Industry 4.0 and Logistics 4.0. This requires government-level interventions and policy-making to incentivize quality product solutions and drive innovation in traditionally resistant economic sectors. In this position paper, we support the establishment of Smart Industrial Parks (SIPs) with a focus on enhancing operational efficiencies and bringing together MSMEs and startups targeting niche clientele with innovative Industry 4.0 solutions. SIPs, along with the phased deployment of well-planned robotic automation technologies, will help bring down India's untenable logistics costs. Executing SIPs successfully requires the efficient allocation of manufacturing resources and capabilities within them. Thus, we emphasize the importance of efficient resource utilization, collaboration, and technology adoption in industrial parks to promote industrial development and economic growth. We advocate the use of a cloud-based cyber-physical system for real-time data access and analysis in SIPs. Such centralized cloud-based monitoring of factory floors, warehouses, and industrial units using IoT infrastructure will improve decision-making, efficiency, and safety. Digital Twins (DTs), which are cyber-replicas of physical systems, could play a significant role in enabling simulation, optimization, and real-time monitoring of smart manufacturing and distributed manufacturing systems. However, implementing DTs in distributed manufacturing systems involves several challenges, such as defining data schemas and collaboration protocols, ensuring interoperability, the need for effective authentication technology, distributed machine learning models, and scalability to manage multiple DTs.
A Novel Approach with Monte-Carlo Simulation and Hybrid Optimization Approach for Inventory Management with Stochastic Demand
Authors: Sarit Maitra, Vivek Mishra, Sukanya Kundu
Subjects: Computational Engineering, Finance, and Science (cs.CE); Applications (stat.AP)
Arxiv link: https://arxiv.org/abs/2310.01079
Pdf link: https://arxiv.org/pdf/2310.01079
Abstract This study addresses the difficulties associated with inventory management of products with stochastic demand. The objective is to find the combination of order quantity and reorder point that maximizes profit while accounting for ethical considerations in inventory management, namely risk assessment, social responsibility, environmental sustainability, and customer satisfaction. Monte Carlo simulation (MCS) is used to generate a distribution of demand and lead times for the inventory items, which is then used to estimate the potential profit and risk associated with different inventory policies. This work proposes a hybrid optimization approach combining Gaussian process regression and a conditioning function to efficiently search the high-dimensional space of potential continuous review (r, Q) and periodic review (p, Q) values and find the combination that maximizes profit under these ethical considerations. The findings show that both the (r, Q) and (p, Q) approaches can effectively manage inventory with stochastic demand, but the (r, Q) approach performs better (profits up by 12.73%) when demand is more volatile. The study adds quantifiable risk assessment and sensitivity analysis to these considerations, accounting for the variation in demand and the expected output in profit percentage. The results provide useful information for making ethical and responsible choices in supply chain analytics, boosting efficiency and profits.
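A Monte Carlo evaluation of a continuous-review (r, Q) policy can be sketched as below. All prices, costs, the normal daily demand, and the uniform integer lead time are hypothetical placeholders; the paper's Gaussian-process-based search, the (p, Q) policy, and the ethical criteria are not modelled.

```python
import random

def simulate_rq(r, Q, rng, days=365, price=10.0, cost=6.0, holding=0.05,
                order_cost=50.0, demand_mu=20, demand_sigma=5, lead_time=(2, 5)):
    """Profit of one simulated year under a continuous-review (r, Q) policy."""
    on_hand, pipeline, profit = r + Q, [], 0.0  # pipeline holds (arrival_day, qty)
    for day in range(days):
        # Receive any orders arriving today.
        on_hand += sum(q for t, q in pipeline if t == day)
        pipeline = [(t, q) for t, q in pipeline if t != day]
        demand = max(0, round(rng.gauss(demand_mu, demand_sigma)))
        sold = min(on_hand, demand)  # unmet demand is assumed lost
        on_hand -= sold
        profit += price * sold - holding * on_hand
        # Reorder Q units when the inventory position (on hand + on order) drops to r.
        if on_hand + sum(q for _, q in pipeline) <= r:
            pipeline.append((day + rng.randint(*lead_time), Q))
            profit -= order_cost + cost * Q
    return profit

def monte_carlo(r, Q, runs=200, seed=0):
    """Mean profit and 5th-percentile profit (a simple risk measure) over many runs."""
    rng = random.Random(seed)
    samples = sorted(simulate_rq(r, Q, rng) for _ in range(runs))
    return sum(samples) / runs, samples[int(0.05 * runs)]

if __name__ == "__main__":
    grid = [(r, Q) for r in range(60, 161, 20) for Q in range(100, 401, 100)]
    print(max(grid, key=lambda rq: monte_carlo(*rq, runs=50)[0]))
```

The coarse grid search in the usage lines is only a stand-in for the surrogate-based optimizer described in the abstract.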
Linear attention is (maybe) all you need (to understand transformer optimization)
Authors: Kwangjun Ahn, Xiang Cheng, Minhak Song, Chulhee Yun, Ali Jadbabaie, Suvrit Sra
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2310.01082
Pdf link: https://arxiv.org/pdf/2310.01082
Abstract Transformer training is notoriously difficult, requiring a careful design of optimizers and use of various heuristics. We make progress towards understanding the subtleties of training transformers by carefully studying a simple yet canonical linearized shallow transformer model. Specifically, we train linear transformers to solve regression tasks, inspired by J. von Oswald et al. (ICML 2023), and K. Ahn et al. (NeurIPS 2023). Most importantly, we observe that our proposed linearized models can reproduce several prominent aspects of transformer training dynamics. Consequently, the results obtained in this paper suggest that a simple linearized transformer model could actually be a valuable, realistic abstraction for understanding transformer optimization.
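To make "linear transformer" concrete, the sketch below implements a single softmax-free attention head, out_i = (1/n) Σ_j (q_i · k_j) v_j. With keys x_j, values y_j, and query x, this output equals the prediction after one gradient-descent step from w = 0 (step size 1) on the loss (1/2n) Σ_j (w · x_j - y_j)², the kind of in-context regression view the cited works study. The parameter-free form (no learned projections) is a simplifying assumption.

```python
def linear_attention(queries, keys, values):
    """Softmax-free attention: out_i = (1/n) * sum_j (q_i . k_j) * v_j."""
    n = len(keys)
    out = []
    for q in queries:
        scores = [sum(qa * ka for qa, ka in zip(q, k)) for k in keys]
        out.append([sum(s * v[d] for s, v in zip(scores, values)) / n
                    for d in range(len(values[0]))])
    return out

if __name__ == "__main__":
    xs = [[1.0, 2.0], [3.0, -1.0]]  # in-context inputs (keys)
    ys = [[1.0], [2.0]]             # in-context targets (values)
    print(linear_attention([[0.5, 0.5]], xs, ys))
```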
Non-negative isomorphic neural networks for photonic neuromorphic accelerators
Authors: Manos Kirtas, Nikolaos Passalis, Nikolaos Pleros, Anastasios Tefas
Subjects: Emerging Technologies (cs.ET); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.01084
Pdf link: https://arxiv.org/pdf/2310.01084
Abstract Neuromorphic photonic accelerators are becoming increasingly popular, since they can significantly improve computation speed and energy efficiency, reaching femtojoule-per-MAC efficiency. However, deploying existing DL models on such platforms is not trivial, since a wide range of photonic neural network architectures rely on incoherent setups and power-addition operational schemes that cannot natively represent negative quantities. This results in additional hardware complexity that increases cost and reduces energy efficiency. To overcome this, we can train non-negative neural networks and potentially exploit the full range of incoherent neuromorphic photonic capabilities. However, existing approaches cannot achieve the same level of accuracy as their regular counterparts due to training difficulties, as recent evidence also suggests. To this end, we introduce a methodology to obtain the non-negative isomorphic equivalents of regular neural networks that meet the requirements of neuromorphic hardware, overcoming the aforementioned limitations. Furthermore, we introduce a sign-preserving optimization approach that enables training such isomorphic networks in a non-negative manner.
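One elementary way to obtain a non-negative weight matrix that still computes the same linear map is to shift every weight by a constant c and subtract a single shared correction term: W x = (W + c) x - c·Σx. The sketch below shows that identity only as an illustration; it is an assumption for exposition and not necessarily the authors' isomorphic construction.

```python
def shift_to_nonnegative(W):
    """Hypothetical transform: W' = W + c with c = max(0, -min(W)), so every W' entry >= 0."""
    c = max(0.0, -min(min(row) for row in W))
    return [[w + c for w in row] for row in W], c

def nonneg_linear(W_pos, c, x):
    """Recover W @ x from non-negative weights: W x = W' x - c * sum(x)."""
    s = sum(x)
    return [sum(w * xi for w, xi in zip(row, x)) - c * s for row in W_pos]

if __name__ == "__main__":
    W = [[1, -2], [-3, 4]]
    W_pos, c = shift_to_nonnegative(W)
    print(W_pos, c, nonneg_linear(W_pos, c, [1, 2]))
```

The correction term is shared by all outputs, so only one subtraction per layer remains, but it is still a signed quantity; fully non-negative hardware would need that term handled separately.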
Energy-Guided Continuous Entropic Barycenter Estimation for General Costs
Authors: Alexander Kolesov, Petr Mokrov, Igor Udovichenko, Milena Gazdieva, Gudmund Pammer, Evgeny Burnaev, Alexander Korotin
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.01105
Pdf link: https://arxiv.org/pdf/2310.01105
Abstract Optimal transport (OT) barycenters are a mathematically grounded way of averaging probability distributions while capturing their geometric properties. In short, the barycenter task is to take the average of a collection of probability distributions w.r.t. given OT discrepancies. We propose a novel algorithm for approximating the continuous Entropic OT (EOT) barycenter for arbitrary OT cost functions. Our approach is built upon the dual reformulation of the EOT problem based on weak OT, which has recently gained the attention of the ML community. Beyond its novelty, our method enjoys several advantageous properties: (i) we establish quality bounds for the recovered solution; (ii) this approach seamlessly interconnects with the Energy-Based Models (EBMs) learning procedure, enabling the use of well-tuned algorithms for the problem of interest; (iii) it provides an intuitive optimization scheme that avoids min-max formulations, REINFORCE, and other intricate technical tricks. For validation, we consider several low-dimensional scenarios and image-space setups, including non-Euclidean cost functions. Furthermore, we investigate the practical task of learning the barycenter on an image manifold generated by a pretrained generative model, opening up new directions for real-world applications.
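As background on entropic OT itself, the discrete two-marginal case can be solved with plain Sinkhorn iterations, sketched below. This is not the paper's continuous, energy-based barycenter method; it only illustrates the entropic coupling that such methods approximate. The regularization strength `eps` and iteration count are arbitrary choices.

```python
import math

def sinkhorn(a, b, cost, eps=0.1, iters=500):
    """Entropic OT plan between discrete distributions a and b (plain Sinkhorn scaling)."""
    K = [[math.exp(-c / eps) for c in row] for row in cost]  # Gibbs kernel
    u, v = [1.0] * len(a), [1.0] * len(b)
    for _ in range(iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(len(b))) for i in range(len(a))]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(len(a))) for j in range(len(b))]
    return [[u[i] * K[i][j] * v[j] for j in range(len(b))] for i in range(len(a))]

if __name__ == "__main__":
    print(sinkhorn([0.5, 0.5], [0.5, 0.5], [[0.0, 1.0], [1.0, 0.0]]))
```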
Stability and Generalization for Minibatch SGD and Local SGD
Authors: Yunwen Lei, Tao Sun, Mingrui Liu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.01139
Pdf link: https://arxiv.org/pdf/2310.01139
Abstract The increasing scale of data propels the popularity of leveraging parallelism to speed up optimization. Minibatch stochastic gradient descent (minibatch SGD) and local SGD are two popular methods for parallel optimization. Existing theoretical studies show a linear speedup of these methods with respect to the number of machines, which, however, is measured by optimization errors. By comparison, the stability and generalization of these methods are much less studied. In this paper, we pioneer the stability and generalization analysis of minibatch and local SGD to understand their learnability. We incorporate training errors into the stability analysis, which shows how small training errors help generalization for overparameterized models. Our stability bounds imply optimistic risk bounds that decay fast under a low-noise condition. We show that both minibatch and local SGD achieve a linear speedup in attaining the optimal risk bounds.
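The difference between the two parallel schemes can be seen in a toy sketch on scalar least squares: minibatch SGD averages one gradient per machine every round, while local SGD lets each machine take several steps before averaging models. The data, step size, and shard layout are hypothetical, and nothing here reproduces the paper's stability analysis.

```python
import random

def grad(w, batch):
    # Gradient of the mean of 0.5 * (w*x - y)^2 over the batch, for scalar w.
    return sum((w * x - y) * x for x, y in batch) / len(batch)

def minibatch_sgd(shards, rounds, lr, batch_size, rng):
    w = 0.0
    for _ in range(rounds):
        # Each machine contributes one minibatch gradient; the server averages them.
        g = sum(grad(w, rng.sample(s, batch_size)) for s in shards) / len(shards)
        w -= lr * g
    return w

def local_sgd(shards, rounds, lr, batch_size, local_steps, rng):
    w = 0.0
    for _ in range(rounds):
        local_models = []
        for s in shards:
            w_m = w
            for _ in range(local_steps):  # local updates without communication
                w_m -= lr * grad(w_m, rng.sample(s, batch_size))
            local_models.append(w_m)
        w = sum(local_models) / len(local_models)  # periodic model averaging
    return w

if __name__ == "__main__":
    rng = random.Random(0)
    data = [(x, 3.0 * x + rng.gauss(0, 0.1)) for x in (rng.uniform(-1, 1) for _ in range(200))]
    shards = [data[i::4] for i in range(4)]  # 4 hypothetical machines
    print(minibatch_sgd(shards, 200, 0.5, 8, rng), local_sgd(shards, 50, 0.5, 8, 4, rng))
```

With the same number of gradient evaluations per machine, local SGD communicates `local_steps` times less often, which is the trade-off such analyses quantify.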
Light Schrödinger Bridge
Authors: Alexander Korotin, Nikita Gushchin, Evgeny Burnaev
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.01174
Pdf link: https://arxiv.org/pdf/2310.01174
Abstract Despite the recent advances in the field of computational Schrödinger Bridges (SB), most existing SB solvers are still heavyweight and require complex optimization of several neural networks. It turns out that there is no principal solver that plays the role of a simple-yet-effective baseline for SB, just like, e.g., the $k$-means method in clustering, logistic regression in classification, or the Sinkhorn algorithm in discrete optimal transport. We address this issue and propose a novel, fast, and simple SB solver. Our development is a smart combination of two ideas that recently appeared in the field: (a) parameterization of the Schrödinger potentials with sum-exp quadratic functions and (b) viewing the log-Schrödinger potentials as energy functions. We show that, combined, these ideas yield a lightweight, simulation-free, and theoretically justified SB solver with a simple, straightforward optimization objective. As a result, it allows solving SB in moderate dimensions in a matter of minutes on CPU without painful hyperparameter selection. Our light solver resembles the Gaussian mixture model, which is widely used for density estimation. Inspired by this similarity, we also prove an important theoretical result showing that our light solver is a universal approximator of SBs. The code for the LightSB solver can be found at https://github.com/ngushchin/LightSB
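A sum-exp quadratic potential can be evaluated stably in log space, which is how it doubles as an energy function. The sketch below assumes a Gaussian-mixture-style form v(y) = Σ_k w_k N(y | r_k, ε I) with isotropic covariances; the isotropic simplification and the name `log_potential` are assumptions, not the LightSB code.

```python
import math

def log_potential(y, weights, means, eps):
    """log v(y) for v(y) = sum_k w_k * N(y | r_k, eps * I), computed with log-sum-exp."""
    d = len(y)
    log_norm = -0.5 * d * math.log(2 * math.pi * eps)  # Gaussian normalizing constant
    terms = []
    for w, r in zip(weights, means):
        sq = sum((yi - ri) ** 2 for yi, ri in zip(y, r))
        terms.append(math.log(w) + log_norm - sq / (2 * eps))
    top = max(terms)  # subtract the max term before exponentiating, for stability
    return top + math.log(sum(math.exp(t - top) for t in terms))

if __name__ == "__main__":
    print(log_potential([0.2, -0.1], [0.3, 0.7], [[0.0, 0.0], [1.0, 1.0]], eps=0.1))
```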
Graph-Theoretic Bézier Curve Optimization over Safe Corridors for Safe and Smooth Motion Planning
Authors: Soufyan Zayou, Ömür Arslan
Subjects: Robotics (cs.RO); Computational Geometry (cs.CG); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.01190
Pdf link: https://arxiv.org/pdf/2310.01190
Abstract As a parametric motion representation, Bézier curves have significant applications in polynomial trajectory optimization for safe and smooth motion planning of various robotic systems, including flying drones, autonomous vehicles, and robotic manipulators. An essential component of Bézier curve optimization is the optimization objective, as it significantly influences the resulting robot motion. Standard physical optimization objectives, such as minimizing total velocity, acceleration, jerk, and snap, are known to yield quadratic optimization of Bézier curve control points. In this paper, we present a unifying graph-theoretic perspective for defining and understanding Bézier curve optimization objectives using a consensus distance of Bézier control points derived from their interaction graph Laplacian. In addition to demonstrating how standard physical optimization objectives define a consensus distance between Bézier control points, we also introduce geometric and statistical optimization objectives as alternative consensus distances, constructed using finite differencing and differential variance. To compare these optimization objectives, we apply Bézier curve optimization over convex polygonal safe corridors that are automatically constructed around a maximal-clearance minimal-length reference path. We provide an explicit analytical formulation for quadratic optimization of Bézier curves using Bézier matrix operations. We conclude that the norm and variance of the finite differences of Bézier control points lead to simpler and more intuitive interaction graphs and optimization objectives than Bézier derivative norms, despite having similar robot motion profiles.
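The link between finite differences and a graph Laplacian can be checked numerically: the sum of squared first differences of control points, Σ ||p_{i+1} - p_i||², equals the quadratic form Σ_d x_dᵀ L x_d with L the Laplacian of the path graph joining consecutive points. This standard identity is shown only as an illustration; the paper's specific interaction graphs and weights are not reproduced.

```python
def finite_difference_cost(points):
    """Sum of squared first differences: sum_i ||p_{i+1} - p_i||^2."""
    return sum(sum((a - b) ** 2 for a, b in zip(p, q))
               for p, q in zip(points[1:], points[:-1]))

def path_laplacian(n):
    """Laplacian L = D - A of the path graph linking consecutive control points."""
    L = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        L[i][i] += 1
        L[i + 1][i + 1] += 1
        L[i][i + 1] -= 1
        L[i + 1][i] -= 1
    return L

def laplacian_quadratic_form(points, L):
    """sum over coordinates d of x_d^T L x_d."""
    n, dim = len(points), len(points[0])
    return sum(points[i][d] * L[i][j] * points[j][d]
               for d in range(dim) for i in range(n) for j in range(n))

if __name__ == "__main__":
    pts = [(0, 0), (1, 2), (3, 3)]
    print(finite_difference_cost(pts), laplacian_quadratic_form(pts, path_laplacian(len(pts))))
```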
Learn to Follow: Decentralized Lifelong Multi-agent Pathfinding via Planning and Learning
Authors: Alexey Skrynnik, Anton Andreychuk, Maria Nesterova, Konstantin Yakovlev, Aleksandr Panov
Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Arxiv link: https://arxiv.org/abs/2310.01207
Pdf link: https://arxiv.org/pdf/2310.01207
Abstract The Multi-agent Pathfinding (MAPF) problem generally asks to find a set of conflict-free paths for a set of agents confined to a graph and is typically solved in a centralized fashion. Conversely, in this work, we investigate the decentralized MAPF setting, in which there is no central controller possessing all the information on the agents' locations and goals, and the agents have to sequentially decide on their actions on their own without access to the full state of the environment. We focus on the practically important lifelong variant of MAPF, which involves continuously assigning new goals to the agents upon reaching their previous ones. To address this complex problem, we propose a method that integrates two complementary approaches: planning with heuristic search and reinforcement learning through policy optimization. Planning is utilized to construct and re-plan individual paths. We enhance our planning algorithm with a dedicated technique tailored to avoid congestion and increase the throughput of the system. We employ reinforcement learning to discover collision avoidance policies that effectively guide the agents along the paths. The policy is implemented as a neural network and is effectively trained without any reward shaping or external guidance. We evaluate our method on a wide range of setups, comparing it to state-of-the-art solvers. The results show that our method consistently outperforms the learnable competitors, achieving higher throughput and a better ability to generalize to maps unseen at the training stage. Moreover, our solver outperforms a rule-based one in terms of throughput and is an order of magnitude faster than a state-of-the-art search-based solver.
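The individual-path planning component can be illustrated with textbook A* on a 4-connected grid using a Manhattan heuristic. Each agent would run such a search on its own; the congestion-avoidance enhancement and the learned collision-avoidance policy described above are not modelled, and the grid encoding (`'#'` for obstacles) is an assumption.

```python
import heapq

def astar(grid, start, goal):
    """Shortest 4-connected path on a grid ('#' = obstacle) using A* with Manhattan distance."""
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    frontier = [(h(start), 0, start)]
    came_from, g_cost = {start: None}, {start: 0}
    while frontier:
        _, g, node = heapq.heappop(frontier)
        if node == goal:
            path = []
            while node is not None:  # walk parent links back to the start
                path.append(node)
                node = came_from[node]
            return path[::-1]
        if g > g_cost[node]:
            continue  # stale queue entry
        r, c = node
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != '#':
                ng = g + 1
                if ng < g_cost.get(nxt, float('inf')):
                    g_cost[nxt], came_from[nxt] = ng, node
                    heapq.heappush(frontier, (ng + h(nxt), ng, nxt))
    return None  # goal unreachable

if __name__ == "__main__":
    print(astar(["...", ".#.", "..."], (0, 0), (2, 2)))
```

In a lifelong setting, an agent would simply call the planner again from its current cell whenever a new goal is assigned.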
ScaLearn: Simple and Highly Parameter-Efficient Task Transfer by Learning to Scale
Authors: Markus Frohmann, Carolin Holtermann, Shahed Masoudian, Anne Lauscher, Navid Rekabsaz
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2310.01217
Pdf link: https://arxiv.org/pdf/2310.01217
Abstract Multi-task learning (MTL) has shown considerable practical benefits, particularly when using pre-trained language models (PLMs). While this is commonly achieved by simultaneously learning $n$ tasks under a joint optimization procedure, recent methods such as AdapterFusion structure the problem into two distinct stages: (i) task learning, where knowledge specific to a task is encapsulated within sets of parameters (e.g., adapters), and (ii) transfer, where this already learned knowledge is leveraged for a target task. This separation of concerns provides numerous benefits, such as promoting reusability and addressing cases involving data privacy and societal concerns; on the flip side, current two-stage MTL methods come at the cost of introducing a substantial number of additional parameters. In this work, we address this issue by leveraging the usefulness of linearly scaling the output representations of source adapters for transfer learning. We introduce ScaLearn, a simple and highly parameter-efficient two-stage MTL method that capitalizes on the knowledge of the source tasks by learning a minimal set of scaling parameters that enable effective knowledge transfer to a target task. Our experiments on three benchmarks (GLUE, SuperGLUE, and HumSet) show that ScaLearn, in addition to facilitating the benefits of two-stage MTL, consistently outperforms strong baselines with only a small number of transfer parameters, roughly 0.35% of those of AdapterFusion. Remarkably, we observe that ScaLearn maintains its strong abilities even when further reducing parameters through uniform scaling and layer-sharing, achieving similarly competitive results with only $8$ transfer parameters for each target task. Our proposed approach thus demonstrates the promise of simple scaling for more efficient task transfer.
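The core idea of learning a few scaling coefficients over frozen source-adapter outputs can be sketched with one scalar per source, fitted by gradient descent on a squared-error loss. In ScaLearn the scales are trained through the target-task loss inside the PLM; the toy regression target here, the per-source scalar form, and the name `learn_scales` are simplifying assumptions.

```python
def learn_scales(source_outputs, target, lr=0.1, steps=500):
    """Fit one scalar w_k per source so that sum_k w_k * h_k approximates target (MSE)."""
    K, d = len(source_outputs), len(target)
    w = [1.0 / K] * K  # start from uniform averaging
    for _ in range(steps):
        combined = [sum(w[k] * source_outputs[k][i] for k in range(K)) for i in range(d)]
        residual = [c - t for c, t in zip(combined, target)]
        # dL/dw_k = (2/d) * sum_i residual_i * h_k[i]
        grads = [2.0 / d * sum(r * h for r, h in zip(residual, source_outputs[k]))
                 for k in range(K)]
        w = [wk - lr * gk for wk, gk in zip(w, grads)]
    return w

if __name__ == "__main__":
    h1, h2 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]  # hypothetical frozen adapter outputs
    print(learn_scales([h1, h2], [0.3, 0.7, 0.0]))
```

The learned parameter count equals the number of source adapters, which is what keeps this kind of transfer cheap.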
Cell-Free Bistatic Backscatter Communication: Channel Estimation, Optimization, and Performance Analysis
Authors: Diluka Galappaththige, Fatemeh Rezaei, Chintha Tellambura, Amine Maaref
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2310.01264
Pdf link: https://arxiv.org/pdf/2310.01264
Abstract This study introduces and investigates the integration of a cell-free architecture with bistatic backscatter communication (BiBC), referred to as cell-free BiBC or distributed access point (AP)-assisted BiBC, which can enable potential applications in future energy harvesting (EH)-based Internet-of-Things (IoT) networks. To that end, we first present a pilot-based channel estimation scheme for estimating the direct, cascaded, and forward channels of the proposed system setup. We next utilize the channel estimates to design the optimal beamforming weights at the APs, reflection coefficients at the tags, and reception filters at the reader to maximize the tag sum rate while meeting the tags' minimum energy requirements. Because the proposed maximization problem is non-convex, we propose a solution based on alternating optimization, fractional programming, and Rayleigh quotient techniques. We also quantify the computational complexity of the developed algorithms. Finally, we present extensive numerical results to validate the proposed channel estimation scheme and optimization framework, as well as the performance of the integration of these two technologies. Compared to the random beamforming/combining benchmark, our algorithm yields impressive gains. For example, it achieves gains of approximately 64.8% and 253.5% in harvested power and tag sum rate, respectively, for 10 dBm with 36 APs and 3 tags.
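The Rayleigh-quotient step can be illustrated in its simplest form: for a symmetric positive semidefinite matrix A, the unit vector maximizing wᵀAw / wᵀw is the dominant eigenvector, which power iteration finds. Real-valued matrices and an identity denominator (rather than a general interference-plus-noise matrix in a generalized Rayleigh quotient) are simplifying assumptions; the paper's filter design is not reproduced.

```python
def dominant_rayleigh(A, iters=200):
    """Power iteration for symmetric PSD A: returns (unit eigenvector, max Rayleigh quotient)."""
    n = len(A)
    x = [1.0] * n
    for _ in range(iters):
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    rq = sum(a * b for a, b in zip(x, Ax)) / sum(v * v for v in x)
    return x, rq

if __name__ == "__main__":
    print(dominant_rayleigh([[4.0, 1.0], [1.0, 3.0]]))
```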
Learning manipulation of steep granular slopes for fast Mini Rover turning
Authors: Deniz Kerimoglu, Daniel Soto, Malone Lincoln Hemsley, Joseph Brunner, Sehoon Ha, Tingnan Zhang, Daniel I. Goldman
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2310.01273
Pdf link: https://arxiv.org/pdf/2310.01273
Abstract Future planetary exploration missions will require reaching challenging regions such as craters and steep slopes. Such regions are ubiquitous and present science-rich targets potentially containing information about the planet's internal structure. Steep slopes consisting of low-cohesion regolith are prone to flow downward under small disturbances, making them very challenging for autonomous rovers to traverse. Moreover, the navigation trajectories of rovers are heavily limited by the terrain topology, and future systems will need to maneuver on flowable surfaces without getting trapped, allowing them to further expand their reach and increase mission efficiency. In this work, we used a laboratory-scale rover robot and performed maneuvering experiments on a steep granular slope of poppy seeds to explore the rover's turning capabilities. The rover is capable of lifting, sweeping, and spinning its wheels, allowing it to execute leg-like gait patterns. The high-dimensional actuation capabilities of the rover facilitate effective manipulation of the underlying granular surface. We used Bayesian Optimization (BO) to gain insight into successful turning gaits in the high-dimensional search space and found strategies such as differential wheel spinning and pivoting around a single sweeping wheel. We then used these insights to further fine-tune the turning gait, enabling the rover to turn 90 degrees in just over 4 seconds with minimal slip. Combining gait optimization and human-tuning approaches, we found that fast turning is enabled by creating anisotropic torques with the sweeping wheel.
Coupling public and private gradient provably helps optimization
Authors: Ruixuan Liu, Zhiqi Bu, Yu-xiang Wang, Sheng Zha, George Karypis
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.01304
Pdf link: https://arxiv.org/pdf/2310.01304
Abstract The success of large neural networks is crucially determined by the availability of data. It has been observed that training only on a small amount of public data, or only privately on abundant private data, can lead to undesirable degradation of accuracy. In this work, we leverage both private and public data to improve optimization by coupling their gradients via a weighted linear combination. We derive the optimal weight in the convex setting, which indicates that the weighting coefficient should be hyperparameter-dependent. We then prove accelerated convergence for non-convex losses and analyze the effects of hyperparameters such as the privacy budget, number of iterations, batch size, and model size on the choice of the weighting coefficient. We support our analysis with empirical experiments across language and vision benchmarks and provide a guideline for choosing the optimal weight of the gradient coupling.
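The coupling itself is a weighted combination, g = α·g_private + (1 - α)·g_public, where the private term is computed in a DP-SGD style (per-example clipping plus Gaussian noise). The sketch below treats α as a fixed, hypothetical input, whereas the paper analyzes how it should be chosen; the clipping and noise conventions are common DP-SGD choices assumed here.

```python
import random

def clip(g, max_norm):
    # Scale a per-example gradient down so its L2 norm is at most max_norm.
    norm = sum(v * v for v in g) ** 0.5
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in g]

def coupled_gradient(private_grads, public_grad, alpha, max_norm, noise_mult, rng):
    """alpha * (clipped, noised private mean gradient) + (1 - alpha) * public gradient."""
    d, B = len(public_grad), len(private_grads)
    clipped = [clip(g, max_norm) for g in private_grads]
    summed = [sum(g[i] for g in clipped) for i in range(d)]
    # Gaussian noise calibrated to the clipping norm, then average over the batch.
    noisy = [(s + rng.gauss(0.0, noise_mult * max_norm)) / B for s in summed]
    return [alpha * p + (1 - alpha) * q for p, q in zip(noisy, public_grad)]

if __name__ == "__main__":
    rng = random.Random(0)
    print(coupled_gradient([[3.0, 4.0], [0.0, 0.5]], [0.1, -0.2], 0.5, 1.0, 1.0, rng))
```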
Optimistic Online Caching for Batched Requests
Authors: Francescomaria Faticanti, Giovanni Neglia
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2310.01309
Pdf link: https://arxiv.org/pdf/2310.01309
Abstract In this paper, we study online caching problems where predictions of future requests, e.g., provided by a machine learning model, are available. Typical online optimistic policies are based on the Follow-The-Regularized-Leader algorithm and have a higher computational cost than classic policies such as LFU and LRU, as each update of the cache state requires solving a constrained optimization problem. In this work, we analyse the behaviour of two different optimistic policies in a \textit{batched} case, i.e., when the cache is updated less frequently in order to amortize the update cost over time or over multiple requests. Experimental results show that such an optimistic batched approach outperforms classical caching policies on both stationary and real traces.
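The batched idea, keeping the cache frozen within a batch and refreshing it only at batch boundaries using both past counts and predictions, can be shown with a deliberately simple hypothetical policy: after each batch, cache the top-`capacity` items by observed count plus predicted count for the next batch. This is an LFU-like heuristic for illustration, not the FTRL-based optimistic policies analysed in the paper.

```python
from collections import Counter

def batched_cache_hits(requests, predictions, capacity, batch_size):
    """Hypothetical batched policy: refresh the cache once per batch using
    observed counts plus predicted counts for the upcoming batch."""
    counts, cache, hits = Counter(), set(), 0
    for start in range(0, len(requests), batch_size):
        batch = requests[start:start + batch_size]
        hits += sum(1 for item in batch if item in cache)  # cache is frozen within a batch
        counts.update(batch)
        # Optimistic step: add predicted demand for the next batch (index -> list of items).
        score = counts + Counter(predictions.get(start // batch_size + 1, []))
        cache = {item for item, _ in score.most_common(capacity)}
    return hits

if __name__ == "__main__":
    reqs = ['a', 'a', 'b', 'a', 'b', 'c']
    print(batched_cache_hits(reqs, {}, 1, 2), batched_cache_hits(reqs, {2: ['b'] * 3}, 1, 2))
```

Only one cache update is computed per batch, which is where the amortization of update cost comes from.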
JugglePAC: A Pipelined Accumulation Circuit
Authors: Ahmad Houraniah, H. Fatih Ugurdag, Furkan Aydin
Subjects: Hardware Architecture (cs.AR)
Arxiv link: https://arxiv.org/abs/2310.01336
Pdf link: https://arxiv.org/pdf/2310.01336
Abstract Summing a set of numbers, i.e., accumulation, is a subtask within many computational tasks. If the numbers arrive non-stop in back-to-back clock cycles at high clock frequencies, summing them without letting them pile up is challenging whenever the latency of addition (i.e., summing two numbers) exceeds one clock cycle, which is always the case for floating-point numbers. This can also be the case for integer summation at high clock frequencies. For floating-point numbers, this is handled by pipelining the adder, but that does not solve all problems. The challenges include optimizing speed, area, and latency, as well as adapting the design to different application requirements, such as handling variable-size consecutive data sets with no time gap in between and producing results in input order. All these factors make designing an efficient floating-point accumulator a non-trivial problem. Integer accumulation is a relatively simpler problem, where high frequencies can be achieved by using carry-save tree adders. This can then be further improved by efficient resource-sharing. In this paper, we present two fast and area-efficient accumulation circuits, JugglePAC and INTAC. JugglePAC is tailored for floating-point reduction operations (such as accumulation) and offers significant advantages over the literature in terms of speed, area, and adaptability to various application requirements. INTAC is designed for fast integer accumulation. Using carry-save adders and resource-sharing, it can achieve very high clock frequencies while maintaining low area complexity.
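The carry-save principle behind fast integer accumulation can be modelled in software: each input is merged into a redundant (sum, carry) pair by a 3:2 compressor, so no carry propagates per input, and a single carry-propagating addition happens at the end. This is a behavioural illustration of the generic technique with unbounded Python integers, not the INTAC circuit.

```python
def csa(a, b, c):
    """3:2 compressor on non-negative ints: returns (sum, carry) with sum + carry == a + b + c."""
    s = a ^ b ^ c                                  # bitwise sum without carries
    carry = ((a & b) | (a & c) | (b & c)) << 1     # majority bits shifted into the next position
    return s, carry

def carry_save_accumulate(stream):
    """Keep the running total in redundant (sum, carry) form; resolve it once at the end."""
    s, c = 0, 0
    for x in stream:
        s, c = csa(s, c, x)  # constant-depth update, no carry chain
    return s + c  # final carry-propagate addition

if __name__ == "__main__":
    print(carry_save_accumulate(range(1000)))
```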
Toward Scalable Visual Servoing Using Deep Reinforcement Learning and Optimal Control
Authors: Salar Asayesh, Hossein Sheikhi Darani, Mo Chen, Mehran Mehrandezh, Kamal Gupta
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2310.01360
Pdf link: https://arxiv.org/pdf/2310.01360
Abstract Classical pixel-based Visual Servoing (VS) approaches offer high accuracy but suffer from a limited convergence area due to optimization nonlinearity. Modern deep learning-based VS methods overcome traditional vision issues but lack scalability, requiring training on limited scenes. This paper proposes a hybrid VS strategy utilizing Deep Reinforcement Learning (DRL) and optimal control to enhance both convergence area and scalability. The DRL component of our approach separately handles representation and policy learning to enhance scalability, generalizability, and learning efficiency, and to ease domain adaptation. Moreover, the optimal control part ensures high end-point accuracy. Our method achieves high convergence rates and minimal end-positioning errors using a 7-DOF manipulator. Importantly, it exhibits scalability across more than 1000 distinct scenes. Furthermore, we demonstrate its capacity for generalization to previously unseen datasets. Lastly, we illustrate the real-world applicability of our approach, highlighting its adaptability through single-shot domain transfer learning in environments with noise and occlusions. Real-robot experiments can be found at \url{https://sites.google.com/view/vsls}.
A Learning Based Scheme for Fair Timeliness in Sparse Gossip Networks
Authors: Purbesh Mitra, Sennur Ulukus
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Multiagent Systems (cs.MA); Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2310.01396
Pdf link: https://arxiv.org/pdf/2310.01396
Abstract We consider a gossip network consisting of $n$ nodes that tracks the information at a source. The source updates its information according to a Poisson arrival process and also sends updates to the nodes in the network. The nodes can also exchange information among themselves to stay as timely as possible. However, the network structure is sparse and irregular, i.e., not every node is connected to every other node in the network; rather, the degree of connectivity is low and varies across nodes. This asymmetry implies that the nodes in the network do not perform equally in terms of timeliness. Due to the gossiping nature of the network, some nodes are able to track the source very closely, whereas others often fall several versions behind. In this work, we investigate how the rate-constrained source should distribute its update rate across the network to maintain fairness regarding timeliness, i.e., so that the overall worst-case performance of the network is optimized. Since the search space for the optimal rate allocation is continuous, we formulate this problem as a continuum-armed bandit problem and employ Gaussian process-based Bayesian optimization to sequentially balance exploration and exploitation.
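Gaussian-process-based Bayesian optimization for a continuum-armed bandit can be sketched with GP-UCB on a 1-D grid. The zero prior mean, RBF kernel with length scale 0.15, noise level, exploration weight `beta`, and the toy objective are all assumptions; the paper's arms are rate allocations over many nodes and its objective is a timeliness measure, neither of which is modelled here.

```python
import math

def rbf(a, b, length=0.15):
    # Squared-exponential kernel with unit prior variance.
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(K):
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward_solve(L, b):
    # Solve L x = b for lower-triangular L.
    x = []
    for i, bi in enumerate(b):
        x.append((bi - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def gp_ucb(f, grid, iters=15, beta=2.0, noise=1e-4):
    """Maximize f over a 1-D grid with GP-UCB (zero prior mean, RBF kernel)."""
    xs = [grid[0], grid[-1]]
    ys = [f(x) for x in xs]
    for _ in range(iters):
        K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
             for i, a in enumerate(xs)]
        L = cholesky(K)
        alpha = forward_solve(L, ys)  # L^{-1} y

        def ucb(x):
            v = forward_solve(L, [rbf(a, x) for a in xs])  # L^{-1} k(x)
            mean = sum(vi * ai for vi, ai in zip(v, alpha))  # k(x)^T K^{-1} y
            var = max(1.0 - sum(vi * vi for vi in v), 0.0)  # k(x,x) - k^T K^{-1} k
            return mean + beta * math.sqrt(var)  # optimism drives exploration

        x_next = max(grid, key=ucb)
        xs.append(x_next)
        ys.append(f(x_next))
    best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best]

if __name__ == "__main__":
    grid = [i / 100 for i in range(101)]
    print(gp_ucb(lambda x: -(x - 0.3) ** 2, grid))
```

The `beta` term sets the exploration-exploitation trade-off: larger values favour points with high posterior uncertainty.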
Keyword: adam

On the Counting of Involutory MDS Matrices
Authors: Susanta Samanta
Subjects: Cryptography and Security (cs.CR); Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2310.00090
Pdf link: https://arxiv.org/pdf/2310.00090
Abstract The optimal branch number of MDS matrices has established their prominence in the design of diffusion layers for various block ciphers and hash functions. Consequently, several matrix structures have been proposed for designing MDS matrices, including Hadamard and circulant matrices. In this paper, we first provide the count of Hadamard MDS matrices of order $4$ over the field $\mathbb{F}_{2^r}$. Subsequently, we present the counts of order $2$ MDS matrices and order $2$ involutory MDS matrices over the field $\mathbb{F}_{2^r}$. Finally, leveraging these counts of order $2$ matrices, we derive an upper bound for the number of all involutory MDS matrices of order $4$ over $\mathbb{F}_{2^r}$.
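Counts like these can be sanity-checked for small r by brute force. The sketch enumerates all 2×2 matrices over GF(2^r), keeps those with all entries nonzero and nonzero determinant ad + bc (characteristic 2), and checks M² = I for involution. The reduction polynomials (x²+x+1 for r = 2, x³+x+1 for r = 3) are standard irreducible choices assumed here, and the helper names are hypothetical.

```python
def gf_mul(a, b, r, poly):
    """Multiply in GF(2^r); `poly` is the irreducible polynomial as a bit mask including x^r."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << r):
            a ^= poly  # reduce modulo the field polynomial
    return result

def count_order2(r, poly):
    """Brute-force counts of order-2 MDS and involutory MDS matrices over GF(2^r)."""
    mul = lambda x, y: gf_mul(x, y, r, poly)
    nonzero = range(1, 1 << r)
    mds = involutory = 0
    for a in nonzero:
        for b in nonzero:
            for c in nonzero:
                for d in nonzero:
                    bc = mul(b, c)
                    if mul(a, d) == bc:
                        continue  # det = ad + bc = 0 in characteristic 2
                    mds += 1
                    # M^2 = [[a^2+bc, b(a+d)], [c(a+d), d^2+bc]] equals I iff a == d and a^2 + bc == 1.
                    if a == d and (mul(a, a) ^ bc) == 1:
                        involutory += 1
    return mds, involutory

if __name__ == "__main__":
    print(count_order2(2, 0b111), count_order2(3, 0b1011))
```

For r = 2 this enumeration yields 54 order-2 MDS matrices, 6 of which are involutory.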
DataDAM: Efficient Dataset Distillation with Attention Matching
Authors: Ahmad Sajedi, Samir Khaki, Ehsan Amjadian, Lucy Z. Liu, Yuri A. Lawryshyn, Konstantinos N. Plataniotis
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.00093
Pdf link: https://arxiv.org/pdf/2310.00093
Abstract Researchers have long tried to minimize training costs in deep learning while maintaining strong generalization across diverse datasets. Emerging research on dataset distillation aims to reduce training costs by creating a small synthetic set that contains the information of a larger real dataset and ultimately achieves test accuracy equivalent to that of a model trained on the whole dataset. Unfortunately, the synthetic data generated by previous methods are not guaranteed to match the distribution and discriminative power of the original training data, and these methods incur significant computational costs. Despite promising results, there still exists a significant performance gap between models trained on condensed synthetic sets and those trained on the whole dataset. In this paper, we address these challenges using efficient Dataset Distillation with Attention Matching (DataDAM), achieving state-of-the-art performance while reducing training costs. Specifically, we learn synthetic images by matching the spatial attention maps of real and synthetic data generated by different layers within a family of randomly initialized neural networks. Our method outperforms the prior methods on several datasets, including CIFAR10/100, TinyImageNet, ImageNet-1K, and subsets of ImageNet-1K across most of the settings, and achieves improvements of up to 6.5% and 4.1% on CIFAR100 and ImageNet-1K, respectively. We also show that our high-quality distilled images have practical benefits for downstream applications, such as continual learning and neural architecture search.
Keyword: gradient

Adversarial Driving Behavior Generation Incorporating Human Risk Cognition for Autonomous Vehicle Evaluation
Authors: Zhen Liu, Hang Gao, Hao Ma, Shuo Cai, Yunfeng Hu, Ting Qu, Hong Chen, Xun Gong
Subjects: Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2310.00029
Pdf link: https://arxiv.org/pdf/2310.00029
Abstract Autonomous vehicle (AV) evaluation has been the subject of increased interest in recent years, both in industry and in academia. This paper focuses on the development of a novel framework for generating adversarial driving behavior of background vehicles interfering with the AV to expose effective and rational risky events. Specifically, the adversarial behavior is learned by a reinforcement learning (RL) approach incorporating cumulative prospect theory (CPT), which allows representation of human risk cognition. Then, an extended version of the deep deterministic policy gradient (DDPG) technique is proposed for training the adversarial policy while ensuring training stability when the CPT action-value function is leveraged. A comparative case study of the cut-in scenario is conducted on a high-fidelity Hardware-in-the-Loop (HiL) platform, and the results demonstrate that the adversarial behavior effectively exposes weaknesses of the tested AV.
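The framework relies on cumulative prospect theory to represent human risk cognition. A minimal sketch of the two standard CPT ingredients, an S-shaped value function with loss aversion and an inverse-S probability-weighting function, is shown below using the commonly cited Tversky and Kahneman (1992) parameter estimates. These values and the function names are assumptions for illustration; the sketch is not the paper's CPT action-value function or its extended DDPG.

```python
# Illustrative CPT building blocks; parameter values are the commonly cited
# Tversky-Kahneman (1992) estimates, assumed here rather than taken from the paper.
ALPHA = 0.88   # curvature for gains
BETA = 0.88    # curvature for losses
LAMBDA = 2.25  # loss-aversion coefficient
GAMMA = 0.61   # probability-weighting curvature


def cpt_value(x: float) -> float:
    """S-shaped value: concave for gains, convex and steeper for losses."""
    if x >= 0:
        return x ** ALPHA
    return -LAMBDA * (-x) ** BETA


def cpt_weight(p: float, gamma: float = GAMMA) -> float:
    """Inverse-S weighting: overweights rare events, underweights likely ones."""
    num = p ** gamma
    return num / (num + (1.0 - p) ** gamma) ** (1.0 / gamma)


def cpt_binary_prospect(gain: float, p_gain: float, loss: float) -> float:
    """CPT value of a mixed prospect: `gain` with probability p_gain, else `loss` (< 0).

    For simplicity the same weighting function is applied to gains and losses.
    """
    return cpt_weight(p_gain) * cpt_value(gain) + cpt_weight(1.0 - p_gain) * cpt_value(loss)


print(cpt_value(1.0), cpt_value(-1.0), cpt_binary_prospect(1.0, 0.5, -1.0))
```

Loss aversion shows up immediately: a loss of 1 is valued at -2.25 while an equal gain is valued at 1, so a fair coin flip between +1 and -1 has negative CPT value even though its expected payoff is zero.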
Federated Learning with Differential Privacy for End-to-End Speech Recognition
Authors: Martin Pelikan, Sheikh Shams Azam, Vitaly Feldman, Jan "Honza" Silovsky, Kunal Talwar, Tatiana Likhomanenko
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.00098
Pdf link: https://arxiv.org/pdf/2310.00098
Abstract While federated learning (FL) has recently emerged as a promising approach to train machine learning models, its application to automatic speech recognition (ASR) has been limited to preliminary explorations. Moreover, FL does not inherently guarantee user privacy and requires the use of differential privacy (DP) for robust privacy guarantees. However, we are not aware of prior work on applying DP to FL for ASR. In this paper, we aim to bridge this research gap by formulating an ASR benchmark for FL with DP and establishing the first baselines. First, we extend the existing research on FL for ASR by exploring different aspects of recent large end-to-end transformer models: architecture design, seed models, data heterogeneity, domain shift, and impact of cohort size. With a practical number of central aggregations, we are able to train FL models that are nearly optimal even with heterogeneous data, a seed model from another domain, or no pre-trained seed model. Second, we apply DP to FL for ASR, which is non-trivial since DP noise severely affects model training, especially for large transformer models, due to highly imbalanced gradients in the attention block. We counteract the adverse effect of DP noise by reviving per-layer clipping and explaining why its effect is more apparent in our case than in prior work. Remarkably, we achieve user-level $(7.2, 10^{-9})$-DP (resp. $(4.5, 10^{-9})$-DP) with a 1.3% (resp. 4.6%) absolute drop in the word error rate for extrapolation to high (resp. low) population scale for FL with DP in ASR.
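The abstract attributes part of the DP difficulty to imbalanced gradient magnitudes across layers and counters it with per-layer clipping. The sketch below illustrates the idea: each layer is clipped to its own L2 threshold, the per-user updates are summed, and Gaussian noise calibrated to the combined bound $\sqrt{\sum_l C_l^2}$ is added before averaging. Layer names, thresholds, and the plain-list representation are hypothetical, and the sketch omits the paper's privacy accounting and training setup.

```python
import math
import random


def clip_per_layer(grads, thresholds):
    """Scale each layer's gradient (a list of floats) to L2 norm at most thresholds[layer]."""
    clipped = {}
    for name, g in grads.items():
        norm = math.sqrt(sum(v * v for v in g))
        scale = min(1.0, thresholds[name] / norm) if norm > 0 else 1.0
        clipped[name] = [v * scale for v in g]
    return clipped


def private_average(updates, thresholds, noise_multiplier, rng):
    """Clip every user's update per layer, sum, add Gaussian noise, and average."""
    clipped = [clip_per_layer(u, thresholds) for u in updates]
    # If every layer satisfies ||g_l|| <= C_l, the full update satisfies ||g|| <= sqrt(sum C_l^2),
    # so that combined bound is the L2 sensitivity of the sum to one user.
    sensitivity = math.sqrt(sum(c * c for c in thresholds.values()))
    sigma = noise_multiplier * sensitivity
    n = len(updates)
    result = {}
    for name in thresholds:
        summed = [sum(vals) for vals in zip(*(c[name] for c in clipped))]
        result[name] = [(s + rng.gauss(0.0, sigma)) / n for s in summed]
    return result


# Hypothetical update with a large attention gradient and a small feed-forward gradient.
update = {"attn": [3.0, 4.0], "ffn": [0.3, 0.4]}
print(clip_per_layer(update, {"attn": 1.0, "ffn": 1.0}))
```

Here the attention gradient is scaled down to norm 1 while the feed-forward gradient is left untouched; a single global clip to norm 1 would instead shrink both by the same factor (about 0.2), nearly erasing the small layer's signal.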
Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment
Authors: Tianhao Wu, Banghua Zhu, Ruoyu Zhang, Zhaojin Wen, Kannan Ramchandran, Jiantao Jiao
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2310.00212
Pdf link: https://arxiv.org/pdf/2310.00212
Abstract Large Language Models (LLMs) can acquire extensive world knowledge through pre-training on large corpora. However, due to exposure to low-quality data, LLMs may exhibit harmful behavior if they are not aligned with human values. The dominant approach for steering LLMs towards beneficial behavior involves Reinforcement Learning with Human Feedback (RLHF), with Proximal Policy Optimization (PPO) serving as the default RL optimizer. Despite its effectiveness, PPO has limitations when optimizing rewards trained from comparison-based loss. Primarily, PPO is not invariant to equivalent reward functions containing identical preference information, due to the need to calibrate the reward scale. Additionally, PPO's reliance on token-wise updates introduces complexity in both function approximation and algorithm design compared to trajectory-wise optimization. This paper proposes a new framework, reinforcement learning with relative feedback, and a novel trajectory-wise policy gradient algorithm, Pairwise Proximal Policy Optimization (P3O), that operates directly on comparative rewards. We show theoretically that P3O is invariant to equivalent rewards and avoids the complexity of PPO. Empirical evaluations demonstrate that P3O outperforms PPO in the KL-Reward trade-off and can align with human preferences as well as or better than prior methods. In summary, this work introduces a simpler yet effective approach for aligning LLMs to human preferences through relative feedback.
Source Inference Attacks: Beyond Membership Inference Attacks in Federated Learning
Authors: Hongsheng Hu, Xuyun Zhang, Zoran Salcic, Lichao Sun, Kim-Kwang Raymond Choo, Gillian Dobbie
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2310.00222
Pdf link: https://arxiv.org/pdf/2310.00222
Abstract Federated learning (FL) is a popular approach to facilitate privacy-aware machine learning since it allows multiple clients to collaboratively train a global model without granting others access to their private data. It is, however, known that FL can be vulnerable to membership inference attacks (MIAs), where the training records of the global model can be distinguished from the testing records. Surprisingly, research investigating the source inference problem appears to be lacking. We also observe that identifying a training record's source client can result in privacy breaches extending beyond MIAs. For example, consider an FL application where multiple hospitals jointly train a COVID-19 diagnosis model: membership inference attackers can identify the medical records that have been used for training, and additionally identifying the source hospital can make patients from that hospital more vulnerable to discrimination. Seeking to fill this gap in the literature, we take the first step toward investigating source privacy in FL. Specifically, we propose a new inference attack (hereafter referred to as the source inference attack, or SIA), designed to enable an honest-but-curious server to identify a training record's source client. The proposed SIAs leverage Bayes' theorem to allow the server to implement the attack non-intrusively, without deviating from the defined FL protocol. We then evaluate SIAs in three different FL frameworks to show that in existing FL frameworks, clients sharing gradients, model parameters, or predictions on a public dataset leak such source information to the server. We also conduct extensive experiments on various datasets to investigate the key factors in an SIA. The experimental results validate the efficacy of the proposed SIAs.
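The attack rests on Bayes' theorem: the server combines a prior over clients with how well each client's model fits a given record. The sketch below assumes, purely for illustration, a uniform prior and a likelihood proportional to $\exp(-\ell_k)$, where $\ell_k$ is client $k$'s local-model loss on the record; under that assumption the most probable source is simply the client with the smallest loss. The function names and the likelihood model are hypothetical, not the paper's exact formulation.

```python
import math


def source_posterior(losses, prior=None):
    """Posterior P(client k is the source | record) from per-client losses on one record.

    Assumed likelihood (illustrative): P(record | client k) proportional to exp(-loss_k).
    """
    k = len(losses)
    if prior is None:
        prior = [1.0 / k] * k  # uniform prior over clients
    shift = min(losses)  # subtracting the minimum avoids underflow in exp
    unnormalized = [p * math.exp(-(loss - shift)) for p, loss in zip(prior, losses)]
    z = sum(unnormalized)
    return [u / z for u in unnormalized]


def infer_source(losses, prior=None):
    """Return the index of the client with maximum posterior probability."""
    post = source_posterior(losses, prior)
    return max(range(len(post)), key=post.__getitem__)


# Hypothetical losses of three clients' local models on one training record.
print(infer_source([2.3, 0.1, 1.7]))
```

A non-uniform prior, for example one proportional to each client's dataset size, can be passed through `prior` and shifts the decision when losses are close.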
Technical Report of 2023 ABO Fine-grained Semantic Segmentation Competition
Authors: Zeyu Dong
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.00427
Pdf link: https://arxiv.org/pdf/2310.00427
Abstract In this report, we describe the technical details of our submission to the 2023 ABO Fine-grained Semantic Segmentation Competition by Team "Zeyu_Dong" (username: ZeyuDong). The task is to predict the semantic labels for the convex shapes of five categories, which consist of high-quality, standardized 3D models of real products available for purchase online. Using DGCNN as the backbone to classify the different structures of the five classes, we carried out numerous experiments and found that a stochastic gradient descent with warm restarts (SGDR) learning-rate schedule and setting different rate factors for various categories contributed most to the performance of the model. This approach helped us rank 3rd in the Dev phase of the 2023 ICCV 3DVeComm Workshop Challenge.
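The report credits much of its gain to a stochastic gradient descent with warm restarts learning-rate schedule. A minimal sketch of that schedule is shown below: cosine annealing from `eta_max` to `eta_min` within each cycle, then a restart whose cycle length grows by `t_mult`. The default values are assumptions, not the report's settings. PyTorch exposes a comparable scheduler as `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts`.

```python
import math


def sgdr_lr(step, eta_max, eta_min=0.0, t0=10, t_mult=2):
    """Learning rate at `step` under cosine annealing with warm restarts.

    The first cycle lasts t0 steps; each later cycle is t_mult times longer.
    Assumes t0 >= 1 and t_mult >= 1.
    """
    t_i, t_cur = t0, step
    # Skip over completed cycles to find the position inside the current one.
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= t_mult
    return eta_min + 0.5 * (eta_max - eta_min) * (1.0 + math.cos(math.pi * t_cur / t_i))


schedule = [round(sgdr_lr(s, 0.1), 4) for s in range(0, 31, 5)]
print(schedule)
```

The learning rate jumps back to `eta_max` at steps 10 and 30 (the ends of the 10-step and 20-step cycles), which is what lets the optimizer leave a basin it has settled into.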
Optimizing Parameters of the DC Power Flow
Authors: Babak Taheri, Daniel K. Molzahn
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.00447
Pdf link: https://arxiv.org/pdf/2310.00447
Abstract Many power system operation and planning problems use the DC power flow approximation to address computational challenges from the nonlinearity of the AC power flow equations. The DC power flow simplifies the AC power flow equations to a linear form that relates active power flows to phase angle differences across branches, parameterized by coefficients based on the branches' susceptances. Inspired by techniques for training machine learning models, this paper proposes an algorithm that seeks optimal coefficient and bias parameters to improve the DC power flow approximation's accuracy. Specifically, the proposed algorithm selects the coefficient and bias parameter values that minimize the discrepancy, across a specified set of operational scenarios, between the power flows given by the DC approximation and the power flows from the AC equations. Gradient-based optimization methods like Broyden-Fletcher-Goldfarb-Shanno (BFGS), Limited-Memory BFGS (L-BFGS), and Truncated Newton Conjugate-Gradient (TNC) enable solution of the proposed algorithm for large systems. After an off-line training phase, the optimized parameters are used to improve the accuracy of the DC power flow during on-line computations. Numerical results show several orders of magnitude improvements in accuracy relative to a hot-start DC power flow approximation across a range of test cases.
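The core idea, choosing a coefficient and a bias per branch so that DC-approximated flows match AC flows across scenarios, can be illustrated with a simplified sketch. Unlike the paper's algorithm, which relies on gradient-based solvers such as BFGS, L-BFGS, and TNC, this sketch assumes the phase-angle differences for each scenario are given and fits each branch independently in closed form by ordinary least squares. The scenario data and function names are hypothetical.

```python
def fit_branch(delta_theta, p_ac):
    """Least-squares fit of p_ac ~ coeff * delta_theta + bias for one branch.

    delta_theta: phase-angle differences (rad) across the branch, one per scenario.
    p_ac: corresponding AC active power flows (p.u.).
    """
    n = len(delta_theta)
    mean_x = sum(delta_theta) / n
    mean_y = sum(p_ac) / n
    sxx = sum((x - mean_x) ** 2 for x in delta_theta)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(delta_theta, p_ac))
    coeff = sxy / sxx  # requires at least two distinct angle differences
    bias = mean_y - coeff * mean_x
    return coeff, bias


def sq_error(coeff, bias, delta_theta, p_ac):
    """Sum of squared discrepancies between the parameterized DC flow and the AC flow."""
    return sum((coeff * x + bias - y) ** 2 for x, y in zip(delta_theta, p_ac))


# Hypothetical scenarios for one branch with susceptance b = 10 p.u.
angles = [0.01, 0.03, 0.05, 0.08]
ac_flows = [0.11, 0.30, 0.49, 0.77]
default_err = sq_error(10.0, 0.0, angles, ac_flows)  # standard DC: coeff = b, bias = 0
coeff, bias = fit_branch(angles, ac_flows)
print(coeff, bias, default_err, sq_error(coeff, bias, angles, ac_flows))
```

Because the standard DC parameters (coefficient equal to the susceptance, zero bias) are one feasible choice, the fitted parameters can never do worse on the training scenarios; whether they generalize to new operating points is what the paper's off-line/on-line split evaluates.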
Diff-DOPE: Differentiable Deep Object Pose Estimation
Authors: Jonathan Tremblay, Bowen Wen, Valts Blukis, Balakumar Sundaralingam, Stephen Tyree, Stan Birchfield
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2310.00463
Pdf link: https://arxiv.org/pdf/2310.00463
Abstract We introduce Diff-DOPE, a 6-DoF pose refiner that takes as input an image, a 3D textured model of an object, and an initial pose of the object. The method uses differentiable rendering to update the object pose to minimize the visual error between the image and the projection of the model. We show that this simple, yet effective, idea is able to achieve state-of-the-art results on pose estimation datasets. Our approach is a departure from recent methods in which the pose refiner is a deep neural network trained on a large synthetic dataset to map inputs to refinement steps. Rather, our use of differentiable rendering allows us to avoid training altogether. Our approach performs multiple gradient descent optimizations in parallel with different random learning rates to avoid local minima from symmetric objects, similar appearances, or wrong step size. Various modalities can be used, e.g., RGB, depth, intensity edges, and object segmentation masks. We present experiments examining the effect of various choices, showing that the best results are found when the RGB image is accompanied by an object mask and depth image to guide the optimization process.
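The strategy of running several gradient-descent optimizations in parallel, each with a random learning rate, and keeping the best result can be sketched on a toy problem. Here a hypothetical one-dimensional objective with several local minima stands in for the rendering-based pose error. The objective, learning-rate range, and run counts are assumptions, and the runs execute sequentially rather than batched on a GPU.

```python
import math
import random


def loss(x):
    """Hypothetical 1-D objective with several local minima (stand-in for pose error)."""
    return math.sin(3.0 * x) + 0.1 * x * x


def grad(x):
    return 3.0 * math.cos(3.0 * x) + 0.2 * x


def multi_lr_descent(x0, n_runs=16, steps=200, lr_range=(1e-3, 0.3), seed=0):
    """Run independent gradient-descent chains from x0, each with its own random
    learning rate, and return (final_loss, final_x, lr) of the best chain."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_runs):
        lr = rng.uniform(*lr_range)
        x = x0
        for _ in range(steps):
            x -= lr * grad(x)
        candidate = (loss(x), x, lr)
        if best is None or candidate < best:
            best = candidate
    return best


print(multi_lr_descent(2.0))
```

Small learning rates settle into the nearest basin, while larger ones can overshoot into other basins; taking the minimum over runs trades extra computation for robustness to a poor step size or an unlucky local minimum.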
From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning
Authors: Xuansheng Wu, Wenlin Yao, Jianshu Chen, Xiaoman Pan, Xiaoyang Wang, Ninghao Liu, Dong Yu
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.00492
Pdf link: https://arxiv.org/pdf/2310.00492
Abstract Large Language Models (LLMs) have achieved remarkable success, demonstrating powerful instruction-following capabilities across diverse tasks. Instruction fine-tuning is critical in enabling LLMs to align with user intentions and effectively follow instructions. In this work, we investigate how instruction fine-tuning modifies pre-trained models, focusing on two perspectives: instruction recognition and knowledge evolution. To study the behavior shift of LLMs, we employ a suite of local and global explanation methods, including a gradient-based approach for input-output attribution and techniques for interpreting patterns and concepts in self-attention and feed-forward layers. Our findings reveal three significant impacts of instruction fine-tuning: 1) it empowers LLMs to better recognize the instruction parts of user prompts, thereby facilitating high-quality response generation and addressing the "lost-in-the-middle" issue observed in pre-trained models; 2) it aligns the knowledge stored in feed-forward layers with user-oriented tasks, exhibiting minimal shifts across linguistic levels; and 3) it facilitates the learning of word-word relations with instruction verbs through the self-attention mechanism, particularly in the lower and middle layers, indicating enhanced recognition of instruction words. These insights contribute to a deeper understanding of the behavior shifts in LLMs after instruction fine-tuning and lay the groundwork for future research aimed at interpreting and optimizing LLMs for various applications. We will release our code and data soon.
Understanding the Robustness of Randomized Feature Defense Against Query-Based Adversarial Attacks
Authors: Quang H. Nguyen, Yingjie Lao, Tung Pham, Kok-Seng Wong, Khoa D. Doan
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.00567
Pdf link: https://arxiv.org/pdf/2310.00567
Abstract Recent works have shown that deep neural networks are vulnerable to adversarial examples, i.e., samples that are close to the original image but cause the model to misclassify. Even with access only to the model's output, an attacker can employ black-box attacks to generate such adversarial examples. In this work, we propose a simple and lightweight defense against black-box attacks by adding random noise to hidden features at intermediate layers of the model at inference time. Our theoretical analysis confirms that this method effectively enhances the model's resilience against both score-based and decision-based black-box attacks. Importantly, our defense does not require adversarial training and has minimal impact on accuracy, rendering it applicable to any pre-trained model. Our analysis also reveals the significance of selectively adding noise to different parts of the model based on the gradient of the adversarial objective function, which can be varied during the attack. We demonstrate the robustness of our defense against multiple black-box attacks through extensive empirical experiments involving diverse models with various architectures.
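The defense adds random noise to hidden features at inference time, so repeated queries on the same input return slightly different scores, which disrupts the score differences that black-box attacks rely on. A minimal sketch on a hypothetical two-layer network with hand-set weights is shown below. The architecture, weights, and noise scale are assumptions, and the paper's gradient-based selection of which layers to perturb is not modeled.

```python
import random


def matvec(w, v):
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def forward(x, w1, w2, noise_std=0.0, rng=None):
    """Two-layer ReLU network; optionally perturbs hidden features with Gaussian noise."""
    hidden = [max(0.0, h) for h in matvec(w1, x)]
    if noise_std > 0.0:
        rng = rng or random.Random()
        hidden = [h + rng.gauss(0.0, noise_std) for h in hidden]
    return matvec(w2, hidden)


# Hypothetical weights: 2 inputs -> 3 hidden units -> 2 class scores.
W1 = [[1.0, -0.5], [0.3, 0.8], [-0.7, 0.2]]
W2 = [[0.5, -0.2, 0.1], [-0.4, 0.6, 0.3]]
x = [0.9, 0.4]
rng = random.Random(0)
print(forward(x, W1, W2))                          # deterministic clean scores
print(forward(x, W1, W2, noise_std=0.1, rng=rng))  # randomized scores, query 1
print(forward(x, W1, W2, noise_std=0.1, rng=rng))  # randomized scores, query 2
```

Because the noise enters before a linear output layer, the scores stay unbiased on average, which is consistent with the abstract's claim of minimal accuracy impact, while individual queries become noisy.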
Performance evaluation of Machine learning algorithms for Intrusion Detection System
Authors: Sudhanshu Sekhar Tripathy, Bichitrananda Behera
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2310.00594
Pdf link: https://arxiv.org/pdf/2310.00594
Abstract The escalation of safety hazards and the hijacking of digital networks are among the most serious challenges that must be addressed today. Numerous security procedures have been set up to track and recognize illicit activity on network infrastructure. Intrusion detection systems (IDSs) are among the most effective ways to resist and recognize intrusions on internet connections and digital technologies. To classify network traffic as normal or anomalous, Machine Learning (ML) classifiers are increasingly utilized. An IDS with machine learning increases the accuracy with which security attacks are detected. This paper focuses on the analysis of IDSs using ML techniques. IDSs utilizing ML techniques are efficient and precise at identifying network assaults. In data with high-dimensional feature spaces, however, the efficacy of these systems degrades. Consequently, it is essential to apply a feasible feature-removal technique capable of eliminating features that have little effect on the classification process. In this paper, we analyze the KDD Cup '99 intrusion detection dataset used for training and validating ML models. Then, we implement ML classifiers such as Logistic Regression, Decision Tree, K-Nearest Neighbour, Naive Bayes, Bernoulli Naive Bayes, Multinomial Naive Bayes, XG-Boost Classifier, Ada-Boost, Random Forest, SVM, Rocchio classifier, Ridge, Passive-Aggressive classifier, ANN, and Perceptron (PPN). The optimal classifiers are determined by comparing the results of Stochastic Gradient Descent and back-propagation neural networks for IDS. Conventional classification metrics, such as accuracy, precision, recall, and the F1-measure, have been used to evaluate the performance of the ML classification algorithms.
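The evaluation relies on the conventional metrics listed above. For reference, a small sketch computing them from binary labels (for example 1 = attack, 0 = normal traffic) is shown below. The label encoding and function name are assumptions; scikit-learn's `sklearn.metrics` module provides equivalent functions such as `accuracy_score`, `precision_score`, `recall_score`, and `f1_score`.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary task (e.g. attack vs normal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # attacks caught
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false alarms
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # missed attacks
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


print(classification_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1]))
```

On imbalanced intrusion data, accuracy alone can look high even when many attacks are missed, which is why recall and F1 are reported alongside it.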
Hierarchical Adaptation with Hypernetworks for Few-shot Molecular Property Prediction
Authors: Shiguang Wu, Yaqing Wang, Quanming Yao
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.00614
Pdf link: https://arxiv.org/pdf/2310.00614
Abstract Molecular property prediction (MPP) is important in biomedical applications and naturally suffers from a lack of labels, thus forming a few-shot learning problem. State-of-the-art approaches are usually based on gradient-based meta-learning strategies, which ignore differences among model parameters and in molecules' learning difficulty. To address these problems, we propose a novel hierarchical adaptation mechanism for few-shot MPP (HiMPP). The model follows an encoder-predictor framework. First, to make the molecular representation property-adaptive, we selectively adapt the encoder's parameters by designing a hypernetwork to modulate node embeddings during message propagation. Next, we perform molecule-level adaptation by designing another hypernetwork, which assigns larger propagating steps to harder molecules in the predictor. In this way, the molecular representation is transformed by HiMPP hierarchically from the property level to the molecular level. Extensive results show that HiMPP obtains state-of-the-art performance on few-shot MPP problems and that our proposed hierarchical adaptation mechanism is rational and effective.
From Bandits Model to Deep Deterministic Policy Gradient, Reinforcement Learning with Contextual Information
Authors: Zhendong Shi, Xiaoli Wei, Ercan E. Kuruoglu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Arxiv link: https://arxiv.org/abs/2310.00642
Pdf link: https://arxiv.org/pdf/2310.00642
Abstract Taking the right actions to make profits in sequential processes remains difficult because of the fast dynamics and substantial uncertainty in many application scenarios. In such complicated environments, reinforcement learning (RL), a reward-oriented strategy for optimal control, has emerged as a potential technique to address this strategic decision-making issue. However, RL also has shortcomings, such as excessive resource consumption and the inability to quickly obtain optimal solutions, that make it unsuitable for many financial problems, including quantitative trading. In this study, we use two methods that exploit contextual information to overcome this issue: contextual Thompson sampling and reinforcement learning under supervision, both of which can accelerate the iterations in search of the best answer. To investigate strategic trading in quantitative markets, we merged an earlier financial trading strategy known as constant proportion portfolio insurance (CPPI) into deep deterministic policy gradient (DDPG). The experimental results show that both methods can accelerate the progress of reinforcement learning toward the optimal solution.
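CPPI, which the paper merges into DDPG, follows a simple rebalancing rule: keep a floor value, call the excess the cushion, and invest a fixed multiple of the cushion in the risky asset. A one-period sketch of that textbook rule is shown below. The parameter names, the fixed floor, and the optional leverage cap are assumptions, and the sketch does not reproduce the paper's integration with DDPG.

```python
def cppi_step(value, floor, multiplier, risky_return, safe_return=0.0, max_leverage=None):
    """Portfolio value after one period of constant proportion portfolio insurance.

    Exposure to the risky asset is multiplier * cushion, where cushion = value - floor.
    """
    cushion = max(value - floor, 0.0)
    exposure = multiplier * cushion
    if max_leverage is not None:
        # Optional cap, e.g. max_leverage=1.0 forbids borrowing to buy the risky asset.
        exposure = min(exposure, max_leverage * value)
    safe = value - exposure
    return exposure * (1.0 + risky_return) + safe * (1.0 + safe_return)


# Hypothetical path of risky-asset returns with a fixed floor of 80.
value = 100.0
for r in (0.05, -0.10, 0.03):
    value = cppi_step(value, floor=80.0, multiplier=3.0, risky_return=r)
    print(round(value, 2))
```

With a floor of 80 and a multiplier of 3, a 100-unit portfolio puts 60 in the risky asset; once the value falls to the floor, exposure drops to zero and the remaining capital sits in the safe asset.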
The Noise Geometry of Stochastic Gradient Descent: A Quantitative and Analytical Characterization
Authors: Mingze Wang, Lei Wu
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.00692
Pdf link: https://arxiv.org/pdf/2310.00692
Abstract Empirical studies have demonstrated that the noise in stochastic gradient descent (SGD) aligns favorably with the local geometry of the loss landscape. However, theoretical and quantitative explanations for this phenomenon remain sparse. In this paper, we offer a comprehensive theoretical investigation into the aforementioned noise geometry for over-parameterized linear models (OLMs) and two-layer neural networks. We scrutinize both average and directional alignments, paying special attention to how factors like sample size and input data degeneracy affect the alignment strength. As a specific application, we leverage our noise geometry characterizations to study how SGD escapes from sharp minima, revealing that the escape direction has significant components along flat directions. This is in stark contrast to GD, which escapes only along the sharpest directions. To substantiate our theoretical findings, both synthetic and real-world experiments are provided.
Deterministic Langevin Unconstrained Optimization with Normalizing Flows
Authors: James M. Sullivan, Uros Seljak
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2310.00745
Pdf link: https://arxiv.org/pdf/2310.00745
Abstract We introduce a global, gradient-free surrogate optimization strategy for expensive black-box functions inspired by the Fokker-Planck and Langevin equations. These can be written as an optimization problem where the objective is the target function to maximize minus the logarithm of the current density of evaluated samples. This objective balances exploitation of the target objective with exploration of low-density regions. The method, Deterministic Langevin Optimization (DLO), relies on a Normalizing Flow density estimate to perform active learning and select proposal points for evaluation. This strategy differs qualitatively from the widely-used acquisition functions employed by Bayesian Optimization methods, and can accommodate a range of surrogate choices. We demonstrate superior or competitive progress toward objective optima on standard synthetic test functions, as well as on non-convex and multi-modal posteriors of moderate dimension. On real-world objectives, such as scientific and neural network hyperparameter optimization, DLO is competitive with state-of-the-art baselines.
Counterfactual Image Generation for adversarially robust and interpretable Classifiers
Authors: Rafael Bischof, Florian Scheidegger, Michael A. Kraus, A. Cristiano I. Malossi
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2310.00761
Pdf link: https://arxiv.org/pdf/2310.00761
Abstract Neural image classifiers are effective but inherently hard to interpret and susceptible to adversarial attacks. Solutions to both problems exist, among others, in the form of counterfactual example generation, used either to enhance explainability or to adversarially augment training datasets for improved robustness. However, existing methods address only one of these issues. We propose a unified framework leveraging image-to-image translation Generative Adversarial Networks (GANs) to produce counterfactual samples that highlight salient regions for interpretability and act as adversarial samples to augment the dataset for more robustness. This is achieved by combining the classifier and discriminator into a single model that attributes real images to their respective classes and flags generated images as "fake". We assess the method's effectiveness by evaluating (i) the produced explainability masks on a semantic segmentation task for concrete cracks and (ii) the model's resilience against the Projected Gradient Descent (PGD) attack on a fruit defect detection problem. The produced saliency maps are highly descriptive, achieving competitive IoU values compared to classical segmentation models despite being trained exclusively on classification labels. Furthermore, the model exhibits improved robustness to adversarial attacks, and we show how the discriminator's "fakeness" value serves as an uncertainty measure of the predictions.
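The robustness evaluation uses the Projected Gradient Descent (PGD) attack. As a reminder of how PGD works, the sketch below runs an $L_\infty$ PGD attack against a hypothetical logistic-regression classifier: it repeatedly steps along the sign of the input gradient of the loss and projects back into the $\epsilon$-ball around the original input and the valid pixel range $[0, 1]$. The model, step size, and budget are assumptions, not the paper's attack configuration.

```python
import math


def sign(v):
    return (v > 0) - (v < 0)


def pgd_attack(x0, y, w, b, eps=0.1, alpha=0.02, steps=10):
    """L-infinity PGD against logistic regression with labels y in {-1, +1}.

    Loss: log(1 + exp(-y * (w.x + b))), whose gradient w.r.t. x is
    -y * w * sigmoid(-margin).
    """
    x = list(x0)
    for _ in range(steps):
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        coef = -y / (1.0 + math.exp(margin))  # -y * sigmoid(-margin)
        # Ascend the loss, then project into the eps-ball and the [0, 1] input range.
        x = [xi + alpha * sign(coef * wi) for xi, wi in zip(x, w)]
        x = [min(max(xi, x0i - eps), x0i + eps) for xi, x0i in zip(x, x0)]
        x = [min(max(xi, 0.0), 1.0) for xi in x]
    return x


w, b = [1.0, -1.0], 0.0
x_adv = pgd_attack([0.6, 0.4], y=1, w=w, b=b)
print(x_adv)
```

For this toy model the attack drives the input to the corner of the $\epsilon$-ball that erases the classification margin; a deep classifier would use automatic differentiation for the input gradient instead of the closed form.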
SMOOT: Saliency Guided Mask Optimized Online Training
Authors: Ali Karkehabadi, Avesta Sasan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.00772
Pdf link: https://arxiv.org/pdf/2310.00772
Abstract Deep Neural Networks are powerful tools for understanding complex patterns and making decisions. However, their black-box nature impedes a complete understanding of their inner workings. Saliency-Guided Training (SGT) methods try to highlight the prominent features in the model's training based on the output to alleviate this problem. These methods use back-propagation and modified gradients to guide the model toward the most relevant features while keeping the impact on prediction accuracy negligible. SGT makes the model's final result more interpretable by partially masking the input. In this way, considering the model's output, we can infer how each segment of the input affects the output. When the input is an image, masking is applied to the input pixels. However, the masking strategy and the number of pixels to mask are treated as hyperparameters, and the choice of masking strategy directly affects the model's training. In this paper, we address this issue. We propose a novel method to determine the optimal number of masked images based on the input, accuracy, and model loss during training. The strategy prevents information loss, which leads to better accuracy. Also, by integrating the model's performance into the strategy formula, we show that our model represents the salient features more meaningfully. Our experimental results demonstrate a substantial improvement in both model accuracy and the prominence of saliency, thereby affirming the effectiveness of our proposed solution.
Sparse Backpropagation for MoE Training
Authors: Liyuan Liu, Jianfeng Gao, Weizhu Chen
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.00811
Pdf link: https://arxiv.org/pdf/2310.00811
Abstract One defining characteristic of Mixture-of-Expert (MoE) models is their capacity for conducting sparse computation via expert routing, leading to remarkable scalability. However, backpropagation, the cornerstone of deep learning, requires dense computation, thereby posing challenges for MoE gradient computations. Here, we introduce SparseMixer, a scalable gradient estimator that bridges the gap between backpropagation and sparse expert routing. Unlike typical MoE training, which strategically neglects certain gradient terms for the sake of sparse computation and scalability, SparseMixer provides scalable gradient approximations for these terms, enabling reliable gradient estimation in MoE training. Grounded in a numerical ODE framework, SparseMixer harnesses the mid-point method, a second-order ODE solver, to deliver precise gradient approximations with negligible computational overhead. When applied to Switch Transformer on both pre-training and machine translation tasks, SparseMixer shows considerable performance gains, accelerating training convergence by up to 2 times.
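SparseMixer is described as building on the mid-point method, a second-order ODE solver. As a generic illustration of what "second-order" buys, and not of SparseMixer's gradient estimator itself, the sketch below compares forward Euler with the explicit midpoint method on $dy/dt = y$: halving the step roughly halves Euler's error but roughly quarters the midpoint error. The test equation and step counts are arbitrary choices.

```python
import math


def euler_step(f, t, y, h):
    """First-order forward Euler step."""
    return y + h * f(t, y)


def midpoint_step(f, t, y, h):
    """Explicit midpoint step: evaluate the slope halfway through the interval."""
    y_half = y + 0.5 * h * f(t, y)
    return y + h * f(t + 0.5 * h, y_half)


def integrate(step, f, y0, t_end, n):
    h = t_end / n
    t, y = 0.0, y0
    for _ in range(n):
        y = step(f, t, y, h)
        t += h
    return y


def f(t, y):
    return y  # exact solution y(t) = e^t for y(0) = 1


for n in (10, 20):
    err_euler = abs(integrate(euler_step, f, 1.0, 1.0, n) - math.e)
    err_mid = abs(integrate(midpoint_step, f, 1.0, 1.0, n) - math.e)
    print(n, err_euler, err_mid)
```

The midpoint method costs one extra function evaluation per step, which is the kind of small overhead the abstract refers to when it claims precise approximations at negligible cost.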
Energy-dissipative spectral renormalization exponential integrator method for gradient flow problems
Authors: Dianming Hou, Lili Ju, Zhonghua Qiao
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2310.00824
Pdf link: https://arxiv.org/pdf/2310.00824
Abstract In this paper, we present a novel spectral renormalization exponential integrator method for solving gradient flow problems. Our method is specifically designed to simultaneously satisfy discrete analogues of the energy dissipation laws and achieve high-order accuracy in time. To accomplish this, our method first incorporates the energy dissipation law into the target gradient flow equation by introducing a time-dependent spectral renormalization (TDSR) factor. Then, the coupled equations are discretized using the spectral approximation in space and the exponential time differencing (ETD) in time. Finally, the resulting fully discrete nonlinear system is decoupled and solved using the Picard iteration at each time step. Furthermore, we introduce an extra enforcing term into the system for updating the TDSR factor, which greatly relaxes the time step size restriction of the proposed method and enhances its computational efficiency. Extensive numerical tests with various gradient flows are also presented to demonstrate the accuracy and effectiveness of our method as well as its high efficiency when combined with an adaptive time-stepping strategy for long-term simulations.
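The method discretizes time with exponential time differencing (ETD). The basic idea is easiest to see in the first-order scheme ETD1 for a scalar equation $u' = \lambda u + N(u)$: the stiff linear part is integrated exactly and the nonlinear term is frozen over each step, giving $u_{n+1} = e^{\lambda h} u_n + \frac{e^{\lambda h} - 1}{\lambda} N(u_n)$. The sketch below is that textbook scheme only; it omits the paper's spectral renormalization factor, higher-order ETD variants, and Picard iteration, and its test problem is hypothetical.

```python
import math


def etd1(u0, lam, nonlinear, h, n_steps):
    """First-order exponential time differencing for u' = lam * u + N(u)."""
    e = math.exp(lam * h)
    # phi_1 factor (e^{lam h} - 1) / lam, with its limit h as lam -> 0.
    phi = (e - 1.0) / lam if lam != 0.0 else h
    u = u0
    for _ in range(n_steps):
        u = e * u + phi * nonlinear(u)
    return u


def forward_euler(u0, lam, nonlinear, h, n_steps):
    u = u0
    for _ in range(n_steps):
        u = u + h * (lam * u + nonlinear(u))
    return u


# Stiff hypothetical problem u' = -100 u + 1 with steady state 0.01, large step h = 0.1.
print(etd1(1.0, -100.0, lambda u: 1.0, 0.1, 50))           # settles at 0.01
print(forward_euler(1.0, -100.0, lambda u: 1.0, 0.1, 50))  # blows up: |1 + lam*h| = 9
```

Treating the linear part exactly is what lets ETD take steps far larger than explicit schemes allow on stiff gradient flows; the paper's extra enforcing term addresses the remaining step-size restriction from the nonlinear part.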
Online Sensitivity Optimization in Differentially Private Learning
Authors: Filippo Galli, Catuscia Palamidessi, Tommaso Cucinotta
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.00829
Pdf link: https://arxiv.org/pdf/2310.00829
Abstract Training differentially private machine learning models requires constraining an individual's contribution to the optimization process. This is achieved by clipping the $2$-norm of their gradient at a predetermined threshold prior to averaging and batch sanitization. The choice of threshold adversely influences optimization in two opposing ways: lower values exacerbate the bias due to excessive clipping, while higher values augment the sanitization noise. The best threshold hinges on factors such as the dataset and model architecture, and even varies within the same optimization run, demanding meticulous tuning that is usually accomplished through a grid search. In order to circumvent the privacy expenses incurred in hyperparameter tuning, we present a novel approach to dynamically optimize the clipping threshold. We treat this threshold as an additional learnable parameter, establishing a clean relationship between the threshold and the cost function. This allows us to optimize the former with gradient descent, with minimal repercussions on the overall privacy analysis. Our method is thoroughly assessed against alternative fixed and adaptive strategies across diverse datasets, tasks, model dimensions, and privacy levels. Our results demonstrate its comparable or superior performance in all evaluated scenarios, given the same privacy requirements.
Subsurface Characterization using Ensemble-based Approaches with Deep Generative Models
Authors: Jichao Bao, Hongkyu Yoon, Jonghyun Lee
Subjects: Machine Learning (cs.LG); Computation (stat.CO); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.00839
Pdf link: https://arxiv.org/pdf/2310.00839
Abstract Estimating spatially distributed properties such as hydraulic conductivity (K) from available sparse measurements is a major challenge in subsurface characterization. However, the use of inverse modeling is limited for ill-posed, high-dimensional applications due to computational costs and poor prediction accuracy with sparse datasets. In this paper, we combine Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP), a deep generative model that can accurately capture complex subsurface structure, and Ensemble Smoother with Multiple Data Assimilation (ES-MDA), an ensemble-based inversion method, for accurate and accelerated subsurface characterization. WGAN-GP is trained to generate high-dimensional K fields from a low-dimensional latent space, and ES-MDA then updates the latent variables by assimilating available measurements. Several subsurface examples are used to evaluate the accuracy and efficiency of the proposed method, and the main features of the unknown K fields are characterized accurately with reliable uncertainty quantification.
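A minimal sketch of the ES-MDA update acting on latent vectors follows, based on the standard ES-MDA formulation: observations are perturbed with inflated noise and the ensemble is updated through the parameter-data cross-covariance. Here `forward` stands for the composition of generator and flow simulator; in the usage lines a random linear map replaces the WGAN-GP generator and simulator, and all names, sizes and the inflation schedule are assumptions for illustration. The inflation coefficients should satisfy sum(1/alpha) = 1.

```python
import numpy as np

def es_mda(forward, ensemble, d_obs, obs_std, alphas, rng):
    """ES-MDA on an ensemble of latent vectors (illustrative sketch).

    forward: maps a latent vector (n_param,) to predicted data (n_obs,)
    ensemble: (n_ens, n_param) initial latent samples, e.g. drawn from N(0, I)
    alphas: inflation coefficients with sum(1/alpha) == 1
    """
    m = ensemble.copy()
    n_ens, n_obs = m.shape[0], d_obs.shape[0]
    c_d = np.eye(n_obs) * obs_std**2                    # observation-error covariance
    for alpha in alphas:
        d = np.array([forward(x) for x in m])            # predicted data, (n_ens, n_obs)
        dm = m - m.mean(axis=0)
        dd = d - d.mean(axis=0)
        c_md = dm.T @ dd / (n_ens - 1)                   # parameter-data cross-covariance
        c_dd = dd.T @ dd / (n_ens - 1)                   # predicted-data covariance
        # Perturb the observations with noise inflated by alpha.
        d_pert = d_obs + np.sqrt(alpha) * obs_std * rng.standard_normal((n_ens, n_obs))
        # Gain c_md @ inv(c_dd + alpha*c_d); the inverted matrix is symmetric.
        gain = np.linalg.solve(c_dd + alpha * c_d, c_md.T).T
        m = m + (d_pert - d) @ gain.T
    return m

rng = np.random.default_rng(0)
g_mat = rng.standard_normal((5, 3))                      # toy linear forward model
m_true = np.array([0.5, -1.0, 1.5])
d_obs = g_mat @ m_true
prior = rng.standard_normal((200, 3))
post = es_mda(lambda x: g_mat @ x, prior, d_obs, 0.01, [4.0, 4.0, 4.0, 4.0], rng)
```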
Regulating CPU Temperature With Thermal-Aware Scheduling Using a Reduced Order Learning Thermal Model
Authors: Anthony Dowling, Lin Jiang, Ming-Cheng Cheng, Yu Liu
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.00854
Pdf link: https://arxiv.org/pdf/2310.00854
Abstract Modern real-time systems utilize considerable amounts of power while executing computation-intensive tasks. The execution of these tasks leads to significant power dissipation and heating of the device. This results in severe thermal issues like temperature escalation, high thermal gradients, and excessive hot spot formation, which may degrade chip performance, accelerate device aging, and cause premature failure. Thermal-Aware Scheduling (TAS) enables the optimization of thermal dissipation to maintain a safe thermal state. In this work, we implement a new TAS algorithm, POD-TAS, which manages the thermal behavior of the cores based on a defined set of states and their transitions. We compare the performance of a dynamic Resistor-Capacitor (RC) thermal circuit simulator (HotSpot) and a reduced order Proper Orthogonal Decomposition (POD)-based thermal model, and we select the latter for use in our POD-TAS algorithm. We implement a novel simulation-based evaluation methodology to compare TAS algorithms. This methodology is used to evaluate the performance of the proposed POD-TAS algorithm with high spatiotemporal resolution. Additionally, we compare the performance of a state-of-the-art TAS algorithm, RT-TAS, to our proposed POD-TAS algorithm. Furthermore, we use the Clarkson Open-source Multi-physics Benchmark Suite (COMBS) to provide CPU workloads for task scheduling. Our experimental results on a multi-core processor using a set of 4 benchmarks demonstrate that the proposed POD-TAS method can improve thermal performance by decreasing the peak thermal variance by 53.0% and the peak chip temperature by 29.01%. Using a set of 8 benchmarks, the comparison of the algorithms demonstrates that POD-TAS decreases the peak spatial variance of the chip temperature and the peak chip temperature by 29.57% and 26.26%, respectively.
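A POD-based thermal model rests on a basis extracted from temperature snapshots. A minimal sketch of that extraction step, via the SVD of mean-subtracted snapshots, is given below. The energy criterion, the function names and the synthetic snapshots are assumptions for illustration; the Galerkin projection of the heat equation that turns the basis into an actual thermal model is not shown.

```python
import numpy as np

def pod_basis(snapshots, energy=0.999):
    """snapshots: (n_points, n_snapshots), one temperature field per column.

    Returns the mean field and the smallest set of POD modes that captures
    `energy` of the fluctuation energy (sum of squared singular values).
    """
    mean = snapshots.mean(axis=1)
    u, s, _ = np.linalg.svd(snapshots - mean[:, None], full_matrices=False)
    captured = np.cumsum(s**2) / np.sum(s**2)
    r = int(np.searchsorted(captured, energy)) + 1
    return mean, u[:, :r]

def to_reduced(field, mean, basis):
    return basis.T @ (field - mean)          # r POD coefficients

def from_reduced(coeffs, mean, basis):
    return mean + basis @ coeffs             # back to the full field

# Synthetic snapshots built from two spatial modes around a 40-degree mean.
x = np.linspace(0.0, 1.0, 100)
rng = np.random.default_rng(1)
coeffs = rng.standard_normal((2, 50))
snaps = 40.0 + np.outer(np.sin(np.pi * x), coeffs[0]) + np.outer(np.sin(2 * np.pi * x), coeffs[1])
mean, basis = pod_basis(snaps)
```

Because the synthetic fluctuations live in a two-dimensional subspace, at most two modes are kept and any snapshot is reconstructed from its reduced coordinates up to round-off.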
Distilling Influences to Mitigate Prediction Churn in Graph Neural Networks
Authors: Andreas Roth, Thomas Liebig
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.00946
Pdf link: https://arxiv.org/pdf/2310.00946
Abstract Models with similar performances exhibit significant disagreement in the predictions of individual samples, referred to as prediction churn. Our work explores this phenomenon in graph neural networks by investigating how models that differ only in their initializations differ in the features they use for predictions. We propose a novel metric called Influence Difference (ID) to quantify the variation in reasons used by nodes across models by comparing their influence distributions. Additionally, we consider the differences between nodes with a stable and an unstable prediction, positing that both equally utilize different reasons and thus provide a meaningful gradient signal to closely match two models even when the predictions for nodes are similar. Based on our analysis, we propose to minimize this ID in Knowledge Distillation, a domain where a new model should closely match an established one. As an efficient approximation, we introduce DropDistillation (DD), which matches the output for a graph perturbed by edge deletions. Our empirical evaluation on six benchmark datasets for node classification validates the differences in utilized features. DD outperforms previous methods regarding prediction stability and overall performance in all considered Knowledge Distillation experiments.
MiCRO: Near-Zero Cost Gradient Sparsification for Scaling and Accelerating Distributed DNN Training
Authors: Daegun Yoon, Sangyoon Oh
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2310.00967
Pdf link: https://arxiv.org/pdf/2310.00967
Abstract Gradient sparsification is a communication optimisation technique for scaling and accelerating distributed deep neural network (DNN) training. It reduces the growing communication traffic of gradient aggregation. However, existing sparsifiers scale poorly because of the high computational cost of gradient selection and/or an increase in communication traffic. In particular, the increase in communication traffic is caused by gradient build-up and an inappropriate threshold for gradient selection. To address these challenges, we propose a novel gradient sparsification method called MiCRO. In MiCRO, the gradient vector is partitioned, and each partition is assigned to the corresponding worker. Each worker then selects gradients from its own partition, so the aggregated gradients are free from gradient build-up. Moreover, MiCRO estimates an accurate threshold that keeps the communication traffic at the level the user requires by minimising the compression ratio error. By solving these problems that hinder the scalability and acceleration of distributed DNN training, MiCRO enables near-zero cost gradient sparsification. In our extensive experiments, MiCRO outperformed state-of-the-art sparsifiers with an outstanding convergence rate.
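The partitioning idea can be sketched as below: each worker scans only its own contiguous slice of the gradient index range, so the index sets gathered from different workers are disjoint and the aggregate cannot build up. This is an illustrative sketch under assumptions; in particular `adjust_threshold` is a hypothetical feedback rule standing in for MiCRO's threshold estimator (which the abstract describes only as minimising the compression ratio error), and how the selected values are combined across workers in the real system is not specified here.

```python
import numpy as np

def partitioned_select(local_grads, threshold):
    """Partition-wise threshold selection (illustrative sketch).

    local_grads: (n_workers, dim); row w is worker w's local gradient.
    Worker w only scans its own contiguous slice of the index range.
    """
    n_workers, dim = local_grads.shape
    bounds = np.linspace(0, dim, n_workers + 1).astype(int)
    selected = []
    for w in range(n_workers):
        lo, hi = bounds[w], bounds[w + 1]
        idx = lo + np.nonzero(np.abs(local_grads[w, lo:hi]) >= threshold)[0]
        selected.append((idx, local_grads[w, idx]))
    return selected

def aggregate(selected, dim):
    """Scatter the gathered (index, value) pairs into a dense vector."""
    out = np.zeros(dim)
    for idx, vals in selected:
        out[idx] += vals      # disjoint partitions: every index is written at most once
    return out

def adjust_threshold(threshold, n_selected, target, rate=0.5):
    """Hypothetical feedback rule (not MiCRO's estimator): raise the threshold
    when too many entries were selected, lower it when too few were."""
    return threshold * (max(n_selected, 1) / max(target, 1)) ** rate

rng = np.random.default_rng(0)
grads = rng.standard_normal((4, 1000))
sel = partitioned_select(grads, threshold=2.0)
all_idx = np.concatenate([i for i, _ in sel])
```

Since no index appears twice, the size of the aggregated sparse gradient is exactly the sum of the per-worker selections.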
Efficient Algorithms for the CCA Family: Unconstrained Objectives with Unbiased Gradients
Authors: James Chapman, Ana Lawry Aguila, Lennie Wells
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.01012
Pdf link: https://arxiv.org/pdf/2310.01012
Abstract The Canonical Correlation Analysis (CCA) family of methods is foundational in multi-view learning. Regularised linear CCA methods can be seen to generalise Partial Least Squares (PLS) and can be unified within a Generalized Eigenvalue Problem (GEP) framework. However, classical algorithms for these linear methods are computationally infeasible for large-scale data. Extensions to Deep CCA show great promise, but current training procedures are slow and complicated. First, we propose a novel unconstrained objective that characterizes the top subspace of GEPs. Our core contribution is a family of fast algorithms for stochastic PLS, stochastic CCA, and Deep CCA, simply obtained by applying stochastic gradient descent (SGD) to the corresponding CCA objectives. These methods show far faster convergence and recover higher correlations than the previous state-of-the-art on all standard CCA and Deep CCA benchmarks. This speed allows us to perform a first-of-its-kind PLS analysis of an extremely large biomedical dataset from the UK Biobank, with over 33,000 individuals and 500,000 variants. Finally, we not only match the performance of 'CCA-family' Self-Supervised Learning (SSL) methods on CIFAR-10 and CIFAR-100 with minimal hyper-parameter tuning, but also establish the first solid theoretical links to classical CCA, laying the groundwork for future insights.
A Robust Machine Learning Approach for Path Loss Prediction in 5G Networks with Nested Cross Validation
Authors: Ibrahim Yazıcı, Emre Gures
Subjects: Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2310.01030
Pdf link: https://arxiv.org/pdf/2310.01030
Abstract The design and deployment of fifth-generation (5G) wireless networks pose significant challenges due to the increasing number of wireless devices. Path loss is of central importance in network performance optimization, and accurate prediction of the path loss, which characterizes the attenuation of signal power during transmission, is critical for effective network planning, coverage estimation, and optimization. In this sense, we use machine learning (ML) methods, which overcome the drawbacks of conventional path loss prediction models, to predict path loss in a 5G network system, facilitating more accurate network planning, resource optimization, and performance improvement in wireless communication systems. To this end, we use a novel approach, a nested cross-validation scheme, with ML to prevent overfitting, thereby obtaining better generalization error and stable results for ML deployment. First, we acquire a publicly available dataset obtained through a comprehensive measurement campaign conducted in an urban macro-cell scenario located in Beijing, China. The dataset includes crucial information such as longitude, latitude, elevation, altitude, clutter height, and distance, which are used as essential features to predict the path loss in the 5G network system. We deploy Support Vector Regression (SVR), CatBoost Regression (CBR), eXtreme Gradient Boosting Regression (XGBR), Artificial Neural Network (ANN), and Random Forest (RF) methods to predict the path loss, and compare the prediction results in terms of Mean Absolute Error (MAE) and Mean Square Error (MSE). According to the results, XGBR outperforms the rest of the methods. It outperforms CBR by slight margins of 0.4% and 1% in terms of the MAE and MSE metrics, respectively, while it outperforms the remaining methods by clear margins.
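A minimal sketch of nested cross-validation is given below. Closed-form ridge regression stands in for SVR, CatBoost, XGBoost and the other models (an assumption made to keep the example NumPy-only), and all function names are illustrative. The key property is that the regularization strength is chosen by an inner cross-validation that only ever sees the outer training split, so the outer test fold never influences tuning.

```python
import numpy as np

def kfold(n, k, rng):
    return np.array_split(rng.permutation(n), k)

def split(folds, i):
    train = np.concatenate([f for j, f in enumerate(folds) if j != i])
    return train, folds[i]

def ridge_fit(x, y, lam):
    xb = np.hstack([x, np.ones((len(x), 1))])
    reg = lam * np.eye(xb.shape[1])
    reg[-1, -1] = 0.0                                    # leave the intercept unpenalized
    return np.linalg.solve(xb.T @ xb + reg, xb.T @ y)

def mae(w, x, y):
    pred = np.hstack([x, np.ones((len(x), 1))]) @ w
    return float(np.mean(np.abs(pred - y)))

def nested_cv(x, y, lambdas, k_outer=5, k_inner=3, seed=0):
    rng = np.random.default_rng(seed)
    outer = kfold(len(y), k_outer, rng)
    scores, chosen = [], []
    for i in range(k_outer):
        tr, te = split(outer, i)
        xtr, ytr = x[tr], y[tr]
        inner = kfold(len(ytr), k_inner, rng)            # inner folds index the outer-train split only

        def inner_score(lam):
            return np.mean([mae(ridge_fit(xtr[a], ytr[a], lam), xtr[b], ytr[b])
                            for a, b in (split(inner, j) for j in range(k_inner))])

        best = min(lambdas, key=inner_score)             # tuning never touches x[te]
        scores.append(mae(ridge_fit(xtr, ytr, best), x[te], y[te]))
        chosen.append(best)
    return scores, chosen

rng = np.random.default_rng(42)
x = rng.standard_normal((60, 3))
y = x @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(60)
scores, chosen = nested_cv(x, y, lambdas=[0.01, 0.1, 1.0, 10.0])
```

The outer scores estimate the generalization error of the whole tune-then-fit procedure rather than of one fixed hyperparameter value, which is why the scheme avoids the optimistic bias of tuning and evaluating on the same folds.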
A Novel Approach for Machine Learning-based Load Balancing in High-speed Train System using Nested Cross Validation
Authors: Ibrahim Yazici, Emre Gures
Subjects: Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2310.01034
Pdf link: https://arxiv.org/pdf/2310.01034
Abstract Fifth-generation (5G) mobile communication networks have recently emerged in various fields, including high-speed trains. However, the dense deployment of 5G millimeter wave (mmWave) base stations (BSs) and the high speed of moving trains lead to frequent handovers (HOs), which can adversely affect the Quality-of-Service (QoS) of mobile users. As a result, HO optimization and resource allocation are essential considerations for managing mobility in high-speed train systems. In this paper, we model the system performance of a high-speed train system with a novel machine learning (ML) approach, namely a nested cross-validation scheme that prevents information leakage from model evaluation into model parameter tuning, thereby avoiding overfitting and resulting in better generalization error. To this end, we employ ML methods for the high-speed train system scenario. Handover Margin (HOM) and Time-to-Trigger (TTT) values are used as features, several KPIs are used as outputs, and several ML methods, including Gradient Boosting Regression (GBR), Adaptive Boosting (AdaBoost), CatBoost Regression (CBR), Artificial Neural Network (ANN), Kernel Ridge Regression (KRR), Support Vector Regression (SVR), and k-Nearest Neighbor Regression (KNNR), are employed for the problem. Finally, the cross-validation schemes are compared across these methods in terms of the mean absolute error (MAE) and mean square error (MSE) metrics. According to the results, the boosting methods (ABR, CBR, GBR) with the nested cross-validation scheme clearly outperform the same methods under the conventional cross-validation scheme. On the other hand, SVR, KNNR, KRR, and ANN with the nested scheme produce promising results for the prediction of some KPIs compared with their conventional-scheme counterparts.
Dataset Condensation for Recommendation
Authors: Jiahao Wu, Wenqi Fan, Shengcai Liu, Qijiong Liu, Rui He, Qing Li, Ke Tang
Subjects: Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2310.01038
Pdf link: https://arxiv.org/pdf/2310.01038
Abstract Training recommendation models on large datasets often requires significant time and computational resources. Consequently, there is an emerging need to construct informative, smaller-scale datasets for efficient training. Dataset compression techniques explored in other domains show potential to address this problem, via sampling a subset or synthesizing a small dataset. However, applying existing approaches to condense recommendation datasets is impractical due to the following challenges: (i) sampling-based methods are inadequate in addressing the long-tailed distribution problem; (ii) synthesizing-based methods are not applicable due to the discreteness of interactions and the large size of recommendation datasets; (iii) neither of them addresses the recommendation-specific issue of false negative items, where items with potential user interest are incorrectly sampled as negatives owing to insufficient exposure. To bridge this gap, we investigate dataset condensation for recommendation, where discrete interactions are continualized with probabilistic re-parameterization. To avoid catastrophically expensive computations, we adopt a one-step update strategy for inner model training and introduce policy gradient estimation for outer dataset synthesis. To mitigate amplification of the long-tailed problem, we compensate long-tailed users in the condensed dataset. Furthermore, we propose to use a proxy model to identify false negative items. A theoretical analysis of the convergence property is provided. Extensive experiments on multiple datasets demonstrate the efficacy of our method. In particular, we reduce the dataset size by 75% while approximating over 98% of the original performance on Dianping and over 90% on other datasets.
Stability and Generalization for Minibatch SGD and Local SGD
Authors: Yunwen Lei, Tao Sun, Mingrui Liu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.01139
Pdf link: https://arxiv.org/pdf/2310.01139
Abstract The increasing scale of data propels the popularity of leveraging parallelism to speed up the optimization. Minibatch stochastic gradient descent (minibatch SGD) and local SGD are two popular methods for parallel optimization. The existing theoretical studies show a linear speedup of these methods with respect to the number of machines, which, however, is measured by optimization errors. As a comparison, the stability and generalization of these methods are much less studied. In this paper, we pioneer the stability and generalization analysis of minibatch and local SGD to understand their learnability. We incorporate training errors into the stability analysis, which shows how small training errors help generalization for overparameterized models. Our stability bounds imply optimistic risk bounds which decay fast under a low noise condition. We show both minibatch and local SGD achieve a linear speedup to attain the optimal risk bounds.
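To make the two methods concrete, a minimal sketch of local SGD on a least-squares problem follows: each machine runs `local_steps` SGD steps from the shared model on its own data shard, and the resulting models are then averaged. With `local_steps=1` the round reduces to a single minibatch SGD step with gradients averaged across machines, i.e. the minibatch SGD baseline. The problem, step size and function name are assumptions for illustration; the stability analysis itself is not reproduced.

```python
import numpy as np

def local_sgd(shards, dim, rounds, local_steps, lr, batch, seed=0):
    """Local SGD for the loss 0.5 * (x @ w - y)^2 (illustrative sketch).

    shards: list of (x, y) arrays, one per machine.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(dim)
    for _ in range(rounds):
        local_models = []
        for x, y in shards:
            w_local = w.copy()
            for _ in range(local_steps):
                idx = rng.integers(0, len(y), size=batch)
                grad = x[idx].T @ (x[idx] @ w_local - y[idx]) / batch
                w_local -= lr * grad
            local_models.append(w_local)
        w = np.mean(local_models, axis=0)   # one communication round: average the models
    return w

rng = np.random.default_rng(0)
w_true = np.array([1.0, -1.0, 2.0])
shards = []
for _ in range(4):                          # four machines, noiseless targets
    x = rng.standard_normal((200, 3))
    shards.append((x, x @ w_true))
w = local_sgd(shards, dim=3, rounds=100, local_steps=5, lr=0.05, batch=8)
```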
The Map Equation Goes Neural
Authors: Christopher Blöcker, Chester Tan, Ingo Scholtes
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.01144
Pdf link: https://arxiv.org/pdf/2310.01144
Abstract Community detection and graph clustering are essential for unsupervised data exploration and understanding the high-level organisation of networked systems. Recently, graph clustering has been highlighted as an under-explored primary task for graph neural networks. While hierarchical graph pooling has been shown to improve performance in graph and node classification tasks, it performs poorly in identifying meaningful clusters. Community detection has a long history in network science, but typically relies on optimising objective functions with custom-tailored search algorithms, not leveraging recent advances in deep learning, particularly from graph neural networks. In this paper, we narrow this gap between the deep learning and network science communities. We consider the map equation, an information-theoretic objective function for community detection. Expressing it in a fully differentiable tensor form that produces soft cluster assignments, we optimise the map equation with deep learning through gradient descent. More specifically, the reformulated map equation is a loss function compatible with any graph neural network architecture, enabling flexible clustering and graph pooling that clusters both graph structure and data features in an end-to-end way, automatically finding an optimum number of clusters without explicit regularisation. We evaluate our approach experimentally using different neural network architectures for unsupervised clustering in synthetic and real data. Our results show that our approach achieves competitive performance against baselines, naturally detects overlapping communities, and avoids over-partitioning sparse graphs.
Convergence proof for first-order position-based dynamics: An efficient scheme for inequality constrained ODEs
Authors: Steffen Plunder, Sara Merino-Aceituno
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2310.01215
Pdf link: https://arxiv.org/pdf/2310.01215
Abstract NVIDIA researchers have pioneered an explicit method, position-based dynamics (PBD), for simulating systems with contact forces, gaining widespread use in computer graphics and animation. While the method yields visually compelling real-time simulations with surprising numerical stability, its scientific validity has been questioned due to a lack of rigorous analysis. In this paper, we introduce a new mathematical convergence analysis specifically tailored for PBD applied to first-order dynamics. Utilizing newly derived bounds for projections onto uniformly prox-regular sets, our proof extends classical compactness arguments. Our work paves the way for the reliable application of PBD in various scientific and engineering fields, including particle simulations with volume exclusion, agent-based models in mathematical biology or inequality-constrained gradient-flow models.
Coupling public and private gradient provably helps optimization
Authors: Ruixuan Liu, Zhiqi Bu, Yu-xiang Wang, Sheng Zha, George Karypis
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.01304
Pdf link: https://arxiv.org/pdf/2310.01304
Abstract The success of large neural networks is crucially determined by the availability of data. It has been observed that training only on a small amount of public data, or privately on the abundant private data, can lead to undesirable degradation of accuracy. In this work, we leverage both private and public data to improve the optimization, by coupling their gradients via a weighted linear combination. We derive the optimal weight in the convex setting, which indicates that the weighting coefficient should be hyperparameter-dependent. Then, we prove the acceleration in the convergence of non-convex loss and the effects of hyper-parameters such as privacy budget, number of iterations, batch size, and model size on the choice of the weighting coefficient. We support our analysis with empirical experiments across language and vision benchmarks, and provide a guideline for choosing the optimal weight of the gradient coupling.
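The weighted combination itself is a single line. To illustrate why the weight should depend on the privacy setting, the sketch below adds a toy model that is an assumption and not the paper's convex analysis: the private gradient is unbiased but noisy (the noise standing in for DP sanitization), while the public gradient is noise-free but biased by b. Minimizing the expected squared error gives alpha* = ||b||^2 / (||b||^2 + d*sigma^2), so a tighter privacy budget (larger sigma) shifts weight toward the public gradient. Function names are illustrative.

```python
import numpy as np

def coupled_gradient(private_grad, public_grad, alpha):
    """Weighted linear combination of private and public gradients."""
    return alpha * private_grad + (1.0 - alpha) * public_grad

def optimal_alpha(public_bias, noise_std):
    """Toy model (assumption): private = g + n with n ~ N(0, noise_std^2 I), public = g + b.

    E||alpha*(g + n) + (1 - alpha)*(g + b) - g||^2
        = alpha^2 * d * noise_std^2 + (1 - alpha)^2 * ||b||^2,
    which is minimized at alpha = ||b||^2 / (||b||^2 + d * noise_std^2).
    """
    b2 = float(public_bias @ public_bias)
    return b2 / (b2 + public_bias.size * noise_std**2)
```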
Elephant Neural Networks: Born to Be a Continual Learner
Authors: Qingfeng Lan, A. Rupam Mahmood
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.01365
Pdf link: https://arxiv.org/pdf/2310.01365
Abstract Catastrophic forgetting has remained a significant challenge to continual learning for decades. While recent works have proposed effective methods to mitigate this problem, they mainly focus on the algorithmic side. Meanwhile, we do not fully understand what architectural properties of neural networks lead to catastrophic forgetting. This study aims to fill this gap by studying the role of activation functions in the training dynamics of neural networks and their impact on catastrophic forgetting. Our study reveals that, besides sparse representations, the gradient sparsity of activation functions also plays an important role in reducing forgetting. Based on this insight, we propose a new class of activation functions, elephant activation functions, that can generate both sparse representations and sparse gradients. We show that by simply replacing classical activation functions with elephant activation functions, we can significantly improve the resilience of neural networks to catastrophic forgetting. Our method has broad applicability and benefits for continual learning in regression, class incremental learning, and reinforcement learning tasks. Specifically, we achieve excellent performance on the Split MNIST dataset in just one single pass, without using a replay buffer, task boundary information, or pre-training.
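The abstract does not give the functional form, so the sketch below assumes a bell-shaped activation f(x) = 1 / (1 + |x/a|^d), where a sets the width and d the steepness, together with its analytic derivative. It illustrates the property the abstract emphasizes: both the output and the gradient are close to zero away from a narrow region. The function names and default parameters are hypothetical.

```python
import numpy as np

def elephant(x, a=1.0, d=8.0):
    """Assumed bell-shaped activation 1 / (1 + |x/a|^d)."""
    return 1.0 / (1.0 + np.abs(x / a) ** d)

def elephant_grad(x, a=1.0, d=8.0):
    """Derivative of the assumed form: -(d/a) * sign(x) * |x/a|^(d-1) / (1 + |x/a|^d)^2."""
    u = np.abs(x / a)
    return -d / a * np.sign(x) * u ** (d - 1) / (1.0 + u ** d) ** 2
```

With d = 8 and a = 1, |f'(x)| is already below 1e-3 for |x| >= 3, so inputs far from the bell's center produce almost no gradient, which is the kind of gradient sparsity the abstract associates with reduced forgetting.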
EXTRACTER: Efficient Texture Matching with Attention and Gradient Enhancing for Large Scale Image Super Resolution
Authors: Esteban Reyes-Saldana, Mariano Rivera
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.01379
Pdf link: https://arxiv.org/pdf/2310.01379
Abstract Recent reference-based image super-resolution (RefSR) methods improve on SOTA deep methods by introducing attention mechanisms that enhance low-resolution images with high-resolution textures transferred from a reference high-resolution image. The main idea is to search for matches between patches of the LR and reference images in a feature space and merge them using deep architectures. However, existing methods lack an accurate texture search. They divide images into as many patches as possible, resulting in inefficient memory usage, and cannot manage large images. Herein, we propose a deep search with more efficient memory usage that significantly reduces the number of image patches and finds the $k$ most relevant texture matches for each low-resolution patch over the high-resolution reference patches, resulting in an accurate texture match. We enhance the super-resolution result by adding gradient density information using a simple residual architecture, obtaining competitive results on the PSNR and SSIM metrics.
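The core of a top-$k$ texture search can be sketched as follows, assuming cosine similarity between patch feature vectors as the matching score. This sketch builds the full similarity matrix, which is exactly the memory cost EXTRACTER aims to reduce for large images, so it illustrates only the matching criterion, not the paper's memory-efficient search. The function name is illustrative.

```python
import numpy as np

def topk_texture_matches(lr_feats, ref_feats, k):
    """lr_feats: (n_lr, c) LR patch features; ref_feats: (n_ref, c) reference patch features.

    Returns (n_lr, k) reference indices ordered best first, and their cosine similarities.
    """
    lr = lr_feats / np.linalg.norm(lr_feats, axis=1, keepdims=True)
    ref = ref_feats / np.linalg.norm(ref_feats, axis=1, keepdims=True)
    sim = lr @ ref.T                                        # (n_lr, n_ref) cosine similarities
    idx = np.argpartition(-sim, k - 1, axis=1)[:, :k]       # unordered k best per LR patch
    order = np.argsort(-np.take_along_axis(sim, idx, axis=1), axis=1)
    idx = np.take_along_axis(idx, order, axis=1)            # sort the k candidates, best first
    return idx, np.take_along_axis(sim, idx, axis=1)
```

`argpartition` selects the k best candidates in linear time per row, and only those k are sorted afterwards.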
Keyword: super-resolution

Prior Mismatch and Adaptation in PnP-ADMM with a Nonconvex Convergence Analysis
Authors: Shirin Shoushtari, Jiaming Liu, Edward P. Chandler, M. Salman Asif, Ulugbek S. Kamilov
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.00133
Pdf link: https://arxiv.org/pdf/2310.00133
Abstract Plug-and-Play (PnP) priors is a widely used family of methods for solving imaging inverse problems by integrating physical measurement models with image priors specified using image denoisers. PnP methods have been shown to achieve state-of-the-art performance when the prior is obtained using powerful deep denoisers. Despite extensive work on PnP, the topic of distribution mismatch between the training and testing data has often been overlooked in the PnP literature. This paper presents a set of new theoretical and numerical results on the topic of prior distribution mismatch and domain adaptation for the alternating direction method of multipliers (ADMM) variant of PnP. Our theoretical result provides an explicit error bound for PnP-ADMM due to the mismatch between the desired denoiser and the one used for inference. Our analysis contributes to the work in the area by considering the mismatch under nonconvex data-fidelity terms and expansive denoisers. Our first set of numerical results quantifies the impact of the prior distribution mismatch on the performance of PnP-ADMM on the problem of image super-resolution. Our second set of numerical results considers a simple and effective domain adaptation strategy that closes the performance gap due to the use of mismatched denoisers. Our results suggest the relative robustness of PnP-ADMM to prior distribution mismatch, while also showing that the performance gap can be significantly reduced with few training samples from the desired distribution.
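A minimal sketch of the PnP-ADMM iteration follows, using the standard scaled-ADMM form in which the proximal operator of the prior is replaced by a denoiser: x = prox of the data term at (z - s), z = D(x + s), s = s + x - z. To keep it self-contained, a 1-D inpainting problem with a closed-form data-term prox is used, and a circular [1/4, 1/2, 1/4] smoothing filter stands in for a deep denoiser; the problem, names and parameters are assumptions, and super-resolution is not reproduced.

```python
import numpy as np

def smooth_denoiser(v):
    """Stand-in for a deep denoiser: circular [1/4, 1/2, 1/4] smoothing."""
    return 0.25 * np.roll(v, 1) + 0.5 * v + 0.25 * np.roll(v, -1)

def pnp_admm_inpaint(y, mask, denoiser, gamma=1.0, iters=300):
    """PnP-ADMM for 0.5*||mask*x - y||^2 plus an implicit prior defined by `denoiser`.

    y holds the observed samples (zeros where mask == 0); mask is 0/1.
    """
    z = y.copy()
    s = np.zeros_like(y)
    for _ in range(iters):
        v = z - s
        # Closed-form prox of the data-fidelity term (diagonal because mask is 0/1).
        x = (mask * y + v / gamma) / (mask + 1.0 / gamma)
        # The denoiser takes the place of the prior's proximal operator.
        z = denoiser(x + s)
        s = s + x - z
    return x

n = 101
t = np.linspace(0.0, 2 * np.pi, n, endpoint=False)
signal = np.sin(t)
rng = np.random.default_rng(3)
mask = (rng.random(n) < 0.5).astype(float)    # roughly half the samples are observed
y = mask * signal
x_hat = pnp_admm_inpaint(y, mask, smooth_denoiser)
```

Swapping `smooth_denoiser` for a denoiser trained on a different signal distribution is the kind of prior mismatch the paper analyzes.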
Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis
Authors: Nithin Gopalakrishnan Nair, Anoop Cherian, Suhas Lohit, Ye Wang, Toshiaki Koike-Akino, Vishal M. Patel, Tim K. Marks
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.00224
Pdf link: https://arxiv.org/pdf/2310.00224
Abstract Conditional generative models typically demand large annotated training sets to achieve high-quality synthesis. As a result, there has been significant interest in designing models that perform plug-and-play generation, i.e., to use a predefined or pretrained model, which is not explicitly trained on the generative task, to guide the generative process (e.g., using language). However, such guidance is typically useful only towards synthesizing high-level semantics rather than editing fine-grained details as in image-to-image translation tasks. To this end, and capitalizing on the powerful fine-grained generative control offered by the recent diffusion-based generative models, we introduce Steered Diffusion, a generalized framework for photorealistic zero-shot conditional image generation using a diffusion model trained for unconditional generation. The key idea is to steer the image generation of the diffusion model at inference time via designing a loss using a pre-trained inverse model that characterizes the conditional task. This loss modulates the sampling trajectory of the diffusion process. Our framework allows for easy incorporation of multiple conditions during inference. We present experiments using steered diffusion on several tasks including inpainting, colorization, text-guided semantic editing, and image super-resolution. Our results demonstrate clear qualitative and quantitative improvements over state-of-the-art diffusion-based plug-and-play models while adding negligible additional computational cost.
SSIF: Learning Continuous Image Representation for Spatial-Spectral Super-Resolution
Authors: Gengchen Mai, Ni Lao, Weiwei Sun, Yuchi Ma, Jiaming Song, Chenlin Meng, Hongxu Ma, Jinmeng Rao, Ziyuan Li, Stefano Ermon
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2310.00413
Pdf link: https://arxiv.org/pdf/2310.00413
Abstract Existing digital sensors capture images at fixed spatial and spectral resolutions (e.g., RGB, multispectral, and hyperspectral images), and each combination requires bespoke machine learning models. Neural Implicit Functions partially overcome the spatial resolution challenge by representing an image in a resolution-independent way. However, they still operate at fixed, pre-defined spectral resolutions. To address this challenge, we propose Spatial-Spectral Implicit Function (SSIF), a neural implicit model that represents an image as a function of both continuous pixel coordinates in the spatial domain and continuous wavelengths in the spectral domain. We empirically demonstrate the effectiveness of SSIF on two challenging spatio-spectral super-resolution benchmarks. We observe that SSIF consistently outperforms state-of-the-art baselines even when the baselines are allowed to train separate models at each spectral resolution. We show that SSIF generalizes well to both unseen spatial resolutions and spectral resolutions. Moreover, SSIF can generate high-resolution images that improve the performance of downstream tasks (e.g., land use classification) by 1.7%-7%.
Diving into the Depths of Spotting Text in Multi-Domain Noisy Scenes
Authors: Alloy Das, Sanket Biswas, Umapada Pal, Josep Lladós
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.00558
Pdf link: https://arxiv.org/pdf/2310.00558
Abstract When used in a real-world noisy environment, the capacity to generalize to multiple domains is essential for any autonomous scene text spotting system. However, existing state-of-the-art methods employ pretraining and fine-tuning strategies on natural scene datasets, which do not exploit the feature interaction across other complex domains. In this work, we explore and investigate the problem of domain-agnostic scene text spotting, i.e., training a model on multi-domain source data such that it can directly generalize to target domains rather than being specialized for a specific domain or scenario. In this regard, we present to the community a text spotting validation benchmark called Under-Water Text (UWT) for noisy underwater scenes to establish an important case study. Moreover, we also design an efficient super-resolution based end-to-end transformer baseline called DA-TextSpotter, which achieves comparable or superior performance over existing text spotting architectures for both regular and arbitrary-shaped scene text spotting benchmarks in terms of both accuracy and model efficiency. The dataset, code and pre-trained models will be released upon acceptance.
A New Real-World Video Dataset for the Comparison of Defogging Algorithms
Authors: Alexandra Duminil, Jean-Philippe Tarel, Roland Brémond
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.01020
Pdf link: https://arxiv.org/pdf/2310.01020
Abstract Video restoration for noise removal, deblurring or super-resolution is attracting more and more attention in the fields of image processing and computer vision. However, works on video restoration with data-driven approaches for fog removal are rare, due to the lack of datasets containing videos in both clear and foggy conditions, which are required for deep learning and benchmarking. A new dataset, called REVIDE, was recently proposed for just that purpose. In this paper, we implement the same approach by proposing a new REal-world VIdeo dataset for the comparison of Defogging Algorithms (VIREDA), with various fog densities and ground truths without fog. This small database can serve as a test base for defogging algorithms. A video defogging algorithm is also mentioned (still under development), with the key idea of using temporal redundancy to minimize artefacts and exposure variations between frames. Inspired by the success of the Transformer architecture in deep learning for various applications, we select this kind of architecture in a neural network to show the relevance of the proposed dataset.
Prompt-tuning latent diffusion models for inverse problems
Authors: Hyungjin Chung, Jong Chul Ye, Peyman Milanfar, Mauricio Delbracio
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.01110
Pdf link: https://arxiv.org/pdf/2310.01110
Abstract We propose a new method for solving imaging inverse problems using text-to-image latent diffusion models as general priors. Existing methods using latent diffusion models for inverse problems typically rely on simple null text prompts, which can lead to suboptimal performance. To address this limitation, we introduce a method for prompt tuning, which jointly optimizes the text embedding on-the-fly while running the reverse diffusion process. This allows us to generate images that are more faithful to the diffusion prior. In addition, we propose a method to keep the evolution of latent variables within the range space of the encoder, by projection. This helps to reduce image artifacts, a major problem when using latent diffusion models instead of pixel-based diffusion models. Our combined method, called P2L, outperforms both image- and latent-diffusion model-based inverse problem solvers on a variety of tasks, such as super-resolution, deblurring, and inpainting.
EXTRACTER: Efficient Texture Matching with Attention and Gradient Enhancing for Large Scale Image Super Resolution
Authors: Authors: Esteban Reyes-Saldana, Mariano Rivera
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.01379
Pdf link: https://arxiv.org/pdf/2310.01379
Abstract Recent reference-based image super-resolution (RefSR) has improved state-of-the-art deep methods by introducing attention mechanisms that enhance low-resolution images by transferring high-resolution textures from a reference high-resolution image. The main idea is to search for matches between patches of the LR and reference image pair in a feature space and merge them using deep architectures. However, existing methods lack accurate texture search. They divide images into as many patches as possible, which results in inefficient memory usage and prevents them from handling large images. Herein, we propose a deep search with more efficient memory usage that significantly reduces the number of image patches and finds the $k$ most relevant texture matches for each low-resolution patch over the high-resolution reference patches, resulting in accurate texture matching. We enhance the super-resolution result by adding gradient density information using a simple residual architecture, showing competitive results on the PSNR and SSIM metrics.
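The matching step described above (finding the $k$ most relevant reference patches for each low-resolution patch) can be illustrated with a minimal sketch. It assumes patch features have already been extracted as plain vectors and uses cosine similarity as the relevance score; both choices are assumptions for illustration, and the paper's actual feature space, similarity measure and memory-saving search are not reproduced here. The function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def topk_texture_matches(lr_feats, ref_feats, k):
    """Hypothetical helper: for each LR patch feature, return the indices of the
    k most similar reference patch features, ordered from most to least similar.

    Assumption: relevance = cosine similarity in an already-computed feature space.
    """
    matches = []
    for q in lr_feats:
        # Score every reference patch against this query patch.
        scores = [(cosine(q, r), j) for j, r in enumerate(ref_feats)]
        # Sort by descending similarity; ties are broken by the lower index.
        scores.sort(key=lambda s: (-s[0], s[1]))
        matches.append([j for _, j in scores[:k]])
    return matches


if __name__ == "__main__":
    lr = [[1.0, 0.0], [0.0, 1.0]]                # two toy LR patch features
    ref = [[0.9, 0.1], [0.1, 0.9], [1.0, 1.0]]   # three toy reference patch features
    print(topk_texture_matches(lr, ref, 2))      # [[0, 2], [1, 2]]
```

This brute-force search compares every LR patch with every reference patch, which is exactly the memory and compute cost the abstract says EXTRACTER aims to reduce; it only shows what "top-$k$ matches per patch" means, not how to make it efficient.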
Conditional Diffusion Distillation
Authors: Authors: Kangfu Mei, Mauricio Delbracio, Hossein Talebi, Zhengzhong Tu, Vishal M. Patel, Peyman Milanfar
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.01407
Pdf link: https://arxiv.org/pdf/2310.01407
Abstract Generative diffusion models provide strong priors for text-to-image generation and thereby serve as a foundation for conditional generation tasks such as image editing, restoration, and super-resolution. However, one major limitation of diffusion models is their slow sampling time. To address this challenge, we present a novel conditional distillation method designed to supplement the diffusion priors with the help of image conditions, allowing for conditional sampling with very few steps. We directly distill the unconditional pre-training in a single stage through joint learning, largely simplifying the previous two-stage procedures that involve both distillation and conditional fine-tuning separately. Furthermore, our method enables a new parameter-efficient distillation mechanism that distills each task with only a small number of additional parameters combined with the shared frozen unconditional backbone. Experiments across multiple tasks including super-resolution, image editing, and depth-to-image generation demonstrate that our method outperforms existing distillation techniques for the same sampling time. Notably, our method is the first distillation strategy that can match the performance of the much slower fine-tuned conditional diffusion models.

New submissions for Tue, 3 Oct 23 #612

Keyword: sgd

On Memorization and Privacy risks of Sharpness Aware Minimization

The Noise Geometry of Stochastic Gradient Descent: A Quantitative and Analytical Characterization

Efficient Algorithms for the CCA Family: Unconstrained Objectives with Unbiased Gradients

Stability and Generalization for Minibatch SGD and Local SGD

Improving Dialogue Management: Quality Datasets vs Models

Keyword: optimization

Low-budget Black-box Optimization Algorithms Evaluated on BBOB and OpenAI Gym

Voice2Action: Language Models as Agent for Efficient Real-Time Interaction in Virtual Reality

Certified Robustness via Dynamic Margin Maximization and Improved Lipschitz Regularization

On the Disconnect Between Theory and Practice of Overparametrized Neural Networks

3D Reconstruction in Noisy Agricultural Environments: A Bayesian Optimization Perspective for View Planning

Primal-Dual Continual Learning: Stability and Plasticity through Lagrange Multipliers

LQ-OCP: Energy-Optimal Control for LQ Problems

Degree Distribution Identifiability of Stochastic Kronecker Graphs

Tight Bounds for Volumetric Spanners and Applications

Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment

Bridging the Gap Between Foundation Models and Heterogeneous Federated Learning

A bibliometric Analysis on Spectrum Sensing in Wireless Networks

RIS-aided Near-Field MIMO Communications: Codebook and Beam Training Design

Red Teaming Game: A Game-Theoretic Framework for Red Teaming Language Models

A DSP shared is a DSP earned: HLS Task-Level Multi-Pumping for High-Performance Low-Resource Designs

Diffusion Posterior Illumination for Ambiguity-aware Inverse Rendering

Composition of Control Barrier Functions With Differing Relative Degrees for Safety Under Input Constraints

Distilling Inductive Bias: Knowledge Distillation Beyond Model Compression

Joint Power and 3D Trajectory Optimization for UAV-enabled Wireless Powered Communication Networks with Obstacles

Order-Preserving GFlowNets

Privacy-Preserving Distributed Market Mechanism for Active Distribution Networks

New SDP Roundings and Certifiable Approximation for Cubic Optimization

Joint Scheduling and Trajectory Optimization of Charging UAV in Wireless Rechargeable Sensor Networks

Better Situational Graphs by Inferring High-level Semantic-Relational Concepts

mmWave Beam Selection in Analog Beamforming Using Personalized Federated Learning

Optimizing Parameters of the DC Power Flow

Diff-DOPE: Differentiable Deep Object Pose Estimation

On Memorization and Privacy risks of Sharpness Aware Minimization

Exploring Benchmarks for Self-Driving Labs using Color Matching

Are Graph Neural Networks Optimal Approximation Algorithms?

A primal-dual perspective for distributed TD-learning

Fewer is More: Trojan Attacks on Parameter-Efficient Fine-Tuning

Optimization or Architecture: How to Hack Kalman Filtering

Active Implicit Reconstruction Using One-Shot View Planning

A Simple Yet Effective Strategy to Robustify the Meta Learning Paradigm

Automatic Data Repair: Are We Ready to Deploy?

Efficient MPC for Emergency Evasive Maneuvers, Part II: Comparative Assessment for Hybrid Control

Spectral Neural Networks: Approximation Theory and Optimization Landscape

Deterministic Langevin Unconstrained Optimization with Normalizing Flows

RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models

SEED: Simple, Efficient, and Effective Data Management via Large Language Models

Data-driven adaptive building thermal controller tuning with constraints: A primal-dual contextual Bayesian optimization approach

Bayesian Design Principles for Frequentist Sequential Learning

Parameter-Efficient Tuning Helps Language Model Alignment

Online Sensitivity Optimization in Differentially Private Learning

Necessary and Sufficient Watermark for Large Language Models

Regulating CPU Temperature With Thermal-Aware Scheduling Using a Reduced Order Learning Thermal Model

Dynamic Manipulation of a Deformable Linear Object: Simulation and Learning

Trained Latent Space Navigation to Prevent Lack of Photorealism in Generated Images on Style-based Models

Multi-Agent Bayesian Optimization with Coupled Black-Box and Affine Constraints

All by Myself: Learning Individualized Competitive Behaviour with a Contrastive Reinforcement Learning optimization

BeBOP -- Combining Reactive Planning and Bayesian Optimization to Solve Robotic Manipulation Tasks

ViPlanner: Visual Semantic Imperative Learning for Local Navigation

A Robust Machine Learning Approach for Path Loss Prediction in 5G Networks with Nested Cross Validation

A Novel Approach for Machine Learning-based Load Balancing in High-speed Train System using Nested Cross Validation

Language Model Decoding as Direct Metrics Optimization

Advancements in Optimization: Adaptive Differential Evolution with Diversification Strategy

On Fulfilling the Exigent Need for Automating and Modernizing Logistics Infrastructure in India: Enabling AI-based Integration, Digitalization, and Smart Automation of Industrial Parks and Robotic Warehouses

A Novel Approach with Monte-Carlo Simulation and Hybrid Optimization Approach for Inventory Management with Stochastic Demand

Linear attention is (maybe) all you need (to understand transformer optimization)

Non-negative isomorphic neural networks for photonic neuromorphic accelerators

Energy-Guided Continuous Entropic Barycenter Estimation for General Costs

Stability and Generalization for Minibatch SGD and Local SGD

Light Schrödinger Bridge

Graph-Theoretic Bézier Curve Optimization over Safe Corridors for Safe and Smooth Motion Planning

Learn to Follow: Decentralized Lifelong Multi-agent Pathfinding via Planning and Learning

ScaLearn: Simple and Highly Parameter-Efficient Task Transfer by Learning to Scale

Cell-Free Bistatic Backscatter Communication: Channel Estimation, Optimization, and Performance Analysis

Learning manipulation of steep granular slopes for fast Mini Rover turning

Coupling public and private gradient provably helps optimization

Optimistic Online Caching for Batched Requests