New submissions for Mon, 1 Jan 24

Keyword: sgd

There is no result

Keyword: optimization

Flying By ML -- CNN Inversion of Affine Transforms

Authors: Authors: L. Van Warren
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.17258
Pdf link: https://arxiv.org/pdf/2312.17258
Abstract This paper describes a machine learning method to automate reading of cockpit gauges, using a CNN to invert affine transformations and deduce aircraft states from instrument images. Validated with synthetic images of a turn-and-bank indicator, this research introduces methods such as generating datasets from a single image, the 'Clean Training Principle' for optimal noise-free training, and CNN interpolation for continuous value predictions from categorical data. It also offers insights into hyperparameter optimization and ML system software engineering.
Improving Low-resource Prompt-based Relation Representation with Multi-view Decoupling Learning
Authors: Authors: Chenghao Fan, Wei Wei, Xiaoye Qu, Zhenyi Lu, Wenfeng Xie, Yu Cheng, Dangyang Chen
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2312.17267
Pdf link: https://arxiv.org/pdf/2312.17267
Abstract Recently, prompt-tuning with pre-trained language models (PLMs) has demonstrated the significantly enhancing ability of relation extraction (RE) tasks. However, in low-resource scenarios, where the available training data is scarce, previous prompt-based methods may still perform poorly for prompt-based representation learning due to a superficial understanding of the relation. To this end, we highlight the importance of learning high-quality relation representation in low-resource scenarios for RE, and propose a novel prompt-based relation representation method, named MVRE (\underline{M}ulti-\underline{V}iew \underline{R}elation \underline{E}xtraction), to better leverage the capacity of PLMs to improve the performance of RE within the low-resource prompt-tuning paradigm. Specifically, MVRE decouples each relation into different perspectives to encompass multi-view relation representations for maximizing the likelihood during relation inference. Furthermore, we also design a Global-Local loss and a Dynamic-Initialization method for better alignment of the multi-view relation-representing virtual words, containing the semantics of relation labels during the optimization learning process and initialization. Extensive experiments on three benchmark datasets show that our method can achieve state-of-the-art in low-resource settings.
Intelligent Parsing: An Automated Parsing Framework for Extracting Design Semantics from E-commerce Creatives
Authors: Authors: Guandong Li, Xian Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2312.17283
Pdf link: https://arxiv.org/pdf/2312.17283
Abstract In the industrial e-commerce landscape, creative designs such as banners and posters are ubiquitous. Extracting structured semantic information from creative e-commerce design materials (manuscripts crafted by designers) to obtain design semantics represents a core challenge in the realm of intelligent design. In this paper, we propose a comprehensive automated framework for intelligently parsing creative materials. This framework comprises material recognition, preprocess, smartname, and label layers. The material recognition layer consolidates various detection and recognition interfaces, covering business aspects including detection of auxiliary areas within creative materials and layer-level detection, alongside label identification. Algorithmically, it encompasses a variety of coarse-to-fine methods such as Cascade RCNN, GFL, and other models. The preprocess layer involves filtering creative layers and grading creative materials. The smartname layer achieves intelligent naming for creative materials, while the label layer covers multi-level tagging for creative materials, enabling tagging at different hierarchical levels. Intelligent parsing constitutes a complete parsing framework that significantly aids downstream processes such as intelligent creation, creative optimization, and material library construction. Within the practical business applications at Suning, it markedly enhances the exposure, circulation, and click-through rates of creative materials, expediting the closed-loop production of creative materials and yielding substantial benefits.
Optimizing watermarks for large language models
Authors: Authors: Bram Wouters
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2312.17295
Pdf link: https://arxiv.org/pdf/2312.17295
Abstract With the rise of large language models (LLMs) and concerns about potential misuse, watermarks for generative LLMs have recently attracted much attention. An important aspect of such watermarks is the trade-off between their identifiability and their impact on the quality of the generated text. This paper introduces a systematic approach to this trade-off in terms of a multi-objective optimization problem. For a large class of robust, efficient watermarks, the associated Pareto optimal solutions are identified and shown to outperform the currently default watermark.
Improving Intrusion Detection with Domain-Invariant Representation Learning in Latent Space
Authors: Authors: Padmaksha Roy, Tyler Cody, Himanshu Singhal, Kevin Choi, Ming Jin
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.17300
Pdf link: https://arxiv.org/pdf/2312.17300
Abstract Domain generalization focuses on leveraging knowledge from multiple related domains with ample training data and labels to enhance inference on unseen in-distribution (IN) and out-of-distribution (OOD) domains. In our study, we introduce a two-phase representation learning technique using multi-task learning. This approach aims to cultivate a latent space from features spanning multiple domains, encompassing both native and cross-domains, to amplify generalization to IN and OOD territories. Additionally, we attempt to disentangle the latent space by minimizing the mutual information between the prior and latent space, effectively de-correlating spurious feature correlations. Collectively, the joint optimization will facilitate domain-invariant feature learning. We assess the model's efficacy across multiple cybersecurity datasets, using standard classification metrics on both unseen IN and OOD sets, and juxtapose the results with contemporary domain generalization methods.
A randomized algorithm to solve reduced rank operator regression
Authors: Authors: Giacomo Turri, Vladimir Kostic, Pietro Novelli, Massimiliano Pontil
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2312.17348
Pdf link: https://arxiv.org/pdf/2312.17348
Abstract We present and analyze an algorithm designed for addressing vector-valued regression problems involving possibly infinite-dimensional input and output spaces. The algorithm is a randomized adaptation of reduced rank regression, a technique to optimally learn a low-rank vector-valued function (i.e. an operator) between sampled data via regularized empirical risk minimization with rank constraints. We propose Gaussian sketching techniques both for the primal and dual optimization objectives, yielding Randomized Reduced Rank Regression (R4) estimators that are efficient and accurate. For each of our R4 algorithms we prove that the resulting regularized empirical risk is, in expectation w.r.t. randomness of a sketch, arbitrarily close to the optimal value when hyper-parameteres are properly tuned. Numerical expreriments illustrate the tightness of our bounds and show advantages in two distinct scenarios: (i) solving a vector-valued regression problem using synthetic and large-scale neuroscience datasets, and (ii) regressing the Koopman operator of a nonlinear stochastic dynamical system.
SANIA: Polyak-type Optimization Framework Leads to Scale Invariant Stochastic Algorithms
Authors: Authors: Farshed Abdukhakimov, Chulu Xiang, Dmitry Kamzolov, Robert Gower, Martin Takáč
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2312.17369
Pdf link: https://arxiv.org/pdf/2312.17369
Abstract Adaptive optimization methods are widely recognized as among the most popular approaches for training Deep Neural Networks (DNNs). Techniques such as Adam, AdaGrad, and AdaHessian utilize a preconditioner that modifies the search direction by incorporating information about the curvature of the objective function. However, despite their adaptive characteristics, these methods still require manual fine-tuning of the step-size. This, in turn, impacts the time required to solve a particular problem. This paper presents an optimization framework named SANIA to tackle these challenges. Beyond eliminating the need for manual step-size hyperparameter settings, SANIA incorporates techniques to address poorly scaled or ill-conditioned problems. We also explore several preconditioning methods, including Hutchinson's method, which approximates the Hessian diagonal of the loss function. We conclude with an extensive empirical examination of the proposed techniques across classification tasks, covering both convex and non-convex contexts.
Beyond PID Controllers: PPO with Neuralized PID Policy for Proton Beam Intensity Control in Mu2e
Authors: Authors: Chenwei Xu, Jerry Yao-Chieh Hu, Aakaash Narayanan, Mattson Thieme, Vladimir Nagaslaev, Mark Austin, Jeremy Arnold, Jose Berlioz, Pierrick Hanlet, Aisha Ibrahim, Dennis Nicklaus, Jovan Mitrevski, Jason Michael St.John, Gauri Pradhan, Andrea Saewert, Kiyomi Seiya, Brian Schupbach, Randy Thurman-Keup, Nhan Tran, Rui Shi, Seda Ogrenci, Alexis Maya-Isabelle Shuping, Kyle Hazelwood, Han Liu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Accelerator Physics (physics.acc-ph)
Arxiv link: https://arxiv.org/abs/2312.17372
Pdf link: https://arxiv.org/pdf/2312.17372
Abstract We introduce a novel Proximal Policy Optimization (PPO) algorithm aimed at addressing the challenge of maintaining a uniform proton beam intensity delivery in the Muon to Electron Conversion Experiment (Mu2e) at Fermi National Accelerator Laboratory (Fermilab). Our primary objective is to regulate the spill process to ensure a consistent intensity profile, with the ultimate goal of creating an automated controller capable of providing real-time feedback and calibration of the Spill Regulation System (SRS) parameters on a millisecond timescale. We treat the Mu2e accelerator system as a Markov Decision Process suitable for Reinforcement Learning (RL), utilizing PPO to reduce bias and enhance training stability. A key innovation in our approach is the integration of a neuralized Proportional-Integral-Derivative (PID) controller into the policy function, resulting in a significant improvement in the Spill Duty Factor (SDF) by 13.6%, surpassing the performance of the current PID controller baseline by an additional 1.6%. This paper presents the preliminary offline results based on a differentiable simulator of the Mu2e accelerator. It paves the groundwork for real-time implementations and applications, representing a crucial step towards automated proton beam intensity control for the Mu2e experiment.
Analyzing and Enhancing the Backward-Pass Convergence of Unrolled Optimization
Authors: Authors: James Kotary, Jacob Christopher, My H Dinh, Ferdinando Fioretto
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2312.17394
Pdf link: https://arxiv.org/pdf/2312.17394
Abstract The integration of constrained optimization models as components in deep networks has led to promising advances on many specialized learning tasks. A central challenge in this setting is backpropagation through the solution of an optimization problem, which often lacks a closed form. One typical strategy is algorithm unrolling, which relies on automatic differentiation through the entire chain of operations executed by an iterative optimization solver. This paper provides theoretical insights into the backward pass of unrolled optimization, showing that it is asymptotically equivalent to the solution of a linear system by a particular iterative method. Several practical pitfalls of unrolling are demonstrated in light of these insights, and a system called Folded Optimization is proposed to construct more efficient backpropagation rules from unrolled solver implementations. Experiments over various end-to-end optimization and learning tasks demonstrate the advantages of this system both computationally, and in terms of flexibility over various optimization problem forms.
Parameter Optimization with Conscious Allocation (POCA)
Authors: Authors: Joshua Inman, Tanmay Khandait, Giulia Pedrielli, Lalitha Sankar
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2312.17404
Pdf link: https://arxiv.org/pdf/2312.17404
Abstract The performance of modern machine learning algorithms depends upon the selection of a set of hyperparameters. Common examples of hyperparameters are learning rate and the number of layers in a dense neural network. Auto-ML is a branch of optimization that has produced important contributions in this area. Within Auto-ML, hyperband-based approaches, which eliminate poorly-performing configurations after evaluating them at low budgets, are among the most effective. However, the performance of these algorithms strongly depends on how effectively they allocate the computational budget to various hyperparameter configurations. We present the new Parameter Optimization with Conscious Allocation (POCA), a hyperband-based algorithm that adaptively allocates the inputted budget to the hyperparameter configurations it generates following a Bayesian sampling scheme. We compare POCA to its nearest competitor at optimizing the hyperparameters of an artificial toy function and a deep neural network and find that POCA finds strong configurations faster in both settings.
LEFL: Low Entropy Client Sampling in Federated Learning
Authors: Authors: Waqwoya Abebe, Pablo Munoz, Ali Jannesari
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.17430
Pdf link: https://arxiv.org/pdf/2312.17430
Abstract Federated learning (FL) is a machine learning paradigm where multiple clients collaborate to optimize a single global model using their private data. The global model is maintained by a central server that orchestrates the FL training process through a series of training rounds. In each round, the server samples clients from a client pool before sending them its latest global model parameters for further optimization. Naive sampling strategies implement random client sampling and fail to factor client data distributions for privacy reasons. Hence we proposes an alternative sampling strategy by performing a one-time clustering of clients based on their model's learned high-level features while respecting data privacy. This enables the server to perform stratified client sampling across clusters in every round. We show datasets of sampled clients selected with this approach yield a low relative entropy with respect to the global data distribution. Consequently, the FL training becomes less noisy and significantly improves the convergence of the global model by as much as 7.4% in some experiments. Furthermore, it also significantly reduces the communication rounds required to achieve a target accuracy.
Efficient optimization-based trajectory planning
Authors: Authors: Jiayu Fan, Nikolce Murgovski, Jun Liang
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2312.17440
Pdf link: https://arxiv.org/pdf/2312.17440
Abstract This study proposes a unified optimization-based planning framework that addresses the precise and efficient navigation of a controlled object within a constrained region, while contending with obstacles. We focus on handling two collision avoidance problems, i.e., the object not colliding with obstacles and not colliding with boundaries of the constrained region. The object or obstacle is denoted as a union of convex polytopes and ellipsoids, and the constrained region is denoted as an intersection of such convex sets. Using these representations, collision avoidance can be approached by formulating explicit constraints that separate two convex sets, or ensure that a convex set is contained in another convex set, referred to as separating constraints and containing constraints, respectively. We propose to use the hyperplane separation theorem to formulate differentiable separating constraints, and utilize the S-procedure and geometrical methods to formulate smooth containing constraints. We state that compared to the state of the art, the proposed formulations allow a considerable reduction in nonlinear program size and geometry-based initialization in auxiliary variables used to formulate collision avoidance constraints. Finally, the efficacy of the proposed unified planning framework is evaluated in two contexts, autonomous parking in tractor-trailer vehicles and overtaking on curved lanes. The results in both cases exhibit an improved computational performance compared to existing methods.
Low Power and Temperature-Resilient Compute-In-Memory Based on Subthreshold-FeFET
Authors: Authors: Yifei Zhou, Xuchu Huang, Jianyi Yang, Kai Ni, Hussam Amrouch, Cheng Zhuo, Xunxhao Yin
Subjects: Emerging Technologies (cs.ET)
Arxiv link: https://arxiv.org/abs/2312.17442
Pdf link: https://arxiv.org/pdf/2312.17442
Abstract Compute-in-memory (CiM) is a promising solution for addressing the challenges of artificial intelligence (AI) and the Internet of Things (IoT) hardware such as 'memory wall' issue. Specifically, CiM employing nonvolatile memory (NVM) devices in a crossbar structure can efficiently accelerate multiply-accumulation (MAC) computation, a crucial operator in neural networks among various AI models. Low power CiM designs are thus highly desired for further energy efficiency optimization on AI models. Ferroelectric FET (FeFET), an emerging device, is attractive for building ultra-low power CiM array due to CMOS compatibility, high ION/IOFF ratio, etc. Recent studies have explored FeFET based CiM designs that achieve low power consumption. Nevertheless, subthreshold-operated FeFETs, where the operating voltages are scaled down to the subthreshold region to reduce array power consumption, are particularly vulnerable to temperature drift, leading to accuracy degradation. To address this challenge, we propose a temperature-resilient 2T-1FeFET CiM design that performs MAC operations reliably at subthreahold region from 0 to 85 Celsius, while consuming ultra-low power. Benchmarked against the VGG neural network architecture running the CIFAR-10 dataset, the proposed 2T-1FeFET CiM design achieves 89.45% CIFAR-10 test accuracy. Compared to previous FeFET based CiM designs, it exhibits immunity to temperature drift at an 8-bit wordlength scale, and achieves better energy efficiency with 2866 TOPS/W.
Solving Decision-Dependent Games by Learning from Feedback
Authors: Authors: Killian Wood, Ahmed Zamzam, Emiliano Dall'Anese
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2312.17471
Pdf link: https://arxiv.org/pdf/2312.17471
Abstract This paper tackles the problem of solving stochastic optimization problems with a decision-dependent distribution in the setting of stochastic monotone games and when the distributional dependence is unknown. A two-stage approach is proposed, which initially involves estimating the distributional dependence on decision variables, and subsequently optimizing over the estimated distributional map. The paper presents guarantees for the approximation of the cost of each agent. Furthermore, a stochastic gradient-based algorithm is developed and analyzed for finding the Nash equilibrium in a distributed fashion. Numerical simulations are provided for a novel electric vehicle charging market formulation using real-world data.
Noise-free Optimization in Early Training Steps for Image Super-Resolution
Authors: Authors: MinKyu Lee, Jae-Pil Heo
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.17526
Pdf link: https://arxiv.org/pdf/2312.17526
Abstract Recent deep-learning-based single image super-resolution (SISR) methods have shown impressive performance whereas typical methods train their networks by minimizing the pixel-wise distance with respect to a given high-resolution (HR) image. However, despite the basic training scheme being the predominant choice, its use in the context of ill-posed inverse problems has not been thoroughly investigated. In this work, we aim to provide a better comprehension of the underlying constituent by decomposing target HR images into two subcomponents: (1) the optimal centroid which is the expectation over multiple potential HR images, and (2) the inherent noise defined as the residual between the HR image and the centroid. Our findings show that the current training scheme cannot capture the ill-posed nature of SISR and becomes vulnerable to the inherent noise term, especially during early training steps. To tackle this issue, we propose a novel optimization method that can effectively remove the inherent noise term in the early steps of vanilla training by estimating the optimal centroid and directly optimizing toward the estimation. Experimental results show that the proposed method can effectively enhance the stability of vanilla training, leading to overall performance gain. Codes are available at github.com/2minkyulee/ECO.
Informative Rays Selection for Few-Shot Neural Radiance Fields
Authors: Authors: Marco Orsingher, Anthony Dell'Eva, Paolo Zani, Paolo Medici, Massimo Bertozzi
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.17561
Pdf link: https://arxiv.org/pdf/2312.17561
Abstract Neural Radiance Fields (NeRF) have recently emerged as a powerful method for image-based 3D reconstruction, but the lengthy per-scene optimization limits their practical usage, especially in resource-constrained settings. Existing approaches solve this issue by reducing the number of input views and regularizing the learned volumetric representation with either complex losses or additional inputs from other modalities. In this paper, we present KeyNeRF, a simple yet effective method for training NeRF in few-shot scenarios by focusing on key informative rays. Such rays are first selected at camera level by a view selection algorithm that promotes baseline diversity while guaranteeing scene coverage, then at pixel level by sampling from a probability distribution based on local image entropy. Our approach performs favorably against state-of-the-art methods, while requiring minimal changes to existing NeRF codebases.
CAD-compatible structural shape optimization with a movable Bézier tetrahedral mesh
Authors: Authors: Jorge López, Cosmin Anitescu, Timon Rabczuk
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Arxiv link: https://arxiv.org/abs/2312.17575
Pdf link: https://arxiv.org/pdf/2312.17575
Abstract This paper presents the development of a complete CAD-compatible framework for structural shape optimization in 3D. The boundaries of the domain are described using NURBS while the interior is discretized with B\'ezier tetrahedra. The tetrahedral mesh is obtained from the mesh generator software Gmsh. A methodology to reconstruct the NURBS surfaces from the triangular faces of the boundary mesh is presented. The description of the boundary is used for the computation of the analytical sensitivities with respect to the control points employed in surface design. Further, the mesh is updated at each iteration of the structural optimization process by a pseudo-elastic moving mesh method. In this procedure, the existing mesh is deformed to match the updated surface and therefore reduces the need for remeshing. Numerical examples are presented to test the performance of the proposed method. The use of the movable mesh technique results in a considerable decrease in the computational effort for the numerical examples.
Decision-focused predictions via pessimistic bilevel optimization: a computational study
Authors: Authors: Víctor Bucarey, Sophia Calderón, Gonzalo Muñoz, Frederic Semet
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2312.17640
Pdf link: https://arxiv.org/pdf/2312.17640
Abstract Dealing with uncertainty in optimization parameters is an important and longstanding challenge. Typically, uncertain parameters are predicted accurately, and then a deterministic optimization problem is solved. However, the decisions produced by this so-called \emph{predict-then-optimize} procedure can be highly sensitive to uncertain parameters. In this work, we contribute to recent efforts in producing \emph{decision-focused} predictions, i.e., to build predictive models that are constructed with the goal of minimizing a \emph{regret} measure on the decisions taken with them. We formulate the exact expected regret minimization as a pessimistic bilevel optimization model. Then, using duality arguments, we reformulate it as a non-convex quadratic optimization problem. Finally, we show various computational techniques to achieve tractability. We report extensive computational results on shortest-path instances with uncertain cost vectors. Our results indicate that our approach can improve training performance over the approach of Elmachtoub and Grigas (2022), a state-of-the-art method for decision-focused learning.
Keyword: adam

SANIA: Polyak-type Optimization Framework Leads to Scale Invariant Stochastic Algorithms
Authors: Authors: Farshed Abdukhakimov, Chulu Xiang, Dmitry Kamzolov, Robert Gower, Martin Takáč
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2312.17369
Pdf link: https://arxiv.org/pdf/2312.17369
Abstract Adaptive optimization methods are widely recognized as among the most popular approaches for training Deep Neural Networks (DNNs). Techniques such as Adam, AdaGrad, and AdaHessian utilize a preconditioner that modifies the search direction by incorporating information about the curvature of the objective function. However, despite their adaptive characteristics, these methods still require manual fine-tuning of the step-size. This, in turn, impacts the time required to solve a particular problem. This paper presents an optimization framework named SANIA to tackle these challenges. Beyond eliminating the need for manual step-size hyperparameter settings, SANIA incorporates techniques to address poorly scaled or ill-conditioned problems. We also explore several preconditioning methods, including Hutchinson's method, which approximates the Hessian diagonal of the loss function. We conclude with an extensive empirical examination of the proposed techniques across classification tasks, covering both convex and non-convex contexts.
Keyword: gradient

Generating gradients in the energy landscape using rectified linear type cost functions for efficiently solving 0/1 matrix factorization in Simulated Annealing
Authors: Authors: Makiko Konoshima, Hirotaka Tamura, Yoshiyuki Kabashima
Subjects: Machine Learning (cs.LG); Applied Physics (physics.app-ph); Data Analysis, Statistics and Probability (physics.data-an)
Arxiv link: https://arxiv.org/abs/2312.17272
Pdf link: https://arxiv.org/pdf/2312.17272
Abstract The 0/1 matrix factorization defines matrix products using logical AND and OR as product-sum operators, revealing the factors influencing various decision processes. Instances and their characteristics are arranged in rows and columns. Formulating matrix factorization as an energy minimization problem and exploring it with Simulated Annealing (SA) theoretically enables finding a minimum solution in sufficient time. However, searching for the optimal solution in practical time becomes problematic when the energy landscape has many plateaus with flat slopes. In this work, we propose a method to facilitate the solution process by applying a gradient to the energy landscape, using a rectified linear type cost function readily available in modern annealing machines. We also propose a method to quickly obtain a solution by updating the cost function's gradient during the search process. Numerical experiments were conducted, confirming the method's effectiveness with both noise-free artificial and real data.
Gradient Flossing: Improving Gradient Descent through Dynamic Control of Jacobians
Authors: Authors: Rainer Engelken
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Chaotic Dynamics (nlin.CD); Neurons and Cognition (q-bio.NC); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2312.17306
Pdf link: https://arxiv.org/pdf/2312.17306
Abstract Training recurrent neural networks (RNNs) remains a challenge due to the instability of gradients across long time horizons, which can lead to exploding and vanishing gradients. Recent research has linked these problems to the values of Lyapunov exponents for the forward-dynamics, which describe the growth or shrinkage of infinitesimal perturbations. Here, we propose gradient flossing, a novel approach to tackling gradient instability by pushing Lyapunov exponents of the forward dynamics toward zero during learning. We achieve this by regularizing Lyapunov exponents through backpropagation using differentiable linear algebra. This enables us to "floss" the gradients, stabilizing them and thus improving network training. We demonstrate that gradient flossing controls not only the gradient norm but also the condition number of the long-term Jacobian, facilitating multidimensional error feedback propagation. We find that applying gradient flossing prior to training enhances both the success rate and convergence speed for tasks involving long time horizons. For challenging tasks, we show that gradient flossing during training can further increase the time horizon that can be bridged by backpropagation through time. Moreover, we demonstrate the effectiveness of our approach on various RNN architectures and tasks of variable temporal complexity. Additionally, we provide a simple implementation of our gradient flossing algorithm that can be used in practice. Our results indicate that gradient flossing via regularizing Lyapunov exponents can significantly enhance the effectiveness of RNN training and mitigate the exploding and vanishing gradient problem.
A non-intrusive neural-network based gradient algorithm for parameter estimation in non-stationary elasticity
Authors: Authors: Stefan Frei, Jan Reichle, Stefan Volkwein
Subjects: Numerical Analysis (math.NA); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2312.17373
Pdf link: https://arxiv.org/pdf/2312.17373
Abstract We present a non-intrusive gradient algorithm for parameter estimation problems in non-stationary elasticity. To avoid multiple (and potentially expensive) solutions of the underlying partial differential equation (PDE), we approximate the PDE solver by a neural network within the gradient algorithm. The network is trained offline for a given set of parameters. The algorithm is applied to an unsteady linear-elastic contact problem; its convergence and approximation properties are investigated numerically.
Solving Decision-Dependent Games by Learning from Feedback
Authors: Authors: Killian Wood, Ahmed Zamzam, Emiliano Dall'Anese
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2312.17471
Pdf link: https://arxiv.org/pdf/2312.17471
Abstract This paper tackles the problem of solving stochastic optimization problems with a decision-dependent distribution in the setting of stochastic monotone games and when the distributional dependence is unknown. A two-stage approach is proposed, which initially involves estimating the distributional dependence on decision variables, and subsequently optimizing over the estimated distributional map. The paper presents guarantees for the approximation of the cost of each agent. Furthermore, a stochastic gradient-based algorithm is developed and analyzed for finding the Nash equilibrium in a distributed fashion. Numerical simulations are provided for a novel electric vehicle charging market formulation using real-world data.
RS-DGC: Exploring Neighborhood Statistics for Dynamic Gradient Compression on Remote Sensing Image Interpretation
Authors: Authors: Weiying Xie, Zixuan Wang, Jitao Ma, Daixun Li, Yunsong Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.17530
Pdf link: https://arxiv.org/pdf/2312.17530
Abstract Distributed deep learning has recently been attracting more attention in remote sensing (RS) applications due to the challenges posed by the increased amount of open data that are produced daily by Earth observation programs. However, the high communication costs of sending model updates among multiple nodes are a significant bottleneck for scalable distributed learning. Gradient sparsification has been validated as an effective gradient compression (GC) technique for reducing communication costs and thus accelerating the training speed. Existing state-of-the-art gradient sparsification methods are mostly based on the "larger-absolute-more-important" criterion, ignoring the importance of small gradients, which is generally observed to affect the performance. Inspired by informative representation of manifold structures from neighborhood information, we propose a simple yet effective dynamic gradient compression scheme leveraging neighborhood statistics indicator for RS image interpretation, termed RS-DGC. We first enhance the interdependence between gradients by introducing the gradient neighborhood to reduce the effect of random noise. The key component of RS-DGC is a Neighborhood Statistical Indicator (NSI), which can quantify the importance of gradients within a specified neighborhood on each node to sparsify the local gradients before gradient transmission in each iteration. Further, a layer-wise dynamic compression scheme is proposed to track the importance changes of each layer in real time. Extensive downstream tasks validate the superiority of our method in terms of intelligent interpretation of RS images. For example, we achieve an accuracy improvement of 0.51% with more than 50 times communication compression on the NWPU-RESISC45 dataset using VGG-19 network.
Towards Faithful Explanations for Text Classification with Robustness Improvement and Explanation Guided Training
Authors: Authors: Dongfang Li, Baotian Hu, Qingcai Chen, Shan He
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2312.17591
Pdf link: https://arxiv.org/pdf/2312.17591
Abstract Feature attribution methods highlight the important input tokens as explanations to model predictions, which have been widely applied to deep neural networks towards trustworthy AI. However, recent works show that explanations provided by these methods face challenges of being faithful and robust. In this paper, we propose a method with Robustness improvement and Explanation Guided training towards more faithful EXplanations (REGEX) for text classification. First, we improve model robustness by input gradient regularization technique and virtual adversarial training. Secondly, we use salient ranking to mask noisy tokens and maximize the similarity between model attention and feature attribution, which can be seen as a self-training procedure without importing other external information. We conduct extensive experiments on six datasets with five attribution methods, and also evaluate the faithfulness in the out-of-domain setting. The results show that REGEX improves fidelity metrics of explanations in all settings and further achieves consistent gains based on two randomization tests. Moreover, we show that using highlight explanations produced by REGEX to train select-then-predict models results in comparable task performance to the end-to-end method.
Solar Radiation Prediction in the UTEQ based on Machine Learning Models
Authors: Authors: Jordy Anchundia Troncoso, Ángel Torres Quijije, Byron Oviedo, Cristian Zambrano-Vega
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.17659
Pdf link: https://arxiv.org/pdf/2312.17659
Abstract This research explores the effectiveness of various Machine Learning (ML) models used to predicting solar radiation at the Central Campus of the State Technical University of Quevedo (UTEQ). The data was obtained from a pyranometer, strategically located in a high area of the campus. This instrument continuously recorded solar irradiance data since 2020, offering a comprehensive dataset encompassing various weather conditions and temporal variations. After a correlation analysis, temperature and the time of day were identified as the relevant meteorological variables that influenced the solar irradiance. Different machine learning algorithms such as Linear Regression, K-Nearest Neighbors, Decision Tree, and Gradient Boosting were compared using the evaluation metrics Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and the Coefficient of Determination ($R^2$). The study revealed that Gradient Boosting Regressor exhibited superior performance, closely followed by the Random Forest Regressor. These models effectively captured the non-linear patterns in solar radiation, as evidenced by their low MSE and high $R^2$ values. With the aim of assess the performance of our ML models, we developed a web-based tool for the Solar Radiation Forecasting in the UTEQ available at this http URL The results obtained demonstrate the effectiveness of our ML models in solar radiation prediction and contribute a practical utility in real-time solar radiation forecasting, aiding in efficient solar energy management.
Principled Gradient-based Markov Chain Monte Carlo for Text Generation
Authors: Authors: Li Du, Afra Amini, Lucas Torroba Hennigen, Xinyan Velocity Yu, Jason Eisner, Holden Lee, Ryan Cotterell
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.17710
Pdf link: https://arxiv.org/pdf/2312.17710
Abstract Recent papers have demonstrated the possibility of energy-based text generation by adapting gradient-based sampling algorithms, a paradigm of MCMC algorithms that promises fast convergence. However, as we show in this paper, previous attempts on this approach to text generation all fail to sample correctly from the target language model distributions. To address this limitation, we consider the problem of designing text samplers that are faithful, meaning that they have the target text distribution as its limiting distribution. We propose several faithful gradient-based sampling algorithms to sample from the target energy-based text distribution correctly, and study their theoretical properties. Through experiments on various forms of text generation, we demonstrate that faithful samplers are able to generate more fluent text while adhering to the control objectives better.
Keyword: super-resolution

Noise-free Optimization in Early Training Steps for Image Super-Resolution
Authors: Authors: MinKyu Lee, Jae-Pil Heo
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.17526
Pdf link: https://arxiv.org/pdf/2312.17526
Abstract Recent deep-learning-based single image super-resolution (SISR) methods have shown impressive performance whereas typical methods train their networks by minimizing the pixel-wise distance with respect to a given high-resolution (HR) image. However, despite the basic training scheme being the predominant choice, its use in the context of ill-posed inverse problems has not been thoroughly investigated. In this work, we aim to provide a better comprehension of the underlying constituent by decomposing target HR images into two subcomponents: (1) the optimal centroid which is the expectation over multiple potential HR images, and (2) the inherent noise defined as the residual between the HR image and the centroid. Our findings show that the current training scheme cannot capture the ill-posed nature of SISR and becomes vulnerable to the inherent noise term, especially during early training steps. To tackle this issue, we propose a novel optimization method that can effectively remove the inherent noise term in the early steps of vanilla training by estimating the optimal centroid and directly optimizing toward the estimation. Experimental results show that the proposed method can effectively enhance the stability of vanilla training, leading to overall performance gain. Codes are available at github.com/2minkyulee/ECO.

zoq / arxiv-updates

New submissions for Mon, 1 Jan 24 #676

Keyword: sgd

Keyword: optimization

Flying By ML -- CNN Inversion of Affine Transforms

Improving Low-resource Prompt-based Relation Representation with Multi-view Decoupling Learning

Intelligent Parsing: An Automated Parsing Framework for Extracting Design Semantics from E-commerce Creatives

Optimizing watermarks for large language models

Improving Intrusion Detection with Domain-Invariant Representation Learning in Latent Space

A randomized algorithm to solve reduced rank operator regression

SANIA: Polyak-type Optimization Framework Leads to Scale Invariant Stochastic Algorithms

Beyond PID Controllers: PPO with Neuralized PID Policy for Proton Beam Intensity Control in Mu2e

Analyzing and Enhancing the Backward-Pass Convergence of Unrolled Optimization

Parameter Optimization with Conscious Allocation (POCA)

LEFL: Low Entropy Client Sampling in Federated Learning

Efficient optimization-based trajectory planning

Low Power and Temperature-Resilient Compute-In-Memory Based on Subthreshold-FeFET

Solving Decision-Dependent Games by Learning from Feedback

Noise-free Optimization in Early Training Steps for Image Super-Resolution

Informative Rays Selection for Few-Shot Neural Radiance Fields

CAD-compatible structural shape optimization with a movable Bézier tetrahedral mesh

Decision-focused predictions via pessimistic bilevel optimization: a computational study

Keyword: adam

SANIA: Polyak-type Optimization Framework Leads to Scale Invariant Stochastic Algorithms

Keyword: gradient

Generating gradients in the energy landscape using rectified linear type cost functions for efficiently solving 0/1 matrix factorization in Simulated Annealing

Gradient Flossing: Improving Gradient Descent through Dynamic Control of Jacobians

A non-intrusive neural-network based gradient algorithm for parameter estimation in non-stationary elasticity

Solving Decision-Dependent Games by Learning from Feedback

RS-DGC: Exploring Neighborhood Statistics for Dynamic Gradient Compression on Remote Sensing Image Interpretation

Towards Faithful Explanations for Text Classification with Robustness Improvement and Explanation Guided Training

Solar Radiation Prediction in the UTEQ based on Machine Learning Models

Principled Gradient-based Markov Chain Monte Carlo for Text Generation

Keyword: super-resolution

Noise-free Optimization in Early Training Steps for Image Super-Resolution