New submissions for Fri, 13 Oct 23

Keyword: sgd

When, Why and How Much? Adaptive Learning Rate Scheduling by Refinement

Authors: Aaron Defazio, Ashok Cutkosky, Harsh Mehta, Konstantin Mishchenko
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.07831
Pdf link: https://arxiv.org/pdf/2310.07831
Abstract Learning rate schedules used in practice bear little resemblance to those recommended by theory. We close much of this theory/practice gap, and as a consequence are able to derive new problem-adaptive learning rate schedules. Our key technical contribution is a refined analysis of learning rate schedules for a wide class of optimization algorithms (including SGD). In contrast to most prior works that study the convergence of the average iterate, we study the last iterate, which is what most people use in practice. When considering only worst-case analysis, our theory predicts that the best choice is the linear decay schedule: a popular choice in practice that sets the stepsize proportionally to $1 - t/T$, where $t$ is the current iteration and $T$ is the total number of steps. To go beyond this worst-case analysis, we use the observed gradient norms to derive schedules refined for any particular task. These refined schedules exhibit learning rate warm-up and rapid learning rate annealing near the end of training. Ours is the first systematic approach to automatically yield both of these properties. We perform the most comprehensive evaluation of learning rate schedules to date, evaluating across 10 diverse deep learning problems, a series of LLMs, and a suite of logistic regression problems. We validate that overall, the linear-decay schedule matches or outperforms all commonly used default schedules including cosine annealing, and that our schedule refinement method gives further improvements.
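As a minimal sketch of the linear decay schedule the abstract describes (stepsize proportional to $1 - t/T$): the function name `linear_decay_lr` and the `base_lr` parameter are illustrative assumptions, not taken from the paper, and the refined, gradient-norm-based schedules are not reproduced here.

```python
def linear_decay_lr(base_lr: float, t: int, total_steps: int) -> float:
    """Return the step size at iteration t, proportional to 1 - t/T.

    base_lr is a hypothetical peak learning rate chosen by the user;
    total_steps plays the role of T in the abstract.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if not 0 <= t <= total_steps:
        raise ValueError("t must lie in [0, total_steps]")
    return base_lr * (1.0 - t / total_steps)


# Example: step sizes for a 10-step run with a peak rate of 0.1.
schedule = [linear_decay_lr(0.1, t, 10) for t in range(10)]
```

The schedule starts at `base_lr` and falls linearly, reaching zero at `t == total_steps`.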
Differentially Private Non-convex Learning for Multi-layer Neural Networks
Authors: Hanpu Shen, Cheng-Long Wang, Zihang Xiang, Yiming Ying, Di Wang
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.08425
Pdf link: https://arxiv.org/pdf/2310.08425
Abstract This paper focuses on the problem of Differentially Private Stochastic Optimization for (multi-layer) fully connected neural networks with a single output node. In the first part, we examine cases with no hidden nodes, specifically focusing on Generalized Linear Models (GLMs). We investigate the well-specified model where the random noise possesses a zero mean, and the link function is both bounded and Lipschitz continuous. We propose several algorithms and our analysis demonstrates the feasibility of achieving an excess population risk that remains invariant to the data dimension. We also delve into the scenario involving the ReLU link function, and our findings mirror those of the bounded link function. We conclude this section by contrasting well-specified and misspecified models, using ReLU regression as a representative example. In the second part of the paper, we extend our ideas to two-layer neural networks with sigmoid or ReLU activation functions in the well-specified model. In the third part, we study the theoretical guarantees of DP-SGD in Abadi et al. (2016) for fully connected multi-layer neural networks. By utilizing recent advances in Neural Tangent Kernel theory, we provide the first excess population risk when both the sample size and the width of the network are sufficiently large. Additionally, we discuss the role of some parameters in DP-SGD regarding their utility, both theoretically and empirically.
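For readers unfamiliar with DP-SGD, the following is a rough sketch of one update in the style of Abadi et al. (2016): clip each per-example gradient to a norm bound, sum, add Gaussian noise, and average. It is not the algorithm analyzed in this paper; the function name `dp_sgd_step` and all parameter names are assumptions made for illustration, and privacy accounting is omitted.

```python
import math
import random


def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One illustrative DP-SGD update on a flat list of parameters.

    per_example_grads: list of gradients, one list of floats per example.
    clip_norm: L2 bound C applied to each per-example gradient.
    noise_multiplier: sigma; the noise standard deviation is sigma * C.
    rng: a random.Random instance (passed in so runs are reproducible).
    """
    if not per_example_grads:
        raise ValueError("need at least one per-example gradient")
    dim = len(params)
    summed = [0.0] * dim
    for grad in per_example_grads:
        norm = math.sqrt(sum(x * x for x in grad))
        # Scale down only gradients whose norm exceeds the bound.
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i in range(dim):
            summed[i] += grad[i] * scale
    std = noise_multiplier * clip_norm
    batch = len(per_example_grads)
    # Add Gaussian noise to the clipped sum, then average over the batch.
    noisy_mean = [(s + rng.gauss(0.0, std)) / batch for s in summed]
    return [p - lr * g for p, g in zip(params, noisy_mean)]


# Example with the noise switched off, so the effect of clipping is visible.
new_params = dp_sgd_step(
    [0.0, 0.0], [[3.0, 4.0], [0.3, 0.4]],
    lr=1.0, clip_norm=1.0, noise_multiplier=0.0, rng=random.Random(0),
)
```

In the example, the first gradient has norm 5 and is scaled down to norm 1, while the second (norm 0.5) passes through unchanged.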
Keyword: optimization

Optimizing the concentration ratio of multi-faceted focusing heliostats
Authors: F. Henault
Subjects: Systems and Control (eess.SY); Optics (physics.optics)
Arxiv link: https://arxiv.org/abs/2310.07721
Pdf link: https://arxiv.org/pdf/2310.07721
Abstract This technical note aims at optimizing the concentration ratio of multi-faceted focusing heliostats implemented into a solar tower power plant. The ideal shape of a heliostat located off-axis in the field is known to be the local section of a fictitious paraboloid whose parameters are varying continuously with the Sun angular position. We describe an optimization procedure applicable to those heliostats. The flux densities formed at the solar receiver and the achievable concentrating ratios are computed using an improved convolution algorithm. It is shown that the optimized heliostat shape can produce typical concentration gains of approximately 10%, even when the heliostats reflect the Sun under large incidence angles.
Equitable and Fair Performance Evaluation of Whale Optimization Algorithm
Authors: Bryar A. Hassan, Tarik A. Rashid, Aram Ahmed, Shko M. Qader, Jaffer Majidpour, Mohmad Hussein Abdalla, Noor Tayfor, Hozan K. Hamarashid, Haval Sidqi, Kaniaw A. Noori
Subjects: Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2310.07723
Pdf link: https://arxiv.org/pdf/2310.07723
Abstract It is essential that all algorithms are exhaustively, somewhat, and intelligently evaluated. Nonetheless, evaluating the effectiveness of optimization algorithms equitably and fairly is not an easy process for various reasons. Choosing and initializing essential parameters, such as the size issues of the search area for each method and the number of iterations required to reduce the issues, might be particularly challenging. As a result, this chapter aims to contrast the Whale Optimization Algorithm (WOA) with the most recent algorithms on a selected set of benchmark problems with varying benchmark function hardness scores and initial control parameters comparable problem dimensions and search space. When solving a wide range of numerical optimization problems with varying difficulty scores, dimensions, and search areas, the experimental findings suggest that WOA may be statistically superior or inferior to the preceding algorithms referencing convergence speed, running time, and memory utilization.
Body-mounted MR-conditional Robot for Minimally Invasive Liver Intervention
Authors: Zhefeng Huang, Anthony L. Gunderman, Samuel E. Wilcox, Saikat Sengupta, Aiming Lu, David Woodrum, Jay Shah, Yue Chen
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2310.07822
Pdf link: https://arxiv.org/pdf/2310.07822
Abstract MR-guided microwave ablation (MWA) has proven effective in treating hepatocellular carcinoma (HCC) with small-sized tumors, but the state-of-the-art technique suffers from sub-optimal workflow due to speed and accuracy of needle placement. This paper presents a compact body-mounted MR-conditional robot that can operate in closed-bore MR scanners for accurate needle guidance. The robotic platform consists of two stacked Cartesian XY stages, each with two degrees of freedom, that facilitate needle guidance. The robot is actuated using 3D-printed pneumatic turbines with MR-conditional bevel gear transmission systems. Pneumatic valves and control mechatronics are located inside the MRI control room and are connected to the robot with pneumatic transmission lines and optical fibers. Free space experiments indicated robot-assisted needle insertion error of 2.6$\pm$1.3 mm at an insertion depth of 80 mm. The MR-guided phantom studies were conducted to verify the MR-conditionality and targeting performance of the robot. Future work will focus on the system optimization and validations in animal trials.
When, Why and How Much? Adaptive Learning Rate Scheduling by Refinement
Authors: Aaron Defazio, Ashok Cutkosky, Harsh Mehta, Konstantin Mishchenko
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.07831
Pdf link: https://arxiv.org/pdf/2310.07831
Abstract Learning rate schedules used in practice bear little resemblance to those recommended by theory. We close much of this theory/practice gap, and as a consequence are able to derive new problem-adaptive learning rate schedules. Our key technical contribution is a refined analysis of learning rate schedules for a wide class of optimization algorithms (including SGD). In contrast to most prior works that study the convergence of the average iterate, we study the last iterate, which is what most people use in practice. When considering only worst-case analysis, our theory predicts that the best choice is the linear decay schedule: a popular choice in practice that sets the stepsize proportionally to $1 - t/T$, where $t$ is the current iteration and $T$ is the total number of steps. To go beyond this worst-case analysis, we use the observed gradient norms to derive schedules refined for any particular task. These refined schedules exhibit learning rate warm-up and rapid learning rate annealing near the end of training. Ours is the first systematic approach to automatically yield both of these properties. We perform the most comprehensive evaluation of learning rate schedules to date, evaluating across 10 diverse deep learning problems, a series of LLMs, and a suite of logistic regression problems. We validate that overall, the linear-decay schedule matches or outperforms all commonly used default schedules including cosine annealing, and that our schedule refinement method gives further improvements.
DAG-aware Synthesis Orchestration
Authors: Yingjie Li, Mingju Liu, Mark Ren, Alan Mishchenko, Cunxi Yu
Subjects: Hardware Architecture (cs.AR)
Arxiv link: https://arxiv.org/abs/2310.07846
Pdf link: https://arxiv.org/pdf/2310.07846
Abstract The key methodologies of modern logic synthesis techniques are conducted on multi-level technology-independent representations such as And-Inverter-Graphs (AIGs) of the digital logic via directed-acyclic-graph (DAG) traversal-based structural rewriting, resubstitution, and refactoring. Existing state-of-the-art DAG-aware logic synthesis algorithms are all designed to perform stand-alone optimizations during a single DAG traversal. However, we empirically identify and demonstrate that these algorithms are limited in quality-of-results and runtime complexity due to this design concept. This work proposes Synthesis Orchestration, which orchestrates stand-alone operations within the single traversal of AIG. Thus, the orchestration method explores more optimization opportunities and results in better performance. Our experimental results are comprehensively conducted on all 104 designs collected from ISCAS'85/89/99, VTR, and EPFL benchmark suites, with consistent logic minimization improvements over rewriting, resubstitution, refactoring, leading to an average of 4% more node reduction with improved runtime efficiency for the single optimization. Moreover, we evaluate orchestration as a plug-in algorithm in resyn and resyn3 flows in ABC, which demonstrates consistent logic minimization improvements (3.8% and 10.9% more node reduction on average). The runtime analysis demonstrates that the orchestration outperforms stand-alone algorithms in both AIG minimization and runtime efficiency. Finally, we integrate the orchestration into OpenROAD for end-to-end performance evaluation. Our results demonstrate the advantages of the orchestration optimization technique, even after technology mapping and post-routing in the design flow have been conducted.
VaPr: Variable-Precision Tensors to Accelerate Robot Motion Planning
Authors: Yu-Shun Hsiao, Siva Kumar Sastry Hari, Balakumar Sundaralingam, Jason Yik, Thierry Tambe, Charbel Sakr, Stephen W. Keckler, Vijay Janapa Reddi
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2310.07854
Pdf link: https://arxiv.org/pdf/2310.07854
Abstract High-dimensional motion generation requires numerical precision for smooth, collision-free solutions. Typically, double-precision or single-precision floating-point (FP) formats are utilized. Using these for big tensors imposes a strain on the memory bandwidth provided by the devices and alters the memory footprint, hence limiting their applicability to low-power edge devices needed for mobile robots. The uniform application of reduced precision can be advantageous but severely degrades solutions. Using decreased precision data types for important tensors, we propose to accelerate motion generation by removing memory bottlenecks. We propose variable-precision (VaPr) search optimization to determine the appropriate precision for large tensors from a vast search space of approximately 4 million unique combinations for FP data types across the tensors. To obtain the efficiency gains, we exploit existing platform support for an out-of-the-box GPU speedup and evaluate prospective precision converter units for GPU types that are not currently supported. Our experimental results on 800 planning problems for the Franka Panda robot on the MotionBenchmaker dataset across 8 environments show that a 4-bit FP format is sufficient for the largest set of tensors in the motion generation stack. With the software-only solution, VaPr achieves 6.3% and 6.3% speedups on average for a significant portion of motion generation over the SOTA solution (CuRobo) on Jetson Orin and RTX2080 Ti GPU, respectively, and 9.9%, 17.7% speedups with the FP converter.
DeePref: Deep Reinforcement Learning For Video Prefetching In Content Delivery Networks
Authors: Nawras Alkassab, Chin-Tser Huang, Tania Lorido Botran
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.07881
Pdf link: https://arxiv.org/pdf/2310.07881
Abstract Content Delivery Networks carry the majority of Internet traffic, and the increasing demand for video content as a major share of IP traffic across the Internet highlights the importance of caching and prefetching optimization algorithms. Prefetching aims to make data available in the cache before the requester places its request to reduce access time and improve the Quality of Experience on the user side. Prefetching is well investigated in operating systems, compiler instructions, in-memory cache, local storage systems, high-speed networks, and cloud systems. Traditional prefetching techniques are well adapted to a particular access pattern, but fail to adapt to sudden variations or randomization in workloads. This paper explores the use of reinforcement learning to tackle the changes in user access patterns and automatically adapt over time. To this end, we propose DeePref, a Deep Reinforcement Learning agent for online video content prefetching in Content Delivery Networks. DeePref is a prefetcher implemented on edge networks and is agnostic to hardware design, operating systems, and applications. Our results show that DeePref DRQN, using a real-world dataset, achieves a 17% increase in prefetching accuracy and a 28% increase in prefetching coverage on average compared to baseline approaches that use video content popularity as a building block to statically or dynamically make prefetching decisions. We also study the possibility of transfer learning of statistical models from one edge network into another, where unseen user requests from an unknown distribution are observed. In terms of transfer learning, the increases in prefetching accuracy and prefetching coverage are 30% and 10%, respectively. Our source code will be available on GitHub.
Cut-Cell Microstructures for Two-scale Structural Optimization
Authors: Davi Colli Tozoni, Zizhou Huang, Daniele Panozzo, Denis Zorin
Subjects: Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2310.07890
Pdf link: https://arxiv.org/pdf/2310.07890
Abstract Two-scale topology optimization, combined with the design of microstructure families with a broad range of effective material parameters, is increasingly widely used in many fabrication applications to achieve a target deformation behavior for a variety of objects. The main idea of this approach is to optimize the distribution of material properties in the object partitioned into relatively coarse cells, and then replace each cell with microstructure geometry that mimics these material properties. In this paper, we focus on adapting this approach to complex shapes in situations when preserving the shape's surface is important. Our approach extends any regular (i.e. defined on a regular lattice grid) microstructure family to complex shapes, by enriching it with individually optimized cut-cell tiles adapted to the geometry of the cut-cell. We propose an automated and robust pipeline based on this approach, and we show that the performance of the regular microstructure family is only minimally affected by our extension while allowing its use on 2D and 3D shapes of high complexity.
Singular Perturbation via Contraction Theory
Authors: Liliaokeawawa Cothren, Francesco Bullo, Emiliano Dall'Anese
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.07966
Pdf link: https://arxiv.org/pdf/2310.07966
Abstract In this paper, we provide a novel contraction-theoretic approach to analyze two-time scale systems. In our proposed framework, systems enjoy several robustness properties, which can lead to a more complete characterization of their behaviors. Key assumptions are the contractivity of the fast sub-system and of the reduced model, combined with an explicit upper bound on the time-scale parameter. For two-time scale systems subject to disturbances, we show that the distance between solutions of the nominal system and solutions of its reduced model is uniformly upper bounded by a function of contraction rates, Lipschitz constants, the time-scale parameter, and the time variability of the disturbances. We also show local contractivity of the two-time scale system and give sufficient conditions for global contractivity. We then consider two special cases: for autonomous nonlinear systems we obtain sharper bounds than our general results and for linear time-invariant systems we present novel bounds based upon log norms and induced norms. Finally, we apply our theory to two application areas -- online feedback optimization and Stackelberg games -- and obtain new individual tracking error bounds showing that solutions converge to their (time-varying) optimizer and computing overall contraction rates.
Hyperparameter Adaptive Search for Surrogate Optimization: A Self-Adjusting Approach
Authors: Nazanin Nezami, Hadis Anahideh
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Probability (math.PR); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.07970
Pdf link: https://arxiv.org/pdf/2310.07970
Abstract Surrogate Optimization (SO) algorithms have shown promise for optimizing expensive black-box functions. However, their performance is heavily influenced by hyperparameters related to sampling and surrogate fitting, which poses a challenge to their widespread adoption. We investigate the impact of hyperparameters on various SO algorithms and propose a Hyperparameter Adaptive Search for SO (HASSO) approach. HASSO is not a hyperparameter tuning algorithm, but a generic self-adjusting SO algorithm that dynamically tunes its own hyperparameters while concurrently optimizing the primary objective function, without requiring additional evaluations. The aim is to improve the accessibility, effectiveness, and convergence speed of SO algorithms for practitioners. Our approach identifies and modifies the most influential hyperparameters specific to each problem and SO approach, reducing the need for manual tuning without significantly increasing the computational burden. Experimental results demonstrate the effectiveness of HASSO in enhancing the performance of various SO algorithms across different global optimization test problems.
Graph-SCP: Accelerating Set Cover Problems with Graph Neural Networks
Authors: Zohair Shafi, Benjamin A. Miller, Tina Eliassi-Rad, Rajmonda S. Caceres
Subjects: Machine Learning (cs.LG); Discrete Mathematics (cs.DM)
Arxiv link: https://arxiv.org/abs/2310.07979
Pdf link: https://arxiv.org/pdf/2310.07979
Abstract Machine learning (ML) approaches are increasingly being used to accelerate combinatorial optimization (CO) problems. We look specifically at the Set Cover Problem (SCP) and propose Graph-SCP, a graph neural network method that can augment existing optimization solvers by learning to identify a much smaller sub-problem that contains the solution space. We evaluate the performance of Graph-SCP on synthetic weighted and unweighted SCP instances with diverse problem characteristics and complexities, and on instances from the OR Library, a canonical benchmark for SCP. We show that Graph-SCP reduces the problem size by 30-70% and achieves run time speedups up to 25x when compared to commercial solvers (Gurobi). Given a desired optimality threshold, Graph-SCP will improve upon it or even achieve 100% optimality. This is in contrast to fast greedy solutions that significantly compromise solution quality to achieve guaranteed polynomial run time. Graph-SCP can generalize to larger problem sizes and can be used with other conventional or ML-augmented CO solvers to lead to potential additional run time improvement.
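For context on the "fast greedy solutions" the abstract contrasts against, here is a minimal sketch of the textbook greedy heuristic for unweighted set cover. It is a baseline illustration only, not Graph-SCP; the function name `greedy_set_cover` and the toy instance are assumptions.

```python
def greedy_set_cover(universe, subsets):
    """Textbook greedy heuristic for unweighted set cover.

    Repeatedly picks the subset covering the most still-uncovered elements.
    Returns the indices of the chosen subsets, in the order they were picked.
    """
    subsets = [set(s) for s in subsets]
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Index of the subset with the largest overlap with what is left.
        best = max(range(len(subsets)), key=lambda i: len(uncovered & subsets[i]))
        gain = uncovered & subsets[best]
        if not gain:
            raise ValueError("subsets do not cover the universe")
        chosen.append(best)
        uncovered -= gain
    return chosen


# Toy instance (hypothetical data, for illustration only).
cover = greedy_set_cover({1, 2, 3, 4, 5}, [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}])
```

The heuristic runs in polynomial time but, as the abstract notes for greedy approaches in general, it can return covers larger than the optimum.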
GRASP: Accelerating Shortest Path Attacks via Graph Attention
Authors: Zohair Shafi, Benjamin A. Miller, Ayan Chatterjee, Tina Eliassi-Rad, Rajmonda S. Caceres
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.07980
Pdf link: https://arxiv.org/pdf/2310.07980
Abstract Recent advances in machine learning (ML) have shown promise in aiding and accelerating classical combinatorial optimization algorithms. ML-based speed ups that aim to learn in an end to end manner (i.e., directly output the solution) tend to trade off run time with solution quality. Therefore, solutions that are able to accelerate existing solvers while maintaining their performance guarantees, are of great interest. We consider an APX-hard problem, where an adversary aims to attack shortest paths in a graph by removing the minimum number of edges. We propose the GRASP algorithm: Graph Attention Accelerated Shortest Path Attack, an ML aided optimization algorithm that achieves run times up to 10x faster, while maintaining the quality of solution generated. GRASP uses a graph attention network to identify a smaller subgraph containing the combinatorial solution, thus effectively reducing the input problem size. Additionally, we demonstrate how careful representation of the input graph, including node features that correlate well with the optimization task, can highlight important structure in the optimization solution.
Reinforcement Learning of Display Transfer Robots in Glass Flow Control Systems: A Physical Simulation-Based Approach
Authors: Hwajong Lee, Chan Kim, Seong-Woo Kim
Subjects: Machine Learning (cs.LG); Robotics (cs.RO); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.07981
Pdf link: https://arxiv.org/pdf/2310.07981
Abstract A flow control system is a critical concept for increasing the production capacity of manufacturing systems. To solve the scheduling optimization problem related to the flow control with the aim of improving productivity, existing methods depend on a heuristic design by domain human experts. Therefore, the methods require correction, monitoring, and verification by using real equipment. As system designs increase in complexity, the monitoring time increases, which decreases the probability of arriving at the optimal design. As an alternative approach to the heuristic design of flow control systems, the use of deep reinforcement learning to solve the scheduling optimization problem has been considered. Although the existing research on reinforcement learning has yielded excellent performance in some areas, the applicability of the results to actual FAB such as display and semiconductor manufacturing processes is not evident so far. To this end, we propose a method to implement a physical simulation environment and devise a feasible flow control system design using a transfer robot in display manufacturing through reinforcement learning. We present a model and parameter setting to build a virtual environment for different display transfer robots, and training methods of reinforcement learning on the environment to obtain an optimal scheduling of glass flow control systems. Its feasibility was verified by using different types of robots used in the actual process.
RandCom: Random Communication Skipping Method for Decentralized Stochastic Optimization
Authors: Luyao Guo, Sulaiman A. Alghunaim, Kun Yuan, Laurent Condat, Jinde Cao
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.07983
Pdf link: https://arxiv.org/pdf/2310.07983
Abstract Distributed optimization methods with random communication skips are gaining increasing attention due to their proven benefits in accelerating communication complexity. Nevertheless, existing research mainly focuses on centralized communication protocols for strongly convex deterministic settings. In this work, we provide a decentralized optimization method called RandCom, which incorporates probabilistic local updates. We analyze the performance of RandCom in stochastic non-convex, convex, and strongly convex settings and demonstrate its ability to asymptotically reduce communication overhead by the probability of communication. Additionally, we prove that RandCom achieves linear speedup as the number of nodes increases. In stochastic strongly convex settings, we further prove that RandCom can achieve linear speedup with network-independent stepsizes. Moreover, we apply RandCom to federated learning and provide positive results concerning the potential for achieving linear speedup and the suitability of the probabilistic local update approach for non-convex settings.
Neural Combinatorial Optimization with Heavy Decoder: Toward Large Scale Generalization
Authors: Fu Luo, Xi Lin, Fei Liu, Qingfu Zhang, Zhenkun Wang
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2310.07985
Pdf link: https://arxiv.org/pdf/2310.07985
Abstract Neural combinatorial optimization (NCO) is a promising learning-based approach for solving challenging combinatorial optimization problems without specialized algorithm design by experts. However, most constructive NCO methods cannot solve problems with large-scale instance sizes, which significantly diminishes their usefulness for real-world applications. In this work, we propose a novel Light Encoder and Heavy Decoder (LEHD) model with a strong generalization ability to address this critical issue. The LEHD model can learn to dynamically capture the relationships between all available nodes of varying sizes, which is beneficial for model generalization to problems of various scales. Moreover, we develop a data-efficient training scheme and a flexible solution construction mechanism for the proposed LEHD model. By training on small-scale problem instances, the LEHD model can generate nearly optimal solutions for the Travelling Salesman Problem (TSP) and the Capacitated Vehicle Routing Problem (CVRP) with up to 1000 nodes, and also generalizes well to solve real-world TSPLib and CVRPLib problems. These results confirm our proposed LEHD model can significantly improve the state-of-the-art performance for constructive NCO. The code is available at https://github.com/CIAM-Group/NCO_code/tree/main/single_objective/LEHD.
AutoFHE: Automated Adaption of CNNs for Efficient Evaluation over FHE
Authors: Wei Ao, Vishnu Naresh Boddeti
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2310.08012
Pdf link: https://arxiv.org/pdf/2310.08012
Abstract Secure inference of deep convolutional neural networks (CNNs) under RNS-CKKS involves polynomial approximation of unsupported non-linear activation functions. However, existing approaches have three main limitations: 1) Inflexibility: The polynomial approximation and associated homomorphic evaluation architecture are customized manually for each CNN architecture and do not generalize to other networks. 2) Suboptimal Approximation: Each activation function is approximated instead of the function represented by the CNN. 3) Restricted Design: Either high-degree or low-degree polynomial approximations are used. The former retains high accuracy but slows down inference due to bootstrapping operations, while the latter accelerates ciphertext inference but compromises accuracy. To address these limitations, we present AutoFHE, which automatically adapts standard CNNs for secure inference under RNS-CKKS. The key idea is to adopt layerwise mixed-degree polynomial activation functions, which are optimized jointly with the homomorphic evaluation architecture in terms of the placement of bootstrapping operations. The problem is modeled within a multi-objective optimization framework to maximize accuracy and minimize the number of bootstrapping operations. AutoFHE can be applied flexibly on any CNN architecture, and it provides diverse solutions that span the trade-off between accuracy and latency. Experimental evaluation over RNS-CKKS encrypted CIFAR datasets shows that AutoFHE accelerates secure inference by $1.32\times$ to $1.8\times$ compared to methods employing high-degree polynomials. It also improves accuracy by up to 2.56% compared to methods using low-degree polynomials. Lastly, AutoFHE accelerates inference and improves accuracy by $103\times$ and 3.46%, respectively, compared to CNNs under TFHE.
Model Predictive Inferential Control of Neural State-Space Models for Autonomous Vehicle Motion Planning
Authors: Iman Askari, Xumein Tu, Shen Zeng, Huazhen Fang
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.08045
Pdf link: https://arxiv.org/pdf/2310.08045
Abstract Model predictive control (MPC) has proven useful in enabling safe and optimal motion planning for autonomous vehicles. In this paper, we investigate how to achieve MPC-based motion planning when a neural state-space model represents the vehicle dynamics. As the neural state-space model will lead to highly complex, nonlinear and nonconvex optimization landscapes, mainstream gradient-based MPC methods will be computationally too heavy to be a viable solution. In a departure, we propose the idea of model predictive inferential control (MPIC), which seeks to infer the best control decisions from the control objectives and constraints. Following the idea, we convert the MPC problem for motion planning into a Bayesian state estimation problem. Then, we develop a new particle filtering/smoothing approach to perform the estimation. This approach is implemented as banks of unscented Kalman filters/smoothers and offers high sampling efficiency, fast computation, and estimation accuracy. We evaluate the MPIC approach through a simulation study of autonomous driving in different scenarios, along with an exhaustive comparison with gradient-based MPC. The results show that the MPIC approach has considerable computational efficiency, regardless of complex neural network architectures, and shows the capability to solve large-scale MPC problems for neural state-space models.
Learning Regularized Monotone Graphon Mean-Field Games
Authors: Fengzhuo Zhang, Vincent Y. F. Tan, Zhaoran Wang, Zhuoran Yang
Subjects: Computer Science and Game Theory (cs.GT); Systems and Control (eess.SY); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.08089
Pdf link: https://arxiv.org/pdf/2310.08089
Abstract This paper studies two fundamental problems in regularized Graphon Mean-Field Games (GMFGs). First, we establish the existence of a Nash Equilibrium (NE) of any $\lambda$-regularized GMFG (for $\lambda\geq 0$). This result relies on weaker conditions than those in previous works for analyzing both unregularized GMFGs ($\lambda=0$) and $\lambda$-regularized MFGs, which are special cases of GMFGs. Second, we propose provably efficient algorithms to learn the NE in weakly monotone GMFGs, motivated by Lasry and Lions [2007]. Previous literature either only analyzed continuous-time algorithms or required extra conditions to analyze discrete-time algorithms. In contrast, we design a discrete-time algorithm and derive its convergence rate solely under weakly monotone conditions. Furthermore, we develop and analyze the action-value function estimation procedure during the online learning process, which is absent from algorithms for monotone GMFGs. This serves as a sub-module in our optimization algorithm. The efficiency of the designed algorithm is corroborated by empirical evaluations.
Generalized Logit Adjustment: Calibrating Fine-tuned Models by Removing Label Bias in Foundation Models
Authors: Beier Zhu, Kaihua Tang, Qianru Sun, Hanwang Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.08106
Pdf link: https://arxiv.org/pdf/2310.08106
Abstract Foundation models like CLIP allow zero-shot transfer on various tasks without additional training data. Yet, the zero-shot performance is less competitive than a fully supervised one. Thus, to enhance the performance, fine-tuning and ensembling are also commonly adopted to better fit the downstream tasks. However, we argue that such prior work has overlooked the inherent biases in foundation models. Due to the highly imbalanced Web-scale training set, these foundation models are inevitably skewed toward frequent semantics, and thus the subsequent fine-tuning or ensembling is still biased. In this study, we systematically examine the biases in foundation models and demonstrate the efficacy of our proposed Generalized Logit Adjustment (GLA) method. Note that bias estimation in foundation models is challenging, as most pre-training data cannot be explicitly accessed as in traditional long-tailed classification tasks. To this end, GLA has an optimization-based bias estimation approach for debiasing foundation models. As our work resolves a fundamental flaw in the pre-training, the proposed GLA demonstrates significant improvements across a diverse range of tasks: it achieves 1.5 pp accuracy gains on ImageNet, a large average improvement (1.4-4.6 pp) on 11 few-shot datasets, and 2.4 pp gains on long-tailed classification. Code is available at https://github.com/BeierZhu/GLA.
Boosting Client Selection of Federated Learning under Device and Data Heterogeneity
Authors: Shuaijun Chen, Omid Tavallaie, Michael Henri Hambali, Seid Miad Zandavi, Hamed Haddadi, Song Guo, Albert Y. Zomaya
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2310.08147
Pdf link: https://arxiv.org/pdf/2310.08147
Abstract Federated learning (FL) is a promising distributed learning framework designed for privacy-aware applications of resource-constrained devices. Without sharing data, FL trains a model on each device locally and builds the global model on the server by aggregating the trained models. To reduce the communication overhead, only a portion of client devices participate in each round of training. Random selection is the most common way of selecting client devices for training data in a round of FL. However, random client selection uses distributed data and computational resources inefficiently, as it does not take into account the hardware specifications and data distribution among clients. This paper proposes FedGRA, an adaptive fair client selection algorithm designed for FL applications with unbalanced, non-independent and identically distributed (non-IID) data running on client devices with heterogeneous computing resources. FedGRA dynamically adjusts the set of selected clients at each round of training based on clients' trained models and their available computational resources. To find an optimal solution, we model the client selection problem of FL as a multi-objective optimization by using Grey Relational Analysis (GRA) theory. To examine the performance of our proposed method, we implement our contribution on Amazon Web Services (AWS) by using 50 Elastic Compute Cloud (EC2) instances with 4 different hardware configurations. The evaluation results reveal that our contribution improves convergence significantly and reduces the average client's waiting time compared to state-of-the-art methods.
MCRepair: Multi-Chunk Program Repair via Patch Optimization with Buggy Block
Authors: Jisung Kim, Byeongjung Lee
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2310.08157
Pdf link: https://arxiv.org/pdf/2310.08157
Abstract Automated program repair (APR) is a technology that identifies and repairs bugs automatically. However, repairing multi-chunk bugs remains a long-standing and challenging problem because an APR technique must consider dependencies and then reduce the large patch space. In addition, little is known about how to combine individual candidate patches even though multi-chunk bugs require combinations. Therefore, we propose a novel APR technique called multi-code repair (MCRepair), which applies a buggy block, patch optimization, and CodeBERT to target multi-chunk bugs. A buggy block is a novel method that binds buggy chunks into a multi-buggy chunk and preprocesses the chunk with its buggy contexts for patch space reduction and dependency problems. Patch optimization is a novel strategy that effectively combines the generated candidate patches with patch space reduction. In addition, CodeBERT, a BERT for source code datasets, is fine-tuned to address the lack of datasets and out-of-vocabulary problems. We conducted several experiments to evaluate our approach on six project modules of Defects4J. In the experiments using Defects4J, MCRepair repaired 65 bugs, including 21 multi-chunk bugs. Moreover, it fixed 18 unique bugs, including eight multi-chunk bugs, and improved performance by 40 to 250 percent over the baselines.
Ziya-VL: Bilingual Large Vision-Language Model via Multi-Task Instruction Tuning
Authors: Junyu Lu, Dixiang Zhang, Xiaojun Wu, Xinyu Gao, Ruyi Gan, Jiaxing Zhang, Yan Song, Pingjian Zhang
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2310.08166
Pdf link: https://arxiv.org/pdf/2310.08166
Abstract Recent advancements enlarge the capabilities of large language models (LLMs) in zero-shot image-to-text generation and understanding by integrating multi-modal inputs. However, such success is typically limited to English scenarios due to the lack of large-scale and high-quality non-English multi-modal resources, making it extremely difficult to establish competitive counterparts in other languages. In this paper, we introduce the Ziya-VL series, a set of bilingual large-scale vision-language models (LVLMs) designed to incorporate visual semantics into LLMs for multi-modal dialogue. Composed of Ziya-VL-Base and Ziya-VL-Chat, our models adopt the Querying Transformer from BLIP-2, further exploring the assistance of optimization schemes such as instruction tuning, multi-stage training and a low-rank adaptation module for visual-language alignment. In addition, we stimulate the understanding ability of GPT-4 in multi-modal scenarios, translating our gathered English image-text datasets into Chinese and generating instruction-response pairs through the in-context learning method. The experiment results demonstrate that compared to the existing LVLMs, Ziya-VL achieves competitive performance across a wide range of English-only tasks including zero-shot image-text retrieval, image captioning, and visual question answering. The evaluation leaderboard assessed by GPT-4 also indicates that our models possess satisfactory image-text understanding and generation capabilities in Chinese multi-modal scenario dialogues. Code, demo and models are available at https://huggingface.co/IDEA-CCNL/Ziya-BLIP2-14B-Visual-v1.
Improving Fast Minimum-Norm Attacks with Hyperparameter Optimization
Authors: Giuseppe Floris, Raffaele Mura, Luca Scionis, Giorgio Piras, Maura Pintor, Ambra Demontis, Battista Biggio
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.08177
Pdf link: https://arxiv.org/pdf/2310.08177
Abstract Evaluating the adversarial robustness of machine learning models using gradient-based attacks is challenging. In this work, we show that hyperparameter optimization can improve fast minimum-norm attacks by automating the selection of the loss function, the optimizer and the step-size scheduler, along with the corresponding hyperparameters. Our extensive evaluation involving several robust models demonstrates the improved efficacy of fast minimum-norm attacks when hyped up with hyperparameter optimization. We release our open-source code at https://github.com/pralab/HO-FMN.
Beyond Traditional DoE: Deep Reinforcement Learning for Optimizing Experiments in Model Identification of Battery Dynamics
Authors: Gokhan Budan, Francesca Damiani, Can Kurtulus, N. Kemal Ure
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.08198
Pdf link: https://arxiv.org/pdf/2310.08198
Abstract Model identification of battery dynamics is a central problem in energy research; many energy management systems and design processes rely on accurate battery models for efficiency optimization. The standard methodology for battery modelling is traditional design of experiments (DoE), where the battery dynamics are excited with many different current profiles and the measured outputs are used to estimate the system dynamics. However, although it is possible to obtain useful models with the traditional approach, the process is time-consuming and expensive because of the need to sweep many different current-profile configurations. In the present work, a novel DoE approach is developed based on deep reinforcement learning, which alters the configuration of the experiments on the fly based on the statistics of past experiments. Instead of sticking to a library of predefined current profiles, the proposed approach modifies the current profiles dynamically by updating the output space covered by past measurements, hence only the current profiles that are informative for future experiments are applied. Simulations and real experiments are used to show that the proposed approach gives models that are as accurate as those obtained with traditional DoE while using 85% fewer resources.
A Universal Scheme for Partitioned Dynamic Shortest Path Index
Authors: Mengxuan Zhang, Xinjie Zhou, Lei Li, Ziyi Liu, Goce Trajcevski, Yan Huang, Xiaofang Zhou
Subjects: Databases (cs.DB)
Arxiv link: https://arxiv.org/abs/2310.08213
Pdf link: https://arxiv.org/pdf/2310.08213
Abstract Graph partitioning is a common solution for scaling up graph algorithms, and shortest path (SP) computation is one of them. However, the existing solutions typically have a fixed partition method with a fixed path index and fixed partition structure, so it is unclear how the partition method and path index influence the pathfinding performance. Moreover, few studies have explored the index maintenance of partitioned SP (PSP) on dynamic graphs. To provide a deeper insight into the dynamic PSP indexes, we systematically deliberate on the existing works and propose a universal scheme to analyze this problem theoretically. Specifically, we first propose two novel partitioned index strategies and one optimization to improve index construction, query answering, or index maintenance of the PSP index. Then we propose path-oriented graph partitioning classification criteria for easier partition method selection. After that, we re-couple the dimensions in our scheme (partitioned index strategy, path index, and partition structure) to propose five new partitioned SP indexes that are more efficient either in the query or update on different networks. Finally, we demonstrate the effectiveness of our new indexes by comparing them with state-of-the-art PSP indexes through comprehensive evaluations.
Collaborative Precoding Design for Adjacent Integrated Sensing and Communication Base Stations
Authors: Wangjun Jiang, Zhiqing Wei, Fan Liu, Zhiyong Feng, Ping Zhang
Subjects: Performance (cs.PF)
Arxiv link: https://arxiv.org/abs/2310.08246
Pdf link: https://arxiv.org/pdf/2310.08246
Abstract Integrated sensing and communication (ISAC) base stations can provide communication and wide-range sensing information for vehicles via downlink (DL) transmission, thus enhancing vehicle driving safety. One major challenge for realizing high-performance communication and sensing is how to deal with the DL mutual interference among adjacent ISAC base stations, which includes not only communication-related interference, but also radar-sensing-related interference. In this paper, we establish a DL mutual interference model of adjacent ISAC base stations, and analyze the relationship for mutual interference channels between communications and radar sensing. To improve the sensing and communication performance, we propose a collaborative precoding design for coordinated adjacent base stations to mitigate the mutual interference under the transmit power constraint and constant modulus constraint, which is formulated as a non-convex optimization problem. We first relax the problem into a convex program by omitting the rank constraint, and propose a joint optimization algorithm to solve the problem. We furthermore propose a sequential optimization algorithm, which divides the collaborative precoding design problem into four subproblems and finds the optimum via a gradient descent algorithm. Finally, we evaluate the collaborative precoding design algorithms by considering sensing and communication performance via numerical results.
MetaBox: A Benchmark Platform for Meta-Black-Box Optimization with Reinforcement Learning
Authors: Zeyuan Ma, Hongshu Guo, Jiacheng Chen, Zhenrui Li, Guojun Peng, Yue-Jiao Gong, Yining Ma, Zhiguang Cao
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2310.08252
Pdf link: https://arxiv.org/pdf/2310.08252
Abstract Recently, Meta-Black-Box Optimization with Reinforcement Learning (MetaBBO-RL) has showcased the power of leveraging RL at the meta-level to mitigate manual fine-tuning of low-level black-box optimizers. However, this field is hindered by the lack of a unified benchmark. To fill this gap, we introduce MetaBox, the first benchmark platform expressly tailored for developing and evaluating MetaBBO-RL methods. MetaBox offers a flexible algorithmic template that allows users to effortlessly implement their unique designs within the platform. Moreover, it provides a broad spectrum of over 300 problem instances, collected from synthetic to realistic scenarios, and an extensive library of 19 baseline methods, including both traditional black-box optimizers and recent MetaBBO-RL methods. Besides, MetaBox introduces three standardized performance metrics, enabling a more thorough assessment of the methods. In a bid to illustrate the utility of MetaBox for facilitating rigorous evaluation and in-depth analysis, we carry out a wide-ranging benchmarking study on existing MetaBBO-RL methods. Our MetaBox is open-source and accessible at: https://github.com/GMC-DRL/MetaBox.
Hilbert Space Embedding-based Trajectory Optimization for Multi-Modal Uncertain Obstacle Trajectory Prediction
Authors: Basant Sharma, Aditya Sharma, K. Madhava Krishna, Arun Kumar Singh
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2310.08270
Pdf link: https://arxiv.org/pdf/2310.08270
Abstract Safe autonomous driving critically depends on how well the ego-vehicle can predict the trajectories of neighboring vehicles. To this end, several trajectory prediction algorithms have been presented in the existing literature. Many of these approaches output a multi-modal distribution of obstacle trajectories instead of a single deterministic prediction to account for the underlying uncertainty. However, existing planners cannot handle the multi-modality based on just sample-level information of the predictions. With this motivation, this paper proposes a trajectory optimizer that can leverage the distributional aspects of the prediction in a computationally tractable and sample-efficient manner. Our optimizer can work with arbitrarily complex distributions and thus can be used with output distribution represented as a deep neural network. The core of our approach is built on embedding distribution in Reproducing Kernel Hilbert Space (RKHS), which we leverage in two ways. First, we propose an RKHS embedding approach to select probable samples from the obstacle trajectory distribution. Second, we rephrase chance-constrained optimization as distribution matching in RKHS and propose a novel sampling-based optimizer for its solution. We validate our approach with hand-crafted and neural network-based predictors trained on real-world datasets and show improvement over the existing stochastic optimization approaches in safety metrics.
Maximization of minimum rate in MIMO OFDM RIS-assisted Broadcast Channels
Authors: Mohammad Soleymani, Ignacio Santamaria, Aydin Sezgin, Eduard Jorswieck
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2310.08289
Pdf link: https://arxiv.org/pdf/2310.08289
Abstract Reconfigurable intelligent surface (RIS) is a promising technology to enhance the spectral efficiency of wireless communication systems. By optimizing the RIS elements, the performance of the overall system can be improved. Yet, in contrast to single-carrier systems, in multi-carrier systems, it is not possible to independently optimize RIS elements at each sub-carrier, which may reduce the benefits of RIS in multi-user orthogonal frequency division multiplexing (OFDM) systems. To this end, we investigate the effectiveness of RIS in multiple-input, multiple-output (MIMO) OFDM broadcast channels (BC). We formulate and solve a joint precoding and RIS optimization problem. We show that RIS can significantly improve the system performance even when the number of RIS elements per sub-band is very low.
Multicriteria Optimization of Lower Limb Exoskeleton Mechanism
Authors: Sayat Ibrayev, Arman Ibrayeva, Ayaulym Rakhmatullina, Aizhan Ibrayeva, Bekzat Amanov, Nurbibi Imanbayeva
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2310.08308
Pdf link: https://arxiv.org/pdf/2310.08308
Abstract Typical leg exoskeletons employ open-loop kinematic chains with motors placed directly on movable joints; while this design offers flexibility, it leads to increased costs and heightened control complexity due to the high number of degrees of freedom. The use of heavy servo-motors to handle torque in active joints results in complex and bulky designs, as highlighted in existing literature. In this study, we introduce a novel synthesis method, with analytical solutions, for synthesizing a lower-limb exoskeleton. Additionally, we incorporate multicriteria optimization with six design criteria. As a result, we offer several mechanisms comprising only six links that are well-suited to the human anatomical structure and exhibit superior trajectory accuracy, efficient force transmission, satisfactory step height, and an internal transfer segment of the foot.
LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios
Authors: Yazhe Niu, Yuan Pu, Zhenjie Yang, Xueyan Li, Tong Zhou, Jiyuan Ren, Shuai Hu, Hongsheng Li, Yu Liu
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.08348
Pdf link: https://arxiv.org/pdf/2310.08348
Abstract Building agents based on tree-search planning capabilities with learned models has achieved remarkable success in classic decision-making problems, such as Go and Atari. However, it has been deemed challenging or even infeasible to extend Monte Carlo Tree Search (MCTS) based algorithms to diverse real-world applications, especially when these environments involve complex action spaces and significant simulation costs, or inherent stochasticity. In this work, we introduce LightZero, the first unified benchmark for deploying MCTS/MuZero in general sequential decision scenarios. Specifically, we summarize the most critical challenges in designing a general MCTS-style decision-making solver, then decompose the tightly-coupled algorithm and system design of tree-search RL methods into distinct sub-modules. By incorporating more appropriate exploration and optimization strategies, we can significantly enhance these sub-modules and construct powerful LightZero agents to tackle tasks across a wide range of domains, such as board games, Atari, MuJoCo, MiniGrid and GoBigger. Detailed benchmark results reveal the significant potential of such methods in building scalable and efficient decision intelligence. The code is available as part of OpenDILab at https://github.com/opendilab/LightZero.
Towards Demystifying the Generalization Behaviors When Neural Collapse Emerges
Authors: Peifeng Gao, Qianqian Xu, Yibo Yang, Peisong Wen, Huiyang Shao, Zhiyong Yang, Bernard Ghanem, Qingming Huang
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.08358
Pdf link: https://arxiv.org/pdf/2310.08358
Abstract Neural Collapse (NC) is a well-known phenomenon of deep neural networks in the terminal phase of training (TPT). It is characterized by the collapse of features and classifier into a symmetrical structure, known as a simplex equiangular tight frame (ETF). While there have been extensive studies on optimization characteristics showing the global optimality of neural collapse, little research has been done on the generalization behaviors during the occurrence of NC. In particular, the important phenomenon of generalization improvement during TPT has remained an empirical observation lacking a rigorous theoretical explanation. In this paper, we establish the connection between the minimization of CE and a multi-class SVM during TPT, and then derive a multi-class margin generalization bound, which provides a theoretical explanation for why continuing training can still lead to accuracy improvement on the test set, even after the train accuracy has reached 100%. Additionally, our further theoretical results indicate that different alignment between labels and features in a simplex ETF can result in varying degrees of generalization improvement, despite all models reaching NC and demonstrating similar optimization performance on the train set. We refer to this newly discovered property as "non-conservative generalization". In experiments, we also provide empirical observations to verify the indications suggested by our theoretical results.
AutoVP: An Automated Visual Prompting Framework and Benchmark
Authors: Hsi-Ai Tsao, Lei Hsiung, Pin-Yu Chen, Sijia Liu, Tsung-Yi Ho
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.08381
Pdf link: https://arxiv.org/pdf/2310.08381
Abstract Visual prompting (VP) is an emerging parameter-efficient fine-tuning approach to adapting pre-trained vision models to solve various downstream image-classification tasks. However, there has hitherto been little systematic study of the design space of VP and no clear benchmark for evaluating its performance. To bridge this gap, we propose AutoVP, an end-to-end expandable framework for automating VP design choices, along with 12 downstream image-classification tasks that can serve as a holistic VP-performance benchmark. Our design space covers 1) the joint optimization of the prompts; 2) the selection of pre-trained models, including image classifiers and text-image encoders; and 3) model output mapping strategies, including nonparametric and trainable label mapping. Our extensive experimental results show that AutoVP outperforms the best-known current VP methods by a substantial margin, having up to 6.7% improvement in accuracy; and attains a maximum performance increase of 27.5% compared to linear-probing (LP) baseline. AutoVP thus makes a two-fold contribution: serving both as an efficient tool for hyperparameter tuning on VP design choices, and as a comprehensive benchmark that can reasonably be expected to accelerate VP's development. The source code is available at https://github.com/IBM/AutoVP.
Towards Running Time Analysis of Interactive Multi-objective Evolutionary Algorithms
Authors: Tianhao Lu, Chao Bian, Chao Qian
Subjects: Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2310.08384
Pdf link: https://arxiv.org/pdf/2310.08384
Abstract Evolutionary algorithms (EAs) are widely used for multi-objective optimization due to their population-based nature. Traditional multi-objective EAs (MOEAs) generate a large set of solutions to approximate the Pareto front, leaving a decision maker (DM) with the task of selecting a preferred solution. However, this process can be inefficient and time-consuming, especially when there are many objectives or the subjective preferences of the DM are known. To address this issue, interactive MOEAs (iMOEAs) combine decision making into the optimization process, i.e., update the population with the help of the DM. In contrast to their wide applications, there have been only two theoretical works on iMOEAs, which only considered interactive variants of the two simple single-objective algorithms, RLS and (1+1)-EA. This paper provides the first running time analysis (the essential theoretical aspect of EAs) for practical iMOEAs. Specifically, we prove that the expected running time of the well-developed interactive NSGA-II (called R-NSGA-II) for solving the OneMinMax and OneJumpZeroJump problems is $O(n \log n)$ and $O(n^k)$, respectively, both of which are asymptotically faster than the traditional NSGA-II. Meanwhile, we present a variant of OneMinMax, and prove that R-NSGA-II can be exponentially slower than NSGA-II. These results provide theoretical justification for the effectiveness of iMOEAs while identifying situations where they may fail. Experiments are also conducted to validate the theoretical results.
Introducing a Deep Neural Network-based Model Predictive Control Framework for Rapid Controller Implementation
Authors: David C. Gordon, Alexander Winkler, Julian Bedei, Patrick Schaber, Jakob Andert, Charles R. Koch
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.08392
Pdf link: https://arxiv.org/pdf/2310.08392
Abstract Model Predictive Control (MPC) provides an optimal control solution based on a cost function while allowing for the implementation of process constraints. As a model-based optimal control technique, the performance of MPC strongly depends on the model used, where a trade-off between model computation time and prediction performance exists. One solution is the integration of MPC with a machine learning (ML) based process model, which is quick to evaluate online. This work presents the experimental implementation of a deep neural network (DNN) based nonlinear MPC (NMPC) for Homogeneous Charge Compression Ignition (HCCI) combustion control. The DNN model consists of a Long Short-Term Memory (LSTM) network surrounded by fully connected layers; it was trained using experimental engine data and showed acceptable prediction performance with under 5% error for all outputs. Using this model, the MPC is designed to track the Indicated Mean Effective Pressure (IMEP) and combustion phasing trajectories, while minimizing several parameters. Using the acados software package to enable the real-time implementation of the MPC on an ARM Cortex A72, the optimization calculations are completed within 1.4 ms. The external A72 processor is integrated with the prototyping engine controller using a UDP connection, allowing for rapid experimental deployment of the NMPC. The IMEP trajectory following of the developed controller was excellent, with a root-mean-square error of 0.133 bar, in addition to observing process constraints.
Control-Based Planning over Probability Mass Function Measurements via Robust Linear Programming
Authors: Mehdi Kermanshah, Calin Belta, Roberto Tron
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.08413
Pdf link: https://arxiv.org/pdf/2310.08413
Abstract We propose an approach to synthesize linear feedback controllers for linear systems in polygonal environments. Our method focuses on designing a robust controller that can account for uncertainty in measurements. Its inputs are provided by a perception module that generates probability mass functions (PMFs) for predefined landmarks in the environment, such as distinguishable geometric features. We formulate an optimization problem with Control Lyapunov Function (CLF) and Control Barrier Function (CBF) constraints to derive a stable and safe controller. Using the strong duality of linear programs (LPs) and robust optimization, we convert the optimization problem to a linear program that can be efficiently solved offline. At a high level, our approach partially combines perception, planning, and real-time control into a single design problem. An additional advantage of our method is the ability to produce controllers capable of exhibiting nonlinear behavior while relying solely on an offline LP for control synthesis.
Visual Attention-Prompted Prediction and Learning
Authors: Yifei Zhang, Siyi Gu, Bo Pan, Guangji Bai, Xiaofeng Yang, Liang Zhao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.08420
Pdf link: https://arxiv.org/pdf/2310.08420
Abstract Explanation(attention)-guided learning is a method that enhances a model's predictive power by incorporating human understanding during the training phase. While attention-guided learning has shown promising results, it often involves time-consuming and computationally expensive model retraining. To address this issue, we introduce the attention-prompted prediction technique, which enables direct prediction guided by the attention prompt without the need for model retraining. However, this approach presents several challenges, including: 1) How to incorporate the visual attention prompt into the model's decision-making process and leverage it for future predictions even in the absence of a prompt? and 2) How to handle the incomplete information from the visual attention prompt? To tackle these challenges, we propose a novel framework called Visual Attention-Prompted Prediction and Learning, which seamlessly integrates visual attention prompts into the model's decision-making process and adapts to images both with and without attention prompts for prediction. To address the incomplete information of the visual attention prompt, we introduce a perturbation-based attention map modification method. Additionally, we propose an optimization-based mask aggregation method with a new weight learning function for adaptive perturbed annotation aggregation in the attention map modification process. Our overall framework is designed to learn in an attention-prompt guided multi-task manner to enhance future predictions even for samples without attention prompts and trained in an alternating manner for better convergence. Extensive experiments conducted on two datasets demonstrate the effectiveness of our proposed framework in enhancing predictions for samples, both with and without provided prompts.
Differentially Private Non-convex Learning for Multi-layer Neural Networks
Authors: Hanpu Shen, Cheng-Long Wang, Zihang Xiang, Yiming Ying, Di Wang
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.08425
Pdf link: https://arxiv.org/pdf/2310.08425
Abstract This paper focuses on the problem of Differentially Private Stochastic Optimization for (multi-layer) fully connected neural networks with a single output node. In the first part, we examine cases with no hidden nodes, specifically focusing on Generalized Linear Models (GLMs). We investigate the well-specified model where the random noise possesses a zero mean, and the link function is both bounded and Lipschitz continuous. We propose several algorithms and our analysis demonstrates the feasibility of achieving an excess population risk that remains invariant to the data dimension. We also delve into the scenario involving the ReLU link function, and our findings mirror those of the bounded link function. We conclude this section by contrasting well-specified and misspecified models, using ReLU regression as a representative example. In the second part of the paper, we extend our ideas to two-layer neural networks with sigmoid or ReLU activation functions in the well-specified model. In the third part, we study the theoretical guarantees of DP-SGD in Abadi et al. (2016) for fully connected multi-layer neural networks. By utilizing recent advances in Neural Tangent Kernel theory, we provide the first excess population risk when both the sample size and the width of the network are sufficiently large. Additionally, we discuss the role of some parameters in DP-SGD regarding their utility, both theoretically and empirically.
Cold Start Latency in Serverless Computing: A Systematic Review, Taxonomy, and Future Directions
Authors: Muhammed Golec, Guneet Kaur Walia, Mohit Kumar, Felix Cuadrado, Sukhpal Singh Gill, Steve Uhlig
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2310.08437
Pdf link: https://arxiv.org/pdf/2310.08437
Abstract Recently, academics and the corporate sector have paid attention to serverless computing, which enables dynamic scalability and an economic model. In serverless computing, users pay only for the time they actually spend using the resources. Although zero scaling optimises cost and resource utilisation, it is the fundamental reason for the serverless cold start problem. Various academic and corporate sector studies are being conducted to tackle the cold start problem, which has large research challenges. To study the "cold start" problem in serverless computing, this article provides a comprehensive literature overview of recent research. In addition, we present a detailed taxonomy of several approaches to addressing the issue of cold start latency in serverless computing. Several academic and industrial organisations have proposed methods for cutting down the cold start time and cold start frequency, and this taxonomy is being used to explore these methods. There are several categories in which a current study on cold start latency is organised: caching and application-level optimization-based solutions, as well as AI/ML-based solutions. We have analysed the current methods and grouped them into categories based on their commonalities and features. Finally, we conclude with a review of current challenges and possible future research directions.
UniPose: Detecting Any Keypoints
Authors: Jie Yang, Ailing Zeng, Ruimao Zhang, Lei Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.08530
Pdf link: https://arxiv.org/pdf/2310.08530
Abstract This work proposes a unified framework called UniPose to detect keypoints of any articulated (e.g., human and animal), rigid, and soft objects via visual or textual prompts for fine-grained vision understanding and manipulation. Keypoint is a structure-aware, pixel-level, and compact representation of any object, especially articulated objects. Existing fine-grained promptable tasks mainly focus on object instance detection and segmentation but often fail to identify fine-grained granularity and structured information of image and instance, such as eyes, leg, paw, etc. Meanwhile, prompt-based keypoint detection is still under-explored. To bridge the gap, we make the first attempt to develop an end-to-end prompt-based keypoint detection framework called UniPose to detect keypoints of any objects. As keypoint detection tasks are unified in this framework, we can leverage 13 keypoint detection datasets with 338 keypoints across 1,237 categories over 400K instances to train a generic keypoint detection model. UniPose can effectively align text-to-keypoint and image-to-keypoint due to the mutual enhancement of textual and visual prompts based on the cross-modality contrastive learning optimization objectives. Our experimental results show that UniPose has strong fine-grained localization and generalization abilities across image styles, categories, and poses. Based on UniPose as a generalist keypoint detector, we hope it could serve fine-grained visual perception, understanding, and generation.
Placement Optimization of Substitutable Products
Authors: Omar El Housni, Rajan Udwani
Subjects: Data Structures and Algorithms (cs.DS); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2310.08568
Pdf link: https://arxiv.org/pdf/2310.08568
Abstract Strategic product placement can have a strong influence on customer purchase behavior in physical stores as well as online platforms. Motivated by this, we consider the problem of optimizing the placement of substitutable products in designated display locations to maximize the expected revenue of the seller. We model the customer behavior as a two-stage process: first, the customer visits a subset of display locations according to a browsing distribution; second, the customer chooses at most one product from the displayed products at those locations according to a choice model. Our goal is to design a general algorithm that can select and place the products optimally for any browsing distribution and choice model, and we call this the Placement problem. We give a randomized algorithm that utilizes an $\alpha$-approximate algorithm for cardinality constrained assortment optimization and outputs a $\frac{\Theta(\alpha)}{\log m}$-approximate solution (in expectation) for Placement with $m$ display locations, i.e., our algorithm outputs a solution with value at least $\frac{\Omega(\alpha)}{\log m}$ factor of the optimal and this is tight in the worst case. We also give algorithms with stronger guarantees in some special cases. In particular, we give a deterministic $\frac{\Omega(1)}{\log m}$-approximation algorithm for the Markov choice model, and a tight $(1-1/e)$-approximation algorithm for the problem when products have identical prices.
Is Generalized Dynamic Novel View Synthesis from Monocular Videos Possible Today?
Authors: Xiaoming Zhao, Alex Colburn, Fangchang Ma, Miguel Angel Bautista, Joshua M. Susskind, Alexander G. Schwing
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.08587
Pdf link: https://arxiv.org/pdf/2310.08587
Abstract Rendering scenes observed in a monocular video from novel viewpoints is a challenging problem. For static scenes the community has studied both scene-specific optimization techniques, which optimize on every test scene, and generalized techniques, which only run a deep net forward pass on a test scene. In contrast, for dynamic scenes, scene-specific optimization techniques exist, but, to our best knowledge, there is currently no generalized method for dynamic novel view synthesis from a given monocular video. To answer whether generalized dynamic novel view synthesis from monocular videos is possible today, we establish an analysis framework based on existing techniques and work toward the generalized approach. We find a pseudo-generalized process without scene-specific appearance optimization is possible, but geometrically and temporally consistent depth estimates are needed. Despite no scene-specific appearance optimization, the pseudo-generalized approach improves upon some scene-specific methods.
Keyword: adam

There is no result

Keyword: gradient

Parametric Leaky Tanh: A New Hybrid Activation Function for Deep Learning
Authors: Stamatis Mastromichalakis
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2310.07720
Pdf link: https://arxiv.org/pdf/2310.07720
Abstract Activation functions (AFs) are crucial components of deep neural networks (DNNs), having a significant impact on their performance. An activation function in a DNN is typically a smooth, nonlinear function that transforms an input signal into an output signal for the subsequent layer. In this paper, we propose the Parametric Leaky Tanh (PLTanh), a novel hybrid activation function designed to combine the strengths of both the Tanh and Leaky ReLU (LReLU) activation functions. PLTanh is differentiable at all points and addresses the 'dying ReLU' problem by ensuring a non-zero gradient for negative inputs, consistent with the behavior of LReLU. By integrating the unique advantages of these two diverse activation functions, PLTanh facilitates the learning of more intricate nonlinear relationships within the network. This paper presents an empirical evaluation of PLTanh against established activation functions, namely ReLU, LReLU, and ALReLU utilizing five diverse datasets.
AI Algorithm for the Generation of Three-Dimensional Accessibility Ramps in Grasshopper / Rhinoceros 7
Authors: Antonio Li, Leila Yi, Brandon Yeo Pei Hui
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.07728
Pdf link: https://arxiv.org/pdf/2310.07728
Abstract Often overlooked as a component of urban development, accessibility infrastructure is undeniably crucial in daily life. Accessibility ramps are one of the most common types of accessibility infrastructure, and serve to benefit not only people with mobile impairments but also able-bodied third parties. While the necessity of accessibility ramps is acknowledged, actual implementation fails in light of the limits of manpower required for the design stage. In response, we present an algorithm capable of the automatic generation of a feasible accessibility ramp based on a 3D model of the relevant environment. Through the manual specification of initial and terminal points within a 3D model, the algorithm uses AI search algorithms to determine the optimal pathway connecting these points. Essential components in devising a wheelchair-accessible ramp are encoded within the process, as evaluated by the algorithm, including but not limited to elevation differentials, spatial constraints, and gradient specifications. From this, the algorithm then generates the pathway to be expanded into a full-scale, usable model of a ramp, which then can be easily exported and transformed through inter-software exchanges. Though some human input is still required following the generation stage, the minimising of human resources provides significant boosts of efficiency in the design process thus lowering the threshold for the incorporation of accessibility features in future urban design.
Feature Learning and Generalization in Deep Networks with Orthogonal Weights
Authors: Hannah Day, Yonatan Kahn, Daniel A. Roberts
Subjects: Machine Learning (cs.LG); High Energy Physics - Phenomenology (hep-ph); High Energy Physics - Theory (hep-th); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.07765
Pdf link: https://arxiv.org/pdf/2310.07765
Abstract Fully-connected deep neural networks with weights initialized from independent Gaussian distributions can be tuned to criticality, which prevents the exponential growth or decay of signals propagating through the network. However, such networks still exhibit fluctuations that grow linearly with the depth of the network, which may impair the training of networks with width comparable to depth. We show analytically that rectangular networks with tanh activations and weights initialized from the ensemble of orthogonal matrices have corresponding preactivation fluctuations which are independent of depth, to leading order in inverse width. Moreover, we demonstrate numerically that, at initialization, all correlators involving the neural tangent kernel (NTK) and its descendants at leading order in inverse width -- which govern the evolution of observables during training -- saturate at a depth of $\sim 20$, rather than growing without bound as in the case of Gaussian initializations. We speculate that this structure preserves finite-width feature learning while reducing overall noise, thus improving both generalization and training speed. We provide some experimental justification by relating empirical measurements of the NTK to the superior performance of deep nonlinear orthogonal networks trained under full-batch gradient descent on the MNIST and CIFAR-10 classification tasks.
Using Spark Machine Learning Models to Perform Predictive Analysis on Flight Ticket Pricing Data
Authors: Philip Wong, Phue Thant, Pratiksha Yadav, Ruta Antaliya, Jongwook Woo
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2310.07787
Pdf link: https://arxiv.org/pdf/2310.07787
Abstract This paper discusses the predictive performance, measured by R2 (R-squared) and RMSE, and the processes undertaken on flight pricing data, leveraging a large dataset, originally from Expedia.com, consisting of approximately 20 million records or 4.68 gigabytes. The project aims to determine the best models usable in the real world to predict airline ticket fares for non-stop flights across the US. Therefore, good generalization capability and optimized processing times are important measures for the model. We will discover key business insights utilizing feature importance and discuss the process and tools used for our analysis. Four regression machine learning algorithms were utilized: Random Forest, Gradient Boost Tree, Decision Tree, and Factorization Machines utilizing Cross Validator and Training Validator functions for assessing performance and generalization capability.
When, Why and How Much? Adaptive Learning Rate Scheduling by Refinement
Authors: Aaron Defazio, Ashok Cutkosky, Harsh Mehta, Konstantin Mishchenko
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.07831
Pdf link: https://arxiv.org/pdf/2310.07831
Abstract Learning rate schedules used in practice bear little resemblance to those recommended by theory. We close much of this theory/practice gap, and as a consequence are able to derive new problem-adaptive learning rate schedules. Our key technical contribution is a refined analysis of learning rate schedules for a wide class of optimization algorithms (including SGD). In contrast to most prior works that study the convergence of the average iterate, we study the last iterate, which is what most people use in practice. When considering only worst-case analysis, our theory predicts that the best choice is the linear decay schedule: a popular choice in practice that sets the stepsize proportionally to $1 - t/T$, where $t$ is the current iteration and $T$ is the total number of steps. To go beyond this worst-case analysis, we use the observed gradient norms to derive schedules refined for any particular task. These refined schedules exhibit learning rate warm-up and rapid learning rate annealing near the end of training. Ours is the first systematic approach to automatically yield both of these properties. We perform the most comprehensive evaluation of learning rate schedules to date, evaluating across 10 diverse deep learning problems, a series of LLMs, and a suite of logistic regression problems. We validate that overall, the linear-decay schedule matches or outperforms all commonly used default schedules including cosine annealing, and that our schedule refinement method gives further improvements.
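The linear decay schedule named in this abstract can be sketched in a few lines. This is only an illustration of the formula $1 - t/T$ as stated above, not the authors' refinement method; the function name `linear_decay_lr` and the parameter `base_lr` are hypothetical names chosen for the example.

```python
def linear_decay_lr(base_lr: float, t: int, total_steps: int) -> float:
    """Return a step size proportional to 1 - t/T.

    base_lr is an assumed peak step size (not specified in the abstract),
    t is the current iteration and total_steps is T.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp t/T to [0, 1] so out-of-range steps never give a negative step size.
    frac = min(max(t / total_steps, 0.0), 1.0)
    return base_lr * (1.0 - frac)


# Example: the schedule over T = 10 steps with an assumed peak step size of 0.1.
schedule = [linear_decay_lr(0.1, t, 10) for t in range(10)]
```

The schedule starts at `base_lr` and reaches zero at `t = T`. The warm-up and rapid final annealing described in the abstract come from the refinement step, which this sketch does not attempt to reproduce.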
Coupled Scheme for Linear and Hamilton-Jacobi Equations: Theoretical and Numerical Aspects
Authors: Smita Sahu
Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)
Arxiv link: https://arxiv.org/abs/2310.07878
Pdf link: https://arxiv.org/pdf/2310.07878
Abstract We present a comprehensive analysis of the coupled scheme introduced in [Springer Proceedings in Mathematics & Statistics, vol 237. Springer, Cham 2018] for linear and Hamilton-Jacobi equations. This method merges two distinct schemes, each tailored to handle specific solution characteristics. It offers a versatile framework for coupling various schemes, enabling the integration of accurate methods for smooth solutions and the treatment of discontinuities and gradient jumps. In that earlier work, the emphasis was on coupling an anti-dissipative scheme designed for discontinuous solutions with a semi-Lagrangian scheme developed for smooth solutions. In this paper, we rigorously establish the essential properties of the resulting coupled scheme, especially in the linear case. To illustrate the effectiveness of this coupled approach, we present a series of one-dimensional examples.
Reset It and Forget It: Relearning Last-Layer Weights Improves Continual and Transfer Learning
Authors: Lapo Frati, Neil Traft, Jeff Clune, Nick Cheney
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2310.07996
Pdf link: https://arxiv.org/pdf/2310.07996
Abstract This work identifies a simple pre-training mechanism that leads to representations exhibiting better continual and transfer learning. This mechanism -- the repeated resetting of weights in the last layer, which we nickname "zapping" -- was originally designed for a meta-continual-learning procedure, yet we show it is surprisingly applicable in many settings beyond both meta-learning and continual learning. In our experiments, we wish to transfer a pre-trained image classifier to a new set of classes, in a few shots. We show that our zapping procedure results in improved transfer accuracy and/or more rapid adaptation in both standard fine-tuning and continual learning settings, while being simple to implement and computationally efficient. In many cases, we achieve performance on par with state of the art meta-learning without needing the expensive higher-order gradients, by using a combination of zapping and sequential learning. An intuitive explanation for the effectiveness of this zapping procedure is that representations trained with repeated zapping learn features that are capable of rapidly adapting to newly initialized classifiers. Such an approach may be considered a computationally cheaper type of, or alternative to, meta-learning rapidly adaptable features with higher-order gradients. This adds to recent work on the usefulness of resetting neural network parameters during training, and invites further investigation of this mechanism.
Robust 1-bit Compressed Sensing with Iterative Hard Thresholding
Authors: Namiko Matsumoto, Arya Mazumdar
Subjects: Information Theory (cs.IT); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.08019
Pdf link: https://arxiv.org/pdf/2310.08019
Abstract In 1-bit compressed sensing, the aim is to estimate a $k$-sparse unit vector $x\in S^{n-1}$ within an $\epsilon$ error (in $\ell_2$) from minimal number of linear measurements that are quantized to just their signs, i.e., from measurements of the form $y = \mathrm{Sign}(\langle a, x\rangle).$ In this paper, we study a noisy version where a fraction of the measurements can be flipped, potentially by an adversary. In particular, we analyze the Binary Iterative Hard Thresholding (BIHT) algorithm, a proximal gradient descent on a properly defined loss function used for 1-bit compressed sensing, in this noisy setting. It is known from recent results that, with $\tilde{O}(\frac{k}{\epsilon})$ noiseless measurements, BIHT provides an estimate within $\epsilon$ error. This result is optimal and universal, meaning one set of measurements work for all sparse vectors. In this paper, we show that BIHT also provides better results than all known methods for the noisy setting. We show that when up to $\tau$-fraction of the sign measurements are incorrect (adversarial error), with the same number of measurements as before, BIHT agnostically provides an estimate of $x$ within an $\tilde{O}(\epsilon+\tau)$ error, maintaining the universality of measurements. This establishes stability of iterative hard thresholding in the presence of measurement error. To obtain the result, we use the restricted approximate invertibility of Gaussian matrices, as well as a tight analysis of the high-dimensional geometry of the adversarially corrupted measurements.
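For readers unfamiliar with BIHT, the following is a minimal pure-Python sketch of the iteration as it is commonly stated in the 1-bit compressed sensing literature: a gradient-like step on the sign mismatch, followed by hard thresholding to the $k$ largest entries and renormalization. It is not taken from this paper; the step size, iteration count, zero initialization, problem sizes, and the 5% flip rate are all assumptions made for illustration.

```python
import math
import random


def sign(v: float) -> float:
    # Convention assumed here: Sign(0) = +1.
    return 1.0 if v >= 0 else -1.0


def hard_threshold(x, k):
    # Keep the k largest-magnitude entries, zero out the rest.
    keep = set(sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:k])
    return [x[i] if i in keep else 0.0 for i in range(len(x))]


def normalize(x):
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x] if norm > 0 else x


def biht(A, y, k, step=1.0, iters=50):
    """Binary Iterative Hard Thresholding (sketch; step and iters are assumed)."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        # Residual between observed signs and signs predicted by the current x.
        r = [y[i] - sign(sum(A[i][j] * x[j] for j in range(n))) for i in range(m)]
        # Gradient-like direction A^T r, scaled by step / m.
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        x = normalize(hard_threshold([x[j] + step / m * g[j] for j in range(n)], k))
    return x


# Toy instance: n = 20, k = 3, m = 200 Gaussian measurements, 5% of signs flipped.
rng = random.Random(0)
n, k, m = 20, 3, 200
x_true = normalize([rng.gauss(0, 1) if j < k else 0.0 for j in range(n)])
A = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(m)]
y = [sign(sum(A[i][j] * x_true[j] for j in range(n))) for i in range(m)]
for i in rng.sample(range(m), m // 20):
    y[i] = -y[i]  # random flips stand in for the adversarial errors in the abstract
x_hat = biht(A, y, k)
```

The returned `x_hat` is a $k$-sparse unit vector by construction. Its distance from `x_true` depends on the random draw, so no specific error value is claimed here.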
QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models
Authors: Jing Liu, Ruihao Gong, Xiuying Wei, Zhiwei Dong, Jianfei Cai, Bohan Zhuang
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.08041
Pdf link: https://arxiv.org/pdf/2310.08041
Abstract Large Language Models (LLMs) excel in NLP, but their demands hinder their widespread deployment. While Quantization-Aware Training (QAT) offers a solution, its extensive training costs make Post-Training Quantization (PTQ) a more practical approach for LLMs. In existing studies, activation outliers in particular channels are identified as the bottleneck to PTQ accuracy. They propose to transform the magnitudes from activations to weights, which however offers limited alleviation or suffers from unstable gradients, resulting in a severe performance drop at low-bitwidth. In this paper, we propose QLLM, an accurate and efficient low-bitwidth PTQ method designed for LLMs. QLLM introduces an adaptive channel reassembly technique that reallocates the magnitude of outliers to other channels, thereby mitigating their impact on the quantization range. This is achieved by channel disassembly and channel assembly, which first breaks down the outlier channels into several sub-channels to ensure a more balanced distribution of activation magnitudes. Then similar channels are merged to maintain the original channel number for efficiency. Additionally, an adaptive strategy is designed to autonomously determine the optimal number of sub-channels for channel disassembly. To further compensate for the performance loss caused by quantization, we propose an efficient tuning method that only learns a small number of low-rank weights while freezing the pre-trained quantized model. After training, these low-rank parameters can be fused into the frozen weights without affecting inference. Extensive experiments on LLaMA-1 and LLaMA-2 show that QLLM can obtain accurate quantized models efficiently. For example, QLLM quantizes the 4-bit LLaMA-2-70B within 10 hours on a single A100-80G GPU, outperforming the previous state-of-the-art method by 7.89% on the average accuracy across five zero-shot tasks.
Model Predictive Inferential Control of Neural State-Space Models for Autonomous Vehicle Motion Planning
Authors: Iman Askari, Xumein Tu, Shen Zeng, Huazhen Fang
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.08045
Pdf link: https://arxiv.org/pdf/2310.08045
Abstract Model predictive control (MPC) has proven useful in enabling safe and optimal motion planning for autonomous vehicles. In this paper, we investigate how to achieve MPC-based motion planning when a neural state-space model represents the vehicle dynamics. As the neural state-space model will lead to highly complex, nonlinear and nonconvex optimization landscapes, mainstream gradient-based MPC methods will be computationally too heavy to be a viable solution. In a departure, we propose the idea of model predictive inferential control (MPIC), which seeks to infer the best control decisions from the control objectives and constraints. Following the idea, we convert the MPC problem for motion planning into a Bayesian state estimation problem. Then, we develop a new particle filtering/smoothing approach to perform the estimation. This approach is implemented as banks of unscented Kalman filters/smoothers and offers high sampling efficiency, fast computation, and estimation accuracy. We evaluate the MPIC approach through a simulation study of autonomous driving in different scenarios, along with an exhaustive comparison with gradient-based MPC. The results show that the MPIC approach has considerable computational efficiency, regardless of complex neural network architectures, and shows the capability to solve large-scale MPC problems for neural state-space models.
Counterfactual Explanations for Time Series Forecasting
Authors: Zhendong Wang, Ioanna Miliou, Isak Samsten, Panagiotis Papapetrou
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.08137
Pdf link: https://arxiv.org/pdf/2310.08137
Abstract Among recent developments in time series forecasting methods, deep forecasting models have gained popularity as they can utilize hidden feature patterns in time series to improve forecasting performance. Nevertheless, the majority of current deep forecasting models are opaque, hence making it challenging to interpret the results. While counterfactual explanations have been extensively employed as a post-hoc approach for explaining classification models, their application to forecasting models still remains underexplored. In this paper, we formulate the novel problem of counterfactual generation for time series forecasting, and propose an algorithm, called ForecastCF, that solves the problem by applying gradient-based perturbations to the original time series. ForecastCF guides the perturbations by applying constraints to the forecasted values to obtain desired prediction outcomes. We experimentally evaluate ForecastCF using four state-of-the-art deep model architectures and compare to two baselines. Our results show that ForecastCF outperforms the baseline in terms of counterfactual validity and data manifold closeness. Overall, our findings suggest that ForecastCF can generate meaningful and relevant counterfactual explanations for various forecasting tasks.
Improving Fast Minimum-Norm Attacks with Hyperparameter Optimization
Authors: Giuseppe Floris, Raffaele Mura, Luca Scionis, Giorgio Piras, Maura Pintor, Ambra Demontis, Battista Biggio
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.08177
Pdf link: https://arxiv.org/pdf/2310.08177
Abstract Evaluating the adversarial robustness of machine learning models using gradient-based attacks is challenging. In this work, we show that hyperparameter optimization can improve fast minimum-norm attacks by automating the selection of the loss function, the optimizer and the step-size scheduler, along with the corresponding hyperparameters. Our extensive evaluation involving several robust models demonstrates the improved efficacy of fast minimum-norm attacks when combined with hyperparameter optimization. We release our open-source code at https://github.com/pralab/HO-FMN.
Collaborative Precoding Design for Adjacent Integrated Sensing and Communication Base Stations
Authors: Wangjun Jiang, Zhiqing Wei, Fan Liu, Zhiyong Feng, Ping Zhang
Subjects: Performance (cs.PF)
Arxiv link: https://arxiv.org/abs/2310.08246
Pdf link: https://arxiv.org/pdf/2310.08246
Abstract Integrated sensing and communication (ISAC) base stations can provide communication and wide range sensing information for vehicles via downlink (DL) transmission, thus enhancing vehicle driving safety. One major challenge for realizing high performance communication and sensing is how to deal with the DL mutual interference among adjacent ISAC base stations, which includes not only communication related interference, but also radar sensing related interference. In this paper, we establish a DL mutual interference model of adjacent ISAC base stations, and analyze the relationship for mutual interference channels between communications and radar sensing. To improve the sensing and communication performance, we propose a collaborative precoding design for coordinated adjacent base stations to mitigate the mutual interference under the transmit power constraint and constant modulus constraint, which is formulated as a non-convex optimization problem. We first relax the problem into a convex programming by omitting the rank constraint, and propose a joint optimization algorithm to solve the problem. We furthermore propose a sequential optimization algorithm, which divides the collaborative precoding design problem into four subproblems and finds the optimum via a gradient descent algorithm. Finally, we evaluate the collaborative precoding design algorithms by considering sensing and communication performance via numerical results.
MeanAP-Guided Reinforced Active Learning for Object Detection
Authors: Zhixuan Liang, Xingyu Zeng, Rui Zhao, Ping Luo
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.08387
Pdf link: https://arxiv.org/pdf/2310.08387
Abstract Active learning presents a promising avenue for training high-performance models with minimal labeled data, achieved by judiciously selecting the most informative instances to label and incorporating them into the task learner. Despite notable advancements in active learning for image recognition, metrics devised or learned to gauge the information gain of data, crucial for query strategy design, do not consistently align with task model performance metrics, such as Mean Average Precision (MeanAP) in object detection tasks. This paper introduces MeanAP-Guided Reinforced Active Learning for Object Detection (MAGRAL), a novel approach that directly utilizes the MeanAP metric of the task model to devise a sampling strategy employing a reinforcement learning-based sampling agent. Built upon LSTM architecture, the agent efficiently explores and selects subsequent training instances, and optimizes the process through policy gradient with MeanAP serving as reward. Recognizing the time-intensive nature of MeanAP computation at each step, we propose fast look-up tables to expedite agent training. We assess MAGRAL's efficacy across popular benchmarks, PASCAL VOC and MS COCO, utilizing different backbone architectures. Empirical findings substantiate MAGRAL's superiority over recent state-of-the-art methods, showcasing substantial performance gains. MAGRAL establishes a robust baseline for reinforced active object detection, signifying its potential in advancing the field.
Neural Sampling in Hierarchical Exponential-family Energy-based Models
Authors: Xingsi Dong, Si Wu
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)
Arxiv link: https://arxiv.org/abs/2310.08431
Pdf link: https://arxiv.org/pdf/2310.08431
Abstract Bayesian brain theory suggests that the brain employs generative models to understand the external world. The sampling-based perspective posits that the brain infers the posterior distribution through samples of stochastic neuronal responses. Additionally, the brain continually updates its generative model to approach the true distribution of the external world. In this study, we introduce the Hierarchical Exponential-family Energy-based (HEE) model, which captures the dynamics of inference and learning. In the HEE model, we decompose the partition function into individual layers and leverage a group of neurons with shorter time constants to sample the gradient of the decomposed normalization term. This allows our model to estimate the partition function and perform inference simultaneously, circumventing the negative phase encountered in conventional energy-based models (EBMs). As a result, the learning process is localized both in time and space, and the model is easy to converge. To match the brain's rapid computation, we demonstrate that neural adaptation can serve as a momentum term, significantly accelerating the inference process. On natural image datasets, our model exhibits representations akin to those observed in the biological visual system. Furthermore, for the machine learning community, our model can generate observations through joint or marginal generation. We show that marginal generation outperforms joint generation and achieves performance on par with other EBMs.
Monotone discretizations of levelset convex geometric PDEs
Authors: Jeff Calder, Wonjun Lee
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2310.08450
Pdf link: https://arxiv.org/pdf/2310.08450
Abstract We introduce a novel algorithm that converges to level-set convex viscosity solutions of high-dimensional Hamilton-Jacobi equations. The algorithm is applicable to a broad class of curvature motion PDEs, as well as a recently developed Hamilton-Jacobi equation for the Tukey depth, which is a statistical depth measure of data points. A main contribution of our work is a new monotone scheme for approximating the direction of the gradient, which allows for monotone discretizations of pure partial derivatives in the direction of, and orthogonal to, the gradient. We provide a convergence analysis of the algorithm on both regular Cartesian grids and unstructured point clouds in any dimension and present numerical experiments that demonstrate the effectiveness of the algorithm in approximating solutions of the affine flow in two dimensions and the Tukey depth measure of high-dimensional datasets such as MNIST and FashionMNIST.
Do pretrained Transformers Really Learn In-context by Gradient Descent?
Authors: Lingfeng Shen, Aayush Mishra, Daniel Khashabi
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.08540
Pdf link: https://arxiv.org/pdf/2310.08540
Abstract Is In-Context Learning (ICL) implicitly equivalent to Gradient Descent (GD)? Several recent works draw analogies between the dynamics of GD and the emergent behavior of ICL in large language models. However, these works make assumptions far from the realistic natural language setting in which language models are trained. Such discrepancies between theory and practice, therefore, necessitate further investigation to validate their applicability. We start by highlighting the weaknesses in prior works that construct Transformer weights to simulate gradient descent. Their experiments with training Transformers on an ICL objective, inconsistencies in the order sensitivity of ICL and GD, sparsity of the constructed weights, and sensitivity to parameter changes are some examples of a mismatch with the real-world setting. Furthermore, we probe and compare the ICL vs. GD hypothesis in a natural setting. We conduct comprehensive empirical analyses on language models pretrained on natural data (LLaMa-7B). Our comparisons on various performance metrics highlight the inconsistent behavior of ICL and GD as a function of various factors such as datasets, models, and number of demonstrations. We observe that ICL and GD adapt the output distribution of language models differently. These results indicate that the equivalence between ICL and GD remains an open hypothesis that requires nuanced consideration and calls for further study.
Keyword: super-resolution

There is no result

zoq / arxiv-updates

New submissions for Fri, 13 Oct 23 #620
