New submissions for Thu, 14 Sep 23

Keyword: sgd

DP-Forward: Fine-tuning and Inference on Language Models with Differential Privacy in Forward Pass

Authors: Authors: Minxin Du, Xiang Yue, Sherman S. M. Chow, Tianhao Wang, Chenyu Huang, Huan Sun
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2309.06746
Pdf link: https://arxiv.org/pdf/2309.06746
Abstract Differentially private stochastic gradient descent (DP-SGD) adds noise to gradients in back-propagation, safeguarding training data from privacy leakage, particularly membership inference. It fails to cover (inference-time) threats like embedding inversion and sensitive attribute inference. It is also costly in storage and computation when used to fine-tune large pre-trained language models (LMs). We propose DP-Forward, which directly perturbs embedding matrices in the forward pass of LMs. It satisfies stringent local DP requirements for training and inference data. To instantiate it using the smallest matrix-valued noise, we devise an analytic matrix Gaussian~mechanism (aMGM) by drawing possibly non-i.i.d. noise from a matrix Gaussian distribution. We then investigate perturbing outputs from different hidden (sub-)layers of LMs with aMGM noises. Its utility on three typical tasks almost hits the non-private baseline and outperforms DP-SGD by up to 7.7pp at a moderate privacy level. It saves 3$\times$ time and memory costs compared to DP-SGD with the latest high-speed library. It also reduces the average success rates of embedding inversion and sensitive attribute inference by up to 88pp and 41pp, respectively, whereas DP-SGD fails.
Keyword: optimization

A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale
Authors: Authors: Hao-Jun Michael Shi, Tsung-Hsien Lee, Shintaro Iwasaki, Jose Gallego-Posada, Zhijing Li, Kaushik Rangadurai, Dheevatsa Mudigere, Michael Rabbat
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Mathematical Software (cs.MS); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2309.06497
Pdf link: https://arxiv.org/pdf/2309.06497
Abstract Shampoo is an online and stochastic optimization algorithm belonging to the AdaGrad family of methods for training neural networks. It constructs a block-diagonal preconditioner where each block consists of a coarse Kronecker product approximation to full-matrix AdaGrad for each parameter of the neural network. In this work, we provide a complete description of the algorithm as well as the performance optimizations that our implementation leverages to train deep networks at-scale in PyTorch. Our implementation enables fast multi-GPU distributed data-parallel training by distributing the memory and computation associated with blocks of each parameter via PyTorch's DTensor data structure and performing an AllGather primitive on the computed search directions at each iteration. This major performance enhancement enables us to achieve at most a 10% performance reduction in per-step wall-clock time compared against standard diagonal-scaling-based adaptive gradient methods. We validate our implementation by performing an ablation study on training ImageNet ResNet50, demonstrating Shampoo's superiority over standard training recipes with minimal hyperparameter tuning.
Distributionally Robust Transfer Learning
Authors: Authors: Xin Xiong, Zijian Guo, Tianxi Cai
Subjects: Machine Learning (cs.LG); Methodology (stat.ME)
Arxiv link: https://arxiv.org/abs/2309.06534
Pdf link: https://arxiv.org/pdf/2309.06534
Abstract Many existing transfer learning methods rely on leveraging information from source data that closely resembles the target data. However, this approach often overlooks valuable knowledge that may be present in different yet potentially related auxiliary samples. When dealing with a limited amount of target data and a diverse range of source models, our paper introduces a novel approach, Distributionally Robust Optimization for Transfer Learning (TransDRO), that breaks free from strict similarity constraints. TransDRO is designed to optimize the most adversarial loss within an uncertainty set, defined as a collection of target populations generated as a convex combination of source distributions that guarantee excellent prediction performances for the target data. TransDRO effectively bridges the realms of transfer learning and distributional robustness prediction models. We establish the identifiability of TransDRO and its interpretation as a weighted average of source models closest to the baseline model. We also show that TransDRO achieves a faster convergence rate than the model fitted with the target data. Our comprehensive numerical studies and analysis of multi-institutional electronic health records data using TransDRO further substantiate the robustness and accuracy of TransDRO, highlighting its potential as a powerful tool in transfer learning applications.
Offline Prompt Evaluation and Optimization with Inverse Reinforcement Learning
Authors: Authors: Hao Sun
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.06553
Pdf link: https://arxiv.org/pdf/2309.06553
Abstract The recent advances in the development of Large Language Models (LLMs) like ChatGPT have achieved remarkable performance by leveraging human expertise. Yet, fully eliciting LLMs' potential for complex tasks requires navigating the vast search space of natural language prompts. While prompt engineering has shown promise, the requisite human-crafted prompts in trial-and-error attempts and the associated costs pose significant challenges. Crucially, the efficiency of prompt optimization hinges on the costly procedure of prompt evaluation. This work introduces Prompt-OIRL, an approach rooted in offline inverse reinforcement learning that seeks to bridge the gap between effective prompt evaluation and affordability. Our method draws on offline datasets from expert evaluations, employing Inverse-RL to derive a reward model for offline, query-dependent prompt evaluations. The advantages of Prompt-OIRL are manifold: it predicts prompt performance, is cost-efficient, produces human-readable results, and efficiently navigates the prompt space. We validate our method across four LLMs and three arithmetic datasets, highlighting its potential as a robust and effective tool for offline prompt evaluation and optimization. Our code as well as the offline datasets are released, and we highlight the Prompt-OIRL can be reproduced within a few hours using a single laptop using CPU
Harmonic-NAS: Hardware-Aware Multimodal Neural Architecture Search on Resource-constrained Devices
Authors: Authors: Mohamed Imed Eddine Ghebriout, Halima Bouzidi, Smail Niar, Hamza Ouarnoughi
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.06612
Pdf link: https://arxiv.org/pdf/2309.06612
Abstract The recent surge of interest surrounding Multimodal Neural Networks (MM-NN) is attributed to their ability to effectively process and integrate information from diverse data sources. In MM-NN, features are extracted and fused from multiple modalities using adequate unimodal backbones and specific fusion networks. Although this helps strengthen the multimodal information representation, designing such networks is labor-intensive. It requires tuning the architectural parameters of the unimodal backbones, choosing the fusing point, and selecting the operations for fusion. Furthermore, multimodality AI is emerging as a cutting-edge option in Internet of Things (IoT) systems where inference latency and energy consumption are critical metrics in addition to accuracy. In this paper, we propose Harmonic-NAS, a framework for the joint optimization of unimodal backbones and multimodal fusion networks with hardware awareness on resource-constrained devices. Harmonic-NAS involves a two-tier optimization approach for the unimodal backbone architectures and fusion strategy and operators. By incorporating the hardware dimension into the optimization, evaluation results on various devices and multimodal datasets have demonstrated the superiority of Harmonic-NAS over state-of-the-art approaches achieving up to 10.9% accuracy improvement, 1.91x latency reduction, and 2.14x energy efficiency gain.
RT-LM: Uncertainty-Aware Resource Management for Real-Time Inference of Language Models
Authors: Authors: Yufei Li, Zexin Li, Wei Yang, Cong Liu
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2309.06619
Pdf link: https://arxiv.org/pdf/2309.06619
Abstract Recent advancements in language models (LMs) have gained substantial attentions on their capability to generate human-like responses. Though exhibiting a promising future for various applications such as conversation AI, these LMs face deployment challenges on various devices due to their extreme computational cost and unpredictable inference latency. Such varied inference latency, identified as a consequence of uncertainty intrinsic to the nature of language, can lead to computational inefficiency and degrade the overall performance of LMs, especially under high-traffic workloads. Unfortunately, the bandwidth of these uncertainty sources is extensive, complicating the prediction of latency and the effects emanating from such uncertainties. To understand and mitigate the impact of uncertainty on real-time response-demanding systems, we take the first step to comprehend, quantify and optimize these uncertainty-induced latency performance variations in LMs. Specifically, we present RT-LM, an uncertainty-aware resource management ecosystem for real-time inference of LMs. RT-LM innovatively quantifies how specific input uncertainties, adversely affect latency, often leading to an increased output length. Exploiting these insights, we devise a lightweight yet effective method to dynamically correlate input text uncertainties with output length at runtime. Utilizing this quantification as a latency heuristic, we integrate the uncertainty information into a system-level scheduler which explores several uncertainty-induced optimization opportunities, including uncertainty-aware prioritization, dynamic consolidation, and strategic CPU offloading. Quantitative experiments across five state-of-the-art LMs on two hardware platforms demonstrates that RT-LM can significantly reduce the average response time and improve throughput while incurring a rather small runtime overhead.
Bregman Graph Neural Network
Authors: Authors: Jiayu Zhai, Lequan Lin, Dai Shi, Junbin Gao
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.06645
Pdf link: https://arxiv.org/pdf/2309.06645
Abstract Numerous recent research on graph neural networks (GNNs) has focused on formulating GNN architectures as an optimization problem with the smoothness assumption. However, in node classification tasks, the smoothing effect induced by GNNs tends to assimilate representations and over-homogenize labels of connected nodes, leading to adverse effects such as over-smoothing and misclassification. In this paper, we propose a novel bilevel optimization framework for GNNs inspired by the notion of Bregman distance. We demonstrate that the GNN layer proposed accordingly can effectively mitigate the over-smoothing issue by introducing a mechanism reminiscent of the "skip connection". We validate our theoretical results through comprehensive empirical studies in which Bregman-enhanced GNNs outperform their original counterparts in both homophilic and heterophilic graphs. Furthermore, our experiments also show that Bregman GNNs can produce more robust learning accuracy even when the number of layers is high, suggesting the effectiveness of the proposed method in alleviating the over-smoothing issue.
Statistical Rejection Sampling Improves Preference Optimization
Authors: Authors: Tianqi Liu, Yao Zhao, Rishabh Joshi, Misha Khalman, Mohammad Saleh, Peter J. Liu, Jialu Liu
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2309.06657
Pdf link: https://arxiv.org/pdf/2309.06657
Abstract Improving the alignment of language models with human preferences remains an active research challenge. Previous approaches have primarily utilized Reinforcement Learning from Human Feedback (RLHF) via online RL methods such as Proximal Policy Optimization (PPO). Recently, offline methods such as Sequence Likelihood Calibration (SLiC) and Direct Preference Optimization (DPO) have emerged as attractive alternatives, offering improvements in stability and scalability while maintaining competitive performance. SLiC refines its loss function using sequence pairs sampled from a supervised fine-tuned (SFT) policy, while DPO directly optimizes language models based on preference data, foregoing the need for a separate reward model. However, the maximum likelihood estimator (MLE) of the target optimal policy requires labeled preference pairs sampled from that policy. DPO's lack of a reward model constrains its ability to sample preference pairs from the optimal policy, and SLiC is restricted to sampling preference pairs only from the SFT policy. To address these limitations, we introduce a novel approach called Statistical Rejection Sampling Optimization (RSO) that aims to source preference data from the target optimal policy using rejection sampling, enabling a more accurate estimation of the optimal policy. We also propose a unified framework that enhances the loss functions used in both SLiC and DPO from a preference modeling standpoint. Through extensive experiments across three diverse tasks, we demonstrate that RSO consistently outperforms both SLiC and DPO on evaluations from both Large Language Model (LLM) and human raters.
Generalizable Neural Fields as Partially Observed Neural Processes
Authors: Authors: Jeffrey Gu, Kuan-Chieh Wang, Serena Yeung
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.06660
Pdf link: https://arxiv.org/pdf/2309.06660
Abstract Neural fields, which represent signals as a function parameterized by a neural network, are a promising alternative to traditional discrete vector or grid-based representations. Compared to discrete representations, neural representations both scale well with increasing resolution, are continuous, and can be many-times differentiable. However, given a dataset of signals that we would like to represent, having to optimize a separate neural field for each signal is inefficient, and cannot capitalize on shared information or structures among signals. Existing generalization methods view this as a meta-learning problem and employ gradient-based meta-learning to learn an initialization which is then fine-tuned with test-time optimization, or learn hypernetworks to produce the weights of a neural field. We instead propose a new paradigm that views the large-scale training of neural representations as a part of a partially-observed neural process framework, and leverage neural process algorithms to solve this task. We demonstrate that this approach outperforms both state-of-the-art gradient-based meta-learning approaches and hypernetwork approaches.
Federated PAC-Bayesian Learning on Non-IID data
Authors: Authors: Zihao Zhao, Yang Liu, Wenbo Ding, Xiao-Ping Zhang
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2309.06683
Pdf link: https://arxiv.org/pdf/2309.06683
Abstract Existing research has either adapted the Probably Approximately Correct (PAC) Bayesian framework for federated learning (FL) or used information-theoretic PAC-Bayesian bounds while introducing their theorems, but few considering the non-IID challenges in FL. Our work presents the first non-vacuous federated PAC-Bayesian bound tailored for non-IID local data. This bound assumes unique prior knowledge for each client and variable aggregation weights. We also introduce an objective function and an innovative Gibbs-based algorithm for the optimization of the derived bound. The results are validated on real-world datasets.
Deep Nonparametric Convexified Filtering for Computational Photography, Image Synthesis and Adversarial Defense
Authors: Authors: Jianqiao Wangni
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Optimization and Control (math.OC); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2309.06724
Pdf link: https://arxiv.org/pdf/2309.06724
Abstract We aim to provide a general framework of for computational photography that recovers the real scene from imperfect images, via the Deep Nonparametric Convexified Filtering (DNCF). It is consists of a nonparametric deep network to resemble the physical equations behind the image formation, such as denoising, super-resolution, inpainting, and flash. DNCF has no parameterization dependent on training data, therefore has a strong generalization and robustness to adversarial image manipulation. During inference, we also encourage the network parameters to be nonnegative and create a bi-convex function on the input and parameters, and this adapts to second-order optimization algorithms with insufficient running time, having 10X acceleration over Deep Image Prior. With these tools, we empirically verify its capability to defend image classification deep networks against adversary attack algorithms in real-time.
Hierarchical Time-Optimal Planning for Multi-Vehicle Racing
Authors: Authors: Georg Jank, Matthias Rowold, Boris Lohmann
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.06768
Pdf link: https://arxiv.org/pdf/2309.06768
Abstract This paper presents a hierarchical planning algorithm for racing with multiple opponents. The two-stage approach consists of a high-level behavioral planning step and a low-level optimization step. By combining discrete and continuous planning methods, our algorithm encourages global time optimality without being limited by coarse discretization. In the behavioral planning step, the fastest behavior is determined with a low-resolution spatio-temporal visibility graph. Based on the selected behavior, we calculate maneuver envelopes that are subsequently applied as constraints in a time-optimal control problem. The performance of our method is comparable to a parallel approach that selects the fastest trajectory from multiple optimizations with different behavior classes. However, our algorithm can be executed on a single core. This significantly reduces computational requirements, especially when multiple opponents are involved. Therefore, the proposed method is an efficient and practical solution for real-time multi-vehicle racing scenarios.
Fundamental Limits of Deep Learning-Based Binary Classifiers Trained with Hinge Loss
Authors: Authors: Tilahun M. Getu, Georges Kaddoum
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2309.06774
Pdf link: https://arxiv.org/pdf/2309.06774
Abstract Although deep learning (DL) has led to several breakthroughs in many disciplines as diverse as chemistry, computer science, electrical engineering, mathematics, medicine, neuroscience, and physics, a comprehensive understanding of why and how DL is empirically successful remains fundamentally elusive. To attack this fundamental problem and unravel the mysteries behind DL's empirical successes, significant innovations toward a unified theory of DL have been made. These innovations encompass nearly fundamental advances in optimization, generalization, and approximation. Despite these advances, however, no work to date has offered a way to quantify the testing performance of a DL-based algorithm employed to solve a pattern classification problem. To overcome this fundamental challenge in part, this paper exposes the fundamental testing performance limits of DL-based binary classifiers trained with hinge loss. For binary classifiers that are based on deep rectified linear unit (ReLU) feedforward neural networks (FNNs) and ones that are based on deep FNNs with ReLU and Tanh activation, we derive their respective novel asymptotic testing performance limits. The derived testing performance limits are validated by extensive computer experiments.
A Critical Escape Probability Formulation for Enhancing the Transient Stability of Power Systems with System Parameter Design
Authors: Authors: Xian Wu, Kaihua Xi, Aijie Cheng, Chenghui Zhang, Hai Xiang Lin
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2309.06803
Pdf link: https://arxiv.org/pdf/2309.06803
Abstract For the enhancement of the transient stability of power systems, the key is to define a quantitative optimization formulation with system parameters as decision variables. In this paper, we model the disturbances by Gaussian noise and define a metric named Critical Escape Probability (CREP) based on the invariant probability measure of a linearised stochastic processes. CREP characterizes the probability of the state escaping from a critical set. CREP involves all the system parameters and reflects the size of the basin of attraction of the nonlinear systems. An optimization framework that minimizes CREP with the system parameters as decision variablesis is presented. Simulations show that the mean first hitting time when the state hits the boundary of the critical set, that is often used to describe the stability of nonlinear systems, is dramatically increased by minimizing CREP. This indicates that the transient stability of the system is effectively enhanced. It also shown that suppressing the state fluctuations only is insufficient for enhancing the transient stability. In addition, the famous Braess' paradox which also exists in power systems is revisited. Surprisingly, it turned out that the paradoxes identified by the traditional metric may not exist according to CREP. This new metric opens a new avenue for the transient stability analysis of future power systems integrated with large amounts of renewable energy.
Supervised Machine Learning and Physics based Machine Learning approach for prediction of peak temperature distribution in Additive Friction Stir Deposition of Aluminium Alloy
Authors: Authors: Akshansh Mishra
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2309.06838
Pdf link: https://arxiv.org/pdf/2309.06838
Abstract Additive friction stir deposition (AFSD) is a novel solid-state additive manufacturing technique that circumvents issues of porosity, cracking, and properties anisotropy that plague traditional powder bed fusion and directed energy deposition approaches. However, correlations between process parameters, thermal profiles, and resulting microstructure in AFSD remain poorly understood. This hinders process optimization for properties. This work employs a cutting-edge framework combining supervised machine learning (SML) and physics-informed neural networks (PINNs) to predict peak temperature distribution in AFSD from process parameters. Eight regression algorithms were implemented for SML modeling, while four PINNs leveraged governing equations for transport, wave propagation, heat transfer, and quantum mechanics. Across multiple statistical measures, ensemble techniques like gradient boosting proved superior for SML, with lowest MSE of 165.78. The integrated ML approach was also applied to classify deposition quality from process factors, with logistic regression delivering robust accuracy. By fusing data-driven learning and fundamental physics, this dual methodology provides comprehensive insights into tailoring microstructure through thermal management in AFSD. The work demonstrates the power of bridging statistical and physics-based modeling for elucidating AM process-property relationships.
Lotaru: Locally Predicting Workflow Task Runtimes for Resource Management on Heterogeneous Infrastructures
Authors: Authors: Jonathan Bader, Fabian Lehmann, Lauritz Thamsen, Ulf Leser, Odej Kao
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2309.06918
Pdf link: https://arxiv.org/pdf/2309.06918
Abstract Many resource management techniques for task scheduling, energy and carbon efficiency, and cost optimization in workflows rely on a-priori task runtime knowledge. Building runtime prediction models on historical data is often not feasible in practice as workflows, their input data, and the cluster infrastructure change. Online methods, on the other hand, which estimate task runtimes on specific machines while the workflow is running, have to cope with a lack of measurements during start-up. Frequently, scientific workflows are executed on heterogeneous infrastructures consisting of machines with different CPU, I/O, and memory configurations, further complicating predicting runtimes due to different task runtimes on different machine types. This paper presents Lotaru, a method for locally predicting the runtimes of scientific workflow tasks before they are executed on heterogeneous compute clusters. Crucially, our approach does not rely on historical data and copes with a lack of training data during the start-up. To this end, we use microbenchmarks, reduce the input data to quickly profile the workflow locally, and predict a task's runtime with a Bayesian linear regression based on the gathered data points from the local workflow execution and the microbenchmarks. Due to its Bayesian approach, Lotaru provides uncertainty estimates that can be used for advanced scheduling methods on distributed cluster infrastructures. In our evaluation with five real-world scientific workflows, our method outperforms two state-of-the-art runtime prediction baselines and decreases the absolute prediction error by more than 12.5%. In a second set of experiments, the prediction performance of our method, using the predicted runtimes for state-of-the-art scheduling, carbon reduction, and cost prediction, enables results close to those achieved with perfect prior knowledge of runtimes.
Investigating the Impact of Action Representations in Policy Gradient Algorithms
Authors: Authors: Jan Schneider, Pierre Schumacher, Daniel Häufle, Bernhard Schölkopf, Dieter Büchler
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.06921
Pdf link: https://arxiv.org/pdf/2309.06921
Abstract Reinforcement learning~(RL) is a versatile framework for learning to solve complex real-world tasks. However, influences on the learning performance of RL algorithms are often poorly understood in practice. We discuss different analysis techniques and assess their effectiveness for investigating the impact of action representations in RL. Our experiments demonstrate that the action representation can significantly influence the learning performance on popular RL benchmark tasks. The analysis results indicate that some of the performance differences can be attributed to changes in the complexity of the optimization landscape. Finally, we discuss open challenges of analysis techniques for RL algorithms.
Instance Adaptive Prototypical Contrastive Embedding for Generalized Zero Shot Learning
Authors: Authors: Riti Paul, Sahil Vora, Baoxin Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.06987
Pdf link: https://arxiv.org/pdf/2309.06987
Abstract Generalized zero-shot learning(GZSL) aims to classify samples from seen and unseen labels, assuming unseen labels are not accessible during training. Recent advancements in GZSL have been expedited by incorporating contrastive-learning-based (instance-based) embedding in generative networks and leveraging the semantic relationship between data points. However, existing embedding architectures suffer from two limitations: (1) limited discriminability of synthetic features' embedding without considering fine-grained cluster structures; (2) inflexible optimization due to restricted scaling mechanisms on existing contrastive embedding networks, leading to overlapped representations in the embedding space. To enhance the quality of representations in the embedding space, as mentioned in (1), we propose a margin-based prototypical contrastive learning embedding network that reaps the benefits of prototype-data (cluster quality enhancement) and implicit data-data (fine-grained representations) interaction while providing substantial cluster supervision to the embedding network and the generator. To tackle (2), we propose an instance adaptive contrastive loss that leads to generalized representations for unseen labels with increased inter-class margin. Through comprehensive experimental evaluation, we show that our method can outperform the current state-of-the-art on three benchmark datasets. Our approach also consistently achieves the best unseen performance in the GZSL setting.
Efficient Reinforcement Learning for Jumping Monopods
Authors: Authors: Riccardo Bussola, Michele Focchi, Andrea Del Prete, Daniele Fontanelli, Luigi Palopoli
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2309.07038
Pdf link: https://arxiv.org/pdf/2309.07038
Abstract In this work, we consider the complex control problem of making a monopod reach a target with a jump. The monopod can jump in any direction and the terrain underneath its foot can be uneven. This is a template of a much larger class of problems, which are extremely challenging and computationally expensive to solve using standard optimisation-based techniques. Reinforcement Learning (RL) could be an interesting alternative, but the application of an end-to-end approach in which the controller must learn everything from scratch, is impractical. The solution advocated in this paper is to guide the learning process within an RL framework by injecting physical knowledge. This expedient brings to widespread benefits, such as a drastic reduction of the learning time, and the ability to learn and compensate for possible errors in the low-level controller executing the motion. We demonstrate the advantage of our approach with respect to both optimization-based and end-to-end RL approaches.
Large Language Models for Compiler Optimization
Authors: Authors: Chris Cummins, Volker Seeker, Dejan Grubisic, Mostafa Elhoushi, Youwei Liang, Baptiste Roziere, Jonas Gehring, Fabian Gloeckle, Kim Hazelwood, Gabriel Synnaeve, Hugh Leather
Subjects: Programming Languages (cs.PL); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.07062
Pdf link: https://arxiv.org/pdf/2309.07062
Abstract We explore the novel application of Large Language Models to code optimization. We present a 7B-parameter transformer model trained from scratch to optimize LLVM assembly for code size. The model takes as input unoptimized assembly and outputs a list of compiler options to best optimize the program. Crucially, during training, we ask the model to predict the instruction counts before and after optimization, and the optimized code itself. These auxiliary learning tasks significantly improve the optimization performance of the model and improve the model's depth of understanding. We evaluate on a large suite of test programs. Our approach achieves a 3.0% improvement in reducing instruction counts over the compiler, outperforming two state-of-the-art baselines that require thousands of compilations. Furthermore, the model shows surprisingly strong code reasoning abilities, generating compilable code 91% of the time and perfectly emulating the output of the compiler 70% of the time.
Characterizing Speed Performance of Multi-Agent Reinforcement Learning
Authors: Authors: Samuel Wiggins, Yuan Meng, Rajgopal Kannan, Viktor Prasanna
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Arxiv link: https://arxiv.org/abs/2309.07108
Pdf link: https://arxiv.org/pdf/2309.07108
Abstract Multi-Agent Reinforcement Learning (MARL) has achieved significant success in large-scale AI systems and big-data applications such as smart grids, surveillance, etc. Existing advancements in MARL algorithms focus on improving the rewards obtained by introducing various mechanisms for inter-agent cooperation. However, these optimizations are usually compute- and memory-intensive, thus leading to suboptimal speed performance in end-to-end training time. In this work, we analyze the speed performance (i.e., latency-bounded throughput) as the key metric in MARL implementations. Specifically, we first introduce a taxonomy of MARL algorithms from an acceleration perspective categorized by (1) training scheme and (2) communication method. Using our taxonomy, we identify three state-of-the-art MARL algorithms - Multi-Agent Deep Deterministic Policy Gradient (MADDPG), Target-oriented Multi-agent Communication and Cooperation (ToM2C), and Networked Multi-Agent RL (NeurComm) - as target benchmark algorithms, and provide a systematic analysis of their performance bottlenecks on a homogeneous multi-core CPU platform. We justify the need for MARL latency-bounded throughput to be a key performance metric in future literature while also addressing opportunities for parallelization and acceleration.
Tree-Structured Shading Decomposition
Authors: Authors: Chen Geng, Hong-Xing Yu, Sharon Zhang, Maneesh Agrawala, Jiajun Wu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2309.07122
Pdf link: https://arxiv.org/pdf/2309.07122
Abstract We study inferring a tree-structured representation from a single image for object shading. Prior work typically uses the parametric or measured representation to model shading, which is neither interpretable nor easily editable. We propose using the shade tree representation, which combines basic shading nodes and compositing methods to factorize object surface shading. The shade tree representation enables novice users who are unfamiliar with the physical shading process to edit object shading in an efficient and intuitive manner. A main challenge in inferring the shade tree is that the inference problem involves both the discrete tree structure and the continuous parameters of the tree nodes. We propose a hybrid approach to address this issue. We introduce an auto-regressive inference model to generate a rough estimation of the tree structure and node parameters, and then we fine-tune the inferred shade tree through an optimization algorithm. We show experiments on synthetic images, captured reflectance, real images, and non-realistic vector drawings, allowing downstream applications such as material editing, vectorized shading, and relighting. Project website: https://chen-geng.com/inv-shade-trees
Keyword: adam

There is no result

Keyword: gradient

A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale
Authors: Authors: Hao-Jun Michael Shi, Tsung-Hsien Lee, Shintaro Iwasaki, Jose Gallego-Posada, Zhijing Li, Kaushik Rangadurai, Dheevatsa Mudigere, Michael Rabbat
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Mathematical Software (cs.MS); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2309.06497
Pdf link: https://arxiv.org/pdf/2309.06497
Abstract Shampoo is an online and stochastic optimization algorithm belonging to the AdaGrad family of methods for training neural networks. It constructs a block-diagonal preconditioner where each block consists of a coarse Kronecker product approximation to full-matrix AdaGrad for each parameter of the neural network. In this work, we provide a complete description of the algorithm as well as the performance optimizations that our implementation leverages to train deep networks at-scale in PyTorch. Our implementation enables fast multi-GPU distributed data-parallel training by distributing the memory and computation associated with blocks of each parameter via PyTorch's DTensor data structure and performing an AllGather primitive on the computed search directions at each iteration. This major performance enhancement enables us to achieve at most a 10% performance reduction in per-step wall-clock time compared against standard diagonal-scaling-based adaptive gradient methods. We validate our implementation by performing an ablation study on training ImageNet ResNet50, demonstrating Shampoo's superiority over standard training recipes with minimal hyperparameter tuning.
Machine Translation Models Stand Strong in the Face of Adversarial Attacks
Authors: Authors: Pavel Burnyshev, Elizaveta Kostenok, Alexey Zaytsev
Subjects: Computation and Language (cs.CL); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.06527
Pdf link: https://arxiv.org/pdf/2309.06527
Abstract Adversarial attacks expose vulnerabilities of deep learning models by introducing minor perturbations to the input, which lead to substantial alterations in the output. Our research focuses on the impact of such adversarial attacks on sequence-to-sequence (seq2seq) models, specifically machine translation models. We introduce algorithms that incorporate basic text perturbation heuristics and more advanced strategies, such as the gradient-based attack, which utilizes a differentiable approximation of the inherently non-differentiable translation metric. Through our investigation, we provide evidence that machine translation models display robustness displayed robustness against best performed known adversarial attacks, as the degree of perturbation in the output is directly proportional to the perturbation in the input. However, among underdogs, our attacks outperform alternatives, providing the best relative performance. Another strong candidate is an attack based on mixing of individual characters.
Strong-Weak Integrated Semi-supervision for Unsupervised Single and Multi Target Domain Adaptation
Authors: Authors: Xiaohu Lu, Hayder Radha
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.06528
Pdf link: https://arxiv.org/pdf/2309.06528
Abstract Unsupervised domain adaptation (UDA) focuses on transferring knowledge learned in the labeled source domain to the unlabeled target domain. Despite significant progress that has been achieved in single-target domain adaptation for image classification in recent years, the extension from single-target to multi-target domain adaptation is still a largely unexplored problem area. In general, unsupervised domain adaptation faces a major challenge when attempting to learn reliable information from a single unlabeled target domain. Increasing the number of unlabeled target domains further exacerbate the problem rather significantly. In this paper, we propose a novel strong-weak integrated semi-supervision (SWISS) learning strategy for image classification using unsupervised domain adaptation that works well for both single-target and multi-target scenarios. Under the proposed SWISS-UDA framework, a strong representative set with high confidence but low diversity target domain samples and a weak representative set with low confidence but high diversity target domain samples are updated constantly during the training process. Both sets are fused to generate an augmented strong-weak training batch with pseudo-labels to train the network during every iteration. The extension from single-target to multi-target domain adaptation is accomplished by exploring the class-wise distance relationship between domains and replacing the strong representative set with much stronger samples from peer domains via peer scaffolding. Moreover, a novel adversarial logit loss is proposed to reduce the intra-class divergence between source and target domains, which is back-propagated adversarially with a gradient reverse layer between the classifier and the rest of the network. Experimental results based on three benchmarks, Office-31, Office-Home, and DomainNet, show the effectiveness of the proposed SWISS framework.
Explainable Graph Neural Network for Alzheimer's Disease And Related Dementias Risk Prediction
Authors: Authors: Xinyue Hu (1), Zenan Sun (1), Yi Nian (1), Yifang Dang (1), Fang Li (1), Jingna Feng (1), Evan Yu (1), Cui Tao (1) ((1) McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA)
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.06584
Pdf link: https://arxiv.org/pdf/2309.06584
Abstract Alzheimer's disease and related dementias (ADRD) ranks as the sixth leading cause of death in the US, underlining the importance of accurate ADRD risk prediction. While recent advancement in ADRD risk prediction have primarily relied on imaging analysis, yet not all patients undergo medical imaging before an ADRD diagnosis. Merging machine learning with claims data can reveal additional risk factors and uncover interconnections among diverse medical codes. Our goal is to utilize Graph Neural Networks (GNNs) with claims data for ADRD risk prediction. Addressing the lack of human-interpretable reasons behind these predictions, we introduce an innovative method to evaluate relationship importance and its influence on ADRD risk prediction, ensuring comprehensive interpretation. We employed Variationally Regularized Encoder-decoder Graph Neural Network (VGNN) for estimating ADRD likelihood. We created three scenarios to assess the model's efficiency, using Random Forest and Light Gradient Boost Machine as baselines. We further used our relation importance method to clarify the key relationships for ADRD risk prediction. VGNN surpassed other baseline models by 10% in the area under the receiver operating characteristic. The integration of the GNN model and relation importance interpretation could potentially play an essential role in providing valuable insight into factors that may contribute to or delay ADRD progression. Employing a GNN approach with claims data enhances ADRD risk prediction and provides insights into the impact of interconnected medical code relationships. This methodology not only enables ADRD risk modeling but also shows potential for other image analysis predictions using claims data.
Epistemic Modeling Uncertainty of Rapid Neural Network Ensembles for Adaptive Learning
Authors: Authors: Atticus Beachy (1), Harok Bae (1), Jose Camberos (2), Ramana Grandhi (2) ((1) Wright State University, Dayton, OH, USA (2) Air Force Institute of Technology, Wright-Patterson AFB, OH, USA)
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.06628
Pdf link: https://arxiv.org/pdf/2309.06628
Abstract Emulator embedded neural networks, which are a type of physics informed neural network, leverage multi-fidelity data sources for efficient design exploration of aerospace engineering systems. Multiple realizations of the neural network models are trained with different random initializations. The ensemble of model realizations is used to assess epistemic modeling uncertainty caused due to lack of training samples. This uncertainty estimation is crucial information for successful goal-oriented adaptive learning in an aerospace system design exploration. However, the costs of training the ensemble models often become prohibitive and pose a computational challenge, especially when the models are not trained in parallel during adaptive learning. In this work, a new type of emulator embedded neural network is presented using the rapid neural network paradigm. Unlike the conventional neural network training that optimizes the weights and biases of all the network layers by using gradient-based backpropagation, rapid neural network training adjusts only the last layer connection weights by applying a linear regression technique. It is found that the proposed emulator embedded neural network trains near-instantaneously, typically without loss of prediction accuracy. The proposed method is demonstrated on multiple analytical examples, as well as an aerospace flight parameter study of a generic hypersonic vehicle.
Dissipative Imitation Learning for Discrete Dynamic Output Feedback Control with Sparse Data Sets
Authors: Authors: Amy K. Strong, Ethan J. LoCicero, Leila J. Bridgeman
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.06658
Pdf link: https://arxiv.org/pdf/2309.06658
Abstract Imitation learning enables the synthesis of controllers for complex objectives and highly uncertain plant models. However, methods to provide stability guarantees to imitation learned controllers often rely on large amounts of data and/or known plant models. In this paper, we explore an input-output (IO) stability approach to dissipative imitation learning, which achieves stability with sparse data sets and with little known about the plant model. A closed-loop stable dynamic output feedback controller is learned using expert data, a coarse IO plant model, and a new constraint to enforce dissipativity on the learned controller. While the learning objective is nonconvex, iterative convex overbounding (ICO) and projected gradient descent (PGD) are explored as methods to successfully learn the controller. This new imitation learning method is applied to two unknown plants and compared to traditionally learned dynamic output feedback controller and neural network controller. With little knowledge of the plant model and a small data set, the dissipativity constrained learned controller achieves closed loop stability and successfully mimics the behavior of the expert controller, while other methods often fail to maintain stability and achieve good performance.
Generalizable Neural Fields as Partially Observed Neural Processes
Authors: Authors: Jeffrey Gu, Kuan-Chieh Wang, Serena Yeung
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.06660
Pdf link: https://arxiv.org/pdf/2309.06660
Abstract Neural fields, which represent signals as a function parameterized by a neural network, are a promising alternative to traditional discrete vector or grid-based representations. Compared to discrete representations, neural representations both scale well with increasing resolution, are continuous, and can be many-times differentiable. However, given a dataset of signals that we would like to represent, having to optimize a separate neural field for each signal is inefficient, and cannot capitalize on shared information or structures among signals. Existing generalization methods view this as a meta-learning problem and employ gradient-based meta-learning to learn an initialization which is then fine-tuned with test-time optimization, or learn hypernetworks to produce the weights of a neural field. We instead propose a new paradigm that views the large-scale training of neural representations as a part of a partially-observed neural process framework, and leverage neural process algorithms to solve this task. We demonstrate that this approach outperforms both state-of-the-art gradient-based meta-learning approaches and hypernetwork approaches.
Attention Loss Adjusted Prioritized Experience Replay
Authors: Authors: Zhuoying Chen, Huiping Li, Rizhong Wang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2309.06684
Pdf link: https://arxiv.org/pdf/2309.06684
Abstract Prioritized Experience Replay (PER) is a technical means of deep reinforcement learning by selecting experience samples with more knowledge quantity to improve the training rate of neural network. However, the non-uniform sampling used in PER inevitably shifts the state-action space distribution and brings the estimation error of Q-value function. In this paper, an Attention Loss Adjusted Prioritized (ALAP) Experience Replay algorithm is proposed, which integrates the improved Self-Attention network with Double-Sampling mechanism to fit the hyperparameter that can regulate the importance sampling weights to eliminate the estimation error caused by PER. In order to verify the effectiveness and generality of the algorithm, the ALAP is tested with value-function based, policy-gradient based and multi-agent reinforcement learning algorithms in OPENAI gym, and comparison studies verify the advantage and efficiency of the proposed training framework.
Tackling the Non-IID Issue in Heterogeneous Federated Learning by Gradient Harmonization
Authors: Authors: Xinyu Zhang, Weiyu Sun, Ying Chen
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2309.06692
Pdf link: https://arxiv.org/pdf/2309.06692
Abstract Federated learning (FL) is a privacy-preserving paradigm for collaboratively training a global model from decentralized clients. However, the performance of FL is hindered by non-independent and identically distributed (non-IID) data and device heterogeneity. In this work, we revisit this key challenge through the lens of gradient conflicts on the server side. Specifically, we first investigate the gradient conflict phenomenon among multiple clients and reveal that stronger heterogeneity leads to more severe gradient conflicts. To tackle this issue, we propose FedGH, a simple yet effective method that mitigates local drifts through Gradient Harmonization. This technique projects one gradient vector onto the orthogonal plane of the other within conflicting client pairs. Extensive experiments demonstrate that FedGH consistently enhances multiple state-of-the-art FL baselines across diverse benchmarks and non-IID scenarios. Notably, FedGH yields more significant improvements in scenarios with stronger heterogeneity. As a plug-and-play module, FedGH can be seamlessly integrated into any FL framework without requiring hyperparameter tuning.
DP-Forward: Fine-tuning and Inference on Language Models with Differential Privacy in Forward Pass
Authors: Authors: Minxin Du, Xiang Yue, Sherman S. M. Chow, Tianhao Wang, Chenyu Huang, Huan Sun
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2309.06746
Pdf link: https://arxiv.org/pdf/2309.06746
Abstract Differentially private stochastic gradient descent (DP-SGD) adds noise to gradients in back-propagation, safeguarding training data from privacy leakage, particularly membership inference. It fails to cover (inference-time) threats like embedding inversion and sensitive attribute inference. It is also costly in storage and computation when used to fine-tune large pre-trained language models (LMs). We propose DP-Forward, which directly perturbs embedding matrices in the forward pass of LMs. It satisfies stringent local DP requirements for training and inference data. To instantiate it using the smallest matrix-valued noise, we devise an analytic matrix Gaussian~mechanism (aMGM) by drawing possibly non-i.i.d. noise from a matrix Gaussian distribution. We then investigate perturbing outputs from different hidden (sub-)layers of LMs with aMGM noises. Its utility on three typical tasks almost hits the non-private baseline and outperforms DP-SGD by up to 7.7pp at a moderate privacy level. It saves 3$\times$ time and memory costs compared to DP-SGD with the latest high-speed library. It also reduces the average success rates of embedding inversion and sensitive attribute inference by up to 88pp and 41pp, respectively, whereas DP-SGD fails.
Supervised Machine Learning and Physics based Machine Learning approach for prediction of peak temperature distribution in Additive Friction Stir Deposition of Aluminium Alloy
Authors: Authors: Akshansh Mishra
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2309.06838
Pdf link: https://arxiv.org/pdf/2309.06838
Abstract Additive friction stir deposition (AFSD) is a novel solid-state additive manufacturing technique that circumvents issues of porosity, cracking, and properties anisotropy that plague traditional powder bed fusion and directed energy deposition approaches. However, correlations between process parameters, thermal profiles, and resulting microstructure in AFSD remain poorly understood. This hinders process optimization for properties. This work employs a cutting-edge framework combining supervised machine learning (SML) and physics-informed neural networks (PINNs) to predict peak temperature distribution in AFSD from process parameters. Eight regression algorithms were implemented for SML modeling, while four PINNs leveraged governing equations for transport, wave propagation, heat transfer, and quantum mechanics. Across multiple statistical measures, ensemble techniques like gradient boosting proved superior for SML, with lowest MSE of 165.78. The integrated ML approach was also applied to classify deposition quality from process factors, with logistic regression delivering robust accuracy. By fusing data-driven learning and fundamental physics, this dual methodology provides comprehensive insights into tailoring microstructure through thermal management in AFSD. The work demonstrates the power of bridging statistical and physics-based modeling for elucidating AM process-property relationships.
PhantomSound: Black-Box, Query-Efficient Audio Adversarial Attack via Split-Second Phoneme Injection
Authors: Authors: Hanqing Guo, Guangjing Wang, Yuanda Wang, Bocheng Chen, Qiben Yan, Li Xiao
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2309.06960
Pdf link: https://arxiv.org/pdf/2309.06960
Abstract In this paper, we propose PhantomSound, a query-efficient black-box attack toward voice assistants. Existing black-box adversarial attacks on voice assistants either apply substitution models or leverage the intermediate model output to estimate the gradients for crafting adversarial audio samples. However, these attack approaches require a significant amount of queries with a lengthy training stage. PhantomSound leverages the decision-based attack to produce effective adversarial audios, and reduces the number of queries by optimizing the gradient estimation. In the experiments, we perform our attack against 4 different speech-to-text APIs under 3 real-world scenarios to demonstrate the real-time attack impact. The results show that PhantomSound is practical and robust in attacking 5 popular commercial voice controllable devices over the air, and is able to bypass 3 liveness detection mechanisms with >95% success rate. The benchmark result shows that PhantomSound can generate adversarial examples and launch the attack in a few minutes. We significantly enhance the query efficiency and reduce the cost of a successful untargeted and targeted adversarial attack by 93.1% and 65.5% compared with the state-of-the-art black-box attacks, using merely ~300 queries (~5 minutes) and ~1,500 queries (~25 minutes), respectively.
Differentiable JPEG: The Devil is in the Details
Authors: Authors: Christoph Reich, Biplob Debnath, Deep Patel, Srimat Chakradhar
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Arxiv link: https://arxiv.org/abs/2309.06978
Pdf link: https://arxiv.org/pdf/2309.06978
Abstract JPEG remains one of the most widespread lossy image coding methods. However, the non-differentiable nature of JPEG restricts the application in deep learning pipelines. Several differentiable approximations of JPEG have recently been proposed to address this issue. This paper conducts a comprehensive review of existing diff. JPEG approaches and identifies critical details that have been missed by previous methods. To this end, we propose a novel diff. JPEG approach, overcoming previous limitations. Our approach is differentiable w.r.t. the input image, the JPEG quality, the quantization tables, and the color conversion parameters. We evaluate the forward and backward performance of our diff. JPEG approach against existing methods. Additionally, extensive ablations are performed to evaluate crucial design choices. Our proposed diff. JPEG resembles the (non-diff.) reference implementation best, significantly surpassing the recent-best diff. approach by $3.47$dB (PSNR) on average. For strong compression rates, we can even improve PSNR by $9.51$dB. Strong adversarial attack results are yielded by our diff. JPEG, demonstrating the effective gradient approximation. Our code is available at https://github.com/necla-ml/Diff-JPEG.
APICom: Automatic API Completion via Prompt Learning and Adversarial Training-based Data Augmentation
Authors: Authors: Yafeng Gu, Yiheng Shen, Xiang Chen, Shaoyu Yang, Yiling Huang, Zhixiang Cao
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2309.07026
Pdf link: https://arxiv.org/pdf/2309.07026
Abstract Based on developer needs and usage scenarios, API (Application Programming Interface) recommendation is the process of assisting developers in finding the required API among numerous candidate APIs. Previous studies mainly modeled API recommendation as the recommendation task, which can recommend multiple candidate APIs for the given query, and developers may not yet be able to find what they need. Motivated by the neural machine translation research domain, we can model this problem as the generation task, which aims to directly generate the required API for the developer query. After our preliminary investigation, we find the performance of this intuitive approach is not promising. The reason is that there exists an error when generating the prefixes of the API. However, developers may know certain API prefix information during actual development in most cases. Therefore, we model this problem as the automatic completion task and propose a novel approach APICom based on prompt learning, which can generate API related to the query according to the prompts (i.e., API prefix information). Moreover, the effectiveness of APICom highly depends on the quality of the training dataset. In this study, we further design a novel gradient-based adversarial training method {\atpart} for data augmentation, which can improve the normalized stability when generating adversarial examples. To evaluate the effectiveness of APICom, we consider a corpus of 33k developer queries and corresponding APIs. Compared with the state-of-the-art baselines, our experimental results show that APICom can outperform all baselines by at least 40.02\%, 13.20\%, and 16.31\% in terms of the performance measures EM@1, MRR, and MAP. Finally, our ablation studies confirm the effectiveness of our component setting (such as our designed adversarial training method, our used pre-trained model, and prompt learning) in APICom.
Multi-Robot Informative Path Planning from Regression with Sparse Gaussian Processes
Authors: Authors: Kalvik Jakkala, Srinivas Akella
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.07050
Pdf link: https://arxiv.org/pdf/2309.07050
Abstract This paper addresses multi-robot informative path planning (IPP) for environmental monitoring. The problem involves determining informative regions in the environment that should be visited by robots in order to gather the most information about the environment. We propose an efficient sparse Gaussian process-based approach that uses gradient descent to optimize paths in continuous environments. Our approach efficiently scales to both spatially and spatio-temporally correlated environments. Moreover, our approach can simultaneously optimize the informative paths while accounting for routing constraints, such as a distance budget and limits on the robot's velocity and acceleration. Our approach can be used for IPP with both discrete and continuous sensing robots, with point and non-point field-of-view sensing shapes, and for multi-robot IPP. The proposed approach is demonstrated to be fast and accurate on real-world data.
Characterizing Speed Performance of Multi-Agent Reinforcement Learning
Authors: Authors: Samuel Wiggins, Yuan Meng, Rajgopal Kannan, Viktor Prasanna
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Arxiv link: https://arxiv.org/abs/2309.07108
Pdf link: https://arxiv.org/pdf/2309.07108
Abstract Multi-Agent Reinforcement Learning (MARL) has achieved significant success in large-scale AI systems and big-data applications such as smart grids, surveillance, etc. Existing advancements in MARL algorithms focus on improving the rewards obtained by introducing various mechanisms for inter-agent cooperation. However, these optimizations are usually compute- and memory-intensive, thus leading to suboptimal speed performance in end-to-end training time. In this work, we analyze the speed performance (i.e., latency-bounded throughput) as the key metric in MARL implementations. Specifically, we first introduce a taxonomy of MARL algorithms from an acceleration perspective categorized by (1) training scheme and (2) communication method. Using our taxonomy, we identify three state-of-the-art MARL algorithms - Multi-Agent Deep Deterministic Policy Gradient (MADDPG), Target-oriented Multi-agent Communication and Cooperation (ToM2C), and Networked Multi-Agent RL (NeurComm) - as target benchmark algorithms, and provide a systematic analysis of their performance bottlenecks on a homogeneous multi-core CPU platform. We justify the need for MARL latency-bounded throughput to be a key performance metric in future literature while also addressing opportunities for parallelization and acceleration.
RAIN: Your Language Models Can Align Themselves without Finetuning
Authors: Authors: Yuhui Li, Fangyun Wei, Jinjing Zhao, Chao Zhang, Hongyang Zhang
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2309.07124
Pdf link: https://arxiv.org/pdf/2309.07124
Abstract Large language models (LLMs) often demonstrate inconsistencies with human preferences. Previous research gathered human preference data and then aligned the pre-trained models using reinforcement learning or instruction tuning, the so-called finetuning step. In contrast, aligning frozen LLMs without any extra data is more appealing. This work explores the potential of the latter setting. We discover that by integrating self-evaluation and rewind mechanisms, unaligned LLMs can directly produce responses consistent with human preferences via self-boosting. We introduce a novel inference method, Rewindable Auto-regressive INference (RAIN), that allows pre-trained LLMs to evaluate their own generation and use the evaluation results to guide backward rewind and forward generation for AI safety. Notably, RAIN operates without the need of extra data for model alignment and abstains from any training, gradient computation, or parameter updates; during the self-evaluation phase, the model receives guidance on which human preference to align with through a fixed-template prompt, eliminating the need to modify the initial prompt. Experimental results evaluated by GPT-4 and humans demonstrate the effectiveness of RAIN: on the HH dataset, RAIN improves the harmlessness rate of LLaMA 30B over vanilla inference from 82% to 97%, while maintaining the helpfulness rate. Under the leading adversarial attack llm-attacks on Vicuna 33B, RAIN establishes a new defense baseline by reducing the attack success rate from 94% to 19%.
Keyword: super-resolution

Deep Nonparametric Convexified Filtering for Computational Photography, Image Synthesis and Adversarial Defense
Authors: Authors: Jianqiao Wangni
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Optimization and Control (math.OC); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2309.06724
Pdf link: https://arxiv.org/pdf/2309.06724
Abstract We aim to provide a general framework of for computational photography that recovers the real scene from imperfect images, via the Deep Nonparametric Convexified Filtering (DNCF). It is consists of a nonparametric deep network to resemble the physical equations behind the image formation, such as denoising, super-resolution, inpainting, and flash. DNCF has no parameterization dependent on training data, therefore has a strong generalization and robustness to adversarial image manipulation. During inference, we also encourage the network parameters to be nonnegative and create a bi-convex function on the input and parameters, and this adapts to second-order optimization algorithms with insufficient running time, having 10X acceleration over Deep Image Prior. With these tools, we empirically verify its capability to defend image classification deep networks against adversary attack algorithms in real-time.

zoq / arxiv-updates

New submissions for Thu, 14 Sep 23 #600

Keyword: sgd

DP-Forward: Fine-tuning and Inference on Language Models with Differential Privacy in Forward Pass

Keyword: optimization

A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale

Distributionally Robust Transfer Learning

Offline Prompt Evaluation and Optimization with Inverse Reinforcement Learning

Harmonic-NAS: Hardware-Aware Multimodal Neural Architecture Search on Resource-constrained Devices

RT-LM: Uncertainty-Aware Resource Management for Real-Time Inference of Language Models

Bregman Graph Neural Network

Statistical Rejection Sampling Improves Preference Optimization

Generalizable Neural Fields as Partially Observed Neural Processes

Federated PAC-Bayesian Learning on Non-IID data

Deep Nonparametric Convexified Filtering for Computational Photography, Image Synthesis and Adversarial Defense

Hierarchical Time-Optimal Planning for Multi-Vehicle Racing

Fundamental Limits of Deep Learning-Based Binary Classifiers Trained with Hinge Loss

A Critical Escape Probability Formulation for Enhancing the Transient Stability of Power Systems with System Parameter Design

Supervised Machine Learning and Physics based Machine Learning approach for prediction of peak temperature distribution in Additive Friction Stir Deposition of Aluminium Alloy

Lotaru: Locally Predicting Workflow Task Runtimes for Resource Management on Heterogeneous Infrastructures

Investigating the Impact of Action Representations in Policy Gradient Algorithms

Instance Adaptive Prototypical Contrastive Embedding for Generalized Zero Shot Learning

Efficient Reinforcement Learning for Jumping Monopods

Large Language Models for Compiler Optimization

Characterizing Speed Performance of Multi-Agent Reinforcement Learning

Tree-Structured Shading Decomposition

Keyword: adam

Keyword: gradient

A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale

Machine Translation Models Stand Strong in the Face of Adversarial Attacks

Strong-Weak Integrated Semi-supervision for Unsupervised Single and Multi Target Domain Adaptation

Explainable Graph Neural Network for Alzheimer's Disease And Related Dementias Risk Prediction

Epistemic Modeling Uncertainty of Rapid Neural Network Ensembles for Adaptive Learning

Dissipative Imitation Learning for Discrete Dynamic Output Feedback Control with Sparse Data Sets

Generalizable Neural Fields as Partially Observed Neural Processes

Attention Loss Adjusted Prioritized Experience Replay

Tackling the Non-IID Issue in Heterogeneous Federated Learning by Gradient Harmonization

DP-Forward: Fine-tuning and Inference on Language Models with Differential Privacy in Forward Pass

Supervised Machine Learning and Physics based Machine Learning approach for prediction of peak temperature distribution in Additive Friction Stir Deposition of Aluminium Alloy

PhantomSound: Black-Box, Query-Efficient Audio Adversarial Attack via Split-Second Phoneme Injection

Differentiable JPEG: The Devil is in the Details

APICom: Automatic API Completion via Prompt Learning and Adversarial Training-based Data Augmentation

Multi-Robot Informative Path Planning from Regression with Sparse Gaussian Processes

Characterizing Speed Performance of Multi-Agent Reinforcement Learning

RAIN: Your Language Models Can Align Themselves without Finetuning

Keyword: super-resolution

Deep Nonparametric Convexified Filtering for Computational Photography, Image Synthesis and Adversarial Defense