New submissions for Mon, 23 Oct 23

Keyword: sgd

LASER: Linear Compression in Wireless Distributed Optimization

Authors: Authors: Ashok Vardhan Makkuva, Marco Bondaschi, Thijs Vogels, Martin Jaggi, Hyeji Kim, Michael C. Gastpar
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.13033
Pdf link: https://arxiv.org/pdf/2310.13033
Abstract Data-parallel SGD is the de facto algorithm for distributed optimization, especially for large scale machine learning. Despite its merits, communication bottleneck is one of its persistent issues. Most compression schemes to alleviate this either assume noiseless communication links, or fail to achieve good performance on practical tasks. In this paper, we close this gap and introduce LASER: LineAr CompreSsion in WirEless DistRibuted Optimization. LASER capitalizes on the inherent low-rank structure of gradients and transmits them efficiently over the noisy channels. Whilst enjoying theoretical guarantees similar to those of the classical SGD, LASER shows consistent gains over baselines on a variety of practical benchmarks. In particular, it outperforms the state-of-the-art compression schemes on challenging computer vision and GPT language modeling tasks. On the latter, we obtain $50$-$64 \%$ improvement in perplexity over our baselines for noisy channels.
Numerical approximation of McKean-Vlasov SDEs via stochastic gradient descent
Authors: Authors: Ankush Agarwal, Andrea Amato, Goncalo dos Reis, Stefano Pagliarani
Subjects: Numerical Analysis (math.NA); Probability (math.PR)
Arxiv link: https://arxiv.org/abs/2310.13579
Pdf link: https://arxiv.org/pdf/2310.13579
Abstract We propose a novel approach to numerically approximate McKean-Vlasov stochastic differential equations (MV-SDE) using stochastic gradient descent (SGD) while avoiding the use of interacting particle systems. The technique of SGD is deployed to solve a Euclidean minimization problem, which is obtained by first representing the MV-SDE as a minimization problem over the set of continuous functions of time, and then by approximating the domain with a finite-dimensional subspace. Convergence is established by proving certain intermediate stability and moment estimates of the relevant stochastic processes (including the tangent ones). Numerical experiments illustrate the competitive performance of our SGD based method compared to the IPS benchmarks. This work offers a theoretical foundation for using the SGD method in the context of numerical approximation of MV-SDEs, and provides analytical tools to study its stability and convergence.
Keyword: optimization

Adaptive Dynamic Programming for Energy-Efficient Base Station Cell Switching
Authors: Authors: Junliang Luo, Yi Tian Xu, Di Wu, Michael Jenkin, Xue Liu, Gregory Dudek
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.12999
Pdf link: https://arxiv.org/pdf/2310.12999
Abstract Energy saving in wireless networks is growing in importance due to increasing demand for evolving new-gen cellular networks, environmental and regulatory concerns, and potential energy crises arising from geopolitical tensions. In this work, we propose an approximate dynamic programming (ADP)-based method coupled with online optimization to switch on/off the cells of base stations to reduce network power consumption while maintaining adequate Quality of Service (QoS) metrics. We use a multilayer perceptron (MLP) given each state-action pair to predict the power consumption to approximate the value function in ADP for selecting the action with optimal expected power saved. To save the largest possible power consumption without deteriorating QoS, we include another MLP to predict QoS and a long short-term memory (LSTM) for predicting handovers, incorporated into an online optimization algorithm producing an adaptive QoS threshold for filtering cell switching actions based on the overall QoS history. The performance of the method is evaluated using a practical network simulator with various real-world scenarios with dynamic traffic patterns.
Compositional preference models for aligning LMs
Authors: Authors: Dongyoung Go, Tomasz Korbak, Germán Kruszewski, Jos Rozen, Marc Dymetman
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.13011
Pdf link: https://arxiv.org/pdf/2310.13011
Abstract As language models (LMs) become more capable, it is increasingly important to align them with human preferences. However, the dominant paradigm for training Preference Models (PMs) for that purpose suffers from fundamental limitations, such as lack of transparency and scalability, along with susceptibility to overfitting the preference dataset. We propose Compositional Preference Models (CPMs), a novel PM framework that decomposes one global preference assessment into several interpretable features, obtains scalar scores for these features from a prompted LM, and aggregates these scores using a logistic regression classifier. CPMs allow to control which properties of the preference data are used to train the preference model and to build it based on features that are believed to underlie the human preference judgment. Our experiments show that CPMs not only improve generalization and are more robust to overoptimization than standard PMs, but also that best-of-n samples obtained using CPMs tend to be preferred over samples obtained using conventional PMs. Overall, our approach demonstrates the benefits of endowing PMs with priors about which features determine human preferences while relying on LM capabilities to extract those features in a scalable and robust way.
Uncertainty-aware Parameter-Efficient Self-training for Semi-supervised Language Understanding
Authors: Authors: Jianing Wang, Qiushi Sun, Nuo Chen, Chengyu Wang, Jun Huang, Ming Gao, Xiang Li
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2310.13022
Pdf link: https://arxiv.org/pdf/2310.13022
Abstract The recent success of large pre-trained language models (PLMs) heavily hinges on massive labeled data, which typically produces inferior performance in low-resource scenarios. To remedy this dilemma, we study self-training as one of the predominant semi-supervised learning (SSL) approaches, which utilizes large-scale unlabeled data to generate synthetic examples. However, too many noisy labels will hurt the model performance, and the self-training procedure requires multiple training iterations making it more expensive if all the model parameters of the PLM are updated. This paper presents UPET, a novel Uncertainty-aware Parameter-Efficient self-Training framework to effectively and efficiently address the labeled data scarcity issue. Specifically, we incorporate Monte Carlo (MC) dropout in Bayesian neural network (BNN) to perform uncertainty estimation for the teacher model and then judiciously select reliable pseudo-labeled examples based on confidence and certainty. During the student training, we introduce multiple parameter-efficient learning (PEL) paradigms that allow the optimization of only a small percentage of parameters. We also propose a novel Easy-Hard Contrastive Tuning to enhance the robustness and generalization. Extensive experiments over multiple downstream tasks demonstrate that UPET achieves a substantial improvement in terms of performance and efficiency. Our codes and data are released at https: //github.com/wjn1996/UPET.
LASER: Linear Compression in Wireless Distributed Optimization
Authors: Authors: Ashok Vardhan Makkuva, Marco Bondaschi, Thijs Vogels, Martin Jaggi, Hyeji Kim, Michael C. Gastpar
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.13033
Pdf link: https://arxiv.org/pdf/2310.13033
Abstract Data-parallel SGD is the de facto algorithm for distributed optimization, especially for large scale machine learning. Despite its merits, communication bottleneck is one of its persistent issues. Most compression schemes to alleviate this either assume noiseless communication links, or fail to achieve good performance on practical tasks. In this paper, we close this gap and introduce LASER: LineAr CompreSsion in WirEless DistRibuted Optimization. LASER capitalizes on the inherent low-rank structure of gradients and transmits them efficiently over the noisy channels. Whilst enjoying theoretical guarantees similar to those of the classical SGD, LASER shows consistent gains over baselines on a variety of practical benchmarks. In particular, it outperforms the state-of-the-art compression schemes on challenging computer vision and GPT language modeling tasks. On the latter, we obtain $50$-$64 \%$ improvement in perplexity over our baselines for noisy channels.
To grok or not to grok: Disentangling generalization and memorization on corrupted algorithmic datasets
Authors: Authors: Darshil Doshi, Aritra Das, Tianyu He, Andrey Gromov
Subjects: Machine Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.13061
Pdf link: https://arxiv.org/pdf/2310.13061
Abstract Robust generalization is a major challenge in deep learning, particularly when the number of trainable parameters is very large. In general, it is very difficult to know if the network has memorized a particular set of examples or understood the underlying rule (or both). Motivated by this challenge, we study an interpretable model where generalizing representations are understood analytically, and are easily distinguishable from the memorizing ones. Namely, we consider two-layer neural networks trained on modular arithmetic tasks where ($\xi \cdot 100\%$) of labels are corrupted (\emph{i.e.} some results of the modular operations in the training set are incorrect). We show that (i) it is possible for the network to memorize the corrupted labels \emph{and} achieve $100\%$ generalization at the same time; (ii) the memorizing neurons can be identified and pruned, lowering the accuracy on corrupted data and improving the accuracy on uncorrupted data; (iii) regularization methods such as weight decay, dropout and BatchNorm force the network to ignore the corrupted data during optimization, and achieve $100\%$ accuracy on the uncorrupted dataset; and (iv) the effect of these regularization methods is (``mechanistically'') interpretable: weight decay and dropout force all the neurons to learn generalizing representations, while BatchNorm de-amplifies the output of memorizing neurons and amplifies the output of the generalizing ones. Finally, we show that in the presence of regularization, the training dynamics involves two consecutive stages: first, the network undergoes the \emph{grokking} dynamics reaching high train \emph{and} test accuracy; second, it unlearns the memorizing representations, where train accuracy suddenly jumps from $100\%$ to $100 (1-\xi)\%$.
Creative Robot Tool Use with Large Language Models
Authors: Authors: Mengdi Xu, Peide Huang, Wenhao Yu, Shiqi Liu, Xilun Zhang, Yaru Niu, Tingnan Zhang, Fei Xia, Jie Tan, Ding Zhao
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.13065
Pdf link: https://arxiv.org/pdf/2310.13065
Abstract Tool use is a hallmark of advanced intelligence, exemplified in both animal behavior and robotic capabilities. This paper investigates the feasibility of imbuing robots with the ability to creatively use tools in tasks that involve implicit physical constraints and long-term planning. Leveraging Large Language Models (LLMs), we develop RoboTool, a system that accepts natural language instructions and outputs executable code for controlling robots in both simulated and real-world environments. RoboTool incorporates four pivotal components: (i) an "Analyzer" that interprets natural language to discern key task-related concepts, (ii) a "Planner" that generates comprehensive strategies based on the language input and key concepts, (iii) a "Calculator" that computes parameters for each skill, and (iv) a "Coder" that translates these plans into executable Python code. Our results show that RoboTool can not only comprehend explicit or implicit physical constraints and environmental factors but also demonstrate creative tool use. Unlike traditional Task and Motion Planning (TAMP) methods that rely on explicit optimization, our LLM-based system offers a more flexible, efficient, and user-friendly solution for complex robotics tasks. Through extensive experiments, we validate that RoboTool is proficient in handling tasks that would otherwise be infeasible without the creative use of tools, thereby expanding the capabilities of robotic systems. Demos are available on our project page: https://creative-robotool.github.io/.
Closed-Loop Motion Planning for Differentially Flat Systems: A Time-Varying Optimization Framework
Authors: Authors: Tianqi Zheng, John W. Simpson-Porco, Enrique Mallada
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.13090
Pdf link: https://arxiv.org/pdf/2310.13090
Abstract Motion planning and control are two core components of the robotic systems autonomy stack. The standard approach to combine these methodologies comprises an offline/open-loop stage, planning, that designs a feasible and safe trajectory to follow, and an online/closed-loop stage, tracking, that corrects for unmodeled dynamics and disturbances. Such an approach generally introduces conservativeness into the planning stage, which becomes difficult to overcome as the model complexity increases and real-time decisions need to be made in a changing environment. This work addresses these challenges for the class of differentially flat nonlinear systems by integrating planning and control into a cohesive closed-loop task. Precisely, we develop an optimization-based framework that aims to steer a differentially flat system to a trajectory implicitly defined via a constrained time-varying optimization problem. To that end, we generalize the notion of feedback linearization, which makes non-linear systems behave as linear systems, and develop controllers that effectively transform a differentially flat system into an optimization algorithm that seeks to find the optimal solution of a (possibly time-varying) optimization problem. Under sufficient regularity assumptions, we prove global asymptotic convergence for the optimization dynamics to the minimizer of the time-varying optimization problem. We illustrate the effectiveness of our method with two numerical examples: a multi-robot tracking problem and a robot obstacle avoidance problem.
Semi-Supervised Learning of Dynamical Systems with Neural Ordinary Differential Equations: A Teacher-Student Model Approach
Authors: Authors: Yu Wang, Yuxuan Yin, Karthik Somayaji Nanjangud Suryanarayana, Jan Drgona, Malachi Schram, Mahantesh Halappanavar, Frank Liu, Peng Li
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.13110
Pdf link: https://arxiv.org/pdf/2310.13110
Abstract Modeling dynamical systems is crucial for a wide range of tasks, but it remains challenging due to complex nonlinear dynamics, limited observations, or lack of prior knowledge. Recently, data-driven approaches such as Neural Ordinary Differential Equations (NODE) have shown promising results by leveraging the expressive power of neural networks to model unknown dynamics. However, these approaches often suffer from limited labeled training data, leading to poor generalization and suboptimal predictions. On the other hand, semi-supervised algorithms can utilize abundant unlabeled data and have demonstrated good performance in classification and regression tasks. We propose TS-NODE, the first semi-supervised approach to modeling dynamical systems with NODE. TS-NODE explores cheaply generated synthetic pseudo rollouts to broaden exploration in the state space and to tackle the challenges brought by lack of ground-truth system data under a teacher-student model. TS-NODE employs an unified optimization framework that corrects the teacher model based on the student's feedback while mitigating the potential false system dynamics present in pseudo rollouts. TS-NODE demonstrates significant performance improvements over a baseline Neural ODE model on multiple dynamical system modeling tasks.
Optimal Symbolic Bound Synthesis
Authors: Authors: John Cyphert, Yotam Feldman, Zachary Kincaid, Thomas Reps
Subjects: Programming Languages (cs.PL)
Arxiv link: https://arxiv.org/abs/2310.13144
Pdf link: https://arxiv.org/pdf/2310.13144
Abstract The problem of finding a constant bound on a term given a set of assumptions has wide applications in optimization as well as program analysis. However, in many contexts the objective term may be unbounded. Still, some sort of symbolic bound may be useful. In this paper we introduce the optimal symbolic-bound synthesis problem, and a technique that tackles this problem for non-linear arithmetic with function symbols. This allows us to automatically produce symbolic bounds on complex arithmetic expressions from a set of both equality and inequality assumptions. Our solution employs a novel combination of powerful mathematical objects -- Gr\"obner bases together with polyhedral cones -- to represent an infinite set of implied inequalities. We obtain a sound symbolic bound by reducing the objective term by this infinite set. We implemented our method in a tool, AutoBound, which we tested on problems originating from real Solidity programs. We find that AutoBound yields relevant bounds in each case, matching or nearly-matching upper bounds produced by a human analyst on the same set of programs.
Approximated Modeling and Optimal Design for a Soft Pneumatic Actuator Considering the Force/Torque and System Controllability
Authors: Authors: Wu-Te Yang, Burak Kurkcu, Masayoshi Tomizuka
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2310.13168
Pdf link: https://arxiv.org/pdf/2310.13168
Abstract Soft pneumatic actuators (SPAs) are widely employed to drive soft robots. However, their inherent flexibility offers both benefits and challenges. This property reduces their output force/torque and makes them hard to control. This paper introduces a new design method that enhances the actuator's performance and controllability. The complex structure of the soft actuator is simplified by approximating it as a cantilever beam. This allows us to derive a mechanical equation between input pressure to output torque. Additionally, a dynamical model is explored to understand the correlation between the natural frequency and dimensional parameters of the SPA. The design problem is then transformed into an optimization problem, using the mechanical equation as the objective function and the dynamical equation as a constraint. By solving this optimization problem, the optimal dimensional parameters are determined. Prior to fabrication, preliminary tests are conducted using the finite element method. Six prototypes are manufactured to validate the proposed approach. The optimal actuator successfully generates the desired force/torque, while its natural frequency remains within the constrained range. This work highlights the potential of using approximated models and optimization formulation to boost the efficiency and dynamic performance of soft pneumatic actuators.
Absolute Policy Optimization
Authors: Authors: Weiye Zhao, Feihan Li, Yifan Sun, Rui Chen, Tianhao Wei, Changliu Liu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2310.13230
Pdf link: https://arxiv.org/pdf/2310.13230
Abstract In recent years, trust region on-policy reinforcement learning has achieved impressive results in addressing complex control tasks and gaming scenarios. However, contemporary state-of-the-art algorithms within this category primarily emphasize improvement in expected performance, lacking the ability to control over the worst-case performance outcomes. To address this limitation, we introduce a novel objective function; by optimizing which, it will lead to guaranteed monotonic improvement in the lower bound of near-total performance samples (absolute performance). Considering this groundbreaking theoretical advancement, we then refine this theoretically grounded algorithm through a series of approximations, resulting in a practical solution called Absolute Policy Optimization (APO). Our experiments demonstrate the effectiveness of our approach across challenging continuous control benchmark tasks and extend its applicability to mastering Atari games. Our findings reveal that APO significantly outperforms state-of-the-art policy gradient algorithms, resulting in substantial improvements in both expected performance and worst-case performance.
Detecting Shared Data Manipulation in Distributed Optimization Algorithms
Authors: Authors: Mohannad Alkhraijah, Rachel Harris, Samuel Litchfield, David Huggins, Daniel K. Molzahn
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.13252
Pdf link: https://arxiv.org/pdf/2310.13252
Abstract This paper investigates the vulnerability of the Alternating Direction Method of Multipliers (ADMM) algorithm to shared data manipulation, with a focus on solving optimal power flow (OPF) problems. Deliberate data manipulation may cause the ADMM algorithm to converge to suboptimal solutions. We derive two sufficient conditions for detecting data manipulation based on the theoretical convergence trajectory of the ADMM algorithm. We evaluate the detection conditions' performance on three data manipulation strategies we previously proposed: simple, feedback, and bilevel optimization attacks. We then extend these three data manipulation strategies to avoid detection by considering both the detection conditions and a neural network (NN) detection model in the attacks. We also propose an adversarial NN training framework to detect shared data manipulation. We illustrate the performance of our data manipulation strategy and detection framework on OPF problems. The results show that the proposed detection conditions successfully detect most of the data manipulation attacks. However, a bilevel optimization attack strategy that incorporates the detection methods may avoid being detected. Countering this, our proposed adversarial training framework detects all the instances of the bilevel optimization attack.
A Data-Centric Multi-Objective Learning Framework for Responsible Recommendation Systems
Authors: Authors: Xu Huang, Jianxun Lian, Hao Wang, Defu Lian, Xing Xie
Subjects: Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2310.13260
Pdf link: https://arxiv.org/pdf/2310.13260
Abstract Recommendation systems effectively guide users in locating their desired information within extensive content repositories. Generally, a recommendation model is optimized to enhance accuracy metrics from a user utility standpoint, such as click-through rate or matching relevance. However, a responsible industrial recommendation system must address not only user utility (responsibility to users) but also other objectives, including increasing platform revenue (responsibility to platforms), ensuring fairness (responsibility to content creators), and maintaining unbiasedness (responsibility to long-term healthy development). Multi-objective learning is a potent approach for achieving responsible recommendation systems. Nevertheless, current methods encounter two challenges: difficulty in scaling to heterogeneous objectives within a unified framework, and inadequate controllability over objective priority during optimization, leading to uncontrollable solutions. In this paper, we present a data-centric optimization framework, MoRec, which unifies the learning of diverse objectives. MoRec is a tri-level framework: the outer level manages the balance between different objectives, utilizing a proportional-integral-derivative (PID)-based controller to ensure a preset regularization on the primary objective. The middle level transforms objective-aware optimization into data sampling weights using sign gradients. The inner level employs a standard optimizer to update model parameters with the sampled data. Consequently, MoRec can flexibly support various objectives while maintaining the original model intact. Comprehensive experiments on two public datasets and one industrial dataset showcase the effectiveness, controllability, flexibility, and Pareto efficiency of MoRec, making it highly suitable for real-world implementation.
Zero-Shot Sharpness-Aware Quantization for Pre-trained Language Models
Authors: Authors: Miaoxi Zhu, Qihuang Zhong, Li Shen, Liang Ding, Juhua Liu, Bo Du, Dacheng Tao
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2310.13315
Pdf link: https://arxiv.org/pdf/2310.13315
Abstract Quantization is a promising approach for reducing memory overhead and accelerating inference, especially in large pre-trained language model (PLM) scenarios. While having no access to original training data due to security and privacy concerns has emerged the demand for zero-shot quantization. Most of the cutting-edge zero-shot quantization methods primarily 1) apply to computer vision tasks, and 2) neglect of overfitting problem in the generative adversarial learning process, leading to sub-optimal performance. Motivated by this, we propose a novel zero-shot sharpness-aware quantization (ZSAQ) framework for the zero-shot quantization of various PLMs. The key algorithm in solving ZSAQ is the SAM-SGA optimization, which aims to improve the quantization accuracy and model generalization via optimizing a minimax problem. We theoretically prove the convergence rate for the minimax optimization problem and this result can be applied to other nonconvex-PL minimax optimization frameworks. Extensive experiments on 11 tasks demonstrate that our method brings consistent and significant performance gains on both discriminative and generative PLMs, i.e., up to +6.98 average score. Furthermore, we empirically validate that our method can effectively improve the model generalization.
VFedMH: Vertical Federated Learning for Training Multi-party Heterogeneous Models
Authors: Authors: Shuo Wang, Keke Gai, Jing Yu, Liehuang Zhu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2310.13367
Pdf link: https://arxiv.org/pdf/2310.13367
Abstract Vertical Federated Learning (VFL) has gained increasing attention as a novel training paradigm that integrates sample alignment and feature union. However, existing VFL methods face challenges when dealing with heterogeneous local models among participants, which affects optimization convergence and generalization. To address this issue, this paper proposes a novel approach called Vertical Federated learning for training Multi-parties Heterogeneous models (VFedMH). VFedMH focuses on aggregating the embeddings of each participant's knowledge instead of intermediate results during forward propagation. The active party, who possesses labels and features of the sample, in VFedMH securely aggregates local embeddings to obtain global knowledge embeddings, and sends them to passive parties. The passive parties, who own only features of the sample, then utilize the global embeddings to propagate forward on their local heterogeneous networks. However, the passive party does not own the labels, so the local model gradient cannot be calculated locally. To overcome this limitation, the active party assists the passive party in computing its local heterogeneous model gradients. Then, each participant trains their local model using the heterogeneous model gradients. The objective is to minimize the loss value of their respective local heterogeneous models. Additionally, the paper provides a theoretical analysis of VFedMH's convergence performance. Extensive experiments are conducted to demonstrate that VFedMH can simultaneously train multiple heterogeneous models with heterogeneous optimization and outperform some recent methods in model performance.
An Improved Artificial Fish Swarm Algorithm for Solving the Problem of Investigation Path Planning
Authors: Authors: Qian Huang, Weiwen Qian, Chang Li, Xuan Ding
Subjects: Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2310.13375
Pdf link: https://arxiv.org/pdf/2310.13375
Abstract Informationization is a prevailing trend in today's world. The increasing demand for information in decision-making processes poses significant challenges for investigation activities, particularly in terms of effectively allocating limited resources to plan investigation programs. This paper addresses the investigation path planning problem by formulating it as a multi-traveling salesman problem (MTSP). Our objective is to minimize costs, and to achieve this, we propose a chaotic artificial fish swarm algorithm based on multiple population differential evolution (DE-CAFSA). To overcome the limitations of the artificial fish swarm algorithm, such as low optimization accuracy and the inability to consider global and local information, we incorporate adaptive field of view and step size adjustments, replace random behavior with the 2-opt operation, and introduce chaos theory and sub-optimal solutions to enhance optimization accuracy and search performance. Additionally, we integrate the differential evolution algorithm to create a hybrid algorithm that leverages the complementary advantages of both approaches. Experimental results demonstrate that DE-CAFSA outperforms other algorithms on various public datasets of different sizes, as well as showcasing excellent performance on the examples proposed in this study.
Equivariant Deep Weight Space Alignment
Authors: Authors: Aviv Navon, Aviv Shamsian, Ethan Fetaya, Gal Chechik, Nadav Dym, Haggai Maron
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.13397
Pdf link: https://arxiv.org/pdf/2310.13397
Abstract Permutation symmetries of deep networks make simple operations like model averaging and similarity estimation challenging. In many cases, aligning the weights of the networks, i.e., finding optimal permutations between their weights, is necessary. More generally, weight alignment is essential for a wide range of applications, from model merging, through exploring the optimization landscape of deep neural networks, to defining meaningful distance functions between neural networks. Unfortunately, weight alignment is an NP-hard problem. Prior research has mainly focused on solving relaxed versions of the alignment problem, leading to either time-consuming methods or sub-optimal solutions. To accelerate the alignment process and improve its quality, we propose a novel framework aimed at learning to solve the weight alignment problem, which we name Deep-Align. To that end, we first demonstrate that weight alignment adheres to two fundamental symmetries and then, propose a deep architecture that respects these symmetries. Notably, our framework does not require any labeled data. We provide a theoretical analysis of our approach and evaluate Deep-Align on several types of network architectures and learning setups. Our experimental results indicate that a feed-forward pass with Deep-Align produces better or equivalent alignments compared to those produced by current optimization algorithms. Additionally, our alignments can be used as an initialization for other methods to gain even better solutions with a significant speedup in convergence.
Two-Stage Triplet Loss Training with Curriculum Augmentation for Audio-Visual Retrieval
Authors: Authors: Donghuo Zeng, Kazushi Ikeda
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2310.13451
Pdf link: https://arxiv.org/pdf/2310.13451
Abstract The cross-modal retrieval model leverages the potential of triple loss optimization to learn robust embedding spaces. However, existing methods often train these models in a singular pass, overlooking the distinction between semi-hard and hard triples in the optimization process. The oversight of not distinguishing between semi-hard and hard triples leads to suboptimal model performance. In this paper, we introduce a novel approach rooted in curriculum learning to address this problem. We propose a two-stage training paradigm that guides the model's learning process from semi-hard to hard triplets. In the first stage, the model is trained with a set of semi-hard triplets, starting from a low-loss base. Subsequently, in the second stage, we augment the embeddings using an interpolation technique. This process identifies potential hard negatives, alleviating issues arising from high-loss functions due to a scarcity of hard triples. Our approach then applies hard triplet mining in the augmented embedding space to further optimize the model. Extensive experimental results conducted on two audio-visual datasets show a significant improvement of approximately 9.8% in terms of average Mean Average Precision (MAP) over the current state-of-the-art method, MSNSCA, for the Audio-Visual Cross-Modal Retrieval (AV-CMR) task on the AVE dataset, indicating the effectiveness of our proposed method.
Stable Nonconvex-Nonconcave Training via Linear Interpolation
Authors: Authors: Thomas Pethick, Wanyun Xie, Volkan Cevher
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2310.13459
Pdf link: https://arxiv.org/pdf/2310.13459
Abstract This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training. We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators. We construct a new optimization scheme called relaxed approximate proximal point (RAPP), which is the first explicit method to achieve last iterate convergence rates for the full range of cohypomonotone problems. The construction extends to constrained and regularized settings. By replacing the inner optimizer in RAPP we rediscover the family of Lookahead algorithms for which we establish convergence in cohypomonotone problems even when the base optimizer is taken to be gradient descent ascent. The range of cohypomonotone problems in which Lookahead converges is further expanded by exploiting that Lookahead inherits the properties of the base optimizer. We corroborate the results with experiments on generative adversarial networks which demonstrates the benefits of the linear interpolation present in both RAPP and Lookahead.
Distributed Adaptive Time-Varying Convex Optimization for Multi-agent Systems
Authors: Authors: Liangze Jiang, Zhengguang Wu, Lei Wang
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.13541
Pdf link: https://arxiv.org/pdf/2310.13541
Abstract This paper focus on the time-varying convex optimization problems with uncertain parameters. A new class of adaptive algorithms are proposed to solve time-varying convex optimization problems. Under the mild assumption of Hessian and partial derivative of the gradient with respect to time, the dependence on them is reduced through appropriate adaptive law design. By integrating the new adaptive optimization algorithm and a modified consensus algorithms, the time-varying optimization problems can be solved in a distributed manner. Then, they are extended from the agents with single-integrator dynamics to double-integrator dynamics, which describes more practical systems. As an example, the source seeking problem is used to verify the proposed design.
Distributed Global Optimal Coverage Control in Multi-agent Systems: Known and Unknown Environments
Authors: Authors: Mohammadhasan Faghihi, Meysam Yadegar, Mohammadhosein Bakhtiaridoust, Nader Meskin, Javad Shari, Peng Shi
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.13557
Pdf link: https://arxiv.org/pdf/2310.13557
Abstract This paper introduces a novel approach to solve the coverage optimization problem in multi-agent systems. The proposed technique offers a solution that not only achieves the global optimality in the agents configuration but also effectively handles the issue of agents remaining stationary in regions void of information. The proposed approach leverages a novel cost function for optimizing the agents coverage and the cost function eventually aligns with the conventional Voronoi-based cost function. Theoretical analyses are conducted to assure the asymptotic convergence of agents towards the optimal configuration. A distinguishing feature of this approach lies in its departure from the reliance on geometric methods that are characteristic of Voronoi-based approaches; hence can be implemented more simply. Remarkably, the technique is adaptive and applicable to various environments with both known and unknown information distributions. Lastly, the efficacy of the proposed method is demonstrated through simulations, and the obtained results are compared with those of Voronoi-based algorithms.
Semantic Decomposition of Question and SQL for Text-to-SQL Parsing
Authors: Authors: Ben Eyal, Amir Bachar, Ophir Haroche, Moran Mahabi, Michael Elhadad
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2310.13575
Pdf link: https://arxiv.org/pdf/2310.13575
Abstract Text-to-SQL semantic parsing faces challenges in generalizing to cross-domain and complex queries. Recent research has employed a question decomposition strategy to enhance the parsing of complex SQL queries. However, this strategy encounters two major obstacles: (1) existing datasets lack question decomposition; (2) due to the syntactic complexity of SQL, most complex queries cannot be disentangled into sub-queries that can be readily recomposed. To address these challenges, we propose a new modular Query Plan Language (QPL) that systematically decomposes SQL queries into simple and regular sub-queries. We develop a translator from SQL to QPL by leveraging analysis of SQL server query optimization plans, and we augment the Spider dataset with QPL programs. Experimental results demonstrate that the modular nature of QPL benefits existing semantic-parsing architectures, and training text-to-QPL parsers is more effective than text-to-SQL parsing for semantically equivalent queries. The QPL approach offers two additional advantages: (1) QPL programs can be paraphrased as simple questions, which allows us to create a dataset of (complex question, decomposed questions). Training on this dataset, we obtain a Question Decomposer for data retrieval that is sensitive to database schemas. (2) QPL is more accessible to non-experts for complex queries, leading to more interpretable output from the semantic parser.
Entangled Preferences: The History and Risks of Reinforcement Learning and Human Feedback
Authors: Authors: Nathan Lambert, Thomas Krendl Gilbert, Tom Zick
Subjects: Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2310.13595
Pdf link: https://arxiv.org/pdf/2310.13595
Abstract Reinforcement learning from human feedback (RLHF) has emerged as a powerful technique to make large language models (LLMs) easier to use and more effective. A core piece of the RLHF process is the training and utilization of a model of human preferences that acts as a reward function for optimization. This approach, which operates at the intersection of many stakeholders and academic disciplines, remains poorly understood. RLHF reward models are often cited as being central to achieving performance, yet very few descriptors of capabilities, evaluations, training methods, or open-source models exist. Given this lack of information, further study and transparency is needed for learned RLHF reward models. In this paper, we illustrate the complex history of optimizing preferences, and articulate lines of inquiry to understand the sociotechnical context of reward models. In particular, we highlight the ontological differences between costs, rewards, and preferences at stake in RLHF's foundations, related methodological tensions, and possible research directions to improve general understanding of how reward models function.
Contrastive Prefence Learning: Learning from Human Feedback without RL
Authors: Authors: Joey Hejna, Rafael Rafailov, Harshit Sikchi, Chelsea Finn, Scott Niekum, W. Bradley Knox, Dorsa Sadigh
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.13639
Pdf link: https://arxiv.org/pdf/2310.13639
Abstract Reinforcement Learning from Human Feedback (RLHF) has emerged as a popular paradigm for aligning models with human intent. Typically RLHF algorithms operate in two phases: first, use human preferences to learn a reward function and second, align the model by optimizing the learned reward via reinforcement learning (RL). This paradigm assumes that human preferences are distributed according to reward, but recent work suggests that they instead follow the regret under the user's optimal policy. Thus, learning a reward function from feedback is not only based on a flawed assumption of human preference, but also leads to unwieldy optimization challenges that stem from policy gradients or bootstrapping in the RL phase. Because of these optimization challenges, contemporary RLHF methods restrict themselves to contextual bandit settings (e.g., as in large language models) or limit observation dimensionality (e.g., state-based robotics). We overcome these limitations by introducing a new family of algorithms for optimizing behavior from human feedback using the regret-based model of human preferences. Using the principle of maximum entropy, we derive Contrastive Preference Learning (CPL), an algorithm for learning optimal policies from preferences without learning reward functions, circumventing the need for RL. CPL is fully off-policy, uses only a simple contrastive objective, and can be applied to arbitrary MDPs. This enables CPL to elegantly scale to high-dimensional and sequential RLHF problems while being simpler than prior methods.
Keyword: adam

There is no result

Keyword: gradient

Enabling energy-Efficient object detection with surrogate gradient descent in spiking neural networks
Authors: Authors: Jilong Luo, Shanlin Xiao, Yinsheng Chen, Zhiyi Yu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.12985
Pdf link: https://arxiv.org/pdf/2310.12985
Abstract Spiking Neural Networks (SNNs) are a biologically plausible neural network model with significant advantages in both event-driven processing and spatio-temporal information processing, rendering SNNs an appealing choice for energyefficient object detection. However, the non-differentiability of the biological neuronal dynamics model presents a challenge during the training of SNNs. Furthermore, a suitable decoding strategy for object detection in SNNs is currently lacking. In this study, we introduce the Current Mean Decoding (CMD) method, which solves the regression problem to facilitate the training of deep SNNs for object detection tasks. Based on the gradient surrogate and CMD, we propose the SNN-YOLOv3 model for object detection. Our experiments demonstrate that SNN-YOLOv3 achieves a remarkable performance with an mAP of 61.87% on the PASCAL VOC dataset, requiring only 6 time steps. Compared to SpikingYOLO, we have managed to increase mAP by nearly 10% while reducing energy consumption by two orders of magnitude.
Blending gradient boosted trees and neural networks for point and probabilistic forecasting of hierarchical time series
Authors: Authors: Ioannis Nasios, Konstantinos Vogklis
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Data Analysis, Statistics and Probability (physics.data-an); Statistical Finance (q-fin.ST)
Arxiv link: https://arxiv.org/abs/2310.13029
Pdf link: https://arxiv.org/pdf/2310.13029
Abstract In this paper we tackle the problem of point and probabilistic forecasting by describing a blending methodology of machine learning models that belong to gradient boosted trees and neural networks families. These principles were successfully applied in the recent M5 Competition on both Accuracy and Uncertainty tracks. The keypoints of our methodology are: a) transform the task to regression on sales for a single day b) information rich feature engineering c) create a diverse set of state-of-the-art machine learning models and d) carefully construct validation sets for model tuning. We argue that the diversity of the machine learning models along with the careful selection of validation examples, where the most important ingredients for the effectiveness of our approach. Although forecasting data had an inherent hierarchy structure (12 levels), none of our proposed solutions exploited that hierarchical scheme. Using the proposed methodology, our team was ranked within the gold medal range in both Accuracy and the Uncertainty track. Inference code along with already trained models are available at https://github.com/IoannisNasios/M5_Uncertainty_3rd_place
LASER: Linear Compression in Wireless Distributed Optimization
Authors: Authors: Ashok Vardhan Makkuva, Marco Bondaschi, Thijs Vogels, Martin Jaggi, Hyeji Kim, Michael C. Gastpar
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.13033
Pdf link: https://arxiv.org/pdf/2310.13033
Abstract Data-parallel SGD is the de facto algorithm for distributed optimization, especially for large scale machine learning. Despite its merits, communication bottleneck is one of its persistent issues. Most compression schemes to alleviate this either assume noiseless communication links, or fail to achieve good performance on practical tasks. In this paper, we close this gap and introduce LASER: LineAr CompreSsion in WirEless DistRibuted Optimization. LASER capitalizes on the inherent low-rank structure of gradients and transmits them efficiently over the noisy channels. Whilst enjoying theoretical guarantees similar to those of the classical SGD, LASER shows consistent gains over baselines on a variety of practical benchmarks. In particular, it outperforms the state-of-the-art compression schemes on challenging computer vision and GPT language modeling tasks. On the latter, we obtain $50$-$64 \%$ improvement in perplexity over our baselines for noisy channels.
In-context Learning with Transformer Is Really Equivalent to a Contrastive Learning Pattern
Authors: Authors: Ruifeng Ren, Yong Liu
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.13220
Pdf link: https://arxiv.org/pdf/2310.13220
Abstract Pre-trained large language models based on Transformers have demonstrated amazing in-context learning (ICL) abilities. Given several demonstration examples, the models can implement new tasks without any parameter updates. However, it is still an open question to understand the mechanism of ICL. In this paper, we interpret the inference process of ICL as a gradient descent process in a contrastive learning pattern. Firstly, leveraging kernel methods, we establish the relationship between gradient descent and self-attention mechanism under generally used softmax attention setting instead of linear attention setting. Then, we analyze the corresponding gradient descent process of ICL from the perspective of contrastive learning without negative samples and discuss possible improvements of this contrastive learning pattern, based on which the self-attention layer can be further modified. Finally, we design experiments to support our opinions. To the best of our knowledge, our work is the first to provide the understanding of ICL from the perspective of contrastive learning and has the potential to facilitate future model design by referring to related works on contrastive learning.
Absolute Policy Optimization
Authors: Authors: Weiye Zhao, Feihan Li, Yifan Sun, Rui Chen, Tianhao Wei, Changliu Liu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2310.13230
Pdf link: https://arxiv.org/pdf/2310.13230
Abstract In recent years, trust region on-policy reinforcement learning has achieved impressive results in addressing complex control tasks and gaming scenarios. However, contemporary state-of-the-art algorithms within this category primarily emphasize improvement in expected performance, lacking the ability to control over the worst-case performance outcomes. To address this limitation, we introduce a novel objective function; by optimizing which, it will lead to guaranteed monotonic improvement in the lower bound of near-total performance samples (absolute performance). Considering this groundbreaking theoretical advancement, we then refine this theoretically grounded algorithm through a series of approximations, resulting in a practical solution called Absolute Policy Optimization (APO). Our experiments demonstrate the effectiveness of our approach across challenging continuous control benchmark tasks and extend its applicability to mastering Atari games. Our findings reveal that APO significantly outperforms state-of-the-art policy gradient algorithms, resulting in substantial improvements in both expected performance and worst-case performance.
A Data-Centric Multi-Objective Learning Framework for Responsible Recommendation Systems
Authors: Authors: Xu Huang, Jianxun Lian, Hao Wang, Defu Lian, Xing Xie
Subjects: Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2310.13260
Pdf link: https://arxiv.org/pdf/2310.13260
Abstract Recommendation systems effectively guide users in locating their desired information within extensive content repositories. Generally, a recommendation model is optimized to enhance accuracy metrics from a user utility standpoint, such as click-through rate or matching relevance. However, a responsible industrial recommendation system must address not only user utility (responsibility to users) but also other objectives, including increasing platform revenue (responsibility to platforms), ensuring fairness (responsibility to content creators), and maintaining unbiasedness (responsibility to long-term healthy development). Multi-objective learning is a potent approach for achieving responsible recommendation systems. Nevertheless, current methods encounter two challenges: difficulty in scaling to heterogeneous objectives within a unified framework, and inadequate controllability over objective priority during optimization, leading to uncontrollable solutions. In this paper, we present a data-centric optimization framework, MoRec, which unifies the learning of diverse objectives. MoRec is a tri-level framework: the outer level manages the balance between different objectives, utilizing a proportional-integral-derivative (PID)-based controller to ensure a preset regularization on the primary objective. The middle level transforms objective-aware optimization into data sampling weights using sign gradients. The inner level employs a standard optimizer to update model parameters with the sampled data. Consequently, MoRec can flexibly support various objectives while maintaining the original model intact. Comprehensive experiments on two public datasets and one industrial dataset showcase the effectiveness, controllability, flexibility, and Pareto efficiency of MoRec, making it highly suitable for real-world implementation.
Learning Recurrent Models with Temporally Local Rules
Authors: Authors: Azwar Abdulsalam, Joseph G. Makin
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.13284
Pdf link: https://arxiv.org/pdf/2310.13284
Abstract Fitting generative models to sequential data typically involves two recursive computations through time, one forward and one backward. The latter could be a computation of the loss gradient (as in backpropagation through time), or an inference algorithm (as in the RTS/Kalman smoother). The backward pass in particular is computationally expensive (since it is inherently serial and cannot exploit GPUs), and difficult to map onto biological processes. Work-arounds have been proposed; here we explore a very different one: requiring the generative model to learn the joint distribution over current and previous states, rather than merely the transition probabilities. We show on toy datasets that different architectures employing this principle can learn aspects of the data typically requiring the backward pass.
Combining Policy Gradient and Safety-Based Control for Autonomous Driving
Authors: Authors: Xi Xiong, Lu Liu
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2310.13314
Pdf link: https://arxiv.org/pdf/2310.13314
Abstract With the advancement of data-driven techniques, addressing continuous con-trol challenges has become more efficient. However, the reliance of these methods on historical data introduces the potential for unexpected decisions in novel scenarios. To enhance performance in autonomous driving and collision avoidance, we propose a symbiotic fusion of policy gradient with safety-based control. In this study, we em-ploy the Deep Deterministic Policy Gradient (DDPG) algorithm to enable autono-mous driving in the absence of surrounding vehicles. By training the vehicle's driving policy within a stable and familiar environment, a robust and efficient learning pro-cess is achieved. Subsequently, an artificial potential field approach is utilized to formulate a collision avoidance algorithm, accounting for the presence of surround-ing vehicles. Furthermore, meticulous consideration is given to path tracking meth-ods. The amalgamation of these approaches demonstrates substantial performance across diverse scenarios, underscoring its potential for advancing autonomous driving while upholding safety standards.
VFedMH: Vertical Federated Learning for Training Multi-party Heterogeneous Models
Authors: Authors: Shuo Wang, Keke Gai, Jing Yu, Liehuang Zhu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2310.13367
Pdf link: https://arxiv.org/pdf/2310.13367
Abstract Vertical Federated Learning (VFL) has gained increasing attention as a novel training paradigm that integrates sample alignment and feature union. However, existing VFL methods face challenges when dealing with heterogeneous local models among participants, which affects optimization convergence and generalization. To address this issue, this paper proposes a novel approach called Vertical Federated learning for training Multi-parties Heterogeneous models (VFedMH). VFedMH focuses on aggregating the embeddings of each participant's knowledge instead of intermediate results during forward propagation. The active party, who possesses labels and features of the sample, in VFedMH securely aggregates local embeddings to obtain global knowledge embeddings, and sends them to passive parties. The passive parties, who own only features of the sample, then utilize the global embeddings to propagate forward on their local heterogeneous networks. However, the passive party does not own the labels, so the local model gradient cannot be calculated locally. To overcome this limitation, the active party assists the passive party in computing its local heterogeneous model gradients. Then, each participant trains their local model using the heterogeneous model gradients. The objective is to minimize the loss value of their respective local heterogeneous models. Additionally, the paper provides a theoretical analysis of VFedMH's convergence performance. Extensive experiments are conducted to demonstrate that VFedMH can simultaneously train multiple heterogeneous models with heterogeneous optimization and outperform some recent methods in model performance.
Single-view 3D reconstruction via inverse procedural modeling
Authors: Authors: Albert Garifullin, Nikolay Maiorov, Vladimir Frolov
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.13373
Pdf link: https://arxiv.org/pdf/2310.13373
Abstract We propose an approach to 3D reconstruction via inverse procedural modeling and investigate two variants of this approach. The first option consists in the fitting set of input parameters using a genetic algorithm. We demonstrate the results of our work on tree models, complex objects, with the reconstruction of which most existing methods cannot handle. The second option allows us to significantly improve the precision by using gradients within memetic algorithm, differentiable rendering and also differentiable procedural generators. In our work we see 2 main contributions. First, we propose a method to join differentiable rendering and inverse procedural modeling. This gives us an opportunity to reconstruct 3D model more accurately than existing approaches when a small number of input images are available (even for single image). Second, we join both differentiable and non-differentiable procedural generators in a single framework which allow us to apply inverse procedural modeling to fairly complex generators: when gradient is available, reconstructions is precise, when gradient is not available, reconstruction is approximate, but always high quality without visual artifacts.
BRFL: A Blockchain-based Byzantine-Robust Federated Learning Model
Authors: Authors: Yang Li, Chunhe Xia, Chang Li, Tianbo Wang
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.13403
Pdf link: https://arxiv.org/pdf/2310.13403
Abstract With the increasing importance of machine learning, the privacy and security of training data have become critical. Federated learning, which stores data in distributed nodes and shares only model parameters, has gained significant attention for addressing this concern. However, a challenge arises in federated learning due to the Byzantine Attack Problem, where malicious local models can compromise the global model's performance during aggregation. This article proposes the Blockchain-based Byzantine-Robust Federated Learning (BRLF) model that combines federated learning with blockchain technology. This integration enables traceability of malicious models and provides incentives for locally trained clients. Our approach involves selecting the aggregation node based on Pearson's correlation coefficient, and we perform spectral clustering and calculate the average gradient within each cluster, validating its accuracy using local dataset of the aggregation nodes. Experimental results on public datasets demonstrate the superior byzantine robustness of our secure aggregation algorithm compared to other baseline byzantine robust aggregation methods, and proved our proposed model effectiveness in addressing the resource consumption problem.
Stable Nonconvex-Nonconcave Training via Linear Interpolation
Authors: Authors: Thomas Pethick, Wanyun Xie, Volkan Cevher
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2310.13459
Pdf link: https://arxiv.org/pdf/2310.13459
Abstract This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training. We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators. We construct a new optimization scheme called relaxed approximate proximal point (RAPP), which is the first explicit method to achieve last iterate convergence rates for the full range of cohypomonotone problems. The construction extends to constrained and regularized settings. By replacing the inner optimizer in RAPP we rediscover the family of Lookahead algorithms for which we establish convergence in cohypomonotone problems even when the base optimizer is taken to be gradient descent ascent. The range of cohypomonotone problems in which Lookahead converges is further expanded by exploiting that Lookahead inherits the properties of the base optimizer. We corroborate the results with experiments on generative adversarial networks which demonstrates the benefits of the linear interpolation present in both RAPP and Lookahead.
Distributed Adaptive Time-Varying Convex Optimization for Multi-agent Systems
Authors: Authors: Liangze Jiang, Zhengguang Wu, Lei Wang
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.13541
Pdf link: https://arxiv.org/pdf/2310.13541
Abstract This paper focus on the time-varying convex optimization problems with uncertain parameters. A new class of adaptive algorithms are proposed to solve time-varying convex optimization problems. Under the mild assumption of Hessian and partial derivative of the gradient with respect to time, the dependence on them is reduced through appropriate adaptive law design. By integrating the new adaptive optimization algorithm and a modified consensus algorithms, the time-varying optimization problems can be solved in a distributed manner. Then, they are extended from the agents with single-integrator dynamics to double-integrator dynamics, which describes more practical systems. As an example, the source seeking problem is used to verify the proposed design.
ScaleLong: Towards More Stable Training of Diffusion Model via Scaling Network Long Skip Connection
Authors: Authors: Zhongzhan Huang, Pan Zhou, Shuicheng Yan, Liang Lin
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.13545
Pdf link: https://arxiv.org/pdf/2310.13545
Abstract In diffusion models, UNet is the most popular network backbone, since its long skip connects (LSCs) to connect distant network blocks can aggregate long-distant information and alleviate vanishing gradient. Unfortunately, UNet often suffers from unstable training in diffusion models which can be alleviated by scaling its LSC coefficients smaller. However, theoretical understandings of the instability of UNet in diffusion models and also the performance improvement of LSC scaling remain absent yet. To solve this issue, we theoretically show that the coefficients of LSCs in UNet have big effects on the stableness of the forward and backward propagation and robustness of UNet. Specifically, the hidden feature and gradient of UNet at any layer can oscillate and their oscillation ranges are actually large which explains the instability of UNet training. Moreover, UNet is also provably sensitive to perturbed input, and predicts an output distant from the desired output, yielding oscillatory loss and thus oscillatory gradient. Besides, we also observe the theoretical benefits of the LSC coefficient scaling of UNet in the stableness of hidden features and gradient and also robustness. Finally, inspired by our theory, we propose an effective coefficient scaling framework ScaleLong that scales the coefficients of LSC in UNet and better improves the training stability of UNet. Experimental results on four famous datasets show that our methods are superior to stabilize training and yield about 1.5x training acceleration on different diffusion models with UNet or UViT backbones. Code: https://github.com/sail-sg/ScaleLong
Numerical approximation of McKean-Vlasov SDEs via stochastic gradient descent
Authors: Authors: Ankush Agarwal, Andrea Amato, Goncalo dos Reis, Stefano Pagliarani
Subjects: Numerical Analysis (math.NA); Probability (math.PR)
Arxiv link: https://arxiv.org/abs/2310.13579
Pdf link: https://arxiv.org/pdf/2310.13579
Abstract We propose a novel approach to numerically approximate McKean-Vlasov stochastic differential equations (MV-SDE) using stochastic gradient descent (SGD) while avoiding the use of interacting particle systems. The technique of SGD is deployed to solve a Euclidean minimization problem, which is obtained by first representing the MV-SDE as a minimization problem over the set of continuous functions of time, and then by approximating the domain with a finite-dimensional subspace. Convergence is established by proving certain intermediate stability and moment estimates of the relevant stochastic processes (including the tangent ones). Numerical experiments illustrate the competitive performance of our SGD based method compared to the IPS benchmarks. This work offers a theoretical foundation for using the SGD method in the context of numerical approximation of MV-SDEs, and provides analytical tools to study its stability and convergence.
Contrastive Prefence Learning: Learning from Human Feedback without RL
Authors: Authors: Joey Hejna, Rafael Rafailov, Harshit Sikchi, Chelsea Finn, Scott Niekum, W. Bradley Knox, Dorsa Sadigh
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.13639
Pdf link: https://arxiv.org/pdf/2310.13639
Abstract Reinforcement Learning from Human Feedback (RLHF) has emerged as a popular paradigm for aligning models with human intent. Typically RLHF algorithms operate in two phases: first, use human preferences to learn a reward function and second, align the model by optimizing the learned reward via reinforcement learning (RL). This paradigm assumes that human preferences are distributed according to reward, but recent work suggests that they instead follow the regret under the user's optimal policy. Thus, learning a reward function from feedback is not only based on a flawed assumption of human preference, but also leads to unwieldy optimization challenges that stem from policy gradients or bootstrapping in the RL phase. Because of these optimization challenges, contemporary RLHF methods restrict themselves to contextual bandit settings (e.g., as in large language models) or limit observation dimensionality (e.g., state-based robotics). We overcome these limitations by introducing a new family of algorithms for optimizing behavior from human feedback using the regret-based model of human preferences. Using the principle of maximum entropy, we derive Contrastive Preference Learning (CPL), an algorithm for learning optimal policies from preferences without learning reward functions, circumventing the need for RL. CPL is fully off-policy, uses only a simple contrastive objective, and can be applied to arbitrary MDPs. This enables CPL to elegantly scale to high-dimensional and sequential RLHF problems while being simpler than prior methods.
CAPIVARA: Cost-Efficient Approach for Improving Multilingual CLIP Performance on Low-Resource Languages
Authors: Authors: Gabriel Oliveira dos Santos, Diego Alysson Moreia, Alef Iury Ferreira, Jhessica Silva, Luiz Pereira, Pedro Bueno, Thiago Sousa, Helena Maia, Nádia Da Silva, Esther Colombini, Helio Pedrini, Sandra Avila
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.13683
Pdf link: https://arxiv.org/pdf/2310.13683
Abstract This work introduces CAPIVARA, a cost-efficient framework designed to enhance the performance of multilingual CLIP models in low-resource languages. While CLIP has excelled in zero-shot vision-language tasks, the resource-intensive nature of model training remains challenging. Many datasets lack linguistic diversity, featuring solely English descriptions for images. CAPIVARA addresses this by augmenting text data using image captioning and machine translation to generate multiple synthetic captions in low-resource languages. We optimize the training pipeline with LiT, LoRA, and gradient checkpointing to alleviate the computational cost. Through extensive experiments, CAPIVARA emerges as state of the art in zero-shot tasks involving images and Portuguese texts. We show the potential for significant improvements in other low-resource languages, achieved by fine-tuning the pre-trained multilingual CLIP using CAPIVARA on a single GPU for 2 hours. Our model and code is available at https://github.com/hiaac-nlp/CAPIVARA.
Keyword: super-resolution

Auxiliary Features-Guided Super Resolution for Monte Carlo Rendering
Authors: Authors: Qiqi Hou, Feng Liu
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.13235
Pdf link: https://arxiv.org/pdf/2310.13235
Abstract This paper investigates super resolution to reduce the number of pixels to render and thus speed up Monte Carlo rendering algorithms. While great progress has been made to super resolution technologies, it is essentially an ill-posed problem and cannot recover high-frequency details in renderings. To address this problem, we exploit high-resolution auxiliary features to guide super resolution of low-resolution renderings. These high-resolution auxiliary features can be quickly rendered by a rendering engine and at the same time provide valuable high-frequency details to assist super resolution. To this end, we develop a cross-modality Transformer network that consists of an auxiliary feature branch and a low-resolution rendering branch. These two branches are designed to fuse high-resolution auxiliary features with the corresponding low-resolution rendering. Furthermore, we design residual densely-connected Swin Transformer groups to learn to extract representative features to enable high-quality super-resolution. Our experiments show that our auxiliary features-guided super-resolution method outperforms both super-resolution methods and Monte Carlo denoising methods in producing high-quality renderings.

zoq / arxiv-updates

New submissions for Mon, 23 Oct 23 #626

Keyword: sgd

LASER: Linear Compression in Wireless Distributed Optimization

Numerical approximation of McKean-Vlasov SDEs via stochastic gradient descent

Keyword: optimization

Adaptive Dynamic Programming for Energy-Efficient Base Station Cell Switching

Compositional preference models for aligning LMs

Uncertainty-aware Parameter-Efficient Self-training for Semi-supervised Language Understanding

LASER: Linear Compression in Wireless Distributed Optimization

To grok or not to grok: Disentangling generalization and memorization on corrupted algorithmic datasets

Creative Robot Tool Use with Large Language Models

Closed-Loop Motion Planning for Differentially Flat Systems: A Time-Varying Optimization Framework

Semi-Supervised Learning of Dynamical Systems with Neural Ordinary Differential Equations: A Teacher-Student Model Approach

Optimal Symbolic Bound Synthesis

Approximated Modeling and Optimal Design for a Soft Pneumatic Actuator Considering the Force/Torque and System Controllability

Absolute Policy Optimization

Detecting Shared Data Manipulation in Distributed Optimization Algorithms

A Data-Centric Multi-Objective Learning Framework for Responsible Recommendation Systems

Zero-Shot Sharpness-Aware Quantization for Pre-trained Language Models

VFedMH: Vertical Federated Learning for Training Multi-party Heterogeneous Models

An Improved Artificial Fish Swarm Algorithm for Solving the Problem of Investigation Path Planning

Equivariant Deep Weight Space Alignment

Two-Stage Triplet Loss Training with Curriculum Augmentation for Audio-Visual Retrieval

Stable Nonconvex-Nonconcave Training via Linear Interpolation

Distributed Adaptive Time-Varying Convex Optimization for Multi-agent Systems

Distributed Global Optimal Coverage Control in Multi-agent Systems: Known and Unknown Environments

Semantic Decomposition of Question and SQL for Text-to-SQL Parsing

Entangled Preferences: The History and Risks of Reinforcement Learning and Human Feedback

Contrastive Prefence Learning: Learning from Human Feedback without RL

Keyword: adam

Keyword: gradient

Enabling energy-Efficient object detection with surrogate gradient descent in spiking neural networks

Blending gradient boosted trees and neural networks for point and probabilistic forecasting of hierarchical time series

LASER: Linear Compression in Wireless Distributed Optimization

In-context Learning with Transformer Is Really Equivalent to a Contrastive Learning Pattern

Absolute Policy Optimization

A Data-Centric Multi-Objective Learning Framework for Responsible Recommendation Systems

Learning Recurrent Models with Temporally Local Rules

Combining Policy Gradient and Safety-Based Control for Autonomous Driving

VFedMH: Vertical Federated Learning for Training Multi-party Heterogeneous Models

Single-view 3D reconstruction via inverse procedural modeling

BRFL: A Blockchain-based Byzantine-Robust Federated Learning Model

Stable Nonconvex-Nonconcave Training via Linear Interpolation

Distributed Adaptive Time-Varying Convex Optimization for Multi-agent Systems

ScaleLong: Towards More Stable Training of Diffusion Model via Scaling Network Long Skip Connection

Numerical approximation of McKean-Vlasov SDEs via stochastic gradient descent

Contrastive Prefence Learning: Learning from Human Feedback without RL

CAPIVARA: Cost-Efficient Approach for Improving Multilingual CLIP Performance on Low-Resource Languages

Keyword: super-resolution

Auxiliary Features-Guided Super Resolution for Monte Carlo Rendering