New submissions for Thu, 9 Nov 23

Keyword: sgd

There is no result

Keyword: optimization

Parameter Tuning in the Radial Kernel-Based Partition of Unity Method by Bayesian Optimization

Authors: Authors: Roberto Cavoretto, Alessandra De Rossi, Sandro Lancellotti, Federico Romaniello
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2311.04289
Pdf link: https://arxiv.org/pdf/2311.04289
Abstract In this paper, we employ Bayesian optimization to concurrently explore the optimal values for both the shape parameter and the radius in the partition of unity interpolation using radial basis functions. Bayesian optimization is a probabilistic, iterative approach that models the error function through a progressively self-updated Gaussian process. Meanwhile, the partition of unity approach harnesses a meshfree method, allowing us to significantly reduce computational expenses, particularly when considering a substantial number of scattered data points. This reduction in computational cost is achieved by decomposing the entire domain into several smaller subdomains, each of them with a variable radius. We provide an estimation of the complexity of our algorithm and carry out numerical experiments to illustrate the effectiveness of our approach, dealing with test and real-world datasets.
Bayesian Approach for Radial Kernel Parameter Tuning
Authors: Authors: Roberto Cavoretto, Alessandra De Rossi, Sandro Lancellotti
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2311.04311
Pdf link: https://arxiv.org/pdf/2311.04311
Abstract In this paper we present a new fast and accurate method for Radial Basis Function (RBF) approximation, including interpolation as a special case, which enables us to effectively find the optimal value of the RBF shape parameter. In particular, we propose a statistical technique, called Bayesian optimization, that consists in modelling the error function with a Gaussian process, by which, through an iterative process, the optimal shape parameter is selected. The process is step by step self-updated resulting in a relevant decrease in search time with respect to the classical leave one out cross validation technique. Numerical results deriving from some test examples and an application to real data show the performance of the proposed method.
Practical Parallel Algorithms for Near-Optimal Densest Subgraphs on Massive Graphs
Authors: Authors: Pattara Sukprasert, Quanquan C. Liu, Laxman Dhulipala, Julian Shun
Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2311.04333
Pdf link: https://arxiv.org/pdf/2311.04333
Abstract The densest subgraph problem has received significant attention, both in theory and in practice, due to its applications in problems such as community detection, social network analysis, and spam detection. Due to the high cost of obtaining exact solutions, much attention has focused on designing approximate densest subgraph algorithms. However, existing approaches are not able to scale to massive graphs with billions of edges. In this paper, we introduce a new framework that combines approximate densest subgraph algorithms with a pruning optimization. We design new parallel variants of the state-of-the-art sequential Greedy++ algorithm, and plug it into our framework in conjunction with a parallel pruning technique based on $k$-core decomposition to obtain parallel $(1+\varepsilon)$-approximate densest subgraph algorithms. On a single thread, our algorithms achieve $2.6$--$34\times$ speedup over Greedy++, and obtain up to $22.37\times$ self relative parallel speedup on a 30-core machine with two-way hyper-threading. Compared with the state-of-the-art parallel algorithm by Harb et al. [NeurIPS'22], we achieve up to a $114\times$ speedup on the same machine. Finally, against the recent sequential algorithm of Xu et al. [PACMMOD'23], we achieve up to a $25.9\times$ speedup. The scalability of our algorithms enables us to obtain near-optimal density statistics on the hyperlink2012 (with roughly 113 billion edges) and clueweb (with roughly 37 billion edges) graphs for the first time in the literature.
Convex Methods for Constrained Linear Bandits
Authors: Authors: Amirhossein Afsharrad, Ahmadreza Moradipari, Sanjay Lall
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2311.04338
Pdf link: https://arxiv.org/pdf/2311.04338
Abstract Recently, bandit optimization has received significant attention in real-world safety-critical systems that involve repeated interactions with humans. While there exist various algorithms with performance guarantees in the literature, practical implementation of the algorithms has not received as much attention. This work presents a comprehensive study on the computational aspects of safe bandit algorithms, specifically safe linear bandits, by introducing a framework that leverages convex programming tools to create computationally efficient policies. In particular, we first characterize the properties of the optimal policy for safe linear bandit problem and then propose an end-to-end pipeline of safe linear bandit algorithms that only involves solving convex problems. We also numerically evaluate the performance of our proposed methods.
Device Sampling and Resource Optimization for Federated Learning in Cooperative Edge Networks
Authors: Authors: Su Wang, Roberto Morabito, Seyyedali Hosseinalipour, Mung Chiang, Christopher G. Brinton
Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.04350
Pdf link: https://arxiv.org/pdf/2311.04350
Abstract The conventional federated learning (FedL) architecture distributes machine learning (ML) across worker devices by having them train local models that are periodically aggregated by a server. FedL ignores two important characteristics of contemporary wireless networks, however: (i) the network may contain heterogeneous communication/computation resources, and (ii) there may be significant overlaps in devices' local data distributions. In this work, we develop a novel optimization methodology that jointly accounts for these factors via intelligent device sampling complemented by device-to-device (D2D) offloading. Our optimization methodology aims to select the best combination of sampled nodes and data offloading configuration to maximize FedL training accuracy while minimizing data processing and D2D communication resource consumption subject to realistic constraints on the network topology and device capabilities. Theoretical analysis of the D2D offloading subproblem leads to new FedL convergence bounds and an efficient sequential convex optimizer. Using these results, we develop a sampling methodology based on graph convolutional networks (GCNs) which learns the relationship between network attributes, sampled nodes, and D2D data offloading to maximize FedL accuracy. Through evaluation on popular datasets and real-world network measurements from our edge testbed, we find that our methodology outperforms popular device sampling methodologies from literature in terms of ML model performance, data processing overhead, and energy consumption.
Force-Constrained Visual Policy: Safe Robot-Assisted Dressing via Multi-Modal Sensing
Authors: Authors: Zhanyi Sun, Yufei Wang, David Held, Zackory Erickson
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2311.04390
Pdf link: https://arxiv.org/pdf/2311.04390
Abstract Robot-assisted dressing could profoundly enhance the quality of life of adults with physical disabilities. To achieve this, a robot can benefit from both visual and force sensing. The former enables the robot to ascertain human body pose and garment deformations, while the latter helps maintain safety and comfort during the dressing process. In this paper, we introduce a new technique that leverages both vision and force modalities for this assistive task. Our approach first trains a vision-based dressing policy using reinforcement learning in simulation with varying body sizes, poses, and types of garments. We then learn a force dynamics model for action planning to ensure safety. Due to limitations of simulating accurate force data when deformable garments interact with the human body, we learn a force dynamics model directly from real-world data. Our proposed method combines the vision-based policy, trained in simulation, with the force dynamics model, learned in the real world, by solving a constrained optimization problem to infer actions that facilitate the dressing process without applying excessive force on the person. We evaluate our system in simulation and in a real-world human study with 10 participants across 240 dressing trials, showing it greatly outperforms prior baselines. Video demonstrations are available on our project website\footnote{\url{https://sites.google.com/view/dressing-fcvp}}.
Understanding the Role and Design Space of Demand Sinks in Low-carbon Power Systems
Authors: Authors: Sam van der Jagt, Neha Patankar, Jesse Jenkins
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2311.04398
Pdf link: https://arxiv.org/pdf/2311.04398
Abstract As the availability of weather-dependent, zero marginal cost resources such as wind and solar power increases, a variety of flexible electricity loads, or `demand sinks', could be deployed to use intermittently available low-cost electricity to produce valuable outputs. This study provides a general framework to evaluate any potential demand sink technology and understand its viability to be deployed cost-effectively in low-carbon power systems. We use an electricity system optimization model to assess 98 discrete combinations of capital costs and output values that collectively span the range of feasible characteristics of potential demand sink technologies. We find that candidates like hydrogen electrolysis, direct air capture, and flexible electric heating can all achieve significant installed capacity (>10% of system peak load) if lower capital costs are reached in the future. Demand sink technologies significantly increase installed wind and solar capacity while not significantly affecting battery storage, firm generating capacity, or the average cost of electricity.
Likelihood Ratio Confidence Sets for Sequential Decision Making
Authors: Authors: Nicolas Emmenegger, Mojmír Mutný, Andreas Krause
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2311.04402
Pdf link: https://arxiv.org/pdf/2311.04402
Abstract Certifiable, adaptive uncertainty estimates for unknown quantities are an essential ingredient of sequential decision-making algorithms. Standard approaches rely on problem-dependent concentration results and are limited to a specific combination of parameterization, noise family, and estimator. In this paper, we revisit the likelihood-based inference principle and propose to use likelihood ratios to construct any-time valid confidence sequences without requiring specialized treatment in each application scenario. Our method is especially suitable for problems with well-specified likelihoods, and the resulting sets always maintain the prescribed coverage in a model-agnostic manner. The size of the sets depends on a choice of estimator sequence in the likelihood ratio. We discuss how to provably choose the best sequence of estimators and shed light on connections to online convex optimization with algorithms such as Follow-the-Regularized-Leader. To counteract the initially large bias of the estimators, we propose a reweighting scheme that also opens up deployment in non-parametric settings such as RKHS function classes. We provide a non-asymptotic analysis of the likelihood ratio confidence sets size for generalized linear models, using insights from convex duality and online learning. We showcase the practical strength of our method on generalized linear bandit problems, survival analysis, and bandits with various additive noise distributions.
Evaluating Emerging AI/ML Accelerators: IPU, RDU, and NVIDIA/AMD GPUs
Authors: Authors: Hongwu Peng, Caiwen Ding, Tong Geng, Sutanay Choudhury, Kevin Barker, Ang Li
Subjects: Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Performance (cs.PF)
Arxiv link: https://arxiv.org/abs/2311.04417
Pdf link: https://arxiv.org/pdf/2311.04417
Abstract The relentless advancement of artificial intelligence (AI) and machine learning (ML) applications necessitates the development of specialized hardware accelerators capable of handling the increasing complexity and computational demands. Traditional computing architectures, based on the von Neumann model, are being outstripped by the requirements of contemporary AI/ML algorithms, leading to a surge in the creation of accelerators like the Graphcore Intelligence Processing Unit (IPU), Sambanova Reconfigurable Dataflow Unit (RDU), and enhanced GPU platforms. These hardware accelerators are characterized by their innovative data-flow architectures and other design optimizations that promise to deliver superior performance and energy efficiency for AI/ML tasks. This research provides a preliminary evaluation and comparison of these commercial AI/ML accelerators, delving into their hardware and software design features to discern their strengths and unique capabilities. By conducting a series of benchmark evaluations on common DNN operators and other AI/ML workloads, we aim to illuminate the advantages of data-flow architectures over conventional processor designs and offer insights into the performance trade-offs of each platform. The findings from our study will serve as a valuable reference for the design and performance expectations of research prototypes, thereby facilitating the development of next-generation hardware accelerators tailored for the ever-evolving landscape of AI/ML applications. Through this analysis, we aspire to contribute to the broader understanding of current accelerator technologies and to provide guidance for future innovations in the field.
Strategies for Parallelizing the Big-Means Algorithm: A Comprehensive Tutorial for Effective Big Data Clustering
Authors: Authors: Ravil Mussabayev, Rustam Mussabayev
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2311.04517
Pdf link: https://arxiv.org/pdf/2311.04517
Abstract This study focuses on the optimization of the Big-means algorithm for clustering large-scale datasets, exploring four distinct parallelization strategies. We conducted extensive experiments to assess the computational efficiency, scalability, and clustering performance of each approach, revealing their benefits and limitations. The paper also delves into the trade-offs between computational efficiency and clustering quality, examining the impacts of various factors. Our insights provide practical guidance on selecting the best parallelization strategy based on available resources and dataset characteristics, contributing to a deeper understanding of parallelization techniques for the Big-means algorithm.
An Unsupervised Deep Learning Approach for the Wave Equation Inverse Problem
Authors: Authors: Xiong-Bin Yan, Keke Wu, Zhi-Qin John Xu, Zheng Ma
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.04531
Pdf link: https://arxiv.org/pdf/2311.04531
Abstract Full-waveform inversion (FWI) is a powerful geophysical imaging technique that infers high-resolution subsurface physical parameters by solving a non-convex optimization problem. However, due to limitations in observation, e.g., limited shots or receivers, and random noise, conventional inversion methods are confronted with numerous challenges, such as the local-minimum problem. In recent years, a substantial body of work has demonstrated that the integration of deep neural networks and partial differential equations for solving full-waveform inversion problems has shown promising performance. In this work, drawing inspiration from the expressive capacity of neural networks, we provide an unsupervised learning approach aimed at accurately reconstructing subsurface physical velocity parameters. This method is founded on a re-parametrization technique for Bayesian inference, achieved through a deep neural network with random weights. Notably, our proposed approach does not hinge upon the requirement of the labeled training dataset, rendering it exceedingly versatile and adaptable to diverse subsurface models. Extensive experiments show that the proposed approach performs noticeably better than existing conventional inversion methods.
FEIR: Quantifying and Reducing Envy and Inferiority for Fair Recommendation of Limited Resources
Authors: Authors: Nan Li, Bo Kang, Jefrey Lijffijt, Tijl De Bie
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.04542
Pdf link: https://arxiv.org/pdf/2311.04542
Abstract In settings such as e-recruitment and online dating, recommendation involves distributing limited opportunities, calling for novel approaches to quantify and enforce fairness. We introduce \emph{inferiority}, a novel (un)fairness measure quantifying a user's competitive disadvantage for their recommended items. Inferiority complements \emph{envy}, a fairness notion measuring preference for others' recommendations. We combine inferiority and envy with \emph{utility}, an accuracy-related measure of aggregated relevancy scores. Since these measures are non-differentiable, we reformulate them using a probabilistic interpretation of recommender systems, yielding differentiable versions. We combine these loss functions in a multi-objective optimization problem called \texttt{FEIR} (Fairness through Envy and Inferiority Reduction), applied as post-processing for standard recommender systems. Experiments on synthetic and real-world data demonstrate that our approach improves trade-offs between inferiority, envy, and utility compared to naive recommendations and the baseline methods.
Information-Theoretic Generalization Bounds for Transductive Learning and its Applications
Authors: Authors: Huayi Tang, Yong Liu
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2311.04561
Pdf link: https://arxiv.org/pdf/2311.04561
Abstract In this paper, we develop data-dependent and algorithm-dependent generalization bounds for transductive learning algorithms in the context of information theory for the first time. We show that the generalization gap of transductive learning algorithms can be bounded by the mutual information between training labels and hypothesis. By innovatively proposing the concept of transductive supersamples, we go beyond the inductive learning setting and establish upper bounds in terms of various information measures. Furthermore, we derive novel PAC-Bayesian bounds and build the connection between generalization and loss landscape flatness under the transductive learning setting. Finally, we present the upper bounds for adaptive optimization algorithms and demonstrate the applications of results on semi-supervised learning and graph learning scenarios. Our theoretic results are validated on both synthetic and real-world datasets.
GResilience: Trading Off Between the Greenness and the Resilience of Collaborative AI Systems
Authors: Authors: Diaeddin Rimawi, Antonio Liotta, Marco Todescato, Barbara Russo
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2311.04569
Pdf link: https://arxiv.org/pdf/2311.04569
Abstract A Collaborative Artificial Intelligence System (CAIS) works with humans in a shared environment to achieve a common goal. To recover from a disruptive event that degrades its performance and ensures its resilience, a CAIS may then need to perform a set of actions either by the system, by the humans, or collaboratively together. As for any other system, recovery actions may cause energy adverse effects due to the additional required energy. Therefore, it is of paramount importance to understand which of the above actions can better trade-off between resilience and greenness. In this in-progress work, we propose an approach to automatically evaluate CAIS recovery actions for their ability to trade-off between the resilience and greenness of the system. We have also designed an experiment protocol and its application to a real CAIS demonstrator. Our approach aims to attack the problem from two perspectives: as a one-agent decision problem through optimization, which takes the decision based on the score of resilience and greenness, and as a two-agent decision problem through game theory, which takes the decision based on the payoff computed for resilience and greenness as two players of a cooperative game.
Navigating Resource Conflicts: Co-opetition and Fairness
Authors: Authors: Shiksha Singhal
Subjects: Computer Science and Game Theory (cs.GT)
Arxiv link: https://arxiv.org/abs/2311.04603
Pdf link: https://arxiv.org/pdf/2311.04603
Abstract In today's dynamic and interconnected world, resource constraints pose significant challenges across various domains, ranging from networks, logistics and manufacturing to project management and optimization, etc. Resource-constrained problems (RCPs) represent a class of complex computational problems that require efficient allocation and utilization of limited resources to achieve optimal outcomes. This thesis aims to delve into such problems involving multiple agents, where agents aim to enhance their own payoffs, or a neutral moderator aims to maximise the system revenue while distributing the resources appropriately among all agents. In the former type of problems, agents may seek collaboration to achieve higher individual shares, resulting in a cooperative game with competition, i.e., co-opetition. Cooperative and non-cooperative game theory tools are utilized to analyze such games. On the other hand, for the latter kind of problems, we use tools from optimization and Markov decision processes.
Towards Deeper, Lighter and Interpretable Cross Network for CTR Prediction
Authors: Authors: Fangye Wang, Hansu Gu, Dongsheng Li, Tun Lu, Peng Zhang, Ning Gu
Subjects: Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2311.04635
Pdf link: https://arxiv.org/pdf/2311.04635
Abstract Click Through Rate (CTR) prediction plays an essential role in recommender systems and online advertising. It is crucial to effectively model feature interactions to improve the prediction performance of CTR models. However, existing methods face three significant challenges. First, while most methods can automatically capture high-order feature interactions, their performance tends to diminish as the order of feature interactions increases. Second, existing methods lack the ability to provide convincing interpretations of the prediction results, especially for high-order feature interactions, which limits the trustworthiness of their predictions. Third, many methods suffer from the presence of redundant parameters, particularly in the embedding layer. This paper proposes a novel method called Gated Deep Cross Network (GDCN) and a Field-level Dimension Optimization (FDO) approach to address these challenges. As the core structure of GDCN, Gated Cross Network (GCN) captures explicit high-order feature interactions and dynamically filters important interactions with an information gate in each order. Additionally, we use the FDO approach to learn condensed dimensions for each field based on their importance. Comprehensive experiments on five datasets demonstrate the effectiveness, superiority and interpretability of GDCN. Moreover, we verify the effectiveness of FDO in learning various dimensions and reducing model parameters. The code is available on \url{https://github.com/anonctr/GDCN}.
Bilevel Relations and Their Applications to Data Insights
Authors: Authors: Xi Wu, Xiangyao Yu, Shaleen Deep, Ahmed Mahmood, Uyeong Jang, Stratis Viglas, Somesh Jha, John Cieslewicz, Jeffrey F. Naughton
Subjects: Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC); Programming Languages (cs.PL)
Arxiv link: https://arxiv.org/abs/2311.04824
Pdf link: https://arxiv.org/pdf/2311.04824
Abstract Many data-insight analytic tasks in anomaly detection, metric attribution, and experimentation analysis can be modeled as searching in a large space of tables and finding important ones, where the notion of importance is defined in some adhoc manner. While various frameworks have been proposed (e.g., DIFF, VLDB 2019), a systematic and general treatment is lacking. This paper describes bilevel relations and operators. While a relation (i.e., table) models a set of tuples, a bilevel relation is a dictionary that explicitly models a set of tables, where each value'' table is identified by akey'' of a (region, features) pair, where region specifies key attributes of the table, and features specify columns of the table. Bilevel relational operators are BilevelRelation-to-BilevelRelation transformations and directly analyze a set of tables. Bilevel relations and operators provide higher level abstractions for creating and manipulating a set of tables, and are compatible with the classic relational algebra. Together, they allow us to construct bilevel queries, which can express succinctly a range of insight-analytical questions with ``search+eval'' character. We have implemented and deployed a query engine for bilevel queries as a service, which is a first of its kind. Bilevel queries pose a rich algorithm and system design space, such as query optimization and data format, in order to evaluate them efficiently. We describe our current designs and lessons, and report empirical evaluations. Bilevel queries have found many useful applications, and have attracted more than 30 internal teams to build data-insight applications with it.
Toward Rapid, Optimal, and Feasible Power Dispatch through Generalized Neural Mapping
Authors: Authors: Meiyi Li, Javad Mohammadi
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.04838
Pdf link: https://arxiv.org/pdf/2311.04838
Abstract The evolution towards a more distributed and interconnected grid necessitates large-scale decision-making within strict temporal constraints. Machine learning (ML) paradigms have demonstrated significant potential in improving the efficacy of optimization processes. However, the feasibility of solutions derived from ML models continues to pose challenges. It's imperative that ML models produce solutions that are attainable and realistic within the given system constraints of power systems. To address the feasibility issue and expedite the solution search process, we proposed LOOP-LC 2.0(Learning to Optimize the Optimization Process with Linear Constraints version 2.0) as a learning-based approach for solving the power dispatch problem. A notable advantage of the LOOP-LC 2.0 framework is its ability to ensure near-optimality and strict feasibility of solutions without depending on computationally intensive post-processing procedures, thus eliminating the need for iterative processes. At the heart of the LOOP-LC 2.0 model lies the newly proposed generalized gauge map method, capable of mapping any infeasible solution to a feasible point within the linearly-constrained domain. The proposed generalized gauge map method improves the traditional gauge map by exhibiting reduced sensitivity to input variances while increasing search speeds significantly. Utilizing the IEEE-200 test case as a benchmark, we demonstrate the effectiveness of the LOOP-LC 2.0 methodology, confirming its superior performance in terms of training speed, computational time, optimality, and solution feasibility compared to existing methodologies.
Computing with Residue Numbers in High-Dimensional Representation
Authors: Authors: Christopher J. Kymn, Denis Kleyko, E. Paxon Frady, Connor Bybee, Pentti Kanerva, Friedrich T. Sommer, Bruno A. Olshausen
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)
Arxiv link: https://arxiv.org/abs/2311.04872
Pdf link: https://arxiv.org/pdf/2311.04872
Abstract We introduce Residue Hyperdimensional Computing, a computing framework that unifies residue number systems with an algebra defined over random, high-dimensional vectors. We show how residue numbers can be represented as high-dimensional vectors in a manner that allows algebraic operations to be performed with component-wise, parallelizable operations on the vector elements. The resulting framework, when combined with an efficient method for factorizing high-dimensional vectors, can represent and operate on numerical values over a large dynamic range using vastly fewer resources than previous methods, and it exhibits impressive robustness to noise. We demonstrate the potential for this framework to solve computationally difficult problems in visual perception and combinatorial optimization, showing improvement over baseline methods. More broadly, the framework provides a possible account for the computational operations of grid cells in the brain, and it suggests new machine learning architectures for representing and manipulating numerical data.
Two Complementary Perspectives to Continual Learning: Ask Not Only What to Optimize, But Also How
Authors: Authors: Timm Hess, Tinne Tuytelaars, Gido M. van de Ven
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2311.04898
Pdf link: https://arxiv.org/pdf/2311.04898
Abstract Recent years have seen considerable progress in the continual training of deep neural networks, predominantly thanks to approaches that add replay or regularization terms to the loss function to approximate the joint loss over all tasks so far. However, we show that even with a perfect approximation to the joint loss, these approaches still suffer from temporary but substantial forgetting when starting to train on a new task. Motivated by this 'stability gap', we propose that continual learning strategies should focus not only on the optimization objective, but also on the way this objective is optimized. While there is some continual learning work that alters the optimization trajectory (e.g., using gradient projection techniques), this line of research is positioned as alternative to improving the optimization objective, while we argue it should be complementary. To evaluate the merits of our proposition, we plan to combine replay-approximated joint objectives with gradient projection-based optimization routines to test whether the addition of the latter provides benefits in terms of (1) alleviating the stability gap, (2) increasing the learning efficiency and (3) improving the final learning outcome.
Keyword: adam

Challenging Common Assumptions in Multi-task Learning
Authors: Authors: Cathrin Elich, Lukas Kirchdorfer, Jan M. Köhler, Lukas Schott
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.04698
Pdf link: https://arxiv.org/pdf/2311.04698
Abstract While multi-task learning (MTL) has gained significant attention in recent years, its underlying mechanisms remain poorly understood. Recent methods did not yield consistent performance improvements over single task learning (STL) baselines, underscoring the importance of gaining more profound insights about challenges specific to MTL. In our study, we challenge common assumptions in MTL in the context of STL: First, the choice of optimizer has only been mildly investigated in MTL. We show the pivotal role of common STL tools such as the Adam optimizer in MTL. We deduce the effectiveness of Adam to its partial loss-scale invariance. Second, the notion of gradient conflicts has often been phrased as a specific problem in MTL. We delve into the role of gradient conflicts in MTL and compare it to STL. For angular gradient alignment we find no evidence that this is a unique problem in MTL. We emphasize differences in gradient magnitude as the main distinguishing factor. Lastly, we compare the transferability of features learned through MTL and STL on common image corruptions, and find no conclusive evidence that MTL leads to superior transferability. Overall, we find surprising similarities between STL and MTL suggesting to consider methods from both fields in a broader context.
Keyword: gradient

Can LLMs Follow Simple Rules?
Authors: Authors: Norman Mu, Sarah Chen, Zifan Wang, Sizhe Chen, David Karamardian, Lulwa Aljeraisy, Dan Hendrycks, David Wagner
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.04235
Pdf link: https://arxiv.org/pdf/2311.04235
Abstract As Large Language Models (LLMs) are deployed with increasing real-world responsibilities, it is important to be able to specify and constrain the behavior of these systems in a reliable manner. Model developers may wish to set explicit rules for the model, such as "do not generate abusive content", but these may be circumvented by jailbreaking techniques. Evaluating how well LLMs follow developer-provided rules in the face of adversarial inputs typically requires manual review, which slows down monitoring and methods development. To address this issue, we propose Rule-following Language Evaluation Scenarios (RuLES), a programmatic framework for measuring rule-following ability in LLMs. RuLES consists of 15 simple text scenarios in which the model is instructed to obey a set of rules in natural language while interacting with the human user. Each scenario has a concise evaluation program to determine whether the model has broken any rules in a conversation. Through manual exploration of model behavior in our scenarios, we identify 6 categories of attack strategies and collect two suites of test cases: one consisting of unique conversations from manual testing and one that systematically implements strategies from the 6 categories. Across various popular proprietary and open models such as GPT-4 and Llama 2, we find that all models are susceptible to a wide variety of adversarial hand-crafted user inputs, though GPT-4 is the best-performing model. Additionally, we evaluate open models under gradient-based attacks and find significant vulnerabilities. We propose RuLES as a challenging new setting for research into exploring and defending against both manual and automatic attacks on LLMs.
Enhancing Malware Detection by Integrating Machine Learning with Cuckoo Sandbox
Authors: Authors: Amaal F. Alshmarni, Mohammed A. Alliheedi
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2311.04372
Pdf link: https://arxiv.org/pdf/2311.04372
Abstract In the modern era, malware is experiencing a significant increase in both its variety and quantity, aligning with the widespread adoption of the digital world. This surge in malware has emerged as a critical challenge in the realm of cybersecurity, prompting numerous research endeavors and contributions to address the issue. Machine learning algorithms have been leveraged for malware detection due to their ability to uncover concealed patterns within vast datasets. However, deep learning algorithms, characterized by their multi-layered structure, surpass the limitations of traditional machine learning approaches. By employing deep learning techniques such as CNN (Convolutional Neural Network) and RNN (Recurrent Neural Network), this study aims to classify and identify malware extracted from a dataset containing API call sequences. The performance of these algorithms is compared with that of conventional machine learning methods, including SVM (Support Vector Machine), RF (Random Forest), KNN (K-Nearest Neighbors), XGB (Extreme Gradient Boosting), and GBC (Gradient Boosting Classifier), all using the same dataset. The outcomes of this research demonstrate that both deep learning and machine learning algorithms achieve remarkably high levels of accuracy, reaching up to 99% in certain cases.
Reusing Convolutional Neural Network Models through Modularization and Composition
Authors: Authors: Binhang Qi, Hailong Sun, Hongyu Zhang, Xiang Gao
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2311.04438
Pdf link: https://arxiv.org/pdf/2311.04438
Abstract With the widespread success of deep learning technologies, many trained deep neural network (DNN) models are now publicly available. However, directly reusing the public DNN models for new tasks often fails due to mismatching functionality or performance. Inspired by the notion of modularization and composition in software reuse, we investigate the possibility of improving the reusability of DNN models in a more fine-grained manner. Specifically, we propose two modularization approaches named CNNSplitter and GradSplitter, which can decompose a trained convolutional neural network (CNN) model for $N$-class classification into $N$ small reusable modules. Each module recognizes one of the $N$ classes and contains a part of the convolution kernels of the trained CNN model. Then, the resulting modules can be reused to patch existing CNN models or build new CNN models through composition. The main difference between CNNSplitter and GradSplitter lies in their search methods: the former relies on a genetic algorithm to explore search space, while the latter utilizes a gradient-based search method. Our experiments with three representative CNNs on three widely-used public datasets demonstrate the effectiveness of the proposed approaches. Compared with CNNSplitter, GradSplitter incurs less accuracy loss, produces much smaller modules (19.88% fewer kernels), and achieves better results on patching weak models. In particular, experiments on GradSplitter show that (1) by patching weak models, the average improvement in terms of precision, recall, and F1-score is 17.13%, 4.95%, and 11.47%, respectively, and (2) for a new task, compared with the models trained from scratch, reusing modules achieves similar accuracy (the average loss of accuracy is only 2.46%) without a costly training process. Our approaches provide a viable solution to the rapid development and improvement of CNN models.
CLearViD: Curriculum Learning for Video Description
Authors: Authors: Cheng-Yu Chuang, Pooyan Fazli
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.04480
Pdf link: https://arxiv.org/pdf/2311.04480
Abstract Video description entails automatically generating coherent natural language sentences that narrate the content of a given video. We introduce CLearViD, a transformer-based model for video description generation that leverages curriculum learning to accomplish this task. In particular, we investigate two curriculum strategies: (1) progressively exposing the model to more challenging samples by gradually applying a Gaussian noise to the video data, and (2) gradually reducing the capacity of the network through dropout during the training process. These methods enable the model to learn more robust and generalizable features. Moreover, CLearViD leverages the Mish activation function, which provides non-linearity and non-monotonicity and helps alleviate the issue of vanishing gradients. Our extensive experiments and ablation studies demonstrate the effectiveness of the proposed model. The results on two datasets, namely ActivityNet Captions and YouCook2, show that CLearViD significantly outperforms existing state-of-the-art models in terms of both accuracy and diversity metrics.
Near-Linear Scaling Data Parallel Training with Overlapping-Aware Gradient Compression
Authors: Authors: Lin Meng, Yuzhong Sun, Weimin Li
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2311.04499
Pdf link: https://arxiv.org/pdf/2311.04499
Abstract Existing Data Parallel (DP) trainings for deep neural networks (DNNs) often experience limited scalability in speedup due to substantial communication overheads. While Overlapping technique can mitigate such problem by paralleling communication and computation in DP, its effectiveness is constrained by the high communication-to-computation ratios (CCR) of DP training tasks. Gradient compression (GC) is a promising technique to obtain lower CCR by reducing communication volume directly. However, it is challenging to obtain real performance improvement by applying GC into Overlapping because of (1) severe performance penalties in traditional GCs caused by high compression overhead and (2) decline of Overlapping benefit owing to the possible data dependency in GC schemes. In this paper, we propose COVAP, a novel GC scheme designing a new coarse-grained filter, makes the compression overhead close to zero. COVAP ensures an almost complete overlap of communication and computation by employing adaptive compression ratios and tensor sharding tailored to specific training tasks. COVAP also adopts an improved error feedback mechanism to maintain training accuracy. Experiments are conducted on Alibaba Cloud ECS instances with different DNNs of real-world applications. The results illustrate that COVAP outperforms existent GC schemes in time-to-solution by 1.92x-15.39x and exhibits near-linear scaling. Furthermore, COVAP achieves best scalability under experiments on four different cluster sizes.
Constrained Adaptive Attacks: Realistic Evaluation of Adversarial Examples and Robust Training of Deep Neural Networks for Tabular Data
Authors: Authors: Thibault Simonetto, Salah Ghamizi, Antoine Desjardins, Maxime Cordy, Yves Le Traon
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.04503
Pdf link: https://arxiv.org/pdf/2311.04503
Abstract State-of-the-art deep learning models for tabular data have recently achieved acceptable performance to be deployed in industrial settings. However, the robustness of these models remains scarcely explored. Contrary to computer vision, there is to date no realistic protocol to properly evaluate the adversarial robustness of deep tabular models due to intrinsic properties of tabular data such as categorical features, immutability, and feature relationship constraints. To fill this gap, we propose CAA, the first efficient evasion attack for constrained tabular deep learning models. CAA is an iterative parameter-free attack that combines gradient and search attacks to generate adversarial examples under constraints. We leverage CAA to build a benchmark of deep tabular models across three popular use cases: credit scoring, phishing and botnet attacks detection. Our benchmark supports ten threat models with increasing capabilities of the attacker, and reflects real-world attack scenarios for each use case. Overall, our results demonstrate how domain knowledge, adversarial training, and attack budgets impact the robustness assessment of deep tabular models and provide security practitioners with a set of recommendations to improve the robustness of deep tabular models against various evasion attack scenarios.
Challenging Common Assumptions in Multi-task Learning
Authors: Authors: Cathrin Elich, Lukas Kirchdorfer, Jan M. Köhler, Lukas Schott
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.04698
Pdf link: https://arxiv.org/pdf/2311.04698
Abstract While multi-task learning (MTL) has gained significant attention in recent years, its underlying mechanisms remain poorly understood. Recent methods did not yield consistent performance improvements over single task learning (STL) baselines, underscoring the importance of gaining more profound insights about challenges specific to MTL. In our study, we challenge common assumptions in MTL in the context of STL: First, the choice of optimizer has only been mildly investigated in MTL. We show the pivotal role of common STL tools such as the Adam optimizer in MTL. We deduce the effectiveness of Adam to its partial loss-scale invariance. Second, the notion of gradient conflicts has often been phrased as a specific problem in MTL. We delve into the role of gradient conflicts in MTL and compare it to STL. For angular gradient alignment we find no evidence that this is a unique problem in MTL. We emphasize differences in gradient magnitude as the main distinguishing factor. Lastly, we compare the transferability of features learned through MTL and STL on common image corruptions, and find no conclusive evidence that MTL leads to superior transferability. Overall, we find surprising similarities between STL and MTL suggesting to consider methods from both fields in a broader context.
Finite Element Methods for the Stretching and Bending of Thin Structures with Folding
Authors: Authors: Andrea Bonito, Diane Guignard, Angelique Morvant
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2311.04810
Pdf link: https://arxiv.org/pdf/2311.04810
Abstract In [Bonito et al., J. Comput. Phys. (2022)], a local discontinous Galerkin method was proposed for approximating the large bending of prestrained plates, and in [Bonito et al., IMA J. Numer. Anal. (2023)] the numerical properties of this method were explored. These works considered deformations driven predominantly by bending. Thus, a bending energy with a metric constraint was considered. We extend these results to the case of an energy with both a bending component and a nonconvex stretching component, and we also consider folding across a crease. The proposed discretization of this energy features a continuous finite element space, as well as a discrete Hessian operator. We establish the $\Gamma$-convergence of the discrete to the continuous energy and also present an energy-decreasing gradient flow for finding critical points of the discrete energy. Finally, we provide numerical simulations illustrating the convergence of minimizers and the capabilities of the model.
Real-Time Recurrent Reinforcement Learning
Authors: Authors: Julian Lemmel, Radu Grosu
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2311.04830
Pdf link: https://arxiv.org/pdf/2311.04830
Abstract Recent advances in reinforcement learning, for partially-observable Markov decision processes (POMDPs), rely on the biologically implausible backpropagation through time algorithm (BPTT) to perform gradient-descent optimisation. In this paper we propose a novel reinforcement learning algorithm that makes use of random feedback local online learning (RFLO), a biologically plausible approximation of realtime recurrent learning (RTRL) to compute the gradients of the parameters of a recurrent neural network in an online manner. By combining it with TD($\lambda$), a variant of temporaldifference reinforcement learning with eligibility traces, we create a biologically plausible, recurrent actor-critic algorithm, capable of solving discrete and continuous control tasks in POMDPs. We compare BPTT, RTRL and RFLO as well as different network architectures, and find that RFLO can perform just as well as RTRL while exceeding even BPTT in terms of complexity. The proposed method, called real-time recurrent reinforcement learning (RTRRL), serves as a model of learning in biological neural networks mimicking reward pathways in the mammalian brain.
Two Complementary Perspectives to Continual Learning: Ask Not Only What to Optimize, But Also How
Authors: Authors: Timm Hess, Tinne Tuytelaars, Gido M. van de Ven
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2311.04898
Pdf link: https://arxiv.org/pdf/2311.04898
Abstract Recent years have seen considerable progress in the continual training of deep neural networks, predominantly thanks to approaches that add replay or regularization terms to the loss function to approximate the joint loss over all tasks so far. However, we show that even with a perfect approximation to the joint loss, these approaches still suffer from temporary but substantial forgetting when starting to train on a new task. Motivated by this 'stability gap', we propose that continual learning strategies should focus not only on the optimization objective, but also on the way this objective is optimized. While there is some continual learning work that alters the optimization trajectory (e.g., using gradient projection techniques), this line of research is positioned as alternative to improving the optimization objective, while we argue it should be complementary. To evaluate the merits of our proposition, we plan to combine replay-approximated joint objectives with gradient projection-based optimization routines to test whether the addition of the latter provides benefits in terms of (1) alleviating the stability gap, (2) increasing the learning efficiency and (3) improving the final learning outcome.
Beyond Size: How Gradients Shape Pruning Decisions in Large Language Models
Authors: Authors: Rocktim Jyoti Das, Liqun Ma, Zhiqiang Shen
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.04902
Pdf link: https://arxiv.org/pdf/2311.04902
Abstract Large Language Models (LLMs) with a billion or more parameters are prime targets for network pruning, which aims to reduce a portion of the network weights without compromising performance. Prior approaches such as Weights Magnitude, SparseGPT, and Wanda, either concentrated solely on weights or integrated weights with activations for sparsity. However, they overlooked the informative gradients derived from pretrained large language models. In this paper, we present a novel sparsity-centric pruning method for pretrained LLMs, termed Gradient-based Language Model Pruner (GBLM-Pruner). GBLM-Pruner leverages the first-order term of the Taylor expansion, operating in a training-free manner by harnessing properly normalized gradients from a few calibration samples to determine the importance pruning score, and substantially outperforms competitive counterparts like SparseGPT and Wanda in multiple benchmarks. Intriguing, after incorporating gradients, the unstructured pruning method tends to reveal some structural patterns post-pruning, which mirrors the geometric interdependence inherent in the LLMs' parameter structure. Additionally, GBLM-Pruner functions without any subsequent retraining or weight updates to maintain its simplicity as other counterparts. Extensive evaluations on LLaMA-1 and LLaMA-2 across various language benchmarks and perplexity show that GBLM-Pruner surpasses magnitude pruning, Wanda (weights+activations) and SparseGPT (weights+activations+weight update) by significant margins. Our code and models are available at https://github.com/RocktimJyotiDas/GBLM-Pruner.
Keyword: super-resolution

There is no result

zoq / arxiv-updates

New submissions for Thu, 9 Nov 23 #639

Keyword: sgd

Keyword: optimization

Parameter Tuning in the Radial Kernel-Based Partition of Unity Method by Bayesian Optimization

Bayesian Approach for Radial Kernel Parameter Tuning

Practical Parallel Algorithms for Near-Optimal Densest Subgraphs on Massive Graphs

Convex Methods for Constrained Linear Bandits

Device Sampling and Resource Optimization for Federated Learning in Cooperative Edge Networks

Force-Constrained Visual Policy: Safe Robot-Assisted Dressing via Multi-Modal Sensing

Understanding the Role and Design Space of Demand Sinks in Low-carbon Power Systems

Likelihood Ratio Confidence Sets for Sequential Decision Making

Evaluating Emerging AI/ML Accelerators: IPU, RDU, and NVIDIA/AMD GPUs

Strategies for Parallelizing the Big-Means Algorithm: A Comprehensive Tutorial for Effective Big Data Clustering

An Unsupervised Deep Learning Approach for the Wave Equation Inverse Problem

FEIR: Quantifying and Reducing Envy and Inferiority for Fair Recommendation of Limited Resources

Information-Theoretic Generalization Bounds for Transductive Learning and its Applications

GResilience: Trading Off Between the Greenness and the Resilience of Collaborative AI Systems

Navigating Resource Conflicts: Co-opetition and Fairness

Towards Deeper, Lighter and Interpretable Cross Network for CTR Prediction

Bilevel Relations and Their Applications to Data Insights

Toward Rapid, Optimal, and Feasible Power Dispatch through Generalized Neural Mapping

Computing with Residue Numbers in High-Dimensional Representation

Two Complementary Perspectives to Continual Learning: Ask Not Only What to Optimize, But Also How

Keyword: adam

Challenging Common Assumptions in Multi-task Learning

Keyword: gradient

Can LLMs Follow Simple Rules?

Enhancing Malware Detection by Integrating Machine Learning with Cuckoo Sandbox

Reusing Convolutional Neural Network Models through Modularization and Composition

CLearViD: Curriculum Learning for Video Description

Near-Linear Scaling Data Parallel Training with Overlapping-Aware Gradient Compression

Constrained Adaptive Attacks: Realistic Evaluation of Adversarial Examples and Robust Training of Deep Neural Networks for Tabular Data

Challenging Common Assumptions in Multi-task Learning

Finite Element Methods for the Stretching and Bending of Thin Structures with Folding

Real-Time Recurrent Reinforcement Learning

Two Complementary Perspectives to Continual Learning: Ask Not Only What to Optimize, But Also How

Beyond Size: How Gradients Shape Pruning Decisions in Large Language Models

Keyword: super-resolution