New submissions for Tue, 10 Oct 23

Keyword: sgd

Profit: Benchmarking Personalization and Robustness Trade-off in Federated Prompt Tuning

Authors: Liam Collins, Shanshan Wu, Sewoong Oh, Khe Chai Sim
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.04627
Pdf link: https://arxiv.org/pdf/2310.04627
Abstract In many applications of federated learning (FL), clients desire models that are personalized using their local data, yet are also robust in the sense that they retain general global knowledge. However, the presence of data heterogeneity across clients induces a fundamental trade-off between personalization (i.e., adaptation to a local distribution) and robustness (i.e., not forgetting previously learned general knowledge). It is critical to understand how to navigate this personalization vs robustness trade-off when designing federated systems, which are increasingly moving towards a paradigm of fine-tuning large foundation models. Due to limited computational and communication capabilities in most federated settings, this foundation model fine-tuning must be done using parameter-efficient fine-tuning (PEFT) approaches. While some recent work has studied federated approaches to PEFT, the personalization vs robustness trade-off of federated PEFT has been largely unexplored. In this work, we take a step towards bridging this gap by benchmarking fundamental FL algorithms -- FedAvg and FedSGD plus personalization (via client local fine-tuning) -- applied to one of the most ubiquitous PEFT approaches to large language models (LLMs) -- prompt tuning -- in a multitude of hyperparameter settings under varying levels of data heterogeneity. Our results show that federated-trained prompts can be surprisingly robust when using a small learning rate with many local epochs for personalization, especially when using an adaptive optimizer as the client optimizer during federated training. We also demonstrate that simple approaches such as adding regularization and interpolating two prompts are effective in improving the personalization vs robustness trade-off in computation-limited settings with few local updates allowed for personalization.
Keyword: optimization

Facilitating Battery Swapping Services for Freight Trucks with Spatial-Temporal Demand Prediction
Authors: Linyu Liu, Zhen Dai, Shiji Song, Xiaocheng Li, Guanting Chen
Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.04440
Pdf link: https://arxiv.org/pdf/2310.04440
Abstract Electrifying heavy-duty trucks offers a substantial opportunity to curtail carbon emissions, advancing toward a carbon-neutral future. However, the inherent challenges of limited battery energy and the sheer weight of heavy-duty trucks lead to reduced mileage and prolonged charging durations. Consequently, battery-swapping services emerge as an attractive solution for these trucks. This paper employs a two-fold approach to investigate the potential and enhance the efficacy of such services. Firstly, spatial-temporal demand prediction models are adopted to predict the traffic patterns for the upcoming hours. Subsequently, the prediction guides an optimization module for efficient battery allocation and deployment. Analyzing the heavy-duty truck data on a highway network spanning over 2,500 miles, our model and analysis underscore the value of prediction/machine learning in facilitating future decision-making. In particular, we find that the initial phase of implementing battery-swapping services favors mobile battery-swapping stations, but as the system matures, fixed-location stations are preferred.
Leveraging Data Geometry to Mitigate CSM in Steganalysis
Authors: Rony Abecidan (CRIStAL, CNRS), Vincent Itier (IMT Nord Europe, CRIStAL), Jérémie Boulanger (CRIStAL), Patrick Bas (CRIStAL, CNRS), Tomáš Pevný (CTU)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2310.04479
Pdf link: https://arxiv.org/pdf/2310.04479
Abstract In operational scenarios, steganographers use sets of covers from various sensors and processing pipelines that differ significantly from those used by researchers to train steganalysis models. This leads to an inevitable performance gap when dealing with out-of-distribution covers, commonly referred to as Cover Source Mismatch (CSM). In this study, we consider the scenario where test images are processed using the same pipeline. However, knowledge regarding both the labels and the balance between cover and stego is missing. Our objective is to identify a training dataset that allows for maximum generalization to our target. By exploring a grid of processing pipelines fostering CSM, we discovered a geometrical metric based on the chordal distance between subspaces spanned by DCTr features, that exhibits high correlation with operational regret while being not affected by the cover-stego balance. Our contribution lies in the development of a strategy that enables the selection or derivation of customized training datasets, enhancing the overall generalization performance for a given target. Experimental validation highlights that our geometry-based optimization strategy outperforms traditional atomistic methods given reasonable assumptions. Additional resources are available at github.com/RonyAbecidan/LeveragingGeometrytoMitigateCSM.
EMOFM: Ensemble MLP mOdel with Feature-based Mixers for Click-Through Rate Prediction
Authors: Yujian Betterest Li, Kai Wu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.04482
Pdf link: https://arxiv.org/pdf/2310.04482
Abstract Track one of the CTI competition is on click-through rate (CTR) prediction. The dataset contains millions of records and each field-wise feature in a record consists of hashed integers for privacy. For this task, the keys of network-based methods might be type-wise feature extraction and information fusion across different fields. Multi-layer perceptrons (MLPs) are able to extract field features, but cannot efficiently fuse them. Motivated by the natural fusion characteristic of cross attention and the efficiency of transformer-based structures, we propose simple plug-in mixers for field/type-wise feature fusion, and thus construct a field- and type-wise ensemble model, namely EMOFM (Ensemble MLP mOdel with Feature-based Mixers). In the experiments, the proposed model is evaluated on the dataset, the optimization process is visualized, and ablation studies are explored. It is shown that EMOFM outperforms the compared baselines. In the end, we discuss future work. WARNING: The comparison might not be fair enough since the proposed method is designed for this data in particular while compared methods are not. For example, EMOFM especially takes different types of interactions into consideration while others do not. Anyway, we do hope that the ideas inside our method could help other developers/learners/researchers/thinkers and so on.
A Bi-objective Perspective on Controllable Language Models: Reward Dropout Improves Off-policy Control Performance
Authors: Changhun Lee, Chiehyeon Lim
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2310.04483
Pdf link: https://arxiv.org/pdf/2310.04483
Abstract We study the theoretical aspects of CLMs (Controllable Language Models) from a bi-objective optimization perspective. Specifically, we consider the CLMs as an off-policy RL problem that requires simultaneously maximizing the reward and likelihood objectives. Our main contribution consists of three parts. First, we establish the theoretical foundations of CLM by presenting reward upper bound and Pareto improvement/optimality conditions. Second, we analyze conditions that improve and violate Pareto optimality itself, respectively. Finally, we propose Reward Dropout, a simple yet powerful method to guarantee policy improvement based on a Pareto improvement condition. Our theoretical outcomes are supported by not only deductive proofs but also empirical results. The performance of Reward Dropout was evaluated on five CLM benchmark datasets, and it turns out that the Reward Dropout significantly improves the performance of CLMs.
Submodular Norms with Applications To Online Facility Location and Stochastic Probing
Authors: Kalen Patton, Matteo Russo, Sahil Singla
Subjects: Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2310.04548
Pdf link: https://arxiv.org/pdf/2310.04548
Abstract Optimization problems often involve vector norms, which has led to extensive research on developing algorithms that can handle objectives beyond the $\ell_p$ norms. Our work introduces the concept of submodular norms, which are a versatile type of norms that possess marginal properties similar to submodular set functions. We show that submodular norms can accurately represent or approximate well-known classes of norms, such as $\ell_p$ norms, ordered norms, and symmetric norms. Furthermore, we establish that submodular norms can be applied to optimization problems such as online facility location, stochastic probing, and generalized load balancing. This allows us to develop a logarithmic-competitive algorithm for online facility location with symmetric norms, to prove a logarithmic adaptivity gap for stochastic probing with symmetric norms, and to give an alternative poly-logarithmic approximation algorithm for generalized load balancing with outer $\ell_1$ norm and inner symmetric norms.
DragD3D: Vertex-based Editing for Realistic Mesh Deformations using 2D Diffusion Priors
Authors: Tianhao Xie, Eugene Belilovsky, Sudhir Mudur, Tiberiu Popa
Subjects: Graphics (cs.GR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.04561
Pdf link: https://arxiv.org/pdf/2310.04561
Abstract Direct mesh editing and deformation are key components in the geometric modeling and animation pipeline. Direct mesh editing methods are typically framed as optimization problems combining user-specified vertex constraints with a regularizer that determines the position of the rest of the vertices. The choice of the regularizer is key to the realism and authenticity of the final result. Physics and geometry-based regularizers are not aware of the global context and semantics of the object, and the more recent deep learning priors are limited to a specific class of 3D object deformations. In this work, our main contribution is a local mesh editing method called DragD3D for global context-aware realistic deformation through direct manipulation of a few vertices. DragD3D is not restricted to any class of objects. It achieves this by combining the classic geometric ARAP (as rigid as possible) regularizer with 2D priors obtained from a large-scale diffusion model. Specifically, we render the objects from multiple viewpoints through a differentiable renderer and use the recently introduced DDS loss which scores the faithfulness of the rendered image to one from a diffusion model. DragD3D combines the approximate gradients of the DDS with gradients from the ARAP loss to modify the mesh vertices via neural Jacobian field, while also satisfying vertex constraints. We show that our deformations are realistic and aware of the global context of the objects, and provide better results than just using geometric regularizers.
Can pruning make Large Language Models more efficient?
Authors: Sia Gholami, Marwan Omar
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2310.04573
Pdf link: https://arxiv.org/pdf/2310.04573
Abstract Transformer models have revolutionized natural language processing with their unparalleled ability to grasp complex contextual relationships. However, the vast number of parameters in these models has raised concerns regarding computational efficiency, environmental impact, and deployability on resource-limited platforms. To address these challenges, this paper investigates the application of weight pruning (a strategic reduction of model parameters based on their significance) as an optimization strategy for Transformer architectures. Through extensive experimentation, we explore various pruning methodologies, highlighting their impact on model performance, size, and computational demands. Our findings suggest that with judicious selection of pruning hyperparameters, significant reductions in model size are attainable without considerable compromise on performance. Moreover, when coupled with post-pruning fine-tuning strategies, some pruned models even exhibit enhanced generalization capabilities. This work seeks to bridge the gap between model efficiency and performance, paving the way for more scalable and environmentally responsible deep learning applications.
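To make the idea of weight pruning concrete, here is a minimal sketch of unstructured magnitude pruning. It assumes that "significance" is measured by absolute weight value, which is a common choice but is not stated in the abstract; the function name `magnitude_prune` and the `sparsity` parameter are hypothetical.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of entries with the smallest |w|.

    Assumption: significance == absolute magnitude (not taken from the paper).
    Returns the pruned weights and the boolean keep-mask.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    w = np.asarray(weights, dtype=float)
    num_pruned = int(round(sparsity * w.size))
    keep = np.ones(w.size, dtype=bool)
    if num_pruned > 0:
        # Indices of the flattened array, sorted by increasing magnitude.
        order = np.argsort(np.abs(w), axis=None, kind="stable")
        keep[order[:num_pruned]] = False
    keep = keep.reshape(w.shape)
    return np.where(keep, w, 0.0), keep

if __name__ == "__main__":
    layer = np.array([[0.1, -2.0], [0.5, -0.05]])
    pruned, mask = magnitude_prune(layer, sparsity=0.5)
    print(pruned)
    print("fraction kept:", mask.mean())
```

In practice the mask would be kept fixed during any post-pruning fine-tuning so that removed weights stay at zero.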
Deep Model Predictive Optimization
Authors: Jacob Sacks, Rwik Rana, Kevin Huang, Alex Spitzer, Guanya Shi, Byron Boots
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.04590
Pdf link: https://arxiv.org/pdf/2310.04590
Abstract A major challenge in robotics is to design robust policies which enable complex and agile behaviors in the real world. On one end of the spectrum, we have model-free reinforcement learning (MFRL), which is incredibly flexible and general but often results in brittle policies. In contrast, model predictive control (MPC) continually re-plans at each time step to remain robust to perturbations and model inaccuracies. However, despite its real-world successes, MPC often under-performs the optimal strategy. This is due to model quality, myopic behavior from short planning horizons, and approximations due to computational constraints. And even with a perfect model and enough compute, MPC can get stuck in bad local optima, depending heavily on the quality of the optimization algorithm. To this end, we propose Deep Model Predictive Optimization (DMPO), which learns the inner-loop of an MPC optimization algorithm directly via experience, specifically tailored to the needs of the control problem. We evaluate DMPO on a real quadrotor agile trajectory tracking task, on which it improves performance over a baseline MPC algorithm for a given computational budget. It can outperform the best MPC algorithm by up to 27% with fewer samples and an end-to-end policy trained with MFRL by 19%. Moreover, because DMPO requires fewer samples, it can also achieve these benefits with 4.3X less memory. When we subject the quadrotor to turbulent wind fields with an attached drag plate, DMPO can adapt zero-shot while still outperforming all baselines. Additional results can be found at https://tinyurl.com/mr2ywmnw.
KyberMat: Efficient Accelerator for Matrix-Vector Polynomial Multiplication in CRYSTALS-Kyber Scheme via NTT and Polyphase Decomposition
Authors: Weihang Tan, Yingjie Lao, Keshab K. Parhi
Subjects: Cryptography and Security (cs.CR); Hardware Architecture (cs.AR)
Arxiv link: https://arxiv.org/abs/2310.04618
Pdf link: https://arxiv.org/pdf/2310.04618
Abstract CRYSTALS-Kyber (Kyber) is one of the post-quantum cryptography (PQC) key-encapsulation mechanism (KEM) schemes selected during the standardization process. This paper addresses optimization for Kyber architecture with respect to latency and throughput constraints. Specifically, matrix-vector multiplication and number theoretic transform (NTT)-based polynomial multiplication are critical operations and bottlenecks that require optimization. To address this challenge, we propose an algorithm and hardware co-design approach to systematically optimize matrix-vector multiplication and NTT-based polynomial multiplication by employing a novel sub-structure sharing technique in order to reduce computational complexity, i.e., the number of modular multiplications and modular additions/subtractions consumed. The sub-structure sharing approach is inspired by prior fast parallel approaches based on polyphase decomposition. The proposed efficient feed-forward architecture achieves high speed, low latency, and full utilization of all hardware components, which can significantly enhance the overall efficiency of the Kyber scheme. The FPGA implementation results show that our proposed design, using the fast two-parallel structure, leads to an approximate reduction of 90% in execution time, along with a 66 times improvement in throughput performance.
An Experimental Comparison of Methods for Computing the Numerical Radius
Authors: Tim Mitchell, Michael L. Overton
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2310.04646
Pdf link: https://arxiv.org/pdf/2310.04646
Abstract We make an experimental comparison of methods for computing the numerical radius of an $n\times n$ complex matrix, based on two well-known characterizations, the first a nonconvex optimization problem in one real variable and the second a convex optimization problem in $n^{2}+1$ real variables. We make comparisons with respect to both accuracy and computation time using publicly available software.
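For orientation, the one-variable characterization is usually written as $r(A) = \max_{\theta \in [0, 2\pi)} \lambda_{\max}\big((e^{i\theta}A + e^{-i\theta}A^{*})/2\big)$. The sketch below evaluates it on a uniform grid of angles, which only gives a lower bound on $r(A)$; it is a naive illustration, not one of the methods compared in the paper, and the function name `numerical_radius_grid` and the grid size are assumptions.

```python
import numpy as np

def numerical_radius_grid(A, num_angles=3600):
    """Lower bound on the numerical radius r(A) from a uniform angle grid.

    For each theta, H(theta) = (e^{i theta} A + e^{-i theta} A^*) / 2 is
    Hermitian, and r(A) is the maximum over theta of its largest eigenvalue.
    """
    A = np.asarray(A, dtype=complex)
    best = 0.0
    for theta in np.linspace(0.0, 2.0 * np.pi, num_angles, endpoint=False):
        rotated = np.exp(1j * theta) * A
        H = (rotated + rotated.conj().T) / 2.0
        # eigvalsh returns eigenvalues in ascending order; take the largest.
        best = max(best, float(np.linalg.eigvalsh(H)[-1]))
    return best

if __name__ == "__main__":
    # Scaled 2x2 Jordan block: its numerical range is the closed unit disk.
    J = np.array([[0.0, 2.0], [0.0, 0.0]])
    print(numerical_radius_grid(J))
```

Because the objective is nonconvex in $\theta$, a grid can miss narrow peaks, which is one reason more careful algorithms exist for this problem.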
Oracle Efficient Algorithms for Groupwise Regret
Authors: Krishna Acharya, Eshwar Ram Arunachaleswaran, Sampath Kannan, Aaron Roth, Juba Ziani
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.04652
Pdf link: https://arxiv.org/pdf/2310.04652
Abstract We study the problem of online prediction, in which at each time step $t$, an individual $x_t$ arrives, whose label we must predict. Each individual is associated with various groups, defined based on their features such as age, sex, race etc., which may intersect. Our goal is to make predictions that have regret guarantees not just overall but also simultaneously on each sub-sequence comprised of the members of any single group. Previous work such as [Blum & Lykouris] and [Lee et al] provide attractive regret guarantees for these problems; however, these are computationally intractable on large model classes. We show that a simple modification of the sleeping experts technique of [Blum & Lykouris] yields an efficient reduction to the well-understood problem of obtaining diminishing external regret absent group considerations. Our approach gives similar regret guarantees compared to [Blum & Lykouris]; however, we run in time linear in the number of groups, and are oracle-efficient in the hypothesis class. This in particular implies that our algorithm is efficient whenever the number of groups is polynomially bounded and the external-regret problem can be solved efficiently, an improvement on [Blum & Lykouris]'s stronger condition that the model class must be small. Our approach can handle online linear regression and online combinatorial optimization problems like online shortest paths. Beyond providing theoretical regret bounds, we evaluate this algorithm with an extensive set of experiments on synthetic data and on two real data sets -- Medical costs and the Adult income dataset, both instantiated with intersecting groups defined in terms of race, sex, and other demographic characteristics. We find that uniformly across groups, our algorithm gives substantial error improvements compared to running a standard online linear regression algorithm with no groupwise regret guarantees.
Hypergraph Analysis Based on a Compatible Tensor Product Structure
Authors: Jiaqi Gu, Shenghao Feng, Yimin Wei
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2310.04682
Pdf link: https://arxiv.org/pdf/2310.04682
Abstract We propose a tensor product structure that is compatible with the hypergraph structure. We define the algebraic connectivity of the $(m+1)$-uniform hypergraph in this product, and prove the relationship with the vertex connectivity. We introduce some connectivity optimization problem into the hypergraph, and solve them with the algebraic connectivity. We introduce the Laplacian eigenmap algorithm to the hypergraph under our tensor product.
Automatic and Efficient Customization of Neural Networks for ML Applications
Authors: Yuhan Liu, Chengcheng Wan, Kuntai Du, Henry Hoffmann, Junchen Jiang, Shan Lu, Michael Maire
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2310.04685
Pdf link: https://arxiv.org/pdf/2310.04685
Abstract ML APIs have greatly relieved application developers of the burden to design and train their own neural network models -- classifying objects in an image can now be as simple as one line of Python code to call an API. However, these APIs offer the same pre-trained models regardless of how their output is used by different applications. This can be suboptimal as not all ML inference errors can cause application failures, and the distinction between inference errors that can or cannot cause failures varies greatly across applications. To tackle this problem, we first study 77 real-world applications, which collectively use six ML APIs from two providers, to reveal common patterns of how ML API output affects applications' decision processes. Inspired by the findings, we propose ChameleonAPI, an optimization framework for ML APIs, which takes effect without changing the application source code. ChameleonAPI provides application developers with a parser that automatically analyzes the application to produce an abstract of its decision process, which is then used to devise an application-specific loss function that only penalizes API output errors critical to the application. ChameleonAPI uses the loss function to efficiently train a neural network model customized for each application and deploys it to serve API invocations from the respective application via existing interface. Compared to a baseline that selects the best-of-all commercial ML API, we show that ChameleonAPI reduces incorrect application decisions by 43%.
Understanding and Improving Adversarial Attacks on Latent Diffusion Model
Authors: Boyang Zheng, Chumeng Liang, Xiaoyu Wu, Yan Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.04687
Pdf link: https://arxiv.org/pdf/2310.04687
Abstract Latent Diffusion Model (LDM) has emerged as a leading tool in image generation, particularly with its capability in few-shot generation. This capability also presents risks, notably in unauthorized artwork replication and misinformation generation. In response, adversarial attacks have been designed to safeguard personal images from being used as reference data. However, existing adversarial attacks are predominantly empirical, lacking a solid theoretical foundation. In this paper, we introduce a comprehensive theoretical framework for understanding adversarial attacks on LDM. Based on the framework, we propose a novel adversarial attack that exploits a unified target to guide the adversarial attack both in the forward and the reverse process of LDM. We provide empirical evidence that our method overcomes the offset problem of the optimization of adversarial attacks in existing methods. Through rigorous experiments, our findings demonstrate that our method outperforms current attacks and is able to generalize over different state-of-the-art few-shot generation pipelines based on LDM. Our method can serve as a stronger and more efficient tool for people exposed to the risk of data privacy and security to protect themselves in the new era of powerful generative models. The code is available on GitHub: https://github.com/CaradryanLiang/ImprovedAdvDM.git.
EMO: Earth Mover Distance Optimization for Auto-Regressive Language Modeling
Authors: Siyu Ren, Zhiyong Wu, Kenny Q. Zhu
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2310.04691
Pdf link: https://arxiv.org/pdf/2310.04691
Abstract Neural language models are probabilistic models of human text. They are predominantly trained using maximum likelihood estimation (MLE), which is equivalent to minimizing the forward cross-entropy between the empirical data distribution and the model distribution. However, various degeneration phenomena are still widely observed when decoding from the distributions learned by such models. We establish that the forward cross-entropy is suboptimal as a distance metric for aligning human and model distribution due to its (1) recall-prioritization, (2) negative diversity ignorance, and (3) train-test mismatch. In this paper, we propose Earth Mover Distance Optimization (EMO) for auto-regressive language modeling. EMO capitalizes on the inherent properties of earth mover distance to address the aforementioned challenges. Due to the high complexity of direct computation, we further introduce a feasible upper bound for EMO to ease end-to-end training. Upon extensive evaluation of language models trained using EMO and MLE, we find that EMO demonstrates a consistently better language modeling performance than MLE across domains. Moreover, EMO demonstrates noteworthy enhancements in downstream performance with minimal fine-tuning on merely 25,000 sentences. This highlights the tremendous potential of EMO as a lightweight calibration method for enhancing large-scale pre-trained language models.
Review of Machine Learning Techniques for Power Electronics Control and Optimization
Authors: Maryam Bahrami, Zeyad Khashroum
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.04699
Pdf link: https://arxiv.org/pdf/2310.04699
Abstract In the rapidly advancing landscape of contemporary technology, power electronics assume a pivotal role across diverse applications, ranging from renewable energy systems to electric vehicles and consumer electronics. The efficacy and precision of these power electronics systems stand as cornerstones of their functionality. Within this context, the integration of machine learning techniques assumes paramount significance. This article endeavors to present an extensive and comprehensive review of the machine learning techniques that find application in power electronics control and optimization. Through meticulous exploration, we aim to elucidate the profound potential of these methods in shaping the future of power electronics control and optimization.
A Comprehensive Survey on Deep Neural Image Deblurring
Authors: Sajjad Amrollahi Biyouki, Hoon Hwangbo
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2310.04719
Pdf link: https://arxiv.org/pdf/2310.04719
Abstract Image deblurring tries to eliminate degradation elements of an image causing blurriness and improve the quality of an image for better texture and object visualization. Traditionally, prior-based optimization approaches predominated in image deblurring, but deep neural networks recently brought a major breakthrough in the field. In this paper, we comprehensively review the recent progress of the deep neural architectures in both blind and non-blind image deblurring. We outline the most popular deep neural network structures used in deblurring applications, describe their strengths and novelties, summarize performance metrics, and introduce broadly used datasets. In addition, we discuss the current challenges and research gaps in this domain and suggest potential research directions for future works.
Optimal Sequential Decision-Making in Geosteering: A Reinforcement Learning Approach
Authors: Ressi Bonti Muhammad, Sergey Alyaev, Reidar Brumer Bratvold
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Geophysics (physics.geo-ph)
Arxiv link: https://arxiv.org/abs/2310.04772
Pdf link: https://arxiv.org/pdf/2310.04772
Abstract Trajectory adjustment decisions throughout the drilling process, called geosteering, affect subsequent choices and information gathering, thus resulting in a coupled sequential decision problem. Previous works on applying decision optimization methods in geosteering rely on greedy optimization or Approximate Dynamic Programming (ADP). Either decision optimization method requires explicit uncertainty and objective function models, making developing decision optimization methods for complex and realistic geosteering environments challenging to impossible. We use the Deep Q-Network (DQN) method, a model-free reinforcement learning (RL) method that learns directly from the decision environment, to optimize geosteering decisions. The expensive computations for RL are handled during the offline training stage. Evaluating DQN needed for real-time decision support takes milliseconds and is faster than the traditional alternatives. Moreover, for two previously published synthetic geosteering scenarios, our results show that RL achieves high-quality outcomes comparable to the quasi-optimal ADP. Yet, the model-free nature of RL means that by replacing the training environment, we can extend it to problems where the solution to ADP is prohibitively expensive to compute. This flexibility will allow applying it to more complex environments and make hybrid versions trained with real data in the future.
HI-SLAM: Monocular Real-time Dense Mapping with Hybrid Implicit Fields
Authors: Wei Zhang, Tiecheng Sun, Sen Wang, Qing Cheng, Norbert Haala
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.04787
Pdf link: https://arxiv.org/pdf/2310.04787
Abstract In this letter, we present a neural field-based real-time monocular mapping framework for accurate and dense Simultaneous Localization and Mapping (SLAM). Recent neural mapping frameworks show promising results, but rely on RGB-D or pose inputs, or cannot run in real-time. To address these limitations, our approach integrates dense-SLAM with neural implicit fields. Specifically, our dense SLAM approach runs parallel tracking and global optimization, while a neural field-based map is constructed incrementally based on the latest SLAM estimates. For the efficient construction of neural fields, we employ multi-resolution grid encoding and signed distance function (SDF) representation. This allows us to keep the map always up-to-date and adapt instantly to global updates via loop closing. For global consistency, we propose an efficient Sim(3)-based pose graph bundle adjustment (PGBA) approach to run online loop closing and mitigate the pose and scale drift. To enhance depth accuracy further, we incorporate learned monocular depth priors. We propose a novel joint depth and scale adjustment (JDSA) module to solve the scale ambiguity inherent in depth priors. Extensive evaluations across synthetic and real-world datasets validate that our approach outperforms existing methods in accuracy and map completeness while preserving real-time performance.
Dual Grained Quantization: Efficient Fine-Grained Quantization for LLM
Authors: Luoming Zhang, Wen Fei, Weijia Wu, Yefei He, Zhenyu Lou, Hong Zhou
Subjects: Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.04836
Pdf link: https://arxiv.org/pdf/2310.04836
Abstract Large Language Models (LLMs) pose significant hardware challenges related to memory requirements and computational ability. There are two mainstream quantization schemes for LLMs: coarse-grained (e.g., channel-wise) quantization and fine-grained (e.g., group-wise) quantization. Fine-grained quantization has smaller quantization loss, consequently achieving superior performance. However, when applied to weight-activation quantization, it disrupts continuous integer matrix multiplication, leading to inefficient inference. In this paper, we introduce Dual Grained Quantization (DGQ), a novel A8W4 quantization for LLMs that maintains superior performance while ensuring fast inference speed. DGQ dequantizes the fine-grained INT4 weight into a coarse-grained INT8 representation and performs matrix multiplication using INT8 kernels. Besides, we develop a two-phase grid search algorithm to simplify the determination of fine-grained and coarse-grained quantization scales. We also devise a percentile clipping schema for smoothing the activation outliers without the need for complex optimization techniques. Experimental results demonstrate that DGQ consistently outperforms prior methods across various LLM architectures and a wide range of tasks. Remarkably, with our efficient CUTLASS kernel implementation, we achieve 1.12$\times$ memory reduction and 3.24$\times$ speed gains compared with the A16W4 implementation. These advancements enable efficient deployment of A8W4 LLMs for real-world applications.
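The difference between coarse-grained and fine-grained quantization can be illustrated with a small simulated ("fake") INT4 experiment. This is a generic sketch of symmetric absmax quantization, not the DGQ algorithm; the function names, the group size of 4, and the toy weight row are all assumptions.

```python
import numpy as np

def fake_quantize(x, num_bits, axis):
    """Symmetric absmax quantize-then-dequantize with one scale per slice along `axis`."""
    qmax = 2 ** (num_bits - 1) - 1                      # 7 for INT4
    scale = np.max(np.abs(x), axis=axis, keepdims=True) / qmax
    scale = np.where(scale == 0.0, 1.0, scale)          # avoid division by zero
    q = np.clip(np.round(x / scale), -qmax, qmax)       # integer grid
    return q * scale                                    # back to float

def channelwise(W, num_bits=4):
    """Coarse-grained: one scale per output channel (row)."""
    return fake_quantize(W, num_bits, axis=1)

def groupwise(W, num_bits=4, group_size=4):
    """Fine-grained: one scale per contiguous group of `group_size` weights in a row."""
    rows, cols = W.shape
    if cols % group_size != 0:
        raise ValueError("cols must be divisible by group_size")
    grouped = W.reshape(rows, cols // group_size, group_size)
    return fake_quantize(grouped, num_bits, axis=2).reshape(rows, cols)

if __name__ == "__main__":
    # A single row with one outlier (8.0) that inflates the channel-wise scale.
    W = np.array([[0.1, 0.2, -0.1, 0.15, 8.0, 0.3, -0.2, 0.1]])
    print("channel-wise MSE:", np.mean((W - channelwise(W)) ** 2))
    print("group-wise MSE:  ", np.mean((W - groupwise(W)) ** 2))
```

With the outlier present, the channel-wise scale is so large that the small weights in the first group all round to zero, whereas the group-wise scale for that group is set only by its own largest entry.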
End-to-End Lip Reading in Romanian with Cross-Lingual Domain Adaptation and Lateral Inhibition
Authors: Authors: Emilian-Claudiu Mănescu, Răzvan-Alexandru Smădu, Andrei-Marius Avram, Dumitru-Clementin Cercel, Florin Pop
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2310.04858
Pdf link: https://arxiv.org/pdf/2310.04858
Abstract Lip reading or visual speech recognition has gained significant attention in recent years, particularly because of hardware development and innovations in computer vision. While considerable progress has been made, most models have only been tested on a few large-scale datasets. This work addresses this shortcoming by analyzing several architectures and optimizations on the underrepresented, small-scale Romanian language dataset called Wild LRRo. Most notably, we compare different backend modules, demonstrating the effectiveness of adding ample regularization methods. We obtain state-of-the-art results using our proposed method, namely cross-lingual domain adaptation and unlabeled videos from English and German datasets to help the model learn language-invariant features. Lastly, we assess the performance of adding a layer inspired by the neural inhibition mechanism.
Robust Multivariate Detection and Estimation with Fault Frequency Content Information
Authors: Authors: Jingwei Dong, Kaikai Pan, Sergio Pequito, Peyman Mohajerin Esfahani
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2310.04922
Pdf link: https://arxiv.org/pdf/2310.04922
Abstract This paper studies the problem of fault detection and estimation (FDE) for LTI systems with a particular focus on frequency content information for the faults, possibly as a continuum range, and under both disturbances and stochastic noise. Considering the worst-case fault sensitivity in the frequency range and the effects of disturbances and noise, we introduce a mixed $\mathcal{H}_2/\mathcal{H}_{-}$ performance index and develop an optimization framework to compute the optimal detection filter. We further propose a thresholding rule that provides guarantees on both false alarm rate (FAR) and fault detection rate (FDR). Next, shifting our attention to the estimation problem, we introduce the restricted $\mathcal{H}_{\infty}$ performance index and obtain an exact reformulation of the optimal filter design. This problem is inherently non-convex; however, focusing on finite frequency samples and fixed poles, we then establish a lower bound via a highly tractable quadratic programming (QP) problem. This lower bound together with an alternating optimization approach to the original estimation problem leads to a suboptimality gap for the overall filter design. The effectiveness of the proposed approaches is validated through a synthetic non-minimum phase system and an application to a multi-area power system.
A Optimal Unequal Error Protection LDPC Coded Recording System
Authors: Authors: Hong-fu Chou
Subjects: Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2310.04923
Pdf link: https://arxiv.org/pdf/2310.04923
Abstract For efficient modulation and error control coding, the deliberate flipping approach imposes the run-length-limited (RLL) constraint by bit error before recording. From the read side, a high coding rate limits the correcting capability of RLL bit error. In this paper, we study the low-density parity-check (LDPC) coding for an RLL-constrained recording system based on the Unequal Error Protection (UEP) coding scheme design. The UEP capability of irregular LDPC codes is used for recovering flipped bits. We provide an allocation technique to limit the occurrence of flipped bits on the bit with robust correction capability. In addition, we consider the signal labeling design to decrease the number of nearest neighbors to enhance the robust bit. We also apply the density evolution technique to the proposed system for evaluating the code performances. In addition, we utilize the EXIT characteristic to reveal the decoding behavior of the recommended code distribution. Finally, the optimization approach for the best distribution is proven by differential evolution for the proposed system.
Algorithms for the Ridesharing with Profit Constraint Problem
Authors: Authors: Qian-Ping Gu, Jiajian Leo Liang
Subjects: Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2310.04933
Pdf link: https://arxiv.org/pdf/2310.04933
Abstract Mobility-on-demand (MoD) ridesharing is a promising way to improve the occupancy rate of personal vehicles and reduce traffic congestion and emissions. Maximizing the number of passengers served and maximizing a profit target are major optimization goals in MoD ridesharing. We study the ridesharing with profit constraint problem (labeled as RPC) which considers both optimization goals altogether: maximize the total number of passengers subject to an overall drivers' profit target. We give a mathematical formulation for the RPC problem. We present a polynomial-time exact algorithm framework (including two practical implementations of the algorithm) and a (1/2)-approximation algorithm for the case that each vehicle serves at most one passenger. We propose a (2/3*lambda)-approximation algorithm for the case that each vehicle serves at most lambda >= 2 passengers. Our algorithms revolve around the idea of maximum cardinality matching in bipartite graphs and hypergraphs (set packing) with general edge weight. Based on a real-world ridesharing dataset in Chicago City and price schemes of Uber, we conduct an extensive empirical study on our model and algorithms. Experimental results show that practical price schemes can be incorporated into our model, our exact algorithms are efficient, and our approximation algorithms achieve about 90% of optimal solutions, in the number of passengers served.
Compositional Semantics for Open Vocabulary Spatio-semantic Representations
Authors: Authors: Robin Karlsson, Francisco Lepe-Salazar, Kazuya Takeda
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.04981
Pdf link: https://arxiv.org/pdf/2310.04981
Abstract General-purpose mobile robots need to complete tasks without exact human instructions. Large language models (LLMs) are a promising direction for realizing commonsense world knowledge and reasoning-based planning. Vision-language models (VLMs) transform environment percepts into vision-language semantics interpretable by LLMs. However, completing complex tasks often requires reasoning about information beyond what is currently perceived. We propose latent compositional semantic embeddings z as a principled learning-based knowledge representation for queryable spatio-semantic memories. We mathematically prove that z can always be found, and the optimal z is the centroid for any set Z. We derive a probabilistic bound for estimating separability of related and unrelated semantics. We prove that z is discoverable by iterative optimization by gradient descent from visual appearance and singular descriptions. We experimentally verify our findings on four embedding spaces, including CLIP and SBERT. Our results show that z can represent up to 10 semantics encoded by SBERT, and up to 100 semantics for ideal uniformly distributed high-dimensional embeddings. We demonstrate that a simple dense VLM trained on the COCO-Stuff dataset can learn z for 181 overlapping semantics by 42.23 mIoU, while improving conventional non-overlapping open-vocabulary segmentation performance by +3.48 mIoU compared with a popular SOTA model.
Symmetrical Linguistic Feature Distillation with CLIP for Scene Text Recognition
Authors: Authors: Zixiao Wang, Hongtao Xie, Yuxin Wang, Jianjun Xu, Boqiang Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.04999
Pdf link: https://arxiv.org/pdf/2310.04999
Abstract In this paper, we explore the potential of the Contrastive Language-Image Pretraining (CLIP) model in scene text recognition (STR), and establish a novel Symmetrical Linguistic Feature Distillation framework (named CLIP-OCR) to leverage both visual and linguistic knowledge in CLIP. Different from previous CLIP-based methods mainly considering feature generalization on visual encoding, we propose a symmetrical distillation strategy (SDS) that further captures the linguistic knowledge in the CLIP text encoder. By cascading the CLIP image encoder with the reversed CLIP text encoder, a symmetrical structure is built with an image-to-text feature flow that covers not only visual but also linguistic information for distillation. Benefiting from the natural alignment in CLIP, such guidance flow provides a progressive optimization objective from vision to language, which can supervise the STR feature forwarding process layer-by-layer. Besides, a new Linguistic Consistency Loss (LCL) is proposed to enhance the linguistic capability by considering second-order statistics during the optimization. Overall, CLIP-OCR is the first to design a smooth transition between image and text for the STR task. Extensive experiments demonstrate the effectiveness of CLIP-OCR with 93.8% average accuracy on six popular STR benchmarks. Code will be available at https://github.com/wzx99/CLIPOCR.
Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data
Authors: Authors: Zuxuan Wu, Zejia Weng, Wujian Peng, Xitong Yang, Ang Li, Larry S. Davis, Yu-Gang Jiang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.05010
Pdf link: https://arxiv.org/pdf/2310.05010
Abstract Despite significant results achieved by Contrastive Language-Image Pretraining (CLIP) in zero-shot image recognition, limited effort has been made exploring its potential for zero-shot video recognition. This paper presents Open-VCLIP++, a simple yet effective framework that adapts CLIP to a strong zero-shot video classifier, capable of identifying novel actions and events during testing. Open-VCLIP++ minimally modifies CLIP to capture spatial-temporal relationships in videos, thereby creating a specialized video classifier while striving for generalization. We formally demonstrate that training Open-VCLIP++ is tantamount to continual learning with zero historical data. To address this problem, we introduce Interpolated Weight Optimization, a technique that leverages the advantages of weight interpolation during both training and testing. Furthermore, we build upon large language models to produce fine-grained video descriptions. These detailed descriptions are further aligned with video features, facilitating a better transfer of CLIP to the video domain. Our approach is evaluated on three widely used action recognition datasets, following a variety of zero-shot evaluation protocols. The results demonstrate that our method surpasses existing state-of-the-art techniques by significant margins. Specifically, we achieve zero-shot accuracy scores of 88.1%, 58.7%, and 81.2% on UCF, HMDB, and Kinetics-600 datasets respectively, outpacing the best-performing alternative methods by 8.5%, 8.2%, and 12.3%. We also evaluate our approach on the MSR-VTT video-text retrieval dataset, where it delivers competitive video-to-text and text-to-video retrieval performance, while utilizing substantially less fine-tuning data compared to other methods. Code is released at https://github.com/wengzejia1/Open-VCLIP.
FP3O: Enabling Proximal Policy Optimization in Multi-Agent Cooperation with Parameter-Sharing Versatility
Authors: Authors: Lang Feng, Dong Xing, Junru Zhang, Gang Pan
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Arxiv link: https://arxiv.org/abs/2310.05053
Pdf link: https://arxiv.org/pdf/2310.05053
Abstract Existing multi-agent PPO algorithms lack compatibility with different types of parameter sharing when extending the theoretical guarantee of PPO to cooperative multi-agent reinforcement learning (MARL). In this paper, we propose a novel and versatile multi-agent PPO algorithm for cooperative MARL to overcome this limitation. Our approach builds on the proposed full-pipeline paradigm, which establishes multiple parallel optimization pipelines by employing various equivalent decompositions of the advantage function. This procedure successfully formulates the interconnections among agents in a more general manner, i.e., the interconnections among pipelines, making it compatible with diverse types of parameter sharing. We provide a solid theoretical foundation for policy improvement and subsequently develop a practical algorithm called Full-Pipeline PPO (FP3O) via several approximations. Empirical evaluations on Multi-Agent MuJoCo and StarCraft II tasks demonstrate that FP3O outperforms other strong baselines and exhibits remarkable versatility across various parameter-sharing configurations.
Low-Latency Video Conferencing System for Geo-Distributed Data Centers
Authors: Authors: Yao Xiao, Sitian Chen, Amelie Chi Zhou, Shuhao Zhang, Yi Wang, Rui Mao, Xuan Yang
Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2310.05054
Pdf link: https://arxiv.org/pdf/2310.05054
Abstract In the face of rising global demand for audio/video meetings, managing traffic across geographically distributed (geo-distributed) data centers presents a significant challenge due to the dynamic and limited nature of inter-DC network performance. To address these issues, this paper introduces two novel techniques, VCRoute and WMJitter, to optimize the performance of geo-distributed video conferencing systems. VCRoute is a routing method designed for video conferencing data packets. It treats the routing problem as a Multi-Armed Bandit problem and solves it with a tailored Thompson Sampling algorithm. Unlike traditional approaches, VCRoute uses predicted end-to-end latency as the routing selection reward for each packet, enabling effective and timely end-to-end latency optimization. In conjunction with VCRoute, we present WMJitter, a watermark-based mechanism for managing network jitter. Leveraging a window-based statistic method, WMJitter enables real-time network jitter estimation, leading to significant reductions in end-to-end delay and an improved balance between latency and loss rate. Evaluations based on real geo-distributed network performance demonstrate the effectiveness and scalability of VCRoute and WMJitter, offering robust solutions for optimizing video conferencing systems in geo-distributed settings.
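To make the bandit framing in this abstract concrete, here is a minimal sketch of generic Gaussian Thompson Sampling over candidate routes. It is not VCRoute's tailored algorithm: the class name, the known-noise Gaussian posterior, the simulated latencies, and the use of observed (rather than predicted) latency as feedback are all assumptions made for illustration.

```python
import random


class GaussianThompsonRouter:
    """Picks the route whose sampled latency estimate is lowest (hypothetical helper)."""

    def __init__(self, n_routes, noise_std=10.0, seed=0):
        self.counts = [0] * n_routes      # observations per route
        self.means = [0.0] * n_routes     # running mean latency per route (ms)
        self.noise_std = noise_std        # assumed known latency noise (ms)
        self.rng = random.Random(seed)

    def choose(self):
        # Try every route once before sampling from posteriors.
        for route, count in enumerate(self.counts):
            if count == 0:
                return route
        # Posterior of each mean is approximately N(mean, noise_std^2 / count).
        samples = [
            self.rng.gauss(m, self.noise_std / c ** 0.5)
            for m, c in zip(self.means, self.counts)
        ]
        return min(range(len(samples)), key=samples.__getitem__)

    def update(self, route, latency_ms):
        # Incremental running-mean update with the new latency observation.
        self.counts[route] += 1
        self.means[route] += (latency_ms - self.means[route]) / self.counts[route]


def simulate(true_latency_ms, steps=2000, seed=0):
    """Runs the router against simulated noisy latencies; returns pulls per route."""
    env_rng = random.Random(seed + 1)
    router = GaussianThompsonRouter(len(true_latency_ms), seed=seed)
    for _ in range(steps):
        route = router.choose()
        router.update(route, env_rng.gauss(true_latency_ms[route], 10.0))
    return router.counts


counts = simulate([80.0, 50.0, 120.0])
print("packets per route:", counts)
```

Because routes are chosen by sampling from each route's posterior, uncertain routes are still explored occasionally while most packets go to the route that currently looks fastest.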
DialCoT Meets PPO: Decomposing and Exploring Reasoning Paths in Smaller Language Models
Authors: Authors: Chengcheng Han, Xiaowei Du, Che Zhang, Yixin Lian, Xiang Li, Ming Gao, Baoyuan Wang
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.05074
Pdf link: https://arxiv.org/pdf/2310.05074
Abstract Chain-of-Thought (CoT) prompting has proven to be effective in enhancing the reasoning capabilities of Large Language Models (LLMs) with at least 100 billion parameters. However, it is ineffective or even detrimental when applied to reasoning tasks in Smaller Language Models (SLMs) with fewer than 10 billion parameters. To address this limitation, we introduce Dialogue-guided Chain-of-Thought (DialCoT) which employs a dialogue format to generate intermediate reasoning steps, guiding the model toward the final answer. Additionally, we optimize the model's reasoning path selection using the Proximal Policy Optimization (PPO) algorithm, further enhancing its reasoning capabilities. Our method offers several advantages compared to previous approaches. Firstly, we transform the process of solving complex reasoning questions by breaking them down into a series of simpler sub-questions, significantly reducing the task difficulty and making it more suitable for SLMs. Secondly, we optimize the model's reasoning path selection through the PPO algorithm. We conduct comprehensive experiments on four arithmetic reasoning datasets, demonstrating that our method achieves significant performance improvements compared to state-of-the-art competitors.
Towards Scalable Wireless Federated Learning: Challenges and Solutions
Authors: Authors: Yong Zhou, Yuanming Shi, Haibo Zhou, Jingjing Wang, Liqun Fu, Yang Yang
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.05076
Pdf link: https://arxiv.org/pdf/2310.05076
Abstract The explosive growth of smart devices (e.g., mobile phones, vehicles, drones) with sensing, communication, and computation capabilities gives rise to an unprecedented amount of data. The generated massive data together with the rapid advancement of machine learning (ML) techniques spark a variety of intelligent applications. To distill intelligence for supporting these applications, federated learning (FL) emerges as an effective distributed ML framework, given its potential to enable privacy-preserving model training at the network edge. In this article, we discuss the challenges and solutions of achieving scalable wireless FL from the perspectives of both network design and resource orchestration. For network design, we discuss how task-oriented model aggregation affects the performance of wireless FL, followed by proposing effective wireless techniques to enhance the communication scalability via reducing the model aggregation distortion and improving the device participation. For resource orchestration, we identify the limitations of the existing optimization-based algorithms and propose three task-oriented learning algorithms to enhance the algorithmic scalability via achieving computation-efficient resource allocation for wireless FL. We highlight several potential research issues that deserve further study.
A Privacy-Preserving Trajectory Synthesis Method Based on Vector Translation Invariance Supporting Traffic Constraints
Authors: Authors: Zechen Liu, Wei Song, Yuhan Wang
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2310.05091
Pdf link: https://arxiv.org/pdf/2310.05091
Abstract With the popularization of different kinds of smart terminals and the development of autonomous driving technology, more and more services based on spatio-temporal data have emerged in our lives, such as online taxi services, traffic flow prediction, and tracking virus propagation. However, privacy concerns about spatio-temporal data greatly limit its use. To address this issue, differential privacy methods for spatio-temporal data have been proposed. In differential privacy, a good aggregation query can greatly improve the data utility. But the mainstream aggregation query methods are based on area partitioning, which struggles to generate trajectories with high utility because it is hard to take time and constraints into account. Motivated by this, we propose an aggregation query based on the relationships between trajectories, which greatly improves the data utility compared to the existing methods. The trajectory synthesis task can be regarded as an optimization problem of finding trajectories that match the relationships between trajectories. We adopt gradient descent to find new trajectories that meet the conditions, and during the gradient descent, we can easily take the constraints into account by adding penalty terms, which is hard to achieve with area-partitioning-based queries. We carry out extensive experiments to validate that the trajectories generated by our method have higher utility, and the theoretical analysis shows that our method is safe and reliable.
Asymmetrically Decentralized Federated Learning
Authors: Authors: Qinglun Li, Miao Zhang, Nan Yin, Quanjun Yin, Li Shen
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2310.05093
Pdf link: https://arxiv.org/pdf/2310.05093
Abstract To address the communication burden and privacy concerns associated with the centralized server in Federated Learning (FL), Decentralized Federated Learning (DFL) has emerged, which discards the server with a peer-to-peer (P2P) communication framework. However, most existing DFL algorithms are based on symmetric topologies, such as ring and grid topologies, which can easily lead to deadlocks and are susceptible to the impact of network link quality in practice. To address these issues, this paper proposes the DFedSGPSM algorithm, which is based on asymmetric topologies and utilizes the Push-Sum protocol to effectively solve consensus optimization problems. To further improve algorithm performance and alleviate local heterogeneous overfitting in Federated Learning (FL), our algorithm combines the Sharpness Aware Minimization (SAM) optimizer and local momentum. The SAM optimizer employs gradient perturbations to generate locally flat models and searches for models with uniformly low loss values, mitigating local heterogeneous overfitting. The local momentum accelerates the optimization process of the SAM optimizer. Theoretical analysis proves that DFedSGPSM achieves a convergence rate of $\mathcal{O}(\frac{1}{\sqrt{T}})$ in a non-convex smooth setting under mild assumptions. This analysis also reveals that better topological connectivity achieves tighter upper bounds. Empirically, extensive experiments are conducted on the MNIST, CIFAR10, and CIFAR100 datasets, demonstrating the superior performance of our algorithm compared to state-of-the-art optimizers.
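The Push-Sum protocol named in this abstract can be sketched in a few lines for plain average consensus on a directed (asymmetric) graph. This is only the textbook consensus step under assumed conditions (a strongly connected graph, equal splitting among out-neighbors plus a self-loop); it is not the DFedSGPSM training algorithm, and the function name and toy topology are hypothetical.

```python
def push_sum(values, out_neighbors, rounds=100):
    """Average consensus via Push-Sum on a directed graph.

    values[i]        -- initial scalar held by node i
    out_neighbors[i] -- list of nodes that node i can send to (directed edges)
    Returns each node's estimate x_i / w_i of the global average.
    """
    n = len(values)
    x = [float(v) for v in values]   # push-sum numerators
    w = [1.0] * n                    # push-sum weights (denominators)
    for _ in range(rounds):
        new_x = [0.0] * n
        new_w = [0.0] * n
        for i in range(n):
            targets = out_neighbors[i] + [i]   # always keep a share for itself
            share = 1.0 / len(targets)         # column-stochastic mixing
            for j in targets:
                new_x[j] += share * x[i]
                new_w[j] += share * w[i]
        x, w = new_x, new_w
    # The ratio corrects the bias that asymmetric links would otherwise introduce.
    return [xi / wi for xi, wi in zip(x, w)]


# Toy asymmetric topology (assumption): 0 -> {1, 2}, 1 -> {2}, 2 -> {0}.
estimates = push_sum([1.0, 2.0, 6.0], [[1, 2], [2], [0]])
print(estimates)  # every node approaches the true average of the inputs
```

Each node only needs to know its own out-degree, which is why the protocol suits topologies where links are not guaranteed to be bidirectional.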
How Graph Neural Networks Learn: Lessons from Training Dynamics in Function Space
Authors: Authors: Chenxiao Yang, Qitian Wu, David Wipf, Ruoyu Sun, Junchi Yan
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.05105
Pdf link: https://arxiv.org/pdf/2310.05105
Abstract A long-standing goal in deep learning has been to characterize the learning behavior of black-box models in a more interpretable manner. For graph neural networks (GNNs), considerable advances have been made in formalizing what functions they can represent; however, it remains less clear whether and how GNNs learn desired functions during the optimization process. To fill this critical gap, we study the learning dynamics of GNNs in function space via the analytic framework of overparameterization. In particular, we find that the seemingly complicated training process of GNNs can be re-cast into a more familiar label propagation framework, due to the graph inductive bias implicit in this process. From this vantage point, we provide explanations for why the learned GNN functions successfully generalize and for their pathological behavior on heterophilic graphs, which are consistent with observations. Practically, sparsifying and implementing the learning dynamics lead to a minimalist semi-supervised learning algorithm with the efficiency of classic algorithms and the effectiveness of modern GNNs.
Secure Short-Packet Transmission with Aerial Relaying: Blocklength and Trajectory Co-Design
Authors: Authors: Milad Tatar Mamaghani, Xiangyun Zhou, Nan Yang, A. Lee Swindlehurst
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2310.05142
Pdf link: https://arxiv.org/pdf/2310.05142
Abstract In this paper, we propose a secure short-packet communication (SPC) system involving an unmanned aerial vehicle (UAV)-aided relay in the presence of a terrestrial passive eavesdropper. The considered system, which is applicable to various next-generation Internet-of-Things (IoT) networks, exploits a UAV as a mobile relay, facilitating the reliable and secure exchange of intermittent short packets between a pair of remote IoT devices with strict latency. Our objective is to improve the overall secrecy throughput performance of the system by carefully designing key parameters such as the coding blocklengths and the UAV trajectory. However, this inherently poses a challenging optimization problem that is difficult to solve optimally. To address the issue, we propose a low-complexity algorithm inspired by the block successive convex approximation approach, where we divide the original problem into two subproblems and solve them alternately until convergence. Numerical results demonstrate that the proposed design achieves significant performance improvements relative to other benchmarks, and offer valuable insights into determining appropriate coding blocklengths and UAV trajectory.
ZooPFL: Exploring Black-box Foundation Models for Personalized Federated Learning
Authors: Authors: Wang Lu, Hao Yu, Jindong Wang, Damien Teney, Haohan Wang, Yiqiang Chen, Qiang Yang, Xing Xie, Xiangyang Ji
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.05143
Pdf link: https://arxiv.org/pdf/2310.05143
Abstract When personalized federated learning (FL) meets large foundation models, new challenges arise from various limitations in resources. In addition to typical limitations such as data, computation, and communication costs, access to the models is also often limited. This paper endeavors to solve both the challenges of limited resources and personalization, i.e., distribution shifts between clients. To do so, we propose a method named ZOOPFL that uses Zeroth-Order Optimization for Personalized Federated Learning. ZOOPFL avoids direct interference with the foundation models and instead learns to adapt its inputs through zeroth-order optimization. In addition, we employ simple yet effective linear projections to remap its predictions for personalization. To reduce the computation costs and enhance personalization, we propose input surgery to incorporate an auto-encoder with low-dimensional and client-specific embeddings. We provide theoretical support for ZOOPFL to analyze its convergence. Extensive empirical experiments on computer vision and natural language processing tasks using popular foundation models demonstrate its effectiveness for FL on black-box foundation models.
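Zeroth-order optimization, as referenced in this abstract, updates parameters using only function evaluations of a black box. The sketch below shows a generic two-point random-direction gradient estimator on a toy quadratic; the function names, step sizes, and objective are assumptions, and it does not reproduce ZOOPFL's input-surgery or projection components.

```python
import random


def zo_gradient(f, x, mu=1e-3, n_dirs=20, rng=random):
    """Two-point zeroth-order gradient estimate averaged over random directions."""
    d = len(x)
    grad = [0.0] * d
    for _ in range(n_dirs):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]          # random probe direction
        x_plus = [xi + mu * ui for xi, ui in zip(x, u)]
        x_minus = [xi - mu * ui for xi, ui in zip(x, u)]
        slope = (f(x_plus) - f(x_minus)) / (2.0 * mu)         # directional derivative
        for k in range(d):
            grad[k] += slope * u[k] / n_dirs
    return grad


def zo_minimize(f, x0, lr=0.05, steps=300, seed=0):
    """Gradient descent driven purely by black-box evaluations of f."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        g = zo_gradient(f, x, rng=rng)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x


# Toy black box (assumption): squared distance to a hidden target vector.
target = [1.0, -2.0, 0.5]
black_box = lambda x: sum((xi - ti) ** 2 for xi, ti in zip(x, target))
solution = zo_minimize(black_box, [0.0, 0.0, 0.0])
print([round(v, 4) for v in solution])
```

No derivative of `black_box` is ever computed, which is the property that makes this family of methods usable when the model is only reachable through queries.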
A Corrected Expected Improvement Acquisition Function Under Noisy Observations
Authors: Authors: Han Zhou, Xingchen Ma, Matthew B Blaschko
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.05166
Pdf link: https://arxiv.org/pdf/2310.05166
Abstract Sequential maximization of expected improvement (EI) is one of the most widely used policies in Bayesian optimization because of its simplicity and ability to handle noisy observations. In particular, the improvement function often uses the best posterior mean as the best incumbent in noisy settings. However, the uncertainty associated with the incumbent solution is often neglected in many analytic EI-type methods: a closed-form acquisition function is derived in the noise-free setting, but then applied to the setting with noisy observations. To address this limitation, we propose a modification of EI that corrects its closed-form expression by incorporating the covariance information provided by the Gaussian Process (GP) model. This acquisition function specializes to the classical noise-free result, and we argue should replace that formula in Bayesian optimization software packages, tutorials, and textbooks. This enhanced acquisition provides good generality for noisy and noiseless settings. We show that our method achieves a sublinear convergence rate on the cumulative regret bound under heteroscedastic observation noise. Our empirical results demonstrate that our proposed acquisition function can outperform EI in the presence of noisy observations on benchmark functions for black-box optimization, as well as on parameter search for neural network model compression.
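For reference, the classical noise-free closed form of expected improvement that this abstract refers to can be written as a short function. The sketch assumes a maximization problem with a Gaussian posterior of mean `mu` and standard deviation `sigma` at the candidate point; it shows only the standard formula, not the corrected acquisition function the paper proposes.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu, sigma, best):
    """Classical noise-free EI for maximization.

    EI = (mu - best) * Phi(z) + sigma * phi(z), with z = (mu - best) / sigma.
    `best` is the incumbent value, treated here as known exactly.
    """
    if sigma <= 0.0:
        return max(mu - best, 0.0)    # no posterior uncertainty left
    z = (mu - best) / sigma
    return (mu - best) * normal_cdf(z) + sigma * normal_pdf(z)


print(expected_improvement(mu=0.0, sigma=1.0, best=0.0))
print(expected_improvement(mu=0.0, sigma=2.0, best=0.0))
```

Note that `best` enters as a fixed number; the abstract's point is that in noisy settings the incumbent is itself uncertain, which this formula ignores.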
Evolutionary Retrosynthetic Route Planning
Authors: Authors: Yan Zhang, Hao Hao, Xiao He, Shuanhu Gao, Aimin Zhou
Subjects: Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.05186
Pdf link: https://arxiv.org/pdf/2310.05186
Abstract Molecular retrosynthesis is a significant and complex problem in the field of chemistry; however, traditional manual synthesis methods not only need well-trained experts but also are time-consuming. With the development of big data and machine learning, artificial intelligence (AI) based retrosynthesis is attracting more attention and is becoming a valuable tool for molecular retrosynthesis. At present, Monte Carlo tree search is a mainstream search framework employed to address this problem. Nevertheless, its search efficiency is compromised by its large search space. Therefore, we propose a novel approach for retrosynthetic route planning based on evolutionary optimization, marking the first use of Evolutionary Algorithm (EA) in the field of multi-step retrosynthesis. The proposed method involves modeling the retrosynthetic problem as an optimization problem, defining the search space and operators. Additionally, to improve the search efficiency, a parallel strategy is implemented. The new approach is applied to four case products, and is compared with Monte Carlo tree search. The experimental results show that, in comparison to the Monte Carlo tree search algorithm, EA significantly reduces the number of calls to the single-step model by an average of 53.9%. The time required to search three solutions decreased by an average of 83.9%, and the number of feasible search routes increased by 5 times.
Towards Optimizing with Large Language Models
Authors: Authors: Pei-Fu Guo, Ying-Hsuan Chen, Yun-Da Tsai, Shou-De Lin
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.05204
Pdf link: https://arxiv.org/pdf/2310.05204
Abstract In this work, we conduct an assessment of the optimization capabilities of LLMs across various tasks and data sizes. Each of these tasks corresponds to unique optimization domains, and LLMs are required to execute these tasks with interactive prompting. That is, in each optimization step, the LLM generates new solutions from the past generated solutions with their values, and then the new solutions are evaluated and considered in the next optimization step. Additionally, we introduce three distinct metrics for a comprehensive assessment of task performance from various perspectives. These metrics offer the advantage of being applicable for evaluating LLM performance across a broad spectrum of optimization tasks and are less sensitive to variations in test samples. By applying these metrics, we observe that LLMs exhibit strong optimization capabilities when dealing with small-sized samples. However, their performance is significantly influenced by factors like data size and values, underscoring the importance of further research in the domain of optimization tasks for LLMs.
Do Automatic Test Generation Tools Generate Flaky Tests?
Authors: Authors: Martin Gruber, Muhammad Firhard Roslan, Owain Parry, Fabian Scharnböck, Phil McMinn, Gordon Fraser
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2310.05223
Pdf link: https://arxiv.org/pdf/2310.05223
Abstract Non-deterministic test behavior, or flakiness, is common and dreaded among developers. Researchers have studied the issue and proposed approaches to mitigate it. However, the vast majority of previous work has only considered developer-written tests. The prevalence and nature of flaky tests produced by test generation tools remain largely unknown. We ask whether such tools also produce flaky tests and how these differ from developer-written ones. Furthermore, we evaluate mechanisms that suppress flaky test generation. We sample 6 356 projects written in Java or Python. For each project, we generate tests using EvoSuite (Java) and Pynguin (Python), and execute each test 200 times, looking for inconsistent outcomes. Our results show that flakiness is at least as common in generated tests as in developer-written tests. Nevertheless, existing flakiness suppression mechanisms implemented in EvoSuite are effective in alleviating this issue (71.7 % fewer flaky tests). Compared to developer-written flaky tests, the causes of generated flaky tests are distributed differently. Their non-deterministic behavior is more frequently caused by randomness, rather than by networking and concurrency. Using flakiness suppression, the remaining flaky tests differ significantly from any flakiness previously reported, where most are attributable to runtime optimizations and EvoSuite-internal resource thresholds. These insights, with the accompanying dataset, can help maintainers to improve test generation tools, give recommendations for developers using these tools, and serve as a foundation for future research in test flakiness or test generation.
Limitations of Stochastic Selection Problems with Pairwise Independent Priors
Authors: Authors: Shaddin Dughmi, Yusuf Hakan Kalayci, Neel Patel
Subjects: Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2310.05240
Pdf link: https://arxiv.org/pdf/2310.05240
Abstract Motivated by the growing interest in correlation-robust stochastic optimization, we investigate stochastic selection problems beyond independence. Specifically, we consider the instructive case of pairwise-independent priors and matroid constraints. We obtain essentially-optimal bounds for offline contention resolution and prophet inequalities against the almighty online adversary. The impetus for our work comes from the recent work of \cite{pi-uniform-prophet}, who derived a constant-approximation for the single-choice prophet inequality with pairwise-independent priors. For general matroids, our results are tight and largely negative. For both contention resolution and prophet inequalities, our impossibility results hold for the full linear matroid over a finite field. We explicitly construct pairwise-independent distributions which rule out an $\omega\left(\frac{1}{\rank}\right)$-balanced offline CRS and an $\omega\left(\frac{1}{\log \rank}\right)$-competitive prophet inequality. For both results, we employ a generic approach for constructing pairwise-independent random vectors -- one which unifies and generalizes existing pairwise-independence constructions from the literature on universal hash functions and pseudorandomness. Specifically, our approach is based on our observation that random linear maps turn linear independence into stochastic independence. We then examine the class of matroids which satisfy the so-called partition property -- these include most common matroids encountered in optimization. We obtain positive results for both contention resolution and prophet inequalities with pairwise-independent priors on such matroids, approximately matching the corresponding guarantees for fully independent priors.
Time-Varying Soft-Maximum Control Barrier Functions for Safety in an A Priori Unknown Environment
Authors: Authors: Amirsaeid Safari, Jesse B. Hoagg
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.05261
Pdf link: https://arxiv.org/pdf/2310.05261
Abstract This paper presents a time-varying soft-maximum composite control barrier function (CBF) that can be used to ensure safety in an a priori unknown environment, where local perception information regarding the safe set is periodically obtained. We consider the scenario where the periodically obtained perception feedback can be used to construct a local CBF that models a local subset of the unknown safe set. Then, we use a novel smooth time-varying soft-maximum function to compose the N most recently obtained local CBFs into a single CBF. This composite CBF models an approximate union of the N most recently obtained local subsets of the safe set. Notably, this composite CBF can have arbitrary relative degree r. Next, this composite CBF is used as an rth-order CBF constraint in a real-time optimization to determine a control that minimizes a quadratic cost while guaranteeing that the state stays in a time-varying subset of the unknown safe set. We also present two applications of the time-varying soft-maximum composite CBF method: (1) a nonholonomic ground robot with nonnegligible inertia, and (2) a quadrotor aerial robot. In these applications, we present a simple new approach to generate the local CBFs from the periodically obtained perception data.
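To illustrate how a soft maximum can compose several local barrier functions into one, here is a minimal sketch using a normalized log-sum-exp. This is one common static form and an assumption on our part; the paper's time-varying construction is not reproduced, and the disk-shaped local sets and the sharpness parameter `kappa` are purely illustrative.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float]


def soft_max(values: Sequence[float], kappa: float = 10.0) -> float:
    """Normalized log-sum-exp: (1/kappa) * log(mean(exp(kappa * v))).

    It is smooth and satisfies max(v) - log(N)/kappa <= soft_max(v) <= max(v).
    The maximum is subtracted before exponentiating for numerical stability.
    """
    m = max(values)
    total = sum(math.exp(kappa * (v - m)) for v in values)
    return m + (math.log(total) - math.log(len(values))) / kappa


def disk_barrier(center: Point, radius: float) -> Callable[[Point], float]:
    """Illustrative local barrier: nonnegative exactly inside the disk."""
    def h(x: Point) -> float:
        return radius ** 2 - ((x[0] - center[0]) ** 2 + (x[1] - center[1]) ** 2)
    return h


def composite_barrier(local_barriers, kappa: float = 10.0) -> Callable[[Point], float]:
    """Smooth composite whose zero-superlevel set lies inside the union of the local sets."""
    return lambda x: soft_max([h(x) for h in local_barriers], kappa)


h = composite_barrier([disk_barrier((0.0, 0.0), 1.0), disk_barrier((1.5, 0.0), 1.0)])
inside_first = h((0.0, 0.0))   # centre of the first disk
far_away = h((5.0, 5.0))       # outside both disks
```

Because this soft maximum never exceeds the true maximum, a nonnegative composite value implies that at least one local barrier is nonnegative, so the composite set is an inner approximation of the union that tightens as `kappa` grows.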
Optimizing Solution-Samplers for Combinatorial Problems: The Landscape of Policy-Gradient Methods
Authors: Authors: Constantine Caramanis, Dimitris Fotakis, Alkis Kalavasis, Vasilis Kontonis, Christos Tzamos
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.05309
Pdf link: https://arxiv.org/pdf/2310.05309
Abstract Deep Neural Networks and Reinforcement Learning methods have empirically shown great promise in tackling challenging combinatorial problems. In those methods a deep neural network is used as a solution generator which is then trained by gradient-based methods (e.g., policy gradient) to successively obtain better solution distributions. In this work we introduce a novel theoretical framework for analyzing the effectiveness of such methods. We ask whether there exist generative models that (i) are expressive enough to generate approximately optimal solutions; (ii) have a tractable, i.e., polynomial in the size of the input, number of parameters; (iii) have a benign optimization landscape, in the sense that it does not contain sub-optimal stationary points. Our main contribution is a positive answer to this question. Our result holds for a broad class of combinatorial problems including Max- and Min-Cut, Max-$k$-CSP, Maximum-Weight-Bipartite-Matching, and the Traveling Salesman Problem. As a byproduct of our analysis we introduce a novel regularization process over vanilla gradient descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
Increasing Entropy to Boost Policy Gradient Performance on Personalization Tasks
Authors: Authors: Andrew Starnes, Anton Dereventsov, Clayton Webster
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.05324
Pdf link: https://arxiv.org/pdf/2310.05324
Abstract In this effort, we consider the impact of regularization on the diversity of actions taken by policies generated from reinforcement learning agents trained using a policy gradient. Policy gradient agents are prone to entropy collapse, which means certain actions are seldom, if ever, selected. We augment the optimization objective function for the policy with terms constructed from various $\varphi$-divergences and Maximum Mean Discrepancy, which encourage current policies to follow different state-visitation and/or action-choice distributions than previously computed policies. We provide numerical experiments using MNIST, CIFAR10, and Spotify datasets. The results demonstrate the advantage of diversity-promoting policy regularization and show that its use with gradient-based approaches significantly improves performance on a variety of personalization tasks. Furthermore, numerical evidence is given to show that policy regularization increases performance without losing accuracy.
Infrared Small Target Detection Using Double-Weighted Multi-Granularity Patch Tensor Model With Tensor-Train Decomposition
Authors: Authors: Guiyu Zhang, Qunbo Lv, Zui Tao, Baoyu Zhu, Zheng Tan, Yuan Ma
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.05347
Pdf link: https://arxiv.org/pdf/2310.05347
Abstract Infrared small target detection plays an important role in the remote sensing fields. Therefore, many detection algorithms have been proposed, in which the infrared patch-tensor (IPT) model has become a mainstream tool due to its excellent performance. However, most IPT-based methods face great challenges, such as inaccurate measure of the tensor low-rankness and poor robustness to complex scenes, which will lead to poor detection performance. In order to solve these problems, this paper proposes a novel double-weighted multi-granularity infrared patch tensor (DWMGIPT) model. First, to capture different granularity information of tensor from multiple modes, a multi-granularity infrared patch tensor (MGIPT) model is constructed by collecting nonoverlapping patches and tensor augmentation based on the tensor train (TT) decomposition. Second, to explore the latent structure of tensor more efficiently, we utilize the auto-weighted mechanism to balance the importance of information at different granularity. Then, the steering kernel (SK) is employed to extract local structure prior, which suppresses background interference such as strong edges and noise. Finally, an efficient optimization algorithm based on the alternating direction method of multipliers (ADMM) is presented to solve the model. Extensive experiments in various challenging scenes show that the proposed algorithm is robust to noise and different scenes. Compared with the other eight state-of-the-art methods, different evaluation metrics demonstrate that our method achieves better detection performance in various complex scenes.
Scaling Studies for Efficient Parameter Search and Parallelism for Large Language Model Pre-training
Authors: Authors: Michael Benington, Leo Phan, Chris Pierre Paul, Evan Shoemaker, Priyanka Ranade, Torstein Collett, Grant Hodgson Perez, Christopher Krieger
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.05350
Pdf link: https://arxiv.org/pdf/2310.05350
Abstract AI accelerator processing capabilities and memory constraints largely dictate the scale in which machine learning workloads (e.g., training and inference) can be executed within a desirable time frame. Training a state of the art, transformer-based model today requires use of GPU-accelerated high performance computers with high-speed interconnects. As datasets and models continue to increase in size, computational requirements and memory demands for AI also continue to grow. These challenges have inspired the development of distributed algorithm and circuit-based optimization techniques that enable the ability to progressively scale models in multi-node environments, efficiently minimize neural network cost functions for faster convergence, and store more parameters into a set number of available resources. In our research project, we focus on parallel and distributed machine learning algorithm development, specifically for optimizing the data processing and pre-training of a set of 5 encoder-decoder LLMs, ranging from 580 million parameters to 13 billion parameters. We performed a fine-grained study to quantify the relationships between three ML parallelism methods, specifically exploring Microsoft DeepSpeed Zero Redundancy Optimizer (ZeRO) stages.
C^2M-DoT: Cross-modal consistent multi-view medical report generation with domain transfer network
Authors: Authors: Ruizhi Wang, Xiangtao Wang, Jie Zhou, Thomas Lukasiewicz, Zhenghua Xu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.05355
Pdf link: https://arxiv.org/pdf/2310.05355
Abstract In clinical scenarios, multiple medical images with different views are usually generated simultaneously, and these images have high semantic consistency. However, most existing medical report generation methods only consider single-view data. The rich multi-view mutual information of medical images can help generate more accurate reports, however, the dependence of multi-view models on multi-view data in the inference stage severely limits their application in clinical practice. In addition, word-level optimization based on numbers ignores the semantics of reports and medical images, and the generated reports often cannot achieve good performance. Therefore, we propose a cross-modal consistent multi-view medical report generation with a domain transfer network (C^2M-DoT). Specifically, (i) a semantic-based multi-view contrastive learning medical report generation framework is adopted to utilize cross-view information to learn the semantic representation of lesions; (ii) a domain transfer network is further proposed to ensure that the multi-view report generation model can still achieve good inference performance under single-view input; (iii) meanwhile, optimization using a cross-modal consistency loss facilitates the generation of textual reports that are semantically consistent with medical images. Extensive experimental studies on two public benchmark datasets demonstrate that C^2M-DoT substantially outperforms state-of-the-art baselines in all metrics. Ablation studies also confirmed the validity and necessity of each component in C^2M-DoT.
Quantum Bayesian Optimization
Authors: Authors: Zhongxiang Dai, Gregory Kang Ruey Lau, Arun Verma, Yao Shu, Bryan Kian Hsiang Low, Patrick Jaillet
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.05373
Pdf link: https://arxiv.org/pdf/2310.05373
Abstract Kernelized bandits, also known as Bayesian optimization (BO), has been a prevalent method for optimizing complicated black-box reward functions. Various BO algorithms have been theoretically shown to enjoy upper bounds on their cumulative regret which are sub-linear in the number T of iterations, and a regret lower bound of Omega(sqrt(T)) has been derived which represents the unavoidable regrets for any classical BO algorithm. Recent works on quantum bandits have shown that with the aid of quantum computing, it is possible to achieve tighter regret upper bounds better than their corresponding classical lower bounds. However, these works are restricted to either multi-armed or linear bandits, and are hence not able to solve sophisticated real-world problems with non-linear reward functions. To this end, we introduce the quantum-Gaussian process-upper confidence bound (Q-GP-UCB) algorithm. To the best of our knowledge, our Q-GP-UCB is the first BO algorithm able to achieve a regret upper bound of O(polylog T), which is significantly smaller than its regret lower bound of Omega(sqrt(T)) in the classical setting. Moreover, thanks to our novel analysis of the confidence ellipsoid, our Q-GP-UCB with the linear kernel achieves a smaller regret than the quantum linear UCB algorithm from the previous work. We use simulations, as well as an experiment using a real quantum computer, to verify that the theoretical quantum speedup achieved by our Q-GP-UCB is also potentially relevant in practice.
Distributed Evolution Strategies with Multi-Level Learning for Large-Scale Black-Box Optimization
Authors: Authors: Qiqi Duan, Chang Shao, Guochen Zhou, Qi Zhao, Yuhui Shi
Subjects: Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2310.05377
Pdf link: https://arxiv.org/pdf/2310.05377
Abstract In the post-Moore era, the main performance gains of black-box optimizers increasingly depend upon parallelism, especially for large-scale optimization (LSO). In this paper, we propose to parallelize the well-established covariance matrix adaptation evolution strategy (CMA-ES) and in particular one of its latest variants, called limited-memory CMA (LM-CMA), for LSO. To achieve scalability while maintaining the invariance property as much as possible, we present a multilevel learning-based meta-framework. Owing to its hierarchically organized structure, Meta-ES is well-suited to implement our distributed meta-framework, wherein the outer-ES controls strategy parameters while all parallel inner-ESs run the serial LM-CMA with different settings. For the distribution mean update of the outer-ES, both the elitist and multi-recombination strategy are used in parallel to avoid stagnation and regression, respectively. To exploit spatiotemporal information, the global step-size adaptation combines Meta-ES with the parallel cumulative step-size adaptation. After each isolation time, our meta-framework employs both the structure and parameter learning strategy to combine aligned evolution paths for CMA reconstruction. Experiments on a set of large-scale benchmarking functions with memory-intensive evaluations, arguably reflecting many data-driven optimization problems, validate the benefits (e.g., scalability w.r.t. CPU cores, effectiveness w.r.t. solution quality, and adaptability w.r.t. second-order learning) and costs of our meta-framework.
Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels
Authors: Authors: Da Long, Wei W. Xing, Aditi S. Krishnapriyan, Robert M. Kirby, Shandian Zhe, Michael W. Mahoney
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.05387
Pdf link: https://arxiv.org/pdf/2310.05387
Abstract Discovering governing equations from data is important to many scientific and engineering applications. Despite promising successes, existing methods are still challenged by data sparsity as well as noise issues, both of which are ubiquitous in practice. Moreover, state-of-the-art methods lack uncertainty quantification and/or are costly in training. To overcome these limitations, we propose a novel equation discovery method based on Kernel learning and BAyesian Spike-and-Slab priors (KBASS). We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noises. We combine it with a Bayesian spike-and-slab prior -- an ideal Bayesian sparse distribution -- for effective operator selection and uncertainty quantification. We develop an expectation propagation expectation-maximization (EP-EM) algorithm for efficient posterior inference and function estimation. To overcome the computational challenge of kernel regression, we place the function values on a mesh and induce a Kronecker product construction, and we use tensor algebra methods to enable efficient computation and optimization. We show the significant advantages of KBASS on a list of benchmark ODE and PDE discovery tasks.
Local Structure-Preserving Relaxation Method for Charged Systems on Unstructured Meshes
Authors: Authors: Zhonghua Qiao, Zhenli Xu, Qian Yin, Shenggao Zhou
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2310.05411
Pdf link: https://arxiv.org/pdf/2310.05411
Abstract This work considers charged systems described by the modified Poisson--Nernst--Planck (PNP) equations, which incorporate ionic steric effects and the Born solvation energy for dielectric inhomogeneity. Solving the steady-state modified PNP equations poses numerical challenges due to the emergence of sharp boundary layers caused by small Debye lengths, particularly when local ionic concentrations reach saturation. To address this, we first reformulate the steady-state problem as a constraint optimization, where the ionic concentrations on unstructured Delaunay nodes are treated as fractional particles moving along edges between nodes. The electric fields are then updated to minimize the objective free energy while satisfying the discrete Gauss's law. We develop a local relaxation method on unstructured meshes that inherently respects the discrete Gauss's law, ensuring curl-free electric fields. Numerical analysis demonstrates that the optimal mass of the moving fractional particles guarantees the positivity of both ionic and solvent concentrations. Additionally, the free energy of the charged system consistently decreases during successive updates of ionic concentrations and electric fields. We conduct numerical tests to validate the expected numerical accuracy, positivity, free-energy dissipation, and robustness of our method in simulating charged systems with sharp boundary layers.
Waveform Design for MIMO-OFDM Integrated Sensing and Communication System: An Information Theoretical Approach
Authors: Authors: Zhiqing Wei, Jinghui Piao, Xin Yuan, Huici Wu, J. Andrew Zhang, Zhiyong Feng, Lin Wang, Ping Zhang
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2310.05444
Pdf link: https://arxiv.org/pdf/2310.05444
Abstract Integrated sensing and communication (ISAC) is regarded as the enabling technology in future 5th-Generation-Advanced (5G-A) and 6th-Generation (6G) mobile communication systems. ISAC waveform design is critical in ISAC systems. However, the difference in performance metrics between sensing and communication brings challenges for ISAC waveform design. This paper applies a unified performance metric from information theory, namely mutual information (MI), to measure the communication and sensing performance in multicarrier ISAC systems. For multi-input multi-output orthogonal frequency division multiplexing (MIMO-OFDM) ISAC systems, we first derive the sensing and communication MI with subcarrier correlation and spatial correlation. Then, we propose optimal waveform designs for maximizing the sensing MI, communication MI and the weighted sum of sensing and communication MI, respectively. The optimization results are validated by Monte Carlo simulations. Our work provides effective closed-form expressions for waveform design, enabling the realization of MIMO-OFDM ISAC systems with balanced performance in communication and sensing.
Cost-Sensitive Best Subset Selection for Logistic Regression: A Mixed-Integer Conic Optimization Perspective
Authors: Authors: Ricardo Knauer, Erik Rodner
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.05464
Pdf link: https://arxiv.org/pdf/2310.05464
Abstract A key challenge in machine learning is to design interpretable models that can reduce their inputs to the best subset for making transparent predictions, especially in the clinical domain. In this work, we propose a certifiably optimal feature selection procedure for logistic regression from a mixed-integer conic optimization perspective that can take an auxiliary cost to obtain features into account. Based on an extensive review of the literature, we carefully create a synthetic dataset generator for clinical prognostic model research. This allows us to systematically evaluate different heuristic and optimal cardinality- and budget-constrained feature selection procedures. The analysis shows key limitations of the methods for the low-data regime and when confronted with label noise. Our paper not only provides empirical recommendations for suitable methods and dataset designs, but also paves the way for future research in the area of meta-learning.
Vibroacoustic Frequency Response Prediction with Query-based Operator Networks
Authors: Authors: Jan van Delden, Julius Schultz, Christopher Blech, Sabine C. Langer, Timo Lüddecke
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.05469
Pdf link: https://arxiv.org/pdf/2310.05469
Abstract Understanding vibroacoustic wave propagation in mechanical structures like airplanes, cars and houses is crucial to ensure health and comfort of their users. To analyze such systems, designers and engineers primarily consider the dynamic response in the frequency domain, which is computed through expensive numerical simulations like the finite element method. In contrast, data-driven surrogate models offer the promise of speeding up these simulations, thereby facilitating tasks like design optimization, uncertainty quantification, and design space exploration. We present a structured benchmark for a representative vibroacoustic problem: Predicting the frequency response for vibrating plates with varying forms of beadings. The benchmark features a total of 12,000 plate geometries with an associated numerical solution and introduces evaluation metrics to quantify the prediction quality. To address the frequency response prediction task, we propose a novel frequency query operator model, which is trained to map plate geometries to frequency response functions. By integrating principles from operator learning and implicit models for shape encoding, our approach effectively addresses the prediction of resonance peaks of frequency responses. We evaluate the method on our vibrating-plates benchmark and find that it outperforms DeepONets, Fourier Neural Operators and more traditional neural network architectures. The code and dataset are available from https://eckerlab.org/code/delden2023_plate.
Collective Graph Exploration Parameterized by Vertex Cover
Authors: Authors: Siddharth Gupta, Guy Sa'ar, Meirav Zehavi
Subjects: Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2310.05480
Pdf link: https://arxiv.org/pdf/2310.05480
Abstract We initiate the study of the parameterized complexity of the {\sc Collective Graph Exploration} ({\sc CGE}) problem. In {\sc CGE}, the input consists of an undirected connected graph $G$ and a collection of $k$ robots, initially placed at the same vertex $r$ of $G$, and each one of them has an energy budget of $B$. The objective is to decide whether $G$ can be \emph{explored} by the $k$ robots in $B$ time steps, i.e., there exist $k$ closed walks in $G$, one corresponding to each robot, such that every edge is covered by at least one walk, every walk starts and ends at the vertex $r$, and the maximum length of any walk is at most $B$. Unfortunately, this problem is \textsf{NP}-hard even on trees [Fraigniaud {\em et~al.}, 2006]. Further, we prove that the problem remains \textsf{W[1]}-hard parameterized by $k$ even for trees of treedepth $3$. Due to the \textsf{para-NP}-hardness of the problem parameterized by treedepth, and motivated by real-world scenarios, we study the parameterized complexity of the problem parameterized by the vertex cover number ($\mathsf{vc}$) of the graph, and prove that the problem is fixed-parameter tractable (\textsf{FPT}) parameterized by $\mathsf{vc}$. Additionally, we study the optimization version of {\sc CGE}, where we want to optimize $B$, and design an approximation algorithm with an additive approximation factor of $O(\mathsf{vc})$.
Geometry-Guided Ray Augmentation for Neural Surface Reconstruction with Sparse Views
Authors: Authors: Jiawei Yao, Chen Wang, Tong Wu, Chuming Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.05483
Pdf link: https://arxiv.org/pdf/2310.05483
Abstract In this paper, we propose a novel method for 3D scene and object reconstruction from sparse multi-view images. Different from previous methods that leverage extra information such as depth or generalizable features across scenes, our approach leverages the scene properties embedded in the multi-view inputs to create precise pseudo-labels for optimization without any prior training. Specifically, we introduce a geometry-guided approach that improves surface reconstruction accuracy from sparse views by leveraging spherical harmonics to predict the novel radiance while holistically considering all color observations for a point in the scene. Also, our pipeline exploits proxy geometry and correctly handles the occlusion in generating the pseudo-labels of radiance, which previous image-warping methods fail to avoid. Our method, dubbed Ray Augmentation (RayAug), achieves superior results on DTU and Blender datasets without requiring prior training, demonstrating its effectiveness in addressing the problem of sparse view reconstruction. Our pipeline is flexible and can be integrated into other implicit neural reconstruction methods for sparse views.
A Neural Tangent Kernel View on Federated Averaging for Deep Linear Neural Network
Authors: Authors: Xin Liu, Dazhi Zhan, Wei Tao, Xin Ma, Yu Pan, Yu Ding, Zhisong Pan
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.05495
Pdf link: https://arxiv.org/pdf/2310.05495
Abstract Federated averaging (FedAvg) is a widely employed paradigm for collaboratively training models from distributed clients without sharing data. Nowadays, the neural network has achieved remarkable success due to its extraordinary performance, which makes it a preferred choice as the model in FedAvg. However, the optimization problem of the neural network is often non-convex and even non-smooth. Furthermore, FedAvg always involves multiple clients and local updates, which results in an inaccurate updating direction. These properties bring difficulties in analyzing the convergence of FedAvg in training neural networks. Recently, neural tangent kernel (NTK) theory has been proposed towards understanding the convergence of first-order methods in tackling the non-convex problem of neural networks. The deep linear neural network is a classical model in theoretical studies due to its simple formulation. Nevertheless, there exists no theoretical result for the convergence of FedAvg in training the deep linear neural network. By applying NTK theory, we make a further step to provide the first theoretical guarantee for the global convergence of FedAvg in training deep linear neural networks. Specifically, we prove FedAvg converges to the global minimum at a linear rate $\mathcal{O}\big((1-\eta K /N)^t\big)$, where $t$ is the number of iterations, $\eta$ is the learning rate, $N$ is the number of clients and $K$ is the number of local updates. Finally, experimental evaluations on two benchmark datasets are conducted to empirically validate the correctness of our theoretical findings.
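To get a feel for the stated linear rate, the sketch below computes the smallest $t$ with $(1-\eta K/N)^t \le \varepsilon$. It ignores the constant hidden in the $\mathcal{O}(\cdot)$, and the values of $\eta$, $K$, $N$ and $\varepsilon$ are illustrative assumptions rather than settings from the paper.

```python
import math


def contraction_factor(eta: float, K: int, N: int) -> float:
    """Per-iteration factor 1 - eta*K/N from the stated rate."""
    return 1.0 - eta * K / N


def iterations_to_tolerance(eta: float, K: int, N: int, eps: float) -> int:
    """Smallest integer t such that (1 - eta*K/N)**t <= eps."""
    rho = contraction_factor(eta, K, N)
    if not 0.0 < rho < 1.0:
        raise ValueError("need 0 < eta*K/N < 1 for a meaningful linear rate")
    return math.ceil(math.log(eps) / math.log(rho))


# Illustrative values only: learning rate 0.01, 10 clients, tolerance 1e-3.
t_few_local_steps = iterations_to_tolerance(eta=0.01, K=5, N=10, eps=1e-3)
t_more_local_steps = iterations_to_tolerance(eta=0.01, K=10, N=10, eps=1e-3)
```

Under this bound, doubling the number of local updates $K$ roughly halves the number of iterations needed, as long as $\eta K / N$ stays below 1.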
AbCD: A Component-wise Adjustable Framework for Dynamic Optimization Problems
Authors: Authors: Alexandre Mascarenhas, Yuri Lavinas, Claus Aranha
Subjects: Neural and Evolutionary Computing (cs.NE); Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2310.05505
Pdf link: https://arxiv.org/pdf/2310.05505
Abstract Dynamic Optimization Problems (DOPs) are characterized by changes in the fitness landscape that can occur at any time and are common in real world applications. The main issues to be considered include detecting the change in the fitness landscape and reacting in accord. Over the years, several evolutionary algorithms have been proposed to take into account this characteristic during the optimization process. However, the number of available tools or open source codebases for these approaches is limited, making reproducibility and extensive experimentation difficult. To solve this, we developed a component-oriented framework for DOPs called Adjustable Components for Dynamic Problems (AbCD), inspired by similar works in the Multiobjective static domain. Using this framework, we investigate components that were proposed in several popular DOP algorithms. Our experiments show that the performance of these components depends on the problem and the selected components used in a configuration, which differs from the results reported in the literature. Using irace, we demonstrate how this framework can automatically generate DOP algorithm configurations that take into account the characteristics of the problem to be solved. Our results highlight existing problems in the DOP field that need to be addressed in the future development of algorithms and components.
One Problem, One Solution: Unifying Robot and Environment Design Optimization
Authors: Authors: Jan Baumgärtner, Gajanan Kanagalingam, Alexander Puchta and Jürgen Fleischer
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2310.05520
Pdf link: https://arxiv.org/pdf/2310.05520
Abstract The task-specific optimization of robotic systems has long been divided into the optimization of the robot and the optimization of the environment. In this letter, we argue that these two problems are interdependent and should be treated as such. To this end, we present a unified problem formulation that allows for the simultaneous optimization of both the robot kinematics and the environment. We demonstrate the effectiveness of our approach by jointly optimizing a robotic milling system. To compare our approach to the state of the art, we also optimize the robot kinematics and environment separately. The results show that our approach outperforms the state of the art and that simultaneous optimization leads to a much better solution.
Geometry-Aware Safety-Critical Local Reactive Controller for Robot Navigation in Unknown and Cluttered Environments
Authors: Authors: Yulin Li, Xindong Tang, Kai Chen, Chunxin Zheng, Haichao Liu, Jun Ma
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.05547
Pdf link: https://arxiv.org/pdf/2310.05547
Abstract This work proposes a safety-critical local reactive controller that enables the robot to navigate in unknown and cluttered environments. In particular, the trajectory tracking task is formulated as a constrained polynomial optimization problem. Then, safety constraints are imposed on the control variables invoking the notion of polynomial positivity certificates in conjunction with their Sum-of-Squares (SOS) approximation, thereby confining the robot motion inside the locally extracted convex free region. It is noteworthy that, in the process of devising the proposed safety constraints, the geometry of the robot can be approximated using any shape that can be characterized with a set of polynomial functions. The optimization problem is further convexified into a semidefinite program (SDP) leveraging truncated multi-sequences (tms) and moment relaxation, which favorably facilitates the effective use of off-the-shelf conic programming solvers, such that real-time performance is attainable. Various robot navigation tasks are investigated to demonstrate the effectiveness of the proposed approach in terms of safety and tracking performance.
EdgeAISim: A Toolkit for Simulation and Modelling of AI Models in Edge Computing Environments
Authors: Aadharsh Roshan Nandhakumar, Ayush Baranwal, Priyanshukumar Choudhary, Muhammed Golec, Sukhpal Singh Gill
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2310.05605
Pdf link: https://arxiv.org/pdf/2310.05605
Abstract To meet next-generation IoT application demands, edge computing moves processing power and storage closer to the network edge to minimise latency and bandwidth utilisation. Edge computing is becoming popular as a result of these benefits, but resource management is still challenging. Researchers are utilising AI models to solve the challenge of resource management in edge computing systems. However, existing simulation tools are only concerned with typical resource management policies and, in particular, do not cover the adoption and implementation of AI models for resource management. Consequently, researchers continue to face significant challenges, making it hard and time-consuming to use AI models when designing novel resource management policies for edge computing with existing simulation tools. To overcome these issues, we propose a lightweight Python-based toolkit called EdgeAISim for the simulation and modelling of AI models for designing resource management policies in edge computing environments. In EdgeAISim, we extended the basic components of the EdgeSimPy framework and developed new AI-based simulation models for task scheduling, energy management, service migration, network flow scheduling, and mobility support for edge computing environments. In EdgeAISim, we have utilised advanced AI models such as Multi-Armed Bandit with Upper Confidence Bound, Deep Q-Networks, Deep Q-Networks with Graphical Neural Network, and Actor-Critic Network to optimize power usage while efficiently managing task migration within the edge computing environment. The performance of these proposed models of EdgeAISim is compared with the baseline, which uses a worst-fit algorithm-based resource management policy in different settings. Experimental results indicate that EdgeAISim exhibits a substantial reduction in power consumption, highlighting the compelling success of power optimization strategies in EdgeAISim.
Making Scalable Meta Learning Practical
Authors: Sang Keun Choe, Sanket Vaibhav Mehta, Hwijeen Ahn, Willie Neiswanger, Pengtao Xie, Emma Strubell, Eric Xing
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.05674
Pdf link: https://arxiv.org/pdf/2310.05674
Abstract Despite its flexibility to learn diverse inductive biases in machine learning programs, meta learning (i.e., learning to learn) has long been recognized to suffer from poor scalability due to its tremendous compute/memory costs, training instability, and a lack of efficient distributed training support. In this work, we focus on making scalable meta learning practical by introducing SAMA, which combines advances in both implicit differentiation algorithms and systems. Specifically, SAMA is designed to flexibly support a broad range of adaptive optimizers in the base level of meta learning programs, while reducing computational burden by avoiding explicit computation of second-order gradient information, and exploiting efficient distributed training techniques implemented for first-order gradients. Evaluated on multiple large-scale meta learning benchmarks, SAMA showcases up to 1.7/4.8x increase in throughput and 2.0/3.8x decrease in memory consumption respectively on single-/multi-GPU setups compared to other baseline meta learning algorithms. Furthermore, we show that SAMA-based data optimization leads to consistent improvements in text classification accuracy with BERT and RoBERTa large language models, and achieves state-of-the-art results in both small- and large-scale data pruning on image classification tasks, demonstrating the practical applicability of scalable meta learning across language and vision domains.
Climate-sensitive Urban Planning through Optimization of Tree Placements
Authors: Simon Schrodi, Ferdinand Briegel, Max Argus, Andreas Christen, Thomas Brox
Subjects: Computer Vision and Pattern Recognition (cs.CV); Atmospheric and Oceanic Physics (physics.ao-ph)
Arxiv link: https://arxiv.org/abs/2310.05691
Pdf link: https://arxiv.org/pdf/2310.05691
Abstract Climate change is increasing the intensity and frequency of many extreme weather events, including heatwaves, which results in increased thermal discomfort and mortality rates. While global mitigation action is undoubtedly necessary, so is climate adaptation, e.g., through climate-sensitive urban planning. Among the most promising strategies is harnessing the benefits of urban trees in shading and cooling pedestrian-level environments. Our work investigates the challenge of optimal placement of such trees. Physical simulations can estimate the radiative and thermal impact of trees on human thermal comfort but induce high computational costs. This rules out optimization of tree placements over large areas and considering effects over longer time scales. Hence, we employ neural networks to simulate the point-wise mean radiant temperatures--a driving factor of outdoor human thermal comfort--across various time scales, spanning from daily variations to extended time scales of heatwave events and even decades. To optimize tree placements, we harness the innate local effect of trees within the iterated local search framework with tailored adaptations. We show the efficacy of our approach across a wide spectrum of study areas and time scales. We believe that our approach is a step towards empowering decision-makers, urban designers and planners to proactively and effectively assess the potential of urban trees to mitigate heat stress.
A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics
Authors: Kai He, Rui Mao, Qika Lin, Yucheng Ruan, Xiang Lan, Mengling Feng, Erik Cambria
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2310.05694
Pdf link: https://arxiv.org/pdf/2310.05694
Abstract The utilization of large language models (LLMs) in the Healthcare domain has generated both excitement and concern due to their ability to effectively respond to free-text queries with certain professional knowledge. This survey outlines the capabilities of the currently developed LLMs for Healthcare and explicates their development process, with the aim of providing an overview of the development roadmap from traditional Pretrained Language Models (PLMs) to LLMs. Specifically, we first explore the potential of LLMs to enhance the efficiency and effectiveness of various Healthcare applications, highlighting both the strengths and limitations. Secondly, we conduct a comparison between the previous PLMs and the latest LLMs, as well as comparing various LLMs with each other. Then we summarize related Healthcare training data, training methods, optimization strategies, and usage. Finally, the unique concerns associated with deploying LLMs in Healthcare settings are investigated, particularly regarding fairness, accountability, transparency and ethics. Our survey provides a comprehensive investigation from the perspectives of both computer science and the Healthcare specialty. Besides the discussion about Healthcare concerns, we support the computer science community by compiling a collection of open source resources, such as accessible datasets, the latest methodologies, code implementations, and evaluation benchmarks on GitHub. In summary, we contend that a significant paradigm shift is underway, transitioning from PLMs to LLMs. This shift encompasses a move from discriminative AI approaches to generative AI approaches, as well as a shift from model-centered methodologies to data-centered methodologies.
Large-Scale OD Matrix Estimation with A Deep Learning Method
Authors: Zheli Xiong, Defu Lian, Enhong Chen, Gang Chen, Xiaomin Cheng
Subjects: Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.05753
Pdf link: https://arxiv.org/pdf/2310.05753
Abstract The estimation of origin-destination (OD) matrices is a crucial aspect of Intelligent Transport Systems (ITS). It involves adjusting an initial OD matrix by regressing the current observations like traffic counts of road sections (e.g., using least squares). However, the OD estimation problem lacks sufficient constraints and is mathematically underdetermined. To alleviate this problem, some researchers incorporate a prior OD matrix as a target in the regression to provide more structural constraints. However, this approach is highly dependent on the existing prior matrix, which may be outdated. Others add structural constraints through sensor data, such as vehicle trajectory and speed, which can reflect more current structural constraints in real-time. Our proposed method integrates deep learning and numerical optimization algorithms to infer matrix structure and guide numerical optimization. This approach combines the advantages of both deep learning and numerical optimization algorithms. The neural network (NN) learns to infer structural constraints from probe traffic flows, eliminating dependence on prior information and providing real-time performance. Additionally, due to the generalization capability of the NN, this method is economical in engineering. We conducted tests to demonstrate the good generalization performance of our method on a large-scale synthetic dataset. Subsequently, we verified the stability of our method on real traffic data. Our experiments confirmed the benefits of combining NN and numerical optimization.
Deep Concept Removal
Authors: Yegor Klochkov, Jean-Francois Ton, Ruocheng Guo, Yang Liu, Hang Li
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.05755
Pdf link: https://arxiv.org/pdf/2310.05755
Abstract We address the problem of concept removal in deep neural networks, aiming to learn representations that do not encode certain specified concepts (e.g., gender). We propose a novel method based on adversarial linear classifiers trained on a concept dataset, which helps to remove the targeted attribute while maintaining model performance. Our approach, Deep Concept Removal, incorporates adversarial probing classifiers at various layers of the network, effectively addressing concept entanglement and improving out-of-distribution generalization. We also introduce an implicit gradient-based technique to tackle the challenges associated with adversarial training using linear classifiers. We evaluate the ability to remove a concept on a set of popular distributionally robust optimization (DRO) benchmarks with spurious correlations, as well as out-of-distribution (OOD) generalization tasks.
FMM-Head: Enhancing Autoencoder-based ECG anomaly detection with prior knowledge
Authors: Giacomo Verardo, Magnus Boman, Samuel Bruchfeld, Marco Chiesa, Sabine Koch, Gerald Q. Maguire Jr., Dejan Kostic
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2310.05848
Pdf link: https://arxiv.org/pdf/2310.05848
Abstract Detecting anomalies in electrocardiogram data is crucial to identifying deviations from normal heartbeat patterns and providing timely intervention to at-risk patients. Various AutoEncoder models (AE) have been proposed to tackle the anomaly detection task with ML. However, these models do not consider the specific patterns of ECG leads and are unexplainable black boxes. In contrast, we replace the decoding part of the AE with a reconstruction head (namely, FMM-Head) based on prior knowledge of the ECG shape. Our model consistently achieves higher anomaly detection capabilities than state-of-the-art models, up to 0.31 increase in area under the ROC curve (AUROC), with as little as half the original model size and explainable extracted features. The processing time of our model is four orders of magnitude lower than solving an optimization problem to obtain the same parameters, thus making it suitable for real-time ECG parameters extraction and anomaly detection.
A Meta-Learning Perspective on Transformers for Causal Language Modeling
Authors: Xinbo Wu, Lav R. Varshney
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2310.05884
Pdf link: https://arxiv.org/pdf/2310.05884
Abstract The Transformer architecture has become prominent in developing large causal language models. However, mechanisms to explain its capabilities are not well understood. Focused on the training process, here we establish a meta-learning view of the Transformer architecture when trained for the causal language modeling task, by explicating an inner optimization process that may happen within the Transformer. Further, from within the inner optimization, we discover and theoretically analyze a special characteristic of the norms of learned token representations within Transformer-based causal language models. Our analysis is supported by experiments conducted on pre-trained large language models and real-world data.
Lion Secretly Solves Constrained Optimization: As Lyapunov Predicts
Authors: Lizhang Chen, Bo Liu, Kaizhao Liang, Qiang Liu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Applications (stat.AP); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.05898
Pdf link: https://arxiv.org/pdf/2310.05898
Abstract Lion (Evolved Sign Momentum), a new optimizer discovered through program search, has shown promising results in training large AI models. It performs comparably or favorably to AdamW but with greater memory efficiency. As we can expect from the results of a random search program, Lion incorporates elements from several existing algorithms, including signed momentum, decoupled weight decay, Polyak, and Nesterov momentum, but does not fit into any existing category of theoretically grounded optimizers. Thus, even though Lion appears to perform well as a general-purpose optimizer for a wide range of tasks, its theoretical basis remains uncertain. This lack of theoretical clarity limits opportunities to further enhance and expand Lion's efficacy. This work aims to demystify Lion. Based on both continuous-time and discrete-time analysis, we demonstrate that Lion is a theoretically novel and principled approach for minimizing a general loss function $f(x)$ while enforcing a bound constraint $\|x\|_\infty \leq 1/\lambda$. Lion achieves this through the incorporation of decoupled weight decay, where $\lambda$ represents the weight decay coefficient. Our analysis is made possible by the development of a new Lyapunov function for the Lion updates. It applies to a broader family of Lion-$\kappa$ algorithms, where the $\text{sign}(\cdot)$ operator in Lion is replaced by the subgradient of a convex function $\kappa$, leading to the solution of a general composite optimization problem of $\min_x f(x) + \kappa^*(x)$. Our findings provide valuable insights into the dynamics of Lion and pave the way for further improvements and extensions of Lion-related algorithms.
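The bound-constraint claim in this abstract can be illustrated with a small sketch. The update rule below is the commonly published Lion update (sign of an interpolated momentum plus decoupled weight decay); it is not spelled out in the abstract, so treat it, the quadratic test loss, and the names `lion_step` and `run_lion` as assumptions made for illustration. With the target placed outside the box, the iterates should stay inside $\|x\|_\infty \leq 1/\lambda$ and settle on its boundary.

```python
def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def lion_step(x, m, grad, lr=0.01, beta1=0.9, beta2=0.99, weight_decay=1.0):
    """One Lion update on plain Python lists (assumed standard form of the rule)."""
    new_x, new_m = [], []
    for xi, mi, gi in zip(x, m, grad):
        c = beta1 * mi + (1 - beta1) * gi                       # interpolated momentum
        new_x.append(xi - lr * (sign(c) + weight_decay * xi))   # sign step + decoupled decay
        new_m.append(beta2 * mi + (1 - beta2) * gi)             # momentum update
    return new_x, new_m


def run_lion(target, steps=2000, lr=0.01, weight_decay=1.0):
    """Minimize 0.5 * ||x - target||^2 with Lion; also track the largest |x_i| seen."""
    x = [0.0] * len(target)
    m = [0.0] * len(target)
    max_abs = 0.0
    for _ in range(steps):
        grad = [xi - ti for xi, ti in zip(x, target)]  # gradient of the quadratic loss
        x, m = lion_step(x, m, grad, lr=lr, weight_decay=weight_decay)
        max_abs = max(max_abs, max(abs(xi) for xi in x))
    return x, max_abs


if __name__ == "__main__":
    x, max_abs = run_lion([5.0, -3.0, 0.2])
    print(x, max_abs)
```

The invariance of the box follows from the update itself: if $|x_i| \leq 1/\lambda$ and the step size times $\lambda$ is at most 1, then $|(1 - \eta\lambda)x_i - \eta\,\text{sign}(c_i)| \leq 1/\lambda$.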
Keyword: adam

Combining UPerNet and ConvNeXt for Contrails Identification to reduce Global Warming
Authors: Zhenkuan Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.04808
Pdf link: https://arxiv.org/pdf/2310.04808
Abstract Semantic segmentation is a critical tool in computer vision, applied in various domains like autonomous driving and medical imaging. This study focuses on aircraft contrail detection in global satellite images to improve contrail models and mitigate their impact on climate change. An innovative data preprocessing technique for NOAA GOES-16 satellite images is developed, using brightness temperature data from the infrared channel to create false-color images, enhancing model perception. To tackle class imbalance, the training dataset exclusively includes images with positive contrail labels. The model selection is based on the UPerNet architecture, implemented using the MMSegmentation library, with the integration of two ConvNeXt configurations for improved performance. Cross-entropy loss with positive class weights enhances contrail recognition. Fine-tuning employs the AdamW optimizer with a learning rate of $2.5 \times 10^{-4}$. During inference, a multi-model prediction fusion strategy and a contrail determination threshold of 0.75 yield a binary prediction mask. RLE encoding is used for efficient prediction result organization. The approach achieves exceptional results, boasting a high Dice coefficient score, placing it in the top 5% of participating teams. This underscores the innovative nature of the segmentation model and its potential for enhanced contrail recognition in satellite imagery. For further exploration, the code and models are available on GitHub: https://github.com/biluko/2023GRIC.git
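The inference steps named in this abstract (fusion of several model outputs, a 0.75 threshold, run-length encoding) can be sketched as follows. The abstract does not say how the predictions are fused or which RLE layout is used, so the averaging rule, the strict `>` comparison, the 1-indexed `(start, length)` run format, and the function names are all assumptions for illustration; masks are flattened to 1-D lists to keep the example short.

```python
def fuse_and_threshold(prob_maps, threshold=0.75):
    """Average per-pixel probabilities over models (assumed fusion rule), then binarize."""
    n_models = len(prob_maps)
    fused = [sum(pixel_probs) / n_models for pixel_probs in zip(*prob_maps)]
    return [1 if p > threshold else 0 for p in fused]


def rle_encode(mask):
    """Encode a flat binary mask as (start, length) runs of ones, with 1-indexed starts."""
    runs = []
    start = None
    for i, value in enumerate(mask, start=1):
        if value and start is None:
            start = i                          # a run of ones begins here
        elif not value and start is not None:
            runs.append((start, i - start))    # the run ended on the previous pixel
            start = None
    if start is not None:                      # close a run that reaches the end
        runs.append((start, len(mask) - start + 1))
    return runs


if __name__ == "__main__":
    # Two hypothetical model outputs over the same five pixels.
    model_a = [0.9, 0.8, 0.2, 0.7, 0.95]
    model_b = [0.8, 0.9, 0.1, 0.9, 0.85]
    mask = fuse_and_threshold([model_a, model_b])
    print(mask, rle_encode(mask))
```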
Lion Secretly Solves Constrained Optimization: As Lyapunov Predicts
Authors: Lizhang Chen, Bo Liu, Kaizhao Liang, Qiang Liu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Applications (stat.AP); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.05898
Pdf link: https://arxiv.org/pdf/2310.05898
Abstract Lion (Evolved Sign Momentum), a new optimizer discovered through program search, has shown promising results in training large AI models. It performs comparably or favorably to AdamW but with greater memory efficiency. As we can expect from the results of a random search program, Lion incorporates elements from several existing algorithms, including signed momentum, decoupled weight decay, Polyak, and Nesterov momentum, but does not fit into any existing category of theoretically grounded optimizers. Thus, even though Lion appears to perform well as a general-purpose optimizer for a wide range of tasks, its theoretical basis remains uncertain. This lack of theoretical clarity limits opportunities to further enhance and expand Lion's efficacy. This work aims to demystify Lion. Based on both continuous-time and discrete-time analysis, we demonstrate that Lion is a theoretically novel and principled approach for minimizing a general loss function $f(x)$ while enforcing a bound constraint $\|x\|_\infty \leq 1/\lambda$. Lion achieves this through the incorporation of decoupled weight decay, where $\lambda$ represents the weight decay coefficient. Our analysis is made possible by the development of a new Lyapunov function for the Lion updates. It applies to a broader family of Lion-$\kappa$ algorithms, where the $\text{sign}(\cdot)$ operator in Lion is replaced by the subgradient of a convex function $\kappa$, leading to the solution of a general composite optimization problem of $\min_x f(x) + \kappa^*(x)$. Our findings provide valuable insights into the dynamics of Lion and pave the way for further improvements and extensions of Lion-related algorithms.
Keyword: gradient

Training-free Linear Image Inversion via Flows
Authors: Ashwini Pokle, Matthew J. Muckley, Ricky T. Q. Chen, Brian Karrer
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.04432
Pdf link: https://arxiv.org/pdf/2310.04432
Abstract Training-free linear inversion involves the use of a pretrained generative model and -- through appropriate modifications to the generation process -- solving inverse problems without any finetuning of the generative model. While recent prior methods have explored the use of diffusion models, they still require the manual tuning of many hyperparameters for different inverse problems. In this work, we propose a training-free method for image inversion using pretrained flow models, leveraging the simplicity and efficiency of Flow Matching models, using theoretically-justified weighting schemes and thereby significantly reducing the amount of manual tuning. In particular, we draw inspiration from two main sources: adopting prior gradient correction methods to the flow regime, and a solver scheme based on conditional Optimal Transport paths. As pretrained diffusion models are widely accessible, we also show how to practically adapt diffusion models for our method. Empirically, our approach requires no problem-specific tuning across an extensive suite of noisy linear image inversion problems on high-dimensional datasets, ImageNet-64/128 and AFHQ-256, and we observe that our flow-based method for image inversion significantly improves upon closely-related diffusion-based linear inversion methods.
Generative Diffusion From An Action Principle
Authors: Akhil Premkumar
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.04490
Pdf link: https://arxiv.org/pdf/2310.04490
Abstract Generative diffusion models synthesize new samples by reversing a diffusive process that converts a given data set to generic noise. This is accomplished by training a neural network to match the gradient of the log of the probability distribution of a given data set, also called the score. By casting reverse diffusion as an optimal control problem, we show that score matching can be derived from an action principle, like the ones commonly used in physics. We use this insight to demonstrate the connection between different classes of diffusion models.
Utilizing Free Clients in Federated Learning for Focused Model Enhancement
Authors: Aditya Narayan Ravi, Ilan Shomorony
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.04515
Pdf link: https://arxiv.org/pdf/2310.04515
Abstract Federated Learning (FL) is a distributed machine learning approach to learn models on decentralized heterogeneous data, without the need for clients to share their data. Many existing FL approaches assume that all clients have equal importance and construct a global objective based on all clients. We consider a version of FL we call Prioritized FL, where the goal is to learn a weighted mean objective of a subset of clients, designated as priority clients. An important question arises: How do we choose and incentivize well-aligned non-priority clients to participate in the federation, while discarding misaligned clients? We present FedALIGN (Federated Adaptive Learning with Inclusion of Global Needs) to address this challenge. The algorithm employs a matching strategy that chooses non-priority clients based on how similar the model's loss is on their data compared to the global data, thereby ensuring the use of non-priority client gradients only when it is beneficial for priority clients. This approach ensures mutual benefits as non-priority clients are motivated to join when the model performs satisfactorily on their data, and priority clients can utilize their updates and computational resources when their goals align. We present a convergence analysis that quantifies the trade-off between client selection and speed of convergence. Our algorithm shows faster convergence and higher test accuracy than baselines for various synthetic and benchmark datasets.
Generating Less Certain Adversarial Examples Improves Robust Generalization
Authors: Minxing Zhang, Michael Backes, Xiao Zhang
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.04539
Pdf link: https://arxiv.org/pdf/2310.04539
Abstract Recent studies have shown that deep neural networks are vulnerable to adversarial examples. Numerous defenses have been proposed to improve model robustness, among which adversarial training is most successful. In this work, we revisit the robust overfitting phenomenon. In particular, we argue that overconfident models produced during adversarial training could be a potential cause, supported by the empirical observation that the predicted labels of adversarial examples generated by models with better robust generalization ability tend to have significantly more even distributions. Based on the proposed definition of adversarial certainty, we incorporate an extragradient step in the adversarial training framework to search for models that can generate adversarially perturbed inputs with lower certainty, further improving robust generalization. Our approach is general and can be easily combined with other variants of adversarial training methods. Extensive experiments on image benchmarks demonstrate that our method effectively alleviates robust overfitting and is able to produce models with consistently improved robustness.
DragD3D: Vertex-based Editing for Realistic Mesh Deformations using 2D Diffusion Priors
Authors: Tianhao Xie, Eugene Belilovsky, Sudhir Mudur, Tiberiu Popa
Subjects: Graphics (cs.GR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.04561
Pdf link: https://arxiv.org/pdf/2310.04561
Abstract Direct mesh editing and deformation are key components in the geometric modeling and animation pipeline. Direct mesh editing methods are typically framed as optimization problems combining user-specified vertex constraints with a regularizer that determines the position of the rest of the vertices. The choice of the regularizer is key to the realism and authenticity of the final result. Physics and geometry-based regularizers are not aware of the global context and semantics of the object, and the more recent deep learning priors are limited to a specific class of 3D object deformations. In this work, our main contribution is a local mesh editing method called DragD3D for global context-aware realistic deformation through direct manipulation of a few vertices. DragD3D is not restricted to any class of objects. It achieves this by combining the classic geometric ARAP (as rigid as possible) regularizer with 2D priors obtained from a large-scale diffusion model. Specifically, we render the objects from multiple viewpoints through a differentiable renderer and use the recently introduced DDS loss, which scores the faithfulness of the rendered image to one from a diffusion model. DragD3D combines the approximate gradients of the DDS with gradients from the ARAP loss to modify the mesh vertices via a neural Jacobian field, while also satisfying vertex constraints. We show that our deformations are realistic and aware of the global context of the objects, and provide better results than just using geometric regularizers.
An Algorithm to Train Unrestricted Sequential Discrete Morphological Neural Networks
Authors: Diego Marcondes, Mariana Feldman, Junior Barrera
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.04584
Pdf link: https://arxiv.org/pdf/2310.04584
Abstract With the advent of deep learning, there have been attempts to insert mathematical morphology (MM) operators into convolutional neural networks (CNN), and the most successful endeavor to date has been the morphological neural networks (MNN). Although MNN have performed better than CNN in solving some problems, they inherit their black-box nature. Furthermore, in the case of binary images, they are approximations, which lose the Boolean lattice structure of MM operators and, thus, it is not possible to represent a specific class of W-operators with desired properties. In a recent work, we proposed the Discrete Morphological Neural Networks (DMNN) for binary image transformation to represent specific classes of W-operators and estimate them via machine learning. We also proposed a stochastic lattice gradient descent algorithm (SLGDA) to learn the parameters of Canonical Discrete Morphological Neural Networks (CDMNN), whose architecture is composed only of operators that can be decomposed as the supremum, infimum, and complement of erosions and dilations. In this paper, we propose an algorithm to learn unrestricted sequential DMNN (USDMNN), whose architecture is given by the composition of general W-operators. We consider the representation of a W-operator by its characteristic Boolean function, and then learn it via a SLGDA in the Boolean lattice of functions. Although both the CDMNN and USDMNN have the Boolean lattice structure, USDMNN are not as dependent on prior information about the problem at hand, and may be more suitable in instances in which the practitioner does not have strong domain knowledge. We illustrate the algorithm in a practical example.
Discrete energy balance equation via a symplectic second-order method for two-phase flow in porous media
Authors: Giselle Sosa Jones, Catalin Trenchea
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2310.04602
Pdf link: https://arxiv.org/pdf/2310.04602
Abstract We propose and analyze a second-order partitioned time-stepping method for a two-phase flow problem in porous media. The algorithm is based on a refactorization of Cauchy's one-leg $\theta$-method. The main part consists of the implicit backward Euler method on $[t^n, t^{n+\theta}]$, while part two uses a linear extrapolation on $[t^{n+\theta},t^{n+1}]$ to obtain the solution at $t^{n+1}$, equivalent to the forward Euler method. In the backward Euler step, the decoupled equations are solved iteratively. We prove that the iterations converge linearly to the solution of the coupled problem, under some conditions on the data. When $\theta = 1/2$, the algorithm is equivalent to the symplectic midpoint method. In the absence of the chain rule for time-discrete setting, we approximate the change in the free energy by the product of a second-order accurate discrete gradient (chemical potential) and the one-step increment of the state variables. Similar to the continuous case, we also prove a discrete Gibbs free energy balance equation, without numerical dissipation. In the numerical tests we compare this implicit midpoint method with the classic backward Euler method, and two implicit-explicit time-lagging schemes. The midpoint method outperforms the other schemes in terms of rates of convergence, long-time behavior and energy approximation, for small and large values of the time step.
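The two-stage structure described in this abstract (backward Euler on $[t^n, t^{n+\theta}]$ followed by linear extrapolation to $t^{n+1}$) can be tried on a much simpler problem. The sketch below applies it to the harmonic oscillator $q' = p$, $p' = -q$ rather than to the two-phase flow system, so the test problem and the names `theta_step` and `energy_after` are assumptions for illustration only. For $\theta = 1/2$ the step coincides with the implicit midpoint rule, which preserves the oscillator energy $\tfrac{1}{2}(q^2 + p^2)$ up to round-off, while $\theta = 1$ reduces to plain backward Euler and dissipates it.

```python
def theta_step(q, p, h, theta=0.5):
    """One refactorized theta-step for q' = p, p' = -q."""
    a = theta * h
    det = 1.0 + a * a
    # Stage 1: backward Euler over [t_n, t_n + theta*h].
    # Solves q_mid = q + a * p_mid and p_mid = p - a * q_mid in closed form.
    q_mid = (q + a * p) / det
    p_mid = (p - a * q) / det
    # Stage 2: linear extrapolation through (t_n, y_n) and (t_n + theta*h, y_mid) to t_{n+1}.
    q_new = q + (q_mid - q) / theta
    p_new = p + (p_mid - p) / theta
    return q_new, p_new


def energy_after(steps, h, theta):
    """Integrate from (q, p) = (1, 0) and return the final energy 0.5 * (q^2 + p^2)."""
    q, p = 1.0, 0.0
    for _ in range(steps):
        q, p = theta_step(q, p, h, theta)
    return 0.5 * (q * q + p * p)


if __name__ == "__main__":
    print("theta = 0.5:", energy_after(1000, 0.1, 0.5))  # stays close to the initial 0.5
    print("theta = 1.0:", energy_after(1000, 0.1, 1.0))  # decays toward 0
```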
PriViT: Vision Transformers for Fast Private Inference
Authors: Naren Dhyani, Jianqiao Mo, Minsu Cho, Ameya Joshi, Siddharth Garg, Brandon Reagen, Chinmay Hegde
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.04604
Pdf link: https://arxiv.org/pdf/2310.04604
Abstract The Vision Transformer (ViT) architecture has emerged as the backbone of choice for state-of-the-art deep models for computer vision applications. However, ViTs are ill-suited for private inference using secure multi-party computation (MPC) protocols, due to the large number of non-polynomial operations (self-attention, feed-forward rectifiers, layer normalization). We propose PriViT, a gradient based algorithm to selectively "Taylorize" nonlinearities in ViTs while maintaining their prediction accuracy. Our algorithm is conceptually simple, easy to implement, and achieves improved performance over existing approaches for designing MPC-friendly transformer architectures in terms of achieving the Pareto frontier in latency-accuracy. We confirm these improvements via experiments on several standard image classification tasks. Public code is available at https://github.com/NYU-DICE-Lab/privit.
A Comprehensive Performance Study of Large Language Models on Novel AI Accelerators
Authors: Murali Emani, Sam Foreman, Varuni Sastry, Zhen Xie, Siddhisanket Raskar, William Arnold, Rajeev Thakur, Venkatram Vishwanath, Michael E. Papka
Subjects: Performance (cs.PF); Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.04607
Pdf link: https://arxiv.org/pdf/2310.04607
Abstract Artificial intelligence (AI) methods have become critical in scientific applications to help accelerate scientific discovery. Large language models (LLMs) are being considered as a promising approach to address some of the challenging problems because of their superior generalization capabilities across domains. The effectiveness of the models and the accuracy of the applications are contingent upon their efficient execution on the underlying hardware infrastructure. Specialized AI accelerator hardware systems have recently become available for accelerating AI applications. However, the comparative performance of these AI accelerators on large language models has not been previously studied. In this paper, we systematically study LLMs on multiple AI accelerators and GPUs and evaluate their performance characteristics for these models. We evaluate these systems with (i) a micro-benchmark using a core transformer block, (ii) a GPT-2 model, and (iii) an LLM-driven science use case, GenSLM. We present our findings and analyses of the models' performance to better understand the intrinsic capabilities of AI accelerators. Furthermore, our analysis takes into account key factors such as sequence lengths, scaling behavior, sparsity, and sensitivity to gradient accumulation steps.
X-Transfer: A Transfer Learning-Based Framework for Robust GAN-Generated Fake Image Detection
Authors: Lei Zhang, Hao Chen, Shu Hu, Bin Zhu, Xi Wu, Jinrong Hu, Xin Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.04639
Pdf link: https://arxiv.org/pdf/2310.04639
Abstract Generative adversarial networks (GANs) have remarkably advanced in diverse domains, especially image generation and editing. However, the misuse of GANs for generating deceptive images raises significant security concerns, including face replacement and fake accounts, which have gained widespread attention. Consequently, there is an urgent need for effective detection methods to distinguish between real and fake images. Some of the current research centers around the application of transfer learning. Nevertheless, it encounters challenges such as knowledge forgetting from the original dataset and inadequate performance when dealing with imbalanced data during training. To alleviate the above issues, this paper introduces a novel GAN-generated image detection algorithm called X-Transfer. This model enhances transfer learning by utilizing two sibling neural networks that employ interleaved parallel gradient transmission. This approach also effectively mitigates the problem of excessive knowledge forgetting. In addition, we combine an AUC loss term with a cross-entropy loss to enhance the model's performance comprehensively. The AUC loss approximates the AUC metric using WMW statistics, ensuring differentiability and improving the performance of traditional AUC evaluation. We carry out comprehensive experiments on multiple facial image datasets. The results show that our model outperforms the general transferring approach, and the best accuracy reaches 99.04%, an increase of approximately 10%. Furthermore, we demonstrate excellent performance on non-face datasets, validating its generality and broader application prospects.
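The abstract above does not spell out the form of its AUC loss, so the following is only a sketch of one commonly used differentiable surrogate for the Wilcoxon-Mann-Whitney (WMW) statistic. The margin `gamma`, the exponent `p`, and both function names are assumptions for illustration and may differ from what X-Transfer actually uses.

```python
import numpy as np

def wmw_auc(pos_scores, neg_scores):
    """Exact WMW statistic: fraction of (positive, negative) pairs ranked correctly."""
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.asarray(neg_scores, dtype=float)
    diff = pos[:, None] - neg[None, :]
    return float(np.mean(diff > 0))

def wmw_auc_loss(pos_scores, neg_scores, gamma=0.3, p=2):
    """Smooth surrogate: penalize pairs whose score gap is below the margin gamma."""
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.asarray(neg_scores, dtype=float)
    diff = pos[:, None] - neg[None, :]
    # Pairs with diff >= gamma contribute zero; the rest contribute (gamma - diff)^p.
    penalty = np.maximum(gamma - diff, 0.0) ** p
    return float(penalty.mean())

# Well-separated scores: perfect AUC and zero surrogate loss.
auc_sep = wmw_auc([0.9, 0.8], [0.1, 0.2])
loss_sep = wmw_auc_loss([0.9, 0.8], [0.1, 0.2])

# One misranked pair: AUC drops and the surrogate becomes positive.
auc_mix = wmw_auc([0.4, 0.9], [0.5, 0.1])
loss_mix = wmw_auc_loss([0.4, 0.9], [0.5, 0.1])
```

The exact statistic is piecewise constant in the scores, whereas the surrogate varies smoothly with each score gap, which is what makes it usable as a training loss alongside cross-entropy.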
Balancing stability and plasticity in continual learning: the readout-decomposition of activation change (RDAC) framework
Authors: Daniel Anthes, Sushrut Thorat, Tim C. Kietzmann, Peter König
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC)
Arxiv link: https://arxiv.org/abs/2310.04741
Pdf link: https://arxiv.org/pdf/2310.04741
Abstract Continual learning (CL) algorithms strive to acquire new knowledge while preserving prior information. However, this stability-plasticity trade-off remains a central challenge. This paper introduces a framework that dissects this trade-off, offering valuable insights into CL algorithms. The Readout-Decomposition of Activation Change (RDAC) framework first addresses the stability-plasticity dilemma and its relation to catastrophic forgetting. It relates learning-induced activation changes in the range of prior readouts to the degree of stability and changes in the null space to the degree of plasticity. In deep non-linear networks tackling split-CIFAR-110 tasks, the framework clarifies the stability-plasticity trade-offs of the popular regularization algorithms Synaptic intelligence (SI), Elastic-weight consolidation (EWC), and learning without Forgetting (LwF), and replay-based algorithms Gradient episodic memory (GEM), and data replay. GEM and data replay preserved stability and plasticity, while SI, EWC, and LwF traded off plasticity for stability. The inability of the regularization algorithms to maintain plasticity was linked to them restricting the change of activations in the null space of the prior readout. Additionally, for one-hidden-layer linear neural networks, we derived a gradient decomposition algorithm to restrict activation change only in the range of the prior readouts, to maintain high stability while not further sacrificing plasticity. Results demonstrate that the algorithm maintained stability without significant plasticity loss. The RDAC framework informs the behavior of existing CL algorithms and paves the way for novel CL approaches. Finally, it sheds light on the connection between learning-induced activation/representation changes and the stability-plasticity dilemma, also offering insights into representational drift in biological systems.
Rethink Baseline of Integrated Gradients from the Perspective of Shapley Value
Authors: Shuyang Liu, Zixuan Chen, Shi Ge, Ji Wang, Changjie Fan, Yu Xiong, Runze Wu, Yujing Hu, Ze Ji, Yang Gao
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.04821
Pdf link: https://arxiv.org/pdf/2310.04821
Abstract Numerous approaches have attempted to interpret deep neural networks (DNNs) by attributing the prediction of a DNN to its input features. One of the well-studied attribution methods is Integrated Gradients (IG). Specifically, the choice of baselines for IG is a critical consideration for generating meaningful and unbiased explanations for model predictions in different scenarios. However, the current practice of exploiting a single baseline fails to fulfill this ambition, thus demanding multiple baselines. Fortunately, the inherent connection between IG and the Aumann-Shapley Value forms a unique perspective to rethink the design of baselines. Under certain hypotheses, we show theoretically that a set of baselines aligns with the coalitions in the Shapley Value. Thus, we propose a novel baseline construction method called Shapley Integrated Gradients (SIG) that searches for a set of baselines by proportional sampling to partly simulate the computation path of the Shapley Value. Simulations on GridWorld show that SIG approximates the proportion of Shapley Values. Furthermore, experiments conducted on various image tasks demonstrate that, compared to IG using other baseline methods, SIG exhibits an improved estimation of feature contributions, offers more consistent explanations across diverse applications, and is generic to distinct data types or instances with insignificant computational overhead.
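For readers unfamiliar with Integrated Gradients, the sketch below computes plain IG with a midpoint Riemann sum and then averages the attributions over a small set of baselines. The toy function `f`, its analytic gradient, and the two baselines are assumptions chosen for illustration. The uniform averaging shown here is a generic multi-baseline scheme, not the proportional sampling that SIG proposes.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=50):
    """IG_i = (x_i - b_i) * integral over alpha in [0, 1] of df/dx_i(b + alpha (x - b))."""
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    alphas = (np.arange(steps) + 0.5) / steps  # midpoint rule on [0, 1]
    path_grads = np.array(
        [grad_fn(baseline + a * (x - baseline)) for a in alphas]
    )
    return (x - baseline) * path_grads.mean(axis=0)

def f(x):
    # Toy model (assumed for illustration).
    return x[0] ** 2 + 3.0 * x[0] * x[1]

def grad_f(x):
    return np.array([2.0 * x[0] + 3.0 * x[1], 3.0 * x[0]])

x = np.array([1.0, 2.0])
baselines = [np.zeros(2), np.array([0.5, 0.5])]
per_baseline = [integrated_gradients(grad_f, x, b) for b in baselines]
averaged = np.mean(per_baseline, axis=0)
```

Each single-baseline attribution satisfies the completeness property, meaning its entries sum to `f(x) - f(baseline)`. That gives a quick sanity check on any IG implementation before moving to more elaborate baseline sets.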
Combining Sampling- and Gradient-based Planning for Contact-rich Manipulation
Authors: Filippo Rozzi, Loris Roveda, Kevin Haninger
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2310.04822
Pdf link: https://arxiv.org/pdf/2310.04822
Abstract Planning over discontinuous dynamics is needed for robotics tasks like contact-rich manipulation, which presents challenges in the numerical stability and speed of planning methods when either neural network or analytical models are used. On the one hand, sampling-based planners require higher sample complexity in high-dimensional problems and cannot describe safety constraints such as force limits. On the other hand, gradient-based solvers can suffer from local optima and convergence issues when the Hessian is poorly conditioned. We propose a planning method with both sampling- and gradient-based elements, using the Cross-entropy Method to initialize a gradient-based solver, providing better search over local minima and the ability to handle explicit constraints. We show the approach allows smooth, stable contact-rich planning for an impedance-controlled robot making contact with a stiff environment, benchmarking against gradient-only MPC and CEM.
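The idea of using the Cross-entropy Method (CEM) to initialize a gradient-based solver can be sketched on a one-dimensional toy objective with two local minima. Everything here (the objective `f`, the sample sizes, the learning rate, and the starting point) is an assumption for illustration. The paper's planner works on robot dynamics with explicit constraints, which this sketch does not model.

```python
import numpy as np

def f(x):
    # Toy non-convex cost: global minimum near x = -1.30, local minimum near x = 1.13.
    return x**4 - 3.0 * x**2 + x

def grad_f(x):
    return 4.0 * x**3 - 6.0 * x + 1.0

def cem(mu, sigma, rng, iters=5, n_samples=500, n_elite=50):
    """Cross-entropy method: refit a Gaussian to the lowest-cost samples."""
    for _ in range(iters):
        samples = rng.normal(mu, sigma, size=n_samples)
        elites = samples[np.argsort(f(samples))[:n_elite]]
        mu, sigma = elites.mean(), elites.std() + 1e-6
    return mu

def gradient_descent(x, lr=0.01, steps=2000):
    for _ in range(steps):
        x = x - lr * grad_f(x)
    return x

rng = np.random.default_rng(0)
x_start = 2.0
# Gradient-only: follows the slope from x_start into the nearby local minimum.
x_grad_only = gradient_descent(x_start)
# Hybrid: CEM searches broadly first, then gradient descent refines its estimate.
x_hybrid = gradient_descent(cem(mu=x_start, sigma=2.0, rng=rng))
```

In this toy setting the gradient-only run settles in the shallower basin closest to its starting point, while the CEM-initialized run is refined inside the deeper basin. The sampling stage supplies the global search and the gradient stage supplies fast local convergence.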
GradXKG: A Universal Explain-per-use Temporal Knowledge Graph Explainer
Authors: Chenhan Yuan, Hoda Eldardiry
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2310.04889
Pdf link: https://arxiv.org/pdf/2310.04889
Abstract Temporal knowledge graphs (TKGs) have shown promise for reasoning tasks by incorporating a temporal dimension to represent how facts evolve over time. However, existing TKG reasoning (TKGR) models lack explainability due to their black-box nature. Recent work has attempted to address this through customized model architectures that generate reasoning paths, but these recent approaches have limited generalizability and provide sparse explanatory output. To enable interpretability for most TKGR models, we propose GradXKG, a novel two-stage gradient-based approach for explaining Relational Graph Convolution Network (RGCN)-based TKGR models. First, a Grad-CAM-inspired RGCN explainer tracks gradients to quantify each node's contribution across timesteps in an efficient "explain-per-use" fashion. Second, an integrated gradients explainer consolidates importance scores for RGCN outputs, extending compatibility across diverse TKGR architectures based on RGCN. Together, the two explainers highlight the most critical nodes at each timestep for a given prediction. Our extensive experiments demonstrated that, by leveraging gradient information, GradXKG provides insightful explanations grounded in the model's logic in a timely manner for most RGCN-based TKGR models. This helps address the lack of interpretability in existing TKGR models and provides a universal explanation approach applicable across various models.
Robust Network Pruning With Sparse Entropic Wasserstein Regression
Authors: Lei You, Hei Victor Cheng
Subjects: Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.04918
Pdf link: https://arxiv.org/pdf/2310.04918
Abstract This study unveils a cutting-edge technique for neural network pruning that judiciously addresses noisy gradients during the computation of the empirical Fisher Information Matrix (FIM). We introduce an entropic Wasserstein regression (EWR) formulation, capitalizing on the geometric attributes of the optimal transport (OT) problem. This is analytically showcased to excel in noise mitigation by adopting neighborhood interpolation across data points. The unique strength of the Wasserstein distance is its intrinsic ability to strike a balance between noise reduction and covariance information preservation. Extensive experiments performed on various networks show comparable performance of the proposed method with state-of-the-art (SoTA) network pruning algorithms. Our proposed method outperforms the SoTA when the network size or the target sparsity is large; the gain is even larger in the presence of noisy gradients, possibly from noisy data, analog memory, or adversarial attacks. Notably, our proposed method achieves a 6% improvement in accuracy and an 8% improvement in testing loss for MobileNetV1 with less than one-fourth of the network parameters remaining.
Diff-Transfer: Model-based Robotic Manipulation Skill Transfer via Differentiable Physics Simulation
Authors: Yuqi Xiang, Feitong Chen, Qinsi Wang, Yang Gang, Xiang Zhang, Xinghao Zhu, Xingyu Liu, Lin Shao
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.04930
Pdf link: https://arxiv.org/pdf/2310.04930
Abstract The capability to transfer mastered skills to accomplish a range of similar yet novel tasks is crucial for intelligent robots. In this work, we introduce $\textit{Diff-Transfer}$, a novel framework leveraging differentiable physics simulation to efficiently transfer robotic skills. Specifically, $\textit{Diff-Transfer}$ discovers a feasible path within the task space that brings the source task to the target task. At each pair of adjacent points along this task path, which is two sub-tasks, $\textit{Diff-Transfer}$ adapts known actions from one sub-task to tackle the other sub-task successfully. The adaptation is guided by the gradient information from differentiable physics simulations. We propose a novel path-planning method to generate sub-tasks, leveraging $Q$-learning with a task-level state and reward. We implement our framework in simulation experiments and execute four challenging transfer tasks on robotic manipulation, demonstrating the efficacy of $\textit{Diff-Transfer}$ through comprehensive experiments. Supplementary material and videos are available at https://sites.google.com/view/difftransfer
Compositional Semantics for Open Vocabulary Spatio-semantic Representations
Authors: Robin Karlsson, Francisco Lepe-Salazar, Kazuya Takeda
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.04981
Pdf link: https://arxiv.org/pdf/2310.04981
Abstract General-purpose mobile robots need to complete tasks without exact human instructions. Large language models (LLMs) are a promising direction for realizing commonsense world knowledge and reasoning-based planning. Vision-language models (VLMs) transform environment percepts into vision-language semantics interpretable by LLMs. However, completing complex tasks often requires reasoning about information beyond what is currently perceived. We propose latent compositional semantic embeddings z as a principled learning-based knowledge representation for queryable spatio-semantic memories. We mathematically prove that z can always be found, and the optimal z is the centroid for any set Z. We derive a probabilistic bound for estimating separability of related and unrelated semantics. We prove that z is discoverable by iterative optimization by gradient descent from visual appearance and singular descriptions. We experimentally verify our findings on four embedding spaces, including CLIP and SBERT. Our results show that z can represent up to 10 semantics encoded by SBERT, and up to 100 semantics for ideal uniformly distributed high-dimensional embeddings. We demonstrate that a simple dense VLM trained on the COCO-Stuff dataset can learn z for 181 overlapping semantics by 42.23 mIoU, while improving conventional non-overlapping open-vocabulary segmentation performance by +3.48 mIoU compared with a popular SOTA model.
High Order Mimetic Symplectic Methods For Hamiltonian Systems
Authors: Anand Srinivasan, Jose E. Castillo
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2310.04998
Pdf link: https://arxiv.org/pdf/2310.04998
Abstract Hamiltonian systems are known to conserve the Hamiltonian function, which describes the energy evolution over time. Obtaining a numerical spatio-temporal scheme that accurately preserves the discretized Hamiltonian function is often a challenge. In this paper, the use of high order mimetic spatial schemes is investigated for the numerical solution of Hamiltonian equations. The mimetic operators are based on developing high order discrete analogs of the vector calculus quantities divergence and gradient. The resulting high order operators preserve the properties of their continuum ones, and are therefore said to mimic properties of conservation laws and symmetries. Symplectic fourth order schemes are implemented in this paper for the time integration of Hamiltonian systems. A theoretical framework for the energy preserving nature of the resulting schemes is also presented, followed by numerical examples.
The Reinforce Policy Gradient Algorithm Revisited
Authors: Shalabh Bhatnagar
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2310.05000
Pdf link: https://arxiv.org/pdf/2310.05000
Abstract We revisit the Reinforce policy gradient algorithm from the literature. Note that this algorithm typically works with cost returns obtained over random length episodes obtained from either termination upon reaching a goal state (as with episodic tasks) or from instants of visit to a prescribed recurrent state (in the case of continuing tasks). We propose a major enhancement to the basic algorithm. We estimate the policy gradient using a function measurement over a perturbed parameter by appealing to a class of random search approaches. This has advantages in the case of systems with infinite state and action spaces as it relaxes some of the regularity requirements that would otherwise be needed for proving convergence of the Reinforce algorithm. Nonetheless, we observe that even though we estimate the gradient of the performance objective using the performance objective itself (and not via the sample gradient), the algorithm converges to a neighborhood of a local minimum. We also provide a proof of convergence for this new algorithm.
Robust-GBDT: A Novel Gradient Boosting Model for Noise-Robust Classification
Authors: Jiaqi Luo, Yuedong Quan, Shixin Xu
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.05067
Pdf link: https://arxiv.org/pdf/2310.05067
Abstract Robust boosting algorithms have emerged as alternative solutions to traditional boosting techniques for addressing label noise in classification tasks. However, these methods have predominantly focused on binary classification, limiting their applicability to multi-class tasks. Furthermore, they encounter challenges with imbalanced datasets, missing values, and computational efficiency. In this paper, we establish that the loss function employed in advanced Gradient Boosting Decision Trees (GBDT), particularly Newton's method-based GBDT, need not necessarily exhibit global convexity. Instead, the loss function only requires convexity within a specific region. Consequently, these GBDT models can leverage the benefits of nonconvex robust loss functions, making them resilient to noise. Building upon this theoretical insight, we introduce a new noise-robust boosting model called Robust-GBDT, which seamlessly integrates the advanced GBDT framework with robust losses. Additionally, we enhance the existing robust loss functions and introduce a novel robust loss function, Robust Focal Loss, designed to address class imbalance. As a result, Robust-GBDT generates more accurate predictions, significantly enhancing its generalization capabilities, especially in scenarios marked by label noise and class imbalance. Furthermore, Robust-GBDT is user-friendly and can easily integrate existing open-source code, enabling it to effectively handle complex datasets while improving computational efficiency. Numerous experiments confirm the superiority of Robust-GBDT over other noise-robust methods.
A Privacy-Preserving Trajectory Synthesis Method Based on Vector Translation Invariance Supporting Traffic Constraints
Authors: Zechen Liu, Wei Song, Yuhan Wang
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2310.05091
Pdf link: https://arxiv.org/pdf/2310.05091
Abstract With the popularization of different kinds of smart terminals and the development of autonomous driving technology, more and more services based on spatio-temporal data have emerged in our lives, such as online taxi services, traffic flow prediction, and tracking virus propagation. However, privacy concerns greatly limit the use of spatio-temporal data. To address this issue, differential privacy methods for spatio-temporal data have been proposed. In differential privacy, a good aggregation query can greatly improve data utility. However, the mainstream aggregation query methods are based on area partitioning, which makes it difficult to generate high-utility trajectories because such methods cannot easily take time and constraints into account. Motivated by this, we propose an aggregation query based on the relationships between trajectories, which greatly improves data utility compared with existing methods. The trajectory synthesis task can be regarded as an optimization problem of finding trajectories that match the relationships between trajectories. We adopt gradient descent to find new trajectories that meet the conditions, and during the gradient descent we can easily take the constraints into account by adding penalty terms, which is hard to achieve with area-partitioning-based queries. We carry out extensive experiments to validate that the trajectories generated by our method have higher utility, and the theoretical analysis shows that our method is safe and reliable.
Asymmetrically Decentralized Federated Learning
Authors: Qinglun Li, Miao Zhang, Nan Yin, Quanjun Yin, Li Shen
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2310.05093
Pdf link: https://arxiv.org/pdf/2310.05093
Abstract To address the communication burden and privacy concerns associated with the centralized server in Federated Learning (FL), Decentralized Federated Learning (DFL) has emerged, which discards the server with a peer-to-peer (P2P) communication framework. However, most existing DFL algorithms are based on symmetric topologies, such as ring and grid topologies, which can easily lead to deadlocks and are susceptible to the impact of network link quality in practice. To address these issues, this paper proposes the DFedSGPSM algorithm, which is based on asymmetric topologies and utilizes the Push-Sum protocol to effectively solve consensus optimization problems. To further improve algorithm performance and alleviate local heterogeneous overfitting in Federated Learning (FL), our algorithm combines the Sharpness Aware Minimization (SAM) optimizer and local momentum. The SAM optimizer employs gradient perturbations to generate locally flat models and searches for models with uniformly low loss values, mitigating local heterogeneous overfitting. The local momentum accelerates the optimization process of the SAM optimizer. Theoretical analysis proves that DFedSGPSM achieves a convergence rate of $\mathcal{O}(\frac{1}{\sqrt{T}})$ in a non-convex smooth setting under mild assumptions. This analysis also reveals that better topological connectivity achieves tighter upper bounds. Empirically, extensive experiments are conducted on the MNIST, CIFAR10, and CIFAR100 datasets, demonstrating the superior performance of our algorithm compared to state-of-the-art optimizers.
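The Push-Sum protocol mentioned above is what lets nodes on an asymmetric (directed) topology still agree on an average. The sketch below shows only that averaging primitive on a small directed graph. The graph, the helper names `column_stochastic` and `push_sum`, and the initial values are assumptions for illustration. The SAM optimizer and local momentum used by DFedSGPSM are not modeled.

```python
import numpy as np

def column_stochastic(n, edges):
    """Mixing matrix with P[i, j] = 1 / outdeg(j) for each directed edge j -> i."""
    P = np.zeros((n, n))
    for j, i in edges:
        P[i, j] = 1.0
    return P / P.sum(axis=0, keepdims=True)

def push_sum(P, x0, iters=200):
    """Each node pushes shares of its value x and weight w; the ratio x / w converges to the mean."""
    x = np.asarray(x0, dtype=float).copy()
    w = np.ones_like(x)
    for _ in range(iters):
        x = P @ x
        w = P @ w
    return x / w

n = 4
edges = [(j, j) for j in range(n)]             # self-loops
edges += [(j, (j + 1) % n) for j in range(n)]  # directed ring
edges += [(0, 2)]                              # extra one-way link makes the graph asymmetric
P = column_stochastic(n, edges)

x0 = np.array([1.0, 5.0, -2.0, 8.0])
estimates = push_sum(P, x0)
```

Because `P` is column-stochastic but not row-stochastic, iterating `x = P @ x` alone would drift toward a weighted average. Dividing by the companion weights `w` corrects that bias, so every node's estimate approaches the true mean of `x0`.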
Transferable Availability Poisoning Attacks
Authors: Yiyong Liu, Michael Backes, Xiao Zhang
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.05141
Pdf link: https://arxiv.org/pdf/2310.05141
Abstract We consider availability data poisoning attacks, where an adversary aims to degrade the overall test accuracy of a machine learning model by crafting small perturbations to its training data. Existing poisoning strategies can achieve the attack goal but assume the victim to employ the same learning method as what the adversary uses to mount the attack. In this paper, we argue that this assumption is strong, since the victim may choose any learning algorithm to train the model as long as it can achieve some targeted performance on clean data. Empirically, we observe a large decrease in the effectiveness of prior poisoning attacks if the victim uses a different learning paradigm to train the model and show marked differences in frequency-level characteristics between perturbations generated with respect to different learners and attack methods. To enhance the attack transferability, we propose Transferable Poisoning, which generates high-frequency poisoning perturbations by alternately leveraging the gradient information with two specific algorithms selected from supervised and unsupervised contrastive learning paradigms. Through extensive experiments on benchmark image datasets, we show that our transferable poisoning attack can produce poisoned samples with significantly improved transferability, not only applicable to the two learners used to devise the attack but also for learning algorithms and even paradigms beyond.
In-Context Convergence of Transformers
Authors: Yu Huang, Yuan Cheng, Yingbin Liang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.05249
Pdf link: https://arxiv.org/pdf/2310.05249
Abstract Transformers have recently revolutionized many domains in modern machine learning, and one salient discovery is their remarkable in-context learning capability, where models can solve an unseen task by utilizing task-specific prompts without further parameter fine-tuning. This also inspired recent theoretical studies aiming to understand the in-context learning mechanism of transformers, which, however, focused only on linear transformers. In this work, we take the first step toward studying the learning dynamics of a one-layer transformer with softmax attention trained via gradient descent in order to in-context learn linear function classes. We consider a structured data model, where each token is randomly sampled from a set of feature vectors in either balanced or imbalanced fashion. For data with balanced features, we establish the finite-time convergence guarantee with near-zero prediction error by navigating our analysis over two phases of the training dynamics of the attention map. More notably, for data with imbalanced features, we show that the learning dynamics take a stage-wise convergence process, where the transformer first converges to a near-zero prediction error for the query tokens of dominant features, and then converges later to a near-zero prediction error for the query tokens of under-represented features, respectively via one and four training phases. Our proof features new techniques for analyzing the competing strengths of two types of attention weights, the change of which determines different training phases.
Federated Learning: A Cutting-Edge Survey of the Latest Advancements and Applications
Authors: Azim Akhtarshenas, Mohammad Ali Vahedifar, Navid Ayoobi, Behrouz Maham, Tohid Alizadeh
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2310.05269
Pdf link: https://arxiv.org/pdf/2310.05269
Abstract In the realm of machine learning (ML) systems featuring client-host connections, the enhancement of privacy security can be effectively achieved through federated learning (FL) as a secure distributed ML methodology. FL effectively integrates cloud infrastructure to transfer ML models onto edge servers using blockchain technology. Through this mechanism, it guarantees the streamlined processing and data storage requirements of both centralized and decentralized systems, with an emphasis on scalability, privacy considerations, and cost-effective communication. In current FL implementations, data owners locally train their models, and subsequently upload the outcomes in the form of weights, gradients, and parameters to the cloud for overall model aggregation. This innovation obviates the necessity of engaging Internet of Things (IoT) clients and participants to communicate raw and potentially confidential data directly with a cloud center. This not only reduces the costs associated with communication networks but also enhances the protection of private data. This survey conducts an analysis and comparison of recent FL applications, aiming to assess their efficiency, accuracy, and privacy protection. However, in light of the complex and evolving nature of FL, it becomes evident that additional research is imperative to address lingering knowledge gaps and effectively confront the forthcoming challenges in this field. In this study, we categorize recent literature into the following clusters: privacy protection, resource allocation, case study analysis, and applications. Furthermore, at the end of each section, we tabulate the open areas and future directions presented in the referenced literature, affording researchers and scholars an insightful view of the evolution of the field.
Optimizing Solution-Samplers for Combinatorial Problems: The Landscape of Policy-Gradient Methods
Authors: Constantine Caramanis, Dimitris Fotakis, Alkis Kalavasis, Vasilis Kontonis, Christos Tzamos
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.05309
Pdf link: https://arxiv.org/pdf/2310.05309
Abstract Deep Neural Networks and Reinforcement Learning methods have empirically shown great promise in tackling challenging combinatorial problems. In those methods a deep neural network is used as a solution generator which is then trained by gradient-based methods (e.g., policy gradient) to successively obtain better solution distributions. In this work we introduce a novel theoretical framework for analyzing the effectiveness of such methods. We ask whether there exist generative models that (i) are expressive enough to generate approximately optimal solutions; (ii) have a tractable, i.e., polynomial in the size of the input, number of parameters; (iii) have an optimization landscape that is benign in the sense that it does not contain sub-optimal stationary points. Our main contribution is a positive answer to this question. Our result holds for a broad class of combinatorial problems including Max- and Min-Cut, Max-$k$-CSP, Maximum-Weight-Bipartite-Matching, and the Traveling Salesman Problem. As a byproduct of our analysis we introduce a novel regularization process over vanilla gradient descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
Increasing Entropy to Boost Policy Gradient Performance on Personalization Tasks
Authors: Andrew Starnes, Anton Dereventsov, Clayton Webster
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.05324
Pdf link: https://arxiv.org/pdf/2310.05324
Abstract In this effort, we consider the impact of regularization on the diversity of actions taken by policies generated from reinforcement learning agents trained using a policy gradient. Policy gradient agents are prone to entropy collapse, which means certain actions are seldom, if ever, selected. We augment the optimization objective function for the policy with terms constructed from various $\varphi$-divergences and Maximum Mean Discrepancy, which encourage current policies to follow different state visitation and/or action choice distributions than previously computed policies. We provide numerical experiments using MNIST, CIFAR10, and Spotify datasets. The results demonstrate the advantage of diversity-promoting policy regularization and that its use with gradient-based approaches significantly improves performance on a variety of personalization tasks. Furthermore, numerical evidence is given to show that policy regularization increases performance without losing accuracy.
GradientSurf: Gradient-Domain Neural Surface Reconstruction from RGB Video
Authors: Crane He Chen, Joerg Liebelt
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.05406
Pdf link: https://arxiv.org/pdf/2310.05406
Abstract This paper proposes GradientSurf, a novel algorithm for real time surface reconstruction from monocular RGB video. Inspired by Poisson Surface Reconstruction, the proposed method builds on the tight coupling between surface, volume, and oriented point cloud and solves the reconstruction problem in gradient-domain. Unlike Poisson Surface Reconstruction which finds an offline solution to the Poisson equation by solving a linear system after the scanning process is finished, our method finds online solutions from partial scans with a neural network incrementally where the Poisson layer is designed to supervise both local and global reconstruction. The main challenge that existing methods suffer from when reconstructing from RGB signal is a lack of details in the reconstructed surface. We hypothesize this is due to the spectral bias of neural networks towards learning low frequency geometric features. To address this issue, the reconstruction problem is cast onto gradient domain, where zeroth-order and first-order energies are minimized. The zeroth-order term penalizes location of the surface. The first-order term penalizes the difference between the gradient of reconstructed implicit function and the vector field formulated from oriented point clouds sampled at adaptive local densities. For the task of indoor scene reconstruction, visual and quantitative experimental results show that the proposed method reconstructs surfaces with more details in curved regions and higher fidelity for small objects than previous methods.
RECESS Vaccine for Federated Learning: Proactive Defense Against Model Poisoning Attacks
Authors: Haonan Yan, Wenjing Zhang, Qian Chen, Xiaoguang Li, Wenhai Sun, Hui Li, Xiaodong Lin
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2310.05431
Pdf link: https://arxiv.org/pdf/2310.05431
Abstract Model poisoning attacks greatly jeopardize the application of federated learning (FL). The effectiveness of existing defenses is susceptible to the latest model poisoning attacks, leading to a decrease in prediction accuracy. Besides, these defenses are intractable to distinguish benign outliers from malicious gradients, which further compromises the model generalization. In this work, we propose a novel proactive defense named RECESS against model poisoning attacks. Different from the passive analysis in previous defenses, RECESS proactively queries each participating client with a delicately constructed aggregation gradient, accompanied by the detection of malicious clients according to their responses with higher accuracy. Furthermore, RECESS uses a new trust scoring mechanism to robustly aggregate gradients. Unlike previous methods that score each iteration, RECESS considers clients' performance correlation across multiple iterations to estimate the trust score, substantially increasing fault tolerance. Finally, we extensively evaluate RECESS on typical model architectures and four datasets under various settings. We also evaluated the defensive effectiveness against other types of poisoning attacks, the sensitivity of hyperparameters, and adaptive adversarial attacks. Experimental results show the superiority of RECESS in terms of reducing accuracy loss caused by the latest model poisoning attacks over five classic and two state-of-the-art defenses.
Ensemble-based Hybrid Optimization of Bayesian Neural Networks and Traditional Machine Learning Algorithms
Authors: Peiwen Tan
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.05456
Pdf link: https://arxiv.org/pdf/2310.05456
Abstract This research introduces a novel methodology for optimizing Bayesian Neural Networks (BNNs) by synergistically integrating them with traditional machine learning algorithms such as Random Forests (RF), Gradient Boosting (GB), and Support Vector Machines (SVM). Feature integration solidifies these results by emphasizing the second-order conditions for optimality, including stationarity and positive definiteness of the Hessian matrix. Conversely, hyperparameter tuning indicates a subdued impact in improving Expected Improvement (EI), represented by EI(x). Overall, the ensemble method stands out as a robust, algorithmically optimized approach.
Dynamic Top-k Estimation Consolidates Disagreement between Feature Attribution Methods
Authors: Jonathan Kamp, Lisa Beinborn, Antske Fokkens
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.05619
Pdf link: https://arxiv.org/pdf/2310.05619
Abstract Feature attribution scores are used for explaining the prediction of a text classifier to users by highlighting a k number of tokens. In this work, we propose a way to determine the number of optimal k tokens that should be displayed from sequential properties of the attribution scores. Our approach is dynamic across sentences, method-agnostic, and deals with sentence length bias. We compare agreement between multiple methods and humans on an NLI task, using fixed k and dynamic k. We find that perturbation-based methods and Vanilla Gradient exhibit highest agreement on most method--method and method--human agreement metrics with a static k. Their advantage over other methods disappears with dynamic ks which mainly improve Integrated Gradient and GradientXInput. To our knowledge, this is the first evidence that sequential properties of attribution scores are informative for consolidating attribution signals for human interpretation.
High-order geometric integrators for the local cubic variational Gaussian wavepacket dynamics
Authors: Roya Moghaddasi Fereidani, Jiří JL Vaníček
Subjects: Numerical Analysis (math.NA); Mathematical Physics (math-ph); Quantum Physics (quant-ph)
Arxiv link: https://arxiv.org/abs/2310.05633
Pdf link: https://arxiv.org/pdf/2310.05633
Abstract Gaussian wavepacket dynamics has proven to be a useful semiclassical approximation for quantum simulations of high-dimensional systems with low anharmonicity. Compared to Heller's original local harmonic method, the variational Gaussian wavepacket dynamics is more accurate, but much more difficult to apply in practice because it requires evaluating the expectation values of the potential energy, gradient, and Hessian. If the variational approach is applied to the local cubic approximation of the potential, these expectation values can be evaluated analytically, but still require the costly third derivative of the potential. To reduce the cost of the resulting local cubic variational Gaussian wavepacket dynamics, we describe efficient high-order geometric integrators, which are symplectic, time-reversible, and norm-conserving. For small time steps, they also conserve the effective energy. We demonstrate the efficiency and geometric properties of these integrators numerically on a multi-dimensional, nonseparable coupled Morse potential.
Reinforcement learning for freeform robot design
Authors: Muhan Li, David Matthews, Sam Kriegman
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.05670
Pdf link: https://arxiv.org/pdf/2310.05670
Abstract Inspired by the necessity of morphological adaptation in animals, a growing body of work has attempted to expand robot training to encompass physical aspects of a robot's design. However, reinforcement learning methods capable of optimizing the 3D morphology of a robot have been restricted to reorienting or resizing the limbs of a predetermined and static topological genus. Here we show policy gradients for designing freeform robots with arbitrary external and internal structure. This is achieved through actions that deposit or remove bundles of atomic building blocks to form higher-level nonparametric macrostructures such as appendages, organs and cavities. Although results are provided for open loop control only, we discuss how this method could be adapted for closed loop control and sim2real transfer to physical machines in future.
Making Scalable Meta Learning Practical
Authors: Sang Keun Choe, Sanket Vaibhav Mehta, Hwijeen Ahn, Willie Neiswanger, Pengtao Xie, Emma Strubell, Eric Xing
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.05674
Pdf link: https://arxiv.org/pdf/2310.05674
Abstract Despite its flexibility to learn diverse inductive biases in machine learning programs, meta learning (i.e., learning to learn) has long been recognized to suffer from poor scalability due to its tremendous compute/memory costs, training instability, and a lack of efficient distributed training support. In this work, we focus on making scalable meta learning practical by introducing SAMA, which combines advances in both implicit differentiation algorithms and systems. Specifically, SAMA is designed to flexibly support a broad range of adaptive optimizers in the base level of meta learning programs, while reducing computational burden by avoiding explicit computation of second-order gradient information, and exploiting efficient distributed training techniques implemented for first-order gradients. Evaluated on multiple large-scale meta learning benchmarks, SAMA showcases up to 1.7/4.8x increase in throughput and 2.0/3.8x decrease in memory consumption respectively on single-/multi-GPU setups compared to other baseline meta learning algorithms. Furthermore, we show that SAMA-based data optimization leads to consistent improvements in text classification accuracy with BERT and RoBERTa large language models, and achieves state-of-the-art results in both small- and large-scale data pruning on image classification tasks, demonstrating the practical applicability of scalable meta learning across language and vision domains.
Protecting Sensitive Data through Federated Co-Training
Authors: Amr Abourayya, Jens Kleesiek, Kanishka Rao, Erman Ayday, Bharat Rao, Geoff Webb, Michael Kamp
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.05696
Pdf link: https://arxiv.org/pdf/2310.05696
Abstract In many critical applications, sensitive data is inherently distributed. Federated learning trains a model collaboratively by aggregating the parameters of locally trained models. This avoids exposing sensitive local data. It is possible, though, to infer upon the sensitive data from the shared model parameters. At the same time, many types of machine learning models do not lend themselves to parameter aggregation, such as decision trees, or rule ensembles. It has been observed that in many applications, in particular healthcare, large unlabeled datasets are publicly available. They can be used to exchange information between clients by distributed distillation, i.e., co-regularizing local training via the discrepancy between the soft predictions of each local client on the unlabeled dataset. This, however, still discloses private information and restricts the types of models to those trainable via gradient-based methods. We propose to go one step further and use a form of federated co-training, where local hard labels on the public unlabeled datasets are shared and aggregated into a consensus label. This consensus label can be used for local training by any supervised machine learning model. We show that this federated co-training approach achieves a model quality comparable to both federated learning and distributed distillation on a set of benchmark datasets and real-world medical datasets. It improves privacy over both approaches, protecting against common membership inference attacks to the highest degree. Furthermore, we show that federated co-training can collaboratively train interpretable models, such as decision trees and rule ensembles, achieving a model quality comparable to centralized training.
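The consensus step described in this abstract can be sketched as follows: each client labels the same public unlabeled examples with hard labels, and the labels are aggregated per example. Majority voting with ties broken toward the smallest label is an assumption made here for illustration, and the function name `consensus_labels` is hypothetical; the abstract does not state the aggregation rule.

```python
from collections import Counter


def consensus_labels(client_predictions):
    """Aggregate per-client hard labels into one consensus label per example.

    `client_predictions` is a list with one entry per client; each entry is
    a list of hard labels for the same public unlabeled examples.
    Aggregation rule (an assumption): majority vote, ties broken by the
    smallest label.
    """
    n_examples = len(client_predictions[0])
    consensus = []
    for i in range(n_examples):
        votes = Counter(preds[i] for preds in client_predictions)
        top_count = max(votes.values())
        winners = [label for label, count in votes.items() if count == top_count]
        consensus.append(min(winners))
    return consensus


# Three hypothetical clients labelling three public examples.
predictions = [
    [0, 1, 1],
    [0, 1, 0],
    [1, 1, 0],
]
print(consensus_labels(predictions))
```

Because only hard labels leave each client, any supervised learner (for example a decision tree) can then be trained locally on the public examples paired with the consensus labels.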
An Attribution Method for Siamese Encoders
Authors: Lucas Möller, Dmitry Nikolaev, Sebastian Padó
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.05703
Pdf link: https://arxiv.org/pdf/2310.05703
Abstract Despite the success of Siamese encoder models such as sentence transformers (ST), little is known about the aspects of inputs they pay attention to. A barrier is that their predictions cannot be attributed to individual features, as they compare two inputs rather than processing a single one. This paper derives a local attribution method for Siamese encoders by generalizing the principle of integrated gradients to models with multiple inputs. The solution takes the form of feature-pair attributions, and can be reduced to a token-token matrix for STs. Our method involves the introduction of integrated Jacobians and inherits the advantageous formal properties of integrated gradients: it accounts for the model's full computation graph and is guaranteed to converge to the actual prediction. A pilot study shows that in an ST few token-pairs can often explain large fractions of predictions, and it focuses on nouns and verbs. For accurate predictions, it however needs to attend to the majority of tokens and parts of speech.
Deep Concept Removal
Authors: Yegor Klochkov, Jean-Francois Ton, Ruocheng Guo, Yang Liu, Hang Li
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.05755
Pdf link: https://arxiv.org/pdf/2310.05755
Abstract We address the problem of concept removal in deep neural networks, aiming to learn representations that do not encode certain specified concepts (e.g., gender). We propose a novel method based on adversarial linear classifiers trained on a concept dataset, which helps to remove the targeted attribute while maintaining model performance. Our approach, Deep Concept Removal, incorporates adversarial probing classifiers at various layers of the network, effectively addressing concept entanglement and improving out-of-distribution generalization. We also introduce an implicit gradient-based technique to tackle the challenges associated with adversarial training using linear classifiers. We evaluate the ability to remove a concept on a set of popular distributionally robust optimization (DRO) benchmarks with spurious correlations, as well as out-of-distribution (OOD) generalization tasks.
An operator preconditioning perspective on training in physics-informed machine learning
Authors: Tim De Ryck, Florent Bonnet, Siddhartha Mishra, Emmanuel de Bézenac
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.05801
Pdf link: https://arxiv.org/pdf/2310.05801
Abstract In this paper, we investigate the behavior of gradient descent algorithms in physics-informed machine learning methods like PINNs, which minimize residuals connected to partial differential equations (PDEs). Our key result is that the difficulty in training these models is closely related to the conditioning of a specific differential operator. This operator, in turn, is associated to the Hermitian square of the differential operator of the underlying PDE. If this operator is ill-conditioned, it results in slow or infeasible training. Therefore, preconditioning this operator is crucial. We employ both rigorous mathematical analysis and empirical evaluations to investigate various strategies, explaining how they better condition this critical operator, and consequently improve training.
Making Users Indistinguishable: Attribute-wise Unlearning in Recommender Systems
Authors: Yuyuan Li, Chaochao Chen, Xiaolin Zheng, Yizhao Zhang, Zhongxuan Han, Dan Meng, Jun Wang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2310.05847
Pdf link: https://arxiv.org/pdf/2310.05847
Abstract With the growing privacy concerns in recommender systems, recommendation unlearning, i.e., forgetting the impact of specific learned targets, is getting increasing attention. Existing studies predominantly use training data, i.e., model inputs, as the unlearning target. However, we find that attackers can extract private information, e.g., gender, race, and age, from a trained model even if it has not been explicitly encountered during training. We call this unseen information an attribute and treat it as the unlearning target. To protect the sensitive attributes of users, Attribute Unlearning (AU) aims to degrade attacking performance and make target attributes indistinguishable. In this paper, we focus on a strict but practical setting of AU, namely Post-Training Attribute Unlearning (PoT-AU), where unlearning can only be performed after the training of the recommendation model is completed. To address the PoT-AU problem in recommender systems, we design a two-component loss function that consists of i) distinguishability loss: making attribute labels indistinguishable to attackers, and ii) regularization loss: preventing drastic changes in the model that result in a negative impact on recommendation performance. Specifically, we investigate two types of distinguishability measurements, i.e., user-to-user and distribution-to-distribution. We use the stochastic gradient descent algorithm to optimize our proposed loss. Extensive experiments on three real-world datasets demonstrate the effectiveness of our proposed methods.
DSAC-T: Distributional Soft Actor-Critic with Three Refinements
Authors: Jingliang Duan, Wenxuan Wang, Liming Xiao, Jiaxin Gao, Shengbo Eben Li
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.05858
Pdf link: https://arxiv.org/pdf/2310.05858
Abstract Reinforcement learning (RL) has proven to be highly effective in tackling complex decision-making and control tasks. However, prevalent model-free RL methods often face severe performance degradation due to the well-known overestimation issue. In response to this problem, we recently introduced an off-policy RL algorithm, called distributional soft actor-critic (DSAC or DSAC-v1), which can effectively improve the value estimation accuracy by learning a continuous Gaussian value distribution. Nonetheless, standard DSAC has its own shortcomings, including occasionally unstable learning processes and needs for task-specific reward scaling, which may hinder its overall performance and adaptability in some special tasks. This paper further introduces three important refinements to standard DSAC in order to address these shortcomings. These refinements consist of critic gradient adjusting, twin value distribution learning, and variance-based target return clipping. The modified RL algorithm is named as DSAC with three refinements (DSAC-T or DSAC-v2), and its performances are systematically evaluated on a diverse set of benchmark tasks. Without any task-specific hyperparameter tuning, DSAC-T surpasses a lot of mainstream model-free RL algorithms, including SAC, TD3, DDPG, TRPO, and PPO, in all tested environments. Additionally, DSAC-T, unlike its standard version, ensures a highly stable learning process and delivers similar performance across varying reward scales.
Rephrase, Augment, Reason: Visual Grounding of Questions for Vision-Language Models
Authors: Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.05861
Pdf link: https://arxiv.org/pdf/2310.05861
Abstract An increasing number of vision-language tasks can be handled with little to no training, i.e., in a zero and few-shot manner, by marrying large language models (LLMs) to vision encoders, resulting in large vision-language models (LVLMs). While this has huge upsides, such as not requiring training data or custom architectures, how an input is presented to a LVLM can have a major impact on zero-shot model performance. In particular, inputs phrased in an underspecified way can result in incorrect answers due to factors like missing visual information, complex implicit reasoning, or linguistic ambiguity. Therefore, adding visually grounded information to the input as a preemptive clarification should improve model performance by reducing underspecification, e.g., by localizing objects and disambiguating references. Similarly, in the VQA setting, changing the way questions are framed can make them easier for models to answer. To this end, we present Rephrase, Augment and Reason (RepARe), a gradient-free framework that extracts salient details about the image using the underlying LVLM as a captioner and reasoner, in order to propose modifications to the original question. We then use the LVLM's confidence over a generated answer as an unsupervised scoring function to select the rephrased question most likely to improve zero-shot performance. Focusing on two visual question answering tasks, we show that RepARe can result in a 3.85% (absolute) increase in zero-shot performance on VQAv2 and a 6.41% point increase on A-OKVQA. Additionally, we find that using gold answers for oracle question candidate selection achieves a substantial gain in VQA accuracy by up to 14.41%. Through extensive analysis, we demonstrate that outputs from RepARe increase syntactic complexity, and effectively utilize vision-language interaction and the frozen language model in LVLMs.
Lion Secretly Solves Constrained Optimization: As Lyapunov Predicts
Authors: Lizhang Chen, Bo Liu, Kaizhao Liang, Qiang Liu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Applications (stat.AP); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.05898
Pdf link: https://arxiv.org/pdf/2310.05898
Abstract Lion (Evolved Sign Momentum), a new optimizer discovered through program search, has shown promising results in training large AI models. It performs comparably or favorably to AdamW but with greater memory efficiency. As we can expect from the results of a random search program, Lion incorporates elements from several existing algorithms, including signed momentum, decoupled weight decay, Polak, and Nesterov momentum, but does not fit into any existing category of theoretically grounded optimizers. Thus, even though Lion appears to perform well as a general-purpose optimizer for a wide range of tasks, its theoretical basis remains uncertain. This lack of theoretical clarity limits opportunities to further enhance and expand Lion's efficacy. This work aims to demystify Lion. Based on both continuous-time and discrete-time analysis, we demonstrate that Lion is a theoretically novel and principled approach for minimizing a general loss function $f(x)$ while enforcing a bound constraint $\|x\|_\infty \leq 1/\lambda$. Lion achieves this through the incorporation of decoupled weight decay, where $\lambda$ represents the weight decay coefficient. Our analysis is made possible by the development of a new Lyapunov function for the Lion updates. It applies to a broader family of Lion-$\kappa$ algorithms, where the $\text{sign}(\cdot)$ operator in Lion is replaced by the subgradient of a convex function $\kappa$, leading to the solution of a general composite optimization problem of $\min_x f(x) + \kappa^*(x)$. Our findings provide valuable insights into the dynamics of Lion and pave the way for further improvements and extensions of Lion-related algorithms.
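The bound-constraint claim in this abstract can be checked numerically on a toy problem. The sketch below is a minimal pure-Python version of the commonly published Lion update (sign of an interpolated momentum plus decoupled weight decay); the quadratic objective, step size, and helper names `lion_step` and `run_lion` are assumptions chosen for illustration, not the authors' experimental setup. With weight decay $\lambda = 1$, the unconstrained minimizer at $\pm 5$ lies outside the box $\|x\|_\infty \leq 1/\lambda = 1$, so the iterates should settle at the boundary of the box.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def lion_step(x, m, grad, lr=0.1, beta1=0.9, beta2=0.99, weight_decay=1.0):
    """One Lion update on lists of floats; returns (new_x, new_m)."""
    # Interpolate momentum and gradient, then keep only the sign.
    c = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, grad)]
    # Decoupled weight decay is added to the signed direction.
    new_x = [xi - lr * (sign(ci) + weight_decay * xi) for xi, ci in zip(x, c)]
    # Momentum is refreshed with a second, slower coefficient.
    new_m = [beta2 * mi + (1 - beta2) * gi for mi, gi in zip(m, grad)]
    return new_x, new_m


def run_lion(target, steps=200, weight_decay=1.0):
    """Minimise f(x) = 0.5 * ||x - target||^2 with Lion, starting from 0."""
    x = [0.0] * len(target)
    m = [0.0] * len(target)
    for _ in range(steps):
        grad = [xi - ti for xi, ti in zip(x, target)]
        x, m = lion_step(x, m, grad, weight_decay=weight_decay)
    return x


final_x = run_lion([5.0, -5.0])
print(final_x)
```

In this toy run each coordinate moves toward its target but stalls once $|x_i|$ reaches $1/\lambda$, because at that point the weight-decay term exactly cancels the unit-magnitude sign step.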
Energy Management in a Cooperative Energy Harvesting Wireless Sensor Network
Authors: Arghyadeep Barat, Prabuchandran.K.J, Shalabh Bhatnagar
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.05911
Pdf link: https://arxiv.org/pdf/2310.05911
Abstract In this paper, we consider the problem of finding an optimal energy management policy for a network of sensor nodes capable of harvesting their own energy and sharing it with other nodes in the network. We formulate this problem in the discounted cost Markov decision process framework and obtain good energy-sharing policies using the Deep Deterministic Policy Gradient (DDPG) algorithm. Earlier works have attempted to obtain the optimal energy allocation policy for a single sensor and for multiple sensors arranged on a mote with a single centralized energy buffer. Our algorithms, on the other hand, provide optimal policies for a distributed network of sensors individually harvesting energy and capable of sharing energy amongst themselves. Through simulations, we illustrate that the policies obtained by our DDPG algorithm using this enhanced network model outperform algorithms that do not share energy or use a centralized energy buffer in the distributed multi-nodal case.
Keyword: super-resolution

Learning Many-to-Many Mapping for Unpaired Real-World Image Super-resolution and Downscaling
Authors: Wanjie Sun, Zhenzhong Chen
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2310.04964
Pdf link: https://arxiv.org/pdf/2310.04964
Abstract Learning based single image super-resolution (SISR) for real-world images has been an active research topic yet a challenging task, due to the lack of paired low-resolution (LR) and high-resolution (HR) training images. Most of the existing unsupervised real-world SISR methods adopt a two-stage training strategy by synthesizing realistic LR images from their HR counterparts first, then training the super-resolution (SR) models in a supervised manner. However, the training of image degradation and SR models in this strategy are separate, ignoring the inherent mutual dependency between downscaling and its inverse upscaling process. Additionally, the ill-posed nature of image degradation is not fully considered. In this paper, we propose an image downscaling and SR model dubbed as SDFlow, which simultaneously learns a bidirectional many-to-many mapping between real-world LR and HR images unsupervisedly. The main idea of SDFlow is to decouple image content and degradation information in the latent space, where content information distribution of LR and HR images is matched in a common latent space. Degradation information of the LR images and the high-frequency information of the HR images are fitted to an easy-to-sample conditional distribution. Experimental results on real-world image SR datasets indicate that SDFlow can generate diverse realistic LR and SR images both quantitatively and qualitatively.

New submissions for Tue, 10 Oct 23 #617

Keyword: sgd

Profit: Benchmarking Personalization and Robustness Trade-off in Federated Prompt Tuning

Keyword: optimization

Facilitating Battery Swapping Services for Freight Trucks with Spatial-Temporal Demand Prediction

Leveraging Data Geometry to Mitigate CSM in Steganalysis

EMOFM: Ensemble MLP mOdel with Feature-based Mixers for Click-Through Rate Prediction

A Bi-objective Perspective on Controllable Language Models: Reward Dropout Improves Off-policy Control Performance

Submodular Norms with Applications To Online Facility Location and Stochastic Probing

DragD3D: Vertex-based Editing for Realistic Mesh Deformations using 2D Diffusion Priors

Can pruning make Large Language Models more efficient?

Deep Model Predictive Optimization

KyberMat: Efficient Accelerator for Matrix-Vector Polynomial Multiplication in CRYSTALS-Kyber Scheme via NTT and Polyphase Decomposition

An Experimental Comparison of Methods for Computing the Numerical Radius

Oracle Efficient Algorithms for Groupwise Regret

Hypergraph Analysis Based on a Compatible Tensor Product Structure

Automatic and Efficient Customization of Neural Networks for ML Applications

Understanding and Improving Adversarial Attacks on Latent Diffusion Model

EMO: Earth Mover Distance Optimization for Auto-Regressive Language Modeling

Review of Machine Learning Techniques for Power Electronics Control and Optimization

A Comprehensive Survey on Deep Neural Image Deblurring

Optimal Sequential Decision-Making in Geosteering: A Reinforcement Learning Approach

HI-SLAM: Monocular Real-time Dense Mapping with Hybrid Implicit Fields

Dual Grained Quantization: Efficient Fine-Grained Quantization for LLM

End-to-End Lip Reading in Romanian with Cross-Lingual Domain Adaptation and Lateral Inhibition

Robust Multivariate Detection and Estimation with Fault Frequency Content Information

A Optimal Unequal Error Protection LDPC Coded Recording System

Algorithms for the Ridesharing with Profit Constraint Problem

Compositional Semantics for Open Vocabulary Spatio-semantic Representations

Symmetrical Linguistic Feature Distillation with CLIP for Scene Text Recognition

Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data

FP3O: Enabling Proximal Policy Optimization in Multi-Agent Cooperation with Parameter-Sharing Versatility

Low-Latency Video Conferencing System for Geo-Distributed Data Centers

DialCoT Meets PPO: Decomposing and Exploring Reasoning Paths in Smaller Language Models

Towards Scalable Wireless Federated Learning: Challenges and Solutions

A Privacy-Preserving Trajectory Synthesis Method Based on Vector Translation Invariance Supporting Traffic Constraints

Asymmetrically Decentralized Federated Learning

How Graph Neural Networks Learn: Lessons from Training Dynamics in Function Space

Secure Short-Packet Transmission with Aerial Relaying: Blocklength and Trajectory Co-Design

ZooPFL: Exploring Black-box Foundation Models for Personalized Federated Learning

A Corrected Expected Improvement Acquisition Function Under Noisy Observations

Evolutionary Retrosynthetic Route Planning

Towards Optimizing with Large Language Models

Do Automatic Test Generation Tools Generate Flaky Tests?

Limitations of Stochastic Selection Problems with Pairwise Independent Priors

Time-Varying Soft-Maximum Control Barrier Functions for Safety in an A Priori Unknown Environment

Optimizing Solution-Samplers for Combinatorial Problems: The Landscape of Policy-Gradient Methods

Increasing Entropy to Boost Policy Gradient Performance on Personalization Tasks

Infrared Small Target Detection Using Double-Weighted Multi-Granularity Patch Tensor Model With Tensor-Train Decomposition

Scaling Studies for Efficient Parameter Search and Parallelism for Large Language Model Pre-training

C^2M-DoT: Cross-modal consistent multi-view medical report generation with domain transfer network

Quantum Bayesian Optimization

Distributed Evolution Strategies with Multi-Level Learning for Large-Scale Black-Box Optimization

Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels

Local Structure-Preserving Relaxation Method for Charged Systems on Unstructured Meshes

Waveform Design for MIMO-OFDM Integrated Sensing and Communication System: An Information Theoretical Approach

Cost-Sensitive Best Subset Selection for Logistic Regression: A Mixed-Integer Conic Optimization Perspective

Vibroacoustic Frequency Response Prediction with Query-based Operator Networks

Collective Graph Exploration Parameterized by Vertex Cover

Geometry-Guided Ray Augmentation for Neural Surface Reconstruction with Sparse Views

A Neural Tangent Kernel View on Federated Averaging for Deep Linear Neural Network

AbCD: A Component-wise Adjustable Framework for Dynamic Optimization Problems

One Problem, One Solution: Unifying Robot and Environment Design Optimization

Geometry-Aware Safety-Critical Local Reactive Controller for Robot Navigation in Unknown and Cluttered Environments

EdgeAISim: A Toolkit for Simulation and Modelling of AI Models in Edge Computing Environments

Making Scalable Meta Learning Practical

Climate-sensitive Urban Planning through Optimization of Tree Placements

A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics

Large-Scale OD Matrix Estimation with A Deep Learning Method

Deep Concept Removal

FMM-Head: Enhancing Autoencoder-based ECG anomaly detection with prior knowledge

A Meta-Learning Perspective on Transformers for Causal Language Modeling

Lion Secretly Solves Constrained Optimization: As Lyapunov Predicts

Keyword: adam

Combining UPerNet and ConvNeXt for Contrails Identification to reduce Global Warming

Lion Secretly Solves Constrained Optimization: As Lyapunov Predicts

Keyword: gradient

Training-free Linear Image Inversion via Flows

Generative Diffusion From An Action Principle