Abstract
Table of contents (ToC) extraction centres on structuring documents in a hierarchical manner. In this paper, we propose a new dataset, ESGDoc, comprising 1,093 ESG annual reports from 563 companies spanning from 2001 to 2022. These reports pose significant challenges due to their diverse structures and extensive length. To address these challenges, we propose a new framework for Toc extraction, consisting of three steps: (1) Constructing an initial tree of text blocks based on reading order and font sizes; (2) Modelling each tree node (or text block) independently by considering its contextual information captured in node-centric subtree; (3) Modifying the original tree by taking appropriate action on each tree node (Keep, Delete, or Move). This construction-modelling-modification (CMM) process offers several benefits. It eliminates the need for pairwise modelling of section headings as in previous approaches, making document segmentation practically feasible. By incorporating structured information, each section heading can leverage both local and long-distance context relevant to itself. Experimental results show that our approach outperforms the previous state-of-the-art baseline with a fraction of running time. Our framework proves its scalability by effectively handling documents of any length.
Keyword: optimization
PockEngine: Sparse and Efficient Fine-tuning in a Pocket
Authors: Authors: Ligeng Zhu, Lanxiang Hu, Ji Lin, Wei-Chen Wang, Wei-Ming Chen, Chuang Gan, Song Han
Abstract
On-device learning and efficient fine-tuning enable continuous and privacy-preserving customization (e.g., locally fine-tuning large language models on personalized data). However, existing training frameworks are designed for cloud servers with powerful accelerators (e.g., GPUs, TPUs) and lack the optimizations for learning on the edge, which faces challenges of resource limitations and edge hardware diversity. We introduce PockEngine: a tiny, sparse and efficient engine to enable fine-tuning on various edge devices. PockEngine supports sparse backpropagation: it prunes the backward graph and sparsely updates the model with measured memory saving and latency reduction while maintaining the model quality. Secondly, PockEngine is compilation first: the entire training graph (including forward, backward and optimization steps) is derived at compile-time, which reduces the runtime overhead and brings opportunities for graph transformations. PockEngine also integrates a rich set of training graph optimizations, thus can further accelerate the training cost, including operator reordering and backend switching. PockEngine supports diverse applications, frontends and hardware backends: it flexibly compiles and tunes models defined in PyTorch/TensorFlow/Jax and deploys binaries to mobile CPU/GPU/DSPs. We evaluated PockEngine on both vision models and large language models. PockEngine achieves up to 15 $\times$ speedup over off-the-shelf TensorFlow (Raspberry Pi), 5.6 $\times$ memory saving back-propagation (Jetson AGX Orin). Remarkably, PockEngine enables fine-tuning LLaMav2-7B on NVIDIA Jetson AGX Orin at 550 tokens/s, 7.9$\times$ faster than the PyTorch.
Optimal Guarantees for Algorithmic Reproducibility and Gradient Complexity in Convex Optimization
Authors: Authors: Liang Zhang, Junchi Yang, Amin Karbasi, Niao He
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Abstract
Algorithmic reproducibility measures the deviation in outputs of machine learning algorithms upon minor changes in the training process. Previous work suggests that first-order methods would need to trade-off convergence rate (gradient complexity) for better reproducibility. In this work, we challenge this perception and demonstrate that both optimal reproducibility and near-optimal convergence guarantees can be achieved for smooth convex minimization and smooth convex-concave minimax problems under various error-prone oracle settings. Particularly, given the inexact initialization oracle, our regularization-based algorithms achieve the best of both worlds - optimal reproducibility and near-optimal gradient complexity - for minimization and minimax optimization. With the inexact gradient oracle, the near-optimal guarantees also hold for minimax optimization. Additionally, with the stochastic gradient oracle, we show that stochastic gradient descent ascent is optimal in terms of both reproducibility and gradient complexity. We believe our results contribute to an enhanced understanding of the reproducibility-convergence trade-off in the context of convex optimization.
Learning Optimal Classification Trees Robust to Distribution Shifts
Authors: Authors: Nathan Justin, Sina Aghaei, Andrés Gómez, Phebe Vayanos
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Abstract
We consider the problem of learning classification trees that are robust to distribution shifts between training and testing/deployment data. This problem arises frequently in high stakes settings such as public health and social work where data is often collected using self-reported surveys which are highly sensitive to e.g., the framing of the questions, the time when and place where the survey is conducted, and the level of comfort the interviewee has in sharing information with the interviewer. We propose a method for learning optimal robust classification trees based on mixed-integer robust optimization technology. In particular, we demonstrate that the problem of learning an optimal robust tree can be cast as a single-stage mixed-integer robust optimization problem with a highly nonlinear and discontinuous objective. We reformulate this problem equivalently as a two-stage linear robust optimization problem for which we devise a tailored solution procedure based on constraint generation. We evaluate the performance of our approach on numerous publicly available datasets, and compare the performance to a regularized, non-robust optimal tree. We show an increase of up to 12.48% in worst-case accuracy and of up to 4.85% in average-case accuracy across several datasets and distribution shifts from using our robust solution in comparison to the non-robust one.
Probabilistic Multi-product Trading in Sequential Intraday and Frequency-Regulation Markets
Authors: Authors: Saeed Nordin (1), Abolfazl Khodadadi (1), Priyanka Shinde (1), Evelin Blom (1), Mohammad Reza Hesamzadeh (1), Lennart Söder (1) ((1) KTH Royal Institute of Technology)
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC); Probability (math.PR); Applications (stat.AP)
Abstract
With the increasing integration of power plants into the frequency-regulation markets, the importance of optimal trading has grown substantially. This paper conducts an in-depth analysis of their optimal trading behavior in sequential day-ahead, intraday, and frequency-regulation markets. We introduce a probabilistic multi-product optimization model, derived through a series of transformation techniques. Additionally, we present two reformulations that re-frame the problem as a mixed-integer linear programming problem with uncertain parameters. Various aspects of the model are thoroughly examined to observe the optimal multi-product trading behavior of hydro power plant assets, along with numerous case studies. Leveraging historical data from Nordic electricity markets, we construct realistic scenarios for the uncertain parameters. Furthermore, we then proposed an algorithm based on the No-U-Turn sampler to provide probability distribution functions of cleared prices in frequency-regulation and day-ahead markets. These distribution functions offer valuable statistical insights into temporal price risks for informed multi-product optimal-trading decisions.
Soft Wrist Exosuit Actuated by Fabric Pneumatic Artificial Muscles
Authors: Authors: Katalin Schäffer, Yasemin Ozkan-Aydin, Margaret M. Coad
Abstract
Recently, soft actuator-based exosuits have gained interest, due to their high strength-to-weight ratio, inherent safety, and low cost. We present a novel wrist exosuit actuated by fabric pneumatic artificial muscles that can move the wrist in flexion/extension and ulnar/radial deviation. We derive a model representing the torque exerted by the exosuit and introduce a model-based optimization methodology for the selection of placement parameters of the exosuit muscles. We evaluate the accuracy of the model by measuring the exosuit torques throughout the full range of wrist flexion/extension. When accounting for the displacement of the mounting points, the model predicts the exosuit torque with a mean absolute error of 0.279 Nm, which is 26.1% of the average measured torque. To explore the capabilities of the exosuit to move the human body, we measure its range of motion on a passive human wrist; the exosuit is able to achieve 55.0% of the active biological range in flexion, 69.1% in extension, 68.6% in ulnar deviation, and 68.4% in radial deviation. Finally, we demonstrate the device controlling the passive human wrist to move to a desired orientation in the flexion/extension plane and along a two-degree-of-freedom trajectory.
Dimensionally Homogeneous Jacobian using Extended Selection Matrix for Performance Evaluation and Optimization of Parallel Manipulators
Abstract
This paper proposes a new methodology for deriving a point-based dimensionally homogeneous Jacobian, intended for performance evaluation and optimization of parallel manipulators with mixed degrees of freedom. Optimal manipulator often rely on performance indices obtained from the Jacobian matrix. However, when manipulators exhibit mixed translational and rotational freedoms, the conventional Jacobian's inconsistency of units lead to unbalanced optimal result. Addressing this issue, a point-based dimensionally homogeneous Jacobian has appeared as a prominent solution. However, existing point-based approaches for formulating dimensionally homogeneous Jacobian are applicable to a limited variety of parallel manipulators. Moreover, they are complicated and less intuitive. This paper introduces an extended selection matrix that combines component velocities from different points to describe the entire motion of moving plate. This proposed approach enables us to formulate an intuitive point-based, dimensionally homogeneous Jacobian, which can be applied to a wide variety of constrained parallel manipulators. To prove the validity of proposed method, a numerical example is provided utilizing a four-degree-of-freedom parallel manipulator.
Resource Allocation for Near-Field Communications: Fundamentals, Tools, and Outlooks
Authors: Authors: Bokai Xu, Jiayi Zhang, Hongyang Du, Zhe Wang, Yuanwei Liu, Dusit Niyato, Bo Ai, Khaled B. Letaief
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
Extremely large-scale multiple-input-multiple output (XL-MIMO) is a promising technology to achieve high spectral efficiency (SE) and energy efficiency (EE) in future wireless systems. The larger array aperture of XL-MIMO makes communication scenarios closer to the near-field region. Therefore, near-field resource allocation is essential in realizing the above key performance indicators (KPIs). Moreover, the overall performance of XL-MIMO systems heavily depends on the channel characteristics of the selected users, eliminating interference between users through beamforming, power control, etc. The above resource allocation issue constitutes a complex joint multi-objective optimization problem since many variables and parameters must be optimized, including the spatial degree of freedom, rate, power allocation, and transmission technique. In this article, we review the basic properties of near-field communications and focus on the corresponding "resource allocation" problems. First, we identify available resources in near-field communication systems and highlight their distinctions from far-field communications. Then, we summarize optimization tools, such as numerical techniques and machine learning methods, for addressing near-field resource allocation, emphasizing their strengths and limitations. Finally, several important research directions of near-field communications are pointed out for further investigation.
User Association and Resource Allocation in Large Language Model Based Mobile Edge Computing System over Wireless Communications
Authors: Authors: Liangxin Qian, Jun Zhao
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
In the rapidly evolving landscape of large language models (LLMs) and mobile edge computing, the need for efficient service delivery to mobile users with constrained computational resources has become paramount. Addressing this, our paper delves into a collaborative framework for model training where user data and model adapters are shared with servers to optimize performance. Within this framework, users initially update the first several layers of the adapters while freezing the other layers of them, leveraging their local datasets. Once this step is complete, these partially trained parameters are transmitted to servers. The servers, equipped with more robust computational capabilities, then update the subsequent layers. After this training, they send the enhanced parameters back to the users. This collaborative training approach ensures that mobile users with limited computational capacities can still benefit from advanced LLM services without being burdened by exhaustive computations. Central to our methodology is the DASHF algorithm, which encapsulates the Dinkelbach algorithm, alternating optimization, semidefinite relaxation (SDR), the Hungarian method, and a pioneering fractional programming technique from our recent IEEE JSAC paper "Human-Centric Resource Allocation in the Metaverse over Wireless Communications". The crux of DASHF is its capability to reformulate an optimization problem as Quadratically Constrained Quadratic Programming (QCQP) via meticulously crafted transformations, making it solvable by SDR and the Hungarian algorithm. Through extensive simulations, we demonstrate the effectiveness of the DASHF algorithm, offering significant insights for the advancement of collaborative LLM service deployments.
Machine Learning Infused Distributed Optimization for Coordinating Virtual Power Plant Assets
Authors: Authors: Meiyi Li, Javad Mohammadi
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
Abstract
Amid the increasing interest in the deployment of Distributed Energy Resources (DERs), the Virtual Power Plant (VPP) has emerged as a pivotal tool for aggregating diverse DERs and facilitating their participation in wholesale energy markets. These VPP deployments have been fueled by the Federal Energy Regulatory Commission's Order 2222, which makes DERs and VPPs competitive across market segments. However, the diversity and decentralized nature of DERs present significant challenges to the scalable coordination of VPP assets. To address efficiency and speed bottlenecks, this paper presents a novel machine learning-assisted distributed optimization to coordinate VPP assets. Our method, named LOOP-MAC(Learning to Optimize the Optimization Process for Multi-agent Coordination), adopts a multi-agent coordination perspective where each VPP agent manages multiple DERs and utilizes neural network approximators to expedite the solution search. The LOOP-MAC method employs a gauge map to guarantee strict compliance with local constraints, effectively reducing the need for additional post-processing steps. Our results highlight the advantages of LOOP-MAC, showcasing accelerated solution times per iteration and significantly reduced convergence times. The LOOP-MAC method outperforms conventional centralized and distributed optimization methods in optimization tasks that require repetitive and sequential execution.
Decision-theoretic MPC: Motion Planning with Weighted Maneuver Preferences Under Uncertainty
Authors: Authors: Ömer Şahin Taş, Philipp Heinrich Brusius, Christoph Stiller
Subjects: Robotics (cs.RO); Optimization and Control (math.OC)
Abstract
Continuous optimization based motion planners require deciding on a maneuver homotopy before optimizing the trajectory. Under uncertainty, maneuver intentions of other participants can be unclear, and the vehicle might not be able to decide on the most suitable maneuver. This work introduces a method that incorporates multiple maneuver preferences in planning. It optimizes the trajectory by considering weighted maneuver preferences together with uncertainties ranging from perception to prediction while ensuring the feasibility of a chance-constrained fallback option. Evaluations in both driving experiments and simulation studies show enhanced interaction capabilities and comfort levels compared to conventional planners, which consider only a single maneuver.
Closing the Gap Between the Upper Bound and the Lower Bound of Adam's Iteration Complexity
Abstract
Recently, Arjevani et al. [1] established a lower bound of iteration complexity for the first-order optimization under an $L$-smooth condition and a bounded noise variance assumption. However, a thorough review of existing literature on Adam's convergence reveals a noticeable gap: none of them meet the above lower bound. In this paper, we close the gap by deriving a new convergence guarantee of Adam, with only an $L$-smooth condition and a bounded noise variance assumption. Our results remain valid across a broad spectrum of hyperparameters. Especially with properly chosen hyperparameters, we derive an upper bound of the iteration complexity of Adam and show that it meets the lower bound for first-order optimizers. To the best of our knowledge, this is the first to establish such a tight upper bound for Adam's convergence. Our proof utilizes novel techniques to handle the entanglement between momentum and adaptive learning rate and to convert the first-order term in the Descent Lemma to the gradient norm, which may be of independent interest.
DP-SGD with weight clipping
Authors: Authors: Antoine Barczewski, Jan Ramon
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Abstract
Recently, due to the popularity of deep neural networks and other methods whose training typically relies on the optimization of an objective function, and due to concerns for data privacy, there is a lot of interest in differentially private gradient descent methods. To achieve differential privacy guarantees with a minimum amount of noise, it is important to be able to bound precisely the sensitivity of the information which the participants will observe. In this study, we present a novel approach that mitigates the bias arising from traditional gradient clipping. By leveraging public information concerning the current global model and its location within the search domain, we can achieve improved gradient bounds, leading to enhanced sensitivity determinations and refined noise level adjustments. We extend the state of the art algorithms, present improved differential privacy guarantees requiring less noise and present an empirical evaluation.
Experimental Validation for Distributed Control of Energy Hubs
Authors: Authors: Varsha Behrunani, Philipp Heer, John Lygeros
Subjects: Systems and Control (eess.SY); Multiagent Systems (cs.MA); Optimization and Control (math.OC)
Abstract
As future energy systems become more decentralised due to the integration of renewable energy resources and storage technologies, several autonomous energy management and peer-to-peer trading mechanisms have been recently proposed for the operation of energy hub networks based on optimization and game theory. However, most of these strategies have been tested either only in simulated environments or small prosumer units as opposed to larger energy hubs. This simulation reality gap has hindered large-scale implementation and practical application of these method. In this paper, we aim to experimentally validate the performance of a novel multi-horizon distributed model predictive controller for an energy hub network by implementing the controller on a complete network of hubs comprising of a real energy hub inter-faced with multiple virtual hubs. The experiments are done using two different network topologies and the controller shows promising results in both setups.
Adversarial Anomaly Detection using Gaussian Priors and Nonlinear Anomaly Scores
Authors: Authors: Fiete Lüer, Tobias Weber, Maxim Dolgich, Christian Böhm
Abstract
Anomaly detection in imbalanced datasets is a frequent and crucial problem, especially in the medical domain where retrieving and labeling irregularities is often expensive. By combining the generative stability of a $\beta$-variational autoencoder (VAE) with the discriminative strengths of generative adversarial networks (GANs), we propose a novel model, $\beta$-VAEGAN. We investigate methods for composing anomaly scores based on the discriminative and reconstructive capabilities of our model. Existing work focuses on linear combinations of these components to determine if data is anomalous. We advance existing work by training a kernelized support vector machine (SVM) on the respective error components to also consider nonlinear relationships. This improves anomaly detection performance, while allowing faster optimization. Lastly, we use the deviations from the Gaussian prior of $\beta$-VAEGAN to form a novel anomaly score component. In comparison to state-of-the-art work, we improve the $F_1$ score during anomaly detection from 0.85 to 0.92 on the widely used MITBIH Arrhythmia Database.
End-to-end Video Gaze Estimation via Capturing Head-face-eye Spatial-temporal Interaction Context
Abstract
In this letter, we propose a new method, Multi-Clue Gaze (MCGaze), to facilitate video gaze estimation via capturing spatial-temporal interaction context among head, face, and eye in an end-to-end learning way, which has not been well concerned yet. The main advantage of MCGaze is that the tasks of clue localization of head, face, and eye can be solved jointly for gaze estimation in a one-step way, with joint optimization to seek optimal performance. During this, spatial-temporal context exchange happens among the clues on the head, face, and eye. Accordingly, the final gazes obtained by fusing features from various queries can be aware of global clues from heads and faces, and local clues from eyes simultaneously, which essentially leverages performance. Meanwhile, the one-step running way also ensures high running efficiency. Experiments on the challenging Gaze360 dataset verify the superiority of our proposition. The source code will be released at https://github.com/zgchen33/MCGaze.
Improving Intrinsic Exploration by Creating Stationary Objectives
Authors: Authors: Roger Creus Castanyer, Joshua Romoff, Glen Berseth
Abstract
Exploration bonuses in reinforcement learning guide long-horizon exploration by defining custom intrinsic objectives. Count-based methods use the frequency of state visits to derive an exploration bonus. In this paper, we identify that any intrinsic reward function derived from count-based methods is non-stationary and hence induces a difficult objective to optimize for the agent. The key contribution of our work lies in transforming the original non-stationary rewards into stationary rewards through an augmented state representation. For this purpose, we introduce the Stationary Objectives For Exploration (SOFE) framework. SOFE requires identifying sufficient statistics for different exploration bonuses and finding an efficient encoding of these statistics to use as input to a deep network. SOFE is based on proposing state augmentations that expand the state space but hold the promise of simplifying the optimization of the agent's objective. Our experiments show that SOFE improves the agents' performance in challenging exploration problems, including sparse-reward tasks, pixel-based observations, 3D navigation, and procedurally generated environments.
Distributed Delay-Tolerant Strategies for Equality-Constraint Sum-Preserving Resource Allocation
Authors: Authors: Mohammadreza Doostmohammadian, Alireza Aghasi, Maria Vrakopoulou, Hamid R. Rabiee, Usman A. Khan, Themistoklis Charalambou
Subjects: Systems and Control (eess.SY); Multiagent Systems (cs.MA); Optimization and Control (math.OC)
Abstract
This paper proposes two nonlinear dynamics to solve constrained distributed optimization problem for resource allocation over a multi-agent network. In this setup, coupling constraint refers to resource-demand balance which is preserved at all-times. The proposed solutions can address various model nonlinearities, for example, due to quantization and/or saturation. Further, it allows to reach faster convergence or to robustify the solution against impulsive noise or uncertainties. We prove convergence over weakly connected networks using convex analysis and Lyapunov theory. Our findings show that convergence can be reached for general sign-preserving odd nonlinearity. We further propose delay-tolerant mechanisms to handle general bounded heterogeneous time-varying delays over the communication network of agents while preserving all-time feasibility. This work finds application in CPU scheduling and coverage control among others. This paper advances the state-of-the-art by addressing (i) possible nonlinearity on the agents/links, meanwhile handling (ii) resource-demand feasibility at all times, (iii) uniform-connectivity instead of all-time connectivity, and (iv) possible heterogeneous and time-varying delays. To our best knowledge, no existing work addresses contributions (i)-(iv) altogether. Simulations and comparative analysis are provided to corroborate our contributions.
Generative AI Model for Artistic Style Transfer Using Convolutional Neural Networks
Authors: Authors: Jonayet Miah, Duc M Cao, Md Abu Sayed, Md. Sabbirul Haque
Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
Abstract
Artistic style transfer, a captivating application of generative artificial intelligence, involves fusing the content of one image with the artistic style of another to create unique visual compositions. This paper presents a comprehensive overview of a novel technique for style transfer using Convolutional Neural Networks (CNNs). By leveraging deep image representations learned by CNNs, we demonstrate how to separate and manipulate image content and style, enabling the synthesis of high-quality images that combine content and style in a harmonious manner. We describe the methodology, including content and style representations, loss computation, and optimization, and showcase experimental results highlighting the effectiveness and versatility of the approach across different styles and content
FOUND: Foot Optimization with Uncertain Normals for Surface Deformation Using Synthetic Data
Authors: Authors: Oliver Boyne, Gwangbin Bae, James Charles, Roberto Cipolla
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Surface reconstruction from multi-view images is a challenging task, with solutions often requiring a large number of sampled images with high overlap. We seek to develop a method for few-view reconstruction, for the case of the human foot. To solve this task, we must extract rich geometric cues from RGB images, before carefully fusing them into a final 3D object. Our FOUND approach tackles this, with 4 main contributions: (i) SynFoot, a synthetic dataset of 50,000 photorealistic foot images, paired with ground truth surface normals and keypoints; (ii) an uncertainty-aware surface normal predictor trained on our synthetic dataset; (iii) an optimization scheme for fitting a generative foot model to a series of images; and (iv) a benchmark dataset of calibrated images and high resolution ground truth geometry. We show that our normal predictor outperforms all off-the-shelf equivalents significantly on real images, and our optimization scheme outperforms state-of-the-art photogrammetry pipelines, especially for a few-view setting. We release our synthetic dataset and baseline 3D scans to the research community.
Sustainable Concrete via Bayesian Optimization
Authors: Authors: Sebastian Ament, Andrew Witte, Nishant Garg, Julius Kusuma
Subjects: Machine Learning (cs.LG); Physics and Society (physics.soc-ph)
Abstract
Eight percent of global carbon dioxide emissions can be attributed to the production of cement, the main component of concrete, which is also the dominant source of CO2 emissions in the construction of data centers. The discovery of lower-carbon concrete formulae is therefore of high significance for sustainability. However, experimenting with new concrete formulae is time consuming and labor intensive, as one usually has to wait to record the concrete's 28-day compressive strength, a quantity whose measurement can by its definition not be accelerated. This provides an opportunity for experimental design methodology like Bayesian Optimization (BO) to accelerate the search for strong and sustainable concrete formulae. Herein, we 1) propose modeling steps that make concrete strength amenable to be predicted accurately by a Gaussian process model with relatively few measurements, 2) formulate the search for sustainable concrete as a multi-objective optimization problem, and 3) leverage the proposed model to carry out multi-objective BO with real-world strength measurements of the algorithmically proposed mixes. Our experimental results show improved trade-offs between the mixtures' global warming potential (GWP) and their associated compressive strengths, compared to mixes based on current industry practices.
Interactive Motion Planning for Autonomous Vehicles with Joint Optimization
Authors: Authors: Yuxiao Chen, Sushant Veer, Peter Karkus, Marco Pavone
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Abstract
In highly interactive driving scenarios, the actions of one agent greatly influences those of its neighbors. Planning safe motions for autonomous vehicles in such interactive environments, therefore, requires reasoning about the impact of the ego's intended motion plan on nearby agents' behavior. Deep-learning-based models have recently achieved great success in trajectory prediction and many models in the literature allow for ego-conditioned prediction. However, leveraging ego-conditioned prediction remains challenging in downstream planning due to the complex nature of neural networks, limiting the planner structure to simple ones, e.g., sampling-based planner. Despite their ability to generate fine-grained high-quality motion plans, it is difficult for gradient-based planning algorithms, such as model predictive control (MPC), to leverage ego-conditioned prediction due to their iterative nature and need for gradient. We present Interactive Joint Planning (IJP) that bridges MPC with learned prediction models in a computationally scalable manner to provide us the best of both the worlds. In particular, IJP jointly optimizes over the behavior of the ego and the surrounding agents and leverages deep-learned prediction models as prediction priors that the join trajectory optimization tries to stay close to. Furthermore, by leveraging homotopy classes, our joint optimizer searches over diverse motion plans to avoid getting stuck at local minima. Closed-loop simulation result shows that IJP significantly outperforms the baselines that are either without joint optimization or running sampling-based planning.
Keyword: adam
Closing the Gap Between the Upper Bound and the Lower Bound of Adam's Iteration Complexity
Abstract
Recently, Arjevani et al. [1] established a lower bound of iteration complexity for the first-order optimization under an $L$-smooth condition and a bounded noise variance assumption. However, a thorough review of existing literature on Adam's convergence reveals a noticeable gap: none of them meet the above lower bound. In this paper, we close the gap by deriving a new convergence guarantee of Adam, with only an $L$-smooth condition and a bounded noise variance assumption. Our results remain valid across a broad spectrum of hyperparameters. Especially with properly chosen hyperparameters, we derive an upper bound of the iteration complexity of Adam and show that it meets the lower bound for first-order optimizers. To the best of our knowledge, this is the first to establish such a tight upper bound for Adam's convergence. Our proof utilizes novel techniques to handle the entanglement between momentum and adaptive learning rate and to convert the first-order term in the Descent Lemma to the gradient norm, which may be of independent interest.
Is Scaling Learned Optimizers Worth It? Evaluating The Value of VeLO's 4000 TPU Months
Authors: Authors: Fady Rezk, Antreas Antoniou, Henry Gouk, Timothy Hospedales
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
Abstract
We analyze VeLO (versatile learned optimizer), the largest scale attempt to train a general purpose "foundational" optimizer to date. VeLO was trained on thousands of machine learning tasks using over 4000 TPU months with the goal of producing an optimizer capable of generalizing to new problems while being hyperparameter free, and outperforming industry standards such as Adam. We independently evaluate VeLO on the MLCommons optimizer benchmark suite. We find that, contrary to initial claims: (1) VeLO has a critical hyperparameter that needs problem-specific tuning, (2) VeLO does not necessarily outperform competitors in quality of solution found, and (3) VeLO is not faster than competing optimizers at reducing the training loss. These observations call into question VeLO's generality and the value of the investment in training it.
Keyword: gradient
Fast Scalable and Accurate Discovery of DAGs Using the Best Order Score Search and Grow-Shrink Trees
Authors: Authors: Bryan Andrews, Joseph Ramsey, Ruben Sanchez-Romero, Jazmin Camchong, Erich Kummerfeld
Abstract
Learning graphical conditional independence structures is an important machine learning problem and a cornerstone of causal discovery. However, the accuracy and execution time of learning algorithms generally struggle to scale to problems with hundreds of highly connected variables -- for instance, recovering brain networks from fMRI data. We introduce the best order score search (BOSS) and grow-shrink trees (GSTs) for learning directed acyclic graphs (DAGs) in this paradigm. BOSS greedily searches over permutations of variables, using GSTs to construct and score DAGs from permutations. GSTs efficiently cache scores to eliminate redundant calculations. BOSS achieves state-of-the-art performance in accuracy and execution time, comparing favorably to a variety of combinatorial and gradient-based learning algorithms under a broad range of conditions. To demonstrate its practicality, we apply BOSS to two sets of resting-state fMRI data: simulated data with pseudo-empirical noise distributions derived from randomized empirical fMRI cortical signals and clinical data from 3T fMRI scans processed into cortical parcels. BOSS is available for use within the TETRAD project which includes Python and R wrappers.
Optimal Guarantees for Algorithmic Reproducibility and Gradient Complexity in Convex Optimization
Authors: Authors: Liang Zhang, Junchi Yang, Amin Karbasi, Niao He
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Abstract
Algorithmic reproducibility measures the deviation in outputs of machine learning algorithms upon minor changes in the training process. Previous work suggests that first-order methods would need to trade-off convergence rate (gradient complexity) for better reproducibility. In this work, we challenge this perception and demonstrate that both optimal reproducibility and near-optimal convergence guarantees can be achieved for smooth convex minimization and smooth convex-concave minimax problems under various error-prone oracle settings. Particularly, given the inexact initialization oracle, our regularization-based algorithms achieve the best of both worlds - optimal reproducibility and near-optimal gradient complexity - for minimization and minimax optimization. With the inexact gradient oracle, the near-optimal guarantees also hold for minimax optimization. Additionally, with the stochastic gradient oracle, we show that stochastic gradient descent ascent is optimal in terms of both reproducibility and gradient complexity. We believe our results contribute to an enhanced understanding of the reproducibility-convergence trade-off in the context of convex optimization.
Improving the Knowledge Gradient Algorithm
Authors: Authors: Yang Le, Gao Siyang, Ho Chin Pang
Abstract
The knowledge gradient (KG) algorithm is a popular policy for the best arm identification (BAI) problem. It is built on the simple idea of always choosing the measurement that yields the greatest expected one-step improvement in the estimate of the best mean of the arms. In this research, we show that this policy has limitations, causing the algorithm not asymptotically optimal. We next provide a remedy for it, by following the manner of one-step look ahead of KG, but instead choosing the measurement that yields the greatest one-step improvement in the probability of selecting the best arm. The new policy is called improved knowledge gradient (iKG). iKG can be shown to be asymptotically optimal. In addition, we show that compared to KG, it is easier to extend iKG to variant problems of BAI, with the $\epsilon$-good arm identification and feasible arm identification as two examples. The superior performances of iKG on these problems are further demonstrated using numerical examples.
Understanding Parameter Saliency via Extreme Value Theory
Authors: Authors: Shuo Wang, Issei Sato
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Deep neural networks are being increasingly implemented throughout society in recent years. It is useful to identify which parameters trigger misclassification in diagnosing undesirable model behaviors. The concept of parameter saliency is proposed and used to diagnose convolutional neural networks (CNNs) by ranking convolution filters that may have caused misclassification on the basis of parameter saliency. It is also shown that fine-tuning the top ranking salient filters has efficiently corrected misidentification on ImageNet. However, there is still a knowledge gap in terms of understanding why parameter saliency ranking can find the filters inducing misidentification. In this work, we attempt to bridge the gap by analyzing parameter saliency ranking from a statistical viewpoint, namely, extreme value theory. We first show that the existing work implicitly assumes that the gradient norm computed for each filter follows a normal distribution. Then, we clarify the relationship between parameter saliency and the score based on the peaks-over-threshold (POT) method, which is often used to model extreme values. Finally, we reformulate parameter saliency in terms of the POT method, where this reformulation is regarded as statistical anomaly detection and does not require the implicit assumptions of the existing parameter-saliency formulation. Our experimental results demonstrate that our reformulation can detect malicious filters as well. Furthermore, we show that the existing parameter saliency method exhibits a bias against the depth of layers in deep neural networks. In particular, this bias has the potential to inhibit the discovery of filters that cause misidentification in situations where domain shift occurs. In contrast, parameter saliency based on POT shows less of this bias.
Closing the Gap Between the Upper Bound and the Lower Bound of Adam's Iteration Complexity
Abstract
Recently, Arjevani et al. [1] established a lower bound of iteration complexity for the first-order optimization under an $L$-smooth condition and a bounded noise variance assumption. However, a thorough review of existing literature on Adam's convergence reveals a noticeable gap: none of them meet the above lower bound. In this paper, we close the gap by deriving a new convergence guarantee of Adam, with only an $L$-smooth condition and a bounded noise variance assumption. Our results remain valid across a broad spectrum of hyperparameters. Especially with properly chosen hyperparameters, we derive an upper bound of the iteration complexity of Adam and show that it meets the lower bound for first-order optimizers. To the best of our knowledge, this is the first to establish such a tight upper bound for Adam's convergence. Our proof utilizes novel techniques to handle the entanglement between momentum and adaptive learning rate and to convert the first-order term in the Descent Lemma to the gradient norm, which may be of independent interest.
DP-SGD with weight clipping
Authors: Authors: Antoine Barczewski, Jan Ramon
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Abstract
Recently, due to the popularity of deep neural networks and other methods whose training typically relies on the optimization of an objective function, and due to concerns for data privacy, there is a lot of interest in differentially private gradient descent methods. To achieve differential privacy guarantees with a minimum amount of noise, it is important to be able to bound precisely the sensitivity of the information which the participants will observe. In this study, we present a novel approach that mitigates the bias arising from traditional gradient clipping. By leveraging public information concerning the current global model and its location within the search domain, we can achieve improved gradient bounds, leading to enhanced sensitivity determinations and refined noise level adjustments. We extend the state of the art algorithms, present improved differential privacy guarantees requiring less noise and present an empirical evaluation.
Nitsche's prescription of Dirichlet conditions in the finite element approximation of Maxwell's problem
Abstract
In this paper we consider the finite element approximation of Maxwell's problem and analyse the prescription of essential boundary conditions in a weak sense using Nitsche's method. To avoid indefiniteness of the problem, the original equations are augmented with the gradient of a scalar field that allows one to impose the zero divergence of the magnetic induction, even if the exact solution for this scalar field is zero. Two finite element approximations are considered, namely, one in which the approximation spaces are assumed to satisfy the appropriate inf-sup condition that render the standard Galerkin method stable, and another augmented and stabilised one that permits the use of finite element interpolations of arbitrary order. Stability and convergence results are provided for the two finite element formulations considered.
Sample Complexity Bounds for Score-Matching: Causal Discovery and Generative Modeling
Authors: Authors: Zhenyu Zhu, Francesco Locatello, Volkan Cevher
Abstract
This paper provides statistical sample complexity bounds for score-matching and its applications in causal discovery. We demonstrate that accurate estimation of the score function is achievable by training a standard deep ReLU neural network using stochastic gradient descent. We establish bounds on the error rate of recovering causal relationships using the score-matching-based causal discovery method of Rolland et al. [2022], assuming a sufficiently good estimation of the score function. Finally, we analyze the upper bound of score-matching estimation within the score-based generative modeling, which has been applied for causal discovery but is also of independent interest within the domain of generative models.
Interactive Motion Planning for Autonomous Vehicles with Joint Optimization
Authors: Authors: Yuxiao Chen, Sushant Veer, Peter Karkus, Marco Pavone
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Abstract
In highly interactive driving scenarios, the actions of one agent greatly influences those of its neighbors. Planning safe motions for autonomous vehicles in such interactive environments, therefore, requires reasoning about the impact of the ego's intended motion plan on nearby agents' behavior. Deep-learning-based models have recently achieved great success in trajectory prediction and many models in the literature allow for ego-conditioned prediction. However, leveraging ego-conditioned prediction remains challenging in downstream planning due to the complex nature of neural networks, limiting the planner structure to simple ones, e.g., sampling-based planner. Despite their ability to generate fine-grained high-quality motion plans, it is difficult for gradient-based planning algorithms, such as model predictive control (MPC), to leverage ego-conditioned prediction due to their iterative nature and need for gradient. We present Interactive Joint Planning (IJP) that bridges MPC with learned prediction models in a computationally scalable manner to provide us the best of both the worlds. In particular, IJP jointly optimizes over the behavior of the ego and the surrounding agents and leverages deep-learned prediction models as prediction priors that the join trajectory optimization tries to stay close to. Furthermore, by leveraging homotopy classes, our joint optimizer searches over diverse motion plans to avoid getting stuck at local minima. Closed-loop simulation result shows that IJP significantly outperforms the baselines that are either without joint optimization or running sampling-based planning.
Abstract
In this paper, we explore FP8 low-bit data formats for efficient training of large language models (LLMs). Our key insight is that most variables, such as gradients and optimizer states, in LLM training can employ low-precision data formats without compromising model accuracy and requiring no changes to hyper-parameters. Specifically, we propose a new FP8 automatic mixed-precision framework for training LLMs. This framework offers three levels of FP8 utilization to streamline mixed-precision and distributed parallel training for LLMs. It gradually incorporates 8-bit gradients, optimizer states, and distributed learning in an incremental manner. Experiment results show that, during the training of GPT-175B model on H100 GPU platform, our FP8 mixed-precision training framework not only achieved a remarkable 42% reduction in real memory usage but also ran 64% faster than the widely adopted BF16 framework (i.e., Megatron-LM), surpassing the speed of Nvidia Transformer Engine by 17%. This largely reduces the training costs for large foundation models. Furthermore, our FP8 mixed-precision training methodology is generic. It can be seamlessly applied to other tasks such as LLM instruction tuning and reinforcement learning with human feedback, offering savings in fine-tuning expenses. Our FP8 low-precision training framework is open-sourced at {https://github.com/Azure/MS-AMP}{aka.ms/MS.AMP}.
Keyword: sgd
A Scalable Framework for Table of Contents Extraction from Complex ESG Annual Reports
Keyword: optimization
PockEngine: Sparse and Efficient Fine-tuning in a Pocket
Optimal Guarantees for Algorithmic Reproducibility and Gradient Complexity in Convex Optimization
Learning Optimal Classification Trees Robust to Distribution Shifts
Probabilistic Multi-product Trading in Sequential Intraday and Frequency-Regulation Markets
Soft Wrist Exosuit Actuated by Fabric Pneumatic Artificial Muscles
Dimensionally Homogeneous Jacobian using Extended Selection Matrix for Performance Evaluation and Optimization of Parallel Manipulators
Resource Allocation for Near-Field Communications: Fundamentals, Tools, and Outlooks
User Association and Resource Allocation in Large Language Model Based Mobile Edge Computing System over Wireless Communications
Machine Learning Infused Distributed Optimization for Coordinating Virtual Power Plant Assets
Decision-theoretic MPC: Motion Planning with Weighted Maneuver Preferences Under Uncertainty
Closing the Gap Between the Upper Bound and the Lower Bound of Adam's Iteration Complexity
DP-SGD with weight clipping
Experimental Validation for Distributed Control of Energy Hubs
Adversarial Anomaly Detection using Gaussian Priors and Nonlinear Anomaly Scores
End-to-end Video Gaze Estimation via Capturing Head-face-eye Spatial-temporal Interaction Context
Improving Intrinsic Exploration by Creating Stationary Objectives
Distributed Delay-Tolerant Strategies for Equality-Constraint Sum-Preserving Resource Allocation
Generative AI Model for Artistic Style Transfer Using Convolutional Neural Networks
FOUND: Foot Optimization with Uncertain Normals for Surface Deformation Using Synthetic Data
Sustainable Concrete via Bayesian Optimization
Interactive Motion Planning for Autonomous Vehicles with Joint Optimization
Keyword: adam
Closing the Gap Between the Upper Bound and the Lower Bound of Adam's Iteration Complexity
Is Scaling Learned Optimizers Worth It? Evaluating The Value of VeLO's 4000 TPU Months
Keyword: gradient
Fast Scalable and Accurate Discovery of DAGs Using the Best Order Score Search and Grow-Shrink Trees
Optimal Guarantees for Algorithmic Reproducibility and Gradient Complexity in Convex Optimization
Improving the Knowledge Gradient Algorithm
Understanding Parameter Saliency via Extreme Value Theory
Closing the Gap Between the Upper Bound and the Lower Bound of Adam's Iteration Complexity
DP-SGD with weight clipping
Nitsche's prescription of Dirichlet conditions in the finite element approximation of Maxwell's problem
Sample Complexity Bounds for Score-Matching: Causal Discovery and Generative Modeling
Interactive Motion Planning for Autonomous Vehicles with Joint Optimization
FP8-LM: Training FP8 Large Language Models
Keyword: super-resolution
There is no result