Abstract
A recent line of empirical studies has demonstrated that SGD might exhibit a heavy-tailed behavior in practical settings, and the heaviness of the tails might correlate with the overall performance. In this paper, we investigate the emergence of such heavy tails. Previous works on this problem only considered, up to our knowledge, online (also called single-pass) SGD, in which the emergence of heavy tails in theoretical findings is contingent upon access to an infinite amount of data. Hence, the underlying mechanism generating the reported heavy-tailed behavior in practical settings, where the amount of training data is finite, is still not well-understood. Our contribution aims to fill this gap. In particular, we show that the stationary distribution of offline (also called multi-pass) SGD exhibits 'approximate' power-law tails and the approximation error is controlled by how fast the empirical distribution of the training data converges to the true underlying data distribution in the Wasserstein metric. Our main takeaway is that, as the number of data points increases, offline SGD will behave increasingly 'power-law-like'. To achieve this result, we first prove nonasymptotic Wasserstein convergence bounds for offline SGD to online SGD as the number of data points increases, which can be interesting on their own. Finally, we illustrate our theory on various experiments conducted on synthetic data and neural networks.
Linear Mode Connectivity in Sparse Neural Networks
Abstract
With the rise in interest of sparse neural networks, we study how neural network pruning with synthetic data leads to sparse networks with unique training properties. We find that distilled data, a synthetic summarization of the real data, paired with Iterative Magnitude Pruning (IMP) unveils a new class of sparse networks that are more stable to SGD noise on the real data, than either the dense model, or subnetworks found with real data in IMP. That is, synthetically chosen subnetworks often train to the same minima, or exhibit linear mode connectivity. We study this through linear interpolation, loss landscape visualizations, and measuring the diagonal of the hessian. While dataset distillation as a field is still young, we find that these properties lead to synthetic subnetworks matching the performance of traditional IMP with up to 150x less training points in settings where distilled data applies.
High-probability Convergence Bounds for Nonlinear Stochastic Gradient Descent Under Heavy-tailed Noise
Authors: Authors: Aleksandar Armacki, Pranay Sharma, Gauri Joshi, Dragana Bajovic, Dusan Jakovetic, Soummya Kar
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Statistics Theory (math.ST); Machine Learning (stat.ML)
Abstract
Several recent works have studied the convergence \textit{in high probability} of stochastic gradient descent (SGD) and its clipped variant. Compared to vanilla SGD, clipped SGD is practically more stable and has the additional theoretical benefit of logarithmic dependence on the failure probability. However, the convergence of other practical nonlinear variants of SGD, e.g., sign SGD, quantized SGD and normalized SGD, that achieve improved communication efficiency or accelerated convergence is much less understood. In this work, we study the convergence bounds \textit{in high probability} of a broad class of nonlinear SGD methods. For strongly convex loss functions with Lipschitz continuous gradients, we prove a logarithmic dependence on the failure probability, even when the noise is heavy-tailed. Strictly more general than the results for clipped SGD, our results hold for any nonlinearity with bounded (component-wise or joint) outputs, such as clipping, normalization, and quantization. Further, existing results with heavy-tailed noise assume bounded $\eta$-th central moments, with $\eta \in (1,2]$. In contrast, our refined analysis works even for $\eta=1$, strictly relaxing the noise moment assumptions in the literature.
Escaping Saddle Points in Heterogeneous Federated Learning via Distributed SGD with Communication Compression
Authors: Authors: Sijin Chen, Zhize Li, Yuejie Chi
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)
Abstract
We consider the problem of finding second-order stationary points of heterogeneous federated learning (FL). Previous works in FL mostly focus on first-order convergence guarantees, which do not rule out the scenario of unstable saddle points. Meanwhile, it is a key bottleneck of FL to achieve communication efficiency without compensating the learning accuracy, especially when local data are highly heterogeneous across different clients. Given this, we propose a novel algorithm Power-EF that only communicates compressed information via a novel error-feedback scheme. To our knowledge, Power-EF is the first distributed and compressed SGD algorithm that provably escapes saddle points in heterogeneous FL without any data homogeneity assumptions. In particular, Power-EF improves to second-order stationary points after visiting first-order (possibly saddle) points, using additional gradient queries and communication rounds only of almost the same order required by first-order convergence, and the convergence rate exhibits a linear speedup in terms of the number of workers. Our theory improves/recovers previous results, while extending to much more tolerant settings on the local data. Numerical experiments are provided to complement the theory.
Keyword: optimization
Unveiling Energy Efficiency in Deep Learning: Measurement, Prediction, and Scoring across Edge Devices
Abstract
Today, deep learning optimization is primarily driven by research focused on achieving high inference accuracy and reducing latency. However, the energy efficiency aspect is often overlooked, possibly due to a lack of sustainability mindset in the field and the absence of a holistic energy dataset. In this paper, we conduct a threefold study, including energy measurement, prediction, and efficiency scoring, with an objective to foster transparency in power and energy consumption within deep learning across various edge devices. Firstly, we present a detailed, first-of-its-kind measurement study that uncovers the energy consumption characteristics of on-device deep learning. This study results in the creation of three extensive energy datasets for edge devices, covering a wide range of kernels, state-of-the-art DNN models, and popular AI applications. Secondly, we design and implement the first kernel-level energy predictors for edge devices based on our kernel-level energy dataset. Evaluation results demonstrate the ability of our predictors to provide consistent and accurate energy estimations on unseen DNN models. Lastly, we introduce two scoring metrics, PCS and IECS, developed to convert complex power and energy consumption data of an edge device into an easily understandable manner for edge device end-users. We hope our work can help shift the mindset of both end-users and the research community towards sustainability in edge computing, a principle that drives our research. Find data, code, and more up-to-date information at https://amai-gsu.github.io/DeepEn2023.
Small Language Models Fine-tuned to Coordinate Larger Language Models improve Complex Reasoning
Abstract
Large Language Models (LLMs) prompted to generate chain-of-thought (CoT) exhibit impressive reasoning capabilities. Recent attempts at prompt decomposition toward solving complex, multi-step reasoning problems depend on the ability of the LLM to simultaneously decompose and solve the problem. A significant disadvantage is that foundational LLMs are typically not available for fine-tuning, making adaptation computationally prohibitive. We believe (and demonstrate) that problem decomposition and solution generation are distinct capabilites, better addressed in separate modules, than by one monolithic LLM. We introduce DaSLaM, which uses a decomposition generator to decompose complex problems into subproblems that require fewer reasoning steps. These subproblems are answered by a solver. We use a relatively small (13B parameters) LM as the decomposition generator, which we train using policy gradient optimization to interact with a solver LM (regarded as black-box) and guide it through subproblems, thereby rendering our method solver-agnostic. Evaluation on multiple different reasoning datasets reveal that with our method, a 175 billion parameter LM (text-davinci-003) can produce competitive or even better performance, compared to its orders-of-magnitude larger successor, GPT-4. Additionally, we show that DaSLaM is not limited by the solver's capabilities as a function of scale; e.g., solver LMs with diverse sizes give significant performance improvement with our solver-agnostic decomposition technique. Exhaustive ablation studies evince the superiority of our modular finetuning technique over exorbitantly large decomposer LLMs, based on prompting alone.
Middle-mile optimization for next-day delivery
Authors: Authors: Konstantinos Benidis, Georgios Paschos, Martin Gross, George Iosifidis
Abstract
We consider an e-commerce retailer operating a supply chain that consists of middle- and last-mile transportation, and study its ability to deliver products stored in warehouses within a day from customer's order time. Successful next-day delivery requires inventory availability and timely truck schedules in the middle-mile and in this paper we assume a fixed inventory position and focus on optimizing the middle-mile. We formulate a novel optimization problem which decides the departure of the last middle-mile truck at each (potential) network connection in order to maximize the number of next-day deliveries. We show that the respective \emph{next-day delivery optimization} is a combinatorial problem that is $NP$-hard to approximate within $(1-1/e)\cdot\texttt{opt}\approx 0.632\cdot\texttt{opt}$, hence every retailer that offers one-day deliveries has to deal with this complexity barrier. We study three variants of the problem motivated by operational constraints that different retailers encounter, and propose solutions schemes tailored to each problem's properties. To that end, we rely on greedy submodular maximization, pipage rounding techniques, and Lagrangian heuristics. The algorithms are scalable, offer optimality gap guarantees, and evaluated in realistic datasets and network scenarios were found to achieve near-optimal results.
On the Fairness ROAD: Robust Optimization for Adversarial Debiasing
Authors: Authors: Vincent Grari, Thibault Laugel, Tatsunori Hashimoto, Sylvain Lamprier, Marcin Detyniecki
Abstract
In the field of algorithmic fairness, significant attention has been put on group fairness criteria, such as Demographic Parity and Equalized Odds. Nevertheless, these objectives, measured as global averages, have raised concerns about persistent local disparities between sensitive groups. In this work, we address the problem of local fairness, which ensures that the predictor is unbiased not only in terms of expectations over the whole population, but also within any subregion of the feature space, unknown at training time. To enforce this objective, we introduce ROAD, a novel approach that leverages the Distributionally Robust Optimization (DRO) framework within a fair adversarial learning objective, where an adversary tries to infer the sensitive attribute from the predictions. Using an instance-level re-weighting strategy, ROAD is designed to prioritize inputs that are likely to be locally unfair, i.e. where the adversary faces the least difficulty in reconstructing the sensitive attribute. Numerical experiments demonstrate the effectiveness of our method: it achieves Pareto dominance with respect to local fairness and accuracy for a given global fairness level across three standard datasets, and also enhances fairness generalization under distribution shift.
Generalized Firefly Algorithm for Optimal Transmit Beamforming
Authors: Authors: Tuan Anh Le, Xin-She Yang
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
This paper proposes a generalized Firefly Algorithm (FA) to solve an optimization framework having objective function and constraints as multivariate functions of independent optimization variables. Four representative examples of how the proposed generalized FA can be adopted to solve downlink beamforming problems are shown for a classic transmit beamforming, cognitive beamforming, reconfigurable-intelligent-surfaces-aided (RIS-aided) transmit beamforming, and RIS-aided wireless power transfer (WPT). Complexity analyzes indicate that in large-antenna regimes the proposed FA approaches require less computational complexity than their corresponding interior point methods (IPMs) do, yet demand a higher complexity than the iterative and the successive convex approximation (SCA) approaches do. Simulation results reveal that the proposed FA attains the same global optimal solution as that of the IPM for an optimization problem in cognitive beamforming. On the other hand, the proposed FA approaches outperform the iterative, IPM and SCA in terms of obtaining better solution for optimization problems, respectively, for a classic transmit beamforming, RIS-aided transmit beamforming and RIS-aided WPT.
Multi-fidelity Design of Porous Microstructures for Thermofluidic Applications
Abstract
As modern electronic devices are increasingly miniaturized and integrated, their performance relies more heavily on effective thermal management. Two-phase cooling methods enhanced by porous surfaces, which capitalize on thin-film evaporation atop structured porous surfaces, are emerging as potential solutions. In such porous structures, the optimum heat dissipation capacity relies on two competing objectives that depend on mass and heat transfer. The computational costs of evaluating these objectives, the high dimensionality of the design space which a voxelated microstructure representation, and the manufacturability constraints hinder the optimization process for thermal management. We address these challenges by developing a data-driven framework for designing optimal porous microstructures for cooling applications. In our framework we leverage spectral density functions (SDFs) to encode the design space via a handful of interpretable variables and, in turn, efficiently search it. We develop physics-based formulas to quantify the thermofluidic properties and feasibility of candidate designs via offline simulations. To decrease the reliance on expensive simulations, we generate multi-fidelity data and build emulators to find Pareto-optimal designs. We apply our approach to a canonical problem on evaporator wick design and obtain fin-like topologies in the optimal microstructures which are also characteristics often observed in industrial applications.
End-to-end Feature Selection Approach for Learning Skinny Trees
Abstract
Joint feature selection and tree ensemble learning is a challenging task. Popular tree ensemble toolkits e.g., Gradient Boosted Trees and Random Forests support feature selection post-training based on feature importances, which are known to be misleading, and can significantly hurt performance. We propose Skinny Trees: a toolkit for feature selection in tree ensembles, such that feature selection and tree ensemble learning occurs simultaneously. It is based on an end-to-end optimization approach that considers feature selection in differentiable trees with Group $\ell_0 - \ell_2$ regularization. We optimize with a first-order proximal method and present convergence guarantees for a non-convex and non-smooth objective. Interestingly, dense-to-sparse regularization scheduling can lead to more expressive and sparser tree ensembles than vanilla proximal method. On 15 synthetic and real-world datasets, Skinny Trees can achieve $1.5\times$ - $620\times$ feature compression rates, leading up to $10\times$ faster inference over dense trees, without any loss in performance. Skinny Trees lead to superior feature selection than many existing toolkits e.g., in terms of AUC performance for $25\%$ feature budget, Skinny Trees outperforms LightGBM by $10.2\%$ (up to $37.7\%$), and Random Forests by $3\%$ (up to $12.5\%$).
Optimization-Free Test-Time Adaptation for Cross-Person Activity Recognition
Authors: Authors: Shuoyuan Wang, Jindong Wang, HuaJun Xi, Bob Zhang, Lei Zhang, Hongxin Wei
Abstract
Human Activity Recognition (HAR) models often suffer from performance degradation in real-world applications due to distribution shifts in activity patterns across individuals. Test-Time Adaptation (TTA) is an emerging learning paradigm that aims to utilize the test stream to adjust predictions in real-time inference, which has not been explored in HAR before. However, the high computational cost of optimization-based TTA algorithms makes it intractable to run on resource-constrained edge devices. In this paper, we propose an Optimization-Free Test-Time Adaptation (OFTTA) framework for sensor-based HAR. OFTTA adjusts the feature extractor and linear classifier simultaneously in an optimization-free manner. For the feature extractor, we propose Exponential DecayTest-time Normalization (EDTN) to replace the conventional batch normalization (CBN) layers. EDTN combines CBN and Test-time batch Normalization (TBN) to extract reliable features against domain shifts with TBN's influence decreasing exponentially in deeper layers. For the classifier, we adjust the prediction by computing the distance between the feature and the prototype, which is calculated by a maintained support set. In addition, the update of the support set is based on the pseudo label, which can benefit from reliable features extracted by EDTN. Extensive experiments on three public cross-person HAR datasets and two different TTA settings demonstrate that OFTTA outperforms the state-of-the-art TTA approaches in both classification performance and computational efficiency. Finally, we verify the superiority of our proposed OFTTA on edge devices, indicating possible deployment in real applications. Our code is available at \href{https://github.com/Claydon-Wang/OFTTA}{this https URL}.
Abstract
Multi-objective optimization is a type of decision making problems where multiple conflicting objectives are optimized. We study offline optimization of multi-objective policies from data collected by an existing policy. We propose a pessimistic estimator for the multi-objective policy values that can be easily plugged into existing formulas for hypervolume computation and optimized. The estimator is based on inverse propensity scores (IPS), and improves upon a naive IPS estimator in both theory and experiments. Our analysis is general, and applies beyond our IPS estimators and methods for optimizing them. The pessimistic estimator can be optimized by policy gradients and performs well in all of our experiments.
Designing a Kinetic Façade Using BB-BC Algorithm with a Focus on Enhancing Building Energy Efficiency
Abstract
In order to increase energy efficiency in buildings, optimizing the parameters of the facade form can be challenging due to the dynamic nature of solar radiation. One effective solution is the use of kinetic facades as a second skin, which can control energy consumption. This study proposes a parametric kinetic facade to increase building energy efficiency, along with a framework to optimize its form using the Bang-Big Crunch (BB-BC) optimization algorithm. The study involved modeling a two-story office building in Shiraz city and calculating the energy consumption resulting from building operation over a three-day period without considering the second skin of the facade. In the second stage of the study, the second skin was optimized for the same three-day interval and calculated as a parametric, static facade. In the last step, the parameters of the second skin were optimized for three one-day intervals, assuming the possibility of kinematic changes each day. The total energy consumption of building operation for the three days was then calculated and analyzed.The Python programming language was used to develop the optimization algorithm, while Rhino software, and Grasshopper, Ladybug, and Honeybee plugins were used for building modeling and simulation of light, energy, and weather parameters.The results of the study demonstrate the effectiveness of the proposed kinetic facade and the proper performance of the proposed algorithm for solving similar problems. The study found that the use of the second skin with kinetic function reduced energy consumption by 28%. Additionally, the results from the second and third stages of the study showed that the use of the second facade shell with kinematic function, compared to its function in static mode, reduced energy consumption by 4%.Overall...
Clairvoyance: A Pipeline Toolkit for Medical Time Series
Authors: Authors: Daniel Jarrett, Jinsung Yoon, Ioana Bica, Zhaozhi Qian, Ari Ercole, Mihaela van der Schaar
Abstract
Time-series learning is the bread and butter of data-driven clinical decision support, and the recent explosion in ML research has demonstrated great potential in various healthcare settings. At the same time, medical time-series problems in the wild are challenging due to their highly composite nature: They entail design choices and interactions among components that preprocess data, impute missing values, select features, issue predictions, estimate uncertainty, and interpret models. Despite exponential growth in electronic patient data, there is a remarkable gap between the potential and realized utilization of ML for clinical research and decision support. In particular, orchestrating a real-world project lifecycle poses challenges in engineering (i.e. hard to build), evaluation (i.e. hard to assess), and efficiency (i.e. hard to optimize). Designed to address these issues simultaneously, Clairvoyance proposes a unified, end-to-end, autoML-friendly pipeline that serves as a (i) software toolkit, (ii) empirical standard, and (iii) interface for optimization. Our ultimate goal lies in facilitating transparent and reproducible experimentation with complex inference workflows, providing integrated pathways for (1) personalized prediction, (2) treatment-effect estimation, and (3) information acquisition. Through illustrative examples on real-world data in outpatient, general wards, and intensive-care settings, we illustrate the applicability of the pipeline paradigm on core tasks in the healthcare journey. To the best of our knowledge, Clairvoyance is the first to demonstrate viability of a comprehensive and automatable pipeline for clinical time-series ML.
KernelGPA: A Globally Optimal Solution to Deformable SLAM in Closed-form
Abstract
We study the generalized Procrustes analysis (GPA), as a minimal formulation to the simultaneous localization and mapping (SLAM) problem. We propose KernelGPA, a novel global registration technique to solve SLAM in the deformable environment. We propose the concept of deformable transformation which encodes the entangled pose and deformation. We define deformable transformations using a kernel method, and show that both the deformable transformations and the environment map can be solved globally in closed-form, up to global scale ambiguities. We solve the scale ambiguities by an optimization formulation that maximizes rigidity. We demonstrate KernelGPA using the Gaussian kernel, and validate the superiority of KernelGPA with various datasets. Code and data are available at \url{https://bitbucket.org/FangBai/deformableprocrustes}.
Robust Offline Policy Evaluation and Optimization with Heavy-Tailed Rewards
Authors: Authors: Jin Zhu, Runzhe Wan, Zhengling Qi, Shikai Luo, Chengchun Shi
Abstract
This paper endeavors to augment the robustness of offline reinforcement learning (RL) in scenarios laden with heavy-tailed rewards, a prevalent circumstance in real-world applications. We propose two algorithmic frameworks, ROAM and ROOM, for robust off-policy evaluation (OPE) and offline policy optimization (OPO), respectively. Central to our frameworks is the strategic incorporation of the median-of-means method with offline RL, enabling straightforward uncertainty estimation for the value function estimator. This not only adheres to the principle of pessimism in OPO but also adeptly manages heavy-tailed rewards. Theoretical results and extensive experiments demonstrate that our two frameworks outperform existing methods on the logged dataset exhibits heavy-tailed reward distributions.
The Evolution of the Interplay Between Input Distributions and Linear Regions in Networks
Abstract
It is commonly recognized that the expressiveness of deep neural networks is contingent upon a range of factors, encompassing their depth, width, and other relevant considerations. Currently, the practical performance of the majority of deep neural networks remains uncertain. For ReLU (Rectified Linear Unit) networks with piecewise linear activations, the number of linear convex regions serves as a natural metric to gauge the network's expressivity. In this paper, we count the number of linear convex regions in deep neural networks based on ReLU. In particular, we prove that for any one-dimensional input, there exists a minimum threshold for the number of neurons required to express it. We also empirically observe that for the same network, intricate inputs hinder its capacity to express linear regions. Furthermore, we unveil the iterative refinement process of decision boundaries in ReLU networks during training. We aspire for our research to serve as an inspiration for network optimization endeavors and aids in the exploration and analysis of the behaviors exhibited by deep networks.
Optimization of utility-based shortfall risk: A non-asymptotic viewpoint
Authors: Authors: Sumedh Gupte, Prashanth L. A., Sanjay P. Bhat
Abstract
We consider the problems of estimation and optimization of utility-based shortfall risk (UBSR), which is a popular risk measure in finance. In the context of UBSR estimation, we derive a non-asymptotic bound on the mean-squared error of the classical sample average approximation (SAA) of UBSR. Next, in the context of UBSR optimization, we derive an expression for the UBSR gradient under a smooth parameterization. This expression is a ratio of expectations, both of which involve the UBSR. We use SAA for the numerator as well as denominator in the UBSR gradient expression to arrive at a biased gradient estimator. We derive non-asymptotic bounds on the estimation error, which show that our gradient estimator is asymptotically unbiased. We incorporate the aforementioned gradient estimator into a stochastic gradient (SG) algorithm for UBSR optimization. Finally, we derive non-asymptotic bounds that quantify the rate of convergence of our SG algorithm for UBSR optimization.
A Data-driven Recommendation Framework for Optimal Walker Designs
Authors: Authors: Advaith Narayanan
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Abstract
The rapidly advancing fields of statistical modeling and machine learning have significantly enhanced data-driven design and optimization. This paper focuses on leveraging these design algorithms to optimize a medical walker, an integral part of gait rehabilitation and physiological therapy of the lower extremities. To achieve the desirable qualities of a walker, we train a predictive machine-learning model to identify trade-offs between performance objectives, thus enabling the use of efficient optimization algorithms. To do this, we use an Automated Machine Learning model utilizing a stacked-ensemble approach shown to outperform traditional ML models. However, training a predictive model requires vast amounts of data for accuracy. Due to limited publicly available walker designs, this paper presents a dataset of more than 5,000 parametric walker designs with performance values to assess mass, structural integrity, and stability. These performance values include displacement vectors for the given load case, stress coefficients, mass, and other physical properties. We also introduce a novel method of systematically calculating the stability index of a walker. We use MultiObjective Counterfactuals for Design (MCD), a novel genetic-based optimization algorithm, to explore the diverse 16-dimensional design space and search for high-performing designs based on numerous objectives. This paper presents potential walker designs that demonstrate up to a 30% mass reduction while increasing structural stability and integrity. This work takes a step toward the improved development of assistive mobility devices.
A Fuzzy Time Series-Based Model Using Particle Swarm Optimization and Weighted Rules
Authors: Authors: Daniel Ortiz-Arroyo
Subjects: Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Abstract
During the last decades, a myriad of fuzzy time series models have been proposed in scientific literature. Among the most accurate models found in fuzzy time series, the high-order ones are the most accurate. The research described in this paper tackles three potential limitations associated with the application of high-order fuzzy time series models. To begin with, the adequacy of forecast rules lacks consistency. Secondly, as the model's order increases, data utilization diminishes. Thirdly, the uniformity of forecast rules proves to be highly contingent on the chosen interval partitions. To address these likely drawbacks, we introduce a novel model based on fuzzy time series that amalgamates the principles of particle swarm optimization (PSO) and weighted summation. Our results show that our approach models accurately the time series in comparison with previous methods.
Correlation Aware Sparsified Mean Estimation Using Random Projection
Abstract
We study the problem of communication-efficient distributed vector mean estimation, a commonly used subroutine in distributed optimization and Federated Learning (FL). Rand-$k$ sparsification is a commonly used technique to reduce communication cost, where each client sends $k < d$ of its coordinates to the server. However, Rand-$k$ is agnostic to any correlations, that might exist between clients in practical scenarios. The recently proposed Rand-$k$-Spatial estimator leverages the cross-client correlation information at the server to improve Rand-$k$'s performance. Yet, the performance of Rand-$k$-Spatial is suboptimal. We propose the Rand-Proj-Spatial estimator with a more flexible encoding-decoding procedure, which generalizes the encoding of Rand-$k$ by projecting the client vectors to a random $k$-dimensional subspace. We utilize Subsampled Randomized Hadamard Transform (SRHT) as the projection matrix and show that Rand-Proj-Spatial with SRHT outperforms Rand-$k$-Spatial, using the correlation information more efficiently. Furthermore, we propose an approach to incorporate varying degrees of correlation and suggest a practical variant of Rand-Proj-Spatial when the correlation information is not available to the server. Experiments on real-world distributed optimization tasks showcase the superior performance of Rand-Proj-Spatial compared to Rand-$k$-Spatial and other more sophisticated sparsification techniques.
Ever Evolving Evaluator (EV3): Towards Flexible and Reliable Meta-Optimization for Knowledge Distillation
Authors: Authors: Li Ding, Masrour Zoghi, Guy Tennenholtz, Maryam Karimzadehgan
Abstract
We introduce EV3, a novel meta-optimization framework designed to efficiently train scalable machine learning models through an intuitive explore-assess-adapt protocol. In each iteration of EV3, we explore various model parameter updates, assess them using pertinent evaluation methods, and adapt the model based on the optimal updates and previous progress history. EV3 offers substantial flexibility without imposing stringent constraints like differentiability on the key objectives relevant to the tasks of interest. Moreover, this protocol welcomes updates with biased gradients and allows for the use of a diversity of losses and optimizers. Additionally, in scenarios with multiple objectives, it can be used to dynamically prioritize tasks. With inspiration drawn from evolutionary algorithms, meta-learning, and neural architecture search, we investigate an application of EV3 to knowledge distillation. Our experimental results illustrate EV3's capability to safely explore model spaces, while hinting at its potential applicability across numerous domains due to its inherent flexibility and adaptability.
Optimizing Task-Specific Timeliness With Edge-Assisted Scheduling for Status Update
Abstract
Intelligent real-time applications, such as video surveillance, demand intensive computation to extract status information from raw sensing data. This poses a substantial challenge in orchestrating computation and communication resources to provide fresh status information. In this paper, we consider a scenario where multiple energy-constrained devices served by an edge server. To extract status information, each device can either do the computation locally or offload it to the edge server. A scheduling policy is needed to determine when and where to compute for each device, taking into account communication and computation capabilities, as well as task-specific timeliness requirements. To that end, we first model the timeliness requirements as general penalty functions of Age of Information (AoI). A convex optimization problem is formulated to provide a lower bound of the minimum AoI penalty given system parameters. Using KKT conditions, we proposed a novel scheduling policy which evaluates status update priorities based on communication and computation delays and task-specific timeliness requirements. The proposed policy is applied to an object tracking application and carried out on a large video dataset. Simulation results show that our policy improves tracking accuracy compared with scheduling policies based on video content information.
TiV-NeRF: Tracking and Mapping via Time-Varying Representation with Dynamic Neural Radiance Fields
Authors: Authors: Chengyao Duan, Zhiliu Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Previous attempts to integrate Neural Radiance Fields (NeRF) into Simultaneous Localization and Mapping (SLAM) framework either rely on the assumption of static scenes or treat dynamic objects as outliers. However, most of real-world scenarios is dynamic. In this paper, we propose a time-varying representation to track and reconstruct the dynamic scenes. Our system simultaneously maintains two processes, tracking process and mapping process. For tracking process, the entire input images are uniformly sampled and training of the RGB images are self-supervised. For mapping process, we leverage know masks to differentiate dynamic objects and static backgrounds, and we apply distinct sampling strategies for two types of areas. The parameters optimization for both processes are made up by two stages, the first stage associates time with 3D positions to convert the deformation field to the canonical field. And the second associates time with 3D positions in canonical field to obtain colors and Signed Distance Function (SDF). Besides, We propose a novel keyframe selection strategy based on the overlapping rate. We evaluate our approach on two publicly available synthetic datasets and validate that our method is more effective compared to current state-of-the-art dynamic mapping methods.
Implicit Bias of Gradient Descent for Two-layer ReLU and Leaky ReLU Networks on Nearly-orthogonal Data
Abstract
The implicit bias towards solutions with favorable properties is believed to be a key reason why neural networks trained by gradient-based optimization can generalize well. While the implicit bias of gradient flow has been widely studied for homogeneous neural networks (including ReLU and leaky ReLU networks), the implicit bias of gradient descent is currently only understood for smooth neural networks. Therefore, implicit bias in non-smooth neural networks trained by gradient descent remains an open question. In this paper, we aim to answer this question by studying the implicit bias of gradient descent for training two-layer fully connected (leaky) ReLU neural networks. We showed that when the training data are nearly-orthogonal, for leaky ReLU activation function, gradient descent will find a network with a stable rank that converges to $1$, whereas for ReLU activation function, gradient descent will find a neural network with a stable rank that is upper bounded by a constant. Additionally, we show that gradient descent will find a neural network such that all the training data points have the same normalized margin asymptotically. Experiments on both synthetic and real data backup our theoretical findings.
Playing in the Dark: No-regret Learning with Adversarial Constraints
Authors: Authors: Abhishek Sinha, Rahul Vaze
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Abstract
We study a generalization of the classic Online Convex Optimization (OCO) framework by considering additional long-term adversarial constraints. Specifically, after an online policy decides its action on a round, in addition to a convex cost function, the adversary also reveals a set of $k$ convex constraints. The cost and the constraint functions could change arbitrarily with time, and no information about the future functions is assumed to be available. In this paper, we propose a meta-policy that simultaneously achieves a sublinear cumulative constraint violation and a sublinear regret. This is achieved via a black box reduction of the constrained problem to the standard OCO problem for a recursively constructed sequence of surrogate cost functions. We show that optimal performance bounds can be achieved by solving the surrogate problem using any adaptive OCO policy enjoying a standard data-dependent regret bound. A new Lyapunov-based proof technique is presented that reveals a connection between regret and certain sequential inequalities through a novel decomposition result. We conclude the paper by highlighting applications to online multi-task learning and network control problems.
Bipartite Graph Pre-training for Unsupervised Extractive Summarization with Graph Convolutional Auto-Encoders
Authors: Authors: Qianren Mao, Shaobo Zhao, Jiarui Li, Xiaolei Gu, Shizhu He, Bo Li, Jianxin Li
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Abstract
Pre-trained sentence representations are crucial for identifying significant sentences in unsupervised document extractive summarization. However, the traditional two-step paradigm of pre-training and sentence-ranking, creates a gap due to differing optimization objectives. To address this issue, we argue that utilizing pre-trained embeddings derived from a process specifically designed to optimize cohensive and distinctive sentence representations helps rank significant sentences. To do so, we propose a novel graph pre-training auto-encoder to obtain sentence embeddings by explicitly modelling intra-sentential distinctive features and inter-sentential cohesive features through sentence-word bipartite graphs. These pre-trained sentence representations are then utilized in a graph-based ranking algorithm for unsupervised summarization. Our method produces predominant performance for unsupervised summarization frameworks by providing summary-worthy sentence representations. It surpasses heavy BERT- or RoBERTa-based sentence representations in downstream tasks.
A 0.21-ps FOM Capacitor-Less Analog LDO with Dual-Range Load Current for Biomedical Applications
Abstract
This paper presents an output capacitor-less low-dropout regulator (LDO) with a bias switching scheme for biomedical applications with dual-range load currents. Power optimization is crucial for systems with multiple activation modes such as neural interfaces, IoT and edge devices with varying load currents. To enable rapid switching between low and high current states, a flipped voltage follower (FVF) configuration is utilized, along with a super source follower buffer to drive the power transistor. Two feedback loops and an on-chip compensation capacitor Cc maintain the stability of the regulator under various load conditions. The LDO was implemented in a 65nm CMOS process with 1.5V input and 1.2V output voltages. The measured quiescent current is as low as 3uA and 50uA for the 0-500uA and 5-15mA load current ranges, respectively. An undershoot voltage of 100mV is observed when the load current switches from 0 to 15mA within 80ns, with a maximum current efficiency of 99.98%. Our design achieved a low Figure-of-Merit of 0.21ps, outperforming state-of-the-art analog LDOs.
Behavior Alignment via Reward Function Optimization
Authors: Authors: Dhawal Gupta, Yash Chandak, Scott M. Jordan, Philip S. Thomas, Bruno Castro da Silva
Abstract
Designing reward functions for efficiently guiding reinforcement learning (RL) agents toward specific behaviors is a complex task. This is challenging since it requires the identification of reward structures that are not sparse and that avoid inadvertently inducing undesirable behaviors. Naively modifying the reward structure to offer denser and more frequent feedback can lead to unintended outcomes and promote behaviors that are not aligned with the designer's intended goal. Although potential-based reward shaping is often suggested as a remedy, we systematically investigate settings where deploying it often significantly impairs performance. To address these issues, we introduce a new framework that uses a bi-level objective to learn \emph{behavior alignment reward functions}. These functions integrate auxiliary rewards reflecting a designer's heuristics and domain knowledge with the environment's primary rewards. Our approach automatically determines the most effective way to blend these types of feedback, thereby enhancing robustness against heuristic reward misspecification. Remarkably, it can also adapt an agent's policy optimization process to mitigate suboptimalities resulting from limitations and biases inherent in the underlying RL algorithms. We evaluate our method's efficacy on a diverse set of tasks, from small-scale experiments to high-dimensional control challenges. We investigate heuristic auxiliary rewards of varying quality -- some of which are beneficial and others detrimental to the learning process. Our results show that our framework offers a robust and principled way to integrate designer-specified heuristics. It not only addresses key shortcomings of existing approaches but also consistently leads to high-performing solutions, even when given misaligned or poorly-specified auxiliary reward functions.
Optimizing simultaneous autoscaling for serverless cloud computing
Abstract
This paper explores resource allocation in serverless cloud computing platforms and proposes an optimization approach for autoscaling systems. Serverless computing relieves users from resource management tasks, enabling focus on application functions. However, dynamic resource allocation and function replication based on changing loads remain crucial. Typically, autoscalers in these platforms utilize threshold-based mechanisms to adjust function replicas independently. We model applications as interconnected graphs of functions, where requests probabilistically traverse the graph, triggering associated function execution. Our objective is to develop a control policy that optimally allocates resources on servers, minimizing failed requests and response time in reaction to load changes. Using a fluid approximation model and Separated Continuous Linear Programming (SCLP), we derive an optimal control policy that determines the number of resources per replica and the required number of replicas over time. We evaluate our approach using a simulation framework built with Python and simpy. Comparing against threshold-based autoscaling, our approach demonstrates significant improvements in average response times and failed requests, ranging from 15% to over 300% in most cases. We also explore the impact of system and workload parameters on performance, providing insights into the behavior of our optimization approach under different conditions. Overall, our study contributes to advancing resource allocation strategies, enhancing efficiency and reliability in serverless cloud computing platforms.
RIS-aided Real-time Beam Tracking for a Mobile User via Bayesian Optimization
Authors: Authors: Junshuo Liu, Rujing Xiong, Jialong Lu, Tiebin Mi, Robert C. Qiu
Abstract
The conventional beam management procedure mandates that the user equipment (UE) periodically measure the received signal reference power (RSRP) and transmit these measurements to the base station (BS). The challenge lies in balancing the number of beams used: it should be large enough to identify high-RSRP beams but small enough to minimize reporting overhead. This paper investigates this essential performance-versus-overhead trade-off using Bayesian optimization. The proposed approach represents the first application of real-time beam tracking via Bayesian optimization in RIS-assisted communication systems. Simulation results validate the effectiveness of this scheme.
An Improved Relaxation for Oracle-Efficient Adversarial Contextual Bandits
Authors: Authors: Kiarash Banihashem, MohammadTaghi Hajiaghayi, Suho Shin, Max Springer
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
Abstract
We present an oracle-efficient relaxation for the adversarial contextual bandits problem, where the contexts are sequentially drawn i.i.d from a known distribution and the cost sequence is chosen by an online adversary. Our algorithm has a regret bound of $O(T^{\frac{2}{3}}(K\log(|\Pi|))^{\frac{1}{3}})$ and makes at most $O(K)$ calls per round to an offline optimization oracle, where $K$ denotes the number of actions, $T$ denotes the number of rounds and $\Pi$ denotes the set of policies. This is the first result to improve the prior best bound of $O((TK)^{\frac{2}{3}}(\log(|\Pi|))^{\frac{1}{3}})$ as obtained by Syrgkanis et al. at NeurIPS 2016, and the first to match the original bound of Langford and Zhang at NeurIPS 2007 which was obtained for the stochastic case.
Large Language Models as Evolutionary Optimizers
Authors: Authors: Shengcai Liu, Caishun Chen, Xinghua Qu, Ke Tang, Yew-Soon Ong
Subjects: Neural and Evolutionary Computing (cs.NE)
Abstract
Evolutionary algorithms (EAs) have achieved remarkable success in tackling complex combinatorial optimization problems. However, EAs often demand carefully-designed operators with the aid of domain expertise to achieve satisfactory performance. In this work, we present the first study on large language models (LLMs) as evolutionary combinatorial optimizers. The main advantage is that it requires minimal domain knowledge and human efforts, as well as no additional training of the model. This approach is referred to as LLM-driven EA (LMEA). Specifically, in each generation of the evolutionary search, LMEA instructs the LLM to select parent solutions from current population, and perform crossover and mutation to generate offspring solutions. Then, LMEA evaluates these new solutions and include them into the population for the next generation. LMEA is equipped with a self-adaptation mechanism that controls the temperature of the LLM. This enables it to balance between exploration and exploitation and prevents the search from getting stuck in local optima. We investigate the power of LMEA on the classical traveling salesman problems (TSPs) widely used in combinatorial optimization research. Notably, the results show that LMEA performs competitively to traditional heuristics in finding high-quality solutions on TSP instances with up to 20 nodes. Additionally, we also study the effectiveness of LLM-driven crossover/mutation and the self-adaptation mechanism in evolutionary search. In summary, our results reveal the great potentials of LLMs as evolutionary optimizers for solving combinatorial problems. We hope our research shall inspire future explorations on LLM-driven EAs for complex optimization challenges.
Estimation of Semiconductor Power Losses Through Automatic Thermal Modeling
Authors: Authors: Jose Miguel Sanz-Alcaine, Eduardo Sebastian, Francisco Jose Perez-Cebolla, Asier Arruti, Carlos Bernal-Ruiz, Iosu Aizpuru
Abstract
The optimal design of power converters requires accurate knowledge of the dissipation elements of its system to achieve the desired performance and security requirements. Calorimetric methods have surpassed classical electrical methods for the estimation of semiconductor power losses but have mechanical limitations and resort to analytical electrothermal equivalent circuits for this task. These electrothermal models are highly dependent on the topology and technology used on the power converter leading to either simplifications that underestimate the thermal effects or intractable sets of differential equations. To overcome these issues, we propose a novel data-driven identification method to characterize the thermal dynamics of power converters allowing the designer to obtain semiconductor total power losses only by means of temperature measurements without the need of a calorimeter. Given a set of power vs.temperature profiles, our solution identifies the linear model that best fits the data. The solution is based on an optimization problem that allows not only accurate identification but also coding of the desired modeling requirements, such as dynamics' invertibility to allow the estimation of power losses from the temperature profiles. The proposed methodology can be applied to any power converter topology. Furthermore, by obtaining a linear model, standard control theory techniques can be exploited to analyze and control the thermal dynamics. Real experiments validate the generality and accuracy of the proposal.
Datasets and Benchmarks for Nanophotonic Structure and Parametric Design Simulations
Authors: Authors: Jungtaek Kim, Mingxuan Li, Oliver Hinder, Paul W. Leu
Abstract
Nanophotonic structures have versatile applications including solar cells, anti-reflective coatings, electromagnetic interference shielding, optical filters, and light emitting diodes. To design and understand these nanophotonic structures, electrodynamic simulations are essential. These simulations enable us to model electromagnetic fields over time and calculate optical properties. In this work, we introduce frameworks and benchmarks to evaluate nanophotonic structures in the context of parametric structure design problems. The benchmarks are instrumental in assessing the performance of optimization algorithms and identifying an optimal structure based on target optical properties. Moreover, we explore the impact of varying grid sizes in electrodynamic simulations, shedding light on how evaluation fidelity can be strategically leveraged in enhancing structure designs.
Gauge-optimal approximate learning for small data classification problems
Abstract
Small data learning problems are characterized by a significant discrepancy between the limited amount of response variable observations and the large feature space dimension. In this setting, the common learning tools struggle to identify the features important for the classification task from those that bear no relevant information, and cannot derive an appropriate learning rule which allows to discriminate between different classes. As a potential solution to this problem, here we exploit the idea of reducing and rotating the feature space in a lower-dimensional gauge and propose the Gauge-Optimal Approximate Learning (GOAL) algorithm, which provides an analytically tractable joint solution to the dimension reduction, feature segmentation and classification problems for small data learning problems. We prove that the optimal solution of the GOAL algorithm consists in piecewise-linear functions in the Euclidean space, and that it can be approximated through a monotonically convergent algorithm which presents -- under the assumption of a discrete segmentation of the feature space -- a closed-form solution for each optimization substep and an overall linear iteration cost scaling. The GOAL algorithm has been compared to other state-of-the-art machine learning (ML) tools on both synthetic data and challenging real-world applications from climate science and bioinformatics (i.e., prediction of the El Nino Southern Oscillation and inference of epigenetically-induced gene-activity networks from limited experimental data). The experimental results show that the proposed algorithm outperforms the reported best competitors for these problems both in learning performance and computational cost.
Bridging the Gap: Towards an Expanded Toolkit for ML-Supported Decision-Making in the Public Sector
Abstract
Machine Learning (ML) systems are becoming instrumental in the public sector, with applications spanning areas like criminal justice, social welfare, financial fraud detection, and public health. While these systems offer great potential benefits to institutional decision-making processes, such as improved efficiency and reliability, they still face the challenge of aligning intricate and nuanced policy objectives with the precise formalization requirements necessitated by ML models. In this paper, we aim to bridge the gap between ML and public sector decision-making by presenting a comprehensive overview of key technical challenges where disjunctions between policy goals and ML models commonly arise. We concentrate on pivotal points of the ML pipeline that connect the model to its operational environment, delving into the significance of representative training data and highlighting the importance of a model setup that facilitates effective decision-making. Additionally, we link these challenges with emerging methodological advancements, encompassing causal ML, domain adaptation, uncertainty quantification, and multi-objective optimization, illustrating the path forward for harmonizing ML and public sector objectives.
Proving Linear Mode Connectivity of Neural Networks via Optimal Transport
Abstract
The energy landscape of high-dimensional non-convex optimization problems is crucial to understanding the effectiveness of modern deep neural network architectures. Recent works have experimentally shown that two different solutions found after two runs of a stochastic training are often connected by very simple continuous paths (e.g., linear) modulo a permutation of the weights. In this paper, we provide a framework theoretically explaining this empirical observation. Based on convergence rates in Wasserstein distance of empirical measures, we show that, with high probability, two wide enough two-layer neural networks trained with stochastic gradient descent are linearly connected. Additionally, we express upper and lower bounds on the width of each layer of two deep neural networks with independent neuron weights to be linearly connected. Finally, we empirically demonstrate the validity of our approach by showing how the dimension of the support of the weight distribution of neurons, which dictates Wasserstein convergence rates is correlated with linear mode connectivity.
Partial Orderings as Heuristic for Multi-Objective Model-Based Reasoning
Abstract
Model-based reasoning is becoming increasingly common in software engineering. The process of building and analyzing models helps stakeholders to understand the ramifications of their software decisions. But complex models can confuse and overwhelm stakeholders when these models have too many candidate solutions. We argue here that a technique based on partial orderings lets humans find acceptable solutions via a binary chop needing $O(log(N))$ queries (or less). This paper checks the value of this approach via the iSNEAK partial ordering tool. Pre-experimentally, we were concerned that (a)~our automated methods might produce models that were unacceptable to humans; and that (b)~our human-in-the-loop methods might actual overlooking significant optimizations. Hence, we checked the acceptability of the solutions found by iSNEAK via a human-in-the-loop double-blind evaluation study of 20 Brazilian programmers. We also checked if iSNEAK misses significant optimizations (in a corpus of 16 SE models of size ranging up to 1000 attributes by comparing it against two rival technologies (the genetic algorithms preferred by the interactive search-based SE community; and the sequential model optimizers developed by the SE configuration community~\citep{flash_vivek}). iSNEAK 's solutions were found to be human acceptable (and those solutions took far less time to generate, with far fewer questions to any stakeholder). Significantly, our methods work well even for multi-objective models with competing goals (in this work we explore models with four to five goals). These results motivate more work on partial ordering for many-goal model-based problems.
Optical STAR-RIS-Aided VLC Systems: RSMA Versus NOMA
Authors: Authors: Omar Maraqa, Sylvester Aboagye, Telex M. N. Ngatched
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
A critical concern within the realm of visible light communications (VLC) pertains to enhancing system data rate, particularly in scenarios where the direct line-of-sight (LoS) connection is obstructed by obstacles. The deployment of meta-surface-based simultaneous transmission and reflection reconfigurable intelligent surface (STAR-RIS) has emerged to combat challenging LoS blockage scenarios and to provide 360 coverage in radio-frequency wireless systems. Recently, the concept of optical simultaneous transmission and reflection reconfigurable intelligent surface (OSTAR-RIS) has been promoted for VLC systems. This work is dedicated to studying the performance of OSTAR-RIS in detail and unveiling the VLC system performance gain under such technology. Specifically, we propose a novel multi-user indoor VLC system that is assisted by OSTAR-RIS. To improve the sum rate performance of the proposed system, both power-domain non-orthogonal multiple access (NOMA) and rate splitting multiple access (RSMA) are investigated in this work. To realize this, a sum rate maximization problem that jointly optimizes the roll and yaw angles of the reflector elements as well as the refractive index of the refractor elements in OSTAR-RIS is formulated, solved, and evaluated. The maximization problem takes into account practical considerations, such as the presence of non-users (i.e., blockers) and the orientation of the recipient's device. The sine-cosine meta-heuristic algorithm is employed to get the optimal solution of the formulated non-convex optimization problem. Moreover, the study delves into the sum energy efficiency optimization of the proposed system. Simulation results indicate that the proposed OSTAR-RIS RSMA-aided VLC system outperforms the OSTAR-RIS NOMA-based VLC system in terms of both the sum rate and the sum energy efficiency.
Abstract
While Graph Neural Networks (GNNs) recently became powerful tools in graph learning tasks, considerable efforts have been spent on improving GNNs' structural encoding ability. A particular line of work proposed subgraph GNNs that use subgraph information to improve GNNs' expressivity and achieved great success. However, such effectivity sacrifices the efficiency of GNNs by enumerating all possible subgraphs. In this paper, we analyze the necessity of complete subgraph enumeration and show that a model can achieve a comparable level of expressivity by considering a small subset of the subgraphs. We then formulate the identification of the optimal subset as a combinatorial optimization problem and propose Magnetic Graph Neural Network (MAG-GNN), a reinforcement learning (RL) boosted GNN, to solve the problem. Starting with a candidate subgraph set, MAG-GNN employs an RL agent to iteratively update the subgraphs to locate the most expressive set for prediction. This reduces the exponential complexity of subgraph enumeration to the constant complexity of a subgraph search algorithm while keeping good expressivity. We conduct extensive experiments on many datasets, showing that MAG-GNN achieves competitive performance to state-of-the-art methods and even outperforms many subgraph GNNs. We also demonstrate that MAG-GNN effectively reduces the running time of subgraph GNNs.
RAIFLE: Reconstruction Attacks on Interaction-based Federated Learning with Active Data Manipulation
Authors: Authors: Dzung Pham, Shreyas Kulkarni, Amir Houmansadr
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Abstract
Federated learning (FL) has recently emerged as a privacy-preserving approach for machine learning in domains that rely on user interactions, particularly recommender systems (RS) and online learning to rank (OLTR). While there has been substantial research on the privacy of traditional FL, little attention has been paid to studying the privacy properties of these interaction-based FL (IFL) systems. In this work, we show that IFL can introduce unique challenges concerning user privacy, particularly when the central server has knowledge and control over the items that users interact with. Specifically, we demonstrate the threat of reconstructing user interactions by presenting RAIFLE, a general optimization-based reconstruction attack framework customized for IFL. RAIFLE employs Active Data Manipulation (ADM), a novel attack technique unique to IFL, where the server actively manipulates the training features of the items to induce adversarial behaviors in the local FL updates. We show that RAIFLE is more impactful than existing FL privacy attacks in the IFL context, and describe how it can undermine privacy defenses like secure aggregation and private information retrieval. Based on our findings, we propose and discuss countermeasure guidelines to mitigate our attack in the context of federated RS/OLTR specifically and IFL more broadly.
Identification of the Most Significant Parameter for Optimizing the Performance of RPL Routing Protocol in IoT Using Taguchi Design of Experiments
Abstract
Internet of Things (IoT) consists of a wide variety of devices with limited power sources. Due to the adhered reason, energy consumption is considered as one of the major challenges in the IoT environment. In this research article, an attempt is made to optimize the existing Routing Protocol (RPL) towards a green technology. It focuses on finding the most significant parameter in the RPL using Taguchi Design of Experiments. It emphasizes the effects of five input factors, such as Network Size, Mobility Speed, DIO_DOUBLING, DIO_MIN_INTERVAL, and Redundancy Constant on only one output parameter Power Consumption. The findings show that DIO_MIN_INTERVAL is the leading factor that has a significant effect on the power consumption in RPL. After determining the most significant factor that affects the power consumption, measures can be taken to optimize the performance of RPL by applying some optimization techniques. COOJA simulator is used to carry out the simulations required for this research article.
On the accuracy and efficiency of group-wise clipping in differentially private optimization
Authors: Authors: Zhiqi Bu, Ruixuan Liu, Yu-Xiang Wang, Sheng Zha, George Karypis
Abstract
Recent advances have substantially improved the accuracy, memory cost, and training speed of differentially private (DP) deep learning, especially on large vision and language models with millions to billions of parameters. In this work, we thoroughly study the per-sample gradient clipping style, a key component in DP optimization. We show that different clipping styles have the same time complexity but instantiate an accuracy-memory trade-off: while the all-layer clipping (of coarse granularity) is the most prevalent and usually gives the best accuracy, it incurs heavier memory cost compared to other group-wise clipping, such as the layer-wise clipping (of finer granularity). We formalize this trade-off through our convergence theory and complexity analysis. Importantly, we demonstrate that the accuracy gap between group-wise clipping and all-layer clipping becomes smaller for larger models, while the memory advantage of the group-wise clipping remains. Consequently, the group-wise clipping allows DP optimization of large models to achieve high accuracy and low peak memory simultaneously.
Abstract
Adapters are widely popular parameter-efficient transfer learning approaches in natural language processing that insert trainable modules in between layers of a pre-trained language model. Apart from several heuristics, however, there has been a lack of studies analyzing the optimal number of adapter parameters needed for downstream applications. In this paper, we propose an adapter pruning approach by studying the tropical characteristics of trainable modules. We cast it as an optimization problem that aims to prune parameters from the adapter layers without changing the orientation of underlying tropical hypersurfaces. Our experiments on five NLP datasets show that tropical geometry tends to identify more relevant parameters to prune when compared with the magnitude-based baseline, while a combined approach works best across the tasks.
IMPRESS: Evaluating the Resilience of Imperceptible Perturbations Against Unauthorized Data Usage in Diffusion-Based Generative AI
Abstract
Diffusion-based image generation models, such as Stable Diffusion or DALL-E 2, are able to learn from given images and generate high-quality samples following the guidance from prompts. For instance, they can be used to create artistic images that mimic the style of an artist based on his/her original artworks or to maliciously edit the original images for fake content. However, such ability also brings serious ethical issues without proper authorization from the owner of the original images. In response, several attempts have been made to protect the original images from such unauthorized data usage by adding imperceptible perturbations, which are designed to mislead the diffusion model and make it unable to properly generate new samples. In this work, we introduce a perturbation purification platform, named IMPRESS, to evaluate the effectiveness of imperceptible perturbations as a protective measure. IMPRESS is based on the key observation that imperceptible perturbations could lead to a perceptible inconsistency between the original image and the diffusion-reconstructed image, which can be used to devise a new optimization strategy for purifying the image, which may weaken the protection of the original image from unauthorized data usage (e.g., style mimicking, malicious editing). The proposed IMPRESS platform offers a comprehensive evaluation of several contemporary protection methods, and can be used as an evaluation platform for future protection methods.
Flow-based Distributionally Robust Optimization
Authors: Authors: Chen Xu, Jonghyeok Lee, Xiuyuan Cheng, Yao Xie
Abstract
We present a computationally efficient framework, called \texttt{FlowDRO}, for solving flow-based distributionally robust optimization (DRO) problems with Wasserstein uncertainty sets, when requiring the worst-case distribution (also called the Least Favorable Distribution, LFD) to be continuous so that the algorithm can be scalable to problems with larger sample sizes and achieve better generalization capability for the induced robust algorithms. To tackle the computationally challenging infinitely dimensional optimization problem, we leverage flow-based models, continuous-time invertible transport maps between the data distribution and the target distribution, and develop a Wasserstein proximal gradient flow type of algorithm. In practice, we parameterize the transport maps by a sequence of neural networks progressively trained in blocks by gradient descent. Our computational framework is general, can handle high-dimensional data with large sample sizes, and can be useful for various applications. We demonstrate its usage in adversarial learning, distributionally robust hypothesis testing, and a new mechanism for data-driven distribution perturbation differential privacy, where the proposed method gives strong empirical performance on real high-dimensional data.
Online Data-Driven Safety Certification for Systems Subject to Unknown Disturbances
Authors: Authors: Nicholas Rober, Karan Mahesh, Tyler M. Paine, Max L. Greene, Steven Lee, Sildomar T. Monteiro, Michael R. Benjamin, Jonathan P. How
Abstract
Deploying autonomous systems in safety critical settings necessitates methods to verify their safety properties. This is challenging because real-world systems may be subject to disturbances that affect their performance, but are unknown a priori. This work develops a safety-verification strategy wherein data is collected online and incorporated into a reachability analysis approach to check in real-time that the system avoids dangerous regions of the state space. Specifically, we employ an optimization-based moving horizon estimator (MHE) to characterize the disturbance affecting the system, which is incorporated into an online reachability calculation. Reachable sets are calculated using a computational graph analysis tool to predict the possible future states of the system and verify that they satisfy safety constraints. We include theoretical arguments proving our approach generates reachable sets that bound the future states of the system, as well as numerical results demonstrating how it can be used for safety verification. Finally, we present results from hardware experiments demonstrating our approach's ability to perform online reachability calculations for an unmanned surface vehicle subject to currents and actuator failures.
ROAM: memory-efficient large DNN training via optimized operator ordering and memory layout
Authors: Authors: Huiyao Shu, Ang Wang, Ziji Shi, Hanyu Zhao, Yong Li, Lu Lu
Abstract
As deep learning models continue to increase in size, the memory requirements for training have surged. While high-level techniques like offloading, recomputation, and compression can alleviate memory pressure, they also introduce overheads. However, a memory-efficient execution plan that includes a reasonable operator execution order and tensor memory layout can significantly increase the models' memory efficiency and reduce overheads from high-level techniques. In this paper, we propose ROAM which operates on computation graph level to derive memory-efficient execution plan with optimized operator order and tensor memory layout for models. We first propose sophisticated theories that carefully consider model structure and training memory load to support optimization for large complex graphs that have not been well supported in the past. An efficient tree-based algorithm is further proposed to search task divisions automatically, along with delivering high performance and effectiveness to solve the problem. Experiments show that ROAM achieves a substantial memory reduction of 35.7%, 13.3%, and 27.2% compared to Pytorch and two state-of-the-art methods and offers a remarkable 53.7x speedup. The evaluation conducted on the expansive GPT2-XL further validates ROAM's scalability.
D4Explainer: In-Distribution GNN Explanations via Discrete Denoising Diffusion
Authors: Authors: Jialin Chen, Shirley Wu, Abhijit Gupta, Rex Ying
Abstract
The widespread deployment of Graph Neural Networks (GNNs) sparks significant interest in their explainability, which plays a vital role in model auditing and ensuring trustworthy graph learning. The objective of GNN explainability is to discern the underlying graph structures that have the most significant impact on model predictions. Ensuring that explanations generated are reliable necessitates consideration of the in-distribution property, particularly due to the vulnerability of GNNs to out-of-distribution data. Unfortunately, prevailing explainability methods tend to constrain the generated explanations to the structure of the original graph, thereby downplaying the significance of the in-distribution property and resulting in explanations that lack reliability. To address these challenges, we propose D4Explainer, a novel approach that provides in-distribution GNN explanations for both counterfactual and model-level explanation scenarios. The proposed D4Explainer incorporates generative graph distribution learning into the optimization objective, which accomplishes two goals: 1) generate a collection of diverse counterfactual graphs that conform to the in-distribution property for a given instance, and 2) identify the most discriminative graph patterns that contribute to a specific class prediction, thus serving as model-level explanations. It is worth mentioning that D4Explainer is the first unified framework that combines both counterfactual and model-level explanations. Empirical evaluations conducted on synthetic and real-world datasets provide compelling evidence of the state-of-the-art performance achieved by D4Explainer in terms of explanation accuracy, faithfulness, diversity, and robustness.
Abstract
This article describes an improved brute-force solving strategy for Quadratic Unconstrained Binary Optimization (QUBO) problems that is faster than naive approaches and easily parallelizable. It exploits the Gray code ordering of natural numbers to allow for a more efficient evaluation of the QUBO objective function. The implementation in Python is discussed in detail, and an additional C implementation is provided.
Volterra black-box models identification methods: direct collocation vs least squares
Authors: Authors: Denis Sidorov, Aleksandr Tynda, Vladislav Muratov, Eugeny Yanitsky
Subjects: Numerical Analysis (math.NA); Systems and Control (eess.SY)
Abstract
The Volterra integral-functional series is the classic approach for nonlinear black box dynamical systems modeling. It is widely employed in many domains including radiophysics, aerodynamics, electronic and electrical engineering and many other. Identifying the time-varying functional parameters, also known as Volterra kernels, poses a difficulty due to the curse of dimensionality. This refers to the exponential growth in the number of model parameters as the complexity of the input-output response increases. The least squares method (LSM) is widely acknowledged as the standard approach for tackling the issue of identifying parameters. Unfortunately, the LSM suffers with many drawbacks such as the sensitivity to outliers causing biased estimation, multicollinearity, overfitting and inefficiency with large datasets. This paper presents alternative approach based on direct estimation of the Volterra kernels using the collocation method. Two model examples are studied. It is found that the collocation method presents a promising alternative for optimization, surpassing the traditional least squares method when it comes to the Volterra kernels identification including the case when input and output signals suffer from considerable measurement errors.
A numerical tool for efficient analysis and optimization of offshore wind turbine jacket substructure considering realistic boundary and loading conditions
Abstract
The jacket substructure is a critical component of the offshore wind turbine (OWT) that is the interface between the transition piece at the top and the grouted connection. This paper presents a comprehensive study on the optimization of a jacket substructure to achieve greater cost efficiency while maintain acceptable structural performance. A fast parametric finite element modelling (FEM) approach for jacket substructures was firstly proposed. The generated models took into account realistic loading conditions, including self-weight, wind load and section-dependent wave load, and soil-pile interaction. Parametric studies were conducted afterwards to investigate the trends of the mass and response of the jacket substructure with respect to the variation of geometric and sectional parameters. Optimizations of the jacket substructure were carried out using parametric optimization and numerical genetic algorithm (GA) optimization under three different optimization strategies corresponding to three groups of objective and constraint functions. The trends obtained by parametric analysis were used to guide the parameter selection in parametric optimization, while a rank-based mutation GA was established with the proposed efficient FEM embedded in as the solver to the optimization objective and constraint functions. Parametric optimization gained its advantage in computational efficiency, and the mass reduction were 6.2%,10% and 14.8% for the three strategies respectively. GA optimization was more aggressive as the mass reductions were 16.8%,22.3% and 34.3% for the three strategies, but was relatively more computational intense. The two proposed optimization methods and the three optimization strategies are both expected to be applied in practical engineering design of OWT jacket substructure with good optimization output and high computational efficiency.
Abstract
Text-to-3D generation has made remarkable progress recently, particularly with methods based on Score Distillation Sampling (SDS) that leverages pre-trained 2D diffusion models. While the usage of classifier-free guidance is well acknowledged to be crucial for successful optimization, it is considered an auxiliary trick rather than the most essential component. In this paper, we re-evaluate the role of classifier-free guidance in score distillation and discover a surprising finding: the guidance alone is enough for effective text-to-3D generation tasks. We name this method Classifier Score Distillation (CSD), which can be interpreted as using an implicit classification model for generation. This new perspective reveals new insights for understanding existing techniques. We validate the effectiveness of CSD across a variety of text-to-3D tasks including shape generation, texture synthesis, and shape editing, achieving results superior to those of state-of-the-art methods. Our project page is https://xinyu-andy.github.io/Classifier-Score-Distillation
Regret-Minimization Algorithms for Multi-Agent Cooperative Learning Systems
Authors: Authors: Jialin Yi
Subjects: Machine Learning (cs.LG); Multiagent Systems (cs.MA); Machine Learning (stat.ML)
Abstract
A Multi-Agent Cooperative Learning (MACL) system is an artificial intelligence (AI) system where multiple learning agents work together to complete a common task. Recent empirical success of MACL systems in various domains (e.g. traffic control, cloud computing, robotics) has sparked active research into the design and analysis of MACL systems for sequential decision making problems. One important metric of the learning algorithm for decision making problems is its regret, i.e. the difference between the highest achievable reward and the actual reward that the algorithm gains. The design and development of a MACL system with low-regret learning algorithms can create huge economic values. In this thesis, I analyze MACL systems for different sequential decision making problems. Concretely, the Chapter 3 and 4 investigate the cooperative multi-agent multi-armed bandit problems, with full-information or bandit feedback, in which multiple learning agents can exchange their information through a communication network and the agents can only observe the rewards of the actions they choose. Chapter 5 considers the communication-regret trade-off for online convex optimization in the distributed setting. Chapter 6 discusses how to form high-productive teams for agents based on their unknown but fixed types using adaptive incremental matchings. For the above problems, I present the regret lower bounds for feasible learning algorithms and provide the efficient algorithms to achieve this bound. The regret bounds I present in Chapter 3, 4 and 5 quantify how the regret depends on the connectivity of the communication network and the communication delay, thus giving useful guidance on design of the communication protocol in MACL systems
VDIP-TGV: Blind Image Deconvolution via Variational Deep Image Prior Empowered by Total Generalized Variation
Abstract
Recovering clear images from blurry ones with an unknown blur kernel is a challenging problem. Deep image prior (DIP) proposes to use the deep network as a regularizer for a single image rather than as a supervised model, which achieves encouraging results in the nonblind deblurring problem. However, since the relationship between images and the network architectures is unclear, it is hard to find a suitable architecture to provide sufficient constraints on the estimated blur kernels and clean images. Also, DIP uses the sparse maximum a posteriori (MAP), which is insufficient to enforce the selection of the recovery image. Recently, variational deep image prior (VDIP) was proposed to impose constraints on both blur kernels and recovery images and take the standard deviation of the image into account during the optimization process by the variational principle. However, we empirically find that VDIP struggles with processing image details and tends to generate suboptimal results when the blur kernel is large. Therefore, we combine total generalized variational (TGV) regularization with VDIP in this paper to overcome these shortcomings of VDIP. TGV is a flexible regularization that utilizes the characteristics of partial derivatives of varying orders to regularize images at different scales, reducing oil painting artifacts while maintaining sharp edges. The proposed VDIP-TGV effectively recovers image edges and details by supplementing extra gradient information through TGV. Additionally, this model is solved by the alternating direction method of multipliers (ADMM), which effectively combines traditional algorithms and deep learning methods. Experiments show that our proposed VDIP-TGV surpasses various state-of-the-art models quantitatively and qualitatively.
A General Neural Causal Model for Interactive Recommendation
Authors: Authors: Jialin Liu, Xinyan Su, Peng Zhou, Xiangyu Zhao, Jun Li
Abstract
Survivor bias in observational data leads the optimization of recommender systems towards local optima. Currently most solutions re-mines existing human-system collaboration patterns to maximize longer-term satisfaction by reinforcement learning. However, from the causal perspective, mitigating survivor effects requires answering a counterfactual problem, which is generally unidentifiable and inestimable. In this work, we propose a neural causal model to achieve counterfactual inference. Specifically, we first build a learnable structural causal model based on its available graphical representations which qualitatively characterizes the preference transitions. Mitigation of the survivor bias is achieved though counterfactual consistency. To identify the consistency, we use the Gumbel-max function as structural constrains. To estimate the consistency, we apply reinforcement optimizations, and use Gumbel-Softmax as a trade-off to get a differentiable function. Both theoretical and empirical studies demonstrate the effectiveness of our solution.
Adversarial Batch Inverse Reinforcement Learning: Learn to Reward from Imperfect Demonstration for Interactive Recommendation
Authors: Authors: Jialin Liu, Xinyan Su, Zeyu He, Xiangyu Zhao, Jun Li
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR)
Abstract
Rewards serve as a measure of user satisfaction and act as a limiting factor in interactive recommender systems. In this research, we focus on the problem of learning to reward (LTR), which is fundamental to reinforcement learning. Previous approaches either introduce additional procedures for learning to reward, thereby increasing the complexity of optimization, or assume that user-agent interactions provide perfect demonstrations, which is not feasible in practice. Ideally, we aim to employ a unified approach that optimizes both the reward and policy using compositional demonstrations. However, this requirement presents a challenge since rewards inherently quantify user feedback on-policy, while recommender agents approximate off-policy future cumulative valuation. To tackle this challenge, we propose a novel batch inverse reinforcement learning paradigm that achieves the desired properties. Our method utilizes discounted stationary distribution correction to combine LTR and recommender agent evaluation. To fulfill the compositional requirement, we incorporate the concept of pessimism through conservation. Specifically, we modify the vanilla correction using Bellman transformation and enforce KL regularization to constrain consecutive policy updates. We use two real-world datasets which represent two compositional coverage to conduct empirical studies, the results also show that the proposed method relatively improves both effectiveness (2.3\%) and efficiency (11.53\%)
A Perceptual Shape Loss for Monocular 3D Face Reconstruction
Authors: Authors: Christopher Otto, Prashanth Chandran, Gaspard Zoss, Markus Gross, Paulo Gotardo, Derek Bradley
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Abstract
Monocular 3D face reconstruction is a wide-spread topic, and existing approaches tackle the problem either through fast neural network inference or offline iterative reconstruction of face geometry. In either case carefully-designed energy functions are minimized, commonly including loss terms like a photometric loss, a landmark reprojection loss, and others. In this work we propose a new loss function for monocular face capture, inspired by how humans would perceive the quality of a 3D face reconstruction given a particular image. It is widely known that shading provides a strong indicator for 3D shape in the human visual system. As such, our new 'perceptual' shape loss aims to judge the quality of a 3D face estimate using only shading cues. Our loss is implemented as a discriminator-style neural network that takes an input face image and a shaded render of the geometry estimate, and then predicts a score that perceptually evaluates how well the shaded render matches the given image. This 'critic' network operates on the RGB image and geometry render alone, without requiring an estimate of the albedo or illumination in the scene. Furthermore, our loss operates entirely in image space and is thus agnostic to mesh topology. We show how our new perceptual shape loss can be combined with traditional energy terms for monocular 3D face optimization and deep neural network regression, improving upon current state-of-the-art results.
RayDF: Neural Ray-surface Distance Fields with Multi-view Consistency
Abstract
In this paper, we study the problem of continuous 3D shape representations. The majority of existing successful methods are coordinate-based implicit neural representations. However, they are inefficient to render novel views or recover explicit surface points. A few works start to formulate 3D shapes as ray-based neural functions, but the learned structures are inferior due to the lack of multi-view geometry consistency. To tackle these challenges, we propose a new framework called RayDF. It consists of three major components: 1) the simple ray-surface distance field, 2) the novel dual-ray visibility classifier, and 3) a multi-view consistency optimization module to drive the learned ray-surface distances to be multi-view geometry consistent. We extensively evaluate our method on three public datasets, demonstrating remarkable performance in 3D surface point reconstruction on both synthetic and challenging real-world 3D scenes, clearly surpassing existing coordinate-based and ray-based baselines. Most notably, our method achieves a 1000x faster speed than coordinate-based methods to render an 800x800 depth image, showing the superiority of our method for 3D shape representation. Our code and data are available at https://github.com/vLAR-group/RayDF
Mixed coordinate Node link Visualization for Co_authorship Hypergraph Networks
Abstract
We present an algorithmic technique for visualizing the co-authorship networks and other networks modeled with hypergraphs (set systems). As more than two researchers can co-author a paper, a direct representation of the interaction of researchers through their joint works cannot be adequately modeled with direct links between the author-nodes. A hypergraph representation of a co-authorship network treats researchers/authors as nodes and papers as hyperedges (sets of authors). The visualization algorithm that we propose is based on one of the well-studied approaches representing both authors and papers as nodes of different classes. Our approach resembles some known ones like anchored maps but introduces some special techniques for optimizing the vertex positioning. The algorithm involves both continuous (force-directed) optimization and discrete optimization for determining the node coordinates. Moreover, one of the novelties of this work is classifying nodes and links using different colors. This usage has a meaningful purpose that helps the viewer to obtain valuable information from the visualization and increases the readability of the layout. The algorithm is tuned to enable the viewer to answer questions specific to co-authorship network studies.
Optimizing Logical Execution Time Model for Both Determinism and Low Latency
Authors: Authors: Sen Wang, Dong Li, Ashrarul H. Sifat, Shao-Yu Huang, Xuanliang Deng, Changhee Jung, Ryan Williams, Haibo Zeng
Subjects: Systems and Control (eess.SY); Operating Systems (cs.OS); Symbolic Computation (cs.SC)
Abstract
The Logical Execution Time (LET) programming model has recently received considerable attention, particularly because of its timing and dataflow determinism. In LET, task computation appears always to take the same amount of time (called the task's LET interval), and the task reads (resp. writes) at the beginning (resp. end) of the interval. Compared to other communication mechanisms, such as implicit communication and Dynamic Buffer Protocol (DBP), LET performs worse on many metrics, such as end-to-end latency (including reaction time and data age) and time disparity jitter. Compared with the default LET setting, the flexible LET (fLET) model shrinks the LET interval while still guaranteeing schedulability by introducing the virtual offset to defer the read operation and using the virtual deadline to move up the write operation. Therefore, fLET has the potential to significantly improve the end-to-end timing performance while keeping the benefits of deterministic behavior on timing and dataflow. To fully realize the potential of fLET, we consider the problem of optimizing the assignments of its virtual offsets and deadlines. We propose new abstractions to describe the task communication pattern and new optimization algorithms to explore the solution space efficiently. The algorithms leverage the linearizability of communication patterns and utilize symbolic operations to achieve efficient optimization while providing a theoretical guarantee. The framework supports optimizing multiple performance metrics and guarantees bounded suboptimality when optimizing end-to-end latency. Experimental results show that our optimization algorithms improve upon the default LET and its existing extensions and significantly outperform implicit communication and DBP in terms of various metrics, such as end-to-end latency, time disparity, and its jitter.
Learn to Categorize or Categorize to Learn? Self-Coding for Generalized Category Discovery
Authors: Authors: Sarah Rastegar, Hazel Doughty, Cees G. M. Snoek
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (cs.LG)
Abstract
In the quest for unveiling novel categories at test time, we confront the inherent limitations of traditional supervised recognition models that are restricted by a predefined category set. While strides have been made in the realms of self-supervised and open-world learning towards test-time category discovery, a crucial yet often overlooked question persists: what exactly delineates a \textit{category}? In this paper, we conceptualize a \textit{category} through the lens of optimization, viewing it as an optimal solution to a well-defined problem. Harnessing this unique conceptualization, we propose a novel, efficient and self-supervised method capable of discovering previously unknown categories at test time. A salient feature of our approach is the assignment of minimum length category codes to individual data instances, which encapsulates the implicit category hierarchy prevalent in real-world datasets. This mechanism affords us enhanced control over category granularity, thereby equipping our model to handle fine-grained categories adeptly. Experimental evaluations, bolstered by state-of-the-art benchmark comparisons, testify to the efficacy of our solution in managing unknown categories at test time. Furthermore, we fortify our proposition with a theoretical foundation, providing proof of its optimality. Our code is available at: \url{https://github.com/SarahRastegar/InfoSieve}.
CustomNet: Zero-shot Object Customization with Variable-Viewpoints in Text-to-Image Diffusion Models
Abstract
Incorporating a customized object into image generation presents an attractive feature in text-to-image generation. However, existing optimization-based and encoder-based methods are hindered by drawbacks such as time-consuming optimization, insufficient identity preservation, and a prevalent copy-pasting effect. To overcome these limitations, we introduce CustomNet, a novel object customization approach that explicitly incorporates 3D novel view synthesis capabilities into the object customization process. This integration facilitates the adjustment of spatial position relationships and viewpoints, yielding diverse outputs while effectively preserving object identity. Moreover, we introduce delicate designs to enable location control and flexible background control through textual descriptions or specific user-defined images, overcoming the limitations of existing 3D novel view synthesis methods. We further leverage a dataset construction pipeline that can better handle real-world objects and complex backgrounds. Equipped with these designs, our method facilitates zero-shot object customization without test-time optimization, offering simultaneous control over the viewpoints, location, and background. As a result, our CustomNet ensures enhanced identity preservation and generates diverse, harmonious outputs.
DEFT: Dexterous Fine-Tuning for Real-World Hand Policies
Abstract
Dexterity is often seen as a cornerstone of complex manipulation. Humans are able to perform a host of skills with their hands, from making food to operating tools. In this paper, we investigate these challenges, especially in the case of soft, deformable objects as well as complex, relatively long-horizon tasks. However, learning such behaviors from scratch can be data inefficient. To circumvent this, we propose a novel approach, DEFT (DExterous Fine-Tuning for Hand Policies), that leverages human-driven priors, which are executed directly in the real world. In order to improve upon these priors, DEFT involves an efficient online optimization procedure. With the integration of human-based learning and online fine-tuning, coupled with a soft robotic hand, DEFT demonstrates success across various tasks, establishing a robust, data-efficient pathway toward general dexterous manipulation. Please see our website at https://dexterous-finetuning.github.io for video results.
Gradient-Based Dovetail Joint Shape Optimization for Stiffness
Authors: Authors: Xingyuan Sun, Chenyue Cai, Ryan P. Adams, Szymon Rusinkiewicz
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Abstract
It is common to manufacture an object by decomposing it into parts that can be assembled. This decomposition is often required by size limits of the machine, the complex structure of the shape, etc. To make it possible to easily assemble the final object, it is often desirable to design geometry that enables robust connections between the subcomponents. In this project, we study the task of dovetail-joint shape optimization for stiffness using gradient-based optimization. This optimization requires a differentiable simulator that is capable of modeling the contact between the two parts of a joint, making it possible to reason about the gradient of the stiffness with respect to shape parameters. Our simulation approach uses a penalty method that alternates between optimizing each side of the joint, using the adjoint method to compute gradients. We test our method by optimizing the joint shapes in three different joint shape spaces, and evaluate optimized joint shapes in both simulation and real-world tests. The experiments show that optimized joint shapes achieve higher stiffness, both synthetically and in real-world tests.
Keyword: adam
Correlation Aware Sparsified Mean Estimation Using Random Projection
Abstract
We study the problem of communication-efficient distributed vector mean estimation, a commonly used subroutine in distributed optimization and Federated Learning (FL). Rand-$k$ sparsification is a commonly used technique to reduce communication cost, where each client sends $k < d$ of its coordinates to the server. However, Rand-$k$ is agnostic to any correlations, that might exist between clients in practical scenarios. The recently proposed Rand-$k$-Spatial estimator leverages the cross-client correlation information at the server to improve Rand-$k$'s performance. Yet, the performance of Rand-$k$-Spatial is suboptimal. We propose the Rand-Proj-Spatial estimator with a more flexible encoding-decoding procedure, which generalizes the encoding of Rand-$k$ by projecting the client vectors to a random $k$-dimensional subspace. We utilize Subsampled Randomized Hadamard Transform (SRHT) as the projection matrix and show that Rand-Proj-Spatial with SRHT outperforms Rand-$k$-Spatial, using the correlation information more efficiently. Furthermore, we propose an approach to incorporate varying degrees of correlation and suggest a practical variant of Rand-Proj-Spatial when the correlation information is not available to the server. Experiments on real-world distributed optimization tasks showcase the superior performance of Rand-Proj-Spatial compared to Rand-$k$-Spatial and other more sophisticated sparsification techniques.
Fast Trainable Projection for Robust Fine-Tuning
Authors: Authors: Junjiao Tian, Yen-Cheng Liu, James Seale Smith, Zsolt Kira
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Robust fine-tuning aims to achieve competitive in-distribution (ID) performance while maintaining the out-of-distribution (OOD) robustness of a pre-trained model when transferring it to a downstream task. Recently, projected gradient descent has been successfully used in robust fine-tuning by constraining the deviation from the initialization of the fine-tuned model explicitly through projection. However, algorithmically, two limitations prevent this method from being adopted more widely, scalability and efficiency. In this paper, we propose a new projection-based fine-tuning algorithm, Fast Trainable Projection (FTP) for computationally efficient learning of per-layer projection constraints, resulting in an average $35\%$ speedup on our benchmarks compared to prior works. FTP can be combined with existing optimizers such as AdamW, and be used in a plug-and-play fashion. Finally, we show that FTP is a special instance of hyper-optimizers that tune the hyper-parameters of optimizers in a learnable manner through nested differentiation. Empirically, we show superior robustness on OOD datasets, including domain shifts and natural corruptions, across four different vision tasks with five different pre-trained models. Additionally, we demonstrate that FTP is broadly applicable and beneficial to other learning scenarios such as low-label and continual learning settings thanks to its easy adaptability. The code will be available at https://github.com/GT-RIPL/FTP.git.
Keyword: gradient
Small Language Models Fine-tuned to Coordinate Larger Language Models improve Complex Reasoning
Abstract
Large Language Models (LLMs) prompted to generate chain-of-thought (CoT) exhibit impressive reasoning capabilities. Recent attempts at prompt decomposition toward solving complex, multi-step reasoning problems depend on the ability of the LLM to simultaneously decompose and solve the problem. A significant disadvantage is that foundational LLMs are typically not available for fine-tuning, making adaptation computationally prohibitive. We believe (and demonstrate) that problem decomposition and solution generation are distinct capabilites, better addressed in separate modules, than by one monolithic LLM. We introduce DaSLaM, which uses a decomposition generator to decompose complex problems into subproblems that require fewer reasoning steps. These subproblems are answered by a solver. We use a relatively small (13B parameters) LM as the decomposition generator, which we train using policy gradient optimization to interact with a solver LM (regarded as black-box) and guide it through subproblems, thereby rendering our method solver-agnostic. Evaluation on multiple different reasoning datasets reveal that with our method, a 175 billion parameter LM (text-davinci-003) can produce competitive or even better performance, compared to its orders-of-magnitude larger successor, GPT-4. Additionally, we show that DaSLaM is not limited by the solver's capabilities as a function of scale; e.g., solver LMs with diverse sizes give significant performance improvement with our solver-agnostic decomposition technique. Exhaustive ablation studies evince the superiority of our modular finetuning technique over exorbitantly large decomposer LLMs, based on prompting alone.
A general learning scheme for classical and quantum Ising machines
Authors: Authors: Ludwig Schmid, Enrico Zardini, Davide Pastorello
Abstract
An Ising machine is any hardware specifically designed for finding the ground state of the Ising model. Relevant examples are coherent Ising machines and quantum annealers. In this paper, we propose a new machine learning model that is based on the Ising structure and can be efficiently trained using gradient descent. We provide a mathematical characterization of the training process, which is based upon optimizing a loss function whose partial derivatives are not explicitly calculated but estimated by the Ising machine itself. Moreover, we present some experimental results on the training and execution of the proposed learning model. These results point out new possibilities offered by Ising machines for different learning tasks. In particular, in the quantum realm, the quantum resources are used for both the execution and the training of the model, providing a promising perspective in quantum machine learning.
End-to-end Feature Selection Approach for Learning Skinny Trees
Abstract
Joint feature selection and tree ensemble learning is a challenging task. Popular tree ensemble toolkits e.g., Gradient Boosted Trees and Random Forests support feature selection post-training based on feature importances, which are known to be misleading, and can significantly hurt performance. We propose Skinny Trees: a toolkit for feature selection in tree ensembles, such that feature selection and tree ensemble learning occurs simultaneously. It is based on an end-to-end optimization approach that considers feature selection in differentiable trees with Group $\ell_0 - \ell_2$ regularization. We optimize with a first-order proximal method and present convergence guarantees for a non-convex and non-smooth objective. Interestingly, dense-to-sparse regularization scheduling can lead to more expressive and sparser tree ensembles than vanilla proximal method. On 15 synthetic and real-world datasets, Skinny Trees can achieve $1.5\times$ - $620\times$ feature compression rates, leading up to $10\times$ faster inference over dense trees, without any loss in performance. Skinny Trees lead to superior feature selection than many existing toolkits e.g., in terms of AUC performance for $25\%$ feature budget, Skinny Trees outperforms LightGBM by $10.2\%$ (up to $37.7\%$), and Random Forests by $3\%$ (up to $12.5\%$).
Visual Explanations via Iterated Integrated Attributions
Abstract
We introduce Iterated Integrated Attributions (IIA) - a generic method for explaining the predictions of vision models. IIA employs iterative integration across the input image, the internal representations generated by the model, and their gradients, yielding precise and focused explanation maps. We demonstrate the effectiveness of IIA through comprehensive evaluations across various tasks, datasets, and network architectures. Our results showcase that IIA produces accurate explanation maps, outperforming other state-of-the-art explanation techniques.
Domain Generalisation via Risk Distribution Matching
Authors: Authors: Toan Nguyen, Kien Do, Bao Duong, Thin Nguyen
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Abstract
We propose a novel approach for domain generalisation (DG) leveraging risk distributions to characterise domains, thereby achieving domain invariance. In our findings, risk distributions effectively highlight differences between training domains and reveal their inherent complexities. In testing, we may observe similar, or potentially intensifying in magnitude, divergences between risk distributions. Hence, we propose a compelling proposition: Minimising the divergences between risk distributions across training domains leads to robust invariance for DG. The key rationale behind this concept is that a model, trained on domain-invariant or stable features, may consistently produce similar risk distributions across various domains. Building upon this idea, we propose Risk Distribution Matching (RDM). Using the maximum mean discrepancy (MMD) distance, RDM aims to minimise the variance of risk distributions across training domains. However, when the number of domains increases, the direct optimisation of variance leads to linear growth in MMD computations, resulting in inefficiency. Instead, we propose an approximation that requires only one MMD computation, by aligning just two distributions: that of the worst-case domain and the aggregated distribution from all domains. Notably, this method empirically outperforms optimising distributional variance while being computationally more efficient. Unlike conventional DG matching algorithms, RDM stands out for its enhanced efficacy by concentrating on scalar risk distributions, sidestepping the pitfalls of high-dimensional challenges seen in feature or gradient matching. Our extensive experiments on standard benchmark datasets demonstrate that RDM shows superior generalisation capability over state-of-the-art DG methods.
Device-Edge Cooperative Fine-Tuning of Foundation Models as a 6G Service
Authors: Authors: Hai Wu, Xu Chen, Kaibin Huang
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)
Abstract
Foundation models (FoMos), referring to large-scale AI models, possess human-like capabilities and are able to perform competitively in the domain of human intelligence. The breakthrough in FoMos has inspired researchers to deploy such models in the sixth-generation (6G) mobile networks for automating a broad range of tasks in next-generation mobile applications. While the sizes of FoMos are reaching their peaks, their next phase is expected to focus on fine-tuning the models to specific downstream tasks. This inspires us to propose the vision of FoMo fine-tuning as a 6G service. Its key feature is the exploitation of existing parameter-efficient fine-tuning (PEFT) techniques to tweak only a small fraction of model weights for a FoMo to become customized for a specific task. To materialize the said vision, we survey the state-of-the-art PEFT and then present a novel device-edge fine-tuning (DEFT) framework for providing efficient and privacy-preserving fine-tuning services at the 6G network edge. The framework consists of the following comprehensive set of techniques: 1) Control of fine-tuning parameter sizes in different transformer blocks of a FoMo; 2) Over-the-air computation for realizing neural connections in DEFT; 3) Federated DEFT in a multi-device system by downloading a FoMo emulator or gradients; 4) On-the-fly prompt-ensemble tuning; 5) Device-to-device prompt transfer among devices. Experiments are conducted using pre-trained FoMos with up to 11 billion parameters to demonstrate the effectiveness of DEFT techniques. The article is concluded by presenting future research opportunities.
Abstract
Multi-objective optimization is a type of decision making problems where multiple conflicting objectives are optimized. We study offline optimization of multi-objective policies from data collected by an existing policy. We propose a pessimistic estimator for the multi-objective policy values that can be easily plugged into existing formulas for hypervolume computation and optimized. The estimator is based on inverse propensity scores (IPS), and improves upon a naive IPS estimator in both theory and experiments. Our analysis is general, and applies beyond our IPS estimators and methods for optimizing them. The pessimistic estimator can be optimized by policy gradients and performs well in all of our experiments.
Explainable Modeling for Wind Power Forecasting: A Glass-Box Approach with Exceptional Accuracy
Authors: Authors: Wenlong Liao, Fernando Porté-Agel, Jiannong Fang, Birgitte Bak-Jensen, Guangchun Ruan, Zhe Yang
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
Abstract
Machine learning models (e.g., neural networks) achieve high accuracy in wind power forecasting, but they are usually regarded as black boxes that lack interpretability. To address this issue, the paper proposes a glass-box approach that combines exceptional accuracy with transparency for wind power forecasting. Specifically, advanced artificial intelligence methods (e.g., gradient boosting) are innovatively employed to create shape functions within the forecasting model. These functions effectively map the intricate non-linear relationships between wind power output and input features. Furthermore, the forecasting model is enriched by incorporating interaction terms that adeptly capture interdependencies and synergies among the input features. Simulation results show that the proposed glass-box approach effectively interprets the results of wind power forecasting from both global and instance perspectives. Besides, it outperforms most benchmark models and exhibits comparable performance to the best-performing neural networks. This dual strength of transparency and high accuracy positions the proposed glass-box approach as a compelling choice for reliable wind power forecasting.
On Training Implicit Meta-Learning With Applications to Inductive Weighing in Consistency Regularization
Abstract
Meta-learning that uses implicit gradient have provided an exciting alternative to standard techniques which depend on the trajectory of the inner loop training. Implicit meta-learning (IML), however, require computing $2^{nd}$ order gradients, particularly the Hessian which is impractical to compute for modern deep learning models. Various approximations for the Hessian were proposed but a systematic comparison of their compute cost, stability, generalization of solution found and estimation accuracy were largely overlooked. In this study, we start by conducting a systematic comparative analysis of the various approximation methods and their effect when incorporated into IML training routines. We establish situations where catastrophic forgetting is exhibited in IML and explain their cause in terms of the inability of the approximations to estimate the curvature at convergence points. Sources of IML training instability are demonstrated and remedied. A detailed analysis of the effeciency of various inverse Hessian-vector product approximation methods is also provided. Subsequently, we use the insights gained to propose and evaluate a novel semi-supervised learning algorithm that learns to inductively weigh consistency regularization losses. We show how training a "Confidence Network" to extract domain specific features can learn to up-weigh useful images and down-weigh out-of-distribution samples. Results outperform the baseline FixMatch performance.
Optimization of utility-based shortfall risk: A non-asymptotic viewpoint
Authors: Authors: Sumedh Gupte, Prashanth L. A., Sanjay P. Bhat
Abstract
We consider the problems of estimation and optimization of utility-based shortfall risk (UBSR), which is a popular risk measure in finance. In the context of UBSR estimation, we derive a non-asymptotic bound on the mean-squared error of the classical sample average approximation (SAA) of UBSR. Next, in the context of UBSR optimization, we derive an expression for the UBSR gradient under a smooth parameterization. This expression is a ratio of expectations, both of which involve the UBSR. We use SAA for the numerator as well as denominator in the UBSR gradient expression to arrive at a biased gradient estimator. We derive non-asymptotic bounds on the estimation error, which show that our gradient estimator is asymptotically unbiased. We incorporate the aforementioned gradient estimator into a stochastic gradient (SG) algorithm for UBSR optimization. Finally, we derive non-asymptotic bounds that quantify the rate of convergence of our SG algorithm for UBSR optimization.
High-probability Convergence Bounds for Nonlinear Stochastic Gradient Descent Under Heavy-tailed Noise
Authors: Authors: Aleksandar Armacki, Pranay Sharma, Gauri Joshi, Dragana Bajovic, Dusan Jakovetic, Soummya Kar
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Statistics Theory (math.ST); Machine Learning (stat.ML)
Abstract
Several recent works have studied the convergence \textit{in high probability} of stochastic gradient descent (SGD) and its clipped variant. Compared to vanilla SGD, clipped SGD is practically more stable and has the additional theoretical benefit of logarithmic dependence on the failure probability. However, the convergence of other practical nonlinear variants of SGD, e.g., sign SGD, quantized SGD and normalized SGD, that achieve improved communication efficiency or accelerated convergence is much less understood. In this work, we study the convergence bounds \textit{in high probability} of a broad class of nonlinear SGD methods. For strongly convex loss functions with Lipschitz continuous gradients, we prove a logarithmic dependence on the failure probability, even when the noise is heavy-tailed. Strictly more general than the results for clipped SGD, our results hold for any nonlinearity with bounded (component-wise or joint) outputs, such as clipping, normalization, and quantization. Further, existing results with heavy-tailed noise assume bounded $\eta$-th central moments, with $\eta \in (1,2]$. In contrast, our refined analysis works even for $\eta=1$, strictly relaxing the noise moment assumptions in the literature.
Differentiable Learning of Generalized Structured Matrices for Efficient Deep Neural Networks
Authors: Authors: Changwoo Lee, Hun-Seok Kim
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV); Signal Processing (eess.SP)
Abstract
This paper investigates efficient deep neural networks (DNNs) to replace dense unstructured weight matrices with structured ones that possess desired properties. The challenge arises because the optimal weight matrix structure in popular neural network models is obscure in most cases and may vary from layer to layer even in the same network. Prior structured matrices proposed for efficient DNNs were mostly hand-crafted without a generalized framework to systematically learn them. To address this issue, we propose a generalized and differentiable framework to learn efficient structures of weight matrices by gradient descent. We first define a new class of structured matrices that covers a wide range of structured matrices in the literature by adjusting the structural parameters. Then, the frequency-domain differentiable parameterization scheme based on the Gaussian-Dirichlet kernel is adopted to learn the structural parameters by proximal gradient descent. Finally, we introduce an effective initialization method for the proposed scheme. Our method learns efficient DNNs with structured matrices, achieving lower complexity and/or higher performance than prior approaches that employ low-rank, block-sparse, or block-low-rank matrices.
D2NO: Efficient Handling of Heterogeneous Input Function Spaces with Distributed Deep Neural Operators
Authors: Authors: Zecheng Zhang, Christian Moya, Lu Lu, Guang Lin, Hayden Schaeffer
Abstract
Neural operators have been applied in various scientific fields, such as solving parametric partial differential equations, dynamical systems with control, and inverse problems. However, challenges arise when dealing with input functions that exhibit heterogeneous properties, requiring multiple sensors to handle functions with minimal regularity. To address this issue, discretization-invariant neural operators have been used, allowing the sampling of diverse input functions with different sensor locations. However, existing frameworks still require an equal number of sensors for all functions. In our study, we propose a novel distributed approach to further relax the discretization requirements and solve the heterogeneous dataset challenges. Our method involves partitioning the input function space and processing individual input functions using independent and separate neural networks. A centralized neural network is used to handle shared information across all output functions. This distributed methodology reduces the number of gradient descent back-propagation steps, improving efficiency while maintaining accuracy. We demonstrate that the corresponding neural network is a universal approximator of continuous nonlinear operators and present four numerical examples to validate its performance.
Ever Evolving Evaluator (EV3): Towards Flexible and Reliable Meta-Optimization for Knowledge Distillation
Authors: Authors: Li Ding, Masrour Zoghi, Guy Tennenholtz, Maryam Karimzadehgan
Abstract
We introduce EV3, a novel meta-optimization framework designed to efficiently train scalable machine learning models through an intuitive explore-assess-adapt protocol. In each iteration of EV3, we explore various model parameter updates, assess them using pertinent evaluation methods, and adapt the model based on the optimal updates and previous progress history. EV3 offers substantial flexibility without imposing stringent constraints like differentiability on the key objectives relevant to the tasks of interest. Moreover, this protocol welcomes updates with biased gradients and allows for the use of a diversity of losses and optimizers. Additionally, in scenarios with multiple objectives, it can be used to dynamically prioritize tasks. With inspiration drawn from evolutionary algorithms, meta-learning, and neural architecture search, we investigate an application of EV3 to knowledge distillation. Our experimental results illustrate EV3's capability to safely explore model spaces, while hinting at its potential applicability across numerous domains due to its inherent flexibility and adaptability.
Estimating the Rate-Distortion Function by Wasserstein Gradient Descent
Authors: Authors: Yibo Yang, Stephan Eckstein, Marcel Nutz, Stephan Mandt
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Applications (stat.AP); Machine Learning (stat.ML)
Abstract
In the theory of lossy compression, the rate-distortion (R-D) function $R(D)$ describes how much a data source can be compressed (in bit-rate) at any given level of fidelity (distortion). Obtaining $R(D)$ for a given data source establishes the fundamental performance limit for all compression algorithms. We propose a new method to estimate $R(D)$ from the perspective of optimal transport. Unlike the classic Blahut--Arimoto algorithm which fixes the support of the reproduction distribution in advance, our Wasserstein gradient descent algorithm learns the support of the optimal reproduction distribution by moving particles. We prove its local convergence and analyze the sample complexity of our R-D estimator based on a connection to entropic optimal transport. Experimentally, we obtain comparable or tighter bounds than state-of-the-art neural network methods on low-rate sources while requiring considerably less tuning and computation effort. We also highlight a connection to maximum-likelihood deconvolution and introduce a new class of sources that can be used as test cases with known solutions to the R-D problem.
Hyperbolic Graph Neural Networks at Scale: A Meta Learning Approach
Authors: Authors: Nurendra Choudhary, Nikhil Rao, Chandan K. Reddy
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Abstract
The progress in hyperbolic neural networks (HNNs) research is hindered by their absence of inductive bias mechanisms, which are essential for generalizing to new tasks and facilitating scalable learning over large datasets. In this paper, we aim to alleviate these issues by learning generalizable inductive biases from the nodes' local subgraph and transfer them for faster learning over new subgraphs with a disjoint set of nodes, edges, and labels in a few-shot setting. We introduce a novel method, Hyperbolic GRAph Meta Learner (H-GRAM), that, for the tasks of node classification and link prediction, learns transferable information from a set of support local subgraphs in the form of hyperbolic meta gradients and label hyperbolic protonets to enable faster learning over a query set of new tasks dealing with disjoint subgraphs. Furthermore, we show that an extension of our meta-learning framework also mitigates the scalability challenges seen in HNNs faced by existing approaches. Our comparative analysis shows that H-GRAM effectively learns and transfers information in multiple challenging few-shot settings compared to other state-of-the-art baselines. Additionally, we demonstrate that, unlike standard HNNs, our approach is able to scale over large graph datasets and improve performance over its Euclidean counterparts.
Posterior Sampling with Delayed Feedback for Reinforcement Learning with Linear Function Approximation
Authors: Authors: Nikki Lijing Kuang, Ming Yin, Mengdi Wang, Yu-Xiang Wang, Yi-An Ma
Abstract
Recent studies in reinforcement learning (RL) have made significant progress by leveraging function approximation to alleviate the sample complexity hurdle for better performance. Despite the success, existing provably efficient algorithms typically rely on the accessibility of immediate feedback upon taking actions. The failure to account for the impact of delay in observations can significantly degrade the performance of real-world systems due to the regret blow-up. In this work, we tackle the challenge of delayed feedback in RL with linear function approximation by employing posterior sampling, which has been shown to empirically outperform the popular UCB algorithms in a wide range of regimes. We first introduce Delayed-PSVI, an optimistic value-based algorithm that effectively explores the value function space via noise perturbation with posterior sampling. We provide the first analysis for posterior sampling algorithms with delayed feedback in RL and show our algorithm achieves $\widetilde{O}(\sqrt{d^3H^3 T} + d^2H^2 E[\tau])$ worst-case regret in the presence of unknown stochastic delays. Here $E[\tau]$ is the expected delay. To further improve its computational efficiency and to expand its applicability in high-dimensional RL problems, we incorporate a gradient-based approximate sampling scheme via Langevin dynamics for Delayed-LPSVI, which maintains the same order-optimal regret guarantee with $\widetilde{O}(dHK)$ computational cost. Empirical evaluations are performed to demonstrate the statistical and computational efficacy of our algorithms.
Implicit Bias of Gradient Descent for Two-layer ReLU and Leaky ReLU Networks on Nearly-orthogonal Data
Abstract
The implicit bias towards solutions with favorable properties is believed to be a key reason why neural networks trained by gradient-based optimization can generalize well. While the implicit bias of gradient flow has been widely studied for homogeneous neural networks (including ReLU and leaky ReLU networks), the implicit bias of gradient descent is currently only understood for smooth neural networks. Therefore, implicit bias in non-smooth neural networks trained by gradient descent remains an open question. In this paper, we aim to answer this question by studying the implicit bias of gradient descent for training two-layer fully connected (leaky) ReLU neural networks. We showed that when the training data are nearly-orthogonal, for leaky ReLU activation function, gradient descent will find a network with a stable rank that converges to $1$, whereas for ReLU activation function, gradient descent will find a neural network with a stable rank that is upper bounded by a constant. Additionally, we show that gradient descent will find a neural network such that all the training data points have the same normalized margin asymptotically. Experiments on both synthetic and real data backup our theoretical findings.
Machine Learning Algorithms to Predict Chess960 Result and Develop Opening Themes
Abstract
This work focuses on the analysis of Chess 960, also known as Fischer Random Chess, a variant of traditional chess where the starting positions of the pieces are randomized. The study aims to predict the game outcome using machine learning techniques and develop an opening theme for each starting position. The first part of the analysis utilizes machine learning models to predict the game result based on certain moves in each position. The methodology involves segregating raw data from .pgn files into usable formats and creating datasets comprising approximately 500 games for each starting position. Three machine learning algorithms -- KNN Clustering, Random Forest, and Gradient Boosted Trees -- have been used to predict the game outcome. To establish an opening theme, the board is divided into five regions: center, white kingside, white queenside, black kingside, and black queenside. The data from games played by top engines in all 960 positions is used to track the movement of pieces in the opening. By analysing the change in the number of pieces in each region at specific moves, the report predicts the region towards which the game is developing. These models provide valuable insights into predicting game outcomes and understanding the opening theme in Chess 960.
TIC-TAC: A Framework To Learn And Evaluate Your Covariance
Abstract
We study the problem of unsupervised heteroscedastic covariance estimation, where the goal is to learn the multivariate target distribution $\mathcal{N}(y, \Sigmay | x )$ given an observation $x$. This problem is particularly challenging as $\Sigma{y}$ varies for different samples (heteroscedastic) and no annotation for the covariance is available (unsupervised). Typically, state-of-the-art methods predict the mean $f{\theta}(x)$ and covariance $\textrm{Cov}(f{\theta}(x))$ of the target distribution through two neural networks trained using the negative log-likelihood. This raises two questions: (1) Does the predicted covariance truly capture the randomness of the predicted mean? (2) In the absence of ground-truth annotation, how can we quantify the performance of covariance estimation? We address (1) by deriving TIC: Taylor Induced Covariance, which captures the randomness of the multivariate $f_{\theta}(x)$ by incorporating its gradient and curvature around $x$ through the second order Taylor polynomial. Furthermore, we tackle (2) by introducing TAC: Task Agnostic Correlations, a metric which leverages conditioning of the normal distribution to evaluate the covariance. We verify the effectiveness of TIC through multiple experiments spanning synthetic (univariate, multivariate) and real-world datasets (UCI Regression, LSP, and MPII Human Pose Estimation). Our experiments show that TIC outperforms state-of-the-art in accurately learning the covariance, as quantified through TAC.
Blacksmith: Fast Adversarial Training of Vision Transformers via a Mixture of Single-step and Multi-step Methods
Authors: Authors: Mahdi Salmani, Alireza Dehghanpour Farashah, Mohammad Azizmalayeri, Mahdi Amiri, Navid Eslami, Mohammad Taghi Manzuri, Mohammad Hossein Rohban
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Despite the remarkable success achieved by deep learning algorithms in various domains, such as computer vision, they remain vulnerable to adversarial perturbations. Adversarial Training (AT) stands out as one of the most effective solutions to address this issue; however, single-step AT can lead to Catastrophic Overfitting (CO). This scenario occurs when the adversarially trained network suddenly loses robustness against multi-step attacks like Projected Gradient Descent (PGD). Although several approaches have been proposed to address this problem in Convolutional Neural Networks (CNNs), we found out that they do not perform well when applied to Vision Transformers (ViTs). In this paper, we propose Blacksmith, a novel training strategy to overcome the CO problem, specifically in ViTs. Our approach utilizes either of PGD-2 or Fast Gradient Sign Method (FGSM) randomly in a mini-batch during the adversarial training of the neural network. This will increase the diversity of our training attacks, which could potentially mitigate the CO issue. To manage the increased training time resulting from this combination, we craft the PGD-2 attack based on only the first half of the layers, while FGSM is applied end-to-end. Through our experiments, we demonstrate that our novel method effectively prevents CO, achieves PGD-2 level performance, and outperforms other existing techniques including N-FGSM, which is the state-of-the-art method in fast training for CNNs.
NP-SBFL: Bridging the Gap Between Spectrum-Based Fault Localization and Faulty Neural Pathways Diagnosis
Abstract
Deep learning has revolutionized various real-world applications, but the quality of Deep Neural Networks (DNNs) remains a concern. DNNs are complex and have millions of parameters, making it difficult to determine their contributions to fulfilling a task. Moreover, the behavior of a DNN is highly influenced by the data used during training, making it challenging to collect enough data to exercise all potential DNN behavior under all possible scenarios. This paper proposes a novel NP-SBFL method that adapts spectrum-based fault localization (SBFL) to locate faulty neural pathways. Our method identifies critical neurons using the layer-wise relevance propagation (LRP) technique and determines which critical neurons are faulty. We propose a multi-stage gradient ascent (MGA), an extension of gradient ascent, to effectively activate a sequence of neurons one at a time while maintaining the activation of previous neurons. We evaluated the effectiveness of our method on two commonly used datasets, MNIST and CIFAR-10, two baselines DeepFault and NP-SBFL-GA, and three suspicious neuron measures, Tarantula, Ochiai, and Barinel. The empirical results showed that NP-SBFL-MGA is statistically more effective than the baselines at identifying suspicious paths and synthesizing adversarial inputs. Particularly, Tarantula on NP-SBFL-MGA had the highest fault detection rate at 96.75%, surpassing DeepFault on Ochiai (89.90%) and NP-SBFL-GA on Ochiai (60.61%). Our approach also yielded comparable results to the baselines in synthesizing naturalness inputs, and we found a positive correlation between the coverage of critical paths and the number of failed tests in DNN fault localization.
Boosting Decision-Based Black-Box Adversarial Attack with Gradient Priors
Authors: Authors: Han Liu, Xingshuo Huang, Xiaotong Zhang, Qimai Li, Fenglong Ma, Wei Wang, Hongyang Chen, Hong Yu, Xianchao Zhang
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Decision-based methods have shown to be effective in black-box adversarial attacks, as they can obtain satisfactory performance and only require to access the final model prediction. Gradient estimation is a critical step in black-box adversarial attacks, as it will directly affect the query efficiency. Recent works have attempted to utilize gradient priors to facilitate score-based methods to obtain better results. However, these gradient priors still suffer from the edge gradient discrepancy issue and the successive iteration gradient direction issue, thus are difficult to simply extend to decision-based methods. In this paper, we propose a novel Decision-based Black-box Attack framework with Gradient Priors (DBA-GP), which seamlessly integrates the data-dependent gradient prior and time-dependent prior into the gradient estimation procedure. First, by leveraging the joint bilateral filter to deal with each random perturbation, DBA-GP can guarantee that the generated perturbations in edge locations are hardly smoothed, i.e., alleviating the edge gradient discrepancy, thus remaining the characteristics of the original image as much as possible. Second, by utilizing a new gradient updating strategy to automatically adjust the successive iteration gradient direction, DBA-GP can accelerate the convergence speed, thus improving the query efficiency. Extensive experiments have demonstrated that the proposed method outperforms other strong baselines significantly.
Escaping Saddle Points in Heterogeneous Federated Learning via Distributed SGD with Communication Compression
Authors: Authors: Sijin Chen, Zhize Li, Yuejie Chi
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)
Abstract
We consider the problem of finding second-order stationary points of heterogeneous federated learning (FL). Previous works in FL mostly focus on first-order convergence guarantees, which do not rule out the scenario of unstable saddle points. Meanwhile, it is a key bottleneck of FL to achieve communication efficiency without compensating the learning accuracy, especially when local data are highly heterogeneous across different clients. Given this, we propose a novel algorithm Power-EF that only communicates compressed information via a novel error-feedback scheme. To our knowledge, Power-EF is the first distributed and compressed SGD algorithm that provably escapes saddle points in heterogeneous FL without any data homogeneity assumptions. In particular, Power-EF improves to second-order stationary points after visiting first-order (possibly saddle) points, using additional gradient queries and communication rounds only of almost the same order required by first-order convergence, and the convergence rate exhibits a linear speedup in terms of the number of workers. Our theory improves/recovers previous results, while extending to much more tolerant settings on the local data. Numerical experiments are provided to complement the theory.
Expanding memory in recurrent spiking networks
Authors: Authors: Ismael Balafrej, Fabien Alibart, Jean Rouat
Subjects: Neural and Evolutionary Computing (cs.NE)
Abstract
Recurrent spiking neural networks (RSNNs) are notoriously difficult to train because of the vanishing gradient problem that is enhanced by the binary nature of the spikes. In this paper, we review the ability of the current state-of-the-art RSNNs to solve long-term memory tasks, and show that they have strong constraints both in performance, and for their implementation on hardware analog neuromorphic processors. We present a novel spiking neural network that circumvents these limitations. Our biologically inspired neural network uses synaptic delays, branching factor regularization and a novel surrogate derivative for the spiking function. The proposed network proves to be more successful in using the recurrent connections on memory tasks.
Reward Finetuning for Faster and More Accurate Unsupervised Object Discovery
Abstract
Recent advances in machine learning have shown that Reinforcement Learning from Human Feedback (RLHF) can improve machine learning models and align them with human preferences. Although very successful for Large Language Models (LLMs), these advancements have not had a comparable impact in research for autonomous vehicles -- where alignment with human expectations can be imperative. In this paper, we propose to adapt similar RL-based methods to unsupervised object discovery, i.e. learning to detect objects from LiDAR points without any training labels. Instead of labels, we use simple heuristics to mimic human feedback. More explicitly, we combine multiple heuristics into a simple reward function that positively correlates its score with bounding box accuracy, \ie, boxes containing objects are scored higher than those without. We start from the detector's own predictions to explore the space and reinforce boxes with high rewards through gradient updates. Empirically, we demonstrate that our approach is not only more accurate, but also orders of magnitudes faster to train compared to prior works on object discovery.
Proving Linear Mode Connectivity of Neural Networks via Optimal Transport
Abstract
The energy landscape of high-dimensional non-convex optimization problems is crucial to understanding the effectiveness of modern deep neural network architectures. Recent works have experimentally shown that two different solutions found after two runs of a stochastic training are often connected by very simple continuous paths (e.g., linear) modulo a permutation of the weights. In this paper, we provide a framework theoretically explaining this empirical observation. Based on convergence rates in Wasserstein distance of empirical measures, we show that, with high probability, two wide enough two-layer neural networks trained with stochastic gradient descent are linearly connected. Additionally, we express upper and lower bounds on the width of each layer of two deep neural networks with independent neuron weights to be linearly connected. Finally, we empirically demonstrate the validity of our approach by showing how the dimension of the support of the weight distribution of neurons, which dictates Wasserstein convergence rates is correlated with linear mode connectivity.
Dynamic Task and Weight Prioritization Curriculum Learning for Multimodal Imagery
Authors: Authors: Huseyin Fuat Alsan, Taner Arsan
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
This paper explores post-disaster analytics using multimodal deep learning models trained with curriculum learning method. Studying post-disaster analytics is important as it plays a crucial role in mitigating the impact of disasters by providing timely and accurate insights into the extent of damage and the allocation of resources. We propose a curriculum learning strategy to enhance the performance of multimodal deep learning models. Curriculum learning emulates the progressive learning sequence in human education by training deep learning models on increasingly complex data. Our primary objective is to develop a curriculum-trained multimodal deep learning model, with a particular focus on visual question answering (VQA) capable of jointly processing image and text data, in conjunction with semantic segmentation for disaster analytics using the FloodNet\footnote{https://github.com/BinaLab/FloodNet-Challenge-EARTHVISION2021} dataset. To achieve this, U-Net model is used for semantic segmentation and image encoding. A custom built text classifier is used for visual question answering. Existing curriculum learning methods rely on manually defined difficulty functions. We introduce a novel curriculum learning approach termed Dynamic Task and Weight Prioritization (DATWEP), which leverages a gradient-based method to automatically decide task difficulty during curriculum learning training, thereby eliminating the need for explicit difficulty computation. The integration of DATWEP into our multimodal model shows improvement on VQA performance. Source code is available at https://github.com/fualsan/DATWEP.
Fast Trainable Projection for Robust Fine-Tuning
Authors: Authors: Junjiao Tian, Yen-Cheng Liu, James Seale Smith, Zsolt Kira
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Robust fine-tuning aims to achieve competitive in-distribution (ID) performance while maintaining the out-of-distribution (OOD) robustness of a pre-trained model when transferring it to a downstream task. Recently, projected gradient descent has been successfully used in robust fine-tuning by constraining the deviation from the initialization of the fine-tuned model explicitly through projection. However, algorithmically, two limitations prevent this method from being adopted more widely, scalability and efficiency. In this paper, we propose a new projection-based fine-tuning algorithm, Fast Trainable Projection (FTP) for computationally efficient learning of per-layer projection constraints, resulting in an average $35\%$ speedup on our benchmarks compared to prior works. FTP can be combined with existing optimizers such as AdamW, and be used in a plug-and-play fashion. Finally, we show that FTP is a special instance of hyper-optimizers that tune the hyper-parameters of optimizers in a learnable manner through nested differentiation. Empirically, we show superior robustness on OOD datasets, including domain shifts and natural corruptions, across four different vision tasks with five different pre-trained models. Additionally, we demonstrate that FTP is broadly applicable and beneficial to other learning scenarios such as low-label and continual learning settings thanks to its easy adaptability. The code will be available at https://github.com/GT-RIPL/FTP.git.
On the accuracy and efficiency of group-wise clipping in differentially private optimization
Authors: Authors: Zhiqi Bu, Ruixuan Liu, Yu-Xiang Wang, Sheng Zha, George Karypis
Abstract
Recent advances have substantially improved the accuracy, memory cost, and training speed of differentially private (DP) deep learning, especially on large vision and language models with millions to billions of parameters. In this work, we thoroughly study the per-sample gradient clipping style, a key component in DP optimization. We show that different clipping styles have the same time complexity but instantiate an accuracy-memory trade-off: while the all-layer clipping (of coarse granularity) is the most prevalent and usually gives the best accuracy, it incurs heavier memory cost compared to other group-wise clipping, such as the layer-wise clipping (of finer granularity). We formalize this trade-off through our convergence theory and complexity analysis. Importantly, we demonstrate that the accuracy gap between group-wise clipping and all-layer clipping becomes smaller for larger models, while the memory advantage of the group-wise clipping remains. Consequently, the group-wise clipping allows DP optimization of large models to achieve high accuracy and low peak memory simultaneously.
Maximum Knowledge Orthogonality Reconstruction with Gradients in Federated Learning
Authors: Authors: Feng Wang, Senem Velipasalar, M. Cenk Gursoy
Abstract
Federated learning (FL) aims at keeping client data local to preserve privacy. Instead of gathering the data itself, the server only collects aggregated gradient updates from clients. Following the popularity of FL, there has been considerable amount of work, revealing the vulnerability of FL approaches by reconstructing the input data from gradient updates. Yet, most existing works assume an FL setting with unrealistically small batch size, and have poor image quality when the batch size is large. Other works modify the neural network architectures or parameters to the point of being suspicious, and thus, can be detected by clients. Moreover, most of them can only reconstruct one sample input from a large batch. To address these limitations, we propose a novel and completely analytical approach, referred to as the maximum knowledge orthogonality reconstruction (MKOR), to reconstruct clients' input data. Our proposed method reconstructs a mathematically proven high quality image from large batches. MKOR only requires the server to send secretly modified parameters to clients and can efficiently and inconspicuously reconstruct the input images from clients' gradient updates. We evaluate MKOR's performance on the MNIST, CIFAR-100, and ImageNet dataset and compare it with the state-of-the-art works. The results show that MKOR outperforms the existing approaches, and draws attention to a pressing need for further research on the privacy protection of FL so that comprehensive defense approaches can be developed.
Flow-based Distributionally Robust Optimization
Authors: Authors: Chen Xu, Jonghyeok Lee, Xiuyuan Cheng, Yao Xie
Abstract
We present a computationally efficient framework, called \texttt{FlowDRO}, for solving flow-based distributionally robust optimization (DRO) problems with Wasserstein uncertainty sets, when requiring the worst-case distribution (also called the Least Favorable Distribution, LFD) to be continuous so that the algorithm can be scalable to problems with larger sample sizes and achieve better generalization capability for the induced robust algorithms. To tackle the computationally challenging infinitely dimensional optimization problem, we leverage flow-based models, continuous-time invertible transport maps between the data distribution and the target distribution, and develop a Wasserstein proximal gradient flow type of algorithm. In practice, we parameterize the transport maps by a sequence of neural networks progressively trained in blocks by gradient descent. Our computational framework is general, can handle high-dimensional data with large sample sizes, and can be useful for various applications. We demonstrate its usage in adversarial learning, distributionally robust hypothesis testing, and a new mechanism for data-driven distribution perturbation differential privacy, where the proposed method gives strong empirical performance on real high-dimensional data.
VDIP-TGV: Blind Image Deconvolution via Variational Deep Image Prior Empowered by Total Generalized Variation
Abstract
Recovering clear images from blurry ones with an unknown blur kernel is a challenging problem. Deep image prior (DIP) proposes to use the deep network as a regularizer for a single image rather than as a supervised model, which achieves encouraging results in the nonblind deblurring problem. However, since the relationship between images and the network architectures is unclear, it is hard to find a suitable architecture to provide sufficient constraints on the estimated blur kernels and clean images. Also, DIP uses the sparse maximum a posteriori (MAP), which is insufficient to enforce the selection of the recovery image. Recently, variational deep image prior (VDIP) was proposed to impose constraints on both blur kernels and recovery images and take the standard deviation of the image into account during the optimization process by the variational principle. However, we empirically find that VDIP struggles with processing image details and tends to generate suboptimal results when the blur kernel is large. Therefore, we combine total generalized variational (TGV) regularization with VDIP in this paper to overcome these shortcomings of VDIP. TGV is a flexible regularization that utilizes the characteristics of partial derivatives of varying orders to regularize images at different scales, reducing oil painting artifacts while maintaining sharp edges. The proposed VDIP-TGV effectively recovers image edges and details by supplementing extra gradient information through TGV. Additionally, this model is solved by the alternating direction method of multipliers (ADMM), which effectively combines traditional algorithms and deep learning methods. Experiments show that our proposed VDIP-TGV surpasses various state-of-the-art models quantitatively and qualitatively.
Abstract
Actor-Critic methods are in a stalemate of two seemingly irreconcilable problems. Firstly, critic proneness towards overestimation requires sampling temporal-difference targets from a conservative policy optimized using lower-bound Q-values. Secondly, well-known results show that policies that are optimistic in the face of uncertainty yield lower regret levels. To remedy this dichotomy, we propose Decoupled Actor-Critic (DAC). DAC is an off-policy algorithm that learns two distinct actors by gradient backpropagation: a conservative actor used for temporal-difference learning and an optimistic actor used for exploration. We test DAC on DeepMind Control tasks in low and high replay ratio regimes and ablate multiple design choices. Despite minimal computational overhead, DAC achieves state-of-the-art performance and sample efficiency on locomotion tasks.
Model Uncertainty based Active Learning on Tabular Data using Boosted Trees
Abstract
Supervised machine learning relies on the availability of good labelled data for model training. Labelled data is acquired by human annotation, which is a cumbersome and costly process, often requiring subject matter experts. Active learning is a sub-field of machine learning which helps in obtaining the labelled data efficiently by selecting the most valuable data instances for model training and querying the labels only for those instances from the human annotator. Recently, a lot of research has been done in the field of active learning, especially for deep neural network based models. Although deep learning shines when dealing with image\textual\multimodal data, gradient boosting methods still tend to achieve much better results on tabular data. In this work, we explore active learning for tabular data using boosted trees. Uncertainty based sampling in active learning is the most commonly used querying strategy, wherein the labels of those instances are sequentially queried for which the current model prediction is maximally uncertain. Entropy is often the choice for measuring uncertainty. However, entropy is not exactly a measure of model uncertainty. Although there has been a lot of work in deep learning for measuring model uncertainty and employing it in active learning, it is yet to be explored for non-neural network models. To this end, we explore the effectiveness of boosted trees based model uncertainty methods in active learning. Leveraging this model uncertainty, we propose an uncertainty based sampling in active learning for regression tasks on tabular data. Additionally, we also propose a novel cost-effective active learning method for regression tasks along with an improved cost-effective active learning method for classification tasks.
HyPE: Attention with Hyperbolic Biases for Relative Positional Encoding
Abstract
In Transformer-based architectures, the attention mechanism is inherently permutation-invariant with respect to the input sequence's tokens. To impose sequential order, token positions are typically encoded using a scheme with either fixed or learnable parameters. We introduce Hyperbolic Positional Encoding (HyPE), a novel method that utilizes hyperbolic functions' properties to encode tokens' relative positions. This approach biases the attention mechanism without the necessity of storing the $O(L^2)$ values of the mask, with $L$ being the length of the input sequence. HyPE leverages preliminary concatenation operations and matrix multiplications, facilitating the encoding of relative distances indirectly incorporating biases into the softmax computation. This design ensures compatibility with FlashAttention-2 and supports the gradient backpropagation for any potential learnable parameters within the encoding. We analytically demonstrate that, by careful hyperparameter selection, HyPE can approximate the attention bias of ALiBi, thereby offering promising generalization capabilities for contexts extending beyond the lengths encountered during pretraining. The experimental evaluation of HyPE is proposed as a direction for future research.
Distributed multi-UAV shield formation based on virtual surface constraints
Authors: Authors: María Guinaldo, José Sánchez-Moreno, Salvador Zaragoza, Francisco José Mañas-Álvarez
Abstract
This paper proposes a method for the deployment of a multi-agent system of unmanned aerial vehicles (UAVs) as a shield with potential applications in the protection of infrastructures. For this purpose, a distributed control law based on the gradient of a potential function is proposed to acquire the desired shield shape, which is modeled as a quadric surface in the 3D space. The graph of the formation is a Delaunay triangulation, which guarantees the formation to be rigid. An algorithm is proposed to design the formation (target distances between agents and interconnections) to distribute the agents over the virtual surface, where the input parameters are just the parametrization of the quadric and the number of agents of the system. Proofs of system stability with the proposed control law are provided, as well as a new method to guarantee that the resulting triangulation over the surface is Delaunay, which can be executed locally. Simulation and experimental results illustrate the effectiveness of the proposed approach.
Gradient-Based Dovetail Joint Shape Optimization for Stiffness
Authors: Authors: Xingyuan Sun, Chenyue Cai, Ryan P. Adams, Szymon Rusinkiewicz
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Abstract
It is common to manufacture an object by decomposing it into parts that can be assembled. This decomposition is often required by size limits of the machine, the complex structure of the shape, etc. To make it possible to easily assemble the final object, it is often desirable to design geometry that enables robust connections between the subcomponents. In this project, we study the task of dovetail-joint shape optimization for stiffness using gradient-based optimization. This optimization requires a differentiable simulator that is capable of modeling the contact between the two parts of a joint, making it possible to reason about the gradient of the stiffness with respect to shape parameters. Our simulation approach uses a penalty method that alternates between optimizing each side of the joint, using the adjoint method to compute gradients. We test our method by optimizing the joint shapes in three different joint shape spaces, and evaluate optimized joint shapes in both simulation and real-world tests. The experiments show that optimized joint shapes achieve higher stiffness, both synthetically and in real-world tests.
Keyword: super-resolution
INCODE: Implicit Neural Conditioning with Prior Knowledge Embeddings
Abstract
Implicit Neural Representations (INRs) have revolutionized signal representation by leveraging neural networks to provide continuous and smooth representations of complex data. However, existing INRs face limitations in capturing fine-grained details, handling noise, and adapting to diverse signal types. To address these challenges, we introduce INCODE, a novel approach that enhances the control of the sinusoidal-based activation function in INRs using deep prior knowledge. INCODE comprises a harmonizer network and a composer network, where the harmonizer network dynamically adjusts key parameters of the activation function. Through a task-specific pre-trained model, INCODE adapts the task-specific parameters to optimize the representation process. Our approach not only excels in representation, but also extends its prowess to tackle complex tasks such as audio, image, and 3D shape reconstructions, as well as intricate challenges such as neural radiance fields (NeRFs), and inverse problems, including denoising, super-resolution, inpainting, and CT reconstruction. Through comprehensive experiments, INCODE demonstrates its superiority in terms of robustness, accuracy, quality, and convergence rate, broadening the scope of signal representation. Please visit the project's website for details on the proposed method and access to the code.
Efficient Test-Time Adaptation for Super-Resolution with Second-Order Degradation and Reconstruction
Authors: Authors: Zeshuai Deng, Zhuokun Chen, Shuaicheng Niu, Thomas H. Li, Bohan Zhuang, Mingkui Tan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Image super-resolution (SR) aims to learn a mapping from low-resolution (LR) to high-resolution (HR) using paired HR-LR training images. Conventional SR methods typically gather the paired training data by synthesizing LR images from HR images using a predetermined degradation model, e.g., Bicubic down-sampling. However, the realistic degradation type of test images may mismatch with the training-time degradation type due to the dynamic changes of the real-world scenarios, resulting in inferior-quality SR images. To address this, existing methods attempt to estimate the degradation model and train an image-specific model, which, however, is quite time-consuming and impracticable to handle rapidly changing domain shifts. Moreover, these methods largely concentrate on the estimation of one degradation type (e.g., blur degradation), overlooking other degradation types like noise and JPEG in real-world test-time scenarios, thus limiting their practicality. To tackle these problems, we present an efficient test-time adaptation framework for SR, named SRTTA, which is able to quickly adapt SR models to test domains with different/unknown degradation types. Specifically, we design a second-order degradation scheme to construct paired data based on the degradation type of the test image, which is predicted by a pre-trained degradation classifier. Then, we adapt the SR model by implementing feature-level reconstruction learning from the initial test image to its second-order degraded counterparts, which helps the SR model generate plausible HR images. Extensive experiments are conducted on newly synthesized corrupted DIV2K datasets with 8 different degradations and several real-world datasets, demonstrating that our SRTTA framework achieves an impressive improvement over existing methods with satisfying speed. The source code is available at https://github.com/DengZeshuai/SRTTA.
IterInv: Iterative Inversion for Pixel-Level T2I Models
Authors: Authors: Chuanming Tang, Kai Wang, Joost van de Weijer
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Abstract
Large-scale text-to-image diffusion models have been a ground-breaking development in generating convincing images following an input text prompt. The goal of image editing research is to give users control over the generated images by modifying the text prompt. Current image editing techniques are relying on DDIM inversion as a common practice based on the Latent Diffusion Models (LDM). However, the large pretrained T2I models working on the latent space as LDM suffer from losing details due to the first compression stage with an autoencoder mechanism. Instead, another mainstream T2I pipeline working on the pixel level, such as Imagen and DeepFloyd-IF, avoids this problem. They are commonly composed of several stages, normally with a text-to-image stage followed by several super-resolution stages. In this case, the DDIM inversion is unable to find the initial noise to generate the original image given that the super-resolution diffusion models are not compatible with the DDIM technique. According to our experimental findings, iteratively concatenating the noisy image as the condition is the root of this problem. Based on this observation, we develop an iterative inversion (IterInv) technique for this stream of T2I models and verify IterInv with the open-source DeepFloyd-IF model. By combining our method IterInv with a popular image editing method, we prove the application prospects of IterInv. The code will be released at \url{https://github.com/Tchuanm/IterInv.git}.
Keyword: sgd
Approximate Heavy Tails in Offline (Multi-Pass) Stochastic Gradient Descent
Linear Mode Connectivity in Sparse Neural Networks
High-probability Convergence Bounds for Nonlinear Stochastic Gradient Descent Under Heavy-tailed Noise
Escaping Saddle Points in Heterogeneous Federated Learning via Distributed SGD with Communication Compression
Keyword: optimization
Unveiling Energy Efficiency in Deep Learning: Measurement, Prediction, and Scoring across Edge Devices
Small Language Models Fine-tuned to Coordinate Larger Language Models improve Complex Reasoning
Middle-mile optimization for next-day delivery
On the Fairness ROAD: Robust Optimization for Adversarial Debiasing
Generalized Firefly Algorithm for Optimal Transmit Beamforming
Multi-fidelity Design of Porous Microstructures for Thermofluidic Applications
End-to-end Feature Selection Approach for Learning Skinny Trees
Optimization-Free Test-Time Adaptation for Cross-Person Activity Recognition
Pessimistic Off-Policy Multi-Objective Optimization
Designing a Kinetic Façade Using BB-BC Algorithm with a Focus on Enhancing Building Energy Efficiency
Clairvoyance: A Pipeline Toolkit for Medical Time Series
KernelGPA: A Globally Optimal Solution to Deformable SLAM in Closed-form
Robust Offline Policy Evaluation and Optimization with Heavy-Tailed Rewards
The Evolution of the Interplay Between Input Distributions and Linear Regions in Networks
Optimization of utility-based shortfall risk: A non-asymptotic viewpoint
A Data-driven Recommendation Framework for Optimal Walker Designs
A Fuzzy Time Series-Based Model Using Particle Swarm Optimization and Weighted Rules
Correlation Aware Sparsified Mean Estimation Using Random Projection
Ever Evolving Evaluator (EV3): Towards Flexible and Reliable Meta-Optimization for Knowledge Distillation
Optimizing Task-Specific Timeliness With Edge-Assisted Scheduling for Status Update
TiV-NeRF: Tracking and Mapping via Time-Varying Representation with Dynamic Neural Radiance Fields
Implicit Bias of Gradient Descent for Two-layer ReLU and Leaky ReLU Networks on Nearly-orthogonal Data
Playing in the Dark: No-regret Learning with Adversarial Constraints
Bipartite Graph Pre-training for Unsupervised Extractive Summarization with Graph Convolutional Auto-Encoders
A 0.21-ps FOM Capacitor-Less Analog LDO with Dual-Range Load Current for Biomedical Applications
Behavior Alignment via Reward Function Optimization
Optimizing simultaneous autoscaling for serverless cloud computing
RIS-aided Real-time Beam Tracking for a Mobile User via Bayesian Optimization
An Improved Relaxation for Oracle-Efficient Adversarial Contextual Bandits
Large Language Models as Evolutionary Optimizers
Estimation of Semiconductor Power Losses Through Automatic Thermal Modeling
Datasets and Benchmarks for Nanophotonic Structure and Parametric Design Simulations
Gauge-optimal approximate learning for small data classification problems
Bridging the Gap: Towards an Expanded Toolkit for ML-Supported Decision-Making in the Public Sector
Proving Linear Mode Connectivity of Neural Networks via Optimal Transport
Partial Orderings as Heuristic for Multi-Objective Model-Based Reasoning
Optical STAR-RIS-Aided VLC Systems: RSMA Versus NOMA
MAG-GNN: Reinforcement Learning Boosted Graph Neural Network
RAIFLE: Reconstruction Attacks on Interaction-based Federated Learning with Active Data Manipulation
Identification of the Most Significant Parameter for Optimizing the Performance of RPL Routing Protocol in IoT Using Taguchi Design of Experiments
On the accuracy and efficiency of group-wise clipping in differentially private optimization
Adapter Pruning using Tropical Characterization
IMPRESS: Evaluating the Resilience of Imperceptible Perturbations Against Unauthorized Data Usage in Diffusion-Based Generative AI
Flow-based Distributionally Robust Optimization
Online Data-Driven Safety Certification for Systems Subject to Unknown Disturbances
ROAM: memory-efficient large DNN training via optimized operator ordering and memory layout
D4Explainer: In-Distribution GNN Explanations via Discrete Denoising Diffusion
Faster QUBO Brute-Force Solving using Gray Code
Volterra black-box models identification methods: direct collocation vs least squares
A numerical tool for efficient analysis and optimization of offshore wind turbine jacket substructure considering realistic boundary and loading conditions
Text-to-3D with classifier score distillation
Regret-Minimization Algorithms for Multi-Agent Cooperative Learning Systems
VDIP-TGV: Blind Image Deconvolution via Variational Deep Image Prior Empowered by Total Generalized Variation
A General Neural Causal Model for Interactive Recommendation
Adversarial Batch Inverse Reinforcement Learning: Learn to Reward from Imperfect Demonstration for Interactive Recommendation
A Perceptual Shape Loss for Monocular 3D Face Reconstruction
RayDF: Neural Ray-surface Distance Fields with Multi-view Consistency
Mixed coordinate Node link Visualization for Co_authorship Hypergraph Networks
Optimizing Logical Execution Time Model for Both Determinism and Low Latency
Learn to Categorize or Categorize to Learn? Self-Coding for Generalized Category Discovery
CustomNet: Zero-shot Object Customization with Variable-Viewpoints in Text-to-Image Diffusion Models
DEFT: Dexterous Fine-Tuning for Real-World Hand Policies
Gradient-Based Dovetail Joint Shape Optimization for Stiffness
Keyword: adam
Correlation Aware Sparsified Mean Estimation Using Random Projection
Fast Trainable Projection for Robust Fine-Tuning
Keyword: gradient
Small Language Models Fine-tuned to Coordinate Larger Language Models improve Complex Reasoning
A general learning scheme for classical and quantum Ising machines
End-to-end Feature Selection Approach for Learning Skinny Trees
Visual Explanations via Iterated Integrated Attributions
Domain Generalisation via Risk Distribution Matching
Device-Edge Cooperative Fine-Tuning of Foundation Models as a 6G Service
Pessimistic Off-Policy Multi-Objective Optimization
Explainable Modeling for Wind Power Forecasting: A Glass-Box Approach with Exceptional Accuracy
On Training Implicit Meta-Learning With Applications to Inductive Weighing in Consistency Regularization
Optimization of utility-based shortfall risk: A non-asymptotic viewpoint
High-probability Convergence Bounds for Nonlinear Stochastic Gradient Descent Under Heavy-tailed Noise
Differentiable Learning of Generalized Structured Matrices for Efficient Deep Neural Networks
D2NO: Efficient Handling of Heterogeneous Input Function Spaces with Distributed Deep Neural Operators
Ever Evolving Evaluator (EV3): Towards Flexible and Reliable Meta-Optimization for Knowledge Distillation
Estimating the Rate-Distortion Function by Wasserstein Gradient Descent
Hyperbolic Graph Neural Networks at Scale: A Meta Learning Approach
Posterior Sampling with Delayed Feedback for Reinforcement Learning with Linear Function Approximation
Implicit Bias of Gradient Descent for Two-layer ReLU and Leaky ReLU Networks on Nearly-orthogonal Data
Machine Learning Algorithms to Predict Chess960 Result and Develop Opening Themes
TIC-TAC: A Framework To Learn And Evaluate Your Covariance
Blacksmith: Fast Adversarial Training of Vision Transformers via a Mixture of Single-step and Multi-step Methods
NP-SBFL: Bridging the Gap Between Spectrum-Based Fault Localization and Faulty Neural Pathways Diagnosis
Boosting Decision-Based Black-Box Adversarial Attack with Gradient Priors
Escaping Saddle Points in Heterogeneous Federated Learning via Distributed SGD with Communication Compression
Expanding memory in recurrent spiking networks
Reward Finetuning for Faster and More Accurate Unsupervised Object Discovery
Proving Linear Mode Connectivity of Neural Networks via Optimal Transport
Dynamic Task and Weight Prioritization Curriculum Learning for Multimodal Imagery
Fast Trainable Projection for Robust Fine-Tuning
On the accuracy and efficiency of group-wise clipping in differentially private optimization
Maximum Knowledge Orthogonality Reconstruction with Gradients in Federated Learning
Flow-based Distributionally Robust Optimization
VDIP-TGV: Blind Image Deconvolution via Variational Deep Image Prior Empowered by Total Generalized Variation
Decoupled Actor-Critic
Model Uncertainty based Active Learning on Tabular Data using Boosted Trees
HyPE: Attention with Hyperbolic Biases for Relative Positional Encoding
Distributed multi-UAV shield formation based on virtual surface constraints
Gradient-Based Dovetail Joint Shape Optimization for Stiffness
Keyword: super-resolution
INCODE: Implicit Neural Conditioning with Prior Knowledge Embeddings
Efficient Test-Time Adaptation for Super-Resolution with Second-Order Degradation and Reconstruction
IterInv: Iterative Inversion for Pixel-Level T2I Models