New submissions for Thu, 19 Oct 23

Keyword: sgd

Learning to Generate Parameters of ConvNets for Unseen Image Data

Authors: Shiye Wang, Kaituo Feng, Changsheng Li, Ye Yuan, Guoren Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.11862
Pdf link: https://arxiv.org/pdf/2310.11862
Abstract Typical Convolutional Neural Networks (ConvNets) depend heavily on large amounts of image data and rely on an iterative optimization algorithm (e.g., SGD or Adam) to learn network parameters, which makes training very time- and resource-intensive. In this paper, we propose a new training paradigm that formulates the parameter learning of ConvNets as a prediction task: given a ConvNet architecture, we observe that there exist correlations between image datasets and their corresponding optimal network parameters, and we explore whether a hyper-mapping between them can be learned to capture these relations, so that the network parameters can be predicted directly for an image dataset never seen during training. To this end, we put forward a new hypernetwork-based model, called PudNet, which is designed to learn a mapping between datasets and their corresponding network parameters and then predicts parameters for unseen data with only a single forward propagation. Moreover, our model uses a series of adaptive hyper recurrent units that share weights to capture the dependencies of parameters among different network layers. Extensive experiments demonstrate that our method performs well on unseen image datasets in two settings: intra-dataset prediction and inter-dataset prediction. PudNet also scales well to large datasets such as ImageNet-1K. Training ResNet-18 from scratch on ImageNet-1K using GC takes 8967 GPU seconds and reaches a top-5 accuracy of 44.65%, whereas PudNet needs only 3.89 GPU seconds to predict ResNet-18 parameters that achieve comparable performance (44.92%), more than 2,300 times faster than the traditional training paradigm.
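To make the "single forward propagation" idea concrete, here is a minimal NumPy sketch of a hypernetwork that maps a summary of a dataset to the weights of a linear classifier. It is not PudNet: the per-class-mean dataset embedding, the two-layer `ToyHyperNet`, and all sizes are hypothetical choices, PudNet's adaptive hyper recurrent units are not modeled, and the meta-training across many datasets that would make the predicted weights useful is omitted.

```python
import numpy as np

def dataset_embedding(features, labels, num_classes):
    """Hypothetical dataset summary: one mean feature vector per class."""
    return np.stack([features[labels == c].mean(axis=0) for c in range(num_classes)])

class ToyHyperNet:
    """Hypothetical hypernetwork: dataset embedding -> linear-classifier weights."""

    def __init__(self, feat_dim, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        # In practice these weights would be meta-trained across many datasets.
        self.W1 = rng.normal(scale=0.1, size=(feat_dim, hidden))
        self.W2 = rng.normal(scale=0.1, size=(hidden, feat_dim))

    def __call__(self, class_means):
        # One forward pass; the residual term starts from the class means (prototypes).
        return np.tanh(class_means @ self.W1) @ self.W2 + class_means

# Usage on a synthetic "unseen" dataset: no gradient steps are taken on this data.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 8))
y = np.repeat(np.arange(3), 20)
W = ToyHyperNet(feat_dim=8)(dataset_embedding(X, y, num_classes=3))
logits = X @ W.T   # (60, 3) class scores from the predicted parameters
```

The speedup quoted in the abstract is consistent with its own figures: 8967 / 3.89 ≈ 2305.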
Keyword: optimization

Protein 3D Graph Structure Learning for Robust Structure-based Protein Property Prediction
Authors: Yufei Huang, Siyuan Li, Jin Su, Lirong Wu, Odin Zhang, Haitao Lin, Jingqi Qi, Zihan Liu, Zhangyang Gao, Jiangbin Zheng, Stan.ZQ.Li
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Quantitative Methods (q-bio.QM)
Arxiv link: https://arxiv.org/abs/2310.11466
Pdf link: https://arxiv.org/pdf/2310.11466
Abstract Protein structure-based property prediction has emerged as a promising approach for various biological tasks, such as protein function prediction and sub-cellular location estimation. Existing methods rely heavily on experimental protein structure data and fail in scenarios where these data are unavailable. Structures predicted by AI tools (e.g., AlphaFold2) have been used as alternatives. However, we observed that current practices, which simply employ accurately predicted structures during inference, suffer from notable degradation in prediction accuracy. While similar phenomena have been extensively studied in other fields (e.g., computer vision) under the heading of model robustness, their impact on protein property prediction remains unexplored. In this paper, we first investigate the reason behind the performance decrease when predicted structures are used, attributing it to structure embedding bias from the perspective of structure representation learning. To study this problem, we formulate a Protein 3D Graph Structure Learning Problem for Robust Protein Property Prediction (PGSL-RP3), collect benchmark datasets, and present a protein Structure embedding Alignment Optimization framework (SAO) to mitigate the structure embedding bias between predicted and experimental protein structures. Extensive experiments show that our framework is model-agnostic and effective in improving property prediction for both predicted and experimental structures. The benchmark datasets and code will be released to benefit the community.
ASP: Automatic Selection of Proxy dataset for efficient AutoML
Authors: Peng Yao, Chao Liao, Jiyuan Jia, Jianchao Tan, Bin Chen, Chengru Song, Di Zhang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.11478
Pdf link: https://arxiv.org/pdf/2310.11478
Abstract Deep neural networks have achieved great success thanks to increasing amounts of data and diverse, effective network designs. However, this also imposes a heavy computational burden, since training time grows in proportion to the amount of training data. In addition, a well-performing model requires repeated trials of different structure designs and hyper-parameters, which can take a large amount of time even with state-of-the-art (SOTA) hyper-parameter optimization (HPO) and neural architecture search (NAS) algorithms. In this paper, we propose an Automatic Selection of Proxy dataset framework (ASP) that dynamically finds informative proxy subsets of the training data at each epoch, reducing the training data size and saving AutoML processing time. We verify the effectiveness and generalization of ASP on CIFAR10, CIFAR100, ImageNet16-120, and ImageNet-1k, across various public model benchmarks. The experimental results show that ASP obtains better results than other data selection methods at all selection ratios. ASP also speeds up AutoML processing by 2x-20x while finding better architectures and hyper-parameters than when the entire dataset is used.
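The abstract does not say how ASP scores "informative" samples, so the following is only a sketch of per-epoch proxy selection under one common assumption: samples with the highest current loss are treated as most informative. The function name `select_proxy_subset` and the random stand-in losses are hypothetical.

```python
import numpy as np

def select_proxy_subset(per_sample_loss, ratio):
    """Indices of the `ratio` fraction of samples with the highest current loss."""
    n = len(per_sample_loss)
    k = max(1, int(round(ratio * n)))
    return np.argsort(per_sample_loss)[-k:][::-1]   # highest loss first

# Re-select every epoch, since per-sample losses change as the model trains.
rng = np.random.default_rng(0)
num_samples, ratio = 1000, 0.1
for epoch in range(3):
    losses = rng.random(num_samples)   # stand-in for losses from the current model
    subset = select_proxy_subset(losses, ratio)
    # ... train (or run an AutoML trial) on the samples in `subset` only ...
```

Re-selecting each epoch, rather than fixing one subset up front, is what makes the proxy dataset dynamic; the selection ratio directly sets how much data each trial touches.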
Value-Biased Maximum Likelihood Estimation for Model-based Reinforcement Learning in Discounted Linear MDPs
Authors: Yu-Heng Hung, Ping-Chun Hsieh, Akshay Mete, P. R. Kumar
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.11515
Pdf link: https://arxiv.org/pdf/2310.11515
Abstract We consider infinite-horizon linear Markov decision processes (MDPs), where the transition probabilities of the dynamic model can be linearly parameterized with the help of a predefined low-dimensional feature mapping. While existing regression-based approaches have been shown theoretically to achieve nearly optimal regret, they are computationally rather inefficient because they require a large number of optimization runs in each time step, especially when the state and action spaces are large. To address this issue, we propose to solve linear MDPs through the lens of Value-Biased Maximum Likelihood Estimation (VBMLE), a classic model-based exploration principle from the adaptive control literature for resolving the well-known closed-loop identification problem of Maximum Likelihood Estimation. We formally show that (i) VBMLE enjoys $\widetilde{O}(d\sqrt{T})$ regret, where $T$ is the time horizon and $d$ is the dimension of the model parameter, and (ii) VBMLE is computationally more efficient, as it requires solving only one optimization problem in each time step. In our regret analysis, we offer a generic convergence result for MLE in linear MDPs through a novel supermartingale construct and uncover an interesting connection between linear MDPs and online learning, which may be of independent interest. Finally, simulation results show that VBMLE significantly outperforms the benchmark method in terms of both empirical regret and computation time.
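The value-biased selection rule can be illustrated on a toy problem. This is a sketch under strong simplifying assumptions, not the paper's linear-MDP algorithm: the candidate models are a small finite set of tabular transition tensors, the bias weight `alpha` is a fixed argument (in the adaptive-control formulation it typically grows slowly with time), and all helper names are hypothetical. Each candidate is scored by its log-likelihood on observed transitions plus `alpha` times its optimal value from the start state.

```python
import numpy as np

def optimal_value(P, R, gamma, iters=500):
    """Value iteration. P: (S, A, S) transition probabilities, R: (S, A) rewards."""
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        V = (R + gamma * (P @ V)).max(axis=1)   # P @ V has shape (S, A)
    return V

def log_likelihood(P, counts):
    """counts[s, a, s2] = number of observed transitions s -a-> s2."""
    return float(np.sum(counts * np.log(P + 1e-12)))

def vbmle_select(candidates, counts, R, gamma, alpha, s0=0):
    """Index of the candidate maximizing log-likelihood + alpha * V*(s0);
    alpha = 0 recovers plain MLE."""
    scores = [log_likelihood(P, counts) + alpha * optimal_value(P, R, gamma)[s0]
              for P in candidates]
    return int(np.argmax(scores))

# Toy problem: 2 states, 1 action; only state 1 gives reward.
R = np.array([[0.0], [1.0]])
P_fit = np.array([[[0.9, 0.1]], [[0.1, 0.9]]])   # explains the data well
P_opt = np.array([[[0.2, 0.8]], [[0.0, 1.0]]])   # optimistic: reaches state 1 quickly
counts = np.zeros((2, 1, 2))
counts[0, 0] = [9, 1]                             # from state 0: 9 self-loops, 1 move
```

With `alpha = 0` the data-fitting model wins; with a large bias the optimistic model is selected despite its lower likelihood, which is the exploration effect the value bias is meant to produce.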
Group Preference Optimization: Few-Shot Alignment of Large Language Models
Authors: Siyan Zhao, John Dang, Aditya Grover
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2310.11523
Pdf link: https://arxiv.org/pdf/2310.11523
Abstract Many applications of large language models (LLMs), ranging from chatbots to creative writing, require nuanced subjective judgments that can differ significantly across groups. Existing alignment algorithms can be expensive to apply to each group, requiring prohibitive amounts of group-specific preference data and computation for real-world use cases. We introduce Group Preference Optimization (GPO), an alignment framework that steers language models to the preferences of individual groups in a few-shot manner. In GPO, we augment the base LLM with an independent transformer module trained to predict a group's preferences for the LLM's generations. For few-shot learning, we parameterize this module as an in-context autoregressive transformer and train it via meta-learning on several groups. We empirically validate the efficacy of GPO through rigorous evaluations using LLMs of varied sizes on three human opinion adaptation tasks, which involve adapting to the preferences of US demographic groups, global countries, and individual users. Our results demonstrate that GPO not only aligns models more accurately but also requires fewer group-specific preferences and less training and inference compute, outperforming existing strategies such as in-context steering and fine-tuning methods.
Bias and Error Mitigation in Software-Generated Data: An Advanced Search and Optimization Framework Leveraging Generative Code Models
Authors: Ernesto Giralt Hernández
Subjects: Software Engineering (cs.SE); Information Theory (cs.IT); Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2310.11546
Pdf link: https://arxiv.org/pdf/2310.11546
Abstract Data generation and analysis are fundamental to many industries and disciplines, from strategic decision making in business to research in the physical and social sciences. However, data generated using software and algorithms can be subject to biases and errors. These can stem from problems with the original software, default settings that do not match the specific needs of the situation, or even deeper problems with the underlying theories and models. This paper proposes an advanced search and optimization framework for generating and choosing optimal source code that corrects errors and biases from previous versions, addressing typical problems in software systems that specialize in data analysis and generation, especially in the corporate and data science world. Applying this framework repeatedly to the same software system would incrementally improve the quality of its output. The framework uses Solomonoff induction as a sound theoretical basis and extends it with conditional Kolmogorov complexity, a novel adaptation, to evaluate a set of candidate programs. We propose using generative models to create this set of programs, with special emphasis on the ability of Large Language Models (LLMs) to generate high-quality code.
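Solomonoff induction and Kolmogorov complexity are uncomputable, so any implementation of this idea must approximate them. The sketch below is an illustration under stated assumptions, not the paper's method: each candidate program comes with a callable (e.g., produced by an LLM), consistency is checked against hypothetical input/output examples of the desired corrected behavior, and K(program | previous version) is approximated by a zlib compressed-length difference, a common heuristic that the abstract does not mention. Among consistent candidates, the one with the smallest estimate (largest prior weight 2^-K) is chosen.

```python
import zlib

def approx_conditional_complexity(program: str, previous: str) -> int:
    """Compression proxy for K(program | previous), in bytes.
    Kolmogorov complexity is uncomputable; this is only a rough estimate."""
    prev = previous.encode()
    return len(zlib.compress(prev + program.encode(), 9)) - len(zlib.compress(prev, 9))

def select_program(candidates, previous_source, examples):
    """candidates: list of (source, callable). Discard programs that disagree with
    the examples, then prefer the largest weight 2**-K(program | previous)."""
    consistent = [(src, fn) for src, fn in candidates
                  if all(fn(x) == y for x, y in examples)]
    if not consistent:
        return None
    best_src, _ = min(consistent,
                      key=lambda c: approx_conditional_complexity(c[0], previous_source))
    return best_src

# Hypothetical scenario: the previous version has an off-by-one bias.
previous = "def scale(x):\n    return x * 2\n"
candidates = [
    ("def scale(x):\n    return x * 2\n", lambda x: x * 2),           # unchanged: fails
    ("def scale(x):\n    return x * 2 + 1\n", lambda x: x * 2 + 1),   # corrected
]
examples = [(1, 3), (2, 5)]   # desired behavior after the correction
chosen = select_program(candidates, previous, examples)
```

Conditioning on the previous version favors small, targeted edits over rewrites, which matches the incremental-improvement goal described in the abstract.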
Towards Optimal Regret in Adversarial Linear MDPs with Bandit Feedback
Authors: Haolin Liu, Chen-Yu Wei, Julian Zimmert
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.11550
Pdf link: https://arxiv.org/pdf/2310.11550
Abstract We study online reinforcement learning in linear Markov decision processes with adversarial losses and bandit feedback, without prior knowledge of the transitions or access to simulators. We introduce two algorithms that achieve improved regret compared to existing approaches. The first algorithm, although computationally inefficient, ensures a regret of $\widetilde{\mathcal{O}}\left(\sqrt{K}\right)$, where $K$ is the number of episodes. This is the first result with the optimal $K$ dependence in the considered setting. The second algorithm, which is based on the policy optimization framework, guarantees a regret of $\widetilde{\mathcal{O}}\left(K^{\frac{3}{4}} \right)$ and is computationally efficient. Both results significantly improve over the state of the art: a computationally inefficient algorithm by Kong et al. [2023] with $\widetilde{\mathcal{O}}\left(K^{\frac{4}{5}}+\mathrm{poly}\left(\frac{1}{\lambda_{\min}}\right) \right)$ regret, for some problem-dependent constant $\lambda_{\min}$ that can be arbitrarily close to zero, and a computationally efficient algorithm by Sherman et al. [2023b] with $\widetilde{\mathcal{O}}\left(K^{\frac{6}{7}} \right)$ regret.
Hybrid Trajectory Optimization of Simple Skateboarding Tricks through Contact
Authors: Michael Burgess
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.11599
Pdf link: https://arxiv.org/pdf/2310.11599
Abstract Trajectories are optimized for a two-dimensional simplified skateboarding system to allow it to perform a fundamental skateboarding trick called an "ollie". A methodology for generating trick trajectories by controlling the position of a point-mass relative to a board is presented and demonstrated over a range of peak jump heights. A hybrid dynamics approach is taken to perform this optimization, with contact constraints applied along a sequence of discrete timesteps based on the board's position throughout designated sections of the trick. These constraints introduce explicit and implicit discontinuities between chosen sections of the trick sequence. The approach has been shown to be successful for a set of realistic system parameters.
Holistic Parking Slot Detection with Polygon-Shaped Representations
Authors: Lihao Wang, Antonyo Musabini, Christel Leonet, Rachid Benmokhtar, Amaury Breheret, Chaima Yedes, Fabian Burger, Thomas Boulay, Xavier Perrotton
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.11629
Pdf link: https://arxiv.org/pdf/2310.11629
Abstract Current parking slot detection in advanced driver-assistance systems (ADAS) relies primarily on ultrasonic sensors. This method has several limitations, such as the need to scan the entire parking slot before detecting it, the inability to detect multiple slots in a row, and the difficulty of classifying them. Because of the complex visual environment, vehicles are equipped with surround-view camera systems to detect vacant parking slots. Previous work in this field mostly uses image-domain models to solve the problem; these two-stage approaches separate the 2D detection and 3D pose estimation steps using camera calibration. In this paper, we propose the one-step Holistic Parking Slot Network (HPS-Net), a tailor-made adaptation of the You Only Look Once (YOLO)v4 algorithm. This camera-based approach directly outputs the four vertex coordinates of the parking slot in the top-view domain, instead of a bounding box in raw camera images. Several visible points and shapes can be proposed from different angles. A novel regression loss function named polygon-corner Generalized Intersection over Union (GIoU) for polygon vertex position optimization is also proposed to handle the slot orientation and to distinguish the entrance line. Experiments show that HPS-Net detects various vacant parking slots with an F1-score of 0.92 on our internal Valeo Parking Slots Dataset (VPSD) and 0.99 on the public dataset PS2.0. It shows satisfactory generalization and robustness in various parking scenarios, such as indoor (F1: 0.86) or paved ground (F1: 0.91). Moreover, it achieves a real-time detection speed of 17 FPS on an Nvidia Drive AGX Xavier. A demo video can be found at https://streamable.com/75j7sj.
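The abstract does not define the polygon-corner GIoU loss, so the following is only a generic sketch: the standard GIoU (IoU minus the fraction of the smallest enclosing convex region not covered by the union), computed for two convex quadrilaterals such as predicted and ground-truth slot corners in the top-view domain. All helper names are hypothetical; a training loss would typically use 1 - GIoU in a differentiable framework rather than plain Python.

```python
def signed_area(poly):
    """Shoelace formula; positive for counter-clockwise vertex order."""
    s = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        s += x1 * y2 - x2 * y1
    return s / 2.0

def ccw(poly):
    return list(poly) if signed_area(poly) >= 0 else list(reversed(poly))

def clip_convex(subject, clipper):
    """Sutherland-Hodgman: intersection of convex polygons (clipper must be CCW)."""
    def inside(p, a, b):
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersect(p, q, a, b):
        # Point where segment p-q crosses the line through a and b.
        den = (p[0] - q[0]) * (a[1] - b[1]) - (p[1] - q[1]) * (a[0] - b[0])
        t = ((p[0] - a[0]) * (a[1] - b[1]) - (p[1] - a[1]) * (a[0] - b[0])) / den
        return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))

    out = list(subject)
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        pts, out = out, []
        for j in range(len(pts)):
            prev, cur = pts[j - 1], pts[j]
            if inside(cur, a, b):
                if not inside(prev, a, b):
                    out.append(intersect(prev, cur, a, b))
                out.append(cur)
            elif inside(prev, a, b):
                out.append(intersect(prev, cur, a, b))
    return out

def convex_hull(points):
    """Andrew's monotone chain; returns the hull in CCW order."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) < 3:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_giou(p1, p2):
    """Generalized IoU of two convex polygons given as lists of (x, y) vertices."""
    p1, p2 = ccw(p1), ccw(p2)
    inter = clip_convex(p1, p2)
    a_inter = abs(signed_area(inter)) if len(inter) >= 3 else 0.0
    union = abs(signed_area(p1)) + abs(signed_area(p2)) - a_inter
    a_hull = abs(signed_area(convex_hull(p1 + p2)))
    return a_inter / union - (a_hull - union) / a_hull
```

Unlike plain IoU, GIoU stays informative when the polygons do not overlap (it becomes negative as they move apart), which is why GIoU-style losses are used for vertex regression.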
A Set-Based Approach for Robust Control Co-Design
Authors: Trevor J. Bird, Jacob A. Siefert, Herschel C. Pangborn, Neera Jain
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.11658
Pdf link: https://arxiv.org/pdf/2310.11658
Abstract Control Co-Design (CCD) considers the coupled effects of both the plant and control parameters to optimize a system's closed-loop transient performance during the design stage. This paper presents a new method for CCD with guarantees on robustness to nondeterministic disturbances for all initial conditions within a specified region of operation. This is accomplished by calculating the reachable sets of a candidate closed-loop system directly within the optimization problem. Using this approach, the plant and control parameters are simultaneously chosen to shape these reachable sets to be robustly positive invariant and thus safe for all time. Compared to conventional approaches that perform the optimization for a single initial condition and an a priori chosen sequence of disturbances, the proposed set-based method avoids sensitivity to variations in the assumed design scenario. As a representative example, the proposed method is applied to an active suspension system.
Estimating Material Properties of Interacting Objects Using Sum-GP-UCB
Authors: M. Yunus Seker, Oliver Kroemer
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.11749
Pdf link: https://arxiv.org/pdf/2310.11749
Abstract Robots need to estimate the material and dynamic properties of objects from observations in order to simulate them accurately. We present a Bayesian optimization approach to identifying the material property parameters of objects based on a set of observations. Our focus is on estimating these properties from observations of scenes with different sets of interacting objects. We propose an approach that exploits the structure of the reward function by modeling the reward for each observation separately and using only the parameters of the objects in that scene as inputs. The resulting lower-dimensional models generalize better over the parameter space, which in turn results in faster optimization. To speed up the optimization process further and reduce the number of simulation runs needed to find good parameter values, we also propose partial evaluations of the reward function, wherein the selected parameters are evaluated on only a subset of real-world evaluations. The approach was successfully evaluated on a set of scenes with a wide range of object interactions, and we show that our method can effectively perform incremental learning without resetting the rewards of the gathered observations.
Min-max Decoding Error Probability Optimization in RIS-Aided Hybrid TDMA-NOMA Networks
Authors: Tra Huong Thi Le, Yan Kyaw Tun
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2310.11750
Pdf link: https://arxiv.org/pdf/2310.11750
Abstract One of the primary objectives of future wireless communication networks is to provide ultra-reliable and low-latency communication services while also supporting massive connectivity. To achieve this objective, we examine a hybrid multi-access scheme in the finite blocklength (FBL) regime. This system combines the benefits of non-orthogonal multiple access (NOMA) and time-division multiple access (TDMA) schemes. In addition, a reconfigurable intelligent surface (RIS) is used to establish the uplink transmission between the base station and mobile devices when obstacles block their direct communication links. This paper aims to minimize the worst-case decoding-error probability over all mobile users by jointly optimizing power allocation, receive beamforming, blocklength, RIS reflection, and user pairing. To deal with the coupled variables in the resulting mixed-integer non-convex optimization problem, we decompose it into three sub-problems: 1) a decoding order determination problem, 2) a joint power allocation, receive beamforming, RIS reflection, and blocklength optimization problem, and 3) an optimal user pairing problem. We then provide sequential convex approximation (SCA) and semidefinite relaxation (SDR)-based algorithms to iteratively solve the first two sub-problems for a fixed random user pairing. In addition, the Hungarian matching approach is employed to optimize user pairing. Finally, comprehensive simulations reveal the advantages of the proposed algorithm and its superior performance compared to existing benchmark methods.
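The user-pairing step can be illustrated with SciPy's Hungarian-type solver, `scipy.optimize.linear_sum_assignment`. The cost matrix here is a hypothetical placeholder: the abstract does not give the per-pair cost the paper uses. Note that the solver minimizes the *sum* of pair costs; a pure min-max (bottleneck) pairing objective would need a different formulation or a surrogate cost.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def pair_users(cost):
    """Minimum-total-cost pairing of near users (rows) with far users (columns).
    cost[i, j] is a hypothetical per-pair cost, e.g. an estimated error metric."""
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows.tolist(), cols.tolist())), float(cost[rows, cols].sum())

cost = np.array([[4.0, 1.0, 3.0],
                 [2.0, 0.0, 5.0],
                 [3.0, 2.0, 2.0]])
pairs, total = pair_users(cost)
```

For this matrix the best pairing is (0, 1), (1, 0), (2, 2) with total cost 5. Pairing each row with its cheapest column greedily would pick (1, 1) first and end with a total of 6.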
A Security-Constrained Optimal Power Management Algorithm for Shipboard Microgrids with Battery Energy Storage System and Fuel Cell
Authors: Fabio D'Agostino, Marco Gallo, Matteo Saviozzi, Federico Silvestro
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.11760
Pdf link: https://arxiv.org/pdf/2310.11760
Abstract This work proposes an optimal power management strategy for shipboard microgrids equipped with diesel generators, a fuel cell, and a battery energy storage system. The optimization determines both the unit commitment and the optimal power dispatch of all resources to ensure a reliable power supply at minimum cost and with minimal environmental impact. The strategy takes into account the zero-emission capability of the ship and incorporates a soft constraint related to the ship's speed. The optimization is performed by solving a mixed integer linear programming problem, in which the constraints are defined according to the operational limits of the resources when a contingency occurs. The algorithm is tested on a notional all-electric ship whose electrical load is generated by a Markov chain modelled on real measurement data. The results show that the proposed power management strategy successfully maximizes fuel and emission savings while ensuring blackout prevention capability.
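The MILP structure (binary commitment plus continuous dispatch) can be illustrated with a deliberately small, single-period sketch using SciPy's `milp` (available since SciPy 1.9). The generator data are hypothetical, and the paper's contingency (security) constraints, battery state of charge, ship-speed soft constraint, emission terms, and multi-period horizon are all omitted. Only the power balance and the on/off coupling p_min·u ≤ p ≤ p_max·u are kept.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp

def commit_and_dispatch(load, c_var, c_fixed, p_min, p_max):
    """Single-period unit commitment + dispatch as a MILP.
    Decision vector x = [p_1..p_n, u_1..u_n]: power outputs and on/off binaries."""
    n = len(c_var)
    c = np.concatenate([c_var, c_fixed])        # energy cost + commitment cost
    I = np.eye(n)
    constraints = [
        LinearConstraint(np.concatenate([np.ones(n), np.zeros(n)])[None, :], load, load),
        LinearConstraint(np.hstack([I, -np.diag(p_max)]), -np.inf, 0.0),  # p <= p_max * u
        LinearConstraint(np.hstack([I, -np.diag(p_min)]), 0.0, np.inf),   # p >= p_min * u
    ]
    integrality = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])
    bounds = Bounds(np.zeros(2 * n), np.concatenate([p_max, np.ones(n)]))  # u in {0, 1}
    return milp(c, integrality=integrality, bounds=bounds, constraints=constraints)

# Hypothetical data: two diesel generators and a fuel cell; load in kW.
res = commit_and_dispatch(
    load=600.0,
    c_var=np.array([0.30, 0.25, 0.40]),
    c_fixed=np.array([20.0, 30.0, 10.0]),
    p_min=np.array([100.0, 150.0, 0.0]),
    p_max=np.array([500.0, 800.0, 200.0]),
)
p, u = res.x[:3], np.round(res.x[3:])
```

With these numbers the cheapest option is to run only the second diesel generator at 600 kW (0.25 · 600 + 30 = 180). Committing a second unit adds its fixed cost, and committing the first generator forces at least 100 kW onto the more expensive unit.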
Progressive3D: Progressively Local Editing for Text-to-3D Content Creation with Complex Semantic Prompts
Authors: Xinhua Cheng, Tianyu Yang, Jianan Wang, Yu Li, Lei Zhang, Jian Zhang, Li Yuan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.11784
Pdf link: https://arxiv.org/pdf/2310.11784
Abstract Recent text-to-3D generation methods achieve impressive 3D content creation capacity thanks to advances in image diffusion models and optimization strategies. However, current methods struggle to generate correct 3D content for prompts with complex semantics, i.e., prompts describing multiple interacting objects bound to different attributes. In this work, we propose a general framework named Progressive3D, which decomposes the entire generation into a series of locally progressive editing steps to create precise 3D content for complex prompts, and constrains the content change in each editing step to regions determined by user-defined region prompts. Furthermore, we propose an overlapped semantic component suppression technique to encourage the optimization process to focus more on the semantic differences between prompts. Extensive experiments demonstrate that the proposed Progressive3D framework generates precise 3D content for prompts with complex semantics and is general across text-to-3D methods driven by different 3D representations.
NeuroCUT: A Neural Approach for Robust Graph Partitioning
Authors: Rishi Shah, Krishnanshu Jain, Sahil Manchanda, Sourav Medya, Sayan Ranu
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.11787
Pdf link: https://arxiv.org/pdf/2310.11787
Abstract Graph partitioning aims to divide a graph into $k$ disjoint subsets while optimizing a specific partitioning objective. The majority of formulations related to graph partitioning exhibit NP-hardness due to their combinatorial nature. As a result, conventional approximation algorithms rely on heuristic methods, sometimes with approximation guarantees and sometimes without. Unfortunately, traditional approaches are tailored for specific partitioning objectives and do not generalize well across other known partitioning objectives from the literature. To overcome this limitation, and learn heuristics from the data directly, neural approaches have emerged, demonstrating promising outcomes. In this study, we extend this line of work through a novel framework, NeuroCut. NeuroCut introduces two key innovations over prevailing methodologies. First, it is inductive to both graph topology and the partition count, which is provided at query time. Second, by leveraging a reinforcement learning based framework over node representations derived from a graph neural network, NeuroCut can accommodate any optimization objective, even those encompassing non-differentiable functions. Through empirical evaluation, we demonstrate that NeuroCut excels in identifying high-quality partitions, showcases strong generalization across a wide spectrum of partitioning objectives, and exhibits resilience to topological modifications.
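Edge cut and normalized cut are two standard partitioning objectives. Both are piecewise constant in the node assignment and therefore non-differentiable, the kind of objective the abstract says NeuroCut can handle. The hypothetical helper below only evaluates them for a given hard assignment. How NeuroCut turns such values into reinforcement-learning rewards is not described in the abstract and is not modeled here.

```python
def cut_objectives(edges, assignment, k):
    """edges: undirected (u, v) pairs; assignment: dict node -> part in range(k).
    Returns (edge_cut, normalized_cut), where normalized cut sums cut_i / vol_i."""
    cut = [0] * k   # edges leaving each part
    vol = [0] * k   # sum of node degrees in each part
    edge_cut = 0
    for u, v in edges:
        pu, pv = assignment[u], assignment[v]
        vol[pu] += 1
        vol[pv] += 1
        if pu != pv:
            edge_cut += 1
            cut[pu] += 1
            cut[pv] += 1
    ncut = sum(cut[i] / vol[i] for i in range(k) if vol[i] > 0)
    return edge_cut, ncut

# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
good = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
bad = {0: 0, 1: 0, 3: 0, 2: 1, 4: 1, 5: 1}
```

Splitting at the bridge cuts one edge (normalized cut 2/7), while swapping nodes 2 and 3 cuts five (10/7); a learned partitioner is rewarded for finding the former.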
Physics-informed Neural Network for Acoustic Resonance Analysis
Authors: Kazuya Yokota, Takahiko Kurahashi, Masajiro Abe
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2310.11804
Pdf link: https://arxiv.org/pdf/2310.11804
Abstract This study proposes a physics-informed neural network (PINN) framework to solve the wave equation for acoustic resonance analysis. ResoNet, the analytical model proposed in this study, minimizes a loss function for periodic solutions in addition to the conventional PINN loss functions, thereby effectively exploiting the function approximation capability of neural networks for resonance analysis. Additionally, it can easily be applied to inverse problems. Herein, the resonance in a one-dimensional acoustic tube was analyzed. The effectiveness of the proposed method was validated through forward and inverse analyses of the wave equation with energy-loss terms. In the forward analysis, the applicability of PINN to the resonance problem was evaluated by comparison with the finite-difference method. The inverse analysis, which included the identification of the energy-loss term in the wave equation and design optimization of the acoustic tube, was performed with good accuracy.
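The two kinds of loss mentioned, a PDE residual and a periodic-solution term, can be sketched for a candidate function u(x, t). This is an illustration under assumptions, not ResoNet: it uses the lossless wave equation u_tt = c² u_xx (the paper's energy-loss terms are not given in the abstract), evaluates derivatives with central finite differences where a PINN would use automatic differentiation, and all names are hypothetical.

```python
import numpy as np

def wave_residual(u, x, t, c, h=1e-3):
    """Central-difference estimate of u_tt - c**2 * u_xx for a callable u(x, t)."""
    u_tt = (u(x, t + h) - 2.0 * u(x, t) + u(x, t - h)) / h**2
    u_xx = (u(x + h, t) - 2.0 * u(x, t) + u(x - h, t)) / h**2
    return u_tt - c**2 * u_xx

def periodicity_loss(u, x, t, period):
    """Penalizes u(x, t + T) != u(x, t), i.e. departures from a T-periodic solution."""
    return float(np.mean((u(x, t + period) - u(x, t)) ** 2))

def resonance_loss(u, x, t, c, period, w_pde=1.0, w_per=1.0):
    pde = float(np.mean(wave_residual(u, x, t, c) ** 2))
    return w_pde * pde + w_per * periodicity_loss(u, x, t, period)

# Nondimensional check: tube length 1, wave speed 1, first standing-wave mode (period 2).
X, T = np.meshgrid(np.linspace(0.0, 1.0, 41), np.linspace(0.0, 2.0, 41))

def exact(x, t):
    return np.sin(np.pi * x) * np.cos(np.pi * t)

def wrong(x, t):   # periodic in t, but violates the wave equation
    return np.sin(np.pi * x) * np.cos(2.0 * np.pi * t)
```

The exact standing wave drives both terms to essentially zero, while `wrong` is periodic yet leaves a large PDE residual; the periodic term alone cannot identify a resonance, which is why it is combined with the conventional residual loss.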
Multistable Perception, False Consensus, and Information Complements
Authors: Yuqing Kong
Subjects: Computer Science and Game Theory (cs.GT)
Arxiv link: https://arxiv.org/abs/2310.11857
Pdf link: https://arxiv.org/pdf/2310.11857
Abstract This paper presents a distributed communication model to investigate multistable perception, where a stimulus gives rise to multiple competing perceptual interpretations. We formalize stable perception as consensus achieved through components exchanging information. Our key finding is that relationships between components influence monostable versus multistable perceptions. When components contain substitute information about the prediction target, stimuli display monostability. With complementary information, multistability arises. We then analyze phenomena such as order effects and switching costs. Finally, we provide two additional perspectives. An optimization perspective balances accuracy and communication costs, relating stability to local optima. A prediction-market perspective highlights the strategic behaviors of neural coordination and provides insights into phenomena such as rivalry, inhibition, and mental disorders. The two perspectives demonstrate how relationships among components influence perception costs and affect competition and coordination behaviors in neural dynamics.
Stochastic Optimization for Non-convex Problem with Inexact Hessian Matrix, Gradient, and Function
Authors: Liu Liu, Xuanqing Liu, Cho-Jui Hsieh, Dacheng Tao
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.11866
Pdf link: https://arxiv.org/pdf/2310.11866
Abstract Trust-region (TR) and adaptive regularization using cubics (ARC) methods have been shown to have very appealing theoretical properties for non-convex optimization; they concurrently compute the function value, gradient, and Hessian matrix to obtain the next search direction and the adjusted parameters. Although stochastic approximations greatly reduce the computational cost, it is challenging to theoretically guarantee the convergence rate. In this paper, we explore a family of stochastic TR and ARC methods that simultaneously allow inexact computations of the Hessian matrix, gradient, and function values. Our algorithms require much less propagation overhead per iteration than TR and ARC. We prove that the iteration complexity to achieve $\epsilon$-approximate second-order optimality is of the same order as that of the exact computations demonstrated in previous studies. Additionally, the mild conditions on inexactness can be met by leveraging a random sampling technique in the finite-sum minimization problem. Numerical experiments with a non-convex problem support these findings and demonstrate that, with the same or a similar number of iterations, our algorithms require less computational overhead per iteration than current second-order methods.
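The random-sampling idea can be sketched with a basic trust-region method in which the function value, gradient, and Hessian are all estimated on the same random subsample of a finite sum. This is not the authors' TR or ARC algorithm and does not check their inexactness conditions. It solves the subproblem only with the textbook Cauchy point. The batch size, radius-update constants, and the convex least-squares test problem are hypothetical simplifications, even though the methods target non-convex problems.

```python
import numpy as np

def cauchy_step(g, H, radius):
    """Cauchy point of the model m(p) = g.p + 0.5 p.H.p subject to ||p|| <= radius."""
    gnorm = np.linalg.norm(g)
    if gnorm == 0.0:
        return np.zeros_like(g)
    gHg = g @ H @ g
    tau = 1.0 if gHg <= 0 else min(gnorm**3 / (radius * gHg), 1.0)
    return -tau * radius / gnorm * g

def subsampled_tr(w, f_i, g_i, H_i, n, radius=1.0, iters=200, batch=32, eta=0.1, seed=0):
    """Trust region with subsampled function values, gradients and Hessians."""
    rng = np.random.default_rng(seed)
    for _ in range(iters):
        S = rng.choice(n, size=min(batch, n), replace=False)
        g = np.mean([g_i(w, i) for i in S], axis=0)
        H = np.mean([H_i(w, i) for i in S], axis=0)
        p = cauchy_step(g, H, radius)
        predicted = -(g @ p + 0.5 * p @ H @ p)      # model decrease m(0) - m(p)
        if predicted <= 0.0:
            break

        def f_S(v):
            return np.mean([f_i(v, i) for i in S])  # inexact function value

        rho = (f_S(w) - f_S(w + p)) / predicted     # actual vs. predicted decrease
        if rho > eta:
            w = w + p                               # accept the step
        if rho > 0.75:
            radius *= 2.0
        elif rho < 0.25:
            radius *= 0.25
    return w

# Usage: noise-free least squares, f_i(w) = 0.5 * (a_i . w - b_i)**2.
rng = np.random.default_rng(1)
A_mat = 0.5 * rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
b = A_mat @ w_true
w_hat = subsampled_tr(
    np.zeros(3),
    f_i=lambda w, i: 0.5 * (A_mat[i] @ w - b[i]) ** 2,
    g_i=lambda w, i: (A_mat[i] @ w - b[i]) * A_mat[i],
    H_i=lambda w, i: np.outer(A_mat[i], A_mat[i]),
    n=200,
)
```

Each iteration touches only `batch` of the `n` terms, which is the per-iteration saving the abstract refers to. The acceptance ratio `rho` is itself computed from inexact function values, so in general its reliability depends on conditions like those the paper analyzes.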
Online Convex Optimization with Switching Cost and Delayed Gradients
Authors: Spandan Senapati, Rahul Vaze
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.11880
Pdf link: https://arxiv.org/pdf/2310.11880
Abstract We consider the online convex optimization (OCO) problem with quadratic and linear switching costs in the limited information setting, where an online algorithm can choose its action using only gradient information about the previous objective function. For $L$-smooth and $\mu$-strongly convex objective functions, we propose an online multiple gradient descent (OMGD) algorithm and show that its competitive ratio for the OCO problem with quadratic switching cost is at most $4(L + 5) + \frac{16(L + 5)}{\mu}$. This competitive ratio upper bound for OMGD is also shown to be order-wise tight in terms of $L,\mu$. In addition, we show that the competitive ratio of any online algorithm is $\max\{\Omega(L), \Omega(\frac{L}{\sqrt{\mu}})\}$ in the limited information setting when the switching cost is quadratic. We also show that the OMGD algorithm achieves the optimal (order-wise) dynamic regret in the limited information setting. For the linear switching cost, the competitive ratio upper bound of the OMGD algorithm is shown to depend on both the path length and the squared path length of the problem instance, in addition to $L, \mu$, and is shown to be, order-wise, the best competitive ratio any online algorithm can achieve. Consequently, we conclude that the optimal competitive ratios for the quadratic and linear switching costs are fundamentally different in the limited information setting.
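The abstract does not spell out OMGD's update, so the following is a minimal sketch assuming the name describes it: in each round the learner starts from its previous action and takes several gradient steps on the last revealed function f_{t-1}, the only information available in the limited information setting. The step size 1/L and the number of inner steps are hypothetical choices, not the paper's tuned values.

```python
import numpy as np

def omgd(gradients, x0, step, inner_steps):
    """gradients[t] is the gradient of f_t. The action for round t+1 is obtained by
    running `inner_steps` gradient-descent steps on f_t, the last revealed function."""
    xs = [np.asarray(x0, dtype=float)]
    for grad in gradients:
        x = xs[-1].copy()
        for _ in range(inner_steps):
            x = x - step * grad(x)
        xs.append(x)
    return xs

# Quadratic losses f_t(x) = (L / 2) * ||x - theta_t||^2 with a drifting minimizer.
L = 2.0
thetas = [np.array([1.0, 0.0]), np.array([1.5, 0.5]), np.array([2.0, 1.0])]
grads = [lambda x, th=th: L * (x - th) for th in thetas]
xs = omgd(grads, np.zeros(2), step=1.0 / L, inner_steps=3)
```

For these quadratics one step of size 1/L already lands on the minimizer of the previous round, so the learner always plays last round's optimum; its cost then comes from how far the minimizer drifts, which is why path-length terms appear in the bounds.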
Differentially Private Distributed Stochastic Optimization with Time-Varying Sample Sizes
Authors: Jimin Wang, Ji-Feng Zhang
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.11892
Pdf link: https://arxiv.org/pdf/2310.11892
Abstract Differentially private distributed stochastic optimization has become a hot topic due to the urgent need for privacy protection in distributed stochastic optimization. In this paper, two-time-scale stochastic approximation-type algorithms for differentially private distributed stochastic optimization with time-varying sample sizes are proposed using gradient- and output-perturbation methods. For both the gradient- and output-perturbation cases, the convergence of the algorithm and differential privacy with a finite cumulative privacy budget $\varepsilon$ over an infinite number of iterations are established simultaneously, which differs substantially from existing works. By using time-varying sample sizes, the privacy level is enhanced, and differential privacy with a finite cumulative privacy budget $\varepsilon$ over an infinite number of iterations is established. By properly choosing a Lyapunov function, the algorithm achieves almost-sure and mean-square convergence even when the added privacy noise has increasing variance. Furthermore, we rigorously derive the mean-square convergence rates of the algorithm and show how the added privacy noise affects the convergence rate. Finally, numerical examples, including distributed training on a benchmark machine learning dataset, are presented to demonstrate the efficiency and advantages of the algorithms.
Acoustic shape optimization using energy stable curvilinear finite differences
Authors: Gustav Eriksson, Vidar Stiernström
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2310.11956
Pdf link: https://arxiv.org/pdf/2310.11956
Abstract A gradient-based method for shape optimization problems constrained by the acoustic wave equation is presented. The method makes use of high-order accurate finite differences with summation-by-parts properties on multiblock curvilinear grids to discretize in space. With the design domain represented through a coordinate mapping from a reference domain, the design shape is obtained by inversion of the discretized coordinate map. The adjoint state framework is employed to efficiently compute the gradient of the loss functional. Using the summation-by-parts properties of the finite difference discretization, we prove stability and dual consistency for the semi-discrete forward and adjoint problems. Numerical experiments verify the accuracy of the finite difference scheme and demonstrate the capabilities of the shape optimization method on two model problems with real-world relevance.
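To make "summation-by-parts" concrete, here is the classic second-order SBP first-derivative operator $D = H^{-1}Q$ on a uniform 1-D grid, with the defining property $Q + Q^T = B = \mathrm{diag}(-1, 0, \dots, 0, 1)$. This is only the simplest textbook instance; the paper's high-order operators on multiblock curvilinear grids are not reproduced, and the grid size and spacing below are arbitrary choices.

```python
def sbp_first_derivative(n, h):
    """Second-order SBP first-derivative operator D = H^{-1} Q on n uniform points with spacing h."""
    H = [h] * n
    H[0] = H[-1] = h / 2  # diagonal norm (quadrature) matrix
    Q = [[0.0] * n for _ in range(n)]
    Q[0][0], Q[-1][-1] = -0.5, 0.5
    for i in range(n - 1):
        Q[i][i + 1], Q[i + 1][i] = 0.5, -0.5  # central differences in the interior
    D = [[Q[i][j] / H[i] for j in range(n)] for i in range(n)]
    return H, Q, D


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


n, h = 9, 0.125
H, Q, D = sbp_first_derivative(n, h)
x = [i * h for i in range(n)]
# SBP property: Q + Q^T = B = diag(-1, 0, ..., 0, 1)
B = [[Q[i][j] + Q[j][i] for j in range(n)] for i in range(n)]
```

The identity $u^T H (Dv) + (Du)^T H v = u_N v_N - u_0 v_0$ is the discrete analogue of integration by parts, which is what energy-stability and dual-consistency proofs of this kind build on.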
A Finite-Horizon Approach to Active Level Set Estimation
Authors: Phillip Kearns, Bruno Jedynak, John Lipor
Subjects: Machine Learning (cs.LG); Robotics (cs.RO); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.11985
Pdf link: https://arxiv.org/pdf/2310.11985
Abstract We consider the problem of active learning in the context of spatial sampling for level set estimation (LSE), where the goal is to localize all regions where a function of interest lies above/below a given threshold as quickly as possible. We present a finite-horizon search procedure to perform LSE in one dimension while optimally balancing both the final estimation error and the distance traveled for a fixed number of samples. A tuning parameter is used to trade off between the estimation accuracy and distance traveled. We show that the resulting optimization problem can be solved in closed form and that the resulting policy generalizes existing approaches to this problem. We then show how this approach can be used to perform level set estimation in higher dimensions under the popular Gaussian process model. Empirical results on synthetic data indicate that as the cost of travel increases, our method's ability to treat distance nonmyopically allows it to significantly improve on the state of the art. On real air quality data, our approach achieves roughly one fifth the estimation error at less than half the cost of competing algorithms.
Exact and efficient solutions of the LMC Multitask Gaussian Process model
Authors: Olivier Truffinet (CEA Saclay), Karim Ammar (CEA Saclay), Jean-Philippe Argaud (EDF R&D), Bertrand Bouriquet (EDF)
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.12032
Pdf link: https://arxiv.org/pdf/2310.12032
Abstract The Linear Model of Co-regionalization (LMC) is a very general model of multitask Gaussian process for regression or classification. While its expressivity and conceptual simplicity are appealing, naive implementations have cubic complexity in the number of datapoints and number of tasks, making approximations mandatory for most applications. However, recent work has shown that under some conditions the latent processes of the model can be decoupled, leading to a complexity that is only linear in the number of said processes. We here extend these results, showing from the most general assumptions that the only condition necessary for an efficient exact computation of the LMC is a mild hypothesis on the noise model. We introduce a full parametrization of the resulting \emph{projected LMC} model, and an expression of the marginal likelihood enabling efficient optimization. We perform a parametric study on synthetic data to show the excellent performance of our approach, compared to an unrestricted exact LMC and approximations of the latter. Overall, the projected LMC appears as a credible and simpler alternative to state-of-the-art models, which greatly facilitates some computations such as leave-one-out cross-validation and fantasization.
A Persuasive Approach to Combating Misinformation
Authors: Safwan Hossain, Andjela Mladenovic, Yiling Chen, Gauthier Gidel
Subjects: Computer Science and Game Theory (cs.GT)
Arxiv link: https://arxiv.org/abs/2310.12065
Pdf link: https://arxiv.org/pdf/2310.12065
Abstract We propose using Bayesian Persuasion as a tool for social media platforms to combat the spread of online misinformation. As platforms can predict the popularity and misinformation features of to-be-shared posts, and users are motivated to only share popular content, platforms can strategically reveal this informational advantage to persuade users to not share misinformed content. Our work mathematically characterizes the optimal information design scheme and the resulting utility when observations are not perfectly observed but arise from an imperfect classifier. Framing the optimization problem as a linear program, we give sufficient and necessary conditions on the classifier accuracy to ensure platform utility under optimal signaling is monotonically increasing and continuous. We next consider this interaction under a performative model, wherein platform intervention through signaling affects the content distribution in the future. We fully characterize the convergence and stability of optimal signaling under this performative process. Lastly, the broader scope of using information design to combat misinformation is discussed throughout.
DHOT-GM: Robust Graph Matching Using A Differentiable Hierarchical Optimal Transport Framework
Authors: Haoran Cheng, Dixin Luo, Hongteng Xu
Subjects: Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.12081
Pdf link: https://arxiv.org/pdf/2310.12081
Abstract Graph matching is one of the most significant graph analytic tasks in practice, which aims to find the node correspondence across different graphs. Most existing approaches rely on adjacency matrices or node embeddings when matching graphs, and their performance is often sub-optimal because they do not fully leverage the multi-modal information hidden in graphs, such as node attributes and subgraph structures. In this study, we propose a novel and effective graph matching method based on a differentiable hierarchical optimal transport (HOT) framework, called DHOT-GM. Essentially, our method represents each graph as a set of relational matrices corresponding to the information of different modalities. Given two graphs, we enumerate all relational matrix pairs and obtain their matching results, and accordingly, infer the node correspondence by the weighted averaging of the matching results. This method can be implemented as computing the HOT distance between the two graphs: each matching result is an optimal transport plan associated with the Gromov-Wasserstein (GW) distance between two relational matrices, and the weights of all matching results are the elements of an upper-level optimal transport plan defined on the matrix sets. We propose a bi-level optimization algorithm to compute the HOT distance in a differentiable way, making the significance of the relational matrices adjustable. Experiments on various graph matching tasks demonstrate the superiority and robustness of our method compared to state-of-the-art approaches.
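The upper level of the hierarchy is itself an optimal transport plan: its cost between two relational matrices would be their GW distance, and its entries weight the lower-level matchings. As an illustration of a differentiable OT building block, here is entropic OT solved by Sinkhorn scaling. Using Sinkhorn (and the value of `eps`) for DHOT-GM is an assumption, not something the abstract states.

```python
import math


def sinkhorn(cost, a, b, eps=0.1, iters=500):
    """Entropic OT plan between histograms a and b for a given cost matrix (Sinkhorn-Knopp scaling)."""
    K = [[math.exp(-c / eps) for c in row] for row in cost]  # Gibbs kernel
    u, v = [1.0] * len(a), [1.0] * len(b)
    for _ in range(iters):
        # Alternately rescale rows and columns so the plan's marginals match a and b.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(len(b))) for i in range(len(a))]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(len(a))) for j in range(len(b))]
    return [[u[i] * K[i][j] * v[j] for j in range(len(b))] for i in range(len(a))]
```

For two matrix sets with cost `[[0.0, 1.0], [1.0, 0.0]]` and uniform weights, nearly all mass goes to the diagonal pairs. Because every operation is smooth, gradients can flow through the plan, which is the property a differentiable bi-level scheme needs.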
Quality Diversity through Human Feedback
Authors: Li Ding, Jenny Zhang, Jeff Clune, Lee Spector, Joel Lehman
Subjects: Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2310.12103
Pdf link: https://arxiv.org/pdf/2310.12103
Abstract Reinforcement learning from human feedback (RLHF) has exhibited the potential to enhance the performance of foundation models for qualitative tasks. Despite its promise, its efficacy is often restricted when conceptualized merely as a mechanism to maximize learned reward models of averaged human preferences, especially in areas such as image generation which demand diverse model responses. Meanwhile, quality diversity (QD) algorithms, dedicated to seeking diverse, high-quality solutions, are often constrained by the dependency on manually defined diversity metrics. Interestingly, such limitations of RLHF and QD can be overcome by blending insights from both. This paper introduces Quality Diversity through Human Feedback (QDHF), which employs human feedback for inferring diversity metrics, expanding the applicability of QD algorithms. Empirical results reveal that QDHF outperforms existing QD methods regarding automatic diversity discovery, and matches the search capabilities of QD with human-constructed metrics. Notably, when deployed for a latent space illumination task, QDHF markedly enhances the diversity of images generated by a Diffusion model. The study concludes with an in-depth analysis of QDHF's sample efficiency and the quality of its derived diversity metrics, emphasizing its promise for enhancing exploration and diversity in optimization for complex, open-ended tasks.
Distributed Indexing Schemes for k-Dominant Skyline Analytics on Uncertain Edge-IoT Data
Authors: Chuan-Chi Lai, Hsuan-Yu Lin, Chuan-Ming Liu
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Databases (cs.DB)
Arxiv link: https://arxiv.org/abs/2310.12116
Pdf link: https://arxiv.org/pdf/2310.12116
Abstract Skyline queries typically search for a Pareto-optimal set in a given data set to solve the corresponding multiobjective optimization problem. As the number of criteria increases, the skyline comes to contain an excessive number of data items, which yields a meaningless result. To address this curse of dimensionality, we proposed a k-dominant skyline in which the number of skyline members was reduced by relaxing the restriction on the number of dimensions, considering the uncertainty of data. Specifically, each data item was associated with a probability of appearance, which represented the probability of becoming a member of the k-dominant skyline. As data items appear continuously in data streams, the corresponding k-dominant skyline may vary with time. Therefore, an effective and rapid mechanism for updating the k-dominant skyline becomes crucial. Herein, we proposed two time-efficient schemes, Middle Indexing (MI) and All Indexing (AI), for k-dominant skyline in distributed edge-computing environments, where irrelevant data items can be effectively excluded from the computation to reduce the processing time. Furthermore, the proposed schemes were validated with extensive experimental simulations. The experimental results demonstrated that the proposed MI and AI schemes reduced the computation time by approximately 13% and 56%, respectively, compared with the existing method.
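A minimal sketch of the standard k-dominance test and a brute-force k-dominant skyline, to show how relaxing the number of dimensions shrinks the result. It assumes smaller values are better and ignores the per-item appearance probabilities and the MI/AI indexing schemes, which are the paper's actual contributions.

```python
def k_dominates(p, q, k):
    """True if p k-dominates q: p is no worse in at least k dimensions and strictly better in one of them."""
    no_worse = sum(1 for a, b in zip(p, q) if a <= b)
    strictly_better = sum(1 for a, b in zip(p, q) if a < b)
    # A strictly better dimension is also a no-worse one, so it can always be among the k chosen dimensions.
    return no_worse >= k and strictly_better >= 1


def k_dominant_skyline(points, k):
    """Brute-force O(n^2) filter: keep points not k-dominated by any other point."""
    return [p for i, p in enumerate(points)
            if not any(k_dominates(q, p, k) for j, q in enumerate(points) if j != i)]
```

With `k` equal to the dimensionality this is the ordinary skyline; for `pts = [(1, 2, 3), (2, 1, 3), (3, 3, 1), (2, 2, 2)]` every point survives at `k = 3`, while at `k = 2` the dominance relation becomes cyclic and the skyline is empty. This is why a smaller `k` yields fewer, more meaningful members.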
Fairer and More Accurate Tabular Models Through NAS
Authors: Richeek Das, Samuel Dooley
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.12145
Pdf link: https://arxiv.org/pdf/2310.12145
Abstract Making models algorithmically fairer in tabular data has been long studied, with techniques typically oriented towards fixes which usually take a neural model with an undesirable outcome and make changes to how the data are ingested, what the model weights are, or how outputs are processed. We employ an emergent and different strategy where we consider updating the model's architecture and training hyperparameters to find an entirely new model with better outcomes from the beginning of the debiasing procedure. In this work, we propose using multi-objective Neural Architecture Search (NAS) and Hyperparameter Optimization (HPO) in the first application to the very challenging domain of tabular data. We conduct extensive exploration of architectural and hyperparameter spaces (MLP, ResNet, and FT-Transformer) across diverse datasets, demonstrating the dependence of accuracy and fairness metrics of model predictions on hyperparameter combinations. We show that models optimized solely for accuracy with NAS often fail to inherently address fairness concerns. We propose a novel approach that jointly optimizes architectural and training hyperparameters in a multi-objective constraint of both accuracy and fairness. We produce architectures that consistently Pareto dominate state-of-the-art bias mitigation methods either in fairness, accuracy or both, all of this while being Pareto-optimal over hyperparameters achieved through single-objective (accuracy) optimization runs. This research underscores the promise of automating fairness and accuracy optimization in deep learning models.
Probabilistic Sampling of Balanced K-Means using Adiabatic Quantum Computing
Authors: Jan-Nico Zaech, Martin Danelljan, Luc Van Gool
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.12153
Pdf link: https://arxiv.org/pdf/2310.12153
Abstract Adiabatic quantum computing (AQC) is a promising quantum computing approach for discrete and often NP-hard optimization problems. Current AQCs make it possible to implement problems of research interest, which has sparked the development of quantum representations for many machine learning and computer vision tasks. Despite requiring multiple measurements from the noisy AQC, current approaches only utilize the best measurement, discarding information contained in the remaining ones. In this work, we explore the potential of using this information for probabilistic balanced k-means clustering. Instead of discarding non-optimal solutions, we propose to use them to compute calibrated posterior probabilities with little additional compute cost. This allows us to identify ambiguous solutions and data points, which we demonstrate on a D-Wave AQC on synthetic and real data.
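One plausible way to turn all measurements into per-point cluster probabilities is to weight each distinct measured assignment by a Boltzmann factor of its energy and sum the weights per label. The Boltzmann form, the temperature, and the deduplication of repeated solutions are assumptions for illustration; how the authors calibrate their posterior is not specified in the abstract.

```python
import math
from collections import defaultdict


def cluster_posteriors(samples, energies, temperature=1.0):
    """Per-point cluster probabilities from all measured assignments, not only the lowest-energy one.

    samples: one tuple of cluster labels per measurement; energies: the energy of each measurement.
    """
    unique = dict(zip(map(tuple, samples), energies))  # keep each distinct solution once
    e_min = min(unique.values())
    weights = {s: math.exp(-(e - e_min) / temperature) for s, e in unique.items()}
    z = sum(weights.values())
    posteriors = [defaultdict(float) for _ in range(len(samples[0]))]
    for s, w in weights.items():
        for point, label in enumerate(s):
            posteriors[point][label] += w / z
    return posteriors
```

A point whose label flips between equally good solutions ends up near 0.5 for each cluster, flagging it as ambiguous, whereas keeping only the best measurement would report it with full confidence.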
Keyword: adam

Learning to Generate Parameters of ConvNets for Unseen Image Data
Authors: Shiye Wang, Kaituo Feng, Changsheng Li, Ye Yuan, Guoren Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.11862
Pdf link: https://arxiv.org/pdf/2310.11862
Abstract Typical Convolutional Neural Networks (ConvNets) depend heavily on large amounts of image data and resort to an iterative optimization algorithm (e.g., SGD or Adam) to learn network parameters, which makes training very time- and resource-intensive. In this paper, we propose a new training paradigm and formulate the parameter learning of ConvNets as a prediction task: given a ConvNet architecture, we observe that there exist correlations between image datasets and their corresponding optimal network parameters, and explore whether we can learn a hyper-mapping between them to capture these relations, such that we can directly predict the parameters of the network for an image dataset never seen during the training phase. To do this, we put forward a new hypernetwork-based model, called PudNet, which aims to learn a mapping between datasets and their corresponding network parameters, and then predicts parameters for unseen data with only a single forward propagation. Moreover, our model benefits from a series of adaptive hyper recurrent units sharing weights to capture the dependencies of parameters among different network layers. Extensive experiments demonstrate that our proposed method achieves good efficacy for unseen image datasets in two settings: intra-dataset prediction and inter-dataset prediction. Our PudNet also scales well to large-scale datasets, e.g., ImageNet-1K. It takes 8967 GPU seconds to train ResNet-18 on ImageNet-1K using GC from scratch and obtain a top-5 accuracy of 44.65%. However, our PudNet costs only 3.89 GPU seconds to predict the network parameters of ResNet-18, achieving comparable performance (44.92%), more than 2,300 times faster than the traditional training paradigm.
Keyword: gradient

High Efficiency Polymer based Direct Multi-jet Impingement Cooling Solution for High Power Devices
Authors: Tiwei Wei
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.11663
Pdf link: https://arxiv.org/pdf/2310.11663
Abstract Liquid jet impingement cooling is an efficient cooling technique in which the liquid coolant is ejected from nozzles directly onto the chip backside, resulting in high cooling efficiency due to the absence of a thermal interface material (TIM) and of a lateral temperature gradient. In the literature, several Si-fabrication based impingement coolers with nozzle diameters of a few distributed returns or combination of micro-channels and impingement nozzles. The drawback of this Si processing of the cooler is the high fabrication cost. Other fabrication methods for nozzle diameters for ceramic and metal. Low-cost fabrication methods, including injection molding and 3D printing, have been introduced for much larger nozzle diameters (mm range) with larger cooler dimensions. These dimensions and processes are, however, not compatible with the chip packaging process flow. This PhD thesis focuses on the modeling, design, fabrication, and characterization of a micro-scale liquid impingement cooler using advanced, yet cost-efficient, fabrication techniques. The main objectives are: (a) development of a modeling methodology to optimize the cooler geometry; (b) exploring low-cost fabrication methods for the package-level impingement jet cooler; (c) experimental thermal and hydraulic characterization and analysis of the fabricated coolers; (d) applying the direct impingement jet cooling solutions to different applications.
Improved Sample Complexity Analysis of Natural Policy Gradient Algorithm with General Parameterization for Infinite Horizon Discounted Reward Markov Decision Processes
Authors: Washim Uddin Mondal, Vaneet Aggarwal
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.11677
Pdf link: https://arxiv.org/pdf/2310.11677
Abstract We consider the problem of designing sample-efficient learning algorithms for infinite horizon discounted reward Markov decision processes. Specifically, we propose the Accelerated Natural Policy Gradient (ANPG) algorithm that utilizes an accelerated stochastic gradient descent process to obtain the natural policy gradient. ANPG achieves $\mathcal{O}({\epsilon^{-2}})$ sample complexity and $\mathcal{O}(\epsilon^{-1})$ iteration complexity with general parameterization where $\epsilon$ defines the optimality error. This improves the state-of-the-art sample complexity by a $\log(\frac{1}{\epsilon})$ factor. ANPG is a first-order algorithm and, unlike some existing literature, does not require the unverifiable assumption that the variance of importance sampling (IS) weights is upper bounded. In the class of Hessian-free and IS-free algorithms, ANPG beats the best-known sample complexity by a factor of $\mathcal{O}(\epsilon^{-\frac{1}{2}})$ and simultaneously matches their state-of-the-art iteration complexity.
Deep learning based on Transformer architecture for power system short-term voltage stability assessment with class imbalance
Authors: Yang Li, Jiting Cao, Yan Xu, Lipeng Zhu, Zhao Yang Dong
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.11690
Pdf link: https://arxiv.org/pdf/2310.11690
Abstract Most existing data-driven power system short-term voltage stability assessment (STVSA) approaches presume class-balanced input data. However, in practical applications, short-term voltage instability following a disturbance rarely occurs, leading to a significant class imbalance problem and a consequent decline in classifier performance. This work proposes a Transformer-based STVSA method to address this challenge. Using the basic Transformer architecture, a stability assessment Transformer (StaaT) is developed as a classification model to reflect the correlation between the operational states of the system and the resulting stability outcomes. To combat the negative impact of imbalanced datasets, this work employs a conditional Wasserstein generative adversarial network with gradient penalty (CWGAN-GP) for synthetic data generation, aiding in the creation of a balanced, representative training set for the classifier. Semi-supervised clustering learning is implemented to enhance clustering quality, addressing the lack of a unified quantitative criterion for short-term voltage stability. Numerical tests on the IEEE 39-bus test system extensively demonstrate that the proposed method exhibits robust performance under class imbalances up to 100:1 and in noisy environments, and maintains consistent effectiveness even with increased penetration of renewable energy. Comparative results reveal that the CWGAN-GP generates more balanced datasets than traditional oversampling methods and that the StaaT outperforms other deep learning algorithms. This study presents a compelling solution for real-world STVSA applications that often face class imbalance and data noise challenges.
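For reference, the critic objective of a gradient-penalty Wasserstein GAN in its standard form, with the class label $y$ given to both critic $D$ and generator $G$. How exactly this work conditions on $y$ and which penalty weight $\lambda$ it uses are assumptions here ($\lambda = 10$ is the value commonly used with WGAN-GP).

```latex
\mathcal{L}_D
  = \mathbb{E}_{z}\big[D(G(z \mid y) \mid y)\big]
  - \mathbb{E}_{(x,y)}\big[D(x \mid y)\big]
  + \lambda\,\mathbb{E}_{\hat{x}}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x} \mid y) \rVert_2 - 1\big)^2\Big],
\qquad
\hat{x} = \epsilon\, x + (1-\epsilon)\, G(z \mid y),\quad \epsilon \sim \mathcal{U}[0,1]
```

The penalty pushes the critic's gradient norm toward 1 on points interpolated between real and synthetic samples, a soft version of the 1-Lipschitz constraint; conditioning on $y$ lets the generator be asked specifically for minority-class (unstable) samples to rebalance the training set.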
Unintended Memorization in Large ASR Models, and How to Mitigate It
Authors: Lun Wang, Om Thakkar, Rajiv Mathews
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2310.11739
Pdf link: https://arxiv.org/pdf/2310.11739
Abstract It is well-known that neural networks can unintentionally memorize their training examples, causing privacy concerns. However, auditing memorization in large non-auto-regressive automatic speech recognition (ASR) models has been challenging due to the high compute cost of existing methods such as hardness calibration. In this work, we design a simple auditing method to measure memorization in large ASR models without the extra compute overhead. Concretely, we speed up randomly-generated utterances to create a mapping between vocal and text information that is difficult to learn from typical training examples. Hence, accurate predictions only for sped-up training examples can serve as clear evidence for memorization, and the corresponding accuracy can be used to measure memorization. Using the proposed method, we showcase memorization in the state-of-the-art ASR models. To mitigate memorization, we tried gradient clipping during training to bound the influence of any individual example on the final model. We empirically show that clipping each example's gradient can mitigate memorization for sped-up training examples with up to 16 repetitions in the training set. Furthermore, we show that in large-scale distributed training, clipping the average gradient on each compute core maintains neutral model quality and compute cost while providing strong privacy protection.
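A minimal sketch contrasting the two clipping variants mentioned in the abstract: clipping every example's gradient before averaging, versus averaging on each compute core and clipping that core-level average. Plain Python lists stand in for gradient tensors; the clip norm and the final mean-over-cores aggregation are assumptions, and no privacy noise is added.

```python
import math


def clip_to_norm(vec, c):
    """Scale vec so its L2 norm is at most c."""
    norm = math.sqrt(sum(v * v for v in vec))
    return list(vec) if norm <= c else [v * c / norm for v in vec]


def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def per_example_clipping(example_grads, c):
    # Clip every example's gradient, then average: bounds any single example's influence.
    return mean([clip_to_norm(g, c) for g in example_grads])


def per_core_clipping(grads_by_core, c):
    # Average within each compute core first, clip each core's average, then average across cores.
    return mean([clip_to_norm(mean(core), c) for core in grads_by_core])
```

Per-core clipping needs only one norm computation per core instead of one per example, which is consistent with the abstract's claim that it keeps compute cost neutral in large-scale distributed training.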
Adversarial Training for Physics-Informed Neural Networks
Authors: Yao Li, Shengzhu Shi, Zhichang Guo, Boying Wu
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2310.11789
Pdf link: https://arxiv.org/pdf/2310.11789
Abstract Physics-informed neural networks (PINNs) have shown great promise in solving partial differential equations (PDEs). However, due to insufficient robustness, vanilla PINNs often face challenges when solving complex PDEs, especially those involving multi-scale behaviors or solutions with sharp or oscillatory characteristics. To address these issues, based on the projected gradient descent adversarial attack, we propose an adversarial training strategy for PINNs termed AT-PINNs. AT-PINNs enhance the robustness of PINNs by fine-tuning the model with adversarial samples, which can accurately identify model failure locations and drive the model to focus on those regions during training. AT-PINNs can also perform inference with temporal causality by selecting the initial collocation points around temporal initial values. We apply AT-PINNs to the elliptic equation with multi-scale coefficients, the Poisson equation with multi-peak solutions, the Burgers equation with sharp solutions, and the Allen-Cahn equation. The results demonstrate that AT-PINNs can effectively locate and reduce failure regions. Moreover, AT-PINNs are suitable for solving complex PDEs, since locating failure regions through adversarial attacks is independent of the size of failure regions or the complexity of the distribution.
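A minimal 1-D sketch of a projected-gradient-style attack on collocation points: each point takes sign-gradient ascent steps on the squared PDE residual and is projected back onto an $\epsilon$-ball around its start and onto the domain. The `residual` callable, the finite-difference gradient (standing in for automatic differentiation through the network), and the values of `eps`, `alpha`, and `steps` are hypothetical; the subsequent fine-tuning on these points is not shown.

```python
def sign(v):
    return (v > 0) - (v < 0)


def pgd_collocation(points, residual, eps=0.05, alpha=0.01, steps=10, lo=0.0, hi=1.0, h=1e-6):
    """Move 1-D collocation points toward larger squared PDE residual with projected sign-gradient ascent."""
    adv = []
    for x0 in points:
        x = x0
        for _ in range(steps):
            # Central finite difference of r(x)^2 stands in for automatic differentiation.
            grad = (residual(x + h) ** 2 - residual(x - h) ** 2) / (2 * h)
            x = x + alpha * sign(grad)
            x = min(max(x, x0 - eps), x0 + eps)  # project onto the eps-ball around x0
            x = min(max(x, lo), hi)  # stay inside the computational domain
        adv.append(x)
    return adv
```

With a toy residual `lambda x: x`, whose square grows to the right, every point drifts right by at most `eps` and never leaves `[0, 1]`; with a real PINN residual the points would instead accumulate where the network currently fails.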
Stochastic Optimization for Non-convex Problem with Inexact Hessian Matrix, Gradient, and Function
Authors: Liu Liu, Xuanqing Liu, Cho-Jui Hsieh, Dacheng Tao
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.11866
Pdf link: https://arxiv.org/pdf/2310.11866
Abstract Trust-region (TR) and adaptive regularization using cubics (ARC) have proven to have some very appealing theoretical properties for non-convex optimization by concurrently computing function value, gradient, and Hessian matrix to obtain the next search direction and the adjusted parameters. Although stochastic approximations greatly reduce the computational cost, theoretically guaranteeing the convergence rate is challenging. In this paper, we explore a family of stochastic TR and ARC methods that can simultaneously provide inexact computations of the Hessian matrix, gradient, and function values. Our algorithms require much less propagation overhead per iteration than TR and ARC. We prove that the iteration complexity to achieve $\epsilon$-approximate second-order optimality is of the same order as the exact computations demonstrated in previous studies. Additionally, the mild conditions on inexactness can be met by leveraging a random sampling technique in the finite-sum minimization problem. Numerical experiments with a non-convex problem support these findings and demonstrate that, with the same or a similar number of iterations, our algorithms require less computational overhead per iteration than current second-order methods.
Online Convex Optimization with Switching Cost and Delayed Gradients
Authors: Spandan Senapati, Rahul Vaze
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.11880
Pdf link: https://arxiv.org/pdf/2310.11880
Abstract We consider the online convex optimization (OCO) problem with quadratic and linear switching costs in the limited information setting, where an online algorithm can choose its action using only gradient information about the previous objective function. For $L$-smooth and $\mu$-strongly convex objective functions, we propose an online multiple gradient descent (OMGD) algorithm and show that its competitive ratio for the OCO problem with quadratic switching cost is at most $4(L + 5) + \frac{16(L + 5)}{\mu}$. The competitive ratio upper bound for OMGD is also shown to be order-wise tight in terms of $L,\mu$. In addition, we show that the competitive ratio of any online algorithm is $\max\{\Omega(L), \Omega(\frac{L}{\sqrt{\mu}})\}$ in the limited information setting when the switching cost is quadratic. We also show that the OMGD algorithm achieves the optimal (order-wise) dynamic regret in the limited information setting. For the linear switching cost, the competitive ratio upper bound of the OMGD algorithm is shown to depend on both the path length and the squared path length of the problem instance, in addition to $L, \mu$, and is shown to be, order-wise, the best competitive ratio any online algorithm can achieve. Consequently, we conclude that the optimal competitive ratios for the quadratic and linear switching costs are fundamentally different in the limited information setting.
Differentially Private Distributed Stochastic Optimization with Time-Varying Sample Sizes
Authors: Jimin Wang, Ji-Feng Zhang
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.11892
Pdf link: https://arxiv.org/pdf/2310.11892
Abstract Differentially private distributed stochastic optimization has become a hot topic due to the urgent need for privacy protection in distributed stochastic optimization. In this paper, two-time-scale stochastic approximation-type algorithms for differentially private distributed stochastic optimization with time-varying sample sizes are proposed using gradient- and output-perturbation methods. For both the gradient- and output-perturbation cases, the convergence of the algorithm and differential privacy with a finite cumulative privacy budget $\varepsilon$ over an infinite number of iterations are established simultaneously, which differs substantially from existing works. By using time-varying sample sizes, the privacy level is enhanced, and differential privacy with a finite cumulative privacy budget $\varepsilon$ over an infinite number of iterations is established. By properly choosing a Lyapunov function, the algorithm achieves almost-sure and mean-square convergence even when the added privacy noise has increasing variance. Furthermore, we rigorously derive the mean-square convergence rates of the algorithm and show how the added privacy noise affects the convergence rate. Finally, numerical examples, including distributed training on a benchmark machine learning dataset, are presented to demonstrate the efficiency and advantages of the algorithms.
Accelerated Policy Gradient: On the Nesterov Momentum for Reinforcement Learning
Authors: Yen-Ju Chen, Nai-Chieh Huang, Ping-Chun Hsieh
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.11897
Pdf link: https://arxiv.org/pdf/2310.11897
Abstract Policy gradient methods have recently been shown to enjoy global convergence at a $\Theta(1/t)$ rate in the non-regularized tabular softmax setting. Accordingly, one important research question is whether this convergence rate can be further improved, with only first-order updates. In this paper, we answer the above question from the perspective of momentum by adapting the celebrated Nesterov's accelerated gradient (NAG) method to reinforcement learning (RL), termed \textit{Accelerated Policy Gradient} (APG). To demonstrate the potential of APG in achieving faster global convergence, we formally show that with the true gradient, APG with softmax policy parametrization converges to an optimal policy at a $\tilde{O}(1/t^2)$ rate. To the best of our knowledge, this is the first characterization of the global convergence rate of NAG in the context of RL. Notably, our analysis relies on one interesting finding: Regardless of the initialization, APG could end up reaching a locally nearly-concave regime, where APG could benefit significantly from the momentum, within finite iterations. By means of numerical validation, we confirm that APG exhibits $\tilde{O}(1/t^2)$ rate as well as show that APG could significantly improve the convergence behavior over the standard policy gradient.
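A minimal sketch of Nesterov-accelerated policy gradient with exact gradients on a single-state (bandit) problem with softmax parametrization, where $\partial J/\partial\theta_a = \pi_a (r_a - J)$. The rewards, step size, momentum schedule $(t-1)/(t+2)$, and the simple "fall back to $\theta_t$ if the look-ahead point is worse" safeguard are assumptions; this is the generic NAG recipe applied to RL, not necessarily the exact APG update analyzed in the paper.

```python
import math


def softmax(theta):
    m = max(theta)
    e = [math.exp(t - m) for t in theta]
    s = sum(e)
    return [x / s for x in e]


def value(theta, r):
    return sum(p * ri for p, ri in zip(softmax(theta), r))


def grad(theta, r):
    # Exact softmax policy gradient for a one-state problem: dJ/dtheta_a = pi_a * (r_a - J).
    pi = softmax(theta)
    j = sum(p * ri for p, ri in zip(pi, r))
    return [p * (ri - j) for p, ri in zip(pi, r)]


def apg(r, eta=0.2, iters=1000):
    theta = [0.0] * len(r)
    prev = theta[:]
    for t in range(1, iters + 1):
        beta = (t - 1) / (t + 2)  # Nesterov momentum coefficient
        omega = [a + beta * (a - b) for a, b in zip(theta, prev)]
        if value(omega, r) < value(theta, r):  # safeguard (assumption): drop momentum if it hurts
            omega = theta[:]
        g = grad(omega, r)
        prev, theta = theta, [w + eta * gi for w, gi in zip(omega, g)]
    return softmax(theta)
```

Calling `apg([1.0, 0.5, 0.2])` concentrates the policy on the first (best) action; comparing it with the same loop run with `beta = 0` is a quick way to observe the faster convergence the abstract reports.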
Rather a Nurse than a Physician -- Contrastive Explanations under Investigation
Authors: Oliver Eberle, Ilias Chalkidis, Laura Cabello, Stephanie Brandl
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2310.11906
Pdf link: https://arxiv.org/pdf/2310.11906
Abstract Contrastive explanations, where one decision is explained in contrast to another, are supposed to be closer to how humans explain a decision than non-contrastive explanations, where the decision is not necessarily referenced against an alternative. This claim has never been empirically validated. We analyze four English text-classification datasets (SST2, DynaSent, BIOS and DBpedia-Animals). We fine-tune and extract explanations from three different models (RoBERTa, GPT-2, and T5), each in three different sizes, and apply three post-hoc explainability methods (LRP, Gradient×Input, GradNorm). We furthermore collect and release human rationale annotations for a subset of 100 samples from the BIOS dataset in contrastive and non-contrastive settings. A cross-comparison between model-based rationales and human annotations, in both contrastive and non-contrastive settings, yields high agreement between the two settings for models as well as for humans. Moreover, model-based explanations computed in both settings align equally well with human rationales. Thus, we empirically find that humans do not necessarily explain in a contrastive manner. 9 pages, long paper at ACL 2022 proceedings.
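A toy linear classifier makes the contrastive versus non-contrastive distinction concrete for Gradient×Input. The sketch assumes a bias-free linear model z = Wx, in which the gradient of a logit is just a row of W. The contrastive variant explains the logit difference z_target - z_foil ("why this label rather than that one?"). This is one common definition of contrastive attribution, not necessarily the paper's exact formulation. Function names are illustrative.

```python
def logits(W, x):
    """Bias-free linear model: z = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def gradient_x_input(W, x, target):
    """Non-contrastive Gradient x Input: d z_target / dx is the row W[target]."""
    return [w * xi for w, xi in zip(W[target], x)]

def contrastive_gradient_x_input(W, x, target, foil):
    """Contrastive variant: attribute z_target - z_foil instead of z_target alone."""
    return [(wt - wf) * xi for wt, wf, xi in zip(W[target], W[foil], x)]

W = [[1.0, 2.0, 0.0],
     [1.0, -1.0, 0.5]]
x = [2.0, 1.0, 4.0]
plain = gradient_x_input(W, x, target=0)                        # [2.0, 2.0, 0.0]
contrast = contrastive_gradient_x_input(W, x, target=0, foil=1)  # [0.0, 3.0, -2.0]
```

The first feature receives credit in the plain explanation. It gets none in the contrastive one, because it pushes both classes equally. For a bias-free linear model, each attribution vector sums exactly to the quantity it explains. For deep models such as RoBERTa or T5, this holds only approximately.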
Acoustic shape optimization using energy stable curvilinear finite differences
Authors: Gustav Eriksson, Vidar Stiernström
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2310.11956
Pdf link: https://arxiv.org/pdf/2310.11956
Abstract A gradient-based method for shape optimization problems constrained by the acoustic wave equation is presented. The method uses high-order accurate finite differences with summation-by-parts properties on multiblock curvilinear grids for the spatial discretization. The design domain is represented through a coordinate mapping from a reference domain, and the design shape is obtained by inverting the discretized coordinate map. The adjoint state framework is employed to efficiently compute the gradient of the loss functional. Using the summation-by-parts properties of the finite difference discretization, we prove stability and dual consistency for the semi-discrete forward and adjoint problems. Numerical experiments verify the accuracy of the finite difference scheme and demonstrate the capabilities of the shape optimization method on two model problems of real-world relevance.
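The summation-by-parts (SBP) property is what the stability and dual-consistency proofs lean on. The sketch below uses the classical second-order SBP first-derivative operator in 1D, not the paper's high-order curvilinear operators. The operator is D = H^{-1}Q with Q + Q^T = diag(-1, 0, ..., 0, 1). This gives the discrete integration-by-parts identity u^T H (Dv) + (Du)^T H v = u_N v_N - u_0 v_0. Function names are illustrative.

```python
import math

def sbp_operator(n, h):
    """Classical 2nd-order SBP first-derivative operator on n uniform points, spacing h.
    Returns the diagonal norm H (as a list), the matrix Q, and D = H^{-1} Q."""
    H = [h] * n
    H[0] = H[-1] = h / 2  # trapezoidal-rule quadrature weights
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        Q[i][i + 1] = 0.5   # central differences in the interior
        Q[i + 1][i] = -0.5
    Q[0][0] = -0.5          # boundary closure: Q + Q^T = diag(-1, 0, ..., 0, 1)
    Q[-1][-1] = 0.5
    D = [[Q[i][j] / H[i] for j in range(n)] for i in range(n)]
    return H, Q, D

def matvec(A, u):
    return [sum(a * uj for a, uj in zip(row, u)) for row in A]

def h_inner(H, u, v):
    """Discrete inner product u^T H v (H is diagonal)."""
    return sum(hi * ui * vi for hi, ui, vi in zip(H, u, v))

# Example: check the discrete integration-by-parts identity on a small grid.
n, h = 6, 0.2
x = [i * h for i in range(n)]
H, Q, D = sbp_operator(n, h)
u = [xi ** 2 for xi in x]
v = [math.sin(xi) for xi in x]
lhs = h_inner(H, u, matvec(D, v)) + h_inner(H, matvec(D, u), v)
rhs = u[-1] * v[-1] - u[0] * v[0]  # only boundary terms survive
```

Mimicking integration by parts exactly is what allows energy estimates for the continuous problem to carry over to the semi-discrete one. The operator is also exact for linear functions, including at the one-sided boundary rows.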
Simple Mechanisms for Representing, Indexing and Manipulating Concepts
Authors: Yuanzhi Li, Raghu Meka, Rina Panigrahy, Kulin Shah
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.12143
Pdf link: https://arxiv.org/pdf/2310.12143
Abstract Deep networks typically learn concepts via classifiers, which involves setting up a model and training it via gradient descent to fit the concept-labeled data. We will argue instead that a concept could be learned by looking at its moment statistics matrix to generate a concrete representation, or signature, of that concept. These signatures can be used to discover structure across the set of concepts and could recursively produce higher-level concepts by learning this structure from those signatures. When concepts are 'intersected', their signatures can be used to find a common theme across a number of related 'intersected' concepts. This process could be used to keep a dictionary of concepts so that inputs could be correctly identified and routed to the set of concepts involved in the (latent) generation of the input.
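One way to picture a moment-statistics "signature" is as a per-concept second-moment matrix. A new input is then routed to the concept whose signature gives it the most energy. Both the choice of the second moment and the Rayleigh-quotient routing rule are assumptions made for illustration. The abstract specifies neither, and all names are hypothetical.

```python
def second_moment(vectors):
    """Signature of a concept: M = (1/n) * sum_i v_i v_i^T over its examples."""
    n, d = len(vectors), len(vectors[0])
    return [[sum(v[i] * v[j] for v in vectors) / n for j in range(d)] for i in range(d)]

def energy(M, x):
    """Rayleigh-quotient score x^T M x / ||x||^2 (hypothetical routing criterion)."""
    d = len(x)
    num = sum(x[i] * M[i][j] * x[j] for i in range(d) for j in range(d))
    return num / sum(xi * xi for xi in x)

def route(x, signatures):
    """Send x to the concept whose signature it aligns with most."""
    return max(signatures, key=lambda name: energy(signatures[name], x))

signatures = {
    "horizontal": second_moment([[1.0, 0.0], [2.0, 0.0]]),
    "vertical": second_moment([[0.0, 1.0], [0.0, 2.0]]),
}
```

Inputs mostly along the first axis score highest against the "horizontal" signature, and vice versa.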
Keyword: super-resolution

Image Super-resolution Via Latent Diffusion: A Sampling-space Mixture Of Experts And Frequency-augmented Decoder Approach
Authors: Feng Luo, Jinxi Xiang, Jun Zhang, Xiao Han, Wei Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.12004
Pdf link: https://arxiv.org/pdf/2310.12004
Abstract The recent use of diffusion priors, enhanced by pre-trained text-image models, has markedly improved the performance of image super-resolution (SR). To alleviate the huge computational cost of pixel-based diffusion SR, latent-based methods use a feature encoder to transform the image and then perform SR image generation in a compact latent space. Nevertheless, two major issues limit the performance of latent-based diffusion. First, compression into the latent space usually causes reconstruction distortion. Second, the huge computational cost constrains the parameter scale of the diffusion model. To counteract these issues, we first propose a frequency compensation module that enhances the frequency components from latent space to pixel space, significantly reducing reconstruction distortion (especially of high-frequency information). Then, we propose a Sample-Space Mixture of Experts (SS-MoE) to achieve more powerful latent-based SR, which steadily improves the capacity of the model without a significant increase in inference cost. These carefully crafted designs improve performance on widely explored 4x blind super-resolution benchmarks and extend to large magnification factors, e.g., 8x image SR benchmarks. The code is available at https://github.com/amandaluof/moe_sr.
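The abstract does not say how SS-MoE divides work among experts. The sketch assumes one plausible design: each expert owns an equal-width interval of denoising timesteps. Under that assumption, every sampling step dispatches to exactly one expert. Total capacity can then grow with the number of experts while the per-step inference cost stays that of a single model. Expert calls are stand-ins, and all names are hypothetical.

```python
def expert_for_timestep(t, num_timesteps, num_experts):
    """Map a denoising timestep t in [0, num_timesteps) to one expert.
    ASSUMPTION: experts own equal-width timestep intervals."""
    if not 0 <= t < num_timesteps:
        raise ValueError("timestep out of range")
    return t * num_experts // num_timesteps

def count_expert_calls(num_timesteps, num_experts, sampling_steps):
    """Walk an evenly strided sampling schedule (high noise -> low noise)
    and count how often each expert would be invoked."""
    stride = num_timesteps // sampling_steps
    schedule = list(range(num_timesteps - 1, -1, -stride))[:sampling_steps]
    calls = [0] * num_experts
    for t in schedule:
        k = expert_for_timestep(t, num_timesteps, num_experts)
        calls[k] += 1  # stand-in for: latent = experts[k](latent, t, lr_condition)
    return calls

calls = count_expert_calls(num_timesteps=1000, num_experts=4, sampling_steps=50)
```

Across the 50 sampling steps there are exactly 50 expert evaluations, the same number a single denoiser would need, even though four experts share the work.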
HSTR-Net: Reference Based Video Super-resolution for Aerial Surveillance with Dual Cameras
Authors: H. Umut Suluhan, Hasan F. Ates, Bahadir K. Gunturk
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.12092
Pdf link: https://arxiv.org/pdf/2310.12092
Abstract Aerial surveillance requires high spatio-temporal resolution (HSTR) video for more accurate detection and tracking of objects. This is especially true for wide-area surveillance (WAS), where the surveyed region is large and the objects of interest are small. This paper proposes a dual-camera system for generating HSTR video using reference-based super-resolution (RefSR). One camera captures high-spatial-resolution, low-frame-rate (HSLF) video, while the other simultaneously captures low-spatial-resolution, high-frame-rate (LSHF) video of the same scene. A novel deep learning architecture is proposed to fuse the HSLF and LSHF video feeds and synthesize HSTR video frames at the output. The proposed model combines optical flow estimation with (channel-wise and spatial) attention mechanisms to capture fine motion and intricate dependencies between frames of the two video feeds. Simulations show that the proposed model provides significant improvements in PSNR and SSIM over existing reference-based SR techniques. The method also achieves a frame rate (FPS) sufficient for WAS when deployed on a power-constrained drone equipped with dual cameras.
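The abstract mentions channel-wise attention without giving details. The sketch below is a minimal squeeze-and-excitation-style version of what such a module computes on a fused feature map: global average pooling, a two-layer fully connected bottleneck, then a sigmoid gate per channel, with biases omitted. Treating HSTR-Net's module as this exact design is an assumption. The spatial-attention and optical-flow parts are not shown, and names are illustrative.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def channel_attention(feature_map, w1, w2):
    """SE-style channel attention (biases omitted).
    feature_map: C channels, each an H x W nested list.
    w1: (C // r) x C reduction weights; w2: C x (C // r) expansion weights."""
    # Squeeze: global average pooling gives one descriptor per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]
    # Excitation: FC -> ReLU -> FC -> sigmoid yields one gate in (0, 1) per channel.
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w1]
    gates = [sigmoid(sum(w * hv for w, hv in zip(row, hidden))) for row in w2]
    # Scale: reweight every spatial position of a channel by that channel's gate.
    scaled = [[[g * val for val in row] for row in ch] for g, ch in zip(gates, feature_map)]
    return scaled, gates

fm = [[[1.0, 3.0]], [[2.0, 2.0]]]  # C = 2 channels, each 1 x 2
neutral, neutral_gates = channel_attention(fm, [[0.0, 0.0]], [[0.0], [0.0]])
tuned, tuned_gates = channel_attention(fm, [[1.0, 0.0]], [[1.0], [-1.0]])
```

With all-zero weights every gate is 0.5. Learned weights let the network emphasize some channels and suppress others.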

zoq / arxiv-updates

New submissions for Thu, 19 Oct 23 #624
