New submissions for Wed, 6 Sep 23

Keyword: sgd

Modified Step Size for Enhanced Stochastic Gradient Descent: Convergence and Experiments

Authors: Authors: M. Soheil Shamaee, S. Fathi Hafshejani
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2309.01248
Pdf link: https://arxiv.org/pdf/2309.01248
Abstract This paper introduces a novel approach to enhance the performance of the stochastic gradient descent (SGD) algorithm by incorporating a modified decay step size based on $\frac{1}{\sqrt{t}}$. The proposed step size integrates a logarithmic term, leading to the selection of smaller values in the final iterations. Our analysis establishes a convergence rate of $O(\frac{\ln T}{\sqrt{T}})$ for smooth non-convex functions without the Polyak-{\L}ojasiewicz condition. To evaluate the effectiveness of our approach, we conducted numerical experiments on image classification tasks using the FashionMNIST, and CIFAR10 datasets, and the results demonstrate significant improvements in accuracy, with enhancements of $0.5\%$ and $1.4\%$ observed, respectively, compared to the traditional $\frac{1}{\sqrt{t}}$ step size. The source code can be found at \\url{https://github.com/Shamaeem/LNSQRTStepSize}.
Corgi^2: A Hybrid Offline-Online Approach To Storage-Aware Data Shuffling For SGD
Authors: Authors: Etay Livne, Gal Kaplun, Eran Malach Shai, Shalev-Schwatz
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2309.01640
Pdf link: https://arxiv.org/pdf/2309.01640
Abstract When using Stochastic Gradient Descent (SGD) for training machine learning models, it is often crucial to provide the model with examples sampled at random from the dataset. However, for large datasets stored in the cloud, random access to individual examples is often costly and inefficient. A recent work \cite{corgi}, proposed an online shuffling algorithm called CorgiPile, which greatly improves efficiency of data access, at the cost some performance loss, which is particularly apparent for large datasets stored in homogeneous shards (e.g., video datasets). In this paper, we introduce a novel two-step partial data shuffling strategy for SGD which combines an offline iteration of the CorgiPile method with a subsequent online iteration. Our approach enjoys the best of both worlds: it performs similarly to SGD with random access (even for homogenous data) without compromising the data access efficiency of CorgiPile. We provide a comprehensive theoretical analysis of the convergence properties of our method and demonstrate its practical advantages through experimental results.
DRAG: Divergence-based Adaptive Aggregation in Federated learning on Non-IID Data
Authors: Authors: Feng Zhu, Jingjing Zhang, Shengyun Liu, Xin Wang
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2309.01779
Pdf link: https://arxiv.org/pdf/2309.01779
Abstract Local stochastic gradient descent (SGD) is a fundamental approach in achieving communication efficiency in Federated Learning (FL) by allowing individual workers to perform local updates. However, the presence of heterogeneous data distributions across working nodes causes each worker to update its local model towards a local optimum, leading to the phenomenon known as client-drift" and resulting in slowed convergence. To address this issue, previous works have explored methods that either introduce communication overhead or suffer from unsteady performance. In this work, we introduce a novel metric calleddegree of divergence," quantifying the angle between the local gradient and the global reference direction. Leveraging this metric, we propose the divergence-based adaptive aggregation (DRAG) algorithm, which dynamically ``drags" the received local updates toward the reference direction in each round without requiring extra communication overhead. Furthermore, we establish a rigorous convergence analysis for DRAG, proving its ability to achieve a sublinear convergence rate. Compelling experimental results are presented to illustrate DRAG's superior performance compared to state-of-the-art algorithms in effectively managing the client-drift phenomenon. Additionally, DRAG exhibits remarkable resilience against certain Byzantine attacks. By securely sharing a small sample of the client's data with the FL server, DRAG effectively counters these attacks, as demonstrated through comprehensive experiments.
AdaPlus: Integrating Nesterov Momentum and Precise Stepsize Adjustment on AdamW Basis
Authors: Authors: Lei Guan
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.01966
Pdf link: https://arxiv.org/pdf/2309.01966
Abstract This paper proposes an efficient optimizer called AdaPlus which integrates Nesterov momentum and precise stepsize adjustment on AdamW basis. AdaPlus combines the advantages of AdamW, Nadam, and AdaBelief and, in particular, does not introduce any extra hyper-parameters. We perform extensive experimental evaluations on three machine learning tasks to validate the effectiveness of AdaPlus. The experiment results validate that AdaPlus (i) is the best adaptive method which performs most comparable with (even slightly better than) SGD with momentum on image classification tasks and (ii) outperforms other state-of-the-art optimizers on language modeling tasks and illustrates the highest stability when training GANs. The experiment code of AdaPlus is available at: https://github.com/guanleics/AdaPlus.
A Simple Asymmetric Momentum Make SGD Greatest Again
Authors: Authors: Gongyue Zhang, Dinghuang Zhang, Shuwen Zhao, Donghan Liu, Carrie M. Toptan, Honghai Liu
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.02130
Pdf link: https://arxiv.org/pdf/2309.02130
Abstract We propose the simplest SGD enhanced method ever, Loss-Controlled Asymmetric Momentum(LCAM), aimed directly at the Saddle Point problem. Compared to the traditional SGD with Momentum, there's no increase in computational demand, yet it outperforms all current optimizers. We use the concepts of weight conjugation and traction effect to explain this phenomenon. We designed experiments to rapidly reduce the learning rate at specified epochs to trap parameters more easily at saddle points. We selected WRN28-10 as the test network and chose cifar10 and cifar100 as test datasets, an identical group to the original paper of WRN and Cosine Annealing Scheduling(CAS). We compared the ability to bypass saddle points of Asymmetric Momentum with different priorities. Finally, using WRN28-10 on Cifar100, we achieved a peak average test accuracy of 80.78\% around 120 epoch. For comparison, the original WRN paper reported 80.75\%, while CAS was at 80.42\%, all at 200 epoch. This means that while potentially increasing accuracy, we use nearly half convergence time. Our demonstration code is available at\ https://github.com/hakumaicc/Asymmetric-Momentum-LCAM
Fairness Optimization of RSMA for Uplink Communication based on Intelligent Reflecting Surface
Authors: Authors: Shanshan Zhang, Wen Chen
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2309.02264
Pdf link: https://arxiv.org/pdf/2309.02264
Abstract In this paper, we propose a rate-splitting multiple access (RSMA) scheme for uplink wireless communication systems with intelligent reflecting surface (IRS) aided. In the considered model, IRS is adopted to overcome power attenuation caused by path loss. We construct a max-min fairness optimization problem to obtain the resource allocation, including the receive beamforming at the base station (BS) and phase-shift beamforming at IRS. We also introduce a successive group decoding (SGD) algorithm at the receiver, which trades off the fairness and complexity of decoding. In the simulation, the results show that the proposed scheme has superiority in improving the fairness of uplink communication.
Keyword: optimization

Identifying Secure Operating Ranges for DER Control using Bilevel Optimization
Authors: Authors: Kshitij Girigoudar, Line A. Roald
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2309.00625
Pdf link: https://arxiv.org/pdf/2309.00625
Abstract Active distribution grids are accommodating an increasing number of controllable electric loads and distributed energy resources (DERs). A majority of these DERs are managed by entities other than the distribution utility, such as individual customers or third-party aggregators, who control the loads and DERs without consideration of any distribution grid constraints. This makes it challenging for a distribution system operator (DSO) to allow third-party aggregators and transmission operators to fully exploit the flexibility offered by these resources while also ensuring that distribution grid constraints such as voltage magnitude limits are not violated. In this paper, we develop a bilevel optimization-based framework to determine the aggregate power flexibility that can be obtained from an unbalanced distribution grid while ensuring that there is no disaggregation solution that leads to grid constraint violations. The results are a set of constraints and operating rules that are easy to communicate, and which provide the entities that procure flexibility from DERs (e.g. transmission operators or third-party aggregators) with the ability to freely implement their own disaggregation strategy without intervention from the DSO. The proposed approach is tested on two unbalanced distribution feeders and our simulation results indicate that it is possible to determine a wide range of aggregate power flexibility, as long as a simple set of rules for DER control activation are followed.
A Simulation--Based Optimization approach for analyzing the ambulance diversion phenomenon in an Emergency-Department network
Authors: Authors: Christian Piermarini, Massimo Roma
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2309.00643
Pdf link: https://arxiv.org/pdf/2309.00643
Abstract Ambulance Diversion (AD) is one of the possible strategies for relieving the worldwide phenomenon of Emergency Department (ED) overcrowding. It can be carried out when an ED is overloaded and consists of redirecting incoming by ambulance patients to neighboring EDs. Properly implemented, AD should result in reducing delays of patient treatment, ensuring safety and rescue of life-threatening patients. From an operational point of view, AD corresponds to a resource pooling policy among EDs in a network. In this paper we propose a novel model for studying the effectiveness of AD strategies, based on the Simulation-Based Optimization (SBO) approach. In particular, we developed a discrete event simulation model for reproducing the ED network operation. Then, for each AD policy considered, we formulate and solve an optimal resources allocation problem consisting of a bi-objective SBO problem where the target is the minimization of the non-value added time spent by patients and the overall cost incurred by the ED network. A set of optimal points belonging to the Pareto frontier is obtained for each policy. To show the reliability of the proposed approach, a real case study consisting of six large EDs in the Lazio region of Italy is considered, analyzing the effects of adopting different AD policies.
Polynomial-Model-Based Optimization for Blackbox Objectives
Authors: Authors: Janina Schreiber, Damar Wicaksono, Michael Hecht
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2309.00663
Pdf link: https://arxiv.org/pdf/2309.00663
Abstract For a wide range of applications the structure of systems like Neural Networks or complex simulations, is unknown and approximation is costly or even impossible. Black-box optimization seeks to find optimal (hyper-) parameters for these systems such that a pre-defined objective function is minimized. Polynomial-Model-Based Optimization (PMBO) is a novel blackbox optimizer that finds the minimum by fitting a polynomial surrogate to the objective function. Motivated by Bayesian optimization the model is iteratively updated according to the acquisition function Expected Improvement, thus balancing the exploitation and exploration rate and providing an uncertainty estimate of the model. PMBO is benchmarked against other state-of-the-art algorithms for a given set of artificial, analytical functions. PMBO competes successfully with those algorithms and even outperforms all of them in some cases. As the results suggest, we believe PMBO is the pivotal choice for solving blackbox optimization tasks occurring in a wide range of disciplines.
Randomized Polar Codes for Anytime Distributed Machine Learning
Authors: Authors: Burak Bartan, Mert Pilanci
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Information Theory (cs.IT); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.00682
Pdf link: https://arxiv.org/pdf/2309.00682
Abstract We present a novel distributed computing framework that is robust to slow compute nodes, and is capable of both approximate and exact computation of linear operations. The proposed mechanism integrates the concepts of randomized sketching and polar codes in the context of coded computation. We propose a sequential decoding algorithm designed to handle real valued data while maintaining low computational complexity for recovery. Additionally, we provide an anytime estimator that can generate provably accurate estimates even when the set of available node outputs is not decodable. We demonstrate the potential applications of this framework in various contexts, such as large-scale matrix multiplication and black-box optimization. We present the implementation of these methods on a serverless cloud computing system and provide numerical results to demonstrate their scalability in practice, including ImageNet scale computations.
Learning Robust Model Predictive Control for Voltage Control of Islanded Microgrid
Authors: Authors: Sahand Kiani, Hamed Kebriaei, Mohsen Hamzeh, Ali Salmanpour
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2309.00742
Pdf link: https://arxiv.org/pdf/2309.00742
Abstract This paper proposes a novel control design for voltage tracking of an islanded AC microgrid in the presence of {nonlinear} loads and parametric uncertainties at the primary level of control. The proposed method is based on the Tube-Based Robust Model Predictive Control (RMPC), an online optimization-based method which can handle the constraints and uncertainties as well. The challenge with this method is the conservativeness imposed by designing the tube based on the worst-case scenario of the uncertainties. This weakness is amended in this paper by employing a combination of a learning-based Gaussian Process (GP) regression and RMPC. The advantage of using GP is that both the mean and variance of the loads are predicted at each iteration based on the real data, and the resulted values of mean and the bound of confidence are utilized to design the tube in RMPC. The theoretical results are also provided to prove the recursive feasibility and stability of the proposed learning based RMPC. Finally, the simulation results are carried out on both single and multiple DG (Distributed Generation) units.
Efficient RLHF: Reducing the Memory Usage of PPO
Authors: Authors: Michael Santacroce, Yadong Lu, Han Yu, Yuanzhi Li, Yelong Shen
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2309.00754
Pdf link: https://arxiv.org/pdf/2309.00754
Abstract Reinforcement Learning with Human Feedback (RLHF) has revolutionized language modeling by aligning models with human preferences. However, the RL stage, Proximal Policy Optimization (PPO), requires over 3x the memory of Supervised Fine-Tuning (SFT), making it infeasible to use for most practitioners. To address this issue, we present a comprehensive analysis the memory usage, performance, and training time of memory-savings techniques for PPO. We introduce Hydra-RLHF by first integrating the SFT and Reward models and then dynamically turning LoRA "off" during training. Our experiments show: 1. Using LoRA during PPO reduces its memory usage to be smaller than SFT while improving alignment across four public benchmarks, and 2. Hydra-PPO reduces the latency per sample of LoRA-PPO by up to 65% while maintaining its performance. Our results demonstrate that Hydra-PPO is a simple and promising solution for enabling more widespread usage of RLHF.
Towards High-Frequency Tracking and Fast Edge-Aware Optimization
Authors: Authors: Akash Bapat
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.00777
Pdf link: https://arxiv.org/pdf/2309.00777
Abstract This dissertation advances the state of the art for AR/VR tracking systems by increasing the tracking frequency by orders of magnitude and proposes an efficient algorithm for the problem of edge-aware optimization. AR/VR is a natural way of interacting with computers, where the physical and digital worlds coexist. We are on the cusp of a radical change in how humans perform and interact with computing. Humans are sensitive to small misalignments between the real and the virtual world, and tracking at kilo-Hertz frequencies becomes essential. Current vision-based systems fall short, as their tracking frequency is implicitly limited by the frame-rate of the camera. This thesis presents a prototype system which can track at orders of magnitude higher than the state-of-the-art methods using multiple commodity cameras. The proposed system exploits characteristics of the camera traditionally considered as flaws, namely rolling shutter and radial distortion. The experimental evaluation shows the effectiveness of the method for various degrees of motion. Furthermore, edge-aware optimization is an indispensable tool in the computer vision arsenal for accurate filtering of depth-data and image-based rendering, which is increasingly being used for content creation and geometry processing for AR/VR. As applications increasingly demand higher resolution and speed, there exists a need to develop methods that scale accordingly. This dissertation proposes such an edge-aware optimization framework which is efficient, accurate, and algorithmically scales well, all of which are much desirable traits not found jointly in the state of the art. The experiments show the effectiveness of the framework in a multitude of computer vision tasks such as computational photography and stereo.
Diffusion Modeling with Domain-conditioned Prior Guidance for Accelerated MRI and qMRI Reconstruction
Authors: Authors: Wanyu Bian, Albert Jang, Fang Liu
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.00783
Pdf link: https://arxiv.org/pdf/2309.00783
Abstract This study introduces a novel approach for image reconstruction based on a diffusion model conditioned on the native data domain. Our method is applied to multi-coil MRI and quantitative MRI reconstruction, leveraging the domain-conditioned diffusion model within the frequency and parameter domains. The prior MRI physics are used as embeddings in the diffusion model, enforcing data consistency to guide the training and sampling process, characterizing MRI k-space encoding in MRI reconstruction, and leveraging MR signal modeling for qMRI reconstruction. Furthermore, a gradient descent optimization is incorporated into the diffusion steps, enhancing feature learning and improving denoising. The proposed method demonstrates a significant promise, particularly for reconstructing images at high acceleration factors. Notably, it maintains great reconstruction accuracy and efficiency for static and quantitative MRI reconstruction across diverse anatomical structures. Beyond its immediate applications, this method provides potential generalization capability, making it adaptable to inverse problems across various domains.
Online Targetless Radar-Camera Extrinsic Calibration Based on the Common Features of Radar and Camera
Authors: Authors: Lei Cheng, Siyang Cao
Subjects: Robotics (cs.RO); Image and Video Processing (eess.IV); Signal Processing (eess.SP); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2309.00787
Pdf link: https://arxiv.org/pdf/2309.00787
Abstract Sensor fusion is essential for autonomous driving and autonomous robots, and radar-camera fusion systems have gained popularity due to their complementary sensing capabilities. However, accurate calibration between these two sensors is crucial to ensure effective fusion and improve overall system performance. Calibration involves intrinsic and extrinsic calibration, with the latter being particularly important for achieving accurate sensor fusion. Unfortunately, many target-based calibration methods require complex operating procedures and well-designed experimental conditions, posing challenges for researchers attempting to reproduce the results. To address this issue, we introduce a novel approach that leverages deep learning to extract a common feature from raw radar data (i.e., Range-Doppler-Angle data) and camera images. Instead of explicitly representing these common features, our method implicitly utilizes these common features to match identical objects from both data sources. Specifically, the extracted common feature serves as an example to demonstrate an online targetless calibration method between the radar and camera systems. The estimation of the extrinsic transformation matrix is achieved through this feature-based approach. To enhance the accuracy and robustness of the calibration, we apply the RANSAC and Levenberg-Marquardt (LM) nonlinear optimization algorithm for deriving the matrix. Our experiments in the real world demonstrate the effectiveness and accuracy of our proposed method.
Deep Learning and Inverse Problems
Authors: Authors: Ali Mohammad-Djafari, Ning Chu, Li Wang, Liang Yu
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.00802
Pdf link: https://arxiv.org/pdf/2309.00802
Abstract Machine Learning (ML) methods and tools have gained great success in many data, signal, image and video processing tasks, such as classification, clustering, object detection, semantic segmentation, language processing, Human-Machine interface, etc. In computer vision, image and video processing, these methods are mainly based on Neural Networks (NN) and in particular Convolutional NN (CNN), and more generally Deep NN. Inverse problems arise anywhere we have indirect measurement. As, in general, those inverse problems are ill-posed, to obtain satisfactory solutions for them needs prior information. Different regularization methods have been proposed, where the problem becomes the optimization of a criterion with a likelihood term and a regularization term. The main difficulty, however, in great dimensional real applications, remains the computational cost. Using NN, and in particular Deep Learning (DL) surrogate models and approximate computation, can become very helpful. In this work, we focus on NN and DL particularly adapted for inverse problems. We consider two cases: First the case where the forward operator is known and used as physics constraint, the second more general data driven DL methods.
Domain Generalization via Balancing Training Difficulty and Model Capability
Authors: Authors: Xueying Jiang, Jiaxing Huang, Sheng Jin, Shijian Lu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2309.00844
Pdf link: https://arxiv.org/pdf/2309.00844
Abstract Domain generalization (DG) aims to learn domain-generalizable models from one or multiple source domains that can perform well in unseen target domains. Despite its recent progress, most existing work suffers from the misalignment between the difficulty level of training samples and the capability of contemporarily trained models, leading to over-fitting or under-fitting in the trained generalization model. We design MoDify, a Momentum Difficulty framework that tackles the misalignment by balancing the seesaw between the model's capability and the samples' difficulties along the training process. MoDify consists of two novel designs that collaborate to fight against the misalignment while learning domain-generalizable models. The first is MoDify-based Data Augmentation which exploits an RGB Shuffle technique to generate difficulty-aware training samples on the fly. The second is MoDify-based Network Optimization which dynamically schedules the training samples for balanced and smooth learning with appropriate difficulty. Without bells and whistles, a simple implementation of MoDify achieves superior performance across multiple benchmarks. In addition, MoDify can complement existing methods as a plug-in, and it is generic and can work for different visual recognition tasks.
Drift Analysis with Fitness Levels for Elitist Evolutionary Algorithms
Authors: Authors: Jun He, Yuren Zhou
Subjects: Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2309.00851
Pdf link: https://arxiv.org/pdf/2309.00851
Abstract The fitness level method is a popular tool for analyzing the computation time of elitist evolutionary algorithms. Its idea is to divide the search space into multiple fitness levels and estimate lower and upper bounds on the computation time using transition probabilities between fitness levels. However, the lower bound generated from this method is often not tight. To improve the lower bound, this paper rigorously studies an open question about the fitness level method: what are the tightest lower and upper time bounds that can be constructed based on fitness levels? To answer this question, drift analysis with fitness levels is developed, and the tightest bound problem is formulated as a constrained multi-objective optimization problem subject to fitness level constraints. The tightest metric bounds from fitness levels are constructed and proven for the first time. Then the metric bounds are converted into linear bounds, where existing linear bounds are special cases. This paper establishes a general framework that can cover various linear bounds from trivial to best coefficients. It is generic and promising, as it can be used not only to draw the same bounds as existing ones, but also to draw tighter bounds, especially on fitness landscapes where shortcuts exist. This is demonstrated in the case study of the (1+1) EA maximizing the TwoPath function.
Network Topology Inference with Sparsity and Laplacian Constraints
Authors: Authors: Jiaxi Ying, Xi Han, Rui Zhou, Xiwen Wang, Hing Cheung So
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2309.00960
Pdf link: https://arxiv.org/pdf/2309.00960
Abstract We tackle the network topology inference problem by utilizing Laplacian constrained Gaussian graphical models, which recast the task as estimating a precision matrix in the form of a graph Laplacian. Recent research \cite{ying2020nonconvex} has uncovered the limitations of the widely used $\ell_1$-norm in learning sparse graphs under this model: empirically, the number of nonzero entries in the solution grows with the regularization parameter of the $\ell_1$-norm; theoretically, a large regularization parameter leads to a fully connected (densest) graph. To overcome these challenges, we propose a graph Laplacian estimation method incorporating the $\ell_0$-norm constraint. An efficient gradient projection algorithm is developed to solve the resulting optimization problem, characterized by sparsity and Laplacian constraints. Through numerical experiments with synthetic and financial time-series datasets, we demonstrate the effectiveness of the proposed method in network topology inference.
On the training and generalization of deep operator networks
Authors: Authors: Sanghyun Lee, Yeonjong Shin
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2309.01020
Pdf link: https://arxiv.org/pdf/2309.01020
Abstract We present a novel training method for deep operator networks (DeepONets), one of the most popular neural network models for operators. DeepONets are constructed by two sub-networks, namely the branch and trunk networks. Typically, the two sub-networks are trained simultaneously, which amounts to solving a complex optimization problem in a high dimensional space. In addition, the nonconvex and nonlinear nature makes training very challenging. To tackle such a challenge, we propose a two-step training method that trains the trunk network first and then sequentially trains the branch network. The core mechanism is motivated by the divide-and-conquer paradigm and is the decomposition of the entire complex training task into two subtasks with reduced complexity. Therein the Gram-Schmidt orthonormalization process is introduced which significantly improves stability and generalization ability. On the theoretical side, we establish a generalization error estimate in terms of the number of training data, the width of DeepONets, and the number of input and output sensors. Numerical examples are presented to demonstrate the effectiveness of the two-step training method, including Darcy flow in heterogeneous porous media.
Optimizing Mobile-Edge AI-Generated Everything (AIGX) Services by Prompt Engineering: Fundamental, Framework, and Case Study
Authors: Authors: Yinqiu Liu, Hongyang Du, Dusit Niyato, Jiawen Kang, Shuguang Cui, Xuemin Shen, Ping Zhang
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2309.01065
Pdf link: https://arxiv.org/pdf/2309.01065
Abstract As the next-generation paradigm for content creation, AI-Generated Content (AIGC), i.e., generating content automatically by Generative AI (GAI) based on user prompts, has gained great attention and success recently. With the ever-increasing power of GAI, especially the emergence of Pretrained Foundation Models (PFMs) that contain billions of parameters and prompt engineering methods (i.e., finding the best prompts for the given task), the application range of AIGC is rapidly expanding, covering various forms of information for human, systems, and networks, such as network designs, channel coding, and optimization solutions. In this article, we present the concept of mobile-edge AI-Generated Everything (AIGX). Specifically, we first review the building blocks of AIGX, the evolution from AIGC to AIGX, as well as practical AIGX applications. Then, we present a unified mobile-edge AIGX framework, which employs edge devices to provide PFM-empowered AIGX services and optimizes such services via prompt engineering. More importantly, we demonstrate that suboptimal prompts lead to poor generation quality, which adversely affects user satisfaction, edge network performance, and resource utilization. Accordingly, we conduct a case study, showcasing how to train an effective prompt optimizer using ChatGPT and investigating how much improvement is possible with prompt engineering in terms of user experience, quality of generation, and network performance.
Enhancing Infrared Small Target Detection Robustness with Bi-Level Adversarial Framework
Authors: Authors: Zhu Liu, Zihang Chen, Jinyuan Liu, Long Ma, Xin Fan, Risheng Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.01099
Pdf link: https://arxiv.org/pdf/2309.01099
Abstract The detection of small infrared targets against blurred and cluttered backgrounds has remained an enduring challenge. In recent years, learning-based schemes have become the mainstream methodology to establish the mapping directly. However, these methods are susceptible to the inherent complexities of changing backgrounds and real-world disturbances, leading to unreliable and compromised target estimations. In this work, we propose a bi-level adversarial framework to promote the robustness of detection in the presence of distinct corruptions. We first propose a bi-level optimization formulation to introduce dynamic adversarial learning. Specifically, it is composited by the learnable generation of corruptions to maximize the losses as the lower-level objective and the robustness promotion of detectors as the upper-level one. We also provide a hierarchical reinforced learning strategy to discover the most detrimental corruptions and balance the performance between robustness and accuracy. To better disentangle the corruptions from salient features, we also propose a spatial-frequency interaction network for target detection. Extensive experiments demonstrate our scheme remarkably improves 21.96% IOU across a wide array of corruptions and notably promotes 4.97% IOU on the general benchmark. The source codes are available at https://github.com/LiuZhu-CV/BALISTD.
Hybrid-Supervised Dual-Search: Leveraging Automatic Learning for Loss-free Multi-Exposure Image Fusion
Authors: Authors: Guanyao Wu, Hongming Fu, Jinyuan Liu, Long Ma, Xin Fan, Risheng Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.01113
Pdf link: https://arxiv.org/pdf/2309.01113
Abstract Multi-exposure image fusion (MEF) has emerged as a prominent solution to address the limitations of digital imaging in representing varied exposure levels. Despite its advancements, the field grapples with challenges, notably the reliance on manual designs for network structures and loss functions, and the constraints of utilizing simulated reference images as ground truths. Consequently, current methodologies often suffer from color distortions and exposure artifacts, further complicating the quest for authentic image representation. In addressing these challenges, this paper presents a Hybrid-Supervised Dual-Search approach for MEF, dubbed HSDS-MEF, which introduces a bi-level optimization search scheme for automatic design of both network structures and loss functions. More specifically, we harnesses a unique dual research mechanism rooted in a novel weighted structure refinement architecture search. Besides, a hybrid supervised contrast constraint seamlessly guides and integrates with searching process, facilitating a more adaptive and comprehensive search for optimal loss functions. We realize the state-of-the-art performance in comparison to various competitive schemes, yielding a 10.61% and 4.38% improvement in Visual Information Fidelity (VIF) for general and no-reference scenarios, respectively, while providing results with high contrast, rich details and colors.
AutoML-GPT: Large Language Model for AutoML
Authors: Authors: Yun-Da Tsai, Yu-Che Tsai, Bo-Wei Huang, Chun-Pai Yang, Shou-De Lin
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.01125
Pdf link: https://arxiv.org/pdf/2309.01125
Abstract With the emerging trend of GPT models, we have established a framework called AutoML-GPT that integrates a comprehensive set of tools and libraries. This framework grants users access to a wide range of data preprocessing techniques, feature engineering methods, and model selection algorithms. Through a conversational interface, users can specify their requirements, constraints, and evaluation metrics. Throughout the process, AutoML-GPT employs advanced techniques for hyperparameter optimization and model selection, ensuring that the resulting model achieves optimal performance. The system effectively manages the complexity of the machine learning pipeline, guiding users towards the best choices without requiring deep domain knowledge. Through our experimental results on diverse datasets, we have demonstrated that AutoML-GPT significantly reduces the time and effort required for machine learning tasks. Its ability to leverage the vast knowledge encoded in large language models enables it to provide valuable insights, identify potential pitfalls, and suggest effective solutions to common challenges faced during model training.
Holistic Dynamic Frequency Transformer for Image Fusion and Exposure Correction
Authors: Authors: Xiaoke Shang, Gehui Li, Zhiying Jiang, Shaomin Zhang, Nai Ding, Jinyuan Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.01183
Pdf link: https://arxiv.org/pdf/2309.01183
Abstract The correction of exposure-related issues is a pivotal component in enhancing the quality of images, offering substantial implications for various computer vision tasks. Historically, most methodologies have predominantly utilized spatial domain recovery, offering limited consideration to the potentialities of the frequency domain. Additionally, there has been a lack of a unified perspective towards low-light enhancement, exposure correction, and multi-exposure fusion, complicating and impeding the optimization of image processing. In response to these challenges, this paper proposes a novel methodology that leverages the frequency domain to improve and unify the handling of exposure correction tasks. Our method introduces Holistic Frequency Attention and Dynamic Frequency Feed-Forward Network, which replace conventional correlation computation in the spatial-domain. They form a foundational building block that facilitates a U-shaped Holistic Dynamic Frequency Transformer as a filter to extract global information and dynamically select important frequency bands for image restoration. Complementing this, we employ a Laplacian pyramid to decompose images into distinct frequency bands, followed by multiple restorers, each tuned to recover specific frequency-band information. The pyramid fusion allows a more detailed and nuanced image restoration process. Ultimately, our structure unifies the three tasks of low-light enhancement, exposure correction, and multi-exposure fusion, enabling comprehensive treatment of all classical exposure errors. Benchmarking on mainstream datasets for these tasks, our proposed method achieves state-of-the-art results, paving the way for more sophisticated and unified solutions in exposure correction.
Performance-Sensitive Potential Functions for Efficient Flow of Connected and Automated Vehicles
Authors: Authors: Filippos N. Tzortzoglou, Dionysios Theodosis, Aditya Dave, Andreas Malikopoulos
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2309.01238
Pdf link: https://arxiv.org/pdf/2309.01238
Abstract Connected and automated vehicles (CAVs) provide the most intriguing opportunity for enabling users to monitor transportation network conditions and make better decisions for improving safety and transportation efficiency. In this paper, we address the problem of effectively coordinating CAVs on lane-based roadways. Our approach utilizes potential functions to generate repulsive forces between CAVs that ensure collision avoidance. However, such potential functions can lead to unrealistic acceleration profiles and large inter-vehicle distances. The primary contribution of this work is the introduction of performance-sensitive potential functions to address these challenges. In our approach, the parameters of a potential function are determined through an optimization problem aiming to reduce both acceleration and inter-vehicle distances. To circumvent the computational implications due to the complexity of the resulting optimization problem that prevents the derivation of a real-time solution, we train a neural network model to learn the mapping of initial conditions to optimal parameters derived offline. Then, we prove sufficient criteria for the sampled-data model to ensure that the neural network output does not activate any of the state and safety constraints. Finally, we provide simulation results to demonstrate the effectiveness of the proposed approach.
An FPGA smart camera implementation of segmentation models for drone wildfire imagery
Authors: Authors: Eduardo Guarduño-Martinez, Jorge Ciprian-Sanchez, Gerardo Valente, Vazquez-Garcia, Gerardo Rodriguez-Hernandez, Adriana Palacios-Rosas, Lucile Rossi-Tisson, Gilberto Ochoa-Ruiz
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2309.01318
Pdf link: https://arxiv.org/pdf/2309.01318
Abstract Wildfires represent one of the most relevant natural disasters worldwide, due to their impact on various societal and environmental levels. Thus, a significant amount of research has been carried out to investigate and apply computer vision techniques to address this problem. One of the most promising approaches for wildfire fighting is the use of drones equipped with visible and infrared cameras for the detection, monitoring, and fire spread assessment in a remote manner but in close proximity to the affected areas. However, implementing effective computer vision algorithms on board is often prohibitive since deploying full-precision deep learning models running on GPU is not a viable option, due to their high power consumption and the limited payload a drone can handle. Thus, in this work, we posit that smart cameras, based on low-power consumption field-programmable gate arrays (FPGAs), in tandem with binarized neural networks (BNNs), represent a cost-effective alternative for implementing onboard computing on the edge. Herein we present the implementation of a segmentation model applied to the Corsican Fire Database. We optimized an existing U-Net model for such a task and ported the model to an edge device (a Xilinx Ultra96-v2 FPGA). By pruning and quantizing the original model, we reduce the number of parameters by 90%. Furthermore, additional optimizations enabled us to increase the throughput of the original model from 8 frames per second (FPS) to 33.63 FPS without loss in the segmentation performance: our model obtained 0.912 in Matthews correlation coefficient (MCC),0.915 in F1 score and 0.870 in Hafiane quality index (HAF), and comparable qualitative segmentation results when contrasted to the original full-precision model. The final model was integrated into a low-cost FPGA, which was used to implement a neural network accelerator.
Joint Oscillation Damping and Inertia Provision Service for Converter-Interfaced Generation
Authors: Authors: Cheng Feng, Linbin Huang, Xiuqiang He, Yi Wang, Florian Dörfler, Qixin Chen
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2309.01321
Pdf link: https://arxiv.org/pdf/2309.01321
Abstract As renewable generation becomes more prevalent, traditional power systems dominated by synchronous generators are transitioning to systems dominated by converter-interfaced generation. These devices, with their weaker damping capabilities and lower inertia, compromise the system's ability to withstand disturbances, pose a threat to system stability, and lead to oscillations and poor frequency response performance. While some new converter-interfaced generations are capable of providing superior damping and fast frequency control, there is a lack of effective measures to incentivize manufacturers to adopt them. To address this gap, this paper defines the joint oscillation damping and inertia provision services at the system level, seeking to encourage converter-interfaced generation to provide enhanced damping and fast frequency response capabilities. Our approach is anchored in a novel convex parametric formulation that combines oscillation mode and frequency stability constraints. These constraints ensure a sufficient damping ratio for all oscillation modes and maintain transient frequency trajectories within acceptable limits. They are designed to integrate smoothly into various operational and planning optimization frameworks. Using this formulation, we introduce a joint service for oscillation damping and inertia provision based on a cost-minimization problem. This facilitates the optimal allocation of damping and virtual inertia to converters, achieving both small-signal stability and frequency stability. Furthermore, we investigate the economic effects of introducing this service into a new ancillary service market, assessing its impact on system operations and cost-efficiency. Numerical tests highlight the service's efficacy in ensuring both small-signal stability and frequency stability, and offer insights into potential economic benefits.
Can I Trust Your Answer? Visually Grounded Video Question Answering
Authors: Authors: Junbin Xiao, Angela Yao, Yicong Li, Tat Seng Chua
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
Arxiv link: https://arxiv.org/abs/2309.01327
Pdf link: https://arxiv.org/pdf/2309.01327
Abstract We study visually grounded VideoQA in response to the emerging trends of utilizing pretraining techniques for video-language understanding. Specifically, by forcing vision-language models (VLMs) to answer questions and simultaneously provide visual evidence, we seek to ascertain the extent to which the predictions of such techniques are genuinely anchored in relevant video content, versus spurious correlations from language or irrelevant visual context. Towards this, we construct NExT-GQA -- an extension of NExT-QA with 10.5$K$ temporal grounding (or location) labels tied to the original QA pairs. With NExT-GQA, we scrutinize a variety of state-of-the-art VLMs. Through post-hoc attention analysis, we find that these models are weak in substantiating the answers despite their strong QA performance. This exposes a severe limitation of these models in making reliable predictions. As a remedy, we further explore and suggest a video grounding mechanism via Gaussian mask optimization and cross-modal learning. Experiments with different backbones demonstrate that this grounding mechanism improves both video grounding and QA. Our dataset and code are released. With these efforts, we aim to push towards the reliability of deploying VLMs in VQA systems.
Memory augment is All You Need for image restoration
Authors: Authors: Xiao Feng Zhang, Chao Chen Gu, Shan Ying Zhu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2309.01377
Pdf link: https://arxiv.org/pdf/2309.01377
Abstract Image restoration is a low-level vision task, most CNN methods are designed as a black box, lacking transparency and internal aesthetics. Although some methods combining traditional optimization algorithms with DNNs have been proposed, they all have some limitations. In this paper, we propose a three-granularity memory layer and contrast learning named MemoryNet, specifically, dividing the samples into positive, negative, and actual three samples for contrastive learning, where the memory layer is able to preserve the deep features of the image and the contrastive learning converges the learned features to balance. Experiments on Derain/Deshadow/Deblur task demonstrate that these methods are effective in improving restoration performance. In addition, this paper's model obtains significant PSNR, SSIM gain on three datasets with different degradation types, which is a strong proof that the recovered images are perceptually realistic. The source code of MemoryNet can be obtained from https://github.com/zhangbaijin/MemoryNet
Federated cINN Clustering for Accurate Clustered Federated Learning
Authors: Authors: Yuhao Zhou, Minjia Shi, Yuxin Tian, Yuanxi Li, Qing Ye, Jiancheng Lv
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.01515
Pdf link: https://arxiv.org/pdf/2309.01515
Abstract Federated Learning (FL) presents an innovative approach to privacy-preserving distributed machine learning and enables efficient crowd intelligence on a large scale. However, a significant challenge arises when coordinating FL with crowd intelligence which diverse client groups possess disparate objectives due to data heterogeneity or distinct tasks. To address this challenge, we propose the Federated cINN Clustering Algorithm (FCCA) to robustly cluster clients into different groups, avoiding mutual interference between clients with data heterogeneity, and thereby enhancing the performance of the global model. Specifically, FCCA utilizes a global encoder to transform each client's private data into multivariate Gaussian distributions. It then employs a generative model to learn encoded latent features through maximum likelihood estimation, which eases optimization and avoids mode collapse. Finally, the central server collects converged local models to approximate similarities between clients and thus partition them into distinct clusters. Extensive experimental results demonstrate FCCA's superiority over other state-of-the-art clustered federated learning algorithms, evaluated on various models and datasets. These results suggest that our approach has substantial potential to enhance the efficiency and accuracy of real-world federated learning tasks.
Is Your Learned Query Optimizer Behaving As You Expect? A Machine Learning Perspective
Authors: Authors: Claude Lehmann, Pavel Sulimov, Kurt Stockinger
Subjects: Databases (cs.DB)
Arxiv link: https://arxiv.org/abs/2309.01551
Pdf link: https://arxiv.org/pdf/2309.01551
Abstract The current boom of learned query optimizers (LQO) can be explained not only by the general continuous improvement of deep learning (DL) methods but also by the straightforward formulation of a query optimization problem (QOP) as a machine learning (ML) one. The idea is often to replace dynamic programming approaches, widespread for solving QOP, with more powerful methods such as reinforcement learning. However, such a rapid "game change" in the field of QOP could not pass without consequences - other parts of the ML pipeline, except for predictive model development, have large improvement potential. For instance, different LQOs introduce their own restrictions on training data generation from queries, use an arbitrary train/validation approach, and evaluate on a voluntary split of benchmark queries. In this paper, we attempt to standardize the ML pipeline for evaluating LQOs by introducing a new end-to-end benchmarking framework. Additionally, we guide the reader through each data science stage in the ML pipeline and provide novel insights from the machine learning perspective, considering the specifics of QOP. Finally, we perform a rigorous evaluation of existing LQOs, showing that PostgreSQL outperforms these LQOs in almost all experiments depending on the train/test splits.
Homomorphically encrypted gradient descent algorithms for quadratic programming
Authors: Authors: André Bertolace, Konstantinos Gatsis, Kostas Margellos
Subjects: Cryptography and Security (cs.CR); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2309.01559
Pdf link: https://arxiv.org/pdf/2309.01559
Abstract In this paper, we evaluate the different fully homomorphic encryption schemes, propose an implementation, and numerically analyze the applicability of gradient descent algorithms to solve quadratic programming in a homomorphic encryption setup. The limit on the multiplication depth of homomorphic encryption circuits is a major challenge for iterative procedures such as gradient descent algorithms. Our analysis not only quantifies these limitations on prototype examples, thus serving as a benchmark for future investigations, but also highlights additional trade-offs like the ones pertaining the choice of gradient descent or accelerated gradient descent methods, opening the road for the use of homomorphic encryption techniques in iterative procedures widely used in optimization based control. In addition, we argue that, among the available homomorphic encryption schemes, the one adopted in this work, namely CKKS, is the only suitable scheme for implementing gradient descent algorithms. The choice of the appropriate step size is crucial to the convergence of the procedure. The paper shows firsthand the feasibility of homomorphically encrypted gradient descent algorithms.
Cross-Consistent Deep Unfolding Network for Adaptive All-In-One Video Restoration
Authors: Authors: Yuanshuo Cheng, Mingwen Shao, Yecong Wan, Lixu Zhang, Wangmeng Zuo, Deyu Meng
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.01627
Pdf link: https://arxiv.org/pdf/2309.01627
Abstract Existing Video Restoration (VR) methods always necessitate the individual deployment of models for each adverse weather to remove diverse adverse weather degradations, lacking the capability for adaptive processing of degradations. Such limitation amplifies the complexity and deployment costs in practical applications. To overcome this deficiency, in this paper, we propose a Cross-consistent Deep Unfolding Network (CDUN) for All-In-One VR, which enables the employment of a single model to remove diverse degradations for the first time. Specifically, the proposed CDUN accomplishes a novel iterative optimization framework, capable of restoring frames corrupted by corresponding degradations according to the degradation features given in advance. To empower the framework for eliminating diverse degradations, we devise a Sequence-wise Adaptive Degradation Estimator (SADE) to estimate degradation features for the input corrupted video. By orchestrating these two cascading procedures, CDUN achieves adaptive processing for diverse degradation. In addition, we introduce a window-based inter-frame fusion strategy to utilize information from more adjacent frames. This strategy involves the progressive stacking of temporal windows in multiple iterations, effectively enlarging the temporal receptive field and enabling each frame's restoration to leverage information from distant frames. Extensive experiments demonstrate that the proposed method achieves state-of-the-art performance in All-In-One VR.
Representing Edge Flows on Graphs via Sparse Cell Complexes
Authors: Authors: Josef Hoppe, Michael T. Schaub
Subjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2309.01632
Pdf link: https://arxiv.org/pdf/2309.01632
Abstract Obtaining sparse, interpretable representations of observable data is crucial in many machine learning and signal processing tasks. For data representing flows along the edges of a graph, an intuitively interpretable way to obtain such representations is to lift the graph structure to a simplicial complex: The eigenvectors of the associated Hodge-Laplacian, respectively the incidence matrices of the corresponding simplicial complex then induce a Hodge decomposition, which can be used to represent the observed data in terms of gradient, curl, and harmonic flows. In this paper, we generalize this approach to cellular complexes and introduce the cell inference optimization problem, i.e., the problem of augmenting the observed graph by a set of cells, such that the eigenvectors of the associated Hodge Laplacian provide a sparse, interpretable representation of the observed edge flows on the graph. We show that this problem is NP-hard and introduce an efficient approximation algorithm for its solution. Experiments on real-world and synthetic data demonstrate that our algorithm outperforms current state-of-the-art methods while being computationally efficient.
ReLoc-PDR: Visual Relocalization Enhanced Pedestrian Dead Reckoning via Graph Optimization
Authors: Authors: Zongyang Chen, Xianfei Pan, Changhao Chen
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.01646
Pdf link: https://arxiv.org/pdf/2309.01646
Abstract Accurately and reliably positioning pedestrians in satellite-denied conditions remains a significant challenge. Pedestrian dead reckoning (PDR) is commonly employed to estimate pedestrian location using low-cost inertial sensor. However, PDR is susceptible to drift due to sensor noise, incorrect step detection, and inaccurate stride length estimation. This work proposes ReLoc-PDR, a fusion framework combining PDR and visual relocalization using graph optimization. ReLoc-PDR leverages time-correlated visual observations and learned descriptors to achieve robust positioning in visually-degraded environments. A graph optimization-based fusion mechanism with the Tukey kernel effectively corrects cumulative errors and mitigates the impact of abnormal visual observations. Real-world experiments demonstrate that our ReLoc-PDR surpasses representative methods in accuracy and robustness, achieving accurte and robust pedestrian positioning results using only a smartphone in challenging environments such as less-textured corridors and dark nighttime scenarios.
Stochastic Model Predictive Control for Networked Systems with Random Delays and Packet Losses in All Channels
Authors: Authors: Marijan Palmisano, Martin Steinberger, Martin Horn
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2309.01679
Pdf link: https://arxiv.org/pdf/2309.01679
Abstract A stochastic Model Predictive Control strategy for control systems with communication networks between the sensor node and the controller and between the controller and the actuator node is proposed. Data packets are subject to random delays and packet loss is possible; acknowledgments for received packets are not provided. The expected value of a quadratic cost is minimized subject to linear constraints; the set of all initial states for which the resulting optimization problem is guaranteed to be feasible is provided. The state vector of the controlled plant is shown to converge to zero with probability one.
ControlMat: A Controlled Generative Approach to Material Capture
Authors: Authors: Giuseppe Vecchio, Rosalie Martin, Arthur Roullier, Adrien Kaiser, Romain Rouffet, Valentin Deschaintre, Tamy Boubekeur
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2309.01700
Pdf link: https://arxiv.org/pdf/2309.01700
Abstract Material reconstruction from a photograph is a key component of 3D content creation democratization. We propose to formulate this ill-posed problem as a controlled synthesis one, leveraging the recent progress in generative deep networks. We present ControlMat, a method which, given a single photograph with uncontrolled illumination as input, conditions a diffusion model to generate plausible, tileable, high-resolution physically-based digital materials. We carefully analyze the behavior of diffusion models for multi-channel outputs, adapt the sampling process to fuse multi-scale information and introduce rolled diffusion to enable both tileability and patched diffusion for high-resolution outputs. Our generative approach further permits exploration of a variety of materials which could correspond to the input image, mitigating the unknown lighting conditions. We show that our approach outperforms recent inference and latent-space-optimization methods, and carefully validate our diffusion process design choices. Supplemental materials and additional details are available at: https://gvecchio.com/controlmat/.
From Kubernetes to Knactor: A State-Centric Rethink of Service Integration
Authors: Authors: Silvery Fu, Hong Zhang, Ryan Teoh, Taras Priadka, Sylvia Ratnasamy
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2309.01805
Pdf link: https://arxiv.org/pdf/2309.01805
Abstract Microservices are increasingly used in modern applications, leading to a growing need for effective service integration solutions. However, we argue that traditional API-centric integration mechanisms (e.g., RPC, REST, and Pub/Sub) hamper the modularity of microservices. These mechanisms introduce rigid code-level coupling, scatter integration logic, and hinder visibility into cross-service state exchanges. Ultimately, these limitations complicate the maintenance and evolution of microservice-based applications. In response, we propose a rethinking of service integration and present Knactor, a new state-centric integration framework to restore the modularity that microservices were intended to offer. Knactor decouples service integration from service development, allowing integration to be implemented as explicit state exchanges among multiple services. Our initial case study suggests that Knactor simplifies service integration and creates new opportunities for optimizations.
Instant Continual Learning of Neural Radiance Fields
Authors: Authors: Ryan Po, Zhengyang Dong, Alexander W. Bergman, Gordon Wetzstein
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.01811
Pdf link: https://arxiv.org/pdf/2309.01811
Abstract Neural radiance fields (NeRFs) have emerged as an effective method for novel-view synthesis and 3D scene reconstruction. However, conventional training methods require access to all training views during scene optimization. This assumption may be prohibitive in continual learning scenarios, where new data is acquired in a sequential manner and a continuous update of the NeRF is desired, as in automotive or remote sensing applications. When naively trained in such a continual setting, traditional scene representation frameworks suffer from catastrophic forgetting, where previously learned knowledge is corrupted after training on new data. Prior works in alleviating forgetting with NeRFs suffer from low reconstruction quality and high latency, making them impractical for real-world application. We propose a continual learning framework for training NeRFs that leverages replay-based methods combined with a hybrid explicit--implicit scene representation. Our method outperforms previous methods in reconstruction quality when trained in a continual setting, while having the additional benefit of being an order of magnitude faster.
Inverse Dynamics Trajectory Optimization for Contact-Implicit Model Predictive Control
Authors: Authors: Vince Kurtz, Alejandro Castro, Aykut Özgün Önol, Hai Lin
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.01813
Pdf link: https://arxiv.org/pdf/2309.01813
Abstract Robots must make and break contact to interact with the world and perform useful tasks. However, planning and control through contact remains a formidable challenge. In this work, we achieve real-time contact-implicit model predictive control with a surprisingly simple method: inverse dynamics trajectory optimization. While trajectory optimization with inverse dynamics is not new, we introduce a series of incremental innovations that collectively enable fast model predictive control on a variety of challenging manipulation and locomotion tasks. We implement these innovations in an open-source solver, and present a variety of simulation examples to support the effectiveness of the proposed approach. Additionally, we demonstrate contact-implicit model predictive control on hardware at over 100 Hz for a 20 degree-of-freedom bi-manual manipulation task.
Computation and Communication Efficient Federated Learning over Wireless Networks
Authors: Authors: Xiaonan Liu, Tharmalingam Ratnarajah
Subjects: Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2309.01816
Pdf link: https://arxiv.org/pdf/2309.01816
Abstract Federated learning (FL) allows model training from local data by edge devices while preserving data privacy. However, the learning accuracy decreases due to the heterogeneity of devices data, and the computation and communication latency increase when updating large scale learning models on devices with limited computational capability and wireless resources. To overcome these challenges, we consider a novel FL framework with partial model pruning and personalization. This framework splits the learning model into a global part with model pruning shared with all devices to learn data representations and a personalized part to be fine tuned for a specific device, which adapts the model size during FL to reduce both computation and communication overhead and minimize the overall training time, and increases the learning accuracy for the device with non independent and identically distributed (non IID) data. Then, the computation and communication latency and the convergence analysis of the proposed FL framework are mathematically analyzed. Based on the convergence analysis, an optimization problem is formulated to maximize the convergence rate under a latency threshold by jointly optimizing the pruning ratio and wireless resource allocation. By decoupling the optimization problem and deploying Karush Kuhn Tucker (KKT) conditions, we derive the closed form solutions of pruning ratio and wireless resource allocation. Finally, experimental results demonstrate that the proposed FL framework achieves a remarkable reduction of approximately 50 percents computation and communication latency compared with the scheme only with model personalization.
LoopTune: Optimizing Tensor Computations with Reinforcement Learning
Authors: Authors: Dejan Grubisic, Bram Wasti, Chris Cummins, John Mellor-Crummey, Aleksandar Zlateski
Subjects: Machine Learning (cs.LG); Programming Languages (cs.PL)
Arxiv link: https://arxiv.org/abs/2309.01825
Pdf link: https://arxiv.org/pdf/2309.01825
Abstract Advanced compiler technology is crucial for enabling machine learning applications to run on novel hardware, but traditional compilers fail to deliver performance, popular auto-tuners have long search times and expert-optimized libraries introduce unsustainable costs. To address this, we developed LoopTune, a deep reinforcement learning compiler that optimizes tensor computations in deep learning models for the CPU. LoopTune optimizes tensor traversal order while using the ultra-fast lightweight code generator LoopNest to perform hardware-specific optimizations. With a novel graph-based representation and action space, LoopTune speeds up LoopNest by 3.2x, generating an order of magnitude faster code than TVM, 2.8x faster than MetaSchedule, and 1.08x faster than AutoTVM, consistently performing at the level of the hand-tuned library Numpy. Moreover, LoopTune tunes code in order of seconds.
Towards General and Efficient Online Tuning for Spark
Authors: Authors: Yang Li, Huaijun Jiang, Yu Shen, Yide Fang, Xiaofeng Yang, Danqing Huang, Xinyi Zhang, Wentao Zhang, Ce Zhang, Peng Chen, Bin Cui
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2309.01901
Pdf link: https://arxiv.org/pdf/2309.01901
Abstract The distributed data analytic system -- Spark is a common choice for processing massive volumes of heterogeneous data, while it is challenging to tune its parameters to achieve high performance. Recent studies try to employ auto-tuning techniques to solve this problem but suffer from three issues: limited functionality, high overhead, and inefficient search. In this paper, we present a general and efficient Spark tuning framework that can deal with the three issues simultaneously. First, we introduce a generalized tuning formulation, which can support multiple tuning goals and constraints conveniently, and a Bayesian optimization (BO) based solution to solve this generalized optimization problem. Second, to avoid high overhead from additional offline evaluations in existing methods, we propose to tune parameters along with the actual periodic executions of each job (i.e., online evaluations). To ensure safety during online job executions, we design a safe configuration acquisition method that models the safe region. Finally, three innovative techniques are leveraged to further accelerate the search process: adaptive sub-space generation, approximate gradient descent, and meta-learning method. We have implemented this framework as an independent cloud service, and applied it to the data platform in Tencent. The empirical results on both public benchmarks and large-scale production tasks demonstrate its superiority in terms of practicality, generality, and efficiency. Notably, this service saves an average of 57.00% memory cost and 34.93% CPU cost on 25K in-production tasks within 20 iterations, respectively.
Efficient Bayesian Computational Imaging with a Surrogate Score-Based Prior
Authors: Authors: Berthy T. Feng, Katherine L. Bouman
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.01949
Pdf link: https://arxiv.org/pdf/2309.01949
Abstract We propose a surrogate function for efficient use of score-based priors for Bayesian inverse imaging. Recent work turned score-based diffusion models into probabilistic priors for solving ill-posed imaging problems by appealing to an ODE-based log-probability function. However, evaluating this function is computationally inefficient and inhibits posterior estimation of high-dimensional images. Our proposed surrogate prior is based on the evidence lower-bound of a score-based diffusion model. We demonstrate the surrogate prior on variational inference for efficient approximate posterior sampling of large images. Compared to the exact prior in previous work, our surrogate prior accelerates optimization of the variational image distribution by at least two orders of magnitude. We also find that our principled approach achieves higher-fidelity images than non-Bayesian baselines that involve hyperparameter-tuning at inference. Our work establishes a practical path forward for using score-based diffusion models as general-purpose priors for imaging.
Automatic Data Transformation Using Large Language Model An Experimental Study on Building Energy Data
Authors: Authors: Ankita Sharma, Xuanmao Li, Hong Guan, Guoxin Sun, Liang Zhang, Lanjun Wang, Kesheng Wu, Lei Cao, Erkang Zhu, Alexander Sim, Teresa Wu, Jia Zou
Subjects: Databases (cs.DB)
Arxiv link: https://arxiv.org/abs/2309.01957
Pdf link: https://arxiv.org/pdf/2309.01957
Abstract Existing approaches to automatic data transformation are insufficient to meet the requirements in many real-world scenarios, such as the building sector. First, there is no convenient interface for domain experts to provide domain knowledge easily. Second, they require significant training data collection overheads. Third, the accuracy suffers from complicated schema changes. To bridge this gap, we present a novel approach that leverages the unique capabilities of large language models (LLMs) in coding, complex reasoning, and zero-shot learning to generate SQL code that transforms the source datasets into the target datasets. We demonstrate the viability of this approach by designing an LLM-based framework, termed SQLMorpher, which comprises a prompt generator that integrates the initial prompt with optional domain knowledge and historical patterns in external databases. It also implements an iterative prompt optimization mechanism that automatically improves the prompt based on flaw detection. The key contributions of this work include (1) pioneering an end-to-end LLM-based solution for data transformation, (2) developing a benchmark dataset of 105 real-world building energy data transformation problems, and (3) conducting an extensive empirical evaluation where our approach achieved 96% accuracy in all 105 problems. SQLMorpher demonstrates the effectiveness of utilizing LLMs in complex, domain-specific challenges, highlighting the potential of their potential to drive sustainable solutions.
Empowering Low-Light Image Enhancer through Customized Learnable Priors
Authors: Authors: Naishan Zheng, Man Zhou, Yanmeng Dong, Xiangyu Rui, Jie Huang, Chongyi Li, Feng Zhao
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2309.01958
Pdf link: https://arxiv.org/pdf/2309.01958
Abstract Deep neural networks have achieved remarkable progress in enhancing low-light images by improving their brightness and eliminating noise. However, most existing methods construct end-to-end mapping networks heuristically, neglecting the intrinsic prior of image enhancement task and lacking transparency and interpretability. Although some unfolding solutions have been proposed to relieve these issues, they rely on proximal operator networks that deliver ambiguous and implicit priors. In this work, we propose a paradigm for low-light image enhancement that explores the potential of customized learnable priors to improve the transparency of the deep unfolding paradigm. Motivated by the powerful feature representation capability of Masked Autoencoder (MAE), we customize MAE-based illumination and noise priors and redevelop them from two perspectives: 1) \textbf{structure flow}: we train the MAE from a normal-light image to its illumination properties and then embed it into the proximal operator design of the unfolding architecture; and m2) \textbf{optimization flow}: we train MAE from a normal-light image to its gradient representation and then employ it as a regularization term to constrain noise in the model output. These designs improve the interpretability and representation capability of the model.Extensive experiments on multiple low-light image enhancement datasets demonstrate the superiority of our proposed paradigm over state-of-the-art methods. Code is available at https://github.com/zheng980629/CUE.
Granger Causal Inference in Multivariate Hawkes Processes by Minimum Message Length
Authors: Authors: Katerina Hlavackova-Schindler, Anna Melnykova, Irene Tubikanec
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.02027
Pdf link: https://arxiv.org/pdf/2309.02027
Abstract Multivariate Hawkes processes (MHPs) are versatile probabilistic tools used to model various real-life phenomena: earthquakes, operations on stock markets, neuronal activity, virus propagation and many others. In this paper, we focus on MHPs with exponential decay kernels and estimate connectivity graphs, which represent the Granger causal relations between their components. We approach this inference problem by proposing an optimization criterion and model selection algorithm based on the minimum message length (MML) principle. MML compares Granger causal models using the Occam's razor principle in the following way: even when models have a comparable goodness-of-fit to the observed data, the one generating the most concise explanation of the data is preferred. While most of the state-of-art methods using lasso-type penalization tend to overfitting in scenarios with short time horizons, the proposed MML-based method achieves high F1 scores in these settings. We conduct a numerical study comparing the proposed algorithm to other related classical and state-of-art methods, where we achieve the highest F1 scores in specific sparse graph settings. We illustrate the proposed method also on G7 sovereign bond data and obtain causal connections, which are in agreement with the expert knowledge available in the literature.
Data-Juicer: A One-Stop Data Processing System for Large Language Models
Authors: Authors: Daoyuan Chen, Yilun Huang, Zhijian Ma, Hesen Chen, Xuchen Pan, Ce Ge, Dawei Gao, Yuexiang Xie, Zhaoyang Liu, Jinyang Gao, Yaliang Li, Bolin Ding, Jingren Zhou
Subjects: Machine Learning (cs.LG); Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2309.02033
Pdf link: https://arxiv.org/pdf/2309.02033
Abstract The immense evolution in Large Language Models (LLMs) has underscored the importance of massive, diverse, and high-quality data. Despite this, existing open-source tools for LLM data processing remain limited and mostly tailored to specific datasets, with an emphasis on the reproducibility of released data over adaptability and usability, inhibiting potential applications. In response, we propose a one-stop, powerful yet flexible and user-friendly LLM data processing system named Data-Juicer. Our system offers over 50 built-in versatile operators and pluggable tools, which synergize modularity, composability, and extensibility dedicated to diverse LLM data processing needs. By incorporating visualized and automatic evaluation capabilities, Data-Juicer enables a timely feedback loop to accelerate data processing and gain data insights. To enhance usability, Data-Juicer provides out-of-the-box components for users with various backgrounds, and fruitful data recipes for LLM pre-training and post-tuning usages. Further, we employ multi-facet system optimization and seamlessly integrate Data-Juicer with both LLM and distributed computing ecosystems, to enable efficient and scalable data processing. Empirical validation of the generated data recipes reveals considerable improvements in LLaMA performance for various pre-training and post-tuning cases, demonstrating up to 7.45% relative improvement of averaged score across 16 LLM benchmarks and 16.25% higher win rate using pair-wise GPT-4 evaluation. The system's efficiency and scalability are also validated, supported by up to 88.7% reduction in single-machine processing time, 77.1% and 73.1% less memory and CPU usage respectively, and 7.91x processing acceleration when utilizing distributed computing ecosystems. Our system, data recipes, and multiple tutorial demos are released, calling for broader research centered on LLM data.
Diffusion Generative Inverse Design
Authors: Authors: Marin Vlastelica, Tatiana López-Guevara, Kelsey Allen, Peter Battaglia, Arnaud Doucet, Kimberley Stachenfeld
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2309.02040
Pdf link: https://arxiv.org/pdf/2309.02040
Abstract Inverse design refers to the problem of optimizing the input of an objective function in order to enact a target outcome. For many real-world engineering problems, the objective function takes the form of a simulator that predicts how the system state will evolve over time, and the design challenge is to optimize the initial conditions that lead to a target outcome. Recent developments in learned simulation have shown that graph neural networks (GNNs) can be used for accurate, efficient, differentiable estimation of simulator dynamics, and support high-quality design optimization with gradient- or sampling-based optimization procedures. However, optimizing designs from scratch requires many expensive model queries, and these procedures exhibit basic failures on either non-convex or high-dimensional problems.In this work, we show how denoising diffusion models (DDMs) can be used to solve inverse design problems efficiently and propose a particle sampling algorithm for further improving their efficiency. We perform experiments on a number of fluid dynamics design challenges, and find that our approach substantially reduces the number of calls to the simulator compared to standard techniques.
Bayesian experimental design for linear elasticity
Authors: Authors: Sarah Eberle-Blick, Nuutti Hyvönen
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2309.02042
Pdf link: https://arxiv.org/pdf/2309.02042
Abstract This work considers Bayesian experimental design for the inverse boundary value problem of linear elasticity in a two-dimensional setting. The aim is to optimize the positions of compactly supported pressure activations on the boundary of the examined body in order to maximize the value of the resulting boundary deformations as data for the inverse problem of reconstructing the Lam\'e parameters inside the object. We resort to a linearized measurement model and adopt the framework of Bayesian experimental design, under the assumption that the prior and measurement noise distributions are mutually independent Gaussians. This enables the use of the standard Bayesian A-optimality criterion for deducing optimal positions for the pressure activations. The (second) derivatives of the boundary measurements with respect to the Lam\'e parameters and the positions of the boundary pressure activations are deduced to allow minimizing the corresponding objective function, i.e., the trace of the covariance matrix of the posterior distribution, by a gradient-based optimization algorithm. Two-dimensional numerical experiments are performed to demonstrate the functionality of our approach.
Probabilistic Self-supervised Learning via Scoring Rules Minimization
Authors: Authors: Amirhossein Vahidi, Simon Schoßer, Lisa Wimmer, Yawei Li, Bernd Bischl, Eyke Hüllermeier, Mina Rezaei
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2309.02048
Pdf link: https://arxiv.org/pdf/2309.02048
Abstract In this paper, we propose a novel probabilistic self-supervised learning via Scoring Rule Minimization (ProSMIN), which leverages the power of probabilistic models to enhance representation quality and mitigate collapsing representations. Our proposed approach involves two neural networks; the online network and the target network, which collaborate and learn the diverse distribution of representations from each other through knowledge distillation. By presenting the input samples in two augmented formats, the online network is trained to predict the target network representation of the same sample under a different augmented view. The two networks are trained via our new loss function based on proper scoring rules. We provide a theoretical justification for ProSMIN's convergence, demonstrating the strict propriety of its modified scoring rule. This insight validates the method's optimization process and contributes to its robustness and effectiveness in improving representation quality. We evaluate our probabilistic model on various downstream tasks, such as in-distribution generalization, out-of-distribution detection, dataset corruption, low-shot learning, and transfer learning. Our method achieves superior accuracy and calibration, surpassing the self-supervised baseline in a wide range of experiments on large-scale datasets like ImageNet-O and ImageNet-C, ProSMIN demonstrates its scalability and real-world applicability.
Optimization tools for Twin-in-the-Loop vehicle control design: analysis and yaw-rate tracking case study
Authors: Authors: Federico Dettù, Simone Formentin, Stefano Varisco, Sergio Matteo Savaresi
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2309.02080
Pdf link: https://arxiv.org/pdf/2309.02080
Abstract Given the urgent need of simplifying the end-of-line tuning of complex vehicle dynamics controllers, the Twin-in-the-Loop Control (TiL-C) approach was recently proposed in the automotive field. In TiL-C, a digital twin is run on-board to compute a nominal control action in run-time and an additional block C_delta is used to compensate for the mismatch between the simulator and the real vehicle. As the digital twin is assumed to be the best replica available of the real plant, the key issue in TiL-C becomes the tuning of the compensator, which must be performed relying on data only. In this paper, we investigate the use of different black-box optimization techniques for the calibration of C_delta. More specifically, we compare the originally proposed Bayesian Optimization (BO) approach with the recently developed Set Membership Global Optimization (SMGO) and Virtual Reference Feedback Tuning (VRFT), a one-shot direct data-driven design method. The analysis will be carried out within a professional multibody simulation environment on a novel TiL-C application case study -- the yaw-rate tracking problem -- so as to further prove the TiL-C effctiveness on a challenging problem. Simulations will show that the VRFT approach is capable of providing a well tuned controller after a single iteration, while 10 to 15 iterations are necessary for refining it with global optimizers. Also, SMGO is shown to significantly reduce the computational effort required by BO.
Model-based Offline Policy Optimization with Adversarial Network
Authors: Authors: Junming Yang, Xingguo Chen, Shengyuan Wang, Bolei Zhang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2309.02157
Pdf link: https://arxiv.org/pdf/2309.02157
Abstract Model-based offline reinforcement learning (RL), which builds a supervised transition model with logging dataset to avoid costly interactions with the online environment, has been a promising approach for offline policy optimization. As the discrepancy between the logging data and online environment may result in a distributional shift problem, many prior works have studied how to build robust transition models conservatively and estimate the model uncertainty accurately. However, the over-conservatism can limit the exploration of the agent, and the uncertainty estimates may be unreliable. In this work, we propose a novel Model-based Offline policy optimization framework with Adversarial Network (MOAN). The key idea is to use adversarial learning to build a transition model with better generalization, where an adversary is introduced to distinguish between in-distribution and out-of-distribution samples. Moreover, the adversary can naturally provide a quantification of the model's uncertainty with theoretical guarantees. Extensive experiments showed that our approach outperforms existing state-of-the-art baselines on widely studied offline RL benchmarks. It can also generate diverse in-distribution samples, and quantify the uncertainty more accurately.
Personalized Federated Deep Reinforcement Learning-based Trajectory Optimization for Multi-UAV Assisted Edge Computing
Authors: Authors: Zhengrong Song, Chuan Ma, Ming Ding, Howard H. Yang, Yuwen Qian, Xiangwei Zhou
Subjects: Multiagent Systems (cs.MA); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.02193
Pdf link: https://arxiv.org/pdf/2309.02193
Abstract In the era of 5G mobile communication, there has been a significant surge in research focused on unmanned aerial vehicles (UAVs) and mobile edge computing technology. UAVs can serve as intelligent servers in edge computing environments, optimizing their flight trajectories to maximize communication system throughput. Deep reinforcement learning (DRL)-based trajectory optimization algorithms may suffer from poor training performance due to intricate terrain features and inadequate training data. To overcome this limitation, some studies have proposed leveraging federated learning (FL) to mitigate the data isolation problem and expedite convergence. Nevertheless, the efficacy of global FL models can be negatively impacted by the high heterogeneity of local data, which could potentially impede the training process and even compromise the performance of local agents. This work proposes a novel solution to address these challenges, namely personalized federated deep reinforcement learning (PF-DRL), for multi-UAV trajectory optimization. PF-DRL aims to develop individualized models for each agent to address the data scarcity issue and mitigate the negative impact of data heterogeneity. Simulation results demonstrate that the proposed algorithm achieves superior training performance with faster convergence rates, and improves service quality compared to other DRL-based approaches.
Algebraic Temporal Blocking for Sparse Iterative Solvers on Multi-Core CPUs
Authors: Authors: Christie Alappat, Jonas Thies, Georg Hager, Holger Fehske, Gerhard Wellein
Subjects: Numerical Analysis (math.NA); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2309.02228
Pdf link: https://arxiv.org/pdf/2309.02228
Abstract Sparse linear iterative solvers are essential for many large-scale simulations. Much of the runtime of these solvers is often spent in the implicit evaluation of matrix polynomials via a sequence of sparse matrix-vector products. A variety of approaches has been proposed to make these polynomial evaluations explicit (i.e., fix the coefficients), e.g., polynomial preconditioners or s-step Krylov methods. Furthermore, it is nowadays a popular practice to approximate triangular solves by a matrix polynomial to increase parallelism. Such algorithms allow to evaluate the polynomial using a so-called matrix power kernel (MPK), which computes the product between a power of a sparse matrix A and a dense vector x, or a related operation. Recently we have shown that using the level-based formulation of sparse matrix-vector multiplications in the Recursive Algebraic Coloring Engine (RACE) framework we can perform temporal cache blocking of MPK to increase its performance. In this work, we demonstrate the application of this cache-blocking optimization in sparse iterative solvers. By integrating the RACE library into the Trilinos framework, we demonstrate the speedups achieved in preconditioned) s-step GMRES, polynomial preconditioners, and algebraic multigrid (AMG). For MPK-dominated algorithms we achieve speedups of up to 3x on modern multi-core compute nodes. For algorithms with moderate contributions from subspace orthogonalization, the gain reduces significantly, which is often caused by the insufficient quality of the orthogonalization routines. Finally, we showcase the application of RACE-accelerated solvers in a real-world wind turbine simulation (Nalu-Wind) and highlight the new opportunities and perspectives opened up by RACE as a cache-blocking technique for MPK-enabled sparse solvers.
Fairness Optimization of RSMA for Uplink Communication based on Intelligent Reflecting Surface
Authors: Authors: Shanshan Zhang, Wen Chen
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2309.02264
Pdf link: https://arxiv.org/pdf/2309.02264
Abstract In this paper, we propose a rate-splitting multiple access (RSMA) scheme for uplink wireless communication systems with intelligent reflecting surface (IRS) aided. In the considered model, IRS is adopted to overcome power attenuation caused by path loss. We construct a max-min fairness optimization problem to obtain the resource allocation, including the receive beamforming at the base station (BS) and phase-shift beamforming at IRS. We also introduce a successive group decoding (SGD) algorithm at the receiver, which trades off the fairness and complexity of decoding. In the simulation, the results show that the proposed scheme has superiority in improving the fairness of uplink communication.
Black-Box Attacks against Signed Graph Analysis via Balance Poisoning
Authors: Authors: Jialong Zhou, Yuni Lai, Jian Ren, Kai Zhou
Subjects: Cryptography and Security (cs.CR); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2309.02396
Pdf link: https://arxiv.org/pdf/2309.02396
Abstract Signed graphs are well-suited for modeling social networks as they capture both positive and negative relationships. Signed graph neural networks (SGNNs) are commonly employed to predict link signs (i.e., positive and negative) in such graphs due to their ability to handle the unique structure of signed graphs. However, real-world signed graphs are vulnerable to malicious attacks by manipulating edge relationships, and existing adversarial graph attack methods do not consider the specific structure of signed graphs. SGNNs often incorporate balance theory to effectively model the positive and negative links. Surprisingly, we find that the balance theory that they rely on can ironically be exploited as a black-box attack. In this paper, we propose a novel black-box attack called balance-attack that aims to decrease the balance degree of the signed graphs. We present an efficient heuristic algorithm to solve this NP-hard optimization problem. We conduct extensive experiments on five popular SGNN models and four real-world datasets to demonstrate the effectiveness and wide applicability of our proposed attack method. By addressing these challenges, our research contributes to a better understanding of the limitations and resilience of robust models when facing attacks on SGNNs. This work contributes to enhancing the security and reliability of signed graph analysis in social network modeling. Our PyTorch implementation of the attack is publicly available on GitHub: https://github.com/JialongZhou666/Balance-Attack.git.
Generating Realistic Images from In-the-wild Sounds
Authors: Authors: Taegyeong Lee, Jeonghun Kang, Hyeonyu Kim, Taehwan Kim
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2309.02405
Pdf link: https://arxiv.org/pdf/2309.02405
Abstract Representing wild sounds as images is an important but challenging task due to the lack of paired datasets between sound and images and the significant differences in the characteristics of these two modalities. Previous studies have focused on generating images from sound in limited categories or music. In this paper, we propose a novel approach to generate images from in-the-wild sounds. First, we convert sound into text using audio captioning. Second, we propose audio attention and sentence attention to represent the rich characteristics of sound and visualize the sound. Lastly, we propose a direct sound optimization with CLIPscore and AudioCLIP and generate images with a diffusion-based model. In experiments, it shows that our model is able to generate high quality images from wild sounds and outperforms baselines in both quantitative and qualitative evaluations on wild audio datasets.
Investment-based optimisation of energy storage design parameters in a grid-connected hybrid renewable energy system
Authors: Authors: Sleiman Farah, Gorm Bruun Andresen
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2309.02406
Pdf link: https://arxiv.org/pdf/2309.02406
Abstract Grid-connected hybrid renewable power systems with energy storage can reduce the intermittency of renewable power supply. However, emerging energy storage technologies need improvement to compete with lithium-ion batteries and reduce the cost of energy. Identifying and optimizing the the most valuable improvement path of these technologies is challenging due to the non-linearity of the energy system model when considering parameters as independent variables. To overcome this, a novel investment-based optimization method is proposed. The method involves linear optimization of the hybrid renewable energy system and subsequent investment optimization, accounting for diminishing improvements per investment. Applied to thermal energy, pumped thermal energy, molten salt, and adiabatic compressed air energy storage technologies, the results show that enhancing discharge efficiency is most valuable for all technologies. Reducing discharge capacity costs and energy storage capacity cost can also become important. Charge capacity cost and charge efficiency are found to be of lesser significance. The study provides detailed improvement pathways for each technology under various operational conditions, assisting developers in resource allocation. Overall, the investment-based optimization method and findings contribute to enhancing the competitiveness of emerging energy storage technologies and reducing reliance on batteries in renewable energy systems.
Keyword: adam

ADC/DAC-Free Analog Acceleration of Deep Neural Networks with Frequency Transformation
Authors: Authors: Nastaran Darabi, Maeesha Binte Hashem, Hongyi Pan, Ahmet Cetin, Wilfred Gomes, Amit Ranjan Trivedi
Subjects: Hardware Architecture (cs.AR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.01771
Pdf link: https://arxiv.org/pdf/2309.01771
Abstract The edge processing of deep neural networks (DNNs) is becoming increasingly important due to its ability to extract valuable information directly at the data source to minimize latency and energy consumption. Frequency-domain model compression, such as with the Walsh-Hadamard transform (WHT), has been identified as an efficient alternative. However, the benefits of frequency-domain processing are often offset by the increased multiply-accumulate (MAC) operations required. This paper proposes a novel approach to an energy-efficient acceleration of frequency-domain neural networks by utilizing analog-domain frequency-based tensor transformations. Our approach offers unique opportunities to enhance computational efficiency, resulting in several high-level advantages, including array micro-architecture with parallelism, ADC/DAC-free analog computations, and increased output sparsity. Our approach achieves more compact cells by eliminating the need for trainable parameters in the transformation matrix. Moreover, our novel array micro-architecture enables adaptive stitching of cells column-wise and row-wise, thereby facilitating perfect parallelism in computations. Additionally, our scheme enables ADC/DAC-free computations by training against highly quantized matrix-vector products, leveraging the parameter-free nature of matrix multiplications. Another crucial aspect of our design is its ability to handle signed-bit processing for frequency-based transformations. This leads to increased output sparsity and reduced digitization workload. On a 16$\times$16 crossbars, for 8-bit input processing, the proposed approach achieves the energy efficiency of 1602 tera operations per second per Watt (TOPS/W) without early termination strategy and 5311 TOPS/W with early termination strategy at VDD = 0.8 V.
TODM: Train Once Deploy Many Efficient Supernet-Based RNN-T Compression For On-device ASR Models
Authors: Authors: Yuan Shangguan, Haichuan Yang, Danni Li, Chunyang Wu, Yassir Fathullah, Dilin Wang, Ayushi Dalmia, Raghuraman Krishnamoorthi, Ozlem Kalinli, Junteng Jia, Jay Mahadeokar, Xin Lei, Mike Seltzer, Vikas Chandra
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2309.01947
Pdf link: https://arxiv.org/pdf/2309.01947
Abstract Automatic Speech Recognition (ASR) models need to be optimized for specific hardware before they can be deployed on devices. This can be done by tuning the model's hyperparameters or exploring variations in its architecture. Re-training and re-validating models after making these changes can be a resource-intensive task. This paper presents TODM (Train Once Deploy Many), a new approach to efficiently train many sizes of hardware-friendly on-device ASR models with comparable GPU-hours to that of a single training job. TODM leverages insights from prior work on Supernet, where Recurrent Neural Network Transducer (RNN-T) models share weights within a Supernet. It reduces layer sizes and widths of the Supernet to obtain subnetworks, making them smaller models suitable for all hardware types. We introduce a novel combination of three techniques to improve the outcomes of the TODM Supernet: adaptive dropouts, an in-place Alpha-divergence knowledge distillation, and the use of ScaledAdam optimizer. We validate our approach by comparing Supernet-trained versus individually tuned Multi-Head State Space Model (MH-SSM) RNN-T using LibriSpeech. Results demonstrate that our TODM Supernet either matches or surpasses the performance of manually tuned models by up to a relative of 3% better in word error rate (WER), while efficiently keeping the cost of training many models at a small constant.
AdaPlus: Integrating Nesterov Momentum and Precise Stepsize Adjustment on AdamW Basis
Authors: Authors: Lei Guan
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.01966
Pdf link: https://arxiv.org/pdf/2309.01966
Abstract This paper proposes an efficient optimizer called AdaPlus which integrates Nesterov momentum and precise stepsize adjustment on AdamW basis. AdaPlus combines the advantages of AdamW, Nadam, and AdaBelief and, in particular, does not introduce any extra hyper-parameters. We perform extensive experimental evaluations on three machine learning tasks to validate the effectiveness of AdaPlus. The experiment results validate that AdaPlus (i) is the best adaptive method which performs most comparable with (even slightly better than) SGD with momentum on image classification tasks and (ii) outperforms other state-of-the-art optimizers on language modeling tasks and illustrates the highest stability when training GANs. The experiment code of AdaPlus is available at: https://github.com/guanleics/AdaPlus.
Keyword: gradient

Structured Radial Basis Function Network: Modelling Diversity for Multiple Hypotheses Prediction
Authors: Authors: Alejandro Rodriguez Dominguez, Muhammad Shahzad, Xia Hong
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2309.00781
Pdf link: https://arxiv.org/pdf/2309.00781
Abstract Multi-modal regression is important in forecasting nonstationary processes or with a complex mixture of distributions. It can be tackled with multiple hypotheses frameworks but with the difficulty of combining them efficiently in a learning model. A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems. The predictors are regression models of any type that can form centroidal Voronoi tessellations which are a function of their losses during training. It is proved that this structured model can efficiently interpolate this tessellation and approximate the multiple hypotheses target distribution and is equivalent to interpolating the meta-loss of the predictors, the loss being a zero set of the interpolation error. This model has a fixed-point iteration algorithm between the predictors and the centers of the basis functions. Diversity in learning can be controlled parametrically by truncating the tessellation formation with the losses of individual predictors. A closed-form solution with least-squares is presented, which to the authors knowledge, is the fastest solution in the literature for multiple hypotheses and structured predictions. Superior generalization performance and computational efficiency is achieved using only two-layer neural networks as predictors controlling diversity as a key component of success. A gradient-descent approach is introduced which is loss-agnostic regarding the predictors. The expected value for the loss of the structured model with Gaussian basis functions is computed, finding that correlation between predictors is not an appropriate tool for diversification. The experiments show outperformance with respect to the top competitors in the literature.
Diffusion Modeling with Domain-conditioned Prior Guidance for Accelerated MRI and qMRI Reconstruction
Authors: Authors: Wanyu Bian, Albert Jang, Fang Liu
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.00783
Pdf link: https://arxiv.org/pdf/2309.00783
Abstract This study introduces a novel approach for image reconstruction based on a diffusion model conditioned on the native data domain. Our method is applied to multi-coil MRI and quantitative MRI reconstruction, leveraging the domain-conditioned diffusion model within the frequency and parameter domains. The prior MRI physics are used as embeddings in the diffusion model, enforcing data consistency to guide the training and sampling process, characterizing MRI k-space encoding in MRI reconstruction, and leveraging MR signal modeling for qMRI reconstruction. Furthermore, a gradient descent optimization is incorporated into the diffusion steps, enhancing feature learning and improving denoising. The proposed method demonstrates a significant promise, particularly for reconstructing images at high acceleration factors. Notably, it maintains great reconstruction accuracy and efficiency for static and quantitative MRI reconstruction across diverse anatomical structures. Beyond its immediate applications, this method provides potential generalization capability, making it adaptable to inverse problems across various domains.
Network Topology Inference with Sparsity and Laplacian Constraints
Authors: Authors: Jiaxi Ying, Xi Han, Rui Zhou, Xiwen Wang, Hing Cheung So
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2309.00960
Pdf link: https://arxiv.org/pdf/2309.00960
Abstract We tackle the network topology inference problem by utilizing Laplacian constrained Gaussian graphical models, which recast the task as estimating a precision matrix in the form of a graph Laplacian. Recent research \cite{ying2020nonconvex} has uncovered the limitations of the widely used $\ell_1$-norm in learning sparse graphs under this model: empirically, the number of nonzero entries in the solution grows with the regularization parameter of the $\ell_1$-norm; theoretically, a large regularization parameter leads to a fully connected (densest) graph. To overcome these challenges, we propose a graph Laplacian estimation method incorporating the $\ell_0$-norm constraint. An efficient gradient projection algorithm is developed to solve the resulting optimization problem, characterized by sparsity and Laplacian constraints. Through numerical experiments with synthetic and financial time-series datasets, we demonstrate the effectiveness of the proposed method in network topology inference.
Switch and Conquer: Efficient Algorithms By Switching Stochastic Gradient Oracles For Decentralized Saddle Point Problems
Authors: Authors: Chhavi Sharma, Vishnu Narayanan, P. Balamurugan
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2309.00997
Pdf link: https://arxiv.org/pdf/2309.00997
Abstract We consider a class of non-smooth strongly convex-strongly concave saddle point problems in a decentralized setting without a central server. To solve a consensus formulation of problems in this class, we develop an inexact primal dual hybrid gradient (inexact PDHG) procedure that allows generic gradient computation oracles to update the primal and dual variables. We first investigate the performance of inexact PDHG with stochastic variance reduction gradient (SVRG) oracle. Our numerical study uncovers a significant phenomenon of initial conservative progress of iterates of IPDHG with SVRG oracle. To tackle this, we develop a simple and effective switching idea, where a generalized stochastic gradient (GSG) computation oracle is employed to hasten the iterates' progress to a saddle point solution during the initial phase of updates, followed by a switch to the SVRG oracle at an appropriate juncture. The proposed algorithm is named Decentralized Proximal Switching Stochastic Gradient method with Compression (C-DPSSG), and is proven to converge to an $\epsilon$-accurate saddle point solution with linear rate. Apart from delivering highly accurate solutions, our study reveals that utilizing the best convergence phases of GSG and SVRG oracles makes C-DPSSG well suited for obtaining solutions of low/medium accuracy faster, useful for certain applications. Numerical experiments on two benchmark machine learning applications show C-DPSSG's competitive performance which validate our theoretical findings.
Hessian-aware Quantized Node Embeddings for Recommendation
Authors: Authors: Huiyuan Chen, Kaixiong Zhou, Kwei-Herng Lai, Chin-Chia Michael Yeh, Yan Zheng, Xia Hu, Hao Yang
Subjects: Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2309.01032
Pdf link: https://arxiv.org/pdf/2309.01032
Abstract Graph Neural Networks (GNNs) have achieved state-of-the-art performance in recommender systems. Nevertheless, the process of searching and ranking from a large item corpus usually requires high latency, which limits the widespread deployment of GNNs in industry-scale applications. To address this issue, many methods compress user/item representations into the binary embedding space to reduce space requirements and accelerate inference. Also, they use the Straight-through Estimator (STE) to prevent vanishing gradients during back-propagation. However, the STE often causes the gradient mismatch problem, leading to sub-optimal results. In this work, we present the Hessian-aware Quantized GNN (HQ-GNN) as an effective solution for discrete representations of users/items that enable fast retrieval. HQ-GNN is composed of two components: a GNN encoder for learning continuous node embeddings and a quantized module for compressing full-precision embeddings into low-bit ones. Consequently, HQ-GNN benefits from both lower memory requirements and faster inference speeds compared to vanilla GNNs. To address the gradient mismatch problem in STE, we further consider the quantized errors and its second-order derivatives for better stability. The experimental results on several large-scale datasets show that HQ-GNN achieves a good balance between latency and performance.
Efficient Covariance Matrix Reconstruction with Iterative Spatial Spectrum Sampling
Authors: Authors: S. Mohammadzadeh, V. H. Nascimento, R. C. de Lamare, O. Kukrer
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2309.01040
Pdf link: https://arxiv.org/pdf/2309.01040
Abstract This work presents a cost-effective technique for designing robust adaptive beamforming algorithms based on efficient covariance matrix reconstruction with iterative spatial power spectrum (CMR-ISPS). The proposed CMR-ISPS approach reconstructs the interference-plus-noise covariance (INC) matrix based on a simplified maximum entropy power spectral density function that can be used to shape the directional response of the beamformer. Firstly, we estimate the directions of arrival (DoAs) of the interfering sources with the available snapshots. We then develop an algorithm to reconstruct the INC matrix using a weighted sum of outer products of steering vectors whose coefficients can be estimated in the vicinity of the DoAs of the interferences which lie in a small angular sector. We also devise a cost-effective adaptive algorithm based on conjugate gradient techniques to update the beamforming weights and a method to obtain estimates of the signal of interest (SOI) steering vector from the spatial power spectrum. The proposed CMR-ISPS beamformer can suppress interferers close to the direction of the SOI by producing notches in the directional response of the array with sufficient depths. Simulation results are provided to confirm the validity of the proposed method and make a comparison to existing approaches
martFL: Enabling Utility-Driven Data Marketplace with a Robust and Verifiable Federated Learning Architecture
Authors: Authors: Qi Li, Zhuotao Liu, Qi Li, Ke Xu
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2309.01098
Pdf link: https://arxiv.org/pdf/2309.01098
Abstract The development of machine learning models requires a large amount of training data. Data marketplaces are essential for trading high-quality, private-domain data not publicly available online. However, due to growing data privacy concerns, direct data exchange is inappropriate. Federated Learning (FL) is a distributed machine learning paradigm that exchanges data utilities (in form of local models or gradients) among multiple parties without directly sharing the raw data. However, several challenges exist when applying existing FL architectures to construct a data marketplace: (i) In existing FL architectures, Data Acquirers (DAs) cannot privately evaluate local models from Data Providers (DPs) prior to trading; (ii) Model aggregation protocols in existing FL designs struggle to exclude malicious DPs without "overfitting" to the DA's (possibly biased) root dataset; (iii) Prior FL designs lack a proper billing mechanism to enforce the DA to fairly allocate the reward according to contributions made by different DPs. To address above challenges, we propose martFL, the first federated learning architecture that is specifically designed to enable a secure utility-driven data marketplace. At a high level, martFL is powered by two innovative designs: (i) a quality-aware model aggregation protocol that achieves robust local model aggregation even when the DA's root dataset is biased; (ii) a verifiable data transaction protocol that enables the DA to prove, both succinctly and in zero-knowledge, that it has faithfully aggregates the local models submitted by different DPs according to the committed aggregation weights, based on which the DPs can unambiguously claim the corresponding reward. We implement a prototype of martFL and evaluate it extensively over various tasks. The results show that martFL can improve the model accuracy by up to 25% while saving up to 64% data acquisition cost.
Solving Non-Rectangular Reward-Robust MDPs via Frequency Regularization
Authors: Authors: Uri Gadot, Esther Derman, Navdeep Kumar, Maxence Mohamed Elfatihi, Kfir Levy, Shie Mannor
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.01107
Pdf link: https://arxiv.org/pdf/2309.01107
Abstract In robust Markov decision processes (RMDPs), it is assumed that the reward and the transition dynamics lie in a given uncertainty set. By targeting maximal return under the most adversarial model from that set, RMDPs address performance sensitivity to misspecified environments. Yet, to preserve computational tractability, the uncertainty set is traditionally independently structured for each state. This so-called rectangularity condition is solely motivated by computational concerns. As a result, it lacks a practical incentive and may lead to overly conservative behavior. In this work, we study coupled reward RMDPs where the transition kernel is fixed, but the reward function lies within an $\alpha$-radius from a nominal one. We draw a direct connection between this type of non-rectangular reward-RMDPs and applying policy visitation frequency regularization. We introduce a policy-gradient method, and prove its convergence. Numerical experiments illustrate the learned policy's robustness and its less conservative behavior when compared to rectangular uncertainty.
Modified Step Size for Enhanced Stochastic Gradient Descent: Convergence and Experiments
Authors: Authors: M. Soheil Shamaee, S. Fathi Hafshejani
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2309.01248
Pdf link: https://arxiv.org/pdf/2309.01248
Abstract This paper introduces a novel approach to enhance the performance of the stochastic gradient descent (SGD) algorithm by incorporating a modified decay step size based on $\frac{1}{\sqrt{t}}$. The proposed step size integrates a logarithmic term, leading to the selection of smaller values in the final iterations. Our analysis establishes a convergence rate of $O(\frac{\ln T}{\sqrt{T}})$ for smooth non-convex functions without the Polyak-{\L}ojasiewicz condition. To evaluate the effectiveness of our approach, we conducted numerical experiments on image classification tasks using the FashionMNIST, and CIFAR10 datasets, and the results demonstrate significant improvements in accuracy, with enhancements of $0.5\%$ and $1.4\%$ observed, respectively, compared to the traditional $\frac{1}{\sqrt{t}}$ step size. The source code can be found at \\url{https://github.com/Shamaeem/LNSQRTStepSize}.
EMR-MSF: Self-Supervised Recurrent Monocular Scene Flow Exploiting Ego-Motion Rigidity
Authors: Authors: Zijie Jiang, Masatoshi Okutomi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.01296
Pdf link: https://arxiv.org/pdf/2309.01296
Abstract Self-supervised monocular scene flow estimation, aiming to understand both 3D structures and 3D motions from two temporally consecutive monocular images, has received increasing attention for its simple and economical sensor setup. However, the accuracy of current methods suffers from the bottleneck of less-efficient network architecture and lack of motion rigidity for regularization. In this paper, we propose a superior model named EMR-MSF by borrowing the advantages of network architecture design under the scope of supervised learning. We further impose explicit and robust geometric constraints with an elaborately constructed ego-motion aggregation module where a rigidity soft mask is proposed to filter out dynamic regions for stable ego-motion estimation using static regions. Moreover, we propose a motion consistency loss along with a mask regularization loss to fully exploit static regions. Several efficient training strategies are integrated including a gradient detachment technique and an enhanced view synthesis process for better performance. Our proposed method outperforms the previous self-supervised works by a large margin and catches up to the performance of supervised methods. On the KITTI scene flow benchmark, our approach improves the SF-all metric of the state-of-the-art self-supervised monocular method by 44% and demonstrates superior performance across sub-tasks including depth and visual odometry, amongst other self-supervised single-task or multi-task methods.
Real-time pedestrian recognition on low computational resources
Authors: Authors: Guifan Weng
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.01353
Pdf link: https://arxiv.org/pdf/2309.01353
Abstract Pedestrian recognition has successfully been applied to security, autonomous cars, Aerial photographs. For most applications, pedestrian recognition on small mobile devices is important. However, the limitations of the computing hardware make this a challenging task. In this work, we investigate real-time pedestrian recognition on small physical-size computers with low computational resources for faster speed. This paper presents three methods that work on the small physical size CPUs system. First, we improved the Local Binary Pattern (LBP) features and Adaboost classifier. Second, we optimized the Histogram of Oriented Gradients (HOG) and Support Vector Machine. Third, We implemented fast Convolutional Neural Networks (CNNs). The results demonstrate that the three methods achieved real-time pedestrian recognition at an accuracy of more than 95% and a speed of more than 5 fps on a small physical size computational platform with a 1.8 GHz Intel i5 CPU. Our methods can be easily applied to small mobile devices with high compatibility and generality.
Toward Defensive Letter Design
Authors: Authors: Rentaro Kataoka, Akisato Kimura, Seiichi Uchida
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.01452
Pdf link: https://arxiv.org/pdf/2309.01452
Abstract A major approach for defending against adversarial attacks aims at controlling only image classifiers to be more resilient, and it does not care about visual objects, such as pandas and cars, in images. This means that visual objects themselves cannot take any defensive actions, and they are still vulnerable to adversarial attacks. In contrast, letters are artificial symbols, and we can freely control their appearance unless losing their readability. In other words, we can make the letters more defensive to the attacks. This paper poses three research questions related to the adversarial vulnerability of letter images: (1) How defensive are the letters against adversarial attacks? (2) Can we estimate how defensive a given letter image is before attacks? (3) Can we control the letter images to be more defensive against adversarial attacks? For answering the first and second questions, we measure the defensibility of letters by employing Iterative Fast Gradient Sign Method (I-FGSM) and then build a deep regression model for estimating the defensibility of each letter image. We also propose a two-step method based on a generative adversarial network (GAN) for generating character images with higher defensibility, which solves the third research question.
On the Consistency and Robustness of Saliency Explanations for Time Series Classification
Authors: Authors: Chiara Balestra, Bin Li, Emmanuel Müller
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.01457
Pdf link: https://arxiv.org/pdf/2309.01457
Abstract Interpretable machine learning and explainable artificial intelligence have become essential in many applications. The trade-off between interpretability and model performance is the traitor to developing intrinsic and model-agnostic interpretation methods. Although model explanation approaches have achieved significant success in vision and natural language domains, explaining time series remains challenging. The complex pattern in the feature domain, coupled with the additional temporal dimension, hinders efficient interpretation. Saliency maps have been applied to interpret time series windows as images. However, they are not naturally designed for sequential data, thus suffering various issues. This paper extensively analyzes the consistency and robustness of saliency maps for time series features and temporal attribution. Specifically, we examine saliency explanations from both perturbation-based and gradient-based explanation models in a time series classification task. Our experimental results on five real-world datasets show that they all lack consistent and robust performances to some extent. By drawing attention to the flawed saliency explanation models, we motivate to develop consistent and robust explanations for time series classification.
Leveraging Reward Consistency for Interpretable Feature Discovery in Reinforcement Learning
Authors: Authors: Qisen Yang, Huanqian Wang, Mukun Tong, Wenjie Shi, Gao Huang, Shiji Song
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.01458
Pdf link: https://arxiv.org/pdf/2309.01458
Abstract The black-box nature of deep reinforcement learning (RL) hinders them from real-world applications. Therefore, interpreting and explaining RL agents have been active research topics in recent years. Existing methods for post-hoc explanations usually adopt the action matching principle to enable an easy understanding of vision-based RL agents. In this paper, it is argued that the commonly used action matching principle is more like an explanation of deep neural networks (DNNs) than the interpretation of RL agents. It may lead to irrelevant or misplaced feature attribution when different DNNs' outputs lead to the same rewards or different rewards result from the same outputs. Therefore, we propose to consider rewards, the essential objective of RL agents, as the essential objective of interpreting RL agents as well. To ensure reward consistency during interpretable feature discovery, a novel framework (RL interpreting RL, denoted as RL-in-RL) is proposed to solve the gradient disconnection from actions to rewards. We verify and evaluate our method on the Atari 2600 games as well as Duckietown, a challenging self-driving car simulator environment. The results show that our method manages to keep reward (or return) consistency and achieves high-quality feature attribution. Further, a series of analytical experiments validate our assumption of the action matching principle's limitations.
Homomorphically encrypted gradient descent algorithms for quadratic programming
Authors: Authors: André Bertolace, Konstantinos Gatsis, Kostas Margellos
Subjects: Cryptography and Security (cs.CR); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2309.01559
Pdf link: https://arxiv.org/pdf/2309.01559
Abstract In this paper, we evaluate the different fully homomorphic encryption schemes, propose an implementation, and numerically analyze the applicability of gradient descent algorithms to solve quadratic programming in a homomorphic encryption setup. The limit on the multiplication depth of homomorphic encryption circuits is a major challenge for iterative procedures such as gradient descent algorithms. Our analysis not only quantifies these limitations on prototype examples, thus serving as a benchmark for future investigations, but also highlights additional trade-offs like the ones pertaining the choice of gradient descent or accelerated gradient descent methods, opening the road for the use of homomorphic encryption techniques in iterative procedures widely used in optimization based control. In addition, we argue that, among the available homomorphic encryption schemes, the one adopted in this work, namely CKKS, is the only suitable scheme for implementing gradient descent algorithms. The choice of the appropriate step size is crucial to the convergence of the procedure. The paper shows firsthand the feasibility of homomorphically encrypted gradient descent algorithms.
Representing Edge Flows on Graphs via Sparse Cell Complexes
Authors: Authors: Josef Hoppe, Michael T. Schaub
Subjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2309.01632
Pdf link: https://arxiv.org/pdf/2309.01632
Abstract Obtaining sparse, interpretable representations of observable data is crucial in many machine learning and signal processing tasks. For data representing flows along the edges of a graph, an intuitively interpretable way to obtain such representations is to lift the graph structure to a simplicial complex: The eigenvectors of the associated Hodge-Laplacian, respectively the incidence matrices of the corresponding simplicial complex then induce a Hodge decomposition, which can be used to represent the observed data in terms of gradient, curl, and harmonic flows. In this paper, we generalize this approach to cellular complexes and introduce the cell inference optimization problem, i.e., the problem of augmenting the observed graph by a set of cells, such that the eigenvectors of the associated Hodge Laplacian provide a sparse, interpretable representation of the observed edge flows on the graph. We show that this problem is NP-hard and introduce an efficient approximation algorithm for its solution. Experiments on real-world and synthetic data demonstrate that our algorithm outperforms current state-of-the-art methods while being computationally efficient.
Corgi^2: A Hybrid Offline-Online Approach To Storage-Aware Data Shuffling For SGD
Authors: Authors: Etay Livne, Gal Kaplun, Eran Malach Shai, Shalev-Schwatz
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2309.01640
Pdf link: https://arxiv.org/pdf/2309.01640
Abstract When using Stochastic Gradient Descent (SGD) for training machine learning models, it is often crucial to provide the model with examples sampled at random from the dataset. However, for large datasets stored in the cloud, random access to individual examples is often costly and inefficient. A recent work \cite{corgi}, proposed an online shuffling algorithm called CorgiPile, which greatly improves efficiency of data access, at the cost some performance loss, which is particularly apparent for large datasets stored in homogeneous shards (e.g., video datasets). In this paper, we introduce a novel two-step partial data shuffling strategy for SGD which combines an offline iteration of the CorgiPile method with a subsequent online iteration. Our approach enjoys the best of both worlds: it performs similarly to SGD with random access (even for homogenous data) without compromising the data access efficiency of CorgiPile. We provide a comprehensive theoretical analysis of the convergence properties of our method and demonstrate its practical advantages through experimental results.
Gated recurrent neural networks discover attention
Authors: Authors: Nicolas Zucchet, Seijin Kobayashi, Yassir Akram, Johannes von Oswald, Maxime Larcher, Angelika Steger, João Sacramento
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2309.01775
Pdf link: https://arxiv.org/pdf/2309.01775
Abstract Recent architectural developments have enabled recurrent neural networks (RNNs) to reach and even surpass the performance of Transformers on certain sequence modeling tasks. These modern RNNs feature a prominent design pattern: linear recurrent layers interconnected by feedforward paths with multiplicative gating. Here, we show how RNNs equipped with these two design elements can exactly implement (linear) self-attention, the main building block of Transformers. By reverse-engineering a set of trained RNNs, we find that gradient descent in practice discovers our construction. In particular, we examine RNNs trained to solve simple in-context learning tasks on which Transformers are known to excel and find that gradient descent instills in our RNNs the same attention-based in-context learning algorithm used by Transformers. Our findings highlight the importance of multiplicative interactions in neural networks and suggest that certain RNNs might be unexpectedly implementing attention under the hood.
DRAG: Divergence-based Adaptive Aggregation in Federated learning on Non-IID Data
Authors: Authors: Feng Zhu, Jingjing Zhang, Shengyun Liu, Xin Wang
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2309.01779
Pdf link: https://arxiv.org/pdf/2309.01779
Abstract Local stochastic gradient descent (SGD) is a fundamental approach in achieving communication efficiency in Federated Learning (FL) by allowing individual workers to perform local updates. However, the presence of heterogeneous data distributions across working nodes causes each worker to update its local model towards a local optimum, leading to the phenomenon known as client-drift" and resulting in slowed convergence. To address this issue, previous works have explored methods that either introduce communication overhead or suffer from unsteady performance. In this work, we introduce a novel metric calleddegree of divergence," quantifying the angle between the local gradient and the global reference direction. Leveraging this metric, we propose the divergence-based adaptive aggregation (DRAG) algorithm, which dynamically ``drags" the received local updates toward the reference direction in each round without requiring extra communication overhead. Furthermore, we establish a rigorous convergence analysis for DRAG, proving its ability to achieve a sublinear convergence rate. Compelling experimental results are presented to illustrate DRAG's superior performance compared to state-of-the-art algorithms in effectively managing the client-drift phenomenon. Additionally, DRAG exhibits remarkable resilience against certain Byzantine attacks. By securely sharing a small sample of the client's data with the FL server, DRAG effectively counters these attacks, as demonstrated through comprehensive experiments.
Survival Prediction from Imbalance colorectal cancer dataset using hybrid sampling methods and tree-based classifiers
Authors: Authors: Sadegh Soleimani, Mahsa Bahrami, Mansour Vali
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.01783
Pdf link: https://arxiv.org/pdf/2309.01783
Abstract Background and Objective: Colorectal cancer is a high mortality cancer. Clinical data analysis plays a crucial role in predicting the survival of colorectal cancer patients, enabling clinicians to make informed treatment decisions. However, utilizing clinical data can be challenging, especially when dealing with imbalanced outcomes. This paper focuses on developing algorithms to predict 1-, 3-, and 5-year survival of colorectal cancer patients using clinical datasets, with particular emphasis on the highly imbalanced 1-year survival prediction task. To address this issue, we propose a method that creates a pipeline of some of standard balancing techniques to increase the true positive rate. Evaluation is conducted on a colorectal cancer dataset from the SEER database. Methods: The pre-processing step consists of removing records with missing values and merging categories. The minority class of 1-year and 3-year survival tasks consists of 10% and 20% of the data, respectively. Edited Nearest Neighbor, Repeated edited nearest neighbor (RENN), Synthetic Minority Over-sampling Techniques (SMOTE), and pipelines of SMOTE and RENN approaches were used and compared for balancing the data with tree-based classifiers. Decision Trees, Random Forest, Extra Tree, eXtreme Gradient Boosting, and Light Gradient Boosting (LGBM) are used in this article. Method. Results: The performance evaluation utilizes a 5-fold cross-validation approach. In the case of highly imbalanced datasets (1-year), our proposed method with LGBM outperforms other sampling methods with the sensitivity of 72.30%. For the task of imbalance (3-year survival), the combination of RENN and LGBM achieves a sensitivity of 80.81%, indicating that our proposed method works best for highly imbalanced datasets. Conclusions: Our proposed method significantly improves mortality prediction for the minority class of colorectal cancer patients.
ATMS: Algorithmic Trading-Guided Market Simulation
Authors: Authors: Song Wei, Andrea Coletta, Svitlana Vyetrenko, Tucker Balch
Subjects: Machine Learning (cs.LG); Trading and Market Microstructure (q-fin.TR); Applications (stat.AP)
Arxiv link: https://arxiv.org/abs/2309.01784
Pdf link: https://arxiv.org/pdf/2309.01784
Abstract The effective construction of an Algorithmic Trading (AT) strategy often relies on market simulators, which remains challenging due to existing methods' inability to adapt to the sequential and dynamic nature of trading activities. This work fills this gap by proposing a metric to quantify market discrepancy. This metric measures the difference between a causal effect from underlying market unique characteristics and it is evaluated through the interaction between the AT agent and the market. Most importantly, we introduce Algorithmic Trading-guided Market Simulation (ATMS) by optimizing our proposed metric. Inspired by SeqGAN, ATMS formulates the simulator as a stochastic policy in reinforcement learning (RL) to account for the sequential nature of trading. Moreover, ATMS utilizes the policy gradient update to bypass differentiating the proposed metric, which involves non-differentiable operations such as order deletion from the market. Through extensive experiments on semi-real market data, we demonstrate the effectiveness of our metric and show that ATMS generates market data with improved similarity to reality compared to the state-of-the-art conditional Wasserstein Generative Adversarial Network (cWGAN) approach. Furthermore, ATMS produces market data with more balanced BUY and SELL volumes, mitigating the bias of the cWGAN baseline approach, where a simple strategy can exploit the BUY/SELL imbalance for profit.
Neural-Singular-Hessian: Implicit Neural Representation of Unoriented Point Clouds by Enforcing Singular Hessian
Authors: Authors: Zixiong Wang, Yunxiao Zhang, Rui Xu, Fan Zhang, Pengshuai Wang, Shuangmin Chen, Shiqing Xin, Wenping Wang, Changhe Tu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2309.01793
Pdf link: https://arxiv.org/pdf/2309.01793
Abstract Neural implicit representation is a promising approach for reconstructing surfaces from point clouds. Existing methods combine various regularization terms, such as the Eikonal and Laplacian energy terms, to enforce the learned neural function to possess the properties of a Signed Distance Function (SDF). However, inferring the actual topology and geometry of the underlying surface from poor-quality unoriented point clouds remains challenging. In accordance with Differential Geometry, the Hessian of the SDF is singular for points within the differential thin-shell space surrounding the surface. Our approach enforces the Hessian of the neural implicit function to have a zero determinant for points near the surface. This technique aligns the gradients for a near-surface point and its on-surface projection point, producing a rough but faithful shape within just a few iterations. By annealing the weight of the singular-Hessian term, our approach ultimately produces a high-fidelity reconstruction result. Extensive experimental results demonstrate that our approach effectively suppresses ghost geometry and recovers details from unoriented point clouds with better expressiveness than existing fitting-based methods.
Asymmetric matrix sensing by gradient descent with small random initialization
Authors: Authors: Johan S. Wind
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2309.01796
Pdf link: https://arxiv.org/pdf/2309.01796
Abstract We study matrix sensing, which is the problem of reconstructing a low-rank matrix from a few linear measurements. It can be formulated as an overparameterized regression problem, which can be solved by factorized gradient descent when starting from a small random initialization. Linear neural networks, and in particular matrix sensing by factorized gradient descent, serve as prototypical models of non-convex problems in modern machine learning, where complex phenomena can be disentangled and studied in detail. Much research has been devoted to studying special cases of asymmetric matrix sensing, such as asymmetric matrix factorization and symmetric positive semi-definite matrix sensing. Our key contribution is introducing a continuous differential equation that we call the $\textit{perturbed gradient flow}$. We prove that the perturbed gradient flow converges quickly to the true target matrix whenever the perturbation is sufficiently bounded. The dynamics of gradient descent for matrix sensing can be reduced to this formulation, yielding a novel proof of asymmetric matrix sensing with factorized gradient descent. Compared to directly analyzing the dynamics of gradient descent, the continuous formulation allows bounding key quantities by considering their derivatives, often simplifying the proofs. We believe the general proof technique may prove useful in other settings as well.
Gradient Domain Diffusion Models for Image Synthesis
Authors: Authors: Yuanhao Gong
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Performance (cs.PF); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2309.01875
Pdf link: https://arxiv.org/pdf/2309.01875
Abstract Diffusion models are getting popular in generative image and video synthesis. However, due to the diffusion process, they require a large number of steps to converge. To tackle this issue, in this paper, we propose to perform the diffusion process in the gradient domain, where the convergence becomes faster. There are two reasons. First, thanks to the Poisson equation, the gradient domain is mathematically equivalent to the original image domain. Therefore, each diffusion step in the image domain has a unique corresponding gradient domain representation. Second, the gradient domain is much sparser than the image domain. As a result, gradient domain diffusion models converge faster. Several numerical experiments confirm that the gradient domain diffusion models are more efficient than the original diffusion models. The proposed method can be applied in a wide range of applications such as image processing, computer vision and machine learning tasks.
Towards General and Efficient Online Tuning for Spark
Authors: Authors: Yang Li, Huaijun Jiang, Yu Shen, Yide Fang, Xiaofeng Yang, Danqing Huang, Xinyi Zhang, Wentao Zhang, Ce Zhang, Peng Chen, Bin Cui
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2309.01901
Pdf link: https://arxiv.org/pdf/2309.01901
Abstract The distributed data analytic system -- Spark is a common choice for processing massive volumes of heterogeneous data, while it is challenging to tune its parameters to achieve high performance. Recent studies try to employ auto-tuning techniques to solve this problem but suffer from three issues: limited functionality, high overhead, and inefficient search. In this paper, we present a general and efficient Spark tuning framework that can deal with the three issues simultaneously. First, we introduce a generalized tuning formulation, which can support multiple tuning goals and constraints conveniently, and a Bayesian optimization (BO) based solution to solve this generalized optimization problem. Second, to avoid high overhead from additional offline evaluations in existing methods, we propose to tune parameters along with the actual periodic executions of each job (i.e., online evaluations). To ensure safety during online job executions, we design a safe configuration acquisition method that models the safe region. Finally, three innovative techniques are leveraged to further accelerate the search process: adaptive sub-space generation, approximate gradient descent, and meta-learning method. We have implemented this framework as an independent cloud service, and applied it to the data platform in Tencent. The empirical results on both public benchmarks and large-scale production tasks demonstrate its superiority in terms of practicality, generality, and efficiency. Notably, this service saves an average of 57.00% memory cost and 34.93% CPU cost on 25K in-production tasks within 20 iterations, respectively.
Regret Analysis of Policy Gradient Algorithm for Infinite Horizon Average Reward Markov Decision Processes
Authors: Authors: Qinbo Bai, Washim Uddin Mondal, Vaneet Aggarwal
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2309.01922
Pdf link: https://arxiv.org/pdf/2309.01922
Abstract In this paper, we consider an infinite horizon average reward Markov Decision Process (MDP). Distinguishing itself from existing works within this context, our approach harnesses the power of the general policy gradient-based algorithm, liberating it from the constraints of assuming a linear MDP structure. We propose a policy gradient-based algorithm and show its global convergence property. We then prove that the proposed algorithm has $\tilde{\mathcal{O}}({T}^{3/4})$ regret. Remarkably, this paper marks a pioneering effort by presenting the first exploration into regret-bound computation for the general parameterized policy gradient algorithm in the context of average reward scenarios.
Empowering Low-Light Image Enhancer through Customized Learnable Priors
Authors: Authors: Naishan Zheng, Man Zhou, Yanmeng Dong, Xiangyu Rui, Jie Huang, Chongyi Li, Feng Zhao
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2309.01958
Pdf link: https://arxiv.org/pdf/2309.01958
Abstract Deep neural networks have achieved remarkable progress in enhancing low-light images by improving their brightness and eliminating noise. However, most existing methods construct end-to-end mapping networks heuristically, neglecting the intrinsic prior of image enhancement task and lacking transparency and interpretability. Although some unfolding solutions have been proposed to relieve these issues, they rely on proximal operator networks that deliver ambiguous and implicit priors. In this work, we propose a paradigm for low-light image enhancement that explores the potential of customized learnable priors to improve the transparency of the deep unfolding paradigm. Motivated by the powerful feature representation capability of Masked Autoencoder (MAE), we customize MAE-based illumination and noise priors and redevelop them from two perspectives: 1) \textbf{structure flow}: we train the MAE from a normal-light image to its illumination properties and then embed it into the proximal operator design of the unfolding architecture; and m2) \textbf{optimization flow}: we train MAE from a normal-light image to its gradient representation and then employ it as a regularization term to constrain noise in the model output. These designs improve the interpretability and representation capability of the model.Extensive experiments on multiple low-light image enhancement datasets demonstrate the superiority of our proposed paradigm over state-of-the-art methods. Code is available at https://github.com/zheng980629/CUE.
Linear Regression using Heterogeneous Data Batches
Authors: Authors: Ayush Jain, Rajat Sen, Weihao Kong, Abhimanyu Das, Alon Orlitsky
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2309.01973
Pdf link: https://arxiv.org/pdf/2309.01973
Abstract In many learning applications, data are collected from multiple sources, each providing a \emph{batch} of samples that by itself is insufficient to learn its input-output relationship. A common approach assumes that the sources fall in one of several unknown subgroups, each with an unknown input distribution and input-output relationship. We consider one of this setup's most fundamental and important manifestations where the output is a noisy linear combination of the inputs, and there are $k$ subgroups, each with its own regression vector. Prior work~\cite{kong2020meta} showed that with abundant small-batches, the regression vectors can be learned with only few, $\tilde\Omega( k^{3/2})$, batches of medium-size with $\tilde\Omega(\sqrt k)$ samples each. However, the paper requires that the input distribution for all $k$ subgroups be isotropic Gaussian, and states that removing this assumption is an ``interesting and challenging problem". We propose a novel gradient-based algorithm that improves on the existing results in several ways. It extends the applicability of the algorithm by: (1) allowing the subgroups' underlying input distributions to be different, unknown, and heavy-tailed; (2) recovering all subgroups followed by a significant proportion of batches even for infinite $k$; (3) removing the separation requirement between the regression vectors; (4) reducing the number of batches and allowing smaller batch sizes.
Representation Learning Dynamics of Self-Supervised Models
Authors: Authors: Pascal Esser, Satyaki Mukherjee, Debarghya Ghoshdastidar
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.02011
Pdf link: https://arxiv.org/pdf/2309.02011
Abstract Self-Supervised Learning (SSL) is an important paradigm for learning representations from unlabelled data, and SSL with neural networks has been highly successful in practice. However current theoretical analysis of SSL is mostly restricted to generalisation error bounds. In contrast, learning dynamics often provide a precise characterisation of the behaviour of neural networks based models but, so far, are mainly known in supervised settings. In this paper, we study the learning dynamics of SSL models, specifically representations obtained by minimising contrastive and non-contrastive losses. We show that a naive extension of the dymanics of multivariate regression to SSL leads to learning trivial scalar representations that demonstrates dimension collapse in SSL. Consequently, we formulate SSL objectives with orthogonality constraints on the weights, and derive the exact (network width independent) learning dynamics of the SSL models trained using gradient descent on the Grassmannian manifold. We also argue that the infinite width approximation of SSL models significantly deviate from the neural tangent kernel approximations of supervised models. We numerically illustrate the validity of our theoretical findings, and discuss how the presented results provide a framework for further theoretical analysis of contrastive and non-contrastive SSL.
Diffusion Generative Inverse Design
Authors: Authors: Marin Vlastelica, Tatiana López-Guevara, Kelsey Allen, Peter Battaglia, Arnaud Doucet, Kimberley Stachenfeld
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2309.02040
Pdf link: https://arxiv.org/pdf/2309.02040
Abstract Inverse design refers to the problem of optimizing the input of an objective function in order to enact a target outcome. For many real-world engineering problems, the objective function takes the form of a simulator that predicts how the system state will evolve over time, and the design challenge is to optimize the initial conditions that lead to a target outcome. Recent developments in learned simulation have shown that graph neural networks (GNNs) can be used for accurate, efficient, differentiable estimation of simulator dynamics, and support high-quality design optimization with gradient- or sampling-based optimization procedures. However, optimizing designs from scratch requires many expensive model queries, and these procedures exhibit basic failures on either non-convex or high-dimensional problems.In this work, we show how denoising diffusion models (DDMs) can be used to solve inverse design problems efficiently and propose a particle sampling algorithm for further improving their efficiency. We perform experiments on a number of fluid dynamics design challenges, and find that our approach substantially reduces the number of calls to the simulator compared to standard techniques.
Bayesian experimental design for linear elasticity
Authors: Authors: Sarah Eberle-Blick, Nuutti Hyvönen
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2309.02042
Pdf link: https://arxiv.org/pdf/2309.02042
Abstract This work considers Bayesian experimental design for the inverse boundary value problem of linear elasticity in a two-dimensional setting. The aim is to optimize the positions of compactly supported pressure activations on the boundary of the examined body in order to maximize the value of the resulting boundary deformations as data for the inverse problem of reconstructing the Lam\'e parameters inside the object. We resort to a linearized measurement model and adopt the framework of Bayesian experimental design, under the assumption that the prior and measurement noise distributions are mutually independent Gaussians. This enables the use of the standard Bayesian A-optimality criterion for deducing optimal positions for the pressure activations. The (second) derivatives of the boundary measurements with respect to the Lam\'e parameters and the positions of the boundary pressure activations are deduced to allow minimizing the corresponding objective function, i.e., the trace of the covariance matrix of the posterior distribution, by a gradient-based optimization algorithm. Two-dimensional numerical experiments are performed to demonstrate the functionality of our approach.
Histograms of Points, Orientations, and Dynamics of Orientations Features for Hindi Online Handwritten Character Recognition
Authors: Authors: Anand Sharma (MIET, Meerut), A. G. Ramakrishnan (IISc, Bengaluru)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2309.02067
Pdf link: https://arxiv.org/pdf/2309.02067
Abstract A set of features independent of character stroke direction and order variations is proposed for online handwritten character recognition. A method is developed that maps features like co-ordinates of points, orientations of strokes at points, and dynamics of orientations of strokes at points spatially as a function of co-ordinate values of the points and computes histograms of these features from different regions in the spatial map. Different features like spatio-temporal, discrete Fourier transform, discrete cosine transform, discrete wavelet transform, spatial, and histograms of oriented gradients used in other studies for training classifiers for character recognition are considered. The classifier chosen for classification performance comparison, when trained with different features, is support vector machines (SVM). The character datasets used for training and testing the classifiers consist of online handwritten samples of 96 different Hindi characters. There are 12832 and 2821 samples in training and testing datasets, respectively. SVM classifiers trained with the proposed features has the highest classification accuracy of 92.9\% when compared to the performances of SVM classifiers trained with the other features and tested on the same testing dataset. Therefore, the proposed features have better character discriminative capability than the other features considered for comparison.
S3C: Semi-Supervised VQA Natural Language Explanation via Self-Critical Learning
Authors: Authors: Wei Suo, Mengyang Sun, Weisong Liu, Yiqi Gao, Peng Wang, Yanning Zhang, Qi Wu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.02155
Pdf link: https://arxiv.org/pdf/2309.02155
Abstract VQA Natural Language Explanation (VQA-NLE) task aims to explain the decision-making process of VQA models in natural language. Unlike traditional attention or gradient analysis, free-text rationales can be easier to understand and gain users' trust. Existing methods mostly use post-hoc or self-rationalization models to obtain a plausible explanation. However, these frameworks are bottlenecked by the following challenges: 1) the reasoning process cannot be faithfully responded to and suffer from the problem of logical inconsistency. 2) Human-annotated explanations are expensive and time-consuming to collect. In this paper, we propose a new Semi-Supervised VQA-NLE via Self-Critical Learning (S3C), which evaluates the candidate explanations by answering rewards to improve the logical consistency between answers and rationales. With a semi-supervised learning framework, the S3C can benefit from a tremendous amount of samples without human-annotated explanations. A large number of automatic measures and human evaluations all show the effectiveness of our method. Meanwhile, the framework achieves a new state-of-the-art performance on the two VQA-NLE datasets.
Improving equilibrium propagation without weight symmetry through Jacobian homeostasis
Authors: Authors: Axel Laborieux, Friedemann Zenke
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2309.02214
Pdf link: https://arxiv.org/pdf/2309.02214
Abstract Equilibrium propagation (EP) is a compelling alternative to the backpropagation of error algorithm (BP) for computing gradients of neural networks on biological or analog neuromorphic substrates. Still, the algorithm requires weight symmetry and infinitesimal equilibrium perturbations, i.e., nudges, to estimate unbiased gradients efficiently. Both requirements are challenging to implement in physical systems. Yet, whether and how weight asymmetry affects its applicability is unknown because, in practice, it may be masked by biases introduced through the finite nudge. To address this question, we study generalized EP, which can be formulated without weight symmetry, and analytically isolate the two sources of bias. For complex-differentiable non-symmetric networks, we show that the finite nudge does not pose a problem, as exact derivatives can still be estimated via a Cauchy integral. In contrast, weight asymmetry introduces bias resulting in low task performance due to poor alignment of EP's neuronal error vectors compared to BP. To mitigate this issue, we present a new homeostatic objective that directly penalizes functional asymmetries of the Jacobian at the network's fixed point. This homeostatic objective dramatically improves the network's ability to solve complex tasks such as ImageNet 32x32. Our results lay the theoretical groundwork for studying and mitigating the adverse effects of imperfections of physical networks on learning algorithms that rely on the substrate's relaxation dynamics.
Magnetic Navigation using Attitude-Invariant Magnetic Field Information for Loop Closure Detection
Authors: Authors: Natalia Pavlasek, Charles Champagne Cossette, David Roy-Guay, James Richard Forbes
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.02394
Pdf link: https://arxiv.org/pdf/2309.02394
Abstract Indoor magnetic fields are a combination of Earth's magnetic field and disruptions induced by ferromagnetic objects, such as steel structural components in buildings. As a result of these disruptions, pervasive in indoor spaces, magnetic field data is often omitted from navigation algorithms in indoor environments. This paper leverages the spatially-varying disruptions to Earth's magnetic field to extract positional information for use in indoor navigation algorithms. The algorithm uses a rate gyro and an array of four magnetometers to estimate the robot's pose. Additionally, the magnetometer array is used to compute attitude-invariant measurements associated with the magnetic field and its gradient. These measurements are used to detect loop closure points. Experimental results indicate that the proposed approach can estimate the pose of a ground robot in an indoor environment within meter accuracy.
Delta-LoRA: Fine-Tuning High-Rank Parameters with the Delta of Low-Rank Matrices
Authors: Authors: Bojia Zi, Xianbiao Qi, Lingzhi Wang, Jianan Wang, Kam-Fai Wong, Lei Zhang
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.02411
Pdf link: https://arxiv.org/pdf/2309.02411
Abstract In this paper, we present Delta-LoRA, which is a novel parameter-efficient approach to fine-tune large language models (LLMs). In contrast to LoRA and other low-rank adaptation methods such as AdaLoRA, Delta-LoRA not only updates the low-rank matrices $\bA$ and $\bB$, but also propagate the learning to the pre-trained weights $\bW$ via updates utilizing the delta of the product of two low-rank matrices ($\bA^{(t+1)}\bB^{(t+1)} - \bA^{(t)}\bB^{(t)}$). Such a strategy effectively addresses the limitation that the incremental update of low-rank matrices is inadequate for learning representations capable for downstream tasks. Moreover, as the update of $\bW$ does not need to compute the gradients of $\bW$ and store their momentums, Delta-LoRA shares comparable memory requirements and computational costs with LoRA. Extensive experiments show that Delta-LoRA significantly outperforms existing low-rank adaptation methods. We further support these results with comprehensive analyses that underscore the effectiveness of Delta-LoRA.
Keyword: super-resolution

Implicit Neural Image Stitching With Enhanced and Blended Feature Reconstruction
Authors: Authors: Minsu Kim, Jaewon Lee, Byeonghun Lee, Sunghoon Im, Kyong Hwan Jin
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.01409
Pdf link: https://arxiv.org/pdf/2309.01409
Abstract Existing frameworks for image stitching often provide visually reasonable stitchings. However, they suffer from blurry artifacts and disparities in illumination, depth level, etc. Although the recent learning-based stitchings relax such disparities, the required methods impose sacrifice of image qualities failing to capture high-frequency details for stitched images. To address the problem, we propose a novel approach, implicit Neural Image Stitching (NIS) that extends arbitrary-scale super-resolution. Our method estimates Fourier coefficients of images for quality-enhancing warps. Then, the suggested model blends color mismatches and misalignment in the latent space and decodes the features into RGB values of stitched images. Our experiments show that our approach achieves improvement in resolving the low-definition imaging of the previous deep image stitching with favorable accelerated image-enhancing methods. Our source code is available at https://github.com/minshu-kim/NIS.

zoq / arxiv-updates

New submissions for Wed, 6 Sep 23 #594

Keyword: sgd

Modified Step Size for Enhanced Stochastic Gradient Descent: Convergence and Experiments

Corgi^2: A Hybrid Offline-Online Approach To Storage-Aware Data Shuffling For SGD

DRAG: Divergence-based Adaptive Aggregation in Federated learning on Non-IID Data

AdaPlus: Integrating Nesterov Momentum and Precise Stepsize Adjustment on AdamW Basis

A Simple Asymmetric Momentum Make SGD Greatest Again

Fairness Optimization of RSMA for Uplink Communication based on Intelligent Reflecting Surface

Keyword: optimization

Identifying Secure Operating Ranges for DER Control using Bilevel Optimization

A Simulation--Based Optimization approach for analyzing the ambulance diversion phenomenon in an Emergency-Department network

Polynomial-Model-Based Optimization for Blackbox Objectives

Randomized Polar Codes for Anytime Distributed Machine Learning

Learning Robust Model Predictive Control for Voltage Control of Islanded Microgrid

Efficient RLHF: Reducing the Memory Usage of PPO

Towards High-Frequency Tracking and Fast Edge-Aware Optimization

Diffusion Modeling with Domain-conditioned Prior Guidance for Accelerated MRI and qMRI Reconstruction

Online Targetless Radar-Camera Extrinsic Calibration Based on the Common Features of Radar and Camera

Deep Learning and Inverse Problems

Domain Generalization via Balancing Training Difficulty and Model Capability

Drift Analysis with Fitness Levels for Elitist Evolutionary Algorithms

Network Topology Inference with Sparsity and Laplacian Constraints

On the training and generalization of deep operator networks

Optimizing Mobile-Edge AI-Generated Everything (AIGX) Services by Prompt Engineering: Fundamental, Framework, and Case Study

Enhancing Infrared Small Target Detection Robustness with Bi-Level Adversarial Framework

Hybrid-Supervised Dual-Search: Leveraging Automatic Learning for Loss-free Multi-Exposure Image Fusion

AutoML-GPT: Large Language Model for AutoML

Holistic Dynamic Frequency Transformer for Image Fusion and Exposure Correction

Performance-Sensitive Potential Functions for Efficient Flow of Connected and Automated Vehicles

An FPGA smart camera implementation of segmentation models for drone wildfire imagery

Joint Oscillation Damping and Inertia Provision Service for Converter-Interfaced Generation

Can I Trust Your Answer? Visually Grounded Video Question Answering

Memory augment is All You Need for image restoration

Federated cINN Clustering for Accurate Clustered Federated Learning

Is Your Learned Query Optimizer Behaving As You Expect? A Machine Learning Perspective

Homomorphically encrypted gradient descent algorithms for quadratic programming

Cross-Consistent Deep Unfolding Network for Adaptive All-In-One Video Restoration

Representing Edge Flows on Graphs via Sparse Cell Complexes

ReLoc-PDR: Visual Relocalization Enhanced Pedestrian Dead Reckoning via Graph Optimization

Stochastic Model Predictive Control for Networked Systems with Random Delays and Packet Losses in All Channels

ControlMat: A Controlled Generative Approach to Material Capture

From Kubernetes to Knactor: A State-Centric Rethink of Service Integration

Instant Continual Learning of Neural Radiance Fields

Inverse Dynamics Trajectory Optimization for Contact-Implicit Model Predictive Control

Computation and Communication Efficient Federated Learning over Wireless Networks

LoopTune: Optimizing Tensor Computations with Reinforcement Learning

Towards General and Efficient Online Tuning for Spark

Efficient Bayesian Computational Imaging with a Surrogate Score-Based Prior

Automatic Data Transformation Using Large Language Model An Experimental Study on Building Energy Data

Empowering Low-Light Image Enhancer through Customized Learnable Priors

Granger Causal Inference in Multivariate Hawkes Processes by Minimum Message Length

Data-Juicer: A One-Stop Data Processing System for Large Language Models

Diffusion Generative Inverse Design

Bayesian experimental design for linear elasticity

Probabilistic Self-supervised Learning via Scoring Rules Minimization

Optimization tools for Twin-in-the-Loop vehicle control design: analysis and yaw-rate tracking case study

Model-based Offline Policy Optimization with Adversarial Network

Personalized Federated Deep Reinforcement Learning-based Trajectory Optimization for Multi-UAV Assisted Edge Computing

Algebraic Temporal Blocking for Sparse Iterative Solvers on Multi-Core CPUs

Fairness Optimization of RSMA for Uplink Communication based on Intelligent Reflecting Surface

Black-Box Attacks against Signed Graph Analysis via Balance Poisoning

Generating Realistic Images from In-the-wild Sounds

Investment-based optimisation of energy storage design parameters in a grid-connected hybrid renewable energy system

Keyword: adam

ADC/DAC-Free Analog Acceleration of Deep Neural Networks with Frequency Transformation

TODM: Train Once Deploy Many Efficient Supernet-Based RNN-T Compression For On-device ASR Models

AdaPlus: Integrating Nesterov Momentum and Precise Stepsize Adjustment on AdamW Basis

Keyword: gradient

Structured Radial Basis Function Network: Modelling Diversity for Multiple Hypotheses Prediction

Diffusion Modeling with Domain-conditioned Prior Guidance for Accelerated MRI and qMRI Reconstruction

Network Topology Inference with Sparsity and Laplacian Constraints

Switch and Conquer: Efficient Algorithms By Switching Stochastic Gradient Oracles For Decentralized Saddle Point Problems

Hessian-aware Quantized Node Embeddings for Recommendation

Efficient Covariance Matrix Reconstruction with Iterative Spatial Spectrum Sampling

martFL: Enabling Utility-Driven Data Marketplace with a Robust and Verifiable Federated Learning Architecture

Solving Non-Rectangular Reward-Robust MDPs via Frequency Regularization

Modified Step Size for Enhanced Stochastic Gradient Descent: Convergence and Experiments

EMR-MSF: Self-Supervised Recurrent Monocular Scene Flow Exploiting Ego-Motion Rigidity

Real-time pedestrian recognition on low computational resources