Abstract
Deep learning succeeds by doing hierarchical feature learning, yet tuning hyper-parameters (HPs) such as initialization scales and learning rates gives only indirect control over this behavior. In this paper, we propose the alignment between the feature updates and the backward pass as a key notion to predict, measure and control feature learning. On the one hand, we show that when alignment holds, the magnitude of feature updates after one SGD step is related to the magnitude of the forward and backward passes by a simple and general formula. This leads to techniques to automatically adjust HPs (initialization scales and learning rates) at initialization and throughout training to attain a desired feature learning behavior. On the other hand, we show that, at random initialization, this alignment is determined by the spectrum of a certain kernel, and that well-conditioned layer-to-layer Jacobians (aka dynamical isometry) imply alignment. Finally, we investigate ReLU MLPs and ResNets in the large width-then-depth limit. Combining hints from random matrix theory and numerical experiments, we show that (i) in MLPs with iid initializations, alignment degenerates with depth, making it impossible to start training, and that (ii) in ResNets, the branch scale $1/\sqrt{\text{depth}}$ is the only one maintaining non-trivial alignment at infinite depth.
Keyword: optimization
LayerCollapse: Adaptive compression of neural networks
Authors: Soheil Zibakhsh Shabgahi, Mohammad Soheil Shariff, Farinaz Koushanfar
Abstract
Handling the ever-increasing scale of contemporary deep learning and transformer-based models poses a significant challenge. Although great strides have been made in optimizing model compression techniques such as model architecture search and knowledge distillation, the availability of data and computational resources remains a considerable hurdle for these optimizations. This paper introduces LayerCollapse, a novel alternative adaptive model compression methodology. LayerCollapse works by eliminating non-linearities within the network and collapsing two consecutive fully connected layers into a single linear transformation. This approach simultaneously reduces both the number of layers and the parameter count, thereby enhancing model efficiency. We also introduce a compression-aware regularizer, which compresses the model in alignment with the dataset quality and model expressiveness, consequently reducing overfitting across tasks. Our results demonstrate LayerCollapse's effective compression and regularization capabilities in multiple fine-grained classification benchmarks, achieving up to 74% post-training compression with minimal accuracy loss. We compare this method with knowledge distillation on the same target network, showcasing a five-fold increase in computational efficiency and 8% improvement in overall accuracy on the ImageNet dataset.
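The collapsing step described in this abstract can be illustrated with a minimal sketch. It assumes the non-linearity between two fully connected layers has already been removed, so that y = W2 (W1 x + b1) + b2 reduces to a single layer with W = W2 W1 and b = W2 b1 + b2. The helper names (`matvec`, `matmul`, `collapse_layers`) and the toy weights are hypothetical and chosen for illustration only; this is not the authors' implementation, and it does not cover the regularizer.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix w (rows x cols) by vector x (length cols)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply matrix a (m x k) by matrix b (k x n)."""
    cols = list(zip(*b))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in a]


def collapse_layers(
    w1: Matrix, b1: Vector, w2: Matrix, b2: Vector
) -> Tuple[Matrix, Vector]:
    """Fuse two linear layers with no activation in between.

    Returns (W, b) such that W x + b == W2 (W1 x + b1) + b2.
    """
    w = matmul(w2, w1)
    b = [u + v for u, v in zip(matvec(w2, b1), b2)]
    return w, b


# Toy example (made-up weights): 2 inputs -> 3 hidden units -> 1 output.
w1 = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]
b1 = [0.5, -0.5, 1.0]
w2 = [[2.0, 1.0, -1.0]]
b2 = [0.25]

w, b = collapse_layers(w1, b1, w2, b2)
x = [1.0, 2.0]

# Output of the original two-layer stack and of the collapsed layer.
two_layer = [u + v for u, v in zip(matvec(w2, [h + c for h, c in zip(matvec(w1, x), b1)]), b2)]
one_layer = [u + v for u, v in zip(matvec(w, x), b)]
print(two_layer, one_layer)
```

In this toy case the two-layer stack holds 13 parameters (6 + 3 + 3 + 1) and the collapsed layer holds 3, which shows why the fusion reduces both depth and parameter count when the hidden width exceeds the input and output widths.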
Abstract
Recent progress in computer vision-oriented neural network designs is mostly driven by capturing high-order neural interactions among inputs and features. A variety of approaches have emerged to accomplish this, such as Transformers and their variants. However, these interactions generate a large amount of intermediate state and/or strong data dependency, leading to considerable memory consumption and computing cost, and therefore compromising the overall runtime performance. To address this challenge, we rethink the high-order interactive neural network design with a quadratic computing approach. Specifically, we propose QuadraNet -- a comprehensive model design methodology from neuron reconstruction to structural block and eventually to the overall neural network implementation. Leveraging quadratic neurons' intrinsic high-order advantages and dedicated computation optimization schemes, QuadraNet could effectively achieve optimal cognition and computation performance. Incorporating state-of-the-art hardware-aware neural architecture search and system integration techniques, QuadraNet could also be well generalized in different hardware constraint settings and deployment scenarios. The experiment shows that QuadraNet achieves up to 1.5$\times$ throughput, 30% less memory footprint, and similar cognition performance, compared with the state-of-the-art high-order approaches.
GaussianShader: 3D Gaussian Splatting with Shading Functions for Reflective Surfaces
Abstract
The advent of neural 3D Gaussians has recently brought about a revolution in the field of neural rendering, facilitating the generation of high-quality renderings at real-time speeds. However, the explicit and discrete representation encounters challenges when applied to scenes featuring reflective surfaces. In this paper, we present GaussianShader, a novel method that applies a simplified shading function on 3D Gaussians to enhance the neural rendering in scenes with reflective surfaces while preserving the training and rendering efficiency. The main challenge in applying the shading function lies in the accurate normal estimation on discrete 3D Gaussians. Specifically, we propose a novel normal estimation framework based on the shortest axis directions of 3D Gaussians, with a carefully designed loss that enforces consistency between the normals and the geometries of Gaussian spheres. Experiments show that GaussianShader strikes a commendable balance between efficiency and visual quality. Our method surpasses Gaussian Splatting in PSNR on specular object datasets, exhibiting an improvement of 1.57dB. When compared to prior works handling reflective surfaces, such as Ref-NeRF, our optimization time is significantly accelerated (23h vs. 0.58h). Please see our project website for more results.
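The idea of taking a normal from the shortest axis of a 3D Gaussian can be sketched as follows. The sketch assumes each Gaussian is described by a 3x3 rotation matrix whose columns are its principal axes and by three per-axis scales; the function name `shortest_axis_normal`, the `to_camera` argument, and the rule of flipping the axis to face the camera are illustrative assumptions, not details taken from the abstract. The paper's full normal estimation framework and its consistency loss are not reproduced here.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def shortest_axis_normal(rotation: Matrix, scales: Vector, to_camera: Vector) -> Vector:
    """Return the principal axis with the smallest scale as a normal estimate.

    rotation:  3x3 matrix; column k is the direction of principal axis k.
    scales:    extent of the Gaussian along each principal axis.
    to_camera: vector pointing from the Gaussian towards the camera
               (assumed convention, used only to resolve the sign).
    """
    # Index of the flattest direction of the ellipsoid.
    k = min(range(3), key=lambda i: scales[i])
    axis = [rotation[r][k] for r in range(3)]
    # An axis has no preferred sign, so flip it to face the viewer.
    if sum(a * v for a, v in zip(axis, to_camera)) < 0.0:
        axis = [-a for a in axis]
    return axis


# Toy example (made-up values): a disc-like Gaussian that is thin along z.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(shortest_axis_normal(identity, [0.5, 0.2, 0.01], [0.0, 0.0, -1.0]))
```

The intuition behind the design is that a Gaussian fitted to a surface tends to flatten against it, so its thinnest direction is a natural candidate for the surface normal.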
4D-fy: Text-to-4D Generation Using Hybrid Score Distillation Sampling
Authors: Sherwin Bahmani, Ivan Skorokhodov, Victor Rong, Gordon Wetzstein, Leonidas Guibas, Peter Wonka, Sergey Tulyakov, Jeong Joon Park, Andrea Tagliasacchi, David B. Lindell
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Recent breakthroughs in text-to-4D generation rely on pre-trained text-to-image and text-to-video models to generate dynamic 3D scenes. However, current text-to-4D methods face a three-way tradeoff between the quality of scene appearance, 3D structure, and motion. For example, text-to-image models and their 3D-aware variants are trained on internet-scale image datasets and can be used to produce scenes with realistic appearance and 3D structure -- but no motion. Text-to-video models are trained on relatively smaller video datasets and can produce scenes with motion, but poorer appearance and 3D structure. While these models have complementary strengths, they also have opposing weaknesses, making it difficult to combine them in a way that alleviates this three-way tradeoff. Here, we introduce hybrid score distillation sampling, an alternating optimization procedure that blends supervision signals from multiple pre-trained diffusion models and incorporates benefits of each for high-fidelity text-to-4D generation. Using hybrid SDS, we demonstrate synthesis of 4D scenes with compelling appearance, 3D structure, and motion.
Online Regulation of Dynamical Systems to Solutions of Constrained Optimization Problems
Abstract
This paper considers the problem of regulating a dynamical system to equilibria that are defined as solutions of an input- and state-constrained optimization problem. To solve this regulation task, we design a state feedback controller based on a continuous approximation of the projected gradient flow. We first show that the equilibria of the interconnection between the plant and the proposed controller correspond to critical points of the constrained optimization problem. We then derive sufficient conditions to ensure that, for the closed-loop system, isolated locally optimal solutions of the optimization problem are locally exponentially stable and show that input constraints are satisfied at all times by identifying an appropriate forward-invariant set.
TransOpt: Transformer-based Representation Learning for Optimization Problem Classification
Authors: Gjorgjina Cenikj, Gašper Petelin, Tome Eftimov
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Abstract
We propose a representation of optimization problem instances using a transformer-based neural network architecture trained for the task of problem classification of the 24 problem classes from the Black-box Optimization Benchmarking (BBOB) benchmark. We show that transformer-based methods can be trained to recognize problem classes with accuracies in the range of 70%-80% for different problem dimensions, suggesting the possible application of transformer architectures in acquiring representations for black-box optimization problems.
A Data-Driven, Non-Linear, Parameterized Reduced Order Model of Metal 3D Printing
Authors: Aaron L. Brown, Eric B. Chin, Youngsoo Choi, Saad A. Khairallah, Joseph T. McKeown
Abstract
Directed energy deposition (DED) is a promising metal additive manufacturing technology capable of 3D printing metal parts with complex geometries at lower cost compared to traditional manufacturing. The technology is most effective when process parameters like laser scan speed and power are optimized for a particular geometry and alloy. To accelerate optimization, we apply a data-driven, parameterized, non-linear reduced-order model (ROM) called Gaussian Process Latent Space Dynamics Identification (GPLaSDI) to physics-based DED simulation data. With an appropriate choice of hyperparameters, GPLaSDI is an effective ROM for this application, with a worst-case error of about 8% and a speed-up of about 1,000,000x with respect to the corresponding physics-based data.
TransNAS-TSAD: Harnessing Transformers for Multi-Objective Neural Architecture Search in Time Series Anomaly Detection
Authors: Ijaz Ul Haq, Byung Suk Lee
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Abstract
The surge in real-time data collection across various industries has underscored the need for advanced anomaly detection in both univariate and multivariate time series data. Traditional methods, while comprehensive, often struggle to capture the complex interdependencies in such data. This paper introduces TransNAS-TSAD, a novel framework that synergizes transformer architecture with neural architecture search (NAS), enhanced through NSGA-II algorithm optimization. This innovative approach effectively tackles the complexities of both univariate and multivariate time series, balancing computational efficiency with detection accuracy. Our evaluation reveals that TransNAS-TSAD surpasses conventional anomaly detection models, demonstrating marked improvements in diverse data scenarios. We also propose the Efficiency-Accuracy-Complexity Score (EACS) as a new metric for assessing model performance, emphasizing the crucial balance between accuracy and computational resources. TransNAS-TSAD sets a new benchmark in time series anomaly detection, offering a versatile, efficient solution for complex real-world applications. This research paves the way for future developments in the field, highlighting its potential in a wide range of industry applications.
Self-Supervised Learning for Large-Scale Preventive Security Constrained DC Optimal Power Flow
Authors: Seonho Park, Pascal Van Hentenryck
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Abstract
Security-Constrained Optimal Power Flow (SCOPF) plays a crucial role in power grid stability but becomes increasingly complex as systems grow. This paper introduces PDL-SCOPF, a self-supervised end-to-end primal-dual learning framework for producing near-optimal solutions to large-scale SCOPF problems in milliseconds. Indeed, PDL-SCOPF remedies the limitations of supervised counterparts that rely on training instances with their optimal solutions, which becomes impractical for large-scale SCOPF problems. PDL-SCOPF mimics an Augmented Lagrangian Method (ALM) for training primal and dual networks that learn the primal solutions and the Lagrangian multipliers, respectively, to the unconstrained optimizations. In addition, PDL-SCOPF incorporates a repair layer to ensure the feasibility of the power balance in the nominal case, and a binary search layer to compute, using the Automatic Primary Response (APR), the generator dispatches in the contingencies. The resulting differentiable program can then be trained end-to-end using the objective function of the SCOPF and the power balance constraints of the contingencies. Experimental results demonstrate that the PDL-SCOPF delivers accurate feasible solutions with minimal optimality gaps. The framework underlying PDL-SCOPF aims at bridging the gap between traditional optimization methods and machine learning, highlighting the potential of self-supervised end-to-end primal-dual learning for large-scale optimization tasks.
The Forecastability of Underlying Building Electricity Demand from Time Series Data
Authors: Mohamad Khalil, A. Stephen McGough, Hussain Kazmi, Sara Walker
Abstract
Forecasting building energy consumption has become a promising solution in Building Energy Management Systems for energy saving and optimization. Furthermore, it can play an important role in the efficient management of the operation of a smart grid. Different data-driven approaches to forecast the future energy demand of buildings at different scales, and over various time horizons, can be found in the scientific literature, including extensive Machine Learning and Deep Learning approaches. However, the identification of the most accurate forecaster model which can be utilized to predict the energy demand of such a building is still challenging. In this paper, the design and implementation of a data-driven approach to predict how forecastable the future energy demand of a building is, without first utilizing a data-driven forecasting model, is presented. The investigation utilizes a historical electricity consumption time series data set with a half-hour interval that has been collected from a group of residential buildings located in the City of London, United Kingdom.
Data-Driven Kalman Filter using Maximum Likelihood Optimization
Authors: Peihu Duan, Tao Liu, Yu Xing, Karl Henrik Johansson
Abstract
This paper investigates the state estimation problem for unknown linear systems with process and measurement noise. A novel data-driven Kalman filter (DDKF) that combines model identification with state estimation is developed using pre-collected input-output data and uncertain initial state information of the unknown system. Specifically, the state estimation problem is first formulated as a non-convex maximum likelihood (ML) optimization problem. Then, to reduce the computational complexity, the optimization problem is broken down into a series of sub-problems in a recursive manner. Based on the optimal solutions to the sub-problems, a closed-form DDKF is designed for the unknown system, which can estimate the state of a physically meaningful state-space realization, rather than a state determined only up to an unknown similarity transformation. The performance gap between the DDKF and the traditional Kalman filter with accurate system matrices is quantified through a sample complexity bound. In particular, when the number of the pre-collected trajectories tends to infinity, this gap converges to zero. Moreover, the DDKF is used to facilitate data-driven control design. A data-driven linear quadratic Gaussian controller is defined and its closed-loop performance is characterized. Finally, the effectiveness of the theoretical results is illustrated by numerical simulations.
Reconstructing the shape and material parameters of dissipative obstacles using an impedance model
Abstract
In inverse scattering problems, a model that allows for the simultaneous recovery of both the domain shape and an impedance boundary condition covers a wide range of problems with impenetrable domains, including recovering the shape of sound-hard and sound-soft obstacles and obstacles with thin coatings. This work develops an optimization framework for recovering the shape and material parameters of a penetrable, dissipative obstacle in the multifrequency setting, using a constrained class of curvature-dependent impedance function models proposed by Antoine, Barucq, and Vernhet. We find that this constrained model improves the robustness of the recovery problem, compared to more general models, and provides meaningfully better obstacle recovery than simpler models. We explore the effectiveness of the model for varying levels of dissipation, for noise-corrupted data, and for limited aperture data in the numerical examples.
Back to 3D: Few-Shot 3D Keypoint Detection with Back-Projected 2D Features
Authors: Thomas Wimmer, Peter Wonka, Maks Ovsjanikov
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Abstract
With the immense growth of dataset sizes and computing resources in recent years, so-called foundation models have become popular in NLP and vision tasks. In this work, we propose to explore foundation models for the task of keypoint detection on 3D shapes. A unique characteristic of keypoint detection is that it requires semantic and geometric awareness while demanding high localization accuracy. To address this problem, we propose, first, to back-project features from large pre-trained 2D vision models onto 3D shapes and employ them for this task. We show that we obtain robust 3D features that contain rich semantic information and analyze multiple candidate features stemming from different 2D foundation models. Second, we employ a keypoint candidate optimization module which aims to match the average observed distribution of keypoints on the shape and is guided by the back-projected features. The resulting approach achieves a new state of the art for few-shot keypoint detection on the KeyPointNet dataset, almost doubling the performance of the previous best methods.
Composition of Nondeterministic and Stochastic Services for LTLf Task Specifications
Authors: Giuseppe De Giacomo, Marco Favorito, Luciana Silo
Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI)
Abstract
In this paper, we study the composition of services so as to obtain runs satisfying a task specification in Linear Temporal Logic on finite traces (LTLf). We study the problem in the case where services are nondeterministic and the LTLf specification can be exactly met, and in the case where services are stochastic, where we are interested in maximizing the probability of satisfaction of the LTLf specification and, simultaneously, minimizing the utilization cost of the services. To do so, we combine techniques from LTLf synthesis, service composition à la Roman Model, reactive synthesis, and bi-objective lexicographic optimization on MDPs. This framework has several interesting applications, including Smart Manufacturing and Digital Twins.
Abstract
With the increasing demands from passengers for data-intensive services, millimeter-wave (mmWave) communication is considered an effective technique to release the transmission pressure on high speed train (HST) networks. However, mmWave signals encounter severe losses when passing through the carriage, which decreases the quality of services on board. In this paper, we investigate an intelligent refracting surface (IRS)-assisted HST communication system. Herein, an IRS is deployed on the train window to dynamically reconfigure the propagation environment, and a hybrid time division multiple access-nonorthogonal multiple access scheme is leveraged for interference mitigation. We aim to maximize the overall throughput while taking into account the constraints imposed by base station beamforming, IRS discrete phase shifts and transmit power. To obtain a practical solution, we employ an alternating optimization method and propose a two-stage algorithm. In the first stage, the successive convex approximation method and branch and bound algorithm are leveraged for IRS phase shift design. In the second stage, the Lagrangian multiplier method is utilized for power allocation. Simulation results demonstrate the benefits of IRS adoption and power allocation for throughput improvement in mmWave HST networks.
PEOPLEx: PEdestrian Opportunistic Positioning LEveraging IMU, UWB, BLE and WiFi
Abstract
This paper advances the field of pedestrian localization by introducing a unifying framework for opportunistic positioning based on nonlinear factor graph optimization. While many existing approaches assume constant availability of one or multiple sensing signals, our methodology employs IMU-based pedestrian inertial navigation as the backbone for sensor fusion, opportunistically integrating Ultra-Wideband (UWB), Bluetooth Low Energy (BLE), and WiFi signals when they are available in the environment. The proposed PEOPLEx framework is designed to incorporate sensing data as it becomes available, operating without any prior knowledge about the environment (e.g. anchor locations, radio frequency maps, etc.). Our contributions are twofold: 1) we introduce an opportunistic multi-sensor and real-time pedestrian positioning framework fusing the available sensor measurements; 2) we develop novel factors for adaptive scaling and coarse loop closures, significantly improving the precision of indoor positioning. Experimental validation confirms that our approach achieves accurate localization estimates in real indoor scenarios using commercial smartphones.
SMaRt: Improving GANs with Score Matching Regularity
Authors: Mengfei Xia, Yujun Shen, Ceyuan Yang, Ran Yi, Wenping Wang, Yong-jin Liu
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Generative adversarial networks (GANs) usually struggle in learning from highly diverse data, whose underlying manifold is complex. In this work, we revisit the mathematical foundations of GANs, and theoretically reveal that the native adversarial loss for GAN training is insufficient to fix the problem of subsets with positive Lebesgue measure of the generated data manifold lying out of the real data manifold. Instead, we find that score matching serves as a valid solution to this issue thanks to its capability of persistently pushing the generated data points towards the real data manifold. We thereby propose to improve the optimization of GANs with score matching regularity (SMaRt). As empirical evidence, we first design a toy example to show that training GANs with the aid of a ground-truth score function can help reproduce the real data distribution more accurately, and then confirm that our approach can consistently boost the synthesis performance of various state-of-the-art GANs on real-world datasets with pre-trained diffusion models acting as the approximate score function. For instance, when training Aurora on the ImageNet 64x64 dataset, we manage to improve FID from 8.87 to 7.11, on par with the performance of one-step consistency model. The source code will be made public.
Whole-body Dynamic Collision Avoidance with Time-varying Control Barrier Functions
Authors: Jihao Huang, Xuemin Chi, Zhitao Liu, Hongye Su
Abstract
Recently, there has been increasing attention in robot research towards whole-body collision avoidance. In this paper, we propose a safety-critical controller that utilizes time-varying control barrier functions (time-varying CBFs) constructed from a Robo-centric Euclidean Signed Distance Field (RC-ESDF) to achieve dynamic collision avoidance. The RC-ESDF is constructed in the robot body frame and solely relies on the robot's shape, eliminating the need for real-time updates to save computational resources. Additionally, we design two control Lyapunov functions (CLFs) to ensure that the robot can reach its destination. To enable real-time application, our safety-critical controller, which incorporates CLFs and CBFs as constraints, is formulated as a quadratic program (QP) optimization problem. We conducted numerical simulations on two different dynamics of an L-shaped robot to verify the effectiveness of our proposed approach.
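To make the QP formulation concrete, here is a minimal sketch of the simplest special case: a single affine safety constraint a·u + b >= 0 on the control input u, where the QP min ||u - u_nom||^2 has a closed-form solution (projection of the nominal input onto a half-space). In a CBF setting, a and b would typically collect the Lie-derivative and class-K terms of the barrier condition; that mapping, the function name `cbf_safety_filter`, and the toy numbers are assumptions for illustration. The controller described in the abstract combines several CLF and CBF constraints and would generally need a numerical QP solver, which this sketch does not provide.

```python
from typing import List

Vector = List[float]


def cbf_safety_filter(u_nom: Vector, a: Vector, b: float) -> Vector:
    """Solve min ||u - u_nom||^2 subject to a . u + b >= 0 in closed form.

    If the nominal input already satisfies the constraint it is returned
    unchanged; otherwise it is projected onto the constraint boundary.
    """
    margin = sum(ai * ui for ai, ui in zip(a, u_nom)) + b
    if margin >= 0.0:
        return list(u_nom)  # Nominal input is already safe.
    norm_sq = sum(ai * ai for ai in a)
    if norm_sq == 0.0:
        # The input cannot influence the constraint, so no u can satisfy it.
        raise ValueError("constraint is violated and independent of u")
    # Smallest correction along a that brings a . u + b up to zero.
    lam = margin / norm_sq
    return [ui - lam * ai for ui, ai in zip(u_nom, a)]


# Toy example (made-up numbers): the nominal input pushes towards an obstacle
# along +x, and the constraint caps the x-component at 0.5.
u_safe = cbf_safety_filter([1.0, 0.0], [-1.0, 0.0], 0.5)
print(u_safe)
```

The filter modifies the nominal command only when the constraint is violated and then changes it as little as possible, which is the usual motivation for posing safety-critical control as a QP.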
Poisoning Attacks Against Contrastive Recommender Systems
Authors: Zongwei Wang, Junliang Yu, Min Gao, Hongzhi Yin, Bin Cui, Shazia Sadiq
Subjects: Information Retrieval (cs.IR); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Abstract
Contrastive learning (CL) has recently gained significant popularity in the field of recommendation. Its ability to learn without heavy reliance on labeled data is a natural antidote to the data sparsity issue. Previous research has found that CL can not only enhance recommendation accuracy but also inadvertently exhibit remarkable robustness against noise. However, this paper identifies a vulnerability of CL-based recommender systems: Compared with their non-CL counterparts, they are even more susceptible to poisoning attacks that aim to promote target items. Our analysis points to the uniform dispersion of representations led by the CL loss as the very factor that accounts for this vulnerability. We further theoretically and empirically demonstrate that the optimization of CL loss can lead to smooth spectral values of representations. Based on these insights, we attempt to reveal the potential poisoning attacks against CL-based recommender systems. The proposed attack encompasses a dual-objective framework: One that induces a smoother spectral value distribution to amplify the CL loss's inherent dispersion effect, named dispersion promotion; and the other that directly elevates the visibility of target items, named rank promotion. We validate the destructiveness of our attack model through extensive experimentation on four datasets. By shedding light on these vulnerabilities, we aim to facilitate the development of more robust CL-based recommender systems.
Combined Scheduling, Memory Allocation and Tensor Replacement for Minimizing Off-Chip Data Accesses of DNN Accelerators
Authors: Yi Li, Aarti Gupta, Sharad Malik
Abstract
Specialized hardware accelerators have been extensively used for Deep Neural Networks (DNNs) to provide power/performance benefits. These accelerators contain specialized hardware that supports DNN operators, and scratchpad memory for storing the tensor operands. Often, the size of the scratchpad is insufficient to store all the tensors needed for the computation, and additional data accesses are needed to move tensors back and forth from host memory during the computation with significant power/performance overhead. The volume of these additional data accesses depends on the operator schedule and memory allocation (specific locations selected for the tensors in the scratchpad). We propose an optimization framework, named COSMA, for mapping DNNs to an accelerator that finds the optimal operator schedule, memory allocation and tensor replacement that minimizes the additional data accesses. COSMA provides an Integer Linear Programming (ILP) formulation to generate the optimal solution for mapping a DNN to the accelerator for a given scratchpad size. We demonstrate that, using an off-the-shelf ILP solver, COSMA obtains the optimal solution in seconds for a wide range of state-of-the-art DNNs for different applications. Further, it outperforms existing methods by reducing on average 84% of the non-compulsory data accesses. We further propose a divide-and-conquer heuristic to scale up to certain complex DNNs generated by Neural Architecture Search, and this heuristic solution reduces on average 85% of data accesses compared with other works.
Learning for Semantic Knowledge Base-Guided Online Feature Transmission in Dynamic Channels
Abstract
With the proliferation of edge computing, efficient AI inference on edge devices has become essential for intelligent applications such as autonomous vehicles and VR/AR. In this context, we address the problem of efficient remote object recognition by optimizing feature transmission between mobile devices and edge servers. We propose an online optimization framework to address the challenge of dynamic channel conditions and device mobility in an end-to-end communication system. Our approach builds upon existing methods by leveraging a semantic knowledge base to drive multi-level feature transmission, accounting for temporal factors and dynamic elements throughout the transmission process. To solve the online optimization problem, we design a novel soft actor-critic-based deep reinforcement learning system with a carefully designed reward function for real-time decision-making, overcoming the optimization difficulty of the NP-hard problem and achieving the minimization of semantic loss while respecting latency constraints. Numerical results showcase the superiority of our approach compared to traditional greedy methods under various system setups.
Advances in 3D Neural Stylization: A Survey
Authors: Yingshu Chen, Guocheng Shao, Ka Chun Shum, Binh-Son Hua, Sai-Kit Yeung
Abstract
Modern artificial intelligence provides a novel way of producing digital art in styles. The expressive power of neural networks enables the realm of visual style transfer methods, which can be used to edit images, videos, and 3D data to make them more artistic and diverse. This paper reports on recent advances in neural stylization for 3D data. We provide a taxonomy for neural stylization by considering several important design choices, including scene representation, guidance data, optimization strategies, and output styles. Building on such taxonomy, our survey first revisits the background of neural stylization on 2D images, and then provides in-depth discussions on recent neural stylization methods for 3D data, where we also provide a mini-benchmark on artistic stylization methods. Based on the insights gained from the survey, we then discuss open challenges, future research, and potential applications and impacts of neural stylization.
Spherical Designs for Function Approximation and Beyond
Abstract
In this paper, we compare two optimization algorithms, one using the full Hessian and one using an approximate Hessian, to obtain numerical spherical designs through their variational characterization. Based on the obtained spherical design point sets, we investigate the approximation of smooth and non-smooth functions by spherical harmonics with spherical designs. Finally, we use spherical framelets for denoising Wendland functions as an application, which shows the great potential of spherical designs in spherical data processing.
Data-efficient Deep Reinforcement Learning for Vehicle Trajectory Control
Authors: Bernd Frauenknecht, Tobias Ehlgen, Sebastian Trimpe
Abstract
Advanced vehicle control is a fundamental building block in the development of autonomous driving systems. Reinforcement learning (RL) promises to achieve control performance superior to classical approaches while keeping computational demands low during deployment. However, standard RL approaches like soft actor-critic (SAC) require extensive amounts of training data to be collected and are thus impractical for real-world application. To address this issue, we apply recently developed data-efficient deep RL methods to vehicle trajectory control. Our investigation focuses on three methods, so far unexplored for vehicle control: randomized ensemble double Q-learning (REDQ), probabilistic ensembles with trajectory sampling and model predictive path integral optimizer (PETS-MPPI), and model-based policy optimization (MBPO). We find that in the case of trajectory control, the standard model-based RL formulation used in approaches like PETS-MPPI and MBPO is not suitable. We, therefore, propose a new formulation that splits dynamics prediction and vehicle localization. Our benchmark study on the CARLA simulator reveals that the three identified data-efficient deep RL approaches learn control strategies on a par with or better than SAC, yet reduce the required number of environment interactions by more than one order of magnitude.
Beamforming Design for Active RIS-Aided Over-the-Air Computation
Authors: Deyou Zhang, Ming Xiao, Mikael Skoglund, H. Vincent Poor
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
Over-the-air computation (AirComp) is emerging as a promising technology for wireless data aggregation. However, its performance is hampered by users with poor channel conditions. To mitigate such a performance bottleneck, this paper introduces an active reconfigurable intelligence surface (RIS) into the AirComp system. Specifically, we begin by exploring the ideal RIS model and propose a joint optimization of the transceiver design and RIS configuration to minimize the mean squared error (MSE) between the target and estimated function values. To manage the resultant tri-convex optimization problem, we employ the alternating optimization (AO) technique to decompose it into three convex subproblems, each solvable optimally. Subsequently, we investigate two specific cases and analyze their respective asymptotic performance to reveal the superiority of the active RIS in mitigating the MSE relative to its passive counterpart. Lastly, we adapt our transceiver and RIS configuration design to account for the self-interference of the active RIS. To handle the resultant highly non-convex problem, we further devise a two-layer AO framework. Simulation results demonstrate the superiority of the active RIS in enhancing AirComp performance compared to its passive counterpart.
Advancing Medical Education through the cINnAMON Web Application
Abstract
The cINnAMON EUREKA Traditional project endeavours to revolutionize indoor lighting positioning and monitoring through the integration of intelligent devices and advanced sensor technologies. This article presents the prototypes developed for various project components and explores their potential application in medical education, particularly for aspiring healthcare professionals. The current variant of the intelligent bulb prototype offers a comparative analysis of the project's bulb against commercially available smart bulbs, shedding light on its superior efficiency and capabilities. Furthermore, the initial smart bracelet prototype showcases its ability to collect and analyse data from an array of built-in sensors, empowering medical students to evaluate fragility levels based on accelerometer, gyroscope, orientation, and heart rate data. Leveraging trilateration and optimization algorithms, the intelligent location module enables precise monitoring of individuals' positions within a building, enhancing medical students' understanding of patient localization in healthcare settings. In addition, the recognition of human activity module harnesses data from the bracelet's sensors to classify different activities, providing medical students with invaluable insights into patients' daily routines and mobility patterns. The user's personal profile module facilitates seamless user registration and access to the comprehensive services offered by the cINnAMON system, empowering medical students to collect patient data for analysis and aiding doctors in making informed healthcare decisions. With the telemonitoring system, medical students can remotely monitor patients by configuring sensors in their homes, thus enabling a deeper understanding of remote patient management.
Abstract
In this paper, reconfigurable intelligent surface (RIS)-assisted generalized receive quadrature spatial modulation (RIS-GRQSM) is proposed to improve the spectral efficiency of RIS-aided quadrature spatial modulation (QSM) systems by utilizing the concept of generalized spatial modulation (GSM). That is, multiple antennas are activated at the receiver independently for both the real and imaginary parts. We propose a max-min optimization problem to adjust the phase shifts of all RIS elements to maximize the relevant signal-to-noise ratios (SNRs) at all activated receive antennas. Using Lagrange duality, the non-convex optimization problem involving the phase shifts of all RIS elements reduces to a convex optimization involving a number of variables equal to the number of activated receive antennas. A successive greedy detector (GD) can be used at the receiver to detect the active antennas, which simplifies the detection process. The numerical results show that the proposed scheme outperforms the benchmark schemes in terms of error rate performance, especially in systems with a larger number of receive antennas. In the special case where each receive antenna corresponds to a user and is activated, the RIS-GRQSM system becomes a multicast communication system. In this context, in contrast to existing phase shift optimization algorithms which exhibit an impractical level of complexity, our proposed solution offers the advantage of low complexity and practical feasibility of implementation.
A Formulation of Structural Design Optimization Problems for Quantum Annealing
Authors: Fabian Key, Lukas Freinberger
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Abstract
We present a novel formulation of structural design optimization problems specifically tailored to be solved by quantum annealing (QA). Structural design optimization aims to find the best, i.e., material-efficient yet high-performance, configuration of a structure. To this end, computational optimization strategies can be employed, where a recently evolving strategy based on quantum mechanical effects is QA. This approach requires the optimization problem to be present, e.g., as a quadratic unconstrained binary optimization (QUBO) model. Thus, we develop a novel formulation of the optimization problem. The latter typically involves an analysis model for the component. Here, we use energy minimization principles that govern the behavior of structures under applied loads. This allows us to state the optimization problem as one overall minimization problem. Next, we map this to a QUBO problem that can be immediately solved by QA. We validate the proposed approach using a size optimization problem of a compound rod under self-weight loading. To this end, we develop strategies to account for the limitations of currently available hardware and find that the presented formulation is suitable for solving structural design optimization problems through QA and, for small-scale problems, already works on today's hardware.
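As a rough illustration of what a QUBO model looks like, the sketch below encodes a toy "pick exactly one of three options at minimum cost" problem as a matrix `Q` and minimizes `x^T Q x` over binary vectors by exhaustive search. The costs, the penalty weight, and the helper names are hypothetical and are not taken from the paper; brute force stands in for the quantum annealer, and the structural-mechanics formulation of the abstract is not reproduced.

```python
import itertools
import numpy as np

def solve_qubo_brute_force(Q):
    """Minimize x^T Q x over x in {0, 1}^n by enumerating all 2^n candidates."""
    n = Q.shape[0]
    best_x, best_val = None, float("inf")
    for bits in itertools.product([0, 1], repeat=n):
        x = np.array(bits)
        val = float(x @ Q @ x)
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val

def one_hot_qubo(costs, penalty):
    """Build Q for: minimize sum_i costs[i]*x_i + penalty*(sum_i x_i - 1)^2.

    For binary x_i (x_i^2 = x_i) the penalty expands to
    -penalty*sum_i x_i + 2*penalty*sum_{i<j} x_i x_j + penalty;
    the constant term is dropped because it does not change the minimizer.
    """
    n = len(costs)
    Q = np.zeros((n, n))
    for i in range(n):
        Q[i, i] = costs[i] - penalty
        for j in range(i + 1, n):
            Q[i, j] = 2.0 * penalty
    return Q

# Hypothetical toy data: three options with costs 3, 1, 2.
Q = one_hot_qubo([3.0, 1.0, 2.0], penalty=10.0)
best_x, best_val = solve_qubo_brute_force(Q)
```

The penalty weight has to be large enough that violating the "exactly one" constraint is never cheaper than satisfying it; choosing such weights is part of any QUBO mapping.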
Robust-to-Noise Algorithms for Distributed Resource Allocation and Scheduling
Subjects: Systems and Control (eess.SY); Distributed, Parallel, and Cluster Computing (cs.DC); Signal Processing (eess.SP); Optimization and Control (math.OC)
Abstract
Efficient resource allocation and scheduling algorithms are essential for various distributed applications, ranging from wireless networks and cloud computing platforms to autonomous multi-agent systems and swarm robotic networks. However, real-world environments are often plagued by uncertainties and noise, leading to sub-optimal performance and increased vulnerability of traditional algorithms. This paper addresses the challenge of robust resource allocation and scheduling in the presence of noise and disturbances. The proposed study introduces a novel sign-based dynamics for developing robust-to-noise algorithms distributed over a multi-agent network that can adaptively handle external disturbances. Leveraging concepts from convex optimization theory, control theory, and network science, the framework establishes a principled approach to design algorithms that can maintain key properties such as resource-demand balance and constraint feasibility. Meanwhile, notions of uniform-connectivity and versatile networking conditions are also addressed.
Solving the Team Orienteering Problem with Transformers
Authors: Daniel Fuertes, Carlos R. del-Blanco, Fernando Jaureguizar, Narciso García
Abstract
Route planning for a fleet of vehicles is an important task in applications such as package delivery, surveillance, or transportation. This problem is usually modeled as a combinatorial optimization problem known as the Team Orienteering Problem. The most popular Team Orienteering Problem solvers are mainly based on either linear programming, which provides accurate solutions at the cost of a large computation time that grows with the size of the problem, or heuristic methods, which usually find suboptimal solutions in a shorter amount of time. In this paper, a multi-agent route planning system capable of solving the Team Orienteering Problem in a very fast and accurate manner is presented. The proposed system is based on a centralized Transformer neural network that can learn to encode the scenario (modeled as a graph) and the context of the agents to provide fast and accurate solutions. Several experiments have been performed to demonstrate that the presented system can outperform most of the state-of-the-art works in terms of computation speed. In addition, the code is publicly available at \url{this http URL}.
Local Geometry Determines Global Landscape in Low-rank Factorization for Synchronization
Authors: Shuyang Ling
Subjects: Information Theory (cs.IT); Optimization and Control (math.OC); Computation (stat.CO)
Abstract
The orthogonal group synchronization problem, which focuses on recovering orthogonal group elements from their corrupted pairwise measurements, encompasses examples such as high-dimensional Kuramoto model on general signed networks, $\mathbb{Z}_2$-synchronization, community detection under stochastic block models, and orthogonal Procrustes problem. The semidefinite relaxation (SDR) has proven its power in solving this problem; however, its expensive computational costs impede its widespread practical applications. We consider the Burer-Monteiro factorization approach to the orthogonal group synchronization, an effective and scalable low-rank factorization to solve large scale SDPs. Despite the significant empirical successes of this factorization approach, it is still a challenging task to understand when the nonconvex optimization landscape is benign, i.e., the optimization landscape possesses only one local minimizer, which is also global. In this work, we demonstrate that if the degree of freedom within the factorization exceeds twice the condition number of the ``Laplacian" (certificate matrix) at the global minimizer, the optimization landscape is absent of spurious local minima. Our main theorem is purely algebraic and versatile, and it seamlessly applies to all the aforementioned examples: the nonconvex landscape remains benign under almost identical condition that enables the success of the SDR. Additionally, we illustrate that the Burer-Monteiro factorization is robust to ``monotone adversaries", mirroring the resilience of the SDR. In other words, introducing ``favorable" adversaries into the data will not result in the emergence of new spurious local minimizers.
Geometry-Aware Normalizing Wasserstein Flows for Optimal Causal Inference
Abstract
This manuscript enriches the framework of continuous normalizing flows (CNFs) within causal inference, primarily to augment the geometric properties of parametric submodels used in targeted maximum likelihood estimation (TMLE). By introducing an innovative application of CNFs, we construct a refined series of parametric submodels that enable a directed interpolation between the prior distribution $p_0$ and the empirical distribution $p_1$. This proposed methodology serves to optimize the semiparametric efficiency bound in causal inference by orchestrating CNFs to align with Wasserstein gradient flows. Our approach not only endeavors to minimize the mean squared error in the estimation but also imbues the estimators with geometric sophistication, thereby enhancing robustness against misspecification. This robustness is crucial, as it alleviates the dependence on the standard $n^{\frac{1}{4}}$ rate for a doubly-robust perturbation direction in TMLE. By incorporating robust optimization principles and differential geometry into the estimators, the developed geometry-aware CNFs represent a significant advancement in the pursuit of doubly robust causal inference.
Keyword: adam
There is no result
Keyword: gradient
Online Regulation of Dynamical Systems to Solutions of Constrained Optimization Problems
Abstract
This paper considers the problem of regulating a dynamical system to equilibria that are defined as solutions of an input- and state-constrained optimization problem. To solve this regulation task, we design a state feedback controller based on a continuous approximation of the projected gradient flow. We first show that the equilibria of the interconnection between the plant and the proposed controller correspond to critical points of the constrained optimization problem. We then derive sufficient conditions to ensure that, for the closed-loop system, isolated locally optimal solutions of the optimization problem are locally exponentially stable and show that input constraints are satisfied at all times by identifying an appropriate forward-invariant set.
A trainable manifold for accurate approximation with ReLU Networks
Abstract
We present a novel technique for exercising greater control of the weights of ReLU activated neural networks to produce more accurate function approximations. Many theoretical works encode complex operations into ReLU networks using smaller base components. In these works, a common base component is a constant width approximation to x^2, which has exponentially decaying error with respect to depth. We extend this block to represent a greater range of convex one-dimensional functions. We derive a manifold of weights such that the output of these new networks utilizes exponentially many piecewise-linear segments. This manifold guides their training process to overcome drawbacks associated with random initialization and unassisted gradient descent. We train these networks to approximate functions which do not necessarily lie on the manifold, showing a significant reduction of error values over conventional approaches.
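The constant-width approximation to x^2 mentioned above is commonly built from repeated compositions of a ReLU "tent" map. The sketch below assumes that standard sawtooth construction (the abstract does not spell out which variant the authors use) and checks numerically that the maximum error on [0, 1] shrinks by a factor of four with each added layer. Function names are illustrative only.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def tent(x):
    # Tent map on [0, 1] written with three ReLU units:
    # equals 2x for x <= 1/2 and 2(1 - x) for x >= 1/2.
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5) + 2.0 * relu(x - 1.0)

def square_approx(x, depth):
    # f_m(x) = x - sum_{s=1}^{m} g^{(s)}(x) / 4^s, where g^{(s)} is the
    # s-fold composition of the tent map. f_m interpolates x^2 linearly
    # on a uniform grid with 2^m intervals.
    out = np.array(x, dtype=float)
    g = np.array(x, dtype=float)
    for s in range(1, depth + 1):
        g = tent(g)
        out = out - g / 4.0 ** s
    return out

def max_error(depth, n_points=2**12 + 1):
    # Largest deviation from x^2 on a dense dyadic grid of [0, 1].
    x = np.linspace(0.0, 1.0, n_points)
    return float(np.max(np.abs(square_approx(x, depth) - x ** 2)))

errors = [max_error(m) for m in range(1, 7)]
```

Under this construction the error bound is 2^(-2m-2) for depth m, which is the "exponentially decaying error with respect to depth" referred to in the abstract; the trainable manifold proposed by the authors generalizes beyond this fixed set of weights.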
DisMech: A Discrete Differential Geometry-based Physical Simulator for Soft Robots and Structures
Authors: Andrew Choi, Ran Jing, Andrew Sabelhaus, Mohammad Khalid Jawed
Abstract
Fast, accurate, and generalizable simulations are a key enabler of modern advances in robot design and control. However, existing simulation frameworks in robotics either model rigid environments and mechanisms only, or if they include flexible or soft structures, suffer significantly in one or more of these performance areas. To close this "sim2real" gap, we introduce DisMech, a simulation environment that models highly dynamic motions of rod-like soft continuum robots and structures, quickly and accurately, with arbitrary connections between them. Our methodology combines a fully implicit discrete differential geometry-based physics solver with fast and accurate contact handling, all in an intuitive software interface. Crucially, we propose a gradient descent approach to easily map the motions of hardware robot prototypes to control inputs in DisMech. We validate DisMech through several highly-nuanced soft robot simulations while demonstrating an order of magnitude speed increase over previous state of the art. Our real2sim validation shows high physical accuracy versus hardware, even with complicated soft actuation mechanisms such as shape memory alloy wires. With its low computational cost, physical accuracy, and ease of use, DisMech can accelerate translation of sim-based control for both soft robotics and deformable object manipulation.
Deep Reinforcement Learning Based Optimal Energy Management of Multi-energy Microgrids with Uncertainties
Authors: Yang Cui, Yang Xu, Yang Li, Yijian Wang, Xinpeng Zou
Abstract
Multi-energy microgrid (MEMG) offers an effective approach to deal with energy demand diversification and new energy consumption on the consumer side. In MEMG, it is critical to deploy an energy management system (EMS) for efficient utilization of energy and reliable operation of the system. To help EMS formulate optimal dispatching schemes, a deep reinforcement learning (DRL)-based MEMG energy management scheme with renewable energy source (RES) uncertainty is proposed in this paper. To accurately describe the operating state of the MEMG, the off-design performance model of energy conversion devices is considered in scheduling. The nonlinear optimal dispatching model is expressed as a Markov decision process (MDP) and is then addressed by the twin delayed deep deterministic policy gradient (TD3) algorithm. In addition, to accurately describe the uncertainty of RES, the conditional-least squares generative adversarial networks (C-LSGANs) method based on RES forecast power is proposed to construct the scenarios set of RES power generation. The generated data of RES is used for scheduling to obtain caps and floors for the purchase of electricity and natural gas. Based on this, the superior energy supply sector can formulate solutions in advance to tackle the uncertainty of RES. Finally, the simulation analysis demonstrates the validity and superiority of the method.
DSeg: Direct Line Segments Detection
Authors: Berger Cyrille, Lacroix Simon
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
This paper presents a model-driven approach to detect image line segments. The approach incrementally detects segments on the gradient image using a linear Kalman filter that estimates the supporting line parameters and their associated variances. The algorithm is fast and robust with respect to image noise and illumination variations, it allows the detection of longer line segments than data-driven approaches, and does not require any tedious parameters tuning. An extension of the algorithm that exploits a pyramidal approach to enhance the quality of results is proposed. Results with varying scene illumination and comparisons to classic existing approaches are presented.
On Exact Inversion of DPM-Solvers
Authors: Seongmin Hong, Kyeonghyun Lee, Suh Yoon Jeon, Hyewon Bae, Se Young Chun
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Diffusion probabilistic models (DPMs) are a key component in modern generative models. DPM-solvers have achieved reduced latency and enhanced quality significantly, but have posed challenges to find the exact inverse (i.e., finding the initial noise from the given image). Here we investigate the exact inversions for DPM-solvers and propose algorithms to perform them when samples are generated by the first-order as well as higher-order DPM-solvers. For each explicit denoising step in DPM-solvers, we formulated the inversions using implicit methods such as gradient descent or forward step method to ensure the robustness to large classifier-free guidance unlike the prior approach using fixed-point iteration. Experimental results demonstrated that our proposed exact inversion methods significantly reduced the error of both image and noise reconstructions, greatly enhanced the ability to distinguish invisible watermarks and well prevented unintended background changes consistently during image editing. Project page: \url{https://smhongok.github.io/inv-dpm.html}.
A Robust Hessian-based Trust Region Algorithm for Spherical Conformal Parameterizations
Abstract
Surface parameterizations are widely applied in computer graphics, medical imaging and transformation optics. In this paper, we rigorously derive the gradient vector and Hessian matrix of the discrete conformal energy for spherical conformal parameterizations of simply connected closed surfaces of genus-$0$. In addition, we give the sparsity structure of the Hessian matrix, which leads to a robust Hessian-based trust region algorithm for the computation of spherical conformal maps. Numerical experiments demonstrate the local quadratic convergence of the proposed algorithm with low conformal distortions. We subsequently propose an application of our method to surface registrations that still maintains local quadratic convergence.
Data-Agnostic Model Poisoning against Federated Learning: A Graph Autoencoder Approach
Authors: Kai Li, Jingjing Zheng, Xin Yuan, Wei Ni, Ozgur B. Akan, H. Vincent Poor
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Abstract
This paper proposes a novel, data-agnostic, model poisoning attack on Federated Learning (FL), by designing a new adversarial graph autoencoder (GAE)-based framework. The attack requires no knowledge of FL training data and achieves both effectiveness and undetectability. By listening to the benign local models and the global model, the attacker extracts the graph structural correlations among the benign local models and the training data features substantiating the models. The attacker then adversarially regenerates the graph structural correlations while maximizing the FL training loss, and subsequently generates malicious local models using the adversarial graph structure and the training data features of the benign ones. A new algorithm is designed to iteratively train the malicious local models using GAE and sub-gradient descent. The convergence of FL under attack is rigorously proved, with a considerably large optimality gap. Experiments show that the FL accuracy drops gradually under the proposed attack and existing defense mechanisms fail to detect it. The attack can give rise to an infection across all benign devices, making it a serious threat to FL.
Learning Radio Environments by Differentiable Ray Tracing
Authors: Jakob Hoydis, Fayçal Aït Aoudia, Sebastian Cammerer, Florian Euchner, Merlin Nimier-David, Stephan ten Brink, Alexander Keller
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Abstract
Ray tracing (RT) is instrumental in 6G research in order to generate spatially-consistent and environment-specific channel impulse responses (CIRs). While acquiring accurate scene geometries is now relatively straightforward, determining material characteristics requires precise calibration using channel measurements. We therefore introduce a novel gradient-based calibration method, complemented by differentiable parametrizations of material properties, scattering and antenna patterns. Our method seamlessly integrates with differentiable ray tracers that enable the computation of derivatives of CIRs with respect to these parameters. Essentially, we approach field computation as a large computational graph wherein parameters are trainable akin to weights of a neural network (NN). We have validated our method using both synthetic data and real-world indoor channel measurements, employing a distributed multiple-input multiple-output (MIMO) channel sounder.
Automatic Functional Differentiation in JAX
Authors: Min Lin
Subjects: Programming Languages (cs.PL); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract
We extend JAX with the capability to automatically differentiate higher-order functions (functionals and operators). By representing functions as a generalization of arrays, we seamlessly use JAX's existing primitive system to implement higher-order functions. We present a set of primitive operators that serve as foundational building blocks for constructing several key types of functionals. For every introduced primitive operator, we derive and implement both linearization and transposition rules, aligning with JAX's internal protocols for forward and reverse mode automatic differentiation. This enhancement allows for functional differentiation in the same syntax traditionally used for functions. The resulting functional gradients are themselves functions ready to be invoked in Python. We showcase this tool's efficacy and simplicity through applications where functional derivatives are indispensable. The source code of this work is released at https://github.com/sail-sg/autofd .
A robust and adaptive GenEO-type domain decomposition preconditioner for $\mathbf{H}(\mathbf{curl})$ problems in general non-convex three-dimensional geometries
Abstract
In this paper we develop and analyse domain decomposition methods for linear systems of equations arising from conforming finite element discretisations of positive Maxwell-type equations, namely for $\mathbf{H}(\mathbf{curl})$ problems. It is well known that convergence of domain decomposition methods relies heavily on the efficiency of the coarse space used in the second level. We design adaptive coarse spaces that complement a near-kernel space made from the gradient of scalar functions. The new class of preconditioner is inspired by the idea of subspace decomposition, but based on spectral coarse spaces, and is specially designed for curl-conforming discretisations of Maxwell's equations in heterogeneous media on general domains which may have holes. Our approach has wider applicability and theoretical justification than the well-known Hiptmair-Xu auxiliary space preconditioner, with results extending to the variable coefficient case and non-convex domains at the expense of a larger coarse space.
Communication-Efficient Federated Optimization over Semi-Decentralized Networks
Authors: He Wang, Yuejie Chi
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)
Abstract
In large-scale federated and decentralized learning, communication efficiency is one of the most challenging bottlenecks. While gossip communication -- where agents can exchange information with their connected neighbors -- is more cost-effective than communicating with the remote server, it often requires a greater number of communication rounds, especially for large and sparse networks. To tackle the trade-off, we examine the communication efficiency under a semi-decentralized communication protocol, in which agents can perform both agent-to-agent and agent-to-server communication in a probabilistic manner. We design a tailored communication-efficient algorithm over semi-decentralized networks, referred to as PISCO, which inherits the robustness to data heterogeneity thanks to gradient tracking and allows multiple local updates for saving communication. We establish the convergence rate of PISCO for nonconvex problems and show that PISCO enjoys a linear speedup in terms of the number of agents and local updates. Our numerical results highlight the superior communication efficiency of PISCO and its resilience to data heterogeneity and various network topologies.
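For readers unfamiliar with gradient tracking, the following sketch shows the generic decentralized gradient-tracking recursion on a toy problem with heterogeneous local objectives. It is not PISCO: there is no server communication, no probabilistic switching, and no multiple local updates. The mixing matrix `W`, the local targets `b`, and the step size are assumptions chosen only for illustration.

```python
import numpy as np

def gradient_tracking(b, W, step=0.1, iters=300):
    """Generic gradient tracking over a fixed gossip matrix W.

    Agent i holds f_i(x) = 0.5 * (x - b[i])**2, so grad f_i(x) = x - b[i].
    The minimizer of the average objective is mean(b), even though every
    agent's local minimizer b[i] is different (data heterogeneity).
    """
    def grad(x):
        return x - b

    x = np.zeros_like(b)
    y = grad(x)  # tracker initialised to the local gradients
    for _ in range(iters):
        x_new = W @ x - step * y              # gossip step plus descent along the tracker
        y = W @ y + grad(x_new) - grad(x)     # tracker follows the average gradient
        x = x_new
    return x

# Hypothetical 4-agent ring with a symmetric, doubly stochastic mixing matrix.
b = np.array([1.0, -2.0, 4.0, 7.0])
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])
x_final = gradient_tracking(b, W)
```

Because the tracker `y` preserves the network-wide average of the local gradients, all agents agree on the global minimizer `mean(b) = 2.5` rather than drifting toward their own local optima, which is the robustness to heterogeneity the abstract alludes to.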
Geometry-Aware Normalizing Wasserstein Flows for Optimal Causal Inference
Abstract
This manuscript enriches the framework of continuous normalizing flows (CNFs) within causal inference, primarily to augment the geometric properties of parametric submodels used in targeted maximum likelihood estimation (TMLE). By introducing an innovative application of CNFs, we construct a refined series of parametric submodels that enable a directed interpolation between the prior distribution $p_0$ and the empirical distribution $p_1$. This proposed methodology serves to optimize the semiparametric efficiency bound in causal inference by orchestrating CNFs to align with Wasserstein gradient flows. Our approach not only endeavors to minimize the mean squared error in the estimation but also imbues the estimators with geometric sophistication, thereby enhancing robustness against misspecification. This robustness is crucial, as it alleviates the dependence on the standard $n^{\frac{1}{4}}$ rate for a doubly-robust perturbation direction in TMLE. By incorporating robust optimization principles and differential geometry into the estimators, the developed geometry-aware CNFs represent a significant advancement in the pursuit of doubly robust causal inference.
One-step Diffusion with Distribution Matching Distillation
Authors: Tianwei Yin, Michaël Gharbi, Richard Zhang, Eli Shechtman, Frédo Durand, William T. Freeman, Taesung Park
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Diffusion models generate high-quality images but require dozens of forward passes. We introduce Distribution Matching Distillation (DMD), a procedure to transform a diffusion model into a one-step image generator with minimal impact on image quality. We enforce that the one-step image generator matches the diffusion model at the distribution level, by minimizing an approximate KL divergence whose gradient can be expressed as the difference between two score functions, one of the target distribution and the other of the synthetic distribution being produced by our one-step generator. The score functions are parameterized as two diffusion models trained separately on each distribution. Combined with a simple regression loss matching the large-scale structure of the multi-step diffusion outputs, our method outperforms all published few-step diffusion approaches, reaching 2.62 FID on ImageNet 64x64 and 11.49 FID on zero-shot COCO-30k, comparable to Stable Diffusion but orders of magnitude faster. Utilizing FP16 inference, our model can generate images at 20 FPS on modern hardware.
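The claim that the KL gradient reduces to a difference of two score functions can be checked on a one-dimensional toy case. The sketch below assumes a hypothetical "generator" G_theta(z) = theta + z with z ~ N(0, 1) and a Gaussian target N(mu, sigma^2), so both scores are known in closed form. In the paper the scores come from trained diffusion models; nothing here reproduces that setup.

```python
import numpy as np

def kl_grad_closed_form(theta, mu, sigma):
    # KL(N(theta, 1) || N(mu, sigma^2))
    #   = log(sigma) + (1 + (theta - mu)^2) / (2 sigma^2) - 1/2,
    # so its derivative with respect to theta is (theta - mu) / sigma^2.
    return (theta - mu) / sigma ** 2

def kl_grad_score_difference(theta, mu, sigma, n=200_000, seed=0):
    # Monte Carlo estimate of E_z[(s_fake(x) - s_real(x)) * dG/dtheta], x = G_theta(z).
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    x = theta + z                        # samples from the toy one-step generator
    s_fake = -(x - theta)                # score of the generated distribution N(theta, 1)
    s_real = -(x - mu) / sigma ** 2      # score of the target distribution N(mu, sigma^2)
    dG_dtheta = 1.0                      # derivative of G_theta(z) = theta + z
    return float(np.mean((s_fake - s_real) * dG_dtheta))

# Illustrative values only.
exact = kl_grad_closed_form(theta=1.5, mu=-0.5, sigma=2.0)
estimate = kl_grad_score_difference(theta=1.5, mu=-0.5, sigma=2.0)
```

The two numbers agree up to Monte Carlo noise, which is why a generator can be trained with only score estimates of the two distributions and without evaluating either density.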
Abstract
Dataset distillation aims to generate a smaller but representative subset from a large dataset, which allows a model to be trained efficiently, meanwhile evaluating on the original testing data distribution to achieve decent performance. Many prior works have aimed to align with diverse aspects of the original datasets, such as matching the training weight trajectories, gradient, feature/BatchNorm distributions, etc. In this work, we show how to distill various large-scale datasets such as full ImageNet-1K/21K under a conventional input resolution of 224$\times$224 to achieve the best accuracy over all previous approaches, including SRe$^2$L, TESLA and MTT. To achieve this, we introduce a simple yet effective ${\bf C}$urriculum ${\bf D}$ata ${\bf A}$ugmentation ($\texttt{CDA}$) during data synthesis that obtains the accuracy on large-scale ImageNet-1K and 21K with 63.2% under IPC (Images Per Class) 50 and 36.1% under IPC 20, respectively. Finally, we show that, by integrating all our enhancements together, the proposed model beats the current state-of-the-art by more than 4% Top-1 accuracy on ImageNet-1K/21K and for the first time, reduces the gap to its full-data training counterpart to less than absolute 15%. Moreover, this work represents the inaugural success in dataset distillation on larger-scale ImageNet-21K under the standard 224$\times$224 resolution. Our code and distilled ImageNet-21K dataset of 20 IPC, 2K recovery budget are available at https://github.com/VILA-Lab/SRe2L/tree/main/CDA.
Keyword: super-resolution
PEAN: A Diffusion-based Prior-Enhanced Attention Network for Scene Text Image Super-Resolution
Abstract
Scene text image super-resolution (STISR) aims at simultaneously increasing the resolution and readability of low-resolution scene text images, thus boosting the performance of the downstream recognition task. Two factors in scene text images, semantic information and visual structure, affect the recognition performance significantly. To mitigate the effects from these factors, this paper proposes a Prior-Enhanced Attention Network (PEAN). Specifically, a diffusion-based module is developed to enhance the text prior, hence offering better guidance for the SR network to generate SR images with higher semantic accuracy. Meanwhile, the proposed PEAN leverages an attention-based modulation module to understand scene text images by neatly perceiving the local and global dependence of images, despite the shape of the text. A multi-task learning paradigm is employed to optimize the network, enabling the model to generate legible SR images. As a result, PEAN establishes new SOTA results on the TextZoom benchmark. Experiments are also conducted to analyze the importance of the enhanced text prior as a means of improving the performance of the SR network. Code will be made available at https://github.com/jdfxzzy/PEAN.
Zooming Out on Zooming In: Advancing Super-Resolution for Remote Sensing
Abstract
Super-Resolution for remote sensing has the potential for huge impact on planet monitoring by producing accurate and realistic high resolution imagery on a frequent basis and a global scale. Despite a lot of attention, several inconsistencies and challenges have prevented it from being deployed in practice. These include the lack of effective metrics, fragmented and relatively small-scale datasets for training, insufficient comparisons across a suite of methods, and unclear evidence for the use of super-resolution outputs for machine consumption. This work presents a new metric for super-resolution, CLIPScore, that corresponds far better with human judgments than previous metrics in an extensive study. We use CLIPScore to evaluate four standard methods on a new large-scale dataset, S2-NAIP, and three existing benchmark datasets, and find that generative adversarial networks easily outperform more traditional L2 loss-based models and are more semantically accurate than modern diffusion models. We also find that using CLIPScore as an auxiliary loss can speed up the training of GANs by 18x and lead to improved outputs, resulting in an effective model in diverse geographies across the world which we will release publicly. The dataset, pre-trained model weights, and code are available at https://github.com/allenai/satlas-super-resolution/.
HiPA: Enabling One-Step Text-to-Image Diffusion Models via High-Frequency-Promoting Adaptation
Authors: Yifan Zhang, Bryan Hooi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Diffusion models have revolutionized text-to-image generation, but their real-world applications are hampered by the extensive time needed for hundreds of diffusion steps. Although progressive distillation has been proposed to speed up diffusion sampling to 2-8 steps, it still falls short in one-step generation, and necessitates training multiple student models, which is highly parameter-extensive and time-consuming. To overcome these limitations, we introduce High-frequency-Promoting Adaptation (HiPA), a parameter-efficient approach to enable one-step text-to-image diffusion. Grounded in the insight that high-frequency information is essential but highly lacking in one-step diffusion, HiPA focuses on training one-step, low-rank adaptors to specifically enhance the under-represented high-frequency abilities of advanced diffusion models. The learned adaptors empower these diffusion models to generate high-quality images in just a single step. Compared with progressive distillation, HiPA achieves much better performance in one-step text-to-image generation (37.3 $\rightarrow$ 23.8 in FID-5k on MS-COCO 2017) and 28.6x training speed-up (108.8 $\rightarrow$ 3.8 A100 GPU days), requiring only 0.04% training parameters (7,740 million $\rightarrow$ 3.3 million). We also demonstrate HiPA's effectiveness in text-guided image editing, inpainting and super-resolution tasks, where our adapted models consistently deliver high-quality outputs in just one diffusion step. The source code will be released.
Keyword: sgd
Steering Deep Feature Learning with Backward Aligned Feature Updates
Keyword: optimization
LayerCollapse: Adaptive compression of neural networks
QuadraNet: Improving High-Order Neural Interaction Efficiency with Hardware-Aware Quadratic Neural Networks
GaussianShader: 3D Gaussian Splatting with Shading Functions for Reflective Surfaces
4D-fy: Text-to-4D Generation Using Hybrid Score Distillation Sampling
Online Regulation of Dynamical Systems to Solutions of Constrained Optimization Problems
TransOpt: Transformer-based Representation Learning for Optimization Problem Classification
A Data-Driven, Non-Linear, Parameterized Reduced Order Model of Metal 3D Printing
TransNAS-TSAD: Harnessing Transformers for Multi-Objective Neural Architecture Search in Time Series Anomaly Detection
Self-Supervised Learning for Large-Scale Preventive Security Constrained DC Optimal Power Flow
The Forecastability of Underlying Building Electricity Demand from Time Series Data
Data-Driven Kalman Filter using Maximum Likelihood Optimization
Reconstructing the shape and material parameters of dissipative obstacles using an impedance model
Back to 3D: Few-Shot 3D Keypoint Detection with Back-Projected 2D Features
Composition of Nondeterministic and Stochastic Services for LTLf Task Specifications
Throughput Maximization for Intelligent Refracting Surface Assisted mmWave High-Speed Train Communications
PEOPLEx: PEdestrian Opportunistic Positioning LEveraging IMU, UWB, BLE and WiFi
SMaRt: Improving GANs with Score Matching Regularity
Whole-body Dynamic Collision Avoidance with Time-varying Control Barrier Functions
Poisoning Attacks Against Contrastive Recommender Systems
Combined Scheduling, Memory Allocation and Tensor Replacement for Minimizing Off-Chip Data Accesses of DNN Accelerators
Learning for Semantic Knowledge Base-Guided Online Feature Transmission in Dynamic Channels
Advances in 3D Neural Stylization: A Survey
Spherical Designs for Function Approximation and Beyond
Data-efficient Deep Reinforcement Learning for Vehicle Trajectory Control
Beamforming Design for Active RIS-Aided Over-the-Air Computation
Advancing Medical Education through the cINnAMON Web Application
RIS-Assisted Generalized Receive Quadrature Spatial Modulation
A Formulation of Structural Design Optimization Problems for Quantum Annealing
Robust-to-Noise Algorithms for Distributed Resource Allocation and Scheduling
Solving the Team Orienteering Problem with Transformers
Local Geometry Determines Global Landscape in Low-rank Factorization for Synchronization
Geometry-Aware Normalizing Wasserstein Flows for Optimal Causal Inference
Keyword: adam
There is no result
Keyword: gradient
Online Regulation of Dynamical Systems to Solutions of Constrained Optimization Problems
A trainable manifold for accurate approximation with ReLU Networks
DisMech: A Discrete Differential Geometry-based Physical Simulator for Soft Robots and Structures
Deep Reinforcement Learning Based Optimal Energy Management of Multi-energy Microgrids with Uncertainties
DSeg: Direct Line Segments Detection
On Exact Inversion of DPM-Solvers
A Robust Hessian-based Trust Region Algorithm for Spherical Conformal Parameterizations
Data-Agnostic Model Poisoning against Federated Learning: A Graph Autoencoder Approach
Learning Radio Environments by Differentiable Ray Tracing
Automatic Functional Differentiation in JAX
A robust and adaptive GenEO-type domain decomposition preconditioner for $\mathbf{H}(\mathbf{curl})$ problems in general non-convex three-dimensional geometries
Communication-Efficient Federated Optimization over Semi-Decentralized Networks
Geometry-Aware Normalizing Wasserstein Flows for Optimal Causal Inference
One-step Diffusion with Distribution Matching Distillation
Dataset Distillation in Large Data Era
Keyword: super-resolution
PEAN: A Diffusion-based Prior-Enhanced Attention Network for Scene Text Image Super-Resolution
Zooming Out on Zooming In: Advancing Super-Resolution for Remote Sensing
HiPA: Enabling One-Step Text-to-Image Diffusion Models via High-Frequency-Promoting Adaptation