Abstract
Peer-to-peer deep learning algorithms are enabling distributed edge devices to collaboratively train deep neural networks without exchanging raw training data or relying on a central server. Peer-to-Peer Learning (P2PL) and other algorithms based on Distributed Local-Update Stochastic/mini-batch Gradient Descent (local DSGD) rely on interleaving epochs of training with distributed consensus steps. This process leads to model parameter drift/divergence amongst participating devices in both IID and non-IID settings. We observe that model drift results in significant oscillations in test performance evaluated after local training and consensus phases. We then identify factors that amplify performance oscillations and demonstrate that our novel approach, P2PL with Affinity, dampens test performance oscillations in non-IID settings without incurring any additional communication cost.
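The interleaving of local training and consensus described above can be sketched on a toy problem. This is a minimal illustration, not the P2PL algorithm or its Affinity variant: the 1-D least-squares loss, the uniform neighborhood averaging, and the names `local_sgd_step`, `consensus`, and `run` are all assumptions made for the example.

```python
import random

def local_sgd_step(w, data, lr):
    """One SGD step on the toy loss 0.5 * (w - x)^2 for a sampled point x."""
    x = random.choice(data)
    return w - lr * (w - x)

def consensus(weights, neighbors):
    """Replace each device's parameter with the mean over itself and its neighbors."""
    return [
        sum(weights[j] for j in [i] + neighbors[i]) / (1 + len(neighbors[i]))
        for i in range(len(weights))
    ]

def run(device_data, neighbors, rounds=20, local_steps=5, lr=0.1, seed=0):
    """Alternate local training and consensus; record parameter spread at both points."""
    random.seed(seed)
    weights = [0.0] * len(device_data)
    history = []  # (spread after local training, spread after consensus)
    for _ in range(rounds):
        for _ in range(local_steps):
            weights = [local_sgd_step(w, d, lr) for w, d in zip(weights, device_data)]
        drift = max(weights) - min(weights)
        weights = consensus(weights, neighbors)
        history.append((drift, max(weights) - min(weights)))
    return weights, history
```

With non-IID toy data such as `run([[0.0], [10.0], [5.0]], [[1, 2], [0, 2], [0, 1]])`, the spread recorded after local training is larger than the spread after consensus, which is the drift-then-correction pattern the abstract refers to.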
Parallel Trust-Region Approaches in Neural Network Training: Beyond Traditional Methods
Authors: Ken Trotti, Samuel A. Cruz Alegría, Alena Kopaničáková, Rolf Krause
Abstract
We propose to train neural networks (NNs) using a novel variant of the "Additively Preconditioned Trust-region Strategy" (APTS). The proposed method is based on a parallelizable additive domain decomposition approach applied to the neural network's parameters. Built upon the trust-region (TR) framework, the APTS method ensures global convergence towards a minimizer. Moreover, it eliminates the need for computationally expensive hyper-parameter tuning, as the TR algorithm automatically determines the step size in each iteration. We demonstrate the capabilities, strengths, and limitations of the proposed APTS training method by performing a series of numerical experiments. The presented numerical study includes a comparison with widely used training methods such as SGD, Adam, LBFGS, and the standard TR method.
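To illustrate how a trust-region method sets its own step size, here is a minimal sketch of a generic trust-region loop. It is not APTS and involves no domain decomposition; the first-order model, the thresholds 0.25 and 0.75, the acceptance parameter `eta`, and the function name `trust_region_minimize` are assumptions chosen for brevity.

```python
import math

def trust_region_minimize(f, grad, x, radius=1.0, max_radius=10.0,
                          eta=0.1, iters=200, tol=1e-10):
    """Generic trust-region loop with a linear model (illustrative only)."""
    fx = f(x)
    for _ in range(iters):
        g = grad(x)
        gnorm = math.sqrt(sum(gi * gi for gi in g))
        if gnorm < tol:
            break
        # Minimizer of the linear model f(x) + g.p subject to ||p|| <= radius.
        step = [-radius * gi / gnorm for gi in g]
        candidate = [xi + si for xi, si in zip(x, step)]
        f_new = f(candidate)
        predicted = radius * gnorm          # decrease promised by the model
        rho = (fx - f_new) / predicted      # actual vs. predicted decrease
        if rho < 0.25:
            radius *= 0.25                  # model was poor: shrink the region
        elif rho > 0.75:
            radius = min(2.0 * radius, max_radius)  # model was good: expand
        if rho > eta:
            x, fx = candidate, f_new        # accept only sufficiently good steps
    return x, fx, radius
```

The radius plays the role that a learning rate plays in SGD, but it is adjusted from the ratio `rho` rather than tuned by hand, and rejected steps leave the iterate unchanged so the objective never increases.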
Keyword: optimization
Towards Fair Graph Federated Learning via Incentive Mechanisms
Authors: Chenglu Pan, Jiarong Xu, Yue Yu, Ziqi Yang, Qingbiao Wu, Chunping Wang, Lei Chen, Yang Yang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Social and Information Networks (cs.SI)
Abstract
Graph federated learning (FL) has emerged as a pivotal paradigm enabling multiple agents to collaboratively train a graph model while preserving local data privacy. Yet, current efforts overlook a key issue: agents are self-interested and would be hesitant to share data without fair and satisfactory incentives. This paper is the first endeavor to address this issue by studying the incentive mechanism for graph federated learning. We identify a unique phenomenon in graph federated learning: the presence of agents posing potential harm to the federation and agents contributing with delays. This stands in contrast to previous FL incentive mechanisms that assume all agents contribute positively and in a timely manner. In view of this, this paper presents a novel incentive mechanism tailored for fair graph federated learning, integrating incentives derived from both model gradient and payoff. To achieve this, we first introduce an agent valuation function aimed at quantifying agent contributions through the introduction of two criteria: gradient alignment and graph diversity. Moreover, due to the high heterogeneity in graph federated learning, striking a balance between accuracy and fairness becomes particularly crucial. We introduce motif prototypes to enhance accuracy, communicated between the server and agents, enhancing global model aggregation and aiding agents in local model optimization. Extensive experiments show that our model achieves the best trade-off between accuracy and the fairness of model gradient, as well as superior payoff fairness.
Unlocking Deep Learning: A BP-Free Approach for Parallel Block-Wise Training of Neural Networks
Abstract
Backpropagation (BP) has been a successful optimization technique for deep learning models. However, its limitations, such as backward- and update-locking, and its biological implausibility, hinder the concurrent updating of layers and do not mimic the local learning processes observed in the human brain. To address these issues, recent research has suggested using local error signals to asynchronously train network blocks. However, this approach often involves extensive trial-and-error iterations to determine the best configuration for local training. This includes decisions on how to decouple network blocks and which auxiliary networks to use for each block. In our work, we introduce a novel BP-free approach: a block-wise BP-free (BWBPF) neural network that leverages local error signals to optimize distinct sub-neural networks separately, where the global loss is only responsible for updating the output layer. The local error signals used in the BP-free model can be computed in parallel, enabling a potential speed-up in the weight update process through parallel implementation. Our experimental results consistently show that this approach can identify transferable decoupled architectures for VGG and ResNet variations, outperforming models trained with end-to-end backpropagation and other state-of-the-art block-wise learning techniques on datasets such as CIFAR-10 and Tiny-ImageNet. The code is released at https://github.com/Belis0811/BWBPF.
Ternary-type Opacity and Hybrid Odometry for RGB-only NeRF-SLAM
Authors: Junru Lin, Asen Nachkov, Songyou Peng, Luc Van Gool, Danda Pani Paudel
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The opacity of rigid 3D scenes with opaque surfaces is considered to be of a binary type. However, we observed that this property is not followed by the existing RGB-only NeRF-SLAM. Therefore, we are motivated to introduce this prior into the RGB-only NeRF-SLAM pipeline. Unfortunately, the optimization through the volumetric rendering function does not facilitate easy integration of the desired prior. Instead, we observed that the opacity of ternary-type (TT) is well supported. In this work, we study why ternary-type opacity is well-suited and desired for the task at hand. In particular, we provide theoretical insights into the process of jointly optimizing radiance and opacity through the volumetric rendering process. Through exhaustive experiments on benchmark datasets, we validate our claim and provide insights into the optimization process, which we believe will unleash the potential of RGB-only NeRF-SLAM. To foster this line of research, we also propose a simple yet novel visual odometry scheme that uses a hybrid combination of volumetric and warping-based image renderings. More specifically, the proposed hybrid odometry (HO) additionally uses image warping-based coarse odometry, leading up to an order of magnitude final speed-up. Furthermore, we show that the proposed TT and HO well complement each other, offering state-of-the-art results on benchmark datasets in terms of both speed and accuracy.
DSPy Assertions: Computational Constraints for Self-Refining Language Model Pipelines
Authors: Arnav Singhvi, Manish Shetty, Shangyin Tan, Christopher Potts, Koushik Sen, Matei Zaharia, Omar Khattab
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Programming Languages (cs.PL)
Abstract
Chaining language model (LM) calls as composable modules is fueling a new powerful way of programming. However, ensuring that LMs adhere to important constraints remains a key challenge, one often addressed with heuristic "prompt engineering". We introduce LM Assertions, a new programming construct for expressing computational constraints that LMs should satisfy. We integrate our constructs into the recent DSPy programming model for LMs, and present new strategies that allow DSPy to compile programs with arbitrary LM Assertions into systems that are more reliable and more accurate. In DSPy, LM Assertions can be integrated at compile time, via automatic prompt optimization, and/or at inference time, via automatic self-refinement and backtracking. We report on two early case studies for complex question answering (QA), in which the LM program must iteratively retrieve information in multiple hops and synthesize a long-form answer with citations. We find that LM Assertions not only improve compliance with imposed rules and guidelines but also enhance downstream task performance, delivering intrinsic and extrinsic gains up to 35.7% and 13.3%, respectively. Our reference implementation of LM Assertions is integrated into DSPy at https://github.com/stanfordnlp/dspy.
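The inference-time idea of checking a constraint and retrying with feedback can be sketched generically. The code below does not use the DSPy API: `call_with_assertion`, `generate`, and `check` are hypothetical names, and `generate` stands in for any function that maps a prompt string to an output string.

```python
def call_with_assertion(generate, check, prompt, max_retries=2):
    """Call generate(prompt); on a failed check, retry with the failure fed back.

    generate: hypothetical callable, prompt string -> output string.
    check:    hypothetical callable, output -> (ok, message).
    Returns (output, number_of_retries_used).
    """
    attempt_prompt = prompt
    message = ""
    for attempt in range(max_retries + 1):
        output = generate(attempt_prompt)
        ok, message = check(output)
        if ok:
            return output, attempt
        # Backtrack: re-issue the original prompt, augmented with the
        # rejected output and the reason it was rejected.
        attempt_prompt = (
            f"{prompt}\nPrevious output: {output}\nConstraint violated: {message}"
        )
    raise ValueError(f"constraint still violated after {max_retries} retries: {message}")
```

A caller would supply a constraint such as a length limit or a required citation format as `check`; how the abstract's system builds the refined prompt and which module it backtracks to are not specified here and are not modeled by this sketch.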
Enhancing Optimization Through Innovation: The Multi-Strategy Improved Black Widow Optimization Algorithm (MSBWOA)
Authors: Xin Xu
Subjects: Neural and Evolutionary Computing (cs.NE)
Abstract
This paper introduces a Multi-Strategy Improved Black Widow Optimization Algorithm (MSBWOA), designed to enhance the performance of the standard Black Widow Algorithm (BW) in solving complex optimization problems. The proposed algorithm integrates four key strategies: initializing the population using Tent chaotic mapping to enhance diversity and initial exploratory capability; implementing mutation optimization on the least fit individuals to maintain dynamic population and prevent premature convergence; incorporating a non-linear inertia weight to balance global exploration and local exploitation; and adding a random perturbation strategy to enhance the algorithm's ability to escape local optima. Evaluated through a series of standard test functions, the MSBWOA demonstrates significant performance improvements in various dimensions, particularly in convergence speed and solution quality. Experimental results show that compared to the traditional BW algorithm and other existing optimization methods, the MSBWOA exhibits better stability and efficiency in handling a variety of optimization problems. These findings validate the effectiveness of the proposed strategies and offer a new solution approach for complex optimization challenges.
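As an illustration of the first strategy, the sketch below initializes a population from a Tent chaotic sequence. The skewed tent map with breakpoint `a = 0.7`, the seed value `x0 = 0.37`, and the function name `tent_map_population` are assumptions for this example; the abstract does not state which parameterization MSBWOA uses.

```python
def tent_map_population(pop_size, dim, lower, upper, x0=0.37, a=0.7):
    """Fill a pop_size x dim population from a tent-map sequence scaled to [lower, upper]."""
    population = []
    x = x0
    for _ in range(pop_size):
        individual = []
        for _ in range(dim):
            # Skewed tent map on [0, 1]: stretch the left branch by 1/a,
            # fold and stretch the right branch by 1/(1 - a).
            x = x / a if x < a else (1.0 - x) / (1.0 - a)
            individual.append(lower + x * (upper - lower))
        population.append(individual)
    return population
```

A skewed breakpoint is used here rather than 0.5 because the symmetric map with slope 2 collapses to zero after a few dozen iterations in binary floating point.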
Meta-Learning with Versatile Loss Geometries for Fast Adaptation Using Mirror Descent
Authors: Yilang Zhang, Bingcong Li, Georgios B. Giannakis
Abstract
Utilizing task-invariant prior knowledge extracted from related tasks, meta-learning is a principled framework that empowers learning a new task especially when data records are limited. A fundamental challenge in meta-learning is how to quickly "adapt" the extracted prior in order to train a task-specific model within a few optimization steps. Existing approaches deal with this challenge using a preconditioner that enhances convergence of the per-task training process. Though effective in representing locally a quadratic training loss, these simple linear preconditioners can hardly capture complex loss geometries. The present contribution addresses this limitation by learning a nonlinear mirror map, which induces a versatile distance metric to enable capturing and optimizing a wide range of loss geometries, hence facilitating the per-task training. Numerical tests on few-shot learning datasets demonstrate the superior expressiveness and convergence of the advocated approach.
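For readers unfamiliar with mirror descent, the following sketch shows the update with one fixed, classical mirror map: negative entropy on the probability simplex, which yields the exponentiated-gradient rule. The abstract's method instead learns a nonlinear mirror map, which this example does not attempt; the function name and default step size are assumptions.

```python
import math

def entropic_mirror_descent(grad, w, lr=0.5, steps=100):
    """Mirror descent on the simplex with the negative-entropy mirror map."""
    for _ in range(steps):
        g = grad(w)
        # Gradient step in the dual (log) coordinates, mapped back by exp.
        unnorm = [wi * math.exp(-lr * gi) for wi, gi in zip(w, g)]
        # Normalizing is the Bregman projection onto the simplex for this map.
        z = sum(unnorm)
        w = [u / z for u in unnorm]
    return w
```

For a linear loss with cost vector `[3.0, 1.0, 2.0]`, calling `entropic_mirror_descent(lambda w: [3.0, 1.0, 2.0], [1/3, 1/3, 1/3])` moves nearly all mass onto the cheapest coordinate while every iterate stays a valid probability vector, something a plain Euclidean gradient step would not guarantee without an explicit projection.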
Adaptive Decision-Objective Loss for Forecast-then-Optimize in Power Systems
Authors: Haipeng Zhang, Ran Li, Mingyang Sun, Teng Fei
Abstract
Forecast-then-optimize is a widely-used framework for decision-making problems in power systems. Traditionally, statistical losses have been employed to train forecasting models, but recent research demonstrated that improved decision utility in downstream optimization tasks can be achieved by using decision loss as an alternative. However, the implementation of decision loss in power systems faces challenges in 1) accommodating multi-stage decision-making problems where upstream optimality cannot guarantee final optimality; 2) adapting to dynamic environments such as changing parameters and nature of the problem like continuous or discrete optimization tasks. To this end, this paper proposes a novel adaptive decision-objective loss (ADOL) to address the above challenges. Specifically, ADOL first redefines the decision loss as objective utilities rather than objective loss to eliminate the need to manually set the optimal decision, thus ensuring the globally optimal decision. ADOL enables one-off training in a dynamic environment by introducing additional variables. The differentiability and convexity of ADOL provide useful gradients for forecasting model training in conjunction with continuous and discrete optimization tasks. Experiments are conducted for both linear programming-based and mixed integer linear programming-based power system two-stage dispatching cases with changing costs, and the results show that the proposed ADOL is capable of achieving globally optimal decision-making and adaptability to dynamic environments. The method can be extended to other multi-stage tasks in complex systems.
Secure Information Embedding in Images with Hybrid Firefly Algorithm
Abstract
Various methods have been proposed to secure access to sensitive information over time, such as the many cryptographic methods in use to facilitate secure communications on the internet. However, other methods such as steganography, which may be more suitable when the act of transmitting sensitive information must itself remain secret, have been overlooked. Multiple techniques that are commonly discussed for such scenarios suffer from low capacity and high distortion in the output signal. This research introduces a novel steganographic approach for concealing a confidential portable document format (PDF) document within a host image by employing the proposed Hybrid Firefly Algorithm (HFA) to select the pixel arrangement. This algorithm combines two widely used optimization algorithms to improve their performance. The suggested methodology utilizes the HFA to conduct a search for optimal pixel placements in the spatial domain. The purpose of this search is to accomplish two main goals: increasing the host image's capacity and reducing distortion. Moreover, the proposed approach intends to reduce the time required for the embedding procedure. The findings indicate a decrease in image distortion and an accelerated rate of convergence in the search process. The resultant embeddings exhibit robustness against steganalytic attacks, making the embedded data difficult to identify.
MindOpt Adapter for CPLEX Benchmarking Performance Analysis
Authors: Mou Sun, Tao Li, Wotao Yin
Subjects: Mathematical Software (cs.MS); Optimization and Control (math.OC)
Abstract
This report provides a comprehensive analysis of the performance of MindOpt Adapter for CPLEX 12.9 in benchmark testing. CPLEX, recognized as a robust Mixed Integer Programming (MIP) solver, has faced some scrutiny regarding its performance on MIPLIB 2017 when configured to default settings. MindOpt Adapter aims to enhance CPLEX's performance by automatically applying improved configurations for solving optimization problems. Our testing demonstrates that MindOpt Adapter for CPLEX successfully solved 230 of the 240 problems in the MIPLIB 2017 benchmark set. This performance surpasses all the other solvers in terms of the number of problems solved and the geometric mean of running times. The report provides a comparison of the benchmark results against the outcomes achieved by CPLEX under its default configuration.
The Fuse XORier Lookup Table: Exploration, Implementation, and Revision of Probabilistic Sets and Maps
Abstract
This paper presents an exploration, implementation, and revision of probabilistic sets and maps, specifically focusing on Bloomier filters and related data structures. The paper introduces the Fuse XORier Lookup Table (FXLT), an enhanced version of the Bloomier filter incorporating spatial coupling, linear construction, and optimizations. The authors provide implementations in C and Python, comparing the FXLT's performance with other data structures like Bloom filters, XOR filters, binary fuse filters, hash tables, and red-black trees. The FXLT demonstrates improvements in both space and time efficiency over traditional Bloomier filters and appears competitive with hash tables for large datasets.
Time Lower Bounds for the Metropolis Process and Simulated Annealing
Authors: Zongchen Chen, Dan Mikulincer, Daniel Reichman, Alexander S. Wein
Subjects: Data Structures and Algorithms (cs.DS); Optimization and Control (math.OC); Probability (math.PR)
Abstract
The Metropolis process (MP) and Simulated Annealing (SA) are stochastic local search heuristics that are often used in solving combinatorial optimization problems. Despite significant interest, there are very few theoretical results regarding the quality of approximation obtained by MP and SA (with polynomially many iterations) for NP-hard optimization problems. We provide rigorous lower bounds for MP and SA with respect to the classical maximum independent set problem when the algorithms are initialized from the empty set. We establish the existence of a family of graphs for which both MP and SA fail to find approximate solutions in polynomial time. More specifically, we show that for any $\varepsilon \in (0,1)$ there are $n$-vertex graphs for which the probability SA (when limited to polynomially many iterations) will approximate the optimal solution within ratio $\Omega\left(\frac{1}{n^{1-\varepsilon}}\right)$ is exponentially small. Our lower bounds extend to graphs of constant average degree $d$, illustrating the failure of MP to achieve an approximation ratio of $\Omega\left(\frac{\log (d)}{d}\right)$ in polynomial time. In some cases, our impossibility results also go beyond Simulated Annealing and apply even when the temperature is chosen adaptively. Finally, we prove time lower bounds when the inputs to these algorithms are bipartite graphs, and even trees, which are known to admit polynomial-time algorithms for the independent set problem.
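A common textbook formulation of the Metropolis process for independent sets, started from the empty set, can be sketched as follows. The single-vertex update rule, the fixed `fugacity` parameter (which plays the role of an inverse temperature), and the function name are assumptions of this example and may differ in detail from the definition analyzed in the paper.

```python
import random

def metropolis_independent_set(adj, fugacity, steps, seed=0):
    """Metropolis chain on independent sets with stationary weight fugacity ** |I|.

    adj: adjacency lists, adj[v] is the list of neighbors of vertex v.
    """
    rng = random.Random(seed)
    n = len(adj)
    in_set = [False] * n  # start from the empty independent set
    for _ in range(steps):
        v = rng.randrange(n)
        if in_set[v]:
            # Removing v scales the weight by 1 / fugacity.
            if rng.random() < min(1.0, 1.0 / fugacity):
                in_set[v] = False
        elif not any(in_set[u] for u in adj[v]):
            # Adding v keeps the set independent and scales the weight by fugacity.
            if rng.random() < min(1.0, fugacity):
                in_set[v] = True
    return [v for v in range(n) if in_set[v]]
```

Simulated Annealing corresponds to letting the fugacity grow over the run according to a cooling schedule instead of holding it fixed; the schedule itself is not modeled here.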
Energy Efficiency Maximization for Intelligent Surfaces Aided Massive MIMO with Zero
Authors: Wilson de Souza Junior, Taufik Abrao
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
In this work, we address the energy efficiency (EE) maximization problem in a downlink communication system utilizing reconfigurable intelligent surface (RIS) in a multi-user massive multiple-input multiple-output (mMIMO) setup with zero-forcing (ZF) precoding. The channel between the base station (BS) and RIS operates under a Rician fading with Rician factor K1. Since systematically optimizing the RIS phase shifts in each channel coherence time interval is challenging and burdensome, we employ the statistical channel state information (CSI)-based optimization strategy to alleviate this overhead. By treating the RIS phase shifts matrix as a constant over multiple channel coherence time intervals, we can reduce the computational complexity while maintaining an interesting performance. Based on an ergodic rate (ER) lower bound closed-form, the EE optimization problem is formulated. Such a problem is non-convex and challenging to tackle due to the coupled variables. To circumvent such an obstacle, we explore the sequential optimization approach where the power allocation vector p, the number of antennas M, and the RIS phase shifts v are separated and sequentially solved iteratively until convergence. With the help of the Lagrangian dual method, fractional programming (FP) techniques, and Lemma 1, insightful compact closed-form expressions for each of the three optimization variables are derived. Simulation results validate the effectiveness of the proposed method across different generalized channel scenarios, including non-line-of-sight (NLoS) and partially line-of-sight (LoS) conditions. This underscores its potential to significantly reduce power consumption, decrease the number of active antennas at the base station, and effectively incorporate RIS structure in mMIMO communication setup with just statistical CSI knowledge.
Hierarchical Optimization of Metaheuristic Algorithms and Federated Learning for Enhanced Capacity Management and Load Balancing in HetNets
Authors: Saimin Chen Zhang
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Abstract
This research introduces a revolutionary paradigm for HetNet management, presenting an innovative algorithmic framework that transcends traditional notions of network capacity enhancement. Our exploration delves into the intricate interplay among distinct components, weaving together metaheuristic algorithms, Neural Networks optimization, and Federated Learning approaches. The primary focus is on optimizing capacity in IoT-based heterogeneous networks while ensuring impeccable coverage and data reliability. Employing multi-layer optimization methods, we propose a dynamic model for optimal transmission strategy, strategically allocating replicas within cloud computing environments to curtail data access costs. Our algorithm not only discerns optimal data replication locations but also navigates the delicate balance between spectral efficiency and ergodic capacity in cellular IoT networks with small cells using on/off control. The orchestrated interplay between metaheuristic algorithms, Neural Networks optimization, and Federated Learning orchestrates resource reallocation, attaining an optimal balance between spectral efficiency, power utility, and ergodic capacity based on Quality of Service (QoS) requirements. Simulation results corroborate the efficacy of our approach, showcasing enhanced tradeoffs between spectral efficiency and total ergodic capacity with diminished outage probability compared to prevailing algorithms across diverse scenarios.
Trochoid Search Optimization
Authors: Abdesslem Layeb
Subjects: Neural and Evolutionary Computing (cs.NE)
Abstract
This paper introduces the Trochoid Search Optimization Algorithm (TSO), a novel metaheuristic leveraging the mathematical properties of trochoid curves. The TSO algorithm employs a unique combination of simultaneous translational and rotational motions inherent in trochoids, fostering a refined equilibrium between explorative and exploitative search capabilities. Notably, TSO consists of two pivotal phases, global and local search, that collectively contribute to its efficiency and efficacy. Experimental validation demonstrates the TSO algorithm's remarkable performance across various benchmark functions, showcasing its competitive edge in balancing exploration and exploitation within the search space. A distinguishing feature of TSO lies in its simplicity, marked by a minimal requirement for user-defined parameters, making it an accessible yet powerful optimization tool.
Topology Learning for Heterogeneous Decentralized Federated Learning over Unreliable D2D Networks
Authors: Zheshun Wu, Zenglin Xu, Dun Zeng, Junfan Li, Jie Liu
Subjects: Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Abstract
With the proliferation of intelligent mobile devices in wireless device-to-device (D2D) networks, decentralized federated learning (DFL) has attracted significant interest. Compared to centralized federated learning (CFL), DFL mitigates the risk of central server failures due to communication bottlenecks. However, DFL faces several challenges, such as the severe heterogeneity of data distributions in diverse environments, and the transmission outages and packet errors caused by the adoption of the User Datagram Protocol (UDP) in D2D networks. These challenges often degrade the convergence of training DFL models. To address these challenges, we conduct a thorough theoretical convergence analysis for DFL and derive a convergence bound. By defining a novel quantity named unreliable links-aware neighborhood discrepancy in this convergence bound, we formulate a tractable optimization objective, and develop a novel Topology Learning method considering the Representation Discrepancy and Unreliable Links in DFL, named ToLRDUL. Intensive experiments under both feature skew and label skew settings have validated the effectiveness of our proposed method, demonstrating improved convergence speed and test accuracy, consistent with our theoretical findings.
Free Space Optical Integrated Sensing and Communication Based on DCO-OFDM: Performance Metrics and Resource Allocation
Authors: Yunfeng Wen, Fang Yang, Jian Song, Zhu Han
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP); Optimization and Control (math.OC)
Abstract
As one of the six usage scenarios of the sixth generation (6G) mobile communication system, integrated sensing and communication (ISAC) has garnered considerable attention, and numerous studies have been conducted on radio-frequency (RF)-ISAC. Benefitting from the communication and sensing capabilities of an optical system, free space optical (FSO)-ISAC becomes a potential complement to RF-ISAC. In this paper, a direct-current-biased optical orthogonal frequency division multiplexing (DCO-OFDM) scheme is proposed for FSO-ISAC. To derive the spectral efficiency for communication and the Fisher information for sensing as performance metrics, we model the clipping noise of DCO-OFDM as additive colored Gaussian noise to obtain the expression of the signal-to-noise ratio. Based on the derived performance metrics, joint power allocation problems are formulated for both communication-centric and sensing-centric scenarios. In addition, the non-convex joint optimization problems are decomposed into sub-problems for DC bias and subcarriers, which can be solved by block coordinate descent algorithms. Furthermore, numerical simulations demonstrate the proposed algorithms and reveal the trade-off between communication and sensing functionalities of the OFDM-based FSO-ISAC system.
Cross-Layer Optimization for Fault-Tolerant Deep Learning
Authors: Qing Zhang, Cheng Liu, Bo Liu, Haitong Huang, Ying Wang, Huawei Li, Xiaowei Li
Abstract
A fault-tolerant deep learning accelerator is the basis for highly reliable deep learning processing and is critical to deploying deep learning in safety-critical applications such as avionics and robotics. Since deep learning is known to be computing- and memory-intensive, traditional fault-tolerant approaches based on redundant computing will incur substantial overhead, including power consumption and chip area. To this end, we propose to characterize the difference in deep learning vulnerability across both neurons and the bits of each neuron, and to leverage this difference to enable selective protection of the deep learning processing components at the architecture layer and the circuit layer, respectively. At the same time, we observe the correlation between model quantization and the bit protection overhead of the underlying processing elements of deep learning accelerators, and propose to reduce the bit protection overhead by adding quantization constraints without compromising the model accuracy. Finally, we employ a Bayesian optimization strategy to co-optimize the correlated cross-layer design parameters at the algorithm layer, architecture layer, and circuit layer to minimize the hardware resource consumption while simultaneously fulfilling multiple user constraints, including reliability, accuracy, and performance of the deep learning processing.
Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models
Authors: Huan Ling, Seung Wook Kim, Antonio Torralba, Sanja Fidler, Karsten Kreis
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Text-guided diffusion models have revolutionized image and video generation and have also been successfully used for optimization-based 3D object synthesis. Here, we instead focus on the underexplored text-to-4D setting and synthesize dynamic, animated 3D objects using score distillation methods with an additional temporal dimension. Compared to previous work, we pursue a novel compositional generation-based approach, and combine text-to-image, text-to-video, and 3D-aware multiview diffusion models to provide feedback during 4D object optimization, thereby simultaneously enforcing temporal consistency, high-quality visual appearance and realistic geometry. Our method, called Align Your Gaussians (AYG), leverages dynamic 3D Gaussian Splatting with deformation fields as 4D representation. Crucial to AYG is a novel method to regularize the distribution of the moving 3D Gaussians and thereby stabilize the optimization and induce motion. We also propose a motion amplification mechanism as well as a new autoregressive synthesis scheme to generate and combine multiple 4D sequences for longer generation. These techniques allow us to synthesize vivid dynamic scenes, outperform previous work qualitatively and quantitatively and achieve state-of-the-art text-to-4D performance. Due to the Gaussian 4D representation, different 4D animations can be seamlessly combined, as we demonstrate. AYG opens up promising avenues for animation, simulation and digital content creation as well as synthetic data generation.
Optimal Beamforming for Secure Integrated Sensing and Communication Exploiting Target Location Distribution
Authors: Kaiyue Hou, Shuowen Zhang
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
In this paper, we study a secure integrated sensing and communication (ISAC) system where one multi-antenna base station (BS) simultaneously communicates with one single-antenna user and senses the location parameter of a target which serves as a potential eavesdropper via its reflected echo signals. In particular, we consider a challenging scenario where the target's location is unknown and random, while its distribution information is known a priori. First, we derive the posterior Cramér-Rao bound (PCRB) of the mean-squared error (MSE) in target location sensing, which has a complicated expression. To draw more insights, we derive a tight approximation of it in closed form, which indicates that the transmit beamforming should achieve a "probability-dependent power focusing" effect over possible target locations, with more power focused on highly-probable locations. Next, considering an artificial noise based beamforming structure, we formulate the transmit beamforming optimization problem to maximize the worst-case secrecy rate among all possible target (eavesdropper) locations, subject to a threshold on the sensing PCRB. The formulated problem is non-convex and difficult to solve. We show that the problem can be solved via a two-stage method, by first obtaining the optimal beamforming corresponding to any given threshold on the signal-to-interference-plus-noise ratio (SINR) at the eavesdropper, and then obtaining the optimal threshold via one-dimensional search. By applying the semi-definite relaxation (SDR) technique, we relax the first problem into a convex form and further prove that the relaxation is tight, based on which the optimal solution of the original beamforming optimization problem can be obtained with polynomial-time complexity. Then, we further propose two suboptimal solutions with lower complexity. Numerical results validate the effectiveness of our designs.
Age of Actuation and Timeliness: Semantics in a Wireless Power Transfer System
Authors: Ali Nikkhah, Anthony Ephremides, Nikolaos Pappas
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
In this paper, we investigate a model relevant to semantics-aware goal-oriented communications, and we propose a new metric that incorporates the utilization of information in addition to its timeliness. Specifically, we consider the transmission of observations from an external process to a battery-powered receiver through status updates. These updates inform the receiver about the process status and enable actuation if sufficient energy is available to achieve a goal. We focus on a wireless power transfer (WPT) model, where the receiver receives energy from a dedicated power transmitter and occasionally from the data transmitter when they share a common channel. We analyze the Age of Information (AoI) and propose a new metric, the Age of Actuation (AoA), which is relevant when the receiver utilizes the status updates to perform actions in a timely manner. We provide analytical characterizations of the average AoA and the violation probability of the AoA, demonstrating that AoA generalizes AoI. Moreover, we introduce and analytically characterize the Probability of Missing Actuation (PoMA); this metric also becomes relevant for quantifying the incurred cost of a missed action. We formulate unconstrained and constrained optimization problems for all the metrics and present numerical evaluations of our analytical results. This proposed set of metrics goes beyond the traditional timeliness metrics since the synergy of different flows is now considered.
Controllable 3D Face Generation with Conditional Style Code Diffusion
Authors: Xiaolong Shen, Jianxin Ma, Chang Zhou, Zongxin Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Generating photorealistic 3D faces from given conditions is a challenging task. Existing methods often rely on time-consuming one-by-one optimization approaches, which are not efficient for modeling the same distribution content, e.g., faces. Additionally, an ideal controllable 3D face generation model should consider both facial attributes and expressions. Thus we propose a novel approach called TEx-Face (TExt & Expression-to-Face) that addresses these challenges by dividing the task into three components, i.e., 3D GAN Inversion, Conditional Style Code Diffusion, and 3D Face Decoding. For 3D GAN inversion, we introduce two methods which aim to enhance the representation of style codes and alleviate 3D inconsistencies. Furthermore, we design a style code denoiser to incorporate multiple conditions into the style code and propose a data augmentation strategy to address the issue of insufficient paired visual-language data. Extensive experiments conducted on FFHQ, CelebA-HQ, and CelebA-Dialog demonstrate the promising performance of our TEx-Face in achieving the efficient and controllable generation of photorealistic 3D faces. The code will be available at https://github.com/sxl142/TEx-Face.
Rényi Pufferfish Privacy: General Additive Noise Mechanisms and Privacy Amplification by Iteration
Authors: Clément Pierquin, Aurélien Bellet, Marc Tommasi, Matthieu Boussard
Abstract
Pufferfish privacy is a flexible generalization of differential privacy that allows modeling arbitrary secrets and the adversary's prior knowledge about the data. Unfortunately, designing general and tractable Pufferfish mechanisms that do not compromise utility is challenging. Furthermore, this framework does not provide the composition guarantees needed for direct use in iterative machine learning algorithms. To mitigate these issues, we introduce a Rényi divergence-based variant of Pufferfish and show that it allows us to extend the applicability of the Pufferfish framework. We first generalize the Wasserstein mechanism to cover a wide range of noise distributions and introduce several ways to improve its utility. We also derive stronger guarantees against out-of-distribution adversaries. Finally, as an alternative to composition, we prove privacy amplification results for contractive noisy iterations and showcase the first use of Pufferfish in private convex optimization. A common ingredient underlying our results is the use and extension of shift reduction lemmas.
Risk-Sensitive Stochastic Optimal Control as Rao-Blackwellized Markovian Score Climbing
Authors: Hany Abdulsamad, Sahel Iqbal, Adrien Corenflos, Simo Särkkä
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
Abstract
Stochastic optimal control of dynamical systems is a crucial challenge in sequential decision-making. Recently, control-as-inference approaches have had considerable success, providing a viable risk-sensitive framework to address the exploration-exploitation dilemma. Nonetheless, a majority of these techniques only invoke the inference-control duality to derive a modified risk objective that is then addressed within a reinforcement learning framework. This paper introduces a novel perspective by framing risk-sensitive stochastic control as Markovian score climbing under samples drawn from a conditional particle filter. Our approach, while purely inference-centric, provides asymptotically unbiased estimates for gradient-based policy optimization with optimal importance weighting and no explicit value function learning. To validate our methodology, we apply it to the task of learning neural non-Gaussian feedback policies, showcasing its efficacy on numerical benchmarks of stochastic dynamical systems.
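GRIL-Calib: Targetless Ground Robot IMU-LiDAR Extrinsic Calibration Method using Ground Plane Motion Constraints
Abstract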
Targetless IMU-LiDAR extrinsic calibration methods are gaining significant attention as the importance of the IMU-LiDAR fusion system increases. Notably, existing calibration methods derive calibration parameters under the assumption that the methods require full motion in all axes. When the IMU and LiDAR are mounted on a ground robot whose motion is restricted to a plane, existing calibration methods are likely to exhibit degraded performance. To address this issue, we present GRIL-Calib: a novel targetless Ground Robot IMU-LiDAR Calibration method. Our proposed method leverages ground information to compensate for the lack of unrestricted full motion. First, we propose LiDAR Odometry (LO) using ground plane residuals to enhance calibration accuracy. Second, we propose the Ground Plane Motion (GPM) constraint and incorporate it into the optimization for calibration, enabling the determination of full 6-DoF extrinsic parameters, including the theoretically unobservable direction. Finally, unlike baseline methods, we formulate the calibration not as two sequential optimizations but as a single optimization (SO) problem, solving all calibration parameters simultaneously and improving accuracy. We validate GRIL-Calib by applying it to three public real-world datasets and comparing its performance with that of existing state-of-the-art methods in terms of accuracy and robustness. Our code is available at https://github.com/Taeyoung96/GRIL-Calib.
Towards Cooperative VRUs: Optimal Positioning Sampling for Pedestrian Awareness Messages
Authors: Jorge Martín-Pérez, Oscar Amador, Markus Rydeberg, Linnéa Olsson, Alexey Vinel
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Abstract
Road safety is the main motivation for Cooperative Intelligent Transport Systems (C-ITS) in general, and vehicular communications (V2X) technology in particular. V2X-based Vulnerable Road User (VRU) protection is an approach that relies on the persistent broadcasting of "beacon" awareness messages by a VRU mobile device. To this end, the European Telecommunications Standards Institute (ETSI) has specified the Vulnerable Road User Awareness Message (VAM) as well as the overall ITS-G5 protocol stack enabling a variety of V2X applications. This article studies how often pedestrians (a type of VRU) should check their position to issue a VAM. To that end, we characterize the rate at which pedestrians generate VAMs leveraging a recognized mobility model, and formulate an optimization problem to minimize the time elapsed between VAMs. We propose an algorithm to solve the problem in 802.11p and assess its accuracy through numerical and simulation campaigns. Results confirm the accuracy of our VAM rate characterization and show that we decrease the ETSI positioning sampling rate by more than 30%. In addition, our solution decreases the time between VAMs and increases the packet delivery ratio. In other words, our approach increases pedestrians' safety while reducing the battery consumption of mobile devices.
Keyword: adam
Parallel Trust-Region Approaches in Neural Network Training: Beyond Traditional Methods
Keyword: gradient
RealGen: Retrieval Augmented Generation for Controllable Traffic Scenarios
Abstract
Simulation plays a crucial role in the development of autonomous vehicles (AVs) due to the potential risks associated with real-world testing. Although significant progress has been made in the visual aspects of simulators, generating complex behavior among agents remains a formidable challenge. It is not only imperative to ensure realism in the scenarios generated but also essential to incorporate preferences and conditions to facilitate controllable generation for AV training and evaluation. Traditional methods, mainly relying on memorizing the distribution of training datasets, often fall short in generating unseen scenarios. Inspired by the success of retrieval augmented generation in large language models, we present RealGen, a novel retrieval-based in-context learning framework for traffic scenario generation. RealGen synthesizes new scenarios by combining behaviors from multiple retrieved examples in a gradient-free way, which may originate from templates or tagged scenarios. This in-context learning framework endows versatile generative capabilities, including the ability to edit scenarios, compose various behaviors, and produce critical scenarios. Evaluations show that RealGen offers considerable flexibility and controllability, marking a new direction in the field of controllable traffic scenario generation. Check our project website for more information: https://realgen.github.io.
Towards Fair Graph Federated Learning via Incentive Mechanisms
Authors: Chenglu Pan, Jiarong Xu, Yue Yu, Ziqi Yang, Qingbiao Wu, Chunping Wang, Lei Chen, Yang Yang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Social and Information Networks (cs.SI)
Abstract
Graph federated learning (FL) has emerged as a pivotal paradigm enabling multiple agents to collaboratively train a graph model while preserving local data privacy. Yet, current efforts overlook a key issue: agents are self-interested and would be hesitant to share data without fair and satisfactory incentives. This paper is the first endeavor to address this issue by studying the incentive mechanism for graph federated learning. We identify a unique phenomenon in graph federated learning: the presence of agents posing potential harm to the federation and agents contributing with delays. This stands in contrast to previous FL incentive mechanisms that assume all agents contribute positively and in a timely manner. In view of this, this paper presents a novel incentive mechanism tailored for fair graph federated learning, integrating incentives derived from both model gradient and payoff. To achieve this, we first introduce an agent valuation function aimed at quantifying agent contributions through the introduction of two criteria: gradient alignment and graph diversity. Moreover, due to the high heterogeneity in graph federated learning, striking a balance between accuracy and fairness becomes particularly crucial. We introduce motif prototypes to enhance accuracy, communicated between the server and agents, enhancing global model aggregation and aiding agents in local model optimization. Extensive experiments show that our model achieves the best trade-off between accuracy and the fairness of model gradient, as well as superior payoff fairness.
Multi-Model Wireless Federated Learning with Downlink Beamforming
Authors: Chong Zhang, Min Dong, Ben Liang, Ali Afana, Yahia Ahmed
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
This paper studies the design of wireless federated learning (FL) for simultaneously training multiple machine learning models. We consider round robin device-model assignment and downlink beamforming for concurrent multiple model updates. After formulating the joint downlink-uplink transmission process, we derive the per-model global update expression over communication rounds, capturing the effect of beamforming and noisy reception. To maximize the multi-model training convergence rate, we derive an upper bound on the optimality gap of the global model update and use it to formulate a multi-group multicast beamforming problem. We show that this problem can be converted to minimizing the sum of inverse signal-to-interference-plus-noise ratios (SINRs), which can be solved efficiently by projected gradient descent. Simulation shows that our proposed multi-model FL solution outperforms other alternatives, including conventional single-model sequential training and multi-model zero-forcing beamforming.
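As a rough illustration of the projected gradient descent idea mentioned in the abstract above, the following minimal sketch minimizes a sum of inverse SINRs in a deliberately simplified setting. It assumes an interference-free toy model in which user i has SINR `g_i * p_i`, with hypothetical channel gains `gains` and a total power budget `total_power`. The function names, the power floor, and the simplex projection are illustrative assumptions; this is not the paper's multi-group multicast beamforming formulation.

```python
def project_to_simplex(v, total):
    """Euclidean projection of v onto {x : x >= 0, sum(x) == total}."""
    cumulative = 0.0
    theta = 0.0
    for k, u_k in enumerate(sorted(v, reverse=True), start=1):
        cumulative += u_k
        candidate = (cumulative - total) / k
        if u_k - candidate > 0:
            theta = candidate  # keep the threshold of the last valid index
    return [max(x - theta, 0.0) for x in v]


def minimize_sum_inverse_sinr(gains, total_power, step=0.01, n_iters=2000, floor=1e-6):
    """Projected gradient descent on f(p) = sum_i 1 / (g_i * p_i).

    Toy assumption: no interference and unit noise, so SINR_i = g_i * p_i.
    The feasible set is {p : p_i >= floor, sum(p) == total_power}.
    """
    n = len(gains)
    p = [total_power / n] * n  # start from a uniform allocation
    for _ in range(n_iters):
        # Gradient of 1 / (g * x) with respect to x is -1 / (g * x^2).
        grad = [-1.0 / (g * x * x) for g, x in zip(gains, p)]
        moved = [x - step * d for x, d in zip(p, grad)]
        # Shift by the floor, project onto the simplex, then shift back.
        shifted = project_to_simplex([x - floor for x in moved], total_power - n * floor)
        p = [x + floor for x in shifted]
    return p
```

In this toy model the optimum allocates power proportionally to `1 / sqrt(g_i)`, so weaker channels receive more power; calling `minimize_sum_inverse_sinr([1.0, 4.0], 1.0)` should approach that allocation.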
Adaptive Decision-Objective Loss for Forecast-then-Optimize in Power Systems
Authors: Haipeng Zhang, Ran Li, Mingyang Sun, Teng Fei
Abstract
Forecast-then-optimize is a widely-used framework for decision-making problems in power systems. Traditionally, statistical losses have been employed to train forecasting models, but recent research demonstrated that improved decision utility in downstream optimization tasks can be achieved by using decision loss as an alternative. However, the implementation of decision loss in power systems faces challenges in 1) accommodating multi-stage decision-making problems where upstream optimality cannot guarantee final optimality; 2) adapting to dynamic environments such as changing parameters and nature of the problem like continuous or discrete optimization tasks. To this end, this paper proposes a novel adaptive decision-objective loss (ADOL) to address the above challenges. Specifically, ADOL first redefines the decision loss as objective utilities rather than objective loss to eliminate the need to manually set the optimal decision, thus ensuring the globally optimal decision. ADOL enables one-off training in a dynamic environment by introducing additional variables. The differentiability and convexity of ADOL provide useful gradients for forecasting model training in conjunction with continuous and discrete optimization tasks. Experiments are conducted for both linear programming-based and mixed integer linear programming-based power system two-stage dispatching cases with changing costs, and the results show that the proposed ADOL is capable of achieving globally optimal decision-making and adaptability to dynamic environments. The method can be extended to other multi-stage tasks in complex systems.
Sequential Multiuser Scheduling and Power Allocation for Cell-Free Multiple-Antenna Networks
Authors: S. Mashdour, A. Schmeink, R. C. de Lamare, J. P. Sales
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
Resource allocation is a fundamental task in cell-free (CF) massive multi-input multi-output (MIMO) systems, which can effectively improve the network performance. In this paper, we study the downlink of CF MIMO networks with network clustering and linear precoding, and develop a sequential multiuser scheduling and power allocation scheme. In particular, we present a multiuser scheduling algorithm based on greedy techniques and a gradient ascent (GA) power allocation algorithm for sum-rate maximization when imperfect channel state information (CSI) is considered. Numerical results show the superiority of the proposed sequential scheduling and power allocation scheme and algorithms to existing approaches while reducing the computational complexity and the signaling load.
Abstract
The capacity to generalize to future unseen data stands as one of the most crucial attributes of deep neural networks. Sharpness-Aware Minimization (SAM) aims to enhance the generalizability by minimizing worst-case loss using one-step gradient ascent as an approximation. However, as training progresses, the non-linearity of the loss landscape increases, rendering one-step gradient ascent less effective. On the other hand, multi-step gradient ascent will incur higher training cost. In this paper, we introduce a normalized Hessian trace to accurately measure the curvature of the loss landscape on both training and test sets. In particular, to counter excessive non-linearity of the loss landscape, we propose Curvature Regularized SAM (CR-SAM), integrating the normalized Hessian trace as a SAM regularizer. Additionally, we present an efficient way to compute the trace via finite differences with parallelism. Our theoretical analysis based on PAC-Bayes bounds establishes the regularizer's efficacy in reducing generalization error. Empirical evaluation on CIFAR and ImageNet datasets shows that CR-SAM consistently enhances classification performance for ResNet and Vision Transformer (ViT) models across various datasets. Our code is available at https://github.com/TrustAIoT/CR-SAM.
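To make the "one-step gradient ascent" in plain SAM concrete, here is a minimal sketch of a single SAM update on a generic differentiable loss. It only illustrates the baseline SAM step that the abstract builds on; it does not include the curvature regularizer of CR-SAM. The function name `sam_step`, the callback `grad_fn`, and the default values of `lr` and `rho` are assumptions made for illustration.

```python
import math


def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One plain SAM update (sketch).

    w       : list of parameters
    grad_fn : hypothetical callback returning the loss gradient at a point
    rho     : radius of the one-step ascent perturbation
    """
    g = grad_fn(w)
    norm = math.sqrt(sum(x * x for x in g))
    if norm == 0.0:
        return list(w)  # stationary point: nothing to perturb or update
    # One-step gradient ascent: move to the approximate worst case nearby.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Descend using the gradient evaluated at the perturbed point.
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]
```

Each step costs two gradient evaluations, which is why multi-step ascent variants are more expensive, as the abstract notes.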
Automatic Curriculum Learning with Gradient Reward Signals
Abstract
This paper investigates the impact of using gradient norm reward signals in the context of Automatic Curriculum Learning (ACL) for deep reinforcement learning (DRL). We introduce a framework where the teacher model, utilizing the gradient norm information of a student model, dynamically adapts the learning curriculum. This approach is based on the hypothesis that gradient norms can provide a nuanced and effective measure of learning progress. Our experimental setup involves several reinforcement learning environments (PointMaze, AntMaze, and AdroitHandRelocate), to assess the efficacy of our method. We analyze how gradient norm rewards influence the teacher's ability to craft challenging yet achievable learning sequences, ultimately enhancing the student's performance. Our results show that this approach not only accelerates the learning process but also leads to improved generalization and adaptability in complex tasks. The findings underscore the potential of gradient norm signals in creating more efficient and robust ACL systems, opening new avenues for research in curriculum learning and reinforcement learning.
Peer-to-Peer Learning + Consensus with Non-IID Data
A Learning oriented DLP System based on Classification Model
Authors: Kishu Gupta, Ashwani Kush
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Information Retrieval (cs.IR)
Abstract
Data is the key asset for organizations, and data sharing is a lifeline for organizational growth, but sharing may lead to data loss. Data leakage is the most critical issue faced by organizations. To mitigate data leakage, organizations deploy data leakage prevention systems (DLPSs) at various levels. DLPSs are capable of protecting all kinds of data, i.e., DAR, DIM/DIT, and DIU. Statistical analysis, regular expressions, and data fingerprinting are common approaches used in DLP systems. Of these techniques, statistical analysis is the most appropriate for the proposed DLP model of data security. This paper defines a statistical DLP model for document classification. The model uses various statistical approaches, such as TF-IDF (Term Frequency-Inverse Document Frequency), a renowned term count/weighting function, vectorization, and gradient boosting document classification, to classify documents before allowing any access to them. Machine learning is used to train and test the model. The proposed model also introduces an extremely efficient and more accurate approach, IGBCA (Improvised Gradient Boosting Classification Algorithm), for document classification to guard documents against possible data leakage. Results show that the proposed model can classify documents with high accuracy, on the basis of which data can be protected from loss.
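For readers unfamiliar with the TF-IDF weighting named in the abstract above, the following minimal sketch computes one common variant: term frequency normalized by document length, multiplied by the unsmoothed inverse document frequency `log(N / df)`. The function name `tf_idf` and the choice of variant are assumptions; the abstract does not state which TF-IDF formula or tokenization the paper uses.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document (sketch).

    docs is a list of token lists. Weight = (count / doc length) * log(N / df),
    where df is the number of documents containing the term.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

A term that appears in every document gets weight zero under this variant, so the resulting vectors emphasize terms that discriminate between documents before they are passed to a classifier.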
On the convergence of loss and uncertainty-based active learning algorithms
Authors: Daniel Haimovich, Dima Karamshuk, Fridolin Linder, Niek Tax, Milan Vojnovic
Abstract
We study convergence rates of loss and uncertainty-based active learning algorithms under various assumptions. First, we provide a set of conditions under which a convergence rate guarantee holds, and use this for linear classifiers and linearly separable datasets to show convergence rate guarantees for loss-based sampling and different loss functions. Second, we provide a framework that allows us to derive convergence rate bounds for loss-based sampling by deploying known convergence rate bounds for stochastic gradient descent algorithms. Third, and last, we propose an active learning algorithm that combines sampling of points and stochastic Polyak's step size. We show a condition on the sampling that ensures a convergence rate guarantee for this algorithm for smooth convex loss functions. Our numerical results demonstrate efficiency of our proposed algorithm.
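The stochastic Polyak step size mentioned above can be sketched as follows for a least-squares problem. This is a minimal illustration under stated assumptions: points are sampled uniformly (not with the loss-based sampling the paper studies), the per-sample optimal loss is taken to be zero (the data can be fit exactly), and the step is `f_i(w) / ||grad f_i(w)||^2` without the scaling constant or cap that some variants use. The function name `sgd_polyak` is hypothetical.

```python
import random


def sgd_polyak(features, targets, n_steps=2000, seed=0):
    """SGD with a stochastic Polyak step size on f_i(w) = 0.5 * (x_i . w - y_i)^2."""
    rng = random.Random(seed)
    w = [0.0] * len(features[0])
    for _ in range(n_steps):
        i = rng.randrange(len(features))  # uniform sampling (assumption)
        x, y = features[i], targets[i]
        residual = sum(wj * xj for wj, xj in zip(w, x)) - y
        loss = 0.5 * residual * residual
        grad = [residual * xj for xj in x]
        grad_sq = sum(g * g for g in grad)
        if grad_sq == 0.0:
            continue  # this sample is already fit exactly
        step = loss / grad_sq  # Polyak step, assuming the optimal loss f_i* is 0
        w = [wj - step * g for wj, g in zip(w, grad)]
    return w
```

The step size adapts to the current loss on the sampled point, so no learning rate needs to be tuned by hand in this interpolation setting.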
On Partial Optimal Transport: Revising the Infeasibility of Sinkhorn and Efficient Gradient Methods
Authors: Anh Duc Nguyen, Tuan Dung Nguyen, Quang Minh Nguyen, Hoang H. Nguyen, Kim-Chuan Toh
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
Abstract
This paper studies the Partial Optimal Transport (POT) problem between two unbalanced measures with at most $n$ supports and its applications in various AI tasks such as color transfer or domain adaptation. There is hence the need for fast approximations of POT with increasingly large problem sizes in arising applications. We first theoretically and experimentally investigate the infeasibility of the state-of-the-art Sinkhorn algorithm for POT due to its incompatible rounding procedure, which consequently degrades its qualitative performance in real world applications like point-cloud registration. To this end, we propose a novel rounding algorithm for POT, and then provide a feasible Sinkhorn procedure with a revised computation complexity of $\mathcal{\widetilde O}(n^2/\varepsilon^4)$. Our rounding algorithm also permits the development of two first-order methods to approximate the POT problem. The first algorithm, Adaptive Primal-Dual Accelerated Gradient Descent (APDAGD), finds an $\varepsilon$-approximate solution to the POT problem in $\mathcal{\widetilde O}(n^{2.5}/\varepsilon)$, which is better in $\varepsilon$ than revised Sinkhorn. The second method, Dual Extrapolation, achieves the computation complexity of $\mathcal{\widetilde O}(n^2/\varepsilon)$, thereby being the best in the literature. We further demonstrate the flexibility of POT compared to standard OT as well as the practicality of our algorithms on real applications where two marginal distributions are unbalanced.
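For context on the Sinkhorn procedure discussed above, here is a minimal sketch of the standard Sinkhorn scaling loop for balanced, entropy-regularized optimal transport. It is not the partial (POT) variant and does not include the rounding step that the abstract revises; the function name `sinkhorn` and the parameters `reg` and `n_iters` are illustrative assumptions.

```python
import math


def sinkhorn(a, b, cost, reg=0.1, n_iters=500):
    """Standard Sinkhorn iterations for balanced entropic OT (sketch).

    a, b : marginals with equal total mass
    cost : len(a) x len(b) cost matrix
    reg  : entropic regularization strength
    Returns a transport plan whose marginals approximate a and b.
    """
    n, m = len(a), len(b)
    kernel = [[math.exp(-c / reg) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
```

After a finite number of iterations the plan matches the marginals only approximately, which is why a rounding step is normally applied afterwards to obtain a feasible plan.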
Risk-Sensitive Stochastic Optimal Control as Rao-Blackwellized Markovian Score Climbing
Keyword: super-resolution
ECAMP: Entity-centered Context-aware Medical Vision Language Pre-training
Abstract
Despite significant advancements in medical vision-language pre-training, existing methods have largely overlooked the inherent entity-specific context within radiology reports and the complex cross-modality contextual relationships between text and images. To close this gap, we propose a novel Entity-centered Context-aware Medical Vision-language Pre-training (ECAMP) framework, which is designed to enable a more entity-centered and context-sensitive interpretation of medical data. Utilizing the recent powerful large language model, we distill entity-centered context from medical reports, which enables ECAMP to gain more effective supervision from the text modality. By further pre-training our model with carefully designed entity-aware, context-enhanced masked language modeling and context-guided super-resolution tasks, ECAMP significantly refines the interplay between text and image modalities, leading to an enhanced ability to extract entity-centered contextual features. Besides, our proposed multi-scale context fusion design also improves the semantic integration of both coarse and fine-level image representations, prompting better performance for multi-scale downstream applications. Combining these components leads to significant performance leaps over current state-of-the-art methods and establishes a new standard for cross-modality learning in medical imaging, whose effectiveness is demonstrated by our extensive experiments on various tasks including classification, segmentation, and detection across several public datasets. Code and models are available at https://github.com/ToniChopp/ECAMP.
EPNet: An Efficient Pyramid Network for Enhanced Single-Image Super-Resolution with Reduced Computational Requirements
Authors: Xin Xu, Jinman Park, Paul Fieguth
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Single-image super-resolution (SISR) has seen significant advancements through the integration of deep learning. However, the substantial computational and memory requirements of existing methods often limit their practical application. This paper introduces a new Efficient Pyramid Network (EPNet) that harmoniously merges an Edge Split Pyramid Module (ESPM) with a Panoramic Feature Extraction Module (PFEM) to overcome the limitations of existing methods, particularly in terms of computational efficiency. The ESPM applies a pyramid-based channel separation strategy, boosting feature extraction while maintaining computational efficiency. The PFEM, a novel fusion of CNN and Transformer structures, enables the concurrent extraction of local and global features, thereby providing a panoramic view of the image landscape. Our architecture integrates the PFEM in a manner that facilitates the streamlined exchange of feature information and allows for the further refinement of image texture details. Experimental results indicate that our model outperforms existing state-of-the-art methods in image resolution quality, while considerably decreasing computational and memory costs. This research contributes to the ongoing evolution of efficient and practical SISR methodologies, bearing broader implications for the field of computer vision.
A Comprehensive End-to-End Computer Vision Framework for Restoration and Recognition of Low-Quality Engineering Drawings
Abstract
The digitization of engineering drawings is crucial for efficient reuse, distribution, and archiving. Existing computer vision approaches for digitizing engineering drawings typically assume the input drawings have high quality. However, in reality, engineering drawings are often blurred and distorted due to improper scanning, storage, and transmission, which may jeopardize the effectiveness of existing approaches. This paper focuses on restoring and recognizing low-quality engineering drawings, where an end-to-end framework is proposed to improve the quality of the drawings and identify the graphical symbols on them. The framework uses K-means clustering to classify different engineering drawing patches into simple and complex texture patches based on their gray level co-occurrence matrix statistics. Computer vision operations and a modified Enhanced Super-Resolution Generative Adversarial Network (ESRGAN) model are then used to improve the quality of the two types of patches, respectively. A modified Faster Region-based Convolutional Neural Network (Faster R-CNN) model is used to recognize the quality-enhanced graphical symbols. Additionally, a multi-stage task-driven collaborative learning strategy is proposed to train the modified ESRGAN and Faster R-CNN models to improve the resolution of engineering drawings in the direction that facilitates graphical symbol recognition, rather than human visual perception. A synthetic data generation method is also proposed to construct quality-degraded samples for training the framework. Experiments on real-world electrical diagrams show that the proposed framework achieves an accuracy of 98.98% and a recall of 99.33%, demonstrating its superiority over previous approaches. Moreover, the framework is integrated into a widely-used power system software application to showcase its practicality.
HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models
Abstract
Recent progress in text-guided image inpainting, based on the unprecedented success of text-to-image diffusion models, has led to exceptionally realistic and visually plausible results. However, there is still significant potential for improvement in current text-to-image inpainting models, particularly in better aligning the inpainted area with user prompts and performing high-resolution inpainting. Therefore, in this paper we introduce HD-Painter, a completely training-free approach that accurately follows prompts and coherently scales to high-resolution image inpainting. To this end, we design the Prompt-Aware Introverted Attention (PAIntA) layer, which enhances self-attention scores with prompt information and results in generations with better text alignment. To further improve the prompt coherence, we introduce the Reweighting Attention Score Guidance (RASG) mechanism, seamlessly integrating a post-hoc sampling strategy into the general form of DDIM to prevent out-of-distribution latent shifts. Moreover, HD-Painter allows extension to larger scales by introducing a specialized super-resolution technique customized for inpainting, enabling the completion of missing regions in images of up to 2K resolution. Our experiments demonstrate that HD-Painter surpasses existing state-of-the-art approaches qualitatively and quantitatively, achieving an impressive generation accuracy improvement of 61.4% vs 51.9%. We will make the codes publicly available at: https://github.com/Picsart-AI-Research/HD-Painter
Keyword: sgd
Peer-to-Peer Learning + Consensus with Non-IID Data
Parallel Trust-Region Approaches in Neural Network Training: Beyond Traditional Methods
Keyword: optimization
Towards Fair Graph Federated Learning via Incentive Mechanisms
Unlocking Deep Learning: A BP-Free Approach for Parallel Block-Wise Training of Neural Networks
Ternary-type Opacity and Hybrid Odometry for RGB-only NeRF-SLAM
DSPy Assertions: Computational Constraints for Self-Refining Language Model Pipelines
Enhancing Optimization Through Innovation: The Multi-Strategy Improved Black Widow Optimization Algorithm (MSBWOA)
Meta-Learning with Versatile Loss Geometries for Fast Adaptation Using Mirror Descent
Adaptive Decision-Objective Loss for Forecast-then-Optimize in Power Systems
Secure Information Embedding in Images with Hybrid Firefly Algorithm
MindOpt Adapter for CPLEX Benchmarking Performance Analysis
The Fuse XORier Lookup Table: Exploration, Implementation, and Revision of Probabilistic Sets and Maps
Time Lower Bounds for the Metropolis Process and Simulated Annealing
Energy Efficiency Maximization for Intelligent Surfaces Aided Massive MIMO with Zero
Hierarchical Optimization of Metaheuristic Algorithms and Federated Learning for Enhanced Capacity Management and Load Balancing in HetNets
Trochoid Search Optimization
Topology Learning for Heterogeneous Decentralized Federated Learning over Unreliable D2D Networks
Free Space Optical Integrated Sensing and Communication Based on DCO-OFDM: Performance Metrics and Resource Allocation
Cross-Layer Optimization for Fault-Tolerant Deep Learning
Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models
Optimal Beamforming for Secure Integrated Sensing and Communication Exploiting Target Location Distribution
Age of Actuation and Timeliness: Semantics in a Wireless Power Transfer System
Controllable 3D Face Generation with Conditional Style Code Diffusion
Rényi Pufferfish Privacy: General Additive Noise Mechanisms and Privacy Amplification by Iteration
Risk-Sensitive Stochastic Optimal Control as Rao-Blackwellized Markovian Score Climbing
GRIL-Calib: Targetless Ground Robot IMU-LiDAR Extrinsic Calibration Method using Ground Plane Motion Constraints
Towards Cooperative VRUs: Optimal Positioning Sampling for Pedestrian Awareness Messages
Keyword: adam
Parallel Trust-Region Approaches in Neural Network Training: Beyond Traditional Methods
Keyword: gradient
RealGen: Retrieval Augmented Generation for Controllable Traffic Scenarios
Towards Fair Graph Federated Learning via Incentive Mechanisms
Multi-Model Wireless Federated Learning with Downlink Beamforming
Adaptive Decision-Objective Loss for Forecast-then-Optimize in Power Systems
Sequential Multiuser Scheduling and Power Allocation for Cell-Free Multiple-Antenna Networks
CR-SAM: Curvature Regularized Sharpness-Aware Minimization
Automatic Curriculum Learning with Gradient Reward Signals
Peer-to-Peer Learning + Consensus with Non-IID Data
A Learning oriented DLP System based on Classification Model
On the convergence of loss and uncertainty-based active learning algorithms
On Partial Optimal Transport: Revising the Infeasibility of Sinkhorn and Efficient Gradient Methods
Risk-Sensitive Stochastic Optimal Control as Rao-Blackwellized Markovian Score Climbing
Keyword: super-resolution
ECAMP: Entity-centered Context-aware Medical Vision Language Pre-training
EPNet: An Efficient Pyramid Network for Enhanced Single-Image Super-Resolution with Reduced Computational Requirements
A Comprehensive End-to-End Computer Vision Framework for Restoration and Recognition of Low-Quality Engineering Drawings
HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models