New submissions for Tue, 2 Jan 24

Keyword: sgd

DiffHybrid-UQ: Uncertainty Quantification for Differentiable Hybrid Neural Modeling

Authors: Authors: Deepak Akhare, Tengfei Luo, Jian-Xun Wang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.00161
Pdf link: https://arxiv.org/pdf/2401.00161
Abstract The hybrid neural differentiable models mark a significant advancement in the field of scientific machine learning. These models, integrating numerical representations of known physics into deep neural networks, offer enhanced predictive capabilities and show great potential for data-driven modeling of complex physical systems. However, a critical and yet unaddressed challenge lies in the quantification of inherent uncertainties stemming from multiple sources. Addressing this gap, we introduce a novel method, DiffHybrid-UQ, for effective and efficient uncertainty propagation and estimation in hybrid neural differentiable models, leveraging the strengths of deep ensemble Bayesian learning and nonlinear transformations. Specifically, our approach effectively discerns and quantifies both aleatoric uncertainties, arising from data noise, and epistemic uncertainties, resulting from model-form discrepancies and data sparsity. This is achieved within a Bayesian model averaging framework, where aleatoric uncertainties are modeled through hybrid neural models. The unscented transformation plays a pivotal role in enabling the flow of these uncertainties through the nonlinear functions within the hybrid model. In contrast, epistemic uncertainties are estimated using an ensemble of stochastic gradient descent (SGD) trajectories. This approach offers a practical approximation to the posterior distribution of both the network parameters and the physical parameters. Notably, the DiffHybrid-UQ framework is designed for simplicity in implementation and high scalability, making it suitable for parallel computing environments. The merits of the proposed method have been demonstrated through problems governed by both ordinary and partial differentiable equations.
Improving the Privacy and Practicality of Objective Perturbation for Differentially Private Linear Learners
Authors: Authors: Rachel Redberg, Antti Koskela, Yu-Xiang Wang
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2401.00583
Pdf link: https://arxiv.org/pdf/2401.00583
Abstract In the arena of privacy-preserving machine learning, differentially private stochastic gradient descent (DP-SGD) has outstripped the objective perturbation mechanism in popularity and interest. Though unrivaled in versatility, DP-SGD requires a non-trivial privacy overhead (for privately tuning the model's hyperparameters) and a computational complexity which might be extravagant for simple models such as linear and logistic regression. This paper revamps the objective perturbation mechanism with tighter privacy analyses and new computational tools that boost it to perform competitively with DP-SGD on unconstrained convex generalized linear problems.
Keyword: optimization

Semantic Computing for Organizational Effectiveness: From Organization Theory to Practice through Semantics-Based Modelling
Authors: Authors: Mena Rizk, Daniela Rosu, Mark Fox
Subjects: Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.00062
Pdf link: https://arxiv.org/pdf/2401.00062
Abstract A critical function of an organization is to foster the level of integration (coordination and cooperation) necessary to achieve its objectives. The need to coordinate and motivation to cooperate emerges from the myriad dependencies between an organization's members and their work. Therefore, to reason about solutions to coordination and cooperation problems requires a robust representation that includes the underlying dependencies. We find that such a representation remains missing from formal organizational models, and we leverage semantics to bridge this gap. Drawing on well-established organizational research and our extensive fieldwork with one of North America's largest municipalities, (1) we introduce an ontology, formalized in first-order logic, that operationalizes concepts like outcome, reward, and epistemic dependence, and their links to potential integration risks; and (2) present real-world applications of this ontology to analyze and support integration in complex government infrastructure projects. Our ontology is implemented and validated in both Z3 and OWL. Key features of our model include inferable dependencies, explainable coordination and cooperation risks, and actionable insights on how dependency structures within an organization can be altered to mitigate the risks. Conceptualizing real-world challenges like incentive misalignment, free-riding, and subgoal optimization in terms of dependency structures, our semantics-based approach represents a novel method for modelling and enhancing coordination and cooperation. Integrated within a decision-support system, our model may serve as an impactful aid for organizational design and effectiveness. More broadly, our approach underscores the transformative potential of semantics in deriving tangible, real-world value from existing organization theory.
Particle-Based Shape Modeling for Arbitrary Regions-of-Interest
Authors: Authors: Hong Xu, Alan Morris, Shireen Y. Elhabian
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.00067
Pdf link: https://arxiv.org/pdf/2401.00067
Abstract Statistical Shape Modeling (SSM) is a quantitative method for analyzing morphological variations in anatomical structures. These analyses often necessitate building models on targeted anatomical regions of interest to focus on specific morphological features. We propose an extension to \particle-based shape modeling (PSM), a widely used SSM framework, to allow shape modeling to arbitrary regions of interest. Existing methods to define regions of interest are computationally expensive and have topological limitations. To address these shortcomings, we use mesh fields to define free-form constraints, which allow for delimiting arbitrary regions of interest on shape surfaces. Furthermore, we add a quadratic penalty method to the model optimization to enable computationally efficient enforcement of any combination of cutting-plane and free-form constraints. We demonstrate the effectiveness of this method on a challenging synthetic dataset and two medical datasets.
Policy Optimization with Smooth Guidance Rewards Learned from Sparse-Reward Demonstrations
Authors: Authors: Guojian Wang, Faguo Wu, Xiao Zhang, Tianyuan Chen
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.00162
Pdf link: https://arxiv.org/pdf/2401.00162
Abstract The sparsity of reward feedback remains a challenging problem in online deep reinforcement learning (DRL). Previous approaches have utilized temporal credit assignment (CA) to achieve impressive results in multiple hard tasks. However, many CA methods relied on complex architectures or introduced sensitive hyperparameters to estimate the impact of state-action pairs. Meanwhile, the premise of the feasibility of CA methods is to obtain trajectories with sparse rewards, which can be troublesome in sparse-reward environments with large state spaces. To tackle these problems, we propose a simple and efficient algorithm called Policy Optimization with Smooth Guidance (POSG) that leverages a small set of sparse-reward demonstrations to make reliable and effective long-term credit assignments while efficiently facilitating exploration. The key idea is that the relative impact of state-action pairs can be indirectly estimated using offline demonstrations rather than directly leveraging the sparse reward trajectories generated by the agent. Specifically, we first obtain the trajectory importance by considering both the trajectory-level distance to demonstrations and the returns of the relevant trajectories. Then, the guidance reward is calculated for each state-action pair by smoothly averaging the importance of the trajectories through it, merging the demonstration's distribution and reward information. We theoretically analyze the performance improvement bound caused by smooth guidance rewards and derive a new worst-case lower bound on the performance improvement. Extensive results demonstrate POSG's significant advantages in control performance and convergence speed compared to benchmark DRL algorithms. Notably, the specific metrics and quantifiable results are investigated to demonstrate the superiority of POSG.
Block-Level MU-MISO Interference Exploitation Precoding: Optimal Structure and Explicit Duality
Authors: Authors: Junwen Yang, Ang Li, Xuewen Liao, Christos Masouros, A. L. Swindlehurst
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2401.00166
Pdf link: https://arxiv.org/pdf/2401.00166
Abstract This paper investigates block-level interference exploitation (IE) precoding for multi-user multiple-input single-output (MU-MISO) downlink systems. To overcome the need for symbol-level IE precoding to frequently update the precoding matrix, we propose to jointly optimize all the precoders or transmit signals within a transmission block. The resultant precoders only need to be updated once per block, and while not necessarily constant over all the symbol slots, we refer to the technique as block-level slot-variant IE precoding. Through a careful examination of the optimal structure and the explicit duality inherent in block-level power minimization (PM) and signal-to-interference-plus-noise ratio (SINR) balancing (SB) problems, we discover that the joint optimization can be decomposed into subproblems with smaller variable sizes. As a step further, we propose block-level slot-invariant IE precoding by adding a structural constraint on the slot-variant IE precoding to maintain a constant precoder throughout the block. A novel linear precoder for IE is further presented, and we prove that the proposed slot-variant and slot-invariant IE precoding share an identical solution when the number of symbol slots does not exceed the number of users. Numerical simulations demonstrate that the proposed precoders achieve a significant complexity reduction compared against benchmark schemes, without sacrificing performance.
Multiform Evolution for High-Dimensional Problems with Low Effective Dimensionality
Authors: Authors: Yaqing Hou, Mingyang Sun, Abhishek Gupta, Yaochu Jin, Haiyin Piao, Hongwei Ge, Qiang Zhang
Subjects: Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2401.00168
Pdf link: https://arxiv.org/pdf/2401.00168
Abstract In this paper, we scale evolutionary algorithms to high-dimensional optimization problems that deceptively possess a low effective dimensionality (certain dimensions do not significantly affect the objective function). To this end, an instantiation of the multiform optimization paradigm is presented, where multiple low-dimensional counterparts of a target high-dimensional task are generated via random embeddings. Since the exact relationship between the auxiliary (low-dimensional) tasks and the target is a priori unknown, a multiform evolutionary algorithm is developed for unifying all formulations into a single multi-task setting. The resultant joint optimization enables the target task to efficiently reuse solutions evolved across various low-dimensional searches via cross-form genetic transfers, hence speeding up overall convergence characteristics. To validate the overall efficacy of our proposed algorithmic framework, comprehensive experimental studies are carried out on well-known continuous benchmark functions as well as a set of practical problems in the hyper-parameter tuning of machine learning models and deep learning models in classification tasks and Predator-Prey games, respectively.
A Novel Explanation Against Linear Neural Networks
Authors: Authors: Anish Lakkapragada
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.00186
Pdf link: https://arxiv.org/pdf/2401.00186
Abstract Linear Regression and neural networks are widely used to model data. Neural networks distinguish themselves from linear regression with their use of activation functions that enable modeling nonlinear functions. The standard argument for these activation functions is that without them, neural networks only can model a line. However, a novel explanation we propose in this paper for the impracticality of neural networks without activation functions, or linear neural networks, is that they actually reduce both training and testing performance. Having more parameters makes LNNs harder to optimize, and thus they require more training iterations than linear regression to even potentially converge to the optimal solution. We prove this hypothesis through an analysis of the optimization of an LNN and rigorous testing comparing the performance between both LNNs and linear regression on synthethic, noisy datasets.
Real-Time and Security-Aware Precoding in RIS-Empowered Multi-User Wireless Networks
Authors: Authors: Abuzar B. M. Adam, Mohamed Amine Ouamri, Mohammed Saleh Ali Muthanna, Xingwang Li, Mohammed A. M. Elhassan, Ammar Muthanna
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2401.00192
Pdf link: https://arxiv.org/pdf/2401.00192
Abstract In this letter, we propose a deep-unfolding-based framework (DUNet) to maximize the secrecy rate in reconfigurable intelligent surface (RIS) empowered multi-user wireless networks. To tailor DUNet, first we relax the problem, decouple it into beamforming and phase shift subproblems, and propose an alternative optimization (AO) based solution for the relaxed problem. Second, we apply Karush-Kuhn-Tucker (KKT) conditions to obtain a closed-form solutions for the beamforming and the phase shift. Using deep-unfolding mechanism, we transform the closed-form solutions into a deep learning model (i.e., DUNet) that achieves a comparable performance to that of AO in terms of accuracy and about 25.6 times faster.
Open-TI: Open Traffic Intelligence with Augmented Language Model
Authors: Authors: Longchao Da, Kuanru Liou, Tiejin Chen, Xuesong Zhou, Xiangyong Luo, Yezhou Yang, Hua Wei
Subjects: Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.00211
Pdf link: https://arxiv.org/pdf/2401.00211
Abstract Transportation has greatly benefited the cities' development in the modern civilization process. Intelligent transportation, leveraging advanced computer algorithms, could further increase people's daily commuting efficiency. However, intelligent transportation, as a cross-discipline, often requires practitioners to comprehend complicated algorithms and obscure neural networks, bringing a challenge for the advanced techniques to be trusted and deployed in practical industries. Recognizing the expressiveness of the pre-trained large language models, especially the potential of being augmented with abilities to understand and execute intricate commands, we introduce Open-TI. Serving as a bridge to mitigate the industry-academic gap, Open-TI is an innovative model targeting the goal of Turing Indistinguishable Traffic Intelligence, it is augmented with the capability to harness external traffic analysis packages based on existing conversations. Marking its distinction, Open-TI is the first method capable of conducting exhaustive traffic analysis from scratch - spanning from map data acquisition to the eventual execution in complex simulations. Besides, Open-TI is able to conduct task-specific embodiment like training and adapting the traffic signal control policies (TSC), explore demand optimizations, etc. Furthermore, we explored the viability of LLMs directly serving as control agents, by understanding the expected intentions from Open-TI, we designed an agent-to-agent communication mode to support Open-TI conveying messages to ChatZero (control agent), and then the control agent would choose from the action space to proceed the execution. We eventually provide the formal implementation structure, and the open-ended design invites further community-driven enhancements.
Uncertainty-Penalized Reinforcement Learning from Human Feedback with Diverse Reward LoRA Ensembles
Authors: Authors: Yuanzhao Zhai, Han Zhang, Yu Lei, Yue Yu, Kele Xu, Dawei Feng, Bo Ding, Huaimin Wang
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.00243
Pdf link: https://arxiv.org/pdf/2401.00243
Abstract Reinforcement learning from human feedback (RLHF) emerges as a promising paradigm for aligning large language models (LLMs). However, a notable challenge in RLHF is overoptimization, where beyond a certain threshold, the pursuit of higher rewards leads to a decline in human preferences. In this paper, we observe the weakness of KL regularization which is commonly employed in existing RLHF methods to address overoptimization. To mitigate this limitation, we scrutinize the RLHF objective in the offline dataset and propose uncertainty-penalized RLHF (UP-RLHF), which incorporates uncertainty regularization during RL-finetuning. To enhance the uncertainty quantification abilities for reward models, we first propose a diverse low-rank adaptation (LoRA) ensemble by maximizing the nuclear norm of LoRA matrix concatenations. Then we optimize policy models utilizing penalized rewards, determined by both rewards and uncertainties provided by the diverse reward LoRA ensembles. Our experimental results, based on two real human preference datasets, showcase the effectiveness of diverse reward LoRA ensembles in quantifying reward uncertainty. Additionally, uncertainty regularization in UP-RLHF proves to be pivotal in mitigating overoptimization, thereby contributing to the overall performance.
Dual-space Hierarchical Learning for Goal-guided Conversational Recommendation
Authors: Authors: Can Chen, Hao Liu, Zeming Liu, Xue Liu, Dejing Dou
Subjects: Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2401.00272
Pdf link: https://arxiv.org/pdf/2401.00272
Abstract Proactively and naturally guiding the dialog from the non-recommendation context (e.g., Chit-chat) to the recommendation scenario (e.g., Music) is crucial for the Conversational Recommender System (CRS). Prior studies mainly focus on planning the next dialog goal~(e.g., chat on a movie star) conditioned on the previous dialog. However, we find the dialog goals can be simultaneously observed at different levels, which can be utilized to improve CRS. In this paper, we propose Dual-space Hierarchical Learning (DHL) to leverage multi-level goal sequences and their hierarchical relationships for conversational recommendation. Specifically, we exploit multi-level goal sequences from both the representation space and the optimization space. In the representation space, we propose the hierarchical representation learning where a cross attention module derives mutually enhanced multi-level goal representations. In the optimization space, we devise the hierarchical weight learning to reweight lower-level goal sequences, and introduce bi-level optimization for stable update. Additionally, we propose a soft labeling strategy to guide optimization gradually. Experiments on two real-world datasets verify the effectiveness of our approach. Code and data are available here.
Deep Generative Symbolic Regression
Authors: Authors: Samuel Holt, Zhaozhi Qian, Mihaela van der Schaar
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.00282
Pdf link: https://arxiv.org/pdf/2401.00282
Abstract Symbolic regression (SR) aims to discover concise closed-form mathematical equations from data, a task fundamental to scientific discovery. However, the problem is highly challenging because closed-form equations lie in a complex combinatorial search space. Existing methods, ranging from heuristic search to reinforcement learning, fail to scale with the number of input variables. We make the observation that closed-form equations often have structural characteristics and invariances (e.g., the commutative law) that could be further exploited to build more effective symbolic regression solutions. Motivated by this observation, our key contribution is to leverage pre-trained deep generative models to capture the intrinsic regularities of equations, thereby providing a solid foundation for subsequent optimization steps. We show that our novel formalism unifies several prominent approaches of symbolic regression and offers a new perspective to justify and improve on the previous ad hoc designs, such as the usage of cross-entropy loss during pre-training. Specifically, we propose an instantiation of our framework, Deep Generative Symbolic Regression (DGSR). In our experiments, we show that DGSR achieves a higher recovery rate of true equations in the setting of a larger number of input variables, and it is more computationally efficient at inference time than state-of-the-art RL symbolic regression solutions.
Near-Space Communications: The Last Piece of 6G Space-Air-Ground-Sea Integrated Network Puzzle
Authors: Authors: Hongshan Liu, Tong Qin, Zhen Gao, Keke Ying, Ziwei Wan, Jingjing Zhao, Rui Na, Zhongxiang Li, Gaojie Chen, Chun Hu, Ruiqi Liu, Yikun Mei, Tuan Li, Tianqi Mao, Shuo Wang, Dezhi Zheng
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2401.00283
Pdf link: https://arxiv.org/pdf/2401.00283
Abstract This article presents a comprehensive study on the emerging near-space communications (NS-COM) within the context of space-air-ground-sea integrated network (SAGSIN). Specifically, we firstly explore the recent technical developments of NS-COM, followed by the discussions about motivations behind integrating NS-COM into SAGSIN. To further demonstrate the necessity of NS-COM, a comparative analysis between the NS-COM network and other counterparts in SAGSIN is conducted, covering aspects of deployment, coverage and channel characteristics. Afterwards, the technical aspects of NS-COM, including channel modeling, random access, channel estimation, array-based beam management and joint network optimization, are examined in detail. Furthermore, we explore the potential applications of NS-COM, such as structural expansion in SAGSIN communications, remote and urgent communications, weather monitoring and carbon neutrality. Finally, some promising research avenues are identified, including near-space-ground direct links, reconfigurable multiple input multiple output (MIMO) array, federated learning assisted NS-COM, maritime communication and free space optical (FSO) communication. Overall, this paper highlights that the NS-COM plays an indispensable role in the SAGSIN puzzle, providing substantial performance and coverage enhancement to the traditional SAGSIN architecture.
Tight Finite Time Bounds of Two-Time-Scale Linear Stochastic Approximation with Markovian Noise
Authors: Authors: Shaan Ul Haque, Sajad Khodadadian, Siva Theja Maguluri
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2401.00364
Pdf link: https://arxiv.org/pdf/2401.00364
Abstract Stochastic approximation (SA) is an iterative algorithm to find the fixed point of an operator given noisy samples of this operator. SA appears in many areas such as optimization and Reinforcement Learning (RL). When implemented in practice, the noise that appears in the update of RL algorithms is naturally Markovian. Furthermore, in some settings, such as gradient TD, SA is employed in a two-time-scale manner. The mix of Markovian noise along with the two-time-scale structure results in an algorithm which is complex to analyze theoretically. In this paper, we characterize a tight convergence bound for the iterations of linear two-time-scale SA with Markovian noise. Our results show the convergence behavior of this algorithm given various choices of step sizes. Applying our result to the well-known TDC algorithm, we show the first $O(1/\epsilon)$ sample complexity for the convergence of this algorithm, outperforming all the previous work. Similarly, our results can be applied to establish the convergence behavior of a variety of RL algorithms, such as TD-learning with Polyak averaging, GTD, and GTD2.
Real-Time FJ/MAC PDE Solvers via Tensorized, Back-Propagation-Free Optical PINN Training
Authors: Authors: Yequan Zhao, Xian Xian, Xinling Yu, Ziyue Liu, Zhixiong Chen, Geza Kurczveil, Raymond G. Beausoleil, Zheng Zhang
Subjects: Machine Learning (cs.LG); Emerging Technologies (cs.ET); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2401.00413
Pdf link: https://arxiv.org/pdf/2401.00413
Abstract Solving partial differential equations (PDEs) numerically often requires huge computing time, energy cost, and hardware resources in practical applications. This has limited their applications in many scenarios (e.g., autonomous systems, supersonic flows) that have a limited energy budget and require near real-time response. Leveraging optical computing, this paper develops an on-chip training framework for physics-informed neural networks (PINNs), aiming to solve high-dimensional PDEs with fJ/MAC photonic power consumption and ultra-low latency. Despite the ultra-high speed of optical neural networks, training a PINN on an optical chip is hard due to (1) the large size of photonic devices, and (2) the lack of scalable optical memory devices to store the intermediate results of back-propagation (BP). To enable realistic optical PINN training, this paper presents a scalable method to avoid the BP process. We also employ a tensor-compressed approach to improve the convergence and scalability of our optical PINN training. This training framework is designed with tensorized optical neural networks (TONN) for scalable inference acceleration and MZI phase-domain tuning for \textit{in-situ} optimization. Our simulation results of a 20-dim HJB PDE show that our photonic accelerator can reduce the number of MZIs by a factor of $1.17\times 10^3$, with only $1.36$ J and $1.15$ s to solve this equation. This is the first real-size optical PINN training framework that can be applied to solve high-dimensional PDEs.
Is It Possible to Backdoor Face Forgery Detection with Natural Triggers?
Authors: Authors: Xiaoxuan Han, Songlin Yang, Wei Wang, Ziwen He, Jing Dong
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.00414
Pdf link: https://arxiv.org/pdf/2401.00414
Abstract Deep neural networks have significantly improved the performance of face forgery detection models in discriminating Artificial Intelligent Generated Content (AIGC). However, their security is significantly threatened by the injection of triggers during model training (i.e., backdoor attacks). Although existing backdoor defenses and manual data selection can mitigate those using human-eye-sensitive triggers, such as patches or adversarial noises, the more challenging natural backdoor triggers remain insufficiently researched. To further investigate natural triggers, we propose a novel analysis-by-synthesis backdoor attack against face forgery detection models, which embeds natural triggers in the latent space. We thoroughly study such backdoor vulnerability from two perspectives: (1) Model Discrimination (Optimization-Based Trigger): we adopt a substitute detection model and find the trigger by minimizing the cross-entropy loss; (2) Data Distribution (Custom Trigger): we manipulate the uncommon facial attributes in the long-tailed distribution to generate poisoned samples without the supervision from detection models. Furthermore, to completely evaluate the detection models towards the latest AIGC, we utilize both state-of-the-art StyleGAN and Stable Diffusion for trigger generation. Finally, these backdoor triggers introduce specific semantic features to the generated poisoned samples (e.g., skin textures and smile), which are more natural and robust. Extensive experiments show that our method is superior from three levels: (1) Attack Success Rate: ours achieves a high attack success rate (over 99%) and incurs a small model accuracy drop (below 0.2%) with a low poisoning rate (less than 3%); (2) Backdoor Defense: ours shows better robust performance when faced with existing backdoor defense methods; (3) Human Inspection: ours is less human-eye-sensitive from a comprehensive user study.
From Text to Pixels: A Context-Aware Semantic Synergy Solution for Infrared and Visible Image Fusion
Authors: Authors: Xingyuan Li, Yang Zou, Jinyuan Liu, Zhiying Jiang, Long Ma, Xin Fan, Risheng Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.00421
Pdf link: https://arxiv.org/pdf/2401.00421
Abstract With the rapid progression of deep learning technologies, multi-modality image fusion has become increasingly prevalent in object detection tasks. Despite its popularity, the inherent disparities in how different sources depict scene content make fusion a challenging problem. Current fusion methodologies identify shared characteristics between the two modalities and integrate them within this shared domain using either iterative optimization or deep learning architectures, which often neglect the intricate semantic relationships between modalities, resulting in a superficial understanding of inter-modal connections and, consequently, suboptimal fusion outcomes. To address this, we introduce a text-guided multi-modality image fusion method that leverages the high-level semantics from textual descriptions to integrate semantics from infrared and visible images. This method capitalizes on the complementary characteristics of diverse modalities, bolstering both the accuracy and robustness of object detection. The codebook is utilized to enhance a streamlined and concise depiction of the fused intra- and inter-domain dynamics, fine-tuned for optimal performance in detection tasks. We present a bilevel optimization strategy that establishes a nexus between the joint problem of fusion and detection, optimizing both processes concurrently. Furthermore, we introduce the first dataset of paired infrared and visible images accompanied by text prompts, paving the way for future research. Extensive experiments on several datasets demonstrate that our method not only produces visually superior fusion results but also achieves a higher detection mAP over existing methods, achieving state-of-the-art results.
Deeper and Wider Networks for Performance Metrics Prediction in Communication Networks
Authors: Authors: Aijia Liu, Shiqing Liu, Xiaobing Pei
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2401.00429
Pdf link: https://arxiv.org/pdf/2401.00429
Abstract In today's era, users have increasingly high expectations regarding the performance and efficiency of communication networks. Network operators aspire to achieve efficient network planning, operation, and optimization through Digital Twin Networks (DTN). The effectiveness of DTN heavily relies on the network model, with graph neural networks (GNN) playing a crucial role in network modeling. However, existing network modeling methods still lack a comprehensive understanding of communication networks. In this paper, we propose DWNet (Deeper and Wider Networks), a heterogeneous graph neural network modeling method based on data-driven approaches that aims to address end-to-end latency and jitter prediction in network models. This method stands out due to two distinctive features: firstly, it introduces deeper levels of state participation in the message passing process; secondly, it extensively integrates relevant features during the feature fusion process. Through experimental validation and evaluation, our model achieves higher prediction accuracy compared to previous research achievements, particularly when dealing with unseen network topologies during model training. Our model not only provides more accurate predictions but also demonstrates stronger generalization capabilities across diverse topological structures.
Data-driven Energy Efficiency Modelling in Large-scale Networks: An Expert Knowledge and ML-based Approach
Authors: Authors: D López-Pérez, A De Domenico, N Piovesan, M . Debbah
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.00443
Pdf link: https://arxiv.org/pdf/2401.00443
Abstract The energy consumption of mobile networks poses a critical challenge. Mitigating this concern necessitates the deployment and optimization of network energy-saving solutions, such as carrier shutdown, to dynamically manage network resources. Traditional optimization approaches encounter complexity due to factors like the large number of cells, stochastic traffic, channel variations, and intricate trade-offs. This paper introduces the simulated reality of communication networks (SRCON) framework, a novel, data-driven modeling paradigm that harnesses live network data and employs a blend of machine learning (ML)- and expert-based models. These mix of models accurately characterizes the functioning of network components, and predicts network energy efficiency and user equipment (UE) quality of service for any energy carrier shutdown configuration in a specific network. Distinguishing itself from existing methods, SRCON eliminates the reliance on expensive expert knowledge, drive testing, or incomplete maps for predicting network performance. This paper details the pipeline employed by SRCON to decompose the large network energy efficiency modeling problem into ML and expert-based submodels. It demonstrates how, by embracing stochasticity, and carefully crafting the relationship between such submodels, the overall computational complexity can be reduced and prediction accuracy enhanced. Results derived from real network data underscore the paradigm shift introduced by SRCON, showcasing significant gains over a state-of-the art method used by a operator for network energy efficiency modeling. The reliability of this local, data-driven modeling of the network proves to be a key asset for network energy-saving optimization.
Energy-Efficient Power Control for Multiple-Task Split Inference in UAVs: A Tiny Learning-Based Approach
Authors: Authors: Chenxi Zhao, Min Sheng, Junyu Liu, Tianshu Chu, Jiandong Li
Subjects: Machine Learning (cs.LG); Multiagent Systems (cs.MA); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2401.00445
Pdf link: https://arxiv.org/pdf/2401.00445
Abstract The limited energy and computing resources of unmanned aerial vehicles (UAVs) hinder the application of aerial artificial intelligence. The utilization of split inference in UAVs garners significant attention due to its effectiveness in mitigating computing and energy requirements. However, achieving energy-efficient split inference in UAVs remains complex considering of various crucial parameters such as energy level and delay constraints, especially involving multiple tasks. In this paper, we present a two-timescale approach for energy minimization in split inference, where discrete and continuous variables are segregated into two timescales to reduce the size of action space and computational complexity. This segregation enables the utilization of tiny reinforcement learning (TRL) for selecting discrete transmission modes for sequential tasks. Moreover, optimization programming (OP) is embedded between TRL's output and reward function to optimize the continuous transmit power. Specifically, we replace the optimization of transmit power with that of transmission time to decrease the computational complexity of OP since we reveal that energy consumption monotonically decreases with increasing transmission time. The replacement significantly reduces the feasible region and enables a fast solution according to the closed-form expression for optimal transmit power. Simulation results show that the proposed algorithm can achieve a higher probability of successful task completion with lower energy consumption.
User Clustering for STAR-RIS Assisted Full-Duplex NOMA Communication Systems
Authors: Authors: Abdelhamid Salem, Kai-Kit Wong, Chan-Byoung Chae, Yangyang Zhang
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2401.00447
Pdf link: https://arxiv.org/pdf/2401.00447
Abstract In contrast to conventional reconfigurable intelligent surface (RIS), simultaneous transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) has been proposed recently to enlarge the serving area from 180o to 360o coverage. This work considers the performance of a STAR-RIS aided full-duplex (FD) non-orthogonal multiple access (NOMA) communication systems. The STAR-RIS is implemented at the cell-edge to assist the cell-edge users, while the cell-center users can communicate directly with a FD base station (BS). We first introduce new user clustering schemes for the downlink and uplink transmissions. Then, based on the proposed transmission schemes closed-form expressions of the ergodic rates in the downlink and uplink modes are derived taking into account the system impairments caused by the self interference at the FD-BS and the imperfect successive interference cancellation (SIC). Moreover, an optimization problem to maximize the total sum-rate is formulated and solved by optimizing the amplitudes and the phase-shifts of the STAR-RIS elements and allocating the transmit power efficiently. The performance of the proposed user clustering schemes and the optimal STAR-RIS design are investigated through numerical results
On Learning for Ambiguous Chance Constrained Problems
Authors: Authors: A Ch Madhusudanarao, Rahul Singh
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2401.00547
Pdf link: https://arxiv.org/pdf/2401.00547
Abstract We study chance constrained optimization problems $\min_x f(x)$ s.t. $P(\left{ \theta: g(x,\theta)\le 0 \right})\ge 1-\epsilon$ where $\epsilon\in (0,1)$ is the violation probability, when the distribution $P$ is not known to the decision maker (DM). When the DM has access to a set of distributions $\mathcal{U}$ such that $P$ is contained in $\mathcal{U}$, then the problem is known as the ambiguous chance-constrained problem \cite{erdougan2006ambiguous}. We study ambiguous chance-constrained problem for the case when $\mathcal{U}$ is of the form $\left{\mu:\frac{\mu (y)}{\nu(y)}\leq C, \forall y\in\Theta, \mu(y)\ge 0\right}$, where $\nu$ is a reference distribution.'' We show that in this case the original problem can bewell-approximated'' by a sampled problem in which $N$ i.i.d. samples of $\theta$ are drawn from $\nu$, and the original constraint is replaced with $g(x,\theta_i)\le 0,~i=1,2,\ldots,N$. We also derive the sample complexity associated with this approximation, i.e., for $\epsilon,\delta>0$ the number of samples which must be drawn from $\nu$ so that with a probability greater than $1-\delta$ (over the randomness of $\nu$), the solution obtained by solving the sampled program yields an $\epsilon$-feasible solution for the original chance constrained problem.
GD^2-NeRF: Generative Detail Compensation via GAN and Diffusion for One-shot Generalizable Neural Radiance Fields
Authors: Authors: Xiao Pan, Zongxin Yang, Shuai Bai, Yi Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.00616
Pdf link: https://arxiv.org/pdf/2401.00616
Abstract In this paper, we focus on the One-shot Novel View Synthesis (O-NVS) task which targets synthesizing photo-realistic novel views given only one reference image per scene. Previous One-shot Generalizable Neural Radiance Fields (OG-NeRF) methods solve this task in an inference-time finetuning-free manner, yet suffer the blurry issue due to the encoder-only architecture that highly relies on the limited reference image. On the other hand, recent diffusion-based image-to-3d methods show vivid plausible results via distilling pre-trained 2D diffusion models into a 3D representation, yet require tedious per-scene optimization. Targeting these issues, we propose the GD^2-NeRF, a Generative Detail compensation framework via GAN and Diffusion that is both inference-time finetuning-free and with vivid plausible details. In detail, following a coarse-to-fine strategy, GD^2-NeRF is mainly composed of a One-stage Parallel Pipeline (OPP) and a 3D-consistent Detail Enhancer (Diff3DE). At the coarse stage, OPP first efficiently inserts the GAN model into the existing OG-NeRF pipeline for primarily relieving the blurry issue with in-distribution priors captured from the training dataset, achieving a good balance between sharpness (LPIPS, FID) and fidelity (PSNR, SSIM). Then, at the fine stage, Diff3DE further leverages the pre-trained image diffusion models to complement rich out-distribution details while maintaining decent 3D consistency. Extensive experiments on both the synthetic and real-world datasets show that GD$^2$-NeRF noticeably improves the details while without per-scene finetuning.
Beyond Efficiency: A Systematic Survey of Resource-Efficient Large Language Models
Authors: Authors: Guangji Bai, Zheng Chai, Chen Ling, Shiyu Wang, Jiaying Lu, Nan Zhang, Tingwei Shi, Ziyang Yu, Mengdan Zhu, Yifei Zhang, Carl Yang, Yue Cheng, Liang Zhao
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.00625
Pdf link: https://arxiv.org/pdf/2401.00625
Abstract The burgeoning field of Large Language Models (LLMs), exemplified by sophisticated models like OpenAI's ChatGPT, represents a significant advancement in artificial intelligence. These models, however, bring forth substantial challenges in the high consumption of computational, memory, energy, and financial resources, especially in environments with limited resource capabilities. This survey aims to systematically address these challenges by reviewing a broad spectrum of techniques designed to enhance the resource efficiency of LLMs. We categorize methods based on their optimization focus: computational, memory, energy, financial, and network resources and their applicability across various stages of an LLM's lifecycle, including architecture design, pretraining, finetuning, and system design. Additionally, the survey introduces a nuanced categorization of resource efficiency techniques by their specific resource types, which uncovers the intricate relationships and mappings between various resources and corresponding optimization techniques. A standardized set of evaluation metrics and datasets is also presented to facilitate consistent and fair comparisons across different models and techniques. By offering a comprehensive overview of the current sota and identifying open research avenues, this survey serves as a foundational reference for researchers and practitioners, aiding them in developing more sustainable and efficient LLMs in a rapidly evolving landscape.
Adversarially Trained Actor Critic for offline CMDPs
Authors: Authors: Honghao Wei, Xiyue Peng, Xin Liu, Arnob Ghosh
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.00629
Pdf link: https://arxiv.org/pdf/2401.00629
Abstract We propose a Safe Adversarial Trained Actor Critic (SATAC) algorithm for offline reinforcement learning (RL) with general function approximation in the presence of limited data coverage. SATAC operates as a two-player Stackelberg game featuring a refined objective function. The actor (leader player) optimizes the policy against two adversarially trained value critics (follower players), who focus on scenarios where the actor's performance is inferior to the behavior policy. Our framework provides both theoretical guarantees and a robust deep-RL implementation. Theoretically, we demonstrate that when the actor employs a no-regret optimization oracle, SATAC achieves two guarantees: (i) For the first time in the offline RL setting, we establish that SATAC can produce a policy that outperforms the behavior policy while maintaining the same level of safety, which is critical to designing an algorithm for offline RL. (ii) We demonstrate that the algorithm guarantees policy improvement across a broad range of hyperparameters, indicating its practical robustness. Additionally, we offer a practical version of SATAC and compare it with existing state-of-the-art offline safe-RL algorithms in continuous control environments. SATAC outperforms all baselines across a range of tasks, thus validating the theoretical performance.
TBDD: A New Trust-based, DRL-driven Framework for Blockchain Sharding in IoT
Authors: Authors: Zixu Zhang, Guangsheng Yu, Caijun Sun, Xu Wang, Ying Wang, Ming Zhang, Wei Ni, Ren Ping Liu, Andrew Reeves, Nektarios Georgalas
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2401.00632
Pdf link: https://arxiv.org/pdf/2401.00632
Abstract Integrating sharded blockchain with IoT presents a solution for trust issues and optimized data flow. Sharding boosts blockchain scalability by dividing its nodes into parallel shards, yet it's vulnerable to the $1\%$ attacks where dishonest nodes target a shard to corrupt the entire blockchain. Balancing security with scalability is pivotal for such systems. Deep Reinforcement Learning (DRL) adeptly handles dynamic, complex systems and multi-dimensional optimization. This paper introduces a Trust-based and DRL-driven (\textsc{TbDd}) framework, crafted to counter shard collusion risks and dynamically adjust node allocation, enhancing throughput while maintaining network security. With a comprehensive trust evaluation mechanism, \textsc{TbDd} discerns node types and performs targeted resharding against potential threats. The model maximizes tolerance for dishonest nodes, optimizes node movement frequency, ensures even node distribution in shards, and balances sharding risks. Rigorous evaluations prove \textsc{TbDd}'s superiority over conventional random-, community-, and trust-based sharding methods in shard risk equilibrium and reducing cross-shard transactions.
Hybrid physics-informed metabolic cybergenetics: process rates augmented with machine-learning surrogates informed by flux balance analysis
Authors: Authors: Sebastián Espinel-Ríos, José L. Avalos
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2401.00670
Pdf link: https://arxiv.org/pdf/2401.00670
Abstract Metabolic cybergenetics is a promising concept that interfaces gene expression and cellular metabolism with computers for real-time dynamic metabolic control. The focus is on control at the transcriptional level, serving as a means to modulate intracellular metabolic fluxes. Recent strategies in this field have employed constraint-based dynamic models for process optimization, control, and estimation. However, this results in bilevel dynamic optimization problems, which pose considerable numerical and conceptual challenges. In this study, we present an alternative hybrid physics-informed dynamic modeling framework for metabolic cybergenetics, aimed at simplifying optimization, control, and estimation tasks. By utilizing machine-learning surrogates, our approach effectively embeds the physics of metabolic networks into the process rates of structurally simpler macro-kinetic models coupled with gene expression. These surrogates, informed by flux balance analysis, link the domains of manipulatable intracellular enzymes to metabolic exchange fluxes. This ensures that critical knowledge captured by the system's metabolic network is preserved. The resulting models can be integrated into metabolic cybergenetic schemes involving single-level optimizations. Additionally, the hybrid modeling approach maintains the number of system states at a necessary minimum, easing the burden of process monitoring and estimation. Our hybrid physics-informed metabolic cybergenetic framework is demonstrated using a computational case study on the optogenetically-assisted production of itaconate by $\textit{Escherichia coli}$.
SecFormer: Towards Fast and Accurate Privacy-Preserving Inference for Large Language Models
Authors: Authors: Jinglong Luo, Yehong Zhang, Jiaqi Zhang, Xin Mu, Hui Wang, Yue Yu, Zenglin Xu
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2401.00793
Pdf link: https://arxiv.org/pdf/2401.00793
Abstract With the growing use of large language models hosted on cloud platforms to offer inference services, privacy concerns are escalating, especially concerning sensitive data like investment plans and bank account details. Secure Multi-Party Computing (SMPC) emerges as a promising solution to protect the privacy of inference data and model parameters. However, the application of SMPC in Privacy-Preserving Inference (PPI) for large language models, particularly those based on the Transformer architecture, often leads to considerable slowdowns or declines in performance. This is largely due to the multitude of nonlinear operations in the Transformer architecture, which are not well-suited to SMPC and are difficult to circumvent or optimize effectively. To address this concern, we introduce an advanced optimization framework called SecFormer, designed to strike an optimal balance between performance and efficiency in PPI for Transformer models. By implementing knowledge distillation techniques, we successfully eliminate the high-cost exponential and maximum operations in PPI without sacrificing model performance. Additionally, we have developed a suite of efficient SMPC protocols that utilize segmented polynomials and Goldschmidt's method to handle other complex nonlinear functions within PPI, such as GeLU, LayerNorm, and Softmax. Our extensive experiments reveal that SecFormer outperforms MPCFormer in performance, showing improvements of $5.6\%$ and $24.2\%$ for BERT${\text{BASE}}$ and BERT${\text{LARGE}}$, respectively. In terms of efficiency, SecFormer is 3.4 and 3.2 times faster than Puma, demonstrating its effectiveness and speed.
Noise-Aware and Equitable Urban Air Traffic Management: An Optimization Approach
Authors: Authors: Zhenyu Gao, Yue Yu, Qinshuang Wei, Ufuk Topcu, John-Paul Clarke
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2401.00806
Pdf link: https://arxiv.org/pdf/2401.00806
Abstract Urban air mobility (UAM), a transformative concept for the transport of passengers and cargo, faces several integration challenges in complex urban environments. Community acceptance of aircraft noise is among the most noticeable of these challenges when launching or scaling up a UAM system. Properly managing community noise is fundamental to establishing a UAM system that is environmentally and socially sustainable. In this work, we develop a holistic and equitable approach to manage UAM air traffic and its community noise impact in urban environments. The proposed approach is a hybrid approach that considers a mix of different noise mitigation strategies, including limiting the number of operations, cruising at higher altitudes, and ambient noise masking. We tackle the problem through the lens of network system control and formulate a multi-objective optimization model for managing traffic flow in a multi-layer UAM network while concurrently pursuing demand fulfillment, noise control, and energy saving. Further, we use a social welfare function in the optimization model as the basis for the efficiency-fairness trade-off in both demand fulfillment and noise control. We apply the proposed approach to a comprehensive case study in the city of Austin and perform design trade-offs through both visual and quantitative analyses.
Keyword: adam

A comprehensive framework for occluded human pose estimation
Authors: Authors: Linhao Xu, Lin Zhao, Xinxin Sun, Guangyu Li, Kedong Yan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.00155
Pdf link: https://arxiv.org/pdf/2401.00155
Abstract Occlusion presents a significant challenge in human pose estimation. The challenges posed by occlusion can be attributed to the following factors: 1) Data: The collection and annotation of occluded human pose samples are relatively challenging. 2) Feature: Occlusion can cause feature confusion due to the high similarity between the target person and interfering individuals. 3) Inference: Robust inference becomes challenging due to the loss of complete body structural information. The existing methods designed for occluded human pose estimation usually focus on addressing only one of these factors. In this paper, we propose a comprehensive framework DAG (Data, Attention, Graph) to address the performance degradation caused by occlusion. Specifically, we introduce the mask joints with instance paste data augmentation technique to simulate occlusion scenarios. Additionally, an Adaptive Discriminative Attention Module (ADAM) is proposed to effectively enhance the features of target individuals. Furthermore, we present the Feature-Guided Multi-Hop GCN (FGMP-GCN) to fully explore the prior knowledge of body structure and improve pose estimation results. Through extensive experiments conducted on three benchmark datasets for occluded human pose estimation, we demonstrate that the proposed method outperforms existing methods. Code and data will be publicly available.
Keyword: gradient

DiffHybrid-UQ: Uncertainty Quantification for Differentiable Hybrid Neural Modeling
Authors: Authors: Deepak Akhare, Tengfei Luo, Jian-Xun Wang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.00161
Pdf link: https://arxiv.org/pdf/2401.00161
Abstract The hybrid neural differentiable models mark a significant advancement in the field of scientific machine learning. These models, integrating numerical representations of known physics into deep neural networks, offer enhanced predictive capabilities and show great potential for data-driven modeling of complex physical systems. However, a critical and yet unaddressed challenge lies in the quantification of inherent uncertainties stemming from multiple sources. Addressing this gap, we introduce a novel method, DiffHybrid-UQ, for effective and efficient uncertainty propagation and estimation in hybrid neural differentiable models, leveraging the strengths of deep ensemble Bayesian learning and nonlinear transformations. Specifically, our approach effectively discerns and quantifies both aleatoric uncertainties, arising from data noise, and epistemic uncertainties, resulting from model-form discrepancies and data sparsity. This is achieved within a Bayesian model averaging framework, where aleatoric uncertainties are modeled through hybrid neural models. The unscented transformation plays a pivotal role in enabling the flow of these uncertainties through the nonlinear functions within the hybrid model. In contrast, epistemic uncertainties are estimated using an ensemble of stochastic gradient descent (SGD) trajectories. This approach offers a practical approximation to the posterior distribution of both the network parameters and the physical parameters. Notably, the DiffHybrid-UQ framework is designed for simplicity in implementation and high scalability, making it suitable for parallel computing environments. The merits of the proposed method have been demonstrated through problems governed by both ordinary and partial differentiable equations.
A unified structure-preserving parametric finite element method for anisotropic surface diffusion
Authors: Authors: Weizhu Bao, Yifei Li
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2401.00207
Pdf link: https://arxiv.org/pdf/2401.00207
Abstract We propose and analyze a unified structure-preserving parametric finite element method (SP-PFEM) for the anisotropic surface diffusion of curves in two dimensions $(d=2)$ and surfaces in three dimensions $(d=3)$ with an arbitrary anisotropic surface energy density $\gamma(\boldsymbol{n})$, where $\boldsymbol{n}\in \mathbb{S}^{d-1}$ represents the outward unit vector. By introducing a novel unified surface energy matrix $\boldsymbol{G}_k(\boldsymbol{n})$ depending on $\gamma(\boldsymbol{n})$, the Cahn--Hoffman $\boldsymbol{\xi}$-vector and a stabilizing function $k(\boldsymbol{n}):\ \mathbb{S}^{d-1}\to {\mathbb R}$, we obtain a unified and conservative variational formulation for the anisotropic surface diffusion via different surface differential operators including the surface gradient operator, the surface divergence operator and the surface Laplace--Beltrami operator. A SP-PFEM discretization is presented for the variational problem. In order to establish the unconditional energy stability of the proposed SP-PFEM under a very mild condition on $\gamma(\boldsymbol{n})$, we propose a new framework via {\sl local energy estimate} for proving energy stability/structure-preserving properties of the parametric finite element method for the anisotropic surface diffusion. This framework sheds light on how to prove unconditional energy stability of other numerical methods for geometric partial differential equations. Extensive numerical results are reported to demonstrate the efficiency and accuracy as well as structure-preserving properties of the proposed SP-PFEM for the anisotropic surface diffusion with arbitrary anisotropic surface energy density $\gamma(\boldsymbol{n})$ arising from different applications.
BusReF: Infrared-Visible images registration and fusion focus on reconstructible area using one set of features
Authors: Authors: Zeyang Zhang, Hui Li, Tianyang Xu, Xiaojun Wu, Josef Kittler
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.00285
Pdf link: https://arxiv.org/pdf/2401.00285
Abstract In a scenario where multi-modal cameras are operating together, the problem of working with non-aligned images cannot be avoided. Yet, existing image fusion algorithms rely heavily on strictly registered input image pairs to produce more precise fusion results, as a way to improve the performance of downstream high-level vision tasks. In order to relax this assumption, one can attempt to register images first. However, the existing methods for registering multiple modalities have limitations, such as complex structures and reliance on significant semantic information. This paper aims to address the problem of image registration and fusion in a single framework, called BusRef. We focus on Infrared-Visible image registration and fusion task (IVRF). In this framework, the input unaligned image pairs will pass through three stages: Coarse registration, Fine registration and Fusion. It will be shown that the unified approach enables more robust IVRF. We also propose a novel training and evaluation strategy, involving the use of masks to reduce the influence of non-reconstructible regions on the loss functions, which greatly improves the accuracy and robustness of the fusion task. Last but not least, a gradient-aware fusion network is designed to preserve the complementary information. The advanced performance of this algorithm is demonstrated by
Tight Finite Time Bounds of Two-Time-Scale Linear Stochastic Approximation with Markovian Noise
Authors: Authors: Shaan Ul Haque, Sajad Khodadadian, Siva Theja Maguluri
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2401.00364
Pdf link: https://arxiv.org/pdf/2401.00364
Abstract Stochastic approximation (SA) is an iterative algorithm to find the fixed point of an operator given noisy samples of this operator. SA appears in many areas such as optimization and Reinforcement Learning (RL). When implemented in practice, the noise that appears in the update of RL algorithms is naturally Markovian. Furthermore, in some settings, such as gradient TD, SA is employed in a two-time-scale manner. The mix of Markovian noise along with the two-time-scale structure results in an algorithm which is complex to analyze theoretically. In this paper, we characterize a tight convergence bound for the iterations of linear two-time-scale SA with Markovian noise. Our results show the convergence behavior of this algorithm given various choices of step sizes. Applying our result to the well-known TDC algorithm, we show the first $O(1/\epsilon)$ sample complexity for the convergence of this algorithm, outperforming all the previous work. Similarly, our results can be applied to establish the convergence behavior of a variety of RL algorithms, such as TD-learning with Polyak averaging, GTD, and GTD2.
Client-wise Modality Selection for Balanced Multi-modal Federated Learning
Authors: Authors: Yunfeng Fan, Wenchao Xu, Haozhao Wang, Penghui Ruan, Song Guo
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Arxiv link: https://arxiv.org/abs/2401.00403
Pdf link: https://arxiv.org/pdf/2401.00403
Abstract Selecting proper clients to participate in the iterative federated learning (FL) rounds is critical to effectively harness a broad range of distributed datasets. Existing client selection methods simply consider the variability among FL clients with uni-modal data, however, have yet to consider clients with multi-modalities. We reveal that traditional client selection scheme in MFL may suffer from a severe modality-level bias, which impedes the collaborative exploitation of multi-modal data, leading to insufficient local data exploration and global aggregation. To tackle this challenge, we propose a Client-wise Modality Selection scheme for MFL (CMSFed) that can comprehensively utilize information from each modality via avoiding such client selection bias caused by modality imbalance. Specifically, in each MFL round, the local data from different modalities are selectively employed to participate in local training and aggregation to mitigate potential modality imbalance of the global model. To approximate the fully aggregated model update in a balanced way, we introduce a novel local training loss function to enhance the weak modality and align the divergent feature spaces caused by inconsistent modality adoption strategies for different clients simultaneously. Then, a modality-level gradient decoupling method is designed to derive respective submodular functions to maintain the gradient diversity during the selection progress and balance MFL according to local modality imbalance in each iteration. Our extensive experiments showcase the superiority of CMSFed over baselines and its effectiveness in multi-modal data exploitation.
Diff-PCR: Diffusion-Based Correspondence Searching in Doubly Stochastic Matrix Space for Point Cloud Registration
Authors: Authors: Qianliang Wu, Haobo Jiang, Yaqing Ding, Lei Luo, Jin Xie, Jian Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.00436
Pdf link: https://arxiv.org/pdf/2401.00436
Abstract Efficiently finding optimal correspondences between point clouds is crucial for solving both rigid and non-rigid point cloud registration problems. Existing methods often rely on geometric or semantic feature embedding to establish correspondences and estimate transformations or flow fields. Recently, state-of-the-art methods have employed RAFT-like iterative updates to refine the solution. However, these methods have certain limitations. Firstly, their iterative refinement design lacks transparency, and their iterative updates follow a fixed path during the refinement process, which can lead to suboptimal results. Secondly, these methods overlook the importance of refining or optimizing correspondences (or matching matrices) as a precursor to solving transformations or flow fields. They typically compute candidate correspondences based on distances in the point feature space. However, they only project the candidate matching matrix into some matrix space once with Sinkhorn or dual softmax operations to obtain final correspondences. This one-shot projected matching matrix may be far from the globally optimal one, and these approaches do not consider the distribution of the target matching matrix. In this paper, we propose a novel approach that exploits the Denoising Diffusion Model to predict a searching gradient for the optimal matching matrix within the Doubly Stochastic Matrix Space. During the reverse denoising process, our method iteratively searches for better solutions along this denoising gradient, which points towards the maximum likelihood direction of the target matching matrix. Our method offers flexibility by allowing the search to start from any initial matching matrix provided by the online backbone or white noise. Experimental evaluations on the 3DMatch/3DLoMatch and 4DMatch/4DLoMatch datasets demonstrate the effectiveness of our newly designed framework.
Improving the Privacy and Practicality of Objective Perturbation for Differentially Private Linear Learners
Authors: Authors: Rachel Redberg, Antti Koskela, Yu-Xiang Wang
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2401.00583
Pdf link: https://arxiv.org/pdf/2401.00583
Abstract In the arena of privacy-preserving machine learning, differentially private stochastic gradient descent (DP-SGD) has outstripped the objective perturbation mechanism in popularity and interest. Though unrivaled in versatility, DP-SGD requires a non-trivial privacy overhead (for privately tuning the model's hyperparameters) and a computational complexity which might be extravagant for simple models such as linear and logistic regression. This paper revamps the objective perturbation mechanism with tighter privacy analyses and new computational tools that boost it to perform competitively with DP-SGD on unconstrained convex generalized linear problems.
SteinDreamer: Variance Reduction for Text-to-3D Score Distillation via Stein Identity
Authors: Authors: Peihao Wang, Zhiwen Fan, Dejia Xu, Dilin Wang, Sreyas Mohan, Forrest Iandola, Rakesh Ranjan, Yilei Li, Qiang Liu, Zhangyang Wang, Vikas Chandra
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.00604
Pdf link: https://arxiv.org/pdf/2401.00604
Abstract Score distillation has emerged as one of the most prevalent approaches for text-to-3D asset synthesis. Essentially, score distillation updates 3D parameters by lifting and back-propagating scores averaged over different views. In this paper, we reveal that the gradient estimation in score distillation is inherent to high variance. Through the lens of variance reduction, the effectiveness of SDS and VSD can be interpreted as applications of various control variates to the Monte Carlo estimator of the distilled score. Motivated by this rethinking and based on Stein's identity, we propose a more general solution to reduce variance for score distillation, termed Stein Score Distillation (SSD). SSD incorporates control variates constructed by Stein identity, allowing for arbitrary baseline functions. This enables us to include flexible guidance priors and network architectures to explicitly optimize for variance reduction. In our experiments, the overall pipeline, dubbed SteinDreamer, is implemented by instantiating the control variate with a monocular depth estimator. The results suggest that SSD can effectively reduce the distillation variance and consistently improve visual quality for both object- and scene-level generation. Moreover, we demonstrate that SteinDreamer achieves faster convergence than existing methods due to more stable gradient updates.
Keyword: super-resolution

Image Super-resolution Reconstruction Network based on Enhanced Swin Transformer via Alternating Aggregation of Local-Global Features
Authors: Authors: Yuming Huang, Yingpin Chen, Changhui Wu, Hanrong Xie, Binhui Song, Hui Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.00241
Pdf link: https://arxiv.org/pdf/2401.00241
Abstract The Swin Transformer image super-resolution reconstruction network only relies on the long-range relationship of window attention and shifted window attention to explore features. This mechanism has two limitations. On the one hand, it only focuses on global features while ignoring local features. On the other hand, it is only concerned with spatial feature interactions while ignoring channel features and channel interactions, thus limiting its non-linear mapping ability. To address the above limitations, this paper proposes enhanced Swin Transformer modules via alternating aggregation of local-global features. In the local feature aggregation stage, this paper introduces shift convolution to realize the interaction between local spatial information and channel information. This paper proposes a block sparse global perception module in the global feature aggregation stage. This module organizes the spatial information first, then sends the recombination information into a spatial gating unit to implement the further interaction of spatial and channel information. Then, a multi-scale self-attention module and a low-parameter residual channel attention module are introduced to realize information aggregation at different scales. Finally, the proposed network is validated on five publicly available datasets. The experimental results show that the proposed network outperforms the other state-of-the-art super-resolution networks.
UGPNet: Universal Generative Prior for Image Restoration
Authors: Authors: Hwayoon Lee, Kyoungkook Kang, Hyeongmin Lee, Seung-Hwan Baek, Sunghyun Cho
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2401.00370
Pdf link: https://arxiv.org/pdf/2401.00370
Abstract Recent image restoration methods can be broadly categorized into two classes: (1) regression methods that recover the rough structure of the original image without synthesizing high-frequency details and (2) generative methods that synthesize perceptually-realistic high-frequency details even though the resulting image deviates from the original structure of the input. While both directions have been extensively studied in isolation, merging their benefits with a single framework has been rarely studied. In this paper, we propose UGPNet, a universal image restoration framework that can effectively achieve the benefits of both approaches by simply adopting a pair of an existing regression model and a generative model. UGPNet first restores the image structure of a degraded input using a regression model and synthesizes a perceptually-realistic image with a generative model on top of the regressed output. UGPNet then combines the regressed output and the synthesized output, resulting in a final result that faithfully reconstructs the structure of the original image in addition to perceptually-realistic textures. Our extensive experiments on deblurring, denoising, and super-resolution demonstrate that UGPNet can successfully exploit both regression and generative methods for high-fidelity image restoration.
Diffusion Models, Image Super-Resolution And Everything: A Survey
Authors: Authors: Brian B. Moser, Arundhati S. Shanbhag, Federico Raue, Stanislav Frolov, Sebastian Palacio, Andreas Dengel
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); General Literature (cs.GL); Machine Learning (cs.LG); Multimedia (cs.MM)
Arxiv link: https://arxiv.org/abs/2401.00736
Pdf link: https://arxiv.org/pdf/2401.00736
Abstract Diffusion Models (DMs) represent a significant advancement in image Super-Resolution (SR), aligning technical image quality more closely with human preferences and expanding SR applications. DMs address critical limitations of previous methods, enhancing overall realism and details in SR images. However, DMs suffer from color-shifting issues, and their high computational costs call for efficient sampling alternatives, underscoring the challenge of balancing computational efficiency and image quality. This survey gives an overview of DMs applied to image SR and offers a detailed analysis that underscores the unique characteristics and methodologies within this domain, distinct from broader existing reviews in the field. It presents a unified view of DM fundamentals and explores research directions, including alternative input domains, conditioning strategies, guidance, corruption spaces, and zero-shot methods. This survey provides insights into the evolution of image SR with DMs, addressing current trends, challenges, and future directions in this rapidly evolving field.
Bracketing is All You Need: Unifying Image Restoration and Enhancement Tasks with Multi-Exposure Images
Authors: Authors: Zhilu Zhang, Shuohao Zhang, Renlong Wu, Zifei Yan, Wangmeng Zuo
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2401.00766
Pdf link: https://arxiv.org/pdf/2401.00766
Abstract It is challenging but highly desired to acquire high-quality photos with clear content in low-light environments. Although multi-image processing methods (using burst, dual-exposure, or multi-exposure images) have made significant progress in addressing this issue, they typically focus exclusively on specific restoration or enhancement tasks, being insufficient in exploiting multi-image. Motivated by that multi-exposure images are complementary in denoising, deblurring, high dynamic range imaging, and super-resolution, we propose to utilize bracketing photography to unify restoration and enhancement tasks in this work. Due to the difficulty in collecting real-world pairs, we suggest a solution that first pre-trains the model with synthetic paired data and then adapts it to real-world unlabeled images. In particular, a temporally modulated recurrent network (TMRNet) and self-supervised adaptation method are proposed. Moreover, we construct a data simulation pipeline to synthesize pairs and collect real-world images from 200 nighttime scenarios. Experiments on both datasets show that our method performs favorably against the state-of-the-art multi-image processing ones. The dataset, code, and pre-trained models are available at https://github.com/cszhilu1998/BracketIRE.

zoq / arxiv-updates

New submissions for Tue, 2 Jan 24 #677

Keyword: sgd

DiffHybrid-UQ: Uncertainty Quantification for Differentiable Hybrid Neural Modeling

Improving the Privacy and Practicality of Objective Perturbation for Differentially Private Linear Learners

Keyword: optimization

Semantic Computing for Organizational Effectiveness: From Organization Theory to Practice through Semantics-Based Modelling

Particle-Based Shape Modeling for Arbitrary Regions-of-Interest

Policy Optimization with Smooth Guidance Rewards Learned from Sparse-Reward Demonstrations

Block-Level MU-MISO Interference Exploitation Precoding: Optimal Structure and Explicit Duality

Multiform Evolution for High-Dimensional Problems with Low Effective Dimensionality

A Novel Explanation Against Linear Neural Networks

Real-Time and Security-Aware Precoding in RIS-Empowered Multi-User Wireless Networks

Open-TI: Open Traffic Intelligence with Augmented Language Model

Uncertainty-Penalized Reinforcement Learning from Human Feedback with Diverse Reward LoRA Ensembles

Dual-space Hierarchical Learning for Goal-guided Conversational Recommendation

Deep Generative Symbolic Regression

Near-Space Communications: The Last Piece of 6G Space-Air-Ground-Sea Integrated Network Puzzle

Tight Finite Time Bounds of Two-Time-Scale Linear Stochastic Approximation with Markovian Noise

Real-Time FJ/MAC PDE Solvers via Tensorized, Back-Propagation-Free Optical PINN Training

Is It Possible to Backdoor Face Forgery Detection with Natural Triggers?

From Text to Pixels: A Context-Aware Semantic Synergy Solution for Infrared and Visible Image Fusion

Deeper and Wider Networks for Performance Metrics Prediction in Communication Networks

Data-driven Energy Efficiency Modelling in Large-scale Networks: An Expert Knowledge and ML-based Approach

Energy-Efficient Power Control for Multiple-Task Split Inference in UAVs: A Tiny Learning-Based Approach

User Clustering for STAR-RIS Assisted Full-Duplex NOMA Communication Systems

On Learning for Ambiguous Chance Constrained Problems

GD^2-NeRF: Generative Detail Compensation via GAN and Diffusion for One-shot Generalizable Neural Radiance Fields

Beyond Efficiency: A Systematic Survey of Resource-Efficient Large Language Models

Adversarially Trained Actor Critic for offline CMDPs

TBDD: A New Trust-based, DRL-driven Framework for Blockchain Sharding in IoT

Hybrid physics-informed metabolic cybergenetics: process rates augmented with machine-learning surrogates informed by flux balance analysis

SecFormer: Towards Fast and Accurate Privacy-Preserving Inference for Large Language Models

Noise-Aware and Equitable Urban Air Traffic Management: An Optimization Approach

Keyword: adam

A comprehensive framework for occluded human pose estimation

Keyword: gradient

DiffHybrid-UQ: Uncertainty Quantification for Differentiable Hybrid Neural Modeling

A unified structure-preserving parametric finite element method for anisotropic surface diffusion

BusReF: Infrared-Visible images registration and fusion focus on reconstructible area using one set of features

Tight Finite Time Bounds of Two-Time-Scale Linear Stochastic Approximation with Markovian Noise

Client-wise Modality Selection for Balanced Multi-modal Federated Learning

Diff-PCR: Diffusion-Based Correspondence Searching in Doubly Stochastic Matrix Space for Point Cloud Registration

Improving the Privacy and Practicality of Objective Perturbation for Differentially Private Linear Learners

SteinDreamer: Variance Reduction for Text-to-3D Score Distillation via Stein Identity

Keyword: super-resolution

Image Super-resolution Reconstruction Network based on Enhanced Swin Transformer via Alternating Aggregation of Local-Global Features

UGPNet: Universal Generative Prior for Image Restoration

Diffusion Models, Image Super-Resolution And Everything: A Survey

Bracketing is All You Need: Unifying Image Restoration and Enhancement Tasks with Multi-Exposure Images