Abstract
Intent-aware session recommendation (ISR) is pivotal in discerning user intents within sessions for precise predictions. Traditional approaches, however, face limitations due to their presumption of a uniform number of intents across all sessions. This assumption overlooks the dynamic nature of user sessions, where the number and type of intentions can significantly vary. In addition, these methods typically operate in latent spaces, thus hinder the model's transparency.Addressing these challenges, we introduce a novel ISR approach, utilizing the advanced reasoning capabilities of large language models (LLMs). First, this approach begins by generating an initial prompt that guides LLMs to predict the next item in a session, based on the varied intents manifested in user sessions. Then, to refine this process, we introduce an innovative prompt optimization mechanism that iteratively self-reflects and adjusts prompts. Furthermore, our prompt selection module, built upon the LLMs' broad adaptability, swiftly selects the most optimized prompts across diverse domains. This new paradigm empowers LLMs to discern diverse user intents at a semantic level, leading to more accurate and interpretable session recommendations. Our extensive experiments on three real-world datasets demonstrate the effectiveness of our method, marking a significant advancement in ISR systems.
Adaptive Proximal Policy Optimization with Upper Confidence Bound
Abstract
Trust Region Policy Optimization (TRPO) attractively optimizes the policy while constraining the update of the new policy within a trust region, ensuring the stability and monotonic optimization. Building on the theoretical guarantees of trust region optimization, Proximal Policy Optimization (PPO) successfully enhances the algorithm's sample efficiency and reduces deployment complexity by confining the update of the new and old policies within a surrogate trust region. However, this approach is limited by the fixed setting of surrogate trust region and is not sufficiently adaptive, because there is no theoretical proof that the optimal clipping bound remains consistent throughout the entire training process, truncating the ratio of the new and old policies within surrogate trust region can ensure that the algorithm achieves its best performance, therefore, exploring and researching a dynamic clip bound for improving PPO's performance can be quite beneficial. To design an adaptive clipped trust region and explore the dynamic clip bound's impact on the performance of PPO, we introduce an adaptive PPO-CLIP (Adaptive-PPO) method that dynamically explores and exploits the clip bound using a bandit during the online training process. Furthermore, ample experiments will initially demonstrate that our Adaptive-PPO exhibits sample efficiency and performance compared to PPO-CLIP.
Multi Armed Bandit based Resource Allocation in Near Memory Processing Architectures
Abstract
Recent advances in 3D fabrication have allowed handling the memory bottlenecks for modern data-intensive applications by bringing the computation closer to the memory, enabling Near Memory Processing (NMP). Memory Centric Networks (MCN) are advanced memory architectures that use NMP architectures, where multiple stacks of the 3D memory units are equipped with simple processing cores, allowing numerous threads to execute concurrently. The performance of the NMP is crucially dependent upon the efficient task offloading and task-to-NMP allocation. Our work presents a multi-armed bandit (MAB) based approach in formulating an efficient resource allocation strategy for MCN. Most existing literature concentrates only on one application domain and optimizing only one metric, i.e., either execution time or power. However, our solution is more generic and can be applied to diverse application domains. In our approach, we deploy Upper Confidence Bound (UCB) policy to collect rewards and eventually use it for regret optimization. We study the following metrics: instructions per cycle, execution times, NMP core cache misses, packet latencies, and power consumption. Our study covers various applications from PARSEC and SPLASH2 benchmarks suite. The evaluation shows that the system's performance improves by ~11% on average and an average reduction in total power consumption by ~12%.
GP+: A Python Library for Kernel-based learning via Gaussian Processes
Authors: Authors: Amin Yousefpour, Zahra Zanjani Foumani, Mehdi Shishehbor, Carlos Mora, Ramin Bostanabad
Abstract
In this paper we introduce GP+, an open-source library for kernel-based learning via Gaussian processes (GPs) which are powerful statistical models that are completely characterized by their parametric covariance and mean functions. GP+ is built on PyTorch and provides a user-friendly and object-oriented tool for probabilistic learning and inference. As we demonstrate with a host of examples, GP+ has a few unique advantages over other GP modeling libraries. We achieve these advantages primarily by integrating nonlinear manifold learning techniques with GPs' covariance and mean functions. As part of introducing GP+, in this paper we also make methodological contributions that (1) enable probabilistic data fusion and inverse parameter estimation, and (2) equip GPs with parsimonious parametric mean functions which span mixed feature spaces that have both categorical and quantitative variables. We demonstrate the impact of these contributions in the context of Bayesian optimization, multi-fidelity modeling, sensitivity analysis, and calibration of computer models.
CaVE: A Cone-Aligned Approach for Fast Predict-then-optimize with Binary Linear Programs
Authors: Authors: Bo Tang, Elias B. Khalil
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Abstract
The end-to-end predict-then-optimize framework, also known as decision-focused learning, has gained popularity for its ability to integrate optimization into the training procedure of machine learning models that predict the unknown cost (objective function) coefficients of optimization problems from contextual instance information. Naturally, most of the problems of interest in this space can be cast as integer linear programs. In this work, we focus on binary linear programs (BLPs) and propose a new end-to-end training method for predict-then-optimize. Our method, Cone-aligned Vector Estimation (CaVE), aligns the predicted cost vectors with the cone corresponding to the true optimal solution of a training instance. When the predicted cost vector lies inside the cone, the optimal solution to the linear relaxation of the binary problem is optimal w.r.t. to the true cost vector. Not only does this alignment produce decision-aware learning models, but it also dramatically reduces training time as it circumvents the need to solve BLPs to compute a loss function with its gradients. Experiments across multiple datasets show that our method exhibits a favorable trade-off between training time and solution quality, particularly with large-scale optimization problems such as vehicle routing, a hard BLP that has yet to benefit from predict-then-optimize methods in the literature due to its difficulty.
Interpretable factorization of clinical questionnaires to identify latent factors of psychopathology
Authors: Authors: Ka Chun Lam, Bridget W Mahony, Armin Raznahan, Francisco Pereira
Abstract
Psychiatry research seeks to understand the manifestations of psychopathology in behavior, as measured in questionnaire data, by identifying a small number of latent factors that explain them. While factor analysis is the traditional tool for this purpose, the resulting factors may not be interpretable, and may also be subject to confounding variables. Moreover, missing data are common, and explicit imputation is often required. To overcome these limitations, we introduce interpretability constrained questionnaire factorization (ICQF), a non-negative matrix factorization method with regularization tailored for questionnaire data. Our method aims to promote factor interpretability and solution stability. We provide an optimization procedure with theoretical convergence guarantees, and an automated procedure to detect latent dimensionality accurately. We validate these procedures using realistic synthetic data. We demonstrate the effectiveness of our method in a widely used general-purpose questionnaire, in two independent datasets (the Healthy Brain Network and Adolescent Brain Cognitive Development studies). Specifically, we show that ICQF improves interpretability, as defined by domain experts, while preserving diagnostic information across a range of disorders, and outperforms competing methods for smaller dataset sizes. This suggests that the regularization in our method matches domain characteristics. The python implementation for ICQF is available at \url{https://github.com/jefferykclam/ICQF}.
Safe Exploration in Reinforcement Learning: Training Backup Control Barrier Functions with Zero Training Time Safety Violations
Abstract
Safe reinforcement learning (RL) aims to satisfy safety constraints during training. However, guaranteeing safety during training remained a challenging problem. This paper presents a novel framework that integrates Backup Control Barrier Functions (BCBFs) with reinforcement learning (RL) to enable safe exploration called RLBUS: Reinforcement Learning Backup Shield. BCBFs incorporate backup controllers that predict a system's finite-time response, facilitating online optimization of a control policy that maintains the forward invariance of a safe subset, while satisfying actuator constraints. Building on the soft-minimum/soft-maximum CBF method from prior work, which ensures feasibility and continuity of the BCBF with multiple backup controllers, this paper proposes integrating these BCBFs with RL. This framework leverages RL to learn a better backup policy to enlarge the forward invariant set, while guaranteeing safety during training. By combining backup controllers and RL, the approach provides safety and feasibility guarantees during training and enables safe online exploration with zero training-time safety violations. The method is demonstrated on an inverted pendulum example, where expanding the forward invariant set through RL allows the pendulum to safely explore larger regions of state space.
On the Dynamics Under the Unhinged Loss and Beyond
Abstract
Recent works have studied implicit biases in deep learning, especially the behavior of last-layer features and classifier weights. However, they usually need to simplify the intermediate dynamics under gradient flow or gradient descent due to the intractability of loss functions and model architectures. In this paper, we introduce the unhinged loss, a concise loss function, that offers more mathematical opportunities to analyze the closed-form dynamics while requiring as few simplifications or assumptions as possible. The unhinged loss allows for considering more practical techniques, such as time-vary learning rates and feature normalization. Based on the layer-peeled model that views last-layer features as free optimization variables, we conduct a thorough analysis in the unconstrained, regularized, and spherical constrained cases, as well as the case where the neural tangent kernel remains invariant. To bridge the performance of the unhinged loss to that of Cross-Entropy (CE), we investigate the scenario of fixing classifier weights with a specific structure, (e.g., a simplex equiangular tight frame). Our analysis shows that these dynamics converge exponentially fast to a solution depending on the initialization of features and classifier weights. These theoretical results not only offer valuable insights, including explicit feature regularization and rescaled learning rates for enhancing practical training with the unhinged loss, but also extend their applicability to other loss functions. Finally, we empirically demonstrate these theoretical results and insights through extensive experiments.
Large Language Model Enhanced Multi-Agent Systems for 6G Communications
Authors: Authors: Feibo Jiang, Li Dong, Yubo Peng, Kezhi Wang, Kun Yang, Cunhua Pan, Dusit Niyato, Octavia A. Dobre
Abstract
The rapid development of the Large Language Model (LLM) presents huge opportunities for 6G communications, e.g., network optimization and management by allowing users to input task requirements to LLMs by nature language. However, directly applying native LLMs in 6G encounters various challenges, such as a lack of private communication data and knowledge, limited logical reasoning, evaluation, and refinement abilities. Integrating LLMs with the capabilities of retrieval, planning, memory, evaluation and reflection in agents can greatly enhance the potential of LLMs for 6G communications. To this end, we propose a multi-agent system with customized communication knowledge and tools for solving communication related tasks using natural language, comprising three components: (1) Multi-agent Data Retrieval (MDR), which employs the condensate and inference agents to refine and summarize communication knowledge from the knowledge base, expanding the knowledge boundaries of LLMs in 6G communications; (2) Multi-agent Collaborative Planning (MCP), which utilizes multiple planning agents to generate feasible solutions for the communication related task from different perspectives based on the retrieved knowledge; (3) Multi-agent Evaluation and Reflecxion (MER), which utilizes the evaluation agent to assess the solutions, and applies the reflexion agent and refinement agent to provide improvement suggestions for current solutions. Finally, we validate the effectiveness of the proposed multi-agent system by designing a semantic communication system, as a case study of 6G communications.
MMSE Design of RIS-aided Communications
Authors: Authors: Wen-Xuan Long, Marco Moretti, Andrea Abrardo, Luca Sanguinetti, Rui Chen
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
Consider a communication system in which a single antenna user equipment exchanges information with a multi-antenna base station via a reconfigurable intelligent surface (RIS) in the presence of spatially correlated channels and electromagnetic interference (EMI). To exploit the attractive advantages of RIS technology, accurate configuration of its reflecting elements is crucial. In this paper, we use statistical knowledge of channels and EMI to optimize the RIS elements for i) accurate channel estimation and ii) reliable data transmission. In both cases, our goal is to determine the RIS coefficients that minimize the mean square error, resulting in the formulation of two non-convex problems that share the same structure. To solve these two problems, we present an alternating optimization approach that reliably converges to a locally optimal solution. The incorporation of the diagonally scaled steepest descent algorithm, derived from Newton's method, ensures fast convergence with manageable complexity. Numerical results demonstrate the effectiveness of the proposed method under various propagation conditions. Notably, it shows significant advantages over existing alternatives that depend on a sub-optimal configuration of the RIS and are derived on the basis of different criteria.
SimAC: A Simple Anti-Customization Method against Text-to-Image Synthesis of Diffusion Models
Abstract
Despite the success of diffusion-based customization methods on visual content creation, increasing concerns have been raised about such techniques from both privacy and political perspectives. To tackle this issue, several anti-customization methods have been proposed in very recent months, predominantly grounded in adversarial attacks. Unfortunately, most of these methods adopt straightforward designs, such as end-to-end optimization with a focus on adversarially maximizing the original training loss, thereby neglecting nuanced internal properties intrinsic to the diffusion model, and even leading to ineffective optimization in some diffusion time steps. In this paper, we strive to bridge this gap by undertaking a comprehensive exploration of these inherent properties, to boost the performance of current anti-customization approaches. Two aspects of properties are investigated: 1) We examine the relationship between time step selection and the model's perception in the frequency domain of images and find that lower time steps can give much more contributions to adversarial noises. This inspires us to propose an adaptive greedy search for optimal time steps that seamlessly integrates with existing anti-customization methods. 2) We scrutinize the roles of features at different layers during denoising and devise a sophisticated feature-based optimization framework for anti-customization. Experiments on facial benchmarks demonstrate that our approach significantly increases identity disruption, thereby enhancing user privacy and security.
Optimization of Power Control for Autonomous Hybrid Electric Vehicles with Flexible Power Demand
Authors: Authors: Mohammadali Kargar, Xingyong Song
Abstract
Technology advancement for on-road vehicles has gained significant momentum in the past decades, particularly in the field of vehicle automation and powertrain electrification. The optimization of powertrain controls for autonomous vehicles typically involves a separated consideration of the vehicle's external dynamics and powertrain dynamics, with one key aspect often overlooked. This aspect, known as flexible power demand, recognizes that the powertrain control system does not necessarily have to precisely match the power requested by the vehicle motion controller at all times. Leveraging this feature can lead to control designs achieving improved fuel economy by adding an extra degree of freedom to the powertrain control. The present research investigates the use of an Approximate Dynamic Programming (ADP) approach to develop a powertrain controller, which takes into account the flexibility in power demand within the ADP framework. The formulation is based on an autonomous hybrid electric vehicle (HEV), while the methodology can also be applied to other types of vehicles. It is also found that necessary customization of the ADP algorithm is needed for this particular control problem to prevent convergence issues. Finally, a case study is presented to evaluate the effectiveness of the investigated method.
BinGo: Identifying Security Patches in Binary Code with Graph Representation Learning
Authors: Authors: Xu He, Shu Wang, Pengbin Feng, Xinda Wang, Shiyu Sun, Qi Li, Kun Sun
Subjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)
Abstract
A timely software update is vital to combat the increasing security vulnerabilities. However, some software vendors may secretly patch their vulnerabilities without creating CVE entries or even describing the security issue in their change log. Thus, it is critical to identify these hidden security patches and defeat potential N-day attacks. Researchers have employed various machine learning techniques to identify security patches in open-source software, leveraging the syntax and semantic features of the software changes and commit messages. However, all these solutions cannot be directly applied to the binary code, whose instructions and program flow may dramatically vary due to different compilation configurations. In this paper, we propose BinGo, a new security patch detection system for binary code. The main idea is to present the binary code as code property graphs to enable a comprehensive understanding of program flow and perform a language model over each basic block of binary code to catch the instruction semantics. BinGo consists of four phases, namely, patch data pre-processing, graph extraction, embedding generation, and graph representation learning. Due to the lack of an existing binary security patch dataset, we construct such a dataset by compiling the pre-patch and post-patch source code of the Linux kernel. Our experimental results show BinGo can achieve up to 80.77% accuracy in identifying security patches between two neighboring versions of binary code. Moreover, BinGo can effectively reduce the false positives and false negatives caused by the different compilers and optimization levels.
Polar-Doc: One-Stage Document Dewarping with Multi-Scope Constraints under Polar Representation
Abstract
Document dewarping, aiming to eliminate geometric deformation in photographed documents to benefit text recognition, has made great progress in recent years but is still far from being solved. While Cartesian coordinates are typically leveraged by state-of-the-art approaches to learn a group of deformation control points, such representation is not efficient for dewarping model to learn the deformation information. In this work, we explore Polar coordinates representation for each point in document dewarping, namely Polar-Doc. In contrast to most current works adopting a two-stage pipeline typically, Polar representation enables a unified point regression framework for both segmentation and dewarping network in one single stage. Such unification makes the whole model more efficient to learn under an end-to-end optimization pipeline, and also obtains a compact representation. Furthermore, we propose a novel multi-scope Polar-Doc-IOU loss to constrain the relationship among control points as a grid-based regularization under the Polar representation. Visual comparisons and quantitative experiments on two benchmarks show that, with much fewer parameters than the other mainstream counterparts, our one-stage model with multi-scope constraints achieves new state-of-the-art performance on both pixel alignment metrics and OCR metrics. Source codes will be available at \url{*****}.
An efficient algorithm for multiuser sum-rate maximization of large-scale active RIS-aided MIMO system
Authors: Authors: Qian Zhang, Mingjie Shao, Qiang Li, Ju Liu
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
Active reconfigurable intelligent surface (RIS) is a new RIS architecture that can reflect and amplify communication signals. It can provide enhanced performance gain compared to the conventional passive RIS systems that can only reflect the signals. On the other hand, the design problem of active RIS-aided systems is more challenging than the passive RIS-aided systems and its efficient algorithms are less studied. In this paper, we consider the sum rate maximization problem in the multiuser massive multiple-input single-output (MISO) downlink with the aid of a large-scale active RIS. Existing approaches usually resort to general optimization solvers and can be computationally prohibitive in the considered settings. We propose an efficient block successive upper bound minimization (BSUM) method, of which each step has a (semi) closed-form update. Thus, the proposed algorithm has an attractive low per-iteration complexity. By simulation, our proposed algorithm consumes much less computation than the existing approaches. In particular, when the MIMO and/or RIS sizes are large, our proposed algorithm can be orders-of-magnitude faster than existing approaches.
Abstract
Incremental computation aims to compute more efficiently on changed input by reusing previously computed results. We give a high-level overview of works on incremental computation, and highlight the essence underlying all of them, which we call incrementalization -- the discrete counterpart of differentiation in calculus. We review the gist of a systematic method for incrementalization, and a systematic method centered around it, called Iterate-Incrementalize-Implement, for program design and optimization, as well as algorithm design and optimization. At a meta-level, with historical contexts and for future directions, we stress the power of high-level data, control, and module abstractions in developing new and better algorithms and programs as well as their precise complexities.
CBQ: Cross-Block Quantization for Large Language Models
Abstract
Post-training quantization (PTQ) has driven attention to producing efficient large language models (LLMs) with ultra-low costs. Since hand-craft quantization parameters lead to low performance in low-bit quantization, recent methods optimize the quantization parameters through block-wise reconstruction between the floating-point and quantized models. However, these methods suffer from two challenges: accumulated errors from independent one-by-one block quantization and reconstruction difficulties from extreme weight and activation outliers. To address these two challenges, we propose CBQ, a cross-block reconstruction-based PTQ method for LLMs. To reduce error accumulation, we introduce a cross-block dependency with the aid of a homologous reconstruction scheme to build the long-range dependency between adjacent multi-blocks with overlapping. To reduce reconstruction difficulty, we design a coarse-to-fine pre-processing (CFP) to truncate weight outliers and dynamically scale activation outliers before optimization, and an adaptive rounding scheme, called LoRA-Rounding, with two low-rank learnable matrixes to further rectify weight quantization errors. Extensive experiments demonstrate that: (1) CBQ pushes both activation and weight quantization to low-bit settings W4A4, W4A8, and W2A16. (2) CBQ achieves better performance than the existing state-of-the-art methods on various LLMs and benchmark datasets.
Performance of linear solvers in tensor-train format on current multicore architectures
Authors: Authors: Melven Röhrig-Zöllner, Manuel Joey Becklas, Jonas Thies, Achim Basermann
Abstract
In this paper we discuss the performance of solvers for low-rank linear systems in the tensor-train format, also known as matrix-product states (MPS) in physics. We focus on today's many-core CPU systems and the interplay of the performance and the required linear algebra operations in this setting. Specifically, we consider the tensor-train GMRES method, the modified alternating linear scheme (MALS) and the alternating minimal energy (AMEn). Based on the example of a simple non-symmetric system from discretizing a multidimensional convection-diffusion equation in, e.g., $50^{10}$ grid points, we illustrate the computational complexity of the three methods. This shows that the projection to smaller sub-problems reduces the required number of floating point operations by orders of magnitude: between GMRES and MALS, and again between MALS and AMEn. All three methods require similar underlying linear algebra operations (building blocks). We suggest several building block improvements regarding orthogonalization steps, singular value decompositions, and tensor contractions. In addition, we propose a simple generic preconditioner in the tensor-train format based on a rank-1 approximation of the operator. Combining all optimizations, we obtain roughly a 5x speedup over the reference implementation for the fastest method (AMEn) on a current multi-core CPU.
Secure Deep Reinforcement Learning for Dynamic Resource Allocation in Wireless MEC Networks
Abstract
This paper proposes a blockchain-secured deep reinforcement learning (BC-DRL) optimization framework for {data management and} resource allocation in decentralized {wireless mobile edge computing (MEC)} networks. In our framework, {we design a low-latency reputation-based proof-of-stake (RPoS) consensus protocol to select highly reliable blockchain-enabled BSs to securely store MEC user requests and prevent data tampering attacks.} {We formulate the MEC resource allocation optimization as a constrained Markov decision process that balances minimum processing latency and denial-of-service (DoS) probability}. {We use the MEC aggregated features as the DRL input to significantly reduce the high-dimensionality input of the remaining service processing time for individual MEC requests. Our designed constrained DRL effectively attains the optimal resource allocations that are adapted to the dynamic DoS requirements. We provide extensive simulation results and analysis to} validate that our BC-DRL framework achieves higher security, reliability, and resource utilization efficiency than benchmark blockchain consensus protocols and {MEC} resource allocation algorithms.
Abstract
With the great success of text-conditioned diffusion models in creative text-to-image generation, various text-driven image editing approaches have attracted the attentions of many researchers. However, previous works mainly focus on discreteness-sensitive instructions such as adding, removing or replacing specific objects, background elements or global styles (i.e., hard editing), while generally ignoring subject-binding but semantically fine-changing continuity-sensitive instructions such as actions, poses or adjectives, and so on (i.e., soft editing), which hampers generative AI from generating user-customized visual contents. To mitigate this predicament, we propose a spatio-temporal guided adaptive editing algorithm AdapEdit, which realizes adaptive image editing by introducing a soft-attention strategy to dynamically vary the guiding degree from the editing conditions to visual pixels from both temporal and spatial perspectives. Note our approach has a significant advantage in preserving model priors and does not require model training, fine-tuning, extra data, or optimization. We present our results over a wide variety of raw images and editing instructions, demonstrating competitive performance and showing it significantly outperforms the previous approaches.
ClusterDDPM: An EM clustering framework with Denoising Diffusion Probabilistic Models
Authors: Authors: Jie Yan, Jing Liu, Zhong-yuan Zhang
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Variational autoencoder (VAE) and generative adversarial networks (GAN) have found widespread applications in clustering and have achieved significant success. However, the potential of these approaches may be limited due to VAE's mediocre generation capability or GAN's well-known instability during adversarial training. In contrast, denoising diffusion probabilistic models (DDPMs) represent a new and promising class of generative models that may unlock fresh dimensions in clustering. In this study, we introduce an innovative expectation-maximization (EM) framework for clustering using DDPMs. In the E-step, we aim to derive a mixture of Gaussian priors for the subsequent M-step. In the M-step, our focus lies in learning clustering-friendly latent representations for the data by employing the conditional DDPM and matching the distribution of latent representations to the mixture of Gaussian priors. We present a rigorous theoretical analysis of the optimization process in the M-step, proving that the optimizations are equivalent to maximizing the lower bound of the Q function within the vanilla EM framework under certain constraints. Comprehensive experiments validate the advantages of the proposed framework, showcasing superior performance in clustering, unsupervised conditional generation and latent representation learning.
Event-Triggered Safe Bayesian Optimization on Quadcopters
Authors: Authors: Antonia Holzapfel, Paul Brunzema, Sebastian Trimpe
Abstract
Bayesian optimization (BO) has proven to be a powerful tool for automatically tuning control parameters without requiring knowledge of the underlying system dynamics. Safe BO methods, in addition, guarantee safety during the optimization process, assuming that the underlying objective function does not change. However, in real-world scenarios, time-variations frequently occur, for example, due to wear in the system or changes in operation. Utilizing standard safe BO strategies that do not address time-variations can result in failure as previous safe decisions may become unsafe over time, which we demonstrate herein. To address this, we introduce a new algorithm, Event-Triggered SafeOpt (ETSO), which adapts to changes online solely relying on the observed costs. At its core, ETSO uses an event trigger to detect significant deviations between observations and the current surrogate of the objective function. When such change is detected, the algorithm reverts to a safe backup controller, and exploration is restarted. In this way, safety is recovered and maintained across changes. We evaluate ETSO on quadcopter controller tuning, both in simulation and hardware experiments. ETSO outperforms state-of-the-art safe BO, achieving superior control performance over time while maintaining safety.
An Incentive Mechanism for Federated Learning Based on Multiple Resource Exchange
Authors: Authors: Ruonan Dong, Hui Xu, Han Zhang, GuoPeng Zhang
Abstract
Federated Learning (FL) is a distributed machine learning paradigm that addresses privacy concerns in machine learning and still guarantees high test accuracy. However, achieving the necessary accuracy by having all clients participate in FL is impractical, given the constraints of client local computing resource. In this paper, we introduce a multi-user collaborative computing framework, categorizing users into two roles: model owners (MOs) and data owner (DOs). Without resorting to monetary incentives, an MO can encourage more DOs to join in FL by allowing the DOs to offload extra local computing tasks to the MO for execution. This exchange of "data" for "computing resources" streamlines the incentives for clients to engage more effectively in FL. We formulate the interaction between MO and DOs as an optimization problem, and the objective is to effectively utilize the communication and computing resource of the MO and DOs to minimize the time to complete an FL task. The proposed problem is a mixed integer nonlinear programming (MINLP) with high computational complexity. We first decompose it into two distinct subproblems, namely the client selection problem and the resource allocation problem to segregate the integer variables from the continuous variables. Then, an effective iterative algorithm is proposed to solve problem. Simulation results demonstrate that the proposed collaborative computing framework can achieve an accuracy of more than 95\% while minimizing the overall time to complete an FL task.
Machine Learning for the Multi-Dimensional Bin Packing Problem: Literature Review and Empirical Evaluation
Abstract
The Bin Packing Problem (BPP) is a well-established combinatorial optimization (CO) problem. Since it has many applications in our daily life, e.g. logistics and resource allocation, people are seeking efficient bin packing algorithms. On the other hand, researchers have been making constant advances in machine learning (ML), which is famous for its efficiency. In this article, we first formulate BPP, introducing its variants and practical constraints. Then, a comprehensive survey on ML for multi-dimensional BPP is provided. We further collect some public benchmarks of 3D BPP, and evaluate some online methods on the Cutting Stock Dataset. Finally, we share our perspective on challenges and future directions in BPP. To the best of our knowledge, this is the first systematic review of ML-related methods for BPP.
Memory Simulations, Security and Optimization in a Verified Compiler
Authors: Authors: David Monniaux (VERIMAG - IMAG)
Subjects: Logic in Computer Science (cs.LO); Programming Languages (cs.PL)
Abstract
Current compilers implement security features and optimizations that require nontrivial semantic reasoning about pointers and memory allocation: the program after the insertion of the security feature, or after applying the optimization, must simulate the original program despite a different memory layout. In this article, we illustrate such reasoning on pointer allocations through memory extensions and injections, as well as fine points on undefined values, by explaining how we implemented and proved correct two security features (stack canaries and pointer authentication) and one optimization (tail recursion elimination) in the CompCert formally verified compiler.
Random relay selection based heuristic optimization model for the scheduling and effective resource allocation in the cognitive radio network
Authors: Authors: Aravindkumaran S, D. Saraswady
Subjects: Networking and Internet Architecture (cs.NI)
Abstract
Cognitive Radio Network (CRN) provides effective capabilities for resource allocation with the valuable spectrum resources in the network. It provides the effective allocation of resources to the unlicensed users or Secondary Users (SUs) to access the spectrum those are unused by the licensed users or Primary Users (Pus). This paper develops an Optimal Relay Selection scheme with the spectrum-sharing scheme in CRN. The proposed Cross-Layer Spider Swarm Shifting is implemented in CRN for the optimal relay selection with Spider Swarm Optimization (SSO). The shortest path is estimated with the data shifting model for the data transmission path in the CRN. This study examines a cognitive relay network (CRN) with interference restrictions imposed by a mobile end user (MU). Half-duplex communication is used in the proposed system model between a single primary user (PU) and a single secondary user (SU). Between the SU source and SU destination, an amplify and forward (AF) relaying mechanism is also used. While other nodes (SU Source, SU relays, and PU) are supposed to be immobile in this scenario, the mobile end user (SU destination) is assumed to travel at high vehicle speeds. The suggested method achieves variety by placing a selection combiner at the SU destination and dynamically selecting the optimal relay for transmission based on the greatest signal-to-noise (SNR) ratio. The performance of the proposed Cross-Layer Spider Swarm Shifting model is compared with the Spectrum Sharing Optimization with QoS Guarantee (SSO-QG). The comparative analysis expressed that the proposed Cross-Layer Spider Swarm Shifting model delay is reduced by 15% compared with SSO-QG. Additionally, the proposed Cross-Layer Spider Swarm Shifting exhibits the improved network performance of ~25% higher throughput compared with SSO-QG.
MToP: A MATLAB Optimization Platform for Evolutionary Multitasking
Abstract
Evolutionary multitasking (EMT) has been attracting much attention over the past years. It aims to handle multiple optimization tasks simultaneously within limited computing resources assisted by inter-task knowledge transfer techniques. Numerous multitask evolutionary algorithms (MTEAs) for solving multitask optimization (MTO) problems have been proposed in the EMT field, but there lacks a comprehensive software platform to help researchers evaluate MTEA performance on benchmark MTO problems as well as explore real-world applications. To address this issue, we introduce the first open-source optimization platform, named MTO-Platform (MToP), for EMT. It incorporates more than 30 MTEAs, more than 150 MTO problem cases with real-world applications, and more than 10 performance metrics. Moreover, for comparing MTEAs with traditional evolutionary algorithms, we modified more than 30 popular single-task evolutionary algorithms to be able to solve MTO problems in MToP. MToP is a user-friendly tool with a graphical user interface that makes it easy to analyze results, export data, and plot schematics. More importantly, MToP is extensible, allowing users to develop new algorithms and define new problems. The source code of MToP is available at https://github.com/intLyc/MTO-Platform.
Smart Energy Management with Optimized Prosumerism for Achieving Dynamic Net-Zero Balance in Electrified Road Transport Networks
Abstract
The increasing number of Electric Vehicles (EVs) have led to rising energy demands which aggregates the burden on grid supply. A few solutions have been proposed to reduce grid load, for example, using storage systems for storing surplus energy from EVs or time-scheduling supply. These solutions are costly and limited to specific regions and times. This paper proposes a smart energy management solution for a massively electrified road transport network. It comprises of energy supplies from grid, charging stations, renewable sources and EVs connected by 5G-enabled aggregators. We propose EVs as prosumers, which are energy consumers but also supply back their surplus energy via Vehicle-to-Grid (V2G) technology. We use machine learning to forecast hourly energy output from renewable sources, surplus supply from EVs and their demands. A grid cost minimization using Mixed Integer Linear Programming solution is proposed to dynamically alter supply according to demand and energy provision from EVs. The upper bounds of surplus supply and demand of EVs are theoretically derived. An incentive distribution mechanism is presented to reward EVs offering their surplus supply and to discourage them to become selfish which is analyzed using Prisoner's dilemma game. The paper also presents an optimum number of charging stations on a road. Simulation results show that the proposed solution can effectively meet the demand requirements even if the supply from grid is limited, and can averagely reduce 38.21% of grid load. It results in 5.3% of average cost reduction compared with optimization without prosumers. The proposed penalty charge for CO2 emissions results in over 50% cost reduction by using renewable resources in the proposed solution as compared to fossil fuels. The communication and computation complexity of the proposed solution is reduced by 5G-enabled aggregator.
A Precoding for ORIS-Assisted MIMO Multi-User VLC System
Abstract
In this paper, we study a multi-user visible light communication (VLC) system assisted with optical reflecting intelligent surface (ORIS). Joint precoding and alignment matrices are designed to maximize the average signal-to-interference plus noise ratio (SINR) criteria. Considering the constraints of the constant mean transmission power of LEDs and the power associated with all users, an optimization problem is proposed. To solve this problem, we utilize an alternating optimization algorithm to optimize the precoding and alignment matrices. The simulation results demonstrate that the resultant SINR of the proposed method outperforms ZF and MMSE precoding algorithms.
Curriculum-Enhanced Residual Soft An-Isotropic Normalization for Over-smoothness in Deep GNNs
Authors: Authors: Jin Li, Qirong Zhang, Shuling Xu, Xinlong Chen, Longkun Guo, Yang-Geng Fu
Abstract
Despite Graph neural networks' significant performance gain over many classic techniques in various graph-related downstream tasks, their successes are restricted in shallow models due to over-smoothness and the difficulties of optimizations among many other issues. In this paper, to alleviate the over-smoothing issue, we propose a soft graph normalization method to preserve the diversities of node embeddings and prevent indiscrimination due to possible over-closeness. Combined with residual connections, we analyze the reason why the method can effectively capture the knowledge in both input graph structures and node features even with deep networks. Additionally, inspired by Curriculum Learning that learns easy examples before the hard ones, we propose a novel label-smoothing-based learning framework to enhance the optimization of deep GNNs, which iteratively smooths labels in an auxiliary graph and constructs many gradual non-smooth tasks for extracting increasingly complex knowledge and gradually discriminating nodes from coarse to fine. The method arguably reduces the risk of overfitting and generalizes better results. Finally, extensive experiments are carried out to demonstrate the effectiveness and potential of the proposed model and learning framework through comparison with twelve existing baselines including the state-of-the-art methods on twelve real-world node classification benchmarks.
GLOP: Learning Global Partition and Local Construction for Solving Large-scale Routing Problems in Real-time
Authors: Authors: Haoran Ye, Jiarui Wang, Helan Liang, Zhiguang Cao, Yong Li, Fanzhang Li
Abstract
The recent end-to-end neural solvers have shown promise for small-scale routing problems but suffered from limited real-time scaling-up performance. This paper proposes GLOP (Global and Local Optimization Policies), a unified hierarchical framework that efficiently scales toward large-scale routing problems. GLOP partitions large routing problems into Travelling Salesman Problems (TSPs) and TSPs into Shortest Hamiltonian Path Problems. For the first time, we hybridize non-autoregressive neural heuristics for coarse-grained problem partitions and autoregressive neural heuristics for fine-grained route constructions, leveraging the scalability of the former and the meticulousness of the latter. Experimental results show that GLOP achieves competitive and state-of-the-art real-time performance on large-scale routing problems, including TSP, ATSP, CVRP, and PCTSP.
Green Operations of SWIPT Networks: The Role of End-User Devices
Authors: Authors: Gianluca Rizzo, Marco Ajmone Marsan, Christian Esposito, Biagio Boi
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Abstract
Internet of Things (IoT) devices often come with batteries of limited capacity that are not easily replaceable or rechargeable, and that constrain significantly the sensing, computing, and communication tasks that they can perform. The Simultaneous Wireless Information and Power Transfer (SWIPT) paradigm addresses this issue by delivering power wirelessly to energy-harvesting IoT devices with the same signal used for information transfer. For their peculiarity, these networks require specific energy-efficient planning and management approaches. However, to date, it is not clear what are the most effective strategies for managing a SWIPT network for energy efficiency. In this paper, we address this issue by developing an analytical model based on stochastic geometry, accounting for the statistics of user-perceived performance and base station scheduling. We formulate an optimization problem for deriving the energy optimal configuration as a function of the main system parameters, and we propose a genetic algorithm approach to solve it. Our results enable a first-order evaluation of the most effective strategies for energy-efficient provisioning of power and communications in a SWIPT network. We show that the service capacity brought about by users brings energy-efficient dynamic network provisioning strategies that radically differ from those of networks with no wireless power transfer.
A Survey of Generative AI for Intelligent Transportation Systems
Abstract
Intelligent transportation systems play a crucial role in modern traffic management and optimization, greatly improving traffic efficiency and safety. With the rapid development of generative artificial intelligence (Generative AI) technologies in the fields of image generation and natural language processing, generative AI has also played a crucial role in addressing key issues in intelligent transportation systems, such as data sparsity, difficulty in observing abnormal scenarios, and in modeling data uncertainty. In this review, we systematically investigate the relevant literature on generative AI techniques in addressing key issues in different types of tasks in intelligent transportation systems. First, we introduce the principles of different generative AI techniques, and their potential applications. Then, we classify tasks in intelligent transportation systems into four types: traffic perception, traffic prediction, traffic simulation, and traffic decision-making. We systematically illustrate how generative AI techniques addresses key issues in these four different types of tasks. Finally, we summarize the challenges faced in applying generative AI to intelligent transportation systems, and discuss future research directions based on different application scenarios.
Abstract
Training a deep neural network to maximize a target objective has become the standard recipe for successful machine learning over the last decade. These networks can be optimized with supervised learning, if the target objective is differentiable. For many interesting problems, this is however not the case. Common objectives like intersection over union (IoU), bilingual evaluation understudy (BLEU) score or rewards cannot be optimized with supervised learning. A common workaround is to define differentiable surrogate losses, leading to suboptimal solutions with respect to the actual objective. Reinforcement learning (RL) has emerged as a promising alternative for optimizing deep neural networks to maximize non-differentiable objectives in recent years. Examples include aligning large language models via human feedback, code generation, object detection or control problems. This makes RL techniques relevant to the larger machine learning audience. The subject is, however, time intensive to approach due to the large range of methods, as well as the often very theoretical presentation. In this introduction, we take an alternative approach, different from classic reinforcement learning textbooks. Rather than focusing on tabular problems, we introduce reinforcement learning as a generalization of supervised learning, which we first apply to non-differentiable objectives and later to temporal problems. Assuming only basic knowledge of supervised learning, the reader will be able to understand state-of-the-art deep RL algorithms like proximal policy optimization (PPO) after reading this tutorial.
Keyword: adam
Fine-Grained Image-Text Alignment in Medical Imaging Enables Cyclic Image-Report Generation
Abstract
To address these issues, we propose a novel Adaptive patch-word Matching (AdaMatch) model to correlate chest X-ray (CXR) image regions with words in medical reports and apply it to CXR-report generation to provide explainability for the generation process. AdaMatch exploits the fine-grained relation between adaptive patches and words to provide explanations of specific image regions with corresponding words. To capture the abnormal regions of varying sizes and positions, we introduce the Adaptive Patch extraction (AdaPatch) module to acquire the adaptive patches for these regions adaptively. In order to provide explicit explainability for CXR-report generation task, we propose an AdaMatch-based bidirectional large language model for Cyclic CXR-report generation (AdaMatch-Cyclic). It employs the AdaMatch to obtain the keywords for CXR images and `keypatches' for medical reports as hints to guide CXR-report generation. Extensive experiments on two publicly available CXR datasets prove the effectiveness of our method and its superior performance to existing methods.
Keyword: gradient
SE(3)-Invariant Multiparameter Persistent Homology for Chiral-Sensitive Molecular Property Prediction
Authors: Authors: Andac Demir, Francis Prael III, Bulent Kiziltan
Abstract
In this study, we present a novel computational method for generating molecular fingerprints using multiparameter persistent homology (MPPH). This technique holds considerable significance for drug discovery and materials science, where precise molecular property prediction is vital. By integrating SE(3)-invariance with Vietoris-Rips persistent homology, we effectively capture the three-dimensional representations of molecular chirality. This non-superimposable mirror image property directly influences the molecular interactions, serving as an essential factor in molecular property prediction. We explore the underlying topologies and patterns in molecular structures by applying Vietoris-Rips persistent homology across varying scales and parameters such as atomic weight, partial charge, bond type, and chirality. Our method's efficacy can be improved by incorporating additional parameters such as aromaticity, orbital hybridization, bond polarity, conjugated systems, as well as bond and torsion angles. Additionally, we leverage Stochastic Gradient Langevin Boosting in a Bayesian ensemble of GBDTs to obtain aleatoric and epistemic uncertainty estimates for gradient boosting models. With these uncertainty estimates, we prioritize high-uncertainty samples for active learning and model fine-tuning, benefiting scenarios where data labeling is costly or time consuming. Compared to conventional GNNs which usually suffer from oversmoothing and oversquashing, MPPH provides a more comprehensive and interpretable characterization of molecular data topology. We substantiate our approach with theoretical stability guarantees and demonstrate its superior performance over existing state-of-the-art methods in predicting molecular properties through extensive evaluations on the MoleculeNet benchmark datasets.
Go beyond End-to-End Training: Boosting Greedy Local Learning with Context Supply
Abstract
Traditional end-to-end (E2E) training of deep networks necessitates storing intermediate activations for back-propagation, resulting in a large memory footprint on GPUs and restricted model parallelization. As an alternative, greedy local learning partitions the network into gradient-isolated modules and trains supervisely based on local preliminary losses, thereby providing asynchronous and parallel training methods that substantially reduce memory cost. However, empirical experiments reveal that as the number of segmentations of the gradient-isolated module increases, the performance of the local learning scheme degrades substantially, severely limiting its expansibility. To avoid this issue, we theoretically analyze the greedy local learning from the standpoint of information theory and propose a ContSup scheme, which incorporates context supply between isolated modules to compensate for information loss. Experiments on benchmark datasets (i.e. CIFAR, SVHN, STL-10) achieve SOTA results and indicate that our proposed method can significantly improve the performance of greedy local learning with minimal memory and computational overhead, allowing for the boost of the number of isolated modules. Our codes are available at https://github.com/Tab-ct/ContSup.
CaVE: A Cone-Aligned Approach for Fast Predict-then-optimize with Binary Linear Programs
Authors: Authors: Bo Tang, Elias B. Khalil
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Abstract
The end-to-end predict-then-optimize framework, also known as decision-focused learning, has gained popularity for its ability to integrate optimization into the training procedure of machine learning models that predict the unknown cost (objective function) coefficients of optimization problems from contextual instance information. Naturally, most of the problems of interest in this space can be cast as integer linear programs. In this work, we focus on binary linear programs (BLPs) and propose a new end-to-end training method for predict-then-optimize. Our method, Cone-aligned Vector Estimation (CaVE), aligns the predicted cost vectors with the cone corresponding to the true optimal solution of a training instance. When the predicted cost vector lies inside the cone, the optimal solution to the linear relaxation of the binary problem is optimal w.r.t. to the true cost vector. Not only does this alignment produce decision-aware learning models, but it also dramatically reduces training time as it circumvents the need to solve BLPs to compute a loss function with its gradients. Experiments across multiple datasets show that our method exhibits a favorable trade-off between training time and solution quality, particularly with large-scale optimization problems such as vehicle routing, a hard BLP that has yet to benefit from predict-then-optimize methods in the literature due to its difficulty.
IDKM: Memory Efficient Neural Network Quantization via Implicit, Differentiable $k$-Means
Authors: Authors: Sean Jaffe, Ambuj K. Singh, Francesco Bullo
Abstract
Compressing large neural networks with minimal performance loss is crucial to enabling their deployment on edge devices. (Cho et al., 2022) proposed a weight quantization method that uses an attention-based clustering algorithm called differentiable $k$-means (DKM). Despite achieving state-of-the-art results, DKM's performance is constrained by its heavy memory dependency. We propose an implicit, differentiable $k$-means algorithm (IDKM), which eliminates the major memory restriction of DKM. Let $t$ be the number of $k$-means iterations, $m$ be the number of weight-vectors, and $b$ be the number of bits per cluster address. IDKM reduces the overall memory complexity of a single $k$-means layer from $\mathcal{O}(t \cdot m \cdot 2^b)$ to $\mathcal{O}( m \cdot 2^b)$. We also introduce a variant, IDKM with Jacobian-Free-Backpropagation (IDKM-JFB), for which the time complexity of the gradient calculation is independent of $t$ as well. We provide a proof of concept of our methods by showing that, under the same settings, IDKM achieves comparable performance to DKM with less compute time and less memory. We also use IDKM and IDKM-JFB to quantize a large neural network, Resnet18, on hardware where DKM cannot train at all.
On the Dynamics Under the Unhinged Loss and Beyond
Abstract
Recent works have studied implicit biases in deep learning, especially the behavior of last-layer features and classifier weights. However, they usually need to simplify the intermediate dynamics under gradient flow or gradient descent due to the intractability of loss functions and model architectures. In this paper, we introduce the unhinged loss, a concise loss function, that offers more mathematical opportunities to analyze the closed-form dynamics while requiring as few simplifications or assumptions as possible. The unhinged loss allows for considering more practical techniques, such as time-vary learning rates and feature normalization. Based on the layer-peeled model that views last-layer features as free optimization variables, we conduct a thorough analysis in the unconstrained, regularized, and spherical constrained cases, as well as the case where the neural tangent kernel remains invariant. To bridge the performance of the unhinged loss to that of Cross-Entropy (CE), we investigate the scenario of fixing classifier weights with a specific structure, (e.g., a simplex equiangular tight frame). Our analysis shows that these dynamics converge exponentially fast to a solution depending on the initialization of features and classifier weights. These theoretical results not only offer valuable insights, including explicit feature regularization and rescaled learning rates for enhancing practical training with the unhinged loss, but also extend their applicability to other loss functions. Finally, we empirically demonstrate these theoretical results and insights through extensive experiments.
Noise in the reverse process improves the approximation capabilities of diffusion models
Abstract
In Score based Generative Modeling (SGMs), the state-of-the-art in generative modeling, stochastic reverse processes are known to perform better than their deterministic counterparts. This paper delves into the heart of this phenomenon, comparing neural ordinary differential equations (ODEs) and neural stochastic differential equations (SDEs) as reverse processes. We use a control theoretic perspective by posing the approximation of the reverse process as a trajectory tracking problem. We analyze the ability of neural SDEs to approximate trajectories of the Fokker-Planck equation, revealing the advantages of stochasticity. First, neural SDEs exhibit a powerful regularizing effect, enabling $L^2$ norm trajectory approximation surpassing the Wasserstein metric approximation achieved by neural ODEs under similar conditions, even when the reference vector field or score function is not Lipschitz. Applying this result, we establish the class of distributions that can be sampled using score matching in SGMs, relaxing the Lipschitz requirement on the gradient of the data distribution in existing literature. Second, we show that this approximation property is preserved when network width is limited to the input dimension of the network. In this limited width case, the weights act as control inputs, framing our analysis as a controllability problem for neural SDEs in probability density space. This sheds light on how noise helps to steer the system towards the desired solution and illuminates the empirical success of stochasticity in generative modeling.
DTL: Disentangled Transfer Learning for Visual Recognition
Authors: Authors: Minghao Fu, Ke Zhu, Jianxin Wu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
When pre-trained models become rapidly larger, the cost of fine-tuning on downstream tasks steadily increases, too. To economically fine-tune these models, parameter-efficient transfer learning (PETL) is proposed, which only tunes a tiny subset of trainable parameters to efficiently learn quality representations. However, current PETL methods are facing the dilemma that during training the GPU memory footprint is not effectively reduced as trainable parameters. PETL will likely fail, too, if the full fine-tuning encounters the out-of-GPU-memory issue. This phenomenon happens because trainable parameters from these methods are generally entangled with the backbone, such that a lot of intermediate states have to be stored in GPU memory for gradient propagation. To alleviate this problem, we introduce Disentangled Transfer Learning (DTL), which disentangles the trainable parameters from the backbone using a lightweight Compact Side Network (CSN). By progressively extracting task-specific information with a few low-rank linear mappings and appropriately adding the information back to the backbone, CSN effectively realizes knowledge transfer in various downstream tasks. We conducted extensive experiments to validate the effectiveness of our method. The proposed method not only reduces a large amount of GPU memory usage and trainable parameters, but also outperforms existing PETL methods by a significant margin in accuracy, achieving new state-of-the-art on several standard benchmarks.
Enhancing Robotic Navigation: An Evaluation of Single and Multi-Objective Reinforcement Learning Strategies
Authors: Authors: Vicki Young, Jumman Hossain, Nirmalya Roy
Abstract
This study presents a comparative analysis between single-objective and multi-objective reinforcement learning methods for training a robot to navigate effectively to an end goal while efficiently avoiding obstacles. Traditional reinforcement learning techniques, namely Deep Q-Network (DQN), Deep Deterministic Policy Gradient (DDPG), and Twin Delayed DDPG (TD3), have been evaluated using the Gazebo simulation framework in a variety of environments with parameters such as random goal and robot starting locations. These methods provide a numerical reward to the robot, offering an indication of action quality in relation to the goal. However, their limitations become apparent in complex settings where multiple, potentially conflicting, objectives are present. To address these limitations, we propose an approach employing Multi-Objective Reinforcement Learning (MORL). By modifying the reward function to return a vector of rewards, each pertaining to a distinct objective, the robot learns a policy that effectively balances the different goals, aiming to achieve a Pareto optimal solution. This comparative study highlights the potential for MORL in complex, dynamic robotic navigation tasks, setting the stage for future investigations into more adaptable and robust robotic behaviors.
Topology-Dependent Privacy Bound For Decentralized Federated Learning
Authors: Authors: Qiongxiu Li, Wenrui Yu, Changlong Ji, Richard Heusdens
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract
Decentralized Federated Learning (FL) has attracted significant attention due to its enhanced robustness and scalability compared to its centralized counterpart. It pivots on peer-to-peer communication rather than depending on a central server for model aggregation. While prior research has delved into various factors of decentralized FL such as aggregation methods and privacy-preserving techniques, one crucial aspect affecting privacy is relatively unexplored: the underlying graph topology. In this paper, we fill the gap by deriving a stringent privacy bound for decentralized FL under the condition that the accuracy is not compromised, highlighting the pivotal role of graph topology. Specifically, we demonstrate that the minimum privacy loss at each model aggregation step is dependent on the size of what we term as 'honest components', the maximally connected subgraphs once all untrustworthy participants are excluded from the networks, which is closely tied to network robustness. Our analysis suggests that attack-resilient networks will provide a superior privacy guarantee. We further validate this by studying both Poisson and power law networks, showing that the latter, being less robust against attacks, indeed reveals more privacy. In addition to a theoretical analysis, we consolidate our findings by examining two distinct privacy attacks: membership inference and gradient inversion.
Kimad: Adaptive Gradient Compression with Bandwidth Awareness
Authors: Authors: Jihao Xin, Ivan Ilin, Shunkang Zhang, Marco Canini, Peter Richtárik
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Information Theory (cs.IT)
Abstract
In distributed training, communication often emerges as a bottleneck. In response, we introduce Kimad, a solution that offers adaptive gradient compression. By consistently monitoring bandwidth, Kimad refines compression ratios to match specific neural network layer requirements. Our exhaustive tests and proofs confirm Kimad's outstanding performance, establishing it as a benchmark in adaptive compression for distributed deep learning.
CIDR: A Cooperative Integrated Dynamic Refining Method for Minimal Feature Removal Problem
Authors: Authors: Qian Chen, Taolin Zhang, Dongyang Li, Xiaofeng He
Abstract
The minimal feature removal problem in the post-hoc explanation area aims to identify the minimal feature set (MFS). Prior studies using the greedy algorithm to calculate the minimal feature set lack the exploration of feature interactions under a monotonic assumption which cannot be satisfied in general scenarios. In order to address the above limitations, we propose a Cooperative Integrated Dynamic Refining method (CIDR) to efficiently discover minimal feature sets. Specifically, we design Cooperative Integrated Gradients (CIG) to detect interactions between features. By incorporating CIG and characteristics of the minimal feature set, we transform the minimal feature removal problem into a knapsack problem. Additionally, we devise an auxiliary Minimal Feature Refinement algorithm to determine the minimal feature set from numerous candidate sets. To the best of our knowledge, our work is the first to address the minimal feature removal problem in the field of natural language processing. Extensive experiments demonstrate that CIDR is capable of tracing representative minimal feature sets with improved interpretability across various models and datasets.
Conceptualizing Suicidal Behavior: Utilizing Explanations of Predicted Outcomes to Analyze Longitudinal Social Media Data
Authors: Authors: Van Minh Nguyen, Nasheen Nur, William Stern, Thomas Mercer, Chiradeep Sen, Siddhartha Bhattacharyya, Victor Tumbiolo, Seng Jhing Goh
Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY); Social and Information Networks (cs.SI)
Abstract
The COVID-19 pandemic has escalated mental health crises worldwide, with social isolation and economic instability contributing to a rise in suicidal behavior. Suicide can result from social factors such as shame, abuse, abandonment, and mental health conditions like depression, Post-Traumatic Stress Disorder (PTSD), Attention-Deficit/Hyperactivity Disorder (ADHD), anxiety disorders, and bipolar disorders. As these conditions develop, signs of suicidal ideation may manifest in social media interactions. Analyzing social media data using artificial intelligence (AI) techniques can help identify patterns of suicidal behavior, providing invaluable insights for suicide prevention agencies, professionals, and broader community awareness initiatives. Machine learning algorithms for this purpose require large volumes of accurately labeled data. Previous research has not fully explored the potential of incorporating explanations in analyzing and labeling longitudinal social media data. In this study, we employed a model explanation method, Layer Integrated Gradients, on top of a fine-tuned state-of-the-art language model, to assign each token from Reddit users' posts an attribution score for predicting suicidal ideation. By extracting and analyzing attributions of tokens from the data, we propose a methodology for preliminary screening of social media posts for suicidal ideation without using large language models during inference.
Keyword: super-resolution
Semantic-Lens: Instance-Centric Semantic Alignment for Video Super-Resolution
Authors: Authors: Qi Tang, Yao Zhao, Meiqin Liu, Jian Jin, Chao Yao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
As a critical clue of video super-resolution (VSR), inter-frame alignment significantly impacts overall performance. However, accurate pixel-level alignment is a challenging task due to the intricate motion interweaving in the video. In response to this issue, we introduce a novel paradigm for VSR named \textbf{Semantic Lens}, predicated on semantic priors drawn from degraded videos. Specifically, video is modeled as instances, events, and scenes via a Semantic Extractor. Those semantics assist the Pixel Enhancer in understanding the recovered contents and generating more realistic visual results. The distilled global semantics embody the scene information of each frame, while the instance-specific semantics assemble the spatial-temporal contexts related to each instance. Furthermore, we devise a \textbf{S}emantics-\textbf{P}owered \textbf{A}ttention \textbf{C}ross-\textbf{E}mbedding (SPACE) block to bridge the pixel-level features with semantic knowledge, composed of a \textbf{G}lobal \textbf{P}erspective \textbf{S}hifter (GPS) and an \textbf{I}nstance-Specific \textbf{S}emantic \textbf{E}mbedding \textbf{E}ncoder (ISEE). Concretely, the GPS module generates pairs of affine transformation parameters for pixel-level feature modulation conditioned on global semantics. After that, the ISEE module harnesses the attention mechanism to align the adjacent frames in the instance-centric semantic space. In addition, we incorporate a simple yet effective pre-alignment module to alleviate the difficulty of model training. Extensive experiments demonstrate the superiority of our model over existing state-of-the-art VSR methods.
Video Dynamics Prior: An Internal Learning Approach for Robust Video Enhancements
Abstract
In this paper, we present a novel robust framework for low-level vision tasks, including denoising, object removal, frame interpolation, and super-resolution, that does not require any external training data corpus. Our proposed approach directly learns the weights of neural modules by optimizing over the corrupted test sequence, leveraging the spatio-temporal coherence and internal statistics of videos. Furthermore, we introduce a novel spatial pyramid loss that leverages the property of spatio-temporal patch recurrence in a video across the different scales of the video. This loss enhances robustness to unstructured noise in both the spatial and temporal domains. This further results in our framework being highly robust to degradation in input frames and yields state-of-the-art results on downstream tasks such as denoising, object removal, and frame interpolation. To validate the effectiveness of our approach, we conduct qualitative and quantitative evaluations on standard video datasets such as DAVIS, UCF-101, and VIMEO90K-T.
CoIE: Chain-of-Instruct Editing for Multi-Attribute Face Manipulation
Authors: Authors: Zhenduo Zhang, Bowen Zhang, Guang Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Current text-to-image editing models often encounter challenges with smoothly manipulating multiple attributes using a single instruction. Taking inspiration from the Chain-of-Thought prompting technique utilized in language models, we present an innovative concept known as Chain-of-Instruct Editing (CoIE), which enhances the capabilities of these models through step-by-step editing using a series of instructions. In particular, in the context of face manipulation, we leverage the contextual learning abilities of a pretrained Large Language Model (LLM), such as GPT-4, to generate a sequence of instructions from the original input, utilizing a purpose-designed 1-shot template. To further improve the precision of each editing step, we conduct fine-tuning on the editing models using our self-constructed instruction-guided face editing dataset, Instruct-CelebA. And additionally, we incorporate a super-resolution module to mitigate the adverse effects of editability and quality degradation. Experimental results across various challenging cases confirm the significant boost in multi-attribute facial image manipulation using chain-of-instruct editing. This is evident in enhanced editing success rates, measured by CLIPSim and Coverage metrics, improved by 17.86% and 85.45% respectively, and heightened controllability indicated by Preserve L1 and Quality metrics, improved by 11.58% and 4.93% respectively.
EventAid: Benchmarking Event-aided Image/Video Enhancement Algorithms with Real-captured Hybrid Dataset
Authors: Authors: Peiqi Duan, Boyu Li, Yixin Yang, Hanyue Lou, Minggui Teng, Yi Ma, Boxin Shi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Event cameras are emerging imaging technology that offers advantages over conventional frame-based imaging sensors in dynamic range and sensing speed. Complementing the rich texture and color perception of traditional image frames, the hybrid camera system of event and frame-based cameras enables high-performance imaging. With the assistance of event cameras, high-quality image/video enhancement methods make it possible to break the limits of traditional frame-based cameras, especially exposure time, resolution, dynamic range, and frame rate limits. This paper focuses on five event-aided image and video enhancement tasks (i.e., event-based video reconstruction, event-aided high frame rate video reconstruction, image deblurring, image super-resolution, and high dynamic range image reconstruction), provides an analysis of the effects of different event properties, a real-captured and ground truth labeled benchmark dataset, a unified benchmarking of state-of-the-art methods, and an evaluation for two mainstream event simulators. In detail, this paper collects a real-captured evaluation dataset EventAid for five event-aided image/video enhancement tasks, by using "Event-RGB" multi-camera hybrid system, taking into account scene diversity and spatiotemporal synchronization. We further perform quantitative and visual comparisons for state-of-the-art algorithms, provide a controlled experiment to analyze the performance limit of event-aided image deblurring methods, and discuss open problems to inspire future research.
Keyword: sgd
There is no result
Keyword: optimization
Large Language Models for Intent-Driven Session Recommendations
Adaptive Proximal Policy Optimization with Upper Confidence Bound
Multi Armed Bandit based Resource Allocation in Near Memory Processing Architectures
GP+: A Python Library for Kernel-based learning via Gaussian Processes
CaVE: A Cone-Aligned Approach for Fast Predict-then-optimize with Binary Linear Programs
Interpretable factorization of clinical questionnaires to identify latent factors of psychopathology
Safe Exploration in Reinforcement Learning: Training Backup Control Barrier Functions with Zero Training Time Safety Violations
On the Dynamics Under the Unhinged Loss and Beyond
Large Language Model Enhanced Multi-Agent Systems for 6G Communications
MMSE Design of RIS-aided Communications
SimAC: A Simple Anti-Customization Method against Text-to-Image Synthesis of Diffusion Models
Optimization of Power Control for Autonomous Hybrid Electric Vehicles with Flexible Power Demand
BinGo: Identifying Security Patches in Binary Code with Graph Representation Learning
Polar-Doc: One-Stage Document Dewarping with Multi-Scope Constraints under Polar Representation
An efficient algorithm for multiuser sum-rate maximization of large-scale active RIS-aided MIMO system
Incremental Computation: What Is the Essence?
CBQ: Cross-Block Quantization for Large Language Models
Performance of linear solvers in tensor-train format on current multicore architectures
Secure Deep Reinforcement Learning for Dynamic Resource Allocation in Wireless MEC Networks
AdapEdit: Spatio-Temporal Guided Adaptive Editing Algorithm for Text-Based Continuity-Sensitive Image Editing
ClusterDDPM: An EM clustering framework with Denoising Diffusion Probabilistic Models
Event-Triggered Safe Bayesian Optimization on Quadcopters
An Incentive Mechanism for Federated Learning Based on Multiple Resource Exchange
Machine Learning for the Multi-Dimensional Bin Packing Problem: Literature Review and Empirical Evaluation
Memory Simulations, Security and Optimization in a Verified Compiler
Random relay selection based heuristic optimization model for the scheduling and effective resource allocation in the cognitive radio network
MToP: A MATLAB Optimization Platform for Evolutionary Multitasking
Smart Energy Management with Optimized Prosumerism for Achieving Dynamic Net-Zero Balance in Electrified Road Transport Networks
A Precoding for ORIS-Assisted MIMO Multi-User VLC System
Curriculum-Enhanced Residual Soft An-Isotropic Normalization for Over-smoothness in Deep GNNs
GLOP: Learning Global Partition and Local Construction for Solving Large-scale Routing Problems in Real-time
Green Operations of SWIPT Networks: The Role of End-User Devices
A Survey of Generative AI for Intelligent Transportation Systems
An Invitation to Deep Reinforcement Learning
Keyword: adam
Fine-Grained Image-Text Alignment in Medical Imaging Enables Cyclic Image-Report Generation
Keyword: gradient
SE(3)-Invariant Multiparameter Persistent Homology for Chiral-Sensitive Molecular Property Prediction
Go beyond End-to-End Training: Boosting Greedy Local Learning with Context Supply
CaVE: A Cone-Aligned Approach for Fast Predict-then-optimize with Binary Linear Programs
IDKM: Memory Efficient Neural Network Quantization via Implicit, Differentiable $k$-Means
On the Dynamics Under the Unhinged Loss and Beyond
Noise in the reverse process improves the approximation capabilities of diffusion models
DTL: Disentangled Transfer Learning for Visual Recognition
Enhancing Robotic Navigation: An Evaluation of Single and Multi-Objective Reinforcement Learning Strategies
Topology-Dependent Privacy Bound For Decentralized Federated Learning
Kimad: Adaptive Gradient Compression with Bandwidth Awareness
CIDR: A Cooperative Integrated Dynamic Refining Method for Minimal Feature Removal Problem
Conceptualizing Suicidal Behavior: Utilizing Explanations of Predicted Outcomes to Analyze Longitudinal Social Media Data
Keyword: super-resolution
Semantic-Lens: Instance-Centric Semantic Alignment for Video Super-Resolution
Video Dynamics Prior: An Internal Learning Approach for Robust Video Enhancements
CoIE: Chain-of-Instruct Editing for Multi-Attribute Face Manipulation
EventAid: Benchmarking Event-aided Image/Video Enhancement Algorithms with Real-captured Hybrid Dataset