New submissions for Thu, 12 Oct 23

Keyword: sgd

Why Does Sharpness-Aware Minimization Generalize Better Than SGD?

Authors: Authors: Zixiang Chen, Junkai Zhang, Yiwen Kou, Xiangning Chen, Cho-Jui Hsieh, Quanquan Gu
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.07269
Pdf link: https://arxiv.org/pdf/2310.07269
Abstract The challenge of overfitting, in which the model memorizes the training data and fails to generalize to test data, has become increasingly significant in the training of large neural networks. To tackle this challenge, Sharpness-Aware Minimization (SAM) has emerged as a promising training method, which can improve the generalization of neural networks even in the presence of label noise. However, a deep understanding of how SAM works, especially in the setting of nonlinear neural networks and classification tasks, remains largely missing. This paper fills this gap by demonstrating why SAM generalizes better than Stochastic Gradient Descent (SGD) for a certain data model and two-layer convolutional ReLU networks. The loss landscape of our studied problem is nonsmooth, thus current explanations for the success of SAM based on the Hessian information are insufficient. Our result explains the benefits of SAM, particularly its ability to prevent noise learning in the early stages, thereby facilitating more effective learning of features. Experiments on both synthetic and real data corroborate our theory.
Keyword: optimization

Design of JiuTian Intelligent Network Simulation Platform
Authors: Authors: Lei Zhao, Miaomiao Zhang, Guangyu Li, Zhuowen Guan, Sijia Liu, Zhaobin Xiao, Yuting Cao, Zhe Lv, Yanping Liang
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.06858
Pdf link: https://arxiv.org/pdf/2310.06858
Abstract This paper introduced the JiuTian Intelligent Network Simulation Platform, which can provide wireless communication simulation data services for the Open Innovation Platform. The platform contains a series of scalable simulator functionalities, offering open services that enable users to use reinforcement learning algorithms for model training and inference based on simulation environments and data. Additionally, it allows users to address optimization tasks in different scenarios by uploading and updating parameter configurations. The platform and its open services were primarily introduced from the perspectives of background, overall architecture, simulator, business scenarios, and future directions.
Extended Reality via Cooperative NOMA in Hybrid Cloud/Mobile-Edge Computing Networks
Authors: Authors: Robert-Jeron Reifert, Hayssam Dahrouj, Aydin Sezgin
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2310.06874
Pdf link: https://arxiv.org/pdf/2310.06874
Abstract Extended reality (XR) applications often perform resource-intensive tasks, which are computed remotely, a process that prioritizes the latency criticality aspect. To this end, this paper shows that through leveraging the power of the central cloud (CC), the close proximity of edge computers (ECs), and the flexibility of uncrewed aerial vehicles (UAVs), a UAV-aided hybrid cloud/mobile-edge computing architecture promises to handle the intricate requirements of future XR applications. In this context, this paper distinguishes between two types of XR devices, namely, strong and weak devices. The paper then introduces a cooperative non-orthogonal multiple access (Co-NOMA) scheme, pairing strong and weak devices, so as to aid the XR devices quality-of-user experience by intelligently selecting either the direct or the relay links toward the weak XR devices. A sum logarithmic-rate maximization problem is, thus, formulated so as to jointly determine the computation and communication resources, and link-selection strategy as a means to strike a trade-off between the system throughput and fairness. Subject to realistic network constraints, e.g., power consumption and delay, the optimization problem is then solved iteratively via discrete relaxations, successive-convex approximation, and fractional programming, an approach which can be implemented in a distributed fashion across the network. Simulation results validate the proposed algorithms performance in terms of log-rate maximization, delay-sensitivity, scalability, and runtime performance. The practical distributed Co-NOMA implementation is particularly shown to offer appreciable benefits over traditional multiple access and NOMA methods, highlighting its applicability in decentralized XR systems.
Reinforcement Learning in a Safety-Embedded MDP with Trajectory Optimization
Authors: Authors: Fan Yang, Wenxuan Zhou, Zuxin Liu, Ding Zhao, David Held
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.06903
Pdf link: https://arxiv.org/pdf/2310.06903
Abstract Safe Reinforcement Learning (RL) plays an important role in applying RL algorithms to safety-critical real-world applications, addressing the trade-off between maximizing rewards and adhering to safety constraints. This work introduces a novel approach that combines RL with trajectory optimization to manage this trade-off effectively. Our approach embeds safety constraints within the action space of a modified Markov Decision Process (MDP). The RL agent produces a sequence of actions that are transformed into safe trajectories by a trajectory optimizer, thereby effectively ensuring safety and increasing training stability. This novel approach excels in its performance on challenging Safety Gym tasks, achieving significantly higher rewards and near-zero safety violations during inference. The method's real-world applicability is demonstrated through a safe and effective deployment in a real robot task of box-pushing around obstacles.
PICProp: Physics-Informed Confidence Propagation for Uncertainty Quantification
Authors: Authors: Qianli Shen, Wai Hoh Tang, Zhun Deng, Apostolos Psaros, Kenji Kawaguchi
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.06923
Pdf link: https://arxiv.org/pdf/2310.06923
Abstract Standard approaches for uncertainty quantification in deep learning and physics-informed learning have persistent limitations. Indicatively, strong assumptions regarding the data likelihood are required, the performance highly depends on the selection of priors, and the posterior can be sampled only approximately, which leads to poor approximations because of the associated computational cost. This paper introduces and studies confidence interval (CI) estimation for deterministic partial differential equations as a novel problem. That is, to propagate confidence, in the form of CIs, from data locations to the entire domain with probabilistic guarantees. We propose a method, termed Physics-Informed Confidence Propagation (PICProp), based on bi-level optimization to compute a valid CI without making heavy assumptions. We provide a theorem regarding the validity of our method, and computational experiments, where the focus is on physics-informed learning.
Eclares: Energy-Aware Clarity-Driven Ergodic Search
Authors: Authors: Kaleb Ben Naveed, Devansh Agrawal, Christopher Vermillion, Dimitra Panagou
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2310.06933
Pdf link: https://arxiv.org/pdf/2310.06933
Abstract Planning informative trajectories while considering the spatial distribution of the information over the environment, as well as constraints such as the robot's limited battery capacity, makes the long-time horizon persistent coverage problem complex. Ergodic search methods consider the spatial distribution of environmental information while optimizing robot trajectories; however, current methods lack the ability to construct the target information spatial distribution for environments that vary stochastically across space and time. Moreover, current coverage methods dealing with battery capacity constraints either assume simple robot and battery models, or are computationally expensive. To address these problems, we propose a framework called Eclares, in which our contribution is two-fold. 1) First, we propose a method to construct the target information spatial distribution for ergodic trajectory optimization using clarity, an information measure bounded between [0,1]. The clarity dynamics allows us to capture information decay due to lack of measurements and to quantify the maximum attainable information in stochastic spatiotemporal environments. 2) Second, instead of directly tracking the ergodic trajectory, we introduce the energy-aware (eware) filter, which iteratively validates the ergodic trajectory to ensure that the robot has enough energy to return to the charging station when needed. The proposed eware filter is applicable to nonlinear robot models and is computationally lightweight. We demonstrate the working of the framework through a simulation case study.
On the Role of Font Formats in Building Efficient Web Applications
Authors: Authors: Benedikt Dornauer, Wolfgang Vigl, Michael Felderer
Subjects: Performance (cs.PF); Multimedia (cs.MM)
Arxiv link: https://arxiv.org/abs/2310.06939
Pdf link: https://arxiv.org/pdf/2310.06939
Abstract The success of a web application is closely linked to its performance, which positively impacts user satisfaction and contributes to energy-saving efforts. Among the various optimization techniques, one specific subject focuses on improving the utilization of web fonts. This study investigates the impact of different font formats on client-side resource consumption, such as CPU, memory, load time, and energy. In a controlled experiment, we evaluate performance metrics using the four font formats: OTF, TTF, WOFF, and WOFF2. The results of the study show that there are significant differences between all pair-wise format comparisons regarding all performance metrics. Overall, WOFF2 performs best, except in terms of memory allocation. Through the study and examination of literature, this research contributes (1) an overview of methodologies to enhance web performance through font utilization, (2) a specific exploration of the four prevalent font formats in an experimental setup, and (3) practical recommendations for scientific professionals and practitioners.
Adversarial optimization leads to over-optimistic security-constrained dispatch, but sampling can help
Authors: Authors: Charles Dawson, Chuchu Fan
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.06956
Pdf link: https://arxiv.org/pdf/2310.06956
Abstract To ensure safe, reliable operation of the electrical grid, we must be able to predict and mitigate likely failures. This need motivates the classic security-constrained AC optimal power flow (SCOPF) problem. SCOPF is commonly solved using adversarial optimization, where the dispatcher and an adversary take turns optimizing a robust dispatch and adversarial attack, respectively. We show that adversarial optimization is liable to severely overestimate the robustness of the optimized dispatch (when the adversary encounters a local minimum), leading the operator to falsely believe that their dispatch is secure. To prevent this overconfidence, we develop a novel adversarial sampling approach that prioritizes diversity in the predicted attacks. We find that our method not only substantially improves the robustness of the optimized dispatch but also avoids overconfidence, accurately characterizing the likelihood of voltage collapse under a given threat model. We demonstrate a proof-of-concept on small-scale transmission systems with 14 and 57 nodes.
Multi-Robot Cooperative Navigation in Crowds: A Game-Theoretic Learning-Based Model Predictive Control Approach
Authors: Authors: Viet-Anh Le, Vaishnav Tadiparthi, Behdad Chalaki, Hossein Nourkhiz Mahjoub, Jovin D'sa, Ehsan Moradi-Pari, Andreas A. Malikopoulos
Subjects: Robotics (cs.RO); Multiagent Systems (cs.MA)
Arxiv link: https://arxiv.org/abs/2310.06964
Pdf link: https://arxiv.org/pdf/2310.06964
Abstract In this paper, we develop a control framework for the coordination of multiple robots as they navigate through crowded environments. Our framework comprises of a local model predictive control (MPC) for each robot and a social long short-term memory model that forecasts pedestrians' trajectories. We formulate the local MPC formulation for each individual robot that includes both individual and shared objectives, in which the latter encourages the emergence of coordination among robots. Next, we consider the multi-robot navigation and human-robot interaction, respectively, as a potential game and a two-player game, then employ an iterative best response approach to solve the resulting optimization problems in a centralized and distributed fashion. Finally, we demonstrate the effectiveness of coordination among robots in simulated crowd navigation.
Reconfigurable Intelligent Surfaces-Enabled Intra-Cell Pilot Reuse in Massive MIMO Systems
Authors: Authors: Jose Carlos Marinello Filho, Taufik Abrao, Ekram Hossain, Amine Mezghani
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP); Applications (stat.AP)
Arxiv link: https://arxiv.org/abs/2310.06975
Pdf link: https://arxiv.org/pdf/2310.06975
Abstract Channel state information (CSI) estimation is a critical issue in the design of modern massive multiple-input multiple-output (mMIMO) networks. With the increasing number of users, assigning orthogonal pilots to everyone incurs a large overhead that strongly penalizes the system's spectral efficiency (SE). It becomes thus necessary to reuse pilots, giving rise to pilot contamination, a vital performance bottleneck of mMIMO networks. Reusing pilots among the users of the same cell is a desirable operation condition from the perspective of reducing training overheads; however, the intra-cell pilot contamination might worsen due to the users' proximity. Reconfigurable intelligent surfaces (RISs), capable of smartly controlling the wireless channel, can be leveraged for intra-cell pilot reuse. In this paper, our main contribution is a RIS-aided approach for intra-cell pilot reuse and the corresponding channel estimation method. Relying upon the knowledge of only statistical CSI, we optimize the RIS phase shifts based on a manifold optimization framework and the RIS positioning based on a deterministic approach. The extensive numerical results highlight the remarkable performance improvements the proposed scheme achieves (for both uplink and downlink transmissions) compared to other alternatives.
Neural Relational Inference with Fast Modular Meta-learning
Authors: Authors: Ferran Alet, Erica Weng, Tomás Lozano Pérez, Leslie Pack Kaelbling
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.07015
Pdf link: https://arxiv.org/pdf/2310.07015
Abstract \textit{Graph neural networks} (GNNs) are effective models for many dynamical systems consisting of entities and relations. Although most GNN applications assume a single type of entity and relation, many situations involve multiple types of interactions. \textit{Relational inference} is the problem of inferring these interactions and learning the dynamics from observational data. We frame relational inference as a \textit{modular meta-learning} problem, where neural modules are trained to be composed in different ways to solve many tasks. This meta-learning framework allows us to implicitly encode time invariance and infer relations in context of one another rather than independently, which increases inference capacity. Framing inference as the inner-loop optimization of meta-learning leads to a model-based approach that is more data-efficient and capable of estimating the state of entities that we do not observe directly, but whose existence can be inferred from their effect on observed entities. To address the large search space of graph neural network compositions, we meta-learn a \textit{proposal function} that speeds up the inner-loop simulated annealing search within the modular meta-learning algorithm, providing two orders of magnitude increase in the size of problems that can be addressed.
Neural Harmonium: An Interpretable Deep Structure for Nonlinear Dynamic System Identification with Application to Audio Processing
Authors: Authors: Karim Helwani, Erfan Soltanmohammadi, Michael M. Goodwin
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.07032
Pdf link: https://arxiv.org/pdf/2310.07032
Abstract Improving the interpretability of deep neural networks has recently gained increased attention, especially when the power of deep learning is leveraged to solve problems in physics. Interpretability helps us understand a model's ability to generalize and reveal its limitations. In this paper, we introduce a causal interpretable deep structure for modeling dynamic systems. Our proposed model makes use of the harmonic analysis by modeling the system in a time-frequency domain while maintaining high temporal and spectral resolution. Moreover, the model is built in an order recursive manner which allows for fast, robust, and exact second order optimization without the need for an explicit Hessian calculation. To circumvent the resulting high dimensionality of the building blocks of our system, a neural network is designed to identify the frequency interdependencies. The proposed model is illustrated and validated on nonlinear system identification problems as required for audio signal processing tasks. Crowd-sourced experimentation contrasting the performance of the proposed approach to other state-of-the-art solutions on an acoustic echo cancellation scenario confirms the effectiveness of our method for real-life applications.
GraphCloak: Safeguarding Task-specific Knowledge within Graph-structured Data from Unauthorized Exploitation
Authors: Authors: Yixin Liu, Chenrui Fan, Xun Chen, Pan Zhou, Lichao Sun
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2310.07100
Pdf link: https://arxiv.org/pdf/2310.07100
Abstract As Graph Neural Networks (GNNs) become increasingly prevalent in a variety of fields, from social network analysis to protein-protein interaction studies, growing concerns have emerged regarding the unauthorized utilization of personal data. Recent studies have shown that imperceptible poisoning attacks are an effective method of protecting image data from such misuse. However, the efficacy of this approach in the graph domain remains unexplored. To bridge this gap, this paper introduces GraphCloak to safeguard against the unauthorized usage of graph data. Compared with prior work, GraphCloak offers unique significant innovations: (1) graph-oriented, the perturbations are applied to both topological structures and descriptive features of the graph; (2) effective and stealthy, our cloaking method can bypass various inspections while causing a significant performance drop in GNNs trained on the cloaked graphs; and (3) stable across settings, our methods consistently perform effectively under a range of practical settings with limited knowledge. To address the intractable bi-level optimization problem, we propose two error-minimizing-based poisoning methods that target perturbations on the structural and feature space, along with a subgraph injection poisoning method. Our comprehensive evaluation of these methods underscores their effectiveness, stealthiness, and stability. We also delve into potential countermeasures and provide analytical justification for their effectiveness, paving the way for intriguing future research.
Off-Policy Evaluation for Human Feedback
Authors: Authors: Qitong Gao, Juncheng Dong, Vahid Tarokh, Min Chi, Miroslav Pajic
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.07123
Pdf link: https://arxiv.org/pdf/2310.07123
Abstract Off-policy evaluation (OPE) is important for closing the gap between offline training and evaluation of reinforcement learning (RL), by estimating performance and/or rank of target (evaluation) policies using offline trajectories only. It can improve the safety and efficiency of data collection and policy testing procedures in situations where online deployments are expensive, such as healthcare. However, existing OPE methods fall short in estimating human feedback (HF) signals, as HF may be conditioned over multiple underlying factors and is only sparsely available; as opposed to the agent-defined environmental rewards (used in policy optimization), which are usually determined over parametric functions or distributions. Consequently, the nature of HF signals makes extrapolating accurate OPE estimations to be challenging. To resolve this, we introduce an OPE for HF (OPEHF) framework that revives existing OPE methods in order to accurately evaluate the HF signals. Specifically, we develop an immediate human reward (IHR) reconstruction approach, regularized by environmental knowledge distilled in a latent space that captures the underlying dynamics of state transitions as well as issuing HF signals. Our approach has been tested over two real-world experiments, adaptive in-vivo neurostimulation and intelligent tutoring, as well as in a simulation environment (visual Q&A). Results show that our approach significantly improves the performance toward estimating HF signals accurately, compared to directly applying (variants of) existing OPE methods.
Risk Assessment and Statistical Significance in the Age of Foundation Models
Authors: Authors: Apoorva Nitsure, Youssef Mroueh, Mattia Rigotti, Kristjan Greenewald, Brian Belgodere, Mikhail Yurochkin, Jiri Navratil, Igor Melnyk, Jerret Ross
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST); Risk Management (q-fin.RM); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.07132
Pdf link: https://arxiv.org/pdf/2310.07132
Abstract We propose a distributional framework for assessing socio-technical risks of foundation models with quantified statistical significance. Our approach hinges on a new statistical relative testing based on first and second order stochastic dominance of real random variables. We show that the second order statistics in this test are linked to mean-risk models commonly used in econometrics and mathematical finance to balance risk and utility when choosing between alternatives. Using this framework, we formally develop a risk-aware approach for foundation model selection given guardrails quantified by specified metrics. Inspired by portfolio optimization and selection theory in mathematical finance, we define a \emph{metrics portfolio} for each model as a means to aggregate a collection of metrics, and perform model selection based on the stochastic dominance of these portfolios. The statistical significance of our tests is backed theoretically by an asymptotic analysis via central limit theorems instantiated in practice via a bootstrap variance estimate. We use our framework to compare various large language models regarding risks related to drifting from instructions and outputting toxic content.
Denoising Task Routing for Diffusion Models
Authors: Authors: Byeongjun Park, Sangmin Woo, Hyojun Go, Jin-Young Kim, Changick Kim
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.07138
Pdf link: https://arxiv.org/pdf/2310.07138
Abstract Diffusion models generate highly realistic images through learning a multi-step denoising process, naturally embodying the principles of multi-task learning (MTL). Despite the inherent connection between diffusion models and MTL, there remains an unexplored area in designing neural architectures that explicitly incorporate MTL into the framework of diffusion models. In this paper, we present Denoising Task Routing (DTR), a simple add-on strategy for existing diffusion model architectures to establish distinct information pathways for individual tasks within a single architecture by selectively activating subsets of channels in the model. What makes DTR particularly compelling is its seamless integration of prior knowledge of denoising tasks into the framework: (1) Task Affinity: DTR activates similar channels for tasks at adjacent timesteps and shifts activated channels as sliding windows through timesteps, capitalizing on the inherent strong affinity between tasks at adjacent timesteps. (2) Task Weights: During the early stages (higher timesteps) of the denoising process, DTR assigns a greater number of task-specific channels, leveraging the insight that diffusion models prioritize reconstructing global structure and perceptually rich contents in earlier stages, and focus on simple noise removal in later stages. Our experiments demonstrate that DTR consistently enhances the performance of diffusion models across various evaluation protocols, all without introducing additional parameters. Furthermore, DTR contributes to accelerating convergence during training. Finally, we show the complementarity between our architectural approach and existing MTL optimization techniques, providing a more complete view of MTL within the context of diffusion training.
Operating-Envelopes-Aware Decentralized Welfare Maximization for Energy Communities
Authors: Authors: Ahmed S. Alahmed, Guido Cavraro, Andrey Bernstein, Lang Tong
Subjects: Systems and Control (eess.SY); Theoretical Economics (econ.TH); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2310.07157
Pdf link: https://arxiv.org/pdf/2310.07157
Abstract We propose an operating-envelope-aware, prosumer-centric, and efficient energy community that aggregates individual and shared community distributed energy resources and transacts with a regulated distribution system operator (DSO) under a generalized net energy metering tariff design. To ensure safe network operation, the DSO imposes dynamic export and import limits, known as dynamic operating envelopes, on end-users' revenue meters. Given the operating envelopes, we propose an incentive-aligned community pricing mechanism under which the decentralized optimization of community members' benefit implies the optimization of overall community welfare. The proposed pricing mechanism satisfies the cost-causation principle and ensures the stability of the energy community in a coalition game setting. Numerical examples provide insights into the characteristics of the proposed pricing mechanism and quantitative measures of its performance.
Integrated Sensing and Communication enabled Multiple Base Stations Cooperative Sensing Towards 6G
Authors: Authors: Zhiqing Wei, Wangjun Jiang, Zhiyong Feng, Huici Wu, Ning Zhang, Kaifeng Han, Ruizhong Xu, Ping Zhang
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2310.07180
Pdf link: https://arxiv.org/pdf/2310.07180
Abstract Driven by the intelligent applications of sixth-generation (6G) mobile communication systems such as smart city and autonomous driving, which connect the physical and cyber space, the integrated sensing and communication (ISAC) brings a revolutionary change to the base stations (BSs) of 6G by integrating radar sensing and communication in the same hardware and wireless resource. However, with the requirements of long-range and accurate sensing in the applications of smart city and autonomous driving, the ISAC enabled single BS still has a limitation in the sensing range and accuracy. With the networked infrastructures of mobile communication systems, multi-BS cooperative sensing is a natural choice satisfying the requirement of long-range and accurate sensing. In this article, the framework of multi-BS cooperative sensing is proposed, breaking through the limitation of single-BS sensing. The enabling technologies, including unified ISAC performance metrics, ISAC signal design and optimization, interference management, cooperative sensing algorithms, are introduced in details. The performance evaluation results are provided to verify the effectiveness of multi-BS cooperative sensing schemes. With ISAC enabled multi-BS cooperative sensing (ISAC-MCS), the intelligent infrastructures connecting physical and cyber space can be established, ushering the era of 6G promoting the intelligence of everything.
$pκ$-Curves: Interpolatory curves with curvature approximating a parabola
Authors: Authors: Zhihao Wang, Juan Cao, Tuan Guan, Zhonggui Chen, Yongjie Jessica Zhang
Subjects: Computational Geometry (cs.CG); Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2310.07191
Pdf link: https://arxiv.org/pdf/2310.07191
Abstract This paper introduces a novel class of fair and interpolatory curves called $p\kappa$-curves. These curves are comprised of smoothly stitched B\'ezier curve segments, where the curvature distribution of each segment is made to closely resemble a parabola, resulting in an aesthetically pleasing shape. Moreover, each segment passes through an interpolated point at a parameter where the parabola has an extremum, encouraging the alignment of interpolated points with curvature extrema. To achieve these properties, we tailor an energy function that guides the optimization process to obtain the desired curve characteristics. Additionally, we develop an efficient algorithm and an initialization method, enabling interactive modeling of the $p\kappa$-curves without the need for global optimization. We provide various examples and comparisons with existing state-of-the-art methods to demonstrate the curve modeling capabilities and visually pleasing appearance of $p\kappa$-curves.
DeepSimHO: Stable Pose Estimation for Hand-Object Interaction via Physics Simulation
Authors: Authors: Rong Wang, Wei Mao, Hongdong Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.07206
Pdf link: https://arxiv.org/pdf/2310.07206
Abstract This paper addresses the task of 3D pose estimation for a hand interacting with an object from a single image observation. When modeling hand-object interaction, previous works mainly exploit proximity cues, while overlooking the dynamical nature that the hand must stably grasp the object to counteract gravity and thus preventing the object from slipping or falling. These works fail to leverage dynamical constraints in the estimation and consequently often produce unstable results. Meanwhile, refining unstable configurations with physics-based reasoning remains challenging, both by the complexity of contact dynamics and by the lack of effective and efficient physics inference in the data-driven learning framework. To address both issues, we present DeepSimHO: a novel deep-learning pipeline that combines forward physics simulation and backward gradient approximation with a neural network. Specifically, for an initial hand-object pose estimated by a base network, we forward it to a physics simulator to evaluate its stability. However, due to non-smooth contact geometry and penetration, existing differentiable simulators can not provide reliable state gradient. To remedy this, we further introduce a deep network to learn the stability evaluation process from the simulator, while smoothly approximating its gradient and thus enabling effective back-propagation. Extensive experiments show that our method noticeably improves the stability of the estimation and achieves superior efficiency over test-time optimization. The code is available at https://github.com/rongakowang/DeepSimHO.
Robust Safe Reinforcement Learning under Adversarial Disturbances
Authors: Authors: Zeyang Li, Chuxiong Hu, Shengbo Eben Li, Jia Cheng, Yunan Wang
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.07207
Pdf link: https://arxiv.org/pdf/2310.07207
Abstract Safety is a primary concern when applying reinforcement learning to real-world control tasks, especially in the presence of external disturbances. However, existing safe reinforcement learning algorithms rarely account for external disturbances, limiting their applicability and robustness in practice. To address this challenge, this paper proposes a robust safe reinforcement learning framework that tackles worst-case disturbances. First, this paper presents a policy iteration scheme to solve for the robust invariant set, i.e., a subset of the safe set, where persistent safety is only possible for states within. The key idea is to establish a two-player zero-sum game by leveraging the safety value function in Hamilton-Jacobi reachability analysis, in which the protagonist (i.e., control inputs) aims to maintain safety and the adversary (i.e., external disturbances) tries to break down safety. This paper proves that the proposed policy iteration algorithm converges monotonically to the maximal robust invariant set. Second, this paper integrates the proposed policy iteration scheme into a constrained reinforcement learning algorithm that simultaneously synthesizes the robust invariant set and uses it for constrained policy optimization. This algorithm tackles both optimality and safety, i.e., learning a policy that attains high rewards while maintaining safety under worst-case disturbances. Experiments on classic control tasks show that the proposed method achieves zero constraint violation with learned worst-case adversarial disturbances, while other baseline algorithms violate the safety constraints substantially. Our proposed method also attains comparable performance as the baselines even in the absence of the adversary.
Fault-tolerant $k$-Supplier with Outliers
Authors: Authors: Deeparnab Chakrabarty, Luc Cote, Ankita Sarkar
Subjects: Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2310.07208
Pdf link: https://arxiv.org/pdf/2310.07208
Abstract We present approximation algorithms for the Fault-tolerant $k$-Supplier with Outliers ($\mathsf{F}k\mathsf{SO}$) problem. This is a common generalization of two known problems -- $k$-Supplier with Outliers, and Fault-tolerant $k$-Supplier -- each of which generalize the well-known $k$-Supplier problem. In the $k$-Supplier problem the goal is to serve $n$ clients $C$, by opening $k$ facilities from a set of possible facilities $F$; the objective function is the farthest that any client must travel to access an open facility. In $\mathsf{F}k\mathsf{SO}$, each client $v$ has a fault-tolerance $\ell_v$, and now desires $\ell_v$ facilities to serve it; so each client $v$'s contribution to the objective function is now its distance to the $\ell_v^{\text{th}}$ closest open facility. Furthermore, we are allowed to choose $m$ clients that we will serve, and only those clients contribute to the objective function, while the remaining $n-m$ are considered outliers. Our main result is a $\min{4t-1,2^t+1}$-approximation for the $\mathsf{F}k\mathsf{SO}$ problem, where $t$ is the number of distinct values of $\ell_v$ that appear in the instance. At $t=1$, i.e. in the case where the $\ell_v$'s are uniformly some $\ell$, this yields a $3$-approximation, improving upon the $11$-approximation given for the uniform case by Inamdar and Varadarajan [2020], who also introduced the problem. Our result for the uniform case matches tight $3$-approximations that exist for $k$-Supplier, $k$-Supplier with Outliers, and Fault-tolerant $k$-Supplier. Our key technical contribution is an application of the round-or-cut schema to $\mathsf{F}k\mathsf{SO}$. Guided by an LP relaxation, we reduce to a simpler optimization problem, which we can solve to obtain distance bounds for the "round" step, and valid inequalities for the "cut" step.
Enhancing Neural Architecture Search with Multiple Hardware Constraints for Deep Learning Model Deployment on Tiny IoT Devices
Authors: Authors: Alessio Burrello, Matteo Risso, Beatrice Alessandra Motetti, Enrico Macii, Luca Benini, Daniele Jahier Pagliari
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.07217
Pdf link: https://arxiv.org/pdf/2310.07217
Abstract The rapid proliferation of computing domains relying on Internet of Things (IoT) devices has created a pressing need for efficient and accurate deep-learning (DL) models that can run on low-power devices. However, traditional DL models tend to be too complex and computationally intensive for typical IoT end-nodes. To address this challenge, Neural Architecture Search (NAS) has emerged as a popular design automation technique for co-optimizing the accuracy and complexity of deep neural networks. Nevertheless, existing NAS techniques require many iterations to produce a network that adheres to specific hardware constraints, such as the maximum memory available on the hardware or the maximum latency allowed by the target application. In this work, we propose a novel approach to incorporate multiple constraints into so-called Differentiable NAS optimization methods, which allows the generation, in a single shot, of a model that respects user-defined constraints on both memory and latency in a time comparable to a single standard training. The proposed approach is evaluated on five IoT-relevant benchmarks, including the MLPerf Tiny suite and Tiny ImageNet, demonstrating that, with a single search, it is possible to reduce memory and latency by 87.4% and 54.2%, respectively (as defined by our targets), while ensuring non-inferior accuracy on state-of-the-art hand-tuned deep neural networks for TinyML.
Are GATs Out of Balance?
Authors: Authors: Nimrah Mustafa, Aleksandar Bojchevski, Rebekka Burkholz
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.07235
Pdf link: https://arxiv.org/pdf/2310.07235
Abstract While the expressive power and computational capabilities of graph neural networks (GNNs) have been theoretically studied, their optimization and learning dynamics, in general, remain largely unexplored. Our study undertakes the Graph Attention Network (GAT), a popular GNN architecture in which a node's neighborhood aggregation is weighted by parameterized attention coefficients. We derive a conservation law of GAT gradient flow dynamics, which explains why a high portion of parameters in GATs with standard initialization struggle to change during training. This effect is amplified in deeper GATs, which perform significantly worse than their shallow counterparts. To alleviate this problem, we devise an initialization scheme that balances the GAT network. Our approach i) allows more effective propagation of gradients and in turn enables trainability of deeper networks, and ii) attains a considerable speedup in training and convergence time in comparison to the standard initialization. Our main theorem serves as a stepping stone to studying the learning dynamics of positive homogeneous models with attention mechanisms.
Cost-aware Joint Caching and Forwarding in Networks with Heterogeneous Cache Resources
Authors: Authors: Faruk Volkan Mutlu, Edmund Yeh
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2310.07243
Pdf link: https://arxiv.org/pdf/2310.07243
Abstract Caching is crucial for enabling high-throughput networks for data intensive applications. Traditional caching technology relies on DRAM, as it can transfer data at a high rate. However, DRAM capacity is subject to contention by most system components and thus is very limited, implying that DRAM-only caches cannot scale to meet growing demand. Fortunately, persistent memory and flash storage technologies are rapidly evolving and can be utilized alongside DRAM to increase cache capacities. To do so without compromising network performance requires caching techniques adapted to the characteristics of these technologies. In this paper, we model the cache as a collection of storage blocks with different rate parameters and utilization costs. We introduce an optimization technique based on the drift-plus-penalty method and apply it in a framework which enables joint caching and forwarding. We show that it achieves an optimal trade-off between throughput and cache utilization costs in a virtual control plane. We then develop a corresponding practical policy in the data plane. Finally, through simulations in several settings, we demonstrate the superior performance of our proposed approach with respect to total user delay and cache utilization costs.
Score Regularized Policy Optimization through Diffusion Behavior
Authors: Authors: Huayu Chen, Cheng Lu, Zhengyi Wang, Hang Su, Jun Zhu
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.07297
Pdf link: https://arxiv.org/pdf/2310.07297
Abstract Recent developments in offline reinforcement learning have uncovered the immense potential of diffusion modeling, which excels at representing heterogeneous behavior policies. However, sampling from diffusion policies is considerably slow because it necessitates tens to hundreds of iterative inference steps for one action. To address this issue, we propose to extract an efficient deterministic inference policy from critic models and pretrained diffusion behavior models, leveraging the latter to directly regularize the policy gradient with the behavior distribution's score function during optimization. Our method enjoys powerful generative capabilities of diffusion modeling while completely circumventing the computationally intensive and time-consuming diffusion sampling scheme, both during training and evaluation. Extensive results on D4RL tasks show that our method boosts action sampling speed by more than 25 times compared with various leading diffusion-based methods in locomotion tasks, while still maintaining state-of-the-art performance.
Histopathological Image Classification and Vulnerability Analysis using Federated Learning
Authors: Authors: Sankalp Vyas, Amar Nath Patra, Raj Mani Shukla
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.07380
Pdf link: https://arxiv.org/pdf/2310.07380
Abstract Healthcare is one of the foremost applications of machine learning (ML). Traditionally, ML models are trained by central servers, which aggregate data from various distributed devices to forecast the results for newly generated data. This is a major concern as models can access sensitive user information, which raises privacy concerns. A federated learning (FL) approach can help address this issue: A global model sends its copy to all clients who train these copies, and the clients send the updates (weights) back to it. Over time, the global model improves and becomes more accurate. Data privacy is protected during training, as it is conducted locally on the clients' devices. However, the global model is susceptible to data poisoning. We develop a privacy-preserving FL technique for a skin cancer dataset and show that the model is prone to data poisoning attacks. Ten clients train the model, but one of them intentionally introduces flipped labels as an attack. This reduces the accuracy of the global model. As the percentage of label flipping increases, there is a noticeable decrease in accuracy. We use a stochastic gradient descent optimization algorithm to find the most optimal accuracy for the model. Although FL can protect user privacy for healthcare diagnostics, it is also vulnerable to data poisoning, which must be addressed.
Deep Kernel and Image Quality Estimators for Optimizing Robotic Ultrasound Controller using Bayesian Optimization
Authors: Authors: Deepak Raina, SH Chandrashekhara, Richard Voyles, Juan Wachs, Subir Kumar Saha
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.07392
Pdf link: https://arxiv.org/pdf/2310.07392
Abstract Ultrasound is a commonly used medical imaging modality that requires expert sonographers to manually maneuver the ultrasound probe based on the acquired image. Autonomous Robotic Ultrasound (A-RUS) is an appealing alternative to this manual procedure in order to reduce sonographers' workload. The key challenge to A-RUS is optimizing the ultrasound image quality for the region of interest across different patients. This requires knowledge of anatomy, recognition of error sources and precise probe position, orientation and pressure. Sample efficiency is important while optimizing these parameters associated with the robotized probe controller. Bayesian Optimization (BO), a sample-efficient optimization framework, has recently been applied to optimize the 2D motion of the probe. Nevertheless, further improvements are needed to improve the sample efficiency for high-dimensional control of the probe. We aim to overcome this problem by using a neural network to learn a low-dimensional kernel in BO, termed as Deep Kernel (DK). The neural network of DK is trained using probe and image data acquired during the procedure. The two image quality estimators are proposed that use a deep convolution neural network and provide real-time feedback to the BO. We validated our framework using these two feedback functions on three urinary bladder phantoms. We obtained over 50% increase in sample efficiency for 6D control of the robotized probe. Furthermore, our results indicate that this performance enhancement in BO is independent of the specific training dataset, demonstrating inter-patient adaptability.
RANS: Highly-Parallelised Simulator for Reinforcement Learning based Autonomous Navigating Spacecrafts
Authors: Authors: Matteo El-Hariry, Antoine Richard, Miguel Olivares-Mendez
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2310.07393
Pdf link: https://arxiv.org/pdf/2310.07393
Abstract Nowadays, realistic simulation environments are essential to validate and build reliable robotic solutions. This is particularly true when using Reinforcement Learning (RL) based control policies. To this end, both robotics and RL developers need tools and workflows to create physically accurate simulations and synthetic datasets. Gazebo, MuJoCo, Webots, Pybullets or Isaac Sym are some of the many tools available to simulate robotic systems. Developing learning-based methods for space navigation is, due to the highly complex nature of the problem, an intensive data-driven process that requires highly parallelized simulations. When it comes to the control of spacecrafts, there is no easy to use simulation library designed for RL. We address this gap by harnessing the capabilities of NVIDIA Isaac Gym, where both physics simulation and the policy training reside on GPU. Building on this tool, we provide an open-source library enabling users to simulate thousands of parallel spacecrafts, that learn a set of maneuvering tasks, such as position, attitude, and velocity control. These tasks enable to validate complex space scenarios, such as trajectory optimization for landing, docking, rendezvous and more.
IRS Assisted Federated Learning A Broadband Over-the-Air Aggregation Approach
Authors: Authors: Deyou Zhang, Ming Xiao, Zhibo Pang, Lihui Wang, H. Vincent Poor
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2310.07405
Pdf link: https://arxiv.org/pdf/2310.07405
Abstract We consider a broadband over-the-air computation empowered model aggregation approach for wireless federated learning (FL) systems and propose to leverage an intelligent reflecting surface (IRS) to combat wireless fading and noise. We first investigate the conventional node-selection based framework, where a few edge nodes are dropped in model aggregation to control the aggregation error. We analyze the performance of this node-selection based framework and derive an upper bound on its performance loss, which is shown to be related to the selected edge nodes. Then, we seek to minimize the mean-squared error (MSE) between the desired global gradient parameters and the actually received ones by optimizing the selected edge nodes, their transmit equalization coefficients, the IRS phase shifts, and the receive factors of the cloud server. By resorting to the matrix lifting technique and difference-of-convex programming, we successfully transform the formulated optimization problem into a convex one and solve it using off-the-shelf solvers. To improve learning performance, we further propose a weight-selection based FL framework. In such a framework, we assign each edge node a proper weight coefficient in model aggregation instead of discarding any of them to reduce the aggregation error, i.e., amplitude alignment of the received local gradient parameters from different edge nodes is not required. We also analyze the performance of this weight-selection based framework and derive an upper bound on its performance loss, followed by minimizing the MSE via optimizing the weight coefficients of the edge nodes, their transmit equalization coefficients, the IRS phase shifts, and the receive factors of the cloud server. Furthermore, we use the MNIST dataset for simulations to evaluate the performance of both node-selection and weight-selection based FL frameworks.
AI/ML-based Load Prediction in IEEE 802.11 Enterprise Networks
Authors: Authors: Francesc Wilhelmi, Dariush Salami, Gianluca Fontanesi, Lorenzo Galati-Giordano, Mika Kasslin
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2310.07467
Pdf link: https://arxiv.org/pdf/2310.07467
Abstract Enterprise Wi-Fi networks can greatly benefit from Artificial Intelligence and Machine Learning (AI/ML) thanks to their well-developed management and operation capabilities. At the same time, AI/ML-based traffic/load prediction is one of the most appealing data-driven solutions to improve the Wi-Fi experience, either through the enablement of autonomous operation or by boosting troubleshooting with forecasted network utilization. In this paper, we study the suitability and feasibility of adopting AI/ML-based load prediction in practical enterprise Wi-Fi networks. While leveraging AI/ML solutions can potentially contribute to optimizing Wi-Fi networks in terms of energy efficiency, performance, and reliability, their effective adoption is constrained to aspects like data availability and quality, computational capabilities, and energy consumption. Our results show that hardware-constrained AI/ML models can potentially predict network load with less than 20% average error and 3% 85th-percentile error, which constitutes a suitable input for proactively driving Wi-Fi network optimization.
Boosting Black-box Attack to Deep Neural Networks with Conditional Diffusion Models
Authors: Authors: Renyang Liu, Wei Zhou, Tianwei Zhang, Kangjie Chen, Jun Zhao, Kwok-Yan Lam
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.07492
Pdf link: https://arxiv.org/pdf/2310.07492
Abstract Existing black-box attacks have demonstrated promising potential in creating adversarial examples (AE) to deceive deep learning models. Most of these attacks need to handle a vast optimization space and require a large number of queries, hence exhibiting limited practical impacts in real-world scenarios. In this paper, we propose a novel black-box attack strategy, Conditional Diffusion Model Attack (CDMA), to improve the query efficiency of generating AEs under query-limited situations. The key insight of CDMA is to formulate the task of AE synthesis as a distribution transformation problem, i.e., benign examples and their corresponding AEs can be regarded as coming from two distinctive distributions and can transform from each other with a particular converter. Unlike the conventional \textit{query-and-optimization} approach, we generate eligible AEs with direct conditional transform using the aforementioned data converter, which can significantly reduce the number of queries needed. CDMA adopts the conditional Denoising Diffusion Probabilistic Model as the converter, which can learn the transformation from clean samples to AEs, and ensure the smooth development of perturbed noise resistant to various defense strategies. We demonstrate the effectiveness and efficiency of CDMA by comparing it with nine state-of-the-art black-box attacks across three benchmark datasets. On average, CDMA can reduce the query count to a handful of times; in most cases, the query count is only ONE. We also show that CDMA can obtain $>99\%$ attack success rate for untarget attacks over all datasets and targeted attack over CIFAR-10 with the noise budget of $\epsilon=16$.
Diversity for Contingency: Learning Diverse Behaviors for Efficient Adaptation and Transfer
Authors: Authors: Finn Rietz, Johannes Andreas Stork
Subjects: Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.07493
Pdf link: https://arxiv.org/pdf/2310.07493
Abstract Discovering all useful solutions for a given task is crucial for transferable RL agents, to account for changes in the task or transition dynamics. This is not considered by classical RL algorithms that are only concerned with finding the optimal policy, given the current task and dynamics. We propose a simple method for discovering all possible solutions of a given task, to obtain an agent that performs well in the transfer setting and adapts quickly to changes in the task or transition dynamics. Our method iteratively learns a set of policies, while each subsequent policy is constrained to yield a solution that is unlikely under all previous policies. Unlike prior methods, our approach does not require learning additional models for novelty detection and avoids balancing task and novelty reward signals, by directly incorporating the constraint into the action selection and optimization steps.
Sample-Driven Federated Learning for Energy-Efficient and Real-Time IoT Sensing
Authors: Authors: Minh Ngoc Luu, Minh-Duong Nguyen, Ebrahim Bedeer, Van Duc Nguyen, Dinh Thai Hoang, Diep N. Nguyen, Quoc-Viet Pham
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.07497
Pdf link: https://arxiv.org/pdf/2310.07497
Abstract In the domain of Federated Learning (FL) systems, recent cutting-edge methods heavily rely on ideal conditions convergence analysis. Specifically, these approaches assume that the training datasets on IoT devices possess similar attributes to the global data distribution. However, this approach fails to capture the full spectrum of data characteristics in real-time sensing FL systems. In order to overcome this limitation, we suggest a new approach system specifically designed for IoT networks with real-time sensing capabilities. Our approach takes into account the generalization gap due to the user's data sampling process. By effectively controlling this sampling process, we can mitigate the overfitting issue and improve overall accuracy. In particular, We first formulate an optimization problem that harnesses the sampling process to concurrently reduce overfitting while maximizing accuracy. In pursuit of this objective, our surrogate optimization problem is adept at handling energy efficiency while optimizing the accuracy with high generalization. To solve the optimization problem with high complexity, we introduce an online reinforcement learning algorithm, named Sample-driven Control for Federated Learning (SCFL) built on the Soft Actor-Critic (A2C) framework. This enables the agent to dynamically adapt and find the global optima even in changing environments. By leveraging the capabilities of SCFL, our system offers a promising solution for resource allocation in FL systems with real-time sensing capabilities.
A Unified Remote Sensing Anomaly Detector Across Modalities and Scenes via Deviation Relationship Learning
Authors: Authors: Jingtao Li, Xinyu Wang, Hengwei Zhao, Liangpei Zhang, Yanfei Zhong
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2310.07511
Pdf link: https://arxiv.org/pdf/2310.07511
Abstract Remote sensing anomaly detector can find the objects deviating from the background as potential targets. Given the diversity in earth anomaly types, a unified anomaly detector across modalities and scenes should be cost-effective and flexible to new earth observation sources and anomaly types. However, the current anomaly detectors are limited to a single modality and single scene, since they aim to learn the varying background distribution. Motivated by the universal anomaly deviation pattern, in that anomalies exhibit deviations from their local context, we exploit this characteristic to build a unified anomaly detector. Firstly, we reformulate the anomaly detection task as an undirected bilayer graph based on the deviation relationship, where the anomaly score is modeled as the conditional probability, given the pattern of the background and normal objects. The learning objective is then expressed as a conditional probability ranking problem. Furthermore, we design an instantiation of the reformulation in the data, architecture, and optimization aspects. Simulated spectral and spatial anomalies drive the instantiated architecture. The model is optimized directly for the conditional probability ranking. The proposed model was validated in five modalities including the hyperspectral, visible light, synthetic aperture radar (SAR), infrared and low light to show its unified detection ability.
Retrieve Anything To Augment Large Language Models
Authors: Authors: Peitian Zhang, Shitao Xiao, Zheng Liu, Zhicheng Dou, Jian-Yun Nie
Subjects: Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2310.07554
Pdf link: https://arxiv.org/pdf/2310.07554
Abstract Large language models (LLMs) face significant challenges stemming from the inherent limitations in knowledge, memory, alignment, and action. These challenges cannot be addressed by LLMs alone, but should rely on assistance from the external world, such as knowledge base, memory store, demonstration examples, and tools. Retrieval augmentation stands as a vital mechanism for bridging the gap between LLMs and the external assistance. However, conventional methods encounter two pressing issues. On one hand, the general-purpose retrievers are not properly optimized for the retrieval augmentation of LLMs. On the other hand, the task-specific retrievers lack the required versatility, hindering their performance across the diverse retrieval augmentation scenarios. In this work, we present a novel approach, the LLM Embedder, which comprehensively support the diverse needs of LLMs' retrieval augmentation with one unified embedding model. Training such an unified model is non-trivial, as various retrieval tasks aim to capture distinct semantic relationships, often subject to mutual interference. To address this challenge, we systematically optimize our training methodology. This includes reward formulation based on LLMs' feedback, the stabilization of knowledge distillation, multi-task fine-tuning with explicit instructions, and the use of homogeneous in-batch negative sampling. These optimization strategies contribute to the outstanding empirical performance of the LLM-Embedder. Notably, it yields remarkable enhancements in retrieval augmentation for LLMs, surpassing both general-purpose and task-specific retrievers in various evaluation scenarios. This project is made publicly available at https://github.com/FlagOpen/FlagEmbedding.
ROMO: Retrieval-enhanced Offline Model-based Optimization
Authors: Authors: Mingcheng Chen, Haoran Zhao, Yuxiang Zhao, Hulei Fan, Hongqiao Gao, Yong Yu, Zheng Tian
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.07560
Pdf link: https://arxiv.org/pdf/2310.07560
Abstract Data-driven black-box model-based optimization (MBO) problems arise in a great number of practical application scenarios, where the goal is to find a design over the whole space maximizing a black-box target function based on a static offline dataset. In this work, we consider a more general but challenging MBO setting, named constrained MBO (CoMBO), where only part of the design space can be optimized while the rest is constrained by the environment. A new challenge arising from CoMBO is that most observed designs that satisfy the constraints are mediocre in evaluation. Therefore, we focus on optimizing these mediocre designs in the offline dataset while maintaining the given constraints rather than further boosting the best observed design in the traditional MBO setting. We propose retrieval-enhanced offline model-based optimization (ROMO), a new derivable forward approach that retrieves the offline dataset and aggregates relevant samples to provide a trusted prediction, and use it for gradient-based optimization. ROMO is simple to implement and outperforms state-of-the-art approaches in the CoMBO setting. Empirically, we conduct experiments on a synthetic Hartmann (3D) function dataset, an industrial CIO dataset, and a suite of modified tasks in the Design-Bench benchmark. Results show that ROMO performs well in a wide range of constrained optimization tasks.
Fed-GraB: Federated Long-tailed Learning with Self-Adjusting Gradient Balancer
Authors: Authors: Zikai Xiao, Zihan Chen, Songshang Liu, Hualiang Wang, Yang Feng, Jin Hao, Joey Tianyi Zhou, Jian Wu, Howard Hao Yang, Zuozhu Liu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.07587
Pdf link: https://arxiv.org/pdf/2310.07587
Abstract Data privacy and long-tailed distribution are the norms rather than the exception in many real-world tasks. This paper investigates a federated long-tailed learning (Fed-LT) task in which each client holds a locally heterogeneous dataset; if the datasets can be globally aggregated, they jointly exhibit a long-tailed distribution. Under such a setting, existing federated optimization and/or centralized long-tailed learning methods hardly apply due to challenges in (a) characterizing the global long-tailed distribution under privacy constraints and (b) adjusting the local learning strategy to cope with the head-tail imbalance. In response, we propose a method termed $\texttt{Fed-GraB}$, comprised of a Self-adjusting Gradient Balancer (SGB) module that re-weights clients' gradients in a closed-loop manner, based on the feedback of global long-tailed distribution evaluated by a Direct Prior Analyzer (DPA) module. Using $\texttt{Fed-GraB}$, clients can effectively alleviate the distribution drift caused by data heterogeneity during the model training process and obtain a global model with better performance on the minority classes while maintaining the performance of the majority classes. Extensive experiments demonstrate that $\texttt{Fed-GraB}$ achieves state-of-the-art performance on representative datasets such as CIFAR-10-LT, CIFAR-100-LT, ImageNet-LT, and iNaturalist.
Approximating Subset Sum Ratio faster than Subset Sum
Authors: Authors: Karl Bringmann
Subjects: Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2310.07595
Pdf link: https://arxiv.org/pdf/2310.07595
Abstract Subset Sum Ratio is the following optimization problem: Given a set of $n$ positive numbers $I$, find disjoint subsets $X,Y \subseteq I$ minimizing the ratio $\max{\Sigma(X)/\Sigma(Y),\Sigma(Y)/\Sigma(X)}$, where $\Sigma(Z)$ denotes the sum of all elements of $Z$. Subset Sum Ratio is an optimization variant of the Equal Subset Sum problem. It was introduced by Woeginger and Yu in '92 and is known to admit an FPTAS [Bazgan, Santha, Tuza '98]. The best approximation schemes before this work had running time $O(n^4/\varepsilon)$ [Melissinos, Pagourtzis '18], $\tilde O(n^{2.3}/\varepsilon^{2.6})$ and $\tilde O(n^2/\varepsilon^3)$ [Alonistiotis et al. '22]. In this work, we present an improved approximation scheme for Subset Sum Ratio running in time $O(n / \varepsilon^{0.9386})$. Here we assume that the items are given in sorted order, otherwise we need an additional running time of $O(n \log n)$ for sorting. Our improved running time simultaneously improves the dependence on $n$ to linear and the dependence on $1/\varepsilon$ to sublinear. For comparison, the related Subset Sum problem admits an approximation scheme running in time $O(n/\varepsilon)$ [Gens, Levner '79]. If one would achieve an approximation scheme with running time $\tilde O(n / \varepsilon^{0.99})$ for Subset Sum, then one would falsify the Strong Exponential Time Hypothesis [Abboud, Bringmann, Hermelin, Shabtay '19] as well as the Min-Plus-Convolution Hypothesis [Bringmann, Nakos '21]. We thus establish that Subset Sum Ratio admits faster approximation schemes than Subset Sum. This comes as a surprise, since at any point in time before this work the best known approximation scheme for Subset Sum Ratio had a worse running time than the best known approximation scheme for Subset Sum.
OpsEval: A Comprehensive Task-Oriented AIOps Benchmark for Large Language Models
Authors: Authors: Yuhe Liu, Changhua Pei, Longlong Xu, Bohan Chen, Mingze Sun, Zhirui Zhang, Yongqian Sun, Shenglin Zhang, Kun Wang, Haiming Zhang, Jianhui Li, Gaogang Xie, Xidaoo Wen, Xiaohui Nie, Dan Pei
Subjects: Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2310.07637
Pdf link: https://arxiv.org/pdf/2310.07637
Abstract Large language models (LLMs) have exhibited remarkable capabilities in NLP-related tasks such as translation, summarizing, and generation. The application of LLMs in specific areas, notably AIOps (Artificial Intelligence for IT Operations), holds great potential due to their advanced abilities in information summarizing, report analyzing, and ability of API calling. Nevertheless, the performance of current LLMs in AIOps tasks is yet to be determined. Furthermore, a comprehensive benchmark is required to steer the optimization of LLMs tailored for AIOps. Compared with existing benchmarks that focus on evaluating specific fields like network configuration, in this paper, we present \textbf{OpsEval}, a comprehensive task-oriented AIOps benchmark designed for LLMs. For the first time, OpsEval assesses LLMs' proficiency in three crucial scenarios (Wired Network Operation, 5G Communication Operation, and Database Operation) at various ability levels (knowledge recall, analytical thinking, and practical application). The benchmark includes 7,200 questions in both multiple-choice and question-answer (QA) formats, available in English and Chinese. With quantitative and qualitative results, we show how various LLM tricks can affect the performance of AIOps, including zero-shot, chain-of-thought, and few-shot in-context learning. We find that GPT4-score is more consistent with experts than widely used Bleu and Rouge, which can be used to replace automatic metrics for large-scale qualitative evaluations.
Automated Layout Design and Control of Robust Cooperative Grasped-Load Aerial Transportation Systems
Authors: Authors: Carlo Bosio, Jerry Tang, Ting-Hao Wang, Mark W. Mueller
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2310.07649
Pdf link: https://arxiv.org/pdf/2310.07649
Abstract We present a novel approach to cooperative aerial transportation through a team of drones, using optimal control theory and a hierarchical control strategy. We assume the drones are connected to the payload through rigid attachments, essentially transforming the whole system into a larger flying object with "thrust modules" at the attachment locations of the drones. We investigate the optimal arrangement of the thrust modules around the payload, so that the resulting system is robust to disturbances. We choose the $\mathcal{H}_2$ norm as a measure of robustness, and propose an iterative optimization routine to compute the optimal layout of the vehicles around the object. We experimentally validate our approach using four drones and comparing the disturbance rejection performances achieved by two different layouts (the optimal one and a sub-optimal one), and observe that the results match our predictions.
Deep Backtracking Counterfactuals for Causally Compliant Explanations
Authors: Authors: Klaus-Rudolf Kladny, Julius von Kügelgen, Bernhard Schölkopf, Michael Muehlebach
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.07665
Pdf link: https://arxiv.org/pdf/2310.07665
Abstract Counterfactuals can offer valuable insights by answering what would have been observed under altered circumstances, conditional on a factual observation. Whereas the classical interventional interpretation of counterfactuals has been studied extensively, backtracking constitutes a less studied alternative the backtracking principle has emerged as an alternative philosophy where all causal laws are kept intact. In the present work, we introduce a practical method for computing backtracking counterfactuals in structural causal models that consist of deep generative components. To this end, we impose conditions on the structural assignments that enable the generation of counterfactuals by solving a tractable constrained optimization problem in the structured latent space of a causal model. Our formulation also facilitates a comparison with methods in the field of counterfactual explanations. Compared to these, our method represents a versatile, modular and causally compliant alternative. We demonstrate these properties experimentally on a modified version of MNIST and CelebA.
Controllable Data Generation Via Iterative Data-Property Mutual Mappings
Authors: Authors: Bo Pan, Muran Qin, Shiyu Wang, Yifei Zhang, Liang Zhao
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.07683
Pdf link: https://arxiv.org/pdf/2310.07683
Abstract Deep generative models have been widely used for their ability to generate realistic data samples in various areas, such as images, molecules, text, and speech. One major goal of data generation is controllability, namely to generate new data with desired properties. Despite growing interest in the area of controllable generation, significant challenges still remain, including 1) disentangling desired properties with unrelated latent variables, 2) out-of-distribution property control, and 3) objective optimization for out-of-distribution property control. To address these challenges, in this paper, we propose a general framework to enhance VAE-based data generators with property controllability and ensure disentanglement. Our proposed objective can be optimized on both data seen and unseen in the training set. We propose a training procedure to train the objective in a semi-supervised manner by iteratively conducting mutual mappings between the data and properties. The proposed framework is implemented on four VAE-based controllable generators to evaluate its performance on property error, disentanglement, generation quality, and training time. The results indicate that our proposed framework enables more precise control over the properties of generated samples in a short training time, ensuring the disentanglement and keeping the validity of the generated samples.
ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models
Authors: Authors: Yingqing He, Shaoshu Yang, Haoxin Chen, Xiaodong Cun, Menghan Xia, Yong Zhang, Xintao Wang, Ran He, Qifeng Chen, Ying Shan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.07702
Pdf link: https://arxiv.org/pdf/2310.07702
Abstract In this work, we investigate the capability of generating images from pre-trained diffusion models at much higher resolutions than the training image sizes. In addition, the generated images should have arbitrary image aspect ratios. When generating images directly at a higher resolution, 1024 x 1024, with the pre-trained Stable Diffusion using training images of resolution 512 x 512, we observe persistent problems of object repetition and unreasonable object structures. Existing works for higher-resolution generation, such as attention-based and joint-diffusion approaches, cannot well address these issues. As a new perspective, we examine the structural components of the U-Net in diffusion models and identify the crucial cause as the limited perception field of convolutional kernels. Based on this key observation, we propose a simple yet effective re-dilation that can dynamically adjust the convolutional perception field during inference. We further propose the dispersed convolution and noise-damped classifier-free guidance, which can enable ultra-high-resolution image generation (e.g., 4096 x 4096). Notably, our approach does not require any training or optimization. Extensive experiments demonstrate that our approach can address the repetition issue well and achieve state-of-the-art performance on higher-resolution image synthesis, especially in texture details. Our work also suggests that a pre-trained diffusion model trained on low-resolution images can be directly used for high-resolution visual generation without further tuning, which may provide insights for future research on ultra-high-resolution image and video synthesis.
Keyword: adam

Ultima: Robust and Tail-Optimal AllReduce for Distributed Deep Learning in the Cloud
Authors: Authors: Ertza Warraich, Omer Shabtai, Khalid Manaa, Shay Vargaftik, Yonatan Piasetzky, Matty Kadosh, Lalith Suresh, Muhammad Shahbaz
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2310.06993
Pdf link: https://arxiv.org/pdf/2310.06993
Abstract We present Ultima, a new collective-communication system for the cloud with bounded, predictable completion times for deep-learning jobs in the presence of varying computation (stragglers) and communication (congestion and gradient drops) variabilities. Ultima exploits the inherent resiliency and the stochastic nature of distributed deep-learning (DDL) training to work with approximated gradients, and provides an efficient balance between (tail) performance and the resulting accuracy of the trained models. Exploiting this domain-specific characteristic of DDL, Ultima introduces (1) mechanisms (e.g., Transpose AllReduce, unreliable connection-oriented transport, and adaptive timeout) to improve the DDL jobs' tail execution time, and (2) strategies (e.g., Hadamard Transform) to mitigate the impact of gradient drops on model accuracy. Our evaluation shows that Ultima achieves 60% faster time-to-accuracy (TTA), on average, when operating in shared environments (e.g., public cloud), and is on par with existing algorithms (e.g., Ring-AllReduce) in dedicated environments (like HPC).
AdaMesh: Personalized Facial Expressions and Head Poses for Speech-Driven 3D Facial Animation
Authors: Authors: Liyang Chen, Weihong Bao, Shun Lei, Boshi Tang, Zhiyong Wu, Shiyin Kang, Haozhi Huang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Arxiv link: https://arxiv.org/abs/2310.07236
Pdf link: https://arxiv.org/pdf/2310.07236
Abstract Speech-driven 3D facial animation aims at generating facial movements that are synchronized with the driving speech, which has been widely explored recently. Existing works mostly neglect the person-specific talking style in generation, including facial expression and head pose styles. Several works intend to capture the personalities by fine-tuning modules. However, limited training data leads to the lack of vividness. In this work, we propose AdaMesh, a novel adaptive speech-driven facial animation approach, which learns the personalized talking style from a reference video of about 10 seconds and generates vivid facial expressions and head poses. Specifically, we propose mixture-of-low-rank adaptation (MoLoRA) to fine-tune the expression adapter, which efficiently captures the facial expression style. For the personalized pose style, we propose a pose adapter by building a discrete pose prior and retrieving the appropriate style embedding with a semantic-aware pose style matrix without fine-tuning. Extensive experimental results show that our approach outperforms state-of-the-art methods, preserves the talking style in the reference video, and generates vivid facial animation. The supplementary video and code will be available at https://adamesh.github.io.
Keyword: gradient

Generalized Golub-Kahan bidiagonalization for nonsymmetric saddle point systems
Authors: Authors: Andrei Dumitrasc, Carola Kruse, Ulrich Ruede
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2310.06952
Pdf link: https://arxiv.org/pdf/2310.06952
Abstract The generalized Golub-Kahan bidiagonalization has been used to solve saddle-point systems where the leading block is symmetric and positive definite. We extend this iterative method for the case where the symmetry condition no longer holds. We do so by relying on the known connection the algorithm has with the Conjugate Gradient method and following the line of reasoning that adapts the latter into the Full Orthogonalization Method. We propose appropriate stopping criteria based on the residual and an estimate of the energy norm for the error associated with the primal variable. Numerical comparison with GMRES highlights the advantages of our proposed strategy regarding its low memory requirements and the associated implications.
Ultima: Robust and Tail-Optimal AllReduce for Distributed Deep Learning in the Cloud
Authors: Authors: Ertza Warraich, Omer Shabtai, Khalid Manaa, Shay Vargaftik, Yonatan Piasetzky, Matty Kadosh, Lalith Suresh, Muhammad Shahbaz
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2310.06993
Pdf link: https://arxiv.org/pdf/2310.06993
Abstract We present Ultima, a new collective-communication system for the cloud with bounded, predictable completion times for deep-learning jobs in the presence of varying computation (stragglers) and communication (congestion and gradient drops) variabilities. Ultima exploits the inherent resiliency and the stochastic nature of distributed deep-learning (DDL) training to work with approximated gradients, and provides an efficient balance between (tail) performance and the resulting accuracy of the trained models. Exploiting this domain-specific characteristic of DDL, Ultima introduces (1) mechanisms (e.g., Transpose AllReduce, unreliable connection-oriented transport, and adaptive timeout) to improve the DDL jobs' tail execution time, and (2) strategies (e.g., Hadamard Transform) to mitigate the impact of gradient drops on model accuracy. Our evaluation shows that Ultima achieves 60% faster time-to-accuracy (TTA), on average, when operating in shared environments (e.g., public cloud), and is on par with existing algorithms (e.g., Ring-AllReduce) in dedicated environments (like HPC).
Weak Galerkin methods for elliptic interface problems on curved polygonal partitions
Authors: Authors: Dan Li, Chunmei Wang, Shangyou Zhang
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2310.07009
Pdf link: https://arxiv.org/pdf/2310.07009
Abstract This paper presents a new weak Galerkin (WG) method for elliptic interface problems on general curved polygonal partitions. The method's key innovation lies in its ability to transform the complex interface jump condition into a more manageable Dirichlet boundary condition, simplifying the theoretical analysis significantly. The numerical scheme is designed by using locally constructed weak gradient on the curved polygonal partitions. We establish error estimates of optimal order for the numerical approximation in both discrete $H^1$ and $L^2$ norms. Additionally, we present various numerical results that serve to illustrate the robust numerical performance of the proposed WG interface method.
A predict-and-optimize approach to profit-driven churn prevention
Authors: Authors: Nuria Gómez-Vargas, Sebastián Maldonado, Carla Vairetti
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.07047
Pdf link: https://arxiv.org/pdf/2310.07047
Abstract In this paper, we introduce a novel predict-and-optimize method for profit-driven churn prevention. We frame the task of targeting customers for a retention campaign as a regret minimization problem. The main objective is to leverage individual customer lifetime values (CLVs) to ensure that only the most valuable customers are targeted. In contrast, many profit-driven strategies focus on churn probabilities while considering average CLVs. This often results in significant information loss due to data aggregation. Our proposed model aligns with the guidelines of Predict-and-Optimize (PnO) frameworks and can be efficiently solved using stochastic gradient descent methods. Results from 12 churn prediction datasets underscore the effectiveness of our approach, which achieves the best average performance compared to other well-established strategies in terms of average profit.
Investigating the Adversarial Robustness of Density Estimation Using the Probability Flow ODE
Authors: Authors: Marius Arvinte, Cory Cornelius, Jason Martin, Nageen Himayat
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.07084
Pdf link: https://arxiv.org/pdf/2310.07084
Abstract Beyond their impressive sampling capabilities, score-based diffusion models offer a powerful analysis tool in the form of unbiased density estimation of a query sample under the training data distribution. In this work, we investigate the robustness of density estimation using the probability flow (PF) neural ordinary differential equation (ODE) model against gradient-based likelihood maximization attacks and the relation to sample complexity, where the compressed size of a sample is used as a measure of its complexity. We introduce and evaluate six gradient-based log-likelihood maximization attacks, including a novel reverse integration attack. Our experimental evaluations on CIFAR-10 show that density estimation using the PF ODE is robust against high-complexity, high-likelihood attacks, and that in some cases adversarial samples are semantically meaningful, as expected from a robust estimator.
QFT: Quantized Full-parameter Tuning of LLMs with Affordable Resources
Authors: Authors: Zhikai Li, Xiaoxuan Liu, Banghua Zhu, Zhen Dong, Qingyi Gu, Kurt Keutzer
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.07147
Pdf link: https://arxiv.org/pdf/2310.07147
Abstract Large Language Models (LLMs) have showcased remarkable impacts across a wide spectrum of natural language processing tasks. Fine-tuning these pre-trained models on downstream datasets provides further significant performance gains, but this process has been challenging due to its extraordinary resource requirements. To this end, existing efforts focus on parameter-efficient fine-tuning, which, unfortunately, fail to capitalize on the powerful potential of full-parameter fine-tuning. In this work, we propose QFT, a novel Quantized Full-parameter Tuning framework for LLMs that enables memory-efficient fine-tuning without harming performance. Our framework incorporates two novel ideas: (i) we adopt the efficient Lion optimizer, which only keeps track of the momentum and has consistent update magnitudes for each parameter, an inherent advantage for robust quantization; and (ii) we quantize all model states and store them as integer values, and present a gradient flow and parameter update scheme for the quantized weights. As a result, QFT reduces the model state memory to 21% of the standard solution while achieving comparable performance, e.g., tuning a LLaMA-7B model requires only <30GB of memory, satisfied by a single A6000 GPU.
Boosting Learning for LDPC Codes to Improve the Error-Floor Performance
Authors: Authors: Hee-Youl Kwak, Dae-Young Yun, Yongjune Kim, Sang-Hyo Kim, Jong-Seon No
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.07194
Pdf link: https://arxiv.org/pdf/2310.07194
Abstract Low-density parity-check (LDPC) codes have been successfully commercialized in communication systems due to their strong error correction ability and simple decoding process. However, the error-floor phenomenon of LDPC codes, in which the error rate stops decreasing rapidly at a certain level, poses challenges in achieving extremely low error rates and the application of LDPC codes in scenarios demanding ultra high reliability. In this work, we propose training methods to optimize neural min-sum (NMS) decoders that are robust to the error-floor. Firstly, by leveraging the boosting learning technique of ensemble networks, we divide the decoding network into two networks and train the post network to be specialized for uncorrected codewords that failed in the first network. Secondly, to address the vanishing gradient issue in training, we introduce a block-wise training schedule that locally trains a block of weights while retraining the preceding block. Lastly, we show that assigning different weights to unsatisfied check nodes effectively lowers the error-floor with a minimal number of weights. By applying these training methods to standard LDPC codes, we achieve the best error-floor performance compared to other decoding methods. The proposed NMS decoder, optimized solely through novel training methods without additional modules, can be implemented into current LDPC decoders without incurring extra hardware costs. The source code is available at https://github.com/ghy1228/LDPC_Error_Floor.
DeepSimHO: Stable Pose Estimation for Hand-Object Interaction via Physics Simulation
Authors: Authors: Rong Wang, Wei Mao, Hongdong Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.07206
Pdf link: https://arxiv.org/pdf/2310.07206
Abstract This paper addresses the task of 3D pose estimation for a hand interacting with an object from a single image observation. When modeling hand-object interaction, previous works mainly exploit proximity cues, while overlooking the dynamical nature that the hand must stably grasp the object to counteract gravity and thus preventing the object from slipping or falling. These works fail to leverage dynamical constraints in the estimation and consequently often produce unstable results. Meanwhile, refining unstable configurations with physics-based reasoning remains challenging, both by the complexity of contact dynamics and by the lack of effective and efficient physics inference in the data-driven learning framework. To address both issues, we present DeepSimHO: a novel deep-learning pipeline that combines forward physics simulation and backward gradient approximation with a neural network. Specifically, for an initial hand-object pose estimated by a base network, we forward it to a physics simulator to evaluate its stability. However, due to non-smooth contact geometry and penetration, existing differentiable simulators can not provide reliable state gradient. To remedy this, we further introduce a deep network to learn the stability evaluation process from the simulator, while smoothly approximating its gradient and thus enabling effective back-propagation. Extensive experiments show that our method noticeably improves the stability of the estimation and achieves superior efficiency over test-time optimization. The code is available at https://github.com/rongakowang/DeepSimHO.
Are GATs Out of Balance?
Authors: Authors: Nimrah Mustafa, Aleksandar Bojchevski, Rebekka Burkholz
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.07235
Pdf link: https://arxiv.org/pdf/2310.07235
Abstract While the expressive power and computational capabilities of graph neural networks (GNNs) have been theoretically studied, their optimization and learning dynamics, in general, remain largely unexplored. Our study undertakes the Graph Attention Network (GAT), a popular GNN architecture in which a node's neighborhood aggregation is weighted by parameterized attention coefficients. We derive a conservation law of GAT gradient flow dynamics, which explains why a high portion of parameters in GATs with standard initialization struggle to change during training. This effect is amplified in deeper GATs, which perform significantly worse than their shallow counterparts. To alleviate this problem, we devise an initialization scheme that balances the GAT network. Our approach i) allows more effective propagation of gradients and in turn enables trainability of deeper networks, and ii) attains a considerable speedup in training and convergence time in comparison to the standard initialization. Our main theorem serves as a stepping stone to studying the learning dynamics of positive homogeneous models with attention mechanisms.
Why Does Sharpness-Aware Minimization Generalize Better Than SGD?
Authors: Authors: Zixiang Chen, Junkai Zhang, Yiwen Kou, Xiangning Chen, Cho-Jui Hsieh, Quanquan Gu
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.07269
Pdf link: https://arxiv.org/pdf/2310.07269
Abstract The challenge of overfitting, in which the model memorizes the training data and fails to generalize to test data, has become increasingly significant in the training of large neural networks. To tackle this challenge, Sharpness-Aware Minimization (SAM) has emerged as a promising training method, which can improve the generalization of neural networks even in the presence of label noise. However, a deep understanding of how SAM works, especially in the setting of nonlinear neural networks and classification tasks, remains largely missing. This paper fills this gap by demonstrating why SAM generalizes better than Stochastic Gradient Descent (SGD) for a certain data model and two-layer convolutional ReLU networks. The loss landscape of our studied problem is nonsmooth, thus current explanations for the success of SAM based on the Hessian information are insufficient. Our result explains the benefits of SAM, particularly its ability to prevent noise learning in the early stages, thereby facilitating more effective learning of features. Experiments on both synthetic and real data corroborate our theory.
Score Regularized Policy Optimization through Diffusion Behavior
Authors: Authors: Huayu Chen, Cheng Lu, Zhengyi Wang, Hang Su, Jun Zhu
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2310.07297
Pdf link: https://arxiv.org/pdf/2310.07297
Abstract Recent developments in offline reinforcement learning have uncovered the immense potential of diffusion modeling, which excels at representing heterogeneous behavior policies. However, sampling from diffusion policies is considerably slow because it necessitates tens to hundreds of iterative inference steps for one action. To address this issue, we propose to extract an efficient deterministic inference policy from critic models and pretrained diffusion behavior models, leveraging the latter to directly regularize the policy gradient with the behavior distribution's score function during optimization. Our method enjoys powerful generative capabilities of diffusion modeling while completely circumventing the computationally intensive and time-consuming diffusion sampling scheme, both during training and evaluation. Extensive results on D4RL tasks show that our method boosts action sampling speed by more than 25 times compared with various leading diffusion-based methods in locomotion tasks, while still maintaining state-of-the-art performance.
Domain Generalization Guided by Gradient Signal to Noise Ratio of Parameters
Authors: Authors: Mateusz Michalkiewicz, Masoud Faraki, Xiang Yu, Manmohan Chandraker, Mahsa Baktashmotlagh
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.07361
Pdf link: https://arxiv.org/pdf/2310.07361
Abstract Overfitting to the source domain is a common issue in gradient-based training of deep neural networks. To compensate for the over-parameterized models, numerous regularization techniques have been introduced such as those based on dropout. While these methods achieve significant improvements on classical benchmarks such as ImageNet, their performance diminishes with the introduction of domain shift in the test set i.e. when the unseen data comes from a significantly different distribution. In this paper, we move away from the classical approach of Bernoulli sampled dropout mask construction and propose to base the selection on gradient-signal-to-noise ratio (GSNR) of network's parameters. Specifically, at each training step, parameters with high GSNR will be discarded. Furthermore, we alleviate the burden of manually searching for the optimal dropout ratio by leveraging a meta-learning approach. We evaluate our method on standard domain generalization benchmarks and achieve competitive results on classification and face anti-spoofing problems.
Histopathological Image Classification and Vulnerability Analysis using Federated Learning
Authors: Authors: Sankalp Vyas, Amar Nath Patra, Raj Mani Shukla
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.07380
Pdf link: https://arxiv.org/pdf/2310.07380
Abstract Healthcare is one of the foremost applications of machine learning (ML). Traditionally, ML models are trained by central servers, which aggregate data from various distributed devices to forecast the results for newly generated data. This is a major concern as models can access sensitive user information, which raises privacy concerns. A federated learning (FL) approach can help address this issue: A global model sends its copy to all clients who train these copies, and the clients send the updates (weights) back to it. Over time, the global model improves and becomes more accurate. Data privacy is protected during training, as it is conducted locally on the clients' devices. However, the global model is susceptible to data poisoning. We develop a privacy-preserving FL technique for a skin cancer dataset and show that the model is prone to data poisoning attacks. Ten clients train the model, but one of them intentionally introduces flipped labels as an attack. This reduces the accuracy of the global model. As the percentage of label flipping increases, there is a noticeable decrease in accuracy. We use a stochastic gradient descent optimization algorithm to find the most optimal accuracy for the model. Although FL can protect user privacy for healthcare diagnostics, it is also vulnerable to data poisoning, which must be addressed.
Randomized Runge-Kutta-Nyström
Authors: Authors: Nawaf Bou-Rabee, Tore Selland Kleppe
Subjects: Numerical Analysis (math.NA); Probability (math.PR); Computation (stat.CO); Methodology (stat.ME); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2310.07399
Pdf link: https://arxiv.org/pdf/2310.07399
Abstract We present 5/2- and 7/2-order $L^2$-accurate randomized Runge-Kutta-Nystr\"om methods to approximate the Hamiltonian flow underlying various non-reversible Markov chain Monte Carlo chains including unadjusted Hamiltonian Monte Carlo and unadjusted kinetic Langevin chains. Quantitative 5/2-order $L^2$-accuracy upper bounds are provided under gradient and Hessian Lipschitz assumptions on the potential energy function. The superior complexity of the corresponding Markov chains is numerically demonstrated for a selection of `well-behaved', high-dimensional target distributions.
IRS Assisted Federated Learning A Broadband Over-the-Air Aggregation Approach
Authors: Authors: Deyou Zhang, Ming Xiao, Zhibo Pang, Lihui Wang, H. Vincent Poor
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2310.07405
Pdf link: https://arxiv.org/pdf/2310.07405
Abstract We consider a broadband over-the-air computation empowered model aggregation approach for wireless federated learning (FL) systems and propose to leverage an intelligent reflecting surface (IRS) to combat wireless fading and noise. We first investigate the conventional node-selection based framework, where a few edge nodes are dropped in model aggregation to control the aggregation error. We analyze the performance of this node-selection based framework and derive an upper bound on its performance loss, which is shown to be related to the selected edge nodes. Then, we seek to minimize the mean-squared error (MSE) between the desired global gradient parameters and the actually received ones by optimizing the selected edge nodes, their transmit equalization coefficients, the IRS phase shifts, and the receive factors of the cloud server. By resorting to the matrix lifting technique and difference-of-convex programming, we successfully transform the formulated optimization problem into a convex one and solve it using off-the-shelf solvers. To improve learning performance, we further propose a weight-selection based FL framework. In such a framework, we assign each edge node a proper weight coefficient in model aggregation instead of discarding any of them to reduce the aggregation error, i.e., amplitude alignment of the received local gradient parameters from different edge nodes is not required. We also analyze the performance of this weight-selection based framework and derive an upper bound on its performance loss, followed by minimizing the MSE via optimizing the weight coefficients of the edge nodes, their transmit equalization coefficients, the IRS phase shifts, and the receive factors of the cloud server. Furthermore, we use the MNIST dataset for simulations to evaluate the performance of both node-selection and weight-selection based FL frameworks.
Distance-based Weighted Transformer Network for Image Completion
Authors: Authors: Pourya Shamsolmoali, Masoumeh Zareapoor, Huiyu Zhou, Xuelong Li, Yue Lu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2310.07440
Pdf link: https://arxiv.org/pdf/2310.07440
Abstract The challenge of image generation has been effectively modeled as a problem of structure priors or transformation. However, existing models have unsatisfactory performance in understanding the global input image structures because of particular inherent features (for example, local inductive prior). Recent studies have shown that self-attention is an efficient modeling technique for image completion problems. In this paper, we propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components. In our model, we leverage the strengths of both Convolutional Neural Networks (CNNs) and DWT blocks to enhance the image completion process. Specifically, CNNs are used to augment the local texture information of coarse priors and DWT blocks are used to recover certain coarse textures and coherent visual structures. Unlike current approaches that generally use CNNs to create feature maps, we use the DWT to encode global dependencies and compute distance-based weighted feature maps, which substantially minimizes the problem of visual ambiguities. Meanwhile, to better produce repeated textures, we introduce Residual Fast Fourier Convolution (Res-FFC) blocks to combine the encoder's skip features with the coarse features provided by our generator. Furthermore, a simple yet effective technique is proposed to normalize the non-zero values of convolutions, and fine-tune the network layers for regularization of the gradient norms to provide an efficient training stabiliser. Extensive quantitative and qualitative experiments on three challenging datasets demonstrate the superiority of our proposed model compared to existing approaches.
ROMO: Retrieval-enhanced Offline Model-based Optimization
Authors: Authors: Mingcheng Chen, Haoran Zhao, Yuxiang Zhao, Hulei Fan, Hongqiao Gao, Yong Yu, Zheng Tian
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.07560
Pdf link: https://arxiv.org/pdf/2310.07560
Abstract Data-driven black-box model-based optimization (MBO) problems arise in a great number of practical application scenarios, where the goal is to find a design over the whole space maximizing a black-box target function based on a static offline dataset. In this work, we consider a more general but challenging MBO setting, named constrained MBO (CoMBO), where only part of the design space can be optimized while the rest is constrained by the environment. A new challenge arising from CoMBO is that most observed designs that satisfy the constraints are mediocre in evaluation. Therefore, we focus on optimizing these mediocre designs in the offline dataset while maintaining the given constraints rather than further boosting the best observed design in the traditional MBO setting. We propose retrieval-enhanced offline model-based optimization (ROMO), a new derivable forward approach that retrieves the offline dataset and aggregates relevant samples to provide a trusted prediction, and use it for gradient-based optimization. ROMO is simple to implement and outperforms state-of-the-art approaches in the CoMBO setting. Empirically, we conduct experiments on a synthetic Hartmann (3D) function dataset, an industrial CIO dataset, and a suite of modified tasks in the Design-Bench benchmark. Results show that ROMO performs well in a wide range of constrained optimization tasks.
Fed-GraB: Federated Long-tailed Learning with Self-Adjusting Gradient Balancer
Authors: Authors: Zikai Xiao, Zihan Chen, Songshang Liu, Hualiang Wang, Yang Feng, Jin Hao, Joey Tianyi Zhou, Jian Wu, Howard Hao Yang, Zuozhu Liu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2310.07587
Pdf link: https://arxiv.org/pdf/2310.07587
Abstract Data privacy and long-tailed distribution are the norms rather than the exception in many real-world tasks. This paper investigates a federated long-tailed learning (Fed-LT) task in which each client holds a locally heterogeneous dataset; if the datasets can be globally aggregated, they jointly exhibit a long-tailed distribution. Under such a setting, existing federated optimization and/or centralized long-tailed learning methods hardly apply due to challenges in (a) characterizing the global long-tailed distribution under privacy constraints and (b) adjusting the local learning strategy to cope with the head-tail imbalance. In response, we propose a method termed $\texttt{Fed-GraB}$, comprised of a Self-adjusting Gradient Balancer (SGB) module that re-weights clients' gradients in a closed-loop manner, based on the feedback of global long-tailed distribution evaluated by a Direct Prior Analyzer (DPA) module. Using $\texttt{Fed-GraB}$, clients can effectively alleviate the distribution drift caused by data heterogeneity during the model training process and obtain a global model with better performance on the minority classes while maintaining the performance of the majority classes. Extensive experiments demonstrate that $\texttt{Fed-GraB}$ achieves state-of-the-art performance on representative datasets such as CIFAR-10-LT, CIFAR-100-LT, ImageNet-LT, and iNaturalist.
Keyword: super-resolution

ADASR: An Adversarial Auto-Augmentation Framework for Hyperspectral and Multispectral Data Fusion
Authors: Authors: Jinghui Qin, Lihuang Fang, Ruitao Lu, Liang Lin, Yukai Shi
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2310.07255
Pdf link: https://arxiv.org/pdf/2310.07255
Abstract Deep learning-based hyperspectral image (HSI) super-resolution, which aims to generate high spatial resolution HSI (HR-HSI) by fusing hyperspectral image (HSI) and multispectral image (MSI) with deep neural networks (DNNs), has attracted lots of attention. However, neural networks require large amounts of training data, hindering their application in real-world scenarios. In this letter, we propose a novel adversarial automatic data augmentation framework ADASR that automatically optimizes and augments HSI-MSI sample pairs to enrich data diversity for HSI-MSI fusion. Our framework is sample-aware and optimizes an augmentor network and two downsampling networks jointly by adversarial learning so that we can learn more robust downsampling networks for training the upsampling network. Extensive experiments on two public classical hyperspectral datasets demonstrate the effectiveness of our ADASR compared to the state-of-the-art methods.

zoq / arxiv-updates

New submissions for Thu, 12 Oct 23 #619

Keyword: sgd

Why Does Sharpness-Aware Minimization Generalize Better Than SGD?

Keyword: optimization

Design of JiuTian Intelligent Network Simulation Platform

Extended Reality via Cooperative NOMA in Hybrid Cloud/Mobile-Edge Computing Networks

Reinforcement Learning in a Safety-Embedded MDP with Trajectory Optimization

PICProp: Physics-Informed Confidence Propagation for Uncertainty Quantification

Eclares: Energy-Aware Clarity-Driven Ergodic Search

On the Role of Font Formats in Building Efficient Web Applications

Adversarial optimization leads to over-optimistic security-constrained dispatch, but sampling can help

Multi-Robot Cooperative Navigation in Crowds: A Game-Theoretic Learning-Based Model Predictive Control Approach

Reconfigurable Intelligent Surfaces-Enabled Intra-Cell Pilot Reuse in Massive MIMO Systems

Neural Relational Inference with Fast Modular Meta-learning

Neural Harmonium: An Interpretable Deep Structure for Nonlinear Dynamic System Identification with Application to Audio Processing

GraphCloak: Safeguarding Task-specific Knowledge within Graph-structured Data from Unauthorized Exploitation

Off-Policy Evaluation for Human Feedback

Risk Assessment and Statistical Significance in the Age of Foundation Models

Denoising Task Routing for Diffusion Models

Operating-Envelopes-Aware Decentralized Welfare Maximization for Energy Communities

Integrated Sensing and Communication enabled Multiple Base Stations Cooperative Sensing Towards 6G

$pκ$-Curves: Interpolatory curves with curvature approximating a parabola

DeepSimHO: Stable Pose Estimation for Hand-Object Interaction via Physics Simulation

Robust Safe Reinforcement Learning under Adversarial Disturbances

Fault-tolerant $k$-Supplier with Outliers

Enhancing Neural Architecture Search with Multiple Hardware Constraints for Deep Learning Model Deployment on Tiny IoT Devices

Are GATs Out of Balance?

Cost-aware Joint Caching and Forwarding in Networks with Heterogeneous Cache Resources

Score Regularized Policy Optimization through Diffusion Behavior

Histopathological Image Classification and Vulnerability Analysis using Federated Learning

Deep Kernel and Image Quality Estimators for Optimizing Robotic Ultrasound Controller using Bayesian Optimization

RANS: Highly-Parallelised Simulator for Reinforcement Learning based Autonomous Navigating Spacecrafts

IRS Assisted Federated Learning A Broadband Over-the-Air Aggregation Approach

AI/ML-based Load Prediction in IEEE 802.11 Enterprise Networks

Boosting Black-box Attack to Deep Neural Networks with Conditional Diffusion Models

Diversity for Contingency: Learning Diverse Behaviors for Efficient Adaptation and Transfer

Sample-Driven Federated Learning for Energy-Efficient and Real-Time IoT Sensing

A Unified Remote Sensing Anomaly Detector Across Modalities and Scenes via Deviation Relationship Learning

Retrieve Anything To Augment Large Language Models

ROMO: Retrieval-enhanced Offline Model-based Optimization

Fed-GraB: Federated Long-tailed Learning with Self-Adjusting Gradient Balancer

Approximating Subset Sum Ratio faster than Subset Sum

OpsEval: A Comprehensive Task-Oriented AIOps Benchmark for Large Language Models

Automated Layout Design and Control of Robust Cooperative Grasped-Load Aerial Transportation Systems

Deep Backtracking Counterfactuals for Causally Compliant Explanations

Controllable Data Generation Via Iterative Data-Property Mutual Mappings

ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models

Keyword: adam

Ultima: Robust and Tail-Optimal AllReduce for Distributed Deep Learning in the Cloud

AdaMesh: Personalized Facial Expressions and Head Poses for Speech-Driven 3D Facial Animation

Keyword: gradient

Generalized Golub-Kahan bidiagonalization for nonsymmetric saddle point systems

Ultima: Robust and Tail-Optimal AllReduce for Distributed Deep Learning in the Cloud

Weak Galerkin methods for elliptic interface problems on curved polygonal partitions

A predict-and-optimize approach to profit-driven churn prevention

Investigating the Adversarial Robustness of Density Estimation Using the Probability Flow ODE

QFT: Quantized Full-parameter Tuning of LLMs with Affordable Resources

Boosting Learning for LDPC Codes to Improve the Error-Floor Performance

DeepSimHO: Stable Pose Estimation for Hand-Object Interaction via Physics Simulation

Are GATs Out of Balance?

Why Does Sharpness-Aware Minimization Generalize Better Than SGD?

Score Regularized Policy Optimization through Diffusion Behavior

Domain Generalization Guided by Gradient Signal to Noise Ratio of Parameters

Histopathological Image Classification and Vulnerability Analysis using Federated Learning

Randomized Runge-Kutta-Nyström

IRS Assisted Federated Learning A Broadband Over-the-Air Aggregation Approach

Distance-based Weighted Transformer Network for Image Completion

ROMO: Retrieval-enhanced Offline Model-based Optimization

Fed-GraB: Federated Long-tailed Learning with Self-Adjusting Gradient Balancer

Keyword: super-resolution

ADASR: An Adversarial Auto-Augmentation Framework for Hyperspectral and Multispectral Data Fusion