New submissions for Wed, 20 Dec 23

Keyword: sgd

There is no result

Keyword: optimization

Low-Latency ML Inference by Grouping Correlated Data Objects and Computation

Authors: Authors: Thiago Garrett, Weijia Song, Roman Vitenberg, Ken Birman
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2312.11488
Pdf link: https://arxiv.org/pdf/2312.11488
Abstract ML inference workflows often require low latency and high throughput, yet we lack good options for addressing this need. Techniques that reduce latency in other streaming settings (such as caching and optimization-driven scheduling) are of limited value because ML data dependencies are often very large and can change dramatically depending on the triggering event. In this work, we propose a novel correlation grouping mechanism that makes it easier for developers to express application-specific data access correlations, enabling coordinated management of data objects in server clusters hosting streaming inference tasks. Experiments based on a latency-sensitive ML-based application confirm the limitations of standard techniques while showing that our solution yields dramatically better performance. The proposed mechanism is able to maintain significantly lower and more consistent latency, achieves higher node utilization as workload and scale-out increase, and yet requires only minor changes to the code implementing the application.
A Simulated Annealing-Based Multiobjective Optimization Algorithm for Minimum Weight Minimum Connected Dominating Set Problem
Authors: Authors: Hayet Dahmri, Salim Bouamama
Subjects: Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2312.11527
Pdf link: https://arxiv.org/pdf/2312.11527
Abstract Minimum connected dominating set problem is an NP-hard combinatorial optimization problem in graph theory. Finding connected dominating set is of high interest in various domains such as wireless sensor networks, optical networks, and systems biology. Its weighted variant named minimum weight connected dominating set is also useful in such applications. In this paper, we propose a simulated annealing algorithm based on a greedy heuristic for tackling a variant of the minimum connected dominating set problem and that by exploiting two objectives together namely the cardinality and the total weight of the connected dominating set. Experimental results compared to those obtained by a recent proposed research show the superiority of our approach.
Improved Differentially Private and Lazy Online Convex Optimization
Authors: Authors: Naman Agarwal, Satyen Kale, Karan Singh, Abhradeep Guha Thakurta
Subjects: Cryptography and Security (cs.CR); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2312.11534
Pdf link: https://arxiv.org/pdf/2312.11534
Abstract We study the task of $(\epsilon, \delta)$-differentially private online convex optimization (OCO). In the online setting, the release of each distinct decision or iterate carries with it the potential for privacy loss. This problem has a long history of research starting with Jain et al. [2012] and the best known results for the regime of {\epsilon} being very small are presented in Agarwal et al. [2023]. In this paper we improve upon the results of Agarwal et al. [2023] in terms of the dimension factors as well as removing the requirement of smoothness. Our results are now the best known rates for DP-OCO in this regime. Our algorithms builds upon the work of [Asi et al., 2023] which introduced the idea of explicitly limiting the number of switches via rejection sampling. The main innovation in our algorithm is the use of sampling from a strongly log-concave density which allows us to trade-off the dimension factors better leading to improved results.
Customize-It-3D: High-Quality 3D Creation from A Single Image Using Subject-Specific Knowledge Prior
Authors: Authors: Nan Huang, Ting Zhang, Yuhui Yuan, Dong Chen, Shanghang Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2312.11535
Pdf link: https://arxiv.org/pdf/2312.11535
Abstract In this paper, we present a novel two-stage approach that fully utilizes the information provided by the reference image to establish a customized knowledge prior for image-to-3D generation. While previous approaches primarily rely on a general diffusion prior, which struggles to yield consistent results with the reference image, we propose a subject-specific and multi-modal diffusion model. This model not only aids NeRF optimization by considering the shading mode for improved geometry but also enhances texture from the coarse results to achieve superior refinement. Both aspects contribute to faithfully aligning the 3D content with the subject. Extensive experiments showcase the superiority of our method, Customize-It-3D, outperforming previous works by a substantial margin. It produces faithful 360-degree reconstructions with impressive visual quality, making it well-suited for various applications, including text-to-3D creation.
A Unified Pre-training and Adaptation Framework for Combinatorial Optimization on Graphs
Authors: Authors: Ruibin Zeng, Minglong Lei, Lingfeng Niu, Lan Cheng
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.11547
Pdf link: https://arxiv.org/pdf/2312.11547
Abstract Combinatorial optimization (CO) on graphs is a classic topic that has been extensively studied across many scientific and industrial fields. Recently, solving CO problems on graphs through learning methods has attracted great attention. Advanced deep learning methods, e.g., graph neural networks (GNNs), have been used to effectively assist the process of solving COs. However, current frameworks based on GNNs are mainly designed for certain CO problems, thereby failing to consider their transferable and generalizable abilities among different COs on graphs. Moreover, simply using original graphs to model COs only captures the direct correlations among objects, which does not consider the mathematical logicality and properties of COs. In this paper, we propose a unified pre-training and adaptation framework for COs on graphs with the help of the maximum satisfiability (Max-SAT) problem. We first use Max-SAT to bridge different COs on graphs since they can be converted to Max-SAT problems represented by standard formulas and clauses with logical information. Then, we further design a pre-training and domain adaptation framework to extract the transferable and generalizable features so that different COs can benefit from them. In the pre-training stage, Max-SAT instances are generated to initialize the parameters of the model. In the fine-tuning stage, instances from CO and Max-SAT problems are used for adaptation so that the transferable ability can be further improved. Numerical experiments on several datasets show that features extracted by our framework exhibit superior transferability and Max-SAT can boost the ability to solve COs on graphs.
Learning Interpretable Queries for Explainable Image Classification with Information Pursuit
Authors: Authors: Stefan Kolek, Aditya Chattopadhyay, Kwan Ho Ryan Chan, Hector Andrade-Loarca, Gitta Kutyniok, Réne Vidal
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.11548
Pdf link: https://arxiv.org/pdf/2312.11548
Abstract Information Pursuit (IP) is an explainable prediction algorithm that greedily selects a sequence of interpretable queries about the data in order of information gain, updating its posterior at each step based on observed query-answer pairs. The standard paradigm uses hand-crafted dictionaries of potential data queries curated by a domain expert or a large language model after a human prompt. However, in practice, hand-crafted dictionaries are limited by the expertise of the curator and the heuristics of prompt engineering. This paper introduces a novel approach: learning a dictionary of interpretable queries directly from the dataset. Our query dictionary learning problem is formulated as an optimization problem by augmenting IP's variational formulation with learnable dictionary parameters. To formulate learnable and interpretable queries, we leverage the latent space of large vision and language models like CLIP. To solve the optimization problem, we propose a new query dictionary learning algorithm inspired by classical sparse dictionary learning. Our experiments demonstrate that learned dictionaries significantly outperform hand-crafted dictionaries generated with large language models.
PR-NeuS: A Prior-based Residual Learning Paradigm for Fast Multi-view Neural Surface Reconstruction
Authors: Authors: Jianyao Xu, Qingshan Xu, Xinyao Liao, Wanjuan Su, Chen Zhang, Yew-Soon Ong, Wenbing Tao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.11577
Pdf link: https://arxiv.org/pdf/2312.11577
Abstract Neural surfaces learning has shown impressive performance in multi-view surface reconstruction. However, most existing methods use large multilayer perceptrons (MLPs) to train their models from scratch, resulting in hours of training for a single scene. Recently, how to accelerate the neural surfaces learning has received a lot of attention and remains an open problem. In this work, we propose a prior-based residual learning paradigm for fast multi-view neural surface reconstruction. This paradigm consists of two optimization stages. In the first stage, we propose to leverage generalization models to generate a basis signed distance function (SDF) field. This initial field can be quickly obtained by fusing multiple local SDF fields produced by generalization models. This provides a coarse global geometry prior. Based on this prior, in the second stage, a fast residual learning strategy based on hash-encoding networks is proposed to encode an offset SDF field for the basis SDF field. Moreover, we introduce a prior-guided sampling scheme to help the residual learning stage converge better, and thus recover finer structures. With our designed paradigm, experimental results show that our method only takes about 3 minutes to reconstruct the surface of a single scene, while achieving competitive surface quality. Our code will be released upon publication.
Two-Channel Extended Kalman Filtering with Intermittent Measurements
Authors: Authors: Vicu-Mihalis Maer, Zsofia Lendek, Stefan Pirje, Domagoj Tolic, Antun Djuras, Vicko Prkacin, Ivana Palunko, Lucian Busoniu
Subjects: Systems and Control (eess.SY); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2312.11600
Pdf link: https://arxiv.org/pdf/2312.11600
Abstract We consider two nonlinear state estimation problems in a setting where an extended Kalman filter receives measurements from two sets of sensors via two channels (2C). In the stochastic-2C problem, the channels drop measurements stochastically, whereas in 2C scheduling, the estimator chooses when to read each channel. In the first problem, we generalize linear-case 2C analysis to obtain -- for a given pair of channel arrival rates -- boundedness conditions for the trace of the error covariance, as well as a worst-case upper bound. For scheduling, an optimization problem is solved to find arrival rates that balance low channel usage with low trace bounds, and channels are read deterministically with the expected periods corresponding to these arrival rates. We validate both solutions in simulations for linear and nonlinear dynamics; as well as in a real experiment with an underwater robot whose position is being intermittently found in a UAV camera image.
Experiment-informed finite-strain inverse design of spinodal metamaterials
Authors: Authors: Prakash Thakolkaran, Michael A. Espinal, Somayajulu Dhulipala, Siddhant Kumar, Carlos M. Portela
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Arxiv link: https://arxiv.org/abs/2312.11648
Pdf link: https://arxiv.org/pdf/2312.11648
Abstract This study presents a novel physics-enhanced machine learning (ML) and optimization framework tailored to address the challenges of designing intricate spinodal metamaterials with customized mechanical properties in scenarios where computational modeling is restricted, and experimental data is sparse. By utilizing sparse experimental data directly, our approach facilitates the inverse design of spinodal structures with precise finite-strain mechanical responses. Leveraging physics-based inductive biases to compensate for limited data availability, the framework sheds light on instability-induced pattern formation in periodic metamaterials, attributing it to nonconvex energetic potentials. Inspired by phase transformation modeling, the method integrates multiple partial input convex neural networks to create nonconvex potentials, effectively capturing complex nonlinear stress-strain behavior, even under extreme deformations.
ADMM-MM Algorithm for General Tensor Decomposition
Authors: Authors: Manabu Mukai, Hidekata Hontani, Tatsuya Yokota
Subjects: Computer Vision and Pattern Recognition (cs.CV); Signal Processing (eess.SP); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2312.11763
Pdf link: https://arxiv.org/pdf/2312.11763
Abstract In this paper, we propose a new unified optimization algorithm for general tensor decomposition which is formulated as an inverse problem for low-rank tensors in the general linear observation models. The proposed algorithm supports three basic loss functions ($\ell_2$-loss, $\ell_1$-loss and KL divergence) and various low-rank tensor decomposition models (CP, Tucker, TT, and TR decompositions). We derive the optimization algorithm based on hierarchical combination of the alternating direction method of multiplier (ADMM) and majorization-minimization (MM). We show that wide-range applications can be solved by the proposed algorithm, and can be easily extended to any established tensor decomposition models in a {plug-and-play} manner.
Text-Image Conditioned Diffusion for Consistent Text-to-3D Generation
Authors: Authors: Yuze He, Yushi Bai, Matthieu Lin, Jenny Sheng, Yubin Hu, Qi Wang, Yu-Hui Wen, Yong-Jin Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.11774
Pdf link: https://arxiv.org/pdf/2312.11774
Abstract By lifting the pre-trained 2D diffusion models into Neural Radiance Fields (NeRFs), text-to-3D generation methods have made great progress. Many state-of-the-art approaches usually apply score distillation sampling (SDS) to optimize the NeRF representations, which supervises the NeRF optimization with pre-trained text-conditioned 2D diffusion models such as Imagen. However, the supervision signal provided by such pre-trained diffusion models only depends on text prompts and does not constrain the multi-view consistency. To inject the cross-view consistency into diffusion priors, some recent works finetune the 2D diffusion model with multi-view data, but still lack fine-grained view coherence. To tackle this challenge, we incorporate multi-view image conditions into the supervision signal of NeRF optimization, which explicitly enforces fine-grained view consistency. With such stronger supervision, our proposed text-to-3D method effectively mitigates the generation of floaters (due to excessive densities) and completely empty spaces (due to insufficient densities). Our quantitative evaluations on the T$^3$Bench dataset demonstrate that our method achieves state-of-the-art performance over existing text-to-3D methods. We will make the code publicly available.
Faster Convergence with Multiway Preferences
Authors: Authors: Aadirupa Saha, Vitaly Feldman, Tomer Koren, Yishay Mansour
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2312.11788
Pdf link: https://arxiv.org/pdf/2312.11788
Abstract We address the problem of convex optimization with preference feedback, where the goal is to minimize a convex function given a weaker form of comparison queries. Each query consists of two points and the dueling feedback returns a (noisy) single-bit binary comparison of the function values of the two queried points. Here we consider the sign-function-based comparison feedback model and analyze the convergence rates with batched and multiway (argmin of a set queried points) comparisons. Our main goal is to understand the improved convergence rates owing to parallelization in sign-feedback-based optimization problems. Our work is the first to study the problem of convex optimization with multiway preferences and analyze the optimal convergence rates. Our first contribution lies in designing efficient algorithms with a convergence rate of $\smash{\widetilde O}(\frac{d}{\min{m,d} \epsilon})$ for $m$-batched preference feedback where the learner can query $m$-pairs in parallel. We next study a $m$-multiway comparison (`battling') feedback, where the learner can get to see the argmin feedback of $m$-subset of queried points and show a convergence rate of $\smash{\widetilde O}(\frac{d}{ \min{\log m,d}\epsilon })$. We show further improved convergence rates with an additional assumption of strong convexity. Finally, we also study the convergence lower bounds for batched preferences and multiway feedback optimization showing the optimality of our convergence rates w.r.t. $m$.
SoC-Tuner: An Importance-guided Exploration Framework for DNN-targeting SoC Design
Authors: Authors: Shixin Chen, Su Zheng, Chen Bai, Wenqian Zhao, Shuo Yin, Yang Bai, Bei Yu
Subjects: Hardware Architecture (cs.AR)
Arxiv link: https://arxiv.org/abs/2312.11820
Pdf link: https://arxiv.org/pdf/2312.11820
Abstract Designing a system-on-chip (SoC) for deep neural network (DNN) acceleration requires balancing multiple metrics such as latency, power, and area. However, most existing methods ignore the interactions among different SoC components and rely on inaccurate and error-prone evaluation tools, leading to inferior SoC design. In this paper, we present SoC-Tuner, a DNN-targeting exploration framework to find the Pareto optimal set of SoC configurations efficiently. Our framework constructs a thorough SoC design space of all components and divides the exploration into three phases. We propose an importance-based analysis to prune the design space, a sampling algorithm to select the most representative initialization points, and an information-guided multi-objective optimization method to balance multiple design metrics of SoC design. We validate our framework with the actual very-large-scale-integration (VLSI) flow on various DNN benchmarks and show that it outperforms previous methods. To the best of our knowledge, this is the first work to construct an exploration framework of SoCs for DNN acceleration.
Provably Convergent Federated Trilevel Learning
Authors: Authors: Yang Jiao, Kai Yang, Tiancheng Wu, Chengtao Jian, Jianwei Huang
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2312.11835
Pdf link: https://arxiv.org/pdf/2312.11835
Abstract Trilevel learning, also called trilevel optimization (TLO), has been recognized as a powerful modelling tool for hierarchical decision process and widely applied in many machine learning applications, such as robust neural architecture search, hyperparameter optimization, and domain adaptation. Tackling TLO problems has presented a great challenge due to their nested decision-making structure. In addition, existing works on TLO face the following key challenges: 1) they all focus on the non-distributed setting, which may lead to privacy breach; 2) they do not offer any non-asymptotic convergence analysis which characterizes how fast an algorithm converges. To address the aforementioned challenges, this paper proposes an asynchronous federated trilevel optimization method to solve TLO problems. The proposed method utilizes $\mu$-cuts to construct a hyper-polyhedral approximation for the TLO problem and solve it in an asynchronous manner. We demonstrate that the proposed $\mu$-cuts are applicable to not only convex functions but also a wide range of non-convex functions that meet the $\mu$-weakly convex assumption. Furthermore, we theoretically analyze the non-asymptotic convergence rate for the proposed method by showing its iteration complexity to obtain $\epsilon$-stationary point is upper bounded by $\mathcal{O}(\frac{1}{\epsilon^2})$. Extensive experiments on real-world datasets have been conducted to elucidate the superiority of the proposed method, e.g., it has a faster convergence rate with a maximum acceleration of approximately 80$\%$.
Enhancing Social Decision-Making of Autonomous Vehicles: A Mixed-Strategy Game Approach With Interaction Orientation Identification
Authors: Authors: Jiaqi Liu, Xiao Qi, Peng Hang, Jian Sun
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2312.11843
Pdf link: https://arxiv.org/pdf/2312.11843
Abstract The integration of Autonomous Vehicles (AVs) into existing human-driven traffic systems poses considerable challenges, especially within environments where human and machine interactions are frequent and complex, such as at unsignalized intersections. Addressing these challenges, we introduce a novel framework predicated on dynamic and socially-aware decision-making game theory to augment the social decision-making prowess of AVs in mixed driving environments.This comprehensive framework is delineated into three primary modules: Social Tendency Recognition, Mixed-Strategy Game Modeling, and Expert Mode Learning. We introduce 'Interaction Orientation' as a metric to evaluate the social decision-making tendencies of various agents, incorporating both environmental factors and trajectory data. The mixed-strategy game model developed as part of this framework considers the evolution of future traffic scenarios and includes a utility function that balances safety, operational efficiency, and the unpredictability of environmental conditions. To adapt to real-world driving complexities, our framework utilizes dynamic optimization techniques for assimilating and learning from expert human driving strategies. These strategies are compiled into a comprehensive library, serving as a reference for future decision-making processes. Our approach is validated through extensive driving datasets, and the results demonstrate marked enhancements in decision timing, precision.
Adaptive Tracking and Perching for Quadrotor in Dynamic Scenarios
Authors: Authors: Yuman Gao, Jialin Ji, Qianhao Wang, Rui Jin, Yi Lin, Zhimeng Shang, Yanjun Cao, Shaojie Shen, Chao Xu, Fei Gao
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2312.11866
Pdf link: https://arxiv.org/pdf/2312.11866
Abstract Perching on the moving platforms is a promising solution to enhance the endurance and operational range of quadrotors, which could benefit the efficiency of a variety of air-ground cooperative tasks. To ensure robust perching, tracking with a steady relative state and reliable perception is a prerequisite. This paper presents an adaptive dynamic tracking and perching scheme for autonomous quadrotors to achieve tight integration with moving platforms. For reliable perception of dynamic targets, we introduce elastic visibility-aware planning to actively avoid occlusion and target loss. Additionally, we propose a flexible terminal adjustment method that adapts the changes in flight duration and the coupled terminal states, ensuring full-state synchronization with the time-varying perching surface at various angles. A relaxation strategy is developed by optimizing the tangential relative speed to address the dynamics and safety violations brought by hard boundary conditions. Moreover, we take SE(3) motion planning into account to ensure no collision between the quadrotor and the platform until the contact moment. Furthermore, we propose an efficient spatiotemporal trajectory optimization framework considering full state dynamics for tracking and perching. The proposed method is extensively tested through benchmark comparisons and ablation studies. To facilitate the application of academic research to industry and to validate the efficiency of our scheme under strictly limited computational resources, we deploy our system on a commercial drone (DJI-MAVIC3) with a full-size sport-utility vehicle (SUV). We conduct extensive real-world experiments, where the drone successfully tracks and perches at 30~km/h (8.3~m/s) on the top of the SUV, and at 3.5~m/s with 60{\deg} inclined into the trunk of the SUV.
Difficulty-Focused Contrastive Learning for Knowledge Tracing with a Large Language Model-Based Difficulty Prediction
Authors: Authors: Unggi Lee, Sungjun Yoon, Joon Seo Yun, Kyoungsoo Park, YoungHoon Jung, Damji Stratton, Hyeoncheol Kim
Subjects: Computation and Language (cs.CL); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2312.11890
Pdf link: https://arxiv.org/pdf/2312.11890
Abstract This paper presents novel techniques for enhancing the performance of knowledge tracing (KT) models by focusing on the crucial factor of question and concept difficulty level. Despite the acknowledged significance of difficulty, previous KT research has yet to exploit its potential for model optimization and has struggled to predict difficulty from unseen data. To address these problems, we propose a difficulty-centered contrastive learning method for KT models and a Large Language Model (LLM)-based framework for difficulty prediction. These innovative methods seek to improve the performance of KT models and provide accurate difficulty estimates for unseen data. Our ablation study demonstrates the efficacy of these techniques by demonstrating enhanced KT model performance. Nonetheless, the complex relationship between language and difficulty merits further investigation.
Stable Relay Learning Optimization Approach for Fast Power System Production Cost Minimization Simulation
Authors: Authors: Zishan Guo, Qinran Hu, Tao Qian, Xin Fang, Renjie Hu, Zaijun Wu
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2312.11896
Pdf link: https://arxiv.org/pdf/2312.11896
Abstract Production cost minimization (PCM) simulation is commonly employed for assessing the operational efficiency, economic viability, and reliability, providing valuable insights for power system planning and operations. However, solving a PCM problem is time-consuming, consisting of numerous binary variables for simulation horizon extending over months and years. This hinders rapid assessment of modern energy systems with diverse planning requirements. Existing methods for accelerating PCM tend to sacrifice accuracy for speed. In this paper, we propose a stable relay learning optimization (s-RLO) approach within the Branch and Bound (B&B) algorithm. The proposed approach offers rapid and stable performance, and ensures optimal solutions. The two-stage s-RLO involves an imitation learning (IL) phase for accurate policy initialization and a reinforcement learning (RL) phase for time-efficient fine-tuning. When implemented on the popular SCIP solver, s-RLO returns the optimal solution up to 2 times faster than the default relpscost rule and 1.4 times faster than IL, or exhibits a smaller gap at the predefined time limit. The proposed approach shows stable performance, reducing fluctuations by approximately 50% compared with IL. The efficacy of the proposed s-RLO approach is supported by numerical results.
Convergence Visualizer of Decentralized Federated Distillation with Reduced Communication Costs
Authors: Authors: Akihito Taya, Yuuki Nishiyama, Kaoru Sezaki
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.11905
Pdf link: https://arxiv.org/pdf/2312.11905
Abstract Federated learning (FL) achieves collaborative learning without the need for data sharing, thus preventing privacy leakage. To extend FL into a fully decentralized algorithm, researchers have applied distributed optimization algorithms to FL by considering machine learning (ML) tasks as parameter optimization problems. Conversely, the consensus-based multi-hop federated distillation (CMFD) proposed in the authors' previous work makes neural network (NN) models get close with others in a function space rather than in a parameter space. Hence, this study solves two unresolved challenges of CMFD: (1) communication cost reduction and (2) visualization of model convergence. Based on a proposed dynamic communication cost reduction method (DCCR), the amount of data transferred in a network is reduced; however, with a slight degradation in the prediction accuracy. In addition, a technique for visualizing the distance between the NN models in a function space is also proposed. The technique applies a dimensionality reduction technique by approximating infinite-dimensional functions as numerical vectors to visualize the trajectory of how the models change by the distributed learning algorithm.
Parameterized Decision-making with Multi-modal Perception for Autonomous Driving
Authors: Authors: Yuyang Xia, Shuncheng Liu, Quanlin Yu, Liwei Deng, You Zhang, Han Su, Kai Zheng
Subjects: Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2312.11935
Pdf link: https://arxiv.org/pdf/2312.11935
Abstract Autonomous driving is an emerging technology that has advanced rapidly over the last decade. Modern transportation is expected to benefit greatly from a wise decision-making framework of autonomous vehicles, including the improvement of mobility and the minimization of risks and travel time. However, existing methods either ignore the complexity of environments only fitting straight roads, or ignore the impact on surrounding vehicles during optimization phases, leading to weak environmental adaptability and incomplete optimization objectives. To address these limitations, we propose a parameterized decision-making framework with multi-modal perception based on deep reinforcement learning, called AUTO. We conduct a comprehensive perception to capture the state features of various traffic participants around the autonomous vehicle, based on which we design a graph-based model to learn a state representation of the multi-modal semantic features. To distinguish between lane-following and lane-changing, we decompose an action of the autonomous vehicle into a parameterized action structure that first decides whether to change lanes and then computes an exact action to execute. A hybrid reward function takes into account aspects of safety, traffic efficiency, passenger comfort, and impact to guide the framework to generate optimal actions. In addition, we design a regularization term and a multi-worker paradigm to enhance the training. Extensive experiments offer evidence that AUTO can advance state-of-the-art in terms of both macroscopic and microscopic effectiveness.
Adversarial AutoMixup
Authors: Authors: Huafeng Qin, Xin Jin, Yun Jiang, Mounim A. El-Yacoubi, Xinbo Gao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.11954
Pdf link: https://arxiv.org/pdf/2312.11954
Abstract Data mixing augmentation has been widely applied to improve the generalization ability of deep neural networks. Recently, offline data mixing augmentation, e.g. handcrafted and saliency information-based mixup, has been gradually replaced by automatic mixing approaches. Through minimizing two sub-tasks, namely, mixed sample generation and mixup classification in an end-to-end way, AutoMix significantly improves accuracy on image classification tasks. However, as the optimization objective is consistent for the two sub-tasks, this approach is prone to generating consistent instead of diverse mixed samples, which results in overfitting for target task training. In this paper, we propose AdAutomixup, an adversarial automatic mixup augmentation approach that generates challenging samples to train a robust classifier for image classification, by alternatively optimizing the classifier and the mixup sample generator. AdAutomixup comprises two modules, a mixed example generator, and a target classifier. The mixed sample generator aims to produce hard mixed examples to challenge the target classifier while the target classifier`s aim is to learn robust features from hard mixed examples to improve generalization. To prevent the collapse of the inherent meanings of images, we further introduce an exponential moving average (EMA) teacher and cosine similarity to train AdAutomixup in an end-to-end way. Extensive experiments on seven image benchmarks consistently prove that our approach outperforms the state of the art in various classification scenarios.
Optimizing Diffusion Noise Can Serve As Universal Motion Priors
Authors: Authors: Korrawe Karunratanakul, Konpat Preechakul, Emre Aksan, Thabo Beeler, Supasorn Suwajanakorn, Siyu Tang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.11994
Pdf link: https://arxiv.org/pdf/2312.11994
Abstract We propose Diffusion Noise Optimization (DNO), a new method that effectively leverages existing motion diffusion models as motion priors for a wide range of motion-related tasks. Instead of training a task-specific diffusion model for each new task, DNO operates by optimizing the diffusion latent noise of an existing pre-trained text-to-motion model. Given the corresponding latent noise of a human motion, it propagates the gradient from the target criteria defined on the motion space through the whole denoising process to update the diffusion latent noise. As a result, DNO supports any use cases where criteria can be defined as a function of motion. In particular, we show that, for motion editing and control, DNO outperforms existing methods in both achieving the objective and preserving the motion content. DNO accommodates a diverse range of editing modes, including changing trajectory, pose, joint locations, or avoiding newly added obstacles. In addition, DNO is effective in motion denoising and completion, producing smooth and realistic motion from noisy and partial inputs. DNO achieves these results at inference time without the need for model retraining, offering great versatility for any defined reward or loss function on the motion representation.
PPO-Clip Attains Global Optimality: Towards Deeper Understandings of Clipping
Authors: Authors: Nai-Chieh Huang, Ping-Chun Hsieh, Kuo-Hao Ho, I-Chen Wu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2312.12065
Pdf link: https://arxiv.org/pdf/2312.12065
Abstract Proximal Policy Optimization algorithm employing a clipped surrogate objective (PPO-Clip) is a prominent exemplar of the policy optimization methods. However, despite its remarkable empirical success, PPO-Clip lacks theoretical substantiation to date. In this paper, we contribute to the field by establishing the first global convergence results of a PPO-Clip variant in both tabular and neural function approximation settings. Our findings highlight the $O(1/\sqrt{T})$ min-iterate convergence rate specifically in the context of neural function approximation. We tackle the inherent challenges in analyzing PPO-Clip through three central concepts: (i) We introduce a generalized version of the PPO-Clip objective, illuminated by its connection with the hinge loss. (ii) Employing entropic mirror descent, we establish asymptotic convergence for tabular PPO-Clip with direct policy parameterization. (iii) Inspired by the tabular analysis, we streamline convergence analysis by introducing a two-step policy improvement approach. This decouples policy search from complex neural policy parameterization using a regression-based update scheme. Furthermore, we gain deeper insights into the efficacy of PPO-Clip by interpreting these generalized objectives. Our theoretical findings also mark the first characterization of the influence of the clipping mechanism on PPO-Clip convergence. Importantly, the clipping range affects only the pre-constant of the convergence rate.
PICNN: A Pathway towards Interpretable Convolutional Neural Networks
Authors: Authors: Wengang Guo, Jiayi Yang, Huilin Yin, Qijun Chen, Wei Ye
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.12068
Pdf link: https://arxiv.org/pdf/2312.12068
Abstract Convolutional Neural Networks (CNNs) have exhibited great performance in discriminative feature learning for complex visual tasks. Besides discrimination power, interpretability is another important yet under-explored property for CNNs. One difficulty in the CNN interpretability is that filters and image classes are entangled. In this paper, we introduce a novel pathway to alleviate the entanglement between filters and image classes. The proposed pathway groups the filters in a late conv-layer of CNN into class-specific clusters. Clusters and classes are in a one-to-one relationship. Specifically, we use the Bernoulli sampling to generate the filter-cluster assignment matrix from a learnable filter-class correspondence matrix. To enable end-to-end optimization, we develop a novel reparameterization trick for handling the non-differentiable Bernoulli sampling. We evaluate the effectiveness of our method on ten widely used network architectures (including nine CNNs and a ViT) and five benchmark datasets. Experimental results have demonstrated that our method PICNN (the combination of standard CNNs with our proposed pathway) exhibits greater interpretability than standard CNNs while achieving higher or comparable discrimination power.
Low Dispersion Optimized High-Order Schemes for Discretization of Non-Linear Straight and Mixed Second Derivative Terms
Authors: Authors: Hemanth Chandravamsi, Steven H. Frankel
Subjects: Numerical Analysis (math.NA); Computational Physics (physics.comp-ph); Fluid Dynamics (physics.flu-dyn)
Arxiv link: https://arxiv.org/abs/2312.12069
Pdf link: https://arxiv.org/pdf/2312.12069
Abstract In this paper, we propose a new set of midpoint-based high-order discretization schemes for computing straight and mixed nonlinear second derivative terms that appear in the compressible Navier-Stokes equations. Firstly, we detail a set of conventional fourth and sixth-order baseline schemes that utilize central midpoint derivatives for the calculation of second derivatives terms. To enhance the spectral properties of the baseline schemes, an optimization procedure is proposed that adjusts the order and truncation error of the midpoint derivative approximation while still constraining the same overall stencil width and scheme order. A new filter penalty term is introduced into the midpoint derivative calculation to help achieve high wavenumber accuracy and high-frequency damping in the mixed derivative discretization. Fourier analysis performed on the both straight and mixed second derivative terms show high spectral efficiency and minimal numerical viscosity with no odd-even decoupling effect. Numerical validation of the resulting optimized schemes is performed through various benchmark test cases assessing their theoretical order of accuracy and solution resolution. The results highlight that the present optimized schemes efficiently utilize the inherent viscosity of the governing equations to achieve improved simulation stability - a feature attributed to their superior spectral resolution in the high wavenumber range. The method is also tested and applied to non-uniform structured meshes in curvilinear coordinates, employing a supersonic impinging jet test case.
DLCA-Recon: Dynamic Loose Clothing Avatar Reconstruction from Monocular Videos
Authors: Authors: Chunjie Luo, Fei Luo, Yusen Wang, Enxu Zhao, Chunxia Xiao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.12096
Pdf link: https://arxiv.org/pdf/2312.12096
Abstract Reconstructing a dynamic human with loose clothing is an important but difficult task. To address this challenge, we propose a method named DLCA-Recon to create human avatars from monocular videos. The distance from loose clothing to the underlying body rapidly changes in every frame when the human freely moves and acts. Previous methods lack effective geometric initialization and constraints for guiding the optimization of deformation to explain this dramatic change, resulting in the discontinuous and incomplete reconstruction surface. To model the deformation more accurately, we propose to initialize an estimated 3D clothed human in the canonical space, as it is easier for deformation fields to learn from the clothed human than from SMPL. With both representations of explicit mesh and implicit SDF, we utilize the physical connection information between consecutive frames and propose a dynamic deformation field (DDF) to optimize deformation fields. DDF accounts for contributive forces on loose clothing to enhance the interpretability of deformations and effectively capture the free movement of loose clothing. Moreover, we propagate SMPL skinning weights to each individual and refine pose and skinning weights during the optimization to improve skinning transformation. Based on more reasonable initialization and DDF, we can simulate real-world physics more accurately. Extensive experiments on public and our own datasets validate that our method can produce superior results for humans with loose clothing compared to the SOTA methods.
On the Parameterization of Second-Order Optimization Effective Towards the Infinite Width
Authors: Authors: Satoki Ishikawa, Ryo Karakida
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.12226
Pdf link: https://arxiv.org/pdf/2312.12226
Abstract Second-order optimization has been developed to accelerate the training of deep neural networks and it is being applied to increasingly larger-scale models. In this study, towards training on further larger scales, we identify a specific parameterization for second-order optimization that promotes feature learning in a stable manner even if the network width increases significantly. Inspired by a maximal update parameterization, we consider a one-step update of the gradient and reveal the appropriate scales of hyperparameters including random initialization, learning rates, and damping terms. Our approach covers two major second-order optimization algorithms, K-FAC and Shampoo, and we demonstrate that our parameterization achieves higher generalization performance in feature learning. In particular, it enables us to transfer the hyperparameters across models with different widths.
HuTuMotion: Human-Tuned Navigation of Latent Motion Diffusion Models with Minimal Feedback
Authors: Authors: Gaoge Han, Shaoli Huang, Mingming Gong, Jinglei Tang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2312.12227
Pdf link: https://arxiv.org/pdf/2312.12227
Abstract We introduce HuTuMotion, an innovative approach for generating natural human motions that navigates latent motion diffusion models by leveraging few-shot human feedback. Unlike existing approaches that sample latent variables from a standard normal prior distribution, our method adapts the prior distribution to better suit the characteristics of the data, as indicated by human feedback, thus enhancing the quality of motion generation. Furthermore, our findings reveal that utilizing few-shot feedback can yield performance levels on par with those attained through extensive human feedback. This discovery emphasizes the potential and efficiency of incorporating few-shot human-guided optimization within latent diffusion models for personalized and style-aware human motion generation applications. The experimental results show the significantly superior performance of our method over existing state-of-the-art approaches.
Protecting Massive MIMO-Radar Coexistence: Precoding Design and Power Control
Authors: Authors: Mohamed Elfiatoure, Mohammadali Mohammadi, Hien Quoc Ngo, Peter J. Smith, Michail Matthaiou
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2312.12244
Pdf link: https://arxiv.org/pdf/2312.12244
Abstract This paper studies the coexistence between a downlink multiuser massive multi-input-multi-output (MIMO) communication system and MIMO radar. The performance of the massive MIMO system with maximum ratio ($\MR$), zero-forcing ($\ZF$), and protective $\ZF$ ($\PZF$) precoding designs is characterized in terms of spectral efficiency (SE) and by taking the channel estimation errors and power control into account. The idea of $\PZF$ precoding relies on the projection of the information-bearing signal onto the null space of the radar channel to protect the radar against communication signals. We further derive closed-form expressions for the detection probability of the radar system for the considered precoding designs. By leveraging the closed-form expressions for the SE and detection probability, we formulate a power control problem at the radar and base station (BS) to maximize the detection probability while satisfying the per-user SE requirements. This optimization problem can be efficiently tackled via the bisection method by solving a linear feasibility problem. Our analysis and simulations show that the $\PZF$ design has the highest detection probability performance among all designs, with intermediate SE performance compared to the other two designs. Moreover, by optimally selecting the power control coefficients at the BS and radar, the detection probability improves significantly.
Optimal Power Flow Pursuit via Feedback-based Safe Gradient Flow
Authors: Authors: Antonin Colot, Yiting Chen, Bertrand Cornelusse, Jorge Cortes, Emiliano Dall'Anese
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2312.12267
Pdf link: https://arxiv.org/pdf/2312.12267
Abstract This paper considers the problem of controlling the operation of inverter-interfaced distributed energy resources (DERs) in a distribution grid, to achieve operational and performance goals with limited system-level information. We develop an online feedback optimization method to drive the DERs' power setpoints to solutions of an AC optimal power flow (OPF) problem based only on voltage measurements (and without requiring measurements of the power consumption of non-controllable assets). The proposed method - grounded on the theory of control barrier functions - is based on a continuous approximation of the projected gradient flow, appropriately modified to accommodate measurements from the power network. We provide results in terms of local exponential stability, and assess the robustness to errors in the measurements and in the system Jacobian matrix. We show that the proposed method ensures anytime satisfaction of the voltage constraints when no model and measurement errors are present; if these errors are present and are small, the voltage violation is practically negligible. We also discuss extensions of the framework to virtual power plant setups. Numerical experiments on a 93-bus distribution system and with realistic load and production profiles show a superior performance in terms of voltage regulation relative to existing methods.
Bypassing the Safety Training of Open-Source LLMs with Priming Attacks
Authors: Authors: Jason Vega, Isha Chaudhary, Changming Xu, Gagandeep Singh
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.12321
Pdf link: https://arxiv.org/pdf/2312.12321
Abstract With the recent surge in popularity of LLMs has come an ever-increasing need for LLM safety training. In this paper, we show that SOTA open-source LLMs are vulnerable to simple, optimization-free attacks we refer to as $\textit{priming attacks}$, which are easy to execute and effectively bypass alignment from safety training. Our proposed attack improves the Attack Success Rate on Harmful Behaviors, as measured by Llama Guard, by up to $3.3\times$ compared to baselines. Source code and data are available at https://github.com/uiuc-focal-lab/llm-priming-attacks .
Optimizing Local Satisfaction of Long-Run Average Objectives in Markov Decision Processes
Authors: Authors: David Klaška, Antonín Kučera, Vojtěch Kůr, Vít Musil, Vojtěch Řehák
Subjects: Multiagent Systems (cs.MA); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2312.12325
Pdf link: https://arxiv.org/pdf/2312.12325
Abstract Long-run average optimization problems for Markov decision processes (MDPs) require constructing policies with optimal steady-state behavior, i.e., optimal limit frequency of visits to the states. However, such policies may suffer from local instability, i.e., the frequency of states visited in a bounded time horizon along a run differs significantly from the limit frequency. In this work, we propose an efficient algorithmic solution to this problem.
Path Planning for Continuum Rods Using Bernstein Surfaces
Authors: Authors: Maxwell Hammond, Venanzio Cichella, Amirreza F. Golestaneh, Caterina Lamuta
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2312.12333
Pdf link: https://arxiv.org/pdf/2312.12333
Abstract This paper presents a method for optimal motion planning of continuum robots by employing Bernstein surfaces to approximate the system's dynamics and impose complex constraints, including collision avoidance. The main contribution is the approximation of infinite-dimensional continuous problems into their discrete counterparts, facilitating their solution using standard optimization solvers. This discretization leverages the unique properties of Bernstein surface, providing a framework that extends previous works which focused on ODEs approximated by Bernstein polynomials. Numerical validations are conducted through several numerical scenarios. The presented methodology offers a promising direction for solving complex optimal control problems in the realm of soft robotics.
Transformed Primal-Dual Methods with Variable-Preconditioners
Authors: Authors: Long Chen, Ruchi Guo, Jingrong Wei
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2312.12355
Pdf link: https://arxiv.org/pdf/2312.12355
Abstract This paper introduces a novel Transformed Primal-Dual with variable-metric/preconditioner (TPDv) algorithm, designed to efficiently solve affine constrained optimization problems common in nonlinear partial differential equations (PDEs). Diverging from traditional methods, TPDv iteratively updates time-evolving preconditioning operators, enhancing adaptability. The algorithm is derived and analyzed, demonstrating global linear convergence rates under mild assumptions. Numerical experiments on challenging nonlinear PDEs, including the Darcy-Forchheimer model and a nonlinear electromagnetic problem, showcase the algorithm's superiority over existing methods in terms of iteration numbers and computational efficiency. The paper concludes with a comprehensive convergence analysis.
Chasing Fairness in Graphs: A GNN Architecture Perspective
Authors: Authors: Zhimeng Jiang, Xiaotian Han, Chao Fan, Zirui Liu, Na Zou, Ali Mostafavi, Xia Hu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2312.12369
Pdf link: https://arxiv.org/pdf/2312.12369
Abstract There has been significant progress in improving the performance of graph neural networks (GNNs) through enhancements in graph data, model architecture design, and training strategies. For fairness in graphs, recent studies achieve fair representations and predictions through either graph data pre-processing (e.g., node feature masking, and topology rewiring) or fair training strategies (e.g., regularization, adversarial debiasing, and fair contrastive learning). How to achieve fairness in graphs from the model architecture perspective is less explored. More importantly, GNNs exhibit worse fairness performance compared to multilayer perception since their model architecture (i.e., neighbor aggregation) amplifies biases. To this end, we aim to achieve fairness via a new GNN architecture. We propose \textsf{F}air \textsf{M}essage \textsf{P}assing (FMP) designed within a unified optimization framework for GNNs. Notably, FMP \textit{explicitly} renders sensitive attribute usage in \textit{forward propagation} for node classification task using cross-entropy loss without data pre-processing. In FMP, the aggregation is first adopted to utilize neighbors' information and then the bias mitigation step explicitly pushes demographic group node presentation centers together. In this way, FMP scheme can aggregate useful information from neighbors and mitigate bias to achieve better fairness and prediction tradeoff performance. Experiments on node classification tasks demonstrate that the proposed FMP outperforms several baselines in terms of fairness and accuracy on three real-world datasets. The code is available in {\url{https://github.com/zhimengj0326/FMP}}.
Prompting Hard or Hardly Prompting: Prompt Inversion for Text-to-Image Diffusion Models
Authors: Authors: Shweta Mahajan, Tanzila Rahman, Kwang Moo Yi, Leonid Sigal
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.12416
Pdf link: https://arxiv.org/pdf/2312.12416
Abstract The quality of the prompts provided to text-to-image diffusion models determines how faithful the generated content is to the user's intent, often requiring `prompt engineering'. To harness visual concepts from target images without prompt engineering, current approaches largely rely on embedding inversion by optimizing and then mapping them to pseudo-tokens. However, working with such high-dimensional vector representations is challenging because they lack semantics and interpretability, and only allow simple vector operations when using them. Instead, this work focuses on inverting the diffusion model to obtain interpretable language prompts directly. The challenge of doing this lies in the fact that the resulting optimization problem is fundamentally discrete and the space of prompts is exponentially large; this makes using standard optimization techniques, such as stochastic gradient descent, difficult. To this end, we utilize a delayed projection scheme to optimize for prompts representative of the vocabulary space in the model. Further, we leverage the findings that different timesteps of the diffusion process cater to different levels of detail in an image. The later, noisy, timesteps of the forward diffusion process correspond to the semantic information, and therefore, prompt inversion in this range provides tokens representative of the image semantics. We show that our approach can identify semantically interpretable and meaningful prompts for a target image which can be used to synthesize diverse images with similar content. We further illustrate the application of the optimized prompts in evolutionary image generation and concept removal.
Device Scheduling for Relay-assisted Over-the-Air Aggregation in Federated Learning
Authors: Authors: Fan Zhang, Jining Chen, Kunlun Wang, Wen Chen
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.12417
Pdf link: https://arxiv.org/pdf/2312.12417
Abstract Federated learning (FL) leverages data distributed at the edge of the network to enable intelligent applications. The efficiency of FL can be improved by using over-the-air computation (AirComp) technology in the process of gradient aggregation. In this paper, we propose a relay-assisted large-scale FL framework, and investigate the device scheduling problem in relay-assisted FL systems under the constraints of power consumption and mean squared error (MSE). we formulate a joint device scheduling, and power allocation problem to maximize the number of scheduled devices. We solve the resultant non-convex optimization problem by transforming the optimization problem into multiple sparse optimization problems. By the proposed device scheduling algorithm, these sparse sub-problems are solved and the maximum number of federated learning edge devices is obtained. The simulation results demonstrate the effectiveness of the proposed scheme as compared with other benchmark schemes.
Keyword: adam

Context Disentangling and Prototype Inheriting for Robust Visual Grounding
Authors: Authors: Wei Tang, Liang Li, Xuejing Liu, Lu Jin, Jinhui Tang, Zechao Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.11967
Pdf link: https://arxiv.org/pdf/2312.11967
Abstract Visual grounding (VG) aims to locate a specific target in an image based on a given language query. The discriminative information from context is important for distinguishing the target from other objects, particularly for the targets that have the same category as others. However, most previous methods underestimate such information. Moreover, they are usually designed for the standard scene (without any novel object), which limits their generalization to the open-vocabulary scene. In this paper, we propose a novel framework with context disentangling and prototype inheriting for robust visual grounding to handle both scenes. Specifically, the context disentangling disentangles the referent and context features, which achieves better discrimination between them. The prototype inheriting inherits the prototypes discovered from the disentangled visual features by a prototype bank to fully utilize the seen data, especially for the open-vocabulary scene. The fused features, obtained by leveraging Hadamard product on disentangled linguistic and visual features of prototypes to avoid sharp adjusting the importance between the two types of features, are then attached with a special token and feed to a vision Transformer encoder for bounding box regression. Extensive experiments are conducted on both standard and open-vocabulary scenes. The performance comparisons indicate that our method outperforms the state-of-the-art methods in both scenarios. {The code is available at https://github.com/WayneTomas/TransCP.
Keyword: gradient

Adaptive Smooth Activation for Improved Disease Diagnosis and Organ Segmentation from Radiology Scans
Authors: Authors: Koushik Biswas, Debesh Jha, Nikhil Kumar Tomar, Gorkem Durak, Alpay Medetalibeyoglu, Matthew Antalek, Yury Velichko, Daniela Ladner, Amir Bohrani, Ulas Bagci
Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2312.11480
Pdf link: https://arxiv.org/pdf/2312.11480
Abstract In this study, we propose a new activation function, called Adaptive Smooth Activation Unit (ASAU), tailored for optimized gradient propagation, thereby enhancing the proficiency of convolutional networks in medical image analysis. We apply this new activation function to two important and commonly used general tasks in medical image analysis: automatic disease diagnosis and organ segmentation in CT and MRI. Our rigorous evaluation on the RadImageNet abdominal/pelvis (CT and MRI) dataset and Liver Tumor Segmentation Benchmark (LiTS) 2017 demonstrates that our ASAU-integrated frameworks not only achieve a substantial (4.80\%) improvement over ReLU in classification accuracy (disease detection) on abdominal CT and MRI but also achieves 1\%-3\% improvement in dice coefficient compared to widely used activations for `healthy liver tissue' segmentation. These improvements offer new baselines for developing a diagnostic tool, particularly for complex, challenging pathologies. The superior performance and adaptability of ASAU highlight its potential for integration into a wide range of image classification and segmentation tasks.
Learning a Diffusion Model Policy from Rewards via Q-Score Matching
Authors: Authors: Michael Psenka, Alejandro Escontrela, Pieter Abbeel, Yi Ma
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2312.11752
Pdf link: https://arxiv.org/pdf/2312.11752
Abstract Diffusion models have become a popular choice for representing actor policies in behavior cloning and offline reinforcement learning. This is due to their natural ability to optimize an expressive class of distributions over a continuous space. However, previous works fail to exploit the score-based structure of diffusion models, and instead utilize a simple behavior cloning term to train the actor, limiting their ability in the actor-critic setting. In this paper, we focus on off-policy reinforcement learning and propose a new method for learning a diffusion model policy that exploits the linked structure between the score of the policy and the action gradient of the Q-function. We denote this method Q-score matching and provide theoretical justification for this approach. We conduct experiments in simulated environments to demonstrate the effectiveness of our proposed method and compare to popular baselines.
Root Cause Explanation of Outliers under Noisy Mechanisms
Authors: Authors: Phuoc Nguyen, Truyen Tran, Sunil Gupta, Thin Nguyen, Svetha Venkatesh
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2312.11818
Pdf link: https://arxiv.org/pdf/2312.11818
Abstract Identifying root causes of anomalies in causal processes is vital across disciplines. Once identified, one can isolate the root causes and implement necessary measures to restore the normal operation. Causal processes are often modelled as graphs with entities being nodes and their paths/interconnections as edge. Existing work only consider the contribution of nodes in the generative process, thus can not attribute the outlier score to the edges of the mechanism if the anomaly occurs in the connections. In this paper, we consider both individual edge and node of each mechanism when identifying the root causes. We introduce a noisy functional causal model to account for this purpose. Then, we employ Bayesian learning and inference methods to infer the noises of the nodes and edges. We then represent the functional form of a target outlier leaf as a function of the node and edge noises. Finally, we propose an efficient gradient-based attribution method to compute the anomaly attribution scores which scales linearly with the number of nodes and edges. Experiments on simulated datasets and two real-world scenario datasets show better anomaly attribution performance of the proposed method compared to the baselines. Our method scales to larger graphs with more nodes and edges.
Sparse is Enough in Fine-tuning Pre-trained Large Language Model
Authors: Authors: Weixi Song, Zuchao Li, Lefei Zhang, Hai Zhao, Bo Du
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2312.11875
Pdf link: https://arxiv.org/pdf/2312.11875
Abstract With the prevalence of pre-training-fine-tuning paradigm, how to efficiently adapt the pre-trained model to the downstream tasks has been an intriguing issue. Parameter-Efficient Fine-Tuning (PEFT) methods have been proposed for low-cost adaptation, including Adapters, Bia-only, and the recently widely used Low-Rank Adaptation. Although these methods have demonstrated their effectiveness to some extent and have been widely applied, the underlying principles are still unclear. In this paper, we reveal the transition of loss landscape in the downstream domain from random initialization to pre-trained initialization, that is, from low-amplitude oscillation to high-amplitude oscillation. The parameter gradients exhibit a property akin to sparsity, where a small fraction of components dominate the total gradient norm, for instance, 1% of the components account for 99% of the gradient. This property ensures that the pre-trained model can easily find a flat minimizer which guarantees the model's ability to generalize even with a low number of trainable parameters. Based on this, we propose a gradient-based sparse fine-tuning algorithm, named Sparse Increment Fine-Tuning (SIFT), and validate its effectiveness on a range of tasks including the GLUE Benchmark and Instruction-tuning. The code is accessible at https://github.com/song-wx/SIFT/.
Optimizing Diffusion Noise Can Serve As Universal Motion Priors
Authors: Authors: Korrawe Karunratanakul, Konpat Preechakul, Emre Aksan, Thabo Beeler, Supasorn Suwajanakorn, Siyu Tang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.11994
Pdf link: https://arxiv.org/pdf/2312.11994
Abstract We propose Diffusion Noise Optimization (DNO), a new method that effectively leverages existing motion diffusion models as motion priors for a wide range of motion-related tasks. Instead of training a task-specific diffusion model for each new task, DNO operates by optimizing the diffusion latent noise of an existing pre-trained text-to-motion model. Given the corresponding latent noise of a human motion, it propagates the gradient from the target criteria defined on the motion space through the whole denoising process to update the diffusion latent noise. As a result, DNO supports any use cases where criteria can be defined as a function of motion. In particular, we show that, for motion editing and control, DNO outperforms existing methods in both achieving the objective and preserving the motion content. DNO accommodates a diverse range of editing modes, including changing trajectory, pose, joint locations, or avoiding newly added obstacles. In addition, DNO is effective in motion denoising and completion, producing smooth and realistic motion from noisy and partial inputs. DNO achieves these results at inference time without the need for model retraining, offering great versatility for any defined reward or loss function on the motion representation.
Towards Accurate Guided Diffusion Sampling through Symplectic Adjoint Method
Authors: Authors: Jiachun Pan, Hanshu Yan, Jun Hao Liew, Jiashi Feng, Vincent Y. F. Tan
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2312.12030
Pdf link: https://arxiv.org/pdf/2312.12030
Abstract Training-free guided sampling in diffusion models leverages off-the-shelf pre-trained networks, such as an aesthetic evaluation model, to guide the generation process. Current training-free guided sampling algorithms obtain the guidance energy function based on a one-step estimate of the clean image. However, since the off-the-shelf pre-trained networks are trained on clean images, the one-step estimation procedure of the clean image may be inaccurate, especially in the early stages of the generation process in diffusion models. This causes the guidance in the early time steps to be inaccurate. To overcome this problem, we propose Symplectic Adjoint Guidance (SAG), which calculates the gradient guidance in two inner stages. Firstly, SAG estimates the clean image via $n$ function calls, where $n$ serves as a flexible hyperparameter that can be tailored to meet specific image quality requirements. Secondly, SAG uses the symplectic adjoint method to obtain the gradients accurately and efficiently in terms of the memory requirements. Extensive experiments demonstrate that SAG generates images with higher qualities compared to the baselines in both guided image and video generation tasks.
Extension of the Dip-test Repertoire -- Efficient and Differentiable p-value Calculation for Clustering
Authors: Authors: Lena G. M. Bauer, Collin Leiber, Christian Böhm, Claudia Plant
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2312.12050
Pdf link: https://arxiv.org/pdf/2312.12050
Abstract Over the last decade, the Dip-test of unimodality has gained increasing interest in the data mining community as it is a parameter-free statistical test that reliably rates the modality in one-dimensional samples. It returns a so called Dip-value and a corresponding probability for the sample's unimodality (Dip-p-value). These two values share a sigmoidal relationship. However, the specific transformation is dependent on the sample size. Many Dip-based clustering algorithms use bootstrapped look-up tables translating Dip- to Dip-p-values for a certain limited amount of sample sizes. We propose a specifically designed sigmoid function as a substitute for these state-of-the-art look-up tables. This accelerates computation and provides an approximation of the Dip- to Dip-p-value transformation for every single sample size. Further, it is differentiable and can therefore easily be integrated in learning schemes using gradient descent. We showcase this by exploiting our function in a novel subspace clustering algorithm called Dip'n'Sub. We highlight in extensive experiments the various benefits of our proposal.
Optimistic Policy Gradient in Multi-Player Markov Games with a Single Controller: Convergence Beyond the Minty Property
Authors: Authors: Ioannis Anagnostides, Ioannis Panageas, Gabriele Farina, Tuomas Sandholm
Subjects: Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.12067
Pdf link: https://arxiv.org/pdf/2312.12067
Abstract Policy gradient methods enjoy strong practical performance in numerous tasks in reinforcement learning. Their theoretical understanding in multiagent settings, however, remains limited, especially beyond two-player competitive and potential Markov games. In this paper, we develop a new framework to characterize optimistic policy gradient methods in multi-player Markov games with a single controller. Specifically, under the further assumption that the game exhibits an equilibrium collapse, in that the marginals of coarse correlated equilibria (CCE) induce Nash equilibria (NE), we show convergence to stationary $\epsilon$-NE in $O(1/\epsilon^2)$ iterations, where $O(\cdot)$ suppresses polynomial factors in the natural parameters of the game. Such an equilibrium collapse is well-known to manifest itself in two-player zero-sum Markov games, but also occurs even in a class of multi-player Markov games with separable interactions, as established by recent work. As a result, we bypass known complexity barriers for computing stationary NE when either of our assumptions fails. Our approach relies on a natural generalization of the classical Minty property that we introduce, which we anticipate to have further applications beyond Markov games.
ZS-SRT: An Efficient Zero-Shot Super-Resolution Training Method for Neural Radiance Fields
Authors: Authors: Xiang Feng, Yongbo He, Yubo Wang, Chengkai Wang, Zhenzhong Kuang, Jiajun Ding, Feiwei Qin, Jun Yu, Jianping Fan
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2312.12122
Pdf link: https://arxiv.org/pdf/2312.12122
Abstract Neural Radiance Fields (NeRF) have achieved great success in the task of synthesizing novel views that preserve the same resolution as the training views. However, it is challenging for NeRF to synthesize high-quality high-resolution novel views with low-resolution training data. To solve this problem, we propose a zero-shot super-resolution training framework for NeRF. This framework aims to guide the NeRF model to synthesize high-resolution novel views via single-scene internal learning rather than requiring any external high-resolution training data. Our approach consists of two stages. First, we learn a scene-specific degradation mapping by performing internal learning on a pretrained low-resolution coarse NeRF. Second, we optimize a super-resolution fine NeRF by conducting inverse rendering with our mapping function so as to backpropagate the gradients from low-resolution 2D space into the super-resolution 3D sampling space. Then, we further introduce a temporal ensemble strategy in the inference phase to compensate for the scene estimation errors. Our method is featured on two points: (1) it does not consume high-resolution views or additional scene data to train super-resolution NeRF; (2) it can speed up the training process by adopting a coarse-to-fine strategy. By conducting extensive experiments on public datasets, we have qualitatively and quantitatively demonstrated the effectiveness of our method.
OVD-Explorer:Optimism Should Not Be the Sole Pursuit of Exploration in Noisy Environments
Authors: Authors: Jinyi Liu, Zhi Wang, Yan Zheng, Jianye Hao, Chenjia Bai, Junjie Ye, Zhen Wang, Haiyin Piao, Yang Sun
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.12145
Pdf link: https://arxiv.org/pdf/2312.12145
Abstract In reinforcement learning, the optimism in the face of uncertainty (OFU) is a mainstream principle for directing exploration towards less explored areas, characterized by higher uncertainty. However, in the presence of environmental stochasticity (noise), purely optimistic exploration may lead to excessive probing of high-noise areas, consequently impeding exploration efficiency. Hence, in exploring noisy environments, while optimism-driven exploration serves as a foundation, prudent attention to alleviating unnecessary over-exploration in high-noise areas becomes beneficial. In this work, we propose Optimistic Value Distribution Explorer (OVD-Explorer) to achieve a noise-aware optimistic exploration for continuous control. OVD-Explorer proposes a new measurement of the policy's exploration ability considering noise in optimistic perspectives, and leverages gradient ascent to drive exploration. Practically, OVD-Explorer can be easily integrated with continuous control RL algorithms. Extensive evaluations on the MuJoCo and GridChaos tasks demonstrate the superiority of OVD-Explorer in achieving noise-aware optimistic exploration.
Quantitative convergence of a discretization of dynamic optimal transport using the dual formulation
Authors: Authors: Sadashige Ishida, Hugo Lavenant
Subjects: Numerical Analysis (math.NA); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2312.12213
Pdf link: https://arxiv.org/pdf/2312.12213
Abstract We present a discretization of the dynamic optimal transport problem for which we can obtain the convergence rate for the value of the transport cost to its continuous value when the temporal and spatial stepsize vanish. This convergence result does not require any regularity assumption on the measures, though experiments suggest that the rate is not sharp. Via an analysis of the duality gap we also obtain the convergence rates for the gradient of the optimal potentials and the velocity field under mild regularity assumptions. To obtain such rates we discretize the dual formulation of the dynamic optimal transport problem and use the mature literature related to the error due to discretizing the Hamilton-Jacobi equation.
On the Parameterization of Second-Order Optimization Effective Towards the Infinite Width
Authors: Authors: Satoki Ishikawa, Ryo Karakida
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.12226
Pdf link: https://arxiv.org/pdf/2312.12226
Abstract Second-order optimization has been developed to accelerate the training of deep neural networks and it is being applied to increasingly larger-scale models. In this study, towards training on further larger scales, we identify a specific parameterization for second-order optimization that promotes feature learning in a stable manner even if the network width increases significantly. Inspired by a maximal update parameterization, we consider a one-step update of the gradient and reveal the appropriate scales of hyperparameters including random initialization, learning rates, and damping terms. Our approach covers two major second-order optimization algorithms, K-FAC and Shampoo, and we demonstrate that our parameterization achieves higher generalization performance in feature learning. In particular, it enables us to transfer the hyperparameters across models with different widths.
Optimal Power Flow Pursuit via Feedback-based Safe Gradient Flow
Authors: Authors: Antonin Colot, Yiting Chen, Bertrand Cornelusse, Jorge Cortes, Emiliano Dall'Anese
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2312.12267
Pdf link: https://arxiv.org/pdf/2312.12267
Abstract This paper considers the problem of controlling the operation of inverter-interfaced distributed energy resources (DERs) in a distribution grid, to achieve operational and performance goals with limited system-level information. We develop an online feedback optimization method to drive the DERs' power setpoints to solutions of an AC optimal power flow (OPF) problem based only on voltage measurements (and without requiring measurements of the power consumption of non-controllable assets). The proposed method - grounded on the theory of control barrier functions - is based on a continuous approximation of the projected gradient flow, appropriately modified to accommodate measurements from the power network. We provide results in terms of local exponential stability, and assess the robustness to errors in the measurements and in the system Jacobian matrix. We show that the proposed method ensures anytime satisfaction of the voltage constraints when no model and measurement errors are present; if these errors are present and are small, the voltage violation is practically negligible. We also discuss extensions of the framework to virtual power plant setups. Numerical experiments on a 93-bus distribution system and with realistic load and production profiles show a superior performance in terms of voltage regulation relative to existing methods.
pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction
Authors: Authors: David Charatan, Sizhe Li, Andrea Tagliasacchi, Vincent Sitzmann
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.12337
Pdf link: https://arxiv.org/pdf/2312.12337
Abstract We introduce pixelSplat, a feed-forward model that learns to reconstruct 3D radiance fields parameterized by 3D Gaussian primitives from pairs of images. Our model features real-time and memory-efficient rendering for scalable training as well as fast 3D reconstruction at inference time. To overcome local minima inherent to sparse and locally supported representations, we predict a dense probability distribution over 3D and sample Gaussian means from that probability distribution. We make this sampling operation differentiable via a reparameterization trick, allowing us to back-propagate gradients through the Gaussian splatting representation. We benchmark our method on wide-baseline novel view synthesis on the real-world RealEstate10k and ACID datasets, where we outperform state-of-the-art light field transformers and accelerate rendering by 2.5 orders of magnitude while reconstructing an interpretable and editable 3D radiance field.
Prompting Hard or Hardly Prompting: Prompt Inversion for Text-to-Image Diffusion Models
Authors: Authors: Shweta Mahajan, Tanzila Rahman, Kwang Moo Yi, Leonid Sigal
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.12416
Pdf link: https://arxiv.org/pdf/2312.12416
Abstract The quality of the prompts provided to text-to-image diffusion models determines how faithful the generated content is to the user's intent, often requiring `prompt engineering'. To harness visual concepts from target images without prompt engineering, current approaches largely rely on embedding inversion by optimizing and then mapping them to pseudo-tokens. However, working with such high-dimensional vector representations is challenging because they lack semantics and interpretability, and only allow simple vector operations when using them. Instead, this work focuses on inverting the diffusion model to obtain interpretable language prompts directly. The challenge of doing this lies in the fact that the resulting optimization problem is fundamentally discrete and the space of prompts is exponentially large; this makes using standard optimization techniques, such as stochastic gradient descent, difficult. To this end, we utilize a delayed projection scheme to optimize for prompts representative of the vocabulary space in the model. Further, we leverage the findings that different timesteps of the diffusion process cater to different levels of detail in an image. The later, noisy, timesteps of the forward diffusion process correspond to the semantic information, and therefore, prompt inversion in this range provides tokens representative of the image semantics. We show that our approach can identify semantically interpretable and meaningful prompts for a target image which can be used to synthesize diverse images with similar content. We further illustrate the application of the optimized prompts in evolutionary image generation and concept removal.
Device Scheduling for Relay-assisted Over-the-Air Aggregation in Federated Learning
Authors: Authors: Fan Zhang, Jining Chen, Kunlun Wang, Wen Chen
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.12417
Pdf link: https://arxiv.org/pdf/2312.12417
Abstract Federated learning (FL) leverages data distributed at the edge of the network to enable intelligent applications. The efficiency of FL can be improved by using over-the-air computation (AirComp) technology in the process of gradient aggregation. In this paper, we propose a relay-assisted large-scale FL framework, and investigate the device scheduling problem in relay-assisted FL systems under the constraints of power consumption and mean squared error (MSE). we formulate a joint device scheduling, and power allocation problem to maximize the number of scheduled devices. We solve the resultant non-convex optimization problem by transforming the optimization problem into multiple sparse optimization problems. By the proposed device scheduling algorithm, these sparse sub-problems are solved and the maximum number of federated learning edge devices is obtained. The simulation results demonstrate the effectiveness of the proposed scheme as compared with other benchmark schemes.
Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model
Authors: Authors: Shraman Pramanick, Guangxing Han, Rui Hou, Sayan Nag, Ser-Nam Lim, Nicolas Ballas, Qifan Wang, Rama Chellappa, Amjad Almahairi
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2312.12423
Pdf link: https://arxiv.org/pdf/2312.12423
Abstract The ability of large language models (LLMs) to process visual inputs has given rise to general-purpose vision systems, unifying various vision-language (VL) tasks by instruction tuning. However, due to the enormous diversity in input-output formats in the vision domain, existing general-purpose models fail to successfully integrate segmentation and multi-image inputs with coarse-level tasks into a single framework. In this work, we introduce VistaLLM, a powerful visual system that addresses coarse- and fine-grained VL tasks over single and multiple input images using a unified framework. VistaLLM utilizes an instruction-guided image tokenizer that filters global embeddings using task descriptions to extract compressed and refined features from numerous images. Moreover, VistaLLM employs a gradient-aware adaptive sampling technique to represent binary segmentation masks as sequences, significantly improving over previously used uniform sampling. To bolster the desired capability of VistaLLM, we curate CoinIt, a comprehensive coarse-to-fine instruction tuning dataset with 6.8M samples. We also address the lack of multi-image grounding datasets by introducing a novel task, AttCoSeg (Attribute-level Co-Segmentation), which boosts the model's reasoning and grounding capability over multiple input images. Extensive experiments on a wide range of V- and VL tasks demonstrate the effectiveness of VistaLLM by achieving consistent state-of-the-art performance over strong baselines across all downstream tasks. Our project page can be found at https://shramanpramanick.github.io/VistaLLM/.
Efficient Title Reranker for Fast and Improved Knowledge-Intense NLP
Authors: Authors: Ziyi Chen, Heyi Tao, Daqian Zuo, Jize Jiang, Yang Jun, Yuxiang Wei
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.12430
Pdf link: https://arxiv.org/pdf/2312.12430
Abstract We introduce Efficient Title Reranker via Broadcasting Query Encoder, a novel title reranking technique to achieve efficient title reranking 20x-40x faster than vanilla passage reranker. However, one of the challenges with the training of Efficient Title Reranker is the instability. Analyzing the issue, we found some very difficult ground truths might act as noisy labels causing accuracy to drop as well as some extreme values in model probability output causing nan. To address these issues, we introduce the Sigmoid Trick, a novel technique that reduces the gradient update of both cases resulting in better retrieval efficacy. Experiments showed the effectiveness of ETR and sigmoid trick as we achieved four state-of-the-art positions on the kilt knowledge benchmark.
Keyword: super-resolution

FastSR-NeRF: Improving NeRF Efficiency on Consumer Devices with A Simple Super-Resolution Pipeline
Authors: Authors: Chien-Yu Lin, Qichen Fu, Thomas Merth, Karren Yang, Anurag Ranjan
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2312.11537
Pdf link: https://arxiv.org/pdf/2312.11537
Abstract Super-resolution (SR) techniques have recently been proposed to upscale the outputs of neural radiance fields (NeRF) and generate high-quality images with enhanced inference speeds. However, existing NeRF+SR methods increase training overhead by using extra input features, loss functions, and/or expensive training procedures such as knowledge distillation. In this paper, we aim to leverage SR for efficiency gains without costly training or architectural changes. Specifically, we build a simple NeRF+SR pipeline that directly combines existing modules, and we propose a lightweight augmentation technique, random patch sampling, for training. Compared to existing NeRF+SR methods, our pipeline mitigates the SR computing overhead and can be trained up to 23x faster, making it feasible to run on consumer devices such as the Apple MacBook. Experiments show our pipeline can upscale NeRF outputs by 2-4x while maintaining high quality, increasing inference speeds by up to 18x on an NVIDIA V100 GPU and 12.8x on an M1 Pro chip. We conclude that SR can be a simple but effective technique for improving the efficiency of NeRF models for consumer devices.
TIP: Text-Driven Image Processing with Semantic and Restoration Instructions
Authors: Authors: Chenyang Qi, Zhengzhong Tu, Keren Ye, Mauricio Delbracio, Peyman Milanfar, Qifeng Chen, Hossein Talebi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.11595
Pdf link: https://arxiv.org/pdf/2312.11595
Abstract Text-driven diffusion models have become increasingly popular for various image editing tasks, including inpainting, stylization, and object replacement. However, it still remains an open research problem to adopt this language-vision paradigm for more fine-level image processing tasks, such as denoising, super-resolution, deblurring, and compression artifact removal. In this paper, we develop TIP, a Text-driven Image Processing framework that leverages natural language as a user-friendly interface to control the image restoration process. We consider the capacity of text information in two dimensions. First, we use content-related prompts to enhance the semantic alignment, effectively alleviating identity ambiguity in the restoration outcomes. Second, our approach is the first framework that supports fine-level instruction through language-based quantitative specification of the restoration strength, without the need for explicit task-specific design. In addition, we introduce a novel fusion mechanism that augments the existing ControlNet architecture by learning to rescale the generative prior, thereby achieving better restoration fidelity. Our extensive experiments demonstrate the superior restoration performance of TIP compared to the state of the arts, alongside offering the flexibility of text-based control over the restoration effects.
ZS-SRT: An Efficient Zero-Shot Super-Resolution Training Method for Neural Radiance Fields
Authors: Authors: Xiang Feng, Yongbo He, Yubo Wang, Chengkai Wang, Zhenzhong Kuang, Jiajun Ding, Feiwei Qin, Jun Yu, Jianping Fan
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2312.12122
Pdf link: https://arxiv.org/pdf/2312.12122
Abstract Neural Radiance Fields (NeRF) have achieved great success in the task of synthesizing novel views that preserve the same resolution as the training views. However, it is challenging for NeRF to synthesize high-quality high-resolution novel views with low-resolution training data. To solve this problem, we propose a zero-shot super-resolution training framework for NeRF. This framework aims to guide the NeRF model to synthesize high-resolution novel views via single-scene internal learning rather than requiring any external high-resolution training data. Our approach consists of two stages. First, we learn a scene-specific degradation mapping by performing internal learning on a pretrained low-resolution coarse NeRF. Second, we optimize a super-resolution fine NeRF by conducting inverse rendering with our mapping function so as to backpropagate the gradients from low-resolution 2D space into the super-resolution 3D sampling space. Then, we further introduce a temporal ensemble strategy in the inference phase to compensate for the scene estimation errors. Our method is featured on two points: (1) it does not consume high-resolution views or additional scene data to train super-resolution NeRF; (2) it can speed up the training process by adopting a coarse-to-fine strategy. By conducting extensive experiments on public datasets, we have qualitatively and quantitatively demonstrated the effectiveness of our method.

zoq / arxiv-updates

New submissions for Wed, 20 Dec 23 #668

Keyword: sgd

Keyword: optimization

Low-Latency ML Inference by Grouping Correlated Data Objects and Computation

A Simulated Annealing-Based Multiobjective Optimization Algorithm for Minimum Weight Minimum Connected Dominating Set Problem

Improved Differentially Private and Lazy Online Convex Optimization

Customize-It-3D: High-Quality 3D Creation from A Single Image Using Subject-Specific Knowledge Prior

A Unified Pre-training and Adaptation Framework for Combinatorial Optimization on Graphs

Learning Interpretable Queries for Explainable Image Classification with Information Pursuit

PR-NeuS: A Prior-based Residual Learning Paradigm for Fast Multi-view Neural Surface Reconstruction

Two-Channel Extended Kalman Filtering with Intermittent Measurements

Experiment-informed finite-strain inverse design of spinodal metamaterials

ADMM-MM Algorithm for General Tensor Decomposition

Text-Image Conditioned Diffusion for Consistent Text-to-3D Generation

Faster Convergence with Multiway Preferences

SoC-Tuner: An Importance-guided Exploration Framework for DNN-targeting SoC Design

Provably Convergent Federated Trilevel Learning

Enhancing Social Decision-Making of Autonomous Vehicles: A Mixed-Strategy Game Approach With Interaction Orientation Identification

Adaptive Tracking and Perching for Quadrotor in Dynamic Scenarios

Difficulty-Focused Contrastive Learning for Knowledge Tracing with a Large Language Model-Based Difficulty Prediction

Stable Relay Learning Optimization Approach for Fast Power System Production Cost Minimization Simulation

Convergence Visualizer of Decentralized Federated Distillation with Reduced Communication Costs

Parameterized Decision-making with Multi-modal Perception for Autonomous Driving

Adversarial AutoMixup

Optimizing Diffusion Noise Can Serve As Universal Motion Priors

PPO-Clip Attains Global Optimality: Towards Deeper Understandings of Clipping

PICNN: A Pathway towards Interpretable Convolutional Neural Networks

Low Dispersion Optimized High-Order Schemes for Discretization of Non-Linear Straight and Mixed Second Derivative Terms

DLCA-Recon: Dynamic Loose Clothing Avatar Reconstruction from Monocular Videos

On the Parameterization of Second-Order Optimization Effective Towards the Infinite Width

HuTuMotion: Human-Tuned Navigation of Latent Motion Diffusion Models with Minimal Feedback

Protecting Massive MIMO-Radar Coexistence: Precoding Design and Power Control

Optimal Power Flow Pursuit via Feedback-based Safe Gradient Flow

Bypassing the Safety Training of Open-Source LLMs with Priming Attacks

Optimizing Local Satisfaction of Long-Run Average Objectives in Markov Decision Processes

Path Planning for Continuum Rods Using Bernstein Surfaces

Transformed Primal-Dual Methods with Variable-Preconditioners

Chasing Fairness in Graphs: A GNN Architecture Perspective

Prompting Hard or Hardly Prompting: Prompt Inversion for Text-to-Image Diffusion Models

Device Scheduling for Relay-assisted Over-the-Air Aggregation in Federated Learning

Keyword: adam

Context Disentangling and Prototype Inheriting for Robust Visual Grounding

Keyword: gradient

Adaptive Smooth Activation for Improved Disease Diagnosis and Organ Segmentation from Radiology Scans

Learning a Diffusion Model Policy from Rewards via Q-Score Matching

Root Cause Explanation of Outliers under Noisy Mechanisms

Sparse is Enough in Fine-tuning Pre-trained Large Language Model

Optimizing Diffusion Noise Can Serve As Universal Motion Priors

Towards Accurate Guided Diffusion Sampling through Symplectic Adjoint Method

Extension of the Dip-test Repertoire -- Efficient and Differentiable p-value Calculation for Clustering

Optimistic Policy Gradient in Multi-Player Markov Games with a Single Controller: Convergence Beyond the Minty Property

ZS-SRT: An Efficient Zero-Shot Super-Resolution Training Method for Neural Radiance Fields

OVD-Explorer:Optimism Should Not Be the Sole Pursuit of Exploration in Noisy Environments

Quantitative convergence of a discretization of dynamic optimal transport using the dual formulation

On the Parameterization of Second-Order Optimization Effective Towards the Infinite Width

Optimal Power Flow Pursuit via Feedback-based Safe Gradient Flow

pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction

Prompting Hard or Hardly Prompting: Prompt Inversion for Text-to-Image Diffusion Models

Device Scheduling for Relay-assisted Over-the-Air Aggregation in Federated Learning

Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model

Efficient Title Reranker for Fast and Improved Knowledge-Intense NLP

Keyword: super-resolution

FastSR-NeRF: Improving NeRF Efficiency on Consumer Devices with A Simple Super-Resolution Pipeline

TIP: Text-Driven Image Processing with Semantic and Restoration Instructions

ZS-SRT: An Efficient Zero-Shot Super-Resolution Training Method for Neural Radiance Fields