New submissions for Fri, 15 Dec 23

Keyword: sgd

Revisiting the Last-Iterate Convergence of Stochastic Gradient Methods

Authors: Authors: Zijian Liu, Zhengyuan Zhou
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2312.08531
Pdf link: https://arxiv.org/pdf/2312.08531
Abstract In the past several years, the convergence of the last iterate of the Stochastic Gradient Descent (SGD) algorithm has triggered people's interest due to its good performance in practice but lack of theoretical understanding. For Lipschitz and convex functions, different works have established the optimal $O(\log(1/\delta)\log T/\sqrt{T})$ or $O(\sqrt{\log(1/\delta)/T})$ high-probability convergence rates for the final iterate, where $T$ is the time horizon and $\delta$ is the failure probability. However, to prove these bounds, all the existing works are limited to compact domains or require almost surely bounded noises. It is natural to ask whether the last iterate of SGD can still guarantee the optimal convergence rate but without these two restrictive assumptions. Besides this important question, there are still lots of theoretical problems lacking an answer. For example, compared with the last iterate convergence of SGD for non-smooth problems, only few results for smooth optimization have yet been developed. Additionally, the existing results are all limited to a non-composite objective and the standard Euclidean norm. It still remains unclear whether the last-iterate convergence can be provably extended to wider composite optimization and non-Euclidean norms. In this work, to address the issues mentioned above, we revisit the last-iterate convergence of stochastic gradient methods and provide the first unified way to prove the convergence rates both in expectation and in high probability to accommodate general domains, composite objectives, non-Euclidean norms, Lipschitz conditions, smoothness and (strong) convexity simultaneously. Additionally, we extend our analysis to obtain the last-iterate convergence under heavy-tailed noises.
Contractive error feedback for gradient compression
Authors: Authors: Bingcong Li, Shuai Zheng, Parameswaran Raman, Anshumali Shrivastava, Georgios B. Giannakis
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2312.08538
Pdf link: https://arxiv.org/pdf/2312.08538
Abstract On-device memory concerns in distributed deep learning have become severe due to (i) the growth of model size in multi-GPU training, and (ii) the wide adoption of deep neural networks for federated learning on IoT devices which have limited storage. In such settings, communication efficient optimization methods are attractive alternatives, however they still struggle with memory issues. To tackle these challenges, we propose an communication efficient method called contractive error feedback (ConEF). As opposed to SGD with error-feedback (EFSGD) that inefficiently manages memory, ConEF obtains the sweet spot of convergence and memory usage, and achieves communication efficiency by leveraging biased and all-reducable gradient compression. We empirically validate ConEF on various learning tasks that include image classification, language modeling, and machine translation and observe that ConEF saves 80\% - 90\% of the extra memory in EFSGD with almost no loss on test performance, while also achieving 1.3x - 5x speedup of SGD. Through our work, we also demonstrate the feasibility and convergence of ConEF to clear up the theoretical barrier of integrating ConEF to popular memory efficient frameworks such as ZeRO-3.
Keyword: optimization

A global optimization SAR image segmentation model can be easily transformed to a general ROF denoising model
Authors: Authors: Guangming Liu, Qi Liu, Jing Liang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.08376
Pdf link: https://arxiv.org/pdf/2312.08376
Abstract In this paper, we propose a novel locally statistical active contour model (LACM) based on Aubert-Aujol (AA) denoising model and variational level set method, which can be used for SAR images segmentation with intensity inhomogeneity. Then we transform the proposed model into a global optimization model by using convex relaxation technique. Firstly, we apply the Split Bregman technique to transform the global optimization model into two alternating optimization processes of Shrink operator and Laplace operator, which is called SB_LACM model. Moreover, we propose two fast models to solve the global optimization model , which are more efficient than the SB_LACM model. The first model is: we add the proximal function to transform the global optimization model to a general ROF model[29], which can be solved by a fast denoising algorithm proposed by R.-Q.Jia, and H.Zhao; Thus we obtain a fast segmentation algorithm with global optimization solver that does not involve partial differential equations or difference equation, and only need simple difference computation. The second model is: we use a different splitting approach than one model to transform the global optimization model into a differentiable term and a general ROF model term, which can be solved by the same technique as the first model. Experiments using some challenging synthetic images and Envisat SAR images demonstrate the superiority of our proposed models with respect to the state-of-the-art models.
A Study on the Inductance and Thermal Regression and Optimization for Automatic Layout Design of Power Modules
Authors: Authors: Victor Parque, Aiki Nakamura, Tomoyuki Miyashita
Subjects: Neural and Evolutionary Computing (cs.NE); Computational Engineering, Finance, and Science (cs.CE); Performance (cs.PF); Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2312.08523
Pdf link: https://arxiv.org/pdf/2312.08523
Abstract Power modules with excellent inductance and temperature metrics are significant to meet the rising sophistication of energy demand in new technologies. In this paper, we use a surrogate-based approach to render optimal layouts of power modules with feasible and attractive inductance-temperature ratios at low computational budget. In particular, we use the class of feedforward networks to estimate the surrogate relationships between power module layout-design variables and inductance-temperature factors rendered from simulations; and Differential Evolution algorithms to optimize and locate feasible layout configurations of power module substrates minimizing inductance and temperature ratios. Our findings suggest the desirable classes of feedforward networks and gradient-free optimization algorithms being able to estimate and optimize power module layouts efficiently and effectively.
auto-sktime: Automated Time Series Forecasting
Authors: Authors: Marc-André Zöller, Marius Lindauer, Marco F. Huber
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.08528
Pdf link: https://arxiv.org/pdf/2312.08528
Abstract In today's data-driven landscape, time series forecasting is pivotal in decision-making across various sectors. Yet, the proliferation of more diverse time series data, coupled with the expanding landscape of available forecasting methods, poses significant challenges for forecasters. To meet the growing demand for efficient forecasting, we introduce auto-sktime, a novel framework for automated time series forecasting. The proposed framework uses the power of automated machine learning (AutoML) techniques to automate the creation of the entire forecasting pipeline. The framework employs Bayesian optimization, to automatically construct pipelines from statistical, machine learning (ML) and deep neural network (DNN) models. Furthermore, we propose three essential improvements to adapt AutoML to time series data: First, pipeline templates to account for the different supported forecasting models. Second, a novel warm-starting technique to start the optimization from prior optimization runs. Third, we adapt multi-fidelity optimizations to make them applicable to a search space containing statistical, ML and DNN models. Experimental results on 64 diverse real-world time series datasets demonstrate the effectiveness and efficiency of the framework, outperforming traditional methods while requiring minimal human involvement.
Revisiting the Last-Iterate Convergence of Stochastic Gradient Methods
Authors: Authors: Zijian Liu, Zhengyuan Zhou
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2312.08531
Pdf link: https://arxiv.org/pdf/2312.08531
Abstract In the past several years, the convergence of the last iterate of the Stochastic Gradient Descent (SGD) algorithm has triggered people's interest due to its good performance in practice but lack of theoretical understanding. For Lipschitz and convex functions, different works have established the optimal $O(\log(1/\delta)\log T/\sqrt{T})$ or $O(\sqrt{\log(1/\delta)/T})$ high-probability convergence rates for the final iterate, where $T$ is the time horizon and $\delta$ is the failure probability. However, to prove these bounds, all the existing works are limited to compact domains or require almost surely bounded noises. It is natural to ask whether the last iterate of SGD can still guarantee the optimal convergence rate but without these two restrictive assumptions. Besides this important question, there are still lots of theoretical problems lacking an answer. For example, compared with the last iterate convergence of SGD for non-smooth problems, only few results for smooth optimization have yet been developed. Additionally, the existing results are all limited to a non-composite objective and the standard Euclidean norm. It still remains unclear whether the last-iterate convergence can be provably extended to wider composite optimization and non-Euclidean norms. In this work, to address the issues mentioned above, we revisit the last-iterate convergence of stochastic gradient methods and provide the first unified way to prove the convergence rates both in expectation and in high probability to accommodate general domains, composite objectives, non-Euclidean norms, Lipschitz conditions, smoothness and (strong) convexity simultaneously. Additionally, we extend our analysis to obtain the last-iterate convergence under heavy-tailed noises.
Contractive error feedback for gradient compression
Authors: Authors: Bingcong Li, Shuai Zheng, Parameswaran Raman, Anshumali Shrivastava, Georgios B. Giannakis
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2312.08538
Pdf link: https://arxiv.org/pdf/2312.08538
Abstract On-device memory concerns in distributed deep learning have become severe due to (i) the growth of model size in multi-GPU training, and (ii) the wide adoption of deep neural networks for federated learning on IoT devices which have limited storage. In such settings, communication efficient optimization methods are attractive alternatives, however they still struggle with memory issues. To tackle these challenges, we propose an communication efficient method called contractive error feedback (ConEF). As opposed to SGD with error-feedback (EFSGD) that inefficiently manages memory, ConEF obtains the sweet spot of convergence and memory usage, and achieves communication efficiency by leveraging biased and all-reducable gradient compression. We empirically validate ConEF on various learning tasks that include image classification, language modeling, and machine translation and observe that ConEF saves 80\% - 90\% of the extra memory in EFSGD with almost no loss on test performance, while also achieving 1.3x - 5x speedup of SGD. Through our work, we also demonstrate the feasibility and convergence of ConEF to clear up the theoretical barrier of integrating ConEF to popular memory efficient frameworks such as ZeRO-3.
On Searching for Minimal Integer Representation of Undirected Graphs
Authors: Authors: Victor Parque, Tomoyuki Miyashita
Subjects: Discrete Mathematics (cs.DM); Neural and Evolutionary Computing (cs.NE); Combinatorics (math.CO)
Arxiv link: https://arxiv.org/abs/2312.08539
Pdf link: https://arxiv.org/pdf/2312.08539
Abstract Minimal and efficient graph representations are key to store, communicate, and sample the search space of graphs and networks while meeting user-defined criteria. In this paper, we investigate the feasibility of gradient-free optimization heuristics based on Differential Evolution to search for minimal integer representations of undirected graphs. The class of Differential Evolution algorithms are population-based gradient-free optimization heuristics having found a relevant attention in the nonconvex and nonlinear optimization communities. Our computational experiments using eight classes of Differential Evolution schemes and graph instances with varying degrees of sparsity have shown the merit of attaining minimal numbers for graph encoding/representation rendered by exploration-oriented strategies within few function evaluations. Our results have the potential to elucidate new number-based encoding and sample-based algorithms for graph representation, network design and optimization.
Efficient-NeRF2NeRF: Streamlining Text-Driven 3D Editing with Multiview Correspondence-Enhanced Diffusion Models
Authors: Authors: Liangchen Song, Liangliang Cao, Jiatao Gu, Yifan Jiang, Junsong Yuan, Hao Tang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.08563
Pdf link: https://arxiv.org/pdf/2312.08563
Abstract The advancement of text-driven 3D content editing has been blessed by the progress from 2D generative diffusion models. However, a major obstacle hindering the widespread adoption of 3D content editing is its time-intensive processing. This challenge arises from the iterative and refining steps required to achieve consistent 3D outputs from 2D image-based generative models. Recent state-of-the-art methods typically require optimization time ranging from tens of minutes to several hours to edit a 3D scene using a single GPU. In this work, we propose that by incorporating correspondence regularization into diffusion models, the process of 3D editing can be significantly accelerated. This approach is inspired by the notion that the estimated samples during diffusion should be multiview-consistent during the diffusion generation process. By leveraging this multiview consistency, we can edit 3D content at a much faster speed. In most scenarios, our proposed technique brings a 10$\times$ speed-up compared to the baseline method and completes the editing of a 3D scene in 2 minutes with comparable quality.
NViST: In the Wild New View Synthesis from a Single Image with Transformers
Authors: Authors: Wonbong Jang, Lourdes Agapito
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.08568
Pdf link: https://arxiv.org/pdf/2312.08568
Abstract We propose NViST, a transformer-based model for novel-view synthesis from a single image, trained on a large-scale dataset of in-the-wild images with complex backgrounds. NViST transforms image inputs directly into a radiance field, adopting a scalable transformer-based architecture. In practice, NViST exploits the self-supervised features learnt by a masked autoencoder (MAE), and learns a novel decoder that translates features to 3D tokens via cross-attention and adaptive layer normalization. Our model is efficient at inference since only a single forward-pass is needed to predict a 3D representation, unlike methods that require test-time optimization or sampling such as 3D-aware diffusion models. We tackle further limitations of current new-view synthesis models. First, unlike most generative models that are trained in a category-specific manner, often on synthetic datasets or on masked inputs, our model is trained on MVImgNet, a large-scale dataset of real-world, casually-captured videos containing hundreds of object categories with diverse backgrounds. Secondly, our model does not require canonicalization of the training data - i.e. aligning all objects with a frontal view - only needing relative pose at training time which removes a substantial barrier to it being used on casually captured datasets. We show results on unseen objects and categories on MVImgNet and even casual phone captures. We conduct qualitative and quantitative evaluations on MVImgNet and ShapeNet to show that our model represents a step forward towards enabling true in-the-wild novel-view synthesis from a single image.
Federated Learning for Wireless Applications: A Prototype
Authors: Authors: Varun Laxman Muttepawar, Arjun Mehra, Zubair Shaban, Ranjitha Prasad, Harshan Jagadeesh
Subjects: Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2312.08577
Pdf link: https://arxiv.org/pdf/2312.08577
Abstract Wireless embedded edge devices are ubiquitous in our daily lives, enabling them to gather immense data via onboard sensors and mobile applications. This offers an amazing opportunity to train machine learning (ML) models in the realm of wireless devices for decision-making. Training ML models in a wireless setting necessitates transmitting datasets collected at the edge to a cloud parameter server, which is infeasible due to bandwidth constraints, security, and privacy issues. To tackle these challenges, Federated Learning (FL) has emerged as a distributed optimization approach to the decentralization of the model training process. In this work, we present a novel prototype to examine FL's effectiveness over bandwidth-constrained wireless channels. Through a novel design consisting of Zigbee and NI USRP devices, we propose a configuration that allows clients to broadcast synergistically local ML model updates to a central server to obtain a generalized global model. We assess the efficacy of this prototype using metrics such as global model accuracy and time complexity under varying conditions of transmission power, data heterogeneity and local learning.
Omega-Regular Decision Processes
Authors: Authors: Ernst Moritz Hahn, Mateo Perez, Sven Schewe, Fabio Somenzi, Ashutosh Trivedi, Dominik Wojtczak
Subjects: Logic in Computer Science (cs.LO); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.08602
Pdf link: https://arxiv.org/pdf/2312.08602
Abstract Regular decision processes (RDPs) are a subclass of non-Markovian decision processes where the transition and reward functions are guarded by some regular property of the past (a lookback). While RDPs enable intuitive and succinct representation of non-Markovian decision processes, their expressive power coincides with finite-state Markov decision processes (MDPs). We introduce omega-regular decision processes (ODPs) where the non-Markovian aspect of the transition and reward functions are extended to an omega-regular lookahead over the system evolution. Semantically, these lookaheads can be considered as promises made by the decision maker or the learning agent about her future behavior. In particular, we assume that, if the promised lookaheads are not met, then the payoff to the decision maker is $\bot$ (least desirable payoff), overriding any rewards collected by the decision maker. We enable optimization and learning for ODPs under the discounted-reward objective by reducing them to lexicographic optimization and learning over finite MDPs. We present experimental results demonstrating the effectiveness of the proposed reduction.
Verification of Neural Reachable Tubes via Scenario Optimization and Conformal Prediction
Authors: Authors: Albert Lin, Somil Bansal
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2312.08604
Pdf link: https://arxiv.org/pdf/2312.08604
Abstract Learning-based approaches for controlling safety-critical systems are rapidly growing in popularity; thus, it is important to assure their performance and safety. Hamilton-Jacobi (HJ) reachability analysis is a popular formal verification tool for providing such guarantees, since it can handle general nonlinear system dynamics, bounded adversarial system disturbances, and state and input constraints. However, its computational and memory complexity scales exponentially with the state dimension, making it intractable for large-scale systems. To overcome this challenge, neural approaches, such as DeepReach, have been used to synthesize reachable tubes and safety controllers for high-dimensional systems. However, verifying these neural reachable tubes remains challenging. In this work, we propose two verification methods, based on robust scenario optimization and conformal prediction, to provide probabilistic safety guarantees for neural reachable tubes. Our methods allow a direct trade-off between resilience to outlier errors in the neural tube, which are inevitable in a learning-based approach, and the strength of the probabilistic safety guarantee. Furthermore, we show that split conformal prediction, a widely used method in the machine learning community for uncertainty quantification, reduces to a scenario-based approach, making the two methods equivalent not only for verification of neural reachable tubes but also more generally. To our knowledge, our proof is the first in the literature to show a strong relationship between conformal prediction and scenario optimization. Finally, we propose an outlier-adjusted verification approach that uses the error distribution in neural reachable tubes to recover greater safe volumes. We demonstrate the efficacy of the proposed approaches for the high-dimensional problems of multi-vehicle collision avoidance and rocket landing with no-go zones.
MaxK-GNN: Towards Theoretical Speed Limits for Accelerating Graph Neural Networks Training
Authors: Authors: Hongwu Peng, Xi Xie, Kaustubh Shivdikar, MD Amit Hasan, Jiahui Zhao, Shaoyi Huang, Omer Khan, David Kaeli, Caiwen Ding
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2312.08656
Pdf link: https://arxiv.org/pdf/2312.08656
Abstract In the acceleration of deep neural network training, the GPU has become the mainstream platform. GPUs face substantial challenges on GNNs, such as workload imbalance and memory access irregularities, leading to underutilized hardware. Existing solutions such as PyG, DGL with cuSPARSE, and GNNAdvisor frameworks partially address these challenges but memory traffic is still significant. We argue that drastic performance improvements can only be achieved by the vertical optimization of algorithm and system innovations, rather than treating the speedup optimization as an "after-thought" (i.e., (i) given a GNN algorithm, designing an accelerator, or (ii) given hardware, mainly optimizing the GNN algorithm). In this paper, we present MaxK-GNN, an advanced high-performance GPU training system integrating algorithm and system innovation. (i) We introduce the MaxK nonlinearity and provide a theoretical analysis of MaxK nonlinearity as a universal approximator, and present the Compressed Balanced Sparse Row (CBSR) format, designed to store the data and index of the feature matrix after nonlinearity; (ii) We design a coalescing enhanced forward computation with row-wise product-based SpGEMM Kernel using CBSR for input feature matrix fetching and strategic placement of a sparse output accumulation buffer in shared memory; (iii) We develop an optimized backward computation with outer product-based and SSpMM Kernel. We conduct extensive evaluations of MaxK-GNN and report the end-to-end system run-time. Experiments show that MaxK-GNN system could approach the theoretical speedup limit according to Amdahl's law. We achieve comparable accuracy to SOTA GNNs, but at a significantly increased speed: 3.22/4.24 times speedup (vs. theoretical limits, 5.52/7.27 times) on Reddit compared to DGL and GNNAdvisor implementations.
Incomplete Contrastive Multi-View Clustering with High-Confidence Guiding
Authors: Authors: Guoqing Chao, Yi Jiang, Dianhui Chu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.08697
Pdf link: https://arxiv.org/pdf/2312.08697
Abstract Incomplete multi-view clustering becomes an important research problem, since multi-view data with missing values are ubiquitous in real-world applications. Although great efforts have been made for incomplete multi-view clustering, there are still some challenges: 1) most existing methods didn't make full use of multi-view information to deal with missing values; 2) most methods just employ the consistent information within multi-view data but ignore the complementary information; 3) For the existing incomplete multi-view clustering methods, incomplete multi-view representation learning and clustering are treated as independent processes, which leads to performance gap. In this work, we proposed a novel Incomplete Contrastive Multi-View Clustering method with high-confidence guiding (ICMVC). Firstly, we proposed a multi-view consistency relation transfer plus graph convolutional network to tackle missing values problem. Secondly, instance-level attention fusion and high-confidence guiding are proposed to exploit the complementary information while instance-level contrastive learning for latent representation is designed to employ the consistent information. Thirdly, an end-to-end framework is proposed to integrate multi-view missing values handling, multi-view representation learning and clustering assignment for joint optimization. Experiments compared with state-of-the-art approaches demonstrated the effectiveness and superiority of our method. Our code is publicly available at https://github.com/liunian-Jay/ICMVC.
Gradient Informed Proximal Policy Optimization
Authors: Authors: Sanghyun Son, Laura Yu Zheng, Ryan Sullivan, Yi-Ling Qiao, Ming C. Lin
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2312.08710
Pdf link: https://arxiv.org/pdf/2312.08710
Abstract We introduce a novel policy learning method that integrates analytical gradients from differentiable environments with the Proximal Policy Optimization (PPO) algorithm. To incorporate analytical gradients into the PPO framework, we introduce the concept of an {\alpha}-policy that stands as a locally superior policy. By adaptively modifying the {\alpha} value, we can effectively manage the influence of analytical policy gradients during learning. To this end, we suggest metrics for assessing the variance and bias of analytical gradients, reducing dependence on these gradients when high variance or bias is detected. Our proposed approach outperforms baseline algorithms in various scenarios, such as function optimization, physics simulations, and traffic control environments. Our code can be found online: https://github.com/SonSang/gippo.
Aerial STAR-RIS Empowered MEC: A DRL Approach for Energy Minimization
Authors: Authors: Pyae Sone Aung, Loc X. Nguyen, Yan Kyaw Tun, Zhu Han, Choong Seon Hong
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2312.08714
Pdf link: https://arxiv.org/pdf/2312.08714
Abstract Multi-access Edge Computing (MEC) addresses computational and battery limitations in devices by allowing them to offload computation tasks. To overcome the difficulties in establishing line-of-sight connections, integrating unmanned aerial vehicles (UAVs) has proven beneficial, offering enhanced data exchange, rapid deployment, and mobility. The utilization of reconfigurable intelligent surfaces (RIS), specifically simultaneously transmitting and reflecting RIS (STAR-RIS) technology, further extends coverage capabilities and introduces flexibility in MEC. This study explores the integration of UAV and STAR-RIS to facilitate communication between IoT devices and an MEC server. The formulated problem aims to minimize energy consumption for IoT devices and aerial STAR-RIS by jointly optimizing task offloading, aerial STAR-RIS trajectory, amplitude and phase shift coefficients, and transmit power. Given the non-convexity of the problem and the dynamic environment, solving it directly within a polynomial time frame is challenging. Therefore, deep reinforcement learning (DRL), particularly proximal policy optimization (PPO), is introduced for its sample efficiency and stability. Simulation results illustrate the effectiveness of the proposed system compared to benchmark schemes in the literature.
FAPP: Fast and Adaptive Perception and Planning for UAVs in Dynamic Cluttered Environments
Authors: Authors: Minghao Lu, Xiyu Fan, Han Chen, Peng Lu
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2312.08743
Pdf link: https://arxiv.org/pdf/2312.08743
Abstract Obstacle avoidance for Unmanned Aerial Vehicles (UAVs) in cluttered environments is significantly challenging. Existing obstacle avoidance for UAVs either focuses on fully static environments or static environments with only a few dynamic objects. In this paper, we take the initiative to consider the obstacle avoidance of UAVs in dynamic cluttered environments in which dynamic objects are the dominant objects. This type of environment poses significant challenges to both perception and planning. Multiple dynamic objects possess various motions, making it extremely difficult to estimate and predict their motions using one motion model. The planning must be highly efficient to avoid cluttered dynamic objects. This paper proposes Fast and Adaptive Perception and Planning (FAPP) for UAVs flying in complex dynamic cluttered environments. A novel and efficient point cloud segmentation strategy is proposed to distinguish static and dynamic objects. To address multiple dynamic objects with different motions, an adaptive estimation method with covariance adaptation is proposed to quickly and accurately predict their motions. Our proposed trajectory optimization algorithm is highly efficient, enabling it to avoid fast objects. Furthermore, an adaptive re-planning method is proposed to address the case when the trajectory optimization cannot find a feasible solution, which is common for dynamic cluttered environments. Extensive validations in both simulation and real-world experiments demonstrate the effectiveness of our proposed system for highly dynamic and cluttered environments.
All-to-all reconfigurability with sparse Ising machines: the XORSAT challenge with p-bits
Authors: Authors: Navid Anjum Aadit, Srijan Nikhar, Sidharth Kannan, Shuvro Chowdhury, Kerem Y. Camsari
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Emerging Technologies (cs.ET); Neural and Evolutionary Computing (cs.NE); Quantum Physics (quant-ph)
Arxiv link: https://arxiv.org/abs/2312.08748
Pdf link: https://arxiv.org/pdf/2312.08748
Abstract Domain-specific hardware to solve computationally hard optimization problems has generated tremendous excitement recently. Here, we evaluate probabilistic bit (p-bit) based Ising Machines (IM), or p-computers with a benchmark combinatorial optimization problem, namely the 3-regular 3-XOR Satisfiability (3R3X). The 3R3X problem has a glassy energy landscape and it has recently been used to benchmark various IMs and other solvers. We introduce a multiplexed architecture where p-computers emulate all-to-all (complete) graph functionality despite being interconnected in highly sparse networks, enabling highly parallelized Gibbs sampling. We implement this architecture in FPGAs and show that p-bit networks running an adaptive version of the powerful parallel tempering algorithm demonstrate competitive algorithmic and prefactor advantages over alternative IMs by D-Wave, Toshiba and others. Scaled magnetic nanodevice-based realizations of p-computers could lead to orders-of-magnitude further improvement according to experimentally established projections.
RDARS Empowered Massive MIMO System: Two-Timescale Transceiver Design with Imperfect CSI
Authors: Authors: Chengzhi Ma, Jintao Wang, Xi Yang, Guanghua Yang, Wei Zhang, Shaodan Ma
Subjects: Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2312.08753
Pdf link: https://arxiv.org/pdf/2312.08753
Abstract In this paper, we investigate a novel reconfigurable distributed antennas and reflecting surface (RDARS) aided multi-user massive MIMO system with imperfect CSI and propose a practical two-timescale (TTS) transceiver design to reduce the communication overhead and computational complexity of the system. In the RDARS-aided system, not only distribution gain but also reflection gain can be obtained by a flexible combination of the distributed antennas and reflecting surface, which differentiates the system from the others and also makes the TTS design challenging. To enable the optimal TTS transceiver design, the achievable rate of the system is first derived in closed-form. Then the TTS design aiming at the weighted sum rate maximization is considered. To solve the challenging non-convex optimization problem with high-order design variables, i.e., the transmit powers and the phase shifts at the RDARS, a block coordinate descent based method is proposed to find the optimal solutions in semi-closed forms iteratively. Specifically, two efficient algorithms are proposed with provable convergence for the optimal phase shift design, i.e., Riemannian Gradient Ascent based algorithm by exploiting the unit-modulus constraints, and Two-Tier Majorization-Minimization based algorithm with closed-form optimal solutions in each iteration. Simulation results validate the effectiveness of the proposed algorithm and demonstrate the superiority of deploying RDARS in massive MIMO systems to provide substantial rate improvement with a significantly reduced total number of active antennas/RF chains and lower transmit power when compared to the DAS and RIS-aided systems.
CF-NeRF: Camera Parameter Free Neural Radiance Fields with Incremental Learning
Authors: Authors: Qingsong Yan, Qiang Wang, Kaiyong Zhao, Jie Chen, Bo Li, Xiaowen Chu, Fei Deng
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.08760
Pdf link: https://arxiv.org/pdf/2312.08760
Abstract Neural Radiance Fields (NeRF) have demonstrated impressive performance in novel view synthesis. However, NeRF and most of its variants still rely on traditional complex pipelines to provide extrinsic and intrinsic camera parameters, such as COLMAP. Recent works, like NeRFmm, BARF, and L2G-NeRF, directly treat camera parameters as learnable and estimate them through differential volume rendering. However, these methods work for forward-looking scenes with slight motions and fail to tackle the rotation scenario in practice. To overcome this limitation, we propose a novel \underline{c}amera parameter \underline{f}ree neural radiance field (CF-NeRF), which incrementally reconstructs 3D representations and recovers the camera parameters inspired by incremental structure from motion (SfM). Given a sequence of images, CF-NeRF estimates the camera parameters of images one by one and reconstructs the scene through initialization, implicit localization, and implicit optimization. To evaluate our method, we use a challenging real-world dataset NeRFBuster which provides 12 scenes under complex trajectories. Results demonstrate that CF-NeRF is robust to camera rotation and achieves state-of-the-art results without providing prior information and constraints.
Implement services for business scenarios by combining basic emulators
Authors: Authors: Lei Zhao, Miaomiao Zhang
Subjects: Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2312.08815
Pdf link: https://arxiv.org/pdf/2312.08815
Abstract This article mainly introduces how to use various basic emulators to form a combined emulator in the Jiutian Intelligence Network Simulation Platform to realize simulation service functions in different business scenarios. Among them, the combined emulator is included. The business scenarios include different practical applications such as multi-objective antenna optimization, high traffic of business, CSI (channel state information) compression feedback, etc.
Semantics-Division Duplexing: A Novel Full-Duplex Paradigm
Authors: Authors: Kai Niu, Zijian Liang, Chao Dong, Jincheng Dai, Zhongwei Si, Ping Zhang
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2312.08862
Pdf link: https://arxiv.org/pdf/2312.08862
Abstract In-band full-duplex (IBFD) is a theoretically effective solution to increase the overall throughput for the future wireless communications system by enabling transmission and reception over the same time-frequency resources. However, reliable source reconstruction remains a great challenge in the practical IBFD systems due to the non-ideal elimination of the self-interference and the inherent limitations of the separate source and channel coding methods. On the other hand, artificial intelligence-enabled semantic communication can provide a viable direction for the optimization of the IBFD system. This article introduces a novel IBFD paradigm with the guidance of semantic communication called semantics-division duplexing (SDD). It utilizes semantic domain processing to further suppress self-interference, distinguish the expected semantic information, and recover the desired sources. Further integration of the digital and semantic domain processing can be implemented so as to achieve intelligent and concise communications. We present the advantages of the SDD paradigm with theoretical explanations and provide some visualized results to verify its effectiveness.
What, How, and When Should Object Detectors Update in Continually Changing Test Domains?
Authors: Authors: Jayeon Yoo, Dongkwan Lee, Inseop Chung, Donghyun Kim, Nojun Kwak
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.08875
Pdf link: https://arxiv.org/pdf/2312.08875
Abstract It is a well-known fact that the performance of deep learning models deteriorates when they encounter a distribution shift at test time. Test-time adaptation (TTA) algorithms have been proposed to adapt the model online while inferring test data. However, existing research predominantly focuses on classification tasks through the optimization of batch normalization layers or classification heads, but this approach limits its applicability to various model architectures like Transformers and makes it challenging to apply to other tasks, such as object detection. In this paper, we propose a novel online adaption approach for object detection in continually changing test domains, considering which part of the model to update, how to update it, and when to perform the update. By introducing architecture-agnostic and lightweight adaptor modules and only updating these while leaving the pre-trained backbone unchanged, we can rapidly adapt to new test domains in an efficient way and prevent catastrophic forgetting. Furthermore, we present a practical and straightforward class-wise feature aligning method for object detection to resolve domain shifts. Additionally, we enhance efficiency by determining when the model is sufficiently adapted or when additional adaptation is needed due to changes in the test distribution. Our approach surpasses baselines on widely used benchmarks, achieving improvements of up to 4.9\%p and 7.9\%p in mAP for COCO $\rightarrow$ COCO-corrupted and SHIFT, respectively, while maintaining about 20 FPS or higher.
May the Noise be with you: Adversarial Training without Adversarial Examples
Authors: Authors: Ayoub Arous, Andres F Lopez-Lopera, Nael Abu-Ghazaleh, Ihsen Alouani
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.08877
Pdf link: https://arxiv.org/pdf/2312.08877
Abstract In this paper, we investigate the following question: Can we obtain adversarially-trained models without training on adversarial examples? Our intuition is that training a model with inherent stochasticity, i.e., optimizing the parameters by minimizing a stochastic loss function, yields a robust expectation function that is non-stochastic. In contrast to related methods that introduce noise at the input level, our proposed approach incorporates inherent stochasticity by embedding Gaussian noise within the layers of the NN model at training time. We model the propagation of noise through the layers, introducing a closed-form stochastic loss function that encapsulates a noise variance parameter. Additionally, we contribute a formalized noise-aware gradient, enabling the optimization of model parameters while accounting for stochasticity. Our experimental results confirm that the expectation model of a stochastic architecture trained on benign distribution is adversarially robust. Interestingly, we find that the impact of the applied Gaussian noise's standard deviation on both robustness and baseline accuracy closely mirrors the impact of the noise magnitude employed in adversarial training. Our work contributes adversarially trained networks using a completely different approach, with empirically similar robustness to adversarial training.
Neural Video Fields Editing
Authors: Authors: Shuzhou Yang, Chong Mou, Jiwen Yu, Yuhan Wang, Xiandong Meng, Jian Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.08882
Pdf link: https://arxiv.org/pdf/2312.08882
Abstract Diffusion models have revolutionized text-driven video editing. However, applying these methods to real-world editing encounters two significant challenges: (1) the rapid increase in graphics memory demand as the number of frames grows, and (2) the inter-frame inconsistency in edited videos. To this end, we propose NVEdit, a novel text-driven video editing framework designed to mitigate memory overhead and improve consistent editing for real-world long videos. Specifically, we construct a neural video field, powered by tri-plane and sparse grid, to enable encoding long videos with hundreds of frames in a memory-efficient manner. Next, we update the video field through off-the-shelf Text-to-Image (T2I) models to impart text-driven editing effects. A progressive optimization strategy is developed to preserve original temporal priors. Importantly, both the neural video field and T2I model are adaptable and replaceable, thus inspiring future research. Experiments demonstrate that our approach successfully edits hundreds of frames with impressive inter-frame consistency.
SceneWiz3D: Towards Text-guided 3D Scene Composition
Authors: Authors: Qihang Zhang, Chaoyang Wang, Aliaksandr Siarohin, Peiye Zhuang, Yinghao Xu, Ceyuan Yang, Dahua Lin, Bolei Zhou, Sergey Tulyakov, Hsin-Ying Lee
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.08885
Pdf link: https://arxiv.org/pdf/2312.08885
Abstract We are witnessing significant breakthroughs in the technology for generating 3D objects from text. Existing approaches either leverage large text-to-image models to optimize a 3D representation or train 3D generators on object-centric datasets. Generating entire scenes, however, remains very challenging as a scene contains multiple 3D objects, diverse and scattered. In this work, we introduce SceneWiz3D, a novel approach to synthesize high-fidelity 3D scenes from text. We marry the locality of objects with globality of scenes by introducing a hybrid 3D representation: explicit for objects and implicit for scenes. Remarkably, an object, being represented explicitly, can be either generated from text using conventional text-to-3D approaches, or provided by users. To configure the layout of the scene and automatically place objects, we apply the Particle Swarm Optimization technique during the optimization process. Furthermore, it is difficult for certain parts of the scene (e.g., corners, occlusion) to receive multi-view supervision, leading to inferior geometry. We incorporate an RGBD panorama diffusion model to mitigate it, resulting in high-quality geometry. Extensive evaluation supports that our approach achieves superior quality over previous approaches, enabling the generation of detailed and view-consistent 3D scenes.
Solving Dense Linear Systems Faster than via Preconditioning
Authors: Authors: Michał Dereziński, Jiaming Yang
Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Numerical Analysis (math.NA); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2312.08893
Pdf link: https://arxiv.org/pdf/2312.08893
Abstract We give a stochastic optimization algorithm that solves a dense $n\times n$ real-valued linear system $Ax=b$, returning $\tilde x$ such that $|A\tilde x-b|\leq \epsilon|b|$ in time: $$\tilde O((n^2+nk^{\omega-1})\log1/\epsilon),$$ where $k$ is the number of singular values of $A$ larger than $O(1)$ times its smallest positive singular value, $\omega < 2.372$ is the matrix multiplication exponent, and $\tilde O$ hides a poly-logarithmic in $n$ factor. When $k=O(n^{1-\theta})$ (namely, $A$ has a flat-tailed spectrum, e.g., due to noisy data or regularization), this improves on both the cost of solving the system directly, as well as on the cost of preconditioning an iterative method such as conjugate gradient. In particular, our algorithm has an $\tilde O(n^2)$ runtime when $k=O(n^{0.729})$. We further adapt this result to sparse positive semidefinite matrices and least squares regression. Our main algorithm can be viewed as a randomized block coordinate descent method, where the key challenge is simultaneously ensuring good convergence and fast per-iteration time. In our analysis, we use theory of majorization for elementary symmetric polynomials to establish a sharp convergence guarantee when coordinate blocks are sampled using a determinantal point process. We then use a Markov chain coupling argument to show that similar convergence can be attained with a cheaper sampling scheme, and accelerate the block coordinate descent update via matrix sketching.
Dataset Distillation via Adversarial Prediction Matching
Authors: Authors: Mingyang Chen, Bo Huang, Junda Lu, Bing Li, Yi Wang, Minhao Cheng, Wei Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.08912
Pdf link: https://arxiv.org/pdf/2312.08912
Abstract Dataset distillation is the technique of synthesizing smaller condensed datasets from large original datasets while retaining necessary information to persist the effect. In this paper, we approach the dataset distillation problem from a novel perspective: we regard minimizing the prediction discrepancy on the real data distribution between models, which are respectively trained on the large original dataset and on the small distilled dataset, as a conduit for condensing information from the raw data into the distilled version. An adversarial framework is proposed to solve the problem efficiently. In contrast to existing distillation methods involving nested optimization or long-range gradient unrolling, our approach hinges on single-level optimization. This ensures the memory efficiency of our method and provides a flexible tradeoff between time and memory budgets, allowing us to distil ImageNet-1K using a minimum of only 6.5GB of GPU memory. Under the optimal tradeoff strategy, it requires only 2.5$\times$ less memory and 5$\times$ less runtime compared to the state-of-the-art. Empirically, our method can produce synthetic datasets just 10% the size of the original, yet achieve, on average, 94% of the test accuracy of models trained on the full original datasets including ImageNet-1K, significantly surpassing state-of-the-art. Additionally, extensive tests reveal that our distilled datasets excel in cross-architecture generalization capabilities.
BiPFT: Binary Pre-trained Foundation Transformer with Low-rank Estimation of Binarization Residual Polynomials
Authors: Authors: Xingrun Xing, Li Du, Xinyuan Wang, Xianlin Zeng, Yequan Wang, Zheng Zhang, Jiajun Zhang
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.08937
Pdf link: https://arxiv.org/pdf/2312.08937
Abstract Pretrained foundation models offer substantial benefits for a wide range of downstream tasks, which can be one of the most potential techniques to access artificial general intelligence. However, scaling up foundation transformers for maximal task-agnostic knowledge has brought about computational challenges, especially on resource-limited devices such as mobiles. This work proposes the first Binary Pretrained Foundation Transformer (BiPFT) for natural language understanding (NLU) tasks, which remarkably saves 56 times operations and 28 times memory. In contrast to previous task-specific binary transformers, BiPFT exhibits a substantial enhancement in the learning capabilities of binary neural networks (BNNs), promoting BNNs into the era of pre-training. Benefiting from extensive pretraining data, we further propose a data-driven binarization method. Specifically, we first analyze the binarization error in self-attention operations and derive the polynomials of binarization error. To simulate full-precision self-attention, we define binarization error as binarization residual polynomials, and then introduce low-rank estimators to model these polynomials. Extensive experiments validate the effectiveness of BiPFTs, surpassing task-specific baseline by 15.4% average performance on the GLUE benchmark. BiPFT also demonstrates improved robustness to hyperparameter changes, improved optimization efficiency, and reduced reliance on downstream distillation, which consequently generalize on various NLU tasks and simplify the downstream pipeline of BNNs. Our code and pretrained models are publicly available at https://github.com/Xingrun-Xing/BiPFT.
Single and Multi-Objective Benchmark Problems Focusing on Human-Powered Aircraft Design
Authors: Authors: Nobuo Namura
Subjects: Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2312.08953
Pdf link: https://arxiv.org/pdf/2312.08953
Abstract This paper introduces a novel set of benchmark problems aimed at advancing research in both single and multi-objective optimization, with a specific focus on the design of human-powered aircraft (HPA). These benchmark problems are unique in that they incorporate real-world design considerations such as fluid dynamics and material mechanics, providing a more realistic simulation of engineering design optimization. We propose three difficulty levels and a wing segmentation parameter in these problems, allowing for scalable complexity to suit various research needs. The problems are designed to be computationally reasonable, ensuring short evaluation times, while still capturing the moderate multimodality of engineering design problems. Our extensive experiments using popular evolutionary algorithms for multi-objective problems demonstrate that the proposed benchmarks effectively replicate the diverse Pareto front shapes observed in real-world problems, including convex, linear, concave, and degenerated forms. The benchmarks and their Python source codes are made publicly available for broader use in the optimization research community.
LLMind: Orchestrating AI and IoT with LLMs for Complex Task Execution
Authors: Authors: Hongwei Cui, Yuyang Du, Qun Yang, Yulin Shao, Soung Chang Liew
Subjects: Information Theory (cs.IT); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2312.09007
Pdf link: https://arxiv.org/pdf/2312.09007
Abstract In this article, we introduce LLMind, an innovative AI framework that utilizes large language models (LLMs) as a central orchestrator. The framework integrates LLMs with domain-specific AI modules, enabling IoT devices to collaborate effectively in executing complex tasks. The LLM performs planning and generates control scripts using a reliable and precise language-code transformation approach based on finite state machines (FSMs). The LLM engages in natural conversations with users, employing role-playing techniques to generate contextually appropriate responses. Additionally, users can interact easily with the AI agent via a user-friendly social media platform. The framework also incorporates semantic analysis and response optimization techniques to enhance speed and effectiveness. Ultimately, this framework is designed not only to innovate IoT device control and enrich user experiences but also to foster an intelligent and integrated IoT device ecosystem that evolves and becomes more sophisticated through continuing user and machine interactions.
Style Injection in Diffusion: A Training-free Approach for Adapting Large-scale Diffusion Models for Style Transfer
Authors: Authors: Jiwoo Chung, Sangeek Hyun, Jae-Pil Heo
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.09008
Pdf link: https://arxiv.org/pdf/2312.09008
Abstract Despite the impressive generative capabilities of diffusion models, existing diffusion model-based style transfer methods require inference-stage optimization (e.g. fine-tuning or textual inversion of style) which is time-consuming, or fails to leverage the generative ability of large-scale diffusion models. To address these issues, we introduce a novel artistic style transfer method based on a pre-trained large-scale diffusion model without any optimization. Specifically, we manipulate the features of self-attention layers as the way the cross-attention mechanism works; in the generation process, substituting the key and value of content with those of style image. This approach provides several desirable characteristics for style transfer including 1) preservation of content by transferring similar styles into similar image patches and 2) transfer of style based on similarity of local texture (e.g. edge) between content and style images. Furthermore, we introduce query preservation and attention temperature scaling to mitigate the issue of disruption of original content, and initial latent Adaptive Instance Normalization (AdaIN) to deal with the disharmonious color (failure to transfer the colors of style). Our experimental results demonstrate that our proposed method surpasses state-of-the-art methods in both conventional and diffusion-based style transfer baselines.
Symmetry Breaking and Equivariant Neural Networks
Authors: Authors: Sékou-Oumar Kaba, Siamak Ravanbakhsh
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2312.09016
Pdf link: https://arxiv.org/pdf/2312.09016
Abstract Using symmetry as an inductive bias in deep learning has been proven to be a principled approach for sample-efficient model design. However, the relationship between symmetry and the imperative for equivariance in neural networks is not always obvious. Here, we analyze a key limitation that arises in equivariant functions: their incapacity to break symmetry at the level of individual data samples. In response, we introduce a novel notion of 'relaxed equivariance' that circumvents this limitation. We further demonstrate how to incorporate this relaxation into equivariant multilayer perceptrons (E-MLPs), offering an alternative to the noise-injection method. The relevance of symmetry breaking is then discussed in various application domains: physics, graph representation learning, combinatorial optimization and equivariant decoding.
Bayes Net based highbrid Monte Carlo Optimization for Redundant Manipulator
Authors: Authors: Feng Yichang, Wang Jin, Zhang Haiyun, Lu Guodong
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2312.09024
Pdf link: https://arxiv.org/pdf/2312.09024
Abstract This paper proposes a Bayes Net based Monte Carlo optimization for motion planning (BN-MCO). Primarily, we adjust the potential fields determined by goal and start constraints to progressively guide the sampled clusters toward the goal and start points. Then, we utilize the Gaussian mixed modal (GMM) to perform the Monte Carlo optimization, confronting these two non-convex potential fields. Moreover, KL divergence measures the bias between the true distribution determined by the fields and the proposed GMM, whose parameters are learned incrementally according to the manifold information of the bias. In this way, the Bayesian network consisting of sequential updated GMMs expands until the constraints are satisfied and the shortest path method can find a feasible path. Finally, we tune the key parameters and benchmark BN-MCO against the other 5 planners on LBR-iiwa in a bookshelf. The result shows the highest success rate and moderate solving efficiency of BN-MCO.
Graph Neural Networks with Diverse Spectral Filtering
Authors: Authors: Jingwei Guo, Kaizhu Huang, Xinping Yi, Rui Zhang
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2312.09041
Pdf link: https://arxiv.org/pdf/2312.09041
Abstract Spectral Graph Neural Networks (GNNs) have achieved tremendous success in graph machine learning, with polynomial filters applied for graph convolutions, where all nodes share the identical filter weights to mine their local contexts. Despite the success, existing spectral GNNs usually fail to deal with complex networks (e.g., WWW) due to such homogeneous spectral filtering setting that ignores the regional heterogeneity as typically seen in real-world networks. To tackle this issue, we propose a novel diverse spectral filtering (DSF) framework, which automatically learns node-specific filter weights to exploit the varying local structure properly. Particularly, the diverse filter weights consist of two components -- A global one shared among all nodes, and a local one that varies along network edges to reflect node difference arising from distinct graph parts -- to balance between local and global information. As such, not only can the global graph characteristics be captured, but also the diverse local patterns can be mined with awareness of different node positions. Interestingly, we formulate a novel optimization problem to assist in learning diverse filters, which also enables us to enhance any spectral GNNs with our DSF framework. We showcase the proposed framework on three state-of-the-arts including GPR-GNN, BernNet, and JacobiConv. Extensive experiments over 10 benchmark datasets demonstrate that our framework can consistently boost model performance by up to 4.92% in node classification tasks, producing diverse filters with enhanced interpretability. Code is available at \url{https://github.com/jingweio/DSF}.
Entropy Regularization and Faster Decremental Matching in General Graphs
Authors: Authors: Jiale Chen, Aaron Sidford, Ta-Wei Tu
Subjects: Data Structures and Algorithms (cs.DS); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2312.09077
Pdf link: https://arxiv.org/pdf/2312.09077
Abstract We provide an algorithm that maintains, against an adaptive adversary, a $(1-\varepsilon)$-approximate maximum matching in $n$-node $m$-edge general (not necessarily bipartite) undirected graph undergoing edge deletions with high probability with (amortized) $O(\mathrm{poly}(\varepsilon^{-1}, \log n))$ time per update. We also obtain the same update time for maintaining a fractional approximate weighted matching (and hence an approximation to the value of the maximum weight matching) and an integral approximate weighted matching in dense graphs. Our unweighted result improves upon the prior state-of-the-art which includes a $\mathrm{poly}(\log{n}) \cdot 2^{O(1/\varepsilon^2)}$ update time [Assadi-Bernstein-Dudeja 2022] and an $O(\sqrt{m} \varepsilon^{-2})$ update time [Gupta-Peng 2013], and our weighted result improves upon the $O(\sqrt{m}\varepsilon^{-O(1/\varepsilon)}\log{n})$ update time due to [Gupta-Peng 2013]. To obtain our results, we generalize a recent optimization approach to dynamic algorithms from [Jambulapati-Jin-Sidford-Tian 2022]. We show that repeatedly solving entropy-regularized optimization problems yields a lazy updating scheme for fractional decremental problems with a near-optimal number of updates. To apply this framework we develop optimization methods compatible with it and new dynamic rounding algorithms for the matching polytope.
COMBHelper: A Neural Approach to Reduce Search Space for Graph Combinatorial Problems
Authors: Authors: Hao Tian, Sourav Medya, Wei Ye
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2312.09086
Pdf link: https://arxiv.org/pdf/2312.09086
Abstract Combinatorial Optimization (CO) problems over graphs appear routinely in many applications such as in optimizing traffic, viral marketing in social networks, and matching for job allocation. Due to their combinatorial nature, these problems are often NP-hard. Existing approximation algorithms and heuristics rely on the search space to find the solutions and become time-consuming when this space is large. In this paper, we design a neural method called COMBHelper to reduce this space and thus improve the efficiency of the traditional CO algorithms based on node selection. Specifically, it employs a Graph Neural Network (GNN) to identify promising nodes for the solution set. This pruned search space is then fed to the traditional CO algorithms. COMBHelper also uses a Knowledge Distillation (KD) module and a problem-specific boosting module to bring further efficiency and efficacy. Our extensive experiments show that the traditional CO algorithms with COMBHelper are at least 2 times faster than their original versions.
Tokenize Anything via Prompting
Authors: Authors: Ting Pan, Lulu Tang, Xinlong Wang, Shiguang Shan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.09128
Pdf link: https://arxiv.org/pdf/2312.09128
Abstract We present a unified, promptable model capable of simultaneously segmenting, recognizing, and captioning anything. Unlike SAM, we aim to build a versatile region representation in the wild via visual prompting. To achieve this, we train a generalizable model with massive segmentation masks, e.g., SA-1B masks, and semantic priors from a pre-trained CLIP model with 5 billion parameters. Specifically, we construct a promptable image decoder by adding a semantic token to each mask token. The semantic token is responsible for learning the semantic priors in a predefined concept space. Through joint optimization of segmentation on mask tokens and concept prediction on semantic tokens, our model exhibits strong regional recognition and localization capabilities. For example, an additional 38M-parameter causal text decoder trained from scratch sets a new record with a CIDEr score of 150.7 on the Visual Genome region captioning task. We believe this model can be a versatile region-level image tokenizer, capable of encoding general-purpose region context for a broad range of perception tasks. Code and models are available at https://github.com/baaivision/tokenize-anything.
Living Scenes: Multi-object Relocalization and Reconstruction in Changing 3D Environments
Authors: Authors: Liyuan Zhu, Shengyu Huang, Konrad Schindler, Iro Armeni
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.09138
Pdf link: https://arxiv.org/pdf/2312.09138
Abstract Research into dynamic 3D scene understanding has primarily focused on short-term change tracking from dense observations, while little attention has been paid to long-term changes with sparse observations. We address this gap with MoRE, a novel approach for multi-object relocalization and reconstruction in evolving environments. We view these environments as "living scenes" and consider the problem of transforming scans taken at different points in time into a 3D reconstruction of the object instances, whose accuracy and completeness increase over time. At the core of our method lies an SE(3)-equivariant representation in a single encoder-decoder network, trained on synthetic data. This representation enables us to seamlessly tackle instance matching, registration, and reconstruction. We also introduce a joint optimization algorithm that facilitates the accumulation of point clouds originating from the same instance across multiple scans taken at different points in time. We validate our method on synthetic and real-world data and demonstrate state-of-the-art performance in both end-to-end performance and individual subtasks.
Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D Reconstruction with Transformers
Authors: Authors: Zi-Xin Zou, Zhipeng Yu, Yuan-Chen Guo, Yangguang Li, Ding Liang, Yan-Pei Cao, Song-Hai Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.09147
Pdf link: https://arxiv.org/pdf/2312.09147
Abstract Recent advancements in 3D reconstruction from single images have been driven by the evolution of generative models. Prominent among these are methods based on Score Distillation Sampling (SDS) and the adaptation of diffusion models in the 3D domain. Despite their progress, these techniques often face limitations due to slow optimization or rendering processes, leading to extensive training and optimization times. In this paper, we introduce a novel approach for single-view reconstruction that efficiently generates a 3D model from a single image via feed-forward inference. Our method utilizes two transformer-based networks, namely a point decoder and a triplane decoder, to reconstruct 3D objects using a hybrid Triplane-Gaussian intermediate representation. This hybrid representation strikes a balance, achieving a faster rendering speed compared to implicit representations while simultaneously delivering superior rendering quality than explicit representations. The point decoder is designed for generating point clouds from single images, offering an explicit representation which is then utilized by the triplane decoder to query Gaussian features for each point. This design choice addresses the challenges associated with directly regressing explicit 3D Gaussian attributes characterized by their non-structural nature. Subsequently, the 3D Gaussians are decoded by an MLP to enable rapid rendering through splatting. Both decoders are built upon a scalable, transformer-based architecture and have been efficiently trained on large-scale 3D datasets. The evaluations conducted on both synthetic datasets and real-world images demonstrate that our method not only achieves higher quality but also ensures a faster runtime in comparison to previous state-of-the-art techniques. Please see our project page at https://zouzx.github.io/TriplaneGaussian/.
Architecture Singularity Distance Computations for Linear Pentapods
Authors: Authors: Aditya Kapilavai, Georg Nawratil
Subjects: Robotics (cs.RO); Mathematical Software (cs.MS); Algebraic Geometry (math.AG)
Arxiv link: https://arxiv.org/abs/2312.09160
Pdf link: https://arxiv.org/pdf/2312.09160
Abstract The kinematic/robotic community is not only interested in measuring the closeness of a given robot configuration to its next singular one but also in a geometric meaningful index evaluating how far the robot design is away from being architecturally singular. Such an architecture singularity distance, which can be used by engineers as a criterion within the design process, is presented for a certain class of parallel manipulators of Stewart-Gough type; namely so-called linear pentapods. Geometrically the architecture singular designs are well-understood and can be subclassified into several cases, which allows to solve the optimization problem of computing the closest architecture singular design to a given linear pentapod with algorithms from numerical algebraic geometry.
Efficient Online Learning of Contact Force Models for Connector Insertion
Authors: Authors: Kevin Tracy, Zachary Manchester, Ajinkya Jain, Keegan Go, Stefan Schaal, Tom Erez, Yuval Tassa
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2312.09190
Pdf link: https://arxiv.org/pdf/2312.09190
Abstract Contact-rich manipulation tasks with stiff frictional elements like connector insertion are difficult to model with rigid-body simulators. In this work, we propose a new approach for modeling these environments by learning a quasi-static contact force model instead of a full simulator. Using a feature vector that contains information about the configuration and control, we find a linear mapping adequately captures the relationship between this feature vector and the sensed contact forces. A novel Linear Model Learning (LML) algorithm is used to solve for the globally optimal mapping in real time without any matrix inversions, resulting in an algorithm that runs in nearly constant time on a GPU as the model size increases. We validate the proposed approach for connector insertion both in simulation and hardware experiments, where the learned model is combined with an optimization-based controller to achieve smooth insertions in the presence of misalignments and uncertainty. Our website featuring videos, code, and more materials is available at https://model-based-plugging.github.io/.
Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking
Authors: Authors: Jacob Eisenstein, Chirag Nagpal, Alekh Agarwal, Ahmad Beirami, Alex D'Amour, DJ Dvijotham, Adam Fisch, Katherine Heller, Stephen Pfohl, Deepak Ramachandran, Peter Shaw, Jonathan Berant
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.09244
Pdf link: https://arxiv.org/pdf/2312.09244
Abstract Reward models play a key role in aligning language model applications towards human preferences. However, this setup creates an incentive for the language model to exploit errors in the reward model to achieve high estimated reward, a phenomenon often termed \emph{reward hacking}. A natural mitigation is to train an ensemble of reward models, aggregating over model outputs to obtain a more robust reward estimate. We explore the application of reward ensembles to alignment at both training time (through reinforcement learning) and inference time (through reranking). First, we show that reward models are \emph{underspecified}: reward models that perform similarly in-distribution can yield very different rewards when used in alignment, due to distribution shift. Second, underspecification results in overoptimization, where alignment to one reward model does not improve reward as measured by another reward model trained on the same data. Third, overoptimization is mitigated by the use of reward ensembles, and ensembles that vary by their \emph{pretraining} seeds lead to better generalization than ensembles that differ only by their \emph{fine-tuning} seeds, with both outperforming individual reward models. However, even pretrain reward ensembles do not eliminate reward hacking: we show several qualitative reward hacking phenomena that are not mitigated by ensembling because all reward models in the ensemble exhibit similar error patterns.
ZeroRF: Fast Sparse View 360° Reconstruction with Zero Pretraining
Authors: Authors: Ruoxi Shi, Xinyue Wei, Cheng Wang, Hao Su
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2312.09249
Pdf link: https://arxiv.org/pdf/2312.09249
Abstract We present ZeroRF, a novel per-scene optimization method addressing the challenge of sparse view 360{\deg} reconstruction in neural field representations. Current breakthroughs like Neural Radiance Fields (NeRF) have demonstrated high-fidelity image synthesis but struggle with sparse input views. Existing methods, such as Generalizable NeRFs and per-scene optimization approaches, face limitations in data dependency, computational cost, and generalization across diverse scenarios. To overcome these challenges, we propose ZeroRF, whose key idea is to integrate a tailored Deep Image Prior into a factorized NeRF representation. Unlike traditional methods, ZeroRF parametrizes feature grids with a neural network generator, enabling efficient sparse view 360{\deg} reconstruction without any pretraining or additional regularization. Extensive experiments showcase ZeroRF's versatility and superiority in terms of both quality and speed, achieving state-of-the-art results on benchmark datasets. ZeroRF's significance extends to applications in 3D content generation and editing. Project page: https://sarahweiii.github.io/zerorf/
Keyword: adam

Identifying Planetary Names in Astronomy Papers: A Multi-Step Approach
Authors: Authors: Golnaz Shapurian, Michael J Kurtz, Alberto Accomazzi
Subjects: Computation and Language (cs.CL); Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.08579
Pdf link: https://arxiv.org/pdf/2312.08579
Abstract The automatic identification of planetary feature names in astronomy publications presents numerous challenges. These features include craters, defined as roughly circular depressions resulting from impact or volcanic activity; dorsas, which are elongate raised structures or wrinkle ridges; and lacus, small irregular patches of dark, smooth material on the Moon, referred to as "lake" (Planetary Names Working Group, n.d.). Many feature names overlap with places or people's names that they are named after, for example, Syria, Tempe, Einstein, and Sagan, to name a few (U.S. Geological Survey, n.d.). Some feature names have been used in many contexts, for instance, Apollo, which can refer to mission, program, sample, astronaut, seismic, seismometers, core, era, data, collection, instrument, and station, in addition to the crater on the Moon. Some feature names can appear in the text as adjectives, like the lunar craters Black, Green, and White. Some feature names in other contexts serve as directions, like craters West and South on the Moon. Additionally, some features share identical names across different celestial bodies, requiring disambiguation, such as the Adams crater, which exists on both the Moon and Mars. We present a multi-step pipeline combining rule-based filtering, statistical relevance analysis, part-of-speech (POS) tagging, named entity recognition (NER) model, hybrid keyword harvesting, knowledge graph (KG) matching, and inference with a locally installed large language model (LLM) to reliably identify planetary names despite these challenges. When evaluated on a dataset of astronomy papers from the Astrophysics Data System (ADS), this methodology achieves an F1-score over 0.97 in disambiguating planetary feature names.
Keyword: gradient

Accelerating Meta-Learning by Sharing Gradients
Authors: Authors: Oscar Chang, Hod Lipson
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.08398
Pdf link: https://arxiv.org/pdf/2312.08398
Abstract The success of gradient-based meta-learning is primarily attributed to its ability to leverage related tasks to learn task-invariant information. However, the absence of interactions between different tasks in the inner loop leads to task-specific over-fitting in the initial phase of meta-training. While this is eventually corrected by the presence of these interactions in the outer loop, it comes at a significant cost of slower meta-learning. To address this limitation, we explicitly encode task relatedness via an inner loop regularization mechanism inspired by multi-task learning. Our algorithm shares gradient information from previously encountered tasks as well as concurrent tasks in the same task batch, and scales their contribution with meta-learned parameters. We show using two popular few-shot classification datasets that gradient sharing enables meta-learning under bigger inner loop learning rates and can accelerate the meta-training process by up to 134%.
A Study on the Inductance and Thermal Regression and Optimization for Automatic Layout Design of Power Modules
Authors: Authors: Victor Parque, Aiki Nakamura, Tomoyuki Miyashita
Subjects: Neural and Evolutionary Computing (cs.NE); Computational Engineering, Finance, and Science (cs.CE); Performance (cs.PF); Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2312.08523
Pdf link: https://arxiv.org/pdf/2312.08523
Abstract Power modules with excellent inductance and temperature metrics are significant to meet the rising sophistication of energy demand in new technologies. In this paper, we use a surrogate-based approach to render optimal layouts of power modules with feasible and attractive inductance-temperature ratios at low computational budget. In particular, we use the class of feedforward networks to estimate the surrogate relationships between power module layout-design variables and inductance-temperature factors rendered from simulations; and Differential Evolution algorithms to optimize and locate feasible layout configurations of power module substrates minimizing inductance and temperature ratios. Our findings suggest the desirable classes of feedforward networks and gradient-free optimization algorithms being able to estimate and optimize power module layouts efficiently and effectively.
Revisiting the Last-Iterate Convergence of Stochastic Gradient Methods
Authors: Authors: Zijian Liu, Zhengyuan Zhou
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2312.08531
Pdf link: https://arxiv.org/pdf/2312.08531
Abstract In the past several years, the convergence of the last iterate of the Stochastic Gradient Descent (SGD) algorithm has triggered people's interest due to its good performance in practice but lack of theoretical understanding. For Lipschitz and convex functions, different works have established the optimal $O(\log(1/\delta)\log T/\sqrt{T})$ or $O(\sqrt{\log(1/\delta)/T})$ high-probability convergence rates for the final iterate, where $T$ is the time horizon and $\delta$ is the failure probability. However, to prove these bounds, all the existing works are limited to compact domains or require almost surely bounded noises. It is natural to ask whether the last iterate of SGD can still guarantee the optimal convergence rate but without these two restrictive assumptions. Besides this important question, there are still lots of theoretical problems lacking an answer. For example, compared with the last iterate convergence of SGD for non-smooth problems, only few results for smooth optimization have yet been developed. Additionally, the existing results are all limited to a non-composite objective and the standard Euclidean norm. It still remains unclear whether the last-iterate convergence can be provably extended to wider composite optimization and non-Euclidean norms. In this work, to address the issues mentioned above, we revisit the last-iterate convergence of stochastic gradient methods and provide the first unified way to prove the convergence rates both in expectation and in high probability to accommodate general domains, composite objectives, non-Euclidean norms, Lipschitz conditions, smoothness and (strong) convexity simultaneously. Additionally, we extend our analysis to obtain the last-iterate convergence under heavy-tailed noises.
World Models via Policy-Guided Trajectory Diffusion
Authors: Authors: Marc Rigter, Jun Yamada, Ingmar Posner
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2312.08533
Pdf link: https://arxiv.org/pdf/2312.08533
Abstract World models are a powerful tool for developing intelligent agents. By predicting the outcome of a sequence of actions, world models enable policies to be optimised via on-policy reinforcement learning (RL) using synthetic data, i.e. in ``in imagination''. Existing world models are autoregressive, and interleave predicting the next state with sampling the next action from the policy. Thus, the prediction error inevitably compounds as the trajectory length grows. In this work, we propose a novel world modelling approach that is not autoregressive and generates entire on-policy trajectories via a single pass through a diffusion model. Our approach, Policy-Guided Trajectory Diffusion (PolyGRAD), leverages a denoising model in addition to the gradient of the action distribution of the policy to diffuse a trajectory of initially random states and actions into an on-policy synthetic trajectory. We analyse the capabilities of our approach and demonstrate that it obtains competitive prediction errors to state-of-the-art autoregressive baselines. PolyGRAD also enables performant policies to be trained via on-policy RL in imagination. We believe that PolyGRAD introduces a promising paradigm for world modelling with many possible extensions to explore in future work.
Contractive error feedback for gradient compression
Authors: Authors: Bingcong Li, Shuai Zheng, Parameswaran Raman, Anshumali Shrivastava, Georgios B. Giannakis
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2312.08538
Pdf link: https://arxiv.org/pdf/2312.08538
Abstract On-device memory concerns in distributed deep learning have become severe due to (i) the growth of model size in multi-GPU training, and (ii) the wide adoption of deep neural networks for federated learning on IoT devices which have limited storage. In such settings, communication efficient optimization methods are attractive alternatives, however they still struggle with memory issues. To tackle these challenges, we propose an communication efficient method called contractive error feedback (ConEF). As opposed to SGD with error-feedback (EFSGD) that inefficiently manages memory, ConEF obtains the sweet spot of convergence and memory usage, and achieves communication efficiency by leveraging biased and all-reducable gradient compression. We empirically validate ConEF on various learning tasks that include image classification, language modeling, and machine translation and observe that ConEF saves 80\% - 90\% of the extra memory in EFSGD with almost no loss on test performance, while also achieving 1.3x - 5x speedup of SGD. Through our work, we also demonstrate the feasibility and convergence of ConEF to clear up the theoretical barrier of integrating ConEF to popular memory efficient frameworks such as ZeRO-3.
On Searching for Minimal Integer Representation of Undirected Graphs
Authors: Authors: Victor Parque, Tomoyuki Miyashita
Subjects: Discrete Mathematics (cs.DM); Neural and Evolutionary Computing (cs.NE); Combinatorics (math.CO)
Arxiv link: https://arxiv.org/abs/2312.08539
Pdf link: https://arxiv.org/pdf/2312.08539
Abstract Minimal and efficient graph representations are key to store, communicate, and sample the search space of graphs and networks while meeting user-defined criteria. In this paper, we investigate the feasibility of gradient-free optimization heuristics based on Differential Evolution to search for minimal integer representations of undirected graphs. The class of Differential Evolution algorithms are population-based gradient-free optimization heuristics having found a relevant attention in the nonconvex and nonlinear optimization communities. Our computational experiments using eight classes of Differential Evolution schemes and graph instances with varying degrees of sparsity have shown the merit of attaining minimal numbers for graph encoding/representation rendered by exploration-oriented strategies within few function evaluations. Our results have the potential to elucidate new number-based encoding and sample-based algorithms for graph representation, network design and optimization.
MotherNet: A Foundational Hypernetwork for Tabular Classification
Authors: Authors: Andreas Müller, Carlo Curino, Raghu Ramakrishnan
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.08598
Pdf link: https://arxiv.org/pdf/2312.08598
Abstract The advent of Foundation Models is transforming machine learning across many modalities (e.g., language, images, videos) with prompt engineering replacing training in many settings. Recent work on tabular data (e.g., TabPFN) hints at a similar opportunity to build Foundation Models for classification for numerical data. In this paper, we go one step further and propose a hypernetwork architecture that we call MotherNet, trained on millions of classification tasks, that, once prompted with a never-seen-before training set generates the weights of a trained ``child'' neural-network. Like other Foundation Models, MotherNet replaces training on specific datasets with in-context learning through a single forward pass. In contrast to existing hypernetworks that were either task-specific or trained for relatively constraint multi-task settings, MotherNet is trained to generate networks to perform multiclass classification on arbitrary tabular datasets without any dataset specific gradient descent. The child network generated by MotherNet using in-context learning outperforms neural networks trained using gradient descent on small datasets, and is competitive with predictions by TabPFN and standard ML methods like Gradient Boosting. Unlike a direct application of transformer models like TabPFN, MotherNet generated networks are highly efficient at inference time. This methodology opens up a new approach to building predictive models on tabular data that is both efficient and robust, without any dataset-specific training.
MmAP : Multi-modal Alignment Prompt for Cross-domain Multi-task Learning
Authors: Authors: Yi Xin, Junlong Du, Qiang Wang, Ke Yan, Shouhong Ding
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.08636
Pdf link: https://arxiv.org/pdf/2312.08636
Abstract Multi-Task Learning (MTL) is designed to train multiple correlated tasks simultaneously, thereby enhancing the performance of individual tasks. Typically, a multi-task network structure consists of a shared backbone and task-specific decoders. However, the complexity of the decoders increases with the number of tasks. To tackle this challenge, we integrate the decoder-free vision-language model CLIP, which exhibits robust zero-shot generalization capability. Recently, parameter-efficient transfer learning methods have been extensively explored with CLIP for adapting to downstream tasks, where prompt tuning showcases strong potential. Nevertheless, these methods solely fine-tune a single modality (text or visual), disrupting the modality structure of CLIP. In this paper, we first propose Multi-modal Alignment Prompt (MmAP) for CLIP, which aligns text and visual modalities during fine-tuning process. Building upon MmAP, we develop an innovative multi-task prompt learning framework. On the one hand, to maximize the complementarity of tasks with high similarity, we utilize a gradient-driven task grouping method that partitions tasks into several disjoint groups and assign a group-shared MmAP to each group. On the other hand, to preserve the unique characteristics of each task, we assign an task-specific MmAP to each task. Comprehensive experiments on two large multi-task learning datasets demonstrate that our method achieves significant performance improvements compared to full fine-tuning while only utilizing approximately 0.09% of trainable parameters.
CLIP-guided Federated Learning on Heterogeneous and Long-Tailed Data
Authors: Authors: Jiangming Shi, Shanshan Zheng, Xiangbo Yin, Yang Lu, Yuan Xie, Yanyun Qu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.08648
Pdf link: https://arxiv.org/pdf/2312.08648
Abstract Federated learning (FL) provides a decentralized machine learning paradigm where a server collaborates with a group of clients to learn a global model without accessing the clients' data. User heterogeneity is a significant challenge for FL, which together with the class-distribution imbalance further enhances the difficulty of FL. Great progress has been made in large vision-language models, such as Contrastive Language-Image Pre-training (CLIP), which paves a new way for image classification and object recognition. Inspired by the success of CLIP on few-shot and zero-shot learning, we use CLIP to optimize the federated learning between server and client models under its vision-language supervision. It is promising to mitigate the user heterogeneity and class-distribution balance due to the powerful cross-modality representation and rich open-vocabulary prior knowledge. In this paper, we propose the CLIP-guided FL (CLIP2FL) method on heterogeneous and long-tailed data. In CLIP2FL, the knowledge of the off-the-shelf CLIP model is transferred to the client-server models, and a bridge is built between the client and server. Specifically, for client-side learning, knowledge distillation is conducted between client models and CLIP to improve the ability of client-side feature representation. For server-side learning, in order to mitigate the heterogeneity and class-distribution imbalance, we generate federated features to retrain the server model. A prototype contrastive learning with the supervision of the text encoder of CLIP is introduced to generate federated features depending on the client-side gradients, and they are used to retrain a balanced server classifier.
Automated detection of Zika and dengue in Aedes aegypti using neural spiking analysis
Authors: Authors: Danial Sharifrazi, Nouman Javed, Roohallah Alizadehsani, Prasad N. Paradkar, U. Rajendra Acharya, Asim Bhatti
Subjects: Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)
Arxiv link: https://arxiv.org/abs/2312.08654
Pdf link: https://arxiv.org/pdf/2312.08654
Abstract Mosquito-borne diseases present considerable risks to the health of both animals and humans. Aedes aegypti mosquitoes are the primary vectors for numerous medically important viruses such as dengue, Zika, yellow fever, and chikungunya. To characterize this mosquito neural activity, it is essential to classify the generated electrical spikes. However, no open-source neural spike classification method is currently available for mosquitoes. Our work presented in this paper provides an innovative artificial intelligence-based method to classify the neural spikes in uninfected, dengue-infected, and Zika-infected mosquitoes. Aiming for outstanding performance, the method employs a fusion of normalization, feature importance, and dimension reduction for the preprocessing and combines convolutional neural network and extra gradient boosting (XGBoost) for classification. The method uses the electrical spiking activity data of mosquito neurons recorded by microelectrode array technology. We used data from 0, 1, 2, 3, and 7 days post-infection, containing over 15 million samples, to analyze the method's performance. The performance of the proposed method was evaluated using accuracy, precision, recall, and the F1 scores. The results obtained from the method highlight its remarkable performance in differentiating infected vs uninfected mosquito samples, achieving an average of 98.1%. The performance was also compared with 6 other machine learning algorithms to further assess the method's capability. The method outperformed all other machine learning algorithms' performance. Overall, this research serves as an efficient method to classify the neural spikes of Aedes aegypti mosquitoes and can assist in unraveling the complex interactions between pathogens and mosquitoes.
Privacy Amplification by Iteration for ADMM with (Strongly) Convex Objective Functions
Authors: Authors: T-H. Hubert Chan, Hao Xie, Mengshi Zhao
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2312.08685
Pdf link: https://arxiv.org/pdf/2312.08685
Abstract We examine a private ADMM variant for (strongly) convex objectives which is a primal-dual iterative method. Each iteration has a user with a private function used to update the primal variable, masked by Gaussian noise for local privacy, without directly adding noise to the dual variable. Privacy amplification by iteration explores if noises from later iterations can enhance the privacy guarantee when releasing final variables after the last iteration. Cyffers et al. [ICML 2023] explored privacy amplification by iteration for the proximal ADMM variant, where a user's entire private function is accessed and noise is added to the primal variable. In contrast, we examine a private ADMM variant requiring just one gradient access to a user's function, but both primal and dual variables must be passed between successive iterations. To apply Balle et al.'s [NeurIPS 2019] coupling framework to the gradient ADMM variant, we tackle technical challenges with novel ideas. First, we address the non-expansive mapping issue in ADMM iterations by using a customized norm. Second, because the dual variables are not masked with any noise directly, their privacy guarantees are achieved by treating two consecutive noisy ADMM iterations as a Markov operator. Our main result is that the privacy guarantee for the gradient ADMM variant can be amplified proportionally to the number of iterations. For strongly convex objective functions, this amplification exponentially increases with the number of iterations. These amplification results align with the previously studied special case of stochastic gradient descent.
Gradient Informed Proximal Policy Optimization
Authors: Authors: Sanghyun Son, Laura Yu Zheng, Ryan Sullivan, Yi-Ling Qiao, Ming C. Lin
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2312.08710
Pdf link: https://arxiv.org/pdf/2312.08710
Abstract We introduce a novel policy learning method that integrates analytical gradients from differentiable environments with the Proximal Policy Optimization (PPO) algorithm. To incorporate analytical gradients into the PPO framework, we introduce the concept of an {\alpha}-policy that stands as a locally superior policy. By adaptively modifying the {\alpha} value, we can effectively manage the influence of analytical policy gradients during learning. To this end, we suggest metrics for assessing the variance and bias of analytical gradients, reducing dependence on these gradients when high variance or bias is detected. Our proposed approach outperforms baseline algorithms in various scenarios, such as function optimization, physics simulations, and traffic control environments. Our code can be found online: https://github.com/SonSang/gippo.
A low-rank multipatch isogeometric method based on Tucker tensors
Authors: Authors: Monica Montardini, Giancarlo Sangalli, Mattia Tani
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2312.08736
Pdf link: https://arxiv.org/pdf/2312.08736
Abstract In this paper we present a low-rank method for conforming multipatch discretizations of compressible linear elasticity problems using Isogeometric Analysis. The proposed technique is a non-trivial extension of [M. Montardini, G. Sangalli, and M. Tani. A low-rank isogeometric solver based on Tucker tensors. Comput. Methods Appl. Mech. Engrg., page 116472, 2023.] to multipatch geometries. We tackle the model problem using an overlapping Schwarz method, where the subdomains can be defined as unions of neighbouring patches. Then on each subdomain we approximate the blocks of the linear system matrix and of the right-hand side vector using Tucker matrices and Tucker vectors, respectively. We use the Truncated Preconditioned Conjugate Gradient as a linear solver, coupled with a suited preconditioner. The numerical experiments show the advantages of this approach in terms of memory storage. Moreover, the number of iterations is robust with respect to the relevant parameters.
GOEnFusion: Gradient Origin Encodings for 3D Forward Diffusion Models
Authors: Authors: Animesh Karnewar, Andrea Vedaldi, Niloy J. Mitra, David Novotny
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2312.08744
Pdf link: https://arxiv.org/pdf/2312.08744
Abstract The recently introduced Forward-Diffusion method allows to train a 3D diffusion model using only 2D images for supervision. However, it does not easily generalise to different 3D representations and requires a computationally expensive auto-regressive sampling process to generate the underlying 3D scenes. In this paper, we propose GOEn: Gradient Origin Encoding (pronounced "gone"). GOEn can encode input images into any type of 3D representation without the need to use a pre-trained image feature extractor. It can also handle single, multiple or no source view(s) alike, by design, and tries to maximise the information transfer from the views to the encodings. Our proposed GOEnFusion model pairs GOEn encodings with a realisation of the Forward-Diffusion model which addresses the limitations of the vanilla Forward-Diffusion realisation. We evaluate how much information the GOEn mechanism transfers to the encoded representations, and how well it captures the prior distribution over the underlying 3D scenes, through the lens of a partial AutoEncoder. Lastly, the efficacy of the GOEnFusion model is evaluated on the recently proposed OmniObject3D dataset while comparing to the state-of-the-art Forward and non-Forward-Diffusion models and other 3D generative models.
RDARS Empowered Massive MIMO System: Two-Timescale Transceiver Design with Imperfect CSI
Authors: Authors: Chengzhi Ma, Jintao Wang, Xi Yang, Guanghua Yang, Wei Zhang, Shaodan Ma
Subjects: Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2312.08753
Pdf link: https://arxiv.org/pdf/2312.08753
Abstract In this paper, we investigate a novel reconfigurable distributed antennas and reflecting surface (RDARS) aided multi-user massive MIMO system with imperfect CSI and propose a practical two-timescale (TTS) transceiver design to reduce the communication overhead and computational complexity of the system. In the RDARS-aided system, not only distribution gain but also reflection gain can be obtained by a flexible combination of the distributed antennas and reflecting surface, which differentiates the system from the others and also makes the TTS design challenging. To enable the optimal TTS transceiver design, the achievable rate of the system is first derived in closed-form. Then the TTS design aiming at the weighted sum rate maximization is considered. To solve the challenging non-convex optimization problem with high-order design variables, i.e., the transmit powers and the phase shifts at the RDARS, a block coordinate descent based method is proposed to find the optimal solutions in semi-closed forms iteratively. Specifically, two efficient algorithms are proposed with provable convergence for the optimal phase shift design, i.e., Riemannian Gradient Ascent based algorithm by exploiting the unit-modulus constraints, and Two-Tier Majorization-Minimization based algorithm with closed-form optimal solutions in each iteration. Simulation results validate the effectiveness of the proposed algorithm and demonstrate the superiority of deploying RDARS in massive MIMO systems to provide substantial rate improvement with a significantly reduced total number of active antennas/RF chains and lower transmit power when compared to the DAS and RIS-aided systems.
May the Noise be with you: Adversarial Training without Adversarial Examples
Authors: Authors: Ayoub Arous, Andres F Lopez-Lopera, Nael Abu-Ghazaleh, Ihsen Alouani
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.08877
Pdf link: https://arxiv.org/pdf/2312.08877
Abstract In this paper, we investigate the following question: Can we obtain adversarially-trained models without training on adversarial examples? Our intuition is that training a model with inherent stochasticity, i.e., optimizing the parameters by minimizing a stochastic loss function, yields a robust expectation function that is non-stochastic. In contrast to related methods that introduce noise at the input level, our proposed approach incorporates inherent stochasticity by embedding Gaussian noise within the layers of the NN model at training time. We model the propagation of noise through the layers, introducing a closed-form stochastic loss function that encapsulates a noise variance parameter. Additionally, we contribute a formalized noise-aware gradient, enabling the optimization of model parameters while accounting for stochasticity. Our experimental results confirm that the expectation model of a stochastic architecture trained on benign distribution is adversarially robust. Interestingly, we find that the impact of the applied Gaussian noise's standard deviation on both robustness and baseline accuracy closely mirrors the impact of the noise magnitude employed in adversarial training. Our work contributes adversarially trained networks using a completely different approach, with empirically similar robustness to adversarial training.
High-Dimensional Bayesian Optimisation with Large-Scale Constraints -- An Application to Aeroelastic Tailoring
Authors: Authors: Hauke Maathuis, Roeland De Breuker, Saullo G. P. Castro
Subjects: Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.08891
Pdf link: https://arxiv.org/pdf/2312.08891
Abstract Design optimisation potentially leads to lightweight aircraft structures with lower environmental impact. Due to the high number of design variables and constraints, these problems are ordinarily solved using gradient-based optimisation methods, leading to a local solution in the design space while the global space is neglected. Bayesian Optimisation is a promising path towards sample-efficient, global optimisation based on probabilistic surrogate models. While Bayesian optimisation methods have demonstrated their strength for problems with a low number of design variables, the scalability to high-dimensional problems while incorporating large-scale constraints is still lacking. Especially in aeroelastic tailoring where directional stiffness properties are embodied into the structural design of aircraft, to control aeroelastic deformations and to increase the aerodynamic and structural performance, the safe operation of the system needs to be ensured by involving constraints resulting from different analysis disciplines. Hence, a global design space search becomes even more challenging. The present study attempts to tackle the problem by using high-dimensional Bayesian Optimisation in combination with a dimensionality reduction approach to solve the optimisation problem occurring in aeroelastic tailoring, presenting a novel approach for high-dimensional problems with large-scale constraints. Experiments on well-known benchmark cases with black-box constraints show that the proposed approach can incorporate large-scale constraints.
Solving Dense Linear Systems Faster than via Preconditioning
Authors: Authors: Michał Dereziński, Jiaming Yang
Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Numerical Analysis (math.NA); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2312.08893
Pdf link: https://arxiv.org/pdf/2312.08893
Abstract We give a stochastic optimization algorithm that solves a dense $n\times n$ real-valued linear system $Ax=b$, returning $\tilde x$ such that $|A\tilde x-b|\leq \epsilon|b|$ in time: $$\tilde O((n^2+nk^{\omega-1})\log1/\epsilon),$$ where $k$ is the number of singular values of $A$ larger than $O(1)$ times its smallest positive singular value, $\omega < 2.372$ is the matrix multiplication exponent, and $\tilde O$ hides a poly-logarithmic in $n$ factor. When $k=O(n^{1-\theta})$ (namely, $A$ has a flat-tailed spectrum, e.g., due to noisy data or regularization), this improves on both the cost of solving the system directly, as well as on the cost of preconditioning an iterative method such as conjugate gradient. In particular, our algorithm has an $\tilde O(n^2)$ runtime when $k=O(n^{0.729})$. We further adapt this result to sparse positive semidefinite matrices and least squares regression. Our main algorithm can be viewed as a randomized block coordinate descent method, where the key challenge is simultaneously ensuring good convergence and fast per-iteration time. In our analysis, we use theory of majorization for elementary symmetric polynomials to establish a sharp convergence guarantee when coordinate blocks are sampled using a determinantal point process. We then use a Markov chain coupling argument to show that similar convergence can be attained with a cheaper sampling scheme, and accelerate the block coordinate descent update via matrix sketching.
Dataset Distillation via Adversarial Prediction Matching
Authors: Authors: Mingyang Chen, Bo Huang, Junda Lu, Bing Li, Yi Wang, Minhao Cheng, Wei Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.08912
Pdf link: https://arxiv.org/pdf/2312.08912
Abstract Dataset distillation is the technique of synthesizing smaller condensed datasets from large original datasets while retaining necessary information to persist the effect. In this paper, we approach the dataset distillation problem from a novel perspective: we regard minimizing the prediction discrepancy on the real data distribution between models, which are respectively trained on the large original dataset and on the small distilled dataset, as a conduit for condensing information from the raw data into the distilled version. An adversarial framework is proposed to solve the problem efficiently. In contrast to existing distillation methods involving nested optimization or long-range gradient unrolling, our approach hinges on single-level optimization. This ensures the memory efficiency of our method and provides a flexible tradeoff between time and memory budgets, allowing us to distil ImageNet-1K using a minimum of only 6.5GB of GPU memory. Under the optimal tradeoff strategy, it requires only 2.5$\times$ less memory and 5$\times$ less runtime compared to the state-of-the-art. Empirically, our method can produce synthetic datasets just 10% the size of the original, yet achieve, on average, 94% of the test accuracy of models trained on the full original datasets including ImageNet-1K, significantly surpassing state-of-the-art. Additionally, extensive tests reveal that our distilled datasets excel in cross-architecture generalization capabilities.
EAT: Towards Long-Tailed Out-of-Distribution Detection
Authors: Authors: Tong Wei, Bo-Lin Wang, Min-Ling Zhang
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.08939
Pdf link: https://arxiv.org/pdf/2312.08939
Abstract Despite recent advancements in out-of-distribution (OOD) detection, most current studies assume a class-balanced in-distribution training dataset, which is rarely the case in real-world scenarios. This paper addresses the challenging task of long-tailed OOD detection, where the in-distribution data follows a long-tailed class distribution. The main difficulty lies in distinguishing OOD data from samples belonging to the tail classes, as the ability of a classifier to detect OOD instances is not strongly correlated with its accuracy on the in-distribution classes. To overcome this issue, we propose two simple ideas: (1) Expanding the in-distribution class space by introducing multiple abstention classes. This approach allows us to build a detector with clear decision boundaries by training on OOD data using virtual labels. (2) Augmenting the context-limited tail classes by overlaying images onto the context-rich OOD data. This technique encourages the model to pay more attention to the discriminative features of the tail classes. We provide a clue for separating in-distribution and OOD data by analyzing gradient noise. Through extensive experiments, we demonstrate that our method outperforms the current state-of-the-art on various benchmark datasets. Moreover, our method can be used as an add-on for existing long-tail learning approaches, significantly enhancing their OOD detection performance. Code is available at: https://github.com/Stomach-ache/Long-Tailed-OOD-Detection .
Contact-Implicit MPC: Controlling Diverse Quadruped Motions Without Pre-Planned Contact Modes or Trajectories
Authors: Authors: Gijeong Kim, Dongyun Kang, Joon-Ha Kim, Seungwoo Hong, Hae-Won Park
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2312.08961
Pdf link: https://arxiv.org/pdf/2312.08961
Abstract This paper presents a contact-implicit model predictive control (MPC) framework for the real-time discovery of multi-contact motions, without predefined contact mode sequences or foothold positions. This approach utilizes the contact-implicit differential dynamic programming (DDP) framework, merging the hard contact model with a linear complementarity constraint. We propose the analytical gradient of the contact impulse based on relaxed complementarity constraints to further the exploration of a variety of contact modes. By leveraging a hard contact model-based simulation and computation of search direction through a smooth gradient, our methodology identifies dynamically feasible state trajectories, control inputs, and contact forces while simultaneously unveiling new contact mode sequences. However, the broadened scope of contact modes does not always ensure real-world applicability. Recognizing this, we implemented differentiable cost terms to guide foot trajectories and make gait patterns. Furthermore, to address the challenge of unstable initial roll-outs in an MPC setting, we employ the multiple shooting variant of DDP. The efficacy of the proposed framework is validated through simulations and real-world demonstrations using a 45 kg HOUND quadruped robot, performing various tasks in simulation and showcasing actual experiments involving a forward trot and a front-leg rearing motion.
Topic Bias in Emotion Classification
Authors: Authors: Maximilian Wegge, Roman Klinger
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2312.09043
Pdf link: https://arxiv.org/pdf/2312.09043
Abstract Emotion corpora are typically sampled based on keyword/hashtag search or by asking study participants to generate textual instances. In any case, these corpora are not uniform samples representing the entirety of a domain. We hypothesize that this practice of data acquisition leads to unrealistic correlations between overrepresented topics in these corpora that harm the generalizability of models. Such topic bias could lead to wrong predictions for instances like "I organized the service for my aunt's funeral." when funeral events are over-represented for instances labeled with sadness, despite the emotion of pride being more appropriate here. In this paper, we study this topic bias both from the data and the modeling perspective. We first label a set of emotion corpora automatically via topic modeling and show that emotions in fact correlate with specific topics. Further, we see that emotion classifiers are confounded by such topics. Finally, we show that the established debiasing method of adversarial correction via gradient reversal mitigates the issue. Our work points out issues with existing emotion corpora and that more representative resources are required for fair evaluation of models predicting affective concepts from text.
Keyword: super-resolution

CartoMark: a benchmark dataset for map pattern recognition and 1 map content retrieval with machine intelligence
Authors: Authors: Xiran Zhou, Yi Wen, Honghao Li, Kaiyuan Li, Zhenfeng Shao, Zhigang Yan, Xiao Xie
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Arxiv link: https://arxiv.org/abs/2312.08600
Pdf link: https://arxiv.org/pdf/2312.08600
Abstract Maps are fundamental medium to visualize and represent the real word in a simple and 16 philosophical way. The emergence of the 3rd wave information has made a proportion of maps are available to be generated ubiquitously, which would significantly enrich the dimensions and perspectives to understand the characteristics of the real world. However, a majority of map dataset have never been discovered, acquired and effectively used, and the map data used in many applications might not be completely fitted for the authentic demands of these applications. This challenge is emerged due to the lack of numerous well-labelled benchmark datasets for implementing the deep learning approaches into identifying complicated map content. Thus, we develop a large-scale benchmark dataset that includes well-labelled dataset for map text annotation recognition, map scene classification, map super-resolution reconstruction, and map style transferring. Furthermore, these well-labelled datasets would facilitate the state-of-the-art machine intelligence technologies to conduct map feature detection, map pattern recognition and map content retrieval. We hope our efforts would be useful for AI-enhanced cartographical applications.
Guided Image Restoration via Simultaneous Feature and Image Guided Fusion
Authors: Authors: Xinyi Liu, Qian Zhao, Jie Liang, Hui Zeng, Deyu Meng, Lei Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.08853
Pdf link: https://arxiv.org/pdf/2312.08853
Abstract Guided image restoration (GIR), such as guided depth map super-resolution and pan-sharpening, aims to enhance a target image using guidance information from another image of the same scene. Currently, joint image filtering-inspired deep learning-based methods represent the state-of-the-art for GIR tasks. Those methods either deal with GIR in an end-to-end way by elaborately designing filtering-oriented deep neural network (DNN) modules, focusing on the feature-level fusion of inputs; or explicitly making use of the traditional joint filtering mechanism by parameterizing filtering coefficients with DNNs, working on image-level fusion. The former ones are good at recovering contextual information but tend to lose fine-grained details, while the latter ones can better retain textual information but might lead to content distortions. In this work, to inherit the advantages of both methodologies while mitigating their limitations, we proposed a Simultaneous Feature and Image Guided Fusion (SFIGF) network, that simultaneously considers feature and image-level guided fusion following the guided filter (GF) mechanism. In the feature domain, we connect the cross-attention (CA) with GF, and propose a GF-inspired CA module for better feature-level fusion; in the image domain, we fully explore the GF mechanism and design GF-like structure for better image-level fusion. Since guided fusion is implemented in both feature and image domains, the proposed SFIGF is expected to faithfully reconstruct both contextual and textual information from sources and thus lead to better GIR results. We apply SFIGF to 4 typical GIR tasks, and experimental results on these tasks demonstrate its effectiveness and general availability.
Diffusion-based Blind Text Image Super-Resolution
Authors: Authors: Yuzhe Zhang, Jiawei Zhang, Hao Li, Zhouxia Wang, Luwei Hou, Dongqing Zou, Liheng Bian
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.08886
Pdf link: https://arxiv.org/pdf/2312.08886
Abstract Recovering degraded low-resolution text images is challenging, especially for Chinese text images with complex strokes and severe degradation in real-world scenarios. Ensuring both text fidelity and style realness is crucial for high-quality text image super-resolution. Recently, diffusion models have achieved great success in natural image synthesis and restoration due to their powerful data distribution modeling abilities and data generation capabilities. In this work, we propose an Image Diffusion Model (IDM) to restore text images with realistic styles. For diffusion models, they are not only suitable for modeling realistic image distribution but also appropriate for learning text distribution. Since text prior is important to guarantee the correctness of the restored text structure according to existing arts, we also propose a Text Diffusion Model (TDM) for text recognition which can guide IDM to generate text images with correct structures. We further propose a Mixture of Multi-modality module (MoM) to make these two diffusion models cooperate with each other in all the diffusion steps. Extensive experiments on synthetic and real-world datasets demonstrate that our Diffusion-based Blind Text Image Super-Resolution (DiffTSR) can restore text images with more accurate text structures as well as more realistic appearances simultaneously.

zoq / arxiv-updates

New submissions for Fri, 15 Dec 23 #665

Keyword: sgd

Revisiting the Last-Iterate Convergence of Stochastic Gradient Methods

Contractive error feedback for gradient compression

Keyword: optimization

A global optimization SAR image segmentation model can be easily transformed to a general ROF denoising model

A Study on the Inductance and Thermal Regression and Optimization for Automatic Layout Design of Power Modules

auto-sktime: Automated Time Series Forecasting

Revisiting the Last-Iterate Convergence of Stochastic Gradient Methods

Contractive error feedback for gradient compression

On Searching for Minimal Integer Representation of Undirected Graphs

Efficient-NeRF2NeRF: Streamlining Text-Driven 3D Editing with Multiview Correspondence-Enhanced Diffusion Models

NViST: In the Wild New View Synthesis from a Single Image with Transformers

Federated Learning for Wireless Applications: A Prototype

Omega-Regular Decision Processes

Verification of Neural Reachable Tubes via Scenario Optimization and Conformal Prediction

MaxK-GNN: Towards Theoretical Speed Limits for Accelerating Graph Neural Networks Training

Incomplete Contrastive Multi-View Clustering with High-Confidence Guiding

Gradient Informed Proximal Policy Optimization

Aerial STAR-RIS Empowered MEC: A DRL Approach for Energy Minimization

FAPP: Fast and Adaptive Perception and Planning for UAVs in Dynamic Cluttered Environments

All-to-all reconfigurability with sparse Ising machines: the XORSAT challenge with p-bits

RDARS Empowered Massive MIMO System: Two-Timescale Transceiver Design with Imperfect CSI

CF-NeRF: Camera Parameter Free Neural Radiance Fields with Incremental Learning

Implement services for business scenarios by combining basic emulators

Semantics-Division Duplexing: A Novel Full-Duplex Paradigm

What, How, and When Should Object Detectors Update in Continually Changing Test Domains?

May the Noise be with you: Adversarial Training without Adversarial Examples

Neural Video Fields Editing

SceneWiz3D: Towards Text-guided 3D Scene Composition

Solving Dense Linear Systems Faster than via Preconditioning

Dataset Distillation via Adversarial Prediction Matching

BiPFT: Binary Pre-trained Foundation Transformer with Low-rank Estimation of Binarization Residual Polynomials

Single and Multi-Objective Benchmark Problems Focusing on Human-Powered Aircraft Design

LLMind: Orchestrating AI and IoT with LLMs for Complex Task Execution

Style Injection in Diffusion: A Training-free Approach for Adapting Large-scale Diffusion Models for Style Transfer

Symmetry Breaking and Equivariant Neural Networks

Bayes Net based highbrid Monte Carlo Optimization for Redundant Manipulator

Graph Neural Networks with Diverse Spectral Filtering

Entropy Regularization and Faster Decremental Matching in General Graphs

COMBHelper: A Neural Approach to Reduce Search Space for Graph Combinatorial Problems

Tokenize Anything via Prompting

Living Scenes: Multi-object Relocalization and Reconstruction in Changing 3D Environments

Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D Reconstruction with Transformers

Architecture Singularity Distance Computations for Linear Pentapods

Efficient Online Learning of Contact Force Models for Connector Insertion

Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking

ZeroRF: Fast Sparse View 360° Reconstruction with Zero Pretraining

Keyword: adam

Identifying Planetary Names in Astronomy Papers: A Multi-Step Approach

Keyword: gradient

Accelerating Meta-Learning by Sharing Gradients

A Study on the Inductance and Thermal Regression and Optimization for Automatic Layout Design of Power Modules

Revisiting the Last-Iterate Convergence of Stochastic Gradient Methods

World Models via Policy-Guided Trajectory Diffusion

Contractive error feedback for gradient compression

On Searching for Minimal Integer Representation of Undirected Graphs

MotherNet: A Foundational Hypernetwork for Tabular Classification

MmAP : Multi-modal Alignment Prompt for Cross-domain Multi-task Learning

CLIP-guided Federated Learning on Heterogeneous and Long-Tailed Data

Automated detection of Zika and dengue in Aedes aegypti using neural spiking analysis

Privacy Amplification by Iteration for ADMM with (Strongly) Convex Objective Functions

Gradient Informed Proximal Policy Optimization

A low-rank multipatch isogeometric method based on Tucker tensors

GOEnFusion: Gradient Origin Encodings for 3D Forward Diffusion Models

RDARS Empowered Massive MIMO System: Two-Timescale Transceiver Design with Imperfect CSI

May the Noise be with you: Adversarial Training without Adversarial Examples

High-Dimensional Bayesian Optimisation with Large-Scale Constraints -- An Application to Aeroelastic Tailoring

Solving Dense Linear Systems Faster than via Preconditioning

Dataset Distillation via Adversarial Prediction Matching

EAT: Towards Long-Tailed Out-of-Distribution Detection

Contact-Implicit MPC: Controlling Diverse Quadruped Motions Without Pre-Planned Contact Modes or Trajectories

Topic Bias in Emotion Classification

Keyword: super-resolution

CartoMark: a benchmark dataset for map pattern recognition and 1 map content retrieval with machine intelligence

Guided Image Restoration via Simultaneous Feature and Image Guided Fusion

Diffusion-based Blind Text Image Super-Resolution