New submissions for Tue, 5 Dec 23

Keyword: sgd

AGD: an Auto-switchable Optimizer using Stepwise Gradient Difference for Preconditioning Matrix

Authors: Yun Yue, Zhiling Ye, Jiadi Jiang, Yongchao Liu, Ke Zhang
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2312.01658
Pdf link: https://arxiv.org/pdf/2312.01658
Abstract Adaptive optimizers, such as Adam, have achieved remarkable success in deep learning. A key component of these optimizers is the so-called preconditioning matrix, providing enhanced gradient information and regulating the step size of each gradient direction. In this paper, we propose a novel approach to designing the preconditioning matrix by utilizing the gradient difference between two successive steps as the diagonal elements. These diagonal elements are closely related to the Hessian and can be perceived as an approximation of the inner product between the Hessian row vectors and difference of the adjacent parameter vectors. Additionally, we introduce an auto-switching function that enables the preconditioning matrix to switch dynamically between Stochastic Gradient Descent (SGD) and the adaptive optimizer. Based on these two techniques, we develop a new optimizer named AGD that enhances the generalization performance. We evaluate AGD on public datasets of Natural Language Processing (NLP), Computer Vision (CV), and Recommendation Systems (RecSys). Our experimental results demonstrate that AGD outperforms the state-of-the-art (SOTA) optimizers, achieving highly competitive or significantly better predictive performance. Furthermore, we analyze how AGD is able to switch automatically between SGD and the adaptive optimizer and its actual effects on various scenarios. The code is available at https://github.com/intelligent-machine-learning/dlrover/tree/master/atorch/atorch/optimizers.
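The two ideas in the AGD abstract can be illustrated with a minimal sketch. It is not the authors' implementation (that is in the linked repository): momentum is omitted, and the function name `agd_like_step`, the threshold `delta`, and the `max`-based switching rule are assumptions made for illustration. The diagonal preconditioner is built from the squared difference of successive gradients, and any coordinate whose preconditioner falls below `delta` takes a plain SGD-style step with the fixed rate `lr / delta`.

```python
import numpy as np

def agd_like_step(w, grad, state, lr=0.1, beta2=0.999, delta=1e-3):
    """One simplified AGD-style update (illustrative, no momentum)."""
    prev = state.get("prev_grad")
    # Stepwise gradient difference; on the first step fall back to the gradient.
    s = grad if prev is None else grad - prev
    state["t"] = state.get("t", 0) + 1
    # Exponential moving average of the squared gradient difference.
    state["b"] = beta2 * state.get("b", np.zeros_like(w)) + (1 - beta2) * s * s
    b_hat = state["b"] / (1 - beta2 ** state["t"])  # bias correction
    root = np.sqrt(b_hat)
    # Coordinates below the threshold use the constant denominator delta,
    # which is an SGD step; the rest use the adaptive denominator.
    state["sgd_mask"] = root < delta
    state["prev_grad"] = grad.copy()
    return w - lr * grad / np.maximum(root, delta)

w = np.zeros(2)
state = {}
w = agd_like_step(w, np.array([1e-5, 2.0]), state)
print(w, state["sgd_mask"])
```

In this toy call the first coordinate has a tiny preconditioner and is updated in SGD mode, while the second is updated adaptively.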
Keyword: optimization

RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
Authors: Tianyu Yu, Yuan Yao, Haoye Zhang, Taiwen He, Yifeng Han, Ganqu Cui, Jinyi Hu, Zhiyuan Liu, Hai-Tao Zheng, Maosong Sun, Tat-Seng Chua
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.00849
Pdf link: https://arxiv.org/pdf/2312.00849
Abstract Multimodal Large Language Models (MLLMs) have recently demonstrated impressive capabilities in multimodal understanding, reasoning, and interaction. However, existing MLLMs commonly suffer from serious hallucination problems, generating text that is not factually grounded in the associated images. This makes existing MLLMs untrustworthy and thus impractical in real-world (especially high-stakes) applications. To address the challenge, we present RLHF-V, which enhances MLLM trustworthiness via behavior alignment from fine-grained correctional human feedback. Specifically, RLHF-V collects human preference in the form of segment-level corrections on hallucinations, and performs dense direct preference optimization over the human feedback. Comprehensive experiments on five benchmarks in both automatic and human evaluation show that RLHF-V enables substantially more trustworthy MLLM behaviors with promising data and computation efficiency. Remarkably, using 1.4k annotated data samples, RLHF-V significantly reduces the hallucination rate of the base MLLM by 34.8%, outperforming the concurrent LLaVA-RLHF trained on 10k annotated data. The final model achieves state-of-the-art performance in trustworthiness among open-source MLLMs, and shows better robustness than GPT-4V in preventing hallucinations arising from over-generalization. We open-source our code, model, and data at https://github.com/RLHF-V/RLHF-V.
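For orientation, the standard (non-dense) direct preference optimization loss for a single preference pair can be sketched as below. This is the generic DPO objective, not RLHF-V's dense variant, which per the abstract operates on segment-level corrections; the function name `dpo_loss` and the default `beta` are assumptions for illustration. Inputs are sequence log-probabilities of the preferred (`w`) and rejected (`l`) responses under the policy and a frozen reference model.

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one pair: -log(sigmoid(margin))."""
    # Margin: how much more the policy favors the preferred response
    # than the reference model does, scaled by beta.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(m)) written as log(1 + exp(-m)).
    return math.log1p(math.exp(-margin))

print(dpo_loss(-1.0, -3.0, -2.0, -2.0))  # policy already prefers the corrected text
print(dpo_loss(-3.0, -1.0, -2.0, -2.0))  # policy prefers the hallucinated text
```

The second call yields the larger loss, since the policy ranks the rejected response above the preferred one.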
Refine, Discriminate and Align: Stealing Encoders via Sample-Wise Prototypes and Multi-Relational Extraction
Authors: Shuchi Wu, Chuan Ma, Kang Wei, Xiaogang Xu, Ming Ding, Yuwen Qian, Tao Xiang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2312.00855
Pdf link: https://arxiv.org/pdf/2312.00855
Abstract This paper introduces RDA, a pioneering approach designed to address two primary deficiencies prevalent in previous endeavors aiming at stealing pre-trained encoders: (1) suboptimal performances attributed to biased optimization objectives, and (2) elevated query costs stemming from the end-to-end paradigm that necessitates querying the target encoder every epoch. Specifically, we initially Refine the representations of the target encoder for each training sample, thereby establishing a less biased optimization objective before the steal-training phase. This is accomplished via a sample-wise prototype, which consolidates the target encoder's representations for a given sample's various perspectives. Demanding exponentially fewer queries compared to the end-to-end approach, prototypes can be instantiated to guide subsequent query-free training. For more potent efficacy, we develop a multi-relational extraction loss that trains the surrogate encoder to Discriminate mismatched embedding-prototype pairs while Aligning those matched ones in terms of both amplitude and angle. In this way, the trained surrogate encoder achieves state-of-the-art results across the board in various downstream datasets with limited queries. Moreover, RDA is shown to be robust to multiple widely-used defenses.
Hyperparameter Optimization for Large Language Model Instruction-Tuning
Authors: Christophe Tribes, Sacha Benarroch-Lelong, Peng Lu, Ivan Kobyzev
Subjects: Computation and Language (cs.CL); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2312.00949
Pdf link: https://arxiv.org/pdf/2312.00949
Abstract The fine-tuning of Large Language Models (LLMs) has enabled them to recently achieve milestones in natural language processing applications. The emergence of ever larger LLMs has paved the way for more efficient fine-tuning methods. Among these, the Low-Rank Adaptation (LoRA) method keeps most of the weights of the pre-trained LLM frozen while introducing a low-rank decomposition of the weight matrix, enabling the tuning of only a very small proportion of the network. The performance on downstream tasks of models fine-tuned with LoRA heavily relies on a set of hyperparameters including the rank of the decomposition. In this work, we investigate the choice of these hyperparameters through two main blackbox optimization (BBO) techniques. We examine the whole pipeline of performing fine-tuning and validation on a pre-trained LLM as a blackbox and efficiently explore the space of hyperparameters with the NOMAD algorithm, achieving a boost in performance and human alignment of the tuned model.
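To make the role of the rank hyperparameter concrete, here is a minimal NumPy sketch of a LoRA-style linear layer. The function names and the `alpha / r` scaling (the convention from the original LoRA formulation) are assumptions for illustration and are not taken from this paper's code. The frozen weight `W` is left untouched, and only the low-rank factors `A` (shape `r x d_in`) and `B` (shape `d_out x r`) would be trained.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha):
    """Frozen weight W plus a scaled low-rank update B @ A."""
    r = A.shape[0]  # rank of the decomposition
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

def trainable_fraction(d_out, d_in, r):
    """Share of parameters that are trainable for one adapted matrix."""
    return r * (d_in + d_out) / (d_out * d_in)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))   # frozen pre-trained weight
A = rng.normal(size=(2, 3))   # trainable, rank r = 2
B = np.zeros((4, 2))          # trainable, zero-initialized
x = rng.normal(size=(5, 3))
print(np.allclose(lora_forward(x, W, A, B, alpha=16), x @ W.T))
print(trainable_fraction(1024, 1024, 8))
```

With `B` initialized to zero the adapted layer reproduces the frozen layer exactly, and for a 1024 x 1024 matrix at rank 8 only about 1.6% of the entries are trainable, which is why the rank is a natural target for hyperparameter search.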
Biased Random-Key Genetic Algorithms: A Review
Authors: Mariana A. Londe, Luciana S. Pessoa, Carlos E. Andrade, Mauricio G. C. Resende
Subjects: Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2312.00961
Pdf link: https://arxiv.org/pdf/2312.00961
Abstract This paper is a comprehensive literature review of Biased Random-Key Genetic Algorithms (BRKGA). BRKGA is a metaheuristic that employs random-key-based chromosomes with biased, uniform, and elitist mating strategies in a genetic algorithm framework. The review encompasses over 150 papers with a wide range of applications, including classical combinatorial optimization problems, real-world industrial use cases, and non-orthodox applications such as neural network hyperparameter tuning in machine learning. Scheduling is by far the most prevalent application area in this review, followed by network design and location problems. The most frequent hybridization method employed is local search, and new features aim to increase population diversity. Overall, this survey provides a comprehensive overview of the BRKGA metaheuristic and its applications and highlights important areas for future research.
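The two BRKGA ingredients named in the abstract, random-key chromosomes and biased uniform mating, can be sketched as follows. This is a generic illustration, not code from the review; the function names and the permutation decoder (sorting the keys) are assumptions, with the decoder being one common choice for sequencing problems.

```python
import random

def decode_permutation(keys):
    """Decode a random-key chromosome into a permutation by sorting keys."""
    return sorted(range(len(keys)), key=lambda i: keys[i])

def biased_crossover(elite, non_elite, rho, rng):
    """Uniform crossover that takes each gene from the elite parent
    with probability rho (rho > 0.5 biases toward the elite)."""
    return [e if rng.random() < rho else n for e, n in zip(elite, non_elite)]

rng = random.Random(0)
elite = [0.7, 0.1, 0.4]
other = [0.2, 0.9, 0.5]
child = biased_crossover(elite, other, rho=0.7, rng=rng)
print(decode_permutation(elite), decode_permutation(child))
```

Because every vector of keys in [0, 1) decodes to a valid permutation, crossover never produces an infeasible offspring, which is the main appeal of the random-key encoding.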
Consistent Mesh Diffusion
Authors: Julian Knodt, Xifeng Gao
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2312.00971
Pdf link: https://arxiv.org/pdf/2312.00971
Abstract Given a 3D mesh with a UV parameterization, we introduce a novel approach to generating textures from text prompts. While prior work uses optimization from Text-to-Image Diffusion models to generate textures and geometry, this is slow and requires significant compute resources. Alternatively, there are projection-based approaches that use the same Text-to-Image models to paint images onto a mesh, but they lack consistency at different viewing angles. We propose a method that uses a single Depth-to-Image diffusion network and generates a single consistent texture when rendered on the 3D surface by first unifying the diffusion paths of multiple 2D images, and hoisting that to 3D with MultiDiffusion. We demonstrate our approach on a dataset containing 30 meshes, taking approximately 5 minutes per mesh. To evaluate our approach, we use CLIP-score and Fréchet Inception Distance (FID) to measure the quality of the rendering, and show our improvement over prior work.
Combining Kernelized Autoencoding and Centroid Prediction for Dynamic Multi-objective Optimization
Authors: Zhanglu Hou, Juan Zou, Gan Ruan, Yuan Liu, Yizhang Xia
Subjects: Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2312.00978
Pdf link: https://arxiv.org/pdf/2312.00978
Abstract Evolutionary algorithms face significant challenges when dealing with dynamic multi-objective optimization because Pareto optimal solutions and/or Pareto optimal fronts change. This paper proposes a unified paradigm, which combines the kernelized autoencoding evolutionary search and the centroid-based prediction (denoted by KAEP), for solving dynamic multi-objective optimization problems (DMOPs). Specifically, whenever a change is detected, KAEP reacts effectively to it by generating two subpopulations. The first subpopulation is generated by a simple centroid-based prediction strategy. For the second initial subpopulation, the kernel autoencoder is derived to predict the movement of the Pareto-optimal solutions based on the historical elite solutions. In this way, an initial population is predicted by the proposed combination strategies with good convergence and diversity, which can be effective for solving DMOPs. The performance of our proposed method is compared with five state-of-the-art algorithms on a number of complex benchmark problems. Empirical results fully demonstrate the superiority of our proposed method on most test instances.
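A common form of centroid-based prediction in dynamic multi-objective optimization shifts the current population by the displacement of its centroid between the last two environments. The sketch below shows that generic idea only; the abstract does not spell out KAEP's exact strategy, so the function name and the plain translation rule are assumptions.

```python
import numpy as np

def centroid_predict(pop_prev, pop_curr):
    """Shift the current population by the centroid displacement
    observed between the previous and the current environment."""
    step = pop_curr.mean(axis=0) - pop_prev.mean(axis=0)
    return pop_curr + step

pop_prev = np.array([[0.0, 0.0], [2.0, 2.0]])
pop_curr = np.array([[1.0, 1.0], [3.0, 3.0]])
print(centroid_predict(pop_prev, pop_curr))
```

Here the centroid moved by (1, 1), so each current solution is translated by the same amount to seed the population for the next environment.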
Learning-based Ecological Adaptive Cruise Control of Autonomous Electric Vehicles: A Comparison of ADP, DQN and DDPG Approaches
Authors: Sunwoo Kim, Kwang-Ki K. Kim
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2312.01004
Pdf link: https://arxiv.org/pdf/2312.01004
Abstract This paper presents model-based and model-free learning methods for economic and ecological adaptive cruise control (Eco-ACC) of connected and autonomous electric vehicles. For model-based optimal control of Eco-ACC, we considered longitudinal vehicle dynamics and a quasi-steady-state powertrain model including the physical limits of a commercial electric vehicle. We used adaptive dynamic programming (ADP), in which the value function was trained using data obtained from IPG CarMaker simulations. For real-time implementation, forward multi-step look-ahead prediction and optimization were executed in a receding horizon scheme to maximize the energy efficiency of the electric machine while avoiding rear-end collisions and satisfying the powertrain, speed, and distance-gap constraints. For model-free optimal control of Eco-ACC, we applied two reinforcement learning methods, Deep Q-Network (DQN) and Deep Deterministic Policy Gradient (DDPG), in which deep neural networks were trained in IPG CarMaker simulations. For performance demonstrations, the HWFET, US06, and WLTP Class 3b driving cycles were used to simulate the front vehicle, and the energy consumptions of the host vehicle and front vehicle were compared. In high-fidelity IPG CarMaker simulations, the proposed learning-based Eco-ACC methods demonstrated approximately 3-5% and 10-14% efficiency improvements in highway and city-highway driving scenarios, respectively, compared with the front vehicle. A video of the CarMaker simulation is available at https://youtu.be/DIXzJxMVig8.
Adding Domain Knowledge to Query-Driven Learned Databases
Authors: Peizhi Wu, Ryan Marcus, Zachary G. Ives
Subjects: Databases (cs.DB)
Arxiv link: https://arxiv.org/abs/2312.01025
Pdf link: https://arxiv.org/pdf/2312.01025
Abstract In recent years, learned cardinality estimation has emerged as an alternative to traditional query optimization methods: by training machine learning models over observed query performance, learned cardinality estimation techniques can accurately predict query cardinalities and costs, accounting for skew, correlated predicates, and many other factors that traditional methods struggle to capture. However, query-driven learned cardinality estimators are dependent on sample workloads, requiring vast amounts of labeled queries. Further, we show that state-of-the-art query-driven techniques can make significant and unpredictable errors on queries that are outside the distribution of their training set. We show that these out-of-distribution errors can be mitigated by incorporating the domain knowledge used in traditional query optimizers: constraints on values and cardinalities (e.g., based on key-foreign-key relationships, range predicates, and more generally on inclusion and functional dependencies). We develop methods for semi-supervised query-driven learned query optimization, based on constraints, and we experimentally demonstrate that such techniques can increase a learned query optimizer's accuracy in cardinality estimation, reduce the reliance on massive labeled queries, and improve the robustness of query end-to-end performance.
From Beginner to Expert: Modeling Medical Knowledge into General LLMs
Authors: Qiang Li, Xiaoyan Yang, Haowen Wang, Qin Wang, Lei Liu, Junjie Wang, Yang Zhang, Mingyuan Chu, Sen Hu, Yicheng Chen, Yue Shen, Cong Fan, Wangshu Zhang, Teng Xu, Jinjie Gu, Jing Zheng, Guannan Zhang (Ant Group)
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2312.01040
Pdf link: https://arxiv.org/pdf/2312.01040
Abstract Recently, large language model (LLM) based artificial intelligence (AI) systems have demonstrated remarkable capabilities in natural language understanding and generation. However, these models face a significant challenge when it comes to sensitive applications, such as reasoning over medical knowledge and answering medical questions in a physician-like manner. Prior studies attempted to overcome this challenge by increasing the model size (>100B) to learn more general medical knowledge, while there is still room for improvement in LLMs with smaller-scale model sizes (<100B). In this work, we start from a pre-trained general LLM (AntGLM-10B) and fine-tune it from a medical beginner towards a medical expert (called AntGLM-Med-10B), which leverages a 3-stage optimization procedure, i.e., general medical knowledge injection, medical domain instruction tuning, and specific medical task adaptation. Our contributions are threefold: (1) We specifically investigate how to adapt a pre-trained general LLM to the medical domain, especially for a specific medical task. (2) We collect and construct large-scale medical datasets for each stage of the optimization process. These datasets encompass various data types and tasks, such as question-answering, medical reasoning, multi-choice questions, and medical conversations. (3) Specifically for multi-choice questions in the medical domain, we propose a novel Verification-of-Choice approach for prompt engineering, which significantly enhances the reasoning ability of LLMs. Remarkably, by combining the above approaches, our AntGLM-Med-10B model outperforms most LLMs on PubMedQA, including both general and medical LLMs, even when these LLMs are larger.
Covert Communications in STAR-RIS-Aided Rate-Splitting Multiple Access Systems
Authors: Heng Chang, Hai Yang, Shuobo Xu, Xiyu Pang, Hongwu Liu
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2312.01042
Pdf link: https://arxiv.org/pdf/2312.01042
Abstract In this paper, we investigate covert communications in a simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS)-aided rate-splitting multiple access (RSMA) system. Under the RSMA principles, the messages for the covert user (Bob) and public user (Grace) are converted to the common and private streams at the legitimate transmitter (Alice) to realize downlink transmissions, while the STAR-RIS is deployed not only to aid the public transmissions from Alice to Grace, but also to shield the covert transmissions from Alice to Bob against the warden (Willie). To characterize the covert performance of the considered STAR-RIS-aided RSMA (STAR-RIS-RSMA) system, we derive an analytical expression for the minimum average detection error probability of Willie, based on which a covert rate maximization problem is formulated. To maximize Bob's covert rate while confusing Willie's monitoring, the transmit power allocation, common rate allocation, and STAR-RIS reflection/transmission beamforming are jointly optimized subject to Grace's quality of service (QoS) requirements. The non-convex covert rate maximization problem, which involves highly coupled system parameters, is decoupled into three sub-problems: transmit power allocation, common rate allocation, and STAR-RIS reflection/transmission beamforming. To obtain the rank-one constrained optimal solution for the sub-problem of optimizing the STAR-RIS reflection/transmission beamforming, a penalty-based successive convex approximation scheme is developed. Moreover, an alternative optimization (AO) algorithm is designed to determine the optimal solution for the sub-problem of optimizing the transmit power allocation, while the original problem is overall solved by a new AO algorithm.
Investigating the Surrogate Modeling Capabilities of Continuous Time Echo State Networks
Authors: Saakaar Bhatnagar
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Arxiv link: https://arxiv.org/abs/2312.01056
Pdf link: https://arxiv.org/pdf/2312.01056
Abstract Continuous Time Echo State Networks (CTESN) are a promising yet under-explored surrogate modeling technique for dynamical systems, particularly those governed by stiff Ordinary Differential Equations (ODEs). This paper critically investigates the effects of important hyper-parameters and algorithmic choices on the generalization capability of CTESN surrogates on two benchmark problems governed by Robertson's equations. The method is also used to parametrize the initial conditions of a system of ODEs that realistically model automobile collisions, solving them accurately up to 200 times faster than numerical ODE solvers. The results of this paper demonstrate the ability of CTESN surrogates to accurately predict sharp transients and highly nonlinear system responses, and their utility in speeding up the solution of stiff ODE systems, allowing for their use in diverse applications from accelerated design optimization to digital twins.
A Database System for State Management in Stateful Network Service Function Chains [Vision]
Authors: Zhonghao Yang, Shuhao Zhang
Subjects: Networking and Internet Architecture (cs.NI); Databases (cs.DB)
Arxiv link: https://arxiv.org/abs/2312.01066
Pdf link: https://arxiv.org/pdf/2312.01066
Abstract Network Function Virtualization (NFV) heralds a transformative era in network function deployment, enabling the orchestration of Service Function Chains (SFCs) for delivering complex and dynamic network services. Yet, the development and sustenance of stateful SFCs remain challenging, with intricate demands for usability in SFC development, performance, and execution correctness. In this paper, we present DB4NFV, a database system designed to address these challenges. Central to DB4NFV is the integration of transactional semantics into the entire lifecycle of stateful SFC, a core idea that enhances all aspects of the system. This integration provides an intuitive and well-structured API, which greatly simplifies the development of stateful SFCs. Concurrently, transactional semantics facilitate the optimization of runtime performance by efficiently leveraging modern multicore architectures. Moreover, by encapsulating state operations as transactions, DB4NFV achieves robustness, even at the entire chain level, ensuring reliable operation across varying network conditions. Consequently, DB4NFV marks a substantial forward leap in NFV state management, leveraging transactional semantics to achieve a harmonious blend of usability, efficiency, and robustness, thus facilitating the effective deployment of stateful SFCs in contemporary network infrastructures.
Hybrid Hierarchical DRL Enabled Resource Allocation for Secure Transmission in Multi-IRS-Assisted Sensing-Enhanced Spectrum Sharing Networks
Authors: Lingyi Wang, Wei Wu, Fuhui Zhou, Qihui Wu, Octavia A. Dobre, Tony Q.S. Quek
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2312.01071
Pdf link: https://arxiv.org/pdf/2312.01071
Abstract Secure communications are of paramount importance in spectrum sharing networks due to the allocation and sharing characteristics of spectrum resources. To further explore the potential of intelligent reflective surfaces (IRSs) in enhancing spectrum sharing and secure transmission performance, a multiple intelligent reflection surface (multi-IRS)-assisted sensing-enhanced wideband spectrum sharing network is investigated by considering physical layer security techniques. An intelligent resource allocation scheme based on double deep Q networks (D3QN) algorithm and soft Actor-Critic (SAC) algorithm is proposed to maximize the secure transmission rate of the secondary network by jointly optimizing IRS pairings, subchannel assignment, transmit beamforming of the secondary base station, reflection coefficients of IRSs and the sensing time. To tackle the sparse reward problem caused by a significant amount of reflection elements of multiple IRSs, the method of hierarchical reinforcement learning is exploited. An alternative optimization (AO)-based conventional mathematical scheme is introduced to verify the computational complexity advantage of our proposed intelligent scheme. Simulation results demonstrate the efficiency of our proposed intelligent scheme as well as the superiority of multi-IRS design in enhancing secrecy rate and spectrum utilization. It is shown that inappropriate deployment of IRSs can reduce the security performance with the presence of multiple eavesdroppers (Eves), and the arrangement of IRSs deserves further consideration.
Prior-Aware Robust Beam Alignment for Low-SNR Millimeter-Wave Communications
Authors: Jihun Park, Yongjeong Oh, Jaewon Yun, Seonjung Kim, Yo-Seb Jeon
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2312.01100
Pdf link: https://arxiv.org/pdf/2312.01100
Abstract This paper presents a robust beam alignment technique for millimeter-wave communications in low signal-to-noise ratio (SNR) environments. The core strategy of our technique is to repeatedly transmit the most probable beam candidates to reduce beam misalignment probability induced by noise. Specifically, for a given beam training overhead, both the selection of candidates and the number of repetitions for each beam candidate are optimized based on channel prior information. To achieve this, a deep neural network is employed to learn the prior probability of the optimal beam at each location. The beam misalignment probability is then analyzed based on the channel prior, forming the basis for an optimization problem aimed at minimizing the analyzed beam misalignment probability. A closed-form solution is derived for a special case with two beam candidates, and an efficient algorithm is developed for general cases with multiple beam candidates. Simulation results using the DeepMIMO dataset demonstrate the superior performance of our technique in dynamic low-SNR communication environments when compared to existing beam alignment techniques.
Has Anything Changed? 3D Change Detection by 2D Segmentation Masks
Authors: Aikaterini Adam, Konstantinos Karantzalos, Lazaros Grammatikopoulos, Torsten Sattler
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.01148
Pdf link: https://arxiv.org/pdf/2312.01148
Abstract As capturing devices become common, 3D scans of interior spaces are acquired on a daily basis. Through scene comparison over time, information about objects in the scene and their changes is inferred. This information is important for robots and AR and VR devices, in order to operate in an immersive virtual experience. We thus propose an unsupervised object discovery method that identifies added, moved, or removed objects without any prior knowledge of what objects exist in the scene. We model this problem as a combination of a 3D change detection and a 2D segmentation task. Our algorithm leverages generic 2D segmentation masks to refine an initial but incomplete set of 3D change detections. The initial changes, acquired through render-and-compare, likely correspond to movable objects. The incomplete detections are refined through graph optimization, distilling the information of the 2D segmentation masks in the 3D space. Experiments on the 3Rscan dataset prove that our method outperforms competitive baselines, with SoTA results.
Disjoint Dominating and 2-Dominating Sets in Graphs: Hardness and Approximation results
Authors: Soumyashree Rana, Sounaka Mishra, Bhawani Sankar Panda
Subjects: Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2312.01149
Pdf link: https://arxiv.org/pdf/2312.01149
Abstract A set $D \subseteq V$ of a graph $G=(V, E)$ is a dominating set of $G$ if each vertex $v\in V\setminus D$ is adjacent to at least one vertex in $D,$ whereas a set $D_2\subseteq V$ is a $2$-dominating (double dominating) set of $G$ if each vertex $v\in V \setminus D_2$ is adjacent to at least two vertices in $D_2.$ A graph $G$ is a $DD_2$-graph if there exists a pair ($D, D_2$) of dominating set and $2$-dominating set of $G$ which are disjoint. In this paper, we solve some open problems posed by M. Miotk, J. Topp, and P. Żyliński (Disjoint dominating and 2-dominating sets in graphs, Discrete Optimization, 35:100553, 2020) by giving approximation algorithms for the problem of determining a minimal spanning $DD_2$-graph of minimum size (Min-$DD_2$) with an approximation ratio of $3$; a minimal spanning $DD_2$-graph of maximum size (Max-$DD_2$) with an approximation ratio of $3$; and for the problem of adding minimum number of edges to a graph $G$ to make it a $DD_2$-graph (Min-to-$DD_2$) with an $O(\log n)$ approximation ratio. Furthermore, we prove that Min-$DD_2$ and Max-$DD_2$ are APX-complete for graphs with maximum degree $4$. We also show that Min-$DD_2$ and Max-$DD_2$ are approximable within a factor of $1.8$ and $1.5$ respectively, for any $3$-regular graph. Finally, we show the inapproximability result of Max-Min-to-$DD_2$ for bipartite graphs, that this problem cannot be approximated within $n^{\frac{1}{6}-\varepsilon}$ for any $\varepsilon >0,$ unless P=NP.
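The definitions at the start of this abstract translate directly into a small checker. The sketch below only verifies a given pair $(D, D_2)$; it does not implement any of the paper's approximation algorithms, and the function names and the adjacency-dictionary representation are assumptions for illustration.

```python
def is_k_dominating(adj, S, k):
    """True if every vertex outside S has at least k neighbours in S."""
    S = set(S)
    return all(len(adj[v] & S) >= k for v in adj if v not in S)

def is_dd2_pair(adj, D, D2):
    """True if D is dominating, D2 is 2-dominating, and they are disjoint."""
    return (set(D).isdisjoint(D2)
            and is_k_dominating(adj, D, 1)
            and is_k_dominating(adj, D2, 2))

# 4-cycle 0-1-2-3-0, stored as vertex -> set of neighbours.
c4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(is_dd2_pair(c4, D={1, 3}, D2={0, 2}))  # valid pair, so C4 is a DD_2-graph
print(is_dd2_pair(c4, D={0}, D2={2}))        # {0} does not dominate vertex 2
```

On the 4-cycle, the two opposite vertex pairs witness the $DD_2$ property: each vertex outside $\{0, 2\}$ has both of its neighbours in that set, and $\{1, 3\}$ dominates the rest.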
Pointer Networks Trained Better via Evolutionary Algorithms
Authors: Muyao Zhong, Shengcai Liu, Bingdong Li, Haobo Fu, Chao Qian, Ke Tand, Peng Yang
Subjects: Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2312.01150
Pdf link: https://arxiv.org/pdf/2312.01150
Abstract Pointer Network (PtrNet) is a specific neural network for solving Combinatorial Optimization Problems (COPs). While PtrNets offer real-time feed-forward inference for complex COP instances, the quality of their results tends to be less satisfactory. One possible reason is the lack of global search ability in gradient descent, which is frequently employed in traditional PtrNet training methods including both supervised learning and reinforcement learning. To improve the performance of PtrNet, this paper delves deeply into the advantages of training PtrNet with Evolutionary Algorithms (EAs), which have been widely acknowledged for not easily getting trapped by local optima. Extensive empirical studies based on the Travelling Salesman Problem (TSP) have been conducted. Results demonstrate that PtrNet trained with EA can consistently achieve much better inference results than eight state-of-the-art methods on various problem scales. Compared with gradient descent based PtrNet training methods, EA achieves up to 30.21% improvement in quality of the solution with the same computational time. With this advantage, this paper is able to report, for the first time, the results of solving 1000-dimensional TSPs by training a PtrNet on the same dimensionality, which strongly suggests that scaling up the training instances is needed to improve the performance of PtrNet on solving higher-dimensional COPs.
Recent Advances in Scalable Energy-Efficient and Trustworthy Spiking Neural networks: from Algorithms to Technology
Authors: Souvik Kundu, Rui-Jie Zhu, Akhilesh Jaiswal, Peter A. Beerel
Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2312.01213
Pdf link: https://arxiv.org/pdf/2312.01213
Abstract Neuromorphic computing and, in particular, spiking neural networks (SNNs) have become an attractive alternative to deep neural networks for a broad range of signal processing applications, processing static and/or temporal inputs from different sensory modalities, including audio and vision sensors. In this paper, we start with a description of recent advances in algorithmic and optimization innovations to efficiently train and scale low-latency, and energy-efficient spiking neural networks (SNNs) for complex machine learning applications. We then discuss the recent efforts in algorithm-architecture co-design that explores the inherent trade-offs between achieving high energy-efficiency and low latency while still providing high accuracy and trustworthiness. We then describe the underlying hardware that has been developed to leverage such algorithmic innovations in an efficient way. In particular, we describe a hybrid method to integrate significant portions of the model's computation within both memory components as well as the sensor itself. Finally, we discuss the potential path forward for research in building deployable SNN systems identifying key challenges in the algorithm-hardware-application co-design space with an emphasis on trustworthiness.
RNb-NeuS: Reflectance and Normal-based Multi-View 3D Reconstruction
Authors: Baptiste Brument, Robin Bruneau, Yvain Quéau, Jean Mélou, François Bernard Lauze, Jean-Denis Durou, Lilian Calvet
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.01215
Pdf link: https://arxiv.org/pdf/2312.01215
Abstract This paper introduces a versatile paradigm for integrating multi-view reflectance and normal maps acquired through photometric stereo. Our approach employs a pixel-wise joint re-parameterization of reflectance and normal, considering them as a vector of radiances rendered under simulated, varying illumination. This re-parameterization enables the seamless integration of reflectance and normal maps as input data in neural volume rendering-based 3D reconstruction while preserving a single optimization objective. In contrast, recent multi-view photometric stereo (MVPS) methods depend on multiple, potentially conflicting objectives. Despite its apparent simplicity, our proposed approach outperforms state-of-the-art approaches in MVPS benchmarks across F-score, Chamfer distance, and mean angular error metrics. Notably, it significantly improves the detailed 3D reconstruction of areas with high curvature or low visibility.
Strategic Data Revocation in Federated Unlearning
Authors: Authors: Ningning Ding, Ermin Wei, Randall Berry
Subjects: Computer Science and Game Theory (cs.GT)
Arxiv link: https://arxiv.org/abs/2312.01235
Pdf link: https://arxiv.org/pdf/2312.01235
Abstract By allowing users to erase their data's impact on federated learning models, federated unlearning protects users' right to be forgotten and data privacy. Despite a burgeoning body of research on federated unlearning's technical feasibility, there is a paucity of literature investigating the considerations behind users' requests for data revocation. This paper proposes a non-cooperative game framework to study users' data revocation strategies in federated unlearning. We prove the existence of a Nash equilibrium. However, users' best response strategies are coupled via model performance and unlearning costs, which makes the equilibrium computation challenging. We obtain the Nash equilibrium by establishing its equivalence with a much simpler auxiliary optimization problem. We also summarize users' multi-dimensional attributes into a single-dimensional metric and derive the closed-form characterization of an equilibrium, when users' unlearning costs are negligible. Moreover, we compare the cases of allowing and forbidding partial data revocation in federated unlearning. Interestingly, the results reveal that allowing partial revocation does not necessarily increase users' data contributions or payoffs due to the game structure. Additionally, we demonstrate that positive externalities may exist between users' data revocation decisions when users incur unlearning costs, while this is not the case when their unlearning costs are negligible.
PPAD-membership for Problems with Exact Rational Solutions: A General Approach via Convex Optimization
Authors: Authors: Aris Filos-Ratsikas, Kristoffer Arnsfelt Hansen, Kasper Høgh, Alexandros Hollender
Subjects: Computer Science and Game Theory (cs.GT); Computational Complexity (cs.CC)
Arxiv link: https://arxiv.org/abs/2312.01237
Pdf link: https://arxiv.org/pdf/2312.01237
Abstract We introduce a general technique for proving membership of search problems with exact rational solutions in PPAD, one of the most well-known classes containing total search problems with polynomial-time verifiable solutions. In particular, we construct a "pseudogate", coined the linear-OPT-gate, which can be used as a "plug-and-play" component in a piecewise-linear (PL) arithmetic circuit, as an integral component of the "Linear-FIXP" equivalent definition of the class. The linear-OPT-gate can solve several convex optimization programs, including quadratic programs, which often appear organically in the simplest existence proofs for these problems. This effectively transforms existence proofs to PPAD-membership proofs, and consequently establishes the existence of solutions described by rational numbers. Using the linear-OPT-gate, we are able to significantly simplify and generalize almost all known PPAD-membership proofs for finding exact solutions in the application domains of game theory, competitive markets, auto-bidding auctions, and fair division, as well as to obtain new PPAD-membership results for problems in these domains.
Rethinking PGD Attack: Is Sign Function Necessary?
Authors: Authors: Junjie Yang, Tianlong Chen, Xuxi Chen, Zhangyang Wang, Yingbin Liang
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2312.01260
Pdf link: https://arxiv.org/pdf/2312.01260
Abstract Neural networks have demonstrated success in various domains, yet their performance can be significantly degraded by even a small input perturbation. Consequently, the construction of such perturbations, known as adversarial attacks, has gained significant attention, many of which fall within "white-box" scenarios where we have full access to the neural network. Existing attack algorithms, such as the projected gradient descent (PGD), commonly take the sign function on the raw gradient before updating adversarial inputs, thereby neglecting gradient magnitude information. In this paper, we present a theoretical analysis of how such a sign-based update algorithm influences step-wise attack performance, as well as its caveat. We also interpret why previous attempts at directly using raw gradients failed. Based on that, we further propose a new raw gradient descent (RGD) algorithm that eliminates the use of sign. Specifically, we convert the constrained optimization problem into an unconstrained one, by introducing a new hidden variable of non-clipped perturbation that can move beyond the constraint. The effectiveness of the proposed RGD algorithm has been demonstrated extensively in experiments, outperforming PGD and other competitors in various settings, without incurring any additional computational overhead. The code is available at https://github.com/JunjieYang97/RGD.
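As a point of reference for the sign-based update mentioned in the abstract above, the following is a minimal sketch of one standard L-infinity PGD step. It is not the authors' RGD algorithm and is not taken from their repository; the function name `pgd_sign_step`, the toy gradient, and the values of `step_size` and `eps` are assumptions chosen for illustration.

```python
import numpy as np

def pgd_sign_step(x_adv, x_clean, grad, step_size, eps):
    """One standard L-infinity PGD step (illustrative, hypothetical helper).

    Moves every coordinate by step_size in the direction sign(grad),
    then projects back into the eps-ball around the clean input.
    """
    x_new = x_adv + step_size * np.sign(grad)
    return np.clip(x_new, x_clean - eps, x_clean + eps)

# Toy gradient whose coordinates differ in magnitude by orders of magnitude.
x_clean = np.array([0.2, 0.5, 0.8])
grad = np.array([3.0, -0.001, 0.0])

x_adv = pgd_sign_step(x_clean, x_clean, grad, step_size=0.01, eps=0.03)
perturbation = x_adv - x_clean
print(perturbation)
```

In this toy case the first two coordinates move by the same amount even though their gradient magnitudes differ by a factor of 3000, which is the loss of magnitude information the abstract refers to.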
Mendata: A Framework to Purify Manipulated Training Data
Authors: Authors: Zonghao Huang, Neil Gong, Michael K. Reiter
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.01281
Pdf link: https://arxiv.org/pdf/2312.01281
Abstract Untrusted data used to train a model might have been manipulated to endow the learned model with hidden properties that the data contributor might later exploit. Data purification aims to remove such manipulations prior to training the model. We propose Mendata, a novel framework to purify manipulated training data. Starting from a small reference dataset in which a large majority of the inputs are clean, Mendata perturbs the training inputs so that they retain their utility but are distributed similarly (as measured by Wasserstein distance) to the reference data, thereby eliminating hidden properties from the learned model. A key challenge is how to find such perturbations, which we address by formulating a min-max optimization problem and developing a two-step method to iteratively solve it. We demonstrate the effectiveness of Mendata by applying it to defeat state-of-the-art data poisoning and data tracing techniques.
Joint Beam Scheduling and Power Optimization for Beam Hopping LEO Satellite Systems
Authors: Authors: Shuang Zheng, Xing Zhang, Peng Wang, Wenbo Wang
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2312.01292
Pdf link: https://arxiv.org/pdf/2312.01292
Abstract Low earth orbit (LEO) satellite communications can provide ubiquitous and reliable services, making it an essential part of the Internet of Everything network. Beam hopping (BH) is an emerging technology for effectively addressing the issue of low resource utilization caused by the non-uniform spatio-temporal distribution of traffic demands. However, how to allocate multi-dimensional resources in a timely and efficient way for the highly dynamic LEO satellite systems remains a challenge. This paper proposes a joint beam scheduling and power optimization beam hopping (JBSPO-BH) algorithm considering the differences in the geographic distribution of sink nodes. The JBSPO-BH algorithm decouples the original problem into two sub-problems. The beam scheduling problem is modelled as a potential game, and the Nash equilibrium (NE) point is obtained as the beam scheduling strategy. Moreover, the penalty function interior point method is applied to optimize the power allocation. Simulation results show that the JBSPO-BH algorithm has low time complexity and fast convergence and achieves better performance both in throughput and fairness. Compared with greedy-based BH, greedy-based BH with the power optimization, round-robin BH, Max-SINR BH and satellite resource allocation algorithm, the throughput of the proposed algorithm is improved by 44.99%, 20.79%, 156.06%, 15.39% and 8.17%, respectively.
Two-stage dynamic creative optimization under sparse ambiguous samples for e-commerce advertising
Authors: Authors: Guandong Li, Xian Yang
Subjects: Multimedia (cs.MM)
Arxiv link: https://arxiv.org/abs/2312.01295
Pdf link: https://arxiv.org/pdf/2312.01295
Abstract Ad creative is one of the main mediums for e-commerce advertising. In our approach we decouple this dynamic creative optimization into two stages, a cascaded structure that can trade off between effectiveness and efficiency. In the first stage, we train an automatic creative optimization architecture based on autoco to simulate complex interactions between creative elements. Although we obtained the ranking of different creatives under a sku, because we bucketed and merged historical data according to periods, this confuses the ctr diversity of the same ad creatives on different days and weakens the ability to separate ambiguous samples. Therefore, we propose a transformer-based rerank model. With the help of the rank model, we propose a distillation method to learn the relative order of ideas and extract the ranking knowledge to guide the rerank learning. The creative order soft labels under each sku are generated by the rank model to alleviate the dilemma that a large number of under-represented creatives cannot obtain real labels. Through the knowledge diffusion of rerank, the ambiguous samples are associated with the positive and negative samples. Rerank and autoco are cascaded to output the estimated value of the synthetic ad image. In the second stage, we designed a bandit model, and the bandit selected one of the output ads of the first stage for timely delivery. Experimental results show that our method can outperform competing baselines in terms of sctr. Online A/B testing shows that our method improves ctr by 10% compared to the baseline.
Tradeoff of age-of-information and power under reliability constraint for short-packet communication with block-length adaptation
Authors: Authors: Sudarsanan A. K., Vineeth B. S., Chandra R. Murthy
Subjects: Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2312.01364
Pdf link: https://arxiv.org/pdf/2312.01364
Abstract In applications such as remote estimation and monitoring, update packets are transmitted by power-constrained devices using short-packet codes over wireless networks. Therefore, networks need to be end-to-end optimized using information freshness metrics such as age of information under transmit power and reliability constraints to ensure support for such applications. For short-packet coding, modelling and understanding the effect of block codeword length on transmit power and other performance metrics is important. To understand the above optimization for short-packet coding, we consider the optimal tradeoff problem between age of information and transmit power under reliability constraints for a short-packet point-to-point communication model with an exogenous packet generation process. In contrast to prior work, we consider scheduling policies that can possibly adapt the block-length or transmission time of short-packet codes in order to achieve the optimal tradeoff. We characterize the tradeoff using a semi-Markov decision process formulation. We also obtain analytical upper bounds as well as numerical, analytical, and asymptotic lower bounds on the optimal tradeoff. We show that in certain regimes, such as high reliability and high packet generation rate, non-adaptive scheduling policies (fixed transmission time policies) are close-to-optimal. Furthermore, in a high-power or in a low-power regime, non-adaptive as well as state-independent randomized scheduling policies are order-optimal. These results are corroborated by numerical and simulation experiments. The tradeoff is then characterized for a wireless point-to-point channel with block fading as well as for other packet generation models (including an age-dependent packet generation model).
Regret Optimality of GP-UCB
Authors: Authors: Wenjia Wang, Xiaowei Zhang, Lu Zou
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2312.01386
Pdf link: https://arxiv.org/pdf/2312.01386
Abstract Gaussian Process Upper Confidence Bound (GP-UCB) is one of the most popular methods for optimizing black-box functions with noisy observations, due to its simple structure and superior performance. Its empirical successes lead to a natural, yet unresolved question: Is GP-UCB regret optimal? In this paper, we offer the first generally affirmative answer to this important open question in the Bayesian optimization literature. We establish new upper bounds on both the simple and cumulative regret of GP-UCB when the objective function to optimize admits a certain smoothness property. These upper bounds match the known minimax lower bounds (up to logarithmic factors independent of the feasible region's dimensionality) for optimizing functions with the same smoothness. Intriguingly, our findings indicate that, with the same level of exploration, GP-UCB can simultaneously achieve optimality in both simple and cumulative regret. The crux of our analysis hinges on a refined uniform error bound for online estimation of functions in reproducing kernel Hilbert spaces. This error bound, which we derive from empirical process theory, is of independent interest, and its potential applications may reach beyond the scope of this study.
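For readers unfamiliar with GP-UCB, the following is a minimal sketch of the generic selection rule on a 1-D grid: pick the candidate maximizing posterior mean plus a multiple of posterior standard deviation. It illustrates the textbook rule only, not the analysis or any exploration schedule from the paper above; the RBF kernel, its length scale, the noise variance, the fixed `beta`, and the toy objective `f` are all assumptions made for the example.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel with unit prior variance (assumed choice)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_query, noise_var=1e-2):
    """Posterior mean and standard deviation of a zero-mean GP at x_query."""
    K = rbf_kernel(x_obs, x_obs) + noise_var * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_query)
    mean = K_s.T @ np.linalg.solve(K, y_obs)
    # Prior variance is 1 for this kernel; subtract the explained part.
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))

def gp_ucb_next_point(x_obs, y_obs, candidates, beta=4.0):
    """Return the candidate maximizing mean + sqrt(beta) * std."""
    mean, std = gp_posterior(x_obs, y_obs, candidates)
    return candidates[np.argmax(mean + np.sqrt(beta) * std)]

# Toy noisy black-box objective (hypothetical).
rng = np.random.default_rng(0)
f = lambda x: np.sin(6.0 * x)
candidates = np.linspace(0.0, 1.0, 101)

x_obs = np.array([0.1, 0.9])
y_obs = f(x_obs) + 0.1 * rng.standard_normal(2)
for _ in range(10):
    x_next = gp_ucb_next_point(x_obs, y_obs, candidates)
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, f(x_next) + 0.1 * rng.standard_normal())
```

A fixed `beta` keeps the sketch short; in the GP-UCB literature the exploration weight is typically a schedule that varies with the round index.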
Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective
Authors: Authors: Can Jin, Tianjin Huang, Yihua Zhang, Mykola Pechenizkiy, Sijia Liu, Shiwei Liu, Tianlong Chen
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.01397
Pdf link: https://arxiv.org/pdf/2312.01397
Abstract The rapid development of large-scale deep learning models questions the affordability of hardware platforms, which necessitates pruning to reduce their computational and memory footprints. Sparse neural networks, as the product, have demonstrated numerous favorable benefits like low complexity, undamaged generalization, etc. Most of the prominent pruning strategies are invented from a model-centric perspective, focusing on searching and preserving crucial weights by analyzing network topologies. However, the role of data and its interplay with model-centric pruning has remained relatively unexplored. In this research, we introduce a novel data-model co-design perspective: to promote superior weight sparsity by learning important model topology and adequate input data in a synergetic manner. Specifically, customized Visual Prompts are mounted to upgrade neural Network sparsification in our proposed VPNs framework. As a pioneering effort, this paper conducts systematic investigations about the impact of different visual prompts on model pruning and suggests an effective joint optimization approach. Extensive experiments with 3 network architectures and 8 datasets evidence the substantial performance improvements from VPNs over existing state-of-the-art pruning algorithms. Furthermore, we find that subnetworks discovered by VPNs from pre-trained models enjoy better transferability across diverse downstream scenarios. These insights shed light on new promising possibilities of data-model co-designs for vision model sparsification.
Context-Enhanced Relational Operators with Vector Embeddings
Authors: Authors: Viktor Sanca, Manos Chatzakis, Anastasia Ailamaki
Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.01476
Pdf link: https://arxiv.org/pdf/2312.01476
Abstract Collecting data, extracting value, and combining insights from relational and context-rich multi-modal sources in data processing pipelines presents a challenge for traditional relational DBMS. While relational operators allow declarative and optimizable query specification, they are limited to data transformations unsuitable for capturing or analyzing context. On the other hand, representation learning models can map context-rich data into embeddings, allowing machine-automated context processing but requiring imperative data transformation integration with the analytical query. To bridge this dichotomy, we present a context-enhanced relational join and introduce an embedding operator composable with relational operators. This enables hybrid relational and context-rich vector data processing, with algebraic equivalences compatible with relational algebra and corresponding logical and physical optimizations. We investigate model-operator interaction with vector data processing and study the characteristics of the E-join operator. Using an example of string embeddings, we demonstrate enabling hybrid context-enhanced processing on relational join operators with vector embeddings. The importance of holistic optimization, from logical to physical, is demonstrated in an order of magnitude execution time improvement.
Towards Decentralized Task Offloading and Resource Allocation in User-Centric Mobile Edge Computing
Authors: Authors: Langtian Qin, Hancheng Lu, Yuang Chen, Baolin Chong, Feng Wu
Subjects: Systems and Control (eess.SY); Distributed, Parallel, and Cluster Computing (cs.DC); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2312.01499
Pdf link: https://arxiv.org/pdf/2312.01499
Abstract In the traditional cellular-based mobile edge computing (MEC), users at the edge of the cell are prone to suffer severe inter-cell interference and signal attenuation, leading to low throughput and even transmission interruptions. Such an edge effect severely obstructs offloading of tasks to MEC servers. To address this issue, we propose user-centric mobile edge computing (UCMEC), a novel MEC architecture integrating user-centric transmission, which can ensure high throughput and reliable communication for task offloading. Then, we formulate an optimization problem with joint consideration of task offloading, power control, and computing resource allocation in UCMEC, aiming at obtaining the optimal performance in terms of long-term average total delay. To solve the intractable problem, we propose two decentralized joint optimization schemes based on multi-agent deep reinforcement learning (MADRL) and convex optimization, which consider both cooperation and non-cooperation among network nodes. Simulation results demonstrate that the proposed schemes in UCMEC can significantly improve the uplink transmission rate by at most 343.56% and reduce the long-term average total delay by at most 45.57% compared to traditional cellular-based MEC.
Learning Channel Capacity with Neural Mutual Information Estimator Based on Message Importance Measure
Authors: Authors: Zhefan Li, Rui She, Pingyi Fan, Chenghui Peng, Khaled B. Letaief
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2312.01546
Pdf link: https://arxiv.org/pdf/2312.01546
Abstract Channel capacity estimation plays a crucial role in beyond 5G intelligent communications. Despite its significance, this task is challenging for a majority of channels, especially for the complex channels not modeled as the well-known typical ones. Recently, neural networks have been used in mutual information estimation and optimization. They are particularly considered as efficient tools for learning channel capacity. In this paper, we propose a cooperative framework to simultaneously estimate channel capacity and design the optimal codebook. First, we will leverage MIM-based GAN, a novel form of generative adversarial network (GAN) using message importance measure (MIM) as the information distance, into mutual information estimation, and develop a novel method, named MIM-based mutual information estimator (MMIE). Then, we design a generalized cooperative framework for channel capacity learning, in which a generator is regarded as an encoder producing the channel input, while a discriminator is the mutual information estimator that assesses the performance of the generator. Through the adversarial training, the generator automatically learns the optimal codebook and the discriminator estimates the channel capacity. Numerical experiments will demonstrate that compared with several conventional estimators, the MMIE achieves state-of-the-art performance in terms of accuracy and stability.
A Challenging Multimodal Video Summary: Simultaneously Extracting and Generating Keyframe-Caption Pairs from Video
Authors: Authors: Keito Kudo, Haruki Nagasawa, Jun Suzuki, Nobuyuki Shimizu
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.01575
Pdf link: https://arxiv.org/pdf/2312.01575
Abstract This paper proposes a practical multimodal video summarization task setting and a dataset to train and evaluate the task. The target task involves summarizing a given video into a predefined number of keyframe-caption pairs and displaying them in a listable format to grasp the video content quickly. This task aims to extract crucial scenes from the video in the form of images (keyframes) and generate corresponding captions explaining each keyframe's situation. This task is useful as a practical application and presents a highly challenging problem worthy of study. Specifically, achieving simultaneous optimization of the keyframe selection performance and caption quality necessitates careful consideration of the mutual dependence on both preceding and subsequent keyframes and captions. To facilitate subsequent research in this field, we also construct a dataset by expanding upon existing datasets and propose an evaluation framework. Furthermore, we develop two baseline systems and report their respective performance.
OCGEC: One-class Graph Embedding Classification for DNN Backdoor Detection
Authors: Authors: Haoyu Jiang, Haiyang Yu, Nan Li, Ping Yi
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2312.01585
Pdf link: https://arxiv.org/pdf/2312.01585
Abstract Deep neural networks (DNNs) have been found vulnerable to backdoor attacks, raising security concerns about their deployment in mission-critical applications. There are various approaches to detect backdoor attacks; however, they all make certain assumptions about the target attack to be detected and require equal and huge numbers of clean and backdoor samples for training, which renders these detection methods quite limiting in real-world circumstances. This study proposes a novel one-class classification framework called One-class Graph Embedding Classification (OCGEC) that uses GNNs for model-level backdoor detection with only a small amount of clean data. First, we train thousands of tiny models as raw datasets from a small number of clean datasets. Following that, we design an ingenious model-to-graph method for converting the model's structural details and weight features into graph data. We then pre-train a generative self-supervised graph autoencoder (GAE) to better learn the features of benign models in order to detect backdoor models without knowing the attack strategy. After that, we dynamically combine the GAE and one-class classifier optimization goals to form classification boundaries that distinguish backdoor models from benign models. Our OCGEC combines the powerful representation capabilities of graph neural networks with the utility of one-class classification techniques in the field of anomaly detection. In comparison to other baselines, it achieves AUC scores of more than 98% on a number of tasks, which far exceeds existing methods for detection even when they rely on a huge number of positive and negative samples. Our pioneering application of graphic scenarios for generic backdoor detection can provide new insights that can be used to improve other backdoor defense tasks. Code is available at https://github.com/jhy549/OCGEC.
Interference-Constrained Scheduling of a Cognitive Multi-hop Underwater Acoustic Network
Authors: Authors: Chen Peng, Urbashi Mitra
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2312.01625
Pdf link: https://arxiv.org/pdf/2312.01625
Abstract This paper investigates optimal scheduling for a cognitive multi-hop underwater acoustic network with a primary user interference constraint. The network consists of primary and secondary users, with multi-hop transmission adopted for both user types to provide reliable communications. Critical characteristics of underwater acoustic channels, including significant propagation delay, distance-and-frequency dependent attenuation, half-duplex modem, and inter-hop interference, are taken into account in the design and analysis. In particular, time-slot allocation is found to be more effective than frequency-slot allocation due to the underwater channel model. The goal of the network scheduling problem is to maximize the end-to-end throughput of the overall system while limiting the throughput loss of primary users. Both centralized and decentralized approaches are considered. A Partially Observable Markov Decision Process (POMDP) framework is applied to formulate the optimization problem, and an optimal dynamic programming algorithm is derived. However, the optimal dynamic programming solution is computationally intractable. Key properties are shown for the objective function, enabling the design of approximate schemes with significant complexity reduction. Numerical results show that the proposed schemes significantly increase system throughput while maintaining the primary throughput loss constraint. Under certain traffic conditions, the throughput gain over frequency-slot allocation schemes can be as high as 50%.
An End-to-End Network Pruning Pipeline with Sparsity Enforcement
Authors: Authors: Evan Dogariu
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.01653
Pdf link: https://arxiv.org/pdf/2312.01653
Abstract Neural networks have emerged as a powerful tool for solving complex tasks across various domains, but their increasing size and computational requirements have posed significant challenges in deploying them on resource-constrained devices. Neural network sparsification, and in particular pruning, has emerged as an effective technique to alleviate these challenges by reducing model size, computational complexity, and memory footprint while maintaining competitive performance. However, many pruning pipelines modify the standard training pipeline at only a single stage, if at all. In this work, we look to develop an end-to-end training pipeline that befits neural network pruning and sparsification at all stages of training. To do so, we make use of nonstandard model parameter initialization, pre-pruning training methodologies, and post-pruning training optimizations. We conduct experiments utilizing combinations of these methods, in addition to different techniques used in the pruning step, and find that our combined pipeline can achieve significant gains over current state-of-the-art approaches to neural network sparsification.
Optimizing Bus Travel: A Novel Approach to Feature Mining with P-KMEANS and P-LDA Algorithms
Authors: Authors: Hongjie Liu, Haotian Shi, Sicheng Fu, Tengfei Yuan, Xinhuan Zhang, Hongzhe Xu, Bin Ran
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.01687
Pdf link: https://arxiv.org/pdf/2312.01687
Abstract Customizing services for bus travel can bolster its attractiveness, optimize usage, alleviate traffic congestion, and diminish carbon emissions. This potential is realized by harnessing recent advancements in positioning communication facilities, the Internet of Things, and artificial intelligence for feature mining in public transportation. However, the inherent complexities of disorganized and unstructured public transportation data introduce substantial challenges to travel feature extraction. This study presents a bus travel feature extraction method rooted in Point of Interest (POI) data, employing enhanced P-KMEANS and P-LDA algorithms to overcome these limitations. While the KMEANS algorithm adeptly segments passenger travel paths into distinct clusters, its outcomes can be influenced by the initial K value. On the other hand, Latent Dirichlet Allocation (LDA) excels at feature identification and probabilistic interpretations yet encounters difficulties with feature intermingling and nuanced sub-feature interactions. Incorporating the POI dimension enhances our understanding of travel behavior, aligning it more closely with passenger attributes and facilitating easier data analysis. By incorporating POI data, our refined P-KMEANS and P-LDA algorithms grant a holistic insight into travel behaviors and attributes, effectively mitigating the limitations above. Consequently, this POI-centric algorithm effectively amalgamates diverse POI attributes, delineates varied travel contexts, and imparts probabilistic metrics to feature properties. Our method successfully mines the diverse aspects of bus travel, such as age, occupation, gender, sports, cost, safety, and personality traits. It effectively calculates relationships between individual travel behaviors and assigns explanatory and evaluative probabilities to POI labels, thereby enhancing bus travel optimization.
Risk-Controlling Model Selection via Guided Bayesian Optimization
Authors: Authors: Bracha Laufer-Goldshtein, Adam Fisch, Regina Barzilay, Tommi Jaakkola
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Methodology (stat.ME); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2312.01692
Pdf link: https://arxiv.org/pdf/2312.01692
Abstract Adjustable hyperparameters of machine learning models typically impact various key trade-offs such as accuracy, fairness, robustness, or inference cost. Our goal in this paper is to find a configuration that adheres to user-specified limits on certain risks while being useful with respect to other conflicting metrics. We solve this by combining Bayesian Optimization (BO) with rigorous risk-controlling procedures, where our core idea is to steer BO towards an efficient testing strategy. Our BO method identifies a set of Pareto optimal configurations residing in a designated region of interest. The resulting candidates are statistically verified and the best-performing configuration is selected with guaranteed risk levels. We demonstrate the effectiveness of our approach on a range of tasks with multiple desiderata, including low error rates, equitable predictions, handling spurious correlations, managing rate and distortion in generative models, and reducing computational costs.
Joint Task Partitioning and Parallel Scheduling in Device-Assisted Mobile Edge Networks
Authors: Authors: Yang Li, Xinlei Ge, Bo Lei, Xing Zhang, Wenbo Wang
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2312.01751
Pdf link: https://arxiv.org/pdf/2312.01751
Abstract With the development of the Internet of Things (IoT), certain IoT devices have the capability to not only accomplish their own tasks but also simultaneously assist other resource-constrained devices. Therefore, this paper considers a device-assisted mobile edge computing system that leverages auxiliary IoT devices to alleviate the computational burden on the edge computing server and enhance the overall system performance. In this study, computationally intensive tasks are decomposed into multiple partitions, and each task partition can be processed in parallel on an IoT device or the edge server. The objective of this research is to develop an efficient online algorithm that addresses the joint optimization of task partitioning and parallel scheduling under time-varying system states, posing challenges to conventional numerical optimization methods. To address these challenges, a framework called online task partitioning action and parallel scheduling policy generation (OTPPS) is proposed, which is based on deep reinforcement learning (DRL). Specifically, the framework leverages a deep neural network (DNN) to learn the optimal partitioning action for each task by mapping input states. Furthermore, it is demonstrated that the remaining parallel scheduling problem exhibits NP-hard complexity when considering a specific task partitioning action. To address this subproblem, a fair and delay-minimized task scheduling (FDMTS) algorithm is designed. Extensive evaluation results demonstrate that OTPPS achieves near-optimal average delay performance and consistently high fairness levels in various environmental states compared to other baseline schemes.
Two-stage optimized unified adversarial patch for attacking visible-infrared cross-modal detectors in the physical world
Authors: Authors: Chengyin Hu, Weiwen Shi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.01789
Pdf link: https://arxiv.org/pdf/2312.01789
Abstract Currently, many studies have addressed security concerns related to visible and infrared detectors independently. In practical scenarios, utilizing cross-modal detectors for tasks proves more reliable than relying on single-modal detectors. Despite this, there is a lack of comprehensive security evaluations for cross-modal detectors. While existing research has explored the feasibility of attacks against cross-modal detectors, the implementation of a robust attack remains unaddressed. This work introduces the Two-stage Optimized Unified Adversarial Patch (TOUAP) designed for performing attacks against visible-infrared cross-modal detectors in real-world, black-box settings. The TOUAP employs a two-stage optimization process: first, PSO optimizes an irregular polygonal infrared patch to attack the infrared detector; second, the color QR code is optimized, and the shape information of the infrared patch from the first stage is used as a mask. The resulting irregular polygon visible modal patch executes an attack on the visible detector. Through extensive experiments conducted in both digital and physical environments, we validate the effectiveness and robustness of the proposed method. As the TOUAP surpasses baseline performance, we argue that it deserves wider attention.
Using Bayesian Optimization to Design Time Step Size Controllers with Application to Modified Patankar--Runge--Kutta Methods
Authors: Authors: Thomas Izgin, Hendrik Ranocha
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2312.01796
Pdf link: https://arxiv.org/pdf/2312.01796
Abstract Modified Patankar--Runge--Kutta (MPRK) methods are linearly implicit time integration schemes developed to preserve positivity and a linear invariant such as the total mass in chemical reactions. MPRK methods are naturally equipped with embedded schemes yielding a local error estimate similar to Runge--Kutta pairs. To design good time step size controllers using these error estimates, we propose to use Bayesian optimization. In particular, we design a novel objective function that captures important properties such as tolerance convergence and computational stability. We apply our new approach to several MPRK schemes and controllers based on digital signal processing, extending classical PI and PID controllers. We demonstrate that the optimization process yields controllers that are at least as good as the best controllers chosen from a wide range of suggestions available for classical explicit and implicit time integration methods.
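For readers unfamiliar with the controllers mentioned in the abstract above, the following is a minimal sketch of a classical PI step size controller driven by an embedded error estimate. It is not taken from the paper: the function name, the gains `k_i` and `k_p`, and the clamping limits `fac_min` and `fac_max` are illustrative assumptions. Gains of this kind are the sort of parameters an optimizer such as Bayesian optimization could tune.

```python
def pi_step_size(h, err, err_prev, tol, k_i=0.3, k_p=0.4,
                 fac_min=0.2, fac_max=5.0):
    """Propose the next step size from the current and previous error estimates.

    h        : current step size
    err      : local error estimate of the current step (must be > 0)
    err_prev : local error estimate of the previous step (must be > 0)
    tol      : target tolerance
    k_i, k_p : integral and proportional gains (hypothetical values)
    """
    # Integral part reacts to the current error, proportional part to its trend.
    factor = (tol / err) ** k_i * (err_prev / err) ** k_p
    # Clamp the change so one bad estimate cannot collapse or explode the step.
    return h * min(fac_max, max(fac_min, factor))


if __name__ == "__main__":
    h = 0.1
    # Error below tolerance: the controller proposes a larger step.
    print(pi_step_size(h, err=1e-5, err_prev=1e-5, tol=1e-3))
    # Error above tolerance: the controller proposes a smaller step.
    print(pi_step_size(h, err=1e-1, err_prev=1e-2, tol=1e-3))
```

When the error equals the tolerance and is not changing, the factor is 1 and the step size is kept.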
Energy-based Potential Games for Joint Motion Forecasting and Control
Authors: Authors: Christopher Diehl, Tobias Klosek, Martin Krüger, Nils Murzyn, Timo Osterburg, Torsten Bertram
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Multiagent Systems (cs.MA); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2312.01811
Pdf link: https://arxiv.org/pdf/2312.01811
Abstract This work uses game theory as a mathematical framework to address interaction modeling in multi-agent motion forecasting and control. Despite its interpretability, applying game theory to real-world robotics, like automated driving, faces challenges such as unknown game parameters. To tackle these, we establish a connection between differential games, optimal control, and energy-based models, demonstrating how existing approaches can be unified under our proposed Energy-based Potential Game formulation. Building upon this, we introduce a new end-to-end learning application that combines neural networks for game-parameter inference with a differentiable game-theoretic optimization layer, acting as an inductive bias. The analysis provides empirical evidence that the game-theoretic layer adds interpretability and improves the predictive performance of various neural network backbones using two simulations and two real-world driving datasets.
Class Symbolic Regression: Gotta Fit 'Em All
Authors: Authors: Wassim Tenachi, Rodrigo Ibata, Thibaut L. François, Foivos I. Diakogiannis
Subjects: Machine Learning (cs.LG); Astrophysics of Galaxies (astro-ph.GA); Instrumentation and Methods for Astrophysics (astro-ph.IM); Computational Physics (physics.comp-ph)
Arxiv link: https://arxiv.org/abs/2312.01816
Pdf link: https://arxiv.org/pdf/2312.01816
Abstract We introduce "Class Symbolic Regression", a first framework for automatically finding a single analytical functional form that accurately fits multiple datasets, each governed by its own (possibly) unique set of fitting parameters. This hierarchical framework leverages the common constraint that all the members of a single class of physical phenomena follow a common governing law. Our approach extends the capabilities of our earlier Physical Symbolic Optimization ($\Phi$-SO) framework for Symbolic Regression, which integrates dimensional analysis constraints and deep reinforcement learning for symbolic analytical function discovery from data. We demonstrate the efficacy of this novel approach by applying it to a panel of synthetic toy case datasets and showcase its practical utility for astrophysics by successfully extracting an analytic galaxy potential from a set of simulated orbits approximating stellar streams.
SPECRUN: The Danger of Speculative Runahead Execution in Processors
Authors: Authors: Chaoqun Shen, Gang Qu, Jiliang Zhang
Subjects: Hardware Architecture (cs.AR)
Arxiv link: https://arxiv.org/abs/2312.01832
Pdf link: https://arxiv.org/pdf/2312.01832
Abstract Runahead execution is a continuously evolving microarchitectural technique for processor performance. This paper introduces the first transient execution attack on runahead execution, called SPECRUN, which exploits unresolved branch prediction during runahead execution. We show that SPECRUN eliminates the limitation on the number of transient instructions posed by the reorder buffer size, enhancing the exploitability and harmfulness of the attack. We concretely demonstrate a proof-of-concept attack that leaks secrets from a victim process, validate the merit of SPECRUN, and design a secure runahead execution scheme. This paper highlights the need to consider the security of potential optimization techniques before implementing them in a processor.
TCP Slice: A semi-distributed TCP algorithm for Delay-constrained Applications
Authors: Authors: Dibbendu Roy, Goutam Das
Subjects: Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2312.01869
Pdf link: https://arxiv.org/pdf/2312.01869
Abstract The TCP congestion control protocol serves as the cornerstone of reliable internet communication. However, as new applications require more specific guarantees regarding data rate and delay, network management must adapt. Thus, service providers are shifting from decentralized to centralized control of the network using a software-defined networking (SDN) controller. The SDN controller classifies applications and allocates logically separate resources, called slices, over the physical network. We propose TCP Slice, a congestion control algorithm that meets specific delay and bandwidth guarantees. Obtaining closed-form delay bounds for a client is challenging due to dependencies on other clients and their traffic stochasticity. We use network calculus to derive the client's delay bound and incorporate it as a constraint in the Network Utility Maximization problem. We solve the resulting optimization using dual decomposition and obtain a semi-distributed TCP protocol that can be implemented with the help of an SDN controller and the use of an Explicit Congestion Notification (ECN) bit. Additionally, we propose a proactive approach for congestion control using a digital twin. TCP Slice represents a significant step towards accommodating evolving internet traffic patterns and the need for better network management in the face of increasing application diversity.
CaRL: Cascade Reinforcement Learning with State Space Splitting for O-RAN based Traffic Steering
Authors: Authors: Chuanneng Sun, Yu Zhou, Gueyoung Jung, Tuyen Xuan Tran, Dario Pompili
Subjects: Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2312.01970
Pdf link: https://arxiv.org/pdf/2312.01970
Abstract The Open Radio Access Network (O-RAN) architecture empowers intelligent and automated optimization of the RAN through applications deployed on the RAN Intelligent Controller (RIC) platform, enabling capabilities beyond what is achievable with traditional RAN solutions. Within this paradigm, Traffic Steering (TS) emerges as a pivotal RIC application that focuses on optimizing cell-level mobility settings in near-real-time, aiming to significantly improve network spectral efficiency. In this paper, we design a novel TS algorithm based on a Cascade Reinforcement Learning (CaRL) framework. We propose state space factorization and policy decomposition to reduce the need for large models and well-labeled datasets. For each sub-state space, an RL sub-policy will be trained to learn an optimized mapping onto the action space. To apply CaRL on new network regions, we propose a knowledge transfer approach to initialize a new sub-policy based on knowledge learned by the trained policies. To evaluate CaRL, we build a data-driven and scalable RIC digital twin (DT) that is modeled using important real-world data, including network configuration, user geo-distribution, and traffic demand, among others, from a tier-1 mobile operator in the US. We evaluate CaRL on two DT scenarios representing two network clusters in two different cities and compare its performance with the business-as-usual (BAU) policy and other competing optimization approaches using heuristic and Q-table algorithms. Benchmarking results show that CaRL performs the best and improves the average cluster-aggregated downlink throughput over the BAU policy by 24% and 18% in these two scenarios, respectively.
Tuning of Online Feedback Optimization for setpoint tracking in centrifugal compressors
Authors: Authors: Marta Zagorowska, Lukas Ortmann, Alisa Rupenyan, Mehmet Mercangoez, Lars Imsland
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2312.01996
Pdf link: https://arxiv.org/pdf/2312.01996
Abstract Online Feedback Optimization (OFO) controllers steer a system to its optimal operating point by treating optimization algorithms as auxiliary dynamic systems. Implementation of OFO controllers requires setting the parameters of the optimization algorithm so that it converges, which is a challenge because the convergence of the optimization algorithm is often decoupled from the performance of the controlled system. OFO controllers are also typically designed to ensure steady-state tracking by fixing the sampling time to be longer than the time constants of the system. In this paper, we first quantify the impact of OFO parameters and the sampling time on the tracking error and number of oscillations of the controlled system, showing that adjusting them allows good tracking without reaching steady states. We then propose a tuning method for the sampling time of the OFO controller together with the parameters to allow tracking fast trajectories while reducing oscillations. We validate the proposed tuning approach in a pressure controller in a centrifugal compressor, tracking trajectories faster than the time needed to reach the steady state by the compressor. The results of the validation confirm that simultaneous tuning of the sampling time and the parameters of OFO yields up to 87% better tracking performance than manual tuning based on steady state.
Optimal Data Generation in Multi-Dimensional Parameter Spaces, using Bayesian Optimization
Authors: Authors: M. R. Mahani, Igor A. Nechepurenko, Yasmin Rahimof, Andreas Wicht
Subjects: Machine Learning (cs.LG); Applied Physics (physics.app-ph); Computational Physics (physics.comp-ph)
Arxiv link: https://arxiv.org/abs/2312.02012
Pdf link: https://arxiv.org/pdf/2312.02012
Abstract Acquiring a substantial number of data points for training accurate machine learning (ML) models is a big challenge in scientific fields where data collection is resource-intensive. Here, we propose a novel approach for constructing a minimal yet highly informative database for training ML models in complex multi-dimensional parameter spaces. To achieve this, we mimic the underlying relation between the output and input parameters using Gaussian process regression (GPR). Using a set of known data, GPR provides predictive means and standard deviation for the unknown data. Given the predicted standard deviation by GPR, we select data points using Bayesian optimization to obtain an efficient database for training ML models. We compare the performance of ML models trained on databases obtained through this method, with databases obtained using traditional approaches. Our results demonstrate that the ML models trained on the database obtained using Bayesian optimization approach consistently outperform the other two databases, achieving high accuracy with a significantly smaller number of data points. Our work contributes to the resource-efficient collection of data in high-dimensional complex parameter spaces, to achieve high precision machine learning predictions.
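The selection loop described in the abstract above can be illustrated with a small, dependency-free sketch: fit a Gaussian process to the known points, compute the predictive standard deviation at each candidate, and pick the candidate with the largest value. This is an assumption-laden toy, not the authors' method: the RBF kernel, its length scale, the noise term, the one-dimensional inputs, and the pure "maximum standard deviation" acquisition rule are all choices made here for illustration.

```python
import math


def rbf(a, b, length=0.3):
    """Squared-exponential kernel on scalars (length scale is an assumption)."""
    return math.exp(-((a - b) ** 2) / (2 * length ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def gp_posterior(xs, ys, x_new, noise=1e-6):
    """Return the GP predictive mean and standard deviation at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    mean = sum(k * w for k, w in zip(k_star, solve(K, ys)))
    var = rbf(x_new, x_new) - sum(k * v for k, v in zip(k_star, solve(K, k_star)))
    return mean, math.sqrt(max(var, 0.0))  # clamp tiny negative round-off


def next_point(xs, ys, candidates):
    """Pick the candidate where the model is least certain."""
    return max(candidates, key=lambda c: gp_posterior(xs, ys, c)[1])


if __name__ == "__main__":
    xs, ys = [0.0, 1.0], [0.0, 1.0]
    print(next_point(xs, ys, [0.05, 0.5, 0.95]))
```

In practice a library Gaussian process implementation would replace the hand-written linear solve; the sketch only shows how the predicted standard deviation drives which point is added to the database next.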
Distributed Optimization with Feasible Set Privacy
Authors: Authors: Shreya Meel, Sennur Ulukus
Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2312.02112
Pdf link: https://arxiv.org/pdf/2312.02112
Abstract We consider the setup of a constrained optimization problem with two agents $E_1$ and $E_2$ who jointly wish to learn the optimal solution set while keeping their feasible sets $\mathcal{P}_1$ and $\mathcal{P}_2$ private from each other. The objective function $f$ is globally known and each feasible set is a collection of points from a global alphabet. We adopt a sequential symmetric private information retrieval (SPIR) framework where one of the agents (say $E_1$) privately checks in $\mathcal{P}_2$, the presence of candidate solutions of the problem constrained to $\mathcal{P}_1$ only, while learning no further information on $\mathcal{P}_2$ than the solution alone. Further, we extract an information theoretically private threshold PSI (ThPSI) protocol from our scheme and characterize its download cost. We show that, compared to privately acquiring the feasible set $\mathcal{P}_1\cap \mathcal{P}_2$ using an SPIR-based private set intersection (PSI) protocol, and finding the optimum, our scheme is better as it incurs less information leakage and less download cost than the former. Over all possible uniform mappings of $f$ to a fixed range of values, our scheme outperforms the former with a high probability.
SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM
Authors: Authors: Nikhil Keetha, Jay Karhade, Krishna Murthy Jatavallabhula, Gengshan Yang, Sebastian Scherer, Deva Ramanan, Jonathon Luiten
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2312.02126
Pdf link: https://arxiv.org/pdf/2312.02126
Abstract Dense simultaneous localization and mapping (SLAM) is pivotal for embodied scene understanding. Recent work has shown that 3D Gaussians enable high-quality reconstruction and real-time rendering of scenes using multiple posed cameras. In this light, we show for the first time that representing a scene by 3D Gaussians can enable dense SLAM using a single unposed monocular RGB-D camera. Our method, SplaTAM, addresses the limitations of prior radiance field-based representations by providing fast rendering and optimization, the ability to determine if areas have been previously mapped, and structured map expansion by adding more Gaussians. We employ an online tracking and mapping pipeline while tailoring it to specifically use an underlying Gaussian representation and silhouette-guided optimization via differentiable rendering. Extensive experiments show that SplaTAM achieves up to 2X state-of-the-art performance in camera pose estimation, map construction, and novel-view synthesis, demonstrating its superiority over existing approaches, while allowing real-time rendering of a high-resolution dense 3D map.
GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians
Authors: Authors: Liangxiao Hu, Hongwen Zhang, Yuxiang Zhang, Boyao Zhou, Boning Liu, Shengping Zhang, Liqiang Nie
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.02134
Pdf link: https://arxiv.org/pdf/2312.02134
Abstract We present GaussianAvatar, an efficient approach to creating realistic human avatars with dynamic 3D appearances from a single video. We start by introducing animatable 3D Gaussians to explicitly represent humans in various poses and clothing styles. Such an explicit and animatable representation can fuse 3D appearances more efficiently and consistently from 2D observations. Our representation is further augmented with dynamic properties to support pose-dependent appearance modeling, where a dynamic appearance network along with an optimizable feature tensor is designed to learn the motion-to-appearance mapping. Moreover, by leveraging the differentiable motion condition, our method enables a joint optimization of motions and appearances during avatar modeling, which helps to tackle the long-standing issue of inaccurate motion estimation in monocular settings. The efficacy of GaussianAvatar is validated on both the public dataset and our collected dataset, demonstrating its superior performance in terms of appearance quality and rendering efficiency.
iMatching: Imperative Correspondence Learning
Authors: Authors: Zitong Zhan, Dasong Gao, Yun-Jou Lin, Youjie Xia, Chen Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.02141
Pdf link: https://arxiv.org/pdf/2312.02141
Abstract Learning feature correspondence is a foundational task in computer vision, holding immense importance for downstream applications such as visual odometry and 3D reconstruction. Despite recent progress in data-driven models, feature correspondence learning is still limited by the lack of accurate per-pixel correspondence labels. To overcome this difficulty, we introduce a new self-supervised scheme, imperative learning (IL), for training feature correspondence. It enables correspondence learning on arbitrary uninterrupted videos without any camera pose or depth labels, heralding a new era for self-supervised correspondence learning. Specifically, we formulated the problem of correspondence learning as a bilevel optimization, which takes the reprojection error from bundle adjustment as a supervisory signal for the model. To avoid large memory and computation overhead, we leverage the stationary point to effectively back-propagate the implicit gradients through bundle adjustment. Through extensive experiments, we demonstrate superior performance on tasks including feature matching and pose estimation, in which we obtained an average of 30% accuracy gain over the state-of-the-art matching models.
Optimizing Camera Configurations for Multi-View Pedestrian Detection
Authors: Authors: Yunzhong Hou, Xingjian Leng, Tom Gedeon, Liang Zheng
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.02144
Pdf link: https://arxiv.org/pdf/2312.02144
Abstract Jointly considering multiple camera views (multi-view) is very effective for pedestrian detection under occlusion. For such multi-view systems, it is critical to have well-designed camera configurations, including camera locations, directions, and fields-of-view (FoVs). Usually, these configurations are crafted based on human experience or heuristics. In this work, we present a novel solution that features a transformer-based camera configuration generator. Using reinforcement learning, this generator autonomously explores vast combinations within the action space and searches for configurations that give the highest detection accuracy according to the training dataset. The generator learns advanced techniques like maximizing coverage, minimizing occlusion, and promoting collaboration. Across multiple simulation scenarios, the configurations generated by our transformer-based model consistently outperform random search, heuristic-based methods, and configurations designed by human experts, shedding light on future camera layout optimization.
GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis
Authors: Authors: Shunyuan Zheng, Boyao Zhou, Ruizhi Shao, Boning Liu, Shengping Zhang, Liqiang Nie, Yebin Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.02155
Pdf link: https://arxiv.org/pdf/2312.02155
Abstract We present a new approach, termed GPS-Gaussian, for synthesizing novel views of a character in a real-time manner. The proposed method enables 2K-resolution rendering under a sparse-view camera setting. Unlike the original Gaussian Splatting or neural implicit rendering methods that necessitate per-subject optimizations, we introduce Gaussian parameter maps defined on the source views and directly regress Gaussian Splatting properties for instant novel view synthesis without any fine-tuning or optimization. To this end, we train our Gaussian parameter regression module on a large amount of human scan data, jointly with a depth estimation module to lift 2D parameter maps to 3D space. The proposed framework is fully differentiable, and experiments on several datasets demonstrate that our method outperforms state-of-the-art methods while also exceeding them in rendering speed.
Mesh-Guided Neural Implicit Field Editing
Authors: Authors: Can Wang, Mingming He, Menglei Chai, Dongdong Chen, Jing Liao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.02157
Pdf link: https://arxiv.org/pdf/2312.02157
Abstract Neural implicit fields have emerged as a powerful 3D representation for reconstructing and rendering photo-realistic views, yet they possess limited editability. Conversely, explicit 3D representations, such as polygonal meshes, offer ease of editing but may not be as suitable for rendering high-quality novel views. To harness the strengths of both representations, we propose a new approach that employs a mesh as a guiding mechanism in editing the neural radiance field. We first introduce a differentiable method using marching tetrahedra for polygonal mesh extraction from the neural implicit field and then design a differentiable color extractor to assign colors obtained from the volume renderings to this extracted mesh. This differentiable colored mesh allows gradient back-propagation from the explicit mesh to the implicit fields, empowering users to easily manipulate the geometry and color of neural implicit fields. To enhance user control from coarse-grained to fine-grained levels, we introduce an octree-based structure into its optimization. This structure prioritizes the edited regions and the surface part, making our method achieve fine-grained edits to the neural implicit field and accommodate various user modifications, including object additions, component removals, specific area deformations, and adjustments to local and global colors. Through extensive experiments involving diverse scenes and editing operations, we have demonstrated the capabilities and effectiveness of our method. Our project page is: https://cassiepython.github.io/MNeuEdit/
Keyword: adam

AGD: an Auto-switchable Optimizer using Stepwise Gradient Difference for Preconditioning Matrix
Authors: Authors: Yun Yue, Zhiling Ye, Jiadi Jiang, Yongchao Liu, Ke Zhang
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2312.01658
Pdf link: https://arxiv.org/pdf/2312.01658
Abstract Adaptive optimizers, such as Adam, have achieved remarkable success in deep learning. A key component of these optimizers is the so-called preconditioning matrix, providing enhanced gradient information and regulating the step size of each gradient direction. In this paper, we propose a novel approach to designing the preconditioning matrix by utilizing the gradient difference between two successive steps as the diagonal elements. These diagonal elements are closely related to the Hessian and can be perceived as an approximation of the inner product between the Hessian row vectors and difference of the adjacent parameter vectors. Additionally, we introduce an auto-switching function that enables the preconditioning matrix to switch dynamically between Stochastic Gradient Descent (SGD) and the adaptive optimizer. Based on these two techniques, we develop a new optimizer named AGD that enhances the generalization performance. We evaluate AGD on public datasets of Natural Language Processing (NLP), Computer Vision (CV), and Recommendation Systems (RecSys). Our experimental results demonstrate that AGD outperforms the state-of-the-art (SOTA) optimizers, achieving highly competitive or significantly better predictive performance. Furthermore, we analyze how AGD is able to switch automatically between SGD and the adaptive optimizer and its actual effects on various scenarios. The code is available at https://github.com/intelligent-machine-learning/dlrover/tree/master/atorch/atorch/optimizers.
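The two ideas in the abstract above, a diagonal preconditioner built from the difference of successive gradients and a switch between SGD-like and adaptive behavior, can be illustrated with a simplified sketch. This is not the authors' implementation (their code is at the repository linked above). The function names, the use of bias-corrected momentum in the difference, the hyperparameter defaults, and the `max(sqrt(b), delta)` form of the switch are assumptions made here for illustration.

```python
import math


def new_state(n):
    """Fresh optimizer state for n parameters."""
    return {"t": 0, "m": [0.0] * n, "b": [0.0] * n, "m_hat": [0.0] * n}


def agd_like_step(w, g, state, lr=1e-3, beta1=0.9, beta2=0.999, delta=1e-5):
    """One update of a simplified gradient-difference optimizer (illustrative)."""
    state["t"] += 1
    t = state["t"]
    m_prev_hat = state["m_hat"]

    # Bias-corrected moving average of the gradient.
    state["m"] = [beta1 * m + (1 - beta1) * gi for m, gi in zip(state["m"], g)]
    m_hat = [m / (1 - beta1 ** t) for m in state["m"]]

    # Stepwise difference; on the first step there is no previous value,
    # so the gradient estimate itself is used.
    s = m_hat if t == 1 else [a - b for a, b in zip(m_hat, m_prev_hat)]

    # Moving average of the squared difference gives the diagonal preconditioner.
    state["b"] = [beta2 * b + (1 - beta2) * si * si
                  for b, si in zip(state["b"], s)]
    b_hat = [b / (1 - beta2 ** t) for b in state["b"]]
    state["m_hat"] = m_hat

    # Switch: where sqrt(b_hat) < delta the denominator is the constant delta,
    # so the update is momentum SGD with step lr / delta; elsewhere it adapts.
    denom = [max(math.sqrt(b), delta) for b in b_hat]
    return [wi - lr * mi / d for wi, mi, d in zip(w, m_hat, denom)]


if __name__ == "__main__":
    state = new_state(2)
    w = [1.0, -2.0]
    grad = [2.0 * w[0], 20.0 * w[1]]  # gradient of w0^2 + 10 * w1^2
    print(agd_like_step(w, grad, state, lr=0.1, delta=1e-2))
```

The threshold `delta` decides, per coordinate, which regime applies: coordinates whose gradient barely changes between steps fall back to the SGD-like update.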
Keyword: gradient

Learning-based Ecological Adaptive Cruise Control of Autonomous Electric Vehicles: A Comparison of ADP, DQN and DDPG Approaches
Authors: Authors: Sunwoo Kim, Kwang-Ki K. Kim
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2312.01004
Pdf link: https://arxiv.org/pdf/2312.01004
Abstract This paper presents model-based and model-free learning methods for economic and ecological adaptive cruise control (Eco-ACC) of connected and autonomous electric vehicles. For model-based optimal control of Eco-ACC, we considered longitudinal vehicle dynamics and a quasi-steady-state powertrain model including the physical limits of a commercial electric vehicle. We used adaptive dynamic programming (ADP), in which the value function was trained using data obtained from IPG CarMaker simulations. For real-time implementation, forward multi-step look-ahead prediction and optimization were executed in a receding horizon scheme to maximize the energy efficiency of the electric machine while avoiding rear-end collisions and satisfying the powertrain, speed, and distance-gap constraints. For model-free optimal control of Eco-ACC, we applied two reinforcement learning methods, Deep Q-Network (DQN) and Deep Deterministic Policy Gradient (DDPG), in which deep neural networks were trained in IPG CarMaker simulations. For performance demonstrations, the HWFET, US06, and WLTP Class 3b driving cycles were used to simulate the front vehicle, and the energy consumptions of the host vehicle and front vehicle were compared. In high-fidelity IPG CarMaker simulations, the proposed learning-based Eco-ACC methods demonstrated approximately 3-5% and 10-14% efficiency improvements in highway and city-highway driving scenarios, respectively, compared with the front vehicle. A video of the CarMaker simulation is available at https://youtu.be/DIXzJxMVig8.
PROFL: A Privacy-Preserving Federated Learning Method with Stringent Defense Against Poisoning Attacks
Authors: Authors: Yisheng Zhong, Li-Ping Wang
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.01045
Pdf link: https://arxiv.org/pdf/2312.01045
Abstract Federated Learning (FL) faces two major issues: privacy leakage and poisoning attacks, which may seriously undermine the reliability and security of the system. Overcoming them simultaneously poses a great challenge. This is because privacy protection policies prohibit access to users' local gradients to avoid privacy leakage, while Byzantine-robust methods necessitate access to these gradients to defend against poisoning attacks. To address these problems, we propose a novel privacy-preserving Byzantine-robust FL framework, PROFL. PROFL is based on the two-trapdoor additively homomorphic encryption algorithm and blinding techniques to ensure the data privacy of the entire FL process. During the defense process, PROFL first utilizes a secure Multi-Krum algorithm to remove malicious gradients at the user level. Then, according to the Pauta criterion, we propose a statistic-based privacy-preserving defense algorithm to eliminate outlier interference at the feature level and resist impersonation poisoning attacks with stronger concealment. Detailed theoretical analysis proves the security and efficiency of the proposed method. We conducted extensive experiments on two benchmark datasets, and PROFL improved accuracy by 39% to 75% across different attack settings compared to similar privacy-preserving robust methods, demonstrating its significant advantage in robustness.
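As background for the user-level defense named in the abstract above, here is a minimal sketch of plain Multi-Krum on unencrypted gradients. It is not PROFL's secure variant, which operates under homomorphic encryption. The function name and the parameters `f` (assumed number of malicious clients) and `m` (number of gradients to keep) are illustrative.

```python
def multi_krum(grads, f, m):
    """Average the m gradients with the lowest Krum scores.

    grads : list of equal-length lists of floats, one per client
    f     : assumed upper bound on the number of malicious clients
    m     : number of gradients to keep
    Returns (averaged gradient, sorted indices of the kept clients).
    """
    n = len(grads)
    if n - f - 2 < 1:
        raise ValueError("Multi-Krum needs n >= f + 3 clients")

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Krum score: sum of squared distances to the n - f - 2 closest other
    # gradients. Outliers sit far from everyone and get large scores.
    scores = []
    for i in range(n):
        dists = sorted(sqdist(grads[i], grads[j]) for j in range(n) if j != i)
        scores.append(sum(dists[: n - f - 2]))

    keep = sorted(range(n), key=lambda i: scores[i])[:m]
    dim = len(grads[0])
    avg = [sum(grads[i][k] for i in keep) / len(keep) for k in range(dim)]
    return avg, sorted(keep)


if __name__ == "__main__":
    client_grads = [
        [1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.05],
        [50.0, -50.0],  # a poisoned update
    ]
    avg, kept = multi_krum(client_grads, f=1, m=3)
    print(kept, avg)
```

The poisoned update is far from every honest gradient, so its score is large and it is excluded before averaging.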
Spectrum-driven Mixed-frequency Network for Hyperspectral Salient Object Detection
Authors: Authors: Peifu Liu, Tingfa Xu, Huan Chen, Shiyun Zhou, Haolin Qin, Jianan Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.01060
Pdf link: https://arxiv.org/pdf/2312.01060
Abstract Hyperspectral salient object detection (HSOD) aims to detect spectrally salient objects in hyperspectral images (HSIs). However, existing methods inadequately utilize spectral information by either converting HSIs into false-color images or converging neural networks with clustering. We propose a novel approach that fully leverages the spectral characteristics by extracting two distinct frequency components from the spectrum: low-frequency Spectral Saliency and high-frequency Spectral Edge. The Spectral Saliency approximates the region of salient objects, while the Spectral Edge captures edge information of salient objects. These two complementary components, crucial for HSOD, are derived by computing from the inter-layer spectral angular distance of the Gaussian pyramid and the intra-neighborhood spectral angular gradients, respectively. To effectively utilize this dual-frequency information, we introduce a novel lightweight Spectrum-driven Mixed-frequency Network (SMN). SMN incorporates two parameter-free plug-and-play operators, namely Spectral Saliency Generator and Spectral Edge Operator, to extract the Spectral Saliency and Spectral Edge components from the input HSI independently. Subsequently, the Mixed-frequency Attention module, comprised of two frequency-dependent heads, intelligently combines the embedded features of edge and saliency information, resulting in a mixed-frequency feature representation. Furthermore, a saliency-edge-aware decoder progressively scales up the mixed-frequency feature while preserving rich detail and saliency information for accurate salient object prediction. Extensive experiments conducted on the HS-SOD benchmark and our custom dataset HSOD-BIT demonstrate that our SMN outperforms state-of-the-art methods regarding HSOD performance. Code and dataset will be available at https://github.com/laprf/SMN.
Pointer Networks Trained Better via Evolutionary Algorithms
Authors: Muyao Zhong, Shengcai Liu, Bingdong Li, Haobo Fu, Chao Qian, Ke Tand, Peng Yang
Subjects: Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2312.01150
Pdf link: https://arxiv.org/pdf/2312.01150
Abstract Pointer Network (PtrNet) is a neural network architecture for solving Combinatorial Optimization Problems (COPs). While PtrNets offer real-time feed-forward inference for complex COP instances, the quality of their results tends to be less satisfactory. One possible reason is the lack of global search ability in gradient descent, which is frequently employed in traditional PtrNet training methods, including both supervised learning and reinforcement learning. To improve the performance of PtrNet, this paper investigates the advantages of training PtrNet with Evolutionary Algorithms (EAs), which are widely acknowledged for not easily getting trapped in local optima. Extensive empirical studies based on the Travelling Salesman Problem (TSP) have been conducted. Results demonstrate that PtrNet trained with EA consistently produces much better inference results than eight state-of-the-art methods on various problem scales. Compared with gradient-descent-based PtrNet training methods, EA achieves up to 30.21% improvement in solution quality with the same computational time. With this advantage, this paper is able to report, for the first time, the results of solving 1000-dimensional TSPs by training a PtrNet on the same dimensionality, which strongly suggests that scaling up the training instances is needed to improve the performance of PtrNet on higher-dimensional COPs.
Efficient Expansion and Gradient Based Task Inference for Replay Free Incremental Learning
Authors: Soumya Roy, Vinay K Verma, Deepak Gupta
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2312.01188
Pdf link: https://arxiv.org/pdf/2312.01188
Abstract This paper proposes a simple but highly efficient expansion-based model for continual learning. The recent feature transformation, masking and factorization-based methods are efficient, but they grow the model only over the global or shared parameter. Therefore, these approaches do not fully utilize the previously learned information because the same task-specific parameter forgets the earlier knowledge. Thus, these approaches show limited transfer learning ability. Moreover, most of these models have constant parameter growth for all tasks, irrespective of the task complexity. Our work proposes a simple filter and channel expansion based method that grows the model over the previous task parameters and not just over the global parameter. Therefore, it fully utilizes all the previously learned information without forgetting, which results in better knowledge transfer. The growth rate in our proposed model is a function of task complexity; therefore, for a simple task, the model has a smaller parameter growth, while for complex tasks, the model requires more parameters to adapt to the current task. Recent expansion-based models show promising results for task incremental learning (TIL). However, for class incremental learning (CIL), prediction of the task id is a crucial challenge; hence, their results degrade rapidly as the number of tasks increases. In this work, we propose a robust task prediction method that leverages entropy-weighted data augmentations and the model's gradient using pseudo labels. We evaluate our model on various datasets and architectures in the TIL, CIL and generative continual learning settings. The proposed approach shows state-of-the-art results in all these settings. Our extensive ablation studies show the efficacy of the proposed components.
Rethinking PGD Attack: Is Sign Function Necessary?
Authors: Junjie Yang, Tianlong Chen, Xuxi Chen, Zhangyang Wang, Yingbin Liang
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2312.01260
Pdf link: https://arxiv.org/pdf/2312.01260
Abstract Neural networks have demonstrated success in various domains, yet their performance can be significantly degraded by even a small input perturbation. Consequently, the construction of such perturbations, known as adversarial attacks, has gained significant attention, many of which fall within "white-box" scenarios where we have full access to the neural network. Existing attack algorithms, such as projected gradient descent (PGD), commonly apply the sign function to the raw gradient before updating adversarial inputs, thereby neglecting gradient magnitude information. In this paper, we present a theoretical analysis of how such a sign-based update algorithm influences step-wise attack performance, as well as its caveats. We also explain why previous attempts at directly using raw gradients failed. Based on that, we further propose a new raw gradient descent (RGD) algorithm that eliminates the use of sign. Specifically, we convert the constrained optimization problem into an unconstrained one by introducing a new hidden variable of non-clipped perturbation that can move beyond the constraint. The effectiveness of the proposed RGD algorithm has been demonstrated extensively in experiments, outperforming PGD and other competitors in various settings, without incurring any additional computational overhead. The code is available at https://github.com/JunjieYang97/RGD.
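To make the contrast in the abstract above concrete, here is a minimal sketch of the two kinds of update under an L-infinity budget. It is an illustration only, not the authors' RGD implementation (see their repository for that): the function names, the step size `alpha`, and the exact form of the hidden-variable update are assumptions, and projection onto the valid pixel range is omitted for brevity.

```python
import numpy as np

def pgd_sign_step(x_adv, x_clean, grad, alpha, eps):
    """One L-infinity PGD step: move by alpha * sign(grad), then project
    back into the eps-ball around the clean input."""
    x_new = x_adv + alpha * np.sign(grad)
    return np.clip(x_new, x_clean - eps, x_clean + eps)

def raw_gradient_step(delta_hidden, x_clean, grad, alpha, eps):
    """Hypothetical raw-gradient variant: the hidden perturbation is updated
    without clipping; clipping is applied only when forming the adversarial input."""
    delta_hidden = delta_hidden + alpha * grad
    x_adv = x_clean + np.clip(delta_hidden, -eps, eps)
    return delta_hidden, x_adv
```

In the sign-based step every coordinate moves by the same amount regardless of gradient magnitude, whereas the raw-gradient sketch keeps the magnitude in the hidden variable even when the emitted input is clipped to the budget.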
Learning to Compose SuperWeights for Neural Parameter Allocation Search
Authors: Piotr Teterwak, Soren Nelson, Nikoli Dryden, Dina Bashkirova, Kate Saenko, Bryan A. Plummer
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.01274
Pdf link: https://arxiv.org/pdf/2312.01274
Abstract Neural parameter allocation search (NPAS) automates parameter sharing by obtaining weights for a network given an arbitrary, fixed parameter budget. Prior work has two major drawbacks we aim to address. First, there is a disconnect in the sharing pattern between the search and training steps, where weights are warped for layers of different sizes during the search to measure similarity, but not during training, resulting in reduced performance. To address this, we generate layer weights by learning to compose sets of SuperWeights, which represent a group of trainable parameters. These SuperWeights are created to be large enough so they can be used to represent any layer in the network, but small enough that they are computationally efficient. The second drawback we address is the method of measuring similarity between shared parameters. Whereas prior work compared the weights themselves, we argue this does not take into account the amount of conflict between the shared weights. Instead, we use gradient information to identify layers with shared weights that wish to diverge from each other. We demonstrate that our SuperWeight Networks consistently boost performance over the state-of-the-art on the ImageNet and CIFAR datasets in the NPAS setting. We further show that our approach can generate parameters for many network architectures using the same set of weights. This enables us to support tasks like efficient ensembling and anytime prediction, outperforming fully-parameterized ensembles with 17% fewer parameters.
Task-Oriented Edge Networks: Decentralized Learning Over Wireless Fronthaul
Authors: Hoon Lee, Seung-Wook Kim
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2312.01288
Pdf link: https://arxiv.org/pdf/2312.01288
Abstract This paper studies task-oriented edge networks where multiple edge internet-of-things nodes execute machine learning tasks with the help of powerful deep neural networks (DNNs) at a network cloud. Separate edge nodes (ENs) result in a partially observable system where they can only get partitioned features of the global network states. These local observations need to be forwarded to the cloud via resource-constrained wireless fronthaul links. Individual ENs compress their local observations into uplink fronthaul messages using task-oriented encoder DNNs. Then, the cloud carries out a remote inference task by leveraging received signals. Such a distributed topology calls for a decentralized training and decentralized execution (DTDE) learning framework for designing edge-cloud cooperative inference rules and their decentralized training strategies. First, we develop a fronthaul-cooperative DNN architecture along with proper uplink coordination protocols suitable for wireless fronthaul interconnection. Inspired by the nomographic function, an efficient cloud inference model becomes an integration of a number of shallow DNNs. This modularized architecture brings versatile calculations that are independent of the number of ENs. Next, we present a decentralized training algorithm of separate edge-cloud DNNs over downlink wireless fronthaul channels. An appropriate downlink coordination protocol is proposed, which backpropagates gradient vectors wirelessly from the cloud to the ENs.
SchurVINS: Schur Complement-Based Lightweight Visual Inertial Navigation System
Authors: Yunfei Fan, Tianyu Zhao, Guidong Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2312.01616
Pdf link: https://arxiv.org/pdf/2312.01616
Abstract Accuracy and computational efficiency are the most important metrics for a Visual Inertial Navigation System (VINS). Existing VINS algorithms, which offer either high accuracy or low computational complexity, struggle to provide high-precision localization on resource-constrained devices. To this end, we propose a novel filter-based VINS framework named SchurVINS, which can guarantee both high accuracy, by building a complete residual model, and low computational complexity, by using the Schur complement. Technically, we first formulate the full residual model, where the gradient, Hessian and observation covariance are explicitly modeled. Then the Schur complement is employed to decompose the full model into an ego-motion residual model and a landmark residual model. Finally, the Extended Kalman Filter (EKF) update is implemented in these two models with high efficiency. Experiments on the EuRoC and TUM-VI datasets show that our method notably outperforms state-of-the-art (SOTA) methods in both accuracy and computational complexity. We will open source our experimental code to benefit the community.
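The Schur complement step described above can be sketched on a generic symmetric linear system. This is a textbook illustration, not SchurVINS code: the function name `schur_reduce`, the variable ordering (ego-motion block first, landmark block last), and the assumption that the system is symmetric with an invertible landmark block are all choices made here for illustration.

```python
import numpy as np

def schur_reduce(H, b, n_pose):
    """Eliminate the trailing (landmark) block of the symmetric system H x = b.

    Returns the reduced system for the first n_pose (ego-motion) variables:
        (A - B C^{-1} B^T) x_pose = b_pose - B C^{-1} b_land
    """
    A = H[:n_pose, :n_pose]      # ego-motion block
    B = H[:n_pose, n_pose:]      # coupling block
    C = H[n_pose:, n_pose:]      # landmark block, assumed invertible
    b_pose, b_land = b[:n_pose], b[n_pose:]
    H_red = A - B @ np.linalg.solve(C, B.T)
    b_red = b_pose - B @ np.linalg.solve(C, b_land)
    return H_red, b_red
```

Solving the reduced system gives the same ego-motion solution as solving the full system, while only a matrix of size `n_pose` has to be factorized for that part.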
Heroes: Lightweight Federated Learning with Neural Composition and Adaptive Local Update in Heterogeneous Edge Networks
Authors: Jiaming Yan, Jianchun Liu, Shilong Wang, Hongli Xu, Haifeng Liu, Jianhua Zhou
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2312.01617
Pdf link: https://arxiv.org/pdf/2312.01617
Abstract Federated Learning (FL) enables distributed clients to collaboratively train models without exposing their private data. However, it is difficult to implement efficient FL due to limited resources. Most existing works compress the transmitted gradients or prune the global model to reduce the resource cost, but leave the compressed or pruned parameters under-optimized, which degrades the training performance. To address this issue, the neural composition technique constructs size-adjustable models by composing low-rank tensors, allowing every parameter in the global model to learn the knowledge from all clients. Nevertheless, some tensors can only be optimized by a small fraction of clients, thus the global model may get insufficient training, leading to a long completion time, especially in heterogeneous edge scenarios. To this end, we enhance the neural composition technique, enabling all parameters to be fully trained. Further, we propose a lightweight FL framework, called Heroes, with enhanced neural composition and adaptive local update. A greedy-based algorithm is designed to adaptively assign the proper tensors and local update frequencies for participating clients according to their heterogeneous capabilities and resource budgets. Extensive experiments demonstrate that Heroes can reduce traffic consumption by about 72.05% and provide up to 2.97× speedup compared to the baselines.
On Tuning Neural ODE for Stability, Consistency and Faster Convergence
Authors: Sheikh Waqas Akhtar
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2312.01657
Pdf link: https://arxiv.org/pdf/2312.01657
Abstract Neural ODEs parameterize a differential equation using a continuous-depth neural network and solve it using a numerical ODE integrator. These models offer a constant memory cost, in contrast to models with a discrete sequence of hidden layers, whose memory cost increases linearly with the number of layers. In addition to memory efficiency, other benefits of neural ODEs include adaptability of the evaluation approach to the input, and the flexibility to choose between numerical precision and fast training. However, despite these benefits, they still have some limitations. We identify the ODE integrator (also called the ODE solver) as the weakest link in the chain, as it may have stability, consistency and convergence (CCS) issues and may suffer from slower convergence or may not converge at all. We propose a first-order Nesterov's accelerated gradient (NAG) based ODE solver which is proven to be tuned vis-a-vis CCS conditions. We empirically demonstrate the efficacy of our approach by training faster, while achieving better or comparable performance against neural ODEs employing other fixed-step explicit ODE solvers as well as discrete-depth models such as ResNet, in three different tasks including supervised classification, density estimation, and time-series modelling.
AGD: an Auto-switchable Optimizer using Stepwise Gradient Difference for Preconditioning Matrix
Authors: Yun Yue, Zhiling Ye, Jiadi Jiang, Yongchao Liu, Ke Zhang
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2312.01658
Pdf link: https://arxiv.org/pdf/2312.01658
Abstract Adaptive optimizers, such as Adam, have achieved remarkable success in deep learning. A key component of these optimizers is the so-called preconditioning matrix, providing enhanced gradient information and regulating the step size of each gradient direction. In this paper, we propose a novel approach to designing the preconditioning matrix by utilizing the gradient difference between two successive steps as the diagonal elements. These diagonal elements are closely related to the Hessian and can be perceived as an approximation of the inner product between the Hessian row vectors and difference of the adjacent parameter vectors. Additionally, we introduce an auto-switching function that enables the preconditioning matrix to switch dynamically between Stochastic Gradient Descent (SGD) and the adaptive optimizer. Based on these two techniques, we develop a new optimizer named AGD that enhances the generalization performance. We evaluate AGD on public datasets of Natural Language Processing (NLP), Computer Vision (CV), and Recommendation Systems (RecSys). Our experimental results demonstrate that AGD outperforms the state-of-the-art (SOTA) optimizers, achieving highly competitive or significantly better predictive performance. Furthermore, we analyze how AGD is able to switch automatically between SGD and the adaptive optimizer and its actual effects on various scenarios. The code is available at https://github.com/intelligent-machine-learning/dlrover/tree/master/atorch/atorch/optimizers.
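The two ideas in the abstract above (a diagonal preconditioner built from the difference of successive gradients, and a switch between SGD-like and adaptive behaviour) can be sketched as follows. This is an illustrative approximation, not the AGD implementation from the linked repository: the function name `agd_like_step`, the threshold `delta`, the use of `max(sqrt(b), delta)` as the switching rule, the omission of bias correction, and the handling of the first step are all assumptions.

```python
import numpy as np

def agd_like_step(w, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, delta=1e-5):
    """One update of a gradient-difference-preconditioned optimizer (illustrative)."""
    prev_grad = state.get("prev_grad")
    # Difference between successive gradients; on the first step, fall back
    # to the gradient itself (an assumption of this sketch).
    diff = grad if prev_grad is None else grad - prev_grad
    # Exponential moving averages of the gradient and of the squared difference.
    state["m"] = beta1 * state.get("m", np.zeros_like(w)) + (1 - beta1) * grad
    state["b"] = beta2 * state.get("b", np.zeros_like(w)) + (1 - beta2) * diff ** 2
    state["prev_grad"] = grad.copy()
    # Switching rule (assumed): coordinates with sqrt(b) below delta get a
    # constant denominator, i.e. a momentum-SGD-like step; others are adaptive.
    denom = np.maximum(np.sqrt(state["b"]), delta)
    return w - lr * state["m"] / denom
```

A call such as `w = agd_like_step(w, grad, state)` with a persistent `state` dictionary performs one step; when successive gradients barely change in a coordinate, that coordinate's preconditioner entry decays toward the constant `delta`.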
Tab-Attention: Self-Attention-based Stacked Generalization for Imbalanced Credit Default Prediction
Authors: Yandan Tan, Hongbin Zhu, Jie Wu, Hongfeng Chai
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Arxiv link: https://arxiv.org/abs/2312.01688
Pdf link: https://arxiv.org/pdf/2312.01688
Abstract Accurate credit default prediction faces challenges due to imbalanced data and low correlation between features and labels. Existing default prediction studies based on gradient boosting decision trees (GBDT), deep learning techniques, and feature selection strategies have varying degrees of success depending on the specific task. Motivated by this, we propose Tab-Attention, a novel self-attention-based stacked generalization method for credit default prediction. This approach ensembles the potential proprietary knowledge contributions from multi-view feature spaces to cope with low feature correlation and imbalance. We organize multi-view feature spaces according to the latent linear or nonlinear strengths between features and labels. Meanwhile, the f1 score assists the model in imbalance training to find the optimal state for identifying minority default samples. Our Tab-Attention achieves higher Recall_1 and f1_1 for default intention recognition than existing GBDT-based models and advanced deep learning methods, by about 32.92% and 16.05% on average, respectively, while maintaining outstanding overall performance and prediction performance for non-default samples. The proposed method can ensemble essential knowledge through the self-attention mechanism, which is of great significance for a more robust future prediction system.
On Gradient Boosted Decision Trees and Neural Rankers: A Case-Study on Short-Video Recommendations at ShareChat
Authors: Olivier Jeunen, Hitesh Sagtani, Himanshu Doi, Rasul Karimov, Neeti Pokharna, Danish Kalim, Aleksei Ustimenko, Christopher Green, Wenzhe Shi, Rishabh Mehrotra
Subjects: Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2312.01760
Pdf link: https://arxiv.org/pdf/2312.01760
Abstract Practitioners who wish to build real-world applications that rely on ranking models, need to decide which modelling paradigm to follow. This is not an easy choice to make, as the research literature on this topic has been shifting in recent years. In particular, whilst Gradient Boosted Decision Trees (GBDTs) have reigned supreme for more than a decade, the flexibility of neural networks has allowed them to catch up, and recent works report accuracy metrics that are on par. Nevertheless, practical systems require considerations beyond mere accuracy metrics to decide on a modelling approach. This work describes our experiences in balancing some of the trade-offs that arise, presenting a case study on a short-video recommendation application. We highlight (1) neural networks' ability to handle large training data size, user- and item-embeddings allows for more accurate models than GBDTs in this setting, and (2) because GBDTs are less reliant on specialised hardware, they can provide an equally accurate model at a lower cost. We believe these findings are of relevance to researchers in both academia and industry, and hope they can inspire practitioners who need to make similar modelling choices in the future.
iMatching: Imperative Correspondence Learning
Authors: Zitong Zhan, Dasong Gao, Yun-Jou Lin, Youjie Xia, Chen Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.02141
Pdf link: https://arxiv.org/pdf/2312.02141
Abstract Learning feature correspondence is a foundational task in computer vision, holding immense importance for downstream applications such as visual odometry and 3D reconstruction. Despite recent progress in data-driven models, feature correspondence learning is still limited by the lack of accurate per-pixel correspondence labels. To overcome this difficulty, we introduce a new self-supervised scheme, imperative learning (IL), for training feature correspondence. It enables correspondence learning on arbitrary uninterrupted videos without any camera pose or depth labels, heralding a new era for self-supervised correspondence learning. Specifically, we formulated the problem of correspondence learning as a bilevel optimization, which takes the reprojection error from bundle adjustment as a supervisory signal for the model. To avoid large memory and computation overhead, we leverage the stationary point to effectively back-propagate the implicit gradients through bundle adjustment. Through extensive experiments, we demonstrate superior performance on tasks including feature matching and pose estimation, in which we obtained an average of 30% accuracy gain over the state-of-the-art matching models.
Readout Guidance: Learning Control from Diffusion Features
Authors: Grace Luo, Trevor Darrell, Oliver Wang, Dan B Goldman, Aleksander Holynski
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.02150
Pdf link: https://arxiv.org/pdf/2312.02150
Abstract We present Readout Guidance, a method for controlling text-to-image diffusion models with learned signals. Readout Guidance uses readout heads, lightweight networks trained to extract signals from the features of a pre-trained, frozen diffusion model at every timestep. These readouts can encode single-image properties, such as pose, depth, and edges; or higher-order properties that relate multiple images, such as correspondence and appearance similarity. Furthermore, by comparing the readout estimates to a user-defined target, and back-propagating the gradient through the readout head, these estimates can be used to guide the sampling process. Compared to prior methods for conditional generation, Readout Guidance requires significantly fewer added parameters and training samples, and offers a convenient and simple recipe for reproducing different forms of conditional control under a single framework, with a single architecture and sampling procedure. We showcase these benefits in the applications of drag-based manipulation, identity-consistent generation, and spatially aligned control. Project page: https://readout-guidance.github.io.
Mesh-Guided Neural Implicit Field Editing
Authors: Can Wang, Mingming He, Menglei Chai, Dongdong Chen, Jing Liao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.02157
Pdf link: https://arxiv.org/pdf/2312.02157
Abstract Neural implicit fields have emerged as a powerful 3D representation for reconstructing and rendering photo-realistic views, yet they possess limited editability. Conversely, explicit 3D representations, such as polygonal meshes, offer ease of editing but may not be as suitable for rendering high-quality novel views. To harness the strengths of both representations, we propose a new approach that employs a mesh as a guiding mechanism in editing the neural radiance field. We first introduce a differentiable method using marching tetrahedra for polygonal mesh extraction from the neural implicit field and then design a differentiable color extractor to assign colors obtained from the volume renderings to this extracted mesh. This differentiable colored mesh allows gradient back-propagation from the explicit mesh to the implicit fields, empowering users to easily manipulate the geometry and color of neural implicit fields. To enhance user control from coarse-grained to fine-grained levels, we introduce an octree-based structure into its optimization. This structure prioritizes the edited regions and the surface part, making our method achieve fine-grained edits to the neural implicit field and accommodate various user modifications, including object additions, component removals, specific area deformations, and adjustments to local and global colors. Through extensive experiments involving diverse scenes and editing operations, we have demonstrated the capabilities and effectiveness of our method. Our project page is: https://cassiepython.github.io/MNeuEdit/
Keyword: super-resolution

Motion-Guided Latent Diffusion for Temporally Consistent Real-world Video Super-resolution
Authors: Xi Yang, Chenhang He, Jianqi Ma, Lei Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.00853
Pdf link: https://arxiv.org/pdf/2312.00853
Abstract Real-world low-resolution (LR) videos have diverse and complex degradations, imposing great challenges on video super-resolution (VSR) algorithms to reproduce their high-resolution (HR) counterparts with high quality. Recently, diffusion models have shown compelling performance in generating realistic details for image restoration tasks. However, the diffusion process has randomness, making it hard to control the contents of restored images. This issue becomes more serious when applying diffusion models to VSR tasks because temporal consistency is crucial to the perceptual quality of videos. In this paper, we propose an effective real-world VSR algorithm by leveraging the strength of pre-trained latent diffusion models. To ensure content consistency among adjacent frames, we exploit the temporal dynamics in LR videos to guide the diffusion process by optimizing the latent sampling path with a motion-guided loss, ensuring that the generated HR video maintains a coherent and continuous visual flow. To further mitigate the discontinuity of generated details, we insert a temporal module into the decoder and fine-tune it with an innovative sequence-oriented loss. The proposed motion-guided latent diffusion (MGLD) based VSR algorithm achieves significantly better perceptual quality than state-of-the-art methods on real-world VSR benchmark datasets, validating the effectiveness of the proposed model design and training strategies.
Generative Powers of Ten
Authors: Xiaojuan Wang, Janne Kontkanen, Brian Curless, Steve Seitz, Ira Kemelmacher, Ben Mildenhall, Pratul Srinivasan, Dor Verbin, Aleksander Holynski
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2312.02149
Pdf link: https://arxiv.org/pdf/2312.02149
Abstract We present a method that uses a text-to-image model to generate consistent content across multiple image scales, enabling extreme semantic zooms into a scene, e.g., ranging from a wide-angle landscape view of a forest to a macro shot of an insect sitting on one of the tree branches. We achieve this through a joint multi-scale diffusion sampling approach that encourages consistency across different scales while preserving the integrity of each individual sampling process. Since each generated scale is guided by a different text prompt, our method enables deeper levels of zoom than traditional super-resolution methods that may struggle to create new contextual structure at vastly different scales. We compare our method qualitatively with alternative techniques in image super-resolution and outpainting, and show that our method is most effective at generating consistent multi-scale content.

New submissions for Tue, 5 Dec 23 #657

Keyword: sgd

AGD: an Auto-switchable Optimizer using Stepwise Gradient Difference for Preconditioning Matrix

Keyword: optimization

RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

Refine, Discriminate and Align: Stealing Encoders via Sample-Wise Prototypes and Multi-Relational Extraction

Hyperparameter Optimization for Large Language Model Instruction-Tuning

Biased Random-Key Genetic Algorithms: A Review

Consistent Mesh Diffusion

Combining Kernelized Autoencoding and Centroid Prediction for Dynamic Multi-objective Optimization

Learning-based Ecological Adaptive Cruise Control of Autonomous Electric Vehicles: A Comparison of ADP, DQN and DDPG Approaches

Adding Domain Knowledge to Query-Driven Learned Databases

From Beginner to Expert: Modeling Medical Knowledge into General LLMs

Covert Communications in STAR-RIS-Aided Rate-Splitting Multiple Access Systems

Investigating the Surrogate Modeling Capabilities of Continuous Time Echo State Networks

A Database System for State Management in Stateful Network Service Function Chains [Vision]

Hybrid Hierarchical DRL Enabled Resource Allocation for Secure Transmission in Multi-IRS-Assisted Sensing-Enhanced Spectrum Sharing Networks

Prior-Aware Robust Beam Alignment for Low-SNR Millimeter-Wave Communications

Has Anything Changed? 3D Change Detection by 2D Segmentation Masks

Disjoint Dominating and 2-Dominating Sets in Graphs: Hardness and Approximation results

Pointer Networks Trained Better via Evolutionary Algorithms

Recent Advances in Scalable Energy-Efficient and Trustworthy Spiking Neural networks: from Algorithms to Technology

RNb-NeuS: Reflectance and Normal-based Multi-View 3D Reconstruction

Strategic Data Revocation in Federated Unlearning

PPAD-membership for Problems with Exact Rational Solutions: A General Approach via Convex Optimization

Rethinking PGD Attack: Is Sign Function Necessary?

Mendata: A Framework to Purify Manipulated Training Data

Joint Beam Scheduling and Power Optimization for Beam Hopping LEO Satellite Systems

Two-stage dynamic creative optimization under sparse ambiguous samples for e-commerce advertising

Tradeoff of age-of-information and power under reliability constraint for short-packet communication with block-length adaptation

Regret Optimality of GP-UCB

Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective

Context-Enhanced Relational Operators with Vector Embeddings

Towards Decentralized Task Offloading and Resource Allocation in User-Centric Mobile Edge Computing

Learning Channel Capacity with Neural Mutual Information Estimator Based on Message Importance Measure

A Challenging Multimodal Video Summary: Simultaneously Extracting and Generating Keyframe-Caption Pairs from Video

OCGEC: One-class Graph Embedding Classification for DNN Backdoor Detection

Interference-Constrained Scheduling of a Cognitive Multi-hop Underwater Acoustic Network

An End-to-End Network Pruning Pipeline with Sparsity Enforcement

Optimizing Bus Travel: A Novel Approach to Feature Mining with P-KMEANS and P-LDA Algorithms

Risk-Controlling Model Selection via Guided Bayesian Optimization

Joint Task Partitioning and Parallel Scheduling in Device-Assisted Mobile Edge Networks

Two-stage optimized unified adversarial patch for attacking visible-infrared cross-modal detectors in the physical world

Using Bayesian Optimization to Design Time Step Size Controllers with Application to Modified Patankar--Runge--Kutta Methods

Energy-based Potential Games for Joint Motion Forecasting and Control

Class Symbolic Regression: Gotta Fit 'Em All

SPECRUN: The Danger of Speculative Runahead Execution in Processors

TCP Slice: A semi-distributed TCP algorithm for Delay-constrained Applications

CaRL: Cascade Reinforcement Learning with State Space Splitting for O-RAN based Traffic Steering

Tuning of Online Feedback Optimization for setpoint tracking in centrifugal compressors

Optimal Data Generation in Multi-Dimensional Parameter Spaces, using Bayesian Optimization

Distributed Optimization with Feasible Set Privacy

SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM

GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians

iMatching: Imperative Correspondence Learning

Optimizing Camera Configurations for Multi-View Pedestrian Detection

GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis

Mesh-Guided Neural Implicit Field Editing

Keyword: adam

AGD: an Auto-switchable Optimizer using Stepwise Gradient Difference for Preconditioning Matrix

Keyword: gradient

Learning-based Ecological Adaptive Cruise Control of Autonomous Electric Vehicles: A Comparison of ADP, DQN and DDPG Approaches

PROFL: A Privacy-Preserving Federated Learning Method with Stringent Defense Against Poisoning Attacks

Spectrum-driven Mixed-frequency Network for Hyperspectral Salient Object Detection

Pointer Networks Trained Better via Evolutionary Algorithms

Efficient Expansion and Gradient Based Task Inference for Replay Free Incremental Learning

Rethinking PGD Attack: Is Sign Function Necessary?

Learning to Compose SuperWeights for Neural Parameter Allocation Search

Task-Oriented Edge Networks: Decentralized Learning Over Wireless Fronthaul

SchurVINS: Schur Complement-Based Lightweight Visual Inertial Navigation System

Heroes: Lightweight Federated Learning with Neural Composition and Adaptive Local Update in Heterogeneous Edge Networks

On Tuning Neural ODE for Stability, Consistency and Faster Convergence

AGD: an Auto-switchable Optimizer using Stepwise Gradient Difference for Preconditioning Matrix

Tab-Attention: Self-Attention-based Stacked Generalization for Imbalanced Credit Default Prediction

On Gradient Boosted Decision Trees and Neural Rankers: A Case-Study on Short-Video Recommendations at ShareChat

iMatching: Imperative Correspondence Learning

Readout Guidance: Learning Control from Diffusion Features

Mesh-Guided Neural Implicit Field Editing

Keyword: super-resolution