New submissions for Thu, 4 Jan 24

Keyword: sgd

Ravnest: Decentralized Asynchronous Training on Heterogeneous Devices

Authors: Anirudh Rajiv Menon, Unnikrishnan Menon, Kailash Ahirwar
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2401.01728
Pdf link: https://arxiv.org/pdf/2401.01728
Abstract Modern deep learning models, growing larger and more complex, have demonstrated exceptional generalization and accuracy due to training on huge datasets. This trend is expected to continue. However, the increasing size of these models poses challenges in training, as traditional centralized methods are limited by memory constraints at such scales. This paper proposes an asynchronous decentralized training paradigm for large modern deep learning models that harnesses the compute power of regular heterogeneous PCs with limited resources connected across the internet to achieve favourable performance metrics. Ravnest facilitates decentralized training by efficiently organizing compute nodes into clusters with similar data transfer rates and compute capabilities, without necessitating that each node hosts the entire model. These clusters engage in $\textit{Zero-Bubble Asynchronous Model Parallel}$ training, and a $\textit{Parallel Multi-Ring All-Reduce}$ method is employed to effectively execute global parameter averaging across all clusters. We have framed our asynchronous SGD loss function as a block structured optimization problem with delayed updates and derived an optimal convergence rate of $O\left(\frac{1}{\sqrt{K}}\right)$. We further discuss linear speedup with respect to the number of participating clusters and the bound on the staleness parameter.
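A minimal sketch of the parameter-averaging step mentioned in the abstract above, assuming the textbook single-ring all-reduce (reduce-scatter followed by all-gather) simulated in one process. It is not the paper's Parallel Multi-Ring All-Reduce; the function name `ring_allreduce_average` and the chunking scheme are illustrative assumptions.

```python
def ring_allreduce_average(worker_vectors):
    """Average equal-length vectors with a simulated single-ring all-reduce."""
    n = len(worker_vectors)
    length = len(worker_vectors[0])
    # Chunk c covers indices bounds[c] .. bounds[c + 1] - 1.
    bounds = [(c * length) // n for c in range(n + 1)]
    data = [list(map(float, v)) for v in worker_vectors]

    def exchange(step, offset, accumulate):
        # Snapshot every outgoing chunk first so all sends in a step are simultaneous.
        sends = []
        for w in range(n):
            c = (w + offset - step) % n
            sends.append((c, data[w][bounds[c]:bounds[c + 1]]))
        # Worker w sends to its ring neighbour (w + 1) mod n.
        for w, (c, payload) in enumerate(sends):
            dst = (w + 1) % n
            for k, value in enumerate(payload):
                if accumulate:
                    data[dst][bounds[c] + k] += value
                else:
                    data[dst][bounds[c] + k] = value

    # Reduce-scatter: afterwards worker w holds the full sum of chunk (w + 1) mod n.
    for step in range(n - 1):
        exchange(step, 0, accumulate=True)
    # All-gather: circulate the completed chunks until every worker has all of them.
    for step in range(n - 1):
        exchange(step, 1, accumulate=False)
    return [[x / n for x in vec] for vec in data]


workers = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
averaged = ring_allreduce_average(workers)
```

Each worker sends and receives only one chunk per step, which is why ring schemes are commonly used when per-link bandwidth is the bottleneck.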
Keyword: optimization

Optimizing Convolutional Neural Network Architecture
Authors: Luis Balderas, Miguel Lastra, José M. Benítez
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.01361
Pdf link: https://arxiv.org/pdf/2401.01361
Abstract Convolutional Neural Networks (CNN) are widely used to face challenging tasks like speech recognition, natural language processing or computer vision. As CNN architectures get larger and more complex, their computational requirements increase, incurring significant energetic costs and challenging their deployment on resource-restricted devices. In this paper, we propose Optimizing Convolutional Neural Network Architecture (OCNNA), a novel CNN optimization and construction method based on pruning and knowledge distillation designed to establish the importance of convolutional layers. The proposal has been evaluated through a thorough empirical study including the best known datasets (CIFAR-10, CIFAR-100 and Imagenet) and CNN architectures (VGG-16, ResNet-50, DenseNet-40 and MobileNet), setting Accuracy Drop and Remaining Parameters Ratio as objective metrics to compare the performance of OCNNA against other state-of-the-art approaches. Our method has been compared with more than 20 convolutional neural network simplification algorithms, obtaining outstanding results. As a result, OCNNA is a competitive CNN constructing method which could ease the deployment of neural networks into IoT or resource-limited devices.
RL-MPCA: A Reinforcement Learning Based Multi-Phase Computation Allocation Approach for Recommender Systems
Authors: Jiahong Zhou, Shunhui Mao, Guoliang Yang, Bo Tang, Qianlong Xie, Lebin Lin, Xingxing Wang, Dong Wang
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.01369
Pdf link: https://arxiv.org/pdf/2401.01369
Abstract Recommender systems aim to recommend the most suitable items to users from a large number of candidates. Their computation cost grows as the number of user requests and the complexity of services (or models) increases. Under the limitation of computation resources (CRs), how to make a trade-off between computation cost and business revenue becomes an essential question. The existing studies focus on dynamically allocating CRs in queue truncation scenarios (i.e., allocating the size of candidates), and formulate the CR allocation problem as an optimization problem with constraints. Some of them focus on single-phase CR allocation, and others focus on multi-phase CR allocation but introduce some assumptions about queue truncation scenarios. However, these assumptions do not hold in other scenarios, such as retrieval channel selection and prediction model selection. Moreover, existing studies ignore the state transition process of requests between different phases, limiting the effectiveness of their approaches. This paper proposes a Reinforcement Learning (RL) based Multi-Phase Computation Allocation approach (RL-MPCA), which aims to maximize the total business revenue under the limitation of CRs. RL-MPCA formulates the CR allocation problem as a Weakly Coupled MDP problem and solves it with an RL-based approach. Specifically, RL-MPCA designs a novel deep Q-network to adapt to various CR allocation scenarios, and calibrates the Q-value by introducing multiple adaptive Lagrange multipliers (adaptive-$\lambda$) to avoid violating the global CR constraints. Finally, experiments on the offline simulation environment and online real-world recommender system validate the effectiveness of our approach.
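A rough sketch of how a Lagrange multiplier can calibrate Q-values under a resource budget, as the abstract above describes at a high level. It assumes a single multiplier, a penalised score of the form Q minus lambda times cost, and a dual-ascent update; these forms, and the names `pick_action` and `update_multiplier`, are illustrative assumptions rather than the paper's formulation.

```python
def pick_action(q_values, costs, lam):
    """Choose the action with the best cost-penalised Q-value."""
    return max(q_values, key=lambda a: q_values[a] - lam * costs[a])


def update_multiplier(lam, used, budget, step_size):
    """Dual-ascent step: raise lam when over budget, lower it otherwise (never below 0)."""
    return max(0.0, lam + step_size * (used - budget))


# Hypothetical per-request options: estimated revenue (Q) and computation cost.
q_values = {"small": 1.0, "large": 1.5}
costs = {"small": 1.0, "large": 3.0}

unconstrained = pick_action(q_values, costs, lam=0.0)
constrained = pick_action(q_values, costs, lam=0.5)
```

With `lam = 0.0` the costlier option wins on raw Q-value; once the multiplier grows, the cheaper option becomes preferable, which is how the penalty steers allocation back under the budget.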
Cost Minimization in Multi-cloud Systems with Runtime Microservice Re-orchestration
Authors: Marco Zambianco, Silvio Cretti, Domenico Siracusa
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2401.01408
Pdf link: https://arxiv.org/pdf/2401.01408
Abstract Multi-cloud systems facilitate a cost-efficient and geographically-distributed deployment of microservice-based applications by temporarily leasing virtual nodes with diverse pricing models. To preserve the cost-efficiency of multi-cloud deployments, it is essential to redeploy microservices onto the available nodes according to a dynamic resource configuration, which is often performed to better accommodate workload variations. However, this approach leads to frequent service disruption since applications are continuously shut down and redeployed in order to apply the new resource assignment. To overcome this issue, we propose a re-orchestration scheme that migrates microservices at runtime based on a rolling update scheduling logic. Specifically, we propose an integer linear optimization problem that minimizes the cost associated with multi-cloud virtual nodes and that ensures that delay-sensitive microservices are co-located on the same regional cluster. The resulting rescheduling order guarantees no service disruption by repacking microservices between the available nodes without the need to turn off the outdated microservice instance before redeploying the updated version. In addition, we propose a two-step heuristic scheme that effectively approximates the optimal solution at the expense of close-to-zero service disruption and QoS violation probability. Results show that the proposed schemes achieve better performance in terms of cost mitigation, low service disruption and low QoS violation probability compared to baseline schemes replicating Kubernetes scheduler functionalities.
Multiple Access Techniques for Intelligent and Multi-Functional 6G: Tutorial, Survey, and Outlook
Authors: Bruno Clerckx, Yijie Mao, Zhaohui Yang, Mingzhe Chen, Ahmed Alkhateeb, Liang Liu, Min Qiu, Jinhong Yuan, Vincent W.S. Wong, Juan Montojo
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2401.01433
Pdf link: https://arxiv.org/pdf/2401.01433
Abstract Multiple access (MA) is a crucial part of any wireless system and refers to techniques that make use of the resource dimensions to serve multiple users/devices/machines/services, ideally in the most efficient way. Given the needs of multi-functional wireless networks for integrated communications, sensing, localization, computing, coupled with the surge of machine learning / artificial intelligence (AI) in wireless networks, MA techniques are expected to experience a paradigm shift in 6G and beyond. In this paper, we provide a tutorial, survey and outlook of past, emerging and future MA techniques and pay particular attention to how wireless network intelligence and multi-functionality will lead to a re-thinking of those techniques. The paper starts with an overview of orthogonal, physical layer multicasting, space domain, power domain, rate-splitting, code domain MAs, and other domains, and highlights the importance of researching universal multiple access to shrink instead of grow the knowledge tree of MA schemes by providing a unified understanding of MA schemes across all resource dimensions. It then jumps into rethinking MA schemes in the era of wireless network intelligence, covering AI for MA such as AI-empowered resource allocation, optimization, channel estimation, receiver designs, user behavior predictions, and MA for AI such as federated learning/edge intelligence and over the air computation. We then discuss MA for network multi-functionality and the interplay between MA and integrated sensing, localization, and communications. We finish with studying MA for emerging intelligent applications before presenting a roadmap toward 6G standardization. We also point out numerous directions that are promising for future research.
Optimizing UAV-UGV Coalition Operations: A Hybrid Clustering and Multi-Agent Reinforcement Learning Approach for Path Planning in Obstructed Environment
Authors: Shamyo Brotee, Farhan Kabir, Md. Abdur Razzaque, Palash Roy, Md. Mamun-Or-Rashid, Md. Rafiul Hassan, Mohammad Mehedi Hassan
Subjects: Robotics (cs.RO); Multiagent Systems (cs.MA)
Arxiv link: https://arxiv.org/abs/2401.01481
Pdf link: https://arxiv.org/pdf/2401.01481
Abstract One of the most critical applications undertaken by coalitions of Unmanned Aerial Vehicles (UAVs) and Unmanned Ground Vehicles (UGVs) is reaching predefined targets by following the most time-efficient routes while avoiding collisions. Unfortunately, UAVs are hampered by limited battery life, and UGVs face challenges in reachability due to obstacles and elevation variations. Existing literature primarily focuses on one-to-one coalitions, which constrains the efficiency of reaching targets. In this work, we introduce a novel approach for a UAV-UGV coalition with a variable number of vehicles, employing a modified mean-shift clustering algorithm to segment targets into multiple zones. Each vehicle utilizes Multi-agent Deep Deterministic Policy Gradient (MADDPG) and Multi-agent Proximal Policy Optimization (MAPPO), two advanced reinforcement learning algorithms, to form an effective coalition for navigating obstructed environments without collisions. This approach of assigning targets to various circular zones, based on density and range, significantly reduces the time required to reach these targets. Moreover, introducing variability in the number of UAVs and UGVs in a coalition enhances task efficiency by enabling simultaneous multi-target engagement. The results of our experimental evaluation demonstrate that our proposed method substantially surpasses current state-of-the-art techniques, nearly doubling efficiency in terms of target navigation time and task completion rate.
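A minimal sketch of grouping targets into zones with mean-shift clustering, which the abstract above names as the basis of its zoning step. This is the plain flat-kernel algorithm in 2-D, not the paper's modified variant; the bandwidth, the merge radius of half a bandwidth, and the sample coordinates are illustrative assumptions.

```python
import math


def mean_shift(points, bandwidth, iterations=50, tol=1e-6):
    """Plain flat-kernel mean-shift; returns (zone centers, zone label per point)."""
    modes = []
    for start in points:
        x, y = start
        for _ in range(iterations):
            # Move to the mean of all points within one bandwidth of the current position.
            near = [q for q in points if math.dist((x, y), q) <= bandwidth]
            nx = sum(q[0] for q in near) / len(near)
            ny = sum(q[1] for q in near) / len(near)
            moved = math.dist((x, y), (nx, ny))
            x, y = nx, ny
            if moved < tol:
                break
        modes.append((x, y))

    # Merge converged positions that lie close together into a single zone.
    centers, labels = [], []
    for mode in modes:
        for index, center in enumerate(centers):
            if math.dist(mode, center) <= bandwidth / 2:
                labels.append(index)
                break
        else:
            centers.append(mode)
            labels.append(len(centers) - 1)
    return centers, labels


# Hypothetical target coordinates forming two well-separated groups.
targets = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = mean_shift(targets, bandwidth=3.0)
```

Unlike k-means, mean-shift does not need the number of zones in advance; the bandwidth determines how many zones emerge.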
Enhancing Multilingual Information Retrieval in Mixed Human Resources Environments: A RAG Model Implementation for Multicultural Enterprise
Authors: Syed Rameel Ahmad
Subjects: Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2401.01511
Pdf link: https://arxiv.org/pdf/2401.01511
Abstract The advent of Large Language Models has revolutionized information retrieval, ushering in a new era of expansive knowledge accessibility. While these models excel in providing open-world knowledge, effectively extracting answers in diverse linguistic environments with varying levels of literacy remains a formidable challenge. Retrieval Augmented Generation (RAG) emerges as a promising solution, bridging the gap between information availability and multilingual comprehension. However, deploying RAG models in real-world scenarios demands careful consideration of various factors. This paper addresses the critical challenges associated with implementing RAG models in multicultural environments. We delve into essential considerations, including data feeding strategies, timely updates, mitigation of hallucinations, prevention of erroneous responses, and optimization of delivery speed. Our work involves the integration of a diverse array of tools, meticulously combined to facilitate the seamless adoption of RAG models across languages and literacy levels within a multicultural organizational context. Through strategic tweaks in our approaches, we achieve not only effectiveness but also efficiency, ensuring the accelerated and accurate delivery of information in a manner that is tailored to the unique requirements of multilingual and multicultural settings.
Retraining-free Model Quantization via One-Shot Weight-Coupling Learning
Authors: Chen Tang, Yuan Meng, Jiacheng Jiang, Shuzhao Xie, Rongwei Lu, Xinzhu Ma, Zhi Wang, Wenwu Zhu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.01543
Pdf link: https://arxiv.org/pdf/2401.01543
Abstract Quantization is of significance for compressing the over-parameterized deep neural models and deploying them on resource-limited devices. Fixed-precision quantization suffers from performance drop due to the limited numerical representation ability. Conversely, mixed-precision quantization (MPQ) is advocated to compress the model effectively by allocating heterogeneous bit-width for layers. MPQ is typically organized into a searching-retraining two-stage process. Previous works only focus on determining the optimal bit-width configuration in the first stage efficiently, while ignoring the considerable time costs in the second stage. However, retraining always consumes hundreds of GPU-hours on the cutting-edge GPUs, thus hindering deployment efficiency significantly. In this paper, we devise a one-shot training-searching paradigm for mixed-precision model compression. Specifically, in the first stage, all potential bit-width configurations are coupled and thus optimized simultaneously within a set of shared weights. However, our observations reveal a previously unseen and severe bit-width interference phenomenon among highly coupled weights during optimization, leading to considerable performance degradation under a high compression ratio. To tackle this problem, we first design a bit-width scheduler to dynamically freeze the most turbulent bit-width of layers during training, to ensure that the remaining bit-widths converge properly. Then, taking inspiration from information theory, we present an information distortion mitigation technique to align the behaviour of the bad-performing bit-widths to the well-performing ones.
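A minimal sketch of what allocating heterogeneous bit-widths to layers means in practice, assuming a generic symmetric uniform quantizer. The quantizer, the `max_abs` clipping range, and the toy weights are illustrative assumptions; the paper's one-shot weight-coupling training and bit-width search are not reproduced here.

```python
def quantize(value, bits, max_abs=1.0):
    """Symmetric uniform quantization of one weight to a signed grid (bits >= 2)."""
    levels = 2 ** (bits - 1) - 1          # largest positive integer code
    scale = max_abs / levels              # spacing between representable values
    clipped = max(-max_abs, min(max_abs, value))
    return round(clipped / scale) * scale


def quantize_layers(layers, bit_widths):
    """Mixed precision: quantize each layer's weights with its own bit-width."""
    return [[quantize(w, bits) for w in weights]
            for weights, bits in zip(layers, bit_widths)]


# Hypothetical two-layer model: the first layer gets 2 bits, the second 4 bits.
layers = [[0.3, -0.9], [0.3, -0.9]]
low_bits, high_bits = quantize_layers(layers, bit_widths=[2, 4])
error_low = abs(low_bits[0] - 0.3)
error_high = abs(high_bits[0] - 0.3)
```

The same weight is reproduced more faithfully at 4 bits than at 2 bits, which is the trade-off a bit-width configuration search has to balance against model size.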
DDN-SLAM: Real-time Dense Dynamic Neural Implicit SLAM with Joint Semantic Encoding
Authors: Mingrui Li, Jiaming He, Guangan Jiang, Hongyu Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2401.01545
Pdf link: https://arxiv.org/pdf/2401.01545
Abstract We propose DDN-SLAM, a real-time dense neural implicit semantic SLAM system designed for dynamic scenes. While existing neural implicit SLAM systems perform well in static scenes, they often encounter challenges in real-world environments with dynamic interferences, leading to ineffective tracking and mapping. DDN-SLAM utilizes the priors provided by the deep semantic system, combined with conditional probability fields, for segmentation. By constructing depth-guided static masks and employing joint multi-resolution hashing encoding, we ensure fast hole filling and high-quality mapping while mitigating the effects of dynamic information interference. To enhance tracking robustness, we utilize sparse feature points validated with optical flow and keyframes, enabling loop closure detection and global bundle optimization. Furthermore, DDN-SLAM supports monocular, stereo, and RGB-D inputs, operating robustly at a frequency of 20-30Hz. Extensive experiments on 6 virtual/real datasets demonstrate that our method outperforms state-of-the-art approaches in both dynamic and static scenes.
One-Step Late Fusion Multi-view Clustering with Compressed Subspace
Authors: Qiyuan Ou, Pei Zhang, Sihang Zhou, En Zhu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.01558
Pdf link: https://arxiv.org/pdf/2401.01558
Abstract Late fusion multi-view clustering (LFMVC) has become a rapidly growing class of methods in the multi-view clustering (MVC) field, owing to its excellent computational speed and clustering performance. One bottleneck faced by existing late fusion methods is that they are usually aligned to the average kernel function, which makes the clustering performance highly dependent on the quality of datasets. Another problem is that they require subsequent k-means clustering after obtaining the consensus partition matrix to get the final discrete labels, and the resulting separation of the label learning and cluster structure optimization processes limits the integrity of these models. To address the above issues, we propose an integrated framework named One-Step Late Fusion Multi-view Clustering with Compressed Subspace (OS-LFMVC-CS). Specifically, we use the consensus subspace to align the partition matrix while optimizing the partition fusion, and utilize the fused partition matrix to guide the learning of discrete labels. A six-step iterative optimization approach with verified convergence is proposed. Sufficient experiments on multiple datasets validate the effectiveness and efficiency of our proposed method.
Towards Multi-Objective High-Dimensional Feature Selection via Evolutionary Multitasking
Authors: Yinglan Feng, Liang Feng, Songbai Liu, Sam Kwong, Kay Chen Tan
Subjects: Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2401.01563
Pdf link: https://arxiv.org/pdf/2401.01563
Abstract The Evolutionary Multitasking (EMT) paradigm, an emerging research topic in evolutionary computation, has recently been successfully applied to solving high-dimensional feature selection (FS) problems. However, existing EMT-based FS methods suffer from several limitations, such as a single mode of multitask generation, conducting the same generic evolutionary search for all tasks, relying on implicit transfer mechanisms through sole solution encodings, and employing single-objective transformation, which result in inadequate knowledge acquisition, exploitation, and transfer. To this end, this paper develops a novel EMT framework for multiobjective high-dimensional feature selection problems, namely MO-FSEMT. In particular, multiple auxiliary tasks are constructed by distinct formulation methods to provide diverse search spaces and information representations and then simultaneously addressed with the original task through a multi-solver-based multitask optimization scheme. Each task has an independent population with task-specific representations and is solved using separate evolutionary solvers with different biases and search preferences. A task-specific knowledge transfer mechanism is designed to leverage the advantage information of each task, enabling the discovery and effective transmission of high-quality solutions during the search process. Comprehensive experimental results demonstrate that our MO-FSEMT framework can achieve overall superior performance compared to the state-of-the-art FS methods on 26 datasets. Moreover, the ablation studies verify the contributions of different components of the proposed MO-FSEMT.
Enhancing Generalization of Invisible Facial Privacy Cloak via Gradient Accumulation
Authors: Xuannan Liu, Yaoyao Zhong, Weihong Deng, Hongzhi Shi, Xingchen Cui, Yunfeng Yin, Dongchao Wen
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.01575
Pdf link: https://arxiv.org/pdf/2401.01575
Abstract The blooming of social media and face recognition (FR) systems has increased people's concern about privacy and security. A new type of adversarial privacy cloak (class-universal) can be applied to all the images of regular users, to prevent malicious FR systems from acquiring their identity information. In this work, we discover the optimization dilemma in the existing methods -- the local optima problem in large-batch optimization and the gradient information elimination problem in small-batch optimization. To solve these problems, we propose Gradient Accumulation (GA) to aggregate multiple small-batch gradients into a one-step iterative gradient to enhance the gradient stability and reduce the usage of quantization operations. Experiments show that our proposed method achieves high performance on the Privacy-Commons dataset against black-box face recognition models.
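A minimal sketch of generic gradient accumulation, assuming a toy 1-D least-squares model rather than the adversarial cloak optimization in the abstract above. It shows several small-batch gradients being aggregated into one update that matches the full-batch gradient; the model, data, and function names are illustrative assumptions.

```python
def grad_mse(w, batch):
    """Gradient of the mean squared error for the 1-D model y = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_step(w, micro_batches, lr):
    """Aggregate micro-batch gradients (weighted by size), then apply a single update."""
    total = sum(len(batch) for batch in micro_batches)
    accumulated = 0.0
    for batch in micro_batches:
        accumulated += grad_mse(w, batch) * len(batch) / total
    return w - lr * accumulated


# Hypothetical data generated by y = 2 * x, split into two small batches.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
micro_batches = [data[:2], data[2:]]

full_batch_gradient = grad_mse(0.0, data)
new_weight = accumulated_step(0.0, micro_batches, lr=0.01)
```

Only one parameter update is applied per group of micro-batches, so any per-update operation (such as a quantization or projection step) runs once instead of once per small batch.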
An Invariant Information Geometric Method for High-Dimensional Online Optimization
Authors: Zhengfei Zhang, Yunyue Wei, Yanan Sui
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2401.01579
Pdf link: https://arxiv.org/pdf/2401.01579
Abstract Sample efficiency is crucial in optimization, particularly in black-box scenarios characterized by expensive evaluations and zeroth-order feedback. When computing resources are plentiful, Bayesian optimization is often favored over evolution strategies. In this paper, we introduce a full invariance oriented evolution strategies algorithm, derived from its corresponding framework, that effectively rivals the leading Bayesian optimization method in tasks with dimensions at the upper limit of Bayesian capability. Specifically, we first build the framework InvIGO that fully incorporates historical information while retaining the full invariant and computational complexity. We then exemplify InvIGO on multi-dimensional Gaussian, which gives an invariant and scalable optimizer SynCMA. The theoretical behavior and advantages of our algorithm over other Gaussian-based evolution strategies are further analyzed. Finally, we benchmark SynCMA against leading algorithms in Bayesian optimization and evolution strategies on various high dimension tasks, including Mujoco locomotion tasks, rover planning task and synthetic functions. In all scenarios, SynCMA demonstrates great competence, if not dominance, over other algorithms in sample efficiency, showing the underdeveloped potential of property oriented evolution strategies.
Energy Sharing among Resources within Electrical Distribution Systems: A Systematic Review
Authors: G Hari Krishna, K. Victor Sam Moses Babu, Divyanshi Dwivedi, Pratyush Chakraborty, Pradeep Kumar Yemula, Mayukha Pal
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2401.01597
Pdf link: https://arxiv.org/pdf/2401.01597
Abstract The rapid increase in Electric Vehicle (EV) adoption provides a promising solution for reducing carbon emissions and fossil fuel dependency in transportation systems. However, the increasing numbers of EVs pose significant challenges to the electrical grids. In addition, the number of Distributed Energy Resources (DER) and Microgrids (MGs) is increasing on a global scale to meet the energy demand, consequently changing the energy infrastructure. Recently, energy-sharing methods have been proposed to share excess energy from DERs and EVs in Electric Vehicle Charging Infrastructure (EVCI) and MGs. Accommodating this sharing mechanism with the existing electrical distribution systems is a critical issue concerning the economic, reliability, and resilience aspects. This study examines the ever-changing field of EVCI and the critical role of peer-to-peer (P2P) energy trading in mitigating the problems with grid management that result from unorganized EV charging and intermittency in DER. Also, the possibility of energy sharing in electrical distribution systems for microgrids and EVCI, across various energy-sharing methods and algorithms, is discussed in detail. Furthermore, the application of market clearing algorithms like game theory, double auction theory, blockchain technology, optimization techniques, machine learning algorithms, and other models from the existing literature is presented. This paper discusses the policies, economic benefits, environmental impacts, societal advantages, and challenges in distribution systems related to sharing in EVCI and MGs. A roadmap for future research and sharing strategies is provided to guide policymakers, researchers, and industry stakeholders toward a sustainable, resilient, and efficient energy market by integrating P2P technology into EVCIs and MGs.
Entropy-based Probing Beam Selection and Beam Prediction via Deep Learning
Authors: Fan Meng, Cheng Zhang, Yongming Huang, Zhilei Zhang, Xiaoyu Bai, Zhaohua Lu
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2401.01609
Pdf link: https://arxiv.org/pdf/2401.01609
Abstract Hierarchical beam search in mmWave communications incurs substantial training overhead, necessitating deep learning-enabled beam predictions to effectively leverage channel priors and mitigate this overhead. In this study, we introduce a comprehensive probabilistic model of power distribution in beamspace, and formulate the joint optimization problem of probing beam selection and probabilistic beam prediction as an entropy minimization problem. Then, we propose a greedy scheme to iteratively and alternately solve this problem, where a transformer-based beam predictor is trained to estimate the conditional power distribution based on the probing beams and user location within each iteration, and the trained predictor selects an unmeasured beam that minimizes the entropy of remaining beams. To further reduce the number of interactions and the computational complexity of the iterative scheme, we propose a two-stage probing beam selection scheme. Firstly, probing beams are selected from a location-specific codebook designed by an entropy-based criterion, and predictions are made with corresponding feedback. Secondly, the optimal beam is identified using additional probing beams with the highest predicted power values. Simulation results demonstrate the superiority of the proposed schemes compared to hierarchical beam search and beam prediction with uniform probing beams.
SIGNeRF: Scene Integrated Generation for Neural Radiance Fields
Authors: Jan-Niklas Dihlmann, Andreas Engelhardt, Hendrik Lensch
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2401.01647
Pdf link: https://arxiv.org/pdf/2401.01647
Abstract Advances in image diffusion models have recently led to notable improvements in the generation of high-quality images. In combination with Neural Radiance Fields (NeRFs), they enabled new opportunities in 3D generation. However, most generative 3D approaches are object-centric and applying them to editing existing photorealistic scenes is not trivial. We propose SIGNeRF, a novel approach for fast and controllable NeRF scene editing and scene-integrated object generation. A new generative update strategy ensures 3D consistency across the edited images, without requiring iterative optimization. We find that depth-conditioned diffusion models inherently possess the capability to generate 3D consistent views by requesting a grid of images instead of single views. Based on these insights, we introduce a multi-view reference sheet of modified images. Our method updates an image collection consistently based on the reference sheet and refines the original NeRF with the newly generated image set in one go. By exploiting the depth conditioning mechanism of the image diffusion model, we gain fine control over the spatial location of the edit and enforce shape guidance by a selected region or an external mesh.
Distributed Pose-graph Optimization with Multi-level Partitioning for Collaborative SLAM
Authors: Cunhao Li, Peng Yi, Guanghui Guo, Yiguang Hong
Subjects: Robotics (cs.RO); Multiagent Systems (cs.MA)
Arxiv link: https://arxiv.org/abs/2401.01657
Pdf link: https://arxiv.org/pdf/2401.01657
Abstract The back-end module of Distributed Collaborative Simultaneous Localization and Mapping (DCSLAM) requires solving a nonlinear Pose Graph Optimization (PGO) under a distributed setting, also known as SE(d)-synchronization. Most existing distributed graph optimization algorithms employ a simple sequential partitioning scheme, which may result in unbalanced subgraph dimensions due to the different geographic locations of each robot, and hence imposes extra communication load. Moreover, the performance of current Riemannian optimization algorithms can be further accelerated. In this letter, we propose a novel distributed pose graph optimization algorithm combining multi-level partitioning with an accelerated Riemannian optimization method. Firstly, we employ the multi-level graph partitioning algorithm to preprocess the naive pose graph to formulate a balanced optimization problem. In addition, inspired by the accelerated coordinate descent method, we devise an Improved Riemannian Block Coordinate Descent (IRBCD) algorithm and the critical point obtained is globally optimal. Finally, we evaluate the effects of four common graph partitioning approaches on the correlation of the inter-subgraphs, and discover that the Highest scheme has the best partitioning performance. Also, we implement simulations to quantitatively demonstrate that our proposed algorithm outperforms the state-of-the-art distributed pose graph optimization protocols.
Simultaneous q-Space Sampling Optimization and Reconstruction for Fast and High-fidelity Diffusion Magnetic Resonance Imaging
Authors: Jing Yang, Jian Cheng, Cheng Li, Wenxin Fan, Juan Zou, Ruoyou Wu, Shanshan Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.01662
Pdf link: https://arxiv.org/pdf/2401.01662
Abstract Diffusion Magnetic Resonance Imaging (dMRI) plays a crucial role in the noninvasive investigation of tissue microstructural properties and structural connectivity in the \textit{in vivo} human brain. However, to effectively capture the intricate characteristics of water diffusion at various directions and scales, it is important to employ comprehensive q-space sampling. Unfortunately, this requirement leads to long scan times, limiting the clinical applicability of dMRI. To address this challenge, we propose SSOR, a Simultaneous q-Space sampling Optimization and Reconstruction framework. We jointly optimize a subset of q-space samples using a continuous representation of spherical harmonic functions and a reconstruction network. Additionally, we integrate the unique properties of diffusion magnetic resonance imaging (dMRI) in both the q-space and image domains by applying $\ell_1$-norm and total-variation regularization. The experiments conducted on HCP data demonstrate that SSOR has promising strengths both quantitatively and qualitatively and exhibits robustness to noise.
Ravnest: Decentralized Asynchronous Training on Heterogeneous Devices
Authors: Anirudh Rajiv Menon, Unnikrishnan Menon, Kailash Ahirwar
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2401.01728
Pdf link: https://arxiv.org/pdf/2401.01728
Abstract Modern deep learning models, growing larger and more complex, have demonstrated exceptional generalization and accuracy due to training on huge datasets. This trend is expected to continue. However, the increasing size of these models poses challenges in training, as traditional centralized methods are limited by memory constraints at such scales. This paper proposes an asynchronous decentralized training paradigm for large modern deep learning models that harnesses the compute power of regular heterogeneous PCs with limited resources connected across the internet to achieve favourable performance metrics. Ravnest facilitates decentralized training by efficiently organizing compute nodes into clusters with similar data transfer rates and compute capabilities, without necessitating that each node hosts the entire model. These clusters engage in $\textit{Zero-Bubble Asynchronous Model Parallel}$ training, and a $\textit{Parallel Multi-Ring All-Reduce}$ method is employed to effectively execute global parameter averaging across all clusters. We have framed our asynchronous SGD loss function as a block structured optimization problem with delayed updates and derived an optimal convergence rate of $O\left(\frac{1}{\sqrt{K}}\right)$. We further discuss linear speedup with respect to the number of participating clusters and the bound on the staleness parameter.
Gradient-Based Optimization of Lattice Quantizers
Authors: Erik Agrell, Daniel Pook-Kolb, Bruce Allen
Subjects: Information Theory (cs.IT); Mathematical Physics (math-ph); Metric Geometry (math.MG)
Arxiv link: https://arxiv.org/abs/2401.01799
Pdf link: https://arxiv.org/pdf/2401.01799
Abstract Lattices with minimal normalized second moments are designed using a new numerical optimization algorithm. Starting from a random lower-triangular generator matrix and applying stochastic gradient descent, all elements are updated towards the negative gradient, which makes it the most efficient algorithm proposed so far for this purpose. A graphical illustration of the theta series, called theta image, is introduced and shown to be a powerful tool for converting numerical lattice representations into their underlying exact forms. As a proof of concept, optimized lattices are designed in dimensions up to 16. In all dimensions, the algorithm converges to either the previously best known lattice or a better one. The dual of the 15-dimensional laminated lattice is conjectured to be optimal in its dimension.
Signal Processing in the Retina: Interpretable Graph Classifier to Predict Ganglion Cell Responses
Authors: Yasaman Parhizkar, Gene Cheung, Andrew W. Eckford
Subjects: Machine Learning (cs.LG); Image and Video Processing (eess.IV); Neurons and Cognition (q-bio.NC); Quantitative Methods (q-bio.QM)
Arxiv link: https://arxiv.org/abs/2401.01813
Pdf link: https://arxiv.org/pdf/2401.01813
Abstract It is a popular hypothesis in neuroscience that ganglion cells in the retina are activated by selectively detecting visual features in an observed scene. While ganglion cell firings can be predicted via data-trained deep neural nets, the networks remain indecipherable, thus providing little understanding of the cells' underlying operations. To extract knowledge from the cell firings, in this paper we learn an interpretable graph-based classifier from data to predict the firings of ganglion cells in response to visual stimuli. Specifically, we learn a positive semi-definite (PSD) metric matrix $\mathbf{M} \succeq 0$ that defines Mahalanobis distances between graph nodes (visual events) endowed with pre-computed feature vectors; the computed inter-node distances lead to edge weights and a combinatorial graph that is amenable to binary classification. Mathematically, we define the objective of metric matrix $\mathbf{M}$ optimization using a graph adaptation of large margin nearest neighbor (LMNN), which is rewritten as a semi-definite programming (SDP) problem. We solve it efficiently via a fast approximation called Gershgorin disc perfect alignment (GDPA) linearization. The learned metric matrix $\mathbf{M}$ provides interpretability: important features are identified along $\mathbf{M}$'s diagonal, and their mutual relationships are inferred from off-diagonal terms. Our fast metric learning framework can be applied to other biological systems with pre-chosen features that require interpretation.
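As a rough illustration of how a metric matrix $\mathbf{M}$ turns feature vectors into distances and edge weights, here is a minimal pure-Python sketch. The helper names `mahalanobis_sq` and `edge_weight` are hypothetical, and the exponential kernel used for the edge weight is an assumption; the abstract only says that inter-node distances lead to edge weights.

```python
import math


def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y) for a PSD matrix M."""
    d = [a - b for a, b in zip(x, y)]
    Md = [sum(M[i][j] * d[j] for j in range(len(d))) for i in range(len(d))]
    return sum(d[i] * Md[i] for i in range(len(d)))


def edge_weight(x, y, M):
    """Assumed edge weight: exp(-distance^2), so closer nodes get larger weights."""
    return math.exp(-mahalanobis_sq(x, y, M))


identity = [[1.0, 0.0], [0.0, 1.0]]
scaled = [[2.0, 0.0], [0.0, 0.5]]  # diagonal entries re-weight individual features
x, y = [1.0, 2.0], [4.0, 6.0]
d_identity = mahalanobis_sq(x, y, identity)
d_scaled = mahalanobis_sq(x, y, scaled)
```

With the identity matrix the distance reduces to the squared Euclidean distance. Larger diagonal entries make the corresponding feature count more, which is the sense in which the diagonal of $\mathbf{M}$ indicates feature importance.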
Many-Objective-Optimized Semi-Automated Robotic Disassembly Sequences
Authors: Takuya Kiyokawa, Kensuke Harada, Weiwei Wan, Tomoki Ishikura, Naoya Miyaji, Genichiro Matsuda
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2401.01817
Pdf link: https://arxiv.org/pdf/2401.01817
Abstract This study tackles the problem of many-objective sequence optimization for semi-automated robotic disassembly operations. To this end, we employ a many-objective genetic algorithm (MaOGA) inspired by the Non-dominated Sorting Genetic Algorithm (NSGA)-III, along with robotic-disassembly-oriented constraints and objective functions derived from geometrical and robot simulations using 3-dimensional (3D) geometrical information stored in a 3D Computer-Aided Design (CAD) model of the target product. The MaOGA begins by generating a set of initial chromosomes based on a contact and connection graph (CCG), rather than random chromosomes, to avoid falling into a local minimum and to yield repeatable convergence. The optimization imposes constraints on feasibility and stability as well as objective functions regarding difficulty, efficiency, prioritization, and allocability to generate a sequence that satisfies many preferred conditions under mandatory requirements for semi-automated robotic disassembly. The NSGA-III-inspired MaOGA also utilizes non-dominated sorting and niching with reference lines to further encourage steady and stable exploration and uniformly lower the overall evaluation values. Our sequence generation experiments for a complex product (36 parts) demonstrated that the proposed method can consistently produce feasible and stable sequences with a 100% success rate, bringing the multiple preferred conditions closer to the optimal solution required for semi-automated robotic disassembly operations.
Keyword: adam

Hadamard integrators for wave equations in time and frequency domain: Eulerian formulations via butterfly algorithms
Authors: Yuxiao Wei, Jin Cheng, Shingyu Leung, Robert Burridge, Jianliang Qian
Subjects: Numerical Analysis (math.NA); Mathematical Physics (math-ph)
Arxiv link: https://arxiv.org/abs/2401.01423
Pdf link: https://arxiv.org/pdf/2401.01423
Abstract Starting from the Kirchhoff-Huygens representation and Duhamel's principle of time-domain wave equations, we propose novel butterfly-compressed Hadamard integrators for self-adjoint wave equations in both the time and frequency domains in an inhomogeneous medium. First, we incorporate the leading term of Hadamard's ansatz into the Kirchhoff-Huygens representation to develop a short-time valid propagator. Second, using the Fourier transform in time, we derive the corresponding Eulerian short-time propagator in the frequency domain; on top of this propagator, we further develop a time-frequency-time (TFT) method for the Cauchy problem of time-domain wave equations. Third, we further propose the time-frequency-time-frequency (TFTF) method for the corresponding point-source Helmholtz equation, which provides Green's functions of the Helmholtz equation for all angular frequencies within a given frequency band. Fourth, to implement the TFT and TFTF methods efficiently, we introduce butterfly algorithms to compress oscillatory integral kernels at different frequencies. As a result, the proposed methods can construct wave fields beyond caustics implicitly and advance spatially overturning waves in time naturally with quasi-optimal computational complexity and memory usage. Furthermore, once constructed, the Hadamard integrators can be employed to solve both time-domain wave equations with various initial conditions and frequency-domain wave equations with different point sources. Numerical examples for two-dimensional wave equations illustrate the accuracy and efficiency of the proposed methods.
Keyword: gradient

Hierarchical Over-the-Air Federated Learning with Awareness of Interference and Data Heterogeneity
Authors: Seyed Mohammad Azimi-Abarghouyi, Viktoria Fodor
Subjects: Information Theory (cs.IT); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.01442
Pdf link: https://arxiv.org/pdf/2401.01442
Abstract When implementing hierarchical federated learning over wireless networks, scalability assurance and the ability to handle both interference and device data heterogeneity are crucial. This work introduces a learning method designed to address these challenges, along with a scalable transmission scheme that efficiently uses a single wireless resource through over-the-air computation. To provide resistance against data heterogeneity, we employ gradient aggregations. Meanwhile, the impact of interference is minimized through optimized receiver normalizing factors. For this, we model a multi-cluster wireless network using stochastic geometry, and characterize the mean squared error of the aggregation estimations as a function of the network parameters. We show that despite the interference and the data heterogeneity, the proposed scheme achieves high learning accuracy and can significantly outperform the conventional hierarchical algorithm.
Optimizing UAV-UGV Coalition Operations: A Hybrid Clustering and Multi-Agent Reinforcement Learning Approach for Path Planning in Obstructed Environment
Authors: Shamyo Brotee, Farhan Kabir, Md. Abdur Razzaque, Palash Roy, Md. Mamun-Or-Rashid, Md. Rafiul Hassan, Mohammad Mehedi Hassan
Subjects: Robotics (cs.RO); Multiagent Systems (cs.MA)
Arxiv link: https://arxiv.org/abs/2401.01481
Pdf link: https://arxiv.org/pdf/2401.01481
Abstract One of the most critical applications undertaken by coalitions of Unmanned Aerial Vehicles (UAVs) and Unmanned Ground Vehicles (UGVs) is reaching predefined targets by following the most time-efficient routes while avoiding collisions. Unfortunately, UAVs are hampered by limited battery life, and UGVs face challenges in reachability due to obstacles and elevation variations. Existing literature primarily focuses on one-to-one coalitions, which constrains the efficiency of reaching targets. In this work, we introduce a novel approach for a UAV-UGV coalition with a variable number of vehicles, employing a modified mean-shift clustering algorithm to segment targets into multiple zones. Each vehicle utilizes Multi-agent Deep Deterministic Policy Gradient (MADDPG) and Multi-agent Proximal Policy Optimization (MAPPO), two advanced reinforcement learning algorithms, to form an effective coalition for navigating obstructed environments without collisions. This approach of assigning targets to various circular zones, based on density and range, significantly reduces the time required to reach these targets. Moreover, introducing variability in the number of UAVs and UGVs in a coalition enhances task efficiency by enabling simultaneous multi-target engagement. The results of our experimental evaluation demonstrate that our proposed method substantially surpasses current state-of-the-art techniques, nearly doubling efficiency in terms of target navigation time and task completion rate.
Enhancing Generalization of Invisible Facial Privacy Cloak via Gradient Accumulation
Authors: Xuannan Liu, Yaoyao Zhong, Weihong Deng, Hongzhi Shi, Xingchen Cui, Yunfeng Yin, Dongchao Wen
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.01575
Pdf link: https://arxiv.org/pdf/2401.01575
Abstract The blooming of social media and face recognition (FR) systems has increased people's concern about privacy and security. A new type of adversarial privacy cloak (class-universal) can be applied to all the images of regular users, to prevent malicious FR systems from acquiring their identity information. In this work, we discover the optimization dilemma in the existing methods -- the local optima problem in large-batch optimization and the gradient information elimination problem in small-batch optimization. To solve these problems, we propose Gradient Accumulation (GA) to aggregate multiple small-batch gradients into a one-step iterative gradient to enhance the gradient stability and reduce the usage of quantization operations. Experiments show that our proposed method achieves high performance on the Privacy-Commons dataset against black-box face recognition models.
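The idea of aggregating several small-batch gradients into a single update can be sketched generically as follows. This is a toy example on a one-parameter least-squares model, not the paper's cloak-generation procedure; `grad_mse`, `accumulated_step`, the data, and the learning rate are all illustrative assumptions.

```python
def grad_mse(w, batch):
    """Gradient of the mean squared error for the 1-D model y ~ w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_step(w, micro_batches, lr):
    """Average the gradients of several small batches, then take one step."""
    acc = 0.0
    for batch in micro_batches:
        acc += grad_mse(w, batch)      # accumulate; no parameter update yet
    g = acc / len(micro_batches)       # one aggregated gradient
    return w - lr * g, g


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # toy data with y = 2x
micro_batches = [data[:2], data[2:]]                      # equal-sized small batches
new_w, accumulated_grad = accumulated_step(0.0, micro_batches, lr=0.01)
full_batch_grad = grad_mse(0.0, data)
```

With equal-sized micro-batches the accumulated gradient matches the full-batch gradient, while only one small batch has to be processed at a time.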
Generalization Error Curves for Analytic Spectral Algorithms under Power-law Decay
Authors: Yicheng Li, Weiye Gan, Zuoqiang Shi, Qian Lin
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST)
Arxiv link: https://arxiv.org/abs/2401.01599
Pdf link: https://arxiv.org/pdf/2401.01599
Abstract The generalization error curve of a kernel regression method aims at determining the exact order of the generalization error under various source conditions, noise levels, and choices of the regularization parameter, rather than the minimax rate. In this work, under mild assumptions, we rigorously provide a full characterization of the generalization error curves of the kernel gradient descent method (and a large class of analytic spectral algorithms) in kernel regression. Consequently, we can sharpen the near inconsistency of kernel interpolation and clarify the saturation effects of kernel regression algorithms with higher qualification, etc. Thanks to the neural tangent kernel theory, these results greatly improve our understanding of the generalization behavior of training wide neural networks. A novel technical contribution, the analytic functional argument, might be of independent interest.
Gradient-Based Optimization of Lattice Quantizers
Authors: Erik Agrell, Daniel Pook-Kolb, Bruce Allen
Subjects: Information Theory (cs.IT); Mathematical Physics (math-ph); Metric Geometry (math.MG)
Arxiv link: https://arxiv.org/abs/2401.01799
Pdf link: https://arxiv.org/pdf/2401.01799
Abstract Lattices with minimal normalized second moments are designed using a new numerical optimization algorithm. Starting from a random lower-triangular generator matrix and applying stochastic gradient descent, all elements are updated towards the negative gradient, which makes it the most efficient algorithm proposed so far for this purpose. A graphical illustration of the theta series, called theta image, is introduced and shown to be a powerful tool for converting numerical lattice representations into their underlying exact forms. As a proof of concept, optimized lattices are designed in dimensions up to 16. In all dimensions, the algorithm converges to either the previously best known lattice or a better one. The dual of the 15-dimensional laminated lattice is conjectured to be optimal in its dimension.
NODEC: Neural ODE For Optimal Control of Unknown Dynamical Systems
Authors: Cheng Chi
Subjects: Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.01836
Pdf link: https://arxiv.org/pdf/2401.01836
Abstract Controlling complex dynamical systems is generally associated with minimizing certain control objectives with known dynamics under the variational calculus framework. For systems with unknown dynamics, an additional step of dynamics modeling is required. However, any inaccuracy in dynamics modeling will lead to sub-optimality in the resulting control function. Another set of approaches for controlling unknown dynamical systems, reinforcement learning, folds the dynamics modeling into controller training via value function approximation or policy gradient through extensive interaction with the environment, but it suffers from low data efficiency. To address these issues, we introduce NODEC, a novel framework for controlling unknown dynamical systems, which combines dynamics modelling and controller training using a coupled neural ODE model. Through an intriguing interplay between the two coupled neural networks, NODEC learns system dynamics as well as optimal controls that guide the unknown dynamical system towards target states. Our experiments demonstrate the effectiveness and data efficiency of NODEC for learning optimal control of unknown dynamical systems.
On the hardness of learning under symmetries
Authors: Bobak T. Kiani, Thien Le, Hannah Lawrence, Stefanie Jegelka, Melanie Weber
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Statistics Theory (math.ST); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2401.01869
Pdf link: https://arxiv.org/pdf/2401.01869
Abstract We study the problem of learning equivariant neural networks via gradient descent. The incorporation of known symmetries ("equivariance") into neural nets has empirically improved the performance of learning pipelines, in domains ranging from biology to computer vision. However, a rich yet separate line of learning theoretic research has demonstrated that actually learning shallow, fully-connected (i.e. non-symmetric) networks has exponential complexity in the correlational statistical query (CSQ) model, a framework encompassing gradient descent. In this work, we ask: are known problem symmetries sufficient to alleviate the fundamental hardness of learning neural nets with gradient descent? We answer this question in the negative. In particular, we give lower bounds for shallow graph neural networks, convolutional networks, invariant polynomials, and frame-averaged networks for permutation subgroups, which all scale either superpolynomially or exponentially in the relevant input dimension. Therefore, in spite of the significant inductive bias imparted via symmetry, actually learning the complete classes of functions represented by equivariant neural networks via gradient descent remains hard.
Keyword: super-resolution

Efficient Hybrid Zoom using Camera Fusion on Mobile Phones
Authors: Xiaotong Wu, Wei-Sheng Lai, YiChang Shih, Charles Herrmann, Michael Krainin, Deqing Sun, Chia-Kai Liang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.01461
Pdf link: https://arxiv.org/pdf/2401.01461
Abstract DSLR cameras can achieve multiple zoom levels via shifting lens distances or swapping lens types. However, these techniques are not possible on smartphone devices due to space constraints. Most smartphone manufacturers adopt a hybrid zoom system: commonly a Wide (W) camera at a low zoom level and a Telephoto (T) camera at a high zoom level. To simulate zoom levels between W and T, these systems crop and digitally upsample images from W, leading to significant detail loss. In this paper, we propose an efficient system for hybrid zoom super-resolution on mobile devices, which captures a synchronous pair of W and T shots and leverages machine learning models to align and transfer details from T to W. We further develop an adaptive blending method that accounts for depth-of-field mismatches, scene occlusion, flow uncertainty, and alignment errors. To minimize the domain gap, we design a dual-phone camera rig to capture real-world inputs and ground-truths for supervised training. Our method generates a 12-megapixel image in 500ms on a mobile platform and compares favorably against state-of-the-art methods under extensive evaluation on real-world scenarios.
