New submissions for Wed, 13 Dec 23

Keyword: sgd

Layered Randomized Quantization for Communication-Efficient and Privacy-Preserving Distributed Learning

Authors: Guangfeng Yan, Tan Li, Tian Lan, Kui Wu, Linqi Song
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2312.07060
Pdf link: https://arxiv.org/pdf/2312.07060
Abstract Next-generation wireless networks, such as edge intelligence and wireless distributed learning, face two critical challenges: communication efficiency and privacy protection. In this work, our focus is on addressing these issues in a distributed learning framework. We consider a new approach that simultaneously achieves communication efficiency and privacy protection by exploiting the privacy advantage offered by quantization. Specifically, we use a quantization scheme called Gaussian Layered Randomized Quantization (Gau-LRQ) that compresses the raw model gradients using a layer multishift coupler. By adjusting the parameters of Gau-LRQ, we shape the quantization error to follow the expected Gaussian distribution, thus ensuring client-level differential privacy (CLDP). We demonstrate the effectiveness of our proposed Gau-LRQ in the distributed stochastic gradient descent (SGD) framework and theoretically quantify the trade-offs between communication, privacy, and convergence performance. We further improve the convergence performance by enabling dynamic privacy budget and quantization bit allocation. We achieve this by using an optimization formula that minimizes convergence error subject to the privacy budget constraint. We evaluate our approach on multiple datasets, including MNIST, CIFAR-10, and CIFAR-100, and show that our proposed method outperforms the baselines in terms of learning performance under various privacy constraints. Moreover, we observe that dynamic privacy allocation yields additional accuracy improvements for the models compared to the fixed scheme.
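As a rough illustration of how a randomized quantizer can give the quantization error a known distribution, the sketch below uses plain subtractive dithering. This is a simpler classical technique and not the paper's Gau-LRQ: with a dither shared between encoder and decoder, the reconstruction error stays within half a quantization step regardless of the input. The function names, step size, and seed handling are assumptions made for illustration.

```python
import math
import random


def dithered_quantize(x, step, rng):
    """Encode x as an integer index using a random dither shared with the decoder."""
    dither = rng.uniform(-step / 2, step / 2)
    index = math.floor((x + dither) / step + 0.5)  # nearest grid point
    return index, dither


def dequantize(index, dither, step):
    """Decoder subtracts the same dither, so the error lies in [-step/2, step/2]."""
    return index * step - dither


def max_abs_error(values, step, seed=0):
    """Largest reconstruction error over the given values (hypothetical helper)."""
    rng = random.Random(seed)
    worst = 0.0
    for x in values:
        index, dither = dithered_quantize(x, step, rng)
        worst = max(worst, abs(dequantize(index, dither, step) - x))
    return worst
```

Only the integer index would need to be transmitted if both sides derive the dither from shared randomness; shaping the error into a Gaussian, as the abstract describes, requires the additional layered construction from the paper.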
Keyword: optimization

Optimizing Fault-Tolerant Quality-Guaranteed Sensor Deployments for UAV Localization in Critical Areas via Computational Geometry
Authors: Marco Esposito, Toni Mancini, Enrico Tronci
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computational Geometry (cs.CG)
Arxiv link: https://arxiv.org/abs/2312.06667
Pdf link: https://arxiv.org/pdf/2312.06667
Abstract The increasing spread of small commercial Unmanned Aerial Vehicles (UAVs, aka drones) presents serious threats for critical areas such as airports, power plants, and governmental and military facilities. In fact, such UAVs can easily disturb or jam radio communications, collide with other flying objects, perform espionage activity, and carry offensive payloads, e.g., weapons or explosives. A central problem when designing surveillance solutions for the localization of unauthorized UAVs in critical areas is to decide how many triangulating sensors to use, and where to deploy them to optimize both coverage and cost effectiveness. In this article, we compute deployments of triangulating sensors for UAV localization, optimizing a given blend of metrics, namely coverage under multiple sensing quality levels, cost-effectiveness, and fault tolerance. We focus on large, complex 3D regions, which exhibit obstacles (e.g., buildings), varying terrain elevation, different coverage priorities, and constraints on possible sensor placement. Our novel approach relies on computational geometry and statistical model checking, and enables the effective use of off-the-shelf AI-based black-box optimizers. Moreover, our method allows us to compute a closed-form, analytical representation of the region uncovered by a sensor deployment, which provides the means for rigorous, formal certification of the quality of the latter. We show the practical feasibility of our approach by computing optimal sensor deployments for UAV localization in two large, complex 3D critical regions, the Rome Leonardo Da Vinci International Airport (FCO) and the Vienna International Center (VIC), using NOMAD as our state-of-the-art underlying optimization engine. Results show that we can compute optimal sensor deployments within a few hours on a standard workstation and within minutes on a small parallel infrastructure.
Perceptual Similarity guidance and text guidance optimization for Editing Real Images using Guided Diffusion Models
Authors: Ruichen Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.06680
Pdf link: https://arxiv.org/pdf/2312.06680
Abstract When using a diffusion model for image editing, the modified image can sometimes differ greatly from the source. To address this, we apply a dual-guidance approach to maintain high fidelity to the original in areas that are not altered. First, we employ text-guided optimization, using text embeddings to direct latent space and classifier-free guidance. Second, we use perceptual similarity guidance, optimizing latent vectors with posterior sampling via Tweedie's formula during the reverse process. This method ensures the realistic rendering of both the edited elements and the preservation of the unedited parts of the original image.
Deciphering 'What' and 'Where' Visual Pathways from Spectral Clustering of Layer-Distributed Neural Representations
Authors: Xiao Zhang, David Yunis, Michael Maire
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.06716
Pdf link: https://arxiv.org/pdf/2312.06716
Abstract We present an approach for analyzing grouping information contained within a neural network's activations, permitting extraction of spatial layout and semantic segmentation from the behavior of large pre-trained vision models. Unlike prior work, our method conducts a holistic analysis of a network's activation state, leveraging features from all layers and obviating the need to guess which part of the model contains relevant information. Motivated by classic spectral clustering, we formulate this analysis in terms of an optimization objective involving a set of affinity matrices, each formed by comparing features within a different layer. Solving this optimization problem using gradient descent allows our technique to scale from single images to dataset-level analysis, including, in the latter, both intra- and inter-image relationships. Analyzing a pre-trained generative transformer provides insight into the computational strategy learned by such models. Equating affinity with key-query similarity across attention layers yields eigenvectors encoding scene spatial layout, whereas defining affinity by value vector similarity yields eigenvectors encoding object identity. This result suggests that key and query vectors coordinate attentional information flow according to spatial proximity (a 'where' pathway), while value vectors refine a semantic category representation (a 'what' pathway).
Efficient and Effective Similarity Search over Bipartite Graphs
Authors: Renchi Yang
Subjects: Social and Information Networks (cs.SI); Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2312.06724
Pdf link: https://arxiv.org/pdf/2312.06724
Abstract Similarity search over a bipartite graph aims to retrieve from the graph the nodes that are similar to each other, which finds applications in various fields such as online advertising and recommender systems. Existing similarity measures either (i) overlook the unique properties of bipartite graphs, or (ii) fail to capture high-order information between nodes accurately, leading to suboptimal result quality. Recently, Hidden Personalized PageRank (HPP) has been applied to this problem and found to be more effective than prior similarity measures. However, existing solutions for HPP computation incur significant computational costs, rendering it inefficient, especially on large graphs. In this paper, we first identify an inherent drawback of HPP and overcome it by proposing bidirectional HPP (BHPP). Then, we formulate similarity search over bipartite graphs as the problem of approximate BHPP computation, and present an efficient solution, Approx-BHPP. Specifically, Approx-BHPP offers rigorous theoretical accuracy guarantees with optimal computational complexity by combining deterministic graph traversal with matrix operations in an optimized and non-trivial way. Moreover, our solution achieves significant gain in practical efficiency due to several carefully designed optimizations. Extensive experiments, comparing BHPP against 8 existing similarity measures over 7 real bipartite graphs, demonstrate the effectiveness of BHPP on query rewriting and item recommendation. Moreover, Approx-BHPP often outperforms baseline solutions by up to orders of magnitude in terms of computational time on both small and large datasets.
RGNet: A Unified Retrieval and Grounding Network for Long Videos
Authors: Tanveer Hannan, Md Mohaiminul Islam, Thomas Seidl, Gedas Bertasius
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.06729
Pdf link: https://arxiv.org/pdf/2312.06729
Abstract We present a novel end-to-end method for long-form video temporal grounding to locate specific moments described by natural language queries. Prior long-video methods for this task typically contain two stages: proposal selection and grounding regression. However, the proposal selection of these methods is disjoint from the grounding network and is not trained end-to-end, which limits the effectiveness of these methods. Moreover, these methods operate uniformly over the entire temporal window, which is suboptimal given redundant and irrelevant features in long videos. In contrast to these prior approaches, we introduce RGNet, a unified network designed for jointly selecting proposals from hour-long videos and locating moments specified by natural language queries within them. To achieve this, we redefine proposal selection as a video-text retrieval task, i.e., retrieving the correct candidate videos given a text query. The core component of RGNet is a unified cross-modal RG-Encoder that bridges the two stages with shared features and mutual optimization. The encoder strategically focuses on relevant time frames using a sparse sampling technique. RGNet outperforms previous methods, demonstrating state-of-the-art performance on long video temporal grounding datasets MAD and Ego4D. The code is released at https://github.com/Tanveer81/RGNet
Keypoint-based Stereophotoclinometry for Characterizing and Navigating Small Bodies: A Factor Graph Approach
Authors: Travis Driver, Andrew Vaughan, Yang Cheng, Adnan Ansar, John Christian, Panagiotis Tsiotras
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.06865
Pdf link: https://arxiv.org/pdf/2312.06865
Abstract This paper proposes the incorporation of techniques from stereophotoclinometry (SPC) into a keypoint-based structure-from-motion (SfM) system to estimate the surface normal and albedo at detected landmarks to improve autonomous surface and shape characterization of small celestial bodies from in-situ imagery. In contrast to the current state-of-the-practice method for small body shape reconstruction, i.e., SPC, which relies on human-in-the-loop verification and high-fidelity a priori information to achieve accurate results, we forgo the expensive maplet estimation step and instead leverage dense keypoint measurements and correspondences from an autonomous keypoint detection and matching method based on deep learning to provide the necessary photogrammetric constraints. Moreover, we develop a factor graph-based approach allowing for simultaneous optimization of the spacecraft's pose, landmark positions, Sun-relative direction, and surface normals and albedos via fusion of Sun sensor measurements and image keypoint measurements. The proposed framework is validated on real imagery of the Cornelia crater on Asteroid 4 Vesta, along with pose estimation and mapping comparison against an SPC reconstruction, where we demonstrate precise alignment to the SPC solution without relying on any a priori camera pose and topography information or humans-in-the-loop.
ELSA: Partial Weight Freezing for Overhead-Free Sparse Network Deployment
Authors: Paniz Halvachi, Alexandra Peste, Dan Alistarh, Christoph H. Lampert
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.06872
Pdf link: https://arxiv.org/pdf/2312.06872
Abstract We present ELSA, a practical solution for creating deep networks that can easily be deployed at different levels of sparsity. The core idea is to embed one or more sparse networks within a single dense network as a proper subset of the weights. At prediction time, any sparse model can be extracted effortlessly simply by zeroing out weights according to a predefined mask. ELSA is simple, powerful and highly flexible. It can use essentially any existing technique for network sparsification and network training. In particular, it does not restrict the loss function, architecture or the optimization technique. Our experiments show that ELSA's advantages of flexible deployment come with no or just a negligible reduction in prediction quality compared to the standard way of using multiple sparse networks that are trained and stored independently.
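The extraction step described in the abstract (zeroing out weights according to a predefined mask) can be sketched as follows. This is a minimal illustration on a flat list of weights; the function names, the weight values, and the mask are hypothetical and do not come from the paper.

```python
def extract_sparse(dense_weights, mask):
    """Return a sparse copy of the weights: entries whose mask value is 0 are zeroed."""
    if len(dense_weights) != len(mask):
        raise ValueError("weights and mask must have the same length")
    return [w if keep else 0.0 for w, keep in zip(dense_weights, mask)]


def sparsity(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


# Hypothetical dense weights and a hypothetical mask giving 50% sparsity.
dense = [0.8, -0.1, 0.05, 1.2, -0.7, 0.02]
mask_50 = [1, 0, 0, 1, 1, 0]
sparse_50 = extract_sparse(dense, mask_50)
```

The dense weights are left untouched, so several masks could be applied to the same stored network to obtain models at different sparsity levels.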
A Novel Differentiable Loss Function for Unsupervised Graph Neural Networks in Graph Partitioning
Authors: Vivek Chaudhary
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.06877
Pdf link: https://arxiv.org/pdf/2312.06877
Abstract In this paper, we explore the graph partitioning problem, a pivotal combinatorial optimization challenge with extensive applications in various fields such as science, technology, and business. Recognized as an NP-hard problem, graph partitioning lacks polynomial-time algorithms for its resolution. Recently, there has been a burgeoning interest in leveraging machine learning, particularly approaches like supervised, unsupervised, and reinforcement learning, to tackle such NP-hard problems. However, these methods face significant hurdles: supervised learning is constrained by the necessity of labeled solution instances, which are often computationally impractical to obtain; reinforcement learning grapples with instability in the learning process; and unsupervised learning contends with the absence of a differentiable loss function, a consequence of the discrete nature of most combinatorial optimization problems. Addressing these challenges, our research introduces a novel pipeline employing an unsupervised graph neural network to solve the graph partitioning problem. The core innovation of this study is the formulation of a differentiable loss function tailored for this purpose. We rigorously evaluate our methodology against contemporary state-of-the-art techniques, focusing on two metrics, cuts and balance, and our findings reveal that our method is competitive with these leading methods.
LoRA-Enhanced Distillation on Guided Diffusion Models
Authors: Pareesa Ameneh Golnari
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.06899
Pdf link: https://arxiv.org/pdf/2312.06899
Abstract Diffusion models, such as Stable Diffusion (SD), offer the ability to generate high-resolution images with diverse features, but they come at a significant computational and memory cost. In classifier-free guided diffusion models, prolonged inference times are attributed to the necessity of computing two separate diffusion models at each denoising step. Recent work has shown promise in improving inference time through distillation techniques, teaching the model to perform similar denoising steps with reduced computations. However, the application of distillation introduces additional memory overhead to these already resource-intensive diffusion models, making it less practical. To address these challenges, our research explores a novel approach that combines Low-Rank Adaptation (LoRA) with model distillation to efficiently compress diffusion models. This approach not only reduces inference time but also mitigates memory overhead, and notably decreases memory consumption even before applying distillation. The results are remarkable, featuring a significant reduction in inference time due to the distillation process and a substantial 50% reduction in memory consumption. Our examination of the generated images underscores that the incorporation of LoRA-enhanced distillation maintains image quality and alignment with the provided prompts. In summary, while conventional distillation tends to increase memory consumption, LoRA-enhanced distillation offers optimization without any trade-offs or compromises in quality.
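The two evaluations per denoising step mentioned in the abstract come from classifier-free guidance, which combines an unconditional and a conditional noise prediction. The sketch below shows the commonly used combination rule on plain lists; the function name and the toy values are assumptions, and no actual diffusion model is involved.

```python
def guided_noise(eps_uncond, eps_cond, guidance_scale):
    """Combine unconditional and conditional noise predictions element-wise.

    Uses the usual classifier-free guidance rule:
    eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)
    """
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("predictions must have the same length")
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy predictions standing in for the two model evaluations at one step.
eps_u = [0.0, 1.0]
eps_c = [1.0, 3.0]
eps_guided = guided_noise(eps_u, eps_c, 2.0)
```

A scale of 0 reproduces the unconditional prediction and a scale of 1 reproduces the conditional one; larger scales extrapolate past the conditional prediction.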
Perseus: Removing Energy Bloat from Large Model Training
Authors: Jae-Won Chung, Yile Gu, Insu Jang, Luoxi Meng, Nikhil Bansal, Mosharaf Chowdhury
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2312.06902
Pdf link: https://arxiv.org/pdf/2312.06902
Abstract Training large AI models on numerous GPUs consumes a massive amount of energy. We observe that not all energy consumed during training directly contributes to end-to-end training throughput, and a significant portion can be removed without slowing down training, which we call energy bloat. In this work, we identify two independent sources of energy bloat in large model training, intrinsic and extrinsic, and propose Perseus, a unified optimization framework that mitigates both. Perseus obtains the "iteration time-energy" Pareto frontier of any large model training job using an efficient iterative graph cut-based algorithm and schedules energy consumption of its forward and backward computations across time to remove intrinsic and extrinsic energy bloat. Evaluation on large models like GPT-3 and Bloom shows that Perseus reduces energy consumption of large model training by up to 30%, enabling savings that were previously unobtainable.
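To make the "iteration time-energy" Pareto frontier concrete, the sketch below filters a set of candidate (time, energy) points down to the non-dominated ones. It only illustrates what a Pareto frontier is; it is not the graph cut-based algorithm the abstract refers to, and the function name and sample points are hypothetical.

```python
def pareto_frontier(points):
    """Return the (time, energy) points not dominated by any other point.

    Lower is better for both coordinates. A point is dominated if another
    point is no worse in both coordinates and strictly better in one.
    """
    frontier = []
    best_energy = float("inf")
    # After sorting by time (then energy), a point is on the frontier
    # only if it strictly lowers the best energy seen so far.
    for time, energy in sorted(set(points)):
        if energy < best_energy:
            frontier.append((time, energy))
            best_energy = energy
    return frontier


# Hypothetical (iteration time in s, energy in kJ) candidates.
candidates = [(1.0, 10.0), (1.2, 8.0), (1.1, 9.5), (1.3, 8.5), (1.5, 7.0), (1.2, 9.0)]
frontier = pareto_frontier(candidates)
```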
"I Want It That Way": Enabling Interactive Decision Support Using Large Language Models and Constraint Programming
Authors: Connor Lawless, Jakob Schoeffer, Lindy Le, Kael Rowan, Shilad Sen, Cristina St. Hill, Jina Suh, Bahar Sarrafzadeh
Subjects: Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2312.06908
Pdf link: https://arxiv.org/pdf/2312.06908
Abstract A critical factor in the success of decision support systems is the accurate modeling of user preferences. Psychology research has demonstrated that users often develop their preferences during the elicitation process, highlighting the pivotal role of system-user interaction in developing personalized systems. This paper introduces a novel approach, combining Large Language Models (LLMs) with Constraint Programming to facilitate interactive decision support. We study this hybrid framework through the lens of meeting scheduling, a time-consuming daily activity faced by a multitude of information workers. We conduct three studies to evaluate the novel framework, including a diary study (n=64) to characterize contextual scheduling preferences, a quantitative evaluation of the system's performance, and a user study (n=10) with a prototype system. Our work highlights the potential for a hybrid LLM and optimization approach for iterative preference elicitation and design considerations for building systems that support human-system collaborative decision-making processes.
Blockchain-Based Security Architecture for Unmanned Aerial Vehicles in B5G/6G Services and Beyond: A Comprehensive Approach
Authors: Senthil Kumar Jagatheesaperumal, Mohamed Rahouti, Kaiqi Xiong, Abdellah Chehri, Nasir Ghani, Jan Bieniek
Subjects: Cryptography and Security (cs.CR); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2312.06928
Pdf link: https://arxiv.org/pdf/2312.06928
Abstract Unmanned Aerial Vehicles (UAVs), previously favored by enthusiasts, have evolved into indispensable tools for effectively managing disasters and responding to emergencies. For example, one of their most critical applications is to provide seamless wireless communication services in remote rural areas. Thus, it is essential to identify and consider the different security challenges in the research and development associated with advanced UAV-based B5G/6G architectures. Following this requirement, the present study thoroughly examines the security considerations about UAVs in relation to the architectural framework of the 5G/6G system, the technologies that facilitate its operation, and the concerns surrounding privacy. It exhibits security integration at all the protocol stack layers and analyzes the existing mechanisms to secure UAV-based B5G/6G communications and its energy and power optimization factors. Lastly, this article also summarizes modern technological trends for establishing security and protecting UAV-based systems, along with the open challenges and strategies for future research work.
Online Saddle Point Problem and Online Convex-Concave Optimization
Authors: Qing-xin Meng, Jian-wei Liu
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2312.06957
Pdf link: https://arxiv.org/pdf/2312.06957
Abstract Centered around solving the Online Saddle Point problem, this paper introduces the Online Convex-Concave Optimization (OCCO) framework, which involves a sequence of two-player time-varying convex-concave games. We propose the generalized duality gap (Dual-Gap) as the performance metric and establish the parallel relationship between OCCO with Dual-Gap and Online Convex Optimization (OCO) with regret. To demonstrate the natural extension of OCCO from OCO, we develop two algorithms, the implicit online mirror descent-ascent and its optimistic variant. Analysis reveals that their duality gaps share similar expression forms with the corresponding dynamic regrets arising from implicit updates in OCO. Empirical results further substantiate the effectiveness of our algorithms. Simultaneously, we unveil that the dynamic Nash equilibrium regret, which was initially introduced in a recent paper, has inherent defects.
Dynamically configured physics-informed neural network in topology optimization applications
Authors: Jichao Yin, Ziming Wen, Shuhao Li, Yaya Zhanga, Hu Wang
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.06993
Pdf link: https://arxiv.org/pdf/2312.06993
Abstract Integration of machine learning (ML) into the topology optimization (TO) framework is attracting increasing attention, but data acquisition in data-driven models is prohibitive. Compared with popular ML methods, the physics-informed neural network (PINN) can avoid generating enormous amounts of data when solving forward problems and additionally provide better inference. To this end, a dynamically configured PINN-based topology optimization (DCPINN-TO) method is proposed. The DCPINN is composed of two subnetworks, namely the backbone neural network (NN) and the coefficient NN, where the coefficient NN has fewer trainable parameters. The designed architecture aims to dynamically configure trainable parameters; that is, an inexpensive NN is used to replace an expensive one at certain optimization cycles. Furthermore, an active sampling strategy is proposed to selectively sample collocations depending on the pseudo-densities at each optimization cycle. In this manner, the number of collocations will decrease with the optimization process but will hardly affect it. The Gaussian integral is used to calculate the strain energy of elements, which yields a byproduct of decoupling the mapping of the material at the collocations. Several examples with different resolutions validate the feasibility of the DCPINN-TO method, and multiload and multiconstraint problems are employed to illustrate its generalization. In addition, compared to finite element analysis-based TO (FEA-TO), the accuracy of the displacement prediction and optimization results indicate that the DCPINN-TO method is effective and efficient.
Stein Coverage: a Variational Inference Approach to Distribution-matching Multisensor Deployment
Authors: Donipolo Ghimire, Solmaz S. Kia
Subjects: Multiagent Systems (cs.MA)
Arxiv link: https://arxiv.org/abs/2312.07001
Pdf link: https://arxiv.org/pdf/2312.07001
Abstract This paper examines the spatial coverage optimization problem for multiple sensors in a known convex environment, where the coverage service of each sensor is heterogeneous and anisotropic. We introduce the Stein Coverage algorithm, a distribution-matching coverage approach that aims to place sensors at positions and orientations such that their collective coverage distribution is as close as possible to the event distribution. To select the most important representative points from the coverage event distribution, Stein Coverage utilizes the Stein Variational Gradient Descent (SVGD), a deterministic sampling method from the variational inference literature. An innovation in our work is the introduction of a repulsive force between the samples in the SVGD algorithm to spread the samples and avoid footprint overlap for the deployed sensors. After pinpointing the points of interest for deployment, Stein Coverage solves the multisensor assignment problem using a bipartite optimal matching process. Simulations demonstrate the advantages of the Stein Coverage method compared to conventional Voronoi partitioning multisensor deployment methods.
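For readers unfamiliar with SVGD, the sketch below runs the standard update on 1-D particles with an RBF kernel: each particle is pulled toward high-density regions of the target and pushed away from its neighbours by the kernel gradient. The additional repulsive force and the sensor assignment step described in the abstract are not modeled here, and the target density, bandwidth, step size, and function names are all assumptions.

```python
import math


def svgd_step(particles, grad_log_p, step_size=0.1, bandwidth=1.0):
    """One standard SVGD update for 1-D particles with an RBF kernel."""
    n = len(particles)
    updated = []
    for x in particles:
        phi = 0.0
        for xj in particles:
            k = math.exp(-((xj - x) ** 2) / (2 * bandwidth ** 2))
            attract = k * grad_log_p(xj)            # pulls toward high density
            repel = k * (x - xj) / bandwidth ** 2   # gradient of k w.r.t. xj
            phi += attract + repel
        updated.append(x + step_size * phi / n)
    return updated


def run_svgd(particles, grad_log_p, steps=500):
    """Apply svgd_step repeatedly and return the final particle positions."""
    for _ in range(steps):
        particles = svgd_step(particles, grad_log_p)
    return particles


# Hypothetical target: a 1-D Gaussian event density with mean 2 and unit variance,
# whose score function is -(x - mean).
target_mean = 2.0
final = run_svgd([-1.0, -0.5, 0.0, 0.5, 1.0], lambda x: -(x - target_mean))
```

In this toy run the particles drift from around 0 toward the target mean while the kernel-gradient term keeps them from collapsing onto a single point.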
Vision-language Assisted Attribute Learning
Authors: Kongming Liang, Xinran Wang, Rui Wang, Donghui Gao, Ling Jin, Weidong Liu, Xiatian Zhu, Zhanyu Ma, Jun Guo
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.07009
Pdf link: https://arxiv.org/pdf/2312.07009
Abstract Attribute labeling at large scale is typically incomplete and partial, posing significant challenges to model optimization. Existing attribute learning methods often treat the missing labels as negative or simply ignore them all during training, either of which could hamper the model performance to a great extent. To overcome these limitations, in this paper we leverage the available vision-language knowledge to explicitly disclose the missing labels for enhancing model learning. Given an image, we predict the likelihood of each missing attribute label assisted by an off-the-shelf vision-language model, and randomly choose to ignore those with high scores in training. Our strategy strikes a good balance between fully ignoring the missing labels and treating them as negative, as these high scores are found to be informative in revealing label ambiguity. Extensive experiments show that our proposed vision-language assisted loss can achieve state-of-the-art performance on the newly cleaned VAW dataset. Qualitative evaluation demonstrates the ability of the proposed method to predict more complete attributes.
Debiasing Sequential Recommenders through Distributionally Robust Optimization over System Exposure
Authors: Jiyuan Yang, Yue Ding, Yidan Wang, Pengjie Ren, Zhumin Chen, Fei Cai, Jun Ma, Rui Zhang, Zhaochun Ren, Xin Xin
Subjects: Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2312.07036
Pdf link: https://arxiv.org/pdf/2312.07036
Abstract Sequential recommendation (SR) models are typically trained on user-item interactions which are affected by the system exposure bias, leading to the user preference learned from the biased SR model not being fully consistent with the true user preference. Exposure bias refers to the fact that user interactions are dependent upon the partial items exposed to the user. Existing debiasing methods do not make full use of the system exposure data and suffer from sub-optimal recommendation performance and high variance. In this paper, we propose to debias sequential recommenders through Distributionally Robust Optimization (DRO) over system exposure data. The key idea is to utilize DRO to optimize the worst-case error over an uncertainty set to safeguard the model against distributional discrepancy caused by the exposure bias. The main challenge in applying DRO to exposure debiasing in SR lies in how to construct the uncertainty set and avoid the overestimation of user preference on biased samples. Moreover, how to evaluate the debiasing effect on a biased test set is also an open question. To this end, we first introduce an exposure simulator trained upon the system exposure data to calculate the exposure distribution, which is then regarded as the nominal distribution to construct the uncertainty set of DRO. Then, we introduce a penalty to items with high exposure probability to avoid the overestimation of user preference for biased samples. Finally, we design a debiased self-normalized inverse propensity score (SNIPS) evaluator for evaluating the debiasing effect on the biased offline test set. We conduct extensive experiments on two real-world datasets to verify the effectiveness of the proposed methods. Experimental results demonstrate the superior exposure debiasing performance of the proposed methods. Codes and data are available at https://github.com/nancheng58/DebiasedSR_DRO.
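The evaluator mentioned at the end of the abstract builds on self-normalized inverse propensity scoring. The sketch below shows a generic SNIPS estimate in which each observed reward is weighted by the inverse of its exposure propensity; the paper's debiased variant may differ, and the function name, the choice of weights, and the sample values are assumptions.

```python
def snips_estimate(rewards, propensities):
    """Self-normalized inverse propensity score estimate of the mean reward.

    Each reward is weighted by 1 / propensity, and the weighted sum is
    divided by the sum of the weights rather than by the sample count.
    """
    if len(rewards) != len(propensities):
        raise ValueError("rewards and propensities must have the same length")
    if any(p <= 0 for p in propensities):
        raise ValueError("propensities must be positive")
    weights = [1.0 / p for p in propensities]
    return sum(w * r for w, r in zip(weights, rewards)) / sum(weights)


# Hypothetical logged outcomes (1 = hit) and exposure propensities.
rewards = [1, 0, 1, 1]
propensities = [0.5, 0.5, 0.25, 1.0]
estimate = snips_estimate(rewards, propensities)
```

Items that were rarely exposed (low propensity) receive larger weights, and the normalization by the weight sum keeps the estimate within the range of the observed rewards.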
Layered Randomized Quantization for Communication-Efficient and Privacy-Preserving Distributed Learning
Authors: Guangfeng Yan, Tan Li, Tian Lan, Kui Wu, Linqi Song
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2312.07060
Pdf link: https://arxiv.org/pdf/2312.07060
Abstract Next-generation wireless networks, such as edge intelligence and wireless distributed learning, face two critical challenges: communication efficiency and privacy protection. In this work, our focus is on addressing these issues in a distributed learning framework. We consider a new approach that simultaneously achieves communication efficiency and privacy protection by exploiting the privacy advantage offered by quantization. Specifically, we use a quantization scheme called Gaussian Layered Randomized Quantization (Gau-LRQ) that compresses the raw model gradients using a layer multishift coupler. By adjusting the parameters of Gau-LRQ, we shape the quantization error to follow the expected Gaussian distribution, thus ensuring client-level differential privacy (CLDP). We demonstrate the effectiveness of our proposed Gau-LRQ in the distributed stochastic gradient descent (SGD) framework and theoretically quantify the trade-offs between communication, privacy, and convergence performance. We further improve the convergence performance by enabling dynamic privacy budget and quantization bit allocation. We achieve this by using an optimization formula that minimizes convergence error subject to the privacy budget constraint. We evaluate our approach on multiple datasets, including MNIST, CIFAR-10, and CIFAR-100, and show that our proposed method outperforms the baselines in terms of learning performance under various privacy constraints. Moreover, we observe that dynamic privacy allocation yields additional accuracy improvements for the models compared to the fixed scheme.
Focus on Hiders: Exploring Hidden Threats for Enhancing Adversarial Training
Authors: Qian Li, Yuxiao Hu, Yinpeng Dong, Dongxiao Zhang, Yuntian Chen
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Applications (stat.AP)
Arxiv link: https://arxiv.org/abs/2312.07067
Pdf link: https://arxiv.org/pdf/2312.07067
Abstract Adversarial training is often formulated as a min-max problem; however, concentrating only on the worst adversarial examples causes alternating repetitive confusion of the model, i.e., previously defended or correctly classified samples are not defensible or accurately classifiable in subsequent adversarial training. We characterize such non-ignorable samples as "hiders", which reveal the hidden high-risk regions within the secure area obtained through adversarial training and prevent the model from finding the real worst cases. We require the model to prevent hiders when defending against adversarial examples in order to improve accuracy and robustness simultaneously. By rethinking and redefining the min-max optimization problem for adversarial training, we propose a generalized adversarial training algorithm called Hider-Focused Adversarial Training (HFAT). HFAT introduces the iterative evolution optimization strategy to simplify the optimization problem and employs an auxiliary model to reveal hiders, effectively combining the optimization directions of standard adversarial training and hider prevention. Furthermore, we introduce an adaptive weighting mechanism that facilitates the model in adaptively adjusting its focus between adversarial examples and hiders during different training periods. We demonstrate the effectiveness of our method based on extensive experiments, and show that HFAT can provide higher robustness and accuracy.
Solving Large-Scale Electricity Market Pricing Problems in Polynomial Time
Authors: Mete Şeref Ahunbay, Martin Bichler, Teodora Dobos, Johannes Knörr
Subjects: Computer Science and Game Theory (cs.GT)
Arxiv link: https://arxiv.org/abs/2312.07071
Pdf link: https://arxiv.org/pdf/2312.07071
Abstract Electricity market operators worldwide use mixed-integer linear programming to solve the allocation problem in wholesale electricity markets. Prices are typically determined based on the duals of relaxed versions of this optimization problem. The resulting outcomes are efficient, but market operators must pay out-of-market uplifts to some market participants and incur a considerable budget deficit that has been criticized by regulators. As the share of renewables increases, the number of market participants will grow, leading to larger optimization problems and runtime issues. At the same time, non-convexities will continue to matter, e.g., due to ramping constraints of the generators required to address the variability of renewables or non-convex curtailment costs. We draw on recent theoretical advances in the approximation of competitive equilibrium to compute allocations and prices in electricity markets using convex optimization. The proposed mechanism promises approximate efficiency, no budget deficit, and computational tractability. We present experimental results for this new mechanism in the context of electricity markets, and compare the runtimes, the average efficiency loss of the method, and the uplifts paid with standard pricing rules. We find that the computations with the new algorithm are fast for relevant problem sizes. In general, the computational advantages come at the cost of efficiency losses and a price markup for the demand side. Interestingly, both are small with realistic problem instances. Importantly, the market operator does not incur a budget deficit and the uplifts paid to market participants are significantly lower compared to standard pricing rules.
On the Potential of an Independent Avatar to Augment Metaverse User Socialization Time
Authors: Theofanis P. Raptis, Chiara Boldrini, Marco Conti, Andrea Passarella
Subjects: Social and Information Networks (cs.SI); Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2312.07077
Pdf link: https://arxiv.org/pdf/2312.07077
Abstract We present a computational modelling approach that aims to capture how a Metaverse user's available social time capacity can be virtually augmented by using an independent and autonomous version of her digital representation in the Metaverse. We envision a Metaverse-focused extension of the traditional avatar concept: an avatar can also be programmed to operate independently when its user is not controlling it directly, thus turning it into an agent-based digital human representation. This way, the user can virtually delegate to the avatar the socializing time required for maintaining existing contacts, so as to eventually free up non-avatar-mediated socializing time which can potentially be invested in additional socialization activities. We model the setting and identify the characteristic variables using selected concepts from social sciences: ego networks, social presence, and social cues. Then, we formulate the problem of maximizing the user's non-avatar-mediated spare time as a linear optimization. Finally, we analyze the feasible region of the problem and present some initial insights on the spare time that can be achieved for different parameter values of the avatar-mediated interactions.
GNBG: A Generalized and Configurable Benchmark Generator for Continuous Numerical Optimization
Authors: Danial Yazdani (1), Mohammad Nabi Omidvar (2), Delaram Yazdani (3), Kalyanmoy Deb (4), Amir H. Gandomi (1,5) ((1) Faculty of Engineering & Information Technology, University of Technology Sydney, (2) School of Computing, University of Leeds, and Leeds University Business School, (3) Liverpool Logistics, Offshore and Marine (LOOM) Research Institute, Faculty of Engineering and Technology, School of Engineering, Liverpool John Moores University, (4) BEACON Center, Michigan State University, (5) University Research and Innovation Center (EKIK), Obuda University)
Subjects: Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2312.07083
Pdf link: https://arxiv.org/pdf/2312.07083
Abstract As optimization challenges continue to evolve, so too must our tools and understanding. To effectively assess, validate, and compare optimization algorithms, it is crucial to use a benchmark test suite that encompasses a diverse range of problem instances with various characteristics. Traditional benchmark suites often consist of numerous fixed test functions, making it challenging to align these with specific research objectives, such as the systematic evaluation of algorithms under controllable conditions. This paper introduces the Generalized Numerical Benchmark Generator (GNBG) for single-objective, box-constrained, continuous numerical optimization. Unlike existing approaches that rely on multiple baseline functions and transformations, GNBG utilizes a single, parametric, and configurable baseline function. This design allows for control over various problem characteristics. Researchers using GNBG can generate instances that cover a broad array of morphological features, from unimodal to highly multimodal functions, various local optima patterns, and symmetric to highly asymmetric structures. The generated problems can also vary in separability, variable interaction structures, dimensionality, conditioning, and basin shapes. These customizable features enable the systematic evaluation and comparison of optimization algorithms, allowing researchers to probe their strengths and weaknesses under diverse and controllable conditions.
Toward Robustness in Multi-label Classification: A Data Augmentation Strategy against Imbalance and Noise
Authors: Hwanjun Song, Minseok Kim, Jae-Gil Lee
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2312.07087
Pdf link: https://arxiv.org/pdf/2312.07087
Abstract Multi-label classification poses challenges due to imbalanced and noisy labels in training data. We propose a unified data augmentation method, named BalanceMix, to address these challenges. Our approach includes two samplers for imbalanced labels, generating minority-augmented instances with high diversity. It also refines multi-labels at the label-wise granularity, categorizing noisy labels as clean, re-labeled, or ambiguous for robust optimization. Extensive experiments on three benchmark datasets demonstrate that BalanceMix outperforms existing state-of-the-art methods. We release the code at https://github.com/DISL-Lab/BalanceMix.
Efficiently Programming Large Language Models using SGLang
Authors: Lianmin Zheng, Liangsheng Yin, Zhiqiang Xie, Jeff Huang, Chuyue Sun, Cody Hao Yu, Shiyi Cao, Christos Kozyrakis, Ion Stoica, Joseph E. Gonzalez, Clark Barrett, Ying Sheng
Subjects: Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2312.07104
Pdf link: https://arxiv.org/pdf/2312.07104
Abstract Large language models (LLMs) are increasingly used for complex tasks requiring multiple chained generation calls, advanced prompting techniques, control flow, and interaction with external environments. However, efficient systems for programming and executing these applications are lacking. To bridge this gap, we introduce SGLang, a Structured Generation Language for LLMs. SGLang is designed for the efficient programming of LLMs and incorporates primitives for common LLM programming patterns. We have implemented SGLang as a domain-specific language embedded in Python, and we developed an interpreter, a compiler, and a high-performance runtime for SGLang. These components work together to enable optimizations such as parallelism, batching, caching, sharing, and other compilation techniques. Additionally, we propose RadixAttention, a novel technique that maintains a Least Recently Used (LRU) cache of the Key-Value (KV) cache for all requests in a radix tree, enabling automatic KV cache reuse across multiple generation calls at runtime. SGLang simplifies the writing of LLM programs and boosts execution efficiency. Our experiments demonstrate that SGLang can speed up common LLM tasks by up to 5x, while reducing code complexity and enhancing control.
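The prefix-reuse idea behind the KV cache sharing described above can be sketched with a toy token trie. This is a simplified stand-in under stated assumptions, not SGLang's RadixAttention: it uses an uncompressed trie rather than a radix tree, stores no actual key-value tensors, and omits LRU eviction. The class and method names are hypothetical.

```python
class PrefixCache:
    """Toy token trie standing in for a radix tree over cached prefixes."""

    def __init__(self):
        self.root = {}

    def insert(self, tokens):
        """Record that the KV entries for this token sequence are cached."""
        node = self.root
        for tok in tokens:
            node = node.setdefault(tok, {})

    def match_prefix(self, tokens):
        """Return how many leading tokens are already cached."""
        node, matched = self.root, 0
        for tok in tokens:
            if tok not in node:
                break
            node = node[tok]
            matched += 1
        return matched


cache = PrefixCache()
cache.insert([1, 2, 3, 4])                     # first call: shared prompt + question A
reused = cache.match_prefix([1, 2, 3, 9, 9])   # second call shares the first 3 tokens
```

Here `reused` is 3, so a runtime built on this idea would only need to compute attention state for the two new tokens of the second call.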
General Tail Bounds for Non-Smooth Stochastic Mirror Descent
Authors: Khaled Eldowa, Andrea Paudice
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2312.07142
Pdf link: https://arxiv.org/pdf/2312.07142
Abstract In this paper, we provide novel tail bounds on the optimization error of Stochastic Mirror Descent for convex and Lipschitz objectives. Our analysis extends the existing tail bounds from the classical light-tailed Sub-Gaussian noise case to heavier-tailed noise regimes. We study the optimization error of the last iterate as well as the average of the iterates. We instantiate our results in two important cases: a class of noise with exponential tails and one with polynomial tails. A remarkable feature of our results is that they do not require an upper bound on the diameter of the domain. Finally, we support our theory with illustrative experiments that compare the behavior of the average of the iterates with that of the last iterate in heavy-tailed noise regimes.
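For readers unfamiliar with the algorithm, here is a minimal sketch of stochastic mirror descent that tracks both the last iterate and the average of the iterates, the two quantities the abstract compares. It assumes the negative-entropy mirror map on the probability simplex (a bounded domain, unlike the general setting of the paper) and light-tailed Gaussian noise; the objective, step size, and all names are illustrative assumptions.

```python
import math
import random


def entropic_mirror_descent(subgrad, dim, steps, lr, rng):
    """Stochastic mirror descent on the probability simplex.

    Uses the negative-entropy mirror map, which gives the multiplicative
    (exponentiated-gradient) update. Returns the last iterate and the
    running average of the iterates.
    """
    x = [1.0 / dim] * dim
    avg = [0.0] * dim
    for t in range(1, steps + 1):
        g = subgrad(x, rng)
        w = [xi * math.exp(-lr * gi) for xi, gi in zip(x, g)]
        total = sum(w)
        x = [wi / total for wi in w]                        # renormalize onto the simplex
        avg = [a + (xi - a) / t for a, xi in zip(avg, x)]   # running mean of iterates
    return x, avg


def noisy_subgrad(x, rng):
    # Subgradient of f(x) = sum_i |x_i - target_i| plus Gaussian noise.
    target = [0.7, 0.2, 0.1]
    sign = lambda v: (v > 0) - (v < 0)
    return [sign(xi - ti) + rng.gauss(0.0, 1.0) for xi, ti in zip(x, target)]


last, average = entropic_mirror_descent(
    noisy_subgrad, dim=3, steps=500, lr=0.05, rng=random.Random(0)
)
```

Swapping `rng.gauss` for a heavier-tailed sampler would be one way to probe the regimes the abstract studies.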
Rate-Splitting Multiple Access for Semantic-Aware Networks: an Age of Incorrect Information Perspective
Authors: Onur Dizdar, Stephen Wang
Subjects: Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2312.07159
Pdf link: https://arxiv.org/pdf/2312.07159
Abstract In this letter, we design a downlink multi-user communication framework based on Rate-Splitting Multiple Access (RSMA) for semantic-aware networks. First, we formulate an optimization problem to obtain the optimal user scheduling, precoding, and power allocation schemes jointly. We consider the metric Age of Incorrect Information (AoII) in the objective function of the formulated problem to maximize the freshness of the overall information to be transmitted. Using big-M and Successive Convex Approximation (SCA) methods, we convert the resulting non-convex problem with conditional objective and constraints into a convex one and propose an iterative algorithm to solve it. By numerical results, we show that RSMA achieves a lower AoII than SDMA owing to its superior performance under multi-user interference.
Investigation into the Training Dynamics of Learned Optimizers
Authors: Jan Sobotka, Petr Šimánek, Daniel Vašata
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2312.07174
Pdf link: https://arxiv.org/pdf/2312.07174
Abstract Optimization is an integral part of modern deep learning. Recently, the concept of learned optimizers has emerged as a way to accelerate this optimization process by replacing traditional, hand-crafted algorithms with meta-learned functions. Despite the initial promising results of these methods, issues with stability and generalization still remain, limiting their practical use. Moreover, their inner workings and behavior under different conditions are not yet fully understood, making it difficult to come up with improvements. For this reason, our work examines their optimization trajectories from the perspective of network architecture symmetries and parameter update distributions. Furthermore, by contrasting the learned optimizers with their manually designed counterparts, we identify several key insights that demonstrate how each approach can benefit from the strengths of the other.
Construction and application of an algebraic dual basis and the Fine-Scale Greens' Function for computing projections and reconstructing unresolved scales
Authors: Suyash Shrestha, Joey Dekker, Marc Gerritsma, Steven Hulshoff, Ido Akkerman
Subjects: Numerical Analysis (math.NA); Mathematical Physics (math-ph)
Arxiv link: https://arxiv.org/abs/2312.07205
Pdf link: https://arxiv.org/pdf/2312.07205
Abstract In this paper, we build on the work of [T. Hughes, G. Sangalli, Variational multiscale analysis: the fine-scale Green's function, projection, optimization, localization, and stabilized methods, SIAM Journal on Numerical Analysis, 45(2), 2007] dealing with the explicit computation of the Fine-Scale Green's function. The original approach chooses a set of functionals associated with a projector to compute the Fine-Scale Green's function. The construction of these functionals, however, does not generalise to arbitrary projections, higher dimensions, or Spectral Element methods. We propose to generalise the construction of the required functionals by using dual functions. These dual functions can be directly derived from the chosen projector and are explicitly computable. We show how to find the dual functions for both the $L^2$ and the $H^1_0$ projections. We then go on to demonstrate that the Fine-Scale Green's functions constructed with the dual basis functions consistently reproduce the unresolved scales removed by the projector. The methodology is tested using one-dimensional Poisson and advection-diffusion problems, as well as a two-dimensional Poisson problem. We present the computed components of the Fine-Scale Green's function, and the Fine-Scale Green's function itself. These results show that the method works for arbitrary projections, in arbitrary dimensions. Moreover, the methodology can be applied to any Finite/Spectral Element or Isogeometric framework.
Safe Multi-Task Bayesian Optimization
Authors: Jannis O. Lübsen, Christian Hespe, Annika Eichler
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2312.07281
Pdf link: https://arxiv.org/pdf/2312.07281
Abstract Bayesian optimization has become a powerful tool for safe online optimization of systems, due to its high sample efficiency and noise robustness. For further speed-up, reduced physical models of the system can be incorporated into the optimization, since the models offer an approximation of the actual system and sampling from them is significantly cheaper. The similarity between model and reality is represented by additional hyperparameters and learned within the optimization process. Safety is an important criterion for online optimization methods like Bayesian optimization; it has been addressed by recent literature, which provides safety guarantees under the assumption of known hyperparameters. However, in practice this is not applicable. Therefore, we extend the robust Gaussian process uniform error bounds to meet the multi-task setting, which involves the calculation of a confidence region from the hyperparameter posterior distribution utilizing Markov chain Monte Carlo methods. Then, using the robust safety bounds, Bayesian optimization is applied to safely optimize the system while incorporating measurements of the models. Simulations show that the optimization can be significantly accelerated compared to other state-of-the-art safe Bayesian optimization methods, depending on the fidelity of the models.
Statistically Distinct Plans for Multi-Objective Task Assignment
Authors: Nils Wilde, Javier Alonso-Mora
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2312.07292
Pdf link: https://arxiv.org/pdf/2312.07292
Abstract We study the problem of finding statistically distinct plans for stochastic planning and task assignment problems such as online multi-robot pickup and delivery (MRPD) when facing multiple competing objectives. In many real-world settings robot fleets do not only need to fulfil delivery requests, but also have to consider auxiliary objectives such as energy efficiency or avoiding human-centered work spaces. We pose MRPD as a multi-objective optimization problem where the goal is to find MRPD policies that yield different trade-offs between given objectives. There are two main challenges: 1) MRPD is computationally hard, which limits the number of trade-offs that can reasonably be computed, and 2) due to the random task arrivals, one needs to consider statistical variance of the objective values in addition to the average. We present an adaptive sampling algorithm that finds a set of policies which i) are approximately optimal, ii) approximate the set of all optimal solutions, and iii) are statistically distinguishable. We prove completeness and adapt a state-of-the-art MRPD solver to the multi-objective setting for three example objectives. In a series of simulation experiments we demonstrate the advantages of the proposed method compared to baseline approaches and show its robustness in a sensitivity analysis. The approach is general and could be adapted to other multi-objective task assignment and planning problems under uncertainty.
Coupled Confusion Correction: Learning from Crowds with Sparse Annotations
Authors: Hansong Zhang, Shikun Li, Dan Zeng, Chenggang Yan, Shiming Ge
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2312.07331
Pdf link: https://arxiv.org/pdf/2312.07331
Abstract As datasets grow larger, accurately annotating them is becoming more impractical due to the cost in both time and money. Therefore, crowd-sourcing has been widely adopted to alleviate the cost of collecting labels, which also inevitably introduces label noise and eventually degrades the performance of the model. To learn from crowd-sourcing annotations, modeling the expertise of each annotator is a common but challenging paradigm, because the annotations collected by crowd-sourcing are usually highly sparse. To alleviate this problem, we propose Coupled Confusion Correction (CCC), where two models are simultaneously trained to correct the confusion matrices learned by each other. Via bi-level optimization, the confusion matrices learned by one model can be corrected by the distilled data from the other. Moreover, we cluster the "annotator groups" who share similar expertise so that their confusion matrices can be corrected together. In this way, the expertise of the annotators, especially of those who seldom provide labels, can be better captured. Remarkably, we point out that annotation sparsity means not only that the average number of labels is low, but also that there are always some annotators who provide very few labels, which is neglected by previous works when constructing synthetic crowd-sourcing annotations. Based on that, we propose to use the Beta distribution to control the generation of the crowd-sourcing labels so that the synthetic annotations are more consistent with the real-world ones. Extensive experiments are conducted on two types of synthetic datasets and three real-world datasets, the results of which demonstrate that CCC significantly outperforms state-of-the-art approaches.
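One plausible reading of the Beta-controlled generation of synthetic annotations is sketched below: each annotator receives a labeling rate drawn from a Beta distribution, so that a skewed choice of parameters yields a few prolific annotators and many who provide very few labels. The abstract does not spell out the exact procedure, so the function, its parameters, and the parameter values are assumptions for illustration only.

```python
import random


def synthetic_annotation_mask(num_items, num_annotators, alpha, beta, rng):
    """Decide which (item, annotator) pairs receive a label.

    Each annotator gets a labeling rate drawn from Beta(alpha, beta); with
    alpha much smaller than beta, most rates are close to zero and only a
    few annotators label a large share of the items.
    """
    rates = [rng.betavariate(alpha, beta) for _ in range(num_annotators)]
    mask = [
        [rng.random() < rates[a] for a in range(num_annotators)]
        for _ in range(num_items)
    ]
    return rates, mask


rates, mask = synthetic_annotation_mask(
    num_items=200, num_annotators=30, alpha=0.5, beta=5.0, rng=random.Random(0)
)
labels_per_annotator = [sum(row[a] for row in mask) for a in range(30)]
```

Inspecting `labels_per_annotator` shows how unevenly the labeling effort is spread across annotators for a given `alpha` and `beta`.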
Momentum Particle Maximum Likelihood
Authors: Jen Ning Lim, Juan Kuntz, Samuel Power, Adam M. Johansen
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.07335
Pdf link: https://arxiv.org/pdf/2312.07335
Abstract Maximum likelihood estimation (MLE) of latent variable models is often recast as an optimization problem over the extended space of parameters and probability distributions. For example, the Expectation Maximization (EM) algorithm can be interpreted as coordinate descent applied to a suitable free energy functional over this space. Recently, this perspective has been combined with insights from optimal transport and Wasserstein gradient flows to develop particle-based algorithms applicable to wider classes of models than standard EM. Drawing inspiration from prior works which interpret 'momentum-enriched' optimisation algorithms as discretizations of ordinary differential equations, we propose an analogous dynamical systems-inspired approach to minimizing the free energy functional over the extended space of parameters and probability distributions. The result is a dynamic system that blends elements of Nesterov's Accelerated Gradient method, the underdamped Langevin diffusion, and particle methods. Under suitable assumptions, we establish quantitative convergence of the proposed system to the unique minimiser of the functional in continuous time. We then propose a numerical discretization of this system which enables its application to parameter estimation in latent variable models. Through numerical experiments, we demonstrate that the resulting algorithm converges faster than existing methods and compares favourably with other (approximate) MLE algorithms.
MRCN: Enhanced Coherence Mechanism for Near Memory Processing Architectures
Authors: Amit Kumar Kabat, Shubhang Pandey, TG Venkatesh
Subjects: Hardware Architecture (cs.AR)
Arxiv link: https://arxiv.org/abs/2312.07355
Pdf link: https://arxiv.org/pdf/2312.07355
Abstract In Near Memory Processing (NMP), processing elements (PEs) are placed near the 3D memory, reducing unnecessary data transfers between the CPU and the memory. However, as the CPUs and the PEs of the NMP use a shared memory space, maintaining coherency between them is a challenge. Most current literature relies on maintaining coherence for fine-grained or coarse-grained instruction granularities for the offloaded code blocks. We observe that for most NMP-offloaded instructions, the coherence conflict is low, and waiting for the coherence transaction hinders the performance. We construct an analytical model for an existing coherence strategy called CONDA, which is accurate to within 4%. This model indicates the key parameters responsible: the granularity of offloaded code, probability of conflicts, transaction times, and commit time. This paper identifies the prospective optimizations using the analytical model for CONDA. It proposes a new coherence scheme called MRCN: Monitored Rollback Coherence for NMP. MRCN addresses the coherence issue while eliminating unnecessary re-executions with limited hardware overhead. MRCN is evaluated on synthetic as well as Rodinia benchmarks. The analytical results are within 4% of the simulation results. MRCN shows an improvement of up to 25% over the CONDA strategy for the same benchmark under different execution conditions.
Medical Image Classification Using Transfer Learning and Chaos Game Optimization on the Internet of Medical Things
Authors: Alhassan Mabrouk, Abdelghani Dahou, Mohamed Abd Elaziz, Rebeca P. Díaz Redondo, Mohammed Kayed
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2312.07437
Pdf link: https://arxiv.org/pdf/2312.07437
Abstract The Internet of Medical Things (IoMT) has dramatically benefited medical professionals that patients and physicians can access from all regions. Although the automatic detection and prediction of diseases such as melanoma and leukemia is still being researched and studied in IoMT, existing approaches are not able to achieve a high degree of efficiency. Thus, with a new approach that provides better results, patients would access adequate treatments earlier and the death rate would be reduced. Therefore, this paper introduces an IoMT proposal for medical image classification that may be used anywhere, i.e., it is a ubiquitous approach. It was designed in two stages: first, we employ a Transfer Learning (TL)-based method for feature extraction, which is carried out using MobileNetV3; second, we use Chaos Game Optimization (CGO) for feature selection, with the aim of excluding unnecessary features and improving the performance, which is key in IoMT. Our methodology was evaluated using the ISIC-2016, PH2, and Blood-Cell datasets. The experimental results indicated that the proposed approach obtained an accuracy of 88.39% on ISIC-2016, 97.52% on PH2, and 88.79% on Blood-Cell. Moreover, our approach performed well on the metrics employed compared to other existing methods.
Codesign of Humanoid Robots for Ergonomy Collaboration with Multiple Humans via Genetic Algorithms and Nonlinear Optimization
Authors: Carlotta Sartore, Lorenzo Rapetti, Fabio Bergonti, Stefano Dafarra, Silvio Traversaro, Daniele Pucci
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2312.07459
Pdf link: https://arxiv.org/pdf/2312.07459
Abstract Ergonomics is a key factor to consider when designing control architectures for effective physical collaborations between humans and humanoid robots. In contrast, ergonomic indexes are often overlooked in the robot design phase, which leads to suboptimal performance in physical human-robot interaction tasks. This paper proposes a novel methodology for optimizing the design of humanoid robots with respect to ergonomic indicators associated with the interaction of multiple agents. Our approach leverages a dynamic and kinematic parameterization of the robot link and motor specifications to search for optimal robot designs using a bilevel optimization approach. Specifically, a genetic algorithm first generates robot designs by selecting the link and motor characteristics. Then, we use nonlinear optimization to evaluate interaction ergonomy indexes during collaborative payload lifting with different humans and weights. To assess the effectiveness of our approach, we compare the optimal design obtained using bilevel optimization against the design obtained using nonlinear optimization. Our results show that the proposed approach significantly improves ergonomics in terms of energy expenditure calculated in two reference scenarios involving static and dynamic robot motions. We plan to apply our methodology to drive the design of the ergoCub2 robot, a humanoid intended for optimal physical collaboration with humans in diverse environments.
Search Optimization with Query Likelihood Boosting and Two-Level Approximate Search for Edge Devices
Authors: Jianwei Zhang, Helian Feng, Xin He, Grant P. Strimel, Farhad Ghassemi, Ali Kebarighotbi
Subjects: Information Retrieval (cs.IR); Databases (cs.DB)
Arxiv link: https://arxiv.org/abs/2312.07517
Pdf link: https://arxiv.org/pdf/2312.07517
Abstract We present a novel search optimization solution for approximate nearest neighbor (ANN) search on resource-constrained edge devices. Traditional ANN approaches fall short in meeting the specific demands of real-world scenarios, e.g., skewed query likelihood distribution and search on large-scale indices with a low latency and small footprint. To address these limitations, we introduce two key components: a Query Likelihood Boosted Tree (QLBT) to optimize average search latency for frequently used small datasets, and a two-level approximate search algorithm to enable efficient retrieval with large datasets on edge devices. We perform a thorough evaluation on simulated and real data and demonstrate that QLBT can significantly reduce latency by 15% on real data and that our two-level search algorithm successfully achieves deployable accuracy and latency on a 10 million dataset for edge devices. In addition, we provide a comprehensive protocol for configuring and optimizing on-device search algorithms through extensive empirical studies.
Self-Healing Distributed Swarm Formation Control Using Image Moments
Authors: C. Lin Liu, Israel L. Donato Ridgley, Matthew L. Elwin, Michael Rubenstein, Randy A. Freeman, Kevin M. Lynch
Subjects: Robotics (cs.RO); Multiagent Systems (cs.MA)
Arxiv link: https://arxiv.org/abs/2312.07523
Pdf link: https://arxiv.org/pdf/2312.07523
Abstract Human-swarm interaction is facilitated by a low-dimensional encoding of the swarm formation, independent of the (possibly large) number of robots. We propose using image moments to encode two-dimensional formations of robots. Each robot knows the desired formation moments, and simultaneously estimates the current moments of the entire swarm while controlling its motion to better achieve the desired group moments. The estimator is a distributed optimization, requiring no centralized processing, and self-healing, meaning that the process is robust to initialization errors, packet drops, and robots being added to or removed from the swarm. Our experimental results with a swarm of 50 robots, suffering nearly 50% packet loss, show that distributed estimation and control of image moments effectively achieves desired swarm formations.
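To make the low-dimensional encoding concrete, the sketch below computes a few image moments of a set of 2D robot positions: the zeroth-order moment (robot count), the centroid from the first-order moments, and second-order central moments normalized by the robot count. It is a centralized illustration only and does not reproduce the distributed estimator from the abstract; the function names and the example formation are assumptions.

```python
def raw_moment(points, p, q):
    """Raw image moment M_pq = sum over robots of x**p * y**q."""
    return sum((x ** p) * (y ** q) for x, y in points)


def formation_summary(points):
    """Centroid and count-normalized second-order central moments."""
    m00 = raw_moment(points, 0, 0)            # number of robots
    cx = raw_moment(points, 1, 0) / m00
    cy = raw_moment(points, 0, 1) / m00
    mu20 = sum((x - cx) ** 2 for x, _ in points) / m00
    mu02 = sum((y - cy) ** 2 for _, y in points) / m00
    mu11 = sum((x - cx) * (y - cy) for x, y in points) / m00
    return {"centroid": (cx, cy), "mu20": mu20, "mu02": mu02, "mu11": mu11}


square = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
summary = formation_summary(square)
```

The summary has the same size whether the formation contains four robots or four hundred, which is the property that makes moments convenient as a formation description.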
WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion
Authors: Soyong Shin, Juyong Kim, Eni Halilaj, Michael J. Black
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.07531
Pdf link: https://arxiv.org/pdf/2312.07531
Abstract The estimation of 3D human motion from video has progressed rapidly but current methods still have several key limitations. First, most methods estimate the human in camera coordinates. Second, prior work on estimating humans in global coordinates often assumes a flat ground plane and produces foot sliding. Third, the most accurate methods rely on computationally expensive optimization pipelines, limiting their use to offline applications. Finally, existing video-based methods are surprisingly less accurate than single-frame methods. We address these limitations with WHAM (World-grounded Humans with Accurate Motion), which accurately and efficiently reconstructs 3D human motion in a global coordinate system from video. WHAM learns to lift 2D keypoint sequences to 3D using motion capture data and fuses this with video features, integrating motion context and visual information. WHAM exploits camera angular velocity estimated from a SLAM method together with human motion to estimate the body's global trajectory. We combine this with a contact-aware trajectory refinement method that lets WHAM capture human motion in diverse conditions, such as climbing stairs. WHAM outperforms all existing 3D human motion recovery methods across multiple in-the-wild benchmarks. Code will be available for research purposes at this http URL
Keyword: adam

There is no result

Keyword: gradient

Deciphering 'What' and 'Where' Visual Pathways from Spectral Clustering of Layer-Distributed Neural Representations
Authors: Xiao Zhang, David Yunis, Michael Maire
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.06716
Pdf link: https://arxiv.org/pdf/2312.06716
Abstract We present an approach for analyzing grouping information contained within a neural network's activations, permitting extraction of spatial layout and semantic segmentation from the behavior of large pre-trained vision models. Unlike prior work, our method conducts a wholistic analysis of a network's activation state, leveraging features from all layers and obviating the need to guess which part of the model contains relevant information. Motivated by classic spectral clustering, we formulate this analysis in terms of an optimization objective involving a set of affinity matrices, each formed by comparing features within a different layer. Solving this optimization problem using gradient descent allows our technique to scale from single images to dataset-level analysis, including, in the latter, both intra- and inter-image relationships. Analyzing a pre-trained generative transformer provides insight into the computational strategy learned by such models. Equating affinity with key-query similarity across attention layers yields eigenvectors encoding scene spatial layout, whereas defining affinity by value vector similarity yields eigenvectors encoding object identity. This result suggests that key and query vectors coordinate attentional information flow according to spatial proximity (a 'where' pathway), while value vectors refine a semantic category representation (a 'what' pathway).
MonoNPHM: Dynamic Head Reconstruction from Monocular Videos
Authors: Simon Giebenhain, Tobias Kirschstein, Markos Georgopoulos, Martin Rünz, Lourdes Agapito, Matthias Nießner
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.06740
Pdf link: https://arxiv.org/pdf/2312.06740
Abstract We present Monocular Neural Parametric Head Models (MonoNPHM) for dynamic 3D head reconstructions from monocular RGB videos. To this end, we propose a latent appearance space that parameterizes a texture field on top of a neural parametric model. We constrain predicted color values to be correlated with the underlying geometry such that gradients from RGB effectively influence latent geometry codes during inverse rendering. To increase the representational capacity of our expression space, we augment our backward deformation field with hyper-dimensions, thus improving color and geometry representation in topologically challenging expressions. Using MonoNPHM as a learned prior, we approach the task of 3D head reconstruction using signed distance field based volumetric rendering. By numerically inverting our backward deformation field, we incorporate a landmark loss using facial anchor points that are closely tied to our canonical geometry representation. To evaluate the task of dynamic face reconstruction from monocular RGB videos, we record 20 challenging Kinect sequences under casual conditions. MonoNPHM outperforms all baselines by a significant margin, and makes an important step towards easily accessible neural parametric face models through RGB tracking.
Symptom-based Machine Learning Models for the Early Detection of COVID-19: A Narrative Review
Authors: Moyosolu Akinloye
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2312.06832
Pdf link: https://arxiv.org/pdf/2312.06832
Abstract Despite the widespread testing protocols for COVID-19, there are still significant challenges in early detection of the disease, which is crucial for preventing its spread and optimizing patient outcomes. Owing to the limited testing capacity in resource-strapped settings and the limitations of the available traditional methods of testing, it has been established that a fast and efficient strategy is important to fully stop the virus. Machine learning models can analyze large datasets, incorporating patient-reported symptoms, clinical data, and medical imaging. Symptom-based detection methods have been developed to predict COVID-19, and they have shown promising results. In this paper, we provide an overview of the landscape of symptoms-only machine learning models for predicting COVID-19, including their performance and limitations. The review will also examine the performance of symptom-based models when compared to image-based models. Because different studies used varying datasets, methodologies, and performance metrics, selecting the model that performs best depends on the context and objectives of the research. However, based on the results, we observed that an ensemble classifier performed exceptionally well in predicting the occurrence of COVID-19 based on patient symptoms, with the highest overall accuracy of 97.88%. A Gradient Boosting Algorithm achieved an AUC (Area Under the Curve) of 0.90 and identified key features contributing to the decision-making process. Image-based models, as observed in the analyzed studies, have consistently demonstrated higher accuracy than symptom-based models, often reaching impressive levels ranging from 96.09% to as high as 99%.
When Bio-Inspired Computing meets Deep Learning: Low-Latency, Accurate, & Energy-Efficient Spiking Neural Networks from Artificial Neural Networks
Authors: Gourav Datta, Zeyu Liu, James Diffenderfer, Bhavya Kailkhura, Peter A. Beerel
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.06900
Pdf link: https://arxiv.org/pdf/2312.06900
Abstract Bio-inspired Spiking Neural Networks (SNN) are now demonstrating comparable accuracy to intricate convolutional neural networks (CNN), all while delivering remarkable energy and latency efficiency when deployed on neuromorphic hardware. In particular, ANN-to-SNN conversion has recently gained significant traction in developing deep SNNs with close to state-of-the-art (SOTA) test accuracy on complex image recognition tasks. However, advanced ANN-to-SNN conversion approaches demonstrate that for lossless conversion, the number of SNN time steps must equal the number of quantization steps in the ANN activation function. Reducing the number of time steps significantly increases the conversion error. Moreover, the spiking activity of the SNN, which dominates the compute energy in neuromorphic chips, does not reduce proportionally with the number of time steps. To mitigate the accuracy concern, we propose a novel ANN-to-SNN conversion framework, that incurs an exponentially lower number of time steps compared to that required in the SOTA conversion approaches. Our framework modifies the SNN integrate-and-fire (IF) neuron model with identical complexity and shifts the bias term of each batch normalization (BN) layer in the trained ANN. To mitigate the spiking activity concern, we propose training the source ANN with a fine-grained L1 regularizer with surrogate gradients that encourages high spike sparsity in the converted SNN. Our proposed framework thus yields lossless SNNs with ultra-low latency, ultra-low compute energy, thanks to the ultra-low timesteps and high spike sparsity, and ultra-high test accuracy, for example, 73.30% with only 4 time steps on the ImageNet dataset.
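As a rough illustration of why the number of SNN time steps is tied to the number of quantization steps, the sketch below simulates a textbook integrate-and-fire neuron with soft reset (not the modified neuron proposed in the paper) driven by a constant input for T steps, and compares its average output with a ReLU quantized to T levels. The function names, the threshold `theta`, and the zero initial membrane potential are assumptions made for this example.

```python
import math

def if_neuron_output(x, T, theta=1.0):
    """Average output of an integrate-and-fire neuron over T time steps.

    Assumes a constant input x per step, zero initial membrane potential,
    at most one spike per step, and soft reset (subtract theta on a spike).
    """
    v = 0.0
    spikes = 0
    for _ in range(T):
        v += x              # integrate the input current
        if v >= theta:      # fire when the threshold is reached
            spikes += 1
            v -= theta      # soft reset keeps the residual charge
    return spikes * theta / T

def quantized_relu(x, T, theta=1.0):
    """ReLU clipped to [0, theta] and floored onto T uniform levels."""
    level = math.floor(x * T / theta)
    return min(max(level, 0), T) * theta / T

# Compare the two functions on a few hypothetical inputs with T = 4.
for x in (-1.0, 0.3, 0.5, 2.0):
    print(x, if_neuron_output(x, T=4), quantized_relu(x, T=4))
```

On these inputs the two functions agree, which is the sense in which T time steps correspond to T quantization levels in this simplified setting.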
DGNet: Dynamic Gradient-guided Network with Noise Suppression for Underwater Image Enhancement
Authors: Jingchun Zhou, Zongxin He, Dehuan Zhang, Kin-man Lam, Weishi Zhang, Xianping Fu, Yi Wang, Chongyi Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.06999
Pdf link: https://arxiv.org/pdf/2312.06999
Abstract Underwater image enhancement (UIE) is a challenging task due to the complex degradation caused by underwater environments. To solve this issue, previous methods often idealize the degradation process, and neglect the impact of medium noise and object motion on the distribution of image features, limiting the generalization and adaptability of the model. Previous methods use the reference gradient that is constructed from original images and synthetic ground-truth images. This may cause the network performance to be influenced by some low-quality training data. Our approach utilizes predicted images to dynamically update pseudo-labels, adding a dynamic gradient to optimize the network's gradient space. This process improves image quality and avoids local optima. Moreover, we propose a Feature Restoration and Reconstruction module (FRR) based on a Channel Combination Inference (CCI) strategy and a Frequency Domain Smoothing module (FRS). These modules decouple other degradation features while reducing the impact of various types of noise on network performance. Experiments on multiple public datasets demonstrate the superiority of our method over existing state-of-the-art approaches, especially in achieving performance milestones: PSNR of 25.6dB and SSIM of 0.93 on the UIEB dataset. Its efficiency in terms of parameter size and inference time further attests to its broad practicality. The code will be made publicly available.
Stein Coverage: a Variational Inference Approach to Distribution-matching Multisensor Deployment
Authors: Donipolo Ghimire, Solmaz S. Kia
Subjects: Multiagent Systems (cs.MA)
Arxiv link: https://arxiv.org/abs/2312.07001
Pdf link: https://arxiv.org/pdf/2312.07001
Abstract This paper examines the spatial coverage optimization problem for multiple sensors in a known convex environment, where the coverage service of each sensor is heterogeneous and anisotropic. We introduce the Stein Coverage algorithm, a distribution-matching coverage approach that aims to place sensors at positions and orientations such that their collective coverage distribution is as close as possible to the event distribution. To select the most important representative points from the coverage event distribution, Stein Coverage utilizes the Stein Variational Gradient Descent (SVGD), a deterministic sampling method from the variational inference literature. An innovation in our work is the introduction of a repulsive force between the samples in the SVGD algorithm to spread the samples and avoid footprint overlap for the deployed sensors. After pinpointing the points of interest for deployment, Stein Coverage solves the multisensor assignment problem using a bipartite optimal matching process. Simulations demonstrate the advantages of the Stein Coverage method compared to conventional Voronoi partitioning multisensor deployment methods.
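For readers unfamiliar with SVGD, a minimal sketch of the textbook update in one dimension follows. It is not the Stein Coverage algorithm or the paper's modified variant: the RBF kernel, the fixed `bandwidth`, the `step_size`, the standard normal target, and all function names are assumptions chosen for illustration.

```python
import math

def svgd_step(particles, score, bandwidth=1.0, step_size=0.1):
    """One textbook SVGD update for 1D particles with an RBF kernel."""
    n = len(particles)
    updated = []
    for xi in particles:
        phi = 0.0
        for xj in particles:
            k = math.exp(-((xj - xi) ** 2) / (2.0 * bandwidth ** 2))
            drive = k * score(xj)                     # pulls xi toward high density
            repulse = k * (xi - xj) / bandwidth ** 2  # kernel gradient pushes xi away from xj
            phi += drive + repulse
        updated.append(xi + step_size * phi / n)
    return updated

def standard_normal_score(x):
    """Gradient of the log-density of N(0, 1)."""
    return -x

# Hypothetical starting points, deliberately far from the target mass.
particles = [3.0 + 0.1 * i for i in range(10)]
for _ in range(1000):
    particles = svgd_step(particles, standard_normal_score)

mean = sum(particles) / len(particles)
spread = max(particles) - min(particles)
print(round(mean, 3), round(spread, 3))
```

The driving term moves the particles toward the target, while the kernel-gradient term keeps them from collapsing onto a single point.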
Rethinking Compression: Reduced Order Modelling of Latent Features in Large Language Models
Authors: Arnav Chavan, Nahush Lele, Deepak Gupta
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2312.07046
Pdf link: https://arxiv.org/pdf/2312.07046
Abstract Due to the substantial scale of Large Language Models (LLMs), the direct application of conventional compression methodologies proves impractical. The computational demands associated with even minimal gradient updates present challenges, particularly on consumer-grade hardware. This paper introduces an innovative approach for the parametric and practical compression of LLMs based on reduced order modelling, which entails low-rank decomposition within the feature space and re-parameterization in the weight space. Notably, this compression technique operates in a layer-wise manner, obviating the need for a GPU device and enabling the compression of billion-scale models within stringent constraints of both memory and time. Our method represents a significant advancement in model compression by leveraging matrix decomposition, demonstrating superior efficacy compared to the prevailing state-of-the-art structured pruning method.
Layered Randomized Quantization for Communication-Efficient and Privacy-Preserving Distributed Learning
Authors: Guangfeng Yan, Tan Li, Tian Lan, Kui Wu, Linqi Song
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2312.07060
Pdf link: https://arxiv.org/pdf/2312.07060
Abstract Next-generation wireless networks, such as edge intelligence and wireless distributed learning, face two critical challenges: communication efficiency and privacy protection. In this work, our focus is on addressing these issues in a distributed learning framework. We consider a new approach that simultaneously achieves communication efficiency and privacy protection by exploiting the privacy advantage offered by quantization. Specifically, we use a quantization scheme called Gaussian Layered Randomized Quantization (Gau-LRQ) that compresses the raw model gradients using a layer multishift coupler. By adjusting the parameters of Gau-LRQ, we shape the quantization error to follow the expected Gaussian distribution, thus ensuring client-level differential privacy (CLDP). We demonstrate the effectiveness of our proposed Gau-LRQ in the distributed stochastic gradient descent (SGD) framework and theoretically quantify the trade-offs between communication, privacy, and convergence performance. We further improve the convergence performance by enabling dynamic private budget and quantization bit allocation. We achieve this by using an optimization formula that minimizes convergence error subject to the privacy budget constraint. We evaluate our approach on multiple datasets, including MNIST, CIFAR-10, and CIFAR-100, and show that our proposed method outperforms the baselines in terms of learning performance under various privacy constraints. Moreover, we observe that dynamic privacy allocation yields additional accuracy improvements for the models compared to the fixed scheme.
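To make the idea of randomized quantization concrete, the sketch below shows plain unbiased stochastic rounding of a single value onto a uniform grid. This is only a generic baseline and not Gau-LRQ: its error is not Gaussian and it carries no privacy guarantee. The function name, the grid `step`, and the sample value are assumptions for illustration.

```python
import random

def stochastic_quantize(x, step, rng):
    """Round x to a multiple of `step`, rounding up with probability equal
    to the fractional remainder so that the expected output equals x."""
    lower = (x // step) * step
    p_up = (x - lower) / step
    return lower + step if rng.random() < p_up else lower

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [stochastic_quantize(0.3, 0.25, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
print(sorted(set(samples)), round(mean, 2))
```

Each sample lands on one of the two neighbouring grid points, and the empirical mean stays close to the unquantized value, which is why such schemes can be plugged into SGD without biasing the gradient.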
More than Vanilla Fusion: a Simple, Decoupling-free, Attention Module for Multimodal Fusion Based on Signal Theory
Authors: Peiwen Sun, Yifan Zhang, Zishan Liu, Donghao Chen, Honggang Zhang
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2312.07212
Pdf link: https://arxiv.org/pdf/2312.07212
Abstract The vanilla fusion methods still dominate a large percentage of mainstream audio-visual tasks. However, the effectiveness of vanilla fusion from a theoretical perspective is still worth discussing. Thus, this paper reconsiders the signal fused in the multimodal case from a bionics perspective and proposes a simple, plug-and-play, attention module for vanilla fusion based on fundamental signal theory and uncertainty theory. In addition, previous work on multimodal dynamic gradient modulation still relies on decoupling the modalities. So, a decoupling-free gradient modulation scheme has been designed in conjunction with the aforementioned attention module, which has various advantages over the decoupled one. Experiment results show that just a few lines of code can achieve up to 2.0% performance improvements to several multimodal classification methods. Finally, quantitative evaluation of other fusion tasks reveals the potential for additional application scenarios.
Uncertainty Quantification for the Homogeneous Landau-Fokker-Planck Equation via Deterministic Particle Galerkin methods
Authors: Rafael Bailo, José Antonio Carrillo, Andrea Medaglia, Mattia Zanella
Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP); Computational Physics (physics.comp-ph); Plasma Physics (physics.plasm-ph)
Arxiv link: https://arxiv.org/abs/2312.07218
Pdf link: https://arxiv.org/pdf/2312.07218
Abstract We design a deterministic particle method for the solution of the spatially homogeneous Landau equation with uncertainty. The deterministic particle approximation is based on the reformulation of the Landau equation as a formal gradient flow on the set of probability measures, whereas the propagation of uncertain quantities is computed by means of a stochastic Galerkin (sG) representation of each particle. This approach guarantees spectral accuracy in uncertainty space while preserving the fundamental structural properties of the model: the positivity of the solution, the conservation of invariant quantities, and the entropy production. We provide regularity results for the particle method in the random space. We perform the numerical validation of the particle method in a wealth of test cases.
Fast Training of Diffusion Transformer with Extreme Masking for 3D Point Clouds Generation
Authors: Shentong Mo, Enze Xie, Yue Wu, Junsong Chen, Matthias Nießner, Zhenguo Li
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.07231
Pdf link: https://arxiv.org/pdf/2312.07231
Abstract Diffusion Transformers have recently shown remarkable effectiveness in generating high-quality 3D point clouds. However, training voxel-based diffusion models for high-resolution 3D voxels remains prohibitively expensive due to the cubic complexity of attention operators, which arises from the additional dimension of voxels. Motivated by the inherent redundancy of 3D compared to 2D, we propose FastDiT-3D, a novel masked diffusion transformer tailored for efficient 3D point cloud generation, which greatly reduces training costs. Specifically, we draw inspiration from masked autoencoders to dynamically operate the denoising process on masked voxelized point clouds. We also propose a novel voxel-aware masking strategy to adaptively aggregate background/foreground information from voxelized point clouds. Our method achieves state-of-the-art performance with an extreme masking ratio of nearly 99%. Moreover, to improve multi-category 3D generation, we introduce Mixture-of-Expert (MoE) in 3D diffusion model. Each category can learn a distinct diffusion path with different experts, relieving gradient conflict. Experimental results on the ShapeNet dataset demonstrate that our method achieves state-of-the-art high-fidelity and diverse 3D point cloud generation performance. Our FastDiT-3D improves 1-Nearest Neighbor Accuracy and Coverage metrics when generating 128-resolution voxel point clouds, using only 6.5% of the original training cost.
Scalable Motion Style Transfer with Constrained Diffusion Generation
Authors: Wenjie Yin, Yi Yu, Hang Yin, Danica Kragic, Mårten Björkman
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.07311
Pdf link: https://arxiv.org/pdf/2312.07311
Abstract Current training of motion style transfer systems relies on consistency losses across style domains to preserve contents, hindering its scalable application to a large number of domains and private data. Recent image transfer works show the potential of independent training on each domain by leveraging implicit bridging between diffusion models, with the content preservation, however, limited to simple data patterns. We address this by imposing biased sampling in backward diffusion while maintaining the domain independence in the training stage. We construct the bias from the source domain keyframes and apply them as the gradient of content constraints, yielding a framework with keyframe manifold constraint gradients (KMCGs). Our validation demonstrates the success of training separate models to transfer between as many as ten dance motion styles. Comprehensive experiments find a significant improvement in preserving motion contents in comparison to baseline and ablative diffusion-based style transfer models. In addition, we perform a human study for a subjective assessment of the quality of generated dance motions. The results validate the competitiveness of KMCGs.
Momentum Particle Maximum Likelihood
Authors: Jen Ning Lim, Juan Kuntz, Samuel Power, Adam M. Johansen
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.07335
Pdf link: https://arxiv.org/pdf/2312.07335
Abstract Maximum likelihood estimation (MLE) of latent variable models is often recast as an optimization problem over the extended space of parameters and probability distributions. For example, the Expectation Maximization (EM) algorithm can be interpreted as coordinate descent applied to a suitable free energy functional over this space. Recently, this perspective has been combined with insights from optimal transport and Wasserstein gradient flows to develop particle-based algorithms applicable to wider classes of models than standard EM. Drawing inspiration from prior works which interpret `momentum-enriched' optimisation algorithms as discretizations of ordinary differential equations, we propose an analogous dynamical systems-inspired approach to minimizing the free energy functional over the extended space of parameters and probability distributions. The result is a dynamic system that blends elements of Nesterov's Accelerated Gradient method, the underdamped Langevin diffusion, and particle methods. Under suitable assumptions, we establish quantitative convergence of the proposed system to the unique minimiser of the functional in continuous time. We then propose a numerical discretization of this system which enables its application to parameter estimation in latent variable models. Through numerical experiments, we demonstrate that the resulting algorithm converges faster than existing methods and compares favourably with other (approximate) MLE algorithms.
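One ingredient named in the abstract above, Nesterov's Accelerated Gradient method, can be sketched on a one-dimensional quadratic. This shows only the classical optimizer, not the proposed particle algorithm. The objective, `step_size`, `momentum`, and the function name are assumptions for illustration.

```python
def nesterov_minimize(grad, x0, step_size=0.1, momentum=0.9, iterations=500):
    """Nesterov's accelerated gradient: take the gradient at a look-ahead point."""
    x_prev = x0
    x = x0
    for _ in range(iterations):
        lookahead = x + momentum * (x - x_prev)  # extrapolate along the last move
        x_prev, x = x, lookahead - step_size * grad(lookahead)
    return x

# Hypothetical objective f(x) = 0.5 * (x - 2)^2, whose gradient is x - 2.
x_star = nesterov_minimize(lambda x: x - 2.0, x0=10.0)
print(abs(x_star - 2.0) < 1e-6)
```

The look-ahead step is what separates this scheme from plain heavy-ball momentum, where the gradient is evaluated at the current iterate instead.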
RMS: Redundancy-Minimizing Point Cloud Sampling for Real-Time Pose Estimation in Degenerated Environments
Authors: Pavel Petracek, Kostas Alexis, Martin Saska
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2312.07337
Pdf link: https://arxiv.org/pdf/2312.07337
Abstract The typical point cloud sampling methods used in state estimation for mobile robots preserve a high level of point redundancy. The point redundancy slows down the estimation pipeline and can make real-time estimation drift in geometrically symmetrical and structureless environments. We propose a novel point cloud sampling method that is capable of lowering the effects of geometrical degeneracies by minimizing redundancy within the cloud. The proposed method is an alternative to the commonly used sparsification methods that normalize the density of points to comply with the constraints on the real-time capabilities of a robot. In contrast to density normalization, our method builds on the fact that linear and planar surfaces contain a high level of redundancy propagated into iterative estimation pipelines. We define the concept of gradient flow quantifying the surface underlying a point. We also show that maximizing the entropy of the gradient flow minimizes point redundancy for robot ego-motion estimation. We integrate the proposed method into the point-based KISS-ICP and feature-based LOAM odometry pipelines and evaluate it experimentally on KITTI, Hilti-Oxford, and custom datasets from multirotor UAVs. The experiments show that the proposed sampling technique outperforms state-of-the-art methods in well-conditioned as well as in geometrically-degenerated settings, in both accuracy and speed.
X4D-SceneFormer: Enhanced Scene Understanding on 4D Point Cloud Videos through Cross-modal Knowledge Transfer
Authors: Linglin Jing, Ying Xue, Xu Yan, Chaoda Zheng, Dong Wang, Ruimao Zhang, Zhigang Wang, Hui Fang, Bin Zhao, Zhen Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.07378
Pdf link: https://arxiv.org/pdf/2312.07378
Abstract The field of 4D point cloud understanding is rapidly developing with the goal of analyzing dynamic 3D point cloud sequences. However, it remains a challenging task due to the sparsity and lack of texture in point clouds. Moreover, the irregularity of point clouds poses a difficulty in aligning temporal information within video sequences. To address these issues, we propose a novel cross-modal knowledge transfer framework, called X4D-SceneFormer. This framework enhances 4D-Scene understanding by transferring texture priors from RGB sequences using a Transformer architecture with temporal relationship mining. Specifically, the framework is designed with a dual-branch architecture, consisting of a 4D point cloud transformer and a Gradient-aware Image Transformer (GIT). During training, we employ multiple knowledge transfer techniques, including temporal consistency losses and masked self-attention, to strengthen the knowledge transfer between modalities. This leads to enhanced performance during inference using single-modal 4D point cloud inputs. Extensive experiments demonstrate the superior performance of our framework on various 4D point cloud video understanding tasks, including action recognition, action segmentation and semantic segmentation. The results achieve first place, i.e., 85.3% (+7.9%) accuracy and 47.3% (+5.0%) mIoU for 4D action segmentation and semantic segmentation, on the HOI4D challenge, outperforming the previous state-of-the-art by a large margin. We release the code at https://github.com/jinglinglingling/X4D
NAC-TCN: Temporal Convolutional Networks with Causal Dilated Neighborhood Attention for Emotion Understanding
Authors: Alexander Mehta, William Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.07507
Pdf link: https://arxiv.org/pdf/2312.07507
Abstract In the task of emotion recognition from videos, a key improvement has been to focus on emotions over time rather than a single frame. There are many architectures to address this task such as GRUs, LSTMs, Self-Attention, Transformers, and Temporal Convolutional Networks (TCNs). However, these methods suffer from high memory usage, large amounts of operations, or poor gradients. We propose a method known as Neighborhood Attention with Convolutions TCN (NAC-TCN) which incorporates the benefits of attention and Temporal Convolutional Networks while ensuring that causal relationships are understood which results in a reduction in computation and memory cost. We accomplish this by introducing a causal version of Dilated Neighborhood Attention while incorporating it with convolutions. Our model achieves comparable, better, or state-of-the-art performance over TCNs, TCAN, LSTMs, and GRUs while requiring fewer parameters on standard emotion recognition datasets. We publish our code online for easy reproducibility and use in other projects.
Keyword: super-resolution

TULIP: Transformer for Upsampling of LiDAR Point Cloud
Authors: Bin Yang, Patrick Pfreundschuh, Roland Siegwart, Marco Hutter, Peyman Moghadam, Vaishakh Patil
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.06733
Pdf link: https://arxiv.org/pdf/2312.06733
Abstract LiDAR Upsampling is a challenging task for the perception systems of robots and autonomous vehicles, due to the sparse and irregular structure of large-scale scene contexts. Recent works propose to solve this problem by converting LiDAR data from 3D Euclidean space into an image super-resolution problem in 2D image space. Although their methods can generate high-resolution range images with fine-grained details, the resulting 3D point clouds often blur out details and predict invalid points. In this paper, we propose TULIP, a new method to reconstruct high-resolution LiDAR point clouds from low-resolution LiDAR input. We also follow a range image-based approach but specifically modify the patch and window geometries of a Swin-Transformer-based network to better fit the characteristics of range images. We conducted several experiments on three different public real-world and simulated datasets. TULIP outperforms state-of-the-art methods in all relevant metrics and generates robust and more realistic point clouds than prior works.
