Abstract
Neural fields are an emerging data-representation paradigm in which a neural network is trained to approximate a given signal. A key obstacle to their widespread adoption is encoding speed: generating a neural field requires overfitting a neural network, which can take a large number of SGD steps to reach the desired fidelity. In this paper, we study how data transformations affect the speed of neural field training, focusing specifically on how permuting pixel locations affects the convergence speed of SGD. Counterintuitively, we find that randomly permuting the pixel locations can considerably accelerate training. To explain this phenomenon, we examine neural field training through the lens of PSNR curves, loss landscapes, and error patterns. Our analyses suggest that random pixel permutations remove easy-to-fit patterns, which facilitate optimization in the early stage but hinder capturing fine details of the signal.
The Effects of Overparameterization on Sharpness-aware Minimization: An Empirical and Theoretical Analysis
Authors: Sungbin Shin, Dongyeop Lee, Maksym Andriushchenko, Namhoon Lee
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Abstract
Training an overparameterized neural network can yield minimizers with the same training loss but different generalization capabilities. Given evidence of a correlation between the sharpness of minima and their generalization errors, increasing efforts have been made to develop optimization methods that explicitly seek flat minima as more generalizable solutions. However, how overparameterization affects the behavior of this sharpness-aware minimization (SAM) strategy has received little study. In this work, we analyze SAM under varying degrees of overparameterization and present both empirical and theoretical results that suggest a critical influence of overparameterization on SAM. Specifically, we first use standard techniques in optimization to prove that SAM can achieve a linear convergence rate under overparameterization in a stochastic setting. We also show that the linearly stable minima found by SAM are indeed flatter and have more uniformly distributed Hessian moments than those found by SGD. These results are corroborated by our experiments, which reveal a consistent trend: the generalization improvement from SAM continues to increase as the model becomes more overparameterized. We further show that sparsity can open up an avenue for effective overparameterization in practice.
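For readers unfamiliar with SAM, a minimal NumPy sketch of the standard two-step SAM update may help: take a normalized gradient-ascent step of radius `rho`, then descend using the gradient evaluated at that perturbed point. The toy quadratic (one sharp and one flat direction), the learning rate, and `rho` are hypothetical choices for illustration, not values from the paper, whose analysis concerns the stochastic, overparameterized setting.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.05, rho=0.05):
    """One SAM update: perturb w toward the first-order worst case, then descend from w."""
    g = grad_fn(w)
    # first-order solution of max_{||eps|| <= rho} L(w + eps)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # the descent direction is the gradient evaluated at the perturbed weights
    return w - lr * grad_fn(w + eps)

# Hypothetical toy loss L(w) = 0.5 * w^T A w with one sharp and one flat direction.
A = np.diag([10.0, 0.1])
loss = lambda w: 0.5 * w @ A @ w
grad = lambda w: A @ w

w0 = np.array([1.0, 1.0])
w = w0.copy()
for _ in range(100):
    w = sam_step(w, grad)
print(loss(w0), loss(w))
```

Because `eps` always has norm `rho`, SAM does not settle exactly at the minimum on this toy problem; it hovers within a neighborhood whose size scales with `rho`.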
Leveraging Graph Diffusion Models for Network Refinement Tasks
Authors: Puja Trivedi, Ryan Rossi, David Arbour, Tong Yu, Franck Dernoncourt, Sungchul Kim, Nedim Lipka, Namyong Park, Nesreen K. Ahmed, Danai Koutra
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Abstract
Most real-world networks are noisy and incomplete samples from an unknown target distribution. Refining them by correcting corruptions or inferring unobserved regions typically improves downstream performance. Inspired by the impressive generative capabilities that have been used to correct corruptions in images, and by the similarity between "in-painting" and filling in missing nodes and edges conditioned on the observed graph, we propose a novel graph generative framework, SGDM, which is based on subgraph diffusion. Our framework not only improves the scalability and fidelity of graph diffusion models but also leverages the reverse process to perform novel, conditional generation tasks. In particular, through extensive empirical analysis and a set of novel metrics, we demonstrate that our proposed model effectively supports the following refinement tasks for partially observable networks: T1: denoising extraneous subgraphs, T2: expanding existing subgraphs, and T3: performing "style" transfer by regenerating a particular subgraph to match the characteristics of a different node or subgraph.
Keyword: optimization
Deep convolutional encoder-decoder hierarchical neural networks for conjugate heat transfer surrogate modeling
Authors: Takiah Ebbs-Picken, David A. Romero, Carlos M. Da Silva, Cristina H. Amon
Subjects: Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)
Abstract
Conjugate heat transfer (CHT) models are vital for the design of many engineering systems. However, high-fidelity CHT models are computationally intensive, which limits their use in applications such as design optimization, where hundreds to thousands of model evaluations are required. In this work, we develop a modular deep convolutional encoder-decoder hierarchical (DeepEDH) neural network, a novel deep-learning-based surrogate modeling methodology for computationally intensive CHT models. Leveraging convective temperature dependencies, we propose a two-stage temperature prediction architecture that couples velocity and temperature models. The proposed DeepEDH methodology is demonstrated by modeling the pressure, velocity, and temperature fields for a liquid-cooled cold-plate-based battery thermal management system with variable channel geometry. A computational model of the cold plate is developed and solved using the finite element method (FEM), generating a dataset of 1,500 simulations. The FEM results are transformed and scaled from unstructured to structured, image-like meshes to create training and test datasets. The DeepEDH methodology's performance is examined in relation to data scaling, training dataset size, and network depth. Our performance analysis covers the impact of the novel architecture, separate field models, output geometry masks, multi-stage temperature models, and optimizations of the hyperparameters and architecture. Furthermore, we quantify the influence of the CHT thermal boundary condition on surrogate model performance, highlighting improved temperature model performance with higher heat fluxes. Compared to other deep learning neural network surrogate models, such as U-Net and DenseED, the proposed DeepEDH methodology for CHT models exhibits up to a 65% enhancement in the coefficient of determination ($R^{2}$).
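The comparison above is reported in terms of the coefficient of determination. As a reference, here is a minimal sketch of its usual definition, $R^2 = 1 - SS_{res}/SS_{tot}$. Computing it over flattened field values is an assumption: the abstract does not say how $R^2$ is aggregated over the pressure, velocity, and temperature fields or over test samples.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot (inputs are flattened)."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares around the mean
    return 1.0 - ss_res / ss_tot
```

A perfect prediction gives 1, predicting the mean everywhere gives 0, and worse-than-mean predictions give negative values.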
Practical Layout-Aware Analog/Mixed-Signal Design Automation with Bayesian Neural Networks
Authors: Ahmet F. Budak, Keren Zhu, David Z. Pan
Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE); Systems and Control (eess.SY); Optimization and Control (math.OC)
Abstract
The high cost of simulation has been a bottleneck in practical analog/mixed-signal design automation. Many learning-based algorithms require thousands of simulated data points, which is impractical for circuits that are expensive to simulate. We propose a learning-based algorithm that can be trained with a small amount of data and is therefore scalable to tasks with expensive simulations. Our efficient algorithm solves the post-layout performance optimization problem, where simulations are known to be expensive. Our comprehensive study also solves the schematic-level sizing problem. For efficient optimization, we use Bayesian Neural Networks as a regression model to approximate circuit performance. For layout-aware optimization, we treat the problem as a multi-fidelity optimization problem and improve efficiency by exploiting correlations with cheaper evaluations. We present three test cases to demonstrate the efficiency of our algorithms. Our tests show that the proposed approach is more efficient than conventional baselines and state-of-the-art algorithms.
DreamPropeller: Supercharge Text-to-3D Generation with Parallel Sampling
Authors: Linqi Zhou, Andy Shih, Chenlin Meng, Stefano Ermon
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Abstract
Recent methods such as Score Distillation Sampling (SDS) and Variational Score Distillation (VSD) that use 2D diffusion models for text-to-3D generation have demonstrated impressive generation quality. However, the long generation time of such algorithms significantly degrades the user experience. To tackle this problem, we propose DreamPropeller, a drop-in acceleration algorithm that can be wrapped around any existing text-to-3D generation pipeline based on score distillation. Our framework generalizes Picard iterations, a classical algorithm for sampling an ODE path in parallel, and can account for non-ODE paths such as momentum-based gradient updates and changes in dimensions during the optimization process, as occur in many cases of 3D generation. We show that our algorithm trades parallel compute for wall-clock time and empirically achieves up to a 4.7x speedup with a negligible drop in generation quality for all tested frameworks.
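As background on the Picard iterations that DreamPropeller generalizes, here is a minimal sketch of classical Picard iteration on a fixed time grid with a left-endpoint (Euler) quadrature, assuming a scalar ODE $x'(t) = f(t, x)$ and a vectorized `f`. It is not DreamPropeller itself, which according to the abstract also handles momentum-based updates and changing dimensions; the test ODE and grid are hypothetical.

```python
import numpy as np

def picard_euler(f, x0, ts, max_iters=None, tol=1e-10):
    """Parallel-in-time Picard iteration on a fixed grid (Euler discretization).

    Each sweep evaluates f at every grid point at once (the parallel part), then
    rebuilds the whole trajectory from the integral form x(t) = x0 + int f(s, x(s)) ds.
    """
    n = len(ts)
    dt = np.diff(ts)
    x = np.full(n, x0, dtype=float)  # initial guess: constant trajectory
    max_iters = n if max_iters is None else max_iters
    for it in range(max_iters):
        drift = f(ts[:-1], x[:-1]) * dt  # independent evaluations -> parallelizable
        x_new = np.concatenate(([x0], x0 + np.cumsum(drift)))
        if np.max(np.abs(x_new - x)) < tol:  # successive sweeps agree: stop early
            return x_new, it + 1
        x = x_new
    return x, max_iters

def sequential_euler(f, x0, ts):
    """Reference: ordinary step-by-step Euler integration on the same grid."""
    x = [float(x0)]
    for i in range(len(ts) - 1):
        x.append(x[-1] + f(ts[i], x[-1]) * (ts[i + 1] - ts[i]))
    return np.array(x)

f = lambda t, x: -x + np.sin(t)  # hypothetical test ODE
ts = np.linspace(0.0, 2.0, 21)
x_par, sweeps = picard_euler(f, 1.0, ts)
x_seq = sequential_euler(f, 1.0, ts)
```

After sweep k, the first k + 1 grid points match the sequential solution (up to floating-point rounding), so the number of sweeps is bounded by the grid size. The wall-clock benefit comes from each sweep being one batch of independent `f` evaluations.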
In Search of a Data Transformation That Accelerates Neural Field Training
Authors: Junwon Seo, Sangyoon Lee, Kwang In Kim, Jaeho Lee
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Neural fields are an emerging data-representation paradigm in which a neural network is trained to approximate a given signal. A key obstacle to their widespread adoption is encoding speed: generating a neural field requires overfitting a neural network, which can take a large number of SGD steps to reach the desired fidelity. In this paper, we study how data transformations affect the speed of neural field training, focusing specifically on how permuting pixel locations affects the convergence speed of SGD. Counterintuitively, we find that randomly permuting the pixel locations can considerably accelerate training. To explain this phenomenon, we examine neural field training through the lens of PSNR curves, loss landscapes, and error patterns. Our analyses suggest that random pixel permutations remove easy-to-fit patterns, which facilitate optimization in the early stage but hinder capturing fine details of the signal.
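One plausible reading of "randomly permuting pixel locations" can be sketched as follows, under stated assumptions: the coordinate grid stays fixed, pixel values are shuffled by one fixed random permutation, the network is fit to the shuffled image, and the inverse permutation is applied when decoding. The coordinate normalization to [-1, 1] and the function names are hypothetical, and the training loop itself is omitted.

```python
import numpy as np

def make_permuted_targets(image, seed=0):
    """Randomly permute the pixel locations of an H x W (x C) image.

    Returns the (x, y) coordinates used as network inputs, the permuted pixel
    values used as regression targets, and the permutation needed to undo it.
    """
    h, w = image.shape[:2]
    ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w), indexing="ij")
    coords = np.stack([xs.ravel(), ys.ravel()], axis=-1)  # (H*W, 2), fixed grid
    pixels = image.reshape(h * w, -1)                     # (H*W, C)
    perm = np.random.default_rng(seed).permutation(h * w)
    # coordinate i is now asked to predict the value of pixel perm[i]
    return coords, pixels[perm], perm

def unpermute(pred, perm, shape):
    """Map predictions on the permuted grid back to the original pixel order."""
    out = np.empty_like(pred)
    out[perm] = pred  # inverse of the gather pixels[perm]
    return out.reshape(shape)
```

Since the permutation is fixed and invertible, no information is lost; only the spatial structure that the network sees during fitting changes.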
DyRA: Dynamic Resolution Adjustment for Scale-robust Object Detection
Authors: Daeun Seo, Hoeseok Yang, Hyungshin Kim
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
In object detection, achieving consistent accuracy is challenging because object sizes vary. One possible solution to this problem is to optimize the input resolution, known as a multi-resolution strategy. Previous approaches for optimizing resolution are often based on predefined resolutions or a dynamic neural network, but run-time resolution optimization for existing architectures has received little study. In this paper, we propose DyRA, an adaptive resolution scaling network for existing detectors that comprises convolutions and transformer encoder blocks. DyRA returns a scale factor from an input image, which enables instance-specific scaling. This network is trained jointly with detectors using specially designed loss functions, namely ParetoScaleLoss and BalanceLoss. The ParetoScaleLoss produces an adaptive scale factor from the image, while the BalanceLoss optimizes the scale factor according to localization power for the dataset. The loss functions are designed to minimize the accuracy drop caused by the conflicting objectives of small and large objects. Our experiments on COCO with RetinaNet, Faster-RCNN, FCOS, and Mask-RCNN achieved accuracy improvements of 1.3%, 1.1%, 1.3%, and 0.8%, respectively, over a multi-resolution baseline through resolution adjustment alone. The code is available at https://github.com/DaEunFullGrace/DyRA.git.
Abstract
In recent years, the field of single-cell RNA sequencing has seen a surge in the development of clustering methods. These methods enable the identification of cell subpopulations, thereby facilitating the understanding of tumor microenvironments. Despite their utility, most existing clustering algorithms focus primarily on the attribute information provided by the cell matrix or on the network structure between cells, often neglecting the network between genes. This oversight can lead to loss of information and to clustering results that lack clinical significance. To address this limitation, we develop a single-cell clustering model with dual-graph alignment, which integrates gene network information into the clustering process through self-supervised and unsupervised optimization. Specifically, we design a graph-based autoencoder enhanced by an attention mechanism to capture relationships between cells. Moreover, we apply node2vec to Protein-Protein Interaction (PPI) networks to derive the gene network structure and preserve this structure throughout the clustering process. Experimental results demonstrate the effectiveness of the proposed method, showing that it optimizes clustering outcomes while preserving the original associations between cells and genes. This research contributes to identifying accurate cell subpopulations and produces clustering results that more closely resemble real-world biological scenarios. It provides better insight into the characteristics and distribution of diseased cells, ultimately laying a foundation for early disease diagnosis and treatment.
Continuous Pose for Monocular Cameras in Neural Implicit Representation
Authors: Qi Ma, Danda Pani Paudel, Ajad Chhatkuli, Luc Van Gool
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
In this paper, we show the effectiveness of optimizing monocular camera poses as a continuous function of time. The camera poses are represented by an implicit neural function that maps a given time to the corresponding camera pose. The mapped camera poses are then used for downstream tasks that also require joint camera pose optimization. In doing so, the network parameters, which implicitly represent the camera poses, are optimized. We apply the proposed method in four diverse experimental settings, namely, (1) NeRF from noisy poses; (2) NeRF from asynchronous Events; (3) Visual Simultaneous Localization and Mapping (vSLAM); and (4) vSLAM with IMUs. In all four settings, the proposed method performs significantly better than the compared baselines and state-of-the-art methods. Additionally, under the assumption of continuous motion, we find that changes in pose may actually lie on a manifold with fewer than 6 degrees of freedom (DOF). We call this low-DOF motion representation \emph{intrinsic motion} and use it in vSLAM settings, showing impressive camera tracking performance.
TLControl: Trajectory and Language Control for Human Motion Synthesis
Abstract
Controllable human motion synthesis is essential for applications in AR/VR, gaming, movies, and embodied AI. Existing methods often focus solely on either language or full trajectory control, lacking precision in synthesizing motions aligned with user-specified trajectories, especially for multi-joint control. To address these issues, we present TLControl, a new method for realistic human motion synthesis, incorporating both low-level trajectory and high-level language semantics controls. Specifically, we first train a VQ-VAE to learn a compact latent motion space organized by body parts. We then propose a Masked Trajectories Transformer to make coarse initial predictions of full trajectories of joints based on the learned latent motion space, with user-specified partial trajectories and text descriptions as conditioning. Finally, we introduce an efficient test-time optimization to refine these coarse predictions for accurate trajectory control. Experiments demonstrate that TLControl outperforms the state-of-the-art in trajectory accuracy and time efficiency, making it practical for interactive and high-quality animation generation.
SceneTex: High-Quality Texture Synthesis for Indoor Scenes via Diffusion Priors
Abstract
We propose SceneTex, a novel method for generating high-quality, style-consistent textures for indoor scenes using depth-to-image diffusion priors. Unlike previous methods that either iteratively warp 2D views onto a mesh surface or distill diffusion latent features without accurate geometric and style cues, SceneTex formulates texture synthesis as an optimization problem in RGB space, where style and geometry consistency are properly reflected. At its core, SceneTex uses a multiresolution texture field to implicitly encode the mesh appearance. We optimize the target texture via a score-distillation-based objective function on the corresponding RGB renderings. To further secure style consistency across views, we introduce a cross-attention decoder that predicts RGB values by cross-attending to pre-sampled reference locations in each instance. SceneTex enables diverse and accurate texture synthesis for 3D-FRONT scenes, demonstrating significant improvements in visual quality and prompt fidelity over prior texture generation methods.
Enhancing the Performance of Neural Networks Through Causal Discovery and Integration of Domain Knowledge
Abstract
In this paper, we develop a generic methodology to encode the hierarchical causality structure among observed variables into a neural network in order to improve its predictive performance. The proposed methodology, called causality-informed neural network (CINN), uses three coherent steps to systematically map structural causal knowledge into the layer-to-layer design of a neural network while strictly preserving the orientation of every causal relationship. In the first step, CINN discovers causal relationships from observational data via directed acyclic graph (DAG) learning, where causal discovery is recast as a continuous optimization problem to avoid its combinatorial nature. In the second step, the discovered hierarchical causality structure among observed variables is systematically encoded into the neural network through a dedicated architecture and a customized loss function. By categorizing variables in the causal DAG as root, intermediate, and leaf nodes, the hierarchical causal DAG is translated into CINN with a one-to-one correspondence between nodes in the causal DAG and units in the CINN while maintaining the relative order among these nodes. Regarding the loss function, both intermediate and leaf nodes in the DAG are treated as target outputs during CINN training so as to drive co-learning of causal relationships among different types of nodes. Because multiple loss components arise in CINN, we use the projection of conflicting gradients to mitigate gradient interference among the multiple learning tasks. Computational experiments across a broad spectrum of UCI data sets demonstrate substantial advantages of CINN in predictive performance over other state-of-the-art methods. In addition, an ablation study underscores the value of incrementally integrating structural and quantitative causal knowledge to enhance the neural network's predictive performance.
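The abstract mentions projecting conflicting gradients without giving details. Assuming this refers to PCGrad-style gradient surgery, a minimal sketch looks like this: whenever a task gradient has a negative dot product with another task's gradient, the conflicting component is removed before the gradients are summed. The per-task gradient vectors below are hypothetical; how CINN groups its loss components into tasks is not specified in the abstract.

```python
import numpy as np

def project_conflicting(grads, seed=0):
    """PCGrad-style gradient surgery over a list of per-task gradient vectors.

    For each task gradient g_i, every other (original) gradient g_j that conflicts
    with it (negative dot product) is projected out:
        g_i <- g_i - (g_i . g_j / ||g_j||^2) g_j
    The projected gradients are summed into a single update direction.
    """
    rng = np.random.default_rng(seed)
    projected = []
    for i, g in enumerate(grads):
        g = np.array(g, dtype=float)  # working copy
        others = [j for j in range(len(grads)) if j != i]
        for j in rng.permutation(others):  # visit the other tasks in random order
            dot = g @ grads[j]
            if dot < 0:
                g -= dot / (grads[j] @ grads[j]) * grads[j]
        projected.append(g)
    return np.sum(projected, axis=0), projected

# Hypothetical gradients of two loss components that conflict (dot product = -1).
g_leaf = np.array([1.0, 0.0])
g_inter = np.array([-1.0, 1.0])
total, proj = project_conflicting([g_leaf, g_inter])
```

After projection, each task gradient is orthogonal to the gradient it conflicted with, so neither component directly undoes the other's progress.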
eMotions: A Large-Scale Dataset for Emotion Recognition in Short Videos
Authors: Xuecheng Wu, Heli Sun, Junxiao Xue, Ruofan Zhai, Xiangyan Kong, Jiayu Nie, Liang He
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Abstract
Short videos (SVs) have become essential for acquiring and sharing information in daily life. The widespread use of SVs to convey emotions makes emotion recognition in SVs necessary. Given the lack of SV emotion data, we introduce a large-scale dataset named eMotions, comprising 27,996 videos. We also reduce the impact of subjectivity on labeling quality through better personnel allocation and multi-stage annotation. In addition, we provide category-balanced and test-oriented variants through targeted data sampling. Emotion recognition in commonly studied video types (e.g., those centered on facial expressions and postures) has been well explored, but understanding the emotions in SVs remains challenging. The greater content diversity of SVs introduces larger semantic gaps and makes emotion-related features harder to learn, and the prevalent audio-visual co-expression of emotions leaves information gaps caused by emotional incompleteness. To tackle these problems, we present an end-to-end baseline method, AV-CPNet, that employs a video transformer to better learn semantically relevant representations. We further design a two-stage cross-modal fusion module to model the correlations of audio-visual features in a complementary manner. The EP-CE Loss, which incorporates three emotion polarities, is then applied to guide model optimization. Extensive experimental results on nine datasets verify the effectiveness of AV-CPNet. Datasets and code will be made available at https://github.com/XuecWu/eMotions.
Two Scalable Approaches for Burned-Area Mapping Using U-Net and Landsat Imagery
Authors: Ian Mancilla-Wulff, Jaime Carrasco, Cristobal Pais, Alejandro Miranda, Andres Weintraub
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Abstract
Monitoring wildfires is an essential step toward minimizing their impact on the planet, given their many negative environmental, economic, and social consequences. Recent advances in remote sensing technology, combined with the increasing application of artificial intelligence methods, have improved real-time, high-resolution fire monitoring. This study explores two approaches based on the U-Net model for automating and optimizing the burned-area mapping process. Denoted 128 and AllSizes (AS), they are trained on datasets with different class balances obtained by cropping input images to different sizes. They are then applied to Landsat imagery and time-series data from two fire-prone regions in Chile. After model performance is improved through hyperparameter optimization, the results demonstrate the effectiveness of both approaches. Tests on 195 representative images of the study area show that increasing dataset balance with the AS model yields better performance. More specifically, AS achieved a Dice Coefficient (DC) of 0.93, an Omission Error (OE) of 0.086, and a Commission Error (CE) of 0.045, while the 128 model achieved a DC of 0.86, an OE of 0.12, and a CE of 0.12. These findings should provide a basis for further development of scalable automatic burned-area mapping tools.
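The abstract reports DC, OE, and CE without defining them. A minimal sketch using the definitions common in burned-area mapping (assumed here, not taken from the paper) computes all three from a predicted and a reference binary mask: DC = 2TP / (2TP + FP + FN), OE = FN / (TP + FN), and CE = FP / (TP + FP).

```python
import numpy as np

def burned_area_metrics(pred, truth):
    """Dice coefficient, omission error, and commission error for binary burn masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.sum(pred & truth)   # burned pixels correctly mapped
    fp = np.sum(pred & ~truth)  # unburned pixels mapped as burned (commission)
    fn = np.sum(~pred & truth)  # burned pixels that were missed (omission)
    dc = 2 * tp / (2 * tp + fp + fn)
    oe = fn / (tp + fn)
    ce = fp / (tp + fp)
    return float(dc), float(oe), float(ce)

# Hypothetical 1-D example: 3 predicted burned pixels, 3 reference burned pixels, 2 overlapping.
dc, oe, ce = burned_area_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

Lower OE and CE are better; the two errors separate missed burns from false alarms, which a single Dice score does not.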
Efficient and Scalable Architecture for Multiple-chip Implementation of Simulated Bifurcation Machines
Abstract
Ising machines are specialized computers for finding the lowest-energy states of Ising spin models, onto which many practical combinatorial optimization problems can be mapped. Simulated bifurcation (SB) is a quantum-inspired parallelizable algorithm for Ising problems that enables scalable multi-chip implementations of Ising machines. However, the computational performance of a previously proposed multi-chip architecture tends to saturate as the number of chips increases for a given problem size, because computation and communication cannot overlap in time. In this paper, we propose a streaming architecture for multi-chip implementations of SB-based Ising machines with full spin-to-spin connectivity. The data flow of in-chip computation is harmonized with the data flow of inter-chip communication, allowing computation and communication to overlap and hiding the communication time. Systematic experiments demonstrate linear strong scaling of performance up to the vicinity of the ideal communication limit, which is determined only by the latency of chip-to-chip communication. Our eight-FPGA (field-programmable gate array) cluster can compute a 32,768-spin problem with a high pipeline efficiency of 97.9%. The performance of a 79-FPGA cluster for a 100,000-spin problem, projected using a theoretical performance model validated on smaller experimental clusters, is comparable to that of a state-of-the-art 100,000-spin optical Ising machine.
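For context on the algorithm being accelerated, here is a minimal NumPy sketch of ballistic simulated bifurcation (bSB) in its commonly published form, assuming the energy $E(s) = -\frac{1}{2} s^\top J s$ and a linear pump schedule. The constants `a0`, `c0`, `dt`, and `steps` are illustrative assumptions, not the paper's settings, and nothing here models the multi-chip architecture itself. The matrix-vector product `J @ x` is the all-to-all step that a multi-chip design would need to split across chips, which is presumably why chip-to-chip communication matters.

```python
import numpy as np

def ballistic_sb(J, steps=1000, dt=0.1, a0=1.0, c0=0.5, seed=0):
    """Sketch of ballistic simulated bifurcation (bSB) for E(s) = -1/2 * s^T J s."""
    rng = np.random.default_rng(seed)
    n = J.shape[0]
    x = rng.uniform(-0.1, 0.1, n)  # continuous "positions", one per spin
    y = np.zeros(n)                # conjugate momenta
    for k in range(steps):
        a = a0 * k / steps  # pump amplitude ramps linearly from 0 toward a0
        # every spin's update uses only the current x, so all spins update in parallel;
        # J @ x is the all-to-all interaction term
        y += dt * (-(a0 - a) * x + c0 * (J @ x))
        x += dt * a0 * y
        # inelastic walls: clamp |x_i| <= 1 and zero the momentum of clamped spins
        hit = np.abs(x) > 1.0
        x[hit] = np.sign(x[hit])
        y[hit] = 0.0
    return np.sign(x).astype(int)

def ising_energy(J, s):
    return -0.5 * s @ J @ s

J = np.array([[0.0, 1.0], [1.0, 0.0]])  # hypothetical 2-spin ferromagnet
s = ballistic_sb(J)
print(s, ising_energy(J, s))
```

For this ferromagnet only the aligned mode becomes unstable as the pump ramps up, so both spins end with the same sign, which is the ground state.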
Comparison of metaheuristics for the firebreak placement problem: a simulation-based optimization approach
Authors: David Palacios-Meneses, Jaime Carrasco, Sebastián Dávila, Maximiliano Martínez, Rodrigo Mahaluf, Andrés Weintraub
Abstract
The problem of firebreak placement is crucial for fire prevention, and the effectiveness of firebreaks at landscape scale depends on their ability to impede the progress of future wildfires. To provide an adequate response, it is therefore necessary to consider the stochastic nature of fires, which are highly unpredictable from ignition to extinction. Thus, firebreak placement can be considered a stochastic optimization problem in which: (1) the objective is to minimize the expected number of burned cells in the landscape; (2) the decision variables are the firebreak locations; and (3) the random variable is the spatial propagation/behavior of fires. In this paper, we propose a solution approach for the problem from the perspective of simulation-based optimization (SbO), where the objective function is not available in closed form (a black-box function) but can be computed (and/or approximated) by wildfire simulations. For this purpose, a Genetic Algorithm and GRASP are implemented. The final implementation yielded favorable results for the Genetic Algorithm, demonstrating strong performance in scenarios with medium to high operational capacity, as well as medium levels of stochasticity.
Gene-MOE: A Sparsely-gated Framework for Pan-Cancer Genomic Analysis
Authors: Xiangyu Meng, Tao Song, Qing Yang, Huanhuan Dai, Lian Qiao, Hongzhen Ding, Long Hao, Xun Wang
Abstract
Analyzing the genomic information in the Pan-Cancer database can help us understand cancer-related factors and contribute to cancer diagnosis and prognosis. However, existing computational and deep learning methods cannot effectively capture the deep correlations among tens of thousands of genes, which leads to a loss of precision. In this paper, we propose a novel pretrained model called Gene-MOE to learn general feature representations of the Pan-Cancer dataset and transfer the pretrained weights to downstream tasks. Gene-MOE exploits mixture-of-experts (MOE) layers to learn rich feature representations of high-dimensional genes. At the same time, we build a mixture of attention expert (MOAE) model to learn the deep semantic relationships within genetic features. Finally, we propose a new self-supervised pretraining strategy, including loss function design, data enhancement, and optimization strategy, to train Gene-MOE and further improve performance on downstream analyses. We carried out cancer classification and survival analysis experiments based on Gene-MOE. In survival analysis on 14 cancer types, Gene-MOE outperformed state-of-the-art models on 12. In classification, the overall accuracy of the model across 33 cancer types reached 95.2%. Through detailed feature analysis, we found that Gene-MOE can learn rich feature representations of high-dimensional genes.
GNNFlow: A Distributed Framework for Continuous Temporal GNN Learning on Dynamic Graphs
Abstract
Graph Neural Networks (GNNs) play a crucial role in various fields. However, most existing deep graph learning frameworks assume pre-stored static graphs and do not support training on graph streams. In contrast, many real-world graphs are dynamic and contain time domain information. We introduce GNNFlow, a distributed framework that enables efficient continuous temporal graph representation learning on dynamic graphs on multi-GPU machines. GNNFlow introduces an adaptive time-indexed block-based data structure that effectively balances memory usage with graph update and sampling operation efficiency. It features a hybrid GPU-CPU graph data placement for rapid GPU-based temporal neighborhood sampling and kernel optimizations for enhanced sampling processes. A dynamic GPU cache for node and edge features is developed to maximize cache hit rates through reuse and restoration strategies. GNNFlow supports distributed training across multiple machines with static scheduling to ensure load balance. We implement GNNFlow based on DGL and PyTorch. Our experimental results show that GNNFlow provides up to 21.1x faster continuous learning than existing systems.
Group-wise Sparse and Explainable Adversarial Attacks
Authors: Shpresim Sadiku, Moritz Wagner, Sebastian Pokutta
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Optimization and Control (math.OC)
Abstract
Sparse adversarial attacks fool deep neural networks (DNNs) through minimal pixel perturbations, typically regularized by the $\ell_0$ norm. Recent efforts have replaced this norm with a structural sparsity regularizer, such as the nuclear group norm, to craft group-wise sparse adversarial attacks. The resulting perturbations are thus explainable and hold significant practical relevance, shedding light on an even greater vulnerability of DNNs than previously anticipated. However, crafting such attacks poses an optimization challenge, as it involves computing norms for groups of pixels within a non-convex objective. In this paper, we tackle this challenge by presenting an algorithm that simultaneously generates group-wise sparse attacks within semantically meaningful areas of an image. In each iteration, the core operation of our algorithm involves the optimization of a quasinorm adversarial loss. This optimization is achieved by employing the $1/2$-quasinorm proximal operator for some iterations, a method tailored for nonconvex programming. Subsequently, the algorithm transitions to a projected Nesterov's accelerated gradient descent with $2$-norm regularization applied to perturbation magnitudes. We rigorously evaluate the efficacy of our novel attack in both targeted and non-targeted attack scenarios, on CIFAR-10 and ImageNet datasets. When compared to state-of-the-art methods, our attack consistently results in a remarkable increase in group-wise sparsity, e.g., an increase of $48.12\%$ on CIFAR-10 and $40.78\%$ on ImageNet (average case, targeted attack), all while maintaining lower perturbation magnitudes. Notably, this performance is complemented by a significantly faster computation time and a $100\%$ attack success rate.
Wireless Network Digital Twin for 6G: Generative AI as A Key Enabler
Abstract
Digital twin technology, which enables emulation, evaluation, and optimization of physical entities through synchronized digital replicas, has gained increasing attention as a promising technology for intricate wireless networks. For 6G, numerous innovative wireless technologies and network architectures pose new challenges in establishing wireless network digital twins. To tackle these challenges, artificial intelligence (AI), particularly the flourishing field of generative AI, emerges as a potential solution. In this article, we discuss emerging prerequisites for wireless network digital twins, considering the complicated network architecture, tremendous network scale, extensive coverage, and diversified application scenarios of the 6G era. We further explore applications of generative AI, such as transformers and diffusion models, to empower the 6G digital twin from multiple perspectives, including implementation, physical-digital synchronization, and slicing capability. Subsequently, we propose a hierarchical generative AI-enabled wireless network digital twin at both the message level and the policy level, and provide a typical use case with numerical results to validate its effectiveness and efficiency. Finally, open research issues for wireless network digital twins in the 6G era are discussed.
Towards Higher Ranks via Adversarial Weight Pruning
Abstract
Convolutional Neural Networks (CNNs) are hard to deploy on edge devices due to their high computational and storage complexity. As a common practice for model compression, network pruning consists of two major categories: unstructured and structured pruning, where unstructured pruning consistently performs better. However, unstructured pruning presents a structured pattern at high pruning rates, which limits its performance. To this end, we propose a Rank-based PruninG (RPG) method to maintain the ranks of sparse weights in an adversarial manner. In each step, we minimize the low-rank approximation error for the weight matrices using singular value decomposition, and maximize their distance by pushing the weight matrices away from their low-rank approximations. This rank-based optimization objective guides sparse weights towards a high-rank topology. The proposed method is conducted in a gradual pruning fashion to stabilize the change of rank during training. Experimental results on various datasets and different tasks demonstrate the effectiveness of our algorithm at high sparsity. The proposed RPG outperforms the state of the art by 1.13% top-1 accuracy on ImageNet with ResNet-50 at 98% sparsity. The code is available at https://github.com/huawei-noah/Efficient-Computing/tree/master/Pruning/RPG and https://gitee.com/mindspore/models/tree/master/research/cv/RPG.
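The adversarial objective pairs a low-rank approximation with the distance from it. A minimal NumPy sketch of that distance, assuming the best rank-$k$ approximation is taken by truncated SVD and measured in the Frobenius norm (the rank $k$ and the function names are illustrative, not RPG's code):

```python
import numpy as np


def low_rank_approx(w: np.ndarray, k: int) -> np.ndarray:
    """Best rank-k approximation of w in Frobenius norm (truncated SVD)."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    return (u[:, :k] * s[:k]) @ vt[:k, :]


def rank_distance(w: np.ndarray, k: int) -> float:
    """Frobenius distance between w and its best rank-k approximation.

    A pruning regularizer in the spirit of RPG would *maximize* this quantity
    to keep sparse weights away from low-rank matrices.
    """
    return float(np.linalg.norm(w - low_rank_approx(w, k), ord="fro"))


def tail_singular_energy(w: np.ndarray, k: int) -> float:
    """Eckart-Young: the same distance equals the l2 norm of the discarded singular values."""
    s = np.linalg.svd(w, compute_uv=False)
    return float(np.sqrt(np.sum(s[k:] ** 2)))
```

The second function shows why the distance is cheap to track: it depends only on the singular values beyond rank $k$, so pushing it up means spreading energy into the tail of the spectrum.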
Model Performance Prediction for Hyperparameter Optimization of Deep Learning Models Using High Performance Computing and Quantum Annealing
Authors: Juan Pablo García Amboage, Eric Wulff, Maria Girone, Tomás F. Pena
Subjects: Machine Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an)
Abstract
Hyperparameter Optimization (HPO) of Deep Learning-based models tends to be a compute-resource-intensive process, as it usually requires training the target model with many different hyperparameter configurations. We show that integrating model performance prediction with early stopping methods holds great potential to speed up the HPO process of deep learning models. Moreover, we propose a novel algorithm called Swift-Hyperband that can use either classical or quantum support vector regression for performance prediction and benefit from distributed High Performance Computing environments. This algorithm is tested not only on the Machine-Learned Particle Flow model used in High Energy Physics, but also on a wider range of target models from domains such as computer vision and natural language processing. Swift-Hyperband is shown to find comparable (or better) hyperparameters while using fewer computational resources in all test cases.
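Swift-Hyperband, as its name indicates, builds on Hyperband. The sketch below shows only the standard Hyperband bracket schedule and one successive-halving round, not the paper's algorithm. The comment marks where a performance predictor (classical or quantum SVR, per the abstract) could stand in for full evaluations. Function names are hypothetical.

```python
def hyperband_brackets(max_resource: int, eta: int = 3):
    """Return (s, n_configs, resource_per_config) for each Hyperband bracket.

    max_resource: maximum budget R per configuration (e.g. epochs).
    eta: downsampling rate; each successive-halving round keeps ~1/eta of configs.
    """
    # s_max = floor(log_eta(R)), computed with integers to avoid float rounding.
    s_max = 0
    while eta ** (s_max + 1) <= max_resource:
        s_max += 1
    brackets = []
    for s in range(s_max, -1, -1):
        # n = ceil((s_max + 1) / (s + 1) * eta^s), using integer ceiling division.
        n = ((s_max + 1) * eta ** s + s) // (s + 1)
        r = max_resource / eta ** s
        brackets.append((s, n, r))
    return brackets


def successive_halving(configs, r, eta, evaluate):
    """Run one bracket: score all configs at budget r, keep the best 1/eta, repeat.

    `evaluate(config, budget)` returns a loss (lower is better). A performance
    predictor could replace some of these calls to stop poor configs earlier.
    """
    while len(configs) > 1:
        scored = sorted(((evaluate(c, r), c) for c in configs), key=lambda t: t[0])
        configs = [c for _, c in scored[: max(1, len(configs) // eta)]]
        r *= eta
    return configs[0]
```

For $R = 81$ and $\eta = 3$ this yields the familiar five brackets, from 81 configurations at budget 1 down to 5 configurations at budget 81.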
The Effects of Overparameterization on Sharpness-aware Minimization: An Empirical and Theoretical Analysis
Authors: Sungbin Shin, Dongyeop Lee, Maksym Andriushchenko, Namhoon Lee
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Abstract
Training an overparameterized neural network can yield minimizers of the same level of training loss and yet different generalization capabilities. With evidence that indicates a correlation between sharpness of minima and their generalization errors, increasing efforts have been made to develop an optimization method to explicitly find flat minima as more generalizable solutions. This sharpness-aware minimization (SAM) strategy, however, has not been studied much yet as to how overparameterization can actually affect its behavior. In this work, we analyze SAM under varying degrees of overparameterization and present both empirical and theoretical results that suggest a critical influence of overparameterization on SAM. Specifically, we first use standard techniques in optimization to prove that SAM can achieve a linear convergence rate under overparameterization in a stochastic setting. We also show that the linearly stable minima found by SAM are indeed flatter and have more uniformly distributed Hessian moments compared to those of SGD. These results are corroborated with our experiments that reveal a consistent trend that the generalization improvement made by SAM continues to increase as the model becomes more overparameterized. We further present that sparsity can open up an avenue for effective overparameterization in practice.
Abstract
We study the problems of distributed online and bandit convex optimization against an adaptive adversary. We aim to minimize the average regret on $M$ machines working in parallel over $T$ rounds with $R$ intermittent communications. Assuming the underlying cost functions are convex and can be generated adaptively, our results show that collaboration is not beneficial when the machines have access to the first-order gradient information at the queried points. This is in contrast to the case for stochastic functions, where each machine samples the cost functions from a fixed distribution. Furthermore, we delve into the more challenging setting of federated online optimization with bandit (zeroth-order) feedback, where the machines can only access values of the cost functions at the queried points. The key finding here is identifying the high-dimensional regime where collaboration is beneficial and may even lead to a linear speedup in the number of machines. We further illustrate our findings through federated adversarial linear bandits by developing novel distributed single and two-point feedback algorithms. Our work is the first attempt towards a systematic understanding of federated online optimization with limited feedback, and it attains tight regret bounds in the intermittent communication setting for both first and zeroth-order feedback. Our results thus bridge the gap between stochastic and adaptive settings in federated online optimization.
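Two-point (zeroth-order) feedback means each machine sees only function values at two queried points. A common way to turn them into a gradient estimate is the sphere-sampling estimator $\frac{d}{2\delta}\,(f(x+\delta u) - f(x-\delta u))\,u$. The sketch below assumes that estimator, which may differ from the paper's algorithms, and averages it across $M$ machines. Names are hypothetical.

```python
import math
import random


def random_unit_vector(d, rng):
    """Sample u uniformly from the unit sphere in R^d (normalized Gaussian)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]


def two_point_gradient(f, x, delta, rng):
    """Two-point bandit estimate (d / 2delta) * (f(x + delta u) - f(x - delta u)) * u."""
    d = len(x)
    u = random_unit_vector(d, rng)
    plus = [xi + delta * ui for xi, ui in zip(x, u)]
    minus = [xi - delta * ui for xi, ui in zip(x, u)]
    scale = d * (f(plus) - f(minus)) / (2 * delta)
    return [scale * ui for ui in u]


def averaged_estimate(f, x, delta, machines, seed=0):
    """Average independent two-point estimates, e.g. one per machine."""
    rng = random.Random(seed)
    ests = [two_point_gradient(f, x, delta, rng) for _ in range(machines)]
    return [sum(col) / machines for col in zip(*ests)]
```

Because $\mathbb{E}[d\,uu^\top] = I$ for $u$ uniform on the sphere, the estimate is unbiased for linear costs. Averaging over more machines shrinks its variance, which is one intuition for when collaboration can help under bandit feedback.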
A Unified Framework for Multi-Hop Wireless Relaying with Hardware Impairments
Abstract
Relaying increases the coverage area and reliability of wireless communication systems by mitigating the fading effect on the received signal. Most technical contributions in the context of these systems assume ideal hardware (ID), neglecting the non-idealities of the transceivers, which include phase noise, in-phase/quadrature mismatch, and high-power amplifier nonlinearities. These non-idealities distort the received signal by causing phase variations and attenuating the amplitude. The resulting performance deterioration of wireless communication systems is further magnified as the transmission frequency increases. In this paper, we investigate the aggregate impact of hardware impairments (HI) on the general multi-hop relay system using amplify-and-forward (AF) and decode-and-forward (DF) relaying techniques over a general H-fading model. The H-fading model includes free-space optics, radio frequency, millimeter wave, Terahertz, and underwater fading models. Closed-form expressions for the outage probability, bit error probability, and ergodic capacity are derived in terms of H-functions. Following an asymptotic analysis at high signal-to-noise ratio (SNR), practical optimization problems are formulated with the objective of finding the optimal level of HI subject to a limit on the total HI level. The analytical solution is derived for the Nakagami-m fading channel, which is a special case of H-fading, for both AF and DF relaying techniques. In contrast to the ID case, the overall instantaneous signal-to-noise-plus-distortion ratio is shown to reach a ceiling at high SNR that is inversely proportional to the HI level of the transceivers across all hops.
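The SNDR ceiling can be seen with a small worked example. The sketch assumes the aggregate-impairment hop model $\gamma/(\kappa^2\gamma + 1)$ that is common in the HI literature, with $\kappa$ the aggregate impairment level of a hop. It covers DF only, where the weakest hop dominates. The AF ceiling has a different form and is not reproduced here. Names are hypothetical.

```python
def hop_sndr(snr: float, kappa: float) -> float:
    """Per-hop SNDR under the aggregate-impairment model gamma / (kappa^2 * gamma + 1)."""
    return snr / (kappa ** 2 * snr + 1.0)


def df_end_to_end_sndr(snrs, kappas) -> float:
    """Decode-and-forward: the weakest hop limits the end-to-end SNDR."""
    return min(hop_sndr(g, k) for g, k in zip(snrs, kappas))


def df_sndr_ceiling(kappas) -> float:
    """High-SNR limit: each hop saturates at 1 / kappa^2, so DF saturates at 1 / max(kappa^2)."""
    return 1.0 / max(k ** 2 for k in kappas)
```

With ideal hardware ($\kappa = 0$) the hop SNDR equals the SNR and grows without bound. With any $\kappa > 0$, raising the transmit power cannot push the end-to-end SNDR past $1/\max_i \kappa_i^2$.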
Efficient Decoder for End-to-End Oriented Object Detection in Remote Sensing Images
Authors: Jiaqi Zhao, Zeyu Ding, Yong Zhou, Hancheng Zhu, Wenliang Du, Rui Yao, Abdulmotaleb El Saddik
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Object instances in remote sensing images often appear in multiple orientations, at varying scales, and in dense arrangements. These issues pose challenges to end-to-end oriented object detectors, including multi-scale feature alignment and a large number of queries. To address these limitations, we propose an end-to-end oriented detector equipped with an efficient decoder, which incorporates two techniques, Rotated RoI attention (RRoI attention) and Selective Distinct Queries (SDQ). Specifically, RRoI attention effectively focuses on oriented regions of interest through a cross-attention mechanism and aligns multi-scale features. SDQ collects queries from intermediate decoder layers and then filters similar queries to obtain distinct queries. The proposed SDQ can facilitate the optimization of one-to-one label assignment, without introducing redundant initial queries or extra auxiliary branches. Extensive experiments on five datasets demonstrate the effectiveness of our method. Notably, our method achieves state-of-the-art performance on DIOR-R (67.31% mAP), DOTA-v1.5 (67.43% mAP), and DOTA-v2.0 (53.28% mAP) with the ResNet50 backbone.
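The abstract does not state SDQ's similarity criterion. As an illustration only, the sketch assumes cosine similarity between query embeddings and a greedy, score-ordered filter much like non-maximum suppression. The threshold and function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def select_distinct_queries(queries, scores, sim_threshold=0.9):
    """Greedy filter over queries pooled from several decoder layers.

    Visit queries by descending score and drop any query that is too similar
    to one already kept. Returns the indices of the kept queries.
    """
    order = sorted(range(len(queries)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(cosine(queries[i], queries[j]) < sim_threshold for j in kept):
            kept.append(i)
    return kept
```

Keeping only mutually dissimilar queries is one way to avoid near-duplicate predictions competing under one-to-one label assignment.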
Optimization in Mobile Augmented Reality Systems for the Metaverse over Wireless Communications
Authors: Tianming Lan, Jun Zhao
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Abstract
As the essential technical support for the Metaverse, Mobile Augmented Reality (MAR) has attracted the attention of many researchers. MAR applications rely on real-time processing of visual and audio data, and these heavy workloads can quickly drain the battery of a mobile device. To address this problem, edge-based solutions have appeared for handling tasks that require more computing power. However, such strategies introduce a new trade-off: reducing the network latency and overall energy consumption requires limiting the size of the data sent to the edge server, which, in turn, results in lower accuracy. In this paper, we design an edge-based MAR system, propose a mathematical model to describe it, and analyze the trade-off between latency, accuracy, server resource allocation, and energy consumption. Furthermore, an algorithm named LEAO is proposed to solve this problem. We evaluate the performance of LEAO and other related algorithms across various simulation scenarios. The results demonstrate the superiority of the LEAO algorithm. Finally, our work provides insight into the optimization problem in edge-based MAR systems for the Metaverse.
Fair Text-to-Image Diffusion via Fair Mapping
Authors: Jia Li, Lijie Hu, Jingfeng Zhang, Tianhang Zheng, Hua Zhang, Di Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
Abstract
In this paper, we address the limitations of existing text-to-image diffusion models in generating demographically fair results when given human-related descriptions. These models often struggle to disentangle the target language context from sociocultural biases, resulting in biased image generation. To overcome this challenge, we propose Fair Mapping, a general, model-agnostic, and lightweight approach that modifies a pre-trained text-to-image model by controlling the prompt to achieve fair image generation. One key advantage of our approach is its high efficiency. The training process only requires updating a small number of parameters in an additional linear mapping network. This not only reduces the computational cost but also accelerates the optimization process. We first demonstrate the issue of bias in generated results caused by language biases in text-guided diffusion models. By developing a mapping network that projects language embeddings into an unbiased space, we enable the generation of relatively balanced demographic results based on a keyword specified in the prompt. With comprehensive experiments on face image generation, we show that our method significantly improves image generation performance when prompted with descriptions related to human faces. By effectively addressing the issue of bias, we produce fairer and more diverse image outputs. This work contributes to the field of text-to-image generation by enhancing the ability to generate images that accurately reflect the intended demographic characteristics specified in the text.
GenZI: Zero-Shot 3D Human-Scene Interaction Generation
Authors: Lei Li, Angela Dai
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Abstract
Can we synthesize 3D humans interacting with scenes without learning from any 3D human-scene interaction data? We propose GenZI, the first zero-shot approach to generating 3D human-scene interactions. Key to GenZI is our distillation of interaction priors from large vision-language models (VLMs), which have learned a rich semantic space of 2D human-scene compositions. Given a natural language description and a coarse point location of the desired interaction in a 3D scene, we first leverage VLMs to imagine plausible 2D human interactions inpainted into multiple rendered views of the scene. We then formulate a robust iterative optimization to synthesize the pose and shape of a 3D human model in the scene, guided by consistency with the 2D interaction hypotheses. In contrast to existing learning-based approaches, GenZI circumvents the conventional need for captured 3D interaction data, and allows for flexible control of the 3D interaction synthesis with easy-to-use text prompts. Extensive experiments show that our zero-shot approach has high flexibility and generality, making it applicable to diverse scene types, including both indoor and outdoor environments.
Robust Localization and Tracking of UAVs in OTFS-based Networks
Abstract
We consider the problem of accurately localizing N unmanned aerial vehicles (UAVs) in 3D space, where the UAVs are part of a swarm and communicate with each other through orthogonal time-frequency space (OTFS) modulated signals. Each receiving UAV estimates the multipath wireless channel on each link formed by the line-of-sight (LoS) transmission and by the single reflections from the remaining N-2 UAVs. The estimated power delay profiles are communicated to an edge server, which is in charge of computing the exact location and speed of the UAVs. To obtain the UAVs' locations and velocities, we propose an iterative algorithm, named Turbo Iterative Positioning (TIP), which, using a belief-propagation approach, effectively exploits the time difference of arrival (TDoA) measurements between the LoS and the non-LoS paths. Enabling a full cold start (no prior knowledge), our solution first maps each element of the TDoA profile to the ID of the reflecting UAV. The Doppler shifts measured by the OTFS receivers associated with each path are also used to estimate the UAVs' velocities. The localization of the N UAVs is then derived via gradient descent optimization, with the aid of turbo-like iterations that can progressively correct some of the residual errors in the initial ID mapping operation. Our numerical results, obtained also using real-world traces, show how the multipath links are beneficial to achieving very accurate localization and speed estimation for all UAVs, even with a limited delay-Doppler resolution. The robustness of our scheme is demonstrated by its performance approaching the Cramér-Rao bound.
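The TDoA measurements that TIP exploits follow from simple geometry: a single-bounce path via a reflecting UAV arrives later than the LoS path by the extra distance divided by the speed of light. The sketch below is only that forward model, not TIP's inference. Function names are hypothetical.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def tdoa_los_vs_reflection(tx, rx, reflector):
    """Excess delay (s) of the single-bounce path tx -> reflector -> rx relative to LoS."""
    los = math.dist(tx, rx)
    nlos = math.dist(tx, reflector) + math.dist(reflector, rx)
    return (nlos - los) / C


def tdoa_profile(positions, tx_id, rx_id):
    """TDoA of every single reflection on link tx_id -> rx_id, keyed by reflector ID.

    With N UAVs, each link sees N - 2 reflectors (all UAVs except the two endpoints).
    """
    return {
        k: tdoa_los_vs_reflection(positions[tx_id], positions[rx_id], p)
        for k, p in enumerate(positions)
        if k not in (tx_id, rx_id)
    }
```

The receiver observes these delays without the reflector IDs attached, which is why the algorithm must first map each profile element to a reflecting UAV before solving for positions.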
Variational Bayes image restoration with compressive autoencoders
Authors: Maud Biquard, Marie Chabert, Thomas Oberlin
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Abstract
Regularization of inverse problems is of paramount importance in computational imaging. The ability of neural networks to learn efficient image representations has recently been exploited to design powerful data-driven regularizers. While state-of-the-art plug-and-play methods rely on an implicit regularization provided by neural denoisers, alternative Bayesian approaches consider Maximum A Posteriori (MAP) estimation in the latent space of a generative model, thus with an explicit regularization. However, state-of-the-art deep generative models require a huge amount of training data compared to denoisers. Moreover, their complexity hampers the optimization of the latent MAP. In this work, we propose to use compressive autoencoders for latent estimation. These networks, which can be seen as variational autoencoders with a flexible latent prior, are smaller and easier to train than state-of-the-art generative models. We then introduce the Variational Bayes Latent Estimation (VBLE) algorithm, which performs this estimation within the framework of variational inference. This allows for fast and easy (approximate) posterior sampling. Experimental results on the image datasets BSD and FFHQ demonstrate that VBLE reaches performance similar to that of state-of-the-art plug-and-play methods, while being able to quantify uncertainties faster than other existing posterior sampling techniques.
Robust Scheduling in Cloud Environment Based on Heuristic Optimization Algorithm
Abstract
When analyzing performance in cloud computing, unpredictable perturbations that may degrade performance are essential factors that should not be neglected. To avoid performance degradation in a cloud computing system, it is reasonable to measure the impact of the perturbations and then propose a robust scheduling strategy that keeps the performance of the system at an acceptable level. In this paper, we first describe the supply-demand relationship of service between cloud service providers and customers, whose main concerns are profit and waiting time, respectively. Then, after introducing the lowest acceptable profit for cloud service providers and the longest acceptable waiting time for customers, we define a robustness metric stating that the number and speed of servers should be configured within a feasible region, such that the performance of the cloud computing system stays at an acceptable level when subject to perturbations. Subsequently, we discuss the robustness metric in several cases and propose a heuristic optimization algorithm to enhance the robustness of the system as much as possible. Finally, the performance of the proposed algorithm is validated by comparison with the differential evolution (DE) and particle swarm optimization (PSO) algorithms, and the results show the superiority of the proposed algorithm.
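The abstract does not specify the queueing model behind the waiting-time objective. The sketch assumes each provider behaves as an M/M/m queue, a common choice in cloud performance analysis but an assumption here. Under that model, the customer-side half of the feasible region over (number of servers, server speed) can be checked with the Erlang C formula. The profit constraint is omitted, and the function names are hypothetical.

```python
import math


def erlang_c_wait(m: int, lam: float, mu: float) -> float:
    """Mean queueing delay W_q of an M/M/m queue (arrival rate lam, per-server rate mu)."""
    rho = lam / (m * mu)
    if rho >= 1:
        return math.inf  # unstable: the queue grows without bound
    a = lam / mu  # offered load, equal to m * rho
    tail = a ** m / math.factorial(m) / (1 - rho)
    p_wait = tail / (sum(a ** k / math.factorial(k) for k in range(m)) + tail)
    return p_wait / (m * mu - lam)


def feasible_configs(lam, max_wait, server_counts, speeds):
    """(m, mu) pairs whose mean waiting time stays within the customer's limit."""
    return [
        (m, mu)
        for m in server_counts
        for mu in speeds
        if erlang_c_wait(m, lam, mu) <= max_wait
    ]
```

A perturbation in the arrival rate `lam` shrinks or grows this region, which is the intuition behind measuring how far a configuration sits from the region's boundary.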
Robustness Approaches for the Examination Timetabling Problem under Data Uncertainty
Abstract
In the literature, the examination timetabling problem (ETTP) is often considered a post-enrollment problem (PE-ETTP). In the real world, universities often schedule their exams before students register, using information from previous terms. A direct consequence of this approach is the uncertainty present in the resulting models. In this work, we discuss several approaches available in the robust optimization literature. We consider the implications of each approach with respect to the examination timetabling problem and present how the most favorable approaches can be applied to the ETTP. Afterwards, we analyze the impact of some possible implementations of the given robustness approaches on two real-world instances and several random instances generated by the instance generation framework that we introduce in this work.
A Simple and General Operational Framework to Deploy Optimal Routes with Source Routing
Abstract
Source Routing, currently facilitated by Segment Routing (SR), enables precise control of forwarding paths by specifying detours (or segments) to deviate IP packets along routes with advanced properties beyond typical shortest IGP paths. Computing the desired optimal segment lists, known as encoding, raises interesting challenges, as the number of detours is tightly constrained for hardware performance. Existing solutions lack generality, correctness, optimality, or practical computing efficiency, in particular for sparse realistic networks. In this paper, we address all such challenges with GOFOR-SR. Our framework extends usual path computation algorithms to inherently look at optimal and feasible segment lists, streamlining the deployment of TE-compliant paths. By integrating encoding within the path computation itself and modifying the distance comparison method, GOFOR allows algorithms with various optimization objectives to efficiently compute optimal segment lists. Despite the loss of substructure optimality induced by SR, GOFOR proves particularly efficient, inducing only a linear overhead at worst. It also offers different strategies and path diversity options for intricate TE-aware load balancing. We formally prove the correctness and optimality of GOFOR, implement our framework for various practical use cases, and demonstrate its performance and benefits on both real and challenging topologies.
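As a point of reference for the encoding problem, here is a greedy encoder under a simplified model. It is not GOFOR. A node segment steers traffic along the IGP shortest path and is trusted only when that path is unique (no ECMP), while an adjacency segment forces a single link. Link weights are assumed positive and integer-valued so exact cost comparisons are safe. All names are hypothetical.

```python
import heapq
import math


def shortest_paths(graph, src):
    """Dijkstra returning distances and the number of distinct shortest paths from src."""
    dist, count = {src: 0.0}, {src: 1}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v], count[v] = nd, count[u]
                heapq.heappush(heap, (nd, v))
            elif nd == dist[v]:
                count[v] += count[u]  # an equal-cost (ECMP) alternative
    return dist, count


def encode_segments(graph, path):
    """Greedily encode an explicit path as SR segments.

    ("node", x): forward along the unique shortest path to x.
    ("adj", a, b): force the single link a -> b.
    """
    sp = {n: shortest_paths(graph, n) for n in set(path)}
    segments, i = [], 0
    while i < len(path) - 1:
        dist, count = sp[path[i]]
        best, cost = None, 0
        for j in range(i + 1, len(path)):
            cost += graph[path[j - 1]][path[j]]
            if cost == dist[path[j]] and count[path[j]] == 1:
                best = j  # path[i..j] is the unique shortest path
            else:
                break  # sub-paths of unique shortest paths are unique shortest paths
        if best is None:
            segments.append(("adj", path[i], path[i + 1]))
            i += 1
        else:
            segments.append(("node", path[best]))
            i = best
    return segments
```

Because validity is prefix-closed in this model, extending each segment as far as possible is safe. The real difficulty the paper targets, combining encoding with arbitrary TE objectives, is not captured here.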
Identifying Dynamic Regulation with Adversarial Surrogates
Authors: Ron Teichner, Naama Brenner, Ron Meir
Abstract
Homeostasis, the ability to maintain a stable internal environment in the face of perturbations, is essential for the functioning of living systems. Given observations of a system, or even a detailed model of one, it is both valuable and extremely challenging to extract the control objectives of the homeostatic mechanisms. Lacking a clear separation between plant and controller, frameworks such as inverse optimal control and inverse reinforcement learning are unable to identify the homeostatic mechanisms. A recently developed data-driven algorithm, Identifying Regulation with Adversarial Surrogates (IRAS), detects highly regulated or conserved quantities as the solution of a min-max optimization scheme that automates classical surrogate data methods. Yet, the definition of homeostasis as regulation within narrow limits is too strict for biological systems which show sustained oscillations such as circadian rhythms. In this work, we introduce Identifying Dynamic Regulation with Adversarial Surrogates (IDRAS), a generalization of the IRAS algorithm, capable of identifying control objectives that are regulated with respect to a dynamical reference value. We test the algorithm on simulation data from realistic biological models and benchmark physical systems, demonstrating excellent empirical results.
SPiC-E : Structural Priors in 3D Diffusion Models using Cross Entity Attention
Authors: Etai Sella, Gal Fiebelman, Noam Atia, Hadar Averbuch-Elor
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Abstract
We are witnessing rapid progress in automatically generating and manipulating 3D assets due to the availability of pretrained text-image diffusion models. However, time-consuming optimization procedures are required for synthesizing each sample, hindering their potential for democratizing 3D content creation. Conversely, 3D diffusion models now train on million-scale 3D datasets, yielding high-quality text-conditional 3D samples within seconds. In this work, we present SPiC-E, a neural network that adds structural guidance to 3D diffusion models, extending their usage beyond text-conditional generation. At its core, our framework introduces a cross-entity attention mechanism that allows multiple entities (in particular, paired input and guidance 3D shapes) to interact via their internal representations within the denoising network. We utilize this mechanism for learning task-specific structural priors in 3D diffusion models from auxiliary guidance shapes. We show that our approach supports a variety of applications, including 3D stylization, semantic shape editing, and text-conditional abstraction-to-3D, which transforms primitive-based abstractions into highly expressive shapes. Extensive experiments demonstrate that SPiC-E achieves SOTA performance on these tasks while often being considerably faster than alternative methods. Importantly, this is accomplished without tailoring our approach to any specific task.
A quasi-polynomial time algorithm for Multi-Dimensional Scaling via LP hierarchies
Authors: Ainesh Bakshi, Vincent Cohen-Addad, Samuel B. Hopkins, Rajesh Jayaram, Silvio Lattanzi
Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Machine Learning (stat.ML)
Abstract
Multi-dimensional Scaling (MDS) is a family of methods for embedding pair-wise dissimilarities between $n$ objects into low-dimensional space. MDS is widely used as a data visualization tool in the social and biological sciences, statistics, and machine learning. We study the Kamada-Kawai formulation of MDS: given a set of non-negative dissimilarities $\{d_{i,j}\}_{i,j \in [n]}$ over $n$ points, the goal is to find an embedding $\{x_1,\dots,x_n\} \subset \mathbb{R}^k$ that minimizes
$$\text{OPT} = \min_{x} \mathbb{E}_{i,j \in [n]} \left[ \left(1-\frac{\|x_i - x_j\|}{d_{i,j}}\right)^2 \right].$$
Despite its popularity, our theoretical understanding of MDS is extremely limited. Recently, Demaine, Hesterberg, Koehler, Lynch, and Urschel (arXiv:2109.11505) gave the first approximation algorithm with provable guarantees for Kamada-Kawai, which achieves an embedding with cost $\text{OPT} + \epsilon$ in $n^2 \cdot 2^{\tilde{\mathcal{O}}(k \Delta^4 / \epsilon^2)}$ time, where $\Delta$ is the aspect ratio of the input dissimilarities. In this work, we give the first approximation algorithm for MDS with quasi-polynomial dependency on $\Delta$: for target dimension $k$, we achieve a solution with cost $\mathcal{O}(\text{OPT}^{1/k} \cdot \log(\Delta/\epsilon)) + \epsilon$ in time $n^{\mathcal{O}(1)} \cdot 2^{\tilde{\mathcal{O}}(k^2 (\log(\Delta)/\epsilon)^{k/2 + 1})}$. Our approach is based on a novel analysis of a conditioning-based rounding scheme for the Sherali-Adams LP Hierarchy. Crucially, our analysis exploits the geometry of low-dimensional Euclidean space, allowing us to avoid an exponential dependence on the aspect ratio $\Delta$. We believe our geometry-aware treatment of the Sherali-Adams Hierarchy is an important step towards developing general-purpose techniques for efficient metric optimization algorithms.
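To make the objective concrete, here is a direct evaluation of the Kamada-Kawai cost for a candidate embedding. It assumes the expectation is taken uniformly over distinct pairs $i \neq j$, since the diagonal would typically divide by $d_{i,i} = 0$. This only scores an embedding; it is not the paper's LP-hierarchy algorithm.

```python
import itertools
import math


def kamada_kawai_cost(points, d):
    """Mean of (1 - ||x_i - x_j|| / d_ij)^2 over distinct pairs i < j.

    points: list of coordinate tuples in R^k.
    d: symmetric matrix of positive off-diagonal dissimilarities.
    """
    pairs = list(itertools.combinations(range(len(points)), 2))
    total = sum(
        (1 - math.dist(points[i], points[j]) / d[i][j]) ** 2 for i, j in pairs
    )
    return total / len(pairs)
```

For example, placing three points at 0, 1, and 3 on a line with dissimilarities $|i - j|$ matching those gaps gives cost 0. Doubling every coordinate gives cost 1, since each ratio becomes 2.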
SLO/GO Degradation-Loss Sensitivity in Climate-Human System Coupling
Authors: Sierra Cabrera, Irina Babayan, Hazhir Aliahmadi, Dongmei Chen, Greg van Anders
Subjects: Computational Engineering, Finance, and Science (cs.CE); Physics and Society (physics.soc-ph)
Abstract
The potential of extreme environmental change driven by a destabilized climate system is an alarming prospect for humanity. But the intricate, subtle ways Earth's climate couples to social and economic systems raise the question of when more incremental climate change signals the need for alarm. Questions about incremental sensitivity are particularly crucial for human systems that are organized by optimization. Optimization is most valuable in resolving complex interactions among multiple factors; however, those interactions can obscure coupling to underlying drivers such as environmental degradation. Here, using Multi-Objective Land Allocation as an example, we show that model features that are common across non-convex optimization problems drive hypersensitivities in the climate-induced degradation-loss response. We show that catastrophic losses in human systems can occur well before catastrophic climate collapse. We find a punctuated insensitive/hypersensitive degradation-loss response, which we trace to the contrasting effects of environmental degradation on subleading, local versus global optima (SLO/GO). We argue that the SLO/GO response we identify in land-allocation problems traces to features that are common across non-convex optimization problems more broadly. Given the broad range of human systems that rely on non-convex optimization, our results therefore suggest that substantial social and economic risks could be lurking across a broad range of human systems that are coupled to the environment, even in the absence of catastrophic changes to the environment itself.
Keyword: adam
A quasi-polynomial time algorithm for Multi-Dimensional Scaling via LP hierarchies
Authors: Ainesh Bakshi, Vincent Cohen-Addad, Samuel B. Hopkins, Rajesh Jayaram, Silvio Lattanzi
Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Machine Learning (stat.ML)
Abstract
Multi-dimensional Scaling (MDS) is a family of methods for embedding pair-wise dissimilarities between $n$ objects into low-dimensional space. MDS is widely used as a data visualization tool in the social and biological sciences, statistics, and machine learning. We study the Kamada-Kawai formulation of MDS: given a set of non-negative dissimilarities $\{d_{i,j}\}_{i,j \in [n]}$ over $n$ points, the goal is to find an embedding $\{x_1,\dots,x_n\} \subset \mathbb{R}^k$ that minimizes
$$\text{OPT} = \min_{x} \mathbb{E}_{i,j \in [n]} \left[ \left(1-\frac{\|x_i - x_j\|}{d_{i,j}}\right)^2 \right].$$
Despite its popularity, our theoretical understanding of MDS is extremely limited. Recently, Demaine, Hesterberg, Koehler, Lynch, and Urschel (arXiv:2109.11505) gave the first approximation algorithm with provable guarantees for Kamada-Kawai, which achieves an embedding with cost $\text{OPT} + \epsilon$ in $n^2 \cdot 2^{\tilde{\mathcal{O}}(k \Delta^4 / \epsilon^2)}$ time, where $\Delta$ is the aspect ratio of the input dissimilarities. In this work, we give the first approximation algorithm for MDS with quasi-polynomial dependency on $\Delta$: for target dimension $k$, we achieve a solution with cost $\mathcal{O}(\text{OPT}^{1/k} \cdot \log(\Delta/\epsilon)) + \epsilon$ in time $n^{\mathcal{O}(1)} \cdot 2^{\tilde{\mathcal{O}}(k^2 (\log(\Delta)/\epsilon)^{k/2 + 1})}$. Our approach is based on a novel analysis of a conditioning-based rounding scheme for the Sherali-Adams LP Hierarchy. Crucially, our analysis exploits the geometry of low-dimensional Euclidean space, allowing us to avoid an exponential dependence on the aspect ratio $\Delta$. We believe our geometry-aware treatment of the Sherali-Adams Hierarchy is an important step towards developing general-purpose techniques for efficient metric optimization algorithms.
Keyword: gradient
DreamPropeller: Supercharge Text-to-3D Generation with Parallel Sampling
Authors: Linqi Zhou, Andy Shih, Chenlin Meng, Stefano Ermon
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Abstract
Recent methods such as Score Distillation Sampling (SDS) and Variational Score Distillation (VSD) using 2D diffusion models for text-to-3D generation have demonstrated impressive generation quality. However, the long generation time of such algorithms significantly degrades the user experience. To tackle this problem, we propose DreamPropeller, a drop-in acceleration algorithm that can be wrapped around any existing text-to-3D generation pipeline based on score distillation. Our framework generalizes Picard iterations, a classical algorithm for parallel sampling of an ODE path, and can account for non-ODE paths such as momentum-based gradient updates and changes in dimensions during the optimization process, as in many cases of 3D generation. We show that our algorithm trades parallel compute for wall-clock time and empirically achieves up to 4.7x speedup with a negligible drop in generation quality for all tested frameworks.
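Picard iteration, the classical starting point DreamPropeller generalizes, can be illustrated on a scalar ODE $\dot{x} = f(x)$ discretized with Euler steps. Each sweep evaluates $f$ at every time step of the previous guess; these calls are independent and could run in parallel. The sweep then rebuilds the trajectory as $x_{t+1} = x_0 + h \sum_{s \le t} f(x_s)$. This is a minimal sketch, not the paper's implementation, and the function names are hypothetical.

```python
def euler_sequential(f, x0, h, steps):
    """Reference: plain sequential Euler integration."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] + h * f(xs[-1]))
    return xs


def euler_picard(f, x0, h, steps, max_iters=None, tol=0.0):
    """Picard iteration for the same Euler discretization.

    After k sweeps the first k steps are exact, so at most `steps` sweeps
    reproduce the sequential result. Returns (trajectory, sweeps_used).
    """
    xs = [x0] * (steps + 1)  # initial guess: constant trajectory
    sweeps = 0
    for _ in range(max_iters if max_iters is not None else steps):
        drifts = [f(x) for x in xs[:-1]]  # independent, parallelizable evaluations
        new, acc = [x0], x0
        for dx in drifts:  # cumulative sum: x_{t+1} = x_0 + h * sum_{s<=t} f(x_s)
            acc = acc + h * dx
            new.append(acc)
        change = max(abs(a - b) for a, b in zip(new, xs))
        xs = new
        sweeps += 1
        if change <= tol:
            break
    return xs, sweeps
```

The wall-clock gain appears when a tolerance lets the loop stop after far fewer sweeps than steps. Each sweep costs one round of parallel evaluations instead of `steps` sequential ones.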
Rethinking Mixup for Improving the Adversarial Transferability
Authors: Xiaosen Wang, Zeyuan Yin
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Mixup augmentation has been widely integrated into the generation of adversarial examples with superior adversarial transferability when transferring from a surrogate model to other models. However, the underlying mechanism by which mixup affects transferability remains unexplored. In this work, we posit that adversarial examples located at the convergence of decision boundaries across various categories exhibit better transferability, and we identify that Admix tends to steer adversarial examples towards such regions. However, we find that the constraint on the added image in Admix weakens its capability, resulting in limited transferability. To address this issue, we propose a new input transformation-based attack called Mixing the Image but Separating the gradienT (MIST). Specifically, MIST randomly mixes the input image with a randomly shifted image and separates the gradient of each loss term for each mixed image. To counteract imprecise gradients, MIST calculates the gradient on several mixed images for each input sample. Extensive experimental results on the ImageNet dataset demonstrate that MIST outperforms existing SOTA input transformation-based attacks by a clear margin on both Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs), with and without defense mechanisms, supporting MIST's high effectiveness and generality.
Federated Fine-Tuning of Foundation Models via Probabilistic Masking
Abstract
Foundation Models (FMs) have revolutionized machine learning with their adaptability and high performance across tasks; yet, their integration into Federated Learning (FL) is challenging due to the substantial communication overhead caused by their extensive parameterization. Current communication-efficient FL strategies, such as gradient compression, reduce bitrates to around $1$ bit-per-parameter (bpp). However, these approaches fail to exploit the characteristics of FMs, whose large number of parameters still challenges communication efficiency even at these bitrates. In this work, we present DeltaMask, a novel method that efficiently fine-tunes FMs in FL at an ultra-low bitrate, well below 1 bpp. Departing from traditional weight-training approaches, DeltaMask employs stochastic masking to detect highly effective subnetworks within FMs and leverages the stochasticity and sparsity of client masks to compress updates into a compact grayscale image using probabilistic filters. Our comprehensive evaluations on 8 datasets and 5 pre-trained models of various network architectures demonstrate that DeltaMask efficiently achieves bitrates as low as 0.09 bpp, enhancing communication efficiency while maintaining FM performance.
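The abstract does not spell out the masking rule. A minimal sketch, assuming each client keeps weight i with probability sigmoid(s_i) for a learned score s_i and communicates only the positions where its sampled mask changed since the previous round. The helper names and the naive 32-bit index cost are hypothetical, and DeltaMask's probabilistic-filter and grayscale-image encoding, which is what pushes the rate well below 1 bpp, is not reproduced here.

```python
import numpy as np

def sample_mask(scores, rng):
    """Sample a binary mask: weight i is kept with probability sigmoid(scores[i])."""
    probs = 1.0 / (1.0 + np.exp(-scores))
    return (rng.random(scores.shape) < probs).astype(np.uint8)

def mask_delta(prev_mask, new_mask):
    """Flat indices whose keep/drop decision changed since the previous round."""
    return np.flatnonzero(prev_mask != new_mask)

def naive_bpp(delta_indices, num_params):
    """Bits per parameter if every changed index were sent as a raw 32-bit integer."""
    return 32.0 * len(delta_indices) / num_params
```

Because only a small fraction of mask entries tends to flip between rounds, even this naive index list can cost far less than one bit per parameter; a dedicated set encoding compresses it further.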
Enhancing the Performance of Neural Networks Through Causal Discovery and Integration of Domain Knowledge
Abstract
In this paper, we develop a generic methodology to encode the hierarchical causal structure among observed variables into a neural network in order to improve its predictive performance. The proposed methodology, called causality-informed neural network (CINN), uses three coherent steps to systematically map structural causal knowledge into the layer-to-layer design of the neural network while strictly preserving the orientation of every causal relationship. In the first step, CINN discovers causal relationships from observational data via directed acyclic graph (DAG) learning, where causal discovery is recast as a continuous optimization problem to avoid its combinatorial nature. In the second step, the discovered hierarchical causal structure among observed variables is systematically encoded into the neural network through a dedicated architecture and a customized loss function. By categorizing variables in the causal DAG as root, intermediate, and leaf nodes, the hierarchical causal DAG is translated into CINN with a one-to-one correspondence between nodes in the causal DAG and units in the CINN, while maintaining the relative order among these nodes. Regarding the loss function, both intermediate and leaf nodes in the DAG are treated as target outputs during CINN training so as to drive the co-learning of causal relationships among different types of nodes. Because CINN has multiple loss components, we project conflicting gradients to mitigate gradient interference among the multiple learning tasks. Computational experiments across a broad spectrum of UCI data sets demonstrate substantial advantages of CINN in predictive performance over other state-of-the-art methods. In addition, an ablation study shows that integrating structural and quantitative causal knowledge incrementally enhances the neural network's predictive performance.
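The abstract does not give the exact projection rule. A minimal sketch, assuming a PCGrad-style (gradient surgery) rule: whenever two task gradients have a negative dot product, the conflicting component is projected out before the task gradients are summed. The function name is hypothetical, and the gradients are plain NumPy vectors rather than CINN's actual loss components.

```python
import numpy as np

def project_conflicting(grads, rng=None):
    """Combine per-task gradients after removing pairwise conflicting components."""
    rng = np.random.default_rng() if rng is None else rng
    grads = [np.asarray(g, dtype=float) for g in grads]
    projected = []
    for i, g in enumerate(grads):
        g_pc = g.copy()
        # Visit the other tasks in random order, as PCGrad-style methods do.
        for j in rng.permutation(len(grads)):
            if j == i:
                continue
            dot = g_pc @ grads[j]
            if dot < 0:
                # Conflict: subtract the projection of g_pc onto the other task's gradient.
                g_pc = g_pc - dot / (grads[j] @ grads[j]) * grads[j]
        projected.append(g_pc)
    return np.sum(projected, axis=0)
```

With gradients `[1, 0]` and `[-1, 1]`, each is projected onto the normal plane of the other, giving `[0.5, 0.5]` and `[0, 1]`; non-conflicting gradients pass through unchanged.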
Microstructure reconstruction of 2D/3D random materials via diffusion-based deep generative models
Authors: Xianrui Lyu, Xiaodan Ren
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Abstract
Microstructure reconstruction serves as a crucial foundation for establishing Process-Structure-Property (PSP) relationships in material design. To overcome the limitations of variational autoencoders and generative adversarial networks in generative modeling, this study adopts the denoising diffusion probabilistic model (DDPM) to learn the probability distribution of high-dimensional raw data and successfully reconstructs the microstructures of various composite materials, such as inclusion materials, spinodal decomposition materials, chessboard materials, and fractal noise materials. The quality of the generated microstructures is evaluated using quantitative measures such as spatial correlation functions and Fourier descriptors. Building on this, the study also regulates microstructure randomness and generates gradient materials through continuous interpolation in latent space using the denoising diffusion implicit model (DDIM). Furthermore, the two-dimensional microstructure reconstruction is extended to a three-dimensional framework that integrates permeability as an encoded feature embedding. This enables the conditional generation of three-dimensional microstructures for random porous materials within a defined permeability range. The permeabilities of these generated microstructures were further validated using the Boltzmann method.
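Because DDIM sampling is deterministic given the initial noise, interpolating between two noise tensors yields a continuous family of generated microstructures. The abstract does not say which interpolation is used; a minimal sketch, assuming spherical linear interpolation (slerp), a common choice for Gaussian latents. The sampler itself is omitted and the function name is illustrative.

```python
import numpy as np

def slerp(z0, z1, alpha):
    """Spherical linear interpolation between two latent noise tensors, alpha in [0, 1]."""
    a, b = z0.ravel(), z1.ravel()
    cos_omega = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))  # angle between the two latents
    if np.isclose(np.sin(omega), 0.0):
        # Nearly (anti)parallel latents: slerp is ill-conditioned, fall back to lerp.
        return (1.0 - alpha) * z0 + alpha * z1
    return (np.sin((1.0 - alpha) * omega) * z0 + np.sin(alpha * omega) * z1) / np.sin(omega)
```

Unlike straight linear interpolation, slerp keeps intermediate latents at a norm typical of Gaussian noise, so each interpolated latent still looks like a plausible starting point for the sampler.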
Efficient Stitchable Task Adaptation
Abstract
The paradigm of pre-training and fine-tuning has laid the foundation for deploying deep learning models. However, most fine-tuning methods are designed to meet a specific resource budget. Recently, to serve diverse deployment scenarios with various resource budgets, the stitchable neural network (SN-Net) was introduced to quickly obtain numerous new networks (stitches) from the pre-trained models (anchors) in a model family via model stitching. Although promising, SN-Net faces new challenges when adapted to new target domains, including huge memory and storage requirements and a long, sub-optimal multistage adaptation process. In this work, we present a novel framework, Efficient Stitchable Task Adaptation (ESTA), to efficiently produce a palette of fine-tuned models that adhere to diverse resource constraints. Specifically, we first tailor parameter-efficient fine-tuning to share low-rank updates among the stitches while maintaining independent bias terms. This largely reduces the memory burden of fine-tuning and mitigates the interference among stitches that arises during task adaptation. Furthermore, we introduce a simple yet effective one-stage deployment pipeline that uses training-time gradient statistics to estimate which stitches are important to deploy. By assigning higher sampling probabilities to important stitches, we also obtain an improved Pareto frontier. Extensive experiments on 25 downstream visual recognition tasks demonstrate that ESTA generates stitches with smooth accuracy-efficiency trade-offs and surpasses direct SN-Net adaptation by remarkable margins, with significantly lower training time and fewer trainable parameters. Finally, we demonstrate the flexibility and scalability of the ESTA framework by stitching LLMs from the LLaMA family, obtaining chatbot stitches of assorted sizes.
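To make the parameter sharing concrete, here is a minimal sketch, assuming a single frozen linear layer whose LoRA-style update $BA$ is shared by every stitch while each stitch keeps its own bias vector. The class name and initialization are hypothetical; stitching itself and the gradient-based importance sampling are omitted.

```python
import numpy as np

class SharedLowRankLinear:
    """Frozen weight W plus a low-rank update B @ A shared by all stitches,
    with one independent trainable bias vector per stitch (illustrative sketch)."""

    def __init__(self, weight, rank, num_stitches, rng):
        d_out, d_in = weight.shape
        self.weight = weight                                # frozen pre-trained weight
        self.A = rng.normal(scale=0.01, size=(rank, d_in))  # shared low-rank factor
        self.B = np.zeros((d_out, rank))                    # shared, zero-init: update starts at 0
        self.biases = np.zeros((num_stitches, d_out))       # one bias per stitch

    def forward(self, x, stitch_id):
        delta_w = self.B @ self.A  # the same low-rank update for every stitch
        return x @ (self.weight + delta_w).T + self.biases[stitch_id]

    def num_trainable(self):
        return self.A.size + self.B.size + self.biases.size
```

With $B$ initialized to zero, every stitch initially reproduces the frozen layer, and each additional stitch adds only one bias vector to the trainable parameter count.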
Group-wise Sparse and Explainable Adversarial Attacks
Authors: Shpresim Sadiku, Moritz Wagner, Sebastian Pokutta
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Optimization and Control (math.OC)
Abstract
Sparse adversarial attacks fool deep neural networks (DNNs) through minimal pixel perturbations, typically regularized by the $\ell_0$ norm. Recent efforts have replaced this norm with a structural sparsity regularizer, such as the nuclear group norm, to craft group-wise sparse adversarial attacks. The resulting perturbations are thus explainable and hold significant practical relevance, revealing an even greater vulnerability of DNNs than previously anticipated. However, crafting such attacks poses an optimization challenge, as it involves computing norms for groups of pixels within a non-convex objective. In this paper, we tackle this challenge by presenting an algorithm that simultaneously generates group-wise sparse attacks within semantically meaningful areas of an image. In each iteration, the core operation of our algorithm is the optimization of a quasinorm adversarial loss. For some iterations, this optimization employs the $1/2$-quasinorm proximal operator, a method tailored to nonconvex programming. Subsequently, the algorithm transitions to a projected Nesterov's accelerated gradient descent with $2$-norm regularization applied to perturbation magnitudes. We rigorously evaluate the efficacy of our novel attack in both targeted and non-targeted attack scenarios on the CIFAR-10 and ImageNet datasets. Compared to state-of-the-art methods, our attack consistently achieves a remarkable increase in group-wise sparsity, e.g., $48.12\%$ on CIFAR-10 and $40.78\%$ on ImageNet (average case, targeted attack), while maintaining lower perturbation magnitudes. Notably, this performance is complemented by significantly faster computation and a $100\%$ attack success rate.
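The $1/2$-quasinorm proximal step has a known closed form, often called the half-thresholding operator from $L_{1/2}$ regularization. A minimal elementwise sketch, assuming the scalar objective $(x - z)^2 + \lambda |x|^{1/2}$ (scaling conventions for $\lambda$ differ between papers, and the group structure of the attack is omitted): entries with $|z| \le \tfrac{\sqrt[3]{54}}{4}\lambda^{2/3}$ are set to zero, and the rest are shrunk by a trigonometric formula. The function name is illustrative.

```python
import numpy as np

def half_threshold(z, lam):
    """Elementwise minimizer of (x - z)**2 + lam * |x|**0.5 (half-thresholding)."""
    z = np.asarray(z, dtype=float)
    out = np.zeros_like(z)
    # Below this threshold the global minimizer is exactly zero.
    thresh = (54.0 ** (1.0 / 3.0) / 4.0) * lam ** (2.0 / 3.0)
    big = np.abs(z) > thresh
    zb = z[big]
    phi = np.arccos((lam / 8.0) * (np.abs(zb) / 3.0) ** (-1.5))
    # Shrink the surviving entries toward zero, keeping their sign.
    out[big] = (2.0 / 3.0) * zb * (1.0 + np.cos(2.0 * np.pi / 3.0 - (2.0 / 3.0) * phi))
    return out
```

Unlike soft thresholding for the $\ell_1$ norm, this operator jumps from zero to a nonzero value at the threshold, which favors exact zeros and therefore sparser perturbations.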
Discrete Empirical Interpolation Method for nonlinear softening problems involving damage and plasticity
Authors: Steffen Kastian, Jannick Kehls, Tim Brepols, Stefanie Reese
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Abstract
Accurate simulations are essential for engineering applications, and intricate continuum mechanical material models are constructed to achieve this goal. However, the increasing complexity of the material models and geometrical properties leads to a significant increase in computational effort. Model order reduction aims to accelerate the simulation process while preserving a high degree of accuracy. Numerous studies have already demonstrated the benefits of this approach for linear elastic material modeling; the present work instead investigates a two-surface gradient-extended damage-plasticity model. We conducted complex simulations with this model that exhibit both damage behavior and softening. The POD-based discrete empirical interpolation method (DEIM) is introduced and implemented. To perform simulations with DEIM and softening behavior, we propose a reduced form of the arc-length method. Existing research on computing models with both damage and softening behavior using DEIM and the arc-length method is limited. To validate the methods, two numerical examples are thoroughly investigated in this study: a plate with a hole and an asymmetrically notched specimen. The results show that the proposed methods can create a reduced order model with high accuracy and a significant simulation speedup. For both examples, the analysis is conducted in three steps: first, plasticity without damage is examined, followed by damage without plasticity, and finally, the combination of plasticity and damage is investigated.
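DEIM approximates a nonlinear term $f$ from its values at only $m$ selected entries, $f \approx U (P^\top U)^{-1} P^\top f$, where $U$ holds $m$ POD modes (left singular vectors of a snapshot matrix) and $P$ selects the interpolation indices. A minimal sketch of the standard greedy index selection, with illustrative function names; the damage-plasticity model and the reduced arc-length method are outside its scope.

```python
import numpy as np

def deim_indices(U):
    """Greedy DEIM point selection for a POD basis U (n x m, columns = modes)."""
    n, m = U.shape
    idx = [int(np.argmax(np.abs(U[:, 0])))]
    for l in range(1, m):
        # Interpolate mode l from the first l modes at the points chosen so far.
        c = np.linalg.solve(U[idx, :l], U[idx, l])
        r = U[:, l] - U[:, :l] @ c          # interpolation residual of mode l
        idx.append(int(np.argmax(np.abs(r))))  # next point: largest residual
    return np.array(idx)

def deim_reconstruct(U, idx, f_at_idx):
    """Approximate the full vector f from its values at the DEIM indices."""
    return U @ np.linalg.solve(U[idx, :], f_at_idx)
```

In a reduced model, the nonlinear internal forces then only need to be evaluated at the selected entries (in practice, at the elements or integration points that contribute to them), which is where the speedup over the full model comes from.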
Reinforcement Replaces Supervision: Query focused Summarization using Deep Reinforcement Learning
Abstract
Query-focused Summarization (QfS) deals with systems that generate summaries from document(s) based on a query. Motivated by the insight that Reinforcement Learning (RL) generalizes Supervised Learning (SL) for Natural Language Generation and thereby performs better than SL empirically, we use an RL-based approach for QfS. We also resolve the conflict between employing RL in Transformers and Teacher Forcing. We develop multiple Policy Gradient networks, trained on various reward signals (ROUGE, BLEU, and Semantic Similarity), which lead to a 10-point improvement over the state-of-the-art approach on the ROUGE-L metric for a benchmark dataset (ELI5). We also evaluate our approach in a zero-shot setting on another benchmark dataset (DebatePedia), where it achieves results comparable to baselines that were specifically trained on DebatePedia. To aid RL training, we propose a better semantic similarity reward, enabled by a novel Passage Embedding scheme developed using the Cluster Hypothesis. Lastly, we contribute a gold-standard test dataset to further research in QfS and Long-form Question Answering (LfQA).
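As an illustration of a sequence-level reward, here is a minimal sketch of ROUGE-L F1 computed from the longest common subsequence of whitespace tokens, together with a REINFORCE-style loss whose gradient scales the sampled summary's log-probability by (reward minus baseline). Tokenization, the baseline choice, and the function names are assumptions; official ROUGE implementations add stemming and other options, and in training the log-probabilities would be autograd tensors rather than floats.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]

def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 on whitespace tokens, usable as a scalar reward."""
    c, r = candidate.split(), reference.split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)

def reinforce_loss(token_logprobs, reward, baseline):
    """Policy-gradient surrogate: minimizing it raises the log-probability of
    sampled summaries whose reward beats the baseline."""
    return -(reward - baseline) * sum(token_logprobs)
```

For instance, the candidate "the cat" against the reference "the cat sat" has precision 1 and recall 2/3, so its reward is 0.8.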
Federated Online and Bandit Convex Optimization
Abstract
We study the problems of distributed online and bandit convex optimization against an adaptive adversary. We aim to minimize the average regret on $M$ machines working in parallel over $T$ rounds with $R$ intermittent communications. Assuming the underlying cost functions are convex and can be generated adaptively, our results show that collaboration is not beneficial when the machines have access to first-order gradient information at the queried points. This contrasts with the case of stochastic functions, where each machine samples the cost functions from a fixed distribution. Furthermore, we delve into the more challenging setting of federated online optimization with bandit (zeroth-order) feedback, where the machines can only access values of the cost functions at the queried points. The key finding here is the identification of a high-dimensional regime where collaboration is beneficial and may even lead to a linear speedup in the number of machines. We further illustrate our findings through federated adversarial linear bandits by developing novel distributed single-point and two-point feedback algorithms. Our work is the first attempt towards a systematic understanding of federated online optimization with limited feedback, and it attains tight regret bounds in the intermittent communication setting for both first-order and zeroth-order feedback. Our results thus bridge the gap between stochastic and adaptive settings in federated online optimization.
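Bandit feedback gives only function values, so gradients must be estimated. A minimal sketch of the standard one-point and two-point spherical estimators, assuming a direction $u$ drawn uniformly from the unit sphere in $d$ dimensions: the two-point version $\frac{d}{2\delta}\,(f(x+\delta u) - f(x-\delta u))\,u$ is unbiased for linear losses and has far lower variance than the one-point version. These are textbook estimators, not necessarily the paper's exact algorithms, and the function names are illustrative.

```python
import numpy as np

def _unit_direction(d, rng):
    """Direction drawn uniformly from the unit sphere in d dimensions."""
    u = rng.normal(size=d)
    return u / np.linalg.norm(u)

def two_point_gradient(f, x, delta, rng):
    """Two-point zeroth-order gradient estimate from two function values."""
    d = x.shape[0]
    u = _unit_direction(d, rng)
    return d / (2.0 * delta) * (f(x + delta * u) - f(x - delta * u)) * u

def one_point_gradient(f, x, delta, rng):
    """Single-point estimate: unbiased for a smoothed f, but with much higher variance."""
    d = x.shape[0]
    u = _unit_direction(d, rng)
    return d / delta * f(x + delta * u) * u
```

The one-point variance grows like $(d/\delta)^2$ times the squared function value, which is why the two-point setting admits much better regret in high dimensions.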
Adversarial Robust Memory-Based Continual Learner
Authors: Xiaoyue Mi, Fan Tang, Zonghan Yang, Danding Wang, Juan Cao, Peng Li, Yang Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Despite the remarkable advances that have been made in continual learning, the adversarial vulnerability of such methods has not been fully discussed. We delve into the adversarial robustness of memory-based continual learning algorithms and observe only limited robustness improvement when adversarial training techniques are applied directly. Preliminary studies reveal twin challenges for building adversarially robust continual learners: accelerated forgetting in continual learning and gradient obfuscation in adversarial robustness. In this study, we put forward a novel adversarially robust memory-based continual learner that adjusts data logits to mitigate the forgetting of past knowledge caused by adversarial samples. Furthermore, we devise a gradient-based data selection mechanism to overcome the gradient obfuscation caused by limited stored data. The proposed approach integrates with a wide range of existing memory-based continual learning and adversarial training algorithms in a plug-and-play manner. Extensive experiments on Split-CIFAR10/100 and Split-Tiny-ImageNet demonstrate the effectiveness of our approach, which achieves up to 8.13% higher accuracy on adversarial data.
Robust Localization and Tracking of UAVs in OTFS-based Networks
Abstract
We consider the problem of accurately localizing N unmanned aerial vehicles (UAVs) in 3D space, where the UAVs are part of a swarm and communicate with each other through orthogonal time-frequency space (OTFS) modulated signals. Each receiving UAV estimates the multipath wireless channel on each link formed by the line-of-sight (LoS) transmission and by the single reflections from the remaining N-2 UAVs. The estimated power delay profiles are communicated to an edge server, which is in charge of computing the exact location and speed of the UAVs. To obtain the UAVs' locations and velocities, we propose an iterative algorithm, named Turbo Iterative Positioning (TIP), which uses a belief-propagation approach to effectively exploit the time difference of arrival (TDoA) measurements between the LoS and the non-LoS paths. Enabling a full cold start (no prior knowledge), our solution first maps each element of the TDoA profile to the ID of the reflecting UAV. The Doppler shifts measured by the OTFS receivers on each path are also used to estimate the UAVs' velocities. The locations of the N UAVs are then derived via gradient descent optimization, aided by turbo-like iterations that progressively correct some of the residual errors in the initial ID mapping. Our numerical results, obtained also using real-world traces, show that the multipath links enable very accurate estimation of the locations and speeds of all UAVs, even with limited delay-Doppler resolution. The robustness of our scheme is demonstrated by its performance approaching the Cramér-Rao bound.
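Each TDoA measurement is the extra propagation delay of a single-bounce path relative to the LoS path, so it confines the reflecting UAV to an ellipsoid whose foci are the transmitter and receiver. A minimal sketch of that geometry, assuming free-space propagation at the speed of light and known transmitter and receiver positions; the function name is illustrative, and TIP's belief-propagation ID mapping and gradient-descent localization are not shown.

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s

def tdoa_single_bounce(tx, rx, reflector):
    """Extra delay (s) of the path tx -> reflector -> rx relative to the LoS path tx -> rx."""
    tx, rx, reflector = (np.asarray(p, dtype=float) for p in (tx, rx, reflector))
    los = np.linalg.norm(rx - tx)
    nlos = np.linalg.norm(reflector - tx) + np.linalg.norm(rx - reflector)
    return (nlos - los) / C  # non-negative by the triangle inequality
```

For example, with the transmitter at the origin, the receiver at (80, 0, 0) m and the reflector at (40, 30, 0) m, the reflected path is 20 m longer than the LoS path, a TDoA of about 67 ns.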
Keyword: super-resolution
Neural Fields with Thermal Activations for Arbitrary-Scale Super-Resolution
Authors: Alexander Becker, Rodrigo Caye Daudt, Nando Metzger, Jan Dirk Wegner, Konrad Schindler
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Recent approaches for arbitrary-scale single image super-resolution (ASSR) have used local neural fields to represent continuous signals that can be sampled at different rates. However, in such a formulation, the point-wise query of field values does not naturally match the point spread function (PSF) of a given pixel. In this work, we present a novel way to design neural fields such that points can be queried with a Gaussian PSF, which serves as anti-aliasing when moving across resolutions for ASSR. We achieve this using a novel activation function derived from Fourier theory and the heat equation. This comes at no additional cost: unlike filtering in the image domain, querying a point with a Gaussian PSF in our framework does not increase the computational cost. Coupled with a hypernetwork, our method not only provides theoretically guaranteed anti-aliasing, but also sets a new bar for ASSR while being more parameter-efficient than previous methods.
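A Fourier fact that such a construction can rest on: averaging $\sin(\omega x + b)$ under a Gaussian PSF with standard deviation $\sigma$ simply scales it by $e^{-\sigma^2\omega^2/2}$ (in several dimensions, $e^{-\sigma^2\lVert\omega\rVert^2/2}$), which is also the heat-equation solution after diffusion time $t = \sigma^2/2$ for unit diffusivity. A minimal sketch, assuming sine units; the paper's exact thermal activation is not reproduced, and the function names are illustrative. The closed form is checked against explicit quadrature.

```python
import numpy as np

def sine_unit(x, omega, phase):
    """Point-wise query of a single sine unit."""
    return np.sin(omega * x + phase)

def sine_unit_gaussian_psf(x, omega, phase, sigma):
    """Closed-form response of the sine unit averaged under a Gaussian PSF of std sigma."""
    return np.exp(-0.5 * (sigma * omega) ** 2) * np.sin(omega * x + phase)

def numerical_psf(x, omega, phase, sigma, n=20001):
    """Reference: Gaussian-weighted average of the point-wise unit on a fine grid."""
    s = np.linspace(-8.0 * sigma, 8.0 * sigma, n)  # offsets from the query point
    w = np.exp(-0.5 * (s / sigma) ** 2)
    w /= w.sum()                                   # discrete, normalized Gaussian PSF
    return np.sum(w * sine_unit(x - s, omega, phase))
```

The closed form costs one extra multiplication per unit regardless of $\sigma$, whereas the explicit average needs thousands of point queries, which is the sense in which a PSF-aware query can come at no additional cost.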
Keyword: sgd
In Search of a Data Transformation That Accelerates Neural Field Training
The Effects of Overparameterization on Sharpness-aware Minimization: An Empirical and Theoretical Analysis
Leveraging Graph Diffusion Models for Network Refinement Tasks
Keyword: optimization
Deep convolutional encoder-decoder hierarchical neural networks for conjugate heat transfer surrogate modeling
Practical Layout-Aware Analog/Mixed-Signal Design Automation with Bayesian Neural Networks
DreamPropeller: Supercharge Text-to-3D Generation with Parallel Sampling
In Search of a Data Transformation That Accelerates Neural Field Training
DyRA: Dynamic Resolution Adjustment for Scale-robust Object Detection
Single-Cell Clustering via Dual-Graph Alignment
Continuous Pose for Monocular Cameras in Neural Implicit Representation
TLControl: Trajectory and Language Control for Human Motion Synthesis
SceneTex: High-Quality Texture Synthesis for Indoor Scenes via Diffusion Priors
Enhancing the Performance of Neural Networks Through Causal Discovery and Integration of Domain Knowledge
eMotions: A Large-Scale Dataset for Emotion Recognition in Short Videos
Two Scalable Approaches for Burned-Area Mapping Using U-Net and Landsat Imagery
Efficient and Scalable Architecture for Multiple-chip Implementation of Simulated Bifurcation Machines
Comparison of metaheuristics for the firebreak placement problem: a simulation-based optimization approach
Gene-MOE: A Sparsely-gated Framework for Pan-Cancer Genomic Analysis
GNNFlow: A Distributed Framework for Continuous Temporal GNN Learning on Dynamic Graphs
Group-wise Sparse and Explainable Adversarial Attacks
Wireless Network Digital Twin for 6G: Generative AI as A Key Enabler
Towards Higher Ranks via Adversarial Weight Pruning
Model Performance Prediction for Hyperparameter Optimization of Deep Learning Models Using High Performance Computing and Quantum Annealing
The Effects of Overparameterization on Sharpness-aware Minimization: An Empirical and Theoretical Analysis
Federated Online and Bandit Convex Optimization
A Unified Framework for Multi-Hop Wireless Relaying with Hardware Impairments
Efficient Decoder for End-to-End Oriented Object Detection in Remote Sensing Images
Optimization in Mobile Augmented Reality Systems for the Metaverse over Wireless Communications
Fair Text-to-Image Diffusion via Fair Mapping
GenZI: Zero-Shot 3D Human-Scene Interaction Generation
Robust Localization and Tracking of UAVs in OTFS-based Networks
Variational Bayes image restoration with compressive autoencoders
Robust Scheduling in Cloud Environment Based on Heuristic Optimization Algorithm
Robustness Approaches for the Examination Timetabling Problem under Data Uncertainty
A Simple and General Operational Framework to Deploy Optimal Routes with Source Routing
Identifying Dynamic Regulation with Adversarial Surrogates
SPiC-E : Structural Priors in 3D Diffusion Models using Cross Entity Attention
A quasi-polynomial time algorithm for Multi-Dimensional Scaling via LP hierarchies
SLO/GO Degradation-Loss Sensitivity in Climate-Human System Coupling
Keyword: adam
A quasi-polynomial time algorithm for Multi-Dimensional Scaling via LP hierarchies
Keyword: gradient
DreamPropeller: Supercharge Text-to-3D Generation with Parallel Sampling
Rethinking Mixup for Improving the Adversarial Transferability
Federated Fine-Tuning of Foundation Models via Probabilistic Masking
Enhancing the Performance of Neural Networks Through Causal Discovery and Integration of Domain Knowledge
Microstructure reconstruction of 2D/3D random materials via diffusion-based deep generative models
Efficient Stitchable Task Adaptation
Group-wise Sparse and Explainable Adversarial Attacks
Discrete Empirical Interpolation Method for nonlinear softening problems involving damage and plasticity
Reinforcement Replaces Supervision: Query focused Summarization using Deep Reinforcement Learning
Federated Online and Bandit Convex Optimization
Adversarial Robust Memory-Based Continual Learner
Robust Localization and Tracking of UAVs in OTFS-based Networks
Keyword: super-resolution
Neural Fields with Thermal Activations for Arbitrary-Scale Super-Resolution