New submissions for Fri, 8 Dec 23

Keyword: sgd

PCDP-SGD: Improving the Convergence of Differentially Private SGD via Projection in Advance

Authors: Authors: Haichao Sha (1), Ruixuan Liu (1), Yixuan Liu (1), Hong Chen (1) ((1) Renmin University of China)
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.03792
Pdf link: https://arxiv.org/pdf/2312.03792
Abstract The paradigm of Differentially Private SGD~(DP-SGD) can provide a theoretical guarantee for training data in both centralized and federated settings. However, the utility degradation caused by DP-SGD limits its wide application in high-stakes tasks, such as medical image diagnosis. In addition to the necessary perturbation, the convergence issue is attributed to the information loss on the gradient clipping. In this work, we propose a general framework PCDP-SGD, which aims to compress redundant gradient norms and preserve more crucial top gradient components via projection operation before gradient clipping. Additionally, we extend PCDP-SGD as a fundamental component in differential privacy federated learning~(DPFL) for mitigating the data heterogeneous challenge and achieving efficient communication. We prove that pre-projection enhances the convergence of DP-SGD by reducing the dependence of clipping error and bias to a fraction of the top gradient eigenspace, and in theory, limits cross-client variance to improve the convergence under heterogeneous federation. Experimental results demonstrate that PCDP-SGD achieves higher accuracy compared with state-of-the-art DP-SGD variants in computer vision tasks. Moreover, PCDP-SGD outperforms current federated learning frameworks when DP is guaranteed on local training sets.
Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning Interference with Gradient Projection
Authors: Authors: Tuan Hoang, Santu Rana, Sunil Gupta, Svetha Venkatesh
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.04095
Pdf link: https://arxiv.org/pdf/2312.04095
Abstract Recent data-privacy laws have sparked interest in machine unlearning, which involves removing the effect of specific training samples from a learnt model as if they were never present in the original training dataset. The challenge of machine unlearning is to discard information about the ``forget'' data in the learnt model without altering the knowledge about the remaining dataset and to do so more efficiently than the naive retraining approach. To achieve this, we adopt a projected-gradient based learning method, named as Projected-Gradient Unlearning (PGU), in which the model takes steps in the orthogonal direction to the gradient subspaces deemed unimportant for the retaining dataset, so as to its knowledge is preserved. By utilizing Stochastic Gradient Descent (SGD) to update the model weights, our method can efficiently scale to any model and dataset size. We provide empirically evidence to demonstrate that our unlearning method can produce models that behave similar to models retrained from scratch across various metrics even when the training dataset is no longer accessible. Our code is available at https://github.com/hnanhtuan/projected_gradient_unlearning.
Keyword: optimization

Computation of the optimal error exponent function for fixed-length lossy source coding in discrete memoryless sources
Authors: Authors: Yutaka Jitsumatsu
Subjects: Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2312.03784
Pdf link: https://arxiv.org/pdf/2312.03784
Abstract The error exponent of fixed-length lossy source coding was established by Marton. Ahlswede showed that this exponent can be discontinuous at a rate $R$, depending on the source distribution $P$ and the distortion measure $d(x,y)$. The reason for the discontinuity in the error exponent is that there exists a distortion measure $d(x,y)$ and a distortion level $\Delta$ such that the rate-distortion function $R(\Delta|P)$ is neither concave nor quasi-concave with respect to $P$. Arimoto's algorithm for computing the error exponent in lossy source coding is based on Blahut's parametric representation of the error exponent. However, Blahut's parametric representation is a lower convex envelope of Marton's exponent, and the two do not generally agree. A major contribution of this paper is to provide a parametric representation that perfectly matches the inverse function of Marton's exponent, thereby preventing the problems arising from the above-mentioned non-concavity of $R(\Delta|P)$. For fixed parameters, an optimal distribution can be obtained using Arimoto's algorithm. By performing a nonconvex optimization over the parameters, the inverse function of Marton's exponent is obtained.
AnimatableDreamer: Text-Guided Non-rigid 3D Model Generation and Reconstruction with Canonical Score Distillation
Authors: Authors: Xinzhou Wang, Yikai Wang, Junliang Ye, Zhengyi Wang, Fuchun Sun, Pengkun Liu, Ling Wang, Kai Sun, Xintong Wang, Bin He
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.03795
Pdf link: https://arxiv.org/pdf/2312.03795
Abstract Text-to-3D model adaptations have advanced static 3D model quality, but sequential 3D model generation, particularly for animatable objects with large motions, is still scarce. Our work proposes AnimatableDreamer, a text-to-4D generation framework capable of generating diverse categories of non-rigid objects while adhering to the object motions extracted from a monocular video. At its core, AnimatableDreamer is equipped with our novel optimization design dubbed Canonical Score Distillation (CSD), which simplifies the generation dimension from 4D to 3D by denoising over different frames in the time-varying camera spaces while conducting the distillation process in a unique canonical space shared per video. Concretely, CSD ensures that score gradients back-propagate to the canonical space through differentiable warping, hence guaranteeing the time-consistent generation and maintaining morphological plausibility across different poses. By lifting the 3D generator to 4D with warping functions, AnimatableDreamer offers a novel perspective on non-rigid 3D model generation and reconstruction. Besides, with inductive knowledge from a multi-view consistent diffusion model, CSD regularizes reconstruction from novel views, thus cyclically enhancing the generation process. Extensive experiments demonstrate the capability of our method in generating high-flexibility text-guided 3D models from the monocular video, while also showing improved reconstruction performance over typical non-rigid reconstruction methods. Project page https://AnimatableDreamer.github.io.
XCube ($\mathcal{X}^3$): Large-Scale 3D Generative Modeling using Sparse Voxel Hierarchies
Authors: Authors: Xuanchi Ren, Jiahui Huang, Xiaohui Zeng, Ken Museth, Sanja Fidler, Francis Williams
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.03806
Pdf link: https://arxiv.org/pdf/2312.03806
Abstract We present $\mathcal{X}^3$ (pronounced XCube), a novel generative model for high-resolution sparse 3D voxel grids with arbitrary attributes. Our model can generate millions of voxels with a finest effective resolution of up to $1024^3$ in a feed-forward fashion without time-consuming test-time optimization. To achieve this, we employ a hierarchical voxel latent diffusion model which generates progressively higher resolution grids in a coarse-to-fine manner using a custom framework built on the highly efficient VDB data structure. Apart from generating high-resolution objects, we demonstrate the effectiveness of XCube on large outdoor scenes at scales of 100m$\times$100m with a voxel size as small as 10cm. We observe clear qualitative and quantitative improvements over past approaches. In addition to unconditional generation, we show that our model can be used to solve a variety of tasks such as user-guided editing, scene completion from a single scan, and text-to-3D. More results and details can be found at https://research.nvidia.com/labs/toronto-ai/xcube/.
Uncertainty-Informed Renewable Energy Scheduling: A Scalable Bilevel Framework
Authors: Authors: Dongwei Zhao, Vladimir Dvorkin, Stefanos Delikaraoglou, Alberto J. Lamadrid L., Audun Botterud
Subjects: Systems and Control (eess.SY); General Economics (econ.GN); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2312.03868
Pdf link: https://arxiv.org/pdf/2312.03868
Abstract This work proposes an uncertainty-informed bid adjustment framework for integrating variable renewable energy sources (VRES) into electricity markets. This framework adopts a bilevel model to compute the optimal VRES day-ahead bids. It aims to minimize the expected system cost across day-ahead and real-time stages and approximate the cost efficiency of the stochastic market design. However, solving the bilevel optimization problem is computationally challenging for large-scale systems. To overcome this challenge, we introduce a novel technique based on strong duality and McCormick envelopes, which relaxes the problem to a linear program, enabling large-scale applications. The proposed bilevel framework is applied to the 1576-bus NYISO system and benchmarked against a myopic strategy, where the VRES bid is the mean value of the probabilistic power forecast. Results demonstrate that, under high VRES penetration levels (e.g., 40%), our framework can significantly reduce system costs and market-price volatility, by optimizing VRES quantities efficiently in the day-ahead market. Furthermore, we find that when transmission capacity increases, the proposed bilevel model will still reduce the system cost, whereas the myopic strategy may incur a much higher cost due to over-scheduling of VRES in the day-ahead market and the lack of flexible conventional generators in real time.
Adapting Newton's Method to Neural Networks through a Summary of Higher-Order Derivatives
Authors: Authors: Pierre Wolinski
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2312.03885
Pdf link: https://arxiv.org/pdf/2312.03885
Abstract We consider a gradient-based optimization method applied to a function $\mathcal{L}$ of a vector of variables $\boldsymbol{\theta}$, in the case where $\boldsymbol{\theta}$ is represented as a tuple of tensors $(\mathbf{T}_1, \cdots, \mathbf{T}_S)$. This framework encompasses many common use-cases, such as training neural networks by gradient descent. First, we propose a computationally inexpensive technique providing higher-order information on $\mathcal{L}$, especially about the interactions between the tensors $\mathbf{T}_s$, based on automatic differentiation and computational tricks. Second, we use this technique at order 2 to build a second-order optimization method which is suitable, among other things, for training deep neural networks of various architectures. This second-order method leverages the partition structure of $\boldsymbol{\theta}$ into tensors $(\mathbf{T}_1, \cdots, \mathbf{T}_S)$, in such a way that it requires neither the computation of the Hessian of $\mathcal{L}$ according to $\boldsymbol{\theta}$, nor any approximation of it. The key part consists in computing a smaller matrix interpretable as a "Hessian according to the partition", which can be computed exactly and efficiently. In contrast to many existing practical second-order methods used in neural networks, which perform a diagonal or block-diagonal approximation of the Hessian or its inverse, the method we propose does not neglect interactions between layers. Finally, we can tune the coarseness of the partition to recover well-known optimization methods: the coarsest case corresponds to Cauchy's steepest descent method, the finest case corresponds to the usual Newton's method.
A Scalable and Generalizable Pathloss Map Prediction
Authors: Authors: Ju-Hyung Lee, Andreas F. Molisch
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2312.03950
Pdf link: https://arxiv.org/pdf/2312.03950
Abstract Large-scale channel prediction, i.e., estimation of the pathloss from geographical/morphological/building maps, is an essential component of wireless network planning. Ray tracing (RT)-based methods have been widely used for many years, but they require significant computational effort that may become prohibitive with the increased network densification and/or use of higher frequencies in B5G/6G systems. In this paper, we propose a data-driven, model-free pathloss map prediction (PMP) method, called PMNet. PMNet uses a supervised learning approach: it is trained on a limited amount of RT (or channel measurement) data and map data. Once trained, PMNet can predict pathloss over location with high accuracy (an RMSE level of $10^{-2}$) in a few milliseconds. We further extend PMNet by employing transfer learning (TL). TL allows PMNet to learn a new network scenario quickly (x5.6 faster training) and efficiently (using x4.5 less data) by transferring knowledge from a pre-trained model, while retaining accuracy. Our results demonstrate that PMNet is a scalable and generalizable ML-based PMP method, showing its potential to be used in several network optimization applications.
Understanding the Role of Optimization in Double Descent
Authors: Authors: Chris Yuhao Liu, Jeffrey Flanigan
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2312.03951
Pdf link: https://arxiv.org/pdf/2312.03951
Abstract The phenomenon of model-wise double descent, where the test error peaks and then reduces as the model size increases, is an interesting topic that has attracted the attention of researchers due to the striking observed gap between theory and practice \citep{Belkin2018ReconcilingMM}. Additionally, while double descent has been observed in various tasks and architectures, the peak of double descent can sometimes be noticeably absent or diminished, even without explicit regularization, such as weight decay and early stopping. In this paper, we investigate this intriguing phenomenon from the optimization perspective and propose a simple optimization-based explanation for why double descent sometimes occurs weakly or not at all. To the best of our knowledge, we are the first to demonstrate that many disparate factors contributing to model-wise double descent (initialization, normalization, batch size, learning rate, optimization algorithm) are unified from the viewpoint of optimization: model-wise double descent is observed if and only if the optimizer can find a sufficiently low-loss minimum. These factors directly affect the condition number of the optimization problem or the optimizer and thus affect the final minimum found by the optimizer, reducing or increasing the height of the double descent peak. We conduct a series of controlled experiments on random feature models and two-layer neural networks under various optimization settings, demonstrating this optimization-based unified view. Our results suggest the following implication: Double descent is unlikely to be a problem for real-world machine learning setups. Additionally, our results help explain the gap between weak double descent peaks in practice and strong peaks observable in carefully designed setups.
MICRO: Model-Based Offline Reinforcement Learning with a Conservative Bellman Operator
Authors: Authors: Xiao-Yin Liu, Xiao-Hu Zhou, Guo-Tao Li, Hao Li, Mei-Jiang Gui, Tian-Yu Xiang, De-Xing Huang, Zeng-Guang Hou
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2312.03991
Pdf link: https://arxiv.org/pdf/2312.03991
Abstract Offline reinforcement learning (RL) faces a significant challenge of distribution shift. Model-free offline RL penalizes the Q value for out-of-distribution (OOD) data or constrains the policy closed to the behavior policy to tackle this problem, but this inhibits the exploration of the OOD region. Model-based offline RL, which uses the trained environment model to generate more OOD data and performs conservative policy optimization within that model, has become an effective method for this problem. However, the current model-based algorithms rarely consider agent robustness when incorporating conservatism into policy. Therefore, the new model-based offline algorithm with a conservative Bellman operator (MICRO) is proposed. This method trades off performance and robustness via introducing the robust Bellman operator into the algorithm. Compared with previous model-based algorithms with robust adversarial models, MICRO can significantly reduce the computation cost by only choosing the minimal Q value in the state uncertainty set. Extensive experiments demonstrate that MICRO outperforms prior RL algorithms in offline RL benchmark and is considerably robust to adversarial perturbations.
Resilience-Assuring Hydrogen-Powered Microgrids
Authors: Authors: Chaofan Lin, Peng Zhang, Xiaonan Lu
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2312.04014
Pdf link: https://arxiv.org/pdf/2312.04014
Abstract Green hydrogen has shown great potential to power microgrids as a primary source, yet the operation methodology under extreme events is still an open area. To fill this gap, this letter establishes an operational optimization strategy towards resilient hydrogen-powered microgrids, where the frequency and voltage regulation characteristics of hydrogen sources under advanced controls are accurately represented by piecewise linear constraints. The results show that the new operation approach can output a safety-assured operation plan with rational power change distribution and reduced frequency and voltage variation.
Moirai: Towards Optimal Placement for Distributed Inference on Heterogeneous Devices
Authors: Authors: Beibei Zhang, Hongwei Zhu, Feng Gao, Zhihui Yang, Sean Xiaoyang Wang
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2312.04025
Pdf link: https://arxiv.org/pdf/2312.04025
Abstract The escalating size of Deep Neural Networks (DNNs) has spurred a growing research interest in hosting and serving DNN models across multiple devices. A number of studies have been reported to partition a DNN model across devices, providing device placement solutions. The methods appeared in the literature, however, either suffer from poor placement performance due to the exponential search space or miss an optimal placement as a consequence of the reduced search space with limited heuristics. Moreover, these methods have ignored the runtime inter-operator optimization of a computation graph when coarsening the graph, which degrades the end-to-end inference performance. This paper presents Moirai that better exploits runtime inter-operator fusion in a model to render a coarsened computation graph, reducing the search space while maintaining the inter-operator optimization provided by inference backends. Moirai also generalizes the device placement algorithm from multiple perspectives by considering inference constraints and device heterogeneity.Extensive experimental evaluation with 11 large DNNs demonstrates that Moirai outperforms the state-of-the-art counterparts, i.e., Placeto, m-SCT, and GETF, up to 4.28$\times$ in reduction of the end-to-end inference latency. Moirai code is anonymously released at \url{https://github.com/moirai-placement/moirai}.
Improved Face Representation via Joint Label Classification and Supervised Contrastive Clustering
Authors: Authors: Zhenduo Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2312.04029
Pdf link: https://arxiv.org/pdf/2312.04029
Abstract Face clustering tasks can learn hierarchical semantic information from large-scale data, which has the potential to help facilitate face recognition. However, there are few works on this problem. This paper explores it by proposing a joint optimization task of label classification and supervised contrastive clustering to introduce the cluster knowledge to the traditional face recognition task in two ways. We first extend ArcFace with a cluster-guided angular margin to adjust the within-class feature distribution according to the hard level of face clustering. Secondly, we propose a supervised contrastive clustering approach to pull the features to the cluster center and propose the cluster-aligning procedure to align the cluster center and the learnable class center in the classifier for joint training. Finally, extensive qualitative and quantitative experiments on popular facial benchmarks demonstrate the effectiveness of our paradigm and its superiority over the existing approaches to face recognition.
Queueing Delay Minimization in Overloaded Networks via Rate Control
Authors: Authors: Xinyu Wu, Dan Wu, Eytan Modiano
Subjects: Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2312.04054
Pdf link: https://arxiv.org/pdf/2312.04054
Abstract We develop link rate control policies to minimize the queueing delay of packets in overloaded networks. We show that increasing link rates does not guarantee delay reduction during overload. We consider a fluid queueing model that facilitates explicit characterization of the queueing delay of packets, and establish explicit conditions on link rates that can minimize the average and maximum queueing delay in both single-hop and multi-stage (switching) networks. These min-delay conditions require maintaining an identical ratio between the ingress and egress rates of different nodes at the same layer of the network. We term the policies that follow these conditions rate-proportional policies. We further generalize the rate-proportional policies to queue-proportional policies, which minimize the queueing delay asymptotically based on the time-varying queue length while remaining agnostic of packet arrival rates. We validate that the proposed policies lead to minimum queueing delay under various network topologies and settings, compared with benchmarks including the backpressure policy that maximizes network throughput and the max-link-rate policy that fully utilizes bandwidth. We further remark that the explicit min-delay policy design in multi-stage networks facilitates co-optimization with other metrics, such as minimizing total bandwidth, balancing link utilization and node buffer usage. This demonstrates the wider utility of our main results in data center network optimization in practice.
Differentiable Registration of Images and LiDAR Point Clouds with VoxelPoint-to-Pixel Matching
Authors: Authors: Junsheng Zhou, Baorui Ma, Wenyuan Zhang, Yi Fang, Yu-Shen Liu, Zhizhong Han
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.04060
Pdf link: https://arxiv.org/pdf/2312.04060
Abstract Cross-modality registration between 2D images from cameras and 3D point clouds from LiDARs is a crucial task in computer vision and robotic. Previous methods estimate 2D-3D correspondences by matching point and pixel patterns learned by neural networks, and use Perspective-n-Points (PnP) to estimate rigid transformation during post-processing. However, these methods struggle to map points and pixels to a shared latent space robustly since points and pixels have very different characteristics with patterns learned in different manners (MLP and CNN), and they also fail to construct supervision directly on the transformation since the PnP is non-differentiable, which leads to unstable registration results. To address these problems, we propose to learn a structured cross-modality latent space to represent pixel features and 3D features via a differentiable probabilistic PnP solver. Specifically, we design a triplet network to learn VoxelPoint-to-Pixel matching, where we represent 3D elements using both voxels and points to learn the cross-modality latent space with pixels. We design both the voxel and pixel branch based on CNNs to operate convolutions on voxels/pixels represented in grids, and integrate an additional point branch to regain the information lost during voxelization. We train our framework end-to-end by imposing supervisions directly on the predicted pose distribution with a probabilistic PnP solver. To explore distinctive patterns of cross-modality features, we design a novel loss with adaptive-weighted optimization for cross-modality feature description. The experimental results on KITTI and nuScenes datasets show significant improvements over the state-of-the-art methods. The code and models are available at https://github.com/junshengzhou/VP2P-Match.
Rate-splitting Multiple Access for Hierarchical HAP-LAP Networks under Limited Fronthaul
Authors: Authors: Jeongbin Kim, Seongah Jeong, Seonghoon Yoo, Woong Son, Joonhyuk Kang
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2312.04081
Pdf link: https://arxiv.org/pdf/2312.04081
Abstract In this correspondence, we propose hierarchical high-altitude platform (HAP)-low-altitude platform (LAP) networks with the aim of maximizing the sum-rate of ground user equipments (UEs). The multiple aerial radio units (RUs) mounted on HAPs and LAPs are managed by the central unit (CU) via constrained fronthaul links. The limitation of fronthaul capacity can be addressed through quantization, employing the cloud radio access network (C-RAN) architecture. For spectral efficiency, we adopt the rate-splitting multiple access (RSMA), leveraging the advantages of both space-division multiple access (SDMA) and non-orthogonal multiple access (NOMA). To achieve this, we jointly optimize rate splitting, transmit power allocation, quantization noise variance, and UAV placement using an alternating optimization (AO) approach coupled with successive convex approximation (SCA) and the weighted minimum mean square error (WMMSE) method. Numerical results validate the superior performance of the proposed method compared to benchmark schemes, including partial optimizations or those without the assistance of LAPs.
Edge computing service deployment and task offloading based on multi-task high-dimensional multi-objective optimization
Authors: Authors: Yanheng Guo, Yan Zhang, Linjie Wu, Mengxia Li, Xingjuan Cai, Jinjun Chen
Subjects: Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2312.04101
Pdf link: https://arxiv.org/pdf/2312.04101
Abstract The Mobile Edge Computing (MEC) system located close to the client allows mobile smart devices to offload their computations onto edge servers, enabling them to benefit from low-latency computing services. Both cloud service providers and users seek more comprehensive solutions, necessitating judicious decisions in service deployment and task offloading while balancing multiple objectives. This study investigates service deployment and task offloading challenges in a multi-user environment, framing them as a multi-task high-dimensional multi-objective optimization (MT-HD-MOO) problem within an edge environment. To ensure stable service provisioning, beyond considering latency, energy consumption, and cost as deployment objectives, network reliability is also incorporated. Furthermore, to promote equitable usage of edge servers, load balancing is introduced as a fourth task offloading objective, in addition to latency, energy consumption, and cost. Additionally, this paper designs a MT-HD-MOO algorithm based on a multi-selection strategy to address this model and its solution. By employing diverse selection strategies, an environment selection strategy pool is established to enhance population diversity within the high-dimensional objective space. Ultimately, the algorithm's effectiveness is verified through simulation experiments.
Towards 4D Human Video Stylization
Authors: Authors: Tiantian Wang, Xinxin Zuo, Fangzhou Mu, Jian Wang, Ming-Hsuan Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.04143
Pdf link: https://arxiv.org/pdf/2312.04143
Abstract We present a first step towards 4D (3D and time) human video stylization, which addresses style transfer, novel view synthesis and human animation within a unified framework. While numerous video stylization methods have been developed, they are often restricted to rendering images in specific viewpoints of the input video, lacking the capability to generalize to novel views and novel poses in dynamic scenes. To overcome these limitations, we leverage Neural Radiance Fields (NeRFs) to represent videos, conducting stylization in the rendered feature space. Our innovative approach involves the simultaneous representation of both the human subject and the surrounding scene using two NeRFs. This dual representation facilitates the animation of human subjects across various poses and novel viewpoints. Specifically, we introduce a novel geometry-guided tri-plane representation, significantly enhancing feature representation robustness compared to direct tri-plane optimization. Following the video reconstruction, stylization is performed within the NeRFs' rendered feature space. Extensive experiments demonstrate that the proposed method strikes a superior balance between stylized textures and temporal coherence, surpassing existing approaches. Furthermore, our framework uniquely extends its capabilities to accommodate novel poses and viewpoints, making it a versatile tool for creative human video stylization.
Zero-Touch Networks: Towards Next-Generation Network Automation
Authors: Authors: Mirna El Rajab, Li Yang, Abdallah Shami
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.04159
Pdf link: https://arxiv.org/pdf/2312.04159
Abstract The Zero-touch network and Service Management (ZSM) framework represents an emerging paradigm in the management of the fifth-generation (5G) and Beyond (5G+) networks, offering automated self-management and self-healing capabilities to address the escalating complexity and the growing data volume of modern networks. ZSM frameworks leverage advanced technologies such as Machine Learning (ML) to enable intelligent decision-making and reduce human intervention. This paper presents a comprehensive survey of Zero-Touch Networks (ZTNs) within the ZSM framework, covering network optimization, traffic monitoring, energy efficiency, and security aspects of next-generational networks. The paper explores the challenges associated with ZSM, particularly those related to ML, which necessitate the need to explore diverse network automation solutions. In this context, the study investigates the application of Automated ML (AutoML) in ZTNs, to reduce network management costs and enhance performance. AutoML automates the selection and tuning process of a ML model for a given task. Specifically, the focus is on AutoML's ability to predict application throughput and autonomously adapt to data drift. Experimental results demonstrate the superiority of the proposed AutoML pipeline over traditional ML in terms of prediction accuracy. Integrating AutoML and ZSM concepts significantly reduces network configuration and management efforts, allowing operators to allocate more time and resources to other important tasks. The paper also provides a high-level 5G system architecture incorporating AutoML and ZSM concepts. This research highlights the potential of ZTNs and AutoML to revolutionize the management of 5G+ networks, enabling automated decision-making and empowering network operators to achieve higher efficiency, improved performance, and enhanced user experience.
Cell segmentation of in situ transcriptomics data using signed graph partitioning
Authors: Authors: Axel Andersson, Andrea Behanova, Carolina Wählby, Filip Malmberg
Subjects: Information Theory (cs.IT); Quantitative Methods (q-bio.QM)
Arxiv link: https://arxiv.org/abs/2312.04181
Pdf link: https://arxiv.org/pdf/2312.04181
Abstract The locations of different mRNA molecules can be revealed by multiplexed in situ RNA detection. By assigning detected mRNA molecules to individual cells, it is possible to identify many different cell types in parallel. This in turn enables investigation of the spatial cellular architecture in tissue, which is crucial for furthering our understanding of biological processes and diseases. However, cell typing typically depends on the segmentation of cell nuclei, which is often done based on images of a DNA stain, such as DAPI. Limiting cell definition to a nuclear stain makes it fundamentally difficult to determine accurate cell borders, and thereby also difficult to assign mRNA molecules to the correct cell. As such, we have developed a computational tool that segments cells solely based on the local composition of mRNA molecules. First, a small neural network is trained to compute attractive and repulsive edges between pairs of mRNA molecules. The signed graph is then partitioned by a mutex watershed into components corresponding to different cells. We evaluated our method on two publicly available datasets and compared it against the current state-of-the-art and older baselines. We conclude that combining neural networks with combinatorial optimization is a promising approach for cell segmentation of in situ transcriptomics data.
Adaptive Recursive Query Optimization
Authors: Authors: Anna Herlihy, Guillaume Martres, Anastasia Ailamaki, Martin Odersky
Subjects: Databases (cs.DB); Programming Languages (cs.PL)
Arxiv link: https://arxiv.org/abs/2312.04282
Pdf link: https://arxiv.org/pdf/2312.04282
Abstract Performance-critical industrial applications, including large-scale program, network, and distributed system analyses, are increasingly reliant on recursive queries for data analysis. Yet traditional relational algebra-based query optimization techniques do not scale well to recursive query processing due to the iterative nature of query evaluation, where relation cardinalities can change unpredictably during the course of a single query execution. To avoid error-prone cardinality estimation, adaptive query processing techniques use runtime information to inform query optimization, but these systems are not optimized for the specific needs of recursive query processing. In this paper, we introduce Adaptive Metaprogramming, an innovative technique that shifts recursive query optimization and code generation from compile-time to runtime using principled metaprogramming, enabling dynamic optimization and re-optimization before and after query execution has begun. We present a custom join-ordering optimization applicable at multiple stages during query compilation and execution. Through Carac, we evaluate the optimization potential of Adaptive Metaprogramming and show unoptimized recursive query execution time can be improved by three orders of magnitude and hand-optimized queries by 4x.
HARQ-IR Aided Short Packet Communications: BLER Analysis and Throughput Maximization
Authors: Authors: Fuchao He, Zheng Shi, Guanghua Yang, Xiaofan Li, Xinrong Ye, Shaodan Ma
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2312.04377
Pdf link: https://arxiv.org/pdf/2312.04377
Abstract This paper introduces hybrid automatic repeat request with incremental redundancy (HARQ-IR) to boost the reliability of short packet communications. The finite blocklength information theory and correlated decoding events tremendously preclude the analysis of average block error rate (BLER). Fortunately, the recursive form of average BLER motivates us to calculate its value through the trapezoidal approximation and Gauss-Laguerre quadrature. Moreover, the asymptotic analysis is performed to derive a simple expression for the average BLER at high signal-to-noise ratio (SNR). Then, we study the maximization of long term average throughput (LTAT) via power allocation meanwhile ensuring the power and the BLER constraints. For tractability, the asymptotic BLER is employed to solve the problem through geometric programming (GP). However, the GP-based solution underestimates the LTAT at low SNR due to a large approximation error in this case. Alternatively, we also develop a deep reinforcement learning (DRL)-based framework to learn power allocation policy. In particular, the optimization problem is transformed into a constrained Markov decision process, which is solved by integrating deep deterministic policy gradient (DDPG) with subgradient method. The numerical results finally demonstrate that the DRL-based method outperforms the GP-based one at low SNR, albeit at the cost of increasing computational burden.
Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization
Authors: Authors: Carlos E. Luis, Alessandro G. Bottero, Julia Vinogradska, Felix Berkenkamp, Jan Peters
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2312.04386
Pdf link: https://arxiv.org/pdf/2312.04386
Abstract We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning. In particular, we focus on characterizing the variance over values induced by a distribution over MDPs. Previous work upper bounds the posterior variance over values by solving a so-called uncertainty Bellman equation (UBE), but the over-approximation may result in inefficient exploration. We propose a new UBE whose solution converges to the true posterior variance over values and leads to lower regret in tabular exploration problems. We identify challenges to apply the UBE theory beyond tabular problems and propose a suitable approximation. Based on this approximation, we introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC), that can be applied for either risk-seeking or risk-averse policy optimization with minimal changes. Experiments in both online and offline RL demonstrate improved performance compared to other uncertainty estimation methods.
MIST: An Efficient Approach for Software-Defined Multicast in Wireless Mesh Networks
Authors: Authors: Rupei Xu, Yuming Jiang, Jason P. Jue
Subjects: Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2312.04418
Pdf link: https://arxiv.org/pdf/2312.04418
Abstract Multicasting is a vital information dissemination technique in Software-Defined Networking (SDN). With SDN, a multicast service can incorporate network functions implemented at different nodes, which is referred to as software-defined multicast. Emerging ubiquitous wireless networks for 5G and Beyond (B5G) inherently support multicast. However, the broadcast nature of wireless channels, especially in dense deployments, leads to neighborhood interference as a primary system degradation factor, which introduces a new challenge for software-defined multicast in wireless mesh networks. To tackle this, this paper introduces a novel approach, based on the idea of minimizing both the total length cost of the multicast tree and the interference at the same time. Accordingly, a bicriteria optimization problem is formulated, which is called \emph{Minimum Interference Steiner Tree (MIST)}. To solve the bicriteria problem, instead of resorting to heuristics, this paper employs an innovative approach that is an approximate algorithm for MIST but with guaranteed performance. Specifically, the approach is a two-stage relaxation algorithm by exploiting the monotone submodularity property of the interference metric and identifying Pareto optimal solutions for MIST. Simulation results demonstrate and validate the performance of the proposed algorithm.
Using Large Language Models for Hyperparameter Optimization
Authors: Authors: Michael R. Zhang, Nishkrit Desai, Juhan Bae, Jonathan Lorraine, Jimmy Ba
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2312.04528
Pdf link: https://arxiv.org/pdf/2312.04528
Abstract This paper studies using foundational large language models (LLMs) to make decisions during hyperparameter optimization (HPO). Empirical evaluations demonstrate that in settings with constrained search budgets, LLMs can perform comparably or better than traditional HPO methods like random search and Bayesian optimization on standard benchmarks. Furthermore, we propose to treat the code specifying our model as a hyperparameter, which the LLM outputs, going beyond the capabilities of existing HPO approaches. Our findings suggest that LLMs are a promising tool for improving efficiency in the traditional decision-making problem of hyperparameter optimization.
Camera Height Doesn't Change: Unsupervised Monocular Scale-Aware Road-Scene Depth Estimation
Authors: Authors: Genki Kinoshita, Ko Nishino
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2312.04530
Pdf link: https://arxiv.org/pdf/2312.04530
Abstract Monocular depth estimators either require explicit scale supervision through auxiliary sensors or suffer from scale ambiguity, which renders them difficult to deploy in downstream applications. A possible source of scale is the sizes of objects found in the scene, but inaccurate localization makes them difficult to exploit. In this paper, we introduce a novel scale-aware monocular depth estimation method called StableCamH that does not require any auxiliary sensor or supervision. The key idea is to exploit prior knowledge of object heights in the scene but aggregate the height cues into a single invariant measure common to all frames in a road video sequence, namely the camera height. By formulating monocular depth estimation as camera height optimization, we achieve robust and accurate unsupervised end-to-end training. To realize StableCamH, we devise a novel learning-based size prior that can directly convert car appearance into its dimensions. Extensive experiments on KITTI and Cityscapes show the effectiveness of StableCamH, its state-of-the-art accuracy compared with related methods, and its generalizability. The training framework of StableCamH can be used for any monocular depth estimation method and will hopefully become a fundamental building block for further work.
EAGLES: Efficient Accelerated 3D Gaussians with Lightweight EncodingS
Authors: Authors: Sharath Girish, Kamal Gupta, Abhinav Shrivastava
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2312.04564
Pdf link: https://arxiv.org/pdf/2312.04564
Abstract Recently, 3D Gaussian splatting (3D-GS) has gained popularity in novel-view scene synthesis. It addresses the challenges of lengthy training times and slow rendering speeds associated with Neural Radiance Fields (NeRFs). Through rapid, differentiable rasterization of 3D Gaussians, 3D-GS achieves real-time rendering and accelerated training. They, however, demand substantial memory resources for both training and storage, as they require millions of Gaussians in their point cloud representation for each scene. We present a technique utilizing quantized embeddings to significantly reduce memory storage requirements and a coarse-to-fine training strategy for a faster and more stable optimization of the Gaussian point clouds. Our approach results in scene representations with fewer Gaussians and quantized representations, leading to faster training times and rendering speeds for real-time rendering of high resolution scenes. We reduce memory by more than an order of magnitude all while maintaining the reconstruction quality. We validate the effectiveness of our approach on a variety of datasets and scenes preserving the visual quality while consuming 10-20x less memory and faster training/inference speed. Project page and code is available https://efficientgaussian.github.io
Keyword: adam

There is no result

Keyword: gradient

PCDP-SGD: Improving the Convergence of Differentially Private SGD via Projection in Advance
Authors: Authors: Haichao Sha (1), Ruixuan Liu (1), Yixuan Liu (1), Hong Chen (1) ((1) Renmin University of China)
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.03792
Pdf link: https://arxiv.org/pdf/2312.03792
Abstract The paradigm of Differentially Private SGD~(DP-SGD) can provide a theoretical guarantee for training data in both centralized and federated settings. However, the utility degradation caused by DP-SGD limits its wide application in high-stakes tasks, such as medical image diagnosis. In addition to the necessary perturbation, the convergence issue is attributed to the information loss on the gradient clipping. In this work, we propose a general framework PCDP-SGD, which aims to compress redundant gradient norms and preserve more crucial top gradient components via projection operation before gradient clipping. Additionally, we extend PCDP-SGD as a fundamental component in differential privacy federated learning~(DPFL) for mitigating the data heterogeneous challenge and achieving efficient communication. We prove that pre-projection enhances the convergence of DP-SGD by reducing the dependence of clipping error and bias to a fraction of the top gradient eigenspace, and in theory, limits cross-client variance to improve the convergence under heterogeneous federation. Experimental results demonstrate that PCDP-SGD achieves higher accuracy compared with state-of-the-art DP-SGD variants in computer vision tasks. Moreover, PCDP-SGD outperforms current federated learning frameworks when DP is guaranteed on local training sets.
AnimatableDreamer: Text-Guided Non-rigid 3D Model Generation and Reconstruction with Canonical Score Distillation
Authors: Authors: Xinzhou Wang, Yikai Wang, Junliang Ye, Zhengyi Wang, Fuchun Sun, Pengkun Liu, Ling Wang, Kai Sun, Xintong Wang, Bin He
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.03795
Pdf link: https://arxiv.org/pdf/2312.03795
Abstract Text-to-3D model adaptations have advanced static 3D model quality, but sequential 3D model generation, particularly for animatable objects with large motions, is still scarce. Our work proposes AnimatableDreamer, a text-to-4D generation framework capable of generating diverse categories of non-rigid objects while adhering to the object motions extracted from a monocular video. At its core, AnimatableDreamer is equipped with our novel optimization design dubbed Canonical Score Distillation (CSD), which simplifies the generation dimension from 4D to 3D by denoising over different frames in the time-varying camera spaces while conducting the distillation process in a unique canonical space shared per video. Concretely, CSD ensures that score gradients back-propagate to the canonical space through differentiable warping, hence guaranteeing the time-consistent generation and maintaining morphological plausibility across different poses. By lifting the 3D generator to 4D with warping functions, AnimatableDreamer offers a novel perspective on non-rigid 3D model generation and reconstruction. Besides, with inductive knowledge from a multi-view consistent diffusion model, CSD regularizes reconstruction from novel views, thus cyclically enhancing the generation process. Extensive experiments demonstrate the capability of our method in generating high-flexibility text-guided 3D models from the monocular video, while also showing improved reconstruction performance over typical non-rigid reconstruction methods. Project page https://AnimatableDreamer.github.io.
Adapting Newton's Method to Neural Networks through a Summary of Higher-Order Derivatives
Authors: Authors: Pierre Wolinski
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2312.03885
Pdf link: https://arxiv.org/pdf/2312.03885
Abstract We consider a gradient-based optimization method applied to a function $\mathcal{L}$ of a vector of variables $\boldsymbol{\theta}$, in the case where $\boldsymbol{\theta}$ is represented as a tuple of tensors $(\mathbf{T}_1, \cdots, \mathbf{T}_S)$. This framework encompasses many common use-cases, such as training neural networks by gradient descent. First, we propose a computationally inexpensive technique providing higher-order information on $\mathcal{L}$, especially about the interactions between the tensors $\mathbf{T}_s$, based on automatic differentiation and computational tricks. Second, we use this technique at order 2 to build a second-order optimization method which is suitable, among other things, for training deep neural networks of various architectures. This second-order method leverages the partition structure of $\boldsymbol{\theta}$ into tensors $(\mathbf{T}_1, \cdots, \mathbf{T}_S)$, in such a way that it requires neither the computation of the Hessian of $\mathcal{L}$ according to $\boldsymbol{\theta}$, nor any approximation of it. The key part consists in computing a smaller matrix interpretable as a "Hessian according to the partition", which can be computed exactly and efficiently. In contrast to many existing practical second-order methods used in neural networks, which perform a diagonal or block-diagonal approximation of the Hessian or its inverse, the method we propose does not neglect interactions between layers. Finally, we can tune the coarseness of the partition to recover well-known optimization methods: the coarsest case corresponds to Cauchy's steepest descent method, the finest case corresponds to the usual Newton's method.
On The Fairness Impacts of Hardware Selection in Machine Learning
Authors: Authors: Sree Harsha Nelaturu, Nishaanth Kanna Ravichandran, Cuong Tran, Sara Hooker, Ferdinando Fioretto
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2312.03886
Pdf link: https://arxiv.org/pdf/2312.03886
Abstract In the machine learning ecosystem, hardware selection is often regarded as a mere utility, overshadowed by the spotlight on algorithms and data. This oversight is particularly problematic in contexts like ML-as-a-service platforms, where users often lack control over the hardware used for model deployment. How does the choice of hardware impact generalization properties? This paper investigates the influence of hardware on the delicate balance between model performance and fairness. We demonstrate that hardware choices can exacerbate existing disparities, attributing these discrepancies to variations in gradient flows and loss surfaces across different demographic groups. Through both theoretical and empirical analysis, the paper not only identifies the underlying factors but also proposes an effective strategy for mitigating hardware-induced performance imbalances.
Improving Gradient-guided Nested Sampling for Posterior Inference
Authors: Authors: Pablo Lemos, Nikolay Malkin, Will Handley, Yoshua Bengio, Yashar Hezaveh, Laurence Perreault-Levasseur
Subjects: Machine Learning (cs.LG); Computation (stat.CO); Methodology (stat.ME); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2312.03911
Pdf link: https://arxiv.org/pdf/2312.03911
Abstract We present a performant, general-purpose gradient-guided nested sampling algorithm, ${\tt GGNS}$, combining the state of the art in differentiable programming, Hamiltonian slice sampling, clustering, mode separation, dynamic nested sampling, and parallelization. This unique combination allows ${\tt GGNS}$ to scale well with dimensionality and perform competitively on a variety of synthetic and real-world problems. We also show the potential of combining nested sampling with generative flow networks to obtain large amounts of high-quality samples from the posterior distribution. This combination leads to faster mode discovery and more accurate estimates of the partition function.
MeanCut: A Greedy-Optimized Graph Clustering via Path-based Similarity and Degree Descent Criterion
Authors: Authors: Dehua Peng, Zhipeng Gui, Huayi Wu
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.04067
Pdf link: https://arxiv.org/pdf/2312.04067
Abstract As the most typical graph clustering method, spectral clustering is popular and attractive due to the remarkable performance, easy implementation, and strong adaptability. Classical spectral clustering measures the edge weights of graph using pairwise Euclidean-based metric, and solves the optimal graph partition by relaxing the constraints of indicator matrix and performing Laplacian decomposition. However, Euclidean-based similarity might cause skew graph cuts when handling non-spherical data distributions, and the relaxation strategy introduces information loss. Meanwhile, spectral clustering requires specifying the number of clusters, which is hard to determine without enough prior knowledge. In this work, we leverage the path-based similarity to enhance intra-cluster associations, and propose MeanCut as the objective function and greedily optimize it in degree descending order for a nondestructive graph partition. This algorithm enables the identification of arbitrary shaped clusters and is robust to noise. To reduce the computational complexity of similarity calculation, we transform optimal path search into generating the maximum spanning tree (MST), and develop a fast MST (FastMST) algorithm to further improve its time-efficiency. Moreover, we define a density gradient factor (DGF) for separating the weakly connected clusters. The validity of our algorithm is demonstrated by testifying on real-world benchmarks and application of face recognition. The source code of MeanCut is available at https://github.com/ZPGuiGroupWhu/MeanCut-Clustering.
Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning Interference with Gradient Projection
Authors: Authors: Tuan Hoang, Santu Rana, Sunil Gupta, Svetha Venkatesh
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.04095
Pdf link: https://arxiv.org/pdf/2312.04095
Abstract Recent data-privacy laws have sparked interest in machine unlearning, which involves removing the effect of specific training samples from a learnt model as if they were never present in the original training dataset. The challenge of machine unlearning is to discard information about the ``forget'' data in the learnt model without altering the knowledge about the remaining dataset and to do so more efficiently than the naive retraining approach. To achieve this, we adopt a projected-gradient based learning method, named as Projected-Gradient Unlearning (PGU), in which the model takes steps in the orthogonal direction to the gradient subspaces deemed unimportant for the retaining dataset, so as to its knowledge is preserved. By utilizing Stochastic Gradient Descent (SGD) to update the model weights, our method can efficiently scale to any model and dataset size. We provide empirically evidence to demonstrate that our unlearning method can produce models that behave similar to models retrained from scratch across various metrics even when the training dataset is no longer accessible. Our code is available at https://github.com/hnanhtuan/projected_gradient_unlearning.
Merging by Matching Models in Task Subspaces
Authors: Authors: Derek Tam, Mohit Bansal, Colin Raffel
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2312.04339
Pdf link: https://arxiv.org/pdf/2312.04339
Abstract Model merging aims to cheaply combine individual task-specific models into a single multitask model. In this work, we view past merging methods as leveraging different notions of a ''task subspace'' in which models are matched before being merged. We connect the task subspace of a given model to its loss landscape and formalize how this approach to model merging can be seen as solving a linear system of equations. While past work has generally been limited to linear systems that have a closed-form solution, we consider using the conjugate gradient method to find a solution. We show that using the conjugate gradient method can outperform closed-form solutions, enables merging via linear systems that are otherwise intractable to solve, and flexibly allows choosing from a wide variety of initializations and estimates for the ''task subspace''. We ultimately demonstrate that our merging framework called ''Matching Models in their Task Subspace'' (MaTS) achieves state-of-the-art results in multitask and intermediate-task model merging. We release all of the code and checkpoints used in our work at https://github.com/r-three/mats.
HARQ-IR Aided Short Packet Communications: BLER Analysis and Throughput Maximization
Authors: Authors: Fuchao He, Zheng Shi, Guanghua Yang, Xiaofan Li, Xinrong Ye, Shaodan Ma
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2312.04377
Pdf link: https://arxiv.org/pdf/2312.04377
Abstract This paper introduces hybrid automatic repeat request with incremental redundancy (HARQ-IR) to boost the reliability of short packet communications. The finite blocklength information theory and correlated decoding events tremendously preclude the analysis of average block error rate (BLER). Fortunately, the recursive form of average BLER motivates us to calculate its value through the trapezoidal approximation and Gauss-Laguerre quadrature. Moreover, the asymptotic analysis is performed to derive a simple expression for the average BLER at high signal-to-noise ratio (SNR). Then, we study the maximization of long term average throughput (LTAT) via power allocation meanwhile ensuring the power and the BLER constraints. For tractability, the asymptotic BLER is employed to solve the problem through geometric programming (GP). However, the GP-based solution underestimates the LTAT at low SNR due to a large approximation error in this case. Alternatively, we also develop a deep reinforcement learning (DRL)-based framework to learn power allocation policy. In particular, the optimization problem is transformed into a constrained Markov decision process, which is solved by integrating deep deterministic policy gradient (DDPG) with subgradient method. The numerical results finally demonstrate that the DRL-based method outperforms the GP-based one at low SNR, albeit at the cost of increasing computational burden.
Adversarial Learning for Feature Shift Detection and Correction
Authors: Authors: Miriam Barrabes, Daniel Mas Montserrat, Margarita Geleta, Xavier Giro-i-Nieto, Alexander G. Ioannidis
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Applications (stat.AP); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2312.04546
Pdf link: https://arxiv.org/pdf/2312.04546
Abstract Data shift is a phenomenon present in many real-world applications, and while there are multiple methods attempting to detect shifts, the task of localizing and correcting the features originating such shifts has not been studied in depth. Feature shifts can occur in many datasets, including in multi-sensor data, where some sensors are malfunctioning, or in tabular and structured data, including biomedical, financial, and survey data, where faulty standardization and data processing pipelines can lead to erroneous features. In this work, we explore using the principles of adversarial learning, where the information from several discriminators trained to distinguish between two distributions is used to both detect the corrupted features and fix them in order to remove the distribution shift between datasets. We show that mainstream supervised classifiers, such as random forest or gradient boosting trees, combined with simple iterative heuristics, can localize and correct feature shifts, outperforming current statistical and neural network-based techniques. The code is available at https://github.com/AI-sandbox/DataFix.
Improved Visual Grounding through Self-Consistent Explanations
Authors: Authors: Ruozhen He, Paola Cascante-Bonilla, Ziyan Yang, Alexander C. Berg, Vicente Ordonez
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.04554
Pdf link: https://arxiv.org/pdf/2312.04554
Abstract Vision-and-language models trained to match images with text can be combined with visual explanation methods to point to the locations of specific objects in an image. Our work shows that the localization --"grounding"-- abilities of these models can be further improved by finetuning for self-consistent visual explanations. We propose a strategy for augmenting existing text-image datasets with paraphrases using a large language model, and SelfEQ, a weakly-supervised strategy on visual explanation maps for paraphrases that encourages self-consistency. Specifically, for an input textual phrase, we attempt to generate a paraphrase and finetune the model so that the phrase and paraphrase map to the same region in the image. We posit that this both expands the vocabulary that the model is able to handle, and improves the quality of the object locations highlighted by gradient-based visual explanation methods (e.g. GradCAM). We demonstrate that SelfEQ improves performance on Flickr30k, ReferIt, and RefCOCO+ over a strong baseline method and several prior works. Particularly, comparing to other methods that do not use any type of box annotations, we obtain 84.07% on Flickr30k (an absolute improvement of 4.69%), 67.40% on ReferIt (an absolute improvement of 7.68%), and 75.10%, 55.49% on RefCOCO+ test sets A and B respectively (an absolute improvement of 3.74% on average).
Keyword: super-resolution

There is no result

zoq / arxiv-updates

New submissions for Fri, 8 Dec 23 #660

Keyword: sgd

PCDP-SGD: Improving the Convergence of Differentially Private SGD via Projection in Advance

Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning Interference with Gradient Projection

Keyword: optimization

Computation of the optimal error exponent function for fixed-length lossy source coding in discrete memoryless sources

AnimatableDreamer: Text-Guided Non-rigid 3D Model Generation and Reconstruction with Canonical Score Distillation

XCube ($\mathcal{X}^3$): Large-Scale 3D Generative Modeling using Sparse Voxel Hierarchies

Uncertainty-Informed Renewable Energy Scheduling: A Scalable Bilevel Framework

Adapting Newton's Method to Neural Networks through a Summary of Higher-Order Derivatives

A Scalable and Generalizable Pathloss Map Prediction

Understanding the Role of Optimization in Double Descent

MICRO: Model-Based Offline Reinforcement Learning with a Conservative Bellman Operator

Resilience-Assuring Hydrogen-Powered Microgrids

Moirai: Towards Optimal Placement for Distributed Inference on Heterogeneous Devices

Improved Face Representation via Joint Label Classification and Supervised Contrastive Clustering

Queueing Delay Minimization in Overloaded Networks via Rate Control

Differentiable Registration of Images and LiDAR Point Clouds with VoxelPoint-to-Pixel Matching

Rate-splitting Multiple Access for Hierarchical HAP-LAP Networks under Limited Fronthaul

Edge computing service deployment and task offloading based on multi-task high-dimensional multi-objective optimization

Towards 4D Human Video Stylization

Zero-Touch Networks: Towards Next-Generation Network Automation

Cell segmentation of in situ transcriptomics data using signed graph partitioning

Adaptive Recursive Query Optimization

HARQ-IR Aided Short Packet Communications: BLER Analysis and Throughput Maximization

Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization

MIST: An Efficient Approach for Software-Defined Multicast in Wireless Mesh Networks

Using Large Language Models for Hyperparameter Optimization

Camera Height Doesn't Change: Unsupervised Monocular Scale-Aware Road-Scene Depth Estimation

EAGLES: Efficient Accelerated 3D Gaussians with Lightweight EncodingS

Keyword: adam

Keyword: gradient

PCDP-SGD: Improving the Convergence of Differentially Private SGD via Projection in Advance

AnimatableDreamer: Text-Guided Non-rigid 3D Model Generation and Reconstruction with Canonical Score Distillation

Adapting Newton's Method to Neural Networks through a Summary of Higher-Order Derivatives

On The Fairness Impacts of Hardware Selection in Machine Learning

Improving Gradient-guided Nested Sampling for Posterior Inference

MeanCut: A Greedy-Optimized Graph Clustering via Path-based Similarity and Degree Descent Criterion

Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning Interference with Gradient Projection

Merging by Matching Models in Task Subspaces

HARQ-IR Aided Short Packet Communications: BLER Analysis and Throughput Maximization

Adversarial Learning for Feature Shift Detection and Correction

Improved Visual Grounding through Self-Consistent Explanations

Keyword: super-resolution