New submissions for Wed, 6 Dec 23

Keyword: sgd

Can We Learn Communication-Efficient Optimizers?

Authors: Authors: Charles-Étienne Joseph, Benjamin Thérien, Abhinav Moudgil, Boris Knyazev, Eugene Belilovsky
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.02204
Pdf link: https://arxiv.org/pdf/2312.02204
Abstract Communication-efficient variants of SGD, specifically local SGD, have received a great deal of interest in recent years. These approaches compute multiple gradient steps locally, that is on each worker, before averaging model parameters, helping relieve the critical communication bottleneck in distributed deep learning training. Although many variants of these approaches have been proposed, they can sometimes lag behind state-of-the-art adaptive optimizers for deep learning. In this work, we investigate if the recent progress in the emerging area of learned optimizers can potentially close this gap while remaining communication-efficient. Specifically, we meta-learn how to perform global updates given an update from local SGD iterations. Our results demonstrate that learned optimizers can substantially outperform local SGD and its sophisticated variants while maintaining their communication efficiency. Learned optimizers can even generalize to unseen and much larger datasets and architectures, including ImageNet and ViTs, and to unseen modalities such as language modeling. We therefore demonstrate the potential of learned optimizers for improving communication-efficient distributed learning.
Auto DP-SGD: Dual Improvements of Privacy and Accuracy via Automatic Clipping Threshold and Noise Multiplier Estimation
Authors: Authors: Sai Venkatesh Chilukoti, Md Imran Hossen, Liqun Shan, Vijay Srinivas Tida, Xiai Hei
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2312.02400
Pdf link: https://arxiv.org/pdf/2312.02400
Abstract DP-SGD has emerged as a popular method to protect personally identifiable information in deep learning applications. Unfortunately, DP-SGD's per-sample gradient clipping and uniform noise addition during training can significantly degrade model utility. To enhance the model's utility, researchers proposed various adaptive DP-SGD methods. However, we examine and discover that these techniques result in greater privacy leakage or lower accuracy than the traditional DP-SGD method, or a lack of evaluation on a complex data set such as CIFAR100. To address these limitations, we propose an Auto DP-SGD. Our method automates clipping threshold estimation based on the DL model's gradient norm and scales the gradients of each training sample without losing gradient information. This helps to improve the algorithm's utility while using a less privacy budget. To further improve accuracy, we introduce automatic noise multiplier decay mechanisms to decrease the noise multiplier after every epoch. Finally, we develop closed-form mathematical expressions using tCDP accountant for automatic noise multiplier and automatic clipping threshold estimation. Through extensive experimentation, we demonstrate that Auto DP-SGD outperforms existing SOTA DP-SGD methods in privacy and accuracy on various benchmark datasets. We also show that privacy can be improved by lowering the scale factor and using learning rate schedulers without significantly reducing accuracy. Specifically, Auto DP-SGD, when used with a step noise multiplier, improves accuracy by 3.20, 1.57, 6.73, and 1.42 for the MNIST, CIFAR10, CIFAR100, and AG News Corpus datasets, respectively. Furthermore, it obtains a substantial reduction in the privacy budget of 94.9, 79.16, 67.36, and 53.37 for the corresponding data sets.
Flexible Communication for Optimal Distributed Learning over Unpredictable Networks
Authors: Authors: Sahil Tyagi, Martin Swany
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.02493
Pdf link: https://arxiv.org/pdf/2312.02493
Abstract Gradient compression alleviates expensive communication in distributed deep learning by sending fewer values and its corresponding indices, typically via Allgather (AG). Training with high compression ratio (CR) achieves high accuracy like DenseSGD, but has lower parallel scaling due to high communication cost (i.e., parallel efficiency). Using lower CRs improves parallel efficiency by lowering synchronization cost, but degrades model accuracy as well (statistical efficiency). Further, speedup attained with different models and CRs also varies with network latency, effective bandwidth and collective op used for aggregation. In many cases, collectives like Allreduce (AR) have lower cost than AG to exchange the same amount of data. In this paper, we propose an AR-compatible Topk compressor that is bandwidth-optimal and thus performs better than AG in certain network configurations. We develop a flexible communication strategy that switches between AG and AR based on which collective is optimal in the current settings, and model the pareto-relationship between parallel and statistical efficiency as a multi-objective optimization (MOO) problem to dynamically adjust CR and accelerate training while still converging to high accuracy.
Keyword: optimization

Efficient LDPC Decoding using Physical Computation
Authors: Authors: Uday Kumar Reddy Vengalam, Andrew Hahn, Yongchao Liu, Anshujit Sharma, Hui Wu, Michael Huang
Subjects: Information Theory (cs.IT); Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2312.02161
Pdf link: https://arxiv.org/pdf/2312.02161
Abstract Due to 5G deployment, there is significant interest in LDPC decoding. While much research is devoted on efficient hardwiring of algorithms based on Belief Propagation (BP), it has been shown that LDPC decoding can be formulated as a combinatorial optimization problem, which could benefit from significant acceleration of physical computation mechanisms such as Ising machines. This approach has so far resulted in poor performance. This paper shows that the reason is not fundamental but suboptimal hardware and formulation. A co-designed Ising machine-based system can improve speed by 3 orders of magnitude. As a result, a physical computation approach can outperform hardwiring state-of-the-art algorithms. In this paper, we show such an augmented Ising machine that is 4.4$\times$ more energy efficient than the state of the art in the literature.
Mercury: A modeling, simulation, and optimization framework for data stream-oriented IoT applications
Authors: Authors: Román Cárdenas, Patricia Arroba, Roberto Blanco, Pedro Malagón, José L. Risco-Martín, José M. Moya
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2312.02172
Pdf link: https://arxiv.org/pdf/2312.02172
Abstract The Internet of Things is transforming our society by monitoring users and infrastructures' behavior to enable new services that will improve life quality and resource management. These applications require a vast amount of localized information to be processed in real-time so, the deployment of new fog computing infrastructures that bring computing closer to the data sources is a major concern. In this context, we present Mercury, a Modeling, Simulation, and Optimization (M&S&O) framework to analyze the dimensioning and the dynamic operation of real-time fog computing scenarios. Our research proposes a location-aware solution that supports data stream analytics applications including FaaS-based computation offloading. Mercury implements a detailed structural and behavioral simulation model, providing fine-grained simulation outputs, and is described using the Discrete Event System Specification (DEVS) mathematical formalism, helping to validate the model's implementation. Finally, we present a case study using real traces from a driver assistance scenario, offering a detailed comparison with other state-of-the-art simulators.
How Generative-AI can be Effectively used in Government Chatbots
Authors: Authors: Zeteng Lin
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); General Economics (econ.GN)
Arxiv link: https://arxiv.org/abs/2312.02181
Pdf link: https://arxiv.org/pdf/2312.02181
Abstract With the rapid development of artificial intelligence and breakthroughs in machine learning and natural language processing, intelligent question-answering robots have become widely used in government affairs. This paper conducts a horizontal comparison between Guangdong Province's government chatbots, ChatGPT, and Wenxin Ernie, two large language models, to analyze the strengths and weaknesses of existing government chatbots and AIGC technology. The study finds significant differences between government chatbots and large language models. China's government chatbots are still in an exploratory stage and have a gap to close to achieve "intelligence." To explore the future direction of government chatbots more deeply, this research proposes targeted optimization paths to help generative AI be effectively applied in government chatbot conversations.
DiverseDream: Diverse Text-to-3D Synthesis with Augmented Text Embedding
Authors: Authors: Uy Dieu Tran, Minh Luu, Phong Nguyen, Janne Heikkila, Khoi Nguyen, Binh-Son Hua
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.02192
Pdf link: https://arxiv.org/pdf/2312.02192
Abstract Text-to-3D synthesis has recently emerged as a new approach to sampling 3D models by adopting pretrained text-to-image models as guiding visual priors. An intriguing but underexplored problem with existing text-to-3D methods is that 3D models obtained from the sampling-by-optimization procedure tend to have mode collapses, and hence poor diversity in their results. In this paper, we provide an analysis and identify potential causes of such a limited diversity, and then devise a new method that considers the joint generation of different 3D models from the same text prompt, where we propose to use augmented text prompts via textual inversion of reference images to diversify the joint generation. We show that our method leads to improved diversity in text-to-3D synthesis qualitatively and quantitatively.
TranSegPGD: Improving Transferability of Adversarial Examples on Semantic Segmentation
Authors: Authors: Xiaojun Jia, Jindong Gu, Yihao Huang, Simeng Qin, Qing Guo, Yang Liu, Xiaochun Cao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.02207
Pdf link: https://arxiv.org/pdf/2312.02207
Abstract Transferability of adversarial examples on image classification has been systematically explored, which generates adversarial examples in black-box mode. However, the transferability of adversarial examples on semantic segmentation has been largely overlooked. In this paper, we propose an effective two-stage adversarial attack strategy to improve the transferability of adversarial examples on semantic segmentation, dubbed TranSegPGD. Specifically, at the first stage, every pixel in an input image is divided into different branches based on its adversarial property. Different branches are assigned different weights for optimization to improve the adversarial performance of all pixels.We assign high weights to the loss of the hard-to-attack pixels to misclassify all pixels. At the second stage, the pixels are divided into different branches based on their transferable property which is dependent on Kullback-Leibler divergence. Different branches are assigned different weights for optimization to improve the transferability of the adversarial examples. We assign high weights to the loss of the high-transferability pixels to improve the transferability of adversarial examples. Extensive experiments with various segmentation models are conducted on PASCAL VOC 2012 and Cityscapes datasets to demonstrate the effectiveness of the proposed method. The proposed adversarial attack method can achieve state-of-the-art performance.
A Data-efficient Framework for Robotics Large-scale LiDAR Scene Parsing
Authors: Authors: Kangcheng Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2312.02208
Pdf link: https://arxiv.org/pdf/2312.02208
Abstract Existing state-of-the-art 3D point clouds understanding methods only perform well in a fully supervised manner. To the best of our knowledge, there exists no unified framework which simultaneously solves the downstream high-level understanding tasks, especially when labels are extremely limited. This work presents a general and simple framework to tackle point clouds understanding when labels are limited. We propose a novel unsupervised region expansion based clustering method for generating clusters. More importantly, we innovatively propose to learn to merge the over-divided clusters based on the local low-level geometric property similarities and the learned high-level feature similarities supervised by weak labels. Hence, the true weak labels guide pseudo labels merging taking both geometric and semantic feature correlations into consideration. Finally, the self-supervised reconstruction and data augmentation optimization modules are proposed to guide the propagation of labels among semantically similar points within a scene. Experimental Results demonstrate that our framework has the best performance among the three most important weakly supervised point clouds understanding tasks including semantic segmentation, instance segmentation, and object detection even when limited points are labeled, under the data-efficient settings for the large-scale 3D semantic scene parsing. The developed techniques have postentials to be applied to downstream tasks for better representations in robotic manipulation and robotic autonomous navigation. Codes and models are publicly available at: https://github.com/KangchengLiu.
JarviX: A LLM No code Platform for Tabular Data Analysis and Optimization
Authors: Authors: Shang-Ching Liu, ShengKun Wang, Wenqi Lin, Chung-Wei Hsiung, Yi-Chen Hsieh, Yu-Ping Cheng, Sian-Hong Luo, Tsungyao Chang, Jianwei Zhang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Databases (cs.DB); Applications (stat.AP)
Arxiv link: https://arxiv.org/abs/2312.02213
Pdf link: https://arxiv.org/pdf/2312.02213
Abstract In this study, we introduce JarviX, a sophisticated data analytics framework. JarviX is designed to employ Large Language Models (LLMs) to facilitate an automated guide and execute high-precision data analyzes on tabular datasets. This framework emphasizes the significance of varying column types, capitalizing on state-of-the-art LLMs to generate concise data insight summaries, propose relevant analysis inquiries, visualize data effectively, and provide comprehensive explanations for results drawn from an extensive data analysis pipeline. Moreover, JarviX incorporates an automated machine learning (AutoML) pipeline for predictive modeling. This integration forms a comprehensive and automated optimization cycle, which proves particularly advantageous for optimizing machine configuration. The efficacy and adaptability of JarviX are substantiated through a series of practical use case studies.
FlowHON: Representing Flow Fields Using Higher-Order Networks
Authors: Authors: Nan Chen, Zhihong Li, Jun Tao
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.02243
Pdf link: https://arxiv.org/pdf/2312.02243
Abstract Flow fields are often partitioned into data blocks for massively parallel computation and analysis based on blockwise relationships. However, most of the previous techniques only consider the first-order dependencies among blocks, which is insufficient in describing complex flow patterns. In this work, we present FlowHON, an approach to construct higher-order networks (HONs) from flow fields. FlowHON captures the inherent higher-order dependencies in flow fields as nodes and estimates the transitions among them as edges. We formulate the HON construction as an optimization problem with three linear transformations. The first two layers correspond to the node generation and the third one corresponds to edge estimation. Our formulation allows the node generation and edge estimation to be solved in a unified framework. With FlowHON, the rich set of traditional graph algorithms can be applied without any modification to analyze flow fields, while leveraging the higher-order information to understand the inherent structure and manage flow data for efficiency. We demonstrate the effectiveness of FlowHON using a series of downstream tasks, including estimating the density of particles during tracing, partitioning flow fields for data management, and understanding flow fields using the node-link diagram representation of networks.
Re-Nerfing: Enforcing Geometric Constraints on Neural Radiance Fields through Novel Views Synthesis
Authors: Authors: Felix Tristram, Stefano Gasperini, Federico Tombari, Nassir Navab, Benjamin Busam
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.02255
Pdf link: https://arxiv.org/pdf/2312.02255
Abstract Neural Radiance Fields (NeRFs) have shown remarkable novel view synthesis capabilities even in large-scale, unbounded scenes, albeit requiring hundreds of views or introducing artifacts in sparser settings. Their optimization suffers from shape-radiance ambiguities wherever only a small visual overlap is available. This leads to erroneous scene geometry and artifacts. In this paper, we propose Re-Nerfing, a simple and general multi-stage approach that leverages NeRF's own view synthesis to address these limitations. With Re-Nerfing, we increase the scene's coverage and enhance the geometric consistency of novel views as follows: First, we train a NeRF with the available views. Then, we use the optimized NeRF to synthesize pseudo-views next to the original ones to simulate a stereo or trifocal setup. Finally, we train a second NeRF with both original and pseudo views while enforcing structural, epipolar constraints via the newly synthesized images. Extensive experiments on the mip-NeRF 360 dataset show the effectiveness of Re-Nerfing across denser and sparser input scenarios, bringing improvements to the state-of-the-art Zip-NeRF, even when trained with all views.
CLIPDrawX: Primitive-based Explanations for Text Guided Sketch Synthesis
Authors: Authors: Nityanand Mathur, Shyam Marjit, Abhra Chaudhuri, Anjan Dutta
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.02345
Pdf link: https://arxiv.org/pdf/2312.02345
Abstract With the goal of understanding the visual concepts that CLIP associates with text prompts, we show that the latent space of CLIP can be visualized solely in terms of linear transformations on simple geometric primitives like circles and straight lines. Although existing approaches achieve this by sketch-synthesis-through-optimization, they do so on the space of B\'ezier curves, which exhibit a wastefully large set of structures that they can evolve into, as most of them are non-essential for generating meaningful sketches. We present CLIPDrawX, an algorithm that provides significantly better visualizations for CLIP text embeddings, using only simple primitive shapes like straight lines and circles. This constrains the set of possible outputs to linear transformations on these primitives, thereby exhibiting an inherently simpler mathematical form. The synthesis process of CLIPDrawX can be tracked end-to-end, with each visual concept being explained exclusively in terms of primitives. Implementation will be released upon acceptance. Project Page: $\href{https://clipdrawx.github.io/}{\text{https://clipdrawx.github.io/}}$.
Deep Neural Operator Enabled Concurrent Multitask Design for Multifunctional Metamaterials under Heterogeneous Fields
Authors: Authors: Doksoo Lee, Lu Zhang, Yue Yu, Wei Chen
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Arxiv link: https://arxiv.org/abs/2312.02403
Pdf link: https://arxiv.org/pdf/2312.02403
Abstract Multifunctional metamaterials (MMM) bear promise as next-generation material platforms supporting miniaturization and customization. Despite many proof-of-concept demonstrations and the proliferation of deep learning assisted design, grand challenges of inverse design for MMM, especially those involving heterogeneous fields possibly subject to either mutual meta-atom coupling or long-range interactions, remain largely under-explored. To this end, we present a data-driven design framework, which streamlines the inverse design of MMMs involving heterogeneous fields. A core enabler is implicit Fourier neural operator (IFNO), which predicts heterogeneous fields distributed across a metamaterial array, thus in general at odds with homogenization assumptions, in a parameter-/sample-efficient fashion. Additionally, we propose a standard formulation of inverse problem covering a broad class of MMMs, and gradient-based multitask concurrent optimization identifying a set of Pareto-optimal architecture-stimulus (A-S) pairs. Fourier multiclass blending is proposed to synthesize inter-class meta-atoms anchored on a set of geometric motifs, while enjoying training-free dimension reduction and built-it reconstruction. Interlocking the three pillars, the framework is validated for light-bylight programmable plasmonic nanoantenna, whose design involves vast space jointly spanned by quasi-freeform supercells, maneuverable incident phase distributions, and conflicting figure-of-merits involving on-demand localization patterns. Accommodating all the challenges without a-priori simplifications, our framework could propel future advancements of MMM.
GNSS Odometry: Precise Trajectory Estimation Based on Carrier Phase Cycle Slip Estimation
Authors: Authors: Taro Suzuki
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2312.02424
Pdf link: https://arxiv.org/pdf/2312.02424
Abstract This paper proposes a highly accurate trajectory estimation method for outdoor mobile robots using global navigation satellite system (GNSS) time differences of carrier phase (TDCP) measurements. By using GNSS TDCP, the relative 3D position can be estimated with millimeter precision. However, when a phenomenon called cycle slip occurs, wherein the carrier phase measurement jumps and becomes discontinuous, it is impossible to accurately estimate the relative position using TDCP. Although previous studies have eliminated the effect of cycle slip using a robust optimization technique, it was difficult to completely eliminate the effect of outliers. In this paper, we propose a method to detect GNSS carrier phase cycle slip, estimate the amount of cycle slip, and modify the observed TDCP to calculate the relative position using the factor graph optimization framework. The estimated relative position acts as a loop closure in graph optimization and contributes to the reduction in the integration error of the relative position. Experiments with an unmanned aerial vehicle showed that by modifying the cycle slip using the proposed method, the vehicle trajectory could be estimated with an accuracy of 5 to 30 cm using only a single GNSS receiver, without using any other external data or sensors.
PEFA: Parameter-Free Adapters for Large-scale Embedding-based Retrieval Models
Authors: Authors: Wei-Cheng Chang, Jyun-Yu Jiang, Jiong Zhang, Mutasem Al-Darabsah, Choon Hui Teo, Cho-Jui Hsieh, Hsiang-Fu Yu, S.V.N. Vishwanathan
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.02429
Pdf link: https://arxiv.org/pdf/2312.02429
Abstract Embedding-based Retrieval Models (ERMs) have emerged as a promising framework for large-scale text retrieval problems due to powerful large language models. Nevertheless, fine-tuning ERMs to reach state-of-the-art results can be expensive due to the extreme scale of data as well as the complexity of multi-stages pipelines (e.g., pre-training, fine-tuning, distillation). In this work, we propose the PEFA framework, namely ParamEter-Free Adapters, for fast tuning of ERMs without any backward pass in the optimization. At index building stage, PEFA equips the ERM with a non-parametric k-nearest neighbor (kNN) component. At inference stage, PEFA performs a convex combination of two scoring functions, one from the ERM and the other from the kNN. Based on the neighborhood definition, PEFA framework induces two realizations, namely PEFA-XL (i.e., extra large) using double ANN indices and PEFA-XS (i.e., extra small) using a single ANN index. Empirically, PEFA achieves significant improvement on two retrieval applications. For document retrieval, regarding Recall@100 metric, PEFA improves not only pre-trained ERMs on Trivia-QA by an average of 13.2%, but also fine-tuned ERMs on NQ-320K by an average of 5.5%, respectively. For product search, PEFA improves the Recall@100 of the fine-tuned ERMs by an average of 5.3% and 14.5%, for PEFA-XS and PEFA-XL, respectively. Our code is available at https://github.com/ amzn/pecos/tree/mainline/examples/pefa-wsdm24
FINER: Flexible spectral-bias tuning in Implicit NEural Representation by Variable-periodic Activation Functions
Authors: Authors: Zhen Liu, Hao Zhu, Qi Zhang, Jingde Fu, Weibing Deng, Zhan Ma, Yanwen Guo, Xun Cao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.02434
Pdf link: https://arxiv.org/pdf/2312.02434
Abstract Implicit Neural Representation (INR), which utilizes a neural network to map coordinate inputs to corresponding attributes, is causing a revolution in the field of signal processing. However, current INR techniques suffer from a restricted capability to tune their supported frequency set, resulting in imperfect performance when representing complex signals with multiple frequencies. We have identified that this frequency-related problem can be greatly alleviated by introducing variable-periodic activation functions, for which we propose FINER. By initializing the bias of the neural network within different ranges, sub-functions with various frequencies in the variable-periodic function are selected for activation. Consequently, the supported frequency set of FINER can be flexibly tuned, leading to improved performance in signal representation. We demonstrate the capabilities of FINER in the contexts of 2D image fitting, 3D signed distance field representation, and 5D neural radiance fields optimization, and we show that it outperforms existing INRs.
Time-Relative RTK-GNSS: GNSS Loop Closure in Pose Graph Optimization
Authors: Authors: Taro Suzuki
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2312.02448
Pdf link: https://arxiv.org/pdf/2312.02448
Abstract A pose-graph-based optimization technique is widely used to estimate robot poses using various sensor measurements from devices such as laser scanners and cameras. The global navigation satellite system (GNSS) has recently been used to estimate the absolute 3D position of outdoor mobile robots. However, since the accuracy of GNSS single-point positioning is only a few meters, the GNSS is not used for the loop closure of a pose graph. The main purpose of this study is to generate a loop closure of a pose graph using a time-relative real-time kinematic GNSS (TR-RTK-GNSS) technique. The proposed TR-RTK-GNSS technique uses time-differential carrier phase positioning, which is based on carrier-phase-based differential GNSS with a single GNSS receiver. Unlike a conventional RTK-GNSS, we can directly compute the robot's relative position using only a stand-alone GNSS receiver. The initial pose graph is generated from the accumulated velocity computed from GNSS Doppler measurements. To reduce the accumulated error of velocity, we use the TR-RTK-GNSS technique for the loop closure in the graph-based optimization framework. The kinematic positioning tests were performed using an unmanned aerial vehicle to confirm the effectiveness of the proposed technique. From the tests, we can estimate the vehicle's trajectory with approximately 3 cm accuracy using only a stand-alone GNSS receiver.
SAM-Assisted Remote Sensing Imagery Semantic Segmentation with Object and Boundary Constraints
Authors: Authors: Xianping Ma, Qianqian Wu, Xingyu Zhao, Xiaokang Zhang, Man-On Pun, Bo Huang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.02464
Pdf link: https://arxiv.org/pdf/2312.02464
Abstract Semantic segmentation of remote sensing imagery plays a pivotal role in extracting precise information for diverse down-stream applications. Recent development of the Segment Anything Model (SAM), an advanced general-purpose segmentation model, has revolutionized this field, presenting new avenues for accurate and efficient segmentation. However, SAM is limited to generating segmentation results without class information. Consequently, the utilization of such a powerful general vision model for semantic segmentation in remote sensing images has become a focal point of research. In this paper, we present a streamlined framework aimed at leveraging the raw output of SAM by exploiting two novel concepts called SAM-Generated Object (SGO) and SAM-Generated Boundary (SGB). More specifically, we propose a novel object loss and further introduce a boundary loss as augmentative components to aid in model optimization in a general semantic segmentation framework. Taking into account the content characteristics of SGO, we introduce the concept of object consistency to leverage segmented regions lacking semantic information. By imposing constraints on the consistency of predicted values within objects, the object loss aims to enhance semantic segmentation performance. Furthermore, the boundary loss capitalizes on the distinctive features of SGB by directing the model's attention to the boundary information of the object. Experimental results on two well-known datasets, namely ISPRS Vaihingen and LoveDA Urban, demonstrate the effectiveness of our proposed method. The source code for this work will be accessible at https://github.com/sstary/SSRS.
Flexible Communication for Optimal Distributed Learning over Unpredictable Networks
Authors: Authors: Sahil Tyagi, Martin Swany
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.02493
Pdf link: https://arxiv.org/pdf/2312.02493
Abstract Gradient compression alleviates expensive communication in distributed deep learning by sending fewer values and its corresponding indices, typically via Allgather (AG). Training with high compression ratio (CR) achieves high accuracy like DenseSGD, but has lower parallel scaling due to high communication cost (i.e., parallel efficiency). Using lower CRs improves parallel efficiency by lowering synchronization cost, but degrades model accuracy as well (statistical efficiency). Further, speedup attained with different models and CRs also varies with network latency, effective bandwidth and collective op used for aggregation. In many cases, collectives like Allreduce (AR) have lower cost than AG to exchange the same amount of data. In this paper, we propose an AR-compatible Topk compressor that is bandwidth-optimal and thus performs better than AG in certain network configurations. We develop a flexible communication strategy that switches between AG and AR based on which collective is optimal in the current settings, and model the pareto-relationship between parallel and statistical efficiency as a multi-objective optimization (MOO) problem to dynamically adjust CR and accelerate training while still converging to high accuracy.
Estimation of articulated angle in six-wheeled dump trucks using multiple GNSS receivers for autonomous driving
Authors: Authors: Taro Suzuki, Kazunori Ohno, Syotaro Kojima, Naoto Miyamoto, Takahiro Suzuki, Tomohiro Komatsu, Yukinori Shibata, Kimitaka Asano, Keiji Nagatani
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2312.02510
Pdf link: https://arxiv.org/pdf/2312.02510
Abstract Due to the declining birthrate and aging population, the shortage of labor in the construction industry has become a serious problem, and increasing attention has been paid to automation of construction equipment. We focus on the automatic operation of articulated six-wheel dump trucks at construction sites. For the automatic operation of the dump trucks, it is important to estimate the position and the articulated angle of the dump trucks with high accuracy. In this study, we propose a method for estimating the state of a dump truck by using four global navigation satellite systems (GNSSs) installed on an articulated dump truck and a graph optimization method that utilizes the redundancy of multiple GNSSs. By adding real-time kinematic (RTK)-GNSS constraints and geometric constraints between the four antennas, the proposed method can robustly estimate the position and articulation angle even in environments where GNSS satellites are partially blocked. As a result of evaluating the accuracy of the proposed method through field tests, it was confirmed that the articulated angle could be estimated with an accuracy of 0.1$^\circ$ in an open-sky environment and 0.7$^\circ$ in a mountainous area simulating an elevation angle of 45$^\circ$ where GNSS satellites are blocked.
Skipping Scheme for Gate-hiding Garbled Circuits
Authors: Authors: Ke Lin
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2312.02514
Pdf link: https://arxiv.org/pdf/2312.02514
Abstract In classic settings of garbled circuits, each gate type is leaked to improve both space and speed optimization. Zahur et al. have shown in EUROCRYPT 2015 that a typical linear garbling scheme requires at least two $\lambda$-bit elements per gate with a security parameter of $\lambda$, which limits their efficiency. In contrast to typical garbled circuits, gate-hiding garbled circuits have the potential to drastically reduce time costs, although they have been underappreciated. We propose the first skipping scheme for gate-hiding garbled circuits to enhance the efficiency of evaluation by observing prime implicants. Our scheme introduces skip gates to eliminate the need to calculate the entire circuit, enabling unnecessary execution paths to be avoided. We also introduce two variants of our scheme that balance security with parallelism. A proof of hybrid security that combines simulation-based and symmetry-based security in semi-honest scenarios is presented to demonstrate its security under gate-hiding conditions. Our scheme will inspire new directions to improve the general garbling scheme and lead to more practical ones.
BOgen: Generating Part-Level 3D Designs Based on User Intention Inference through Bayesian Optimization and Variational Autoencoder
Authors: Authors: Seung Won Lee, Jiin Choi, Kyung Hoon Hyun
Subjects: Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2312.02557
Pdf link: https://arxiv.org/pdf/2312.02557
Abstract Advancements in generative artificial intelligence (AI) have introduced various AI models capable of producing impressive visual design outputs. However, when it comes to AI models in the design process, prioritizing outputs that align with designers' needs over mere visual craftsmanship becomes even more crucial. Furthermore, designers often intricately combine parts of various designs to create novel designs. The ability to generate designs that align with the designers' intentions at the part level is pivotal for assisting designers. Hence, we introduced BOgen, which empowers designers to proactively generate and explore part-level designs through Bayesian optimization and variational autoencoders, thereby enhancing their overall user experience. We assessed BOgen's performance using a study involving 30 designers. The results revealed that, compared to the baseline, BOgen fulfilled the designer requirements for part recommendations and design exploration space guidance. BOgen assists designers in navigation and development, offering valuable design suggestions and fosters proactive design exploration and creation.
Prompt2NeRF-PIL: Fast NeRF Generation via Pretrained Implicit Latent
Authors: Authors: Jianmeng Liu, Yuyao Zhang, Zeyuan Meng, Yu-Wing Tai, Chi-Keung Tang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.02568
Pdf link: https://arxiv.org/pdf/2312.02568
Abstract This paper explores promptable NeRF generation (e.g., text prompt or single image prompt) for direct conditioning and fast generation of NeRF parameters for the underlying 3D scenes, thus undoing complex intermediate steps while providing full 3D generation with conditional control. Unlike previous diffusion-CLIP-based pipelines that involve tedious per-prompt optimizations, Prompt2NeRF-PIL is capable of generating a variety of 3D objects with a single forward pass, leveraging a pre-trained implicit latent space of NeRF parameters. Furthermore, in zero-shot tasks, our experiments demonstrate that the NeRFs produced by our method serve as semantically informative initializations, significantly accelerating the inference process of existing prompt-to-NeRF methods. Specifically, we will show that our approach speeds up the text-to-NeRF model DreamFusion and the 3D reconstruction speed of the image-to-NeRF method Zero-1-to-3 by 3 to 5 times.
TSVR+: Twin support vector regression with privileged information
Authors: Authors: Anuradha Kumari, M. Tanveer
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.02596
Pdf link: https://arxiv.org/pdf/2312.02596
Abstract In the realm of machine learning, the data may contain additional attributes, known as privileged information (PI). The main purpose of PI is to assist in the training of the model and then utilize the acquired knowledge to make predictions for unseen samples. Support vector regression (SVR) is an effective regression model, however, it has a low learning speed due to solving a convex quadratic problem (QP) subject to a pair of constraints. In contrast, twin support vector regression (TSVR) is more efficient than SVR as it solves two QPs each subject to one set of constraints. However, TSVR and its variants are trained only on regular features and do not use privileged features for training. To fill this gap, we introduce a fusion of TSVR with learning using privileged information (LUPI) and propose a novel approach called twin support vector regression with privileged information (TSVR+). The regularization terms in the proposed TSVR+ capture the essence of statistical learning theory and implement the structural risk minimization principle. We use the successive overrelaxation (SOR) technique to solve the optimization problem of the proposed TSVR+, which enhances the training efficiency. As far as our knowledge extends, the integration of the LUPI concept into twin variants of regression models is a novel advancement. The numerical experiments conducted on UCI, stock and time series data collectively demonstrate the superiority of the proposed model.
Prompt Optimization via Adversarial In-Context Learning
Authors: Authors: Xuan Long Do, Yiran Zhao, Hannah Brown, Yuxi Xie, James Xu Zhao, Nancy F. Chen, Kenji Kawaguchi, Michael Qizhe Xie, Junxian He
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2312.02614
Pdf link: https://arxiv.org/pdf/2312.02614
Abstract We propose a new method, Adversarial In-Context Learning (adv-ICL), to optimize prompt for in-context learning (ICL) by employing one LLM as a generator, another as a discriminator, and a third as a prompt modifier. As in traditional adversarial learning, adv-ICL is implemented as a two-player game between the generator and discriminator, where the generator tries to generate realistic enough output to fool the discriminator. In each round, given an input prefixed by task instructions and several exemplars, the generator produces an output. The discriminator is then tasked with classifying the generator input-output pair as model-generated or real data. Based on the discriminator loss, the prompt modifier proposes possible edits to the generator and discriminator prompts, and the edits that most improve the adversarial loss are selected. We show that adv-ICL results in significant improvements over state-of-the-art prompt optimization techniques for both open and closed-source models on 11 generation and classification tasks including summarization, arithmetic reasoning, machine translation, data-to-text generation, and the MMLU and big-bench hard benchmarks. In addition, because our method uses pre-trained models and updates only prompts rather than model parameters, it is computationally efficient, easy to extend to any LLM and task, and effective in low-resource settings.
On the Initialization of Graph Neural Networks
Authors: Authors: Jiahang Li, Yakun Song, Xiang Song, David Paul Wipf
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2312.02622
Pdf link: https://arxiv.org/pdf/2312.02622
Abstract Graph Neural Networks (GNNs) have displayed considerable promise in graph representation learning across various applications. The core learning process requires the initialization of model weight matrices within each GNN layer, which is typically accomplished via classic initialization methods such as Xavier initialization. However, these methods were originally motivated to stabilize the variance of hidden embeddings and gradients across layers of Feedforward Neural Networks (FNNs) and Convolutional Neural Networks (CNNs) to avoid vanishing gradients and maintain steady information flow. In contrast, within the GNN context classical initializations disregard the impact of the input graph structure and message passing on variance. In this paper, we analyze the variance of forward and backward propagation across GNN layers and show that the variance instability of GNN initializations comes from the combined effect of the activation function, hidden dimension, graph structure and message passing. To better account for these influence factors, we propose a new initialization method for Variance Instability Reduction within GNN Optimization (Virgo), which naturally tends to equate forward and backward variances across successive layers. We conduct comprehensive experiments on 15 datasets to show that Virgo can lead to superior model performance and more stable variance at initialization on node classification, link prediction and graph classification tasks. Codes are in https://github.com/LspongebobJH/virgo_icml2023.
TPA3D: Triplane Attention for Fast Text-to-3D Generation
Authors: Authors: Hong-En Chen, Bin-Shih Wu, Sheng-Yu Huang, Yu-Chiang Frank Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.02647
Pdf link: https://arxiv.org/pdf/2312.02647
Abstract Due to the lack of large-scale text-3D correspondence data, recent text-to-3D generation works mainly rely on utilizing 2D diffusion models for synthesizing 3D data. Since diffusion-based methods typically require significant optimization time for both training and inference, the use of GAN-based models would still be desirable for fast 3D generation. In this work, we propose Triplane Attention for text-guided 3D generation (TPA3D), an end-to-end trainable GAN-based deep learning model for fast text-to-3D generation. With only 3D shape data and their rendered 2D images observed during training, our TPA3D is designed to retrieve detailed visual descriptions for synthesizing the corresponding 3D mesh data. This is achieved by the proposed attention mechanisms on the extracted sentence and word-level text features. In our experiments, we show that TPA3D generates high-quality 3D textured shapes aligned with fine-grained descriptions, while impressive computation efficiency can be observed.
Towards the Inferrence of Structural Similarity of Combinatorial Landscapes
Authors: Authors: Mingyu Huang, Ke Li
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2312.02720
Pdf link: https://arxiv.org/pdf/2312.02720
Abstract One of the most common problem-solving heuristics is by analogy. For a given problem, a solver can be viewed as a strategic walk on its fitness landscape. Thus if a solver works for one problem instance, we expect it will also be effective for other instances whose fitness landscapes essentially share structural similarities with each other. However, due to the black-box nature of combinatorial optimization, it is far from trivial to infer such similarity in real-world scenarios. To bridge this gap, by using local optima network as a proxy of fitness landscapes, this paper proposed to leverage graph data mining techniques to conduct qualitative and quantitative analyses to explore the latent topological structural information embedded in those landscapes. By conducting large-scale empirical experiments on three classic combinatorial optimization problems, we gain concrete evidence to support the existence of structural similarity between landscapes of the same classes within neighboring dimensions. We also interrogated the relationship between landscapes of different problem classes.
D-LGP: Dynamic Logic-Geometric Program for Combined Task and Motion Planning
Authors: Authors: Teng Xue, Amirreza Razmjoo, Sylvain Calinon
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2312.02731
Pdf link: https://arxiv.org/pdf/2312.02731
Abstract Many real-world sequential manipulation tasks involve a combination of discrete symbolic search and continuous motion planning, collectively known as combined task and motion planning (TAMP). However, prevailing methods often struggle with the computational burden and intricate combinatorial challenges stemming from the multitude of action skeletons. To address this, we propose Dynamic Logic-Geometric Program (D-LGP), a novel approach integrating Dynamic Tree Search and global optimization for efficient hybrid planning. Through empirical evaluation on three benchmarks, we demonstrate the efficacy of our approach, showcasing superior performance in comparison to state-of-the-art techniques. We validate our approach through simulation and demonstrate its capability for online replanning under uncertainty and external disturbances in the real world.
Geometric Data-Driven Dimensionality Reduction in MPC with Guarantees
Authors: Authors: Roland Schurig, Andreas Himmel, Rolf Findeisen
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2312.02734
Pdf link: https://arxiv.org/pdf/2312.02734
Abstract We consider the problem of reducing the dimension of the discrete-time optimal control problem that is solved repeatedly online in model predictive control. We show that a reduced-order scheme, which solves the optimization problem in a low-dimensional subspace, inherits the stability and recursive feasibility properties from the original formulation. We introduce a necessary and sufficient condition for initial feasibility and incorporate that in the subspace design. Finally, we use concepts of optimization over Riemannian manifolds to compute a subspace that provides optimal representations for a set of pre-defined high-dimensional optimizers under the initial admissibility constraint.
Integrating Plug-and-Play Data Priors with Weighted Prediction Error for Speech Dereverberation
Authors: Authors: Ziye Yang, Wenxing Yang, Kai Xie, Jie Chen
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2312.02773
Pdf link: https://arxiv.org/pdf/2312.02773
Abstract Speech dereverberation aims to alleviate the detrimental effects of late-reverberant components. While the weighted prediction error (WPE) method has shown superior performance in dereverberation, there is still room for further improvement in terms of performance and robustness in complex and noisy environments. Recent research has highlighted the effectiveness of integrating physics-based and data-driven methods, enhancing the performance of various signal processing tasks while maintaining interpretability. Motivated by these advancements, this paper presents a novel dereverberation frame-work, which incorporates data-driven methods for capturing speech priors within the WPE framework. The plug-and-play strategy (PnP), specifically the regularization by denoising (RED) strategy, is utilized to incorporate speech prior information learnt from data during the optimization problem solving iterations. Experimental results validate the effectiveness of the proposed approach.
On Age of Information and Energy-Transfer in a STAR-RIS-assisted System
Authors: Authors: Mohammad Reza Kavianinia, Mohammad Mehdi Setoode, Mohammad Javad Emadi
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2312.02776
Pdf link: https://arxiv.org/pdf/2312.02776
Abstract Battery-limited devices and time-sensitive applications are considered as key players in the forthcoming wireless sensor network. Therefore, the main goal of the network is two-fold; Charge battery-limited devices, and provide status updates to users where information-freshness matters. In this paper, a multi-antenna base station (BS) in assistance of simultaneously-transmitting-and-reflecting reconfigurable intelligent surface (STAR-RIS) transmits power to energy-harvesting devices while controlling status update performance at information-users by analyzing age of information (AoI) metric. Therefore, we derive a scheduling policy at BS, and analyze joint transmit beamforming and amplitude-phase optimization at BS and STAR-RIS, respectively, to reduce average sum-AoI for the time-sensitive information-users while satisfying minimum required energy at energy-harvesting users. Moreover, two different energy-splitting and mode switching policies at STAR-RIS are studied. Then, by use of an alternating optimization algorithm, the optimization problem is studied and non-convexity of the problem is tackled by using the successive convex approximation technique. Through numerical results, AoI-metric and energy harvesting requirements of the network are analyzed versus different parameters such as number of antennas at BS, size of STAR-RIS, and transmitted power to highlight how we can improve two fold performance of the system by utilizing STAR-RIS compared to the conventional RIS structure.
Low-complexity Linear Multicast Beamforming for Cache-aided MIMO Communications
Authors: Authors: Mohammad NaseriTehrani, MohammadJavad Salehi, Antti Tölli
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2312.02839
Pdf link: https://arxiv.org/pdf/2312.02839
Abstract A practical and scalable multicast beamformer design in multi-input multi-output~(MIMO) coded caching~(CC) systems is introduced in this paper. The proposed approach allows multicast transmission to multiple groups with partially overlapping user sets using receiver dimensions to distinguish between different group-specific streams. Additionally, it provides flexibility in accommodating various parameter configurations of the MIMO-CC setup and overcomes practical limitations, such as the requirement to use successive interference cancellation~(SIC) at the receiver, while achieving the same degrees-of-freedom~(DoF). To evaluate the proposed scheme, we define the symmetric rate as the sum rate of the partially overlapping streams received per user, comprising a linear multistream multicast transmission vector and the linear minimum mean square error~(LMMSE) receiver. The resulting non-convex symmetric rate maximization problem is solved using alternative optimization and successive convex approximation~(SCA). Moreover, a fast iterative Lagrangian-based algorithm is developed, significantly reducing the computational overhead compared to previous designs. The effectiveness of our proposed method is demonstrated by extensive simulations.
Realistic Scatterer Based Adversarial Attacks on SAR Image Classifiers
Authors: Authors: Tian Ye, Rajgopal Kannan, Viktor Prasanna, Carl Busart, Lance Kaplan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.02912
Pdf link: https://arxiv.org/pdf/2312.02912
Abstract Adversarial attacks have highlighted the vulnerability of classifiers based on machine learning for Synthetic Aperture Radar (SAR) Automatic Target Recognition (ATR) tasks. An adversarial attack perturbs SAR images of on-ground targets such that the classifiers are misled into making incorrect predictions. However, many existing attacking techniques rely on arbitrary manipulation of SAR images while overlooking the feasibility of executing the attacks on real-world SAR imagery. Instead, adversarial attacks should be able to be implemented by physical actions, for example, placing additional false objects as scatterers around the on-ground target to perturb the SAR image and fool the SAR ATR. In this paper, we propose the On-Target Scatterer Attack (OTSA), a scatterer-based physical adversarial attack. To ensure the feasibility of its physical execution, we enforce a constraint on the positioning of the scatterers. Specifically, we restrict the scatterers to be placed only on the target instead of in the shadow regions or the background. To achieve this, we introduce a positioning score based on Gaussian kernels and formulate an optimization problem for our OTSA attack. Using a gradient ascent method to solve the optimization problem, the OTSA can generate a vector of parameters describing the positions, shapes, sizes and amplitudes of the scatterers to guide the physical execution of the attack that will mislead SAR image classifiers. The experimental results show that our attack obtains significantly higher success rates under the positioning constraint compared with the existing method.
A Sparsity Approach to Scheduling and Control of Networked Systems
Authors: Authors: Anubhab Dasgupta, Atreyee Kundu
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2312.02915
Pdf link: https://arxiv.org/pdf/2312.02915
Abstract We study the design of scheduling logic and control logic for networked control systems (NCSs) where plants communicate with their remotely located controllers over a shared band-limited communication network. Due to a limited capacity of the network, only a subset of the plants can exchange information with their controllers at any instant of time and the remaining plants operate in open-loop. Our key contribution is a new algorithm that co-designs (a) an allocation scheme of the communication network among the plants (scheduling logic) and (b) the control inputs for the plants accessing the network (control logic) under which given non-zero initial states are steered to zero in a given time horizon for all the plants in the NCS. Sparse optimization is the primary apparatus for our analysis. We also provide sufficient conditions on the plant dynamics, capacity of the communication network and the given time horizon that lead to a numerically tractable implementation of our algorithm. A numerical experiment is presented to demonstrate the proposed results.
MIND: Multi-Task Incremental Network Distillation
Authors: Authors: Jacopo Bonato, Francesco Pelosin, Luigi Sabetta, Alessandro Nicolosi
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.02916
Pdf link: https://arxiv.org/pdf/2312.02916
Abstract The recent surge in pervasive devices generating dynamic data streams has underscored the necessity for learning systems to adapt to data distributional shifts continually. To tackle this challenge, the research community has put forth a spectrum of methodologies, including the demanding pursuit of class-incremental learning without replay data. In this study, we present MIND, a parameter isolation method that aims to significantly enhance the performance of replay-free solutions and achieve state-of-the-art results on several widely studied datasets. Our approach introduces two main contributions: two alternative distillation procedures that significantly improve the efficiency of MIND increasing the accumulated knowledge of each sub-network, and the optimization of the BachNorm layers across tasks inside the sub-networks. Overall, MIND outperforms all the state-of-the-art methods for rehearsal-free Class-Incremental learning (with an increment in classification accuracy of approx. +6% on CIFAR-100/10 and +10% on TinyImageNet/10) reaching up to approx. +40% accuracy in Domain-Incremental scenarios. Moreover, we ablated each contribution to demonstrate its impact on performance improvement. Our results showcase the superior performance of MIND indicating its potential for addressing the challenges posed by Class-incremental and Domain-Incremental learning in resource-constrained environments.
Fine-grained Controllable Video Generation via Object Appearance and Context
Authors: Authors: Hsin-Ping Huang, Yu-Chuan Su, Deqing Sun, Lu Jiang, Xuhui Jia, Yukun Zhu, Ming-Hsuan Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.02919
Pdf link: https://arxiv.org/pdf/2312.02919
Abstract Text-to-video generation has shown promising results. However, by taking only natural languages as input, users often face difficulties in providing detailed information to precisely control the model's output. In this work, we propose fine-grained controllable video generation (FACTOR) to achieve detailed control. Specifically, FACTOR aims to control objects' appearances and context, including their location and category, in conjunction with the text prompt. To achieve detailed control, we propose a unified framework to jointly inject control signals into the existing text-to-video model. Our model consists of a joint encoder and adaptive cross-attention layers. By optimizing the encoder and the inserted layer, we adapt the model to generate videos that are aligned with both text prompts and fine-grained control. Compared to existing methods relying on dense control signals such as edge maps, we provide a more intuitive and user-friendly interface to allow object-level fine-grained control. Our method achieves controllability of object appearances without finetuning, which reduces the per-subject optimization efforts for the users. Extensive experiments on standard benchmark datasets and user-provided inputs validate that our model obtains a 70% improvement in controllability metrics over competitive baselines.
An alternating peak-optimization method for optimal trajectory generation of quadrotor drones
Authors: Authors: Wytze A.B. de Vries, Ming Li, Qirui Song, Zhiyong Sun
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2312.02944
Pdf link: https://arxiv.org/pdf/2312.02944
Abstract In this paper, we propose an alternating optimization method to address a time-optimal trajectory generation problem. Different from the existing solutions, our approach introduces a new formulation that minimizes the overall trajectory running time while maintaining the polynomial smoothness constraints and incorporating hard limits on motion derivatives to ensure feasibility. To address this problem, an alternating peak-optimization method is developed, which splits the optimization process into two sub-optimizations: the first sub-optimization optimizes polynomial coefficients for smoothness, and the second sub-optimization adjusts the time allocated to each trajectory segment. These are alternated until a feasible minimum-time solution is found. We offer a comprehensive set of simulations and experiments to showcase the superior performance of our approach in comparison to existing methods. A collection of demonstration videos with real drone flying experiments can be accessed at https://www.youtube.com/playlist?list=PLQGtPFK17zUYkwFT-fr0a8E49R8Uq712l .
GauHuman: Articulated Gaussian Splatting from Monocular Human Videos
Authors: Authors: Shoukang Hu, Ziwei Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.02973
Pdf link: https://arxiv.org/pdf/2312.02973
Abstract We present, GauHuman, a 3D human model with Gaussian Splatting for both fast training (1 ~ 2 minutes) and real-time rendering (up to 189 FPS), compared with existing NeRF-based implicit representation modelling frameworks demanding hours of training and seconds of rendering per frame. Specifically, GauHuman encodes Gaussian Splatting in the canonical space and transforms 3D Gaussians from canonical space to posed space with linear blend skinning (LBS), in which effective pose and LBS refinement modules are designed to learn fine details of 3D humans under negligible computational cost. Moreover, to enable fast optimization of GauHuman, we initialize and prune 3D Gaussians with 3D human prior, while splitting/cloning via KL divergence guidance, along with a novel merge operation for further speeding up. Extensive experiments on ZJU_Mocap and MonoCap datasets demonstrate that GauHuman achieves state-of-the-art performance quantitatively and qualitatively with fast training and real-time rendering speed. Notably, without sacrificing rendering quality, GauHuman can fast model the 3D human performer with ~13k 3D Gaussians.
Keyword: adam

Adam-like Algorithm with Smooth Clipping Attains Global Minima: Analysis Based on Ergodicity of Functional SDEs
Authors: Authors: Keisuke Suzuki
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Probability (math.PR)
Arxiv link: https://arxiv.org/abs/2312.02182
Pdf link: https://arxiv.org/pdf/2312.02182
Abstract In this paper, we prove that an Adam-type algorithm with smooth clipping approaches the global minimizer of the regularized non-convex loss function. Adding smooth clipping and taking the state space as the set of all trajectories, we can apply the ergodic theory of Markov semigroups for this algorithm and investigate its asymptotic behavior. The ergodic theory we establish in this paper reduces the problem of evaluating the convergence, generalization error and discretization error of this algorithm to the problem of evaluating the difference between two functional stochastic differential equations (SDEs) with different drift coefficients. As a result of our analysis, we have shown that this algorithm minimizes the the regularized non-convex loss function with errors of the form $n^{-1/2}$, $\eta^{1/4}$, $\beta^{-1} \log (\beta + 1)$ and $e^{- c t}$. Here, $c$ is a constant and $n$, $\eta$, $\beta$ and $t$ denote the size of the training dataset, learning rate, inverse temperature and time, respectively.
Keyword: gradient

Training Chain-of-Thought via Latent-Variable Inference
Authors: Authors: Du Phan, Matthew D. Hoffman, David Dohan, Sholto Douglas, Tuan Anh Le, Aaron Parisi, Pavel Sountsov, Charles Sutton, Sharad Vikram, Rif A. Saurous
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2312.02179
Pdf link: https://arxiv.org/pdf/2312.02179
Abstract Large language models (LLMs) solve problems more accurately and interpretably when instructed to work out the answer step by step using a ``chain-of-thought'' (CoT) prompt. One can also improve LLMs' performance on a specific task by supervised fine-tuning, i.e., by using gradient ascent on some tunable parameters to maximize the average log-likelihood of correct answers from a labeled training set. Naively combining CoT with supervised tuning requires supervision not just of the correct answers, but also of detailed rationales that lead to those answers; these rationales are expensive to produce by hand. Instead, we propose a fine-tuning strategy that tries to maximize the \emph{marginal} log-likelihood of generating a correct answer using CoT prompting, approximately averaging over all possible rationales. The core challenge is sampling from the posterior over rationales conditioned on the correct answer; we address it using a simple Markov-chain Monte Carlo (MCMC) expectation-maximization (EM) algorithm inspired by the self-taught reasoner (STaR), memoized wake-sleep, Markovian score climbing, and persistent contrastive divergence. This algorithm also admits a novel control-variate technique that drives the variance of our gradient estimates to zero as the model improves. Applying our technique to GSM8K and the tasks in BIG-Bench Hard, we find that this MCMC-EM fine-tuning technique typically improves the model's accuracy on held-out examples more than STaR or prompt-tuning with or without CoT.
Can We Learn Communication-Efficient Optimizers?
Authors: Authors: Charles-Étienne Joseph, Benjamin Thérien, Abhinav Moudgil, Boris Knyazev, Eugene Belilovsky
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2312.02204
Pdf link: https://arxiv.org/pdf/2312.02204
Abstract Communication-efficient variants of SGD, specifically local SGD, have received a great deal of interest in recent years. These approaches compute multiple gradient steps locally, that is on each worker, before averaging model parameters, helping relieve the critical communication bottleneck in distributed deep learning training. Although many variants of these approaches have been proposed, they can sometimes lag behind state-of-the-art adaptive optimizers for deep learning. In this work, we investigate if the recent progress in the emerging area of learned optimizers can potentially close this gap while remaining communication-efficient. Specifically, we meta-learn how to perform global updates given an update from local SGD iterations. Our results demonstrate that learned optimizers can substantially outperform local SGD and its sophisticated variants while maintaining their communication efficiency. Learned optimizers can even generalize to unseen and much larger datasets and architectures, including ImageNet and ViTs, and to unseen modalities such as language modeling. We therefore demonstrate the potential of learned optimizers for improving communication-efficient distributed learning.
Low-Precision Mixed-Computation Models for Inference on Edge
Authors: Authors: Seyedarmin Azizi, Mahdi Nazemi, Mehdi Kamal, Massoud Pedram
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2312.02210
Pdf link: https://arxiv.org/pdf/2312.02210
Abstract This paper presents a mixed-computation neural network processing approach for edge applications that incorporates low-precision (low-width) Posit and low-precision fixed point (FixP) number systems. This mixed-computation approach employs 4-bit Posit (Posit4), which has higher precision around zero, for representing weights with high sensitivity, while it uses 4-bit FixP (FixP4) for representing other weights. A heuristic for analyzing the importance and the quantization error of the weights is presented to assign the proper number system to different weights. Additionally, a gradient approximation for Posit representation is introduced to improve the quality of weight updates in the backpropagation process. Due to the high energy consumption of the fully Posit-based computations, neural network operations are carried out in FixP or Posit/FixP. An efficient hardware implementation of a MAC operation with a first Posit operand and FixP for a second operand and accumulator is presented. The efficacy of the proposed low-precision mixed-computation approach is extensively assessed on vision and language models. The results show that, on average, the accuracy of the mixed-computation is about 1.5% higher than that of FixP with a cost of 0.19% energy overhead.
Innovations in Agricultural Forecasting: A Multivariate Regression Study on Global Crop Yield Prediction
Authors: Authors: Ishaan Gupta, Samyutha Ayalasomayajula, Yashas Shashidhara, Anish Kataria, Shreyas Shashidhara, Krishita Kataria, Aditya Undurti
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2312.02254
Pdf link: https://arxiv.org/pdf/2312.02254
Abstract The prediction of crop yields internationally is a crucial objective in agricultural research. Thus, this study implements 6 regression models (Linear, Tree, Gradient Descent, Gradient Boosting, K- Nearest Neighbors, and Random Forest) to predict crop yields in 196 countries. Given 4 key training parameters, pesticides (tonnes), rainfall (mm), temperature (Celsius), and yield (hg/ha), it was found that our Random Forest Regression model achieved a determination coefficient (r^2) of 0.94, with a margin of error (ME) of .03. The models were trained and tested using the Food and Agricultural Organization of the United Nations data, along with the World Bank Climate Change Data Catalog. Furthermore, each parameter was analyzed to understand how varying factors could impact overall yield. We used unconventional models, contrary to generally used Deep Learning (DL) and Machine Learning (ML) models, combined with recently collected data to implement a unique approach in our research. Existing scholarship would benefit from understanding the most optimal model for agricultural research, specifically using the United Nations data.
Auto DP-SGD: Dual Improvements of Privacy and Accuracy via Automatic Clipping Threshold and Noise Multiplier Estimation
Authors: Authors: Sai Venkatesh Chilukoti, Md Imran Hossen, Liqun Shan, Vijay Srinivas Tida, Xiai Hei
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2312.02400
Pdf link: https://arxiv.org/pdf/2312.02400
Abstract DP-SGD has emerged as a popular method to protect personally identifiable information in deep learning applications. Unfortunately, DP-SGD's per-sample gradient clipping and uniform noise addition during training can significantly degrade model utility. To enhance the model's utility, researchers proposed various adaptive DP-SGD methods. However, we examine and discover that these techniques result in greater privacy leakage or lower accuracy than the traditional DP-SGD method, or a lack of evaluation on a complex data set such as CIFAR100. To address these limitations, we propose an Auto DP-SGD. Our method automates clipping threshold estimation based on the DL model's gradient norm and scales the gradients of each training sample without losing gradient information. This helps to improve the algorithm's utility while using a less privacy budget. To further improve accuracy, we introduce automatic noise multiplier decay mechanisms to decrease the noise multiplier after every epoch. Finally, we develop closed-form mathematical expressions using tCDP accountant for automatic noise multiplier and automatic clipping threshold estimation. Through extensive experimentation, we demonstrate that Auto DP-SGD outperforms existing SOTA DP-SGD methods in privacy and accuracy on various benchmark datasets. We also show that privacy can be improved by lowering the scale factor and using learning rate schedulers without significantly reducing accuracy. Specifically, Auto DP-SGD, when used with a step noise multiplier, improves accuracy by 3.20, 1.57, 6.73, and 1.42 for the MNIST, CIFAR10, CIFAR100, and AG News Corpus datasets, respectively. Furthermore, it obtains a substantial reduction in the privacy budget of 94.9, 79.16, 67.36, and 53.37 for the corresponding data sets.
Deep Neural Operator Enabled Concurrent Multitask Design for Multifunctional Metamaterials under Heterogeneous Fields
Authors: Authors: Doksoo Lee, Lu Zhang, Yue Yu, Wei Chen
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Arxiv link: https://arxiv.org/abs/2312.02403
Pdf link: https://arxiv.org/pdf/2312.02403
Abstract Multifunctional metamaterials (MMM) bear promise as next-generation material platforms supporting miniaturization and customization. Despite many proof-of-concept demonstrations and the proliferation of deep learning assisted design, grand challenges of inverse design for MMM, especially those involving heterogeneous fields possibly subject to either mutual meta-atom coupling or long-range interactions, remain largely under-explored. To this end, we present a data-driven design framework, which streamlines the inverse design of MMMs involving heterogeneous fields. A core enabler is implicit Fourier neural operator (IFNO), which predicts heterogeneous fields distributed across a metamaterial array, thus in general at odds with homogenization assumptions, in a parameter-/sample-efficient fashion. Additionally, we propose a standard formulation of inverse problem covering a broad class of MMMs, and gradient-based multitask concurrent optimization identifying a set of Pareto-optimal architecture-stimulus (A-S) pairs. Fourier multiclass blending is proposed to synthesize inter-class meta-atoms anchored on a set of geometric motifs, while enjoying training-free dimension reduction and built-it reconstruction. Interlocking the three pillars, the framework is validated for light-bylight programmable plasmonic nanoantenna, whose design involves vast space jointly spanned by quasi-freeform supercells, maneuverable incident phase distributions, and conflicting figure-of-merits involving on-demand localization patterns. Accommodating all the challenges without a-priori simplifications, our framework could propel future advancements of MMM.
Towards Fast and Stable Federated Learning: Confronting Heterogeneity via Knowledge Anchor
Authors: Authors: Jinqian Chen, Jihua Zhu, Qinghai Zheng
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2312.02416
Pdf link: https://arxiv.org/pdf/2312.02416
Abstract Federated learning encounters a critical challenge of data heterogeneity, adversely affecting the performance and convergence of the federated model. Various approaches have been proposed to address this issue, yet their effectiveness is still limited. Recent studies have revealed that the federated model suffers severe forgetting in local training, leading to global forgetting and performance degradation. Although the analysis provides valuable insights, a comprehensive understanding of the vulnerable classes and their impact factors is yet to be established. In this paper, we aim to bridge this gap by systematically analyzing the forgetting degree of each class during local training across different communication rounds. Our observations are: (1) Both missing and non-dominant classes suffer similar severe forgetting during local training, while dominant classes show improvement in performance. (2) When dynamically reducing the sample size of a dominant class, catastrophic forgetting occurs abruptly when the proportion of its samples is below a certain threshold, indicating that the local model struggles to leverage a few samples of a specific class effectively to prevent forgetting. Motivated by these findings, we propose a novel and straightforward algorithm called Federated Knowledge Anchor (FedKA). Assuming that all clients have a single shared sample for each class, the knowledge anchor is constructed before each local training stage by extracting shared samples for missing classes and randomly selecting one sample per class for non-dominant classes. The knowledge anchor is then utilized to correct the gradient of each mini-batch towards the direction of preserving the knowledge of the missing and non-dominant classes. Extensive experimental results demonstrate that our proposed FedKA achieves fast and stable convergence, significantly improving accuracy on popular benchmarks.
Generator Born from Classifier
Authors: Authors: Runpeng Yu, Xinchao Wang
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.02470
Pdf link: https://arxiv.org/pdf/2312.02470
Abstract In this paper, we make a bold attempt toward an ambitious task: given a pre-trained classifier, we aim to reconstruct an image generator, without relying on any data samples. From a black-box perspective, this challenge seems intractable, since it inevitably involves identifying the inverse function for a classifier, which is, by nature, an information extraction process. As such, we resort to leveraging the knowledge encapsulated within the parameters of the neural network. Grounded on the theory of Maximum-Margin Bias of gradient descent, we propose a novel learning paradigm, in which the generator is trained to ensure that the convergence conditions of the network parameters are satisfied over the generated distribution of the samples. Empirical validation from various image generation tasks substantiates the efficacy of our strategy.
Flexible Communication for Optimal Distributed Learning over Unpredictable Networks
Authors: Authors: Sahil Tyagi, Martin Swany
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.02493
Pdf link: https://arxiv.org/pdf/2312.02493
Abstract Gradient compression alleviates expensive communication in distributed deep learning by sending fewer values and its corresponding indices, typically via Allgather (AG). Training with high compression ratio (CR) achieves high accuracy like DenseSGD, but has lower parallel scaling due to high communication cost (i.e., parallel efficiency). Using lower CRs improves parallel efficiency by lowering synchronization cost, but degrades model accuracy as well (statistical efficiency). Further, speedup attained with different models and CRs also varies with network latency, effective bandwidth and collective op used for aggregation. In many cases, collectives like Allreduce (AR) have lower cost than AG to exchange the same amount of data. In this paper, we propose an AR-compatible Topk compressor that is bandwidth-optimal and thus performs better than AG in certain network configurations. We develop a flexible communication strategy that switches between AG and AR based on which collective is optimal in the current settings, and model the pareto-relationship between parallel and statistical efficiency as a multi-objective optimization (MOO) problem to dynamically adjust CR and accelerate training while still converging to high accuracy.
UTBoost: A Tree-boosting based System for Uplift Modeling
Authors: Authors: Junjie Gao, Xiangyu Zheng, DongDong Wang, Zhixiang Huang, Bangqi Zheng, Kai Yang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2312.02573
Pdf link: https://arxiv.org/pdf/2312.02573
Abstract Uplift modeling refers to the set of machine learning techniques that a manager may use to estimate customer uplift, that is, the net effect of an action on some customer outcome. By identifying the subset of customers for whom a treatment will have the greatest effect, uplift models assist decision-makers in optimizing resource allocations and maximizing overall returns. Accurately estimating customer uplift poses practical challenges, as it requires assessing the difference between two mutually exclusive outcomes for each individual. In this paper, we propose two innovative adaptations of the well-established Gradient Boosting Decision Trees (GBDT) algorithm, which learn the causal effect in a sequential way and overcome the counter-factual nature. Both approaches innovate existing techniques in terms of ensemble learning method and learning objectives, respectively. Experiments on large-scale datasets demonstrate the usefulness of the proposed methods, which often yielding remarkable improvements over base models. To facilitate the application, we develop the UTBoost, an end-to-end tree boosting system specifically designed for uplift modeling. The package is open source and has been optimized for training speed to meet the needs of real industrial applications.
On the Initialization of Graph Neural Networks
Authors: Authors: Jiahang Li, Yakun Song, Xiang Song, David Paul Wipf
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2312.02622
Pdf link: https://arxiv.org/pdf/2312.02622
Abstract Graph Neural Networks (GNNs) have displayed considerable promise in graph representation learning across various applications. The core learning process requires the initialization of model weight matrices within each GNN layer, which is typically accomplished via classic initialization methods such as Xavier initialization. However, these methods were originally motivated to stabilize the variance of hidden embeddings and gradients across layers of Feedforward Neural Networks (FNNs) and Convolutional Neural Networks (CNNs) to avoid vanishing gradients and maintain steady information flow. In contrast, within the GNN context classical initializations disregard the impact of the input graph structure and message passing on variance. In this paper, we analyze the variance of forward and backward propagation across GNN layers and show that the variance instability of GNN initializations comes from the combined effect of the activation function, hidden dimension, graph structure and message passing. To better account for these influence factors, we propose a new initialization method for Variance Instability Reduction within GNN Optimization (Virgo), which naturally tends to equate forward and backward variances across successive layers. We conduct comprehensive experiments on 15 datasets to show that Virgo can lead to superior model performance and more stable variance at initialization on node classification, link prediction and graph classification tasks. Codes are in https://github.com/LspongebobJH/virgo_icml2023.
Do AI models produce better weather forecasts than physics-based models? A quantitative evaluation case study of Storm Ciarán
Authors: Authors: Andrew J. Charlton-Perez, Helen F. Dacre, Simon Driscoll, Suzanne L. Gray, Ben Harvey, Natalie J. Harvey, Kieran M. R. Hunt, Robert W. Lee, Ranjini Swaminathan, Remy Vandaele, Ambrogio Volonté
Subjects: Machine Learning (cs.LG); Atmospheric and Oceanic Physics (physics.ao-ph)
Arxiv link: https://arxiv.org/abs/2312.02658
Pdf link: https://arxiv.org/pdf/2312.02658
Abstract There has been huge recent interest in the potential of making operational weather forecasts using machine learning techniques. As they become a part of the weather forecasting toolbox, there is a pressing need to understand how well current machine learning models can simulate high-impactweather events. We compare forecasts of Storm Ciar\'an, a European windstorm that caused sixteen deaths and extensive damage in Northern Europe, made by machine learning and numericalweather prediction models. The four machine learning models considered (FourCastNet, Pangu-Weather, GraphCast and FourCastNet-v2) produce forecasts that accurately capture the synoptic-scale structure of the cyclone including the position of the cloud head, shape of the warm sector and location of warm conveyor belt jet, and the large-scale dynamical drivers important for the rapid storm development such as the position of the storm relative to the upper-level jet exit. However, their ability to resolve the more detailed structures important for issuing weather warnings is more mixed. All of the machine learning models underestimate the peak amplitude of winds associated with the storm, only some machine learning models resolve the warm core seclusion and none of the machine learning models capture the sharp bent-back warm frontal gradient. Our study shows there is a great deal about the performance and properties of machine learning weather forecasts that can be derived from case studies of high-impact weather events such as Storm Ciar\'an.
A Conditional Denoising Diffusion Probabilistic Model for Point Cloud Upsampling
Authors: Authors: Wentao Qu, Yuantian Shao, Lingwu Meng, Xiaoshui Huang, Liang Xiao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.02719
Pdf link: https://arxiv.org/pdf/2312.02719
Abstract Point cloud upsampling (PCU) enriches the representation of raw point clouds, significantly improving the performance in downstream tasks such as classification and reconstruction. Most of the existing point cloud upsampling methods focus on sparse point cloud feature extraction and upsampling module design. In a different way, we dive deeper into directly modelling the gradient of data distribution from dense point clouds. In this paper, we proposed a conditional denoising diffusion probability model (DDPM) for point cloud upsampling, called PUDM. Specifically, PUDM treats the sparse point cloud as a condition, and iteratively learns the transformation relationship between the dense point cloud and the noise. Simultaneously, PUDM aligns with a dual mapping paradigm to further improve the discernment of point features. In this context, PUDM enables learning complex geometry details in the ground truth through the dominant features, while avoiding an additional upsampling module design. Furthermore, to generate high-quality arbitrary-scale point clouds during inference, PUDM exploits the prior knowledge of the scale between sparse point clouds and dense point clouds during training by parameterizing a rate factor. Moreover, PUDM exhibits strong noise robustness in experimental results. In the quantitative and qualitative evaluations on PU1K and PUGAN, PUDM significantly outperformed existing methods in terms of Chamfer Distance (CD) and Hausdorff Distance (HD), achieving state of the art (SOTA) performance.
Scaling-up Memristor Monte Carlo with magnetic domain-wall physics
Authors: Authors: Thomas Dalgaty, Shogo Yamada, Anca Molnos, Eiji Kawasaki, Thomas Mesquida, François Rummens, Tatsuo Shibata, Yukihiro Urakawa, Yukio Terasaki, Tomoyuki Sasaki, Marc Duranton
Subjects: Emerging Technologies (cs.ET); Applied Physics (physics.app-ph)
Arxiv link: https://arxiv.org/abs/2312.02771
Pdf link: https://arxiv.org/pdf/2312.02771
Abstract By exploiting the intrinsic random nature of nanoscale devices, Memristor Monte Carlo (MMC) is a promising enabler of edge learning systems. However, due to multiple algorithmic and device-level limitations, existing demonstrations have been restricted to very small neural network models and datasets. We discuss these limitations, and describe how they can be overcome, by mapping the stochastic gradient Langevin dynamics (SGLD) algorithm onto the physics of magnetic domain-wall Memristors to scale-up MMC models by five orders of magnitude. We propose the push-pull pulse programming method that realises SGLD in-physics, and use it to train a domain-wall based ResNet18 on the CIFAR-10 dataset. On this task, we observe no performance degradation relative to a floating point model down to an update precision of between 6 and 7-bits, indicating we have made a step towards a large-scale edge learning system leveraging noisy analogue devices.
Score-Aware Policy-Gradient Methods and Performance Guarantees using Local Lyapunov Conditions: Applications to Product-Form Stochastic Networks and Queueing Systems
Authors: Authors: Céline Comte, Matthieu Jonckheere, Jaron Sanders, Albert Senen-Cerda
Subjects: Machine Learning (cs.LG); Performance (cs.PF); Optimization and Control (math.OC); Probability (math.PR)
Arxiv link: https://arxiv.org/abs/2312.02804
Pdf link: https://arxiv.org/pdf/2312.02804
Abstract Stochastic networks and queueing systems often lead to Markov decision processes (MDPs) with large state and action spaces as well as nonconvex objective functions, which hinders the convergence of many reinforcement learning (RL) algorithms. Policy-gradient methods perform well on MDPs with large state and action spaces, but they sometimes experience slow convergence due to the high variance of the gradient estimator. In this paper, we show that some of these difficulties can be circumvented by exploiting the structure of the underlying MDP. We first introduce a new family of gradient estimators called score-aware gradient estimators (SAGEs). When the stationary distribution of the MDP belongs to an exponential family parametrized by the policy parameters, SAGEs allow us to estimate the policy gradient without relying on value-function estimation, contrary to classical policy-gradient methods like actor-critic. To demonstrate their applicability, we examine two common control problems arising in stochastic networks and queueing systems whose stationary distributions have a product-form, a special case of exponential families. As a second contribution, we show that, under appropriate assumptions, the policy under a SAGE-based policy-gradient method has a large probability of converging to an optimal policy, provided that it starts sufficiently close to it, even with a nonconvex objective function and multiple maximizers. Our key assumptions are that, locally around a maximizer, a nondegeneracy property of the Hessian of the objective function holds and a Lyapunov function exists. Finally, we conduct a numerical comparison between a SAGE-based policy-gradient method and an actor-critic algorithm. The results demonstrate that the SAGE-based method finds close-to-optimal policies more rapidly, highlighting its superior performance over the traditional actor-critic method.
Realistic Scatterer Based Adversarial Attacks on SAR Image Classifiers
Authors: Authors: Tian Ye, Rajgopal Kannan, Viktor Prasanna, Carl Busart, Lance Kaplan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2312.02912
Pdf link: https://arxiv.org/pdf/2312.02912
Abstract Adversarial attacks have highlighted the vulnerability of classifiers based on machine learning for Synthetic Aperture Radar (SAR) Automatic Target Recognition (ATR) tasks. An adversarial attack perturbs SAR images of on-ground targets such that the classifiers are misled into making incorrect predictions. However, many existing attacking techniques rely on arbitrary manipulation of SAR images while overlooking the feasibility of executing the attacks on real-world SAR imagery. Instead, adversarial attacks should be able to be implemented by physical actions, for example, placing additional false objects as scatterers around the on-ground target to perturb the SAR image and fool the SAR ATR. In this paper, we propose the On-Target Scatterer Attack (OTSA), a scatterer-based physical adversarial attack. To ensure the feasibility of its physical execution, we enforce a constraint on the positioning of the scatterers. Specifically, we restrict the scatterers to be placed only on the target instead of in the shadow regions or the background. To achieve this, we introduce a positioning score based on Gaussian kernels and formulate an optimization problem for our OTSA attack. Using a gradient ascent method to solve the optimization problem, the OTSA can generate a vector of parameters describing the positions, shapes, sizes and amplitudes of the scatterers to guide the physical execution of the attack that will mislead SAR image classifiers. The experimental results show that our attack obtains significantly higher success rates under the positioning constraint compared with the existing method.
Keyword: super-resolution

Conditional Variational Diffusion Models
Authors: Authors: Gabriel della Maggiora, Luis Alberto Croquevielle, Nikita Desphande, Harry Horsley, Thomas Heinis, Artur Yakimovich
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2312.02246
Pdf link: https://arxiv.org/pdf/2312.02246
Abstract Inverse problems aim to determine parameters from observations, a crucial task in engineering and science. Lately, generative models, especially diffusion models, have gained popularity in this area for their ability to produce realistic solutions and their good mathematical properties. Despite their success, an important drawback of diffusion models is their sensitivity to the choice of variance schedule, which controls the dynamics of the diffusion process. Fine-tuning this schedule for specific applications is crucial but time-costly and does not guarantee an optimal result. We propose a novel approach for learning the schedule as part of the training process. Our method supports probabilistic conditioning on data, provides high-quality solutions, and is flexible, proving able to adapt to different applications with minimum overhead. This approach is tested in two unrelated inverse problems: super-resolution microscopy and quantitative phase imaging, yielding comparable or superior results to previous methods and fine-tuned diffusion models. We conclude that fine-tuning the schedule by experimentation should be avoided because it can be learned during training in a stable way that yields better results.

zoq / arxiv-updates

New submissions for Wed, 6 Dec 23 #658

Keyword: sgd

Can We Learn Communication-Efficient Optimizers?

Auto DP-SGD: Dual Improvements of Privacy and Accuracy via Automatic Clipping Threshold and Noise Multiplier Estimation

Flexible Communication for Optimal Distributed Learning over Unpredictable Networks

Keyword: optimization

Efficient LDPC Decoding using Physical Computation

Mercury: A modeling, simulation, and optimization framework for data stream-oriented IoT applications

How Generative-AI can be Effectively used in Government Chatbots

DiverseDream: Diverse Text-to-3D Synthesis with Augmented Text Embedding

TranSegPGD: Improving Transferability of Adversarial Examples on Semantic Segmentation

A Data-efficient Framework for Robotics Large-scale LiDAR Scene Parsing

JarviX: A LLM No code Platform for Tabular Data Analysis and Optimization

FlowHON: Representing Flow Fields Using Higher-Order Networks

Re-Nerfing: Enforcing Geometric Constraints on Neural Radiance Fields through Novel Views Synthesis

CLIPDrawX: Primitive-based Explanations for Text Guided Sketch Synthesis

Deep Neural Operator Enabled Concurrent Multitask Design for Multifunctional Metamaterials under Heterogeneous Fields

GNSS Odometry: Precise Trajectory Estimation Based on Carrier Phase Cycle Slip Estimation

PEFA: Parameter-Free Adapters for Large-scale Embedding-based Retrieval Models

FINER: Flexible spectral-bias tuning in Implicit NEural Representation by Variable-periodic Activation Functions

Time-Relative RTK-GNSS: GNSS Loop Closure in Pose Graph Optimization

SAM-Assisted Remote Sensing Imagery Semantic Segmentation with Object and Boundary Constraints

Flexible Communication for Optimal Distributed Learning over Unpredictable Networks

Estimation of articulated angle in six-wheeled dump trucks using multiple GNSS receivers for autonomous driving

Skipping Scheme for Gate-hiding Garbled Circuits

BOgen: Generating Part-Level 3D Designs Based on User Intention Inference through Bayesian Optimization and Variational Autoencoder

Prompt2NeRF-PIL: Fast NeRF Generation via Pretrained Implicit Latent

TSVR+: Twin support vector regression with privileged information

Prompt Optimization via Adversarial In-Context Learning

On the Initialization of Graph Neural Networks

TPA3D: Triplane Attention for Fast Text-to-3D Generation

Towards the Inferrence of Structural Similarity of Combinatorial Landscapes

D-LGP: Dynamic Logic-Geometric Program for Combined Task and Motion Planning

Geometric Data-Driven Dimensionality Reduction in MPC with Guarantees

Integrating Plug-and-Play Data Priors with Weighted Prediction Error for Speech Dereverberation

On Age of Information and Energy-Transfer in a STAR-RIS-assisted System

Low-complexity Linear Multicast Beamforming for Cache-aided MIMO Communications

Realistic Scatterer Based Adversarial Attacks on SAR Image Classifiers

A Sparsity Approach to Scheduling and Control of Networked Systems

MIND: Multi-Task Incremental Network Distillation

Fine-grained Controllable Video Generation via Object Appearance and Context

An alternating peak-optimization method for optimal trajectory generation of quadrotor drones

GauHuman: Articulated Gaussian Splatting from Monocular Human Videos

Keyword: adam

Adam-like Algorithm with Smooth Clipping Attains Global Minima: Analysis Based on Ergodicity of Functional SDEs

Keyword: gradient

Training Chain-of-Thought via Latent-Variable Inference

Can We Learn Communication-Efficient Optimizers?

Low-Precision Mixed-Computation Models for Inference on Edge

Innovations in Agricultural Forecasting: A Multivariate Regression Study on Global Crop Yield Prediction

Auto DP-SGD: Dual Improvements of Privacy and Accuracy via Automatic Clipping Threshold and Noise Multiplier Estimation

Deep Neural Operator Enabled Concurrent Multitask Design for Multifunctional Metamaterials under Heterogeneous Fields

Towards Fast and Stable Federated Learning: Confronting Heterogeneity via Knowledge Anchor

Generator Born from Classifier

Flexible Communication for Optimal Distributed Learning over Unpredictable Networks

UTBoost: A Tree-boosting based System for Uplift Modeling

On the Initialization of Graph Neural Networks

Do AI models produce better weather forecasts than physics-based models? A quantitative evaluation case study of Storm Ciarán

A Conditional Denoising Diffusion Probabilistic Model for Point Cloud Upsampling

Scaling-up Memristor Monte Carlo with magnetic domain-wall physics

Score-Aware Policy-Gradient Methods and Performance Guarantees using Local Lyapunov Conditions: Applications to Product-Form Stochastic Networks and Queueing Systems

Realistic Scatterer Based Adversarial Attacks on SAR Image Classifiers

Keyword: super-resolution

Conditional Variational Diffusion Models