New submissions for Mon, 2 Oct 23

Keyword: sgd

There is no result

Keyword: optimization

Alternate Learning based Sparse Semantic Communications for Visual Transmission

Authors: Siyu Tong, Xiaoxue Yu, Rongpeng Li, Kun Lu, Zhifeng Zhao, Honggang Zhang
Subjects: Information Theory (cs.IT); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2309.16681
Pdf link: https://arxiv.org/pdf/2309.16681
Abstract Semantic communication (SemCom) demonstrates strong superiority over conventional bit-level accurate transmission, by only attempting to recover the essential semantic information of data. In this paper, in order to tackle the non-differentiability of channels, we propose an alternate learning based SemCom system for visual transmission, named SparseSBC. Specifically, SparseSBC leverages two separate Deep Neural Network (DNN)-based models at the transmitter and receiver, respectively, and learns the encoding and decoding in an alternate manner, rather than the joint optimization in existing literature, so as to solve the non-differentiability in the channel. In particular, a "self-critic" training scheme is leveraged for stable training. Moreover, the DNN-based transmitter generates a sparse set of bits in deduced "semantic bases", by further incorporating a binary quantization module on the basis of minimal detrimental effect to the semantic accuracy. Extensive simulation results validate that SparseSBC shows efficient and effective transmission performance under various channel conditions, and outperforms typical SemCom solutions.
Autonomous Guidance Navigation and Control of the VISORS Formation-Flying Mission
Authors: Tommaso Guffanti, Toby Bell, Samuel Y. W. Low, Mason Murray-Cooper, Simone D'Amico
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.16698
Pdf link: https://arxiv.org/pdf/2309.16698
Abstract Virtual Super-resolution Optics with Reconfigurable Swarms (VISORS) is a distributed telescope mission for high-resolution imaging of the Sun using two 6U CubeSats flying in formation in a Sun-synchronous low-Earth orbit. An optics spacecraft carries a photon sieve acting as a high-resolution lens in the extreme ultraviolet spectrum, while the image passing through the sieve is focused on a detector spacecraft. This paper presents the newly conceived design of the on-board guidance, navigation and control (GNC) system, which is highly autonomous, robust, passively safe, and validated under realistic mission simulations. The primary objective of the GNC system is to establish a passively safe and high-precision formation alignment at 40-meter separation, with sub-centimeter relative navigation and position control accuracy, over repeated observations of 10-second duration. Science mission success rates are assessed via Monte-Carlo analyses under realistically modelled uncertainties stemming from sensing errors, maneuver errors, unmodelled dynamics, and erroneous knowledge of internal spacecraft components. Precise real-time relative navigation is achieved by carrier phase differential GPS with integer ambiguity resolution. Precise control over short baselines is achieved via closed-loop optimization-based stochastic model predictive control with centimeter-level accuracy. Control at far range and during approach is achieved by closed-form impulsive control with meter-level accuracy. Passive safety is enforced throughout the mission to mitigate collision risks even under critical subsystem failure. Beyond VISORS, this work also realizes the crucial insight that the described GNC architecture is generalizable to other distributed space missions where accuracy and fault-tolerant safety are key requirements, such as rendezvous, proximity operations, and swarming missions.
AIR: Threats of Adversarial Attacks on Deep Learning-Based Information Recovery
Authors: Jinyin Chen, Jie Ge, Shilian Zheng, Linhui Ye, Haibin Zheng, Weiguo Shen, Keqiang Yue, Xiaoniu Yang
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.16706
Pdf link: https://arxiv.org/pdf/2309.16706
Abstract A wireless communications system usually consists of a transmitter which transmits the information and a receiver which recovers the original information from the received distorted signal. Deep learning (DL) has been used to improve the performance of the receiver in complicated channel environments and state-of-the-art (SOTA) performance has been achieved. However, its robustness has not been investigated. In order to evaluate the robustness of DL-based information recovery models under adversarial circumstances, we investigate adversarial attacks on the SOTA DL-based information recovery model, i.e., DeepReceiver. We formulate the problem as an optimization problem with power and peak-to-average power ratio (PAPR) constraints. We design different adversarial attack methods according to the adversary's knowledge of DeepReceiver's model and/or testing samples. Extensive experiments show that the DeepReceiver is vulnerable to the designed attack methods in all of the considered scenarios. Even when access to both the model and the test samples is restricted, the adversary can attack the DeepReceiver and increase its bit error rate (BER) above 10%. It can also be found that the DeepReceiver is vulnerable to adversarial perturbations even with very low power and limited PAPR. These results suggest that defense measures should be taken to enhance the robustness of DeepReceiver.
Joint Participation Incentive and Network Pricing Design for Federated Learning
Authors: Ningning Ding, Lin Gao, Jianwei Huang
Subjects: Networking and Internet Architecture (cs.NI); Computer Science and Game Theory (cs.GT)
Arxiv link: https://arxiv.org/abs/2309.16712
Pdf link: https://arxiv.org/pdf/2309.16712
Abstract Federated learning protects users' data privacy through sharing users' local model parameters (instead of raw data) with a server. However, when a massive number of users train a large machine learning model through federated learning, the dynamically varying and often heavy communication overhead can put significant pressure on the network operator. The operator may choose to dynamically change the network prices in response, which will eventually affect the payoffs of the server and users. This paper considers the under-explored yet important issue of the joint design of participation incentives (for encouraging users' contribution to federated learning) and network pricing (for managing network resources). Due to heterogeneous users' private information and multi-dimensional decisions, the optimization problems in Stage I of multi-stage games are non-convex. Nevertheless, we are able to analytically derive the corresponding optimal contract and pricing mechanism through proper transformations of constraints, variables, and functions, under both vertical and horizontal interaction structures of the participants. We show that the vertical structure is better than the horizontal one, as it avoids the misalignment of interests between the server and the network operator. Numerical results based on real-world datasets show that our proposed mechanisms decrease the server's cost by up to 24.87% compared with the state-of-the-art benchmarks.
Energy Efficient Foot-Shape Design for Bipedal Walkers on Granular Terrain
Authors: Xunjie Chen, Jingang Yi, Hao Wang
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2309.16720
Pdf link: https://arxiv.org/pdf/2309.16720
Abstract It is important to understand how bipedal walkers balance and walk effectively on granular materials, such as sand and loose dirt. This paper first presents a computational approach to obtain the motion and energy analysis of bipedal walkers on granular terrains and then discusses an optimization method for the robot foot-shape contour design for energy-efficient walking. We first present the foot-terrain interaction characteristics of the intrusion process using the resistive force theory that provides comprehensive force laws. Using human gait profiles, we compute and compare the ground reaction forces and the external work for walking gaits with various foot shapes on granular terrains. A multi-objective optimization problem is finally formulated for the foot contour design considering energy saving and walking efficiency. It is interesting to find that a non-convex foot shape gives the best performance in terms of energy and locomotion efficiency on hard granular terrains. The presented work provides an enabling tool to further understand and design efficient and effective bipedal walkers on granular terrains.
XVO: Generalized Visual Odometry via Cross-Modal Self-Training
Authors: Lei Lai, Zhongkai Shangguan, Jimuyang Zhang, Eshed Ohn-Bar
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.16772
Pdf link: https://arxiv.org/pdf/2309.16772
Abstract We propose XVO, a semi-supervised learning method for training generalized monocular Visual Odometry (VO) models with robust off-the-shelf operation across diverse datasets and settings. In contrast to standard monocular VO approaches which often study a known calibration within a single dataset, XVO efficiently learns to recover relative pose with real-world scale from visual scene semantics, i.e., without relying on any known camera parameters. We optimize the motion estimation model via self-training from large amounts of unconstrained and heterogeneous dash camera videos available on YouTube. Our key contribution is twofold. First, we empirically demonstrate the benefits of semi-supervised training for learning a general-purpose direct VO regression network. Second, we demonstrate multi-modal supervision, including segmentation, flow, depth, and audio auxiliary prediction tasks, to facilitate generalized representations for the VO task. Specifically, we find the audio prediction task to significantly enhance the semi-supervised learning process while alleviating noisy pseudo-labels, particularly in highly dynamic and out-of-domain video data. Our proposed teacher network achieves state-of-the-art performance on the commonly used KITTI benchmark despite no multi-frame optimization or knowledge of camera parameters. Combined with the proposed semi-supervised step, XVO demonstrates off-the-shelf knowledge transfer across diverse conditions on KITTI, nuScenes, and Argoverse without fine-tuning.
Photonic Accelerators for Image Segmentation in Autonomous Driving and Defect Detection
Authors: Lakshmi Nair, David Widemann, Brad Turcott, Nick Moore, Alexandra Wleklinski, Darius Bunandar, Ioannis Papavasileiou, Shihu Wang, Eric Logan
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2309.16783
Pdf link: https://arxiv.org/pdf/2309.16783
Abstract Photonic computing promises faster and more energy-efficient deep neural network (DNN) inference than traditional digital hardware. Advances in photonic computing can have profound impacts on applications such as autonomous driving and defect detection that depend on fast, accurate and energy efficient execution of image segmentation models. In this paper, we investigate image segmentation on photonic accelerators to explore: a) the types of image segmentation DNN architectures that are best suited for photonic accelerators, and b) the throughput and energy efficiency of executing the different image segmentation models on photonic accelerators, along with the trade-offs involved therein. Specifically, we demonstrate that certain segmentation models exhibit negligible loss in accuracy (compared to digital float32 models) when executed on photonic accelerators, and explore the empirical reasoning for their robustness. We also discuss techniques for recovering accuracy in the case of models that do not perform well. Further, we compare throughput (inferences-per-second) and energy consumption estimates for different image segmentation workloads on photonic accelerators. We discuss the challenges and potential optimizations that can help improve the application of photonic accelerators to such computer vision tasks.
Agent Coordination via Contextual Regression (AgentCONCUR) for Data Center Flexibility
Authors: Vladimir Dvorkin
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2309.16792
Pdf link: https://arxiv.org/pdf/2309.16792
Abstract A network of spatially distributed data centers can provide operational flexibility to power systems by shifting computing tasks among electrically remote locations. However, harnessing this flexibility in real-time through the standard optimization techniques is challenged by the need for sensitive operational datasets and substantial computational resources. To alleviate the data and computational requirements, this paper introduces a coordination mechanism based on contextual regression. This mechanism, abbreviated as AgentCONCUR, associates cost-optimal task shifts with public and trusted contextual data (e.g., real-time prices) and uses regression on this data as a coordination policy. Notably, regression-based coordination does not learn the optimal coordination actions from a labeled dataset. Instead, it exploits the optimization structure of the coordination problem to ensure feasible and cost-effective actions. A NYISO-based study reveals large coordination gains and the optimal features for the successful regression-based coordination.
SatDM: Synthesizing Realistic Satellite Image with Semantic Layout Conditioning using Diffusion Models
Authors: Orkhan Baghirli, Hamid Askarov, Imran Ibrahimli, Ismat Bakhishov, Nabi Nabiyev
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2309.16812
Pdf link: https://arxiv.org/pdf/2309.16812
Abstract Deep learning models in the Earth Observation domain heavily rely on the availability of large-scale accurately labeled satellite imagery. However, obtaining and labeling satellite imagery is a resource-intensive endeavor. While generative models offer a promising solution to address data scarcity, their potential remains underexplored. Recently, Denoising Diffusion Probabilistic Models (DDPMs) have demonstrated significant promise in synthesizing realistic images from semantic layouts. In this paper, a conditional DDPM model capable of taking a semantic map and generating high-quality, diverse, and correspondingly accurate satellite images is implemented. Additionally, a comprehensive illustration of the optimization dynamics is provided. The proposed methodology integrates cutting-edge techniques such as variance learning, classifier-free guidance, and improved noise scheduling. The denoising network architecture is further complemented by the incorporation of adaptive normalization and self-attention mechanisms, enhancing the model's capabilities. The effectiveness of our proposed model is validated using a meticulously labeled dataset introduced within the context of this study. Validation encompasses both algorithmic methods such as Frechet Inception Distance (FID) and Intersection over Union (IoU), as well as a human opinion study. Our findings indicate that the generated samples exhibit minimal deviation from real ones, opening doors for practical applications such as data augmentation. We look forward to further explorations of DDPMs in a wider variety of settings and data modalities. An open-source reference implementation of the algorithm and a link to the benchmarked dataset are provided at https://github.com/obaghirli/syn10-diffusion.
Stochastic Implicit Neural Signed Distance Functions for Safe Motion Planning under Sensing Uncertainty
Authors: Carlos Quintero-Peña, Wil Thomason, Zachary Kingston, Anastasios Kyrillidis, Lydia E. Kavraki
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.16862
Pdf link: https://arxiv.org/pdf/2309.16862
Abstract Motion planning under sensing uncertainty is critical for robots in unstructured environments to guarantee safety for both the robot and any nearby humans. Most work on planning under uncertainty does not scale to high-dimensional robots such as manipulators, assumes simplified geometry of the robot or environment, or requires per-object knowledge of noise. Instead, we propose a method that directly models sensor-specific aleatoric uncertainty to find safe motions for high-dimensional systems in complex environments, without exact knowledge of environment geometry. We combine a novel implicit neural model of stochastic signed distance functions with a hierarchical optimization-based motion planner to plan low-risk motions without sacrificing path quality. Our method also explicitly bounds the risk of the path, offering trustworthiness. We empirically validate that our method produces safe motions and accurate risk bounds and is safer than baseline approaches.
Stochastic Digital Twin for Copy Detection Patterns
Authors: Yury Belousov, Olga Taran, Vitaliy Kinakh, Slava Voloshynovskiy
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2309.16866
Pdf link: https://arxiv.org/pdf/2309.16866
Abstract Copy detection patterns (CDP) present an efficient technique for product protection against counterfeiting. However, the complexity of studying CDP production variability often results in time-consuming and costly procedures, limiting CDP scalability. Recent advancements in computer modelling, notably the concept of a "digital twin" for printing-imaging channels, allow for enhanced scalability and the optimization of authentication systems. Yet, the development of an accurate digital twin is far from trivial. This paper extends previous research which modelled a printing-imaging channel using a machine learning-based digital twin for CDP. This model, built upon an information-theoretic framework known as "Turbo", demonstrated superior performance over traditional generative models such as CycleGAN and pix2pix. However, the emerging field of Denoising Diffusion Probabilistic Models (DDPM) presents a potential advancement in generative models due to its ability to stochastically model the inherent randomness of the printing-imaging process, and its impressive performance in image-to-image translation tasks. This study aims at comparing the capabilities of the Turbo framework and DDPM on the same CDP datasets, with the goal of establishing the real-world benefits of DDPM models for digital twin applications in CDP security. Furthermore, the paper seeks to evaluate the generative potential of the studied models in the context of mobile phone data acquisition. Despite the increased complexity of DDPM methods when compared to traditional approaches, our study highlights their advantages and explores their potential for future applications.
Predicting Object Interactions with Behavior Primitives: An Application in Stowing Tasks
Authors: Haonan Chen, Yilong Niu, Kaiwen Hong, Shuijing Liu, Yixuan Wang, Yunzhu, Katherine Driggs-Campbell
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.16873
Pdf link: https://arxiv.org/pdf/2309.16873
Abstract Stowing, the task of placing objects in cluttered shelves or bins, is a common task in warehouse and manufacturing operations. However, this task is still predominantly carried out by human workers as stowing is challenging to automate due to the complex multi-object interactions and long-horizon nature of the task. Previous works typically involve extensive data collection and costly human labeling of semantic priors across diverse object categories. This paper presents a method to learn a generalizable robot stowing policy from a predictive model of object interactions and a single demonstration with behavior primitives. We propose a novel framework that utilizes Graph Neural Networks to predict object interactions within the parameter space of behavioral primitives. We further employ primitive-augmented trajectory optimization to search the parameters of a predefined library of heterogeneous behavioral primitives to instantiate the control action. Our framework enables robots to proficiently execute long-horizon stowing tasks with a few keyframes (3-4) from a single demonstration. Despite being trained solely in simulation, our framework demonstrates remarkable generalization capabilities. It efficiently adapts to a broad spectrum of real-world conditions, including various shelf widths, fluctuating quantities of objects, and objects with diverse attributes such as sizes and shapes.
Sourcing Investment Targets for Venture and Growth Capital Using Multivariate Time Series Transformer
Authors: Lele Cao, Gustaf Halvardsson, Andrew McCornack, Vilhelm von Ehrenheim, Pawel Herman
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Portfolio Management (q-fin.PM)
Arxiv link: https://arxiv.org/abs/2309.16888
Pdf link: https://arxiv.org/pdf/2309.16888
Abstract This paper addresses the growing application of data-driven approaches within the Private Equity (PE) industry, particularly in sourcing investment targets (i.e., companies) for Venture Capital (VC) and Growth Capital (GC). We present a comprehensive review of the relevant approaches and propose a novel approach leveraging a Transformer-based Multivariate Time Series Classifier (TMTSC) for predicting the success likelihood of any candidate company. The objective of our research is to optimize sourcing performance for VC and GC investments by formally defining the sourcing problem as a multivariate time series classification task. We introduce, in turn, the key components of our implementation which collectively contribute to the successful application of TMTSC in VC/GC sourcing: input features, model architecture, optimization target, and investor-centric data augmentation and split. Our extensive experiments on four datasets, benchmarked against three popular baselines, demonstrate the effectiveness of our approach in improving decision making within the VC and GC industry.
SIMD-ified R-tree Query Processing and Optimization
Authors: Yeasir Rayhan, Walid G. Aref
Subjects: Databases (cs.DB)
Arxiv link: https://arxiv.org/abs/2309.16913
Pdf link: https://arxiv.org/pdf/2309.16913
Abstract The introduction of Single Instruction Multiple Data (SIMD) instructions in mainstream CPUs has enabled modern database engines to leverage data parallelism by performing more computation with a single instruction, resulting in a reduced number of instructions required to execute a query as well as the elimination of conditional branches. Though SIMD in the context of traditional database engines has been studied extensively, it has been overlooked in the context of spatial databases. In this paper, we investigate how spatial database engines can benefit from SIMD vectorization in the context of an R-tree spatial index. We present vectorized versions of the spatial range select, and spatial join operations over a vectorized R-tree index. For each of the operations, we investigate two storage layouts for an R-tree node to leverage SIMD instructions. We design vectorized algorithms for each of the spatial operations given each of the two data layouts. We show that the introduction of SIMD can improve the latency of the spatial query operators up to 9x. We introduce several optimizations over the vectorized implementation of these query operators, and study their effectiveness in query performance and various hardware performance counters under different scenarios.
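The data-parallel node scan described in this abstract can be illustrated with a minimal sketch. It assumes a struct-of-arrays node layout (one array per rectangle coordinate) and uses NumPy vectorized comparisons as a stand-in for SIMD instructions. The function name and the layout are hypothetical and are not taken from the paper, which studies two layouts that the abstract does not specify.

```python
import numpy as np

def node_range_select(xmin, ymin, xmax, ymax, query):
    """Return indices of node entries whose rectangles overlap `query`.

    Each argument is a 1-D array holding one coordinate for every entry
    in the node (struct-of-arrays layout, assumed for illustration).
    `query` is (qx_min, qy_min, qx_max, qy_max).
    """
    qx0, qy0, qx1, qy1 = query
    # All four comparisons run over the whole node at once and are combined
    # with bitwise AND, so there is no per-entry conditional branch.
    mask = (xmin <= qx1) & (xmax >= qx0) & (ymin <= qy1) & (ymax >= qy0)
    return np.flatnonzero(mask)

# Four entries stored column-wise.
xmin = np.array([0.0, 2.0, 0.5, 5.0])
ymin = np.array([0.0, 2.0, 0.5, 5.0])
xmax = np.array([1.0, 3.0, 2.5, 6.0])
ymax = np.array([1.0, 3.0, 2.5, 6.0])
hits = node_range_select(xmin, ymin, xmax, ymax, (0.8, 0.8, 2.2, 2.2))
```

Here `hits` contains the indices 0, 1 and 2; the fourth rectangle lies entirely outside the query window. In an R-tree traversal, the same test would be applied to child bounding boxes at inner nodes and to data rectangles at leaves.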
ONNXExplainer: an ONNX Based Generic Framework to Explain Neural Networks Using Shapley Values
Authors: Yong Zhao, Runxin He, Nicholas Kersting, Can Liu, Shubham Agrawal, Chiranjeet Chetia, Yu Gu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.16916
Pdf link: https://arxiv.org/pdf/2309.16916
Abstract Understanding why a neural network model makes certain decisions can be as important as the inference performance. Various methods have been proposed to help practitioners explain the prediction of a neural network model, of which Shapley values are most popular. The SHAP package is a leading implementation of Shapley values to explain neural networks implemented in TensorFlow or PyTorch, but it lacks cross-platform support and one-shot deployment, and is highly inefficient. To address these problems, we present the ONNXExplainer, which is a generic framework to explain neural networks using Shapley values in the ONNX ecosystem. In ONNXExplainer, we develop our own automatic differentiation and optimization approach, which not only enables One-Shot Deployment of neural networks inference and explanations, but also significantly improves the efficiency to compute explanation with less memory consumption. For fair comparison purposes, we also implement the same optimization in TensorFlow and PyTorch and measure its performance against the current state of the art open-source counterpart, SHAP. Extensive benchmarks demonstrate that the proposed optimization approach improves the explanation latency of VGG19, ResNet50, DenseNet201, and EfficientNetB0 by as much as 500%.
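For readers unfamiliar with Shapley values, the following minimal sketch computes them exactly for a toy three-feature model by enumerating all coalitions. It is only an illustration of the definition: the toy model, the zero baseline, and all function names are assumptions made here, and exact enumeration is exponential in the number of features, so it is not how SHAP or ONNXExplainer explain real networks.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n_features):
    """Exact Shapley values for a set function over `n_features` players.

    `value_fn(coalition)` takes a frozenset of feature indices and returns
    the model output when only those features are "present".
    """
    values = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Weight of a coalition of this size in the Shapley average.
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                marginal = value_fn(coalition | {i}) - value_fn(coalition)
                values[i] += weight * marginal
    return values

# Toy model (hypothetical): f(v) = 2*v0 + v1*v2, explained at x against a zero baseline.
x = (1.0, 2.0, 3.0)
baseline = (0.0, 0.0, 0.0)

def toy_model(v):
    return 2.0 * v[0] + v[1] * v[2]

def value_fn(coalition):
    # Features outside the coalition are replaced by their baseline value.
    masked = [x[j] if j in coalition else baseline[j] for j in range(len(x))]
    return toy_model(masked)

phi = shapley_values(value_fn, len(x))
```

For this toy model the attributions are 2 for the first feature and 3 each for the two interacting features, and they sum to `toy_model(x) - toy_model(baseline)`, which is the efficiency property of Shapley values.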
TranDRL: A Transformer-Driven Deep Reinforcement Learning Enabled Prescriptive Maintenance Framework
Authors: Yang Zhao, Wenbo Wang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2309.16935
Pdf link: https://arxiv.org/pdf/2309.16935
Abstract Industrial systems demand reliable predictive maintenance strategies to enhance operational efficiency and reduce downtime. This paper introduces a novel, integrated framework that leverages the power of transformer neural networks and deep reinforcement learning (DRL) algorithms to optimize maintenance actions. Our approach employs the transformer model to effectively capture complex temporal patterns in sensor data, thereby accurately predicting the Remaining Useful Life (RUL) of equipment. Simultaneously, the DRL component of our framework provides cost-effective and timely maintenance recommendations. We validate the efficacy of our framework on the NASA C-MAPSS dataset, where it demonstrates significant advancements in both RUL prediction accuracy and the optimization of maintenance actions. Consequently, our pioneering approach provides an innovative data-driven methodology for prescriptive maintenance, addressing key challenges in industrial operations and leading the way to more efficient, cost-effective, and reliable systems.
Leveraging Optimization for Adaptive Attacks on Image Watermarks
Authors: Nils Lukas, Abdulrahman Diaa, Lucas Fenaux, Florian Kerschbaum
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.16952
Pdf link: https://arxiv.org/pdf/2309.16952
Abstract Untrustworthy users can misuse image generators to synthesize high-quality deepfakes and engage in online spam or disinformation campaigns. Watermarking deters misuse by marking generated content with a hidden message, enabling its detection using a secret watermarking key. A core security property of watermarking is robustness, which states that an attacker can only evade detection by substantially degrading image quality. Assessing robustness requires designing an adaptive attack for the specific watermarking algorithm. A challenge when evaluating watermarking algorithms and their (adaptive) attacks is to determine whether an adaptive attack is optimal, i.e., it is the best possible attack. We solve this problem by defining an objective function and then approach adaptive attacks as an optimization problem. The core idea of our adaptive attacks is to replicate secret watermarking keys locally by creating surrogate keys that are differentiable and can be used to optimize the attack's parameters. We demonstrate for Stable Diffusion models that such an attacker can break all five surveyed watermarking methods at negligible degradation in image quality. These findings emphasize the need for more rigorous robustness testing against adaptive, learnable attackers.
Multi-Resolution Active Learning of Fourier Neural Operators
Authors: Shibo Li, Xin Yu, Wei Xing, Mike Kirby, Akil Narayan, Shandian Zhe
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.16971
Pdf link: https://arxiv.org/pdf/2309.16971
Abstract Fourier Neural Operator (FNO) is a popular operator learning framework, which not only achieves the state-of-the-art performance in many tasks, but also is highly efficient in training and prediction. However, collecting training data for the FNO is a costly bottleneck in practice, because it often demands expensive physical simulations. To overcome this problem, we propose Multi-Resolution Active learning of FNO (MRA-FNO), which can dynamically select the input functions and resolutions to lower the data cost as much as possible while optimizing the learning efficiency. Specifically, we propose a probabilistic multi-resolution FNO and use ensemble Monte-Carlo to develop an effective posterior inference algorithm. To conduct active learning, we maximize a utility-cost ratio as the acquisition function to acquire new examples and resolutions at each step. We use moment matching and the matrix determinant lemma to enable tractable, efficient utility computation. Furthermore, we develop a cost annealing framework to avoid over-penalizing high-resolution queries at the early stage. The over-penalization is severe when the cost difference between the resolutions is significant, which often leaves active learning stuck at low-resolution queries with inferior performance. Our method overcomes this problem and applies to general multi-fidelity active learning and optimization problems. We have shown the advantage of our method in several benchmark operator learning tasks.
Benchmarking and In-depth Performance Study of Large Language Models on Habana Gaudi Processors
Authors: Chengming Zhang, Baixi Sun, Xiaodong Yu, Zhen Xie, Weijian Zheng, Kamil Iskra, Pete Beckman, Dingwen Tao
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2309.16976
Pdf link: https://arxiv.org/pdf/2309.16976
Abstract Transformer models have achieved remarkable success in various machine learning tasks but suffer from high computational complexity and resource requirements. The quadratic complexity of the self-attention mechanism further exacerbates these challenges when dealing with long sequences and large datasets. Specialized AI hardware accelerators, such as the Habana GAUDI architecture, offer a promising solution to tackle these issues. GAUDI features a Matrix Multiplication Engine (MME) and a cluster of fully programmable Tensor Processing Cores (TPC). This paper explores the untapped potential of using GAUDI processors to accelerate Transformer-based models, addressing key challenges in the process. Firstly, we provide a comprehensive performance comparison between the MME and TPC components, illuminating their relative strengths and weaknesses. Secondly, we explore strategies to optimize MME and TPC utilization, offering practical insights to enhance computational efficiency. Thirdly, we evaluate the performance of Transformers on GAUDI, particularly in handling long sequences and uncovering performance bottlenecks. Lastly, we evaluate the end-to-end performance of two Transformer-based large language models (LLM) on GAUDI. The contributions of this work encompass practical insights for practitioners and researchers alike. We delve into GAUDI's capabilities for Transformers through systematic profiling, analysis, and optimization exploration. Our study bridges a research gap and offers a roadmap for optimizing Transformer-based model training on the GAUDI architecture.
Optimization on the smallest eigenvalue of grounded Laplacian matrix via edge addition
Authors: Xiaotian Zhou, Haoxin Sun, Wei Li, Zhongzhi Zhang
Subjects: Multiagent Systems (cs.MA)
Arxiv link: https://arxiv.org/abs/2309.17019
Pdf link: https://arxiv.org/pdf/2309.17019
Abstract The grounded Laplacian matrix $\LL_{-S}$ of a graph $\calG=(V,E)$ with $n=|V|$ nodes and $m=|E|$ edges is an $(n-s)\times (n-s)$ submatrix of its Laplacian matrix $\LL$, obtained from $\LL$ by deleting rows and columns corresponding to $s=|S| \ll n $ ground nodes forming set $S\subset V$. The smallest eigenvalue of $\LL_{-S}$ plays an important role in various practical scenarios, such as characterizing the convergence rate of leader-follower opinion dynamics, with a larger eigenvalue indicating faster convergence of opinion. In this paper, we study the problem of adding $k \ll n$ edges among all the nonexistent edges forming the candidate edge set $Q = (V\times V)\backslash E$, in order to maximize the smallest eigenvalue of the grounded Laplacian matrix. We show that the objective function of the combinatorial optimization problem is monotone but non-submodular. To solve the problem, we first simplify the problem by restricting the candidate edge set $Q$ to be $(S\times (V\backslash S))\backslash E$, and prove that it has the same optimal solution as the original problem, although the size of set $Q$ is reduced from $O(n^2)$ to $O(n)$. Then, we propose two greedy approximation algorithms. One is a simple greedy algorithm with an approximation ratio $(1-e^{-\alpha\gamma})/\alpha$ and time complexity $O(kn^4)$, where $\gamma$ and $\alpha$ are, respectively, submodularity ratio and curvature, whose bounds are provided for some particular cases. The other is a fast greedy algorithm without approximation guarantee, which has a running time $\tilde{O}(km)$, where $\tilde{O}(\cdot)$ suppresses the ${\rm poly} (\log n)$ factors. Numerous experiments on various real networks are performed to validate the superiority of our algorithms, in terms of effectiveness and efficiency.
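The simple greedy idea in this abstract can be illustrated on a toy graph. The sketch below is not the authors' implementation: the function names are hypothetical, the eigenvalue is computed with a plain inverse power iteration (an assumption chosen to keep the example dependency-free), and no attempt is made to match the stated complexity. It relies only on the fact that adding an edge between a ground node and a non-ground node raises that node's diagonal entry in the grounded Laplacian by one.

```python
import math

def grounded_laplacian(n, edges, ground):
    """Return (matrix, kept_nodes): the Laplacian with ground rows/columns deleted."""
    kept = [v for v in range(n) if v not in ground]
    index = {v: i for i, v in enumerate(kept)}
    mat = [[0.0] * len(kept) for _ in kept]
    for u, v in edges:
        for a, b in ((u, v), (v, u)):
            if a in index:
                mat[index[a]][index[a]] += 1.0  # degree counts ground neighbors too
                if b in index:
                    mat[index[a]][index[b]] -= 1.0
    return mat, kept

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x

def smallest_eigenvalue(a, iters=200):
    """Inverse power iteration; assumes a is symmetric positive definite."""
    n = len(a)
    x = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        y = solve(a, x)
        norm = math.sqrt(sum(t * t for t in y))
        x = [t / norm for t in y]
    ax = [sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(x[i] * ax[i] for i in range(n))  # Rayleigh quotient

def greedy_add_edges(n, edges, ground, k):
    """Greedily add k edges from (S x (V \\ S)) \\ E to raise the smallest eigenvalue."""
    mat, kept = grounded_laplacian(n, edges, ground)
    existing = {frozenset(e) for e in edges}
    candidates = [(s, v) for s in sorted(ground) for v in kept
                  if frozenset((s, v)) not in existing]
    chosen = []
    for _ in range(k):
        best = None
        for s, v in candidates:
            i = kept.index(v)
            mat[i][i] += 1.0                     # tentatively add edge (s, v)
            value = smallest_eigenvalue(mat)
            mat[i][i] -= 1.0                     # undo the tentative edge
            if best is None or value > best[0]:
                best = (value, (s, v))
        if best is None:
            break
        chosen.append(best[1])
        candidates.remove(best[1])
        i = kept.index(best[1][1])
        mat[i][i] += 1.0                         # commit the best edge
    return chosen, smallest_eigenvalue(mat)

# Path graph 0-1-2-3 with node 0 grounded, one edge to add.
chosen, lam = greedy_add_edges(4, [(0, 1), (1, 2), (2, 3)], {0}, 1)
```

On this path graph the candidates are the edges (0, 2) and (0, 3); the sketch evaluates both and keeps whichever gives the larger smallest eigenvalue.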
UniQuadric: A SLAM Backend for Unknown Rigid Object 3D Tracking and Light-Weight Modeling
Authors: Linghao Yang, Yanmin Wu, Yu Deng, Rui Tian, Xinggang Hu, Tiefeng Ma
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.17036
Pdf link: https://arxiv.org/pdf/2309.17036
Abstract Tracking and modeling unknown rigid objects in the environment play a crucial role in autonomous unmanned systems and virtual-real interactive applications. However, many existing Simultaneous Localization, Mapping and Moving Object Tracking (SLAMMOT) methods focus solely on estimating specific object poses, lack estimation of object scales, and are unable to effectively track unknown objects. In this paper, we propose a novel SLAM backend that unifies ego-motion tracking, rigid object motion tracking, and modeling within a joint optimization framework. In the perception part, we designed a pixel-level asynchronous object tracker (AOT) based on the Segment Anything Model (SAM) and DeAOT, enabling the tracker to effectively track target unknown objects guided by various predefined tasks and prompts. In the modeling part, we present a novel object-centric quadric parameterization to unify both static and dynamic object initialization and optimization. Subsequently, in the part of object state estimation, we propose a tightly coupled optimization model for object pose and scale estimation, incorporating hybrid constraints into a novel dual sliding window optimization framework for joint estimation. To our knowledge, we are the first to tightly couple object pose tracking with light-weight modeling of dynamic and static objects using quadrics. We conduct qualitative and quantitative experiments on simulation datasets and real-world datasets, demonstrating the state-of-the-art robustness and accuracy in motion estimation and modeling. Our system showcases the potential application of object perception in complex dynamic scenes.
Double-Layer Power Control for Mobile Cell-Free XL-MIMO with Multi-Agent Reinforcement Learning
Authors: Ziheng Liu, Jiayi Zhang, Zhilong Liu, Huahua Xiao, Bo Ai
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2309.17079
Pdf link: https://arxiv.org/pdf/2309.17079
Abstract Cell-free (CF) extremely large-scale multiple-input multiple-output (XL-MIMO) is regarded as a promising technology for enabling future wireless communication systems. Significant attention has been generated by its considerable advantages in augmenting degrees of freedom. In this paper, we first investigate a CF XL-MIMO system with base stations equipped with XL-MIMO panels under a dynamic environment. Then, we propose an innovative multi-agent reinforcement learning (MARL)-based power control algorithm that incorporates predictive management and distributed optimization architecture, which provides a dynamic strategy for addressing high-dimension signal processing problems. Specifically, we compare various MARL-based algorithms, which shows that the proposed MARL-based algorithm effectively strikes a balance between spectral efficiency (SE) performance and convergence time. Moreover, we consider a double-layer power control architecture based on the large-scale fading coefficients between antennas to suppress interference within dynamic systems. Compared to the single-layer architecture, the results show that the proposed double-layer architecture achieves a nearly 24% SE performance improvement, especially with massive antennas and smaller antenna spacing.
Too Big, so Fail? -- Enabling Neural Construction Methods to Solve Large-Scale Routing Problems
Authors: Jonas K. Falkner, Lars Schmidt-Thieme
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.17089
Pdf link: https://arxiv.org/pdf/2309.17089
Abstract In recent years, new deep learning approaches to solve combinatorial optimization problems, in particular NP-hard Vehicle Routing Problems (VRP), have been proposed. The most impactful of these methods are sequential neural construction approaches which are usually trained via reinforcement learning. Due to the high training costs of these models, they usually are trained on limited instance sizes (e.g. serving 100 customers) and later applied to vastly larger instance sizes (e.g. 2000 customers). By means of a systematic scale-up study we show that even state-of-the-art neural construction methods are outperformed by simple heuristics, failing to generalize to larger problem instances. We propose to use the ruin recreate principle that alternates between completely destroying a localized part of the solution and then recreating an improved variant. In this way, neural construction methods like POMO are never applied to the global problem but just in the reconstruction step, which only involves partial problems much closer in size to their original training instances. In thorough experiments on four datasets of varying distributions and modalities we show that our neural ruin recreate approach outperforms alternative forms of improving construction methods such as sampling and beam search and in several experiments also advanced local search approaches.
Guiding Instruction-based Image Editing via Multimodal Large Language Models
Authors: Tsu-Jui Fu, Wenze Hu, Xianzhi Du, William Yang Wang, Yinfei Yang, Zhe Gan
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.17102
Pdf link: https://arxiv.org/pdf/2309.17102
Abstract Instruction-based image editing improves the controllability and flexibility of image manipulation via natural commands without elaborate descriptions or regional masks. However, human instructions are sometimes too brief for current methods to capture and follow. Multimodal large language models (MLLMs) show promising capabilities in cross-modal understanding and visual-aware response generation via LMs. We investigate how MLLMs facilitate edit instructions and present MLLM-Guided Image Editing (MGIE). MGIE learns to derive expressive instructions and provides explicit guidance. The editing model jointly captures this visual imagination and performs manipulation through end-to-end training. We evaluate various aspects of Photoshop-style modification, global photo optimization, and local editing. Extensive experimental results demonstrate that expressive instructions are crucial to instruction-based image editing, and our MGIE can lead to a notable improvement in automatic metrics and human evaluation while maintaining competitive inference efficiency.
GRANDE: Gradient-Based Decision Tree Ensembles
Authors: Sascha Marton, Stefan Lüdtke, Christian Bartelt, Heiner Stuckenschmidt
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.17130
Pdf link: https://arxiv.org/pdf/2309.17130
Abstract Despite the success of deep learning for text and image data, tree-based ensemble models are still state-of-the-art for machine learning with heterogeneous tabular data. However, there is a significant need for tabular-specific gradient-based methods due to their high flexibility. In this paper, we propose $\text{GRANDE}$, $\text{GRA}$die$\text{N}$t-Based $\text{D}$ecision Tree $\text{E}$nsembles, a novel approach for learning hard, axis-aligned decision tree ensembles using end-to-end gradient descent. GRANDE is based on a dense representation of tree ensembles, which allows backpropagation with a straight-through operator to be used to jointly optimize all model parameters. Our method combines axis-aligned splits, which are a useful inductive bias for tabular data, with the flexibility of gradient-based optimization. Furthermore, we introduce an advanced instance-wise weighting that facilitates learning representations for both simple and complex relations within a single model. We conducted an extensive evaluation on a predefined benchmark with 19 classification datasets and demonstrate that our method outperforms existing gradient-boosting and deep learning frameworks on most datasets.
Convex Optimization of Bearing Formation Control of Rigid bodies on Lie Group
Authors: Sara Mansourinasab, Mahdi Sojoodi, Seyed Reza Moghadasi
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2309.17150
Pdf link: https://arxiv.org/pdf/2309.17150
Abstract In this paper, the problem of reaching formation for a network of rigid agents over a special orthogonal group is investigated by considering bearing-only constraints as the desired formation. Each agent is able to gather the measurements with respect to other agents in its own body frame. So, the agents are coordinate-free concerning a global reference frame. Convergence to the desired formation is achieved by solving an optimization problem that minimizes the difference between the instantaneous bearing between agents and their desired bearing. To obtain a unique global solution, a convex optimization method is used. Since the set of rotation matrices is not convex, a convex relaxation of the rotation matrix space is used to embed the rotation matrices in the convex hull of the Lie group. Then the control law is designed to achieve the desired bearing with minimum energy consumption. Finally, a simulation example is provided to verify the results.
Efficient Interpretable Nonlinear Modeling for Multiple Time Series
Authors: Kevin Roy, Luis Miguel Lopez-Ramos, Baltasar Beferull-Lozano
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.17154
Pdf link: https://arxiv.org/pdf/2309.17154
Abstract Predictive linear and nonlinear models based on kernel machines or deep neural networks have been used to discover dependencies among time series. This paper proposes an efficient nonlinear modeling approach for multiple time series, with a complexity comparable to linear vector autoregressive (VAR) models while still incorporating nonlinear interactions among different time-series variables. The modeling assumption is that the set of time series is generated in two steps: first, a linear VAR process in a latent space, and second, a set of invertible and Lipschitz continuous nonlinear mappings that are applied per sensor, that is, a component-wise mapping from each latent variable to a variable in the measurement space. The VAR coefficient identification provides a topology representation of the dependencies among the aforementioned variables. The proposed approach models each component-wise nonlinearity using an invertible neural network and imposes sparsity on the VAR coefficients to reflect the parsimonious dependencies usually found in real applications. To efficiently solve the formulated optimization problems, a custom algorithm is devised combining proximal gradient descent, stochastic primal-dual updates, and projection to enforce the corresponding constraints. Experimental results on both synthetic and real data sets show that the proposed algorithm improves the identification of the support of the VAR coefficients in a parsimonious manner while also improving the time-series prediction, as compared to the current state-of-the-art methods.
FedZeN: Towards superlinear zeroth-order federated learning via incremental Hessian estimation
Authors: Alessio Maritan, Subhrakanti Dey, Luca Schenato
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2309.17174
Pdf link: https://arxiv.org/pdf/2309.17174
Abstract Federated learning is a distributed learning framework that allows a set of clients to collaboratively train a model under the orchestration of a central server, without sharing raw data samples. Although in many practical scenarios the derivatives of the objective function are not available, only few works have considered the federated zeroth-order setting, in which functions can only be accessed through a budgeted number of point evaluations. In this work we focus on convex optimization and design the first federated zeroth-order algorithm to estimate the curvature of the global objective, with the purpose of achieving superlinear convergence. We take an incremental Hessian estimator whose error norm converges linearly, and we adapt it to the federated zeroth-order setting, sampling the random search directions from the Stiefel manifold for improved performance. In particular, both the gradient and Hessian estimators are built at the central server in a communication-efficient and privacy-preserving way by leveraging synchronized pseudo-random number generators. We provide a theoretical analysis of our algorithm, named FedZeN, proving local quadratic convergence with high probability and global linear convergence up to zeroth-order precision. Numerical simulations confirm the superlinear convergence rate and show that our algorithm outperforms the federated zeroth-order methods available in the literature.
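To make the zeroth-order setting concrete, the sketch below estimates a gradient using only point evaluations of the objective along random orthonormal directions. It is a generic finite-difference illustration, not FedZeN itself: the function names, the Gram-Schmidt sampler, and the step size `h` are assumptions made for this example, and the Hessian estimation and federated aggregation described in the abstract are not shown. The seeded generator only hints at how a server and its clients could regenerate identical directions without transmitting them.

```python
import math
import random

def random_orthonormal_directions(n, k, rng):
    """Draw k orthonormal vectors in R^n by Gram-Schmidt on Gaussian samples."""
    dirs = []
    while len(dirs) < k:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for d in dirs:
            dot = sum(a * b for a, b in zip(v, d))
            v = [a - dot * b for a, b in zip(v, d)]   # remove component along d
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:                               # skip near-degenerate draws
            dirs.append([a / norm for a in v])
    return dirs

def zeroth_order_gradient(f, x, dirs, h=1e-4):
    """Central-difference gradient estimate built from function values only."""
    grad = [0.0] * len(x)
    for d in dirs:
        plus = f([xi + h * di for xi, di in zip(x, d)])
        minus = f([xi - h * di for xi, di in zip(x, d)])
        slope = (plus - minus) / (2.0 * h)            # directional derivative along d
        grad = [g + slope * di for g, di in zip(grad, d)]
    return grad

def objective(x):
    """Toy convex quadratic used only for illustration."""
    return x[0] ** 2 + 3.0 * x[1] ** 2 + x[0] * x[1]

rng = random.Random(0)                                # shared seed, hypothetical
directions = random_orthonormal_directions(2, 2, rng)
estimate = zeroth_order_gradient(objective, [1.0, 2.0], directions)
```

With as many orthonormal directions as dimensions, the estimate recovers the full gradient up to finite-difference error; with fewer directions it yields the projection of the gradient onto their span.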
RECOMBINER: Robust and Enhanced Compression with Bayesian Implicit Neural Representations
Authors: Jiajun He, Gergely Flamich, Zongyu Guo, José Miguel Hernández-Lobato
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.17182
Pdf link: https://arxiv.org/pdf/2309.17182
Abstract COMpression with Bayesian Implicit NEural Representations (COMBINER) is a recent data compression method that addresses a key inefficiency of previous Implicit Neural Representation (INR)-based approaches: it avoids quantization and enables direct optimization of the rate-distortion performance. However, COMBINER still has significant limitations: 1) it uses factorized priors and posterior approximations that lack flexibility; 2) it cannot effectively adapt to local deviations from global patterns in the data; and 3) its performance can be susceptible to modeling choices and the variational parameters' initializations. Our proposed method, Robust and Enhanced COMBINER (RECOMBINER), addresses these issues by 1) enriching the variational approximation while maintaining its computational cost via a linear reparameterization of the INR weights, 2) augmenting our INRs with learnable positional encodings that enable them to adapt to local details and 3) splitting high-resolution data into patches to increase robustness and utilizing expressive hierarchical priors to capture dependency across patches. We conduct extensive experiments across several data modalities, showcasing that RECOMBINER achieves competitive results with the best INR-based methods and even outperforms autoencoder-based codecs on low-resolution images at low bitrates.
Meta Reinforcement Learning for Fast Spectrum Sharing in Vehicular Networks
Authors: Kai Huang, Le Liang, Shi Jin, Geoffrey Ye Li
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2309.17185
Pdf link: https://arxiv.org/pdf/2309.17185
Abstract In this paper, we investigate the problem of fast spectrum sharing in vehicle-to-everything communication. In order to improve the spectrum efficiency of the whole system, the spectrum of vehicle-to-infrastructure links is reused by vehicle-to-vehicle links. To this end, we model it as a problem of deep reinforcement learning and tackle it with proximal policy optimization. A considerable number of interactions are often required for training an agent with good performance, so simulation-based training is commonly used in communication networks. Nevertheless, severe performance degradation may occur when the agent is directly deployed in the real world, even though it can perform well on the simulator, due to the reality gap between the simulation and the real environments. To address this issue, we make preliminary efforts by proposing an algorithm based on meta reinforcement learning. This algorithm enables the agent to rapidly adapt to a new task with the knowledge extracted from similar tasks, leading to fewer interactions and less training time. Numerical results show that our method achieves near-optimal performance and exhibits rapid convergence.
M-DAB: An Input-Distribution Optimization Algorithm for Composite DNA Storage by the Multinomial Channel
Authors: Adir Kobovich, Eitan Yaakobi, Nir Weinberger
Subjects: Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2309.17193
Pdf link: https://arxiv.org/pdf/2309.17193
Abstract Recent experiments have shown that the capacity of DNA storage systems may be significantly increased by synthesizing composite DNA letters. In this work, we model a DNA storage channel with composite inputs as a \textit{multinomial channel}, and propose an optimization algorithm for its capacity-achieving input distribution, for an arbitrary number of output reads. The algorithm is termed multidimensional dynamic assignment Blahut-Arimoto (M-DAB), and is a generalized version of the DAB algorithm proposed by Wesel et al. for the binomial channel. We also empirically observe a scaling law behavior of the capacity as a function of the support size of the capacity-achieving input distribution.
Generalized Activation via Multivariate Projection
Authors: Jiayun Li, Yuxiao Cheng, Zhuofan Xia, Yilin Mo, Gao Huang
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.17194
Pdf link: https://arxiv.org/pdf/2309.17194
Abstract Activation functions are essential to introduce nonlinearity into neural networks, with the Rectified Linear Unit (ReLU) often favored for its simplicity and effectiveness. Motivated by the structural similarity between a shallow Feedforward Neural Network (FNN) and a single iteration of the Projected Gradient Descent (PGD) algorithm, a standard approach for solving constrained optimization problems, we consider ReLU as a projection from R onto the nonnegative half-line R+. Building on this interpretation, we extend ReLU by substituting it with a generalized projection operator onto a convex cone, such as the Second-Order Cone (SOC) projection, thereby naturally extending it to a Multivariate Projection Unit (MPU), an activation function with multiple inputs and multiple outputs. We further provide a mathematical proof establishing that FNNs activated by SOC projections outperform those utilizing ReLU in terms of expressive power. Experimental evaluations on widely-adopted architectures further corroborate MPU's effectiveness against a broader range of existing activation functions.
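The projection view of ReLU described here can be written out directly. The sketch below places the scalar projection onto the nonnegative half-line next to the standard closed-form Euclidean projection onto a second-order cone. How the paper groups neurons into cone inputs is not stated in the abstract, so treating the input as a pair `(x, t)` with a vector part and a scalar part is an assumption of this example, as are the function names.

```python
import math

def relu(x):
    """Projection of a scalar onto the nonnegative half-line."""
    return max(0.0, x)

def soc_projection(x, t):
    """Euclidean projection of (x, t) onto the cone {(x, t): ||x||_2 <= t}."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= t:                      # already inside the cone
        return list(x), t
    if norm <= -t:                     # inside the polar cone: project to the apex
        return [0.0] * len(x), 0.0
    scale = (norm + t) / (2.0 * norm)  # otherwise land on the cone boundary
    return [scale * v for v in x], scale * norm

projected_x, projected_t = soc_projection([3.0, 4.0], 1.0)
```

Unlike ReLU, which acts on each coordinate independently, the cone projection couples its inputs: the output for one coordinate depends on the norm of the whole group.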
Memory Gym: Partially Observable Challenges to Memory-Based Agents in Endless Episodes
Authors: Marco Pleines, Matthias Pallasch, Frank Zimmer, Mike Preuss
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.17207
Pdf link: https://arxiv.org/pdf/2309.17207
Abstract Memory Gym introduces a unique benchmark designed to test Deep Reinforcement Learning agents, specifically comparing Gated Recurrent Unit (GRU) against Transformer-XL (TrXL), on their ability to memorize long sequences, withstand noise, and generalize. It features partially observable 2D environments with discrete controls, namely Mortar Mayhem, Mystery Path, and Searing Spotlights. These originally finite environments are extrapolated to novel endless tasks that act as an automatic curriculum, drawing inspiration from the car game "I packed my bag". These endless tasks are not only beneficial for evaluating efficiency but also intriguingly valuable for assessing the effectiveness of approaches in memory-based agents. Given the scarcity of publicly available memory baselines, we contribute an implementation driven by TrXL and Proximal Policy Optimization. This implementation leverages TrXL as episodic memory using a sliding window approach. In our experiments on the finite environments, TrXL demonstrates superior sample efficiency in Mystery Path and outperforms in Mortar Mayhem. However, GRU is more efficient on Searing Spotlights. Most notably, in all endless tasks, GRU makes a remarkable resurgence, consistently outperforming TrXL by significant margins.
RSAM: Learning on manifolds with Riemannian Sharpness-aware Minimization
Authors: Tuan Truong, Hoang-Phi Nguyen, Tung Pham, Minh-Tuan Tran, Mehrtash Harandi, Dinh Phung, Trung Le
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2309.17215
Pdf link: https://arxiv.org/pdf/2309.17215
Abstract Nowadays, understanding the geometry of the loss landscape shows promise in enhancing a model's generalization ability. In this work, we draw upon prior works that apply geometric principles to optimization and present a novel approach to improve robustness and generalization ability for constrained optimization problems. Indeed, this paper aims to generalize the Sharpness-Aware Minimization (SAM) optimizer to Riemannian manifolds. In doing so, we first extend the concept of sharpness and introduce a novel notion of sharpness on manifolds. To support this notion of sharpness, we present a theoretical analysis characterizing generalization capabilities with respect to manifold sharpness, which demonstrates a tighter bound on the generalization gap, a result not known before. Motivated by this analysis, we introduce our algorithm, Riemannian Sharpness-Aware Minimization (RSAM). To demonstrate RSAM's ability to enhance generalization ability, we evaluate and contrast our algorithm on a broad set of problems, such as image classification and contrastive learning across different datasets, including CIFAR100, CIFAR10, and FGVCAircraft. Our code is publicly available at \url{https://t.ly/RiemannianSAM}.
Differentiable Optimization Based Time-Varying Control Barrier Functions for Dynamic Obstacle Avoidance
Authors: Bolun Dai, Rooholla Khorrambakht, Prashanth Krishnamurthy, Farshad Khorrami
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.17226
Pdf link: https://arxiv.org/pdf/2309.17226
Abstract Control barrier functions (CBFs) provide a simple yet effective way for safe control synthesis. Recently, work has been done using differentiable optimization based methods to systematically construct CBFs for static obstacle avoidance tasks between geometric shapes. In this work, we extend the application of differentiable optimization based CBFs to perform dynamic obstacle avoidance tasks. We show that by using the time-varying CBF (TVCBF) formulation, we can perform obstacle avoidance for dynamic geometric obstacles. Additionally, we show how to alter the TVCBF constraint to consider measurement noise and actuation limits. To demonstrate the efficacy of our proposed approach, we first compare its performance with a model predictive control based method on a simulated dynamic obstacle avoidance task with non-ellipsoidal obstacles. Then, we demonstrate the performance of our proposed approach in experimental studies using a 7-degree-of-freedom Franka Research 3 robotic manipulator.
MORPH: Design Co-optimization with Reinforcement Learning via a Differentiable Hardware Model Proxy
Authors: Zhanpeng He, Matei Ciocarlie
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.17227
Pdf link: https://arxiv.org/pdf/2309.17227
Abstract We introduce MORPH, a method for co-optimization of hardware design parameters and control policies in simulation using reinforcement learning. Like most co-optimization methods, MORPH relies on a model of the hardware being optimized, usually simulated based on the laws of physics. However, such a model is often difficult to integrate into an effective optimization routine. To address this, we introduce a proxy hardware model, which is always differentiable and enables efficient co-optimization alongside a long-horizon control policy using RL. MORPH is designed to ensure that the optimized hardware proxy remains as close as possible to its realistic counterpart, while still enabling task completion. We demonstrate our approach on simulated 2D reaching and 3D multi-fingered manipulation tasks.
A Framework and a python-package for Real-time NMPC parameters settings
Authors: Mazen Alamir
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2309.17238
Pdf link: https://arxiv.org/pdf/2309.17238
Abstract This paper presents a framework that enables a systematic and rational choice of NMPC design components such as control updating period, down-sampling period for prediction, control parameterization, prediction horizon's length, the maximum number of iterations as well as penalties on the terminal cost and the soft constraints. The rationale that underlies the design choices is based on real-time implementability, convergence and constraints satisfaction for a given computational device and a specific optimization algorithm. Moreover, a freely available associated Python-based implementation is also described with a fully developed illustrative example implementing a nonlinear MPC controller for a Planar Vertical Take-Off and Landing (PVTOL) aircraft under control saturation and state constraints.
Data-Driven Min-Max MPC for Linear Systems
Authors: Yifan Xie, Julian Berberich, Frank Allgower
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2309.17307
Pdf link: https://arxiv.org/pdf/2309.17307
Abstract Designing data-driven controllers in the presence of noise is an important research problem, in particular when guarantees on stability, robustness, and constraint satisfaction are desired. In this paper, we propose a data-driven min-max model predictive control (MPC) scheme to design state-feedback controllers from noisy data for unknown linear time-invariant (LTI) systems. The considered min-max problem minimizes the worst-case cost over the set of system matrices consistent with the data. We show that the resulting optimization problem can be reformulated as a semidefinite program (SDP). By solving the SDP, we obtain a state-feedback control law that stabilizes the closed-loop system and guarantees input and state constraint satisfaction. A numerical example demonstrates the validity of our theoretical results.
Few-Shot Domain Adaptation for Charge Prediction on Unprofessional Descriptions
Authors: Jie Zhao, Ziyu Guan, Wei Zhao, Yue Jiang, Xiaofei He
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2309.17313
Pdf link: https://arxiv.org/pdf/2309.17313
Abstract Recent works considering professional legal-linguistic style (PLLS) texts have shown promising results on the charge prediction task. However, unprofessional users also show an increasing demand on such a prediction service. There is a clear domain discrepancy between PLLS texts and non-PLLS texts expressed by those laypersons, which degrades the current SOTA models' performance on non-PLLS texts. A key challenge is the scarcity of non-PLLS data for most charge classes. This paper proposes a novel few-shot domain adaptation (FSDA) method named Disentangled Legal Content for Charge Prediction (DLCCP). Compared with existing FSDA works, which solely perform instance-level alignment without considering the negative impact of text style information existing in latent features, DLCCP (1) disentangles the content and style representations for better domain-invariant legal content learning with carefully designed optimization goals for content and style spaces and, (2) employs the constitutive elements knowledge of charges to extract and align element-level and instance-level content representations simultaneously. We contribute the first publicly available non-PLLS dataset named NCCP for developing layperson-friendly charge prediction models. Experiments on NCCP show the superiority of our methods over competitive baselines.
Toward Operationalizing Pipeline-aware ML Fairness: A Research Agenda for Developing Practical Guidelines and Tools
Authors: Emily Black, Rakshit Naidu, Rayid Ghani, Kit T. Rodolfa, Daniel E. Ho, Hoda Heidari
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2309.17337
Pdf link: https://arxiv.org/pdf/2309.17337
Abstract While algorithmic fairness is a thriving area of research, in practice, mitigating issues of bias often gets reduced to enforcing an arbitrarily chosen fairness metric, either by enforcing fairness constraints during the optimization step, post-processing model outputs, or by manipulating the training data. Recent work has called on the ML community to take a more holistic approach to tackle fairness issues by systematically investigating the many design choices made through the ML pipeline, and identifying interventions that target the issue's root cause, as opposed to its symptoms. While we share the conviction that this pipeline-based approach is the most appropriate for combating algorithmic unfairness on the ground, we believe there are currently very few methods of \emph{operationalizing} this approach in practice. Drawing on our experience as educators and practitioners, we first demonstrate that without clear guidelines and toolkits, even individuals with specialized ML knowledge find it challenging to hypothesize how various design choices influence model behavior. We then consult the fair-ML literature to understand the progress to date toward operationalizing the pipeline-aware approach: we systematically collect and organize the prior work that attempts to detect, measure, and mitigate various sources of unfairness through the ML pipeline. We utilize this extensive categorization of previous contributions to sketch a research agenda for the community. We hope this work serves as the stepping stone toward a more comprehensive set of resources for ML researchers, practitioners, and students interested in exploring, designing, and testing pipeline-oriented approaches to algorithmic fairness.
MixQuant: Mixed Precision Quantization with a Bit-width Optimization Search
Authors: Eliska Kloberdanz, Wei Le
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.17341
Pdf link: https://arxiv.org/pdf/2309.17341
Abstract Quantization is a technique for creating efficient Deep Neural Networks (DNNs), which involves performing computations and storing tensors at lower bit-widths than f32 floating point precision. Quantization reduces model size and inference latency, and therefore allows for DNNs to be deployed on platforms with constrained computational resources and real-time systems. However, quantization can lead to numerical instability caused by roundoff error, which leads to inaccurate computations and therefore to a decrease in quantized model accuracy. Similarly to prior works, which have shown that both biases and activations are more sensitive to quantization and are best kept in full precision or quantized with higher bit-widths, we show that some weights are more sensitive than others, which should be reflected in their quantization bit-width. To that end we propose MixQuant, a search algorithm that finds the optimal custom quantization bit-width for each layer weight based on roundoff error and can be combined with any quantization method as a form of pre-processing optimization. We show that combining MixQuant with BRECQ, a state-of-the-art quantization method, yields better quantized model accuracy than BRECQ alone. Additionally, we combine MixQuant with vanilla asymmetric quantization to show that MixQuant has the potential to optimize the performance of any quantization technique.
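The idea of choosing a per-layer bit-width from roundoff error can be illustrated with a small sketch. This is not the MixQuant search itself: the min/max asymmetric quantizer, the mean-squared-error criterion, the candidate bit-widths, the `tolerance` threshold, and the synthetic "layers" are all assumptions made for illustration.

```python
import numpy as np

def quantize_asymmetric(w, bits):
    """Uniform asymmetric (min/max) quantization followed by dequantization."""
    qmax = 2 ** bits - 1
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / qmax if hi > lo else 1.0
    q = np.clip(np.round((w - lo) / scale), 0, qmax)
    return q * scale + lo

def roundoff_error(w, bits):
    """Mean squared roundoff error introduced by quantizing w to `bits` bits."""
    return float(np.mean((w - quantize_asymmetric(w, bits)) ** 2))

def pick_bit_width(w, candidates=(2, 3, 4, 5, 6, 7, 8), tolerance=1e-4):
    """Smallest candidate bit-width whose roundoff error is within tolerance.

    Falls back to the largest candidate if none meets the (hypothetical) tolerance.
    """
    for bits in sorted(candidates):
        if roundoff_error(w, bits) <= tolerance:
            return bits
    return max(candidates)

# Two synthetic weight tensors with different dynamic ranges (illustrative only).
rng = np.random.default_rng(0)
layers = {
    "narrow": rng.normal(0.0, 0.02, size=1000),
    "wide": rng.normal(0.0, 0.5, size=1000),
}
chosen = {name: pick_bit_width(w) for name, w in layers.items()}
print(chosen)
```

Under this toy criterion a tensor with a wider value range needs more bits to reach the same roundoff error, which is the kind of per-layer sensitivity the abstract describes.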
Network Memory Footprint Compression Through Jointly Learnable Codebooks and Mappings
Authors: Edouard Yvinec, Arnaud Dapogny, Kevin Bailly
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.17361
Pdf link: https://arxiv.org/pdf/2309.17361
Abstract The massive interest in deep neural networks (DNNs) for both computer vision and natural language processing has been sparked by the growth in computational power. However, this led to an increase in the memory footprint, to a point where it can be challenging to simply load a model on commodity devices such as mobile phones. To address this limitation, quantization is a favored solution as it maps high precision tensors to a low precision, memory efficient format. In terms of memory footprint reduction, its most effective variants are based on codebooks. These methods, however, suffer from two limitations. First, they either define a single codebook for each tensor, or use a memory-expensive mapping to multiple codebooks. Second, gradient descent optimization of the mapping favors jumps toward extreme values, hence not defining a proximal search. In this work, we propose to address these two limitations. First, we initially group similarly distributed neurons and leverage the re-ordered structure to either apply different scale factors to the different groups, or map weights that fall in these groups to several codebooks, without any mapping overhead. Second, stemming from this initialization, we propose a joint learning of the codebook and weight mappings that bears similarities with recent gradient-based post-training quantization techniques. Third, drawing inspiration from straight-through estimation techniques, we introduce a novel gradient update definition to enable a proximal search of the codebooks and their mappings. The proposed jointly learnable codebooks and mappings (JLCM) method allows a very efficient approximation of any DNN: as such, a Llama 7B can be compressed down to 2 GB and loaded on 5-year-old smartphones.
Keyword: adam

Handling Correlated Rounding Error via Preclustering: A 1.73-approximation for Correlation Clustering
Authors: Vincent Cohen-Addad, Euiwoong Lee, Shi Li, Alantha Newman
Subjects: Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2309.17243
Pdf link: https://arxiv.org/pdf/2309.17243
Abstract We consider the classic Correlation Clustering problem: Given a complete graph where edges are labelled either $+$ or $-$, the goal is to find a partition of the vertices that minimizes the number of the $+$ edges across parts plus the number of the $-$ edges within parts. Recently, Cohen-Addad, Lee and Newman [CLN22] presented a 1.994-approximation algorithm for the problem using the Sherali-Adams hierarchy, hence breaking through the integrality gap of 2 for the classic linear program and improving upon the 2.06-approximation of Chawla, Makarychev, Schramm and Yaroslavtsev [CMSY15]. We significantly improve the state-of-the-art by providing a 1.73-approximation for the problem. Our approach introduces a preclustering of Correlation Clustering instances that allows us to essentially ignore the error arising from the correlated rounding used by [CLN22]. This additional power simplifies the previous algorithm and analysis. More importantly, it enables a new set-based rounding that complements the previous roundings. A combination of these two rounding algorithms yields the improved bound.
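The objective stated in the abstract (positive edges cut by the partition plus negative edges kept inside a part) can be written down directly. The sketch below only evaluates that cost for a given partition; it does not implement any approximation algorithm, and the function name and input encoding are assumptions chosen for illustration.

```python
from itertools import combinations

def disagreements(positive_edges, clusters):
    """Count + edges across clusters plus - edges inside clusters.

    positive_edges: set of frozenset({u, v}) pairs labelled '+';
        every other pair of the complete graph is treated as labelled '-'.
    clusters: dict mapping each vertex to a cluster id.
    """
    cost = 0
    for u, v in combinations(sorted(clusters), 2):
        same_cluster = clusters[u] == clusters[v]
        is_positive = frozenset((u, v)) in positive_edges
        # A pair disagrees when it is '+' but split, or '-' but kept together.
        if is_positive != same_cluster:
            cost += 1
    return cost

# Toy instance: a positive triangle {0, 1, 2} plus one positive edge to vertex 3.
plus = {frozenset(p) for p in [(0, 1), (1, 2), (0, 2), (2, 3)]}
triangle_and_singleton = {0: "a", 1: "a", 2: "a", 3: "b"}
one_cluster = {0: "a", 1: "a", 2: "a", 3: "a"}
print(disagreements(plus, triangle_and_singleton), disagreements(plus, one_cluster))
```

In this toy instance, splitting off vertex 3 cuts one positive edge, while merging everything keeps two negative pairs together, so the first partition is cheaper.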
Keyword: gradient

Explainable machine learning-based prediction model for diabetic nephropathy
Authors: Jing-Mei Yin, Yang Li, Jun-Tang Xue, Guo-Wei Zong, Zhong-Ze Fang, Lang Zou
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2309.16730
Pdf link: https://arxiv.org/pdf/2309.16730
Abstract The aim of this study is to analyze the effect of serum metabolites on diabetic nephropathy (DN) and predict the prevalence of DN through a machine learning approach. The dataset consists of 548 patients from April 2018 to April 2019 in the Second Affiliated Hospital of Dalian Medical University (SAHDMU). We select the optimal 38 features through a Least absolute shrinkage and selection operator (LASSO) regression model and 10-fold cross-validation. We compare four machine learning algorithms, including eXtreme Gradient Boosting (XGB), random forest, decision tree and logistic regression, by AUC-ROC curves, decision curves, and calibration curves. We quantify feature importance and interaction effects in the optimal predictive model by the Shapley Additive exPlanations (SHAP) method. The XGB model has the best performance to screen for DN, with the highest AUC value of 0.966. The XGB model also gains more clinical net benefits than the others and its fitting degree is better. In addition, there are significant interactions between serum metabolites and duration of diabetes. We develop a predictive model by the XGB algorithm to screen for DN. C2, C5DC, Tyr, Ser, Met, C24, C4DC, and Cys contribute strongly to the model, and can possibly be biomarkers for DN.
Efficient Training of One Class Classification-SVMs
Authors: Isaac Amornortey Yowetu, Nana Kena Frempong
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.16745
Pdf link: https://arxiv.org/pdf/2309.16745
Abstract This study examines the use of a highly effective training method to conduct one-class classification. The existence of both positive and negative examples in the training data is necessary to develop an effective classifier in common binary classification scenarios. Unfortunately, this criterion is not met in many domains. Here, there is just one class of examples. Classification algorithms that learn solely from positive input have been created to deal with this setting. In this paper, an effective algorithm for dual soft-margin one-class SVM training is presented. Our approach makes use of the Augmented Lagrangian (AL-FPGM), a variant of the Fast Projected Gradient Method. The FPGM requires only first derivatives, which for the dual soft margin OCC-SVM means computing mainly a matrix-vector product. Therefore, AL-FPGM, being computationally inexpensive, may complement existing quadratic programming solvers for training large SVMs. We extensively validate our approach over real-world datasets and demonstrate that our strategy obtains statistically significant results.
Memory in Plain Sight: A Survey of the Uncanny Resemblances between Diffusion Models and Associative Memories
Authors: Benjamin Hoover, Hendrik Strobelt, Dmitry Krotov, Judy Hoffman, Zsolt Kira, Duen Horng Chau
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Dynamical Systems (math.DS)
Arxiv link: https://arxiv.org/abs/2309.16750
Pdf link: https://arxiv.org/pdf/2309.16750
Abstract Diffusion Models (DMs) have recently set state-of-the-art on many generation benchmarks. However, there are myriad ways to describe them mathematically, which makes it difficult to develop a simple understanding of how they work. In this survey, we provide a concise overview of DMs from the perspective of dynamical systems and Ordinary Differential Equations (ODEs) which exposes a mathematical connection to the highly related yet often overlooked class of energy-based models, called Associative Memories (AMs). Energy-based AMs are a theoretical framework that behaves much like denoising DMs, but they enable us to directly compute a Lyapunov energy function on which we can perform gradient descent to denoise data. We then summarize the 40-year history of energy-based AMs, beginning with the original Hopfield Network, and discuss new research directions for AMs and DMs that are revealed by characterizing the extent of their similarities and differences.
GraB-sampler: Optimal Permutation-based SGD Data Sampler for PyTorch
Authors: Guanghao Wei
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.16809
Pdf link: https://arxiv.org/pdf/2309.16809
Abstract The online Gradient Balancing (GraB) algorithm, which greedily chooses the example ordering by solving the herding problem using per-sample gradients, is proved to be the theoretically optimal solution and is guaranteed to outperform Random Reshuffling. However, there is currently no efficient implementation of GraB that the community can easily use. This work presents an efficient Python library, GraB-sampler, that allows the community to easily use GraB algorithms, and proposes 5 variants of the GraB algorithm. The best-performing GraB-sampler variant reproduces the training loss and test accuracy results at the cost of only 8.7% training time overhead and 0.85% peak GPU memory usage overhead.
An analysis of the derivative-free loss method for solving PDEs
Authors: Jihun Han, Yoonsang Lee
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2309.16829
Pdf link: https://arxiv.org/pdf/2309.16829
Abstract This study analyzes the derivative-free loss method to solve a certain class of elliptic PDEs using neural networks. The derivative-free loss method uses the Feynman-Kac formulation, incorporating stochastic walkers and their corresponding average values. We investigate the effect of the time interval related to the Feynman-Kac formulation and the walker size in the context of computational efficiency, trainability, and sampling errors. Our analysis shows that the training loss bias is proportional to the time interval and the spatial gradient of the neural network while inversely proportional to the walker size. We also show that the time interval must be sufficiently long to train the network. These analytic results indicate that we can choose the walker size as small as possible based on the optimal lower bound of the time interval. We also provide numerical tests supporting our analysis.
Symmetry Leads to Structured Constraint of Learning
Authors: Liu Ziyin
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2309.16932
Pdf link: https://arxiv.org/pdf/2309.16932
Abstract Due to common architecture designs, symmetries exist extensively in contemporary neural networks. In this work, we unveil the importance of the loss function symmetries in affecting, if not deciding, the learning behavior of machine learning models. We prove that every mirror symmetry of the loss function leads to a structured constraint, which becomes a favored solution when either the weight decay or gradient noise is large. As direct corollaries, we show that rescaling symmetry leads to sparsity, rotation symmetry leads to low rankness, and permutation symmetry leads to homogeneous ensembling. Then, we show that the theoretical framework can explain the loss of plasticity and various collapse phenomena in neural networks and suggest how symmetries can be used to design algorithms to enforce hard constraints in a differentiable way.
On Uniform Scalar Quantization for Learned Image Compression
Authors: Haotian Zhang, Li Li, Dong Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.17051
Pdf link: https://arxiv.org/pdf/2309.17051
Abstract Learned image compression poses a unique challenge when incorporating non-differentiable quantization into the gradient-based training of the networks. Several quantization surrogates have been proposed to fulfill the training, but they were not systematically justified from a theoretical perspective. We fill this gap by contrasting uniform scalar quantization, the most widely used category with rounding being its simplest case, and its training surrogates. In principle, we find two factors crucial: one is the discrepancy between the surrogate and rounding, leading to train-test mismatch; the other is gradient estimation risk due to the surrogate, which consists of bias and variance of the gradient estimation. Our analyses and simulations imply that there is a tradeoff between the train-test mismatch and the gradient estimation risk, and the tradeoff varies across different network structures. Motivated by these analyses, we present a method based on stochastic uniform annealing, which has an adjustable temperature coefficient to control the tradeoff. Moreover, our analyses suggest two subtle tricks: one is to set an appropriate lower bound for the variance parameter of the estimated quantized latent distribution, which effectively reduces the train-test mismatch; the other is to use zero-center quantization with partial stop-gradient, which reduces the gradient estimation variance and thus stabilizes the training. Our method with the tricks is verified to outperform the existing practices of quantization surrogates on a variety of representative image compression networks.
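The train-test mismatch mentioned in the abstract can be made concrete with a small numerical sketch. It compares hard rounding (used at test time) with additive uniform noise, which is one commonly used training surrogate; the choice of this surrogate, the Gaussian latent, and the mean-squared mismatch measure are assumptions for illustration, not the paper's analysis or its stochastic uniform annealing method.

```python
import numpy as np

def hard_round(y):
    """Test-time uniform scalar quantization with unit step size."""
    return np.round(y)

def uniform_noise_surrogate(y, rng):
    """Differentiable training surrogate: add noise drawn from U(-0.5, 0.5)."""
    return y + rng.uniform(-0.5, 0.5, size=y.shape)

rng = np.random.default_rng(0)
latent = rng.normal(0.0, 3.0, size=10_000)   # hypothetical latent values

quantized = hard_round(latent)
noisy = uniform_noise_surrogate(latent, rng)

# Mean squared gap between what training sees and what testing sees.
mismatch = float(np.mean((noisy - quantized) ** 2))
print(f"train-test mismatch (MSE): {mismatch:.4f}")
```

The surrogate keeps the forward pass differentiable in `y`, but its output differs from the rounded value, which is the discrepancy the abstract calls train-test mismatch.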
GRANDE: Gradient-Based Decision Tree Ensembles
Authors: Sascha Marton, Stefan Lüdtke, Christian Bartelt, Heiner Stuckenschmidt
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.17130
Pdf link: https://arxiv.org/pdf/2309.17130
Abstract Despite the success of deep learning for text and image data, tree-based ensemble models are still state-of-the-art for machine learning with heterogeneous tabular data. However, there is a significant need for tabular-specific gradient-based methods due to their high flexibility. In this paper, we propose GRANDE, GRAdieNt-Based Decision Tree Ensembles, a novel approach for learning hard, axis-aligned decision tree ensembles using end-to-end gradient descent. GRANDE is based on a dense representation of tree ensembles, which allows the use of backpropagation with a straight-through operator to jointly optimize all model parameters. Our method combines axis-aligned splits, which are a useful inductive bias for tabular data, with the flexibility of gradient-based optimization. Furthermore, we introduce an advanced instance-wise weighting that facilitates learning representations for both simple and complex relations within a single model. We conducted an extensive evaluation on a predefined benchmark with 19 classification datasets and demonstrate that our method outperforms existing gradient-boosting and deep learning frameworks on most datasets.
Efficient Interpretable Nonlinear Modeling for Multiple Time Series
Authors: Kevin Roy, Luis Miguel Lopez-Ramos, Baltasar Beferull-Lozano
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.17154
Pdf link: https://arxiv.org/pdf/2309.17154
Abstract Predictive linear and nonlinear models based on kernel machines or deep neural networks have been used to discover dependencies among time series. This paper proposes an efficient nonlinear modeling approach for multiple time series, with a complexity comparable to linear vector autoregressive (VAR) models while still incorporating nonlinear interactions among different time-series variables. The modeling assumption is that the set of time series is generated in two steps: first, a linear VAR process in a latent space, and second, a set of invertible and Lipschitz continuous nonlinear mappings that are applied per sensor, that is, a component-wise mapping from each latent variable to a variable in the measurement space. The VAR coefficient identification provides a topology representation of the dependencies among the aforementioned variables. The proposed approach models each component-wise nonlinearity using an invertible neural network and imposes sparsity on the VAR coefficients to reflect the parsimonious dependencies usually found in real applications. To efficiently solve the formulated optimization problems, a custom algorithm is devised combining proximal gradient descent, stochastic primal-dual updates, and projection to enforce the corresponding constraints. Experimental results on both synthetic and real data sets show that the proposed algorithm improves the identification of the support of the VAR coefficients in a parsimonious manner while also improving the time-series prediction, as compared to the current state-of-the-art methods.
FedZeN: Towards superlinear zeroth-order federated learning via incremental Hessian estimation
Authors: Alessio Maritan, Subhrakanti Dey, Luca Schenato
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2309.17174
Pdf link: https://arxiv.org/pdf/2309.17174
Abstract Federated learning is a distributed learning framework that allows a set of clients to collaboratively train a model under the orchestration of a central server, without sharing raw data samples. Although in many practical scenarios the derivatives of the objective function are not available, only a few works have considered the federated zeroth-order setting, in which functions can only be accessed through a budgeted number of point evaluations. In this work we focus on convex optimization and design the first federated zeroth-order algorithm to estimate the curvature of the global objective, with the purpose of achieving superlinear convergence. We take an incremental Hessian estimator whose error norm converges linearly, and we adapt it to the federated zeroth-order setting, sampling the random search directions from the Stiefel manifold for improved performance. In particular, both the gradient and Hessian estimators are built at the central server in a communication-efficient and privacy-preserving way by leveraging synchronized pseudo-random number generators. We provide a theoretical analysis of our algorithm, named FedZeN, proving local quadratic convergence with high probability and global linear convergence up to zeroth-order precision. Numerical simulations confirm the superlinear convergence rate and show that our algorithm outperforms the federated zeroth-order methods available in the literature.
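The zeroth-order setting described above, where the objective is accessed only through point evaluations along orthonormal search directions, can be sketched for the gradient alone. This is a generic forward-difference estimator, not FedZeN: the QR-based direction sampling, the step size `mu`, the function names, and the quadratic test objective are assumptions for illustration, and no Hessian estimation or federation is shown.

```python
import numpy as np

def orthonormal_directions(dim, k, rng):
    """k orthonormal columns in R^dim from the QR factorization of a Gaussian matrix."""
    q, _ = np.linalg.qr(rng.standard_normal((dim, k)))
    return q

def zeroth_order_gradient(f, x, directions, mu=1e-6):
    """Forward-difference gradient estimate restricted to span(directions).

    Uses one evaluation at x plus one evaluation per direction.
    """
    fx = f(x)
    slopes = np.array([(f(x + mu * d) - fx) / mu for d in directions.T])
    return directions @ slopes

# Convex quadratic test objective; its true gradient is A @ x.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
f = lambda x: 0.5 * x @ A @ x
x0 = np.array([1.0, -1.0])

rng = np.random.default_rng(0)
D = orthonormal_directions(2, 2, rng)
g = zeroth_order_gradient(f, x0, D)
print(g, A @ x0)
```

With as many orthonormal directions as dimensions, the estimate approximates the full gradient; with fewer, it approximates the gradient's projection onto the sampled subspace.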
Generalized Activation via Multivariate Projection
Authors: Jiayun Li, Yuxiao Cheng, Zhuofan Xia, Yilin Mo, Gao Huang
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.17194
Pdf link: https://arxiv.org/pdf/2309.17194
Abstract Activation functions are essential to introduce nonlinearity into neural networks, with the Rectified Linear Unit (ReLU) often favored for its simplicity and effectiveness. Motivated by the structural similarity between a shallow Feedforward Neural Network (FNN) and a single iteration of the Projected Gradient Descent (PGD) algorithm, a standard approach for solving constrained optimization problems, we consider ReLU as a projection from R onto the nonnegative half-line R+. Building on this interpretation, we extend ReLU by substituting it with a generalized projection operator onto a convex cone, such as the Second-Order Cone (SOC) projection, thereby naturally extending it to a Multivariate Projection Unit (MPU), an activation function with multiple inputs and multiple outputs. We further provide a mathematical proof establishing that FNNs activated by SOC projections outperform those utilizing ReLU in terms of expressive power. Experimental evaluations on widely-adopted architectures further corroborate MPU's effectiveness against a broader range of existing activation functions.
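The projection view in this abstract can be sketched with the standard closed-form Euclidean projection onto the second-order cone. The code below shows only the two projections; how the paper groups pre-activations into cone inputs inside a network is not stated in the abstract, so the single-vector interface here is an assumption.

```python
import numpy as np

def relu(z):
    """Euclidean projection of each entry onto the nonnegative half-line."""
    return np.maximum(z, 0.0)

def project_soc(v):
    """Euclidean projection of v = (x, t) onto the cone {(x, t): ||x||_2 <= t}.

    The last entry of v is treated as t and the remaining entries as x.
    """
    x, t = v[:-1], v[-1]
    norm = np.linalg.norm(x)
    if norm <= t:            # already inside the cone
        return v.copy()
    if norm <= -t:           # inside the polar cone: projects to the origin
        return np.zeros_like(v)
    coef = (norm + t) / (2.0 * norm)   # here norm > |t| >= 0, so no division by zero
    return np.concatenate([coef * x, [coef * norm]])

v = np.array([3.0, 4.0, 0.0])
p = project_soc(v)
print(p)   # lies on the cone boundary: ||p[:-1]|| equals p[-1]
```

When `v` has a single entry, the cone reduces to the nonnegative half-line and `project_soc` coincides with `relu`, which matches the abstract's reading of ReLU as a one-dimensional projection.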
Training and inference of large language models using 8-bit floating point
Authors: Sergio P. Perez, Yan Zhang, James Briggs, Charlie Blake, Josh Levy-Kramer, Paul Balanca, Carlo Luschi, Stephen Barlow, Andrew William Fitzgibbon
Subjects: Machine Learning (cs.LG); Hardware Architecture (cs.AR); Computation and Language (cs.CL); Emerging Technologies (cs.ET); Performance (cs.PF)
Arxiv link: https://arxiv.org/abs/2309.17224
Pdf link: https://arxiv.org/pdf/2309.17224
Abstract FP8 formats are gaining popularity to boost the computational efficiency for training and inference of large deep learning models. Their main challenge is that a careful choice of scaling is needed to prevent degradation due to the reduced dynamic range compared to higher-precision formats. Although there exists ample literature about selecting such scalings for INT formats, this critical aspect has yet to be addressed for FP8. This paper presents a methodology to select the scalings for FP8 linear layers, based on dynamically updating per-tensor scales for the weights, gradients and activations. We apply this methodology to train and validate large language models of the type of GPT and Llama 2 using FP8, for model sizes ranging from 111M to 70B. To facilitate the understanding of the FP8 dynamics, our results are accompanied by plots of the per-tensor scale distribution for weights, activations and gradients during both training and inference.
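A per-tensor scale of the kind the abstract refers to can be sketched as follows. This is a generic rule, not the paper's methodology: the power-of-two restriction, the `margin` parameter, and the function name are assumptions for illustration. The constants are the largest finite values of the commonly used E4M3 and E5M2 FP8 formats, and no actual FP8 cast is performed.

```python
import numpy as np

FP8_E4M3_MAX = 448.0     # largest finite value in the E4M3 format
FP8_E5M2_MAX = 57344.0   # largest finite value in the E5M2 format

def per_tensor_scale(x, fp8_max=FP8_E4M3_MAX, margin=0):
    """Power-of-two scale that maps max|x| just inside the FP8 range.

    `margin` (hypothetical) leaves extra headroom of 2**margin below fp8_max.
    """
    amax = float(np.max(np.abs(x)))
    if amax == 0.0:
        return 1.0
    exponent = np.floor(np.log2(fp8_max / amax)) - margin
    return float(2.0 ** exponent)

# Small-magnitude tensor, e.g. gradients (values are illustrative).
grads = np.array([1e-3, -2e-3, 5e-4])
scale = per_tensor_scale(grads)
scaled = grads * scale           # this is what would be cast to FP8
restored = scaled / scale        # the scale is undone after the FP8 operation
print(scale, np.max(np.abs(scaled)))
```

Recomputing the scale from the current tensor statistics at each step is one way to read "dynamically updating per-tensor scales"; the paper's exact update rule is not given in the abstract.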
Module-wise Training of Neural Networks via the Minimizing Movement Scheme
Authors: Skander Karkar, Ibrahim Ayed, Emmanuel de Bézenac, Patrick Gallinari
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.17357
Pdf link: https://arxiv.org/pdf/2309.17357
Abstract Greedy layer-wise or module-wise training of neural networks is compelling in constrained and on-device settings where memory is limited, as it circumvents a number of problems of end-to-end back-propagation. However, it suffers from a stagnation problem, whereby early layers overfit and deeper layers stop increasing the test accuracy after a certain depth. We propose to solve this issue by introducing a module-wise regularization inspired by the minimizing movement scheme for gradient flows in distribution space. We call the method TRGL for Transport Regularized Greedy Learning and study it theoretically, proving that it leads to greedy modules that are regular and that progressively solve the task. Experimentally, we show improved accuracy of module-wise training of various architectures such as ResNets, Transformers and VGG, when our regularization is added, superior to that of other module-wise training methods and often to end-to-end training, with as much as 60% less memory usage.
Network Memory Footprint Compression Through Jointly Learnable Codebooks and Mappings
Authors: Edouard Yvinec, Arnaud Dapogny, Kevin Bailly
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.17361
Pdf link: https://arxiv.org/pdf/2309.17361
Abstract The massive interest in deep neural networks (DNNs) for both computer vision and natural language processing has been sparked by the growth in computational power. However, this led to an increase in the memory footprint, to a point where it can be challenging to simply load a model on commodity devices such as mobile phones. To address this limitation, quantization is a favored solution as it maps high precision tensors to a low precision, memory efficient format. In terms of memory footprint reduction, its most effective variants are based on codebooks. These methods, however, suffer from two limitations. First, they either define a single codebook for each tensor, or use a memory-expensive mapping to multiple codebooks. Second, gradient descent optimization of the mapping favors jumps toward extreme values, hence not defining a proximal search. In this work, we propose to address these two limitations. First, we initially group similarly distributed neurons and leverage the re-ordered structure to either apply different scale factors to the different groups, or map weights that fall in these groups to several codebooks, without any mapping overhead. Second, stemming from this initialization, we propose a joint learning of the codebook and weight mappings that bears similarities with recent gradient-based post-training quantization techniques. Third, drawing inspiration from straight-through estimation techniques, we introduce a novel gradient update definition to enable a proximal search of the codebooks and their mappings. The proposed jointly learnable codebooks and mappings (JLCM) method allows a very efficient approximation of any DNN: as such, a Llama 7B can be compressed down to 2 GB and loaded on 5-year-old smartphones.
Directly Fine-Tuning Diffusion Models on Differentiable Rewards
Authors: Kevin Clark, Paul Vicol, Kevin Swersky, David J Fleet
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.17400
Pdf link: https://arxiv.org/pdf/2309.17400
Abstract We present Direct Reward Fine-Tuning (DRaFT), a simple and effective method for fine-tuning diffusion models to maximize differentiable reward functions, such as scores from human preference models. We first show that it is possible to backpropagate the reward function gradient through the full sampling procedure, and that doing so achieves strong performance on a variety of rewards, outperforming reinforcement learning-based approaches. We then propose more efficient variants of DRaFT: DRaFT-K, which truncates backpropagation to only the last K steps of sampling, and DRaFT-LV, which obtains lower-variance gradient estimates for the case when K=1. We show that our methods work well for a variety of reward functions and can be used to substantially improve the aesthetic quality of images generated by Stable Diffusion 1.4. Finally, we draw connections between our approach and prior work, providing a unifying perspective on the design space of gradient-based fine-tuning algorithms.
Keyword: super-resolution

Autonomous Guidance Navigation and Control of the VISORS Formation-Flying Mission
Authors: Tommaso Guffanti, Toby Bell, Samuel Y. W. Low, Mason Murray-Cooper, Simone D'Amico
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.16698
Pdf link: https://arxiv.org/pdf/2309.16698
Abstract Virtual Super-resolution Optics with Reconfigurable Swarms (VISORS) is a distributed telescope mission for high-resolution imaging of the Sun using two 6U CubeSats flying in formation in a Sun-synchronous low-Earth orbit. An optics spacecraft carries a photon sieve acting as a high-resolution lens in the extreme ultraviolet spectrum, while the image passing through the sieve is focused on a detector spacecraft. This paper presents the newly conceived design of the on-board guidance, navigation and control (GNC) system, which is highly autonomous, robust, passively safe, and validated under realistic mission simulations. The primary objective of the GNC system is to establish a passively safe and high-precision formation alignment at 40-meter separation, with sub-centimeter relative navigation and position control accuracy, over repeated observations of 10-second duration. Science mission success rates are assessed via Monte-Carlo analyses under realistically modelled uncertainties stemming from sensing errors, maneuver errors, unmodelled dynamics, and erroneous knowledge of internal spacecraft components. Precise real-time relative navigation is achieved by carrier phase differential GPS with integer ambiguity resolution. Precise control over short baselines is achieved via closed-loop optimization-based stochastic model predictive control with centimeter-level accuracy. Control at far range and during approach is achieved by closed-form impulsive control with meter-level accuracy. Passive safety is enforced throughout the mission to mitigate collision risks even under critical subsystem failure. Beyond VISORS, this work also realizes the crucial insight that the described GNC architecture is generalizable to other distributed space missions where accuracy and fault-tolerant safety are key requirements, such as rendezvous, proximity operations, and swarming missions.
Revisiting Cephalometric Landmark Detection from the view of Human Pose Estimation with Lightweight Super-Resolution Head
Authors: Qian Wu, Si Yong Yeo, Yufei Chen, Jun Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2309.17143
Pdf link: https://arxiv.org/pdf/2309.17143
Abstract Accurate localization of cephalometric landmarks holds great importance in the fields of orthodontics and orthognathics due to its potential for automating key point labeling. In the context of landmark detection, particularly in cephalometrics, it has been observed that existing methods often lack standardized pipelines and well-designed bias reduction processes, which significantly impact their performance. In this paper, we revisit a related task, human pose estimation (HPE), which shares numerous similarities with cephalometric landmark detection (CLD), and emphasize the potential for transferring techniques from the former field to benefit the latter. Motivated by this insight, we have developed a robust and adaptable benchmark based on the well-established HPE codebase known as MMPose. This benchmark can serve as a dependable baseline for achieving exceptional CLD performance. Furthermore, we introduce an upscaling design within the framework to further enhance performance. This enhancement involves the incorporation of a lightweight and efficient super-resolution module, which generates heatmap predictions on high-resolution features and leads to further performance refinement, benefiting from its ability to reduce quantization bias. In the MICCAI CLDetection2023 challenge, our method achieves 1st place ranking on three metrics and 3rd place on the remaining one. The code for our method is available at https://github.com/5k5000/CLdetection2023.
Effect of structure-based training on 3D localization precision and quality
Authors: Armin Abdehkakha, Craig Snoeyink
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2309.17265
Pdf link: https://arxiv.org/pdf/2309.17265
Abstract This study introduces a structural-based training approach for CNN-based algorithms in single-molecule localization microscopy (SMLM) and 3D object reconstruction. We compare this approach with the traditional random-based training method, utilizing the LUENN package as our AI pipeline. The quantitative evaluation demonstrates significant improvements in detection rate and localization precision with the structural-based training approach, particularly in varying signal-to-noise ratios (SNRs). Moreover, the method effectively removes checkerboard artifacts, ensuring more accurate 3D reconstructions. Our findings highlight the potential of the structural-based training approach to advance super-resolution microscopy and deepen our understanding of complex biological systems at the nanoscale.

zoq / arxiv-updates

New submissions for Mon, 2 Oct 23 #611
