Abstract
Stochastic Gradient Descent (SGD), a widely used optimization algorithm in deep learning, is often limited to converging to local optima due to the non-convex nature of the problem. Leveraging these local optima to improve model performance remains a challenging task. Given the inherent complexity of neural networks, the simple arithmetic averaging of the obtained local optima models in undesirable results. This paper proposes a {\em soft merging} method that facilitates rapid merging of multiple models, simplifies the merging of specific parts of neural networks, and enhances robustness against malicious models with extreme values. This is achieved by learning gate parameters through a surrogate of the $l_0$ norm using hard concrete distribution without modifying the model weights of the given local optima models. This merging process not only enhances the model performance by converging to a better local optimum, but also minimizes computational costs, offering an efficient and explicit learning process integrated with stochastic gradient descent. Thorough experiments underscore the effectiveness and superior performance of the merged neural networks.
Keyword: optimization
AdBooster: Personalized Ad Creative Generation using Stable Diffusion Outpainting
Abstract
In digital advertising, the selection of the optimal item (recommendation) and its best creative presentation (creative optimization) have traditionally been considered separate disciplines. However, both contribute significantly to user satisfaction, underpinning our assumption that it relies on both an item's relevance and its presentation, particularly in the case of visual creatives. In response, we introduce the task of {\itshape Generative Creative Optimization (GCO)}, which proposes the use of generative models for creative generation that incorporate user interests, and {\itshape AdBooster}, a model for personalized ad creatives based on the Stable Diffusion outpainting architecture. This model uniquely incorporates user interests both during fine-tuning and at generation time. To further improve AdBooster's performance, we also introduce an automated data augmentation pipeline. Through our experiments on simulated data, we validate AdBooster's effectiveness in generating more relevant creatives than default product images, showing its potential of enhancing user engagement.
Ad-load Balancing via Off-policy Learning in a Content Marketplace
Abstract
Ad-load balancing is a critical challenge in online advertising systems, particularly in the context of social media platforms, where the goal is to maximize user engagement and revenue while maintaining a satisfactory user experience. This requires the optimization of conflicting objectives, such as user satisfaction and ads revenue. Traditional approaches to ad-load balancing rely on static allocation policies, which fail to adapt to changing user preferences and contextual factors. In this paper, we present an approach that leverages off-policy learning and evaluation from logged bandit feedback. We start by presenting a motivating analysis of the ad-load balancing problem, highlighting the conflicting objectives between user satisfaction and ads revenue. We emphasize the nuances that arise due to user heterogeneity and the dependence on the user's position within a session. Based on this analysis, we define the problem as determining the optimal ad-load for a particular feed fetch. To tackle this problem, we propose an off-policy learning framework that leverages unbiased estimators such as Inverse Propensity Scoring (IPS) and Doubly Robust (DR) to learn and estimate the policy values using offline collected stochastic data. We present insights from online A/B experiments deployed at scale across over 80 million users generating over 200 million sessions, where we find statistically significant improvements in both user satisfaction metrics and ads revenue for the platform.
EPTQ: Enhanced Post-Training Quantization via Label-Free Hessian
Authors: Authors: Ofir Gordon, Hai Victor Habi, Arnon Netzer
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Quantization of deep neural networks (DNN) has become a key element in the efforts of embedding such networks on end-user devices. However, current quantization methods usually suffer from costly accuracy degradation. In this paper, we propose a new method for Enhanced Post Training Quantization named EPTQ. The method is based on knowledge distillation with an adaptive weighting of layers. In addition, we introduce a new label-free technique for approximating the Hessian trace of the task loss, named Label-Free Hessian. This technique removes the requirement of a labeled dataset for computing the Hessian. The adaptive knowledge distillation uses the Label-Free Hessian technique to give greater attention to the sensitive parts of the model while performing the optimization. Empirically, by employing EPTQ we achieve state-of-the-art results on a wide variety of models, tasks, and datasets, including ImageNet classification, COCO object detection, and Pascal-VOC for semantic segmentation. We demonstrate the performance and compatibility of EPTQ on an extended set of architectures, including CNNs, Transformers, hybrid, and MLP-only models.
Parallel-mentoring for Offline Model-based Optimization
Authors: Authors: Can Chen, Christopher Beckham, Zixuan Liu, Xue Liu, Christopher Pal
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Abstract
We study offline model-based optimization to maximize a black-box objective function with a static dataset of designs and scores. These designs encompass a variety of domains, including materials, robots and DNA sequences. A common approach trains a proxy on the static dataset to approximate the black-box objective function and performs gradient ascent to obtain new designs. However, this often results in poor designs due to the proxy inaccuracies for out-of-distribution designs. Recent studies indicate that: (a) gradient ascent with a mean ensemble of proxies generally outperforms simple gradient ascent, and (b) a trained proxy provides weak ranking supervision signals for design selection. Motivated by (a) and (b), we propose \textit{parallel-mentoring} as an effective and novel method that facilitates mentoring among parallel proxies, creating a more robust ensemble to mitigate the out-of-distribution issue. We focus on the three-proxy case and our method consists of two modules. The first module, \textit{voting-based pairwise supervision}, operates on three parallel proxies and captures their ranking supervision signals as pairwise comparison labels. These labels are combined through majority voting to generate consensus labels, which incorporate ranking supervision signals from all proxies and enable mutual mentoring. However, label noise arises due to possible incorrect consensus. To alleviate this, we introduce an \textit{adaptive soft-labeling} module with soft-labels initialized as consensus labels. Based on bi-level optimization, this module fine-tunes proxies in the inner level and learns more accurate labels in the outer level to adaptively mentor proxies, resulting in a more robust ensemble. Experiments validate the effectiveness of our method. Our code is available here.
Importance-aware Co-teaching for Offline Model-based Optimization
Authors: Authors: Ye Yuan, Can Chen, Zixuan Liu, Willie Neiswanger, Xue Liu
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Abstract
Offline model-based optimization aims to find a design that maximizes a property of interest using only an offline dataset, with applications in robot, protein, and molecule design, among others. A prevalent approach is gradient ascent, where a proxy model is trained on the offline dataset and then used to optimize the design. This method suffers from an out-of-distribution issue, where the proxy is not accurate for unseen designs. To mitigate this issue, we explore using a pseudo-labeler to generate valuable data for fine-tuning the proxy. Specifically, we propose \textit{\textbf{I}mportance-aware \textbf{C}o-\textbf{T}eaching for Offline Model-based Optimization}~(\textbf{ICT}). This method maintains three symmetric proxies with their mean ensemble as the final proxy, and comprises two steps. The first step is \textit{pseudo-label-driven co-teaching}. In this step, one proxy is iteratively selected as the pseudo-labeler for designs near the current optimization point, generating pseudo-labeled data. Subsequently, a co-teaching process identifies small-loss samples as valuable data and exchanges them between the other two proxies for fine-tuning, promoting knowledge transfer. This procedure is repeated three times, with a different proxy chosen as the pseudo-labeler each time, ultimately enhancing the ensemble performance. To further improve accuracy of pseudo-labels, we perform a secondary step of \textit{meta-learning-based sample reweighting}, which assigns importance weights to samples in the pseudo-labeled dataset and updates them via meta-learning. ICT achieves state-of-the-art results across multiple design-bench tasks, achieving the best mean rank of $3.1$ and median rank of $2$, among $15$ methods. Our source code can be found here.
Latent Diffusion Models for Structural Component Design
Abstract
Recent advances in generative modeling, namely Diffusion models, have revolutionized generative modeling, enabling high-quality image generation tailored to user needs. This paper proposes a framework for the generative design of structural components. Specifically, we employ a Latent Diffusion model to generate potential designs of a component that can satisfy a set of problem-specific loading conditions. One of the distinct advantages our approach offers over other generative approaches, such as generative adversarial networks (GANs), is that it permits the editing of existing designs. We train our model using a dataset of geometries obtained from structural topology optimization utilizing the SIMP algorithm. Consequently, our framework generates inherently near-optimal designs. Our work presents quantitative results that support the structural performance of the generated designs and the variability in potential candidate designs. Furthermore, we provide evidence of the scalability of our framework by operating over voxel domains with resolutions varying from $32^3$ to $128^3$. Our framework can be used as a starting point for generating novel near-optimal designs similar to topology-optimized designs.
GenLayNeRF: Generalizable Layered Representations with 3D Model Alignment for Multi-Human View Synthesis
Abstract
Novel view synthesis (NVS) of multi-human scenes imposes challenges due to the complex inter-human occlusions. Layered representations handle the complexities by dividing the scene into multi-layered radiance fields, however, they are mainly constrained to per-scene optimization making them inefficient. Generalizable human view synthesis methods combine the pre-fitted 3D human meshes with image features to reach generalization, yet they are mainly designed to operate on single-human scenes. Another drawback is the reliance on multi-step optimization techniques for parametric pre-fitting of the 3D body models that suffer from misalignment with the images in sparse view settings causing hallucinations in synthesized views. In this work, we propose, GenLayNeRF, a generalizable layered scene representation for free-viewpoint rendering of multiple human subjects which requires no per-scene optimization and very sparse views as input. We divide the scene into multi-human layers anchored by the 3D body meshes. We then ensure pixel-level alignment of the body models with the input views through a novel end-to-end trainable module that carries out iterative parametric correction coupled with multi-view feature fusion to produce aligned 3D models. For NVS, we extract point-wise image-aligned and human-anchored features which are correlated and fused using self-attention and cross-attention modules. We augment low-level RGB values into the features with an attention-based RGB fusion module. To evaluate our approach, we construct two multi-human view synthesis datasets; DeepMultiSyn and ZJU-MultiHuman. The results indicate that our proposed approach outperforms generalizable and non-human per-scene NeRF methods while performing at par with layered per-scene methods without test time optimization.
Interactively Optimizing Layout Transfer for Vector Graphics
Authors: Authors: Jeremy Warner, Shuyao Zhou, Bjoern Hartmann
Abstract
One of the most common ways to represent and share visual designs is with vector graphics. Designers working with vector graphics often explore layout alternatives and generate them by moving and resizing elements. The motivation for this can range from establishing a different visual flow, adapting a design to a different aspect ratio, standardizing spacing, or redirecting the design's visual emphasis. Existing designs can serve as a source of inspiration for layout modification across these goals. However, generating these layout alternatives still requires significant manual effort in rearranging large groups of elements. We present VLT, short for Vector Layout Transfer, a novel graphic design tool that enables flexible transfer of layouts between designs. It provides designers with multiple levels of semantic layout editing controls, powered by automatic graphics correspondence and layout optimization algorithms.
Achieving Autonomous Cloth Manipulation with Optimal Control via Differentiable Physics-Aware Regularization and Safety Constraints
Authors: Authors: Yutong Zhang, Fei Liu, Xiao Liang, Michael Yip
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Abstract
Cloth manipulation is a category of deformable object manipulation of great interest to the robotics community, from applications of automated laundry-folding and home organizing and cleaning to textiles and flexible manufacturing. Despite the desire for automated cloth manipulation, the thin-shell dynamics and under-actuation nature of cloth present significant challenges for robots to effectively interact with them. Many recent works omit explicit modeling in favor of learning-based methods that may yield control policies directly. However, these methods require large training sets that must be collected and curated. In this regard, we create a framework for differentiable modeling of cloth dynamics leveraging an Extended Position-based Dynamics (XPBD) algorithm. Together with the desired control objective, physics-aware regularization terms are designed for better results, including trajectory smoothness and elastic potential energy. In addition, safety constraints, such as avoiding obstacles, can be specified using signed distance functions (SDFs). We formulate the cloth manipulation task with safety constraints as a constrained optimization problem, which can be effectively solved by mainstream gradient-based optimizers thanks to the end-to-end differentiability of our framework. Finally, we assess the proposed framework for manipulation tasks with various safety thresholds and demonstrate the feasibility of result trajectories on a surgical robot. The effects of the regularization terms are analyzed in an additional ablation study.
Real-to-Sim Deformable Object Manipulation: Optimizing Physics Models with Residual Mappings for Robotic Surgery
Abstract
Accurate deformable object manipulation (DOM) is essential for achieving autonomy in robotic surgery, where soft tissues are being displaced, stretched, and dissected. Many DOM methods can be powered by simulation, which ensures realistic deformation by adhering to the governing physical constraints and allowing for model prediction and control. However, real soft objects in robotic surgery, such as membranes and soft tissues, have complex, anisotropic physical parameters that a simulation with simple initialization from cameras may not fully capture. To use the simulation techniques in real surgical tasks, the "real-to-sim" gap needs to be properly compensated. In this work, we propose an online, adaptive parameter tuning approach for simulation optimization that (1) bridges the real-to-sim gap between a physics simulation and observations obtained 3D perceptions through estimating a residual mapping and (2) optimizes its stiffness parameters online. Our method ensures a small residual gap between the simulation and observation and improves the simulation's predictive capabilities. The effectiveness of the proposed mechanism is evaluated in the manipulation of both a thin-shell and volumetric tissue, representative of most tissue scenarios. This work contributes to the advancement of simulation-based deformable tissue manipulation and holds potential for improving surgical autonomy.
Dr. FERMI: A Stochastic Distributionally Robust Fair Empirical Risk Minimization Framework
Authors: Authors: Sina Baharlouei, Meisam Razaviyayn
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (stat.ML)
Abstract
While training fair machine learning models has been studied extensively in recent years, most developed methods rely on the assumption that the training and test data have similar distributions. In the presence of distribution shifts, fair models may behave unfairly on test data. There have been some developments for fair learning robust to distribution shifts to address this shortcoming. However, most proposed solutions are based on the assumption of having access to the causal graph describing the interaction of different features. Moreover, existing algorithms require full access to data and cannot be used when small batches are used (stochastic/batch implementation). This paper proposes the first stochastic distributionally robust fairness framework with convergence guarantees that do not require knowledge of the causal graph. More specifically, we formulate the fair inference in the presence of the distribution shift as a distributionally robust optimization problem under $L_p$ norm uncertainty sets with respect to the Exponential Renyi Mutual Information (ERMI) as the measure of fairness violation. We then discuss how the proposed method can be implemented in a stochastic fashion. We have evaluated the presented framework's performance and efficiency through extensive experiments on real datasets consisting of distribution shifts.
Large-scale Pretraining Improves Sample Efficiency of Active Learning based Molecule Virtual Screening
Authors: Authors: Zhonglin Cao, Simone Sciabola, Ye Wang
Abstract
Virtual screening of large compound libraries to identify potential hit candidates is one of the earliest steps in drug discovery. As the size of commercially available compound collections grows exponentially to the scale of billions, brute-force virtual screening using traditional tools such as docking becomes infeasible in terms of time and computational resources. Active learning and Bayesian optimization has recently been proven as effective methods of narrowing down the search space. An essential component in those methods is a surrogate machine learning model that is trained with a small subset of the library to predict the desired properties of compounds. Accurate model can achieve high sample efficiency by finding the most promising compounds with only a fraction of the whole library being virtually screened. In this study, we examined the performance of pretrained transformer-based language model and graph neural network in Bayesian optimization active learning framework. The best pretrained models identifies 58.97% of the top-50000 by docking score after screening only 0.6% of an ultra-large library containing 99.5 million compounds, improving 8% over previous state-of-the-art baseline. Through extensive benchmarks, we show that the superior performance of pretrained models persists in both structure-based and ligand-based drug discovery. Such model can serve as a boost to the accuracy and sample efficiency of active learning based molecule virtual screening.
RAI4IoE: Responsible AI for Enabling the Internet of Energy
Abstract
This paper plans to develop an Equitable and Responsible AI framework with enabling techniques and algorithms for the Internet of Energy (IoE), in short, RAI4IoE. The energy sector is going through substantial changes fueled by two key drivers: building a zero-carbon energy sector and the digital transformation of the energy infrastructure. We expect to see the convergence of these two drivers resulting in the IoE, where renewable distributed energy resources (DERs), such as electric cars, storage batteries, wind turbines and photovoltaics (PV), can be connected and integrated for reliable energy distribution by leveraging advanced 5G-6G networks and AI technology. This allows DER owners as prosumers to participate in the energy market and derive economic incentives. DERs are inherently asset-driven and face equitable challenges (i.e., fair, diverse and inclusive). Without equitable access, privileged individuals, groups and organizations can participate and benefit at the cost of disadvantaged groups. The real-time management of DER resources not only brings out the equity problem to the IoE, it also collects highly sensitive location, time, activity dependent data, which requires to be handled responsibly (e.g., privacy, security and safety), for AI-enhanced predictions, optimization and prioritization services, and automated management of flexible resources. The vision of our project is to ensure equitable participation of the community members and responsible use of their data in IoE so that it could reap the benefits of advances in AI to provide safe, reliable and sustainable energy services.
FleXstage: Lightweight Magnetically Levitated Precision Stage with Over-Actuation towards High-Throughput IC Manufacturing
Abstract
Precision motion stages play a critical role in various manufacturing and inspection equipment, for example, the wafer/reticle scanning in photolithography scanners and positioning stages in wafer inspection systems. To meet the growing demand for higher throughput in chip manufacturing and inspection, it is critical to create new precision motion stages with higher acceleration capability with high control bandwidth, which calls for the development of lightweight precision stages. However, in today's precision motion systems, only the rigid body motion of the system are under control, and the flexible dynamic systems are in open loop. For these systems, the motion control bandwidth is limited by the first structural resonance frequency of the stage, which enforces a fundamental trade-off between the stage's bandwidth and acceleration capability. Aiming to overcome this trade-off, we have introduced a sequential structure and control design framework for lightweight stages with the low-frequency flexible modes of the stage are under active control. To facilitate the controller design, we further propose to minimize the resonance frequency of the stage's mode being controlled and to maximize the resonance frequency of the uncontrolled mode. The system's control bandwidth is placed in between the resonance frequencies. This paper presents the design, optimization, building, and experimental evaluations for a lightweight magnetically levitated planar stage, which we call FleXstage, with first flexible mode actively controlled via over-actuation. Simulations show the proposed design is highly promising in enabling stages with lightweight without sacrificing control bandwidth. We have some preliminary results now and are still working on the experimental evaluations for the closed-loop system, and will present the results in the oral presentation.
Deep Learning Meets Swarm Intelligence for UAV-Assisted IoT Coverage in Massive MIMO
Abstract
This study considers a UAV-assisted multi-user massive multiple-input multiple-output (MU-mMIMO) systems, where a decode-and-forward (DF) relay in the form of an unmanned aerial vehicle (UAV) facilitates the transmission of multiple data streams from a base station (BS) to multiple Internet-of-Things (IoT) users. A joint optimization problem of hybrid beamforming (HBF), UAV relay positioning, and power allocation (PA) to multiple IoT users to maximize the total achievable rate (AR) is investigated. The study adopts a geometry-based millimeter-wave (mmWave) channel model for both links and proposes three different swarm intelligence (SI)-based algorithmic solutions to optimize: 1) UAV location with equal PA; 2) PA with fixed UAV location; and 3) joint PA with UAV deployment. The radio frequency (RF) stages are designed to reduce the number of RF chains based on the slow time-varying angular information, while the baseband (BB) stages are designed using the reduced-dimension effective channel matrices. Then, a novel deep learning (DL)-based low-complexity joint hybrid beamforming, UAV location and power allocation optimization scheme (J-HBF-DLLPA) is proposed via fully-connected deep neural network (DNN), consisting of an offline training phase, and an online prediction of UAV location and optimal power values for maximizing the AR. The illustrative results show that the proposed algorithmic solutions can attain higher capacity and reduce average delay for delay-constrained transmissions in a UAV-assisted MU-mMIMO IoT systems. Additionally, the proposed J-HBF-DLLPA can closely approach the optimal capacity while significantly reducing the runtime by 99%, which makes the DL-based solution a promising implementation for real-time online applications in UAV-assisted MU-mMIMO IoT systems.
A Real-Time Multi-Task Learning System for Joint Detection of Face, Facial Landmark and Head Pose
Authors: Authors: Qingtian Wu, Liming Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Extreme head postures pose a common challenge across a spectrum of facial analysis tasks, including face detection, facial landmark detection (FLD), and head pose estimation (HPE). These tasks are interdependent, where accurate FLD relies on robust face detection, and HPE is intricately associated with these key points. This paper focuses on the integration of these tasks, particularly when addressing the complexities posed by large-angle face poses. The primary contribution of this study is the proposal of a real-time multi-task detection system capable of simultaneously performing joint detection of faces, facial landmarks, and head poses. This system builds upon the widely adopted YOLOv8 detection framework. It extends the original object detection head by incorporating additional landmark regression head, enabling efficient localization of crucial facial landmarks. Furthermore, we conduct optimizations and enhancements on various modules within the original YOLOv8 framework. To validate the effectiveness and real-time performance of our proposed model, we conduct extensive experiments on 300W-LP and AFLW2000-3D datasets. The results obtained verify the capability of our model to tackle large-angle face pose challenges while delivering real-time performance across these interconnected tasks.
Collaborative Fault-Identification & Reconstruction in Multi-Agent Systems
Authors: Authors: Shiraz Khan, Inseok Hwang
Subjects: Systems and Control (eess.SY); Signal Processing (eess.SP)
Abstract
The conventional solutions for fault-detection, identification, and reconstruction (FDIR) require centralized decision-making mechanisms which are typically combinatorial in their nature, necessitating the design of an efficient distributed FDIR mechanism that is suitable for multi-agent applications. To this end, we develop a general framework for efficiently reconstructing a sparse vector being observed over a sensor network via nonlinear measurements. The proposed framework is used to design a distributed multi-agent FDIR algorithm based on a combination of the sequential convex programming (SCP) and the alternating direction method of multipliers (ADMM) optimization approaches. The proposed distributed FDIR algorithm can process a variety of inter-agent measurements (including distances, bearings, relative velocities, and subtended angles between agents) to identify the faulty agents and recover their true states. The effectiveness of the proposed distributed multi-agent FDIR approach is demonstrated by considering a numerical example in which the inter-agent distances are used to identify the faulty agents in a multi-agent configuration, as well as reconstruct their error vectors.
Joint Beamforming for RIS Aided Full-Duplex Integrated Sensing and Uplink Communication
Authors: Authors: Yuan Guo, Yang Liu, Qingqing Wu, Xin Zeng, Qingjiang Shi
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Abstract
This paper studies integrated sensing and communication (ISAC) technology in a full-duplex (FD) uplink communication system. As opposed to the half-duplex system, where sensing is conducted in a first-emit-then-listen manner, FD ISAC system emits and listens simultaneously and hence conducts uninterrupted target sensing. Besides, impressed by the recently emerging reconfigurable intelligent surface (RIS) technology, we also employ RIS to improve the self-interference (SI) suppression and signal processing gain. As will be seen, the joint beamforming, RIS configuration and mobile users' power allocation is a difficult optimization problem. To resolve this challenge, via leveraging the cutting-the-edge majorization-minimization (MM) and penalty-dual-decomposition (PDD) methods, we develop an iterative solution that optimizes all variables via using convex optimization techniques. Numerical results demonstrate the effectiveness of our proposed solution and the great benefit of employing RIS in the FD ISAC system.
Abstract
Over the last decades, ample achievements have been made on Structure from motion (SfM). However, the vast majority of them basically work in an offline manner, i.e., images are firstly captured and then fed together into a SfM pipeline for obtaining poses and sparse point cloud. In this work, on the contrary, we present an on-the-fly SfM: running online SfM while image capturing, the newly taken On-the-Fly image is online estimated with the corresponding pose and points, i.e., what you capture is what you get. Specifically, our approach firstly employs a vocabulary tree that is unsupervised trained using learning-based global features for fast image retrieval of newly fly-in image. Then, a robust feature matching mechanism with least squares (LSM) is presented to improve image registration performance. Finally, via investigating the influence of newly fly-in image's connected neighboring images, an efficient hierarchical weighted local bundle adjustment (BA) is used for optimization. Extensive experimental results demonstrate that on-the-fly SfM can meet the goal of robustly registering the images while capturing in an online way.
A Machine Learning-oriented Survey on Tiny Machine Learning
Authors: Authors: Luigi Capogrosso, Federico Cunico, Dong Seon Cheng, Franco Fummi, Marco cristani
Abstract
The emergence of Tiny Machine Learning (TinyML) has positively revolutionized the field of Artificial Intelligence by promoting the joint design of resource-constrained IoT hardware devices and their learning-based software architectures. TinyML carries an essential role within the fourth and fifth industrial revolutions in helping societies, economies, and individuals employ effective AI-infused computing technologies (e.g., smart cities, automotive, and medical robotics). Given its multidisciplinary nature, the field of TinyML has been approached from many different angles: this comprehensive survey wishes to provide an up-to-date overview focused on all the learning algorithms within TinyML-based solutions. The survey is based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) methodological flow, allowing for a systematic and complete literature survey. In particular, firstly we will examine the three different workflows for implementing a TinyML-based system, i.e., ML-oriented, HW-oriented, and co-design. Secondly, we propose a taxonomy that covers the learning panorama under the TinyML lens, examining in detail the different families of model optimization and design, as well as the state-of-the-art learning techniques. Thirdly, this survey will present the distinct features of hardware devices and software tools that represent the current state-of-the-art for TinyML intelligent edge applications. Finally, we discuss the challenges and future directions.
State-aware Real-time Tracking and Remote Reconstruction of a Markov Source
Abstract
The problem of real-time remote tracking and reconstruction of a two-state Markov process is considered here. A transmitter sends samples from an observed information source to a remote monitor over an unreliable wireless channel. The receiver, in turn, performs an action according to the state of the reconstructed source. We propose a state-aware randomized stationary sampling and transmission policy which accounts for the importance of different states of the information source, and their impact on the goal of the communication process. We then analyze the performance of the proposed policy, and compare it with existing goal-oriented joint sampling and transmission policies, with respect to a set of performance metrics. Specifically, we study the real-time reconstruction error, the cost of actuation error, the consecutive error, and a new metric, coined importance-aware consecutive error. In addition, we formulate and solve a constrained optimization problem that aims to obtain the optimal sampling probabilities that minimize the average cost of actuation error. Our results show that in the scenario of constrained sampling generation, the optimal state-aware randomized stationary policy outperforms all other sampling policies for fast evolving sources, and, under certain conditions, for slowly varying sources. Otherwise, a semantics-aware policy performs better only when the source is slowly varying.
Variational Connectionist Temporal Classification for Order-Preserving Sequence Modeling
Authors: Authors: Zheng Nan, Ting Dang, Vidhyasaharan Sethu, Beena Ahmed
Abstract
Connectionist temporal classification (CTC) is commonly adopted for sequence modeling tasks like speech recognition, where it is necessary to preserve order between the input and target sequences. However, CTC is only applied to deterministic sequence models, where the latent space is discontinuous and sparse, which in turn makes them less capable of handling data variability when compared to variational models. In this paper, we integrate CTC with a variational model and derive loss functions that can be used to train more generalizable sequence models that preserve order. Specifically, we derive two versions of the novel variational CTC based on two reasonable assumptions, the first being that the variational latent variables at each time step are conditionally independent; and the second being that these latent variables are Markovian. We show that both loss functions allow direct optimization of the variational lower bound for the model log-likelihood, and present computationally tractable forms for implementing them.
Enhancing SAEAs with Unevaluated Solutions: A Case Study of Relation Model for Expensive Optimization
Abstract
Surrogate-assisted evolutionary algorithms (SAEAs) hold significant importance in resolving expensive optimization problems~(EOPs). Extensive efforts have been devoted to improving the efficacy of SAEAs through the development of proficient model-assisted selection methods. However, generating high-quality solutions is a prerequisite for selection. The fundamental paradigm of evaluating a limited number of solutions in each generation within SAEAs reduces the variance of adjacent populations, thus impacting the quality of offspring solutions. This is a frequently encountered issue, yet it has not gained widespread attention. This paper presents a framework using unevaluated solutions to enhance the efficiency of SAEAs. The surrogate model is employed to identify high-quality solutions for direct generation of new solutions without evaluation. To ensure dependable selection, we have introduced two tailored relation models for the selection of the optimal solution and the unevaluated population. A comprehensive experimental analysis is performed on two test suites, which showcases the superiority of the relation model over regression and classification models in the selection phase. Furthermore, the surrogate-selected unevaluated solutions with high potential have been shown to significantly enhance the efficiency of the algorithm.
An elastic properties-based topology optimization algorithm for linear orthotropic, functionally graded materials
Authors: Authors: Ismael Ben-Yelun, Víctor Riera, Luis Saucedo-Mora, Miguel Ángel Sanz, Francisco Javier Montáns
Abstract
Topology optimization (TO) has experienced a dramatic development over the last decades aided by the arising of metamaterials and additive manufacturing (AM) techniques, and it is intended to achieve the current and future challenges. In this paper we propose an extension for linear orthotropic materials of a three-dimensional TO algorithm which directly operates on the six elastic properties -- three longitudinal and shear moduli, having fixed three Poisson ratios -- of the finite element (FE) discretization of certain analysis domain. By performing a gradient-descent-alike optimization on these properties, the standard deviation of a strain-energy measurement is minimized, thus coming up with optimized, strain-homogenized structures with variable longitudinal and shear stiffness in their different material directions. To this end, an orthotropic formulation with two approaches -- direct or strain-based and complementary or stress-based -- has been developed for this optimization problem, being the stress-based more efficient as previous works on this topic have shown. The key advantages that we propose are: (1) the use of orthotropic ahead of isotropic materials, which enables a more versatile optimization process since the design space is increased by six times, and (2) no constraint needs to be imposed (such as maximum volume) in contrast to other methods widely used in this field such as Solid Isotropic Material with Penalization (SIMP), all of this by setting one unique hyper-parameter. Results of four designed load cases show that this orthotropic-TO algorithm outperforms the isotropic case, both for the similar algorithm from which this is an extension and for a SIMP run in a FE commercial software, presenting a comparable computational cost. We remark that it works particularly effectively on pure shear or shear-governed problems such as torsion loading.
Multi-Task Cooperative Learning via Searching for Flat Minima
Authors: Authors: Fuping Wu, Le Zhang, Yang Sun, Yuanhan Mo, Thomas Nichols, Bartlomiej W. Papiez
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Multi-task learning (MTL) has shown great potential in medical image analysis, improving the generalizability of the learned features and the performance in individual tasks. However, most of the work on MTL focuses on either architecture design or gradient manipulation, while in both scenarios, features are learned in a competitive manner. In this work, we propose to formulate MTL as a multi/bi-level optimization problem, and therefore force features to learn from each task in a cooperative approach. Specifically, we update the sub-model for each task alternatively taking advantage of the learned sub-models of the other tasks. To alleviate the negative transfer problem during the optimization, we search for flat minima for the current objective function with regard to features from other tasks. To demonstrate the effectiveness of the proposed approach, we validate our method on three publicly available datasets. The proposed method shows the advantage of cooperative learning, and yields promising results when compared with the state-of-the-art MTL approaches. The code will be available online.
Physics-informed State-space Neural Networks for Transport Phenomena
Authors: Authors: Akshay J Dave, Richard B. Vilim
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Computational Physics (physics.comp-ph); Fluid Dynamics (physics.flu-dyn)
Abstract
This work introduces Physics-informed State-space neural network Models (PSMs), a novel solution to achieving real-time optimization, flexibility, and fault tolerance in autonomous systems, particularly in transport-dominated systems such as chemical, biomedical, and power plants. Traditional data-driven methods fall short due to a lack of physical constraints like mass conservation; PSMs address this issue by training deep neural networks with sensor data and physics-informing using components' Partial Differential Equations (PDEs), resulting in a physics-constrained, end-to-end differentiable forward dynamics model. Through two in silico experiments - a heated channel and a cooling system loop - we demonstrate that PSMs offer a more accurate approach than purely data-driven models. Beyond accuracy, there are several compelling use cases for PSMs. In this work, we showcase two: the creation of a nonlinear supervisory controller through a sequentially updated state-space representation and the proposal of a diagnostic algorithm using residuals from each of the PDEs. The former demonstrates the ability of PSMs to handle both constant and time-dependent constraints, while the latter illustrates their value in system diagnostics and fault detection. We further posit that PSMs could serve as a foundation for Digital Twins, constantly updated digital representations of physical systems.
SupeRBNN: Randomized Binary Neural Network Using Adiabatic Superconductor Josephson Devices
Abstract
Adiabatic Quantum-Flux-Parametron (AQFP) is a superconducting logic with extremely high energy efficiency. By employing the distinct polarity of current to denote logic 0' and1', AQFP devices serve as excellent carriers for binary neural network (BNN) computations. Although recent research has made initial strides toward developing an AQFP-based BNN accelerator, several critical challenges remain, preventing the design from being a comprehensive solution. In this paper, we propose SupeRBNN, an AQFP-based randomized BNN acceleration framework that leverages software-hardware co-optimization to eventually make the AQFP devices a feasible solution for BNN acceleration. Specifically, we investigate the randomized behavior of the AQFP devices and analyze the impact of crossbar size on current attenuation, subsequently formulating the current amplitude into the values suitable for use in BNN computation. To tackle the accumulation problem and improve overall hardware performance, we propose a stochastic computing-based accumulation module and a clocking scheme adjustment-based circuit optimization method. We validate our SupeRBNN framework across various datasets and network architectures, comparing it with implementations based on different technologies, including CMOS, ReRAM, and superconducting RSFQ/ERSFQ. Experimental results demonstrate that our design achieves an energy efficiency of approximately 7.8x10^4 times higher than that of the ReRAM-based BNN framework while maintaining a similar level of model accuracy. Furthermore, when compared with superconductor-based counterparts, our framework demonstrates at least two orders of magnitude higher energy efficiency.
ContTune: Continuous Tuning by Conservative Bayesian Optimization for Distributed Stream Data Processing Systems
Authors: Authors: Jinqing Lian, Xinyi Zhang, Yingxia Shao, Zenglin Pu, Qingfeng Xiang, Yawen Li, Bin Cui
Abstract
The past decade has seen rapid growth of distributed stream data processing systems. Under these systems, a stream application is realized as a Directed Acyclic Graph (DAG) of operators, where the level of parallelism of each operator has a substantial impact on its overall performance. However, finding optimal levels of parallelism remains challenging. Most existing methods are heavily coupled with the topological graph of operators, unable to efficiently tune under-provisioned jobs. They either insufficiently use previous tuning experience by treating successively tuning independently, or explore the configuration space aggressively, violating the Service Level Agreements (SLA). To address the above problems, we propose ContTune, a continuous tuning system for stream applications. It is equipped with a novel Big-small algorithm, in which the Big phase decouples the tuning from the topological graph by decomposing the job tuning problem into sub-problems that can be solved concurrently. We propose a conservative Bayesian Optimization (CBO) technique in the Small phase to speed up the tuning process by utilizing the previous observations. It leverages the state-of-the-art (SOTA) tuning method as conservative exploration to avoid SLA violations. Experimental results show that ContTune reduces up to 60.75% number of reconfigurations under synthetic workloads and up to 57.5% number of reconfigurations under real workloads, compared to the SOTA method DS2.
Planning Optimal Trajectories for Mobile Manipulators under End-effector Trajectory Continuity Constraint
Abstract
Mobile manipulators have been employed in many applications which are usually performed by multiple fixed-base robots or a large-size system, thanks to the mobility of the mobile base. However, the mobile base also brings redundancies to the system, which makes trajectory planning more challenging. One class of problems recently arising from mobile 3D printing is the trajectory-continuous tasks, in which the end-effector is required to follow a designed continuous trajectory (time-parametrized path) in task space. This paper formulates and solves the optimal trajectory planning problem for mobile manipulators under end-effector trajectory continuity constraint, which allows considerations of other constraints and trajectory optimization. To demonstrate our method, a discrete optimal trajectory planning algorithm is proposed to solve mobile 3D printing tasks in multiple experiments.
Variational Quantum Harmonizer: Generating Chord Progressions and Other Sonification Methods with the VQE Algorithm
Authors: Authors: Paulo Vitor Itaboraí, Tim Schwägerl, María Aguado Yáñez, Arianna Crippa, Karl Jansen, Eduardo Reck Miranda, Peter Thomas
Abstract
This work investigates a case study of using physical-based sonification of Quadratic Unconstrained Binary Optimization (QUBO) problems, optimized by the Variational Quantum Eigensolver (VQE) algorithm. The VQE approximates the solution of the problem by using an iterative loop between the quantum computer and a classical optimization routine. This work explores the intermediary statevectors found in each VQE iteration as the means of sonifying the optimization process itself. The implementation was realised in the form of a musical interface prototype named Variational Quantum Harmonizer (VQH), providing potential design strategies for musical applications, focusing on chords, chord progressions, and arpeggios. The VQH can be used both to enhance data visualization or to create artistic pieces. The methodology is also relevant in terms of how an artist would gain intuition towards achieving a desired musical sound by carefully designing QUBO cost functions. Flexible mapping strategies could supply a broad portfolio of sounds for QUBO and quantum-inspired musical compositions, as demonstrated in a case study composition, "Dependent Origination" by Peter Thomas and Paulo Itaborai.
Soft Merging: A Flexible and Robust Soft Model Merging Approach for Enhanced Neural Network Performance
Abstract
Stochastic Gradient Descent (SGD), a widely used optimization algorithm in deep learning, is often limited to converging to local optima due to the non-convex nature of the problem. Leveraging these local optima to improve model performance remains a challenging task. Given the inherent complexity of neural networks, the simple arithmetic averaging of the obtained local optima models in undesirable results. This paper proposes a {\em soft merging} method that facilitates rapid merging of multiple models, simplifies the merging of specific parts of neural networks, and enhances robustness against malicious models with extreme values. This is achieved by learning gate parameters through a surrogate of the $l_0$ norm using hard concrete distribution without modifying the model weights of the given local optima models. This merging process not only enhances the model performance by converging to a better local optimum, but also minimizes computational costs, offering an efficient and explicit learning process integrated with stochastic gradient descent. Thorough experiments underscore the effectiveness and superior performance of the merged neural networks.
Real-Time Capable Decision Making for Autonomous Driving Using Reachable Sets
Authors: Authors: Niklas Kochdumper, Stanley Bak
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Abstract
Despite large advances in recent years, real-time capable motion planning for autonomous road vehicles remains a huge challenge. In this work, we present a decision module that is based on set-based reachability analysis: First, we identify all possible driving corridors by computing the reachable set for the longitudinal position of the vehicle along the lanelets of the road network, where lane changes are modeled as discrete events. Next, we select the best driving corridor based on a cost function that penalizes lane changes and deviations from a desired velocity profile. Finally, we generate a reference trajectory inside the selected driving corridor, which can be used to guide or warm start low-level trajectory planners. For the numerical evaluation we combine our decision module with a motion-primitive-based and an optimization-based planner and evaluate the performance on 2000 challenging CommonRoad traffic scenarios as well in the realistic CARLA simulator. The results demonstrate that our decision module is real-time capable and yields significant speed-ups compared to executing a motion planner standalone without a decision module.
Keyword: adam
Linearity of Gray Codes via Schur Product
Authors: Authors: Gustavo Terra Bastos, Maiara F. Bollauf, Agnaldo J. Ferrari
Abstract
We propose an original approach to investigate the linearity of Gray codes obtained from $\mathbb{Z}{2^L}$-additive codes by introducing two related binary codes: the associated and concatenated. Once they are defined, one could perform a straightforward analysis of the Schur product between their codewords and determine the linearity of the respective Gray code. This work expands on earlier contributions from the literature, where the linearity was established with respect to the kernel of a code and/or operations on $\mathbb{Z}{2^L}$. The $\mathbb{Z}{2^L}$-additive codes we apply the Gray map and check the linearity are the well-known Hadamard, simplex, MacDonald, Kerdock, and Preparata codes. We also present a family of Reed-Muller codes that yield to linear Gray codes and perform a computational verification of our proposed method applied to other $\mathbb{Z}{2^L}$-additive codes.
Keyword: gradient
Parallel-mentoring for Offline Model-based Optimization
Authors: Authors: Can Chen, Christopher Beckham, Zixuan Liu, Xue Liu, Christopher Pal
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Abstract
We study offline model-based optimization to maximize a black-box objective function with a static dataset of designs and scores. These designs encompass a variety of domains, including materials, robots and DNA sequences. A common approach trains a proxy on the static dataset to approximate the black-box objective function and performs gradient ascent to obtain new designs. However, this often results in poor designs due to the proxy inaccuracies for out-of-distribution designs. Recent studies indicate that: (a) gradient ascent with a mean ensemble of proxies generally outperforms simple gradient ascent, and (b) a trained proxy provides weak ranking supervision signals for design selection. Motivated by (a) and (b), we propose \textit{parallel-mentoring} as an effective and novel method that facilitates mentoring among parallel proxies, creating a more robust ensemble to mitigate the out-of-distribution issue. We focus on the three-proxy case and our method consists of two modules. The first module, \textit{voting-based pairwise supervision}, operates on three parallel proxies and captures their ranking supervision signals as pairwise comparison labels. These labels are combined through majority voting to generate consensus labels, which incorporate ranking supervision signals from all proxies and enable mutual mentoring. However, label noise arises due to possible incorrect consensus. To alleviate this, we introduce an \textit{adaptive soft-labeling} module with soft-labels initialized as consensus labels. Based on bi-level optimization, this module fine-tunes proxies in the inner level and learns more accurate labels in the outer level to adaptively mentor proxies, resulting in a more robust ensemble. Experiments validate the effectiveness of our method. Our code is available here.
Importance-aware Co-teaching for Offline Model-based Optimization
Authors: Authors: Ye Yuan, Can Chen, Zixuan Liu, Willie Neiswanger, Xue Liu
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Abstract
Offline model-based optimization aims to find a design that maximizes a property of interest using only an offline dataset, with applications in robot, protein, and molecule design, among others. A prevalent approach is gradient ascent, where a proxy model is trained on the offline dataset and then used to optimize the design. This method suffers from an out-of-distribution issue, where the proxy is not accurate for unseen designs. To mitigate this issue, we explore using a pseudo-labeler to generate valuable data for fine-tuning the proxy. Specifically, we propose \textit{\textbf{I}mportance-aware \textbf{C}o-\textbf{T}eaching for Offline Model-based Optimization}~(\textbf{ICT}). This method maintains three symmetric proxies with their mean ensemble as the final proxy, and comprises two steps. The first step is \textit{pseudo-label-driven co-teaching}. In this step, one proxy is iteratively selected as the pseudo-labeler for designs near the current optimization point, generating pseudo-labeled data. Subsequently, a co-teaching process identifies small-loss samples as valuable data and exchanges them between the other two proxies for fine-tuning, promoting knowledge transfer. This procedure is repeated three times, with a different proxy chosen as the pseudo-labeler each time, ultimately enhancing the ensemble performance. To further improve accuracy of pseudo-labels, we perform a secondary step of \textit{meta-learning-based sample reweighting}, which assigns importance weights to samples in the pseudo-labeled dataset and updates them via meta-learning. ICT achieves state-of-the-art results across multiple design-bench tasks, achieving the best mean rank of $3.1$ and median rank of $2$, among $15$ methods. Our source code can be found here.
Achieving Autonomous Cloth Manipulation with Optimal Control via Differentiable Physics-Aware Regularization and Safety Constraints
Authors: Authors: Yutong Zhang, Fei Liu, Xiao Liang, Michael Yip
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Abstract
Cloth manipulation is a category of deformable object manipulation of great interest to the robotics community, from applications of automated laundry-folding and home organizing and cleaning to textiles and flexible manufacturing. Despite the desire for automated cloth manipulation, the thin-shell dynamics and under-actuation nature of cloth present significant challenges for robots to effectively interact with them. Many recent works omit explicit modeling in favor of learning-based methods that may yield control policies directly. However, these methods require large training sets that must be collected and curated. In this regard, we create a framework for differentiable modeling of cloth dynamics leveraging an Extended Position-based Dynamics (XPBD) algorithm. Together with the desired control objective, physics-aware regularization terms are designed for better results, including trajectory smoothness and elastic potential energy. In addition, safety constraints, such as avoiding obstacles, can be specified using signed distance functions (SDFs). We formulate the cloth manipulation task with safety constraints as a constrained optimization problem, which can be effectively solved by mainstream gradient-based optimizers thanks to the end-to-end differentiability of our framework. Finally, we assess the proposed framework for manipulation tasks with various safety thresholds and demonstrate the feasibility of result trajectories on a surgical robot. The effects of the regularization terms are analyzed in an additional ablation study.
An optimal control deep learning method to design artificial viscosities for Discontinuous Galerkin schemes
Authors: Authors: L éo Bois, Emmanuel Franck, Laurent Navoret, Vincent Vigon
Abstract
In this paper, we propose a method for constructing a neural network viscosity in order to reduce the non-physical oscillations generated by high-order Discontiuous Galerkin (DG) methods. To this end, the problem is reformulated as an optimal control problem for which the control is the viscosity function and the cost function involves comparison with a reference solution after several compositions of the scheme. The learning process is strongly based on gradient backpropagation tools. Numerical simulations show that the artificial viscosities constructed in this way are just as good or better than those used in the literatur
An elastic properties-based topology optimization algorithm for linear orthotropic, functionally graded materials
Authors: Authors: Ismael Ben-Yelun, Víctor Riera, Luis Saucedo-Mora, Miguel Ángel Sanz, Francisco Javier Montáns
Abstract
Topology optimization (TO) has experienced a dramatic development over the last decades aided by the arising of metamaterials and additive manufacturing (AM) techniques, and it is intended to achieve the current and future challenges. In this paper we propose an extension for linear orthotropic materials of a three-dimensional TO algorithm which directly operates on the six elastic properties -- three longitudinal and shear moduli, having fixed three Poisson ratios -- of the finite element (FE) discretization of certain analysis domain. By performing a gradient-descent-alike optimization on these properties, the standard deviation of a strain-energy measurement is minimized, thus coming up with optimized, strain-homogenized structures with variable longitudinal and shear stiffness in their different material directions. To this end, an orthotropic formulation with two approaches -- direct or strain-based and complementary or stress-based -- has been developed for this optimization problem, being the stress-based more efficient as previous works on this topic have shown. The key advantages that we propose are: (1) the use of orthotropic ahead of isotropic materials, which enables a more versatile optimization process since the design space is increased by six times, and (2) no constraint needs to be imposed (such as maximum volume) in contrast to other methods widely used in this field such as Solid Isotropic Material with Penalization (SIMP), all of this by setting one unique hyper-parameter. Results of four designed load cases show that this orthotropic-TO algorithm outperforms the isotropic case, both for the similar algorithm from which this is an extension and for a SIMP run in a FE commercial software, presenting a comparable computational cost. We remark that it works particularly effectively on pure shear or shear-governed problems such as torsion loading.
Precision in Building Extraction: Comparing Shallow and Deep Models using LiDAR Data
Authors: Authors: Muhammad Sulaiman, Mina Farmanbar, Ahmed Nabil Belbachir, Chunming Rong
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Building segmentation is essential in infrastructure development, population management, and geological observations. This article targets shallow models due to their interpretable nature to assess the presence of LiDAR data for supervised segmentation. The benchmark data used in this article are published in NORA MapAI competition for deep learning model. Shallow models are compared with deep learning models based on Intersection over Union (IoU) and Boundary Intersection over Union (BIoU). In the proposed work, boundary masks from the original mask are generated to improve the BIoU score, which relates to building shapes' borderline. The influence of LiDAR data is tested by training the model with only aerial images in task 1 and a combination of aerial and LiDAR data in task 2 and then compared. shallow models outperform deep learning models in IoU by 8% using aerial images (task 1) only and 2% in combined aerial images and LiDAR data (task 2). In contrast, deep learning models show better performance on BIoU scores. Boundary masks improve BIoU scores by 4% in both tasks. Light Gradient-Boosting Machine (LightGBM) performs better than RF and Extreme Gradient Boosting (XGBoost).
S-GBDT: Frugal Differentially Private Gradient Boosting Decision Trees
Authors: Authors: Moritz Kirsche, Thorsten Peinemann, Joshua Stock, Carlos Cotrini, Esfandiar Mohammadi
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Abstract
Privacy-preserving learning of gradient boosting decision trees (GBDT) has the potential for strong utility-privacy tradeoffs for tabular data, such as census data or medical meta data: classical GBDT learners can extract non-linear patterns from small sized datasets. The state-of-the-art notion for provable privacy-properties is differential privacy, which requires that the impact of single data points is limited and deniable. We introduce a novel differentially private GBDT learner and utilize four main techniques to improve the utility-privacy tradeoff. (1) We use an improved noise scaling approach with tighter accounting of privacy leakage of a decision tree leaf compared to prior work, resulting in noise that in expectation scales with $O(1/n)$, for $n$ data points. (2) We integrate individual R\'enyi filters to our method to learn from data points that have been underutilized during an iterative training process, which -- potentially of independent interest -- results in a natural yet effective insight to learning streams of non-i.i.d. data. (3) We incorporate the concept of random decision tree splits to concentrate privacy budget on learning leaves. (4) We deploy subsampling for privacy amplification. Our evaluation shows for the Abalone dataset ($<4k$ training data points) a $R^2$-score of $0.39$ for $\varepsilon=0.15$, which the closest prior work only achieved for $\varepsilon=10.0$. On the Adult dataset ($50k$ training data points) we achieve test error of $18.7\,\%$ for $\varepsilon=0.07$ which the closest prior work only achieved for $\varepsilon=1.0$. For the Abalone dataset for $\varepsilon=0.54$ we achieve $R^2$-score of $0.47$ which is very close to the $R^2$-score of $0.54$ for the nonprivate version of GBDT. For the Adult dataset for $\varepsilon=0.54$ we achieve test error $17.1\,\%$ which is very close to the test error $13.7\,\%$ of the nonprivate version of GBDT.
Abstract
Existing time-resolved non-line-of-sight (NLOS) imaging methods reconstruct hidden scenes by inverting the optical paths of indirect illumination measured at visible relay surfaces. These methods are prone to reconstruction artifacts due to inversion ambiguities and capture noise, which are typically mitigated through the manual selection of filtering functions and parameters. We introduce a fully-differentiable end-to-end NLOS inverse rendering pipeline that self-calibrates the imaging parameters during the reconstruction of hidden scenes, using as input only the measured illumination while working both in the time and frequency domains. Our pipeline extracts a geometric representation of the hidden scene from NLOS volumetric intensities and estimates the time-resolved illumination at the relay wall produced by such geometric information using differentiable transient rendering. We then use gradient descent to optimize imaging parameters by minimizing the error between our simulated time-resolved illumination and the measured illumination. Our end-to-end differentiable pipeline couples diffraction-based volumetric NLOS reconstruction with path-space light transport and a simple ray marching technique to extract detailed, dense sets of surface points and normals of hidden scenes. We demonstrate the robustness of our method to consistently reconstruct geometry and albedo, even under significant noise levels.
Clustering-based Domain-Incremental Learning
Authors: Authors: Christiaan Lamers, Rene Vidal, Nabil Belbachir, Niki van Stein, Thomas Baeck, Paris Giampouras
Abstract
We consider the problem of learning multiple tasks in a continual learning setting in which data from different tasks is presented to the learner in a streaming fashion. A key challenge in this setting is the so-called "catastrophic forgetting problem", in which the performance of the learner in an "old task" decreases when subsequently trained on a "new task". Existing continual learning methods, such as Averaged Gradient Episodic Memory (A-GEM) and Orthogonal Gradient Descent (OGD), address catastrophic forgetting by minimizing the loss for the current task without increasing the loss for previous tasks. However, these methods assume the learner knows when the task changes, which is unrealistic in practice. In this paper, we alleviate the need to provide the algorithm with information about task changes by using an online clustering-based approach on a dynamically updated finite pool of samples or gradients. We thereby successfully counteract catastrophic forgetting in one of the hardest settings, namely: domain-incremental learning, a setting for which the problem was previously unsolved. We showcase the benefits of our approach by applying these ideas to projection-based methods, such as A-GEM and OGD, which lead to task-agnostic versions of them. Experiments on real datasets demonstrate the effectiveness of the proposed strategy and its promising performance compared to state-of-the-art methods.
Multi-Task Cooperative Learning via Searching for Flat Minima
Authors: Authors: Fuping Wu, Le Zhang, Yang Sun, Yuanhan Mo, Thomas Nichols, Bartlomiej W. Papiez
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Multi-task learning (MTL) has shown great potential in medical image analysis, improving the generalizability of the learned features and the performance in individual tasks. However, most of the work on MTL focuses on either architecture design or gradient manipulation, while in both scenarios, features are learned in a competitive manner. In this work, we propose to formulate MTL as a multi/bi-level optimization problem, and therefore force features to learn from each task in a cooperative approach. Specifically, we update the sub-model for each task alternatively taking advantage of the learned sub-models of the other tasks. To alleviate the negative transfer problem during the optimization, we search for flat minima for the current objective function with regard to features from other tasks. To demonstrate the effectiveness of the proposed approach, we validate our method on three publicly available datasets. The proposed method shows the advantage of cooperative learning, and yields promising results when compared with the state-of-the-art MTL approaches. The code will be available online.
Soft Merging: A Flexible and Robust Soft Model Merging Approach for Enhanced Neural Network Performance
Abstract
Stochastic Gradient Descent (SGD), a widely used optimization algorithm in deep learning, is often limited to converging to local optima due to the non-convex nature of the problem. Leveraging these local optima to improve model performance remains a challenging task. Given the inherent complexity of neural networks, the simple arithmetic averaging of the obtained local optima models in undesirable results. This paper proposes a {\em soft merging} method that facilitates rapid merging of multiple models, simplifies the merging of specific parts of neural networks, and enhances robustness against malicious models with extreme values. This is achieved by learning gate parameters through a surrogate of the $l_0$ norm using hard concrete distribution without modifying the model weights of the given local optima models. This merging process not only enhances the model performance by converging to a better local optimum, but also minimizes computational costs, offering an efficient and explicit learning process integrated with stochastic gradient descent. Thorough experiments underscore the effectiveness and superior performance of the merged neural networks.
Keyword: sgd
Soft Merging: A Flexible and Robust Soft Model Merging Approach for Enhanced Neural Network Performance
Keyword: optimization
AdBooster: Personalized Ad Creative Generation using Stable Diffusion Outpainting
Ad-load Balancing via Off-policy Learning in a Content Marketplace
EPTQ: Enhanced Post-Training Quantization via Label-Free Hessian
Parallel-mentoring for Offline Model-based Optimization
Importance-aware Co-teaching for Offline Model-based Optimization
Latent Diffusion Models for Structural Component Design
GenLayNeRF: Generalizable Layered Representations with 3D Model Alignment for Multi-Human View Synthesis
Interactively Optimizing Layout Transfer for Vector Graphics
Achieving Autonomous Cloth Manipulation with Optimal Control via Differentiable Physics-Aware Regularization and Safety Constraints
Real-to-Sim Deformable Object Manipulation: Optimizing Physics Models with Residual Mappings for Robotic Surgery
Dr. FERMI: A Stochastic Distributionally Robust Fair Empirical Risk Minimization Framework
Large-scale Pretraining Improves Sample Efficiency of Active Learning based Molecule Virtual Screening
RAI4IoE: Responsible AI for Enabling the Internet of Energy
FleXstage: Lightweight Magnetically Levitated Precision Stage with Over-Actuation towards High-Throughput IC Manufacturing
Deep Learning Meets Swarm Intelligence for UAV-Assisted IoT Coverage in Massive MIMO
A Real-Time Multi-Task Learning System for Joint Detection of Face, Facial Landmark and Head Pose
Collaborative Fault-Identification & Reconstruction in Multi-Agent Systems
Joint Beamforming for RIS Aided Full-Duplex Integrated Sensing and Uplink Communication
On-the-Fly SfM: What you capture is What you get
A Machine Learning-oriented Survey on Tiny Machine Learning
State-aware Real-time Tracking and Remote Reconstruction of a Markov Source
Variational Connectionist Temporal Classification for Order-Preserving Sequence Modeling
Enhancing SAEAs with Unevaluated Solutions: A Case Study of Relation Model for Expensive Optimization
An elastic properties-based topology optimization algorithm for linear orthotropic, functionally graded materials
Multi-Task Cooperative Learning via Searching for Flat Minima
Physics-informed State-space Neural Networks for Transport Phenomena
SupeRBNN: Randomized Binary Neural Network Using Adiabatic Superconductor Josephson Devices
0' and
1', AQFP devices serve as excellent carriers for binary neural network (BNN) computations. Although recent research has made initial strides toward developing an AQFP-based BNN accelerator, several critical challenges remain, preventing the design from being a comprehensive solution. In this paper, we propose SupeRBNN, an AQFP-based randomized BNN acceleration framework that leverages software-hardware co-optimization to eventually make the AQFP devices a feasible solution for BNN acceleration. Specifically, we investigate the randomized behavior of the AQFP devices and analyze the impact of crossbar size on current attenuation, subsequently formulating the current amplitude into the values suitable for use in BNN computation. To tackle the accumulation problem and improve overall hardware performance, we propose a stochastic computing-based accumulation module and a clocking scheme adjustment-based circuit optimization method. We validate our SupeRBNN framework across various datasets and network architectures, comparing it with implementations based on different technologies, including CMOS, ReRAM, and superconducting RSFQ/ERSFQ. Experimental results demonstrate that our design achieves an energy efficiency of approximately 7.8x10^4 times higher than that of the ReRAM-based BNN framework while maintaining a similar level of model accuracy. Furthermore, when compared with superconductor-based counterparts, our framework demonstrates at least two orders of magnitude higher energy efficiency.ContTune: Continuous Tuning by Conservative Bayesian Optimization for Distributed Stream Data Processing Systems
Planning Optimal Trajectories for Mobile Manipulators under End-effector Trajectory Continuity Constraint
Variational Quantum Harmonizer: Generating Chord Progressions and Other Sonification Methods with the VQE Algorithm
Soft Merging: A Flexible and Robust Soft Model Merging Approach for Enhanced Neural Network Performance
Real-Time Capable Decision Making for Autonomous Driving Using Reachable Sets
Keyword: adam
Linearity of Gray Codes via Schur Product
Keyword: gradient
Parallel-mentoring for Offline Model-based Optimization
Importance-aware Co-teaching for Offline Model-based Optimization
Achieving Autonomous Cloth Manipulation with Optimal Control via Differentiable Physics-Aware Regularization and Safety Constraints
An optimal control deep learning method to design artificial viscosities for Discontinuous Galerkin schemes
An elastic properties-based topology optimization algorithm for linear orthotropic, functionally graded materials
Precision in Building Extraction: Comparing Shallow and Deep Models using LiDAR Data
S-GBDT: Frugal Differentially Private Gradient Boosting Decision Trees
Self-Calibrating, Fully Differentiable NLOS Inverse Rendering
Clustering-based Domain-Incremental Learning
Multi-Task Cooperative Learning via Searching for Flat Minima
Soft Merging: A Flexible and Robust Soft Model Merging Approach for Enhanced Neural Network Performance
Keyword: super-resolution
There is no result