New submissions for Mon, 18 Sep 23

Keyword: sgd

There is no result

Keyword: optimization

Landscape-Sketch-Step: An AI/ML-Based Metaheuristic for Surrogate Optimization Problems

Authors: Rafael Monteiro, Kartik Sau
Subjects: Machine Learning (cs.LG); Materials Science (cond-mat.mtrl-sci); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Probability (math.PR)
Arxiv link: https://arxiv.org/abs/2309.07936
Pdf link: https://arxiv.org/pdf/2309.07936
Abstract In this paper, we introduce a new heuristic for global optimization in scenarios where extensive evaluations of the cost function are expensive, inaccessible, or even prohibitive. The method, which we call Landscape-Sketch-and-Step (LSS), combines Machine Learning, Stochastic Optimization, and Reinforcement Learning techniques, relying on historical information from previously sampled points to make judicious choices of parameter values where the cost function should be evaluated. Unlike optimization by Replica Exchange Monte Carlo methods, the number of evaluations of the cost function required in this approach is comparable to that used by Simulated Annealing, a quality that is especially important in contexts like high-throughput computing or high-performance computing tasks, where evaluations are either computationally expensive or take a long time to be performed. The method also differs from standard Surrogate Optimization techniques, for it does not construct a surrogate model that aims at approximating or reconstructing the objective function. We illustrate our method by applying it to low dimensional optimization problems (dimensions 1, 2, 4, and 8) that mimic known difficulties of minimization on rugged energy landscapes often seen in Condensed Matter Physics, where cost functions are rugged and plagued with local minima. When compared to classical Simulated Annealing, the LSS shows an effective acceleration of the optimization process.
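For orientation, the classical Simulated Annealing baseline that the abstract compares against can be sketched as follows. This is not the LSS method itself. The cost function `rugged_cost`, the linear cooling schedule, and all parameter values are hypothetical choices made only for illustration.

```python
import math
import random


def rugged_cost(x):
    """Hypothetical 1-D rugged landscape: a parabola plus fast oscillations."""
    return x * x + 2.0 * math.sin(8.0 * x) ** 2


def simulated_annealing(cost, x0, n_steps=5000, t0=1.0, step=0.5, seed=0):
    """Classical simulated annealing with a Metropolis acceptance rule."""
    rng = random.Random(seed)
    x, fx = x0, cost(x0)
    best_x, best_f = x, fx
    n_evals = 1  # count every call to the (possibly expensive) cost function
    for k in range(n_steps):
        # Linear cooling schedule (an assumption); the small offset keeps T > 0.
        temp = t0 * (1.0 - k / n_steps) + 1e-9
        cand = x + rng.gauss(0.0, step)
        f_cand = cost(cand)
        n_evals += 1
        # Always accept improvements; accept uphill moves with prob. exp(-delta / T).
        if f_cand <= fx or rng.random() < math.exp(-(f_cand - fx) / temp):
            x, fx = cand, f_cand
            if fx < best_f:
                best_x, best_f = x, fx
    return best_x, best_f, n_evals


best_x, best_f, n_evals = simulated_annealing(rugged_cost, x0=3.0)
print(f"best x = {best_x:.3f}, cost = {best_f:.4f}, evaluations = {n_evals}")
```

The evaluation counter matters here because the abstract measures methods by how many cost-function evaluations they need.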
Smart Helper-Aided F-RANs: Improving Delay and Reducing Fronthaul Load
Authors: Hesameddin Mokhtarzadeh, Mohammed S. Al-Abiad, Md Jahangir Hossain, Julian Cheng
Subjects: Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2309.07975
Pdf link: https://arxiv.org/pdf/2309.07975
Abstract In traditional Fog-Radio Access Networks (F-RANs), enhanced remote radio heads (eRRHs) are connected to a macro base station (MBS) through fronthaul links. Deploying a massive number of eRRHs is not always feasible due to site constraints and the cost of fronthaul links. This paper introduces an innovative concept of using smart helpers (SHs) in F-RANs. These SHs do not require fronthaul links and listen to the nearby eRRHs' communications. Then, they smartly select and cache popular content. This capability potentially enables SHs to serve users with frequent on-demand service requests. As such, network operators have the flexibility to easily deploy SHs in various scenarios, such as dense urban areas and temporary public events, to expand their F-RANs and improve the quality of service (QoS). To study the performance of the proposed SH-aided F-RAN, we formulate an optimization problem of minimizing the average transmission delay that jointly optimizes cache resources and user scheduling. To tackle the formulated problem, we develop an innovative multi-stage algorithm that uses a reinforcement learning (RL) framework. Various performance measures, e.g., the average transmission delay, fronthaul load, and cache hit rate of the proposed SH-aided F-RAN, are evaluated numerically and compared with those of traditional F-RANs.
Fast Safe Rectangular Corridor-based Online AGV Trajectory Optimization with Obstacle Avoidance
Authors: Shaoqiang Liang, Songyuan Fa, Zong Chen, Yiqun Li
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.07979
Pdf link: https://arxiv.org/pdf/2309.07979
Abstract Automated Guided Vehicles (AGVs) are widely adopted in various industries due to their efficiency and adaptability. However, safely deploying AGVs in dynamic environments remains a significant challenge. This paper introduces an online trajectory optimization framework, the Fast Safe Rectangular Corridor (FSRC), designed for AGVs in obstacle-rich settings. The primary challenge is efficiently planning trajectories that prioritize safety and collision avoidance. To tackle this challenge, the FSRC algorithm constructs convex regions, represented as rectangular corridors, to address obstacle avoidance constraints within an optimal control problem. This conversion from non-convex to box constraints improves the collision avoidance efficiency and quality. Additionally, the Modified Visibility Graph algorithm speeds up path planning, and a boundary discretization strategy expedites FSRC construction. The framework also includes a dynamic obstacle avoidance strategy for real-time adaptability. Our framework's effectiveness and superiority have been demonstrated in experiments, particularly in computational efficiency. Compared to state-of-the-art frameworks, our trajectory planning framework improves computational efficiency by 1 to 2 orders of magnitude. Notably, the FSRC algorithm outperforms other safe convex corridor-based methods, substantially improving computational efficiency by 1 to 2 orders of magnitude.
Inclusive-PIM: Hardware-Software Co-design for Broad Acceleration on Commercial PIM Architectures
Authors: Johnathan Alsop, Shaizeen Aga, Mohamed Ibrahim, Mahzabeen Islam, Andrew Mccrabb, Nuwan Jayasena
Subjects: Hardware Architecture (cs.AR)
Arxiv link: https://arxiv.org/abs/2309.07984
Pdf link: https://arxiv.org/pdf/2309.07984
Abstract Continual demand for memory bandwidth has made it worthwhile for memory vendors to reassess processing in memory (PIM), which enables higher bandwidth by placing compute units in/near-memory. As such, memory vendors have recently proposed commercially viable PIM designs. However, these proposals are largely driven by the needs of (a narrow set of) machine learning (ML) primitives. While such proposals are reasonable given the growing importance of ML, as memory is a pervasive component, there is a case for a more inclusive PIM design that can accelerate primitives across domains. In this work, we ascertain the capabilities of commercial PIM proposals to accelerate various primitives across domains. We begin by outlining a set of characteristics, termed PIM-amenability-test, which aid in assessing if a given primitive is likely to be accelerated by PIM. Next, we apply this test to primitives under study to ascertain efficient data-placement and orchestration to map the primitives to underlying PIM architecture. We observe here that, even though primitives under study are largely PIM-amenable, existing commercial PIM proposals do not realize their performance potential for these primitives. To address this, we identify bottlenecks that arise in PIM execution and propose hardware and software optimizations which stand to broaden the acceleration reach of commercial PIM designs (improving average PIM speedups from 1.12x to 2.49x relative to a GPU baseline). Overall, while we believe emerging commercial PIM proposals add a necessary and complementary design point in the application acceleration space, hardware-software co-design is necessary to deliver their benefits broadly.
An Automated Machine Learning Approach for Detecting Anomalous Peak Patterns in Time Series Data from a Research Watershed in the Northeastern United States Critical Zone
Authors: Ijaz Ul Haq, Byung Suk Lee, Donna M. Rizzo, Julia N Perdrial
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2309.07992
Pdf link: https://arxiv.org/pdf/2309.07992
Abstract This paper presents an automated machine learning framework designed to assist hydrologists in detecting anomalies in time series data generated by sensors in a research watershed in the northeastern United States critical zone. The framework specifically focuses on identifying peak-pattern anomalies, which may arise from sensor malfunctions or natural phenomena. However, the use of classification methods for anomaly detection poses challenges, such as the requirement for labeled data as ground truth and the selection of the most suitable deep learning model for the given task and dataset. To address these challenges, our framework generates labeled datasets by injecting synthetic peak patterns into synthetically generated time series data and incorporates an automated hyperparameter optimization mechanism. This mechanism generates an optimized model instance with the best architectural and training parameters from a pool of five selected models, namely Temporal Convolutional Network (TCN), InceptionTime, MiniRocket, Residual Networks (ResNet), and Long Short-Term Memory (LSTM). The selection is based on the user's preferences regarding anomaly detection accuracy and computational cost. The framework employs Time-series Generative Adversarial Networks (TimeGAN) as the synthetic dataset generator. The generated model instances are evaluated using a combination of accuracy and computational cost metrics, including training time and memory, during the anomaly detection process. Performance evaluation of the framework was conducted using a dataset from a watershed, demonstrating consistent selection of the most fitting model instance that satisfies the user's preferences.
Efficient online update of model predictive control in embedded systems using first-order methods
Authors: Victor Gracia, Pablo Krupa, Teodoro Alamo, Daniel Limon
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2309.07996
Pdf link: https://arxiv.org/pdf/2309.07996
Abstract Model Predictive Control (MPC) is typically characterized as being computationally demanding, as it requires solving optimization problems online; a particularly relevant point when considering its implementation in embedded systems. To reduce the computational burden of the optimization algorithm, most solvers perform as many offline operations as possible, typically performing the computation and factorization of its expensive matrices offline and then storing them in the embedded system. This improves the efficiency of the solver, with the disadvantage that online changes to some of the ingredients of the MPC formulation require performing these expensive computations online. This article presents an efficient algorithm for the factorization of the key matrix used in several first-order optimization methods applied to linear MPC formulations, allowing its prediction model and cost function matrices to be updated online at the expense of a small computational cost. We show results comparing the proposed approach with other solvers from the literature applied to a linear time-varying system.
A Subspace Framework for ${\mathcal L}_\infty$ Model Reduction
Authors: Emre Mengi
Subjects: Numerical Analysis (math.NA); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2309.08011
Pdf link: https://arxiv.org/pdf/2309.08011
Abstract We consider the problem of locating a nearest descriptor system of prescribed reduced order to a descriptor system with large order with respect to the ${\mathcal L}_\infty$ norm. Widely employed approaches such as the balanced truncation and best Hankel norm approximation for this ${\mathcal L}_\infty$ model reduction problem are usually expensive and yield solutions that are not optimal, not even locally. We propose approaches based on the minimization of the ${\mathcal L}_\infty$ objective by means of smooth optimization techniques. As we illustrate, direct applications of smooth optimization techniques are not feasible, since the optimization techniques converge at best at a linear rate requiring too many evaluations of the costly ${\mathcal L}_\infty$-norm objective to be practical. We replace the original large-scale system with a system of smaller order that interpolates the original system at points on the imaginary axis, and minimize the ${\mathcal L}_\infty$ objective after this replacement. The smaller system is refined by interpolating at additional imaginary points determined based on the local minimizer of the ${\mathcal L}_\infty$ objective, and the optimization is repeated. We argue the framework converges at a quadratic rate under smoothness and nondegeneracy assumptions, and describe how asymptotic stability constraints on the reduced system sought can be incorporated into our approach. The numerical experiments on benchmark examples illustrate that the approach leads to locally optimal solutions to the ${\mathcal L}_\infty$ model reduction problem, and the convergence occurs quickly for descriptor systems of order a few tens of thousands.
Depth Estimation from a Single Optical Encoded Image using a Learned Colored-Coded Aperture
Authors: Jhon Lopez, Edwin Vargas, Henry Arguello
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.08033
Pdf link: https://arxiv.org/pdf/2309.08033
Abstract Depth estimation from a single image of a conventional camera is a challenging task since depth cues are lost during the acquisition process. State-of-the-art approaches improve the discrimination between different depths by introducing a binary-coded aperture (CA) in the lens aperture that generates different coded blur patterns at different depths. Color-coded apertures (CCA) can also produce color misalignment in the captured image which can be utilized to estimate disparity. Leveraging advances in deep learning, more recent works have explored the data-driven design of a diffractive optical element (DOE) for encoding depth information through chromatic aberrations. However, compared with binary CA or CCA, DOEs are more expensive to fabricate and require high-precision devices. Different from previous CCA-based approaches that employ few basic colors, in this work we propose a CCA with a greater number of color filters and richer spectral information to optically encode relevant depth information in a single snapshot. Furthermore, we propose to jointly learn the color-coded aperture (CCA) pattern and a convolutional neural network (CNN) to retrieve depth information by using an end-to-end optimization approach. We demonstrate through different experiments on three different data sets that the designed color-encoding has the potential to remove depth ambiguities and provides better depth estimates compared to state-of-the-art approaches. Additionally, we build a low-cost prototype of our CCA using a photographic film and validate the proposed approach in real scenarios.
Gradient based Grasp Pose Optimization on a NeRF that Approximates Grasp Success
Authors: Gergely Sóti, Björn Hein, Christian Wurll
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.08040
Pdf link: https://arxiv.org/pdf/2309.08040
Abstract Current robotic grasping methods often rely on estimating the pose of the target object, explicitly predicting grasp poses, or implicitly estimating grasp success probabilities. In this work, we propose a novel approach that directly maps gripper poses to their corresponding grasp success values, without considering objectness. Specifically, we leverage a Neural Radiance Field (NeRF) architecture to learn a scene representation and use it to train a grasp success estimator that maps each pose in the robot's task space to a grasp success value. We employ this learned estimator to tune its inputs, i.e., grasp poses, by gradient-based optimization to obtain successful grasp poses. Contrary to other NeRF-based methods which enhance existing grasp pose estimation approaches by relying on NeRF's rendering capabilities or directly estimate grasp poses in a discretized space using NeRF's scene representation capabilities, our approach uniquely sidesteps both the need for rendering and the limitation of discretization. We demonstrate the effectiveness of our approach on four simulated 3DoF (Degree of Freedom) robotic grasping tasks and show that it can generalize to novel objects. Our best model achieves an average translation error of 3mm from valid grasp poses. This work opens the door for future research to apply our approach to higher DoF grasps and real-world scenarios.
A Bayesian approach to breaking things: efficiently predicting and repairing failure modes via sampling
Authors: Charles Dawson, Chuchu Fan
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2309.08052
Pdf link: https://arxiv.org/pdf/2309.08052
Abstract Before autonomous systems can be deployed in safety-critical applications, we must be able to understand and verify the safety of these systems. For cases where the risk or cost of real-world testing is prohibitive, we propose a simulation-based framework for a) predicting ways in which an autonomous system is likely to fail and b) automatically adjusting the system's design to preemptively mitigate those failures. We frame this problem through the lens of approximate Bayesian inference and use differentiable simulation for efficient failure case prediction and repair. We apply our approach on a range of robotics and control problems, including optimizing search patterns for robot swarms and reducing the severity of outages in power transmission networks. Compared to optimization-based falsification techniques, our method predicts a more diverse, representative set of failure modes, and we also find that our use of differentiable simulation yields solutions that have up to 10x lower cost and requires up to 2x fewer iterations to converge relative to gradient-free techniques. Code and videos can be found at https://mit-realm.github.io/breaking-things/
MPCGPU: Real-Time Nonlinear Model Predictive Control through Preconditioned Conjugate Gradient on the GPU
Authors: Emre Adabag, Miloni Atal, William Gerard, Brian Plancher
Subjects: Robotics (cs.RO); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2309.08079
Pdf link: https://arxiv.org/pdf/2309.08079
Abstract Nonlinear Model Predictive Control (NMPC) is a state-of-the-art approach for locomotion and manipulation which leverages trajectory optimization at each control step. While the performance of this approach is computationally bounded, implementations of direct trajectory optimization that use iterative methods to solve the underlying moderately-large and sparse linear systems, are a natural fit for parallel hardware acceleration. In this work, we introduce MPCGPU, a GPU-accelerated, real-time NMPC solver that leverages an accelerated preconditioned conjugate gradient (PCG) linear system solver at its core. We show that MPCGPU increases the scalability and real-time performance of NMPC, solving larger problems, at faster rates. In particular, for tracking tasks using the Kuka IIWA manipulator, MPCGPU is able to scale to kilohertz control rates with trajectories as long as 512 knot points. This is driven by a custom PCG solver which outperforms state-of-the-art, CPU-based, linear system solvers by at least 10x for a majority of solves and 3.6x on average.
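As a rough illustration of the preconditioned conjugate gradient (PCG) iteration named in the abstract, here is a minimal single-threaded sketch in plain Python. It is not the MPCGPU solver. The Jacobi (diagonal) preconditioner, the dense list-of-lists matrix storage, and the small test system are assumptions made only to keep the example self-contained.

```python
def matvec(a, v):
    """Dense matrix-vector product for a list-of-lists matrix."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def pcg(a, b, tol=1e-10, max_iter=100):
    """Solve A x = b for symmetric positive-definite A with Jacobi-preconditioned CG."""
    n = len(b)
    inv_diag = [1.0 / a[i][i] for i in range(n)]  # Jacobi preconditioner M^{-1}
    x = [0.0] * n
    r = list(b)  # residual b - A x for the initial guess x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]  # preconditioned residual
    p = list(z)  # first search direction
    rz = dot(r, z)
    for it in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            return x, it
        ap = matvec(a, p)
        alpha = rz / dot(p, ap)  # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        beta = rz_new / rz  # keeps the new direction A-conjugate to the old ones
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Small symmetric, strictly diagonally dominant (hence positive-definite) system.
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x, iters = pcg(A, b)
print("solution:", x, "iterations:", iters)
```

Each iteration is dominated by one matrix-vector product and a few vector reductions, which is why iterative solvers of this kind map well onto parallel hardware.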
MetaF2N: Blind Image Super-Resolution by Learning Efficient Model Adaptation from Faces
Authors: Zhicun Yin, Ming Liu, Xiaoming Li, Hui Yang, Longan Xiao, Wangmeng Zuo
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.08113
Pdf link: https://arxiv.org/pdf/2309.08113
Abstract Due to their highly structured characteristics, faces are easier to recover than natural scenes for blind image super-resolution. Therefore, we can extract the degradation representation of an image from the low-quality and recovered face pairs. Using the degradation representation, realistic low-quality images can then be synthesized to fine-tune the super-resolution model for the real-world low-quality image. However, such a procedure is time-consuming and laborious, and the gaps between recovered faces and the ground-truths further increase the optimization uncertainty. To facilitate efficient model adaptation towards image-specific degradations, we propose a method dubbed MetaF2N, which leverages the contained Faces to fine-tune model parameters for adapting to the whole Natural image in a Meta-learning framework. The degradation extraction and low-quality image synthesis steps are thus circumvented in our MetaF2N, and it requires only one fine-tuning step to get decent performance. Considering the gaps between the recovered faces and ground-truths, we further deploy a MaskNet for adaptively predicting loss weights at different positions to reduce the impact of low-confidence areas. To evaluate our proposed MetaF2N, we have collected a real-world low-quality dataset with one or multiple faces in each image, and our MetaF2N achieves superior performance on both synthetic and real-world datasets. Source code, pre-trained models, and collected datasets are available at https://github.com/yinzhicun/MetaF2N.
Graph IRs for Impure Higher-Order Languages (Technical Report)
Authors: Oliver Bračevac, Guannan Wei, Songlin Jia, Supun Abeysinghe, Yuxuan Jiang, Yuyan Bao, Tiark Rompf
Subjects: Programming Languages (cs.PL)
Arxiv link: https://arxiv.org/abs/2309.08118
Pdf link: https://arxiv.org/pdf/2309.08118
Abstract This is a companion report for the OOPSLA 2023 paper of the same title, presenting a detailed end-to-end account of the $\lambda^*_{\mathsf{G}}$ graph IR, at a level of detail beyond a regular conference paper. Our first concern is adequacy and soundness of $\lambda^*_{\mathsf{G}}$, which we derive from a direct-style imperative functional language (a variant of Bao et al.'s $\lambda^*$-calculus with reachability types and a simple effect system) by a series of type-preserving translations into a calculus in monadic normal form (MNF). Static reachability types and effects entirely inform $\lambda^*_{\mathsf{G}}$'s dependency synthesis. We argue for its adequacy by proving its functional properties along with dependency safety via progress and preservation lemmas with respect to a notion of call-by-value (CBV) reduction that checks the observed order of effects. Our second concern is establishing the correctness of $\lambda^*_{\mathsf{G}}$'s equational rules that drive compiler optimizations (e.g., DCE, $\lambda$-hoisting, etc.), by proving contextual equivalence using logical relations. A key insight is that the functional properties of dependency synthesis permit a logical relation on $\lambda^*_{\mathsf{G}}$ in MNF in terms of previously developed logical relations for the direct-style $\lambda^*$-calculus. Finally, we also include a longer version of the conference paper's section on code generation and code motion for $\lambda^*_{\mathsf{G}}$ as implemented in Scala LMS.
MAVIS: Multi-Camera Augmented Visual-Inertial SLAM using SE2(3) Based Exact IMU Pre-integration
Authors: Yifu Wang, Yonhon Ng, Inkyu Sa, Alvaro Parra, Cristian Rodriguez, Tao Jun Lin, Hongdong Li
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.08142
Pdf link: https://arxiv.org/pdf/2309.08142
Abstract We present a novel optimization-based Visual-Inertial SLAM system designed for multiple partially overlapped camera systems, named MAVIS. Our framework fully exploits the benefits of wide field-of-view from multi-camera systems, and the metric scale measurements provided by an inertial measurement unit (IMU). We introduce an improved IMU pre-integration formulation based on the exponential function of an automorphism of SE_2(3), which can effectively enhance tracking performance under fast rotational motion and extended integration time. Furthermore, we extend the conventional front-end tracking and back-end optimization modules designed for monocular or stereo setups to multi-camera systems, and introduce implementation details that contribute to the performance of our system in challenging scenarios. The practical validity of our approach is supported by our experiments on public datasets. Our MAVIS won first place in all the vision-IMU tracks (single and multi-session SLAM) of the Hilti SLAM Challenge 2023 with 1.7 times the score of the second place.
Multilingual Sentence-Level Semantic Search using Meta-Distillation Learning
Authors: Meryem M'hamdi, Jonathan May, Franck Dernoncourt, Trung Bui, Seunghyun Yoon
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2309.08185
Pdf link: https://arxiv.org/pdf/2309.08185
Abstract Multilingual semantic search is the task of retrieving relevant contents to a query expressed in different language combinations. This requires a better semantic understanding of the user's intent and its contextual meaning. Multilingual semantic search is less explored and more challenging than its monolingual or bilingual counterparts, due to the lack of multilingual parallel resources for this task and the need to circumvent "language bias". In this work, we propose an alignment approach: MAML-Align, specifically for low-resource scenarios. Our approach leverages meta-distillation learning based on MAML, an optimization-based Model-Agnostic Meta-Learner. MAML-Align distills knowledge from a Teacher meta-transfer model T-MAML, specialized in transferring from monolingual to bilingual semantic search, to a Student model S-MAML, which meta-transfers from bilingual to multilingual semantic search. To the best of our knowledge, we are the first to extend meta-distillation to a multilingual search application. Our empirical results show that on top of a strong baseline based on sentence transformers, our meta-distillation approach boosts the gains provided by MAML and significantly outperforms naive fine-tuning methods. Furthermore, multilingual meta-distillation learning improves generalization even to unseen languages.
Gaussian Processes with Linear Multiple Kernel: Spectrum Design and Distributed Learning for Multi-Dimensional Data
Authors: Richard Cornelius Suwandi, Zhidi Lin, Feng Yin
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2309.08201
Pdf link: https://arxiv.org/pdf/2309.08201
Abstract Gaussian processes (GPs) have emerged as a prominent technique for machine learning and signal processing. A key component in GP modeling is the choice of kernel, and linear multiple kernels (LMKs) have become an attractive kernel class due to their powerful modeling capacity and interpretability. This paper focuses on the grid spectral mixture (GSM) kernel, an LMK that can approximate arbitrary stationary kernels. Specifically, we propose a novel GSM kernel formulation for multi-dimensional data that reduces the number of hyper-parameters compared to existing formulations, while also retaining a favorable optimization structure and approximation capability. In addition, to make the large-scale hyper-parameter optimization in the GSM kernel tractable, we first introduce the distributed SCA (DSCA) algorithm. Building on this, we propose the doubly distributed SCA (D$^2$SCA) algorithm based on the alternating direction method of multipliers (ADMM) framework, which allows us to cooperatively learn the GSM kernel in the context of big data while maintaining data privacy. Furthermore, we tackle the inherent communication bandwidth restriction in distributed frameworks, by quantizing the hyper-parameters in D$^2$SCA, resulting in the quantized doubly distributed SCA (QD$^2$SCA) algorithm. Theoretical analysis establishes convergence guarantees for the proposed algorithms, while experiments on diverse datasets demonstrate the superior prediction performance and efficiency of our methods.
One-stage Modality Distillation for Incomplete Multimodal Learning
Authors: Shicai Wei, Yang Luo, Chunbo Luo
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.08204
Pdf link: https://arxiv.org/pdf/2309.08204
Abstract Learning based on multimodal data has attracted increasing interest recently. While a variety of sensory modalities can be collected for training, not all of them are always available in development scenarios, which raises the challenge to infer with incomplete modality. To address this issue, this paper presents a one-stage modality distillation framework that unifies the privileged knowledge transfer and modality information fusion into a single optimization procedure via multi-task learning. Compared with the conventional modality distillation that performs them independently, this helps to capture the valuable representation that can assist the final model inference directly. Specifically, we propose the joint adaptation network for the modality transfer task to preserve the privileged information. This addresses the representation heterogeneity caused by input discrepancy via the joint distribution adaptation. Then, we introduce the cross translation network for the modality fusion task to aggregate the restored and available modality features. It leverages the parameters-sharing strategy to capture the cross-modal cues explicitly. Extensive experiments on RGB-D classification and segmentation tasks demonstrate the proposed multimodal inheritance framework can overcome the problem of incomplete modality input in various scenes and achieve state-of-the-art performance.
MTG: Mapless Trajectory Generator with Traversability Coverage for Outdoor Navigation
Authors: Jing Liang, Peng Gao, Xuesu Xiao, Adarsh Jagan Sathyamoorthy, Mohamed Elnoor, Ming Lin, Dinesh Manocha
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.08214
Pdf link: https://arxiv.org/pdf/2309.08214
Abstract We present a novel learning algorithm for trajectory generation for outdoor robot navigation. Our goal is to compute collision-free paths that also satisfy the environment-specific traversability constraints. Our approach is designed for global planning using limited onboard robot perception in mapless environments and ensures comprehensive coverage of all traversable directions. Our formulation uses a Conditional Variational Autoencoder (CVAE) generative model that is enhanced with traversability constraints and an optimization formulation used for the coverage. We highlight the benefits of our approach over state-of-the-art trajectory generation approaches and demonstrate its performance in challenging outdoor environments, including around buildings, across intersections, along trails, and in off-road terrain, using a Clearpath Husky and a Boston Dynamics Spot robot. In practice, our approach results in a 6% improvement in coverage of traversable areas and an 89% reduction in trajectory portions residing in non-traversable regions.
PRIEST: Projection Guided Sampling-Based Optimization For Autonomous Navigation
Authors: Fatemeh Rastgar, Houman Masnavi, Basant Sharma, Alvo Aabloo, Jan Swevers, Arun Kumar Singh
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.08235
Pdf link: https://arxiv.org/pdf/2309.08235
Abstract Efficient navigation in unknown and dynamic environments is crucial for expanding the application domain of mobile robots. The core challenge stems from the nonavailability of a feasible global path for guiding optimization-based local planners. As a result, existing local planners often get trapped in poor local minima. In this paper, we present a novel optimizer that can explore multiple homotopies to plan high-quality trajectories over long horizons while still being fast enough for real-time applications. We build on the gradient-free paradigm by augmenting the trajectory sampling strategy with a projection optimization that guides the samples toward a feasible region. As a result, our approach can recover from the frequently encountered pathological cases wherein all the sampled trajectories lie in the high-cost region. Furthermore, we also show that our projection optimization has a highly parallelizable structure that can be easily accelerated over GPUs. We push the state-of-the-art in the following respects. Over the navigation stack of the Robot Operating System (ROS), we show an improvement of 7-13% in success rate and up to two times in total travel time metric. On the same benchmarks and metrics, our approach achieves up to 44% improvement over MPPI and its recent variants. On simple point-to-point navigation tasks, our optimizer is up to two times more reliable than SOTA gradient-based solvers, as well as sampling-based approaches such as the Cross-Entropy Method (CEM) and VPSTO. Codes: https://github.com/fatemeh-rastgar/PRIEST
Optimization of Rank Losses for Image Retrieval
Authors: Authors: Elias Ramzi, Nicolas Audebert, Clément Rambour, André Araujo, Xavier Bitot, Nicolas Thome
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.08250
Pdf link: https://arxiv.org/pdf/2309.08250
Abstract In image retrieval, standard evaluation metrics rely on score ranking, e.g., average precision (AP), recall at k (R@k), normalized discounted cumulative gain (NDCG). In this work, we introduce a general framework for robust and decomposable rank loss optimization. It addresses two major challenges for end-to-end training of deep neural networks with rank losses: non-differentiability and non-decomposability. First, we propose a general surrogate for the ranking operator, SupRank, that is amenable to stochastic gradient descent. It provides an upper bound for rank losses and ensures robust training. Second, we use a simple yet effective loss function to reduce the decomposability gap between the averaged batch approximation of ranking losses and their values on the whole training set. We apply our framework to two standard metrics for image retrieval: AP and R@k. Additionally, we apply our framework to hierarchical image retrieval. We introduce an extension of AP, the hierarchical average precision $\mathcal{H}$-AP, and optimize it as well as the NDCG. Finally, we create the first hierarchical landmarks retrieval dataset. We use a semi-automatic pipeline to create hierarchical labels, extending the large scale Google Landmarks v2 dataset. The hierarchical dataset is publicly available at https://github.com/cvdfoundation/google-landmark. Code will be released at https://github.com/elias-ramzi/SupRank.
Quantitative and Qualitative Evaluation of Reinforcement Learning Policies for Autonomous Vehicles
Authors: Authors: Laura Ferrarotti, Massimiliano Luca, Gabriele Santin, Giorgio Previati, Gianpiero Mastinu, Elena Campi, Lorenzo Uccello, Antonino Albanese, Praveen Zalaya, Alessandro Roccasalva, Bruno Lepri
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.08254
Pdf link: https://arxiv.org/pdf/2309.08254
Abstract Optimizing traffic dynamics in an evolving transportation landscape is crucial, particularly in scenarios where autonomous vehicles (AVs) with varying levels of autonomy coexist with human-driven cars. This paper presents a novel approach to optimizing choices of AVs using Proximal Policy Optimization (PPO), a reinforcement learning algorithm. We learned a policy to minimize traffic jams (i.e., minimize the time to cross the scenario) and to minimize pollution in a roundabout in Milan, Italy. Through empirical analysis, we demonstrate that our approach can reduce time and pollution levels. Furthermore, we qualitatively evaluate the learned policy using a cutting-edge cockpit to assess its performance in near-real-world conditions. To gauge the practicality and acceptability of the policy, we conducted evaluations with human participants using the simulator, focusing on a range of metrics like traffic smoothness and safety perception. In general, our findings show that human-driven vehicles benefit from optimizing AV dynamics. Also, participants in the study highlighted that the scenario with 80% AVs is perceived as safer than the scenario with 20%. The same result is obtained for traffic smoothness perception.
Greedy Optimization of Resistance-based Graph Robustness with Global and Local Edge Insertions
Authors: Authors: Maria Predari (1), Lukas Berner (1), Robert Kooij (2 and 3), Henning Meyerhenke (1) ((1) Department of Computer Science, Humboldt-Universität zu Berlin, (2) Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, (3) UNIT ICT, Strategy & Policy, TNO (Netherlands Organisation for Applied Scientific Research))
Subjects: Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2309.08271
Pdf link: https://arxiv.org/pdf/2309.08271
Abstract The total effective resistance, also called the Kirchhoff index, provides a robustness measure for a graph $G$. We consider two optimization problems of adding $k$ new edges to $G$ such that the resulting graph has minimal total effective resistance (i.e., is most robust) -- one where the new edges can be anywhere in the graph and one where the new edges need to be incident to a specified focus node. The total effective resistance and effective resistances between nodes can be computed using the pseudoinverse of the graph Laplacian. The pseudoinverse may be computed explicitly via pseudoinversion; yet, this takes cubic time in practice and quadratic space. We instead exploit combinatorial and algebraic connections to speed up gain computations in an established generic greedy heuristic. Moreover, we leverage existing randomized techniques to boost the performance of our approaches by introducing a sub-sampling step. Our different graph- and matrix-based approaches are indeed significantly faster than the state-of-the-art greedy algorithm, while their quality remains reasonably high and is often quite close. Our experiments show that we can now process larger graphs for which the application of the state-of-the-art greedy approach was impractical before.
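The pseudoinverse-based computation mentioned in the abstract above can be illustrated with a small sketch. It uses the standard identities R_tot = n * trace(L^+) and r(u, v) = L^+[u,u] + L^+[v,v] - 2 L^+[u,v] for a connected, undirected graph with Laplacian L. The function names and the brute-force single-edge search are hypothetical and serve only as the cubic-time baseline the abstract refers to; they are not the authors' accelerated method.

```python
import itertools

import numpy as np


def laplacian(adj):
    """Graph Laplacian L = D - A of an undirected graph."""
    adj = np.asarray(adj, dtype=float)
    return np.diag(adj.sum(axis=1)) - adj


def total_effective_resistance(adj):
    """Kirchhoff index n * trace(L^+) of a connected graph."""
    lap_pinv = np.linalg.pinv(laplacian(adj))
    return adj.shape[0] * np.trace(lap_pinv)


def effective_resistance(adj, u, v):
    """Effective resistance between nodes u and v."""
    lap_pinv = np.linalg.pinv(laplacian(adj))
    return lap_pinv[u, u] + lap_pinv[v, v] - 2.0 * lap_pinv[u, v]


def best_single_edge(adj):
    """Brute-force greedy step: try every missing edge and keep the one
    that yields the smallest total effective resistance (illustrative only)."""
    adj = np.asarray(adj, dtype=float)
    best_edge, best_value = None, np.inf
    for u, v in itertools.combinations(range(adj.shape[0]), 2):
        if adj[u, v] != 0:
            continue  # edge already present
        trial = adj.copy()
        trial[u, v] = trial[v, u] = 1.0
        value = total_effective_resistance(trial)
        if value < best_value:
            best_edge, best_value = (u, v), value
    return best_edge, best_value


# Path graph 0 - 1 - 2 (hypothetical toy input).
path = np.array([[0, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]], dtype=float)
```

Each call recomputes a full pseudoinverse, which is exactly the cost that makes this naive approach impractical on large graphs.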
Convergence of ADAM with Constant Step Size in Non-Convex Settings: A Simple Proof
Authors: Authors: Alokendu Mazumder, Bhartendu Kumar, Manan Tayal, Punit Rathore
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2309.08339
Pdf link: https://arxiv.org/pdf/2309.08339
Abstract In neural network training, RMSProp and ADAM remain widely favoured optimization algorithms. One of the keys to their performance lies in selecting the correct step size, which can significantly influence their effectiveness. It is worth noting that these algorithms' performance can vary considerably, depending on the chosen step sizes. Additionally, questions about their theoretical convergence properties continue to be a subject of interest. In this paper, we theoretically analyze a constant stepsize version of ADAM in the non-convex setting. We show sufficient conditions for the stepsize to achieve almost sure asymptotic convergence of the gradients to zero with minimal assumptions. We also provide runtime bounds for deterministic ADAM to reach approximate criticality when working with smooth, non-convex functions.
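For readers unfamiliar with the setting, the following is a minimal sketch of deterministic (full-gradient) ADAM run with a constant step size on a smooth, non-convex function. It follows the widely used update rule with bias correction, which may differ in detail from the variant analyzed in the paper. The test function, hyperparameter values, and function names are assumptions chosen for illustration, and no step-size condition from the paper is encoded here.

```python
import numpy as np


def adam_constant_step(grad, x0, step=0.01, beta1=0.9, beta2=0.999,
                       eps=1e-8, iters=2000):
    """Deterministic ADAM with a fixed step size (no decay schedule)."""
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)  # first-moment (mean) estimate
    v = np.zeros_like(x)  # second-moment (uncentered variance) estimate
    for t in range(1, iters + 1):
        g = grad(x)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)  # bias correction
        v_hat = v / (1.0 - beta2 ** t)
        x = x - step * m_hat / (np.sqrt(v_hat) + eps)
    return x


def toy_grad(x):
    """Gradient of the smooth non-convex toy function f(x) = x^2 + 3 sin^2(x)."""
    return 2.0 * x + 3.0 * np.sin(2.0 * x)


x_start = np.array([2.0])
x_final = adam_constant_step(toy_grad, x_start)
```

Approximate criticality can then be checked by comparing `abs(toy_grad(x_final))` against a chosen tolerance.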
Achievable Rate of a STAR-RIS Assisted Massive MIMO System Under Spatially-Correlated Channels
Authors: Authors: Anastasios Papazafeiropoulos, Le-Nam Tran, Zaid Abdullah, Pandelis Kourtessis, Symeon Chatzinotas
Subjects: Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2309.08342
Pdf link: https://arxiv.org/pdf/2309.08342
Abstract Reconfigurable intelligent surfaces (RIS)-assisted massive multiple-input multiple-output (mMIMO) is a promising technology for applications in next-generation networks. However, reflecting-only RIS provides limited coverage compared to a simultaneously transmitting and reflecting RIS (STAR-RIS). Hence, in this paper, we focus on the downlink achievable rate and its optimization of a STAR-RIS-assisted mMIMO system. Contrary to previous works on STAR-RIS, we consider mMIMO, correlated fading, and multiple user equipments (UEs) at both sides of the RIS. In particular, we introduce an estimation approach of the aggregated channel with the main benefit of reduced overhead links instead of estimating the individual channels. Next, leveraging channel hardening in mMIMO and the use-and-forget bounding technique, we obtain an achievable rate in closed-form that only depends on statistical channel state information (CSI). To optimize the amplitudes and phase shifts of the STAR-RIS, we employ a projected gradient ascent method (PGAM) that simultaneously adjusts the amplitudes and phase shifts for both energy splitting (ES) and mode switching (MS) STAR-RIS operation protocols. By considering large-scale fading, the proposed optimization can be performed every several coherence intervals, which can significantly reduce overhead. Considering that STAR-RIS has twice the number of controllable parameters compared to conventional reflecting-only RIS, this accomplishment offers substantial practical benefits. Simulations are carried out to verify the analytical results, reveal the interplay of the achievable rate with fundamental parameters, and show the superiority of STAR-RIS regarding its achievable rate compared to its reflecting-only counterpart.
Resource Optimization Using A Step-by-step Scheme in Wireless Sensing and Localization Networks
Authors: Authors: Ruihang Zhang, Jiayan Yang, Mu Jia, Tingting Zhang
Subjects: Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2309.08396
Pdf link: https://arxiv.org/pdf/2309.08396
Abstract Due to the scarcity of wireless spectrum resources, attention is shifting to versatile wireless networks. Wireless localization and target sensing both rely on precise extraction of parameters such as signal amplitude, propagation delay and Doppler shift from the received signals. Due to the high multi-path resolution and strong penetration of UWB signals, both localization and sensing can be achieved through the same UWB waveform. Practical networks are often resource-constrained; to improve the accuracy of integrated networks, we need to optimize the allocation of resources in the networks. Considering the complexity of the multi-slot networks, this paper derives the Fisher Information Matrix (FIM) expressions for single-slot and dual-slot integrated sensing and localization (ISAL) networks respectively, and proposes two resource optimization schemes, namely step-by-step scheme and integrated scheme. The numerical results show that: (i) For the sensing-resource-deficient networks with relatively uniform node distribution, the energy allocated to each step in the step-by-step scheme satisfies the relationship: energy for clock offset < energy for radar localization < energy for target sensing. (ii) In the multi-slot ISAL networks, the system will allocate more energy to the time slots where the networks are relatively sensing-resource-deficient. (iii) The step-by-step scheme is more suitable for the sensing-resource-abundant networks, while the integrated scheme is more suitable for the sensing-resource-deficient networks.
Constraint-Free Structure Learning with Smooth Acyclic Orientations
Authors: Authors: Riccardo Massidda, Francesco Landolfi, Martina Cinquini, Davide Bacciu
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2309.08406
Pdf link: https://arxiv.org/pdf/2309.08406
Abstract The structure learning problem consists of fitting data generated by a Directed Acyclic Graph (DAG) to correctly reconstruct its arcs. In this context, differentiable approaches constrain or regularize the optimization problem using a continuous relaxation of the acyclicity property. The computational cost of evaluating graph acyclicity is cubic in the number of nodes and significantly affects scalability. In this paper, we introduce COSMO, a constraint-free continuous optimization scheme for acyclic structure learning. At the core of our method, we define a differentiable approximation of an orientation matrix parameterized by a single priority vector. Unlike previous work, our parameterization fits a smooth orientation matrix and the resulting acyclic adjacency matrix without evaluating acyclicity at any step. Despite the absence of explicit constraints, we prove that COSMO always converges to an acyclic solution. In addition to being asymptotically faster, our empirical analysis highlights how COSMO performance on graph reconstruction compares favorably with competing structure learning methods.
Make Deep Networks Shallow Again
Authors: Authors: Bernhard Bermeitinger, Tomas Hrycej, Siegfried Handschuh
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.08414
Pdf link: https://arxiv.org/pdf/2309.08414
Abstract Deep neural networks have a good success record and are thus viewed as the best architecture choice for complex applications. Their main shortcoming has been, for a long time, the vanishing gradient which prevented the numerical optimization algorithms from acceptable convergence. A breakthrough has been achieved by the concept of residual connections -- an identity mapping parallel to a conventional layer. This concept is applicable to stacks of layers of the same dimension and substantially alleviates the vanishing gradient problem. A stack of residual connection layers can be expressed as an expansion of terms similar to the Taylor expansion. This expansion suggests the possibility of truncating the higher-order terms and receiving an architecture consisting of a single broad layer composed of all initially stacked layers in parallel. In other words, a sequential deep architecture is substituted by a parallel shallow one. Prompted by this theory, we investigated the performance capabilities of the parallel architecture in comparison to the sequential one. The computer vision datasets MNIST and CIFAR10 were used to train both architectures for a total of 6912 combinations of varying numbers of convolutional layers, numbers of filters, kernel sizes, and other meta parameters. Our findings demonstrate a surprising equivalence between the deep (sequential) and shallow (parallel) architectures. Both layouts produced similar results in terms of training and validation set loss. This discovery implies that a wide, shallow architecture can potentially replace a deep network without sacrificing performance. Such substitution has the potential to simplify network architectures, improve optimization efficiency, and accelerate the training process.
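The truncation argument in the abstract above can be made concrete for two stacked residual layers. The notation below (layer maps F_1 and F_2, Jacobian J_{F_2}) is hypothetical and assumes F_2 is sufficiently smooth; it is a sketch of the idea, not the authors' derivation.

```latex
% Two stacked residual layers with layer maps F_1, F_2 (hypothetical notation).
\begin{align*}
h_1 &= x + F_1(x), \\
h_2 &= h_1 + F_2(h_1) = x + F_1(x) + F_2\bigl(x + F_1(x)\bigr), \\
F_2\bigl(x + F_1(x)\bigr) &= F_2(x) + J_{F_2}(x)\,F_1(x) + O\bigl(\lVert F_1(x)\rVert^{2}\bigr), \\
h_2 &\approx x + F_1(x) + F_2(x).
\end{align*}
```

Dropping the composite term J_{F_2}(x) F_1(x) and all higher-order terms leaves both layer maps applied directly to the input x, which corresponds to the single wide, parallel layer described in the abstract.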
TOMAS: Topology Optimization of Multiscale Fluid Devices using Variational Autoencoders and Super-Shapes
Authors: Authors: Rahul Kumar Padhy, Krishnan Suresh, Aaditya Chandrasekhar
Subjects: Computational Engineering, Finance, and Science (cs.CE); Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2309.08435
Pdf link: https://arxiv.org/pdf/2309.08435
Abstract In this paper, we present a framework for multiscale topology optimization of fluid-flow devices. The objective is to minimize dissipated power, subject to a desired contact-area. The proposed strategy is to design optimal microstructures in individual finite element cells, while simultaneously optimizing the overall fluid flow. In particular, parameterized super-shape microstructures are chosen here to represent microstructures since they exhibit a wide range of permeability and contact area. To avoid repeated homogenization, a finite set of these super-shapes are analyzed a priori, and a variational autoencoder (VAE) is trained on their fluid constitutive properties (permeability), contact area and shape parameters. The resulting differentiable latent space is integrated with a coordinate neural network to carry out a global multi-scale fluid flow optimization. The latent space enables the use of new microstructures that were not present in the original data-set. The proposed method is illustrated using numerous examples in 2D.
MBAPPE: MCTS-Built-Around Prediction for Planning Explicitly
Authors: Authors: Raphael Chekroun, Thomas Gilles, Marin Toromanoff, Sascha Hornauer, Fabien Moutarde
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.08452
Pdf link: https://arxiv.org/pdf/2309.08452
Abstract We present MBAPPE, a novel approach to motion planning for autonomous driving combining tree search with a partially-learned model of the environment. Leveraging the inherent explainable exploration and optimization capabilities of Monte Carlo Tree Search (MCTS), our method addresses complex decision-making in a dynamic environment. We propose a framework that combines MCTS with supervised learning, enabling the autonomous vehicle to effectively navigate through diverse scenarios. Experimental results demonstrate the effectiveness and adaptability of our approach, showcasing improved real-time decision-making and collision avoidance. This paper contributes to the field by providing a robust solution for motion planning in autonomous driving systems, enhancing their explainability and reliability.
Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers
Authors: Authors: Qingyan Guo, Rui Wang, Junliang Guo, Bei Li, Kaitao Song, Xu Tan, Guoqing Liu, Jiang Bian, Yujiu Yang
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2309.08532
Pdf link: https://arxiv.org/pdf/2309.08532
Abstract Large Language Models (LLMs) excel in various tasks, but they rely on carefully crafted prompts that often demand substantial human effort. To automate this process, in this paper, we propose a novel framework for discrete prompt optimization, called EvoPrompt, which borrows the idea of evolutionary algorithms (EAs) as they exhibit good performance and fast convergence. To enable EAs to work on discrete prompts, which are natural language expressions that need to be coherent and human-readable, we connect LLMs with EAs. This approach allows us to simultaneously leverage the powerful language processing capabilities of LLMs and the efficient optimization performance of EAs. Specifically, abstaining from any gradients or parameters, EvoPrompt starts from a population of prompts and iteratively generates new prompts with LLMs based on the evolutionary operators, improving the population based on the development set. We optimize prompts for both closed- and open-source LLMs including GPT-3.5 and Alpaca, on 9 datasets spanning language understanding and generation tasks. EvoPrompt significantly outperforms human-engineered prompts and existing methods for automatic prompt generation by up to 25% and 14% respectively. Furthermore, EvoPrompt demonstrates that connecting LLMs with EAs creates synergies, which could inspire further research on the combination of LLMs and conventional algorithms.
Quadcopter Trajectory Time Minimization and Robust Collision Avoidance via Optimal Time Allocation
Authors: Authors: Zhefan Xu, Kenji Shimada
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.08544
Pdf link: https://arxiv.org/pdf/2309.08544
Abstract Autonomous navigation requires robots to generate trajectories for collision avoidance efficiently. Although plenty of previous works have proven successful in generating smooth and spatially collision-free trajectories, their solutions often suffer from suboptimal time efficiency and potential unsafety, particularly when accounting for uncertainties in robot perception and control. To address this issue, this paper presents the Robust Optimal Time Allocation (ROTA) framework. This framework is designed to optimize the time progress of the trajectories temporally, serving as a post-processing tool to enhance trajectory time efficiency and safety under uncertainties. In this study, we begin by formulating a non-convex optimization problem aimed at minimizing trajectory execution time while incorporating constraints on collision probability as the robot approaches obstacles. Subsequently, we introduce the concept of the trajectory braking zone and adopt the chance-constrained formulation for robust collision avoidance in the braking zones. Finally, the non-convex optimization problem is reformulated into a second-order cone programming problem to achieve real-time performance. Through simulations and physical flight experiments, we demonstrate that the proposed approach effectively reduces trajectory execution time while enabling robust collision avoidance in complex environments.
Deep Reinforcement Learning for Efficient and Fair Allocation of Health Care Resources
Authors: Authors: Yikuan Li, Chengsheng Mao, Kaixuan Huang, Hanyin Wang, Zheng Yu, Mengdi Wang, Yuan Luo
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2309.08560
Pdf link: https://arxiv.org/pdf/2309.08560
Abstract Scarcity of health care resources could result in the unavoidable consequence of rationing. For example, ventilators are often limited in supply, especially during public health emergencies or in resource-constrained health care settings, such as amid the pandemic of COVID-19. Currently, there is no universally accepted standard for health care resource allocation protocols, resulting in different governments prioritizing patients based on various criteria and heuristic-based protocols. In this study, we investigate the use of reinforcement learning for critical care resource allocation policy optimization to fairly and effectively ration resources. We propose a transformer-based deep Q-network to integrate the disease progression of individual patients and the interaction effects among patients during the critical care resource allocation. We aim to improve both fairness of allocation and overall patient outcomes. Our experiments demonstrate that our method significantly reduces excess deaths and achieves a more equitable distribution under different levels of ventilator shortage, when compared to existing severity-based and comorbidity-based methods in use by different governments. Our source code is included in the supplement and will be released on Github upon publication.
Robust Frame-to-Frame Camera Rotation Estimation in Crowded Scenes
Authors: Authors: Fabien Delattre, David Dirnfeld, Phat Nguyen, Stephen Scarano, Michael J. Jones, Pedro Miraldo, Erik Learned-Miller
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.08588
Pdf link: https://arxiv.org/pdf/2309.08588
Abstract We present an approach to estimating camera rotation in crowded, real-world scenes from handheld monocular video. While camera rotation estimation is a well-studied problem, no previous methods exhibit both high accuracy and acceptable speed in this setting. Because the setting is not addressed well by other datasets, we provide a new dataset and benchmark, with high-accuracy, rigorously verified ground truth, on 17 video sequences. Methods developed for wide baseline stereo (e.g., 5-point methods) perform poorly on monocular video. On the other hand, methods used in autonomous driving (e.g., SLAM) leverage specific sensor setups, specific motion models, or local optimization strategies (lagging batch processing) and do not generalize well to handheld video. Finally, for dynamic scenes, commonly used robustification techniques like RANSAC require large numbers of iterations, and become prohibitively slow. We introduce a novel generalization of the Hough transform on SO(3) to efficiently and robustly find the camera rotation most compatible with optical flow. Among comparably fast methods, ours reduces error by almost 50% over the next best, and is more accurate than any method, irrespective of speed. This represents a strong new performance point for crowded scenes, an important setting for computer vision. The code and the dataset are available at https://fabiendelattre.com/robust-rotation-estimation.
Keyword: adam

Convergence of ADAM with Constant Step Size in Non-Convex Settings: A Simple Proof
Authors: Authors: Alokendu Mazumder, Bhartendu Kumar, Manan Tayal, Punit Rathore
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2309.08339
Pdf link: https://arxiv.org/pdf/2309.08339
Abstract In neural network training, RMSProp and ADAM remain widely favoured optimization algorithms. One of the keys to their performance lies in selecting the correct step size, which can significantly influence their effectiveness. It is worth noting that these algorithms' performance can vary considerably, depending on the chosen step sizes. Additionally, questions about their theoretical convergence properties continue to be a subject of interest. In this paper, we theoretically analyze a constant stepsize version of ADAM in the non-convex setting. We show sufficient conditions for the stepsize to achieve almost sure asymptotic convergence of the gradients to zero with minimal assumptions. We also provide runtime bounds for deterministic ADAM to reach approximate criticality when working with smooth, non-convex functions.
Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization
Authors: Authors: Jack Foster, Alexandra Brintrup
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.08546
Pdf link: https://arxiv.org/pdf/2309.08546
Abstract The pursuit of long-term autonomy mandates that robotic agents must continuously adapt to their changing environments and learn to solve new tasks. Continual learning seeks to overcome the challenge of catastrophic forgetting, where learning to solve new tasks causes a model to forget previously learnt information. Prior-based continual learning methods are appealing for robotic applications as they are space efficient and typically do not increase in computational complexity as the number of tasks grows. Despite these desirable properties, prior-based approaches typically fail on important benchmarks and consequently are limited in their potential applications compared to their memory-based counterparts. We introduce Bayesian adaptive moment regularization (BAdam), a novel prior-based method that better constrains parameter growth, leading to lower catastrophic forgetting. Our method boasts a range of desirable properties for robotic applications such as being lightweight and task label-free, converging quickly, and offering calibrated uncertainty that is important for safe real-world deployment. Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments such as Split MNIST and Split FashionMNIST, and does so without relying on task labels or discrete task boundaries.
Keyword: gradient

Text-to-Image Models for Counterfactual Explanations: a Black-Box Approach
Authors: Authors: Guillaume Jeanneret, Loïc Simon, Frédéric Jurie
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.07944
Pdf link: https://arxiv.org/pdf/2309.07944
Abstract This paper addresses the challenge of generating Counterfactual Explanations (CEs), involving the identification and modification of the fewest necessary features to alter a classifier's prediction for a given image. Our proposed method, Text-to-Image Models for Counterfactual Explanations (TIME), is a black-box counterfactual technique based on distillation. Unlike previous methods, this approach requires solely the image and its prediction, omitting the need for the classifier's structure, parameters, or gradients. Before generating the counterfactuals, TIME introduces two distinct biases into Stable Diffusion in the form of textual embeddings: the context bias, associated with the image's structure, and the class bias, linked to class-specific features learned by the target classifier. After learning these biases, we find the optimal latent code applying the classifier's predicted class token and regenerate the image using the target embedding as conditioning, producing the counterfactual explanation. Extensive empirical studies validate that TIME can generate explanations of comparable effectiveness even when operating within a black-box setting.
Temporal-aware Hierarchical Mask Classification for Video Semantic Segmentation
Authors: Authors: Zhaochong An, Guolei Sun, Zongwei Wu, Hao Tang, Luc Van Gool
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.08020
Pdf link: https://arxiv.org/pdf/2309.08020
Abstract Modern approaches have proved the huge potential of addressing semantic segmentation as a mask classification task which is widely used in instance-level segmentation. This paradigm trains models by assigning part of object queries to ground truths via conventional one-to-one matching. However, we observe that the popular video semantic segmentation (VSS) dataset has limited categories per video, meaning fewer than 10% of queries could be matched to receive meaningful gradient updates during VSS training. This inefficiency limits the full expressive potential of all queries. Thus, we present a novel solution THE-Mask for VSS, which introduces temporal-aware hierarchical object queries for the first time. Specifically, we propose to use a simple two-round matching mechanism to involve more queries matched with minimal cost during training, without any extra cost during inference. To support our more-to-one assignment, in terms of the matching results, we further design a hierarchical loss to train queries with their corresponding hierarchy of primary or secondary. Moreover, to effectively capture temporal information across frames, we propose a temporal aggregation decoder that fits seamlessly into the mask-classification paradigm for VSS. Utilizing temporal-sensitive multi-level queries, our method achieves state-of-the-art performance on the latest challenging VSS benchmark VSPW without bells and whistles.
Gradient based Grasp Pose Optimization on a NeRF that Approximates Grasp Success
Authors: Authors: Gergely Sóti, Björn Hein, Christian Wurll
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.08040
Pdf link: https://arxiv.org/pdf/2309.08040
Abstract Current robotic grasping methods often rely on estimating the pose of the target object, explicitly predicting grasp poses, or implicitly estimating grasp success probabilities. In this work, we propose a novel approach that directly maps gripper poses to their corresponding grasp success values, without considering objectness. Specifically, we leverage a Neural Radiance Field (NeRF) architecture to learn a scene representation and use it to train a grasp success estimator that maps each pose in the robot's task space to a grasp success value. We employ this learned estimator to tune its inputs, i.e., grasp poses, by gradient-based optimization to obtain successful grasp poses. Contrary to other NeRF-based methods which enhance existing grasp pose estimation approaches by relying on NeRF's rendering capabilities or directly estimate grasp poses in a discretized space using NeRF's scene representation capabilities, our approach uniquely sidesteps both the need for rendering and the limitation of discretization. We demonstrate the effectiveness of our approach on four simulated 3DoF (Degree of Freedom) robotic grasping tasks and show that it can generalize to novel objects. Our best model achieves an average translation error of 3mm from valid grasp poses. This work opens the door for future research to apply our approach to higher DoF grasps and real-world scenarios.
A Bayesian approach to breaking things: efficiently predicting and repairing failure modes via sampling
Authors: Authors: Charles Dawson, Chuchu Fan
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2309.08052
Pdf link: https://arxiv.org/pdf/2309.08052
Abstract Before autonomous systems can be deployed in safety-critical applications, we must be able to understand and verify the safety of these systems. For cases where the risk or cost of real-world testing is prohibitive, we propose a simulation-based framework for a) predicting ways in which an autonomous system is likely to fail and b) automatically adjusting the system's design to preemptively mitigate those failures. We frame this problem through the lens of approximate Bayesian inference and use differentiable simulation for efficient failure case prediction and repair. We apply our approach to a range of robotics and control problems, including optimizing search patterns for robot swarms and reducing the severity of outages in power transmission networks. Compared to optimization-based falsification techniques, our method predicts a more diverse, representative set of failure modes, and we also find that our use of differentiable simulation yields solutions that have up to 10x lower cost and require up to 2x fewer iterations to converge relative to gradient-free techniques. Code and videos can be found at https://mit-realm.github.io/breaking-things/
MPCGPU: Real-Time Nonlinear Model Predictive Control through Preconditioned Conjugate Gradient on the GPU
Authors: Emre Adabag, Miloni Atal, William Gerard, Brian Plancher
Subjects: Robotics (cs.RO); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2309.08079
Pdf link: https://arxiv.org/pdf/2309.08079
Abstract Nonlinear Model Predictive Control (NMPC) is a state-of-the-art approach for locomotion and manipulation which leverages trajectory optimization at each control step. While the performance of this approach is computationally bounded, implementations of direct trajectory optimization that use iterative methods to solve the underlying moderately large, sparse linear systems are a natural fit for parallel hardware acceleration. In this work, we introduce MPCGPU, a GPU-accelerated, real-time NMPC solver that leverages an accelerated preconditioned conjugate gradient (PCG) linear system solver at its core. We show that MPCGPU increases the scalability and real-time performance of NMPC, solving larger problems at faster rates. In particular, for tracking tasks using the Kuka IIWA manipulator, MPCGPU is able to scale to kilohertz control rates with trajectories as long as 512 knot points. This is driven by a custom PCG solver which outperforms state-of-the-art, CPU-based, linear system solvers by at least 10x for a majority of solves and 3.6x on average.
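
For readers unfamiliar with the PCG iteration named in this abstract, the following is a minimal CPU sketch in NumPy. It is not the MPCGPU solver: the Jacobi (diagonal) preconditioner, the function name `pcg`, and the small tridiagonal test system are all assumptions chosen for illustration.

```python
import numpy as np

def pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite A with preconditioned CG."""
    M_inv = 1.0 / np.diag(A)      # Jacobi preconditioner (an illustrative assumption)
    x = np.zeros_like(b)
    r = b - A @ x                 # initial residual
    z = M_inv * r                 # preconditioned residual
    p = z.copy()                  # first search direction
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)     # step length along p
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p # new direction, conjugate to the previous ones
        rz = rz_new
    return x

# Hypothetical sparse-structured SPD system (tridiagonal, diagonally dominant).
n = 50
A = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x = pcg(A, b)
residual = np.linalg.norm(A @ x - b)
```

Each iteration needs only matrix-vector products and vector updates, which is why this family of solvers maps well onto parallel hardware.
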
Multicontinuum homogenization. General theory and applications
Authors: E. Chung, Y. Efendiev, J. Galvis, W.T. Leung
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2309.08128
Pdf link: https://arxiv.org/pdf/2309.08128
Abstract In this paper, we discuss a general framework for multicontinuum homogenization. Multicontinuum models are widely used in many applications and some derivations for these models are established. In these models, several macroscopic variables at each macroscale point are defined and the resulting multicontinuum equations are formulated. In this paper, we propose a general formulation and associated ingredients that allow performing multicontinuum homogenization. Our derivation consists of several main parts. In the first part, we propose a general expansion, where the solution is expressed via the product of multiple macro variables and associated cell problems. The second part consists of formulating the cell problems. The cell problems are formulated as saddle point problems with constraints for each continuum. Defining the continua via test functions, we set the constraints as an integral representation. Finally, substituting the expansion into the original system, we obtain multicontinuum systems. We present an application to the mixed formulation of elliptic equations. This is a challenging system as the system does not have symmetry. We discuss the local problems and various macroscale representations for the solution and its gradient. Using various order approximations, one can obtain different systems of equations. We discuss the applicability of multicontinuum homogenization and relate this to high contrast in the cell problem. Numerical results are presented.
VERSE: Virtual-Gradient Aware Streaming Lifelong Learning with Anytime Inference
Authors: Soumya Banerjee, Vinay K. Verma, Avideep Mukherjee, Deepak Gupta, Vinay P. Namboodiri, Piyush Rai
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.08227
Pdf link: https://arxiv.org/pdf/2309.08227
Abstract Lifelong learning, also referred to as continual learning, is the problem of training an AI agent continuously while also preventing it from forgetting its previously acquired knowledge. Most of the existing methods primarily focus on lifelong learning within a static environment and lack the ability to mitigate forgetting in a quickly-changing dynamic environment. Streaming lifelong learning is a challenging setting of lifelong learning with the goal of continuous learning in a dynamic non-stationary environment without forgetting. We introduce a novel approach to lifelong learning, which is streaming, requires a single pass over the data, can learn in a class-incremental manner, and can be evaluated on-the-fly (anytime inference). To accomplish these, we propose virtual gradients for continual representation learning to prevent catastrophic forgetting and leverage an exponential-moving-average-based semantic memory to further enhance performance. Extensive experiments on diverse datasets demonstrate our method's efficacy and superior performance over existing methods.
PRIEST: Projection Guided Sampling-Based Optimization For Autonomous Navigation
Authors: Fatemeh Rastgar, Houman Masnavi, Basant Sharma, Alvo Aabloo, Jan Swevers, Arun Kumar Singh
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.08235
Pdf link: https://arxiv.org/pdf/2309.08235
Abstract Efficient navigation in unknown and dynamic environments is crucial for expanding the application domain of mobile robots. The core challenge stems from the nonavailability of a feasible global path for guiding optimization-based local planners. As a result, existing local planners often get trapped in poor local minima. In this paper, we present a novel optimizer that can explore multiple homotopies to plan high-quality trajectories over long horizons while still being fast enough for real-time applications. We build on the gradient-free paradigm by augmenting the trajectory sampling strategy with a projection optimization that guides the samples toward a feasible region. As a result, our approach can recover from the frequently encountered pathological cases wherein all the sampled trajectories lie in the high-cost region. Furthermore, we also show that our projection optimization has a highly parallelizable structure that can be easily accelerated over GPUs. We push the state-of-the-art in the following respects. Over the navigation stack of the Robot Operating System (ROS), we show an improvement of 7-13% in success rate and up to two times in total travel time metric. On the same benchmarks and metrics, our approach achieves up to 44% improvement over MPPI and its recent variants. On simple point-to-point navigation tasks, our optimizer is up to two times more reliable than SOTA gradient-based solvers, as well as sampling-based approaches such as the Cross-Entropy Method (CEM) and VPSTO. Codes: https://github.com/fatemeh-rastgar/PRIEST
Optimization of Rank Losses for Image Retrieval
Authors: Elias Ramzi, Nicolas Audebert, Clément Rambour, André Araujo, Xavier Bitot, Nicolas Thome
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.08250
Pdf link: https://arxiv.org/pdf/2309.08250
Abstract In image retrieval, standard evaluation metrics rely on score ranking, e.g., average precision (AP), recall at k (R@k), normalized discounted cumulative gain (NDCG). In this work we introduce a general framework for robust and decomposable rank loss optimization. It addresses two major challenges for end-to-end training of deep neural networks with rank losses: non-differentiability and non-decomposability. Firstly we propose a general surrogate for the ranking operator, SupRank, that is amenable to stochastic gradient descent. It provides an upper bound for rank losses and ensures robust training. Secondly, we use a simple yet effective loss function to reduce the decomposability gap between the averaged batch approximation of ranking losses and their values on the whole training set. We apply our framework to two standard metrics for image retrieval: AP and R@k. Additionally we apply our framework to hierarchical image retrieval. We introduce an extension of AP, the hierarchical average precision $\mathcal{H}$-AP, and optimize it as well as the NDCG. Finally we create the first hierarchical landmarks retrieval dataset. We use a semi-automatic pipeline to create hierarchical labels, extending the large scale Google Landmarks v2 dataset. The hierarchical dataset is publicly available at https://github.com/cvdfoundation/google-landmark. Code will be released at https://github.com/elias-ramzi/SupRank.
Edge Based Oriented Object Detection
Authors: Jianghu Shen, Xiaojun Wu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.08265
Pdf link: https://arxiv.org/pdf/2309.08265
Abstract In the field of remote sensing, we often utilize oriented bounding boxes (OBB) to bound the objects. This approach significantly reduces the overlap among dense detection boxes and minimizes the inclusion of background content within the bounding boxes. To enhance the detection accuracy of oriented objects, we propose a unique loss function based on edge gradients, inspired by the similarity measurement function used in template matching tasks. During this process, we address the issues of non-differentiability of the function and the semantic alignment between gradient vectors in ground truth (GT) boxes and predicted boxes (PB). Experimental results show that our proposed loss function achieves a 0.6% mAP improvement compared to the commonly used Smooth L1 loss in the baseline algorithm. Additionally, we design an edge-based self-attention module to encourage the detection network to focus more on the object edges. Leveraging these two innovations, we achieve a mAP increase of 1.3% on the DOTA dataset.
Bridging Topic, Domain, and Language Shifts: An Evaluation of Comprehensive Out-of-Distribution Scenarios
Authors: Andreas Waldis, Iryna Gurevych
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2309.08316
Pdf link: https://arxiv.org/pdf/2309.08316
Abstract Language models (LMs) excel in in-distribution (ID) scenarios where train and test data are independent and identically distributed. However, their performance often degrades in real-world applications like argument mining. Such degradation happens when new topics emerge, or other text domains and languages become relevant. To assess LMs' generalization abilities in such out-of-distribution (OOD) scenarios, we simulate such distribution shifts by deliberately withholding specific instances for testing, such as those from the social media domain or on the topic Solar Energy. Unlike prior studies focusing on specific shifts and metrics in isolation, we comprehensively analyze OOD generalization. We define three metrics to pinpoint generalization flaws and propose eleven classification tasks covering topic, domain, and language shifts. Overall, we find superior performance of prompt-based fine-tuning, notably when train and test splits primarily differ semantically. Simultaneously, in-context learning is more effective than prompt-based or vanilla fine-tuning for tasks when training data embodies heavy discrepancies in label distribution compared to testing data. This reveals a crucial drawback of gradient-based learning: it biases LMs regarding such structural obstacles.
Let's Predict Who Will Move to a New Job
Authors: Rania Mkhinini Gahar, Adel Hidri, Minyar Sassi Hidri
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2309.08333
Pdf link: https://arxiv.org/pdf/2309.08333
Abstract Any company's human resources department faces the challenge of predicting whether an applicant will search for a new job or stay with the company. In this paper, we discuss how machine learning (ML) is used to predict who will move to a new job. First, the data is pre-processed into a suitable format for ML models. To deal with categorical features, data encoding is applied and several MLA (ML Algorithms) are performed including Random Forest (RF), Logistic Regression (LR), Decision Tree (DT), and eXtreme Gradient Boosting (XGBoost). To improve the performance of ML models, the synthetic minority oversampling technique (SMOTE) is used to retrain them. Models are assessed using decision support metrics such as precision, recall, F1-Score, and accuracy.
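
As background on the oversampling step mentioned in this abstract, here is a rough sketch of the core SMOTE idea: a synthetic minority sample is drawn on the line segment between a minority point and one of its nearest minority neighbours. The function name `smote_like`, the toy data, and the parameter values are assumptions for illustration; this is not the authors' pipeline, and a maintained implementation would normally be preferred in practice.

```python
import numpy as np

def smote_like(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances among minority samples.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)               # a point is not its own neighbour
    neighbors = np.argsort(dist, axis=1)[:, :k]  # indices of the k nearest neighbours
    synth = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        a = rng.integers(len(X_min))             # random minority sample
        b = neighbors[a, rng.integers(k)]        # one of its k nearest neighbours
        lam = rng.random()                       # interpolation weight in [0, 1)
        synth[i] = X_min[a] + lam * (X_min[b] - X_min[a])
    return synth

# Hypothetical minority-class feature matrix (6 samples, 2 features).
X_min = np.array([[0.0, 0.0], [1.0, 0.5], [0.5, 1.0],
                  [1.5, 1.5], [0.2, 0.8], [1.2, 0.1]])
X_new = smote_like(X_min, n_new=10)
```

The synthetic rows would be appended to the training split only, so that evaluation metrics are still computed on real data.
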
Convergence of ADAM with Constant Step Size in Non-Convex Settings: A Simple Proof
Authors: Alokendu Mazumder, Bhartendu Kumar, Manan Tayal, Punit Rathore
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2309.08339
Pdf link: https://arxiv.org/pdf/2309.08339
Abstract In neural network training, RMSProp and ADAM remain widely favoured optimization algorithms. One of the keys to their performance lies in selecting the correct step size, which can significantly influence their effectiveness. It is worth noting that these algorithms' performance can vary considerably, depending on the chosen step sizes. Additionally, questions about their theoretical convergence properties continue to be a subject of interest. In this paper, we theoretically analyze a constant stepsize version of ADAM in the non-convex setting. We show sufficient conditions for the stepsize to achieve almost sure asymptotic convergence of the gradients to zero with minimal assumptions. We also provide runtime bounds for deterministic ADAM to reach approximate criticality when working with smooth, non-convex functions.
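
To make the setting concrete, the sketch below runs the standard deterministic ADAM update with a constant step size on a smooth, non-convex one-dimensional function. The test function, the step size 0.01, and the other hyperparameters are arbitrary assumptions for illustration; they are not the sufficient conditions derived in the paper.

```python
import math

def grad(x):
    # Gradient of the illustrative function f(x) = x**2 + 3*sin(x)**2,
    # which is smooth and non-convex (f''(x) = 2 + 6*cos(2x) changes sign).
    return 2.0 * x + 3.0 * math.sin(2.0 * x)

def adam_constant_step(x0, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    """Deterministic (full-gradient) ADAM with a fixed step size lr."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g    # second-moment estimate
        m_hat = m / (1 - beta1 ** t)           # bias corrections
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

x_final = adam_constant_step(3.0)
final_grad_norm = abs(grad(x_final))
```

Tracking `final_grad_norm` rather than the function value mirrors the abstract's notion of approximate criticality, where the quantity of interest is the gradient magnitude.
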
Achievable Rate of a STAR-RIS Assisted Massive MIMO System Under Spatially-Correlated Channels
Authors: Anastasios Papazafeiropoulos, Le-Nam Tran, Zaid Abdullah, Pandelis Kourtessis, Symeon Chatzinotas
Subjects: Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2309.08342
Pdf link: https://arxiv.org/pdf/2309.08342
Abstract Reconfigurable intelligent surfaces (RIS)-assisted massive multiple-input multiple-output (mMIMO) is a promising technology for applications in next-generation networks. However, reflecting-only RIS provides limited coverage compared to a simultaneously transmitting and reflecting RIS (STAR-RIS). Hence, in this paper, we focus on the downlink achievable rate and its optimization of a STAR-RIS-assisted mMIMO system. Contrary to previous works on STAR-RIS, we consider mMIMO, correlated fading, and multiple user equipments (UEs) at both sides of the RIS. In particular, we introduce an estimation approach of the aggregated channel with the main benefit of reduced overhead links instead of estimating the individual channels. Next, leveraging channel hardening in mMIMO and the use-and-forget bounding technique, we obtain an achievable rate in closed-form that only depends on statistical channel state information (CSI). To optimize the amplitudes and phase shifts of the STAR-RIS, we employ a projected gradient ascent method (PGAM) that simultaneously adjusts the amplitudes and phase shifts for both energy splitting (ES) and mode switching (MS) STAR-RIS operation protocols. By considering large-scale fading, the proposed optimization can be performed every several coherence intervals, which can significantly reduce overhead. Considering that STAR-RIS has twice the number of controllable parameters compared to conventional reflecting-only RIS, this accomplishment offers substantial practical benefits. Simulations are carried out to verify the analytical results, reveal the interplay of the achievable rate with fundamental parameters, and show the superiority of STAR-RIS regarding its achievable rate compared to its reflecting-only counterpart.
Double Domain Guided Real-Time Low-Light Image Enhancement for Ultra-High-Definition Transportation Surveillance
Authors: Jingxiang Qu, Ryan Wen Liu, Yuan Gao, Yu Guo, Fenghua Zhu, Fei-yue Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.08382
Pdf link: https://arxiv.org/pdf/2309.08382
Abstract Real-time transportation surveillance is an essential part of the intelligent transportation system (ITS). However, images captured under low-light conditions often suffer from poor visibility with various types of degradation, such as noise interference and vague edge features. With the development of imaging devices, the quality of visual surveillance data is continually increasing, e.g., to 2K and 4K resolution, which places stricter requirements on the efficiency of image processing. To satisfy the requirements on both enhancement quality and computational speed, this paper proposes a double domain guided real-time low-light image enhancement network (DDNet) for ultra-high-definition (UHD) transportation surveillance. Specifically, we design an encoder-decoder structure as the main architecture of the learning network. In particular, the enhancement processing is divided into two subtasks (i.e., color enhancement and gradient enhancement) via the proposed coarse enhancement module (CEM) and LoG-based gradient enhancement module (GEM), which are embedded in the encoder-decoder structure. It enables the network to enhance the color and edge features simultaneously. Through the decomposition and reconstruction on both color and gradient domains, our DDNet can restore the detailed feature information concealed by the darkness with better visual quality and efficiency. The evaluation experiments on standard and transportation-related datasets demonstrate that our DDNet provides superior enhancement quality and efficiency compared with the state-of-the-art methods. Besides, the object detection and scene segmentation experiments indicate the practical benefits for higher-level image analysis under low-light environments in ITS.
Make Deep Networks Shallow Again
Authors: Bernhard Bermeitinger, Tomas Hrycej, Siegfried Handschuh
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.08414
Pdf link: https://arxiv.org/pdf/2309.08414
Abstract Deep neural networks have a good success record and are thus viewed as the best architecture choice for complex applications. Their main shortcoming has been, for a long time, the vanishing gradient which prevented the numerical optimization algorithms from acceptable convergence. A breakthrough has been achieved by the concept of residual connections -- an identity mapping parallel to a conventional layer. This concept is applicable to stacks of layers of the same dimension and substantially alleviates the vanishing gradient problem. A stack of residual connection layers can be expressed as an expansion of terms similar to the Taylor expansion. This expansion suggests the possibility of truncating the higher-order terms and receiving an architecture consisting of a single broad layer composed of all initially stacked layers in parallel. In other words, a sequential deep architecture is substituted by a parallel shallow one. Prompted by this theory, we investigated the performance capabilities of the parallel architecture in comparison to the sequential one. The computer vision datasets MNIST and CIFAR10 were used to train both architectures for a total of 6912 combinations of varying numbers of convolutional layers, numbers of filters, kernel sizes, and other meta parameters. Our findings demonstrate a surprising equivalence between the deep (sequential) and shallow (parallel) architectures. Both layouts produced similar results in terms of training and validation set loss. This discovery implies that a wide, shallow architecture can potentially replace a deep network without sacrificing performance. Such substitution has the potential to simplify network architectures, improve optimization efficiency, and accelerate the training process.
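
The expansion argument in this abstract can be checked numerically in the simplest case. The sketch below assumes purely linear layers (no nonlinearities, which is a simplification not made in the paper) and hypothetical small random weights `A1` and `A2`: two stacked residual layers give x + A1 x + A2 x + A2 A1 x, so dropping the second-order term leaves a sum of parallel branches.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
# Hypothetical small-weight linear layers; nonlinearities are omitted for clarity.
A1 = 0.05 * rng.standard_normal((d, d))
A2 = 0.05 * rng.standard_normal((d, d))
x = rng.standard_normal(d)

# Sequential (deep) residual stack: h = x + A1 x, then y = h + A2 h.
h = x + A1 @ x
deep = h + A2 @ h

# Parallel (shallow) truncation: keep only identity and first-order terms.
shallow = x + A1 @ x + A2 @ x

# The term dropped by the truncation is exactly the second-order product.
second_order = A2 @ (A1 @ x)
gap = np.linalg.norm(deep - shallow)
```

With weights of this scale the dropped term is small relative to the first-order terms, which is the regime in which a wide parallel layer could plausibly stand in for the sequential stack.
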
Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers
Authors: Qingyan Guo, Rui Wang, Junliang Guo, Bei Li, Kaitao Song, Xu Tan, Guoqing Liu, Jiang Bian, Yujiu Yang
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2309.08532
Pdf link: https://arxiv.org/pdf/2309.08532
Abstract Large Language Models (LLMs) excel in various tasks, but they rely on carefully crafted prompts that often demand substantial human effort. To automate this process, in this paper, we propose a novel framework for discrete prompt optimization, called EvoPrompt, which borrows the idea of evolutionary algorithms (EAs) as they exhibit good performance and fast convergence. To enable EAs to work on discrete prompts, which are natural language expressions that need to be coherent and human-readable, we connect LLMs with EAs. This approach allows us to simultaneously leverage the powerful language processing capabilities of LLMs and the efficient optimization performance of EAs. Specifically, abstaining from any gradients or parameters, EvoPrompt starts from a population of prompts and iteratively generates new prompts with LLMs based on the evolutionary operators, improving the population based on the development set. We optimize prompts for both closed- and open-source LLMs including GPT-3.5 and Alpaca, on 9 datasets spanning language understanding and generation tasks. EvoPrompt significantly outperforms human-engineered prompts and existing methods for automatic prompt generation by up to 25% and 14% respectively. Furthermore, EvoPrompt demonstrates that connecting LLMs with EAs creates synergies, which could inspire further research on the combination of LLMs and conventional algorithms.
Keyword: super-resolution

MetaF2N: Blind Image Super-Resolution by Learning Efficient Model Adaptation from Faces
Authors: Zhicun Yin, Ming Liu, Xiaoming Li, Hui Yang, Longan Xiao, Wangmeng Zuo
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.08113
Pdf link: https://arxiv.org/pdf/2309.08113
Abstract Due to their highly structured characteristics, faces are easier to recover than natural scenes for blind image super-resolution. Therefore, we can extract the degradation representation of an image from the low-quality and recovered face pairs. Using the degradation representation, realistic low-quality images can then be synthesized to fine-tune the super-resolution model for the real-world low-quality image. However, such a procedure is time-consuming and laborious, and the gaps between recovered faces and the ground-truths further increase the optimization uncertainty. To facilitate efficient model adaptation towards image-specific degradations, we propose a method dubbed MetaF2N, which leverages the contained Faces to fine-tune model parameters for adapting to the whole Natural image in a Meta-learning framework. The degradation extraction and low-quality image synthesis steps are thus circumvented in our MetaF2N, and it requires only one fine-tuning step to get decent performance. Considering the gaps between the recovered faces and ground-truths, we further deploy a MaskNet for adaptively predicting loss weights at different positions to reduce the impact of low-confidence areas. To evaluate our proposed MetaF2N, we have collected a real-world low-quality dataset with one or multiple faces in each image, and our MetaF2N achieves superior performance on both synthetic and real-world datasets. Source code, pre-trained models, and collected datasets are available at https://github.com/yinzhicun/MetaF2N.

