New submissions for Wed, 27 Sep 23

Keyword: sgd

Fixing the NTK: From Neural Network Linearizations to Exact Convex Programs

Authors: Rajat Vadiraj Dwaraknath, Tolga Ergen, Mert Pilanci
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2309.15096
Pdf link: https://arxiv.org/pdf/2309.15096
Abstract Recently, theoretical analyses of deep neural networks have broadly focused on two directions: 1) Providing insight into neural network training by SGD in the limit of infinite hidden-layer width and infinitesimally small learning rate (also known as gradient flow) via the Neural Tangent Kernel (NTK), and 2) Globally optimizing the regularized training objective via cone-constrained convex reformulations of ReLU networks. The latter research direction also yielded an alternative formulation of the ReLU network, called a gated ReLU network, that is globally optimizable via efficient unconstrained convex programs. In this work, we interpret the convex program for this gated ReLU network as a Multiple Kernel Learning (MKL) model with a weighted data masking feature map and establish a connection to the NTK. Specifically, we show that for a particular choice of mask weights that do not depend on the learning targets, this kernel is equivalent to the NTK of the gated ReLU network on the training data. A consequence of this lack of dependence on the targets is that the NTK cannot perform better than the optimal MKL kernel on the training set. By using iterative reweighting, we improve the weights induced by the NTK to obtain the optimal MKL kernel which is equivalent to the solution of the exact convex reformulation of the gated ReLU network. We also provide several numerical simulations corroborating our theory. Additionally, we provide an analysis of the prediction error of the resulting optimal kernel via consistency results for the group lasso.
SGD Finds then Tunes Features in Two-Layer Neural Networks with near-Optimal Sample Complexity: A Case Study in the XOR problem
Authors: Margalit Glasgow
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2309.15111
Pdf link: https://arxiv.org/pdf/2309.15111
Abstract In this work, we consider the optimization process of minibatch stochastic gradient descent (SGD) on a 2-layer neural network with data separated by a quadratic ground truth function. We prove that with data drawn from the $d$-dimensional Boolean hypercube labeled by the quadratic "XOR" function $y = -x_ix_j$, it is possible to train to a population error $o(1)$ with $d \:\text{polylog}(d)$ samples. Our result considers simultaneously training both layers of the two-layer neural network with ReLU activations via standard minibatch SGD on the logistic loss. To our knowledge, this work is the first to give a sample complexity of $\tilde{O}(d)$ for efficiently learning the XOR function on isotropic data on a standard neural network with standard training. Our main technique is showing that the network evolves in two phases: a $\textit{signal-finding}$ phase where the network is small and many of the neurons evolve independently to find features, and a $\textit{signal-heavy}$ phase, where SGD maintains and balances the features. We leverage the simultaneous training of the layers to show that it is sufficient for only a small fraction of the neurons to learn features, since those neurons will be amplified by the simultaneous growth of their second layer weights.
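
As a small illustration of the data model described in this abstract, the sketch below samples points uniformly from the Boolean hypercube and labels them with $y = -x_ix_j$. The ±1 encoding of the hypercube, the function name `sample_xor_data`, and the default coordinates `i=0, j=1` are assumptions made for illustration; the abstract does not specify them.

```python
import random


def sample_xor_data(n, d, i=0, j=1, seed=0):
    """Sample n points uniformly from {-1, +1}^d and label them y = -x_i * x_j."""
    rng = random.Random(seed)
    xs = [[rng.choice((-1, 1)) for _ in range(d)] for _ in range(n)]
    # With a +/-1 encoding, -x_i * x_j is +1 exactly when the two bits differ (XOR).
    ys = [-x[i] * x[j] for x in xs]
    return xs, ys
```

Only two of the `d` coordinates carry signal here; the remaining coordinates are independent noise, which is part of what makes the sample complexity question nontrivial.
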
Keyword: optimization

Integration of Polyimide Flexible PCB Wings in Northeastern Aerobat
Authors: Yizhe Xu
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2309.14346
Pdf link: https://arxiv.org/pdf/2309.14346
Abstract The principal aim of this Master's thesis is to propel the optimization of the membrane wing structure of the Northeastern Aerobat through origami techniques and enhancing its capacity for secure hovering within confined spaces. Bio-inspired drones offer distinctive capabilities that pave the way for innovative applications, encompassing wildlife monitoring, precision agriculture, search and rescue operations, as well as the augmentation of residential safety. The evolved noise-reduction mechanisms of birds and insects prove advantageous for drones utilized in tasks like surveillance and wildlife observation, ensuring operation devoid of disturbances. Traditional flying drones equipped with rotary or fixed wings encounter notable constraints when navigating narrow pathways. While rotary and fixed-wing systems are conventionally harnessed for surveillance and reconnaissance, the integration of onboard sensor suites within micro aerial vehicles (MAVs) has garnered interest in vigilantly monitoring hazardous scenarios in residential settings. Notwithstanding the agility and commendable fault tolerance exhibited by systems such as quadrotors in demanding conditions, their inflexible body structures impede collision tolerance, necessitating operational spaces free of collisions. Recent years have witnessed an upsurge in integrating soft and pliable materials into the design of such systems; however, the pursuit of aerodynamic efficiency curtails the utilization of excessively flexible materials for rotor blades or propellers. This thesis introduces a design that integrates polyimide flexible PCBs into the wings of the Aerobat and employs guard design incorporating feedback-driven stabilizers, enabling stable hovering flights within Northeastern's Robotics-Inspired Study and Experimentation (RISE) cage.
Carbon Containers: A System-level Facility for Managing Application-level Carbon Emissions
Authors: John Thiede, Noman Bashir, David Irwin, Prashant Shenoy
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Emerging Technologies (cs.ET); Operating Systems (cs.OS); Performance (cs.PF); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2309.14477
Pdf link: https://arxiv.org/pdf/2309.14477
Abstract To reduce their environmental impact, cloud datacenters are increasingly focused on optimizing applications' carbon-efficiency, or work done per mass of carbon emitted. To facilitate such optimizations, we present Carbon Containers, a simple system-level facility, which extends prior work on power containers, that automatically regulates applications' carbon emissions in response to variations in both their workload's intensity and their energy's carbon-intensity. Specifically, Carbon Containers enable applications to specify a maximum carbon emissions rate (in g$\cdot$CO$_2$e/hr), and then transparently enforce this rate via a combination of vertical scaling, container migration, and suspend/resume while maximizing either energy-efficiency or performance. Carbon Containers are especially useful for applications that i) must continue running even during high-carbon periods, and ii) execute in regions with few variations in carbon-intensity. These low-variability regions also tend to have high average carbon-intensity, which increases the importance of regulating carbon emissions. We implement a Carbon Containers prototype by extending Linux Containers to incorporate the mechanisms above and evaluate it using real workload traces and carbon-intensity data from multiple regions. We compare Carbon Containers with prior work that regulates carbon emissions by suspending/resuming applications during high/low carbon periods. We show that Carbon Containers are more carbon-efficient and improve performance while maintaining similar carbon emissions.
Bicriteria Approximation Algorithms for the Submodular Cover Problem
Authors: Wenjing Chen, Victoria G. Crawford
Subjects: Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2309.14558
Pdf link: https://arxiv.org/pdf/2309.14558
Abstract In this paper, we consider the optimization problem Submodular Cover (SCP), which is to find a minimum cardinality subset of a finite universe $U$ such that the value of a submodular function $f$ is above an input threshold $\tau$. In particular, we consider several variants of SCP including the general case, the case where $f$ is additionally assumed to be monotone, and finally the case where $f$ is a regularized monotone submodular function. Our most significant contributions are that: (i) We propose a scalable algorithm for monotone SCP that achieves nearly the same approximation guarantees as the standard greedy algorithm in significantly faster time; (ii) We are the first to develop an algorithm for general SCP that achieves a solution arbitrarily close to being feasible; and finally (iii) we are the first to develop algorithms for regularized SCP. Our algorithms are then demonstrated to be effective in an extensive experimental section on data summarization and graph cut, two applications of SCP.
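
For readers unfamiliar with the baseline this abstract refers to, the following is a minimal sketch of the standard greedy algorithm for monotone Submodular Cover: repeatedly add the element with the largest marginal gain until $f$ reaches the threshold $\tau$. It is not one of the paper's algorithms; the function name `greedy_submodular_cover` and the representation of $f$ as a Python callable on frozensets are assumptions for illustration.

```python
def greedy_submodular_cover(universe, f, tau):
    """Standard greedy baseline: grow a set S until f(S) >= tau.

    `f` is assumed to be a monotone submodular set function that accepts
    a frozenset and returns a number.
    """
    selected = set()
    remaining = set(universe)
    current = f(frozenset(selected))
    while current < tau and remaining:
        best, best_gain = None, 0
        # Sorting gives deterministic tie-breaking between equal gains.
        for u in sorted(remaining, key=repr):
            gain = f(frozenset(selected | {u})) - current
            if gain > best_gain:
                best, best_gain = u, gain
        if best is None:
            break  # no element increases f, so tau cannot be reached
        selected.add(best)
        remaining.remove(best)
        current = f(frozenset(selected))
    return selected
```

Each round evaluates $f$ once per remaining element, which is the cost that faster variants aim to reduce.
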
Generative Escher Meshes
Authors: Noam Aigerman, Thibault Groueix
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computational Geometry (cs.CG); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2309.14564
Pdf link: https://arxiv.org/pdf/2309.14564
Abstract This paper proposes a fully-automatic, text-guided generative method for producing periodic, repeating, tile-able 2D art, such as that seen on floors, mosaics, ceramics, and the work of M.C. Escher. In contrast to the standard concept of a seamless texture, i.e., square images that are seamless when tiled, our method generates non-square tilings which consist solely of repeating copies of the same object. It achieves this by optimizing both geometry and color of a 2D mesh, in order to generate a non-square tile in the shape and appearance of the desired object, with close to no additional background details. We enable geometric optimization of tilings by our key technical contribution: an unconstrained, differentiable parameterization of the space of all possible tileable shapes for a given symmetry group. Namely, we prove that modifying the Laplacian used in a 2D mesh-mapping technique - Orbifold Tutte Embedding - can achieve all possible tiling configurations for a chosen planar symmetry group. We thus consider both the mesh's tile-shape and its texture as optimizable parameters, rendering the textured mesh via a differentiable renderer. We leverage a trained image diffusion model to define a loss on the resulting image, thereby updating the mesh's parameters based on its appearance matching the text prompt. We show our method is able to produce plausible, appealing results, with non-trivial tiles, for a variety of different periodic tiling patterns.
Integrating Higher-Order Dynamics and Roadway-Compliance into Constrained ILQR-based Trajectory Planning for Autonomous Vehicles
Authors: Hanxiang Li, Jiaqiao Zhang, Sheng Zhu, Dongjian Tang, Donghao Xu
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.14566
Pdf link: https://arxiv.org/pdf/2309.14566
Abstract This paper addresses the advancements in on-road trajectory planning for Autonomous Passenger Vehicles (APV). Trajectory planning aims to produce a globally optimal route for APVs, considering various factors such as vehicle dynamics, constraints, and detected obstacles. Traditional techniques involve a combination of sampling methods followed by optimization algorithms, where the former ensures global awareness and the latter refines for local optima. Notably, the Constrained Iterative Linear Quadratic Regulator (CILQR) optimization algorithm has recently emerged, adapted for APV systems, emphasizing improved safety and comfort. However, existing implementations utilizing the vehicle bicycle kinematic model may not guarantee controllable trajectories. We augment this model by incorporating higher-order terms, including the first and second-order derivatives of curvature and longitudinal jerk. This inclusion facilitates a richer representation in our cost and constraint design. We also address roadway compliance, emphasizing adherence to lane boundaries and directions, which past work often overlooked. Lastly, we adopt a relaxed logarithmic barrier function to address the CILQR's dependency on feasible initial trajectories. The proposed methodology is then validated through simulation and real-world experiment driving scenes in real time.
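
To make the "relaxed logarithmic barrier" idea concrete, here is a minimal sketch of one common form from the optimization literature: the usual $-\ln z$ barrier for $z > \delta$, extended by a quadratic for $z \le \delta$ so that the penalty stays finite when a constraint is violated. Whether the paper uses exactly this form is an assumption; the function name, the parameter `delta`, and the convention $z = -g(x)$ for a constraint $g(x) \le 0$ are illustrative choices.

```python
import math


def relaxed_log_barrier(z, delta):
    """Relaxed log barrier for a constraint written as z = -g(x) >= 0.

    For z > delta this is the ordinary barrier -ln(z). For z <= delta it
    switches to a quadratic that matches the value, slope, and curvature
    of -ln(z) at z = delta, so infeasible points get a finite penalty.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    if z > delta:
        return -math.log(z)
    return 0.5 * (((z - 2.0 * delta) / delta) ** 2 - 1.0) - math.log(delta)
```

Because the function is defined for negative `z`, an iterative solver can start from an infeasible initial trajectory and still evaluate a cost and its derivatives, which is the dependency the abstract says the relaxation addresses.
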
Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control
Authors: Nate Rahn, Pierluca D'Oro, Harley Wiltzer, Pierre-Luc Bacon, Marc G. Bellemare
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.14597
Pdf link: https://arxiv.org/pdf/2309.14597
Abstract Deep reinforcement learning agents for continuous control are known to exhibit significant instability in their performance over time. In this work, we provide a fresh perspective on these behaviors by studying the return landscape: the mapping between a policy and a return. We find that popular algorithms traverse noisy neighborhoods of this landscape, in which a single update to the policy parameters leads to a wide range of returns. By taking a distributional view of these returns, we map the landscape, characterizing failure-prone regions of policy space and revealing a hidden dimension of policy quality. We show that the landscape exhibits surprising structure by finding simple paths in parameter space which improve the stability of a policy. To conclude, we develop a distribution-aware procedure which finds such paths, navigating away from noisy neighborhoods in order to improve the robustness of a policy. Taken together, our results provide new insight into the optimization, evaluation, and design of agents.
Progressive Text-to-3D Generation for Automatic 3D Prototyping
Authors: Han Yi, Zhedong Zheng, Xiangyu Xu, Tat-seng Chua
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.14600
Pdf link: https://arxiv.org/pdf/2309.14600
Abstract Text-to-3D generation is to craft a 3D object according to a natural language description. This can significantly reduce the workload for manually designing 3D models and provide a more natural way of interaction for users. However, this problem remains challenging in recovering the fine-grained details effectively and optimizing a large-size 3D output efficiently. Inspired by the success of progressive learning, we propose a Multi-Scale Triplane Network (MTN) and a new progressive learning strategy. As the name implies, the Multi-Scale Triplane Network consists of four triplanes transitioning from low to high resolution. The low-resolution triplane could serve as an initial shape for the high-resolution ones, easing the optimization difficulty. To further enable the fine-grained details, we also introduce the progressive learning strategy, which explicitly demands the network to shift its focus of attention from simple coarse-grained patterns to difficult fine-grained patterns. Our experiment verifies that the proposed method performs favorably against existing methods. For even the most challenging descriptions, where most existing methods struggle to produce a viable shape, our proposed method consistently delivers. We aspire for our work to pave the way for automatic 3D prototyping via natural language descriptions.
Feeder bus service design under spatially heterogeneous demand
Authors: Li Zhen, Weihua Gu
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2309.14688
Pdf link: https://arxiv.org/pdf/2309.14688
Abstract In rapidly sprawling urban areas and booming intercity express rail networks, efficiently designed feeder bus systems are more essential than ever to transport passengers to and from trunk-line rail terminals. When the feeder service region is sufficiently large, the spatial heterogeneity in demand distribution must be considered. This paper develops continuous approximation models for optimizing a heterogeneous fixed-route feeder network in a rectangular service region next to a rail terminal. Our work enhances previous studies by: (i) optimizing heterogeneous stop spacings along with line spacings and headways; (ii) accounting for passenger boarding and alighting numbers on bus dwell times and patron transfer delays at the rail terminal; and (iii) examining the advantages of asymmetric coordination between trunk and feeder schedules in both service directions. To tackle the increased modeling complexity, we introduce a semi-analytical method that combines analytically derived properties of the optimal solution with an iterative search algorithm. Local transit agencies can readily utilize this approach to design a real fixed-route feeder system. This paper reveals many findings and insights not previously reported. For instance, integrating the heterogeneous stop spacing optimization further reduces the system cost (by 4% under specific operating conditions). The cost savings increase with demand heterogeneity but decrease with the demand rate and service region size. Choosing the layout of feeder lines where buses pick up and drop off passengers along the service region's shorter side also significantly lowers the system cost (by 6% when the service region's aspect ratio is 1 to 2). Furthermore, coordinating trunk and feeder schedules in both service directions yields an additional cost saving of up to 20%.
Learning to Assist Different Wearers in Multitasks: Efficient and Individualized Human-In-the-Loop Adaption Framework for Exoskeleton Robots
Authors: Yu Chen, Gong Chen, Jing Ye, Chenglong Fu, Bin Liang, Xiang Li
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2309.14720
Pdf link: https://arxiv.org/pdf/2309.14720
Abstract One of the typical purposes of using lower-limb exoskeleton robots is to provide assistance to the wearer by supporting their weight and augmenting their physical capabilities according to a given task and human motion intentions. The generalizability of robots across different wearers in multiple tasks is important to ensure that the robot can provide correct and effective assistance in actual implementation. However, most lower-limb exoskeleton robots exhibit only limited generalizability. Therefore, this paper proposes a human-in-the-loop learning and adaptation framework for exoskeleton robots to improve their performance in various tasks and for different wearers. To suit different wearers, an individualized walking trajectory is generated online using dynamic movement primitives and Bayes optimization. To accommodate various tasks, a task translator is constructed using a neural network to generalize a trajectory to more complex scenarios. These generalization techniques are integrated into a unified variable impedance model, which regulates the exoskeleton to provide assistance while ensuring safety. In addition, an anomaly detection network is developed to quantitatively evaluate the wearer's comfort, which is considered in the trajectory learning procedure and contributes to the relaxation of conflicts in impedance control. The proposed framework is easy to implement, because it requires proprioceptive sensors only to perform and deploy data-efficient learning schemes. This makes the exoskeleton practical for deployment in complex scenarios, accommodating different walking patterns, habits, tasks, and conflicts. Experiments and comparative studies on a lower-limb exoskeleton robot are performed to demonstrate the effectiveness of the proposed framework.
Volumetric Semantically Consistent 3D Panoptic Mapping
Authors: Yang Miao, Iro Armeni, Marc Pollefeys, Daniel Barath
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.14737
Pdf link: https://arxiv.org/pdf/2309.14737
Abstract We introduce an online 2D-to-3D semantic instance mapping algorithm aimed at generating comprehensive, accurate, and efficient semantic 3D maps suitable for autonomous agents in unstructured environments. The proposed approach is based on a Voxel-TSDF representation used in recent algorithms. It introduces novel ways of integrating semantic prediction confidence during mapping, producing semantic and instance-consistent 3D regions. Further improvements are achieved by graph optimization-based semantic labeling and instance refinement. The proposed method achieves accuracy superior to the state of the art on public large-scale datasets, improving on a number of widely used metrics. We also highlight a downfall in the evaluation of recent studies: using the ground truth trajectory as input instead of a SLAM-estimated one substantially affects the accuracy, creating a large gap between the reported results and the actual performance on real-world data.
Markov Chain Mirror Descent On Data Federation
Authors: Yawei Zhao
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2309.14775
Pdf link: https://arxiv.org/pdf/2309.14775
Abstract Stochastic optimization methods such as mirror descent have wide applications due to their low computational cost. These methods have been well studied under the assumption of independent and identically distributed data, and usually achieve a sublinear rate of convergence. However, this assumption may be too strong and impractical in real application scenarios. Recent research investigates stochastic gradient descent when instances are sampled from a Markov chain. Unfortunately, few results are known for stochastic mirror descent. In this paper, we propose a new version of stochastic mirror descent, termed MarchOn, for the federated learning setting. Given a distributed network, the model iteratively travels from a node to one of its neighbours at random. Furthermore, we propose a new framework to analyze MarchOn, which yields the best rates of convergence for convex, strongly convex, and non-convex losses. Finally, we conduct empirical studies to evaluate the convergence of MarchOn and validate the theoretical results.
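
The setting in this abstract, a model that hops between neighbouring nodes and updates on each node's local data, can be sketched as follows. This is a generic illustration and not the MarchOn algorithm itself: it assumes an entropic mirror map on the probability simplex (which gives a multiplicative-weights update), linear local losses $\langle c_v, w \rangle$ at each node $v$, and a uniform random walk; all names are hypothetical.

```python
import math
import random


def markov_chain_mirror_descent(neighbors, node_costs, steps, lr, seed=0):
    """Mirror descent on the simplex with samples drawn along a random walk.

    neighbors[v] lists the nodes adjacent to v; node_costs[v] is the cost
    vector c_v of the local linear loss <c_v, w> held at node v.
    """
    rng = random.Random(seed)
    dim = len(node_costs[0])
    w = [1.0 / dim] * dim  # start at the uniform distribution
    node = 0
    for _ in range(steps):
        grad = node_costs[node]  # gradient of <c_node, w> with respect to w
        # Entropic mirror step: multiplicative update, then renormalize.
        w = [wj * math.exp(-lr * gj) for wj, gj in zip(w, grad)]
        total = sum(w)
        w = [wj / total for wj in w]
        # The model travels to a uniformly random neighbour of the current node.
        node = rng.choice(neighbors[node])
    return w
```

Consecutive gradients are correlated because they come from adjacent nodes rather than independent draws, which is why the i.i.d. analysis mentioned in the abstract does not apply directly.
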
RAN Functional Splits in NTN: Architectures and Challenges
Authors: Riccardo Campana, Carla Amatetti, Alessandro Vanelli-Coralli
Subjects: Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2309.14810
Pdf link: https://arxiv.org/pdf/2309.14810
Abstract While 5G networks are already being deployed for commercial applications, academia and industry are focusing their efforts on the development and standardization of the next generations of mobile networks, i.e., 5G-Advanced and 6G. Beyond-5G networks will revolutionize communication systems, providing seamless connectivity, both in time and in space, to a unique ecosystem consisting of the convergence of the digital, physical, and human domains. In this scenario, Non-Terrestrial Networks (NTN) will play a crucial role by providing ubiquitous, secure, and resilient infrastructure fully integrated into the overall system. The additional network complexity introduced by the third dimension of the architecture requires the interoperability of different network elements, enabled by the disaggregation and virtualization of network components, their interconnection by standard interfaces, and orchestration by data-driven network artificial intelligence. The disaggregation paradigm foresees the division of the radio access network (RAN) into different virtualized blocks of functions, introducing the concept of functional split. By wisely selecting the RAN functional split, it is possible to better exploit the system resources, obtain cost savings, and increase system performance. In this paper, we first discuss the current 6G NTN development in terms of architectural solutions, and then thoroughly analyze the impact of typical NTN channel impairments on the available functional splits. Finally, the benefits of introducing dynamic optimization of the functional split in NTN are analyzed, together with the foreseen challenges.
Generalization of pixel-wise phase estimation by CNN and improvement of phase-unwrapping by MRF optimization for one-shot 3D scan
Authors: Hiroto Harada, Michihiro Mikamo, Ryo Furukawa, Ryushuke Sagawa, Hiroshi Kawasaki
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.14824
Pdf link: https://arxiv.org/pdf/2309.14824
Abstract Active stereo techniques using single-pattern projection, a.k.a. one-shot 3D scan, have drawn wide attention from industry, the medical field, etc. One severe drawback of one-shot 3D scan is sparse reconstruction. In addition, since the spatial pattern becomes complicated for the purpose of efficient embedding, it is easily affected by noise, which results in unstable decoding. To solve these problems, we propose a pixel-wise interpolation technique for one-shot scan, which is applicable to any type of static pattern as long as the pattern is regular and periodic. This is achieved by a U-Net pre-trained with CG using an efficient data augmentation algorithm. In this paper, to further overcome the decoding instability, we propose a robust correspondence-finding algorithm based on Markov random field (MRF) optimization. We also propose a shape refinement algorithm based on B-spline and Gaussian kernel interpolation using explicitly detected laser curves. Experiments are conducted to show the effectiveness of the proposed method using real data with strong noise and textures.
Supersonic: Learning to Generate Source Code Optimisations in C/C++
Authors: Zimin Chen, Sen Fang, Martin Monperrus
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2309.14846
Pdf link: https://arxiv.org/pdf/2309.14846
Abstract Software optimization refines programs for resource efficiency while preserving functionality. Traditionally, it is a process done by developers and compilers. This paper introduces a third option, automated optimization at the source code level. We present Supersonic, a neural approach targeting minor source code modifications for optimization. Using a seq2seq model, Supersonic is trained on C/C++ program pairs ($x_{t}$, $x_{t+1}$), where $x_{t+1}$ is an optimized version of $x_{t}$, and outputs a diff. Supersonic's performance is benchmarked against OpenAI's GPT-3.5-Turbo and GPT-4 on competitive programming tasks. The experiments show that Supersonic not only outperforms both models on the code optimization task, but also minimizes the extent of the change, with a model more than 600x smaller than GPT-3.5-Turbo and 3700x smaller than GPT-4.
Cluster Exploration using Informative Manifold Projections
Authors: Stavros Gerolymatos, Xenophon Evangelopoulos, Vladimir Gusev, John Y. Goulermas
Subjects: Machine Learning (cs.LG); Human-Computer Interaction (cs.HC)
Arxiv link: https://arxiv.org/abs/2309.14857
Pdf link: https://arxiv.org/pdf/2309.14857
Abstract Dimensionality reduction (DR) is one of the key tools for the visual exploration of high-dimensional data and uncovering its cluster structure in two- or three-dimensional spaces. The vast majority of DR methods in the literature do not take into account any prior knowledge a practitioner may have regarding the dataset under consideration. We propose a novel method to generate informative embeddings which not only factor out the structure associated with different kinds of prior knowledge but also aim to reveal any remaining underlying structure. To achieve this, we employ a linear combination of two objectives: firstly, contrastive PCA that discounts the structure associated with the prior information, and secondly, kurtosis projection pursuit which ensures meaningful data separation in the obtained embeddings. We formulate this task as a manifold optimization problem and validate it empirically across a variety of datasets considering three distinct types of prior knowledge. Lastly, we provide an automated framework to perform iterative visual exploration of high-dimensional data.
ITEM3D: Illumination-Aware Directional Texture Editing for 3D Models
Authors: Shengqi Liu, Zhuo Chen, Jingnan Gao, Yichao Yan, Wenhan Zhu, Xiaobo Li, Ke Gao, Jiangjiang Lyu, Xiaokang Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.14872
Pdf link: https://arxiv.org/pdf/2309.14872
Abstract Texture editing is a crucial task in 3D modeling that allows users to automatically manipulate the surface materials of 3D models. However, the inherent complexity of 3D models and the ambiguity of text descriptions make this task challenging. To address this challenge, we propose ITEM3D, an illumination-aware model for automatic 3D object editing according to text prompts. Leveraging diffusion models and differentiable rendering, ITEM3D takes the rendered images as the bridge between text and 3D representation, and further optimizes the disentangled texture and environment map. Previous methods adopt an absolute editing direction, namely score distillation sampling (SDS), as the optimization objective, which unfortunately results in noisy appearance and text inconsistency. To solve the problem caused by ambiguous text, we introduce a relative editing direction, an optimization objective defined by the noise difference between the source and target texts, to relieve the semantic ambiguity between the texts and images. Additionally, we gradually adjust the direction during optimization to further address unexpected deviation in the texture domain. Qualitative and quantitative experiments show that our ITEM3D outperforms the state-of-the-art methods on various 3D objects. We also perform text-guided relighting to show explicit control over lighting.
Parallel Multi-Objective Hyperparameter Optimization with Uniform Normalization and Bounded Objectives
Authors: Romain Egele, Tyler Chang, Yixuan Sun, Venkatram Vishwanath, Prasanna Balaprakash
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2309.14936
Pdf link: https://arxiv.org/pdf/2309.14936
Abstract Machine learning (ML) methods offer a wide range of configurable hyperparameters that have a significant influence on their performance. While accuracy is a commonly used performance objective, in many settings, it is not sufficient. Optimizing the ML models with respect to multiple objectives such as accuracy, confidence, fairness, calibration, privacy, latency, and memory consumption is becoming crucial. To that end, hyperparameter optimization, the approach to systematically optimize the hyperparameters, which is already challenging for a single objective, is even more challenging for multiple objectives. In addition, the differences in objective scales, the failures, and the presence of outlier values in objectives make the problem even harder. We propose a multi-objective Bayesian optimization (MoBO) algorithm that addresses these problems through uniform objective normalization and randomized weights in scalarization. We increase the efficiency of our approach by imposing constraints on the objective to avoid exploring unnecessary configurations (e.g., insufficient accuracy). Finally, we leverage an approach to parallelize the MoBO which results in a 5x speed-up when using 16x more workers.
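
The two ingredients named in this abstract, uniform objective normalization and randomized scalarization weights, can be sketched as below. This is an illustrative reading rather than the paper's implementation: it assumes normalization by empirical-CDF rank, weights drawn uniformly from the simplex, a weighted-sum scalarization, and objectives that are all minimized. The function names are hypothetical.

```python
import bisect
import random


def uniform_normalize(values):
    """Replace each value by the fraction of observations <= it (empirical CDF).

    The result lies in (0, 1] whatever the original scale, so a single
    extreme outlier cannot dominate the scalarized objective.
    """
    ordered = sorted(values)
    n = len(values)
    return [bisect.bisect_right(ordered, v) / n for v in values]


def random_simplex_weights(k, rng):
    """Draw k non-negative weights summing to 1, uniformly over the simplex."""
    draws = [rng.expovariate(1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def scalarize(objectives, rng):
    """Combine per-configuration objective tuples into one score each."""
    k = len(objectives[0])
    columns = [uniform_normalize([row[j] for row in objectives]) for j in range(k)]
    weights = random_simplex_weights(k, rng)
    return [
        sum(weights[j] * columns[j][i] for j in range(k))
        for i in range(len(objectives))
    ]
```

Redrawing the weights on each call steers the search toward different trade-offs between objectives, while the rank-based normalization keeps objectives with very different scales (for example, error rate versus latency in milliseconds) comparable.
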
Minimizing Energy Consumption for 5G NR Beam Management for RedCap Devices
Authors: Manishika Rawat, Matteo Pagin, Marco Giordani, Louis-Adrien Dufrene, Quentin Lampin, Michele Zorzi
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2309.14971
Pdf link: https://arxiv.org/pdf/2309.14971
Abstract In 5G New Radio (NR), beam management entails periodic and continuous transmission and reception of control signals in the form of synchronization signal blocks (SSBs), used to perform initial access and/or channel estimation. However, this procedure demands continuous energy consumption, which is particularly challenging to handle for low-cost, low-complexity, and battery-constrained devices, such as RedCap devices to support mid-market Internet of Things (IoT) use cases. In this context, this work aims at reducing the energy consumption during beam management for RedCap devices, while ensuring that the desired Quality of Service (QoS) requirements are met. To do so, we formalize an optimization problem in an Indoor Factory (InF) scenario to select the best beam management parameters, including the beam update periodicity and the beamwidth, to minimize energy consumption based on users' distribution and their speed. The analysis yields the regions of feasibility, i.e., the upper limit(s) on the beam management parameters for RedCap devices, that we use to provide design guidelines accordingly.
Improving Unsupervised Visual Program Inference with Code Rewriting Families
Authors: Aditya Ganeshan, R. Kenny Jones, Daniel Ritchie
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2309.14972
Pdf link: https://arxiv.org/pdf/2309.14972
Abstract Programs offer compactness and structure that makes them an attractive representation for visual data. We explore how code rewriting can be used to improve systems for inferring programs from visual data. We first propose Sparse Intermittent Rewrite Injection (SIRI), a framework for unsupervised bootstrapped learning. SIRI sparsely applies code rewrite operations over a dataset of training programs, injecting the improved programs back into the training set. We design a family of rewriters for visual programming domains: parameter optimization, code pruning, and code grafting. For three shape programming languages in 2D and 3D, we show that using SIRI with our family of rewriters improves performance: better reconstructions and faster convergence rates, compared with bootstrapped learning methods that do not use rewriters or use them naively. Finally, we demonstrate that our family of rewriters can be effectively used at test time to improve the output of SIRI predictions. For 2D and 3D CSG, we outperform or match the reconstruction performance of recent domain-specific neural architectures, while producing more parsimonious programs that use significantly fewer primitives.
An Ensemble Model for Distorted Images in Real Scenarios
Authors: Boyuan Ji, Jianchang Huang, Wenzhuo Huang, Shuke He
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.14998
Pdf link: https://arxiv.org/pdf/2309.14998
Abstract Image acquisition conditions and environments can significantly affect high-level tasks in computer vision, and the performance of most computer vision algorithms will be limited when trained on distortion-free datasets. Even with updates in hardware such as sensors and deep learning methods, it will still not work in the face of variable conditions in real-world applications. In this paper, we apply the object detector YOLOv7 to detect distorted images from the dataset CDCOCO. Through carefully designed optimizations including data enhancement, detection box ensemble, denoiser ensemble, super-resolution models, and transfer learning, our model achieves excellent performance on the CDCOCO test set. Our denoising detection model can denoise and repair distorted images, making the model useful in a variety of real-world scenarios and environments.
Making PPO even better: Value-Guided Monte-Carlo Tree Search decoding
Authors: Jiacheng Liu, Andrew Cohen, Ramakanth Pasunuru, Yejin Choi, Hannaneh Hajishirzi, Asli Celikyilmaz
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.15028
Pdf link: https://arxiv.org/pdf/2309.15028
Abstract Inference-time search algorithms such as Monte-Carlo Tree Search (MCTS) may seem unnecessary when generating natural language text based on state-of-the-art reinforcement learning such as Proximal Policy Optimization (PPO). In this paper, we demonstrate that it is possible to get extra mileage out of PPO by integrating MCTS on top. The key idea is not to throw out the value network, a byproduct of PPO training for evaluating partial output sequences, when decoding text out of the policy network. More concretely, we present a novel value-guided decoding algorithm called PPO-MCTS, which can integrate the value network from PPO to work closely with the policy network during inference-time generation. Compared to prior approaches based on MCTS for controlled text generation, the key strength of our approach is to reduce the fundamental mismatch of the scoring mechanisms of the partial outputs between training and test. Evaluation on four text generation tasks demonstrates that PPO-MCTS greatly improves the preferability of generated text compared to the standard practice of using only the PPO policy. Our results demonstrate the promise of search algorithms even on top of the aligned language models from PPO, and the under-explored benefit of the value network.
STAR-RIS Assisted Full-Duplex Communication Networks
Authors: Abdelhamid Salem, Kai-Kit Wong, Chan-Byoung Chae, Yangyang Zhang
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2309.15037
Pdf link: https://arxiv.org/pdf/2309.15037
Abstract Different from conventional reconfigurable intelligent surfaces (RIS), a recent innovation called simultaneous transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) has emerged, aimed at achieving complete 360-degree coverage in communication networks. Additionally, full-duplex (FD) technology is recognized as a potent approach for enhancing spectral efficiency by enabling simultaneous transmission and reception within the same time and frequency resources. In this study, we investigate the performance of a STAR-RIS-assisted FD communication system. The STAR-RIS is strategically placed at the cell-edge to facilitate communication for users located in this challenging region, while cell-center users can communicate directly with the FD base station (BS). We employ a non-orthogonal multiple access (NOMA) pairing scheme and account for system impairments, such as self-interference at the BS and imperfect successive interference cancellation (SIC). We derive closed-form expressions for the ergodic rates in both the up-link and down-link communications and extend our analysis to bidirectional communication between cell-center and cell-edge users. Furthermore, we formulate an optimization problem aimed at maximizing the ergodic sum-rate. This optimization involves adjusting the amplitudes and phase-shifts of the STAR-RIS elements and allocating total transmit power efficiently. To gain deeper insights into the achievable rates of STAR-RIS-aided FD systems, we explore the impact of various system parameters through numerical results.
SGD Finds then Tunes Features in Two-Layer Neural Networks with near-Optimal Sample Complexity: A Case Study in the XOR problem
Authors: Margalit Glasgow
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2309.15111
Pdf link: https://arxiv.org/pdf/2309.15111
Abstract In this work, we consider the optimization process of minibatch stochastic gradient descent (SGD) on a 2-layer neural network with data separated by a quadratic ground truth function. We prove that with data drawn from the $d$-dimensional Boolean hypercube labeled by the quadratic ``XOR'' function $y = -x_ix_j$, it is possible to train to a population error $o(1)$ with $d \:\text{polylog}(d)$ samples. Our result considers simultaneously training both layers of the two-layer neural network with ReLU activations via standard minibatch SGD on the logistic loss. To our knowledge, this work is the first to give a sample complexity of $\tilde{O}(d)$ for efficiently learning the XOR function on isotropic data on a standard neural network with standard training. Our main technique is showing that the network evolves in two phases: a $\textit{signal-finding}$ phase where the network is small and many of the neurons evolve independently to find features, and a $\textit{signal-heavy}$ phase, where SGD maintains and balances the features. We leverage the simultaneous training of the layers to show that it is sufficient for only a small fraction of the neurons to learn features, since those neurons will be amplified by the simultaneous growth of their second layer weights.
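To make the data model in this abstract concrete, the sketch below samples points uniformly from the Boolean hypercube $\{-1, +1\}^d$ and labels them with $y = -x_i x_j$. The function name `sample_xor`, the default coordinates `i=0, j=1` and the seed handling are hypothetical choices for illustration; the training procedure analyzed in the paper is not reproduced here.

```python
import random


def sample_xor(n, d, i=0, j=1, seed=0):
    """Draw n labeled points from the d-dimensional Boolean hypercube.

    Each coordinate is -1 or +1 with equal probability, and the label is
    y = -x[i] * x[j], so y = +1 exactly when the two signs differ (XOR).
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.choice((-1, 1)) for _ in range(d)]
        y = -x[i] * x[j]
        data.append((x, y))
    return data
```

Only two of the $d$ coordinates carry signal, which is why the sample complexity in terms of $d$ is the quantity of interest.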
Keyword: adam

There is no result

Keyword: gradient

Era Splitting
Authors: Timothy DeLise
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE)
Arxiv link: https://arxiv.org/abs/2309.14496
Pdf link: https://arxiv.org/pdf/2309.14496
Abstract Real-life machine learning problems exhibit distributional shifts in the data from one time to another or from one place to another. This behavior is beyond the scope of the traditional empirical risk minimization paradigm, which assumes i.i.d. distribution of data over time and across locations. The emerging field of out-of-distribution (OOD) generalization addresses this reality with new theory and algorithms which incorporate environmental, or era-wise, information into the algorithms. So far, most research has been focused on linear models and/or neural networks. In this research we develop two new splitting criteria for decision trees, which allow us to apply ideas from OOD generalization research to decision tree models, including random forest and gradient-boosting decision trees. The new splitting criteria use era-wise information associated with each data point to allow tree-based models to find split points that are optimal across all disjoint eras in the data, instead of optimal over the entire data set pooled together, which is the default setting. We describe the new splitting criteria in detail and develop unique experiments to showcase the benefits of these new criteria, which improve metrics in our experiments out-of-sample. The new criteria are incorporated into a state-of-the-art gradient boosted decision tree model in the Scikit-Learn code base, which is made freely available.
Byzantine-Resilient Federated PCA and Low Rank Matrix Recovery
Authors: Ankit Pratap Singh, Namrata Vaswani
Subjects: Information Theory (cs.IT); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2309.14512
Pdf link: https://arxiv.org/pdf/2309.14512
Abstract In this work we consider the problem of estimating the principal subspace (span of the top r singular vectors) of a symmetric matrix in a federated setting, when each node has access to estimates of this matrix. We study how to make this problem Byzantine resilient. We introduce a novel provably Byzantine-resilient, communication-efficient, and private algorithm, called Subspace-Median, to solve it. We also study the most natural solution for this problem, a geometric median based modification of the federated power method, and explain why it is not useful. We consider two special cases of the resilient subspace estimation meta-problem - federated principal components analysis (PCA) and the spectral initialization step of horizontally federated low rank column-wise sensing (LRCCS) in this work. For both these problems we show how Subspace Median provides a resilient solution that is also communication-efficient. Median of Means extensions are developed for both problems. Extensive simulation experiments are used to corroborate our theoretical guarantees. Our second contribution is a complete AltGDmin based algorithm for Byzantine-resilient horizontally federated LRCCS and guarantees for it. We do this by developing a geometric median of means estimator for aggregating the partial gradients computed at each node, and using Subspace Median for initialization.
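The abstract relies on geometric-median aggregation for Byzantine resilience. As a generic illustration of why a geometric median resists corrupted inputs, the sketch below uses Weiszfeld's iteration on plain vectors. It is not the paper's Subspace-Median or its median-of-means estimator; the function name `geometric_median` and the parameters `iters` and `eps` are assumptions made for this example.

```python
import math


def geometric_median(points, iters=200, eps=1e-12):
    """Approximate the geometric median of a list of vectors.

    Weiszfeld's iteration: repeatedly replace the estimate with an
    average of the points weighted by inverse distance to the estimate.
    """
    dim = len(points[0])
    # Start from the coordinate-wise mean.
    est = [sum(p[k] for p in points) / len(points) for k in range(dim)]
    for _ in range(iters):
        num = [0.0] * dim
        den = 0.0
        for p in points:
            dist = math.sqrt(sum((p[k] - est[k]) ** 2 for k in range(dim)))
            w = 1.0 / max(dist, eps)  # eps guards against division by zero
            for k in range(dim):
                num[k] += w * p[k]
            den += w
        est = [v / den for v in num]
    return est


# Four honest points near the unit square plus one corrupted point.
points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [100.0, 100.0]]
robust = geometric_median(points)
mean = [sum(p[k] for p in points) / len(points) for k in range(2)]
```

Here the coordinate-wise mean is dragged to (20.4, 20.4) by the single corrupted point, while the geometric median stays inside the unit square.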
DifAttack: Query-Efficient Black-Box Attack via Disentangled Feature Space
Authors: Liu Jun, Zhou Jiantao, Zeng Jiandian, Jinyu Tian
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.14585
Pdf link: https://arxiv.org/pdf/2309.14585
Abstract This work investigates efficient score-based black-box adversarial attacks with a high Attack Success Rate (ASR) and good generalizability. We design a novel attack method based on a Disentangled Feature space, called DifAttack, which differs significantly from the existing ones operating over the entire feature space. Specifically, DifAttack firstly disentangles an image's latent feature into an adversarial feature and a visual feature, where the former dominates the adversarial capability of an image, while the latter largely determines its visual appearance. We train an autoencoder for the disentanglement by using pairs of clean images and their Adversarial Examples (AEs) generated from available surrogate models via white-box attack methods. Eventually, DifAttack iteratively optimizes the adversarial feature according to the query feedback from the victim model until a successful AE is generated, while keeping the visual feature unaltered. In addition, due to the avoidance of using surrogate models' gradient information when optimizing AEs for black-box models, our proposed DifAttack inherently possesses better attack capability in the open-set scenario, where the training dataset of the victim model is unknown. Extensive experimental results demonstrate that our method achieves significant improvements in ASR and query efficiency simultaneously, especially in the targeted attack and open-set scenarios. The code will be available at https://github.com/csjunjun/DifAttack.git soon.
Gray-box Adversarial Attack of Deep Reinforcement Learning-based Trading Agents
Authors: Foozhan Ataiefard, Hadi Hemmati
Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE); Trading and Market Microstructure (q-fin.TR)
Arxiv link: https://arxiv.org/abs/2309.14615
Pdf link: https://arxiv.org/pdf/2309.14615
Abstract In recent years, deep reinforcement learning (Deep RL) has been successfully implemented as a smart agent in many systems such as complex games, self-driving cars, and chat-bots. One of the interesting use cases of Deep RL is its application as an automated stock trading agent. In general, any automated trading agent is prone to manipulations by adversaries in the trading environment. Thus, studying their robustness is vital for their success in practice. However, the typical mechanism to study RL robustness, which is based on white-box gradient-based adversarial sample generation techniques (like FGSM), is obsolete for this use case, since the models are protected behind secure international exchange APIs, such as NASDAQ. In this research, we demonstrate that a "gray-box" approach for attacking a Deep RL-based trading agent is possible by trading in the same stock market, with no extra access to the trading agent. In our proposed approach, an adversary agent uses a hybrid Deep Neural Network as its policy consisting of Convolutional layers and fully-connected layers. On average, over three simulated trading market configurations, the adversary policy proposed in this research is able to reduce the reward values by 214.17%, which results in reducing the potential profits of the baseline by 139.4%, ensemble method by 93.7%, and an automated trading software developed by our industrial partner by 85.5%, while consuming significantly less budget than the victims (427.77%, 187.16%, and 66.97%, respectively).
Structure Invariant Transformation for better Adversarial Transferability
Authors: Xiaosen Wang, Zeliang Zhang, Jianping Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.14700
Pdf link: https://arxiv.org/pdf/2309.14700
Abstract Given the severe vulnerability of Deep Neural Networks (DNNs) against adversarial examples, there is an urgent need for an effective adversarial attack to identify the deficiencies of DNNs in security-sensitive applications. As one of the prevalent black-box adversarial attacks, the existing transfer-based attacks still cannot achieve comparable performance with the white-box attacks. Among these, input transformation based attacks have shown remarkable effectiveness in boosting transferability. In this work, we find that the existing input transformation based attacks transform the input image globally, resulting in limited diversity of the transformed images. We postulate that more diverse transformed images result in better transferability. Thus, we investigate how to locally apply various transformations onto the input image to improve such diversity while preserving the structure of the image. To this end, we propose a novel input transformation based attack, called Structure Invariant Attack (SIA), which applies a random image transformation onto each image block to craft a set of diverse images for gradient calculation. Extensive experiments on the standard ImageNet dataset demonstrate that SIA exhibits much better transferability than the existing SOTA input transformation based attacks on CNN-based and transformer-based models, showing its generality and superiority in boosting transferability. Code is available at https://github.com/xiaosen-wang/SIT.
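The block-wise idea described here (split the image into blocks and apply an independently chosen random transformation to each block) can be sketched as below. This is a toy illustration on nested lists, not the SIA implementation from the linked repository: the set of transformations (identity, flips, 180-degree rotation), the function name `blockwise_transform` and the `splits` parameter are assumptions chosen so that every block keeps its shape and position.

```python
import random


def identity(block):
    return [row[:] for row in block]


def flip_h(block):
    return [row[::-1] for row in block]


def flip_v(block):
    return [row[:] for row in block[::-1]]


def rot180(block):
    return [row[::-1] for row in block[::-1]]


# Hypothetical transformation pool; each one preserves the block's shape.
TRANSFORMS = (identity, flip_h, flip_v, rot180)


def blockwise_transform(image, splits, rng):
    """Apply a randomly chosen transform to each block of a 2D image.

    image: list of equal-length rows; height and width are assumed to be
    divisible by splits. Blocks stay in place, so the coarse layout of the
    image is preserved while local content is perturbed.
    """
    h, w = len(image), len(image[0])
    bh, bw = h // splits, w // splits
    out = [row[:] for row in image]
    for bi in range(splits):
        for bj in range(splits):
            r0, c0 = bi * bh, bj * bw
            block = [row[c0:c0 + bw] for row in image[r0:r0 + bh]]
            new_block = rng.choice(TRANSFORMS)(block)
            for r in range(bh):
                out[r0 + r][c0:c0 + bw] = new_block[r]
    return out
```

Calling the function several times with different random states yields a set of differently transformed copies of the same image, which is the kind of diversity the abstract argues for.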
Effective Multi-Agent Deep Reinforcement Learning Control with Relative Entropy Regularization
Authors: Chenyang Miao, Yunduan Cui, Huiyun Li, Xinyu Wu
Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.14727
Pdf link: https://arxiv.org/pdf/2309.14727
Abstract In this paper, a novel Multi-agent Reinforcement Learning (MARL) approach, Multi-Agent Continuous Dynamic Policy Gradient (MACDPP), is proposed to tackle the issues of limited capability and sample efficiency in various scenarios controlled by multiple agents. It alleviates the inconsistency of multiple agents' policy updates by introducing relative entropy regularization to the Centralized Training with Decentralized Execution (CTDE) framework with the Actor-Critic (AC) structure. Evaluated on multi-agent cooperation and competition tasks and on traditional control tasks, including OpenAI benchmarks and robot arm manipulation, MACDPP demonstrates significant superiority in learning capability and sample efficiency compared with both related multi-agent and widely implemented single-agent baselines, and therefore expands the potential of MARL in effectively learning challenging control scenarios.
Markov Chain Mirror Descent On Data Federation
Authors: Yawei Zhao
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2309.14775
Pdf link: https://arxiv.org/pdf/2309.14775
Abstract Stochastic optimization methods such as mirror descent have wide applications due to their low computational cost. These methods have been well studied under the assumption of independent and identically distributed data, and usually achieve a sublinear rate of convergence. However, this assumption may be too strong and impractical in real application scenarios. Recent research investigates stochastic gradient descent when instances are sampled from a Markov chain. Unfortunately, few results are known for stochastic mirror descent. In this paper, we propose a new version of stochastic mirror descent, termed MarchOn, for the federated learning scenario. Given a distributed network, the model iteratively travels from a node to one of its neighbours randomly. Furthermore, we propose a new framework to analyze MarchOn, which yields best rates of convergence for convex, strongly convex, and non-convex loss. Finally, we conduct empirical studies to evaluate the convergence of MarchOn, and validate the theoretical results.
3D Density-Gradient based Edge Detection on Neural Radiance Fields (NeRFs) for Geometric Reconstruction
Authors: Miriam Jäger, Boris Jutzi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.14800
Pdf link: https://arxiv.org/pdf/2309.14800
Abstract Generating geometric 3D reconstructions from Neural Radiance Fields (NeRFs) is of great interest. However, accurate and complete reconstructions based on the density values are challenging. The network output depends on input data, NeRF network configuration and hyperparameters. As a result, the direct usage of density values, e.g. via filtering with global density thresholds, usually requires empirical investigations. Under the assumption that the density increases from non-object to object area, the utilization of density gradients from relative values is evident. As the density represents a position-dependent parameter it can be handled anisotropically, therefore processing of the voxelized 3D density field is justified. In this regard, we address geometric 3D reconstructions based on density gradients, whereas the gradients result from 3D edge detection filters of the first and second derivatives, namely Sobel, Canny and Laplacian of Gaussian. The gradients rely on relative neighboring density values in all directions, and are thus independent of absolute magnitudes. Consequently, gradient filters are able to extract edges along a wide density range, almost independent of assumptions and empirical investigations. Our approach demonstrates the capability to achieve geometric 3D reconstructions with high geometric accuracy on object surfaces and remarkable object completeness. Notably, the Canny filter effectively eliminates gaps, delivers a uniform point density, and strikes a favorable balance between correctness and completeness across the scenes.
Development of boundary layers in Euler fluids that on "activation" respond like Navier-Stokes fluids
Authors: P. A. Gazca-Orozco, J. Málek, K. R. Rajagopal
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2309.14802
Pdf link: https://arxiv.org/pdf/2309.14802
Abstract We consider the flow of a fluid whose response characteristics change due to the value of the norm of the symmetric part of the velocity gradient, behaving as an Euler fluid below a critical value and as a Navier-Stokes fluid at and above the critical value, the norm being determined by the external stimuli. We show that such a fluid, while flowing past a bluff body, develops boundary layers which are practically identical to those that one encounters within the context of the classical boundary layer theory propounded by Prandtl. Unlike the classical boundary layer theory that arises as an approximation within the context of the Navier-Stokes theory, here the development of boundary layers is due to a change in the response characteristics of the constitutive relation. We study the flow of such a fluid past an airfoil and compare the same against the solution of the Navier-Stokes equations. We find that the results are in excellent agreement with regard to the velocity and vorticity fields for the two cases.
Evaluating Soccer Match Prediction Models: A Deep Learning Approach and Feature Optimization for Gradient-Boosted Trees
Authors: Calvin Yeung, Rory Bunker, Rikuhei Umemoto, Keisuke Fujii
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2309.14807
Pdf link: https://arxiv.org/pdf/2309.14807
Abstract Machine learning models have become increasingly popular for predicting the results of soccer matches; however, the lack of publicly-available benchmark datasets has made model evaluation challenging. The 2023 Soccer Prediction Challenge required the prediction of match results first in terms of the exact goals scored by each team, and second, in terms of the probabilities for a win, draw, and loss. The original training set of matches and features, which was provided for the competition, was augmented with additional matches that were played between 4 April and 13 April 2023, representing the period after which the training set ended, but prior to the first matches that were to be predicted (upon which the performance was evaluated). A CatBoost model was employed using pi-ratings as the features, which were initially identified as the optimal choice for calculating the win/draw/loss probabilities. Notably, deep learning models have frequently been disregarded in this particular task. Therefore, in this study, we aimed to assess the performance of a deep learning model and determine the optimal feature set for a gradient-boosted tree model. The model was trained using the most recent five years of data, and three training and validation sets were used in a hyperparameter grid search. The results from the validation sets show that our model had strong performance and stability compared to previously published models from the 2017 Soccer Prediction Challenge for win/draw/loss prediction.
Measurement Models For Sailboats Price vs. Features And Regional Areas
Authors: Jiaqi Weng, Chunlin Feng, Yihan Shao
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2309.14994
Pdf link: https://arxiv.org/pdf/2309.14994
Abstract In this study, we investigated the relationship between sailboat technical specifications and their prices, as well as regional pricing influences. Utilizing a dataset encompassing characteristics like length, beam, draft, displacement, sail area, and waterline, we applied multiple machine learning models to predict sailboat prices. The gradient descent model demonstrated superior performance, producing the lowest MSE and MAE. Our analysis revealed that monohulled boats are generally more affordable than catamarans, and that certain specifications such as length, beam, displacement, and sail area directly correlate with higher prices. Interestingly, lower draft was associated with higher listing prices. We also explored regional price determinants and found that the United States tops the list in average sailboat prices, followed by Europe, Hong Kong, and the Caribbean. Contrary to our initial hypothesis, a country's GDP showed no direct correlation with sailboat prices. Utilizing a 50% cross-validation method, our models yielded consistent results across test groups. Our research offers a machine learning-enhanced perspective on sailboat pricing, aiding prospective buyers in making informed decisions.
Convergence Analysis of Nonlinear Kaczmarz Method for Systems of Nonlinear Equations with Component-wise Convex Mapping
Authors: Yu Gao, Chong Chen
Subjects: Numerical Analysis (math.NA); Medical Physics (physics.med-ph)
Arxiv link: https://arxiv.org/abs/2309.15003
Pdf link: https://arxiv.org/pdf/2309.15003
Abstract Motivated by a class of nonlinear imaging inverse problems, for instance, multispectral computed tomography (MSCT), this paper studies the convergence theory of the nonlinear Kaczmarz method (NKM) for solving systems of nonlinear equations with component-wise convex mapping, namely, the function corresponding to each equation being convex. Although the tangential cone condition (TCC) is often used to prove the convergence of NKM, it may be impossible or difficult to verify or satisfy this condition for this kind of nonlinear system. We propose a novel condition named the relative gradient discrepancy condition (RGDC), and make use of it to prove the convergence and even the convergence rate of NKM with several general index selection strategies, including the cyclic strategy and the maximum residual strategy. In particular, we investigate the application of NKM for solving nonlinear systems in MSCT image reconstruction. We prove that the nonlinear mapping of interest fulfills the proposed RGDC rather than the component-wise local TCC, and provide the global convergence of NKM based on the previously obtained results. Numerical experiments further illustrate the numerical convergence of NKM for MSCT image reconstruction.
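For readers unfamiliar with the method, the sketch below shows the standard nonlinear Kaczmarz update with the cyclic index strategy: for the selected equation $f_i(x) = 0$, the iterate moves along the gradient of $f_i$ by $f_i(x) / \|\nabla f_i(x)\|^2$. The toy two-equation system, the function name `nonlinear_kaczmarz` and the `sweeps` parameter are assumptions for illustration; the MSCT system and the RGDC analysis from the paper are not reproduced.

```python
def nonlinear_kaczmarz(funcs, grads, x0, sweeps=20):
    """Cyclic nonlinear Kaczmarz iteration for the system f_i(x) = 0.

    funcs: list of scalar functions f_i(x).
    grads: list of functions returning the gradient of f_i at x.
    """
    x = list(x0)
    for _ in range(sweeps):
        for f, g in zip(funcs, grads):  # cyclic index selection
            residual = f(x)
            grad = g(x)
            norm2 = sum(c * c for c in grad)
            if norm2 == 0.0:
                continue  # no usable direction for this equation
            step = residual / norm2
            x = [xi - step * gi for xi, gi in zip(x, grad)]
    return x


# Toy system with convex components (hypothetical example):
#   f_1(x) = x_0^2 + x_1^2 - 2 = 0,   f_2(x) = x_0 - x_1 = 0
funcs = [lambda x: x[0] ** 2 + x[1] ** 2 - 2.0, lambda x: x[0] - x[1]]
grads = [lambda x: [2.0 * x[0], 2.0 * x[1]], lambda x: [1.0, -1.0]]
solution = nonlinear_kaczmarz(funcs, grads, [2.0, 0.5])
```

Starting from (2.0, 0.5), the iterates approach the root (1, 1) of this toy system. Replacing the `zip` loop with a choice of the equation having the largest residual would give a maximum residual strategy instead.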
HPCR: Holistic Proxy-based Contrastive Replay for Online Continual Learning
Authors: Huiwei Lin, Shanshan Feng, Baoquan Zhang, Xutao Li, Yew-soon Ong, Yunming Ye
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.15038
Pdf link: https://arxiv.org/pdf/2309.15038
Abstract Online continual learning (OCL) aims to continuously learn new data from a single pass over the online data stream. It generally suffers from the catastrophic forgetting issue. Existing replay-based methods effectively alleviate this issue by replaying part of old data in a proxy-based or contrastive-based replay manner. In this paper, we conduct a comprehensive analysis of these two replay manners and find they can be complementary. Inspired by this finding, we propose a novel replay-based method called proxy-based contrastive replay (PCR), which replaces anchor-to-sample pairs with anchor-to-proxy pairs in the contrastive-based loss to alleviate the phenomenon of forgetting. Based on PCR, we further develop a more advanced method named holistic proxy-based contrastive replay (HPCR), which consists of three components. The contrastive component conditionally incorporates anchor-to-sample pairs to PCR, learning more fine-grained semantic information with a large training batch. The second is a temperature component that decouples the temperature coefficient into two parts based on their impacts on the gradient and sets different values for them to learn more novel knowledge. The third is a distillation component that constrains the learning process to keep more historical knowledge. Experiments on four datasets consistently demonstrate the superiority of HPCR over various state-of-the-art methods.
Fixing the NTK: From Neural Network Linearizations to Exact Convex Programs
Authors: Rajat Vadiraj Dwaraknath, Tolga Ergen, Mert Pilanci
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2309.15096
Pdf link: https://arxiv.org/pdf/2309.15096
Abstract Recently, theoretical analyses of deep neural networks have broadly focused on two directions: 1) Providing insight into neural network training by SGD in the limit of infinite hidden-layer width and infinitesimally small learning rate (also known as gradient flow) via the Neural Tangent Kernel (NTK), and 2) Globally optimizing the regularized training objective via cone-constrained convex reformulations of ReLU networks. The latter research direction also yielded an alternative formulation of the ReLU network, called a gated ReLU network, that is globally optimizable via efficient unconstrained convex programs. In this work, we interpret the convex program for this gated ReLU network as a Multiple Kernel Learning (MKL) model with a weighted data masking feature map and establish a connection to the NTK. Specifically, we show that for a particular choice of mask weights that do not depend on the learning targets, this kernel is equivalent to the NTK of the gated ReLU network on the training data. A consequence of this lack of dependence on the targets is that the NTK cannot perform better than the optimal MKL kernel on the training set. By using iterative reweighting, we improve the weights induced by the NTK to obtain the optimal MKL kernel which is equivalent to the solution of the exact convex reformulation of the gated ReLU network. We also provide several numerical simulations corroborating our theory. Additionally, we provide an analysis of the prediction error of the resulting optimal kernel via consistency results for the group lasso.
SGD Finds then Tunes Features in Two-Layer Neural Networks with near-Optimal Sample Complexity: A Case Study in the XOR problem
Authors: Margalit Glasgow
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2309.15111
Pdf link: https://arxiv.org/pdf/2309.15111
Abstract In this work, we consider the optimization process of minibatch stochastic gradient descent (SGD) on a 2-layer neural network with data separated by a quadratic ground truth function. We prove that with data drawn from the $d$-dimensional Boolean hypercube labeled by the quadratic ``XOR'' function $y = -x_ix_j$, it is possible to train to a population error $o(1)$ with $d \:\text{polylog}(d)$ samples. Our result considers simultaneously training both layers of the two-layer neural network with ReLU activations via standard minibatch SGD on the logistic loss. To our knowledge, this work is the first to give a sample complexity of $\tilde{O}(d)$ for efficiently learning the XOR function on isotropic data on a standard neural network with standard training. Our main technique is showing that the network evolves in two phases: a $\textit{signal-finding}$ phase where the network is small and many of the neurons evolve independently to find features, and a $\textit{signal-heavy}$ phase, where SGD maintains and balances the features. We leverage the simultaneous training of the layers to show that it is sufficient for only a small fraction of the neurons to learn features, since those neurons will be amplified by the simultaneous growth of their second layer weights.
Keyword: super-resolution

DONNAv2 -- Lightweight Neural Architecture Search for Vision tasks
Authors: Sweta Priyadarshi, Tianyu Jiang, Hsin-Pai Cheng, Sendil Krishna, Viswanath Ganapathy, Chirag Patel
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2309.14670
Pdf link: https://arxiv.org/pdf/2309.14670
Abstract With the growing demand for vision applications and deployment across edge devices, the development of hardware-friendly architectures that maintain performance during device deployment becomes crucial. Neural architecture search (NAS) techniques explore various approaches to discover efficient architectures for diverse learning tasks in a computationally efficient manner. In this paper, we present the next-generation neural architecture design for computationally efficient neural architecture distillation - DONNAv2. Conventional NAS algorithms rely on a computationally extensive stage where an accuracy predictor is learned to estimate model performance within the search space. This building of accuracy predictors helps them predict the performance of models that are not being finetuned. Here, we have developed an elegant approach to eliminate building the accuracy predictor and extend DONNA to a computationally efficient setting. The loss metric of individual blocks forming the network serves as the surrogate performance measure for the sampled models in the NAS search stage. To validate the performance of DONNAv2 we have performed extensive experiments involving a range of diverse vision tasks including classification, object detection, image denoising, super-resolution, and panoptic perception network (YOLOP). The hardware-in-the-loop experiments were carried out using the Samsung Galaxy S10 mobile platform. Notably, DONNAv2 reduces the computational cost of DONNA by 10x for the larger datasets. Furthermore, to improve the quality of the NAS search space, DONNAv2 leverages a block knowledge distillation filter to remove blocks with high inference costs.
An Ensemble Model for Distorted Images in Real Scenarios
Authors: Boyuan Ji, Jianchang Huang, Wenzhuo Huang, Shuke He
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.14998
Pdf link: https://arxiv.org/pdf/2309.14998
Abstract Image acquisition conditions and environments can significantly affect high-level tasks in computer vision, and the performance of most computer vision algorithms will be limited when trained on distortion-free datasets. Even with updates in hardware such as sensors and deep learning methods, it will still not work in the face of variable conditions in real-world applications. In this paper, we apply the object detector YOLOv7 to detect distorted images from the dataset CDCOCO. Through carefully designed optimizations including data enhancement, detection box ensemble, denoiser ensemble, super-resolution models, and transfer learning, our model achieves excellent performance on the CDCOCO test set. Our denoising detection model can denoise and repair distorted images, making the model useful in a variety of real-world scenarios and environments.
LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models
Authors: Yaohui Wang, Xinyuan Chen, Xin Ma, Shangchen Zhou, Ziqi Huang, Yi Wang, Ceyuan Yang, Yinan He, Jiashuo Yu, Peiqing Yang, Yuwei Guo, Tianxing Wu, Chenyang Si, Yuming Jiang, Cunjian Chen, Chen Change Loy, Bo Dai, Dahua Lin, Yu Qiao, Ziwei Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2309.15103
Pdf link: https://arxiv.org/pdf/2309.15103
Abstract This work aims to learn a high-quality text-to-video (T2V) generative model by leveraging a pre-trained text-to-image (T2I) model as a basis. It is a highly desirable yet challenging task to simultaneously a) accomplish the synthesis of visually realistic and temporally coherent videos while b) preserving the strong creative generation nature of the pre-trained T2I model. To this end, we propose LaVie, an integrated video generation framework that operates on cascaded video latent diffusion models, comprising a base T2V model, a temporal interpolation model, and a video super-resolution model. Our key insights are two-fold: 1) We reveal that the incorporation of simple temporal self-attentions, coupled with rotary positional encoding, adequately captures the temporal correlations inherent in video data. 2) Additionally, we validate that the process of joint image-video fine-tuning plays a pivotal role in producing high-quality and creative outcomes. To enhance the performance of LaVie, we contribute a comprehensive and diverse video dataset named Vimeo25M, consisting of 25 million text-video pairs that prioritize quality, diversity, and aesthetic appeal. Extensive experiments demonstrate that LaVie achieves state-of-the-art performance both quantitatively and qualitatively. Furthermore, we showcase the versatility of pre-trained LaVie models in various long video generation and personalized video synthesis applications.

New submissions for Wed, 27 Sep 23 #608

Keyword: sgd

Fixing the NTK: From Neural Network Linearizations to Exact Convex Programs

SGD Finds then Tunes Features in Two-Layer Neural Networks with near-Optimal Sample Complexity: A Case Study in the XOR problem

Keyword: optimization

Integration of Polyimide Flexible PCB Wings in Northeastern Aerobat

Carbon Containers: A System-level Facility for Managing Application-level Carbon Emissions

Bicriteria Approximation Algorithms for the Submodular Cover Problem

Generative Escher Meshes

Integrating Higher-Order Dynamics and Roadway-Compliance into Constrained ILQR-based Trajectory Planning for Autonomous Vehicles

Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control

Progressive Text-to-3D Generation for Automatic 3D Prototyping

Feeder bus service design under spatially heterogeneous demand

Learning to Assist Different Wearers in Multitasks: Efficient and Individualized Human-In-the-Loop Adaption Framework for Exoskeleton Robots

Volumetric Semantically Consistent 3D Panoptic Mapping

Markov Chain Mirror Descent On Data Federation

RAN Functional Splits in NTN: Architectures and Challenges

Generalization of pixel-wise phase estimation by CNN and improvement of phase-unwrapping by MRF optimization for one-shot 3D scan

Supersonic: Learning to Generate Source Code Optimisations in C/C++

Cluster Exploration using Informative Manifold Projections

ITEM3D: Illumination-Aware Directional Texture Editing for 3D Models

Parallel Multi-Objective Hyperparameter Optimization with Uniform Normalization and Bounded Objectives

Minimizing Energy Consumption for 5G NR Beam Management for RedCap Devices

Improving Unsupervised Visual Program Inference with Code Rewriting Families

An Ensemble Model for Distorted Images in Real Scenarios

Making PPO even better: Value-Guided Monte-Carlo Tree Search decoding

STAR-RIS Assisted Full-Duplex Communication Networks

SGD Finds then Tunes Features in Two-Layer Neural Networks with near-Optimal Sample Complexity: A Case Study in the XOR problem

Keyword: adam

Keyword: gradient

Era Splitting

Byzantine-Resilient Federated PCA and Low Rank Matrix Recovery

DifAttack: Query-Efficient Black-Box Attack via Disentangled Feature Space

Gray-box Adversarial Attack of Deep Reinforcement Learning-based Trading Agents

Structure Invariant Transformation for better Adversarial Transferability

Effective Multi-Agent Deep Reinforcement Learning Control with Relative Entropy Regularization

Markov Chain Mirror Descent On Data Federation

3D Density-Gradient based Edge Detection on Neural Radiance Fields (NeRFs) for Geometric Reconstruction

Development of boundary layers in Euler fluids that on "activation" respond like Navier-Stokes fluids

Evaluating Soccer Match Prediction Models: A Deep Learning Approach and Feature Optimization for Gradient-Boosted Trees

Measurement Models For Sailboats Price vs. Features And Regional Areas

Convergence Analysis of Nonlinear Kaczmarz Method for Systems of Nonlinear Equations with Component-wise Convex Mapping

HPCR: Holistic Proxy-based Contrastive Replay for Online Continual Learning

Fixing the NTK: From Neural Network Linearizations to Exact Convex Programs

SGD Finds then Tunes Features in Two-Layer Neural Networks with near-Optimal Sample Complexity: A Case Study in the XOR problem

Keyword: super-resolution

DONNAv2 -- Lightweight Neural Architecture Search for Vision tasks

An Ensemble Model for Distorted Images in Real Scenarios

LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models