New submissions for Wed, 29 Nov 23

Keyword: sgd

There is no result

Keyword: optimization

Community Battery Energy Storage Systems for Enhancing Distribution System Operation: A Multi-objective Optimization Approach

Authors: Authors: Yunqi Wang, Hao Wang, Markus Wagner, Ariel Liebman
Subjects: Networking and Internet Architecture (cs.NI); Neural and Evolutionary Computing (cs.NE); Systems and Control (eess.SY); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2311.16110
Pdf link: https://arxiv.org/pdf/2311.16110
Abstract The growing penetration of distributed energy resources (DERs) in distribution networks (DNs) raises new operational challenges, particularly in terms of reliability and voltage regulation. In response to these challenges, we introduce an innovative DN operation framework with multi-objective optimization, leveraging community battery energy storage systems (C-BESS). The proposed framework targets two key operational objectives: first, to minimize voltage deviation, which is a concern for a distribution network service provider (DNSP), and second, to maximize the utilization of DERs on the demand side. Recognizing the conflicting nature of these objectives, we utilize C-BESS to enhance the system's adaptability to dynamically adjust DN operations. The multi-objective optimization problem is solved using the non-dominated sorting genetic algorithm-II (NSGA-II). Case studies using real-world data are conducted to validate the effectiveness of the proposed framework. The results show significant improvements in voltage regulation and DER utilization, demonstrating the potential of C-BESS in enabling more reliable DN operation. Our findings contribute to the ongoing discourse on the role of C-BESS in DN operation enhancement and DER integration.
Resource Scheduling for UAVs-aided D2D Networks: A Multi-objective Optimization Approach
Authors: Authors: Hongyang Pan, Yanheng Liu, Geng Sun, Pengfei Wang, Chau Yuen
Subjects: Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2311.16116
Pdf link: https://arxiv.org/pdf/2311.16116
Abstract Unmanned aerial vehicles (UAVs)-aided device-todevice (D2D) networks have attracted great interests with the development of 5G/6G communications, while there are several challenges about resource scheduling in UAVs-aided D2D networks. In this work, we formulate a UAVs-aided D2D network resource scheduling optimization problem (NetResSOP) to comprehensively consider the number of deployed UAVs, UAV positions, UAV transmission powers, UAV flight velocities, communication channels, and UAV-device pair assignment so as to maximize the D2D network capacity, minimize the number of deployed UAVs, and minimize the average energy consumption over all UAVs simultaneously. The formulated NetResSOP is a mixed-integer programming problem (MIPP) and an NP-hard problem, which means that it is difficult to be solved in polynomial time. Moreover, there are trade-offs between the optimization objectives, and hence it is also difficult to find an optimal solution that can simultaneously make all objectives be optimal. Thus, we propose a non-dominated sorting genetic algorithm-III with a Flexible solution dimension mechanism, a Discrete part generation mechanism, and a UAV number adjustment mechanism (NSGA-III-FDU) for solving the problem comprehensively. Simulation results demonstrate the effectiveness and the stability of the proposed NSGA-III-FDU under different scales and settings of the D2D networks.
SeamlessNeRF: Stitching Part NeRFs with Gradient Propagation
Authors: Authors: Bingchen Gong, Yuehao Wang, Xiaoguang Han, Qi Dou
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2311.16127
Pdf link: https://arxiv.org/pdf/2311.16127
Abstract Neural Radiance Fields (NeRFs) have emerged as promising digital mediums of 3D objects and scenes, sparking a surge in research to extend the editing capabilities in this domain. The task of seamless editing and merging of multiple NeRFs, resembling the ``Poisson blending'' in 2D image editing, remains a critical operation that is under-explored by existing work. To fill this gap, we propose SeamlessNeRF, a novel approach for seamless appearance blending of multiple NeRFs. In specific, we aim to optimize the appearance of a target radiance field in order to harmonize its merge with a source field. We propose a well-tailored optimization procedure for blending, which is constrained by 1) pinning the radiance color in the intersecting boundary area between the source and target fields and 2) maintaining the original gradient of the target. Extensive experiments validate that our approach can effectively propagate the source appearance from the boundary area to the entire target field through the gradients. To the best of our knowledge, SeamlessNeRF is the first work that introduces gradient-guided appearance editing to radiance fields, offering solutions for seamless stitching of 3D objects represented in NeRFs.
Physics-Inspired Discrete-Phase Optimization for 3D Beamforming with PIN-Diode Extra-Large Antenna Arrays
Authors: Authors: Minsung Kim, Annalise Stockley, Keith Briggs, Kyle Jamieson
Subjects: Information Theory (cs.IT); Emerging Technologies (cs.ET)
Arxiv link: https://arxiv.org/abs/2311.16128
Pdf link: https://arxiv.org/pdf/2311.16128
Abstract Large antenna arrays can steer narrow beams towards a target area, and thus improve the communications capacity of wireless channels and the fidelity of radio sensing. Hardware that is capable of continuously-variable phase shifts is expensive, presenting scaling challenges. PIN diodes that apply only discrete phase shifts are promising and cost-effective; however, unlike continuous phase shifters, finding the best phase configuration across elements is an NP-hard optimization problem. Thus, the complexity of optimization becomes a new bottleneck for large-antenna arrays. To address this challenge, this paper suggests a procedure for converting the optimization objective function from a ratio of quadratic functions to a sequence of more easily solvable quadratic unconstrained binary optimization (QUBO) sub-problems. This conversion is an exact equivalence, and the resulting QUBO forms are standard input formats for various physics-inspired optimization methods. We demonstrate that a simulated annealing approach is very effective for solving these sub-problems, and we give performance metrics for several large array types optimized by this technique. Through numerical experiments, we report 3D beamforming performance for extra-large arrays with up to 10,000 elements.
Emulators in JINSP
Authors: Authors: Lei Zhao, Miaomiao Zhang, Lv Zhe
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2311.16146
Pdf link: https://arxiv.org/pdf/2311.16146
Abstract JINSP(Jiutian Intelligence Network Simulation Platform) describes a series of basic emulators and their combinations, such as the simulation of the protocol stack for dynamic users in a real environment, which is composed of user behavior simulation, base station simulation, and terminal simulation. It is applied in specific business scenarios, such as multi-target antenna optimization, compression feedback, and so on. This paper provides detailed descriptions of each emulator and its combination based on this foundation, including the implementation process of the emulator, integration with the platform, experimental results, and other aspects.
Load balancing in cloud data centers with optimized virtual machines placement
Authors: Authors: Hamid Reza naji, Reza Esmaeili
Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2311.16147
Pdf link: https://arxiv.org/pdf/2311.16147
Abstract So far, various solutions have been proposed for symmetric distribution of load cloud computing environments. In this article, a new solution to the optimal allocation of virtual machines in the cloud data centers is presented to provide a good load balancing among servers. The proposed method offers a solution uses learning automata as a reinforcement learning model to improve the performance of the optimization algorithm for optimal placement of virtual machines. Also, it helps the search algorithm to converge more quickly to the global optimum. The simulation results show the proposed method has been able to perform good level of load balancing in cloud data centers.
Variational Exploration Module VEM: A Cloud-Native Optimization and Validation Tool for Geospatial Modeling and AI Workflows
Authors: Authors: Julian Kuehnert (1), Hiwot Tadesse (1), Chris Dearden (2), Rosie Lickorish (3), Paolo Fraccaro (3), Anne Jones (3), Blair Edwards (3), Sekou L. Remy (1), Peter Melling (4), Tim Culmer (4) ((1) IBM Research, Nairobi, Kenya, (2) STFC Hartree Centre, Warrington, UK, (3) IBM Research, Daresbury, UK, (4) Riskaware Ltd., Bristol, UK)
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2311.16196
Pdf link: https://arxiv.org/pdf/2311.16196
Abstract Geospatial observations combined with computational models have become key to understanding the physical systems of our environment and enable the design of best practices to reduce societal harm. Cloud-based deployments help to scale up these modeling and AI workflows. Yet, for practitioners to make robust conclusions, model tuning and testing is crucial, a resource intensive process which involves the variation of model input variables. We have developed the Variational Exploration Module which facilitates the optimization and validation of modeling workflows deployed in the cloud by orchestrating workflow executions and using Bayesian and machine learning-based methods to analyze model behavior. User configurations allow the combination of diverse sampling strategies in multi-agent environments. The flexibility and robustness of the model-agnostic module is demonstrated using real-world applications.
Optimal Observer Design Using Reinforcement Learning and Quadratic Neural Networks
Authors: Authors: Soroush Asri, Luis Rodrigues
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2311.16272
Pdf link: https://arxiv.org/pdf/2311.16272
Abstract This paper introduces an innovative approach based on policy iteration (PI), a reinforcement learning (RL) algorithm, to obtain an optimal observer with a quadratic cost function. This observer is designed for systems with a given linearized model and a stabilizing Luenberger observer gain. We utilize two-layer quadratic neural networks (QNN) for policy evaluation and derive a linear correction term using the input and output data. This correction term effectively rectifies inaccuracies introduced by the linearized model employed within the observer design. A unique feature of the proposed methodology is that the QNN is trained through convex optimization. The main advantage is that the QNN's input-output mapping has an analytical expression as a quadratic form, which can then be used to obtain a linear correction term policy. This is in stark contrast to the available techniques in the literature that must train a second neural network to obtain policy improvement. It is proven that the obtained linear correction term is optimal for linear systems, as both the value function and the QNN's input-output mapping are quadratic. The proposed method is applied to a simple pendulum, demonstrating an enhanced correction term policy compared to relying solely on the linearized model. This shows its promise for addressing nonlinear systems.
A Graph Neural Network-Based QUBO-Formulated Hamiltonian-Inspired Loss Function for Combinatorial Optimization using Reinforcement Learning
Authors: Authors: Redwan Ahmed Rizvee, Raheeb Hasan, Md. Mosaddek Khan
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2311.16277
Pdf link: https://arxiv.org/pdf/2311.16277
Abstract Quadratic Unconstrained Binary Optimization (QUBO) is a generic technique to model various NP-hard Combinatorial Optimization problems (CO) in the form of binary variables. Ising Hamiltonian is used to model the energy function of a system. QUBO to Ising Hamiltonian is regarded as a technique to solve various canonical optimization problems through quantum optimization algorithms. Recently, PI-GNN, a generic framework, has been proposed to address CO problems over graphs based on Graph Neural Network (GNN) architecture. They introduced a generic QUBO-formulated Hamiltonian-inspired loss function that was directly optimized using GNN. PI-GNN is highly scalable but there lies a noticeable decrease in the number of satisfied constraints when compared to problem-specific algorithms and becomes more pronounced with increased graph densities. Here, We identify a behavioral pattern related to it and devise strategies to improve its performance. Another group of literature uses Reinforcement learning (RL) to solve the aforementioned NP-hard problems using problem-specific reward functions. In this work, we also focus on creating a bridge between the RL-based solutions and the QUBO-formulated Hamiltonian. We formulate and empirically evaluate the compatibility of the QUBO-formulated Hamiltonian as the generic reward function in the RL-based paradigm in the form of rewards. Furthermore, we also introduce a novel Monty Carlo Tree Search-based strategy with GNN where we apply a guided search through manual perturbation of node labels during training. We empirically evaluated our methods and observed up to 44% improvement in the number of constraint violations compared to the PI-GNN.
Physics-Informed Neural Network for Discovering Systems with Unmeasurable States with Application to Lithium-Ion Batteries
Authors: Authors: Yuichi Kajiura, Jorge Espin, Dong Zhang
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2311.16374
Pdf link: https://arxiv.org/pdf/2311.16374
Abstract Combining machine learning with physics is a trending approach for discovering unknown dynamics, and one of the most intensively studied frameworks is the physics-informed neural network (PINN). However, PINN often fails to optimize the network due to its difficulty in concurrently minimizing multiple losses originating from the system's governing equations. This problem can be more serious when the system's states are unmeasurable, like lithium-ion batteries (LiBs). In this work, we introduce a robust method for training PINN that uses fewer loss terms and thus constructs a less complex landscape for optimization. In particular, instead of having loss terms from each differential equation, this method embeds the dynamics into a loss function that quantifies the error between observed and predicted system outputs. This is accomplished by numerically integrating the predicted states from the neural network(NN) using known dynamics and transforming them to obtain a sequence of predicted outputs. Minimizing such a loss optimizes the NN to predict states consistent with observations given the physics. Further, the system's parameters can be added to the optimization targets. To demonstrate the ability of this method to perform various modeling and control tasks, we apply it to a battery model to concurrently estimate its states and parameters.
On the Effect of Defections in Federated Learning and How to Prevent Them
Authors: Authors: Minbiao Han, Kumar Kshitij Patel, Han Shao, Lingxiao Wang
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2311.16459
Pdf link: https://arxiv.org/pdf/2311.16459
Abstract Federated learning is a machine learning protocol that enables a large population of agents to collaborate over multiple rounds to produce a single consensus model. There are several federated learning applications where agents may choose to defect permanently$-$essentially withdrawing from the collaboration$-$if they are content with their instantaneous model in that round. This work demonstrates the detrimental impact of such defections on the final model's robustness and ability to generalize. We also show that current federated optimization algorithms fail to disincentivize these harmful defections. We introduce a novel optimization algorithm with theoretical guarantees to prevent defections while ensuring asymptotic convergence to an effective solution for all participating agents. We also provide numerical experiments to corroborate our findings and demonstrate the effectiveness of our algorithm.
GS-IR: 3D Gaussian Splatting for Inverse Rendering
Authors: Authors: Zhihao Liang, Qi Zhang, Ying Feng, Ying Shan, Kui Jia
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.16473
Pdf link: https://arxiv.org/pdf/2311.16473
Abstract We propose GS-IR, a novel inverse rendering approach based on 3D Gaussian Splatting (GS) that leverages forward mapping volume rendering to achieve photorealistic novel view synthesis and relighting results. Unlike previous works that use implicit neural representations and volume rendering (e.g. NeRF), which suffer from low expressive power and high computational complexity, we extend GS, a top-performance representation for novel view synthesis, to estimate scene geometry, surface material, and environment illumination from multi-view images captured under unknown lighting conditions. There are two main problems when introducing GS to inverse rendering: 1) GS does not support producing plausible normal natively; 2) forward mapping (e.g. rasterization and splatting) cannot trace the occlusion like backward mapping (e.g. ray tracing). To address these challenges, our GS-IR proposes an efficient optimization scheme that incorporates a depth-derivation-based regularization for normal estimation and a baking-based occlusion to model indirect lighting. The flexible and expressive GS representation allows us to achieve fast and compact geometry reconstruction, photorealistic novel view synthesis, and effective physically-based rendering. We demonstrate the superiority of our method over baseline methods through qualitative and quantitative evaluations on various challenging scenes.
On the Robustness of Decision-Focused Learning
Authors: Authors: Yehya Farhat
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2311.16487
Pdf link: https://arxiv.org/pdf/2311.16487
Abstract Decision-Focused Learning (DFL) is an emerging learning paradigm that tackles the task of training a machine learning (ML) model to predict missing parameters of an incomplete optimization problem, where the missing parameters are predicted. DFL trains an ML model in an end-to-end system, by integrating the prediction and optimization tasks, providing better alignment of the training and testing objectives. DFL has shown a lot of promise and holds the capacity to revolutionize decision-making in many real-world applications. However, very little is known about the performance of these models under adversarial attacks. We adopt ten unique DFL methods and benchmark their performance under two distinctly focused attacks adapted towards the Predict-then-Optimize problem setting. Our study proposes the hypothesis that the robustness of a model is highly correlated with its ability to find predictions that lead to optimal decisions without deviating from the ground-truth label. Furthermore, we provide insight into how to target the models that violate this condition and show how these models respond differently depending on the achieved optimality at the end of their training cycles.
RandMSAugment: A Mixed-Sample Augmentation for Limited-Data Scenarios
Authors: Authors: Swarna Kamlam Ravindran, Carlo Tomasi
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.16508
Pdf link: https://arxiv.org/pdf/2311.16508
Abstract The high costs of annotating large datasets suggests a need for effectively training CNNs with limited data, and data augmentation is a promising direction. We study foundational augmentation techniques, including Mixed Sample Data Augmentations (MSDAs) and a no-parameter variant of RandAugment termed Preset-RandAugment, in the fully supervised scenario. We observe that Preset-RandAugment excels in limited-data contexts while MSDAs are moderately effective. We show that low-level feature transforms play a pivotal role in this performance difference, postulate a new property of augmentations related to their data efficiency, and propose new ways to measure the diversity and realism of augmentations. Building on these insights, we introduce a novel augmentation technique called RandMSAugment that integrates complementary strengths of existing methods. RandMSAugment significantly outperforms the competition on CIFAR-100, STL-10, and Tiny-Imagenet. With very small training sets (4, 25, 100 samples/class), RandMSAugment achieves compelling performance gains between 4.1% and 6.75%. Even with more training data (500 samples/class) we improve performance by 1.03% to 2.47%. RandMSAugment does not require hyperparameter tuning, extra validation data, or cumbersome optimizations.
Communication Efficiency Optimization of Federated Learning for Computing and Network Convergence of 6G Networks
Authors: Authors: Yizhuo Cai, Bo Lei, Qianying Zhao, Jing Peng, Min Wei, Yushun Zhang, Xing Zhang
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2311.16540
Pdf link: https://arxiv.org/pdf/2311.16540
Abstract Federated learning effectively addresses issues such as data privacy by collaborating across participating devices to train global models. However, factors such as network topology and device computing power can affect its training or communication process in complex network environments. A new network architecture and paradigm with computing-measurable, perceptible, distributable, dispatchable, and manageable capabilities, computing and network convergence (CNC) of 6G networks can effectively support federated learning training and improve its communication efficiency. By guiding the participating devices' training in federated learning based on business requirements, resource load, network conditions, and arithmetic power of devices, CNC can reach this goal. In this paper, to improve the communication efficiency of federated learning in complex networks, we study the communication efficiency optimization of federated learning for computing and network convergence of 6G networks, methods that gives decisions on its training process for different network conditions and arithmetic power of participating devices in federated learning. The experiments address two architectures that exist for devices in federated learning and arrange devices to participate in training based on arithmetic power while achieving optimization of communication efficiency in the process of transferring model parameters. The results show that the method we proposed can (1) cope well with complex network situations (2) effectively balance the delay distribution of participating devices for local training (3) improve the communication efficiency during the transfer of model parameters (4) improve the resource utilization in the network.
Multi-Irreducible Spectral Synchronization for Robust Rotation Averaging
Authors: Authors: Owen Howell, Haoen Huang, David Rosen
Subjects: Computer Vision and Pattern Recognition (cs.CV); Group Theory (math.GR); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2311.16544
Pdf link: https://arxiv.org/pdf/2311.16544
Abstract Rotation averaging (RA) is a fundamental problem in robotics and computer vision. In RA, the goal is to estimate a set of $N$ unknown orientations $R{1}, ..., R{N} \in SO(3)$, given noisy measurements $R{ij} \sim R^{-1}{i} R_{j}$ of a subset of their pairwise relative rotations. This problem is both nonconvex and NP-hard, and thus difficult to solve in the general case. We apply harmonic analysis on compact groups to derive a (convex) spectral relaxation constructed from truncated Fourier decompositions of the individual summands appearing in the RA objective; we then recover an estimate of the RA solution by computing a few extremal eigenpairs of this relaxation, and (approximately) solving a consensus problem. Our approach affords several notable advantages versus prior RA methods: it can be used in conjunction with \emph{any} smooth loss function (including, but not limited to, robust M-estimators), does not require any initialization, and is implemented using only simple (and highly scalable) linear-algebraic computations and parallelizable optimizations over band-limited functions of individual rotational states. Moreover, under the (physically well-motivated) assumption of multiplicative Langevin measurement noise, we derive explicit performance guarantees for our spectral estimator (in the form of probabilistic tail bounds on the estimation error) that are parameterized in terms of graph-theoretic quantities of the underlying measurement network. By concretely linking estimator performance with properties of the underlying measurement graph, our results also indicate how to devise measurement networks that are \emph{guaranteed} to achieve accurate estimation, enabling such downstream tasks as sensor placement, network compression, and active sensing.
HandyPriors: Physically Consistent Perception of Hand-Object Interactions with Differentiable Priors
Authors: Authors: Shutong Zhang, Yi-Ling Qiao, Guanglei Zhu, Eric Heiden, Dylan Turpin, Jingzhou Liu, Ming Lin, Miles Macklin, Animesh Garg
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2311.16552
Pdf link: https://arxiv.org/pdf/2311.16552
Abstract Various heuristic objectives for modeling hand-object interaction have been proposed in past work. However, due to the lack of a cohesive framework, these objectives often possess a narrow scope of applicability and are limited by their efficiency or accuracy. In this paper, we propose HandyPriors, a unified and general pipeline for pose estimation in human-object interaction scenes by leveraging recent advances in differentiable physics and rendering. Our approach employs rendering priors to align with input images and segmentation masks along with physics priors to mitigate penetration and relative-sliding across frames. Furthermore, we present two alternatives for hand and object pose estimation. The optimization-based pose estimation achieves higher accuracy, while the filtering-based tracking, which utilizes the differentiable priors as dynamics and observation models, executes faster. We demonstrate that HandyPriors attains comparable or superior results in the pose estimation task, and that the differentiable physics module can predict contact information for pose refinement. We also show that our approach generalizes to perception tasks, including robotic hand manipulation and human-object pose estimation in the wild.
MobileDiffusion: Subsecond Text-to-Image Generation on Mobile Devices
Authors: Authors: Yang Zhao, Yanwu Xu, Zhisheng Xiao, Tingbo Hou
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.16567
Pdf link: https://arxiv.org/pdf/2311.16567
Abstract The deployment of large-scale text-to-image diffusion models on mobile devices is impeded by their substantial model size and slow inference speed. In this paper, we propose \textbf{MobileDiffusion}, a highly efficient text-to-image diffusion model obtained through extensive optimizations in both architecture and sampling techniques. We conduct a comprehensive examination of model architecture design to reduce redundancy, enhance computational efficiency, and minimize model's parameter count, while preserving image generation quality. Additionally, we employ distillation and diffusion-GAN finetuning techniques on MobileDiffusion to achieve 8-step and 1-step inference respectively. Empirical studies, conducted both quantitatively and qualitatively, demonstrate the effectiveness of our proposed techniques. MobileDiffusion achieves a remarkable \textbf{sub-second} inference speed for generating a $512\times512$ image on mobile devices, establishing a new state of the art.
Active RIS Enhanced Spectrum Sensing for Cognitive Radio Networks
Authors: Authors: Jungang Ge, Ying-Chang Liang, Sumei Sun, Yonghong Zeng
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2311.16568
Pdf link: https://arxiv.org/pdf/2311.16568
Abstract In opportunistic cognitive radio networks, when the primary signal is very weak compared to the background noise, the secondary user requires long sensing time to achieve a reliable spectrum sensing performance, leading to little remaining time for the secondary transmission. To tackle this issue, we propose an active reconfigurable intelligent surface (RIS) assisted spectrum sensing system, where the received signal strength from the interested primary user can be enhanced and underlying interference within the background noise can be mitigated as well. In comparison with the passive RIS, the active RIS can not only adapt the phase shift of each reflecting element but also amplify the incident signals. Notably, we study the reflecting coefficient matrix (RCM) optimization problem to improve the detection probability given a maximum tolerable false alarm probability and limited sensing time. Then, we show that the formulated problem can be equivalently transformed to a weighted mean square error minimization problem using the principle of the well-known weighted minimum mean square error (WMMSE) algorithm, and an iterative optimization approach is proposed to obtain the optimal RCM. In addition, to fairly compare passive RIS and active RIS, we study the required power budget of the RIS to achieve a target detection probability under a special case where the direct links are neglected and the RIS-related channels are line-of-sight. Via extensive simulations, the effectiveness of the WMMSE-based RCM optimization approach is demonstrated. Furthermore, the results reveal that the active RIS can outperform the passive RIS when the underlying interference within the background noise is relatively weak, whereas the passive RIS performs better in strong interference scenarios because the same power budget can support a vast number of passive reflecting elements for interference mitigation.
Wireless Powered Metaverse: Joint Task Scheduling and Trajectory Design for Multi-Devices and Multi-UAVs
Authors: Authors: Xiaojie Wang, Jiameng Li, Zhaolong Ning, Qingyang Song, Lei Guo, Abbas Jamalipour
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2311.16576
Pdf link: https://arxiv.org/pdf/2311.16576
Abstract To support the running of human-centric metaverse applications on mobile devices, Unmanned Aerial Vehicle (UAV)-assisted Wireless Powered Mobile Edge Computing (WPMEC) is promising to compensate for limited computational capabilities and energy supplies of mobile devices. The high-speed computational processing demands and significant energy consumption of metaverse applications require joint resource scheduling of multiple devices and UAVs, but existing WPMEC solutions address either device or UAV scheduling due to the complexity of combinatorial optimization. To solve the above challenge, we propose a two-stage alternating optimization algorithm based on multi-task Deep Reinforcement Learning (DRL) to jointly allocate charging time, schedule computation tasks, and optimize trajectory of UAVs and mobile devices in a wireless powered metaverse scenario. First, considering energy constraints of both UAVs and mobile devices, we formulate an optimization problem to maximize the computation efficiency of the system. Second, we propose a heuristic algorithm to efficiently perform time allocation and charging scheduling for mobile devices. Following this, we design a multi-task DRL scheme to make charging scheduling and trajectory design decisions for UAVs. Finally, theoretical analysis and performance results demonstrate that our algorithm exhibits significant advantages over representative methods in terms of convergence speed and average computation efficiency.
Sorting Out New York City's Trash Problem
Authors: Authors: Steven DiSilvio, Anthony Ozerov, Leon Zhou
Subjects: Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2311.16585
Pdf link: https://arxiv.org/pdf/2311.16585
Abstract To reduce waste and improve public health and sanitation in New York City, innovative policies tailored to the city's unique urban landscape are necessary. The first program we propose is the Dumpster and Compost Accessibility Program. This program is affordable and utilizes dumpsters placed near fire hydrants to keep waste off the street without eliminating parking spaces. It also includes legal changes and the provision of compost bins to single/two-family households, which together will increase composting rates. The second program is the Pay-As-You-Throw Program. This requires New Yorkers living in single/two-family households to purchase stickers for each refuse bag they have collected by the city, incentivizing them to sort out compostable waste and recyclables. We conduct a weighted multi-objective optimization to determine the optimal sticker price based on the City's priorities. Roughly in proportion to the price, this program will increase diversion rates and decrease the net costs to New York City's Department of Sanitation. In conjunction, these two programs will improve NYC's diversion rates, eliminate garbage bags from the streets, and potentially save New York City money.
Monitor Placement for Fault Localization in Deep Neural Network Accelerators
Authors: Authors: Wei-Kai Liu, Benjamin Tan, Krishnendu Chakrabarty
Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2311.16594
Pdf link: https://arxiv.org/pdf/2311.16594
Abstract Systolic arrays are a prominent choice for deep neural network (DNN) accelerators because they offer parallelism and efficient data reuse. Improving the reliability of DNN accelerators is crucial as hardware faults can degrade the accuracy of DNN inferencing. Systolic arrays make use of a large number of processing elements (PEs) for parallel processing, but when one PE is faulty, the error propagates and affects the outcomes of downstream PEs. Due to the large number of PEs, the cost associated with implementing hardware-based runtime monitoring of every single PE is infeasible. We present a solution to optimize the placement of hardware monitors within systolic arrays. We first prove that $2N-1$ monitors are needed to localize a single faulty PE and we also derive the monitor placement. We show that a second placement optimization problem, which minimizes the set of candidate faulty PEs for a given number of monitors, is NP-hard. Therefore, we propose a heuristic approach to balance the reliability and hardware resource utilization in DNN accelerators when number of monitors is limited. Experimental evaluation shows that to localize a single faulty PE, an area overhead of only 0.33% is incurred for a $256\times 256$ systolic array.
Harnessing customized built-in elements -- Empowering Component-Based Software Engineering and Design Systems with HTML5 Web Components
Authors: Authors: Hardik Shah
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2311.16601
Pdf link: https://arxiv.org/pdf/2311.16601
Abstract Customized built-in elements in HTML5 significantly transform web development. These elements enable developers to create unique HTML components tailored with specific design and purpose. Customized built-in elements enable developers to address the unique needs of web applications more quickly, supporting consistent user interfaces and experiences across diverse digital platforms. This study investigates the role of these features in Component-Based Software Engineering (CBSE) and Design Systems, emphasizing the benefits of code modularity, reusability, and scalability in web development. Customized built-in elements enable developers to address the unique needs of web applications more quickly, supporting consistent user interfaces and experiences across diverse digital platforms. The paper also discusses the difficulties and concerns that must be addressed when creating customized built-in elements, such as browser compatibility, performance optimization, accessibility, security, styling, and interoperability. It emphasizes the importance of standardization, developer tooling, and community interaction in order to fully realize the potential of these features. Looking ahead, customized built-in elements have potential in a variety of applications, including the Internet of Things (IoT), e-commerce, and educational technologies. Their incorporation into Progressive Web Apps (PWAs) is expected to further improve web experiences. While obstacles remain, the article concludes that HTML5 customized built-in elements are a driver for web development innovation, allowing the production of efficient, adaptive, and user-centric web applications in an ever-changing digital context.
Rethinking Backdoor Attacks on Dataset Distillation: A Kernel Method Perspective
Authors: Authors: Ming-Yu Chung, Sheng-Yen Chou, Chia-Mu Yu, Pin-Yu Chen, Sy-Yen Kuo, Tsung-Yi Ho
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2311.16646
Pdf link: https://arxiv.org/pdf/2311.16646
Abstract Dataset distillation offers a potential means to enhance data efficiency in deep learning. Recent studies have shown its ability to counteract backdoor risks present in original training samples. In this study, we delve into the theoretical aspects of backdoor attacks and dataset distillation based on kernel methods. We introduce two new theory-driven trigger pattern generation methods specialized for dataset distillation. Following a comprehensive set of analyses and experiments, we show that our optimization-based trigger design framework informs effective backdoor attacks on dataset distillation. Notably, datasets poisoned by our designed trigger prove resilient against conventional backdoor attack detection and mitigation methods. Our empirical results validate that the triggers developed using our approaches are proficient at executing resilient backdoor attacks.
An extended discontinuous Galerkin shock tracking method
Authors: Authors: Jakob Vandergrift, Florian Kummer
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2311.16699
Pdf link: https://arxiv.org/pdf/2311.16699
Abstract In this paper, we introduce a novel high-order shock tracking method and provide a proof of concept. Our method leverages concepts from implicit shock tracking and extended discontinuous Galerkin methods, primarily designed for solving partial differential equations featuring discontinuities. To address this challenge, we solve a constrained optimization problem aiming at accurately fitting the zero iso-contour of a level set function to the discontinuities. Additionally, we discuss various robustness measures inspired by both numerical experiments and existing literature. Finally, we showcase the capabilities of our method through a series of two-dimensional problems, progressively increasing in complexity.
Sinkhorn Flow: A Continuous-Time Framework for Understanding and Generalizing the Sinkhorn Algorithm
Authors: Authors: Mohammad Reza Karimi, Ya-Ping Hsieh, Andreas Krause
Subjects: Machine Learning (cs.LG); Probability (math.PR); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2311.16706
Pdf link: https://arxiv.org/pdf/2311.16706
Abstract Many problems in machine learning can be formulated as solving entropy-regularized optimal transport on the space of probability measures. The canonical approach involves the Sinkhorn iterates, renowned for their rich mathematical properties. Recently, the Sinkhorn algorithm has been recast within the mirror descent framework, thus benefiting from classical optimization theory insights. Here, we build upon this result by introducing a continuous-time analogue of the Sinkhorn algorithm. This perspective allows us to derive novel variants of Sinkhorn schemes that are robust to noise and bias. Moreover, our continuous-time dynamics not only generalize but also offer a unified perspective on several recently discovered dynamics in machine learning and mathematics, such as the "Wasserstein mirror flow" of (Deb et al. 2023) or the "mean-field Schr\"odinger equation" of (Claisse et al. 2023).
LEDITS++: Limitless Image Editing using Text-to-Image Models
Authors: Authors: Manuel Brack, Felix Friedrich, Katharina Kornmeier, Linoy Tsaban, Patrick Schramowski, Kristian Kersting, Apolinário Passos
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.16711
Pdf link: https://arxiv.org/pdf/2311.16711
Abstract Text-to-image diffusion models have recently received increasing interest for their astonishing ability to produce high-fidelity images from solely text inputs. Subsequent research efforts aim to exploit and apply their capabilities to real image editing. However, existing image-to-image methods are often inefficient, imprecise, and of limited versatility. They either require time-consuming fine-tuning, deviate unnecessarily strongly from the input image, and/or lack support for multiple, simultaneous edits. To address these issues, we introduce LEDITS++, an efficient yet versatile and precise textual image manipulation technique. LEDITS++'s novel inversion approach requires no tuning nor optimization and produces high-fidelity results with a few diffusion steps. Second, our methodology supports multiple simultaneous edits and is architecture-agnostic. Third, we use a novel implicit masking technique that limits changes to relevant image regions. We propose the novel TEdBench++ benchmark as part of our exhaustive evaluation. Our results demonstrate the capabilities of LEDITS++ and its improvements over previous methods. The project page is available at https://leditsplusplus-project.static.hf.space .
Riemannian Self-Attention Mechanism for SPD Networks
Authors: Authors: Rui Wang, Xiao-Jun Wu, Hui Li, Josef Kittler
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.16738
Pdf link: https://arxiv.org/pdf/2311.16738
Abstract Symmetric positive definite (SPD) matrix has been demonstrated to be an effective feature descriptor in many scientific areas, as it can encode spatiotemporal statistics of the data adequately on a curved Riemannian manifold, i.e., SPD manifold. Although there are many different ways to design network architectures for SPD matrix nonlinear learning, very few solutions explicitly mine the geometrical dependencies of features at different layers. Motivated by the great success of self-attention mechanism in capturing long-range relationships, an SPD manifold self-attention mechanism (SMSA) is proposed in this paper using some manifold-valued geometric operations, mainly the Riemannian metric, Riemannian mean, and Riemannian optimization. Then, an SMSA-based geometric learning module (SMSA-GLM) is designed for the sake of improving the discrimination of the generated deep structured representations. Extensive experimental results achieved on three benchmarking datasets show that our modification against the baseline network further alleviates the information degradation problem and leads to improved accuracy.
Asynchronous Wireless Federated Learning with Probabilistic Client Selection
Authors: Authors: Jiarong Yang, Yuan Liu, Fangjiong Chen, Wen Chen, Changle Li
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2311.16741
Pdf link: https://arxiv.org/pdf/2311.16741
Abstract Federated learning (FL) is a promising distributed learning framework where distributed clients collaboratively train a machine learning model coordinated by a server. To tackle the stragglers issue in asynchronous FL, we consider that each client keeps local updates and probabilistically transmits the local model to the server at arbitrary times. We first derive the (approximate) expression for the convergence rate based on the probabilistic client selection. Then, an optimization problem is formulated to trade off the convergence rate of asynchronous FL and mobile energy consumption by joint probabilistic client selection and bandwidth allocation. We develop an iterative algorithm to solve the non-convex problem globally optimally. Experiments demonstrate the superiority of the proposed approach compared with the traditional schemes.
Fair Interventions in Weighted Congestion Games
Authors: Authors: Miriam Fischer, Martin Gairing, Dario Paccagnan
Subjects: Computer Science and Game Theory (cs.GT); Computational Complexity (cs.CC); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2311.16760
Pdf link: https://arxiv.org/pdf/2311.16760
Abstract In this work we study the power and limitations of fair interventions in weighted congestion games. Specifically, we focus on interventions that aim at improving the equilibrium quality (price of anarchy) and are fair in the sense that identical players receive identical treatment. Within this setting, we provide three key contributions: First, we show that no fair intervention can reduce the price of anarchy below a given factor depending solely on the class of latencies considered. Interestingly, this lower bound is unconditional, i.e., it applies regardless of how much computation interventions are allowed to use. Second, we propose a taxation mechanism that is fair and show that the resulting price of anarchy matches this lower bound, while the mechanism can be efficiently computed in polynomial time. Third, we complement these results by showing that no intervention (fair or not) can achieve a better approximation if polynomial computability is required. We do so by proving that the minimum social cost is NP-hard to approximate below a factor identical to the one previously introduced. In doing so, we also show that the randomized algorithm proposed by Makarychev and Sviridenko (Journal of the ACM, 2018) for the class of optimization problems with a "diseconomy of scale" is optimal, and provide a novel way to derandomize its solution via equilibrium computation.
Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization
Authors: Authors: Zhiyuan Zhao, Bin Wang, Linke Ouyang, Xiaoyi Dong, Jiaqi Wang, Conghui He
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2311.16839
Pdf link: https://arxiv.org/pdf/2311.16839
Abstract Multimodal large language models have made significant advancements in recent years, yet they still suffer from a common issue known as the "hallucination problem" where the models generate textual descriptions that contain inaccurate or non-existent content from the image. To address this issue, this paper introduces a novel strategy: Hallucination-Aware Direct Preference Optimization (HA-DPO). Our approach treats the hallucination problem as a unique preference selection issue, where the model is trained to favor the non-hallucinating response when presented with two responses of the same image (one accurate and one hallucinating). This paper also presents an efficient process for constructing hallucination sample pairs to ensure high-quality, style-consistent pairs for stable HA-DPO training. We applied this strategy to two mainstream multimodal models, and the results showed a significant reduction in the hallucination problem and an enhancement in the models' generalization capabilities. With HA-DPO, the MiniGPT-4 model demonstrates significant advancements: POPE accuracy increases from 51.13% to 85.66% (34.5% absolute improvement), and the MME score escalates from 968.58 to 1365.76 (41% relative improvement). The code, models, and datasets will be made publicly available.
Digital Twin-Enhanced Deep Reinforcement Learning for Resource Management in Networks Slicing
Authors: Authors: Zhengming Zhang, Yongming Huang, Cheng Zhang, Qingbi Zheng, Luxi Yang, Xiaohu You
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.16876
Pdf link: https://arxiv.org/pdf/2311.16876
Abstract Network slicing-based communication systems can dynamically and efficiently allocate resources for diversified services. However, due to the limitation of the network interface on channel access and the complexity of the resource allocation, it is challenging to achieve an acceptable solution in the practical system without precise prior knowledge of the dynamics probability model of the service requests. Existing work attempts to solve this problem using deep reinforcement learning (DRL), however, such methods usually require a lot of interaction with the real environment in order to achieve good results. In this paper, a framework consisting of a digital twin and reinforcement learning agents is present to handle the issue. Specifically, we propose to use the historical data and the neural networks to build a digital twin model to simulate the state variation law of the real environment. Then, we use the data generated by the network slicing environment to calibrate the digital twin so that it is in sync with the real environment. Finally, DRL for slice optimization optimizes its own performance in this virtual pre-verification environment. We conducted an exhaustive verification of the proposed digital twin framework to confirm its scalability. Specifically, we propose to use loss landscapes to visualize the generalization of DRL solutions. We explore a distillation-based optimization scheme for lightweight slicing strategies. In addition, we also extend the framework to offline reinforcement learning, where solutions can be used to obtain intelligent decisions based solely on historical data. Numerical simulation experiments show that the proposed digital twin can significantly improve the performance of the slice optimization strategy.
Optimization Theory Based Deep Reinforcement Learning for Resource Allocation in Ultra-Reliable Wireless Networked Control Systems
Authors: Authors: Hamida Qumber Ali, Amirhassan Babazadeh Darabi, Sinem Coleri
Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2311.16895
Pdf link: https://arxiv.org/pdf/2311.16895
Abstract The design of Wireless Networked Control System (WNCS) requires addressing critical interactions between control and communication systems with minimal complexity and communication overhead while providing ultra-high reliability. This paper introduces a novel optimization theory based deep reinforcement learning (DRL) framework for the joint design of controller and communication systems. The objective of minimum power consumption is targeted while satisfying the schedulability and rate constraints of the communication system in the finite blocklength regime and stability constraint of the control system. Decision variables include the sampling period in the control system, and blocklength and packet error probability in the communication system. The proposed framework contains two stages: optimization theory and DRL. In the optimization theory stage, following the formulation of the joint optimization problem, optimality conditions are derived to find the mathematical relations between the optimal values of the decision variables. These relations allow the decomposition of the problem into multiple building blocks. In the DRL stage, the blocks that are simplified but not tractable are replaced by DRL. Via extensive simulations, the proposed optimization theory based DRL approach is demonstrated to outperform the optimization theory and pure DRL based approaches, with close to optimal performance and much lower complexity.
Lane-Keeping Control of Autonomous Vehicles Through a Soft-Constrained Iterative LQR
Authors: Authors: Der-Hau Lee
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2311.16900
Pdf link: https://arxiv.org/pdf/2311.16900
Abstract The accurate prediction of smooth steering inputs is crucial for autonomous vehicle applications because control actions with jitter might cause the vehicle system to become unstable. To address this problem in automobile lane-keeping control without the use of additional smoothing algorithms, we developed a soft-constrained iterative linear-quadratic regulator (soft-CILQR) algorithm by integrating CILQR algorithm and a model predictive control (MPC) constraint relaxation method. We incorporated slack variables into the state and control barrier functions of the soft-CILQR solver to soften the constraints in the optimization process so that stabilizing control inputs can be calculated in a relatively simple manner. Two types of automotive lane-keeping experiments were conducted with a linear system dynamics model to test the performance of the proposed soft-CILQR algorithm and to compare its performance with that of the CILQR algorithm: numerical simulations and experiments involving challenging vision-based maneuvers. In the numerical simulations, the soft-CILQR and CILQR solvers managed to drive the system toward the reference state asymptotically; however, the soft-CILQR solver obtained smooth steering input trajectories more easily than did the CILQR solver under conditions involving additive disturbances. In the experiments with visual inputs, the soft-CILQR controller outperformed the CILQR controller in terms of tracking accuracy and steering smoothness during the driving of an ego vehicle on TORCS.
Continuous optimization methods for the graph isomorphism problem
Authors: Authors: Stefan Klus, Patrick Gelß
Subjects: Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2311.16912
Pdf link: https://arxiv.org/pdf/2311.16912
Abstract The graph isomorphism problem looks deceptively simple, but although polynomial-time algorithms exist for certain types of graphs such as planar graphs and graphs with bounded degree or eigenvalue multiplicity, its complexity class is still unknown. Information about potential isomorphisms between two graphs is contained in the eigenvalues and eigenvectors of their adjacency matrices. However, symmetries of graphs often lead to repeated eigenvalues so that associated eigenvectors are determined only up to basis rotations, which complicates graph isomorphism testing. We consider orthogonal and doubly stochastic relaxations of the graph isomorphism problem, analyze the geometric properties of the resulting solution spaces, and show that their complexity increases significantly if repeated eigenvalues exist. By restricting the search space to suitable subspaces, we derive an efficient Frank-Wolfe based continuous optimization approach for detecting isomorphisms. We illustrate the efficacy of the algorithm with the aid of various highly symmetric graphs.
Which Quantum Circuit Mutants Shall Be Used? An Empirical Evaluation of Quantum Circuit Mutations
Authors: Authors: Eñaut Mendiluze Usandizaga, Tao Yue, Paolo Arcaini, Shaukat Ali
Subjects: Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2311.16913
Pdf link: https://arxiv.org/pdf/2311.16913
Abstract As a new research area, quantum software testing lacks systematic testing benchmarks to assess testing techniques' effectiveness. Recently, some open-source benchmarks and mutation analysis tools have emerged. However, there is insufficient evidence on how various quantum circuit characteristics (e.g., circuit depth, number of quantum gates), algorithms (e.g., Quantum Approximate Optimization Algorithm), and mutation characteristics (e.g., mutation operators) affect the most mutant detection in quantum circuits. Studying such relations is important to systematically design faulty benchmarks with varied attributes (e.g., the difficulty in detecting a seeded fault) to facilitate assessing the cost-effectiveness of quantum software testing techniques efficiently. To this end, we present a large-scale empirical evaluation with more than 700K faulty benchmarks (quantum circuits) generated by mutating 382 real-world quantum circuits. Based on the results, we provide valuable insights for researchers to define systematic quantum mutation analysis techniques. We also provide a tool to recommend mutants to users based on chosen characteristics (e.g., a quantum algorithm type) and the required difficulty of killing mutants. Finally, we also provide faulty benchmarks that can already be used to assess the cost-effectiveness of quantum software testing techniques.
RichDreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in Text-to-3D
Authors: Authors: Lingteng Qiu, Guanying Chen, Xiaodong Gu, Qi Zuo, Mutian Xu, Yushuang Wu, Weihao Yuan, Zilong Dong, Liefeng Bo, Xiaoguang Han
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2311.16918
Pdf link: https://arxiv.org/pdf/2311.16918
Abstract Lifting 2D diffusion for 3D generation is a challenging problem due to the lack of geometric prior and the complex entanglement of materials and lighting in natural images. Existing methods have shown promise by first creating the geometry through score-distillation sampling (SDS) applied to rendered surface normals, followed by appearance modeling. However, relying on a 2D RGB diffusion model to optimize surface normals is suboptimal due to the distribution discrepancy between natural images and normals maps, leading to instability in optimization. In this paper, recognizing that the normal and depth information effectively describe scene geometry and be automatically estimated from images, we propose to learn a generalizable Normal-Depth diffusion model for 3D generation. We achieve this by training on the large-scale LAION dataset together with the generalizable image-to-depth and normal prior models. In an attempt to alleviate the mixed illumination effects in the generated materials, we introduce an albedo diffusion model to impose data-driven constraints on the albedo component. Our experiments show that when integrated into existing text-to-3D pipelines, our models significantly enhance the detail richness, achieving state-of-the-art results. Our project page is https://lingtengqiu.github.io/RichDreamer/.
LLaFS: When Large-Language Models Meet Few-Shot Segmentation
Authors: Authors: Lanyun Zhu, Tianrun Chen, Deyi Ji, Jieping Ye, Jun Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.16926
Pdf link: https://arxiv.org/pdf/2311.16926
Abstract This paper proposes LLaFS, the first attempt to leverage large language models (LLMs) in few-shot segmentation. In contrast to the conventional few-shot segmentation methods that only rely on the limited and biased information from the annotated support images, LLaFS leverages the vast prior knowledge gained by LLM as an effective supplement and directly uses the LLM to segment images in a few-shot manner. To enable the text-based LLM to handle image-related tasks, we carefully design an input instruction that allows the LLM to produce segmentation results represented as polygons, and propose a region-attribute table to simulate the human visual mechanism and provide multi-modal guidance. We also synthesize pseudo samples and use curriculum learning for pretraining to augment data and achieve better optimization. LLaFS achieves state-of-the-art results on multiple datasets, showing the potential of using LLMs for few-shot computer vision tasks. Code will be available at https://github.com/lanyunzhu99/LLaFS.
From Simulations to Reality: Enhancing Multi-Robot Exploration for Urban Search and Rescue
Authors: Authors: Gautam Siddharth Kashyap, Deepkashi Mahajan, Orchid Chetia Phukan, Ankit Kumar, Alexander E.I. Brownlee, Jiechao Gao
Subjects: Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2311.16958
Pdf link: https://arxiv.org/pdf/2311.16958
Abstract In this study, we present a novel hybrid algorithm, combining Levy Flight (LF) and Particle Swarm Optimization (PSO) (LF-PSO), tailored for efficient multi-robot exploration in unknown environments with limited communication and no global positioning information. The research addresses the growing interest in employing multiple autonomous robots for exploration tasks, particularly in scenarios such as Urban Search and Rescue (USAR) operations. Multiple robots offer advantages like increased task coverage, robustness, flexibility, and scalability. However, existing approaches often make assumptions such as search area, robot positioning, communication restrictions, and target information that may not hold in real-world situations. The hybrid algorithm leverages LF, known for its effectiveness in large space exploration with sparse targets, and incorporates inter-robot repulsion as a social component through PSO. This combination enhances area exploration efficiency. We redefine the local best and global best positions to suit scenarios without continuous target information. Experimental simulations in a controlled environment demonstrate the algorithm's effectiveness, showcasing improved area coverage compared to traditional methods. In the process of refining our approach and testing it in complex, obstacle-rich environments, the presented work holds promise for enhancing multi-robot exploration in scenarios with limited information and communication capabilities.
Principal-Agent Problem with Third Party: Information Design from Social Planner's Perspective
Authors: Authors: Shiyun Lin, Zhihua Zhang
Subjects: Computer Science and Game Theory (cs.GT)
Arxiv link: https://arxiv.org/abs/2311.16959
Pdf link: https://arxiv.org/pdf/2311.16959
Abstract We study the principal-agent problem with a third party that we call social planner, whose responsibility is to reconcile the conflicts of interest between the two players and induce socially optimal outcome in terms of some given social utility function. The social planner owns no contractual power but manages to control the information flow between the principal and the agent. We design a simple workflow with two stages for the social planner. In the first stage, the problem is reformulated as an optimization problem whose solution is the optimal utility profile. In the second stage, we investigate information design and show that binary-signal information structure suffices to induce the socially optimal outcome determined in the first stage. The result shows that information plays a key role in social planning in the principal-agent model.
HumanRef: Single Image to 3D Human Generation via Reference-Guided Diffusion
Authors: Authors: Jingbo Zhang, Xiaoyu Li, Qi Zhang, Yanpei Cao, Ying Shan, Jing Liao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.16961
Pdf link: https://arxiv.org/pdf/2311.16961
Abstract Generating a 3D human model from a single reference image is challenging because it requires inferring textures and geometries in invisible views while maintaining consistency with the reference image. Previous methods utilizing 3D generative models are limited by the availability of 3D training data. Optimization-based methods that lift text-to-image diffusion models to 3D generation often fail to preserve the texture details of the reference image, resulting in inconsistent appearances in different views. In this paper, we propose HumanRef, a 3D human generation framework from a single-view input. To ensure the generated 3D model is photorealistic and consistent with the input image, HumanRef introduces a novel method called reference-guided score distillation sampling (Ref-SDS), which effectively incorporates image guidance into the generation process. Furthermore, we introduce region-aware attention to Ref-SDS, ensuring accurate correspondence between different body regions. Experimental results demonstrate that HumanRef outperforms state-of-the-art methods in generating 3D clothed humans with fine geometry, photorealistic textures, and view-consistent appearances.
DiffuseBot: Breeding Soft Robots With Physics-Augmented Generative Diffusion Models
Authors: Authors: Tsun-Hsuan Wang, Juntian Zheng, Pingchuan Ma, Yilun Du, Byungchul Kim, Andrew Spielberg, Joshua Tenenbaum, Chuang Gan, Daniela Rus
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.17053
Pdf link: https://arxiv.org/pdf/2311.17053
Abstract Nature evolves creatures with a high complexity of morphological and behavioral intelligence, meanwhile computational methods lag in approaching that diversity and efficacy. Co-optimization of artificial creatures' morphology and control in silico shows promise for applications in physical soft robotics and virtual character creation; such approaches, however, require developing new learning algorithms that can reason about function atop pure structure. In this paper, we present DiffuseBot, a physics-augmented diffusion model that generates soft robot morphologies capable of excelling in a wide spectrum of tasks. DiffuseBot bridges the gap between virtually generated content and physical utility by (i) augmenting the diffusion process with a physical dynamical simulation which provides a certificate of performance, and (ii) introducing a co-design procedure that jointly optimizes physical design and control by leveraging information about physical sensitivities from differentiable simulation. We showcase a range of simulated and fabricated robots along with their capabilities. Check our website at https://diffusebot.github.io/
Keyword: adam

Parameterized Inapproximability Hypothesis under ETH
Authors: Authors: Venkatesan Guruswami, Bingkai Lin, Xuandi Ren, Yican Sun, Kewen Wu
Subjects: Computational Complexity (cs.CC)
Arxiv link: https://arxiv.org/abs/2311.16587
Pdf link: https://arxiv.org/pdf/2311.16587
Abstract The Parameterized Inapproximability Hypothesis (PIH) asserts that no fixed parameter tractable (FPT) algorithm can distinguish a satisfiable CSP instance, parameterized by the number of variables, from one where every assignment fails to satisfy an $\varepsilon$ fraction of constraints for some absolute constant $\varepsilon > 0$. PIH plays the role of the PCP theorem in parameterized complexity. However, PIH has only been established under Gap-ETH, a very strong assumption with an inherent gap. In this work, we prove PIH under the Exponential Time Hypothesis (ETH). This is the first proof of PIH from a gap-free assumption. Our proof is self-contained and elementary. We identify an ETH-hard CSP whose variables take vector values, and constraints are either linear or of a special parallel structure. Both kinds of constraints can be checked with constant soundness via a "parallel PCP of proximity" based on the Walsh-Hadamard code.
Keyword: gradient

Exploring Multiple Neighborhood Neural Cellular Automata (MNNCA) for Enhanced Texture Learning
Authors: Authors: Magnus Petersen
Subjects: Neural and Evolutionary Computing (cs.NE); Cellular Automata and Lattice Gases (nlin.CG)
Arxiv link: https://arxiv.org/abs/2311.16123
Pdf link: https://arxiv.org/pdf/2311.16123
Abstract Cellular Automata (CA) have long been foundational in simulating dynamical systems computationally. With recent innovations, this model class has been brought into the realm of deep learning by parameterizing the CA's update rule using an artificial neural network, termed Neural Cellular Automata (NCA). This allows NCAs to be trained via gradient descent, enabling them to evolve into specific shapes, generate textures, and mimic behaviors such as swarming. However, a limitation of traditional NCAs is their inability to exhibit sufficiently complex behaviors, restricting their potential in creative and modeling tasks. Our research explores enhancing the NCA framework by incorporating multiple neighborhoods and introducing structured noise for seed states. This approach is inspired by techniques that have historically amplified the expressiveness of classical continuous CA. All code and example videos are publicly available on https://github.com/MagnusPetersen/MNNCA.
DiffAttack: Evasion Attacks Against Diffusion-Based Adversarial Purification
Authors: Authors: Mintong Kang, Dawn Song, Bo Li
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2311.16124
Pdf link: https://arxiv.org/pdf/2311.16124
Abstract Diffusion-based purification defenses leverage diffusion models to remove crafted perturbations of adversarial examples and achieve state-of-the-art robustness. Recent studies show that even advanced attacks cannot break such defenses effectively, since the purification process induces an extremely deep computational graph which poses the potential problem of gradient obfuscation, high memory cost, and unbounded randomness. In this paper, we propose a unified framework DiffAttack to perform effective and efficient attacks against diffusion-based purification defenses, including both DDPM and score-based approaches. In particular, we propose a deviated-reconstruction loss at intermediate diffusion steps to induce inaccurate density gradient estimation to tackle the problem of vanishing/exploding gradients. We also provide a segment-wise forwarding-backwarding algorithm, which leads to memory-efficient gradient backpropagation. We validate the attack effectiveness of DiffAttack compared with existing adaptive attacks on CIFAR-10 and ImageNet. We show that DiffAttack decreases the robust accuracy of models compared with SOTA attacks by over 20% on CIFAR-10 under $\ell\infty$ attack $(\epsilon=8/255)$, and over 10% on ImageNet under $\ell\infty$ attack $(\epsilon=4/255)$. We conduct a series of ablations studies, and we find 1) DiffAttack with the deviated-reconstruction loss added over uniformly sampled time steps is more effective than that added over only initial/final steps, and 2) diffusion-based purification with a moderate diffusion length is more robust under DiffAttack.
SeamlessNeRF: Stitching Part NeRFs with Gradient Propagation
Authors: Authors: Bingchen Gong, Yuehao Wang, Xiaoguang Han, Qi Dou
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2311.16127
Pdf link: https://arxiv.org/pdf/2311.16127
Abstract Neural Radiance Fields (NeRFs) have emerged as promising digital mediums of 3D objects and scenes, sparking a surge in research to extend the editing capabilities in this domain. The task of seamless editing and merging of multiple NeRFs, resembling the ``Poisson blending'' in 2D image editing, remains a critical operation that is under-explored by existing work. To fill this gap, we propose SeamlessNeRF, a novel approach for seamless appearance blending of multiple NeRFs. In specific, we aim to optimize the appearance of a target radiance field in order to harmonize its merge with a source field. We propose a well-tailored optimization procedure for blending, which is constrained by 1) pinning the radiance color in the intersecting boundary area between the source and target fields and 2) maintaining the original gradient of the target. Extensive experiments validate that our approach can effectively propagate the source appearance from the boundary area to the entire target field through the gradients. To the best of our knowledge, SeamlessNeRF is the first work that introduces gradient-guided appearance editing to radiance fields, offering solutions for seamless stitching of 3D objects represented in NeRFs.
Estimating Post-Synaptic Effects for Online Training of Feed-Forward SNNs
Authors: Authors: Thomas Summe, Clemens JS Schaefer, Siddharth Joshi
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2311.16151
Pdf link: https://arxiv.org/pdf/2311.16151
Abstract Facilitating online learning in spiking neural networks (SNNs) is a key step in developing event-based models that can adapt to changing environments and learn from continuous data streams in real-time. Although forward-mode differentiation enables online learning, its computational requirements restrict scalability. This is typically addressed through approximations that limit learning in deep models. In this study, we propose Online Training with Postsynaptic Estimates (OTPE) for training feed-forward SNNs, which approximates Real-Time Recurrent Learning (RTRL) by incorporating temporal dynamics not captured by current approximations, such as Online Training Through Time (OTTT) and Online Spatio-Temporal Learning (OSTL). We show improved scaling for multi-layer networks using a novel approximation of temporal effects on the subsequent layer's activity. This approximation incurs minimal overhead in the time and space complexity compared to similar algorithms, and the calculation of temporal effects remains local to each layer. We characterize the learning performance of our proposed algorithms on multiple SNN model configurations for rate-based and time-based encoding. OTPE exhibits the highest directional alignment to exact gradients, calculated with backpropagation through time (BPTT), in deep networks and, on time-based encoding, outperforms other approximate methods. We also observe sizeable gains in average performance over similar algorithms in offline training of Spiking Heidelberg Digits with equivalent hyper-parameters (OTTT/OSTL - 70.5%; OTPE - 75.2%; BPTT - 78.1%).
Influence Scores at Scale for Efficient Language Data Sampling
Authors: Authors: Nikhil Anand, Joshua Tan, Maria Minakova
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2311.16298
Pdf link: https://arxiv.org/pdf/2311.16298
Abstract Modern ML systems ingest data aggregated from diverse sources, such as synthetic, human-annotated, and live customer traffic. Understanding \textit{which} examples are important to the performance of a learning algorithm is crucial for efficient model training. Recently, a growing body of literature has given rise to various "influence scores," which use training artifacts such as model confidence or checkpointed gradients to identify important subsets of data. However, these methods have primarily been developed in computer vision settings, and it remains unclear how well they generalize to language-based tasks using pretrained models. In this paper, we explore the applicability of influence scores in language classification tasks. We evaluate a diverse subset of these scores on the SNLI dataset by quantifying accuracy changes in response to pruning training data through random and influence-score-based sampling. We then stress-test one of the scores -- "variance of gradients" (VoG) from Agarwal et al. (2022) -- in an NLU model stack that was exposed to dynamic user speech patterns in a voice assistant type of setting. Our experiments demonstrate that in many cases, encoder-based language models can be finetuned on roughly 50% of the original data without degradation in performance metrics. Along the way, we summarize lessons learned from applying out-of-the-box implementations of influence scores, quantify the effects of noisy and class-imbalanced data, and offer recommendations on score-based sampling for better accuracy and training efficiency.
D4AM: A General Denoising Framework for Downstream Acoustic Models
Authors: Authors: Chi-Chang Lee, Yu Tsao, Hsin-Min Wang, Chu-Song Chen
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2311.16595
Pdf link: https://arxiv.org/pdf/2311.16595
Abstract The performance of acoustic models degrades notably in noisy environments. Speech enhancement (SE) can be used as a front-end strategy to aid automatic speech recognition (ASR) systems. However, existing training objectives of SE methods are not fully effective at integrating speech-text and noisy-clean paired data for training toward unseen ASR systems. In this study, we propose a general denoising framework, D4AM, for various downstream acoustic models. Our framework fine-tunes the SE model with the backward gradient according to a specific acoustic model and the corresponding classification objective. In addition, our method aims to consider the regression objective as an auxiliary loss to make the SE model generalize to other unseen acoustic models. To jointly train an SE unit with regression and classification objectives, D4AM uses an adjustment scheme to directly estimate suitable weighting coefficients rather than undergoing a grid search process with additional training costs. The adjustment scheme consists of two parts: gradient calibration and regression objective weighting. The experimental results show that D4AM can consistently and effectively provide improvements to various unseen acoustic models and outperforms other combination setups. Specifically, when evaluated on the Google ASR API with real noisy data completely unseen during SE training, D4AM achieves a relative WER reduction of 24.65% compared with the direct feeding of noisy input. To our knowledge, this is the first work that deploys an effective combination scheme of regression (denoising) and classification (ASR) objectives to derive a general pre-processor applicable to various unseen ASR systems. Our code is available at https://github.com/ChangLee0903/D4AM.
Pseudo-Likelihood Inference
Authors: Authors: Theo Gruner, Boris Belousov, Fabio Muratore, Daniel Palenicek, Jan Peters
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2311.16656
Pdf link: https://arxiv.org/pdf/2311.16656
Abstract Simulation-Based Inference (SBI) is a common name for an emerging family of approaches that infer the model parameters when the likelihood is intractable. Existing SBI methods either approximate the likelihood, such as Approximate Bayesian Computation (ABC) or directly model the posterior, such as Sequential Neural Posterior Estimation (SNPE). While ABC is efficient on low-dimensional problems, on higher-dimensional tasks, it is generally outperformed by SNPE, which leverages function approximation. In this paper, we propose Pseudo-Likelihood Inference (PLI), a new method that brings neural approximation into ABC, making it competitive on challenging Bayesian system identification tasks. By utilizing integral probability metrics, we introduce a smooth likelihood kernel with an adaptive bandwidth that is updated based on information-theoretic trust regions. Thanks to this formulation, our method (i) allows for optimizing neural posteriors via gradient descent, (ii) does not rely on summary statistics, and (iii) enables multiple observations as input. In comparison to SNPE, it leads to improved performance when more data is available. The effectiveness of PLI is evaluated on four classical SBI benchmark tasks and on a highly dynamic physical system, showing particular advantages on stochastic simulations and multi-modal posterior landscapes.
As-Plausible-As-Possible: Plausibility-Aware Mesh Deformation Using 2D Diffusion Priors
Authors: Authors: Seungwoo Yoo, Kunho Kim, Vladimir G. Kim, Minhyuk Sung
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2311.16739
Pdf link: https://arxiv.org/pdf/2311.16739
Abstract We present As-Plausible-as-Possible (APAP) mesh deformation technique that leverages 2D diffusion priors to preserve the plausibility of a mesh under user-controlled deformation. Our framework uses per-face Jacobians to represent mesh deformations, where mesh vertex coordinates are computed via a differentiable Poisson Solve. The deformed mesh is rendered, and the resulting 2D image is used in the Score Distillation Sampling (SDS) process, which enables extracting meaningful plausibility priors from a pretrained 2D diffusion model. To better preserve the identity of the edited mesh, we fine-tune our 2D diffusion model with LoRA. Gradients extracted by SDS and a user-prescribed handle displacement are then backpropagated to the per-face Jacobians, and we use iterative gradient descent to compute the final deformation that balances between the user edit and the output plausibility. We evaluate our method with 2D and 3D meshes and demonstrate qualitative and quantitative improvements when using plausibility priors over geometry-preservation or distortion-minimization priors used by previous techniques.
Gradient-based Local Next-best-view Planning for Improved Perception of Targeted Plant Nodes
Authors: Authors: Akshay K. Burusa, Eldert J. van Henten, Gert Kootstra
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.16759
Pdf link: https://arxiv.org/pdf/2311.16759
Abstract Robots are increasingly used in tomato greenhouses to automate labour-intensive tasks such as selective harvesting and de-leafing. To perform these tasks, robots must be able to accurately and efficiently perceive the plant nodes that need to be cut, despite the high levels of occlusion from other plant parts. We formulate this problem as a local next-best-view (NBV) planning task where the robot has to plan an efficient set of camera viewpoints to overcome occlusion and improve the quality of perception. Our formulation focuses on quickly improving the perception accuracy of a single target node to maximise its chances of being cut. Previous methods of NBV planning mostly focused on global view planning and used random sampling of candidate viewpoints for exploration, which could suffer from high computational costs, ineffective view selection due to poor candidates, or non-smooth trajectories due to inefficient sampling. We propose a gradient-based NBV planner using differential ray sampling, which directly estimates the local gradient direction for viewpoint planning to overcome occlusion and improve perception. Through simulation experiments, we showed that our planner can handle occlusions and improve the 3D reconstruction and position estimation of nodes equally well as a sampling-based NBV planner, while taking ten times less computation and generating 28% more efficient trajectories.
The Sky's the Limit: Re-lightable Outdoor Scenes via a Sky-pixel Constrained Illumination Prior and Outside-In Visibility
Authors: Authors: James A. D. Gardner, Evgenii Kashin, Bernhard Egger, William A. P. Smith
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.16937
Pdf link: https://arxiv.org/pdf/2311.16937
Abstract Inverse rendering of outdoor scenes from unconstrained image collections is a challenging task, particularly illumination/albedo ambiguities and occlusion of the illumination environment (shadowing) caused by geometry. However, there are many cues in an image that can aid in the disentanglement of geometry, albedo and shadows. We exploit the fact that any sky pixel provides a direct measurement of distant lighting in the corresponding direction and, via a neural illumination prior, a statistical cue as to the remaining illumination environment. We also introduce a novel `outside-in' method for computing differentiable sky visibility based on a neural directional distance function. This is efficient and can be trained in parallel with the neural scene representation, allowing gradients from appearance loss to flow from shadows to influence estimation of illumination and geometry. Our method estimates high-quality albedo, geometry, illumination and sky visibility, achieving state-of-the-art results on the NeRF-OSR relighting benchmark. Our code and models can be found https://github.com/JADGardner/neusky
Surf-D: High-Quality Surface Generation for Arbitrary Topologies using Diffusion Models
Authors: Authors: Zhengming Yu, Zhiyang Dou, Xiaoxiao Long, Cheng Lin, Zekun Li, Yuan Liu, Norman Müller, Taku Komura, Marc Habermann, Christian Theobalt, Xin Li, Wenping Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Arxiv link: https://arxiv.org/abs/2311.17050
Pdf link: https://arxiv.org/pdf/2311.17050
Abstract In this paper, we present Surf-D, a novel method for generating high-quality 3D shapes as Surfaces with arbitrary topologies using Diffusion models. Specifically, we adopt Unsigned Distance Field (UDF) as the surface representation, as it excels in handling arbitrary topologies, enabling the generation of complex shapes. While the prior methods explored shape generation with different representations, they suffer from limited topologies and geometry details. Moreover, it's non-trivial to directly extend prior diffusion models to UDF because they lack spatial continuity due to the discrete volume structure. However, UDF requires accurate gradients for mesh extraction and learning. To tackle the issues, we first leverage a point-based auto-encoder to learn a compact latent space, which supports gradient querying for any input point through differentiation to effectively capture intricate geometry at a high resolution. Since the learning difficulty for various shapes can differ, a curriculum learning strategy is employed to efficiently embed various surfaces, enhancing the whole embedding process. With pretrained shape latent space, we employ a latent diffusion model to acquire the distribution of various shapes. Our approach demonstrates superior performance in shape generation across multiple modalities and conducts extensive experiments in unconditional generation, category conditional generation, 3D reconstruction from images, and text-to-shape tasks.
Keyword: super-resolution

CoSeR: Bridging Image and Language for Cognitive Super-Resolution
Authors: Authors: Haoze Sun, Wenbo Li, Jianzhuang Liu, Haoyu Chen, Renjing Pei, Xueyi Zou, Youliang Yan, Yujiu Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2311.16512
Pdf link: https://arxiv.org/pdf/2311.16512
Abstract Existing super-resolution (SR) models primarily focus on restoring local texture details, often neglecting the global semantic information within the scene. This oversight can lead to the omission of crucial semantic details or the introduction of inaccurate textures during the recovery process. In our work, we introduce the Cognitive Super-Resolution (CoSeR) framework, empowering SR models with the capacity to comprehend low-resolution images. We achieve this by marrying image appearance and language understanding to generate a cognitive embedding, which not only activates prior information from large text-to-image diffusion models but also facilitates the generation of high-quality reference images to optimize the SR process. To further improve image fidelity, we propose a novel condition injection scheme called "All-in-Attention", consolidating all conditional information into a single module. Consequently, our method successfully restores semantically correct and photorealistic details, demonstrating state-of-the-art performance across multiple benchmarks.
SeeSR: Towards Semantics-Aware Real-World Image Super-Resolution
Authors: Authors: Rongyuan Wu, Tao Yang, Lingchen Sun, Zhengqiang Zhang, Shuai Li, Lei Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.16518
Pdf link: https://arxiv.org/pdf/2311.16518
Abstract Owe to the powerful generative priors, the pre-trained text-to-image (T2I) diffusion models have become increasingly popular in solving the real-world image super-resolution problem. However, as a consequence of the heavy quality degradation of input low-resolution (LR) images, the destruction of local structures can lead to ambiguous image semantics. As a result, the content of reproduced high-resolution image may have semantic errors, deteriorating the super-resolution performance. To address this issue, we present a semantics-aware approach to better preserve the semantic fidelity of generative real-world image super-resolution. First, we train a degradation-aware prompt extractor, which can generate accurate soft and hard semantic prompts even under strong degradation. The hard semantic prompts refer to the image tags, aiming to enhance the local perception ability of the T2I model, while the soft semantic prompts compensate for the hard ones to provide additional representation information. These semantic prompts can encourage the T2I model to generate detailed and semantically accurate results. Furthermore, during the inference process, we integrate the LR images into the initial sampling noise to mitigate the diffusion model's tendency to generate excessive random details. The experiments show that our method can reproduce more realistic image details and hold better the semantics.
Brain-ID: Learning Robust Feature Representations for Brain Imaging
Authors: Authors: Peirong Liu, Oula Puonti, Xiaoling Hu, Daniel C. Alexander, Juan Eugenio Iglesias
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.16914
Pdf link: https://arxiv.org/pdf/2311.16914
Abstract Recent learning-based approaches have made astonishing advances in calibrated medical imaging like computerized tomography, yet they struggle to generalize in uncalibrated modalities -- notoriously magnetic resonance imaging (MRI), where performance is highly sensitive to the differences in MR contrast, resolution, and orientation between the training and testing data. This prevents broad applicability to the diverse clinical acquisition protocols in the real world. We introduce Brain-ID, a robust feature representation learning strategy for brain imaging, which is contrast-agnostic, and robust to the brain anatomy of each subject regardless of the appearance of acquired images (i.e., deformation, contrast, resolution, orientation, artifacts, etc). Brain-ID is trained entirely on synthetic data, and easily adapts to downstream tasks with our proposed simple one-layer solution. We validate the robustness of Brain-ID features, and evaluate their performance in a variety of downstream applications, including both contrast-independent (anatomy reconstruction/contrast synthesis, brain segmentation), and contrast-dependent (super-resolution, bias field estimation) tasks. Extensive experiments on 6 public datasets demonstrate that Brain-ID achieves state-of-the-art performance in all tasks, and more importantly, preserves its performance when only limited training data is available.
Super-Resolution through StyleGAN Regularized Latent Search: A Realism-Fidelity Trade-off
Authors: Authors: Marzieh Gheisari, Auguste Genovesio
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2311.16923
Pdf link: https://arxiv.org/pdf/2311.16923
Abstract This paper addresses the problem of super-resolution: constructing a highly resolved (HR) image from a low resolved (LR) one. Recent unsupervised approaches search the latent space of a StyleGAN pre-trained on HR images, for the image that best downscales to the input LR image. However, they tend to produce out-of-domain images and fail to accurately reconstruct HR images that are far from the original domain. Our contribution is twofold. Firstly, we introduce a new regularizer to constrain the search in the latent space, ensuring that the inverted code lies in the original image manifold. Secondly, we further enhanced the reconstruction through expanding the image prior around the optimal latent code. Our results show that the proposed approach recovers realistic high-quality images for large magnification factors. Furthermore, for low magnification factors, it can still reconstruct details that the generator could not have produced otherwise. Altogether, our approach achieves a good trade-off between fidelity and realism for the super-resolution task.

zoq / arxiv-updates

New submissions for Wed, 29 Nov 23 #653

Keyword: sgd

Keyword: optimization

Community Battery Energy Storage Systems for Enhancing Distribution System Operation: A Multi-objective Optimization Approach

Resource Scheduling for UAVs-aided D2D Networks: A Multi-objective Optimization Approach

SeamlessNeRF: Stitching Part NeRFs with Gradient Propagation

Physics-Inspired Discrete-Phase Optimization for 3D Beamforming with PIN-Diode Extra-Large Antenna Arrays

Emulators in JINSP

Load balancing in cloud data centers with optimized virtual machines placement

Variational Exploration Module VEM: A Cloud-Native Optimization and Validation Tool for Geospatial Modeling and AI Workflows

Optimal Observer Design Using Reinforcement Learning and Quadratic Neural Networks

A Graph Neural Network-Based QUBO-Formulated Hamiltonian-Inspired Loss Function for Combinatorial Optimization using Reinforcement Learning

Physics-Informed Neural Network for Discovering Systems with Unmeasurable States with Application to Lithium-Ion Batteries

On the Effect of Defections in Federated Learning and How to Prevent Them

GS-IR: 3D Gaussian Splatting for Inverse Rendering

On the Robustness of Decision-Focused Learning

RandMSAugment: A Mixed-Sample Augmentation for Limited-Data Scenarios

Communication Efficiency Optimization of Federated Learning for Computing and Network Convergence of 6G Networks

Multi-Irreducible Spectral Synchronization for Robust Rotation Averaging

HandyPriors: Physically Consistent Perception of Hand-Object Interactions with Differentiable Priors

MobileDiffusion: Subsecond Text-to-Image Generation on Mobile Devices

Active RIS Enhanced Spectrum Sensing for Cognitive Radio Networks

Wireless Powered Metaverse: Joint Task Scheduling and Trajectory Design for Multi-Devices and Multi-UAVs

Sorting Out New York City's Trash Problem

Monitor Placement for Fault Localization in Deep Neural Network Accelerators

Harnessing customized built-in elements -- Empowering Component-Based Software Engineering and Design Systems with HTML5 Web Components

Rethinking Backdoor Attacks on Dataset Distillation: A Kernel Method Perspective

An extended discontinuous Galerkin shock tracking method

Sinkhorn Flow: A Continuous-Time Framework for Understanding and Generalizing the Sinkhorn Algorithm

LEDITS++: Limitless Image Editing using Text-to-Image Models

Riemannian Self-Attention Mechanism for SPD Networks

Asynchronous Wireless Federated Learning with Probabilistic Client Selection

Fair Interventions in Weighted Congestion Games

Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization

Digital Twin-Enhanced Deep Reinforcement Learning for Resource Management in Networks Slicing

Optimization Theory Based Deep Reinforcement Learning for Resource Allocation in Ultra-Reliable Wireless Networked Control Systems

Lane-Keeping Control of Autonomous Vehicles Through a Soft-Constrained Iterative LQR

Continuous optimization methods for the graph isomorphism problem

Which Quantum Circuit Mutants Shall Be Used? An Empirical Evaluation of Quantum Circuit Mutations

RichDreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in Text-to-3D

LLaFS: When Large-Language Models Meet Few-Shot Segmentation

From Simulations to Reality: Enhancing Multi-Robot Exploration for Urban Search and Rescue

Principal-Agent Problem with Third Party: Information Design from Social Planner's Perspective

HumanRef: Single Image to 3D Human Generation via Reference-Guided Diffusion

DiffuseBot: Breeding Soft Robots With Physics-Augmented Generative Diffusion Models

Keyword: adam

Parameterized Inapproximability Hypothesis under ETH

Keyword: gradient

Exploring Multiple Neighborhood Neural Cellular Automata (MNNCA) for Enhanced Texture Learning

DiffAttack: Evasion Attacks Against Diffusion-Based Adversarial Purification

SeamlessNeRF: Stitching Part NeRFs with Gradient Propagation

Estimating Post-Synaptic Effects for Online Training of Feed-Forward SNNs

Influence Scores at Scale for Efficient Language Data Sampling

D4AM: A General Denoising Framework for Downstream Acoustic Models

Pseudo-Likelihood Inference

As-Plausible-As-Possible: Plausibility-Aware Mesh Deformation Using 2D Diffusion Priors

Gradient-based Local Next-best-view Planning for Improved Perception of Targeted Plant Nodes

The Sky's the Limit: Re-lightable Outdoor Scenes via a Sky-pixel Constrained Illumination Prior and Outside-In Visibility

Surf-D: High-Quality Surface Generation for Arbitrary Topologies using Diffusion Models

Keyword: super-resolution

CoSeR: Bridging Image and Language for Cognitive Super-Resolution

SeeSR: Towards Semantics-Aware Real-World Image Super-Resolution

Brain-ID: Learning Robust Feature Representations for Brain Imaging

Super-Resolution through StyleGAN Regularized Latent Search: A Realism-Fidelity Trade-off