New submissions for Wed, 17 Jan 24

Keyword: sgd

Deep Learning Based Cyberbullying Detection in Bangla Language

Authors: Sristy Shidul Nath, Razuan Karim, Mahdi H. Miraz
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2401.06787
Pdf link: https://arxiv.org/pdf/2401.06787
Abstract The Internet is currently the largest platform for global communication including expressions of opinions, reviews, contents, images, videos and so forth. Moreover, social media has now become a very broad and highly engaging platform due to its immense popularity and swift adoption trend. Increased social networking, however, also has detrimental impacts on the society leading to a range of unwanted phenomena, such as online assault, intimidation, digital bullying, criminality and trolling. Hence, cyberbullying has become a pervasive and worrying problem that poses considerable psychological and emotional harm to the people, particularly amongst the teens and the young adults. In order to lessen its negative effects and provide victims with prompt support, a great deal of research to identify cyberbullying instances at various online platforms is emerging. In comparison to other languages, Bangla (also known as Bengali) has fewer research studies in this domain. This study demonstrates a deep learning strategy for identifying cyberbullying in Bengali, using a dataset of 12282 versatile comments from multiple social media sites. In this study, a two-layer bidirectional long short-term memory (Bi-LSTM) model has been built to identify cyberbullying, using a variety of optimisers as well as 5-fold cross validation. To evaluate the functionality and efficacy of the proposed system, rigorous assessment and validation procedures have been employed throughout the project. The results of this study reveal that the proposed model's accuracy, using momentum-based stochastic gradient descent (SGD) optimiser, is 94.46%. It also reflects a higher accuracy of 95.08% and an F1 score of 95.23% using Adam optimiser as well as a better accuracy of 94.31% in 5-fold cross validation.
An ADRC-Incorporated Stochastic Gradient Descent Algorithm for Latent Factor Analysis
Authors: Jinli Li, Ye Yuan
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2401.07012
Pdf link: https://arxiv.org/pdf/2401.07012
Abstract High-dimensional and incomplete (HDI) matrix contains many complex interactions between numerous nodes. A stochastic gradient descent (SGD)-based latent factor analysis (LFA) model is remarkably effective in extracting valuable information from an HDI matrix. However, such a model commonly encounters the problem of slow convergence because a standard SGD algorithm only considers the current learning error to compute the stochastic gradient without considering the historical and future state of the learning error. To address this critical issue, this paper innovatively proposes an ADRC-incorporated SGD (ADS) algorithm by refining the instance learning error by considering the historical and future state by following the principle of an ADRC controller. With it, an ADS-based LFA model is further achieved for fast and accurate latent factor analysis on an HDI matrix. Empirical studies on two HDI datasets demonstrate that the proposed model outperforms the state-of-the-art LFA models in terms of computational efficiency and accuracy for predicting the missing data of an HDI matrix.
Stabilizing Sharpness-aware Minimization Through A Simple Renormalization Strategy
Authors: Chengli Tan, Jiangshe Zhang, Junmin Liu, Yicheng Wang, Yunda Hao
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.07250
Pdf link: https://arxiv.org/pdf/2401.07250
Abstract Recently, sharpness-aware minimization (SAM) has attracted a lot of attention because of its surprising effectiveness in improving generalization performance. However, training neural networks with SAM can be highly unstable since the loss does not decrease along the direction of the exact gradient at the current point, but instead follows the direction of a surrogate gradient evaluated at another point nearby. To address this issue, we propose a simple renormalization strategy, dubbed StableSAM, so that the norm of the surrogate gradient remains the same as that of the exact gradient. Our strategy is easy to implement and flexible enough to integrate with SAM and its variants, almost at no computational cost. With elementary tools from convex optimization and learning theory, we also conduct a theoretical analysis of sharpness-aware training, revealing that compared to stochastic gradient descent (SGD), the effectiveness of SAM is only assured in a limited regime of learning rate. In contrast, we show how StableSAM extends this regime of learning rate and when it can consistently perform better than SAM with minor modification. Finally, we demonstrate the improved performance of StableSAM on several representative data sets and tasks.
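The renormalization idea in this abstract can be illustrated with a small sketch. This is only a reading of the abstract, not the authors' implementation: it assumes the usual SAM perturbation `w + rho * g / ||g||`, and the names `renormalized_sam_step`, `quad_grad`, `rho` and `eps` are hypothetical. The surrogate gradient is rescaled so that its norm equals the norm of the exact gradient before the update is applied.

```python
import math


def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def renormalized_sam_step(w, grad_fn, lr=0.1, rho=0.05, eps=1e-12):
    """One SAM-style step whose surrogate gradient is rescaled to the exact gradient's norm.

    Assumption: standard SAM perturbation along the normalized exact gradient.
    """
    g = grad_fn(w)                      # exact gradient at the current point
    g_norm = norm(g)
    # Nearby point at distance rho along the normalized exact gradient.
    w_adv = [wi + rho * gi / (g_norm + eps) for wi, gi in zip(w, g)]
    g_s = grad_fn(w_adv)                # surrogate gradient at the nearby point
    # Renormalize: keep the surrogate direction, use the exact gradient's norm.
    scale = g_norm / (norm(g_s) + eps)
    step_dir = [scale * gi for gi in g_s]
    w_new = [wi - lr * di for wi, di in zip(w, step_dir)]
    return w_new, g, step_dir


def quad_grad(w):
    """Gradient of the toy objective f(w) = 0.5 * (w0^2 + 10 * w1^2)."""
    return [w[0], 10.0 * w[1]]


w_new, g, step_dir = renormalized_sam_step([1.0, 1.0], quad_grad)
print(norm(g), norm(step_dir))  # the two norms agree up to eps
```

Because only the scale of the surrogate gradient changes, the extra cost over plain SAM is two norm computations per step, which is in line with the abstract's remark that the strategy comes at almost no computational cost.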
Activations and Gradients Compression for Model-Parallel Training
Authors: Mikhail Rudakov, Aleksandr Beznosikov, Yaroslav Kholodov, Alexander Gasnikov
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2401.07788
Pdf link: https://arxiv.org/pdf/2401.07788
Abstract Large neural networks require enormous computational clusters of machines. Model-parallel training, when the model architecture is partitioned sequentially between workers, is a popular approach for training modern models. Information compression can be applied to decrease worker communication time, as it is often a bottleneck in such systems. This work explores how simultaneous compression of activations and gradients in model-parallel distributed training setup affects convergence. We analyze compression methods such as quantization and TopK compression, and also experiment with error compensation techniques. Moreover, we employ TopK with AQ-SGD per-batch error feedback approach. We conduct experiments on image classification and language model fine-tuning tasks. Our findings demonstrate that gradients require milder compression rates than activations. We observe that $K=10\%$ is the lowest TopK compression level, which does not harm model convergence severely. Experiments also show that models trained with TopK perform well only when compression is also applied during inference. We find that error feedback techniques do not improve model-parallel training compared to plain compression, but allow model inference without compression with almost no quality drop. Finally, when applied with the AQ-SGD approach, TopK stronger than with $K=30\%$ worsens model performance significantly.
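To make the TopK and error-feedback terms concrete, here is a minimal sketch under stated assumptions. It shows generic TopK sparsification (keep the fraction `k_fraction` of entries with the largest magnitude) and a generic residual-accumulating error feedback; it is not the AQ-SGD per-batch scheme referred to in the abstract, and the names `topk_compress` and `ErrorFeedbackCompressor` are hypothetical.

```python
def topk_compress(values, k_fraction):
    """Keep the k_fraction largest-magnitude entries and zero out the rest."""
    n = len(values)
    k = max(1, int(round(n * k_fraction)))
    keep = set(sorted(range(n), key=lambda i: abs(values[i]), reverse=True)[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(values)]


class ErrorFeedbackCompressor:
    """Generic error feedback: what compression drops is added back next time."""

    def __init__(self, size, k_fraction):
        self.residual = [0.0] * size
        self.k_fraction = k_fraction

    def compress(self, values):
        # Add the previously dropped mass before compressing again.
        corrected = [v + r for v, r in zip(values, self.residual)]
        sent = topk_compress(corrected, self.k_fraction)
        # Remember what was not transmitted this round.
        self.residual = [c - s for c, s in zip(corrected, sent)]
        return sent


activations = [0.1, -5.0, 0.3, 2.0, -0.2, 0.05, 1.0, -0.4, 0.0, 0.6]
print(topk_compress(activations, 0.10))  # K = 10%: only the entry -5.0 survives
```

With `k_fraction=0.10` a worker would transmit one value in ten plus its index, which is where the communication saving comes from; the residual buffer costs one extra tensor of the same size per compressed link.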
Keyword: optimization

Reinforcement Learning for Optimizing RAG for Domain Chatbots
Authors: Mandar Kulkarni, Praveen Tangarajan, Kyung Kim, Anusua Trivedi
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.06800
Pdf link: https://arxiv.org/pdf/2401.06800
Abstract With the advent of Large Language Models (LLM), conversational assistants have become prevalent for domain use cases. LLMs acquire the ability to contextual question answering through training, and Retrieval Augmented Generation (RAG) further enables the bot to answer domain-specific questions. This paper describes a RAG-based approach for building a chatbot that answers user's queries using Frequently Asked Questions (FAQ) data. We train an in-house retrieval embedding model using infoNCE loss, and experimental results demonstrate that the in-house model works significantly better than the well-known general-purpose public embedding model, both in terms of retrieval accuracy and Out-of-Domain (OOD) query detection. As an LLM, we use an open API-based paid ChatGPT model. We noticed that a previously retrieved-context could be used to generate an answer for specific patterns/sequences of queries (e.g., follow-up queries). Hence, there is a scope to optimize the number of LLM tokens and cost. Assuming a fixed retrieval model and an LLM, we optimize the number of LLM tokens using Reinforcement Learning (RL). Specifically, we propose a policy-based model external to the RAG, which interacts with the RAG pipeline through policy actions and updates the policy to optimize the cost. The policy model can perform two actions: to fetch FAQ context or skip retrieval. We use the open API-based GPT-4 as the reward model. We then train a policy model using policy gradient on multiple training chat sessions. As a policy model, we experimented with a public gpt-2 model and an in-house BERT model. With the proposed RL-based optimization combined with similarity threshold, we are able to achieve significant cost savings while getting a slightly improved accuracy. Though we demonstrate results for the FAQ chatbot, the proposed RL approach is generic and can be experimented with any existing RAG pipeline.
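The infoNCE loss mentioned for the retrieval embedding model can be sketched as follows. This is an illustrative assumption about the general form of the loss, not the paper's training code: it scores one query against one positive and several negatives with cosine similarity, and the function name `info_nce` and the `temperature` default are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def info_nce(query, positive, negatives, temperature=0.1):
    """InfoNCE loss: negative log-softmax of the positive among all candidates."""
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, neg) / temperature for neg in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]


# A query whose matching FAQ embedding is well aligned gets a small loss.
good = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
bad = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
print(good, bad)
```

Minimizing this loss pulls a query towards its matching FAQ entry and pushes it away from the negatives. The resulting similarity scores are what a similarity threshold, such as the one the abstract combines with the RL policy, would operate on.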
Multi-Memory Matching for Unsupervised Visible-Infrared Person Re-Identification
Authors: Jiangming Shi, Xiangbo Yin, Yeyun Chen, Yachao Zhang, Zhizhong Zhang, Yuan Xie, Yanyun Qu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.06825
Pdf link: https://arxiv.org/pdf/2401.06825
Abstract Unsupervised visible-infrared person re-identification (USL-VI-ReID) is a promising yet challenging retrieval task. The key challenges in USL-VI-ReID are to effectively generate pseudo-labels and establish pseudo-label correspondences across modalities without relying on any prior annotations. Recently, clustered pseudo-label methods have gained more attention in USL-VI-ReID. However, previous methods fell short of fully exploiting the individual nuances, as they simply utilized a single memory that represented an identity to establish cross-modality correspondences, resulting in ambiguous cross-modality correspondences. To address the problem, we propose a Multi-Memory Matching (MMM) framework for USL-VI-ReID. We first design a Cross-Modality Clustering (CMC) module to generate the pseudo-labels through clustering together both two modality samples. To associate cross-modality clustered pseudo-labels, we design a Multi-Memory Learning and Matching (MMLM) module, ensuring that optimization explicitly focuses on the nuances of individual perspectives and establishes reliable cross-modality correspondences. Finally, we design a Soft Cluster-level Alignment (SCA) module to narrow the modality gap while mitigating the effect of noise pseudo-labels through a soft many-to-many alignment strategy. Extensive experiments on the public SYSU-MM01 and RegDB datasets demonstrate the reliability of the established cross-modality correspondences and the effectiveness of our MMM. The source codes will be released.
MAPO: Advancing Multilingual Reasoning through Multilingual Alignment-as-Preference Optimization
Authors: Shuaijie She, Shujian Huang, Wei Zou, Wenhao Zhu, Xiang Liu, Xiang Geng, Jiajun Chen
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2401.06838
Pdf link: https://arxiv.org/pdf/2401.06838
Abstract Though reasoning abilities are considered language-agnostic, existing LLMs exhibit inconsistent reasoning abilities across different languages, e.g., reasoning in a pivot language is superior to other languages due to the imbalance of multilingual training data. To enhance reasoning abilities in non-pivot languages, we propose an alignment-as-preference optimization framework. Specifically, we adopt an open-source translation model to estimate the consistency between answers in non-pivot and pivot languages. We further adopt the answer consistency as the preference for DPO or PPO thus optimizing the lesser reasoning. Experiments show that our method significantly improves the model's multilingual reasoning, with better reasoning consistency across languages. Our framework achieved a 13.7% accuracy improvement on out-of-domain datasets MSVAMP while preserving the competitive performance on MGSM. Moreover, we find that iterative DPO is helpful for further alignment and improvement of the model's multilingual mathematical reasoning ability, further pushing the improvement to 16.7%.
Advanced safety filter based on SOS Control Barrier and Lyapunov Functions
Authors: Michael Schneeberger, Silvia Mastellone, Florian Dörfler
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2401.06901
Pdf link: https://arxiv.org/pdf/2401.06901
Abstract This paper presents a novel safety filter framework based on Control Barrier Functions (CBFs) and Control Lyapunov-like Functions (CLFs). The CBF guarantees forward invariance of the safe set, constraining system trajectories within state constraints, while the CLF guides the system away from unsafe states towards a nominal region, preserving the performance of a nominal controller. The first part of this work focuses on determining compatible CBF and CLF in the presence of linear or quadratic input constraints. This is achieved by formulating the CBF and CLF conditions, along with the input constraints, as Sum of Squares (SOS) constraints using Putinar's Positivstellensatz. For solving the resulting SOS optimization problem, we employ an alternating algorithm that simultaneously searches for a feasible controller in the class of rational functions of the state. The second part of this work details the implementation of the safety filter as a Quadratically Constrained Quadratic Program (QCQP), whose constraints encode the CBF and CLF conditions as well as the input constraints. To avoid the chattering effect and guarantee the uniqueness and Lipschitz continuity of solutions, the state-dependent inequality constraints of the QCQP are selected to be sufficiently regular. Finally, we demonstrate the method on a detailed case study involving the control of a three-phase ac/dc power converter connected to an infinite bus.
Multi-hop Relaying with Mixed Half and Full Duplex Relays for Offloading to MEC
Authors: Pavel Mach, Zdenek Becvar, Mohammadsaleh Nikooroo
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2401.06908
Pdf link: https://arxiv.org/pdf/2401.06908
Abstract In this paper, we focus on offloading a computing task from a user equipment (UE) to a multi-access edge computing (MEC) server via multi-hop relaying. We assume a general relaying case where relays are energy-constrained devices, such as other UEs, internet of things (IoT) devices, or unmanned aerial vehicles. To this end, we formulate the problem as a minimization of the sum energy consumed by the energy-constrained devices under the constraint on the maximum requested time of the task processing. Then, we propose a multi-hop relaying combining half and full duplexes at each individual relay involved in the offloading. We prove that the proposed multi-hop relaying is convex, thus it can be optimized by conventional convex optimization methods. We show our proposal outperforms existing multi-hop relaying schemes in terms of probability that tasks are processed within required time by up to 38% and, at the same time, decreases energy consumption by up to 28%.
Fast and Accurate Zero-Training Classification for Tabular Engineering Data
Authors: Cyril Picard, Faez Ahmed
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Arxiv link: https://arxiv.org/abs/2401.06948
Pdf link: https://arxiv.org/pdf/2401.06948
Abstract In engineering design, navigating complex decision-making landscapes demands a thorough exploration of the design, performance, and constraint spaces, often impeded by resource-intensive simulations. Data-driven methods can mitigate this challenge by harnessing historical data to delineate feasible domains, accelerate optimization, or evaluate designs. However, the implementation of these methods usually demands machine-learning expertise and multiple trials to choose the right method and hyperparameters. This makes them less accessible for numerous engineering situations. Additionally, there is an inherent trade-off between training speed and accuracy, with faster methods sometimes compromising precision. In our paper, we demonstrate that a recently released general-purpose transformer-based classification model, TabPFN, is both fast and accurate. Notably, it requires no dataset-specific training to assess new tabular data. TabPFN is a Prior-Data Fitted Network, which undergoes a one-time offline training across a broad spectrum of synthetic datasets and performs in-context learning. We evaluated TabPFN's efficacy across eight engineering design classification problems, contrasting it with seven other algorithms, including a state-of-the-art AutoML method. For these classification challenges, TabPFN consistently outperforms in speed and accuracy. It is also the most data-efficient and provides the added advantage of being differentiable and giving uncertainty estimates. Our findings advocate for the potential of pre-trained models that learn from synthetic data and require no domain-specific tuning to make data-driven engineering design accessible to a broader community and open ways to efficient general-purpose models valid across applications. Furthermore, we share a benchmark problem set for evaluating new classification algorithms in engineering design.
Class-Imbalanced Semi-Supervised Learning for Large-Scale Point Cloud Semantic Segmentation via Decoupling Optimization
Authors: Mengtian Li, Shaohui Lin, Zihan Wang, Yunhang Shen, Baochang Zhang, Lizhuang Ma
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.06975
Pdf link: https://arxiv.org/pdf/2401.06975
Abstract Semi-supervised learning (SSL), thanks to the significant reduction of data annotation costs, has been an active research topic for large-scale 3D scene understanding. However, the existing SSL-based methods suffer from severe training bias, mainly due to class imbalance and long-tail distributions of the point cloud data. As a result, they lead to a biased prediction for the tail class segmentation. In this paper, we introduce a new decoupling optimization framework, which disentangles feature representation learning and classifier in an alternative optimization manner to shift the bias decision boundary effectively. In particular, we first employ two-round pseudo-label generation to select unlabeled points across head-to-tail classes. We further introduce multi-class imbalanced focus loss to adaptively pay more attention to feature learning across head-to-tail classes. We fix the backbone parameters after feature learning and retrain the classifier using ground-truth points to update its parameters. Extensive experiments demonstrate the effectiveness of our method outperforming previous state-of-the-art methods on both indoor and outdoor 3D point cloud datasets (i.e., S3DIS, ScanNet-V2, Semantic3D, and SemanticKITTI) using 1% and 1pt evaluation.
Joint Unsupervised and Supervised Training for Automatic Speech Recognition via Bilevel Optimization
Authors: A F M Saif, Xiaodong Cui, Han Shen, Songtao Lu, Brian Kingsbury, Tianyi Chen
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2401.06980
Pdf link: https://arxiv.org/pdf/2401.06980
Abstract In this paper, we present a novel bilevel optimization-based approach to training acoustic models for automatic speech recognition (ASR) tasks that we term bi-level joint unsupervised and supervised training (BL-JUST). BL-JUST employs a lower and upper level optimization with an unsupervised loss and a supervised loss respectively, leveraging recent advances in penalty-based bilevel optimization to solve this challenging ASR problem with affordable complexity and rigorous convergence guarantees. To evaluate BL-JUST, extensive experiments on the LibriSpeech and TED-LIUM v2 datasets have been conducted. BL-JUST achieves superior performance over the commonly used pre-training followed by fine-tuning strategy.
UAV-assisted Emergency Integrated Sensing and Communication Networks: A CNN-based Rapid Deployment Approach
Authors: Zao Wang, Lianming Xu, Luyang Hou, Ruoguang Li, Li Wang
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2401.07001
Pdf link: https://arxiv.org/pdf/2401.07001
Abstract UAV-assisted integrated sensing and communication (ISAC) network is crucial for post-disaster emergency rescue. The speed of UAV deployment will directly impact rescue results. However, the ISAC UAV deployment in emergency scenarios is difficult to solve, which contradicts the rapid deployment. In this paper, we propose a two-stage deployment framework to achieve rapid ISAC UAV deployment in emergency scenarios, which consists of an offline stage and an online stage. Specifically, in the offline stage, we first formulate the ISAC UAV deployment problem and define the ISAC utility as the objective function, which integrates communication rate and localization accuracy. Secondly, we develop a dynamic particle swarm optimization (DPSO) algorithm to construct an optimized UAV deployment dataset. Finally, we train a convolutional neural network (CNN) model with this dataset, which replaces the time-consuming DPSO algorithm. In the online stage, the trained CNN model can be used to make quick decisions for the ISAC UAV deployment. The simulation results indicate that the trained CNN model achieves superior ISAC performance compared to the classic particle swarm optimization algorithm. Additionally, it significantly reduces the deployment time by more than 96%.
COIN: Chance-Constrained Imitation Learning for Uncertainty-aware Adaptive Resource Oversubscription Policy
Authors: Lu Wang, Mayukh Das, Fangkai Yang, Chao Duo, Bo Qiao, Hang Dong, Si Qin, Chetan Bansal, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.07051
Pdf link: https://arxiv.org/pdf/2401.07051
Abstract We address the challenge of learning safe and robust decision policies in presence of uncertainty in context of the real scientific problem of adaptive resource oversubscription to enhance resource efficiency while ensuring safety against resource congestion risk. Traditional supervised prediction or forecasting models are ineffective in learning adaptive policies whereas standard online optimization or reinforcement learning is difficult to deploy on real systems. Offline methods such as imitation learning (IL) are ideal since we can directly leverage historical resource usage telemetry. But, the underlying aleatoric uncertainty in such telemetry is a critical bottleneck. We solve this with our proposed novel chance-constrained imitation learning framework, which ensures implicit safety against uncertainty in a principled manner via a combination of stochastic (chance) constraints on resource congestion risk and ensemble value functions. This leads to substantial ($\approx 3-4\times$) improvement in resource efficiency and safety in many oversubscription scenarios, including resource management in cloud services.
GoMatching: A Simple Baseline for Video Text Spotting via Long and Short Term Matching
Authors: Haibin He, Maoyuan Ye, Jing Zhang, Juhua Liu, Dacheng Tao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.07080
Pdf link: https://arxiv.org/pdf/2401.07080
Abstract Beyond the text detection and recognition tasks in image text spotting, video text spotting presents an augmented challenge with the inclusion of tracking. While advanced end-to-end trainable methods have shown commendable performance, the pursuit of multi-task optimization may pose the risk of producing sub-optimal outcomes for individual tasks. In this paper, we highlight a main bottleneck in the state-of-the-art video text spotter: the limited recognition capability. In response to this issue, we propose to efficiently turn an off-the-shelf query-based image text spotter into a specialist on video and present a simple baseline termed GoMatching, which focuses the training efforts on tracking while maintaining strong recognition performance. To adapt the image text spotter to video datasets, we add a rescoring head to rescore each detected instance's confidence via efficient tuning, leading to a better tracking candidate pool. Additionally, we design a long-short term matching module, termed LST-Matcher, to enhance the spotter's tracking capability by integrating both long- and short-term matching results via Transformer. Based on the above simple designs, GoMatching achieves impressive performance on two public benchmarks, e.g., setting a new record on the ICDAR15-video dataset, and one novel test set with arbitrary-shaped text, while saving considerable training budgets. The code will be released at https://github.com/Hxyz-123/GoMatching.
Optimization of Inter-group Criteria for Clustering with Minimum Size Constraints
Authors: Eduardo S. Laber, Lucas Murtinho
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS)
Arxiv link: https://arxiv.org/abs/2401.07091
Pdf link: https://arxiv.org/pdf/2401.07091
Abstract Internal measures that are used to assess the quality of a clustering usually take into account intra-group and/or inter-group criteria. There are many papers in the literature that propose algorithms with provable approximation guarantees for optimizing the former. However, the optimization of inter-group criteria is much less understood. Here, we contribute to the state-of-the-art of this literature by devising algorithms with provable guarantees for the maximization of two natural inter-group criteria, namely the minimum spacing and the minimum spanning tree spacing. The former is the minimum distance between points in different groups while the latter captures separability through the cost of the minimum spanning tree that connects all groups. We obtain results for both the unrestricted case, in which no constraint on the clusters is imposed, and for the constrained case where each group is required to have a minimum number of points. Our constraint is motivated by the fact that the popular Single Linkage, which optimizes both criteria in the unrestricted case, produces clusterings with many tiny groups. To complement our work, we present an empirical study with 10 real datasets, providing evidence that our methods work very well in practical settings.
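The minimum spacing criterion described in this abstract is easy to state in code. The sketch below is illustrative only and covers the unrestricted case: `min_spacing` computes the smallest distance between points in different groups, and `single_linkage` is a plain Kruskal-style Single Linkage that stops at `k` groups. It does not implement the paper's algorithms for the minimum-size-constrained case, the function names are hypothetical, and `k >= 2` is assumed.

```python
import math
from itertools import combinations


def min_spacing(points, labels):
    """Smallest distance between two points that lie in different groups."""
    return min(
        math.dist(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
        if labels[i] != labels[j]
    )


def single_linkage(points, k):
    """Merge the closest pairs first (Kruskal-style) until k groups remain."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        combinations(range(len(points)), 2),
        key=lambda e: math.dist(points[e[0]], points[e[1]]),
    )
    groups = len(points)
    for i, j in edges:
        if groups == k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            groups -= 1
    return [find(i) for i in range(len(points))]


pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
labels = single_linkage(pts, 2)
print(labels, min_spacing(pts, labels))
```

On data with outliers, this procedure tends to peel off single points as their own groups, which is the behaviour the abstract cites as motivation for requiring a minimum number of points per group.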
Resource Allocation in Uplink Multi STAR-RIS-aided NOMA System via Meta-Learning
Authors: Sepideh Javadi, Armin Farhadi, Mohammad Robat Mili, Eduard Jorswieck
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2401.07100
Pdf link: https://arxiv.org/pdf/2401.07100
Abstract Simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) is a novel technology which enables the full-space coverage by splitting the incident signal into reflected and transmitted signals. In this letter, a multi STAR-RIS-aided system using non-orthogonal multiple access (NOMA) in an uplink transmission is considered, where the multi-order reflections among multiple STAR-RISs assist the transmission from the single-antenna users to the multi-antenna base station (BS). Specifically, the total sum rate maximization problem is solved by jointly optimizing the active beamforming, power allocation, transmission and reflection beamforming at the STAR-RIS, and user-STAR-RIS association indicator. To solve the non-convex optimization problem, a novel deep reinforcement learning algorithm is proposed which is the combination of meta-learning and deep deterministic policy gradient (DDPG), namely Meta-DDPG. Numerical results demonstrate that the proposed Meta-DDPG algorithm outperforms the conventional DDPG algorithm.
Secrecy Coding for the Binary Symmetric Wiretap Channel via Linear Programming
Authors: Ali Nikkhah, Morteza Shoushtari, Bahareh Akhbari, Willie K. Harrison
Subjects: Information Theory (cs.IT)
Arxiv link: https://arxiv.org/abs/2401.07141
Pdf link: https://arxiv.org/pdf/2401.07141
Abstract In this paper, we use a linear programming (LP) optimization approach to evaluate the equivocation for a wiretap channel where the main channel is noiseless, and the wiretap channel is a binary symmetric channel (BSC). Using this technique, we present an analytical limit for the achievable secrecy rate in the finite blocklength regime that is tighter than traditional fundamental limits. We also propose a secrecy coding technique that outperforms random binning codes. When there is one overhead bit, this coding technique is optimum and achieves the analytical limit. For cases with additional bits of overhead, our coding scheme can achieve equivocation rates close to the new limit. Furthermore, we evaluate the patterns of the generator matrix and the parity-check matrix for linear codes and we present binning techniques for both linear and non-linear codes using two different approaches: recursive and non-recursive. To our knowledge, this is the first optimization solution for secrecy coding obtained through linear programming.
Adaptive Prognostic Malfunction Based Processor for Autonomous Landing Guidance Assistance System Using FPGA
Authors: Hossam O. Ahmed, David Wyatt
Subjects: Hardware Architecture (cs.AR); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2401.07143
Pdf link: https://arxiv.org/pdf/2401.07143
Abstract The demand for more developed and agile urban taxi drones is increasing rapidly nowadays to sustain crowded cities and their traffic issues. The critical factor for spreading such technology could be related to the safety criteria that must be considered. One of the most critical safety aspects for such vertical and/or Short Take-Off and Landing (V/STOL) drones is related to safety during the landing stage, in which most of the recent flight accidents have occurred. This paper focused on solving this issue by proposing decentralized processing cores that could improve the landing failure rate by depending on a Fuzzy Logic System (FLS) and additional Digital Signal Processing (DSP) elements. Also, the proposed system will enhance the safety factor during the landing stages by adding a self-awareness feature in case a certain sensor malfunction occurs using the proposed Adaptive Prognostic Malfunction Unit (APMU). This proposed coarse-grained Autonomous Landing Guidance Assistance System (ALGAS4) processing architecture has been optimized using different optimization techniques. The ALGAS4 architecture has been designed completely using VHDL, and the targeted FPGA was the INTEL Cyclone V 5CGXFC9D6F27C7 chip. According to the synthesis findings of the INTEL Quartus Prime software, the maximum working frequency of the ALGAS4 system is 278.24 MHz. In addition, the proposed ALGAS4 system could maintain a maximum computing performance of approximately 74.85 GOPS while using just 166.56 mW for dynamic and I/O power dissipation.
Inroads to a Structured Data Natural Language Bijection and the role of LLM annotation
Authors: Blake Vente
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2401.07190
Pdf link: https://arxiv.org/pdf/2401.07190
Abstract This work finds limited evidence supporting the theory that using multiple tasks with sequence-to-sequence transformer language models can improve performance on some metrics. In particular, the multi-task generalist t5-small outperforms the specialist t5-small with a $F_1$ of $0.771$ up from $0.692$, which may point to underlying cross-task knowledge generalization. This further suggests that even with the same network, "re-using" the same data in a different way may lead to higher performance in some metrics. However, the inverse task alone is likely only an optimization strategy, since it does not yield a significant general improvement at the model sizes explored in this work. Also, adding $\approx 4500$ LLM annotated records (interlaced with the $12800$ WebNLG training records) does not substantially change automatic metric performance compared to the same t5-small model without the synthetic data. This may be due to a learning capacity bottleneck on account of model size, and decreases observed may be due to distributional differences in the corpora. Future research using larger models or human evaluation is required to more fully explain the mechanisms contributing to performance on these tasks.
Stabilizing Sharpness-aware Minimization Through A Simple Renormalization Strategy
Authors: Authors: Chengli Tan, Jiangshe Zhang, Junmin Liu, Yicheng Wang, Yunda Hao
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.07250
Pdf link: https://arxiv.org/pdf/2401.07250
Abstract Recently, sharpness-aware minimization (SAM) has attracted a lot of attention because of its surprising effectiveness in improving generalization performance. However, training neural networks with SAM can be highly unstable since the loss does not decrease along the direction of the exact gradient at the current point, but instead follows the direction of a surrogate gradient evaluated at another point nearby. To address this issue, we propose a simple renormalization strategy, dubbed StableSAM, so that the norm of the surrogate gradient remains the same as that of the exact gradient. Our strategy is easy to implement and flexible enough to integrate with SAM and its variants, almost at no computational cost. With elementary tools from convex optimization and learning theory, we also conduct a theoretical analysis of sharpness-aware training, revealing that compared to stochastic gradient descent (SGD), the effectiveness of SAM is only assured in a limited regime of learning rate. In contrast, we show how StableSAM extends this regime of learning rate and when it can consistently perform better than SAM with minor modification. Finally, we demonstrate the improved performance of StableSAM on several representative data sets and tasks.
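The renormalization idea described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy quadratic loss, the function names (`toy_grad`, `renormalized_surrogate_gradient`, `stable_sam_step`), and the values of `lr` and `rho` are all assumptions made for the example. It uses the usual SAM perturbation (a step of radius `rho` along the normalized gradient) and then rescales the surrogate gradient so that its norm matches that of the exact gradient.

```python
import numpy as np

# Hypothetical toy problem: loss(w) = 0.5 * w^T A w with a fixed diagonal A.
A = np.diag([10.0, 1.0])

def toy_loss(w):
    return 0.5 * float(w @ A @ w)

def toy_grad(w):
    return A @ w

def renormalized_surrogate_gradient(w, grad_fn, rho=0.1, eps=1e-12):
    """Surrogate (SAM) gradient rescaled to the norm of the exact gradient."""
    g = grad_fn(w)                            # exact gradient at the current point
    g_norm = np.linalg.norm(g)
    w_adv = w + rho * g / (g_norm + eps)      # SAM perturbation toward higher loss
    g_s = grad_fn(w_adv)                      # surrogate gradient at the nearby point
    g_s_norm = np.linalg.norm(g_s)
    return g_s * (g_norm / (g_s_norm + eps))  # renormalization step

def stable_sam_step(w, grad_fn, lr=0.05, rho=0.1):
    """One descent step along the renormalized surrogate gradient."""
    return w - lr * renormalized_surrogate_gradient(w, grad_fn, rho)

w0 = np.array([1.0, 1.0])
w1 = stable_sam_step(w0, toy_grad)
```

The update keeps the direction of the SAM surrogate gradient but takes a step whose length is governed by the exact gradient, which is the property the abstract attributes to StableSAM.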
Emergency Localization for Mobile Ground Users: An Adaptive UAV Trajectory Planning Method
Authors: Authors: Zhihao Zhu, Jiafan He, Luyang Hou, Lianming Xu, Wendi Zhu, Li Wang
Subjects: Multiagent Systems (cs.MA)
Arxiv link: https://arxiv.org/abs/2401.07256
Pdf link: https://arxiv.org/pdf/2401.07256
Abstract In emergency search and rescue scenarios, the quick location of trapped people is essential. However, disasters can render the Global Positioning System (GPS) unusable. Unmanned aerial vehicles (UAVs) with localization devices can serve as mobile anchors due to their agility and high line-of-sight (LoS) probability. Nonetheless, the number of available UAVs during the initial stages of disaster relief is limited, and innovative methods are needed to quickly plan UAV trajectories to locate non-uniformly distributed dynamic targets while ensuring localization accuracy. To address this challenge, we design a single UAV localization method without hovering, use the maximum likelihood estimation (MLE) method to estimate the location of mobile users and define the upper bound of the localization error by considering users' movement. Combining this localization method and localization error-index, we utilize the enhanced particle swarm optimization (EPSO) algorithm and edge access strategy to develop a low complexity localization-oriented adaptive trajectory planning algorithm. Simulation results demonstrate that our method outperforms other baseline algorithms, enabling faster localization without compromising localization accuracy.
FROST-BRDF: A Fast and Robust Optimal Sampling Technique for BRDF Acquisition
Authors: Authors: Ehsan Miandji, Tanaboon Tongbuasirilai, Saghi Hajisharif, Behnaz Kavoosighafi, Jonas Unger
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.07283
Pdf link: https://arxiv.org/pdf/2401.07283
Abstract Efficient and accurate BRDF acquisition of real world materials is a challenging research problem that requires sampling millions of incident light and viewing directions. To accelerate the acquisition process, one needs to find a minimal set of sampling directions such that the recovery of the full BRDF is accurate and robust given such samples. In this paper, we formulate BRDF acquisition as a compressed sensing problem, where the sensing operator is one that performs sub-sampling of the BRDF signal according to a set of optimal sample directions. To solve this problem, we propose the Fast and Robust Optimal Sampling Technique (FROST) for designing a provably optimal sub-sampling operator that places light-view samples such that the recovery error is minimized. FROST casts the problem of designing an optimal sub-sampling operator for compressed sensing into a sparse representation formulation under the Multiple Measurement Vector (MMV) signal model. The proposed reformulation is exact, i.e. without any approximations, hence it converts an intractable combinatorial problem into one that can be solved with standard optimization techniques. As a result, FROST is accompanied by strong theoretical guarantees from the field of compressed sensing. We perform a thorough analysis of FROST-BRDF using a 10-fold cross-validation with publicly available BRDF datasets and show significant advantages compared to the state-of-the-art with respect to reconstruction quality. Finally, FROST is simple, both conceptually and in terms of implementation, it produces consistent results at each run, and it is at least two orders of magnitude faster than the prior art.
Hybrid Coded-Uncoded Caching in Multi-Access Networks with Non-uniform Demands
Authors: Authors: Abdollah Ghaffari Sheshjavani, Ahmad Khonsari, Masoumeh Moradian, Seyed Pooya Shariatpanahi, Seyedeh Bahereh Hassanpour
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2401.07288
Pdf link: https://arxiv.org/pdf/2401.07288
Abstract To address the massive growth of data traffic over cellular networks, increasing spatial reuse of the frequency spectrum by the deployment of small base stations (SBSs) has been considered. For rapid deployment of SBSs in the networks, caching popular content along with new coded caching schemes has been proposed. To maximize the cellular network's capacity, densifying it with small base stations is inevitable. In ultra-dense cellular networks, coverage of SBSs may overlap. To this aim, the multi-access caching system, where users potentially can access multiple cache nodes simultaneously, has attracted more attention in recent years. Most previous works on multi-access coded caching only consider specific conditions such as cyclic wrap-around network topologies. In this paper, we investigate caching in ultra-dense cellular networks, where different users can access different numbers of caches under non-uniform content popularity distribution, and propose Multi-Access Hybrid coded-uncoded Caching (MAHC). We formulate the optimization problem of the proposed scheme for general network topologies and evaluate it for 2-SBS network scenarios. The numerical and simulation results show that the proposed MAHC scheme outperforms optimal conventional uncoded and previous multi-access coded caching (MACC) schemes.
CoVO-MPC: Theoretical Analysis of Sampling-based MPC and Optimal Covariance Design
Authors: Authors: Zeji Yi, Chaoyi Pan, Guanqi He, Guannan Qu, Guanya Shi
Subjects: Machine Learning (cs.LG); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2401.07369
Pdf link: https://arxiv.org/pdf/2401.07369
Abstract Sampling-based Model Predictive Control (MPC) has been a practical and effective approach in many domains, notably model-based reinforcement learning, thanks to its flexibility and parallelizability. Despite its appealing empirical performance, the theoretical understanding, particularly in terms of convergence analysis and hyperparameter tuning, remains absent. In this paper, we characterize the convergence property of a widely used sampling-based MPC method, Model Predictive Path Integral Control (MPPI). We show that MPPI enjoys at least linear convergence rates when the optimization is quadratic, which covers time-varying LQR systems. We then extend to more general nonlinear systems. Our theoretical analysis directly leads to a novel sampling-based MPC algorithm, CoVariance-Optimal MPC (CoVo-MPC) that optimally schedules the sampling covariance to optimize the convergence rate. Empirically, CoVo-MPC significantly outperforms standard MPPI by 43-54% in both simulations and real-world quadrotor agile control tasks. Videos and Appendices are available at https://lecar-lab.github.io/CoVO-MPC/.
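For readers unfamiliar with MPPI, a generic sampling-based update on a quadratic cost might look like the sketch below. It illustrates plain MPPI with a fixed sampling covariance, not the CoVO-MPC covariance schedule proposed in the paper. The cost function, target, temperature, sample count, and all function names are assumptions chosen for the example.

```python
import numpy as np

def mppi_update(mean, cov, cost_fn, n_samples=1024, temperature=1.0, rng=None):
    """One MPPI-style update of the sampling mean with a fixed covariance."""
    rng = np.random.default_rng(0) if rng is None else rng
    # Draw candidate controls around the current mean.
    samples = rng.multivariate_normal(mean, cov, size=n_samples)
    costs = np.array([cost_fn(u) for u in samples])
    # Exponential weighting; subtracting the minimum cost only improves
    # numerical stability and cancels out after normalization.
    weights = np.exp(-(costs - costs.min()) / temperature)
    weights /= weights.sum()
    # The new mean is the weighted average of the samples.
    return weights @ samples

# Hypothetical quadratic cost with a known minimizer.
TARGET = np.array([2.0, -1.0])

def quadratic_cost(u):
    return 0.5 * float(np.sum((u - TARGET) ** 2))

def run_demo(n_iters=10, seed=0):
    """Iterate the update from the origin and return the final mean."""
    rng = np.random.default_rng(seed)
    mean = np.zeros(2)
    for _ in range(n_iters):
        mean = mppi_update(mean, np.eye(2), quadratic_cost, rng=rng)
    return mean
```

On this toy problem the mean should move toward `TARGET` at each iteration, up to sampling noise, which is in line with the linear convergence the abstract reports for quadratic objectives.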
A Data-driven Resilience Framework of Directionality Configuration based on Topological Credentials in Road Networks
Authors: Authors: H M Imran Kays, Khondhaker Al Momin, K.K. "Muralee" Muraleetharan, Arif Mohaimin Sadri
Subjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG); Applications (stat.AP)
Arxiv link: https://arxiv.org/abs/2401.07371
Pdf link: https://arxiv.org/pdf/2401.07371
Abstract Roadway reconfiguration is a crucial aspect of transportation planning, aiming to enhance traffic flow, reduce congestion, and improve overall road network performance with existing infrastructure and resources. This paper presents a novel roadway reconfiguration technique by integrating optimization based Brute Force search approach and decision support framework to rank various roadway configurations for better performance. The proposed framework incorporates a multi-criteria decision analysis (MCDA) approach, combining input from generated scenarios during the optimization process. By utilizing data from optimization, the model identifies total betweenness centrality (TBC), system travel time (STT), and total link traffic flow (TLTF) as the most influential decision variables. The developed framework leverages graph theory to model the transportation network topology and apply network science metrics as well as stochastic user equilibrium traffic assignment to assess the impact of each roadway configuration on the overall network performance. To rank the roadway configurations, the framework employs machine learning algorithms, such as ridge regression, to determine the optimal weights for each criterion (i.e., TBC, STT, TLTF). Moreover, the network-based analysis ensures that the selected configurations not only optimize individual roadway segments but also enhance system-level efficiency, which is particularly helpful as the increasing frequency and intensity of natural disasters and other disruptive events underscore the critical need for resilient transportation networks. By integrating multi-criteria decision analysis, machine learning, and network science metrics, the proposed framework would enable transportation planners to make informed and data-driven decisions, leading to more sustainable, efficient, and resilient roadway configurations.
A Novel Optimization Algorithm for Buffer and Splitter Minimization in Phase-Skipping Adiabatic Quantum-Flux-Parametron Circuits
Authors: Authors: Robert S. Aviles, Peter A. Beerel
Subjects: Emerging Technologies (cs.ET)
Arxiv link: https://arxiv.org/abs/2401.07393
Pdf link: https://arxiv.org/pdf/2401.07393
Abstract Adiabatic Quantum-Flux-Parametron (AQFP) logic is a promising emerging device technology that promises six orders of magnitude lower power than CMOS. However, AQFP is challenged by operation at only ultra-low temperatures, has high latency and area, and requires a complex clocking scheme. In particular, every logic gate, buffer, and splitter must be clocked and each pair of connected clocked gates requires overlapping alternating current (AC) clock signals. Moreover, clocked buffers need to be used to balance re-convergent logic paths, a problem that is exacerbated by every multi-node fanout needing a tree of clocked splitters. To reduce circuit area, many works have proposed buffer and splitter insertion optimization algorithms, and recent works have demonstrated a phase-skipping clocking scheme that reduces latency and area. This paper proposes the first algorithm to optimize buffer and splitter insertion for circuits that adopt phase-skipping and demonstrates the resulting performance improvements for a suite of AQFP benchmark circuits.
Fairness-aware Photovoltaic Generation Limits for Voltage Regulation in Power Distribution Networks using Conservative Linear Approximations
Authors: Authors: Rahul K. Gupta, Paprapee Buason, Daniel K. Molzahn
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2401.07404
Pdf link: https://arxiv.org/pdf/2401.07404
Abstract This paper proposes a framework for fairly curtailing photovoltaic (PV) plants in response to the over-voltage problem in PV-rich distribution networks. The framework imposes PV generation limits to avoid overvoltages. These limits are computed a day ahead of real-time operations by solving an offline stochastic optimization problem using forecasted scenarios for PV generation and load demand. The framework minimizes the overall curtailment while considering fairness by reducing disparities in curtailments among different PV owners. We model the distribution grid constraints using a conservative linear approximation (CLA) of the AC power flow equations which is computed using a set of sampled power injections from the day-ahead predicted scenarios. The proposed framework is numerically validated on a CIGRE benchmark network interfaced with a large number of PV plants. We compare the performance of the proposed framework versus an alternative formulation that does not incorporate fairness considerations. To this end, we assess tradeoffs between fairness, as quantified with the Jain Fairness Index (JFI), and the total curtailed energy.
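The Jain Fairness Index mentioned in this abstract has a standard closed form, J(x) = (sum of x_i)^2 / (n * sum of x_i^2), which equals 1 when all values are equal and 1/n when a single value carries the whole load. The sketch below computes it. Which quantities the paper feeds into the index is not stated in the abstract, so the example values are purely illustrative, and treating an all-zero input as perfectly fair is a convention assumed here.

```python
def jain_fairness_index(values):
    """Jain Fairness Index: (sum x)^2 / (n * sum x^2), in the range [1/n, 1]."""
    values = [float(v) for v in values]
    n = len(values)
    if n == 0:
        raise ValueError("at least one value is required")
    sum_of_squares = sum(v * v for v in values)
    if sum_of_squares == 0.0:
        # Assumed convention: nobody is affected at all, so treat as fair.
        return 1.0
    total = sum(values)
    return (total * total) / (n * sum_of_squares)

# Hypothetical per-owner quantities (for example, curtailed energy in kWh).
equal_share = jain_fairness_index([2.0, 2.0, 2.0, 2.0])
single_owner = jain_fairness_index([4.0, 0.0, 0.0, 0.0])
```

Here `equal_share` is the best possible score and `single_owner` is the worst possible score for four participants.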
Multi-Task DNS Security Analysis via High-Order Heterogeneous Graph Embedding
Authors: Authors: Meng Qin
Subjects: Social and Information Networks (cs.SI); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2401.07410
Pdf link: https://arxiv.org/pdf/2401.07410
Abstract DNS is an essential Internet infrastructure to support network applications and services, but is also a significant tool exploited by various cyberattacks. Existing DNS security analysis techniques mostly focus on one specific task associated with one single entity (e.g., domain) via conventional feature engineering. They rely heavily on labor-intensive feature selection and largely ignore the intrinsic correlations among the heterogeneous DNS entities (e.g., domain and IP). In this paper, I explore the potential of heterogeneous graph embedding to automatically learn the behavior features of multiple DNS entities, and to simultaneously support more than one security task. Considering the joint optimization of malicious domain detection and IP reputation evaluation as an example, I propose a novel joint DNS embedding (JDE) model to formulate the DNS query behavior via a similarity-enhanced graph with heterogeneous entities. The random walk technique is applied to the heterogeneous graph to comprehensively explore the hidden homogeneous and heterogeneous high-order proximities among domains and IPs. Extensive experiments on real DNS traffic demonstrate that the joint optimization of multiple tasks with the latent high-order proximities can lead to better security analysis performance for all the tasks than respectively optimizing each single task with the observable low-order proximity.
Startup Delay Aware Short Video Ordering: Problem, Model, and A Reinforcement Learning based Algorithm
Authors: Authors: Zhipeng Gao, Chunxi Li, Yongxiang Zhao, Baoxian Zhang
Subjects: Multimedia (cs.MM)
Arxiv link: https://arxiv.org/abs/2401.07411
Pdf link: https://arxiv.org/pdf/2401.07411
Abstract Short video applications have attracted billions of users on the Internet and can satisfy diverse users' fragmented spare time with content-rich and duration-short videos. To achieve fast playback at user side, existing short video systems typically enforce burst transmission of initial segment of each video when being requested for improved quality of user experiences. However, such a way of burst transmissions can cause unexpected large startup delays at user side. This is because users may frequently switch videos when sequentially watching a list of short videos recommended by the server side, which can cause excessive burst transmissions of initial segments of different short videos and thus quickly deplete the network transmission capacity. In this paper, we adopt token bucket to characterize the video transmission path between video server and each user, and accordingly study how to effectively reduce the startup delay of short videos by effectively arranging the viewing order of a video list at the server side. We formulate the optimal video ordering problem for minimizing the maximum video startup delay as a combinatorial optimization problem and prove its NP-hardness. We accordingly propose a Partially Shared Actor Critic reinforcement learning algorithm (PSAC) to learn optimized video ordering strategy. Numerical results based on a real dataset provided by a large-scale short video service provider demonstrate that the proposed PSAC algorithm can significantly reduce the video startup delay compared to baseline algorithms.
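To see why viewing order can affect startup delay under a token bucket, consider the toy model below. It is a simplified sketch under stated assumptions, not the paper's model: each video is a hypothetical `(initial_segment_size, watch_time)` pair, the initial segment can start playing only once enough tokens are available, tokens refill at a constant `rate` up to `capacity`, and refill during the burst itself is ignored apart from the waiting time.

```python
def startup_delays(videos, capacity, rate):
    """Startup delay of each video's initial segment under a token bucket.

    videos: list of (initial_segment_size, watch_time) pairs, in viewing order.
    capacity: maximum number of tokens the bucket can hold.
    rate: tokens added per unit of time.
    """
    tokens = float(capacity)
    delays = []
    for size, watch_time in videos:
        deficit = max(size - tokens, 0.0)
        delays.append(deficit / rate)        # wait until the bucket covers the burst
        tokens = max(tokens - size, 0.0)     # tokens left after sending the segment
        tokens = min(capacity, tokens + rate * watch_time)  # refill while watching
    return delays

# Hypothetical videos: A and B have large initial segments and are skipped
# quickly; C has a small initial segment and is watched for longer.
A, B, C = (8.0, 1.0), (8.0, 1.0), (2.0, 10.0)

back_to_back = startup_delays([A, B, C], capacity=10.0, rate=1.0)
interleaved = startup_delays([A, C, B], capacity=10.0, rate=1.0)
```

Placing the long-watched video between the two large bursts gives the bucket time to refill, so the maximum delay in `interleaved` is lower than in `back_to_back`. Minimizing that maximum over all orderings is the combinatorial problem the abstract describes.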
Evolutionary Multi-Objective Diversity Optimization
Authors: Authors: Anh Viet Do, Mingyu Guo, Aneta Neumann, Frank Neumann
Subjects: Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2401.07454
Pdf link: https://arxiv.org/pdf/2401.07454
Abstract Creating diverse sets of high quality solutions has become an important problem in recent years. Previous works on diverse solutions problems consider solutions' objective quality and diversity where one is regarded as the optimization goal and the other as the constraint. In this paper, we treat this problem as a bi-objective optimization problem, which is to obtain a range of quality-diversity trade-offs. To address this problem, we frame the evolutionary process as evolving a population of populations, and present a suitable general implementation scheme that is compatible with existing evolutionary multi-objective search methods. We realize the scheme in NSGA-II and SPEA2, and test the methods on various instances of maximum coverage, maximum cut and minimum vertex cover problems. The resulting non-dominated populations exhibit rich qualitative features, giving insights into the optimization instances and the quality-diversity trade-offs they induce.
Input Convex Lipschitz RNN: A Fast and Robust Approach for Engineering Tasks
Authors: Authors: Zihao Wang, P S Pravin, Zhe Wu
Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2401.07494
Pdf link: https://arxiv.org/pdf/2401.07494
Abstract Computational efficiency and adversarial robustness are critical factors in real-world engineering applications. Yet, conventional neural networks often fall short in addressing both simultaneously, or even separately. Drawing insights from natural physical systems and existing literature, it is known that an input convex architecture enhances computational efficiency, while a Lipschitz-constrained architecture bolsters adversarial robustness. By leveraging the strengths of convexity and Lipschitz continuity, we develop a novel network architecture, termed Input Convex Lipschitz Recurrent Neural Network. This model outperforms existing recurrent units across a spectrum of engineering tasks in terms of computational efficiency and adversarial robustness. These tasks encompass a benchmark MNIST image classification, real-world solar irradiance prediction for Solar PV system planning at LHT Holdings in Singapore, and real-time Model Predictive Control optimization for a chemical reactor.
Study Features via Exploring Distribution Structure
Authors: Authors: Chunxu Cao, Qiang Zhang
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.07540
Pdf link: https://arxiv.org/pdf/2401.07540
Abstract In this paper, we present a novel framework for data redundancy measurement based on probabilistic modeling of datasets, and a new criterion for redundancy detection that is resilient to noise. We also develop new methods for data redundancy reduction using both deterministic and stochastic optimization techniques. Our framework is flexible and can handle different types of features, and our experiments on benchmark datasets demonstrate the effectiveness of our methods. We provide a new perspective on feature selection, and propose effective and robust approaches for both supervised and unsupervised learning problems.
Eco-driving Intelligent Systems and Algorithms: A Patent Review
Authors: Authors: Zhipeng Ma, Bo Nørregaard Jørgensen, Zheng Grace Ma
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2401.07559
Pdf link: https://arxiv.org/pdf/2401.07559
Abstract The transportation industry remains a significant contributor to greenhouse gas emissions, highlighting the requirement for intelligent systems to enhance vehicle energy efficiency. The intellectual property rights of developed systems should be protected by patents. However, there is no patent overview of eco-driving intelligent systems. Unlike a scientific article, a patent documentation indicates both novelty and commercialization potential of an inventor. To address this research gap, this paper provides a patent overview of eco-driving intelligent systems and algorithms. 424 patents in the Google Patent database are analyzed. The patent analysis results show that the top three Cooperative Patent Classifications are: Y02T - climate change mitigation technologies related to transportation (50.7%), B60W - Conjoint control of vehicle subunits of different types or different functions (34.4%) and B60L - Propulsion of electrically-propelled vehicles (20.2%). 219 patents were filed after 2016 when deep learning became popular and can be categorized into five groups: vehicle energy management, smart driving, ecological and sustainable driving, fuel consumption reduction, and driving behavior optimization. Furthermore, all 219 patents involve the physical components of the intelligent system and/or novel machine learning/deep learning algorithms. Moreover, over 70% of them are granted by the China National Intellectual Property Administration.
A greedy heuristic for graph burning
Authors: Authors: Jesús García-Díaz, José Alejandro Cornejo-Acosta
Subjects: Discrete Mathematics (cs.DM); Combinatorics (math.CO)
Arxiv link: https://arxiv.org/abs/2401.07577
Pdf link: https://arxiv.org/pdf/2401.07577
Abstract Given a graph $G$, the optimization version of the graph burning problem seeks a sequence of vertices, $(u_1,u_2,...,u_k) \in V(G)^k$, with minimum $k$ and such that every $v \in V(G)$ has distance at most $k-i$ to some vertex $u_i$. The length $k$ of the optimal solution is known as the burning number and is denoted by $b(G)$, an invariant that helps quantify the graph's vulnerability to contagion. This paper explores the advantages and limitations of an $\mathcal{O}(mn + kn^2)$ deterministic greedy heuristic for this problem, where $n$ is the graph's order, $m$ is the graph's size, and $k$ is a guess on $b(G)$. This heuristic is based on the relationship between the graph burning problem and the clustered maximum coverage problem, and despite having limitations on paths and cycles, it found most of the optimal and best-known solutions of benchmark and synthetic graphs with up to 102400 vertices.
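The coverage condition in this abstract (every vertex lies within distance $k-i$ of some $u_i$) is easy to check with breadth-first search. The sketch below is a verifier for a candidate sequence only; it is not the greedy heuristic studied in the paper. The adjacency-dictionary representation and the function names are assumptions made for the example.

```python
from collections import deque

def ball(adj, source, radius):
    """Set of vertices within `radius` hops of `source` (bounded BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if dist[v] == radius:
            continue  # do not expand beyond the allowed radius
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return set(dist)

def is_burning_sequence(adj, sequence):
    """True if every vertex is within distance k - i of some u_i (1-indexed)."""
    k = len(sequence)
    burned = set()
    for i, u in enumerate(sequence, start=1):
        burned |= ball(adj, u, k - i)
    return burned == set(adj)

# Path graph on four vertices: 0 - 1 - 2 - 3.
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
valid = is_burning_sequence(path4, [1, 3])
invalid = is_burning_sequence(path4, [0, 3])
```

For the path above, the sequence `[1, 3]` works because vertex 1 covers its two neighbours with radius 1 and vertex 3 covers itself with radius 0, whereas `[0, 3]` leaves vertex 2 uncovered.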
RedEx: Beyond Fixed Representation Methods via Convex Optimization
Authors: Authors: Amit Daniely, Mariano Schain, Gilad Yehudai
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2401.07606
Pdf link: https://arxiv.org/pdf/2401.07606
Abstract Optimizing Neural networks is a difficult task which is still not well understood. On the other hand, fixed representation methods such as kernels and random features have provable optimization guarantees but inferior performance due to their inherent inability to learn the representations. In this paper, we aim at bridging this gap by presenting a novel architecture called RedEx (Reduced Expander Extractor) that is as expressive as neural networks and can also be trained in a layer-wise fashion via a convex program with semi-definite constraints and optimization guarantees. We also show that RedEx provably surpasses fixed representation methods, in the sense that it can efficiently learn a family of target functions which fixed representation methods cannot.
Multi-Objective Optimization in STAR-RIS-Aided SWIPT with RSMA via Meta-Learning
Authors: Authors: Mojtaba Amiri, Elaheh Vaezpour, Sepideh Javadi, Mohammad Robat Mili, Halim Yanikomeroglu, Mehdi Bennis
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2401.07644
Pdf link: https://arxiv.org/pdf/2401.07644
Abstract Simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) is a cutting-edge concept for the sixth-generation (6G) wireless networks. In this letter, we propose a novel system that incorporates STAR-RIS with simultaneous wireless information and power transfer (SWIPT) using rate splitting multiple access (RSMA). The proposed system facilitates communication from a multi-antenna base station (BS) to single-antenna users in a downlink transmission. The BS concurrently sends energy and information signals to multiple energy harvesting receivers (EHRs) and information data receivers (IDRs) with the support of a deployed STAR-RIS. Furthermore, a multi-objective optimization is introduced to strike a balance between users' sum rate and the total harvested energy. To achieve this, an optimization problem is formulated to optimize the energy/information beamforming vectors at the BS, the phase shifts at the STAR-RIS, and the common message rate. Subsequently, we employ a meta deep deterministic policy gradient (Meta-DDPG) approach to solve the complex problem. Simulation results validate that the proposed algorithm significantly enhances both data rate and harvested energy in comparison to conventional DDPG.
Preserving Power Optimizations Across the High Level Synthesis of Distinct Application-Specific Circuits
Authors: Authors: Paulo Garcia
Subjects: Programming Languages (cs.PL); Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2401.07726
Pdf link: https://arxiv.org/pdf/2401.07726
Abstract We evaluate the use of software interpretation to push High Level Synthesis of application-specific accelerators toward a higher level of abstraction. Our methodology is supported by a formal power consumption model that computes the power consumption of accelerator components, accurately predicting the power consumption on new designs from prior optimization estimations. We demonstrate how our approach simplifies the re-use of power optimizations across distinct designs, by leveraging the higher level of design abstraction, using two accelerators representative of the robotics domain, implemented through the Bambu High Level Synthesis tool. Results support the research hypothesis, achieving predictions accurate within +/- 1%.
HexaGen3D: StableDiffusion is just one step away from Fast and Diverse Text-to-3D Generation
Authors: Authors: Antoine Mercier, Ramin Nakhli, Mahesh Reddy, Rajeev Yasarla, Hong Cai, Fatih Porikli, Guillaume Berger
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.07727
Pdf link: https://arxiv.org/pdf/2401.07727
Abstract Despite the latest remarkable advances in generative modeling, efficient generation of high-quality 3D assets from textual prompts remains a difficult task. A key challenge lies in data scarcity: the most extensive 3D datasets encompass merely millions of assets, while their 2D counterparts contain billions of text-image pairs. To address this, we propose a novel approach which harnesses the power of large, pretrained 2D diffusion models. More specifically, our approach, HexaGen3D, fine-tunes a pretrained text-to-image model to jointly predict 6 orthographic projections and the corresponding latent triplane. We then decode these latents to generate a textured mesh. HexaGen3D does not require per-sample optimization, and can infer high-quality and diverse objects from textual prompts in 7 seconds, offering significantly better quality-to-latency trade-offs when comparing to existing approaches. Furthermore, HexaGen3D demonstrates strong generalization to new objects or compositions.
Joint Probability Selection and Power Allocation for Federated Learning
Authors: Authors: Ouiame Marnissi, Hajar EL Hammouti, El Houcine Bergou
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.07756
Pdf link: https://arxiv.org/pdf/2401.07756
Abstract In this paper, we study the performance of federated learning over wireless networks, where devices with a limited energy budget train a machine learning model. The federated learning performance depends on the selection of the clients participating in the learning at each round. Most existing studies suggest deterministic approaches for the client selection, resulting in challenging optimization problems that are usually solved using heuristics, and therefore without guarantees on the quality of the final solution. We formulate a new probabilistic approach to jointly select clients and allocate power optimally so that the expected number of participating clients is maximized. To solve the problem, a new alternating algorithm is proposed, where at each step, the closed-form solutions for user selection probabilities and power allocations are obtained. Our numerical results show that the proposed approach achieves a significant performance in terms of energy consumption, completion time and accuracy as compared to the studied benchmarks.
Learning Soft Constrained MPC Value Functions: Efficient MPC Design and Implementation providing Stability and Safety Guarantees
Authors: Authors: Nicolas Chatzikiriakos, Kim P. Wabersich, Felix Berkel, Patricia Pauli, Andrea Iannelli
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2401.07780
Pdf link: https://arxiv.org/pdf/2401.07780
Abstract Model Predictive Control (MPC) can be applied to safety-critical control problems, providing closed-loop safety and performance guarantees. Implementation of MPC controllers requires solving an optimization problem at every sampling instant, which is challenging to execute on embedded hardware. To address this challenge, we propose a framework that combines a tightened soft constrained MPC formulation with supervised learning to approximate the MPC value function. This combination enables us to obtain a corresponding optimal control law, which can be implemented efficiently on embedded platforms. The framework ensures stability and constraint satisfaction for various nonlinear systems. While the design effort is similar to that of nominal MPC, the proposed formulation provides input-to-state stability (ISS) with respect to the approximation error of the value function. Furthermore, we prove that the value function corresponding to the soft constrained MPC problem is Lipschitz continuous for Lipschitz continuous systems, even if the optimal control law may be discontinuous. This serves two purposes: First, it allows relating approximation errors to a sufficiently large constraint tightening to obtain constraint satisfaction guarantees. Second, it paves the way for an efficient supervised learning procedure to obtain a continuous value function approximation. We demonstrate the effectiveness of the method using a nonlinear numerical example.
Certifiable Mutual Localization and Trajectory Planning for Bearing-Based Robot Swarm
Authors: Authors: Yingjian Wang, Xiangyong Wen, Fei Gao
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2401.07784
Pdf link: https://arxiv.org/pdf/2401.07784
Abstract Bearing measurements, as the most common modality in nature, have recently gained traction in multi-robot systems to enhance mutual localization and swarm collaboration. Despite their advantages, challenges such as sensory noise, obstacle occlusion, and uncoordinated swarm motion persist in real-world scenarios, potentially leading to erroneous state estimation and undermining the system's flexibility, practicality, and robustness. In response to these challenges, in this paper we address theoretical and practical problems related to both mutual localization and swarm planning. Firstly, we propose a certifiable mutual localization algorithm. It features a concise problem formulation coupled with lossless convex relaxation, enabling independence from initial values and globally optimal relative pose recovery. Then, to explore how detection noise and swarm motion influence estimation optimality, we conduct a comprehensive analysis on the interplay between robots' mutual spatial relationship and mutual localization. We develop a differentiable metric correlated with swarm trajectories to explicitly evaluate the noise resistance of optimal estimation. By establishing a finite and pre-computable threshold for this metric and accordingly generating swarm trajectories, the estimation optimality can be strictly guaranteed under arbitrary noise. Based on these findings, an optimization-based swarm planner is proposed to generate safe and smooth trajectories, with consideration of both inter-robot visibility and estimation optimality. Through numerical simulations, we evaluate the optimality and certifiability of our estimator, and underscore the significance of our planner in enhancing estimation performance. The results exhibit considerable potential of our methods to pave the way for advanced closed-loop intelligence in swarm systems.
Optimal experimental design via gradient flow
Authors: Authors: Ruhui Jin, Martin Guerra, Qin Li, Stephen Wright
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2401.07806
Pdf link: https://arxiv.org/pdf/2401.07806
Abstract Optimal experimental design (OED) has far-reaching impacts in many scientific domains. We study OED over a continuous-valued design space, a setting that occurs often in practice. Optimization of a distributional function over an infinite-dimensional probability measure space is conceptually distinct from the discrete OED tasks that are conventionally tackled. We propose techniques based on optimal transport and Wasserstein gradient flow. A practical computational approach is derived from the Monte Carlo simulation, which transforms the infinite-dimensional optimization problem to a finite-dimensional problem over Euclidean space, to which gradient descent can be applied. We discuss first-order criticality and study the convexity properties of the OED objective. We apply our algorithm to the tomography inverse problem, where the solution reveals optimal sensor placements for imaging.
Online Simulation at Machine Level: A Systematic Review
Authors: Authors: Darius Deubert, Lars Klingel, Andreas Selig
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2401.07841
Pdf link: https://arxiv.org/pdf/2401.07841
Abstract The importance of simulation at machine level in industrial environments is steadily increasing, especially in the design and commissioning phase. Using models during the operation phase together with the real machine or plant is referred to as online simulation. Online simulation is used for system monitoring, predictive analyses, decision support or online optimization and therefore has various advantages and a wide field of applications. This paper aims to characterize online simulation at machine level in industrial automation, focusing on key technologies and common applications. To this end, a set of 65 relevant publications focusing on this subject was identified by database search, expert consultation, and snowballing. As key technological aspects, the used model types, interfaces and platforms, and the aspects of initialization and synchronization are further investigated. The results are interpreted and limitations, knowledge gaps and future prospects are discussed. The potential of online simulation at machine level especially arises due to the increasing availability of component and machine models from the design and commissioning phase, which can be reused for online simulation. Remaining challenges are identified concerning implementation, simulation platforms, model maintenance and especially in the field of synchronization.
PATSMA: Parameter Auto-tuning for Shared Memory Algorithms
Authors: Authors: Joao B. Fernandes, Felipe H. S. da Silva, Samuel Xavier-de-Souza, Italo A. S. Assis
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2401.07861
Pdf link: https://arxiv.org/pdf/2401.07861
Abstract Programs with high levels of complexity often face challenges in adjusting execution parameters, particularly when these parameters vary based on the execution context. These dynamic parameters significantly impact the program's performance, such as loop granularity, which can vary depending on factors like the execution environment, program input, or the choice of compiler. Given the expensive nature of testing each case individually, one viable solution is to automate parameter adjustments using optimization methods. This article introduces PATSMA, a parameter auto-tuning tool that leverages Coupled Simulated Annealing (CSA) and Nelder-Mead (NM) optimization methods to fine-tune existing parameters in an application. We demonstrate how auto-tuning can contribute to the real-time optimization of parallel algorithms designed for shared memory systems. PATSMA is a C++ library readily available under the MIT license.
The Chronicles of RAG: The Retriever, the Chunk and the Generator
Authors: Authors: Paulo Finardi, Leonardo Avila, Rodrigo Castaldoni, Pedro Gengo, Celio Larcher, Marcos Piau, Pablo Costa, Vinicius Caridá
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2401.07883
Pdf link: https://arxiv.org/pdf/2401.07883
Abstract Retrieval Augmented Generation (RAG) has become one of the most popular paradigms for enabling LLMs to access external data, and also as a mechanism for grounding to mitigate against hallucinations. When implementing RAG you can face several challenges like effective integration of retrieval models, efficient representation learning, data diversity, computational efficiency optimization, evaluation, and quality of text generation. Given all these challenges, every day a new technique to improve RAG appears, making it unfeasible to experiment with all combinations for your problem. In this context, this paper presents good practices to implement, optimize, and evaluate RAG for the Brazilian Portuguese language, focusing on the establishment of a simple pipeline for inference and experiments. We explored a diverse set of methods to answer questions about the first Harry Potter book. To generate the answers we used OpenAI's gpt-4, gpt-4-1106-preview, gpt-3.5-turbo-1106, and Google's Gemini Pro. Focusing on the quality of the retriever, our approach achieved an improvement of MRR@10 by 35.4% compared to the baseline. When optimizing the input size in the application, we observed that it is possible to further enhance it by 2.4%. Finally, we present the complete architecture of the RAG with our recommendations. As a result, we moved from a baseline of 57.88% to a maximum relative score of 98.61%.
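For readers unfamiliar with the MRR@10 metric quoted in this abstract, the following is a minimal sketch of how mean reciprocal rank at cutoff k is commonly computed. The function names and the toy queries are hypothetical illustrations; the paper's own evaluation code and data are not reproduced here.

```python
def reciprocal_rank(ranked_ids, relevant_ids, k=10):
    """Return 1/rank of the first relevant item within the top k, else 0.0."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mrr_at_k(runs, k=10):
    """Average the reciprocal rank over (ranked_ids, relevant_ids) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel, k) for r, rel in runs) / len(runs)


# Toy data (hypothetical): three queries with retrieved chunk ids.
runs = [
    (["a", "b", "c"], {"b"}),  # first relevant hit at rank 2
    (["x", "y"], {"x"}),       # first relevant hit at rank 1
    (["p", "q"], {"z"}),       # no relevant hit in the top k
]
score = mrr_at_k(runs, k=10)
print(score)
```

A retriever change is then compared by computing the same score on the same queries before and after the change.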
6-DoF Grasp Pose Evaluation and Optimization via Transfer Learning from NeRFs
Authors: Authors: Gergely Sóti, Xi Huang, Christian Wurll, Björn Hein
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2401.07935
Pdf link: https://arxiv.org/pdf/2401.07935
Abstract We address the problem of robotic grasping of known and unknown objects using implicit behavior cloning. We train a grasp evaluation model from a small number of demonstrations that outputs higher values for grasp candidates that are more likely to succeed in grasping. This evaluation model serves as an objective function that we maximize to identify successful grasps. Key to our approach is the utilization of learned implicit representations of visual and geometric features derived from a pre-trained NeRF. Though trained exclusively in a simulated environment with simplified objects and 4-DoF top-down grasps, our evaluation model and optimization procedure demonstrate generalization to 6-DoF grasps and novel objects both in simulation and in real-world settings, without the need for additional data. Supplementary material is available at: https://gergely-soti.github.io/grasp
Playing the MEV Game on a First-Come-First-Served Blockchain
Authors: Authors: Burak Öz, Jonas Gebele, Parshant Singh, Filip Rezabek, Florian Matthes
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2401.07992
Pdf link: https://arxiv.org/pdf/2401.07992
Abstract Maximal Extractable Value (MEV) searching has gained prominence on the Ethereum blockchain since the surge in Decentralized Finance activities. In Ethereum, MEV extraction primarily hinges on fee payments to block proposers. However, in First-Come-First-Served (FCFS) blockchain networks, the focus shifts to latency optimizations, akin to High-Frequency Trading in Traditional Finance. This paper illustrates the dynamics of the MEV extraction game in an FCFS network, specifically Algorand. We introduce an arbitrage detection algorithm tailored to the unique time constraints of FCFS networks and assess its effectiveness. Additionally, our experiments investigate potential optimizations in Algorand's network layer to secure optimal execution positions. Our analysis reveals that while the states of relevant trading pools are updated approximately every six blocks on median, pursuing MEV at the block state level is not viable on Algorand, as arbitrage opportunities are typically executed within the blocks they appear. Our algorithm's performance under varying time constraints underscores the importance of timing in arbitrage discovery. Furthermore, our network-level experiments identify critical transaction prioritization strategies for Algorand's FCFS network. Key among these is reducing latency in connections with relays that are well-connected to high-staked proposers.
Hardware Acceleration for Real-Time Wildfire Detection Onboard Drone Networks
Authors: Authors: Austin Briley, Fatemeh Afghah
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2401.08105
Pdf link: https://arxiv.org/pdf/2401.08105
Abstract Early wildfire detection in remote and forest areas is crucial for minimizing devastation and preserving ecosystems. Autonomous drones offer agile access to remote, challenging terrains, equipped with advanced imaging technology that delivers both high-temporal and detailed spatial resolution, making them valuable assets in the early detection and monitoring of wildfires. However, the limited computation and battery resources of Unmanned Aerial Vehicles (UAVs) pose significant challenges in implementing robust and efficient image classification models. Current works in this domain often operate offline, emphasizing the need for solutions that can perform inference in real time, given the constraints of UAVs. To address these challenges, this paper aims to develop a real-time image classification and fire segmentation model. It presents a comprehensive investigation into hardware acceleration using the Jetson Nano P3450 and the implications of TensorRT, NVIDIA's high-performance deep-learning inference library, on fire classification accuracy and speed. The study includes implementations of Quantization Aware Training (QAT), Automatic Mixed Precision (AMP), and post-training mechanisms, comparing them against the latest baselines for fire segmentation and classification. All experiments utilize the FLAME dataset - an image dataset collected by low-altitude drones during a prescribed forest fire. This work contributes to the ongoing efforts to enable real-time, on-board wildfire detection capabilities for UAVs, addressing speed and the computational and energy constraints of these crucial monitoring systems. The results show a 13% increase in classification speed compared to similar models without hardware optimization. Comparatively, loss and accuracy are within 1.225% of the original values.
Operation Scheme Optimizations to Achieve Ultra-high Endurance ($10^{10}$) in Flash Memory with Robust Reliabilities
Authors: Authors: Yang Feng, Zhaohui Sun, Chengcheng Wang, Xinyi Guo, Junyao Mei, Yueran Qi, Jing Liu, Junyu Zhang, Jixuan Wu, Xuepeng Zhan, Jiezhi Chen
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2401.08120
Pdf link: https://arxiv.org/pdf/2401.08120
Abstract Flash memory has been widely adopted as stand-alone memory and embedded memory due to its robust reliability. However, its limited endurance hinders further applications in storage class memory (SCM) and in endurance-required computing-in-memory (CIM) tasks. In this work, optimization strategies are studied to address this concern. It is shown that by adopting channel hot electron injection (CHEI) and hot hole injection (HHI) to implement program/erase (PE) cycling together with a balanced memory window (MW) at the high-Vth (HV) mode, the endurance can be greatly extended to $10^{10}$ PE cycles, which is a record-high value in flash memory. Moreover, by using the proposed electric-field-assisted relaxation (EAR) scheme, the degradation of flash cells can be well suppressed with better subthreshold swings (SS) and lower leakage currents (sub-10pA after $10^{10}$ PE cycles). Our results shed light on the optimization strategy of flash memory to serve as SCM and implement endurance-required CIM tasks.
Distributed Stackelberg Equilibrium Seeking for Networked Multi-Leader Multi-Follower Games with A Clustered Information Structure
Authors: Authors: Yue Chen, Peng Yi
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2401.08144
Pdf link: https://arxiv.org/pdf/2401.08144
Abstract The Stackelberg game depicts a leader-follower relationship wherein decisions are made sequentially, and the Stackelberg equilibrium represents an expected optimal solution when the leader can anticipate the rational response of the follower. Motivated by control of network systems with two levels of decision-making hierarchy, such as the management of energy networks and power coordination in cellular networks, a networked multi-leader multi-follower Stackelberg game is proposed. Due to the constraint of limited information interaction among players, a clustered information structure is assumed in which each leader can only communicate with a portion of the overall followers, namely its subordinated followers, and with its local neighboring leaders. In this case, the leaders cannot fully anticipate the collective rational response of all followers with their local information. To address Stackelberg equilibrium seeking under this partial information structure, we propose a distributed seeking algorithm based on implicit gradient estimation and network consensus mechanisms. We rigorously prove the convergence of the algorithm for both diminishing and constant step sizes under strict and strong monotonicity conditions, respectively. Furthermore, the model and the algorithm can also incorporate linear equality and inequality constraints into the followers' optimization problems, with the approach of the interior point barrier function. Finally, we present numerical simulations in applications to corroborate our claims on the proposed framework.
Learning Stable Koopman Embeddings for Identification and Control
Authors: Authors: Fletcher Fan, Bowen Yi, David Rye, Guodong Shi, Ian R. Manchester
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2401.08153
Pdf link: https://arxiv.org/pdf/2401.08153
Abstract This paper introduces new model parameterizations for learning dynamical systems from data via the Koopman operator, and studies their properties. Whereas most existing works on Koopman learning do not take into account the stability or stabilizability of the model -- two fundamental pieces of prior knowledge about a given system to be identified -- in this paper, we propose new classes of Koopman models that have built-in guarantees of these properties. These models are guaranteed to be stable or stabilizable via a novel direct parameterization approach that leads to unconstrained optimization problems with respect to their parameter sets. To explore the representational flexibility of these model sets, we establish novel theoretical connections between the stability of discrete-time Koopman embedding and contraction-based forms of nonlinear stability and stabilizability. The proposed approach is illustrated in applications to stable nonlinear system identification and imitation learning via stabilizable models. Simulation results empirically show that the learning approaches based on the proposed models outperform prior methods lacking stability guarantees.
PRewrite: Prompt Rewriting with Reinforcement Learning
Authors: Authors: Weize Kong, Spurthi Amba Hombaiah, Mingyang Zhang, Qiaozhu Mei, Michael Bendersky
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.08189
Pdf link: https://arxiv.org/pdf/2401.08189
Abstract Prompt engineering is critical for the development of LLM-based applications. However, it is usually done manually in a "trial and error" fashion. This manual procedure can be time consuming, ineffective, and the generated prompts are, in a lot of cases, sub-optimal. Even for the prompts which seemingly work well, there is always a lingering question: can the prompts be made better with further modifications? To address these questions, in this paper, we investigate prompt engineering automation. We consider a specific use case scenario in which developers/users have drafted initial prompts, but lack the time/expertise to optimize them. We propose PRewrite, an automated tool to rewrite these drafts and to generate highly effective new prompts. PRewrite is based on the Reinforcement Learning (RL) framework which allows for end-to-end optimization and our design allows the RL search to happen in a large action space. The automated tool leverages manually crafted prompts as starting points which makes the rewriting procedure more guided and efficient. The generated prompts are human readable, and self-explanatory, unlike some of those in previous works. We conducted extensive experiments on diverse datasets and found that the prompts generated with this new method not only outperform professionally crafted prompts, but also prompts generated with other previously proposed methods.
Efficient and Mathematically Robust Operations for Certified Neural Networks Inference
Authors: Authors: Fabien Geyer, Johannes Freitag, Tobias Schulz, Sascha Uhrig
Subjects: Machine Learning (cs.LG); Hardware Architecture (cs.AR)
Arxiv link: https://arxiv.org/abs/2401.08225
Pdf link: https://arxiv.org/pdf/2401.08225
Abstract In recent years, machine learning (ML) and neural networks (NNs) have gained widespread use and attention across various domains, particularly in transportation for achieving autonomy, including the emergence of flying taxis for urban air mobility (UAM). However, concerns about certification have come up, compelling the development of standardized processes encompassing the entire ML and NN pipeline. This paper delves into the inference stage and the requisite hardware, highlighting the challenges associated with IEEE 754 floating-point arithmetic and proposing alternative number representations. By evaluating diverse summation and dot product algorithms, we aim to mitigate issues related to non-associativity. Additionally, our exploration of fixed-point arithmetic reveals its advantages over floating-point methods, demonstrating significant hardware efficiencies. Employing an empirical approach, we ascertain the optimal bit-width necessary to attain an acceptable level of accuracy, considering the inherent complexity of bit-width optimization.
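The non-associativity issue mentioned in this abstract can be illustrated with a short sketch. It shows that IEEE 754 double addition depends on grouping, that a compensated (Kahan) summation reduces the accumulated error, and that a fixed-point sum over integers is independent of order. The helper names and the 16-bit fractional format are assumptions chosen for illustration and are not the algorithms or bit-widths evaluated in the paper.

```python
def naive_sum(xs):
    """Left-to-right floating-point accumulation."""
    total = 0.0
    for x in xs:
        total += x
    return total


def kahan_sum(xs):
    """Compensated summation: carry the rounding error of each addition."""
    total = 0.0
    comp = 0.0  # running estimate of the lost low-order bits
    for x in xs:
        y = x - comp
        t = total + y
        comp = (t - total) - y
        total = t
    return total


def to_fixed(x, frac_bits=16):
    """Quantize a real value to an integer with frac_bits fractional bits."""
    return round(x * (1 << frac_bits))


def fixed_sum(xs, frac_bits=16):
    """Sum in fixed point; integer addition is exact and associative."""
    return sum(to_fixed(x, frac_bits) for x in xs)


left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # grouping changes the floating-point result

values = [0.1] * 10
print(naive_sum(values), kahan_sum(values))

weights = [0.75, -1.5, 3.125, 0.0625]
print(fixed_sum(weights) == fixed_sum(list(reversed(weights))))
```

The fixed-point variant trades quantization error at conversion time for a result that does not depend on summation order, which is the property that matters for reproducible inference.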
Phase-free Dynamic Movement Primitives Applied to Kinesthetic Guidance in Robotic Co-manipulation Tasks
Authors: Authors: Giovanni Braglia, Davide Tebaldi, Luigi Biagiotti
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2401.08238
Pdf link: https://arxiv.org/pdf/2401.08238
Abstract When there is a need to define and adapt a robotic task based on a reference motion, Dynamic Movement Primitives (DMP) is a standard and efficient method for encoding it. The nominal trajectory is typically obtained through a Programming by Demonstration (PbD) approach, where the robot is taught a specific task through kinesthetic guidance. Subsequently, the motion is reproduced by the manipulator in terms of both geometric path and timing law. The basic approach for modifying the duration of the execution involves adjusting a time constant characterizing the model. On the contrary, the goal of this paper is to achieve complete decoupling between the geometric information of the task, encoded into the DMP, and the phase law governing the execution, allowing them to be chosen independently. This enables the optimization of the task duration to satisfy constraints such as velocity or acceleration or even to define a phase law dependent on external inputs, such as the force applied by a user in a co-manipulation task. As an example, this mechanism will be exploited to define a rehabilitation activity where the cobot assists humans in performing various pre-planned exercises.
Optimizing $k$ in $k$NN Graphs with Graph Learning Perspective
Authors: Authors: Asuka Tamaru, Junya Hara, Hiroshi Higashi, Yuichi Tanaka, Antonio Ortega
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2401.08245
Pdf link: https://arxiv.org/pdf/2401.08245
Abstract In this paper, we propose a method, based on graph signal processing, to optimize the choice of $k$ in $k$-nearest neighbor graphs ($k$NNGs). $k$NN is one of the most popular approaches and is widely used in machine learning and signal processing. The parameter $k$ represents the number of neighbors that are connected to the target node; however, its appropriate selection is still a challenging problem. Therefore, most $k$NNGs use ad hoc selection methods for $k$. In the proposed method, we assume that a different $k$ can be chosen for each node. We formulate a discrete optimization problem to seek the best $k$ with a constraint on the sum of distances of the connected nodes. The optimal $k$ values are efficiently obtained without solving a complex optimization. Furthermore, we reveal that the proposed method is closely related to existing graph learning methods. In experiments on real datasets, we demonstrate that the $k$NNGs obtained with our method are sparse and can determine an appropriate variable number of edges per node. We validate the effectiveness of the proposed method for point cloud denoising, comparing our denoising performance with achievable graph construction methods that can be scaled to typical point cloud sizes (e.g., thousands of nodes).
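To make the idea of a per-node $k$ under a distance budget concrete, here is a minimal sketch. It uses a simple greedy rule (add nearest neighbors until the summed distance would exceed a budget), which is only one hypothetical reading of the constraint described in the abstract and should not be taken as the authors' formulation. The function name, the `budget` parameter, and the toy points are assumptions.

```python
import math


def variable_knn_graph(points, budget, k_max=None):
    """Build a directed neighbor list with a node-dependent k.

    For each node, neighbors are added in order of increasing distance
    while the running sum of distances stays within `budget`.
    """
    n = len(points)
    edges = {}
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        chosen, total = [], 0.0
        for d, j in dists[:k_max]:  # k_max=None keeps all candidates
            if total + d > budget:
                break
            chosen.append(j)
            total += d
        edges[i] = chosen
    return edges


# Toy data (hypothetical): three clustered points and one outlier.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
edges = variable_knn_graph(points, budget=3.0)
for node, nbrs in edges.items():
    print(node, nbrs)
```

In this toy case the clustered nodes receive two neighbors each while the outlier receives none, so the resulting graph is sparser where points are far apart.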
The Faiss library
Authors: Authors: Matthijs Douze, Alexandr Guzhva, Chengqi Deng, Jeff Johnson, Gergely Szilvasy, Pierre-Emmanuel Mazaré, Maria Lomeli, Lucas Hosseini, Hervé Jégou
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2401.08281
Pdf link: https://arxiv.org/pdf/2401.08281
Abstract Vector databases manage large collections of embedding vectors. As AI applications are growing rapidly, so is the number of embeddings that need to be stored and indexed. The Faiss library is dedicated to vector similarity search, a core functionality of vector databases. Faiss is a toolkit of indexing methods and related primitives used to search, cluster, compress and transform vectors. This paper first describes the tradeoff space of vector search, then the design principles of Faiss in terms of structure, approach to optimization and interfacing. We benchmark key features of the library and discuss a few selected applications to highlight its broad applicability.
Boosting Gradient Ascent for Continuous DR-submodular Maximization
Authors: Authors: Qixin Zhang, Zongqi Wan, Zengde Deng, Zaiyi Chen, Xiaoming Sun, Jialin Zhang, Yu Yang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2401.08330
Pdf link: https://arxiv.org/pdf/2401.08330
Abstract Projected Gradient Ascent (PGA) is the most commonly used optimization scheme in machine learning and operations research areas. Nevertheless, numerous studies and examples have shown that the PGA methods may fail to achieve the tight approximation ratio for continuous DR-submodular maximization problems. To address this challenge, we present a boosting technique in this paper, which can efficiently improve the approximation guarantee of the standard PGA to optimal with only small modifications on the objective function. The fundamental idea of our boosting technique is to exploit non-oblivious search to derive a novel auxiliary function $F$, whose stationary points are excellent approximations to the global maximum of the original DR-submodular objective $f$. Specifically, when $f$ is monotone and $\gamma$-weakly DR-submodular, we propose an auxiliary function $F$ whose stationary points can provide a better $(1-e^{-\gamma})$-approximation than the $(\gamma^2/(1+\gamma^2))$-approximation guaranteed by the stationary points of $f$ itself. Similarly, for the non-monotone case, we devise another auxiliary function $F$ whose stationary points can achieve an optimal $\frac{1-\min_{\boldsymbol{x}\in\mathcal{C}}\|\boldsymbol{x}\|_{\infty}}{4}$-approximation guarantee where $\mathcal{C}$ is a convex constraint set. In contrast, the stationary points of the original non-monotone DR-submodular function can be arbitrarily bad~\citep{chen2023continuous}. Furthermore, we demonstrate the scalability of our boosting technique on four problems. In all of these four problems, our resulting variants of boosting PGA algorithm beat the previous standard PGA in several aspects such as approximation ratio and efficiency. Finally, we corroborate our theoretical findings with numerical experiments, which demonstrate the effectiveness of our boosting PGA methods.
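As background for this abstract, the sketch below shows the standard (non-boosted) projected gradient ascent step on a box constraint. The objective is a toy quadratic $f(x) = b^\top x - \tfrac{1}{2} x^\top A x$ with entrywise nonnegative $A$, which makes it DR-submodular; the chosen $A$ also happens to be positive definite, so this instance is concave as well. The auxiliary function $F$ from the paper is not implemented, and all names, the step size, and the data are assumptions.

```python
def objective(x, A, b):
    """f(x) = b.x - 0.5 * x^T A x."""
    n = len(x)
    linear = sum(b[i] * x[i] for i in range(n))
    quad = sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    return linear - 0.5 * quad


def gradient(x, A, b):
    """Gradient of f for symmetric A: b - A x."""
    n = len(x)
    return [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]


def project_box(x, lo=0.0, hi=1.0):
    """Euclidean projection onto the box [lo, hi]^n (coordinate-wise clip)."""
    return [min(hi, max(lo, v)) for v in x]


def projected_gradient_ascent(A, b, x0, step=0.1, iters=200):
    """Repeat: take a gradient step uphill, then project back onto the box."""
    x = project_box(x0)
    for _ in range(iters):
        g = gradient(x, A, b)
        x = project_box([xi + step * gi for xi, gi in zip(x, g)])
    return x


# Toy instance (hypothetical): nonnegative symmetric A, so f is DR-submodular.
A = [[1.0, 0.5], [0.5, 1.0]]
b = [1.0, 1.0]
x0 = [0.0, 0.0]
x_star = projected_gradient_ascent(A, b, x0)
print(x_star, objective(x_star, A, b))
```

Here the iterates approach the stationary point $x = (2/3, 2/3)$, which lies inside the box. For general DR-submodular objectives a stationary point of $f$ can be far from the global maximum, which is the gap the abstract's boosting technique targets.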
Adjoint Monte Carlo Method
Authors: Authors: Russel Caflisch, Yunan Yang
Subjects: Numerical Analysis (math.NA); Computational Physics (physics.comp-ph)
Arxiv link: https://arxiv.org/abs/2401.08361
Pdf link: https://arxiv.org/pdf/2401.08361
Abstract This survey explores the development of adjoint Monte Carlo methods for solving optimization problems governed by kinetic equations, a common challenge in areas such as plasma control and device design. These optimization problems are particularly demanding due to the high dimensionality of the phase space and the randomness in evaluating the objective functional, a consequence of using a forward Monte Carlo solver. To overcome these difficulties, a range of ``adjoint Monte Carlo methods'' have been devised. These methods skillfully combine Monte Carlo gradient estimators with PDE-constrained optimization, introducing innovative solutions tailored for kinetic applications. In this review, we begin by examining three primary strategies for Monte Carlo gradient estimation: the score function approach, the reparameterization trick, and the coupling method. We also delve into the adjoint-state method, an essential element in PDE-constrained optimization. Focusing on applications in the radiative transfer equation and the nonlinear Boltzmann equation, we provide a comprehensive guide on how to integrate Monte Carlo gradient techniques within both the optimize-then-discretize and the discretize-then-optimize frameworks from PDE-constrained optimization. This approach leads to the formulation of effective adjoint Monte Carlo methods, enabling efficient gradient estimation in complex, high-dimensional optimization problems.
Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference
Authors: Authors: Jinghan Yao, Quentin Anthony, Aamir Shafi, Hari Subramoni, Dhabaleswar K. (DK) Panda
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2401.08383
Pdf link: https://arxiv.org/pdf/2401.08383
Abstract In large language models like the Generative Pre-trained Transformer, the Mixture of Experts paradigm has emerged as a powerful technique for enhancing model expressiveness and accuracy. However, deploying GPT MoE models for parallel inference on distributed systems presents significant challenges, primarily due to the extensive Alltoall communication required for expert routing and aggregation. This communication bottleneck exacerbates the already complex computational landscape, hindering the efficient utilization of high-performance computing resources. In this paper, we propose a lightweight optimization technique called ExFlow to substantially accelerate the inference of these MoE models. We take a new perspective on alleviating the communication overhead by exploiting the inter-layer expert affinity. Unlike previous methods, our solution can be directly applied to pre-trained MoE models without any fine-tuning or accuracy degradation. By proposing a context-coherent expert parallelism on distributed systems, our design only uses one Alltoall communication to deliver the same functionality while previous methods all require two Alltoalls. By carefully examining the conditional probability in tokens' routing across multiple layers, we proved that pre-trained GPT MoE models implicitly exhibit a strong inter-layer expert affinity. We then design an efficient integer programming model to capture such features and show that by properly placing the experts on corresponding GPUs, we can reduce cross-GPU routing latency by up to 67%. Our solution beats the cutting-edge MoE implementations with 8 to 64 experts, with up to 2.2x improvement in inference throughput. We further provide a detailed study of how the model implicitly acquires this expert affinity at the very early training stage and how this affinity evolves and stabilizes during training.
High-Quality Mesh Blendshape Generation from Face Videos via Neural Inverse Rendering
Authors: Authors: Xin Ming, Jiawei Li, Jingwang Ling, Libo Zhang, Feng Xu
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.08398
Pdf link: https://arxiv.org/pdf/2401.08398
Abstract Readily editable mesh blendshapes have been widely used in animation pipelines, while recent advancements in neural geometry and appearance representations have enabled high-quality inverse rendering. Building upon these observations, we introduce a novel technique that reconstructs mesh-based blendshape rigs from single or sparse multi-view videos, leveraging state-of-the-art neural inverse rendering. We begin by constructing a deformation representation that parameterizes vertex displacements into differential coordinates with tetrahedral connections, allowing for high-quality vertex deformation on high-resolution meshes. By constructing a set of semantic regulations in this representation, we achieve joint optimization of blendshapes and expression coefficients. Furthermore, to enable a user-friendly multi-view setup with unsynchronized cameras, we propose a neural regressor to model time-varying motion parameters. This approach implicitly considers the time difference across multiple cameras, enhancing the accuracy of motion modeling. Experiments demonstrate that, with the flexible input of single or sparse multi-view videos, we reconstruct personalized high-fidelity blendshapes. These blendshapes are both geometrically and semantically accurate, and they are compatible with industrial animation pipelines. Code and data will be released.
Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation
Authors: Authors: Haoran Xu, Amr Sharaf, Yunmo Chen, Weiting Tan, Lingfeng Shen, Benjamin Van Durme, Kenton Murray, Young Jin Kim
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2401.08417
Pdf link: https://arxiv.org/pdf/2401.08417
Abstract Moderate-sized large language models (LLMs) -- those with 7B or 13B parameters -- exhibit promising machine translation (MT) performance. However, even the top-performing 13B LLM-based translation models, like ALMA, do not match the performance of state-of-the-art conventional encoder-decoder translation models or larger-scale LLMs such as GPT-4. In this study, we bridge this performance gap. We first assess the shortcomings of supervised fine-tuning (SFT) for LLMs in the MT task, emphasizing the quality issues present in the reference data, despite being human-generated. Then, in contrast to SFT which mimics reference translations, we introduce Contrastive Preference Optimization (CPO), a novel approach that trains models to avoid generating adequate but not perfect translations. Applying CPO to ALMA models with only 22K parallel sentences and 12M parameters yields significant improvements. The resulting model, called ALMA-R, can match or exceed the performance of the WMT competition winners and GPT-4 on WMT'21, WMT'22 and WMT'23 test datasets.
Centralized vs. Decoupled Dual-Arm Planning Taking into Account Path Quality
Authors: Jonas Wittmann, Franziska Ochsenfarth, Valentin Sonneville, Daniel Rixen
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2401.08443
Pdf link: https://arxiv.org/pdf/2401.08443
Abstract The aim of coordinated planning is to avoid robot-to-robot collisions in a multi-robot system, and there are two standard solution approaches: centralized planning and decoupled planning. Our first contribution is a decoupled planning approach that ensures C2-continuous control commands with zero velocities at the start and goal. We benchmark our decoupled approach with a centralized approach. Contrary to literature, we show that for a standard motion planning pipeline, such as the one used by MoveIt!, centralized planning is superior to decoupled planning in dual-arm manipulation: It has a lower computation time and a higher robustness. Our second contribution is an optimization that minimizes the rotational motion of an end-effector while considering obstacle avoidance. We derive the analytic gradients of this optimization problem, making the algorithm suitable for online motion planning. Our optimization extends an existing path quality improvement method. Integrating it into our decoupled approach overcomes its shortcomings and provides a motion planning pipeline that is robust at up to 99.9% with a planning time of less than 1s and that computes high-quality paths.
Revealing the Hidden Impact of Top-N Metrics on Optimization in Recommender Systems
Authors: Lukas Wegmeth, Tobias Vente, Lennart Purucker
Subjects: Information Retrieval (cs.IR)
Arxiv link: https://arxiv.org/abs/2401.08444
Pdf link: https://arxiv.org/pdf/2401.08444
Abstract The hyperparameters of recommender systems for top-n predictions are typically optimized to enhance the predictive performance of algorithms. Thereby, the optimization algorithm, e.g., grid search or random search, searches for the best hyperparameter configuration according to an optimization-target metric, like nDCG or Precision. In contrast, the optimized algorithm internally optimizes a different loss function during training, like squared error or cross-entropy. To tackle this discrepancy, recent work has focused on generating loss functions better suited for recommender systems. Yet, when evaluating an algorithm using a top-n metric during optimization, another discrepancy between the optimization-target metric and the training loss has so far been ignored. During optimization, the top-n items are selected for computing a top-n metric, ignoring that the top-n items are selected from the recommendations of a model trained with an entirely different loss function. Item recommendations suitable for optimization-target metrics could be outside the top-n recommended items, with a hidden impact on the optimization performance. Therefore, we were motivated to analyze whether the top-n items are optimal for optimization-target top-n metrics. In pursuit of an answer, we exhaustively evaluate the predictive performance of 250 selection strategies besides selecting the top-n. We extensively evaluate each selection strategy over twelve implicit feedback and eight explicit feedback data sets with eleven recommender systems algorithms. Our results show that there exist selection strategies other than top-n that increase predictive performance for various algorithms and recommendation domains. However, the performance of the top ~43% of selection strategies is not significantly different. We discuss the impact of our findings on optimization and re-ranking in recommender systems and feasible solutions.
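To make the top-n metrics named in this abstract concrete, here is a minimal Python sketch of Precision@n and binary-relevance nDCG@n, applied to the standard top-n selection and to one alternative selection. The item scores, the relevant set, and the alternative selection are hypothetical illustrations and are not taken from the paper; the paper's 250 selection strategies are not reproduced here.

```python
import math

def precision_at_n(ranked_items, relevant, n):
    """Fraction of the first n ranked items that are relevant."""
    top = ranked_items[:n]
    return sum(1 for item in top if item in relevant) / n

def ndcg_at_n(ranked_items, relevant, n):
    """Binary-relevance nDCG: DCG of the first n items divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, item in enumerate(ranked_items[:n]) if item in relevant)
    ideal_hits = min(len(relevant), n)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical model scores; a higher score means a stronger recommendation.
scores = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.6, "e": 0.5}
relevant = {"a", "d", "e"}  # hypothetical ground-truth items for one user

ranking = sorted(scores, key=scores.get, reverse=True)
top_n = ranking[:3]              # standard top-n selection by model score
alternative = ["a", "d", "e"]    # a hypothetical non-top-n selection of 3 items

p_top = precision_at_n(top_n, relevant, 3)
p_alt = precision_at_n(alternative, relevant, 3)
ndcg_top = ndcg_at_n(top_n, relevant, 3)
ndcg_alt = ndcg_at_n(alternative, relevant, 3)
```

In this toy case the alternative selection scores higher on both metrics than the score-ordered top-n, which is the kind of discrepancy the abstract describes; whether that happens in practice depends on the model and data.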
Instilling Multi-round Thinking to Text-guided Image Generation
Authors: Lidong Zeng, Zhedong Zheng, Yinwei Wei, Tat-seng Chua
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.08472
Pdf link: https://arxiv.org/pdf/2401.08472
Abstract In this paper, we study the text-guided image generation task. Our focus lies in the modification of a reference image, given user text feedback, to imbue it with specific desired properties. Despite recent strides in this field, a persistent challenge remains that single-round optimization often overlooks crucial details, particularly in the realm of fine-grained changes like shoes or sleeves. This misalignment accumulation significantly hampers multi-round customization during interaction. In an attempt to address this challenge, we introduce a new self-supervised regularization into the existing framework, i.e., multi-round regularization. It builds upon the observation that the modification order does not affect the final result. As the name suggests, the multi-round regularization encourages the model to maintain consistency across different modification orders. Specifically, our proposed approach addresses the issue where an initial failure to capture fine-grained details leads to substantial discrepancies after multiple rounds, as opposed to traditional one-round learning. Both qualitative and quantitative experiments show the proposed method achieves high-fidelity generation quality over the text-guided generation task, especially for local modifications. Furthermore, we extend the evaluation to semantic alignment with text by applying our method to text-guided retrieval datasets, such as FashionIQ, where it demonstrates competitive performance.
Real-Time Dynamic Layout Optimization for Floating Offshore Wind Farm Control
Authors: Timothé Jard, Reda Snaiki
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2401.08484
Pdf link: https://arxiv.org/pdf/2401.08484
Abstract Downstream wind turbines operating behind upstream turbines face significant performance challenges due to reduced wind speeds and increased turbulence. This leads to decreased wind energy production and higher dynamic loads on downwind turbines. Consequently, real-time monitoring and control have become crucial for improving wind farm performance. One promising solution involves optimizing wind farm layouts in real-time, taking advantage of the added flexibility offered by floating offshore wind turbines (FOWTs). This study explores a dynamic layout optimization strategy to minimize wake effects in wind farms while meeting power requirements. Two scenarios are considered: power maximization and power set-point tracking. The methodology involves a centralized wind farm controller optimizing the layout, followed by wind turbine controllers to meet the prescribed targets. Each FOWT employs model predictive control to adjust aerodynamic thrust force. The control strategy integrates a dynamic wind farm model that considers floating platform motion and wake transport in changing wind conditions. In a case study with a 1x3 wind farm layout of 5 MW FOWTs, the results show a 25% increase in stable energy production compared to a static layout in one hour for the first scenario. In the second scenario, desired power production was swiftly and consistently achieved.
Battery-Swapping Multi-Agent System for Sustained Operation of Large Planetary Fleets
Authors: Ethan Holand, Jarrod Homer, Alex Storrer, Musheeera Khandeker, Ethan F. Muhlon, Maulik Patel, Ben-oni Vainqueur, David Antaki, Naomi Cooke, Chloe Wilson, Bahram Shafai, Nathaniel Hanson, Taşkın Padır
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2401.08497
Pdf link: https://arxiv.org/pdf/2401.08497
Abstract We propose a novel, heterogeneous multi-agent architecture that miniaturizes rovers by outsourcing power generation to a central hub. By delegating power generation and distribution functions to this hub, the size, weight, power, and cost (SWAP-C) per rover are reduced, enabling efficient fleet scaling. As these rovers conduct mission tasks around the terrain, the hub charges an array of replacement battery modules. When a rover requires charging, it returns to the hub to initiate an autonomous docking sequence and exits with a fully charged battery. This confers an advantage over direct charging methods, such as wireless or wired charging, by replenishing a rover in minutes as opposed to hours, increasing net rover uptime. This work shares an open-source platform developed to demonstrate battery swapping on unknown field terrain. We detail our design methodologies utilized for increasing system reliability, with a focus on optimization, robust mechanical design, and verification. Optimization of the system is discussed, including the design of passive guide rails through simulation-based optimization methods which increase the valid docking configuration space by 258%. The full system was evaluated during integrated testing, where an average servicing time of 98 seconds was achieved on surfaces with a gradient up to 10°. We conclude by briefly proposing flight considerations for advancing the system toward a space-ready design. In sum, this prototype represents a proof of concept for autonomous docking and battery transfer on field terrain, advancing its Technology Readiness Level (TRL) from 1 to 3.
Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering
Authors: Tal Ridnik, Dedy Kredo, Itamar Friedman
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2401.08500
Pdf link: https://arxiv.org/pdf/2401.08500
Abstract Code generation problems differ from common natural language problems - they require matching the exact syntax of the target language, identifying happy paths and edge cases, paying attention to numerous small details in the problem spec, and addressing other code-specific issues and requirements. Hence, many of the optimizations and tricks that have been successful in natural language generation may not be effective for code tasks. In this work, we propose a new approach to code generation by LLMs, which we call AlphaCodium - a test-based, multi-stage, code-oriented iterative flow that improves the performance of LLMs on code problems. We tested AlphaCodium on a challenging code generation dataset called CodeContests, which includes competitive programming problems from platforms such as Codeforces. The proposed flow consistently and significantly improves results. On the validation set, for example, GPT-4 accuracy (pass@5) increased from 19% with a single well-designed direct prompt to 44% with the AlphaCodium flow. Many of the principles and best practices acquired in this work, we believe, are broadly applicable to general code generation tasks. Full implementation is available at: https://github.com/Codium-ai/AlphaCodium
X Hacking: The Threat of Misguided AutoML
Authors: Rahul Sharma, Sergey Redyuk, Sumantrak Mukherjee, Andrea Sipka, Sebastian Vollmer, David Selby
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2401.08513
Pdf link: https://arxiv.org/pdf/2401.08513
Abstract Explainable AI (XAI) and interpretable machine learning methods help to build trust in model predictions and derived insights, yet also present a perverse incentive for analysts to manipulate XAI metrics to support pre-specified conclusions. This paper introduces the concept of X-hacking, a form of p-hacking applied to XAI metrics such as Shap values. We show how an automated machine learning pipeline can be used to search for 'defensible' models that produce a desired explanation while maintaining superior predictive performance to a common baseline. We formulate the trade-off between explanation and accuracy as a multi-objective optimization problem and illustrate the feasibility and severity of X-hacking empirically on familiar real-world datasets. Finally, we suggest possible methods for detection and prevention, and discuss ethical implications for the credibility and reproducibility of XAI research.
Experimentally implemented dynamic optogenetic optimization of ATPase expression using knowledge-based and Gaussian-process-supported models
Authors: Sebastián Espinel-Ríos, Gerrich Behrendt, Jasmin Bauer, Bruno Morabito, Johannes Pohlodek, Andrea Schütze, Rolf Findeisen, Katja Bettenbrock, Steffen Klamt
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2401.08556
Pdf link: https://arxiv.org/pdf/2401.08556
Abstract Optogenetic modulation of adenosine triphosphatase (ATPase) expression represents a novel approach to maximize the efficiency of bioprocesses by leveraging the concept of enforced adenosine triphosphate (ATP) turnover. In this study, we experimentally implement a model-based open-loop optimization scheme for optogenetic modulation of the expression of the ATPase. Increasing the intracellular concentration of ATPase, and thus the level of ATP turnover, in bioprocesses with product synthesis coupled with ATP generation, can lead to an increase in product formation and substrate uptake. Previous simulation studies involved formulating optimal control problems using dynamic constraint-based models to find optimal light inputs in fermentations with optogenetically mediated ATPase expression. However, using these models poses challenges due to resulting bilevel optimization problems and complex parameterization. Here, we outline a simplified unsegregated and quasi-unstructured kinetic modeling approach that reduces the number of dynamic states and leads to single-level optimization problems. The proposed models can be augmented with Gaussian processes to compensate for model uncertainties. We show the use of optimal control constrained by knowledge-based and hybrid models in the context of optogenetic ATPase expression in Escherichia coli with lactate as the main fermentation product. To do so, we genetically engineer E. coli to obtain optogenetic expression of ATPase using the CcaS/CcaR system. This work represents the first experimental implementation of model-based optimization to dynamically modulate ATPase expression in bioprocesses.
Safe Mission-Level Path Planning for Exploration of Lunar Shadowed Regions by a Solar-Powered Rover
Authors: Olivier Lamarre, Shantanu Malhotra, Jonathan Kelly
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2401.08558
Pdf link: https://arxiv.org/pdf/2401.08558
Abstract Exploration of the lunar south pole with a solar-powered rover is challenging due to the highly dynamic solar illumination conditions and the presence of permanently shadowed regions (PSRs). In turn, careful planning in space and time is essential. Mission-level path planning is a global, spatiotemporal paradigm that addresses this challenge, taking into account rover resources and mission requirements. However, existing approaches do not proactively account for random disturbances, such as recurring faults, that may temporarily delay rover traverse progress. In this paper, we formulate a chance-constrained mission-level planning problem for the exploration of PSRs by a solar-powered rover affected by random faults. The objective is to find a policy that visits as many waypoints of scientific interest as possible while respecting an upper bound on the probability of mission failure. Our approach assumes that faults occur randomly, but at a known, constant average rate. Each fault is resolved within a fixed time, simulating the recovery period of an autonomous system or the time required for a team of human operators to intervene. Unlike solutions based upon dynamic programming alone, our method breaks the chance-constrained optimization problem into smaller offline and online subtasks to make the problem computationally tractable. Specifically, our solution combines existing mission-level path planning techniques with a stochastic reachability analysis component. We find mission plans that remain within reach of safety throughout large state spaces. To empirically validate our algorithm, we simulate mission scenarios using orbital terrain and illumination maps of Cabeus Crater. Results from simulations of multi-day, long-range drives in the LCROSS impact region are also presented.
Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal Data
Authors: Yuhui Zhang, Elaine Sui, Serena Yeung-Levy
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.08567
Pdf link: https://arxiv.org/pdf/2401.08567
Abstract Building cross-modal applications is challenging due to limited paired multi-modal data. Recent works have shown that leveraging a pre-trained multi-modal contrastive representation space enables cross-modal tasks to be learned from uni-modal data. This is based on the assumption that contrastive optimization makes embeddings from different modalities interchangeable. However, this assumption is under-explored due to the poorly understood geometry of the multi-modal contrastive space, where a modality gap exists. In our study, we provide a theoretical explanation of this space's geometry and introduce a three-step method, $C^3$ (Connect, Collapse, Corrupt), to bridge the modality gap, enhancing the interchangeability of embeddings. Our $C^3$ method significantly improves cross-modal learning from uni-modal data, achieving state-of-the-art results on zero-shot image / audio / video captioning and text-to-image generation.
RoHM: Robust Human Motion Reconstruction via Diffusion
Authors: Siwei Zhang, Bharat Lal Bhatnagar, Yuanlu Xu, Alexander Winkler, Petr Kadlecek, Siyu Tang, Federica Bogo
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.08570
Pdf link: https://arxiv.org/pdf/2401.08570
Abstract We propose RoHM, an approach for robust 3D human motion reconstruction from monocular RGB(-D) videos in the presence of noise and occlusions. Most previous approaches either train neural networks to directly regress motion in 3D or learn data-driven motion priors and combine them with optimization at test time. The former do not recover globally coherent motion and fail under occlusions; the latter are time-consuming, prone to local minima, and require manual tuning. To overcome these shortcomings, we exploit the iterative, denoising nature of diffusion models. RoHM is a novel diffusion-based motion model that, conditioned on noisy and occluded input data, reconstructs complete, plausible motions in consistent global coordinates. Given the complexity of the problem -- requiring one to address different tasks (denoising and infilling) in different solution spaces (local and global motion) -- we decompose it into two sub-tasks and learn two models, one for global trajectory and one for local motion. To capture the correlations between the two, we then introduce a novel conditioning module, combining it with an iterative inference scheme. We apply RoHM to a variety of tasks -- from motion reconstruction and denoising to spatial and temporal infilling. Extensive experiments on three popular datasets show that our method outperforms state-of-the-art approaches qualitatively and quantitatively, while being faster at test time. The code will be available at https://sanweiliti.github.io/ROHM/ROHM.html.
Keyword: adam

Deep Learning Based Cyberbullying Detection in Bangla Language
Authors: Sristy Shidul Nath, Razuan Karim, Mahdi H. Miraz
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2401.06787
Pdf link: https://arxiv.org/pdf/2401.06787
Abstract The Internet is currently the largest platform for global communication including expressions of opinions, reviews, contents, images, videos and so forth. Moreover, social media has now become a very broad and highly engaging platform due to its immense popularity and swift adoption trend. Increased social networking, however, also has detrimental impacts on the society leading to a range of unwanted phenomena, such as online assault, intimidation, digital bullying, criminality and trolling. Hence, cyberbullying has become a pervasive and worrying problem that poses considerable psychological and emotional harm to the people, particularly amongst the teens and the young adults. In order to lessen its negative effects and provide victims with prompt support, a great deal of research to identify cyberbullying instances at various online platforms is emerging. In comparison to other languages, Bangla (also known as Bengali) has fewer research studies in this domain. This study demonstrates a deep learning strategy for identifying cyberbullying in Bengali, using a dataset of 12282 versatile comments from multiple social media sites. In this study, a two-layer bidirectional long short-term memory (Bi-LSTM) model has been built to identify cyberbullying, using a variety of optimisers as well as 5-fold cross validation. To evaluate the functionality and efficacy of the proposed system, rigorous assessment and validation procedures have been employed throughout the project. The results of this study reveal that the proposed model's accuracy, using momentum-based stochastic gradient descent (SGD) optimiser, is 94.46%. It also reflects a higher accuracy of 95.08% and an F1 score of 95.23% using Adam optimiser as well as a better accuracy of 94.31% in 5-fold cross validation.
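For readers comparing the two optimisers named in this abstract, the following Python sketch shows the textbook update rules for momentum-based SGD and Adam on a one-dimensional toy objective. The objective f(w) = (w - 3)^2 and all hyperparameter values are illustrative assumptions; none of them come from the paper, and the paper's Bi-LSTM training setup is not reproduced.

```python
import math

def sgd_momentum_step(w, grad, velocity, lr=0.1, momentum=0.9):
    """One momentum-SGD update: the velocity accumulates past gradients."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

def adam_step(w, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update with bias-corrected first and second moment estimates."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)   # bias correction; t starts at 1
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps), m, v

def grad_f(w):
    """Gradient of the toy objective f(w) = (w - 3)^2 (an assumption for illustration)."""
    return 2.0 * (w - 3.0)

w_sgd, vel = 0.0, 0.0
w_adam, m, v = 0.0, 0.0, 0.0
for t in range(1, 501):
    w_sgd, vel = sgd_momentum_step(w_sgd, grad_f(w_sgd), vel)
    w_adam, m, v = adam_step(w_adam, grad_f(w_adam), m, v, t)
```

Both iterates approach the minimiser w = 3 on this toy problem. The sketch says nothing about which optimiser is preferable for a given model; the accuracy figures above are the paper's own.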
Keyword: gradient

Deep Learning Based Cyberbullying Detection in Bangla Language
Authors: Sristy Shidul Nath, Razuan Karim, Mahdi H. Miraz
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Arxiv link: https://arxiv.org/abs/2401.06787
Pdf link: https://arxiv.org/pdf/2401.06787
Abstract The Internet is currently the largest platform for global communication including expressions of opinions, reviews, contents, images, videos and so forth. Moreover, social media has now become a very broad and highly engaging platform due to its immense popularity and swift adoption trend. Increased social networking, however, also has detrimental impacts on the society leading to a range of unwanted phenomena, such as online assault, intimidation, digital bullying, criminality and trolling. Hence, cyberbullying has become a pervasive and worrying problem that poses considerable psychological and emotional harm to the people, particularly amongst the teens and the young adults. In order to lessen its negative effects and provide victims with prompt support, a great deal of research to identify cyberbullying instances at various online platforms is emerging. In comparison to other languages, Bangla (also known as Bengali) has fewer research studies in this domain. This study demonstrates a deep learning strategy for identifying cyberbullying in Bengali, using a dataset of 12282 versatile comments from multiple social media sites. In this study, a two-layer bidirectional long short-term memory (Bi-LSTM) model has been built to identify cyberbullying, using a variety of optimisers as well as 5-fold cross validation. To evaluate the functionality and efficacy of the proposed system, rigorous assessment and validation procedures have been employed throughout the project. The results of this study reveal that the proposed model's accuracy, using momentum-based stochastic gradient descent (SGD) optimiser, is 94.46%. It also reflects a higher accuracy of 95.08% and an F1 score of 95.23% using Adam optimiser as well as a better accuracy of 94.31% in 5-fold cross validation.
Make Prompts Adaptable: Bayesian Modeling for Vision-Language Prompt Learning with Data-Dependent Prior
Authors: Youngjae Cho, HeeSun Bae, Seungjae Shin, Yeo Dong Youn, Weonyoung Joo, Il-Chul Moon
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.06799
Pdf link: https://arxiv.org/pdf/2401.06799
Abstract Recent Vision-Language Pretrained (VLP) models have become the backbone for many downstream tasks, but they are utilized as frozen models without learning. Prompt learning is a method to improve the pre-trained VLP model by adding a learnable context vector to the inputs of the text encoder. In a few-shot learning scenario of the downstream task, MLE training can lead the context vector to over-fit dominant image features in the training data. This overfitting can potentially harm the generalization ability, especially in the presence of a distribution shift between the training and test dataset. This paper presents a Bayesian-based framework of prompt learning, which could alleviate the overfitting issues in few-shot learning applications and increase the adaptability of prompts on unseen instances. Specifically, modeling a data-dependent prior enhances the adaptability of text features for both seen and unseen image features without the trade-off of performance between them. Based on the Bayesian framework, we utilize the Wasserstein Gradient Flow in the estimation of our target posterior distribution, which enables our prompt to be flexible in capturing the complex modes of image features. We demonstrate the effectiveness of our method on benchmark datasets for several experiments by showing statistically significant improvements on performance compared to existing methods. The code is available at https://github.com/youngjae-cho/APP.
Reinforcement Learning for Optimizing RAG for Domain Chatbots
Authors: Mandar Kulkarni, Praveen Tangarajan, Kyung Kim, Anusua Trivedi
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.06800
Pdf link: https://arxiv.org/pdf/2401.06800
Abstract With the advent of Large Language Models (LLM), conversational assistants have become prevalent for domain use cases. LLMs acquire the ability to perform contextual question answering through training, and Retrieval Augmented Generation (RAG) further enables the bot to answer domain-specific questions. This paper describes a RAG-based approach for building a chatbot that answers users' queries using Frequently Asked Questions (FAQ) data. We train an in-house retrieval embedding model using infoNCE loss, and experimental results demonstrate that the in-house model works significantly better than the well-known general-purpose public embedding model, both in terms of retrieval accuracy and Out-of-Domain (OOD) query detection. As an LLM, we use an open API-based paid ChatGPT model. We noticed that a previously retrieved context could be used to generate an answer for specific patterns/sequences of queries (e.g., follow-up queries). Hence, there is scope to optimize the number of LLM tokens and cost. Assuming a fixed retrieval model and an LLM, we optimize the number of LLM tokens using Reinforcement Learning (RL). Specifically, we propose a policy-based model external to the RAG, which interacts with the RAG pipeline through policy actions and updates the policy to optimize the cost. The policy model can perform two actions: to fetch FAQ context or skip retrieval. We use the open API-based GPT-4 as the reward model. We then train a policy model using policy gradient on multiple training chat sessions. As a policy model, we experimented with a public gpt-2 model and an in-house BERT model. With the proposed RL-based optimization combined with similarity threshold, we are able to achieve significant cost savings while getting a slightly improved accuracy. Though we demonstrate results for the FAQ chatbot, the proposed RL approach is generic and can be tried with any existing RAG pipeline.
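As a rough illustration of the infoNCE loss mentioned in this abstract, the Python sketch below computes the loss for a single query given one positive and several negative embeddings. The use of cosine similarity, the temperature value, and the toy two-dimensional vectors are assumptions made for illustration; the abstract does not specify how the authors configure the loss or their embedding model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def info_nce_loss(query, positive, negatives, temperature=0.1):
    """InfoNCE for one query: negative log-softmax of the positive's similarity."""
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, neg) / temperature for neg in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    max_logit = max(logits)
    log_sum = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_sum - logits[0]

# Toy embeddings (hypothetical): the positive is aligned with the query in `good`
# and orthogonal to it in `bad`, where a negative is aligned instead.
good = info_nce_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0], [0.0, -1.0]])
bad = info_nce_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0], [0.0, -1.0]])
uniform = info_nce_loss([1.0, 0.0], [1.0, 0.0], [[1.0, 0.0], [1.0, 0.0]])
```

The loss is small when the positive is the most similar candidate and large when a negative is, and it equals the log of the number of candidates when all similarities are identical.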
Always-Sparse Training by Growing Connections with Guided Stochastic Exploration
Authors: Mike Heddes, Narayan Srinivasa, Tony Givargis, Alexandru Nicolau
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.06898
Pdf link: https://arxiv.org/pdf/2401.06898
Abstract The excessive computational requirements of modern artificial neural networks (ANNs) are posing limitations on the machines that can run them. Sparsification of ANNs is often motivated by time, memory and energy savings only during model inference, yielding no benefits during training. A growing body of work is now focusing on providing the benefits of model sparsification also during training. While these methods greatly improve the training efficiency, the training algorithms yielding the most accurate models still materialize the dense weights, or compute dense gradients during training. We propose an efficient, always-sparse training algorithm with excellent scaling to larger and sparser models, supported by its linear time complexity with respect to the model width during training and inference. Moreover, our guided stochastic exploration algorithm improves over the accuracy of previous sparse training methods. We evaluate our method on CIFAR-10/100 and ImageNet using ResNet, VGG, and ViT models, and compare it against a range of sparsification methods.
Gradient Coreset for Federated Learning
Authors: Durga Sivasubramanian, Lokesh Nagalapatti, Rishabh Iyer, Ganesh Ramakrishnan
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.06989
Pdf link: https://arxiv.org/pdf/2401.06989
Abstract Federated Learning (FL) is used to learn machine learning models with data that is partitioned across multiple clients, including resource-constrained edge devices. It is therefore important to devise solutions that are efficient in terms of compute, communication, and energy consumption, while ensuring compliance with the FL framework's privacy requirements. Conventional approaches to these problems select a weighted subset of the training dataset, known as coreset, and learn by fitting models on it. Such coreset selection approaches are also known to be robust to data noise. However, these approaches rely on the overall statistics of the training data and are not easily extendable to the FL setup. In this paper, we propose an algorithm called Gradient based Coreset for Robust and Efficient Federated Learning (GCFL) that selects a coreset at each client, only every $K$ communication rounds and derives updates only from it, assuming the availability of a small validation dataset at the server. We demonstrate that our coreset selection technique is highly effective in accounting for noise in clients' data. We conduct experiments using four real-world datasets and show that GCFL is (1) more compute and energy efficient than FL, (2) robust to various kinds of noise in both the feature space and labels, (3) preserves the privacy of the validation dataset, and (4) introduces a small communication overhead but achieves significant gains in performance, particularly in cases when the clients' data is noisy.
An ADRC-Incorporated Stochastic Gradient Descent Algorithm for Latent Factor Analysis
Authors: Jinli Li, Ye Yuan
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2401.07012
Pdf link: https://arxiv.org/pdf/2401.07012
Abstract A high-dimensional and incomplete (HDI) matrix contains many complex interactions between numerous nodes. A stochastic gradient descent (SGD)-based latent factor analysis (LFA) model is remarkably effective in extracting valuable information from an HDI matrix. However, such a model commonly encounters the problem of slow convergence because a standard SGD algorithm only considers the current learning error to compute the stochastic gradient without considering the historical and future state of the learning error. To address this critical issue, this paper proposes an ADRC-incorporated SGD (ADS) algorithm that refines the instance learning error by considering its historical and future states, following the principle of an ADRC controller. With it, an ADS-based LFA model is further achieved for fast and accurate latent factor analysis on an HDI matrix. Empirical studies on two HDI datasets demonstrate that the proposed model outperforms the state-of-the-art LFA models in terms of computational efficiency and accuracy for predicting the missing data of an HDI matrix.
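The baseline this abstract starts from, SGD-based latent factor analysis on the observed entries of an incomplete matrix, can be sketched in Python as follows. This is the plain SGD variant only, in which each update uses just the current instance error; the ADRC-based refinement proposed in the paper is not implemented. The toy matrix, rank, learning rate, and regularization weight are all hypothetical choices.

```python
import random

def predict(P, Q, u, i):
    """Predicted entry (u, i) as the inner product of the two latent factor vectors."""
    return sum(p * q for p, q in zip(P[u], Q[i]))

def rmse(P, Q, entries):
    """Root-mean-square error over a list of (row, col, value) entries."""
    return (sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in entries) / len(entries)) ** 0.5

def train_lfa(entries, n_rows, n_cols, rank=2, lr=0.02, reg=0.01, epochs=1000, seed=0):
    """Plain SGD on observed entries only; each step uses the current instance error."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_rows)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_cols)]
    order = list(entries)
    for _ in range(epochs):
        rng.shuffle(order)
        for u, i, r in order:
            err = r - predict(P, Q, u, i)   # current instance learning error
            for k in range(rank):
                p_uk, q_ik = P[u][k], Q[i][k]
                P[u][k] += lr * (err * q_ik - reg * p_uk)
                Q[i][k] += lr * (err * p_uk - reg * q_ik)
    return P, Q

# Hypothetical 3x4 matrix with 9 of 12 entries observed, given as (row, col, value).
observed = [(0, 0, 1.0), (0, 1, 1.5), (0, 3, 2.0),
            (1, 0, 2.0), (1, 2, 1.0), (1, 3, 4.0),
            (2, 1, 4.5), (2, 2, 1.5), (2, 3, 6.0)]
P, Q = train_lfa(observed, n_rows=3, n_cols=4)
train_error = rmse(P, Q, observed)
```

After training, `predict(P, Q, u, i)` can be evaluated for unobserved positions such as (0, 2) to estimate the missing data.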
BP(λ): Online Learning via Synthetic Gradients
Authors: Joseph Pemberton, Rui Ponte Costa
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.07044
Pdf link: https://arxiv.org/pdf/2401.07044
Abstract Training recurrent neural networks typically relies on backpropagation through time (BPTT). BPTT requires forward and backward passes to be completed, which locks the network to these computations before loss gradients are available. Recently, Jaderberg et al. proposed synthetic gradients to alleviate the need for full BPTT. In their implementation synthetic gradients are learned through a mixture of backpropagated gradients and bootstrapped synthetic gradients, analogous to the temporal difference (TD) algorithm in Reinforcement Learning (RL). However, as in TD learning, heavy use of bootstrapping can result in bias which leads to poor synthetic gradient estimates. Inspired by the accumulate $\mathrm{TD}(\lambda)$ in RL, we propose a fully online method for learning synthetic gradients which avoids the use of BPTT altogether: accumulate $\mathrm{BP}(\lambda)$. As in accumulate $\mathrm{TD}(\lambda)$, we show analytically that accumulate $\mathrm{BP}(\lambda)$ can control the level of bias by using a mixture of temporal difference errors and recursively defined eligibility traces. We next demonstrate empirically that our model outperforms the original implementation for learning synthetic gradients in a variety of tasks, and is particularly suited for capturing longer timescales. Finally, building on recent work we reflect on accumulate $\mathrm{BP}(\lambda)$ as a principle for learning in biological circuits. In summary, inspired by RL principles we introduce an algorithm capable of bias-free online learning via synthetic gradients.
Resource Allocation in Uplink Multi STAR-RIS-aided NOMA System via Meta-Learning
Authors: Sepideh Javadi, Armin Farhadi, Mohammad Robat Mili, Eduard Jorswieck
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2401.07100
Pdf link: https://arxiv.org/pdf/2401.07100
Abstract Simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) is a novel technology which enables full-space coverage by splitting the incident signal into reflected and transmitted signals. In this letter, a multi STAR-RIS-aided system using non-orthogonal multiple access (NOMA) in an uplink transmission is considered, where the multi-order reflections among multiple STAR-RISs assist the transmission from the single-antenna users to the multi-antenna base station (BS). Specifically, the total sum rate maximization problem is solved by jointly optimizing the active beamforming, power allocation, transmission and reflection beamforming at the STAR-RIS, and user-STAR-RIS association indicator. To solve the non-convex optimization problem, a novel deep reinforcement learning algorithm is proposed which combines meta-learning and deep deterministic policy gradient (DDPG), namely Meta-DDPG. Numerical results demonstrate that the proposed Meta-DDPG algorithm outperforms the conventional DDPG algorithm.
Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models
Authors: Zhengxin Zhang, Dan Zhao, Xupeng Miao, Gabriele Oliaro, Qing Li, Yong Jiang, Zhihao Jia
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.07159
Pdf link: https://arxiv.org/pdf/2401.07159
Abstract Finetuning large language models (LLMs) has been empirically effective on a variety of downstream tasks. Existing approaches to finetuning an LLM either focus on parameter-efficient finetuning, which only updates a small number of trainable parameters, or attempt to reduce the memory footprint during the training phase of the finetuning. Typically, the memory footprint during finetuning stems from three contributors: model weights, optimizer states, and intermediate activations. However, existing works still require considerable memory and none can simultaneously mitigate memory footprint for all three sources. In this paper, we present Quantized Side Tuning (QST), which enables memory-efficient and fast finetuning of LLMs by operating through a dual-stage process. First, QST quantizes an LLM's model weights into 4-bit to reduce the memory footprint of the LLM's original weights; QST also introduces a side network separated from the LLM, which utilizes the hidden states of the LLM to make task-specific predictions. Using a separate side network avoids performing backpropagation through the LLM, thus reducing the memory requirement of the intermediate activations. Furthermore, QST leverages several low-rank adaptors and gradient-free downsample modules to significantly reduce the trainable parameters, so as to save the memory footprint of the optimizer states. Experiments show that QST can reduce the total memory footprint by up to 2.3$\times$ and speed up the finetuning process by up to 3$\times$ while achieving competent performance compared with the state-of-the-art. When it comes to full finetuning, QST can reduce the total memory footprint by up to 7$\times$.
The Effects of Data Imbalance Under a Federated Learning Approach for Credit Risk Forecasting
Authors: Shuyao Zhang, Jordan Tay, Pedro Baiz
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.07234
Pdf link: https://arxiv.org/pdf/2401.07234
Abstract Credit risk forecasting plays a crucial role for commercial banks and other financial institutions in granting loans to customers and minimising the potential loss. However, traditional machine learning methods require the sharing of sensitive client information with an external server to build a global model, potentially posing a risk of security threats and privacy leakage. A newly developed privacy-preserving distributed machine learning technique known as Federated Learning (FL) allows the training of a global model without the necessity of accessing private local data directly. This investigation examined the feasibility of federated learning in credit risk assessment and showed the effects of data imbalance on model performance. Two neural network architectures, Multilayer Perceptron (MLP) and Long Short-Term Memory (LSTM), and one tree ensemble architecture, Extreme Gradient Boosting (XGBoost), were explored across three different datasets under various scenarios involving different numbers of clients and data distribution configurations. We demonstrate that federated models consistently outperform local models on non-dominant clients with smaller datasets. This trend is especially pronounced in highly imbalanced data scenarios, yielding a remarkable average improvement of 17.92% in model performance. However, for dominant clients (clients with more data), federated models may not exhibit superior performance, suggesting the need for special incentives for this type of client to encourage their participation.
Stabilizing Sharpness-aware Minimization Through A Simple Renormalization Strategy
Authors: Chengli Tan, Jiangshe Zhang, Junmin Liu, Yicheng Wang, Yunda Hao
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.07250
Pdf link: https://arxiv.org/pdf/2401.07250
Abstract Recently, sharpness-aware minimization (SAM) has attracted a lot of attention because of its surprising effectiveness in improving generalization performance. However, training neural networks with SAM can be highly unstable since the loss does not decrease along the direction of the exact gradient at the current point, but instead follows the direction of a surrogate gradient evaluated at another point nearby. To address this issue, we propose a simple renormalization strategy, dubbed StableSAM, so that the norm of the surrogate gradient remains the same as that of the exact gradient. Our strategy is easy to implement and flexible enough to integrate with SAM and its variants, at almost no computational cost. With elementary tools from convex optimization and learning theory, we also conduct a theoretical analysis of sharpness-aware training, revealing that compared to stochastic gradient descent (SGD), the effectiveness of SAM is only assured in a limited regime of learning rate. In contrast, we show how StableSAM extends this regime of learning rate and when it can consistently perform better than SAM with minor modification. Finally, we demonstrate the improved performance of StableSAM on several representative data sets and tasks.
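The renormalization idea in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `sam_step`, the perturbation radius `rho`, the learning rate, and the toy quadratic loss are all assumptions made for illustration. The only part taken from the abstract is that the surrogate gradient is rescaled so its norm matches that of the exact gradient.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05, renormalize=True, eps=1e-12):
    """One SAM-style update; optionally rescale the surrogate gradient.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    g = grad_fn(w)                         # exact gradient at the current point
    g_norm = np.linalg.norm(g)
    w_adv = w + rho * g / (g_norm + eps)   # nearby point along the ascent direction
    g_sur = grad_fn(w_adv)                 # surrogate gradient at the nearby point
    if renormalize:
        # Rescale so that ||g_sur|| matches ||g||, as the abstract describes.
        g_sur = g_sur * (g_norm / (np.linalg.norm(g_sur) + eps))
    return w - lr * g_sur, g, g_sur

# Toy quadratic loss 0.5 * w^T A w, whose gradient is A @ w (assumed example).
A = np.diag([1.0, 10.0])
grad_fn = lambda w: A @ w
w0 = np.array([1.0, 1.0])
w1, g_exact, g_surrogate = sam_step(w0, grad_fn)
```

With `renormalize=False` the same function reduces to a plain SAM-style step, which makes the two variants easy to compare on a toy problem.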
Low-Rank Gradient Compression with Error Feedback for MIMO Wireless Federated Learning
Authors: Mingzhao Guo, Dongzhu Liu, Osvaldo Simeone, Dingzhu Wen
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2401.07496
Pdf link: https://arxiv.org/pdf/2401.07496
Abstract This paper presents a novel approach to enhance the communication efficiency of federated learning (FL) in multiple input and multiple output (MIMO) wireless systems. The proposed method centers on a low-rank matrix factorization strategy for local gradient compression based on alternating least squares, along with over-the-air computation and error feedback. The proposed protocol, termed over-the-air low-rank compression (Ota-LC), is demonstrated to have lower computation cost and lower communication overhead than existing benchmarks while guaranteeing the same inference performance. As an example, when targeting a test accuracy of 80% on the CIFAR-10 dataset, Ota-LC achieves a reduction in total communication costs of at least 30% when contrasted with benchmark schemes, while also reducing the computational complexity order by a factor equal to the sum of the dimension of the gradients.
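As a rough illustration of low-rank gradient compression by alternating least squares, the following NumPy sketch factorizes a gradient matrix `G` into two thin factors. It is a generic ALS routine, not the Ota-LC protocol: over-the-air computation and error feedback are omitted, and the function name, iteration count, and synthetic test matrix are assumptions.

```python
import numpy as np

def als_low_rank(G, rank, iters=10, seed=0):
    """Approximate G (m x n) by U @ V.T with U (m x rank), V (n x rank).

    Generic alternating least squares; illustrative sketch only.
    """
    rng = np.random.default_rng(seed)
    n = G.shape[1]
    V = rng.normal(size=(n, rank))
    for _ in range(iters):
        # Fix V and solve the least-squares problem for U, then swap roles.
        U = G @ V @ np.linalg.pinv(V.T @ V)
        V = G.T @ U @ np.linalg.pinv(U.T @ U)
    return U, V

# Synthetic "gradient" of exact rank 3 (assumed example data).
rng = np.random.default_rng(1)
G = rng.normal(size=(64, 3)) @ rng.normal(size=(3, 32))
U, V = als_low_rank(G, rank=3)
rel_err = np.linalg.norm(G - U @ V.T) / np.linalg.norm(G)
# Fraction of entries sent when transmitting U and V instead of G.
sent_fraction = (U.size + V.size) / G.size
```

Sending `U` and `V` instead of `G` costs `(m + n) * rank` numbers rather than `m * n`, which is where the communication saving in such schemes comes from.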
Editing Arbitrary Propositions in LLMs without Subject Labels
Authors: Itai Feigenbaum, Devansh Arpit, Huan Wang, Shelby Heinecke, Juan Carlos Niebles, Weiran Yao, Caiming Xiong, Silvio Savarese
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2401.07526
Pdf link: https://arxiv.org/pdf/2401.07526
Abstract Large Language Model (LLM) editing modifies factual information in LLMs. Locate-and-Edit (L\&E) methods accomplish this by finding where relevant information is stored within the neural network, and editing the weights at that location. The goal of editing is to modify the response of an LLM to a proposition independently of its phrasing, while not modifying its response to other related propositions. Existing methods are limited to binary propositions, which represent straightforward binary relations between a subject and an object. Furthermore, existing methods rely on semantic subject labels, which may not be available or even be well-defined in practice. In this paper, we show that both of these issues can be effectively skirted with a simple and fast localization method called Gradient Tracing (GT). This localization method allows editing arbitrary propositions instead of just binary ones, and does so without the need for subject labels. As propositions always have a truth value, our experiments prompt an LLM as a boolean classifier, and edit its T/F response to propositions. Our method applies GT for location tracing, and then edits the model at that location using a mild variant of Rank-One Model Editing (ROME). On datasets of binary propositions derived from the CounterFact dataset, we show that our method -- without access to subject labels -- performs close to state-of-the-art L\&E methods which have access to subject labels. We then introduce a new dataset, Factual Accuracy Classification Test (FACT), which includes non-binary propositions and for which subject labels are not generally applicable, and therefore is beyond the scope of existing L\&E methods. Nevertheless, we show that with our method editing is possible on FACT.
Multi-Objective Optimization in STAR-RIS-Aided SWIPT with RSMA via Meta-Learning
Authors: Mojtaba Amiri, Elaheh Vaezpour, Sepideh Javadi, Mohammad Robat Mili, Halim Yanikomeroglu, Mehdi Bennis
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2401.07644
Pdf link: https://arxiv.org/pdf/2401.07644
Abstract Simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) is a cutting-edge concept for the sixth-generation (6G) wireless networks. In this letter, we propose a novel system that incorporates STAR-RIS with simultaneous wireless information and power transfer (SWIPT) using rate splitting multiple access (RSMA). The proposed system facilitates communication from a multi-antenna base station (BS) to single-antenna users in a downlink transmission. The BS concurrently sends energy and information signals to multiple energy harvesting receivers (EHRs) and information data receivers (IDRs) with the support of a deployed STAR-RIS. Furthermore, a multi-objective optimization is introduced to strike a balance between users' sum rate and the total harvested energy. To achieve this, an optimization problem is formulated to optimize the energy/information beamforming vectors at the BS, the phase shifts at the STAR-RIS, and the common message rate. Subsequently, we employ a meta deep deterministic policy gradient (Meta-DDPG) approach to solve the complex problem. Simulation results validate that the proposed algorithm significantly enhances both data rate and harvested energy in comparison to conventional DDPG.
Online Learning of Piecewise Polynomial Signed Distance Fields for Manipulation Tasks
Authors: Ante Marić, Yiming Li, Sylvain Calinon
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2401.07698
Pdf link: https://arxiv.org/pdf/2401.07698
Abstract Reasoning about distance is indispensable for establishing or avoiding contact in manipulation tasks. To this end, we present an online method for learning implicit representations of signed distance using piecewise polynomial basis functions. Starting from an arbitrary prior shape, our approach incrementally constructs a continuous representation from incoming point cloud data. It offers fast access to distance and analytical gradients without the need to store training data. We assess the accuracy of our model on a diverse set of household objects and compare it to neural network and Gaussian process counterparts. Distance reconstruction and real-time updates are further evaluated in a physical experiment by simultaneously collecting sparse point cloud data and using the evolving model to control a manipulator.
Efficient Nonparametric Tensor Decomposition for Binary and Count Data
Authors: Zerui Tao, Toshihisa Tanaka, Qibin Zhao
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2401.07711
Pdf link: https://arxiv.org/pdf/2401.07711
Abstract In numerous applications, binary reactions or event counts are observed and stored within high-order tensors. Tensor decompositions (TDs) serve as a powerful tool to handle such high-dimensional and sparse data. However, many traditional TDs are explicitly or implicitly designed based on the Gaussian distribution, which is unsuitable for discrete data. Moreover, most TDs rely on predefined multi-linear structures, such as CP and Tucker formats. Therefore, they may not be effective enough to handle complex real-world datasets. To address these issues, we propose ENTED, an Efficient Nonparametric TEnsor Decomposition for binary and count tensors. Specifically, we first employ a nonparametric Gaussian process (GP) to replace traditional multi-linear structures. Next, we utilize the Pólya-Gamma augmentation which provides a unified framework to establish conjugate models for binary and count distributions. Finally, to address the computational issue of GPs, we enhance the model by incorporating sparse orthogonal variational inference of inducing points, which offers a more effective covariance approximation within GPs and stochastic natural gradient updates for nonparametric models. We evaluate our model on several real-world tensor completion tasks, considering binary and count datasets. The results manifest both better performance and computational advantages of the proposed model.
Activations and Gradients Compression for Model-Parallel Training
Authors: Mikhail Rudakov, Aleksandr Beznosikov, Yaroslav Kholodov, Alexander Gasnikov
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2401.07788
Pdf link: https://arxiv.org/pdf/2401.07788
Abstract Large neural networks require enormous computational clusters of machines. Model-parallel training, in which the model architecture is partitioned sequentially between workers, is a popular approach for training modern models. Information compression can be applied to decrease workers' communication time, as it is often a bottleneck in such systems. This work explores how simultaneous compression of activations and gradients in a model-parallel distributed training setup affects convergence. We analyze compression methods such as quantization and TopK compression, and also experiment with error compensation techniques. Moreover, we employ TopK with the AQ-SGD per-batch error feedback approach. We conduct experiments on image classification and language model fine-tuning tasks. Our findings demonstrate that gradients require milder compression rates than activations. We observe that $K=10\%$ is the lowest TopK compression level that does not harm model convergence severely. Experiments also show that models trained with TopK perform well only when compression is also applied during inference. We find that error feedback techniques do not improve model-parallel training compared to plain compression, but allow model inference without compression with almost no quality drop. Finally, when applied with the AQ-SGD approach, TopK compression stronger than $K=30\%$ worsens model performance significantly.
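For readers unfamiliar with TopK compression and error feedback, here is a small NumPy sketch. It shows a generic error-feedback scheme in which the part of the tensor that was dropped is carried over to the next call; it is not the AQ-SGD per-batch variant used in the paper, and the class and function names are assumptions.

```python
import numpy as np

def top_k(x, fraction):
    """Keep the `fraction` largest-magnitude entries of x and zero the rest."""
    k = max(1, int(round(fraction * x.size)))
    flat = x.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]  # indices of the k largest magnitudes
    out = np.zeros_like(flat)
    out[idx] = flat[idx]
    return out.reshape(x.shape)

class ErrorFeedbackCompressor:
    """Generic error feedback: add back what earlier compressions dropped."""

    def __init__(self, fraction):
        self.fraction = fraction
        self.residual = None

    def compress(self, x):
        if self.residual is None:
            self.residual = np.zeros_like(x)
        corrected = x + self.residual            # re-inject previously dropped mass
        sent = top_k(corrected, self.fraction)
        self.residual = corrected - sent         # remember what was dropped this time
        return sent

rng = np.random.default_rng(0)
x = rng.normal(size=100)
compressor = ErrorFeedbackCompressor(fraction=0.10)  # K = 10%
sent = compressor.compress(x)
```

After each call, `sent + compressor.residual` equals the corrected input, so no information is permanently discarded; it is only delayed.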
Optimal experimental design via gradient flow
Authors: Ruhui Jin, Martin Guerra, Qin Li, Stephen Wright
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2401.07806
Pdf link: https://arxiv.org/pdf/2401.07806
Abstract Optimal experimental design (OED) has far-reaching impacts in many scientific domains. We study OED over a continuous-valued design space, a setting that occurs often in practice. Optimization of a distributional function over an infinite-dimensional probability measure space is conceptually distinct from the discrete OED tasks that are conventionally tackled. We propose techniques based on optimal transport and Wasserstein gradient flow. A practical computational approach is derived from the Monte Carlo simulation, which transforms the infinite-dimensional optimization problem to a finite-dimensional problem over Euclidean space, to which gradient descent can be applied. We discuss first-order criticality and study the convexity properties of the OED objective. We apply our algorithm to the tomography inverse problem, where the solution reveals optimal sensor placements for imaging.
The ODE Method for Stochastic Approximation and Reinforcement Learning with Markovian Noise
Authors: Shuze Liu, Shuhang Chen, Shangtong Zhang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.07844
Pdf link: https://arxiv.org/pdf/2401.07844
Abstract Stochastic approximation is a class of algorithms that update a vector iteratively, incrementally, and stochastically, including, e.g., stochastic gradient descent and temporal difference learning. One fundamental challenge in analyzing a stochastic approximation algorithm is to establish its stability, i.e., to show that the stochastic vector iterates are bounded almost surely. In this paper, we extend the celebrated Borkar-Meyn theorem for stability from the Martingale difference noise setting to the Markovian noise setting, which greatly improves its applicability in reinforcement learning, especially in those off-policy reinforcement learning algorithms with linear function approximation and eligibility traces. Central to our analysis is the diminishing asymptotic rate of change of a few functions, which is implied by both a form of strong law of large numbers and a commonly used V4 Lyapunov drift condition and trivially holds if the Markov chain is finite and irreducible.
A Globally Convergent Algorithm for Neural Network Parameter Optimization Based on Difference-of-Convex Functions
Authors: Daniel Tschernutter, Mathias Kraus, Stefan Feuerriegel
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2401.07936
Pdf link: https://arxiv.org/pdf/2401.07936
Abstract We propose an algorithm for optimizing the parameters of single hidden layer neural networks. Specifically, we derive a blockwise difference-of-convex (DC) functions representation of the objective function. Based on the latter, we propose a block coordinate descent (BCD) approach that we combine with a tailored difference-of-convex functions algorithm (DCA). We prove global convergence of the proposed algorithm. Furthermore, we mathematically analyze the convergence rate of parameters and the convergence rate in value (i.e., the training loss). We give conditions under which our algorithm converges linearly or even faster depending on the local shape of the loss function. We confirm our theoretical derivations numerically and compare our algorithm against state-of-the-art gradient-based solvers in terms of both training loss and test loss.
Small Object Detection by DETR via Information Augmentation and Adaptive Feature Fusion
Authors: Ji Huang, Hui Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.08017
Pdf link: https://arxiv.org/pdf/2401.08017
Abstract The main challenge for small object detection algorithms is to ensure accuracy while pursuing real-time performance. The RT-DETR model performs well in real-time object detection, but performs poorly in small object detection accuracy. To compensate for the shortcomings of the RT-DETR model in small object detection, two key improvements are proposed in this study. First, RT-DETR utilises a Transformer that receives input solely from the final layer of Backbone features. This means that the Transformer's input only receives semantic information from the highest level of abstraction in the deep network, and ignores detailed information such as edges, texture or color gradients that are critical to the location of small objects at lower levels of abstraction. Including only deep features can introduce additional background noise. This can have a negative impact on the accuracy of small object detection. To address this issue, we propose the fine-grained path augmentation method. This method helps to locate small objects more accurately by providing detailed information to the deep network, so the input to the Transformer contains both semantic and detailed information. Second, in RT-DETR, the decoder takes feature maps of different levels as input after concatenating them with equal weight. However, this operation is not effective in dealing with the complex relationship of multi-scale information captured by feature maps of different sizes. Therefore, we propose an adaptive feature fusion algorithm that assigns learnable parameters to each feature map from different levels. This allows the model to adaptively fuse feature maps from different levels and effectively integrate feature information from different scales. This enhances the model's ability to capture object features at different scales, thereby improving the accuracy of detecting small objects.
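The contrast between equal-weight and adaptive fusion can be sketched in a few lines of NumPy. This is only one plausible reading of "learnable parameters per feature map": the softmax normalization of the weights, the assumption that all maps have already been resized to a common shape, and the function name `fuse` are illustrative choices not stated in the abstract.

```python
import numpy as np

def fuse(feature_maps, logits):
    """Weighted sum of same-shape feature maps with softmax-normalized weights.

    `logits` stands in for the learnable per-level parameters (assumption).
    """
    logits = np.asarray(logits, dtype=float)
    weights = np.exp(logits - logits.max())   # numerically stable softmax
    weights /= weights.sum()
    stacked = np.stack(feature_maps)          # shape (levels, C, H, W)
    fused = np.tensordot(weights, stacked, axes=1)  # contract over the level axis
    return fused, weights

# Three toy feature maps of shape (C=2, H=4, W=4), assumed already resized.
maps = [np.full((2, 4, 4), v) for v in (1.0, 2.0, 3.0)]
fused_equal, w_equal = fuse(maps, [0.0, 0.0, 0.0])    # equal-weight baseline
fused_adapt, w_adapt = fuse(maps, [2.0, 0.0, -1.0])   # emphasises the first level
```

In a trained model the logits would be updated by backpropagation along with the other parameters; here they are fixed by hand to show the effect.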
Machine Learning-Based Malicious Vehicle Detection for Security Threats and Attacks in Vehicle Ad-hoc Network (VANET) Communications
Authors: Thanh Nguyen Canh, Xiem HoangVan
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI)
Arxiv link: https://arxiv.org/abs/2401.08135
Pdf link: https://arxiv.org/pdf/2401.08135
Abstract With the rapid growth of Vehicle Ad-hoc Network (VANET) as a promising technology for efficient and reliable communication among vehicles and infrastructure, the security and integrity of VANET communications have become a critical concern. One of the significant threats to VANET is the presence of blackhole attacks, where malicious nodes disrupt the network's functionality and compromise data confidentiality, integrity, and availability. In this paper, we propose a machine learning-based approach for blackhole detection in VANET. To achieve this task, we first create a comprehensive dataset comprising normal and malicious traffic flows. Afterward, we study and define a promising set of features to discriminate the blackhole attacks. Finally, we evaluate various machine learning algorithms, including Gradient Boosting, Random Forest, Support Vector Machines, k-Nearest Neighbors, Gaussian Naive Bayes, and Logistic Regression. Experimental results demonstrate the effectiveness of these algorithms in distinguishing between normal and malicious nodes. Our findings also highlight the potential of machine learning-based approaches in enhancing the security of VANET by detecting and mitigating blackhole attacks.
Distributed Stackelberg Equilibrium Seeking for Networked Multi-Leader Multi-Follower Games with A Clustered Information Structure
Authors: Yue Chen, Peng Yi
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2401.08144
Pdf link: https://arxiv.org/pdf/2401.08144
Abstract The Stackelberg game depicts a leader-follower relationship wherein decisions are made sequentially, and the Stackelberg equilibrium represents an expected optimal solution when the leader can anticipate the rational response of the follower. Motivated by control of network systems with two levels of decision-making hierarchy, such as the management of energy networks and power coordination in cellular networks, a networked multi-leader multi-follower Stackelberg game is proposed. Due to the constraint of limited information interaction among players, a clustered information structure is assumed in which each leader can only communicate with a portion of the overall followers, namely its subordinated followers, and only with its local neighboring leaders. In this case, the leaders cannot fully anticipate the collective rational response of all followers with their local information. To address Stackelberg equilibrium seeking under this partial information structure, we propose a distributed seeking algorithm based on implicit gradient estimation and network consensus mechanisms. We rigorously prove the convergence of the algorithm for both diminishing and constant step sizes under strict and strong monotonicity conditions, respectively. Furthermore, the model and the algorithm can also incorporate linear equality and inequality constraints into the followers' optimization problems, using an interior point barrier function. Finally, we present numerical simulations in applications to corroborate our claims on the proposed framework.
Boosting Gradient Ascent for Continuous DR-submodular Maximization
Authors: Qixin Zhang, Zongqi Wan, Zengde Deng, Zaiyi Chen, Xiaoming Sun, Jialin Zhang, Yu Yang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2401.08330
Pdf link: https://arxiv.org/pdf/2401.08330
Abstract Projected Gradient Ascent (PGA) is the most commonly used optimization scheme in machine learning and operations research areas. Nevertheless, numerous studies and examples have shown that the PGA methods may fail to achieve the tight approximation ratio for continuous DR-submodular maximization problems. To address this challenge, we present a boosting technique in this paper, which can efficiently improve the approximation guarantee of the standard PGA to optimal with only small modifications on the objective function. The fundamental idea of our boosting technique is to exploit non-oblivious search to derive a novel auxiliary function $F$, whose stationary points are excellent approximations to the global maximum of the original DR-submodular objective $f$. Specifically, when $f$ is monotone and $\gamma$-weakly DR-submodular, we propose an auxiliary function $F$ whose stationary points can provide a better $(1-e^{-\gamma})$-approximation than the $(\gamma^2/(1+\gamma^2))$-approximation guaranteed by the stationary points of $f$ itself. Similarly, for the non-monotone case, we devise another auxiliary function $F$ whose stationary points can achieve an optimal $\frac{1-\min_{\boldsymbol{x}\in\mathcal{C}}\|\boldsymbol{x}\|_{\infty}}{4}$-approximation guarantee where $\mathcal{C}$ is a convex constraint set. In contrast, the stationary points of the original non-monotone DR-submodular function can be arbitrarily bad~\citep{chen2023continuous}. Furthermore, we demonstrate the scalability of our boosting technique on four problems. In all of these four problems, our resulting variants of boosting PGA algorithm beat the previous standard PGA in several aspects such as approximation ratio and efficiency. Finally, we corroborate our theoretical findings with numerical experiments, which demonstrate the effectiveness of our boosting PGA methods.
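To get a feel for the gap between the two monotone-case guarantees quoted in this abstract, the short Python sketch below evaluates both ratios for a few values of $\gamma$. The function names are hypothetical; only the two formulas, $\gamma^2/(1+\gamma^2)$ and $1-e^{-\gamma}$, come from the abstract.

```python
import math

def standard_pga_ratio(gamma):
    """Guarantee at stationary points of f itself: gamma^2 / (1 + gamma^2)."""
    return gamma ** 2 / (1 + gamma ** 2)

def boosted_pga_ratio(gamma):
    """Guarantee at stationary points of the auxiliary function F: 1 - e^{-gamma}."""
    return 1 - math.exp(-gamma)

# Compare the two guarantees for several values of the weak-DR parameter.
gammas = [0.25, 0.5, 0.75, 1.0]
comparison = [(g, standard_pga_ratio(g), boosted_pga_ratio(g)) for g in gammas]
for g, standard, boosted in comparison:
    print(f"gamma={g:.2f}  standard={standard:.3f}  boosted={boosted:.3f}")
```

For $\gamma = 1$ the standard guarantee is exactly $1/2$, while the boosted one is $1 - 1/e$.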
Adjoint Monte Carlo Method
Authors: Russel Caflisch, Yunan Yang
Subjects: Numerical Analysis (math.NA); Computational Physics (physics.comp-ph)
Arxiv link: https://arxiv.org/abs/2401.08361
Pdf link: https://arxiv.org/pdf/2401.08361
Abstract This survey explores the development of adjoint Monte Carlo methods for solving optimization problems governed by kinetic equations, a common challenge in areas such as plasma control and device design. These optimization problems are particularly demanding due to the high dimensionality of the phase space and the randomness in evaluating the objective functional, a consequence of using a forward Monte Carlo solver. To overcome these difficulties, a range of "adjoint Monte Carlo methods" have been devised. These methods skillfully combine Monte Carlo gradient estimators with PDE-constrained optimization, introducing innovative solutions tailored for kinetic applications. In this review, we begin by examining three primary strategies for Monte Carlo gradient estimation: the score function approach, the reparameterization trick, and the coupling method. We also delve into the adjoint-state method, an essential element in PDE-constrained optimization. Focusing on applications in the radiative transfer equation and the nonlinear Boltzmann equation, we provide a comprehensive guide on how to integrate Monte Carlo gradient techniques within both the optimize-then-discretize and the discretize-then-optimize frameworks from PDE-constrained optimization. This approach leads to the formulation of effective adjoint Monte Carlo methods, enabling efficient gradient estimation in complex, high-dimensional optimization problems.
Centralized vs. Decoupled Dual-Arm Planning Taking into Account Path Quality
Authors: Jonas Wittmann, Franziska Ochsenfarth, Valentin Sonneville, Daniel Rixen
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2401.08443
Pdf link: https://arxiv.org/pdf/2401.08443
Abstract The aim of coordinated planning is to avoid robot-to-robot collisions in a multi-robot system, and there are two standard solution approaches: centralized planning and decoupled planning. Our first contribution is a decoupled planning approach that ensures C2-continuous control commands with zero velocities at the start and goal. We benchmark our decoupled approach against a centralized approach. Contrary to the literature, we show that for a standard motion planning pipeline, such as the one used by MoveIt!, centralized planning is superior to decoupled planning in dual-arm manipulation: it has a lower computation time and a higher robustness. Our second contribution is an optimization that minimizes the rotational motion of an end-effector while considering obstacle avoidance. We derive the analytic gradients of this optimization problem, making the algorithm suitable for online motion planning. Our optimization extends an existing path quality improvement method. Integrating it into our decoupled approach overcomes its shortcomings and provides a motion planning pipeline that is robust at up to 99.9% with a planning time of less than 1s and that computes high-quality paths.
Battery-Swapping Multi-Agent System for Sustained Operation of Large Planetary Fleets
Authors: Ethan Holand, Jarrod Homer, Alex Storrer, Musheeera Khandeker, Ethan F. Muhlon, Maulik Patel, Ben-oni Vainqueur, David Antaki, Naomi Cooke, Chloe Wilson, Bahram Shafai, Nathaniel Hanson, Taşkın Padır
Subjects: Robotics (cs.RO)
Arxiv link: https://arxiv.org/abs/2401.08497
Pdf link: https://arxiv.org/pdf/2401.08497
Abstract We propose a novel, heterogeneous multi-agent architecture that miniaturizes rovers by outsourcing power generation to a central hub. By delegating power generation and distribution functions to this hub, the size, weight, power, and cost (SWAP-C) per rover are reduced, enabling efficient fleet scaling. As these rovers conduct mission tasks around the terrain, the hub charges an array of replacement battery modules. When a rover requires charging, it returns to the hub to initiate an autonomous docking sequence and exits with a fully charged battery. This confers an advantage over direct charging methods, such as wireless or wired charging, by replenishing a rover in minutes as opposed to hours, increasing net rover uptime. This work shares an open-source platform developed to demonstrate battery swapping on unknown field terrain. We detail our design methodologies utilized for increasing system reliability, with a focus on optimization, robust mechanical design, and verification. Optimization of the system is discussed, including the design of passive guide rails through simulation-based optimization methods which increase the valid docking configuration space by 258%. The full system was evaluated during integrated testing, where an average servicing time of 98 seconds was achieved on surfaces with a gradient up to 10°. We conclude by briefly proposing flight considerations for advancing the system toward a space-ready design. In sum, this prototype represents a proof of concept for autonomous docking and battery transfer on field terrain, advancing its Technology Readiness Level (TRL) from 1 to 3.
Spatial Entity Resolution between Restaurant Locations and Transportation Destinations in Southeast Asia
Authors: Emily Gao, Dominic Widdows
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2401.08537
Pdf link: https://arxiv.org/pdf/2401.08537
Abstract As a tech company, Grab has expanded from transportation to food delivery, aiming to serve Southeast Asia with hyperlocalized applications. Information about places as transportation destinations can help to improve our knowledge about places as restaurants, so long as the spatial entity resolution problem between these datasets can be solved. In this project, we attempted to recognize identical place entities from databases of Points-of-Interest (POI) and GrabFood restaurants, using their spatial and textual attributes, i.e., latitude, longitude, place name, and street address. Distance metrics were calculated for these attributes and fed to tree-based classifiers. POI-restaurant matching was conducted separately for Singapore, Philippines, Indonesia, and Malaysia. Experimental estimates demonstrate that a matching POI can be found for over 35% of restaurants in these countries. As part of these estimates, test datasets were manually created, and RandomForest, AdaBoost, Gradient Boosting, and XGBoost perform well, with most accuracy, precision, and recall scores close to or higher than 90% for matched vs. unmatched classification. To the authors' knowledge, there are no previous published scientific papers devoted to matching of spatial entities for the Southeast Asia region.
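Note (not part of the abstract): the step "distance metrics were calculated for these attributes" can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the record fields, the sample records, the haversine distance, and the `difflib`-based string similarity are all hypothetical choices, and the resulting feature vector would still need to be passed to a tree-based classifier, which is not shown.

```python
import math
from difflib import SequenceMatcher

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def text_similarity(a, b):
    """Similarity in [0, 1] between two strings, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def pair_features(poi, restaurant):
    """Feature vector for one candidate (POI, restaurant) pair:
    [spatial distance in km, name similarity, address similarity]."""
    return [
        haversine_km(poi["lat"], poi["lon"], restaurant["lat"], restaurant["lon"]),
        text_similarity(poi["name"], restaurant["name"]),
        text_similarity(poi["address"], restaurant["address"]),
    ]


# Hypothetical records, invented for illustration only.
poi = {"name": "Ah Seng Kopi", "address": "12 Example Road", "lat": 1.3000, "lon": 103.8000}
restaurant = {"name": "ah seng kopi ", "address": "12 Example Rd", "lat": 1.3001, "lon": 103.8001}

features = pair_features(poi, restaurant)
print(features)
```

Keeping the spatial and textual distances as separate features lets a classifier learn its own trade-off between them, rather than fixing a single hand-tuned matching threshold.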
Keyword: super-resolution

Deep Blind Super-Resolution for Satellite Video
Authors: Yi Xiao, Qiangqiang Yuan, Qiang Zhang, Liangpei Zhang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2401.07139
Pdf link: https://arxiv.org/pdf/2401.07139
Abstract Recent efforts have witnessed remarkable progress in Satellite Video Super-Resolution (SVSR). However, most SVSR methods usually assume the degradation is fixed and known, e.g., bicubic downsampling, which makes them vulnerable in real-world scenes with multiple and unknown degradations. To alleviate this issue, blind SR has thus become a research hotspot. Nevertheless, existing approaches are mainly engaged in blur kernel estimation while losing sight of another critical aspect for VSR tasks: temporal compensation, especially compensating for blurry and smooth pixels with vital sharpness from severely degraded satellite videos. Therefore, this paper proposes a practical Blind SVSR algorithm (BSVSR) to explore more sharp cues by considering the pixel-wise blur levels in a coarse-to-fine manner. Specifically, we employed multi-scale deformable convolution to coarsely aggregate the temporal redundancy into adjacent frames by window-slid progressive fusion. Then the adjacent features are finely merged into mid-feature using deformable attention, which measures the blur levels of pixels and assigns more weights to the informative pixels, thus inspiring the representation of sharpness. Moreover, we devise a pyramid spatial transformation module to adjust the solution space of sharp mid-feature, resulting in flexible feature adaptation in multi-level domains. Quantitative and qualitative evaluations on both simulated and real-world satellite videos demonstrate that our BSVSR performs favorably against state-of-the-art non-blind and blind SR models. Code will be available at https://github.com/XY-boy/Blind-Satellite-VSR
City Scene Super-Resolution via Geometric Error Minimization
Authors: Zhengyang Lu, Feng Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.07272
Pdf link: https://arxiv.org/pdf/2401.07272
Abstract Super-resolution techniques are crucial in improving image granularity, particularly in complex urban scenes, where preserving geometric structures is vital for data-informed cultural heritage applications. In this paper, we propose a city scene super-resolution method via geometric error minimization. The geometric-consistent mechanism leverages the Hough Transform to extract regular geometric features in city scenes, enabling the computation of geometric errors between low-resolution and high-resolution images. By minimizing mixed mean square error and geometric align error during the super-resolution process, the proposed method efficiently restores details and geometric regularities. Extensive validations on the SET14, BSD300, Cityscapes and GSV-Cities datasets demonstrate that the proposed method outperforms existing state-of-the-art methods, especially in urban scenes.
Sparsity-based background removal for STORM super-resolution images
Authors: Patris Valera, Josué Page Vizcaíno, Tobias Lasser
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2401.07746
Pdf link: https://arxiv.org/pdf/2401.07746
Abstract Single-molecule localization microscopy techniques, like stochastic optical reconstruction microscopy (STORM), visualize biological specimens by stochastically exciting sparse blinking emitters. The raw images suffer from unwanted background fluorescence, which must be removed to achieve super-resolution. We introduce a sparsity-based background removal method by adapting a neural network (SLNet) from a different microscopy domain. The SLNet computes a low-rank representation of the images, and then, by subtracting it from the raw images, the sparse component is computed, representing the frames without the background. We compared our approach with widely used background removal methods, such as the median background removal or the rolling ball algorithm, on two commonly used STORM datasets: one glial cell dataset and one microtubule dataset. The SLNet delivers STORM frames with less background, leading to higher emitter localization precision and higher-resolution reconstructed images than commonly used methods. Notably, the SLNet is lightweight and easily trainable (<5 min). Since it is trained in an unsupervised manner, no prior information is required, and it can be applied to any STORM dataset. We uploaded a pre-trained SLNet to the Bioimage model zoo, easily accessible through ImageJ. Our results show that our sparse decomposition method could be an essential and efficient STORM pre-processing tool.
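Note (not part of the abstract): for context, the median background removal baseline mentioned above can be sketched as follows. This is not SLNet and not the authors' code; it is a minimal illustration assuming each frame is a flat list of pixel intensities, the background is roughly constant over time, and emitters blink in only a few frames per pixel. The function names and the toy frames are hypothetical.

```python
from statistics import median


def median_background(frames):
    """Per-pixel temporal median over a stack of frames (each a flat list of pixels)."""
    n_pixels = len(frames[0])
    return [median(frame[p] for frame in frames) for p in range(n_pixels)]


def remove_background(frames):
    """Subtract the per-pixel median from every frame and clip negatives to zero."""
    background = median_background(frames)
    return [
        [max(0.0, value - bg) for value, bg in zip(frame, background)]
        for frame in frames
    ]


# Toy stack: 5 frames of 4 pixels, constant background plus three sparse blinks.
frames = [
    [10, 60, 12, 11],
    [10, 10, 12, 11],
    [55, 10, 12, 11],
    [10, 10, 12, 81],
    [10, 10, 12, 11],
]

background = median_background(frames)
cleaned = remove_background(frames)
print(background)
print(cleaned)
```

The median works here because each pixel is bright in only a minority of frames; when emitters are dense or the background drifts, that assumption weakens, which is the kind of situation a learned low-rank plus sparse decomposition is meant to handle.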
No-Clean-Reference Image Super-Resolution: Application to Electron Microscopy
Authors: Mohammad Khateri, Morteza Ghahremani, Alejandra Sierra, Jussi Tohka
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2401.08115
Pdf link: https://arxiv.org/pdf/2401.08115
Abstract The inability to acquire clean high-resolution (HR) electron microscopy (EM) images over a large brain tissue volume hampers many neuroscience studies. To address this challenge, we propose a deep-learning-based image super-resolution (SR) approach to computationally reconstruct clean HR 3D-EM with a large field of view (FoV) from noisy low-resolution (LR) acquisition. Our contributions are I) Investigating training with no-clean references for $\ell_2$ and $\ell_1$ loss functions; II) Introducing a novel network architecture, named EMSR, for enhancing the resolution of LR EM images while reducing inherent noise; and, III) Comparing different training strategies including using acquired LR and HR image pairs, i.e., real pairs with no-clean references contaminated with real corruptions, the pairs of synthetic LR and acquired HR, as well as acquired LR and denoised HR pairs. Experiments with nine brain datasets showed that training with real pairs can produce high-quality super-resolved results, demonstrating the feasibility of training with non-clean references for both loss functions. Additionally, comparable results were observed, both visually and numerically, when employing denoised and noisy references for training. Moreover, utilizing the network trained with synthetically generated LR images from HR counterparts proved effective in yielding satisfactory SR results, even in certain cases, outperforming training with real pairs. The proposed SR network was compared quantitatively and qualitatively with several established SR techniques, showcasing either the superiority or competitiveness of the proposed method in mitigating noise while recovering fine details.
The Devil is in the Details: Boosting Guided Depth Super-Resolution via Rethinking Cross-Modal Alignment and Aggregation
Authors: Xinni Jiang, Zengsheng Kuang, Chunle Guo, Ruixun Zhang, Lei Cai, Xiao Fan, Chongyi Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.08123
Pdf link: https://arxiv.org/pdf/2401.08123
Abstract Guided depth super-resolution (GDSR) involves restoring missing depth details using the high-resolution RGB image of the same scene. Previous approaches have struggled with the heterogeneity and complementarity of the multi-modal inputs, and neglected the issues of modal misalignment, geometrical misalignment, and feature selection. In this study, we rethink some essential components in GDSR networks and propose a simple yet effective Dynamic Dual Alignment and Aggregation network (D2A2). D2A2 mainly consists of 1) a dynamic dual alignment module that alleviates the modal misalignment via a learnable domain alignment block and geometrically aligns cross-modal features by learning the offset; and 2) a mask-to-pixel feature aggregation module that uses a gated mechanism and pixel attention to filter out irrelevant texture noise from RGB features and combine the useful features with depth features. By combining the strengths of RGB and depth features while minimizing disturbance introduced by the RGB image, our method, with simple reuse and redesign of basic components, achieves state-of-the-art performance on multiple benchmark datasets. The code is available at https://github.com/JiangXinni/D2A2.
Transcending the Limit of Local Window: Advanced Super-Resolution Transformer with Adaptive Token Dictionary
Authors: Leheng Zhang, Yawei Li, Xingyu Zhou, Xiaorui Zhao, Shuhang Gu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.08209
Pdf link: https://arxiv.org/pdf/2401.08209
Abstract Single Image Super-Resolution is a classic computer vision problem that involves estimating high-resolution (HR) images from low-resolution (LR) ones. Although deep neural networks (DNNs), especially Transformers for super-resolution, have seen significant advancements in recent years, challenges still remain, particularly the limited receptive field caused by window-based self-attention. To address these issues, we introduce a group of auxiliary Adaptive Token Dictionary to SR Transformer and establish an ATD-SR method. The introduced token dictionary could learn prior information from training data and adapt the learned prior to a specific testing image through an adaptive refinement step. The refinement strategy could not only provide global information to all input tokens but also group image tokens into categories. Based on category partitions, we further propose a category-based self-attention mechanism designed to leverage distant but similar tokens for enhancing input features. The experimental results show that our method achieves the best performance on various single image super-resolution benchmarks.
Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis
Authors: Zhenhui Ye, Tianyun Zhong, Yi Ren, Jiaqi Yang, Weichuang Li, Jiawei Huang, Ziyue Jiang, Jinzheng He, Rongjie Huang, Jinglin Liu, Chen Zhang, Xiang Yin, Zejun Ma, Zhou Zhao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2401.08503
Pdf link: https://arxiv.org/pdf/2401.08503
Abstract One-shot 3D talking portrait generation aims to reconstruct a 3D avatar from an unseen image, and then animate it with a reference video or audio to generate a talking portrait video. The existing methods fail to simultaneously achieve the goals of accurate 3D avatar reconstruction and stable talking face animation. Besides, while the existing works mainly focus on synthesizing the head part, it is also vital to generate natural torso and background segments to obtain a realistic talking portrait video. To address these limitations, we present Real3D-Portrait, a framework that (1) improves the one-shot 3D reconstruction power with a large image-to-plane model that distills 3D prior knowledge from a 3D face generative model; (2) facilitates accurate motion-conditioned animation with an efficient motion adapter; (3) synthesizes realistic video with natural torso movement and switchable background using a head-torso-background super-resolution model; and (4) supports one-shot audio-driven talking face generation with a generalizable audio-to-motion model. Extensive experiments show that Real3D-Portrait generalizes well to unseen identities and generates more realistic talking portrait videos compared to previous methods.

zoq / arxiv-updates

New submissions for Wed, 17 Jan 24 #688

Keyword: sgd

Deep Learning Based Cyberbullying Detection in Bangla Language

An ADRC-Incorporated Stochastic Gradient Descent Algorithm for Latent Factor Analysis

Stabilizing Sharpness-aware Minimization Through A Simple Renormalization Strategy

Activations and Gradients Compression for Model-Parallel Training

Keyword: optimization

Reinforcement Learning for Optimizing RAG for Domain Chatbots

Multi-Memory Matching for Unsupervised Visible-Infrared Person Re-Identification

MAPO: Advancing Multilingual Reasoning through Multilingual Alignment-as-Preference Optimization

Advanced safety filter based on SOS Control Barrier and Lyapunov Functions

Multi-hop Relaying with Mixed Half and Full Duplex Relays for Offloading to MEC

Fast and Accurate Zero-Training Classification for Tabular Engineering Data

Class-Imbalanced Semi-Supervised Learning for Large-Scale Point Cloud Semantic Segmentation via Decoupling Optimization

Joint Unsupervised and Supervised Training for Automatic Speech Recognition via Bilevel Optimization

UAV-assisted Emergency Integrated Sensing and Communication Networks: A CNN-based Rapid Deployment Approach

COIN: Chance-Constrained Imitation Learning for Uncertainty-aware Adaptive Resource Oversubscription Policy

GoMatching: A Simple Baseline for Video Text Spotting via Long and Short Term Matching

Optimization of Inter-group Criteria for Clustering with Minimum Size Constraints

Resource Allocation in Uplink Multi STAR-RIS-aided NOMA System via Meta-Learning

Secrecy Coding for the Binary Symmetric Wiretap Channel via Linear Programming

Adaptive Prognostic Malfunction Based Processor for Autonomous Landing Guidance Assistance System Using FPGA

Inroads to a Structured Data Natural Language Bijection and the role of LLM annotation

Stabilizing Sharpness-aware Minimization Through A Simple Renormalization Strategy

Emergency Localization for Mobile Ground Users: An Adaptive UAV Trajectory Planning Method

FROST-BRDF: A Fast and Robust Optimal Sampling Technique for BRDF Acquisition

Hybrid Coded-Uncoded Caching in Multi-Access Networks with Non-uniform Demands

CoVO-MPC: Theoretical Analysis of Sampling-based MPC and Optimal Covariance Design

A Data-driven Resilience Framework of Directionality Configuration based on Topological Credentials in Road Networks

A Novel Optimization Algorithm for Buffer and Splitter Minimization in Phase-Skipping Adiabatic Quantum-Flux-Parametron Circuits

Fairness-aware Photovoltaic Generation Limits for Voltage Regulation in Power Distribution Networks using Conservative Linear Approximations

Multi-Task DNS Security Analysis via High-Order Heterogeneous Graph Embedding

Startup Delay Aware Short Video Ordering: Problem, Model, and A Reinforcement Learning based Algorithm

Evolutionary Multi-Objective Diversity Optimization

Input Convex Lipschitz RNN: A Fast and Robust Approach for Engineering Tasks

Study Features via Exploring Distribution Structure

Eco-driving Intelligent Systems and Algorithms: A Patent Review

A greedy heuristic for graph burning

RedEx: Beyond Fixed Representation Methods via Convex Optimization

Multi-Objective Optimization in STAR-RIS-Aided SWIPT with RSMA via Meta-Learning

Preserving Power Optimizations Across the High Level Synthesis of Distinct Application-Specific Circuits

HexaGen3D: StableDiffusion is just one step away from Fast and Diverse Text-to-3D Generation

Joint Probability Selection and Power Allocation for Federated Learning

Learning Soft Constrained MPC Value Functions: Efficient MPC Design and Implementation providing Stability and Safety Guarantees

Certifiable Mutual Localization and Trajectory Planning for Bearing-Based Robot Swarm

Optimal experimental design via gradient flow

Online Simulation at Machine Level: A Systematic Review

PATSMA: Parameter Auto-tuning for Shared Memory Algorithms

The Chronicles of RAG: The Retriever, the Chunk and the Generator

6-DoF Grasp Pose Evaluation and Optimization via Transfer Learning from NeRFs

Playing the MEV Game on a First-Come-First-Served Blockchain

Hardware Acceleration for Real-Time Wildfire Detection Onboard Drone Networks

Operation Scheme Optimizations to Achieve Ultra-high Endurance ($10^{10}$) in Flash Memory with Robust Reliabilities

Distributed Stackelberg Equilibrium Seeking for Networked Multi-Leader Multi-Follower Games with A Clustered Information Structure

Learning Stable Koopman Embeddings for Identification and Control

PRewrite: Prompt Rewriting with Reinforcement Learning

Efficient and Mathematically Robust Operations for Certified Neural Networks Inference

Phase-free Dynamic Movement Primitives Applied to Kinesthetic Guidance in Robotic Co-manipulation Tasks

Optimizing $k$ in $k$NN Graphs with Graph Learning Perspective

The Faiss library

Boosting Gradient Ascent for Continuous DR-submodular Maximization

Adjoint Monte Carlo Method

Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference

High-Quality Mesh Blendshape Generation from Face Videos via Neural Inverse Rendering

Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation

Centralized vs. Decoupled Dual-Arm Planning Taking into Account Path Quality

Revealing the Hidden Impact of Top-N Metrics on Optimization in Recommender Systems

Instilling Multi-round Thinking to Text-guided Image Generation

Real-Time Dynamic Layout Optimization for Floating Offshore Wind Farm Control

Battery-Swapping Multi-Agent System for Sustained Operation of Large Planetary Fleets

Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering

X Hacking: The Threat of Misguided AutoML

Experimentally implemented dynamic optogenetic optimization of ATPase expression using knowledge-based and Gaussian-process-supported models

Safe Mission-Level Path Planning for Exploration of Lunar Shadowed Regions by a Solar-Powered Rover

Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal Data

RoHM: Robust Human Motion Reconstruction via Diffusion

Keyword: adam

Deep Learning Based Cyberbullying Detection in Bangla Language

Keyword: gradient