New submissions for Mon, 15 Apr 24

Keyword: differential privacy

QI-DPFL: Quality-Aware and Incentive-Boosted Federated Learning with Differential Privacy

Authors: Wenhao Yuan, Xuehe Wang
Subjects: Computer Science and Game Theory (cs.GT)
Arxiv link: https://arxiv.org/abs/2404.08261
Pdf link: https://arxiv.org/pdf/2404.08261
Abstract Federated Learning (FL) has increasingly been recognized as an innovative and secure distributed model training paradigm, aiming to coordinate multiple edge clients to collaboratively train a shared model without uploading their private datasets. The challenge of encouraging mobile edge devices to participate zealously in FL model training procedures, while mitigating the privacy leakage risks during wireless transmission, remains comparatively unexplored so far. In this paper, we propose a novel approach, named QI-DPFL (Quality-Aware and Incentive-Boosted Federated Learning with Differential Privacy), to address the aforementioned intractable issue. To select clients with high-quality datasets, we first propose a quality-aware client selection mechanism based on the Earth Mover's Distance (EMD) metric. Furthermore, to attract high-quality data contributors, we design an incentive-boosted mechanism that constructs the interactions between the central server and the selected clients as a two-stage Stackelberg game, where the central server designs the time-dependent reward to minimize its cost by considering the trade-off between accuracy loss and total reward allocated, and each selected client decides the privacy budget to maximize its utility. The Nash Equilibrium of the Stackelberg game is derived to find the optimal solution in each global iteration. The extensive experimental results on different real-world datasets demonstrate the effectiveness of our proposed FL framework, by realizing the goal of privacy protection and incentive compatibility.
Keyword: privacy

FedAuxHMTL: Federated Auxiliary Hard-Parameter Sharing Multi-Task Learning for Network Edge Traffic Classification
Authors: Faisal Ahmed, Myungjin Lee, Suresh Subramaniam, Motoharu Matsuura, Hiroshi Hasegawa, Shih-Chun Lin
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2404.08028
Pdf link: https://arxiv.org/pdf/2404.08028
Abstract Federated Learning (FL) has garnered significant interest recently due to its potential as an effective solution for tackling many challenges in diverse application scenarios, for example, data privacy in network edge traffic classification. Despite its recognized advantages, FL encounters obstacles linked to statistical data heterogeneity and labeled data scarcity during the training of single-task models for machine learning-based traffic classification, leading to hindered learning performance. In response to these challenges, adopting a hard-parameter sharing multi-task learning model with auxiliary tasks proves to be a suitable approach. Such a model has the capability to reduce communication and computation costs, navigate statistical complexities inherent in FL contexts, and overcome labeled data scarcity by leveraging knowledge derived from interconnected auxiliary tasks. This paper introduces a new framework for federated auxiliary hard-parameter sharing multi-task learning, namely, FedAuxHMTL. The introduced framework incorporates model parameter exchanges between edge server and base stations, enabling base stations from distributed areas to participate in the FedAuxHMTL process and enhance the learning performance of the main task-network edge traffic classification. Empirical experiments are conducted to validate and demonstrate the FedAuxHMTL's effectiveness in terms of accuracy, total global loss, communication costs, computing time, and energy consumption compared to its counterparts.
WildGraph: Realistic Graph-based Trajectory Generation for Wildlife
Authors: Ali Al-Lawati, Elsayed Eshra, Prasenjit Mitra
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.08068
Pdf link: https://arxiv.org/pdf/2404.08068
Abstract Trajectory generation is an important task in movement studies; it circumvents the privacy, ethical, and technical challenges of collecting real trajectories from the target population. In particular, real trajectories in the wildlife domain are scarce as a result of ethical and environmental constraints of the collection process. In this paper, we consider the problem of generating long-horizon trajectories, akin to wildlife migration, based on a small set of real samples. We propose a hierarchical approach to learn the global movement characteristics of the real dataset and recursively refine localized regions. Our solution, WildGraph, discretizes the geographic path into a prototype network of H3 (https://www.uber.com/blog/h3/) regions and leverages a recurrent variational auto-encoder to probabilistically generate paths over the regions, based on occupancy. WildGraph successfully generates realistic months-long trajectories using a sample size as small as 60. Experiments performed on two wildlife migration datasets demonstrate that our proposed method improves the generalization of the generated trajectories in comparison to existing work while achieving superior or comparable performance in several benchmark metrics. Our code is published on the following repository: \url{https://github.com/aliwister/wildgraph}.
QI-DPFL: Quality-Aware and Incentive-Boosted Federated Learning with Differential Privacy
Authors: Wenhao Yuan, Xuehe Wang
Subjects: Computer Science and Game Theory (cs.GT)
Arxiv link: https://arxiv.org/abs/2404.08261
Pdf link: https://arxiv.org/pdf/2404.08261
Abstract Federated Learning (FL) has increasingly been recognized as an innovative and secure distributed model training paradigm, aiming to coordinate multiple edge clients to collaboratively train a shared model without uploading their private datasets. The challenge of encouraging mobile edge devices to participate zealously in FL model training procedures, while mitigating the privacy leakage risks during wireless transmission, remains comparatively unexplored so far. In this paper, we propose a novel approach, named QI-DPFL (Quality-Aware and Incentive-Boosted Federated Learning with Differential Privacy), to address the aforementioned intractable issue. To select clients with high-quality datasets, we first propose a quality-aware client selection mechanism based on the Earth Mover's Distance (EMD) metric. Furthermore, to attract high-quality data contributors, we design an incentive-boosted mechanism that constructs the interactions between the central server and the selected clients as a two-stage Stackelberg game, where the central server designs the time-dependent reward to minimize its cost by considering the trade-off between accuracy loss and total reward allocated, and each selected client decides the privacy budget to maximize its utility. The Nash Equilibrium of the Stackelberg game is derived to find the optimal solution in each global iteration. The extensive experimental results on different real-world datasets demonstrate the effectiveness of our proposed FL framework, by realizing the goal of privacy protection and incentive compatibility.
Collaborative-Enhanced Prediction of Spending on Newly Downloaded Mobile Games under Consumption Uncertainty
Authors: Peijie Sun, Yifan Wang, Min Zhang, Chuhan Wu, Yan Fang, Hong Zhu, Yuan Fang, Meng Wang
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.08301
Pdf link: https://arxiv.org/pdf/2404.08301
Abstract With the surge in mobile gaming, accurately predicting user spending on newly downloaded games has become paramount for maximizing revenue. However, the inherently unpredictable nature of user behavior poses significant challenges in this endeavor. To address this, we propose a robust model training and evaluation framework aimed at standardizing spending data to mitigate label variance and extremes, ensuring stability in the modeling process. Within this framework, we introduce a collaborative-enhanced model designed to predict user game spending without relying on user IDs, thus ensuring user privacy and enabling seamless online training. Our model adopts a unique approach by separately representing user preferences and game features before merging them as input to the spending prediction module. Through rigorous experimentation, our approach demonstrates notable improvements over production models, achieving a remarkable \textbf{17.11}\% enhancement on offline data and an impressive \textbf{50.65}\% boost in an online A/B test. In summary, our contributions underscore the importance of stable model training frameworks and the efficacy of collaborative-enhanced models in predicting user spending behavior in mobile gaming.
Manifest V3 Unveiled: Navigating the New Era of Browser Extensions
Authors: Nikolaos Pantelaios, Alexandros Kapravelos
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2404.08310
Pdf link: https://arxiv.org/pdf/2404.08310
Abstract Introduced over a decade ago, Chrome extensions now exceed 200,000 in number. In 2020, Google announced a shift in extension development with Manifest Version 3 (V3), aiming to replace the previous Version 2 (V2) by January 2023. This deadline was later extended to January 2025. The company's decision is grounded in enhancing three main pillars: privacy, security, and performance. This paper presents a comprehensive analysis of the Manifest V3 ecosystem. We start by investigating the adoption rate of V3, detailing the percentage of adoption from its announcement up until 2024. Our findings indicate, prior to the 2023 pause, less than 5% of all extensions had transitioned to V3, despite the looming deadline for the complete removal of V2, while currently nine out of ten new extensions are being uploaded in Manifest V3. Furthermore, we compare the security and privacy enhancements between V2 and V3 and we evaluate the improved security attributable to V3's safer APIs, examining how certain APIs, which were vulnerable or facilitated malicious behavior, have been deprecated or removed in V3. We dynamically execute 517 confirmed malicious extensions and we see a 87.8% removal of APIs related to malicious behavior due to the improvements of V3. We discover that only 154 (29.8%) of these extensions remain functional post-conversion. This analysis leads to the conclusion that V3 reduces the avenues for abuse of such APIs. However, despite the reduction in APIs associated with malicious activities, the new Manifest V3 protocol is not immune to such behavior. Our research demonstrates, through a proof of concept, the adaptability of malicious activities to V3. After the proof of concept changes are applied, we showcase 290 (56%) of the examined malicious extensions retain their capability to conduct harmful activities within the V3 framework.
Keyword: machine learning

An inclusive review on deep learning techniques and their scope in handwriting recognition
Authors: Sukhdeep Singh, Sudhir Rohilla, Anuj Sharma
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.08011
Pdf link: https://arxiv.org/pdf/2404.08011
Abstract Deep learning expresses a category of machine learning algorithms that have the capability to combine raw inputs into intermediate features layers. These deep learning algorithms have demonstrated great results in different fields. Deep learning has particularly witnessed for a great achievement of human level performance across a number of domains in computer vision and pattern recognition. For the achievement of state-of-the-art performances in diverse domains, the deep learning used different architectures and these architectures used activation functions to perform various computations between hidden and output layers of any architecture. This paper presents a survey on the existing studies of deep learning in handwriting recognition field. Even though the recent progress indicates that the deep learning methods has provided valuable means for speeding up or proving accurate results in handwriting recognition, but following from the extensive literature survey, the present study finds that the deep learning has yet to revolutionize more and has to resolve many of the most pressing challenges in this field, but promising advances have been made on the prior state of the art. Additionally, an inadequate availability of labelled data to train presents problems in this domain. Nevertheless, the present handwriting recognition survey foresees deep learning enabling changes at both bench and bedside with the potential to transform several domains as image processing, speech recognition, computer vision, machine translation, robotics and control, medical imaging, medical information processing, bio-informatics, natural language processing, cyber security, and many others.
AI-Guided Feature Segmentation Techniques to Model Features from Single Crystal Diamond Growth
Authors: Rohan Reddy Mekala, Elias Garratt, Matthias Muehle, Arjun Srinivasan, Adam Porter, Mikael Lindvall
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.08017
Pdf link: https://arxiv.org/pdf/2404.08017
Abstract Process refinement to consistently produce high-quality material over a large area of the grown crystal, enabling various applications from optics crystals to quantum detectors, has long been a goal for diamond growth. Machine learning offers a promising path toward this goal, but faces challenges such as the complexity of features within datasets, their time-dependency, and the volume of data produced per growth run. Accurate spatial feature extraction from image to image for real-time monitoring of diamond growth is crucial yet complicated due to the low-volume and high feature complexity nature of the datasets. This paper compares various traditional and machine learning-driven approaches for feature extraction in the diamond growth domain, proposing a novel deep learning-driven semantic segmentation approach to isolate and classify accurate pixel masks of geometric features like diamond, pocket holder, and background, along with their derivative features based on shape and size. Using an annotation-focused human-in-the-loop software architecture for training datasets, with modules for selective data labeling using active learning, data augmentations, and model-assisted labeling, our approach achieves effective annotation accuracy and drastically reduces labeling time and cost. Deep learning algorithms prove highly efficient in accurately learning complex representations from datasets with many features. Our top-performing model, based on the DeeplabV3plus architecture, achieves outstanding accuracy in classifying features of interest, with accuracies of 96.31% for pocket holder, 98.60% for diamond top, and 91.64% for diamond side features.
The OxMat dataset: a multimodal resource for the development of AI-driven technologies in maternal and newborn child health
Authors: M. Jaleed Khan, Ioana Duta, Beth Albert, William Cooke, Manu Vatish, Gabriel Davis Jones
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.08024
Pdf link: https://arxiv.org/pdf/2404.08024
Abstract The rapid advancement of Artificial Intelligence (AI) in healthcare presents a unique opportunity for advancements in obstetric care, particularly through the analysis of cardiotocography (CTG) for fetal monitoring. However, the effectiveness of such technologies depends upon the availability of large, high-quality datasets that are suitable for machine learning. This paper introduces the Oxford Maternity (OxMat) dataset, the world's largest curated dataset of CTGs, featuring raw time series CTG data and extensive clinical data for both mothers and babies, which is ideally placed for machine learning. The OxMat dataset addresses the critical gap in women's health data by providing over 177,211 unique CTG recordings from 51,036 pregnancies, carefully curated and reviewed since 1991. The dataset also comprises over 200 antepartum, intrapartum and postpartum clinical variables, ensuring near-complete data for crucial outcomes such as stillbirth and acidaemia. While this dataset also covers the intrapartum stage, around 94% of the constituent CTGS are antepartum. This allows for a unique focus on the underserved antepartum period, in which early detection of at-risk fetuses can significantly improve health outcomes. Our comprehensive review of existing datasets reveals the limitations of current datasets: primarily, their lack of sufficient volume, detailed clinical data and antepartum data. The OxMat dataset lays a foundation for future AI-driven prenatal care, offering a robust resource for developing and testing algorithms aimed at improving maternal and fetal health outcomes.
FedAuxHMTL: Federated Auxiliary Hard-Parameter Sharing Multi-Task Learning for Network Edge Traffic Classification
Authors: Faisal Ahmed, Myungjin Lee, Suresh Subramaniam, Motoharu Matsuura, Hiroshi Hasegawa, Shih-Chun Lin
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2404.08028
Pdf link: https://arxiv.org/pdf/2404.08028
Abstract Federated Learning (FL) has garnered significant interest recently due to its potential as an effective solution for tackling many challenges in diverse application scenarios, for example, data privacy in network edge traffic classification. Despite its recognized advantages, FL encounters obstacles linked to statistical data heterogeneity and labeled data scarcity during the training of single-task models for machine learning-based traffic classification, leading to hindered learning performance. In response to these challenges, adopting a hard-parameter sharing multi-task learning model with auxiliary tasks proves to be a suitable approach. Such a model has the capability to reduce communication and computation costs, navigate statistical complexities inherent in FL contexts, and overcome labeled data scarcity by leveraging knowledge derived from interconnected auxiliary tasks. This paper introduces a new framework for federated auxiliary hard-parameter sharing multi-task learning, namely, FedAuxHMTL. The introduced framework incorporates model parameter exchanges between edge server and base stations, enabling base stations from distributed areas to participate in the FedAuxHMTL process and enhance the learning performance of the main task-network edge traffic classification. Empirical experiments are conducted to validate and demonstrate the FedAuxHMTL's effectiveness in terms of accuracy, total global loss, communication costs, computing time, and energy consumption compared to its counterparts.
A Multi-Expert Large Language Model Architecture for Verilog Code Generation
Authors: Bardia Nadimi, Hao Zheng
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Programming Languages (cs.PL); Software Engineering (cs.SE)
Arxiv link: https://arxiv.org/abs/2404.08029
Pdf link: https://arxiv.org/pdf/2404.08029
Abstract Recently, there has been a surging interest in using large language models (LLMs) for Verilog code generation. However, the existing approaches are limited in terms of the quality of the generated Verilog code. To address such limitations, this paper introduces an innovative multi-expert LLM architecture for Verilog code generation (MEV-LLM). Our architecture uniquely integrates multiple LLMs, each specifically fine-tuned with a dataset that is categorized with respect to a distinct level of design complexity. It allows more targeted learning, directly addressing the nuances of generating Verilog code for each category. Empirical evidence from experiments highlights notable improvements in terms of the percentage of generated Verilog outputs that are syntactically and functionally correct. These findings underscore the efficacy of our approach, promising a forward leap in the field of automated hardware design through machine learning.
Persistent Classification: A New Approach to Stability of Data and Adversarial Examples
Authors: Brian Bell, Michael Geyer, David Glickenstein, Keaton Hamm, Carlos Scheidegger, Amanda Fernandez, Juston Moore
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.08069
Pdf link: https://arxiv.org/pdf/2404.08069
Abstract There are a number of hypotheses underlying the existence of adversarial examples for classification problems. These include the high-dimensionality of the data, high codimension in the ambient space of the data manifolds of interest, and that the structure of machine learning models may encourage classifiers to develop decision boundaries close to data points. This article proposes a new framework for studying adversarial examples that does not depend directly on the distance to the decision boundary. Similarly to the smoothed classifier literature, we define a (natural or adversarial) data point to be $(\gamma,\sigma)$-stable if the probability of the same classification is at least $\gamma$ for points sampled in a Gaussian neighborhood of the point with a given standard deviation $\sigma$. We focus on studying the differences between persistence metrics along interpolants of natural and adversarial points. We show that adversarial examples have significantly lower persistence than natural examples for large neural networks in the context of the MNIST and ImageNet datasets. We connect this lack of persistence with decision boundary geometry by measuring angles of interpolants with respect to decision boundaries. Finally, we connect this approach with robustness by developing a manifold alignment gradient metric and demonstrating the increase in robustness that can be achieved when training with the addition of this metric.
Towards a Robust Soft Baby Robot With Rich Interaction Ability for Advanced Machine Learning Algorithms
Authors: Mohannad Alhakami, Dylan R. Ashley, Joel Dunham, Francesco Faccio, Eric Feron, Jürgen Schmidhuber
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.08093
Pdf link: https://arxiv.org/pdf/2404.08093
Abstract Artificial intelligence has made great strides in many areas lately, yet it has had comparatively little success in general-use robotics. We believe one of the reasons for this is the disconnect between traditional robotic design and the properties needed for open-ended, creativity-based AI systems. To that end, we, taking selective inspiration from nature, build a robust, partially soft robotic limb with a large action space, rich sensory data stream from multiple cameras, and the ability to connect with others to enhance the action space and data stream. As a proof of concept, we train two contemporary machine learning algorithms to perform a simple target-finding task. Altogether, we believe that this design serves as a first step to building a robot tailor-made for achieving artificial general intelligence.
Predictive Handover Strategy in 6G and Beyond: A Deep and Transfer Learning Approach
Authors: Ioannis Panitsas, Akrit Mudvari, Ali Maatouk, Leandros Tassiulas
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.08113
Pdf link: https://arxiv.org/pdf/2404.08113
Abstract Next-generation cellular networks will evolve into more complex and virtualized systems, employing machine learning for enhanced optimization and leveraging higher frequency bands and denser deployments to meet varied service demands. This evolution, while bringing numerous advantages, will also pose challenges, especially in mobility management, as it will increase the overall number of handovers due to smaller coverage areas and the higher signal attenuation. To address these challenges, we propose a deep learning based algorithm for predicting the future serving cell utilizing sequential user equipment measurements to minimize the handover failures and interruption time. Our algorithm enables network operators to dynamically adjust handover triggering events or incorporate UAV base stations for enhanced coverage and capacity, optimizing network objectives like load balancing and energy efficiency through transfer learning techniques. Our framework complies with the O-RAN specifications and can be deployed in a Near-Real-Time RAN Intelligent Controller as an xApp leveraging the E2SM-KPM service model. The evaluation results demonstrate that our algorithm achieves a 92% accuracy in predicting future serving cells with high probability. Finally, by utilizing transfer learning, our algorithm significantly reduces the retraining time by 91% and 77% when new handover trigger decisions or UAV base stations are introduced to the network dynamically.
Graph Integrated Language Transformers for Next Action Prediction in Complex Phone Calls
Authors: Amin Hosseiny Marani, Ulie Schnaithmann, Youngseo Son, Akil Iyer, Manas Paldhe, Arushi Raghuvanshi
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2404.08155
Pdf link: https://arxiv.org/pdf/2404.08155
Abstract Current Conversational AI systems employ different machine learning pipelines, as well as external knowledge sources and business logic to predict the next action. Maintaining various components in dialogue managers' pipeline adds complexity in expansion and updates, increases processing time, and causes additive noise through the pipeline that can lead to incorrect next action prediction. This paper investigates graph integration into language transformers to improve understanding the relationships between humans' utterances, previous, and next actions without the dependency on external sources or components. Experimental analyses on real calls indicate that the proposed Graph Integrated Language Transformer models can achieve higher performance compared to other production level conversational AI systems in driving interactive calls with human users in real-world settings.
A Survey on Security of Ultra/Hyper Reliable Low Latency Communication: Recent Advancements, Challenges, and Future Directions
Authors: Annapurna Pradhan, Susmita Das, Md. Jalil Piran, Zhu Han
Subjects: Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2404.08160
Pdf link: https://arxiv.org/pdf/2404.08160
Abstract Ultra-reliable low latency communication (URLLC) is an innovative service offered by fifth-generation (5G) wireless systems. URLLC enables various mission-critical applications by facilitating reliable and low-latency signal transmission to support extreme Quality of Service (QoS) requirements. Apart from reliability and latency, ensuring secure data transmission for URLLC has been a prominent issue for researchers in recent years. Using finite blocklength signals to achieve the stringent reliability and latency criteria in URLLC eliminates the possibility of using conventional complex cryptographic security enhancement techniques based on encoding and decoding of secret keys. Thus, the development of lightweight security mechanisms is of paramount importance for URLLC. Recently, Physical-Layer Security (PLS) techniques have emerged as a powerful alternative to the complex cryptography-based security approaches for facilitating secure URLLC by exploiting the randomness of the wireless channel. Therefore, in this survey, we present a comprehensive and in-depth review of the state-of-the-art PLS enhancements utilized to unleash secure URLLC while analyzing the impact of various system design parameters on its performance. Moreover, the survey incorporates a detailed overview of the recent advancements in ensuring secure URLLC using PLS in various mission-critical applications, and 5G URLLC enabling technologies like non-orthogonal multiple access (NOMA), multi-antenna systems, cooperative communication using unmanned aerial vehicles (UAV), and intelligent reflective surfaces (IRS). Apart from this, we briefly discuss the role of advanced Machine Learning (ML) techniques in designing robust and intelligent PLS schemes for URLLC service.
Clustering Analysis of US COVID-19 Rates, Vaccine Participation, and Socioeconomic Factors
Authors: Morteza Maleki
Subjects: Computational Engineering, Finance, and Science (cs.CE); Physics and Society (physics.soc-ph)
Arxiv link: https://arxiv.org/abs/2404.08186
Pdf link: https://arxiv.org/pdf/2404.08186
Abstract The COVID-19 pandemic has presented unprecedented challenges worldwide, with its impact varying significantly across different geographic and socioeconomic contexts. This study employs a clustering analysis to examine the diversity of responses to the pandemic within the United States, aiming to provide nuanced insights into the effectiveness of various strategies. We utilize an unsupervised machine learning approach, specifically K-Means clustering, to analyze county-level data that includes variables such as infection rates, death rates, demographic profiles, and socio-economic factors. Our analysis identifies distinct clusters of counties based on their pandemic responses and outcomes, facilitating a detailed examination of "high-performing" and "lower-performing" groups. These classifications are informed by a combination of COVID-specific datasets and broader socio-economic data, allowing for a comprehensive understanding of the factors that contribute to differing levels of pandemic impact. The findings underscore the importance of tailored public health responses that consider local conditions and capabilities. Additionally, this study introduces an innovative visualization tool that aids in hypothesis testing and further research, enhancing the ability of policymakers and public health officials to deploy more effective and targeted interventions in future health crises.
Enhancing Fairness and Performance in Machine Learning Models: A Multi-Task Learning Approach with Monte-Carlo Dropout and Pareto Optimality
Authors: Khadija Zanna, Akane Sano
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2404.08230
Pdf link: https://arxiv.org/pdf/2404.08230
Abstract This paper considers the need for generalizable bias mitigation techniques in machine learning due to the growing concerns of fairness and discrimination in data-driven decision-making procedures across a range of industries. While many existing methods for mitigating bias in machine learning have succeeded in specific cases, they often lack generalizability and cannot be easily applied to different data types or models. Additionally, the trade-off between accuracy and fairness remains a fundamental tension in the field. To address these issues, we propose a bias mitigation method based on multi-task learning, utilizing the concept of Monte-Carlo dropout and Pareto optimality from multi-objective optimization. This method optimizes accuracy and fairness while improving the model's explainability without using sensitive information. We test this method on three datasets from different domains and show how it can deliver the most desired trade-off between model fairness and performance. This allows for tuning in specific domains where one metric may be more important than another. With the framework we introduce in this paper, we aim to enhance the fairness-performance trade-off and offer a solution to bias mitigation methods' generalizability issues in machine learning.
Generalized Population-Based Training for Hyperparameter Optimization in Reinforcement Learning
Authors: Hui Bai, Ran Cheng
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2404.08233
Pdf link: https://arxiv.org/pdf/2404.08233
Abstract Hyperparameter optimization plays a key role in the machine learning domain. Its significance is especially pronounced in reinforcement learning (RL), where agents continuously interact with and adapt to their environments, requiring dynamic adjustments in their learning trajectories. To cater to this dynamicity, the Population-Based Training (PBT) was introduced, leveraging the collective intelligence of a population of agents learning simultaneously. However, PBT tends to favor high-performing agents, potentially neglecting the explorative potential of agents on the brink of significant advancements. To mitigate the limitations of PBT, we present the Generalized Population-Based Training (GPBT), a refined framework designed for enhanced granularity and flexibility in hyperparameter adaptation. Complementing GPBT, we further introduce Pairwise Learning (PL). Instead of merely focusing on elite agents, PL employs a comprehensive pairwise strategy to identify performance differentials and provide holistic guidance to underperforming agents. By integrating the capabilities of GPBT and PL, our approach significantly improves upon traditional PBT in terms of adaptability and computational efficiency. Rigorous empirical evaluations across a range of RL benchmarks confirm that our approach consistently outperforms not only the conventional PBT but also its Bayesian-optimized variant.
Communication-Efficient Model Aggregation with Layer Divergence Feedback in Federated Learning
Authors: Liwei Wang, Jun Li, Wen Chen, Qingqing Wu, Ming Ding
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2404.08324
Pdf link: https://arxiv.org/pdf/2404.08324
Abstract Federated Learning (FL) facilitates collaborative machine learning by training models on local datasets, and subsequently aggregating these local models at a central server. However, the frequent exchange of model parameters between clients and the central server can result in significant communication overhead during the FL training process. To solve this problem, this paper proposes a novel FL framework, the Model Aggregation with Layer Divergence Feedback mechanism (FedLDF). Specifically, we calculate model divergence between the local model and the global model from the previous round. Then through model layer divergence feedback, the distinct layers of each client are uploaded and the amount of data transferred is reduced effectively. Moreover, the convergence bound reveals that the access ratio of clients has a positive correlation with model performance. Simulation results show that our algorithm uploads local models with reduced communication overhead while upholding a superior global model performance.
Learning representations of learning representations
Authors: Rita González-Márquez, Dmitry Kobak
Subjects: Computation and Language (cs.CL); Digital Libraries (cs.DL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.08403
Pdf link: https://arxiv.org/pdf/2404.08403
Abstract The ICLR conference is unique among the top machine learning conferences in that all submitted papers are openly available. Here we present the ICLR dataset consisting of abstracts of all 24 thousand ICLR submissions from 2017-2024 with meta-data, decision scores, and custom keyword-based labels. We find that on this dataset, bag-of-words representation outperforms most dedicated sentence transformer models in terms of $k$NN classification accuracy, and the top performing language models barely outperform TF-IDF. We see this as a challenge for the NLP community. Furthermore, we use the ICLR dataset to study how the field of machine learning has changed over the last seven years, finding some improvement in gender balance. Using a 2D embedding of the abstracts' texts, we describe a shift in research topics from 2017 to 2024 and identify hedgehogs and foxes among the authors with the highest number of ICLR submissions.
An improved tabular data generator with VAE-GMM integration
Authors: Patricia A. Apellániz, Juan Parras, Santiago Zazo
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.08434
Pdf link: https://arxiv.org/pdf/2404.08434
Abstract The rising use of machine learning in various fields requires robust methods to create synthetic tabular data. Data should preserve key characteristics while addressing data scarcity challenges. Current approaches based on Generative Adversarial Networks, such as the state-of-the-art CTGAN model, struggle with the complex structures inherent in tabular data. These data often contain both continuous and discrete features with non-Gaussian distributions. Therefore, we propose a novel Variational Autoencoder (VAE)-based model that addresses these limitations. Inspired by the TVAE model, our approach incorporates a Bayesian Gaussian Mixture model (BGM) within the VAE architecture. This avoids the limitations imposed by assuming a strictly Gaussian latent space, allowing for a more accurate representation of the underlying data distribution during data generation. Furthermore, our model offers enhanced flexibility by allowing the use of various differentiable distributions for individual features, making it possible to handle both continuous and discrete data types. We thoroughly validate our model on three real-world datasets with mixed data types, including two medically relevant ones, based on their resemblance and utility. This evaluation demonstrates significant outperformance against CTGAN and TVAE, establishing its potential as a valuable tool for generating synthetic tabular data in various domains, particularly in healthcare.
Federated Optimization with Doubly Regularized Drift Correction
Authors: Xiaowen Jiang, Anton Rodomanov, Sebastian U. Stich
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2404.08447
Pdf link: https://arxiv.org/pdf/2404.08447
Abstract Federated learning is a distributed optimization paradigm that allows training machine learning models across decentralized devices while keeping the data localized. The standard method, FedAvg, suffers from client drift which can hamper performance and increase communication costs over centralized methods. Previous works proposed various strategies to mitigate drift, yet none have shown uniformly improved communication-computation trade-offs over vanilla gradient descent. In this work, we revisit DANE, an established method in distributed optimization. We show that (i) DANE can achieve the desired communication reduction under Hessian similarity constraints. Furthermore, (ii) we present an extension, DANE+, which supports arbitrary inexact local solvers and has more freedom to choose how to aggregate the local updates. We propose (iii) a novel method, FedRed, which has improved local computational complexity and retains the same communication complexity compared to DANE/DANE+. This is achieved by using doubly regularized drift correction.
Scarce Resource Allocations That Rely On Machine Learning Should Be Randomized
Authors: Shomik Jain, Kathleen Creel, Ashia Wilson
Subjects: Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2404.08592
Pdf link: https://arxiv.org/pdf/2404.08592
Abstract Contrary to traditional deterministic notions of algorithmic fairness, this paper argues that fairly allocating scarce resources using machine learning often requires randomness. We address why, when, and how to randomize by proposing stochastic procedures that more adequately account for all of the claims that individuals have to allocations of social goods or opportunities.
Learning-Based Joint Antenna Selection and Precoding Design for Cell-Free MIMO Networks
Authors: Liangzhi Wang, Chen Chen, Carlo Fischione, Jie Zhang
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2404.08607
Pdf link: https://arxiv.org/pdf/2404.08607
Abstract This paper considers a downlink cell-free multiple-input multiple-output (MIMO) network in which multiple multi-antenna base stations (BSs) serve multiple users via coherent joint transmission. In order to reduce the energy consumption by radio frequency components, each BS selects a subset of antennas for downlink data transmission after estimating the channel state information (CSI). We aim to maximize the sum spectral efficiency by jointly optimizing the antenna selection and precoding design. To alleviate the fronthaul overhead and enable real-time network operation, we propose a distributed scalable machine learning algorithm. In particular, at each BS, we deploy a convolutional neural network (CNN) for antenna selection and a graph neural network (GNN) for precoding design. Different from conventional centralized solutions that require a large amount of CSI and signaling exchange among the BSs, the proposed distributed machine learning algorithm takes only locally estimated CSI as input. With well-trained learning models, it is shown that the proposed algorithm significantly outperforms the distributed baseline schemes and achieves a sum spectral efficiency comparable to its centralized counterpart.
Hyperbolic Delaunay Geometric Alignment
Authors: Aniss Aiman Medbouhi, Giovanni Luca Marchetti, Vladislav Polianskii, Alexander Kravberg, Petra Poklukar, Anastasia Varava, Danica Kragic
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.08608
Pdf link: https://arxiv.org/pdf/2404.08608
Abstract Hyperbolic machine learning is an emerging field aimed at representing data with a hierarchical structure. However, there is a lack of tools for evaluation and analysis of the resulting hyperbolic data representations. To this end, we propose Hyperbolic Delaunay Geometric Alignment (HyperDGA) -- a similarity score for comparing datasets in a hyperbolic space. The core idea is counting the edges of the hyperbolic Delaunay graph connecting datapoints across the given sets. We provide an empirical investigation on synthetic and real-life biological data and demonstrate that HyperDGA outperforms the hyperbolic version of classical distances between sets. Furthermore, we showcase the potential of HyperDGA for evaluating latent representations inferred by a Hyperbolic Variational Auto-Encoder.
Keyword: optimization

Learning Efficient and Fair Policies for Uncertainty-Aware Collaborative Human-Robot Order Picking
Authors: Igor G. Smit, Zaharah Bukhsh, Mykola Pechenizkiy, Kostas Alogariastos, Kasper Hendriks, Yingqian Zhang
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2404.08006
Pdf link: https://arxiv.org/pdf/2404.08006
Abstract In collaborative human-robot order picking systems, human pickers and Autonomous Mobile Robots (AMRs) travel independently through a warehouse and meet at pick locations where pickers load items onto the AMRs. In this paper, we consider an optimization problem in such systems where we allocate pickers to AMRs in a stochastic environment. We propose a novel multi-objective Deep Reinforcement Learning (DRL) approach to learn effective allocation policies to maximize pick efficiency while also aiming to improve workload fairness amongst human pickers. In our approach, we model the warehouse states using a graph, and define a neural network architecture that captures regional information and effectively extracts representations related to efficiency and workload. We develop a discrete-event simulation model, which we use to train and evaluate the proposed DRL approach. In the experiments, we demonstrate that our approach can find non-dominated policy sets that outline good trade-offs between fairness and efficiency objectives. The trained policies outperform the benchmarks in terms of both efficiency and fairness. Moreover, they show good transferability properties when tested on scenarios with different warehouse sizes. The implementation of the simulation model, proposed approach, and experiments are published.
Differentiable Search for Finding Optimal Quantization Strategy
Authors: Lianqiang Li, Chenqian Yan, Yefei Chen
Subjects: Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2404.08010
Pdf link: https://arxiv.org/pdf/2404.08010
Abstract To accelerate and compress deep neural networks (DNNs), many network quantization algorithms have been proposed. Although the quantization strategy of any algorithm from the state-of-the-arts may outperform others in some network architectures, it is hard to prove the strategy is always better than others, and even cannot judge that the strategy is always the best choice for all layers in a network. In other words, existing quantization algorithms are suboptimal as they ignore the different characteristics of different layers and quantize all layers by a uniform quantization strategy. To solve the issue, in this paper, we propose a differentiable quantization strategy search (DQSS) to assign optimal quantization strategy for individual layer by taking advantages of the benefits of different quantization algorithms. Specifically, we formulate DQSS as a differentiable neural architecture search problem and adopt an efficient convolution to efficiently explore the mixed quantization strategies from a global perspective by gradient-based optimization. We conduct DQSS for post-training quantization to enable their performance to be comparable with that in full precision models. We also employ DQSS in quantization-aware training for further validating the effectiveness of DQSS. To circumvent the expensive optimization cost when employing DQSS in quantization-aware training, we update the hyper-parameters and the network parameters in a single forward-backward pass. Besides, we adjust the optimization process to avoid the potential under-fitting problem. Comprehensive experiments on high level computer vision task, i.e., image classification, and low level computer vision task, i.e., image super-resolution, with various network architectures show that DQSS could outperform the state-of-the-arts.
Enhanced Cooperative Perception for Autonomous Vehicles Using Imperfect Communication
Authors: Ahmad Sarlak, Hazim Alzorgan, Sayed Pedram Haeri Boroujeni, Abolfazl Razi, Rahul Amin
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.08013
Pdf link: https://arxiv.org/pdf/2404.08013
Abstract Sharing and joint processing of camera feeds and sensor measurements, known as Cooperative Perception (CP), has emerged as a new technique to achieve higher perception qualities. CP can enhance the safety of Autonomous Vehicles (AVs) where their individual visual perception quality is compromised by adverse weather conditions (haze as foggy weather), low illumination, winding roads, and crowded traffic. To cover the limitations of former methods, in this paper, we propose a novel approach to realize an optimized CP under constrained communications. At the core of our approach is recruiting the best helper from the available list of front vehicles to augment the visual range and enhance the Object Detection (OD) accuracy of the ego vehicle. In this two-step process, we first select the helper vehicles that contribute the most to CP based on their visual range and lowest motion blur. Next, we implement a radio block optimization among the candidate vehicles to further improve communication efficiency. We specifically focus on pedestrian detection as an exemplary scenario. To validate our approach, we used the CARLA simulator to create a dataset of annotated videos for different driving scenarios where pedestrian detection is challenging for an AV with compromised vision. Our results demonstrate the efficacy of our two-step optimization process in improving the overall performance of cooperative perception in challenging scenarios, substantially improving driving safety under adverse conditions. Finally, we note that the networking assumptions are adopted from LTE Release 14 Mode 4 side-link communication, commonly used for Vehicle-to-Vehicle (V2V) communication. Nonetheless, our method is flexible and applicable to arbitrary V2V communications.
Byzantine Reliable Broadcast with Low Communication and Time Complexity
Authors: Thomas Locher
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2404.08070
Pdf link: https://arxiv.org/pdf/2404.08070
Abstract Byzantine reliable broadcast is a fundamental problem in distributed computing, which has been studied extensively over the past decades. State-of-the-art algorithms are predominantly based on the approach to share encoded fragments of the broadcast message, yielding an asymptotically optimal communication complexity when the message size exceeds the network size, a condition frequently encountered in practice. However, algorithms following the standard coding approach incur an overhead factor of at least 3, which can already be a burden for bandwidth-constrained applications. Minimizing this overhead is an important objective with immediate benefits to protocols that use a reliable broadcast routine as a building block. This paper introduces a novel mechanism to lower the communication and computational complexity. Two algorithms are presented that employ this mechanism to reliably broadcast messages in an asynchronous network where less than a third of all nodes are Byzantine. The first algorithm reduces the overhead factor to 2 and has a time complexity of 3 if the sender is honest, whereas the second algorithm attains an optimal time complexity of 2 with the same overhead factor in the absence of equivocation. Moreover, an optimization for real-world implementations is proposed, reducing the overhead factor to 3/2 under normal operation. Lastly, a lower bound is proved that an overhead factor lower than 3/2 cannot be achieved for a relevant class of reliable broadcast algorithms.
Variance-reduced Zeroth-Order Methods for Fine-Tuning Language Models
Authors: Tanmay Gautam, Youngsuk Park, Hao Zhou, Parameswaran Raman, Wooseok Ha
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2404.08080
Pdf link: https://arxiv.org/pdf/2404.08080
Abstract Fine-tuning language models (LMs) has demonstrated success in a wide array of downstream tasks. However, as LMs are scaled up, the memory requirements for backpropagation become prohibitively high. Zeroth-order (ZO) optimization methods can leverage memory-efficient forward passes to estimate gradients. More recently, MeZO, an adaptation of ZO-SGD, has been shown to consistently outperform zero-shot and in-context learning when combined with suitable task prompts. In this work, we couple ZO methods with variance reduction techniques to enhance stability and convergence for inference-based LM fine-tuning. We introduce Memory-Efficient Zeroth-Order Stochastic Variance-Reduced Gradient (MeZO-SVRG) and demonstrate its efficacy across multiple LM fine-tuning tasks, eliminating the reliance on task-specific prompts. Evaluated across a range of both masked and autoregressive LMs on benchmark GLUE tasks, MeZO-SVRG outperforms MeZO with up to 20% increase in test accuracies in both full- and partial-parameter fine-tuning settings. MeZO-SVRG benefits from reduced computation time as it often surpasses MeZO's peak test accuracy with a $2\times$ reduction in GPU-hours. MeZO-SVRG significantly reduces the required memory footprint compared to first-order SGD, i.e. by $2\times$ for autoregressive models. Our experiments highlight that MeZO-SVRG's memory savings progressively improve compared to SGD with larger batch sizes.
SimpliCity: Reconstructing Buildings with Simple Regularized 3D Models
Authors: Jean-Philippe Bauchet, Raphael Sulzer, Florent Lafarge, Yuliya Tarabalka
Subjects: Computational Geometry (cs.CG)
Arxiv link: https://arxiv.org/abs/2404.08104
Pdf link: https://arxiv.org/pdf/2404.08104
Abstract Automatic methods for reconstructing buildings from airborne LiDAR point clouds focus on producing accurate 3D models in a fast and scalable manner, but they overlook the problem of delivering simple and regularized models to practitioners. As a result, output meshes often suffer from connectivity approximations around corners with either the presence of multiple vertices and tiny facets, or the necessity to break the planarity constraint on roof sections and facade components. We propose a 2D planimetric arrangement-based framework to address this problem. We first regularize, not the 3D planes as commonly done in the literature, but a 2D polyhedral partition constructed from the planes. Second, we extrude this partition to 3D by an optimization process that guarantees the planarity of the roof sections as well as the preservation of the vertical discontinuities and horizontal rooftop edges. We show the benefits of our approach against existing methods by producing simpler 3D models while offering a similar fidelity and efficiency.
Using Flexibility Envelopes for the Demand-Side Hierarchical Optimization of District Heating Networks
Authors: Audrey Blizard, Colin N. Jones, Stephanie Stockar
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2404.08107
Pdf link: https://arxiv.org/pdf/2404.08107
Abstract The demand-side control of district heating networks is notoriously challenging due to the large number of connected users and the high number of states to be considered. To overcome these challenges, this paper presents a hierarchical optimization scheme using the flexibility in heating demand provided by the users to improve the performance of the network. This hierarchical scheme relies on a low level controller to calculate the costs for a subsystem over a given set of potential pressure drops for that subsystem. The high level controller then uses these calculated costs to determine the optimal set of pressure drops for every subgraph of the partitioned network. The proposed hierarchical optimization scheme is demonstrated on a representative 20 user district heating network, resulting in a 67\% reduction in bypass mass flow while ensuring all network users stay within 2 \degree C of their desired nominal temperatures.
S3Editor: A Sparse Semantic-Disentangled Self-Training Framework for Face Video Editing
Authors: Guangzhi Wang, Tianyi Chen, Kamran Ghasedi, HsiangTao Wu, Tianyu Ding, Chris Nuesmeyer, Ilya Zharkov, Mohan Kankanhalli, Luming Liang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2404.08111
Pdf link: https://arxiv.org/pdf/2404.08111
Abstract Face attribute editing plays a pivotal role in various applications. However, existing methods encounter challenges in achieving high-quality results while preserving identity, editing faithfulness, and temporal consistency. These challenges are rooted in issues related to the training pipeline, including limited supervision, architecture design, and optimization strategy. In this work, we introduce S3Editor, a Sparse Semantic-disentangled Self-training framework for face video editing. S3Editor is a generic solution that comprehensively addresses these challenges with three key contributions. Firstly, S3Editor adopts a self-training paradigm to enhance the training process through semi-supervision. Secondly, we propose a semantic disentangled architecture with a dynamic routing mechanism that accommodates diverse editing requirements. Thirdly, we present a structured sparse optimization schema that identifies and deactivates malicious neurons to further disentangle impacts from untarget attributes. S3Editor is model-agnostic and compatible with various editing approaches. Our extensive qualitative and quantitative results affirm that our approach significantly enhances identity preservation, editing fidelity, as well as temporal consistency.
Predictive Handover Strategy in 6G and Beyond: A Deep and Transfer Learning Approach
Authors: Ioannis Panitsas, Akrit Mudvari, Ali Maatouk, Leandros Tassiulas
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.08113
Pdf link: https://arxiv.org/pdf/2404.08113
Abstract Next-generation cellular networks will evolve into more complex and virtualized systems, employing machine learning for enhanced optimization and leveraging higher frequency bands and denser deployments to meet varied service demands. This evolution, while bringing numerous advantages, will also pose challenges, especially in mobility management, as it will increase the overall number of handovers due to smaller coverage areas and the higher signal attenuation. To address these challenges, we propose a deep learning based algorithm for predicting the future serving cell utilizing sequential user equipment measurements to minimize the handover failures and interruption time. Our algorithm enables network operators to dynamically adjust handover triggering events or incorporate UAV base stations for enhanced coverage and capacity, optimizing network objectives like load balancing and energy efficiency through transfer learning techniques. Our framework complies with the O-RAN specifications and can be deployed in a Near-Real-Time RAN Intelligent Controller as an xApp leveraging the E2SM-KPM service model. The evaluation results demonstrate that our algorithm achieves a 92% accuracy in predicting future serving cells with high probability. Finally, by utilizing transfer learning, our algorithm significantly reduces the retraining time by 91% and 77% when new handover trigger decisions or UAV base stations are introduced to the network dynamically.
R2 Indicator and Deep Reinforcement Learning Enhanced Adaptive Multi-Objective Evolutionary Algorithm
Authors: Farajollah Tahernezhad-Javazm, Debbie Rankin, Naomi Du Bois, Alice E. Smith, Damien Coyle
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.08161
Pdf link: https://arxiv.org/pdf/2404.08161
Abstract Choosing an appropriate optimization algorithm is essential to achieving success in optimization challenges. Here we present a new evolutionary algorithm structure that utilizes a reinforcement learning-based agent aimed at addressing these issues. The agent employs a double deep q-network to choose a specific evolutionary operator based on feedback it receives from the environment during optimization. The algorithm's structure contains five single-objective evolutionary algorithm operators. This single-objective structure is transformed into a multi-objective one using the R2 indicator. This indicator serves two purposes within our structure: first, it renders the algorithm multi-objective, and second, provides a means to evaluate each algorithm's performance in each generation to facilitate constructing the reinforcement learning-based reward function. The proposed R2-reinforcement learning multi-objective evolutionary algorithm (R2-RLMOEA) is compared with six other multi-objective algorithms that are based on R2 indicators. These six algorithms include the operators used in R2-RLMOEA as well as an R2 indicator-based algorithm that randomly selects operators during optimization. We benchmark performance using the CEC09 functions, with performance measured by inverted generational distance and spacing. The R2-RLMOEA algorithm outperforms all other algorithms with strong statistical significance (p<0.001) when compared with the average spacing metric across all ten benchmarks.
Loco-Manipulation with Nonimpulsive Contact-Implicit Planning in a Slithering Robot
Authors: Adarsh Salagame, Kruthika Gangaraju, Harin Kumar Nallaguntla, Eric Sihite, Gunar Schirner, Alireza Ramezani
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2404.08174
Pdf link: https://arxiv.org/pdf/2404.08174
Abstract Object manipulation has been extensively studied in the context of fixed base and mobile manipulators. However, the overactuated locomotion modality employed by snake robots allows for a unique blend of object manipulation through locomotion, referred to as loco-manipulation. The following work presents an optimization approach to solving the loco-manipulation problem based on non-impulsive implicit contact path planning for our snake robot COBRA. We present the mathematical framework and show high-fidelity simulation results and experiments to demonstrate the effectiveness of our approach.
Enhancing Fairness and Performance in Machine Learning Models: A Multi-Task Learning Approach with Monte-Carlo Dropout and Pareto Optimality
Authors: Khadija Zanna, Akane Sano
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)
Arxiv link: https://arxiv.org/abs/2404.08230
Pdf link: https://arxiv.org/pdf/2404.08230
Abstract This paper considers the need for generalizable bias mitigation techniques in machine learning due to the growing concerns of fairness and discrimination in data-driven decision-making procedures across a range of industries. While many existing methods for mitigating bias in machine learning have succeeded in specific cases, they often lack generalizability and cannot be easily applied to different data types or models. Additionally, the trade-off between accuracy and fairness remains a fundamental tension in the field. To address these issues, we propose a bias mitigation method based on multi-task learning, utilizing the concept of Monte-Carlo dropout and Pareto optimality from multi-objective optimization. This method optimizes accuracy and fairness while improving the model's explainability without using sensitive information. We test this method on three datasets from different domains and show how it can deliver the most desired trade-off between model fairness and performance. This allows for tuning in specific domains where one metric may be more important than another. With the framework we introduce in this paper, we aim to enhance the fairness-performance trade-off and offer a solution to bias mitigation methods' generalizability issues in machine learning.
Generalized Population-Based Training for Hyperparameter Optimization in Reinforcement Learning
Authors: Hui Bai, Ran Cheng
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Arxiv link: https://arxiv.org/abs/2404.08233
Pdf link: https://arxiv.org/pdf/2404.08233
Abstract Hyperparameter optimization plays a key role in the machine learning domain. Its significance is especially pronounced in reinforcement learning (RL), where agents continuously interact with and adapt to their environments, requiring dynamic adjustments in their learning trajectories. To cater to this dynamicity, the Population-Based Training (PBT) was introduced, leveraging the collective intelligence of a population of agents learning simultaneously. However, PBT tends to favor high-performing agents, potentially neglecting the explorative potential of agents on the brink of significant advancements. To mitigate the limitations of PBT, we present the Generalized Population-Based Training (GPBT), a refined framework designed for enhanced granularity and flexibility in hyperparameter adaptation. Complementing GPBT, we further introduce Pairwise Learning (PL). Instead of merely focusing on elite agents, PL employs a comprehensive pairwise strategy to identify performance differentials and provide holistic guidance to underperforming agents. By integrating the capabilities of GPBT and PL, our approach significantly improves upon traditional PBT in terms of adaptability and computational efficiency. Rigorous empirical evaluations across a range of RL benchmarks confirm that our approach consistently outperforms not only the conventional PBT but also its Bayesian-optimized variant.
RLEMMO: Evolutionary Multimodal Optimization Assisted By Deep Reinforcement Learning
Authors: Hongqiao Lian, Zeyuan Ma, Hongshu Guo, Ting Huang, Yue-Jiao Gong
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.08242
Pdf link: https://arxiv.org/pdf/2404.08242
Abstract Solving multimodal optimization problems (MMOP) requires finding all optimal solutions, which is challenging in limited function evaluations. Although existing works strike the balance of exploration and exploitation through hand-crafted adaptive strategies, they require certain expert knowledge, hence inflexible to deal with MMOP with different properties. In this paper, we propose RLEMMO, a Meta-Black-Box Optimization framework, which maintains a population of solutions and incorporates a reinforcement learning agent for flexibly adjusting individual-level searching strategies to match the up-to-date optimization status, hence boosting the search performance on MMOP. Concretely, we encode landscape properties and evolution path information into each individual and then leverage attention networks to advance population information sharing. With a novel reward mechanism that encourages both quality and diversity, RLEMMO can be effectively trained using a policy gradient algorithm. The experimental results on the CEC2013 MMOP benchmark underscore the competitive optimization performance of RLEMMO against several strong baselines.
MonoPatchNeRF: Improving Neural Radiance Fields with Patch-based Monocular Guidance
Authors: Yuqun Wu, Jae Yong Lee, Chuhang Zou, Shenlong Wang, Derek Hoiem
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.08252
Pdf link: https://arxiv.org/pdf/2404.08252
Abstract The latest regularized Neural Radiance Field (NeRF) approaches produce poor geometry and view extrapolation for multiview stereo (MVS) benchmarks such as ETH3D. In this paper, we aim to create 3D models that provide accurate geometry and view synthesis, partially closing the large geometric performance gap between NeRF and traditional MVS methods. We propose a patch-based approach that effectively leverages monocular surface normal and relative depth predictions. The patch-based ray sampling also enables the appearance regularization of normalized cross-correlation (NCC) and structural similarity (SSIM) between randomly sampled virtual and training views. We further show that "density restrictions" based on sparse structure-from-motion points can help greatly improve geometric accuracy with a slight drop in novel view synthesis metrics. Our experiments show 4x the performance of RegNeRF and 8x that of FreeNeRF on average F1@2cm for ETH3D MVS benchmark, suggesting a fruitful research direction to improve the geometric accuracy of NeRF-based models, and sheds light on a potential future approach to enable NeRF-based optimization to eventually outperform traditional MVS.
Joint Design of Self-Tuning UHF RFID Antenna and Microfluidic Channel for Liquid Sensing
Authors: Giulio Maria Bianco, Gaetano Marrocco
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2404.08268
Pdf link: https://arxiv.org/pdf/2404.08268
Abstract Microfluidic has been an enabling technology for over a decade, particularly in the field of medical and wearable devices, allowing for the manipulation of small amounts of fluid in confined spaces. Micro-channels can also be used for wireless sensing thanks to the variations in antenna properties when the fluid flows near it. However, up to now, microfluidic channels and sensing antennas have always been designed separately; instead, since the liquid flow and the antenna geometry both contribute to the overall performance, they should be considered simultaneously when optimizing the antenna-microfluidic system. In this paper, the joint design of the antenna and microfluidic channels is investigated for liquid quantification. Self-tuning RFID microchips are exploited to minimize communication degradation due to the increase of lossy liquid amount over the sensing antenna while digitalizing the impedance mismatch itself. To experimentally corroborate the joint design technique, two different geometries are obtained and prototyped starting from a given antenna-microfluidic layout by setting different goals for an optimization function. The two flexible RFID prototypes returned performance in agreement with the simulated ones, achieving a maximum sensitivity of about 20 units of the digital metric per milligram increase of water.
Struggle with Adversarial Defense? Try Diffusion
Authors: Yujie Li, Yanbin Wang, Haitao xu, Bin Liu, Jianguo Sun, Zhenhao Guo, Wenrui Ma
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)
Arxiv link: https://arxiv.org/abs/2404.08273
Pdf link: https://arxiv.org/pdf/2404.08273
Abstract Adversarial attacks induce misclassification by introducing subtle perturbations. Recently, diffusion models are applied to the image classifiers to improve adversarial robustness through adversarial training or by purifying adversarial noise. However, diffusion-based adversarial training often encounters convergence challenges and high computational expenses. Additionally, diffusion-based purification inevitably causes data shift and is deemed susceptible to stronger adaptive attacks. To tackle these issues, we propose the Truth Maximization Diffusion Classifier (TMDC), a generative Bayesian classifier that builds upon pre-trained diffusion models and the Bayesian theorem. Unlike data-driven classifiers, TMDC, guided by Bayesian principles, utilizes the conditional likelihood from diffusion models to determine the class probabilities of input images, thereby insulating against the influences of data shift and the limitations of adversarial training. Moreover, to enhance TMDC's resilience against more potent adversarial attacks, we propose an optimization strategy for diffusion classifiers. This strategy involves post-training the diffusion model on perturbed datasets with ground-truth labels as conditions, guiding the diffusion model to learn the data distribution and maximizing the likelihood under the ground-truth labels. The proposed method achieves state-of-the-art performance on the CIFAR10 dataset against heavy white-box attacks and strong adaptive attacks. Specifically, TMDC achieves robust accuracies of 82.81% against $l{\infty}$ norm-bounded perturbations and 86.05% against $l{2}$ norm-bounded perturbations, respectively, with $\epsilon=0.05$.
Emerging Property of Masked Token for Effective Pre-training
Authors: Hyesong Choi, Hunsang Lee, Seyoung Joung, Hyejin Park, Jiyeong Kim, Dongbo Min
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.08330
Pdf link: https://arxiv.org/pdf/2404.08330
Abstract Driven by the success of Masked Language Modeling (MLM), the realm of self-supervised learning for computer vision has been invigorated by the central role of Masked Image Modeling (MIM) in driving recent breakthroughs. Notwithstanding the achievements of MIM across various downstream tasks, its overall efficiency is occasionally hampered by the lengthy duration of the pre-training phase. This paper presents a perspective that the optimization of masked tokens as a means of addressing the prevailing issue. Initially, we delve into an exploration of the inherent properties that a masked token ought to possess. Within the properties, we principally dedicated to articulating and emphasizing the `data singularity' attribute inherent in masked tokens. Through a comprehensive analysis of the heterogeneity between masked tokens and visible tokens within pre-trained models, we propose a novel approach termed masked token optimization (MTO), specifically designed to improve model efficiency through weight recalibration and the enhancement of the key property of masked tokens. The proposed method serves as an adaptable solution that seamlessly integrates into any MIM approach that leverages masked tokens. As a result, MTO achieves a considerable improvement in pre-training efficiency, resulting in an approximately 50% reduction in pre-training epochs required to attain converged performance of the recent approaches.
Learning to Rebalance Multi-Modal Optimization by Adaptively Masking Subnetworks
Authors: Yang Yang, Hongpeng Pan, Qing-Yuan Jiang, Yi Xu, Jinghui Tang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.08347
Pdf link: https://arxiv.org/pdf/2404.08347
Abstract Multi-modal learning aims to enhance performance by unifying models from various modalities but often faces the "modality imbalance" problem in real data, leading to a bias towards dominant modalities and neglecting others, thereby limiting its overall effectiveness. To address this challenge, the core idea is to balance the optimization of each modality to achieve a joint optimum. Existing approaches often employ a modal-level control mechanism for adjusting the update of each modal parameter. However, such a global-wise updating mechanism ignores the different importance of each parameter. Inspired by subnetwork optimization, we explore a uniform sampling-based optimization strategy and find it more effective than global-wise updating. According to the findings, we further propose a novel importance sampling-based, element-wise joint optimization method, called Adaptively Mask Subnetworks Considering Modal Significance(AMSS). Specifically, we incorporate mutual information rates to determine the modal significance and employ non-uniform adaptive sampling to select foreground subnetworks from each modality for parameter updates, thereby rebalancing multi-modal learning. Additionally, we demonstrate the reliability of the AMSS strategy through convergence analysis. Building upon theoretical insights, we further enhance the multi-modal mask subnetwork strategy using unbiased estimation, referred to as AMSS+. Extensive experiments reveal the superiority of our approach over comparison methods.
Optimization-Based System Identification and Moving Horizon Estimation Using Low-Cost Sensors for a Miniature Car-Like Robot
Authors: Sabrina Bodmer, Lukas Vogel, Simon Muntwiler, Alexander Hansson, Tobias Bodewig, Jonas Wahlen, Melanie N. Zeilinger, Andrea Carron
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2404.08362
Pdf link: https://arxiv.org/pdf/2404.08362
Abstract This paper presents an open-source miniature car-like robot with low-cost sensing and a pipeline for optimization-based system identification, state estimation, and control. The overall robotics platform comes at a cost of less than $700 and thus significantly simplifies the verification of advanced algorithms in a realistic setting. We present a modified bicycle model with Pacejka tire forces to model the dynamics of the considered all-wheel drive vehicle and to prevent singularities of the model at low velocities. Furthermore, we provide an optimization-based system identification approach and a moving horizon estimation (MHE) scheme. In extensive hardware experiments, we show that the presented system identification approach results in a model with high prediction accuracy, while the MHE results in accurate state estimates. Finally, the overall closed-loop system is shown to perform well even in the presence of sensor failure for limited time intervals. All hardware, firmware, and control and estimation software is released under a BSD 2-clause license to promote widespread adoption and collaboration within the community.
Code Generation and Performance Engineering for Matrix-Free Finite Element Methods on Hybrid Tetrahedral Grids
Authors: Fabian Böhm, Daniel Bauer, Nils Kohl, Christie Alappat, Dominik Thönnes, Marcus Mohr, Harald Köstler, Ulrich Rüde
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Arxiv link: https://arxiv.org/abs/2404.08371
Pdf link: https://arxiv.org/pdf/2404.08371
Abstract This paper introduces a code generator designed for node-level optimized, extreme-scalable, matrix-free finite element operators on hybrid tetrahedral grids. It optimizes the local evaluation of bilinear forms through various techniques including tabulation, relocation of loop invariants, and inter-element vectorization - implemented as transformations of an abstract syntax tree. A key contribution is the development, analysis, and generation of efficient loop patterns that leverage the local structure of the underlying tetrahedral grid. These significantly enhance cache locality and arithmetic intensity, mitigating bandwidth-pressure associated with compute-sparse, low-order operators. The paper demonstrates the generator's capabilities through a comprehensive educational cycle of performance analysis, bottleneck identification, and emission of dedicated optimizations. For three differential operators ($-\Delta$, $-\nabla \cdot (k(\mathbf{x})\, \nabla\,)$, $\alpha(\mathbf{x})\, \mathbf{curl}\ \mathbf{curl} + \beta(\mathbf{x}) $), we determine the set of most effective optimizations. Applied by the generator, they result in speed-ups of up to 58$\times$ compared to reference implementations. Detailed node-level performance analysis yields matrix-free operators with a throughput of 1.3 to 2.1 GDoF/s, achieving up to 62% peak performance on a 36-core Intel Ice Lake socket. Finally, the solution of the curl-curl problem with more than a trillion ($ 10^{12}$) degrees of freedom on 21504 processes in less than 50 seconds demonstrates the generated operators' performance and extreme-scalability as part of a full multigrid solver.
Collective Bayesian Decision-Making in a Swarm of Miniaturized Robots for Surface Inspection
Authors: Thiemen Siemensma, Darren Chiu, Sneha Ramshanker, Radhika Nagpal, Bahar Haghighat
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2404.08390
Pdf link: https://arxiv.org/pdf/2404.08390
Abstract Robot swarms can effectively serve a variety of sensing and inspection applications. Certain inspection tasks require a binary classification decision. This work presents an experimental setup for a surface inspection task based on vibration sensing and studies a Bayesian two-outcome decision-making algorithm in a swarm of miniaturized wheeled robots. The robots are tasked with individually inspecting and collectively classifying a 1mx1m tiled surface consisting of vibrating and non-vibrating tiles based on the majority type of tiles. The robots sense vibrations using onboard IMUs and perform collision avoidance using a set of IR sensors. We develop a simulation and optimization framework leveraging the Webots robotic simulator and a Particle Swarm Optimization (PSO) method. We consider two existing information sharing strategies and propose a new one that allows the swarm to rapidly reach accurate classification decisions. We first find optimal parameters that allow efficient sampling in simulation and then evaluate our proposed strategy against the two existing ones using 100 randomized simulation and 10 real experiments. We find that our proposed method compels the swarm to make decisions at an accelerated rate, with an improvement of up to 20.52% in mean decision time at only 0.78% loss in accuracy.
Evolutionary Preference Sampling for Pareto Set Learning
Authors: Rongguang Ye, Longcan Chen, Jinyuan Zhang, Hisao Ishibuchi
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.08414
Pdf link: https://arxiv.org/pdf/2404.08414
Abstract Recently, Pareto Set Learning (PSL) has been proposed for learning the entire Pareto set using a neural network. PSL employs preference vectors to scalarize multiple objectives, facilitating the learning of mappings from preference vectors to specific Pareto optimal solutions. Previous PSL methods have shown their effectiveness in solving artificial multi-objective optimization problems (MOPs) with uniform preference vector sampling. The quality of the learned Pareto set is influenced by the sampling strategy of the preference vector, and the sampling of the preference vector needs to be decided based on the Pareto front shape. However, a fixed preference sampling strategy cannot simultaneously adapt the Pareto front of multiple MOPs. To address this limitation, this paper proposes an Evolutionary Preference Sampling (EPS) strategy to efficiently sample preference vectors. Inspired by evolutionary algorithms, we consider preference sampling as an evolutionary process to generate preference vectors for neural network training. We integrate the EPS strategy into five advanced PSL methods. Extensive experiments demonstrate that our proposed method has a faster convergence speed than baseline algorithms on 7 testing problems. Our implementation is available at https://github.com/rG223/EPS.
Federated Optimization with Doubly Regularized Drift Correction
Authors: Xiaowen Jiang, Anton Rodomanov, Sebastian U. Stich
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2404.08447
Pdf link: https://arxiv.org/pdf/2404.08447
Abstract Federated learning is a distributed optimization paradigm that allows training machine learning models across decentralized devices while keeping the data localized. The standard method, FedAvg, suffers from client drift which can hamper performance and increase communication costs over centralized methods. Previous works proposed various strategies to mitigate drift, yet none have shown uniformly improved communication-computation trade-offs over vanilla gradient descent. In this work, we revisit DANE, an established method in distributed optimization. We show that (i) DANE can achieve the desired communication reduction under Hessian similarity constraints. Furthermore, (ii) we present an extension, DANE+, which supports arbitrary inexact local solvers and has more freedom to choose how to aggregate the local updates. We propose (iii) a novel method, FedRed, which has improved local computational complexity and retains the same communication complexity compared to DANE/DANE+. This is achieved by using doubly regularized drift correction.
Riemannian optimization on the symplectic Stiefel manifold using second-order information
Authors: Rasmus Jensen, Ralf Zimmermann
Subjects: Numerical Analysis (math.NA)
Arxiv link: https://arxiv.org/abs/2404.08463
Pdf link: https://arxiv.org/pdf/2404.08463
Abstract Riemannian optimization is concerned with problems, where the independent variable lies on a smooth manifold. There is a number of problems from numerical linear algebra that fall into this category, where the manifold is usually specified by special matrix structures, such as orthogonality or definiteness. Following this line of research, we investigate tools for Riemannian optimization on the symplectic Stiefel manifold. We complement the existing set of numerical optimization algorithms with a Riemannian trust region method tailored to the symplectic Stiefel manifold. To this end, we derive a matrix formula for the Riemannian Hessian under a right-invariant metric. Moreover, we propose a novel retraction for approximating the Riemannian geodesics. Finally, we conduct a comparative study in which we juxtapose the performance of the Riemannian variants of the steepest descent, conjugate gradients, and trust region methods on selected matrix optimization problems that feature symplectic constraints.
A stable decoupled perfectly matched layer for the 3D wave equation using the nodal discontinuous Galerkin method
Authors: Sophia Julia Feriani, Matthias Cosnefroy, Allan Peter Engsig-Karup, Tim Warburton, Finnur Pind, Cheol-Ho Jeong
Subjects: Numerical Analysis (math.NA); Mathematical Physics (math-ph)
Arxiv link: https://arxiv.org/abs/2404.08464
Pdf link: https://arxiv.org/pdf/2404.08464
Abstract In outdoor acoustics, the calculations of sound propagating in air can be computationally heavy if the domain is chosen large enough to fulfil the Sommerfeld radiation condition. By strategically truncating the computational domain with a efficient boundary treatment, the computational cost is lowered. One commonly used boundary treatment is the perfectly matched layer (PML) that dampens outgoing waves without polluting the computed solution in the inner domain. The purpose of this study is to propose and assess a new perfectly matched layer formulation for the 3D acoustic wave equation, using the nodal discontinuous Galerkin finite element method. The formulation is based on an efficient PML formulation that can be decoupled to further increase the computational efficiency and guarantee stability without sacrificing accuracy. This decoupled PML formulation is demonstrated to be long-time stable and an optimization procedure of the damping functions is proposed to enhance the performance of the formulation.
Dataset Reset Policy Optimization for RLHF
Authors: Jonathan D. Chang, Wenhao Shan, Owen Oertell, Kianté Brantley, Dipendra Misra, Jason D. Lee, Wen Sun
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2404.08495
Pdf link: https://arxiv.org/pdf/2404.08495
Abstract Reinforcement Learning (RL) from Human Preference-based feedback is a popular paradigm for fine-tuning generative models, which has produced impressive models such as GPT-4 and Claude3 Opus. This framework often consists of two steps: learning a reward model from an offline preference dataset followed by running online RL to optimize the learned reward model. In this work, leveraging the idea of reset, we propose a new RLHF algorithm with provable guarantees. Motivated by the fact that offline preference dataset provides informative states (i.e., data that is preferred by the labelers), our new algorithm, Dataset Reset Policy Optimization (DR-PO), integrates the existing offline preference dataset into the online policy training procedure via dataset reset: it directly resets the policy optimizer to the states in the offline dataset, instead of always starting from the initial state distribution. In theory, we show that DR-PO learns to perform at least as good as any policy that is covered by the offline dataset under general function approximation with finite sample complexity. In experiments, we demonstrate that on both the TL;DR summarization and the Anthropic Helpful Harmful (HH) dataset, the generation from DR-PO is better than that from Proximal Policy Optimization (PPO) and Direction Preference Optimization (DPO), under the metric of GPT4 win-rate. Code for this work can be found at https://github.com/Cornell-RL/drpo.
Prescribing Optimal Health-Aware Operation for Urban Air Mobility with Deep Reinforcement Learning
Authors: Mina Montazeri, Chetan Kulkarni, Olga Fink
Subjects: Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2404.08497
Pdf link: https://arxiv.org/pdf/2404.08497
Abstract Urban Air Mobility (UAM) aims to expand existing transportation networks in metropolitan areas by offering short flights either to transport passengers or cargo. Electric vertical takeoff and landing aircraft powered by lithium-ion battery packs are considered promising for such applications. Efficient mission planning is cru-cial, maximizing the number of flights per battery charge while ensuring completion even under unforeseen events. As batteries degrade, precise mission planning becomes challenging due to uncertainties in the end-of-discharge prediction. This often leads to adding safety margins, reducing the number or duration of po-tential flights on one battery charge. While predicting the end of discharge can support decision-making, it remains insufficient in case of unforeseen events, such as adverse weather conditions. This necessitates health-aware real-time control to address any unexpected events and extend the time until the end of charge while taking the current degradation state into account. This paper addresses the joint problem of mission planning and health-aware real-time control of opera-tional parameters to prescriptively control the duration of one discharge cycle of the battery pack. We pro-pose an algorithm that proactively prescribes operational parameters to extend the discharge cycle based on the battery's current health status while optimizing the mission. The proposed deep reinforcement learn-ing algorithm facilitates operational parameter optimization and path planning while accounting for the degradation state, even in the presence of uncertainties. Evaluation of simulated flights of a NASA concep-tual multirotor aircraft model, collected from Hardware-in-the-loop experiments, demonstrates the algo-rithm's near-optimal performance across various operational scenarios, allowing adaptation to changed en-vironmental conditions.
Analyzing and Overcoming Local Optima in Complex Multi-Objective Optimization by Decomposition-Based Evolutionary Algorithms
Authors: Ting Dong, Haoxin Wang, Hengxi Zhang, Wenbo Ding
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.08501
Pdf link: https://arxiv.org/pdf/2404.08501
Abstract When addressing the challenge of complex multi-objective optimization problems, particularly those with non-convex and non-uniform Pareto fronts, Decomposition-based Multi-Objective Evolutionary Algorithms (MOEADs) often converge to local optima, thereby limiting solution diversity. Despite its significance, this issue has received limited theoretical exploration. Through a comprehensive geometric analysis, we identify that the traditional method of Reference Point (RP) selection fundamentally contributes to this challenge. In response, we introduce an innovative RP selection strategy, the Weight Vector-Guided and Gaussian-Hybrid method, designed to overcome the local optima issue. This approach employs a novel RP type that aligns with weight vector directions and integrates a Gaussian distribution to combine three distinct RP categories. Our research comprises two main experimental components: an ablation study involving 14 algorithms within the MOEADs framework, spanning from 2014 to 2022, to validate our theoretical framework, and a series of empirical tests to evaluate the effectiveness of our proposed method against both traditional and cutting-edge alternatives. Results demonstrate that our method achieves remarkable improvements in both population diversity and convergence.
Advancing Forest Fire Prevention: Deep Reinforcement Learning for Effective Firebreak Placement
Authors: Lucas Murray, Tatiana Castillo, Jaime Carrasco, Andrés Weintraub, Richard Weber, Isaac Martín de Diego, José Ramón González, Jordi García-Gonzalo
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.08523
Pdf link: https://arxiv.org/pdf/2404.08523
Abstract Over the past decades, the increase in both frequency and intensity of large-scale wildfires due to climate change has emerged as a significant natural threat. The pressing need to design resilient landscapes capable of withstanding such disasters has become paramount, requiring the development of advanced decision-support tools. Existing methodologies, including Mixed Integer Programming, Stochastic Optimization, and Network Theory, have proven effective but are hindered by computational demands, limiting their applicability. In response to this challenge, we propose using artificial intelligence techniques, specifically Deep Reinforcement Learning, to address the complex problem of firebreak placement in the landscape. We employ value-function based approaches like Deep Q-Learning, Double Deep Q-Learning, and Dueling Double Deep Q-Learning. Utilizing the Cell2Fire fire spread simulator combined with Convolutional Neural Networks, we have successfully implemented a computational agent capable of learning firebreak locations within a forest environment, achieving good results. Furthermore, we incorporate a pre-training loop, initially teaching our agent to mimic a heuristic-based algorithm and observe that it consistently exceeds the performance of these solutions. Our findings underscore the immense potential of Deep Reinforcement Learning for operational research challenges, especially in fire prevention. Our approach demonstrates convergence with highly favorable results in problem instances as large as 40 x 40 cells, marking a significant milestone in applying Reinforcement Learning to this critical issue. To the best of our knowledge, this study represents a pioneering effort in using Reinforcement Learning to address the aforementioned problem, offering promising perspectives in fire prevention and landscape management
Enhancing Autonomous Vehicle Training with Language Model Integration and Critical Scenario Generation
Authors: Hanlin Tian, Kethan Reddy, Yuxiang Feng, Mohammed Quddus, Yiannis Demiris, Panagiotis Angeloudis
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.08570
Pdf link: https://arxiv.org/pdf/2404.08570
Abstract This paper introduces CRITICAL, a novel closed-loop framework for autonomous vehicle (AV) training and testing. CRITICAL stands out for its ability to generate diverse scenarios, focusing on critical driving situations that target specific learning and performance gaps identified in the Reinforcement Learning (RL) agent. The framework achieves this by integrating real-world traffic dynamics, driving behavior analysis, surrogate safety measures, and an optional Large Language Model (LLM) component. It is proven that the establishment of a closed feedback loop between the data generation pipeline and the training process can enhance the learning rate during training, elevate overall system performance, and augment safety resilience. Our evaluations, conducted using the Proximal Policy Optimization (PPO) and the HighwayEnv simulation environment, demonstrate noticeable performance improvements with the integration of critical case generation and LLM analysis, indicating CRITICAL's potential to improve the robustness of AV systems and streamline the generation of critical scenarios. This ultimately serves to hasten the development of AV agents, expand the general scope of RL training, and ameliorate validation efforts for AV safety.
Keyword: deep learning

A Multi-Level Framework for Accelerating Training Transformer Models
Authors: Longwei Zou, Han Zhang, Yangdong Deng
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2404.07999
Pdf link: https://arxiv.org/pdf/2404.07999
Abstract The fast growing capabilities of large-scale deep learning models, such as Bert, GPT and ViT, are revolutionizing the landscape of NLP, CV and many other domains. Training such models, however, poses an unprecedented demand for computing power, which incurs exponentially increasing energy cost and carbon dioxide emissions. It is thus critical to develop efficient training solutions to reduce the training costs. Motivated by a set of key observations of inter- and intra-layer similarities among feature maps and attentions that can be identified from typical training processes, we propose a multi-level framework for training acceleration. Specifically, the framework is based on three basic operators, Coalescing, De-coalescing and Interpolation, which can be orchestrated to build a multi-level training framework. The framework consists of a V-cycle training process, which progressively down- and up-scales the model size and projects the parameters between adjacent levels of models via coalescing and de-coalescing. The key idea is that a smaller model that can be trained for fast convergence and the trained parameters provides high-qualities intermediate solutions for the next level larger network. The interpolation operator is designed to break the symmetry of neurons incurred by de-coalescing for better convergence performance. Our experiments on transformer-based language models (e.g. Bert, GPT) as well as a vision model (e.g. DeiT) prove that the proposed framework reduces the computational cost by about 20% on training BERT/GPT-Base models and up to 51.6% on training the BERT-Large model while preserving the performance.
An inclusive review on deep learning techniques and their scope in handwriting recognition
Authors: Sukhdeep Singh, Sudhir Rohilla, Anuj Sharma
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.08011
Pdf link: https://arxiv.org/pdf/2404.08011
Abstract Deep learning expresses a category of machine learning algorithms that have the capability to combine raw inputs into intermediate features layers. These deep learning algorithms have demonstrated great results in different fields. Deep learning has particularly witnessed for a great achievement of human level performance across a number of domains in computer vision and pattern recognition. For the achievement of state-of-the-art performances in diverse domains, the deep learning used different architectures and these architectures used activation functions to perform various computations between hidden and output layers of any architecture. This paper presents a survey on the existing studies of deep learning in handwriting recognition field. Even though the recent progress indicates that the deep learning methods has provided valuable means for speeding up or proving accurate results in handwriting recognition, but following from the extensive literature survey, the present study finds that the deep learning has yet to revolutionize more and has to resolve many of the most pressing challenges in this field, but promising advances have been made on the prior state of the art. Additionally, an inadequate availability of labelled data to train presents problems in this domain. Nevertheless, the present handwriting recognition survey foresees deep learning enabling changes at both bench and bedside with the potential to transform several domains as image processing, speech recognition, computer vision, machine translation, robotics and control, medical imaging, medical information processing, bio-informatics, natural language processing, cyber security, and many others.
ONNXPruner: ONNX-Based General Model Pruning Adapter
Authors: Dongdong Ren, Wenbin Li, Tianyu Ding, Lei Wang, Qi Fan, Jing Huo, Hongbing Pan, Yang Gao
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.08016
Pdf link: https://arxiv.org/pdf/2404.08016
Abstract Recent advancements in model pruning have focused on developing new algorithms and improving upon benchmarks. However, the practical application of these algorithms across various models and platforms remains a significant challenge. To address this challenge, we propose ONNXPruner, a versatile pruning adapter designed for the ONNX format models. ONNXPruner streamlines the adaptation process across diverse deep learning frameworks and hardware platforms. A novel aspect of ONNXPruner is its use of node association trees, which automatically adapt to various model architectures. These trees clarify the structural relationships between nodes, guiding the pruning process, particularly highlighting the impact on interconnected nodes. Furthermore, we introduce a tree-level evaluation method. By leveraging node association trees, this method allows for a comprehensive analysis beyond traditional single-node evaluations, enhancing pruning performance without the need for extra operations. Experiments across multiple models and datasets confirm ONNXPruner's strong adaptability and increased efficacy. Our work aims to advance the practical application of model pruning.
AI-Guided Feature Segmentation Techniques to Model Features from Single Crystal Diamond Growth
Authors: Rohan Reddy Mekala, Elias Garratt, Matthias Muehle, Arjun Srinivasan, Adam Porter, Mikael Lindvall
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.08017
Pdf link: https://arxiv.org/pdf/2404.08017
Abstract Process refinement to consistently produce high-quality material over a large area of the grown crystal, enabling various applications from optics crystals to quantum detectors, has long been a goal for diamond growth. Machine learning offers a promising path toward this goal, but faces challenges such as the complexity of features within datasets, their time-dependency, and the volume of data produced per growth run. Accurate spatial feature extraction from image to image for real-time monitoring of diamond growth is crucial yet complicated due to the low-volume and high feature complexity nature of the datasets. This paper compares various traditional and machine learning-driven approaches for feature extraction in the diamond growth domain, proposing a novel deep learning-driven semantic segmentation approach to isolate and classify accurate pixel masks of geometric features like diamond, pocket holder, and background, along with their derivative features based on shape and size. Using an annotation-focused human-in-the-loop software architecture for training datasets, with modules for selective data labeling using active learning, data augmentations, and model-assisted labeling, our approach achieves effective annotation accuracy and drastically reduces labeling time and cost. Deep learning algorithms prove highly efficient in accurately learning complex representations from datasets with many features. Our top-performing model, based on the DeeplabV3plus architecture, achieves outstanding accuracy in classifying features of interest, with accuracies of 96.31% for pocket holder, 98.60% for diamond top, and 91.64% for diamond side features.
Physics-Enhanced Graph Neural Networks For Soft Sensing in Industrial Internet of Things
Authors: Keivan Faghih Niresi, Hugo Bissig, Henri Baumann, Olga Fink
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Arxiv link: https://arxiv.org/abs/2404.08061
Pdf link: https://arxiv.org/pdf/2404.08061
Abstract The Industrial Internet of Things (IIoT) is reshaping manufacturing, industrial processes, and infrastructure management. By fostering new levels of automation, efficiency, and predictive maintenance, IIoT is transforming traditional industries into intelligent, seamlessly interconnected ecosystems. However, achieving highly reliable IIoT can be hindered by factors such as the cost of installing large numbers of sensors, limitations in retrofitting existing systems with sensors, or harsh environmental conditions that may make sensor installation impractical. Soft (virtual) sensing leverages mathematical models to estimate variables from physical sensor data, offering a solution to these challenges. Data-driven and physics-based modeling are the two main methodologies widely used for soft sensing. The choice between these strategies depends on the complexity of the underlying system, with the data-driven approach often being preferred when the physics-based inference models are intricate and present challenges for state estimation. However, conventional deep learning models are typically hindered by their inability to explicitly represent the complex interactions among various sensors. To address this limitation, we adopt Graph Neural Networks (GNNs), renowned for their ability to effectively capture the complex relationships between sensor measurements. In this research, we propose physics-enhanced GNNs, which integrate principles of physics into graph-based methodologies. This is achieved by augmenting additional nodes in the input graph derived from the underlying characteristics of the physical processes. Our evaluation of the proposed methodology on the case study of district heating networks reveals significant improvements over purely data-driven GNNs, even in the presence of noise and parameter inaccuracies.
DIMAT: Decentralized Iterative Merging-And-Training for Deep Learning Models
Authors: Nastaran Saadati, Minh Pham, Nasla Saleem, Joshua R. Waite, Aditya Balu, Zhanhong Jiang, Chinmay Hegde, Soumik Sarkar
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2404.08079
Pdf link: https://arxiv.org/pdf/2404.08079
Abstract Recent advances in decentralized deep learning algorithms have demonstrated cutting-edge performance on various tasks with large pre-trained models. However, a pivotal prerequisite for achieving this level of competitiveness is the significant communication and computation overheads when updating these models, which prohibits the applications of them to real-world scenarios. To address this issue, drawing inspiration from advanced model merging techniques without requiring additional training, we introduce the Decentralized Iterative Merging-And-Training (DIMAT) paradigm--a novel decentralized deep learning framework. Within DIMAT, each agent is trained on their local data and periodically merged with their neighboring agents using advanced model merging techniques like activation matching until convergence is achieved. DIMAT provably converges with the best available rate for nonconvex functions with various first-order methods, while yielding tighter error bounds compared to the popular existing approaches. We conduct a comprehensive empirical analysis to validate DIMAT's superiority over baselines across diverse computer vision tasks sourced from multiple datasets. Empirical results validate our theoretical claims by showing that DIMAT attains faster and higher initial gain in accuracy with independent and identically distributed (IID) and non-IID data, incurring lower communication overhead. This DIMAT paradigm presents a new opportunity for the future decentralized learning, enhancing its adaptability to real-world with sparse and light-weight communication and computation.
Predictive Handover Strategy in 6G and Beyond: A Deep and Transfer Learning Approach
Authors: Ioannis Panitsas, Akrit Mudvari, Ali Maatouk, Leandros Tassiulas
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.08113
Pdf link: https://arxiv.org/pdf/2404.08113
Abstract Next-generation cellular networks will evolve into more complex and virtualized systems, employing machine learning for enhanced optimization and leveraging higher frequency bands and denser deployments to meet varied service demands. This evolution, while bringing numerous advantages, will also pose challenges, especially in mobility management, as it will increase the overall number of handovers due to smaller coverage areas and the higher signal attenuation. To address these challenges, we propose a deep learning based algorithm for predicting the future serving cell utilizing sequential user equipment measurements to minimize the handover failures and interruption time. Our algorithm enables network operators to dynamically adjust handover triggering events or incorporate UAV base stations for enhanced coverage and capacity, optimizing network objectives like load balancing and energy efficiency through transfer learning techniques. Our framework complies with the O-RAN specifications and can be deployed in a Near-Real-Time RAN Intelligent Controller as an xApp leveraging the E2SM-KPM service model. The evaluation results demonstrate that our algorithm achieves a 92% accuracy in predicting future serving cells with high probability. Finally, by utilizing transfer learning, our algorithm significantly reduces the retraining time by 91% and 77% when new handover trigger decisions or UAV base stations are introduced to the network dynamically.
Pay Attention to Your Neighbours: Training-Free Open-Vocabulary Semantic Segmentation
Authors: Sina Hajimiri, Ismail Ben Ayed, Jose Dolz
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.08181
Pdf link: https://arxiv.org/pdf/2404.08181
Abstract Despite the significant progress in deep learning for dense visual recognition problems, such as semantic segmentation, traditional methods are constrained by fixed class sets. Meanwhile, vision-language foundation models, such as CLIP, have showcased remarkable effectiveness in numerous zero-shot image-level tasks, owing to their robust generalizability. Recently, a body of work has investigated utilizing these models in open-vocabulary semantic segmentation (OVSS). However, existing approaches often rely on impractical supervised pre-training or access to additional pre-trained networks. In this work, we propose a strong baseline for training-free OVSS, termed Neighbour-Aware CLIP (NACLIP), representing a straightforward adaptation of CLIP tailored for this scenario. Our method enforces localization of patches in the self-attention of CLIP's vision transformer which, despite being crucial for dense prediction tasks, has been overlooked in the OVSS literature. By incorporating design choices favouring segmentation, our approach significantly improves performance without requiring additional data, auxiliary pre-trained networks, or extensive hyperparameter tuning, making it highly practical for real-world applications. Experiments are performed on 8 popular semantic segmentation benchmarks, yielding state-of-the-art performance on most scenarios. Our code is publicly available at https://github.com/sinahmr/NACLIP .
Measuring Domain Shifts using Deep Learning Remote Photoplethysmography Model Similarity
Authors: Nathan Vance, Patrick Flynn
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.08184
Pdf link: https://arxiv.org/pdf/2404.08184
Abstract Domain shift differences between training data for deep learning models and the deployment context can result in severe performance issues for models which fail to generalize. We study the domain shift problem under the context of remote photoplethysmography (rPPG), a technique for video-based heart rate inference. We propose metrics based on model similarity which may be used as a measure of domain shift, and we demonstrate high correlation between these metrics and empirical performance. One of the proposed metrics with viable correlations, DS-diff, does not assume access to the ground truth of the target domain, i.e. it may be applied to in-the-wild data. To that end, we investigate a model selection problem in which ground truth results for the evaluation domain is not known, demonstrating a 13.9% performance improvement over the average case baseline.
A Survey of Neural Network Robustness Assessment in Image Recognition
Authors: Jie Wang, Jun Ai, Minyan Lu, Haoran Su, Dan Yu, Yutao Zhang, Junda Zhu, Jingyu Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Arxiv link: https://arxiv.org/abs/2404.08285
Pdf link: https://arxiv.org/pdf/2404.08285
Abstract In recent years, there has been significant attention given to the robustness assessment of neural networks. Robustness plays a critical role in ensuring reliable operation of artificial intelligence (AI) systems in complex and uncertain environments. Deep learning's robustness problem is particularly significant, highlighted by the discovery of adversarial attacks on image classification models. Researchers have dedicated efforts to evaluate robustness in diverse perturbation conditions for image recognition tasks. Robustness assessment encompasses two main techniques: robustness verification/ certification for deliberate adversarial attacks and robustness testing for random data corruptions. In this survey, we present a detailed examination of both adversarial robustness (AR) and corruption robustness (CR) in neural network assessment. Analyzing current research papers and standards, we provide an extensive overview of robustness assessment in image recognition. Three essential aspects are analyzed: concepts, metrics, and assessment methods. We investigate the perturbation metrics and range representations used to measure the degree of perturbations on images, as well as the robustness metrics specifically for the robustness conditions of classification models. The strengths and limitations of the existing methods are also discussed, and some potential directions for future research are provided.
Interference Motion Removal for Doppler Radar Vital Sign Detection Using Variational Encoder-Decoder Neural Network
Authors: Mikolaj Czerkawski, Christos Ilioudis, Carmine Clemente, Craig Michie, Ivan Andonovic, Christos Tachtatzis
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.08298
Pdf link: https://arxiv.org/pdf/2404.08298
Abstract The treatment of interfering motion contributions remains one of the key challenges in the domain of radar-based vital sign monitoring. Removal of the interference to extract the vital sign contributions is demanding due to overlapping Doppler bands, the complex structure of the interference motions and significant variations in the power levels of their contributions. A novel approach to the removal of interference through the use of a probabilistic deep learning model is presented. Results show that a convolutional encoder-decoder neural network with a variational objective is capable of learning a meaningful representation space of vital sign Doppler-time distribution facilitating their extraction from a mixture signal. The approach is tested on semi-experimental data containing real vital sign signatures and simulated returns from interfering body motions. The application of the proposed network enhances the extraction of the micro-Doppler frequency corresponding to the respiration rate is demonstrated.
Multi-Step Traffic Prediction for Multi-Period Planning in Optical Networks
Authors: Hafsa Maryam, Tania Panayiotou, Georgios Ellinas
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.08314
Pdf link: https://arxiv.org/pdf/2404.08314
Abstract A multi-period planning framework is proposed that exploits multi-step ahead traffic predictions to address service overprovisioning and improve adaptability to traffic changes, while ensuring the necessary quality-of-service (QoS) levels. An encoder-decoder deep learning model is initially leveraged for multi-step ahead prediction by analyzing real-traffic traces. This information is then exploited by multi-period planning heuristics to efficiently utilize available network resources while minimizing undesired service disruptions (caused due to lightpath re-allocations), with these heuristics outperforming a single-step ahead prediction approach.
Graph data augmentation with Gromow-Wasserstein Barycenters
Authors: Andrea Ponti
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2404.08376
Pdf link: https://arxiv.org/pdf/2404.08376
Abstract Graphs are ubiquitous in various fields, and deep learning methods have been successful applied in graph classification tasks. However, building large and diverse graph datasets for training can be expensive. While augmentation techniques exist for structured data like images or numerical data, the augmentation of graph data remains challenging. This is primarily due to the complex and non-Euclidean nature of graph data. In this paper, it has been proposed a novel augmentation strategy for graphs that operates in a non-Euclidean space. This approach leverages graphon estimation, which models the generative mechanism of networks sequences. Computational results demonstrate the effectiveness of the proposed augmentation framework in improving the performance of graph classification models. Additionally, using a non-Euclidean distance, specifically the Gromow-Wasserstein distance, results in better approximations of the graphon. This framework also provides a means to validate different graphon estimation approaches, particularly in real-world scenarios where the true graphon is unknown.
NC-TTT: A Noise Contrastive Approach for Test-Time Training
Authors: David Osowiechi, Gustavo A. Vargas Hakim, Mehrdad Noori, Milad Cheraghalikhani, Ali Bahri, Moslem Yazdanpanah, Ismail Ben Ayed, Christian Desrosiers
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.08392
Pdf link: https://arxiv.org/pdf/2404.08392
Abstract Despite their exceptional performance in vision tasks, deep learning models often struggle when faced with domain shifts during testing. Test-Time Training (TTT) methods have recently gained popularity by their ability to enhance the robustness of models through the addition of an auxiliary objective that is jointly optimized with the main task. Being strictly unsupervised, this auxiliary objective is used at test time to adapt the model without any access to labels. In this work, we propose Noise-Contrastive Test-Time Training (NC-TTT), a novel unsupervised TTT technique based on the discrimination of noisy feature maps. By learning to classify noisy views of projected feature maps, and then adapting the model accordingly on new domains, classification performance can be recovered by an important margin. Experiments on several popular test-time adaptation baselines demonstrate the advantages of our method compared to recent approaches for this task. The code can be found at:https://github.com/GustavoVargasHakim/NCTTT.git
A backward differential deep learning-based algorithm for solving high-dimensional nonlinear backward stochastic differential equations
Authors: Lorenc Kapllani, Long Teng
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG); Computational Finance (q-fin.CP)
Arxiv link: https://arxiv.org/abs/2404.08456
Pdf link: https://arxiv.org/pdf/2404.08456
Abstract In this work, we propose a novel backward differential deep learning-based algorithm for solving high-dimensional nonlinear backward stochastic differential equations (BSDEs), where the deep neural network (DNN) models are trained not only on the inputs and labels but also the differentials of the corresponding labels. This is motivated by the fact that differential deep learning can provide an efficient approximation of the labels and their derivatives with respect to inputs. The BSDEs are reformulated as differential deep learning problems by using Malliavin calculus. The Malliavin derivatives of solution to a BSDE satisfy themselves another BSDE, resulting thus in a system of BSDEs. Such formulation requires the estimation of the solution, its gradient, and the Hessian matrix, represented by the triple of processes $\left(Y, Z, \Gamma\right).$ All the integrals within this system are discretized by using the Euler-Maruyama method. Subsequently, DNNs are employed to approximate the triple of these unknown processes. The DNN parameters are backwardly optimized at each time step by minimizing a differential learning type loss function, which is defined as a weighted sum of the dynamics of the discretized BSDE system, with the first term providing the dynamics of the process $Y$ and the other the process $Z$. An error analysis is carried out to show the convergence of the proposed algorithm. Various numerical experiments up to $50$ dimensions are provided to demonstrate the high efficiency. Both theoretically and numerically, it is demonstrated that our proposed scheme is more efficient compared to other contemporary deep learning-based methodologies, especially in the computation of the process $\Gamma$.
SpectralMamba: Efficient Mamba for Hyperspectral Image Classification
Authors: Jing Yao, Danfeng Hong, Chenyu Li, Jocelyn Chanussot
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Arxiv link: https://arxiv.org/abs/2404.08489
Pdf link: https://arxiv.org/pdf/2404.08489
Abstract Recurrent neural networks and Transformers have recently dominated most applications in hyperspectral (HS) imaging, owing to their capability to capture long-range dependencies from spectrum sequences. However, despite the success of these sequential architectures, the non-ignorable inefficiency caused by either difficulty in parallelization or computationally prohibitive attention still hinders their practicality, especially for large-scale observation in remote sensing scenarios. To address this issue, we herein propose SpectralMamba -- a novel state space model incorporated efficient deep learning framework for HS image classification. SpectralMamba features the simplified but adequate modeling of HS data dynamics at two levels. First, in spatial-spectral space, a dynamical mask is learned by efficient convolutions to simultaneously encode spatial regularity and spectral peculiarity, thus attenuating the spectral variability and confusion in discriminative representation learning. Second, the merged spectrum can then be efficiently operated in the hidden state space with all parameters learned input-dependent, yielding selectively focused responses without reliance on redundant attention or imparallelizable recurrence. To explore the room for further computational downsizing, a piece-wise scanning mechanism is employed in-between, transferring approximately continuous spectrum into sequences with squeezed length while maintaining short- and long-term contextual profiles among hundreds of bands. Through extensive experiments on four benchmark HS datasets acquired by satellite-, aircraft-, and UAV-borne imagers, SpectralMamba surprisingly creates promising win-wins from both performance and efficiency perspectives.
ChatGPT and general-purpose AI count fruits in pictures surprisingly well
Authors: Konlavach Mengsuwan, Juan Camilo Rivera Palacio, Masahiro Ryo
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Arxiv link: https://arxiv.org/abs/2404.08515
Pdf link: https://arxiv.org/pdf/2404.08515
Abstract Object counting is a popular task in deep learning applications in various domains, including agriculture. A conventional deep learning approach requires a large amount of training data, often a logistic problem in a real-world application. To address this issue, we examined how well ChatGPT (GPT4V) and a general-purpose AI (foundation model for object counting, T-Rex) can count the number of fruit bodies (coffee cherries) in 100 images. The foundation model with few-shot learning outperformed the trained YOLOv8 model (R2 = 0.923 and 0.900, respectively). ChatGPT also showed some interesting potential, especially when few-shot learning with human feedback was applied (R2 = 0.360 and 0.460, respectively). Moreover, we examined the time required for implementation as a practical question. Obtaining the results with the foundation model and ChatGPT were much shorter than the YOLOv8 model (0.83 hrs, 1.75 hrs, and 161 hrs). We interpret these results as two surprises for deep learning users in applied domains: a foundation model with few-shot domain-specific learning can drastically save time and effort compared to the conventional approach, and ChatGPT can reveal a relatively good performance. Both approaches do not need coding skills, which can foster AI education and dissemination.
Fuxi-DA: A Generalized Deep Learning Data Assimilation Framework for Assimilating Satellite Observations
Authors: Xiaoze Xu, Xiuyu Sun, Wei Han, Xiaohui Zhong, Lei Chen, Hao Li
Subjects: Machine Learning (cs.LG); Atmospheric and Oceanic Physics (physics.ao-ph)
Arxiv link: https://arxiv.org/abs/2404.08522
Pdf link: https://arxiv.org/pdf/2404.08522
Abstract Data assimilation (DA), as an indispensable component within contemporary Numerical Weather Prediction (NWP) systems, plays a crucial role in generating the analysis that significantly impacts forecast performance. Nevertheless, the development of an efficient DA system poses significant challenges, particularly in establishing intricate relationships between the background data and the vast amount of multi-source observation data within limited time windows in operational settings. To address these challenges, researchers design complex pre-processing methods for each observation type, leveraging approximate modeling and the power of super-computing clusters to expedite solutions. The emergence of deep learning (DL) models has been a game-changer, offering unified multi-modal modeling, enhanced nonlinear representation capabilities, and superior parallelization. These advantages have spurred efforts to integrate DL models into various domains of weather modeling. Remarkably, DL models have shown promise in matching, even surpassing, the forecast accuracy of leading operational NWP models worldwide. This success motivates the exploration of DL-based DA frameworks tailored for weather forecasting models. In this study, we introduces FuxiDA, a generalized DL-based DA framework for assimilating satellite observations. By assimilating data from Advanced Geosynchronous Radiation Imager (AGRI) aboard Fengyun-4B, FuXi-DA consistently mitigates analysis errors and significantly improves forecast performance. Furthermore, through a series of single-observation experiments, Fuxi-DA has been validated against established atmospheric physics, demonstrating its consistency and reliability.
Dynamic Neural Control Flow Execution: An Agent-Based Deep Equilibrium Approach for Binary Vulnerability Detection
Authors: Litao Li, Steven H. H. Ding, Andrew Walenstein, Philippe Charland, Benjamin C. M. Fung
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.08562
Pdf link: https://arxiv.org/pdf/2404.08562
Abstract Software vulnerabilities are a challenge in cybersecurity. Manual security patches are often difficult and slow to be deployed, while new vulnerabilities are created. Binary code vulnerability detection is less studied and more complex compared to source code, and this has important practical implications. Deep learning has become an efficient and powerful tool in the security domain, where it provides end-to-end and accurate prediction. Modern deep learning approaches learn the program semantics through sequence and graph neural networks, using various intermediate representation of programs, such as abstract syntax trees (AST) or control flow graphs (CFG). Due to the complex nature of program execution, the output of an execution depends on the many program states and inputs. Also, a CFG generated from static analysis can be an overestimation of the true program flow. Moreover, the size of programs often does not allow a graph neural network with fixed layers to aggregate global information. To address these issues, we propose DeepEXE, an agent-based implicit neural network that mimics the execution path of a program. We use reinforcement learning to enhance the branching decision at every program state transition and create a dynamic environment to learn the dependency between a vulnerability and certain program states. An implicitly defined neural network enables nearly infinite state transitions until convergence, which captures the structural information at a higher level. The experiments are conducted on two semi-synthetic and two real-world datasets. We show that DeepEXE is an accurate and efficient method and outperforms the state-of-the-art vulnerability detection methods.
Going Forward-Forward in Distributed Deep Learning
Authors: Ege Aktemur, Ege Zorlutuna, Kaan Bilgili, Tacettin Emre Bok, Berrin Yanikoglu, Suha Orhun Mutluergil
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Arxiv link: https://arxiv.org/abs/2404.08573
Pdf link: https://arxiv.org/pdf/2404.08573
Abstract This paper introduces a new approach in distributed deep learning, utilizing Geoffrey Hinton's Forward-Forward (FF) algorithm to enhance the training of neural networks in distributed computing environments. Unlike traditional methods that rely on forward and backward passes, the FF algorithm employs a dual forward pass strategy, significantly diverging from the conventional backpropagation process. This novel method aligns more closely with the human brain's processing mechanisms, potentially offering a more efficient and biologically plausible approach to neural network training. Our research explores the implementation of the FF algorithm in distributed settings, focusing on its capability to facilitate parallel training of neural network layers. This parallelism aims to reduce training times and resource consumption, thereby addressing some of the inherent challenges in current distributed deep learning systems. By analyzing the effectiveness of the FF algorithm in distributed computing, we aim to demonstrate its potential as a transformative tool in distributed deep learning systems, offering improvements in training efficiency. The integration of the FF algorithm into distributed deep learning represents a significant step forward in the field, potentially revolutionizing the way neural networks are trained in distributed environments.
Generating Synthetic Time Series Data for Cyber-Physical Systems
Authors: Alexander Sommers, Somayeh Bakhtiari Ramezani, Logan Cummins, Sudip Mittal, Shahram Rahimi, Maria Seale, Joseph Jaboure
Subjects: Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2404.08601
Pdf link: https://arxiv.org/pdf/2404.08601
Abstract Data augmentation is an important facilitator of deep learning applications in the time series domain. A gap is identified in the literature, demonstrating sparse exploration of the transformer, the dominant sequence model, for data augmentation in time series. A architecture hybridizing several successful priors is put forth and tested using a powerful time domain similarity metric. Results suggest the challenge of this domain, and several valuable directions for future work.
Regularized Gradient Clipping Provably Trains Wide and Deep Neural Networks
Authors: Matteo Tucat, Anirbit Mukherjee
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Arxiv link: https://arxiv.org/abs/2404.08624
Pdf link: https://arxiv.org/pdf/2404.08624
Abstract In this work, we instantiate a regularized form of the gradient clipping algorithm and prove that it can converge to the global minima of deep neural network loss functions provided that the net is of sufficient width. We present empirical evidence that our theoretically founded regularized gradient clipping algorithm is also competitive with the state-of-the-art deep-learning heuristics. Hence the algorithm presented here constitutes a new approach to rigorous deep learning. The modification we do to standard gradient clipping is designed to leverage the PL* condition, a variant of the Polyak-Lojasiewicz inequality which was recently proven to be true for various neural networks for any depth within a neighborhood of the initialisation.

qiaoyuet / arxiv_daily

New submissions for Mon, 15 Apr 24 #89

Keyword: differential privacy

QI-DPFL: Quality-Aware and Incentive-Boosted Federated Learning with Differential Privacy

Keyword: privacy

FedAuxHMTL: Federated Auxiliary Hard-Parameter Sharing Multi-Task Learning for Network Edge Traffic Classification

WildGraph: Realistic Graph-based Trajectory Generation for Wildlife

QI-DPFL: Quality-Aware and Incentive-Boosted Federated Learning with Differential Privacy

Collaborative-Enhanced Prediction of Spending on Newly Downloaded Mobile Games under Consumption Uncertainty

Manifest V3 Unveiled: Navigating the New Era of Browser Extensions

Keyword: machine learning

An inclusive review on deep learning techniques and their scope in handwriting recognition

AI-Guided Feature Segmentation Techniques to Model Features from Single Crystal Diamond Growth

The OxMat dataset: a multimodal resource for the development of AI-driven technologies in maternal and newborn child health

FedAuxHMTL: Federated Auxiliary Hard-Parameter Sharing Multi-Task Learning for Network Edge Traffic Classification

A Multi-Expert Large Language Model Architecture for Verilog Code Generation

Persistent Classification: A New Approach to Stability of Data and Adversarial Examples

Towards a Robust Soft Baby Robot With Rich Interaction Ability for Advanced Machine Learning Algorithms

Predictive Handover Strategy in 6G and Beyond: A Deep and Transfer Learning Approach

Graph Integrated Language Transformers for Next Action Prediction in Complex Phone Calls

A Survey on Security of Ultra/Hyper Reliable Low Latency Communication: Recent Advancements, Challenges, and Future Directions

Clustering Analysis of US COVID-19 Rates, Vaccine Participation, and Socioeconomic Factors

Enhancing Fairness and Performance in Machine Learning Models: A Multi-Task Learning Approach with Monte-Carlo Dropout and Pareto Optimality

Generalized Population-Based Training for Hyperparameter Optimization in Reinforcement Learning

Communication-Efficient Model Aggregation with Layer Divergence Feedback in Federated Learning

Learning representations of learning representations

An improved tabular data generator with VAE-GMM integration

Federated Optimization with Doubly Regularized Drift Correction

Scarce Resource Allocations That Rely On Machine Learning Should Be Randomized

Learning-Based Joint Antenna Selection and Precoding Design for Cell-Free MIMO Networks

Hyperbolic Delaunay Geometric Alignment

Keyword: optimization

Learning Efficient and Fair Policies for Uncertainty-Aware Collaborative Human-Robot Order Picking

Differentiable Search for Finding Optimal Quantization Strategy

Enhanced Cooperative Perception for Autonomous Vehicles Using Imperfect Communication

Byzantine Reliable Broadcast with Low Communication and Time Complexity

Variance-reduced Zeroth-Order Methods for Fine-Tuning Language Models

SimpliCity: Reconstructing Buildings with Simple Regularized 3D Models

Using Flexibility Envelopes for the Demand-Side Hierarchical Optimization of District Heating Networks

S3Editor: A Sparse Semantic-Disentangled Self-Training Framework for Face Video Editing

Predictive Handover Strategy in 6G and Beyond: A Deep and Transfer Learning Approach

R2 Indicator and Deep Reinforcement Learning Enhanced Adaptive Multi-Objective Evolutionary Algorithm

Loco-Manipulation with Nonimpulsive Contact-Implicit Planning in a Slithering Robot

Enhancing Fairness and Performance in Machine Learning Models: A Multi-Task Learning Approach with Monte-Carlo Dropout and Pareto Optimality

Generalized Population-Based Training for Hyperparameter Optimization in Reinforcement Learning

RLEMMO: Evolutionary Multimodal Optimization Assisted By Deep Reinforcement Learning

MonoPatchNeRF: Improving Neural Radiance Fields with Patch-based Monocular Guidance

Joint Design of Self-Tuning UHF RFID Antenna and Microfluidic Channel for Liquid Sensing

Struggle with Adversarial Defense? Try Diffusion

Emerging Property of Masked Token for Effective Pre-training

Learning to Rebalance Multi-Modal Optimization by Adaptively Masking Subnetworks

Optimization-Based System Identification and Moving Horizon Estimation Using Low-Cost Sensors for a Miniature Car-Like Robot

Code Generation and Performance Engineering for Matrix-Free Finite Element Methods on Hybrid Tetrahedral Grids

Collective Bayesian Decision-Making in a Swarm of Miniaturized Robots for Surface Inspection

Evolutionary Preference Sampling for Pareto Set Learning

Federated Optimization with Doubly Regularized Drift Correction

Riemannian optimization on the symplectic Stiefel manifold using second-order information

A stable decoupled perfectly matched layer for the 3D wave equation using the nodal discontinuous Galerkin method

Dataset Reset Policy Optimization for RLHF

Prescribing Optimal Health-Aware Operation for Urban Air Mobility with Deep Reinforcement Learning

Analyzing and Overcoming Local Optima in Complex Multi-Objective Optimization by Decomposition-Based Evolutionary Algorithms

Advancing Forest Fire Prevention: Deep Reinforcement Learning for Effective Firebreak Placement

Enhancing Autonomous Vehicle Training with Language Model Integration and Critical Scenario Generation

Keyword: deep learning

A Multi-Level Framework for Accelerating Training Transformer Models

An inclusive review on deep learning techniques and their scope in handwriting recognition

ONNXPruner: ONNX-Based General Model Pruning Adapter

AI-Guided Feature Segmentation Techniques to Model Features from Single Crystal Diamond Growth

Physics-Enhanced Graph Neural Networks For Soft Sensing in Industrial Internet of Things

DIMAT: Decentralized Iterative Merging-And-Training for Deep Learning Models

Predictive Handover Strategy in 6G and Beyond: A Deep and Transfer Learning Approach

Pay Attention to Your Neighbours: Training-Free Open-Vocabulary Semantic Segmentation

Measuring Domain Shifts using Deep Learning Remote Photoplethysmography Model Similarity

A Survey of Neural Network Robustness Assessment in Image Recognition

Interference Motion Removal for Doppler Radar Vital Sign Detection Using Variational Encoder-Decoder Neural Network

Multi-Step Traffic Prediction for Multi-Period Planning in Optical Networks

Graph data augmentation with Gromow-Wasserstein Barycenters

NC-TTT: A Noise Contrastive Approach for Test-Time Training

A backward differential deep learning-based algorithm for solving high-dimensional nonlinear backward stochastic differential equations

SpectralMamba: Efficient Mamba for Hyperspectral Image Classification