Abstract
Modern deep networks are trained with stochastic gradient descent (SGD), whose key parameters are the number of data points considered at each step, or batch size $B$, and the step size, or learning rate $\eta$. For small $B$ and large $\eta$, SGD corresponds to a stochastic evolution of the parameters, whose noise amplitude is governed by the `temperature' $T\equiv \eta/B$. Yet this description is observed to break down for sufficiently large batches $B\geq B^*$, or simplifies to gradient descent (GD) when the temperature is sufficiently small. Understanding where these cross-overs take place remains a central challenge. Here we resolve these questions for a teacher-student perceptron classification model, and show empirically that our key predictions still apply to deep networks. Specifically, we obtain a phase diagram in the $B$-$\eta$ plane that separates three dynamical phases: $\textit{(i)}$ a noise-dominated SGD governed by temperature, $\textit{(ii)}$ a large-first-step-dominated SGD and $\textit{(iii)}$ GD. These different phases also correspond to different regimes of generalization error. Remarkably, our analysis reveals that the batch size $B^*$ separating regimes $\textit{(i)}$ and $\textit{(ii)}$ scales with the size $P$ of the training set, with an exponent that characterizes the hardness of the classification problem.
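As a rough illustration of the quantities named in this abstract, the sketch below runs minibatch SGD on a toy teacher-student perceptron and computes the temperature $T = \eta/B$. The hinge loss, the Gaussian inputs, the sizes, and the helper names (`temperature`, `sgd_step`, `make_teacher_data`) are assumptions made for this example and are not taken from the paper.

```python
import random

def temperature(eta, batch_size):
    """SGD 'temperature' T = eta / B, as defined in the abstract."""
    return eta / batch_size

def make_teacher_data(n_points, dim, rng):
    """Label Gaussian inputs with the sign of a random teacher vector (assumed setup)."""
    teacher = [rng.gauss(0, 1) for _ in range(dim)]
    data = []
    for _ in range(n_points):
        x = [rng.gauss(0, 1) for _ in range(dim)]
        y = 1 if sum(t * xi for t, xi in zip(teacher, x)) >= 0 else -1
        data.append((x, y))
    return data

def sgd_step(w, data, eta, batch_size, rng):
    """One minibatch SGD step on the hinge loss max(0, 1 - y * w.x) (assumed loss)."""
    batch = rng.sample(data, batch_size)
    grad = [0.0] * len(w)
    for x, y in batch:
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        if margin < 1.0:  # only margin-violating points contribute to the gradient
            for j, xj in enumerate(x):
                grad[j] -= y * xj
    # Average the gradient over the batch and take a step of size eta.
    return [wi - eta * gj / batch_size for wi, gj in zip(w, grad)]

rng = random.Random(0)
data = make_teacher_data(n_points=200, dim=5, rng=rng)
w = [0.0] * 5
for _ in range(100):
    w = sgd_step(w, data, eta=0.1, batch_size=8, rng=rng)

print("temperature:", temperature(0.1, 8))
```

Doubling both `eta` and `batch_size` leaves `temperature` unchanged, which is the sense in which the noise-dominated regime depends only on the ratio.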
Semi-supervised Domain Adaptation in Graph Transfer Learning
Abstract
As a specific case of graph transfer learning, unsupervised domain adaptation on graphs aims for knowledge transfer from label-rich source graphs to unlabeled target graphs. However, graphs with topology and attributes usually have considerable cross-domain disparity, and there are numerous real-world scenarios where merely a subset of nodes are labeled in the source graph. This imposes critical challenges on graph transfer learning due to serious domain shifts and label scarcity. To address these challenges, we propose a method named Semi-supervised Graph Domain Adaptation (SGDA). To deal with the domain shift, we add adaptive shift parameters to each of the source nodes, which are trained in an adversarial manner to align the cross-domain distributions of node embeddings, so that the node classifier trained on labeled source nodes can be transferred to the target nodes. Moreover, to address the label scarcity, we propose pseudo-labeling on unlabeled nodes, which improves classification on the target graph by measuring the posterior influence of nodes based on their relative position to the class centroids. Finally, extensive experiments on a range of publicly accessible datasets validate the effectiveness of our proposed SGDA in different experimental settings.
Keyword: optimization
TCGF: A unified tensorized consensus graph framework for multi-view representation learning
Authors: Xiangzhu Meng, Wei Wei, Qiang Liu, Shu Wu, Liang Wang
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Multi-view learning techniques have recently gained significant attention in the machine learning domain for their ability to leverage consistency and complementary information across multiple views. However, there remains a lack of sufficient research on generalized multi-view frameworks that unify existing works into a scalable and robust learning framework, as most current works focus on specific styles of multi-view models. Additionally, most multi-view learning works rely heavily on specific-scale scenarios and fail to effectively comprehend multiple scales holistically. These limitations hinder the effective fusion of essential information from multiple views, resulting in poor generalization. To address these limitations, this paper proposes a universal multi-view representation learning framework named Tensorized Consensus Graph Framework (TCGF). Specifically, it first provides a unified framework for existing multi-view works to exploit the representations of individual views, which aims to be suitable for arbitrary assumptions and datasets of different scales. It then stacks these representations into a tensor under alignment basics as a high-order representation, allowing for the smooth propagation of consistency and complementary information across all views. Moreover, TCGF proposes learning a consensus embedding shared by adaptively collaborating all views to uncover the essential structure of the multi-view data, which utilizes the view-consensus grouping effect to regularize the view-consensus representation. To further facilitate related research, we provide a specific implementation of TCGF for large-scale datasets, which can be efficiently solved by applying the alternating optimization strategy. Experiments conducted on seven datasets of different scales indicate the superiority of the proposed TCGF over existing state-of-the-art multi-view learning methods.
Instant Photorealistic Style Transfer: A Lightweight and Adaptive Approach
Authors: Rong Liu, Enyu Zhao, Zhiyuan Liu, Andrew Wei-Wen Feng, Scott John Easley
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
In this paper, we propose an Instant Photorealistic Style Transfer (IPST) approach, designed to achieve instant photorealistic style transfer on super-resolution inputs without the need for pre-training on pair-wise datasets or imposing extra constraints. Our method utilizes a lightweight StyleNet to enable style transfer from a style image to a content image while preserving non-color information. To further enhance the style transfer process, we introduce an instance-adaptive optimization to prioritize the photorealism of outputs and accelerate the convergence of the style network, leading to a rapid training completion within seconds. Moreover, IPST is well-suited for multi-frame style transfer tasks, as it retains temporal and multi-view consistency of the multi-frame inputs such as video and Neural Radiance Field (NeRF). Experimental results demonstrate that IPST requires less GPU memory usage, offers faster multi-frame transfer speed, and generates photorealistic outputs, making it a promising solution for various photorealistic transfer applications.
Hyperbolic vs Euclidean Embeddings in Few-Shot Learning: Two Sides of the Same Coin
Authors: Gabriel Moreira, Manuel Marques, João Paulo Costeira, Alexander Hauptmann
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Recent research in representation learning has shown that hierarchical data lends itself to low-dimensional and highly informative representations in hyperbolic space. However, even if hyperbolic embeddings have gathered attention in image recognition, their optimization is prone to numerical hurdles. Further, it remains unclear which applications stand to benefit the most from the implicit bias imposed by hyperbolicity, when compared to traditional Euclidean features. In this paper, we focus on prototypical hyperbolic neural networks. In particular, we study the tendency of hyperbolic embeddings to converge to the boundary of the Poincar\'e ball in high dimensions and the effect this has on few-shot classification. We show that the best few-shot results are attained for hyperbolic embeddings at a common hyperbolic radius. In contrast to prior benchmark results, we demonstrate that better performance can be achieved by a fixed-radius encoder equipped with the Euclidean metric, regardless of the embedding dimension.
Dual Student Networks for Data-Free Model Stealing
Authors: James Beetham, Navid Kardan, Ajmal Mian, Mubarak Shah
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Abstract
Existing data-free model stealing methods use a generator to produce samples in order to train a student model to match the target model outputs. To this end, the two main challenges are estimating gradients of the target model without access to its parameters, and generating a diverse set of training samples that thoroughly explores the input space. We propose a Dual Student method where two students are symmetrically trained in order to provide the generator a criterion to generate samples that the two students disagree on. On one hand, disagreement on a sample implies at least one student has classified the sample incorrectly when compared to the target model. This incentive towards disagreement implicitly encourages the generator to explore more diverse regions of the input space. On the other hand, our method utilizes gradients of student models to indirectly estimate gradients of the target model. We show that this novel training objective for the generator network is equivalent to optimizing a lower bound on the generator's loss if we had access to the target model gradients. We show that our new optimization framework provides more accurate gradient estimation of the target model and better accuracies on benchmark classification datasets. Additionally, our approach balances improved query efficiency with training computation cost. Finally, we demonstrate that our method serves as a better proxy model for transfer-based adversarial attacks than existing data-free model stealing methods.
Efficient Low-Rank GNN Defense Against Structural Attacks
Authors: Abdullah Alchihabi, Qing En, Yuhong Guo
Abstract
Graph Neural Networks (GNNs) have been shown to possess strong representation abilities over graph data. However, GNNs are vulnerable to adversarial attacks, and even minor perturbations to the graph structure can significantly degrade their performance. Existing methods either are ineffective against sophisticated attacks or require the optimization of dense adjacency matrices, which is time-consuming and prone to local minima. To remedy this problem, we propose an Efficient Low-Rank Graph Neural Network (ELR-GNN) defense method, which aims to learn low-rank and sparse graph structures for defending against adversarial attacks, ensuring effective defense with greater efficiency. Specifically, ELR-GNN consists of two modules: a Coarse Low-Rank Estimation Module and a Fine-Grained Estimation Module. The first module adopts the truncated Singular Value Decomposition (SVD) to initialize the low-rank adjacency matrix estimation, which serves as a starting point for optimizing the low-rank matrix. In the second module, the initial estimate is refined by jointly learning a low-rank sparse graph structure with the GNN model. Sparsity is incorporated into the learned low-rank adjacency matrix by pruning weak connections, which can reduce redundant data while maintaining valuable information. As a result, instead of using the dense adjacency matrix directly, ELR-GNN can learn a low-rank and sparse estimate of it in a simple, efficient and easy to optimize manner. The experimental results demonstrate that ELR-GNN outperforms the state-of-the-art GNN defense methods in the literature, in addition to being very efficient and easy to train.
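The coarse estimation and pruning steps described in this abstract can be sketched roughly as follows, assuming NumPy is available. The function name `low_rank_sparse_estimate`, the `rank` and `threshold` values, and the toy graph are hypothetical choices for illustration; the joint refinement with the GNN model is not shown, and this is not the authors' implementation.

```python
import numpy as np

def low_rank_sparse_estimate(adj, rank, threshold):
    """Truncated-SVD approximation of an adjacency matrix, then magnitude pruning."""
    u, s, vt = np.linalg.svd(adj, full_matrices=False)
    # Keep only the `rank` largest singular values and their vectors.
    low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank, :]
    # Prune weak connections: zero every entry below the magnitude threshold.
    pruned = np.where(np.abs(low_rank) >= threshold, low_rank, 0.0)
    return low_rank, pruned

# Toy graph (assumed): two triangles joined by one extra edge between nodes 2 and 3.
adj = np.zeros((6, 6))
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

low_rank, pruned = low_rank_sparse_estimate(adj, rank=2, threshold=0.3)
print(np.round(pruned, 2))
```

Storing only the leading singular vectors is what keeps the estimate cheap compared with optimizing a dense adjacency matrix directly.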
Comparing an android head with its digital twin regarding the dynamic expression of emotions
Authors: Amelie Kassner, Christian Becker-Asano
Abstract
Emotions, which are an important component of social interaction, can be studied with the help of android robots and their appearance, which is as similar to humans as possible. The production and customization of android robots is expensive and time-consuming, so it may be practical to use a digital replica. In order to investigate whether there are any perceptual differences in terms of emotions based on the difference in appearance, a robot head was digitally replicated. In an experiment, the basic emotions evaluated in a preliminary study were compared in three conditions and then statistically analyzed. It was found that apart from fear, all emotions were recognized on the real robot head. The digital head with "ideal" emotions performed better than the real head apart from the anger representation, which offers optimization potential for the real head. Contrary to expectations, significant differences between the real and the replicated head with the same emotions could only be found in the representation of surprise.
A System-Level Energy-Efficient Digital Twin Framework for Runtime Control of Batch Manufacturing Processes
Authors: Hongliang Li, Herschel C. Pangborn, Ilya Kovalenko
Abstract
The manufacturing sector has a substantial influence on worldwide energy consumption. Therefore, improving manufacturing system energy efficiency is becoming increasingly important as the world strives to move toward a more resilient and sustainable energy paradigm. Batch processes are a major contributor to energy consumption in manufacturing systems. In batch manufacturing, a number of parts are grouped together before starting a batch process. To improve the scheduling and control of batch manufacturing processes, we propose a system-level energy-efficient Digital Twin framework that considers Time-of-Use (TOU) energy pricing for runtime decision-making. As part of this framework, we develop a model that combines batch manufacturing process dynamics and TOU-based energy cost. We also provide an optimization-based decision-making algorithm that makes batch scheduling decisions during runtime. A simulated case study showcases the benefits of the proposed framework.
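To make the role of Time-of-Use pricing concrete, here is a minimal sketch that picks the cheapest start slot for a single batch by exhaustive search. The function `cheapest_start`, the hourly price list, and the constant power draw are assumptions for illustration; the framework in the abstract also models process dynamics, which this example ignores.

```python
def cheapest_start(prices, duration, power_kw, earliest=0, latest_finish=None):
    """Return (start_slot, cost) minimizing energy cost for one batch.

    prices: price per kWh for each hourly slot (here in integer cents).
    duration: number of consecutive slots the batch occupies.
    power_kw: assumed constant power draw while the batch runs.
    """
    if latest_finish is None:
        latest_finish = len(prices)
    best = None
    for start in range(earliest, latest_finish - duration + 1):
        cost = power_kw * sum(prices[start:start + duration])
        if best is None or cost < best[1]:
            best = (start, cost)
    if best is None:
        raise ValueError("no feasible start slot for this duration and deadline")
    return best

# Hypothetical tariff: peak price in slots 0, 1 and 5, off-peak price in slots 2 to 4.
tou_prices = [30, 30, 10, 10, 10, 30]
start, cost = cheapest_start(tou_prices, duration=3, power_kw=10)
print(start, cost)
```

A runtime controller would rerun such a search whenever new parts arrive or the deadline changes, which is the kind of decision the abstract assigns to its optimization-based algorithm.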
Generalizing Trajectory Retiming to Quadratic Objective Functions
Authors: Gerry Chen, Frank Dellaert, Seth Hutchinson
Abstract
Trajectory retiming is the task of computing a feasible time parameterization to traverse a path. It is commonly used in the decoupled approach to trajectory optimization whereby a path is first found, then a retiming algorithm computes a speed profile that satisfies kino-dynamic and other constraints. While trajectory retiming is most often formulated with the minimum-time objective (i.e. traverse the path as fast as possible), it is not always the most desirable objective, particularly when we seek to balance multiple objectives or when bang-bang control is unsuitable. In this paper, we present a novel algorithm based on factor graph variable elimination that can solve for the global optimum of the retiming problem with quadratic objectives as well (e.g. minimize control effort or match a nominal speed by minimizing squared error), which may extend to arbitrary objectives with iteration. Our work extends prior works, which find only solutions on the boundary of the feasible region, while maintaining the same linear time complexity from a single forward-backward pass. We experimentally demonstrate that (1) we achieve better real-world robot performance by using quadratic objectives in place of the minimum-time objective, and (2) our implementation is comparable to or faster than state-of-the-art retiming algorithms.
QoS-Aware Service Prediction and Orchestration in Cloud-Network Integrated Beyond 5G
Authors: Mohammad Farhoudi, Masoud Shokrnezhad, Tarik Taleb
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET); Machine Learning (cs.LG); Numerical Analysis (math.NA)
Abstract
Novel applications such as the Metaverse have highlighted the potential of beyond 5G (B5G) networks, which necessitate ultra-low latency communications and massive broadband connections. Moreover, the burgeoning demand for such services with ever-fluctuating users has engendered a need for heightened service continuity consideration in B5G. To enable these services, the edge-cloud paradigm is a potential solution to harness cloud capacity and effectively manage users in real time as they move across the network. However, edge-cloud networks confront a multitude of limitations, including networking and computing resources that must be collectively managed to unlock their full potential. This paper addresses the joint problem of service placement and resource allocation in a network-cloud integrated environment while considering capacity constraints, dynamic users, and end-to-end delays. We present a non-linear programming model that formulates the optimization problem with the objective of minimizing overall cost while improving latency. Next, to address the problem, we introduce a DDQL-based technique using RNNs to predict user behavior, empowered by a water-filling-based algorithm for service placement. The proposed framework adeptly accommodates the dynamic nature of users, the placement of services that mandate ultra-low latency in B5G, and service continuity when users migrate from one location to another. Simulation results show that our solution provides timely responses that optimize the network's potential, offering a scalable and efficient placement.
Computational Design of Wiring Layout on Tight Suits with Minimal Motion Resistance
Authors: Wang Kai, Xu Xiaoyu, Zhen Yinping, Zhou Da, Guo Shihui, Qin Yipeng, Guo Xiaohu
Abstract
An increasing number of electronics are directly embedded in clothing to monitor human status (e.g., skeletal motion) or provide haptic feedback. A specific challenge in prototyping and fabricating such clothing is designing the wiring layout while minimizing interference with human motion. We address this challenge by formulating the topological optimization problem on the clothing surface as a deformation-weighted Steiner tree problem on a 3D clothing mesh. Our method proposes an energy function for minimizing strain energy in the wiring area under different motions, regularized by its total length. We built a physical prototype to verify the effectiveness of our method and conducted a user study with participants including both design experts and smart clothing users. On three types of commercial smart clothing products, the optimized layout design reduced wire strain energy by an average of 77\% across 248 actions compared to the baseline design, and by 18\% compared to the expert design.
Transferable Adversarial Attack on Image Tampering Localization
Authors: Yuqi Wang, Gang Cao, Zijie Lou, Haochen Zhu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)
Abstract
It is important to evaluate the security of existing digital image tampering localization algorithms in real-world applications. In this paper, we propose an adversarial attack scheme to reveal the reliability of such tampering localizers, which would be fooled and fail to predict altered regions correctly. Specifically, optimization-based and gradient-based adversarial examples are implemented for white-box and black-box attacks. Correspondingly, the adversarial example is optimized via reverse gradient propagation, and the perturbation is added adaptively in the direction of gradient rising. The black-box attack is achieved by relying on the transferability of such adversarial examples to different localizers. Extensive evaluations verify that the proposed attack sharply reduces the localization accuracy while preserving high visual quality of the attacked images.
Safe Control Design through Risk-Tunable Control Barrier Functions
Authors: Vipul K. Sharma, S. Sivaranjani
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Abstract
We consider the problem of designing controllers to guarantee safety in a class of nonlinear systems under uncertainties in the system dynamics and/or the environment. We define a class of uncertain control barrier functions (CBFs), and formulate the safe control design problem as a chance-constrained optimization problem with uncertain CBF constraints. We leverage the scenario approach for chance constrained optimization to develop a risk-tunable control design that provably guarantees the satisfaction of CBF safety constraints up to a user-defined probabilistic risk bound, and provides a trade-off between the sample complexity and risk tolerance. We demonstrate the performance of this approach through simulations on a quadcopter navigation problem with obstacle avoidance constraints.
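The idea of enforcing a CBF constraint over sampled uncertainties can be illustrated on a deliberately simple scalar system. The sketch assumes dynamics $\dot{x} = u + d$ with an unknown disturbance $d$, a barrier $h(x) = x_{\max} - x$, and the condition $\dot{h} \geq -\alpha h$, which rearranges to $u \leq \alpha h - d$. The function name, the system, and the sample values are assumptions for this example; it is not the paper's controller and it does not compute the probabilistic risk bound.

```python
def scenario_cbf_filter(u_nom, x, x_max, alpha, disturbance_samples):
    """Minimally modify u_nom so the CBF condition holds for every sampled disturbance.

    Assumed system: x' = u + d, barrier h(x) = x_max - x.
    CBF condition h' >= -alpha * h  <=>  u <= alpha * h - d for each sample d.
    """
    h = x_max - x
    # The binding scenario constraint comes from the largest sampled disturbance.
    worst = max(disturbance_samples)
    u_bound = alpha * h - worst
    # With a scalar input, the closest safe input is a simple clip from above.
    return min(u_nom, u_bound)

samples = [0.1, -0.2, 0.3]  # hypothetical draws of the disturbance d
print(scenario_cbf_filter(u_nom=2.0, x=0.0, x_max=1.0, alpha=1.0,
                          disturbance_samples=samples))
```

Drawing more samples can only tighten the bound, which reflects qualitatively the trade-off between sample complexity and risk tolerance mentioned in the abstract.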
FRAMU: Attention-based Machine Unlearning using Federated Reinforcement Learning
Authors: Thanveer Shaik, Xiaohui Tao, Lin Li, Haoran Xie, Taotao Cai, Xiaofeng Zhu, Qing Li
Abstract
Machine Unlearning is an emerging field that addresses data privacy issues by enabling the removal of private or irrelevant data from the Machine Learning process. Challenges related to privacy and model efficiency arise from the use of outdated, private, and irrelevant data. These issues compromise both the accuracy and the computational efficiency of models in both Machine Learning and Unlearning. To mitigate these challenges, we introduce a novel framework, Attention-based Machine Unlearning using Federated Reinforcement Learning (FRAMU). This framework incorporates adaptive learning mechanisms, privacy preservation techniques, and optimization strategies, making it a well-rounded solution for handling various data sources, either single-modality or multi-modality, while maintaining accuracy and privacy. FRAMU's strength lies in its adaptability to fluctuating data landscapes, its ability to unlearn outdated, private, or irrelevant data, and its support for continual model evolution without compromising privacy. Our experiments, conducted on both single-modality and multi-modality datasets, revealed that FRAMU significantly outperformed baseline models. Additional assessments of convergence behavior and optimization strategies further validate the framework's utility in federated learning applications. Overall, FRAMU advances Machine Unlearning by offering a robust, privacy-preserving solution that optimizes model performance while also addressing key challenges in dynamic data environments.
Abstract
2D irregular shape packing is a necessary step to arrange UV patches of a 3D model within a texture atlas for memory-efficient appearance rendering in computer graphics. Being a joint, combinatorial decision-making problem involving all patch positions and orientations, this problem has well-known NP-hard complexity. Prior solutions either assume a heuristic packing order or modify the upstream mesh cut and UV mapping to simplify the problem, which either limits the packing ratio or incurs robustness or generality issues. Instead, we introduce a learning-assisted 2D irregular shape packing method that achieves a high packing quality with minimal requirements from the input. Our method iteratively selects and groups subsets of UV patches into near-rectangular super patches, essentially reducing the problem to bin-packing, based on which a joint optimization is employed to further improve the packing ratio. In order to efficiently deal with large problem instances with hundreds of patches, we train deep neural policies to predict nearly rectangular patch subsets and determine their relative poses, leading to linear time scaling with the number of patches. We demonstrate the effectiveness of our method on three datasets for UV packing, where our method achieves a higher packing ratio over several widely used baselines with competitive computational speed.
Anti-Aliased Neural Implicit Surfaces with Encoding Level of Detail
Authors: Yiyu Zhuang, Qi Zhang, Ying Feng, Hao Zhu, Yao Yao, Xiaoyu Li, Yan-Pei Cao, Ying Shan, Xun Cao
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Abstract
We present LoD-NeuS, an efficient neural representation for high-frequency geometry detail recovery and anti-aliased novel view rendering. Drawing inspiration from voxel-based representations with the level of detail (LoD), we introduce a multi-scale tri-plane-based scene representation that is capable of capturing the LoD of the signed distance function (SDF) and the space radiance. Our representation aggregates space features from a multi-convolved featurization within a conical frustum along a ray and optimizes the LoD feature volume through differentiable rendering. Additionally, we propose an error-guided sampling strategy to guide the growth of the SDF during the optimization. Both qualitative and quantitative evaluations demonstrate that our method achieves superior surface reconstruction and photorealistic view synthesis compared to state-of-the-art approaches.
FedWOA: A Federated Learning Model that uses the Whale Optimization Algorithm for Renewable Energy Prediction
Abstract
Privacy is important when dealing with sensitive personal information in machine learning models, which require large data sets for training. In the energy field, access to household prosumer energy data is crucial for energy predictions to support energy grid management and large-scale adoption of renewables. However, citizens are often hesitant to grant access to cloud-based machine learning models. Federated learning has been proposed as a solution to privacy challenges, but it suffers from issues in generating the global prediction model due to data heterogeneity, variations in generation patterns, and the high number of parameters, leading to even lower prediction accuracy. This paper addresses these challenges by introducing FedWOA, a novel federated learning model that employs the Whale Optimization Algorithm to aggregate global prediction models from the weights of local LSTM neural network models trained on prosumer energy data. The proposed solution identifies the optimal vector of weights in the search spaces of the local models to construct the global shared model, which is subsequently transmitted to the local nodes to improve the prediction quality at the prosumer site. To handle non-IID data, K-Means was used for clustering prosumers with a similar scale of energy data. The evaluation results on prosumer energy data have shown that FedWOA can effectively enhance the accuracy of energy prediction models by 25% for MSE and 16% for MAE compared to FedAVG, while demonstrating good convergence and reduced loss.
Optimal Beamforming Structure for Rate Splitting Multiple Access
Abstract
In this paper, we aim at maximizing the weighted sum-rate (WSR) of rate splitting multiple access (RSMA) in multi-user multi-antenna transmission networks through the joint optimization of rate allocation and beamforming. Unlike conventional methods like weighted minimum mean square error (WMMSE) and standard fractional programming (FP), which tackle the non-convex WSR problem iteratively using disciplined convex subproblems and optimization toolboxes, our work pioneers a novel toolbox-free approach. For the first time, we identify the optimal beamforming structure and common rate allocation for WSR maximization in RSMA by leveraging FP and Lagrangian duality. Then we propose an algorithm based on FP and fixed point iteration to optimize the beamforming and common rate allocation without the need for optimization toolboxes. Our numerical results demonstrate that the proposed algorithm attains the same performance as standard FP and classical WMMSE methods while significantly reducing computational time.
Decoupling the Curve Modeling and Pavement Regression for Lane Detection
Authors: Wencheng Han, Jianbing Shen
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
The curve-based lane representation is a popular approach in many lane detection methods, as it allows for the representation of lanes as a whole object and maximizes the use of holistic information about the lanes. However, the curves produced by these methods may not fit well with irregular lines, which can lead to gaps in performance compared to indirect representations such as segmentation-based or point-based methods. We have observed that these lanes are not intended to be irregular, but they appear zigzagged in the perspective view due to being drawn on uneven pavement. In this paper, we propose a new approach to the lane detection task by decomposing it into two parts: curve modeling and ground height regression. Specifically, we use a parameterized curve to represent lanes in the BEV space to reflect the original distribution of lanes. For the second part, since ground heights are determined by natural factors such as road conditions and are less holistic, we regress the ground heights of key points separately from the curve modeling. Additionally, we have unified the 2D and 3D lane detection tasks by designing a new framework and a series of losses to guide the optimization of models with or without 3D lane labels. Our experiments on 2D lane detection benchmarks (TuSimple and CULane), as well as the recently proposed 3D lane detection datasets (ONCE-3Dlane and OpenLane), have shown significant improvements. We will make our well-documented source code publicly available.
An overview of some mathematical techniques and problems linking 3D vision to 3D printing
Abstract
Computer Vision and 3D printing have rapidly evolved in the last 10 years, but interactions between them have been very limited so far, despite the fact that they share several mathematical techniques. We try to fill this gap by presenting an overview of some techniques for Shape-from-Shading problems as well as for 3D printing, with an emphasis on the approaches based on nonlinear partial differential equations and optimization. We also sketch possible couplings to complete the process of object manufacturing, starting from one or more images of the object and ending with its final 3D print. We will give some practical examples of this procedure.
Abstract
Cross-lingual entity alignment is the task of finding the same semantic entities in knowledge graphs of different languages. In this paper, we propose a simple and novel unsupervised method for cross-language entity alignment. We utilize a deep learning multi-language encoder combined with a machine translator to encode knowledge graph text, which reduces the reliance on labeled data. Unlike traditional methods that only emphasize global or local alignment, our method simultaneously considers both alignment strategies. We first view the alignment task as a bipartite matching problem and then adopt the re-exchanging idea to accomplish alignment. Compared with the traditional bipartite matching algorithm that only gives one optimal solution, our algorithm generates ranked matching results, which enables many potential downstream tasks. Additionally, our method can adapt to two different types of optimization (minimal and maximal) in the bipartite matching process, which provides more flexibility. Our evaluation shows Hits@1 rates of 0.966, 0.990, and 0.996 on the DBP15K dataset for the Chinese, Japanese, and French to English alignment tasks, respectively. We outperformed the state-of-the-art method in the unsupervised and semi-supervised categories. Compared with the state-of-the-art supervised method, our method outperforms it by 2.6% and 0.4% in the Ja-En and Fr-En alignment tasks while being marginally lower, by 0.2%, in the Zh-En alignment task.
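The framing of alignment as bipartite matching, with ranked results and a choice between minimal and maximal optimization, can be illustrated with a tiny exhaustive search. This is a sketch under stated assumptions: the function `ranked_matchings` and the score matrix are hypothetical, the brute-force enumeration is only practical for very small inputs, and it is not the re-exchanging algorithm the abstract refers to.

```python
from itertools import permutations

def ranked_matchings(score, maximize=True, top_k=3):
    """Enumerate all one-to-one matchings of a square score matrix and rank them.

    score[i][j] is the similarity (or cost) of aligning source entity i to target j.
    Returns up to top_k (total, assignment) pairs, best first.
    """
    n = len(score)
    results = []
    for perm in permutations(range(n)):
        total = sum(score[i][perm[i]] for i in range(n))
        results.append((total, perm))
    # Highest total first when maximizing, lowest total first when minimizing.
    results.sort(key=lambda r: r[0], reverse=maximize)
    return results[:top_k]

# Hypothetical similarity scores between three source and three target entities.
similarity = [
    [9, 1, 2],
    [1, 8, 3],
    [2, 3, 7],
]
for total, assignment in ranked_matchings(similarity, maximize=True):
    print(total, assignment)
```

Calling the same function with `maximize=False` treats the matrix as costs, which corresponds to the minimal variant of the matching problem.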
Neural Metamaterial Networks for Nonlinear Material Design
Authors: Yue Li, Stelian Coros, Bernhard Thomaszewski
Abstract
Nonlinear metamaterials with tailored mechanical properties have applications in engineering, medicine, robotics, and beyond. While modeling their macromechanical behavior is challenging in itself, finding structure parameters that lead to ideal approximation of high-level performance goals is a challenging task. In this work, we propose Neural Metamaterial Networks (NMN) -- smooth neural representations that encode the nonlinear mechanics of entire metamaterial families. Given structure parameters as input, NMN return continuously differentiable strain energy density functions, thus guaranteeing conservative forces by construction. Though trained on simulation data, NMN do not inherit the discontinuities resulting from topological changes in finite element meshes. They instead provide a smooth map from parameter to performance space that is fully differentiable and thus well-suited for gradient-based optimization. On this basis, we formulate inverse material design as a nonlinear programming problem that leverages neural networks for both objective functions and constraints. We use this approach to automatically design materials with desired strain-stress curves, prescribed directional stiffness and Poisson ratio profiles. We furthermore conduct ablation studies on network nonlinearities and show the advantages of our approach compared to native-scale optimization.
A Novel Hybrid Algorithm for Optimized Solutions in Ocean Renewable Energy Industry: Enhancing Power Take-Off Parameters and Site Selection Procedure of Wave Energy Converters
Abstract
Ocean renewable energy, particularly wave energy, has emerged as a pivotal component for diversifying the global energy portfolio, reducing dependence on fossil fuels, and mitigating climate change impacts. This study delves into the optimization of power take-off (PTO) parameters and the site selection process for an offshore oscillating surge wave energy converter (OSWEC). However, the intrinsic dynamics of these interactions, coupled with the multi-modal nature of the optimization landscape, make this a daunting challenge. Addressing this, we introduce the novel Hill Climb - Explorative Gray Wolf Optimizer (HC-EGWO). This new methodology blends a local search method with a global optimizer, incorporating dynamic control over exploration and exploitation rates. This balance paves the way for an enhanced exploration of the solution space, ensuring the identification of superior-quality solutions. Further anchoring our approach, a feasibility landscape analysis based on linear water wave theory assumptions and the flap's maximum angular motion is conducted. This ensures the optimized OSWEC consistently operates within safety and efficiency parameters. Our findings hold significant promise for the development of more streamlined OSWEC power take-off systems. They provide insights for selecting the prime offshore site, optimizing power output, and bolstering the overall adoption of ocean renewable energy sources. Impressively, by employing the HC-EGWO method, we achieved an upswing of up to 3.31% in power output compared to other methods. This substantial increment underscores the efficacy of our proposed optimization approach. Conclusively, the outcomes offer invaluable knowledge for deploying OSWECs in the South Caspian Sea, where unique environmental conditions intersect with considerable energy potential.
Learning Adaptive Safety for Multi-Agent Systems
Authors: Luigi Berducci, Shuo Yang, Rahul Mangharam, Radu Grosu
Subjects: Robotics (cs.RO); Machine Learning (cs.LG); Multiagent Systems (cs.MA); Systems and Control (eess.SY)
Abstract
Ensuring safety in dynamic multi-agent systems is challenging due to limited information about the other agents. Control Barrier Functions (CBFs) are showing promise for safety assurance but current methods make strong assumptions about other agents and often rely on manual tuning to balance safety, feasibility, and performance. In this work, we delve into the problem of adaptive safe learning for multi-agent systems with CBF. We show how emergent behavior can be profoundly influenced by the CBF configuration, highlighting the necessity for a responsive and dynamic approach to CBF design. We present ASRL, a novel adaptive safe RL framework, to fully automate the optimization of policy and CBF coefficients, to enhance safety and long-term performance through reinforcement learning. By directly interacting with the other agents, ASRL learns to cope with diverse agent behaviours and maintains the cost violations below a desired limit. We evaluate ASRL in a multi-robot system and a competitive multi-agent racing scenario, against learning-based and control-theoretic approaches. We empirically demonstrate the efficacy and flexibility of ASRL, and assess generalization and scalability to out-of-distribution scenarios. Code and supplementary material are public online.
Asymptotically Optimal Belief Space Planning in Discrete Partially-Observable Domains
Authors: Janis Eric Freund, Camille Phiquepal, Andreas Orthey, Marc Toussaint
Abstract
Robots often have to operate in discrete partially observable worlds, where the states of the world are only observable at runtime. To react to different world states, robots need contingencies. However, computing contingencies is costly and often non-optimal. To address this problem, we develop the improved path tree optimization (PTO) method. PTO computes motion contingencies by constructing a tree of motion paths in belief space. This is achieved by constructing a graph of configurations, then adding observation edges to extend the graph to belief space. Afterwards, we use a dynamic programming step to extract the path tree. PTO extends prior work by adding a camera-based state sampler to improve the search for observation points. We also add support for non-Euclidean state spaces, provide an implementation in the Open Motion Planning Library (OMPL), and evaluate PTO on four realistic scenarios with a virtual camera in up to 10-dimensional state spaces. We compare PTO with a default and with the new camera-based state sampler. The results indicate that the camera-based state sampler improves success rates in 3 out of 4 scenarios while having a significantly lower memory footprint. This makes PTO an important contribution to advancing the state of the art for discrete belief space planning.
Learning-Initialized Trajectory Planning in Unknown Environments
Abstract
Autonomous flight in unknown environments requires precise planning for both the spatial and temporal profiles of trajectories, which generally involves nonconvex optimization, leading to high time costs and susceptibility to local optima. To address these limitations, we introduce the Learning-Initialized Trajectory Planner (LIT-Planner), a novel approach that guides optimization using a Neural Network (NN) Planner to provide initial values. We first leverage the spatial-temporal optimization with batch sampling to generate training cases, aiming to capture multimodality in trajectories. Based on these data, the NN-Planner maps visual and inertial observations to trajectory parameters for handling unknown environments. The network outputs are then optimized to enhance both reliability and explainability, ensuring robust performance. Furthermore, we propose a framework that supports robust online replanning with tolerance to planning latency. Comprehensive simulations validate the LIT-Planner's time efficiency without compromising trajectory quality compared to optimization-based methods. Real-world experiments further demonstrate its practical suitability for autonomous drone navigation.
Machine Learning-Driven Burrowing with a Snake-Like Robot
Authors: Sean Even, Holden Gordon, Hoeseok Yang, Yasemin Ozkan-Aydin
Abstract
Subterranean burrowing is inherently difficult for robots because of the high forces experienced as well as the high amount of uncertainty in this domain. Because of the difficulty in modeling forces in granular media, we propose the use of a novel machine-learning control strategy to obtain optimal techniques for vertical self-burrowing. In this paper, we realize a snake-like bio-inspired robot that is equipped with an IMU and two triple-axis magnetometers. Utilizing magnetic field strength as an analog for depth, a novel deep learning architecture was proposed based on sinusoidal and random data in order to obtain a more efficient strategy for vertical self-burrowing. This strategy was able to outperform many other standard burrowing techniques and was able to automatically reach targeted burrowing depths. We hope these results will serve as a proof of concept for how optimization can be used to unlock the secrets of navigating in the subterranean world more efficiently.
PanopticNeRF-360: Panoramic 3D-to-2D Label Transfer in Urban Scenes
Abstract
Training perception systems for self-driving cars requires substantial annotations. However, manual labeling in 2D images is highly labor-intensive. While existing datasets provide rich annotations for pre-recorded sequences, they fall short in labeling rarely encountered viewpoints, potentially hampering the generalization ability for perception models. In this paper, we present PanopticNeRF-360, a novel approach that combines coarse 3D annotations with noisy 2D semantic cues to generate consistent panoptic labels and high-quality images from any viewpoint. Our key insight lies in exploiting the complementarity of 3D and 2D priors to mutually enhance geometry and semantics. Specifically, we propose to leverage noisy semantic and instance labels in both 3D and 2D spaces to guide geometry optimization. Simultaneously, the improved geometry assists in filtering noise present in the 3D and 2D annotations by merging them in 3D space via a learned semantic field. To further enhance appearance, we combine MLP and hash grids to yield hybrid scene features, striking a balance between high-frequency appearance and predominantly contiguous semantics. Our experiments demonstrate PanopticNeRF-360's state-of-the-art performance over existing label transfer methods on the challenging urban scenes of the KITTI-360 dataset. Moreover, PanopticNeRF-360 enables omnidirectional rendering of high-fidelity, multi-view and spatiotemporally consistent appearance, semantic and instance labels. We make our code and data available at https://github.com/fuxiao0719/PanopticNeRF
Keyword: adam
There is no result
Keyword: gradient
Introspective Deep Metric Learning
Authors: Chengkun Wang, Wenzhao Zheng, Zheng Zhu, Jie Zhou, Jiwen Lu
Abstract
This paper proposes an introspective deep metric learning (IDML) framework for uncertainty-aware comparisons of images. Conventional deep metric learning methods focus on learning a discriminative embedding to describe the semantic features of images, which ignore the existence of uncertainty in each image resulting from noise or semantic ambiguity. Training without awareness of these uncertainties causes the model to overfit the annotated labels during training and produce unsatisfactory judgments during inference. Motivated by this, we argue that a good similarity model should consider the semantic discrepancies with awareness of the uncertainty to better deal with ambiguous images for more robust training. To achieve this, we propose to represent an image using not only a semantic embedding but also an accompanying uncertainty embedding, which describes the semantic characteristics and ambiguity of an image, respectively. We further propose an introspective similarity metric to make similarity judgments between images considering both their semantic differences and ambiguities. The gradient analysis of the proposed metric shows that it enables the model to learn at an adaptive and slower pace to deal with the uncertainty during training. The proposed IDML framework improves the performance of deep metric learning through uncertainty modeling and attains state-of-the-art results on the widely used CUB-200-2011, Cars196, and Stanford Online Products datasets for image retrieval and clustering. We further provide an in-depth analysis of our framework to demonstrate the effectiveness and reliability of IDML. Code: https://github.com/wzzheng/IDML.
Energy stable neural network for gradient flow equations
Abstract
In this paper, we propose an energy stable network (EStable-Net) for solving gradient flow equations. The solution update scheme in our neural network EStable-Net is inspired by a proposed auxiliary variable based equivalent form of the gradient flow equation. EStable-Net enables decreasing of a discrete energy along the neural network, which is consistent with the property in the evolution process of the gradient flow equation. The architecture of the neural network EStable-Net consists of a few energy decay blocks, and the output of each block can be interpreted as an intermediate state of the evolution process of the gradient flow equation. This design provides a stable, efficient and interpretable network structure. Numerical experimental results demonstrate that our network is able to generate high accuracy and stable predictions.
Dual Student Networks for Data-Free Model Stealing
Authors: James Beetham, Navid Kardan, Ajmal Mian, Mubarak Shah
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Abstract
Existing data-free model stealing methods use a generator to produce samples in order to train a student model to match the target model outputs. To this end, the two main challenges are estimating gradients of the target model without access to its parameters, and generating a diverse set of training samples that thoroughly explores the input space. We propose a Dual Student method where two students are symmetrically trained in order to provide the generator a criterion to generate samples that the two students disagree on. On one hand, disagreement on a sample implies at least one student has classified the sample incorrectly when compared to the target model. This incentive towards disagreement implicitly encourages the generator to explore more diverse regions of the input space. On the other hand, our method utilizes gradients of student models to indirectly estimate gradients of the target model. We show that this novel training objective for the generator network is equivalent to optimizing a lower bound on the generator's loss if we had access to the target model gradients. We show that our new optimization framework provides more accurate gradient estimation of the target model and better accuracies on benchmark classification datasets. Additionally, our approach balances improved query efficiency with training computation cost. Finally, we demonstrate that our method serves as a better proxy model for transfer-based adversarial attacks than existing data-free model stealing methods.
Bearing and Distance Formation Control of Rigid Bodies in SE(3) with Bearing and Distance Constraints
Authors: Sara Mansourinasab, Mahdi Sojoodi, Seyed Reza Moghadasi
Abstract
Rigidity of the interaction graph is a fundamental condition for achieving the desired formation which can be defined in terms of distance or bearing constraints between agents. In this paper, for reaching a unique formation with the same scaling and orientation as the target formation, both distance and bearing constraints are considered for defining the desired formation. Besides, both distance and bearing measurements are also available. Each agent is able to gather the measurements with respect to other agents in its own body frame. So, the agents are coordinated-free concerning a global reference frame. On the other hand, the framework is embedded in SE(3). The control signal is designed based on a gradient descent method by introducing a cost function. Firstly, the formation problem is considered for bearing-only constraints in SE(3) configuration. Then, the formation control is expressed for the general case of both bearing and distance constraints. Furthermore, the essential conditions that guarantee reaching the desired formation are discussed. Finally, the validity of the proposed formation control is verified by numerical simulations.
A Hierarchy-based Analysis Approach for Blended Learning: A Case Study with Chinese Students
Authors: Yu Ye, Gongjin Zhang, Hongbiao Si, Liang Xu, Shenghua Hu, Yong Li, Xulong Zhang, Kaiyu Hu, Fangzhou Ye
Subjects: Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
Abstract
Blended learning is generally defined as the combination of traditional face-to-face learning and online learning. This learning mode has been widely used in advanced education across the globe due to the COVID-19 pandemic's social distance restriction as well as the development of technology. Online learning plays an important role in blended learning, and as it requires more student autonomy, the quality of blended learning in advanced education has been a persistent concern. Existing literature offers several elements and frameworks regarding evaluating the quality of blended learning. However, most of them either have different favours for evaluation perspectives or simply offer general guidance for evaluation, reducing the completeness, objectivity and practicalness of related works. In order to carry out a more intuitive and comprehensive evaluation framework, this paper proposes a hierarchy-based analysis approach. Applying gradient boosting model and feature importance evaluation method, this approach mainly analyses student engagement and its three identified dimensions (behavioral engagement, emotional engagement, cognitive engagement) to eliminate some existing stubborn problems when it comes to blended learning evaluation. The results show that cognitive engagement and emotional engagement play a more important role in blended learning evaluation, implying that these two should be considered to improve for better learning as well as teaching quality.
Transferable Adversarial Attack on Image Tampering Localization
Authors: Yuqi Wang, Gang Cao, Zijie Lou, Haochen Zhu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)
Abstract
It is significant to evaluate the security of existing digital image tampering localization algorithms in real-world applications. In this paper, we propose an adversarial attack scheme to reveal the reliability of such tampering localizers, which would be fooled and fail to predict altered regions correctly. Specifically, the adversarial examples based on optimization and gradient are implemented for white/black-box attacks. Correspondingly, the adversarial example is optimized via reverse gradient propagation, and the perturbation is added adaptively in the direction of gradient rising. The black-box attack is achieved by relying on the transferability of such adversarial examples to different localizers. Extensive evaluations verify that the proposed attack sharply reduces the localization accuracy while preserving high visual quality of the attacked images.
Memory-based Controllers for Efficient Data-driven Control of Soft Robots
Abstract
Controller design for soft robots is challenging due to nonlinear deformation and high degrees of freedom of flexible material. The data-driven approach is a promising solution to the controller design problem for soft robots. However, the existing data-driven controller design methods for soft robots suffer from two drawbacks: (i) they require excessively long training time, and (ii) they may result in potentially inefficient controllers. This paper addresses these issues by developing two memory-based controllers for soft robots that can be trained in a data-driven fashion: the finite memory controller (FMC) approach and the long short-term memory (LSTM) based approach. An FMC stores the tracking errors at different time instances and computes the actuation signal according to a weighted sum of the stored tracking errors. We develop three reinforcement learning algorithms for computing the optimal weights of an FMC using the Q-learning, soft actor-critic, and deterministic policy gradient (DDPG) methods. An LSTM-based controller is composed of an LSTM network where the inputs of the network are the robot's desired configuration and current configuration. The LSTM network computes the required actuation signal for the soft robot to follow the desired configuration. We study the performance of the proposed approaches in controlling a soft finger where, as benchmarks, we use the existing reinforcement learning (RL) based controllers and proportional-integral-derivative (PID) controllers. Our numerical results show that the training time of the proposed memory-based controllers is significantly shorter than that of the classical RL-based controllers. Moreover, the proposed controllers achieve a smaller tracking error compared with the classical RL algorithms and the PID controller.
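The finite memory controller described above computes its actuation signal as a weighted sum of stored tracking errors. A minimal sketch of that computation follows, assuming scalar configurations and hand-picked weights. In the paper the weights are learned with reinforcement learning, which is not shown here, and the class name `FiniteMemoryController` and its interface are hypothetical.

```python
from collections import deque

class FiniteMemoryController:
    """Actuation = weighted sum of the most recent tracking errors.

    Hypothetical interface for illustration. weights[0] multiplies the
    newest error, weights[1] the one before it, and so on.
    """

    def __init__(self, weights):
        self.weights = list(weights)
        # Fixed-length memory, initially filled with zero error.
        self.errors = deque([0.0] * len(self.weights),
                            maxlen=len(self.weights))

    def step(self, desired, current):
        # Store the newest tracking error; the oldest one is dropped.
        self.errors.appendleft(desired - current)
        return sum(w * e for w, e in zip(self.weights, self.errors))

# Hand-picked weights, used only to show the mechanics.
controller = FiniteMemoryController([0.5, 0.25])
u1 = controller.step(desired=1.0, current=0.0)  # memory: [1.0, 0.0]
u2 = controller.step(desired=1.0, current=0.5)  # memory: [0.5, 1.0]
```

The memory length equals the number of weights, so the size of the weight vector is the only design parameter of this sketch.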
Learning End-to-End Channel Coding with Diffusion Models
Authors: Muah Kim, Rick Fritschek, Rafael F. Schaefer
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG)
Abstract
The training of neural encoders via deep learning necessitates a differentiable channel model due to the backpropagation algorithm. This requirement can be sidestepped by approximating either the channel distribution or its gradient through pilot signals in real-world scenarios. The initial approach draws upon the latest advancements in image generation, utilizing generative adversarial networks (GANs) or their enhanced variants to generate channel distributions. In this paper, we address this channel approximation challenge with diffusion models, which have demonstrated high sample quality in image generation. We offer an end-to-end channel coding framework underpinned by diffusion models and propose an efficient training algorithm. Our simulations with various channel models establish that our diffusion models learn the channel distribution accurately, thereby achieving near-optimal end-to-end symbol error rates (SERs). We also note a significant advantage of diffusion models: A robust generalization capability in high signal-to-noise ratio regions, in contrast to GAN variants that suffer from error floor. Furthermore, we examine the trade-off between sample quality and sampling speed, when an accelerated sampling algorithm is deployed, and investigate the effect of the noise scheduling on this trade-off. With an apt choice of noise scheduling, sampling time can be significantly reduced with a minor increase in SER.
Neural Metamaterial Networks for Nonlinear Material Design
Authors: Yue Li, Stelian Coros, Bernhard Thomaszewski
Abstract
Nonlinear metamaterials with tailored mechanical properties have applications in engineering, medicine, robotics, and beyond. While modeling their macromechanical behavior is challenging in itself, finding structure parameters that lead to ideal approximation of high-level performance goals is a challenging task. In this work, we propose Neural Metamaterial Networks (NMN) -- smooth neural representations that encode the nonlinear mechanics of entire metamaterial families. Given structure parameters as input, NMN return continuously differentiable strain energy density functions, thus guaranteeing conservative forces by construction. Though trained on simulation data, NMN do not inherit the discontinuities resulting from topological changes in finite element meshes. They instead provide a smooth map from parameter to performance space that is fully differentiable and thus well-suited for gradient-based optimization. On this basis, we formulate inverse material design as a nonlinear programming problem that leverages neural networks for both objective functions and constraints. We use this approach to automatically design materials with desired strain-stress curves, prescribed directional stiffness and Poisson ratio profiles. We furthermore conduct ablation studies on network nonlinearities and show the advantages of our approach compared to native-scale optimization.
KFC: Kinship Verification with Fair Contrastive Loss and Multi-Task Learning
Authors: Jia Luo Peng, Keng Wei Chang, Shang-Hong Lai
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Kinship verification is an emerging task in computer vision with multiple potential applications. However, there's no large enough kinship dataset to train a representative and robust model, which is a limitation for achieving better performance. Moreover, face verification is known to exhibit bias, which has not been dealt with by previous kinship verification works and sometimes even results in serious issues. So we first combine existing kinship datasets and label each identity with the correct race in order to take race information into consideration and provide a larger and complete dataset, called KinRace dataset. Secondly, we propose a multi-task learning model structure with attention module to enhance accuracy, which surpasses state-of-the-art performance. Lastly, our fairness-aware contrastive loss function with adversarial learning greatly mitigates racial bias. We introduce a debias term into traditional contrastive loss and implement gradient reverse in race classification task, which is an innovative idea to mix two fairness methods to alleviate bias. Exhaustive experimental evaluation demonstrates the effectiveness and superior performance of the proposed KFC in both standard deviation and accuracy at the same time.
On the different regimes of Stochastic Gradient Descent
Authors: Antonio Sclocchi, Matthieu Wyart
Subjects: Machine Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn); Machine Learning (stat.ML)
Abstract
Modern deep networks are trained with stochastic gradient descent (SGD) whose key parameters are the number of data considered at each step or batch size $B$, and the step size or learning rate $\eta$. For small $B$ and large $\eta$, SGD corresponds to a stochastic evolution of the parameters, whose noise amplitude is governed by the `temperature' $T\equiv \eta/B$. Yet this description is observed to break down for sufficiently large batches $B\geq B^*$, or simplifies to gradient descent (GD) when the temperature is sufficiently small. Understanding where these cross-overs take place remains a central challenge. Here we resolve these questions for a teacher-student perceptron classification model, and show empirically that our key predictions still apply to deep networks. Specifically, we obtain a phase diagram in the $B$-$\eta$ plane that separates three dynamical phases: $\textit{(i)}$ a noise-dominated SGD governed by temperature, $\textit{(ii)}$ a large-first-step-dominated SGD and $\textit{(iii)}$ GD. These different phases also correspond to different regimes of generalization error. Remarkably, our analysis reveals that the batch size $B^*$ separating regimes $\textit{(i)}$ and $\textit{(ii)}$ scales with the size $P$ of the training set, with an exponent that characterizes the hardness of the classification problem.
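The temperature defined in the abstract above, $T \equiv \eta/B$, can be illustrated with a short sketch. It only evaluates the ratio for a few hypothetical (learning rate, batch size) pairs. It does not compute the cross-over batch size between regimes, and the function name `sgd_temperature` is an assumption made for this example.

```python
import math

def sgd_temperature(learning_rate, batch_size):
    """Return T = eta / B, the noise scale named in the abstract."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return learning_rate / batch_size

# Hypothetical settings: the first two share the same ratio eta / B.
configs = [(0.1, 32), (0.2, 64), (0.1, 256)]
temperatures = [sgd_temperature(eta, b) for eta, b in configs]

# Doubling both eta and B leaves the temperature unchanged.
same_temperature = math.isclose(temperatures[0], temperatures[1])
```

According to the abstract, settings with equal temperature are expected to behave alike only in the noise-dominated regime, that is, for sufficiently small batches.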
$O(k)$-Equivariant Dimensionality Reduction on Stiefel Manifolds
Authors: Andrew Lee, Harlin Lee, Jose A. Perea, Nikolas Schonsheck, Madeleine Weinstein
Abstract
Many real-world datasets live on high-dimensional Stiefel and Grassmannian manifolds, $V_k(\mathbb{R}^N)$ and $Gr(k, \mathbb{R}^N)$ respectively, and benefit from projection onto lower-dimensional Stiefel (respectively, Grassmannian) manifolds. In this work, we propose an algorithm called Principal Stiefel Coordinates (PSC) to reduce data dimensionality from $V_k(\mathbb{R}^N)$ to $V_k(\mathbb{R}^n)$ in an $O(k)$-equivariant manner ($k \leq n \ll N$). We begin by observing that each element $\alpha \in V_n(\mathbb{R}^N)$ defines an isometric embedding of $V_k(\mathbb{R}^n)$ into $V_k(\mathbb{R}^N)$. Next, we optimize for such an embedding map that minimizes data fit error by warm-starting with the output of principal component analysis (PCA) and applying gradient descent. Then, we define a continuous and $O(k)$-equivariant map $\pi_\alpha$ that acts as a ``closest point operator'' to project the data onto the image of $V_k(\mathbb{R}^n)$ in $V_k(\mathbb{R}^N)$ under the embedding determined by $\alpha$, while minimizing distortion. Because this dimensionality reduction is $O(k)$-equivariant, these results extend to Grassmannian manifolds as well. Lastly, we show that the PCA output globally minimizes projection error in a noiseless setting, but that our algorithm achieves a meaningfully different and improved outcome when the data does not lie exactly on the image of a linearly embedded lower-dimensional Stiefel manifold as above. Multiple numerical experiments using synthetic and real-world data are performed.
Keyword: super-resolution
Instant Photorealistic Style Transfer: A Lightweight and Adaptive Approach
Authors: Rong Liu, Enyu Zhao, Zhiyuan Liu, Andrew Wei-Wen Feng, Scott John Easley
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
In this paper, we propose an Instant Photorealistic Style Transfer (IPST) approach, designed to achieve instant photorealistic style transfer on super-resolution inputs without the need for pre-training on pair-wise datasets or imposing extra constraints. Our method utilizes a lightweight StyleNet to enable style transfer from a style image to a content image while preserving non-color information. To further enhance the style transfer process, we introduce an instance-adaptive optimization to prioritize the photorealism of outputs and accelerate the convergence of the style network, leading to a rapid training completion within seconds. Moreover, IPST is well-suited for multi-frame style transfer tasks, as it retains temporal and multi-view consistency of the multi-frame inputs such as video and Neural Radiance Field (NeRF). Experimental results demonstrate that IPST requires less GPU memory usage, offers faster multi-frame transfer speed, and generates photorealistic outputs, making it a promising solution for various photorealistic transfer applications.
AI Foundation Models for Weather and Climate: Applications, Design, and Implementation
Authors: S. Karthik Mukkavilli, Daniel Salles Civitarese, Johannes Schmude, Johannes Jakubik, Anne Jones, Nam Nguyen, Christopher Phillips, Sujit Roy, Shraddha Singh, Campbell Watson, Raghu Ganti, Hendrik Hamann, Udaysankar Nair, Rahul Ramachandran, Kommy Weldemariam
Abstract
Machine learning and deep learning methods have been widely explored in understanding the chaotic behavior of the atmosphere and furthering weather forecasting. There has been increasing interest from technology companies, government institutions, and meteorological agencies in building digital twins of the Earth. Recent approaches using transformers, physics-informed machine learning, and graph neural networks have demonstrated state-of-the-art performance on relatively narrow spatiotemporal scales and specific tasks. With the recent success of generative artificial intelligence (AI) using pre-trained transformers for language modeling and vision with prompt engineering and fine-tuning, we are now moving towards generalizable AI. In particular, we are witnessing the rise of AI foundation models that can perform competitively on multiple domain-specific downstream tasks. Despite this progress, we are still in the nascent stages of a generalizable AI model for global Earth system models, regional climate models, and mesoscale weather models. Here, we review current state-of-the-art AI approaches, primarily from transformer and operator learning literature in the context of meteorology. We provide our perspective on criteria for success towards a family of foundation models for nowcasting and forecasting weather and climate predictions. We also discuss how such models can perform competitively on downstream tasks such as downscaling (super-resolution), identifying conditions conducive to the occurrence of wildfires, and predicting consequential meteorological phenomena across various spatiotemporal scales such as hurricanes and atmospheric rivers. In particular, we examine current AI methodologies and contend they have matured enough to design and implement a weather foundation model.
Keyword: sgd
On the different regimes of Stochastic Gradient Descent
Semi-supervised Domain Adaptation in Graph Transfer Learning
Keyword: optimization
TCGF: A unified tensorized consensus graph framework for multi-view representation learning
Instant Photorealistic Style Transfer: A Lightweight and Adaptive Approach
Hyperbolic vs Euclidean Embeddings in Few-Shot Learning: Two Sides of the Same Coin
Dual Student Networks for Data-Free Model Stealing
Efficient Low-Rank GNN Defense Against Structural Attacks
Comparing an android head with its digital twin regarding the dynamic expression of emotions
A System-Level Energy-Efficient Digital Twin Framework for Runtime Control of Batch Manufacturing Processes
Generalizing Trajectory Retiming to Quadratic Objective Functions
QoS-Aware Service Prediction and Orchestration in Cloud-Network Integrated Beyond 5G
Computational Design of Wiring Layout on Tight Suits with Minimal Motion Resistance
Transferable Adversarial Attack on Image Tampering Localization
Safe Control Design through Risk-Tunable Control Barrier Functions
FRAMU: Attention-based Machine Unlearning using Federated Reinforcement Learning
Learning based 2D Irregular Shape Packing
Anti-Aliased Neural Implicit Surfaces with Encoding Level of Detail
FedWOA: A Federated Learning Model that uses the Whale Optimization Algorithm for Renewable Energy Prediction
Optimal Beamforming Structure for Rate Splitting Multiple Access
Decoupling the Curve Modeling and Pavement Regression for Lane Detection
An overview of some mathematical techniques and problems linking 3D vision to 3D printing
Unsupervised Deep Cross-Language Entity Alignment
Neural Metamaterial Networks for Nonlinear Material Design
A Novel Hybrid Algorithm for Optimized Solutions in Ocean Renewable Energy Industry: Enhancing Power Take-Off Parameters and Site Selection Procedure of Wave Energy Converters
Learning Adaptive Safety for Multi-Agent Systems
Asymptotically Optimal Belief Space Planning in Discrete Partially-Observable Domains
Learning-Initialized Trajectory Planning in Unknown Environments
Machine Learning-Driven Burrowing with a Snake-Like Robot
PanopticNeRF-360: Panoramic 3D-to-2D Label Transfer in Urban Scenes
Keyword: adam
There is no result
Keyword: gradient
Introspective Deep Metric Learning
Energy stable neural network for gradient flow equations
Dual Student Networks for Data-Free Model Stealing
Bearing and Distance Formation Control of Rigid Bodies in SE(3) with Bearing and Distance Constraints
A Hierarchy-based Analysis Approach for Blended Learning: A Case Study with Chinese Students
Transferable Adversarial Attack on Image Tampering Localization
Memory-based Controllers for Efficient Data-driven Control of Soft Robots
Learning End-to-End Channel Coding with Diffusion Models
Neural Metamaterial Networks for Nonlinear Material Design
KFC: Kinship Verification with Fair Contrastive Loss and Multi-Task Learning
On the different regimes of Stochastic Gradient Descent
$O(k)$-Equivariant Dimensionality Reduction on Stiefel Manifolds
Keyword: super-resolution
Instant Photorealistic Style Transfer: A Lightweight and Adaptive Approach
AI Foundation Models for Weather and Climate: Applications, Design, and Implementation