yyf17 opened this issue 1 year ago
CVPR 2022
Audio-Adaptive Activity Recognition Across Video Domains
Wnet: Audio-Guided Video Semantic Segmentation via Wavelet-Based Cross-Modal Denoising Networks
Finding Fallen Objects Via Asynchronous Audio-Visual Integration
Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis
Egocentric Deep Multi-Channel Audio-Visual Active Speaker Localization
Learning to Answer Questions in Dynamic Audio-Visual Scenarios
Audio-visual Generalised Zero-shot Learning with Cross-modal Attention and Language
Self-supervised object detection from audio-visual correspondence
Mix and Localize: Localizing Sound Sources from Mixtures
Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes
A Proposal-based Paradigm for Self-supervised Sound Source Localization in Videos
Sound and Visual Representation Learning with Multiple Pretraining Tasks
PoseKernelLifter: Metric Lifting of 3D Human Pose using Sound
Weakly Paired Associative Learning for Sound and Image Representations via Bimodal Associative Memory
Sound-Guided Semantic Image Manipulation
MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound
Continuous Scene Representations for Embodied AI
Interactron: Embodied Adaptive Object Detection
Simple but Effective: CLIP Embeddings for Embodied AI
Learning Embodied Object-Search Strategies from 50k Human Demonstrations
Symmetry-aware Neural Architecture for Embodied Visual Exploration
Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation
Reinforced Structured State-Evolution for Vision-Language Navigation
Online Learning of Reusable Abstract Models for Object Goal Navigation
Zero Experience Required: Plug & Play Modular Transfer Learning for Semantic Visual Navigation
Counterfactual Cycle-Consistent Learning for Instruction Following and Generation in Vision-Language Navigation
Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation
Towards real-world navigation with deep differentiable planners
ADAPT: Vision-Language Navigation with Modality-Aligned Action Prompts
Coupling Vision and Proprioception for Navigation of Legged Robots
Less is More: Generating Grounded Navigation Instructions from Landmarks
What do navigation agents learn about their environment?
HOP: History-and-Order Aware Pre-training for Vision-and-Language Navigation
EnvEdit: Environment Editing for Vision-and-Language Navigation
PONI: Potential Functions for ObjectGoal Navigation with Interaction-free Learning
Is Mapping Necessary for Realistic PointGoal Navigation?
Cross-modal Map Learning for Vision and Language Navigation
One Step at a Time: Long-Horizon Vision-and-Language Navigation with Milestones
Meta Agent Teaming Active Learning for Pose Estimation
HiVT: Hierarchical Vector Transformer for Multi-Agent Motion Prediction
IFOR: Iterative Flow Minimization for Robotic Object Rearrangement
Hire-MLP: Vision MLP via Hierarchical Rearrangement
sound-spaces project: RLR-Audio-Propagation
Audio Sensor