Meta Agent Teaming Active Learning for Pose Estimation
Gong J, Fan Z, Ke Q, et al.
What do navigation agents learn about their environment?
Dwivedi K, Roig G, Kembhavi A, et al.
HiVT: Hierarchical Vector Transformer for Multi-Agent Motion Prediction
Zhou Z, Ye L, Wang J, et al.
IFOR: Iterative Flow Minimization for Robotic Object Rearrangement
Goyal A, Mousavian A, Paxton C, et al.
Hire-MLP: Vision MLP via Hierarchical Rearrangement
Guo J, Tang Y, Han K, et al.
Continuous Scene Representations for Embodied AI
Gadre S Y, Ehsani K, Song S, et al.
Interactron: Embodied Adaptive Object Detection
Kotar K, Mottaghi R.
Simple but Effective: CLIP Embeddings for Embodied AI
Khandelwal A, Weihs L, Mottaghi R, et al.
Habitat-Web: Learning Embodied Object-Search Strategies from Human Demonstrations at Scale
Ramrakhya R, Undersander E, Batra D, et al.
Symmetry-aware Neural Architecture for Embodied Visual Exploration
Liu S, Okatani T.
CVPR, 2022. [Paper] [Code] [Website]
Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation
Hong Y, Wang Z, Wu Q, et al.
Reinforced Structured State-Evolution for Vision-Language Navigation
Chen J, Gao C, Meng E, et al.
Online Learning of Reusable Abstract Models for Object Goal Navigation
Campari T, Lamanna L, Traverso P, et al.
CVPR, 2022. [Paper] [Code] [Website]
Zero Experience Required: Plug & Play Modular Transfer Learning for Semantic Visual Navigation
Al-Halah Z, Ramakrishnan S K, Grauman K.
Counterfactual Cycle-Consistent Learning for Instruction Following and Generation in Vision-Language Navigation
Wang H, Liang W, Shen J, et al.
Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation
Chen S, Guhur P L, Tapaswi M, et al.
Towards real-world navigation with deep differentiable planners
Ishida S, Henriques J F.
ADAPT: Vision-Language Navigation with Modality-Aligned Action Prompts
Lin B, Zhu Y, Chen Z, et al.
CVPR, 2022. [Paper] [Code] [Website]
Coupling Vision and Proprioception for Navigation of Legged Robots
Fu Z, Kumar A, Agarwal A, et al.
Less is More: Generating Grounded Navigation Instructions from Landmarks
Wang S, Montgomery C, Orbay J, et al.
CVPR, 2022. [Paper] [Code] [Website]
What do navigation agents learn about their environment?
Dwivedi K, Roig G, Kembhavi A, et al.
HOP: History-and-Order Aware Pre-training for Vision-and-Language Navigation
Qiao Y, Qi Y, Hong Y, et al.
EnvEdit: Environment Editing for Vision-and-Language Navigation
Li J, Tan H, Bansal M.
PONI: Potential Functions for ObjectGoal Navigation with Interaction-free Learning
Ramakrishnan S K, Chaplot D S, Al-Halah Z, et al.
Is Mapping Necessary for Realistic PointGoal Navigation?
Partsey R, Wijmans E, Yokoyama N, et al.
Cross-modal Map Learning for Vision and Language Navigation
Georgakis G, Schmeckpeper K, Wanchoo K, et al.
One Step at a Time: Long-Horizon Vision-and-Language Navigation with Milestones
Song C H, Kil J, Pan T Y, et al.
Audio-Adaptive Activity Recognition Across Video Domains
Zhang Y, Doughty H, Shao L, et al.
Wnet: Audio-Guided Video Semantic Segmentation via Wavelet-Based Cross-Modal Denoising Networks
Pan W, Shi H, Zhao Z, et al.
Finding Fallen Objects Via Asynchronous Audio-Visual Integration
Gan C, Gu Y, Zhou S, et al.
Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis
Yang K, Marković D, Krenn S, et al.
Egocentric Deep Multi-Channel Audio-Visual Active Speaker Localization
Jiang H, Murdock C, Ithapu V K.
CVPR, 2022. [Paper] [Code] [Website]
Learning to Answer Questions in Dynamic Audio-Visual Scenarios
Li G, Wei Y, Tian Y, et al.
Audio-visual Generalised Zero-shot Learning with Cross-modal Attention and Language
Mercea O B, Riesch L, Koepke A, et al.
Self-supervised object detection from audio-visual correspondence
Afouras T, Asano Y M, Fagan F, et al.
CVPR, 2022. [Paper] [Code] [Website]
Mix and Localize: Localizing Sound Sources from Mixtures
Hu X, Chen Z, Owens A.
Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes
Song Z, Wang Y, Fan J, et al.
A Proposal-based Paradigm for Self-supervised Sound Source Localization in Videos
Xuan H, Wu Z, Yang J, et al.
CVPR, 2022. [Paper] [Code] [Website]
Sound and Visual Representation Learning with Multiple Pretraining Tasks
Vasudevan A B, Dai D, Van Gool L.
CVPR, 2022. [Paper] [Code] [Website]
PoseKernelLifter: Metric Lifting of 3D Human Pose using Sound
Yang Z, Fan X, Isler V, et al.
CVPR, 2022. [Paper] [Code] [Website]
Weakly Paired Associative Learning for Sound and Image Representations via Bimodal Associative Memory
Lee S, Kim H I, Ro Y M.
CVPR, 2022. [Paper] [Code] [Website]
Sound-Guided Semantic Image Manipulation
Lee S H, Roh W, Byeon W, et al.
MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound
Zellers R, Lu J, Lu X, et al.
CVPR 2022
Format
Paper Title
Author(s)
CVPR, 2022. [Paper] [Code] [Website]
To fill in for each entry: 1) paper title, 2) author(s), 3) the three links ([Paper] [Code] [Website]), 4) a blank line between consecutive papers.
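For reference, a minimal filled-in entry following the template above (using a paper already in this list); the three URLs are placeholders to show the markdown link syntax, not the paper's actual links:

```markdown
Simple but Effective: CLIP Embeddings for Embodied AI
Khandelwal A, Weihs L, Mottaghi R, et al.
CVPR, 2022. [Paper](https://arxiv.org/abs/xxxx.xxxxx) [Code](https://github.com/xxx/xxx) [Website](https://xxx.github.io)
```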
Agent
Meta Agent Teaming Active Learning for Pose Estimation
What do navigation agents learn about their environment?
HiVT: Hierarchical Vector Transformer for Multi-Agent Motion Prediction
Rearrangement
IFOR: Iterative Flow Minimization for Robotic Object Rearrangement
Hire-MLP: Vision MLP via Hierarchical Rearrangement
Embodied
Continuous Scene Representations for Embodied AI
Interactron: Embodied Adaptive Object Detection
Simple but Effective: CLIP Embeddings for Embodied AI
Habitat-Web: Learning Embodied Object-Search Strategies from Human Demonstrations at Scale
Symmetry-aware Neural Architecture for Embodied Visual Exploration
Navigation
Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation
Reinforced Structured State-Evolution for Vision-Language Navigation
Online Learning of Reusable Abstract Models for Object Goal Navigation
Zero Experience Required: Plug & Play Modular Transfer Learning for Semantic Visual Navigation
Counterfactual Cycle-Consistent Learning for Instruction Following and Generation in Vision-Language Navigation
Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation
Towards real-world navigation with deep differentiable planners
ADAPT: Vision-Language Navigation with Modality-Aligned Action Prompts
Coupling Vision and Proprioception for Navigation of Legged Robots
Less is More: Generating Grounded Navigation Instructions from Landmarks
What do navigation agents learn about their environment?
HOP: History-and-Order Aware Pre-training for Vision-and-Language Navigation
EnvEdit: Environment Editing for Vision-and-Language Navigation
PONI: Potential Functions for ObjectGoal Navigation with Interaction-free Learning
Is Mapping Necessary for Realistic PointGoal Navigation?
Cross-modal Map Learning for Vision and Language Navigation
One Step at a Time: Long-Horizon Vision-and-Language Navigation with Milestones
Audio
Audio-Adaptive Activity Recognition Across Video Domains
Wnet: Audio-Guided Video Semantic Segmentation via Wavelet-Based Cross-Modal Denoising Networks
Finding Fallen Objects Via Asynchronous Audio-Visual Integration
Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis
Egocentric Deep Multi-Channel Audio-Visual Active Speaker Localization
Learning to Answer Questions in Dynamic Audio-Visual Scenarios
Audio-visual Generalised Zero-shot Learning with Cross-modal Attention and Language
Self-supervised object detection from audio-visual correspondence
Sound
Mix and Localize: Localizing Sound Sources from Mixtures
Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes
A Proposal-based Paradigm for Self-supervised Sound Source Localization in Videos
Sound and Visual Representation Learning with Multiple Pretraining Tasks
PoseKernelLifter: Metric Lifting of 3D Human Pose using Sound
Weakly Paired Associative Learning for Sound and Image Representations via Bimodal Associative Memory
Sound-Guided Semantic Image Manipulation
MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound