Meta Agent Teaming Active Learning for Pose Estimation
Gong J, Fan Z, Ke Q, et al.
What do navigation agents learn about their environment?
Dwivedi K, Roig G, Kembhavi A, et al.
HiVT: Hierarchical Vector Transformer for Multi-Agent Motion Prediction
Zhou Z, Ye L, Wang J, et al.
IFOR: Iterative Flow Minimization for Robotic Object Rearrangement
Goyal A, Mousavian A, Paxton C, et al.
Hire-MLP: Vision MLP via Hierarchical Rearrangement
Guo J, Tang Y, Han K, et al.
Continuous Scene Representations for Embodied AI
Gadre S Y, Ehsani K, Song S, et al.
Interactron: Embodied Adaptive Object Detection
Kotar K, Mottaghi R.
Simple but Effective: CLIP Embeddings for Embodied AI
Khandelwal A, Weihs L, Mottaghi R, et al.
Habitat-Web: Learning Embodied Object-Search Strategies from Human Demonstrations at Scale
Ramrakhya R, Undersander E, Batra D, et al.
Symmetry-aware Neural Architecture for Embodied Visual Exploration
Liu S, Okatani T.
CVPR, 2022. [Paper] [Code] [Website]
Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation
Hong Y, Wang Z, Wu Q, et al.
Reinforced Structured State-Evolution for Vision-Language Navigation
Chen J, Gao C, Meng E, et al.
Online Learning of Reusable Abstract Models for Object Goal Navigation
Campari T, Lamanna L, Traverso P, et al.
CVPR, 2022. [Paper] [Code] [Website]
Zero Experience Required: Plug & Play Modular Transfer Learning for Semantic Visual Navigation
Al-Halah Z, Ramakrishnan S K, Grauman K.
Counterfactual Cycle-Consistent Learning for Instruction Following and Generation in Vision-Language Navigation
Wang H, Liang W, Shen J, et al.
Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation
Chen S, Guhur P L, Tapaswi M, et al.
Towards real-world navigation with deep differentiable planners
Ishida S, Henriques J F.
ADAPT: Vision-Language Navigation with Modality-Aligned Action Prompts
Lin B, Zhu Y, Chen Z, et al.
CVPR, 2022. [Paper] [Code] [Website]
Coupling Vision and Proprioception for Navigation of Legged Robots
Fu Z, Kumar A, Agarwal A, et al.
Less is More: Generating Grounded Navigation Instructions from Landmarks
Wang S, Montgomery C, Orbay J, et al.
CVPR, 2022. [Paper] [Code] [Website]
What do navigation agents learn about their environment?
Dwivedi K, Roig G, Kembhavi A, et al.
HOP: History-and-Order Aware Pre-training for Vision-and-Language Navigation
Qiao Y, Qi Y, Hong Y, et al.
EnvEdit: Environment Editing for Vision-and-Language Navigation
Li J, Tan H, Bansal M.
PONI: Potential Functions for ObjectGoal Navigation with Interaction-free Learning
Ramakrishnan S K, Chaplot D S, Al-Halah Z, et al.
Is Mapping Necessary for Realistic PointGoal Navigation?
Partsey R, Wijmans E, Yokoyama N, et al.
Cross-modal Map Learning for Vision and Language Navigation
Georgakis G, Schmeckpeper K, Wanchoo K, et al.
One Step at a Time: Long-Horizon Vision-and-Language Navigation with Milestones
Song C H, Kil J, Pan T Y, et al.
Audio-Adaptive Activity Recognition Across Video Domains
Zhang Y, Doughty H, Shao L, et al.
Wnet: Audio-Guided Video Semantic Segmentation via Wavelet-Based Cross-Modal Denoising Networks
Pan W, Shi H, Zhao Z, et al.
Finding Fallen Objects Via Asynchronous Audio-Visual Integration
Gan C, Gu Y, Zhou S, et al.
Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis
Yang K, Marković D, Krenn S, et al.
Egocentric Deep Multi-Channel Audio-Visual Active Speaker Localization
Jiang H, Murdock C, Ithapu V K.
CVPR, 2022. [Paper] [Code] [Website]
Learning to Answer Questions in Dynamic Audio-Visual Scenarios
Li G, Wei Y, Tian Y, et al.
Audio-visual Generalised Zero-shot Learning with Cross-modal Attention and Language
Mercea O B, Riesch L, Koepke A, et al.
Self-supervised object detection from audio-visual correspondence
Afouras T, Asano Y M, Fagan F, et al.
CVPR, 2022. [Paper] [Code] [Website]
Mix and Localize: Localizing Sound Sources from Mixtures
Hu X, Chen Z, Owens A.
Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes
Song Z, Wang Y, Fan J, et al.
A Proposal-based Paradigm for Self-supervised Sound Source Localization in Videos
Xuan H, Wu Z, Yang J, et al.
CVPR, 2022. [Paper] [Code] [Website]
Sound and Visual Representation Learning with Multiple Pretraining Tasks
Vasudevan A B, Dai D, Van Gool L.
CVPR, 2022. [Paper] [Code] [Website]
PoseKernelLifter: Metric Lifting of 3D Human Pose using Sound
Yang Z, Fan X, Isler V, et al.
CVPR, 2022. [Paper] [Code] [Website]
Weakly Paired Associative Learning for Sound and Image Representations via Bimodal Associative Memory
Lee S, Kim H I, Ro Y M.
CVPR, 2022. [Paper] [Code] [Website]
Sound-Guided Semantic Image Manipulation
Lee S H, Roh W, Byeon W, et al.
MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound
Zellers R, Lu J, Lu X, et al.
CVPR 2022
Format
Paper Title
Author(s)
CVPR, 2022. [Paper] [Code] [Website]
To fill in for each entry: 1) paper title, 2) author(s), 3) the three links ([Paper] [Code] [Website]), 4) a blank line between consecutive papers.
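For reference, a minimal filled-in entry following the template above (using a paper already in this list); the three URLs are placeholders to show the markdown link syntax, not the paper's actual links:

```markdown
Simple but Effective: CLIP Embeddings for Embodied AI
Khandelwal A, Weihs L, Mottaghi R, et al.
CVPR, 2022. [Paper](https://arxiv.org/abs/xxxx.xxxxx) [Code](https://github.com/xxx/xxx) [Website](https://xxx.github.io)
```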
Agent
Meta Agent Teaming Active Learning for Pose Estimation
What do navigation agents learn about their environment?
HiVT: Hierarchical Vector Transformer for Multi-Agent Motion Prediction
Rearrangement
IFOR: Iterative Flow Minimization for Robotic Object Rearrangement
Hire-MLP: Vision MLP via Hierarchical Rearrangement
Embodied
Continuous Scene Representations for Embodied AI
Interactron: Embodied Adaptive Object Detection
Simple but Effective: CLIP Embeddings for Embodied AI
Habitat-Web: Learning Embodied Object-Search Strategies from Human Demonstrations at Scale
Symmetry-aware Neural Architecture for Embodied Visual Exploration
Navigation
Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation
Reinforced Structured State-Evolution for Vision-Language Navigation
Online Learning of Reusable Abstract Models for Object Goal Navigation
Zero Experience Required: Plug & Play Modular Transfer Learning for Semantic Visual Navigation
Counterfactual Cycle-Consistent Learning for Instruction Following and Generation in Vision-Language Navigation
Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation
Towards real-world navigation with deep differentiable planners
ADAPT: Vision-Language Navigation with Modality-Aligned Action Prompts
Coupling Vision and Proprioception for Navigation of Legged Robots
Less is More: Generating Grounded Navigation Instructions from Landmarks
What do navigation agents learn about their environment?
HOP: History-and-Order Aware Pre-training for Vision-and-Language Navigation
EnvEdit: Environment Editing for Vision-and-Language Navigation
PONI: Potential Functions for ObjectGoal Navigation with Interaction-free Learning
Is Mapping Necessary for Realistic PointGoal Navigation?
Cross-modal Map Learning for Vision and Language Navigation
One Step at a Time: Long-Horizon Vision-and-Language Navigation with Milestones
Audio
Audio-Adaptive Activity Recognition Across Video Domains
Wnet: Audio-Guided Video Semantic Segmentation via Wavelet-Based Cross-Modal Denoising Networks
Finding Fallen Objects Via Asynchronous Audio-Visual Integration
Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis
Egocentric Deep Multi-Channel Audio-Visual Active Speaker Localization
Learning to Answer Questions in Dynamic Audio-Visual Scenarios
Audio-visual Generalised Zero-shot Learning with Cross-modal Attention and Language
Self-supervised object detection from audio-visual correspondence
Sound
Mix and Localize: Localizing Sound Sources from Mixtures
Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes
A Proposal-based Paradigm for Self-supervised Sound Source Localization in Videos
Sound and Visual Representation Learning with Multiple Pretraining Tasks
PoseKernelLifter: Metric Lifting of 3D Human Pose using Sound
Weakly Paired Associative Learning for Sound and Image Representations via Bimodal Associative Memory
Sound-Guided Semantic Image Manipulation
MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound