Continuous-time Radar-inertial Odometry for Automotive Radars
Abstract
We present an approach for radar-inertial odometry which uses a continuous-time framework to fuse measurements from multiple automotive radars and an inertial measurement unit (IMU). Unlike camera and LiDAR sensors, radar sensors are not significantly affected by adverse weather conditions. Radar's robustness in such conditions and the increasing prevalence of radars on passenger vehicles motivate us to examine the use of radar for ego-motion estimation. A continuous-time trajectory representation is applied not only as a framework to enable heterogeneous and asynchronous multi-sensor fusion, but also to facilitate efficient optimization, since poses and their derivatives can be computed in closed form at any given time along the trajectory. We compare our continuous-time estimates to those from a discrete-time radar-inertial odometry approach and show that our continuous-time method outperforms the discrete-time method. To the best of our knowledge, this is the first time a continuous-time framework has been applied to radar-inertial odometry.
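The abstract does not spell out the trajectory parameterization, but the property it relies on (evaluating the pose and its derivatives in closed form at any sensor timestamp) can be sketched with a uniform cubic B-spline over translation control points. Everything below, including the spline choice, knot spacing, and the query helper, is an illustrative assumption rather than the authors' formulation; rotation handling (e.g. a cumulative B-spline on SO(3)) is omitted for brevity.

```python
import numpy as np

# Uniform cubic B-spline basis matrix (one common choice for continuous-time
# trajectories; the paper's actual parameterization may differ).
M = (1.0 / 6.0) * np.array([
    [ 1.,  4.,  1., 0.],
    [-3.,  0.,  3., 0.],
    [ 3., -6.,  3., 0.],
    [-1.,  3., -3., 1.],
])

def query(ctrl_pts, t0, dt, t):
    """Closed-form position and velocity at an arbitrary time t.

    ctrl_pts: (N, 3) translation control points on a uniform knot grid
    starting at t0 with spacing dt.
    """
    s = (t - t0) / dt                        # normalized spline time
    i = int(np.floor(s))                     # index of the active segment
    u = s - i                                # local coordinate in [0, 1)
    P = ctrl_pts[i:i + 4]                    # the 4 control points of the segment
    pos = np.array([1., u, u**2, u**3]) @ M @ P
    vel = np.array([0., 1., 2 * u, 3 * u**2]) @ M @ P / dt
    return pos, vel

# Asynchronous measurements (e.g. IMU at 100 Hz, radar at a few Hz) can each be
# evaluated at their own timestamps against the same trajectory:
ctrl = np.cumsum(np.random.randn(10, 3) * 0.1, axis=0)
pos, vel = query(ctrl, t0=0.0, dt=0.1, t=0.237)
```

Because the same continuous trajectory can be queried at IMU, radar, or any other timestamps, asynchronous measurements from heterogeneous sensors can feed one optimization without interpolating the measurements themselves.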
Keyword: loop detection
There is no result
Keyword: autonomous driving
Extending One-Stage Detection with Open-World Proposals
Authors: Sachin Konan, Kevin J Liang, Li Yin
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
In many applications, such as autonomous driving, hand manipulation, or robot navigation, object detection methods must be able to detect objects unseen in the training set. Open World Detection (OWD) seeks to tackle this problem by generalizing detection performance to seen and unseen class categories. Recent works have seen success in the generation of class-agnostic proposals, which we call Open-World Proposals (OWP), but this comes at the cost of a large drop on the classification task when both tasks are considered in the detection model. These works have investigated two-stage Region Proposal Networks (RPN) by taking advantage of objectness scoring cues; however, for its simplicity, run-time, and decoupling of localization and classification, we investigate OWP through the lens of a fully convolutional one-stage detection network, such as FCOS. We show that our architectural and sampling optimizations on FCOS can increase OWP performance by as much as 6% in recall on novel classes, marking the first proposal-free one-stage detection network to achieve performance comparable to RPN-based two-stage networks. Furthermore, we show that the inherent, decoupled architecture of FCOS has benefits for retaining classification performance. While two-stage methods worsen by 6% in recall on novel classes, we show that FCOS drops by only 2% when jointly optimizing for OWP and classification.
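The abstract credits FCOS's decoupled design for retaining classification accuracy while learning class-agnostic proposals. As a rough illustration of that decoupling, here is a minimal PyTorch sketch of an FCOS-style head with separate classification and localization towers plus a class-agnostic objectness output; the layer counts, channel width, and the placement of the objectness branch are assumptions for illustration, not the paper's architecture or its sampling optimizations.

```python
import torch
import torch.nn as nn

class DecoupledOneStageHead(nn.Module):
    """FCOS-style head: separate towers for classification and localization.

    The class-agnostic objectness branch (a stand-in for open-world proposal
    scoring) rides on the localization tower, so classification gradients never
    touch it -- a rough illustration of the decoupling argued for in the
    abstract, not the paper's exact design.
    """
    def __init__(self, in_ch=256, num_classes=80, num_convs=4):
        super().__init__()
        def tower():
            layers = []
            for _ in range(num_convs):
                layers += [nn.Conv2d(in_ch, in_ch, 3, padding=1),
                           nn.GroupNorm(32, in_ch),
                           nn.ReLU(inplace=True)]
            return nn.Sequential(*layers)
        self.cls_tower = tower()
        self.loc_tower = tower()
        self.cls_logits = nn.Conv2d(in_ch, num_classes, 3, padding=1)
        self.bbox_pred = nn.Conv2d(in_ch, 4, 3, padding=1)   # l, t, r, b distances
        self.objectness = nn.Conv2d(in_ch, 1, 3, padding=1)  # class-agnostic score

    def forward(self, feat):
        cls_feat = self.cls_tower(feat)
        loc_feat = self.loc_tower(feat)
        return (self.cls_logits(cls_feat),
                self.bbox_pred(loc_feat).relu(),
                self.objectness(loc_feat))

# One FPN level of features:
logits, boxes, obj = DecoupledOneStageHead()(torch.randn(2, 256, 50, 68))
```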
Keyword: mapping
Embodied Hands: Modeling and Capturing Hands and Bodies Together
Authors: Javier Romero, Dimitrios Tzionas, Michael J. Black
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
Abstract
Humans move their hands and bodies together to communicate and solve tasks. Capturing and replicating such coordinated activity is critical for virtual characters that behave realistically. Surprisingly, most methods treat the 3D modeling and tracking of bodies and hands separately. Here we formulate a model of hands and bodies interacting together and fit it to full-body 4D sequences. When scanning or capturing the full body in 3D, hands are small and often partially occluded, making their shape and pose hard to recover. To cope with low resolution, occlusion, and noise, we develop a new model called MANO (hand Model with Articulated and Non-rigid defOrmations). MANO is learned from around 1000 high-resolution 3D scans of hands of 31 subjects in a wide variety of hand poses. The model is realistic, low-dimensional, captures non-rigid shape changes with pose, is compatible with standard graphics packages, and can fit any human hand. MANO provides a compact mapping from hand poses to pose blend shape corrections and a linear manifold of pose synergies. We attach MANO to a standard parameterized 3D body shape model (SMPL), resulting in a fully articulated body and hand model (SMPL+H). We illustrate SMPL+H by fitting complex, natural activities of subjects captured with a 4D scanner. The fitting is fully automatic and results in full-body models that move naturally with detailed hand motions and a realism not seen before in full-body performance capture. The models and data are freely available for research purposes on our website (this http URL).
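MANO is described here only at a high level, but the kind of linear blend-shape formulation used by the SMPL family (template vertices plus identity-dependent and pose-dependent offsets) can be sketched in a few lines. The array sizes and the pose-feature choice below are illustrative assumptions, not the released MANO parameterization, and linear blend skinning of the corrected vertices onto the skeleton is omitted.

```python
import numpy as np

# Illustrative sizes only; the real MANO/SMPL+H dimensions may differ.
N_VERTS, N_SHAPE, N_POSE_FEAT = 778, 10, 135

rng = np.random.default_rng(0)
v_template = rng.standard_normal((N_VERTS, 3))               # mean hand mesh
shape_dirs = rng.standard_normal((N_VERTS, 3, N_SHAPE))      # shape blend shapes
pose_dirs = rng.standard_normal((N_VERTS, 3, N_POSE_FEAT))   # pose correctives

def blend_shapes(betas, pose_feat):
    """Linear blend-shape model: template + shape offsets + pose correctives.

    betas:     (N_SHAPE,) identity coefficients
    pose_feat: (N_POSE_FEAT,) pose features, e.g. the flattened (R(theta) - I)
               of the joint rotations, as in SMPL-family models (assumed here).
    """
    return (v_template
            + np.einsum('vds,s->vd', shape_dirs, betas)
            + np.einsum('vdp,p->vd', pose_dirs, pose_feat))

verts = blend_shapes(rng.standard_normal(N_SHAPE), rng.standard_normal(N_POSE_FEAT))
```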
Keyword: localization
Extending One-Stage Detection with Open-World Proposals
Authors: Sachin Konan, Kevin J Liang, Li Yin
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
In many applications, such as autonomous driving, hand manipulation, or robot navigation, object detection methods must be able to detect objects unseen in the training set. Open World Detection (OWD) seeks to tackle this problem by generalizing detection performance to seen and unseen class categories. Recent works have seen success in the generation of class-agnostic proposals, which we call Open-World Proposals (OWP), but this comes at the cost of a large drop on the classification task when both tasks are considered in the detection model. These works have investigated two-stage Region Proposal Networks (RPN) by taking advantage of objectness scoring cues; however, for its simplicity, run-time, and decoupling of localization and classification, we investigate OWP through the lens of a fully convolutional one-stage detection network, such as FCOS. We show that our architectural and sampling optimizations on FCOS can increase OWP performance by as much as 6% in recall on novel classes, marking the first proposal-free one-stage detection network to achieve performance comparable to RPN-based two-stage networks. Furthermore, we show that the inherent, decoupled architecture of FCOS has benefits for retaining classification performance. While two-stage methods worsen by 6% in recall on novel classes, we show that FCOS drops by only 2% when jointly optimizing for OWP and classification.
Learning Target-aware Representation for Visual Tracking via Informative Interactions
Abstract
We introduce a novel backbone architecture to improve the target-perception ability of feature representations for tracking. Specifically, we observe that de facto frameworks perform feature matching simply using the backbone outputs for target localization, so there is no direct feedback from the matching module to the backbone network, especially its shallow layers. More concretely, only the matching module can directly access the target information (in the reference frame), while the representation learning of the candidate frame is blind to the reference target. As a consequence, accumulated target-irrelevant interference in the shallow stages may degrade the feature quality of deeper layers. In this paper, we approach the problem from a different angle by conducting multiple branch-wise interactions inside the Siamese-like backbone networks (InBN). At the core of InBN is a general interaction modeler (GIM) that injects the prior knowledge of the reference image into different stages of the backbone network, leading to better target perception and more robust distractor resistance in the candidate feature representation, at negligible computational cost. The proposed GIM module and InBN mechanism are general and applicable to different backbone types, including CNNs and Transformers, as evidenced by our extensive experiments on multiple benchmarks. In particular, the CNN version (based on SiamCAR) improves the baseline by 3.2/6.9 absolute points of SUC on LaSOT/TNL2K, respectively. The Transformer version obtains SUC scores of 65.7/52.0 on LaSOT/TNL2K, on par with the recent state of the art. Code and models will be released.
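The abstract does not detail how GIM injects the reference prior into intermediate backbone stages. One plausible reading is a lightweight cross-attention from candidate-frame features to reference-frame features with a residual connection; the PyTorch module below is a hedged sketch under that assumption, not the paper's actual GIM design, and the module name and shapes are made up for illustration.

```python
import torch
import torch.nn as nn

class InteractionModule(nn.Module):
    """Injects reference-frame information into a candidate feature map.

    Candidate features attend to reference features via multi-head
    cross-attention; the residual connection keeps the backbone behaviour
    close to the original when the interaction is uninformative. This is an
    illustrative stand-in for the paper's GIM, not its actual design.
    """
    def __init__(self, channels=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, cand_feat, ref_feat):
        # cand_feat, ref_feat: (B, C, H, W) maps from some intermediate stage
        b, c, h, w = cand_feat.shape
        q = cand_feat.flatten(2).transpose(1, 2)    # (B, H*W, C)
        kv = ref_feat.flatten(2).transpose(1, 2)    # (B, H'*W', C)
        out, _ = self.attn(self.norm(q), kv, kv)
        return (q + out).transpose(1, 2).reshape(b, c, h, w)

# Inserted after a backbone stage shared by both frames:
gim = InteractionModule()
cand = torch.randn(2, 256, 32, 32)   # candidate (search) frame features
ref = torch.randn(2, 256, 16, 16)    # reference (template) frame features
fused = gim(cand, ref)
```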
Keyword: SLAM
There is no result
Keyword: Visual inertial
There is no result
Keyword: livox
There is no result
Keyword: loam
There is no result
Keyword: Visual inertial odometry
There is no result
Keyword: lidar
Continuous-time Radar-inertial Odometry for Automotive Radars