Abstract
Simultaneous localization and mapping (SLAM) is a critical capability for any autonomous underwater vehicle (AUV). However, robust, accurate state estimation is still a work in progress when using low-cost sensors. We propose enhancing a typical low-cost sensor package using widely available and often free prior information: overhead imagery. Given an AUV's sonar image and a partially overlapping, globally-referenced overhead image, we propose using a convolutional neural network (CNN) to generate a synthetic overhead image predicting the above-surface appearance of the sonar image contents. We then use this synthetic overhead image to register our observations to the provided global overhead image. Once registered, the transformation is introduced as a factor into a pose SLAM factor graph. We use a state-of-the-art simulation environment to perform validation over a series of benchmark trajectories and quantitatively show the improved accuracy of robot state estimation using the proposed approach. We also show qualitative outcomes from a real AUV field deployment. Video attachment: https://youtu.be/_uWljtp58ks
Keyword: Visual inertial
There is no result
Keyword: livox
There is no result
Keyword: loam
There is no result
Keyword: Visual inertial odometry
There is no result
Keyword: lidar
Cyclops: Open Platform for Scale Truck Platooning
Authors: Hyeongyu Lee, Jaegeun Park, Changjin Koo, Jong-Chan Kim, Yongsoon Eun
Abstract
Cyclops, introduced in this paper, is an open research platform for anyone who wants to validate novel ideas and approaches in the area of self-driving heavy-duty vehicle platooning. The platform consists of multiple 1/14-scale semi-trailer trucks, a scale proving ground, and associated computing, communication, and control modules that enable self-driving on the proving ground. The perception system of each vehicle comprises a lidar-based object tracking system and a lane detection/control system. The former maintains the gap to the leading vehicle, while the latter keeps the vehicle within the lane through steering control. The lane detection system is optimized for truck platooning, where the field of view of the front-facing camera is severely limited by the small gap to the leading vehicle. The platform is particularly well suited to validating mitigation strategies for safety-critical situations. Indeed, a simplex structure is adopted in the embedded module for testing various fail-safe operations. We illustrate a scenario in which the camera sensor fails in the perception system, yet the vehicle operates at reduced capacity and comes to a graceful stop. Details of Cyclops, including 3D CAD designs and algorithm source code, are released for those who want to build similar testbeds.
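The simplex structure mentioned above pairs a high-performance controller with a simple safety controller and a decision module that switches to the latter when a fault is detected. The following is a minimal sketch of that switching logic, with hypothetical gains, sensor flags, and controllers; it is not the released Cyclops code.

```python
from dataclasses import dataclass

@dataclass
class SensorStatus:
    camera_ok: bool
    lidar_ok: bool

def high_performance_control(lane_offset: float, gap_error: float) -> tuple[float, float]:
    """Nominal controller: lane keeping via camera + gap keeping via lidar."""
    steering = -0.5 * lane_offset          # hypothetical proportional gains
    throttle = 0.3 * gap_error
    return steering, throttle

def safety_control(speed: float) -> tuple[float, float]:
    """Fallback controller: hold steering, decelerate to a graceful stop."""
    steering = 0.0
    throttle = -0.2 if speed > 0.05 else 0.0
    return steering, throttle

def simplex_step(status: SensorStatus, lane_offset: float,
                 gap_error: float, speed: float) -> tuple[float, float]:
    # Decision module: fall back whenever camera-based perception fails.
    if status.camera_ok and status.lidar_ok:
        return high_performance_control(lane_offset, gap_error)
    return safety_control(speed)

# Example: a camera failure triggers the reduced-capacity graceful stop.
print(simplex_step(SensorStatus(camera_ok=False, lidar_ok=True), 0.1, 0.5, 1.2))
```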
Keyword: loop detection
There is no result
Keyword: autonomous driving
There is no result
Keyword: mapping
Unsupervised HDR Imaging: What Can Be Learned from a Single 8-bit Video?
Authors: Francesco Banterle, Demetris Marnerides, Kurt Debattista, Thomas Bashford-Rogers
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
Deep learning-based methods for inverse tone mapping of standard dynamic range (SDR) images to obtain high dynamic range (HDR) images have recently become very popular. These methods manage to fill over-exposed areas convincingly in terms of both detail and dynamic range. To be effective, however, they typically need to learn from large datasets and transfer this knowledge to the network weights. In this work, we tackle the problem from a completely different perspective: what can we learn from a single SDR video? With the presented zero-shot approach, we show that, in many cases, a single SDR video is sufficient to generate an HDR video of quality equal to or better than that of other state-of-the-art methods.
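For context, inverse tone mapping expands a gamma-encoded 8-bit SDR frame into a floating-point HDR estimate. The sketch below is only a naive, non-learned baseline illustrating the operation; the gamma value and expansion curve are arbitrary choices and are unrelated to the paper's zero-shot method.

```python
import numpy as np

def naive_inverse_tone_map(sdr_u8: np.ndarray, gamma: float = 2.2,
                           expansion: float = 2.0) -> np.ndarray:
    """Expand an 8-bit SDR frame into a floating-point HDR estimate.

    Classical baseline only: linearize the gamma-encoded input, then stretch
    bright regions with a power curve so highlights can exceed 1.0.
    """
    linear = (sdr_u8.astype(np.float32) / 255.0) ** gamma
    return linear * (1.0 + linear ** expansion)

frame = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)  # stand-in frame
hdr = naive_inverse_tone_map(frame)
print(hdr.max())
```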
Video-driven Neural Physically-based Facial Asset for Production
Abstract
Production-level workflows for producing convincing 3D dynamic human faces have long relied on an assortment of labor-intensive tools for geometry and texture generation, motion capture and rigging, and expression synthesis. Recent neural approaches automate individual components, but the corresponding latent representations cannot provide artists with explicit controls as conventional tools do. In this paper, we present a new learning-based, video-driven approach for generating dynamic facial geometries with high-quality physically-based assets. Two key components are well-structured latent spaces, obtained through dense temporal sampling from videos, and explicit facial expression controls that regulate those latent spaces. For data collection, we construct a hybrid multiview-photometric capture stage coupled with an ultra-fast video camera to obtain raw 3D facial assets. We then model facial expression, geometry, and physically-based textures using separate VAEs with a global MLP-based expression mapping across the latent spaces, preserving the characteristics of each attribute while maintaining explicit controls over geometry and texture. We also model the delta information as wrinkle maps for the physically-based textures, achieving high-quality rendering of dynamic textures. We demonstrate our approach in high-fidelity performer-specific facial capture and cross-identity facial motion retargeting. In addition, our neural asset, along with fast adaptation schemes, can be deployed to handle in-the-wild videos. Furthermore, we demonstrate the utility of our explicit facial disentanglement strategy through promising physically-based editing results, such as geometry and material editing and wrinkle transfer, with high realism. Comprehensive experiments show that our technique provides higher accuracy and visual fidelity than previous video-driven facial reconstruction and animation methods.
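A rough PyTorch sketch of the described layout, separate per-attribute VAEs linked by a global MLP-based expression mapping, is given below. The layer sizes, attribute dimensions, and wiring are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    """Per-attribute VAE (geometry, texture, ...), heavily simplified."""
    def __init__(self, in_dim: int, latent_dim: int):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU())
        self.to_mu = nn.Linear(128, latent_dim)
        self.to_logvar = nn.Linear(128, latent_dim)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                     nn.Linear(128, in_dim))

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.decoder(z), mu, logvar

class ExpressionMapping(nn.Module):
    """Global MLP mapping an expression code into each attribute's latent space."""
    def __init__(self, expr_dim: int, latent_dim: int, n_attributes: int = 2):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(expr_dim, 256), nn.ReLU(),
                                 nn.Linear(256, latent_dim * n_attributes))
        self.latent_dim = latent_dim

    def forward(self, expr_code):
        out = self.mlp(expr_code)
        return out.split(self.latent_dim, dim=-1)  # one latent per attribute

geometry_vae = TinyVAE(in_dim=3000, latent_dim=64)   # e.g. flattened vertex offsets
texture_vae = TinyVAE(in_dim=4096, latent_dim=64)    # e.g. downsampled texture map
expr_map = ExpressionMapping(expr_dim=32, latent_dim=64)

expr = torch.randn(1, 32)                  # explicit expression control code
z_geo, z_tex = expr_map(expr)
geometry = geometry_vae.decoder(z_geo)     # decode each attribute from its latent
texture = texture_vae.decoder(z_tex)
print(geometry.shape, texture.shape)
```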
A Multi-Domain VNE Algorithm based on Load Balancing in the IoT networks
Abstract
Virtual network embedding is one of the key problems of network virtualization. Since virtual network mapping is an NP-hard problem, much research has focused on the genetic algorithm, a classic evolutionary algorithm. However, parameter settings in traditional methods depend heavily on experience, and their low flexibility prevents them from adapting to increasingly complex network environments. In addition, link-mapping strategies that do not consider load balancing can easily cause link blocking in high-traffic environments. In IoT environments involving medical, disaster relief, life support, and other equipment, network performance and stability are particularly important. Providing a more flexible virtual network mapping service in heterogeneous, high-traffic network environments is therefore an urgent problem. To address it, we propose a virtual network mapping strategy based on a hybrid genetic algorithm. The strategy uses a dynamically calculated crossover probability and a pheromone-based mutation gene selection strategy to improve the flexibility of the algorithm. In addition, a weight update mechanism based on load balancing is introduced to reduce the probability of mapping failure while balancing the load. Simulation results show that the proposed method performs well on a number of performance metrics, including average mapping quotation, link load balancing, mapping cost-benefit ratio, acceptance rate, and running time.
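A compact sketch of two of the described ingredients, a dynamically calculated crossover probability and pheromone-guided mutation, is shown below. The fitness-dependent probability formula and the roulette-style pheromone selection are placeholder assumptions rather than the paper's exact definitions.

```python
import random

def dynamic_crossover_prob(fitness: float, avg_fitness: float, max_fitness: float,
                           p_low: float = 0.5, p_high: float = 0.9) -> float:
    """Higher-fitness parents get a lower crossover probability (placeholder formula)."""
    if max_fitness == avg_fitness or fitness < avg_fitness:
        return p_high
    return p_high - (p_high - p_low) * (fitness - avg_fitness) / (max_fitness - avg_fitness)

def pheromone_mutate(chromosome: list[int], pheromone: list[list[float]],
                     p_mut: float = 0.1) -> list[int]:
    """Mutate genes toward candidate substrate nodes with higher pheromone values."""
    mutated = chromosome[:]
    for i in range(len(mutated)):
        if random.random() < p_mut:
            weights = pheromone[i]
            total = sum(weights)
            r, acc = random.random() * total, 0.0
            for node, w in enumerate(weights):   # roulette-wheel selection
                acc += w
                if acc >= r:
                    mutated[i] = node
                    break
    return mutated

# Toy usage: 3 virtual nodes, each with 4 candidate substrate nodes.
pheromone = [[1.0, 2.0, 0.5, 1.5] for _ in range(3)]
parent = [0, 1, 2]
print(dynamic_crossover_prob(fitness=0.8, avg_fitness=0.6, max_fitness=1.0))
print(pheromone_mutate(parent, pheromone))
```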
Overhead Image Factors for Underwater Sonar-based SLAM
Authors: John McConnell, Fanfei Chen, Brendan Englot
Abstract
Simultaneous localization and mapping (SLAM) is a critical capability for any autonomous underwater vehicle (AUV). However, robust, accurate state estimation is still a work in progress when using low-cost sensors. We propose enhancing a typical low-cost sensor package using widely available and often free prior information: overhead imagery. Given an AUV's sonar image and a partially overlapping, globally-referenced overhead image, we propose using a convolutional neural network (CNN) to generate a synthetic overhead image predicting the above-surface appearance of the sonar image contents. We then use this synthetic overhead image to register our observations to the provided global overhead image. Once registered, the transformation is introduced as a factor into a pose SLAM factor graph. We use a state-of-the-art simulation environment to perform validation over a series of benchmark trajectories and quantitatively show the improved accuracy of robot state estimation using the proposed approach. We also show qualitative outcomes from a real AUV field deployment. Video attachment: https://youtu.be/_uWljtp58ks
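A minimal sketch of how a registration-derived, globally-referenced factor might be combined with odometry in a pose SLAM factor graph is shown below, using the GTSAM Python bindings. The 2D pose representation, noise values, and the use of a unary prior factor for the registered pose are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
import gtsam

# Pose SLAM graph: odometry between-factors plus a globally-referenced factor
# obtained by registering the synthetic overhead image (predicted from sonar)
# against the geo-referenced overhead image.
graph = gtsam.NonlinearFactorGraph()
initial = gtsam.Values()

odom_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.05, 0.05, 0.02]))
reg_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.5, 0.5, 0.1]))

# Odometry chain (placeholder values).
initial.insert(0, gtsam.Pose2(0.0, 0.0, 0.0))
initial.insert(1, gtsam.Pose2(1.0, 0.0, 0.0))
graph.add(gtsam.PriorFactorPose2(0, gtsam.Pose2(0.0, 0.0, 0.0), odom_noise))
graph.add(gtsam.BetweenFactorPose2(0, 1, gtsam.Pose2(1.0, 0.0, 0.0), odom_noise))

# Suppose image registration yields the vehicle's pose in the global
# (overhead-image) frame at step 1; add it as a unary prior factor.
registered_pose = gtsam.Pose2(1.1, -0.05, 0.02)  # hypothetical registration result
graph.add(gtsam.PriorFactorPose2(1, registered_pose, reg_noise))

result = gtsam.LevenbergMarquardtOptimizer(graph, initial).optimize()
print(result.atPose2(1))
```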
SafePicking: Learning Safe Object Extraction via Object-Level Mapping
Authors: Kentaro Wada, Stephen James, Andrew J. Davison
Abstract
Robots need object-level scene understanding to manipulate objects while reasoning about contact, support, and occlusion among objects. Given a pile of objects, object recognition and reconstruction can identify the boundaries of object instances, giving important cues as to how the objects form and support the pile. In this work, we present a system, SafePicking, that integrates object-level mapping and learning-based motion planning to generate a motion that safely extracts occluded target objects from a pile. Planning is done by learning a deep Q-network that receives observations of predicted poses and a depth-based heightmap and outputs a motion trajectory, trained to maximize a safety-metric reward. Our results show that fusing pose observations with depth sensing gives the model both better performance and greater robustness. We evaluate our methods using the YCB objects in both simulation and the real world, achieving safe object extraction from piles.
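A minimal PyTorch sketch of a Q-network that fuses a depth-based heightmap with predicted object poses is shown below. The tensor shapes, action space size, and fusion scheme are illustrative assumptions rather than the SafePicking model.

```python
import torch
import torch.nn as nn

class FusionQNet(nn.Module):
    """Q-network fusing a depth heightmap (CNN) with predicted object poses (MLP)."""
    def __init__(self, n_actions: int = 12, n_objects: int = 8):
        super().__init__()
        self.heightmap_enc = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.pose_enc = nn.Sequential(nn.Linear(n_objects * 7, 64), nn.ReLU())
        self.q_head = nn.Sequential(nn.Linear(32 + 64, 128), nn.ReLU(),
                                    nn.Linear(128, n_actions))

    def forward(self, heightmap, poses):
        h = self.heightmap_enc(heightmap)          # (B, 32)
        p = self.pose_enc(poses.flatten(1))        # (B, 64)
        return self.q_head(torch.cat([h, p], dim=1))

net = FusionQNet()
heightmap = torch.randn(2, 1, 64, 64)              # depth-based heightmap
poses = torch.randn(2, 8, 7)                       # xyz + quaternion per object
q_values = net(heightmap, poses)
print(q_values.shape)                              # (2, n_actions)
```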
Keyword: localization
Towards Weakly-Supervised Text Spotting using a Multi-Task Transformer
Authors: Yair Kittenplon, Inbal Lavi, Sharon Fogel, Yarin Bar, R. Manmatha, Pietro Perona
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract
Text spotting end-to-end methods have recently gained attention in the literature due to the benefits of jointly optimizing the text detection and recognition components. Existing methods usually have a distinct separation between the detection and recognition branches, requiring exact annotations for the two tasks. We introduce TextTranSpotter (TTS), a transformer-based approach for text spotting and the first text spotting framework that can be trained in both fully- and weakly-supervised settings. By learning a single latent representation per word detection, and using a novel loss function based on the Hungarian loss, our method alleviates the need for expensive localization annotations. Trained with only text transcription annotations on real data, our weakly-supervised method achieves competitive performance with previous state-of-the-art fully-supervised methods. When trained in a fully-supervised manner, TextTranSpotter shows state-of-the-art results on multiple benchmarks. Our code will be publicly available upon publication.
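The weak supervision rests on a Hungarian-style bipartite matching between predicted word slots and ground-truth transcriptions. The sketch below shows the generic assignment step with SciPy; the cost terms are placeholder assumptions, not the paper's loss.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_predictions(pred_scores: np.ndarray, text_costs: np.ndarray):
    """Hungarian matching of predicted word queries to ground-truth transcriptions.

    pred_scores: (num_queries,) confidence of each predicted word slot.
    text_costs:  (num_queries, num_gt) cost of assigning each prediction to each
                 ground-truth transcription (e.g., a recognition loss); placeholder here.
    """
    # Total cost: favor confident predictions and low transcription cost.
    cost = text_costs - pred_scores[:, None]
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows.tolist(), cols.tolist()))

scores = np.array([0.9, 0.1, 0.7])
costs = np.array([[0.2, 1.0],
                  [0.8, 0.3],
                  [1.0, 0.1]])
print(match_predictions(scores, costs))  # [(0, 0), (2, 1)]
```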
A Quick Repair Facility for Debugging
Abstract
Modern development environments provide a widely used auto-correction facility for quickly repairing syntactic errors. Auto-correction cannot deal with semantic errors, which are much more difficult to repair. Automated program repair techniques, designed for repairing semantic errors, are not well suited for interactive use while debugging, as they typically assume the existence of a high-quality test suite and take considerable time. To bridge the gap, we developed ROSE, a tool to suggest quick-yet-effective repairs of semantic errors during debugging. ROSE does not rely on a test suite. Instead, it assumes a debugger stopping point where a problem is observed. It asks the developer to quickly describe what is wrong, performs a lightweight fault localization to identify potentially responsible locations, and uses a generate-and-validate strategy to produce and validate repairs. Finally, it presents the results so the developer can choose and make the appropriate repair. To assess its utility, we implemented a prototype of ROSE that works in the Eclipse IDE and applied it to two benchmarks, QuixBugs and Defects4J, for repair. ROSE was able to suggest correct repairs for 17 QuixBugs and 16 Defects4J errors in seconds.
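The generate-and-validate loop described above can be outlined generically: try candidate edits at the locations ranked by fault localization and keep those that make the observed symptom disappear. The sketch below is a hypothetical illustration of that loop, not ROSE's implementation; all callbacks are assumed interfaces.

```python
from typing import Callable, Iterable

def generate_and_validate(suspicious_lines: Iterable[int],
                          mutate: Callable[[int], list[str]],
                          apply_patch: Callable[[int, str], None],
                          revert_patch: Callable[[], None],
                          problem_observed: Callable[[], bool]) -> list[tuple[int, str]]:
    """Try candidate edits at suspicious locations; keep those that fix the symptom."""
    accepted = []
    for line in suspicious_lines:             # ranked by lightweight fault localization
        for candidate in mutate(line):        # e.g., operator or operand replacements
            apply_patch(line, candidate)
            if not problem_observed():        # re-run to the stopping point and re-check
                accepted.append((line, candidate))
            revert_patch()
    return accepted
```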
Tiny Object Tracking: A Large-scale Dataset and A Baseline
Authors: Yabin Zhu, Chenglong Li, Yao Liu, Xiao Wang, Jin Tang, Bin Luo, Zhixiang Huang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Tiny objects, which frequently appear in practical applications, have weak appearance and features, and are receiving increasing interest in many vision tasks, such as object detection and segmentation. To promote research and development in tiny object tracking, we create a large-scale video dataset, which contains 434 sequences with a total of more than 217K frames. Each frame is carefully annotated with a high-quality bounding box. In data creation, we take 12 challenge attributes into account to cover a broad range of viewpoints and scene complexities, and annotate these attributes to facilitate attribute-based performance analysis. To provide a strong baseline in tiny object tracking, we propose a novel Multilevel Knowledge Distillation Network (MKDNet), which performs three levels of knowledge distillation in a unified framework to effectively enhance feature representation, discrimination, and localization abilities in tracking tiny objects. Extensive experiments are performed on the proposed dataset, and the results prove the superiority and effectiveness of MKDNet compared with state-of-the-art methods. The dataset, the algorithm code, and the evaluation code are available at https://github.com/mmic-lcl/Datasets-and-benchmark-code.
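Multilevel knowledge distillation typically combines feature-level and prediction-level alignment between a teacher and a student. The PyTorch sketch below shows that generic pattern; the specific levels and weights are illustrative assumptions, not MKDNet's exact losses.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_feat, teacher_feat,
                      student_logits, teacher_logits,
                      temperature: float = 2.0,
                      w_feat: float = 1.0, w_pred: float = 1.0) -> torch.Tensor:
    """Feature-level (L2) plus prediction-level (KL) distillation terms."""
    feat_loss = F.mse_loss(student_feat, teacher_feat.detach())
    pred_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits.detach() / temperature, dim=-1),
        reduction="batchmean") * temperature ** 2
    return w_feat * feat_loss + w_pred * pred_loss

s_feat, t_feat = torch.randn(4, 256), torch.randn(4, 256)
s_logits, t_logits = torch.randn(4, 10), torch.randn(4, 10)
print(distillation_loss(s_feat, t_feat, s_logits, t_logits))
```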
Patch-NetVLAD+: Learned patch descriptor and weighted matching strategy for place recognition
Abstract
Visual Place Recognition (VPR) in areas with similar scenes, such as urban or indoor scenarios, is a major challenge. Existing VPR methods using global descriptors have difficulty capturing local specific regions (LSR) in the scene and are therefore prone to localization confusion in such scenarios. As a result, finding the LSR that are critical for location recognition becomes key. To address this challenge, we introduce Patch-NetVLAD+, which was inspired by patch-based VPR research. Our method proposes a fine-tuning strategy with a triplet loss to make NetVLAD suitable for extracting patch-level descriptors. Moreover, unlike existing methods that treat all patches in an image equally, our method extracts patches of LSR, which appear less frequently throughout the dataset, and makes them play an important role in VPR by assigning proper weights to them. Experiments on the Pittsburgh30k and Tokyo247 datasets show that our approach achieves up to a 6.35% performance improvement over existing patch-based methods.
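Fine-tuning a patch descriptor with a triplet loss and weighting rare patches more heavily can be sketched as follows in PyTorch. The stand-in encoder, margin, and frequency-based weighting are illustrative assumptions rather than the Patch-NetVLAD+ formulation.

```python
import torch
import torch.nn as nn

embed = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))  # stand-in patch encoder
triplet = nn.TripletMarginLoss(margin=0.3)
optimizer = torch.optim.Adam(embed.parameters(), lr=1e-4)

anchor = torch.randn(8, 3, 32, 32)     # patches from the query image
positive = torch.randn(8, 3, 32, 32)   # matching patches from the same place
negative = torch.randn(8, 3, 32, 32)   # patches from a different place

optimizer.zero_grad()
loss = triplet(embed(anchor), embed(positive), embed(negative))
loss.backward()
optimizer.step()

# Weighted matching: rarer patches (lower dataset frequency) get larger weights.
frequencies = torch.tensor([0.9, 0.1, 0.4, 0.05, 0.7, 0.3, 0.2, 0.8])
weights = 1.0 / (frequencies + 1e-6)
similarity = (embed(anchor) * embed(positive)).sum(dim=1)   # per-patch similarity
weighted_score = (weights * similarity).sum() / weights.sum()
print(weighted_score.item())
```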
Overhead Image Factors for Underwater Sonar-based SLAM
Authors: John McConnell, Fanfei Chen, Brendan Englot
Abstract
Simultaneous localization and mapping (SLAM) is a critical capability for any autonomous underwater vehicle (AUV). However, robust, accurate state estimation is still a work in progress when using low-cost sensors. We propose enhancing a typical low-cost sensor package using widely available and often free prior information: overhead imagery. Given an AUV's sonar image and a partially overlapping, globally-referenced overhead image, we propose using a convolutional neural network (CNN) to generate a synthetic overhead image predicting the above-surface appearance of the sonar image contents. We then use this synthetic overhead image to register our observations to the provided global overhead image. Once registered, the transformation is introduced as a factor into a pose SLAM factor graph. We use a state-of-the-art simulation environment to perform validation over a series of benchmark trajectories and quantitatively show the improved accuracy of robot state estimation using the proposed approach. We also show qualitative outcomes from a real AUV field deployment. Video attachment: https://youtu.be/_uWljtp58ks
Keyword: SLAM
Overhead Image Factors for Underwater Sonar-based SLAM
Keyword: Visual inertial
There is no result
Keyword: livox
There is no result
Keyword: loam
There is no result
Keyword: Visual inertial odometry
There is no result
Keyword: lidar
Cyclops: Open Platform for Scale Truck Platooning
Keyword: loop detection
There is no result
Keyword: autonomous driving
There is no result
Keyword: mapping
Unsupervised HDR Imaging: What Can Be Learned from a Single 8-bit Video?
Video-driven Neural Physically-based Facial Asset for Production
A Multi-Domain VNE Algorithm based on Load Balancing in the IoT networks
Overhead Image Factors for Underwater Sonar-based SLAM
SafePicking: Learning Safe Object Extraction via Object-Level Mapping
Keyword: localization
Towards Weakly-Supervised Text Spotting using a Multi-Task Transformer
A Quick Repair Facility for Debugging
Tiny Object Tracking: A Large-scale Dataset and A Baseline
Patch-NetVLAD+: Learned patch descriptor and weighted matching strategy for place recognition
Overhead Image Factors for Underwater Sonar-based SLAM