Abstract
Dynamic occupancy grid maps are a common approach for modeling the environment of an autonomous vehicle: the surroundings are divided into cells, each containing the occupancy and velocity state of its location. Despite the advantage of modeling arbitrarily shaped objects, the algorithms in use rely on hand-designed inverse sensor models, and semantic information is missing. We therefore introduce a multi-task recurrent neural network that predicts grid maps providing occupancies, velocity estimates, semantic information, and the drivable area. Our network architecture, a combination of convolutional and recurrent layers, processes sequences of raw lidar data represented as bird's-eye-view images with several height channels. The multi-task network is trained end-to-end to predict occupancy grid maps without the usual preprocessing steps of removing ground points and applying an inverse sensor model. In our evaluations, we show that the learned inverse sensor model overcomes some limitations of a geometric inverse sensor model in representing object shapes and modeling free space. Moreover, compared to a variant of our network that relies on measurement grid maps as input, the end-to-end approach achieves better runtime performance and more accurate semantic predictions.
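The bird's-eye-view input representation described in the abstract can be sketched in a few lines; the grid extents, cell size, and height slices below are illustrative choices, not parameters from the paper:

```python
import numpy as np

def lidar_to_bev(points, x_range=(-40.0, 40.0), y_range=(-40.0, 40.0),
                 z_bins=(-2.0, -1.0, 0.0, 1.0, 3.0), cell_size=0.25):
    """Rasterize an (N, 3) lidar point cloud into a bird's-eye-view
    occupancy image with one channel per height slice."""
    nx = int((x_range[1] - x_range[0]) / cell_size)
    ny = int((y_range[1] - y_range[0]) / cell_size)
    nz = len(z_bins) - 1
    bev = np.zeros((nz, ny, nx), dtype=np.float32)
    # Keep only points inside the grid extent.
    mask = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
    pts = points[mask]
    ix = ((pts[:, 0] - x_range[0]) / cell_size).astype(int)
    iy = ((pts[:, 1] - y_range[0]) / cell_size).astype(int)
    iz = np.digitize(pts[:, 2], z_bins) - 1  # height channel index
    valid = (iz >= 0) & (iz < nz)
    bev[iz[valid], iy[valid], ix[valid]] = 1.0  # binary occupancy per slice
    return bev

# A single point 10 m ahead at z = -0.5 m lands in exactly one cell/channel.
cloud = np.array([[10.0, 0.0, -0.5]])
grid = lidar_to_bev(cloud)
```

A real pipeline would feed stacks of such multi-channel images, one per lidar sweep, into the convolutional-recurrent network.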
Keyword: loop detection
There is no result
Keyword: autonomous driving
There is no result
Keyword: mapping
Utility of Optical See-Through Head Mounted Displays in Augmented Reality-Assisted Surgery: A systematic review
Authors: Manuel Birlo, P.J. "Eddie" Edwards, Matthew Clarkson, Danail Stoyanov
Abstract
This article presents a systematic review of optical see-through head mounted display (OST-HMD) usage in augmented reality (AR) surgery applications from 2013 to 2020. Articles were categorised by: OST-HMD device, surgical speciality, surgical application context, visualisation content, experimental design and evaluation, accuracy and human factors of human-computer interaction. 91 articles fulfilled all inclusion criteria. Some clear trends emerge. The Microsoft HoloLens increasingly dominates the field, with orthopaedic surgery being the most popular application (28.6%). By far the most common surgical context is surgical guidance (n = 58) and segmented preoperative models dominate visualisation (n = 40). Experiments mainly involve phantoms (n = 43) or system setup (n = 21), with patient case studies ranking third (n = 19), reflecting the comparative infancy of the field. Experiments cover issues from registration to perception with very different accuracy results. Human factors emerge as significant to OST-HMD utility. Some factors are addressed by the systems proposed, such as attention shift away from the surgical site and mental mapping of 2D images to 3D patient anatomy. Other persistent human factors remain or are caused by OST-HMD solutions, including ease of use, comfort and spatial perception issues. The significant upward trend in published articles is clear, but such devices are not yet established in the operating room and clinical studies showing benefit are lacking. A focused effort addressing technical registration and perceptual factors in the lab, coupled with design that incorporates human factors considerations to solve clear clinical problems, should ensure that the significant current research efforts will succeed.
Colouring the sculpture through corresponding area from 2D to 3D with augmented reality
Abstract
With the development of 3D modelling and AR techniques, traditional methods of establishing 2D-to-3D relations are no longer sufficient to meet the demand for complex models and rapid relation building. This dissertation presents a prototype that creates many-to-many correspondences by marking image and 3D-model regions, which the end user can then use to colour the 3D model by colouring the image. After comparing the three methods in the conceptual design, I chose the render-texture relation method for further development, switching to the Zeus bust model and connecting the AR environment. Testing each part of the prototype shows the viability of the render-texture relation method. Its advantages are that many-to-many relations are easy to build and that it is adaptable to any model with proper UV mapping. Three main limitations remain, and future work will focus on solving them, building a database to store relation data, and printing the coloured 3D models in the real world.
Amplitude Spectrum Transformation for Open Compound Domain Adaptive Semantic Segmentation
Abstract
Open compound domain adaptation (OCDA) has emerged as a practical adaptation setting which considers a single labeled source domain against a compound of multi-modal unlabeled target data in order to generalize better on novel unseen domains. We hypothesize that an improved disentanglement of domain-related and task-related factors of dense intermediate layer features can greatly aid OCDA. Prior art attempts this indirectly by employing adversarial domain discriminators on the spatial CNN output. However, we find that latent features derived from the Fourier-based amplitude spectrum of deep CNN features hold a more tractable mapping with domain discrimination. Motivated by this, we propose a novel feature-space Amplitude Spectrum Transformation (AST). During adaptation, we employ the AST auto-encoder for two purposes. First, carefully mined source-target instance pairs undergo a simulation of cross-domain feature stylization (AST-Sim) at a particular layer by altering the AST latent. Second, AST operating at a later layer is tasked to normalize (AST-Norm) the domain content by fixing its latent to a mean prototype. Our simplified adaptation technique is not only clustering-free but also free from complex adversarial alignment. We achieve leading performance against prior art on the OCDA scene segmentation benchmarks.
Using 5G in Smart Cities: A Systematic Mapping Study
Abstract
5G is the fifth generation of wireless networks, characterized by, e.g., high bandwidth and data rates. The scenarios for using 5G include enhanced Mobile Broadband (eMBB), massive Machine Type Communications (mMTC), and ultra-Reliable and Low-Latency Communications (uRLLC). 5G is expected to support a wide variety of applications. We conducted a systematic mapping study covering the literature published between Jan 2012 and Dec 2019 on using 5G in smart cities. The scenarios, architecture, technologies, challenges, and lessons learned of using 5G in smart cities are summarized and further analyzed based on 32 selected studies, with the following results: (1) The studies are distributed over 27 publication venues. 17 studies report results based on academic studies and 13 studies use demonstrations or toy examples. Only 2 studies report using 5G in smart cities based on industrial studies. 16 studies include assumptions about 5G network design or smart city scenarios. (2) The most discussed smart city scenario is transportation, followed by public safety, healthcare, city tourism, entertainment, and education. (3) 28 studies propose and/or discuss the architecture of 5G-enabled smart cities, covering smart city architecture (treating 5G as a component), 5G network architecture in smart cities, and business architecture of using 5G in smart cities. (4) The most mentioned 5G-related technologies are radio access technologies, network slicing, and edge computing. (5) Challenges mainly concern the complex context, challenging requirements, and network development of using 5G in smart cities. (6) Most of the identified lessons learned are benefits regarding 5G itself or the proposed 5G-related methods in smart cities. This work provides a reflection on the past eight years of the state of the art of using 5G in smart cities, which can benefit both researchers and practitioners.
Who to Watch Next: Two-side Interactive Networks for Live Broadcast Recommendation
Abstract
With the prevalence of the live broadcast business nowadays, a new type of recommendation service, called live broadcast recommendation, is widely used in many mobile e-commerce apps. Different from classical item recommendation, live broadcast recommendation automatically recommends anchors to users, considering the interactions among triple objects (i.e., users, anchors, items) rather than binary interactions between users and items. Existing methods based on binary objects, ranging from early matrix factorization to recently emerged deep learning, obtain objects' embeddings by mapping from pre-existing features. Directly applying these techniques would lead to limited performance, as they fail to encode collaborative signals among triple objects. In this paper, we propose novel TWo-side Interactive NetworkS (TWINS) for live broadcast recommendation. In order to fully use both static and dynamic information on the user and anchor sides, we combine a product-based neural network with a recurrent neural network to learn the embedding of each object. In addition, instead of directly measuring the similarity, TWINS effectively injects collaborative effects into the embedding process in an explicit manner by modeling interactive patterns between the user's browsing history and the anchor's broadcast history in both the item and anchor aspects. Furthermore, we design a novel co-retrieval technique to select key items among massive historic records efficiently. Offline experiments on real large-scale data show the superior performance of the proposed TWINS compared to representative methods, and further online experiments on the Diantao App show that TWINS gains average performance improvements of around 8% on the ACTR metric, 3% on the UCTR metric, and 3.5% on the UCVR metric.
MBCT: Tree-Based Feature-Aware Binning for Individual Uncertainty Calibration
Authors: Siguang Huang, Yunli Wang, Lili Mou, Huayue Zhang, Han Zhu, Chuan Yu, Bo Zheng
Abstract
Most machine learning classifiers concern only classification accuracy, while certain applications (such as medical diagnosis, meteorological forecasting, and computational advertising) require the model to predict the true probability, known as a calibrated estimate. In previous work, researchers have developed several calibration methods to post-process the outputs of a predictor to obtain calibrated values, such as binning and scaling methods. Compared with scaling, binning methods are shown to have distribution-free theoretical guarantees, which motivates us to prefer binning methods for calibration. However, we notice that existing binning methods have several drawbacks: (a) the binning scheme considers only the original prediction values, thus limiting the calibration performance; and (b) the binning approach is non-individual, mapping multiple samples in a bin to the same value, and thus is not suitable for order-sensitive applications. In this paper, we propose a feature-aware binning framework, called Multiple Boosting Calibration Trees (MBCT), along with a multi-view calibration loss to tackle the above issues. Our MBCT optimizes the binning scheme by the tree structures of features, and adopts a linear function in a tree node to achieve individual calibration. MBCT is non-monotonic, and has the potential to improve order accuracy, owing to its learnable binning scheme and individual calibration. We conduct comprehensive experiments on three datasets in different fields. Results show that our method outperforms all competing models in terms of both calibration error and order accuracy. We also conduct simulation experiments, justifying that the proposed multi-view calibration loss is a better metric in modeling calibration error.
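For context, the classical equal-frequency binning that MBCT improves upon can be sketched as follows. This is the baseline scheme with its non-individual drawback (every sample in a bin receives the same value), not the MBCT algorithm; all names and the simulated data are illustrative:

```python
import numpy as np

def fit_histogram_binning(scores, labels, n_bins=10):
    """Equal-frequency binning calibration: sort predictions, split them
    into bins, and map each bin to its empirical positive rate."""
    order = np.argsort(scores)
    bin_upper, bin_value = [], []
    for idx in np.array_split(order, n_bins):
        bin_upper.append(scores[idx].max())   # upper score edge of the bin
        bin_value.append(labels[idx].mean())  # calibrated value of the bin
    return np.array(bin_upper), np.array(bin_value)

def calibrate(scores, bin_upper, bin_value):
    bins = np.clip(np.searchsorted(bin_upper, scores), 0, len(bin_value) - 1)
    return bin_value[bins]

rng = np.random.default_rng(0)
raw = rng.uniform(size=2000)                              # overconfident scores
labels = (rng.uniform(size=2000) < raw**2).astype(float)  # true rate is raw**2
upper, value = fit_histogram_binning(raw, labels, n_bins=20)
cal = calibrate(raw, upper, value)
```

Because every sample in a bin maps to the same value, such binning cannot distinguish samples inside a bin, which is exactly the order-sensitivity issue the abstract raises.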
Keyword: localization
Stein Particle Filter for Nonlinear, Non-Gaussian State Estimation
Authors: Fahira Afzal Maken, Fabio Ramos, Lionel Ott
Abstract
Estimation of a dynamical system's latent state subject to sensor noise and model inaccuracies remains a critical yet difficult problem in robotics. While Kalman filters provide the optimal solution in the least-squares sense for linear problems with Gaussian noise, the general nonlinear, non-Gaussian case is significantly more complicated, typically relying on sampling strategies that are limited to low-dimensional state spaces. In this paper we devise a general inference procedure for filtering of nonlinear, non-Gaussian dynamical systems that exploits the differentiability of both the update and prediction models to scale to higher-dimensional spaces. Our method, the Stein particle filter, can be seen as a deterministic flow of particles, embedded in a reproducing kernel Hilbert space, from an initial state to the desired posterior. The particles evolve jointly to conform to a posterior approximation while interacting with each other through a repulsive force. We evaluate the method in simulation and in complex localization tasks, comparing it to sequential Monte Carlo solutions.
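The deterministic particle flow with a repulsive force described above follows the Stein variational gradient descent (SVGD) update; a minimal one-dimensional sketch with an RBF kernel and a standard-normal target (illustrative parameters, not the paper's filter):

```python
import numpy as np

def svgd_step(x, grad_logp, step=0.1, h=0.5):
    """One SVGD update for 1-D particles: each particle follows the
    kernel-smoothed gradient of log p plus a repulsive kernel term."""
    diff = x[:, None] - x[None, :]        # diff[i, j] = x_i - x_j
    k = np.exp(-diff**2 / (2.0 * h))      # RBF kernel matrix
    repulse = (diff / h * k).sum(axis=1)  # sum_j d k(x_j, x_i) / d x_j
    phi = (k @ grad_logp(x) + repulse) / len(x)
    return x + step * phi

# Target posterior: standard normal, so grad log p(x) = -x.
rng = np.random.default_rng(1)
particles = rng.normal(loc=5.0, scale=0.5, size=100)  # initialized far off
for _ in range(500):
    particles = svgd_step(particles, lambda z: -z)
```

After the flow, the particle set approximates the target: the attraction term pulls particles toward high-density regions while the repulsive term keeps them from collapsing onto the mode.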
Object-Guided Day-Night Visual Localization in Urban Scenes
Abstract
We introduce Object-Guided Localization (OGuL), based on a novel method of local-feature matching. Direct matching of local features is sensitive to significant changes in illumination. In contrast, object detection often survives severe changes in lighting conditions. The proposed method first detects semantic objects and establishes correspondences of those objects between images. Object correspondences provide a local coarse alignment of the images in the form of a planar homography. These homographies are consequently used to guide the matching of local features. Experiments on standard urban localization datasets (Aachen, Extended-CMU-Season, RobotCar-Season) show that OGuL significantly improves localization results with local features as simple as SIFT, and its performance competes with state-of-the-art CNN-based methods trained for day-to-night localization.
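Coarse alignment from object correspondences, as described above, reduces to estimating a planar homography; a minimal direct linear transform (DLT) sketch under the assumption of four exact box-corner correspondences (none of this is the paper's code):

```python
import numpy as np

def dlt_homography(src, dst):
    """Estimate the 3x3 homography H with dst ~ H @ src from >= 4 point
    pairs via the direct linear transform (SVD of stacked constraints)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.array(rows, dtype=float))
    h = vt[-1]                 # null-space vector = flattened H
    return (h / h[-1]).reshape(3, 3)

def warp(h, pts):
    """Apply a homography to (N, 2) points in inhomogeneous coordinates."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    out = pts_h @ h.T
    return out[:, :2] / out[:, 2:3]

# Corners of two detected object boxes related by scale 2 + shift (3, 5).
src = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
dst = 2.0 * src + np.array([3.0, 5.0])
H = dlt_homography(src, dst)
```

The recovered H can then transfer local features from one image into the other's frame, narrowing the search window for descriptor matching.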
Point-Level Region Contrast for Object Detection Pre-Training
Authors: Yutong Bai, Xinlei Chen, Alexander Kirillov, Alan Yuille, Alexander C. Berg
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
In this work we present point-level region contrast, a self-supervised pre-training approach for the task of object detection. This approach is motivated by the two key factors in detection: localization and recognition. While accurate localization favors models that operate at the pixel- or point-level, correct recognition typically relies on a more holistic, region-level view of objects. Incorporating this perspective in pre-training, our approach performs contrastive learning by directly sampling individual point pairs from different regions. Compared to an aggregated representation per region, our approach is more robust to the change in input region quality, and further enables us to implicitly improve initial region assignments via online knowledge distillation during training. Both advantages are important when dealing with imperfect regions encountered in the unsupervised setting. Experiments show point-level region contrast improves on state-of-the-art pre-training methods for object detection and segmentation across multiple tasks and datasets, and we provide extensive ablation studies and visualizations to aid understanding. Code will be made available.
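The point-pair contrast described above can be illustrated with a plain InfoNCE loss over sampled point features; the region sampling and online distillation machinery of the paper are omitted, and all names and the toy features are illustrative:

```python
import numpy as np

def info_nce(anchor, positives, negatives, tau=0.1):
    """InfoNCE loss for one anchor feature against positive and negative
    point features (all rows L2-normalized): the negative log-softmax of
    each positive similarity, averaged over the positive pairs."""
    pos = positives @ anchor / tau
    neg = negatives @ anchor / tau
    log_denom = np.log(np.exp(np.concatenate([pos, neg])).sum())
    return float(np.mean(log_denom - pos))

def unit(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
anchor = unit(rng.normal(size=8))
same_region = unit(anchor + 0.1 * rng.normal(size=(4, 8)))  # near the anchor
other_region = unit(rng.normal(size=(16, 8)))               # unrelated points
loss = info_nce(anchor, same_region, other_region)
```

Minimizing this loss pulls point features from the same region together while pushing features from other regions away, without ever averaging a region into a single vector.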
Keyword: SLAM
There is no result
Keyword: Visual inertial
There is no result
Keyword: livox
There is no result
Keyword: loam
There is no result
Keyword: Visual inertial odometry
There is no result
Keyword: lidar
A Multi-Task Recurrent Neural Network for End-to-End Dynamic Occupancy Grid Mapping