New submissions for Mon, 8 Aug 22
Keyword: SLAM
There is no result
Keyword: odometry
There is no result
Keyword: livox
There is no result
Keyword: loam
There is no result
Keyword: lidar
A Lightweight Machine Learning Pipeline for LiDAR-simulation
Authors: Richard Marcus, Niklas Knoop, Bernhard Egger, Marc Stamminger
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
Virtual testing is a crucial task to ensure safety in autonomous driving, and sensor simulation is an important task in this domain. Most current LiDAR simulations are very simplistic and are mainly used to perform initial tests, while the majority of insights are gathered on the road. In this paper, we propose a lightweight approach for more realistic LiDAR simulation that learns a real sensor's behavior from test drive data and transfers it to the virtual domain. The central idea is to cast the simulation into an image-to-image translation problem. We train our pix2pix-based architecture on two real-world data sets, namely the popular KITTI data set and the Audi Autonomous Driving Dataset, which provide both RGB and LiDAR images. We apply this network to synthetic renderings and show that it generalizes sufficiently from real to simulated images. This strategy makes it possible to skip the sensor-specific, expensive, and complex LiDAR physics simulation in our synthetic world, and avoids both the oversimplification and the large domain gap that a clean synthetic environment would otherwise introduce.
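To make the image-to-image framing concrete, the following is a minimal sketch of a pix2pix-style generator that maps a rendered RGB image to a single-channel LiDAR range image. The architecture shown (a small U-Net, its channel counts, and the RGBToLidarGenerator name) is an illustrative assumption, not the authors' network.

```python
# Minimal sketch of a pix2pix-style generator for RGB -> LiDAR range images.
# Channel counts and depth are illustrative assumptions, not the paper's setup.
import torch
import torch.nn as nn

def down(c_in, c_out):
    # Encoder block: stride-2 convolution halves the spatial resolution.
    return nn.Sequential(nn.Conv2d(c_in, c_out, 4, stride=2, padding=1),
                         nn.BatchNorm2d(c_out), nn.LeakyReLU(0.2))

def up(c_in, c_out):
    # Decoder block: transposed convolution doubles the spatial resolution.
    return nn.Sequential(nn.ConvTranspose2d(c_in, c_out, 4, stride=2, padding=1),
                         nn.BatchNorm2d(c_out), nn.ReLU())

class RGBToLidarGenerator(nn.Module):
    """U-Net-like generator: 3-channel RGB in, 1-channel LiDAR range image out."""
    def __init__(self):
        super().__init__()
        self.e1, self.e2, self.e3 = down(3, 64), down(64, 128), down(128, 256)
        self.d1 = up(256, 128)
        self.d2 = up(128 + 128, 64)          # skip connection from e2
        self.d3 = up(64 + 64, 32)            # skip connection from e1
        self.out = nn.Conv2d(32, 1, 3, padding=1)

    def forward(self, rgb):
        x1 = self.e1(rgb)
        x2 = self.e2(x1)
        x3 = self.e3(x2)
        y = self.d1(x3)
        y = self.d2(torch.cat([y, x2], dim=1))
        y = self.d3(torch.cat([y, x1], dim=1))
        return self.out(y)                   # predicted range/intensity map

if __name__ == "__main__":
    g = RGBToLidarGenerator()
    fake_render = torch.rand(1, 3, 256, 256)   # synthetic RGB rendering
    print(g(fake_render).shape)                # -> torch.Size([1, 1, 256, 256])
```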
Discover the Mysteries of the Maya: Selected Contributions from the Machine Learning Challenge & The Discovery Challenge Workshop at ECML PKDD 2021
Authors: Dragi Kocev, Nikola Simidjievski, Ana Kostovska, Ivica Dimitrovski, Žiga Kokalj
Abstract
The volume contains selected contributions from the Machine Learning Challenge "Discover the Mysteries of the Maya", presented at the Discovery Challenge Track of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2021). Remote sensing has greatly accelerated traditional archaeological landscape surveys in the forested regions of the ancient Maya. Typical exploration and discovery attempts, besides focusing on whole ancient cities, also focus on individual buildings and structures. Recently, there have been several successful attempts at utilizing machine learning to identify ancient Maya settlements. These attempts, while relevant, focus on narrow areas and rely on high-quality aerial laser scanning (ALS) data, which covers only a fraction of the region where the ancient Maya once settled. Satellite image data, on the other hand, produced by the European Space Agency's (ESA) Sentinel missions, is abundant and, more importantly, publicly available. The "Discover the Mysteries of the Maya" challenge aimed at locating and identifying ancient Maya architectures (buildings, aguadas, and platforms) by performing integrated image segmentation of different types of satellite imagery (Sentinel-1 and Sentinel-2) and ALS (lidar) data.
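One straightforward way to perform such integrated segmentation, assuming the Sentinel-1, Sentinel-2, and ALS rasters are co-registered on a common grid, is to stack the modalities as input channels and widen the first convolution of an off-the-shelf segmentation model. The sketch below illustrates this; the channel counts, the four output classes, and the use of a recent torchvision API are placeholders, not the challenge's actual specification.

```python
# Hedged sketch: fuse co-registered Sentinel-1 / Sentinel-2 / lidar rasters by
# channel stacking and adapt a torchvision segmentation model to the wider input.
# Channel counts (2 + 4 + 1) and the 4-class output are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision.models.segmentation import fcn_resnet50

S1_CH, S2_CH, ALS_CH = 2, 4, 1      # e.g. VV/VH, four optical bands, one lidar raster
NUM_CLASSES = 4                     # background, building, platform, aguada (assumed)

model = fcn_resnet50(weights=None, weights_backbone=None, num_classes=NUM_CLASSES)
# Replace the RGB stem so it accepts all stacked modalities.
model.backbone.conv1 = nn.Conv2d(S1_CH + S2_CH + ALS_CH, 64,
                                 kernel_size=7, stride=2, padding=3, bias=False)

s1  = torch.rand(1, S1_CH, 256, 256)      # Sentinel-1 backscatter
s2  = torch.rand(1, S2_CH, 256, 256)      # Sentinel-2 reflectance
als = torch.rand(1, ALS_CH, 256, 256)     # lidar-derived raster

x = torch.cat([s1, s2, als], dim=1)       # channel-wise fusion
with torch.no_grad():
    out = model(x)["out"]                 # per-pixel class logits
print(out.shape)                          # -> torch.Size([1, 4, 256, 256])
```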
Keyword: loop detection
There is no result
Keyword: nerf
There is no result
Keyword: mapping
Data-free Backdoor Removal based on Channel Lipschitzness
Authors: Runkai Zheng, Rongjun Tang, Jianze Li, Li Liu
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Abstract
Recent studies have shown that Deep Neural Networks (DNNs) are vulnerable to backdoor attacks, which lead to malicious behaviors of DNNs when specific triggers are attached to the input images. It was further demonstrated that the infected DNNs possess a collection of channels that are more sensitive to the backdoor triggers than normal channels. Pruning these channels was then shown to be effective in mitigating the backdoor behaviors. To locate those channels, it is natural to consider their Lipschitzness, which measures their sensitivity against worst-case perturbations on the inputs. In this work, we introduce a novel concept called Channel Lipschitz Constant (CLC), defined as the Lipschitz constant of the mapping from the input images to the output of each channel. We then provide empirical evidence of the strong correlation between an upper bound of the CLC (UCLC) and the trigger-activated change in the channel activation. Since the UCLC can be calculated directly from the weight matrices, we can detect the potential backdoor channels in a data-free manner and apply simple pruning to the infected DNN to repair the model. The proposed Channel Lipschitzness based Pruning (CLP) method is super fast, simple, data-free, and robust to the choice of the pruning threshold. Extensive experiments are conducted to evaluate the efficiency and effectiveness of CLP, which achieves state-of-the-art results among mainstream defense methods even without any data. Source code is available at https://github.com/rkteddy/channel-Lipschitzness-based-pruning.
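The pruning rule described here is simple enough to sketch. The snippet below is a simplified, per-layer reading of the idea: each output channel of a convolution is scored by the spectral norm of its flattened kernel (a stand-in for the UCLC) and channels scoring above mean + u·std are zeroed. The layer-local bound and the threshold u = 3 are assumptions; the paper's exact UCLC may compose bounds across layers.

```python
# Hedged sketch of channel-Lipschitz-based pruning (CLP-style, simplified).
# Each output channel of a Conv2d is scored by the spectral norm of its reshaped
# kernel; channels whose score exceeds mean + u*std are zeroed out (data-free).
import torch
import torch.nn as nn

def channel_scores(conv: nn.Conv2d) -> torch.Tensor:
    """Largest singular value of each output channel's kernel, reshaped to
    (in_channels, kH*kW). Used here as a stand-in for the channel Lipschitz bound."""
    w = conv.weight.detach()                          # (C_out, C_in, kH, kW)
    mats = w.reshape(w.shape[0], w.shape[1], -1)      # (C_out, C_in, kH*kW)
    return torch.linalg.matrix_norm(mats, ord=2)      # spectral norm per channel

def prune_outlier_channels(conv: nn.Conv2d, u: float = 3.0) -> int:
    """Zero out channels whose score exceeds mean + u*std; returns number pruned."""
    s = channel_scores(conv)
    mask = s > s.mean() + u * s.std()
    with torch.no_grad():
        conv.weight[mask] = 0.0
        if conv.bias is not None:
            conv.bias[mask] = 0.0
    return int(mask.sum())

if __name__ == "__main__":
    layer = nn.Conv2d(16, 32, 3)
    with torch.no_grad():
        layer.weight[5] *= 50.0                  # fake an overly sensitive channel
    print(prune_outlier_channels(layer, u=3.0))  # -> 1
```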
Watson-Crick conjugates of words and languages
Authors: Kalpana Mahalingam, Anuran Maity
Subjects: Formal Languages and Automata Theory (cs.FL); Combinatorics (math.CO)
Abstract
This paper is a theoretical study of notions in combinatorics on words motivated by information being encoded as DNA strands in DNA computing. We study Watson-Crick conjugates, or θ-conjugates, a generalization of the classical notion of conjugates of a word, inspired by biomolecular computing. The Watson-Crick mapping θ is an involution that is also an antimorphism. We study some combinatorial properties of the θ-conjugates of a word. We characterize words that have the same set of θ-conjugates. We investigate whether certain families of languages are closed under the θ-conjugate operation. We also study iterated θ-conjugates. We then discuss the concept of θ-conjugate-free languages and some decidability problems concerning θ-conjugate-freeness for different language classes.
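The abstract does not state the formal definition of a θ-conjugate, so the sketch below only illustrates its two ingredients: the classical conjugates (rotations) of a word and a Watson-Crick antimorphic involution θ over the DNA alphabet, taken here to be complement-and-reverse.

```python
# Ingredients for theta-conjugates over the DNA alphabet {A, C, G, T}:
# theta is an antimorphic involution (here: reverse + Watson-Crick complement),
# and conjugates(w) is the set of rotations of w. The paper's theta-conjugate
# combines these; its precise definition is not given in the abstract above.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def theta(w: str) -> str:
    """Watson-Crick antimorphic involution: theta(uv) = theta(v)theta(u), theta(theta(w)) = w."""
    return "".join(COMPLEMENT[c] for c in reversed(w))

def conjugates(w: str) -> set[str]:
    """Classical conjugates of w: all rotations v u with w = u v."""
    return {w[i:] + w[:i] for i in range(len(w))}

if __name__ == "__main__":
    w = "ACGT"
    assert theta(theta(w)) == w                              # involution
    assert theta("AC" + "GT") == theta("GT") + theta("AC")   # antimorphism
    print(sorted(conjugates(w)))   # ['ACGT', 'CGTA', 'GTAC', 'TACG']
    print(theta(w))                # 'ACGT' happens to be its own theta-image
```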
A Method for Deriving Technical Requirements of Digital Twins as Industrial Product-Service System Enablers
Authors: Jürgen Dobaj, Andreas Riel, Georg Macher, Markus Egretzberger
Abstract
Industrial Product-Service Systems (IPSS) are increasingly dominant in several sectors. Predominant value-adding services provided for industrial assets such as production systems, electric power plants, and car fleets are remote asset maintenance, monitoring, control, and reconfiguration. IPSS designers lack methods and tools supporting them in systematically deriving technical design requirements for the underlying Cyber-Physical System (CPS) IPSS services. At the same time, the use of Digital Twins (DTs) as digital representations of CPS assets is becoming increasingly feasible thanks to powerful, networked information technology (IT) and operation technology (OT) infrastructures and the ubiquity of sensors and data. This paper proposes a method for guiding IPSS designers in the specification and implementation of DT instances to serve as the key enablers of IPSS services. The systematic mapping of the continuous IT design-build-deployment cycle concept to the OT domain of CPS is at the heart of the applied methodology, which is complemented by a stakeholder-driven requirements elicitation. The key contribution is a structured method for deriving technical design requirements for DT instances as IPSS. This method is validated on real-world use cases in an evaluation environment for distributed CPS IPSS.
Improving Task Generalization via Unified Schema Prompt
Authors: Wanjun Zhong, Yifan Gao, Ning Ding, Zhiyuan Liu, Ming Zhou, Jiahai Wang, Jian Yin, Nan Duan
Abstract
Task generalization has been a long-standing challenge in Natural Language Processing (NLP). Recent research attempts to improve the task generalization ability of pre-trained language models by mapping NLP tasks into human-readable prompted forms. However, these approaches require laborious and inflexible manual collection of prompts, and different prompts for the same downstream task can yield unstable performance. We propose Unified Schema Prompt, a flexible and extensible prompting method that automatically customizes learnable prompts for each task according to the task's input schema. It models the knowledge shared between tasks while keeping the characteristics of each task schema, and thus enhances task generalization ability. The schema prompt uses the explicit data structure of each task to formulate prompts, so that little human effort is involved. To test the task generalization ability of the schema prompt at scale, we conduct schema prompt-based multitask pre-training on a wide variety of general NLP tasks. The framework achieves strong zero-shot and few-shot generalization performance on 16 unseen downstream tasks from 8 task types (e.g., QA and NLI). Furthermore, comprehensive analyses demonstrate the effectiveness of each component in the schema prompt, its flexibility in task compositionality, and its ability to improve performance under a full-data fine-tuning setting.
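A hypothetical rendering of the schema-prompt idea is sketched below: each key of a task's input schema becomes a named slot in the prompt, and in the actual method those slot positions would carry learnable soft tokens rather than plain text. The textual format and field names are assumptions.

```python
# Hedged sketch: build a prompt from a task's input schema. Each schema key becomes
# a named slot; in the real method these slots would be associated with learnable
# (soft) prompt tokens rather than fixed text. The format shown is an assumption.
def schema_prompt(task_name: str, schema: dict[str, str], answer_key: str = "answer") -> str:
    parts = [f"[task] {task_name}"]
    for key, value in schema.items():
        parts.append(f"[{key}] {value}")   # one slot per schema field
    parts.append(f"[{answer_key}]")        # slot the model must fill
    return " ".join(parts)

if __name__ == "__main__":
    nli_example = {"premise": "A man is playing a guitar on stage.",
                   "hypothesis": "A person is performing music."}
    qa_example = {"question": "Which missions provide the satellite data?",
                  "context": "The challenge uses Sentinel-1 and Sentinel-2 imagery."}
    print(schema_prompt("natural_language_inference", nli_example))
    print(schema_prompt("extractive_qa", qa_example))
```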
Quantifying and Mitigating Popularity Bias in Conversational Recommender Systems
Authors: Shuo Lin, Jianling Wang, Ziwei Zhu, James Caverlee
Abstract
Conversational recommender systems (CRS) have shown great success in accurately capturing a user's current and detailed preference through the multi-round interaction cycle while effectively guiding users to a more personalized recommendation. Perhaps surprisingly, conversational recommender systems can be plagued by popularity bias, much like traditional recommender systems. In this paper, we systematically study the problem of popularity bias in CRSs. We demonstrate the existence of popularity bias in existing state-of-the-art CRSs from an exposure rate, a success rate, and a conversational utility perspective, and propose a suite of popularity bias metrics designed specifically for the CRS setting. We then introduce a debiasing framework with three unique features: (i) Popularity-Aware Focused Learning to reduce the popularity-distorting impact on preference prediction; (ii) Cold-Start Item Embedding Reconstruction via Attribute Mapping, to improve the modeling of cold-start items; and (iii) Dual-Policy Learning, to better guide the CRS when dealing with either popular or unpopular items. Through extensive experiments on two frequently used CRS datasets, we find the proposed model-agnostic debiasing framework not only mitigates the popularity bias in state-of-the-art CRSs but also improves the overall recommendation performance.
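As a concrete example of an exposure-rate style metric, the snippet below measures the share of recommendation slots occupied by head items; defining the head as the top 20% of the catalog by interaction count is an illustrative assumption rather than the paper's definition.

```python
# Hedged sketch of a popularity-bias exposure metric for a CRS log.
# "Head" items are defined here as the top 20% of items by interaction count;
# this cutoff is an illustrative assumption, not necessarily the paper's metric.
from collections import Counter

def exposure_rate(recommendation_lists, item_interactions, head_fraction=0.2):
    """Share of recommendation slots given to head (popular) items."""
    ranked = [item for item, _ in Counter(item_interactions).most_common()]
    head = set(ranked[: max(1, int(len(ranked) * head_fraction))])
    slots = [item for rec in recommendation_lists for item in rec]
    return sum(item in head for item in slots) / len(slots)

if __name__ == "__main__":
    interactions = ["i1"] * 50 + ["i2"] * 30 + ["i3"] * 5 + ["i4"] * 3 + ["i5"] * 2
    recs = [["i1", "i2", "i3"], ["i1", "i4", "i2"], ["i2", "i1", "i5"]]
    # 5 catalog items -> head = {"i1"}; i1 fills 3 of 9 recommendation slots.
    print(exposure_rate(recs, interactions))   # -> 0.333...
```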
Keyword: localization
Task-Balanced Distillation for Object Detection
Abstract
Mainstream object detectors are commonly composed of two sub-tasks, classification and regression, implemented by two parallel heads. This classic design paradigm inevitably leads to inconsistent spatial distributions between the classification score and the localization quality (IoU). Therefore, this paper alleviates this misalignment from the perspective of knowledge distillation. First, we observe that the massive teacher achieves a higher proportion of harmonious predictions than the lightweight student. Based on this intriguing observation, a novel Harmony Score (HS) is devised to estimate the alignment of classification and regression qualities. HS models the relationship between the two sub-tasks and serves as prior knowledge to promote harmonious predictions for the student. Second, this spatial misalignment leads to inharmonious region selection when distilling features. To alleviate this problem, a novel Task-decoupled Feature Distillation (TFD) is proposed that flexibly balances the contributions of the classification and regression tasks. Together, the HS-guided distillation and TFD constitute the proposed method, named Task-Balanced Distillation (TBD). Extensive experiments demonstrate the considerable potential and generalization of the proposed method. Specifically, when equipped with TBD, RetinaNet with ResNet-50 achieves 41.0 mAP on the COCO benchmark, outperforming the recent FGD and FRS.
TransPillars: Coarse-to-Fine Aggregation for Multi-Frame 3D Object Detection
Abstract
3D object detection using point clouds has attracted increasing attention due to its wide applications in autonomous driving and robotics. However, most existing studies focus on single point cloud frames without harnessing the temporal information in point cloud sequences. In this paper, we design TransPillars, a novel transformer-based feature aggregation technique that exploits the temporal features of consecutive point cloud frames for multi-frame 3D object detection. TransPillars aggregates spatial-temporal point cloud features from two perspectives. First, it fuses voxel-level features directly from multi-frame feature maps instead of pooled instance features, to preserve instance details with contextual information that are essential to accurate object localization. Second, it introduces a hierarchical coarse-to-fine strategy to fuse multi-scale features progressively, to effectively capture the motion of moving objects and guide the aggregation of fine features. In addition, a variant of the deformable transformer is introduced to improve the effectiveness of cross-frame feature matching. Extensive experiments show that our proposed TransPillars achieves state-of-the-art performance compared to existing multi-frame detection approaches. Code will be released.
Keyword: transformer
Self-Ensembling Vision Transformer (SEViT) for Robust Medical Image Classification
Authors: Faris Almalik, Mohammad Yaqub, Karthik Nandakumar
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Vision Transformers (ViT) are competing to replace Convolutional Neural Networks (CNN) for various computer vision tasks in medical imaging such as classification and segmentation. While the vulnerability of CNNs to adversarial attacks is a well-known problem, recent works have shown that ViTs are also susceptible to such attacks and suffer significant performance degradation under attack. The vulnerability of ViTs to carefully engineered adversarial samples raises serious concerns about their safety in clinical settings. In this paper, we propose a novel self-ensembling method to enhance the robustness of ViT in the presence of adversarial attacks. The proposed Self-Ensembling Vision Transformer (SEViT) leverages the fact that feature representations learned by the initial blocks of a ViT are relatively unaffected by adversarial perturbations. Learning multiple classifiers based on these intermediate feature representations and combining their predictions with that of the final ViT classifier can provide robustness against adversarial attacks. Measuring the consistency between the various predictions can also help detect adversarial samples. Experiments on two modalities (chest X-ray and fundoscopy) demonstrate the efficacy of the SEViT architecture in defending against various adversarial attacks in the gray-box (attacker has full knowledge of the target model, but not the defense mechanism) setting. Code: https://github.com/faresmalik/SEViT
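A minimal version of the self-ensembling idea can be sketched with forward hooks on a timm ViT: intermediate block outputs feed small classification heads whose votes are combined with the final head, and low agreement can flag a possible adversarial sample. The tapped block indices, mean-pooled features, untrained linear heads, and majority-vote rule below are assumptions (the paper trains separate MLP classifiers).

```python
# Hedged sketch of self-ensembling over intermediate ViT blocks (SEViT-style).
# Forward hooks grab token features after selected blocks; small linear heads
# (untrained placeholders here) vote together with the final ViT head, and low
# agreement among the members can flag a possible adversarial input.
import torch
import torch.nn as nn
import timm

NUM_CLASSES = 2
vit = timm.create_model("vit_tiny_patch16_224", pretrained=False, num_classes=NUM_CLASSES)

tap_blocks = [3, 6, 9]                      # which intermediate blocks to tap (assumed)
feats = {}

def make_hook(idx):
    def hook(module, inputs, output):
        feats[idx] = output.mean(dim=1)     # mean-pool tokens: (B, embed_dim)
    return hook

for idx in tap_blocks:
    vit.blocks[idx].register_forward_hook(make_hook(idx))

heads = nn.ModuleDict({str(i): nn.Linear(vit.embed_dim, NUM_CLASSES) for i in tap_blocks})

@torch.no_grad()
def sevit_predict(x):
    final_logits = vit(x)                               # also fills `feats` via hooks
    votes = [final_logits.argmax(1)]
    votes += [heads[str(i)](feats[i]).argmax(1) for i in tap_blocks]
    votes = torch.stack(votes)                          # (num_members, B)
    prediction = votes.mode(dim=0).values               # majority vote
    agreement = (votes == prediction).float().mean(0)   # low agreement -> suspicious
    return prediction, agreement

pred, agree = sevit_predict(torch.rand(2, 3, 224, 224))
print(pred, agree)
```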
PointConvFormer: Revenge of the Point-based Convolution
Authors: Wenxuan Wu, Qi Shan, Li Fuxin
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
We introduce PointConvFormer, a novel building block for point cloud based deep neural network architectures. Inspired by generalization theory, PointConvFormer combines ideas from point convolution, where filter weights are based only on relative position, and Transformers, which utilize feature-based attention. In PointConvFormer, the feature difference between points in the neighborhood serves as an indicator to re-weight the convolutional weights. Hence, we preserve the invariances of the point convolution operation, whereas attention is used to select relevant points in the neighborhood for convolution. To validate the effectiveness of PointConvFormer, we experiment on both semantic segmentation and scene flow estimation tasks on point clouds with multiple datasets, including ScanNet, SemanticKitti, FlyingThings3D and KITTI. Our results show that PointConvFormer substantially outperforms classic convolutions, regular transformers, and voxelized sparse convolution approaches with smaller, more computationally efficient networks. Visualizations show that PointConvFormer behaves similarly to convolution on flat surfaces, whereas the neighborhood selection effect is stronger on object boundaries, showing that it gets the best of both worlds.
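The core neighborhood operator, convolutional weights generated from relative positions and re-weighted by attention computed from feature differences, can be sketched for pre-gathered neighbor tensors as below; the tensor layout, MLP sizes, and single-head sigmoid attention are assumptions.

```python
# Hedged sketch of a PointConvFormer-style neighborhood operator:
# convolution weights come from relative positions, and a scalar attention derived
# from feature differences re-weights each neighbor. Layout and sizes are assumed.
import torch
import torch.nn as nn

class PointConvFormerBlock(nn.Module):
    def __init__(self, c_in, c_out, c_mid=16):
        super().__init__()
        self.weight_net = nn.Sequential(nn.Linear(3, c_mid), nn.ReLU(),
                                        nn.Linear(c_mid, c_mid))   # W(rel_pos)
        self.attn_net = nn.Sequential(nn.Linear(c_in, c_mid), nn.ReLU(),
                                      nn.Linear(c_mid, 1))         # psi(feat_diff)
        self.proj = nn.Linear(c_in * c_mid, c_out)

    def forward(self, neigh_feats, center_feats, rel_pos):
        # neigh_feats:  (B, N, K, C_in)  features of the K neighbors of each point
        # center_feats: (B, N, C_in)     feature of each center point
        # rel_pos:      (B, N, K, 3)     neighbor coordinates relative to the center
        attn = torch.sigmoid(self.attn_net(neigh_feats - center_feats.unsqueeze(2)))
        w = self.weight_net(rel_pos)                          # (B, N, K, c_mid)
        weighted = attn * neigh_feats                         # attention re-weighting
        agg = torch.einsum("bnkc,bnkm->bncm", weighted, w)    # point-convolution sum over K
        return self.proj(agg.flatten(start_dim=2))            # (B, N, C_out)

block = PointConvFormerBlock(c_in=32, c_out=64)
out = block(torch.rand(2, 100, 16, 32), torch.rand(2, 100, 32), torch.rand(2, 100, 16, 3))
print(out.shape)   # -> torch.Size([2, 100, 64])
```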
LaTTe: Language Trajectory TransformEr
Authors: Arthur Bucker, Luis Figueredo, Sami Haddadin, Ashish Kapoor, Shuang Ma, Rogerio Bonatti
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract
Natural language is one of the most intuitive ways to express human intent. However, translating instructions and commands into robotic motion generation, and deploying them in the real world, is far from an easy task. Indeed, combining a robot's inherent low-level geometric and kinodynamic constraints with a human's high-level semantic information reinvigorates the task-design problem and raises new challenges -- typically leading to task- or hardware-specific solutions with a static set of action targets and commands. This work instead proposes a flexible language-based framework that allows users to modify generic 3D robotic trajectories using language commands, with reduced constraints on prior task or robot information. By taking advantage of pre-trained language models, we employ an auto-regressive transformer to map natural language inputs and contextual images into changes in 3D trajectories. We show through simulations and real-life experiments that the model can successfully follow human intent, modifying the shape and speed of trajectories for multiple robotic platforms and contexts. This study takes a step towards building large pre-trained foundational models for robotics and shows how such models can create more intuitive and flexible interactions between humans and machines. Codebase available at: https://github.com/arthurfenderbucker/NL_trajectory_reshaper.
TransMatting: Enhancing Transparent Objects Matting with Transformers
Abstract
Image matting refers to predicting the alpha values of unknown foreground areas in natural images. Prior methods have focused on propagating alpha values from known to unknown regions. However, not all natural images have a specifically known foreground. Images of transparent objects, such as glass, smoke, or webs, have little or no known foreground. In this paper, we propose a Transformer-based network, TransMatting, to model transparent objects with a large receptive field. Specifically, we redesign the trimap as three learnable tri-tokens that introduce advanced semantic features into the self-attention mechanism. A small convolutional network is proposed to use the global feature and a non-background mask to guide multi-scale feature propagation from encoder to decoder, maintaining the context of transparent objects. In addition, we create a high-resolution matting dataset of transparent objects with small known foreground areas. Experiments on several matting benchmarks demonstrate the superiority of our proposed method over current state-of-the-art methods.
Learning Spatiotemporal Frequency-Transformer for Compressed Video Super-Resolution
Authors: Zhongwei Qiu, Huan Yang, Jianlong Fu, Dongmei Fu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Abstract
Compressed video super-resolution (VSR) aims to restore high-resolution frames from compressed low-resolution counterparts. Most recent VSR approaches enhance an input frame by borrowing relevant textures from neighboring video frames. Although some progress has been made, grand challenges remain in effectively extracting and transferring high-quality textures from compressed videos, where most frames are usually highly degraded. In this paper, we propose a novel Frequency-Transformer for compressed video super-resolution (FTVSR) that conducts self-attention over a joint space-time-frequency domain. First, we divide a video frame into patches and transform each patch into DCT spectral maps in which each channel represents a frequency band. Such a design enables fine-grained self-attention on each frequency band, so that real visual textures can be distinguished from artifacts and further utilized for video frame restoration. Second, we study different self-attention schemes and discover that a divided attention scheme, which conducts joint space-frequency attention before applying temporal attention on each frequency band, leads to the best video enhancement quality. Experimental results on two widely used video super-resolution benchmarks show that FTVSR outperforms state-of-the-art approaches on both uncompressed and compressed videos with clear visual margins. Code is available at https://github.com/researchmm/FTVSR.
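The first step, turning each patch into a DCT spectral map whose channels are frequency bands, is easy to illustrate. The sketch below applies a 2D DCT to 8x8 patches of a grayscale frame and reorders the coefficients so that each frequency index becomes a channel; the patch size and single-channel input are assumptions.

```python
# Hedged sketch of the frequency-domain tokenisation step: split a frame into
# 8x8 patches, apply a 2D DCT to each patch, and reorder so every DCT coefficient
# index (frequency band) becomes one channel of the token map. Patch size assumed.
import numpy as np
from scipy.fft import dctn

def frame_to_dct_bands(frame: np.ndarray, p: int = 8) -> np.ndarray:
    """frame: (H, W) grayscale, H and W divisible by p.
    Returns (p*p, H//p, W//p): one channel per frequency band."""
    H, W = frame.shape
    patches = frame.reshape(H // p, p, W // p, p).transpose(0, 2, 1, 3)  # (H/p, W/p, p, p)
    spectra = dctn(patches, axes=(-2, -1), norm="ortho")                 # 2D DCT per patch
    return spectra.reshape(H // p, W // p, p * p).transpose(2, 0, 1)

if __name__ == "__main__":
    frame = np.random.rand(64, 96).astype(np.float32)
    bands = frame_to_dct_bands(frame)
    print(bands.shape)   # -> (64, 8, 12); bands[0] holds the DC coefficients
    # Self-attention would then run over tokens within each of the 64 frequency bands.
```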
TransPillars: Coarse-to-Fine Aggregation for Multi-Frame 3D Object Detection
Abstract
3D object detection using point clouds has attracted increasing attention due to its wide applications in autonomous driving and robotics. However, most existing studies focus on single point cloud frames without harnessing the temporal information in point cloud sequences. In this paper, we design TransPillars, a novel transformer-based feature aggregation technique that exploits the temporal features of consecutive point cloud frames for multi-frame 3D object detection. TransPillars aggregates spatial-temporal point cloud features from two perspectives. First, it fuses voxel-level features directly from multi-frame feature maps instead of pooled instance features, to preserve instance details with contextual information that are essential to accurate object localization. Second, it introduces a hierarchical coarse-to-fine strategy to fuse multi-scale features progressively, to effectively capture the motion of moving objects and guide the aggregation of fine features. In addition, a variant of the deformable transformer is introduced to improve the effectiveness of cross-frame feature matching. Extensive experiments show that our proposed TransPillars achieves state-of-the-art performance compared to existing multi-frame detection approaches. Code will be released.
RadTex: Learning Efficient Radiograph Representations from Text Reports
Abstract
Automated analysis of chest radiography using deep learning has tremendous potential to enhance the clinical diagnosis of diseases in patients. However, deep learning models typically require large amounts of annotated data to achieve high performance -- often an obstacle to medical domain adaptation. In this paper, we build a data-efficient learning framework that utilizes radiology reports to improve medical image classification performance with limited labeled data (fewer than 1000 examples). Specifically, we examine image-captioning pretraining to learn high-quality medical image representations that train on fewer examples. Following joint pretraining of a convolutional encoder and transformer decoder, we transfer the learned encoder to various classification tasks. Averaged over 9 pathologies, we find that our model achieves higher classification performance than ImageNet-supervised and in-domain supervised pretraining when labeled training data is limited.
Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models
Authors: Margaret Li, Suchin Gururangan, Tim Dettmers, Mike Lewis, Tim Althoff, Noah A. Smith, Luke Zettlemoyer
Abstract
We present Branch-Train-Merge (BTM), a communication-efficient algorithm for embarrassingly parallel training of large language models (LLMs). We show it is possible to independently train subparts of a new class of LLMs on different subsets of the data, eliminating the massive multi-node synchronization currently required to train LLMs. BTM learns a set of independent expert LMs (ELMs), each specialized to a different textual domain, such as scientific or legal text. These ELMs can be added and removed to update data coverage, ensembled to generalize to new domains, or averaged to collapse back to a single LM for efficient inference. New ELMs are learned by branching from (mixtures of) ELMs in the current set, further training the parameters on data for the new domain, and then merging the resulting model back into the set for future use. Experiments show that BTM improves in- and out-of-domain perplexities as compared to GPT-style Transformer LMs, when controlling for training cost. Through extensive analysis, we show that these results are robust to different ELM initialization schemes, but require expert domain specialization; LM ensembles with random data splits do not perform well. We also present a study of scaling BTM into a new corpus of 64 domains (192B whitespace-separated tokens in total); the resulting LM (22.4B total parameters) performs as well as a Transformer LM trained with 2.5 times more compute. These gains grow with the number of domains, suggesting more aggressive parallelism could be used to efficiently train larger models in future work.
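Two of the operations named above, ensembling expert LMs and averaging their parameters back into a single model, can be sketched in isolation. The toy models, uniform expert weights, and omission of domain-based expert selection below are assumptions.

```python
# Hedged sketch of two BTM operations on a set of expert LMs with identical
# architecture: (1) ensemble their next-token distributions, (2) average their
# parameters back into a single model. Uniform expert weights are an assumption;
# BTM also chooses and weights experts per domain, which is omitted here.
import copy
import torch
import torch.nn as nn

def ensemble_logprobs(experts, input_ids):
    """Average the experts' next-token probability distributions."""
    probs = [expert(input_ids).softmax(dim=-1) for expert in experts]
    return torch.stack(probs).mean(dim=0).log()

def merge_experts(experts):
    """Parameter-average the experts into one LM for cheap inference."""
    merged = copy.deepcopy(experts[0])
    state = {k: torch.stack([e.state_dict()[k].float() for e in experts]).mean(0)
             for k in merged.state_dict()}
    merged.load_state_dict(state)
    return merged

if __name__ == "__main__":
    # Toy stand-ins for expert LMs: embedding + linear head over a 100-token vocab.
    def tiny_lm():
        return nn.Sequential(nn.Embedding(100, 32), nn.Linear(32, 100))
    experts = [tiny_lm() for _ in range(3)]
    tokens = torch.randint(0, 100, (1, 5))
    print(ensemble_logprobs(experts, tokens).shape)   # -> torch.Size([1, 5, 100])
    print(sum(p.numel() for p in merge_experts(experts).parameters()))
```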
Keyword: autonomous driving
Drive Right: Shaping Public's Trust, Understanding, and Preference Towards Autonomous Vehicles Using a Virtual Reality Driving Simulator
Authors: Zhijie Qiao, Xiatao Sun, Helen Loeb, Rahul Mangharam
Abstract
While autonomous vehicles are being increasingly introduced into our lives, people's misunderstanding and mistrust have become key factors hindering the acceptance of these technologies. In response to this problem, proper work must be done to increase the public's understanding and awareness of self-driving and to help them evaluate the system rationally. The method proposed in this paper is a virtual reality driving simulator that serves as a low-cost and reliable platform for autonomous vehicle demonstration and education. To test its validity, we recruited 36 participants and conducted a test drive using three different scenarios. The results show that our simulator successfully increased participants' understanding and awareness of the autonomous system and made their attitudes more positive. The methodology and findings presented in this paper can be further explored by policy makers, driving schools, and auto manufacturers to improve the legislative and technical process in the field of autonomous driving.
A Lightweight Machine Learning Pipeline for LiDAR-simulation
Authors: Richard Marcus, Niklas Knoop, Bernhard Egger, Marc Stamminger
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract
Virtual testing is a crucial task to ensure safety in autonomous driving, and sensor simulation is an important task in this domain. Most current LiDAR simulations are very simplistic and are mainly used to perform initial tests, while the majority of insights are gathered on the road. In this paper, we propose a lightweight approach for more realistic LiDAR simulation that learns a real sensor's behavior from test drive data and transfers it to the virtual domain. The central idea is to cast the simulation into an image-to-image translation problem. We train our pix2pix-based architecture on two real-world data sets, namely the popular KITTI data set and the Audi Autonomous Driving Dataset, which provide both RGB and LiDAR images. We apply this network to synthetic renderings and show that it generalizes sufficiently from real to simulated images. This strategy makes it possible to skip the sensor-specific, expensive, and complex LiDAR physics simulation in our synthetic world, and avoids both the oversimplification and the large domain gap that a clean synthetic environment would otherwise introduce.
TransPillars: Coarse-to-Fine Aggregation for Multi-Frame 3D Object Detection
Abstract
3D object detection using point clouds has attracted increasing attention due to its wide applications in autonomous driving and robotics. However, most existing studies focus on single point cloud frames without harnessing the temporal information in point cloud sequences. In this paper, we design TransPillars, a novel transformer-based feature aggregation technique that exploits the temporal features of consecutive point cloud frames for multi-frame 3D object detection. TransPillars aggregates spatial-temporal point cloud features from two perspectives. First, it fuses voxel-level features directly from multi-frame feature maps instead of pooled instance features, to preserve instance details with contextual information that are essential to accurate object localization. Second, it introduces a hierarchical coarse-to-fine strategy to fuse multi-scale features progressively, to effectively capture the motion of moving objects and guide the aggregation of fine features. In addition, a variant of the deformable transformer is introduced to improve the effectiveness of cross-frame feature matching. Extensive experiments show that our proposed TransPillars achieves state-of-the-art performance compared to existing multi-frame detection approaches. Code will be released.