weisongwen / researchTools

useful blogs for research

Stereo-Vision Aided GNSS for Automotive Navigation in Challenging Environments #191

Open weisongwen opened 8 months ago

weisongwen commented 8 months ago
Vision-based position determination plays a significant role in robotics for simultaneous localization and mapping (SLAM)-based map building. While it can be performed precisely with high-cost laser-ranging sensors, stereo-vision systems using low-cost imaging sensors can also provide sufficient accuracy at a significantly lower price and are therefore gaining increased attention. In vehicular applications, positioning systems based on GNSS are used to a large extent for navigation and emergency-reporting applications. To cover GNSS-challenged or GNSS-denied areas, additional sensors are necessary; currently, accelerometers combined with gyroscopes, or vehicle-internal measurements from wheel speeds and steering angle, are used. For all existing solutions, the effort for integrating and calibrating the sensors is significant, or their accuracy is limited. Furthermore, most of the sensors used are specific to the navigation application and either cannot be, or currently are not, used for other applications such as obstacle avoidance or lane assistance. In this context, vision sensors have some distinct advantages, delivering high-volume and extremely versatile information.

The approach proposed herein focuses on the design and accuracy constraints of multi-camera (i.e., stereo) vision-aided, GNSS-based automotive navigation and combines the advantages of both systems to provide a robust positioning solution with increased availability. While a GNSS-based solution is used for position estimation and simultaneous vision-system calibration in open-sky environments, the vision system can assist with position estimation in urban areas and act as the primary system in underground environments.

The amount of information delivered by a color stereo-vision system is massive and pushes the limits of current embedded systems or even consumer-grade workstations. Algorithms to reduce the information to a manageable amount and to extract three-dimensional information from the stereo vision system are well known in the field of image processing. Without a priori knowledge of any feature location or map, the extracted relative 3D positions of imaged feature points are tracked and, depending on the availability of GNSS measurements, are used either to complement GNSS or exclusively to estimate the user's ego-motion. Because the user's ego-motion is estimated by monitoring the positions of static objects, the problem arises of categorizing the imaged objects into static and moving objects; several constraints based on user dynamics and statistics can be used to detect the outliers caused by moving objects and exclude them from the final state estimation.

Algorithms for calibrating lens-distortion effects and stereo-camera misalignments exist and can be applied to the proposed system in case the calibration data is not already available from the camera manufacturer, as is the case with the commercial camera used herein. After camera calibration, estimating the depth to an imaged object reduces to a simple geometric equation involving the focal length, baseline, and image-sensor resolution of the cameras, which are basic design variables for systems of this kind and will be analyzed in detail. Data sets collected with commercial and low-cost sensors show good reliability for corner and object detection as well as depth estimation in laboratory environments under moderate lighting conditions.
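The abstract does not spell out how the static/moving categorization is done. One common realization is a consensus test on a single rigid-body ego-motion hypothesis, e.g., a RANSAC-style vote over tracked 3D feature pairs between two stereo frames. The sketch below is only an illustration under that assumption; the function names and thresholds are hypothetical, not taken from the paper.

```python
# Hypothetical sketch: flag features on moving objects as outliers before
# ego-motion estimation, using a RANSAC-style consensus on a rigid-body
# motion hypothesis between two stereo frames. Thresholds are illustrative.
import numpy as np

def fit_rigid_transform(p, q):
    """Least-squares rigid transform (R, t) mapping 3D points p onto q (Kabsch)."""
    cp, cq = p.mean(axis=0), q.mean(axis=0)
    H = (p - cp).T @ (q - cq)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))              # avoid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cq - R @ cp

def classify_static_features(pts_prev, pts_curr, iters=200, thresh_m=0.15):
    """Boolean mask of features consistent with one rigid ego-motion (static scene)."""
    n = len(pts_prev)
    best_mask = np.zeros(n, dtype=bool)
    rng = np.random.default_rng(0)
    for _ in range(iters):
        idx = rng.choice(n, size=3, replace=False)       # minimal sample
        R, t = fit_rigid_transform(pts_prev[idx], pts_curr[idx])
        residuals = np.linalg.norm(pts_curr - (pts_prev @ R.T + t), axis=1)
        mask = residuals < thresh_m                      # metric gating
        if mask.sum() > best_mask.sum():
            best_mask = mask                             # largest consensus so far
    return best_mask                                     # False entries: likely moving objects
```

Features rejected by such a test would then be excluded from the final state estimation, as described above.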
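The "simple geometric equation" for depth is the standard pinhole stereo relation Z = f·B/d, where the image-sensor resolution enters through the pixel pitch that converts a disparity in pixels to metric units. A minimal sketch follows; all numeric values are placeholders, not the cameras used in the paper.

```python
# Minimal sketch of the pinhole stereo depth relation Z = f * B / d,
# where f is the focal length, B the baseline and d the disparity.
# The pixel pitch ties the image-sensor resolution into the equation.
# All numbers below are placeholders, not the sensors used in the paper.

def depth_from_disparity(disparity_px, focal_length_mm, baseline_m, pixel_pitch_mm):
    """Depth in metres from a disparity measured in pixels."""
    disparity_mm = disparity_px * pixel_pitch_mm          # pixels -> metric units
    return focal_length_mm * baseline_m / disparity_mm    # Z = f * B / d

# Example: 3.6 mm lens, 12 cm baseline, 6 um pixels, 8 px disparity -> Z = 9 m
z = depth_from_disparity(disparity_px=8, focal_length_mm=3.6,
                         baseline_m=0.12, pixel_pitch_mm=0.006)
```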
Experiments cover scenarios including suburban environments, dense urban environments, and underground parking lots. The sensors used are a commercial, calibrated CCD stereo-vision camera with a relatively short baseline, an ultra-low-cost stereo-vision camera with a medium baseline, and a triple-camera sensor array combining short and large baselines, built from 0.3-megapixel CMOS imaging sensors with wide-angle lenses. Part of the work focuses on achieving a high frame rate so that high dynamics can be captured; the related effects are then mainly limited to image blurring due to long exposure times, which in turn depend on the quality of the sensor and the lighting conditions of the captured scene. The results compare information degradation between the CCD and CMOS image sensors for corner and object identification and tracking, as well as the effects of sensor-calibration inaccuracies on the measurements. The effects of the baseline and of the number of sensors on the accuracy and robustness of object position estimation and outlier detection, and eventually on the user's position estimation, are analyzed as part of this research.

Although implementation on a real-time embedded platform is not yet intended, current techniques for combining hardware-accelerated, massively parallel processing on FPGAs or ASICs with high-performance CPUs, even on embedded platforms, are likely to achieve real-time capability with the highly optimized algorithm to be developed herein. Following Moore's law, we can expect processing power to increase steadily in the next few years, while sensor quality in terms of resolution and sensitivity increases as well. As their price and power consumption decrease, vision sensors will receive major attention for a wide variety of applications.
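The influence of the baseline on depth (and hence position) accuracy can be made explicit by differentiating Z = f·B/d: the depth error grows quadratically with range and shrinks with a larger baseline, roughly sigma_Z ≈ Z²/(f·B)·sigma_d. The short numeric sketch below uses assumed values for illustration only; none of the numbers are results from the paper.

```python
# Rough sketch of how the baseline drives depth accuracy, obtained by
# differentiating Z = f * B / d:  sigma_Z ~= Z**2 / (f * B) * sigma_d.
# All values are assumptions for illustration, not results from the paper.

def depth_std(range_m, focal_length_mm, baseline_m, pixel_pitch_mm, disparity_std_px=0.5):
    """1-sigma depth error for a given range, baseline and disparity noise."""
    f_px = focal_length_mm / pixel_pitch_mm              # focal length in pixels
    return range_m ** 2 / (f_px * baseline_m) * disparity_std_px

for b in (0.12, 0.30, 0.60):                             # short vs. large baselines
    print(f"baseline {b:.2f} m -> sigma_Z at 20 m: "
          f"{depth_std(20.0, 3.6, b, 0.006):.2f} m")
```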