lartpang / awesome-segmentation-saliency-dataset

A collection of some datasets for segmentation / saliency detection. Welcome to PR...:smile:
https://lartpang.github.io/awesome-segmentation-saliency-dataset/README.html
MIT License

Some datasets about other tasks.... #15

Open lartpang opened 4 years ago

lartpang commented 4 years ago

Note

This thread records some datasets, encountered by chance, that relate to saliency or are otherwise hard to categorize.

lartpang commented 4 years ago

MVS-Synth Dataset

Brief Introduction

MVS-Synth Dataset is a photo-realistic synthetic dataset prepared for learning-based Multi-View Stereo algorithms. It consists of 120 sequences of urban scenes, each with 100 frames, captured in the video game Grand Theft Auto V. The RGB image, the ground-truth depth map, and the camera parameters of each frame are provided.

Compared to other synthetic datasets, MVS-Synth is more realistic in terms of context and shading. Compared to real-world datasets, it provides complete ground-truth disparities, covering regions such as the sky, reflective surfaces, and thin structures, whose ground truths are usually missing in real-world datasets.

Related Links

lartpang commented 4 years ago

GTA-3D Dataset

Brief Introduction

A dataset of 2D imagery, 3D point-cloud data, and 3D vehicle bounding-box labels, all generated using the Grand Theft Auto 5 game engine. The dataset contains image and depth-map data captured at 1680x1050 resolution, together with oriented 3D bounding-box labels for all vehicles. It is 55 GB in total.

Related Links

lartpang commented 4 years ago

DIH: Depth Images with Humans

Brief Introduction

The DIH dataset has been created for human body landmark detection and human pose estimation from depth images. Training and deploying good models for these tasks require large amounts of data with high-quality annotations. Unfortunately, precise manual annotation of depth images with body parts is hampered by the fact that people appear roughly as blobs, making the annotation task very time-consuming. Synthesizing images provides an easy way to introduce variability in body pose and view perspective, and high-quality annotations can be generated automatically.

However, synthetic depth images do not match real depth data in several respects: they lack the visual characteristics that arise from real depth-image generation, i.e. measurement noise and, most problematically, depth discontinuities and missing measurements. Hence, annotated real data is required to close the detection-performance gap that this mismatch can cause on real data. With this in mind, the DIH dataset also provides annotated real data captured with a Kinect 2. This real data can be used for both fine-tuning and testing.

The DIH dataset contains a set of synthetic images and a set of images acquired with a Kinect 2 depth sensor, as detailed below. Both sets contain annotations of 17 body landmarks: head, neck, shoulders, elbows, wrists, hips, knees, ankles, and eyes (Figure 1(a)). You are encouraged to see the authors' example code showing how to load and visualize the data.
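Per-frame landmark annotations like these can be handled as a simple array of (x, y) pixel coordinates. A minimal sketch of filtering out landmarks that fall outside a depth frame (the array layout and the use of negative coordinates for occluded points are assumptions for illustration, not the dataset's official annotation format; 512x424 is the Kinect 2 depth resolution):

```python
import numpy as np

def valid_landmarks(landmarks, depth_shape):
    """Return a boolean mask of landmarks that fall inside the depth frame.

    `landmarks` is an assumed (N, 2) array of (x, y) pixel coordinates
    (N = 17 for DIH-style annotations); coordinates outside the image,
    or marked negative (e.g. occluded), are flagged invalid.
    """
    landmarks = np.asarray(landmarks)
    h, w = depth_shape
    xs, ys = landmarks[:, 0], landmarks[:, 1]
    return (xs >= 0) & (xs < w) & (ys >= 0) & (ys < h)

# Toy example: a 424x512 Kinect 2 depth frame and three sample points.
pts = np.array([[100, 200], [-1, -1], [600, 10]])
mask = valid_landmarks(pts, depth_shape=(424, 512))
```

Such a mask is useful both for skipping invalid points when visualizing and for excluding them from training losses.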

Related Links

lartpang commented 4 years ago

Large-Scale RGB+D Database

Brief Introduction

We introduce an RGB-D scene dataset consisting of more than 200 indoor/outdoor scenes. The dataset contains synchronized RGB-D frames from both a Kinect v2 and a ZED stereo camera. For the outdoor scenes, disparity maps are first generated with an accurate stereo-matching method and then converted to depth using the calibration parameters. A per-pixel confidence map of the disparity is also provided. The scenes are captured at various places, e.g. offices, rooms, dormitories, an exhibition center, streets, and roads, at Yonsei University and Ewha University.
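The disparity-to-depth conversion mentioned above follows the standard pinhole stereo relation Z = f * B / d (focal length times baseline over disparity). A minimal sketch with illustrative calibration values (not the dataset's actual parameters):

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m, min_disp=1e-6):
    """Convert a disparity map (in pixels) to metric depth via Z = f * B / d.

    Pixels with near-zero disparity are marked invalid (NaN), which is
    where a per-pixel confidence map like the one in this dataset helps.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full_like(disparity, np.nan)
    valid = disparity > min_disp
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth

# Illustrative calibration values (NOT taken from the dataset):
disp = np.array([[32.0, 16.0], [0.0, 8.0]])
depth = disparity_to_depth(disp, focal_px=1400.0, baseline_m=0.12)
```

Note the inverse relationship: halving the disparity doubles the depth, which is why distant regions are the noisiest and benefit most from the confidence map.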

This dataset has been used to train convolutional neural networks in the authors' project [1] and papers [2], [3], [4], [5], "High quality 2D-to-multiview contents generation from large-scale RGB-D database", supported by the Institute for Information and Communications Technology Promotion (IITP) grant funded by the Korean Government (MSIP) (R0115-16-1007).

Related Links

lartpang commented 4 years ago

B-T4SA

Brief Introduction

Much progress has been made in the field of sentiment analysis in the past years. Researchers have relied on textual data for this task, and only recently have they started investigating approaches to predict sentiment from multimedia content. With the increasing amount of data shared on social media, there is also a rapidly growing interest in approaches that work "in the wild", i.e. that are able to deal with uncontrolled conditions.

In this work, we faced the challenge of training a visual sentiment classifier starting from a large set of user-generated and unlabeled content. In particular, we collected more than 3 million tweets containing both text and images, and we leveraged the sentiment polarity of the textual content to train a visual sentiment classifier. To the best of our knowledge, this is the first time that a cross-media learning approach has been proposed and tested in this context.

We assessed the validity of our model by conducting comparative studies and evaluations on a benchmark for visual sentiment analysis. Our empirical study shows that although the text associated with each image is often noisy and only weakly correlated with the image content, it can be profitably exploited to train a deep convolutional neural network that effectively predicts the sentiment polarity of previously unseen images. The dataset used in our experiments, named T4SA (Twitter for Sentiment Analysis), is available on this page.
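The cross-media idea boils down to using the sentiment polarity predicted from each tweet's text as a (noisy) label for the attached image, and training an image classifier on those pairs. A schematic sketch of the labeling step (the confidence threshold and the stand-in polarity function are assumptions for illustration, not the paper's exact pipeline):

```python
def weak_visual_labels(tweets, text_polarity, threshold=0.8):
    """Assign each image the sentiment class of its tweet's text.

    Keeps only tweets whose text polarity is confident enough, since the
    text is a noisy proxy for the image content. `text_polarity` maps a
    text to a (label, confidence) pair; here it stands in for any trained
    text sentiment classifier.
    """
    labeled = []
    for image_id, text in tweets:
        label, confidence = text_polarity(text)
        if confidence >= threshold:
            labeled.append((image_id, label))
    return labeled

# Toy stand-in polarity function (a real pipeline would use a trained model).
def toy_polarity(text):
    if "love" in text:
        return "positive", 0.95
    if "hate" in text:
        return "negative", 0.9
    return "neutral", 0.5

tweets = [("img1", "love this view"), ("img2", "hate the rain"), ("img3", "ok day")]
labels = weak_visual_labels(tweets, toy_polarity)
```

Filtering on confidence trades dataset size for label quality, which matters when the text-image correlation is weak to begin with.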

Related Links