Truth Discovery in DBGroup@SUSTech
This is my graduation project for my bachelor’s degree.
In this project, we implement several existing Truth Discovery algorithms, including LTM (VLDB 2012) and DART (VLDB 2018), as well as naive baselines such as Majority Vote. We also propose a two-stage model to infer the truth from conflicting data.
First stage: estimate the quality of each source using two quality measures, recall and specificity.
Second stage: use the estimated source quality as the initialization and perform truth discovery.
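The first stage can be illustrated with a minimal sketch. This is not the project's actual implementation: it assumes a multi-truth setting in which each source lists a set of values per object, and it bootstraps provisional truth labels with a simple majority vote, which is an assumption made only for this example. The names `majority_vote`, `source_quality` and the toy `claims` data are hypothetical.

```python
from collections import defaultdict


def majority_vote(claims):
    """Label a value as true if more than half of the sources covering
    the object claim it. `claims` maps source -> {object: set(values)}."""
    votes = defaultdict(lambda: defaultdict(int))
    coverage = defaultdict(int)
    for objects in claims.values():
        for obj, values in objects.items():
            coverage[obj] += 1
            for value in values:
                votes[obj][value] += 1
    truth = {obj: {v for v, c in votes[obj].items() if 2 * c > n}
             for obj, n in coverage.items()}
    candidates = {obj: set(votes[obj]) for obj in coverage}
    return truth, candidates


def source_quality(claims, truth, candidates):
    """Return {source: (recall, specificity)} against the given labels.
    A source that covers an object implicitly denies every candidate
    value it does not list (an assumption of this sketch)."""
    quality = {}
    for source, objects in claims.items():
        tp = fp = fn = tn = 0
        for obj, values in objects.items():
            for value in candidates[obj]:
                claimed = value in values
                is_true = value in truth[obj]
                if claimed and is_true:
                    tp += 1
                elif claimed:
                    fp += 1
                elif is_true:
                    fn += 1
                else:
                    tn += 1
        recall = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        quality[source] = (recall, specificity)
    return quality


# Toy data (hypothetical): three sources listing the authors of one book.
claims = {
    "s1": {"book1": {"A", "B"}},
    "s2": {"book1": {"A", "B"}},
    "s3": {"book1": {"A", "C"}},
}
truth, candidates = majority_vote(claims)
quality = source_quality(claims, truth, candidates)
print(quality)
```

In a two-stage setup, per-source values like these would serve as the starting point for the truth discovery algorithm in the second stage, instead of a uniform or random initialization.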
We thank Xueling LIN for sharing the data used in her work (Domain-Aware Multi-Truth Discovery from Conflicting Sources, VLDB 2018).
The original data contains two datasets:
We perform data cleaning on both datasets and use them as inputs for our model.
The dataset download links provided by Xueling are:
The validation database for our experiments is available under the validation
directory. We selected 400 books and 320 movies with conflicts from the raw database for validation.