Open joesingo opened 5 years ago
Initial work done in e59cc6c
Remaining to do:
Current problem is that there is a chance for sources to make no claims at all, which causes problems e.g. with Average.Log. FIXED in 6e8cad9
It would be good to add an option that makes incorrect source claims closer to or further from the true value depending on source trust, e.g. a source with high trust makes claims close to the true value.
This would make it possible to use claim implications based on how far apart variable guesses are (e.g. for TruthFinder).
As it stands, incorrect guesses are chosen randomly, which means that a claim `X=v` being true does not imply anything about `X=v+1`.
Once the above has been done, the data created in `synthetic_experiment.py` can be changed so that TruthFinder can use implications data to hopefully achieve better performance.
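A minimal sketch of the trust-dependent option, not taken from the existing code: the function name `noisy_claim`, the integer domain `0..domain_size-1`, and the linear scaling of the maximum offset with `1 - trust` are all assumptions made for illustration.

```python
import random


def noisy_claim(true_value, trust, domain_size, rng=random):
    """Return an incorrect claim for a variable with integer domain
    0..domain_size-1 (hypothetical helper, not part of the repo).

    Assumption: trust lies in [0, 1] and the largest allowed distance
    from the true value shrinks linearly as trust grows.
    """
    # A fully trusted source may still be off by one; an untrusted
    # source may pick any wrong value in the domain.
    max_offset = max(1, round((1 - trust) * (domain_size - 1)))
    candidates = [
        v for v in range(domain_size)
        if v != true_value and abs(v - true_value) <= max_offset
    ]
    return rng.choice(candidates)
```

With claims generated this way, the distance between two claimed values carries information, which is what a distance-based implication for TruthFinder would need.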
See the TruthFinder paper, section 4.3, for one approach to the above. They perform the following:
- draw the true value `z` uniformly from [1000, 10000]
- take an interval of width `z / 2`, placed so that the true value `z` appears at any position within the interval with equal probability

See here for an existing truth discovery library including a tool to create synthetic datasets: https://github.com/daqcri/DAFNA-EA/blob/master/README.md
In particular see the PDF documentation for the synthetic data creation for some ideas.
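The TruthFinder-style procedure mentioned above could be sketched roughly as follows. This assumes that `z / 2` is the width of the interval and that values are continuous; the function name `sample_value_and_interval` is hypothetical, and what is then drawn from the interval (presumably the incorrect claims) is not specified here.

```python
import random


def sample_value_and_interval(rng=random):
    """Draw a true value z and an interval of width z / 2 containing it
    (hypothetical helper sketching the steps listed above)."""
    z = rng.uniform(1000, 10000)
    width = z / 2  # assumption: z / 2 is the interval width
    # Choose where z sits inside the interval, uniformly at random.
    offset = rng.uniform(0, width)
    low = z - offset
    return z, (low, low + width)
```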