joesingo / truthdiscovery

Python3 library implementing a selection of truth-discovery algorithms
GNU General Public License v3.0

Refine synthetic dataset generation #9

Open joesingo opened 5 years ago

joesingo commented 5 years ago

Initial work done in e59cc6c

Remains to do:

joesingo commented 5 years ago

The current problem is that there is a chance for sources to make no claims at all, which causes problems, e.g. with Average.Log. FIXED in 6e8cad9
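For context, a minimal sketch of the kind of guard that rules out zero-claim sources (a hypothetical helper for illustration only, not the actual fix in 6e8cad9):

```python
import numpy as np

def sample_claimed_variables(num_variables, claim_probability, rng):
    """
    Choose which variables a source makes claims about. Each variable is
    claimed independently with probability claim_probability, but at least
    one claim is forced so that algorithms such as Average.Log (which takes
    the log of a source's claim count) never see a zero-claim source.
    """
    mask = rng.random(num_variables) < claim_probability
    if not mask.any():
        # Force at least one claim by picking a variable uniformly at random
        mask[rng.integers(num_variables)] = True
    return np.flatnonzero(mask)
```

For example, `sample_claimed_variables(5, 0.3, np.random.default_rng(0))` always returns a non-empty array of variable indices.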

joesingo commented 5 years ago

It would be good to add an option to make incorrect source claims more or less close to the true values depending on source trust, e.g. a source with high trust makes claims close to the true value.

This would make it possible to use claim implications based on how far apart variable guesses are (e.g. for TruthFinder).

As it stands, incorrect guesses are chosen randomly, which means that a claim X=v being true does not imply anything about X=v+1.
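A rough sketch of how both ideas could fit together (the function names, the exponential weighting and the assumption of integer values in a fixed domain are illustrative choices, not anything taken from the library):

```python
import numpy as np

def sample_incorrect_value(true_value, trust, domain_size, rng):
    """
    Hypothetical sketch: choose an incorrect value for a variable, biased so
    that high-trust sources land close to the true value while low-trust
    sources may land anywhere. Assumes integer values in range(domain_size).
    """
    candidates = np.array([v for v in range(domain_size) if v != true_value])
    distances = np.abs(candidates - true_value)
    # Higher trust -> stronger preference for values near the truth
    weights = np.exp(-trust * distances)
    weights /= weights.sum()
    return rng.choice(candidates, p=weights)

def implication(value1, value2, scale=1.0):
    """
    Hypothetical distance-based implication in [-1, 1]: claims for nearby
    values support each other, claims for distant values count against each
    other (in the spirit of TruthFinder's implication between facts).
    """
    return 2 * np.exp(-abs(value1 - value2) / scale) - 1
```

With something like this, a high-trust source that is wrong about X=v would usually claim a value near v, so an implication based on |v1 - v2| carries real information.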

joesingo commented 5 years ago

Once the above has been done, the data created in synthetic_experiment.py can be changed so that TruthFinder can make use of implication data and hopefully achieve better performance.
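A hedged sketch of roughly what that wiring could look like; the import path, the `implication_function` keyword and the `run` method are assumptions about the library's API, and the real change to synthetic_experiment.py may differ:

```python
# All names below are assumptions, not the library's confirmed API.
from truthdiscovery import TruthFinder  # assumed import path

def run_truthfinder_with_implications(dataset, implication):
    """
    Run TruthFinder on a synthetic dataset with a distance-based implication
    function (e.g. the implication() sketch in the earlier comment).
    """
    alg = TruthFinder(implication_function=implication)  # assumed kwarg name
    return alg.run(dataset)  # assumed method name
```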

joesingo commented 5 years ago

See the TruthFinder paper, section 4.3, for one approach to the above. They perform the following:

joesingo commented 5 years ago

See here for an existing truth-discovery library that includes a tool for creating synthetic datasets: https://github.com/daqcri/DAFNA-EA/blob/master/README.md

In particular, see the PDF documentation for synthetic data creation for some ideas.