timrdf / DataFAQs

LINKED DATA QUALITY REPORTS

How does DataFAQs impact vocabulary selection? #23

Open olyerickson opened 12 years ago

olyerickson commented 12 years ago

Vocabulary selection is critically important to creating "high quality" Linked Data; indeed, in current W3C Government Linked Data (GLD) WG discussions surrounding GLD best practices, data quality is an implicit (if not explicit) driver for vocabulary selection.

So the question is, how does DataFAQs play a role in vocabulary selection? Would DataFAQs be used as part of an iterative process?

momochen commented 12 years ago

Hi John,

Vocabulary selection is a critical criterion. DataFAQs will help by initially ranking vocabulary namespaces with different priorities (scores), based on a statistical analysis of the proportion each namespace currently contributes to the LOD cloud. For example, data using the namespace http://dbpedia.org/... might rank higher than data using http://tw.rpi.edu/... at the moment. Later it will take more factors into consideration, such as user voting. So ranking those datasets will be a dynamic and iterative process.
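To make the statistical step concrete, here is a minimal sketch of ranking namespaces by their share of observed term usage. It is only an illustration, not DataFAQs' actual code: the function names, the namespace-splitting heuristic, and the sample URIs are all assumptions made for the example.

```python
from collections import Counter
from urllib.parse import urldefrag


def namespace_of(uri: str) -> str:
    """Guess a term's namespace (assumed heuristic): cut at '#' if the URI
    has a fragment, otherwise cut after the last '/'."""
    base, fragment = urldefrag(uri)
    if fragment:
        return base + "#"
    return uri.rsplit("/", 1)[0] + "/"


def rank_namespaces(term_uris):
    """Return (namespace, share) pairs, highest share of usage first."""
    counts = Counter(namespace_of(u) for u in term_uris)
    total = sum(counts.values())
    return [(ns, n / total) for ns, n in counts.most_common()]


# Hypothetical sample of term URIs, standing in for terms seen in a crawl.
sample_terms = [
    "http://xmlns.com/foaf/0.1/name",
    "http://xmlns.com/foaf/0.1/knows",
    "http://purl.org/dc/terms/title",
    "http://www.w3.org/2000/01/rdf-schema#label",
]

ranking = rank_namespaces(sample_terms)
for ns, share in ranking:
    print(f"{share:.2f}  {ns}")
```

In a real setting the input would be term occurrences gathered across many datasets, and the resulting shares would serve only as the initial scores.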

Cheers Yu

On Jan 25, 2012, at 8:41 AM, John S. Erickson, Ph.D. wrote:

Vocabulary selection is critically important to creating "high quality" Linked Data; indeed, in current W3C Government Linked Data (GLD) WG discussions surrounding GLD best practices, data quality is an implicit (if not explicit) driver for vocabulary selection.

So the question is, how does DataFAQs play a role in vocabulary selection? Would DataFAQs be used as part of an iterative process?



olyerickson commented 12 years ago

Thanks Yu! Is this documented somewhere? I would like to point my colleagues on the W3C GLD WG to some statement about how DataFAQs can be/will be used as part of a vocabulary selection process.

olyerickson commented 12 years ago

Note that LOV http://labs.mondeca.com/dataset/lov/ has a vocabulary ranking system. How does theirs compare with what DataFAQs can/will do?

momochen commented 12 years ago

Not explicitly at the moment. I will update the wiki page and let you know once it is done. Thanks!

Cheers Yu

momochen commented 12 years ago

I think there are many things we will do better.

Basically, DataFAQs evaluates the quality of a dataset as a whole, so it considers not only vocabulary but also other relevant factors, such as self-consistency and completeness. Vocabulary is just one perspective; many more are required to evaluate the quality of a dataset.

Besides that, what sets DataFAQs apart is its user-involved approach and feedback loop. This complements the statistical vocabulary ranking with input from both data producers and data consumers.
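As a rough sketch of how user feedback could complement the statistical ranking, the snippet below blends a namespace's usage share with a smoothed approval rate from votes. The formula, the `vote_weight` parameter, and the function name are all assumptions made for illustration; nothing in this thread specifies how DataFAQs combines the two signals.

```python
def blended_score(usage_share: float, upvotes: int, downvotes: int,
                  vote_weight: float = 0.3) -> float:
    """Combine a statistical usage share (0..1) with user votes.

    Assumed scheme: approval is Laplace-smoothed, so a namespace with no
    votes gets a neutral 0.5 rather than an undefined or extreme value.
    """
    approval = (upvotes + 1) / (upvotes + downvotes + 2)
    return (1 - vote_weight) * usage_share + vote_weight * approval


# Hypothetical inputs: same usage share, opposite voting outcomes.
liked = blended_score(0.2, upvotes=9, downvotes=1)
disliked = blended_score(0.2, upvotes=1, downvotes=9)
print(liked > disliked)
```

Re-running the score as votes arrive is one simple way to picture the iterative part: the statistical share gives the starting point, and feedback shifts it over time.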

So I think LOV corresponds only to the very first step of DataFAQs, since vocabulary alone is not enough to evaluate a dataset.


timrdf commented 12 years ago

I made https://github.com/timrdf/DataFAQs/wiki/Assisting-vocabulary-selection