jimregan / mlode

Automatically exported from code.google.com/p/mlode

New languoid dump #33

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Download http://www.glottolog.org/downloadarea/languoids.n3.tgz
2. Untar the archive (yields 10 GB)
3. Check the RDF:
sebastian_nordhoff@lingua182:~/rdf$ rapper -i turtle 0-10000languoids.n3 >/dev/null
rapper: Parsing URI file:///home/sebastian_nordhoff/rdf/0-10000languoids.n3 with parser turtle
rapper: Serializing with serializer ntriples
rapper: Parsing returned 1277081 triples
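
Since the dump comes in several subsets, it may help to collect the triple counts from the rapper runs automatically. Below is a minimal sketch that only parses the summary line shown above; the function name `triple_counts` is made up for illustration, and the sample log is the one pasted in this report (the names of the other subset files are not assumed here).

```python
import re

# Matches rapper's summary line, e.g. "rapper: Parsing returned 1277081 triples".
SUMMARY = re.compile(r"^rapper: Parsing returned (\d+) triples\s*$", re.MULTILINE)

def triple_counts(log_text):
    """Return the triple count from every rapper summary line in log_text."""
    return [int(n) for n in SUMMARY.findall(log_text)]

# Sample log: the rapper messages quoted in this report, with wrapped lines joined.
log = """\
rapper: Parsing URI file:///home/sebastian_nordhoff/rdf/0-10000languoids.n3 with parser turtle
rapper: Serializing with serializer ntriples
rapper: Parsing returned 1277081 triples
"""

counts = triple_counts(log)
total = sum(counts)
print(counts, total)
```

Concatenating the messages from all subset runs into one log and passing it to `triple_counts` would give one count per subset, and `total` the overall number of triples.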

The dump is split into 12 subsets for reasons of size.
The dump is very large because the recursive structure of the tree is spelled
out explicitly. It currently states that Saxonian is a member of German and
that Saxonian is a member of West Germanic, even though the latter follows
from German being a member of West Germanic.

By the same token, "Sächsisches Wörterbuch" is attached to each of Saxonian,
Germanic, West Germanic, and Indo-European.
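
To make the redundancy concrete, here is a rough sketch of how such a materialized dump grows. The five-node chain and the single reference are a toy example built from the names mentioned above, not the actual Glottolog data, and `ancestors` and `materialize` are hypothetical helper names.

```python
# Toy hierarchy (child -> direct parent); an assumption for illustration only.
PARENT = {
    "Saxonian": "German",
    "German": "West Germanic",
    "West Germanic": "Germanic",
    "Germanic": "Indo-European",
}
# A reference attached directly to one languoid.
DIRECT_REFS = {"Sächsisches Wörterbuch": "Saxonian"}

def ancestors(node, parent):
    """Return all ancestors of node, nearest first."""
    chain = []
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain

def materialize(parent, direct_refs):
    """Spell out every implied membership and every implied reference attachment."""
    member_of = {(child, anc) for child in parent for anc in ancestors(child, parent)}
    ref_on = set()
    for ref, node in direct_refs.items():
        for target in [node] + ancestors(node, parent):
            ref_on.add((ref, target))
    return member_of, ref_on

member_of, ref_on = materialize(PARENT, DIRECT_REFS)
print(len(PARENT), "direct memberships ->", len(member_of), "materialized")
print(len(DIRECT_REFS), "direct reference ->", len(ref_on), "materialized")
```

In this toy chain, 4 direct membership statements become 10 and 1 reference attachment becomes 5, which illustrates why the full dump gets so large.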

Recursive searches in relational databases are already slow, so I imagine that
recursive searches in triple stores are even slower. This is why the dataset is
denormalized.
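
The trade-off can be sketched as follows. This is a plain in-memory illustration with a toy hierarchy, not a measurement of any database or triple store; `hops_to_group` and `CLOSURE` are names invented for the example.

```python
# Toy hierarchy (child -> direct parent); an assumption for illustration only.
PARENT = {
    "Saxonian": "German",
    "German": "West Germanic",
    "West Germanic": "Germanic",
    "Germanic": "Indo-European",
}

def hops_to_group(child, group, parent):
    """Normalized data: walk up one parent link at a time.

    Returns the number of links followed, or None if group is not an ancestor.
    """
    node, hops = child, 0
    while node in parent:
        node = parent[node]
        hops += 1
        if node == group:
            return hops
    return None

# Denormalized data: every (member, group) pair is stored explicitly.
CLOSURE = set()
for start in PARENT:
    node = start
    while node in PARENT:
        node = PARENT[node]
        CLOSURE.add((start, node))

def is_member(child, group, closure):
    """Denormalized data: a single lookup, whatever the depth of the tree."""
    return (child, group) in closure

print(hops_to_group("Saxonian", "Indo-European", PARENT))
print(is_member("Saxonian", "Indo-European", CLOSURE))
```

The recursive variant needs one step per level of the tree, while the denormalized variant answers with one lookup at the cost of storing all implied pairs.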

What is the expected output? What do you see instead?
To me, the RDF looks fine ;) but I suppose there will be some issues.

How many triples are affected? (if less than 3-5% of the whole data set,
please set priority to _low_)
10^8

Please use labels and text to provide additional information.

Original issue reported on code.google.com by sebastia...@googlemail.com on 2 Aug 2012 at 9:26

GoogleCodeExporter commented 9 years ago

Original comment by kur...@googlemail.com on 25 Aug 2012 at 7:46

GoogleCodeExporter commented 9 years ago

Original comment by sebastia...@googlemail.com on 5 Sep 2012 at 11:28

GoogleCodeExporter commented 9 years ago

Original comment by mohamedd...@gmail.com on 5 Sep 2012 at 2:43