spencermountain / wikidata-freebase

helping out in the wikidata migration

Data flow & provenance? #3

Open tfmorris opened 9 years ago

tfmorris commented 9 years ago

Cool project. Any chance you could add a few more details as to the provenance of the various files and the data flow?

I'm guessing that the topic map, mappings.json, comes from the Freebase RDF (https://developers.google.com/freebase/data#freebase-wikidata-mappings) not Wikidata, since Wikidata has, somewhat bizarrely, only incorporated 1.1M of the 2.1M mappings (http://www.wikidata.org/wiki/Wikidata:Database_reports/List_of_properties/Top100).

Is this list of 295 property IDs in property_names.js derived from this list of topics somehow or does it come from an external source?

If the output is something that's useful and at least relatively stable, it might be a nice convenience to include a snapshot in the repo. I'm going to poke around a little more and if I figure out enough to be useful, I'll send a PR with updates for the ReadMe, but you probably can do it a lot more easily. :-)

spencermountain commented 9 years ago

hey thanks tom, exactly! Given that the 4.7M wikipedia articles are in both WD and freebase, it is completely insane that both projects have chosen to link only ~1M of the topics. I can't believe how different the cultures are between both projects. It's crazy-making.

yeah, i forget where the property_names are from exactly, they're just used so they can be pretty-printed here for denny, though I'd love to do a bigger run at this. More data means more results. happy to merge anything you do. Honestly, i've given up on the WD import, wikidata is just shit. I'm really impressed with the latest dbpedia version, and they seem to be way more aligned with the freebase ethos, though it's sad-times -spence

dav009 commented 9 years ago

@tfmorris I've done this with a bit more than 1M. I'm trying to make the code less dependent, as I was querying two local neo4j databases.

@spencermountain sad to hear that :( I'm looking for a freebase alternative and it is turning out to be a hard choice. On one hand I love wikidata's snak design, but in terms of coverage it is still a bit poor. It will take some years to get mature enough, I guess.