hadyelsahar / extraction-framework

The software used to extract structured data from Wikipedia

Missing Wikidata node #1

Open JulienCojan opened 11 years ago

JulienCojan commented 11 years ago

Hi Hady,

Thanks for this work, I am happy to get links with Wikidata. But it seems some nodes from Wikidata were not included. For instance, I could not find Q19675 (http://www.wikidata.org/wiki/Q19675, the Louvre museum) in the sameAs dump, and the same goes for Q90 (http://www.wikidata.org/wiki/Q90, Paris). Presumably as a consequence, the corresponding interlanguage links are missing too; I didn't check the other datasets.
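
For reference, this is roughly how I checked (a minimal sketch; the dump file name is a placeholder and I only look for the bare QIDs, since I am not sure of the exact URI form used in the dump):

```python
import gzip

# Scan the (gzipped) sameAs dump for the two QIDs.
WANTED = {"Q19675", "Q90"}
found = set()

with gzip.open("wikidata-sameas.nt.gz", "rt", encoding="utf-8") as dump:
    for line in dump:
        for qid in WANTED:
            # The trailing delimiter avoids matching e.g. Q900 when looking for Q90.
            if qid + ">" in line or qid + "/" in line or qid + " " in line:
                found.add(qid)

print("found:  ", sorted(found))
print("missing:", sorted(WANTED - found))
```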

Cheers, Julien

hadyelsahar commented 11 years ago

Yeah, normally if you don't find an item in the LL dump you won't find it in the sameAs dump either, because both are generated from the same resource.

I'll check that and get back to you, thanks.

hadyelsahar commented 11 years ago

Because running the extraction on the full Wikidata dump takes a lot of time, I tried it again on a subset of the dump that contains the Louvre page, and it worked fine: the item was contained in all output dumps (LL, labels, sameAs).
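
In case it helps reproduce, here is roughly how such a test subset can be cut out of the full XML dump (just a sketch; the file name and page titles are placeholders, and the output is not a complete MediaWiki export):

```python
import gzip
import xml.etree.ElementTree as ET

WANTED_TITLES = {"Q19675", "Q90"}

def localname(tag):
    # MediaWiki export XML is namespaced; strip the namespace prefix.
    return tag.rsplit("}", 1)[-1]

with gzip.open("wikidatawiki-pages-articles.xml.gz", "rb") as f, \
     open("subset.xml", "w", encoding="utf-8") as out:
    out.write("<mediawiki>\n")
    for _, elem in ET.iterparse(f):
        if localname(elem.tag) == "page":
            title = next((c.text for c in elem if localname(c.tag) == "title"), None)
            if title in WANTED_TITLES:
                out.write(ET.tostring(elem, encoding="unicode"))
            elem.clear()  # keep memory bounded on a large dump
    out.write("</mediawiki>\n")
```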

I'm not sure what causes this, but maybe it will be fixed in the next run. Afterwards I'll also compute statistics on how many pages are in Wikidata versus how many pages were extracted, to keep track of lost entities.
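
Something along these lines should do for the statistics (a rough sketch; file names are placeholders, and it just compares the QIDs seen in the raw XML dump against those appearing in one extracted dataset):

```python
import gzip
import re

# Item pages in the Wikidata XML dump are titled "Q<number>".
QID_TITLE = re.compile(r"<title>(Q\d+)</title>")

def ids_in_xml_dump(path):
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return {m.group(1) for line in f for m in QID_TITLE.finditer(line)}

def ids_in_extracted(path):
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return {m.group(1) for line in f for m in re.finditer(r"\b(Q\d+)\b", line)}

dump_ids = ids_in_xml_dump("wikidatawiki-pages-articles.xml.gz")
extracted_ids = ids_in_extracted("wikidata-sameas.nt.gz")

print("items in dump:      ", len(dump_ids))
print("items extracted:    ", len(extracted_ids))
print("lost items (sample):", sorted(dump_ids - extracted_ids)[:20])
```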