usc-isi-i2 / kgtk

Knowledge Graph Toolkit
https://kgtk.readthedocs.io/en/latest/
MIT License

Maybe incorrectly extracted birthdate #256

Open szeke opened 3 years ago

szeke commented 3 years ago

For https://www.wikidata.org/wiki/Q16515807 the birth date is "21st century", but it shows up as 2010-00-00 with century precision. Why 2010? Is that what is in the data?

id                                   node1      label  node2                             rank    node2;wikidatatype
Q16515807-P106-Q33999-285a55e8-0     Q16515807  P106   Q33999                            normal  wikibase-item
Q16515807-P106-Q970153-f9c11847-0    Q16515807  P106   Q970153                           normal  wikibase-item
Q16515807-P1477-69fe1d-f8504ec5-0    Q16515807  P1477  'Natálie Miroslava Havelková'@cs  normal  monolingualtext
Q16515807-P19-Q155993-9c796f27-0     Q16515807  P19    Q155993                           normal  wikibase-item
Q16515807-P21-Q6581072-70378435-0    Q16515807  P21    Q6581072                          normal  wikibase-item
Q16515807-P2605-8cb85f-9a0573db-0    Q16515807  P2605  "292876"                          normal  external-id
Q16515807-P27-Q213-98d068e5-0        Q16515807  P27    Q213                              normal  wikibase-item
Q16515807-P31-Q5-3aba8c99-0          Q16515807  P31    Q5                                normal  wikibase-item
Q16515807-P569-42a69c-36932550-0     Q16515807  P569   ^2010-00-00T00:00:00Z/7           normal  time
Q16515807-P735-Q28732407-65ef2f48-0  Q16515807  P735   Q28732407                         normal  wikibase-item
Q16515807-P735-Q923005-d5e0f80d-0    Q16515807  P735   Q923005                           normal  wikibase-item
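The P569 value `^2010-00-00T00:00:00Z/7` ends in `/7`, which is the Wikidata time precision code for "century". At that precision only the century is significant, so 2010 should not be read as an exact year; the `00` month and day mean "not specified". The following is a minimal Python sketch, not KGTK's own parser: the regex is a simplified assumption about the literal's shape, and the helper names (`parse_kgtk_date`, `describe`, `ordinal`) are hypothetical. It renders a century-precision value as an ordinal century, assuming the conventional definition in which the 21st century spans 2001 to 2100.

```python
import re
from typing import NamedTuple

# Simplified pattern for a KGTK date-and-time literal such as
# "^2010-00-00T00:00:00Z/7": a caret, an optional sign, year-month-day,
# a time ending in "Z", then "/" and the Wikidata precision code.
# This is an illustrative assumption, not KGTK's own parser.
KGTK_DATE_RE = re.compile(
    r"^\^(?P<sign>[+-]?)(?P<year>\d{4,})-(?P<month>\d{2})-(?P<day>\d{2})"
    r"T\d{2}:\d{2}:\d{2}Z/(?P<precision>\d{1,2})$"
)

# Subset of Wikidata time precision codes.
PRECISION_NAMES = {6: "millennium", 7: "century", 8: "decade",
                   9: "year", 10: "month", 11: "day"}


class ParsedDate(NamedTuple):
    year: int
    month: int  # 0 means "not specified"
    day: int    # 0 means "not specified"
    precision: int


def parse_kgtk_date(value: str) -> ParsedDate:
    """Split a KGTK date literal into its numeric parts (hypothetical helper)."""
    m = KGTK_DATE_RE.match(value)
    if m is None:
        raise ValueError(f"not a KGTK date literal: {value!r}")
    year = int(m["year"])
    if m["sign"] == "-":
        year = -year
    return ParsedDate(year, int(m["month"]), int(m["day"]), int(m["precision"]))


def ordinal(n: int) -> str:
    """1 -> '1st', 2 -> '2nd', 11 -> '11th', 21 -> '21st'."""
    if 10 <= n % 100 <= 20:
        return f"{n}th"
    suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def describe(value: str) -> str:
    """Render a KGTK date only at the precision it actually carries."""
    d = parse_kgtk_date(value)
    if d.precision >= 9:
        return str(d.year)  # year or finer: the year itself is meaningful
    if d.precision == 7 and d.year > 0:
        # Assumed convention: the Nth century spans (N-1)*100+1 .. N*100.
        return f"{ordinal((d.year - 1) // 100 + 1)} century"
    name = PRECISION_NAMES.get(d.precision, "coarse")
    return f"{d.year} ({name} precision only)"


print(describe("^2010-00-00T00:00:00Z/7"))   # 21st century
print(describe("^1960-11-05T00:00:00Z/11"))  # 1960
```

Under this convention a stored year of 2010 and one of 2050 would both render as "21st century", which is why a function that returns only the year loses the information that the value is coarse.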
szeke commented 3 years ago

Marking as priority 1 because the kgtk_date_year function returns it as 2010. It is possible that the problem is in the profiler, which needs to pay attention to the precision.
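One way a profiler could "pay attention to the precision" is to count a date in a per-year histogram only when its precision is year or finer (Wikidata precision code 9 or higher) and to report coarser values separately. The following is a hypothetical Python sketch of that idea; the function names and the sample data are invented for illustration and do not describe how the KGTK profiler currently works.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

YEAR_PRECISION = 9  # Wikidata precision code for "year"


def histogram_year(year: int, precision: int) -> Optional[int]:
    """Return the year to count, or None when the value is coarser than a year.

    For decade, century, or millennium precision the stored year is only a
    representative value, so counting it as an exact year could create
    artificial spikes."""
    return year if precision >= YEAR_PRECISION else None


def year_histogram(dates: Iterable[Tuple[int, int]]) -> Tuple[Counter, int]:
    """Count (year, precision) pairs per year; also return how many were skipped."""
    counts: Counter = Counter()
    skipped = 0
    for year, precision in dates:
        y = histogram_year(year, precision)
        if y is None:
            skipped += 1
        else:
            counts[y] += 1
    return counts, skipped


# Hypothetical sample: one century-precision value, three year-or-finer values.
sample = [(2010, 7), (2010, 11), (1985, 9), (2010, 9)]
counts, skipped = year_histogram(sample)
print(counts[2010], counts[1985], skipped)  # 2 1 1
```

An alternative design would bin coarse-precision dates into their own decade or century buckets instead of skipping them, which keeps them visible without inflating a single year.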

For now, let's understand why the year is 2010.

The whole issue was triggered by the following histogram from the profiler, which shows a spike of births in 2010. It turns out that many of them are horses, but I also found this person, and it sounds like the histogram is showing two problems:

[Image: profiler histogram of birth dates, with a spike at 2010]