Closed Selina-Mutz closed 4 years ago
The problem was that the P31:Q515 claim for "Frankfurt am Main" (Q1794) has a normal rank, while other P31 claims on that item have a preferred rank. When a property has preferred-rank claims, its non-preferred claims are considered non-truthy (see https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Truthy_statements), so the filter did not match it. Since the behavior you describe is probably what is expected in most cases, I made a patch (b13c7ca) to include non-truthy claims in the filter test. It is now published as wikibase-dump-filter@v5.0.1 (beware of the module name change).
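To make the rank issue concrete, here is a minimal Python sketch of the truthy rule as described in the RDF dump format page: if a property has any preferred-rank claims, only those are truthy; otherwise the normal-rank ones are. This is not the code from the patch. The (value, rank) pairs are a simplified stand-in for the real claim JSON, the function names are made up for illustration, and "Q_OTHER" is a placeholder for whatever preferred value the item actually carries. How the tool treats deprecated claims is not covered here.

```python
# Simplified model of Wikibase "truthy" statement selection.
# Each claim is a hypothetical (value, rank) pair, not the real JSON layout.

def truthy_values(claims):
    """Return the values of the best-ranked, non-deprecated claims."""
    preferred = [value for value, rank in claims if rank == "preferred"]
    if preferred:
        # Any preferred claim hides all normal-rank claims for this property.
        return preferred
    return [value for value, rank in claims if rank == "normal"]

def all_values(claims):
    """Return every value regardless of rank (an assumption for this sketch)."""
    return [value for value, _rank in claims]

# Illustrative P31 claims: Q515 at normal rank next to a preferred placeholder.
frankfurt_p31 = [("Q515", "normal"), ("Q_OTHER", "preferred")]

wanted = {"Q515", "Q7930989", "Q15284"}

# Matching on truthy claims only misses the item.
print(bool(wanted & set(truthy_values(frankfurt_p31))))  # False
# Matching on all claims finds it.
print(bool(wanted & set(all_values(frankfurt_p31))))     # True
```

An item whose P31 claims are all at normal rank matches either way, which would explain why "Munich" was already in the output.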
As for the counting problem, there have been some fixes and improvements in recent versions. Please retry with the latest version and open a dedicated issue if that kind of problem is still happening.
I tried this claim:
--claim 'P31:Q515,Q7930989,Q15284'
to get all cities and municipalities from around the world. I found that "Frankfurt am Main" isn't in the resulting file, even though it is an instance of "city" (Q515). "Frankfurt am Main" also has other items in the "instance of" property, but they shouldn't affect the outcome, right? Similar entities like "Munich", which also have multiple items in that property next to the "city" item, are in the resulting file. I also noticed that the filter shows this after finishing:
in: 1736 | total: 9762074 | last entity in: Q84908318
If I understand it correctly, this means 1736 entities passed the filter out of about 9.7 million. However, the resulting file has over 14,000 lines, each of which is an entity, right? How does this fit together?