weso / sparkwdsub

Spark processing of wikidata subsets
MIT License

Add option to show shapes validated in output #19

Open labra opened 2 years ago

labra commented 2 years ago

We can add an option that outputs, together with each node, the shapes that have been validated for it. This information can be useful for debugging the algorithm.

labra commented 2 years ago

We have added the option in version 0.0.17, but it should be improved.

The option is called "keepShapes". It is a flag that can be passed on the command line as:

sparkwdsub dump --keepShapes . . .

However, it does not work properly yet, because the generated output appends to each line of the JSON a comment of the form:

// shape1, shape2, ...

and JSON does not allow comments. It would be better to embed that information in the JSON structure itself, but that is more difficult: when we serialize an entity, we delegate the serialization to the Wikidata Toolkit serializer. That is, the ValueWriter invokes JsonSerializer.getJsonString(id), and we would probably need to replace that call with our own JSON serializer.
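One possible way to embed the shapes without touching the entity serialization is to wrap the string that the existing serializer returns in an outer JSON object. The sketch below is written in Python only to illustrate the target output; the function name with_shapes and the keys "entity" and "shapes" are hypothetical and not part of sparkwdsub or Wikidata Toolkit.

```python
import json

def with_shapes(entity_json, shapes):
    """Wrap an already-serialized entity and its shape names in one JSON object.

    entity_json: the JSON string produced by the existing serializer.
    shapes: list of shape names validated for this entity.
    The keys "entity" and "shapes" are illustrative assumptions.
    """
    entity = json.loads(entity_json)  # fails early if the input is not valid JSON
    return json.dumps({"entity": entity, "shapes": list(shapes)})

# Example: one output line for an entity that validated against two shapes.
line = with_shapes('{"id": "Q42", "type": "item"}', ["shape1", "shape2"])
print(line)
```

The wrapper leaves the entity JSON unchanged, at the cost of an output format that is no longer a plain Wikidata dump.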

Anyway, for now it should work, and we can remove the comments later if needed.
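While the comments are in the output, a consumer could separate them from the JSON before parsing. A minimal sketch in Python, assuming each line holds one JSON object optionally followed by a "// shape1, shape2, ..." comment; the exact line layout of the output and the function name split_line are assumptions. It lets the JSON decoder find where the object ends, so a "//" inside a string value (for example in a URL) is not mistaken for the comment.

```python
import json

_decoder = json.JSONDecoder()

def split_line(line):
    """Return (entity, shapes) for one output line.

    Assumes the line is a JSON object optionally followed by a
    '// shape1, shape2, ...' comment (layout is an assumption).
    """
    text = line.strip()
    # raw_decode parses the first JSON value and reports where it ends.
    entity, end = _decoder.raw_decode(text)
    # Skip separators between the object and the comment, if any.
    rest = text[end:].lstrip(", \t")
    shapes = []
    if rest.startswith("//"):
        shapes = [name.strip() for name in rest[2:].split(",") if name.strip()]
    return entity, shapes

entity, shapes = split_line('{"id": "Q42", "url": "http://example.org"} // shape1, shape2')
print(entity["id"], shapes)
```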