rdfhdt / hdt-java

HDT Java library and tools.

Fix "IllegalFormatException or IllegalArgumentException while reading RDF with B-Nodes in two-pass mode" #158 #159

Closed ate47 closed 2 years ago

ate47 commented 2 years ago

This pull request fixes issue #158. It adds a new class, MultiPassBNodeParser, to clear an RDFParserCallback.RDFCallback and return the same BNode IDs at each parse. The BNode names are still random, but the random seed can now be controlled with the HDTOptions key "loader.bnode.seed" (0 = random).

It also affects the one-pass parser: if a seed is set, a one-pass and a two-pass parse produce the same BNode names, which removes the randomness when needed.
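To illustrate why a fixed seed makes two passes agree, here is a minimal sketch. It is not the code from this pull request: the class `SeededBNodeNamer` and its method `rename` are hypothetical, and the naming scheme (a seed-derived prefix plus the order of first appearance) is an assumption chosen for the example. Only the convention that a seed of 0 means "random" is taken from the description above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

/**
 * Hypothetical illustration only; this is NOT hdt-java's MultiPassBNodeParser.
 * Names depend on the seed and on the order in which blank nodes first appear,
 * not on the label the underlying parser happened to generate.
 */
final class SeededBNodeNamer {
    private final String prefix;
    private final Map<String, String> names = new HashMap<>();

    SeededBNodeNamer(long seed) {
        // Assumption mirroring the PR text: a seed of 0 means "pick a random seed".
        long effective = (seed == 0L) ? new Random().nextLong() : seed;
        // java.util.Random is deterministic for a given seed, so the prefix is too.
        this.prefix = "_:b" + Long.toHexString(new Random(effective).nextLong()) + "_";
    }

    /** Returns a stable name for the given parser label within this pass. */
    String rename(String parserLabel) {
        String name = names.get(parserLabel);
        if (name == null) {
            // The n-th distinct blank node gets suffix n, whatever its parser label is.
            name = prefix + names.size();
            names.put(parserLabel, name);
        }
        return name;
    }
}
```

With one namer per pass and the same non-zero seed, the n-th distinct blank node of each pass receives the same name, provided both passes read the input in the same order.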

Edit:

In this pull request, I added an option to the RDF parser API to choose whether or not to keep blank nodes while parsing. This feature is then used to fix issue #158 with two-pass parsing.
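As a rough sketch of what such a flag could control, the example below shows one possible reading: when the flag is true, the blank node labels found in the input are kept as-is, and when it is false, each blank node gets a fresh random name for that parse. The class `BNodeLabeler` and its method `label` are hypothetical and are not part of the hdt-java API; only the flag name keepBNode comes from this discussion.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical illustration of a keepBNode-style flag; not hdt-java code. */
final class BNodeLabeler {
    private final boolean keepBNode;
    private final Map<String, String> generated = new HashMap<>();

    BNodeLabeler(boolean keepBNode) {
        this.keepBNode = keepBNode;
    }

    /** Maps a blank node label from the input to the label that is emitted. */
    String label(String sourceLabel) {
        if (keepBNode) {
            // Keep the label from the input, so every parse yields the same name.
            return "_:" + sourceLabel;
        }
        // Otherwise use a random name, kept consistent within this single parse.
        String name = generated.get(sourceLabel);
        if (name == null) {
            name = "_:" + UUID.randomUUID();
            generated.put(sourceLabel, name);
        }
        return name;
    }
}
```

Under this reading, two parses with the flag set to true agree on every blank node name, which is the property a two-pass load needs, while callers that rely on random names can keep passing false.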

D063520 commented 2 years ago

Hi, looks good. I have a question: why do we have the boolean parameter keepBNode? Should it not be the default? Is there a use case where we set it to "false"?

ate47 commented 2 years ago

Because someone may be using the API's parser with the randomness as a feature, and removing it would break their code. To be safe, callers need to state whether or not they want to keep the blank nodes.

D063520 commented 2 years ago

ok!