weso / sparkwdsub

Spark processing of wikidata subsets
MIT License

Obtain a subset corresponding to the genewiki shape expression and the 2014 dump #6

Open thewillyhuman opened 2 years ago

thewillyhuman commented 2 years ago

The genewiki shape expression is located at https://github.com/weso/sparkwdsub/blob/master/examples/genewiki.shex.

thewillyhuman commented 2 years ago

It raises the following exception, which seems to come from the Shape Expressions parsing:

Exception in thread "main" java.lang.RuntimeException: Error at 1:106 extraneous input 'EXTRA' expecting {<EOF>, KW_ABSTRACT, KW_BASE, KW_IMPORT, KW_PREFIX, KW_START, IRIREF, PNAME_NS, PNAME_LN, BLANK_NODE_LABEL}

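I think this error comes from the point where Spark reads the schema file and puts it into a single-line String. The reported position 1:106 (presumably line 1, column 106) would fit that explanation if EXTRA does not appear on the first line of the schema file.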
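
To illustrate the hypothesis, here is a minimal JDK-only sketch, not the actual sparkwdsub code. It assumes the schema is loaded line by line and the lines are then concatenated without a separator, and compares that with reading the file in one call. The class name `SchemaLoading`, both helper methods, and the three-line sample schema are made up for this example; the sample is not the genewiki schema.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, used only to illustrate the suspected cause.
final class SchemaLoading {

    // Suspected behaviour: readAllLines strips the line terminators, and
    // joining with "" never puts them back, so the schema becomes one line.
    static String collapsed(Path schemaFile) throws IOException {
        return String.join("", Files.readAllLines(schemaFile, StandardCharsets.UTF_8));
    }

    // Alternative: read the whole file in one call, keeping every newline.
    static String preserved(Path schemaFile) throws IOException {
        return Files.readString(schemaFile, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Illustrative schema text (an assumption, not the real genewiki.shex).
        String schema = "PREFIX wd: <http://www.wikidata.org/entity/>\n"
                + "# illustrative comment\n"
                + "<gene> EXTRA wd:P31 {}\n";

        Path tmp = Files.createTempFile("schema", ".shex");
        try {
            Files.writeString(tmp, schema, StandardCharsets.UTF_8);
            System.out.println(collapsed(tmp).lines().count()); // prints 1
            System.out.println(preserved(tmp).lines().count()); // prints 3
        } finally {
            Files.delete(tmp);
        }
    }
}
```

If the Spark job does something equivalent to `collapsed`, a parser would report every position as line 1, and a `#` comment would no longer end where the original line ended. Passing the text from `preserved` (or joining the lines with "\n") would be one way to check whether the exception goes away.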

thewillyhuman commented 2 years ago

We are waiting for wdsub to stop processing dumps on the weso infrastructure so that we can launch the Spark cluster there and re-run the tests for this issue.