Open mhoangvslev opened 2 years ago
From the user's point of view, I don't see an option for it. Can you give me a hint?
serd already has a lax parsing mode for roughly this purpose, although (as you might expect) things can go horribly wrong with syntactically invalid Turtle or TriG documents and drop a ton of data on the floor. It works fine for line-based formats like NTriples and NQuads though.
Let's consider my second point. I am willing to fix the bad data, but I want the full list of errors to fix in one pass instead of going through a launch-fix-launch cycle.
@mhoangvslev You could use serdi on the command line to strip the bad triples out yourself before loading it. It uses the same parser, so it should encounter the same errors as hdt-cpp but be much quicker to use as a tool for this. With lax parsing (-l) it should print all the errors encountered in one run.
I usually do this from a text editor with a compilation mode that understands GCC warning syntax (vim, emacs, etc etc) so you can jump immediately to each error.
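If an editor with a compilation mode is not at hand, the same list of errors can be gathered with a short script. The following is only a rough sketch: it assumes each error line has the GCC-like shape `file:line:col: message`, optionally prefixed with `error: `, which may not match the exact output of your serdi version. The name `collect_errors` and the sample log are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed line shape (not verified against any specific serdi release):
#   [error: ]<file>:<line>:<col>: <message>
ERROR_RE = re.compile(
    r"^(?:error:\s*)?(?P<file>[^:\s][^:]*):(?P<line>\d+):(?P<col>\d+):\s*(?P<msg>.*)$"
)


def collect_errors(log_text):
    """Group (line, column, message) tuples by file name."""
    errors = defaultdict(list)
    for raw in log_text.splitlines():
        match = ERROR_RE.match(raw.strip())
        if match:
            errors[match["file"]].append(
                (int(match["line"]), int(match["col"]), match["msg"])
            )
    return dict(errors)


# Made-up sample standing in for a captured error log.
sample_log = """\
error: data.nt:3:17: invalid IRI character
error: data.nt:42:1: bad escape sequence
note: this line is ignored
"""

for name, items in collect_errors(sample_log).items():
    for line, col, msg in items:
        print(f"{name}:{line}:{col}: {msg}")
```

Saving the parser's error output to a file and passing its contents to `collect_errors` gives every bad line number at once, so the input can be fixed in a single pass.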
While working with dirty data, I realised that being able to skip bad rows when parsing RDF is very useful. This feature was suggested in issue #117 but was met with strong opposition. I would like to bring it up one more time, in the hope that opinions might have changed since.
The program should give the option to warn-instead-of-error for these reasons: