Open sesuncedu opened 8 years ago
On the TriG/etc. issue, you should be able to parse the subsets (Turtle/N-Triples) directly using the TriG parser, although I can't recall the exact performance overhead for that compared to the dedicated parsers. Putting N-Quads in front of TriG (and pushing Turtle/N-Triples to just before TriX) could be useful even without active heuristics: N-Quads isn't a subset of TriG, but in all but the N-Triples cases the N-Quads parser will fizz out after a few lines, and it won't bork on N-Triples at any stage.
Splitting off from #550.
There are a number of heuristics that can be used to select and order parsers instead of trying every parser until one of them succeeds.
There are three sources of information that are available:
Input streams used for content analysis might need to use some subclass of BufferedInputStream modified to fail attempts to read past marklimit (instead of only throwing the exception in reset()). This prevents a misbehaving content analyzer from messing up later analyzers.
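A minimal sketch of what such a stream could look like, assuming the analyzers are handed a single shared wrapper. The class name `StrictMarkInputStream` and the `clearMark()` method are hypothetical and not part of any existing API; only the overridden `java.io.BufferedInputStream` methods are real. It counts bytes consumed since the last `mark()` and throws as soon as a read would go past the limit, rather than letting the mark be silently invalidated.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: a BufferedInputStream that refuses reads past the mark limit.
class StrictMarkInputStream extends BufferedInputStream {
    private long limit = -1;        // -1 means no strict mark is active
    private long readSinceMark = 0; // bytes consumed since the last mark()/reset()

    StrictMarkInputStream(InputStream in) {
        super(in);
    }

    @Override
    public synchronized void mark(int readlimit) {
        super.mark(readlimit);
        limit = readlimit;
        readSinceMark = 0;
    }

    // Hypothetical: lift the restriction once analysis is done and real parsing starts.
    public synchronized void clearMark() {
        limit = -1;
    }

    // How many bytes may still be read; throws once the limit is exhausted.
    private long remaining() throws IOException {
        long left = limit - readSinceMark;
        if (left <= 0) {
            throw new IOException("Attempt to read past mark limit of " + limit);
        }
        return left;
    }

    @Override
    public synchronized int read() throws IOException {
        if (limit >= 0) {
            remaining();
        }
        int b = super.read();
        if (b >= 0 && limit >= 0) {
            readSinceMark++;
        }
        return b;
    }

    @Override
    public synchronized int read(byte[] buf, int off, int len) throws IOException {
        if (limit >= 0 && len > 0) {
            // Truncate the request so it never crosses the limit.
            len = (int) Math.min(len, remaining());
        }
        int n = super.read(buf, off, len);
        if (n > 0 && limit >= 0) {
            readSinceMark += n;
        }
        return n;
    }

    @Override
    public synchronized long skip(long n) throws IOException {
        if (limit >= 0 && n > 0) {
            n = Math.min(n, remaining());
        }
        long skipped = super.skip(n);
        if (skipped > 0 && limit >= 0) {
            readSinceMark += skipped;
        }
        return skipped;
    }

    @Override
    public synchronized void reset() throws IOException {
        super.reset();
        readSinceMark = 0;
    }
}
```

With this, each analyzer would call `mark(n)`, sniff at most `n` bytes, and `reset()`. An analyzer that tries to consume more gets an `IOException` immediately, so the stream can still be reset for the next analyzer.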