uniVocity-parsers is a suite of extremely fast and reliable parsers for Java. It provides a consistent interface for handling different file formats, and a solid framework for the development of new parsers.
Unicode Special Character at the Beginning is Corrupted #491
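import java.io.PushbackInputStream
import com.univocity.parsers.csv.{CsvParser, CsvParserSettings}
import com.univocity.parsers.csv.UnescapedQuoteHandling.STOP_AT_DELIMITER

// Disable quote handling so quotes are treated as ordinary characters
val settings = new CsvParserSettings
settings.getFormat.setQuoteEscape('\u0000')
settings.getFormat.setQuote('\u0000')
settings.setUnescapedQuoteHandling(STOP_AT_DELIMITER)
settings.setQuoteDetectionEnabled(false)
val parser = new CsvParser(settings)
// `data` is the InputStream holding the file contents
val peekableData = new PushbackInputStream(data)
// No charset is passed here, which triggers the issue
parser.beginParsing(peekableData)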
When parsing a file that starts with a Unicode special character, that character is replaced with the replacement character (U+FFFD).
For example, a UTF-8 file without a BOM containing
のTESTING
will be parsed as
��TESTING
This is a result of the BOM-handling logic that was added.
The sample code above reproduces the issue.
Explicitly passing the charsetName of "UTF-8" into beginParsing is a workaround for the issue.
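
A minimal sketch of that workaround, assuming default parser settings and an in-memory input for brevity. The object name `ExplicitCharsetExample` and the helper `firstRow` are hypothetical and exist only for illustration; the only change relative to the failing code is the second argument to `beginParsing`.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets

import com.univocity.parsers.csv.{CsvParser, CsvParserSettings}

// Hypothetical helper, for illustration only.
object ExplicitCharsetExample {

  /** Parses the first row of `bytes`, telling the parser the input is UTF-8. */
  def firstRow(bytes: Array[Byte]): Array[String] = {
    val parser = new CsvParser(new CsvParserSettings)
    // Passing the charset name explicitly is the workaround reported in this issue.
    parser.beginParsing(new ByteArrayInputStream(bytes), "UTF-8")
    try parser.parseNext()
    finally parser.stopParsing() // release the underlying reader
  }
}
```

It could be called as `ExplicitCharsetExample.firstRow("のTESTING".getBytes(StandardCharsets.UTF_8))`. The same two-argument call should work with the `PushbackInputStream` from the original snippet, since `beginParsing` accepts any `InputStream`.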