uniVocity / univocity-parsers

uniVocity-parsers is a suite of extremely fast and reliable parsers for Java. It provides a consistent interface for handling different file formats, and a solid framework for the development of new parsers.

CSV Parser is parsing single quote incorrectly. #504

Open chintan2201 opened 2 years ago

chintan2201 commented 2 years ago

Hi Univocity Team,

I am facing an issue when parsing data in which a record contains single quotes. An example is below. All other scenarios work fine, and this input is handled correctly by third-party tools such as Excel, Sheets, etc.

Example:

Input CSV:
Id,LastName,FirstName,Number,Date,Text
100, 'test, bob john, 10000, 2006-5-25, Dr Smith's Shop

Actual Output:
100,"'test,bob john,10000,2006-5-25,Dr Smith's Shop",,,,

Expected Output:
100, "'test", "bob john", 10000, 2006-5-25, "Dr Smith's Shop"

Parser settings:

CsvParserSettings settings = new CsvParserSettings();
settings.detectFormatAutomatically();
settings.setHeaderExtractionEnabled(true);
settings.setIgnoreLeadingWhitespaces(true);
settings.setIgnoreTrailingWhitespaces(true);
settings.setSkipEmptyLines(true);

// quotes inside quoted values are escaped as \"
settings.getFormat().setQuoteEscape('\\');

// but if two backslashes are found before a quote symbol they represent a single backslash.
settings.getFormat().setCharToEscapeQuoteEscaping('\\');

// Max number of characters that we can read from a column
settings.setMaxCharsPerColumn(CommonUtils.MAX_CHARS_PER_COLUMN);

Note: I have already tried versions 2.7.3 and 2.9.1 (the latest version).
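
For anyone trying to reproduce this, one possible explanation (an assumption, not confirmed anywhere in this report) is that format auto-detection selects the single quote as the quote character, so the field starting with 'test is treated as an unclosed quoted value that swallows the rest of the row. The sketch below uses only the Java standard library and does not call univocity-parsers. The class QuoteSplitSketch and its methods are hypothetical names, and the splitting rules are deliberately simplified, so it only illustrates the effect and does not reproduce the library's exact output.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteSplitSketch {

    // Splits one CSV line (hypothetical helper, simplified rules).
    // A field counts as quoted only if its first non-blank character is the quote character.
    public static List<String> split(String line, char delimiter, char quote) {
        List<String> fields = new ArrayList<>();
        int n = line.length();
        int i = 0;
        while (i <= n) {
            // Skip leading blanks, similar in spirit to ignoring leading whitespace.
            while (i < n && line.charAt(i) == ' ') {
                i++;
            }
            if (i < n && line.charAt(i) == quote) {
                int close = findClosingQuote(line, i + 1, delimiter, quote);
                if (close < 0) {
                    // Unclosed quote: the rest of the line becomes a single field,
                    // and the opening quote is kept as part of the value.
                    fields.add(line.substring(i).trim());
                    return fields;
                }
                fields.add(line.substring(i + 1, close));
                i = close + 2; // move past the closing quote and the delimiter
            } else {
                int end = line.indexOf(delimiter, i);
                if (end < 0) {
                    end = n;
                }
                fields.add(line.substring(i, end).trim());
                i = end + 1;
            }
        }
        return fields;
    }

    // A quote closes the field only when it is followed by the delimiter or the end of the line.
    // Any other quote (such as the one in "Smith's") is treated as literal content.
    public static int findClosingQuote(String line, int from, char delimiter, char quote) {
        for (int j = from; j < line.length(); j++) {
            if (line.charAt(j) != quote) {
                continue;
            }
            boolean atEnd = (j == line.length() - 1);
            if (atEnd || line.charAt(j + 1) == delimiter) {
                return j;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String row = "100, 'test, bob john, 10000, 2006-5-25, Dr Smith's Shop";
        // With ' as the quote character the row collapses into 2 fields.
        System.out.println(split(row, ',', '\''));
        // With " as the quote character the row splits into 6 fields.
        System.out.println(split(row, ',', '"'));
    }
}
```

If this assumption holds, one thing to try is setting the quote character explicitly with settings.getFormat().setQuote('"') instead of relying on detectFormatAutomatically(). Whether that resolves the behaviour reported here is untested.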