uniVocity / univocity-parsers

uniVocity-parsers is a suite of extremely fast and reliable parsers for Java. It provides a consistent interface for handling different file formats, and a solid framework for the development of new parsers.

CSV Reader does not escape ASCII control characters #512

Open nandinir-db opened 2 years ago

nandinir-db commented 2 years ago
import java.io.*;
import java.util.*;
import com.univocity.parsers.csv.*;

public class Test {

    public static void main(String ... args){
        CsvParserSettings settings = new CsvParserSettings();
        settings.getFormat().setLineSeparator("\n");
        settings.getFormat().setQuote('\u0012');
        settings.getFormat().setQuoteEscape('\u0012');
        // Handling mode under test; the enum also offers e.g. STOP_AT_CLOSING_QUOTE
        UnescapedQuoteHandling u = UnescapedQuoteHandling.valueOf("RAISE_ERROR");
        settings.setUnescapedQuoteHandling(u);

        settings.setParseUnescapedQuotes(true);
        CsvParser parser = new CsvParser(settings);

        String line1 = "\u00127\u0012,\u0012EmbeddedDouble\u0012,\u0012field\u0012\u0012 t\u0012\u0012ext\u0012,\u0012field\u0012\u0012 t\u0012\u0012ext\u0012";

        System.out.println("Input line: " + line1);

        List<String[]> allLines = parser.parseAll(new StringReader(line1));

        int count = 0;
        for(String[] line : allLines){
            System.out.println("Line " + ++count);
            for(String element : line){
                System.out.println("\t" + element);
            }
            System.out.println();
        }
    }
}

Error:

/Library/Java/JavaVirtualMachines/jdk1.8.0_333.jdk/Contents/Home/bin/java -javaagent:/Applications/IntelliJ IDEA CE.app/Contents/lib/idea_rt.jar=51890:/Applications/IntelliJ IDEA CE.app/Contents/bin -Dfile.encoding=UTF-8 -classpath /Library/Java/JavaVirtualMachines/jdk1.8.0_333.jdk/Contents/Home/jre/lib/charsets.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_333.jdk/Contents/Home/jre/lib/deploy.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_333.jdk/Contents/Home/jre/lib/ext/cldrdata.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_333.jdk/Contents/Home/jre/lib/ext/dnsns.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_333.jdk/Contents/Home/jre/lib/ext/jaccess.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_333.jdk/Contents/Home/jre/lib/ext/jfxrt.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_333.jdk/Contents/Home/jre/lib/ext/localedata.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_333.jdk/Contents/Home/jre/lib/ext/nashorn.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_333.jdk/Contents/Home/jre/lib/ext/sunec.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_333.jdk/Contents/Home/jre/lib/ext/sunjce_provider.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_333.jdk/Contents/Home/jre/lib/ext/sunpkcs11.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_333.jdk/Contents/Home/jre/lib/ext/zipfs.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_333.jdk/Contents/Home/jre/lib/javaws.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_333.jdk/Contents/Home/jre/lib/jce.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_333.jdk/Contents/Home/jre/lib/jfr.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_333.jdk/Contents/Home/jre/lib/jfxswt.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_333.jdk/Contents/Home/jre/lib/jsse.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_333.jdk/Contents/Home/jre/lib/management-agent.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_333.jdk/Contents/Home/jre/lib/plugin.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_333.jdk/Contents/Home/jre/lib/resources.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_333.jdk/Contents/Home/jre/lib/rt.jar:/Users/nandini.r/IdeaProjects/JarTest/out/production/JarTest:/Users/nandini.r/Desktop/univocity-parsers-2.8.4.jar Test
Input line: 7,EmbeddedDouble,field text,field text
Exception in thread "main" com.univocity.parsers.common.TextParsingException: com.univocity.parsers.common.TextParsingException - Unexpected character 't' following quoted value of CSV field. Expecting ','. Cannot parse CSV input.
Internal state when error was thrown: line=0, column=2, record=0, charIndex=31, content parsed=field
Parser Configuration: CsvParserSettings:
    Auto configuration enabled=true
    Auto-closing enabled=true
    Autodetect column delimiter=false
    Autodetect quotes=false
    Column reordering enabled=true
    Delimiters for detection=null
    Empty value=null
    Escape unquoted values=false
    Header extraction enabled=null
    Headers=null
    Ignore leading whitespaces=true
    Ignore leading whitespaces in quotes=false
    Ignore trailing whitespaces=true
    Ignore trailing whitespaces in quotes=false
    Input buffer size=1048576
    Input reading on separate thread=true
    Keep escape sequences=false
    Keep quotes=false
    Length of content displayed on error=-1
    Line separator detection enabled=false
    Maximum number of characters per column=4096
    Maximum number of columns=512
    Normalize escaped line separators=true
    Null value=null
    Number of records to read=all
    Processor=none
    Restricting data in exceptions=false
    RowProcessor error handler=null
    Selected fields=none
    Skip bits as whitespace=true
    Skip empty lines=true
    Unescaped quote handling=RAISE_ERRORFormat configuration:
    CsvFormat:
        Comment character=#
        Field delimiter=,
        Line separator (normalized)=\n
        Line separator sequence=\n
        Quote character=
        Quote escape character=
        Quote escape escape character=null
Internal state when error was thrown: line=0, column=2, record=0, charIndex=31, content parsed=field
    at com.univocity.parsers.common.AbstractParser.handleException(AbstractParser.java:395)
    at com.univocity.parsers.common.AbstractParser.parseNext(AbstractParser.java:616)
    at com.univocity.parsers.common.AbstractParser.internalParseAll(AbstractParser.java:545)
    at com.univocity.parsers.common.AbstractParser.parseAll(AbstractParser.java:538)
    at com.univocity.parsers.common.AbstractParser.parseAll(AbstractParser.java:525)
    at Test.main(Test.java:33)
Caused by: com.univocity.parsers.common.TextParsingException: Unexpected character 't' following quoted value of CSV field. Expecting ','. Cannot parse CSV input.
Internal state when error was thrown: line=0, column=2, record=0, charIndex=31, content parsed=field
    at com.univocity.parsers.csv.CsvParser.parseQuotedValue(CsvParser.java:458)
    at com.univocity.parsers.csv.CsvParser.parseSingleDelimiterRecord(CsvParser.java:176)
    at com.univocity.parsers.csv.CsvParser.parseRecord(CsvParser.java:108)
    at com.univocity.parsers.common.AbstractParser.parseNext(AbstractParser.java:574)
    ... 4 more

Process finished with exit code 1

The issue is also reproducible with the latest jar: https://mvnrepository.com/artifact/com.univocity/univocity-parsers/2.9.1
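
For comparison, here is a minimal sketch of the result the reproduction seems to expect, assuming a doubled quote character inside a quoted field should be read as one literal quote (the usual CSV convention). It is plain Java and does not use or mirror univocity-parsers; `ExpectedFields` and `split` are hypothetical names introduced only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class ExpectedFields {

    // Hypothetical helper, NOT part of univocity-parsers.
    // Splits a single record; inside a quoted field, two consecutive
    // quote characters are read as one literal quote character.
    public static List<String> split(String record, char delimiter, char quote) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (inQuotes) {
                if (c == quote) {
                    if (i + 1 < record.length() && record.charAt(i + 1) == quote) {
                        current.append(quote); // doubled quote: keep one, skip the other
                        i++;
                    } else {
                        inQuotes = false;      // single quote: closes the field
                    }
                } else {
                    current.append(c);
                }
            } else if (c == quote) {
                inQuotes = true;               // opening quote
            } else if (c == delimiter) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        // Same input as line1 in the report above.
        String line1 = "\u00127\u0012,\u0012EmbeddedDouble\u0012,\u0012field\u0012\u0012 t\u0012\u0012ext\u0012,\u0012field\u0012\u0012 t\u0012\u0012ext\u0012";
        for (String field : split(line1, ',', '\u0012')) {
            // Make the otherwise invisible control character visible.
            System.out.println(field.replace("\u0012", "<DC2>"));
        }
    }
}
```

Under that assumption the four fields would be `7`, `EmbeddedDouble`, and twice `field<DC2> t<DC2>ext`, where `<DC2>` stands for the `\u0012` character, rather than a `TextParsingException`.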