jvalue / jayvee

Jayvee is a domain-specific language and runtime for automated processing of data pipelines
https://jvalue.github.io/jayvee/

[BUG] Cannot parse CSV with newlines #608

Open TungstnBallon opened 3 months ago

Steps to reproduce

  1. Create the file data_with_newlines.csv:
    C1,C2,C3
    2,"some
    text",true
  2. Run this model with debug output (jv pipeline.jv -d -dg exhaustive):

    pipeline Pipeline {
    
    Extractor
        -> ToTextFile
        -> ToCSV
        -> ToTable
        -> Loader;
    
    block Extractor oftype LocalFileExtractor {
        filePath: "./data_with_newlines.csv";
    }
    
    block ToTextFile oftype TextFileInterpreter { }
    
    block ToCSV oftype CSVInterpreter {
        enclosing: '"';
    }
    
    block ToTable oftype TableInterpreter {
        header: true;
        columns: [
            "C1" oftype integer,
            "C2" oftype text,
            "C3" oftype boolean,
        ];
    }
    
    block Loader oftype SQLiteLoader {
        table: "Data";
        file: "./Data.sqlite";
    }
    }

Description

Additional Notes

fast-csv, the library we use for CSV parsing, could parse the newline correctly if it received the input data before it is split.
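
To illustrate why the order matters, here is a minimal TypeScript sketch. It does not use fast-csv and is not Jayvee's actual code: parseCsv is a hypothetical, simplified quote-aware parser written only for this example. It compares parsing the whole file content against parsing lines that were already split on newlines, using the data from the steps above.

```typescript
// Hypothetical, simplified quote-aware CSV parser (illustration only,
// not fast-csv and not Jayvee's implementation).
function parseCsv(text: string, delimiter = ",", enclosing = '"'): string[][] {
  const rows: string[][] = [];
  let row: string[] = [];
  let field = "";
  let inQuotes = false;

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === enclosing) {
        if (text[i + 1] === enclosing) {
          field += enclosing; // doubled enclosing character is an escaped quote
          i++;
        } else {
          inQuotes = false; // closing quote
        }
      } else {
        field += ch; // newlines inside quotes are kept as data
      }
    } else if (ch === enclosing) {
      inQuotes = true; // opening quote
    } else if (ch === delimiter) {
      row.push(field);
      field = "";
    } else if (ch === "\n") {
      // An unquoted newline ends the record.
      row.push(field);
      rows.push(row);
      row = [];
      field = "";
    } else if (ch !== "\r") {
      field += ch;
    }
  }

  // Flush the last record if the input has no trailing newline.
  if (field !== "" || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}

const csv = 'C1,C2,C3\n2,"some\ntext",true\n';

// Parsing the unsplit content keeps the quoted newline inside one field.
const whole = parseCsv(csv);

// Splitting on newlines first cuts the quoted field in two, so the
// parser never sees the matching closing quote within a single line.
const preSplit = csv
  .split("\n")
  .filter((line) => line !== "")
  .flatMap((line) => parseCsv(line));

console.log(JSON.stringify(whole));
console.log(JSON.stringify(preSplit));
```

With the unsplit input, the sketch yields two records and the second column of the data row contains "some\ntext". With the pre-split input, the data row falls apart into two broken records, which matches the behaviour described in this issue.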