Open TungstnBallon opened 3 months ago
Steps to reproduce

data_with_newline.csv
C1,C2,C3
2,"some
text",true
Run this model with debug output using jv pipeline.jv -d -dg exhaustive:
pipeline Pipeline {
  Extractor -> ToTextFile -> ToCSV -> ToTable -> Loader;

  block Extractor oftype LocalFileExtractor {
    filePath: "./data_with_newline.csv";
  }

  block ToTextFile oftype TextFileInterpreter { }

  block ToCSV oftype CSVInterpreter {
    enclosing: '"';
  }

  block ToTable oftype TableInterpreter {
    header: true;
    columns: [
      "C1" oftype integer,
      "C2" oftype text,
      "C3" oftype boolean,
    ];
  }

  block Loader oftype SQLiteLoader {
    table: "Data";
    file: "./Data.sqlite";
  }
}
Description

Actual: The TextFileInterpreter splits the quoted field "some text", which contains a line break, into distinct lines and passes them to the CSVInterpreter, which then fails with a missing closing quote:
Found 1 pipelines to execute: Pipeline
[Pipeline] Overview:
  Blocks (5 blocks with 1 pipes):
    -> Extractor (LocalFileExtractor)
      -> ToTextFile (TextFileInterpreter)
        -> ToCSV (CSVInterpreter)
          -> ToTable (TableInterpreter)
            -> Loader (SQLiteLoader)
[Extractor] Successfully extraced file ./data_with_newline.csv
[Extractor] [Output] <hex> 43312C43322C43330A322C22736F6D650A74657874222C747275650A
[Extractor] Execution duration: 2 ms.
[ToTextFile] Decoding file content using encoding "utf-8"
[ToTextFile] Splitting lines using line break /\r?\n/
[ToTextFile] Lines were split successfully, the resulting text file has 3 lines
[ToTextFile] [Output] [Line 0] C1,C2,C3
[ToTextFile] [Output] [Line 1] 2,"some
[ToTextFile] [Output] [Line 2] text",true
[ToTextFile] Execution duration: 1 ms.
[ToCSV] Parsing raw data as CSV using delimiter ","
[ToCSV] Execution duration: 4 ms.
error: CSV parse failed in line 2: Parse Error: missing closing: '"' in line: at '"some'
$In /home/jonas/Code/uni/hiwi/jayvee/pipeline.jv:20:8
20 | block ToCSV oftype CSVInterpreter {
   |       ^^^^^
[ToCSV] Execution duration: 8 ms.
Additional Notes

The library we use for CSV parsing, fast-csv, could parse the embedded newline correctly if it received the input data before it is split into lines.
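To illustrate the difference between parsing the whole input and parsing pre-split lines, here is a minimal sketch in Python. It uses Python's standard csv module as a stand-in, not fast-csv or Jayvee's actual implementation, and the helper names (parse_whole, split_lines, unbalanced_lines) are hypothetical. The input bytes are taken from the hex dump in the Extractor output above.

```python
import csv
import io
import re

# Bytes copied from the "[Extractor] [Output] <hex>" log line of this issue.
RAW_HEX = "43312C43322C43330A322C22736F6D650A74657874222C747275650A"
RAW_TEXT = bytes.fromhex(RAW_HEX).decode("utf-8")


def parse_whole(text: str) -> list[list[str]]:
    """Parse the unsplit text so the CSV parser sees quoted newlines itself."""
    # newline="" keeps the embedded line break inside the quoted field intact.
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=",", quotechar='"'))


def split_lines(text: str) -> list[str]:
    """Mimic a line split on /\\r?\\n/ that happens before CSV parsing."""
    lines = re.split(r"\r?\n", text)
    # Drop the empty entry produced by the trailing line break.
    if lines and lines[-1] == "":
        lines.pop()
    return lines


def unbalanced_lines(lines: list[str]) -> list[int]:
    """Return indices of lines with an odd number of quote characters."""
    return [i for i, line in enumerate(lines) if line.count('"') % 2 == 1]


if __name__ == "__main__":
    print(parse_whole(RAW_TEXT))
    print(split_lines(RAW_TEXT))
    print(unbalanced_lines(split_lines(RAW_TEXT)))
```

Parsing the unsplit text yields one data row whose second field still contains the line break, while the pre-split variant produces two lines that each carry an unmatched quote. That matches the "missing closing" error reported by ToCSV.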