TungstnBallon closed this 6 months ago
Cool, nice work!
Thanks :)
Have you tried parsing something unparseable? What is the current behavior and does the user get an understandable log?
Rows where parsing fails are excluded from the resulting table. The error is only visible with the -d flag.
e.g.
> nx run interpreter:run -d example/parse.jv
[CarsPipeline] Overview:
Blocks (9 blocks with 2 pipes):
-> OCarsExtractor (LocalFileExtractor)
-> CarsTextFileInterpreter (TextFileInterpreter)
-> CarsCSVInterpreter (CSVInterpreter)
-> NameHeaderWriter (CellWriter)
-> CarsTableInterpreter (TableInterpreter)
-> DispTransform (TableTransformer)
-> HpTransform (TableTransformer)
-> CarsLoader (SQLiteLoader)
[OCarsExtractor] Successfully extraced file /home/jonas/Downloads/parse.csv
[OCarsExtractor] Execution duration: 1 ms.
[CarsTextFileInterpreter] Decoding file content using encoding "utf-8"
[CarsTextFileInterpreter] Splitting lines using line break /\r?\n/
[CarsTextFileInterpreter] Lines were split successfully, the resulting text file has 33 lines
[CarsTextFileInterpreter] Execution duration: 1 ms.
[CarsCSVInterpreter] Parsing raw data as CSV using delimiter ","
[CarsCSVInterpreter] Parsing raw data as CSV-sheet successful
[CarsCSVInterpreter] Execution duration: 10 ms.
[NameHeaderWriter] Writing "name" at cell A1
[NameHeaderWriter] Execution duration: 1 ms.
[CarsTableInterpreter] Matching header with provided column names
[CarsTableInterpreter] Validating 32 row(s) according to the column types
[CarsTableInterpreter] Validation completed, the resulting table has 32 row(s) and 12 column(s)
[CarsTableInterpreter] Execution duration: 1 ms.
[DispTransform] Column "disp" will be overwritten
[DispTransform] Column "disp" will change its type from text to decimal
[parseDisp] Invalid value in row 1: "NaN" does not match the type decimal
[DispTransform] Execution duration: 1 ms.
[HpTransform] Column "hp" will be overwritten
[HpTransform] Column "hp" will change its type from text to integer
[HpTransform] Execution duration: 1 ms.
[CarsLoader] Opening database file ./cars.sqlite
[CarsLoader] Dropping previous table "Cars" if it exists
[CarsLoader] Creating table "Cars"
[CarsLoader] Inserting 31 row(s) into table "Cars"
[CarsLoader] The data was successfully loaded into the database
[CarsLoader] Execution duration: 13 ms.
[CarsPipeline] Execution duration: 30 ms.
IMO the interpreter shouldn't crash in this case, but a more visible error is necessary. I don't really know how to do this though, so some pointers would be welcome.
Suggestion for the wording: Invalid value in row 1: "NaN" does not match the type decimal -> "cannot be cast to type decimal".
The error message is now:
[parsefailer] Could not parse "Mazda RX4" into a Decimal
[parsefailer] Dropping row 1: Could not evaluate transform expression
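To illustrate the behavior discussed here (drop the row, keep running, but report the failure without needing -d), here is a minimal Python sketch. It is not the interpreter's code: the function names, the 1-based row numbering, and the use of Python's logging module at WARNING level are all assumptions made for illustration, and the sample rows are made up.

```python
import logging
from decimal import Decimal, InvalidOperation

# Hypothetical logger name, borrowed from the log prefix in the message above.
logger = logging.getLogger("parsefailer")


def to_decimal(text: str) -> Decimal:
    """Parse text into a finite Decimal or raise ValueError with a readable message."""
    try:
        value = Decimal(text.strip())
    except InvalidOperation:
        raise ValueError(f'Could not parse "{text}" into a Decimal') from None
    # Decimal("NaN") and Decimal("Infinity") parse successfully, so reject them explicitly.
    if not value.is_finite():
        raise ValueError(f'Could not parse "{text}" into a Decimal')
    return value


def transform_column(rows, column, parse):
    """Apply parse to one column; drop rows that fail and warn about each one."""
    kept = []
    for row_number, row in enumerate(rows, start=1):  # 1-based numbering is an assumption
        try:
            kept.append({**row, column: parse(row[column])})
        except ValueError as err:
            # WARNING is emitted by default, so the user sees it without a debug flag.
            logger.warning("%s", err)
            logger.warning(
                "Dropping row %d: Could not evaluate transform expression", row_number
            )
    return kept


# Made-up sample data, only to exercise the sketch.
sample_rows = [
    {"name": "a", "disp": "Mazda RX4"},
    {"name": "b", "disp": "160.0"},
    {"name": "c", "disp": "NaN"},
]
parsed_rows = transform_column(sample_rows, "disp", to_decimal)
```

The point of the sketch is the split between the two messages: one says why the value could not be parsed, the other says which row was dropped as a result.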
This PR allows parsing text into the built-in primitives decimal, integer and boolean.
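As a rough picture of what such parsing involves, the following Python sketch converts text into the three target types and returns None when the text does not fit. It is not taken from this PR: the function names, the accepted number formats, and the accepted boolean literals ("true"/"false", case-insensitive) are assumptions.

```python
import re
from typing import Optional

# Assumed formats: optional sign, ASCII digits, optional fractional part.
_DECIMAL = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)", re.ASCII)
_INTEGER = re.compile(r"[+-]?\d+", re.ASCII)


def parse_decimal(text: str) -> Optional[float]:
    """Return the text as a float, or None if it is not a plain decimal number."""
    s = text.strip()
    return float(s) if _DECIMAL.fullmatch(s) else None


def parse_integer(text: str) -> Optional[int]:
    """Return the text as an int, or None if it contains anything but a signed integer."""
    s = text.strip()
    return int(s) if _INTEGER.fullmatch(s) else None


def parse_boolean(text: str) -> Optional[bool]:
    """Return True/False for the assumed literals "true"/"false", else None."""
    return {"true": True, "false": False}.get(text.strip().lower())
```

Returning None instead of raising lets the caller decide what a failed parse means, for example dropping the affected row and reporting it.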
example file:
closes #543