ostap / comp

a tool for querying files in various formats
MIT License

treating all characters but TAB in the ".txt" files as part of the value #30

Closed (julochrobak closed this 10 years ago)

julochrobak commented 10 years ago

The standard CSV parser in Go treats double quotes as special characters. However, TAB-delimited files very often do not use quoted string values; the Geonames data sets are one example. When a value starts with a double quote, the parser loads the data incorrectly:

$ cat input.txt 
name
"John" the first.
"Marry" the second.
$ bin/comp -f input.txt 'input'
[ { "name": "John\" the first.\n\"Marry\" the second.\n" } ]

Therefore, this commit uses a simpler and faster approach to parse TAB-delimited files and treats ALL characters except '\t' and '\n' in the ".txt" files as part of the value:

$ bin/comp -f input.txt 'input'
[ { "name": "\"John\" the first." }, { "name": "\"Marry\" the second." } ]

julochrobak commented 10 years ago

See the duplicate issue #27 for some more info.