queryverse / TextParse.jl

A bunch of fast text parsing tools

Quote escaped quotes #40

Closed oxinabox closed 6 years ago

oxinabox commented 6 years ago

Apparently CSV allows double quotes inside a quoted string to be escaped by repeating them. I encountered this in: https://github.com/fivethirtyeight/data/blob/master/avengers/avengers.csv

This is in line with RFC 4180:

  1. If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote. For example:

    "aaa","b""bb","ccc"

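To make the rule concrete, here is a minimal sketch of a splitter that handles doubled quotes. `split_csv_record` is a hypothetical helper written for illustration; it is not part of TextParse.jl and does not reflect how its parser is implemented. It assumes a single-line record with no embedded newlines.

```julia
# Hypothetical helper, not part of TextParse.jl: split one CSV record into
# fields, treating "" inside a quoted field as one literal double quote.
function split_csv_record(line::AbstractString; delim::Char=',', quotechar::Char='"')
    fields = String[]
    buf = IOBuffer()
    inquotes = false
    i = firstindex(line)
    while i <= lastindex(line)
        c = line[i]
        nxt = nextind(line, i)
        if inquotes
            if c == quotechar
                if nxt <= lastindex(line) && line[nxt] == quotechar
                    write(buf, quotechar)      # doubled quote: emit one literal quote
                    nxt = nextind(line, nxt)   # and skip the second one
                else
                    inquotes = false           # lone quote closes the field
                end
            else
                write(buf, c)
            end
        elseif c == quotechar
            inquotes = true                    # opening quote
        elseif c == delim
            push!(fields, String(take!(buf)))  # field boundary
        else
            write(buf, c)
        end
        i = nxt
    end
    push!(fields, String(take!(buf)))          # last field
    return fields
end

# The RFC 4180 example: the middle field contains one literal quote.
split_csv_record("\"aaa\",\"b\"\"bb\",\"ccc\"")
```
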
Here is a simple breaking example:

using TextParse
csvread(IOBuffer("""
name,age
"Bill Billson",21
"Fred ""Big Freddy"" Fredson", 18
"""))

Outputs:

Couldn't split line, error at char 35:
"Fred ""Big Freddy"" Fredson", 18
______^

Stacktrace:
 [1] quotedsplit(::String, ::TextParse.LocalOpts, ::Bool, ::Int64, ::Int64) at /home/wheel/oxinabox/.julia/v0.6/TextParse/src/csv.jl:668
 [2] guesscolparsers(::String, ::Array{String,1}, ::TextParse.LocalOpts, ::Int64, ::Int64, ::Array{Any,1}, ::Array{String,1}, ::Void) at /home/wheel/oxinabox/.julia/v0.6/TextParse/src/csv.jl:487
 [3] #_csvread_internal#35(::Bool, ::Char, ::Char, ::Bool, ::Bool, ::Int64, ::Void, ::Int64, ::Void, ::Bool, ::Array{String,1}, ::Array{String,1}, ::DataStructures.OrderedDict{Union{Int64, String},AbstractArray{T,1} where T}, ::Int64, ::Void, ::Array{Any,1}, ::Void, ::Int64, ::TextParse.#_csvread_internal, ::String, ::Char) at /home/wheel/oxinabox/.julia/v0.6/TextParse/src/csv.jl:194
 [4] _csvread_internal(::String, ::Char) at /home/wheel/oxinabox/.julia/v0.6/TextParse/src/csv.jl:157
 [5] #_csvread#30(::Array{Any,1}, ::Function, ::String, ::Char) at /home/wheel/oxinabox/.julia/v0.6/TextParse/src/csv.jl:86
 [6] #csvread#29(::Array{Any,1}, ::Function, ::Base.AbstractIOBuffer{Array{UInt8,1}}, ::Char) at /home/wheel/oxinabox/.julia/v0.6/TextParse/src/csv.jl:82
 [7] csvread(::Base.AbstractIOBuffer{Array{UInt8,1}}) at /home/wheel/oxinabox/.julia/v0.6/TextParse/src/csv.jl:81
 [8] include_string(::String, ::String) at ./loading.jl:522

oxinabox commented 6 years ago

I will note that CSV.jl also fails on this (I don't know whether it is for the same reason).

The error message from TextParse.jl is way better :+1:

oxinabox commented 6 years ago

Ah, it works fine if one sets the kwarg escapechar='"'. I think that should maybe be turned on by default, to be in line with RFC 4180.
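
For reference, the workaround applied to the breaking example looks roughly like this. It is a sketch that assumes TextParse.jl is installed and that `csvread` returns a tuple of columns and column names. The variable names `data`, `cols` and `colnames` are only illustrative.

```julia
using TextParse  # assumes the TextParse.jl package is installed

data = """
name,age
"Bill Billson",21
"Fred ""Big Freddy"" Fredson", 18
"""

# escapechar='"' tells the parser that a quote inside a quoted field
# may be escaped by a preceding quote, as RFC 4180 describes.
# Assumption: csvread returns (columns, column names).
cols, colnames = csvread(IOBuffer(data); escapechar='"')
```
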