queryverse / TextParse.jl

A bunch of fast text parsing tools
Other
57 stars 20 forks source link

Detect "" always as quoted string column #131

Closed davidanthoff closed 5 years ago

davidanthoff commented 5 years ago

Fixes https://github.com/JuliaComputing/JuliaDB.jl/issues/267.

This changes the column type detection logic for empty quoted strings. With this change, any "" is now always detected as a quoted string, end of story. Right now I think that makes the most sense, but I'm not super sure?

It seems generally weird to detect "" as a missing value, in general. I think it makes sense to detected say ,, as missing (i.e. the column between the two commas), but I think as soon as there are quotes there, we should interpret it as "there is a value, namely an empty string".

@shashi and @joshday can you think about this as well a bit? I feel this is the kind of change where it helps to poll a bunch of folks to make sure I don't overlooked some important corner case :)

There is probably another fix for the that is linked above, but this seemed the most straightforward one to me...

davidanthoff commented 5 years ago

I think I'm going to merge this without review :) We have some real bugs with the currently tagged version and master that users are hitting in the wild, and so I think we need to get a new version out ASAP to deal with those problems. If this PR here is the wrong strategy, it strikes me as wrong for corner cases, whereas the issues on the currently shipping version are not corner cases.

So, I would still value a review, and if you guys disagree, we can revert in a follow up PR!

joshday commented 5 years ago

Catching up on notifications now! Your reasoning on "" sounds solid to me.