Importing CSV with single double-quote inside field results in incorrect value in table.

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?
1. Create a table in the database with appropriate schema (see below.)
2. Import a CSV file (see below) containing a field surrounding by 
double-quotes that in turn contains a single double-quote (escaped in CSV as 
"").
3. The resulting table record will have an extraneous double-quote at the end 
of the record.

What is the expected output? What do you see instead?
===BEGIN TABLE SCHEMA===
CREATE TABLE `items` (
  `id` int(11) unsigned NOT NULL auto_increment,
  `description` varchar(255) NOT NULL default '',
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
===END TABLE SCHEMA===

===BEGIN SAMPLE CSV INPUT===
0,"JOB LGTH BRONZ 1/16"" 21A1"
===END SAMPLE CSV INPUT===

===BEGIN RESULTING RECORD===
1   JOB LGTH BRONZ 1/16" 21A1"
===END RESULTING RECORD===

===BEGIN EXPECTED RECORD===
1   JOB LGTH BRONZ 1/16" 21A1
===END EXPECTED RECORD===

Using:
* Sequel Pro v0.9.9.1
* MySQL Server v5.0.89

Original issue reported on code.google.com by bepor...@gmail.com on 22 Dec 2011 at 10:38

Attachments:

GoogleCodeExporter commented 9 years ago

Hi beporter,

Thanks for the report.

The problem here is that the CSV you have uses Excel-style quote escaping, 
where escaped double quotes are represented by "" instead of using backslash 
escaping - \" .

When you select the CSV file in the import dialog, you can see there's a 
"Fields escaped by" setting underneath the file selection dialog.  If you 
change this to a double quote, instead of a backslash, you'll find your file is 
imported correctly (and your choice will be remembered in future).

(The result you saw was because the parser treats the second double quote as 
the end of the quoted field, and then just appends the rest of the string - so 
"1/16"" 21A1" is read first as '1/16', leaving the string '" 21A1" before the 
end of the field, which is then added on.   I don't think we can change the 
backslash-escaping parser to handle doubled-up-character escapes more 
intelligently because it's hard to tell when it's genuinely meant to be a field 
end and when it's not… but perhaps if there is any remaining content before 
the field end character, including more quote characters, we can handle it in a 
compatibility fallback… any bright ideas? :)  I'll mark this as 
AwaitingFeedback for now, and maybe attempt some compatibility work )

Original comment by rowanb@gmail.com on 5 Jan 2012 at 12:39

Changed state: AwaitingFeedback

GoogleCodeExporter commented 9 years ago

Ha! It's amazing how many times I must have raced past that option, never 
having to think about it. 

Of course you're completely right about just selecting the correct escape 
character, although if I was to nitpick it would seem that in spite of CSV not 
having an "official" spec, double double-quotes are the more commonly accepted 
method of escaping, not the C-style backslash: 
http://en.wikipedia.org/wiki/Comma-separated_values#Basic_rules  I'm slightly 
familiar with this, having written a (crude) CSV parser myself in PHP before 
such a thing was included in the language by default: 
http://stackoverflow.com/questions/2835269/csv-import-split-by-comma-what-to-do-
about-quotes/2837009#2837009

Regardless, I don't think any code change is necessary here: You already have 
the options in place to handle different escaping. The fact that Sequel Pro's 
default didn't match my expectation is not a bug. Thanks for you explanation!

Original comment by bepor...@gmail.com on 5 Jan 2012 at 5:42

GoogleCodeExporter commented 9 years ago

While it's not technically a bug, it's a reasonably common point of user 
confusion - so it could well be classed as a UI bug :)

After thinking about it, the occurrence of unescaped doubled field ends within 
a backslash-escaped CSV is vanishingly small (it should only occur if the CSV 
is invalid).  So I've had a play around, and in r3459 I've committed a new 
default setting for the CSV escape option, namely '\ or "'.  This should 
correctly import both types of CSV.

It appears to work correctly with your small test CSV and a few others I've 
tried it with.  If you have now previously selected an import escape type, the 
new setting default won't override it - your previous selection will still be 
maintained.

Let me know if you run into any problems with the new versions.  (We make 
development versions available to test at http://nightly.sequelpro.com/ )

Original comment by rowanb@gmail.com on 6 Jan 2012 at 2:07

Changed state: Implemented
Added labels: Component-Logic, Component-UI, Type-Defect, Usability

GoogleCodeExporter commented 9 years ago

That's awfully kind of you. Thanks for taking the time to work on such a 
relatively small detail. And thanks for all of the time your dev team spends on 
this great (free) product. It's an essential app in my toolbox.

Original comment by bepor...@gmail.com on 6 Jan 2012 at 5:56

vieten / sequel-pro

Importing CSV with single double-quote inside field results in incorrect value in table. #1252