shawmut / data-preflight

A utility to preflight spreadsheets.

Data values include quoting characters #21

Open dwightkelly opened 8 years ago

dwightkelly commented 8 years ago

The data values are displayed with their quoting characters. Since data values may themselves contain escaped quote characters, I think you'd want to display only the quotes that are part of the data value, not the enclosing ones.

'CONSUMER_ID','ACCOUNT_NUM_OUTPUT','LC_Account_Type','NAME_PREFIX','FIRST_NAME','MiddleInitial','LAST_NAME','ADDRESS_TYPE_CODE','ADDRESS1','ADDRESS2','ADDRESS3','ADDRESS4','CITY_LOCALITY','STATE_REGION','POSTAL_CODE','COUNTRY_DESC','FulFillment_Type','FULL_KIT','RunDate','TrackingCode','NOMINATION_CODE','ORIG_JOIN_YR','EU_END_DATE_V2','COMPANY_NAME','COMPANY_DEPT','LHW_REGION_CODE','SALES_OFFICE_DESC','LANGUAGE_DESC','NAME_CARD','CARD_IMAGE','MERRILL_LYNCH_FINANCIAL_ADVISOR','keyline'
'382391','1377111','LCU','Mr.','David','','Chiu','H','22 Nanking Street','Acme Building 4th Floor','','','Kowloon','','0000','CHINA','1','FULL_KIT','03/28/2016','854069','NJET','2016','03/17','','','APAC','HONG KONG','English','David Chiu','0108','','1.U.FS.1'

dominickp commented 8 years ago

I've never run into this, but if I preflight that row I do see it. I wonder why. Perhaps the CSVs I've been using have been quoting with double quotes. I'll look into it.

dominickp commented 8 years ago

Is this even a valid standard convention? When I open files like this in Sheets or Excel, I see one quote on the right side of each value. It seems that a leading single quote is a special marker in spreadsheets, indicating that something numeric should be treated as text. Double quotes are handled fine.

dwightkelly commented 8 years ago

From RFC 4180

  1. Each field may or may not be enclosed in double quotes (however some programs, such as Microsoft Excel, do not use double quotes at all). If fields are not enclosed with double quotes, then double quotes may not appear inside the fields. For example:

    "aaa","bbb","ccc" CRLF
    zzz,yyy,xxx

  2. Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes. For example:

    "aaa","b CRLF
    bb","ccc" CRLF
    zzz,yyy,xxx

  3. If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote. For example:

    "aaa","b""bb","ccc"
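
To make the quoted rules concrete, here is a minimal sketch using Python's standard `csv` module. It is an illustration only, not code from this project, and the helper name `parse_rfc4180` is hypothetical.

```python
import csv
import io

def parse_rfc4180(text):
    """Parse CSV text that uses double-quote enclosure and "" escaping."""
    # newline="" leaves line endings untranslated, so a CRLF inside a
    # quoted field is preserved as part of the field value.
    source = io.StringIO(text, newline="")
    return list(csv.reader(source, quotechar='"', doublequote=True))

# A doubled double quote inside a quoted field becomes one literal quote.
rows = parse_rfc4180('"aaa","b""bb","ccc"\r\nzzz,yyy,xxx\r\n')
# rows == [['aaa', 'b"bb', 'ccc'], ['zzz', 'yyy', 'xxx']]

# A quoted field may contain a line break.
multiline = parse_rfc4180('"aaa","b\r\nbb","ccc"\r\n')
# multiline == [['aaa', 'b\r\nbb', 'ccc']]
```

In both cases the enclosing quotes are consumed by the parser and do not appear in the resulting values.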

dominickp commented 8 years ago

So quoting fields with single quotes within CSV is a non-standard convention?

dwightkelly commented 8 years ago

Yes, but it is produced by popular software. Also be aware that a double quote can be escaped either as "" or as \"
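
As a rough illustration of the two escaping conventions, the sketch below parses both forms with Python's standard `csv` module. The backslash form is not part of RFC 4180, so it has to be enabled explicitly. The function names are hypothetical and this is not the parser used by this project.

```python
import csv
import io

def parse_doubled(text):
    """Treat "" inside a quoted field as one literal double quote."""
    return list(csv.reader(io.StringIO(text, newline=""), doublequote=True))

def parse_backslash(text):
    """Treat backslash-quote inside a quoted field as a literal double quote."""
    return list(
        csv.reader(
            io.StringIO(text, newline=""),
            doublequote=False,   # do not interpret "" as an escape
            escapechar="\\",     # interpret the backslash as the escape character
        )
    )

doubled = parse_doubled('"aaa","b""bb","ccc"\r\n')
backslashed = parse_backslash('"aaa","b\\"bb","ccc"\r\n')
# Both yield [['aaa', 'b"bb', 'ccc']]
```

A parser configured for one convention will generally misread input written in the other, so the convention has to be known or detected before parsing.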

dominickp commented 8 years ago

I don't know if we can easily support that. The underlying libraries used here parse data much as Excel does. Since single-quoted fields are not standard, we would need to write our own parser to support them. We could, however, preprocess the file to achieve something similar.
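
One possible shape for such a preprocessing step, sketched in Python with the standard `csv` module: read the file with the single quote as the quoting character, then write it back out with standard double quotes. The function name `normalize_quotes` and its `source_quote` parameter are hypothetical, and the sketch assumes the whole file uses one quoting character consistently.

```python
import csv
import io

def normalize_quotes(text, source_quote="'"):
    """Rewrite CSV text quoted with source_quote as double-quoted CSV."""
    reader = csv.reader(io.StringIO(text, newline=""), quotechar=source_quote)
    out = io.StringIO(newline="")
    writer = csv.writer(
        out,
        quotechar='"',
        quoting=csv.QUOTE_ALL,   # enclose every field in double quotes
        lineterminator="\r\n",
    )
    writer.writerows(reader)
    return out.getvalue()

normalized = normalize_quotes("'382391','Mr.','David','','Chiu'\r\n")
# normalized == '"382391","Mr.","David","","Chiu"\r\n'
```

The output could then be handed to the existing parser unchanged. An unquoted apostrophe in the data (for example O'Brien without enclosing quotes) would need separate handling.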