shadanks / google-refine

Automatically exported from code.google.com/p/google-refine
Other
0 stars 0 forks source link

Importing Excel files - blank columns and time formatting #491

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
CSV file: "MochaCSV.csv"
Excel file: "MochaXLSX.xlsx"

CSV import screenshot: "MochaCSV_import.png"
Excel import screenshot: "MochaXLSX_import.png"

Two issues with imported Excel data:

1. Blank columns

After creating a new worksheet in Excel and filling out columns with data 
one-by-one, the imported Excel file acquires a blank column named "Column" at 
the end.

I'm not sure whether to trim all blank columns from the data is a good idea, 
but certainly any blank columns named "Column" at the end of the data.

2. Times

When the time value "14:00:00" is imported, it should probably be left in that 
format - and not typed as a Date within Refine (e.g. "1899-12-31T14:00:00Z"), 
which can be seen in the XLSX import screenshot. This causes issues when trying 
to work with the time value as it has become a date object.

Original issue reported on code.google.com by danpaulsmith on 23 Nov 2011 at 10:37

GoogleCodeExporter commented 8 years ago
Version 2386

Original comment by danpaulsmith on 23 Nov 2011 at 2:38

GoogleCodeExporter commented 8 years ago
Times can't be "left" in their original format, because their native format in 
Excel is just a number.  It's the format code that turns it into a time (which 
Refine doesn't support natively) or a date.  Perhaps we could convert it back 
to a string (but not sure how hard that would be).

We'll check into the added column.

Original comment by tfmorris on 26 Nov 2011 at 6:15

GoogleCodeExporter commented 8 years ago
Does Refine pick up on some sort of metadata in the Excel file that tells it to 
format a time? A time value is apparently a decimal number (0.75 for 18:00:00) 
in Excel.

If a time can be distinguished from a date, could it be as easy as using a GREL 
transform on import? i.e.

    value.toDate().toString("HH-mm-ss")

Original comment by danpaulsmith on 28 Nov 2011 at 11:38

GoogleCodeExporter commented 8 years ago
I've fixed the off-by-one error causing the extra column in r2405

Excel doesn't actually have a time datatype.  It just has datetimes with a 
formatting string which causes the date to not be shown.  Your example of 0.75 
is actually December 30, 1899 at 18:00 which you can see by changing the format 
string to "M/D/YYYY H:MM".  You can postprocess anything that occurs on this 
date if that's the right thing to do for your app.

Original comment by tfmorris on 9 Dec 2011 at 11:54

GoogleCodeExporter commented 8 years ago

Original comment by dfhu...@google.com on 11 Dec 2011 at 10:33

GoogleCodeExporter commented 8 years ago

Original comment by tfmorris on 12 Dec 2011 at 7:42