zdenu / google-refine

Automatically exported from code.google.com/p/google-refine

Refine wrongly turns certain strings into numbers #423

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

Open this CSV in Google Refine, specifying no header row:

A, "5E1"
B, "5E2"
C, "5E3"

What is the expected output? What do you see instead?

Instead of seeing the actual text values, Google Refine interprets the second 
column as numbers, producing 50, 500, 5000. I've tried the files with and 
without quotes in case that affected parsing behaviour, but that doesn't 
help.

To fix the data in Google Refine, I chose to edit the cell value. I am able to 
edit a single cell and apply the change, and this fixes the data. However, if I 
choose "Apply to all Identical Cells" then no changes are made, so the bulk 
update options don't seem to work either.

What version of Google Refine are you using?

Google Refine 2.0

What operating system and browser are you using?

Chrome, Ubuntu

Is this problem specific to the type of browser you're using, or does it happen 
in all the browsers you tried?

Haven't tried other browsers.

Please provide any additional information below.

Original issue reported on code.google.com by l...@talis.com on 23 Jul 2011 at 9:17

GoogleCodeExporter commented 9 years ago
This is actually expected behavior. The 'E' means 'exponent' in scientific 
notation and indicates "times ten to the power of", hence your results.
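
To illustrate the effect, here is a minimal Python sketch of number-first type 
guessing. It is not Refine's actual import code; the helper name 
guess_cell_type is hypothetical, and the try-int-then-float order is an 
assumption made only for illustration.

```python
def guess_cell_type(text):
    """Hypothetical helper: return a number if text parses as one, else the text."""
    s = text.strip()
    try:
        return int(s)        # plain integers such as "42"
    except ValueError:
        pass
    try:
        return float(s)      # accepts scientific notation such as "5E1"
    except ValueError:
        return s             # anything else stays a string


cells = ["5E1", "5E2", "5E3", "A"]
parsed = [guess_cell_type(c) for c in cells]
print(parsed)  # [50.0, 500.0, 5000.0, 'A']
```

Any importer that tries a numeric parse before falling back to text will treat 
"5E1" this way, because the string is a valid floating-point literal.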

When you import, turn off the default checkbox for "auto-detect types"; 
Refine won't attempt this conversion and your data will come in as is.

Original comment by paulm%pa...@gtempaccount.com on 23 Jul 2011 at 10:19

GoogleCodeExporter commented 9 years ago
Paul's solution is correct.  Additionally, please upgrade to the current 
release version 2.1.

Original comment by tfmorris on 23 Jul 2011 at 1:23

GoogleCodeExporter commented 9 years ago
Hi,

Yes, I suspected there was some automatic inference happening. Presumably if I 
turn off auto-detect types then this will stop Refine detecting types on 
any of my other columns (my actual worksheet is much larger, with dates and 
numbers). That means I'll need to do some manual coercion to fix.

I still think it's an issue that, after having imported those values, I can't 
seem to fix them using the bulk edit options ("Apply to all Identical Cells"). 
Surely that's an actual bug?

If that worked then I could simply import my data, auto-detecting types for the 
bulk of the columns, and then fix up that single column. Probably less work than 
manually coercing types.

Original comment by l...@talis.com on 25 Jul 2011 at 8:45

GoogleCodeExporter commented 9 years ago
5E2 maps to 500, and if you have '500' in your dataset in that column, you are 
hoping to be able to tell which is which, which isn't possible.

If your dataset won't contain powers-of-ten numbers then you could conceivably 
map them back, but you could run into issues with 50E1 vs. 5E2, etc.
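
The ambiguity can be shown with a short Python sketch. It assumes the original 
strings are parsed as floating-point numbers; the sample values are made up for 
illustration and are not taken from the reporter's data.

```python
from collections import defaultdict

# Hypothetical source strings that all denote the same number.
originals = ["5E2", "50E1", "500", "0.5E3"]

# Group the original spellings by their parsed numeric value.
by_value = defaultdict(list)
for text in originals:
    by_value[float(text)].append(text)

print(dict(by_value))  # {500.0: ['5E2', '50E1', '500', '0.5E3']}
```

Once only the number 500 is stored, nothing records which of these spellings 
the source file used, so no reverse mapping can restore the original text in 
general.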

If you think you've found an actual bug, I'd suggest discussing and 
characterizing it in detail first on the user mailing list, google-refine.

Original comment by paulm%pa...@gtempaccount.com on 25 Jul 2011 at 9:41