leerssej / google-refine

Automatically exported from code.google.com/p/google-refine
Other
0 stars 0 forks source link

Accessing other rows than the current row #200

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
I've searched through the issues and all the documentation I could find and 
come up blank, sorry if this has been answered elsewhere.

Is there any way of accessing rows other than the current row? Something like 
rows[rowIndex-1].cells["Column1"].value would be priceless. Unfortunately it 
doesn't work.

I'm looking for ways of computing moving averages etc and I need this feature. 
Thanks, and keep up the good work, this software looks great!

Original issue reported on code.google.com by sanderse...@gmail.com on 13 Nov 2010 at 10:18

GoogleCodeExporter commented 9 years ago
You're right--it's not possible right now. I'd say what you're asking for falls 
into a new feature area that includes grouping rows and computing sum, average, 
min, max within each group. That's necessary for visualization, too.

Original comment by dfhu...@gmail.com on 14 Nov 2010 at 3:42

GoogleCodeExporter commented 9 years ago

Original comment by iainsproat on 15 Nov 2010 at 4:00

GoogleCodeExporter commented 9 years ago
I have actually had success referring to another column's cell by using the 
'cell.column_name' command. However if you want to perform operation on the 
cells in another column you are out of luck. I was trying to create a new 
column by summing two existing column with no avail e.g. 
sum([value,cells.column_name]). 

Original comment by jluci...@gmail.com on 12 Jan 2011 at 7:16

GoogleCodeExporter commented 9 years ago
jlucido2 - That doesn't sound related to enhancement that was requested.  
Please ask general questions on the mailing list or create a new issue if 
you've got a problem or would like something enhanced.

It's likely that you just need cells.column_name.value for your example.

Original comment by tfmorris on 12 Jan 2011 at 7:22

GoogleCodeExporter commented 9 years ago
This is a very important feature for fixing common types of messy data.  For 
example, I have a 10000 record file where the dates have been input in 
different orders yy/mm/dd, dd/mm/yy, yyyy/mm/dd and dd/mm/yyyy (and sometimes 
noise), because different people entered the data.  I only have natural row 
ordering to help me out. In order to tell if 03/12/01 is March 12 or December 
1st, I would look at the previous row to see what the 4 digit year was, and 
make a decision on that.

This could be implemented as a modification of "Fill Down" that takes a GREL 
expression. 

Original comment by TeamP...@gmail.com on 16 Jun 2011 at 11:43

GoogleCodeExporter commented 9 years ago
Typing out Comment 5 gave me an idea:  If I know the month is always in the 
middle of a date format, I can use that to my advantage.
By matching out the 2 digit (yy/mm/dd and dd/mm/yy) versions as v[0],v[1], v[2] 
==> whenever v[0] == v[2] we know the year, and can convert it to a 4 digit 
year!

From this I was able to establish a sufficient number of known good 4 digit 
years.  Then I created a new column called "LastGoodDate", based on the 
existence of 4 digit years, and used the fill down option on that new column.

Now when I use a facet to display the remaining 2 digit versions, I have a 
reference to the last known good date that preceded it.  I can exploit the 
natural ordering to better predict which part is the day versus the year.

There are still more complex types of data where referencing a previous row is 
necessary to clean it up, I think I just lucked out this time.

Original comment by TeamP...@gmail.com on 16 Jun 2011 at 12:41