Open adambom opened 11 years ago
Could you give me a sample of your data? I'm not really sure why the year is getting converted to something strange. Is it being added up?
-- Irene
On Thu, Jun 13, 2013 at 4:57 PM, Adam Savitzky notifications@github.comwrote:
I want to be able provide a function for each of the columns I specify. Let me provide an example.
The following code doesn't work because it just does sum on the year column.
ds.where(function (row) { return row.year === 2010;}).groupBy('race', ['population', 'year']).columns(['year', 'race', 'population']).toJSON();
I end up with a view that looks like this:
[ { _id: 39155, race: 'White', population: 77036790.6869098, year: 643200 }, { _id: 39157, race: 'Black', population: 11941573.990469573, year: 643200 }, { _id: 39159, race: 'Asian', population: 5038865.1221144805, year: 643200 }, { _id: 39161, race: 'Hispanic', population: 13029068.243787775, year: 643200 }, { _id: 39163, race: 'Other', population: 6686024.6175516695, year: 643200 } ]
Notice how the value for year is not 2010, which is what I want.
I'd like to propose an update to the method option in groupBy that allows me to pass an object with key being the column I'm aggregating by, and value being a function that determines the aggregation method. If none is specified, you get sum. I could rewrite my example above like so:
ds.where(function (row) { return row.year === 2010;}).groupBy('race', ['population', 'year'], { method: { year: .identity population: .sum // optional, but showing here to be explicit }}).columns(['year', 'race', 'population']).toJSON();
This would return:
[ { _id: 39155, race: 'White', population: 77036790.6869098, year: 2010 }, { _id: 39157, race: 'Black', population: 11941573.990469573, year: 2010 }, { _id: 39159, race: 'Asian', population: 5038865.1221144805, year: 2010 }, { _id: 39161, race: 'Hispanic', population: 13029068.243787775, year: 2010 }, { _id: 39163, race: 'Other', population: 6686024.6175516695, year: 2010 } ]
Whatcha think? I'm open to other approaches too. What I'm trying to emulate is this SQL
SELECT year, race, sum(population) FROM mytable WHERE year=2010 GROUP BY race;
— Reply to this email directly or view it on GitHubhttps://github.com/misoproject/dataset/issues/202 .
Yes, the year is being added up. It would be nice if there was a way to specify aggregators for all the columns, so this didin't happen.
The data looks like this:
year | race | region | population
-------------------------------------
2010 | hispanic | south | 100
2010 | hispanic | east | 100
2010 | hispanic | west | 100
2011 | hispanic | east | 200
2011 | hispanic | south | 200
2011 | hispanic | west | 200
2012 | hispanic | west | 300
2012 | hispanic | east | 300
2012 | hispanic | south | 300
2010 | white | south | 200
2010 | white | east | 200
2010 | white | west | 200
2011 | white | east | 300
2011 | white | south | 300
2011 | white | west | 300
2012 | white | west | 400
2012 | white | east | 400
2012 | white | south | 400
2010 | black | south | 150
2010 | black | east | 150
2010 | black | west | 150
2011 | black | east | 160
2011 | black | south | 160
2011 | black | west | 160
2012 | black | west | 170
2012 | black | east | 170
2012 | black | south | 170
What I'd like to get is data that looks like this:
year | race | sum(population)
-----------------------------
2010 | hispanic | 300
2010 | white | 600
2010 | black | 450
I want to be able provide a function for each of the columns I specify. Let me provide an example.
The following code doesn't work because it just does sum on the year column.
I end up with a view that looks like this:
Notice how the value for year is not 2010, which is what I want.
I'd like to propose an update to the method option in groupBy that allows me to pass an object with key being the column I'm aggregating by, and value being a function that determines the aggregation method. If none is specified, you get sum. I could rewrite my example above like so:
This would return:
Whatcha think? I'm open to other approaches too. What I'm trying to emulate is this SQL