misoproject / dataset

JavaScript library that makes managing the data behind client-side visualisations easy
http://misoproject.com
GNU General Public License v2.0
1.18k stars 99 forks source link

groupBy method should allow more control over method option #202

Open adambom opened 11 years ago

adambom commented 11 years ago

I want to be able provide a function for each of the columns I specify. Let me provide an example.

The following code doesn't work because it just does sum on the year column.

ds.where(function (row) {
    return row.year === 2010;
}).groupBy('race', ['population', 'year']).columns(['year', 'race', 'population']).toJSON();

I end up with a view that looks like this:

[ { _id: 39155,
    race: 'White',
    population: 77036790.6869098,
    year: 643200 },
  { _id: 39157,
    race: 'Black',
    population: 11941573.990469573,
    year: 643200 },
  { _id: 39159,
    race: 'Asian',
    population: 5038865.1221144805,
    year: 643200 },
  { _id: 39161,
    race: 'Hispanic',
    population: 13029068.243787775,
    year: 643200 },
  { _id: 39163,
    race: 'Other',
    population: 6686024.6175516695,
    year: 643200 } ]

Notice how the value for year is not 2010, which is what I want.

I'd like to propose an update to the method option in groupBy that allows me to pass an object with key being the column I'm aggregating by, and value being a function that determines the aggregation method. If none is specified, you get sum. I could rewrite my example above like so:

ds.where(function (row) {
    return row.year === 2010;
}).groupBy('race', ['population', 'year'], {
    method: {
        year: _.identity
        population: _.sum // optional, but showing here to be explicit
    }
})
.columns(['year', 'race', 'population']).toJSON();

This would return:

[ { _id: 39155,
    race: 'White',
    population: 77036790.6869098,
    year: 2010 },
  { _id: 39157,
    race: 'Black',
    population: 11941573.990469573,
    year: 2010 },
  { _id: 39159,
    race: 'Asian',
    population: 5038865.1221144805,
    year: 2010 },
  { _id: 39161,
    race: 'Hispanic',
    population: 13029068.243787775,
    year: 2010 },
  { _id: 39163,
    race: 'Other',
    population: 6686024.6175516695,
    year: 2010 } ]

Whatcha think? I'm open to other approaches too. What I'm trying to emulate is this SQL

SELECT year, race, sum(population) 
FROM mytable 
WHERE year=2010 
GROUP BY race;
iros commented 11 years ago

Could you give me a sample of your data? I'm not really sure why the year is getting converted to something strange. Is it being added up?

-- Irene

On Thu, Jun 13, 2013 at 4:57 PM, Adam Savitzky notifications@github.comwrote:

I want to be able provide a function for each of the columns I specify. Let me provide an example.

The following code doesn't work because it just does sum on the year column.

ds.where(function (row) { return row.year === 2010;}).groupBy('race', ['population', 'year']).columns(['year', 'race', 'population']).toJSON();

I end up with a view that looks like this:

[ { _id: 39155, race: 'White', population: 77036790.6869098, year: 643200 }, { _id: 39157, race: 'Black', population: 11941573.990469573, year: 643200 }, { _id: 39159, race: 'Asian', population: 5038865.1221144805, year: 643200 }, { _id: 39161, race: 'Hispanic', population: 13029068.243787775, year: 643200 }, { _id: 39163, race: 'Other', population: 6686024.6175516695, year: 643200 } ]

Notice how the value for year is not 2010, which is what I want.

I'd like to propose an update to the method option in groupBy that allows me to pass an object with key being the column I'm aggregating by, and value being a function that determines the aggregation method. If none is specified, you get sum. I could rewrite my example above like so:

ds.where(function (row) { return row.year === 2010;}).groupBy('race', ['population', 'year'], { method: { year: .identity population: .sum // optional, but showing here to be explicit }}).columns(['year', 'race', 'population']).toJSON();

This would return:

[ { _id: 39155, race: 'White', population: 77036790.6869098, year: 2010 }, { _id: 39157, race: 'Black', population: 11941573.990469573, year: 2010 }, { _id: 39159, race: 'Asian', population: 5038865.1221144805, year: 2010 }, { _id: 39161, race: 'Hispanic', population: 13029068.243787775, year: 2010 }, { _id: 39163, race: 'Other', population: 6686024.6175516695, year: 2010 } ]

Whatcha think? I'm open to other approaches too. What I'm trying to emulate is this SQL

SELECT year, race, sum(population) FROM mytable WHERE year=2010 GROUP BY race;

— Reply to this email directly or view it on GitHubhttps://github.com/misoproject/dataset/issues/202 .

adambom commented 11 years ago

Yes, the year is being added up. It would be nice if there was a way to specify aggregators for all the columns, so this didin't happen.

The data looks like this:

year | race     | region | population
-------------------------------------
2010 | hispanic | south  | 100
2010 | hispanic | east   | 100
2010 | hispanic | west   | 100
2011 | hispanic | east   | 200
2011 | hispanic | south  | 200
2011 | hispanic | west   | 200
2012 | hispanic | west   | 300
2012 | hispanic | east   | 300
2012 | hispanic | south  | 300
2010 | white    | south  | 200
2010 | white    | east   | 200
2010 | white    | west   | 200
2011 | white    | east   | 300
2011 | white    | south  | 300
2011 | white    | west   | 300
2012 | white    | west   | 400
2012 | white    | east   | 400
2012 | white    | south  | 400
2010 | black    | south  | 150
2010 | black    | east   | 150
2010 | black    | west   | 150
2011 | black    | east   | 160
2011 | black    | south  | 160
2011 | black    | west   | 160
2012 | black    | west   | 170
2012 | black    | east   | 170
2012 | black    | south  | 170

What I'd like to get is data that looks like this:

year | race     | sum(population)
-----------------------------
2010 | hispanic | 300
2010 | white    | 600
2010 | black    | 450