socallinuxexpo / SCALE-Planning

SCALE Planning
http://www.socallinuxexpo.org

Demographic data dump #185

Closed: jaymzh closed this issue 8 years ago.

jaymzh commented 8 years ago

Ron, can you provide me with a dump of the CFP demographic data for analysis?

We can use Dropbox or some other reasonably safe mechanism you prefer.

irabinovitch commented 8 years ago

Do you need similar data from reg?

(In reply to Phil Dibowitz, Feb 12, 2016, 3:47 PM.)

jaymzh commented 8 years ago

I do, but that comes from Lei; see the chairs list for that discussion. :)

rgolan commented 8 years ago

I can do that in a few ways. I'm thinking a CSV file with only the user ID and demographic fields would be enough. Do you need names or any other data?

(In reply to Phil Dibowitz, Feb 12, 2016, 3:47 PM.)

jaymzh commented 8 years ago

I would like names (which is why we should use a secure transport) so I can de-dup them against other ways of cutting this data. We'll publish only non-identifying stats, of course.
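A minimal sketch of the de-dup-then-aggregate idea in the comment above: merge people from several sources by a normalized name, then publish only counts with no names attached. The record layout (`name`, `gender` keys), the helper names, and the normalization rule (case-fold plus whitespace collapsing) are all illustrative assumptions, not the method actually used for this analysis.

```python
from collections import Counter


def normalize_name(name: str) -> str:
    """Hypothetical normalization: collapse internal whitespace and case-fold."""
    return " ".join(name.split()).casefold()


def dedup_by_name(*sources):
    """Merge records from several sources, keeping the first record seen per normalized name."""
    merged = {}
    for source in sources:
        for record in source:
            merged.setdefault(normalize_name(record["name"]), record)
    return list(merged.values())


def gender_counts(records):
    """Aggregate-only output: counts per gender value, with no names included."""
    return Counter(record.get("gender") or "unspecified" for record in records)


# Made-up sample records standing in for two different data sources.
cfp = [{"name": "Alice Smith", "gender": "female"}, {"name": "Bob Jones", "gender": ""}]
reg = [{"name": "alice  smith", "gender": "female"}, {"name": "Carol White", "gender": "female"}]

people = dedup_by_name(cfp, reg)
print(len(people))            # 3
print(gender_counts(people))  # Counter({'female': 2, 'unspecified': 1})
```

Exact-match on a normalized name is the simplest possible rule; it will miss nicknames and typos, which is one reason a few email addresses might still be needed to confirm ambiguous matches.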

rgolan commented 8 years ago

I've created a CSV export on my local installation in the form below. Let me know if you want other data added. There are 1,629 users in this database, many of whom have not filled in all the fields.

"Uid","Name","Postal code","Age","Gender","Created date"
"1","Ron Golan","","","","2012-08-04 08:37"

— Ron Golan rgolan@superconcentrated.com

(In reply to Phil Dibowitz, Feb 13, 2016, 6:14 PM.)
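Since many users left fields blank, a reasonable first pass is to measure how complete each demographic column is. This is a minimal Python sketch that assumes the export uses exactly the header and quoting of the sample row in Ron's comment; the helper names `load_users` and `fill_rates` are hypothetical.

```python
import csv
import io

# Header and sample row from the export above. A real run would open the
# exported file instead, e.g. open("demographic-data.csv", newline="").
SAMPLE = (
    '"Uid","Name","Postal code","Age","Gender","Created date"\n'
    '"1","Ron Golan","","","","2012-08-04 08:37"\n'
)

DEMOGRAPHIC_FIELDS = ("Postal code", "Age", "Gender")


def load_users(fileobj):
    """Parse the export into a list of dicts keyed by column name."""
    return list(csv.DictReader(fileobj))


def fill_rates(users, fields=DEMOGRAPHIC_FIELDS):
    """Count, per demographic field, how many users left it non-empty."""
    return {field: sum(1 for user in users if user[field].strip()) for field in fields}


users = load_users(io.StringIO(SAMPLE))
print(len(users))         # 1
print(fill_rates(users))  # {'Postal code': 0, 'Age': 0, 'Gender': 0}
```

Dividing each count by `len(users)` gives a per-field response rate, which is a non-identifying number that could be published directly.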

rgolan commented 8 years ago

I just noticed there are a lot of spam users in the database whose first and last names are identical. If you want, I can make first and last name separate fields so you can filter those users out of your analysis.

(Follow-up to Ron Golan's message of Feb 15, 2016, 9:52 AM.)
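Once first and last names are separate fields, the spam heuristic Ron describes becomes a one-line filter. This is a Python sketch; the column names `First name` and `Last name` and the sample rows are hypothetical, since the thread does not say what the split fields would be called.

```python
import csv
import io

# Hypothetical column names and made-up rows; the real export may differ.
SAMPLE = (
    '"Uid","First name","Last name","Gender"\n'
    '"1","Ron","Golan",""\n'
    '"2","xyzzy","xyzzy",""\n'
    '"3","Jane","Doe","female"\n'
)


def looks_like_spam(user):
    """Heuristic from the thread: spam accounts reuse the same string for first and last name."""
    first = user["First name"].strip().casefold()
    last = user["Last name"].strip().casefold()
    return bool(first) and first == last


def drop_spam(users):
    """Keep only users that do not match the spam heuristic."""
    return [user for user in users if not looks_like_spam(user)]


users = list(csv.DictReader(io.StringIO(SAMPLE)))
kept = drop_spam(users)
print([user["Uid"] for user in kept])  # ['1', '3']
```

Real people whose first and last names happen to match would be dropped too, so it may be worth eyeballing the removed rows before trusting the resulting counts.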

jaymzh commented 8 years ago

Separating first and last names would be great. This looks like all the data I need. I may want to follow up on a few email addresses to make sure I'm de-duping correctly, but I don't think I'll need them in general.

rgolan commented 8 years ago

I've got a CSV file with all that information. The email associated with my Dropbox account is ron@urbaninsight.com. Do you want to share a folder with me?

(In reply to Phil Dibowitz, Feb 15, 2016, 5:35 PM.)

jaymzh commented 8 years ago

Done.

rgolan commented 8 years ago

I added the demographic-data.csv file to that folder. Let me know if you need the file to be modified.

(In reply to Phil Dibowitz, Feb 15, 2016, 7:29 PM.)

jaymzh commented 8 years ago

Oh, I just realized: do you have data on whether they got a talk accepted this year? I can do the manual correlation myself, but if it's in the DB, that'll save me a boatload of time.

rgolan commented 8 years ago

I added the SCALE event, accepted status, and talk title in demographics-data2.csv. As a result, speakers with multiple talks have multiple rows; you'll need to deal with that.

If you're only interested in users who submitted talks for 14x, I can filter everything else out. I thought you might want to compare 13x and 14x.

(In reply to Phil Dibowitz, Feb 15, 2016, 7:55 PM.)
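Because demographics-data2.csv has one row per talk, counting rows would count multi-talk speakers more than once. A minimal Python sketch of collapsing rows into per-event sets of user IDs, which also gives the 13x vs. 14x comparison Ron mentions. The column names (`Uid`, `Event`, `Accepted`, `Talk title`), the `Yes`/`No` accepted values, and the sample rows are all assumptions, not the real file layout.

```python
import csv
import io
from collections import defaultdict

# Hypothetical layout: one row per talk, with event, accepted status and title.
SAMPLE = (
    '"Uid","Event","Accepted","Talk title"\n'
    '"7","14x","Yes","Talk A"\n'
    '"7","14x","No","Talk B"\n'
    '"9","14x","No","Talk C"\n'
    '"7","13x","Yes","Talk D"\n'
)


def speakers_by_event(rows):
    """Collapse one-row-per-talk into per-event sets of submitters and accepted speakers."""
    submitted = defaultdict(set)
    accepted = defaultdict(set)
    for row in rows:
        event, uid = row["Event"], row["Uid"]
        submitted[event].add(uid)  # a set counts each speaker once per event
        if row["Accepted"].strip().lower() == "yes":
            accepted[event].add(uid)
    return submitted, accepted


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
submitted, accepted = speakers_by_event(rows)
for event in sorted(submitted):
    print(event, len(submitted[event]), len(accepted[event]))
# 13x 1 1
# 14x 2 1
```

Keying on user ID rather than name keeps the de-dup exact within this export; the per-event sets can then be joined back to the demographic columns by `Uid`.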

jaymzh commented 8 years ago

A few quick greps and cuts show I can easily get this data the way I need. It'll take a bit, but this is perfect, thanks. I'll come back if I feel like I'm missing anything. I look forward to publishing my findings!