uchicago-computation-workshop / 2018_spring_conference


Xingyun Wu -- Participants of Music Festivals #19

Open xywu-soc opened 6 years ago

xywu-soc commented 6 years ago

Thanks!

rickecon commented 6 years ago

Can you use the demographic makeup of the zip codes that you used? Some nice features would be the demographic percentages of the home zones from which individuals travel. My sense is that race and income will be very important.

shugamoe commented 6 years ago

Why do these "big numbers" tell a better story?

jamesallenevans commented 6 years ago

What kinds of amenities attract what types of people? Wikipedia is used to define music types, and Divvy bikes to capture types of people. Smart reconstruction of the people coming to events. Nice variation.

yiqingzhu007 commented 6 years ago

The only transportation data you used are Divvy data, so if I understand you correctly, you are assuming that people ride bikes to these music festivals?

LeosonH commented 6 years ago

Interesting topic and methods! Given that Wikipedia returned 1312 articles describing different types of music, how did you narrow that vast spread of definitions to the ones that you wanted to analyse? Did you limit the genres to those corresponding with events explicitly occurring in Chicago or other areas?

ningyin-xu commented 6 years ago

On the limitation of the Divvy data (only one mode of transportation?), I agree with Dr. Rick Evans about using demographic data to check whether the data are representative.

dpzhang commented 6 years ago

What could be some potential implications of this study?

xywu-soc commented 6 years ago

@rickecon Thanks! That's also what I really want to do in my thesis! But I'm concerned that using home-zone data to infer features of individuals would be an ecological fallacy, so I'm somewhat stuck on this problem. My current plan is to use the concept of "scenes", moving the analysis from the micro level to the meso level. That way I could directly analyze the relationship between features of the origin zones and features of the music festivals.

So do I need to worry about the ecological fallacy? Or does the strategy described above make sense?

Ideally, micro-level demographic data on the users would be very helpful. Unfortunately, neither the Divvy data nor the taxi data provides detailed demographic or socioeconomic variables for users. Divvy releases only the gender and birth year of its subscribers. The taxi data include nothing about the riders except the trip locations and how much they tip the driver (which might help to "infer" their socioeconomic status).

xywu-soc commented 6 years ago

@shugamoe Because very small changes (like an increase of 5, or even 1) could occur by chance. Large differences are safer.
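One rough way to make this concrete, as a sketch rather than the method used in the talk: if trip counts are assumed to behave like independent Poisson counts (an assumption, not something checked against the Divvy data), the difference between a festival-day count and a baseline count can be scaled by its approximate standard deviation. The function name and the example counts below are hypothetical.

```python
import math


def poisson_difference_z(baseline_count, festival_count):
    """Approximate z-score for the difference of two independent Poisson counts.

    Assumption: each count is Poisson, so the variance of the difference
    is roughly baseline_count + festival_count.
    """
    total = baseline_count + festival_count
    if total == 0:
        return 0.0  # no trips at all, so no evidence of a change
    return (festival_count - baseline_count) / math.sqrt(total)


# Hypothetical counts: a small increase versus a large one.
small_change = poisson_difference_z(10, 15)
large_change = poisson_difference_z(100, 200)
print(f"increase of 5 trips:   z = {small_change:.2f}")
print(f"increase of 100 trips: z = {large_change:.2f}")
```

Under this assumption, an increase of a few trips sits well within what chance alone could produce, while a large increase does not.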

xywu-soc commented 6 years ago

@yiqingzhu007 Thanks! This is an important question, and I've thought about it too. That's why I said on the last slide that I would include taxi data in the next step.

Of course, people also use other means of transportation, for example driving, Uber/Lyft, or buses and trains. But I have not been able to get those data. Capturing something is better than capturing nothing.

tamos commented 6 years ago

You might want to check out Uber Movement. They have data on trips between locations; I don't know whether it's available for Chicago, but it might be.

xywu-soc commented 6 years ago

@LeosonH Thanks! In the content analysis part, I didn't limit the articles to music genres related to Chicago. Since each of the music festivals is devoted to a specific music genre, I assume that the style of the music can represent the style of the festival.

And since there are many mixed-type genres like 'pop funk' and 'punk blues', it is hard to differentiate them directly by their names. So I put all the articles into a word2vec model and extracted the similarities between 'house'/'gospel'/'blues' and the keywords in my radar plot.
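The similarity step can be sketched as follows. This is only an illustration: the three-dimensional vectors are made-up stand-ins for vectors a trained word2vec model would produce, the keywords 'electronic' and 'church' are hypothetical (the actual radar-plot keywords are not listed here), and cosine similarity is assumed as the similarity measure, which is the usual choice for word2vec vectors.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # similarity is undefined for a zero vector; report 0
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings; a real model would learn these from the articles.
embeddings = {
    "house": [0.9, 0.1, 0.2],
    "gospel": [0.1, 0.9, 0.3],
    "blues": [0.2, 0.6, 0.8],
    "electronic": [0.8, 0.0, 0.3],  # hypothetical radar-plot keyword
    "church": [0.0, 1.0, 0.2],      # hypothetical radar-plot keyword
}


def similarity_table(vectors, genres, keywords):
    """Map each genre to its similarity with every keyword."""
    return {
        genre: {kw: cosine_similarity(vectors[genre], vectors[kw]) for kw in keywords}
        for genre in genres
    }


table = similarity_table(embeddings, ["house", "gospel", "blues"], ["electronic", "church"])
for genre, scores in table.items():
    print(genre, {kw: round(score, 2) for kw, score in scores.items()})
```

Each genre's row of scores is what would be drawn as one polygon on a radar plot, with one axis per keyword.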

xywu-soc commented 6 years ago

@sixisxu Thanks for the advice! I don't think the Divvy data are representative, which is why I mentioned on the 19th slide that there is likely to be self-selection bias in the data. Bicycle, bus, train, taxi, and Uber/Lyft data would all have this problem. The ideal would be survey data, but although I've spent a long time looking for some, I haven't found any. So my title has to say merely 'people' instead of 'population'.

Hopefully more representative data will become available some day!

xywu-soc commented 6 years ago

@dpzhang Yes. According to the theory, different amenities attract different types of people, and certain kinds of urban amenities can make some cities more attractive than others. So this might help us understand people's geographical mobility in the long run (which might be a little too ambitious). I also think it could be helpful for urban policy planning.