uchicago-computation-workshop / Fall2020

Repository for the Fall 2020 Computational Social Science Workshop
11/12: Brooke Luetgert #8

Open ehuppert opened 3 years ago

ehuppert commented 3 years ago

Comment below with questions or thoughts about the reading for this week's workshop.

Please make your comments by Wednesday 11:59 PM, and upvote at least five of your peers' comments on Thursday prior to the workshop. You need to use 'thumbs-up' for your reactions to count towards 'top comments,' but you can use other emojis on top of the thumbs up.

lulululugagaga commented 3 years ago

Thanks for your presentation. The approaches you used are easy to understand, but I'm curious why you selected these methods, and I wonder how you would discuss the results from a political science perspective.

wu-yt commented 3 years ago

Thank you for your presentation! What is your next step in research?

fyzh-git commented 3 years ago

Thank you for presenting the PCA and K-means clustering methods. I'm also wondering about the trade-offs between different methods and why you chose PCA and K-means clustering. I look forward to your interpretation of the clustering results as well.

chun-hu commented 3 years ago

Thank you for the presentation! Have you tried other clustering algorithms?

shenyc16 commented 3 years ago

Thank you for sharing this interesting research with us. I have also used PCA several times in my own research. However, I am still curious how we can justify the factors we choose when constructing an index with this method. Also, what adjustments should we make if some factors move in the opposite direction to the constructed index?
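
For readers following the question about factors that move against the index, here is a minimal sketch on synthetic data. It is not the paper's data or code; the three indicator names are hypothetical, and the data come from one made-up latent "development" factor. It shows that an opposing indicator simply receives a loading of the opposite sign on the first component, and that the overall sign of a component is arbitrary, so it can be flipped to a convenient orientation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# One hypothetical latent factor drives three made-up indicators.
latent = rng.normal(size=n)
income = latent + 0.3 * rng.normal(size=n)
literacy = latent + 0.3 * rng.normal(size=n)
infant_mortality = -latent + 0.3 * rng.normal(size=n)  # moves the other way

names = ["income", "literacy", "infant_mortality"]
X = np.column_stack([income, literacy, infant_mortality])

# Standardize each column, then obtain principal components via SVD.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
explained = S**2 / (S**2).sum()  # share of variance per component

# The sign of a component is arbitrary: orient PC1 so income loads positively.
pc1 = Vt[0].copy()
if pc1[0] < 0:
    pc1 = -pc1

index = Z @ pc1  # the constructed index (PC1 scores)

for name, loading in zip(names, pc1):
    print(f"{name:>17}: {loading:+.2f}")
print(f"PC1 share of variance: {explained[0]:.2f}")
```

In this setup no manual adjustment of the opposing indicator is needed before running PCA; the negative loading already encodes the direction. Reversing the indicator by hand beforehand would only flip the sign of its loading.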

YuxinNg commented 3 years ago

Thank you for the presentation. Like other students, I am also very curious why you chose these methods. Thanks.

kthomas14 commented 3 years ago

Thank you for sharing your research! I was wondering if you considered any other statistical analysis methods before starting the research process, and if so, why you eventually decided those may not have worked.

siruizhou commented 3 years ago

Thanks for sharing. I'm also curious about why you chose these variables.

ghost commented 3 years ago

What are the pros and cons of PCA?

FranciscoRMendes commented 3 years ago

Like many of the students here, I too am curious about the selection of methods. I think a soft clustering approach may have better served the purpose here, but I am curious to hear your thoughts.

97seshu commented 3 years ago

It is a very interesting paper. Thanks for sharing. Like other students, I am also wondering how missing data from countries that are not included can have an impact.

minminfly68 commented 3 years ago

Thanks for the presentation. I am also wondering why you chose PCA instead of other methods.

mintaow commented 3 years ago

Hi, it was a pleasure reading this paper. I am particularly interested in your thoughts on how to address the potential weaknesses of K-means and make it a stronger classifier. As the article also explains, cluster 4 has an unclear boundary. Could you elaborate a little more on the reason for this suboptimal clustering? I am also wondering whether stacking other clustering algorithms into an ensemble would improve it. Thanks!

MegicLF commented 3 years ago

Thanks in advance for sharing your work! Could you explain why you chose PCA for this question, and how you would identify the samples that lie at the intersection of two clusters?

xzmerry commented 3 years ago

Thanks for sharing! PCA and K-means clustering are quite common computational methods, but it never occurred to me that they could be applied to analyze the heterogeneity of political attributes, such as social cleavage, across countries!

I have a rather technical question: why did you use k-means clustering in your paper?

Though k-means is simple and quick to compute, you have to know the exact number of groups before clustering, which means you need to settle on the number of cluster centers before you actually run it. However, that may not always be possible in comparative politics, since your division may not always be correct.

It could happen that the theory to be checked does not fully capture all the heterogeneity across countries. Maybe it is better if other types of clustering are also discussed?

ddlee19 commented 3 years ago

Hi, can you speak more about the Andrews Curve and why you use it in your paper?

TwoCentimetre commented 3 years ago

Thank you for sharing. I noticed that this paper mainly used ready-made data rather than creating a custom-made dataset. I wonder what the advantages of using ready-made data are, and why you did not create a dataset specifically for this purpose. Which one is more difficult? In what scenarios should we create our own index, and in what scenarios should we explore ready-made data?

Rui-echo-Pan commented 3 years ago

Thank you for sharing. Could you explain further how such methods for identifying country characteristics can be developed and used in future research to explore interesting findings at the macro or micro level?

chiayunc commented 3 years ago

Thank you for sharing your wonderful work. My question concerns the interpretability of results from computationally robust methods. As you stated in your paper, we have a much more advanced theoretical framework, yet we often see research that goes through complex data analysis processes but yields limited interpretability. What are your thoughts on this? Thank you.

heathercchen commented 3 years ago

Thank you in advance for your presentation! It is very exciting to see your results demonstrate such a clear pattern in the K-means cluster estimates. I am wondering whether there is a chance we can know what these four clusters stand for, so that we might better understand how these indices are categorized. Thanks!

timqzhang commented 3 years ago

Thank you for the presentation! I am also wondering about your choice of methods. Were there any specific criteria that guided this decision?

weijiexu-charlie commented 3 years ago

Thanks for your presentation. I'm wondering why you chose to use PCA and K-means. How would you interpret the PCs you get?

yiq029 commented 3 years ago

Thank you so much for your paper! Could you give some more interpretation of the PCs used in the paper? Thank you so much.

Panyw97 commented 3 years ago

Thank you for sharing! You mentioned that you extracted 112+ indicators from the World Bank/IMF, Freedom House, WHO, UN, ILO, IDEA, Polity IV, Gapminder and CREG3, which together contain a large amount of data. How did you choose the relevant and important indicators among them? Thank you so much!

WMhYang commented 3 years ago

Thank you very much for the paper. My first question is related to the interpretation of the PCs. As many others have mentioned, how should we interpret the PCs intuitively? In other words, say two countries are far from each other in the figure. How should we relate that larger distance to the specific factors we are interested in? My second question is about why PCA is employed here. There are other algorithms for dimensionality reduction, such as t-SNE. Why is PCA preferred to the others? Thanks again.

XinSu6 commented 3 years ago

Thank you so much for sharing this fascinating work. In the paper, we all witnessed how powerful PCA and K-means are. I am wondering what made you choose them in the first place. Did you follow some sort of rigorous scientific selection process, or was it more an intuition based on experience? Thank you.

Qlei23 commented 3 years ago

Thank you for sharing your paper with us! It was interesting to read about variation across countries with regard to their social and economic divisions. In economics, researchers tend to look at major macro indicators separately (with regard to the causal relationship). I don't quite follow the economic interpretation of K-means clustering on macro data. What can the clusters tell us? Also, in your opinion, what is the most useful way to analyze and find trends in the data? Thank you!

RuoyunTan commented 3 years ago

Thank you for sharing your research with us. I completely agree with you when you say that computational algorithms like PCA are powerful, but the more important part of our work is still to understand and interpret the mechanisms underneath. Could you elaborate more on that?

caibengbu commented 3 years ago

Thank you for sharing! You mentioned that you extracted 112+ indicators from the World Bank/IMF, Freedom House, WHO, UN, ILO, IDEA, Polity IV, Gapminder and CREG3, which together contain a large amount of data. How did you choose the relevant and important indicators among them? Thank you so much!

qishenfu1 commented 3 years ago

Hi Brooke, thank you for sharing! Your work applies a lot of computational methods, especially machine learning. I am curious how you initially cleaned such a highly complex data set. Did you use Python, Stata, or other tools? How did you ultimately produce the results in this paper?

chuqingzhao commented 3 years ago

Thank you for sharing your work. Applying PCA and K-means to country-level data is an interesting approach. Could you please elaborate on how you chose your variables? Specifically, you have used indicators from different sources, but I am curious whether we could use more fine-grained data rather than global indicators.

Another question: when dealing with high-dimensional data, feature selection is an important component of PCA. I hope to learn more about how this research selects features. Is there any socio-economic reasoning behind the selection?

Thank you!

ginxzheng commented 3 years ago

Thanks for sharing! Besides being curious, like many students, about the choice of PCA, I was also wondering why you included these indicators. You described indicators from comparative politics as well as macroeconomics, but would it be possible to have a central focus on specific topics that you would like to cluster on? Thank you!

luckycindyyx commented 3 years ago

Thank you for sharing such interesting work! I once applied K-means to a pricing model for a crowdsourcing app, and the results turned out to be biased. So I was wondering whether there are certain cases where you would choose not to use K-means and turn to alternative methods instead. Thank you!

YileC928 commented 3 years ago

Thanks for sharing! I'm also curious about how you chose K-means and PCA as your main methods.

ttsujikawa commented 3 years ago

Thank you for your wonderful work. I am very curious how you dealt with incomplete datasets and improved the accuracy of the results. Since I lean toward economics, I really appreciate your efforts in handling and filtering raw data. I was wondering whether you have any ideas about future possibilities for this research once fairly complete datasets covering socio-economic indices become available. I think current public datasets do not really represent the countries of the world, since most under-developed countries are excluded. I would love to hear your presentation!

ziwnchen commented 3 years ago

Thanks for sharing! One common problem with unsupervised clustering models is interpretability. The reduced dimensions are usually hard to explain in social science research. What do you think of this problem? And what prompted you to choose PCA/K-means instead of other clustering methods (e.g., hierarchical clustering, DBSCAN)?

AlexPrizzy commented 3 years ago

Quite interesting as this is research that I am unfamiliar with. I see that the paper mentioned an optimization game of country sample size and number of available dimensions to describe these countries. Would you say this research is limited to certain regions of the world due to the available data?

YaoYao121 commented 3 years ago

Thanks for coming to the workshop! This is a really comprehensive piece of data research. However, since this is social science research in essence, perhaps why the data behave this way is more significant. So I am very curious about the detailed mechanism behind this model. Thanks!

boyafu commented 3 years ago

Thanks for sharing! Since the database is of great importance in this research, I am interested in how you deal with imperfections in the raw data. For example, how do you handle missing data as well as extreme values? Thanks!

Yiqing-Zh commented 3 years ago

Thank you for the presentation. Assessing a country's degree of development is quite a complex question. I am wondering whether this method of combining different indices differs across country sizes.

chentian418 commented 3 years ago

Thanks for sharing the interesting paper! One question: statistical learning methods like K-means clustering can be difficult to interpret. How would you balance clustering accuracy against the interpretability of the methods?

Thanks!

zixu12 commented 3 years ago

Thanks for sharing. This is a nice methodology paper on a topic I am not familiar with. I am wondering how it relates to comparative studies.

wanxii commented 3 years ago

As you mentioned at the end of the paper, "Political science research is moving toward predictive risk assessment". If that is the case, how would you incorporate shocks (e.g., unanticipated turmoil, terrorist attacks) into your models?

aolajide commented 3 years ago

Thank you for presenting. How do you know that what you are doing is predictive? And is there a way to note what the prediction would have been had certain past events, e.g., oppression from other countries, not occurred?

YijingZhang-98 commented 3 years ago

Sorry for the late comment. I just went through this paper and really enjoyed reading it. My question is how you determined that four is the best number of clusters. I think the turning point in Figure 4 is not very obvious.

From Figure 5, clustering into 4 groups makes sense, but would it be more convincing if you could provide the graph when clustering into five or more?
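
For anyone who wants to try the "turning point" check on their own data, below is a minimal sketch of an elbow comparison on synthetic data. It is not the paper's data or code: the four blobs, the helper name `kmeans_inertia`, and the number of random restarts are all assumptions made for illustration. It runs a plain Lloyd's k-means for several values of k and prints the within-cluster sum of squares, so the drop from one k to the next can be compared directly.

```python
import numpy as np

def kmeans_inertia(X, k, n_iter=50, seed=0):
    """Run plain Lloyd's k-means; return the within-cluster sum of squares."""
    rng = np.random.default_rng(seed)
    # Initialize centers at k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each center to the mean of its points (keep it if empty).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float(d2.min(axis=1).sum())

# Synthetic stand-in data: four hypothetical, well-separated groups.
rng = np.random.default_rng(42)
true_centers = np.array([[0, 0], [6, 0], [0, 6], [6, 6]], dtype=float)
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in true_centers])

# Keep the best of several random restarts for each k, since a single
# k-means run can stop in a poor local optimum.
inertias = {
    k: min(kmeans_inertia(X, k, seed=s) for s in range(10))
    for k in range(1, 8)
}

for k, value in inertias.items():
    print(f"k={k}: within-cluster sum of squares = {value:.1f}")
```

If the curve flattens only gradually, as in the figure discussed above, the elbow alone may not settle the choice of k, and showing the clustering for neighboring values of k is a reasonable complement.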

Tanzi11 commented 3 years ago

I apologize for my late comment as well. I look forward to your presentation as it is material I am not familiar with. Would you say that global development indicators are only as valuable as the available data?

j2401 commented 3 years ago

Thanks for sharing with us! I wonder what your opinion is on how much the specified model depends on the training set. To be specific, would you expect the model to remain robust if we add or remove some indicators? I look forward to your presentation today!

anqi-hu commented 3 years ago

Thank you for sharing your work with us. How did you narrow your methods down to PCA and K-means, among many other algorithms? In addition, out of curiosity, what are some of the most salient country-level attributes shared by the nations that were grouped into the same cluster? As you mentioned in the last section regarding the challenge of interpretation, how would you go about interpreting the results in terms of the issue areas that are most critical in different countries?

PAHADRIANUS commented 3 years ago

Thank you for introducing this method for narrowing down the numerous indicators available to measure country development. I understand that, statistically, your application of PCA and K-means clustering is quite robust for dimension reduction, but I am slightly concerned about the theoretical implications behind it. Indeed, the ever-growing pool of global development indicators does demand that we be more selective. Still, is it sufficiently sound to make such a selection using statistical methods alone, without thoroughly reviewing how the excluded indicators are theorized and generated?

yierrr commented 3 years ago

Thanks for the paper! Similar to the question on missing data, I'm wondering how people deal with possibly fake data in this kind of study. One of my undergrad professors once shared his story of doing research in Africa, only to find out that the data provided by the local government was totally made up (it was econ growth rates, and they were the same every year). Thank you!