Open jmausolf opened 4 years ago
Hey, this is really cool - I work on pretty similar themes here at the Knowledge Lab with Prof. Evans, and that Kozlowski paper is a favourite of mine. I am curious, have you used or looked up either
to go about exploring this? I've often wondered if it would be better to have one embedding space instead of multiple ones to explore word change. Though I'd imagine you'd run into similar problems of sample size... I do think that having multiple ones might just be necessary (at any rate it makes things easier in terms of training etc). But to be able to explore how the nature of the gender vector itself changes over time (with respect to older variations of it) might be a stronger way to measure change of cultural models as opposed to checking how other words relate to it. Of course, since the "diffusion" of the embedding space itself is controlled by the algorithm we choose, it isn't necessarily a very telling account of how the cultural models change.
In general I'm curious to hear what you think the changing structures of embedding spaces can tell us about cultural models changing.
We use the COHA corpus you mention in the Computational Content Analysis course here - I was happy to read about how it can be a good corpus to measure such changes. I wonder, with regard to this, if the increased femininity of schooling had maybe to do with aspirational qualities (e.g., a need for more women due to the post-World War 2 influx of women in the job market) in the writing? Maybe not, I really have no clue, but it is certainly curious to see if there are patterns in the way certain stereotypes catch up (or lag) to demographics across a variety of contexts, and not just schooling.
I'm trying to play with word embeddings in similar ways with regard to research trends, academic syllabi, and job advertisements. I hope to find interesting results like yours throughout the analysis!
Fascinating work! In terms of the inconsistency you've found between the feminization trend of "schooling" and trends in women’s higher educational attainment, I wonder if you have checked other measurements of women's educational attainment. Since a lot of words that represent "schooling" do not necessarily link to college education (and seem to be used more in describing pre-college educational scenes, though I'm not entirely sure), it is possible that the general increase in educated women gave rise to the close association between schooling and femininity. Maybe the rate of women who received some sort of education (high school, middle school, etc.) had surpassed that of men before the 1950s.
Also, culture has more continuity and a slower pace of change than demographic reality. So if a cultural perception that links women to schooling had been established in the 1940s, I guess it wouldn't be easily reversed by a sudden external shock (the decline of women’s rates of degree attainment relative to men’s rates between the late 1940s and the late 1950s). And I'm curious about why such a decline happened...
Besides, I'm interested in the implications of your findings. Why is it so important for us to understand this cultural trend expressed in 2 opposite dimensions? It has been a puzzle for social scientists to understand why, despite the tremendous progress in women's educational attainment, women's occupational/career attainment has not matched the scale of progress in the educational arena. Some scholars attribute this discrepancy to the social expectations/burdens of women in the private spheres that hinder their career development (e.g., motherhood penalty). But your findings might also point to a possible direction of explanation from a cultural perspective. I wonder if you could build a measurement for occupational/professional/career attainment (like the schooling one) in this embedding space, and show readers its position in the feminine-masculine dimension, and then show the changing relations between occupational attainment, schooling and intelligence.
This is quite interesting research! I am wondering how to interpret the sample of COHA. The results derived from a subset of COHA suggest how "average Americans" talk about masculine and feminine subjects. But how heterogeneous are these "average Americans"? Where do education and schooling have the most gender connotations--adult fiction, books for kids, news, etc.?
I am concerned about this heterogeneity because it matters to causality: are we looking at gender connotations, or at real gender differences that result partly from these connotations but also partly from biological differences or existing social institutions? If the 50% of COHA that is fiction shows strong gender connotations, it is clear that the idea of education and schooling is associated with the idea of gender. But if the remaining 50%, which is mostly news and nonfiction, also has strong gender connotations, it might reflect the consequences of gender connotations (stereotypes) rather than constitute the gender connotations (stereotypes) themselves. For example, news about sports tends to be biased (say, most well-known players are male and aggressive) and also disproportionately dominates the news corpus. But this is not necessarily because people believe that males are more aggressive, but because historically most sports have been for males and, whatever your gender, you need to be aggressive to win a game. Therefore, if we only look at news about sports, we may say that sports news is creating gender connotations (problem behaviors have masculine connotations). As a result, the signal of the consequences of stereotypes--amplified by the historical social structure--might overshadow the signal of their cause, i.e., gender connotations.
This is a very interesting paper!
Your research is centered on the nature of English where gender is coded in agent and patient. However, many languages code gender differently: some heavily code gender in verb, object, and concept, while some have less and more flexible gender code in agent and patient.
In addition, I am curious about how your method deals with the dynamic of gender code in the language itself over time. For example, in Thai, the equivalent of 'her' and female final utterance in the modern-day is acceptable to use with hetero males as well as the equivalent of 'his' with hetero female while it was not in the past. Would scaling into z-score, like you do with English, be sufficient in this case where word identities are not always unique anymore?
Thank you for your presentation! I’m very inspired by your approach to examine the established cultural models on gender attributes. While reading your draft, I found my personal perception of gendered concepts and wordings are a little deviant from your descriptions. Is it because there is a gap between the perception of the language and the actual use of the language, in our case the words that point to gendered concepts? If so, could you share some thoughts on this gap, that what we perceive is not necessarily what we speak?
In addition, I noticed that this work is solely based on American culture and the English language. I wonder how you would redo this research in a culture whose gender-based models are drastically different from those of America, and, as @wanitchayap mentioned, in a culture whose language possesses very few masculine and feminine attributes and connotations.
Thank you for the presentation! Do you think the cultural model shift in the article is a country-specific mechanism or can such change be generalized to other countries in Europe and Asia? Moreover, how would such a change in culture in schooling affect women's career attainment and occupation status?
Thanks for the presentation! My question is that I think culture is not limited to print media. Although the corpus of print media is large, people still participate in cultural activities in other ways. How do you think the performance of different gender groups in other cultural activities has influenced the results of this study?
Thank you in advance for the presentation! Your study really gave me a lot to think about. I have two questions:
Based on Appendix Table A2 and A3, I think that the words that were analyzed for socio-behavioral skills and problem behaviors are quite general. Considering that the corpus is a mix of fiction, newspapers and other texts not specific to educational settings, I was wondering how this could be seen as behaviors in an educational setting. For example, in the problem behaviors section, "anger", "aggression", "violent", "fight" or "attacking" look like very general terms to me. These words could appear in educational settings, of course, but these words could also appear in an article about World War 2, for example. Considering the corpus mixture that you mentioned, I feel like if these kinds of words appear in the corpus, it would be more likely to be outside the educational settings. Could you elaborate more on why you think these results based on these relatively general words could be related to educational settings?
Although there is a scale named IQ, I feel like "intelligence" is a very broad term and often contains a lot of things incorporated under the same word. For example, the ability to do multi-digit mental arithmetic and the ability to come up with a creative solution for a problem could both be seen as "intelligent", but I think these two cover vastly different areas of human cognition. Do you think the pattern you captured using the terms related to intelligence is related to the broader umbrella term or is related to some specific portion of what people often refer to as "intelligence"?
I note that I do not have a good understanding of neural network embedding models, so the answer to my questions might be something very obvious. If that is the case, I apologize in advance, and would love to get some elaboration on the matter!
Thank you again for sharing unpublished work with us. I look forward to the presentation!
Thanks for the presentation! I noticed that many keywords you selected may share the same root (e.g., intellect--intellectual--intellectually or listens--listened--listening). I'm not quite familiar with word embedding techniques, so I'm curious whether there is any specific reason that you decided to keep these variations of words instead of collapsing them to a single word when measuring their gender connotation?
Fascinating work, thank you for sharing your research!
More of a social science question on my part: I would love to see how notions of race complicate educational attainment and views on intelligence. It has been suggested that African American boys are more likely to be punished in the classroom for similar levels of activeness. Do you think the methods you've applied here could be used to explore changing models of race or other categories of difference?
Thanks a lot for your presentation! It is interesting to see the trends, especially using deep learning skills to dive deep into the social science field. My question is kind of general: you used print media data from 1930 to 2009; however, print media is not as powerful as before. In other words, electronic media is becoming more and more popular and influential, and this also includes social media like Twitter and Facebook. I am just wondering whether you would consider adding some analysis of social media?
Thank you for presenting. I notice you used the Corpus of Historical American English (COHA) to study the gendered meanings of education across time. This corpus consists of fiction, popular magazines, newspapers, and nonfiction books. To make sure this corpus is representative of the works that were widely read by Americans, you validated the contents of the corpus against the Publisher’s Weekly list of top-selling fiction for each year.
Although these books, magazines and newspapers were widely read by Americans, they were written by a limited group of people, thus representing mostly the opinions of these people. It would be interesting to compare the COHA data (of recent years) with online social media data (e.g. Twitter), since the latter is more representative of the opinions of all people.
Thanks a lot for your presentation! It is interesting to see the trends, especially using deep learning skills to dive deep into the social science field. Electronic media is becoming more and more popular and influential, and this also includes social media like Twitter and Facebook. Thank you for the interesting research! Looking forward to your presentation.
Thank you for sharing your really interesting work! Since you are trying to understand gender from a corpus of text, I can't help but wonder if the gender of the author itself majorly contributes towards the found trends. Are both genders of authors equally represented in the corpus for each of the topics explored?
Thanks for sharing. I enjoyed reading the whole paper. This is a really good attempt to apply deep learning methods to gendered representation in media. I expect to see more work that extends to other forms of media, especially social media. A comparative study between different forms of media would also be very interesting. I was really inspired by this work. Thank you again.
Thanks for sharing! The main thing that stuck out to me is the use of intelligence measures and how they can be confounded across time. For example, it's been well observed that the 'average' IQ score has been rising over the years, changing everyone's scores on the scale. Do you think this will have an effect on your results?
Thank you for sharing your work with us.
I noticed that your selection of the keyword scales comes from factors that play a role in shaping educational outcomes. We see that yes, educational attainment is shifting towards the feminine end of the spectrum, yet it is inherently different from educational 'achievement', which your model did not directly account for. Is there any reason why you chose not to include educational outcomes itself as a keyword scale? Do you think it would make any difference in terms of implications?
Thank you for the paper! I really enjoyed reading the whole paper. In this article you used American print media from 1930-2009 to examine how and whether cultural models have changed. I am wondering why you stopped at the year 2009? I have a feeling that the cultural model change in the last decade is more rapid and significant. So I am wondering whether adding print media from 2009-2019 would change your findings. Also, it would be interesting to see what happens if online media is also added to the dataset. Thanks!
Thank you for the presentation! Your conclusions are interesting and reasonable. In this paper, you proposed a view that people usually attribute girls' achievement in studying to hard work and boys' achievement to talent, which demotivates boys who have not done well on school work in the past. I think that's a social bias which is harmful to the accumulation of human capital. So have you considered what policies or publicity may help to reduce and eliminate the bias?
Thanks very much for your presentation! It is a very impressive attempt to use neural analysis when tracing the cultural patterns of gender, but it seems to me that as for the specific procedure of word selection, some words like "anger" may not connote something significant in terms of educational setting. Do you think there is some problem in it or it is restricted to the capacity of explanation in content analysis?
Thank you for sharing your work! I would be really interested to see a follow up to this broken down by certain subjects/fields. I have done some research studying equity in the teaching of computer science over the past year. Most of the first computer scientists were women, but the field has since shifted to being dominated by and taught in a way that advantages men. While there probably is not enough of a corpus to analyze CS specifically, do you think it would be possible to look at Math or English to see if different relationships exist between the education of those subjects and gender?
Thanks a lot for your interesting and inspiring paper. The cause of the difference in educational attainment between different genders has been a hot topic for quite a long time. Your paper indeed gives us a fresh perspective. As for the question, in the traditional labor economics model, income is closely correlated with education level. I'm wondering if your results could help to explain the gender discrimination in wages. As the educational attainment of women already surpasses that of men, would this help to mitigate the discrimination?
It is indeed an amazing work that investigates the intriguing topic - gender equity and shows the power of combining neural network and content analysis. It is notable that the corpus is rapidly increasing with the fast development of the internet. Would you consider the enlargement of data a blessing or a burden?
Thank you for your presentation. I have a similar question to @hihowme's, which is how to put the new information into the current model and make proper justifications. Also, since it is a gender analysis, how do you ensure that there are no significant subjective biased opinions on gender, which should be an important precondition for this research?
Thank you for your presentation! One question I have in mind is about gender discrimination. We know that income is correlated with education. Although the education of women has surpassed that of men, the income gap still exists between men and women. How would we apply deep learning to study gender discrimination?
Thanks a lot for your presentation! It is interesting to see the trends, especially using deep learning skills to dive deep into the social science field. Electronic media is becoming more and more popular and influential, and this also includes social media like Twitter and Facebook. Thank you for the interesting research! Looking forward to your presentation. (cited from CBB)
Thanks so much for your research. In the paper, you examined the trends of words relevant to education towards women and men. You provided three different types of trends in the conclusion. I am curious about what the mechanisms behind these trends are. In particular, could we rigorously test their relationship with the trends in women's educational performance, like whether the cultural change is impacted by the increasing women's educational attainment, or the reverse, or there are some other factors impacting the two simultaneously? Thanks!
Thank you for your presentation. From the results in the paper, I couldn't help but notice there are obvious outliers in each group, having a contrasting or very different trend from the whole group. It seems to me, then, that how these words are chosen and how they are grouped would have a profound impact on the results. My question is, is this accounted for in the methodology you used? Thank you.
Thank you for the presentation.
I was wondering how you might qualitatively explain the scope of these findings - I love the tracing of the trajectory of associations over the 80 year period, but was curious if we can develop an understanding of similar trajectories within the life cycle of people in the present.
For instance, one might expect certain dominant narratives correlated with people's ages and typical career trajectories leading to associations with concepts like 'intelligence' or 'success'. What are your thoughts on this?
Thanks for providing such an interesting paper to read! Do you think the performance across different gender groups in cultural activities other than print media can make a difference in the outcome of this study?
Thanks for your presentation! You applied neural network word embeddings to a 200-million-word corpus of American print media (1930-2009) to examine whether and how the cultural models have changed. I am quite confused whether the occurrence of online social media would influence the results or not.
Thank you for the interesting paper! The changes in gendered association of intelligence and studying are so interesting to see. My question is, from the 40s until now, did the meanings of some of the words included in the research change fundamentally in a way that could affect the result? For example, do words such as 'brilliant' or 'genius' carry the same connotation in 1940 as they do now? If not, does this influence these words' ability to capture gendered association of 'intelligence'?
Thank you so much for the presentation! In the paper, you have introduced the gendered cultural models for education and the corresponding results. I am wondering if you could create a "polarization" measurement for gendered differences with respect to the educational outcomes and see how that evolved over the years?
Thank you for the presentation! Discussions of gender and race have been the two most significant topics in the U.S. since WW1. You also talked about group concentration and network centralization in the paper. Back in the 1910s, women rarely had chances to communicate with men, so when they decided that they wanted to have more legal rights, they didn't ask for help from men but organized their own groups and activities. Nowadays, communication between people and society has never been easier, and perhaps remaining in one group could cause biases and misjudgments. How could we educate people to know the other groups better before making comments? And could we create an environment in which information about gender and race could be filtered before it becomes public?
Thank you so much for your presentation! I am wondering how you define a certain word as feminine or masculine. Can you explain that in further detail?
Thank you for sharing your great progress and your planned presentation tomorrow. It is truly inspiring to observe how the gender qualities of the English language shifted over the last eighty years. Honestly speaking, upon seeing your title I am in some disbelief that you can explore such a massive trend in a single study, and therefore I really appreciate how you utilized the COHA and formalized it in a longitudinal way. In addition, the way you constructed the gender axis was really solid even to the most demanding eyes.
My question, though, is a bit of a diversion. Clearly English is a language without grammatical gender, whereas most Romance languages as well as German feature gender for nouns. In your research context, you explored the evolution of American English, and I wonder if such trends can be applied on a broader scale. How would other languages' word gender quality, under your definition, change over the years and how does that coincide with the social movements in those countries? How might the gender defined by you correlate with the words' original grammatical gender?
Thanks for your presentation. I have a similar question to @wanitchayap's. If the gender information is not explicitly coded, and we need to identify the gender information according to its POS (Part of Speech) or its context, how would we deal with this situation and how would we evaluate the accuracy of this coding?
Thank you so much for your paper. Looking forward to your presentation tomorrow!
Thank you very much for the presentation! It's really impressive to use the powerful computational methods to analyze the cultural pattern underpinning this trend. However, how can you prove that the text analysis is in line with the real world? It's pretty common that the two don't match. For instance, when the disparity is more severe and conspicuous in reality, the text tends to devise a world of no inequity more than usual.
Thank you very much for sharing your work with us! Gender equality is a very important topic to consider in education. In addition, I am also very interested in the intersectionality of gender and racial identities in education. It might be interesting to apply the same method but adding an additional variable.
Thank you for the presentation! It is fascinating to see how the cultural model changed over the years. My question is how do you identify the cultural pattern behind the word corpus? What are the sign-vehicle, object and interpretant?
This is a really interesting topic to me. I am wondering if we can apply similar research approaches to analyze gendered connotations in the expressions and languages used in the workspace. For example, you mentioned that the connotations of words used to describe social-behavioral skills in the classroom, such as attentiveness and communication, have been quite stable. I am wondering how the same set of words change in the context of work. Do they still have the same feminine or masculine connotations? Do they exhibit the same changing pattern as in education?
Thank you for your presentation! I agree with @di-Tong that the similarity between words is connected with the frequency with which they appear together, and thus the relative position change of schooling and feminine may be due to demographic change. Although the content somehow reveals people's recognition of gender, I think it might be better if you could control the total frequency of each word and then conduct the Word2Vec training.
I am really interested in word2vec and am conducting research with it with my friends, and I think there might be something in embedding spaces that simply counting co-occurrence cannot interpret.
My question is somewhat trivial, but I have been wondering about it seriously: during your experiment with W2V, were there any non-trivial reasons to choose W2V to conduct the analysis instead of just, you know, counting?
Thanks for your research; it's certainly interesting to see how deep learning methods could be applied to gendered representation in media. I wonder how you could extend your research to more kinds of media so that more people could have a choice, and how you could extend your results when incorporating other important social factors, such as race, age, region, socioeconomic status, etc., which are frequently discussed together with gender? Besides, @Leahjl I really like your question, so I decided to give you THUMBS UP in my text, a different sign vehicle from the normal one to show my respect :)
Thank you for your presentation! Could you discuss a little bit about which biases may exist in the research design and how you overcome them? For example, could you elaborate on the way you decide whether a word is feminine or masculine? Would this categorization involve bias?
Thanks for your presentation. How can you tell us that it is similar to the real world? Thanks a lot!
Thank you for the interesting paper! Like others have mentioned, some languages have genders encoded in their words, and in German, there is even a third gender: neuter. How should our model distinguish between gender encodings in the language and the actual gender associations?
Thank you very much for your paper. As you mentioned in the paper, the origins of the culture change are unclear, which requires future investigation. I wonder if you could shed some light on how this kind of research could be carried out. In addition, could this method be generalized to other contexts such as gender differences in the workplace or racial differences in education?
Thank you for the talk. As journalists with different genders and training might choose different words in feminine versus masculine contexts, how do you account for the heterogeneity in the characteristics of the article pieces?
Thank you so much for presenting this inspiring research! Tracking the changing gender connotations through content analysis offers a cultural and longitudinal perspective on gender stereotypes. During the reading process, I kept thinking about the implications of your research. For instance, does people's attaching socio-behavioral skills and studiousness to females suggest an inferior status or negative impression? Do you think such an implication can be further revealed by laboratory experiments applying methods like the IAT (Implicit-association test)? Also, as @nwrim mentioned, the definition of intelligence has been controversial, but it is most commonly used as a combination of multiple skills. The Wechsler Adult Intelligence Scale, for instance, includes both verbal and non-verbal performance tests. As far as I'm concerned, females are normally associated with higher talent in verbal skills while males are expected to excel at maths or reasoning in general. Do you think such discrepancy or variance within the concept of intelligence is worth further investigation?
Comment below with questions or thoughts about the reading for this week's workshop.
Please make your comments by Wednesday 11:59 PM, and upvote at least five of your peers' comments on Thursday prior to the workshop. You need to use 'thumbs-up' for your reactions to count towards 'top comments,' but you can use other emojis on top of the thumbs up.
As an additional reminder, please do not distribute, share, or post the reading for this week.