uchicago-computation-workshop / Winter2022

Repository for the Winter 2022 Computational Social Science Workshop

02/03 #4

ehuppert opened this issue 2 years ago

ehuppert commented 2 years ago

Comment below with a well-developed group question about the reading for this week's workshop. Please collaborate with your groups on Hypothesis (via the Canvas page) to develop your question.

One person can submit on the group's behalf and put the Group Name in the submission for credit. Your group only needs to post on its assigned week (rotating every other week).

Please post your question by Wednesday 11:59 PM, and upvote at least three of your peers' comments on Thursday prior to the workshop. You need to use 'thumbs-up' for your reactions to count towards 'top comments,' but you can use other emojis on top of the thumbs up.

Raychanan commented 2 years ago

Group 1C: Yawei Li, Rui Chen, Val Alvern Cueco Ligo, Yutai Li, Max Kramer

Thank you, Professor Wang, for sharing your amazing work. I found the papers clear, easy to follow, and logical.

(1) In your Lexical Substitution paper, you operationalize neighborhood desirability as a binary variable by comparing a neighborhood's crime rate to the city's, which I think may be a little oversimplified, because safety is only one of many factors in neighborhood desirability. This seems especially true given that most of your 10 example substitutions that increase desirability are not about safety. What are your thoughts on this operationalization?
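To make sure we are reading the operationalization correctly, here is the minimal binarization we have in mind (the crime-rate numbers are invented, not from the paper):

```python
# Toy illustration of the binary desirability label as we understand it:
# a neighborhood is labeled "desirable" if its crime rate is below the
# city-wide rate. All numbers here are made up for illustration.
city_crime_rate = 4.2  # hypothetical incidents per 1,000 residents

neighborhood_crime_rates = {
    "Neighborhood A": 2.1,
    "Neighborhood B": 6.8,
    "Neighborhood C": 4.0,
}

desirable = {
    name: rate < city_crime_rate
    for name, rate in neighborhood_crime_rates.items()
}
print(desirable)  # {'Neighborhood A': True, 'Neighborhood B': False, 'Neighborhood C': True}
```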

(2) In your Lexical Substitution paper, I noticed that you created a balanced sample for the male/female distribution. However, both groups become quite unbalanced after generating candidate substitutions. Would that affect the results?

(3) In your Lexical Substitution paper, you devote considerable space and citations to defining the treatment effect and its variants, but much less in the other paper. What was the motivation for this?

Peihan12 commented 2 years ago

Group 1M: Jiehan Liu, Partha Kadambi, Peihan Gao, Shiyang Lai, Zhibin Chen

Thank you, Prof. Wang! We are just a little confused about the construction of perception. For example, why can the perception of desirability be measured/labeled with crime rates? This seems unintuitive to us. Doesn't it make the study more like research on the correlation between lexical choice and crime rates? Furthermore, why did you choose crime rates instead of other measures, such as ratings?

LynetteDang commented 2 years ago

Group 1J: Silvan Baier, Lynette Dang*, Sabina Hartnett, Yingxuan Liu

Thank you Professor Wang for sharing your work with us! Regarding the paper on training a robust text classifier, we have the following questions:

  1. You have explained in the method section that

Given this, how do you make sure that the causal terms are generated in the way you intend, since the counterfactual data (which is generated from the causal terms) is used to evaluate your classifier? And will this affect the effectiveness of your text classifier? (A rough sketch of how we understand the counterfactual generation step follows our second question below.)

  2. What if a sentence has more than one causal term, or has an ambivalent causal term? For example, a sentence like:

How will the classifier perform in such cases? How would you handle them when generating counterfactual samples?
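To make our confusion concrete, here is a rough sketch of how we understand the counterfactual generation step: identified causal terms are swapped for antonyms and the label is flipped. The word list and the make_counterfactual helper are our own simplification, not the paper's implementation.

```python
# Our simplified reading of counterfactual augmentation (not the paper's code):
# replace identified causal terms with antonyms and flip the sentiment label.
causal_antonyms = {           # hypothetical causal terms with hand-picked antonyms
    "great": "terrible",
    "clean": "dirty",
    "friendly": "rude",
}

def make_counterfactual(tokens, label):
    """Swap every causal term for its antonym; flip the label if anything changed."""
    swapped = [causal_antonyms.get(tok, tok) for tok in tokens]
    changed = swapped != tokens
    return swapped, (1 - label) if changed else label

tokens = "the host was friendly and the room was clean".split()
print(make_counterfactual(tokens, label=1))
# A sentence with several causal terms ("friendly", "clean") has them all
# swapped at once, which is exactly the multi-term case we are asking about.
```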

NaiyuJ commented 2 years ago

Group 1N: Qiuyu Li, Alfred Chao, Dehong Lu, Henry Liu, Naiyu Jiang

Hi Dr. Wang, thanks so much for bringing your excellent work to our workshop! I find it really interesting.

FrederickZhengHe commented 2 years ago

Group 1A Members: Angelica Bosco, Chongyu Fang, Yier Ling, Zheng He

Thank you very much, Professor Wang! Our question is: among the words and sentences used in your content analysis there are names of entities such as "British Empire", "America", and "China" that may sound either positive or negative depending on the context. How does the robust classifier determine the coefficients of such words?

y8script commented 2 years ago

Group 1H: Yuetong Bai, Boya Fu, Zhiyun Hu, Hsin-Keng Ling

Thank you for sharing your interesting work, Prof. Wang! We have the following questions:

  1. We are curious about possible bias in the human judgment process. Even though internal validity can be ensured by rigorous control of the AMT tasks, human perceptions may still introduce bias into the "ground truth" for the model. Will the model be affected by possible bias in the data labels?
  2. In the paper on the impact of lexical choice on audience perception, why did you choose to collect data on human perception of neighborhood desirability on Airbnb while studying gendered messages on Twitter and in Yelp reviews, given that these two topics seem unrelated to each other? Also, is a sample of 120 control and treatment sentences enough to produce human-derived LSE estimates? (A rough sketch of how we understand the LSE computation follows our third question.)
  3. Do you think lexical choice analysis may be able to predict more general traits of people (e.g. personality) rather than the traits in your article, which seem to be tied to specific scenarios? What types of characteristics or personality traits do you think lexical choice could or could not reflect?
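For context on our second question, this is how we understand a human-derived LSE estimate at its simplest: the average perception rating of treatment sentences (with the substitution) minus that of control sentences (without it). The ratings below are invented for illustration.

```python
# Our simplified understanding of a human-derived lexical substitution effect
# (LSE): mean rating of treatment sentences minus mean rating of controls.
# All ratings below are invented for illustration.
control_ratings = [3, 4, 2, 3, 3]    # e.g. "The area is cheap."
treatment_ratings = [4, 5, 4, 4, 5]  # e.g. "The area is affordable."

lse = (sum(treatment_ratings) / len(treatment_ratings)
       - sum(control_ratings) / len(control_ratings))
print(f"estimated LSE: {lse:.2f}")
# With only a handful of ratings the estimate is noisy, which is why we ask
# whether 120 sentence pairs are enough.
```
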
javad-e commented 2 years ago

Group 1F: Sudhamshu, Javad, Fiona, Zhiqian

Thank you Dr. Wang for presenting at our workshop. Our understanding is that, under Individual Treatment Effect estimation, the change in an outcome given a set of covariates depends only on the treatment. The paper introduces some of the covariates that were controlled for in the analysis, e.g. age, gender, and height. We were wondering which other variables were taken into account. How did you decide which covariates to include and how important each variable was? And how were you able to collect data on these covariates? Also, since the platforms mentioned each have samples with different demographics, controlling for factors such as age and geography becomes even more important, so we were wondering how we can ensure representative results and avoid sample bias.
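For reference, this is the kind of covariate-adjusted setup we have in mind when asking about covariate selection: a generic two-model (T-learner-style) sketch on made-up data, not necessarily the estimator used in the paper.

```python
# A T-learner-style sketch of individual treatment effect estimation on
# synthetic data (not necessarily the paper's estimator). X holds covariates
# such as age or gender; t marks whether the lexical "treatment" was applied.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))                                   # made-up covariates
t = rng.integers(0, 2, size=n)                                # treatment indicator
y = 0.5 * t + 0.3 * X[:, 0] + rng.normal(scale=0.1, size=n)   # synthetic outcome

model_treated = RandomForestRegressor(random_state=0).fit(X[t == 1], y[t == 1])
model_control = RandomForestRegressor(random_state=0).fit(X[t == 0], y[t == 0])

ite = model_treated.predict(X) - model_control.predict(X)     # per-individual effect
print("estimated average treatment effect:", round(ite.mean(), 3))
```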

xin2006 commented 2 years ago

Group 1E: Kaya Borlase, Xin Li, Zoey Jiao, Shuyi Yang

Thank you Prof. Wang for sharing your interesting work with us! We have the following questions:

  1. Regarding the paper on training a robust text classifier: in the process of searching for antonyms, we are curious how you dealt with the polysemy problem. During the iteration, could repeatedly taking synonyms of a causal term that has multiple meanings drift away from the original meaning, so that the result is not an antonym of the causal term in its original context? (A small illustration of the kind of polysemy we have in mind follows our second question.) We are also curious: since counterfactual samples built with antonyms improve robustness, could synonyms also help emphasize causal features, or do they make no difference?

  2. Regarding the paper on the impact of lexical choice, you were looking at gendered "terms" in Twitter and Yelp feeds. I have often been told in professional settings that it is important to come across as gender-neutral or even masculine in written communication. One way a mentor suggested doing this was through punctuation (i.e. not using exclamation points). Do you think one could cross-reference sentences determined to be "female" with which sentences users take more seriously? Further, could you do the same sort of substitutions as seen here but with punctuation (switching out "!" and ".")?
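To illustrate the polysemy worry in our first question: a word like "hard" has different antonyms depending on its sense, so a sense-agnostic antonym lookup can pick a counterfactual that no longer fits the context. A small WordNet-based sketch (our own illustration, not the paper's procedure):

```python
# Illustration of the polysemy problem in antonym lookup (our own example,
# not the paper's code): "hard" has sense-dependent antonyms, e.g. "easy"
# for the difficulty sense and "soft" for the resistance sense.
import nltk
nltk.download("wordnet", quiet=True)
from nltk.corpus import wordnet as wn

for synset in wn.synsets("hard", pos=wn.ADJ):
    for lemma in synset.lemmas():
        antonyms = [a.name() for a in lemma.antonyms()]
        if antonyms:
            print(f"{synset.name():15s} {synset.definition()[:45]:45s} -> {antonyms}")
# Picking the antonym of the wrong sense produces a counterfactual sentence
# that no longer matches the original context.
```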

JadeBenson commented 2 years ago

Group 1K: Jade Benson, Isabella Duan, Hazel Chui and Joesph Helbing.

Thank you so much, Zhao, for the great research! We have a few questions about the applications and extensions of the methods described in your papers. Could we use these techniques to label unstructured datasets, and how might that work and be most useful? Could we develop software that helps people write more effectively (like choosing words that make their Airbnb description as desirable as possible)? Are there subgroups of people whose classifications differ based on different words, i.e. are the perceptions of these words culturally contingent? Could these techniques be used to detect bias? I'm imagining a few possible applications, such as sending resumes that read as if women wrote them and seeing how they're perceived differently, or examining whether less desirable words are more commonly used among disadvantaged populations.

hhx2207061197 commented 2 years ago

Group 1L: Xi Cheng, Elliot Delahaye, Hongxian Huang, Yutong Li

Thank you Prof. Wang for sharing your interesting work with us! We have the following question:

We noticed that in your paper "When Do Words Matter? Understanding the Impact of Lexical Choice on Audience Perception Using Individual Treatment Effect Estimation", you use several covariates to predict the dependent variable (satisfaction, etc.) under different lexical choices. These predictions can then serve as the counterfactual/potential outcomes, which helps establish causality. Our sense is that the more covariates are included, the more accurate the prediction, and hence the more accurate the estimated causal/treatment effect; however, the more covariates are included, the longer the computation takes. How do you make this trade-off?

MengChenC commented 2 years ago

Group 1D: Yujing Huang, Zhihan Xiong, Zixu Chen, MengChen Chung(*), Feihong Lei

Hi Professor Wang, we are glad to have the chance to see your work and research formally presented. We have the following questions:

  1. Even if the desirability of an Airbnb listing is characterized as neighborhood safety, which is fairly objective when measured by the neighborhood crime rate, the actual decision-making around rentals is much more complicated; in particular, people often prioritize location (convenience to destinations/scenic spots). For example, crime rates in downtown LA may be high, yet it is close to Hollywood, so many would still consider the neighborhood desirable for a one- or two-night stay. The labeling may therefore be biased in the first place.

  2. We are also curious why "the difference in median rating between the sentences" ("My boyfriend/buddy is super picky") is used instead of the mean. Say four MTurkers give a rating of 5 while six give 1; wouldn't it be more reasonable to use the mean, (4*5 + 6*1)/10 = 2.6, rather than simply the median of 1? We wonder what the rationale is behind this measure. (A tiny numerical illustration follows our last question.)

  3. Do the substitution words found in this research that increase desirability affect each other when they appear together in a sentence? We are curious whether the effect of substitution words follows the law of diminishing marginal returns. If we construct a sentence entirely out of substitution words, is it possible that it annoys readers instead of producing a positive LSE?

  4. It seems to us that the power of the classifier increases largely due to 1) a larger training set, and 2) more balanced "sentiments" in the data. Which do you think plays the bigger role in the improvement? In other words, if we know our inputs tend to fall on one side of the sentiment spectrum, could we improve the classifier simply by creating more samples of similar data instead of counterfactual ones?
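The tiny numerical case behind our second question, spelled out (invented ratings):

```python
# The toy rating distribution from our second question: four MTurkers rate 5,
# six rate 1. The median and the mean summarize it very differently.
from statistics import mean, median

ratings = [5] * 4 + [1] * 6
print("mean:  ", mean(ratings))    # 2.6
print("median:", median(ratings))  # 1
```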

chentian418 commented 2 years ago

Group 1G: Tian Chen, Tanzima Chowdhury, Yulun Han, Qihui Lei

Hi Professor Wang, we are really interested in your paper on improving robustness to spurious correlations in text classification. We are still confused about two questions:

  1. Can you elaborate on the exact definitions of spurious correlations and causal associations in the text classification setting? We know examples like 'a sentiment classifier learns that "Spielberg" is correlated with positive movie reviews', but we are confused about the social science definitions.
  2. We have read about your mathematical method for identifying likely causal features, and we want to know more about how exactly this method corresponds to the causal features we know from the social science domain. Thank you!

qishenfu1 commented 2 years ago

Group 1B: Pranathi Iyer, Yuxuan Chen, Guangyuan Chen, Qishen Fu

We are most interested in how you moved from purely observational work to causal inference. The combination of causal inference and machine learning is a very popular topic in computer science these days. We are wondering, compared to machine learning models that are based entirely on observational data, whether and how the introduction of causal inference can improve the robustness of a model. Is this improvement unique to online language processing?

j2401 commented 2 years ago

Group 1I: Yu-Hsuan Chou, Bowen Zheng, Jasmine Huang, Jingnan Liu, Yile Chen

Hi Dr. Wang,

Thank you so much for presenting your work! We have a generic question and a more specific one.

We were a little confused about the algorithm for lowering the dimension of X. Here is a concrete example: "The room is not (), but ()." versus "The room is not (), and ()." Will these two sentences be grouped together (since they are very similar), or will but/and form a substitution pair of interest? If they are grouped together, their scores might cancel each other out in the average (and likewise in the other group); what is the potential impact on the results? Otherwise, if they form a substitution pair, what is the effect of the words in the parentheses? We were also wondering how you decide on the dimension (the similarity-score threshold). For longer sentences, since the interactions between words become more complicated, a low threshold might lead to errors (two long sentences are too hard to compare), while a high threshold leads to high dimensionality. (A toy sketch of the kind of similarity-threshold grouping we have in mind follows our second question.)

Your method for selecting the closest opposite match is limited to binary classification problems. For multi-class classification tasks, how should we identify words that are causal features?
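The kind of grouping we mean in our first question, as a toy sketch (our own illustration, not the paper's algorithm): templates are merged when their TF-IDF cosine similarity exceeds a chosen threshold, and whether the "but"/"and" variants end up in the same group depends entirely on where that threshold is set.

```python
# Toy sketch (not the paper's algorithm): group sentence templates whose
# TF-IDF cosine similarity exceeds a threshold. Whether the "but"/"and"
# variants merge depends on where the threshold is set.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

templates = [
    "The room is not small , but cozy",
    "The room is not small , and cozy",
    "The host was friendly and helpful",
]
sim = cosine_similarity(TfidfVectorizer().fit_transform(templates))

threshold = 0.8  # the "dimension" / similarity-score choice we are asking about
for i in range(len(templates)):
    for j in range(i + 1, len(templates)):
        if sim[i, j] >= threshold:
            print(f"grouped: {templates[i]!r} <-> {templates[j]!r} (sim={sim[i, j]:.2f})")
```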