Open shevajia opened 2 years ago
Hi Professor Pan, thanks so much for sharing your work! I have a few questions regarding the paper.
The limitation of the size of the sampled data. Despite the fact that the original datasets from Twitter and Weibo are extremely large (14 million and 6.7 million), the paper appears to sample only the ten most retweeted tweets for each week. For Weibo, the top hundred Weibo posts were retrieved with Word2Vec and USE given the viral tweet, and these form the basis for annotations used to determine whether Weibo posts are relevant. The fact that only the top ten tweets were used was quite surprising to me. Considering that the most popular posts may differ significantly from less popular posts, does the application of such a sampling strategy have a significant impact on your analysis and conclusions?
Can you elaborate a bit more on the "ordinary users" on Weibo? I find it interesting to see that ordinary users (or in your words, Weibo users without any affiliation to the media or government) play such an essential role in transferring information from Twitter to Weibo. My prior belief was that those Yellow Verified Users would be playing the major roles. In light of Sina Weibo's recent move to enforce the display of the IP addresses of its users, would it be possible to dig deeper into their characteristics? For instance, are most of them living in China's eastern regions that are much more prosperous? I think it would be interesting to examine whether people from more affluent areas are acting as intermediaries in the information transfer process between China and the rest of the world (80% users are outside the US as you point out).
Classification of the contents of the inflow. There has already been a substantial amount of work done by this paper, but I am still curious about the types of tweets smuggled into Weibo. In a worst-case scenario, the imported posts would be mostly racist criticism of Chinese citizens, even if racism is not a popular topic on Twitter, which might have caused major backlash and resentment within Chinese communities during COVID (as implied by your previous experiment on related issues).
Professor Pan, thank you for your time in sharing the research! I do agree with you that "governments all over the world impose restrictions on access to digital information". For example, on 5.05, Professor Lazer gave us an excellent talk about how Twitter suppressed the spread of fake news during the January 6 insurrection, and there's no clear evidence that the government did not have a role in it. In addition, I think the limited quantity of inflowing information can also be explained by something other than government intervention. Your data collection happened during the most challenging time in China, and Chinese people simply didn't care that much about what outsiders were commenting. The most relevant issues were staying at home during the Spring Festival and waiting for the victory of the battle. For example, in Data Index 9, the tweet criticized Wuhan people as "worst passengers, no manners, stubborn, uncivilized and dirty." Apparently, it's not true. And compared to the great battle to fight, including building hospitals and supporting Wuhan, Chinese people simply ignored the nonsense criticism. In addition, delivering such information back would harm the friendship between China and the entire world.
Hi Professor Pan,
Thank you for taking the time to share your work with us. I was wondering if you had a little more information on the "ordinary users" who are circumventing barriers to information flows.
Overall, I am wondering about the effect of possible survivorship bias on the ordinary Weibo users you have identified. Thank you!
Dear Professor Pan,
Thank you for your time today! It is a really interesting topic. Since you discussed the different response windows by news/media type in the research, I wonder how you filtered out the underlying confounding influences that may bias the interpretation of the timeline windows. For instance, it is possible that CGTN's media responses to some news were faster than for other topics simply because the posts were made during a special period or festival, such as the National Day of the People's Republic of China. They may want to avoid potential collective actions. Another question I have is about the participants, as I did not see their information in the appendix. You mentioned that bilingual researchers manually paired the posts/comments on Twitter and Weibo. I am curious whether they have similar backgrounds, such as education levels and cultural knowledge, to distinguish those tweets or Weibo posts precisely. Will this cause any possible bias in the results?
Thank you Professor Pan for sharing your work! I am wondering if you have looked at the expat community in China (especially in the big cities such as Shanghai, Beijing, Shenzhen and Guangzhou) and their role in facilitating the inflow of global information into China, versus domestic residents. The expat community emigrated to China after living in an environment with easy access to globally-used social media platforms, and thus could be more attached to VPN services. I suspect that they were the ones who use both global social media platforms (such as Twitter) and China-specific social media (such as Weibo) to circulate information.
Thank you Prof. for sharing such interesting and applicable work. In your paper, you mention the fact that social media sites restrict content, such as "fake news", but there is no guarantee that these are laissez-faire restrictions. Obviously, there is a lot of hidden information as companies become privatized. In your opinion, in cases such as Twitter where Elon Musk is purchasing Twitter, supposedly for freedom of speech and to ease these restrictions, can the company really be laissez-faire and unregulated, in regards to censorship? Are there any ways that you would imagine one could measure this? Would this idea even be possible under restrictions such as those in China? Thanks
Hi Professor Pan, thanks for sharing your work. I am interested in the measure you use to define information inflow given two spontaneously occurring signals. How do you tease apart information flowing into China from mutual information due to a common driver, or from a flow of information in the opposite direction? I am reading something on transfer entropy, which defines explicitly directional information flow [https://journals.aps.org/prl/pdf/10.1103/PhysRevLett.85.461]. I wonder if such a measure would be appropriate and beneficial to this study. In particular, I am curious whether the transfer entropy from other countries to China exhibits a dramatic decrease, while the transfer entropy from state media to popular media platforms increases, after Chairman Xi seized power. Thank you!
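For readers unfamiliar with the measure mentioned in this comment, a minimal sketch may help. With history length 1, transfer entropy from a source series Y to a target series X is commonly written as T(Y→X) = Σ p(x_{t+1}, x_t, y_t) · log2[ p(x_{t+1} | x_t, y_t) / p(x_{t+1} | x_t) ]. The Python below is a naive plug-in estimate for discrete series. It is not part of the paper's method; the function name `transfer_entropy` and the toy series are made up for illustration.

```python
from collections import Counter
from math import log2


def transfer_entropy(source, target):
    """Plug-in estimate (in bits) of transfer entropy source -> target.

    Assumes two equally long sequences of discrete symbols and a history
    length of 1 for both series (an illustrative simplification).
    """
    if len(source) != len(target):
        raise ValueError("series must have the same length")
    # Each observation is (x_next, x_now, y_now).
    triples = list(zip(target[1:], target[:-1], source[:-1]))
    n = len(triples)
    if n == 0:
        return 0.0

    count_full = Counter(triples)                              # (x_next, x_now, y_now)
    count_next_now = Counter((xn, x) for xn, x, _ in triples)  # (x_next, x_now)
    count_now_src = Counter((x, y) for _, x, y in triples)     # (x_now, y_now)
    count_now = Counter(x for _, x, _ in triples)              # x_now

    te = 0.0
    for (xn, x, y), c in count_full.items():
        p_joint = c / n
        p_given_both = c / count_now_src[(x, y)]           # p(x_next | x_now, y_now)
        p_given_self = count_next_now[(xn, x)] / count_now[x]  # p(x_next | x_now)
        te += p_joint * log2(p_given_both / p_given_self)
    return te


# Toy example: the target simply repeats the source one step later.
source = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0]
target = [0] + source[:-1]
print(transfer_entropy(source, target))
```

In an application like the one proposed, the two inputs would be discretized time series, for example daily topic counts on each platform binned into low/high. Plug-in estimates are biased upward on short series, so any real use would need a comparison against shuffled data.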
Hi Professor Pan,
Your study is super interesting! I was wondering whether the wording of your results might be slightly misleading. You phrased it as information making its way into China 'despite' government censorship. I defer to your expertise, but none of the examples you cited as making their way into China seemed to be information that the government actively attempted to censor. It seems, from the way you described things, that the overall censorship apparatus makes it easier to stamp down on the stuff that they actually want to censor. Is that the case, or am I misunderstanding things? How does information that the Chinese government actually prioritizes censoring make its way into China (or not!)?
Thank you for sharing this work with our group! We've seen a fair share of content on the topic of Chinese online censorship, perhaps due to the ease with which it lends itself to the computational social science toolkit. One previous presentation which comes to mind is from a group at UCLA who found, studying the outbreak of COVID-19 in China, that these crisis moments in fact increase popular access to media by promoting riskier censor-circumventing behavior. This seems to directly relate to the acknowledgement in your paper that studying specific Chinese media outlets may not offer a full picture of media consumption behavior behind the Great Firewall. This is a kind of meandering way of asking whether government censorship in China is more about political grandstanding than the actual management of information.
I have 2 questions:
Thank you!
Hi Professor, Thanks for the sharing. I think the limited amount of incoming information can also be explained by reasons other than government intervention. In addition, I am also curious if the transfer entropy from other countries to China will drop sharply when President Xi takes office, while the transfer entropy from state media to mass media platforms will increase.
Hi Professor Pan, thank you very much for sharing your work with us! My question is about the research design. Do you think it is necessary to apply a similar method to a different topic (e.g., topics not related to politics) as a comparison to the results in the current study? This may help us better estimate the Chinese government’s role in the process. Also, adding other countries that do not ban Twitter as a comparison may be helpful, because without comparison groups we don’t know what a normal transmission pattern of information between global social media platforms and domestic media platforms should look like. Thanks!
Hi Professor, thank you for coming to speak with us.
Given the limited flow of information, have there been any studies done that test how informed China's citizens are about global news coverage compared to similar but more open societies? I believe this would be the next step for your research question and this paper could establish the causal link depending on the results. Further, how does your methodology scale to other instances such as Russia's current information environment?
It’s no secret that the CCP exerts significant control over Chinese media (mass or social). Nonetheless, this paper, which captures such a process, is thought-provoking. It would be interesting to compare the patterns of information inflow from Twitter to Weibo with those from Weibo to Twitter. That is, what are the differences in the ways COVID-related, event-driven messages diffuse transnationally to a highly moderated versus a loosely regulated platform? Additionally, as my fellow classmate mentioned, characterizing the group of Weibo users who facilitate the flow of information would be worth looking into. Lastly, I am curious about the content that didn’t reach the Weibo platform - are they trivial commentaries or significant events that could have been actively censored? Looking forward to your presentation!
Professor Pan, thank you for your time in sharing the research! I do agree with you that "governments all over the world impose restrictions on access to digital information". For example, on 5.05, Professor Lazer gave us an excellent talk about how Twitter suppressed the spread of fake news during the January 6 insurrection, and there's no clear evidence that the government did not have a role in it. In addition, I think the limited quantity of inflowing information can also be explained by something other than government intervention. Your data collection happened during the most challenging time in China, and Chinese people simply didn't care that much about what outsiders were commenting. The most relevant issues were staying at home during the Spring Festival and waiting for the victory of the battle. For example, in Data Index 9, the tweet criticized Wuhan people as "worst passengers, no manners, stubborn, uncivilized and dirty." Apparently, it's not true. And compared to the great battle to fight, including building hospitals and supporting Wuhan, Chinese people simply ignored the nonsense criticism. In addition, delivering such information back would harm the friendship between China and the entire world.
I would count not "delivering such information back" due to fear of "harming friendship" as government intervention, though. Seems like a strategic move to achieve a political outcome. I do agree that Weibo doesn't have to pick up everything on Twitter. I am wondering if the language barrier would play a role in information flow as the majority of Weibo users do not understand English. The agents that facilitate the transmission are more salient in this case.
Thank you for sharing the research! I am also interested in why you chose such a small sample and the patterns of tweets that emerge on Chinese social media.
Professor Pan, thank you for your time in sharing the research! I do agree with you that "governments all over the world impose restrictions on access to digital information". For example, on 5.05, Professor Lazer gave us an excellent talk about how Twitter suppressed the spread of fake news during the January 6 insurrection, and there's no clear evidence that the government did not have a role in it. In addition, I think the limited quantity of inflowing information can also be explained by something other than government intervention. Your data collection happened during the most challenging time in China, and Chinese people simply didn't care that much about what outsiders were commenting. The most relevant issues were staying at home during the Spring Festival and waiting for the victory of the battle. For example, in Data Index 9, the tweet criticized Wuhan people as "worst passengers, no manners, stubborn, uncivilized and dirty." Apparently, it's not true. And compared to the great battle to fight, including building hospitals and supporting Wuhan, Chinese people simply ignored the nonsense criticism. In addition, delivering such information back would harm the friendship between China and the entire world.
I do not agree with the above since what is freely asserted can be freely deserted (apparently).
Hi Professor Pan, thank you for sharing your work! Very interesting topic. One clarification question: how do you count the matched content between tweets and Weibo posts? I see in the appendix that you use the Word2Vec model to calculate the cosine similarity between Weibo posts and tweets, but I wonder how Word2Vec works for Chinese and English respectively. What threshold did you use to determine that content is matched? Another question is that your study covers the COVID-19 timeframe, which has a lot of particularities. Would you anticipate the same limited inflow during another time period? Thanks.
Hi Professor Pan, Thank you for sharing your work with us! It was very interesting to read about government censorship and how information flows despite this barrier. I am not very knowledgeable in this area of research, so it was very interesting and exciting to read about your research! In the paper "How Information Flows from the World to China", the authors mention that about one-fifth of the content that gains widespread attention on Twitter can be found on Weibo. Was this a surprising finding for you? Is this what you predicted, or did you predict that other media outlets would be more dominant? Also, how can the findings of this study be applied to other cultures and other countries? Thank you!
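To make the threshold question concrete, here is a rough sketch of similarity-based candidate retrieval. It assumes the tweet and the Weibo posts have already been embedded in one shared vector space (how that is achieved across Chinese and English is exactly what the comment asks). The threshold of 0.8, the function names, and the toy vectors are illustrative assumptions, not values from the paper; the `top_k=100` default mirrors the "top hundred" candidates mentioned earlier in this thread.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def candidate_matches(tweet_vec, weibo_vecs, threshold=0.8, top_k=100):
    """Return ids of Weibo posts whose similarity to the tweet is at least
    `threshold`, most similar first, keeping at most `top_k` candidates.

    `weibo_vecs` maps a (hypothetical) post id to its embedding.
    """
    scored = [
        (cosine_similarity(tweet_vec, vec), post_id)
        for post_id, vec in weibo_vecs.items()
    ]
    kept = [(score, post_id) for score, post_id in scored if score >= threshold]
    kept.sort(reverse=True)  # highest similarity first
    return [post_id for _, post_id in kept[:top_k]]


# Toy vectors standing in for real sentence embeddings.
weibo = {"post_a": [1.0, 0.0], "post_b": [0.0, 1.0], "post_c": [1.0, 1.0]}
print(candidate_matches([1.0, 0.0], weibo, threshold=0.7))
```

A fixed cutoff trades recall for annotation effort: a lower threshold sends more candidates to the human coders, while a higher one risks dropping loosely translated or paraphrased posts.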
Thank you for coming Professor Pan! It seems like your work has sparked a pretty intense discussion among my fellow classmates. That's always a good sign. I am curious, why did you choose the COVID crisis as the sole time frame within which to study censorship? I understand that this is probably the best time to test censorship, however, having some baseline level of censorship to act as a control would have probably made more sense. Censorship of online platforms during COVID was actually quite common and it's hard for me to distinguish between censoring a medically unsound opinion circulated on social media from a politically motivated removal of a social media post. Also, could you walk us through the reaction your research has had among Chinese researchers? Could you characterize their response to your work?
Thank you for presenting your research at our workshop! How do you expect the flow of information to be different for topics other than COVID-19? In particular, how are entertainment-related information and contentious information on political matters expected to differ?
Thank you for sharing your research with us! I am curious what you think of the indirect and nuanced ways of speaking that Chinese Weibo users have quickly picked up to talk about sensitive things despite censorship. What proportion of information is conveyed that way, compared with explicit posts? How could we account for this indirect way of communicating ideas? Thank you!
Thank you for sharing your work! I was wondering if you observed any patterns in terms of the types of content that are flowing into China that were facilitated by Weibo users. In your paper, two of the examples you gave were more satirical, rather than providing new knowledge. In addition, all three examples seem like the state has little reason to censor them. Considering this, could it be that although non-state-controlled outlets are contributing to the inflow of information, they are not actually able to contribute substantive information--that “matters” or that can “set the agenda” (i.e., whether because of state- or self-censorship)?
Thank you for coming, Professor Pan. I am very interested in China's censorship and how censorship changes over time. Also, is there any method to distinguish between self-censorship and government-imposed censorship? (E.g., some content may not appear on Sina Weibo because users chose not to post it, as opposed to content that is posted and later censored.)
Thank you for sharing your research with us - I found it very interesting! I have one main question. You mention that despite the high levels of censorship in China, the vast majority of people living there don't make serious attempts to evade the censorship. How has the Chinese government gone about censoring the media while avoiding a significant backlash?
Thank you for coming to our workshop Professor Pan! I am curious about what happens to information inflows when censors are overwhelmed with a plethora of domestic content. Meaning, that if there is a lot of discussion about a topic occurring on Weibo, say about recent lockdowns in Shanghai, are censors overwhelmed, allowing for greater inflow from outside sources?
Thank you for sharing your work, Professor. I had two queries regarding your method:
Thank you for sharing such interesting work with us. Since this research on information inflow was based on COVID-19 and was to some extent similar to an event study, I was wondering if the conclusion can be generally applied to other events. If so, is there any evidence of its robustness? If not, what could be the potential differences? Thank you so much!
Thank you for sharing your work with us, Professor Pan! I am wondering if you have done any research regarding account banning due to political reasons on Chinese social media platforms. What are the effects of the bans? How is account banning related to smuggling information from social media platforms outside of China? Another question is do you worry that your research might be used by other governments who wish to impose control on information flow like China?
Thank you for sharing your study, which is particularly unique in its structure. It is truly daunting to conduct a cross-lingual analysis using computational methods, and your project did a splendid job with the w2v and USE tools, preventing loss/miscommunication of information due to translation and transcription issues. The project is truly enterprising in its advancement towards looking at the entire world's internet as a whole. Still, I have the following worries and think it could be made more delicate and robust: 1) I am concerned that the 150-tweet sample may not generate representative information: the selection process from the massive original dataset measured in millions was an impressive job, but a reduction of this scale also greatly reshaped the sample environment and must have been done with some degree of subjectivity. In addition, the eventual result based on the analysis of such a small sample cannot really be considered a computational method and forgoes the advantages of the large dataset. 2) I think the explanation of how Chinese authorities control news inflow is too simplified. Using Weibo is a great option, since it is really personal accounts that transcribe foreign news outlets while state media just neglect them. However, there are multiple layers of censorship mechanisms that come not only from governmental intervention, but also from autonomous abridgement by the posts' authors and the social media platform. In addition, using the compiled historical Weibo dataset could miss a substantial amount of information: (based on my personal observation) during a daily period from midnight to about 4 am, Weibo users tend to communicate in large numbers on matters that would normally be censored, using phrases that could circumvent automated filters, and such records would be deleted by the platform's human supervisors once they come online in the morning.
The traces of these communications would not be found in the dataset, but in fact a lot of news inflow occurred and masses of Chinese people were informed. I wonder if using a real-time collected dataset (definitely many times more work) could avoid the issue.
Hi Professor, thank you for sharing your work. I found the results on "whose Twitter content happens to appear on Weibo," but are there any clues about what kind of content makes its way into Weibo as well? Would it be possible for these deep models to infer which characteristics of Twitter content, besides its creator, make it more likely to flow into Weibo? What topics and what kind of language are tolerated by the enforcers of censorship, for instance, and how has this evolved over time? I guess these questions can be answered without deep learning and through subject matter expertise as well, but I think actually understanding how and in which contexts censorship is enforced by looking at empirical data could be illuminating as well.
Hi Professor Pan, thank you for the amazing project. I'm wondering what the expected influence of deleted tweets or Weibo posts is. Censored information is deleted on a daily basis, but that does not mean those posts had no influence on the community. During the current Shanghai lockdown, a lot of posts have been deleted. However, those posts were circulated and substantially changed the public discursive field. Without incorporating those deleted posts, I wonder how the result would differ from the true effect?
Hi Professor Pan, thank you very much for sharing your work. What are the government's criteria in deciding which types of information to censor or not? How do these criteria evolve over time and in the face of new events (e.g. the COVID outbreak)? How do the censorship criteria remain consistent across different regions and social media platforms?
Hi Professor Pan, thank you for sharing your work! I have noticed that during the latest Shanghai lockdown and a series of earlier social events, sensitive posts last for a much shorter time on social media in China. Since this strict and pervasive censorship does not look likely to stop or return to pre-COVID levels, do you think this will affect future studies of social media in China?
Dr. Pan,
Thank you for sharing your work with us! My question comes more from a general curiosity with regards to censorship in the digital age. As I understand, China has practiced censorship of foreign media since long before the development of social media (or even the internet as a whole). Have there been major changes to the government's policies stifling the flow of transnational information with the advent of these new technologies? Has the level of control increased or decreased as a result (thinking along the lines of more powerful censorship tools, but also additional inroads for information)?
Hi Professor Pan, thank you for sharing such interesting work. A quick data question concerns the proportion of "tweets" in the Weibo-COV dataset that are pre-censored. Could your result be understating the information flow if some posts are censored after they are posted? I also have a question regarding the content of the viral tweets. Among the viral tweets, are certain types more represented among those that flow toward China (those that support China, oppose China, or are neutral)?
Thank you, Professor Pan, for sharing this amazing work on information transmission via social media! As a frequent user of Weibo, I have a strong sense of the flow of unmanaged information on social media despite censorship and state control of media. In my experience, a substantial amount of such information is transmitted by influencers who are not based in China, who have unlimited access to the global internet, and who can create accounts on Chinese social media and share content. I have two related questions: 1) Even though this global information can be disseminated quickly by these users, sensitive information is usually quickly blocked by officials and disappears from Chinese social media after a substantial number of views in a short time. How can we weight information that shows up, is disseminated initially, and then disappears? 2) Chinese platforms like Weibo have recently introduced some new measures, a significant one being that every post has an attached geographic place of origin. As I mentioned, if a substantial part of information transmission is conveyed through overseas users, how will the place of origin attached to the post influence the dissemination process?
Hi Professor Pan, thanks for sharing your work with us! I am curious about the effects of deleted Weibo posts on the results. Since Chinese social media is strictly censored and some sensitive content/topics would be removed very shortly after being published, would this content affect the inflow of information? Also, besides the inflow of information from Twitter to Weibo, would you also be interested in the outflow of information from Weibo to Twitter (from China to the world)?
Professor, thank you for sharing your work; I really appreciate the attempt to compare information across platforms. This idea is really cool, and I know it's challenging. However, I am a little confused about how the data and the method answer the question posed at the beginning, using 150 sampled tweets out of millions of tweets. Can you discuss it in more detail? As one of my colleagues said, it's no secret that China maintains an internet firewall and censors content. I am thinking about a less identifiable topic. Would it be worth studying how media in countries where information can seemingly flow in freely, like the BBC and CNN, selectively ingest information from other parts of the world and pass it on to the public, and how this differs from countries that exert extensive censorship?
Hi Professor Pan, thanks for sharing your work! I find the research design very interesting, which combines automatic data collection pipelines and human-involved preprocessing. I was wondering how the validity of the human matching process is verified. Thanks!
Dear Prof. Pan, thank you very much for sharing your work! A lot of us here have experienced the censorship on Weibo ourselves, and your research really attracted our interest (and dispute). I think my classmates have addressed the curiosity I have, and a lot of them are adding insightful comments. My biggest concern is: Many Weibo posts are censored and deleted due to reports from other users. Therefore, is it possible that some posts were not deleted simply because their dissemination scale is small, or only within a certain group of people?
Thank you for presenting such interesting work! I'm wondering if it is possible to look at other topics (such as popular culture, and negative news about foreign countries, especially the U.S.) and their penetration into the Chinese social media world? That should provide some indication about the strategic considerations behind the Chinese government's acquiescence to information inflow.
Prof. Pan, thank you so much for sharing your work with us.
The saying "lost in translation" implies that as information moves across different languages and media, its meaning also changes. This also seems to be the case in the examples of cross-platform information flow shown in the paper: instead of showing the original article in full, people cited, quoted, and questioned the information that flowed in. My question is: have you thought of ways to characterize or quantify this type of translation? If the Chinese government is capable of controlling the ways of translation (even if they have less control over the content), then the argument for citizen agency becomes weaker.
Hi Professor Pan, thank you for bringing us such interesting and significant work! Yes, I think we, as Chinese citizens living in China, can sense from our daily lives that information barriers are not impenetrable, even in a country with strong censorship like China. What I am also interested in is the phenomenon that nowadays people are aware of the existence of government censorship and respond instantly, for example by forwarding censored news one by one as a way of fighting back. What do you think of it? Would it be possible for you to look into the interaction between Chinese netizens and the government?
Hi Professor Pan,
Thank you for sharing your research with us. I’m curious if you noticed any patterns of focus or sentiment in the Twitter and Weibo data between the transmitted and untransmitted. In other words, were the posts that were more transmitted to Weibo more likely to be offensive/negative about China and more likely to generate debates about behaviors regarding Western nations?
Hi Prof. Pan, thank you so much for presenting your research to us this week! This topic is quite interesting, especially for us Chinese. I have a question regarding step 3, human matching. I noticed that two research assistants finished all the matching work. Is there any possibility of hiring more RAs and cross-validating the matching results to prevent any misclassification?
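One common way to check agreement between two annotators, in the spirit of the cross-validation suggested above, is Cohen's kappa, which corrects raw agreement for agreement expected by chance. The sketch below is a generic illustration and not something the paper reports; the function name and the toy labels (1 = "Weibo post matches the tweet", 0 = "no match") are assumptions.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Share of items on which both annotators gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected if each annotator labeled at random with
    # their own label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout; kappa is
        # undefined here, and returning 1.0 is a convention of this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy labels from two hypothetical research assistants.
ra_1 = [1, 1, 0, 0]
ra_2 = [1, 0, 1, 0]
print(cohens_kappa(ra_1, ra_2))  # 0.0: agreement no better than chance
```

With more than two annotators, a multi-rater statistic such as Fleiss' kappa would be the analogous choice.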
Hello Professor Pan,
It's a fascinating methodology, quite impressive and on a very interesting topic. I understand given the complexity of the research that a significant amount of human hours were needed to complete it, but I wondered if you might talk a bit about any extra levels of automation you tried for this project and then abandoned for lack of efficacy, if there were any.
Dear Professor Pan,
Thank you so much for presenting your research. The research is really interesting. Besides the existence of Chinese censorship, I think not having any restrictions also poses some problems. I was wondering what you think the ideal balance or level of control would be for us as a society. It would be appreciated if we could hear your thoughts on it.
Thank you so much for sharing your work! Here is my question: As you mentioned, censorship does exist on Chinese social media like Weibo. Have you ever thought about weighting the censored information? Do you think it is plausible at all? What could be the possible approaches and methodology? Also, social media platforms like Weibo recently added new features to show a user's geographic location based on IP. What impact do you think this new move can have, and how can it be studied? Can this change contribute to any specific topics?
Looking forward to your presentation.
Dear Professor Pan,
Thank you for sharing such interesting research related to media censorship in China. I have several conceptual and technical questions regarding cultural and language differences in cross-border research.
Thank you!
Thanks for sharing your research! I wonder whether there is any deplatforming strategy on social media platforms such as Weibo, and how Weibo adjusts its popularity scoring algorithm to serve the government's purposes as well as to suppress collective action.
Comment below with a well-developed question or comment about the reading for this week's workshop. These are individual questions and comments.
Please post your question by Wednesday 11:59 PM, and upvote at least three of your peers' comments on Thursday prior to the workshop. You need to use 'thumbs-up' for your reactions to count towards 'top comments,' but you can use other emojis on top of the thumbs up.