uchicago-computation-workshop / Winter2021

Repository for the Winter 2021 Computational Social Science Workshop

01/21: Chris Kanich #2

Open smiklin opened 3 years ago

smiklin commented 3 years ago

Comment below with questions or thoughts about the reading for this week's workshop.

Please make your comments by Wednesday 11:59 PM, and upvote at least five of your peers' comments on Thursday prior to the workshop. You need to use 'thumbs-up' for your reactions to count towards 'top comments,' but you can use other emojis on top of the thumbs up.

bakerwho commented 3 years ago

Thank you for presenting at our workshop! Your operationalization of the axes along which users think about their data on the cloud was a clear and useful framework that I'm sure I'll keep coming back to.

While reading this research, I couldn't help but think about my own inertia (for years now) against organizing my Google Drive storage. I recently bit the bullet and paid for additional storage - especially now that I'm having to keep track of data across multiple Google accounts. At the same time, I certainly have a lot of personal data on hard drives which is differently concerning - it is arguably more secure to access a file on another computer via the cloud rather than plugging in a pen drive and uploading it there!

What does your work on Aletheia say about the types of new paradigms users might find valuable in managing their data? Do you see a feature like 'delete by X date' or 'ask me to delete in Y days' being useful or catching on? Do you see these solutions coming from within organizations like Dropbox and Google, or do you think they might be disrupted by new 'smart file management' SaaS technologies (especially if they can integrate well with these platforms as your MTurk study seems to have done)?

Raychanan commented 3 years ago

People often forget that they have stored certain files on cloud services like Google Cloud and Dropbox. To protect users from the resulting risk, you designed Aletheia to predict the usefulness and sensitivity of files, and the accuracy of this algorithm seems to be pretty good.

My concern is that even if such algorithms are very mature, some companies may still, for profit reasons, not give users buttons (or alerts) for deleting files. I see that the recent policies of Google Cloud and Google Photos are actually shrinking the amount of free cloud storage users can use, thus driving people to pay for additional storage. So, from a profitability perspective, I think companies have an incentive not to take advantage of algorithms like Aletheia. Can I ask what you think about this problem? Thanks!

MkramerPsych commented 3 years ago

Professor Kanich,

Thank you for sharing your research with us! As I recently migrated all my work from cloud storage to a self-managed NAS, I spent some time pondering the tradeoff between ease of use and efficiency of cloud storage. My questions are as follows.

  1. I am curious as to whether the observed effects would translate to other external storage media, or even internal storage. I know from experience that Drive and Box can quickly turn into a graveyard for files - especially when files are regularly shared with you. Do you think people would hold the same preferences for external hard drives, or even folders on the desktops of old computers? What is it about cloud storage specifically that drives this behavior?

  2. On a similar point, how much of the observed effect do you think may be attributable to HUI elements common to cloud storage? I wonder if users would be as quick to delete/encrypt/offload certain files if a clear directory structure and supporting GUI were overlaid, potentially even using automated processes.

j2401 commented 3 years ago

Hello,

Thank you for sharing your research with us!

I am currently a heavy user of Dropbox, so your research immediately reminded me to check the files in my cloud storage. I find that most of them are documents (homework, papers, code) and a large proportion of them are totally useless, for example the .aux and .log files from LaTeX compilation. However, just as @MkramerPsych mentioned, a huge collection of forgotten files also sits on my old computers and even on the laptop I currently use. Do you believe that cloud services are more vulnerable to the risk of a breach? If not, do you believe people hold similar or different attitudes towards forgotten files left in cloud storage and on hardware devices?

Lynx-jr commented 3 years ago

Hi Professor Kanich, thanks for sharing your research with us! I don't have questions, to be honest, but my thought on the question from @Raychanan is that even if Google does not have enough incentive to incorporate Aletheia for individual users, there are still application scenarios in private companies' databases, and that's always a larger piece of the pie. Service providers like Google could also consider streaming the data that is used less onto other servers to reduce pressure (not an expert on this, so I'm just making a wild guess).

It is fun to read about everyone's experience with cloud storage and I'd like to share mine as well: Google Cloud (barely use it); Baiduyunpan (the Chinese Google Cloud, VVIP account till today for storing games). But I got tired of enduring the network speed constraint, so I bought a 1 TB portable hard disk last year.

boyafu commented 3 years ago

Professor Kanich, thank you for sharing this fascinating research! I am curious about the origin of the abundance in digital storage. Is this phenomenon rising with the new era, or do advancing storage techniques somehow magnify people's inherent habits? In the latter case, it might be more like a tradeoff between convenience and awareness of potential risks.

lulululugagaga commented 3 years ago

Thanks for sharing! Waste in cloud storage is indeed a big problem. To my knowledge, most tech companies in China do not earn money from this service. Rather, they use it to keep their customers more active. Therefore, much of their effort is spent on saving storage space. Baidu, for example, owns an advanced technology for immediate data splitting and assembling, which helps the company maintain its leading position in this field because the technology stores data in slices, so they don't have to pay to store files as a whole.

NikkiTing commented 3 years ago

Thank you for sharing your work! I agree with what @Raychanan has mentioned about private companies. With the convenience and relatively smaller costs associated with cloud-storage services, I think more people are turning to paid services to store more and more of their files (e.g., rather than buying external hard drives). And one likely factor driving people to opt to buy more cloud storage is that they are too lazy or don’t have the time to manage a large accumulation of files. Private companies likely take advantage of this behavior and may not have the incentive to adopt retrospective cloud data management systems. Given an increasing number of people are probably paying for extra cloud storage, I would like to ask how you think the observations from your research would compare to the behavior of users of paid cloud-storage services.

ttsujikawa commented 3 years ago

Thank you for sharing your interesting and fascinating work.
I personally feel that the abundance of digital files has become increasingly problematic, especially since people started thoughtlessly storing their personal information. In my opinion, companies need to be more responsible for potential risks. Is there any way for private companies to ensure the protection of such a huge amount of digital data? Are there any examples?

skanthan95 commented 3 years ago

Thank you for sharing this important work! My peers have asked the questions that I had in mind, so I'll ask some more general questions about cloud storage.

xzmerry commented 3 years ago

Professor Kanich,

Thanks for sharing this research with us! It is meaningful and related to everyone's daily life, as I often encounter such data management problems when using cloud storage. A lot of questions I had when reading your first paper were addressed in the second paper, but some still remain:

First, how do you control for or eliminate selection bias here? Since the research is somewhat sensitive and people might hesitate to participate, how do you ensure that the patterns you observed were not induced by selection bias? Would a more statistical check be better? (In the first paper, your explanation is that people in the survey might have less sensitive data in their storage, and there was still some sensitive data in their storage. But you also mentioned that people's attitudes towards cloud storage vary with their understanding of how the cloud works, so those who participated might also have a better understanding of cloud storage and regard it as a safer choice.)

Second, I am still curious about some ethical details mentioned in your paper (though you have stressed them). How do you ensure, technically, that there would be no leakage of private data here? Also, how do you convince users to participate in this research?

Finally, a conceptual question: since "participants perceived cloud storage to be less secure than local storage", they might tend to store less sensitive data in the cloud. Given that, how do you see the value of this research? And what kind of security should data management strategies aim for in this setting?

Thx

chrismaurice0 commented 3 years ago

Thank you, Professor Kanich,

I am curious as to the motivations behind writing these two papers, both of which were fascinating. As someone who uses Google Drive solely as a backup for what is on my hard drive, I have never been that concerned with any personal information that exists in the cloud because those same files exist on my hard drive. I know having files stored on the cloud presents a risk, but not any greater than the risk of having files on my hard drive. Did you ask the participants questions about privacy concerns with the cloud versus a hard drive? Are you instead interested in freeing up storage in the cloud?

JadeBenson commented 3 years ago

I've been really interested in the push for greater privacy on social media (like the right to be forgotten) and its many challenges - but I had never thought about these issues in regards to Cloud storage. Thank you for introducing me to this and for your thoughtful research.

I was particularly interested in your discussion on property and privacy perceptions of shared files. I could imagine a case where one user has shared information which they deem as now undesirable, but the recipient wants to retain the information to pursue legal action (I'm sure you can also see how this could be used unethically too). You mention a few possibilities in your paper and I was just wondering what are your opinions on how to handle shared files and how would you adjust the algorithm to reflect this? Do you think other security features could be developed like warning users that shared files have been accessed at different frequencies by the recipient, differences in "uselessness" perception between the two, the recipient selects different storage preferences when prompted than the user, etc.?

rkcatipon commented 3 years ago

Dr. Kanich, thank you for sharing your work! Is there a good rule of thumb that everyday consumers like myself should consider when managing their private data? Because of other data breaches and given how easy it is to track personal information online, I have very little expectation of my own privacy. At this point, I expect companies to use my data to advance their own business objectives, with or without my knowledge. What are some of the ways consumers can learn more about privacy vulnerabilities and also demand more privacy features from their providers?

YijingZhang-98 commented 3 years ago

Dr. Kanich, I really enjoyed reading your paper. It's an interesting and important issue to explore. It occurs to me that we could use the prediction model to predict which files users may want to delete, stop sharing, etc. The cloud storage service could then automatically ask its users whether they want to delete or stop sharing those files. That might be a good application of the prediction model : )

a-bosko commented 3 years ago

Dr. Kanich,

Thank you for sharing your knowledge with us! The topic of cloud storage and data management is extremely relevant in today's time. This topic is something not always thought about, but can lead to security risks and privacy breaches, as you mention.

Here are a few questions I have pertaining to your Forgotten But Not Gone article:

  1. Should all data storage platforms be required to share how data is managed on their platform? This could be a way to hold these platforms accountable for protecting their users, as well as push the platforms to do their best to ensure security and prevent breaches.

  2. Should data storage platforms be encouraged to send reminders about old stored files? If these platforms send out reminders about unused or unopened files that haven't been used for a while, this might help push people to remember and delete files that aren't necessary to them anymore.

mikepackard415 commented 3 years ago

I'll add to the chorus of thanks for sharing your work with us! Very interesting and relevant research. My question has to do with data storage literacy in the general population. Do you think that in addition to automated tools for human-assisted file management, there is a parallel channel of work to be done in educating people in file management best practices? Thanks again.

sabinahartnett commented 3 years ago

Thank you for sharing your work with us! Since this study interrogated the possibility of a personal 'data leak', I'm wondering what relationship you see between this type of security breach and machine learning algorithms that can predict some of this private data without necessitating a leak. How do you think this has influenced people's perceptions of their data privacy?

nwrim commented 3 years ago

Thanks for coming to our workshop and sharing your work! I find your approach in your draft of using qualitative interviews for guiding quantitative study at the later part of the study highly interesting. Could you tell us more about this process? For example, did you have a pre-designated set of participants to aim for when you started recruiting? Or did you stop when you were confident that all the important information has been acquired from the sum of the interviews? How did you map the features you found in the interviews to the features of your quantitative survey/experiment, and to features of the machine learning model at the final stage?

kthomas14 commented 3 years ago

Hello Dr. Kanich, thank you for sharing your work! My question pertains to the Forgotten but not Gone paper. How did you manage participant privacy? When dealing with such personal and sensitive information, how were you able to ensure that there would not be any issues regarding collecting and storing participant information?

alevi98 commented 3 years ago

Hi Dr. Kanich,

Thanks for sharing your research and coming to the workshop! I'm excited to hear your presentation tomorrow. Data management feels like one of the most crucial emerging social issues in today's world, especially as we've spent the past year operating almost entirely online. I have a couple questions:

1) others have raised concerns that large companies might not implement retroactive file deletion reminder mechanisms because they can charge customers for purchasing extra storage. Do you think public, government-subsidized cloud infrastructure is a viable possibility for the future? Why or why not? If yes, how would privacy concern mitigation compare to private companies?

2) I know in gmail there's a feature that tells me when I haven't opened messages from a particular sender list in a while, and it asks if I want to be removed from that list. Has anyone tried to use the same technique for a retroactive file deletion reminder mechanism? That came to my mind when I was reading about how this task might be challenging to automate.

Thanks in advance :)

ydeng117 commented 3 years ago

Thank you for sharing your work! My question is whether the problem of forgetting personal files relates to the absence of information in the user interface. To be more specific, if cloud storage platforms were required to show, in a conspicuous place in their UI, how long a particular file has been saved and whether it has been shared, would people still forget to manage their files and unknowingly leak their private information?

hesongrun commented 3 years ago

Thank you so much for the wonderful presentation! How do you establish the external validity of this research? From the figures, we know Aletheia performs very well for your 3525 labeled files, but how do we know it works for a broader population, especially when there is great heterogeneity across people in their perceptions of the files' sensitivity, usefulness, and desired management? Thanks!

SoyBison commented 3 years ago

Thanks for sharing your work! I noticed that you used technical background as a proxy for privacy-mindedness, or at least you included it in the analysis so as to have something to link to technical knowledge. The usual intuition, I think (and mine at first), is that these should be positively related: more technical people are more privacy-minded. But at the same time, I know a decent number of people with advanced computer science, engineering, and even cryptography degrees who don't think about protecting their privacy online at all. As such, I wonder how important technical expertise actually is to the analysis, and perhaps this could explain the small effect size you observed.

linghui-wu commented 3 years ago

Thank you for sharing your research with us. It is really exciting to see studies on topics relating to our everyday life. I believe individuals like me, who have a terrible file management system and thus waste storage unnecessarily, would benefit greatly from your algorithm. However, I have a concern similar to the one raised by @Raychanan: to what extent would tech giants like Google be sufficiently motivated to adopt it at the potential cost of profitability?

ddlee19 commented 3 years ago

Thanks for sharing your work! I am curious about the specific duration of longitudinal data management you propose. When working with participants, were there common timelines or a threshold of time passed before the participants forgot about their file (important enough to be private and protected)?

Thanks!

AlexPrizzy commented 3 years ago

Super cool research! I'm interested in the privacy of social media and expecting temporality to fluctuate as a function of time, since users can't always predict what their preferences will be in the future. Did you investigate the effect of emotions on memory of files? Since memory can be affected by internal states (anxious, depressed, happy, etc.), I'd be interested in how the emotions associated with a certain file may affect the user's ability to remember that they have these files stored somewhere.

cytwill commented 3 years ago

Thanks for presenting this work. I think this is an important issue for many cloud storage platforms today. Besides acknowledging users' regrets over not deleting some files, I want to know what kinds of factors might lead to stored private files being forgotten by individuals and, more importantly, whether we can predict which files should be removed from the cloud for which customers based on their past uploading, deleting, or encrypting histories. Also, are there specific and scientific approaches (like checking the editing history) to decide whether a file has been forgotten by its user?

bjcliang-uchi commented 3 years ago

Thanks for sharing! I am curious about the psychological implications of your survey questions. For example, 1) do participants expect "deleting" a file to be permanent, or do they simply want it to be removed from the cloud? 2) do work-related files (does content matter) have a faster decay rate? 3) does "not remembering the file name" necessarily mean that the users cannot remember anything about the files? It seems to me that they may simply have very bad naming habits.

Tanzi11 commented 3 years ago

Thank you, Professor Kanich, for your work. It takes on a new dimension for me as my Google Drive threatens each day to be over capacity... I wanted to ask you about your interview methodology: how did you decide that 17 qualitative interviews were enough and, in this scenario, why did you choose to include a qualitative component? How do you feel it added to the quantitative interviews?

Yutong0828 commented 3 years ago

Thanks for this very inspiring paper! It provides a unique way of perceiving the relationship between users and big data. I have a question about the motivation behind the act of deleting. Users might delete either because they want to free up space or because of privacy concerns. Could there be a difference between the two? Maybe such a difference in motivation could influence the algorithm designed for giving suggestions in retrospective file management.

jsoll1 commented 3 years ago

Thank you for sharing your papers with us! It's always exciting to get early access to an unpublished paper!! I'm still pretty concerned with the selection bias present in your papers: I don't see any people who have sensitive stuff that they care about agreeing to take part in this study. And if this is a discrete group in the population, are they major contributors to the overfilling of cloud-based storage?

I'm also a bit unclear on what you did when you were categorizing people as Google Drive or Dropbox users. You categorized people based on which service they had more files in, and then needed more Dropbox users, so you put out a call for that group exclusively. Isn't it possible that the people you later recruited also had more stuff stored on Drive than on Dropbox?

william-wei-zhu commented 3 years ago

Thank you for your paper. I wonder what is the next step for Aletheia? Does your team plan to improve the product and push it to market so that users can use it to clean their cloud storage space?

tianyueniu commented 3 years ago

Thank you for sharing the inspiring work! I can definitely see myself using a product like Aletheia. As Aletheia also has to gain access to users' data in order to analyze them, how would the product guarantee users' data privacy?

wanitchayap commented 3 years ago

Thank you for sharing your work! I wonder if there is a way to better design the cloud storage platform to help or incentivize users to organize their files better and be more aware of their own files (I mean before the stage where they need Aletheia)? Another question: is it possible that Aletheia has unintended negative effects? For example, it could be biased in some harmful ways, or it could encourage some sort of self-fulfilling prophecy.

wanxii commented 3 years ago

Technically, it's intriguing research. However, I'm quite concerned about the privacy issues that might accompany real-life application (especially for personal users rather than firms).

As cloud storage with its massive capacity is attracting more users to upload their private files to cloud servers, adopting this kind of algorithm might require users' consent, since classifying their files also means that their files would be "inspected". It could be extremely offensive if some algorithm were judging whether users' files are "sensitive" without the users' knowledge.

Therefore, I wonder if it's feasible to encrypt the entire classification process in order to better protect users' privacy.

Yilun0221 commented 3 years ago

Thank you for the presentation, Dr. Kanich! My question is how to select the participants and files to reduce the bias of the research results?

hihowme commented 3 years ago

Thanks a lot for this amazing paper! I am wondering how this waste in the cloud would affect the businesses built on it, and how it will influence consumer welfare. Thanks a lot!

yutianlai commented 3 years ago

Thanks for your presentation! I'm wondering how you perceive the future of cloud storage.

FranciscoRMendes commented 3 years ago

Thank you for sharing this wonderful research. I think we are all very aware of breaches of privacy in social media but much less so about breaches in cloud storage. I'm wondering if you have thought about applying your work to cloud data storage at the corporate level?

More importantly, what can we do to automate this process of data management? I do not think it's reasonable to expect a human being to do this, particularly since people who are less likely to understand the best practices of data management make up the vast majority of cloud tech users (by account numbers).

heathercchen commented 3 years ago

Hi! Thanks for presenting these wonderful papers to us. I was wondering what your criteria were for selecting the participants of the online survey. Amazon MTurk does not seem to be a very reliable source of survey respondents, especially for a sample as small as 100. What external validity would you expect of your results? Thanks again in advance for your presentation!

NaiyuJ commented 3 years ago

Thanks for sharing! Fantastic work! My question is: in practice, how, and how much, would the user study you conducted in the paper eventually improve the real product?

goldengua commented 3 years ago

Thank you for presenting in this workshop! I was wondering what kinds of implications your results may have to inform internet companies' policy-making?

Dxu1 commented 3 years ago

Thank you for sharing your interesting work! It seems to me that there is a potential trade-off between privacy and efficiency here. While one might benefit from an algorithm that stores information more efficiently, the algorithm may also have access to sensitive information which users may barely use. Users may also disguise the information (file name, etc) hoping to draw less attention from others. What is your view on this potential conflict? Thank you!

Jasmine97Huang commented 3 years ago

Thank you so much for sharing this interesting topic! Different cloud services might be used differently by users. Based on personal experience, I feel like Dropbox and Google Drive are mostly used for school/work while iCloud contains more personal information. In this case, I am wondering why you focus your paper primarily on the privacy issues on Dropbox and Google Drive?

Bin-ary-Li commented 3 years ago

Thank you for this thought-provoking share. I always think that retrospective cloud management regarding personal privacy is a really big issue. Let me cut to the chase, I believe that digital permanence, while tempting, is not a good idea for security and privacy. Cloud management service should not have the indefinite right to the user's uploaded data. Law should mandate the maximum life-span of cloud data in order to fully protect everyone. I wonder what is your opinion on this matter?

mintaow commented 3 years ago

Thanks for sharing your work! My questions center on the potential risk in data processing by Aletheia itself.

I agree that the potential demand for cloud data management tools is large, as demonstrated in Khan et al. (2018): 83% of participants wanted to delete at least one file they saw, while 13% wanted to unshare at least one file. Still, it occurs to me that people might regard using Aletheia as taking an additional risk because they have to open their metadata to Aletheia. Therefore, I am a bit curious about how you would address this concern if Aletheia really goes to market.

luckycindyyx commented 3 years ago

Thanks for sharing! I noticed that you conducted a 100-participant online survey, and I was wondering if you had considered expanding the sample size. Also, when you passed this survey assignment to AMT, were there any requirements for selecting respondents? Thanks!

mingtao-gao commented 3 years ago

Thank you for sharing your paper and presenting tomorrow! In the paper you talk about encrypting and deleting together, but in my opinion deleting a file for lack of future utility and encrypting a file out of privacy concerns are two different types of behavior. Can you elaborate more on how both reflect the need for retrospective privacy mechanisms?

xxicheng commented 3 years ago

I enjoyed reading your work! What is your next step in this project? How do you see its market application value in the future?