uchicago-computation-workshop / Winter2021

Repository for the Winter 2021 Computational Social Science Workshop

01/21: Chris Kanich #2

smiklin opened 3 years ago

smiklin commented 3 years ago

Comment below with questions or thoughts about the reading for this week's workshop.

Please make your comments by Wednesday 11:59 PM, and upvote at least five of your peers' comments on Thursday prior to the workshop. You need to use 'thumbs-up' for your reactions to count towards 'top comments,' but you can use other emojis on top of the thumbs up.

Leahjl commented 3 years ago

Thank you for sharing your interesting work! I'm curious about how you would improve the efficiency of cloud storage management.

XinSu6 commented 3 years ago

Thank you for sharing this great research. I am wondering how, specifically, users' privacy is guaranteed in the data selection and processing stages.

Thank you, and I look forward to your talk.

Qiuyu-Li commented 3 years ago

Thank you for sharing your interesting and fascinating work. It’s certainly interesting to hear everyone’s discussion about the conflict between personal interests and the profits of companies or service providers, and the tension between convenience and privacy. And @JadeBenson provided interesting ideas on how to handle shared files in the cloud. Honestly, I don’t have any new questions that my classmates have not already raised.

MegicLF commented 3 years ago

Thank you for sharing your research! I am actually a heavy user of cloud services, and I rarely use local software for documents. I use both Google Docs and MS Word online for writing papers, Overleaf for writing LaTeX, Google Colab for larger datasets, and Google Drive and OneDrive for storing documents. Using these cloud services frees me from having to bring my laptop everywhere: I can just use my iPad or borrow someone else's laptop to access my own documents. Like my peers, I also think companies have little incentive to implement Aletheia in their cloud services, and I wonder what your opinion is on this issue.

shenyc16 commented 3 years ago

Thank you for sharing this interesting research with us. If I understand correctly, the ultimate goal of the research is to design mechanisms for automatic retrospective file management in the cloud. I was wondering whether additional factors that may have latent effects on users' decisions should be included in the model. Also, is the sample size of 100 sufficient for this purpose?

luyingjiang commented 3 years ago

Thank you for the presentation, Dr. Kanich! My question is: how did you manage participant privacy when dealing with personal and sensitive information?

MengChenC commented 3 years ago

Thank you for your excellent work. I really like the second paper, which extends the concerns of the previous research and develops an algorithm to address them. The comparison between random, majority, and GDLP also makes the model more practical. However, I notice that the model cannot accurately recognize/predict files that need to be protected. Would you be able to elaborate on what causes this and on any approaches to improve the prediction? Thank you.

YuxinNg commented 3 years ago

Thanks for sharing. Like many others, I am wondering how your research could be applied in the real world. Will those huge tech companies be willing to adopt it? And how well will customers accept it? Thanks

adarshmathew commented 3 years ago

Thank you for your paper and Aletheia!

I have a question about your choice of labels. The way I see it, keep & delete form one binary choice, but the question of protection is separate from that of preservation. I could have a racy photo of an ex-partner (sensitive, needs protection, don't want it featured in a data breach) that I want to delete nonetheless. Would it make more sense to split your outcome labels into a (keep/delete, protect/don't protect) tuple? It might shed some light on why Aletheia's accuracy on 'Protect' decisions is extremely low.


jinfei1125 commented 3 years ago

Thanks for sharing your great papers! They are really interesting, and I think managing forgotten files in cloud storage is a real pain point for many users (including me; I have too many photos in my cloud storage and no time to manage them...)! I am excited and look forward to the service discussed in your second paper, which can help users automatically find and manage files in cloud storage.

I have two questions about your paper:

  1. If such a service were commercialized, how would you persuade users to trust you? As you say, many files are sensitive.
  2. I am from China, where we don't often use Google Drive or Dropbox; their free storage is small (20GB and 2GB, respectively), but download speeds are fast. I think many Chinese users use Baidu Cloud, which offers large free storage (I have 2068G of free storage and have already used 1000G of it, mostly for my photos and backups of my computer and cell phone). Needless to say, I have forgotten most of them. Do you think the service discussed in the second paper would need some adjustments under these conditions?

Thank you again for these interesting papers!

Rui-echo-Pan commented 3 years ago

Thank you for sharing! I don't have any questions related to this paper, but I am curious about what influences people's willingness to keep files always accessible and their unwillingness to delete them.

ghost commented 3 years ago

How did you manage to get the data?

zixu12 commented 3 years ago

Thanks for sharing your inspiring work! I have a very basic question: could you please elaborate more on the "retrospective" feature of data management? How would it help us manage the data in the past? In Google Drive, a grid of the most recently opened files is displayed in a very prominent place. Also, in the iPhone photo gallery, there is a "Memories" feature that helps you look back at photos from a year ago, etc. Would these count as retrospective features as well?

bowen-w-zheng commented 3 years ago

Hi Professor Kanich, thanks for sharing this work. I am curious about how you turn qualitative interviews into useful insights for feature engineering. I rarely see people incorporating qualitative information into the model-building process. What are the advantages of using this kind of approach to aid feature selection, as compared to a more data-driven approach?

Thanks!

97seshu commented 3 years ago

Thank you for presenting. I wonder how you managed to overcome the ethical issues associated with your project, since private and sensitive information was collected. And how did you work with the companies (e.g., Dropbox and Google Drive)? Thank you.

chiayunc commented 3 years ago

Thank you for your presentation. In your 2021 working paper, you mention that for low-sensitivity files, the preliminary classifier has low accuracy. This might not be an issue, since the classifier still serves its purpose, but I wonder what the reason behind this could be. Do you suspect it is method-related, or that highly sensitive data share some common trait that makes them hard to misclassify? Thank you.

fyzh-git commented 3 years ago

Thank you for presenting this interesting topic. The idea of cloud storage management is especially useful for improving storage allocation efficiency in the digital age, with its explosive data storage needs.

The implicit conflicts between the size of data stored, convenience, and privacy issues have inevitably imposed difficult tradeoffs for the developers of cloud storage platforms.

The findings in this paper, especially on the proportion of data that are useless (not intended to be stored) or become useless after long enough periods (no longer needed or gradually forgotten), provide practical guidance on how to uncover users' real needs, and thus how to best coordinate those needs with the storage and privacy protection burden on cloud storage companies.

This is an enlightening example of developing research ideas from practical needs. Thank you!

anqi-hu commented 3 years ago

Thank you for sharing your work! Your study is definitely innovative in using MTurk to examine cloud storage platforms. My question has to do with the nature of the accounts these users have created on these platforms. For example, Dropbox accounts are usually linked to one's institutional affiliation, which one may abandon or lose access to once that affiliation ends. Assuming that at least some of the files are no longer important and the accounts are not used for the users' primary storage purposes, do you think it would be valuable to treat users' attitudes toward these files, and perhaps their storage decisions, as time-varying factors?

chentian418 commented 3 years ago

Thanks for sharing the inspiring paper! When using algorithms like Aletheia to improve data management in cloud storage in the future, would you expect data privacy to be an impediment? How do you see the feasibility of using such algorithms to identify files that are both risky and useless in real life? Thanks!

afchao commented 3 years ago

Thank you for sharing your work with our group! My question is more of a comment, but I'd like to know whether you would expect a larger number of more clearly defined "privacy personas" in a broader sample. I suspect that cloud storage use behavior would vary significantly across demographic groups, and that each such behavioral cluster would bring with it a relatively specific preference for how its data should be handled. I agree with the paper's claim that the convenience sample could be influencing the results, but I'm curious about what you would expect a more representative sample to show.

chun-hu commented 3 years ago

Thank you for sharing! I'm wondering if we can use AI-based approaches to predict these personal files?

RuoyunTan commented 3 years ago

Thank you for sharing your work with us. As a cloud storage user myself, I wonder whether the interface of file hosting services affects how people think about a file that was uploaded some time ago. Do you think this would be an issue to consider when conducting your surveys and gathering the data?

TwoCentimetre commented 3 years ago

It seems that this research uses only 108 data points to train the Aletheia classifier. Why is that enough? Is there an industry or academic standard for deciding how much data to use for model training? Second, I wonder how the sensitivity and usefulness of files are determined by the model. Just through several interviews?

weijiexu-charlie commented 3 years ago

Thanks for your presentation. As mentioned in other posts, I'm also curious about what we as consumers can do to protect our data privacy.

PAHADRIANUS commented 3 years ago

Thank you for sharing both the established paper and this brilliant draft in progress. Your observations of cloud storage user behavior are very keen and demonstrate a substantial yet mostly unseen risk in our everyday enjoyment of the convenience the service provides. It is worth noting that it is rather difficult for users themselves to have a comprehensive memory of what they stored in the past, let alone monitor its sensitivity in matters of privacy and security. Thus my question is: which agent should we target to reduce such risk? Should we advise users to be more vigilant and cautious in their storage behavior, to curb the amount of sensitive information going into the cloud, or should we call for supervision and regulation of service providers, so that they take responsibility for enhancing their own cybersecurity to prevent leaks and for adjusting their product designs so that users can manage files better?

YaoYao121 commented 3 years ago

Thank you for sharing your work! The topic of the modern digital storage milieu is really interesting. Besides, your research design combines the methodology of social science with computational methods very well. However, I have a question about the data sample. Your survey had only 100 participants. How did you decide on this sample size? Are you confident the sample size is not problematic? If you extended the sample, do you think the results would remain the same? Thanks!

JuneZzj commented 3 years ago

Thank you for sharing. I am impressed by the idea that sharing preferences in the cloud change with the passage of time. I noticed that you used stratified sampling to categorize the files. I am wondering what the rationale is for categorizing files according to these criteria. Have you ever considered any other sampling methods? Thanks a lot.

ginxzheng commented 3 years ago

Thank you for coming! I find the file storage topic very interesting and unique. Could you say more about the earlier qualitative interviews? I was curious how you made people aware that these files are private and should be stored securely, etc. Many thanks!

YanjieZhou commented 3 years ago

Thanks very much for sharing! I think it is very meaningful to research the security of cloud storage. I am also very interested in the algorithms used for categorizing cloud files. Could you elaborate on them?

harryx113 commented 3 years ago

Thanks for the paper and presentation. One issue around categorization is subjectivity/objectivity, i.e., how do we standardize the process while ensuring a degree of personalization?

k-partha commented 3 years ago

Thanks for sharing your research! I found the framework you developed quite interesting. What ethically significant questions do you see arising in the future from the tradeoff between data privacy and better algorithm-based data management?

timqzhang commented 3 years ago

Thank you for your paper! Regarding the practical application of this privacy research, I wonder: how can ordinary consumers protect their privacy efficiently?

FrederickZhengHe commented 3 years ago

Thanks for this paper and presentation. I am wondering whether 100 online survey participants are sufficient. Perhaps a larger sample size would be better.

caibengbu commented 3 years ago

Thanks for the paper and presentation. One issue around categorization is subjectivity/objectivity, i.e., how do we standardize the process while ensuring a degree of personalization?

yierrr commented 3 years ago

Hi Prof. Kanich, thanks for this intriguing research! My question is about a more peculiar case: in situations where people are not familiar with cloud storage, or where cloud storage is expensive relative to the local price level, several people might share one storage account with people they trust. Under these circumstances, will the accuracy of Aletheia be negatively affected, and more importantly, will using Aletheia actually raise additional ethical concerns, in that one user's sensitive files may be presented or highlighted to another? Thank you!

luxin-tian commented 3 years ago

Thank you very much for sharing. This is a really interesting and meaningful topic. I also wonder about the sample size issue: would you recruit more participants to validate the findings?

WMhYang commented 3 years ago

Thanks for the interesting paper. The idea of using Aletheia to predict sensitivity and usefulness is really refreshing. This reminds me of the spam emails I receive. They are put directly into the trash folder of my mailbox, and I do not receive any notification. Though this is helpful most of the time, it is extremely annoying if I fail to save an important email (e.g., an offer) that is sent directly to the trash. (And I guess most people will only remember the cons, rather than the pros.) Hence, I was wondering: if Aletheia were put into commercial use, how would you deal with this problem to improve user satisfaction? Thanks.

Qlei23 commented 3 years ago

Hi Professor Kanich, thank you for sharing this research with us! I totally agree that waste in cloud storage is a major problem. However, it does not seem to be a primary concern for tech companies, as large cloud storage helps them retain their customers. Also, how can one adjust one's storage behavior to better manage the cloud as well as protect one's privacy?

Yiqing-Zh commented 3 years ago

Thank you for the presentation. This is a very interesting topic related to a common issue in our daily lives. I am wondering which you consider more important: reminding consumers to delete, manage, and protect their files, or developing cheaper and safer storage technology in order to store as much data as possible.

minminfly68 commented 3 years ago

Thanks for the presentation. It seems very interesting and promising. We are wondering how you deal with the privacy issues in your work. Thanks!

lyl010 commented 3 years ago

Thank you for your presentation! It is a unique perspective for understanding cloud storage, and it is interesting to note that forgetting the details of files is a major obstacle to efficient management. I am curious whether we could build a system that reminds us of the status of our files, or whether it is better to set a storage limit.

ziwnchen commented 3 years ago

Thanks for sharing this interesting work! The problem of the "abundance" of files in cloud storage is indeed quite important. I am wondering whether, apart from labeling files with "perceived sensitivity", there is another management practice that could help people "remember" the files they saved before. For example, a version control tool like a storage version of GitHub?

chuqingzhao commented 3 years ago

Thank you for sharing. It is very interesting to read about how the research gains insights from interviews and predicts sensitivity and usefulness with computational methods. The idea of solving real-life cloud storage issues is critical. I have similar concerns about the real-life application of Aletheia. Could you please also explain how you classify usefulness? Thanks

Yaweili19 commented 3 years ago

Your research is truly innovative. It is true that I have rarely considered this impact of the data age, and the way you turned it into a real research problem is also very clever. Looking forward to your talk!

hhx2207061197 commented 3 years ago

Dear Professor Kanich, sorry for the late reply. I just want to know how your research findings could be combined with economics research.

Anqi-Zhou commented 3 years ago

Thanks for sharing! I'm wondering how your team is going to deal with personal privacy problems.

yiq029 commented 3 years ago

Thanks for sharing! Could you talk more about cloud technology and privacy issues?

qishenfu1 commented 3 years ago

Hi Prof. Kanich, thank you very much for sharing! I think your work is very important and meaningful in the contemporary era. People now use so many mobile apps and sometimes unknowingly allow these apps to access their information. It is important for us to find a way to protect personal privacy without compromising convenience.

yongfeilu commented 3 years ago

Thank you so much for your presentation! I wonder what kinds of technology your team has used to protect privacy. Thanks!

YijingZhang-98 commented 3 years ago

Thanks for your excellent work! I was wondering whether including only tweets with tags would leave out tweets without any text. It is worth noting that some people post only a photo without any text, so I was thinking that computer vision techniques could also be applied to this research.