Closed: artu134 closed this issue 1 year ago
My analysis of my report:
After analyzing my report, I realized that I should have spent more time on data research rather than jumping straight to prediction. Using logistic regression just to predict the sender ID may not be a good approach. A better option would be an RNN trained on this data to model the writing style of each specific user; such a model should be more accurate at predicting which user wrote a given message.
I also needed to be more scientifically rigorous when writing my report, and to add more categories and data plots.
Overall, though, it is a decent first attempt at applying ML methods to raw chat data.
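The stylometry idea above (predicting the author from writing style) does not strictly require an RNN; as a lighter-weight sketch of the same goal, character n-gram features with logistic regression can already pick up per-user style. This is an illustrative stand-in, not the report's actual method, and all messages and author labels below are invented:

```python
# Toy stylometry sketch: character n-grams + logistic regression as a
# lightweight stand-in for the RNN idea. All messages/labels are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

messages = [
    "hey!! gonna push the fix soon :)",          # user 0: informal, emoticons
    "lol ok brb :)",                             # user 0
    "I have merged the branch. Please review.",  # user 1: formal
    "The deployment is complete. Thank you.",    # user 1
]
authors = [0, 0, 1, 1]

model = make_pipeline(
    # character 2-3-grams carry style signals (punctuation, casing, emoticons)
    TfidfVectorizer(analyzer="char", ngram_range=(2, 3)),
    LogisticRegression(),
)
model.fit(messages, authors)
predictions = model.predict(["ok lol :)", "Please review the deployment."])
print(predictions)
```

On real chat data the same pipeline would be trained per corpus, with a held-out split to check whether style alone is predictive before reaching for a heavier sequence model.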
ChatGPT's answer:
This research conducted an analysis of Zulip chat data with the aim of predicting the sender ID based on text content. The approach included preprocessing the text data, converting it into numerical form using Word2Vec embeddings, and training logistic regression and linear regression models. While this research is quite comprehensive, there are areas where it could potentially be improved:
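The pipeline described above (text → word embeddings → logistic regression) can be sketched minimally as follows. The 3-dimensional "embeddings" are toy values standing in for real trained Word2Vec vectors, and the messages and sender labels are invented for illustration:

```python
# Sketch of the described pipeline: average word embeddings per message,
# then fit a logistic regression on sender IDs. Toy data throughout.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy stand-ins for Word2Vec vectors (real ones would be ~100-dimensional)
embeddings = {
    "hello": np.array([0.9, 0.1, 0.0]),
    "hi":    np.array([0.8, 0.2, 0.1]),
    "bug":   np.array([0.0, 0.9, 0.3]),
    "fix":   np.array([0.1, 0.8, 0.4]),
}

def message_vector(text):
    """Average the embeddings of known words; zero vector if none are known."""
    vecs = [embeddings[w] for w in text.lower().split() if w in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(3)

messages = ["hello hi", "hi hello hello", "bug fix", "fix bug bug"]
senders = [0, 0, 1, 1]  # invented sender IDs

X = np.array([message_vector(m) for m in messages])
clf = LogisticRegression().fit(X, senders)
predicted = clf.predict([message_vector("hello"), message_vector("fix")])
print(predicted)
```

Averaging embeddings discards word order, which is one reason a sequence model (the RNN suggested above) could capture writing style better than this bag-of-vectors baseline.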
Proposed Research Question: How can we leverage data science techniques to identify and protect against social engineering threats posed by chatbots?
Rationale: In the era of digital communication, chatbots have become a pervasive part of online interactions. They have been used for a variety of purposes, ranging from customer service to user engagement. However, alongside their beneficial uses, chatbots also pose a significant risk in terms of social engineering attacks. Social engineering is a method of tricking individuals into disclosing confidential information, and chatbots provide an effective and scalable way for malicious actors to carry out such attacks. Therefore, understanding how to identify and mitigate these threats is crucial for personal and collective digital security.
Decisions to be Made: This research will require a variety of decisions to be made. First, we need to decide on the dataset to be used - this could either be an existing dataset of chatbot interactions or a dataset we collect ourselves. Next, we need to decide on the techniques to be used to analyze the data. This could involve machine learning to identify patterns typical of chatbots, natural language processing to understand the linguistic nuances of chatbot interactions, or a combination of both. We also need to consider ethical implications, such as privacy concerns when collecting and analyzing chat data.
Stakes: The stakes for this research are high. As chatbots become more sophisticated and indistinguishable from human interaction, the potential for their misuse in social engineering attacks also rises. By identifying the characteristics of these chatbots, we can help inform individuals and organizations about the risks and provide strategies for protection. Additionally, our findings could be used to improve the design and regulation of chatbots to prevent their misuse. Ultimately, this research could contribute to safer online spaces for everyone.
Reading
Pro tip: try using an app on your phone or computer to read aloud to you at 1.5x speed! This can save time and make it easier to absorb information without being visually tied to a screen.
@indrekromet
- [ ] Read https://www.palladiummag.com/2023/02/23/the-west-lives-on-in-the-talibans-afghanistan/

Doing
Set timer: 10 minutes maximum
- [ ] Before asking GPT (to avoid biasing yourself!), write your own critique of your homework. Questions to consider could be: what could be improved? What doesn't make sense in the visualization? What doesn't make sense in the writing?
Set timer: 10 minutes maximum
- [ ] Ask GPT-4 to critique the homework or the visual using your favorite data thinking definition we have so far.
Set timer: 10 minutes maximum
- [ ] Add this critique as a comment on the homework's GitHub issue, and link to the critique in Zulip.
Set timer: 10 minutes maximum
- [ ] Repeat this exercise for the previous homework of one other person in the class

Creating
- [ ] Use `duckdb` to load the Zulip data into a SQL database, and use `altair` to visualize the data, following https://github.com/onefact/datathinking.org-codespace/blob/main/notebooks/in-class-notebooks/230420-debugging-duckdb-altair-falcon-3-1-1-service-requests.ipynb (run this notebook with this data: https://data.cityofnewyork.us/Social-Services/311-Service-Requests-from-2010-to-Present/erm2-nwe9 - and try changing the data source to be the Zulip data and post a visualization of the chat data on Zulip)

Thinking
Listening
Large Language Model Access Checklist