nesmaAlmoazamy closed this issue 1 year ago
Please find the PDF here: Homework_PDF.pdf
Feedback about my report:
1. Overall, I feel I spent more time on data preparation and implementation than on writing explanations in the report.
2. I should have added a reference to ChatGPT, since I used it in Section 1.3.
3. I still like the plots for word embeddings and word clouds.
4. The logistic regression and linear regression could have been presented better, either with another dataset or in a different way from the one used in the homework.
5. I like using LaTeX to write the equations rather than putting them in plain paragraph text.
ChatGPT feedback :)
Overall, Nesma Mahmoud's homework report provides a good example of how to approach data analysis using text data. The report covers several important steps of data analysis, including data cleaning, data visualization, and modeling. Using the NLTK library to filter the dataset and prepare it for analysis is a good approach. The visualizations, including the word cloud and the word frequency plot, give a quick view of the dataset's most frequent words. The word embeddings plot also shows how the embeddings capture the relationships between words in the dataset.
The report's use of Linear and Logistic Regression models is also appropriate for analyzing the text data. The plots of Actual vs Predicted values provide a quick understanding of the models' performance. The report also provides the equations for both models, which helps in understanding how these models work.
The inclusion of the Word2Vec equation is a nice touch, as it shows how the model calculates the vector representations for the words in the dataset. However, the report does not provide a clear explanation of how the Word2Vec model fits into the analysis or how it is used to generate the word embeddings plot.
Overall, the report provides a good example of how to approach text data analysis using common techniques like data cleaning, data visualization, and modeling. However, a more detailed explanation of the Word2Vec model's role in the analysis could improve the report's clarity.
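For reference, the three models named in the feedback above can be written in their generic textbook forms. This is only a sketch: the notation ($\beta$, $\sigma$, $v_w$, $v'_w$, window size $c$) is an assumption and may differ from the equations actually used in the report. The last two lines are the skip-gram objective of Word2Vec, which is one way to connect the Word2Vec equation to the embeddings plot: the learned input vectors $v_w$ are the points that get projected and plotted.

```latex
% Linear regression: prediction as a weighted sum of features
\hat{y} = \beta_0 + \sum_{j=1}^{p} \beta_j x_j

% Logistic regression: the same linear score passed through the sigmoid
P(y = 1 \mid x) = \sigma\!\left(\beta_0 + \sum_{j=1}^{p} \beta_j x_j\right),
\qquad \sigma(z) = \frac{1}{1 + e^{-z}}

% Word2Vec (skip-gram): maximize the average log-probability of context words
\frac{1}{T} \sum_{t=1}^{T} \sum_{\substack{-c \le j \le c \\ j \ne 0}} \log p(w_{t+j} \mid w_t)

% Softmax over the vocabulary of size W, with input vectors v and output vectors v'
p(w_O \mid w_I) = \frac{\exp\!\left({v'_{w_O}}^{\top} v_{w_I}\right)}{\sum_{w=1}^{W} \exp\!\left({v'_{w}}^{\top} v_{w_I}\right)}
```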
My thoughts
Wow, such a nice report! It feels like quite some time was invested in it. The person who wrote it most likely knows NLP.
However, there are some points to be considered:
ChatGPT "thoughts": To improve the report, some suggestions are:
Doing
[x] Clean Data Thinking Zulip chat data, located at https://github.com/onefact/datathinking.org-codespace/blob/main/data/datathinking.zulipchat.com/raw/messages-000001.json - put it in a polars dataframe and compute summary statistics of the dataset
[x] Analyze this Zulip chat data using logistic regression, linear regression, and embeddings with the tools we have learned in the lectures (don't forget to ask ChatGPT, Claude, Lex, GPT-4 for help as much as you need, and ask for help on the Data Thinking Zulip chat :)
[x] Create a visualization of logistic regression of the Data Thinking Zulip chat data
[x] Create a visualization of linear regression applied to the Data Thinking Zulip chat data
[x] Create a visualization of embeddings using the Data Thinking Zulip Chat data
[x] Make a copy of the Overleaf template: https://www.overleaf.com/read/ghpyzqwqwxpv (need to create an account and/or sign in if this is your first time using Overleaf). To make a copy, open the project after signing in using this link, and click on Menu, then Copy Project.
[x] In Overleaf, edit the template and figure out how to include a PDF figure in the report, alongside a brief description (a few sentences or paragraphs is fine!) of each of the analyses you performed, why you chose them, and the math equation for the linear regression, logistic regression, and embedding you used.
[x] Add the PDF of the report to this issue as a comment.
[x] Send a message on Zulip with a link to this comment, alongside the image representing your favorite visualization
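The first task in the checklist above (load the Zulip export and compute summary statistics) can be sketched roughly as follows. This is a minimal illustration, not the method used in the report: it uses only the Python standard library instead of polars so that it runs without extra dependencies, it works on a tiny made-up sample rather than the real file, and the field names `zerver_message`, `sender`, and `content` are assumptions about the export layout that should be checked against the actual JSON.

```python
import json
import statistics

# Hypothetical sample standing in for messages-000001.json.
# The key names below are assumptions about the export format.
SAMPLE_EXPORT = json.dumps({
    "zerver_message": [
        {"sender": 1, "content": "Hello everyone"},
        {"sender": 2, "content": "Has anyone tried polars yet?"},
        {"sender": 1, "content": "Yes"},
    ]
})


def summarize_messages(raw_json: str) -> dict:
    """Compute simple summary statistics over message lengths."""
    records = json.loads(raw_json)["zerver_message"]
    # Message length in characters is the only numeric feature used here.
    lengths = [len(message["content"]) for message in records]
    return {
        "n_messages": len(records),
        "n_senders": len({message["sender"] for message in records}),
        "mean_length": statistics.mean(lengths),
        "median_length": statistics.median(lengths),
        "max_length": max(lengths),
    }


summary = summarize_messages(SAMPLE_EXPORT)
print(summary)
```

With polars installed, the same list of records could instead be passed to `pl.DataFrame(records)` and summarized with `.describe()`, which is closer to what the task asks for.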
Reviewing
Reading
json format with chatgpt] https://genmon.github.io/braggoscope/about & https://news.ycombinator.com/item?id=35073603
The Boy Whose Light Went Out by Jack Clark http://techpolicylab.uw.edu/wp-content/uploads/2022/04/Telling_Stories_Pages_4-4-22.pdf
Watching
(message Jaan if you need a VPN or these links don't work)