onefact / datathinking.org

Data Thinking website deployed using GitHub Pages
https://datathinking.org
Apache License 2.0

[homework: doing, reading, watching] Linear, logistic regressions and embedding visualizations of Zulip data #136

Closed: nesmaAlmoazamy closed this issue 1 year ago

nesmaAlmoazamy commented 1 year ago

Doing

Reviewing

Reading

Watching

(message Jaan if you need a VPN or these links don't work)

nesmaAlmoazamy commented 1 year ago

Please find the PDF here: Homework_PDF.pdf

nesmaAlmoazamy commented 1 year ago

Feedback about my report:

1. Overall, I feel I spent more time on data preparation and implementation than on writing explanations in the report.
2. I should have added a reference to ChatGPT, since I used it in Section 1.3.
3. I still like the plots for the word embeddings and the word clouds.
4. The logistic regression and linear regression could have been presented better with another dataset, or in another way than the one used in the homework.
5. I like using LaTeX to write the equations rather than leaving them as standard paragraph text (see the generic forms below).
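
For reference, these are the generic forms of the two models mentioned in point 4; the exact notation and features used in the report may differ:

```latex
% Linear regression: continuous target as a weighted sum of features
\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n

% Logistic regression: sigmoid of the linear combination gives P(y = 1 | x)
P(y = 1 \mid \mathbf{x}) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n)}}
```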

nesmaAlmoazamy commented 1 year ago

ChatGPT feedback :) Overall, Nesma Mahmoud's homework report provides a good example of how to approach data analysis using text data. The report covers several important steps of data analysis, including data cleaning, data visualization, and modeling. The use of the NLTK library for filtering the dataset and preparing it for analysis is a good approach. The visualizations, including the word cloud and word frequency plot, give a quick sense of the dataset's most frequent words. The word embeddings plot also shows how the embeddings nicely capture the relationships between words in the dataset.
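
As a rough sketch of the preprocessing and visualization steps described above (not the report's actual code), the following assumes the Zulip messages are already available as a hypothetical list of strings called `messages`:

```python
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from collections import Counter
from wordcloud import WordCloud
import matplotlib.pyplot as plt

nltk.download("punkt")
nltk.download("stopwords")

# `messages` is a hypothetical list of raw Zulip message strings
messages = [
    "Example Zulip message about linear regression",
    "Another example message about word embeddings",
]

stop_words = set(stopwords.words("english"))

def clean(text):
    # lowercase, tokenize, and keep alphabetic tokens that are not stopwords
    tokens = word_tokenize(text.lower())
    return [t for t in tokens if t.isalpha() and t not in stop_words]

tokens = [t for msg in messages for t in clean(msg)]

# word frequency plot of the 20 most common tokens
counts = Counter(tokens).most_common(20)
words, freqs = zip(*counts)
plt.bar(words, freqs)
plt.xticks(rotation=45, ha="right")
plt.title("Most frequent words")
plt.tight_layout()
plt.show()

# word cloud built from the same cleaned tokens
cloud = WordCloud(width=800, height=400).generate(" ".join(tokens))
plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.show()
```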

The report's use of Linear and Logistic Regression models is also appropriate for analyzing the text data. The plots of Actual vs Predicted values provide a quick understanding of the models' performance. The report also provides the equations for both models, which helps in understanding how these models work.
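A minimal sketch of this modeling step is below. The TF-IDF features and the two targets (message length as a continuous target, and a binary label derived from it) are assumptions for illustration, not necessarily what the homework used:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split

# hypothetical inputs: raw messages, a continuous target (word count),
# and a binary target derived from it
messages = ["first example message", "another longer example message here", "short one"] * 20
y_cont = np.array([len(m.split()) for m in messages])
y_bin = (y_cont > y_cont.mean()).astype(int)

X = TfidfVectorizer().fit_transform(messages)
X_tr, X_te, yc_tr, yc_te, yb_tr, yb_te = train_test_split(
    X, y_cont, y_bin, test_size=0.2, random_state=0
)

# linear regression: actual vs predicted scatter plot
lin = LinearRegression().fit(X_tr, yc_tr)
plt.scatter(yc_te, lin.predict(X_te))
plt.xlabel("Actual")
plt.ylabel("Predicted")
plt.title("Linear regression: actual vs predicted")
plt.show()

# logistic regression: accuracy on the held-out split
logit = LogisticRegression(max_iter=1000).fit(X_tr, yb_tr)
print("Logistic regression accuracy:", logit.score(X_te, yb_te))
```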

The inclusion of the Word2Vec equation is a nice touch, as it shows how the model calculates the vector representations for the words in the dataset. However, the report does not provide a clear explanation of how the Word2Vec model fits into the analysis or how it is used to generate the word embeddings plot.
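For context, a typical way to produce such a word-embeddings plot is to train gensim's Word2Vec on the tokenized corpus and project the vectors to 2D. The report's actual hyperparameters and projection method are not stated, and `tokenized_messages` below is a hypothetical placeholder:

```python
from gensim.models import Word2Vec
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# `tokenized_messages` is a hypothetical list of token lists, one per Zulip message
tokenized_messages = [
    ["linear", "regression", "homework"],
    ["logistic", "regression", "zulip", "data"],
    ["word", "embeddings", "visualization"],
] * 50

# train a small skip-gram Word2Vec model on the tokenized corpus
model = Word2Vec(sentences=tokenized_messages, vector_size=100, window=5, min_count=1, sg=1)

# project the learned vectors to 2D for plotting
words = list(model.wv.index_to_key)
coords = PCA(n_components=2).fit_transform(model.wv[words])

plt.scatter(coords[:, 0], coords[:, 1])
for word, (x, y) in zip(words, coords):
    plt.annotate(word, (x, y))
plt.title("Word2Vec embeddings (PCA projection)")
plt.show()
```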

Overall, the report provides a good example of how to approach text data analysis using common techniques like data cleaning, data visualization, and modeling. However, a more detailed explanation of the Word2Vec model's role in the analysis could improve the report's clarity.

0rd0s1n1ster commented 1 year ago

My thoughts

Wow, such a nice report! It feels like quite some time was really invested in it. The person who wrote it most likely knows NLP.

However, there are some points to be considered:

ChatGPT "thoughts": To improve the report, some suggestions are: