Closed 0rd0s1n1ster closed 1 year ago
Red Teaming
My own critique (10 mins):
While personal experience in the abstract is nice, it says nothing about what the report includes or what the goals and methodology of the work were.
The report doesn't have much structure: there’s only one part, called “Main part”, so it’s difficult to find, for example, what logistic regression was used for.
The first paragraph of the main part is more suitable for the abstract than the abstract itself (biased opinion)
The first figure is very hard to read. In theory it should visualise both the embeddings and the logistic regression results, but sender ids are discrete, and they are impossible to read on a linear axis.
All the figures have the same description, so it’s not helpful. One of the figures appears twice.
I cannot really draw any conclusions from the plots: the titles are not descriptive and there’s no legend (or axis names in the last one).
Embeddings are used in every single experiment, but they’re only fully explained in the last one. Maybe changing the order is a good idea.
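The point about discrete sender ids on a linear axis can be illustrated with a minimal sketch: map each distinct id to a consecutive position and use the ids only as tick labels. The function name and the sample ids below are hypothetical and are not taken from the report.

```python
def categorical_positions(sender_ids):
    """Map each distinct sender id to a consecutive integer position.

    Returns the position of every input id plus the sorted list of
    distinct ids, which can serve as tick labels.
    """
    labels = sorted(set(sender_ids))
    position_of = {sid: i for i, sid in enumerate(labels)}
    positions = [position_of[sid] for sid in sender_ids]
    return positions, labels


# Hypothetical sender ids, for illustration only.
sender_ids = [100007, 100012, 100007, 100350, 100012]
positions, labels = categorical_positions(sender_ids)
print(positions)  # [0, 1, 0, 2, 1]
print(labels)     # [100007, 100012, 100350]
```

With matplotlib, `positions` would be the x-values, followed by `ax.set_xticks(range(len(labels)))` and `ax.set_xticklabels(labels)`, so that each sender gets an evenly spaced, readable tick.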
Critique from ChatGPT (I don’t have access to GPT-4 yet):
This homework report has several issues:
Lack of clarity and structure: The report lacks a clear structure and organization, making it difficult to follow and understand. The author jumps between different topics without providing a clear introduction or conclusion.
Poor writing: The report contains several grammatical errors and lacks coherence and clarity in its writing style. This makes it hard to understand the author's ideas and arguments.
Lack of depth and analysis: The report provides a basic description of the methods used, but does not provide a detailed analysis or evaluation of the results. The author also does not provide any context or background information on the DataThinking Zulip chat data or the techniques used.
Inappropriate tone: The author's use of language is unprofessional and inappropriate for an academic report. The report contains several informal and unprofessional phrases that undermine the credibility of the author and the report.
Lack of reproducibility: The report does not provide enough details on the data cleaning and preprocessing steps, making it difficult to reproduce the analysis. Additionally, the report does not provide any code or scripts used for the analysis, which makes it impossible to verify the results.
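To make the reproducibility point concrete, here is a minimal sketch of what a documented cleaning step could look like. It is not the report's actual preprocessing, which is not described; the field names `sender_id` and `content` and the sample messages are assumptions for illustration.

```python
import html
import re
from collections import Counter

TAG_RE = re.compile(r"<[^>]+>")        # crude HTML tag matcher
TOKEN_RE = re.compile(r"[a-z0-9']+")   # lowercase word-like tokens


def clean_message(raw):
    """Strip HTML tags, decode entities, lowercase, and tokenize."""
    text = TAG_RE.sub(" ", raw)   # remove tags before decoding entities
    text = html.unescape(text)    # e.g. "&amp;" becomes "&"
    return TOKEN_RE.findall(text.lower())


def summarize(messages):
    """Compute simple summary statistics over a list of message dicts."""
    rows = [
        {"sender_id": m["sender_id"], "tokens": clean_message(m["content"])}
        for m in messages
    ]
    per_sender = Counter(r["sender_id"] for r in rows)
    n_tokens = [len(r["tokens"]) for r in rows]
    return {
        "n_messages": len(rows),
        "n_senders": len(per_sender),
        "mean_tokens": sum(n_tokens) / len(rows) if rows else 0.0,
        "messages_per_sender": dict(per_sender),
    }


# Hypothetical messages; the field names are assumptions, not the real schema.
messages = [
    {"sender_id": 1, "content": "<p>Hello &amp; welcome!</p>"},
    {"sender_id": 2, "content": "Logistic regression?"},
    {"sender_id": 1, "content": "<p>It's done</p>"},
]
print(summarize(messages))
```

Publishing a script like this next to the report would let a reader rerun every step. The same list of dicts could also be passed to `polars.DataFrame` if a dataframe is preferred.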
My critique (10 mins):
The report is not well structured, and the academic writing style leaves much to be desired. A little irritation is noticeable in the text, and the definitions are kept broad, which suits the open formulation of the homework.
Things to be improved:
ChatGPT 3.5 (GPT-4 failed)
Introduction: The opening paragraph does not provide any context for the report and does not relate to the main analysis. It should be removed or rewritten to introduce the purpose of the report.
Language: The language used is informal and unprofessional. The report should be written in a more formal and technical language appropriate for a research report.
Methods: The methods used for data cleaning and preprocessing are not described. It is unclear how the Word2Vec model was trained and how the embeddings were generated. The report should provide more detail about the methodology used.
Results: The results are poorly presented. The figures are not labeled or explained, and it is unclear what they represent. The report should provide clear and concise explanations of the results, including the statistical significance and interpretation of the findings.
Conclusion: The conclusion does not summarize the findings or provide any recommendations for future research. The report should conclude with a clear and concise summary of the findings and suggestions for further investigation.
Overall structure: The report lacks a clear structure and organization. It should be reorganized into clear sections, with headings and subheadings to guide the reader through the report.
Professionalism: The report includes inappropriate comments and self-deprecation. These should be removed to maintain a professional and objective tone.
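On the Methods point, the report does not say how word vectors were turned into message-level embeddings. One common approach, shown here purely as an assumption, is to average the vectors of the words in a message. The toy `word_vectors` table is hypothetical; with a trained gensim Word2Vec model the lookup would typically go through `model.wv` instead.

```python
def message_embedding(tokens, word_vectors, dim):
    """Average the vectors of all known tokens in a message.

    Unknown tokens are skipped; a message with no known tokens
    maps to the zero vector of length `dim`.
    """
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


# Hypothetical 2-dimensional word vectors, for illustration only.
word_vectors = {
    "logistic": [1.0, 0.0],
    "regression": [0.0, 1.0],
    "plot": [1.0, 1.0],
}
print(message_embedding(["logistic", "regression", "rocks"], word_vectors, dim=2))
# [0.5, 0.5]
```

Stating a step like this explicitly in the report would answer the question of how the embeddings were generated.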
Doing
[x] Clean Data Thinking Zulip chat data, located at https://github.com/onefact/datathinking.org-codespace/blob/main/data/datathinking.zulipchat.com/raw/messages-000001.json - put it in a polars dataframe and compute summary statistics of the dataset
[x] Analyze this Zulip chat data using logistic regression, linear regression, and embeddings with the tools we have learned in the lectures (don't forget to ask ChatGPT, Claude, Lex, GPT-4 for help as much as you need, and ask for help on the Data Thinking Zulip chat :)
[x] Create a visualization of logistic regression of the Data Thinking Zulip chat data
[x] Create a visualization of linear regression applied to the Data Thinking Zulip chat data
[x] Create a visualization of embeddings using the Data Thinking Zulip Chat data
[x] Make a copy of the Overleaf template: https://www.overleaf.com/read/ghpyzqwqwxpv (need to create an account and/or sign in if this is your first time using Overleaf). To make a copy, open the project after signing in using this link, and click on Menu, then Copy Project.
[x] In Overleaf, edit the template and figure out how to include a PDF figure in the report, alongside a brief description (a few sentences or paragraphs is fine!) of each of the analyses you performed, why you chose them, and the math equation for the linear regression, logistic regression, and embedding you used.
[x] Add the PDF of the report to this issue as a comment.
[x] Send a message on Zulip with a link to this comment, alongside the image representing your favorite visualization
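For the equations requested in the checklist, the standard textbook forms are sketched below. The notation is an assumption, and the embedding objective shown is the skip-gram variant of Word2Vec with a full softmax; a report would need to state whichever variant and symbols it actually used.

```latex
% Linear regression: prediction and least-squares fit
\hat{y} = \beta_0 + \sum_{j=1}^{p} \beta_j x_j,
\qquad
\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \bigl( y_i - \hat{y}_i \bigr)^2

% Logistic regression: probability of the positive class
P(y = 1 \mid x) = \sigma\bigl( \beta_0 + \beta^{\top} x \bigr),
\qquad
\sigma(z) = \frac{1}{1 + e^{-z}}

% Skip-gram Word2Vec: average log-probability of context words
\frac{1}{T} \sum_{t=1}^{T} \sum_{\substack{-c \le j \le c \\ j \ne 0}} \log p(w_{t+j} \mid w_t),
\qquad
p(w_O \mid w_I) = \frac{\exp\bigl( {v'_{w_O}}^{\top} v_{w_I} \bigr)}{\sum_{w=1}^{W} \exp\bigl( {v'_{w}}^{\top} v_{w_I} \bigr)}
```

Here $T$ is the number of tokens in the corpus, $c$ the context window size, $W$ the vocabulary size, and $v$ and $v'$ the input and output vectors of a word.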
Reviewing
Reading
json format with chatgpt] https://genmon.github.io/braggoscope/about & https://news.ycombinator.com/item?id=35073603
The Boy Whose Light Went Out by Jack Clark http://techpolicylab.uw.edu/wp-content/uploads/2022/04/Telling_Stories_Pages_4-4-22.pdf
Watching
(message Jaan if you need a VPN or these links don't work)