Closed 0rd0s1n1ster closed 1 year ago
The research question I want to focus on is linked to ChatGPT detection. The availability of such convenient tools makes people lazy, and abuse comes hand in hand with it (even though, at the moment of this comment, ChatGPT is unavailable due to overload). My idea is to take theses from the Science and Technology curriculum as the human-written source, since I believe there is something in common in the psychosphere of me and other graduates of my curriculum. I will then cut the text into chunks of 150 words and send them to ChatGPT with a request to rephrase them. Based on the collected responses, I plan to train a DistilBERT model to classify text as written by a human (Sci&Tech student) or by ChatGPT. The all-in-all budget is expected to be under $5. The model choice is constrained by what I can load into my GPU and by how long I am willing to wait for training; for that reason the lighter model is used.
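The chunking step described above can be sketched in a few lines. This is a minimal sketch: the function name and the sample text are hypothetical; only the 150-word chunk size comes from the plan.

```python
# Split a thesis text into consecutive chunks of at most `chunk_size` words,
# as described in the plan (chunks are later sent to ChatGPT for rephrasing).

def chunk_text(text: str, chunk_size: int = 150) -> list[str]:
    """Return consecutive chunks of at most `chunk_size` words each."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]

# Example with a hypothetical 400-word text: chunks of 150, 150, and 100 words.
sample = " ".join(f"w{i}" for i in range(400))
chunks = chunk_text(sample)
print([len(c.split()) for c in chunks])  # [150, 150, 100]
```

The last chunk is simply shorter than 150 words; whether to drop or pad it is a labeling choice left open here.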
Here I have also visualized the data using altair, loaded with duckdb. Figure 1. The number of messages that have attachments, and how many of those have images. It is interesting that some messages without attachments still have an image attached, which I find weird:
Figure 2. Number of messages per sender:
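The counts behind Figure 1 boil down to a small SQL aggregation. A minimal sketch follows; the table and column names (`messages`, `has_attachment`, `has_image`) and the toy rows are assumptions, and sqlite3 is used here only so the sketch runs with the standard library (the actual notebook uses duckdb).

```python
import sqlite3

# Toy table standing in for the Zulip messages; schema is hypothetical.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE messages (id INTEGER, has_attachment INTEGER, has_image INTEGER)")
con.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 1, 0), (3, 0, 0), (4, 0, 1)],  # row 4: image but no attachment flag
)

# Figure 1 counts: messages with attachments, those that also have images,
# and the odd case of images without attachments noted above.
counts = con.execute(
    "SELECT SUM(has_attachment), "
    "SUM(has_attachment AND has_image), "
    "SUM(has_image AND NOT has_attachment) FROM messages"
).fetchone()
print(counts)  # (2, 1, 1)
```

The third count is exactly the "image without attachment" anomaly mentioned above; in the real data it would be worth inspecting those rows directly.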
Reading
Pro tip: try using an app on your phone or computer to read text aloud to you at 1.5x speed! This can save time and make it easier to absorb information without being visually tied to a screen.
@indrekromet
- [ ] Read https://www.palladiummag.com/2023/02/23/the-west-lives-on-in-the-talibans-afghanistan/

Doing
Set timer: 10 minutes maximum
- [ ] Before asking GPT (to avoid biasing yourself!), write your own critique of your homework. Questions to consider could be: what could be improved? What doesn't make sense in the visualization? What doesn't make sense in the writing?
Set timer: 10 minutes maximum
- [ ] Ask GPT-4 to critique the homework or the visual using your favorite data thinking definition we have so far.
Set timer: 10 minutes maximum
- [ ] Add this critique as a comment on the homework's github issue, and link to the critique in Zulip.
Set timer: 10 minutes maximum
- [ ] Repeat this exercise for the previous homework of one other person in the class

Creating
- [ ] Use duckdb to load the Zulip data into a SQL database, and use altair to visualize the data, following https://github.com/onefact/datathinking.org-codespace/blob/main/notebooks/in-class-notebooks/230420-debugging-duckdb-altair-falcon-3-1-1-service-requests.ipynb (run this notebook with this data: https://data.cityofnewyork.us/Social-Services/311-Service-Requests-from-2010-to-Present/erm2-nwe9 - and try changing the data source to be the Zulip data and post a visualization of the chat data on Zulip)

Thinking
Listening
Large Language Model Access Checklist