My own critique (summarized from the report using ChatGPT):
Linear Regression for Message Length Prediction: The linear regression task partially fails due to non-linearities in the data distribution. The model is unable to capture these non-linearities, resulting in incorrect predictions, including negative message lengths and failure to predict very long outlier messages.
Logistic Regression for Message Subject Prediction: The logistic regression task succeeds almost perfectly, but the high input dimensionality (96 dimensions) relative to the small number of samples (around 200) suggests potential overfitting. The model predicts message subjects accurately, with the exception of some confusion between the "main" topic and other topics. Lower accuracy is observed for topics with few samples, indicating a lack of representation in the dataset.
Dimensionality Reduction and Visualization: A 2D PCA projection of the embeddings reveals that the subject "linkedin profile" is distinctly separated from the other subjects, indicating its unique characteristics. However, this topic should arguably be removed from the dataset, as it distorts the distribution of the other samples. (A minimal sketch of all three analyses follows below.)
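For concreteness, a minimal scikit-learn sketch of the three analyses, assuming a 96-dimensional embedding matrix, message lengths and topic labels; the variable names and the random stand-in data are illustrative assumptions, not taken from the report:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((200, 96))            # stand-in for message embeddings
lengths = rng.integers(1, 500, 200)  # stand-in for message lengths
subjects = rng.integers(0, 5, 200)   # stand-in for topic labels

# 1. Linear regression: an unconstrained linear map can output negative
#    lengths and cannot capture non-linear structure or long-tail outliers.
X_tr, X_te, y_tr, y_te = train_test_split(X, lengths, random_state=0)
linreg = LinearRegression().fit(X_tr, y_tr)
print("negative length predictions:", int((linreg.predict(X_te) < 0).sum()))

# 2. Logistic regression: with 96 features and ~200 samples, compare train
#    and test accuracy to check for overfitting.
X_tr, X_te, s_tr, s_te = train_test_split(X, subjects, random_state=0)
logreg = LogisticRegression(max_iter=1000).fit(X_tr, s_tr)
print("train acc:", logreg.score(X_tr, s_tr),
      "test acc:", logreg.score(X_te, s_te))

# 3. 2D PCA projection for visual inspection of cluster separation.
coords = PCA(n_components=2).fit_transform(X)
```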
I inputted the datathinking Wikipedia article preceded by the prompt "Memorize this definition of Datathinking. I will ask you questions about it later.". Then I inputted a text-only version of the report (acquired using pdftotext) preceded by the prompt "Criticize the following report using the definition of Datathinking". The results are below. View the conversation here.
The report on Zulip Chat Analysis can be critiqued using the definition of Data Thinking as follows:
Lack of Emphasis on Data Thinking Framework: The report provides a detailed analysis of the Zulip chat data, including data cleaning, computing embeddings, and applying regression models. However, it does not explicitly mention the application of a data thinking framework or the integration of computational thinking, statistical thinking, and domain thinking.
Limited Discussion on User Experience: The definition of data thinking highlights the importance of user experience in the design of data-driven solutions. The report does not extensively discuss user experience or how it was considered in the analysis process. Considering user needs and incorporating design thinking principles should be an integral part of data thinking.
Limited Exploration of Data Collection Improvement: Data thinking involves questioning the quality of data and considering how data collection can be improved. The report briefly mentions data cleaning but does not thoroughly discuss the process of data collection or potential improvements in data collection methods.
Lack of Iterative Monitoring and Data Collection: Data thinking emphasizes that improving models and solutions is a continuous process that includes iterative monitoring and data collection. The report does not mention iterative monitoring or how ongoing data collection was incorporated into the analysis process.
Inadequate Consideration of Generalization and Bias: Data thinking acknowledges that problem abstraction is domain-dependent, and generalization is subject to biases and variance in the data. The report does not explicitly address these considerations or discuss potential biases or limitations in the analysis.
Limited Discussion on Strategic Context and Risks: Data thinking involves clarifying the strategic context and identifying data-driven risks and opportunities. The report lacks a comprehensive analysis of the strategic context and does not discuss the risks and opportunities associated with the Zulip chat data analysis.
Insufficient Evaluation of Feasibility and Profitability: Data thinking involves measuring business impact and assessing the feasibility and profitability of data-driven solutions. While the report briefly mentions the success of logistic regression and the profitability of the developed solution, it lacks a comprehensive evaluation of feasibility and cost-benefit analysis.
Overall, the report demonstrates some aspects of data analysis but falls short in fully embracing the principles and components of data thinking as defined in the provided definition.
I agree with the model that I didn't place emphasis on datathinking thoughtflows, but instead performed a conventional analysis, including my own critique.
Editors: GPT-4, Copilot
Code repo: github.com/beijn/project-craw
To get started, save an OpenAI API key at `secret/openai_api_key`, run `pip install -r requirements.txt` (optionally inside a virtual environment: `python -m venv venv; source venv/bin/activate`), and run `python main.py`.
There is currently no GUI. The progress will be printed to the console, and the final result is a recursive tree of questions and answers, reasoning and sources.
Language models are powerful, but they are not good at reasoning or at referencing their sources.
The core idea is that complex reasoning tasks involve nested trees of propositions and partial conclusions. Research has shown that decomposing a reasoning task into a 'chain of thoughts' dramatically improves the 'reasoning capabilities' of language models, and that generalizing to nested 'trees of thoughts' improves them even further (arXiv:2305.10601).
Language model 'reasoning' is largely opaque and its factual accuracy is questionable, as models usually do not provide references.
This project aims to build a reasoning agent that uses language models in different modular subtasks to answer questions by recursively answering subquestions. We aim to address the opaque-'reasoning' problem by decomposing a task into disjoint subtasks together with an explanation of the reasoning behind the decomposition. Atomic subquestions are answered by connecting a compound inference engine to Wikipedia. By collecting only the relevant, interpretable source citations, we make the factual basis of every answer transparent and verifiable.
The overall algorithm skeleton is hardcoded in a modular, functionally pure way. The resulting recursive tree of questions with answers, explanations and eventually references arises elegantly as the mirror image of the algorithm's pure recursion. For replacing the static skeleton with a dynamic agent, see Future directions.
All interactions with LLMs go through modular, declarative and descriptive single-purpose functions; see LLM Interfacing. The division into Reasoning and Referencing is therefore historic and aesthetic.
Question Decomposition - Given a question, decides whether it is atomic or compound. An atomic question is answered by the Referencing Agent. A compound question is decomposed into a set of subquestions and a natural-language reasoning statement describing how to combine the subquestions' answers into an answer to the given question.
Answer Recombination - Given the question, the decomposition reasoning and the recursively computed answers, compiles an answer involving only reasoning and no hallucinated knowledge. (The recursive skeleton combining both steps is sketched below.)
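A minimal sketch of the recursive skeleton these two components drive. The `Node` structure and the helper functions are illustrative stand-ins; in the real system the helpers are the LLM-backed single-purpose functions described above, stubbed here so the sketch runs standalone:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Node:
    """One node of the recursive question tree: answer, reasoning, sources."""
    question: str
    answer: str = ""
    reasoning: str = ""
    sources: List[str] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)

# The four helpers below are LLM-backed in the real system; trivial stubs here.
def is_atomic(question: str) -> bool:
    return len(question) < 40

def decompose(question: str) -> Tuple[List[str], str]:
    half = len(question) // 2
    return [question[:half], question[half:]], "combine the partial answers"

def answer_atomic(question: str) -> Tuple[str, List[str]]:
    return "stub answer", ["https://en.wikipedia.org/wiki/Example"]

def recombine(question: str, reasoning: str, children: List[Node]) -> str:
    return " + ".join(child.answer for child in children)

def solve(question: str) -> Node:
    """Recursively build the tree of questions, answers and sources."""
    if is_atomic(question):
        answer, sources = answer_atomic(question)
        return Node(question, answer=answer, sources=sources)
    subquestions, reasoning = decompose(question)
    children = [solve(q) for q in subquestions]
    return Node(question, answer=recombine(question, reasoning, children),
                reasoning=reasoning, children=children)
```

The returned `Node` tree is exactly the "mirror image of the algorithm's pure recursion" mentioned above: each recursive call produces one subtree of questions, answers, reasoning and sources.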
Uses multiple subsystems to find the answer to an atomic question in a database (Wikipedia), including citations and a source link.
I didn't use langchain's convenient Wikipedia API because it didn't return the full page content. Instead I collect documents manually with the standalone `wikipedia` API.
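A hedged sketch of this manual collection with the standalone `wikipedia` package (`pip install wikipedia`); the calls follow its documented API, but the helper name and the number of results are illustrative assumptions:

```python
import wikipedia

def fetch_candidates(query: str, n: int = 3):
    """Search Wikipedia and return (title, summary, full content) triples."""
    candidates = []
    for title in wikipedia.search(query, results=n):
        try:
            page = wikipedia.page(title, auto_suggest=False)
            candidates.append((page.title, page.summary, page.content))
        except (wikipedia.exceptions.DisambiguationError,
                wikipedia.exceptions.PageError):
            continue  # skip ambiguous or missing pages
    return candidates
```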
Search Generator - Takes an atomic question and generates search queries for Wikipedia to maximize the chance of finding the answer in the first few results.
Abstract Pre-Selector - Computes vector embeddings of the abstracts of the Wikipedia search results and selects those semantically related to the question.
Content Reference Selector - Computes vector embeddings of chunks of the related Wikipedia articles' content and selects the chunks semantically related to the question. (Both selectors share the similarity filter sketched after this list.)
Atomic Answerer - Answers the question with each related content chunk until it finds an answer.
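Both selectors boil down to the same cosine-similarity filter. A minimal sketch, assuming the question and chunk embeddings are already computed; the function name and threshold value are illustrative assumptions:

```python
import numpy as np

def select_related(question_vec: np.ndarray, chunk_vecs: np.ndarray,
                   chunks: list, threshold: float = 0.8) -> list:
    """Keep the chunks whose cosine similarity to the question vector
    exceeds the threshold."""
    sims = chunk_vecs @ question_vec / (
        np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(question_vec))
    return [chunk for chunk, sim in zip(chunks, sims) if sim > threshold]
```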
By decomposing the task in this way we can
[...]
Reading
Pro tip: try using an app on your phone or computer to read aloud to you at 1.5x speed! This can save time and make it easier to absorb information without being visually tied to a screen.
@indrekromet
- [ ] Read https://www.palladiummag.com/2023/02/23/the-west-lives-on-in-the-talibans-afghanistan/

Doing

Set timer: 10 minutes maximum
- [ ] Before asking GPT (to avoid biasing yourself!), write your own critique of your homework. Questions to consider could be: what could be improved? What doesn’t make sense in the visualization? What doesn’t make sense in the writing?

Set timer: 10 minutes maximum
- [ ] Ask GPT-4 to critique the homework or the visual using your favorite data thinking definition we have so far.

Set timer: 10 minutes maximum
- [ ] Add this critique as a comment on the homework’s github issue, and link to the critique in Zulip.

Set timer: 10 minutes maximum
- [ ] Repeat this exercise for the previous homework of one other person in the class

Creating
- [ ] Use `duckdb` to load the Zulip data into a SQL database, and use `altair` to visualize the data, following https://github.com/onefact/datathinking.org-codespace/blob/main/notebooks/in-class-notebooks/230420-debugging-duckdb-altair-falcon-3-1-1-service-requests.ipynb (run this notebook with this data: https://data.cityofnewyork.us/Social-Services/311-Service-Requests-from-2010-to-Present/erm2-nwe9 - and try changing the data source to be the Zulip data and post a visualization of the chat data on Zulip). A sketch of the duckdb-plus-altair step follows below.
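A minimal sketch of that step, assuming the Zulip messages have been exported to a JSON file; the file name and the `sender` column are illustrative assumptions, not taken from the notebook:

```python
import duckdb
import altair as alt

con = duckdb.connect()

# Load the exported chat data into an in-memory SQL table and aggregate it.
df = con.execute("""
    SELECT sender, count(*) AS n_messages
    FROM read_json_auto('zulip_messages.json')
    GROUP BY sender
    ORDER BY n_messages DESC
""").df()

# Render a simple bar chart of messages per sender and save it as HTML.
chart = alt.Chart(df).mark_bar().encode(x='sender:N', y='n_messages:Q')
chart.save('messages_per_sender.html')
```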
Thinking
Listening
Large Language Model Access Checklist