shure-dev / logllm

Automatically extract ML experimental conditions from your Python scripts with GPT4, and save them via WandB.
1 stars 2 forks source link
chatgpt gpt4 large-language-models llm llmops logging machine-learning mlflow mlops wandb
[๐ŸŒ **Project Website**](https://logllm.tiiny.site/) | [๐Ÿ’ฌ **Discord Community**](https://discord.gg/3xvUV6xcKW) # ๐Ÿš€ **LogLLM** ๐Ÿš€ **LLM-Enhanced ML Experiment Logging System** `Ever found yourself lost in a maze of changing experiment conditions in your early ML scripts? ๐Ÿ˜ต No worriesโ€”hereโ€™s the solution youโ€™ve been looking for! โœจ` ![image](./images/logllm-overview.png)

โœจ Features

๐Ÿ” Automatic Logging: Effortlessly extracts code from Jupyter Notebook files using GPT4o, saving the logs to Weights & Biases (W&B) for seamless tracking and analysis.

๐Ÿ’ฌ Natural Language Queries: Effortlessly extract and display information from WandB logs using natural language queries. Simply ask in plain language, and get insights directly. For example, you could ask, "What was the most effective method using decision trees so far?" and receive the relevant data instantly.

โš™๏ธ Installation

To install the package, run the following command in your terminal:

git clone https://github.com/shure-dev/logllm.git
pip install -e .
export OPENAI_API_KEY="your-openai-api-key"
wandb login

This command installs the package in editable mode, allowing you to modify the code and see changes without reinstalling.

๐Ÿš€ Usage

Hereโ€™s a simplified example of how to use the package:

๐Ÿ” Automatic Logging

Sample Notebook Script: sample-script.ipynb

###
# Your machine learning script goes here.
# Experimental conditions are extracted from here.
###

from logllm import log_llm

notebook_path = "sample-script.ipynb"  # The target file to log

log_llm(notebook_path)

๐Ÿ’ฌ Natural Language Queries

๐Ÿง  How It Works: Simple and Powerful!

LLM(Our Prompt + Your ML Script) = Extracted Experimental Conditions

Our Prompt:

You are an advanced machine learning experiment designer.
Extract all experimental conditions and results for logging via W&B API.
Add your original parameters in your JSON response if you want to log other parameters.
Extract all information you can find in the given script as int, bool, or float values.
If you cannot describe conditions with int, bool, or float values, use a list of natural language.
Give advice to improve accuracy.
If you use natural language, answers should be very short.
Do not include information already provided in param_name_1 for `condition_as_natural_langauge`.
Output JSON schema example:
This is just an example, make changes as you see fit.
{{
    "method": "str",
    "dataset": "str",
    "task": "str",
    "is_advanced_method": bool,
    "is_latest_method": "",
    "accuracy": "",
    "other_param_here": "",
    "other_param_here": "",
    ...
    "condition_as_natural_langauge": ["Small dataset."],
    "advice_to_improve_acc": ["Use a bigger dataset.", "Use a simpler model."]
}}

Your ML Script: svc-sample.ipynb

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

iris = datasets.load_iris()

X = iris.data[iris.target != 2]
y = iris.target[iris.target != 2]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = SVC(kernel='linear')
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f"Accuracy: {accuracy:.2f}")

Extracted Experimental Conditions:

{
    "method": "SVC",
    "dataset": "Iris",
    "task": "classification",
    "is_advanced_method": false,
    "is_latest_method": "",
    "accuracy": 1.00,
    "kernel": "linear",
    "test_size": 0.2,
    "random_state": 42,
    "condition_as_natural_langauge": [
        "Using linear kernel on SVC model.",
        "Excluding class 2 from Iris dataset.",
        "Splitting data into 80% training and 20% testing."
    ],
    "advice_to_improve_acc": [
        "Confirm dataset consistency.",
        "Consider cross-validation for validation."
    ]
}

๐Ÿ“„ Check the Demo Code


๐Ÿค Contributing

Contributions are welcome! If you have suggestions or improvements, please feel free to submit an issue or a pull request.

๐Ÿ“œ License

This project is licensed under the MIT License.