tacigomess / TalkData

Repository of the final project for Le Wagon's Data Science course! This project was developed to showcase the skills acquired throughout the course, with a special focus on Generative AI.

Check which LLMs to use for generating charts #6

Closed Ib191 closed 2 months ago

Ib191 commented 2 months ago

🎯 Prioritize one that integrates well with data visualization tools and offers robust data processing capabilities for accurate and customizable chart outputs.

1️⃣ OpenAI's GPT-3 and GPT-4:

2️⃣ Google's BERT:

3️⃣ Google's T5:

hachlam commented 2 months ago

4️⃣ Hugging Face Transformers:

Use: Leverage a wide variety of pre-trained models (e.g., BERT, T5, RoBERTa) for diverse NLP tasks.

Strengths: Provides a user-friendly API, supports multiple frameworks (PyTorch, TensorFlow), and facilitates model fine-tuning and deployment.

In short, Hugging Face simplifies how we access and use NLP models such as BERT, T5, and many others.

Integration Strategy example:

- Data Ingestion and Preprocessing: Use tools like Pandas and NumPy to load, clean, and normalize data.

- Feature Extraction with BERT (via Hugging Face): Use BERT to extract relevant features and capture text context.

- Data Transformation with T5 (via Hugging Face): Convert structured/tabular data into descriptive text.

- Summarization and Insight Generation with GPT-4 (or GPT-3.5): Generate comprehensive summaries, insights, and detailed explanations.

- Visualization: Create and customize visualizations using tools like LIDA, Plotly, Matplotlib, and Seaborn.

- Annotation and Explanation with GPT-4 (or GPT-3.5): Use the model to annotate charts with detailed explanations and narratives.

- Interactive Dashboards: Build interactive dashboards using frameworks like Dash or Streamlit for dynamic data exploration and visualization.
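
The stages above can be sketched as plain functions chained together. This is only an illustration, not the project's code: `clean_rows`, `describe_rows`, `build_annotation_prompt`, `run_pipeline`, and `stub_llm` are hypothetical names, the table-to-text step uses a simple template as a stand-in for T5, and `llm` is any callable that takes a prompt string and returns text. A stub is used here so the example runs without an API key.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def clean_rows(rows: List[Row]) -> List[Row]:
    """Preprocessing stand-in: drop rows that contain missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]


def describe_rows(rows: List[Row]) -> str:
    """Template-based stand-in for the T5 table-to-text step."""
    sentences = []
    for row in rows:
        parts = [f"{key} is {value}" for key, value in row.items()]
        sentences.append(", ".join(parts) + ".")
    return " ".join(sentences)


def build_annotation_prompt(description: str, chart_type: str) -> str:
    """Prompt asking an LLM to write a caption for a chart of the data."""
    return (
        f"Data: {description}\n"
        f"Write a one-sentence caption for a {chart_type} chart of this data."
    )


def run_pipeline(rows: List[Row], chart_type: str, llm: Callable[[str], str]) -> str:
    """Chain the stages: clean -> describe -> prompt -> LLM caption."""
    description = describe_rows(clean_rows(rows))
    return llm(build_annotation_prompt(description, chart_type))


def stub_llm(prompt: str) -> str:
    """Hypothetical placeholder; swap in a real GPT-4 or Hugging Face call."""
    return f"[caption for prompt of {len(prompt)} characters]"


rows = [
    {"month": "Jan", "sales": 120},
    {"month": "Feb", "sales": None},  # dropped by clean_rows
    {"month": "Mar", "sales": 150},
]
caption = run_pipeline(rows, "bar", stub_llm)
print(caption)
```

Passing the model in as a callable keeps the pipeline independent of any one provider, so GPT-4, GPT-3.5, or a Hugging Face model could be swapped in without touching the other stages.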

To answer the initial question: no LLM is designed solely for generating charts, but certain models can help by providing the context, annotations, and descriptions that charts need. We can use the insights these LLMs generate to improve automated chart generation, and there are tools that use LLMs to automate the chart generation itself. One example is LIDA.

Microsoft LIDA: LIDA (Language Integrated Data Analytics) is a library developed by Microsoft for automatic generation of data visualizations using large language models.

Features:

- Automatic Visualization: Generates data visualizations and infographics automatically.

- Model Integration: Compatible with multiple LLM providers including OpenAI and Hugging Face.

- Visualization Goals: Generates visualization goals from data summaries.

- Visualization Code: Creates and executes visualization code based on data summary and goals.

- Interactive Dashboards: Supports creating interactive dashboards.
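
To make the "data summary, then goals" idea concrete, here is a rough sketch that uses only the Python standard library. It is not LIDA's API or implementation: `summarize_csv` and `build_goal_prompt` are hypothetical helpers, and the prompt wording is an assumption. The sketch builds a compact per-column summary and formats a prompt that could be sent to any LLM to ask for visualization goals.

```python
import csv
import io
import json
from statistics import mean


def summarize_csv(text: str) -> dict:
    """Build a compact per-column summary of CSV text."""
    rows = list(csv.DictReader(io.StringIO(text)))
    summary = {"n_rows": len(rows), "columns": {}}
    for name in (rows[0].keys() if rows else []):
        values = [r[name] for r in rows]
        try:
            # Treat the column as numeric if every value parses as a float.
            numbers = [float(v) for v in values]
            summary["columns"][name] = {
                "type": "number",
                "min": min(numbers),
                "max": max(numbers),
                "mean": mean(numbers),
            }
        except ValueError:
            # Otherwise record the distinct values as categories.
            summary["columns"][name] = {
                "type": "category",
                "unique": sorted(set(values)),
            }
    return summary


def build_goal_prompt(summary: dict, n_goals: int = 2) -> str:
    """Format a prompt asking a model for visualization goals."""
    return (
        f"Given this dataset summary:\n{json.dumps(summary, indent=2)}\n"
        f"Propose {n_goals} visualization goals, each with a question, "
        "a chart type, and the columns to plot."
    )


data = "city,temp\nParis,21\nLima,19\nOslo,14\n"
summary = summarize_csv(data)
prompt = build_goal_prompt(summary)
print(prompt)
```

Sending a summary rather than the raw table keeps the prompt small, which matters once token-based pricing is involved.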

LLMs: Models like GPT-4, BERT, and T5 that perform language understanding and generation tasks.

LIDA: A tool that leverages these LLMs to automate and enhance data visualization tasks.

hachlam commented 2 months ago

**Cost Comparison for BERT, T5 (Hugging Face), and GPT-4 (or GPT-3.5)**

- Hugging Face (BERT, T5): Free Tier: Limited access. Starter Plan: $9/month. Professional Plan: $199/month.

- OpenAI GPT-4 Standard: Input: $0.03 per 1,000 tokens. Output: $0.06 per 1,000 tokens.

- OpenAI GPT-4 Turbo: Input: $0.01 per 1,000 tokens. Output: $0.03 per 1,000 tokens.

- OpenAI GPT-3.5 Turbo: Input: $0.001 per 1,000 tokens. Output: $0.002 per 1,000 tokens.

Example Costs for Processing 1 Million Input Tokens and 1 Million Output Tokens

- GPT-4 Standard: Input: $30. Output: $60. Total: $90.

- GPT-4 Turbo: Input: $10. Output: $30. Total: $40.

- Hugging Face Professional Plan: $199/month.
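
The per-token arithmetic is easy to script for other volumes. Below is a small sketch that hard-codes the per-1,000-token prices quoted in this thread (they may be out of date, so check the provider's pricing page before relying on them). `Pricing`, `PRICES`, and `estimate_cost` are illustrative names, not part of any SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_1k: float   # USD per 1,000 input tokens
    output_per_1k: float  # USD per 1,000 output tokens


# Prices as quoted in this thread; assumed, not verified against current rates.
PRICES = {
    "gpt-4": Pricing(0.03, 0.06),
    "gpt-4-turbo": Pricing(0.01, 0.03),
    "gpt-3.5-turbo": Pricing(0.001, 0.002),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for the given token counts."""
    price = PRICES[model]
    cost = (
        input_tokens / 1000 * price.input_per_1k
        + output_tokens / 1000 * price.output_per_1k
    )
    return round(cost, 4)


for model in PRICES:
    total = estimate_cost(model, 1_000_000, 1_000_000)
    print(f"{model}: ${total:.2f} for 1M input + 1M output tokens")
```

With the same 1 million input and 1 million output tokens, GPT-3.5 Turbo works out to $1 + $2 = $3 at the quoted rates.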