microsoft / lida

Automatic Generation of Visualizations and Infographics using Large Language Models
https://microsoft.github.io/lida/
MIT License

`lida.summarize` returns wrong column types #70

Open Shu-Ji opened 7 months ago

Shu-Ji commented 7 months ago

With the sample CSV file below, `lida.summarize` (https://github.com/microsoft/lida/blob/main/lida/components/summarizer.py#L53) returns a `date` dtype for the `app_version` column, even though the values are version strings.

Then, in LIDA's final step, matplotlib/seaborn throws an error:

6.4.3 can not be cast to a date.

If I set the summary method to `llm`, as in `summary = lida.summarize("./info.csv", summary_method='llm')`, the LLM keeps the column type as is; OpenAI doesn't correct the wrong type.

Sample `info.csv`:

user,app_version
Jack,6.4.3
Lisa,6.4.4
Tom,6.4.2

Returned summary for the `app_version` column:

{
    "column": "app_version",
    "properties": {
        "dtype": "date",
        "min": "6.4.2",
        "max": "6.4.4",
        "samples": [
            "6.4.3",
            "6.4.4",
            "6.4.2"
        ],
        "num_unique_values": 3,
        "semantic_type": "date",
        "description": ""
    }
}
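
One possible guard against this misclassification is to flag version-like columns before any date inference runs. The sketch below is not LIDA's implementation; `VERSION_RE` and `looks_like_version` are hypothetical names, and it uses only the Python standard library on the sample data from this issue.

```python
import csv
import io
import re

# Hypothetical pattern: dotted numeric strings such as "6.4.3" or "1.2.3.4".
VERSION_RE = re.compile(r"^\d+(\.\d+){2,3}$")


def looks_like_version(values):
    """Return True when every non-empty value is a dotted version string."""
    cleaned = [v.strip() for v in values if v and v.strip()]
    return bool(cleaned) and all(VERSION_RE.match(v) for v in cleaned)


# Sample data copied from this issue.
SAMPLE_CSV = """user,app_version
Jack,6.4.3
Lisa,6.4.4
Tom,6.4.2
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
# Regroup the rows into {column name: list of raw string values}.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Columns flagged here could be kept as plain strings (or a category)
# instead of being handed to a date parser.
version_columns = [name for name, vals in columns.items() if looks_like_version(vals)]
print(version_columns)  # ['app_version']
```

This is only a heuristic: a genuine date written as `6.4.2023` would also match the pattern, so a real fix would probably need to combine a check like this with the existing date detection rather than replace it.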