microsoft / lida

Automatic Generation of Visualizations and Infographics using Large Language Models
https://microsoft.github.io/lida/
MIT License

LIDA rename columns #42

Closed trojrobert closed 9 months ago

trojrobert commented 10 months ago

Screenshot from 2023-09-20 08-33-11

LIDA replaces special characters in column names with _

victordibia commented 9 months ago

Can you explain the exact issue? E.g., are you saying that replacing special characters in column names with _ (i.e. standardizing column names) is a problem? If so, what is the use case, and how does this impact the developer workflow or solution?

trojrobert commented 9 months ago

I was building an app where I wanted to do more with the column names. For instance, I wanted to show the number of NaN values and manually add more details about the data.

victordibia commented 9 months ago

Thanks for the clarification. It seems to me that your app should keep track of the dataset and all other processing needed without relying on LIDA's representation of the data by the Manager (.data). You can also keep track of both name lists and use that information as needed. Note that the column index will not change, but the title may.
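A minimal sketch of tracking both name lists on the app side. The `sanitize` function here is a hypothetical stand-in, not LIDA's actual renaming rule: it assumes every character that is not a letter, digit, or underscore becomes `_`. The sample data and helper names are also made up for illustration.

```python
import math
import re


def sanitize(name):
    """Hypothetical stand-in for the renaming step (an assumption, not LIDA's code)."""
    return re.sub(r"[^0-9a-zA-Z_]", "_", name)


def count_missing(values):
    """Count None and float NaN entries in a column."""
    return sum(
        1 for v in values
        if v is None or (isinstance(v, float) and math.isnan(v))
    )


# The app's own copy of the data, keyed by the original column names.
columns = {
    "Unit Price ($)": [1.5, float("nan"), 3.0],
    "Qty-Sold": [10, 4, None],
}

original_names = list(columns)
sanitized_names = [sanitize(n) for n in original_names]

# Column positions do not change, so the two lists line up index by index.
sanitized_to_original = dict(zip(sanitized_names, original_names))

# Extra details (here, missing-value counts) reported under the original names.
missing_by_original = {n: count_missing(columns[n]) for n in original_names}
```

Matching by position rather than by name also covers the case where two different original names happen to sanitize to the same string, which a name-keyed dictionary alone would silently collapse.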

In general, the renaming is done (and required) for two high-level reasons:

What do you think?

trojrobert commented 9 months ago

I understand why the column renaming is important. Serialization errors can be a big problem. Maybe we can add a note in the code or in the summary that gives the user an idea of what was changed, along with some information about the summary.