microsoft / autogen

A programming framework for agentic AI 🤖
https://microsoft.github.io/autogen/
Creative Commons Attribution 4.0 International
30.56k stars 4.45k forks source link

[Feature Request]: Improve AutogenStudio Database Implementation (link database entities, remove api keys) #1694

Open jxraynaud opened 6 months ago

jxraynaud commented 6 months ago

Is your feature request related to a problem? Please describe.

I have the same issue than the one describe here : https://github.com/microsoft/autogen/issues/1415. So I dig a bit deeper and looked at the schema used for the sqlite database. I noticed a few things:

  1. The api keys are stored in the database. That's a problem because an easy way to share a whole project is to share the database itself. I'm aware that everything can be exported and then reimported, but easiest way is still to share the db, and some people will do it without being aware that the database is storing api keys.
  2. Storing, in plain text, api keys in a db is a very bad practice.
  3. The problem with updating the api key of the model and having to relink the model of every single agent is a "no-go" for any real world project. As long as we don't take advantage of relational databases like sqlite to solve this issue it means that maintenance is impossible.

Describe the solution you'd like

Analyse a bit better the actual underlying relational schema. For example the agent table should have a model_id field. Implement a serialised/deserializer logic allowing to be both flexible and strict when it comes to relations. Implement cascade/protect logic (for example not being able to delete an agent used by a workflow or a model used by an agent).

Modify the existing code to ensure that we use the serializer/deserializer logic everywhere and that we have proper handling of the errors in the UI.

Never store the value of the api key in the database. Instead allow the deserializer to automatically get it from env variable or a config file or a .env file. Obviously the serializer must remove the api keys when receiving the llm_config.

Obviously the hard part would be to create a script allowing to migrate from the previous version of the db to the new one. But I think agents can do this kind of scripts relatively well.

I can help for the db part.

Additional context

No response

victordibia commented 6 months ago

Hi @jxraynaud ,

Thanks for the notes. All of these can improve current behaviors in autogenstudio. Studio originally began as a example but is transitioning into a tool that we can all use to prototype with autogen. The changes above will help us get there.

To summarize your proposed changes and add thoughts on implementation:

Linking Database Entities.

The current implementation uses the database simply for persistence without managing associations across tables for each entity (skills, models, agents, workflows). The proposed change will natively support links between entities (primary and foreign keys). Some potential implementation steps

API Keys in Env Variables / Config Files.

Currently implementation saves all data provided in in the db (including api_keys). Proposed changes will remove api keys from db and maintain a reference to env variables. Implementation wise

These are significant changes and I am happy to treat these as a breaking update at the moment (i.e., start with a focus on functionality then implement migrations/backward compat later). We can also use the opportunity to completely rewrite the database layer for AGS using an ORM tool like SQLAlchemy. This lets us maintain the validation we have with fastapi whilst also supporting multiple db backends.

Overall, all of these will really improve the dev ex for AGS. What are your thoughts?

Related

jxraynaud commented 6 months ago

That seems great ! Indeed using SQL Alchemy will allow to easily tackle the Postgres support feature request.

I'd be happy to contribute. I'll work on a first database schema next week and I'll post it here. If you validate it, I can try to implement the SQL alchemy / fastapi part.

victordibia commented 6 months ago

Thanks. Sounds like a good plan. I'll also do some investigation when I have sometime and share updates.

victordibia commented 5 months ago

I have started looking into moving AGS backend to use an ORM (to mainly address db entity linking issue above).

Based on the research I have done, SQLModel seems to be the right approach - it integrates SQLAlchemy and FastAPI data models. The benefits are multiple

I have started a branch here to explore this.

ORM Implementation Progress

FYI @gagb , @ekzhu