microsoft / autogen

A programming framework for agentic AI 🤖
https://microsoft.github.io/autogen/
Creative Commons Attribution 4.0 International

MongoDB Atlas VectorDB [clean] #2996

Closed ranfysvalle02 closed 3 months ago

ranfysvalle02 commented 3 months ago

Why are these changes needed?

MongoDB Atlas Vector Search received the highest developer NPS in the Retool State of AI 2023 survey (https://www.mongodb.com/blog/post/atlas-vector-search-commands-highest-developer-nps-retool-state-ai-2023-survey), so it is worth adding MongoDB vector search as an option for AutoGen RAG.

You can easily get started with MongoDB vector search on a free tier M0 MongoDB Atlas cluster. The free tier cluster provides the full functionality of MongoDB vector search. https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-overview/

But why is MongoDB such a standout? Well, there are a few key reasons.

- MongoDB Atlas integrates smoothly with existing databases. For organizations already using MongoDB, this means a seamless expansion into vector storage, with no major system overhauls required.
- MongoDB Atlas is built to handle operational heavy lifting. It excels at serving large-scale, mission-critical applications, offering robustness and reliability where it counts.
- MongoDB's flexibility in handling a variety of data types and structures makes it well suited to storing vector embeddings alongside the documents they describe.

As such, implementing MongoDB as a Retrieval Agent can unlock new potential in your AI applications, bringing the full power of vector storage to bear.
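To make the retrieval step concrete, here is a minimal sketch of the kind of aggregation pipeline Atlas Vector Search runs, built around the `$vectorSearch` stage. It is not the code from this PR. The index name `vector_index` and the field names `embedding` and `content` are assumptions for illustration only.

```python
from typing import Any


def build_vector_search_pipeline(
    query_vector: list[float],
    index_name: str = "vector_index",  # hypothetical Atlas Vector Search index name
    path: str = "embedding",           # hypothetical field that stores the vectors
    limit: int = 5,
    num_candidates: int = 100,
) -> list[dict[str, Any]]:
    """Build an aggregation pipeline that starts with a $vectorSearch stage."""
    # Atlas requires limit to be no larger than numCandidates.
    if limit > num_candidates:
        raise ValueError("limit must not exceed num_candidates")
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": query_vector,
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        # Return the (hypothetical) text field plus the similarity score.
        {
            "$project": {
                "_id": 0,
                "content": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]
```

With `pymongo`, the returned list could be passed to `collection.aggregate(...)` on a collection that has a matching vector search index. The function itself only builds the pipeline, so it can be inspected without a database connection.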

Related issue number: 711

Closes #711

Checks

ranfysvalle02 commented 3 months ago

@Hk669 @thinkall made a fresh pull request, with cleaner commit history. I did a lot of "learning" on that last pull request :)

I think we are pretty close to getting MongoDB into AutoGen.

thinkall commented 3 months ago

Test is still skipped:

https://github.com/microsoft/autogen/actions/runs/9621000866/job/26540852406?pr=2996#step:11:26

Need to update contrib-tests.yml
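For context, one common reason a contrib test shows up as skipped is that the test guards on an optional dependency that the CI job does not install. The PR's actual test file is not shown in this thread, so the following is only a sketch of that pattern. It uses the standard library's `unittest.skipUnless` (pytest also reports such tests as `s`), and the dependency name `pymongo` is an assumption.

```python
import importlib.util
import unittest

# Assumption: the test depends on the optional `pymongo` package.
HAS_PYMONGO = importlib.util.find_spec("pymongo") is not None


class TestMongoDBDependency(unittest.TestCase):
    @unittest.skipUnless(HAS_PYMONGO, "pymongo is not installed")
    def test_client_class_available(self):
        # Only runs when the optional dependency is importable.
        import pymongo

        self.assertTrue(hasattr(pymongo, "MongoClient"))
```

If this is the cause, installing the optional dependency in the workflow job that runs the test would let it execute instead of being skipped.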

codecov-commenter commented 3 months ago

Codecov Report

Attention: Patch coverage is 0.92593% with 107 lines in your changes missing coverage. Please review.

Project coverage is 26.01%. Comparing base (89c2f20) to head (5f89f21). Report is 3 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| autogen/agentchat/contrib/vectordb/mongodb.py | 0.00% | 104 Missing :warning: |
| autogen/agentchat/contrib/vectordb/base.py | 25.00% | 3 Missing :warning: |

:exclamation: There is a different number of reports uploaded between BASE (89c2f20) and HEAD (5f89f21). Click for more details.

HEAD has 27 uploads more than BASE

| Flag | BASE (89c2f20) | HEAD (5f89f21) |
|------|------|------|
|unittests|1|28|
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #2996      +/-   ##
==========================================
- Coverage   32.49%   26.01%   -6.49%
==========================================
  Files          93      100       +7
  Lines       10097    10299     +202
  Branches     2167     2356     +189
==========================================
- Hits         3281     2679     -602
- Misses       6532     7318     +786
- Partials      284      302      +18
```

| [Flag](https://app.codecov.io/gh/microsoft/autogen/pull/2996/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=microsoft) | Coverage Δ | |
|---|---|---|
| [unittest](https://app.codecov.io/gh/microsoft/autogen/pull/2996/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=microsoft) | `12.28% <0.00%> (?)` | |
| [unittests](https://app.codecov.io/gh/microsoft/autogen/pull/2996/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=microsoft) | `25.21% <0.92%> (-7.28%)` | :arrow_down: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=microsoft#carryforward-flags-in-the-pull-request-comment) to find out more.


gitguardian[bot] commented 3 months ago

⚠️ GitGuardian has uncovered 2 secrets following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secrets in your pull request

| GitGuardian id | GitGuardian status | Secret | Commit | Filename | |
| -------------- | ------------------ | ------------------------------ | ---------------- | --------------- | -------------------- |
| [-](https://dashboard.gitguardian.com/workspace/221093/incidents/secrets) | | MongoDB Credentials | 54655e80533434b8747b4dd1667d3ea07112697e | notebook/agentchat_mongodb_RetrieveChat.ipynb | [View secret](https://github.com/microsoft/autogen/commit/54655e80533434b8747b4dd1667d3ea07112697e#diff-3b19ca80eaa5af07cca96260ad5158f0e6283dbf82ad475df8155a877e469616R164) |
| [-](https://dashboard.gitguardian.com/workspace/221093/incidents/secrets) | | MongoDB Credentials | 312230141b1562a1af6991d2620cc4397b07119a | notebook/agentchat_mongodb_RetrieveChat.ipynb | [View secret](https://github.com/microsoft/autogen/commit/312230141b1562a1af6991d2620cc4397b07119a#diff-3b19ca80eaa5af07cca96260ad5158f0e6283dbf82ad475df8155a877e469616L164) |
🛠 Guidelines to remediate hardcoded secrets

1. Understand the implications of revoking this secret by investigating where it is used in your code.
2. Replace and store your secrets safely. [Learn here](https://blog.gitguardian.com/secrets-api-management?utm_source=product&utm_medium=GitHub_checks&utm_campaign=check_run_comment) the best practices.
3. Revoke and [rotate these secrets](https://docs.gitguardian.com/secrets-detection/secrets-detection-engine/detectors/specifics/mongo_uri#revoke-the-secret?utm_source=product&utm_medium=GitHub_checks&utm_campaign=check_run_comment).
4. If possible, [rewrite git history](https://blog.gitguardian.com/rewriting-git-history-cheatsheet?utm_source=product&utm_medium=GitHub_checks&utm_campaign=check_run_comment). Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider

- following these [best practices](https://blog.gitguardian.com/secrets-api-management/?utm_source=product&utm_medium=GitHub_checks&utm_campaign=check_run_comment) for managing and storing secrets including API keys and other credentials
- install [secret detection on pre-commit](https://docs.gitguardian.com/ggshield-docs/integrations/git-hooks/pre-commit?utm_source=product&utm_medium=GitHub_checks&utm_campaign=check_run_comment) to catch secret before it leaves your machine and ease remediation.

🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.
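As an illustration of the "replace and store your secrets safely" step, a notebook can read the MongoDB connection string from the environment instead of hardcoding it. This is a minimal sketch, not the fix applied in this PR. The variable name `MONGODB_URI` and the helper `get_mongodb_uri` are assumptions.

```python
import os


def get_mongodb_uri(env_var: str = "MONGODB_URI") -> str:
    """Read the connection string from an environment variable.

    `MONGODB_URI` is a hypothetical variable name chosen for this sketch.
    """
    uri = os.environ.get(env_var)
    if not uri:
        # Fail early with a clear message rather than falling back to a
        # hardcoded credential.
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return uri
```

The value would then be set outside the repository, for example in the shell or in the CI system's secret store, so it never appears in a committed notebook cell.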

ranfysvalle02 commented 3 months ago

@thinkall - I think there is something going on with testing retrieval?

test/test_retrieve_utils.py ............s.                               [ 56%]
test/agentchat/contrib/retrievechat/test_pgvector_retrievechat.py s      [ 60%]
test/agentchat/contrib/retrievechat/test_qdrant_retrievechat.py s..      [ 72%]
test/agentchat/contrib/retrievechat/test_retrievechat.py s.              [ 80%]
test/agentchat/contrib/vectordb/test_mongodb.py s                        [ 88%]

---------- coverage: platform linux, python 3.10.14-final-0 ----------
Coverage XML written to file coverage.xml

======================== 20 passed, 5 skipped in 52.41s ========================

ranfysvalle02 commented 3 months ago

I polluted this PR :( sorry -- let's try this one last time

thinkall commented 3 months ago

> I polluted this PR :( sorry -- let's try this one last time

There is no need to worry about the commit history. Opening a new PR loses the review and discussion history of the previous one.