Open iQuxLE opened 1 week ago
This is great. I'll leave PR merging to you, Justin, Harry, and others while I'm away.
On Sat, Jul 6, 2024 at 4:17 AM Carlo Kroll @.***> wrote:
This is an implementation of the duckdbvss_adapter (vector similarity search extension). There are still some things to be done, like the implementation in the CLI.
I have realised that it might be smarter to create the index after inserting the data; I will test that and add it if needed. Furthermore, field_names from DBAdapter currently returns a tuple like ((id, None), (metadata, None), (...)) for the DuckDBVSSAdapter, so it might be helpful to make it abstract and add a revised implementation to the DuckDBVSSAdapter. That depends on where it is used.
I have also realised there might still be a problem with the persistence feature. For example, running Search.ipynb twice on the same DuckDB database path might cause an issue (WAL file related). However, there were no problems with that in the tests. I will look into that.
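A minimal sketch of the field_names idea above, not the actual curate-gpt code. The class names `BaseAdapter` and `SketchDuckDBAdapter`, the `collection` parameter, and the constructor argument are hypothetical and only illustrate the pattern: the base class declares the method abstract, and the DuckDB-specific subclass flattens the ((name, None), ...) tuples into plain field names.

```python
from abc import ABC, abstractmethod
from typing import Iterable, List, Optional, Tuple


class BaseAdapter(ABC):
    """Hypothetical stand-in for DBAdapter (not the real curate-gpt class)."""

    @abstractmethod
    def field_names(self, collection: Optional[str] = None) -> List[str]:
        """Return the field names of a collection as plain strings."""


class SketchDuckDBAdapter(BaseAdapter):
    """Hypothetical stand-in for DuckDBVSSAdapter."""

    def __init__(self, description: Iterable[Tuple]) -> None:
        # Assumption: the adapter has access to tuples shaped like
        # (("id", None), ("metadata", None), ...), as described in the PR text.
        self._description = tuple(description)

    def field_names(self, collection: Optional[str] = None) -> List[str]:
        # Keep only the first element (the name) of each tuple.
        return [entry[0] for entry in self._description]


adapter = SketchDuckDBAdapter((("id", None), ("metadata", None)))
names = adapter.field_names()
```

With this shape, any code calling field_names on an adapter gets a flat list of strings regardless of the backend, and a subclass that forgets to implement the method cannot be instantiated.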
@cmungall https://github.com/cmungall
You can view, comment on, or merge this pull request online at:
https://github.com/monarch-initiative/curate-gpt/pull/44
Commit Summary
- 1d9ebf7 https://github.com/monarch-initiative/curate-gpt/pull/44/commits/1d9ebf78cb8e80e86ff7174ece07c5375e453049 duckdb_adapter implementation
File Changes
(7 files https://github.com/monarch-initiative/curate-gpt/pull/44/files)
- M notebooks/api/Search.ipynb https://github.com/monarch-initiative/curate-gpt/pull/44/files#diff-bd653f909c7dbba92f50f8ab0142ab117e0a9dc10280edcdba07d67f37349d96 (8358)
- M src/curate_gpt/store/__init__.py https://github.com/monarch-initiative/curate-gpt/pull/44/files#diff-aac648474879e0dda0bbc8c449f72b3aec8863193a7a32ee5f0080b2be623e71 (3)
- M src/curate_gpt/store/db_adapter.py https://github.com/monarch-initiative/curate-gpt/pull/44/files#diff-d17da7a2e18064bc53a8f3f9ae265962cacca9ab83dfa85f13759aa829f57d5c (48)
- A src/curate_gpt/store/duckdb_adapter.py https://github.com/monarch-initiative/curate-gpt/pull/44/files#diff-d4554fe306a8cbaedda862f5ea3af3bc7799f4fe8a4e6c61b930fc3a3eaf2e3c (802)
- A src/curate_gpt/store/duckdb_result.py https://github.com/monarch-initiative/curate-gpt/pull/44/files#diff-bf6b1dba0838c5021cdadd9768e9a355fe73995afb9a13d326d1ed786f3d4176 (21)
- M tests/__init__.py https://github.com/monarch-initiative/curate-gpt/pull/44/files#diff-001e61d97d9dee27d0c5aa1f23ba6c1972cd8a2e8320bac839656b0ab935b84d (3)
- A tests/store/test_duckdb_adapter.py https://github.com/monarch-initiative/curate-gpt/pull/44/files#diff-83441255e403c872fe0a3ee8d701b3b1ae9c82b278ebbaae8e00a4a058ffd220 (241)
Patch Links:
- https://github.com/monarch-initiative/curate-gpt/pull/44.patch
- https://github.com/monarch-initiative/curate-gpt/pull/44.diff
I can test this out once you say it's ready @iQuxLE