monarch-initiative / curate-gpt

LLM-driven curation assist tool (pre-alpha)
https://monarch-initiative.github.io/curate-gpt/
BSD 3-Clause "New" or "Revised" License
49 stars 11 forks

duckdb_adapter implementation #44

Open iQuxLE opened 1 week ago

iQuxLE commented 1 week ago

This is an implementation of the duckdbvss_adapter (based on DuckDB's vector similarity search extension). There are still some things to be done, such as the implementation in the CLI.

I have realised that it might be smarter to create the index after inserting the data; I will test that and add it if needed. Furthermore, field_names from DBAdapter currently returns a tuple like ((id, None), (metadata, None), (...)) for the DuckDBVSSAdapter, so it might be helpful to make it abstract in DBAdapter and add a revised implementation to the DuckDBVSSAdapter. That depends on where it is used.
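A rough sketch of what the abstract variant of field_names could look like. The class names, the `collection` parameter, and the `raw_fields` constructor argument are assumptions made up for illustration; they are not the actual curate-gpt classes or signatures.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence, Tuple


class DBAdapterSketch(ABC):
    """Hypothetical stand-in for DBAdapter; not the real curate-gpt class."""

    @abstractmethod
    def field_names(self, collection: str) -> List[str]:
        """Return the plain field names of a collection."""


class DuckDBVSSAdapterSketch(DBAdapterSketch):
    """Hypothetical adapter that flattens (name, None) pairs into names."""

    def __init__(self, raw_fields: Sequence[Tuple[str, None]]):
        # Stand-in for whatever column description the adapter reads from DuckDB.
        self._raw_fields = raw_fields

    def field_names(self, collection: str) -> List[str]:
        # Keep only the first element of each (name, None) pair.
        return [name for name, _ in self._raw_fields]


adapter = DuckDBVSSAdapterSketch((("id", None), ("metadata", None)))
names = adapter.field_names("demo")
```

With the method declared abstract, every adapter has to provide its own version, so callers get a flat list of names regardless of the backend.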

I have also realised there might still be a problem with the persistence feature. For example, running Search.ipynb twice on the same db path given to DuckDB might cause an issue (WAL file related). However, there were no problems with that in the tests. I will look into it.
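One way to narrow down the WAL-related behaviour is to check whether a write-ahead log file is left next to the database file after the first run. The sketch below assumes the WAL file is named after the database file with `.wal` appended; the helper names are hypothetical, and the demo only touches placeholder files without opening a DuckDB connection.

```python
import tempfile
from pathlib import Path


def wal_path_for(db_path: str) -> Path:
    """Return where a WAL file would sit (assumed: db file name + '.wal')."""
    return Path(db_path + ".wal")


def has_leftover_wal(db_path: str) -> bool:
    """True if a WAL file exists next to the database file."""
    return wal_path_for(db_path).exists()


# Demo with placeholder files only; no DuckDB connection is opened here.
with tempfile.TemporaryDirectory() as tmp:
    db = str(Path(tmp) / "demo.duckdb")
    Path(db).touch()
    before = has_leftover_wal(db)  # no WAL file yet
    wal_path_for(db).touch()       # simulate a WAL left behind by a first run
    after = has_leftover_wal(db)
```

If such a file is still present before the second notebook run, the first connection was probably not closed cleanly, which would be a plausible place to start looking.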

@cmungall

cmungall commented 1 week ago

This is great. I leave PR merging to you, Justin, Harry, and others while I’m away.

In reply to Carlo Kroll's message of Sat, Jul 6, 2024 at 4:17 AM (the pull request description above).

caufieldjh commented 1 week ago

I can test this out once you say it's ready @iQuxLE