truefoundry / cognita

RAG (Retrieval Augmented Generation) Framework for building modular, open source applications for production by TrueFoundry
https://cognita.truefoundry.com
Apache License 2.0
3.32k stars, 274 forks

feat: add read signed url of the vector chunks #374

Closed: mnvsk97 closed this 1 month ago

mnvsk97 commented 1 month ago

Need clarity on two things:

  • Chunk metadata needs to be updated with the document metadata, if any, for all the parsers.

  • How does caching work for parsers with the same extension but different configs, given that the caching key is the file extension?

  1. Not sure about this, need an example.
  2. I missed the part about the config. I'll add an MD5 hash of the config and extension.
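
A minimal sketch of the caching key described in point 2, assuming the parser config is a JSON-serializable dict. The function name `parser_cache_key` and the config field `chunk_size` are hypothetical and not taken from the PR:

```python
import hashlib
import json


def parser_cache_key(file_extension: str, parser_config: dict) -> str:
    """Build a cache key from the file extension and the parser config."""
    # Serialize with sorted keys so that equal configs always hash identically.
    canonical_config = json.dumps(parser_config, sort_keys=True, default=str)
    payload = f"{file_extension.lower()}:{canonical_config}"
    # MD5 serves only as a fast fingerprint here, not as a security measure.
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


# Same extension, different configs (hypothetical config field).
key_a = parser_cache_key(".pdf", {"chunk_size": 500})
key_b = parser_cache_key(".pdf", {"chunk_size": 1000})
# Same config, extension differing only in case.
key_c = parser_cache_key(".PDF", {"chunk_size": 500})
```

With a key like this, two parsers registered for the same extension but with different configs no longer share a cache entry, while identical configs still do.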

S1LV3RJ1NX commented 1 month ago

Not sure about this, need an example.

@mnvsk97 - if you check the get_chunks function of any parser, it takes file_path and metadata (let's call this doc_metadata for the sake of explanation) as input arguments. So when an individual document chunk generates its own metadata, we should also add the doc_metadata to the chunk metadata.
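
As a rough illustration of the point above, the merge could look like the sketch below. The helper name `merge_doc_metadata` and the metadata keys are hypothetical, and letting chunk-level keys win on a name clash is an assumption rather than something decided in this thread:

```python
from typing import Any, Dict, Optional


def merge_doc_metadata(
    chunk_metadata: Dict[str, Any],
    doc_metadata: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Return the chunk metadata enriched with document-level metadata."""
    # Start from the document metadata so that every chunk inherits it.
    merged = dict(doc_metadata or {})
    # Chunk-level keys are applied last, so they win on a name clash
    # (an assumption made for this sketch).
    merged.update(chunk_metadata)
    return merged


# Hypothetical metadata, standing in for what a parser would produce.
doc_metadata = {"source": "report.pdf", "author": "jane"}
chunk_metadatas = [{"page_number": 1}, {"page_number": 2, "author": "override"}]
merged_chunks = [merge_doc_metadata(m, doc_metadata) for m in chunk_metadatas]
```

The helper copies its inputs instead of mutating them, so the same doc_metadata dict can be reused safely for every chunk of a document.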

Another thing that is missing: QueryController's required_metadata list should be updated so that pre-signed URLs are sent in the response. https://github.com/mnvsk97/cognita/blob/16c20fb006e9065cc435439d77a44bd58e3981a6/backend/modules/query_controllers/base.py#L24
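
To show why the list matters, here is a hedged sketch of an allow-list filter over chunk metadata. The key names (including `signed_url`), the constant, and the helper are all hypothetical; the actual list is the one in the linked base.py:

```python
from typing import Any, Dict, List

# Hypothetical key names; the real list lives in the linked base.py.
REQUIRED_METADATA: List[str] = ["source", "page_number", "signed_url"]


def filter_metadata(
    metadata: Dict[str, Any], required_keys: List[str]
) -> Dict[str, Any]:
    """Keep only the allow-listed metadata keys that are actually present."""
    return {key: metadata[key] for key in required_keys if key in metadata}


# Hypothetical chunk metadata as it might come back from the vector store.
chunk_metadata = {
    "source": "report.pdf",
    "page_number": 2,
    "signed_url": "https://example.com/report.pdf?signature=abc",
    "internal_id": "not-for-clients",
}
response_metadata = filter_metadata(chunk_metadata, REQUIRED_METADATA)
```

If the response is built from an allow-list like this, a key that is missing from the list is dropped even when the parser stored it, which is why the pre-signed URL key has to be added.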

mnvsk97 commented 1 month ago

Resolved all the above-mentioned points in the latest commits.