Open JimVincentW opened 9 months ago
Hey @JimVincentW, sorry that our docs are still unclear on that. For local development you need the following:
git clone https://github.com/technologiestiftung/parla-api.git
cd parla-api
supabase start
You will need to populate your environment with some variables. At Technologiestiftung we use direnv for that, which is why there is a .envrc.sample in the repo. If you use direnv:
cd parla-api
cp .envrc.sample .envrc
# populate the variables, then
direnv allow
Without direnv, you can source the .envrc file instead:
cd parla-api
cp .envrc.sample .envrc
# populate the variables, then
source .envrc
cd parla-api
npm ci
cd parla-api
npm run dev
The frontend only needs the URL of the API as an environment variable. By default this should be http://localhost:8080
npm ci
cp .env.sample .env
npm run dev
There are already some documents prepared in supabase/seed.sql. If you want to work with your own documents, take a look at https://github.com/technologiestiftung/parla-document-processor/
And yes, there is also a Docker image: https://hub.docker.com/repository/docker/technologiestiftung/parla-api/general
No worries! And thanks for the further insight :)
I would like to use the parla vector storage directly: plugging in some Llama or Mixtral on my rented instance would run that RAG a lot cheaper, and I really love the effort to store Berlin's parliamentary documents! Could we hack together a way to make that vector storage accessible as an API endpoint?
I am personally in love with qdrant, and I could imagine a lightweight CI/CD pipeline for updating a hosted qdrant Docker image. It is open source and easy to manage.
The vector storage is just Supabase (supabase.com), so it should be pretty straightforward to use it as a vector DB. No need to use the parla API for that: Supabase provides an introspected API out of the box. This here might be a starting point.
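To make the "introspected API" point a bit more concrete, here is a minimal sketch of calling a Postgres function through the REST API that Supabase generates. The path /rest/v1/rpc/<function> and the apikey / Authorization headers are how that API is addressed. Everything else is an assumption for illustration: the function name match_documents, its parameters, the local URL, and the key are placeholders and are not taken from the parla-api repo.

```shell
# Sketch only: function name, parameters, URL and key are hypothetical.

# Build the RPC endpoint URL for a Postgres function exposed by the REST API.
# $1 = Supabase base URL (one trailing slash is stripped), $2 = function name
rpc_url() {
  printf '%s/rest/v1/rpc/%s\n' "${1%/}" "$2"
}

# POST a JSON body to the RPC endpoint and print the JSON response.
# $1 = base URL, $2 = API key, $3 = function name, $4 = JSON body
call_rpc() {
  curl --silent --show-error --fail \
    -X POST "$(rpc_url "$1" "$3")" \
    -H "apikey: $2" \
    -H "Authorization: Bearer $2" \
    -H "Content-Type: application/json" \
    -d "$4"
}
```

Against a local instance started with supabase start, a call might look like call_rpc "http://localhost:54321" "$SUPABASE_ANON_KEY" match_documents '{"query_embedding": [0.1, 0.2], "match_count": 5}', assuming such a function exists in the database and the embedding has the same dimension as the stored vectors.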
Ah wait, now I get it. You want us to expose an endpoint for searching through our embeddings.
This is something we need to discuss internally. It comes with a cost for us, since it might produce a lot of egress.
If you decide it's feasible and in scope, I would like to contribute to it, because imho plugging together multiple vector databases (e.g. Abgeordnetenhaus & Bundestag) would make research on really intelligent political RAG/agents much easier. Also, I can't imagine calling the OpenAI API is cost-efficient.
A cron job in a microservice could regularly update the qdrant Docker image from Postgres or Supabase. Querying qdrant afterwards is super straightforward, with multiple similarity/recommendation algorithms at hand.
Can you point me to the right documentation for how I can spin this up or access the service? Is there a docker image I can pull?