nuxt-hub / core

Build full-stack applications with Nuxt on Cloudflare, with zero configuration.
https://hub.nuxt.com
Apache License 2.0

Cloudflare Vectorize #174

Status: open (opened by RihanArfan 3 months ago)

RihanArfan commented 3 months ago

Is your feature request related to a problem? Please describe.
I'd like to use Cloudflare Vectorize (a database for storing vectors) alongside Workers AI (#173).

Describe the solution you'd like
Vectorize is a database, similar to D1, so its implementation in NuxtHub would look similar to D1's.

Describe alternatives you've considered
Manually adding a binding and using the API directly (#113).

Additional context

I'm happy to contribute a simple PR to @nuxt-hub/core that adds hubVectorize() now, though it wouldn't include remote proxying or a devtools viewer.

Also, should it be called hubVectorize() or should it have a different name like hubVectorDatabase()?

atinux commented 3 months ago

Thank you for creating the issue @RihanArfan

I think hubVectorize() makes the most sense, feel free to open a PR to start the work 😊

RihanArfan commented 4 weeks ago

While implementing it, I've run into some issues that require big decisions. I'm also moving this discussion here from the PR description so it's easier to follow.

Multiple bindings:

Remote bindings:

Alternatively, we could just wait for Cloudflare to support local Vectorize bindings in Wrangler, which would save massive complexity and overengineering 😄

Resetting indexes for local development:

Wrangler would probably have a way to do this once local development is supported.

Approaches for managing indexes:

How should Vectorize indexes be managed with NuxtHub across different environments? Unlike other features such as databases, a Vectorize index needs options (dimensions and distance metric) provided at creation time, based on the text embeddings model the user plans to use. Vectorize indexes also require metadata indexes to be specified upfront if you want to use metadata filtering.
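To make these creation-time constraints concrete, here is a toy in-memory model of a vector index. It is not how Vectorize works internally; `ToyVectorIndex`, `IndexConfig` and the `query` signature are made up for illustration, and only the metric names mirror the values Wrangler's `--metric` flag accepts. It shows why the embedding model dictates `dimensions` (vectors of any other length are rejected), why `metric` changes how results are ranked, and why metadata filtering only works on properties declared upfront. The toy throws on an unindexed filter property to make that requirement explicit; the real service may behave differently.

```typescript
// Toy in-memory vector index, for illustration only (NOT Vectorize's implementation).
// Dimensions, metric and metadata indexes are fixed when the index is constructed.

type Metric = "cosine" | "euclidean" | "dot-product";

interface IndexConfig {
  dimensions: number;        // must equal the embedding model's output size, e.g. 768
  metric: Metric;            // how similarity between vectors is scored
  metadataIndexes: string[]; // metadata properties that may be used in filters
}

interface StoredVector {
  id: string;
  values: number[];
  metadata: Record<string, string>;
}

// Higher score = more similar, for every metric.
function score(metric: Metric, a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  let squaredDistance = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
    squaredDistance += (a[i] - b[i]) ** 2;
  }
  // "|| 1" avoids dividing by zero for an all-zero vector (its score is then 0)
  if (metric === "cosine") return dot / (Math.sqrt(normA) * Math.sqrt(normB) || 1);
  if (metric === "dot-product") return dot;
  return -Math.sqrt(squaredDistance); // euclidean: negate so closer vectors rank higher
}

class ToyVectorIndex {
  private readonly config: IndexConfig;
  private readonly vectors: StoredVector[] = [];

  constructor(config: IndexConfig) {
    this.config = config;
  }

  private checkDimensions(values: number[]): void {
    if (values.length !== this.config.dimensions) {
      throw new Error(`expected ${this.config.dimensions} dimensions, got ${values.length}`);
    }
  }

  insert(vector: StoredVector): void {
    this.checkDimensions(vector.values);
    this.vectors.push(vector);
  }

  // Returns the ids of the topK most similar vectors that match every filter entry.
  query(values: number[], topK: number, filter: Record<string, string> = {}): string[] {
    this.checkDimensions(values);
    for (const key of Object.keys(filter)) {
      if (!this.config.metadataIndexes.includes(key)) {
        throw new Error(`no metadata index for "${key}"; it must be declared upfront`);
      }
    }
    return this.vectors
      .filter((v) => Object.entries(filter).every(([key, value]) => v.metadata[key] === value))
      .map((v) => ({ id: v.id, score: score(this.config.metric, values, v.values) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, topK)
      .map((match) => match.id);
  }
}
```

For example, with a 3-dimensional cosine index that has a metadata index on `streaming_platform`, `query([1, 0, 0], 2, { streaming_platform: "netflix" })` only ranks vectors whose metadata matches, while filtering on an undeclared property such as `genre` throws.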

Whichever option is chosen, an index would be accessed with hubVectorize(<binding>):

const vectorize = hubVectorize("products")
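A minimal sketch of how `hubVectorize(<binding>)` could resolve the underlying Cloudflare binding, assuming the binding name is the uppercased index name (as in the proposed `Binding: PRODUCTS` CLI output further down). `createHubVectorize`, `Env` and `VectorizeLike` are hypothetical; `VectorizeLike` is a stand-in for the real Workers binding type, trimmed to a single `query` method.

```typescript
// Hypothetical sketch: resolving hubVectorize("<name>") to a Cloudflare binding.
// Assumes bindings are named after the uppercased index name ("products" -> "PRODUCTS").

// Stand-in for the Workers Vectorize binding type, trimmed to one method (assumption).
interface VectorizeLike {
  query(vector: number[], options?: { topK?: number }): Promise<unknown>;
}

// Bindings as exposed to the server at runtime (hypothetical shape).
type Env = Record<string, VectorizeLike | undefined>;

function createHubVectorize(env: Env): (name: string) => VectorizeLike {
  return function hubVectorize(name: string): VectorizeLike {
    const bindingName = name.toUpperCase();
    const binding = env[bindingName];
    if (!binding) {
      // Fail loudly so a typo in the binding name surfaces immediately
      throw new Error(
        `Vectorize binding "${bindingName}" not found; is an index configured for "${name}"?`,
      );
    }
    return binding;
  };
}
```

Throwing on a missing binding seems preferable to returning `undefined`, since otherwise a misspelled name would only surface later as a confusing error at query time.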

Here are some different options I've thought of to handle it.

Option A

pnpx wrangler vectorize create foo-ecommerce-products --dimensions=768 --metric=cosine
pnpx wrangler vectorize create foo-ecommerce-products-preview --dimensions=768 --metric=cosine
pnpx wrangler vectorize create foo-ecommerce-products-development --dimensions=768 --metric=cosine

export default defineNuxtConfig({
  hub: {
    // user needs to create metadata indexes via cli
    vectorize: {
      products: {
        production: 'foo-ecommerce-products',
        preview: 'foo-ecommerce-products-preview',
        development: 'foo-ecommerce-products-development'
      },
      reviews: {
        production: 'your-vectorize-id',
        preview: 'your-vectorize-id',
        development: 'your-vectorize-id'
      }
    },
  }
})
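With Option A, NuxtHub would only need to pick the right index name for the current environment and emit a binding for it. A rough sketch, assuming the uppercased binding naming used elsewhere in this issue; `resolveIndexes`, `HubEnvironment` and `VectorizeConfig` are hypothetical names:

```typescript
// Hypothetical sketch: resolving Option A's per-environment index names.
type HubEnvironment = "production" | "preview" | "development";

// Mirrors the shape of `hub.vectorize` in the Option A config.
type VectorizeConfig = Record<string, Partial<Record<HubEnvironment, string>>>;

// Returns binding name -> index name for the given environment.
function resolveIndexes(config: VectorizeConfig, env: HubEnvironment): Record<string, string> {
  const bindings: Record<string, string> = {};
  for (const [name, indexes] of Object.entries(config)) {
    const indexName = indexes[env];
    if (!indexName) {
      throw new Error(`No ${env} index configured for "${name}"`);
    }
    bindings[name.toUpperCase()] = indexName; // "products" -> "PRODUCTS" (assumed naming)
  }
  return bindings;
}
```

For example, resolving the config above for `"preview"` maps `PRODUCTS` to `foo-ecommerce-products-preview`, and a binding without an index for the requested environment fails at build time rather than at runtime.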

Option B

Specify index details (dimensions, metric) in nuxt.config.ts. This approach would need extending to support metadata indexes, which are required to filter vectors by metadata.

export default defineNuxtConfig({
  hub: {
    vectorize: {
      // NuxtHub handles creating the index in each environment
      products: {
        metric: 'cosine',
        dimensions: 768,
      },
      // use your own existing Vectorize indexes
      reviews: {
        production: 'your-vectorize-id',
        preview: 'your-vectorize-id',
        development: 'your-vectorize-id'
      }
    }
  }
})

The DX might be confusing: to prevent accidental data loss, changing the config probably shouldn't automatically recreate the production index. A potential fix is to keep the old Vectorize index and simply point the binding at a newly created index.
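One way to implement the "keep the old index, repoint the binding" idea is to derive the index name from its spec, so a changed `dimensions` or `metric` naturally produces a new index while the old one is left untouched. A sketch under that assumption; the `<base>-<dimensions>-<metric>` naming scheme, `planIndexChange` and the `Plan` type are all hypothetical:

```typescript
// Hypothetical sketch: Option B config changes never recreate an existing index.
// Assumes index names are derived from the spec: <base>-<dimensions>-<metric>.

interface IndexSpec {
  dimensions: number;
  metric: "cosine" | "euclidean" | "dot-product";
}

interface ExistingIndex extends IndexSpec {
  name: string;
}

type Plan =
  | { action: "create"; name: string }
  | { action: "keep"; name: string }
  | { action: "repoint"; from: string; to: string };

function indexNameFor(base: string, spec: IndexSpec): string {
  return `${base}-${spec.dimensions}-${spec.metric}`;
}

function planIndexChange(base: string, wanted: IndexSpec, current?: ExistingIndex): Plan {
  const target = indexNameFor(base, wanted);
  if (!current) {
    return { action: "create", name: target };
  }
  if (current.dimensions === wanted.dimensions && current.metric === wanted.metric) {
    return { action: "keep", name: current.name };
  }
  // Spec changed: leave the old index (and its data) alone, create the new one
  // and point the binding at it. Deleting the old index stays a manual step.
  return { action: "repoint", from: current.name, to: target };
}
```

Deleting a superseded index would then always be an explicit, manual action rather than a side effect of editing nuxt.config.ts.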

Option C

Create, reset and delete indexes via a NuxtHub CLI and/or dashboard, so everything is handled by the CLI and backend rather than in nuxt.config.ts. When the dev server starts, Nuxt checks which indexes are available. This approach allows manually managing which indexes exist in each environment, including using existing indexes.

export default defineNuxtConfig({
  hub: {
    database: true,
    vectorize: true,
  },
});

$ # vectorize specific - future multi-bindings could have individual things
$ nuxthub vectorize create products --dimensions=768 --metric=cosine
# Done! Binding: PRODUCTS Index: foo-ecommerce-products, foo-ecommerce-products-preview, foo-ecommerce-products-local
# Use via `hubVectorize("products")`

$ nuxthub vectorize list
# Vectorize indexes associated with "foo-ecommerce":
# [PRODUCTS]: # hubVectorize('products')
# foo-ecommerce-products         | dimensions: 768 | metrics: cosine
# foo-ecommerce-products-preview | dimensions: 768 | metrics: cosine
# foo-ecommerce-products-local   | dimensions: 768 | metrics: cosine
#
# [REVIEWS]:
# foo-ecommerce-products-local   | dimensions: 768 | metrics: cosine
#
# [KNOWLEDGEBASE]:
# support-system-articles        | dimensions: 768 | metrics: cosine
#
# ----------
# Create new index:
# nuxthub vectorize create <name> [--dimensions=<int>] [--metric=<string>] [--environments=<string=all>]
#
# Link an existing index:
# nuxthub vectorize link-existing-index support --index=support-system-articles-preview --environments=preview
#
# Create a metadata index for an index:
# nuxthub vectorize create-metadata-index products --environments=all --property-name=streaming_platform --type=string
#
# Recreate local-development index (other environments would need deleting and recreating explicitly)
# nuxthub vectorize reset-dev products
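For Option C, the dev-server startup check could look roughly like this. It assumes the CLI-created naming pattern from the `nuxthub vectorize list` output above (`<project>-<binding>` in production, with `-preview` or `-local` appended otherwise) and groups indexes by binding and environment rather than by name, since linked indexes can be called anything. `expectedIndexName`, `bindingsFor` and `AvailableIndex` are hypothetical, and the list of available indexes would come from the NuxtHub backend.

```typescript
// Hypothetical sketch of Option C's dev-server startup check.
type Environment = "production" | "preview" | "local";

// Naming pattern for indexes created by the proposed CLI (assumption based on
// the example output): production has no suffix, other environments do.
function expectedIndexName(project: string, binding: string, env: Environment): string {
  const base = `${project}-${binding}`;
  return env === "production" ? base : `${base}-${env}`;
}

// One entry per index the NuxtHub backend reports for this project.
interface AvailableIndex {
  binding: string;
  environment: Environment;
  name: string; // may be any name when an existing index was linked
}

interface BindingCheck {
  found: Record<string, string>; // binding -> index name in this environment
  missing: string[];             // bindings that exist elsewhere but not here
}

function bindingsFor(available: AvailableIndex[], env: Environment): BindingCheck {
  const found: Record<string, string> = {};
  for (const index of available) {
    if (index.environment === env) found[index.binding] = index.name;
  }
  const allBindings = Array.from(new Set(available.map((index) => index.binding)));
  const missing = allBindings.filter((binding) => !(binding in found));
  return { found, missing };
}
```

Missing bindings could then be reported at startup with a hint to run `nuxthub vectorize create` or `link-existing-index`, matching the proposed CLI.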