googleapis / nodejs-firestore

Node.js client for Google Cloud Firestore: a NoSQL document database built for automatic scaling, high performance, and ease of application development.
https://cloud.google.com/firestore/
Apache License 2.0

FieldValue.vector works well on top, but does not work in nested fields #2081

Open mrkaraaslan opened 3 weeks ago

mrkaraaslan commented 3 weeks ago

Environment

Problem

FieldValue.vector works for top-level fields, but fails when the vector is placed in a nested field.

Steps to reproduce:

This code works:

const admin = require("firebase-admin");
const functions = require("firebase-functions");
const embedder = require("./create_embedding");

exports.autoCreateEmbedding = functions.runWith({memory: "1GB"})
    .firestore.document("z_embedding/{docId}").onCreate(async (snap) => {
      const fv = admin.firestore.FieldValue;
      const data = snap.data();
      const n = data.name;
      const n2 = data.name2;

      const tensorEmbedding1 = await embedder.createEmbeddingTensor(n);
      const tensorEmbedding2 = await embedder.createEmbeddingTensor(n2);
      const googleEmbedding1 = await embedder.createEmbeddingGoogle(n);
      const googleEmbedding2 = await embedder.createEmbeddingGoogle(n2);

      console.log(`created embeddings`);
      const te = {t1: fv.vector(tensorEmbedding1), t2: fv.vector(tensorEmbedding2)};
      const ge = {g1: fv.vector(googleEmbedding1), g2: fv.vector(googleEmbedding2)};
      // concat embeddings into one object
      const embeddings = {...te, ...ge}; // PROBLEM WILL BE HERE

      console.log(`updating embeddings`);
      try {
        await snap.ref.update(embeddings);
      } catch (error) {
        console.log("Error updating document: ", error);
      }
    });

When I change the marked line from

const embeddings = {...te, ...ge}; // PROBLEM WILL BE HERE

to

const embeddings = {te, ge};
await snap.ref.update(embeddings);

the operation fails with

Error updating document:  Error: 13 INTERNAL: An internal error occurred.
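
For reference, the two assignments produce differently shaped update payloads. The following is a minimal sketch that uses plain arrays as stand-ins for the FieldValue.vector(...) values, so it runs without Firestore. The helper flattenOneLevel and its underscore key naming are illustrative assumptions, not part of the SDK; it only shows one way to keep every vector at the top level, which is the case reported as working here.

```javascript
// Plain arrays stand in for the FieldValue.vector(...) values from the report.
const te = {t1: [0.1, 0.2], t2: [0.3, 0.4]};
const ge = {g1: [0.5, 0.6], g2: [0.7, 0.8]};

// Spread syntax: all four vectors become top-level fields (the working case).
const flat = {...te, ...ge};

// Shorthand properties: the vectors sit one level down, under "te" and "ge"
// (the case that fails with "13 INTERNAL" in this report).
const nested = {te, ge};

// Hypothetical helper: lift fields nested one level deep to the top level,
// prefixing each key with its parent key so names stay unique.
function flattenOneLevel(obj, sep = "_") {
  const out = {};
  for (const [outer, inner] of Object.entries(obj)) {
    for (const [key, value] of Object.entries(inner)) {
      out[`${outer}${sep}${key}`] = value;
    }
  }
  return out;
}

console.log(Object.keys(flat)); // [ 't1', 't2', 'g1', 'g2' ]
console.log(Object.keys(nested)); // [ 'te', 'ge' ]
console.log(Object.keys(flattenOneLevel(nested))); // [ 'te_t1', 'te_t2', 'ge_g1', 'ge_g2' ]
```

Whether flattening is acceptable depends on how the document is queried; it changes the field names, so any vector index would need to target the flattened names.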

Relevant Code:

  createEmbeddingGoogle: async function(text) {
    const projectId = JSON.parse(process.env.FIREBASE_CONFIG).projectId;

    // SETUP MODEL
    const model = "text-multilingual-embedding-002";
    const clientOptions = {apiEndpoint: "us-central1-aiplatform.googleapis.com"};
    const endpoint = `projects/${projectId}/locations/us-central1/publishers/google/models/${model}`;
    const instances = [helpers.toValue({content: toSuperLower(text), taskType: "QUESTION_ANSWERING"})];
    const request = {endpoint, instances};
    const client = new PredictionServiceClient(clientOptions);

    // GET EMBEDDING
    const [response] = await client.predict(request);
    const prediction = response.predictions[0];
    const embeddings = prediction.structValue.fields.embeddings;
    const values = embeddings.structValue.fields.values.listValue.values;
    const embeddingsArray = values.map((value) => value.numberValue);
    return embeddingsArray;
  },

  createEmbeddingTensor: async function(text) {
    tf.setBackend("tensorflow");
    const model = await use.load();
    const embeddings = await model.embed(text);
    const embeddingsArray = embeddings.arraySync().flat();
    return embeddingsArray;
  },

Additional

To verify that the nested object shape itself is valid, I tested the same structure with plain (non-vector) values, and it works:

exports.autoCreateEmbedding = functions.runWith({memory: "1GB"})
    .firestore.document("z_embedding/{docId}").onCreate(async (snap) => {
      // const fv = admin.firestore.FieldValue;
      const data = snap.data();
      const n = data.name;
      const n2 = data.name2;

      // const tensorEmbedding1 = await embedder.createEmbeddingTensor(n);
      // const tensorEmbedding2 = await embedder.createEmbeddingTensor(n2);
      // const googleEmbedding1 = await embedder.createEmbeddingGoogle(n);
      // const googleEmbedding2 = await embedder.createEmbeddingGoogle(n2);

      console.log(`created embeddings`);
      const te = {t1: n, t2: n2};
      const ge = {g1: n, g2: n2};
      // concat embeddings into one object
      const embeddings = {te, ge};

      console.log(`updating embeddings`);
      try {
        await snap.ref.update(embeddings);
      } catch (error) {
        console.log("Error updating document: ", error);
      }
    });

MarkDuckworth commented 3 weeks ago

Error 13 INTERNAL is definitely unexpected. However, I wasn't able to reproduce the issue by nesting vectors in another object. Are you able to provide us with logs from the SDK? Add this line to your repro to configure SDK logging:

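// setLogFunction is exported by the @google-cloud/firestore package.
const {setLogFunction} = require("@google-cloud/firestore");
setLogFunction(console.log);

mrkaraaslan commented 3 weeks ago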

I added the line and got the following results.

Firestore (7.9.0) 2024-07-11T07:20:23.748Z YLwHZ [WriteBatch.commit]: Sending 1 writes
Firestore (7.9.0) 2024-07-11T07:20:23.749Z YLwHZ [ClientPool.acquire]: Creating a new client (requiresGrpc: false)
Firestore (7.9.0) 2024-07-11T07:20:23.829Z ##### [clientFactory]: Initialized Firestore GAPIC Client (useFallback: false)
Firestore (7.9.0) 2024-07-11T07:20:23.831Z YLwHZ [Firestore.request]: Sending request:
Firestore (7.9.0) 2024-07-11T07:20:24.051Z YLwHZ [Firestore.request]: Received error: Error: 13 INTERNAL: An internal error occurred.
    at callErrorFromStatus (/workspace/node_modules/@grpc/grpc-js/build/src/call.js:31:19)
    at Object.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/client.js:193:76)
    at Object.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:360:141)
    at Object.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:323:181)
    at /workspace/node_modules/@grpc/grpc-js/build/src/resolving-call.js:129:78
    at process.processTicksAndRejections (node:internal/process/task_queues:77:11)
for call at
    at ServiceClientImpl.makeUnaryRequest (/workspace/node_modules/@grpc/grpc-js/build/src/client.js:161:32)
    at ServiceClientImpl. (/workspace/node_modules/@grpc/grpc-js/build/src/make-client.js:105:19)
    at /workspace/node_modules/@google-cloud/firestore/build/src/v1/firestore_client.js:237:29
    at /workspace/node_modules/google-gax/build/src/normalCalls/timeout.js:44:16
    at repeat (/workspace/node_modules/google-gax/build/src/normalCalls/retries.js:80:25)
    at /workspace/node_modules/google-gax/build/src/normalCalls/retries.js:119:13
    at OngoingCallPromise.call (/workspace/node_modules/google-gax/build/src/call.js:67:27)
    at NormalApiCaller.call (/workspace/node_modules/google-gax/build/src/normalCalls/normalApiCaller.js:34:19)
    at /workspace/node_modules/google-gax/build/src/createApiCall.js:112:30
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
  code: 13,
  details: 'An internal error occurred.',
  metadata: Metadata { internalRepr: Map(1) { 'x-debug-tracking-id' => [Array] }, options: {} },
  note: 'Exception occurred in retry method that was not classified as transient'
}
Error updating document: Error: 13 INTERNAL: An internal error occurred.
    at callErrorFromStatus (/workspace/node_modules/@grpc/grpc-js/build/src/call.js:31:19)
    at Object.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/client.js:193:76)
    at Object.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:360:141)
    at Object.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:323:181)
    at /workspace/node_modules/@grpc/grpc-js/build/src/resolving-call.js:129:78
    at process.processTicksAndRejections (node:internal/process/task_queues:77:11)
for call at
    at ServiceClientImpl.makeUnaryRequest (/workspace/node_modules/@grpc/grpc-js/build/src/client.js:161:32)
    at ServiceClientImpl. (/workspace/node_modules/@grpc/grpc-js/build/src/make-client.js:105:19)
    at /workspace/node_modules/@google-cloud/firestore/build/src/v1/firestore_client.js:237:29
    at /workspace/node_modules/google-gax/build/src/normalCalls/timeout.js:44:16
    at repeat (/workspace/node_modules/google-gax/build/src/normalCalls/retries.js:80:25)
    at /workspace/node_modules/google-gax/build/src/normalCalls/retries.js:119:13
    at OngoingCallPromise.call (/workspace/node_modules/google-gax/build/src/call.js:67:27)
DEFAULT 2024-07-11T07:20:24.051863Z
    at NormalApiCaller.call (/workspace/node_modules/google-gax/build/src/normalCalls/normalApiCaller.js:34:19)
    at /workspace/node_modules/google-gax/build/src/createApiCall.js:112:30
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
Caused by: Error
    at WriteBatch.commit (/workspace/node_modules/@google-cloud/firestore/build/src/write-batch.js:436:23)
    at DocumentReference.update (/workspace/node_modules/@google-cloud/firestore/build/src/reference/document-reference.js:384:14)
    at /workspace/vector_search/vector_search.js:93:24 {
  code: 13,
  details: 'An internal error occurred.',
  metadata: Metadata { internalRepr: Map(1) { 'x-debug-tracking-id' => [Array] }, options: {} },
  note: 'Exception occurred in retry method that was not classified as transient'
}
Function execution took 30622 ms, finished with status: 'ok'

MarkDuckworth commented 3 weeks ago

Thanks for the log snippet. Unfortunately, it doesn't contain any info that will help me diagnose further. Can you perhaps share your project ID?

mrkaraaslan commented 3 weeks ago

Should I share it publicly? I think I should not.

MarkDuckworth commented 3 weeks ago

You can share it privately by creating a support case at https://firebase.google.com/support/troubleshooter/contact. Reference this GitHub issue in the support case and ask them to forward the issue to me for faster resolution.

NickChittle commented 3 weeks ago

We were able to reproduce the issue and are working on a fix. However, it will probably be a little while before it's rolled out.

Thanks for reporting this!