typesense / firestore-typesense-search

Firebase Extension to automatically push Firestore documents to Typesense for full-text search with typo tolerance, faceting, and more
https://extensions.dev/extensions/typesense/firestore-typesense-search
Apache License 2.0
150 stars, 27 forks

How to prevent Typesense from updating its documents when a specific field in a Firestore document updates #79

Closed henryteng07 closed 1 month ago

henryteng07 commented 1 month ago

Description

I want to prevent the Typesense extension from running indexOnWrite when a specific field inside the doc is updated. For example, I have a likeCount field in my post document which gets updated whenever a user likes the post. Right now the doc in Typesense is updated on each user like, which is not efficient. Is there a way to tell Typesense not to update its doc for changes to certain fields?

jasonbosco commented 1 month ago

When installing the extension, you'll find the option to specify which fields are synced from Firestore to Typesense in the extension configuration.

See the 2nd parameter here: https://github.com/typesense/firestore-typesense-search/blob/master/assets/extension_configuration_example.png

henryteng07 commented 1 month ago

I tried that first but indexOnWrite still seems to be running when voteCount is incremented. Do I create a new extension for it to work?

[Image: Screenshot 2024-05-22 at 1 01 24 am]

henryteng07 commented 1 month ago

I also set index to false in my Typesense schema

[Image: Screenshot 2024-05-22 at 1 04 55 am]

jasonbosco commented 1 month ago

Oh I see what you mean now. indexOnWrite will always get triggered by Firebase, since it's a document change listener trigger. But within the trigger function, ~we only make an API call to Typesense to sync the data, if the fields you've specified in the extension configuration have changed.~ See below.

I don't think there's a way to set up field-level change triggers in Firebase.

henryteng07 commented 1 month ago

Thanks for clarifying! My goal is to only sync Typesense data when certain fields have changed. That, as you mentioned, can be done in the extension config.

P.S. Do I still leave voteCount in the database schema if it's not in the extension config fields?

jasonbosco commented 1 month ago

No harm in leaving voteCount in the schema since it's an optional field, but you might as well remove it if you're not sending that data over to Typesense.

henryteng07 commented 1 month ago

[Image: Screenshot 2024-05-22 at 3 23 54 pm]

Sorry to bother you again but why is the doc still upserted to Typesense even though the updated field "voteCount" isn't in the extension config?

jasonbosco commented 1 month ago

My bad, I misspoke earlier. I just refreshed my memory looking at the code and confirmed this: the upsert will always happen in the extension, but the actual logic of whether a field should be re-indexed or not lives inside the Typesense server, not in the extension. That means if Typesense receives a document to update with the same values that it already has, it will just ignore that update.

But the extension itself will always make the API call. What the fields configuration in the extension does is let you pick whether to send the full document in the update API call or just a subset of fields from your document, regardless of whether they've changed or not.
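To make the role of the fields setting concrete, here is a minimal sketch of that selection step. `pickFields` is a hypothetical helper and not the extension's actual code, and treating an empty field list as "send everything" is an assumption made for the sketch. It only illustrates that the payload is built from the configured field list whether or not those fields changed.

```javascript
// Hypothetical helper (not the extension's code): builds the payload sent to
// Typesense from a Firestore document, keeping only the configured fields.
// Assumption for this sketch: an empty list means "send the whole document".
function pickFields(id, data, fieldsToSync) {
  const keys = fieldsToSync.length > 0 ? fieldsToSync : Object.keys(data);
  const payload = { id };
  for (const key of keys) {
    if (key in data) payload[key] = data[key];
  }
  return payload;
}

const post = { title: "Hello", body: "First post", voteCount: 41 };
const configured = ["title", "body"];

// A like only changes voteCount, yet an identical title/body payload is
// built (and would be sent) again.
const beforeLike = pickFields("post1", post, configured);
const afterLike = pickFields("post1", { ...post, voteCount: 42 }, configured);
console.log(beforeLike, afterLike);
```

Both payloads come out identical, which matches the description above: the write still triggers an API call, and it is up to the Typesense server to notice that nothing changed.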

henryteng07 commented 1 month ago

Thanks again! So I'll be charged a Firebase function invocation regardless but the API call will only update the Typesense document if the fields in the config are updated.

I'm still unsure whether to include voteCount in Typesense. I want to sort search results by "most popular", but 1000 votes on a post would equal 1000 Typesense doc updates on top of 1000 Firestore update requests. Are there any solutions you'd recommend? Perhaps a way to batch updates?

jasonbosco commented 1 month ago

You could exclude the voteCount from the official extension, but write a separate scheduled cloud function yourself that periodically looks at all records updated since the last time the scheduled function ran, and then bulk updates all the changed records where voteCount changed in a single API call.

henryteng07 commented 1 month ago

I managed to get the cloud function up and running. But when I check the doc inside Typesense, it doesn't have the voteCount field nor the rest of my postData. I've tried upsert, update, and even emplace. The doc only has fields I've mentioned inside the extension.

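// Setup the original snippet left out. Assumes the firebase-functions v1 API
// and a Typesense client (`typesenseClient`) configured elsewhere in this file.
const functions = require("firebase-functions");
const admin = require("firebase-admin");

admin.initializeApp();

exports.scheduledTypesenseUpdate = functions.pubsub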
  .schedule("every 5 minutes")
  .onRun(async (context) => {
    // get last run timestamp
    const lastRunDoc = await admin
      .firestore()
      .collection("metadata")
      .doc("lastRun")
      .get();
    const lastRun = lastRunDoc.exists
      ? lastRunDoc.data().timestamp
      : admin.firestore.Timestamp.now();

    // get posts updated since last run
    const postsSnapshot = await admin
      .firestore()
      .collection("posts")
      .where("updatedAt", ">", lastRun)
      .get();

    // for each post, check if voteCount has changed
    // if it has, add it to the list of posts to update in Typesense
    const postsToUpdate = [];

    postsSnapshot.forEach((doc) => {
      const postData = doc.data();
      if (postData.voteCount !== postData.voteCountInTypesense) {
        postsToUpdate.push({
          id: doc.id,
          ...postData,
          voteCountInTypesense: postData.voteCount,
          createdAt: postData.createdAt.toMillis(), // convert Timestamp to int64
          updatedAt: postData.updatedAt.toMillis(), // convert Timestamp to int64
        });
      }
    });

    console.log("Posts to update:", postsToUpdate.length);

    // update posts in Typesense
    if (postsToUpdate.length > 0) {
      try {
        await typesenseClient
          .collections("posts")
          .documents()
          .import(postsToUpdate, { action: "update" });

        // update voteCountInTypesense in Firestore for each updated post,
        // but only after the Typesense import has succeeded
        const batch = admin.firestore().batch();
        postsToUpdate.forEach((post) => {
          const postRef = admin.firestore().collection("posts").doc(post.id);
          batch.update(postRef, { voteCountInTypesense: post.voteCount });
        });
        await batch.commit();
      } catch (error) {
        console.error("Error syncing posts to Typesense:", error);
      }
    }

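    // update last run timestamp; use the scheduled event time rather than "now"
    // so posts updated while this function was running are picked up next time
    await admin
      .firestore()
      .collection("metadata")
      .doc("lastRun")
      .set({
        timestamp: admin.firestore.Timestamp.fromDate(new Date(context.timestamp)),
      });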
  });

jasonbosco commented 1 month ago

Did you add the voteCount field back to the Typesense Collection schema after removing it here: https://github.com/typesense/firestore-typesense-search/issues/79#issuecomment-2123393924

henryteng07 commented 1 month ago

I found out what's causing the issue!

I'm updating my Firestore doc in my cloud function (the voteCountInTypesense field). That triggers the plugin, which updates only the mentioned fields and removes the rest, undoing the work of the cloud function.

But that doesn't make sense since you mentioned the API will only update if changes are made to the fields mentioned in the plugin config.

henryteng07 commented 1 month ago

After some testing, I'm pretty sure the plugin is updating the Typesense doc even if the field that's updated in Firestore was not mentioned in the plugin.

  1. The Typesense doc contains only the fields mentioned in the plugin
  2. When a user updates voteCount (not included in the plugin), Typesense shows only the fields mentioned in the plugin
  3. Every 5 minutes, when the cloud function runs, it updates the doc with all fields, including voteCount
  4. When another user updates voteCount (not included in the plugin), Typesense resets and shows only the fields mentioned in the plugin

This means every time a user updates voteCount, Typesense resets to only the plugin fields.

jasonbosco commented 1 month ago

You're right - the plugin does an upsert (not an update) which requires the full document to be sent and any fields not specified in the upsert will be removed from the document.
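The difference between the two write modes can be illustrated with a toy in-memory model. This is only a sketch of the semantics described in this thread, not the Typesense client API; `applyWrite` and the `Map`-based store are made up for the illustration.

```javascript
// Toy in-memory model of the two write modes (not the Typesense client API).
// upsert: the stored document is replaced by exactly what was sent.
// update: the sent fields are merged into the stored document.
function applyWrite(store, doc, action) {
  const existing = store.get(doc.id) || {};
  store.set(doc.id, action === "update" ? { ...existing, ...doc } : { ...doc });
  return store.get(doc.id);
}

const store = new Map();

// 1. The extension upserts only the configured fields.
applyWrite(store, { id: "post1", title: "Hello" }, "upsert");
// 2. The scheduled function adds voteCount with a partial update.
const afterScheduled = applyWrite(store, { id: "post1", voteCount: 42 }, "update");
// 3. Another like fires the extension, which upserts the configured fields again.
const afterNextLike = applyWrite(store, { id: "post1", title: "Hello" }, "upsert");

console.log(afterScheduled); // { id: 'post1', title: 'Hello', voteCount: 42 }
console.log(afterNextLike);  // { id: 'post1', title: 'Hello' }
```

Step 3 reproduces the reset reported above: the upsert carries no voteCount, so the field written by the scheduled function disappears.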

At this point, since you already have a custom function going, I think it might be best to write your own function handler that sends updates (rather than upserts) and drop this extension.
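A minimal sketch of what such a handler's core logic could look like follows. The names `WATCHED_FIELDS`, `buildPartialUpdate`, `handleUpdate` and `sendUpdate` are hypothetical, and the watched field list is an illustrative choice. The Firestore trigger and the Typesense call are injected so the logic can run and be tested on its own.

```javascript
// Fields whose changes should reach Typesense (illustrative choice).
const WATCHED_FIELDS = ["title", "body"];

// Returns a partial document holding only the watched fields that differ
// between the before/after snapshots, or null when none of them changed.
function buildPartialUpdate(id, before, after, watched = WATCHED_FIELDS) {
  const partial = { id };
  let changed = false;
  for (const field of watched) {
    // JSON comparison is enough for plain JSON-like values in this sketch.
    if (JSON.stringify(before[field]) !== JSON.stringify(after[field])) {
      partial[field] = after[field];
      changed = true;
    }
  }
  return changed ? partial : null;
}

// Core of a document-update handler. `sendUpdate` stands in for whatever
// performs the partial update against Typesense.
async function handleUpdate(id, before, after, sendUpdate) {
  const partial = buildPartialUpdate(id, before, after);
  if (partial === null) return false; // e.g. only voteCount changed: skip the call
  await sendUpdate(partial);
  return true;
}

// Only voteCount changes here, so no API call is made.
handleUpdate(
  "post1",
  { title: "Hello", voteCount: 1 },
  { title: "Hello", voteCount: 2 },
  async (doc) => console.log("sending", doc)
).then((called) => console.log("API call made:", called)); // API call made: false
```

With the firebase-functions v1 API, this could be wired up from a `functions.firestore.document("posts/{postId}").onUpdate((change, context) => ...)` trigger, passing `change.before.data()` and `change.after.data()`, with `sendUpdate` calling something like `typesenseClient.collections("posts").documents(doc.id).update(doc)`. The function is still invoked on every write, but the Typesense call is skipped for like-only changes, and partial updates leave fields written by the scheduled function intact.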