typesense / firestore-typesense-search

Firebase Extension to automatically push Firestore documents to Typesense for full-text search with typo tolerance, faceting, and more
https://extensions.dev/extensions/typesense/firestore-typesense-search
Apache License 2.0

TypeError: value.toDate is not a function while streaming firestore (and backfilling) #77

Closed leah-potato closed 6 months ago

leah-potato commented 7 months ago

Description

Getting an error when using the extension with date fields.

Steps to reproduce

1. In Typesense Cloud, create a new collection with auto schema:

{
  "name": "users",  
  "fields": [
    {"name": ".*", "type": "auto" }
  ]
}

2. Install firebase extension (typesense/firestore-typesense-search@1.4.0)

This creates a v1 google function (should it be v2?) and the auto generated schema: users.json

Expected Behavior

I install the extension and it streams or backfills without errors.

Actual Behavior

Logs for (resource.type="cloud_function" resource.labels.function_name=("ext-firestore-typesense-search-indexOnWrite") resource.labels.region="us-central1") show:

TypeError: value.toDate is not a function\n at mapValue (/workspace/src/utils.js:7:29): log_error.json


So that's the streaming, but for backfill I kept on getting the same error which caused the function to stop. I had to change the top part of utils.js to get it to run:

const config = require("./config.js");
const admin = require("firebase-admin");

const mapValue = (value) => {
  if (typeof value === "object" && value !== null && value._seconds != null && value._nanoseconds != null) {
    // convert Firestore Timestamp to Unix timestamp
    // https://typesense.org/docs/0.22.2/api/collections.html#indexing-dates
    const timestamp = new admin.firestore.Timestamp(value._seconds, value._nanoseconds);
    return Math.floor(timestamp.toDate().getTime() / 1000);
  } else if (typeof value === "object" && value !== null && value.latitude != null && value.longitude != null) {
    return [value.latitude, value.longitude];
  } else if (typeof value === "object" && value !== null && value.firestore != null && value.path != null) {
    return {"path": value.path};
  } else if (Array.isArray(value)) {
    return value.map(mapValue);
  } else if (typeof value === "object" && value !== null) {
    return Object.fromEntries(Object.entries(value).map(([key, value]) => [key, mapValue(value)]));
  } else {
    return value;
  }
};

... I think I also had to change to node 14. I can't really remember, it was late at night and if I'm completely honest I'm more of a designer than an engineer 😩

It kept getting stuck on the lastUpdated (int64) field, so I had to reinstall the extension and specifically remove that field. I then successfully backfilled 155k documents. Yeah!

I'm sure this is all really basic stuff - apologies. Appreciate any thoughts you have!

Metadata

Typesense Version: Cloud

jasonbosco commented 7 months ago

Could you export one sample document from your Firestore collection and share it here?

I wonder if the datatype for that field is somehow not a timestamp...
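One way to check this without screenshots is to walk a document's data and list the fields that look like timestamps but have no toDate() method. The following is only a rough sketch: findPlainTimestamps and looksLikeTimestamp are hypothetical helpers (not part of the extension), and the sample object is stand-in data rather than a real Firestore read.

```javascript
// Hypothetical check: does the value carry numeric seconds like a Firestore Timestamp?
const looksLikeTimestamp = (value) =>
  typeof value === "object" && value !== null &&
  (typeof value.seconds === "number" || typeof value._seconds === "number");

// Hypothetical helper: return the field paths of timestamp-like values
// that are plain objects, i.e. that have no toDate() method.
const findPlainTimestamps = (data, prefix = "") => {
  const hits = [];
  for (const [key, value] of Object.entries(data)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (looksLikeTimestamp(value)) {
      if (typeof value.toDate !== "function") hits.push(path);
    } else if (typeof value === "object" && value !== null) {
      // Recurse into nested maps and arrays
      hits.push(...findPlainTimestamps(value, path));
    }
  }
  return hits;
};

// Stand-in data, assumed for illustration only
const sample = {
  createdAt: { seconds: 1693749600, nanoseconds: 0 },
  lastLogin: {
    seconds: 1714633329,
    nanoseconds: 468000000,
    toDate() { return new Date(1714633329468); },
  },
  profile: { updatedAt: { _seconds: 1714475704, _nanoseconds: 262000000 } },
};

console.log(findPlainTimestamps(sample));
// [ 'createdAt', 'profile.updatedAt' ]
```

Any path it reports is a field that would throw on value.toDate() if the calling code assumes a real Timestamp instance.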

leah-potato commented 7 months ago

They are timestamp fields, ie:

createdAt: Timestamp { _seconds: 1713945148, _nanoseconds: 845000000 },
updatedAt: Timestamp { _seconds: 1714475704, _nanoseconds: 262000000 },
lastLogin: Timestamp { _seconds: 1714633329, _nanoseconds: 468000000 }

Shouldn't the auto schema account for this?


I'm also having another issue with id fields. Since they aren't indexable, how do I deal with those? Do I need to copy the id field to a new one, like userId? Renaming my id field is not an option.

jasonbosco commented 7 months ago

I just tested this in a test Firebase project and I can't seem to replicate the issue.

This is how my Firestore document looks:

Screenshot 2024-05-02 at 11 22 17 AM

When I click on Edit on the timestamp I see this:

Screenshot 2024-05-02 at 11 28 27 AM

Then when I create / update that document, I see this in Typesense:

Screenshot 2024-05-02 at 11 22 47 AM

This is the schema after the document is inserted:

Screenshot 2024-05-02 at 11 23 05 AM

Could you share a similar set of screenshots with me, from your Firebase project?

leah-potato commented 7 months ago

I don't think screenshots are going to help. Let's dig into the details!

I've narrowed it down to some timestamps not having a nanoseconds value. We've migrated data from another system and it looks like new accounts have it, and old ones don't.

To account for this, I've modified mapValue with:

if (typeof value === "object" && value !== null && value.seconds != null && value.nanoseconds != null) {
  if (typeof value.toDate === "function") {
    // Real Firestore Timestamp instance: use the original conversion
    console.log("Using original timestamp calc");
    return Math.floor(value.toDate().getTime() / 1000);
  } else {
    // Plain object without toDate(): compute the Unix timestamp by hand
    console.log("Issue with timestamp calc. Date Object:", value);
    if (value.nanoseconds && value.nanoseconds !== 0) {
      console.log("Using seconds + nanoseconds as timestamp");
      return Math.floor(value.seconds + value.nanoseconds / 1e9);
    } else {
      console.log("Using seconds as timestamp");
      return value.seconds;
    }
  }
}

Then I start to see logs with:

{
  "textPayload": "Issue with timestamp calc. Date Object: { seconds: 1693749600, nanoseconds: 0 }",
  "insertId": "6634494f000b6e41014eeba6",
  "resource": {
    "type": "cloud_function",
    "labels": {
      "project_id": "some-project",
      "region": "us-central1",
      "function_name": "ext-users-indexOnWrite"
    }
... etc...

The backfill across 155k records now works without errors. Nice!
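For anyone landing here later, the two workarounds in this thread (one keyed on _seconds/_nanoseconds, one on seconds/nanoseconds) could be folded into a single helper. This is a minimal sketch, not the extension's code: toUnixSeconds is a hypothetical name, and treating a missing nanoseconds value as 0 is an assumption. The `??` operator needs Node 14 or later.

```javascript
// Hypothetical helper: convert a timestamp-like value to whole Unix seconds.
// Returns undefined when the value does not look like a timestamp.
const toUnixSeconds = (value) => {
  if (typeof value !== "object" || value === null) return undefined;

  // Real Timestamp instances (or anything exposing toDate()) use the original path
  if (typeof value.toDate === "function") {
    return Math.floor(value.toDate().getTime() / 1000);
  }

  // Plain objects: accept both public and underscore-prefixed field names
  const seconds = value.seconds ?? value._seconds;
  if (typeof seconds !== "number") return undefined;

  // Assumption: a missing nanoseconds value is treated as 0
  const nanoseconds = value.nanoseconds ?? value._nanoseconds ?? 0;
  return Math.floor(seconds + nanoseconds / 1e9);
};

console.log(toUnixSeconds({ seconds: 1693749600, nanoseconds: 0 }));
// 1693749600
console.log(toUnixSeconds({ _seconds: 1713945148, _nanoseconds: 845000000 }));
// 1713945148
```

A mapValue-style function could call this first and fall through to its other branches when it returns undefined.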

leah-potato commented 7 months ago

Now I just need to figure out how to stop the extension from reverting to its default code. I have just been editing directly in the gcloud functions UI. Any tips on that?

Oh and how to copy the {uid}.id to a new field within typesense {userId} so I can use that field to search.

But progress.

jasonbosco commented 7 months ago

I've narrowed it down to some timestamps not having a nanoseconds value. We've migrated data from another system and it looks like new accounts have it, and old ones don't.

That's interesting to hear.

Now I just need to figure out how to stop the extension from reverting to its default code. I have just been editing directly in the gcloud functions UI. Any tips on that?

I'd recommend forking the code in this repo and running it in your own Google Cloud function instead of using the extension, since you need to modify it. I don't think there's a way to keep the extension and use a modified version of just one portion of the code.

Oh and how to copy the {uid}.id to a new field within typesense {userId} so I can use that field to search.

You'd want to do this as part of your forked version of the code as well.
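In a fork, that copy step might look something like the sketch below. It assumes the goal is to duplicate the document's own id field into a searchable userId field before the document is sent to Typesense; copyIdField and its target parameter are hypothetical and not part of the extension.

```javascript
// Hypothetical helper: copy the document's own `id` value into another field
// (default "userId") so it can be searched. Returns a new object and never
// overwrites a target field that already exists.
const copyIdField = (data, target = "userId") => {
  if (data.id === undefined || data[target] !== undefined) {
    return { ...data };
  }
  return { ...data, [target]: data.id };
};

const doc = copyIdField({ id: "abc123", name: "Leah" });
console.log(doc.userId);
// abc123
```

Returning a copy rather than mutating the input keeps the helper safe to call on data that other parts of the function still read.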

leah-potato commented 7 months ago

Makes sense. Thanks for your help and hopefully this thread helps someone someday.