Closed — owocowyy closed this issue 3 years ago
@owocowyy Thank you for reporting this. I made an incorrect assumption that this method automatically paginates and returns documents in small batches from Firestore.
It looks like that might not be the case, so ALL documents are being loaded into memory, exhausting it.
Let me see if there's a batch retrieval mechanism...
@owocowyy Skimming through the docs, I wasn't able to find a quick way to paginate through all the docs in a collection without a sort field.
So for now I've increased the function's memory to 4x what it used to be and published it as a new version: https://github.com/typesense/firestore-typesense-search/releases/tag/v0.2.5
That might resolve the issue, but I don't think it's the optimal way of doing it. I know that Firestore has its own limitations, and I didn't check the docs for a better way, so for now I think we can close this issue or mark it as "needs future improvements".
Yeah, I agree this is not an ideal solution.
I'll close this for now, but I'll keep an eye out for any potential alternate solutions.
The solution to this is to use a Firestore query with cursors, which let you iterate over a collection of any size in batches, effectively restarting from the last point each time. https://firebase.google.com/docs/firestore/query-data/query-cursors
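The cursor approach can be sketched roughly like this. This is a minimal sketch, not the extension's actual code: `forEachBatch`, the batch size, and `indexIntoTypesense` are hypothetical names, and the loop is written against a generic "query" interface so the idea stays clear — fetch one page, remember the last document, start the next page after it.

```javascript
// Sketch: paginate a collection with cursors, one fixed-size page at a time.
async function forEachBatch(baseQuery, processBatch) {
  let lastDoc = null;
  for (;;) {
    const q = lastDoc ? baseQuery.startAfter(lastDoc) : baseQuery;
    const snapshot = await q.get();
    if (snapshot.empty) break;          // no documents left
    await processBatch(snapshot.docs);  // e.g. index this page into Typesense
    lastDoc = snapshot.docs[snapshot.docs.length - 1];
  }
}

// With firebase-admin, ordering by document ID sidesteps the "no sort field"
// problem, since every document has an ID (names below are hypothetical):
//
//   const base = admin.firestore().collection("users")
//     .orderBy(admin.firestore.FieldPath.documentId())
//     .limit(1000);
//   await forEachBatch(base, (docs) => indexIntoTypesense(docs));
```

Because each `get()` only holds one page in memory, peak memory stays proportional to the batch size rather than the collection size.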
The typical approach is to have the function schedule a task that invokes it again, each invocation picking up where the last one left off, until the entire collection has been iterated over. https://cloud.google.com/tasks/docs/tutorial-gcf
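That self-chaining pattern can be sketched like this. It's a sketch, not a drop-in implementation: `fetchBatch`, `handleDocs`, and `enqueue` are hypothetical hooks, where in production `enqueue` would create a Cloud Task targeting this same function with the cursor in its payload.

```javascript
// Sketch: each invocation processes one batch and, if more data remains,
// schedules the next invocation with the cursor it stopped at.
async function processChunk(cursor, fetchBatch, handleDocs, enqueue) {
  const { docs, nextCursor } = await fetchBatch(cursor);
  if (docs.length === 0) return "done";
  await handleDocs(docs); // do the real work for this slice only
  if (nextCursor !== null) {
    await enqueue(nextCursor); // hand the rest off to the next invocation
    return "continued";
  }
  return "done";
}
```

Each invocation only ever touches one batch, so it stays well under the function's memory and execution-time limits; the chain ends naturally when no cursor remains.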
You can get even fancier by splitting up the key space of the dataset so you can have multiple parallel processes running. Whether it's worthwhile really depends on how many records you have to process and how quickly you want it to happen. I used this approach with Datastore, which was the forerunner to Firestore, and it would work for tens of millions of records. https://github.com/CaptainCodeman/datastore-mapper
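One way to sketch that key-space split, assuming document IDs drawn from a known lowercase-alphanumeric alphabet: `shardKeySpace` is a hypothetical helper, and each shard's bounds would feed the `startAfter`/`endBefore` of an independent cursor query run by its own worker.

```javascript
// Sketch: split a lexicographic document-ID key space into N roughly equal
// shards. Each shard is a { start, end } pair (null = open-ended) that one
// parallel worker can paginate through independently of the others.
function shardKeySpace(numShards, alphabet = "0123456789abcdefghijklmnopqrstuvwxyz") {
  const bounds = [];
  for (let i = 1; i < numShards; i++) {
    // Pick a two-character boundary at fraction i / numShards of the alphabet.
    const idx = Math.floor((i / numShards) * alphabet.length * alphabet.length);
    bounds.push(alphabet[Math.floor(idx / alphabet.length)] + alphabet[idx % alphabet.length]);
  }
  const shards = [];
  let start = null;
  for (const b of bounds) {
    shards.push({ start, end: b });
    start = b;
  }
  shards.push({ start, end: null });
  return shards;
}
```

Two-character boundaries are coarse but usually good enough; if your IDs are auto-generated (and therefore roughly uniform), the shards will be close to equal in size.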
Description

Hi! I want to implement full-text search in my project, which is based on Firebase. Right now I'm trying to get things rolling by backfilling my data into the Typesense collection, using the Firebase Typesense extension from this repo. Everything went smoothly except the backfilling process: the `ext-firestore-typesense-search-backfillToTypesenseFromFirestore` cloud function gives me a "Memory limit exceeded" error. Usually I fix that problem by redeploying the function with more resources, but I'm not sure that's the proper solution. The test collection is around 60 MB and contains 53K documents. I was able to successfully export another collection that is a bit smaller (around 30K documents).

Steps to reproduce
Expected Behavior

The `ext-firestore-typesense-search-backfillToTypesenseFromFirestore` trigger function shouldn't crash.

Actual Behavior
If the collection is big enough, the `ext-firestore-typesense-search-backfillToTypesenseFromFirestore` trigger function returns an error: "Function invocation was interrupted. Error: memory limit exceeded."
Metadata
Typesense Version: 0.21.0