rlingineni / Lambda-Serverless-Search

Use AWS Lambda to perform free-text search on documents - With SAM Template
MIT License

Save indexes directly to Lambda Function #1

Open rlingineni opened 5 years ago

rlingineni commented 5 years ago

You will have to update the indexing function to store the indexes directly in the S3 bucket where the Lambda function is stored.

It can take almost ~2 seconds to get all the virtual indexes from S3. Considering each file is only about ~1MB, if we save the indexes onto the Lambda function directly, we can shave that time off.

Of course, this could make the architecture a bit dirty, but the performance gains would be great.
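For context, a rough sketch of the flow this issue is about: each index shard is fetched from S3 and hydrated with lunr before querying. The bucket and key names here are hypothetical placeholders, not the repo's actual configuration.

```js
const AWS = require("aws-sdk");
const lunr = require("lunr");

const s3 = new AWS.S3();

async function loadIndexesFromS3(bucket, keys) {
  // One GetObject per shard; with several ~1MB shards this is roughly
  // where the ~2 second fetch time mentioned above comes from.
  const shards = await Promise.all(
    keys.map((Key) => s3.getObject({ Bucket: bucket, Key }).promise())
  );
  return shards.map((obj) => lunr.Index.load(JSON.parse(obj.Body.toString())));
}
```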

seriousme commented 5 years ago

Hi,

Another way to speed up the search might be to use S3 Select https://aws.amazon.com/blogs/aws/s3-glacier-select/

This reduces the need to fetch the whole index.
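As a rough illustration, assuming the data being queried were stored as JSON Lines, an S3 Select call with the JS SDK (v2) looks something like the sketch below; the bucket, key, and field names are hypothetical, and the stored format would have to change for this to apply to the lunr indexes.

```js
const AWS = require("aws-sdk");
const s3 = new AWS.S3();

async function selectMatchingRecords(bucket, key, term) {
  const data = await s3
    .selectObjectContent({
      Bucket: bucket,
      Key: key,
      ExpressionType: "SQL",
      // Illustrative query only; a real one should not interpolate user input.
      Expression: `SELECT * FROM S3Object s WHERE s.title LIKE '%${term}%'`,
      InputSerialization: { JSON: { Type: "LINES" } },
      OutputSerialization: { JSON: {} },
    })
    .promise();

  // The payload is an event stream; collect only the Records events.
  const chunks = [];
  for await (const event of data.Payload) {
    if (event.Records) chunks.push(event.Records.Payload);
  }
  return Buffer.concat(chunks).toString();
}
```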

Cheers, Hans

rlingineni commented 5 years ago

We would still have to load an entire index into the function's memory, since a search needs the whole index rather than just a subset of it.

seriousme commented 5 years ago

It might work for larger datasets, but then you would need to alter the query algorithm as well. Standard lunrjs would not be able to work with that.

seriousme commented 5 years ago

btw: if updates are infrequent (e.g. only during nightly batches) and the index does not need to be super current, then you might include the index with the lambda bundle, so that every time the index is updated a new version of the lambda is deployed.
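A minimal sketch of that idea, assuming the serialized lunr index is packaged next to the handler (the file name is a hypothetical placeholder): a cold start then only pays the cost of reading a local file instead of a round trip to S3.

```js
const fs = require("fs");
const path = require("path");
const lunr = require("lunr");

// Loaded once per container, outside the handler, so warm invocations reuse it.
const serialized = JSON.parse(
  fs.readFileSync(path.join(__dirname, "article_index.json"), "utf8")
);
const index = lunr.Index.load(serialized);

exports.handler = async (event) => {
  const results = index.search(event.queryStringParameters.q);
  return { statusCode: 200, body: JSON.stringify(results) };
};
```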

rlingineni commented 5 years ago

Right, yeah, that's what I was thinking: upload it with the lambda bundle. Even if updates were frequent, I don't think it would matter. It doesn't cost us anything to update Lambda functions, and usually, from experience, deploying a new bundle doesn't mean downtime.

As far as lunrjs goes, I agree that changes should be made to the core. There should be a way in lunrjs to load multiple indexes for a server-side use case.
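Until something like that exists in lunr itself, one way to approximate it on the server side is to load each serialized shard, query them all, and merge results by score. This is just a sketch of the workaround, not an existing lunrjs API.

```js
const lunr = require("lunr");

function searchAcrossIndexes(serializedShards, query, limit = 10) {
  return serializedShards
    .map((shard) => lunr.Index.load(shard))
    .flatMap((idx) => idx.search(query))
    .sort((a, b) => b.score - a.score) // lunr results carry a relevance score
    .slice(0, limit);
}
```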