ishan0102 / engblogs

learn from your favorite tech companies
https://engblogs.dev
MIT License

Out of memory #25

Closed: ishan0102 closed this issue 1 year ago

ishan0102 commented 1 year ago

There are so many records now that pulling them all into memory to check for duplicates crashes my DigitalOcean box. Need to rewrite that script to just check each record via the Supabase API instead, even if that makes it slower.

raghunandanbhat commented 1 year ago

Can we use something like a Bloom filter to check for duplicate records?

ishan0102 commented 1 year ago

Don't think a Bloom filter would work, because wouldn't you still need to pull the db into memory? Fixed for now by simply asking Supabase whether the link exists before inserting a post.
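A minimal sketch of the "ask Supabase before inserting" approach, assuming the script uses a supabase-py client. The table name `posts`, the column `link`, and the helper `insert_if_new` are hypothetical, not taken from the engblogs code. Memory stays flat because only one tiny query result is held at a time; the trade-off is one extra round trip per post.

```python
# Hypothetical helper; assumes a supabase-py client, e.g.
#   from supabase import create_client
#   client = create_client(SUPABASE_URL, SUPABASE_KEY)
# Table "posts" and column "link" are assumed names.

def insert_if_new(client, post: dict) -> bool:
    """Insert `post` unless a row with the same link already exists.

    Returns True if the post was inserted, False if it was a duplicate.
    """
    # Ask the database for at most one row with this link.
    existing = (
        client.table("posts")
        .select("link")
        .eq("link", post["link"])
        .limit(1)
        .execute()
    )
    if existing.data:
        return False  # link already stored; skip it
    client.table("posts").insert(post).execute()
    return True
```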

raghunandanbhat commented 1 year ago

You only need to pull the db into memory once, while creating the filter. Once the filter is built, check whether the bits for the new link are set in the filter. If they are all set, skip the link (it is probably a duplicate); if any of them is unset, insert the link into the db and set its bits in the Bloom filter.
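The flow described above can be sketched as follows. This is an illustrative Bloom filter, not code from the repo: it sets `k` bits per link in an `m`-bit array, using double hashing over a SHA-256 digest to derive the `k` positions. `dedupe_and_insert` and its `insert_link` callback are hypothetical names standing in for the real insert. One caveat: Bloom filters have false positives but no false negatives, so a genuinely new link is occasionally skipped; a "probably present" answer could be confirmed against Supabase if that matters.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: an m-bit array probed by k hash positions."""

    def __init__(self, m_bits: int, k_hashes: int):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k positions from two 64-bit slices of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd, hence nonzero, step
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False: definitely never added. True: probably added.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def dedupe_and_insert(bloom: BloomFilter, new_links, insert_link) -> list:
    """Insert links the filter has not seen, then record them in the filter."""
    inserted = []
    for link in new_links:
        if bloom.might_contain(link):
            continue  # all bits set: probably a duplicate, skip it
        insert_link(link)  # hypothetical callback that writes to the db
        bloom.add(link)
        inserted.append(link)
    return inserted
```

Building the filter once is just `bloom.add(link)` for every existing link; after that, each new post costs only `k` bit lookups instead of a database query.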

Checking in Supabase is also fine; it's a simpler solution. If querying Supabase becomes too slow, we could add a Bloom filter.

ishan0102 commented 1 year ago

Right, but that still involves pulling the db into memory once, no? The DigitalOcean server has only about 512 MB of RAM, so it can't fit the db at all. Though I guess I could build the filter on my machine and port it over.
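For context on the memory question: building a Bloom filter doesn't require holding every record at once, since links can be added one page at a time and then discarded, and the filter itself is small. Using the standard sizing formulas m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash positions, a hypothetical n = 1,000,000 links at a p = 1% false-positive rate gives m ≈ 9.59 million bits (about 1.2 MB) and k = 7. The sketch below, with hypothetical helper names, builds the filter from an iterable of pages and serializes it to bytes so it can be built on a laptop and copied to the server. The page source is an assumption; with supabase-py one could, for instance, fetch `select("link")` in chunks using `.range(start, end)`.

```python
import hashlib
import math


def optimal_params(n_items: int, fp_rate: float) -> tuple[int, int]:
    """Standard sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hashes."""
    m = math.ceil(-n_items * math.log(fp_rate) / (math.log(2) ** 2))
    k = max(1, round(m / n_items * math.log(2)))
    return m, k


class BloomFilter:
    def __init__(self, m_bits: int, k_hashes: int, bits: bytearray = None):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bits if bits is not None else bytearray((m_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing over one SHA-256 digest yields k bit positions.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

    def to_bytes(self) -> bytes:
        # Header: 8-byte m, 4-byte k, followed by the raw bit array.
        return self.m.to_bytes(8, "big") + self.k.to_bytes(4, "big") + bytes(self.bits)

    @classmethod
    def from_bytes(cls, data: bytes) -> "BloomFilter":
        m = int.from_bytes(data[:8], "big")
        k = int.from_bytes(data[8:12], "big")
        return cls(m, k, bytearray(data[12:]))


def build_from_pages(pages, n_items: int, fp_rate: float) -> BloomFilter:
    """Add links one page at a time so the full table never sits in memory."""
    m, k = optimal_params(n_items, fp_rate)
    bloom = BloomFilter(m, k)
    for page in pages:  # each page: a list of link strings
        for link in page:
            bloom.add(link)
    return bloom
```

On the build machine, `open("links.bloom", "wb").write(bloom.to_bytes())` writes the filter (the file name is arbitrary); after copying it over, the server can load it with `BloomFilter.from_bytes(open("links.bloom", "rb").read())`.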