Open opme opened 1 year ago
I was reading the code for save_stories_from_feed in tasks.py, and it looks to be making one database call per feed entry to check for duplicates.
The per-entry normalized_url_exists check could be replaced by a single database call that checks all feed entries at once.
There could be a function, say getValidFeedEntries, that applies the logic already in save_stories_from_feed for skipping invalid entries.
Then a single database call would identify the duplicates, followed by one bulk insert and commit.
If this sounds reasonable I can give it a try. Is this check likely to be the eventual bottleneck of the current implementation?
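A rough sketch of the batched approach, using an in-memory sqlite3 table as a stand-in for the real stories table (the schema, column name, and function name here are illustrative, not the project's actual models):

```python
import sqlite3

# Hypothetical stories table keyed on normalized_url; stands in for the
# project's real schema purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stories (normalized_url TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO stories VALUES (?)",
    [("example.com/a",), ("example.com/b",)],
)
conn.commit()


def save_entries_batched(conn, entries):
    """One SELECT to find existing URLs, then one bulk INSERT for the rest."""
    urls = [e["normalized_url"] for e in entries]
    placeholders = ",".join("?" * len(urls))
    # Single round trip replaces the per-entry existence check.
    existing = {
        row[0]
        for row in conn.execute(
            f"SELECT normalized_url FROM stories "
            f"WHERE normalized_url IN ({placeholders})",
            urls,
        )
    }
    new_rows = [(u,) for u in urls if u not in existing]
    conn.executemany(
        "INSERT INTO stories (normalized_url) VALUES (?)", new_rows
    )
    conn.commit()
    return len(new_rows)


inserted = save_entries_batched(
    conn,
    [
        {"normalized_url": "example.com/a"},  # already present, skipped
        {"normalized_url": "example.com/c"},  # new, inserted
    ],
)
print(inserted)  # → 1
```

For very large feeds the IN list would need chunking to stay under the database's parameter limit, but the shape of the change is the same: one read, one bulk write per batch.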