sumodm / semrider

Semantic Similarity: Browser Tabs

UI: Browser becomes slow due to sync updating #3

Open sumodm opened 4 months ago

sumodm commented 4 months ago

In the current version (86f6f07), the browser update call is synchronous. Although this runs in the background (background.js), it still causes two problems:

a. It slows down the browser noticeably.
b. The page is not marked as fully loaded until the embedding is done.

To fix this, we want indexing (creation of embeddings) to run asynchronously with respect to the browser background task. Here are possible solutions (see here):

  1. redis/celery: One clean solution would be to set up a message broker (say, Redis) and add a worker manager (say, Celery) to run indexing in the background.

    • ADV: Fine-grained control over parallelism: we control how tasks are run, e.g. we can choose the number of parallel tasks, in addition to the work being async.
    • DIS: The biggest issue is the long-term maintenance overhead. Running 3 separate processes on the user's desktop seems unnecessary, especially given our small-scale requirements, and installing a three-process application on a desktop is a challenge in itself.
    • DIS: This is a desktop app, not a server app, so we don't have web-like scale. Since it runs on the user's local machine, the only interactions are with that user. The installation and engineering overhead of a production-grade queue and worker manager is therefore too much, e.g. keeping Redis/Celery up to date for security, requiring newer developers to learn their ins and outs, and having to do things their way. The overhead is likely not worth it.
  2. Multiprocessing / Flask 2 async routes: Spawn another process (or create an async route) and offload the task to it.

    • DIS: When many pages (with a lot of text) are loaded in quick succession, this can spawn many compute-heavy processes and overwhelm the system.
    • DIS: Since we need to keep track of processed data (as opposed to, say, offloading a computation without write-back), we would need some form of shared global state or inter-process communication.
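A minimal sketch of option 2 using the standard-library `multiprocessing` module (not a Flask async route) makes both drawbacks concrete: every page gets its own process, and because the index lives in the parent, each result has to come back through an IPC channel (here a `Queue`). The names `embed_text`, `index_worker` and `index_pages_in_processes` are hypothetical, and `embed_text` is a toy stand-in for the real embedding model.

```python
import multiprocessing as mp


def embed_text(text):
    # Hypothetical stand-in for the real embedding model:
    # a toy "vector" of (character count, word count).
    return [float(len(text)), float(len(text.split()))]


def index_worker(url, text, results):
    # Runs in a child process. It cannot write into the parent's index
    # directly, so the result is sent back over an IPC queue.
    results.put((url, embed_text(text)))


def index_pages_in_processes(pages, ctx=None):
    """Start one process per page and collect {url: vector} in the parent."""
    ctx = ctx or mp.get_context()
    results = ctx.Queue()
    # One process per page: a burst of page loads means a burst of processes.
    procs = [ctx.Process(target=index_worker, args=(url, text, results))
             for url, text in pages.items()]
    for p in procs:
        p.start()
    # Drain the queue before joining: a child with unflushed queue data
    # may not exit until the parent has read it.
    index = dict(results.get() for _ in procs)
    for p in procs:
        p.join()
    return index
```

With the "spawn" or "forkserver" start methods, the call to `index_pages_in_processes` has to sit under an `if __name__ == "__main__":` guard, and every worker re-imports the module, which adds further per-page cost.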
  3. Threading: Create a thread for each task and run them concurrently.

    • DIS: A purely thread-based approach can also overwhelm the system, so a task queue seems inevitable.
  4. Custom Task Queue with Threading (concurrent.futures): Create a task queue served by x concurrent threads (we can keep it simple with x=1 for now, since there is no advantage to processing different pages in parallel), then submit the tasks to it.

    • ADV: Async, plus fine-grained control over parallelism.
    • ADV: Even if many pages are added at once, it won't overwhelm the system.
    • DIS: Need to manage locking of shared resources when x > 1.
    • DIS: Need to make load/save task-queue aware, i.e. stop accepting new tasks and wait for pending ones to complete before saving.
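A minimal sketch of option 4, assuming a `concurrent.futures.ThreadPoolExecutor` with `max_workers=1` as the queue and a JSON file as the persisted index. The class `IndexingQueue`, its methods and `embed_text` are hypothetical illustrations, not the commit's actual code. It covers both drawbacks listed above: a single lock protects the shared index (which only matters once x > 1), and `save`/`load` stop accepting tasks and drain the pending ones first.

```python
import json
import threading
from concurrent.futures import ThreadPoolExecutor, wait


def embed_text(text):
    # Hypothetical stand-in for the real embedding model call.
    return [float(len(text)), float(len(text.split()))]


class IndexingQueue:
    """Indexes pages on x worker threads (default x=1), off the request path."""

    def __init__(self, max_workers=1):
        # With max_workers=1, pages are indexed one at a time in FIFO order.
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        # Guards the index, the pending set and the accepting flag.
        self._lock = threading.Lock()
        self._index = {}
        self._pending = set()
        self._accepting = True

    def _index_page(self, url, text):
        vector = embed_text(text)  # the slow part runs without the lock
        with self._lock:
            self._index[url] = vector

    def _forget(self, future):
        with self._lock:
            self._pending.discard(future)

    def submit(self, url, text):
        """Queue a page and return at once; None while a load/save is running."""
        with self._lock:
            if not self._accepting:
                return None
            future = self._executor.submit(self._index_page, url, text)
            self._pending.add(future)
        future.add_done_callback(self._forget)
        return future

    def _pause_and_drain(self):
        # Stop accepting tasks, then wait (without the lock) for queued ones.
        with self._lock:
            self._accepting = False
            pending = list(self._pending)
        wait(pending)

    def _resume(self):
        with self._lock:
            self._accepting = True

    def save(self, path):
        self._pause_and_drain()
        try:
            with self._lock, open(path, "w", encoding="utf-8") as f:
                json.dump(self._index, f)
        finally:
            self._resume()

    def load(self, path):
        self._pause_and_drain()
        try:
            with open(path, encoding="utf-8") as f:
                data = json.load(f)
            with self._lock:
                self._index = data
        finally:
            self._resume()

    def snapshot(self):
        with self._lock:
            return dict(self._index)

    def close(self):
        self._executor.shutdown(wait=True)
```

Because the executor has a fixed number of workers, a burst of page loads only lengthens the queue instead of multiplying the concurrent work, and the browser-facing call (`submit`) returns immediately. Rejecting submissions during `save`/`load` is one possible policy; buffering them until the queue resumes would be another.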
sumodm commented 4 months ago

This commit (8f51d557) addresses this issue.