internetarchive / Zeno

State-of-the-art web crawler 🔱
GNU Affero General Public License v3.0

Add pipeline control mechanism to instantiate, pause/resume and stop #170

Closed equals215 closed 4 days ago

equals215 commented 6 days ago

@CorentinB waiting on you to confirm the architecture/mechanism of this package then :

CorentinB commented 6 days ago

> @CorentinB waiting on you to confirm the architecture/mechanism of this package then :

Honestly, very good job. It's very clean and it looks good to me.

equals215 commented 6 days ago

@CorentinB pause is implemented on every stage (pre, archiver, post, finisher) but not on HQ/reactor, as those 2 steps will not produce anything in the system while the pipeline is paused.
Feel free to implement inside controler a routine that gets spawned at runtime, checks multiple params (disk usage, etc.) and pauses the pipeline if needed, as I can't manage to find the previous code.
Also, pause/resume are unit tested and E2E tested, but not tested while running.
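
For illustration only, the runtime routine suggested above could look roughly like the Go sketch below. Everything in it is an assumption: `Pauser`, `Check`, `Evaluate` and `Watch` are hypothetical names, not the actual API of the controler package, and the disk usage check is a stand-in closure rather than a real measurement.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// Pauser is a hypothetical pause gate shared by the pipeline stages.
type Pauser struct {
	mu     sync.Mutex
	paused bool
	resume chan struct{} // closed by Resume to release waiting stages
}

func NewPauser() *Pauser { return &Pauser{} }

// Pause marks the pipeline as paused; calling it twice is a no-op.
func (p *Pauser) Pause() {
	p.mu.Lock()
	defer p.mu.Unlock()
	if !p.paused {
		p.paused = true
		p.resume = make(chan struct{})
	}
}

// Resume releases every stage currently blocked in Wait.
func (p *Pauser) Resume() {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.paused {
		p.paused = false
		close(p.resume)
	}
}

// Paused reports the current state.
func (p *Pauser) Paused() bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.paused
}

// Wait is what a stage would call before handling an item: it returns
// immediately when running, and blocks until Resume or ctx cancellation
// when paused.
func (p *Pauser) Wait(ctx context.Context) error {
	p.mu.Lock()
	if !p.paused {
		p.mu.Unlock()
		return nil
	}
	ch := p.resume
	p.mu.Unlock()
	select {
	case <-ch:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// Check reports whether the pipeline should be paused (e.g. disk nearly full).
type Check func() bool

// Evaluate applies the checks once: pause if any check trips, resume otherwise.
// Note: this simple version would also undo a pause requested manually.
func Evaluate(p *Pauser, checks ...Check) {
	for _, c := range checks {
		if c() {
			p.Pause()
			return
		}
	}
	p.Resume()
}

// Watch runs Evaluate on every tick until ctx is cancelled.
// The interval must be positive, as required by time.NewTicker.
func Watch(ctx context.Context, p *Pauser, every time.Duration, checks ...Check) {
	t := time.NewTicker(every)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-t.C:
			Evaluate(p, checks...)
		}
	}
}

func main() {
	p := NewPauser()
	diskFull := true // stand-in for a real disk usage check
	check := func() bool { return diskFull }

	Evaluate(p, check)
	fmt.Println("paused:", p.Paused())

	diskFull = false
	Evaluate(p, check)
	fmt.Println("paused:", p.Paused())
}
```

In a real pipeline, `Watch` would be started once in its own goroutine at startup, and each stage would call `Wait` before processing an item. A fuller version would probably need to track who requested the pause, so that the watchdog does not resume a pipeline the operator paused by hand.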