openaustralia / morph

Take the hassle out of web scraping
https://morph.io
GNU Affero General Public License v3.0

First draft of a pain-points overview #1211

Open jamezpolley opened 5 years ago

rbtcollins commented 5 years ago

So I'm not sure I'd call this a points of pain overview so much as good docs ;P.

We've discussed a bunch offline; two thoughts that are really only relevant here: your scheduler failure modes are simple bugs; they should be fixed in situ, I think, because that can be done quickly. To wit: don't de-and-requeue things when there is no work slot available (that's not the task failing); use system metrics to inform work slot availability (e.g. if there is IO overload, don't schedule more work); immediately place work when slots are freed up (e.g. schedule work immediately at the end of your cleanup of a work slot); cap exponential backoff (e.g. at 5 minutes); discard work after (say) 10 attempts; and finally implement a quick-reset mechanism to zero the queue and allow an immediate restoration of service without mucking around.
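
To make the list above concrete, here is a rough sketch in Python of how those rules might fit together. It is not morph's actual scheduler: `Job`, `Scheduler`, `backoff_delay`, the `overloaded` callback and the 1-second base delay are all hypothetical names and values chosen for illustration. Only the 5-minute cap and the 10-attempt limit come from the suggestions above.

```python
from collections import deque
from dataclasses import dataclass

BASE_DELAY = 1.0     # seconds; assumed starting delay, not from the thread
MAX_DELAY = 300.0    # cap exponential backoff at 5 minutes
MAX_ATTEMPTS = 10    # discard work after this many failed attempts


def backoff_delay(attempts):
    """Delay after the given number of failures: doubles each time, capped."""
    return min(BASE_DELAY * 2 ** (attempts - 1), MAX_DELAY)


@dataclass
class Job:
    name: str
    attempts: int = 0        # counts real task failures only
    not_before: float = 0.0  # earliest time the job may be placed again


class Scheduler:
    def __init__(self, slots, overloaded=lambda: False):
        self.free_slots = slots
        self.overloaded = overloaded  # hypothetical hook for system metrics
        self.queue = deque()
        self.running = []

    def submit(self, job):
        self.queue.append(job)

    def place(self, now):
        """Start ready jobs while slots are free and the host is not overloaded.

        Jobs that cannot be placed simply stay queued: no dequeue/requeue,
        and no attempt is counted against them.
        """
        started = []
        while self.free_slots > 0 and not self.overloaded():
            ready = next((j for j in self.queue if j.not_before <= now), None)
            if ready is None:
                break
            self.queue.remove(ready)
            self.free_slots -= 1
            self.running.append(ready)
            started.append(ready)
        return started

    def finish(self, job, ok, now):
        """Clean up a slot, then immediately try to place more work."""
        self.running.remove(job)
        self.free_slots += 1
        if not ok:
            job.attempts += 1
            if job.attempts < MAX_ATTEMPTS:
                job.not_before = now + backoff_delay(job.attempts)
                self.queue.append(job)
            # otherwise the job is discarded
        return self.place(now)

    def reset(self):
        """Quick reset: zero the queue so service can resume straight away."""
        self.queue.clear()
```

The design choice worth noting is that `attempts` only changes in `finish`, so a full host or an IO overload never pushes a job toward the discard limit or lengthens its backoff.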