openaustralia / yinyo

A wonderfully simple API-driven service to reliably execute many long running scrapers in a super scalable way
https://yinyo.io
Apache License 2.0

Limit log size #162

Open mlandauer opened 4 years ago

mlandauer commented 4 years ago

Because logs are kept in Redis, there is a definite limit to how large they can sensibly be. That's not a bad thing anyway; we have similar limits in place on morph.io.

mlandauer commented 4 years ago

Doing a quick little back-of-the-envelope calculation: our current production Redis (ElastiCache) instance has about 500MB of memory. If we want to easily support up to 1000 concurrent scrapers, we can't allow each scraper to use more than 500KB of data on Redis. Say 250KB of that is dedicated to the streaming logs; that still leaves heaps of headroom.

250KB corresponds roughly to 2000 lines of 128 characters each. That's a pretty respectable number.
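
One way to enforce a cap like this is a trimmed Redis list: push each line, then `LTRIM` so only the newest entries survive. A minimal sketch using go-redis (the `logs:<run-id>` key layout and helper name are illustrative, not yinyo's actual code):

```go
package main

import (
	"context"

	"github.com/redis/go-redis/v9"
)

// maxLogLines caps each run's log at roughly 250KB
// (2000 lines of 128 characters each).
const maxLogLines = 2000

// appendLogLine pushes a line onto the run's log list and trims the
// list so only the newest maxLogLines entries are kept.
func appendLogLine(ctx context.Context, rdb *redis.Client, runID, line string) error {
	key := "logs:" + runID // hypothetical key layout
	if err := rdb.RPush(ctx, key, line).Err(); err != nil {
		return err
	}
	// Drop everything older than the last maxLogLines entries.
	return rdb.LTrim(ctx, key, -maxLogLines, -1).Err()
}
```

The two commands could be pipelined to save a round trip per log line.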

mlandauer commented 4 years ago

morph.io allows up to 10,000 log lines. If we assume 128 characters per line, that gives a total memory usage for the logs of about 1.2MB per scraper. At that rate, 500MB of Redis memory would support up to about 400 simultaneous scrapers.

It is of course easy enough to just get a Redis instance with more memory, but it's nice to know that we can support quite a lot of simultaneous users without going completely overboard with the size of the Redis nodes.
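
For reference, the arithmetic (numbers from the comments above, names purely illustrative):

```go
const (
	bytesPerLine   = 128
	linesPerRun    = 10_000
	logBytesPerRun = bytesPerLine * linesPerRun // 1,280,000 bytes ≈ 1.2MB
	redisBytes     = 500 << 20                  // 500MB instance

	// ≈ 409 runs before the logs alone exhaust Redis memory.
	maxConcurrentRuns = redisBytes / logBytesPerRun
)
```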

mlandauer commented 4 years ago

Maybe we should pick a total size limit on the logs of 1MB. That's a nice round number and roughly matches the current restriction in place on morph.io.
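
If the logs were kept in a Redis stream rather than a list, a 1MB budget could be approximated with `XADD`'s `MAXLEN` trimming, since 1MB / 128 bytes per line ≈ 8192 entries. A sketch, assuming a stream per run (not how yinyo currently stores logs):

```go
package main

import (
	"context"

	"github.com/redis/go-redis/v9"
)

// Approximate the 1MB budget by entry count: 1MB / 128 bytes per line.
const maxEntries = 8192

// appendToStream adds a log line and lets Redis trim the stream to
// roughly maxEntries ("MAXLEN ~" is cheaper than exact trimming).
func appendToStream(ctx context.Context, rdb *redis.Client, runID, line string) error {
	return rdb.XAdd(ctx, &redis.XAddArgs{
		Stream: "logs:" + runID, // hypothetical key layout
		MaxLen: maxEntries,
		Approx: true,
		Values: map[string]interface{}{"line": line},
	}).Err()
}
```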

mlandauer commented 1 year ago

Coming back to this issue after a long time away, it seems to me that the fundamental bottleneck created by using Redis (which stores everything in memory) is a good argument against using Redis at all. Perhaps Postgres, with its support for streaming, is a more sensible choice here.
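
For what it's worth, a Postgres version might append lines to a disk-backed table (so there's no hard memory ceiling) and stream them to followers with LISTEN/NOTIFY. A minimal sketch using pgx; the schema, channel naming, and helper names are all assumptions:

```go
package main

import (
	"context"
	"fmt"

	"github.com/jackc/pgx/v5"
)

// appendLogLine persists a line and notifies any live followers.
// Assumed schema: log_lines(run_id text, seq bigserial, line text).
func appendLogLine(ctx context.Context, conn *pgx.Conn, runID, line string) error {
	_, err := conn.Exec(ctx,
		"insert into log_lines (run_id, line) values ($1, $2)", runID, line)
	if err != nil {
		return err
	}
	// NOTIFY payloads are limited to about 8KB, plenty for one log line.
	_, err = conn.Exec(ctx, "select pg_notify($1, $2)", "logs:"+runID, line)
	return err
}

// followLogs blocks, printing lines for the run as they arrive.
func followLogs(ctx context.Context, conn *pgx.Conn, runID string) error {
	// The ':' in the channel name needs a quoted identifier.
	_, err := conn.Exec(ctx, "listen "+pgx.Identifier{"logs:" + runID}.Sanitize())
	if err != nil {
		return err
	}
	for {
		n, err := conn.WaitForNotification(ctx)
		if err != nil {
			return err
		}
		fmt.Println(n.Payload)
	}
}
```

A follower joining mid-run would first replay existing rows ordered by `seq`, then listen for new notifications.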