Closed vishaalgc closed 7 years ago
Just have a look at these pointers: https://github.com/thumbor/thumbor/wiki/Scaling-Thumbor
Yes, I have looked into them. They mention scaling Thumbor on a single server using supervisord, but any idea how to scale across two Thumbor servers, so that if one goes down the request can be fulfilled by the other?
You can put a load balancer in front of your servers, which will distribute the traffic across them. I use this in my own Thumbor farm (I'm using AWS to do this).
You can also use a different resource loader, for example thumbor-aws, as an alternative to shared folders (NAS).
You can create a service in front of the Thumbor service to serve your images directly from a static farm (it acts like a cache server).
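To make the load balancer idea concrete, here is a minimal client-side sketch of the failover behaviour it provides: try one Thumbor server and fall through to the next if it is unreachable. This is only an illustration of the idea, not how HAProxy or an AWS load balancer is implemented. The backend host names and the `fetch_with_failover` helper are hypothetical.

```python
import urllib.request
from typing import Callable, Sequence


def http_get(url: str) -> bytes:
    """Fetch a URL with a short timeout (standard library only)."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read()


def fetch_with_failover(
    backends: Sequence[str],
    path: str,
    fetch: Callable[[str], bytes] = http_get,
) -> bytes:
    """Try each backend in order and return the first successful response."""
    last_error = None
    for base_url in backends:
        try:
            return fetch(base_url + path)
        except OSError as exc:  # connection refused, timeout, URLError, ...
            last_error = exc
    raise RuntimeError("all Thumbor backends failed") from last_error


# ASSUMPTION: hypothetical host names for the two Thumbor servers.
BACKENDS = ["http://thumbor-a.internal:8888", "http://thumbor-b.internal:8888"]
```

In a real deployment the load balancer does this for you, together with health checks, so clients only ever see one address.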
I get your method. What I had in mind for a normal setup is two HAProxy load balancers in front of two Thumbor servers running on separate machines, with NFS sharing the file system that holds the images on the Thumbor servers.
Is this correct? Any other optimal solution you can think of? I don't have S3 storage.
I couldn't properly understand the second and third points you mentioned above, as I'm doing this for the first time. Can you elaborate, and also take a look at my setup?
NFS would do the trick, but it does not scale very far.
public IPs -> haproxy (w/ failover mechanism) -> varnish (for cache) -> haproxy -> thumbor -> HTTP (read-only, w/ loader) or NFS (r/w) or S3 (plugin, r/w) or redis (plugin, r/w) or mongodb (plugin, r/w), Lustre, Ceph ...
redundancy at each level.
Is there any reference documentation you know of on how other people are doing this that I can look into?
Actually this is a classic scalable web architecture (for the front part). Only the storage backend is specific to Thumbor, and you can choose from lots of plugins.
I'm using the built-in Thumbor file system storage in my application whenever I upload an image. So I think sharing the file system base folder over NFS (r/w) would be okay, as it would then stay in sync between the two servers. Any ideas on better load balancers than HAProxy that you can suggest I look into?
If you have to do it yourself, I don't know anything better than HAProxy for load balancing ...
I have got the whole picture now. I'm looking into NFS and how to implement shared storage for the file system, and I'm also looking at rsync to sync data between my two Thumbor servers for data consistency. Any ideas on alternatives for shared storage between two servers to maintain data consistency?
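For reference, a shared-folder setup like the one described might look roughly like this in thumbor.conf (Thumbor config files use Python syntax). This is a sketch under assumptions: the `/mnt/thumbor` mount point is hypothetical and is assumed to be the same NFS export mounted read/write on both servers. The same file would be deployed to both machines.

```python
# thumbor.conf, identical on both Thumbor servers.
# ASSUMPTION: /mnt/thumbor is a hypothetical NFS mount shared by both machines.

# Originals are kept in the built-in file system storage.
STORAGE = "thumbor.storages.file_storage"
FILE_STORAGE_ROOT_PATH = "/mnt/thumbor/storage"

# Processed images are kept in the built-in file system result storage.
RESULT_STORAGE = "thumbor.result_storages.file_storage"
RESULT_STORAGE_FILE_STORAGE_ROOT_PATH = "/mnt/thumbor/result_storage"
```

Because both servers point at the same paths, an image stored by one server is visible to the other without any extra sync step.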
rsync is not required, since you seem to be using NFS for storage?
@vishaalgc Are you using two servers to leverage load balancing only?
Any ideas on alternatives for shared storage between two servers to maintain data consistency ?
PopKey used Redis instances as result_storage and storage, which allowed several servers to access the same image data.
Originals were stored in S3 and fetched over HTTP.
Did anyone test this? The question is not whether two (or any number of) servers can access the same shared storage, but whether they will use it. Assuming one server caches the image (original or result) in some shared storage, will the others know to check for it before re-fetching or re-creating it? Is there a cache map in Thumbor to track items in cache, or does it just "test always"?
@nkrgovic Thumbor will always check storages in order: result_storage, storage, loader.
There is the possibility of a race condition between multiple servers accessing the same resource. In that case each server processes and caches the image itself, but subsequent requests would be served from the storages.
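The lookup order described above can be modelled in a few lines. This is a simplified sketch, not Thumbor's actual code: plain dicts stand in for the shared storages, and `serve`, `loader` and `transform` are hypothetical names used only for illustration.

```python
def serve(operations, source, result_storage, storage, loader, transform):
    """Return (image, origin) following result_storage -> storage -> loader."""
    result_key = f"{operations}/{source}"

    # 1. Already processed by any server sharing this result storage?
    if result_key in result_storage:
        return result_storage[result_key], "result_storage"

    # 2. Original already cached in the shared storage?
    if source in storage:
        original, origin = storage[source], "storage"
    else:
        # 3. Fall back to the loader and cache the original.
        original, origin = loader(source), "loader"
        storage[source] = original

    result = transform(original, operations)
    result_storage[result_key] = result
    return result, origin


# Two "servers" sharing the same storages (dicts stand in for NFS/Redis).
shared_results, shared_storage = {}, {}
load = lambda src: b"original:" + src.encode()
resize = lambda img, ops: img + b"@" + ops.encode()

first = serve("300x200", "cat.jpg", shared_results, shared_storage, load, resize)
second = serve("300x200", "cat.jpg", shared_results, shared_storage, load, resize)
```

Here `first` comes from the loader and `second` from the result storage. If both servers miss at the same moment, both do the work and write the same result, which is wasteful but harmless.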
@nkrgovic The caching mechanisms for Thumbor do a test first. In the debug logs you'll see something along the lines of result_storage.hit: 0 or result_storage.hit: 1. If you are sharing caches then you should be fine. I recommend using a CDN in front of Thumbor though. Currently doing that in production with great results.
OK, so this means that using a non-HA shared cache (say a single Mongo or Redis server) will still work - a failure will just make the cache miss? That's cool.
A failure will also introduce degraded performance because Thumbor has no "circuit-breaker" logic.
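For anyone unfamiliar with the term, this is roughly what "circuit-breaker" logic means: after a few consecutive failures, stop calling the broken backend for a while and return a fallback (here, a cache miss) immediately instead of waiting on every request. The sketch below is a generic illustration that would have to live outside Thumbor. The `CircuitBreaker` class and its parameters are hypothetical and not part of any Thumbor API.

```python
import time


class CircuitBreaker:
    """Skip a failing backend for `reset_after` seconds after repeated errors."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed (healthy)

    def call(self, func, *args, fallback=None):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback  # open: do not touch the backend at all
            self.opened_at = None  # half-open: allow one trial call

        try:
            result = func(*args)
        except OSError:  # connection refused, timeout, ...
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback

        self.failures = 0  # success closes the circuit again
        return result
```

Without something like this, every request waits for the dead cache server to time out before falling through, which is the degraded performance mentioned above.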
I'm looking to set up two separate Thumbor servers with a shared folder approach (high availability) and load balancing over the setup. Any ideas or projects for implementing high availability and load balancing over Thumbor?