Closed (summera closed this issue 6 years ago)
Right now I'm using EC2 with m4.large machines, each with an 8 GB boot volume and a 22 GB Docker storage volume. This is a larger cluster that is not used exclusively for gglsbl-rest, so even though I don't typically get multiple gglsbl-rest containers on the same host, the only real limitation for that would be disk space. I'm setting the memory reservation to 2 GB and the hard limit to 4 GB.
That setup has given me zero problems handling large request volumes with two containers, and response times have been very low.
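For anyone new to ECS, the memory settings described above might look roughly like this in a container definition. This is only a sketch: `memoryReservation` and `memory` are the ECS container definition fields for the soft and hard limits (both in MiB), while the container name and the image reference are hypothetical placeholders, not values taken from this thread.

```python
import json


def container_definition(image: str) -> dict:
    """Build an ECS container definition with the limits mentioned above."""
    return {
        "name": "gglsbl-rest",        # hypothetical container name
        "image": image,                # hypothetical image reference, supply your own
        "essential": True,
        "memoryReservation": 2048,     # soft limit in MiB (2 GB reservation)
        "memory": 4096,                # hard limit in MiB (4 GB)
    }


if __name__ == "__main__":
    # Print JSON that could be pasted into the containerDefinitions list
    # of a task definition.
    print(json.dumps(container_definition("example/gglsbl-rest:latest"), indent=2))
```

With a 2 GB reservation the scheduler only needs 2 GB free on a host to place the container, but the container is allowed to grow to 4 GB before it is killed.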
Since Fargate allows up to 10 GB of Docker layer storage (as per https://docs.aws.amazon.com/AmazonECS/latest/developerguide/AWS_Fargate.html#fargate-task-defs), it should be perfectly possible to run gglsbl-rest there as well. I would go for at least two CPUs and 4 GB of RAM (so that most of the database file is cached in RAM).
If you try that out, let me know how it goes.
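As a starting point for a Fargate attempt, a task definition with the suggested sizing could be sketched as below. On Fargate, `cpu` and `memory` are set at the task level as strings (1024 CPU units per vCPU, memory in MiB), and `awsvpc` networking is required. The family name, container name, and image are hypothetical, and things like an execution role or logging configuration are left out.

```python
import json


def fargate_task_definition(image: str) -> dict:
    """Build a minimal Fargate task definition with 2 vCPUs and 4 GB of RAM."""
    return {
        "family": "gglsbl-rest",                 # hypothetical family name
        "requiresCompatibilities": ["FARGATE"],
        "networkMode": "awsvpc",                 # required for Fargate tasks
        "cpu": "2048",                           # 2 vCPUs, in CPU units
        "memory": "4096",                        # 4 GB, in MiB
        "containerDefinitions": [
            {
                "name": "gglsbl-rest",           # hypothetical container name
                "image": image,                  # hypothetical image reference
                "essential": True,
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(fargate_task_definition("example/gglsbl-rest:latest"), indent=2))
```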
By the way, @afilipovich tells me that performance can be further improved by loading the database file in a RAM disk. I haven't tested it myself. So you might want to explore using the tmpfs Linux parameter for /home/gglsbl/db.
Again, if you get this to work please let me know how it works.
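An untested sketch of what the tmpfs suggestion could look like in a container definition: ECS exposes tmpfs mounts under `linuxParameters.tmpfs`, where each entry takes a `containerPath` and a `size` in MiB. The helper name and the 2048 MiB size are assumptions for illustration only; the mount has to be large enough for the database file, and it counts against the container's memory.

```python
import copy


def with_tmpfs(container_def: dict,
               path: str = "/home/gglsbl/db",
               size_mib: int = 2048) -> dict:
    """Return a copy of a container definition with a tmpfs mount added.

    size_mib is an assumed value, not a measured requirement.
    """
    updated = copy.deepcopy(container_def)
    linux_params = updated.setdefault("linuxParameters", {})
    linux_params.setdefault("tmpfs", []).append(
        {"containerPath": path, "size": size_mib}
    )
    return updated


if __name__ == "__main__":
    base = {"name": "gglsbl-rest", "memory": 4096}  # hypothetical definition
    print(with_tmpfs(base))
```

A tmpfs mount starts out empty each time the container starts, so the database would presumably have to be rebuilt on every start.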
I can confirm that placing the SQLite file on tmpfs can improve performance several times over compared to a fast SSD. Compared to an HDD, the difference is an order of magnitude.
@asieira @afilipovich Thanks a lot for the info! I messed around a bit with running it on a t2.medium but could only get one task running. A second wouldn't have enough resources on the same machine. Also tried with Fargate and 4 GB of memory. You can't set tmpfs on Fargate and I'm not sure what it's using underneath.
Somewhat unrelated to this particular issue, but are you seeing accurate results from the API when running this in production? Are you catching most of the spam, or is a lot making it through? I'm attempting to protect a URL shortener and, as I mentioned in https://github.com/google/safebrowsing/issues/30#issuecomment-378805724 and https://github.com/google/safebrowsing/issues/30#issuecomment-378807286, the results I'm seeing from simple tests aren't great. I'm not sure whether this is because the results from the API are more up to date than https://transparencyreport.google.com or vice versa, but if it's not catching much it doesn't seem worth the cost.
Interesting, didn't realize tmpfs was not available on Fargate. @rfranco did you know about this?
I do know those results can differ. What I can tell you is that I haven't noticed any problem when using the API. I do think the Transparency Report page uses more than just the Google Safe Browsing API data, though. Maybe @afilipovich has more info on that.
@summera, could you please provide a few URLs that show different results with gglsbl and https://transparencyreport.google.com?
@afilipovich no problem. The ones you see in the screenshots in https://github.com/google/safebrowsing/issues/30#issuecomment-378805724 and https://github.com/google/safebrowsing/issues/30#issuecomment-378807286 are two examples. I don't want to paste the URLs directly, as I've seen that send the GitHub email notification to spam before.
@summera I suggest that you open this as an issue directly in https://github.com/afilipovich/gglsbl, and I will close this one, ok? Hope the ECS guidance was able to help you with gglsbl-rest.
@asieira yep, thanks for the help and info!
@asieira would you mind explaining the configuration settings you've used to run gglsbl-rest on ECS? For example:
Any other information you think would be helpful to get started, such as autoscaling settings, would be great. ECS is pretty new to me.
Thanks in advance!