shlomitsur opened this issue 6 months ago

Hi, our company has seen an almost 100% increase in our webrisk bills over the last 2 months. We are using the Docker version in Kubernetes, and we are trying to understand what could be causing this.
Thank you
Hi @shlomitsur,
Thanks for letting us know about this -- one possibility for the increase is that there have been recent backend changes to Safebrowsing & Web Risk that expanded our blocklist coverage to protect against more unsafe URLs. This additional coverage means there are more hash prefixes that need to be looked up via API calls to verify blocklist status, which can translate into a higher number of API calls.
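For context on why coverage growth shows up as API spend: in the Update API model, a local hash-prefix hit has to be confirmed with a `hashes.search` call, so a bigger prefix set means more confirmations. A minimal illustration against the local wrserver proxy (the port and the `uris:search` endpoint path are assumptions based on the README; adjust to your deployment):

```sh
# Query the local wrserver proxy (path and port assumed from the README).
# When the URL's hash prefix matches the local blocklist, the proxy must
# confirm the full hash upstream -- that confirmation is the billable call.
curl -G "http://localhost:8080/v1/uris:search" \
  --data-urlencode "uri=http://testsafebrowsing.appspot.com/s/phishing.html"
```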
If you are trying to limit spend, a couple suggestions:
If you only require a subset of these threat types, you can limit your selection when running `wrserver` with the `--threatTypes=...` arg. For example, `docker run <...> wr-container --threatTypes=SOCIAL_ENGINEERING,SOCIAL_ENGINEERING_EXTENDED_COVERAGE` would only look up the two social engineering threat types.
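To make that concrete, here is a fuller sketch of the invocation (the `wr-container` tag comes from the example above; the port mapping and the `APIKEY` environment variable are assumptions, so match them to how you actually build and run the image):

```sh
# Hypothetical full invocation serving only the two social engineering
# threat types. The port mapping and APIKEY handling are assumptions --
# substitute your own image name, port, and key configuration.
docker run -p 8080:8080 -e APIKEY="your-api-key" wr-container \
  --threatTypes=SOCIAL_ENGINEERING,SOCIAL_ENGINEERING_EXTENDED_COVERAGE
```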
Other tips we could give might be more use-case specific.
We are also looking into ways to make the container more efficient with API calls, particularly by allowing instances to share a cache -- are you running multiple instances of the container that might benefit from this? Or is there a collection of the same URLs that you need to look up repeatedly?
Thank you @rvilgalys! I would also like to know about operating in multiple regions: we are using AWS with the Oregon and Frankfurt regions. Can we run a webrisk Docker container per region using the same API key and somehow share a cache to reduce cost? Thanks!
We don't yet support a shared cache, but it's part of the roadmap -- likely we will have an option to set a Redis target as a shared cache, and if set, the client will use it alongside its own local cache.
One feature I did just enable was the use of `maxDatabaseEntries` and `maxDiffEntries`. These were already part of our API but hadn't been included in this client; if you want further control over the blocklist size (and, indirectly, over spend on API lookups), you can set an upper limit with `maxDatabaseEntries`.
See details in https://github.com/google/webrisk?tab=readme-ov-file#configuration
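If you want to experiment with these limits, a sketch of how they could be passed to the container (the flag names follow this comment and the README's configuration section; the numeric values are arbitrary placeholders, not tuning recommendations):

```sh
# Cap the local database size and the per-update diff size to bound
# blocklist growth (and, indirectly, lookup spend). The values below
# are placeholders only, not recommendations.
docker run -p 8080:8080 -e APIKEY="your-api-key" wr-container \
  --maxDatabaseEntries=1000000 \
  --maxDiffEntries=100000
```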
I'm having this issue as well. What would be some good values for `maxDatabaseEntries` and `maxDiffEntries` to keep blocklists relatively up to date while lowering costs, @rvilgalys?
Thank you @rvilgalys!
@corey-Robinson1337 sorry, I don't have any guidance on this -- we just noticed the API supported these limits, but our client here didn't have a way to set them.
These limits are a carryover from the noncommercial Safebrowsing API and were mainly intended for running it in resource-limited environments (like on mobile devices). I'm not sure how (or whether) the hashes sent in a constrained blocklist response get prioritized.
We have a quick update on this:
Our team recently identified about 1.5M URLs we believe were out of date and could be safely removed from the Hash Prefix lists. Over the course of this week those patterns were removed from the threatList diffs.
Hopefully this will also help cut down on spend while offering the same protection.
Thanks again for helping bring this issue to our attention @shlomitsur @corey-Robinson1337
thanks @rvilgalys for the update