google / webrisk

Lookup API - Support multiple URLs to test #66

Open drveresh opened 3 months ago

drveresh commented 3 months ago

If there is currently no such option, could you please enhance the API to support testing multiple URLs against multiple threatTypes in a single HTTP or API call?

This would significantly improve performance and reduce load on both the client side and the Google Web Risk service side.

Currently, I need to validate roughly 15-20K links per day, sometimes with 50-100 new links arriving within a minute, so it is critical to be able to validate multiple links in one shot, as a batch call. Please accommodate this request.

Ref - https://cloud.google.com/web-risk/docs/reference/rest/v1beta1/uris/search
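
To illustrate the one-request-per-URL pattern this implies today, here is a minimal sketch that only builds the request URLs and sends nothing. It assumes the `uris:search` endpoint from the reference above takes a single `uri` query parameter plus a repeated `threatTypes` parameter and an API `key`; the function name `build_search_urls` and the sample values are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint, based on the v1beta1 reference linked above.
SEARCH_ENDPOINT = "https://webrisk.googleapis.com/v1beta1/uris:search"


def build_search_urls(uris, threat_types, api_key):
    """Return one request URL per URI; each request lists every threat type."""
    urls = []
    for uri in uris:
        # One `uri` per request (assumption: the endpoint accepts only one).
        params = [("uri", uri)]
        # Repeat `threatTypes` once per threat type to check.
        params += [("threatTypes", t) for t in threat_types]
        params.append(("key", api_key))
        urls.append(SEARCH_ENDPOINT + "?" + urlencode(params))
    return urls
```

With N links this yields N separate HTTP calls regardless of how many threat types are checked, which is the overhead a batch endpoint would remove.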

rvilgalys commented 3 months ago

Hi @drveresh!

Thanks for this feature request. It's something we will consider as we look at more API work; we are looking into how we can better handle large sets of URLs for batch processing like this. If you'd like to share any more details of your use case, that would help inform our understanding.

Currently, you can use the wrlookup binary that is part of this repo to do some batch processing. wrlookup is a client-side tool that reads URLs from STDIN, one per line, and writes the results to STDOUT.

You can run a batch through it with something like `cat urls.txt | wrlookup -apikey=XXXYYY...`, or use [mkfifo](https://man7.org/linux/man-pages/man3/mkfifo.3.html) to run wrlookup in the background, using two files as buffers.
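
If you'd rather drive this from a script than a shell pipeline, the same idea can be sketched in Python. This is only a rough illustration: `run_batch` is a hypothetical helper, it assumes nothing about wrlookup's output format and simply returns the raw STDOUT lines, and the API key is left as a placeholder.

```python
import subprocess


def run_batch(urls, command):
    """Write URLs to the command's STDIN, one per line, and return its STDOUT lines."""
    payload = "".join(url + "\n" for url in urls)
    result = subprocess.run(
        command,
        input=payload,        # sent to the child process on STDIN
        capture_output=True,  # collect STDOUT and STDERR
        text=True,            # work with str instead of bytes
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout.splitlines()
```

Usage would look something like `run_batch(urls, ["wrlookup", "-apikey=XXXYYY..."])`, with the returned lines parsed however suits your pipeline.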

I realize this isn't as convenient as an API that can take in large batches, but it might serve as a workaround.