google / safebrowsing

Safe Browsing API Go Client
Apache License 2.0

Can you please clarify the URL format that needs to be sent in the query? #101

Closed. ealashwali closed this issue 5 years ago.

ealashwali commented 5 years ago

In the example API HTTP request, the URL entries in the query body are written in two different formats: the second entry includes http:// and ends with /, while the first has neither http:// nor https:// and does not end with /.

Does adding http:// or https://, or ending the URL with /, make a difference? I tested with and without them and there seems to be no difference. But I wonder how that can be, since the hashes with and without them are different. Can you please briefly clarify?

Here is the example from your home page that I am referring to:


```json
"threatEntries": [
    {"url": "google.com"},
    {"url": "http://testsafebrowsing.appspot.com/apiv4/ANY_PLATFORM/MALWARE/URL/"}
]
```

colonelxc commented 5 years ago

First, there is a canonicalization step, which ensures that every URL has at least a path of "/" (even if none was supplied, as in the google.com case).
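As a rough illustration of the "/" rule, here is a heavily simplified Go sketch. The `canonicalize` function is hypothetical and is not part of the safebrowsing package: it assumes `http://` when no scheme is given, lowercases the host and trims surrounding dots, drops the fragment, and supplies a "/" path. It skips most of the real rules (repeated percent-unescaping, IP normalization, `/../` resolution and so on).

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// canonicalize is a hypothetical, heavily simplified stand-in for the
// Safe Browsing canonicalization step. It is NOT the full algorithm.
func canonicalize(raw string) (string, error) {
	raw = strings.TrimSpace(raw)
	// Assume http:// when no scheme is given, so "google.com" parses as a host.
	if !strings.Contains(raw, "://") {
		raw = "http://" + raw
	}
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	// Hostname() discards any port; this sketch does not try to preserve it.
	host := strings.Trim(strings.ToLower(u.Hostname()), ".")
	path := u.EscapedPath()
	if path == "" {
		path = "/" // every canonical URL has at least the root path
	}
	out := strings.ToLower(u.Scheme) + "://" + host + path
	if u.RawQuery != "" {
		out += "?" + u.RawQuery
	}
	return out, nil // the fragment (#...) is intentionally dropped
}

func main() {
	for _, in := range []string{
		"google.com",
		"http://testsafebrowsing.appspot.com/apiv4/ANY_PLATFORM/MALWARE/URL/",
		"HTTPS://Google.COM.",
	} {
		c, err := canonicalize(in)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", in, c)
	}
}
```

With these rules, `google.com` becomes `http://google.com/`, which is why the two entries in the example request are both accepted.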

After canonicalization, the given URL is turned into a set of host-suffix/path-prefix expression combinations, as described here: https://developers.google.com/safe-browsing/v4/urls-hashing#suffixprefix-expressions

Those expressions, which are what actually gets hashed, do not include the scheme (http://).
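To see why adding http:// or https:// makes no difference, here is a small Go sketch that reduces a URL to its exact expression and hashes it with SHA-256, keeping a 4-byte prefix. `expressionFor` and `hashPrefix` are hypothetical helpers; `expressionFor` only strips the scheme and adds the "/" path, so it is not a full canonicalizer.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// expressionFor is a hypothetical helper that reduces an already-simple URL
// to its exact "host/path" expression: it drops any scheme and supplies the
// "/" path. It does not perform full canonicalization.
func expressionFor(rawURL string) string {
	if i := strings.Index(rawURL, "://"); i >= 0 {
		rawURL = rawURL[i+3:]
	}
	if !strings.Contains(rawURL, "/") {
		rawURL += "/"
	}
	return rawURL
}

// hashPrefix returns the first n bytes of the SHA-256 hash of expr, hex encoded.
func hashPrefix(expr string, n int) string {
	sum := sha256.Sum256([]byte(expr))
	return hex.EncodeToString(sum[:n])
}

func main() {
	for _, u := range []string{"google.com", "http://google.com", "https://google.com/"} {
		e := expressionFor(u)
		fmt.Printf("%-22s expr=%-12s prefix=%s\n", u, e, hashPrefix(e, 4))
	}
}
```

All three inputs map to the expression `google.com/` and therefore to the same hash prefix, whereas hashing a string that still contained the scheme would give a different value.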

ealashwali commented 5 years ago

So am I understanding correctly that if I supply a list of URLs starting with https://, the check will still work? I believe so, but I would appreciate a confirmation.