google / physical-web

The Physical Web: walk up and use anything
http://physical-web.org
Apache License 2.0

forwarder URL caching #820

Closed ferencbrachmann closed 7 years ago

ferencbrachmann commented 8 years ago

We're developing a forwarder for beacon URLs that enables our customers to change the destination pages of their beacons without reconfiguring the beacon. Is there any documentation on how the caching of destination URLs currently works? We'd like to understand the caching mechanism so we can tell our clients how long they might have to wait for a redirect change to go through Google.

angst7 commented 8 years ago

I'm interested in knowing more about the way PWS caching works, too. I'd asked a related caching question in #785, and there is more on this topic in #739. In my experience, the PWS appears to recheck the URL about every 5 minutes.

scottjenson commented 8 years ago

I'll let @mmocny give a more detailed answer, but in general, caching behavior is not guaranteed. It could change depending on a wide range of factors we just can't predict. At the moment it is 5 minutes, but keep in mind that if you're using Opera's scanner, they'll have a different caching story. While Opera's usage is currently small, we hope eventually to have multiple scanners.

In the longer run, it would make sense to respect HTTP cache headers so that a) sites that don't change very often can be cached longer and b) there is some consistency between the various scanners. It's unlikely that we'll go much shorter than 5 minutes, for the simple reason that as the Physical Web grows, it becomes increasingly hard to keep it up to date. It's possible, of course; I'm just saying that it gets harder.

Of course, if there are any changes we could make that would make your life easier, please let us know but keep in mind that we'd like to do this from the web side of things, not through a specific scanner as that doesn't scale well.

scottjenson commented 7 years ago

@ferencbrachmann Was this answer sufficient or would you like more detail?

ferencbrachmann commented 7 years ago

@scottjenson The answer was sufficient but I'm still getting way too many inconsistencies in beacon discovery (from caching lag I presume). I'm talking to a client as we speak and trying to extract enough information for reporting an Issue.

itsMattShull commented 7 years ago

@ferencbrachmann our team has a short URL service with an API specifically for beacons. If you want you can have full access to it. It provides analytics as well. I know this isn't related to your question but just wanted you to know!

ferencbrachmann commented 7 years ago

@derekshull we have one too, thanks!

mmocny commented 7 years ago

Sorry for being late to the party here:

Regarding caching: Today PWS fetches from redirectors no more than once every 5 minutes, and uses cached results in between. We do respect Cache-Control headers, which lets us cache for longer than 5 minutes, but most redirectors ask not to be cached. So extended caching mostly applies only to destination pages.
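To illustrate the header behaviour described above, here is a minimal sketch of how a forwarder might build its redirect response. The function name `build_redirect`, the choice of a 302 status, and the `no-store` default are assumptions for illustration, not anything PWS documents. Per the comment above, a `max-age` longer than 5 minutes may extend caching, while asking for no caching should not be expected to shorten the 5-minute refetch interval.

```python
from typing import Dict, Optional, Tuple


def build_redirect(destination: str,
                   max_age_seconds: Optional[int] = None) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a temporary redirect to `destination`.

    Hypothetical helper: with no max_age_seconds the response asks not to be
    cached, which is what most redirectors do according to the thread.
    """
    if max_age_seconds is None:
        cache_control = "no-store"
    else:
        cache_control = f"max-age={max_age_seconds}"
    return 302, {"Location": destination, "Cache-Control": cache_control}


if __name__ == "__main__":
    # A redirector that wants changes picked up as soon as possible.
    print(build_redirect("https://example.com/landing"))
    # A rarely changing destination that tolerates one hour of caching.
    print(build_redirect("https://example.com/landing", max_age_seconds=3600))
```

The returned status and headers can be passed to whatever web framework the forwarder already uses.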

Caveat: There is also client-side caching on users' phones. This is to make sure that phones don't make network calls too frequently. Today, with Nearby Notifications, it may take up to 1 hour to update client-side caches. We know this can be an issue, and we are working on heuristics to decrease this latency. This caveat only applies to clients that have recently seen the beacon -- new clients will get the new value.
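For the original question of how long a client might wait for a changed redirect, the figures quoted above can be combined into a rough upper bound. This is only a sketch: it assumes the server-side and client-side delays simply add up, which the thread does not confirm, and both constants are the values quoted here at the time, not guarantees.

```python
SERVER_CACHE_MINUTES = 5    # PWS refetch interval quoted in this thread
CLIENT_CACHE_MINUTES = 60   # "up to 1 hour" for Nearby Notifications


def worst_case_delay_minutes(client_has_seen_beacon: bool) -> int:
    """Rough upper bound on how stale a redirect can look to a client.

    Assumption: the two cache delays stack; real behaviour may differ.
    """
    delay = SERVER_CACHE_MINUTES
    if client_has_seen_beacon:
        delay += CLIENT_CACHE_MINUTES
    return delay


if __name__ == "__main__":
    print("new client:", worst_case_delay_minutes(False), "minutes")
    print("returning client:", worst_case_delay_minutes(True), "minutes")
```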

Regarding analytics: Sometimes PWS will fetch from a redirector because someone is physically near a beacon -- but this is actually rare. Most user requests don't go to the redirector at all. Sometimes the PWS just periodically fetches from redirectors to prefetch caches without any user traffic at all.

Unfortunately, I cannot give firm specifications, because our fetch policies actually change all the time as we grow and experiment. Any "general long term trends" tracking is likely useless, because it is affected more by our backend changes than by real user behaviour.

As such, any analytics tracking at the redirector level will both significantly undercount and slightly overcount user data. It's probably very misleading.

Instead, I would recommend tracking real user clicks with analytics at the final landing page (either via web-server requests or via JS analytics packages). I also recommend augmenting beacon URLs with, for example, a specific analytics campaign parameter.
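One way to follow the campaign suggestion is to tag the landing-page URL with campaign parameters before a forwarder redirects to it. The sketch below uses the common `utm_*` parameter names; the specific values (`physical-web`, `beacon`) and the helper name `add_campaign` are assumptions for illustration only.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_campaign(url: str, beacon_id: str) -> str:
    """Append campaign parameters to `url`, keeping any existing query string."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += [
        ("utm_source", "physical-web"),   # hypothetical value
        ("utm_medium", "beacon"),         # hypothetical value
        ("utm_campaign", beacon_id),      # one campaign per beacon
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    print(add_campaign("https://example.com/menu?lang=en", "lobby-01"))
```

Because the parameters arrive with the landing-page request, they are counted by page-level analytics rather than at the redirector, which is the point of the recommendation above.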

Hopefully this clarifies some of the existing behaviours, but please know that we are always trying to improve, and that means things will change. Improvements have to balance what is best for users, not just what is best for beacon deployers.

Good luck!

angst7 commented 7 years ago

That's a big help. Thanks! This answers my questions from #785 so I'll mark that closed.

jsiebens commented 7 years ago

@angst7 How does this explain that URLs are being fetched in bursts from different devices? We are experiencing the same issue you described in #785.

angst7 commented 7 years ago

@jsiebens I read the following:

Sometimes the PWS just periodically fetches from redirectors to prefetch caches without any user traffic at all.

as a possible explanation for PWS servers requesting outdated short URLs. It does not address the secondary issue of why those requests, coming from Google IPs, present multiple, varying user agents. I'll reopen #785 so you can add any info you have on this. I wasn't aware that others were running into the same issue.

mmocny commented 7 years ago

@angst7 I added more context to that bug specifically.

I do not think it is cache prefetching, since that would also use the Google-PhysicalWeb UserAgent -- but it may be other systems. Let's continue on that thread.
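Since the comment above says cache prefetching identifies itself with the Google-PhysicalWeb UserAgent, a forwarder could separate those fetches from other traffic in its logs. This is a sketch only: the full user-agent string is not given in this thread, so matching on the substring is an assumption, and the sample strings below are made up.

```python
from typing import Iterable, List, Tuple

# Name taken from this thread; the exact full UA string is an assumption.
PWS_UA_MARKER = "Google-PhysicalWeb"


def is_pws_fetch(user_agent: str) -> bool:
    """True if the user agent looks like a PWS fetch (case-insensitive substring match)."""
    return PWS_UA_MARKER.lower() in user_agent.lower()


def split_hits(user_agents: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Partition logged user agents into (pws_fetches, everything_else)."""
    pws, other = [], []
    for ua in user_agents:
        (pws if is_pws_fetch(ua) else other).append(ua)
    return pws, other


if __name__ == "__main__":
    sample = [
        "Mozilla/5.0 (compatible; Google-PhysicalWeb)",  # made-up example
        "Mozilla/5.0 (Linux; Android 7.0) Chrome/58.0",   # made-up example
    ]
    pws_hits, other_hits = split_hits(sample)
    print(len(pws_hits), "PWS fetches,", len(other_hits), "other requests")
```

Even with this split, the remaining requests are not a reliable count of real users, for the reasons given earlier in the thread.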

mmocny commented 7 years ago

Closing this bug since we've documented the caching behaviour as requested.

We may want to move something into a FAQ but I'm hesitant to put anything in stone, since this stuff changes over time.

shailesh17mar commented 7 years ago

@mmocny After going through your reply, I have two questions. 1) Why doesn't the Physical Web provide data, or some way of indicating, the number of impressions a particular beacon has served, given that this wouldn't involve tracking the user? 2) There are 2-3 beacon companies, like Estimote and Beaconstac, which claim that they are able to track the number of impressions. How do you think that's possible? As you mentioned, I have monitored Physical Web requests in my nginx logs and they're just random.