Closed rochforp closed 8 years ago
Can you provide a sample URL which you have on a beacon? Then I can give you a curl command which uses our Physical Web Service (PWS) to resolve the URL into metadata. All the PW clients use a PWS to actually get page metadata, so the errors shouldn't be client specific.
If LinkedIn is doing something specific to block google-bot from crawling its pages, this could explain the issue.
Can the source code be loaded to my google app engine? @mmocny
@Jerren34 you can find a sample PWS hosted right in this project.
It is written using python GAE, which you can run yourself locally or publish to your GAE account.
The Physical Web standalone apps use a version of this sample PWS hosted by us on GAE, but the Physical Web feature of Chrome uses a different PWS backend.
I'm trying to upload it to App Engine right now but I'm having trouble. Do you mind helping me? I can't get it to load and don't know what I'm doing wrong.
I have a Google appspot account I'm using now. I have downloaded the App Engine Launcher, Python 2.7, and the App Engine SDK, but I can't get it up and running. Mind helping me?
@mmocny sure, this is a shortened LinkedIn profile page: goo.gl/v6QFk9. As you can see through the Physical Web app, it doesn't render the metadata like other broadcasted, shortened site URLs. On iOS, PhyWeb shows the correct URL but doesn't pull any metadata. Android doesn't even show the link in the list of Physical Web beacons, and Chrome on the iOS Today screen does the same thing: it doesn't render any of the data.
@rochforp Have you seen examples of this with anything other than LinkedIn?
That particular landing page for goo.gl/v6QFk9 does not have the required metadata. On the landing page there is no meta tag for the description (meta name="description" content="...."). Without that you won't get the main thing, the description, displayed. It does have the other tag needed.
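One way to check a landing page for the tags discussed here is a small parser built on Python's standard library. This is only a sketch: the names `MetaTagParser` and `extract_metadata` are made up for illustration, and it is not a description of how the PWS itself extracts metadata.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects the <title> text and the <meta name="description"> content."""

    def __init__(self):
        super().__init__()
        self.title = None
        self.description = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            # Attribute values can be None, so guard before lowercasing.
            if (attr.get("name") or "").lower() == "description":
                self.description = attr.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text inside <title> is kept; script bodies are ignored.
        if self._in_title:
            self.title = (self.title or "") + data


def extract_metadata(html):
    """Return the title and description found in an HTML string (None if absent)."""
    parser = MetaTagParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title, "description": parser.description}
```

A page that yields `None` for `description` here is missing the tag described above.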
Observations: PW list on iOS shows shortened URL, yet sometimes shows LANDING URL. Don't know why. PW list on Android shows LANDING URL normally, but on this one only shortened URL with loading... Seems stuck.
It seems the LinkedIn site isn't optimized for PW queries, yet.
@tolson2000 How would that differ from a bot coming along? Could we implement the same methods?
If LinkedIn is not optimized for a title/description/favicon PW query, we should probably augment the PW to include older query methods. I would suspect that LinkedIn is not unique.
@kahjav So far it's been isolated to just linkedin but I'm concerned that it's possibly indicative of a problem that might exist on other sites as well.
First, just a quick aside: Chrome for Android will only show results which link to https pages. It looks like your short URL does redirect to https://www.linkedin.com/in/tlytle so this should be fine. Chrome for iOS still supports non-https but we will be making a switch soon. The Physical Web app will likely continue to show all results.
Second, some of our clients will show the raw URL if we fail to fetch page metadata. The intention was that local-intranet-only or local-development-machine-only URLs, which our PWS could not fetch, should still be available. However, our direction these days is to just filter these results out and we will add some "advanced" views where you will be able to see all results. This is less developer friendly but more user friendly. This may explain why iOS Physical Web app shows just the green link, while other clients show nothing.
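The two client behaviours described here (show the raw URL when metadata is missing, or filter the result out) can be sketched as follows. The function name `visible_results`, the `show_unresolved` flag, and the dictionary shapes are all hypothetical; no real client is known to be structured this way.

```python
def visible_results(scanned_urls, metadata_by_url, show_unresolved=False):
    """Decide which scanned URLs a client lists.

    metadata_by_url maps a URL to its metadata dict, and simply lacks
    an entry for any URL the PWS could not resolve (assumed shape).
    """
    results = []
    for url in scanned_urls:
        meta = metadata_by_url.get(url)
        if meta is not None:
            results.append({"url": url, "title": meta.get("title")})
        elif show_unresolved:
            # "Advanced" view: keep the raw URL even without metadata.
            results.append({"url": url, "title": None})
    return results
```

With `show_unresolved=False` an unresolved URL disappears from the list; with `True` it shows up as a bare link.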
Finally, the root of the problem appears to be that the PWS is not resolving your url:
$ curl -k -s -H "Content-Type: application/json" -d '{"objects":[{"url":"http://goo.gl/v6QFk9"}]}' http://url-caster.appspot.com/resolve-scan | python -m json.tool
{
    "metadata": []
}
I will attempt to debug and resolve this issue. Thank you for raising it!
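For anyone who wants to script the same check, here is a rough Python equivalent of the request body and the "did it resolve?" test from the curl example. The helper names `build_resolve_body` and `is_resolved` are invented for this sketch; only the `objects`/`url` request shape and the `metadata` response key come from the example above.

```python
import json


def build_resolve_body(urls):
    """Build the JSON body from the curl example: {"objects": [{"url": ...}]}."""
    return json.dumps({"objects": [{"url": u} for u in urls]})


def is_resolved(response_text):
    """Return True if the response contains at least one metadata entry."""
    response = json.loads(response_text)
    return len(response.get("metadata", [])) > 0
```

Feeding the response shown above into `is_resolved` returns `False`, which matches the empty `metadata` list.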
This may not be the only bug, but it appears that LinkedIn returns a different page depending on the requesting user agent. The page being served to PWS right now may be an empty page with only inline javascript and no semantic page information at all.
$ curl -L http://goo.gl/v6QFk9
<html><head>
<script type="text/javascript">
window.onload = function() {
  // Parse the tracking code from cookies.
  var trk = "sentinel_org_block";
  var cookies = document.cookie.split("; ");
  for (var i = 0; i < cookies.length; ++i) {
    if ((cookies[i].indexOf("trkCode=") == 0) && (cookies[i].length > 8)) {
      trk = cookies[i].substring(8);
    }
  }
  // Get the protocol for the redirect url.
  var protocol = "http:";
  if (window.location.protocol == "https:") {
    protocol = "https:";
  } else {
    // If "sl" cookie is set, redirect to https.
    for (var i = 0; i < cookies.length; ++i) {
      if ((cookies[i].indexOf("sl=") == 0) && (cookies[i].length > 3)) {
        window.location.href = "https:" + window.location.href.substring(window.location.protocol.length);
        return;
      }
    }
  }
  // Get the new domain. For touch.www.linkedin.com or tablet.www.linkedin.com
  // we strip "touch." and "tablet.". For international domains such as
  // fr.linkedin.com, we convert it to www.linkedin.com
  var domain = location.host;
  if (domain.substr(0, 6) == "touch.") {
    domain = domain.substr(6);
  } else if (domain.substr(0, 7) == "tablet.") {
    domain = domain.substr(7);
  } else if (domain.charAt(2) == ".") {
    domain = "www" + domain.substr(2);
  }
  // D8E90337EA is the tracking code proposed by Harsh, representing guest request redirected
  // to login.
  window.location.href = protocol + "//" + domain + "/uas/login?trk=" + trk + "&session_redirect=" +
      encodeURIComponent(protocol + "//" + domain +
      window.location.href.substr(window.location.href.search(window.location.host) +
      window.location.host.length));
}
</script>
</head></html>
Our current PWS does not evaluate JavaScript. We do not run the page through a headless browser.
However, once I change the User-Agent header I get the real data:
$ curl -L -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8" -H "user-agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.48 Safari/537.36" https://www.linkedin.com/in/tlytle
<...Lots of HTML...>
We already set a custom user agent, but I will see about adjusting it slightly.
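To compare what a server returns for different user agents from a script rather than curl, something like the following could work. It is a sketch using only Python's standard library; `make_request` and `fetch` are hypothetical helper names, and `BROWSER_UA` is simply the user-agent string from the curl command above.

```python
import urllib.request

# User-agent string copied from the curl example above.
BROWSER_UA = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/48.0.2564.48 Safari/537.36")


def make_request(url, user_agent=None):
    """Build a request, optionally overriding the User-Agent header."""
    headers = {}
    if user_agent:
        headers["User-Agent"] = user_agent
    return urllib.request.Request(url, headers=headers)


def fetch(url, user_agent=None, timeout=10):
    """Fetch a page body as text; urllib follows redirects by default.

    Non-success status codes raise urllib.error.HTTPError.
    """
    request = make_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `fetch(url)` and `fetch(url, BROWSER_UA)` on the same URL and comparing the two bodies is one way to see whether a site varies its response by user agent.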
@mmocny When does/will Chrome check for https? Would non-https URL shorteners still be able to be utilized if the destination is https?
@kahjav The PW is specifically looking for the "description" of a page. Places like LinkedIn and Facebook don't bother making a "description" for a user's info page. In the case of Facebook, when they are hit by a bot their server substitutes a generic description for their users, effectively just saying JoeBlow is on Facebook, and then throws a commercial at you: join now to connect. Kind of useless for a PW concept. Companies like realtors don't always have a "description" field for their employee pages, and for bots they will substitute something like "JoeBlow is a realtor for ...". If you want a page for your resume, you need to make one specifically with the tags and metadata fields needed to work with PW. Probably what we need to do is ask LinkedIn to provide a meaningful "description" in their metadata.
Here's my test page. It's a shortened URL with http which redirects to an https page on another server. Here again, though, on my iPod it shows the shortened URL and on Android it shows the landing page URL. http://io.ivt.com/to9
EDIT: Forgot to mention on that test page there is no "description" meta data. Yet the PWS made a description from my h1 and other hl links. Not sure how that works.
@kahjav Coincidentally I just got this question! Yes, at the moment you can use http redirector to link to https destination. The https requirement is only on the final URL, and it is to protect the user (so compromised network cannot replace the page contents, effectively having Chrome send users to the wrong destination). Since the redirect loop is done on PWS we are less concerned about compromised networks.
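The rule described here, that only the final URL must be https, can be illustrated with a tiny check over a redirect chain. This is a sketch of the stated policy, not Chrome's actual code; the function name `passes_https_policy` and the idea of passing the chain as a list are assumptions made for the example.

```python
from urllib.parse import urlsplit


def passes_https_policy(redirect_chain):
    """Return True if the last URL in the chain (the final destination) is https.

    redirect_chain is a list of URLs, first the broadcast URL and last
    the page the user would land on (assumed representation).
    """
    if not redirect_chain:
        return False
    return urlsplit(redirect_chain[-1]).scheme == "https"
```

An http shortener pointing at an https page passes, because only the last entry is inspected.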
Yeah, I don't know if I agree with having to have the landing page be https: That isn't always possible. Will you be allowing self-signed certs?
Alright, it looks like we are failing to resolve LinkedIn because they are explicitly blocking crawling: HTTP/1.1 999 Request denied. I haven't pursued ways to circumvent this, nor do I intend to.
@tolson2000 We don't currently do much certificate verification, but if you send a user to such a page the browser will likely put up a big red flag, so is unlikely to be a good idea outside of local testing. And for local testing we hope to make it easier to just show all results.
To be clear: the https requirement is not a big change to physical web or the Eddystone-URL BLE frame format. It is only a policy decision by the developers of Chrome browser for its integration of the Physical Web feature, and meant to safeguard its users.
I've noticed that when broadcasting an Eddystone-URL for some LinkedIn profiles through a URL shortener like bit.ly or goo.gl, the title, description and favicon are not being populated in some Physical Web browsers like Chrome and PhyWeb. I'm trying to figure out why this is, but it appears to occur a little differently depending on the mobile OS and the browser. I looked a little further and thought it could be occurring somewhere in PWMetadataRequest.m, but then happened upon this Stack Overflow post (https://stackoverflow.com/questions/32450569/linkedin-isnt-letting-me-google-users-anymore-sentinel-org-block#) talking about the problems with the sentinel org tool they are using. Has anyone else noticed this or have any ideas about fixing this?