Closed Abdelhady closed 5 years ago
Hello!
I can't reproduce this. What version of MetaInspector are you using? I've tried with 5.3.1 and it works fine; the demo also works:
I'm currently using 5.0.1, and actually the demo is not working: it doesn't return a title, description, images, or anything else!
Try the non-mobile version of the same URL; it will give you the correct results.
It looks like the NY Times server is returning different content, probably based on the request IP.
You're right that the demo doesn't show anything now, but it did when I tried before.
Also, when I try from my development machine, I can get all the information:
2.3.1 :008 > p.title
=> "Mark Zuckerberg Is in Denial - NYTimes.com"
2.3.1 :009 > p.description
=> "CHAPEL HILL, N.C. — Donald J. Trump’s supporters were probably heartened in September, when, according to an article shared nearly a million times on Facebook, the candidate received an endorsement from Pope Francis. Their opinions on Hillary Clinton may have soured even further after reading a Denver Guardian article that also spread widely on Facebook, which reported days before the election that an F.B.I. agent suspected of involvement in leaking Mrs. Clinton’s emails was found dead in an apparent murder-suicide."
Are you trying from a server or from your dev machine? I suggest trying from a different computer to see if it works there. If it does, I'm afraid we can't fix it in the code: if the remote server returns different content based on the location the request is made from, then that's the only HTML we can parse.
You could also try setting a different User-Agent string; maybe the server returns different content based on that.
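To make the User-Agent suggestion concrete, here is a minimal sketch using only Ruby's standard library. The helper names `build_request` and `fetch` and the `DESKTOP_UA` string are made up for illustration, not part of MetaInspector. I believe MetaInspector also accepts a `headers:` option for the same purpose, but check the README for your version.

```ruby
require "net/http"
require "uri"

# Illustrative desktop browser User-Agent (an assumption; any string can be used).
DESKTOP_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0"

# Hypothetical helper: builds, but does not send, a GET request
# carrying the given User-Agent header.
def build_request(url, user_agent)
  uri = URI.parse(url)
  request = Net::HTTP::Get.new(uri.request_uri)
  request["User-Agent"] = user_agent
  request
end

# Hypothetical helper: sends the request and returns the response object.
def fetch(url, user_agent)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(build_request(url, user_agent))
  end
end
```

Calling `fetch` with two different User-Agent strings and comparing the response bodies should show whether the server varies its content on that header.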
Well, I first noticed it in our production environment, located in the "US East (N. Virginia)" region, but then my development machine gave the same empty results. At first I suspected the older version I'm using (5.0.1), which is why I tried the demo, and it gave me the same results.
I think it is somehow related to the NY Times' mobile version, because their normal version works fine for me in both the production environment and on my dev machine.
There's definitely something weird with that URL; now it's failing on my dev machine as well.
It still depends on what the server returns, which seems to be changing as it sometimes worked fine for me.
Now, what I see is a lot of scrambled text instead of a document:
2.2.4 :020 > p.url
=> "http://mobile.nytimes.com/2016/11/15/opinion/mark-zuckerberg-is-in-denial.html"
2.2.4 :021 > p.title
=> ""
2.2.4 :022 > p.to_s
=> "\u001F\x8B\b\u0000\u0000\u0000\u0000\u0000\u0000\u0003\xEC\xBDݒ\xDBH\xB6.v\xEF\xA7@s\xDCRqD\x80\u0000H\xF0\xA7JT\x9Fj\xB5zZ\xFBHݲ\xA4\xEE\x9E\xD9\xDA\u001A\u0005H\x82E\xB4@\x82\u0003\x80U*\x95jb\xDF\xF9\u0005\u001C\x8Ep\xC4\xF1#\xF8\xC2w\xBE\xF7\x9B\xECp\xF89\xBC\xBE\x95\t \xF1G\xB2J\xA59\xB3\u001D\xBD{O\t\u00042W\xAE\\\xB9\xFE\xF3\xEF\xE1W\xDF\xFD\xF4\xF8\xF5_^<і\xC9*x\xF4\u0010\u007F\xB5\xC0]\x9FMZ\u07BA\xA5\xCD\u00027\x8E'\xAD\u0016}\xF0\xDC\xF9\xA3\x87+/q\xA9d\xB2ѽ\
(...)
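The first bytes of that output, `\u001F\x8B\b`, match the gzip magic number (`1F 8B`) followed by the deflate method byte (`08`), which suggests the body arrived gzip-compressed and was never decoded. A minimal sketch for checking and decoding such a body with Ruby's standard library follows; `gzip?` and `decode_body` are hypothetical helper names, not MetaInspector methods.

```ruby
require "zlib"
require "stringio"

# The two-byte gzip magic number, as a binary string.
GZIP_MAGIC = "\x1F\x8B".b

# Returns true when the body starts with the gzip magic number.
def gzip?(body)
  body.byteslice(0, 2).to_s.b == GZIP_MAGIC
end

# Returns the decompressed body if it looks gzipped, else the body unchanged.
def decode_body(body)
  return body unless gzip?(body)
  Zlib::GzipReader.new(StringIO.new(body)).read
end
```

If `gzip?(p.to_s)` is true for the failing response, the problem would be the encoding of the response rather than the parser. Another option, assuming the server honors it, is to send an `Accept-Encoding: identity` request header so the body comes back uncompressed.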
Proof that it's an intermittent error:
This URL can't be scraped and gives no results at all (maybe because it is the mobile version of that website!)