valpackett opened 6 months ago
This is part of a function, not JSON:
```html
<script>
(function() {
  var random = Math.random();
  var mirrorScript = document.createElement("script");
  mirrorScript.src = "//sofire.bdstatic.com/js/xaf3.js" + '?v=' + random;
  mirrorScript.setAttribute('async', 'async');
  mirrorScript.setAttribute('data-bdms-faccdee21b68', '...')
  var firstScriptDom = document.getElementsByTagName("script")[0];
  firstScriptDom.parentNode.insertBefore(mirrorScript, firstScriptDom);
})();
</script>
```
That's exactly what I said. Baidu replaced the JSON that used to be there with this JS code.
@valpackett What I think is happening is that they changed the AI URLs to simply redirect to the homepage. The scraper blindly follows the redirect and then tries to parse the homepage as JSON with a simple string search for `{` and `}`, which happen to appear in a JS function. You can see this with curl (using the URL you provided):
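To illustrate that failure mode, here is a minimal JavaScript sketch. The `naiveExtractJson` helper is hypothetical and assumes the scraper takes everything from the first `{` to the last `}`; the real scraper's search may differ in detail. Applied to a trimmed copy of the homepage script, the slice is a function body, so `JSON.parse` throws a `SyntaxError`, while a genuine JSON payload still parses.

```javascript
// Hypothetical reproduction of a naive "find the JSON" step: take everything
// from the first "{" to the last "}" and hand it to JSON.parse.
function naiveExtractJson(text) {
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end < start) {
    throw new Error("no braces found");
  }
  return JSON.parse(text.slice(start, end + 1));
}

// Trimmed-down stand-in for the homepage that the redirect now lands on.
const homepage = `<script>
(function() {
var random = Math.random();
var mirrorScript = document.createElement("script");
})();
</script>`;

let parsed = null;
let failure = null;
try {
  parsed = naiveExtractJson(homepage);
} catch (err) {
  failure = err; // the slice is a JS function body, not JSON
}
console.log(failure instanceof SyntaxError); // true

// The same helper "works" when the body really contains JSON.
console.log(naiveExtractJson('callback({"ok": true})').ok); // true
```

This is why the error surfaces as a JSON parse failure rather than as a network or HTTP error: nothing before the parse step notices that the response is HTML.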
```console
$ curl http://ai.wenku.baidu.com/play/503c103c25c52cc58bd6be92\?pn\=1\&rn\=5
<a href="https://wenku.baidu.com/">Moved Permanently</a>.
```
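One way a scraper could fail loudly here instead is to inspect the response before parsing. The sketch below is hypothetical: `parseJsonResponse` and its `{ status, headers, body }` argument are illustrative names, it assumes the HTTP client is configured not to follow redirects automatically (as plain `curl` without `-L` behaves), and it assumes header names are lowercased. The `location` value in the usage line is assumed to match the link in the 301 body above.

```javascript
// Hypothetical guard: classify a raw HTTP response before attempting JSON.parse.
function parseJsonResponse({ status, headers = {}, body }) {
  // 3xx: the endpoint moved; following it silently leads to parsing the homepage.
  if (status >= 300 && status < 400) {
    const target = headers["location"] ?? "(no Location header)";
    throw new Error(`redirected (${status}) to ${target}, expected JSON`);
  }
  // An HTML document or fragment is never the JSON payload we asked for.
  if (body.trimStart().startsWith("<")) {
    throw new Error("got HTML instead of JSON");
  }
  return JSON.parse(body);
}

// Body copied from the curl output; the Location header is an assumption.
const curlBody = '<a href="https://wenku.baidu.com/">Moved Permanently</a>.';
try {
  parseJsonResponse({ status: 301, headers: { location: "https://wenku.baidu.com/" }, body: curlBody });
} catch (err) {
  console.log(err.message); // redirected (301) to https://wenku.baidu.com/, expected JSON
}
```

The HTML check is a fallback for clients that do follow redirects: the homepage then arrives with a 2xx status, but its body still starts with `<`.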
It seems that instead of JSON the server now returns a JS snippet that loads another JS file…
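For an error report, it may help to name the extra script the snippet pulls in. The helper below, `findInjectedScriptSrc`, is hypothetical and only a regex sketch: it captures the literal part of the `.src = "..."` assignment, while the `'?v=' + random` cache-busting suffix is computed at runtime and is not captured. Because the URL starts with `//`, it is protocol-relative, so it resolves against the scheme of the page that runs it; `new URL` shows that resolution against the `https://wenku.baidu.com/` redirect target.

```javascript
// Hypothetical diagnostic: pull out the URL of the script the snippet injects.
function findInjectedScriptSrc(snippet) {
  // Matches `.src = "..."`; the runtime `'?v=' + random` suffix is not captured.
  const match = snippet.match(/\.src\s*=\s*"([^"]+)"/);
  return match ? match[1] : null;
}

const snippet = 'mirrorScript.src = "//sofire.bdstatic.com/js/xaf3.js" + \'?v=\' + random;';
const src = findInjectedScriptSrc(snippet);
console.log(src); // //sofire.bdstatic.com/js/xaf3.js

// A protocol-relative URL inherits the scheme of the page it is loaded from.
console.log(new URL(src, "https://wenku.baidu.com/").href); // https://sofire.bdstatic.com/js/xaf3.js
```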