scrapinghub / splash

Lightweight, scriptable browser as a service with an HTTP API
BSD 3-Clause "New" or "Revised" License

Can't load Meteor based sites #1133

Open jmcta opened 2 years ago

I can't seem to scrape Meteor-based projects with Splash. A simple example is https://simpletasks.meteorapp.com

I use the curl command:

curl 'http://avogadro.relativity:8050/render.json?url=https://simpletasks.meteorapp.com&console=1&wait=20&html=1'
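The curl command above passes the target URL unescaped inside the query string. As a minimal sketch (not a fix for the rendering problem itself), the same render.json request can be built with the target URL percent-encoded, which rules out query-string parsing as a variable. The helper name render_json_url is hypothetical; the host and the option values are taken from the command above.

```python
from urllib.parse import urlencode


def render_json_url(splash_base: str, target_url: str, wait: float = 20) -> str:
    """Build a render.json URL with the same options as the curl command above.

    render_json_url is an illustrative helper, not part of Splash.
    """
    params = {"url": target_url, "console": 1, "wait": wait, "html": 1}
    # urlencode percent-encodes ':' and '/' in the target URL, so it cannot be
    # confused with the outer query string.
    return f"{splash_base.rstrip('/')}/render.json?{urlencode(params)}"


if __name__ == "__main__":
    print(render_json_url("http://avogadro.relativity:8050",
                          "https://simpletasks.meteorapp.com"))
```

The printed URL can be passed to curl in single quotes exactly as before.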

and I get the response back:

{"url": "https://simpletasks.meteorapp.com/", "requestedUrl": "https://simpletasks.meteorapp.com/", "geometry": [0, 0, 1024, 768], "title": "Simple Tasks", "html": "<!DOCTYPE html><html><head>\n\n<title>Simple Tasks</title>\n <link href=\"https://fonts.googleapis.com/css2?family=Caveat:wght@700&amp;display=swap\" rel=\"stylesheet\">\n <meta charset=\"utf-8\">\n <meta http-equiv=\"x-ua-compatible\" content=\"ie=edge\">\n <meta name=\"viewport\" content=\"width=device-width, height=device-height, viewport-fit=cover, initial-scale=1, maximum-scale=1, minimum-scale=1, user-scalable=no\">\n <meta name=\"mobile-web-app-capable\" content=\"yes\">\n <meta name=\"apple-mobile-web-app-capable\" content=\"yes\">\n\n</head>\n<body><div id=\"react-target\"></div>\n <script type=\"text/javascript\">__meteor_runtime_config__ = JSON.parse(decodeURIComponent(\"%7B%22meteorRelease%22%3A%22METEOR%402.1.1%22%2C%22gitCommitHash%22%3A%229b7cf3fdf3018736be291acc2a8c399d68f1460d%22%2C%22meteorEnv%22%3A%7B%22NODE_ENV%22%3A%22production%22%2C%22TEST_METADATA%22%3A%22%7B%7D%22%7D%2C%22PUBLIC_SETTINGS%22%3A%7B%7D%2C%22ROOT_URL%22%3A%22http%3A%2F%2Fsimpletasks.meteorapp.com%22%2C%22ROOT_URL_PATH_PREFIX%22%3A%22%22%2C%22reactFastRefreshEnabled%22%3Atrue%2C%22autoupdate%22%3A%7B%22versions%22%3A%7B%22web.browser%22%3A%7B%22version%22%3A%2280cb53f28c7d2f51a1825740adbcf2f29e0943fc%22%2C%22versionRefreshable%22%3A%221952018619999f014765d73c14db1f446971e849%22%2C%22versionNonRefreshable%22%3A%2280cb53f28c7d2f51a1825740adbcf2f29e0943fc%22%2C%22versionReplaceable%22%3A%221952018619999f014765d73c14db1f446971e849%22%7D%2C%22web.browser.legacy%22%3A%7B%22version%22%3A%227dba16d05ded85fb54074236c9cfac1f22104c80%22%2C%22versionRefreshable%22%3A%221952018619999f014765d73c14db1f446971e849%22%2C%22versionNonRefreshable%22%3A%227dba16d05ded85fb54074236c9cfac1f22104c80%22%2C%22versionReplaceable%22%3A%221952018619999f014765d73c14db1f446971e849%22%7D%7D%2C%22autoupdateVersion%22%3Anull%2C%22autoupdateVersionRefreshable%22%3Anull%2C%22autoupdateVersionCordova%22%3Anull%2C%22appId%22%3A%22j3mxcj0j3jfj.pzmhdxgvur2%22%7D%2C%22appId%22%3A%22j3mxcj0j3jfj.pzmhdxgvur2%22%2C%22isModern%22%3Afalse%7D\"))</script>\n\n <script type=\"text/javascript\" src=\"/f1fb38ad0788d00a84814d94682d71151717d54e.js?meteor_js_resource=true\"></script>\n\n\n\n</body></html>"}
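In the response above, the app's mount point, the div with id react-target, comes back with nothing inside it. To check this mechanically across repeated attempts, here is a rough sketch that parses a render.json response and reports whether the mount element has any child elements or text. The class and function names are hypothetical, and the parser is deliberately simple (it assumes reasonably well-formed HTML).

```python
import json
from html.parser import HTMLParser

# Elements that never have a closing tag; they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class MountPointProbe(HTMLParser):
    """Counts child elements and non-blank text inside the element with a given id."""

    def __init__(self, mount_id):
        super().__init__()
        self.mount_id = mount_id
        self.depth = 0        # > 0 while the parser is inside the mount element
        self.found = False
        self.children = 0

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.children += 1
            if tag not in VOID_TAGS:
                self.depth += 1
        elif dict(attrs).get("id") == self.mount_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.children += 1


def mount_is_empty(response_text, mount_id="react-target"):
    """Return True if the mount element in a render.json response has no content."""
    probe = MountPointProbe(mount_id)
    probe.feed(json.loads(response_text)["html"])
    if not probe.found:
        raise ValueError(f"no element with id {mount_id!r} in returned html")
    return probe.children == 0
```

Feeding it the saved output of the curl command should return True for the response shown here, and False once Splash returns a rendered task list.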

The response includes none of the runtime DOM elements: the <div id="react-target"> is empty. I've tried changing the private browsing setting, and using the /execute endpoint with the script:

function main(splash, args)
  splash.private_mode_enabled = false
  assert(splash:go(args.url))
  assert(splash:wait(20.0))
  return {
    html = splash:html(),
    png = splash:png(),
    har = splash:har(),
  }
end
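For reference, one way to submit the script above is a POST to /execute with a JSON body whose lua_source field holds the script and whose url field becomes args.url. The sketch below uses only the Python standard library; the helper names, the host, and the 90-second client timeout are assumptions for illustration. Nothing here changes how Splash renders the page, it only makes the request reproducible.

```python
import json
import urllib.request

# The Lua script from this post, unchanged.
LUA_SOURCE = """
function main(splash, args)
  splash.private_mode_enabled = false
  assert(splash:go(args.url))
  assert(splash:wait(20.0))
  return {
    html = splash:html(),
    png = splash:png(),
    har = splash:har(),
  }
end
"""


def build_execute_request(splash_base, target_url, lua_source=LUA_SOURCE):
    """Build (but do not send) a POST request for Splash's /execute endpoint."""
    body = json.dumps({"lua_source": lua_source, "url": target_url}).encode("utf-8")
    return urllib.request.Request(
        f"{splash_base.rstrip('/')}/execute",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run(request, timeout=90):
    """Send the request and decode the JSON table returned by main()."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Host taken from the curl command earlier in this post.
    req = build_execute_request("http://avogadro.relativity:8050",
                                "https://simpletasks.meteorapp.com")
    result = run(req)
    print(result["html"][:500])
```

The returned table should arrive as a JSON object with html, png and har keys, with the png value base64-encoded.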

Is there anything else I can try, or where else can I look?