Closed jasonchanhku closed 6 years ago
Yes, I agree we should move over to something like headless Chrome on morph.io now that PhantomJS is officially being archived.
I think whatever the new thing is should be:
Are there any other things to consider?
For archival purposes, this is the PhantomJS archiving announcement: ariya/phantomjs#15344
I'm not sure there is any software out there that supports all those languages, and I'm also not sure that would be a good idea in the first place. I don't know enough about morph just yet to be super helpful, but in the Node community, puppeteer (which drives headless Chrome/Chromium) and SlimerJS (which runs Gecko) are the two I would look into. For a scraper, those two are more than enough to get you started with SPAs etc.
Here is just an example of what it looks like to spin up puppeteer:
const puppeteer = require('puppeteer');

(async () => {
  // Launch a headless browser and open a new tab
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  // Run code inside the page to read the full document height
  const height = await page.evaluate(() => document.documentElement.scrollHeight);
  // etc
  await browser.close();
})();
@jasonchanhku @dominikwilkowski thanks for the poke to start moving away from PhantomJS.
morph.io now supports Google Chrome headless, which you can either use directly or via webdriver. The documentation is super-sparse right now. See https://morph.io/documentation/scraping_javascript_sites. If you would be interested in helping out with the documentation, that would be amazing.
@dominikwilkowski perhaps you would consider writing some documentation (and maybe an example scraper) for nodejs that uses puppeteer?
Thanks! You guys are so efficient! Cheers and happy Easter.
Hi guys,
It seems PhantomJS has been deprecated, and hence I can't use Selenium in my scraper.py script. Would Chromedriver support or another alternative be considered? I would appreciate any feedback. Thanks.
/app/.heroku/python/lib/python3.6/site-packages/selenium/webdriver/phantomjs/webdriver.py:49: UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead
  warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless '