justnom closed this issue 12 years ago.
The same question is also posted on Stack Overflow: http://stackoverflow.com/questions/9647277/phantomjs-and-pjscrape-failing-on-some-multiple-urls
Could not reproduce. I just ran this several times and was able to retrieve all of the URLs in the productUrls list.
Okay - thank you for trying. I might have to code this using another framework.
Up to you :). Honestly, the weakness of pjscrape is that it depends on the stability of PhantomJS, which is still a work in progress. This sounds much more likely to be a PhantomJS issue.
I was thinking about modding your framework to work with a custom Chromium build. Would that be alright with you?
From: Nick Rabinowitz Sent: 13/03/2012 19:45 To: justnom Subject: Re: [pjscrape] Failing on some multiple URLs (#13)
Reply to this email directly or view it on GitHub: https://github.com/nrabinowitz/pjscrape/issues/13#issuecomment-4484316
It's GitHub! Fork away. But it sounds like you're aiming to recreate PhantomJS by building on Chromium, which might be a heck of a project, especially if you want it to be headless.
Yeah, that's basically what I would be doing, but with a "preview" window so I can see the scraping as it happens. I don't need images to load, which cuts down render time, but I will run a few speed tests first just to confirm that it's going to be okay. I honestly don't think it would be that bad: just registering a Chromium extension with the current context and binding it back to some native code that runs the external JS once rendering has finished. I say that rather optimistically, however!
Where is the URL array being executed? I've tried to put it inside pjs.addSuite, but that didn't work out. Any tips?
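Going by the pjscrape README, the URL list is normally passed as the `url` property of the suite object given to `pjs.addSuite`; that property accepts either a single URL string or an array, and pjscrape then visits each entry and runs the `scraper` function inside the loaded page. A minimal sketch of such a config file, assuming placeholder URLs and a trivial `document.title` scraper; the `typeof pjs` guard is an addition for this sketch (so the file can also be loaded outside pjscrape) and is not part of pjscrape's API.

```javascript
// my_config.js: hypothetical pjscrape config; the URLs are placeholders.
// Run with: phantomjs pjscrape.js my_config.js
// ES5 syntax only, since PhantomJS 1.x does not support ES2015 features.
var suite = {
    // `url` may be a single string or an array; pjscrape visits each entry.
    url: [
        'http://example.com/products/1',
        'http://example.com/products/2',
        'http://example.com/products/3'
    ],
    // Evaluated inside each loaded page, not in the PhantomJS context.
    scraper: function () {
        return document.title;
    }
};

// Register the suite only when running under pjscrape (guard added for this sketch).
if (typeof pjs !== 'undefined') {
    pjs.addSuite(suite);
}
```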
Overview
I am trying to create a very basic scraper with PhantomJS and the pjscrape framework.
My Code
URL Arrays Used
This first array DOES NOT WORK and fails after the 3rd or 4th URL.
This second array WORKS and does not fail, even though it is from the same site.
Problem
When iterating through productURLs, the optional callback passed to PhantomJS's page.open reports failure even though the page hasn't finished loading.
I know this because I ran the script while an HTTP debugger was running, and the HTTP requests were still in progress even after PhantomJS had reported a page load failure.
However, the code works fine when running with categoriesURLs.
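To check the HTTP-debugger observation from inside PhantomJS itself, one option is to count resource requests that have started but not yet finished, and log them when the page.open callback reports failure. A minimal sketch, assuming PhantomJS's WebPage callbacks `onResourceRequested` (whose `requestData` carries `id` and `url`) and `onResourceReceived` (whose `response` carries `id` and `stage`); `createRequestTracker` and `attachTracker` are hypothetical helpers, not part of PhantomJS or pjscrape. Since pjscrape presumably creates its page object internally, this would fit a standalone PhantomJS script or a modified copy of pjscrape.

```javascript
// Hypothetical helper: remembers requests that started but have not finished.
function createRequestTracker() {
    var inFlight = {};  // request id -> url
    return {
        started: function (id, url) { inFlight[id] = url; },
        finished: function (id) { delete inFlight[id]; },
        pendingUrls: function () {
            return Object.keys(inFlight).map(function (id) { return inFlight[id]; });
        }
    };
}

// Hypothetical helper: wires the tracker to a PhantomJS WebPage
// (e.g. one created with require('webpage').create()).
function attachTracker(page, tracker) {
    page.onResourceRequested = function (requestData) {
        tracker.started(requestData.id, requestData.url);
    };
    page.onResourceReceived = function (response) {
        // Called with stage 'start' and again with stage 'end'; only 'end' marks completion.
        if (response.stage === 'end') {
            tracker.finished(response.id);
        }
    };
}
```

Inside the page.open callback, logging `tracker.pendingUrls()` when `status` is `'fail'` would show which requests were still outstanding at that moment. Requests that error out may never reach the `'end'` stage, so the list can overstate what is genuinely still loading.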
Assumptions
Possible Solutions
These are solutions I have tried thus far.
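One possible workaround, sketched here as an illustration rather than as one of the approaches actually tried, is to open the URLs strictly one after another and retry any URL whose callback reports failure. This is a swapped-in technique, not pjscrape's own loading mechanism, and it only papers over an underlying PhantomJS problem. The sketch assumes an `openPage(url, callback)` function that behaves like PhantomJS's `page.open(url, function (status) {...})`, passing `'success'` or `'fail'`; `openAllWithRetry` and its `maxRetries`/`retryDelayMs` options are hypothetical names. It also ignores repeat invocations of the callback, which PhantomJS has reportedly produced for some pages (for example, pages with frames).

```javascript
// Hypothetical helper: open URLs one at a time, retrying failures.
// openPage(url, callback) is assumed to call callback('success' | 'fail').
function openAllWithRetry(urls, openPage, options, onDone) {
    var maxRetries = options.maxRetries;      // extra attempts per URL
    var retryDelayMs = options.retryDelayMs;  // pause before each retry
    var results = [];                         // { url, status } in input order
    var index = 0;                            // position in `urls`
    var attempt = 0;                          // attempts made for urls[index]

    function next() {
        if (index >= urls.length) {
            onDone(results);
            return;
        }
        var url = urls[index];
        var handled = false;
        attempt += 1;
        openPage(url, function (status) {
            // Ignore repeat invocations for the same attempt.
            if (handled) { return; }
            handled = true;
            if (status !== 'success' && attempt <= maxRetries) {
                // Retry the same URL after a pause.
                setTimeout(next, retryDelayMs);
                return;
            }
            results.push({ url: url, status: status });
            index += 1;
            attempt = 0;
            // Defer so the next open starts after this callback returns.
            setTimeout(next, 0);
        });
    }

    next();
}
```

In a PhantomJS script, `openPage` could be `function (url, cb) { page.open(url, cb); }` with `var page = require('webpage').create();`, and `onDone` would end by calling `phantom.exit()`.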
Any ideas?