Closed: nodeGarden closed this issue 11 years ago.
Yeah, you won't be able to invoke anything involving Pjscrape from inside a scraper function - it's completely sandboxed, with no access to the PhantomJS environment.
You want the `moreUrls` option - see any of the "recursive" tests for examples (https://github.com/nrabinowitz/pjscrape/tree/master/tests).
Sorry to bump this issue but I too am having issues running a nested scrape. I have looked at the moreUrls option and the recursive tests examples but still cannot seem to get it to work.
What I don't get is how you pipe the URLs from the first scrape into the `moreUrls` parameter. Is it at all possible?
Thanks, Chris
`moreUrls` takes a function, which is executed in the remote context and should return a list of URL strings - this isn't working? (It can also just take a selector, in the simple case.)
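For reference, the two forms might look like this. This is only a sketch: the URLs and selectors below are placeholders, not taken from the thread.

```js
// Selector form: pjscrape collects the href of every matched anchor
// and queues those pages. (Placeholder URL and selector.)
pjs.addSuite({
    url: 'http://example.com/list',
    moreUrls: 'a.next-page',
    scraper: function() {
        return $('title').text();
    }
});

// Function form: runs in the remote page context and must return an
// array of URL strings; _pjs.toFullUrl resolves relative hrefs.
pjs.addSuite({
    url: 'http://example.com/list',
    moreUrls: function() {
        return $('a.item-link').map(function() {
            return _pjs.toFullUrl($(this).attr('href'));
        }).toArray();
    },
    scraper: function() {
        return $('h1').first().text();
    }
});
```

Both suites produce one result per page visited; the function form is needed whenever the links require resolution or filtering.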
Hi,
I got the `moreUrls` part extracting the URLs, but it seems there is some Ajax/JavaScript happening, because I cannot open them manually in a browser (probably why it's not working).
The page loads the results dynamically. Can I trigger a click event for each link element, then extract the data and continue with the next one (`page.goBack()`?)?
Thank you for your help. Chris
If they aren't really URLs (i.e. PhantomJS can't load them), then you'll likely need to handle the entire thing within your scraper function, triggering the click and return events there. Hard to offer more help w/o seeing the site in question.
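One way to sketch the click-driven approach described above (the selectors here are assumptions based on the Google Trends snippet later in the thread, not verified):

```js
// Sketch only: clicks each category link inside the scraper itself,
// then reads whatever detail markup the click exposed. Note that a
// scraper function runs synchronously, so content fetched via Ajax
// *after* the click may not have arrived yet - this pattern only works
// when the click reveals content already present in the page.
pjs.addSuite({
    url: 'http://www.google.ca/trends/topcharts',
    scraper: function() {
        return $('a.topcharts-smallchart-title-link').map(function() {
            $(this).click();  // fire the page's own click handler
            // read the detail pane the click revealed (selector assumed)
            return $('.common-title-text').first().text();
        }).toArray();
    }
});
```

If the content genuinely loads asynchronously after each click, this synchronous loop will read stale DOM, and a different strategy (e.g. discovering the underlying request URLs) is needed.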
Hi thanks for the reply,
I am trying to extract the Google Trends top charts items. I can get all the categories fine like this:
```js
pjs.addSuite({
    url: 'http://www.google.ca/trends/topcharts',
    scraper: function() {
        return $('.topcharts-category-charts-container').children().map(function() {
            return _pjs.toFullUrl($(this).find("a.topcharts-smallchart-title-link").attr("href"));
        }).toArray();
    }
});
```
But if I try to run it with `moreUrls` to get the category details:
```js
pjs.addSuite({
    url: 'http://www.google.ca/trends/topcharts',
    moreUrls: function() {
        return $('.topcharts-category-charts-container').children().map(function() {
            return _pjs.toFullUrl($(this).find("a.topcharts-smallchart-title-link").attr("href"));
        }).toArray();
    },
    scraper: function() {
        return $('.common-title-text').first().text();
    }
});
```
I get "page did not load" errors. That's expected, since Google does not let you load those URLs directly. Any ideas or suggestions? Thanks
I'm trying to figure out how to do a nested scrape, where the second scraper relies on data from the first.
I'm pulling the Artist names from: http://www.billboard.com/artists/top-100?page=0 This part works:
I then want to go into the individual Artist's page and grab the top songs: http://www.billboard.com/artist/371422/taylor-swift
Individually this works too:
but what I want to get is the return from scrape #2 as a part of the return for scrape #1, so that it looks more like:
When I try to nest them, it says `_name` and `_url` are undefined. Result:
I see the note on the documentation page about the private scope, and I don't quite understand how to apply the suggested `evaluate`. I guess the question is: is there a workaround for this, or is there another way to accomplish the above?
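Since scraper functions are sandboxed and can't share variables between suites, one pattern sometimes used is to avoid crossing that boundary at all: let `moreUrls` feed the artist pages, and have the scraper gather the name and the songs together on each artist page. This is a sketch under stated assumptions - the selectors and field names below are illustrative guesses, not verified against billboard.com:

```js
// Sketch only: all selectors ('.artist-link', 'h1', '.song-title') are
// assumptions. The idea: instead of passing _name/_url from suite #1
// into suite #2, return one combined object per artist page, so nothing
// needs to escape the sandboxed scraper scope.
pjs.addSuite({
    url: 'http://www.billboard.com/artists/top-100?page=0',
    // Follow each artist link found on the chart page (selector assumed).
    moreUrls: function() {
        return $('a.artist-link').map(function() {
            return _pjs.toFullUrl($(this).attr('href'));
        }).toArray();
    },
    maxDepth: 1,  // only recurse one level from the chart page
    scraper: function() {
        // The scraper also runs on the chart page itself; skip pages
        // that don't look like an artist page (guard selector assumed).
        if ($('.song-title').length === 0) return;
        return {
            name: $('h1').first().text(),
            url: window.location.href,
            songs: $('.song-title').map(function() {
                return $(this).text();
            }).toArray()
        };
    }
});
```

This trades the nested-scrape structure for one extra page load per artist, but every field comes out of a single scraper invocation, so the private-scope problem never arises.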