Open danielelodola opened 8 years ago
Hello @danielelodola. Can you tell me which pages and URLs you are trying to scrape, so I can find the root of the problem?
Hello; the pages require authentication in order to be accessed. How can I go about sharing the info with you?
Are you able to reproduce the issue on a different site? Another quick question: does the content you are scraping with the spider require JavaScript execution on the page?
I have not tried to replicate the issue on other sites. However, I have been able to crawl the site I'm trying to extract data from with a simpler scraper model (retrieving only one data element, without embedding recursive iterators). The content I'm trying to retrieve is not generated by JavaScript; it is plain HTML.
So the problem does not occur when you are not using recursive scrapers?
No, it does not. For example, I'm able to retrieve
company_name: {sel: '.container > .row > .col-xs-12 > .row > .col-xs-12 > h1.pull-left', method: function($) {return $(this).text().trim().replace(/[\(\)]/g, '').replace('XYZ','')}},
on multiple pages.
BTW, I have downloaded a local copy of a page if this can help.
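As an aside on the `company_name` rule quoted above: its `method` chain can be checked in isolation on plain strings, without running the spider. The sketch below is only an illustration. `cleanCompanyName`, `cleanCompanyNameAll`, and the sample inputs are hypothetical names and data, not part of the original scraper. It shows two things that are easy to miss: a string pattern passed to `replace` removes only the first match, and trimming before the replacements can leave a stray space behind.

```javascript
// Same chain as the `method` in the company_name rule, applied to a plain
// string instead of $(this).text(). (Hypothetical helper name.)
function cleanCompanyName(raw) {
  return raw.trim().replace(/[()]/g, '').replace('XYZ', '');
}

// Hypothetical variant: strip every 'XYZ' occurrence and trim last.
function cleanCompanyNameAll(raw) {
  return raw.replace(/[()]/g, '').replace(/XYZ/g, '').trim();
}

// JSON.stringify makes leading/trailing spaces visible in the output.
console.log(JSON.stringify(cleanCompanyName('  (Acme XYZ)  ')));    // "Acme "
console.log(JSON.stringify(cleanCompanyName('XYZ Corp (XYZ)')));    // " Corp XYZ"
console.log(JSON.stringify(cleanCompanyNameAll('XYZ Corp (XYZ)'))); // "Corp"
```

Whether the variant is preferable depends on the real page text, which is not shown in this thread.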
Can you try replacing the recursive scrape definitions with a function that performs the scrape:
// Something along these lines:
{
  field: function($) { return $(this).scrape(...); }
}
and tell me whether this works or not?
Noted, I will try this and keep you posted. Thanks for taking the time to look into this issue ;-).
Hi @Yomguithereal, no luck whatsoever with the function($) approach! I just can't wrap my brain around it.
For instance, instead of
company_details: {scrape: {iterator: 'ul.list-unstyled .wrap-1', data: 'text'}}
you can write
company_details: function($) {
  return $('ul.list-unstyled .wrap-1').scrape();
}
I have the following spider:
It extracts the expected data (company_name, company_details, details_labels and details_values) as required, but only on ONE page. The spider is not actually crawling the list of URLs I provide it with.
Where am I going wrong?
Thanks a bunch for your help!
Dan