Closed: Muscot closed this 8 years ago
Can we get this checked in?
I tried your solution and it still doesn't work for the example below. It works if you comment out the uprice line, but if you try to crawl you get a nice "undefined is not a URL" error:
x('https://www.simply.es/compra-online/especiales-2.html', {
  title: ['title'],
  items: x('.listing ul', [{
    main: '.descripcionProducto',
    link: '.descripcionProducto@href',
    uprice: x('.descripcionProducto a@href', '.precioKilo')
  }])
})(function (err, obj) {
  if (err) { console.log(err); }
  console.log(obj);
});
UPDATE: it also works if you remove the square brackets.
Hi,
For "link" it seems that .descripcionProducto is itself an anchor, but for uprice you search for an anchor inside ".descripcionProducto"?
Shouldn't it be something like this? (I couldn't access the www.simply.es URL.)
x('https://www.simply.es/compra-online/especiales-2.html', {
  title: ['title'],
  items: x('.listing ul', [{
    main: '.descripcionProducto',
    link: '.descripcionProducto@href',
    uprice: x('.descripcionProducto@href', '.precioKilo')
  }])
})(function (err, obj) {
  if (err) { console.log(err); }
  console.log(obj);
});
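To illustrate the difference between the two selectors: if, as suggested above, .descripcionProducto is itself the anchor, then '.descripcionProducto a@href' looks for an anchor nested inside it and finds nothing, so there is no URL to follow. The following is only a rough sketch in plain JavaScript, without x-ray. The element object and the helpers `descendants`, `nestedAnchorHref` and `ownHref` are hypothetical stand-ins, not the library's internals, and the href value is made up.

```javascript
// Hypothetical stand-in for a parsed element (assumed shape, not x-ray's).
const product = {
  tag: 'a',
  classes: ['descripcionProducto'],
  attrs: { href: '/producto/123' }, // made-up href for illustration
  children: []
};

// Collect all descendants of an element, excluding the element itself.
function descendants(el) {
  return el.children.flatMap(child => [child, ...descendants(child)]);
}

// Rough model of "<scope> a@href": href of the first <a> nested inside the scope.
function nestedAnchorHref(scope) {
  const anchor = descendants(scope).find(el => el.tag === 'a');
  return anchor ? anchor.attrs.href : undefined;
}

// Rough model of "<scope>@href": href of the scope element itself.
function ownHref(scope) {
  return scope.attrs.href;
}

const nested = nestedAnchorHref(product); // no nested anchor, so nothing to follow
const own = ownHref(product);             // the anchor's own href

console.log({ nested, own });
```

In this model the nested lookup yields undefined, which would be consistent with the "undefined is not a URL" error, while reading the attribute from the element itself yields a usable link.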
Hi @Muscot, excellent, that works perfectly! You made my day.
Cheers, r
I'm glad I could help! Cheers!
So, do you guys think we can have this merged and published?
I just used this on another project and it is still broken. Do we just need to increase the test coverage in order for this to get checked in?
@matthewmueller Friendly ping. Would you consider merging this PR? It appears that the package is terribly broken for advanced usage since 2.0.3, see #189. I guess that fellow devs have no choice but to stick to forks or roll back to 2.0.2.
Just going to also throw in a friendly reminder here. This is the only real game changer for me in terms of functionality compared to rolling my own implementation with request + cheerio. Having the ability to follow nested links is key for simple pagination/scraping implementations.
Can we push this through?
sorry for the delay, thanks for your help @Muscot !
fixed in 2.3.1
if anyone wants to help maintain this library, so we can push these fixes through faster, let me know :-)
Description
I think I fixed this issue, and I also added a follow.js example to test it. Nested crawling broken on 'master'. When to merge 'bugfix/nested-crawling' #111
x('http://www.imdb.com/', {
  title: ['title'],
  links: x('.rhs-body .rhs-row', [{
    text: 'a',
    href: 'a@href',
    next_page: x('a@href', {
      title: 'title',
      heading: 'h1'
    })
  }])
})(function(err, obj) {
  console.log(obj);
});
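As a rough illustration of what nested crawling has to do, here is a minimal sketch in plain JavaScript that follows each link one level deep and skips links whose href is missing. It does not use x-ray and is not the fix from this PR. The in-memory `pages` map, the example.test URLs, and the helpers `resolveHref` and `crawl` are all assumptions made for the example.

```javascript
// Hypothetical in-memory "site": URL -> fields a scraper would extract from that page.
const pages = {
  'http://example.test/': {
    title: 'Home',
    links: [
      { text: 'First', href: '/first' },
      { text: 'Broken', href: undefined } // models a selector that matched nothing
    ]
  },
  'http://example.test/first': { title: 'First page', heading: 'Hello', links: [] }
};

// Resolve an href against the page it was found on; return null if unusable.
function resolveHref(href, base) {
  // new URL(undefined, base) would stringify undefined, so check the type first.
  if (typeof href !== 'string' || href === '') return null;
  try {
    return new URL(href, base).href;
  } catch (err) {
    return null;
  }
}

// Follow each link one level deep and attach the nested page's fields.
function crawl(url) {
  const page = pages[url];
  return {
    title: page.title,
    links: page.links.map(link => {
      const target = resolveHref(link.href, url);
      const next = target && pages[target];
      return {
        text: link.text,
        href: link.href,
        next_page: next ? { title: next.title, heading: next.heading } : null
      };
    })
  };
}

const result = crawl('http://example.test/');
console.log(JSON.stringify(result, null, 2));
```

The explicit type check matters because the built-in URL constructor converts a non-string input to a string before resolving it, so an undefined href would otherwise resolve to a path ending in "undefined" instead of being rejected.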
Checklist